Backup Configuration - Final Specification ¶
Overview ¶
Comprehensive backup strategy for S3 bucket and RDS PostgreSQL database with dual-tier recovery capabilities and cross-region disaster recovery.
Recovery Objectives ¶
RTO (Recovery Time Objective): The maximum acceptable time to restore service after a failure. This measures how quickly data can be recovered and systems brought back online.
RPO (Recovery Point Objective): The maximum acceptable amount of data loss measured in time. This represents the age of the data that must be recovered for operations to resume.
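As a worked illustration of the two definitions, the sketch below computes the RPO and RTO actually achieved in a single incident and compares them with targets. The function names and the example timestamps are hypothetical and are not part of this specification.

```python
from datetime import datetime, timedelta


def achieved_objectives(last_recoverable: datetime, failure: datetime,
                        restored: datetime) -> tuple[timedelta, timedelta]:
    """Return (achieved_rpo, achieved_rto) for one incident."""
    achieved_rpo = failure - last_recoverable  # data written in this window is lost
    achieved_rto = restored - failure          # service was unavailable for this long
    return achieved_rpo, achieved_rto


def meets_targets(achieved_rpo: timedelta, achieved_rto: timedelta,
                  rpo_target: timedelta, rto_target: timedelta) -> bool:
    """True when both achieved values are within their targets."""
    return achieved_rpo <= rpo_target and achieved_rto <= rto_target


if __name__ == "__main__":
    rpo, rto = achieved_objectives(
        last_recoverable=datetime(2024, 1, 10, 9, 0),
        failure=datetime(2024, 1, 10, 12, 0),
        restored=datetime(2024, 1, 10, 15, 30),
    )
    print(rpo, rto, meets_targets(rpo, rto, timedelta(hours=6), timedelta(hours=6)))
```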
Architecture Components ¶
S3 Backup Strategy ¶
Source Bucket (eu-west-1) ¶
Purpose: Operational backup and live data access
Configuration:
- Versioning: Enabled
- Object Lock: Disabled
- Current Versions: S3 Intelligent-Tiering (automatic optimization)
- Noncurrent Versions: S3 Intelligent-Tiering for 35 days, then permanently deleted
- Retention: 35 days for deleted files
Recovery Capabilities:
- Deleted file recovery: 0-35 days (immediate access)
- Accidental overwrites: Instant restoration from previous versions
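The source bucket settings above could be expressed as an S3 lifecycle configuration along these lines. This is a minimal sketch: the rule ID is a hypothetical name, and the dictionary is only built here, not applied. It has the shape accepted by boto3's `put_bucket_lifecycle_configuration` as the `LifecycleConfiguration` argument.

```python
def source_lifecycle_configuration(noncurrent_days: int = 35) -> dict:
    """Build a lifecycle configuration for the source bucket.

    Current versions move to Intelligent-Tiering immediately; noncurrent
    versions are permanently deleted after `noncurrent_days`.
    """
    if noncurrent_days <= 0:
        raise ValueError("noncurrent_days must be positive")
    return {
        "Rules": [
            {
                "ID": "source-tiering-and-noncurrent-expiry",  # hypothetical name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: applies to every object
                "Transitions": [
                    {"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}
                ],
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
            }
        ]
    }
```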
Replica Buckets (eu-west-3 for media, eu-north-1 for vault) ¶
Purpose: Disaster recovery with 180-day retention window
Configuration:
- Versioning: Enabled
- Object Lock: GOVERNANCE mode, 180-day default retention
- Replication: Cross-Region Replication from source bucket
- Current Versions: Written directly to S3 Glacier Deep Archive by the replication rule; each stays current as long as its counterpart exists in the source bucket (mirror principle)
- Noncurrent Versions: Immediate transition to Deep Archive, expire after 180 days
- Maximum Retention: Noncurrent versions retained for 180 days (aligned with Deep Archive minimum billing)
Recovery Capabilities:
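- File recovery: Any file state from the last 180 days (Deep Archive retrieval within 12 hours via Standard restore, or within 48 hours via Bulk restore)
- Ransomware protection: Versions locked against deletion and overwrite for 180 days (GOVERNANCE mode; bypass requires s3:BypassGovernanceRetention)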
- Clean disaster recovery window: Recover to any point within 180-day window
Note: S3 Glacier Flexible Retrieval can be used instead of Deep Archive to reduce restoration time to 3-5 hours (Standard retrieval) at approximately 2x higher storage costs. This trade-off may be considered if faster disaster recovery is prioritized over storage cost optimization.
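A replica bucket's replication and expiry settings might look roughly like the following. It is a sketch under assumptions: the bucket names, role ARN, and rule IDs are hypothetical, and delete marker replication is enabled here only because the mirror principle suggests it. The two dictionaries have the shapes accepted by boto3's `put_bucket_replication` and `put_bucket_lifecycle_configuration`.

```python
def replication_configuration(role_arn: str, replica_bucket: str) -> dict:
    """Replication rule (set on the source bucket) writing replicas to Deep Archive."""
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": f"replicate-to-{replica_bucket}",  # hypothetical name
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # replicate every object
                # Assumption: deletions in the source should be mirrored.
                "DeleteMarkerReplication": {"Status": "Enabled"},
                "Destination": {
                    "Bucket": f"arn:aws:s3:::{replica_bucket}",
                    "StorageClass": "DEEP_ARCHIVE",
                },
            }
        ],
    }


def replica_lifecycle_configuration(noncurrent_days: int = 180) -> dict:
    """Lifecycle rule (set on the replica bucket) expiring old noncurrent versions."""
    return {
        "Rules": [
            {
                "ID": "replica-noncurrent-expiry",  # hypothetical name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
            }
        ]
    }
```

Switching to S3 Glacier Flexible Retrieval, as the note describes, would mean changing `StorageClass` to `GLACIER` in this sketch.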
RDS Backup Strategy ¶
Operational Backup (eu-west-1) ¶
Purpose: Point-in-time recovery for recent operational issues
Configuration:
- Method: RDS native automated backup (continuous)
- Retention: 35 days
- Granularity: Second-level PITR
Recovery Capabilities:
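- Point-in-time restore: Any second within the 35-day window, up to the latest restorable time (typically within the last 5 minutes)
- RTO: 30-60 minutes
- RPO: Typically up to 5 minutes after an instance failure; to the second when rolling back to a known earlier timestamp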
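A point-in-time restore request can be validated against the 35-day window before it is sent. The sketch below only builds the arguments; the instance identifiers are hypothetical. The returned keys match the parameters of boto3's RDS `restore_db_instance_to_point_in_time`.

```python
from datetime import datetime, timedelta, timezone

PITR_RETENTION = timedelta(days=35)


def pitr_restore_request(source_id: str, target_id: str,
                         restore_time: datetime, now: datetime) -> dict:
    """Build keyword arguments for a PITR restore, rejecting out-of-window times."""
    if restore_time.tzinfo is None or now.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    if restore_time > now:
        raise ValueError("restore_time is in the future")
    if now - restore_time > PITR_RETENTION:
        raise ValueError("restore_time is older than the 35-day PITR window")
    return {
        "SourceDBInstanceIdentifier": source_id,
        "TargetDBInstanceIdentifier": target_id,  # restore creates a new instance
        "RestoreTime": restore_time,
    }


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    print(pitr_restore_request("app-db", "app-db-restored",
                               now - timedelta(days=3), now))
```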
Disaster Recovery Backup ¶
Purpose: Long-term disaster recovery snapshots
Configuration:
- Method: AWS Backup snapshots
- Frequency: Every 6 hours
- Retention: 180 days
- Primary Region: eu-west-1
- DR Region: eu-west-3 (cross-region copy)
Recovery Capabilities:
- Snapshot restore: Any 6-hour interval within 180 days
- RTO: 2-6 hours
- RPO: Up to 6 hours
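The snapshot schedule, retention, and cross-region copy could be captured in an AWS Backup plan shaped like the one below. The plan name, rule name, and vault names are hypothetical; the dictionary follows the `BackupPlan` structure accepted by boto3's `create_backup_plan`, and the cron expression fires at 00:00, 06:00, 12:00, and 18:00 UTC.

```python
def dr_backup_plan(source_vault: str, dr_vault_arn: str,
                   retention_days: int = 180) -> dict:
    """Build a backup plan: snapshot every 6 hours, keep 180 days, copy to DR region."""
    return {
        "BackupPlanName": "rds-dr-every-6h",  # hypothetical name
        "Rules": [
            {
                "RuleName": "every-6-hours",  # hypothetical name
                "TargetBackupVaultName": source_vault,  # vault in eu-west-1
                "ScheduleExpression": "cron(0 0/6 * * ? *)",
                "Lifecycle": {"DeleteAfterDays": retention_days},
                "CopyActions": [
                    {
                        # Vault ARN in the DR region (eu-west-3)
                        "DestinationBackupVaultArn": dr_vault_arn,
                        "Lifecycle": {"DeleteAfterDays": retention_days},
                    }
                ],
            }
        ],
    }
```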
Recovery Windows Summary ¶
File recovery scenarios include both S3 file restoration and RDS metadata restoration to maintain data integrity.
| Recovery Scenario | Timeframe | RTO | RPO | Recovery Steps |
|---|---|---|---|---|
| Deleted File Recovery (Recent) | 0-35 days | 30-60 minutes | <1 minute | 1. Restore RDS to exact timestamp via PITR 2. Restore S3 file from source bucket noncurrent version |
| Deleted File Recovery (Long-term) | 35-180 days | 12-48 hours | Up to 6 hours | 1. Restore RDS from 6-hour snapshot 2. Restore S3 file from replica bucket Deep Archive (eu-west-3 or eu-north-1) |
| Complete Region Failure | 0-180 days | 12-48 hours | Up to 6 hours | 1. Restore RDS from eu-west-3 snapshot copy 2. Restore S3 files from replica buckets (eu-west-3 or eu-north-1) |
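The table can be read as a simple decision rule on the age of the state to be recovered. The following sketch encodes the two deleted-file rows; the function name, the tier labels, and the treatment of exactly 35 days as still operational are illustrative choices, not definitions from this specification.

```python
def recovery_plan(age_days: float) -> dict:
    """Pick the recovery tier for a file state that is `age_days` old."""
    if age_days < 0:
        raise ValueError("age_days must not be negative")
    if age_days <= 35:
        return {
            "tier": "operational",
            "steps": [
                "Restore RDS to exact timestamp via PITR",
                "Restore S3 file from source bucket noncurrent version",
            ],
        }
    if age_days <= 180:
        return {
            "tier": "disaster-recovery",
            "steps": [
                "Restore RDS from 6-hour snapshot",
                "Restore S3 file from replica bucket Deep Archive",
            ],
        }
    # Beyond 180 days both the snapshots and the noncurrent versions have expired.
    return {"tier": "unrecoverable", "steps": []}
```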
Data Coupling Considerations ¶
Files and database records are coupled: a file is only meaningful together with its metadata in the database. How closely a restored database matches the restored S3 state depends on the recovery tier:
- 0-35 days: Exact consistency (PITR restores the database to the same timestamp as the chosen S3 version)
- 35-180 days: The database snapshot can differ from the S3 state by up to 6 hours. Within that window, files may be orphaned or database references may point to missing files, but all data remains recoverable
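To make the 6-hour window concrete, the sketch below picks the latest snapshot at or before a desired restore time and reports the gap between the two. The function name and the sample snapshot times are hypothetical.

```python
from datetime import datetime, timedelta


def nearest_snapshot_before(target: datetime,
                            snapshots: list[datetime]) -> tuple[datetime, timedelta]:
    """Return the latest snapshot at or before `target` and the consistency gap."""
    candidates = [s for s in snapshots if s <= target]
    if not candidates:
        raise ValueError("no snapshot exists at or before the target time")
    snapshot = max(candidates)
    return snapshot, target - snapshot


if __name__ == "__main__":
    day = datetime(2024, 3, 1)
    six_hourly = [day + timedelta(hours=h) for h in (0, 6, 12, 18)]
    snap, gap = nearest_snapshot_before(datetime(2024, 3, 1, 14, 30), six_hourly)
    print(snap, gap)  # 2024-03-01 12:00:00 2:30:00
```

Database changes made during the reported gap are the ones that may not line up with the restored S3 state.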
Security & Protection ¶
IAM Access Control ¶
- Application Role: Read/write access to source bucket only
  - Allowed: s3:PutObject, s3:GetObject, s3:DeleteObject (creates delete markers)
  - Denied: s3:DeleteObjectVersion, s3:BypassGovernanceRetention
  - Denied: All access to replica bucket
- Replica Bucket: Accessible only by S3 Replication Service and authorized administrators
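The application role's permissions could be written as an IAM policy document roughly as follows. This is a sketch: the bucket names and statement IDs are hypothetical, and a real deployment may need further statements (for example for listing or KMS) that this specification does not describe.

```python
import json


def application_role_policy(source_bucket: str, replica_buckets: list[str]) -> dict:
    """Build the application role policy: read/write on source, nothing on replicas."""
    replica_resources = []
    for bucket in replica_buckets:
        replica_resources += [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowSourceObjectReadWrite",
                "Effect": "Allow",
                # On a versioned bucket, DeleteObject without a version ID
                # only adds a delete marker.
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{source_bucket}/*",
            },
            {
                "Sid": "DenyVersionDeletionAndLockBypass",
                "Effect": "Deny",
                "Action": ["s3:DeleteObjectVersion", "s3:BypassGovernanceRetention"],
                "Resource": "*",
            },
            {
                "Sid": "DenyAllReplicaAccess",
                "Effect": "Deny",
                "Action": "s3:*",
                "Resource": replica_resources,
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(application_role_policy("app-source", ["media-replica"]), indent=2))
```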
Object Lock Protection (Replica Bucket) ¶
- Mode: GOVERNANCE
- Retention: 180 days
- Protection Against:
  - Ransomware/malware deletion attempts
  - Compromised credentials
  - Accidental admin deletion
  - Lifecycle misconfiguration
  - Insider threats
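The default retention rule and its effect on a single object version can be sketched as below. The configuration dictionary has the shape accepted by boto3's `put_object_lock_configuration`; the helper functions are hypothetical and assume the default retention is counted from the moment the version is written to the replica bucket.

```python
from datetime import datetime, timedelta, timezone

OBJECT_LOCK_CONFIGURATION = {
    "ObjectLockEnabled": "Enabled",
    "Rule": {"DefaultRetention": {"Mode": "GOVERNANCE", "Days": 180}},
}


def retain_until(created: datetime) -> datetime:
    """Date until which a version written at `created` is locked by default."""
    days = OBJECT_LOCK_CONFIGURATION["Rule"]["DefaultRetention"]["Days"]
    return created + timedelta(days=days)


def is_locked(created: datetime, now: datetime) -> bool:
    """True while the version cannot be deleted without a governance bypass."""
    return now < retain_until(created)


if __name__ == "__main__":
    written = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(retain_until(written))  # 2024-06-29 00:00:00+00:00
```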
Versioning Protection (Both Buckets) ¶
- Prevents permanent data loss from accidental overwrites
- Maintains version history per retention policy
- Delete operations create markers, preserving underlying versions
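To show how a delete marker hides, but does not destroy, the underlying data, the sketch below inspects a listing shaped like the response of boto3's `list_object_versions` and reports which version would be restored. The function name and the sample data are hypothetical.

```python
from datetime import datetime


def restore_plan(listing: dict, key: str) -> dict:
    """Find the current delete marker (if any) and the newest data version of `key`."""
    versions = [v for v in listing.get("Versions", []) if v["Key"] == key]
    markers = [m for m in listing.get("DeleteMarkers", []) if m["Key"] == key]
    current_marker = next((m for m in markers if m["IsLatest"]), None)
    newest_data = max(versions, key=lambda v: v["LastModified"], default=None)
    return {
        "deleted": current_marker is not None,
        # Removing this marker by version ID makes the newest data version current.
        "delete_marker_id": current_marker["VersionId"] if current_marker else None,
        "restore_version_id": newest_data["VersionId"] if newest_data else None,
    }


if __name__ == "__main__":
    listing = {
        "Versions": [
            {"Key": "a.txt", "VersionId": "v1", "IsLatest": False,
             "LastModified": datetime(2024, 1, 1)},
            {"Key": "a.txt", "VersionId": "v2", "IsLatest": False,
             "LastModified": datetime(2024, 1, 5)},
        ],
        "DeleteMarkers": [
            {"Key": "a.txt", "VersionId": "dm1", "IsLatest": True,
             "LastModified": datetime(2024, 1, 9)},
        ],
    }
    print(restore_plan(listing, "a.txt"))
```

Removing the marker requires deleting a specific version ID, which the application role is denied, so this step would fall to an administrator.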
Monitoring & Alerting ¶
Notification Channels ¶
- Email: SNS topic subscriptions for immediate alerts
- Slack/Teams: Webhook integration for team visibility
CloudWatch Alarms ¶
AWS Backup Monitoring:
- Backup job failures (FAILED, ABORTED, EXPIRED states)
- Monitored in both primary (eu-west-1) and DR (eu-west-3) regions
- Alert method: EventBridge → SNS
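An EventBridge rule for these job states might use an event pattern like the one below. The `matches` helper is a simplified local stand-in for EventBridge's matching (exact values only), included so the pattern can be checked without AWS access; it is not how EventBridge itself is implemented.

```python
BACKUP_FAILURE_PATTERN = {
    "source": ["aws.backup"],
    "detail-type": ["Backup Job State Change"],
    "detail": {"state": ["FAILED", "ABORTED", "EXPIRED"]},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: every pattern key must be present with an allowed value."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


if __name__ == "__main__":
    event = {
        "source": "aws.backup",
        "detail-type": "Backup Job State Change",
        "detail": {"state": "FAILED"},
    }
    print(matches(BACKUP_FAILURE_PATTERN, event))  # True
```

Because the jobs run in both regions, a rule like this would need to exist in eu-west-1 and in eu-west-3.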
S3 Replication Monitoring:
- Replication lag exceeding 15 minutes (both eu-west-3 and eu-north-1)
- Replication operation failures
- Alert method: CloudWatch Alarms → SNS
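The replication lag alarm could be defined with arguments like these, which follow the parameters of boto3's CloudWatch `put_metric_alarm`. The bucket names, rule ID, alarm name, and the 5-minute period are hypothetical choices. `ReplicationLatency` is reported in seconds and is only published when replication metrics are enabled on the replication rule.

```python
def replication_lag_alarm(source_bucket: str, replica_bucket: str, rule_id: str,
                          sns_topic_arn: str, max_lag_minutes: int = 15) -> dict:
    """Build alarm arguments that fire when replication lag exceeds the limit."""
    return {
        "AlarmName": f"s3-replication-lag-{replica_bucket}",  # hypothetical name
        "Namespace": "AWS/S3",
        "MetricName": "ReplicationLatency",
        "Dimensions": [
            {"Name": "SourceBucket", "Value": source_bucket},
            {"Name": "DestinationBucket", "Value": replica_bucket},
            {"Name": "RuleId", "Value": rule_id},
        ],
        "Statistic": "Maximum",
        "Period": 300,            # evaluate in 5-minute windows (assumption)
        "EvaluationPeriods": 1,
        "Threshold": max_lag_minutes * 60,  # metric unit is seconds
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }
```

One alarm per destination would cover both eu-west-3 and eu-north-1.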
Lifecycle Policy Monitoring:
- Unexpected object count drops in replica buckets
- Alert method: CloudWatch Alarms → SNS
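What counts as an "unexpected" drop is not defined above. As one possible interpretation, the sketch below flags any fall between consecutive daily object counts that exceeds a fraction of the previous value; the 5% threshold and the function name are assumptions for illustration only.

```python
def object_count_dropped(daily_counts: list[int],
                         max_drop_fraction: float = 0.05) -> bool:
    """True if any day-over-day decrease exceeds `max_drop_fraction` of the prior count."""
    for previous, current in zip(daily_counts, daily_counts[1:]):
        if previous > 0 and (previous - current) / previous > max_drop_fraction:
            return True
    return False


if __name__ == "__main__":
    print(object_count_dropped([1000, 1010, 1005, 700]))  # True
```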