AWS Backup Implementation Guide ¶
Based on BACKUP-CONFIGURATION.md Specification ¶
Account: 828879644785
CLI Profile: mnemonica
Date: 2025-10-31
This guide provides step-by-step instructions to implement a comprehensive backup strategy for S3 buckets and RDS PostgreSQL databases with dual-tier recovery capabilities and cross-region disaster recovery.
Note: This guide includes Object Lock configuration sections marked as [DEFERRABLE]. These can be implemented later as they require creating a new bucket. All other sections can be implemented immediately on existing infrastructure.
Table of Contents ¶
- Prerequisites
- Pre-Implementation Checklist
- Step 0: Backup Current Configurations
- Current Infrastructure Status
- Phase 1: S3 Source Bucket Configuration
- Phase 2: S3 Replica Bucket Configuration
- Phase 3: S3 Cross-Region Replication
- Phase 4: IAM Security Configuration
- Phase 5: RDS Operational Backup Configuration
- Phase 6: AWS Backup for RDS
- Phase 7: Validation and Testing
- Phase 8: Monitoring and Alerting
- Recovery Procedures
- Summary Checklist
- [DEFERRABLE] Object Lock Implementation
Prerequisites ¶
Production Environment ¶
- AWS Account: 828879644785
- Media Source Bucket: mne-media-prod (eu-west-1) - Already exists
- Media Replica Bucket: mne-media-replica (eu-west-3) - Already exists
- Vault Source Bucket: mne-vault-prod (eu-west-1) - Already exists
- Vault Replica Bucket: mne-vault-replica (eu-north-1) - Already exists
- RDS Instance: eks-mnemonica-prod (eu-west-1) - Already exists
- CLI Profile: mnemonica
Current Metrics ¶
- Media Source Bucket (mne-media-prod): ~155 TB
- Standard Storage: 711 GB
- Intelligent-Tiering (Frequent): 21.7 TB
- Intelligent-Tiering (Infrequent): 32.3 TB
- Intelligent-Tiering (Archive Instant): 100.2 TB
- Media Replica Bucket (mne-media-replica): ~92.3 TB (Deep Archive)
- Vault Source Bucket (mne-vault-prod): ~41.1 TB (57,551 objects)
- Intelligent-Tiering storage class
- Current: 180-day noncurrent version retention (needs change to 35 days)
- Vault Replica Bucket (mne-vault-replica): ~41.1 TB (57,551 objects)
- Deep Archive storage class (Stockholm region)
- No lifecycle policy currently configured (needs 180-day noncurrent retention)
- RDS Instance: PostgreSQL 16.8, db.m7g.xlarge, 100GB
Required Permissions ¶
Your AWS user/role must have permissions to:
- Modify S3 bucket lifecycle configurations
- Manage IAM roles and policies
- Configure RDS backups
- Create and manage AWS Backup plans
AWS CLI Configuration ¶
# Verify AWS CLI is configured with mnemonica profile
aws sts get-caller-identity --no-cli-pager --profile mnemonica
# Expected output: Account "828879644785"
Pre-Implementation Checklist ¶
- Review all configuration files in the configs/ directory
- Ensure you have AWS CLI configured with the mnemonica profile
- Verify you have necessary permissions (S3, RDS, Backup, IAM)
- Backup current configurations (commands provided below)
- Schedule maintenance window if needed
- Notify team of changes
Step 0: Backup Current Configurations ¶
Before making any changes, save current configurations for rollback:
# Create backups directory if it doesn't exist
mkdir -p backups
# Backup media bucket lifecycles
aws s3api get-bucket-lifecycle-configuration \
--bucket mne-media-prod \
--region eu-west-1 \
--no-cli-pager --profile mnemonica > backups/source-bucket-lifecycle-backup.json
aws s3api get-bucket-lifecycle-configuration \
--bucket mne-media-replica \
--region eu-west-3 \
--no-cli-pager --profile mnemonica > backups/replica-bucket-lifecycle-backup.json
# Backup vault bucket lifecycles
aws s3api get-bucket-lifecycle-configuration \
--bucket mne-vault-prod \
--region eu-west-1 \
--no-cli-pager --profile mnemonica > backups/vault-source-bucket-lifecycle-backup.json
# Note: vault-replica has no lifecycle policy currently, skip backup
echo "No lifecycle policy configured for mne-vault-replica (will be configured in Phase 2B)" > backups/vault-replica-bucket-lifecycle-backup.json
# Backup current AWS Backup plan
aws backup get-backup-plan \
--backup-plan-id e88b2f03-25d3-4bc0-a585-f6994e54cdaa \
--region eu-west-1 \
--no-cli-pager --profile mnemonica > backups/aws-backup-plan-backup.json
Current Infrastructure Status ¶
What's Already Working ✅ ¶
Media Buckets:
- S3 versioning enabled on both media buckets
- Cross-region replication active (mne-media-prod → mne-media-replica, eu-west-3 Paris)
- Source bucket: Intelligent-Tiering transitions in place (retention change handled in Phase 1)
- Replica bucket: Deep Archive storage in place (retention change handled in Phase 2)
Vault Buckets:
- S3 versioning enabled on both vault buckets
- Cross-region replication active (mne-vault-prod → mne-vault-replica, eu-north-1 Stockholm)
- Source bucket: Intelligent-Tiering transitions in place (retention change handled in Phase 2A)
- Replica bucket: Deep Archive storage, lifecycle policy still missing ⚠️ (handled in Phase 2B)
RDS:
- RDS automated backups (35-day retention, PITR enabled)
- AWS Backup plan exists (daily, 35-day retention)
- IAM roles for replication and backup services
What Needs Changes 🔧 ¶
Media Buckets:
- mne-media-prod noncurrent version retention (180d → 35d)
- mne-media-replica noncurrent version retention (15d → 180d)
Vault Buckets:
- mne-vault-prod noncurrent version retention (180d → 35d)
- mne-vault-replica needs lifecycle policy (180d noncurrent retention)
RDS:
- AWS Backup frequency (daily → 6-hour)
- AWS Backup retention (35d → 180d)
IAM:
- Application IAM role (needs creation with vault bucket access)
What's Deferred ⏸️ ¶
- S3 Object Lock on replica buckets (requires new buckets or risk acceptance)
Phase 1: S3 Source Bucket Configuration ¶
Bucket: mne-media-prod
Region: eu-west-1
Status: Already exists, needs lifecycle adjustment
Step 1.1: Review Current Configuration ¶
# Check current versioning
aws s3api get-bucket-versioning \
--bucket mne-media-prod \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
# Check current lifecycle rules
aws s3api get-bucket-lifecycle-configuration \
--bucket mne-media-prod \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
Current State:
- ✅ Versioning: Enabled
- ✅ Intelligent-Tiering: Day 0 transition
- ⚠️ Noncurrent version expiration: 180 days (needs to be 35 days)
Step 1.2: Update Lifecycle Configuration ¶
Goal: Change noncurrent version retention from 180 days to 35 days per plan specification.
Configuration file: configs/source-bucket-lifecycle-35d.json (already created)
Note: Current configuration was backed up in Step 0.
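The contents of this file are not reproduced in the guide; a plausible shape, consistent with the current day-0 Intelligent-Tiering transition and the 35-day target (the rule ID and empty filter are illustrative, so verify against the actual file):

```json
{
  "Rules": [
    {
      "ID": "intelligent-tiering-35d-noncurrent",
      "Status": "Enabled",
      "Filter": {},
      "Transitions": [
        { "Days": 0, "StorageClass": "INTELLIGENT_TIERING" }
      ],
      "NoncurrentVersionExpiration": { "NoncurrentDays": 35 }
    }
  ]
}
```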
# Apply new lifecycle configuration
aws s3api put-bucket-lifecycle-configuration \
--bucket mne-media-prod \
--lifecycle-configuration file://configs/source-bucket-lifecycle-35d.json \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
Step 1.3: Validate Changes ¶
# Verify new configuration
aws s3api get-bucket-lifecycle-configuration \
--bucket mne-media-prod \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
# Expected: NoncurrentVersionExpiration: { "NoncurrentDays": 35 }
Impact:
- Deleted file versions will now expire after 35 days (instead of 180)
- No immediate data loss (only affects future deletions)
- Operational recovery window: 0-35 days
Phase 2: S3 Replica Bucket Configuration ¶
Bucket: mne-media-replica
Region: eu-west-3
Status: Already exists, needs lifecycle adjustment
Step 2.1: Review Current Configuration ¶
# Check current versioning
aws s3api get-bucket-versioning \
--bucket mne-media-replica \
--region eu-west-3 \
--no-cli-pager --profile mnemonica
# Check current lifecycle rules
aws s3api get-bucket-lifecycle-configuration \
--bucket mne-media-replica \
--region eu-west-3 \
--no-cli-pager --profile mnemonica
# Check Object Lock status
aws s3api get-object-lock-configuration \
--bucket mne-media-replica \
--region eu-west-3 \
--no-cli-pager --profile mnemonica
Current State:
- ✅ Versioning: Enabled
- ✅ Storage Class: Deep Archive (via replication)
- ⚠️ Noncurrent version expiration: 15 days (needs to be 180 days)
- ❌ Object Lock: Not enabled (see the [DEFERRABLE] Object Lock Implementation section)
Step 2.2: Update Lifecycle Configuration ¶
Goal:
- Change noncurrent version retention from 15 days to 180 days
- Remove current version expiration (mirror source bucket)
- Align with Deep Archive minimum storage duration (180 days)
Design Principle: Replica bucket should mirror the source bucket. Current versions stay as long as they exist in source. Only noncurrent versions (deleted/replaced) expire after 180 days.
Configuration file: configs/replica-bucket-lifecycle-180d-mirror.json (already created)
Note: Current configuration was backed up in Step 0.
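The contents of this file are not reproduced here; a plausible shape, following the mirror design principle above (no Expiration element on current versions, only a 180-day noncurrent expiration; the rule ID is illustrative, so verify against the actual file):

```json
{
  "Rules": [
    {
      "ID": "mirror-source-180d-noncurrent",
      "Status": "Enabled",
      "Filter": {},
      "NoncurrentVersionExpiration": { "NoncurrentDays": 180 }
    }
  ]
}
```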
# Apply new lifecycle configuration
aws s3api put-bucket-lifecycle-configuration \
--bucket mne-media-replica \
--lifecycle-configuration file://configs/replica-bucket-lifecycle-180d-mirror.json \
--region eu-west-3 \
--no-cli-pager --profile mnemonica
Step 2.3: Validate Changes ¶
# Verify new configuration
aws s3api get-bucket-lifecycle-configuration \
--bucket mne-media-replica \
--region eu-west-3 \
--no-cli-pager --profile mnemonica
# Expected:
# - No expiration on current versions
# - NoncurrentVersionExpiration: { "NoncurrentDays": 180 }
Impact:
- Extends noncurrent version retention from 15d to 180d
- Aligns with Deep Archive billing (no early deletion charges)
- Disaster recovery window: 0-180 days
- Current versions stay indefinitely (mirrors source)
Phase 2A: S3 Vault Source Bucket Configuration ¶
Bucket: mne-vault-prod
Region: eu-west-1
Status: Already exists, needs lifecycle adjustment
Step 2A.1: Review Current Configuration ¶
# Check current versioning
aws s3api get-bucket-versioning \
--bucket mne-vault-prod \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
# Check current lifecycle rules
aws s3api get-bucket-lifecycle-configuration \
--bucket mne-vault-prod \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
# Check replication configuration
aws s3api get-bucket-replication \
--bucket mne-vault-prod \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
Current State:
- ✅ Versioning: Enabled
- ✅ Intelligent-Tiering: Day 0 transition
- ⚠️ Noncurrent version expiration: 180 days (needs to be 35 days)
- ✅ Cross-region replication: Active to mne-vault-replica (eu-north-1)
- ✅ Replication storage class: DEEP_ARCHIVE
- ✅ Delete marker replication: Enabled
- ✅ Encryption: AES256 with bucket key enabled
Step 2A.2: Update Lifecycle Configuration ¶
Goal: Change noncurrent version retention from 180 days to 35 days to match media-prod retention strategy.
Configuration file: configs/vault-source-bucket-lifecycle-35d.json (already created)
Note: Current configuration was backed up in Step 0.
# Apply new lifecycle configuration
aws s3api put-bucket-lifecycle-configuration \
--bucket mne-vault-prod \
--lifecycle-configuration file://configs/vault-source-bucket-lifecycle-35d.json \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
Step 2A.3: Validate Changes ¶
# Verify new configuration
aws s3api get-bucket-lifecycle-configuration \
--bucket mne-vault-prod \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
# Expected: NoncurrentVersionExpiration: { "NoncurrentDays": 35 }
Impact:
- Deleted file versions will now expire after 35 days (instead of 180)
- Matches media-prod bucket retention strategy
- No immediate data loss (only affects future deletions)
- Operational recovery window: 0-35 days
Phase 2B: S3 Vault Replica Bucket Configuration ¶
Bucket: mne-vault-replica
Region: eu-north-1 (Stockholm)
Status: Needs lifecycle policy ⚠️
Step 2B.1: Review Current Configuration ¶
# Check current versioning
aws s3api get-bucket-versioning \
--bucket mne-vault-replica \
--region eu-north-1 \
--no-cli-pager --profile mnemonica
# Check current lifecycle rules (currently none)
aws s3api get-bucket-lifecycle-configuration \
--bucket mne-vault-replica \
--region eu-north-1 \
--no-cli-pager --profile mnemonica
Current State:
- ✅ Versioning: Enabled
- ✅ Storage Class: Deep Archive (via replication)
- ✅ Encryption: AES256 with bucket key enabled
- ❌ Lifecycle policy: NOT configured (NoSuchLifecycleConfiguration)
Step 2B.2: Apply Lifecycle Configuration ¶
Goal: Configure 180-day noncurrent version retention to match source bucket and align with Deep Archive minimum storage duration (180 days).
Design Principle: Replica bucket should mirror the source bucket. Current versions stay as long as they exist in source. Only noncurrent versions (deleted/replaced) expire after 180 days.
Configuration file: configs/vault-replica-bucket-lifecycle-180d-mirror.json (already created)
# Apply lifecycle configuration
aws s3api put-bucket-lifecycle-configuration \
--bucket mne-vault-replica \
--lifecycle-configuration file://configs/vault-replica-bucket-lifecycle-180d-mirror.json \
--region eu-north-1 \
--no-cli-pager --profile mnemonica
Step 2B.3: Validate Changes ¶
# Verify new configuration
aws s3api get-bucket-lifecycle-configuration \
--bucket mne-vault-replica \
--region eu-north-1 \
--no-cli-pager --profile mnemonica
# Expected:
# - No expiration on current versions
# - NoncurrentVersionExpiration: { "NoncurrentDays": 180 }
# - AbortIncompleteMultipartUpload: 180 days
Impact:
- Adds noncurrent version retention of 180 days
- Aligns with Deep Archive billing (no early deletion charges)
- Disaster recovery window: 0-180 days
- Current versions stay indefinitely (mirrors source)
Phase 3: S3 Cross-Region Replication ¶
Status: Already configured and working ✅
Step 3.1: Verify Media Bucket Replication ¶
# Check media bucket replication configuration
aws s3api get-bucket-replication \
--bucket mne-media-prod \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
Media Bucket Replication:
- ✅ Source: mne-media-prod (eu-west-1)
- ✅ Destination: mne-media-replica (eu-west-3, Paris)
- ✅ Storage Class: DEEP_ARCHIVE
- ✅ Delete Marker Replication: Enabled
- ✅ IAM Role: s3crr_role_for_mne-media-prod_to_crr-media-prod
No changes needed - Media bucket replication is working correctly.
Step 3.2: Verify Vault Bucket Replication ¶
# Check vault bucket replication configuration
aws s3api get-bucket-replication \
--bucket mne-vault-prod \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
Vault Bucket Replication:
- ✅ Source: mne-vault-prod (eu-west-1)
- ✅ Destination: mne-vault-replica (eu-north-1, Stockholm)
- ✅ Storage Class: DEEP_ARCHIVE
- ✅ Delete Marker Replication: Enabled
- ✅ IAM Role: s3crr_role_for_mne-vault-prod_to_mne-vault-replica
- ✅ Replication Metrics: Enabled
No changes needed - Vault bucket replication is working correctly.
Step 3.3: Test Replication (Optional) ¶
Test media bucket replication:
# Upload a test file to media bucket
TEST_FILE="test-media-replication-$(date +%s).txt"
echo "Test media replication - $(date)" > "$TEST_FILE"
aws s3 cp "$TEST_FILE" s3://mne-media-prod/ --region eu-west-1 --no-cli-pager --profile mnemonica
# Wait 2-5 minutes, then check replica
aws s3 ls s3://mne-media-replica/ --region eu-west-3 --no-cli-pager --profile mnemonica | grep test-media-replication
# Clean up (aws s3 rm does not expand wildcards, so use the exact key)
aws s3 rm "s3://mne-media-prod/$TEST_FILE" --region eu-west-1 --no-cli-pager --profile mnemonica
rm -f "$TEST_FILE"
Test vault bucket replication:
# Upload a test file to vault bucket
TEST_FILE="test-vault-replication-$(date +%s).txt"
echo "Test vault replication - $(date)" > "$TEST_FILE"
aws s3 cp "$TEST_FILE" s3://mne-vault-prod/ --region eu-west-1 --no-cli-pager --profile mnemonica
# Wait 2-5 minutes, then check replica
aws s3 ls s3://mne-vault-replica/ --region eu-north-1 --no-cli-pager --profile mnemonica | grep test-vault-replication
# Clean up (aws s3 rm does not expand wildcards, so use the exact key)
aws s3 rm "s3://mne-vault-prod/$TEST_FILE" --region eu-west-1 --no-cli-pager --profile mnemonica
rm -f "$TEST_FILE"
Phase 4: IAM Security Configuration ¶
Step 4.1: Verify Existing IAM Roles ¶
# Check replication role
aws iam get-role \
--role-name s3crr_role_for_mne-media-prod_to_crr-media-prod \
--no-cli-pager --profile mnemonica
# Check AWS Backup service role
aws iam get-role \
--role-name AWSBackupDefaultServiceRole \
--no-cli-pager --profile mnemonica
Status: ✅ Both roles exist and are working
Step 4.2: Create Application IAM Role ¶
Purpose: Restrict application access to source buckets only (media and vault), deny version deletion and replica bucket access.
Configuration files:
- configs/application-iam-role-trust-policy.json
- configs/application-s3-policy.json (updated to include vault buckets)
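The S3 policy file is not reproduced here; a plausible shape, matching the allow/deny restrictions listed in Step 4.3 (statement IDs are illustrative, so verify against the actual file):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowSourceBucketAccess",
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::mne-media-prod",
        "arn:aws:s3:::mne-media-prod/*",
        "arn:aws:s3:::mne-vault-prod",
        "arn:aws:s3:::mne-vault-prod/*"
      ]
    },
    {
      "Sid": "DenyVersionDeletion",
      "Effect": "Deny",
      "Action": ["s3:DeleteObjectVersion", "s3:BypassGovernanceRetention"],
      "Resource": [
        "arn:aws:s3:::mne-media-prod/*",
        "arn:aws:s3:::mne-vault-prod/*"
      ]
    },
    {
      "Sid": "DenyReplicaAccess",
      "Effect": "Deny",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::mne-media-replica",
        "arn:aws:s3:::mne-media-replica/*",
        "arn:aws:s3:::mne-vault-replica",
        "arn:aws:s3:::mne-vault-replica/*"
      ]
    }
  ]
}
```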
# Create the role
aws iam create-role \
--role-name eks-mnemonica-prod-s3-role \
--assume-role-policy-document file://configs/application-iam-role-trust-policy.json \
--description "Restricted S3 access for eks-mnemonica-prod application" \
--no-cli-pager --profile mnemonica
# Attach the policy
aws iam put-role-policy \
--role-name eks-mnemonica-prod-s3-role \
--policy-name eks-mnemonica-prod-s3-access \
--policy-document file://configs/application-s3-policy.json \
--no-cli-pager --profile mnemonica
Step 4.3: Validate Application Role ¶
# Verify role exists
aws iam get-role \
--role-name eks-mnemonica-prod-s3-role \
--no-cli-pager --profile mnemonica
# Get role ARN for application configuration
aws iam get-role \
--role-name eks-mnemonica-prod-s3-role \
--query 'Role.Arn' \
--output text \
--no-cli-pager --profile mnemonica
# Expected: arn:aws:iam::828879644785:role/eks-mnemonica-prod-s3-role
Policy Restrictions:
- ✅ Allows: PutObject, GetObject, DeleteObject, ListBucket on mne-media-prod and mne-vault-prod
- ❌ Denies: DeleteObjectVersion (prevents permanent version deletion from both source buckets)
- ❌ Denies: All access to replica buckets (mne-media-replica, mne-vault-replica)
- ❌ Denies: BypassGovernanceRetention
Phase 5: RDS Operational Backup Configuration ¶
Instance: eks-mnemonica-prod
Engine: PostgreSQL 16.8
Region: eu-west-1
Step 5.1: Verify Current RDS Backup Configuration ¶
# Check current backup settings
aws rds describe-db-instances \
--db-instance-identifier eks-mnemonica-prod \
--region eu-west-1 \
--no-cli-pager --profile mnemonica \
--query 'DBInstances[0].{BackupRetentionPeriod:BackupRetentionPeriod,LatestRestorableTime:LatestRestorableTime,PreferredBackupWindow:PreferredBackupWindow}'
Current Configuration:
- ✅ Backup Retention: 35 days (matches plan)
- ✅ PITR Enabled: Yes
- ✅ Latest Restorable Time: Within minutes (active)
- ✅ Backup Window: 22:00-00:00 UTC
No changes needed - RDS automated backups are correctly configured.
Step 5.2: Validate PITR Capability ¶
# Verify point-in-time recovery is available
aws rds describe-db-instances \
--db-instance-identifier eks-mnemonica-prod \
--region eu-west-1 \
--no-cli-pager --profile mnemonica \
--query 'DBInstances[0].[LatestRestorableTime,BackupRetentionPeriod]'
# Expected: Recent timestamp and 35 days
Capabilities:
- RTO: 30-60 minutes
- RPO: <1 minute (second-level granularity)
- Recovery Window: 0-35 days
Phase 6: AWS Backup for RDS ¶
Purpose: Long-term disaster recovery snapshots with 6-hour frequency and 180-day retention
Architecture Note (Updated 2025-11-22):
- PITR: Handled by RDS native automated backups (NOT AWS Backup)
- AWS Backup: Only manages 6-hour snapshots for extended retention and cross-region DR
Step 6.1: Review Current AWS Backup Plan ¶
# List current backup plans
aws backup list-backup-plans \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
# Get details of existing plan
aws backup get-backup-plan \
--backup-plan-id 16cae9bf-5a0c-4ac7-8fb6-6f9ae2eec630 \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
Target Configuration:
- Plan Name: eks-mnemonica-6h-snapshots-180d
- Single Rule: 6-hour snapshots - 180 days retention with DR copy to eu-west-3
- Schedule: Every 6 hours (cron: 0 */6 * * ? *)
- Recovery coverage: Days 0-180 (discrete 6-hour snapshots)
- PITR coverage: Days 0-35 (via RDS native automated backups)
Step 6.2: Update Backup Plan Configuration ¶
Configuration File: configs/aws-backup-plan-6h-snapshots.json
This configuration contains only the 6-hour snapshot rule (PITR rule has been removed).
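The plan file is not reproduced here; a plausible shape, matching the expected values listed in Step 6.3 (the destination vault ARN in eu-west-3 is an assumption, so verify it against the actual file and your DR vault name):

```json
{
  "BackupPlanName": "eks-mnemonica-6h-snapshots-180d",
  "Rules": [
    {
      "RuleName": "6-Hour-Snapshots-180-Day-Retention-DR-Copy",
      "TargetBackupVaultName": "eks-mnemonica-prod-vault",
      "ScheduleExpression": "cron(0 */6 * * ? *)",
      "EnableContinuousBackup": false,
      "Lifecycle": { "DeleteAfterDays": 180 },
      "CopyActions": [
        {
          "DestinationBackupVaultArn": "arn:aws:backup:eu-west-3:828879644785:backup-vault:eks-mnemonica-prod-vault",
          "Lifecycle": { "DeleteAfterDays": 180 }
        }
      ]
    }
  ]
}
```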
# Update the backup plan to use the snapshot-only configuration
aws backup update-backup-plan \
--backup-plan-id 16cae9bf-5a0c-4ac7-8fb6-6f9ae2eec630 \
--backup-plan file://configs/aws-backup-plan-6h-snapshots.json \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
Expected Result:
- Plan updated to contain only one rule: 6-Hour-Snapshots-180-Day-Retention-DR-Copy
- No PITR recovery points will be created by AWS Backup
- PITR is handled by RDS native automated backups (Step 5.2)
Step 6.3: Validate Backup Plan ¶
# Verify backup plan configuration
aws backup get-backup-plan \
--backup-plan-id 16cae9bf-5a0c-4ac7-8fb6-6f9ae2eec630 \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
# Expected:
# - BackupPlanName: "eks-mnemonica-6h-snapshots-180d"
# - Single rule: 6-Hour-Snapshots-180-Day-Retention-DR-Copy
# - ScheduleExpression: "cron(0 */6 * * ? *)"
# - EnableContinuousBackup: false
# - Lifecycle DeleteAfterDays: 180
# - CopyActions to eu-west-3 with 180-day retention
Step 6.4: Monitor Backup Jobs ¶
# Wait for next backup (within 6 hours), then check
aws backup list-backup-jobs \
--by-resource-arn arn:aws:rds:eu-west-1:828879644785:db:eks-mnemonica-prod \
--region eu-west-1 \
--no-cli-pager --profile mnemonica \
--max-results 5
# Expected:
# - State: COMPLETED
# - CreatedBy.BackupRuleName: "6-Hour-Snapshots-180-Day-Retention-DR-Copy"
# - BackupVaultName: "eks-mnemonica-prod-vault"
# Check DR region copy
aws backup list-backup-jobs \
--by-resource-arn arn:aws:rds:eu-west-1:828879644785:db:eks-mnemonica-prod \
--region eu-west-3 \
--no-cli-pager --profile mnemonica \
--max-results 5
Backup Schedule:
- Frequency: Every 6 hours at 00:00, 06:00, 12:00, 18:00 UTC (AWS Backup evaluates cron expressions in UTC)
- Retention: 180 days (snapshots), 35 days (PITR via RDS native)
- RTO: Days 0-35: 30-60 min (PITR), Days 36-180: 2-6 hours (snapshots)
- RPO: Days 0-35: <1 minute (PITR), Days 36-180: Up to 6 hours (snapshots)
Phase 7: Validation and Testing ¶
Step 7.1: Validate S3 Configuration ¶
# Source bucket validation
aws s3api get-bucket-versioning \
--bucket mne-media-prod \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
aws s3api get-bucket-lifecycle-configuration \
--bucket mne-media-prod \
--region eu-west-1 \
--no-cli-pager --profile mnemonica \
| jq '.Rules[] | {ID, NoncurrentVersionExpiration}'
# Replica bucket validation
aws s3api get-bucket-versioning \
--bucket mne-media-replica \
--region eu-west-3 \
--no-cli-pager --profile mnemonica
aws s3api get-bucket-lifecycle-configuration \
--bucket mne-media-replica \
--region eu-west-3 \
--no-cli-pager --profile mnemonica \
| jq '.Rules[] | {ID, NoncurrentVersionExpiration}'
Expected Results:
- Source: Versioning enabled, 35-day noncurrent retention
- Replica: Versioning enabled, 180-day noncurrent retention, no current expiration
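Before applying a lifecycle file, it can be sanity-checked locally with jq (which the validation commands above already assume is installed). A minimal sketch; the file path and rule contents are illustrative, so substitute your actual config file:

```shell
# Write an example lifecycle document (stand-in for a real configs/*.json file)
cat > /tmp/lifecycle-check.json <<'EOF'
{
  "Rules": [
    {
      "ID": "noncurrent-35d",
      "Status": "Enabled",
      "Filter": {},
      "NoncurrentVersionExpiration": { "NoncurrentDays": 35 }
    }
  ]
}
EOF
# Fail fast if the JSON is invalid or the retention value is not the expected one
days=$(jq -r '.Rules[0].NoncurrentVersionExpiration.NoncurrentDays' /tmp/lifecycle-check.json) || exit 1
if [ "$days" != "35" ]; then
  echo "Unexpected NoncurrentDays: $days" >&2
  exit 1
fi
echo "Lifecycle check passed: NoncurrentDays=$days"
```

Running this before `put-bucket-lifecycle-configuration` catches malformed JSON and copy-paste mistakes without touching the bucket.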
Step 7.2: Test File Deletion Recovery ¶
# 1. Upload test file (keep the exact key in a variable; several later commands need it)
TEST_KEY="test-recovery-$(date +%s).txt"
echo "Test file - $(date)" > "$TEST_KEY"
aws s3 cp "$TEST_KEY" s3://mne-media-prod/ \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
# 2. Wait for replication (5 minutes)
sleep 300
# 3. Verify replicated
aws s3 ls s3://mne-media-replica/ \
--region eu-west-3 \
--no-cli-pager --profile mnemonica | grep test-recovery
# 4. Delete from source (creates delete marker)
aws s3 rm "s3://mne-media-prod/$TEST_KEY" \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
# 5. Verify deleted
aws s3 ls s3://mne-media-prod/ \
--region eu-west-1 \
--no-cli-pager --profile mnemonica | grep test-recovery
# (Should not appear)
# 6. List versions to see delete marker
aws s3api list-object-versions \
--bucket mne-media-prod \
--prefix "$TEST_KEY" \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
# 7. Remove delete marker to restore (--key must be the exact object key, not a wildcard)
DELETE_MARKER_ID=$(aws s3api list-object-versions \
--bucket mne-media-prod \
--prefix "$TEST_KEY" \
--region eu-west-1 \
--no-cli-pager --profile mnemonica \
--query 'DeleteMarkers[0].VersionId' \
--output text)
aws s3api delete-object \
--bucket mne-media-prod \
--key "$TEST_KEY" \
--version-id "$DELETE_MARKER_ID" \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
# 8. Verify restored
aws s3 ls s3://mne-media-prod/ \
--region eu-west-1 \
--no-cli-pager --profile mnemonica | grep test-recovery
# (Should appear again)
# 9. Clean up
aws s3 rm "s3://mne-media-prod/$TEST_KEY" \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
rm -f "$TEST_KEY"
Step 7.3: Validate RDS Backup Configuration ¶
# Check RDS automated backup
aws rds describe-db-instances \
--db-instance-identifier eks-mnemonica-prod \
--region eu-west-1 \
--no-cli-pager --profile mnemonica \
| jq '.DBInstances[0] | {BackupRetentionPeriod, LatestRestorableTime, PreferredBackupWindow}'
# Check AWS Backup jobs
aws backup list-backup-jobs \
--by-resource-arn arn:aws:rds:eu-west-1:828879644785:db:eks-mnemonica-prod \
--region eu-west-1 \
--no-cli-pager --profile mnemonica \
--max-results 10
# Verify cross-region copies
aws backup list-recovery-points-by-backup-vault \
--backup-vault-name Default \
--region eu-west-3 \
--no-cli-pager --profile mnemonica \
--max-results 5
Phase 8: Monitoring and Alerting ¶
Objective: Set up essential monitoring for backup failures, replication issues, and lifecycle policy problems.
Estimated Time: 15 minutes
Cost Impact: ~$1.70/month (SNS + CloudWatch Alarms)
Architecture Note: The SNS topic and the replication alarms live in eu-west-1 because S3 replication metrics (ReplicationLatency, OperationsFailedReplication) are published in the source bucket's region. Bucket storage metrics such as NumberOfObjects are published in each bucket's own region, and a CloudWatch alarm must be created in the same region as the metric it watches, so the replica object-count alarms in Step 8.4 are created in the replica buckets' regions.
Step 8.1: Create SNS Topic for Alerts ¶
Create an SNS topic in eu-west-1 that will handle all backup and replication alerts.
# Create SNS topic
aws sns create-topic \
--name mnemonica-backup-alerts \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
Save the TopicArn from the output - you'll need it for subsequent steps.
Subscribe email address(es):
# Replace with your email address
aws sns subscribe \
--topic-arn arn:aws:sns:eu-west-1:828879644785:mnemonica-backup-alerts \
--protocol email \
--notification-endpoint your-email@example.com \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
# Confirm the subscription via the email you receive
Subscribe Slack/Teams webhook:
First, create an incoming webhook in your Slack/Teams workspace:
- Slack: Workspace Settings → Apps → Incoming Webhooks → Add to Slack
- Teams: Channel → Connectors → Incoming Webhook → Configure
Then subscribe the webhook to SNS:
# Replace with your webhook URL
aws sns subscribe \
--topic-arn arn:aws:sns:eu-west-1:828879644785:mnemonica-backup-alerts \
--protocol https \
--notification-endpoint https://hooks.slack.com/services/YOUR/WEBHOOK/URL \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
Note: For Slack/Teams integration, SNS sends raw JSON. You may want to add a Lambda function to format messages nicely (optional enhancement).
Step 8.2: AWS Backup Job Failure Alarms ¶
Create EventBridge rule to detect failed AWS Backup jobs.
# Create EventBridge rule for backup failures
aws events put-rule \
--name mnemonica-backup-job-failures \
--event-pattern file://configs/monitoring-backup-failure-pattern.json \
--state ENABLED \
--description "Alert on AWS Backup job failures" \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
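The event pattern file is not reproduced here; a plausible shape, matching AWS Backup's "Backup Job State Change" EventBridge events on failure states (verify the state list against the actual file):

```json
{
  "source": ["aws.backup"],
  "detail-type": ["Backup Job State Change"],
  "detail": {
    "state": ["FAILED", "ABORTED", "EXPIRED"]
  }
}
```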
# Add SNS topic as target
aws events put-targets \
--rule mnemonica-backup-job-failures \
--targets "Id"="1","Arn"="arn:aws:sns:eu-west-1:828879644785:mnemonica-backup-alerts" \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
# Grant EventBridge permission to publish to SNS
aws sns set-topic-attributes \
--topic-arn arn:aws:sns:eu-west-1:828879644785:mnemonica-backup-alerts \
--attribute-name Policy \
--attribute-value file://configs/monitoring-sns-policy.json \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
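The SNS policy file is not reproduced here; a plausible shape, granting EventBridge publish rights on the alerts topic (the statement ID is illustrative, so verify against the actual file):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowEventBridgePublish",
      "Effect": "Allow",
      "Principal": { "Service": "events.amazonaws.com" },
      "Action": "sns:Publish",
      "Resource": "arn:aws:sns:eu-west-1:828879644785:mnemonica-backup-alerts"
    }
  ]
}
```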
Repeat for DR region (eu-west-3):
# Create rule in DR region
aws events put-rule \
--name mnemonica-backup-job-failures-dr \
--event-pattern file://configs/monitoring-backup-failure-pattern.json \
--state ENABLED \
--description "Alert on AWS Backup job failures in DR region" \
--region eu-west-3 \
--no-cli-pager --profile mnemonica
# Create SNS topic in DR region
aws sns create-topic \
--name mnemonica-backup-alerts-dr \
--region eu-west-3 \
--no-cli-pager --profile mnemonica
# Subscribe same email to DR topic
aws sns subscribe \
--topic-arn arn:aws:sns:eu-west-3:828879644785:mnemonica-backup-alerts-dr \
--protocol email \
--notification-endpoint your-email@example.com \
--region eu-west-3 \
--no-cli-pager --profile mnemonica
# Add SNS as target
aws events put-targets \
--rule mnemonica-backup-job-failures-dr \
--targets "Id"="1","Arn"="arn:aws:sns:eu-west-3:828879644785:mnemonica-backup-alerts-dr" \
--region eu-west-3 \
--no-cli-pager --profile mnemonica
Step 8.3: S3 Replication Monitoring ¶
Monitor replication lag and failures for both media and vault buckets.
First, verify the replication rule IDs:
# Get media bucket replication rule ID
aws s3api get-bucket-replication \
--bucket mne-media-prod \
--region eu-west-1 \
--no-cli-pager --profile mnemonica \
--query 'ReplicationConfiguration.Rules[0].ID' \
--output text
# Get vault bucket replication rule ID
aws s3api get-bucket-replication \
--bucket mne-vault-prod \
--region eu-west-1 \
--no-cli-pager --profile mnemonica \
--query 'ReplicationConfiguration.Rules[0].ID' \
--output text
# Save the outputs and replace rule IDs in the commands below
Create replication monitoring alarms for media buckets:
# IMPORTANT: Replace "paris-replica" with your actual media bucket rule ID
# Media bucket: replication lag alarm
aws cloudwatch put-metric-alarm \
--alarm-name mnemonica-media-replication-lag \
--alarm-description "Alert when media bucket replication takes more than 15 minutes" \
--metric-name ReplicationLatency \
--namespace AWS/S3 \
--statistic Maximum \
--period 300 \
--evaluation-periods 2 \
--threshold 900 \
--comparison-operator GreaterThanThreshold \
--dimensions Name=SourceBucket,Value=mne-media-prod Name=DestinationBucket,Value=mne-media-replica Name=RuleId,Value=paris-replica \
--alarm-actions arn:aws:sns:eu-west-1:828879644785:mnemonica-backup-alerts \
--treat-missing-data notBreaching \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
# Media bucket: replication failures alarm
aws cloudwatch put-metric-alarm \
--alarm-name mnemonica-media-replication-failures \
--alarm-description "Alert when media bucket replication operations fail" \
--metric-name OperationsFailedReplication \
--namespace AWS/S3 \
--statistic Sum \
--period 300 \
--evaluation-periods 1 \
--threshold 1 \
--comparison-operator GreaterThanOrEqualToThreshold \
--dimensions Name=SourceBucket,Value=mne-media-prod Name=DestinationBucket,Value=mne-media-replica Name=RuleId,Value=paris-replica \
--alarm-actions arn:aws:sns:eu-west-1:828879644785:mnemonica-backup-alerts \
--treat-missing-data notBreaching \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
Create replication monitoring alarms for vault buckets:
# IMPORTANT: Replace "vault-replica-rule" with your actual vault bucket rule ID
# Vault bucket: replication lag alarm
aws cloudwatch put-metric-alarm \
--alarm-name mnemonica-vault-replication-lag \
--alarm-description "Alert when vault bucket replication takes more than 15 minutes" \
--metric-name ReplicationLatency \
--namespace AWS/S3 \
--statistic Maximum \
--period 300 \
--evaluation-periods 2 \
--threshold 900 \
--comparison-operator GreaterThanThreshold \
--dimensions Name=SourceBucket,Value=mne-vault-prod Name=DestinationBucket,Value=mne-vault-replica Name=RuleId,Value=vault-replica-rule \
--alarm-actions arn:aws:sns:eu-west-1:828879644785:mnemonica-backup-alerts \
--treat-missing-data notBreaching \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
# Vault bucket: replication failures alarm
aws cloudwatch put-metric-alarm \
--alarm-name mnemonica-vault-replication-failures \
--alarm-description "Alert when vault bucket replication operations fail" \
--metric-name OperationsFailedReplication \
--namespace AWS/S3 \
--statistic Sum \
--period 300 \
--evaluation-periods 1 \
--threshold 1 \
--comparison-operator GreaterThanOrEqualToThreshold \
--dimensions Name=SourceBucket,Value=mne-vault-prod Name=DestinationBucket,Value=mne-vault-replica Name=RuleId,Value=vault-replica-rule \
--alarm-actions arn:aws:sns:eu-west-1:828879644785:mnemonica-backup-alerts \
--treat-missing-data notBreaching \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
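As a sanity check on the latency alarms: CloudWatch reports the S3 ReplicationLatency metric in seconds, so a 15-minute replication target maps to a threshold of 900. A quick shell check:

```shell
# ReplicationLatency is reported in seconds, so a 15-minute
# replication SLA corresponds to a threshold of 15 * 60 = 900.
SLA_MINUTES=15
THRESHOLD=$((SLA_MINUTES * 60))
echo "$THRESHOLD"
```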
Step 8.4: Lifecycle Policy Monitoring ¶
Monitor for unexpected object deletions in replica buckets (potential lifecycle misconfiguration).
Note: These alarms are created in eu-west-1 (same region as SNS topic) but monitor buckets in other regions. CloudWatch supports cross-region monitoring for S3 metrics.
Media replica bucket (mne-media-replica in eu-west-3):
# Create anomaly detection alarm for media replica bucket object count
# Alarm is in eu-west-1, monitoring bucket in eu-west-3
aws cloudwatch put-metric-alarm \
--alarm-name mnemonica-media-replica-object-count-anomaly \
--alarm-description "Alert when media replica bucket object count deviates from expected baseline" \
--comparison-operator LessThanLowerThreshold \
--evaluation-periods 1 \
--metrics '[
{
"Id": "m1",
"ReturnData": true,
"MetricStat": {
"Metric": {
"Namespace": "AWS/S3",
"MetricName": "NumberOfObjects",
"Dimensions": [
{"Name": "BucketName", "Value": "mne-media-replica"},
{"Name": "StorageType", "Value": "AllStorageTypes"}
]
},
"Period": 86400,
"Stat": "Average"
}
},
{
"Id": "ad1",
"Expression": "ANOMALY_DETECTION_BAND(m1, 2)",
"Label": "Media Replica Object Count Anomaly Detection"
}
]' \
--threshold-metric-id ad1 \
--alarm-actions arn:aws:sns:eu-west-1:828879644785:mnemonica-backup-alerts \
--treat-missing-data notBreaching \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
Vault replica bucket (mne-vault-replica in eu-north-1):
# Create anomaly detection alarm for vault replica bucket object count
# Alarm is in eu-west-1, monitoring bucket in eu-north-1
aws cloudwatch put-metric-alarm \
--alarm-name mnemonica-vault-replica-object-count-anomaly \
--alarm-description "Alert when vault replica bucket object count deviates from expected baseline" \
--comparison-operator LessThanLowerThreshold \
--evaluation-periods 1 \
--metrics '[
{
"Id": "m1",
"ReturnData": true,
"MetricStat": {
"Metric": {
"Namespace": "AWS/S3",
"MetricName": "NumberOfObjects",
"Dimensions": [
{"Name": "BucketName", "Value": "mne-vault-replica"},
{"Name": "StorageType", "Value": "AllStorageTypes"}
]
},
"Period": 86400,
"Stat": "Average"
}
},
{
"Id": "ad1",
"Expression": "ANOMALY_DETECTION_BAND(m1, 2)",
"Label": "Vault Replica Object Count Anomaly Detection"
}
]' \
--threshold-metric-id ad1 \
--alarm-actions arn:aws:sns:eu-west-1:828879644785:mnemonica-backup-alerts \
--treat-missing-data notBreaching \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
Benefits of Anomaly Detection:
- Automatically learns normal object count patterns
- Adapts to gradual growth or shrinkage
- No manual threshold updates needed
- Detects deviations outside 2 standard deviations (configurable)
Step 8.5: Validation ¶
Test SNS topic delivery:
# Send test notification
aws sns publish \
--topic-arn arn:aws:sns:eu-west-1:828879644785:mnemonica-backup-alerts \
--subject "Test: Backup Monitoring Alert" \
--message "This is a test notification from your backup monitoring system. If you received this, notifications are working correctly." \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
Check email and Slack/Teams channel to confirm receipt.
Verify alarms exist:
# List all alarms (all are in eu-west-1)
aws cloudwatch describe-alarms \
--alarm-names mnemonica-media-replication-lag mnemonica-media-replication-failures mnemonica-vault-replication-lag mnemonica-vault-replication-failures mnemonica-media-replica-object-count-anomaly mnemonica-vault-replica-object-count-anomaly \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
# Check alarm states (should be OK or INSUFFICIENT_DATA initially)
aws cloudwatch describe-alarms \
--state-value ALARM \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
Verify EventBridge rules:
# List rules
aws events list-rules \
--name-prefix mnemonica-backup \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
# Check rule targets
aws events list-targets-by-rule \
--rule mnemonica-backup-job-failures \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
Recovery Procedures ¶
Scenario 1: Deleted File Recovery (0-35 days) ¶
Objective: Restore a deleted file from the source bucket together with the matching database state, so that S3 and RDS are consistent at the same point in time
RTO: 30-60 minutes
RPO: <1 minute
Steps:
- Identify deletion timestamp:
# List object versions to find when it was deleted
aws s3api list-object-versions \
--bucket mne-media-prod \
--prefix path/to/file.ext \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
- Restore RDS to exact timestamp via PITR:
# Restore DB to specific point in time
aws rds restore-db-instance-to-point-in-time \
--source-db-instance-identifier eks-mnemonica-prod \
--target-db-instance-identifier eks-mnemonica-prod-restored-$(date +%Y%m%d-%H%M) \
--restore-time "2025-10-30T15:30:00Z" \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
- Restore S3 file from noncurrent version:
# Get the version ID before deletion
VERSION_ID=$(aws s3api list-object-versions \
--bucket mne-media-prod \
--prefix path/to/file.ext \
--region eu-west-1 \
--no-cli-pager --profile mnemonica \
--query 'Versions[0].VersionId' \
--output text)
# Copy the noncurrent version back on top to make it current again
aws s3api copy-object \
--copy-source "mne-media-prod/path/to/file.ext?versionId=${VERSION_ID}" \
--bucket mne-media-prod \
--key path/to/file.ext \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
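When identifying the deletion timestamp in step 1, note that list-object-versions returns delete markers separately from object versions: the newest entry under DeleteMarkers gives the deletion time, and the newest entry under Versions is the content to restore. A minimal parsing sketch (the sample JSON is illustrative; python3 is used only for robust JSON parsing):

```shell
# Abridged sample of list-object-versions output for illustration:
cat > /tmp/versions.json <<'EOF'
{"DeleteMarkers":[{"Key":"path/to/file.ext","IsLatest":true,"LastModified":"2025-10-30T15:31:02Z"}],
"Versions":[{"Key":"path/to/file.ext","VersionId":"abc123","IsLatest":false,"LastModified":"2025-10-29T09:00:00Z"}]}
EOF
# Newest delete marker = when the object was deleted:
DELETED_AT=$(python3 -c 'import json; print(json.load(open("/tmp/versions.json"))["DeleteMarkers"][0]["LastModified"])')
# Newest real version = what to restore:
RESTORE_VERSION=$(python3 -c 'import json; print(json.load(open("/tmp/versions.json"))["Versions"][0]["VersionId"])')
echo "$DELETED_AT $RESTORE_VERSION"
```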
Scenario 2: Deleted File Recovery (35-180 days) ¶
Objective: Restore file from replica bucket Deep Archive
RTO: 12-48 hours (Deep Archive restoration time)
RPO: Up to 6 hours (AWS Backup snapshot frequency)
Steps:
- Initiate restoration from Deep Archive:
# Start restore operation (Standard tier: typically completes within 12 hours for Deep Archive)
aws s3api restore-object \
--bucket mne-media-replica \
--key path/to/file.ext \
--restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Standard"}}' \
--region eu-west-3 \
--no-cli-pager --profile mnemonica
- Check restoration status:
aws s3api head-object \
--bucket mne-media-replica \
--key path/to/file.ext \
--region eu-west-3 \
--no-cli-pager --profile mnemonica \
--query 'Restore'
- Download restored file (after restoration completes):
aws s3 cp s3://mne-media-replica/path/to/file.ext ./restored-file.ext \
--region eu-west-3 \
--no-cli-pager --profile mnemonica
- Restore RDS from closest AWS Backup snapshot:
# List recovery points near the desired time (dedicated vault created in Phase 6)
aws backup list-recovery-points-by-backup-vault \
--backup-vault-name eks-mnemonica-prod-vault \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
# Restore from recovery point
aws backup start-restore-job \
--recovery-point-arn <recovery-point-arn> \
--metadata '{"DBInstanceIdentifier":"eks-mnemonica-prod-restored"}' \
--iam-role-arn arn:aws:iam::828879644785:role/service-role/AWSBackupDefaultServiceRole \
--region eu-west-1 \
--no-cli-pager --profile mnemonica
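The status check in step 2 can be wrapped in a small polling helper. A sketch: the function below inspects the Restore header string that head-object returns, and the commented loop (an assumption to adapt to your object and interval) would call it until the restore completes:

```shell
# restore_complete: succeeds once an S3 "Restore" header value indicates
# the Deep Archive restore has finished. S3 reports, for example:
#   ongoing-request="true"                           (still restoring)
#   ongoing-request="false", expiry-date="..."       (restored, downloadable)
restore_complete() {
  printf '%s' "$1" | grep -q 'ongoing-request="false"'
}

# Hypothetical polling loop (adapt bucket/key; checks every 30 minutes):
# until restore_complete "$(aws s3api head-object --bucket mne-media-replica \
#     --key path/to/file.ext --region eu-west-3 --profile mnemonica \
#     --query 'Restore' --output text)"; do sleep 1800; done
```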
Scenario 3: Complete Region Failure (eu-west-1) ¶
Objective: Failover to eu-west-3 for disaster recovery
RTO: 12-48 hours
RPO: Up to 6 hours
Steps:
- Restore RDS from eu-west-3 snapshot:
# List recovery points in the DR region (dedicated DR vault created in Phase 6)
aws backup list-recovery-points-by-backup-vault \
--backup-vault-name eks-mnemonica-prod-vault-dr \
--region eu-west-3 \
--no-cli-pager --profile mnemonica
# Restore RDS instance in eu-west-3
aws backup start-restore-job \
--recovery-point-arn <recovery-point-arn-in-eu-west-3> \
--metadata '{"DBInstanceIdentifier":"eks-mnemonica-prod-dr"}' \
--iam-role-arn arn:aws:iam::828879644785:role/service-role/AWSBackupDefaultServiceRole \
--region eu-west-3 \
--no-cli-pager --profile mnemonica
- Access S3 replica bucket:
# Replica bucket is already in eu-west-3
# Update application configuration to use replica bucket
# S3 endpoint: s3.eu-west-3.amazonaws.com
- Restore S3 objects from Deep Archive (if needed):
Option A: Restore individual files (see Scenario 2 for single-file restore commands)
Option B: Bulk restore using S3 Batch Operations (recommended for large-scale recovery):
S3 Batch Operations allows you to restore thousands of objects in parallel from Deep Archive.
Step 3a: Create S3 Inventory (if not already configured)
# Create inventory configuration for the replica bucket
aws s3api put-bucket-inventory-configuration \
--bucket mne-media-replica \
--id mnemonica-replica-inventory \
--inventory-configuration file://configs/s3-inventory-config.json \
--region eu-west-3 \
--no-cli-pager --profile mnemonica
# Example inventory config (save as configs/s3-inventory-config.json):
# {
# "Destination": {
# "S3BucketDestination": {
# "AccountId": "828879644785",
# "Bucket": "arn:aws:s3:::mne-media-replica",
# "Format": "CSV",
# "Prefix": "inventory/"
# }
# },
# "IsEnabled": true,
# "Id": "mnemonica-replica-inventory",
# "IncludedObjectVersions": "Current",
# "Schedule": {
# "Frequency": "Daily"
# }
# }
Step 3b: Create IAM role for Batch Operations (first time only)
# Create trust policy (save as configs/batch-ops-trust-policy.json):
# {
# "Version": "2012-10-17",
# "Statement": [{
# "Effect": "Allow",
# "Principal": {"Service": "batchoperations.s3.amazonaws.com"},
# "Action": "sts:AssumeRole"
# }]
# }
aws iam create-role \
--role-name S3BatchOperationsRole \
--assume-role-policy-document file://configs/batch-ops-trust-policy.json \
--no-cli-pager --profile mnemonica
# Attach permissions. AmazonS3FullAccess works for a DR sprint, but a scoped
# policy (e.g. s3:RestoreObject on the replica bucket, read access to the
# manifest, write access to the report prefix) is preferable long-term.
aws iam attach-role-policy \
--role-name S3BatchOperationsRole \
--policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess \
--no-cli-pager --profile mnemonica
Step 3c: Create Batch Restore Job
# The job below reads an S3 Inventory report manifest: point ObjectArn at the
# dated manifest.json the inventory delivers under the configured prefix, and
# supply that object's ETag.
# Alternatively, build a custom CSV manifest (format S3BatchOperations_CSV_20180820),
# one "bucket,key" line per object, e.g.: mne-media-replica,path/to/file1.jpg
# Create batch job (example for restoring all objects in the manifest)
aws s3control create-job \
--account-id 828879644785 \
--operation '{
"S3InitiateRestoreObject": {
"ExpirationInDays": 7,
"GlacierJobParameters": {
"Tier": "Bulk"
}
}
}' \
--manifest '{
"Spec": {
"Format": "S3InventoryReport_CSV_20161130"
},
"Location": {
"ObjectArn": "arn:aws:s3:::mne-media-replica/inventory/manifest.json",
"ETag": "MANIFEST_ETAG"
}
}' \
--report '{
"Bucket": "arn:aws:s3:::mne-media-replica",
"Format": "Report_CSV_20180820",
"Enabled": true,
"Prefix": "batch-restore-reports/",
"ReportScope": "AllTasks"
}' \
--priority 10 \
--role-arn arn:aws:iam::828879644785:role/S3BatchOperationsRole \
--region eu-west-3 \
--no-cli-pager --profile mnemonica \
--description "Bulk restore from Deep Archive for DR"
# Monitor job progress
aws s3control describe-job \
--account-id 828879644785 \
--job-id <JOB_ID_FROM_CREATE_OUTPUT> \
--region eu-west-3 \
--no-cli-pager --profile mnemonica
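If an inventory report is not available, a custom CSV manifest can be built by hand. A sketch (bucket and keys are illustrative; with a custom CSV the job Spec format is S3BatchOperations_CSV_20180820 with Fields Bucket and Key, and keys containing commas must be URL-encoded):

```shell
# Build a minimal CSV manifest: one "bucket,key" line per object to restore.
cat > manifest.csv <<'EOF'
mne-media-replica,path/to/file1.jpg
mne-media-replica,path/to/file2.jpg
EOF
# After uploading manifest.csv to S3 with a single-part PUT (no SSE-KMS),
# the object's ETag equals the file's MD5, which create-job's manifest
# Location requires:
ETAG=$(md5sum manifest.csv | awk '{print $1}')
echo "objects: $(wc -l < manifest.csv), etag: $ETAG"
```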
Important Notes:
- Bulk tier: typically completes within 48 hours for Deep Archive, lowest retrieval cost (~$0.0025/GB)
- Standard tier: typically completes within 12 hours for Deep Archive
- Expedited tier: not available for the Deep Archive storage class
- For the ~92 TB replica, plan for up to 48 hours (Bulk) for full restoration
- Cost estimate: ~$230 in retrieval fees for a full 92 TB Bulk restore (request fees and any data transfer are extra)
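The ~$230 figure can be reproduced with a back-of-envelope calculation, assuming the ~$0.0025/GB Bulk retrieval rate for Deep Archive and decimal terabytes:

```shell
# Retrieval-fee estimate for a full replica restore (retrieval only;
# request fees and data transfer are not included).
SIZE_TB=92
RATE_PER_GB=0.0025
COST=$(awk -v tb="$SIZE_TB" -v rate="$RATE_PER_GB" 'BEGIN { printf "%.0f", tb * 1000 * rate }')
echo "\$$COST"
```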
Documentation:
- Update application DNS/endpoints to point to eu-west-3
Summary Checklist ¶
- Step 0: Backed up current configurations
- Phase 1: Updated media source bucket lifecycle (35-day retention)
- Phase 2: Updated media replica bucket lifecycle (180-day retention, mirror mode)
- Phase 2A: Verified vault source bucket lifecycle (180-day retention, already correct)
- Phase 2B: Applied vault replica bucket lifecycle (180-day retention, mirror mode)
- Phase 3: Verified cross-region replication for both media and vault buckets (already working)
- Phase 4: Created/updated application IAM role with vault bucket access
- Phase 5: Verified RDS automated backups (already working)
- Phase 6: Created new AWS Backup plan "eks-mnemonica-pitr35d-6h180d" (PITR 35d + 6-hour snapshots 180d)
- Phase 6: Migrated RDS resource to new backup plan
- Phase 6: Created dedicated backup vaults (eks-mnemonica-prod-vault, eks-mnemonica-prod-vault-dr)
- Phase 6: Updated backup plan to use dedicated vaults
- Phase 6: Verified new plan working (24 hours, 4+ successful jobs)
- Phase 6: Deleted old plan "Daily-10d-retention"
- Phase 7: Validated all configurations (media and vault buckets)
- Phase 8: Created SNS topics and subscriptions (email + Slack/Teams)
- Phase 8: Configured AWS Backup failure alarms (both regions)
- Phase 8: Configured S3 replication monitoring for both bucket pairs (lag + failures)
- Phase 8: Configured lifecycle policy monitoring (object count)
- Phase 8: Tested all alert channels
- Monitored first backup jobs (within 6 hours)
- Documented new backup plan ID
- Updated team documentation
- Scheduled cost review (1 month)
What's NOT Implemented (Deferred) ¶
⏸️ S3 Object Lock on replica bucket
- Requires new bucket creation or risk acceptance
- Decision pending further analysis
- Implementation guide available below when ready
[DEFERRABLE] Object Lock Implementation ¶
Status: ⏸️ DEFERRED - Requires creating new bucket or risk acceptance
Object Lock provides immutability (WORM) protection for disaster recovery data. This section can be implemented later as a separate project.
Why Object Lock? ¶
Protection Against:
- Ransomware/malware deletion attempts
- Compromised credentials
- Accidental admin deletion
- Insider threats
- Lifecycle misconfiguration
Requirement from Plan: GOVERNANCE mode, 180-day default retention on replica bucket
Current Limitation ¶
❌ Object Lock is NOT enabled on mne-media-replica
Problem: Object Lock historically could only be enabled at bucket creation time. AWS has since added support for enabling Object Lock on existing versioned buckets; verify current availability for this bucket before assuming a new bucket is required.
Implementation Options ¶
Option A: Create New Bucket with Object Lock (Full Compliance) ¶
Steps:
- Create new bucket mne-media-replica-v2 with Object Lock enabled
- Configure Object Lock: GOVERNANCE mode, 180-day retention
- Update replication configuration to point to new bucket
- Choose migration strategy:
- Parallel operation: Keep both buckets, let old one expire naturally (recommended)
- Copy data: Restore from Deep Archive and copy (expensive: ~$2,570)
Pros:
- ✅ Full compliance with plan
- ✅ True immutability protection
- ✅ WORM storage for regulatory compliance
Cons:
- Requires bucket name change
- Some migration effort
- Temporary dual storage costs
Cost:
- Parallel operation: No additional cost (old bucket shrinks naturally)
- Full copy: ~$2,570 one-time (Deep Archive restoration)
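The steps above can be sketched as follows; the bucket name mne-media-replica-v2 and the eu-west-3 region are assumptions pending the actual migration decision:

```shell
# Object Lock configuration for a future replica bucket
# (GOVERNANCE mode, 180-day default retention, per the plan):
mkdir -p configs
cat > configs/object-lock.json <<'EOF'
{
  "ObjectLockEnabled": "Enabled",
  "Rule": {
    "DefaultRetention": {
      "Mode": "GOVERNANCE",
      "Days": 180
    }
  }
}
EOF
# The bucket must be created with --object-lock-enabled-for-bucket, then:
# aws s3api put-object-lock-configuration \
#   --bucket mne-media-replica-v2 \
#   --object-lock-configuration file://configs/object-lock.json \
#   --region eu-west-3 --no-cli-pager --profile mnemonica
```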
Option B: Enhanced IAM Policies (Partial Protection) ¶
Steps:
- Apply strict bucket policies denying deletion
- Use MFA delete on bucket
- Restrict IAM permissions
Pros:
- ✅ No infrastructure changes
- ✅ Immediate implementation
Cons:
- ❌ NOT true immutability (policies can be overridden by admins)
- ❌ Does not meet plan requirements
- ❌ No WORM compliance
Option C: Accept Risk (Document Only) ¶
Steps:
- Document risk acceptance
- Rely on IAM access controls
- Implement in future when convenient
Pros:
- ✅ No changes needed now
Cons:
- ❌ No immutability protection
- ❌ Compliance gap
- ❌ Vulnerable to admin errors/malicious deletion
Recommendation ¶
For production environments with compliance requirements: Option A (New bucket)
Implementation guide available separately when ready to proceed with Object Lock.
Cost-benefit analysis: See COST-COMPARISON.md for detailed scenarios
Maintenance Tasks ¶
Weekly ¶
- Review AWS Backup job success rate
- Check S3 replication metrics for any lag
- Verify latest restorable time for RDS PITR
Monthly ¶
- Test file recovery from source bucket (0-35 days scenario)
- Review storage costs and optimize if needed
- Audit IAM access logs
Quarterly ¶
- Perform full disaster recovery test (Scenario 3)
- Test Deep Archive restoration (Scenario 2)
- Review and update backup retention policies if needed
- Conduct RDS PITR test restore
- Re-evaluate Object Lock implementation decision
Implementation Guide Version: 2.0
Last Updated: 2025-10-31
Customized for: Account 828879644785 (mnemonica profile)