AWS Specialist Examples
Externalized from the agent definition per the few-shot-examples rule (#1587).
Few-Shot Examples
Example 1: Lambda Cold Start Optimization
Input:
"Our Java Lambda function has 3-4 second cold starts on the /login endpoint. How do I fix it?"
Output:
Cold starts on Java are primarily caused by JVM initialization plus class loading. Three levers to pull, in priority order:
1. Enable SnapStart (usually the largest reduction; available on the Java 11 and later managed runtimes)
# CDK: Enable SnapStart on the function
from aws_cdk import aws_lambda as lambda_
function = lambda_.Function(
self, "LoginHandler",
runtime=lambda_.Runtime.JAVA_21,
handler="com.example.LoginHandler::handleRequest",
code=lambda_.Code.from_asset("target/login-handler.jar"),
memory_size=1024,
snap_start=lambda_.SnapStartConf.ON_PUBLISHED_VERSIONS,
)
# SnapStart only works on published versions, not $LATEST
alias = lambda_.Alias(
self, "ProdAlias",
alias_name="prod",
version=function.current_version,
)
2. Increase memory to reduce initialization time
# Memory directly controls CPU allocation in Lambda
# Run Power Tuning tool comparing 512MB, 1024MB, 2048MB
# Java typically sees diminishing returns above 1769MB, the point where Lambda allocates one full vCPU
aws stepfunctions start-execution \
--state-machine-arn arn:aws:states:us-east-1:123456789012:stateMachine:powerTuningStateMachine \
--input '{
"lambdaARN": "arn:aws:lambda:us-east-1:123456789012:function:login-handler",
"powerValues": [512, 1024, 1769, 2048, 3008],
"num": 20,
"payload": {"username": "test", "password": "test"},
"strategy": "balanced"
}'
3. Add provisioned concurrency for the login alias (an alternative to SnapStart: Lambda does not support both on the same function version)
aws lambda put-provisioned-concurrency-config \
--function-name login-handler \
--qualifier prod \
--provisioned-concurrent-executions 10
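# Register the alias as a scalable target so Application Auto Scaling can adjust PC
# (add a scheduled action or target-tracking policy to scale up during business hours)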
aws application-autoscaling register-scalable-target \
--service-namespace lambda \
--resource-id function:login-handler:prod \
--scalable-dimension lambda:function:ProvisionedConcurrency \
--min-capacity 5 \
--max-capacity 50
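Expected outcome: SnapStart typically reduces the cold start from 3-4s to well under one second. Provisioned concurrency removes cold starts for requests served by the configured baseline, so choose it instead of SnapStart when that guarantee matters. Cost increase: roughly $110/month for 10 provisioned concurrency units at 1024MB kept on around the clock (us-east-1 x86 pricing, before request and duration charges; verify against the current price list).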
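The memory and provisioned concurrency trade-offs can be checked with a small cost model. This is a minimal sketch, not an official calculator: the two per-GB-second prices are assumed us-east-1 x86 list prices, `730` hours is an approximation of one month, and the function names are hypothetical.

```python
# Assumed list prices in USD per GB-second (us-east-1, x86). Verify before relying on them.
COMPUTE_PRICE = 0.0000166667      # on-demand invocation duration
PROVISIONED_PRICE = 0.0000041667  # keeping provisioned concurrency configured
SECONDS_PER_MONTH = 730 * 3600    # 730 hours approximates one month


def invocation_cost(memory_mb: int, duration_ms: float) -> float:
    """Duration charge for one invocation: GB allocated x seconds x price."""
    return (memory_mb / 1024) * (duration_ms / 1000) * COMPUTE_PRICE


def provisioned_monthly_cost(units: int, memory_mb: int, on_fraction: float = 1.0) -> float:
    """Monthly charge for keeping `units` environments warm for `on_fraction` of the month."""
    gb_seconds = units * (memory_mb / 1024) * SECONDS_PER_MONTH * on_fraction
    return gb_seconds * PROVISIONED_PRICE


# Doubling memory costs nothing extra if it halves the duration.
print(invocation_cost(1024, 1000), invocation_cost(2048, 500))
# Ten always-on units at 1024MB.
print(round(provisioned_monthly_cost(10, 1024), 2))
```

Request charges and the duration charge for invocations served by provisioned environments are left out, so treat the result as a lower bound.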
Example 2: S3 Cost Analysis and Lifecycle Optimization
Input:
"Our S3 bill jumped 40% last month. The bucket has 500TB of mixed data — some accessed daily, most untouched for 6+ months. How do we reduce costs?"
Output:
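Start with data: analyze the age and storage-class distribution before writing lifecycle rules.
# Enable S3 Inventory to get object-level storage class, size, and last-modified date
# (Inventory does not record last access, so object age serves as a proxy for access)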
aws s3api put-bucket-inventory-configuration \
--bucket my-data-bucket \
--id full-inventory \
--inventory-configuration '{
"Id": "full-inventory",
"IsEnabled": true,
"Destination": {
"S3BucketDestination": {
"Bucket": "arn:aws:s3:::my-inventory-bucket",
"Format": "Parquet"
}
},
"Schedule": {"Frequency": "Weekly"},
"IncludedObjectVersions": "Current",
"OptionalFields": ["StorageClass", "LastModifiedDate", "Size", "ReplicationStatus"]
}'
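-- After the first inventory report is delivered, query it with Athena.
-- Create the table first. If last_modified_date is declared as a timestamp column,
-- use it directly instead of wrapping it in from_iso8601_timestamp().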
SELECT
storage_class,
COUNT(*) as object_count,
SUM(size) / POWER(1024, 4) as size_tb,
AVG(DATE_DIFF('day', from_iso8601_timestamp(last_modified_date), current_date)) as avg_age_days
FROM s3_inventory.my_data_bucket
GROUP BY storage_class
ORDER BY size_tb DESC;
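To make the aggregation concrete, here is a minimal Python sketch of what the query computes. The three sample rows, the `AS_OF` date, and the `summarize` helper are hypothetical and exist only for illustration; real rows would come from the inventory report.

```python
from collections import defaultdict
from datetime import datetime, timezone

BYTES_PER_TB = 1024 ** 4  # matches POWER(1024, 4) in the query


def summarize(rows, as_of):
    """Group rows by storage class: object count, size in TB, average age in days."""
    groups = defaultdict(lambda: {"count": 0, "bytes": 0, "age_days": 0})
    for row in rows:
        modified = datetime.fromisoformat(row["last_modified_date"])
        group = groups[row["storage_class"]]
        group["count"] += 1
        group["bytes"] += row["size"]
        group["age_days"] += (as_of - modified).days
    summary = [
        {
            "storage_class": name,
            "object_count": g["count"],
            "size_tb": g["bytes"] / BYTES_PER_TB,
            "avg_age_days": g["age_days"] / g["count"],
        }
        for name, g in groups.items()
    ]
    # Largest storage class first, like ORDER BY size_tb DESC.
    return sorted(summary, key=lambda item: item["size_tb"], reverse=True)


# Hypothetical inventory rows, not real data.
SAMPLE_ROWS = [
    {"storage_class": "STANDARD", "size": 3 * BYTES_PER_TB,
     "last_modified_date": "2024-01-01T00:00:00+00:00"},
    {"storage_class": "STANDARD", "size": 1 * BYTES_PER_TB,
     "last_modified_date": "2024-03-01T00:00:00+00:00"},
    {"storage_class": "GLACIER", "size": 1 * BYTES_PER_TB,
     "last_modified_date": "2023-04-10T00:00:00+00:00"},
]
AS_OF = datetime(2024, 4, 10, tzinfo=timezone.utc)

summary = summarize(SAMPLE_ROWS, AS_OF)
for item in summary:
    print(item)
```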
Once you know the distribution, apply targeted lifecycle rules:
# CDK: Tiered lifecycle based on your access pattern data
bucket.add_lifecycle_rule(
id="hot-to-cold",
enabled=True,
prefix="data/", # Apply only to data/ prefix, not logs/
transitions=[
s3.Transition(
storage_class=s3.StorageClass.INFREQUENT_ACCESS,
transition_after=Duration.days(30), # IA: $0.0125/GB-month vs $0.023/GB-month for Standard
),
s3.Transition(
storage_class=s3.StorageClass.GLACIER_INSTANT_RETRIEVAL,
transition_after=Duration.days(90), # GIR: $0.004/GB-month, millisecond retrieval
),
s3.Transition(
storage_class=s3.StorageClass.DEEP_ARCHIVE,
transition_after=Duration.days(180), # DA: $0.00099/GB-month, retrieval within 12 hours
),
),
],
)
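The effect of the rule on a single object can be sketched as a lookup from object age to storage class. This is an approximation under stated assumptions: the thresholds and prices are copied from the rule above, `tier_for_age` is a hypothetical helper, and real transitions run asynchronously, so an object may move somewhat later than its threshold day.

```python
# (minimum age in days, storage class, USD per GB-month), oldest tier first.
TIERS = [
    (180, "DEEP_ARCHIVE", 0.00099),
    (90, "GLACIER_INSTANT_RETRIEVAL", 0.004),
    (30, "INFREQUENT_ACCESS", 0.0125),
    (0, "STANDARD", 0.023),
]


def tier_for_age(age_days: int) -> tuple:
    """Return (storage class, price per GB-month) an object of this age is eligible for."""
    for min_age, storage_class, price in TIERS:
        if age_days >= min_age:
            return storage_class, price
    raise ValueError("age_days must be non-negative")


for age in (10, 45, 120, 400):
    print(age, tier_for_age(age))
```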
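Projected savings on 500TB (storage charges only, counting 1TB as 1,000GB; transition requests, retrieval fees, and minimum storage duration charges are excluded):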
- 400TB moved to Glacier Instant Retrieval: from $9,200/mo to $1,600/mo
- 80TB moved to Deep Archive: from $1,840/mo to $79/mo
- 20TB stays Standard for active data: $460/mo unchanged
- Total: ~$11,500/mo → ~$2,140/mo (81% reduction)
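The projection can be reproduced in a few lines. This sketch assumes decimal terabytes (1TB = 1,000GB), which is what the figures imply, and uses the per-GB-month prices quoted in the lifecycle rule; the variable and function names are illustrative.

```python
GB_PER_TB = 1000  # decimal TB, as implied by the projected figures

# USD per GB-month, taken from the lifecycle rule comments.
STANDARD = 0.023
GLACIER_IR = 0.004
DEEP_ARCHIVE = 0.00099


def monthly_cost(size_tb: float, price_per_gb_month: float) -> float:
    """Storage-only monthly charge; ignores request and retrieval fees."""
    return size_tb * GB_PER_TB * price_per_gb_month


# (size in TB, target price) for each slice of the 500TB bucket.
plan = [(400, GLACIER_IR), (80, DEEP_ARCHIVE), (20, STANDARD)]

before = sum(monthly_cost(tb, STANDARD) for tb, _ in plan)
after = sum(monthly_cost(tb, price) for tb, price in plan)
reduction = 1 - after / before

print(f"before=${before:,.0f}/mo after=${after:,.0f}/mo reduction={reduction:.0%}")
```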
Example 3: Well-Architected Framework Finding — Reliability Pillar
Input:
"Run a Well-Architected review on our reliability posture. We have a single RDS instance with daily snapshots and no read replicas."
Output:
Finding: REL-3 — Single point of failure in database tier
Risk level: HIGH
Current state:
- Single RDS instance in one availability zone
- Recovery method: restore from daily snapshot (RTO: 2-4 hours, RPO: up to 24 hours)
- No read replica to offload reporting queries
Remediation plan:
# Step 1: Convert to Multi-AZ (normally no outage, but I/O latency can rise while the standby synchronizes)
aws rds modify-db-instance \
--db-instance-identifier production-db \
--multi-az \
--apply-immediately
# Monitor the modification
aws rds describe-db-instances \
--db-instance-identifier production-db \
--query 'DBInstances[0].{Status:DBInstanceStatus,MultiAZ:MultiAZ,SecondaryAZ:SecondaryAvailabilityZone}'
# CDK: Multi-AZ with read replica and automated backups
from aws_cdk import Duration, aws_ec2 as ec2, aws_rds as rds
primary = rds.DatabaseInstance(
self, "Primary",
vpc=vpc, # Required; assumes an existing ec2.Vpc bound to the name vpc
engine=rds.DatabaseInstanceEngine.postgres(
version=rds.PostgresEngineVersion.VER_16_2
),
instance_type=ec2.InstanceType.of(
ec2.InstanceClass.R6G, ec2.InstanceSize.XLARGE
),
multi_az=True, # Automatic standby in second AZ
backup_retention=Duration.days(7), # Reduce RPO to <5 minutes with PITR
delete_automated_backups=False,
deletion_protection=True,
)
read_replica = rds.DatabaseInstanceReadReplica(
self, "ReadReplica",
vpc=vpc, # Required; same assumed VPC as the primary
source_database_instance=primary,
instance_type=ec2.InstanceType.of(
ec2.InstanceClass.R6G, ec2.InstanceSize.LARGE
),
)
Validate failover:
# Force a failover to test your RTO
aws rds reboot-db-instance \
--db-instance-identifier production-db \
--force-failover
# Measure time until the status returns to "available" and the AZ changes to the former standby
watch -n 5 "aws rds describe-db-instances \
--db-instance-identifier production-db \
--query 'DBInstances[0].{Status:DBInstanceStatus,AZ:AvailabilityZone}'"
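For a recorded number rather than an eyeballed one, the polling loop can be scripted. This is a minimal sketch: `measure_failover` and its parameters are hypothetical, and `get_state` is an injected callable that returns `(status, availability_zone)`, which in practice would wrap the same `describe-db-instances` call shown above.

```python
import time
from typing import Callable, Optional, Tuple


def measure_failover(
    get_state: Callable[[], Tuple[str, str]],
    original_az: str,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Seconds until the instance is 'available' in a different AZ, or None on timeout."""
    start = clock()
    while clock() - start < timeout_s:
        status, az = get_state()
        if status == "available" and az != original_az:
            return clock() - start
        sleep(interval_s)
    return None


# Simulated run with a fake clock, so the example finishes instantly.
now = [0.0]
states = iter([
    ("rebooting", "us-east-1a"),
    ("rebooting", "us-east-1a"),
    ("available", "us-east-1b"),
])
elapsed = measure_failover(
    lambda: next(states),
    original_az="us-east-1a",
    clock=lambda: now[0],
    sleep=lambda seconds: now.__setitem__(0, now[0] + seconds),
)
print(elapsed)
```

The result is only as precise as the polling interval, and it measures control-plane status rather than client reconnect time, so application-level checks may report a longer recovery.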
Expected outcome after remediation:
- RTO: typically 60-120 seconds (automatic failover to standby)
- RPO: near zero for an AZ failure (synchronous standby replication); about 5 minutes for point-in-time restore from transaction logs
- Reliability pillar risk: HIGH → NONE for this question