Cloud Engineer Interview Prep
Covers AWS, Azure, or GCP: technical questions on services, networking, and security, plus behavioural questions about projects.
General tips for this role
- Always draw on the whiteboard. Even a simple diagram shows architectural thinking.
- When unsure of an exact AWS service name, describe what it does. 'A managed message queue' is fine even if you cannot remember 'SQS'.
- Mention cost. Senior engineers always think about cost.
- Mention security. Always.
- Ask clarifying questions before diving in. 'Is this a startup MVP or an enterprise migration?' The answer changes everything.
What is the difference between IaaS, PaaS, and SaaS?
IaaS (Infrastructure) gives you raw virtual machines and storage; you install everything. Example: AWS EC2. PaaS (Platform) gives you a platform to deploy code; you do not manage the OS. Example: Azure App Service. SaaS (Software) is ready-to-use software; you just log in. Example: Salesforce. The more of the stack the provider manages, the less control you have and the less operational responsibility you carry.
Use the 'pizza analogy': IaaS is buying the ingredients, PaaS is take-and-bake, SaaS is delivered hot.
What is the difference between Availability Zones and Regions?
A region is a geographic area (e.g. eu-west-2 = London). Each region has multiple Availability Zones (AZs), each made up of one or more physically separate data centres with independent power and networking. Spreading an app across AZs lets it survive the failure of a single data centre or zone. For disaster recovery, you also need to think about multiple regions.
How would you design a highly available web app on AWS?
Run the web tier on EC2 instances in an Auto Scaling group across at least two Availability Zones. Put an Application Load Balancer in front to distribute traffic. Use RDS Multi-AZ for the database. Store static assets in S3 with CloudFront as a CDN. Add Route 53 for DNS failover. As a rough rule of thumb, a 99.99% target needs at least two AZs, and a 99.999% target usually needs a multi-region design.
Always mention BOTH compute AND data layers. Many candidates forget the database.
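The availability targets in the answer above can be sanity-checked with simple arithmetic. The sketch below is only a back-of-the-envelope model: it assumes failures are independent, and the 0.99 single-AZ figure is a made-up input for illustration, not a published AWS number.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def parallel_availability(single: float, copies: int) -> float:
    """Availability of `copies` independent replicas when one healthy replica is enough."""
    return 1 - (1 - single) ** copies


def serial_availability(*tiers: float) -> float:
    """Availability of tiers that must all be up at once (e.g. web AND database)."""
    total = 1.0
    for tier in tiers:
        total *= tier
    return total


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime implied by an availability figure."""
    return (1 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    single_az = 0.99  # assumed figure for one AZ, for illustration only
    web = parallel_availability(single_az, copies=2)
    db = parallel_availability(single_az, copies=2)
    end_to_end = serial_availability(web, db)
    print(f"web tier: {web:.4%}, end to end: {end_to_end:.4%}")
    print(f"expected downtime: {downtime_minutes_per_year(end_to_end):.0f} min/year")
```

Redundant copies multiply failure probabilities, while tiers in series multiply availabilities. That is why forgetting the database undoes the gain from a redundant web tier.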
What is the difference between Security Groups and NACLs in AWS?
Security Groups are stateful and applied at the instance level, so return traffic is automatically allowed. NACLs are stateless and applied at the subnet level, so you must explicitly allow both inbound and outbound traffic. Security Groups are 'allow only'; NACLs can both allow and deny.
Most candidates remember the names but not which is stateful. Memorise: SG = stateful, NACL = not.
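If the stateful/stateless distinction will not stick, a toy model can help. The sketch below is not how AWS implements either feature. The `Packet`, `StatefulFilter`, and `StatelessFilter` names are invented for illustration, and rules match on destination port only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    src_port: int
    dst: str
    dst_port: int

    def reply(self) -> "Packet":
        """The response packet: source and destination swapped."""
        return Packet(self.dst, self.dst_port, self.src, self.src_port)


class StatefulFilter:
    """Security-group-like: allow rules only, and admitted flows are remembered."""

    def __init__(self, inbound_ports):
        self.inbound_ports = set(inbound_ports)
        self._flows = set()  # inbound packets that were admitted

    def inbound(self, packet: Packet) -> bool:
        if packet.dst_port in self.inbound_ports:
            self._flows.add(packet)
            return True
        return False  # no matching allow rule means implicit deny

    def outbound(self, packet: Packet) -> bool:
        # This toy has no outbound rules, so only replies to tracked flows pass.
        return packet.reply() in self._flows


class StatelessFilter:
    """NACL-like: numbered allow/deny rules per direction, no memory of flows."""

    def __init__(self, inbound, outbound):
        # Each rule is (rule_number, low_port, high_port, "allow" or "deny").
        self.rules = {"in": sorted(inbound), "out": sorted(outbound)}

    def _check(self, direction: str, packet: Packet) -> bool:
        for _number, low, high, action in self.rules[direction]:
            if low <= packet.dst_port <= high:
                return action == "allow"  # lowest-numbered match wins
        return False  # nothing matched: default deny

    def inbound(self, packet: Packet) -> bool:
        return self._check("in", packet)

    def outbound(self, packet: Packet) -> bool:
        return self._check("out", packet)


if __name__ == "__main__":
    request = Packet("203.0.113.10", 50000, "10.0.0.5", 443)
    sg = StatefulFilter(inbound_ports={443})
    nacl = StatelessFilter(inbound=[(100, 443, 443, "allow")], outbound=[])
    print("stateful :", sg.inbound(request), sg.outbound(request.reply()))
    print("stateless:", nacl.inbound(request), nacl.outbound(request.reply()))
```

Both filters admit the HTTPS request. The stateful one lets the reply out, while the stateless one drops it until an outbound rule covering the client's ephemeral port (for example 1024-65535) is added.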
How does Auto Scaling work and what are its limitations?
Auto Scaling automatically adds or removes EC2 instances based on metrics (CPU, network, custom CloudWatch metrics). It needs a launch template and an Auto Scaling group. Limitations: launching and bootstrapping a new instance takes minutes, so sudden spikes are not absorbed instantly; new instances may need to warm up before they perform well; and you pay for each new instance from the moment it launches. For near-instant scaling, consider Lambda or a pre-initialised warm pool.
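To reason about scaling decisions out loud, a simplified proportional rule is useful: scale capacity by the ratio of the observed metric to its target, then clamp to the group's limits. The function below is an illustrative approximation in the spirit of target tracking, not the actual AWS algorithm, and all names in it are assumptions.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     min_size: int, max_size: int) -> int:
    """Proportional scaling: keep the per-instance metric near `target`."""
    if current <= 0 or target <= 0:
        raise ValueError("current and target must be positive")
    raw = math.ceil(current * metric / target)
    # Never go below the group's minimum or above its maximum.
    return max(min_size, min(max_size, raw))


if __name__ == "__main__":
    # 4 instances at 90% average CPU, aiming for 60%.
    print(desired_capacity(4, metric=90, target=60, min_size=2, max_size=10))
```

The clamp matters in practice: a spike that asks for more than `max_size` is simply not served, which is one more limitation worth mentioning.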
Explain how you would secure data at rest and in transit on AWS.
At rest: use AWS KMS for key management; enable encryption on S3 (SSE-S3 or SSE-KMS), EBS volumes, RDS, and DynamoDB. In transit: use TLS 1.2+ everywhere, force HTTPS via ALB/CloudFront, use VPC endpoints to keep traffic off the public internet. Add IAM least-privilege policies and CloudTrail for auditing.
Encryption at rest and encryption in transit are different controls. Always mention both.
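One concrete way to show "in transit" enforcement is an S3 bucket policy that denies any request not made over TLS, using the `aws:SecureTransport` condition key. The sketch below only builds the policy document as a Python dict. The bucket name is a placeholder, and attaching the policy (console, CLI, or IaC) is left out.

```python
import json


def deny_insecure_transport_policy(bucket_name: str) -> dict:
    """Bucket policy that rejects every S3 request sent over plain HTTP."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


if __name__ == "__main__":
    # "example-bucket" is a hypothetical name used only for illustration.
    print(json.dumps(deny_insecure_transport_policy("example-bucket"), indent=2))
```

An explicit Deny wins over any Allow, so this holds even if another policy statement grants access.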
Walk me through how you would migrate a legacy on-premises application to the cloud.
Use the AWS 6 Rs framework. 1) Discover and assess: inventory apps, dependencies, performance, compliance needs. 2) Categorise each app into the 6 Rs: Rehost (lift-and-shift), Replatform, Repurchase (move to SaaS), Refactor, Retire, Retain. 3) Pilot a low-risk app first to validate tooling and process. 4) Build a landing zone (VPCs, IAM, logging, security baseline). 5) Migrate in waves, starting with non-critical apps. 6) Optimise after migration: right-size instances, adopt managed services, set up FinOps.
Mention 'landing zone': it shows enterprise experience.
What are the trade-offs between Lambda and EC2?
Lambda: serverless, pay per execution, scales to zero, 15-min max runtime, cold starts. Great for event-driven, short-lived tasks. EC2: full control, runs continuously, predictable cost at scale, no cold starts. Better for long-running services or when you need a specific OS. Cost crossover: as a rough rule of thumb, once a workload is busy more than about 30% of the time, EC2 tends to become cheaper than Lambda. The exact point depends on memory size, instance type, and pricing model.
Mention cold starts: interviewers love it when you acknowledge that Lambda is not always the answer.
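If asked to justify a cost crossover, it helps to show how you would calculate one rather than quote a number. The sketch below compares one always-on instance with a function billed per GB-second. It ignores per-request charges, free tiers, and concurrency. The prices in the example are placeholders chosen for easy arithmetic, not real AWS rates.

```python
HOURS_PER_MONTH = 730
SECONDS_PER_HOUR = 3600


def lambda_monthly_cost(busy_fraction: float, memory_gb: float,
                        price_per_gb_second: float) -> float:
    """Compute-only cost of a function that is busy `busy_fraction` of the month."""
    busy_seconds = busy_fraction * HOURS_PER_MONTH * SECONDS_PER_HOUR
    return busy_seconds * memory_gb * price_per_gb_second


def ec2_monthly_cost(hourly_price: float) -> float:
    """Cost of one instance that runs all month regardless of load."""
    return hourly_price * HOURS_PER_MONTH


def break_even_busy_fraction(ec2_hourly: float, memory_gb: float,
                             price_per_gb_second: float) -> float:
    """Busy fraction above which the always-on instance is the cheaper option."""
    return ec2_hourly / (SECONDS_PER_HOUR * memory_gb * price_per_gb_second)


if __name__ == "__main__":
    # Placeholder prices: look up current rates before quoting real numbers.
    fraction = break_even_busy_fraction(ec2_hourly=0.018, memory_gb=1.0,
                                        price_per_gb_second=0.00001)
    print(f"break-even at {fraction:.0%} busy")
```

A result above 1.0 means the function is cheaper even when it is busy all the time under those inputs.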
How would you monitor a microservices architecture?
Three pillars: metrics, logs, traces. Metrics with CloudWatch/Prometheus + Grafana dashboards (CPU, latency, error rate, request rate). Centralised logs (CloudWatch Logs, ELK, Datadog). Distributed tracing (X-Ray, Jaeger, Datadog APM) to follow a request across services. Set up alerts on SLO breaches, not raw metrics. Implement health checks at the service and dependency level.
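"Alert on SLO breaches, not raw metrics" is often implemented as a burn-rate alert: how fast the error budget is being spent compared with the rate the SLO allows. The sketch below is a minimal illustration. The two-window check and the 14.4 default follow a commonly cited pattern for a 30-day SLO, but treat both as tunable assumptions.

```python
def burn_rate(errors: int, requests: int, slo_target: float) -> float:
    """Error-budget spend rate; 1.0 means spending exactly on budget."""
    if requests == 0:
        return 0.0
    budget = 1.0 - slo_target  # allowed error ratio, e.g. 0.001 for 99.9%
    return (errors / requests) / budget


def should_page(short_window, long_window, slo_target: float,
                threshold: float = 14.4) -> bool:
    """Page only when both windows burn faster than `threshold`.

    Each window is an (errors, requests) pair. Requiring both reduces
    pages for brief blips that have already stopped.
    """
    return all(
        burn_rate(errors, requests, slo_target) >= threshold
        for errors, requests in (short_window, long_window)
    )


if __name__ == "__main__":
    # Hypothetical counts: last 5 minutes and last hour, 99.9% SLO.
    print(should_page((20, 1000), (200, 10000), slo_target=0.999))
```

A raw "error rate above 1%" alert ignores how strict the SLO is. The burn rate scales the same error rate by the budget, so the alert means the same thing for every service.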
What is Infrastructure as Code and why does it matter?
IaC is defining infrastructure (servers, networks, databases) in code rather than clicking in a console. Tools: Terraform, AWS CloudFormation, Pulumi, Azure Bicep. Benefits: version control, code review, repeatable deployments, easy rollback, and identical environments (dev/staging/prod) on demand. It is the foundation of modern DevOps.
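The core idea behind declarative IaC tools is a diff between desired and actual state, followed by an apply that converges the two. The toy below illustrates that loop with plain dicts. It is not how Terraform or CloudFormation work internally, and the resource names are invented.

```python
import copy


def plan(desired: dict, actual: dict) -> dict:
    """Work out which resources to create, update, or delete."""
    return {
        "create": sorted(desired.keys() - actual.keys()),
        "update": sorted(
            name for name in desired.keys() & actual.keys()
            if desired[name] != actual[name]
        ),
        "delete": sorted(actual.keys() - desired.keys()),
    }


def apply(desired: dict, actual: dict) -> dict:
    """Return the new actual state after carrying out the plan."""
    changes = plan(desired, actual)
    result = copy.deepcopy(actual)
    for name in changes["delete"]:
        del result[name]
    for name in changes["create"] + changes["update"]:
        result[name] = copy.deepcopy(desired[name])
    return result


if __name__ == "__main__":
    desired = {"vpc": {"cidr": "10.0.0.0/16"}, "web": {"type": "t3.small", "count": 2}}
    actual = {"web": {"type": "t3.micro", "count": 2}, "old_bucket": {}}
    print(plan(desired, actual))
    print(plan(desired, apply(desired, actual)))  # second plan has nothing to do
```

Running the plan again after an apply yields no changes. That idempotence is what makes deployments repeatable and environments identical.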
Tell me about a time you handled a production outage.
Use STAR. Situation: which system went down, when, how you found out. Task: your role in the response. Action: triage, hypothesis, fix, communication. Result: how long it lasted, customer impact, what you learned. End with the postmortem and improvements you made.
Interviewers want to hear about ownership, calm under pressure, and learning. Never blame others.
How do you balance speed with reliability when shipping changes?
Talk about your CI/CD pipeline, automated testing, blue-green or canary deployments, feature flags. Mention error budgets if you can. Show you take BOTH seriously and have processes that catch issues early.
Mentioning 'error budgets' (the SRE concept) signals seniority.
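If the interviewer probes on feature flags or canary releases, be ready to explain a percentage rollout. A common approach is to hash a stable identifier into a bucket, so each user gets a consistent answer as the percentage grows. The sketch below assumes a hypothetical flag name and user IDs, and is not tied to any particular feature-flag product.

```python
import hashlib


def in_rollout(flag: str, user_id: str, percent: int) -> bool:
    """True if this user falls inside the first `percent` of 100 buckets."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Hash flag and user together so different flags get different cohorts.
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    for percent in (1, 10, 50, 100):
        enabled = sum(in_rollout("new-checkout", user, percent) for user in users)
        print(f"{percent:>3}% target -> {enabled} of {len(users)} users enabled")
```

Because the bucket is fixed per user, raising the percentage only adds users and never flips anyone back. Rolling back is a configuration change rather than a redeploy.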
Why do you want to work in cloud engineering?
Honest answer + one technical reason. 'I love that cloud lets you build globally-scaled systems with a few API calls. The fact that I can spin up a multi-region database in 5 minutes still amazes me. I want to work with that scale.'
Avoid generic answers like 'It's the future'. Pick something specific.
Your team's AWS bill has doubled in the last month. How do you investigate?
Start with Cost Explorer to find which services drove the increase. Filter by service, then by tag, then by resource. Common culprits: unattached EBS volumes, idle EC2 instances, data transfer between AZs, NAT Gateway egress, unused EIPs. Set up AWS Budgets to alert before it happens again. Implement a tagging policy so all resources can be attributed to an owner. Consider Reserved Instances or Savings Plans for steady workloads. Long-term: implement FinOps practices.
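The "filter by service, then by tag" drill-down is just a grouped month-over-month comparison. The sketch below works on plain records such as rows from an exported cost report. The field names and figures are invented for illustration and do not match any specific AWS export format.

```python
from collections import defaultdict


def total_by(records, key):
    """Sum the cost of all records that share the same value for `key`."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["cost"]
    return dict(totals)


def biggest_increases(previous, current, key="service", top=3):
    """Rank groups by how much their cost grew between two periods."""
    before = total_by(previous, key)
    after = total_by(current, key)
    deltas = {
        name: after.get(name, 0.0) - before.get(name, 0.0)
        for name in before.keys() | after.keys()
    }
    # Largest increase first; break ties by name so the output is stable.
    ranked = sorted(deltas.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top]


if __name__ == "__main__":
    last_month = [
        {"service": "EC2", "team": "web", "cost": 400.0},
        {"service": "NAT Gateway", "team": "web", "cost": 50.0},
        {"service": "S3", "team": "data", "cost": 100.0},
    ]
    this_month = [
        {"service": "EC2", "team": "web", "cost": 450.0},
        {"service": "NAT Gateway", "team": "web", "cost": 500.0},
        {"service": "S3", "team": "data", "cost": 100.0},
        {"service": "EBS", "team": "data", "cost": 80.0},
    ]
    print(biggest_increases(last_month, this_month, key="service", top=2))
    print(biggest_increases(last_month, this_month, key="team", top=1))
```

Grouping by a tag such as `team` only works if resources are tagged consistently, which is why the tagging policy belongs in the answer.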
A new junior joins your team. How do you help them ramp up on AWS?
Give them read-only access first. Pair them on the architecture diagram of one core service. Have them go through AWS Skill Builder fundamentals (1 week). Assign a small, low-risk Terraform PR. Code review with detailed comments. Set up a weekly 1:1 to ask questions. After a month, they should be able to deploy a basic service end-to-end with supervision.
Shows you can mentor, which is important for senior roles.