
Migrating to Kubernetes on AWS EKS - A Step-by-Step Guide

Mike Thompson · Senior Cloud Engineer @ CloudCraft · 11 min read

Kubernetes has become the de facto standard for container orchestration, and AWS EKS (Elastic Kubernetes Service) makes it easier than ever to run Kubernetes in the cloud. In this comprehensive guide, we'll walk through a real-world migration from traditional EC2-based applications to a modern, scalable Kubernetes architecture on EKS.

Why Migrate to Kubernetes on EKS?

After helping dozens of companies migrate to Kubernetes, we've seen consistent benefits:

Scalability and Efficiency

  • Auto-scaling: Horizontal and vertical pod auto-scaling based on metrics
  • Resource efficiency: Better resource utilization compared to traditional VMs
  • Multi-tenancy: Run multiple applications on shared infrastructure

Operational Benefits

  • Standardized deployments: Consistent deployment patterns across environments
  • Rolling updates: Zero-downtime deployments with automatic rollback
  • Service discovery: Built-in service discovery and load balancing

AWS EKS Advantages

  • Managed control plane: AWS provisions, patches, and upgrades the Kubernetes control plane for you
  • AWS integrations: Native integration with AWS services (ALB, IAM, VPC)
  • Security: Built-in security best practices and compliance certifications

Pre-Migration Assessment

Before starting your migration, assess your current infrastructure:

Application Inventory

# Create an inventory of your current applications
echo "Application,Instances,Dependencies,Database,Storage" > app_inventory.csv

# Example inventory items
echo "web-frontend,3,api-backend,none,EBS" >> app_inventory.csv
echo "api-backend,5,database,PostgreSQL,EFS" >> app_inventory.csv
echo "background-jobs,2,redis+database,Redis+PostgreSQL,EBS" >> app_inventory.csv

Resource Analysis

# Analyze current resource usage
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --start-time 2024-02-01T00:00:00Z \
  --end-time 2024-03-01T00:00:00Z \
  --period 3600 \
  --statistics Average,Maximum
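
To turn that JSON into a single number for right-sizing, you can pipe the same call through jq (assuming jq is installed); this computes the mean of the hourly averages:

# Mean CPU utilization over the period (mean of hourly Average datapoints)
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --start-time 2024-02-01T00:00:00Z \
  --end-time 2024-03-01T00:00:00Z \
  --period 3600 \
  --statistics Average \
  --output json | jq '[.Datapoints[].Average] | add / length'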

Phase 1: EKS Cluster Setup

1. Infrastructure as Code Setup

First, let's set up our EKS cluster using Terraform:

# eks-cluster.tf
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0" # access_entries requires v20+ of the module

  cluster_name    = var.cluster_name
  cluster_version = "1.28"

  vpc_id                         = module.vpc.vpc_id
  subnet_ids                     = module.vpc.private_subnets
  cluster_endpoint_public_access = true

  # EKS Managed Node Groups
  eks_managed_node_groups = {
    general = {
      min_size     = 2
      max_size     = 10
      desired_size = 3

      instance_types = ["t3.medium"]
      capacity_type  = "ON_DEMAND"

      labels = {
        Environment = var.environment
        NodeGroup   = "general"
      }

      update_config = {
        max_unavailable_percentage = 25
      }
    }

    spot = {
      min_size     = 0
      max_size     = 5
      desired_size = 2

      instance_types = ["t3.medium", "t3.large"]
      capacity_type  = "SPOT"

      labels = {
        Environment = var.environment
        NodeGroup   = "spot"
      }

      taints = {
        dedicated = {
          key    = "spot"
          value  = "true"
          effect = "NO_SCHEDULE"
        }
      }
    }
  }

  # Cluster access entry (note: access entries don't allow "system:" groups;
  # the AmazonEKSClusterAdminPolicy association grants admin access instead)
  access_entries = {
    admin = {
      principal_arn = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/eks-admin"

      policy_associations = {
        admin = {
          policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
          access_scope = {
            type = "cluster"
          }
        }
      }
    }
  }

  tags = {
    Environment = var.environment
    Terraform   = "true"
  }
}
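
With the module in place, provisioning and connecting is the standard Terraform workflow. A sketch, assuming the cluster is named my-cluster in us-west-2:

# Provision the cluster and point kubectl at it
terraform init
terraform plan -out=tfplan
terraform apply tfplan

# Update the local kubeconfig (name must match var.cluster_name)
aws eks update-kubeconfig --region us-west-2 --name my-cluster
kubectl get nodes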

2. Essential Add-ons Setup

# eks-addons.tf
resource "aws_eks_addon" "vpc_cni" {
  cluster_name = module.eks.cluster_name
  addon_name   = "vpc-cni"
}

resource "aws_eks_addon" "coredns" {
  cluster_name = module.eks.cluster_name
  addon_name   = "coredns"
}

resource "aws_eks_addon" "kube_proxy" {
  cluster_name = module.eks.cluster_name
  addon_name   = "kube-proxy"
}

resource "aws_eks_addon" "ebs_csi" {
  cluster_name = module.eks.cluster_name
  addon_name   = "aws-ebs-csi-driver"
}
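
A quick sanity check after terraform apply: each add-on should report ACTIVE, and its pods should be Running in kube-system.

# Confirm the add-ons are healthy (repeat for coredns, kube-proxy, aws-ebs-csi-driver)
aws eks describe-addon --cluster-name my-cluster --addon-name vpc-cni \
  --query 'addon.status'

kubectl get pods -n kube-system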

3. Application Load Balancer Controller

# aws-load-balancer-controller.yaml (abbreviated; RBAC, webhook, and
# IngressClass resources are omitted)
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: aws-load-balancer-controller
  name: aws-load-balancer-controller
  namespace: kube-system
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT-ID:role/AmazonEKSLoadBalancerControllerRole
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: aws-load-balancer-controller
  name: aws-load-balancer-controller
  namespace: kube-system
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/component: controller
      app.kubernetes.io/name: aws-load-balancer-controller
  template:
    metadata:
      labels:
        app.kubernetes.io/component: controller
        app.kubernetes.io/name: aws-load-balancer-controller
    spec:
      containers:
        - args:
            - --cluster-name=my-cluster
            - --ingress-class=alb
          image: public.ecr.aws/eks/aws-load-balancer-controller:v2.7.2
          name: controller
          resources:
            limits:
              cpu: 200m
              memory: 500Mi
            requests:
              cpu: 100m
              memory: 200Mi
      serviceAccountName: aws-load-balancer-controller
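
In practice the controller is usually installed from the eks-charts Helm repository rather than raw manifests, which pulls in the omitted RBAC and webhook pieces automatically; roughly:

# Install the controller via Helm, reusing the IRSA service account above
helm repo add eks https://aws.github.io/eks-charts
helm repo update
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=my-cluster \
  --set serviceAccount.create=false \
  --set serviceAccount.name=aws-load-balancer-controller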

Phase 2: Application Containerization

1. Containerizing Applications

Here's an example Dockerfile for a Node.js application:

# Multi-stage build for efficiency
FROM node:18-alpine AS builder

WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev && npm cache clean --force

# Production stage
FROM node:18-alpine AS production

# Create non-root user
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001

WORKDIR /app

# Copy installed dependencies, then application code
COPY --from=builder /app/node_modules ./node_modules
COPY --chown=nodejs:nodejs . .

USER nodejs

EXPOSE 3000

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD node healthcheck.js

CMD ["node", "index.js"]

2. Kubernetes Manifests

Create Kubernetes manifests for your applications:

# web-frontend-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
  namespace: production
  labels:
    app: web-frontend
    version: v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
        version: v1
    spec:
      containers:
        - name: web-frontend
          image: your-account.dkr.ecr.us-west-2.amazonaws.com/web-frontend:v1.2.3
          ports:
            - containerPort: 3000
          env:
            - name: NODE_ENV
              value: production
            - name: API_URL
              value: http://api-backend:8080
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5

---
apiVersion: v1
kind: Service
metadata:
  name: web-frontend
  namespace: production
spec:
  selector:
    app: web-frontend
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000
  type: ClusterIP

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-frontend
  namespace: production
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-west-2:123456789012:certificate/12345678-1234-1234-1234-123456789012
    # ssl-redirect requires an explicit HTTPS listener
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/ssl-redirect: '443'
spec:
  ingressClassName: alb # the kubernetes.io/ingress.class annotation is deprecated
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-frontend
                port:
                  number: 80
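
Applying and verifying these manifests might look like the following; the ALB's DNS name appears in the Ingress ADDRESS column once provisioning finishes:

# Create the namespace idempotently, apply, and watch the rollout
kubectl create namespace production --dry-run=client -o yaml | kubectl apply -f -
kubectl apply -f web-frontend-deployment.yaml
kubectl rollout status deployment/web-frontend -n production
kubectl get ingress web-frontend -n production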

Phase 3: Database and Storage Migration

1. Database Strategy

For stateful services, consider these patterns:

Option 1: External RDS (Recommended)

# database-config.yaml
apiVersion: v1
kind: Secret
metadata:
  name: database-secret
  namespace: production
type: Opaque
data:
  host: <base64-encoded-rds-endpoint>
  username: <base64-encoded-username>
  password: <base64-encoded-password>
  database: <base64-encoded-database-name>

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: database-config
  namespace: production
data:
  port: "5432"
  ssl_mode: "require"
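
Rather than hand-encoding base64 values, you can let kubectl do it; the endpoint and credentials below are placeholders:

# Create the secret from literals (kubectl handles the base64 encoding)
kubectl create secret generic database-secret \
  --namespace production \
  --from-literal=host=mydb.abc123.us-west-2.rds.amazonaws.com \
  --from-literal=username=appuser \
  --from-literal=password='change-me' \
  --from-literal=database=myapp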

Option 2: In-cluster Database (Development only)

# postgres-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: development
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:15
          env:
            - name: POSTGRES_DB
              value: myapp
            - name: POSTGRES_USER
              value: myuser
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
            # Use a subdirectory so initdb doesn't trip over the volume's lost+found
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          volumeMounts:
            - name: postgres-storage
              mountPath: /var/lib/postgresql/data
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "1000m"
  volumeClaimTemplates:
    - metadata:
        name: postgres-storage
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: gp3
        resources:
          requests:
            storage: 20Gi
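
The StatefulSet's serviceName field expects a headless Service that the manifests above don't yet define; a minimal one would be:

# postgres-service.yaml — headless Service backing the StatefulSet
apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: development
spec:
  clusterIP: None
  selector:
    app: postgres
  ports:
    - port: 5432
      targetPort: 5432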

2. Storage Classes

Configure appropriate storage classes:

# storage-classes.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
  # gp3 caps throughput at 0.25 MiB/s per provisioned IOPS, so 3000 IOPS
  # supports at most 750 MiB/s; 125 is the baseline
  throughput: "125"
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer

---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-1234567890abcdef0
  directoryPerms: "700"
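
After applying, you may want to mark gp3 as the cluster default so PVCs without an explicit storageClassName land on it:

# Apply the storage classes and make gp3 the default
kubectl apply -f storage-classes.yaml
kubectl patch storageclass gp3 \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'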

Phase 4: CI/CD Pipeline Migration

1. GitHub Actions for Kubernetes

# .github/workflows/deploy-to-eks.yml
name: Deploy to EKS

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  AWS_REGION: us-west-2
  EKS_CLUSTER_NAME: my-cluster
  ECR_REPOSITORY: my-app

jobs:
  build-and-deploy:
    # Only deploy on pushes to main; PR runs would otherwise ship to production
    if: github.event_name == 'push'
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2

      - name: Build, tag, and push image to Amazon ECR
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          IMAGE_TAG: ${{ github.sha }}
        run: |
          docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG .
          docker tag $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG $ECR_REGISTRY/$ECR_REPOSITORY:latest
          docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
          docker push $ECR_REGISTRY/$ECR_REPOSITORY:latest

      - name: Update kube config
        run: aws eks update-kubeconfig --name $EKS_CLUSTER_NAME --region $AWS_REGION

      - name: Deploy to EKS
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          IMAGE_TAG: ${{ github.sha }}
        run: |
          # Update image in deployment
          kubectl set image deployment/web-frontend \
            web-frontend=$ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG \
            --namespace=production

          # Wait for rollout to complete
          kubectl rollout status deployment/web-frontend --namespace=production

      - name: Verify deployment
        run: |
          kubectl get pods --namespace=production
          kubectl get services --namespace=production
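
A hardening note: long-lived access keys in repository secrets work, but GitHub's OIDC integration avoids storing AWS credentials at all. A sketch of the swapped-in credentials step, assuming an IAM role (name illustrative) whose trust policy accepts GitHub's OIDC provider:

# Replaces the access-key step above; also requires at the job or workflow level:
# permissions:
#   id-token: write
#   contents: read
- name: Configure AWS credentials (OIDC)
  uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::123456789012:role/github-actions-deploy  # illustrative
    aws-region: us-west-2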

2. Helm Charts for Complex Applications

For more complex applications, use Helm:

# helm/Chart.yaml
apiVersion: v2
name: my-app
description: My Application Helm Chart
version: 0.1.0
appVersion: "1.0"

# helm/values.yaml
replicaCount: 3

image:
  repository: my-account.dkr.ecr.us-west-2.amazonaws.com/my-app
  tag: "latest"
  pullPolicy: IfNotPresent

service:
  type: ClusterIP
  port: 80
  targetPort: 3000

ingress:
  enabled: true
  className: "alb"
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
  hosts:
    - host: myapp.example.com
      paths:
        - path: /
          pathType: Prefix

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 250m
    memory: 256Mi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 20
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80
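
Deploying the chart is then a single idempotent command; the release name and tag here are illustrative:

# Install or upgrade the release in one step
helm upgrade --install my-app ./helm \
  --namespace production \
  --create-namespace \
  --set image.tag=v1.2.3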

Phase 5: Monitoring and Observability

1. Prometheus and Grafana Setup

# monitoring-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring

---
# prometheus-values.yaml for Helm
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: gp3
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi

grafana:
  persistence:
    enabled: true
    storageClassName: gp3
    size: 10Gi

  adminPassword: "your-secure-password" # prefer a referenced Secret in production

  ingress:
    enabled: true
    ingressClassName: alb
    annotations:
      alb.ingress.kubernetes.io/scheme: internet-facing
      alb.ingress.kubernetes.io/target-type: ip
    hosts:
      - grafana.your-domain.com
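
These values are written for the prometheus-community kube-prometheus-stack chart; installing it might look like:

# Install the monitoring stack with the values above
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm upgrade --install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values prometheus-values.yaml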

2. Application Monitoring

# app-monitoring.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: web-frontend
  namespace: production
spec:
  selector:
    matchLabels:
      app: web-frontend
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics

---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: web-frontend-alerts
  namespace: production
spec:
  groups:
    - name: web-frontend
      rules:
        - alert: HighErrorRate
          expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.1
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: High error rate detected
            description: "Error rate is {{ $value }} errors per second"

        - alert: HighMemoryUsage
          expr: container_memory_usage_bytes{pod=~"web-frontend-.*"} / container_spec_memory_limit_bytes > 0.8
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: High memory usage
            description: "Memory usage is above 80%"

Migration Strategy and Timeline

Week 1-2: Infrastructure Setup

  • Set up EKS cluster with Terraform
  • Configure networking and security groups
  • Install essential add-ons (ALB controller, CSI drivers)
  • Set up ECR repositories

Week 3-4: Application Containerization

  • Containerize applications
  • Create Kubernetes manifests
  • Set up development environment
  • Implement health checks and probes

Week 5-6: Database and Storage Migration

  • Set up RDS instances (if not existing)
  • Configure persistent volumes
  • Test database connectivity from pods
  • Implement backup strategies

Week 7-8: CI/CD Pipeline

  • Set up GitHub Actions workflows
  • Create Helm charts (if needed)
  • Implement automated testing
  • Set up deployment approvals

Week 9-10: Monitoring and Testing

  • Deploy monitoring stack
  • Set up dashboards and alerts
  • Performance testing
  • Security scanning

Week 11-12: Production Migration

  • Blue-green deployment setup
  • Gradual traffic migration
  • Monitor and optimize
  • Decommission old infrastructure

Real-World Migration Case Study

Client: SaaS platform with 50+ microservices on EC2

Challenge:

  • High operational overhead managing EC2 instances
  • Inconsistent deployment processes across teams
  • Difficulty scaling during traffic spikes
  • Manual security patching and updates

Solution:

  1. Phased Migration: Migrated services in groups of 5-10
  2. Service Mesh: Implemented Istio for inter-service communication
  3. GitOps: Used ArgoCD for declarative deployments
  4. Observability: Comprehensive monitoring with Prometheus/Grafana

Results:

  • 75% reduction in deployment time
  • 60% improvement in resource utilization
  • 99.9% uptime achieved through auto-healing
  • 50% reduction in operational overhead

Common Pitfalls and How to Avoid Them

1. Resource Sizing

# ❌ Bad: No resource limits
containers:
  - name: app
    image: my-app:latest

# ✅ Good: Proper resource management
containers:
  - name: app
    image: my-app:latest
    resources:
      requests:
        memory: "256Mi"
        cpu: "250m"
      limits:
        memory: "512Mi"
        cpu: "500m"

2. Security Configuration

# ❌ Bad: Running as root
spec:
  containers:
    - name: app
      image: my-app:latest

# ✅ Good: Non-root user with security context
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000
  containers:
    - name: app
      image: my-app:latest
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
        readOnlyRootFilesystem: true

3. Networking Issues

# Ensure proper network policies
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-frontend-netpol
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: web-frontend
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              # use the well-known label; a custom "name" label isn't set by default
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: TCP
          port: 3000
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: api-backend
      ports:
        - protocol: TCP
          port: 8080
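
One gotcha with a policy like this: once egress is restricted, DNS lookups are blocked too, and the frontend can no longer resolve api-backend. An additional egress rule to merge into the policy's egress list (a sketch, assuming CoreDNS runs in kube-system):

# Allow DNS resolution alongside the app-to-app egress rule above
egress:
  - to:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: kube-system
    ports:
      - protocol: UDP
        port: 53
      - protocol: TCP
        port: 53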

Post-Migration Optimization

1. Cost Optimization

# Use Spot instances for non-critical workloads (the Terraform node group
# taint already does this; shown here as the manual equivalent)
kubectl taint nodes <node-name> spot=true:NoSchedule

# Set up Cluster Autoscaler (edit the manifest's --node-group-auto-discovery
# value to reference your cluster name before applying)
kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml

2. Performance Tuning

# Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-frontend-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-frontend
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
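
The HPA only works if the cluster can serve CPU and memory metrics, which on EKS means installing metrics-server if it isn't already present:

# Install metrics-server, then confirm the HPA reports current utilization
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl get hpa web-frontend-hpa -n production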

Conclusion

Migrating to Kubernetes on AWS EKS is a significant undertaking that requires careful planning, execution, and ongoing optimization. The benefits—improved scalability, operational efficiency, and cost optimization—make it worthwhile for most organizations running containerized workloads.

Key success factors include:

  • Thorough planning and application assessment
  • Phased migration approach to minimize risk
  • Proper monitoring and observability from day one
  • Team training on Kubernetes concepts and operations
  • Continuous optimization based on real-world usage patterns

Remember that migration to Kubernetes is not just a technology change—it's also a cultural shift toward DevOps practices and shared responsibility for application lifecycle management.


Ready to migrate to Kubernetes? Our team has successfully migrated hundreds of applications to EKS and can help you plan and execute your migration strategy. Contact us for a consultation.
