
Migrating to Kubernetes on AWS EKS - A Step-by-Step Guide

Mike Thompson · Senior Cloud Engineer @ CloudCraft · 11 min read

Kubernetes has become the de facto standard for container orchestration, and AWS EKS (Elastic Kubernetes Service) makes it easier than ever to run Kubernetes in the cloud. In this comprehensive guide, we'll walk through a real-world migration from traditional EC2-based applications to a modern, scalable Kubernetes architecture on EKS.

Why Migrate to Kubernetes on EKS?

After helping dozens of companies migrate to Kubernetes, we've seen consistent benefits:

Scalability and Efficiency

  • Auto-scaling: Horizontal and vertical pod auto-scaling based on metrics
  • Resource efficiency: Better resource utilization compared to traditional VMs
  • Multi-tenancy: Run multiple applications on shared infrastructure

Operational Benefits

  • Standardized deployments: Consistent deployment patterns across environments
  • Rolling updates: Zero-downtime deployments with automatic rollback
  • Service discovery: Built-in service discovery and load balancing

AWS EKS Advantages

  • Managed control plane: AWS provisions, patches, and upgrades the Kubernetes control plane for you
  • AWS integrations: Native integration with AWS services (ALB, IAM, VPC)
  • Security: Built-in security best practices and compliance certifications

Pre-Migration Assessment

Before starting your migration, assess your current infrastructure:

Application Inventory

# Create an inventory of your current applications
echo "Application,Instances,Dependencies,Database,Storage" > app_inventory.csv

# Example inventory items
echo "web-frontend,3,api-backend,none,EBS" >> app_inventory.csv
echo "api-backend,5,database,PostgreSQL,EFS" >> app_inventory.csv
echo "background-jobs,2,redis+database,Redis+PostgreSQL,EBS" >> app_inventory.csv

Resource Analysis

# Analyze current resource usage
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --start-time 2024-02-01T00:00:00Z \
  --end-time 2024-03-01T00:00:00Z \
  --period 3600 \
  --statistics Average,Maximum
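
To turn that JSON into a single number for right-sizing, you can pipe the same call through jq (assuming jq is installed); this computes the mean of the hourly averages:

# Mean CPU utilization over the period (mean of hourly Average datapoints)
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --start-time 2024-02-01T00:00:00Z \
  --end-time 2024-03-01T00:00:00Z \
  --period 3600 \
  --statistics Average \
  --output json | jq '[.Datapoints[].Average] | add / length'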

Phase 1: EKS Cluster Setup

1. Infrastructure as Code Setup

First, let's set up our EKS cluster using Terraform:

# eks-cluster.tf
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0" # access_entries requires v20+ of the module

  cluster_name    = var.cluster_name
  cluster_version = "1.28"

  vpc_id                         = module.vpc.vpc_id
  subnet_ids                     = module.vpc.private_subnets
  cluster_endpoint_public_access = true

  # EKS Managed Node Groups
  eks_managed_node_groups = {
    general = {
      min_size     = 2
      max_size     = 10
      desired_size = 3

      instance_types = ["t3.medium"]
      capacity_type  = "ON_DEMAND"

      labels = {
        Environment = var.environment
        NodeGroup   = "general"
      }

      update_config = {
        max_unavailable_percentage = 25
      }
    }

    spot = {
      min_size     = 0
      max_size     = 5
      desired_size = 2

      instance_types = ["t3.medium", "t3.large"]
      capacity_type  = "SPOT"

      labels = {
        Environment = var.environment
        NodeGroup   = "spot"
      }

      taints = {
        dedicated = {
          key    = "spot"
          value  = "true"
          effect = "NO_SCHEDULE"
        }
      }
    }
  }

  # Cluster access entry (note: access entries don't allow "system:" groups;
  # the AmazonEKSClusterAdminPolicy association grants admin access instead)
  access_entries = {
    admin = {
      principal_arn = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/eks-admin"

      policy_associations = {
        admin = {
          policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
          access_scope = {
            type = "cluster"
          }
        }
      }
    }
  }

  tags = {
    Environment = var.environment
    Terraform   = "true"
  }
}
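
With the module in place, provisioning and connecting is the standard Terraform workflow. A sketch, assuming the cluster is named my-cluster in us-west-2:

# Provision the cluster and point kubectl at it
terraform init
terraform plan -out=tfplan
terraform apply tfplan

# Update the local kubeconfig (name must match var.cluster_name)
aws eks update-kubeconfig --region us-west-2 --name my-cluster
kubectl get nodes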

2. Essential Add-ons Setup

# eks-addons.tf
resource "aws_eks_addon" "vpc_cni" {
  cluster_name = module.eks.cluster_name
  addon_name   = "vpc-cni"
}

resource "aws_eks_addon" "coredns" {
  cluster_name = module.eks.cluster_name
  addon_name   = "coredns"
}

resource "aws_eks_addon" "kube_proxy" {
  cluster_name = module.eks.cluster_name
  addon_name   = "kube-proxy"
}

resource "aws_eks_addon" "ebs_csi" {
  cluster_name = module.eks.cluster_name
  addon_name   = "aws-ebs-csi-driver"
}
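
A quick sanity check after terraform apply: each add-on should report ACTIVE, and its pods should be Running in kube-system.

# Confirm the add-ons are healthy (repeat for coredns, kube-proxy, aws-ebs-csi-driver)
aws eks describe-addon --cluster-name my-cluster --addon-name vpc-cni \
  --query 'addon.status'

kubectl get pods -n kube-system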

3. Application Load Balancer Controller

# aws-load-balancer-controller.yaml (abbreviated; RBAC, webhook, and
# IngressClass resources are omitted)
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: aws-load-balancer-controller
  name: aws-load-balancer-controller
  namespace: kube-system
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT-ID:role/AmazonEKSLoadBalancerControllerRole
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: aws-load-balancer-controller
  name: aws-load-balancer-controller
  namespace: kube-system
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/component: controller
      app.kubernetes.io/name: aws-load-balancer-controller
  template:
    metadata:
      labels:
        app.kubernetes.io/component: controller
        app.kubernetes.io/name: aws-load-balancer-controller
    spec:
      containers:
        - args:
            - --cluster-name=my-cluster
            - --ingress-class=alb
          image: public.ecr.aws/eks/aws-load-balancer-controller:v2.7.2
          name: controller
          resources:
            limits:
              cpu: 200m
              memory: 500Mi
            requests:
              cpu: 100m
              memory: 200Mi
      serviceAccountName: aws-load-balancer-controller
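
In practice the controller is usually installed from the eks-charts Helm repository rather than raw manifests, which pulls in the omitted RBAC and webhook pieces automatically; roughly:

# Install the controller via Helm, reusing the IRSA service account above
helm repo add eks https://aws.github.io/eks-charts
helm repo update
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=my-cluster \
  --set serviceAccount.create=false \
  --set serviceAccount.name=aws-load-balancer-controller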

Phase 2: Application Containerization

1. Containerizing Applications

Here's an example Dockerfile for a Node.js application:

# Multi-stage build for efficiency
FROM node:18-alpine AS builder

WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev && npm cache clean --force

# Production stage
FROM node:18-alpine AS production

# Create non-root user
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001

WORKDIR /app

# Copy installed dependencies, then application code
COPY --from=builder /app/node_modules ./node_modules
COPY --chown=nodejs:nodejs . .

USER nodejs

EXPOSE 3000

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD node healthcheck.js

CMD ["node", "index.js"]

2. Kubernetes Manifests

Create Kubernetes manifests for your applications:

# web-frontend-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
  namespace: production
  labels:
    app: web-frontend
    version: v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
        version: v1
    spec:
      containers:
        - name: web-frontend
          image: your-account.dkr.ecr.us-west-2.amazonaws.com/web-frontend:v1.2.3
          ports:
            - containerPort: 3000
          env:
            - name: NODE_ENV
              value: production
            - name: API_URL
              value: http://api-backend:8080
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5

---
apiVersion: v1
kind: Service
metadata:
  name: web-frontend
  namespace: production
spec:
  selector:
    app: web-frontend
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000
  type: ClusterIP

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-frontend
  namespace: production
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-west-2:123456789012:certificate/12345678-1234-1234-1234-123456789012
    # ssl-redirect requires an explicit HTTPS listener
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/ssl-redirect: '443'
spec:
  ingressClassName: alb # the kubernetes.io/ingress.class annotation is deprecated
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-frontend
                port:
                  number: 80
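
Applying and verifying these manifests might look like the following; the ALB's DNS name appears in the Ingress ADDRESS column once provisioning finishes:

# Create the namespace idempotently, apply, and watch the rollout
kubectl create namespace production --dry-run=client -o yaml | kubectl apply -f -
kubectl apply -f web-frontend-deployment.yaml
kubectl rollout status deployment/web-frontend -n production
kubectl get ingress web-frontend -n production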

Phase 3: Database and Storage Migration

1. Database Strategy

For stateful services, consider these patterns:

Option 1: External RDS (Recommended)

# database-config.yaml
apiVersion: v1
kind: Secret
metadata:
  name: database-secret
  namespace: production
type: Opaque
data:
  host: <base64-encoded-rds-endpoint>
  username: <base64-encoded-username>
  password: <base64-encoded-password>
  database: <base64-encoded-database-name>

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: database-config
  namespace: production
data:
  port: "5432"
  ssl_mode: "require"
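
Rather than hand-encoding base64 values, you can let kubectl do it; the endpoint and credentials below are placeholders:

# Create the secret from literals (kubectl handles the base64 encoding)
kubectl create secret generic database-secret \
  --namespace production \
  --from-literal=host=mydb.abc123.us-west-2.rds.amazonaws.com \
  --from-literal=username=appuser \
  --from-literal=password='change-me' \
  --from-literal=database=myapp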

Option 2: In-cluster Database (Development only)

# postgres-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: development
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:15
          env:
            - name: POSTGRES_DB
              value: myapp
            - name: POSTGRES_USER
              value: myuser
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
            # Use a subdirectory so initdb doesn't trip over the volume's lost+found
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          volumeMounts:
            - name: postgres-storage
              mountPath: /var/lib/postgresql/data
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "1000m"
  volumeClaimTemplates:
    - metadata:
        name: postgres-storage
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: gp3
        resources:
          requests:
            storage: 20Gi
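
The StatefulSet's serviceName field expects a headless Service that the manifests above don't yet define; a minimal one would be:

# postgres-service.yaml — headless Service backing the StatefulSet
apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: development
spec:
  clusterIP: None
  selector:
    app: postgres
  ports:
    - port: 5432
      targetPort: 5432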

2. Storage Classes

Configure appropriate storage classes:

# storage-classes.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
  # gp3 caps throughput at 0.25 MiB/s per provisioned IOPS, so 3000 IOPS
  # supports at most 750 MiB/s; 125 is the baseline
  throughput: "125"
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer

---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-1234567890abcdef0
  directoryPerms: "700"
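
After applying, you may want to mark gp3 as the cluster default so PVCs without an explicit storageClassName land on it:

# Apply the storage classes and make gp3 the default
kubectl apply -f storage-classes.yaml
kubectl patch storageclass gp3 \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'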

Phase 4: CI/CD Pipeline Migration

1. GitHub Actions for Kubernetes

# .github/workflows/deploy-to-eks.yml
name: Deploy to EKS

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  AWS_REGION: us-west-2
  EKS_CLUSTER_NAME: my-cluster
  ECR_REPOSITORY: my-app

jobs:
  build-and-deploy:
    # Only deploy on pushes to main; PR runs would otherwise ship to production
    if: github.event_name == 'push'
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2

      - name: Build, tag, and push image to Amazon ECR
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          IMAGE_TAG: ${{ github.sha }}
        run: |
          docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG .
          docker tag $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG $ECR_REGISTRY/$ECR_REPOSITORY:latest
          docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
          docker push $ECR_REGISTRY/$ECR_REPOSITORY:latest

      - name: Update kube config
        run: aws eks update-kubeconfig --name $EKS_CLUSTER_NAME --region $AWS_REGION

      - name: Deploy to EKS
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          IMAGE_TAG: ${{ github.sha }}
        run: |
          # Update image in deployment
          kubectl set image deployment/web-frontend \
            web-frontend=$ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG \
            --namespace=production

          # Wait for rollout to complete
          kubectl rollout status deployment/web-frontend --namespace=production

      - name: Verify deployment
        run: |
          kubectl get pods --namespace=production
          kubectl get services --namespace=production
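
A hardening note: long-lived access keys in repository secrets work, but GitHub's OIDC integration avoids storing AWS credentials at all. A sketch of the swapped-in credentials step, assuming an IAM role (name illustrative) whose trust policy accepts GitHub's OIDC provider:

# Replaces the access-key step above; also requires at the job or workflow level:
# permissions:
#   id-token: write
#   contents: read
- name: Configure AWS credentials (OIDC)
  uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::123456789012:role/github-actions-deploy  # illustrative
    aws-region: us-west-2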

2. Helm Charts for Complex Applications

For more complex applications, use Helm:

# helm/Chart.yaml
apiVersion: v2
name: my-app
description: My Application Helm Chart
version: 0.1.0
appVersion: "1.0"

# helm/values.yaml
replicaCount: 3

image:
  repository: my-account.dkr.ecr.us-west-2.amazonaws.com/my-app
  tag: "latest"
  pullPolicy: IfNotPresent

service:
  type: ClusterIP
  port: 80
  targetPort: 3000

ingress:
  enabled: true
  className: "alb"
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
  hosts:
    - host: myapp.example.com
      paths:
        - path: /
          pathType: Prefix

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 250m
    memory: 256Mi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 20
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80
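
Deploying the chart is then a single idempotent command; the release name and tag here are illustrative:

# Install or upgrade the release in one step
helm upgrade --install my-app ./helm \
  --namespace production \
  --create-namespace \
  --set image.tag=v1.2.3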

Phase 5: Monitoring and Observability

1. Prometheus and Grafana Setup

# monitoring-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring

---
# prometheus-values.yaml for Helm
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: gp3
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi

grafana:
  persistence:
    enabled: true
    storageClassName: gp3
    size: 10Gi

  adminPassword: "your-secure-password" # prefer a referenced Secret in production

  ingress:
    enabled: true
    ingressClassName: alb
    annotations:
      alb.ingress.kubernetes.io/scheme: internet-facing
      alb.ingress.kubernetes.io/target-type: ip
    hosts:
      - grafana.your-domain.com
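
These values are written for the prometheus-community kube-prometheus-stack chart; installing it might look like:

# Install the monitoring stack with the values above
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm upgrade --install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values prometheus-values.yaml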

2. Application Monitoring

# app-monitoring.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: web-frontend
  namespace: production
spec:
  selector:
    matchLabels:
      app: web-frontend
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics

---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: web-frontend-alerts
  namespace: production
spec:
  groups:
    - name: web-frontend
      rules:
        - alert: HighErrorRate
          expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.1
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: High error rate detected
            description: "Error rate is {{ $value }} errors per second"

        - alert: HighMemoryUsage
          expr: container_memory_usage_bytes{pod=~"web-frontend-.*"} / container_spec_memory_limit_bytes > 0.8
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: High memory usage
            description: "Memory usage is above 80%"

Migration Strategy and Timeline

Week 1-2: Infrastructure Setup

  • Set up EKS cluster with Terraform
  • Configure networking and security groups
  • Install essential add-ons (ALB controller, CSI drivers)
  • Set up ECR repositories

Week 3-4: Application Containerization

  • Containerize applications
  • Create Kubernetes manifests
  • Set up development environment
  • Implement health checks and probes

Week 5-6: Database and Storage Migration

  • Set up RDS instances (if not existing)
  • Configure persistent volumes
  • Test database connectivity from pods
  • Implement backup strategies

Week 7-8: CI/CD Pipeline

  • Set up GitHub Actions workflows
  • Create Helm charts (if needed)
  • Implement automated testing
  • Set up deployment approvals

Week 9-10: Monitoring and Testing

  • Deploy monitoring stack
  • Set up dashboards and alerts
  • Performance testing
  • Security scanning

Week 11-12: Production Migration

  • Blue-green deployment setup
  • Gradual traffic migration
  • Monitor and optimize
  • Decommission old infrastructure

Real-World Migration Case Study

Client: SaaS platform with 50+ microservices on EC2

Challenge:

  • High operational overhead managing EC2 instances
  • Inconsistent deployment processes across teams
  • Difficulty scaling during traffic spikes
  • Manual security patching and updates

Solution:

  1. Phased Migration: Migrated services in groups of 5-10
  2. Service Mesh: Implemented Istio for inter-service communication
  3. GitOps: Used ArgoCD for declarative deployments
  4. Observability: Comprehensive monitoring with Prometheus/Grafana

Results:

  • 75% reduction in deployment time
  • 60% improvement in resource utilization
  • 99.9% uptime achieved through auto-healing
  • 50% reduction in operational overhead

Common Pitfalls and How to Avoid Them

1. Resource Sizing

# ❌ Bad: No resource limits
containers:
  - name: app
    image: my-app:latest

# ✅ Good: Proper resource management
containers:
  - name: app
    image: my-app:latest
    resources:
      requests:
        memory: "256Mi"
        cpu: "250m"
      limits:
        memory: "512Mi"
        cpu: "500m"

2. Security Configuration

# ❌ Bad: Running as root
spec:
  containers:
    - name: app
      image: my-app:latest

# ✅ Good: Non-root user with security context
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000
  containers:
    - name: app
      image: my-app:latest
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
        readOnlyRootFilesystem: true

3. Networking Issues

# Ensure proper network policies
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-frontend-netpol
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: web-frontend
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              # use the well-known label; a custom "name" label isn't set by default
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: TCP
          port: 3000
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: api-backend
      ports:
        - protocol: TCP
          port: 8080
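
One gotcha with a policy like this: once egress is restricted, DNS lookups are blocked too, and the frontend can no longer resolve api-backend. An additional egress rule to merge into the policy's egress list (a sketch, assuming CoreDNS runs in kube-system):

# Allow DNS resolution alongside the app-to-app egress rule above
egress:
  - to:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: kube-system
    ports:
      - protocol: UDP
        port: 53
      - protocol: TCP
        port: 53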

Post-Migration Optimization

1. Cost Optimization

# Use Spot instances for non-critical workloads (the Terraform node group
# taint already does this; shown here as the manual equivalent)
kubectl taint nodes <node-name> spot=true:NoSchedule

# Set up Cluster Autoscaler (edit the manifest's --node-group-auto-discovery
# value to reference your cluster name before applying)
kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml

2. Performance Tuning

# Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-frontend-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-frontend
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
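
The HPA only works if the cluster can serve CPU and memory metrics, which on EKS means installing metrics-server if it isn't already present:

# Install metrics-server, then confirm the HPA reports current utilization
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl get hpa web-frontend-hpa -n production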

Conclusion

Migrating to Kubernetes on AWS EKS is a significant undertaking that requires careful planning, execution, and ongoing optimization. The benefits—improved scalability, operational efficiency, and cost optimization—make it worthwhile for most organizations running containerized workloads.

Key success factors include:

  • Thorough planning and application assessment
  • Phased migration approach to minimize risk
  • Proper monitoring and observability from day one
  • Team training on Kubernetes concepts and operations
  • Continuous optimization based on real-world usage patterns

Remember that migration to Kubernetes is not just a technology change—it's also a cultural shift toward DevOps practices and shared responsibility for application lifecycle management.


Ready to migrate to Kubernetes? Our team has successfully migrated hundreds of applications to EKS and can help you plan and execute your migration strategy. Contact us for a consultation.
