Fargate for EKS is a great serverless service from AWS, but serverless is ephemeral by nature: there is no way to run a DaemonSet on Fargate, so there is no way to run Datadog to collect metrics, logs, etc. When something goes wrong, it would be great if we could still check the logs even after the pod and the underlying node are gone. We can mount EFS to a pod running on Fargate; if we write logs to the EFS file system, the information will still be available after the pod gets destroyed.


There are already blog posts about how to enable EFS on Fargate, like https://aws.amazon.com/blogs/aws/new-aws-fargate-for-amazon-eks-now-supports-amazon-efs/, but they work for newly created EKS clusters only and assume the CSI driver is already installed in the cluster. In my case I have a really old EKS cluster that was upgraded to a newer version, so the CSI driver was never installed and has to be installed manually. Below are the steps I had to perform to make EFS work on my Fargate for EKS.
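
Before going through the steps, it's worth checking whether the driver is already installed; a quick check, with names matching the manifests later in this post:

kubectl get csidriver efs.csi.aws.com
kubectl get deployment efs-csi-controller -n kube-system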

create an IAM policy for the CSI driver

  • save the content below to efs-csi-iam-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticfilesystem:DescribeAccessPoints",
        "elasticfilesystem:DescribeFileSystems"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "elasticfilesystem:CreateAccessPoint"
      ],
      "Resource": "*",
      "Condition": {
        "StringLike": {
          "aws:RequestTag/efs.csi.aws.com/cluster": "true"
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": "elasticfilesystem:DeleteAccessPoint",
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/efs.csi.aws.com/cluster": "true"
        }
      }
    }
  ]
}
  • create the policy:
aws iam create-policy \
    --policy-name AmazonEKS_EFS_CSI_Driver_Policy \
    --policy-document file://efs-csi-iam-policy.json

create an IAM role for the CSI driver

  • get the OIDC provider of the cluster
aws eks describe-cluster --name acme-Development --query "cluster.identity.oidc.issuer" --output text
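  • if the cluster doesn't have an IAM OIDC provider associated yet (likely on an old cluster like mine), associate one first; a sketch, assuming eksctl is installed:
eksctl utils associate-iam-oidc-provider --cluster acme-Development --approve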
  • create the trust policy by saving the content below to efs-csi-iam-role.json (replace the xxxx placeholders with your account ID and OIDC provider ID)
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::xxxx:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/xxxx"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/xxxx:sub": "system:serviceaccount:kube-system:efs-csi-controller-sa"
        }
      }
    }
  ]
}
  • create the IAM role
aws iam create-role \
  --role-name AmazonEKS_EFS_CSI_DriverRole \
  --assume-role-policy-document file://efs-csi-iam-role.json
  • attach the policy to the role
aws iam attach-role-policy \
  --policy-arn arn:aws:iam::xxxxx:policy/AmazonEKS_EFS_CSI_Driver_Policy \
  --role-name AmazonEKS_EFS_CSI_DriverRole

install the CSI driver

  • create the service account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: efs-csi-controller-sa
  namespace: kube-system
  labels:
    app.kubernetes.io/name: aws-efs-csi-driver
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::xxxxx:role/AmazonEKS_EFS_CSI_DriverRole
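  • apply the service account manifest (the file name efs-csi-sa.yaml is my own choice)
kubectl apply -f efs-csi-sa.yaml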
  • save the driver manifest below as driver.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/name: aws-efs-csi-driver
  name: efs-csi-external-provisioner-role
rules:
  - apiGroups:
      - ""
    resources:
      - persistentvolumes
    verbs:
      - get
      - list
      - watch
      - create
      - delete
  - apiGroups:
      - ""
    resources:
      - persistentvolumeclaims
    verbs:
      - get
      - list
      - watch
      - update
  - apiGroups:
      - storage.k8s.io
    resources:
      - storageclasses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - events
    verbs:
      - list
      - watch
      - create
  - apiGroups:
      - storage.k8s.io
    resources:
      - csinodes
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - coordination.k8s.io
    resources:
      - leases
    verbs:
      - get
      - watch
      - list
      - delete
      - update
      - create
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/name: aws-efs-csi-driver
  name: efs-csi-provisioner-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: efs-csi-external-provisioner-role
subjects:
  - kind: ServiceAccount
    name: efs-csi-controller-sa
    namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: aws-efs-csi-driver
  name: efs-csi-controller
  namespace: kube-system
spec:
  replicas: 2
  selector:
    matchLabels:
      app: efs-csi-controller
      app.kubernetes.io/instance: kustomize
      app.kubernetes.io/name: aws-efs-csi-driver
  template:
    metadata:
      labels:
        app: efs-csi-controller
        app.kubernetes.io/instance: kustomize
        app.kubernetes.io/name: aws-efs-csi-driver
    spec:
      containers:
        - args:
            - --endpoint=$(CSI_ENDPOINT)
            - --logtostderr
            - --v=2
            - --delete-access-point-root-dir=false
          env:
            - name: CSI_ENDPOINT
              value: unix:///var/lib/csi/sockets/pluginproxy/csi.sock
          image: 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/aws-efs-csi-driver:v1.2.1
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /healthz
              port: healthz
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 3
          name: efs-plugin
          ports:
            - containerPort: 9909
              name: healthz
              protocol: TCP
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /var/lib/csi/sockets/pluginproxy/
              name: socket-dir
        - args:
            - --csi-address=$(ADDRESS)
            - --v=2
            - --feature-gates=Topology=true
            - --leader-election
          env:
            - name: ADDRESS
              value: /var/lib/csi/sockets/pluginproxy/csi.sock
          image: public.ecr.aws/eks-distro/kubernetes-csi/external-provisioner:v2.1.1-eks-1-18-2
          name: csi-provisioner
          volumeMounts:
            - mountPath: /var/lib/csi/sockets/pluginproxy/
              name: socket-dir
        - args:
            - --csi-address=/csi/csi.sock
            - --health-port=9909
          image: public.ecr.aws/eks-distro/kubernetes-csi/livenessprobe:v2.2.0-eks-1-18-2
          name: liveness-probe
          volumeMounts:
            - mountPath: /csi
              name: socket-dir
      hostNetwork: true
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: efs-csi-controller-sa
      tolerations:
        - operator: Exists
      volumes:
        - emptyDir: {}
          name: socket-dir
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app.kubernetes.io/name: aws-efs-csi-driver
  name: efs-csi-node
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: efs-csi-node
      app.kubernetes.io/instance: kustomize
      app.kubernetes.io/name: aws-efs-csi-driver
  template:
    metadata:
      labels:
        app: efs-csi-node
        app.kubernetes.io/instance: kustomize
        app.kubernetes.io/name: aws-efs-csi-driver
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: eks.amazonaws.com/compute-type
                    operator: NotIn
                    values:
                      - fargate
      containers:
        - args:
            - --endpoint=$(CSI_ENDPOINT)
            - --logtostderr
            - --v=2
          env:
            - name: CSI_ENDPOINT
              value: unix:/csi/csi.sock
          image: 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/aws-efs-csi-driver:v1.2.1
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /healthz
              port: healthz
            initialDelaySeconds: 10
            periodSeconds: 2
            timeoutSeconds: 3
          name: efs-plugin
          ports:
            - containerPort: 9809
              name: healthz
              protocol: TCP
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /var/lib/kubelet
              mountPropagation: Bidirectional
              name: kubelet-dir
            - mountPath: /csi
              name: plugin-dir
            - mountPath: /var/run/efs
              name: efs-state-dir
            - mountPath: /var/amazon/efs
              name: efs-utils-config
            - mountPath: /etc/amazon/efs-legacy
              name: efs-utils-config-legacy
        - args:
            - --csi-address=$(ADDRESS)
            - --kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)
            - --v=2
          env:
            - name: ADDRESS
              value: /csi/csi.sock
            - name: DRIVER_REG_SOCK_PATH
              value: /var/lib/kubelet/plugins/efs.csi.aws.com/csi.sock
            - name: KUBE_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          image: public.ecr.aws/eks-distro/kubernetes-csi/node-driver-registrar:v2.1.0-eks-1-18-2
          name: csi-driver-registrar
          volumeMounts:
            - mountPath: /csi
              name: plugin-dir
            - mountPath: /registration
              name: registration-dir
        - args:
            - --csi-address=/csi/csi.sock
            - --health-port=9809
            - --v=2
          image: public.ecr.aws/eks-distro/kubernetes-csi/livenessprobe:v2.2.0-eks-1-18-2
          name: liveness-probe
          volumeMounts:
            - mountPath: /csi
              name: plugin-dir
      hostNetwork: true
      nodeSelector:
        beta.kubernetes.io/os: linux
      priorityClassName: system-node-critical
      tolerations:
        - operator: Exists
      volumes:
        - hostPath:
            path: /var/lib/kubelet
            type: Directory
          name: kubelet-dir
        - hostPath:
            path: /var/lib/kubelet/plugins/efs.csi.aws.com/
            type: DirectoryOrCreate
          name: plugin-dir
        - hostPath:
            path: /var/lib/kubelet/plugins_registry/
            type: Directory
          name: registration-dir
        - hostPath:
            path: /var/run/efs
            type: DirectoryOrCreate
          name: efs-state-dir
        - hostPath:
            path: /var/amazon/efs
            type: DirectoryOrCreate
          name: efs-utils-config
        - hostPath:
            path: /etc/amazon/efs
            type: DirectoryOrCreate
          name: efs-utils-config-legacy
---
apiVersion: storage.k8s.io/v1beta1
kind: CSIDriver
metadata:
  annotations:
    helm.sh/hook: pre-install, pre-upgrade
    helm.sh/hook-delete-policy: before-hook-creation
    helm.sh/resource-policy: keep
  name: efs.csi.aws.com
  namespace: kube-system
spec:
  attachRequired: false
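  • apply the manifest and check that the controller comes up (the label matches the Deployment above; exact pod names will differ)
kubectl apply -f driver.yaml
kubectl get pods -n kube-system -l app=efs-csi-controller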

create the StorageClass

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-xxxxx # the EFS file system ID, which can be found in the AWS console
  directoryPerms: "700"
  # gidRangeStart: "1000" # optional
  # gidRangeEnd: "2000" # optional
  # basePath: "/dynamic_provisioning" # optional
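
Apply it with kubectl apply -f storageclass.yaml (again, the file name is my own choice). One thing worth double-checking on an old cluster: the pod mounts EFS over NFS, so the file system needs mount targets in the cluster's subnets with a security group allowing inbound port 2049. You can list them with:

aws efs describe-mount-targets --file-system-id fs-xxxxx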

create the PV and PVC

apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv-flow
  namespace: flow
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-xxxx

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-claim-flow
  namespace: flow
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi
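
After applying both manifests, the claim should bind to the static PV right away, since they reference the same efs-sc storage class:

kubectl get pv efs-pv-flow
kubectl get pvc efs-claim-flow -n flow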

create the pod to test EFS

apiVersion: v1
kind: Pod
metadata:
  name: efs-app
  namespace: flow
  labels:
    infrastructure: fargate
spec:
  containers:
    - name: app
      image: centos
      command: ["/bin/sh"]
      args: ["-c", "while true; do echo $(date -u) >> /data/out; sleep 5; done"]
      volumeMounts:
        - name: persistent-storage
          mountPath: /data
  volumes:
    - name: persistent-storage
      persistentVolumeClaim:
        claimName: efs-claim-flow

I've created a Fargate profile matching the label infrastructure: fargate on the flow namespace, so the pod gets scheduled on Fargate.
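
A sketch of creating such a profile with eksctl (the profile name flow-fargate is my own choice):

eksctl create fargateprofile \
  --cluster acme-Development \
  --name flow-fargate \
  --namespace flow \
  --labels infrastructure=fargate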

After the pod is created, we can log in to it and check the file system: the /data folder has EFS mounted, so we can now access EFS from the pod!

kubectl exec -it efs-app -n flow -- bash
[root@efs-app /]# df -h
Filesystem      Size  Used Avail Use% Mounted on
overlay          30G   10G   18G  36% /
tmpfs            64M     0   64M   0% /dev
tmpfs           2.0G     0  2.0G   0% /sys/fs/cgroup
127.0.0.1:/     8.0E  402G  8.0E   1% /data
overlay          30G   10G   18G  36% /etc/hosts
/dev/xvdcz       30G   10G   18G  36% /etc/hostname
shm              64M     0   64M   0% /dev/shm
tmpfs           2.0G   12K  2.0G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs           2.0G     0  2.0G   0% /proc/acpi
tmpfs           2.0G     0  2.0G   0% /proc/scsi
tmpfs           2.0G     0  2.0G   0% /sys/firmware
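
And to confirm the whole point of this exercise, delete the pod, recreate it, and check that the data written before the deletion is still there (assuming the pod manifest above was saved as efs-app.yaml):

kubectl delete pod efs-app -n flow
kubectl apply -f efs-app.yaml
kubectl exec -it efs-app -n flow -- tail /data/out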