Kubernetes Fundamentals: Volumes

Published in

FAUN — Developer Community 🐾

Mar 19, 2021

Data storage in a distributed environment is complex in itself. Add to that virtualisation and automation and you have multiple layers of planning to do. So, how does data work in a Kubernetes cluster?

Here I assume you are familiar with the basic objects in Kubernetes and how they interact with each other.

There are 2 Parts

There are 2 components to volumes in Kubernetes —

The Cluster Object.
The Physical Storage.

The persistence and share-ability of these components are independent of each other. This means that if the object dies, the physical data is not necessarily lost or cleared. A new object can be created and attached to the same physical storage. For beginners, this part can be quite confusing, since persistence of the object is often equated with persistence of the data. They are not the same thing.

Multiple objects can point to the same physical storage. Persistence of the object is independent of the physical storage persistence — Volumes in Kubernetes

I — Cluster Objects

Kubernetes supports 2 types of objects —

Volumes
Persistent Volumes

Remember, we are talking about the cluster object, not the physical storage. Whether it is a Volume or a Persistent Volume, you have to plan which physical storage will back the object, that is, where the files will actually be saved.

Kubernetes Volumes

A Kubernetes Volume object is defined in a Pod and can be shared by the containers inside that Pod. The life of such an object is tightly tied to the life of the Pod: it persists across container restarts, but as soon as the Pod dies, the object dies with it.

Anytime you want to access data in a container of a Pod you have to use a Kubernetes Volume object — whether the data is from a Folder, ConfigMap, Secret, or a Kubernetes Persistent Volume.
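As a rough illustration of that point, the sketch below exposes a ConfigMap and a Secret to a container through two Kubernetes Volume objects. The names `app-settings` and `app-secret` are hypothetical and assumed to already exist in the same namespace as the Pod.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: config-demo              # hypothetical Pod name
spec:
  volumes:
    - name: app-config           # Volume object backed by a ConfigMap
      configMap:
        name: app-settings       # hypothetical ConfigMap, assumed to exist
    - name: app-credentials      # Volume object backed by a Secret
      secret:
        secretName: app-secret   # hypothetical Secret, assumed to exist
  containers:
    - name: web
      image: nginx
      volumeMounts:
        - name: app-config
          mountPath: /etc/app/config
          readOnly: true
        - name: app-credentials
          mountPath: /etc/app/credentials
          readOnly: true
```

Each key in the ConfigMap or Secret shows up as a file under the corresponding mount path.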

Kubernetes Persistent Volumes (PV)

Pods can share Kubernetes Persistent Volume objects, irrespective of the node on which a Pod is created. The life of a Persistent Volume object is independent of the Pods that use it. Remember, Kubernetes does not delete a Persistent Volume object when those Pods die. What happens once its claim is deleted depends on the reclaim policy: with Retain (the default for manually created PVs) you always have to trigger the deletion yourself, whereas dynamically provisioned PVs usually default to Delete.
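The reclaim policy on a PV states what should happen to it after its claim is released. A minimal sketch, assuming a folder-backed PV with a hypothetical name and path: `Retain` asks Kubernetes to keep both the PV object and its data until someone deletes them by hand.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-retained                        # hypothetical name
spec:
  storageClassName: manual
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain    # keep the PV and its data after the claim is deleted
  hostPath:
    path: /mnt/retained                    # hypothetical folder on the node
```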

Just because the Persistent Volume object can be shared across Pods, it does not mean that Pods on different nodes will have access to the same files. That depends on the physical storage. If the physical storage is a simple folder on the node, Pods on different nodes will not necessarily see the same files. If, on the other hand, the physical storage is an NFS volume, Pods on different nodes can see the same files. Beginners tend to stumble on this.
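The difference shows up in the PV definition itself. Here is a minimal sketch of the two cases side by side; the NFS server address and export path are hypothetical placeholders, as are the object names.

```yaml
# Folder on the node: every node has its own, separate /mnt/data
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-local-folder            # hypothetical name
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /mnt/data
---
# NFS export: every node talks to the same server, so the files are shared
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nfs-share               # hypothetical name
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: nfs.example.internal   # hypothetical NFS server
    path: /exports/data            # hypothetical export path
```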

For a container to access a Kubernetes Persistent Volume, its Pod has to define a Kubernetes Volume backed by that Persistent Volume. Remember, containers can only access/mount Kubernetes Volumes.

How to define and use a Persistent Volume

Defining and using a Kubernetes Persistent Volume isn’t as simple as defining a Kubernetes Volume. The easiest way to understand this is by breaking it down into 2 steps:

Splitting the physical storage into multiple cluster objects of defined size: the Kubernetes Persistent Volume object (PV)
Describing the type of storage your application needs and finding the cluster object that matches this requirement: the Kubernetes Persistent Volume Claim object (PVC)

When a PVC object is created, K8s searches through all available PV objects for one that matches the PVC. As soon as a match is found, the PV object is bound to the claim. It is a 1–1 relationship, i.e., one PV object can be bound to only one PVC and vice versa. This PVC object is then used by the Pod.

II — Physical Storage

Kubernetes supports a lot of volume types (that is, physical storage solutions). You can find a complete list in the official k8s documentation. The parameters that need to be passed change depending on the type of volume (storage solution) you choose.

An interesting point to remember is that some of these storage solutions can be dynamically provisioned. This is especially useful when you want to work with StatefulSet objects.
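A rough sketch of what that looks like for a StatefulSet: instead of pre-creating PVs, the `volumeClaimTemplates` section describes a claim, and one PVC is created per replica. This assumes a StorageClass called `standard` with a dynamic provisioner and a headless Service called `web`; both names are hypothetical and not part of the original article.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web                   # headless Service, assumed to exist
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: nginx
          image: nginx
          volumeMounts:
            - name: data             # refers to the claim template below
              mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        storageClassName: standard   # hypothetical class with a dynamic provisioner
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
```

With two replicas this results in two claims, `data-web-0` and `data-web-1`, each backed by its own dynamically provisioned PV.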

Kubernetes Volume Syntax

Remember,

K8s volumes do not exist independently; they are tightly coupled with the life of the Pod. Hence, they are defined as part of your Pod spec.
You can define multiple K8s volume objects inside a Pod.
The same K8s volume object can be mounted inside multiple containers in a Pod.

apiVersion: v1
kind: Pod
metadata:
  name: [POD NAME]
spec:
  volumes:
    - name: [K8s VOL OBJ NAME]
      [STORAGE SOLUTION TYPE]:
        [STORAGE SOLUTION PARAMETERS]
    - ....
    - ....
  containers:
    - name: [CONTAINER NAME]
      image: [CONTAINER IMAGE]
      volumeMounts:
        - name: [ANY K8s VOL OBJ NAME DEFINED ABOVE]
          mountPath: [PATH INSIDE THE CONTAINER TO MOUNT THE VOL]
    - name: [CONTAINER NAME]
      image: [CONTAINER IMAGE]
      volumeMounts:
        - name: [ANY K8s VOL OBJ NAME DEFINED ABOVE]
          mountPath: [PATH INSIDE THE CONTAINER TO MOUNT THE VOL]
    - ...
    - ...

Example:

A K8s Volume object with the name shared-data is defined
Both containers, first and second, mount the same object. Note that the mount path need not be the same in each container.
Container second keeps appending to a file in its mount path. That file is also visible in container first, since both share the same K8s volume object.
Here the physical storage solution used is emptyDir — which is an ephemeral storage solution.

apiVersion: v1
kind: Pod
metadata:
  name: two-containers
  labels:
    app: two
spec:
  volumes:
  - name: shared-data
    emptyDir: {}
  containers:
  - name: first
    image: nginx
    ports:
      - containerPort: 80
    volumeMounts:
      - name: shared-data
        mountPath: /usr/share/nginx/html
  - name: second
    image: debian
    volumeMounts:
    - name: shared-data
      mountPath: /pod-data
    command: ["/bin/sh"]
    args:
      - "-c"
      - >
        while true; do
          date >> /pod-data/index.html;
          echo Hello from the second container more >> /pod-data/index.html;
          sleep 20;
        done

Kubernetes Persistent Volume Syntax

Following the breakdown mentioned earlier, these are the steps to use a PV.

Step 1: Define the Kubernetes Persistent Volume Object
This is where you define the cluster object and the physical storage that backs the object. Every Kubernetes PV object defines these 4 attributes:

Capacity: The size of the volume object, written as a Kubernetes quantity with a unit suffix such as 2Gi (gibibytes). A bare number without a suffix is read as bytes.
AccessModes: Describes how the object can be accessed by the nodes: ReadWriteOnce, ReadOnlyMany or ReadWriteMany. The modes available depend on the underlying storage solution too. For example, NFS can support multiple readers and writers, but a specific NFS volume object can still be defined as read-only.
StorageClassName: The name of the storage class the PV belongs to. It identifies the type of physical storage, and a claim binds only to PVs of the class it requests.
Additional storage parameters: any values needed for the physical storage type

apiVersion: v1
kind: PersistentVolume
metadata:
  name: [OBJECT NAME]
spec:
  storageClassName: [STORAGE CLASS NAME]
  capacity:
    storage: [SIZE WITH UNIT, e.g. 2Gi]
  accessModes:
    - [ACCESS MODE; ReadWriteOnce/ReadOnlyMany/ReadWriteMany]
  [PHYSICAL STORAGE PARAMETERS]

Step 2: Define the Kubernetes Persistent Volume Claim Object
Once the volume objects are ready, we need to describe the type of volume our application needs. For this we create a PVC object. The PVC object describes the StorageClassName, AccessMode and Capacity needed for our application. Kubernetes then searches for a PV object that matches this claim and binds it to the claim.

A point to note here is that the StorageClassName has to match exactly and the PV has to offer the AccessMode the claim asks for, while the Capacity of the PV can be equal to or greater than the request. So, if a 1Gi object isn’t available, Kubernetes looks for a larger PV (≥ 1Gi) that can satisfy the claim.
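To make the capacity rule concrete, here is a sketch with two candidate PVs and one claim; all names and paths are hypothetical. Neither PV is exactly 1Gi, but both are large enough, so the claim can still bind, and we would expect the smaller 2Gi volume to be preferred.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-small              # hypothetical name
spec:
  storageClassName: manual
  capacity:
    storage: 2Gi              # larger than the request, so it qualifies
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /mnt/small          # hypothetical folder on the node
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-large              # hypothetical name
spec:
  storageClassName: manual
  capacity:
    storage: 5Gi              # also qualifies, but leaves more space unused
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /mnt/large          # hypothetical folder on the node
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: one-gig-claim         # hypothetical name
spec:
  storageClassName: manual    # same class as the PVs above
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi            # minimum size the application needs
```

Because the relationship is 1–1, the claim takes the whole PV it binds to, not just the 1Gi it asked for.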

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: [OBJECT NAME]
  labels:
    type: local
spec:
  storageClassName: [PHYSICAL STORAGE YOU NEED]
  accessModes:
    - [ACCESS MODE YOU NEED]
  resources:
    requests:
      storage: [SIZE YOU NEED, WITH UNIT, e.g. 1Gi]

Step 3: Use the Volume Claim object to create a Kubernetes Volume object in the Pod and mount it into the containers
As mentioned before, only Kubernetes Volume objects can be mounted inside a container. So,

first, you define a Kubernetes Volume in the Pod backed by the PVC
then, you mount it inside the container as usual

Remember, the rules for Kubernetes Volumes apply here too — you can mount this same Kubernetes Volume object inside multiple containers in the Pod.

apiVersion: v1
kind: Pod
metadata:
  name: [POD NAME]
spec:
  volumes:
    - name: [K8s VOL OBJ NAME]
      persistentVolumeClaim:
        claimName: [PVC OBJ NAME]
    - ....
    - ....
  containers:
    - name: [CONTAINER NAME]
      image: [CONTAINER IMAGE]
      volumeMounts:
        - name: [K8s VOL OBJ NAME]
          mountPath: [PATH INSIDE THE CONTAINER TO MOUNT THE VOL]
    - ...
    - ...

Example:

A PV object named pv-volume is created. The physical storage solution is a folder on the host (/mnt/data), hence the access mode is ReadWriteOnce (i.e., only 1 node can use it as ReadWrite).
A PVC object named pv-volume-claim is created. Its capacity requirement (1Gi) is smaller than the capacity of the available PV object (2Gi), so a successful match will happen.
The same PVC is connected to 2 Pods, task-pv-pod and task-pv-pod2, via Kubernetes Volume objects defined in each Pod spec (called pv-storage and pv-storage2 respectively).
Each Kubernetes Volume object is then mounted inside the container of its Pod.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv-volume-claim
  labels:
    type: local
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: task-pv-pod
spec:
  volumes:
    - name: pv-storage
      persistentVolumeClaim:
        claimName: pv-volume-claim
  containers:
    - name: pv-container
      image: nginx
      ports:
        - containerPort: 80
      volumeMounts:
        - name: pv-storage
          mountPath: "/usr/share/nginx/html"
---
apiVersion: v1
kind: Pod
metadata:
  name: task-pv-pod2
spec:
  volumes:
    - name: pv-storage2
      persistentVolumeClaim:
        claimName: pv-volume-claim
  containers:
    - name: pv-container2
      image: nginx
      ports:
        - containerPort: 80
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: pv-storage2