How to provision persistent volumes in Kubernetes
20 September 2018
I recently got the chance to speak at DevoxxFR. I loved the event, and even more the chats I had in the hallway track. I went there to talk about the different ways of managing stateful applications with Kubernetes. The talk raised a lot of questions, which inspired me to write this blog post.
Most of these questions were about persistent volume provisioning, so let's dive into this subject.
The 2 ways of managing it:

1. Manually creating the persistent volumes, which means creating every persistent volume on your own (the last thing you want to do).
2. Automatically creating the persistent volumes, by letting Kubernetes talk to your storage backend so that it can provision what it needs by itself.
If you do not know what a persistent volume is, that's OK. Let's say that you can see it as a network disk or a network drive, whatever rings a bell in your mind.
Some would argue that there is also the StatefulSet object, and I would answer that, from a state perspective, a StatefulSet just uses the automatic creation of persistent volumes, nothing more. However, StatefulSets provide a number of features that are usually useful when dealing with stateful applications, the most useful one being, in my opinion, the ordered deployment and deletion of pods.
Let's set some context
To make it easier to understand, let's set some context here. Say we are in a company called BigCo. BigCo has already rolled out Kubernetes to manage all the services it provides. John and Bob are 2 employees of BigCo: John is a sysadmin and Bob is a developer. One day, Bob comes to John with a request: he needs to run a MySQL database in the Kubernetes cluster hosting his application.
Manually creating the persistent volumes
After some research, John is ready to share a small POC with Bob. John just wants to validate the concept, so he chooses to create the persistent volumes manually himself and not get into automatic persistent volume creation.
The flow
So here is how the provisioning process works in this situation:

1. BigCo has a Kubernetes cluster and a storage backend up and running; they can communicate with each other.
2. John creates a persistent volume.
3. Bob creates a persistent volume claim referencing the persistent volume previously created.
4. Bob creates a pod and references the persistent volume claim in this pod.
5. Kubernetes mounts the persistent volume in the pod.
The code
This was tested on Minikube; if you are not running Minikube, there will be some minor changes to make, mainly in the persistent volume description.
1. Deploy a namespace (that's always the first thing to do):
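A namespace manifest is as small as Kubernetes objects get; the `bigco` name is only an example:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: bigco
```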
2. Deploy the persistent volume:
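On Minikube the easiest backing store is a `hostPath` directory on the node; the name, size, and path below are examples:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mysql-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /data/mysql-pv
```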
3. Deploy the persistent volume claim:
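The claim asks for storage matching the volume above; setting `volumeName` pins it explicitly to `mysql-pv`:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc
  namespace: bigco
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ""   # opt out of dynamic provisioning
  volumeName: mysql-pv
  resources:
    requests:
      storage: 5Gi
```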
4. Deploy the pod. Here we will be deploying it using a deployment object:
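A minimal MySQL deployment could look like the following; the image, password, and labels are illustrative, not production settings:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
  namespace: bigco
spec:
  replicas: 1
  strategy:
    type: Recreate             # never run two pods against the same volume
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:5.7
          env:
            - name: MYSQL_ROOT_PASSWORD
              value: changeme  # use a Secret in real life
          ports:
            - containerPort: 3306
          volumeMounts:
            - name: mysql-data
              mountPath: /var/lib/mysql
      volumes:
        - name: mysql-data
          persistentVolumeClaim:
            claimName: mysql-pvc
```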
5. Deploy the service:
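A plain ClusterIP service in front of the pod is enough:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql
  namespace: bigco
spec:
  selector:
    app: mysql
  ports:
    - port: 3306
```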
6. Deploy everything to the Kubernetes cluster using this command:
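Assuming each manifest above was saved to its own file in the current directory:

```sh
# Files are applied in alphabetical order, so name the namespace
# manifest something like 00-namespace.yaml to create it first.
kubectl apply -f .
```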
The result
Now if the pod dies and is rescheduled somewhere else, the data will not be lost.
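An easy way to convince yourself: insert a row in MySQL, kill the pod, and check that the row is still there once the replacement pod is running (the names match the example manifests above):

```sh
kubectl delete pod -n bigco -l app=mysql   # the deployment spawns a new pod
kubectl get pods -n bigco -w               # watch it come back up
```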
Limitation
John had to create a persistent volume before Bob could do his job. In BigCo, that usually means Bob had to submit a ticket to John, which is usually a source of slowdown in the process, something we want to avoid.
Automatically creating the persistent volumes
As Bob is happy with the solution proposed by John, he wants to spin up more instances. John quickly needs to industrialize the process, and after some additional research, he puts in place the automatic creation of the persistent volumes.
The flow
So here is how the provisioning process works in this situation:

1. Again, BigCo has a Kubernetes cluster and a storage backend up and running; they can communicate with each other.
2. John creates one or many storage classes; you can see a storage class as a type of storage (SSD, HDD, ...).
3. Bob creates a persistent volume claim, referring this time to one of the storage classes.
4. Kubernetes creates the persistent volume based on the persistent volume claim and the chosen storage class.
5. Bob creates a pod and references the persistent volume claim in the pod.
6. Kubernetes mounts the persistent volume in the pod.
The code
Here again, this was tested on Minikube. It should also work on other Kubernetes clusters with minor changes to the storage classes.
1. Deploy 2 storage classes:
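Minikube ships a hostPath-based dynamic provisioner, so both classes below point at it; on a real cluster you would use your storage backend's provisioner instead. The `ssd`/`hdd` names are just labels:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd
provisioner: k8s.io/minikube-hostpath
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hdd
provisioner: k8s.io/minikube-hostpath
```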
2. Deploy the namespace; the manifest is the same as in the manual flow.
3. Deploy the persistent volume claim:
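The only change from the manual version is that the claim now names a storage class instead of a specific volume:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc
  namespace: bigco
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ssd   # Kubernetes provisions a matching volume on demand
  resources:
    requests:
      storage: 5Gi
```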
4. Deploy the pod, again using a deployment object; the manifest is the same as in the manual flow.
5. Deploy the service, also unchanged from the manual flow.
6. Deploy everything to the Kubernetes cluster using the same `kubectl apply` command as before.
The result
Same thing as previously, if the pod dies and is rescheduled somewhere else, the data will not be lost.
We even reused a lot of the code: basically, the biggest changes are the introduction of the storage classes and the fact that the persistent volume claim now references a storage class instead of a persistent volume.
Additionally, once the storage classes have been created by John, Bob can create the persistent volumes he needs without delay.
Limitation
As persistent volumes are dynamically created, some people tend to forget to manage their whole lifecycle, which can lead to persistent volumes not being deleted even though they are not used anymore. However, this is easily worked around by reviewing from time to time the list of persistent volumes that are instantiated but no longer linked to a pod.
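Something like this surfaces the volumes worth a second look; any persistent volume whose status is not `Bound` is a candidate for cleanup:

```sh
kubectl get pv --no-headers | grep -v Bound
```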
Conclusion
To sum everything up: use the automated way, as always. And if you want to manage stateful apps on top of Kubernetes, StatefulSets are worth a look.