Secure container runtime

Containers are a great technology, but they have a well-known security problem: weak isolation. To mitigate it, our company has a policy that all containers in our k8s cluster must run as non-root users. However, there are still some exceptions that need root privileges, like our mail relay pod running Postfix. To let those services run on our k8s cluster while maintaining our security policy, we need a container runtime with tighter isolation.
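For reference, the non-root policy corresponds to a pod securityContext like this (a sketch; the pod name, image, and UID are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nonroot-demo    # hypothetical name
spec:
  securityContext:
    runAsNonRoot: true  # kubelet refuses to start containers running as UID 0
    runAsUser: 1000     # illustrative non-root UID
  containers:
  - name: app
    image: nginx        # illustrative image
```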

Besides providing better isolation than runc (our current runtime), the new container runtime must also satisfy:

At first, we had three candidates, picked by our boss: Kata Containers, gvisor, and Firecracker. However, I removed Firecracker from the list because it's not actually a container runtime. It's a VMM (virtual machine monitor) optimized to run serverless functions. It doesn't follow the OCI standard, so there's no easy way to make it work with our k8s. That leaves us with gvisor and Kata Containers to choose from.

Gvisor is an application kernel, written in Go. Its runtime is called runsc and follows the OCI standard, making it easy to integrate with k8s.
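Because runsc is OCI-compliant, it can be registered with containerd much like any other runtime (a sketch; it assumes the runsc binary and its containerd shim are already installed on the node):

```toml
# /etc/containerd/config.toml: register runsc with the CRI plugin
# (section path may differ between containerd versions)
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
  runtime_type = "io.containerd.runsc.v1"
```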

Gvisor runs containers in a sandbox with a user-space kernel that implements a subset of the Linux system calls. Instead of running on the host kernel, the containers run on a separate gvisor kernel, which is very similar to a virtual machine. Unlike a VM, however, gvisor doesn't need translation through virtual hardware.

Because gvisor puts a filter between the application and the host kernel, it intercepts system calls, which causes performance degradation for syscall-heavy workloads.

At the time of this post, gvisor provides an environment equivalent to Linux 4.4 with a limited set of system calls, so incompatibilities may occur.

Kata Containers is a combination of Clear Containers (Intel) and runV (Hyper.sh) and follows the OCI standard, so it can be seamlessly plugged into k8s.

Kata Containers uses virtualization technology to secure containers by putting them into a lightweight virtual machine. The machine image is optimized for running containers, containing only the virtual devices and software needed to run them. Because of that, you get the isolation of a VM for containers: the best of both worlds.

Kata Containers works by starting a VM with a standard Linux kernel. Inside the VM, an agent manages the containers using libcontainer, and the kata runtime communicates with the agent via vsock.

The downside is that each VM adds some resource overhead. When multiple kata pods run on the same host, we can reduce the guest OS memory footprint by enabling KSM (kernel samepage merging), at the cost of some CPU power.
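On the host, KSM can be toggled through sysfs (these paths are standard on Linux; the tuning value below is illustrative):

```shell
# Enable the KSM daemon so identical memory pages are merged across VMs
echo 1 | sudo tee /sys/kernel/mm/ksm/run
# Optionally tune how many pages ksmd scans per cycle (illustrative value)
echo 100 | sudo tee /sys/kernel/mm/ksm/pages_to_scan
```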

Based on what we know, we can draw some conclusions:

To compare the performance of kata and gvisor, I created a simple sysbench container and started it with each of the 3 runtimes in turn on the same host.
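The runs looked roughly like this (a sketch: the image name and parameters are illustrative, and the runtime names must match what is registered with your container engine):

```shell
# Same benchmark, three runtimes, same host
docker run --rm --runtime=runc  severalnines/sysbench sysbench cpu --cpu-max-prime=20000 run
docker run --rm --runtime=kata  severalnines/sysbench sysbench cpu --cpu-max-prime=20000 run
docker run --rm --runtime=runsc severalnines/sysbench sysbench cpu --cpu-max-prime=20000 run
```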

The result shows that for the CPU benchmark, all 3 runtimes get the same score (within a very small margin, which we can ignore). That's understandable, because sysbench evaluates CPU power by calculating the prime numbers less than or equal to the cpu-max-prime parameter. It's pure arithmetic and involves almost no syscalls.

In the memory benchmark, runc's performance is about 13% higher than kata's and gvisor's.

You can see that gvisor's performance is very poor in this case, which is the main reason why I chose kata over gvisor.

As both gvisor and kata follow the OCI standard, integrating with k8s is easy.

1. Create a kata-runtime.yaml file with the following content:
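A minimal RuntimeClass manifest might look like this (a sketch: the handler name must match the runtime registered in containerd, and the overhead values are illustrative):

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
# Must match the runtime name configured in containerd
handler: kata
# Per-pod overhead, counted in quota, scheduling, and cgroup sizing
overhead:
  podFixed:
    memory: "160Mi"   # illustrative value
    cpu: "250m"       # illustrative value
```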

Note: We need to define the overhead of the runtime class; it is counted in resource quota calculations and node scheduling, as well as Pod cgroup sizing.

Apply on k8s:
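```shell
kubectl apply -f kata-runtime.yaml
```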

2. Install kata runtime on nodes

As we ‘re using debian:

3. Configure containerd by adding the following to /etc/containerd/config.toml:
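The snippet below registers the kata shim with containerd's CRI plugin (the section path may differ between containerd versions):

```toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata]
  runtime_type = "io.containerd.kata.v2"
```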

Restart containerd to apply the new config
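```shell
sudo systemctl restart containerd
```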

4. Deploy a new pod with the kata runtime

We just need to set "runtimeClassName: kata" in the pod spec:
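For example (a sketch; the pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kata-demo       # hypothetical pod name
spec:
  runtimeClassName: kata  # must match the RuntimeClass created earlier
  containers:
  - name: app
    image: nginx        # illustrative image
```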

I set up a prototype pod using the kata runtime. It has run well for more than a month without problems, and the setup is simple: you can do it within minutes. Even the overhead is small enough that we're willing to accept it for the increased security. When I presented this solution to my teammates, though, one of them raised a concern that adding one more layer to our stack would make troubleshooting more difficult. That's reasonable, and he has a point. However, security comes at a price, and it's a price we're willing to pay.
