WEBVTT

00:00.000 --> 00:22.000
All right, let's get back to it. They're going to be talking about running containers in lightweight VMs.

00:23.000 --> 00:25.000
Thanks so much.

00:26.000 --> 00:31.000
Thanks very much for the intro. Thanks for being here.

00:32.000 --> 00:34.000
I'm Anastassios Nanos, along with my colleague Babis.

00:34.000 --> 00:38.000
We're going to talk about sandboxing applications.

00:39.000 --> 00:43.000
The overhead, the isolation boundaries and blah, blah, blah.

00:44.000 --> 00:49.000
So, about us: we're a really small research company.

00:49.000 --> 00:52.000
We focus on systems software.

00:53.000 --> 00:56.000
Of course, what we're presenting is not entirely our work.

00:57.000 --> 01:00.000
There's a team that works on specific stuff and other stuff.

01:01.000 --> 01:03.000
We also do hardware acceleration.

01:04.000 --> 01:05.000
We do hypervisors.

01:06.000 --> 01:17.000
So the overview of this presentation: essentially, we're going to talk about containers and how we sandbox applications

01:18.000 --> 01:20.000
to execute them.

01:21.000 --> 01:24.000
About our favorite sandboxed container runtime, Kata.

01:25.000 --> 01:33.000
And how we try to optimize the execution of applications in the same setting as Kata, but in another way.

01:34.000 --> 01:38.000
So, we are all in the containers world.

01:39.000 --> 01:44.000
We all know about containers. We most probably like using them.

01:45.000 --> 01:52.000
Apparently, it's the default application packaging and deployment framework right now.

01:53.000 --> 01:59.000
You can do a docker build of something and docker run something on your laptop.

02:00.000 --> 02:04.000
You can deploy that in the cloud on edge devices.

02:05.000 --> 02:07.000
So it's really, really easy.
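
(A minimal sketch of the build-then-run flow just described; the image name and Dockerfile contents are illustrative, not from the talk.)

```shell
# Build once on your laptop, then run the same image anywhere (cloud, edge).
cat > Dockerfile <<'EOF'
FROM alpine:3.19
CMD ["echo", "hello from a container"]
EOF
# docker build -t myapp .     # package the application
# docker run --rm myapp       # run it, locally or wherever it is deployed
head -n 1 Dockerfile
```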

02:08.000 --> 02:12.000
Containers offer these isolation mechanisms.

02:13.000 --> 02:16.000
So we've got process-level isolation.

02:17.000 --> 02:20.000
It's based on mechanisms that the Linux kernel offers.

02:21.000 --> 02:24.000
So we've got namespaces.

02:25.000 --> 02:30.000
We can limit the usage of resources through cgroups.
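
(A quick sketch of those two kernel mechanisms; the unshare flags and the cgroup path are illustrative and depend on privileges and a cgroup-v2 setup, so the privileged commands are shown only as comments.)

```shell
# Namespaces: give a process its own view of hostnames, PIDs, mounts, ...
#   unshare -r --uts --pid --mount --fork sh   # new UTS/PID/mount namespaces
# cgroups: cap resource usage, e.g. CPU time under cgroup v2:
#   echo "50000 100000" > /sys/fs/cgroup/mygrp/cpu.max
# Every Linux process already sits inside a set of namespaces:
ls /proc/self/ns
```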

02:31.000 --> 02:36.000
The main concern, mainly from cloud providers, in

02:37.000 --> 02:41.000
multi-tenant execution environments,

02:42.000 --> 02:45.000
let's say, is that you share the Linux kernel.

02:46.000 --> 02:49.000
So you have a shared attack surface.

02:50.000 --> 02:56.000
So what cloud providers do, in order to offer you a service for containers,

02:57.000 --> 03:02.000
is sandbox these containers using another isolation layer,

03:02.000 --> 03:08.000
which could be software-based, like gVisor, which was mentioned in the previous talk,

03:09.000 --> 03:13.000
or hardware-assisted isolation.

03:14.000 --> 03:16.000
We focus on that.

03:17.000 --> 03:21.000
And by hardware-assisted isolation,

03:22.000 --> 03:26.000
what we mean is that

03:26.000 --> 03:35.000
we put up a VM, and inside this VM there's a mechanism that can spawn the container that the user asked for.

03:36.000 --> 03:42.000
And this premise is used in the Kata Containers

03:43.000 --> 03:52.000
runtime, which is a sandboxed container runtime, meaning that you do a docker run or a nerdctl run, whatever,

03:52.000 --> 03:58.000
and a microVM (or a full VM) is spawned. Inside this VM

03:59.000 --> 04:07.000
there's a management utility, the kata-agent, which communicates with the runtime that runs outside the VM.

04:08.000 --> 04:14.000
And it spawns the container that the user asked for.
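
(How this looks from the CLI, as a sketch: it assumes Kata is installed and registered as an additional runtime, and the runtime name `kata-runtime` is illustrative, since it varies per install.)

```shell
RUNTIME=kata-runtime   # illustrative name; some installs register "kata" (shim v2)
# The same container image, but now spawned inside a microVM:
#   docker run --rm --runtime="$RUNTIME" alpine uname -r
# The kernel version printed comes from the guest kernel shipped with Kata,
# not from the host kernel.
echo "$RUNTIME"
```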

04:15.000 --> 04:19.000
However, in order to do that you need to spawn the VM.

04:19.000 --> 04:24.000
So you have a cold-start overhead, let's say.

04:25.000 --> 04:27.000
You have to have a Linux kernel.

04:27.000 --> 04:29.000
You have to have a rootfs for the VM.

04:30.000 --> 04:33.000
It's a complicated stack.

04:34.000 --> 04:37.000
So, digging a bit more into Kata:

04:38.000 --> 04:40.000
when we do a docker run

04:41.000 --> 04:45.000
using the Kata Containers runtime,

04:45.000 --> 04:54.000
we spawn a VM with a Linux kernel. Usually the Kata releases are shipped with a specialized

04:55.000 --> 04:57.000
config, a bit stripped down.

04:58.000 --> 05:01.000
There is a specialized rootfs based on various

05:02.000 --> 05:07.000
distros, depending on what the user wants, essentially.

05:08.000 --> 05:10.000
So the VM is spawned.

05:11.000 --> 05:20.000
We establish communication between the runtime and the kata-agent, running as the init process in the microVM.

05:21.000 --> 05:22.000
And the containers are spawned.

05:23.000 --> 05:26.000
So we've got the sidecar containers in the Kubernetes setting.

05:27.000 --> 05:29.000
The whole pod is a single VM.

05:30.000 --> 05:37.000
So we've got sidecar containers and we've got the user code in the container, all within the same VM.

05:37.000 --> 05:48.000
The isolation boundary now between the infrastructure, the host system, and the containers is the hardware-assisted isolation.

05:49.000 --> 05:52.000
And between the containers inside this pod,

05:53.000 --> 05:58.000
we've got the standard Linux containers inside this microVM.

05:59.000 --> 06:04.000
They share the same Linux kernel; they have cgroups and namespaces and all that stuff.

06:05.000 --> 06:11.000
Now I'm going to give the floor to Babis, to talk about what we have been doing to optimize this.

06:12.000 --> 06:14.000
So, hello from my side too.

06:15.000 --> 06:19.000
So we thought, we can do the opposite.

06:20.000 --> 06:23.000
So, as we see on your left,

06:24.000 --> 06:32.000
we have the typical VM, and inside we have a container runtime that spawns and manages our containers.

06:33.000 --> 06:37.000
And we usually have some kind of remote management of the containers that run inside the VM.

06:38.000 --> 06:43.000
But what if we do the opposite? What if we just spawn a VM inside a container?

06:44.000 --> 06:51.000
So that's what we wanted to do, but of course, because we are also familiar with Kata Containers,

06:51.000 --> 06:55.000
we knew we couldn't do that with Kata, but we can do that with urunc.

06:56.000 --> 06:58.000
urunc is a container runtime.

06:59.000 --> 07:04.000
It's OCI-compatible and specialized for managing unikernels as containers.

07:05.000 --> 07:12.000
It's very extensible: you can add whatever new guests and whatever new hypervisor you want to support.

07:13.000 --> 07:20.000
And the key differences between urunc and other kinds of sandboxed container runtimes are that, first of all, urunc spawns

07:21.000 --> 07:23.000
the application inside the VM,

07:24.000 --> 07:26.000
but it treats the VM as the actual process.

07:27.000 --> 07:36.000
So it's kind of like having your process running inside the VM; urunc spawns the process through the VM, and we have one VM per container,

07:37.000 --> 07:43.000
meaning that all containers will be spawned in a different virtual machine every time.
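
(For reference, registering such a runtime with containerd usually looks like the fragment below; the shim name `io.containerd.urunc.v2` follows the urunc documentation, but check your install, as the exact name may differ.)

```toml
# /etc/containerd/config.toml (fragment, illustrative)
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.urunc]
  runtime_type = "io.containerd.urunc.v2"
```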

07:44.000 --> 07:50.000
So, to give a better overview of what exactly this looks like:

07:51.000 --> 07:58.000
we have on top runc, which is usually just a normal container running on the host, using namespaces, cgroups,

07:59.000 --> 08:03.000
capabilities, all these kinds of goodies that the Linux kernel can provide.

08:04.000 --> 08:08.000
Then we have Kata, which is still the same thing but inside a VM.

08:08.000 --> 08:13.000
And then we have urunc, which takes a quite different approach.

08:14.000 --> 08:24.000
It is a container runtime that creates the VM inside a sandboxed environment, like a typical container.

08:25.000 --> 08:31.000
So, to give a better overview of how this works in Kubernetes, for example:

08:32.000 --> 08:36.000
the difference here is that urunc will take care only of the user containers.

08:36.000 --> 08:44.000
So, for example, when we have cases like in Knative, where we have one more sidecar container, like the queue-proxy or the pause container for the pod,

08:45.000 --> 08:47.000
this is not going to be handled by urunc.

08:48.000 --> 08:52.000
We don't treat these kinds of containers as our containers;

08:53.000 --> 08:57.000
these are containers that belong to the system stack.

08:58.000 --> 09:04.000
So the queue-proxy doesn't need to have any strong isolation, but the code provided by the user,

09:04.000 --> 09:12.000
this needs to be isolated, and that's why we contain it inside the virtual machine,

09:13.000 --> 09:20.000
only the untrusted part of the whole code.

09:21.000 --> 09:29.000
So, as I said before, we want to spawn a VM and run a container inside it.

09:29.000 --> 09:33.000
And I will try to do that right now.

09:34.000 --> 09:37.000
Sorry for that.

09:38.000 --> 09:40.000
So maybe I will skip that,

09:41.000 --> 09:45.000
because it's not that exciting; maybe the next one will be more exciting.

09:46.000 --> 09:54.000
So, yeah, this demo was about just getting the kernel from Kata Containers and running one container on top of it.

09:55.000 --> 10:07.000
But the thing here is that, in this demo, we're using the default Kata kernel, which is configured with around 1000 options.

10:08.000 --> 10:18.000
It has various kinds of support, like device hotplug, ACPI, xtables, and all the other kinds of things; some of them might be useful,

10:18.000 --> 10:25.000
some of them might not be, depending on which kind of use case you target.

10:26.000 --> 10:35.000
But we know that specialization is the key for having low overhead and footprint, better performance, and stronger security.

10:36.000 --> 10:44.000
So on one side we have the Kata Linux, with all these features that Kata can provide in the Linux image.

10:44.000 --> 10:54.000
And on the other side we have unikernels, which go to the extreme and only include what is strictly necessary for the application.

10:55.000 --> 11:00.000
But most people don't really like unikernels, because they're difficult, they're not that easy;

11:01.000 --> 11:03.000
they require special porting.

11:04.000 --> 11:11.000
People in the unikernel community have tried to create, let's say, Linux clones, trying to be binary-compatible with Linux.

11:11.000 --> 11:17.000
And we don't believe that this is the best approach, because we don't get the benefit of unikernels,

11:18.000 --> 11:22.000
and we're never going to be fully Linux-compatible if we are not Linux.

11:23.000 --> 11:25.000
So why not just use Linux? Linux is running everywhere.

11:26.000 --> 11:32.000
We can just customize it as much as we want, and it's compatible with everything.

11:33.000 --> 11:37.000
So we are not the first ones who got this idea, of course.

11:38.000 --> 11:48.000
There was a team from IBM that wrote a very nice paper called Lupine, where they were trying to make Linux look more like a unikernel.

11:49.000 --> 11:59.000
They identified that there are a lot of configuration options that they could simply remove, and they created very, very tiny Linux kernels.

11:59.000 --> 12:09.000
They also played around with system call overhead, and the key findings were that they were able to boot as fast as unikernels,

12:10.000 --> 12:14.000
they were very small, and they identified that the system call overhead is not that big.

12:15.000 --> 12:19.000
So based on that we want to do the same thing.

12:20.000 --> 12:26.000
We want to create a tool that will analyze the container and identify what is really required for that specific container.

12:26.000 --> 12:36.000
It will create a configuration for the kernel for this exact container, and then it will

12:38.000 --> 12:43.000
yeah, and then it will just build the

12:44.000 --> 12:49.000
kernel. So I will go here.
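
(A sketch of the specialization idea just described: start from an empty kernel config and merge in only the options the container needs. The option names in the fragment are illustrative, not the tool's actual output; in a real Linux source tree you would run the commented commands.)

```shell
# Fragment of options this particular container is found to need:
cat > container.fragment <<'EOF'
CONFIG_VIRTIO_NET=y
CONFIG_VIRTIO_BLK=y
CONFIG_EXT4_FS=y
EOF
# Inside a Linux source tree, the build step would then be something like:
#   make allnoconfig
#   scripts/kconfig/merge_config.sh .config container.fragment
#   make -j"$(nproc)" bzImage
grep -c '=y' container.fragment
```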

12:50.000 --> 12:53.000
Yeah, we don't really have time, but this is

12:54.000 --> 13:12.000
a kernel, a latest kernel, which we have configured already. As you can see, we have already built it; because we don't have time we're not going to build it now, though it wouldn't take long, maybe one minute or something. It's very, very small; actually, I can show you the size.

13:13.000 --> 13:24.000
It's only around 12, around 13 megabytes. So we will take this

13:25.000 --> 13:28.000
kernel here, and we will copy it

13:30.000 --> 13:38.000
inside here. And we have created a tool for this; it's based on buildkit and LLB.

13:38.000 --> 13:42.000
So what we're going to do right now is just copy the

13:43.000 --> 13:44.000
kernel inside the

13:45.000 --> 13:48.000
container, and we use as a base the nginx

13:49.000 --> 13:59.000
Alpine version, and we have this huge CMD line where we just tell nginx to not run as a daemon.

13:59.000 --> 14:02.000
So we will build this

14:03.000 --> 14:05.000
container,

14:06.000 --> 14:08.000
simply with a docker build.

14:09.000 --> 14:16.000
And this will take a few moments. Okay, and now we can run it.

14:19.000 --> 14:27.000
Yeah, so here we have the container running inside the VM; it's nginx, and we can even

14:29.000 --> 14:34.000
Get here.

14:37.000 --> 14:38.000
get the IP.

14:39.000 --> 14:43.000
And we can do a curl here to show that it's working.

14:44.000 --> 14:47.000
Yes, it is working, and we can also

14:48.000 --> 14:49.000
stop it here.

14:57.000 --> 14:58.000
Oops.

14:59.000 --> 15:04.000
Okay.

15:12.000 --> 15:16.000
And to show you that we indeed run a different kernel,

15:17.000 --> 15:20.000
we will just run here

15:21.000 --> 15:22.000
uname.

15:29.000 --> 15:32.000
Yeah, it's in /usr/bin here.

15:37.000 --> 15:39.000
We will build it with Docker.

15:40.000 --> 15:41.000
Okay.

15:42.000 --> 15:48.000
And now we will just run the nginx Alpine with just

15:48.000 --> 16:01.000
Okay, so at first we will just run it on top of

16:02.000 --> 16:08.000
And we see that we are running on this kernel here, 6.12.

16:09.000 --> 16:11.000
We will do the same with.

16:12.000 --> 16:17.000
runc, and we will do the same with urunc.
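
(The comparison being demoed, sketched as commands; the image and runtime names are illustrative and depend on how runc/urunc are registered on the host.)

```shell
# Same image, different runtimes; only the sandboxed one changes the kernel:
#   docker run --rm nginx:alpine uname -r                    # host kernel, e.g. 6.12
#   docker run --rm --runtime=runc nginx:alpine uname -r     # still the host kernel
#   docker run --rm --runtime=urunc nginx:alpine uname -r    # specialized guest kernel
# Run locally, uname -r reports the host kernel version:
uname -r
```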

16:18.000 --> 16:20.000
And we successfully panicked.

16:21.000 --> 16:25.000
The reason we panicked is that the application just exited.

16:26.000 --> 16:31.000
So, as you can see here somewhere...

16:32.000 --> 16:35.000
Oh yeah, it's there.

16:36.000 --> 16:39.000
Okay this binary doesn't exist.

16:40.000 --> 16:42.000
Okay, so we're not going to search for it.

16:43.000 --> 16:45.000
But it simply runs a uname.

16:46.000 --> 16:49.000
So if we try to do,

16:50.000 --> 16:51.000
here,

16:53.000 --> 16:55.000
/usr/bin/uname.

16:56.000 --> 16:57.000
Yeah.

16:59.000 --> 17:01.000
It seems it's there.

17:02.000 --> 17:03.000
Okay that's weird.

17:04.000 --> 17:07.000
So, to wrap up.

17:08.000 --> 17:12.000
We also did some very, very early evaluation of this,

17:13.000 --> 17:18.000
to measure the kernel images that we created, which I just showed you, and also measure the boot time that we have.

17:19.000 --> 17:21.000
And you can see the testbed here.

17:22.000 --> 17:24.000
So, regarding the image size:

17:25.000 --> 17:28.000
we targeted nginx, and lower is better here.

17:28.000 --> 17:31.000
The sizes are in.

17:32.000 --> 17:35.000
And we see that Kata is extremely big.

17:36.000 --> 17:40.000
Then we have the case of Unikraft; this is the generic Unikraft that tries to be binary-compatible with Linux.

17:41.000 --> 17:45.000
And then we can see that our Linux is even smaller than Unikraft.

17:46.000 --> 17:49.000
And then we can see the original Lupine kernel.

17:50.000 --> 17:52.000
And that was based on an old kernel.

17:53.000 --> 17:54.000
So maybe that's the reason.

17:54.000 --> 17:59.000
We have the Unikraft HTTP image, which is a very specialized unikernel that is just a few kilobytes.

18:00.000 --> 18:01.000
Now, regarding the start time:

18:02.000 --> 18:04.000
we see that, again, lower is better.

18:05.000 --> 18:06.000
The start time is in milliseconds.

18:07.000 --> 18:09.000
We see that runc is the fastest one.

18:10.000 --> 18:12.000
Kata is quite slow.

18:13.000 --> 18:19.000
And you can see that urunc is not as good as runc, but it's very close.

18:19.000 --> 18:23.000
And that's even with the booting of the Linux kernel.

18:24.000 --> 18:29.000
I would like to mention that this project is partially funded by European projects.

18:30.000 --> 18:32.000
To summarize: we know that containers are great.

18:33.000 --> 18:34.000
They lack some kind of isolation.

18:35.000 --> 18:40.000
Sandboxed execution is an alternative for providing strong isolation,

18:41.000 --> 18:43.000
but it induces some kind of overhead.

18:44.000 --> 18:47.000
We try to reduce it, and we can do that with urunc.

18:48.000 --> 18:57.000
And we even want to strip down the Linux kernel, to make it even smaller and faster.

18:58.000 --> 19:01.000
And you can check out the code at these two links.

19:02.000 --> 19:05.000
This is the building tool and this is the container runtime.

19:06.000 --> 19:09.000
And you can also come to the WebAssembly devroom

19:10.000 --> 19:13.000
to see how we also integrate these kinds of things with WebAssembly.

19:13.000 --> 19:15.000
So thank you for your time.

19:16.000 --> 19:17.000
And that's all from our side.

19:21.000 --> 19:23.000
So thank you; unfortunately, we're out of time.

19:24.000 --> 19:27.000
So if you have any questions, you can take them outside.

