WEBVTT

00:00.000 --> 00:13.760
So, thanks Udo for the kind introduction, and I'm thrilled to be here.

00:13.760 --> 00:16.160
This is my first, first demo.

00:16.160 --> 00:22.160
My usual community is formal verification, formal specification and verification.

00:22.160 --> 00:29.680
So today, I'm going to give you a high level overview of the formal specification and

00:29.680 --> 00:36.240
verification of the NOVA and Michael Hypervisor that we, the formal metal, formal metal team

00:36.240 --> 00:40.480
at blue rock security is doing.

00:40.480 --> 00:49.800
So, what we are building at blue rock now is a virtualization start with NOVA at a bottom running

00:49.800 --> 00:56.920
in Hypervisor privilege, and then we have give you a high level overview of how we perform

00:56.920 --> 01:05.960
modular, formal specification and verification of NOVA, Michael Hypervisor within the whole stack.

01:05.960 --> 01:13.800
And so, if you are not familiar with formal specification and verification, at a high level,

01:13.800 --> 01:21.880
the idea is just to state the properties and verify the properties, meaning doing proof

01:21.880 --> 01:25.080
in a very rigorous logic.

01:25.080 --> 01:33.880
This is allow us to, and forks us to actually think about all of the properties of our components

01:33.880 --> 01:41.000
in a most ambiguous way, most implicit way, a more thorough way, and with the result is that

01:41.000 --> 01:48.520
we have much greater confidence on our proven properties than just simple testing and even

01:48.520 --> 01:53.800
stronger than just running static analyzers.

01:53.800 --> 01:59.560
Now, it may be obvious now that this requires a lot of effort, and obviously what we really

01:59.560 --> 02:06.280
want is that we want to do this incrementally, and the keyword here is modularity in a way

02:06.360 --> 02:13.960
that we want to be able to specify and verify components in a more or less independent way,

02:13.960 --> 02:19.560
and then we can compose the results to get the specification and the verification result for bigger

02:19.560 --> 02:28.120
systems. And in this talk, I will give you a quick overview of the techniques, which is separation

02:28.120 --> 02:35.480
logic, a very recent development in the formal verification research that we apply in order

02:35.480 --> 02:42.360
to achieve modularity. And last but not least, in order to scale our specification and verification

02:42.360 --> 02:51.400
to bigger system, we want to be able to perform these proofs in a more automatic way, and one of the

02:51.400 --> 02:57.640
most important thing, and the modern standard of formal applications is doing this in a proof

02:57.720 --> 03:07.080
assistant where our tens of thousand lines are proof, check algorithmically, and what we're using

03:07.080 --> 03:17.720
is the expressive type system of the proof assistant. And on top of that, we also want to

03:17.720 --> 03:25.960
reduce the effort or even when if you're familiar with horror logic from university, the specification

03:26.040 --> 03:32.280
in horror logic is often given as a triple where you have the pre-condition, the program itself,

03:32.280 --> 03:40.760
and the post-condition, and the triple is just simply saying that the program update, the pre-condition

03:40.760 --> 03:48.680
state to the post-condition state. And in this case, it's doing that sequentially. And in this version

03:48.680 --> 03:55.400
of separation logic, the pre and the post-condition mentions what we call a small proof-print

03:56.440 --> 04:03.560
points to assertions, which only talk about the fragment of a state that our piece of code

04:03.560 --> 04:09.800
care about. So for example, what we're showing is that in the pre-condition of updating a

04:09.800 --> 04:18.280
capability, we only need the points to assertion of the selector that's our ownership of that

04:18.360 --> 04:27.400
capability. For example, if we just running cap update on one single entry in the capability page

04:27.400 --> 04:34.840
table, then the specification of that functions, we just mentioned that particular entry and nothing else.

04:36.200 --> 04:45.960
And for that, we can specify and verify cap update in cap really date each function in isolation

04:46.440 --> 04:52.760
and when we need to compose them, for example, here we have a thread, two threads running

04:52.760 --> 05:01.000
these two function in parallel. And if we happen to know that, well, this is actually

05:01.000 --> 05:08.520
cell one, sorry. The cap update is working on cell one and cap update is working on selector

05:08.520 --> 05:15.960
two. And we happen to know that these two are the joint in the capability table. This is

05:15.960 --> 05:23.240
encoded as the separating the junction a star in separation logic and basically say that these two

05:23.240 --> 05:29.160
are the joint resources. Then we can simply compose the specification that we have up here

05:29.160 --> 05:36.840
without understanding the implementation of this function to get the help proof of the composition

05:36.840 --> 05:43.000
and basically the proof here to saying that small complicated and say a resources are very

05:43.000 --> 05:53.400
common. So we have to go to more stay up here at logical construct in modern separation logic

05:53.400 --> 06:01.400
in order to be able to encode and to specify and verify this concurrently access share resources.

06:01.480 --> 06:06.760
And in a way, you can see this is actually a separation because it's a separation in time

06:06.760 --> 06:12.200
where everyone would in only one moment in time access one particular resources.

06:14.200 --> 06:22.040
And for that, we go to more advanced features of separation logic and that was called concurrent

06:22.040 --> 06:28.840
separation logic and you probably familiar with the very basic form of concurrency where

06:29.800 --> 06:36.680
excessive concurrency resources are protected by lock. So this is, for example, if you happen to have

06:36.680 --> 06:44.840
through threatening, update in capital A now on the same selector, you want to be able to

06:45.480 --> 06:52.680
to distinguish separate them so that one does not interfere with each other, you can put the

06:53.640 --> 06:59.800
lock to protect the selector and this come up very naturally in separation logic by just saying

06:59.800 --> 07:08.600
that, okay, now the lock protects the tiny little bit of resources, the capability table entry

07:09.160 --> 07:14.120
and whoever acquired lock will get access to that, sequentially, and can do whatever they want with them

07:15.400 --> 07:21.400
and when it release the lock, they have to return that. This is very natural and

07:22.120 --> 07:33.720
very basic also concurrency separation logic. Yeah, and then, but you don't really want to use

07:33.720 --> 07:41.560
lock based concurrency, but what we have seen is that in highly concurrency code, we want

07:41.560 --> 07:50.440
five grand concurrency where access to the resources are atomic. And in this case, if we happen to

07:50.600 --> 07:56.440
have an implementation, I've kept very long and kept update to just using atomic to

07:57.880 --> 08:04.920
access the resources, then we need a word to specify this highly concurrency behavior

08:06.280 --> 08:17.880
of operation. And more importantly, you can now chain up this atomic update to be able to specify

08:17.960 --> 08:24.520
multiple linearization points. It happens that your function doesn't have one single atomic effect,

08:24.520 --> 08:34.280
but it's hadn't had multiple atomic effects, multiple linearization point. And so, we will jump to

08:34.280 --> 08:45.240
an example directly in Nova and this is a bit blurry. This is the specification, the documentation

08:46.200 --> 08:53.640
of Nova, and this is just screenshotting the page about control and semaphore. And you don't need

08:53.640 --> 09:00.200
to read it as blurry anyway, but at the first step, you can think of it in a more form of way

09:01.400 --> 09:11.800
that this is basically the visibly atomic effects of semaphore control up and semaphore control

09:11.880 --> 09:19.960
down in which the effect is something up. There will be a linearization point an atomic effect

09:19.960 --> 09:26.200
that check the capability, this will require you to use the capability of the semaphore.

09:26.200 --> 09:32.760
And then there will be an atomic effect of actually bumping the counter of the semaphore. And on

09:32.760 --> 09:40.360
now, you have also a capability check and then several other points. One of this is that the

09:40.360 --> 09:48.680
now actually happened to because the execution context is blocked waiting for the semaphore.

09:48.680 --> 09:55.480
And then it can happen that as a simulator, it's either it's unblocked. So, definitely an acquired

09:55.480 --> 10:06.200
semaphore or it just simply time out. So, this is one important point here is that these

10:06.280 --> 10:11.560
points are the linearization points so it can happen concurrently. And so it can be the case that

10:11.560 --> 10:18.680
some other operation running concurrently can just happen to update the capability. But the

10:18.680 --> 10:24.200
semaphore, the rest of the behavior is still going on as long as here we have the permission.

10:24.200 --> 10:32.120
We have the capability to perform the operation. And one other important point is that if you look

10:32.120 --> 10:39.880
at the documentation, it's just say okay in what cases in the behavior will be that you get

10:39.880 --> 10:47.240
some error code. But without specifying the exact order of these operations. But in order to

10:47.240 --> 10:54.520
specify these more formally, we have to create have to see and specify the order between these

10:54.600 --> 11:03.000
to the realization points. And so, in order to specify the semaphore, we have to think about

11:03.000 --> 11:08.760
what are the resources of a control semaphore care about in this operation. And what I

11:08.760 --> 11:14.920
from listed here is basically it's just briefing. You need the capability to store the permission

11:14.920 --> 11:21.400
to access the semaphore. And then you need the counter value of the semaphore and potentially

11:21.480 --> 11:30.200
the list of block execution contacts waiting for the semaphore. And we just simply introduce

11:30.200 --> 11:37.480
though free or not simply we have to prove that we can do that in the interface of separation logic

11:37.480 --> 11:48.280
for NOVA. And with that we can use the idea of logical atomic updates to specify the behavior.

11:48.280 --> 11:57.560
And this is just pretty easy to read. I'm hoping this is easy to read. And it's basically just

11:57.560 --> 12:05.240
saying okay there's read up the cover really up a selector. And then you check if that one

12:05.240 --> 12:11.880
have the right permission to perform enough. And if that happened, then it will be the case that

12:11.960 --> 12:25.320
the counter value will be updated. Otherwise you get that capability. And the Q here is a logic

12:25.320 --> 12:31.640
assertion that is picked by the client or specification to specify what exactly will happen

12:31.640 --> 12:43.480
after the call up the control semaphore. And now there is also very similar. You can see that at

12:43.480 --> 12:50.520
each linearization point we use an atomic update to specify the pre-inport condition of what

12:50.520 --> 12:58.120
happened. You certainly again have to check the capability. And then if it happened, then the state

12:58.120 --> 13:05.320
the list of block AC will be appended with the current AC. And then sometime later it will be

13:05.320 --> 13:13.960
unblock because you have a hammer out or it will be actually decremented. The counter will be

13:13.960 --> 13:22.200
recremented because somebody unblock it. And so we actually acquired the semaphore. And one

13:22.360 --> 13:29.480
important point here is that we use a classical conjunction to specify the possibilities that

13:29.480 --> 13:35.080
the client have to deal with. And in this case both cases can happen and so the client should be able

13:35.080 --> 13:49.080
to deal with either a semaphore or a timeout happen. And just to summary to summarize this idea,

13:49.160 --> 13:54.680
the simple idea is to use separation of repot the arrows. And so in a way the entrances client

13:54.680 --> 14:06.760
cannot crash nobody. Okay, so I quickly show you how this the verification of some controls

14:06.760 --> 14:13.640
and before down works. And this is the code, the implementation, the current implementation of

14:13.640 --> 14:25.400
Nova. And what happens here is that this is the internal C++ code. And we have our own internal resources

14:26.440 --> 14:34.920
stated on some institutional logic to specify the behavior of the C++ structure or data in this

14:34.920 --> 14:43.720
case. And this is fairly complicated but we look really we have strong proof automation for C++.

14:43.720 --> 14:50.840
And this can be blast through mostly before we just have to manually write these pre-imposed condition.

14:51.640 --> 14:59.080
Again, this is you can see here that there's a lock. And so this is happening sequentially.

14:59.880 --> 15:07.720
But on the other side, you can see that when we're doing this proof, we will have to be able

15:07.720 --> 15:16.440
to certify the interface, the specification using atomic updates. And I'm just showing you that

15:16.440 --> 15:21.960
where these atomic updates will happen. And they happen in connection with the internal data

15:22.040 --> 15:29.800
to the internal states or those in separation logic. For example, here the updates are actually

15:31.320 --> 15:39.880
downing the counter, acquiring the resource, happen at the pawns where the count is decremented.

15:42.120 --> 15:50.840
And in the case, that's the normal case. So that actually happens is that the execution

15:50.840 --> 15:57.960
can't technically block waiting for the semaphore. And in one of the case, it will set a time out.

15:57.960 --> 16:04.600
And when the time out happen, which is in another implementation, not exactly in the call out

16:04.600 --> 16:11.960
control valve, we will commit the rest of the case in the specification. And this is if you're familiar,

16:11.960 --> 16:18.680
this is called an external linearization pawns. And this can be handled easily with atomic updates.

16:19.640 --> 16:28.200
In a case where you suddenly got unblocked here, then your unblocked is actually done with the

16:28.200 --> 16:35.240
help of another control and up semaphore. And so the update is happening inside the up. And this is

16:35.240 --> 16:47.000
also a notion of helping in a linearization. And this is so easily handled. And with that,

16:47.880 --> 16:51.560
I'm just showing this specification. This is a fairly complex protocol, but

16:51.560 --> 16:55.960
doing formal applications just mean that we have to stay this explicitly and have to check on

16:55.960 --> 17:05.720
the case on the case explicitly. And we were able to specify the behavior of Nova and just to

17:05.720 --> 17:16.040
move up one level of abstraction, which is the then the behavior of the whole VCPU running on top of Nova.

17:17.080 --> 17:26.040
This combination of a user semantics, where the kernel semantics is replaced by

17:26.040 --> 17:32.680
humans, then just simple testing or setting analyzes. But that's also a lot of efforts.

17:34.280 --> 17:42.920
But luckily we have reached a pawn where we have a lot of tools to be able to have modularity

17:42.920 --> 17:47.960
in our process and also a lot of automation in the process to reduce their efforts.

