WEBVTT

00:00.000 --> 00:08.200
Thank you all for being here today.

00:08.200 --> 00:11.480
So my name is Paul Le Guen de Kerneizon.

00:11.480 --> 00:15.640
Today I'm speaking about the Linux Foundation

00:15.640 --> 00:18.000
Energy SEAPATH project.

00:18.000 --> 00:22.400
And more specifically about the SVTrace tool, which

00:22.400 --> 00:27.080
is a latency tracing tool that we developed during

00:27.160 --> 00:34.160
2024 in the context of virtualized networking platforms.

00:34.160 --> 00:37.720
So first, I work at a company called

00:37.720 --> 00:38.720
Savoir-faire Linux.

00:38.720 --> 00:43.320
We are a company of experts in libre,

00:43.320 --> 00:44.960
free and open source technologies.

00:44.960 --> 00:48.400
And we are based in Canada and Europe.

00:48.400 --> 00:51.960
And we are mainly working on a project which is called

00:51.960 --> 00:57.840
SEAPATH, which is a project of the Linux Foundation

00:57.840 --> 00:59.040
Energy.

00:59.040 --> 01:01.920
So quickly, what is the SEAPATH project?

01:01.920 --> 01:04.840
We will not spend too much time on it.

01:04.840 --> 01:09.760
So this is a very, very oversimplified diagram

01:09.760 --> 01:13.360
of how electrical energy is produced,

01:13.360 --> 01:16.600
transported and distributed to your home.

01:16.600 --> 01:21.440
And as you can see, there is a place

01:21.440 --> 01:23.000
that we call a substation.

01:23.000 --> 01:29.360
And the main purpose of a substation is to monitor

01:29.360 --> 01:33.120
the electrical energy, among other things.

01:33.120 --> 01:38.880
Currently, and for decades now, substations have

01:38.880 --> 01:42.840
contained lots of hardware devices, and there is a transition

01:42.840 --> 01:47.000
of these substations into digital substations,

01:47.000 --> 01:49.640
with more and more digital devices.

01:49.640 --> 01:54.200
And for this, we want to run these devices as virtual devices

01:54.200 --> 01:56.760
on a virtualization platform.

01:56.760 --> 02:00.000
And this is where SEAPATH comes in.

02:00.000 --> 02:04.640
SEAPATH is mainly an open source hypervisor

02:04.640 --> 02:07.960
designed for the digital substation.

02:07.960 --> 02:13.720
And the main purpose of SEAPATH is to host virtualized

02:13.720 --> 02:18.560
automation and protection applications

02:18.560 --> 02:22.320
with real-time constraints.

02:22.320 --> 02:26.840
So this is a diagram of the main technologies

02:26.840 --> 02:27.560
we are using.

02:27.560 --> 02:32.960
The main technologies are all about a set of clustering

02:32.960 --> 02:38.880
features, and a lot of virtualization.

02:38.880 --> 02:42.920
If you're interested in a more in-depth presentation

02:42.920 --> 02:45.760
of the SEAPATH project, we have already

02:45.760 --> 02:48.720
given some presentations in the past few years

02:48.720 --> 02:50.560
in the energy dev room.

02:50.560 --> 02:53.040
So if you're interested, you can take a look

02:53.040 --> 02:57.240
at those past presentations.

02:57.240 --> 03:01.080
So let's now get to the subject, which

03:01.080 --> 03:05.160
is the need for low-latency communication

03:05.160 --> 03:08.160
in a substation.

03:08.160 --> 03:12.760
So to give a little bit of context: in a substation,

03:12.760 --> 03:16.600
low-latency communication is a priority,

03:16.600 --> 03:19.160
because, as you can see, in a substation

03:19.160 --> 03:21.960
safety and reliability are critical.

03:21.960 --> 03:25.000
For example, if during a storm

03:25.000 --> 03:28.840
a tree falls on an electrical power line,

03:28.840 --> 03:33.440
the current must be cut off very, very quickly.

03:33.440 --> 03:36.480
And as the substation is monitoring

03:36.480 --> 03:42.680
these electrical power lines, we have real-time constraints,

03:42.680 --> 03:46.320
and the virtualized protection application must react

03:46.320 --> 03:50.720
as soon as possible.

03:50.720 --> 03:54.400
So to communicate in a substation, there

03:54.400 --> 04:00.440
is a standard called IEC 61850,

04:00.440 --> 04:05.360
which defines a lot of protocols.

04:05.360 --> 04:11.200
And one of them is called SV, for the Sampled Values protocol.

04:11.200 --> 04:16.360
It's basically just a layer 2 protocol,

04:16.360 --> 04:21.800
which can be identified by a protocol field,

04:21.800 --> 04:24.520
the Ethernet protocol field.
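
NOTE
A minimal sketch of the identification step described above: recognizing an SV
frame from the Ethernet protocol field (EtherType). The value 0x88BA is the
EtherType registered for IEC 61850 Sampled Values. The function name and the
VLAN handling are illustrative assumptions, not taken from the talk.
```python
import struct
# 0x88BA is the EtherType registered for IEC 61850 Sampled Values.
# 0x8100 marks an 802.1Q VLAN tag, which SV frames often carry.
ETHERTYPE_SV = 0x88BA
ETHERTYPE_VLAN = 0x8100
def is_sv_frame(frame: bytes) -> bool:
    """Return True if a raw Ethernet frame carries Sampled Values (hypothetical helper)."""
    if len(frame) < 14:
        return False
    # The EtherType sits after the two 6-byte MAC addresses.
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype == ETHERTYPE_VLAN:
        # Skip the 4-byte VLAN tag and read the inner EtherType.
        if len(frame) < 18:
            return False
        (ethertype,) = struct.unpack_from("!H", frame, 16)
    return ethertype == ETHERTYPE_SV
```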

04:24.520 --> 04:30.400
And it follows a publisher/subscriber scheme.

04:30.400 --> 04:35.000
The publisher machine periodically sends

04:35.000 --> 04:39.360
the sampled value packets at a fixed pace, which

04:39.360 --> 04:43.520
depends on the frequency of your electrical network.
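
NOTE
A rough worked example of the fixed pace mentioned above. It assumes 80 samples
per grid cycle, a rate commonly used for protection applications; the talk does
not state the rate, and the function name is hypothetical.
```python
def sv_interval_us(grid_hz: float, samples_per_cycle: int = 80) -> float:
    """Microseconds between two consecutive SV frames from one publisher.
    samples_per_cycle=80 is an assumption, not a figure from the talk."""
    frames_per_second = grid_hz * samples_per_cycle
    return 1_000_000 / frames_per_second
# Under that assumption, a 50 Hz grid gives 4000 frames per second,
# so one frame every 250 microseconds.
print(sv_interval_us(50))
```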

04:43.520 --> 04:48.520
And the subscriber receives these packets

04:48.520 --> 04:52.200
and decodes them, and then an application

04:52.200 --> 04:58.560
does whatever it wants with the data in the SV.

04:58.560 --> 05:06.160
So if we take a look at how this works in the SEAPATH architecture,

05:06.160 --> 05:08.240
we have a publisher machine.

05:08.240 --> 05:12.280
It sends the SV to a SEAPATH hypervisor.

05:12.280 --> 05:16.240
And then we have the subscribers, which are virtual machines.

05:16.240 --> 05:20.400
And all of them receive the same sampled values,

05:20.400 --> 05:24.280
decode them, et cetera.

05:24.280 --> 05:30.440
So what we need is to ensure very, very low

05:30.440 --> 05:33.480
latency communication between emission by the publisher

05:33.480 --> 05:36.280
machine and reception by the subscriber machine.

05:36.280 --> 05:41.400
And when I say low, I mean the lowest possible.

05:41.400 --> 05:44.920
So what I wanted to do first is to draw

05:44.920 --> 05:51.800
a map of the transit of this SV, from its emission

05:51.800 --> 05:57.080
by the publisher machine to its reception

05:57.080 --> 06:00.000
in the subscriber.

06:00.000 --> 06:05.800
So as you see here, first the publisher machine sends

06:05.800 --> 06:10.040
the sampled value from its network interface.

06:10.040 --> 06:11.640
It travels through the network.

06:11.640 --> 06:14.440
And then it's received, at T1,

06:14.440 --> 06:20.120
on the hypervisor network interface.

06:20.120 --> 06:24.920
And then, in the hypervisor, this sampled value

06:24.920 --> 06:29.800
is transmitted through an Open vSwitch bridge

06:29.800 --> 06:33.120
to all the virtual machines.

06:33.120 --> 06:38.600
And so we have, at T2, the reception of the sampled value

06:38.600 --> 06:42.800
inside the network interface of each virtual machine.

06:45.440 --> 06:52.280
And finally, inside the subscriber, inside the virtual machine,

06:52.280 --> 06:58.760
the sample value is transmitted from the network interface

06:58.760 --> 07:03.920
to a user application, a protection application.

07:03.920 --> 07:08.440
So as you see here, we have several latencies

07:08.440 --> 07:15.080
that we identified as T hypervisor, T subscriber, and so on.

07:15.080 --> 07:21.120
And what we want to know is where the latencies

07:21.120 --> 07:22.520
are coming from.

07:22.520 --> 07:27.920
And we want to track down exactly where there are

07:27.920 --> 07:28.880
problems.

07:28.880 --> 07:33.520
So for the next examples, I have assumed

07:33.520 --> 07:37.960
a perfect publisher machine, which sends

07:37.960 --> 07:41.720
the sampled values at a perfect pace.

07:41.720 --> 07:51.520
And here we will track the latency inside the hypervisor

07:51.520 --> 07:52.680
only.

07:52.680 --> 07:58.640
So where is the latency of these SVs coming from?

07:58.640 --> 08:00.680
And how can we find out?

08:00.680 --> 08:08.480
So the first step is to identify, inside the kernel,

08:08.480 --> 08:13.960
which kernel stack is executed

08:13.960 --> 08:16.760
by the kernel when a sampled value is transmitted.

08:16.760 --> 08:21.240
What is our kernel doing when it receives a packet?

08:21.240 --> 08:26.440
So for this, we use a tool called pwru,

08:26.440 --> 08:30.040
which stands for "packet, where are you?".

08:30.040 --> 08:33.400
And the main purpose of this tool is to show us

08:33.400 --> 08:36.920
every kernel function that has been called

08:36.920 --> 08:41.080
when a specific packet that we have filtered

08:41.080 --> 08:42.880
arrives on the machine.

08:42.880 --> 08:46.680
As you see here, there are three main processes:

08:46.680 --> 08:49.680
the NIC IRQ, the Open vSwitch

08:49.680 --> 08:50.680
daemon,

08:50.680 --> 08:54.320
and finally the QEMU process, and more

08:54.320 --> 08:57.720
specifically, a specific thread of the QEMU

08:57.720 --> 09:00.280
process, which is called the vhost driver,

09:00.280 --> 09:08.000
whose main purpose is to send the data to the virtual machines.

09:08.000 --> 09:12.360
And what is interesting here is that we have identified

09:12.360 --> 09:16.360
an arrival of the sampled value on the hypervisor

09:16.360 --> 09:21.360
and, finally, a departure of the sampled value.

09:21.360 --> 09:25.000
So we have an arrival and

09:25.000 --> 09:26.640
a departure, and we want to know

09:26.640 --> 09:30.840
how we can measure the latency between these two points.

09:30.840 --> 09:33.920
And so to do this, we use another tool.

09:33.920 --> 09:37.720
It is called bpftrace. Maybe you have heard

09:37.720 --> 09:40.560
about it, because there are a lot of FOSDEM

09:40.560 --> 09:44.000
conference talks about it.

09:44.000 --> 09:47.240
bpftrace is a high-level kernel tracer

09:47.240 --> 09:48.880
based on eBPF.

09:48.880 --> 09:55.040
And the main point is that you have a scripting language,

09:55.040 --> 10:00.560
and you can run custom actions based on

10:00.560 --> 10:04.760
which kernel tracing points are hit.

10:04.760 --> 10:11.320
And here are the two tracing points, the two probes,

10:11.320 --> 10:13.920
that we identified previously:

10:13.920 --> 10:17.360
an SKB function [name unclear] and consume_skb, which are kprobes.

10:17.360 --> 10:21.720
So with bpftrace, when these functions

10:21.720 --> 10:26.520
are executed, we can execute some custom actions.

10:26.520 --> 10:29.920
So for example, in a bpftrace script,

10:29.920 --> 10:34.200
we record a timestamp each time the first function

10:34.200 --> 10:38.840
is called, and we record a second timestamp

10:38.840 --> 10:42.520
when the consume_skb function is called,

10:42.520 --> 10:45.640
and then between these two points,

10:45.640 --> 10:48.280
we can measure latency.
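
NOTE
The two-timestamp idea above can be sketched in plain Python. This is only an
illustration of the pairing logic, not the actual bpftrace script from the
talk; the event format and the function name are assumptions.
```python
def pair_latencies(events):
    """Match each arrival with the later departure of the same packet.
    events: iterable of (kind, key, timestamp_ns), where kind is "arrival"
    or "departure" and key identifies the packet (hypothetical format).
    Returns the latencies in microseconds, in departure order."""
    pending = {}
    latencies_us = []
    for kind, key, ts_ns in events:
        if kind == "arrival":
            # First probe: remember when this packet entered the hypervisor.
            pending[key] = ts_ns
        elif kind == "departure" and key in pending:
            # Second probe: the difference is the transit latency.
            latencies_us.append((ts_ns - pending.pop(key)) / 1000)
    return latencies_us
```
Departures with no recorded arrival are ignored rather than guessed at.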

10:51.480 --> 10:57.160
So the first step is to wrap this bpftrace script

10:57.160 --> 11:03.520
in a Python wrapper, and this is where the SVTrace

11:03.520 --> 11:05.120
tool comes in.

11:05.120 --> 11:08.280
We developed it with two main features.

11:08.280 --> 11:10.440
The first one is a live feature,

11:10.760 --> 11:14.640
which you see on the left, and it shows us,

11:14.640 --> 11:19.320
live, in a histogram, the real-time latency

11:19.320 --> 11:23.840
of the transit of our sampled values,

11:23.840 --> 11:28.840
with some statistics such as the average and maximum latency,

11:28.840 --> 11:30.600
et cetera.
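
NOTE
A small sketch of the kind of summary the live view reports: average, maximum
and a bucketed histogram. The bucket width and the function name are
assumptions; this is not SVTrace's actual code.
```python
from collections import Counter
def summarize(latencies_us, bucket_us=50):
    """Return (average, maximum, histogram) for a non-empty list of latencies.
    The histogram maps each bucket's lower bound in microseconds to a count."""
    histogram = Counter(int(value // bucket_us) * bucket_us for value in latencies_us)
    average = sum(latencies_us) / len(latencies_us)
    return average, max(latencies_us), dict(sorted(histogram.items()))
```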

11:30.600 --> 11:33.760
And the second one is a record feature.

11:33.760 --> 11:37.120
In a log file, we record all of the latencies

11:37.120 --> 11:40.960
we measured, and we use it for long tests

11:40.960 --> 11:44.120
or post-processing analysis.

11:45.120 --> 11:48.600
So let's look at a practical case,

11:48.600 --> 11:52.480
and let's see if we can track down some latency

11:52.480 --> 11:55.640
sources, and maybe fix them.

11:55.640 --> 11:57.720
So first, we do a nominal case.

11:57.720 --> 12:01.960
So we are investigating the latency between the T1

12:01.960 --> 12:06.960
and T2 points, to see if we can optimize the situation here.

12:06.960 --> 12:12.840
So what you see here is that we did a four-hour test

12:12.840 --> 12:17.040
without any optimization on the hypervisor

12:17.040 --> 12:19.880
or inside the virtual machine, and what you see

12:19.880 --> 12:22.320
is not a very good result.

12:22.320 --> 12:29.720
We have a lot of latencies that are well above 200 microseconds,

12:29.720 --> 12:34.640
and in a real-time context, it's really, really bad.

12:34.640 --> 12:40.040
So what we did is some optimizations.

12:40.040 --> 12:44.960
We isolated the KVM cores of our virtual machines.

12:44.960 --> 12:50.200
We isolated the QEMU main process,

12:50.200 --> 12:56.440
and we got much better results.

12:56.440 --> 13:02.720
We have fewer sampled values that are above 200 microseconds.

13:02.720 --> 13:09.520
But we still see what I call spikes, which I have highlighted.

13:09.520 --> 13:12.200
Some of them are at the beginning,

13:12.200 --> 13:16.840
and are expected behavior.

13:16.840 --> 13:19.760
The first one is Open vSwitch: when it

13:19.760 --> 13:23.360
starts to forward the sampled values, for whatever reason

13:23.360 --> 13:26.920
it first forwards them in user space,

13:26.920 --> 13:29.040
and then it can load the kernel driver,

13:29.080 --> 13:35.360
and after that the rest of the SVs are sent through kernel space.

13:35.360 --> 13:39.680
But the last two spikes we don't know

13:39.680 --> 13:42.960
where they are coming from.

13:42.960 --> 13:50.240
So this is where we used LTTng,

13:50.240 --> 13:54.520
because this is a real-time problem.

13:54.520 --> 13:59.000
It's becoming too complex to investigate with just

13:59.000 --> 13:59.840
a graph.

13:59.840 --> 14:02.200
And so what do you see here?

14:02.200 --> 14:12.360
The first LTTng trace shows the execution of the pipeline for a sampled value.

14:12.360 --> 14:17.120
So first we see that it arrives at the hypervisor IRQ,

14:17.120 --> 14:22.280
and at the end it passes through the vhost, the QEMU vhost driver,

14:22.280 --> 14:27.280
and then it is transmitted to the virtual machines.

14:27.280 --> 14:33.800
But we see here that we have a big latency of 100 microseconds,

14:33.800 --> 14:38.160
and we see here that there is a green process,

14:38.160 --> 14:42.400
the qemu-system process, and we don't know what it is doing here.

14:42.400 --> 14:46.520
And we see this problem very often,

14:46.520 --> 14:49.880
and this is what is creating these big spikes.

14:49.880 --> 14:52.480
So the solution is quite easy.

14:52.520 --> 14:55.560
It's just a priority fix.

14:55.560 --> 15:01.000
You just have to make the qemu-system priority much lower

15:01.000 --> 15:10.000
than that of the vhost thread of QEMU.

15:10.000 --> 15:14.240
And the good result is pictured below.

15:14.240 --> 15:20.520
We have 40 microseconds between the reception on the hypervisor

15:20.520 --> 15:27.800
and the transit to the virtual machine.

15:27.800 --> 15:29.800
So thank you for your attention,

15:29.800 --> 15:33.520
and if you're interested in the SVTrace project,

15:33.520 --> 15:37.080
or maybe in the SEAPATH project,

15:37.080 --> 15:40.440
you can take a look at the GitHub page.

15:40.440 --> 15:41.440
Thank you.

15:41.440 --> 15:44.440
Thank you.

16:01.440 --> 16:03.600
In your example, you used an OVS bridge,

16:03.600 --> 16:07.160
but is there any reason this couldn't work with a simple Linux

16:07.160 --> 16:11.240
bridge as well, or does SVTrace support Linux bridges as well?

16:11.240 --> 16:14.680
So you're asking about the OVS bridge.

16:14.680 --> 16:18.360
And whether you could use this with regular Linux

16:18.360 --> 16:22.440
bridges as well, just regular Linux bridges.

16:22.440 --> 16:26.720
We didn't have the

16:26.720 --> 16:28.600
time to test all the technologies.

16:28.600 --> 16:31.320
We only tested the Open vSwitch bridge.

16:31.320 --> 16:34.280
We did some tests with PCI passthrough,

16:34.280 --> 16:36.520
but that's kind of cheating.

16:36.520 --> 16:39.240
And it's not quite the same, because with PCI passthrough

16:39.240 --> 16:44.240
you are reserving one Ethernet port for the virtual machine.

16:44.240 --> 16:48.360
And the results were very good, but it's PCI passthrough

16:48.360 --> 16:52.280
and not software transit.

16:52.280 --> 16:58.680
But we are planning to try other bridging technologies,

16:58.680 --> 17:01.800
and also other things like maybe

17:01.800 --> 17:07.720
the eXpress Data Path or maybe DPDK.

17:07.720 --> 17:10.280
About OVS, that's called an upcall.

17:10.280 --> 17:11.800
This is how OVS works.

17:11.800 --> 17:13.960
So what you're seeing as a spike is an upcall.

17:13.960 --> 17:18.360
And you don't use OVS on a real-time system pass.

17:18.360 --> 17:19.560
You don't do that.

17:19.560 --> 17:20.920
It's not made for it.

17:20.920 --> 17:23.240
But OK.

17:23.240 --> 17:28.800
Did you use huge pages and dedicated CPUs for the VM as well?

17:28.800 --> 17:29.800
Yes.

17:29.880 --> 17:33.560
Huge pages we don't need, because we don't

17:33.560 --> 17:36.360
use a lot of memory.

17:36.360 --> 17:38.920
We are not in a case where we are, for example, using DPDK.

17:38.920 --> 17:48.200
But CPU allocation we do use: the KVM cores are isolated.

17:48.200 --> 17:50.600
The KVM cores and the QEMU main process

17:50.600 --> 17:58.040
are isolated too, on reserved real-time cores.

17:58.040 --> 18:02.120
So if you see some TLB shootdowns, you should use huge pages.

18:02.120 --> 18:03.880
And that's going to decrease your latency as well.

18:03.880 --> 18:05.800
But maybe this is something that you cannot measure,

18:05.800 --> 18:08.840
because your latency overall is still pretty high

18:08.840 --> 18:10.840
from an absolute number.

18:10.840 --> 18:14.120
Because we have had this stack running for years,

18:14.120 --> 18:16.920
with real-time end to end, and we have lower latency

18:16.920 --> 18:20.600
that we measure with cyclictest and other tools.

18:20.600 --> 18:25.320
And so I think you have almost the right ingredients,

18:25.320 --> 18:28.600
but you need a little bit of tuning, and just don't use OVS.

18:28.600 --> 18:31.720
Or make sure you don't have an upcall.

18:31.720 --> 18:35.720
This is something that you can do, but you need to know how to do it.

18:35.720 --> 18:36.120
Yes.

18:36.120 --> 18:39.880
But we use Open vSwitch, because it's basically the only technology

18:39.880 --> 18:44.040
that meets a lot of our needs.

18:44.040 --> 18:47.320
So there are still a lot of other technologies

18:47.320 --> 18:51.560
that maybe we could use.

18:51.640 --> 18:54.360
And this is why we have planned to test

18:54.360 --> 18:56.840
other things than only Open vSwitch.

19:01.560 --> 19:04.360
Anyone else?

19:04.360 --> 19:04.920
Thank you very much.

19:04.920 --> 19:05.400
Thank you.

