WEBVTT

01:22.480 --> 01:23.480
Thank you.

01:23.480 --> 01:35.480
I appreciate you coming to our discussion about dynamic resource allocation for networking.

01:35.480 --> 01:38.480
My name is Doug Smith.

01:38.480 --> 01:44.480
I work on OpenShift networking, and I'm a member of the Network Plumbing Working Group.

01:44.480 --> 01:47.480
And this is Miguel, who I'm joined with.

01:47.480 --> 01:53.480
We also collaborate on a number of things in this area.

01:53.480 --> 01:54.480
Okay.

01:54.480 --> 01:56.480
So I'll take it from here.

01:56.480 --> 01:59.480
So the first thing, let's walk through the agenda.

01:59.480 --> 02:01.480
We'll try to define the problem

02:01.480 --> 02:04.480
we're trying to solve, the problem we're facing right now.

02:04.480 --> 02:12.480
We will try to paint the landscape of multi-networking in the Kubernetes ecosystem.

02:13.480 --> 02:16.480
And we will afterwards provide the context for DRA.

02:16.480 --> 02:19.480
Explain what dynamic resource allocation is.

02:19.480 --> 02:25.480
I mean, I guess you can guess that it has to do with allocating resources, but we will go a little bit deeper

02:25.480 --> 02:31.480
and contextualize what that means for networking in Kubernetes,

02:31.480 --> 02:34.480
or especially multi-networking in Kubernetes.

02:34.480 --> 02:41.480
We will then follow with a tour of the community's resources, like what we have at this point in time,

02:41.480 --> 02:44.480
and we will finalize with a call to action.

02:44.480 --> 02:48.480
So, the current problem and the alternatives to that.

02:48.480 --> 02:49.480
I'm not sure.

02:49.480 --> 02:54.480
I assume most of you are used to Kubernetes and are using it.

02:54.480 --> 02:57.480
And are quite aware of the Kubernetes networking model.

02:57.480 --> 03:07.480
So pretty much what you get is that all your workloads in the cluster will get one single network interface that is totally managed.

03:07.480 --> 03:09.480
It connects every workload in the system.

03:09.480 --> 03:10.480
If you don't want that,

03:10.480 --> 03:13.480
you put network policy on top to drop traffic.

03:13.480 --> 03:17.480
And that is what you have today.

03:17.480 --> 03:22.480
But the question is: what if you need more than one network interface,

03:22.480 --> 03:24.480
or a connection to more networks?

03:24.480 --> 03:27.480
Let's say you're implementing a firewall or something.

03:27.480 --> 03:30.480
Well, you won't do that with one single interface.

03:30.480 --> 03:34.480
What Kubernetes has done,

03:34.480 --> 03:40.480
let's say it was a smart move, is that it totally avoided answering this problem

03:40.480 --> 03:44.480
and left it out of its use cases.

03:44.480 --> 03:46.480
Thing is, the community

03:46.480 --> 03:51.480
joined forces: there's an initiative called the Network Plumbing Working Group.

03:51.480 --> 03:59.480
They came up with something called Multus CNI, which kind of tricks the system and provides you an API through which you can request more attachments

03:59.480 --> 04:02.480
to different networks for your workloads.

04:02.480 --> 04:04.480
So it pretty much is an out-of-tree solution.

04:04.480 --> 04:06.480
This is not integrated in Kubernetes at all.

04:06.480 --> 04:10.480
Kubernetes does not know anything about this.

04:10.480 --> 04:16.480
And while that is extremely flexible, it looks extremely bad.

04:16.480 --> 04:18.480
So it's the two things on the left.

04:18.480 --> 04:22.480
If you look at it, it's like a network attachment definition in the bottom left.

04:22.480 --> 04:28.480
It's pretty much a JSON-encoded string where you put whatever it is that you want,

04:28.480 --> 04:32.480
whatever will make sense to your plugin for it to do what you want.

04:32.480 --> 04:35.480
And in the bottom, we have how you request an attachment to that.

04:35.480 --> 04:37.480
You just put in the annotations.

04:37.480 --> 04:42.480
Again, a JSON-encoded string saying which runtime parameters you want it to have.

04:42.480 --> 04:44.480
So it's not very comfortable.

04:44.480 --> 04:48.480
It looks bad, but does the deed.

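NOTE
A minimal sketch of the two JSON-encoded strings described above, assuming the
Network Plumbing Working Group conventions (the k8s.cni.cncf.io/v1
NetworkAttachmentDefinition and the k8s.v1.cni.cncf.io/networks pod annotation).
The macvlan settings, object names, and image are placeholders, not taken from the slide.
```python
import json
# CNI plugin configuration; plugin type and fields are illustrative placeholders.
cni_config = {"cniVersion": "0.3.1", "type": "macvlan", "master": "eth0", "mode": "bridge"}
# The NetworkAttachmentDefinition carries that configuration as one JSON-encoded string.
net_attach_def = {
    "apiVersion": "k8s.cni.cncf.io/v1",
    "kind": "NetworkAttachmentDefinition",
    "metadata": {"name": "macvlan-conf"},
    "spec": {"config": json.dumps(cni_config)},
}
# The pod requests the attachment through an annotation, again a JSON-encoded string.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "firewall",
        "annotations": {
            "k8s.v1.cni.cncf.io/networks": json.dumps(
                [{"name": "macvlan-conf", "interface": "net1"}]
            ),
        },
    },
    "spec": {"containers": [{"name": "app", "image": "example.invalid/app:latest"}]},
}
print(json.dumps(pod["metadata"]["annotations"], indent=2))
```
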
04:48.480 --> 04:54.480
Now, something came up fairly recently, a couple of years ago probably,

04:54.480 --> 04:59.480
called the Kubernetes-native multi-networking initiative, and they proposed something different.

04:59.480 --> 05:04.480
So you would have a generic CRD called PodNetwork in the bottom right,

05:04.480 --> 05:06.480
where you would define your provider.

05:06.480 --> 05:08.480
You would pass a couple of parameters and all that.

05:08.480 --> 05:12.480
And then in the upper right, you would have like,

05:12.480 --> 05:16.480
you would get to refer to these things in the pod specification.

05:16.480 --> 05:20.480
So you would have to mutate the pod specification as it is today.

05:20.480 --> 05:22.480
You would need to change the pod specification.

05:22.480 --> 05:29.480
You would have a network array where you could list which attachments you want to have.

05:29.480 --> 05:31.480
Now,

05:31.480 --> 05:35.480
SIG Network, Kubernetes SIG Network, looked at this and said pretty much:

05:35.480 --> 05:38.480
Nope, you're not going to touch that.

05:38.480 --> 05:41.480
You're not going to change the pod specification.

05:41.480 --> 05:43.480
Find something else.

05:43.480 --> 05:47.480
And they were pretty adamant about this.

05:47.480 --> 05:51.480
This was like a non-negotiable kind of thing.

05:51.480 --> 05:58.480
And it kind of left us, well, left the community in a position where we do not know how to proceed.

05:58.480 --> 06:02.480
Doing this out of tree, we already have that; that's not what we want.

06:02.480 --> 06:03.480
How do we put this in?

06:03.480 --> 06:05.480
So the question essentially is,

06:05.480 --> 06:17.480
how do we come up with a Kubernetes-native way to request a generic resource that happens to be an attachment to a network?

06:17.480 --> 06:21.480
That's the only thing we need to sort out, I guess.

06:21.480 --> 06:23.480
And Doug will take it over from here.

06:23.480 --> 06:25.480
Awesome. Thank you, Miguel.

06:25.480 --> 06:29.480
So now that Miguel has laid the groundwork,

06:29.480 --> 06:31.480
you might be wondering, well,

06:31.480 --> 06:36.480
what the heck is DRA, dynamic resource allocation?

06:36.480 --> 06:40.480
Let's imagine this scenario.

06:40.480 --> 06:46.480
You've got a workload and it's going to use a GPU. As you may have heard,

06:46.480 --> 06:52.480
artificial intelligence and machine learning are fairly popular these days.

06:52.480 --> 07:01.480
Dynamic resource allocation is really designed for this specific case where you may need a GPU resource.

07:01.480 --> 07:04.480
So if you've got a workload, it wants to use a GPU.

07:04.480 --> 07:09.480
And you've got a cluster and there are so many GPUs available.

07:09.480 --> 07:20.480
Well, what DRA allows you to do is to inform the scheduler that you have a number of those available.

07:20.480 --> 07:24.480
So if you've got this workload, which node should it arrive on?

07:24.480 --> 07:28.480
Well, as you can see here on the middle node, there are no GPUs available.

07:28.480 --> 07:32.480
So it would leave you with one or two.

07:32.480 --> 07:37.480
So DRA gives you a way

07:37.480 --> 07:43.480
to express your intention to get some of these hardware resources.

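NOTE
A minimal sketch of the node-filtering idea described above: once the scheduler
knows how many devices each node has free, nodes with none drop out. This is a
toy model, not the Kubernetes scheduler; node names and counts are made up.
```python
# Free GPUs per node (hypothetical values); the middle node has none.
free_gpus = {"node-1": 2, "node-2": 0, "node-3": 1}
def feasible_nodes(free: dict, wanted: int) -> list:
    """Keep only the nodes that can still satisfy the request."""
    return [name for name, count in free.items() if count >= wanted]
print(feasible_nodes(free_gpus, 1))
```
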
07:43.480 --> 07:45.480
So what about in networking?

07:45.480 --> 07:51.480
Well, you can say the same thing for a virtualized NIC, for example.

07:51.480 --> 08:00.480
So SR-IOV, single root I/O virtualization, is a hardware technology for high-performance networking.

08:00.480 --> 08:04.480
You have exhaustible resources there as well.

08:04.480 --> 08:05.480
So the same thing applies.

08:05.480 --> 08:10.480
If you say I'd like to request a virtual interface,

08:10.480 --> 08:16.480
well, maybe there aren't any available on a particular node.

08:16.480 --> 08:21.480
If you've been in this space for a while, you may be asking yourself,

08:21.480 --> 08:29.480
well, don't we already have device plugins, which have been in Kubernetes for quite some time?

08:29.480 --> 08:34.480
And believe it or not, even back in Kubernetes 1.8.

08:34.480 --> 08:39.480
This was also designed for AI/ML scenarios.

08:39.480 --> 08:45.480
And the thing with device plugins, and we do use these for SR-IOV, for example,

08:45.480 --> 08:50.480
is that it's just an integer counter.

08:50.480 --> 08:54.480
That's all that you get: just a count of resources,

08:54.480 --> 09:01.480
1, 2, 3, 4, and you decrement the counter when you use one on a particular node.

09:01.480 --> 09:08.480
In DRA, you have a way to express this with Kubernetes resources,

09:08.480 --> 09:17.480
a resource class and a resource claim, which give you a richer structure to express those.

09:17.480 --> 09:23.480
And additionally, instead of saying one pod claims this one resource,

09:23.480 --> 09:26.480
it's exhausted, you can't use this anymore,

09:26.480 --> 09:32.480
DRA gives you the opportunity to say,

09:32.480 --> 09:36.480
share this resource.

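NOTE
A minimal sketch contrasting the two models described above: a device-plugin
style integer counter versus a claim that carries attributes and can be shared.
The Claim class, its fields, and all names are hypothetical; this is a toy
model, not the Kubernetes ResourceClaim API.
```python
from dataclasses import dataclass, field
# Device-plugin style: each node only advertises how many devices are free.
free_vfs = {"node-a": 2, "node-b": 0}
def allocate_counted(node: str) -> bool:
    """Take one device from the node's counter; fail when it is exhausted."""
    if free_vfs[node] == 0:
        return False
    free_vfs[node] -= 1
    return True
# Claim style: a request can carry attributes and may admit several consumers.
@dataclass
class Claim:
    device_class: str
    attributes: dict
    shareable: bool = False
    consumers: list = field(default_factory=list)
    def reserve(self, pod: str) -> bool:
        """Admit a pod unless the claim is exclusive and already taken."""
        if self.consumers and not self.shareable:
            return False
        self.consumers.append(pod)
        return True
shared = Claim("sriov-vf", {"numa_node": 0}, shareable=True)
print(allocate_counted("node-b"), shared.reserve("pod-1"), shared.reserve("pod-2"))
```
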
09:36.480 --> 09:42.480
So why would we use this for networking?

09:42.480 --> 09:51.480
And I would say that the problem is really actually compounded by a problem that a lot of people are trying to solve today,

09:51.480 --> 09:57.480
which is actually where you need GPUs and NICs at the same time.

09:57.480 --> 10:04.480
So if you've got a scaled-up training scenario where you're using GPUs and you're using NICs,

10:04.480 --> 10:13.480
because you need to have a high data transfer rate between nodes to share the checkpoints as you're running through training,

10:13.480 --> 10:19.480
you're going to need to allocate both the GPUs and the NICs,

10:19.480 --> 10:24.480
and you're also going to need to get, in particular, NUMA alignment.

10:24.480 --> 10:33.480
So you have them on the same compute nodes in order

10:33.480 --> 10:36.480
to get that high bandwidth that you need.

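NOTE
A minimal sketch of the alignment constraint described above: pick a GPU and a
NIC that sit on the same NUMA node. The inventory and device names are
hypothetical, and this is not how any particular DRA driver implements it.
```python
# Hypothetical inventory of one machine: device name mapped to its NUMA node.
gpus = {"gpu0": 0, "gpu1": 1}
nics = {"vf0": 1, "vf1": 1}
def aligned_pairs(gpus: dict, nics: dict) -> list:
    """Return (gpu, nic) pairs that share a NUMA node."""
    return [(g, n) for g, gn in gpus.items() for n, nn in nics.items() if gn == nn]
print(aligned_pairs(gpus, nics))
```
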
10:36.480 --> 10:38.480
And so that's one part.

10:38.480 --> 10:48.480
And the other part is we kind of still don't have a common Kubernetes-native way to express our intent to use a network resource,

10:48.480 --> 10:52.480
as Miguel was mentioning; we've been struggling with this for years.

10:52.480 --> 10:56.480
And the question also is, why not CNI?

10:56.480 --> 11:03.480
Well, CNI is used widely for networking in Kubernetes,

11:03.480 --> 11:12.480
but CNI in some ways predates Kubernetes. It was designed to be container-orchestration agnostic,

11:12.480 --> 11:15.480
and then Kubernetes won.

11:15.480 --> 11:19.480
So it's the container orchestration engine that people talk about.

11:19.480 --> 11:23.480
So you have two different APIs.

11:23.480 --> 11:26.480
So if you're somebody who's building Kubernetes controllers,

11:26.480 --> 11:29.480
and you use Kubernetes API all day every day,

11:29.480 --> 11:35.480
and you go to use CNI, all of a sudden this looks like a total non sequitur to you.

11:35.480 --> 11:40.480
So sometimes people want to kind of work around CNI,

11:40.480 --> 11:45.480
which removes kind of like a common place that we use for problem solving.

11:45.480 --> 11:49.480
And are there downsides to DRA for networking?

11:49.480 --> 11:50.480
Yes.

11:51.480 --> 11:55.480
And we are trying to work through these, and here's the thing.

11:55.480 --> 11:59.480
CNI itself is actually very elegant and very simple.

11:59.480 --> 12:01.480
I'm a real big fan of it.

12:01.480 --> 12:05.480
DRA has a number of moving parts,

12:05.480 --> 12:10.480
and those moving parts will probably be easier to interact with programmatically

12:10.480 --> 12:13.480
if you're building Kubernetes controllers,

12:13.480 --> 12:15.480
but as an administrator of a system,

12:15.480 --> 12:18.480
as a human interacting with a system,

12:18.480 --> 12:22.480
it's more parts to line up.

12:22.480 --> 12:24.480
And at the end of the day,

12:24.480 --> 12:28.480
don't be afraid that CNI is just going to disappear.

12:28.480 --> 12:31.480
As we know, APIs kind of are forever,

12:31.480 --> 12:35.480
and CNI will continue to be used.

12:35.480 --> 12:38.480
And in fact, there is a project that is out there,

12:38.480 --> 12:43.480
very, very fresh, that's a CNI DRA driver

12:43.480 --> 12:47.480
that's built to bridge these worlds together.

12:47.480 --> 12:52.480
So let's take a real quick tour of the community.

12:52.480 --> 12:56.480
A few of the working groups that you may want to visit,

12:56.480 --> 13:00.480
there's a number of stakeholders in this between the AI/ML side,

13:00.480 --> 13:02.480
the networking side, and otherwise.

13:02.480 --> 13:05.480
One, we have the Device Management Working Group,

13:05.480 --> 13:09.480
which was looking to build out structured parameters for DRA.

13:09.480 --> 13:13.480
This is particularly to solve a shared resources problem

13:13.480 --> 13:15.480
where, let's say, you have a GPU

13:15.480 --> 13:18.480
that has 48 gigs of VRAM on it,

13:18.480 --> 13:27.480
and you schedule 12 gigs' worth of VRAM usage on that GPU.

13:27.480 --> 13:30.480
Well, you have just reserved the whole thing,

13:30.480 --> 13:33.480
and now you've wasted the majority of the VRAM.

13:33.480 --> 13:37.480
So this was intended to help share that.

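NOTE
A minimal sketch of the arithmetic in the example above, using the 48 GB and
12 GB figures from the talk. The first-come-first-served packing function is an
illustrative assumption, not how DRA structured parameters are implemented.
```python
GPU_VRAM_GB = 48  # capacity from the talk's example
REQUEST_GB = 12   # what one workload actually needs
# Exclusive allocation: one workload holds the whole device, the rest sits idle.
wasted_exclusive = GPU_VRAM_GB - REQUEST_GB
# Shared allocation: admit requests until the capacity runs out.
def pack(capacity_gb: int, requests_gb: list) -> list:
    """Return the requests that fit, first come first served."""
    admitted, used = [], 0
    for r in requests_gb:
        if used + r <= capacity_gb:
            admitted.append(r)
            used += r
    return admitted
admitted = pack(GPU_VRAM_GB, [12, 12, 12, 12, 12])
print(wasted_exclusive, len(admitted))
```
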
13:37.480 --> 13:41.480
There's also the Kubernetes Multi-Network Working Group,

13:41.480 --> 13:42.480
as Miguel mentioned,

13:42.480 --> 13:45.480
which aims to come up with a Kubernetes-native way to express your intent

13:45.480 --> 13:47.480
to have multiple network attachments.

13:47.480 --> 13:49.480
They're very interested in this,

13:49.480 --> 13:52.480
and last but not least is the Network Plumbing Working Group,

13:52.480 --> 13:54.480
which actually has been working on the Kubernetes

13:54.480 --> 13:57.480
multi-networking problem for eight years.

13:57.480 --> 14:00.480
And it has worked around a number of the problems

14:00.480 --> 14:03.480
and knows the pain points of these very well.

14:03.480 --> 14:07.480
There's also a number of projects out there for you to take a look at.

14:07.480 --> 14:10.480
One of my favorites is one called network DRA,

14:10.480 --> 14:13.480
which is a POC by Lionel Jouin,

14:13.480 --> 14:17.480
and kind of does some of the initial exploration

14:17.480 --> 14:21.480
into this idea of using DRA for networking.

14:21.480 --> 14:25.480
Lionel also spearheaded the next bullet point on the slide,

14:25.480 --> 14:27.480
the CNI DRA driver.

14:27.480 --> 14:31.480
This has exactly one pull request merged to it

14:31.480 --> 14:34.480
as of today; it is spanking new.

14:34.480 --> 14:37.480
And also out of the device management working group,

14:37.480 --> 14:42.480
there's a number of examples that use SR-IOV

14:42.480 --> 14:47.480
for those AI/ML training scenarios with high-speed networking.

14:47.480 --> 14:51.480
In closing, what is the future of this?

14:51.480 --> 14:53.480
And part of the question is,

14:53.480 --> 14:57.480
we don't know, we're kind of seeing the POCs right now.

14:57.480 --> 15:02.480
And so that means it is the ultimate time to get involved.

15:02.480 --> 15:05.480
Your voice can really be heard.

15:05.480 --> 15:08.480
I also think it's an incredibly important time

15:08.480 --> 15:10.480
for the community to be involved

15:10.480 --> 15:14.480
so that we don't necessarily see the 800 pound gorillas in the room

15:14.480 --> 15:17.480
running away with it and taking it for themselves.

15:17.480 --> 15:19.480
Like, if you have a use case,

15:19.480 --> 15:21.480
if you develop CNI solutions,

15:21.480 --> 15:25.480
you can actually get your voice heard right now.

15:25.480 --> 15:29.480
And I think it'll be really important to the long-term life cycles

15:29.480 --> 15:32.480
of these to have you there.

15:33.480 --> 15:36.480
And to me, the answer to

15:36.480 --> 15:41.480
why DRA for networking is that we need a Kubernetes-native way

15:41.480 --> 15:45.480
to express how to consume network attachments.

15:45.480 --> 15:47.480
We don't have that today,

15:47.480 --> 15:50.480
and this represents potentially a common place

15:50.480 --> 15:52.480
where we can start to standardize that.

15:52.480 --> 15:55.480
If you would like to learn more,

15:55.480 --> 15:58.480
please check out this blog article of mine.

15:58.480 --> 16:03.480
It goes into depth, and also gives a couple pathways

16:03.480 --> 16:06.480
to put your hands on this technology

16:06.480 --> 16:10.480
and run through a tutorial to actually see it in action.

16:10.480 --> 16:13.480
So I really appreciate that,

16:13.480 --> 16:15.480
and thank you for your time.

16:15.480 --> 16:17.480
Any questions?

16:17.480 --> 16:30.480
Thanks for the talk.

16:30.480 --> 16:32.480
That was really interesting.

16:32.480 --> 16:36.480
I know there are some people from an IBM research group

16:36.480 --> 16:39.480
that were working on the CDI concept,

16:39.480 --> 16:42.480
which stands for Composable Disaggregated Infrastructure.

16:42.480 --> 16:45.480
And for me, it looks like the DRA

16:45.480 --> 16:49.480
that we have been talking about through the presentation.

16:49.480 --> 16:52.480
So my question here is, are you familiar with it?

16:52.480 --> 16:55.480
And if so, are there any similarities

16:55.480 --> 16:58.480
and differences with the DRA concept?

17:01.480 --> 17:05.480
Great question and a very informed question.

17:05.480 --> 17:09.480
Actually, if you bring up some of those POCs,

17:09.480 --> 17:13.480
you're actually going to see that they also use CDI as well.

17:13.480 --> 17:18.480
So you're going to see that that is actually an important way

17:18.480 --> 17:22.480
that some of that information lines up

17:22.480 --> 17:26.480
at the container runtime level as well,

17:26.480 --> 17:31.480
because there's sort of a Kubernetes layer of expression,

17:31.480 --> 17:33.480
and then what's going to happen at the runtime

17:33.480 --> 17:36.480
to get that information down into the containers

17:36.480 --> 17:39.480
and that's where CDI is used.

17:39.480 --> 17:42.480
So you'll see on some of Lionel's POCs that he's saying,

17:42.480 --> 17:45.480
hey, enable CDI.

17:45.480 --> 17:48.480
No problem.

17:48.480 --> 17:52.480
Anyone else?

17:52.480 --> 17:54.480
No? Okay. Thank you very much.

17:54.480 --> 17:56.480
Thank you. Appreciate your time.
