WEBVTT

00:00.000 --> 00:10.280
I'm going to give you a presentation about Torch-MLIR, or mainly I'll give you an introduction

00:10.280 --> 00:11.280
into Torch-MLIR.

00:11.280 --> 00:17.800
I'm Marius, I'm with AMD, and the talk will be about, well, first of all, as it is the

00:17.800 --> 00:22.640
first MLIR talk today, I'll give you at least a brief idea about what MLIR is, so that you

00:22.640 --> 00:28.200
have heard kind of the fundamentals, tell you what Torch-MLIR is, what the torch

00:28.280 --> 00:33.240
dialect is, and mainly, yeah, as it is supposed to be an introduction, tell you

00:33.240 --> 00:37.840
how you can build the project, how you can use it, what you can do with it, and also how

00:37.840 --> 00:40.160
to get involved if you want to.

00:40.160 --> 00:46.000
So, what's MLIR? If you look at the homepage, MLIR is essentially a subproject of

00:46.000 --> 00:53.040
the LLVM Foundation. If you look at the homepage, mlir.llvm.org, it tells you it's a novel approach

00:53.040 --> 00:58.040
to build a reusable and extensible compiler infrastructure. I put 'novel' in quotes, yeah,

00:58.040 --> 01:03.920
here because it's not that novel anymore, to be honest, so it was announced

01:03.920 --> 01:11.760
to the world in 2019, and in 2020 it made its way into the LLVM project, it provides

01:11.760 --> 01:21.080
an extensible way to create and design intermediate representation, so it's all SSA based,

01:21.080 --> 01:24.920
the format essentially looks like shown here: you have a result, you have your operation

01:24.920 --> 01:34.840
with inputs, you can define custom attributes, custom types, all that stuff, and what it

01:34.840 --> 01:40.120
is about is: normally if you design your own intermediate representation, that's

01:40.120 --> 01:46.800
called a dialect in MLIR terms, and it's not like you need to go from one dialect

01:46.800 --> 01:51.520
completely into another dialect, you can also mix dialects, meaning, for example, in

01:51.520 --> 01:56.760
upstream there is an SCF dialect structured control flow, but that can be mixed with other

01:56.760 --> 02:04.960
dialects and you can nicely combine this, and dialects just more or less loosely scope different

02:04.960 --> 02:09.880
kinds of things, so linear algebra operations are in the linalg dialect, structured control

02:09.880 --> 02:16.880
flow has operations to do structured control flow. So with that said, Torch-MLIR actually

02:16.880 --> 02:21.360
aims to provide first-class compiler support between the PyTorch ecosystem and

02:21.440 --> 02:28.560
the MLIR ecosystem. It's participating in the LLVM incubator process, so even though I'm here

02:28.560 --> 02:33.840
on behalf of AMD, that's not an AMD project, it's an LLVM project, but we heavily contribute

02:33.840 --> 02:40.080
to it and use it. It's dual-licensed under the Apache license with LLVM exceptions, but also under

02:40.080 --> 02:44.720
a BSD-style license; I think that's due to historical reasons, it wasn't that clear back at

02:44.720 --> 02:54.800
the time whether it would make it more towards MLIR or LLVM or go further down the route to PyTorch,

02:54.800 --> 03:00.840
and with that said, the main components look like the following: if you have

03:00.840 --> 03:07.840
PyTorch, which is in the PyTorch repository, PyTorch itself comes with torch.fx, and what

03:07.920 --> 03:15.520
we have in the Torch-MLIR project is an FX importer, and that FX importer allows you to

03:15.520 --> 03:22.560
import PyTorch straight into the torch dialect, so one of the main things in

03:22.560 --> 03:28.960
Torch-MLIR is the torch dialect, and furthermore, what we have in the project are conversions

03:28.960 --> 03:37.760
to other dialects, to other MLIR dialects, so these conversions also live in the Torch-MLIR repository,

03:38.480 --> 03:45.840
and allow you to convert to TOSA, to linalg, but also to other upstream dialects living in the main

03:45.840 --> 03:51.600
MLIR project, but there are also conversions down to StableHLO, which is more used

03:51.600 --> 03:59.280
or comes from the TensorFlow ecosystem and lives in the OpenXLA organization. So that's

03:59.280 --> 04:06.480
all that is in the Torch-MLIR project, and the main thing we have is the torch dialect, so that's

04:06.560 --> 04:14.240
the central MLIR abstraction to represent a torch model, essentially. Most of this is auto-generated

04:14.240 --> 04:20.960
based on the PyTorch JIT operator registry, but we also have some manually implemented ops,

04:21.920 --> 04:26.640
to make it short: we have those for operations which cannot be automatically generated,

04:27.600 --> 04:32.000
among other things, for example, because they are directly supported as a JIT operator

04:32.880 --> 04:38.240
without having a corresponding operation in the registry itself, so that's why we cannot

04:38.240 --> 04:43.600
auto-generate them, and we also have a torch.operator operation to represent ops

04:45.280 --> 04:49.600
for which nothing has been auto-generated; what that is used for, I will show you in a minute,

04:51.200 --> 04:57.520
but that's the main thing. And in MLIR this all lives, so if you look at upstream MLIR, it would

04:57.520 --> 05:04.000
be include/mlir; here we have include/torch-mlir/Dialect/Torch/IR/TorchOps, that's where all

05:04.000 --> 05:10.400
ops are actually implemented, and in line 26 you see an include, that's actually where we have some

05:10.400 --> 05:16.160
scripts that generate those auto-generated operations, and that gets included here in the definition.

05:18.000 --> 05:23.200
If you want to build the project, well, you of course need to clone it, and there are several

05:23.200 --> 05:27.840
submodules which you need to initialize before you can build the project. So the one thing is,

05:29.840 --> 05:36.320
Torch-MLIR comes with, one of the git submodules is actually LLVM, because we need

05:36.320 --> 05:43.360
to pick the right version of MLIR. You can also use a separately checked out MLIR, which you

05:43.360 --> 05:50.160
compile and install; that might work, or it might not work, because you need to pick the right

05:50.160 --> 05:56.320
commit, and that's what is serialized here: we're pinning the correct submodule. If you specify

05:56.320 --> 06:02.640
a depth of one, that gives you a shallow clone, so you don't clone the entire LLVM history, just to let you know.

06:04.400 --> 06:09.760
It's suggested to set up a Python virtual environment and install the requirements;

06:09.760 --> 06:15.040
you need to upgrade pip if you want to play around with torchvision, because older

06:15.040 --> 06:20.480
pip versions fail on the dependency resolution, and then you cannot install torchvision.

06:21.040 --> 06:25.200
In the requirements file, you also get the correct version of torch, which is compatible

06:25.200 --> 06:30.320
with Torch-MLIR, so that's all in the requirements file and should be pretty straightforward.

06:31.200 --> 06:35.200
If you want to build the project, we won't go through all of this CMake magic here,

06:35.840 --> 06:40.560
I just want to highlight two things. You can either do a monolithic build, which means

06:40.560 --> 06:47.280
we are building LLVM and MLIR and Torch-MLIR in one single build, and to do this,

06:47.840 --> 06:52.080
you need to enable the project MLIR, of course, which is a subproject of LLVM.

06:52.640 --> 06:58.480
In addition to that, you need to specify that there is an LLVM external project,

06:58.480 --> 07:04.160
in this case Torch-MLIR, and point to the source here, where to find the source, which is quite

07:04.160 --> 07:09.600
obvious: we are right now in the root of the repository, so it's just here, and LLVM essentially

07:09.680 --> 07:15.600
is cloned into externals/llvm-project, that is where the git submodule gets initialized.

07:16.400 --> 07:21.760
If you want to look more into the specifics, I gave a talk two years ago here at the LLVM

07:21.760 --> 07:26.000
dev room about how to build your own MLIR dialect, which covers some of these

07:26.960 --> 07:32.400
aspects as well and gives you a little bit more background, or you just look into the MLIR standalone example,

07:32.400 --> 07:36.320
which also shows you the different ways, how to build it, but it's also, of course, in the

07:36.320 --> 07:44.000
Torch-MLIR documentation. The other way is to do a component build, so we build MLIR independently

07:44.000 --> 07:49.120
of Torch-MLIR first, so that's essentially just building MLIR, without Torch-MLIR at all,

07:50.160 --> 07:55.040
so it's a stripped-down configuration compared to what we have seen before, and we install it to

07:55.040 --> 08:02.400
build/llvm here, for example. Then that allows you to build Torch-MLIR against this pre-built version of

08:02.480 --> 08:09.760
MLIR, and the important flags here are: you need to specify for MLIR where to find the project,

08:09.760 --> 08:16.160
and for LLVM where to find the project, and that's the thing which is often missed,

08:17.440 --> 08:22.160
and you'll fail right at the end of compiling it all: you need to point to

08:22.160 --> 08:25.920
where llvm-lit can be found, and that's not installed, it's still in the build tree here,

08:25.920 --> 08:31.440
you just need to be aware of it, pass it, and you'll be fine. And with that, you can actually build

08:31.520 --> 08:36.880
a target, for example check-torch-mlir. I think this target specifically mainly works with

08:36.880 --> 08:41.600
the monolithic build, for the component build, it seemed to have some trouble when I last tried it,

08:41.600 --> 08:45.280
so normally people do the monolithic build, but those are the options you actually have.

08:46.080 --> 08:52.240
If you just want to play around with the project, there's a way easier way: we've got pre-built

08:52.240 --> 08:58.320
Python wheels, which give you the Python API as well as some of the command line tools.

09:00.720 --> 09:05.280
That would be the torch version, which is right now pinned down in the requirements file,

09:05.280 --> 09:11.040
so you probably want to install that one yourself. Here we point to the nightly CPU wheels,

09:11.040 --> 09:16.080
so if you just pip install torch, that pulls in all the CUDA dependencies, there's also a

09:16.160 --> 09:21.920
ROCm version available on PyTorch.org, but to be honest, if you just want to play around with

09:21.920 --> 09:27.920
it a little bit, and don't need to offload anything to a GPU, go with the CPU versions,

09:27.920 --> 09:34.640
as those are way smaller, and they don't fill up your disk that much;

09:35.520 --> 09:42.000
we do it in our CI specifically all the time for the small tests. You need to install

09:42.080 --> 09:47.520
onnx, if you also want to import ONNX models into Torch-MLIR; we'll get to that in a second,

09:47.520 --> 09:54.960
and then you can just install Torch-MLIR itself; you find this link and the commands

09:54.960 --> 10:01.200
in the llvm/torch-mlir releases as well, and then you can play around with it.

10:01.840 --> 10:05.360
So a simple torch model could look like the following, so here we have just

10:06.320 --> 10:12.560
a torch.nn.Module with an __init__ and a forward method, and what you simply can do to get it

10:12.560 --> 10:20.400
translated or imported into Torch-MLIR is: use export_and_import from torch_mlir.fx,

10:20.400 --> 10:25.040
call it on your model, say what you want to have as an output type; here we say torch to get into the

10:25.040 --> 10:31.520
torch dialect, and that's it, then we are in the torch dialect.
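
(A minimal sketch of that flow, assuming the torch_mlir.fx API as the talk describes it; the toy module and shapes below are made up for illustration.)

    import torch
    from torch_mlir import fx

    class Tiny(torch.nn.Module):          # hypothetical toy model, not from the talk
        def __init__(self):
            super().__init__()
            self.linear = torch.nn.Linear(16, 4)

        def forward(self, x):
            return torch.relu(self.linear(x))

    # Export the model via torch.export and import it into the torch dialect.
    module = fx.export_and_import(Tiny(), torch.randn(2, 16), output_type="torch")
    print(module)                          # prints the torch-dialect MLIR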

10:31.600 --> 10:35.760
For ONNX, it's a little bit different: you can, for example, download a model which is pre-trained on the MNIST dataset,

10:36.480 --> 10:41.680
and then you can use the command line tool torch-mlir-import-onnx, which also comes with

10:41.680 --> 10:49.120
the Python wheels; pass the ONNX file to it and specify an operator set version. What that

10:49.120 --> 10:54.640
does is it raises the operators that are in the model to a specific version you specify, which

10:54.640 --> 10:59.840
can be quite an advantage if you want to later on compile your model; opset 17 is what we are normally

11:00.240 --> 11:07.920
using, and it outputs the MNIST MLIR file.
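
(A rough sketch of that import step; the tool ships with the wheels, but the module path and flag names here are from memory and may differ, so check its --help output. The file names are placeholders.)

    import subprocess, sys

    subprocess.run(
        [sys.executable, "-m", "torch_mlir.tools.import_onnx",  # same tool as torch-mlir-import-onnx (path assumed)
         "mnist.onnx",                 # hypothetical pre-trained MNIST model
         "--opset-version", "17",      # raise the model's operators to opset 17
         "-o", "mnist.torch.mlir"],    # resulting MLIR full of torch.operator "onnx.*" ops
        check=True,
    )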

11:09.280 --> 11:13.840
Such an MLIR file looks like the following; I cut it down to the most important parts. There are constants in there, all the weights

11:13.840 --> 11:19.840
are in the MLIR file, we don't want to look at that, but essentially here you have an onnx.Add,

11:19.840 --> 11:27.040
an onnx.Relu and an onnx.MaxPool, but those are only names, so the actual operation is torch.operator,

11:27.840 --> 11:33.200
and this torch.operator needs to be lowered somehow, and that is where we implement,

11:33.200 --> 11:38.640
or where things are implemented; the torch.operator just carries the name for this op, and later on we need to

11:38.640 --> 11:45.760
process it and decompose it into other ops, so for example this could look like the

11:45.760 --> 11:53.200
following. In MLIR, or in Torch-MLIR, files are normally structured as follows: you have lib/Conversion,

11:53.200 --> 11:59.360
and then we have TorchOnnxToTorch, which is the conversion path that lowers these torch.operator

11:59.360 --> 12:06.160
ops, which then have ONNX ops in them, to torch ops. For an abs op it's quite easy, so first of all

12:06.160 --> 12:12.400
you have the name, Abs; the 1 is the operator set version, so for some operations there are

12:12.400 --> 12:17.680
different opsets that are supported, or if you need to check why is my ONNX operation not

12:17.680 --> 12:22.400
supported the way I think it should be, that might be one of the reasons that the operator set

12:22.400 --> 12:28.000
version is not what you expected; there's something technically triggered there, but essentially

12:28.000 --> 12:35.280
we use the rewriter, and that has a replaceOpWithNewOp, and we just replace this torch.operator

12:35.280 --> 12:41.040
op with the torch aten.abs op, and that's it, so for such simple ops it's quite straightforward,

12:41.040 --> 12:47.680
and we get from ONNX to torch. For other ops it's a little bit different; one of the

12:47.760 --> 12:54.000
examples is average pool, and that's where you might notice that the ONNX op

12:54.000 --> 12:59.760
in general is supported, but several attributes are not, and here for example there's an attribute

12:59.760 --> 13:04.960
which is called auto_pad, but that's not supported, so there are many operations which are

13:04.960 --> 13:11.040
supported in general for a specific opset, but it might be that the attributes are not fully supported,

13:11.040 --> 13:15.760
so that's something you should be aware of, but that's also actually a good way

13:15.760 --> 13:22.400
to get yourself into Torch-MLIR if you want to contribute. There's also a decompose-complex-ops

13:22.400 --> 13:29.120
pass, so for some things, like here it's hardshrink, we replace them with way simpler ops,

13:30.560 --> 13:40.000
so essentially you end up with a simple less-than-scalar op, a greater-than-scalar op,

13:40.080 --> 13:47.120
and some other ops. So that is also a way; this pass lives in lib/Dialect/Torch/Transforms,

13:47.120 --> 13:51.680
transforms, and that's because we are not leaving the dialect, so we are working on the dialect,

13:52.720 --> 13:56.720
replacing ops of the dialect with other ops of the same dialect, and that's why it's

13:56.720 --> 14:04.160
called a transform and why you find it there. Pipelines, pipelines are another important

14:04.160 --> 14:08.560
thing of course, so when we have a single conversion, you normally don't want to stick with a single

14:08.640 --> 14:13.920
conversion, you're doing multiple conversions, and if you do multiple conversions, you can

14:13.920 --> 14:19.760
put them together to build up a pipeline. These are the pipelines that are available right now in

14:21.200 --> 14:25.840
Torch-MLIR, and the most important ones are probably the torch backend ones:

14:27.040 --> 14:31.440
you can look into the torch-onnx-to-torch-backend pipeline, which shows you how to get from

14:31.440 --> 14:38.160
torch ONNX to torch. But as I mentioned, if you want to run a model, you don't want to end

14:38.160 --> 14:44.000
up in the torch dialect. So the torch dialect itself, it depends a little bit on where you come from;

14:44.000 --> 14:49.200
if you look at the full stack, where we have a machine learning model and want to execute it,

14:49.200 --> 14:54.080
torch is somewhere in the middle; from a compiler perspective, torch is more like at the front,

14:54.080 --> 14:59.280
so it's a front-end dialect, but it's not meant to be used standalone, so we use it in our

14:59.280 --> 15:04.800
compilers as a front end, mainly to get a PyTorch model or ONNX models into the compiler,

15:04.880 --> 15:10.640
but then we go all the way down to linalg on tensors, and then our compiler takes over,

15:12.000 --> 15:18.160
does more magic, lowers this all to LLVM and CPU, for example. But if you just want to use Torch-

15:18.160 --> 15:23.040
MLIR, that's possible: there's a reference backend, which is not included in the Python wheels,

15:23.040 --> 15:27.920
so you need to be aware of that; right now you would just copy over the files from the repository

15:28.000 --> 15:34.320
or build from source, so that's the torch-mlir e2e test suite which is there, and essentially here

15:34.320 --> 15:40.080
our output type is not torch anymore, it's linalg on tensors, and then you can just say, okay,

15:40.080 --> 15:49.040
RefBackend, compile it, load it, and there we go.
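
(A rough sketch of that refbackend path; the class and module names below are from the e2e test suite as I recall them and are not part of the wheels, so treat this only as an outline.)

    import torch
    from torch_mlir import fx
    from torch_mlir_e2e_test.linalg_on_tensors_backends.refbackend import (
        RefBackendLinalgOnTensorsBackend,   # reference backend from the e2e test suite (name assumed)
    )

    model = torch.nn.Sequential(torch.nn.Linear(16, 4), torch.nn.ReLU())   # toy model
    module = fx.export_and_import(
        model, torch.randn(2, 16),
        output_type="linalg-on-tensors",    # lower past the torch dialect
        func_name="forward",                # name the entry point so we can call it below
    )

    backend = RefBackendLinalgOnTensorsBackend()
    compiled = backend.compile(module)      # run the refbackend lowering pipeline
    loaded = backend.load(compiled)         # JIT-load the compiled artifact
    print(loaded.forward(torch.randn(2, 16).numpy()))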

15:49.040 --> 15:55.840
So, how to get involved: a good thing is maybe you want to import and lower a model using torch-mlir-opt; it's worth also looking into

15:55.840 --> 16:03.280
what the pipelines do. If I just take one of the conversions, like convert-torch-onnx-to-torch,

16:03.280 --> 16:08.880
what's the next step, how do I lower it further, like torch-decompose-complex-ops could be an

16:08.880 --> 16:15.040
option, when should I use the canonicalizer? All that stuff is set up in the

16:15.040 --> 16:21.280
pipeline, but to get familiar with it and what it does, it's quite good to just use the command line

16:21.440 --> 16:27.840
tool, and hack your way into it, and just try to see what it does; compare the IR before

16:27.840 --> 16:35.120
and after.
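
(The talk suggests doing this with the torch-mlir-opt command line tool; as an alternative sketch, the same kind of experiment can be driven from Python, assuming the helper below still exists under this name. The pass names are the ones mentioned in the talk.)

    import torch
    from torch_mlir import fx
    from torch_mlir.compiler_utils import run_pipeline_with_repro_report  # helper name assumed

    # Import a module containing an op with a known decomposition (hardshrink).
    module = fx.export_and_import(torch.nn.Hardshrink(), torch.randn(4), output_type="torch")
    print(module)                           # IR before

    run_pipeline_with_repro_report(
        module,
        "builtin.module(func.func(torch-decompose-complex-ops,canonicalize))",
        "decompose complex ops and clean up",
    )
    print(module)                           # IR after: hardshrink rewritten into simpler aten ops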

16:36.160 --> 16:42.160
If you have issues, please file them in the repository's issue tracker; if you have questions, it's most often better to just join

16:42.160 --> 16:49.920
discussions on the LLVM Discord. And what we normally did with new starters, and what's quite

16:50.240 --> 16:55.840
nice is, we let new starters implement new ops which were unsupported. Right now many of

16:55.840 --> 17:02.000
the ops are already supported, but as I mentioned, not every attribute is supported, so if you want

17:02.000 --> 17:08.400
to get really hands-on, try to see what attributes are missing; hopefully there's a TODO there,

17:08.960 --> 17:13.840
but maybe you also notice when you lower your model, that something is not working as you expect,

17:14.400 --> 17:19.840
and then try to implement or add support for one of these attributes, which belongs to an

17:19.840 --> 17:25.280
ONNX operator and which is also specified in the ONNX documentation, so this is also a pretty good place

17:25.280 --> 17:31.920
to look into what the op is supposed to do, or you try to implement an ONNX op with a different

17:31.920 --> 17:38.400
opset and try to add support for newer attributes, essentially. So yeah, that's my brief

17:38.400 --> 18:06.560
introduction into Torch-MLIR. So, I was asked, or the question was: in the pipelines

18:06.560 --> 18:11.840
overview, we have the torch simplification pipeline, which sounds quite interesting, and whether I can

18:11.840 --> 18:17.040
comment on what it does or what's in there. And to be honest, no,

18:17.040 --> 18:24.080
I have no idea; I would need to look it up, to be honest. It might be that there are some

18:24.080 --> 18:30.160
shape computations, which could be the thing, we definitely have some shape refinement pipelines,

18:30.160 --> 18:35.520
also simplification might be that, it might also be that it is what I mentioned before,

18:35.520 --> 18:42.320
like this decompose-complex-ops pass, that that is part of that specific pipeline, but I don't know

18:42.320 --> 18:51.360
that one in particular, what passes are in it, I'm sorry. The next question was about the motivation

18:51.360 --> 18:57.920
of this project, like, ONNX can do most of these things, so why would you need the

18:57.920 --> 19:05.040
other. So the question was about the motivation of the project, and that ONNX can do most of the

19:05.040 --> 19:13.360
things as well. Well, I think you're referring to the onnx-mlir project then, so

19:17.280 --> 19:23.600
you're right, for ONNX that's for sure true, but for torch, I think it's not, so if you really

19:23.600 --> 19:28.560
want to implement, or if you convert a model from torch to ONNX, that's often painful,

19:29.280 --> 19:33.840
at least that's what people tell me who are more on the model side, and that's why it's mainly

19:33.920 --> 19:40.640
intended to provide a native way to get torch into MLIR; that's why it's the Torch-MLIR project.

19:41.520 --> 19:49.680
From a historical perspective, it started as npcomp, which was more numpy front-end specific,

19:49.680 --> 19:54.880
then switched gears a little bit, and now it's for torch. And the reason that we have ONNX is just

19:56.080 --> 20:03.120
more or less so that you can work all the way down with the same stack and not pull in even

20:03.600 --> 20:05.600
more.

20:08.640 --> 20:12.240
One last one, if I have time? Go for it.

20:12.240 --> 20:20.800
You showed, for example, a rewrite as a sort of pattern-matching sequence in C++; do you have a

20:20.800 --> 20:26.080
DSL for this in the project? Sometimes, when you see something like this, is there a special reason to write it

20:26.160 --> 20:29.680
that way, or could it be written in TableGen instead?

20:32.240 --> 20:36.000
You mean, for here, for that one?

20:36.000 --> 20:40.000
Yeah, and is that a declarative, DAG-to-DAG rewrite?

20:40.000 --> 20:46.080
Yeah, no, I think we don't do it in TableGen, but I cannot tell you why we don't do it,

20:46.080 --> 20:50.640
or why it's done that way; maybe because people don't really like to write TableGen.

20:50.640 --> 20:58.960
All right. Thank you.

