WEBVTT

00:00.000 --> 00:07.000
OK, so it's green; we're taking it on faith that it's working.

00:07.000 --> 00:10.000
All right, I think we're starting.

00:10.000 --> 00:13.000
OK, and I have a working mouse. Excellent.

00:13.000 --> 00:15.000
And I will stay behind the desk.

00:15.000 --> 00:20.000
OK, so I'm a compiler dev over at AMD.

00:20.000 --> 00:23.000
And I want to float a slightly weird idea,

00:23.000 --> 00:25.000
to a bunch of friendly compiler engineers,

00:25.000 --> 00:27.000
and FOSDEM is quite friendly,

00:27.000 --> 00:29.000
and some of you are probably compiler engineers.

00:29.000 --> 00:32.000
So I hope that's going to work out.

00:32.000 --> 00:35.000
Just as a brief sanity check for me,

00:35.000 --> 00:36.000
just put your hand in the air

00:36.000 --> 00:40.000
if you've committed something to LLVM or GCC.

00:40.000 --> 00:43.000
OK, that's really good news.

00:43.000 --> 00:46.000
This is a short block of time.

00:46.000 --> 00:48.000
So I can't start with a nice discussion

00:48.000 --> 00:50.000
of what a compiler front end is.

00:50.000 --> 00:52.000
So we're going to go straight for it.

00:52.000 --> 00:55.000
I think the compiler back end

00:56.000 --> 01:00.000
is more specialized to the task than it needs to be.

01:00.000 --> 01:03.000
But we would have a happier time as compiler engineers,

01:03.000 --> 01:05.000
and as users of the compiler,

01:05.000 --> 01:09.000
if we gradually threw away the back end,

01:09.000 --> 01:13.000
by stretching the middle end further and further back,

01:13.000 --> 01:17.000
until it actually hits instruction emission.

01:17.000 --> 01:20.000
And that is not a widely popular view.

01:21.000 --> 01:25.000
But it's not completely unfounded.

01:25.000 --> 01:29.000
So, in the spirit of establishing some form of credibility,

01:29.000 --> 01:32.000
I'm going to try to talk about a couple of very strongly

01:32.000 --> 01:35.000
back-end-specific things, which you'd expect to be done in the back end,

01:35.000 --> 01:38.000
which I didn't, and it was fine.

01:38.000 --> 01:40.000
There.

01:40.000 --> 01:44.000
It's a little bit GPU-specific, but not horrendously so.

01:44.000 --> 01:48.000
I think complete ignorance of GPU architectures will be fine.

01:48.000 --> 01:52.000
So, that notwithstanding, the first example is memory allocation,

01:52.000 --> 01:54.000
specific to the GPU.

01:54.000 --> 01:56.000
Oh, over this context.

01:56.000 --> 01:57.000
Excellent.

01:57.000 --> 02:00.000
Yes, please talk to me while I'm speaking,

02:00.000 --> 02:03.000
or after I'm speaking, or using these computer things,

02:03.000 --> 02:07.000
if you must; questions are welcome anywhere in this.

02:07.000 --> 02:09.000
If none of you ask any questions,

02:09.000 --> 02:11.000
there's going to be a lot of time at the end,

02:11.000 --> 02:14.000
because I probably don't have 20 minutes of content for you all.

02:15.000 --> 02:18.000
So, it's called LDS.

02:18.000 --> 02:21.000
This is a small block of very fast memory,

02:21.000 --> 02:26.000
which it's important to use if you want your GPU code to run quickly.

02:26.000 --> 02:31.000
The GPU is structured as a kind of...

02:31.000 --> 02:34.000
a number of completely independent programs,

02:34.000 --> 02:36.000
which can talk to each other a bit,

02:36.000 --> 02:41.000
with slightly gibberish terminology.

02:41.000 --> 02:46.000
The best way to describe this is as a register allocation problem.

02:46.000 --> 02:50.000
The premise is: one part of your code

02:50.000 --> 02:54.000
has a number on it for how many bytes of this magic memory you want,

02:54.000 --> 02:56.000
and you have to ask for a small number,

02:56.000 --> 02:57.000
or it doesn't start.

02:57.000 --> 03:00.000
But you can say, I want 16 kilobytes,

03:00.000 --> 03:03.000
and you get that, and that's fine.

03:03.000 --> 03:05.000
And then other parts of your code,

03:05.000 --> 03:08.000
want to use variables which are somewhere in that single

03:08.000 --> 03:12.000
block, and they need to find them.

03:12.000 --> 03:15.000
So the original game here,

03:15.000 --> 03:19.000
is that if you want to reference LDS from a different function

03:19.000 --> 03:22.000
to the one which allocates it, you can't.

03:22.000 --> 03:25.000
Just, like, tough: it doesn't compile.

03:25.000 --> 03:27.000
So you can say,

03:27.000 --> 03:29.000
foo has 16 kilobytes,

03:29.000 --> 03:31.000
and foo calls bar,

03:31.000 --> 03:34.000
and bar wants to reference a variable in LDS:

03:34.000 --> 03:35.000
error, can't.

03:36.000 --> 03:39.000
That was unpopular, because people write functions,

03:39.000 --> 03:41.000
and if the functions

03:41.000 --> 03:44.000
cannot be fully inlined into the caller,

03:44.000 --> 03:48.000
you get a compile error, and it's sad.

03:48.000 --> 03:52.000
So what I want to do here is take

03:52.000 --> 03:55.000
the block of memory allocated in one function,

03:55.000 --> 04:01.000
and scribble out enough metadata that it can be found from other functions.

04:02.000 --> 04:05.000
And the task here amounts to,

04:05.000 --> 04:08.000
x is a double, y is a float.

04:08.000 --> 04:11.000
They both need an address in your single block of memory,

04:11.000 --> 04:15.000
at, say, 0 and 16, or whatever you want,

04:15.000 --> 04:18.000
such that they can be located elsewhere.

04:18.000 --> 04:20.000
And,

04:20.000 --> 04:22.000
for the compiler engineers: there's some bin packing,

04:22.000 --> 04:25.000
but it's really obviously register allocation.

04:25.000 --> 04:29.000
We're used to thinking of registers as: you have 16 distinct values,

04:29.000 --> 04:31.000
and they all have their own special name.

04:31.000 --> 04:36.000
But it's clearly a single block of memory of length 16 times the width of a register.

04:36.000 --> 04:41.000
And the goal is assigning offsets in this block of memory to the magic names,

04:41.000 --> 04:45.000
so that you can find stuff in these variables from somewhere else.
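
NOTE
A minimal sketch of the "LDS allocation is register allocation" view, assuming each variable is described only by a hypothetical (name, size, alignment) triple. It is not the AMDGPU implementation, just the idea of handing out offsets in one flat block the way a register allocator hands out slots in a register file.
```python
# Hypothetical sketch: treat LDS as one flat block and hand each variable an
# offset in it, the way a register allocator hands out slots in a register file.
def assign_offsets(variables):
    """variables: list of (name, size_in_bytes, alignment); returns ({name: offset}, total_size)."""
    offsets = {}
    cursor = 0
    # Placing the most strictly aligned variables first keeps padding small.
    for name, size, align in sorted(variables, key=lambda v: -v[2]):
        cursor = (cursor + align - 1) // align * align  # round up to the alignment
        offsets[name] = cursor
        cursor += size
    return offsets, cursor
layout, total = assign_offsets([("x", 8, 8), ("y", 4, 4)])  # x: double, y: float
print(layout, total)
```
With a double x and a float y, this packing puts x at offset 0 and y at 8 in a 12-byte block; sorting by alignment first is just one simple way to keep padding low.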

04:51.000 --> 04:55.000
And the reason you do register allocation in the back end,

04:55.000 --> 04:58.000
is that you want to do it relatively late.

04:59.000 --> 05:02.000
And the reason we did LDS allocation in the back end

05:02.000 --> 05:05.000
is because that's where we did register allocation.

05:09.000 --> 05:12.000
So yeah, there's a small common theme here.

05:12.000 --> 05:14.000
There's a thing called GlobalISel.

05:14.000 --> 05:18.000
LLVM decided the SelectionDAG-based back end

05:18.000 --> 05:22.000
was ugly and could be improved,

05:22.000 --> 05:26.000
and adopted a more SSA-form alternative IR,

05:26.000 --> 05:29.000
I want to say 10 years ago.

05:29.000 --> 05:32.000
I remember it being announced and being very excited.

05:32.000 --> 05:35.000
And I don't believe we've moved over to it yet.

05:35.000 --> 05:39.000
But if you write code for the AMDGPU back end,

05:39.000 --> 05:42.000
you end up implementing it for SDAG and for GlobalISel,

05:42.000 --> 05:44.000
and the code paths are different, and

05:44.000 --> 05:47.000
I don't want to write all the test cases twice.

05:47.000 --> 05:49.000
So instead,

05:49.000 --> 05:51.000
there's an IR pass,

05:51.000 --> 05:53.000
which does all the regalloc-like

05:53.000 --> 05:56.000
sort of things you'd expect to see.

05:56.000 --> 05:58.000
It probes the call graph, going:

05:58.000 --> 06:00.000
who can reach this variable,

06:00.000 --> 06:01.000
does this variable alias,

06:01.000 --> 06:04.000
where shall we allocate these variables?

06:04.000 --> 06:06.000
And the IR pass

06:06.000 --> 06:10.000
emits a table describing the layout as IR.

06:10.000 --> 06:15.000
It's a constant array of integers.

06:15.000 --> 06:18.000
And that's tested in IR.

06:18.000 --> 06:20.000
It has lots of optimizations,

06:20.000 --> 06:22.000
which try to do a better job of laying out memory.

06:22.000 --> 06:25.000
all done in IR.
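
NOTE
A minimal sketch of what such a pass computes, assuming a hypothetical call graph and hypothetical per-function sets of LDS variables: walk the call graph from each kernel, lay out only the variables that kernel can reach, and emit a constant integer table indexed as table[kernel][variable]. Alignment and the layout optimizations mentioned in the talk are deliberately left out, and the names are illustrative, not the real pass's.
```python
# Hypothetical sketch: find which LDS variables each kernel can reach through
# the call graph, lay those out per kernel, and emit one constant integer table
# that any function can index as table[kernel_id][variable_id].
def reachable_functions(call_graph, root):
    seen, stack = set(), [root]
    while stack:
        fn = stack.pop()
        if fn not in seen:
            seen.add(fn)
            stack.extend(call_graph.get(fn, []))
    return seen
def build_lds_table(kernels, call_graph, uses, sizes):
    """uses: function name -> set of LDS variables it touches; sizes: variable -> bytes.
    Returns (variable order, table); an entry of -1 means unreachable from that kernel."""
    variables = sorted(sizes)
    table = []
    for kernel in kernels:
        needed = set()
        for fn in reachable_functions(call_graph, kernel):
            needed |= uses.get(fn, set())
        row, cursor = [], 0
        for var in variables:
            if var in needed:
                row.append(cursor)
                cursor += sizes[var]  # alignment ignored to keep the sketch short
            else:
                row.append(-1)
        table.append(row)
    return variables, table
call_graph = {"kernel_a": ["foo"], "kernel_b": ["bar"], "foo": ["bar"]}
uses = {"foo": {"x"}, "bar": {"y"}}
variables, table = build_lds_table(["kernel_a", "kernel_b"], call_graph, uses, {"x": 8, "y": 4})
print(variables, table)
```
Here bar, reachable from both kernels, would find y at table[kernel_id][1]: offset 8 when running under kernel_a and 0 under kernel_b.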

06:25.000 --> 06:28.000
And because it's written as a single pass,

06:28.000 --> 06:32.000
the SDAG and GlobalISel parts are tiny.

06:32.000 --> 06:35.000
In fact, the back-end part is mostly implemented in the assembler,

06:35.000 --> 06:40.000
which had to be taught what a constant number was for LDS.

06:40.000 --> 06:44.000
And it was fine.

06:44.000 --> 06:47.000
I got some pushback from colleagues,

06:47.000 --> 06:50.000
because I was writing it in the wrong place.

06:51.000 --> 06:52.000
But it worked.

06:52.000 --> 06:54.000
It's since been modified by people.

06:54.000 --> 06:57.000
People have now mostly stopped tagging me in reviews for it.

06:57.000 --> 06:59.000
So that means the code is coherent enough

06:59.000 --> 07:03.000
for other people to change it without dragging me into the loop, which is great.

07:03.000 --> 07:06.000
And even though it's been done in the wrong place,

07:06.000 --> 07:08.000
it's fine.

07:08.000 --> 07:12.000
It's kind of a simple instance of register allocation.

07:12.000 --> 07:17.000
But if we can do some forms of register allocation in IR,

07:18.000 --> 07:22.000
why can't we do other forms of register allocation in IR?

07:22.000 --> 07:26.000
Brief side note: some other compilers do

07:26.000 --> 07:29.000
in fact do register allocation on SSA form.

07:29.000 --> 07:31.000
And it's okay.

07:31.000 --> 07:34.000
It works out all right.

07:34.000 --> 07:35.000
Here's another example.

07:35.000 --> 07:39.000
Call lowering is very back end.

07:39.000 --> 07:41.000
It's very architecture-specific.

07:41.000 --> 07:44.000
There's lots of scribbling stuff in specific registers.

07:44.000 --> 07:46.000
And carefully managing the stack.

07:46.000 --> 07:50.000
I'm lucky the previous talk had pictures of the stack

07:50.000 --> 07:52.000
manipulation code around function calls,

07:52.000 --> 07:55.000
because I don't have any code.

07:55.000 --> 07:59.000
And variadic functions are crufty, awkward things,

07:59.000 --> 08:03.000
where you have to scribble state into particular parts of the stack

08:03.000 --> 08:06.000
and go forth and find it later.

08:06.000 --> 08:08.000
And people want,

08:08.000 --> 08:11.000
strictly speaking, people want printf on the GPU.

08:11.000 --> 08:15.000
The only variadic function anyone ever cares about is printf.

08:16.000 --> 08:18.000
Yeah, it's cyber.

08:18.000 --> 08:21.000
It seems to be the case.

08:21.000 --> 08:24.000
And in fact, printf is currently implemented in,

08:24.000 --> 08:26.000
I think, four different ways on AMD,

08:26.000 --> 08:28.000
for different languages.

08:28.000 --> 08:30.000
Like, HIP and OpenCL and OpenMP

08:30.000 --> 08:31.000
all have their own one.

08:31.000 --> 08:33.000
Oh, and libc has its own one as well.

08:33.000 --> 08:36.000
And every one of them is dreadful.

08:40.000 --> 08:43.000
So I was looking at,

08:43.000 --> 08:48.000
implementing variadic functions, for libc mostly.

08:48.000 --> 08:51.000
And a similar theme turned up.

08:51.000 --> 08:53.000
I don't want to do this twice.

08:53.000 --> 08:56.000
Ah, I have fixed that typo in review,

08:56.000 --> 08:58.000
and then failed to update the slides.

08:58.000 --> 08:59.000
Never mind.

08:59.000 --> 09:03.000
And I don't want to write the test code for,

09:03.000 --> 09:05.000
are we lowering variadic calls correctly?

09:05.000 --> 09:08.000
Because I've done that for a simple architecture.

09:08.000 --> 09:10.000
It was dreadful.

09:10.000 --> 09:12.000
It's a huge combinatorial problem.

09:12.000 --> 09:15.000
And you have to do it for SDAG and for GlobalISel.

09:15.000 --> 09:17.000
And it's just variadic functions.

09:17.000 --> 09:19.000
No one cares much about them.

09:19.000 --> 09:22.000
So really, who's got the patience?

09:22.000 --> 09:25.000
Also, that's kind of a common theme.

09:25.000 --> 09:27.000
People don't care about variadic functions.

09:27.000 --> 09:30.000
The standard thing which you get when you're bringing up

09:30.000 --> 09:32.000
a new architecture is to do,

09:32.000 --> 09:34.000
like, fatal error, unimplemented.

09:34.000 --> 09:38.000
Or maybe: if the name is printf, do this, otherwise fatal error.

09:39.000 --> 09:43.000
And there's nothing that magic about variadic functions,

09:43.000 --> 09:44.000
inherently.

09:44.000 --> 09:48.000
You can kill these things off in IR.

09:48.000 --> 09:51.000
What a variadic function amounts to is:

09:51.000 --> 09:53.000
take all of the arguments,

09:53.000 --> 09:56.000
stick them in contiguous memory,

09:56.000 --> 09:59.000
and pass a pointer to that contiguous memory.

09:59.000 --> 10:00.000
That's all we're doing.

10:00.000 --> 10:01.000
That's all AArch64 does.

10:01.000 --> 10:02.000
It's all x86 does.

10:02.000 --> 10:05.000
They have different weird ideas about the pointer.

10:05.000 --> 10:07.000
Shout out to WebAssembly for actually just

10:07.000 --> 10:09.000
doing a really obvious simple thing.

10:09.000 --> 10:11.000
WebAssembly just sticks them all in a struct

10:11.000 --> 10:13.000
and passes a void* to the struct.

10:13.000 --> 10:15.000
That's it.

10:15.000 --> 10:18.000
And that works for everything.

10:18.000 --> 10:19.000
Right?

10:19.000 --> 10:22.000
So AMDGPU and NVPTX,

10:22.000 --> 10:24.000
they can totally take all the arguments

10:24.000 --> 10:25.000
and put them in a struct,

10:25.000 --> 10:28.000
and pass a pointer to a struct.

10:28.000 --> 10:31.000
And then you don't have to do a bunch of stuff in the back end.

10:31.000 --> 10:33.000
But that would have been even worse,

10:33.000 --> 10:34.000
because this was partly for lib C.

10:34.000 --> 10:35.000
I'd have had to do this for

10:35.000 --> 10:39.000
SDAG and GlobalISel for AMDGPU and for NVPTX,

10:39.000 --> 10:42.000
and I really don't want to write test cases for NVPTX.

10:42.000 --> 10:45.000
So I'm just not having it.

10:45.000 --> 10:50.000
So I do this as an IR pass.

10:50.000 --> 10:54.000
This actually works out really nicely,

10:54.000 --> 10:57.000
because all the horrible target-specific cruft,

10:57.000 --> 10:58.000
which was very scary,

10:58.000 --> 11:01.000
and made people want to do this in the back end.

11:01.000 --> 11:06.000
If you've actually had to read the AArch64 ABI docs,

11:06.000 --> 11:09.000
they're crazy in how they describe

11:09.000 --> 11:11.000
how you have to lower this stuff.

11:11.000 --> 11:14.000
But all the architecture dependent stuff

11:14.000 --> 11:17.000
is kind of hidden behind the va_list structure,

11:17.000 --> 11:20.000
which is basically a pointer

11:20.000 --> 11:22.000
that you can walk forwards.

11:22.000 --> 11:24.000
So you can take a variadic function

11:24.000 --> 11:26.000
as it's represented in IR,

11:26.000 --> 11:29.000
and you can build a struct,

11:29.000 --> 11:30.000
an alloca,

11:31.000 --> 11:34.000
and you can copy the args into it.

11:34.000 --> 11:38.000
And then you can kill the dot-dot-dot at the end of the function,

11:38.000 --> 11:41.000
and pass a va_list instead.

11:41.000 --> 11:44.000
And now your variadic function's gone.

11:44.000 --> 11:46.000
LLVM no longer really knows

11:46.000 --> 11:48.000
if it used to be a variadic function;

11:48.000 --> 11:51.000
it thinks you're just passing a pointer to an alloca.

11:51.000 --> 11:54.000
And that means stuff like inlining now works again.

11:54.000 --> 11:57.000
And for AMDGPU and NVPTX,

11:57.000 --> 11:59.000
that's just how it works.

11:59.000 --> 12:01.000
Variadic functions are free.

12:01.000 --> 12:03.000
Instead of this crufty, aggravating thing,

12:03.000 --> 12:05.000
they're just weird syntactic sugar

12:05.000 --> 12:07.000
for passing a struct to a function.
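
NOTE
A minimal sketch of the variadic lowering described in the talk, with Python's struct module standing in for memory: the caller packs the trailing arguments into one contiguous buffer (the alloca), and the callee walks a forward-only cursor (the va_list) over it. pack_varargs, VaList and mini_printf are hypothetical names, and the slot layout (little-endian, each slot aligned to its own size) is an assumption rather than any real target's ABI.
```python
import struct
# Hypothetical sketch of variadic lowering. Assumed layout: little-endian,
# 4-byte int, 8-byte double, each slot aligned to its own size.
FORMATS = {"int": "<i", "double": "<d"}
def pack_varargs(args):
    """Caller side: copy (type_name, value) pairs into one contiguous buffer."""
    buf = bytearray()
    for ty, value in args:
        fmt = FORMATS[ty]
        size = struct.calcsize(fmt)
        buf += bytes(-len(buf) % size)  # zero padding up to the slot's alignment
        buf += struct.pack(fmt, value)
    return bytes(buf)
class VaList:
    """Callee side: a cursor into the packed buffer that only moves forwards."""
    def __init__(self, buf):
        self.buf, self.pos = buf, 0
    def arg(self, ty):
        fmt = FORMATS[ty]
        size = struct.calcsize(fmt)
        self.pos += -self.pos % size  # same alignment rule as the caller
        (value,) = struct.unpack_from(fmt, self.buf, self.pos)
        self.pos += size
        return value
def mini_printf(fmt, va):
    """A printf-like callee whose trailing '...' has become an explicit VaList."""
    out, i = [], 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            spec = fmt[i + 1]
            if spec == "d":
                out.append(str(va.arg("int")))
            elif spec == "f":
                out.append(f"{va.arg('double'):f}")
            else:
                out.append("%" + spec)  # unknown specifier: copy it through
            i += 2
        else:
            out.append(fmt[i])
            i += 1
    return "".join(out)
packed = pack_varargs([("int", 7), ("double", 1.5)])
print(len(packed), mini_printf("x=%d y=%f", VaList(packed)))
```
This prints 16 (4 bytes of int, 4 of padding, 8 of double) followed by x=7 y=1.500000. Caller and callee only have to agree on the packing rule; everything else is an ordinary call that passes one pointer.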

12:07.000 --> 12:09.000
For WebAssembly,

12:09.000 --> 12:10.000
actually, at this conference

12:10.000 --> 12:11.000
I need to find someone in WebAssembly

12:11.000 --> 12:13.000
and get them to review a change

12:13.000 --> 12:15.000
because I implemented a thing for WebAssembly

12:15.000 --> 12:17.000
and couldn't get them to sign off on it.

12:17.000 --> 12:20.000
So variadics should be free on WebAssembly too,

12:20.000 --> 12:23.000
and will be, once I find one of them.

12:23.000 --> 12:26.000
Strictly speaking, I haven't bothered to do this.

12:27.000 --> 12:30.000
I've implemented this for x86-64 and AArch64.

12:30.000 --> 12:31.000
And when I went to check,

12:31.000 --> 12:33.000
main, it turns out I haven't pushed it.

12:33.000 --> 12:35.000
So they will be free on those machines,

12:35.000 --> 12:36.000
but currently they're not.

12:39.000 --> 12:41.000
Yeah.

12:41.000 --> 12:43.000
So I like testing code in IR,

12:43.000 --> 12:46.000
because you write the code you've got

12:46.000 --> 12:48.000
and you write the code you expect.

12:48.000 --> 12:50.000
And then you argue with FileCheck's

12:50.000 --> 12:52.000
regex engine for a while.

12:52.000 --> 12:54.000
And then you're done.

12:54.000 --> 12:57.000
And relative to testing in MIR,

12:57.000 --> 12:58.000
It's great.

12:58.000 --> 13:00.000
Relative to testing Clang,

13:00.000 --> 13:01.000
It's great.

13:01.000 --> 13:04.000
You have an IR pass.

13:04.000 --> 13:05.000
You can print bits of IR,

13:05.000 --> 13:07.000
or the objects you've got.

13:07.000 --> 13:08.000
You can dump.

13:08.000 --> 13:10.000
That is so easy.

13:10.000 --> 13:13.000
And the back end is not always like that.

13:13.000 --> 13:18.000
So this is sort of a question I want to pose to you guys.

13:18.000 --> 13:23.000
The compiler backend is very specialized to machine code.

13:23.000 --> 13:27.000
Specialized to specific targets' machine code.

13:27.000 --> 13:29.000
You drop out of SSA form

13:29.000 --> 13:33.000
as soon as you do register allocation.

13:33.000 --> 13:36.000
Is that actually necessary?

13:36.000 --> 13:41.000
We've got here in a sensible, path-dependent kind of fashion.

13:41.000 --> 13:45.000
But libFirm does regalloc on SSA form.

13:45.000 --> 13:47.000
And it's fine.

13:47.000 --> 13:49.000
You have an infinite set of SSA variables.

13:50.000 --> 13:54.000
And you mark some of them as: this one needs to be in RAX,

13:54.000 --> 13:56.000
so does this one.

13:56.000 --> 13:57.000
And you're fine.

13:57.000 --> 14:01.000
It's the same as regalloc always is.
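
NOTE
A minimal sketch of register assignment done directly on SSA values, assuming a single straight-line block and no spilling: each value is defined once, so its live range is simply definition to last use, and some values arrive pre-coloured (for example, must be in RAX). The function and register names are hypothetical; a real SSA-based allocator also handles control flow, phis, copies and spilling, which this does not.
```python
# Hypothetical sketch: assign registers directly to SSA values in one
# straight-line block. Each value is defined exactly once, so its live range
# runs from its definition to its last use; no spilling, no control flow.
def allocate(instrs, registers, precoloured):
    """instrs: list of (dest, operands). precoloured: value -> required register."""
    last_use = {}
    for i, (dest, ops) in enumerate(instrs):
        last_use.setdefault(dest, i)  # an unused value dies where it is defined
        for op in ops:
            last_use[op] = i
    reserved = set(precoloured.values())
    assignment, free = {}, list(registers)
    for i, (dest, ops) in enumerate(instrs):
        for op in dict.fromkeys(ops):  # de-duplicate, keeping a stable order
            if last_use[op] == i and op in assignment:
                free.append(assignment[op])  # operand dies here; recycle its register
        if dest in precoloured:
            reg = precoloured[dest]
            if reg not in free:
                raise RuntimeError(f"{reg} is busy; a real allocator would insert a copy")
        else:
            # Prefer registers that no value has asked for by name.
            candidates = [r for r in free if r not in reserved] or free
            if not candidates:
                raise RuntimeError("out of registers; a real allocator would spill")
            reg = candidates[0]
        free.remove(reg)
        assignment[dest] = reg
        if last_use[dest] == i:
            free.append(reg)  # dead on arrival
    return assignment
regs = allocate(
    [("a", []), ("b", []), ("c", ["a", "b"]), ("d", ["c"])],
    ["rax", "rcx", "rdx"],
    {"d": "rax"},  # e.g. the returned value must end up in RAX
)
print(regs)
```
Here c reuses the register of a, which dies at c's definition, and d is forced into rax, giving {'a': 'rcx', 'b': 'rdx', 'c': 'rcx', 'd': 'rax'}.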

14:01.000 --> 14:03.000
ISel.

14:03.000 --> 14:07.000
My favorite is not GPU.

14:07.000 --> 14:11.000
My favorite is the crackpot ASIC, which is no longer with us.

14:11.000 --> 14:16.000
But that featured a kind of MIPS style instruction set,

14:16.000 --> 14:18.000
which is very friendly to work with.

14:18.000 --> 14:23.000
And we just did, I just did an intrinsic for every instruction.

14:23.000 --> 14:26.000
Every instruction you could write in assembly had an intrinsic,

14:26.000 --> 14:29.000
with a name very like the assembly instruction.

14:29.000 --> 14:33.000
So if you didn't want to do instruction selection in the compiler,

14:33.000 --> 14:35.000
and you didn't want to write assembly,

14:35.000 --> 14:39.000
you could just write the sequence of intrinsics

14:39.000 --> 14:40.000
you wanted,

14:40.000 --> 14:42.000
and you got out exactly that sequence of assembly:

14:42.000 --> 14:46.000
each intrinsic turned into one assembly instruction.

14:47.000 --> 14:49.000
And scheduling's even easier.

14:49.000 --> 14:53.000
We can reorder instructions in SSA with almost no problem.

14:53.000 --> 14:55.000
And there's this DAG combine thing.

14:55.000 --> 14:57.000
I really like SDAG combine.

14:57.000 --> 14:58.000
That's very pretty.

14:58.000 --> 15:02.000
You write your little transform: see this pattern, turn it into a simpler thing.

15:02.000 --> 15:03.000
It's lovely.

15:03.000 --> 15:08.000
But InstCombine is the same thing, with a better DSL.

15:08.000 --> 15:14.000
And an MIR rewrite pass looks an awful lot like an IR rewrite pass.

15:14.000 --> 15:19.000
It's the same premise, right? Just with more awkward notation.
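
NOTE
A minimal sketch of a combine-style rewrite in the spirit of DAG combine or InstCombine: match a small pattern, replace it with something simpler, working bottom-up. Expressions are plain (opcode, lhs, rhs) tuples and only add, mul and shl are modelled; this illustrates the premise and is not LLVM's PatternMatch API or any real combine.
```python
# Hypothetical sketch of a combine-style peephole rewrite. Expressions are
# (opcode, lhs, rhs) tuples; leaves are variable names (str) or constants (int).
def combine(expr):
    if not isinstance(expr, tuple):
        return expr
    op, lhs, rhs = expr[0], combine(expr[1]), combine(expr[2])  # simplify children first
    if isinstance(lhs, int) and isinstance(rhs, int):  # constant fold
        if op == "add":
            return lhs + rhs
        if op == "mul":
            return lhs * rhs
        if op == "shl":
            return lhs << rhs
    if op == "add" and rhs == 0:
        return lhs                                  # x + 0  =>  x
    if op == "mul" and rhs == 1:
        return lhs                                  # x * 1  =>  x
    if op == "mul" and isinstance(rhs, int) and rhs > 0 and rhs & (rhs - 1) == 0:
        return ("shl", lhs, rhs.bit_length() - 1)   # x * 2**k  =>  x << k
    return (op, lhs, rhs)
print(combine(("mul", ("add", "x", ("mul", 0, 5)), 8)))
```
The inner 0 * 5 folds to 0, x + 0 becomes x, and the multiply by 8 becomes ('shl', 'x', 3). The same rules could be written over IR or over MIR; only the notation changes.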

15:19.000 --> 15:21.000
So yeah.

15:21.000 --> 15:25.000
I kind of think we shouldn't do this.

15:25.000 --> 15:29.000
I'm running out of slides and no one's asking any questions.

15:29.000 --> 15:31.000
Which is dreadful.

15:31.000 --> 15:35.000
So I'm going to try to prompt you to ask something here,

15:35.000 --> 15:38.000
because I've just told you wildly contentious things, right?

15:38.000 --> 15:42.000
We've got a compiler back end, it has its own special structures.

15:42.000 --> 15:46.000
I'm saying we didn't have to do that.

15:46.000 --> 15:49.000
No one wants to call me on that.

15:49.000 --> 15:51.000
Wonderful, we have a taker.

15:51.000 --> 15:52.000
You're the one.

15:52.000 --> 15:53.000
Go, you're first.

15:53.000 --> 15:56.000
Well, I think you're working your business here.

15:56.000 --> 15:57.000
I have to like this.

15:57.000 --> 16:02.000
I have to refer to this idea of being an estimator.

16:02.000 --> 16:05.000
So you work on a formula that I've always scaled up.

16:05.000 --> 16:08.000
I'm trying to realize that I'm a bit in the end.

16:09.000 --> 16:12.000
But on the end, do you mean you're, what the heck it does?

16:12.000 --> 16:16.000
And in terms of the director, so we're dealing with a lot of things.

16:16.000 --> 16:20.000
So that is tomorrow's talk.

16:20.000 --> 16:24.000
The question here, paraphrasing slightly:

16:24.000 --> 16:27.000
isn't the AMDGPU back end

16:27.000 --> 16:29.000
crazy complicated?

16:29.000 --> 16:31.000
And the answer is yes.

16:31.000 --> 16:36.000
It would be an awful lot simpler if we kept it in IR.

16:37.000 --> 16:42.000
But the AMDGPU back end is, slowly, solely under my prodding,

16:42.000 --> 16:45.000
trying to move stuff out of the back end into IR,

16:45.000 --> 16:47.000
because the back end confuses the hell out of me.

16:47.000 --> 16:51.000
And every time I touch it it breaks in really weird ways.

16:51.000 --> 16:56.000
So I don't think we should throw the back end away in a really radical fashion.

16:56.000 --> 17:01.000
Though I need to find some MLIR people to see if they're a bit more game for that.

17:01.000 --> 17:05.000
I think we should just bear in mind that when you're writing something in the back end,

17:06.000 --> 17:08.000
wouldn't it be easier in IR?

17:08.000 --> 17:11.000
That madly slow optimization you're writing in MIR:

17:11.000 --> 17:15.000
wouldn't it be much nicer to write it in IR instead?

17:15.000 --> 17:18.000
How much structure do you need to add to the IR

17:18.000 --> 17:21.000
so you can write the optimization in IR instead,

17:21.000 --> 17:24.000
and still get the code out?

17:24.000 --> 17:31.000
And if we can slowly drift things like, I don't know,

17:32.000 --> 17:35.000
all the type-munging nonsense around lowering,

17:35.000 --> 17:40.000
where you've got an i66, because LLVM thought that was a good idea,

17:40.000 --> 17:42.000
and your target doesn't know what that is.

17:42.000 --> 17:45.000
So you do the type legalization and all the rewriting.

17:45.000 --> 17:47.000
Totally doable in IR.
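
NOTE
A minimal sketch of legalizing an i66 add in IR, assuming a target that only has 64-bit integer adds: split each operand into a low i64 and a small high part, add the low halves, recover the carry with an unsigned compare, and fold the carry into the high add. Python integers stand in for fixed-width registers, and the helper names are hypothetical.
```python
# Hypothetical sketch of type legalization done "in IR": an i66 add the target
# cannot do natively is rewritten into i64 operations on (low, high) halves
# with an explicit carry. Python ints stand in for fixed-width machine words.
MASK64 = (1 << 64) - 1
MASK66 = (1 << 66) - 1
def split_i66(value):
    """Legalize an i66 value into (low 64 bits, high 2 bits held in an i64)."""
    return value & MASK64, (value >> 64) & 0b11
def add_i66_legalized(a, b):
    a_lo, a_hi = split_i66(a)
    b_lo, b_hi = split_i66(b)
    lo = (a_lo + b_lo) & MASK64          # i64 add, wraps on overflow
    carry = 1 if lo < a_lo else 0        # unsigned overflow check (an icmp ult)
    hi = (a_hi + b_hi + carry) & 0b11    # only 2 meaningful bits in the high word
    return (hi << 64) | lo               # reassemble to compare with a reference
a, b = MASK66 - 5, 10
print(add_i66_legalized(a, b), (a + b) & MASK66)
```
Both printed values are 4: the low add wraps, the compare recovers the carry, and the high part wraps to 0, exactly as a native 66-bit add would.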

17:47.000 --> 17:51.000
And there's this huge block of really complicated stuff in SDAG

17:51.000 --> 17:55.000
which we could just gently migrate up into IR,

17:55.000 --> 17:58.000
and factor out of SDAG and out of GlobalISel,

17:58.000 --> 18:01.000
and reuse it in two places when you write it in one place,

18:01.000 --> 18:03.000
and test it more easily,

18:03.000 --> 18:06.000
and you just have two slightly simpler back ends.

18:06.000 --> 18:09.000
And everything is better, except for compile time.

18:09.000 --> 18:15.000
But apart from that, everything is better.

18:15.000 --> 18:17.000
I'm not sure I've got any more slides.

18:17.000 --> 18:19.000
Oh yeah, we should do more of this.

18:19.000 --> 18:21.000
And we've had one question.

18:21.000 --> 18:23.000
Did you have another one?

18:23.000 --> 18:26.000
Oh yes, one person speaks, two more speak.

18:26.000 --> 18:27.000
Yeah.

18:27.000 --> 18:34.000
So, as I understand it, you're communicating something about where you draw the line

18:34.000 --> 18:39.000
between the middle end and the back end, because in my...

18:39.000 --> 18:43.000
well, I'm not very much involved with either GCC or LLVM,

18:43.000 --> 18:48.000
but my naive interpretation is something like: the middle end

18:48.000 --> 18:51.000
is where architecture-independent stuff happens,

18:51.000 --> 18:55.000
and the back end is where the architecture-dependent stuff happens.

18:55.000 --> 18:56.000
Excellent.

18:56.000 --> 18:58.000
It is one way of cutting it.

18:58.000 --> 19:02.000
But I get the impression you use a different way of cutting it

19:02.000 --> 19:03.000
with a sign.

19:03.000 --> 19:07.000
So you would say the middle end is where you are in IR,

19:07.000 --> 19:13.000
which is a mostly machine independent representation.

19:13.000 --> 19:16.000
And then we have MIR, which, if I gather correctly,

19:16.000 --> 19:21.000
means something like machine-dependent IR in LLVM.

19:21.000 --> 19:24.000
And you are mostly talking about doing stuff

19:24.000 --> 19:27.000
on IR instead of MIR.

19:27.000 --> 19:31.000
But the stuff may still be target-dependent,

19:31.000 --> 19:36.000
because you introduce more target-specific,

19:36.000 --> 19:40.000
more target-specific intrinsics into IR.

19:40.000 --> 19:45.000
So if I get the point of your talk correctly,

19:45.000 --> 19:48.000
you're not saying do everything platform-independently,

19:48.000 --> 19:52.000
because there are really things that are platform dependent.

19:52.000 --> 19:55.000
But you are more questioning.

19:55.000 --> 19:57.000
I think this is a very good question,

19:57.000 --> 19:59.000
and we should consider it.

19:59.000 --> 20:03.000
Do we need some different kinds of IR for the back end

20:03.000 --> 20:05.000
and for the middle end?

20:05.000 --> 20:10.000
And I think you just showed that most likely

20:10.000 --> 20:12.000
we don't need two kinds of IR.

20:12.000 --> 20:13.000
Excellent.

20:13.000 --> 20:14.000
The main point.

20:14.000 --> 20:16.000
OK, I must try to repeat that.

20:16.000 --> 20:21.000
So the first part is where you draw the line

20:21.000 --> 20:25.000
between the middle end and the back end.

20:25.000 --> 20:27.000
It's different between different people.

20:27.000 --> 20:30.000
And the general premise of this talk is roughly

20:30.000 --> 20:33.000
that we're drawing it in the wrong place.

20:33.000 --> 20:35.000
One belief is that LLVM IR

20:35.000 --> 20:36.000
is target-independent,

20:36.000 --> 20:39.000
and you do the target-independent work in the middle end

20:39.000 --> 20:41.000
on IR.

20:41.000 --> 20:42.000
LLVM IR

20:42.000 --> 20:43.000
is not target-independent.

20:43.000 --> 20:47.000
Also, that was a nice idea which we lost 20 years ago.

20:47.000 --> 20:49.000
It interacts really badly

20:49.000 --> 20:51.000
with, like, C.

20:51.000 --> 20:53.000
So you can't have that.

20:53.000 --> 21:00.000
Currently, we try to keep the very back-end-specific stuff

21:00.000 --> 21:02.000
way down in MIR.

21:02.000 --> 21:04.000
But as things work out,

21:04.000 --> 21:08.000
we increasingly drift information back up towards the middle end.

21:08.000 --> 21:11.000
Because if you don't, then things like

21:11.000 --> 21:14.000
the vectorizer, an optimization in the middle end

21:14.000 --> 21:16.000
that you don't want to write for every target,

21:16.000 --> 21:18.000
don't know what registers you've got.

21:18.000 --> 21:20.000
That's not going to work.

21:20.000 --> 21:25.000
So I think we have some slightly historic ideas

21:25.000 --> 21:28.000
about where we should do work in the compiler.

21:28.000 --> 21:31.000
And it should be recognised as such.

21:31.000 --> 21:36.000
There's not so much of a right place to do the transform.

21:36.000 --> 21:40.000
It's where do we currently do similar transforms?

21:40.000 --> 21:43.000
And there's some inertia to moving stuff.

21:43.000 --> 21:51.000
But you can absolutely write the entirety of x86 in SSA form.

21:51.000 --> 21:53.000
And you can do the same with AMDGPU.

21:53.000 --> 21:55.000
And you should not do it with SIMT.

21:55.000 --> 21:57.000
You should do it with vector types.

21:57.000 --> 21:59.000
I'm a fat gentleman.

21:59.000 --> 22:03.000
But if you can describe machine code,

22:03.000 --> 22:06.000
literally verbatim, in SSA form,

22:06.000 --> 22:10.000
and we do most of our optimizations in SSA form,

22:11.000 --> 22:15.000
Why are we accidentally, well,

22:15.000 --> 22:19.000
are we sure that it's important to change

22:19.000 --> 22:21.000
to this much more awkward representation

22:21.000 --> 22:23.000
to shuffle stuff around for a bit?

22:23.000 --> 22:26.000
Because it feels like our testing would be better

22:26.000 --> 22:27.000
if we didn't.

22:27.000 --> 22:29.000
Ah, I've run out of time.

22:29.000 --> 22:30.000
Thank you.

22:30.000 --> 22:31.000
Your question was wonderful.

22:31.000 --> 22:33.000
It was a summary of the entire talk.

22:33.000 --> 22:35.000
Thank you.

