WEBVTT

00:00.000 --> 00:08.000
Okay, so let's start.

00:08.000 --> 00:12.000
Hi, I'm Fred Casto, I'm from Sousa.

00:12.000 --> 00:20.000
And likely I've been interested in switched statements and how GCC optimizes switched statements.

00:20.000 --> 00:28.000
So I've come here to tell you about what I've found on my journey.

00:28.000 --> 00:36.000
And also to share one contribution I actually made to GCC switched statement optimizations.

00:42.000 --> 00:47.000
Let's start with some definitions, so that we understand each other.

00:47.000 --> 00:51.000
What will be important? Well, this will be important.

00:51.000 --> 00:55.000
I expect you know what a switch statement in CS.

00:55.000 --> 01:01.000
And the thing in parenthesis that determines which of the cases will execute.

01:01.000 --> 01:05.000
Let's call that the index variable.

01:05.000 --> 01:07.000
So this one important thing.

01:07.000 --> 01:11.000
And the other important thing is the case range.

01:11.000 --> 01:21.000
The case range, I basically mean the interval from the smallest case number to the highest case number.

01:21.000 --> 01:25.000
And then this case range may have some holes in it.

01:25.000 --> 01:29.000
Not sure if there's any technical term for this.

01:29.000 --> 01:35.000
Meaning for example here we don't have case for in case 5.

01:35.000 --> 01:47.000
If there are any holes in the case range and how many there are how big they are will be important for some of the optimizations.

01:47.000 --> 01:53.000
So let's say you are the compiler and you run into a switch.

01:53.000 --> 01:59.000
At some point during the compilation you have to break down the switch into some simpler statements.

01:59.000 --> 02:05.000
And this is the situation we'll be dealing with in this talk.

02:05.000 --> 02:11.000
And first I want to quickly mention the navel way of doing this.

02:11.000 --> 02:17.000
You can break down the switch to a chain of if and else statements.

02:17.000 --> 02:25.000
You just go through all the cases and for each one ask if i is equal to its number.

02:25.000 --> 02:31.000
This works but in the worst case you go through all the cases.

02:31.000 --> 02:41.000
So you go through many conditionals which is not efficient in terms of CPU cycles.

02:41.000 --> 02:49.000
And before we discuss smart database of doing this I want to show you two specific switches.

02:49.000 --> 02:55.000
And I want to think about how you would optimize this.

02:55.000 --> 03:01.000
How would you rewrite this to something more efficient and something that isn't a switch.

03:01.000 --> 03:13.000
I will tell you what GCC will in its current version do with this which is at the end.

03:13.000 --> 03:25.000
So eventually we will find out how GCC takes advantage of those.

03:25.000 --> 03:31.000
So let's now go through the major switch optimizations in GCC.

03:31.000 --> 03:37.000
The one I want to focus on the most is switch conversion.

03:37.000 --> 03:45.000
This is an optimization that was contributed by Martin Jambor who is also from Susan.

03:45.000 --> 03:53.000
And the scenario where we would use switch conversion is such.

03:53.000 --> 04:01.000
You have a switch like this one that only assigns constants to some variables.

04:01.000 --> 04:03.000
In this case it is the variable x.

04:03.000 --> 04:11.000
That may sound niche but I think pretty useful case to optimize.

04:11.000 --> 04:17.000
And the idea is that you take these constants.

04:17.000 --> 04:27.000
You put them in an array and then you use the variable name index variable to index the array.

04:27.000 --> 04:37.000
What you then get is instead of many conditionals you get just a memory read which is nice.

04:37.000 --> 04:53.000
One thing to know is that you actually also have to include some bounce checks so that you don't end up doing them out of bounce memory read.

04:53.000 --> 04:55.000
Let's take this a bit further.

04:55.000 --> 04:59.000
What if we could get even of that memory read?

04:59.000 --> 05:11.000
Well it turns out we can and GCC is able to do this in the case where x turns out to be a linear function of i.

05:11.000 --> 05:21.000
Which is how obvious this is but you can compute x by multiplying i by free and adding 5.

05:21.000 --> 05:31.000
And GCC can recognize this and instead of the memory read you have seen any instead of the array.

05:31.000 --> 05:39.000
Just insert the necessary numerical operations, the multiplication and addition.

05:39.000 --> 05:47.000
And one thing I forgot to mention in the previous case with the basics which conversion.

05:47.000 --> 05:57.000
This optimization can handle some holes in the case range but there mustn't be too many.

05:57.000 --> 06:01.000
In the case of this thing we need to transformation this flavor of switch conversion.

06:01.000 --> 06:07.000
There mustn't be any holes in the case range.

06:09.000 --> 06:15.000
So that was the switch conversion.

06:15.000 --> 06:21.000
I also want to show you the other major switch optimizations.

06:21.000 --> 06:31.000
The most general of this and now we are giving the realm of switches which only assign constants to some variables.

06:31.000 --> 06:37.000
The most general optimization is the decision tree optimization.

06:37.000 --> 06:49.000
Basically this is a direct upgrade in most cases I think of the knife approach we've seen.

06:49.000 --> 06:53.000
Here you basically take the case range.

06:53.000 --> 07:05.000
You split it at some point and you ask in which of the resulting sections of the case range is i.

07:05.000 --> 07:19.000
And then you do this again recursively and again you can visualise this as a tree and eventually end up in the code that you want to execute.

07:19.000 --> 07:27.000
So this is nice. This gets us from worst case linear time to worst case algorithmic time.

07:28.000 --> 07:33.000
But yeah this is like the most basic thing we can do.

07:33.000 --> 07:45.000
Usually we want to apply some more powerful techniques at the expense of them not being entirely general.

07:45.000 --> 07:53.000
So one of these is the jump table optimization optimization.

07:53.000 --> 07:59.000
This thing to understand jump tables.

07:59.000 --> 08:13.000
We have to look at the problem at the assembler level and the core concept is an indirect jump.

08:13.000 --> 08:31.000
What you do to go to the appropriate case to the appropriate code block is that you do an indirect jump indexed by the index variable.

08:31.000 --> 08:41.000
You have a table of target locations and you index this table by i in this case.

08:41.000 --> 08:53.000
It's kind of similar in concept to the switch conversion but this time we're jumping not reading from memory.

08:53.000 --> 09:09.000
So this is pretty powerful and there's one prerequisite and that's again that's the case range doesn't have too many holes in it.

09:09.000 --> 09:23.000
And to finish our turn from through the major gcc switch optimizations there's also a bit test.

09:23.000 --> 09:30.000
This one is pretty specific most contributed by regesail.

09:30.000 --> 09:37.000
We have a switch where many cases are pointing to a single code block.

09:37.000 --> 09:46.000
To this side effectively if you want to execute a thing A or thing B.

09:46.000 --> 09:53.000
We would like to efficiently check if I is in a set of all these case numbers.

09:53.000 --> 10:01.000
And the I think pretty smart idea is to use a bit vector for this.

10:01.000 --> 10:11.000
Meaning that you take a binary number and you put once at the positions which correspond to the numbers you want to represent in the set.

10:11.000 --> 10:27.000
And then you do some bit by operations and you can efficiently find out if i indeed is in the set.

10:27.000 --> 10:43.000
So those are the major switch optimizations and as I promised now I want to show you the switches from the beginning what happens to them.

10:43.000 --> 10:52.000
Now that you know about switch conversion you might guess that this is exactly the case where you would want to apply it.

10:52.000 --> 10:56.000
You are just assigning constants to some variable x.

10:56.000 --> 11:00.000
But there's a problem and that's the case range.

11:00.000 --> 11:08.000
It is two spars, there's too many holes in the case range.

11:08.000 --> 11:16.000
But we see a pattern here. We see that the case numbers are actually powers of two.

11:16.000 --> 11:24.000
So what if we could switch on the exponents of the powers of two instead of the numbers themselves?

11:24.000 --> 11:28.000
Well that would be pretty nice.

11:28.000 --> 11:38.000
It would lead us to a switch like this which this is very much processable by switch conversion.

11:38.000 --> 11:44.000
The case range actually in this case scenario is completely contiguous.

11:44.000 --> 12:00.000
But one thing we have to do is that we at runtime have to compute the base two logarithm of the index variable.

12:00.000 --> 12:04.000
Which may seem like it would be a problem.

12:04.000 --> 12:14.000
But since all the relevant values are exact, well, are powers of two.

12:14.000 --> 12:18.000
And we have some fancy instructions on modern CPUs.

12:18.000 --> 12:24.000
This actually is pretty fast. You can do this in like one or two instructions.

12:24.000 --> 12:26.000
Usually.

12:26.000 --> 12:34.000
And yeah, this is my humble contribution to GCC spiritual premises.

12:34.000 --> 12:36.000
This is what I did.

12:36.000 --> 12:43.000
And this is how it looks if one eventually switch conversion runs.

12:43.000 --> 12:50.000
You get some bounce checks which I left out, logarithm and memory read.

12:50.000 --> 12:54.000
And I also want to show you the second switch.

12:54.000 --> 12:59.000
Let one gets reduced to almost only logarithm.

12:59.000 --> 13:06.000
Because that is what it was basically computing and I have left through the switch statement.

13:06.000 --> 13:10.000
So I think it's pretty neat that GCC can now recognize this.

13:10.000 --> 13:16.000
And just insert a computation for logarithm.

13:16.000 --> 13:24.000
I want to do show you some future work, but I'm running up to time.

13:24.000 --> 13:30.000
So, recap, this is what you've seen. Switch conversion, this entry is just tables and it tests.

13:30.000 --> 13:34.000
I hope you learned something.

13:34.000 --> 13:40.000
And are now more excited about optimizing switch statements. Thank you.

13:40.000 --> 13:52.000
Do I have some time for questions?

13:52.000 --> 13:56.000
Okay.

13:56.000 --> 14:00.000
So there were two questions.

14:00.000 --> 14:06.000
The first one was if this works with other logaritms.

14:06.000 --> 14:12.000
No, this only works with base to logarithm.

14:12.000 --> 14:18.000
I don't think that this would work with like base free logarithm.

14:18.000 --> 14:24.000
Maybe with powers of two, but that would.

14:24.000 --> 14:28.000
How open would an opponent practice?

14:29.000 --> 14:33.000
So the question is, no, this works only with base to logarithm.

14:33.000 --> 14:41.000
And the second question was if this also works with chains of if else.

14:41.000 --> 14:55.000
And it does because, yeah, what I didn't mention, GCC also optimizes chains of if else into switches internally.

14:55.000 --> 15:03.000
So this is also applicable if you write this as if else chain.

15:03.000 --> 15:05.000
Last one.

15:05.000 --> 15:07.000
Okay.

15:07.000 --> 15:09.000
I saw you, okay.

15:09.000 --> 15:16.000
So you say two decisions, right?

15:17.000 --> 15:19.000
First, yes.

15:19.000 --> 15:20.000
First, yes.

15:20.000 --> 15:21.000
Jump, jump, jump.

15:21.000 --> 15:26.000
And all modern hyper, jump, hyper, traffic, traffic, traffic, traffic.

15:26.000 --> 15:32.000
So you should avoid always to assist.

15:32.000 --> 15:36.000
It is just never predicted in a lot of hyper.

15:36.000 --> 15:39.000
If it means like, single times, right?

15:39.000 --> 15:45.000
And the better hyper is essentially never predicted correctly.

15:45.000 --> 15:53.000
So, well, multiplication is the important thing for performance, right?

15:53.000 --> 15:58.000
And it does not work in that process.

15:58.000 --> 16:01.000
There is no good solution.

16:01.000 --> 16:02.000
Yeah.

16:02.000 --> 16:07.000
And well, there is no good solution expected.

16:07.000 --> 16:10.000
And then next month I'll be here for some further.

16:10.000 --> 16:12.000
And it is no good solution.

16:12.000 --> 16:15.000
So, best we have, GCC.

16:15.000 --> 16:16.000
Okay.

16:16.000 --> 16:21.000
And in GCC, we do have a truck type specific,

16:21.000 --> 16:26.000
Markov thinging to decide whether to use this thing.

16:26.000 --> 16:27.000
Okay.

16:27.000 --> 16:31.000
So basically, everything will switch to decisions.

16:31.000 --> 16:34.000
We always in the next couple of days.

16:34.000 --> 16:36.000
Because it's just so much better.

16:36.000 --> 16:37.000
Okay.

16:37.000 --> 16:41.000
So to repeat for people watching the recording,

16:41.000 --> 16:42.000
I understood correctly.

16:42.000 --> 16:45.000
This was a correction to my client.

16:45.000 --> 16:47.000
This is the basic thing.

16:47.000 --> 16:48.000
And this is better.

16:48.000 --> 16:50.000
It's not, it's the June table.

16:50.000 --> 16:52.000
It's not the modern thing.

16:52.000 --> 16:54.000
It's the 10, 20 years old.

16:54.000 --> 16:58.000
It does not have any advantage.

16:58.000 --> 17:00.000
It is only different functions.

17:00.000 --> 17:01.000
Okay.

17:01.000 --> 17:02.000
It's always well.

17:02.000 --> 17:03.000
Yeah.

17:03.000 --> 17:04.000
Yeah.

17:04.000 --> 17:05.000
It does work.

17:05.000 --> 17:07.000
That's good thing, right?

17:07.000 --> 17:09.000
It's good work.

17:09.000 --> 17:10.000
Yeah.

17:10.000 --> 17:14.000
So what I've learned from the audience is this, this technique is old.

17:14.000 --> 17:17.000
And isn't as fast as I was saying.

17:17.000 --> 17:20.000
And will be very, very obsolete.

17:20.000 --> 17:21.000
Okay.

17:21.000 --> 17:22.000
Thank you.

17:22.000 --> 17:27.000
I think that, yes, GCC doesn't decide.

17:27.000 --> 17:31.000
On the, depending on how big this switch is,

17:31.000 --> 17:34.000
it will do decision trees,

17:34.000 --> 17:39.000
even if it can see that it could do a jump table.

17:40.000 --> 17:43.000
But yeah, I don't know more details about this.

17:43.000 --> 17:45.000
Thanks.

17:45.000 --> 17:46.000
Okay.

17:46.000 --> 17:48.000
So there's all the questions.

17:48.000 --> 17:49.000
Sorry.

17:49.000 --> 17:51.000
Again, thank you for your attention.

17:52.000 --> 17:53.000
Thank you.

17:53.000 --> 17:55.000
Thank you.

17:55.000 --> 17:56.000
Thank you.

17:56.000 --> 17:57.000
Thank you.

17:57.000 --> 17:58.000
Thank you.

17:58.000 --> 17:59.000
Thank you.

17:59.000 --> 18:00.000
Thank you.

