WEBVTT

00:00.000 --> 00:10.480
Hi everyone, my name is Alex, and today we are going to talk about Profile-Guided

00:10.480 --> 00:15.960
Optimization, or PGO, but not about the optimization itself, about the challenges

00:15.960 --> 00:19.560
of adopting PGO implementations in practice.

00:19.560 --> 00:23.160
First, how many of you know what PGO is?

00:23.160 --> 00:28.880
Great, it's a great audience, so thank you, thank you.

00:28.880 --> 00:35.360
A few words about me: I spent some time hacking on LLVM, different parts,

00:35.360 --> 00:39.560
especially the static analyzer, and I was interested in different kinds of optimizations.

00:39.560 --> 00:45.040
My diploma project was about extending the PGO infrastructure for some specific cases,

00:45.040 --> 00:51.280
and during the last years I have been working on the Awesome PGO project, where I am

00:51.280 --> 00:54.760
trying to collect as much information as possible about PGO.

00:54.760 --> 00:59.960
Unfortunately, I also collected as many PGO traps as possible in real life, and I want

00:59.960 --> 01:02.760
to share this experience.

01:02.760 --> 01:04.600
So, why do we care?

01:04.600 --> 01:09.560
We should care about PGO because it works, and it works for real applications:

01:09.560 --> 01:15.800
it works for databases, compilers, browsers, log shipping solutions, and so on, and it

01:15.800 --> 01:20.760
works especially efficiently in practice.

01:20.760 --> 01:30.200
And you know, we need to pay some price for these improvements, and this price is the issues

01:30.200 --> 01:35.120
of integrating PGO into different applications.

01:35.120 --> 01:40.520
And let's start with the most, most frequent one.

01:40.520 --> 01:46.560
You know, reproducibility is an important topic nowadays, and actually, I was talking

01:46.560 --> 01:51.200
with many, many maintainers from different ecosystems, and the top-one blocker for adopting

01:51.200 --> 01:53.680
PGO was reproducibility.

01:53.680 --> 01:58.880
We can divide this problem into two parts, two dedicated cases.

01:58.880 --> 02:04.720
The first one is reproducible builds with some kind of predefined PGO profiles, saved

02:04.720 --> 02:05.720
and collected.

02:05.720 --> 02:11.080
And the second one, the most complicated, is reproducible PGO profile generation.

02:11.080 --> 02:15.720
So, the first one is pretty simple to resolve.

02:15.720 --> 02:21.000
You can just save some PGO profile into your favorite version control system and reuse

02:21.000 --> 02:22.640
this profile during each build.

02:22.640 --> 02:28.600
So, if your build was reproducible before, your build will stay reproducible.
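
NOTE
Editor's sketch, not part of the talk: one way the "commit the profile and reuse it" idea
could be wired into a build script. The file names and the pinned digest are hypothetical;
only the Clang flag -fprofile-use=<file> is taken from the Clang documentation.
```python
# Hypothetical sketch: the file names and the pinned digest are assumptions.
import hashlib
from pathlib import Path
#
def profile_digest(path: Path) -> str:
    # SHA-256 of the committed profile, so any silent change is detected.
    return hashlib.sha256(path.read_bytes()).hexdigest()
#
def pgo_use_command(source: str, profile: Path, pinned_digest: str) -> list[str]:
    # Refuse to build if the profile differs from the digest pinned in the repo.
    actual = profile_digest(profile)
    if actual != pinned_digest:
        raise ValueError(f"profile digest {actual} does not match pinned {pinned_digest}")
    # -fprofile-use=<file> tells Clang to optimize with an existing profile.
    return ["clang", "-O2", f"-fprofile-use={profile}", source, "-o", "app"]
```
Pinning the digest next to the profile keeps the build input fixed, which is the property
a reproducible build depends on.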

02:28.600 --> 02:37.320
However, you will generate a bunch of other interesting issues, like updating PGO profiles

02:37.320 --> 02:41.520
between compiler upgrades and so on, because they are not compatible.

02:41.520 --> 02:44.040
And many, many other issues.

02:44.040 --> 02:46.280
But they are kind of resolvable.

02:46.280 --> 02:53.120
The second case, unfortunately, is generally unresolved, because it requires deterministic

02:53.120 --> 02:55.560
execution of the application.

02:55.560 --> 03:00.240
And if we're talking about real applications, big applications like databases, browsers,

03:00.240 --> 03:02.400
it's not achievable in practice, unfortunately.

03:02.400 --> 03:07.360
Oh, documentation.

03:07.360 --> 03:13.800
I must admit, the documentation for PGO in LLVM has improved significantly

03:13.800 --> 03:15.680
during the last few months.

03:15.680 --> 03:21.480
And I want to say thank you to everyone who was working on it, but it still has some hidden

03:21.480 --> 03:22.480
traps.

03:22.480 --> 03:26.840
The first one is about hidden PGO features.

03:26.840 --> 03:34.860
For example, some trickier ways to do PGO, like CSPGO, or maybe some

03:34.860 --> 03:41.860
relationships between PGO and LTO, because I was surprised that PGO could

03:41.860 --> 03:46.500
be improved when LTO is used at the same time.

03:46.500 --> 03:51.500
It's due to some internal implementation details.

03:51.500 --> 03:56.140
All these features are not documented at all.

03:56.140 --> 04:03.100
Clang has the best documentation about PGO among all LLVM-based compilers.

04:03.100 --> 04:12.340
And other LLVM-based compilers, like Rustc, like to just refer to the Clang documentation.

04:12.340 --> 04:17.780
Unfortunately, it doesn't work well in practice, because, let's call them

04:17.780 --> 04:23.140
downstream compilers from the PGO perspective, don't implement all PGO features.

04:23.140 --> 04:30.500
And you as a user should guess which parts of the Clang documentation are applicable to your compiler.

04:30.500 --> 04:36.940
And I know which parts are applicable; users in general don't.

04:36.940 --> 04:42.100
And smaller issues like different warnings, like warnings about out-of-date counters and

04:42.100 --> 04:43.620
things like that.

04:43.620 --> 04:50.820
And if you have some experience with PGO, you don't care, but PGO

04:50.820 --> 04:53.820
newbies, let's call them like that,

04:53.820 --> 04:59.820
are scared of this, because, oh, PGO broke my build, especially if they

04:59.820 --> 05:06.700
have a warnings-as-errors policy enabled by default.

05:06.700 --> 05:12.580
Many of these questions can be answered by, let's call them, PGO gurus.

05:12.580 --> 05:20.380
And this is the most convenient way to resolve your current problems with PGO, because

05:20.380 --> 05:23.980
the official documentation lacks many, many interesting details.

05:23.980 --> 05:26.780
And it's much easier to just ask bright people a question.

05:26.780 --> 05:32.220
For example, from LLVM, it's Xinliang David Li, I hope I pronounced it properly,

05:32.220 --> 05:36.620
and also Teresa Johnson, they're both from Google. From the Rustc compiler,

05:36.620 --> 05:42.020
it's Jakub Beránek, the person who optimized Rustc with PGO, and many, many other

05:42.020 --> 05:48.660
people from different ecosystems, like operating systems, like CachyOS; they are crazy

05:48.660 --> 05:52.100
about applying PGO to different stuff.

05:52.100 --> 05:56.940
And it would be nice to highlight such people in some way, so it will simplify life

05:56.940 --> 06:00.460
for future PGO adopters.

06:00.460 --> 06:08.780
Unfortunately, many people are still not aware of PGO, of what PGO is, how

06:08.780 --> 06:11.780
efficient it is in practice and so on.

06:11.780 --> 06:18.900
I have found different kinds of misinformation about PGO many times.

06:18.900 --> 06:24.100
For example, SQLite, for many years, had a note in the official documentation that PGO

06:24.100 --> 06:26.460
doesn't help to optimize SQLite performance.

06:26.460 --> 06:34.260
I was triggered by it, and I performed my own benchmarks and I got performance improvements.

06:34.260 --> 06:40.900
I reported back to the upstream, and after a very, very long debate with the SQLite devs, they

06:40.900 --> 06:43.220
removed this note from the documentation.

06:43.220 --> 06:46.220
Unfortunately, they didn't add a note that PGO helps.

06:46.300 --> 06:51.180
So, it is what it is.

06:51.180 --> 06:56.180
And also my favorite.

06:56.180 --> 07:02.700
People try to apply PGO to one application, and it doesn't work for all cases.

07:02.700 --> 07:07.500
And if it doesn't work for one case, they conclude that PGO doesn't work at all

07:07.500 --> 07:12.220
and write a comment on Reddit, something like: please don't try

07:12.300 --> 07:16.300
PGO at all, it didn't help to optimize my FFmpeg.

07:16.300 --> 07:21.900
SQLite and FFmpeg are good pieces of software, from the performance perspective at least.

07:21.900 --> 07:27.420
So, and sorry for the guys.

07:27.420 --> 07:34.220
And unfortunately, almost all other software in the industry is not of the same quality

07:34.220 --> 07:38.300
from the performance perspective, and it's a pity.

07:38.300 --> 07:42.700
The lack of PGO integration examples. You know, every engineer is a copy-pasting

07:42.700 --> 07:43.700
engineer.

07:43.700 --> 07:47.740
And it's fine, because they have a different job to do, like a business job, optimizing

07:47.740 --> 07:50.140
whatever.

07:50.140 --> 07:53.980
And unfortunately, the current LLVM documentation has very, very few good examples

07:53.980 --> 07:55.980
of how to integrate PGO in practice.

07:55.980 --> 08:02.500
So we have build instructions for Clang with CMake caches, and that's it.

08:03.060 --> 08:05.940
All this documentation doesn't answer questions like these.

08:07.940 --> 08:12.580
And these questions are very important in practice, especially the last one, the

08:12.580 --> 08:19.620
trickiest one: how can I report a performance degradation from PGO to the upstream?

08:21.460 --> 08:27.220
And there are many, many other questions like that and they are not answered by official

08:27.220 --> 08:28.660
documentation, unfortunately.

08:30.420 --> 08:35.140
And to make things worse: different use cases, different PGO challenges.

08:35.140 --> 08:38.260
For example, what PGO kind should I use in my case?

08:38.260 --> 08:41.140
We have two major PGO kinds: instrumentation and sampling.
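
NOTE
Editor's sketch, not part of the talk: the two PGO kinds written out as the command
sequences from the Clang user's manual. The binary, source, and profile names are
placeholders, and the pgo_steps helper itself is hypothetical.
```python
# Hypothetical helper: maps a PGO kind to an illustrative sequence of commands.
def pgo_steps(kind: str) -> list[list[str]]:
    if kind == "instrumentation":
        return [
            # 1. Build an instrumented binary; running it writes default.profraw.
            ["clang", "-O2", "-fprofile-instr-generate", "main.c", "-o", "app"],
            # 2. Run a representative workload.
            ["./app"],
            # 3. Merge raw profiles into one indexed profile.
            ["llvm-profdata", "merge", "-output=app.profdata", "default.profraw"],
            # 4. Rebuild with the collected profile.
            ["clang", "-O2", "-fprofile-instr-use=app.profdata", "main.c", "-o", "app"],
        ]
    if kind == "sampling":
        return [
            # 1. Build a regular optimized binary with line-table debug info.
            ["clang", "-O2", "-gline-tables-only", "main.c", "-o", "app"],
            # 2. Sample it with perf; -b records branch stacks (LBR) where available.
            ["perf", "record", "-b", "./app"],
            # 3. Convert perf.data into a sample profile with AutoFDO's create_llvm_prof.
            ["create_llvm_prof", "--binary=./app", "--profile=perf.data", "--out=app.prof"],
            # 4. Rebuild with the sample profile.
            ["clang", "-O2", "-gline-tables-only", "-fprofile-sample-use=app.prof", "main.c", "-o", "app"],
        ]
    raise ValueError(f"unknown PGO kind: {kind}")
```
The sampling path needs a profiler on the target machine but no special binary, while the
instrumentation path needs a dedicated instrumented build.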

08:42.100 --> 08:49.140
And we can't give a simple answer here because we have different domains,

08:49.140 --> 08:51.540
budget limitations and so on.

08:51.540 --> 08:54.420
And different use cases require different kinds of PGO.

08:55.220 --> 09:03.940
And even if you give people some recommendations, they like to mess up these recommendations.

09:03.940 --> 09:09.300
For example: only sampling PGO should be used nowadays? Nope, it shouldn't,

09:09.300 --> 09:16.020
because it has its own limitations. For example, if we are talking about creating packages

09:16.020 --> 09:20.820
for operating systems, creating binaries for operating systems,

09:20.820 --> 09:23.780
sampling PGO is not achievable, at least nowadays.

09:24.980 --> 09:29.220
Instrumentation PGO can be used in a production environment? Nope, it cannot be,

09:29.220 --> 09:33.620
but please be careful: there are situations when instrumentation PGO could be used

09:33.620 --> 09:38.420
in a production environment, for example, when you want to squeeze as much performance as possible

09:38.420 --> 09:48.100
from PGO, because instrumentation PGO is still better than sampling PGO from the

09:49.060 --> 09:51.060
optimization perspective for the compiler.

09:53.060 --> 09:58.260
And that's why we need many more examples from different ecosystems, different domains, and so on,

09:58.260 --> 10:03.220
so people will be able to find the example closest to their use case

10:03.220 --> 10:07.380
and try to copy-paste multiple things from there. And that's fine.

10:09.380 --> 10:10.420
Oh, tooling support.

10:10.580 --> 10:18.420
So there are almost no build system integrations for PGO.

10:18.420 --> 10:24.340
Bazel has some kind of integration, and Buck2, the build system from Facebook, Meta, sorry.

10:26.100 --> 10:32.340
Other build systems, like the de facto build system for C++, CMake, don't have anything like that.

10:33.220 --> 10:35.700
The same thing goes for dependency managers.

10:36.660 --> 10:43.300
And it makes life harder for regular PGO users.

10:44.260 --> 10:46.900
And we need to somehow resolve it.

10:47.460 --> 10:51.860
Rust has a really great example of such a tool: cargo-pgo.

10:53.140 --> 10:59.060
I have used this extension a lot in my practice, and it simplifies life

10:59.860 --> 11:04.260
significantly when you want to apply PGO, at least in

11:04.260 --> 11:10.580
simple cases. It has its own limitations, like no sampling PGO support,

11:12.580 --> 11:18.580
and so for different use cases, highly likely we will need different tools.

11:21.380 --> 11:26.820
So, about different use cases: sampling PGO issues, as the more modern way to do PGO.

11:27.780 --> 11:36.260
The default tooling for doing sampling PGO is AutoFDO; that's a project from Google.

11:36.260 --> 11:42.500
Unfortunately, it has its own problems, like build issues with the most recent

11:42.500 --> 11:46.180
LLVM versions; you need to fix them manually or apply patches.

11:46.900 --> 11:53.220
Different bugs, and things like huge memory consumption, and instead of a good

11:53.220 --> 11:56.580
solution in upstream, you get advice like that.

11:58.340 --> 12:04.260
Sorry, I don't like to work on a heavily swapping Linux machine, and I already have 48

12:04.260 --> 12:06.900
gigabytes of RAM. So no, that's not the solution.

12:09.460 --> 12:14.660
AutoFDO plans to migrate to the LLVM repo, so it will resolve at least all

12:14.660 --> 12:19.300
build issues, but unfortunately, GCC support highly likely will be dropped.

12:19.380 --> 12:26.420
Here is the link to the mail thread about it, and as far as I know, GCC folks didn't

12:26.420 --> 12:30.500
migrate it to their own repository. So it could be a problem for them.

12:32.420 --> 12:40.660
We have another tool, llvm-profgen, and it can be used in some cases as a replacement

12:40.660 --> 12:45.540
for AutoFDO. Unfortunately, not in all cases. For example, the most significant one is

12:45.540 --> 12:54.820
non-LBR support. LBR is a hardware feature from Intel CPUs. It helps to perform sampling

12:54.820 --> 13:00.980
PGO in a more efficient way. This feature has been available for Intel CPUs for many years, but

13:00.980 --> 13:07.860
unfortunately, in AMD CPUs it's available only since Zen 3 with an asterisk, and since Zen

13:07.860 --> 13:15.860
4 without an asterisk. So this use case is still important for the industry.

13:18.500 --> 13:24.980
So if we are talking about sampling PGO, frequently we want to apply

13:26.180 --> 13:31.220
this optimization continuously, because we have a large fleet of services and so on,

13:31.220 --> 13:34.660
and we don't want to perform all optimizations manually for each release.

13:35.300 --> 13:41.860
So Google does it internally. They have a special system based on Google-Wide Profiling.

13:41.860 --> 13:52.580
It has a large architecture, like that. You need to understand that a regular PGO user

13:52.580 --> 13:58.660
cannot replicate it, because Google implemented it internally, and unfortunately this system is

13:58.660 --> 14:04.180
closed-source. But finally, two days ago, Yandex open-sourced completely the same system.

14:04.740 --> 14:10.340
It's called Perforator. So you can use it. It's completely open-source. You can deploy it

14:12.180 --> 14:17.460
on your servers and use it like Google does it internally. So I highly recommend giving it a try.

14:19.620 --> 14:24.500
By the way, only LLVM-based compilers are supported, at least for now. Officially, there are

14:24.500 --> 14:30.980
discussions about other things, like Grafana Pyroscope, but nothing to try besides discussions yet.

14:30.980 --> 14:38.020
And I don't see any activity. So, an interesting question about profile-guided optimization and

14:38.020 --> 14:42.500
post-link optimization, like LLVM BOLT or Propeller: people don't understand the difference

14:42.500 --> 14:50.020
between these tools. They think: just give me one tool to optimize my software,

14:50.100 --> 14:56.340
and I can understand that point of view. And since you have two different tools, you need to

14:57.620 --> 15:05.700
tweak your build scripts twice and so on. And it creates more complexity and so on.

15:05.700 --> 15:11.620
That's not a good thing. And you know, we need better documentation for PLO tools.

15:11.620 --> 15:19.300
BOLT has at least a quite good README. Google Propeller, okay, that's not documentation, sorry.

15:20.660 --> 15:31.300
And, you know, from my practice with PLO tools, in almost all cases, it's much easier to just

15:32.100 --> 15:36.180
ask the developers a question directly in the LLVM Discord.

15:37.460 --> 15:43.300
About the PGO integration into LLVM itself: only Clang has some kind of integration.

15:43.300 --> 15:50.260
We can do more: things like clang-format, Flang, clangd. I have benchmarks for that

15:50.260 --> 15:56.900
already; no integration yet. And the same goes for PLO adoption: we can optimize LLVM tools

15:56.900 --> 16:07.620
more, but we don't do it for now. Regarding PGO integration into other LLVM-based

16:07.700 --> 16:16.100
compilers: Clang has the best support; Rustc is almost the same as Clang, without some fancy PGO features;

16:16.100 --> 16:21.620
and other LLVM-based compilers unfortunately support only the basic PGO

16:21.620 --> 16:27.860
via instrumentation, or maybe don't support PGO at all. And it's a good, big

16:27.860 --> 16:34.580
motivation for all non-LLVM and non-GCC-based compilers to implement their

16:34.740 --> 16:40.340
optimizing compilers based on LLVM, because without the LLVM infrastructure it will be

16:40.340 --> 16:49.460
quite tricky to implement PGO in the right way. So why should we care about PGO?

16:51.460 --> 16:59.620
The idea behind the Awesome PGO project is quite simple. I'm trying to fill as many

16:59.620 --> 17:04.900
gaps as possible in the current PGO infrastructure, in the current PGO ecosystem, and I'm talking

17:04.900 --> 17:09.860
not only about LLVM: demonstrating PGO efficiency in real applications,

17:09.860 --> 17:14.260
highlighting more PGO features that are hidden in different RFCs, maybe commits, and so on,

17:16.660 --> 17:22.900
making a shift from just PGO in LLVM documentation to how to use PGO

17:22.900 --> 17:29.620
in practice for real stuff, not just a small example, and also answering all possible

17:30.740 --> 17:38.900
questions about PGO that could appear in the future. That is, I want to introduce more

17:38.900 --> 17:44.020
data-driven optimizations into the current world, so without all the stuff like random inlining

17:44.020 --> 17:52.260
and so on; we can do better based on data. It's not a big price to pay for that, but

17:52.340 --> 17:57.140
I want to increase the default performance level for the whole

17:57.140 --> 18:04.980
ecosystem and improve the world, even if I need to open some amount of issues, let's

18:05.060 --> 18:08.820
say, from my GitHub. And that's it for today. Thank you.

18:09.780 --> 18:13.500
[Applause]

18:28.900 --> 18:33.220
[Audience question, inaudible]

18:33.220 --> 18:43.380
Temporal? Temporal PGO. Well, temporal PGO, that's a kind of PGO that was documented

18:43.380 --> 18:49.480
a month ago, I guess. That's a kind of PGO that helps you to optimize startup time for

18:49.480 --> 18:54.580
your application. That's temporal PGO. I don't have experience with this kind of PGO.

18:54.580 --> 19:03.100
I was only reading the RFC and discussions on the LLVM forum. I didn't try it. Sorry.

19:03.600 --> 19:17.980
Can you maybe explain a bit how it's implemented in LLVM? Is it an optimization pass?

19:17.980 --> 19:33.820
Yeah, that's one of several optimization passes here. How they are implemented inside

19:33.820 --> 19:40.620
LLVM, I cannot tell you much, because, you know, different limitations and so on. Are you interested

19:40.620 --> 19:49.180
in the optimization phase or the instrumentation phase? Optimization phase.

19:49.180 --> 19:59.740
Basically, what PGO brings, why it's so helpful in practice: it helps to perform much,

19:59.740 --> 20:07.660
much better inlining. And inlining, if you know something about LLVM internals, is a very important

20:07.660 --> 20:15.740
optimization, not only because inlining improves performance directly, like eliminating jumps and calls;

20:15.740 --> 20:23.980
inlining also gives the compiler a context to perform other optimizations. And that's why it's

20:23.980 --> 20:29.740
so important. There are other things, like, I guess, devirtualization; I'm not sure it's

20:29.740 --> 20:35.340
implemented in LLVM. In Go, it's implemented in the Go compiler. But inlining is the most important.

20:37.660 --> 20:41.260
Okay, we're out of time. Thank you.

