WEBVTT

00:00.000 --> 00:12.800
Okay, so hey, so hi, I'm Tom, and I guess thank you for staying until the end.

00:12.800 --> 00:18.120
And today I'm going to be talking a little bit about how the Native Image Builder works.

00:18.120 --> 00:25.960
So I'm a principal researcher at GraalVM, and I joined the Native Image team back in 2020.

00:25.960 --> 00:30.760
So in this talk, to start off, we'll have a brief overview of what Native Image is.

00:30.760 --> 00:38.460
In case you have no clue, basically it's a tool for creating standalone executables from Java applications.

00:38.460 --> 00:47.360
I'll also then describe at a high level the process that Native Image goes through to build your standalone executable.

00:47.360 --> 00:51.160
And then after that, I'll give some more details about the front end of the builder.

00:51.160 --> 00:58.960
In particular, how we create Graal graphs from Java bytecode, what our Graal graphs look like, how to look at them.

00:58.960 --> 01:06.360
And also within these graphs, how we interact with both the substrate world and the host JVM.

01:06.360 --> 01:14.360
Now, by the substrate world, what I mean is all the metadata, methods, and objects that we actually install into the final executable.

01:14.360 --> 01:19.360
And by the host JVM, I mean the JVM that's running the Native Image Builder itself.

01:19.360 --> 01:24.960
Because in case you don't know, Native Image is just another Java application.

01:24.960 --> 01:31.560
So hopefully by the end of this talk, you'll have a better understanding of what the general builder process is.

01:31.560 --> 01:36.060
How we retrieve and view Graal graphs.

01:36.060 --> 01:46.860
How, within these graphs, the substrate universe is represented, and how the substrate universe relates to the Java universe of the host JVM.

01:46.860 --> 01:53.760
So, as I said before, Native Image is a tool for creating standalone executables from Java applications.

01:53.760 --> 02:04.960
And what we do is, through analysis, we determine all the possible Java code that you could execute at run time, and we ahead-of-time, or AOT, compile that.

02:04.960 --> 02:13.960
In addition to this code, we are also able to eagerly initialize some objects. We run some class initializers as part of the build process,

02:14.060 --> 02:18.060
and we inject those objects into the executable as well.

02:18.060 --> 02:23.860
Now, these objects that we're injecting into the executable would just be part of the normal Java heap.

02:23.860 --> 02:30.860
But this portion that we install during the build process is what I call the image heap.

02:30.860 --> 02:37.060
So, there's some benefits of using Native Image, including that it's just a standalone executable.

02:37.060 --> 02:42.060
You don't need a JDK distribution or many dependencies at runtime.

02:42.160 --> 02:46.360
Because it's AOT-compiled code, you have fast startup.

02:46.360 --> 02:50.160
Also, you have a low memory footprint, because we're doing all this stuff

02:50.160 --> 02:58.960
as part of the build process, so we're able to purge all the normal metadata that HotSpot has to keep around during execution.

02:58.960 --> 03:03.460
And also, because we AOT compile the code, we have very predictable performance.

03:03.460 --> 03:11.360
You don't have to worry about warm-up, and also you don't have to worry about any performance cliffs due to deoptimization.

03:11.360 --> 03:20.260
And yeah, Native Image works with a variety of popular frameworks, including Spring Boot, Micronaut, and Quarkus.

03:20.260 --> 03:26.960
So, the 1000-foot view of Native Image and what it's doing kind of looks something like this.

03:26.960 --> 03:34.260
So, basically what we have to do is kind of suck up all the dependencies your application has,

03:34.260 --> 03:39.360
including third-party libraries, the JDK, and the runtime itself.

03:39.360 --> 03:47.360
Figure out what you're actually going to execute from it, and then actually install it as code in the text section.

03:47.360 --> 03:53.360
As I said before, we're also going to run some initializations as part of the build process,

03:53.360 --> 04:01.360
and take the objects that are present in those static fields in these classes, and install them in the image heap.

04:01.360 --> 04:10.360
Of course, before we do that, we're going to have to convert these host JVM objects into the layout that represents the substrate world.

04:10.360 --> 04:19.360
Now, of course, when you're installing some objects into the image heap, you'll have to install the transitive closure of those objects to the image heap as well.

04:19.360 --> 04:22.360
So, we have to do some object scanning.

04:22.360 --> 04:29.360
So, in general, the way we determine what code you have to install in the text section is we use a points-to analysis,

04:29.360 --> 04:38.360
and we create a type reachability graph, where, as more types become part of the substrate world,

04:38.360 --> 04:41.360
it implies that more code becomes reachable.

04:41.360 --> 04:47.360
So, what's going on here with regard to the analysis, initialization, and scanning

04:47.360 --> 04:53.360
is not like a one-shot thing; it's a gradual process that gradually fills out the substrate world.

04:53.360 --> 05:05.360
And the reason for that is, for example, when you scan an object, you actually might cause a new object to become reachable that has a different type that you haven't seen before,

05:05.360 --> 05:13.360
which then might cause more code to become reachable, which then in that new code you're analyzing, you might actually see a new constant,

05:13.360 --> 05:20.360
which then causes more scanning, so you can see it's kind of this iterative process that we have to go through.
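
NOTE
Editor's sketch of the iterative process just described, as a worklist that runs until nothing new is discovered.
This is a toy model under stated assumptions: the class name, the string labels, and the "discovers" edges are
all hypothetical and are not Native Image APIs. It only illustrates why scanning and analysis feed each other.
```java
import java.util.*;
// Toy model only: every name below is made up for illustration.
final class ReachabilitySketch {
    // An item (method, constant, or type) may reveal further items when it is processed.
    static Set<String> fixpoint(String root, Map<String, List<String>> discovers) {
        Set<String> reachable = new LinkedHashSet<>();
        Deque<String> worklist = new ArrayDeque<>();
        worklist.add(root);
        while (!worklist.isEmpty()) {
            String item = worklist.poll();
            if (!reachable.add(item)) {
                continue; // already processed, nothing new to learn
            }
            for (String next : discovers.getOrDefault(item, List.of())) {
                worklist.add(next);
            }
        }
        return reachable;
    }
    public static void main(String[] args) {
        Map<String, List<String>> discovers = Map.of(
            "method:main", List.of("constant:CONFIG"),         // analyzing code finds a constant
            "constant:CONFIG", List.of("type:Parser"),         // scanning the constant finds a new type
            "type:Parser", List.of("method:Parser.parse"),     // the new type makes more code reachable
            "method:Parser.parse", List.of("constant:CONFIG")); // which may refer back to known items
        System.out.println(fixpoint("method:main", discovers));
    }
}
```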

05:21.360 --> 05:31.360
So, yeah, so this is kind of my definition, but the way I'd kind of classify the native image builder would be into four different stages,

05:31.360 --> 05:39.360
analysis, finalize substrate world, compile, and prepare executable.

05:39.360 --> 05:44.360
So, in the analysis phase stage rather, we need to do three main things.

05:45.360 --> 05:48.360
We need to create Graal graphs from Java bytecode.

05:48.360 --> 05:53.360
We need to run all the class initializers for objects we're going to install in the image heap,

05:53.360 --> 05:59.360
and we need to discover all the information that potentially is going to be installed in the substrate world.

05:59.360 --> 06:08.360
And once again, by substrate world, I mean all the metadata, methods, and objects that we put in the executable at the end.

06:08.360 --> 06:12.360
So, now, in the second stage, our analysis is complete.

06:12.360 --> 06:17.360
We know all that could be possibly in the substrate world, so now we could finalize it.

06:17.360 --> 06:23.360
So, what that means is we need to finalize the object layout and also calculate some header information,

06:23.360 --> 06:28.360
such as kind of the metadata needed for type checks and also such things as like, you know,

06:28.360 --> 06:32.360
vtable layouts and what vtable indexes methods have.

06:32.360 --> 06:41.360
Also, at this point, because the analysis is complete, we could optimize the graphs that we use based on the analysis.

06:41.360 --> 06:48.360
The compile stage is pretty self-explanatory: we just take our graphs and compile them down to machine code.

06:48.360 --> 06:55.360
And then the final stage is when we actually need to generate the executable itself.

06:55.360 --> 06:59.360
So, we need to write all the image heap objects to the data section.

06:59.360 --> 07:03.360
We need to lay out all the code we have compiled and put it in the text section.

07:03.360 --> 07:09.360
Then, also, for any dependencies between different methods or references to the heap,

07:09.360 --> 07:15.360
we either need to patch them ourselves to reflect their layout or install relocations.

07:15.360 --> 07:21.360
And then, we have to invoke the linker itself to create the executable.

07:21.360 --> 07:31.360
So, yeah, I'm not sure how easy this is to see, but normally when you run Native Image on any application,

07:31.360 --> 07:35.360
this is kind of the output you get from the command line.

07:35.360 --> 07:40.360
And before, I described four stages, so how do they map to this?

07:40.360 --> 07:45.360
Well, these first two phases, initializing and performing analysis,

07:45.360 --> 07:49.360
correspond to that analysis stage I was talking about before.

07:50.360 --> 07:55.360
Building universe corresponds to the finalize substrate world stage.

07:55.360 --> 08:01.360
These next three phases, parsing, inlining, and compiling, correspond to the compile stage.

08:01.360 --> 08:12.360
And the final two phases, laying out methods and creating the image, correspond to the prepare executable stage.
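
NOTE
Editor's sketch restating the mapping above (command-line phases to the speaker's four stages) as a lookup table.
The phase names are taken from the talk; the enum, map, and class names are illustrative assumptions only and
do not correspond to anything in the Native Image code base.
```java
import java.util.*;
// Sketch only: names are illustrative, not part of Native Image.
final class BuilderStagesSketch {
    enum Stage { ANALYSIS, FINALIZE_SUBSTRATE_WORLD, COMPILE, PREPARE_EXECUTABLE }
    // Insertion order mirrors the order in which the phases are printed during a build.
    static final Map<String, Stage> PHASE_TO_STAGE = new LinkedHashMap<>();
    static {
        PHASE_TO_STAGE.put("initializing", Stage.ANALYSIS);
        PHASE_TO_STAGE.put("performing analysis", Stage.ANALYSIS);
        PHASE_TO_STAGE.put("building universe", Stage.FINALIZE_SUBSTRATE_WORLD);
        PHASE_TO_STAGE.put("parsing", Stage.COMPILE);
        PHASE_TO_STAGE.put("inlining", Stage.COMPILE);
        PHASE_TO_STAGE.put("compiling", Stage.COMPILE);
        PHASE_TO_STAGE.put("laying out methods", Stage.PREPARE_EXECUTABLE);
        PHASE_TO_STAGE.put("creating image", Stage.PREPARE_EXECUTABLE);
    }
    // Collect, in build order, the phases that belong to one stage.
    static List<String> phasesOf(Stage stage) {
        List<String> phases = new ArrayList<>();
        for (Map.Entry<String, Stage> entry : PHASE_TO_STAGE.entrySet()) {
            if (entry.getValue() == stage) {
                phases.add(entry.getKey());
            }
        }
        return phases;
    }
    public static void main(String[] args) {
        for (Stage stage : Stage.values()) {
            System.out.println(stage + " <- " + phasesOf(stage));
        }
    }
}
```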

08:12.360 --> 08:17.360
So, now I'll talk a little bit about how we create graphs.

08:17.360 --> 08:24.360
And actually, the code within the Graal repo where all this graph creation is done is in the bytecode parser.

08:24.360 --> 08:30.360
So, what it's doing is it's reading the Java bytecode and creating Graal graphs from it.

08:30.360 --> 08:35.360
And this is only done at the very beginning the first time we've touched a method.

08:35.360 --> 08:43.360
And I should also mention that in these slides, especially in the later slides, I've added a bunch of links that reference the source code.

08:43.360 --> 08:49.360
So, you know, in 20 minutes I don't have that much time, so I can't go into too much detail, but I encourage you to look at these links

08:49.360 --> 08:55.360
later on if you're really curious about the nitty-gritty of how things work.

08:55.360 --> 09:02.360
So, yes, from the bytecode, we create these Graal graphs only once, at the very beginning.

09:02.360 --> 09:06.360
And then as soon as we create the Graal graphs, the graphs are our source of truth.

09:06.360 --> 09:10.360
We never look at the bytecode again, we could just throw it away.

09:10.360 --> 09:18.360
Everything we do operates via viewing and transforming or analyzing these graphs, including the analysis itself.

09:18.360 --> 09:25.360
We create the type flow analysis directly by looking at and processing the Graal graph.

09:25.360 --> 09:32.360
And the perk of this is that it allows us to do optimizations at any point of the builder process.

09:32.360 --> 09:39.360
You know, as soon as you optimize a graph, all later stages will see the changes you make,

09:39.360 --> 09:42.360
because we're always just looking at these Graal graphs.

09:42.360 --> 09:49.360
So, it's nice so we could do some optimizations before analysis, and then it can make our analysis work better.

09:49.360 --> 09:52.360
So, what do our graphs look like?

09:52.360 --> 09:57.360
Well, if you're familiar with HotSpot, Graal graphs look a lot like its graphs.

09:57.360 --> 10:06.360
We use a sea-of-nodes representation, where different nodes denote different operations or actions being performed,

10:06.360 --> 10:10.360
and then the arrows denote dependencies between these nodes.

10:10.360 --> 10:21.360
So, in this graph here, the red lines are control dependencies, whereas the blue lines are value dependencies.
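
NOTE
Editor's sketch of the two edge kinds mentioned above: control edges that order operations and value edges
that say which results an operation consumes. This is a toy data structure; the Node class and its fields are
hypothetical and are not Graal's actual node classes.
```java
import java.util.*;
// Toy data structure for illustration; not Graal's node hierarchy.
final class SeaOfNodesSketch {
    static final class Node {
        final String op;
        final List<Node> inputs; // value dependencies (the blue lines)
        Node next;               // control successor (the red lines)
        Node(String op, Node... inputs) {
            this.op = op;
            this.inputs = Arrays.asList(inputs);
        }
    }
    // Follow the control edges from the start node and list the operations in order.
    static List<String> controlOrder(Node start) {
        List<String> order = new ArrayList<>();
        for (Node n = start; n != null; n = n.next) {
            order.add(n.op);
        }
        return order;
    }
    // Render a node together with the values it depends on.
    static String render(Node n) {
        if (n.inputs.isEmpty()) {
            return n.op;
        }
        StringJoiner joiner = new StringJoiner(", ", n.op + "(", ")");
        for (Node input : n.inputs) {
            joiner.add(render(input));
        }
        return joiner.toString();
    }
    // Models "return a + b": the Add node has value edges only and is not on the control chain.
    static Node sampleGraph() {
        Node add = new Node("Add", new Node("Param0"), new Node("Param1"));
        Node start = new Node("Start");
        start.next = new Node("Return", add);
        return start;
    }
    public static void main(String[] args) {
        Node start = sampleGraph();
        System.out.println(controlOrder(start));
        System.out.println(render(start.next));
    }
}
```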

10:21.360 --> 10:27.360
So, because the graphs are the source of truth throughout the builder process,

10:27.360 --> 10:31.360
it's very valuable to actually look at these graphs and see how they're changing.

10:31.360 --> 10:34.360
It could give you a lot of insights.

10:34.360 --> 10:40.360
And the tool we have for doing that is something called IGV, or Ideal Graph Visualizer.

10:40.360 --> 10:43.360
Now, technically this tool isn't unique to Graal.

10:44.360 --> 10:48.360
HotSpot also has an IGV for viewing C2 graphs.

10:48.360 --> 10:52.360
I should point out the two versions of IGV are different.

10:52.360 --> 10:56.360
So, if you want to view our graph, please look here.

10:56.360 --> 11:01.360
Please go to this link here and get our version of IGV.

11:01.360 --> 11:07.360
Now, from native image, the builder process itself, the way to actually get it to dump out graphs,

11:07.360 --> 11:11.360
is to add these three options to your build command.

11:11.360 --> 11:16.360
And I've included the links here too, because you could make it print out only one method

11:16.360 --> 11:18.360
or in various levels of detail.

11:18.360 --> 11:22.360
So, yeah, it's good to read about the options at these links.

11:22.360 --> 11:26.360
In general, too, if you want to see all the options available to you as flags,

11:26.360 --> 11:31.360
you could run the command native-image --expert-options-all.

11:31.360 --> 11:39.360
So, what does the IGV tool look like? Well, once again, I assume you can't see it too well,

11:40.360 --> 11:45.360
it's not that important, but basically in the middle, you could see the graph itself.

11:45.360 --> 11:51.360
IGV provides the opportunity to zoom in if you want or select a subset of the nodes,

11:51.360 --> 11:56.360
which is very important, because this is a very simple graph, but usually throughout the builder

11:56.360 --> 11:59.360
process, these graphs get quite large.

11:59.360 --> 12:04.360
On the right side, when you select a node, it shows various properties of it.

12:04.360 --> 12:10.360
And then on the left side, then you could see the methods of the graphs you've actually dumped out.

12:10.360 --> 12:16.360
Now, due to a little quirk, throughout the builder process, for any one graph,

12:16.360 --> 12:21.360
Native Image oftentimes serializes and deserializes that graph.

12:21.360 --> 12:26.360
So, even for one method, you'll actually see it listed multiple times on the left-hand side,

12:26.360 --> 12:30.360
which is just kind of a thing to keep in mind.

12:30.360 --> 12:36.360
Now, even though one method is listed multiple times,

12:36.360 --> 12:42.360
it's usually very easy to figure out which phase of the builder process each of these folders

12:42.360 --> 12:46.360
corresponds to, because you could just look at the optimization phases,

12:46.360 --> 12:50.360
which I won't go into now either, but it's here just as a reference.

12:50.360 --> 12:57.360
The one thing I will say is that this analysis strengthen graphs phase is quite important,

12:57.360 --> 13:02.360
and that's the phase where we're actually injecting all of our analysis results into the graph.

13:02.360 --> 13:07.360
So, that's a spot where a lot of changes are made to the graph.

13:07.360 --> 13:11.360
So, now we have these Graal graphs, and they're all-important.

13:11.360 --> 13:16.360
We also need to represent Java information in these Graal graphs,

13:16.360 --> 13:19.360
and we need a way to query the Java world.

13:19.360 --> 13:25.360
Now, the normal way to interact with the Java world in an application would be to use reflection.

13:25.360 --> 13:29.360
But we really don't want to do that for two reasons.

13:29.360 --> 13:33.360
One is we need additional information about the VM internals,

13:33.360 --> 13:38.360
and second, I guess, even more importantly, is we need a degree of separation.

13:38.360 --> 13:45.360
Because via reflection, what you're seeing is the Java world of the host JVM,

13:45.360 --> 13:50.360
whereas in these Graal graphs themselves,

13:50.360 --> 13:54.360
we're not creating the Java world, we're creating the substrate world.

13:54.360 --> 14:01.360
So, the solution that HotSpot provided for us is something called JVMCI,

14:01.360 --> 14:08.360
or Java-Level JVM Compiler Interface, and it was added to HotSpot as part of JEP 243.

14:08.360 --> 14:15.360
And it provides an API for querying the JVM-internal information we need.

14:15.360 --> 14:20.360
So, it kind of has a mapping to the reflection data,

14:20.360 --> 14:22.360
but it has different names.

14:22.360 --> 14:27.360
So, instead of having access to Class, Field, and Method,

14:27.360 --> 14:32.360
now we have the JVMCI semi-equivalents of ResolvedJavaType,

14:32.360 --> 14:36.360
ResolvedJavaField, and ResolvedJavaMethod.

14:36.360 --> 14:41.360
So, it's important to note that in these graphs,

14:41.360 --> 14:46.360
whenever we're referring to the Java world or creating the substrate world,

14:47.360 --> 14:51.360
we always have to refer to it via JVMCI objects.

14:51.360 --> 14:57.360
So, for instance, in a call, the target of the call will be a ResolvedJavaMethod.

14:57.360 --> 15:02.360
Likewise, when we're doing a field access and we refer to a field,

15:02.360 --> 15:07.360
it's going to be of type ResolvedJavaField.

15:07.360 --> 15:14.360
So, via this JVMCI, what we're doing is we're actually exposing our substrate world.

15:14.360 --> 15:19.360
And the way we do that is that Native Image has its own implementation

15:19.360 --> 15:22.360
of these JVMCI objects.

15:22.360 --> 15:27.360
In particular, when we first create the graph, what we're inserting into it

15:27.360 --> 15:32.360
are AnalysisType, AnalysisField, and AnalysisMethod objects.

15:32.360 --> 15:37.360
So, going back to this example here, this field here

15:37.360 --> 15:43.360
will have an object that will be of type AnalysisField.

15:43.360 --> 15:47.360
Later on, as part of the builder process, we actually swap out these analysis

15:47.360 --> 15:52.360
types, fields, and methods for hosted types, fields, and methods.

15:52.360 --> 15:56.360
And we do this once we've actually finalized the substrate world.

15:56.360 --> 16:01.360
So, these new hosted variants are also just another Native Image

16:01.360 --> 16:05.360
implementation of these JVMCI objects.

16:05.360 --> 16:12.360
But these new hosted variants of these JVMCI objects are created

16:12.360 --> 16:15.360
as part of the finalized substrate world stage.

16:15.360 --> 16:19.360
And they're inserted into graphs right before the compile stage,

16:19.360 --> 16:25.360
via this class here, the AnalysisToHostedGraphTransplanter.

16:25.360 --> 16:31.360
Now, JVMCI is very nice and it's easy to use,

16:31.360 --> 16:36.360
and we are creating our own objects.

16:36.360 --> 16:41.360
However, oftentimes our objects just serve as a wrapper

16:41.360 --> 16:46.360
and delegate to the host JVM's JVMCI objects.

16:46.360 --> 16:52.360
So, we heavily rely on using the host JVM to answer a lot of queries for us,

16:52.360 --> 16:56.360
including just, you know, general information, such as method

16:56.360 --> 17:01.360
signatures or method resolution, for instance, for a virtual call

17:01.360 --> 17:05.360
with a given receiver, what its target would be.

17:05.360 --> 17:09.360
It's nice to use the host JVM for this, simply because we don't

17:09.360 --> 17:13.360
then have to implement any of this logic ourselves.

17:13.360 --> 17:19.360
However, because we do have a wrapper,

17:19.360 --> 17:23.360
we could always modify callbacks to accurately reflect

17:23.360 --> 17:26.360
the substrate world we're creating.

17:26.360 --> 17:30.360
So, we're able to add and remove fields, and we also could change method

17:30.360 --> 17:35.360
implementations whenever we want to.
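
NOTE
Editor's sketch of the wrap-and-delegate pattern described above: a wrapper answers most queries by asking the
host object, but can change individual answers so they reflect the substrate world. Everything here is a
hypothetical stand-in. TypeInfo is NOT a JVMCI interface, and HostType and SubstrateType are not Native Image
classes; the real implementations wrap JVMCI objects such as ResolvedJavaType.
```java
import java.util.*;
// Hypothetical stand-ins only; see the note above.
final class WrapperSketch {
    interface TypeInfo {
        String name();
        List<String> fieldNames();
    }
    // Plays the role of the object provided by the host JVM.
    static final class HostType implements TypeInfo {
        private final String name;
        private final List<String> fields;
        HostType(String name, List<String> fields) {
            this.name = name;
            this.fields = fields;
        }
        public String name() { return name; }
        public List<String> fieldNames() { return fields; }
    }
    // Plays the role of the builder's own implementation of the same interface.
    static final class SubstrateType implements TypeInfo {
        private final TypeInfo host;
        private final Set<String> removedFields;
        private final List<String> addedFields;
        SubstrateType(TypeInfo host, Set<String> removedFields, List<String> addedFields) {
            this.host = host;
            this.removedFields = removedFields;
            this.addedFields = addedFields;
        }
        // Pure delegation: no need to re-implement what the host already knows.
        public String name() { return host.name(); }
        // Modified answer: start from the host's view, then remove and add fields.
        public List<String> fieldNames() {
            List<String> result = new ArrayList<>();
            for (String field : host.fieldNames()) {
                if (!removedFields.contains(field)) {
                    result.add(field);
                }
            }
            result.addAll(addedFields);
            return result;
        }
    }
    public static void main(String[] args) {
        TypeInfo host = new HostType("Example", List.of("a", "b"));
        TypeInfo substrate = new SubstrateType(host, Set.of("b"), List.of("c"));
        System.out.println(substrate.name() + " " + substrate.fieldNames());
    }
}
```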

17:35.360 --> 17:39.360
So, one other thing I should mention is, you know,

17:39.360 --> 17:43.360
I've talked about these analysis and hosted types,

17:43.360 --> 17:47.360
but also for the constants that we represent in these graphs,

17:47.360 --> 17:52.360
we also have to represent them via the proper JVMCI way.

17:52.360 --> 17:56.360
And the proper way in Graal to represent constants

17:56.360 --> 18:01.360
via JVMCI is through JavaConstant.

18:01.360 --> 18:06.360
Now, for Native Image, what we're doing is we actually wrap these

18:06.360 --> 18:10.360
normal Java constants with something called an ImageHeapConstant,

18:10.360 --> 18:13.360
which is kind of the same thing we're doing with regards to this analysis

18:13.360 --> 18:15.360
and hosted types.

18:15.360 --> 18:20.360
Normally, we then have this delegate and point to a host JVM value,

18:20.360 --> 18:23.360
but we don't strictly have to, and we've actually started to not

18:23.360 --> 18:25.360
in some situations now.

18:26.360 --> 18:30.360
So, kind of one thing I want to circle back on,

18:30.360 --> 18:33.360
is I mentioned before that the strengthen graphs phase

18:33.360 --> 18:37.360
is a piece of code I highly recommend looking at,

18:37.360 --> 18:41.360
because this is the spot where we apply all of our analysis results.

18:41.360 --> 18:44.360
Granted, later on the graph could be further optimized based on it,

18:44.360 --> 18:49.360
but this is the phase where we inject better information about the type system via

18:49.360 --> 18:52.360
stamps.

18:52.360 --> 18:55.360
Because through our analysis, we've been able to sometimes figure out

18:55.360 --> 18:57.360
more specific types that are going to be used,

18:57.360 --> 19:00.360
or other nullness information.

19:00.360 --> 19:04.360
We're able to fold logic checks,

19:04.360 --> 19:07.360
such as an is-null check, if we know an object can never be null.

19:07.360 --> 19:12.360
And also, in many cases, if we know a given virtual invoke or

19:12.360 --> 19:16.360
interface invoke actually only has one possible target,

19:16.360 --> 19:21.360
at this stage we can convert it to a direct call.
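
NOTE
Editor's sketch of the last point: if the analysis says every receiver type that can reach a call site resolves
to the same method, the call can be made direct. This is a toy model under stated assumptions. The class name,
the string labels, and the targetByType map are hypothetical and are not how the strengthen graphs phase is
implemented.
```java
import java.util.*;
// Toy model only: names and data are made up for illustration.
final class DevirtualizeSketch {
    // receiverTypes: the types the analysis found could reach the call site.
    // targetByType: for each type, the method a virtual call would resolve to.
    static Optional<String> singleTarget(Set<String> receiverTypes, Map<String, String> targetByType) {
        Set<String> targets = new TreeSet<>();
        for (String type : receiverTypes) {
            String target = targetByType.get(type);
            if (target != null) {
                targets.add(target);
            }
        }
        // Exactly one possible target means the call can become a direct call.
        return targets.size() == 1 ? Optional.of(targets.iterator().next()) : Optional.empty();
    }
    public static void main(String[] args) {
        Map<String, String> targetByType = Map.of(
            "Circle", "Circle.area",
            "Square", "Square.area");
        System.out.println(singleTarget(Set.of("Circle"), targetByType));
        System.out.println(singleTarget(Set.of("Circle", "Square"), targetByType));
    }
}
```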

19:21.360 --> 19:26.360
So, yeah, with that, once again, these are the goals I listed that,

19:26.360 --> 19:30.360
hopefully you have a better understanding of after this talk.

19:30.360 --> 19:35.360
In regards to answering them, in terms of what I kind of view as the four

19:35.360 --> 19:38.360
stages of the builder, they are analysis,

19:38.360 --> 19:44.360
finalize substrate world, compile, and prepare executable.

19:44.360 --> 19:49.360
It's very worthwhile to use IGV and look at Graal graphs,

19:49.360 --> 19:53.360
because that's kind of the main currency of what native image builder

19:53.360 --> 19:55.360
is working with.

19:55.360 --> 19:59.360
And then within these Graal graphs, the way we're representing the

19:59.360 --> 20:04.360
substrate world we're creating is representing it via

20:04.360 --> 20:09.360
JVMCI and our analysis or hosted types and objects.

20:09.360 --> 20:14.360
One good read that gives some more information about

20:14.360 --> 20:19.360
our analysis types and hosted types and how they relate to the host

20:19.360 --> 20:23.360
JVM is the Javadoc within HostedUniverse.

20:23.360 --> 20:28.360
So, I'd encourage you to read that later if you're interested to learn more.

20:28.360 --> 20:33.360
And finally, within our JVMCI implementation,

20:33.360 --> 20:38.360
we oftentimes just use the host JVM objects to answer a lot

20:38.360 --> 20:40.360
of our queries.

20:40.360 --> 20:43.360
But we're able to, because we have our own objects,

20:43.360 --> 20:47.360
modify the results to accurately model our substrate world.

20:47.360 --> 20:51.360
And yeah, with that, I'm done.

20:51.360 --> 20:55.360
Thank you for listening and, oh,

20:55.360 --> 20:59.360
and yeah, I'll take any questions now if you have any.

20:59.360 --> 21:11.360
So, yeah, any questions?

21:11.360 --> 21:14.360
Yes.

21:14.360 --> 21:28.360
So, the question is, how does the host world relate to the substrate world

21:28.360 --> 21:33.360
when you use it as a shared library and create an isolate?

21:33.360 --> 21:36.360
You've introduced many concepts there, right?

21:36.360 --> 21:39.360
So, for people who don't know,

21:39.360 --> 21:43.360
the notion of an isolate is that from within a native image,

21:43.360 --> 21:48.360
you could actually have multiple instances running in the same process.

21:48.360 --> 21:50.360
And that's called an isolate.

21:50.360 --> 21:53.360
The shared library doesn't really affect it;

21:53.360 --> 21:56.360
whether it's an executable or a shared library, it's the same way.

21:56.360 --> 22:00.360
But the main difference between the host world and the substrate world

22:00.360 --> 22:05.360
is that, usually, we could prune out methods or fields

22:05.360 --> 22:08.360
we don't actually access as part of execution.

22:08.360 --> 22:12.360
Also, for our substitutions, we have to make a lot of changes.

22:12.360 --> 22:16.360
We try not to make too many, but that's the main kind of thing

22:16.360 --> 22:23.360
that differs between the substrate world and the host JVM objects.

22:24.360 --> 22:27.360
Any other questions?

22:27.360 --> 22:29.360
Nope.

22:29.360 --> 22:32.360
All right. Well, thank you.