WEBVTT

00:00.000 --> 00:07.760
Well, thank you all, thank you everybody for coming.

00:07.760 --> 00:13.200
This is a talk about Whippet, which is a garbage collector library I've been working

00:13.200 --> 00:22.200
on as a potential and hopeful replacement for the garbage collector library used in Guile.

00:22.200 --> 00:28.960
And in this talk, I have one big part, and one medium part, and one tiny part.

00:29.040 --> 00:35.120
The big part is the big idea, where I try to explain what's going on with this thing.

00:35.120 --> 00:41.440
And then we're going to look a bit at what we win, effectively, like what changes in

00:41.440 --> 00:47.760
Guile and potentially other systems by switching to Whippet, and then some forward-looking statements.

00:48.480 --> 00:55.200
So, starting off with the big idea. It's kind of like I said in the title: Whippet is a

00:55.280 --> 01:02.720
practical memory management upgrade, a memory manager that's an upgrade, for Guile and beyond.

01:02.720 --> 01:07.040
And we're going to break this down and kind of keep repeating it and looking into the individual

01:07.040 --> 01:14.560
parts. And so, first of all, it is a memory manager, meaning it is a garbage collector,

01:14.560 --> 01:19.840
it is something that takes care of allocating and reclaiming memory in your program.

01:20.160 --> 01:26.640
And ideally, it preserves the property that you never reference memory that has been reclaimed.

01:27.440 --> 01:33.120
You eliminate use-after-free bugs by using a garbage collection system, and it efficiently reclaims memory,

01:33.120 --> 01:39.680
so it keeps the system fast. But before we go on, I'd like to give a little bit of texture.

01:39.680 --> 01:44.320
Like when you go into a store and you want to touch something and see how it feels, you know.

01:45.200 --> 01:49.840
We're going to touch the API a little bit and see what it means to embed

01:49.840 --> 01:53.840
Whippet into our programs. So, I just have, like, three slides here. This is a minimal

01:55.280 --> 02:00.320
get-up-and-running with Whippet. It's a tiny C library that goes in your source tree.

02:00.320 --> 02:06.480
It's not something you dynamically link to. There's, sort of, we declare some general

02:07.440 --> 02:15.280
parameters and pass them to the gc_init function. But notably, I want to say that we declare a heap type,

02:15.280 --> 02:20.080
which is opaque to the user, and a mutator type. Every thread has a mutator. Every

02:20.640 --> 02:24.560
part of your program that allocates has a pointer to a mutator. And when you create a new thread,

02:24.560 --> 02:32.240
you create a new mutator from your heap. And there's an opaque set of options that

02:32.240 --> 02:37.840
actually parses parameters for the GC from like a command line or an environment variable or

02:37.840 --> 02:45.200
something like this. And we can also collect statistics: there's a set of callbacks that the

02:45.200 --> 02:49.840
garbage collector will invoke when it starts a collection, so you can do histograms and things like this.

02:50.480 --> 02:57.760
So once you've called gc_init, you've made your heap. You now have an initialized heap

02:57.840 --> 03:02.480
and mutator, and then you can allocate, and you allocate by passing in the mutator with the

03:02.480 --> 03:10.480
size. And of course, it returns a void star, because it's C. Here are a couple of the deeper options.

03:10.480 --> 03:15.040
For example, here we can allocate some options and then set some values. In this case,

03:15.040 --> 03:19.760
we'll just set some things from the environment. This lets you control, for example, the parallelism of

03:19.760 --> 03:26.080
the collector, or the heap size. The growth policy: you can have it fixed, or you can allow it

03:26.160 --> 03:32.720
to grow, or you can have it dynamically increase when your mutators allocate more, and then try to shrink

03:32.720 --> 03:42.160
the heap when it's reached a steady state. The embedder actually provides a definition of how you

03:42.160 --> 03:47.920
enumerate the edges that point into the graph of live objects. So this struct gc_mutator_roots

03:47.920 --> 03:53.200
is actually provided by the embedder, meaning Guile in this case, instead of Whippet. And

03:53.200 --> 03:58.880
it's Guile that attaches roots to a particular mutator, and then the collector will be able to

03:58.880 --> 04:02.720
trace those. We'll see that in a minute. And additionally, if you have a generational collector

04:02.720 --> 04:07.600
configuration, you need to have write barriers. Write barriers are tiny bits of code that run

04:07.600 --> 04:12.880
when you mutate a field in an object, and that help the garbage collector keep its internal accounting:

04:12.880 --> 04:17.840
for example, if it's trying to partition objects into two sets and you mutate one of the

04:17.840 --> 04:26.240
edges, it might need to move an object from one set into another. Whippet is generally

04:26.240 --> 04:33.360
a cooperative safepoint system, so the mutator has to cooperate: every allocation is a potential

04:33.360 --> 04:38.560
garbage collection point, but if you go through a long period without allocating, you might need

04:38.560 --> 04:43.920
to emit a safepoint. And all of these have, like, fast paths and slow paths, and the fast

04:43.920 --> 04:49.360
paths can be inlined and such. And for some collectors, you might need to, like, pin an object.

04:49.360 --> 04:54.720
That's the general shape of the API. And then on the embedder side, which is typically not something

04:54.720 --> 05:00.320
users have to do so much, but for each embedding, you need to implement the

05:00.320 --> 05:08.960
hooks into how to trace the graph, essentially. So if you get an object, how do you trace it?

05:09.840 --> 05:14.640
The embedder implements a gc_trace_object function, only one of them for the whole program,

05:14.640 --> 05:17.600
that needs to be able to trace any kind of object, so you need to be able to

05:18.160 --> 05:23.360
introspect somehow on this object, and that is up to the host. The garbage collector does not

05:24.880 --> 05:28.960
impose any restriction or requirement on the representation of objects; that is

05:28.960 --> 05:34.400
purely an embedder concern. And you call a visit function for each field, essentially,

05:34.480 --> 05:38.800
passing some data, the usual thing. Same for how you trace and mutate the roots.

05:38.800 --> 05:43.440
And this leaves it completely up to the embedder as to whether you have handles, stack maps,

05:44.480 --> 05:51.120
what your strategy is for tracking roots, and making it possible for the collector to

05:52.320 --> 05:57.440
enumerate all the edges into the graph. That's all the code that I'm actually going to show in this

05:58.320 --> 06:05.040
presentation. So: it is a memory manager, you've got a general feel for it, it slots into your source

06:05.040 --> 06:11.760
tree, and I'm designing it with a particular use case in mind, which is Guile. Guile is very old,

06:11.760 --> 06:20.000
it's, like, what, 33? Could be 40 years old depending on how you count it. And it has a garbage

06:20.000 --> 06:26.400
collector that is the Boehm-Demers-Weiser garbage collector, a conservative garbage collector; it

06:26.480 --> 06:33.280
works very well, but it's old, and there are better things to do now. And there are things we

06:33.280 --> 06:38.240
would like that the Boehm collector does not give us. We would like to be able to do allocation

06:38.240 --> 06:42.480
with, like, bump-pointer allocation instead of freelist allocation. We would like to have some features

06:42.480 --> 06:48.240
that are impossible to implement in the Boehm collector, like ephemerons. We would like to have

06:48.240 --> 06:53.280
heap growth and shrinking. We would like to have more control over the overall size of the heap,

06:53.280 --> 07:01.600
more visibility into the dynamics of a program and its heap usage,

07:01.600 --> 07:07.840
and to be able to control things a bit better. And we would also like to be able to

07:09.200 --> 07:14.880
experiment with different collector algorithms, actually. There are a few different algorithms out there

07:14.880 --> 07:21.760
and a few different ways you can compose spaces. And there's no global optimum here. And so

07:21.760 --> 07:27.200
it may be that an embedder needs to choose a particular collector for a particular workload. And that's

07:27.200 --> 07:31.600
that's how it should be. And in Whippet, this choice is made at compile time, because we want compile-

07:31.600 --> 07:36.240
time specialization of the collector to the embedder and of the embedder to the particular collector

07:36.240 --> 07:42.160
configuration. It's not a dynamic thing like in Java; that would essentially require

07:42.160 --> 07:47.680
JIT compilation for good performance, and have more warm-up and such. This is a different point

07:47.680 --> 07:53.600
in the design space. So, as for Guile: we would like all of these things, but we have

07:53.600 --> 07:59.840
to be able to get there from where we are. And where we are is a funny place. We still have

07:59.840 --> 08:07.520
a fair amount of C code. We still have a lot of, effectively, conservative roots: we don't explicitly register each

08:07.520 --> 08:11.600
root, each reference from, like, a local variable that points to a garbage-collected object. We

08:11.600 --> 08:17.440
don't put them anywhere; they're just on the stack. And we rely on the Boehm collector to essentially

08:17.440 --> 08:23.280
look at each word on the stack and see if it might point to an object. We actually support this

08:23.280 --> 08:30.480
use case, in an effort to provide a path from where we are to where we would like to be. And

08:30.480 --> 08:35.760
we're going to be evaluating, like, whether this is a useful strategy to keep or whether we should

08:35.760 --> 08:41.040
move off to what's called precise rooting, which is a possibility; but more

08:41.120 --> 08:47.840
about that later. So the idea is that Whippet is like a load-bearing abstraction.

08:48.640 --> 08:55.600
The load-bearing part of this abstraction is the API. And the fact that it's

08:55.600 --> 09:03.200
supported at this load-bearing point allows us to pivot. So if in Guile we switch to Whippet,

09:03.920 --> 09:10.080
we can start with behavior that's very close to what the current collector does. And over time,

09:11.040 --> 09:17.120
we can change things to enable different configurations and more performance. So the first

09:17.120 --> 09:24.160
collector that we're going to try in Guile after the Boehm collector, the one that I would like to try,

09:24.160 --> 09:30.080
is this MMC collector, because it supports conservative roots from the stack and conservative

09:30.080 --> 09:35.840
roots from global data sections, and optionally even conservative edges between objects.

09:36.240 --> 09:45.760
And so in the Whippet library, there are a few collectors; collectors are configurations of spaces.

09:47.040 --> 09:52.400
There are three main collectors, like three main collector variants. One of them is the MMC collector,

09:52.960 --> 09:59.840
the mostly-marking collector. It mostly marks objects; sometimes it can copy and evacuate objects.

09:59.920 --> 10:03.040
So that's where the name comes from. And it's composed of two spaces.

10:04.800 --> 10:09.360
There's the space we call the nofl space, and the large object space, or lospace.

10:10.320 --> 10:15.680
And the nofl space, this is the one that, if you were here a couple of years ago when I presented

10:15.680 --> 10:23.680
first about Whippet, I was very excited about, because this Immix-like design allows for

10:24.080 --> 10:29.680
improved performance and bump-pointer allocation and optional evacuation and optional

10:29.680 --> 10:34.720
conservative roots. And I was just so excited, and made this prototype. And at that time Whippet was

10:34.720 --> 10:42.720
essentially just this space. And in the intervening two years, Whippet has become more of a collection,

10:42.720 --> 10:51.520
or a family, of particular collectors, as an embeddable library. That's the essential

10:51.600 --> 10:55.200
difference relative to a couple of years ago besides performance and bugs and features and stuff.

10:55.200 --> 11:00.800
Like we've gone really from prototype to something that is pretty much ready right now.

11:00.800 --> 11:07.200
And the nofl space has some memory overhead, because it records a metadata byte

11:07.200 --> 11:16.160
per granule, and a granule is 16 bytes. So, is that 12 percent or 6 percent?

11:16.240 --> 11:24.320
It's 6 percent, isn't it? What is it actually?

11:27.200 --> 11:34.800
Yeah, right, it's 6 percent. It supports pinning, which is, well, pinning permanently,

11:34.800 --> 11:41.520
maybe, because you really need this object not to move; you can't be sure that the references to it

11:41.680 --> 11:48.880
will allow moving, for whatever reason. And it can also pin roots, preventing them from moving.

11:48.880 --> 11:54.240
The objects that conservative roots refer to, the ones where you might not know whether they actually refer to an object

11:54.240 --> 12:00.480
or not: those objects can't move, because we're not sure if the incoming edge is relocatable or not.

12:00.480 --> 12:04.320
But then, if you can precisely trace the intra-heap edges, you can move everything else,

12:04.320 --> 12:09.520
which is a pretty interesting possibility. And it's also optionally generational, using what's

12:09.600 --> 12:14.720
called a sticky mark-bit algorithm. And then, larger objects are never moved.

12:14.720 --> 12:20.160
They're allocated with mmap, effectively. And when they get freed, they go on a free list;

12:20.160 --> 12:25.360
after a couple of seconds, if a free-list entry hasn't been reused, it gets returned to

12:25.360 --> 12:32.080
the OS. So we're trying to minimize virtual memory traffic here. And it's optionally

12:32.080 --> 12:36.560
generational as well. That's probably the space that's going to be the main one

12:36.560 --> 12:41.600
for Guile, although we'll see. And Whippet is an upgrade, I think, not just for Guile

12:41.600 --> 12:47.600
itself, but also potentially for other systems. So I'm building it with Guile in mind,

12:47.600 --> 12:52.800
but I also want to target other small languages, which, I think, is relevant for some folks in here:

12:54.960 --> 13:00.320
when you build a system, you've got a lot to build. You have the compiler, and you want to build

13:00.400 --> 13:06.640
the GC, I mean, you want to build everything, right? I'm speaking from projection, right?

13:08.080 --> 13:11.520
But you don't have so much time. And so it would be nice to be able to just include

13:12.560 --> 13:18.080
something that you know works, and that stretches farther toward the state of the art than

13:18.080 --> 13:23.600
something like the Boehm collector. And I think we're going to be able to do this,

13:25.520 --> 13:30.080
especially since new systems often have precise roots. And so one of the possibilities

13:30.160 --> 13:34.640
that Whippet gives you is a fully copying collector, a fully moving collector. So this is

13:34.640 --> 13:38.800
another new one relative to a couple of years ago. This is a parallel copying collector.

13:39.920 --> 13:45.280
It has a large object space, managed just as before. And then the copy space is a block-

13:45.280 --> 13:52.080
structured space. It's stop-the-world, but highly parallel, and highly parallel for mutators as well.

13:52.080 --> 13:57.760
When a mutator needs to allocate more memory, it does bump-pointer allocation in, I think

13:57.760 --> 14:03.520
they're 128-kilobyte blocks, and when it runs out, it gets a new one. It's all lock-free, so it

14:03.520 --> 14:10.800
scales very well. And it's always compacting. And of course, it has 100% overhead, because

14:10.800 --> 14:17.840
it's a copying collector: when you copy, you need to reserve the space to copy into. We also have a

14:17.840 --> 14:21.760
generational configuration here. So instead of having a copy space and a large object space,

14:21.760 --> 14:27.600
we have a copy space and another copy space and a large object space. But because it's generational,

14:27.600 --> 14:31.920
you can limit the size of the nursery, and you can limit it ahead of time. And so you can

14:33.120 --> 14:38.560
do a number of tricks to make it cheap to test whether an object is in the nursery or

14:38.560 --> 14:45.040
the old generation. This needs to be tuned; it's a relatively recent thing. And tuning

14:46.080 --> 14:54.400
generational GCs is very tricky. But what it does have, and this is the same for

14:54.400 --> 14:58.560
the generational configuration of the mostly-marking collector, is a very precise

15:00.000 --> 15:05.760
write barrier. So it precisely records fields that point into the new generation. It's not

15:07.440 --> 15:12.880
a barrier that marks any object within a block, for example; it's really precise fields.

15:13.680 --> 15:21.120
A field-logging write barrier, it's called. Yep, very good. In this collector, objects stay

15:21.120 --> 15:28.720
around in the nursery for one cycle, and then they get promoted up if they survive. And finally,

15:28.720 --> 15:33.760
the third major collector that is in Whippet is the Whippet API, but with the Boehm

15:33.760 --> 15:42.240
collector behind it. And this mostly works, well, I mean, it works, but it has a difference relative

15:42.240 --> 15:48.080
to the other collector configurations, in that it's not cooperative. If you're familiar with the Boehm

15:48.160 --> 15:55.840
collector, it's an amazing hack, but it stops the world by sending signals to threads. So you

15:55.840 --> 16:03.200
can be stopped anywhere. Your embedder has to be a little bit more ready with regard to being able to

16:03.200 --> 16:13.200
trace its roots at any point. But the other nice thing with the Boehm collector is you don't necessarily

16:13.200 --> 16:17.920
need to implement gc_trace_object there, because we don't actually call that. We allow

16:17.920 --> 16:25.360
the Boehm collector to conservatively trace all edges there. Right, all right, um, practical memory

16:25.360 --> 16:30.080
management upgrade for Guile and beyond. So, practical: on that side, I say it's for embedding, I say

16:30.080 --> 16:38.240
it's not just for Guile. And how do you test this? I wanted it to have these

16:38.240 --> 16:42.160
properties: that it's embed-only, it's not something you link to, you include it in your source

16:43.040 --> 16:48.000
tree, and it has no dependencies. It's C, for better and for worse. I think we all know the

16:48.880 --> 16:53.520
advantages and drawbacks there, and it's something you can hack on. And the thing I've been

16:53.520 --> 16:59.920
working on to test this, I don't know, some of you are going to find it funny:

17:00.640 --> 17:07.760
it's another Scheme implementation. I wrote a special Scheme implementation. It's not a great one.

17:07.840 --> 17:14.160
It just compiles to C, and its whole purpose is to test Whippet in its different configurations.

17:14.160 --> 17:19.760
Because when you have a garbage collector like this, you don't actually

17:19.760 --> 17:24.640
want to write, um, the embedder shouldn't really be C programs. It should be a language

17:24.640 --> 17:30.160
implementation, and its compiler and runtime should ensure the property that you enumerate all

17:30.160 --> 17:35.600
the roots, like, handle all your stack maps and stuff. And what I had before

17:35.680 --> 17:41.520
was, like, manually written C programs that, uh, were the micro-benchmarks, and Whiffle allows

17:41.520 --> 17:46.320
me to write Scheme programs that are the benchmarks. Still micro-benchmarks,

17:46.320 --> 17:52.320
it's kind of where we are, but, um, I also wanted to test, like, explicit handle registration

17:52.320 --> 17:57.760
versus stack maps and things like this. Um, the motivation was testing, and, like I say, it's

17:57.840 --> 18:05.440
got many, many bugs. Um, so, for Guile itself. Here's the plan, as I've presented

18:05.440 --> 18:13.600
it, right? The plan is, uh, first, switch Guile to use the Whippet API, right?

18:13.600 --> 18:19.680
Everywhere you would call the Boehm collector, instead call the Whippet API, and thread the

18:19.680 --> 18:25.280
mutator through everywhere that you need it, that sort of thing. Um, but use the Boehm

18:25.360 --> 18:29.760
collector behind it, so we're not actually changing behavior. And then, let's switch over to

18:29.760 --> 18:35.920
the mostly marking collector with conservative roots. Maybe even conservatively tracing the heap,

18:35.920 --> 18:41.840
we'll see. Um, ideally you would want to implement gc_trace_object so that we get compaction and

18:41.840 --> 18:47.760
these benefits. And then, uh, potentially add support for generational collection. We'll see what

18:47.760 --> 18:54.160
this buys us, uh, in a bit, but it requires write barriers, so we have to be careful

18:54.240 --> 18:59.840
about this. Uh, existing Guile users don't have these write barriers in their code. There's

18:59.840 --> 19:05.200
some risk here. And then maybe, you know, maybe we can switch to, uh, the mostly-marking nofl

19:05.200 --> 19:09.680
space as the old generation and then a copying, uh, nursery, which would be a very conventional

19:10.240 --> 19:16.640
and more or less state-of-the-art configuration. Um, and I, I should mention here, uh, I've been

19:16.640 --> 19:23.040
working on this over the past few months, along with, uh, my Spritely work. Um, and my work on

19:23.040 --> 19:26.800
Whippet has been sponsored by them, and I appreciate that very much. Thank you.

19:26.800 --> 19:35.840
Um, there's more, but yeah, great. Uh, all right. So, and it's for Guile and beyond:

19:35.840 --> 19:41.040
I have a few more things I want to do, like WebAssembly, you see, but, like, WebAssembly with GC, and

19:41.040 --> 19:46.080
you embed this as, uh, as its garbage collector, so that way we can actually run standalone

19:47.120 --> 19:52.080
programs compiled from Scheme to WebAssembly. Uh, and I'm, I'm also targeting other

19:52.160 --> 19:58.160
things like that. Right. So what do we get? Right? I don't know if it's top line or bottom line,

19:58.160 --> 20:02.800
I'm not sure how people do accounting, but, um, what we can expect more or less is for a given

20:02.800 --> 20:10.080
memory size, uh, we can improve throughput, um, could be 20%, you know, maybe even 40;

20:10.080 --> 20:14.080
sometimes it's actually quite a lot, and sometimes it's really just a bit, you know. That's the general

20:14.080 --> 20:20.720
range. And then also, uh, we can end up with systems that use less memory, right? Um,

20:20.880 --> 20:25.360
that is, at a given throughput, you can use less memory. Let's see, it's going to be a bunch of graphs,

20:25.360 --> 20:31.440
I'm going to go through them faster than, you know, is recommended, but here we go. All these graphs

20:31.440 --> 20:35.680
are space-time graphs, right? This is how you evaluate a GC, because GC is a

20:35.680 --> 20:40.480
fundamental space-time trade-off. Uh, the more heap you give it, the less time it takes,

20:40.480 --> 20:47.040
essentially. And so you expect to see a curve like this. This is a heap-size multiplier

20:47.120 --> 20:52.320
on the x-axis. As you get towards, like, a one-times live-data size, then you're going

20:52.320 --> 20:55.920
to be collecting all the time, because you have very little space to work in. And as you have more space,

20:55.920 --> 21:01.120
then the collector can do better. Uh, the green line is the mostly-marking collector,

21:01.120 --> 21:06.400
for, um, this one particular benchmark; it's an old standard, uh, Gabriel benchmark. Um,

21:06.400 --> 21:11.120
the blue is the Boehm collector, and then the orange is the parallel copying collector, and they're

21:11.440 --> 21:17.040
a little bit compressed here, um, because we see a couple of things. One, uh, the mostly-marking

21:17.040 --> 21:21.200
collector allows us to access these smaller heap sizes that we cannot get at with the other collectors,

21:21.200 --> 21:26.240
right? Nobody else, they all fail at this size. This is setting a fixed heap size.

21:26.800 --> 21:34.080
Two, we do better than the Boehm collector at every point here,

21:35.040 --> 21:41.360
and the Boehm collector is the blue line; lower is better. And then, uh, if you switch to another

21:41.360 --> 21:45.760
garbage collection algorithm, like a copying collector, which has different performance characteristics,

21:46.720 --> 21:52.160
it beats the mostly-marking collector at larger heap sizes, which is what we expect.

21:52.160 --> 21:56.400
So if you have a workload where you need maximum throughput, and you have a lot of memory,

21:56.960 --> 22:00.720
then you might want to configure your collector to use the parallel copying collector, for example.

22:01.520 --> 22:08.240
And we see similar things in, like, uh, this other test here, a different benchmark. Um,

22:08.240 --> 22:14.960
again, the mostly-marking collector is doing, you know, pretty good; looks like we actually

22:14.960 --> 22:20.960
cross here with the Boehm collector, it seems. Um, and, uh, all those tests before were with one

22:20.960 --> 22:28.080
mutator. Uh, this is on my laptop with, uh, eight cores, 16 threads. Uh, if you have

22:29.040 --> 22:34.800
eight threads, then you see a similar graph, and, you know, we have a pretty good, uh,

22:34.800 --> 22:39.760
advantage here; it's between four and six seconds here, uh, for the Boehm collector versus MMC,

22:41.520 --> 22:48.080
there in this sort of tail of that graph. Um, and some of the things to think about:

22:48.800 --> 22:52.880
right, uh, you might want to ask some questions, right, because, you know, there are a lot of narratives

22:52.880 --> 22:57.760
about garbage collection, about, you know, what is actually good. You know, people like statements,

22:58.640 --> 23:01.840
and you don't know if it's marketing or not. And so another one of the goals of Whippet

23:01.840 --> 23:07.520
is to be able to use the same system in different configurations and see, um, how things

23:07.520 --> 23:13.920
perform. Like, uh, I have found two things. One, conservative root-finding is quite fine, right?

23:13.920 --> 23:20.800
It's not actually a problem in these micro-benchmarks. Um, and two, uh, generational GC is very

23:20.800 --> 23:26.640
complicated. It can decrease throughput with my current tunings, which might not be optimal. They certainly

23:26.720 --> 23:31.360
reduce pause times, which I'm not going to show, uh, but, you know, those are the two things.

23:31.360 --> 23:36.000
Okay. So this is, first, um, three different configurations of the mostly-marking collector,

23:36.720 --> 23:40.960
uh, on this one micro-benchmark; that's, I think, with eight mutators, and this one's probably about,

23:42.240 --> 23:46.880
uh, a gigabyte heap or something like that. Um, as you can see, uh,

23:48.720 --> 23:55.920
all three mostly follow the same shape. Uh, the blue here on the top, the dark blue, is a configuration

23:56.000 --> 24:02.640
in which all edges are conservative. The lighter blue is a configuration in which

24:03.360 --> 24:08.960
edges into the graph from the stack are conservative, but, uh, edges within the heap graph are traced

24:08.960 --> 24:13.840
precisely. And then the red here on the bottom is the precise one. It's almost indifferent

24:13.840 --> 24:21.280
between the stack-conservative versus stack-precise, uh, configurations. And there is a difference

24:21.360 --> 24:27.280
with the fully conservative one: it's less good. It's not a huge difference, right?

24:27.280 --> 24:32.480
So it is something that you can accept, if, engineering-wise, that's the configuration you need.

24:34.080 --> 24:40.400
Same graph, different test. Um, generational collectors: we're not winning. We're not winning currently.

24:42.240 --> 24:49.600
Uh, here is my whole-heap collector, and for some reason, my generational configuration

24:49.680 --> 24:55.840
of the mostly-marking collector is not as good in terms of throughput. This can be a

24:55.840 --> 25:02.240
plausible result, um, but it's a little bit perplexing in some ways. So there's

25:02.240 --> 25:06.720
still some investigation to do here. And I have a similar difference for the generational

25:06.720 --> 25:12.080
copying collector, um, which has different characteristics, because for the mostly-marking collector,

25:12.080 --> 25:17.040
the nursery is the entire heap, versus, for the copying collector, the nursery is just

25:17.120 --> 25:22.880
two megabytes per active mutator. Um, so there's more investigation to do here. It's a little bit,

25:24.560 --> 25:30.160
I, I don't fully understand these things. Um, but the conclusion I'm taking is that

25:30.720 --> 25:35.040
generational GC is a little bit complicated, which is, I

25:35.040 --> 25:40.960
think, uh, mostly well known. Finally, the future. Um, I'm finally going to land it in Guile. I think it's

25:41.040 --> 25:46.960
about this month or so that I'm going to start attacking that. Um, and maybe, you know, maybe, um,

25:46.960 --> 25:51.280
another language runtime, you know; we'll see, uh, how we're doing.

25:52.080 --> 25:57.520
Eventually, I would like to do concurrent marking, uh, so that you have tiny pause times,

25:57.520 --> 26:03.200
sub-millisecond, for the minor GCs. And then when it comes time for the major GC,

26:03.200 --> 26:08.400
you've already marked most of the graph concurrently, and so you can minimize that pause time.

26:08.880 --> 26:14.320
Um, and then there are other, um, things I would like to investigate, like this, uh,

26:14.320 --> 26:18.480
algorithm called LXR that uses, uh, some reference counting on the old generation, for

26:18.480 --> 26:25.120
prompt reclamation of old-generation objects. Um, yeah, that's it. So, there we are. And,

26:25.120 --> 26:28.400
right, that's about it. Okay. Thank you.

26:33.280 --> 26:36.720
Boy, we went straight into that, didn't we. Yeah. Thanks for, thanks for coming along.

26:37.600 --> 26:38.560
Any, any questions?

26:42.720 --> 26:43.200
Say what?

26:46.320 --> 26:47.040
What pointers?

26:48.640 --> 26:50.800
Colored pointers. What is a colored pointer?

26:50.800 --> 27:02.400
Oh, yeah. We are not planning to use colored pointers, because I don't want to impose

27:02.480 --> 27:07.520
that kind of requirement on, uh, on the, on the collector. Uh, sorry, on the, on the

27:07.520 --> 27:11.440
embedder. It's an interesting possibility, but it's, it's not within scope. Yeah.

27:17.360 --> 27:18.640
Another one. Dave?

27:18.640 --> 27:25.680
Yeah, so, uh, I know we're near the hard stop, but I'm interested in, uh, worst-case pause times.

27:25.760 --> 27:31.200
Did you measure how Whippet, uh, performs for some real-time applications, video, audio?

27:32.400 --> 27:38.160
See me later today. Okay. All right. Everyone, please squeeze in. We're going to have a full house.

27:42.000 --> 27:43.280
In the middle there, there's a question.

27:55.920 --> 28:10.560
If you're writing in Rust, you should use MMTk instead of Whippet. And if you're writing in

28:10.560 --> 28:16.560
Zig, I don't, I don't know what facilities exist for, for this sort of C source inclusion. It's

28:16.560 --> 28:21.440
really not appropriate, I would say. Yeah. Sorry, the question is whether Rust and Zig should

28:21.440 --> 28:27.040
use Whippet, and the answer is no. In Rust specifically, MMTk is a good option, and in Zig...

