WEBVTT

00:00.000 --> 00:13.240
Hello everyone, welcome. My name is Peter, I'm a computer scientist and software engineer from

00:13.240 --> 00:19.160
Norway. You have my handles and my website, and my Mastodon handle, over on the right

00:19.160 --> 00:26.000
side. If you want to get in touch, that's my nickname on pretty much all of the various

00:26.000 --> 00:34.000
platforms where you can find me. What I'm going to talk about today is Nim and C, and I have

00:34.000 --> 00:41.480
subtitled the talk, reaching the stars by standing on the shoulders of giants. So there are

00:41.480 --> 00:48.880
a lot of languages created every day and just a quick show of hands, like how many here

00:48.880 --> 00:56.960
have tried their hand at making their own language? Okay, that's a decent chunk.

00:56.960 --> 01:02.320
So you can create your own language and you can have all of these nice features, but as soon

01:02.320 --> 01:08.320
as you get one user, you get tricky questions like, how do I calculate an MD5 sum, or,

01:08.320 --> 01:13.800
hopefully, a SHA-256 one? How do I download files from the internet? How do I do this?

01:13.800 --> 01:19.040
How do I do that? At that point, you need to start looking into creating libraries and obviously

01:19.040 --> 01:26.040
you don't want to create them all yourself. A lot of languages, in order to bootstrap

01:26.040 --> 01:32.720
the environment of libraries and packages, will allow you to use libraries from a different

01:32.720 --> 01:40.680
language. So for example, Clojure and F# will allow you to use Java and C# packages

01:40.760 --> 01:48.040
because they're built on the respective virtual machines. TypeScript, ClojureScript, they

01:48.040 --> 01:56.760
both compile, or transpile, down to JavaScript, so they will allow you to use JavaScript

01:56.760 --> 02:04.520
functions. And Zig and Nim and a lot of others, including probably all of these, will also allow

02:04.600 --> 02:13.160
you to use C libraries. But just quickly before we get going, what is Nim?

02:15.240 --> 02:21.960
We got a small introduction, but Nim is a compiled language that creates tiny binaries that you can

02:21.960 --> 02:28.440
just send through an email and the other person can run them on their system. It's a statically

02:28.520 --> 02:35.240
typed language, so all type information has to be known at compile time, but it has type inference,

02:35.240 --> 02:41.720
so for example, for sum and count, Nim knows that those are integers. Same for the line in the

02:41.720 --> 02:48.440
for statement: the lines iterator only ever returns strings, so line has to be a string.

02:50.040 --> 02:56.280
It also has a lot of clever types, so you can have, for example, an ID stored as a distinct

02:56.280 --> 03:03.960
integer, so you can't take an ID and multiply it by two, and you can write a procedure that only

03:03.960 --> 03:09.160
takes IDs and won't accept other integers, even though they are stored as integers.

03:11.000 --> 03:19.880
Also, importantly for this project, it has a flexible macro system. So Nim macros take a piece of code

03:20.760 --> 03:25.080
and output a piece of code. So you get an abstract syntax tree,

03:25.080 --> 03:31.800
parsed from the code you gave it, and you can return a new tree doing whatever you want the code to do.

03:34.280 --> 03:40.040
And this is a quote, or I thought it was a quote, by the creator of Nim: the speed of C, the ease

03:40.040 --> 03:46.120
of Python, and the flexibility of Perl. I've talked to him since and he says he never said it.

03:47.080 --> 03:55.960
And now I've been quoted as saying that, so yeah. But yeah, for this project, it's important to know

03:55.960 --> 04:03.000
that Nim sort of compiles to C. So as you saw, Nim will create C code, that C code goes into the C compiler

04:03.960 --> 04:10.680
and out comes a binary. This of course means that we get a very easy foreign function interface:

04:11.000 --> 04:17.160
the only thing that Nim needs to do to allow you to call a C function is just to put that C function

04:17.160 --> 04:27.080
into the C file it creates. Nim also has hookable automatic memory management. So it has a GC,

04:27.080 --> 04:34.280
you can enable a GC, but by default it uses an automatic memory management system, which isn't technically

04:34.360 --> 04:42.040
a GC. It's not stop-the-world, it doesn't have a lot of those issues, and it's hookable. So if you

04:42.040 --> 04:47.480
import a C library, you can say, well okay, when I create this type, call this function, and when

04:47.480 --> 04:52.600
the type goes out of scope and it's not reachable by anything anymore, call this destructor function.

04:53.960 --> 05:01.080
So that way we can wrap C libraries in a way that makes them super easy to use in our Nim project

05:01.080 --> 05:09.880
and feel more like a Nim library. But I said compiles to C, he said transpiles to C,

05:11.800 --> 05:19.720
this is an example of some of the more horrible C code generated by Nim. This is just the inner

05:19.720 --> 05:29.320
part of this loop. Nim uses C more as an intermediate representation. You're not intended to look

05:29.320 --> 05:39.560
at this code. There was a question about debugging, it will generate line directives and stuff

05:39.560 --> 05:44.600
like that. So if you open it in GDB, you will actually see the Nim code and not the C code; you can

05:44.600 --> 05:50.280
step through the Nim code. But if you want, you can disable that and step through the C code instead.
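
NOTE
A minimal sketch of the line-directive idea described here, not Nim's actual generated output.
The file name calculator.nim and the line number 12 are assumptions for illustration.
A #line directive makes the C compiler, and therefore the debugger, attribute the code that follows to another file and line.
```c
/* Hypothetical excerpt of generated C. The directive below tells the
   compiler that the next source line is line 12 of calculator.nim. */
#line 12 "calculator.nim"
static int reported_line(void) { return __LINE__; }
static const char *reported_file(void) { return __FILE__; }
```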

05:51.880 --> 06:01.240
I mean, even with this horrible mess, it's doable. So yeah, it creates efficient C code

06:01.240 --> 06:06.040
that the C compiler is able to optimize very well, but it's not meant for human consumption.

06:06.680 --> 06:18.040
So, the foundations for doing FFI in Nim. We have a logic.c file that just contains

06:18.040 --> 06:27.240
a single function. In our calculator.nim file, we have compile logic.c. So we need to tell Nim

06:28.280 --> 06:34.120
how to find the C code. So it could be a dynamic library, it could be a static library. Or in this

06:34.120 --> 06:42.040
case, we just say compile this file, it will be available at runtime. Then we say, okay,

06:42.040 --> 06:49.960
you have a function, add two integers, and it is imported from C. So we just tell Nim: trust me,

06:49.960 --> 06:55.400
there's a function called add two integers. Don't worry about where it comes from, it will be available.
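
NOTE
A sketch of what the single-function logic.c from this example might contain.
The function name and signature are assumptions; the slide itself is not part of the transcript.
```c
/* logic.c (hypothetical contents): one plain C function that the Nim
   side would declare as imported from C and then call. */
int addTwoIntegers(int a, int b) {
    return a + b;
}
```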

06:56.200 --> 07:02.840
And then we can just call that in our program. We don't actually need the when isMainModule

07:03.000 --> 07:11.080
thing, but this is quite a simple C file. The libraries that you want to use from C are probably

07:11.080 --> 07:19.240
a bit more complex than this. So writing all of these gets super tedious, super fast. And if you

07:19.640 --> 07:25.560
spend a lot of time, you write your nice wrapper, a new version comes out and they've suddenly

07:25.560 --> 07:32.440
like added a field to a type here or changed a function or stuff like that. Then you need to remember to

07:32.760 --> 07:39.160
go over all the types and put everything back in yours. You need to do the whole manual process

07:39.160 --> 07:47.800
all over again. And that is where we try to automate stuff. So this project that I've created is

07:47.800 --> 07:55.560
called Futhark. It basically creates all of those definitions automatically from C header files.

07:56.520 --> 08:01.640
And this isn't a new thing. Nim has had C wrappers since pretty much day one.

08:02.360 --> 08:10.280
The first one was created by the creator of Nim himself, and that tries to read the header files itself

08:10.280 --> 08:19.000
and understand everything and output Nim code. Unfortunately, C is notoriously hard to parse and

08:19.000 --> 08:24.120
understand, and the whole preprocessor sort of thing doesn't exactly make it easier.

08:25.960 --> 08:33.000
So it sort of fails on a couple of things. Then another project came along called, I think it was named

08:33.000 --> 08:41.720
nimterop, yeah, which uses tree-sitter, which is great for parsing C syntax,

08:42.280 --> 08:47.640
but not for parsing C semantics. So it's meant for like syntax highlighting and understanding

08:47.640 --> 08:53.080
like the structure of C, but it doesn't really tell you much about what's actually going on

08:53.080 --> 08:57.320
in the code, like what's the type of this thing. So it's still on you to understand the

08:57.320 --> 09:06.200
semantics of C when you're using tree-sitter. My approach is to use libclang. So libclang is

09:06.200 --> 09:12.680
the C language frontend for LLVM, because what's better at understanding C than a C compiler?

09:13.640 --> 09:21.000
Luckily for me, Clang also has a library version, a dynamic library that you can just import,

09:21.880 --> 09:26.200
and now all of a sudden you have a C compiler within your own program.

09:28.200 --> 09:34.520
So the way it works, you import the library. We still need to tell

09:34.840 --> 09:40.840
Nim how to find the library. In this case, I pass a linker flag to just add this to my

09:41.640 --> 09:47.640
build as a static library. And then I say importc, I give it the path where my header

09:47.640 --> 09:53.720
files live, if they are installed system-wide, you don't need this. Then you give it the header

09:53.720 --> 10:00.760
file you want, and then I've created a bunch of like small niceties, small rules that you can apply

10:00.760 --> 10:14.280
to, in this case, rename the MAPM type to MAPM internal. So MAPM is just an arbitrary-precision

10:14.280 --> 10:24.040
math library from C. So what happens when you do this: the Nim file goes into the Nim compiler,

10:24.040 --> 10:29.080
the Nim compiler calls out to the Futhark library, but the Futhark library is basically

10:29.080 --> 10:37.000
just one big macro. So everything runs at compile time. And at compile time, we can't pull in dynamic

10:37.000 --> 10:42.200
libraries because the compiler won't let you do that. So it has a helper program called Øpir.

10:43.000 --> 10:49.000
So Futhark calls the Øpir program, which reads the C header files, and then the Øpir

10:49.000 --> 10:55.640
program spits out basically a big JSON file containing all the functions, all the types, all of that,

10:56.600 --> 11:05.480
which Futhark then parses to generate Nim code. Then the Nim compiler only has a bunch

11:05.480 --> 11:14.040
of Nim code. So it compiles the Nim code to C, and the C compiler comes in, pulls in the same C sources,

11:14.760 --> 11:21.080
and you don't need to compile with Clang. So you can do this step with Clang,

11:21.160 --> 11:28.600
and that step with whichever C compiler you want. And then that spits out a binary, which then

11:28.600 --> 11:38.040
can load it in Nim file B or whatever you want. So the code, the importc statement from before,

11:38.840 --> 11:44.200
at compile time, it basically gets turned into this. This is the output from Futhark,

11:44.280 --> 11:53.000
just a bunch of definitions. And all of a sudden you can just import the library where we put

11:53.000 --> 12:01.240
that code, and you can use it just like you would in C. But we're not using Nim to write C code.

12:01.240 --> 12:09.800
I don't want to write C-style code in my Nim project. So with a couple of tweaks, wrapping things

12:09.800 --> 12:16.520
in nice Nim features, adding the automatic memory management stuff, and also in this case,

12:17.160 --> 12:23.480
hooking into the error function in the library to pull out. It has a stupid thing, but

12:24.120 --> 12:29.640
if it throws an error, it quits the program. It calls the quit function. It's quite annoying.

12:30.600 --> 12:37.640
So what I did, I hooked into that function, turned that into an exception, and then you end up with

12:37.800 --> 12:44.600
code looking like this. And this looks like Nim code. Here, this does a call to init

12:44.600 --> 12:52.920
MAPM and feeds this into the MAPM function. When this is no longer needed, it will free that

12:52.920 --> 13:00.680
object automatically by calling the C function. And all of that just works as you would want.
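
NOTE
A rough C sketch of the hooking idea, under loud assumptions: this is not MAPM's API and not the wrapper from the talk.
Every name is hypothetical, and longjmp stands in for raising a Nim exception.
The toy library calls a replaceable error handler and then quits; the wrapper installs a handler that jumps back to the caller instead.
```c
#include <setjmp.h>
#include <stdlib.h>
/* Toy "library" state: a replaceable error handler. */
typedef void (*error_handler)(const char *message);
static error_handler handler = NULL;
static jmp_buf recover_point;
static const char *last_error = NULL;
/* Toy library function: on failure it reports the error and exits. */
static int lib_divide(int a, int b) {
    if (b == 0) {
        if (handler != NULL) handler("division by zero");
        exit(EXIT_FAILURE); /* the annoying default: quit the program */
    }
    return a / b;
}
/* Replacement handler: remember the message and jump back to the
   wrapper, so control never reaches the exit() call above. */
static void recovering_handler(const char *message) {
    last_error = message;
    longjmp(recover_point, 1);
}
/* Wrapper: returns 0 on success, or -1 if the library reported an error. */
static int try_divide(int a, int b, int *result) {
    handler = recovering_handler;
    if (setjmp(recover_point) != 0) return -1;
    *result = lib_divide(a, b);
    return 0;
}
```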

13:01.320 --> 13:10.680
The extra wrapping here is about 500 lines of code. A lot of that is just duplication,

13:10.680 --> 13:15.160
because I need to do the error correction for every function, or the error handling for every function.

13:16.680 --> 13:26.360
But the good thing is that the whole C to Nim conversion is done automatically. That's handled

13:27.320 --> 13:32.440
for me. So I don't need to sort of ever revisit that part. The only thing I need to care about

13:33.000 --> 13:43.480
is my sort of Nimification of the project. And if the MAPM library were to update, which

13:43.480 --> 13:50.760
it probably won't, because the author unfortunately passed away, and it was a one-man project.

13:51.640 --> 13:57.960
But if it updates, then I can run the code, I can compile again, and it says, well,

13:57.960 --> 14:05.240
this function has now changed, or the size of one of the types, and then it will just know that.

14:05.240 --> 14:10.760
It won't try to allocate a smaller size for an object or do something dumb like that.

14:13.320 --> 14:18.680
But of course, this has a couple of limitations. The good things first: it handles pretty much any

14:18.920 --> 14:26.200
code. Macros, you can do like ./configure on the project and get the header files

14:26.200 --> 14:33.000
and everything in order. You can do C defines and all of that. So all of that pretty much just works.

14:34.280 --> 14:43.320
I've wrapped stuff like GDK, I've wrapped X-Line Wies, I've wrapped. I used this to wrap a project

14:43.320 --> 14:49.560
that wasn't intended to be used as a library to create a dynamic library to link into it.

14:51.640 --> 14:57.640
And it all just works. It's also a very automated process. So, as you saw, you just do the

14:57.640 --> 15:04.360
importc. It just works. A lot of the earlier solutions would sort of, if it didn't understand

15:04.360 --> 15:11.240
something, fail. Futhark has a lot cleaner fallbacks. If you want to override something,

15:11.320 --> 15:16.440
if you want to improve on the wrapping, you can do that much more easily.

15:18.280 --> 15:27.400
Sort of the middle things. It's agnostic to the C linking. You need to specify how these things

15:27.400 --> 15:35.480
are actually getting into the final binary, which is good because it's flexible, but you also don't

15:35.560 --> 15:41.720
get any help from Futhark for doing that stuff. It's also reasonably portable to C++.

15:41.720 --> 15:49.640
libclang reads C++ code just fine, but I haven't had time to do the conversion yet.

15:49.640 --> 15:54.360
So, it doesn't currently support C++, but there's no technical reason for that.

15:55.480 --> 16:02.920
Apart from time being finite. It isn't that great on function-style macros. For those of you who know,

16:03.720 --> 16:08.040
the preprocessor in C is basically just like a copy-paste machine.

16:09.080 --> 16:14.040
So a function-style macro doesn't need to be complete. You can have like half a statement in

16:14.040 --> 16:20.040
a macro and then the other half comes when you compile. So, they don't have type information. They don't have

16:21.880 --> 16:27.720
yeah. Even libclang doesn't really know how to deal with those. If anyone has any good ideas

16:27.720 --> 16:35.240
for how to solve this, I would be open to ideas. I know Zig does something very similar in how they

16:35.240 --> 16:45.080
wrap C code. I tried to look at their solution for this and it's still just basically guessing.
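
NOTE
A small illustration of why function-style macros are hard to wrap automatically.
The macro names are made up for this sketch. Neither macro is a complete statement on its own, and neither carries type information.
They only become valid C once the preprocessor has pasted them into the surrounding code.
```c
/* BEGIN_LOOP opens a for loop and a block; END_LOOP closes that block. */
#define BEGIN_LOOP(i, n) for (int i = 0; i < (n); i++) {
#define END_LOOP }
/* Sums the integers 0 .. n-1 using the two half-statement macros. */
static int sum_below(int n) {
    int total = 0;
    BEGIN_LOOP(i, n)
        total += i;
    END_LOOP
    return total;
}
```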

16:46.920 --> 16:54.520
And it only wraps the C code. As we saw, we only get like the C output and then we have to

16:54.600 --> 17:01.880
Nimify it ourselves. Yeah. And that's it.

17:08.120 --> 17:10.920
So, yeah, have three minutes for questions? Yeah, I have three minutes.

17:12.200 --> 17:22.600
If C macros cause so much trouble, maybe I'll try to see after. So, the question is, if C macros

17:22.680 --> 17:30.120
cause so much trouble, why not wait until, or why not preprocess the C code and then

17:30.120 --> 17:37.640
read the output? The problem is that a lot of libraries will use C function-style macros to apply

17:37.640 --> 17:43.880
or to give you default arguments. So, a function that takes a lot of arguments, they might have a C

17:44.760 --> 17:51.400
macro that you're supposed to call, which fills out extra arguments. So you still need some of the

17:51.400 --> 17:56.200
function-style macros in your output. Yep.
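
NOTE
A sketch of the default-argument pattern mentioned in this answer. The function and macro are hypothetical.
If the header were preprocessed before wrapping, the macro would disappear and callers would lose the convenient entry point.
```c
/* Hypothetical library function that takes several arguments. */
static int scale_and_offset(int value, int factor, int offset) {
    return value * factor + offset;
}
/* Convenience macro the library expects callers to use: it fills in
   the extra arguments with default values. */
#define SCALE_DEFAULT(value) scale_and_offset((value), 2, 0)
```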

18:21.480 --> 18:34.520
Finding a library, a macro component. Yeah, so I think the question is with all of these preprocessor

18:34.520 --> 18:49.560
stuff. Yes, yes. So, when you wrap this code, you need to give it all the same defines that you

18:49.640 --> 18:54.600
would for building the binary. And yeah.

18:54.600 --> 18:57.400
So, my question, do you have questions? Yeah.

18:57.400 --> 19:06.040
Would it be more correct than, say, more safe to analyze the library directly as we all get

19:06.520 --> 19:11.880
the same ones, instead of trying to cross the other five glitches by the commission and I

19:14.360 --> 19:16.360
I'm not sure if I

19:18.600 --> 19:23.640
Time's up. If you, yeah, I'll be, I'll be down here soon.

