WEBVTT

00:00.000 --> 00:05.000
Thank you.

00:05.000 --> 00:14.000
So I believe my talk today will be, first of all, shorter than what you're potentially used to.

00:14.000 --> 00:17.000
I think I will finish earlier than most of you.

00:17.000 --> 00:23.000
And secondly, I think it might be a little bit different because I'm coming from digital humanities.

00:23.000 --> 00:26.000
Is there anybody here who also comes from digital humanities?

00:26.000 --> 00:29.000
Okay, all right. Okay, that's great.

00:29.000 --> 00:41.000
And I think, as a continuation of what Olivia was saying, it's going to be very much about interdisciplinary work.

00:41.000 --> 00:52.000
It's sometimes very difficult and complex for people coming from different fields to understand each other.

00:52.000 --> 00:59.000
So in order to start that, I'm going to first say who I am, or where I come from and where I work, very quickly.

00:59.000 --> 01:04.000
I'm Lucina Sadi. I did a Bachelor's in Computer Engineering at the University of Tehran.

01:04.000 --> 01:09.000
Then I did a Masters in Digital Humanities at the University of Bologna.

01:09.000 --> 01:16.000
And now I'm doing a PhD at the University of Antwerp, also in Digital Humanities.

01:16.000 --> 01:25.000
And I'm working at the faculty of literature with people who work with digital scholarly editions.

01:25.000 --> 01:32.000
Now, the problems I face, because a lot of you already come from digital humanities, I believe a lot of you also face.

01:32.000 --> 01:38.000
Me, as I said, I'm like Olivia: I do not have 30 years of software engineering experience.

01:38.000 --> 01:41.000
I only have a Bachelor's degree.

01:41.000 --> 01:45.000
So sometimes I feel like I'm Superman in this scene.

01:45.000 --> 01:53.000
My humanities colleagues think that I've got them, but nobody knows who's got me.

01:53.000 --> 02:00.000
So yeah, in Digital Humanities a lot of us are, let's say, amateurs.

02:00.000 --> 02:07.000
So yeah, sometimes getting expert advice can be a little bit difficult.

02:07.000 --> 02:20.000
And well, right now, specifically in computational literary studies, the two prototypes that I work on

02:20.000 --> 02:26.000
are not the best thing to present, because right now that is not the buzzword.

02:26.000 --> 02:31.000
The buzzword right now is generative AI and what you can do with that.

02:31.000 --> 02:35.000
And DH, as you all know, is still quite young.

02:35.000 --> 02:43.000
And it lags a little bit behind the cutting edge of science, even though it is very buzzwordy.

02:43.000 --> 02:55.000
So yeah, because of that, it happens quite often that the people in places of power in the DH communities prefer to work with

02:55.000 --> 03:04.000
what has been working so far and are a little reluctant to keep working to improve upon things.

03:04.000 --> 03:12.000
And yeah, again, as the previous speaker mentioned, team structures in digital humanities have not exactly matured yet,

03:12.000 --> 03:17.000
so having that common ground is difficult.

03:17.000 --> 03:24.000
Now what do we do in my research group, which will lead me to my, let's say, two prototypes here,

03:24.000 --> 03:28.000
we make digital scholarly editions of manuscripts.

03:28.000 --> 03:35.000
In my faculty, in my research group, we work with a variety of manuscripts:

03:35.000 --> 03:41.000
medieval Middle Dutch manuscripts, from, for example, the Herne monastery,

03:41.000 --> 03:44.000
but we also work with modernist manuscripts.

03:44.000 --> 03:47.000
We have a lot of Joyce and Beckett.

03:47.000 --> 03:52.000
And we also have colleagues who work on born digital manuscripts.

03:53.000 --> 04:02.000
Those are manuscripts that have actually been written on a computer, where the process of making them has been recorded by an input logger.

04:02.000 --> 04:11.000
And to analyze a lot of these manuscripts, we use handwritten text recognition (HTR) to facilitate the transcription of these documents,

04:11.000 --> 04:19.000
because one of my colleagues, for example, has thousands upon thousands of pages of police reports from the early 1900s in Antwerp,

04:19.000 --> 04:25.000
and she uses handwritten text recognition to facilitate the transcription of those documents,

04:25.000 --> 04:31.000
and then, of course, as everywhere in academia, we would like to publish our results somewhere.

04:31.000 --> 04:40.000
Now, you all are probably familiar with this, but just as a basic starter, the process of handwritten text recognition

04:40.000 --> 04:42.000
looks something like this.

04:42.000 --> 04:48.000
You first have your manuscript, you scan it, then you put it through a handwritten text recognition platform,

04:48.000 --> 04:57.000
from which you get computer-readable text; then you correct the mistakes you find in there,

04:57.000 --> 05:02.000
and that would be the result of this handwritten text recognition process.
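The four-step workflow just described can be sketched in code. This is only an illustrative sketch: the recognizer below is a stand-in, and none of the names correspond to a real Transkribus or eScriptorium API.

```python
# Hypothetical sketch of the HTR workflow described above; the function
# names are placeholders, not a real platform's API.

def run_htr_pipeline(scanned_pages, recognize, corrections):
    """Turn scanned manuscript pages into corrected, machine-readable text.

    scanned_pages: list of page images (any representation)
    recognize:     function image -> raw recognized text (the HTR platform)
    corrections:   dict mapping recognition errors to their fixes
    """
    transcripts = []
    for page in scanned_pages:
        raw = recognize(page)                      # step 2: HTR platform
        for wrong, right in corrections.items():   # step 3: manual correction
            raw = raw.replace(wrong, right)
        transcripts.append(raw)
    return transcripts

# Toy usage with a fake recognizer standing in for Transkribus/eScriptorium:
fake_recognize = lambda page: f"Ihe quick brown fox ({page})"
pages = ["page-1.jpg", "page-2.jpg"]
result = run_htr_pipeline(pages, fake_recognize, {"Ihe": "The"})
```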

05:02.000 --> 05:14.000
And what is used very often in humanities faculties are these two programs, Transkribus and eScriptorium.

05:14.000 --> 05:25.000
Transkribus is the more popular one, but it is proprietary: it used to work on a freemium model, then it became subscription-based,

05:25.000 --> 05:32.000
and it has a centralized server, and they clearly say that they will be using all of your data,

05:32.000 --> 05:37.000
so whatever you send over there, they will use; they then make a model and sell it back to you.

05:37.000 --> 05:42.000
It is a very, very good app, but, well, it's not free and open source.

05:42.000 --> 05:51.000
eScriptorium is an up-and-coming alternative to Transkribus. It is free and open source.

05:51.000 --> 06:02.000
It does not have a centralized server, which is very good in a sense, but it also makes it less popular because of that,

06:02.000 --> 06:15.000
because people who work in the humanities are usually not technically capable of setting up their own institutional server,

06:15.000 --> 06:20.000
and sometimes they would need supercomputers if they have a lot of complex data.

06:20.000 --> 06:29.000
So that's another issue that exists with eScriptorium, but on the other hand, everything can be shared.

06:30.000 --> 06:34.000
Now, these two are really, really good.

06:34.000 --> 06:40.000
Transkribus actually has the tagline "Unlock the Past", and it's amazing.

06:40.000 --> 06:47.000
You get to train models and, yeah, unlock your past and get a lot of transcriptions.

06:47.000 --> 07:00.000
But then, the output of these two programs ends up either just being downloaded as plain text and put in an archive, never to be seen again,

07:00.000 --> 07:10.000
or used for distant reading, for some kind of computational literary analysis, like counting how many times such-and-such a word was used.

07:11.000 --> 07:30.000
But specifically in my research group, where we make digital scholarly editions, we really care about that inherent connection between the text and the actual manuscript, in this case a typescript.

07:30.000 --> 07:34.000
This is, for example, the BDMP, the Beckett Digital Manuscript Project.

07:34.000 --> 07:39.000
And as you can see, like, this was done completely manually.

07:39.000 --> 07:42.000
This was before Transkribus, et cetera, was a thing.

07:42.000 --> 07:55.000
And the people who made it, which was actually done in our faculty, really cared that the reader can see exactly which line it is and how it was written.

07:55.000 --> 08:17.000
It was not just the text that was important: the connection between the content of the text and its form was actually an important part of this process.

08:17.000 --> 08:34.000
So now the options to publish whatever it is you have, the image with the text, are either to make bespoke digital editions from scratch, which requires lots of money, lots of time, lots of effort;

08:34.000 --> 08:43.000
to use Transkribus Sites, which starts at over 47 euros a month if you have more than 1,000 pages.

08:43.000 --> 08:49.000
It is, again, not free and open source, as opposed to eScriptorium.

08:49.000 --> 08:54.000
Or to use TEI Publisher, which is free and open source, but it loses the text-to-image connection.

08:54.000 --> 09:03.000
You can use your image and your text, but the text does not have any inherent connection to the image that you have.

09:03.000 --> 09:13.000
And it is dependent on the eXist-db ecosystem, which, again, creates a steep learning curve for anybody in the humanities who wants to use it.

09:14.000 --> 09:27.000
Now, this is where I come to my proposed solution, which for now is used locally, and one other faculty is also using it.

09:27.000 --> 09:46.000
It is a viewer I call Necturus. I call it Necturus because the editor I previously made and presented earlier this year is called Axolotl, and Necturus is another kind of that species; I don't even remember what they are called.

09:46.000 --> 10:05.000
So, it is a lightweight and modular TEI viewer and publisher. It is a React component, and data from Transkribus or eScriptorium, or even data that is not the output of a handwritten text recognition platform,

10:05.000 --> 10:19.000
can be directly exported to and rendered in it. And the links to the zones, which are inherently valuable for digital scholarly editions, will still be present to aid the user.

10:19.000 --> 10:34.000
This is pretty much the setup of the architecture that we have designed. And these are the features. They are not going to sound out of this world, because this is a room full of software developers.

10:35.000 --> 10:57.000
There is pagination, so it is not just one page: you can have as many pages as you want. There is collection support, so you can have multiple books. There is a raw XML view, so you can see the text rendered as beautiful HTML, but you can also see the underlying TEI XML underneath it.

10:57.000 --> 11:24.000
The page layout is flexible: you can place the manuscript section and the text section in different positions. There is collection search. There is authentication. The architecture is API-driven, so it is database-agnostic, let's say. And of course, as opposed to most of the options that exist, it is free and open source.

11:24.000 --> 11:39.000
This is a sample of what it looks like. It is very, very bare-bones; I am leaving the extra CSS to the people who are using it. This is a sample of those Herne manuscripts I mentioned.

11:39.000 --> 11:49.000
As you can see, you have the connection between the text and the lines and the image.
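That connection between text, lines, and image is what TEI encodes with facsimile zones: each transcribed line points, via a facs attribute, at a pixel box on the scan. Here is a minimal sketch, with an invented sample document and ids, of how such a file can be read back into line-to-box pairs:

```python
# Minimal sketch of the TEI text-to-image link: a <zone> gives a pixel box
# on the scan, and a line's facs attribute points at that zone. The sample
# document below is invented for illustration.
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
SAMPLE = f"""
<TEI xmlns="{TEI_NS}">
  <facsimile>
    <surface>
      <graphic url="folio-1r.jpg"/>
      <zone xml:id="z1" ulx="100" uly="200" lrx="900" lry="260"/>
    </surface>
  </facsimile>
  <text><body>
    <l facs="#z1">In the beginning was the word</l>
  </body></text>
</TEI>
"""

def zones_by_id(root):
    # xml:id is stored under the XML namespace by ElementTree
    XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
    return {z.get(XML_ID): tuple(int(z.get(k)) for k in ("ulx", "uly", "lrx", "lry"))
            for z in root.iter(f"{{{TEI_NS}}}zone")}

def line_boxes(xml_text):
    """Map each transcribed line to the image box it was read from."""
    root = ET.fromstring(xml_text)
    zones = zones_by_id(root)
    return {l.text: zones[l.get("facs").lstrip("#")]
            for l in root.iter(f"{{{TEI_NS}}}l") if l.get("facs")}

boxes = line_boxes(SAMPLE)
```

A viewer can then highlight the box for a line on the scan while showing its transcription alongside, which is the connection the talk describes.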

11:50.000 --> 12:05.000
And one issue with this was that it was difficult for humanities scholars to set up, because, again, you would need a server, a database, and to set up a web application.

12:05.000 --> 12:21.000
Again, humanities scholars could find this difficult. So what I did was make a GitHub Pages template for a static version of the app that has a lot of those previously mentioned features, though not all of them.

12:21.000 --> 12:32.000
And the only required skill to get it up and working is minimal familiarity with Git: you just have to know what pull and push are, pretty much.

12:33.000 --> 12:53.000
And it works very well for smaller projects, and for bigger projects it can act as a first step, because a number of the archives we've talked to have said: okay, this looks interesting, but is there a way we can put up some of our data on this so that we can see if we like it?

12:53.000 --> 13:14.000
And this can act as a first step for that. And this one, a digital scholarly edition of the last notebooks of Virginia Woolf, was done with that early template that does not have all of the features.

13:14.000 --> 13:38.000
As you can see, it looks very much the same. It has 270 pages over 45 documents, and it works pretty well; and the person who made it is not familiar with programming at all, but he made it completely on his own, just using that GitHub template.

13:38.000 --> 14:00.000
Yep, I just wanted to share this, let's say, ad hoc solution that I came up with, how I continue to use it, and how other people are using it; this person who is using it, for example, is at Oxford and is doing something completely unrelated to me.

14:00.000 --> 14:12.000
But he was looking for something like this and he just couldn't find anything that would work with his level of digital literacy, let's say.

14:12.000 --> 14:17.000
And that's it for my talk. Thank you all so much.

14:18.000 --> 14:24.000
And we have some questions in the room.

14:24.000 --> 14:39.000
Thanks for the talk; it was a very interesting tool you presented.

14:39.000 --> 14:44.000
I was wondering, because I have been facing a problem like this in the last few years.

14:44.000 --> 14:51.000
And then I chose to go with just TEI, to display TEI enhanced with graphic information.

14:51.000 --> 14:52.000
Yes.

14:53.000 --> 15:16.000
That's actually a very interesting question and we've had a lot of talks about this at our research group.

15:17.000 --> 15:23.000
The thing is that people who work with digital scholarly editions specifically want to use TEI,

15:23.000 --> 15:44.000
because TEI has some specific tags that people who work, for example, with genetic editions really need, for example to set up their apparatus, the genetic apparatus of how a manuscript was edited.

15:45.000 --> 15:52.000
And it is also focused on where on the page everything is, which makes a lot of sense.

15:52.000 --> 15:58.000
What you get out of Transkribus or eScriptorium is actually also either ALTO or PAGE XML.

15:58.000 --> 16:05.000
I end up transforming these; that is why I developed that editor I previously mentioned.

16:05.000 --> 16:13.000
There I transform them to TEI, because the people who make these digital scholarly editions still very much only want to use TEI.
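The transformation mentioned here can be sketched as follows. This is a hedged illustration, not the actual editor's code: the input is a simplified PAGE-style snippet (real PAGE XML is namespaced and has more required attributes), and the output is only the TEI facsimile/body skeleton, not a full TEI document.

```python
# Sketch of the ALTO/PAGE -> TEI step: take each line's text plus its pixel
# coordinates from an HTR export and emit TEI with matching <zone>/@facs
# pairs, so the text-to-image link survives the conversion.
import xml.etree.ElementTree as ET

PAGE_SNIPPET = """
<Page imageFilename="scan.jpg">
  <TextLine id="l1">
    <Coords points="10,20 300,20 300,60 10,60"/>
    <TextEquiv><Unicode>Dear reader</Unicode></TextEquiv>
  </TextLine>
</Page>
"""

def page_to_tei(xml_text):
    page = ET.fromstring(xml_text)
    zones, lines = [], []
    for tl in page.iter("TextLine"):
        # Coords holds the line polygon; reduce it to a bounding box.
        pts = [tuple(map(int, p.split(","))) for p in
               tl.find("Coords").get("points").split()]
        xs, ys = [p[0] for p in pts], [p[1] for p in pts]
        zid = f"zone_{tl.get('id')}"
        zones.append(f'<zone xml:id="{zid}" ulx="{min(xs)}" uly="{min(ys)}" '
                     f'lrx="{max(xs)}" lry="{max(ys)}"/>')
        text = tl.find("TextEquiv/Unicode").text
        lines.append(f'<l facs="#{zid}">{text}</l>')
    return ("<facsimile><surface>" + "".join(zones) + "</surface></facsimile>"
            "<body>" + "".join(lines) + "</body>")

tei = page_to_tei(PAGE_SNIPPET)
```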

16:13.000 --> 16:15.000
That's basically the reason.

16:15.000 --> 16:16.000
Yep.

16:16.000 --> 16:17.000
Thank you.

16:17.000 --> 16:20.000
Thank you. Any more questions?

16:25.000 --> 16:28.000
Let's thank the speaker again.

16:28.000 --> 16:29.000
Thank you.

