WEBVTT

00:00.000 --> 00:12.680
So, the next talk will be presented by Sid Rikul Mer, and it's going to be about end-to-end

00:12.680 --> 00:15.000
enterprise search with Datafariq.

00:15.000 --> 00:16.000
Yes, good.

00:16.000 --> 00:22.560
So, I have the immense privilege of having 25 minutes, and my laptop worked right away.

00:22.560 --> 00:29.960
So, when you have finished on time, so I'm Sid Rik and the President co-founder of

00:30.040 --> 00:34.760
France Labs, who has a software maker of Datafariq, and so today, we're going to talk

00:34.760 --> 00:39.560
about end-to-end enterprise search with Datafariq community edition, because we have two

00:39.560 --> 00:40.560
editions.

00:40.560 --> 00:46.960
So, Viagenda, we'll talk a bit about France Labs, organization, definition of enterprise

00:46.960 --> 00:50.960
search, because I'm not sure if we're all in line with what it means.

00:50.960 --> 00:56.960
Let me talk about Datafariq's architecture, the main functionality is, and some Gen AI,

00:57.960 --> 01:02.960
I was going to say that everybody mentions AI, but passable did not mention AI, so it's an exception.

01:02.960 --> 01:06.960
Maybe did not have time too, so you're ready?

01:06.960 --> 01:07.960
Yeah.

01:07.960 --> 01:08.960
Oh, yeah, wow.

01:08.960 --> 01:09.960
Okay, thank you.

01:09.960 --> 01:13.960
So, we were born 40, so who knew Datafariq before?

01:13.960 --> 01:15.960
Yeah, I think you're expecting that.

01:15.960 --> 01:16.960
That's good.

01:16.960 --> 01:18.960
So, we are specialising not being known.

01:18.960 --> 01:23.960
So, let's see, for 14 years of existence, we released Datafariq in 2015.

01:23.960 --> 01:28.960
November, we're basing these on the French Riviera, with a swimming pool.

01:28.960 --> 01:31.960
We speak in many conferences related to search.

01:31.960 --> 01:34.960
That's why maybe you don't know who you are, enterprise search.

01:34.960 --> 01:38.960
We are involved in open source businesses, so we are part of the French,

01:38.960 --> 01:44.960
Cluster, French Association of Open Source Businesses, and European Association of Open Source Businesses,

01:44.960 --> 01:47.960
respectively, CNL and Appell.

01:47.960 --> 01:53.960
Members of the search network and customers in weird industry sectors.

01:53.960 --> 01:58.960
So, the surprise, with sort of a make of Datafariq, so we provide expertise around it,

01:58.960 --> 02:00.960
and also an Apache solar.

02:00.960 --> 02:02.960
So, what isn't the price search?

02:02.960 --> 02:08.960
So, I could be pasted via, what I'm quoting, actually, the definition for Martin White,

02:08.960 --> 02:11.960
and he's book enterprise search, actually, that's been passed here.

02:11.960 --> 02:14.960
He was the only serious book about enterprise search, so it's.

02:14.960 --> 02:18.960
The enterprise search application enables employees to find all the information,

02:18.960 --> 02:22.960
but the company processes without the need to know where information is stored.

02:22.960 --> 02:26.960
And I did, oops, sorry, I didn't hit the microphone, but,

02:26.960 --> 02:29.960
particularly because most customers, we want security.

02:29.960 --> 02:34.960
By security, I mean, here, respecting the access control to documents.

02:34.960 --> 02:38.960
So, what does the typical enterprise search solution works?

02:38.960 --> 02:42.960
Well, it looks like a search solution, no surprise here.

02:42.960 --> 02:46.960
So, we'll detail a bit more what it does in Datafariq, but, yeah, it's a search bar.

02:46.960 --> 02:51.960
You search, you have a complete, you have facets to filter, and you have a results list.

02:51.960 --> 02:52.960
Okay, no surprise.

02:52.960 --> 02:56.960
Any addition, you have some monitoring, like statistics, analytics,

02:56.960 --> 03:00.960
dashboards to see who search is for what?

03:00.960 --> 03:03.960
And how does that usually work?

03:03.960 --> 03:06.960
Well, there are several components from left to right.

03:06.960 --> 03:09.960
You have a crawling face where you connect to the data sources,

03:09.960 --> 03:11.960
the documents, whoever they are stored.

03:11.960 --> 03:13.960
You retrieve them, you open them, you analyze what's in it,

03:13.960 --> 03:17.960
you take the metadata, you take the content, you put it in an index,

03:17.960 --> 03:19.960
search index, then you have the search components,

03:19.960 --> 03:23.960
we take a care of in real time analyzing the user's query,

03:23.960 --> 03:27.960
looking for the data in Vindex, and providing

03:27.960 --> 03:32.960
the results usually in a UI, and also a component for statistics.

03:32.960 --> 03:36.960
Okay, so now you are experts in enterprise search.

03:36.960 --> 03:38.960
Yeah.

03:38.960 --> 03:42.960
Okay, so what the data far is CE allows to do.

03:42.960 --> 03:44.960
So CE is the community edition.

03:44.960 --> 03:46.960
Because it's first them, I don't know.

03:46.960 --> 03:48.960
You choose for a Rocks at me.

03:48.960 --> 03:52.960
So it allows you to crawl multiple types of documents,

03:52.960 --> 03:54.960
and I have some slides explaining a bit more what it does,

03:54.960 --> 03:57.960
and this is thanks to Apache many full CF,

03:57.960 --> 04:01.960
which is two minor ledge, we only existing open source framework,

04:01.960 --> 04:03.960
doing this type of stuff.

04:03.960 --> 04:06.960
We use Apache Chica, we'll see that afterwards.

04:07.960 --> 04:11.960
Data Free CE allows you to do some search using VM25,

04:11.960 --> 04:16.960
which is SQL2TFIDF, and so we will do vector search.

04:16.960 --> 04:20.960
I will talk about it later, and while that we use Apache Solar,

04:20.960 --> 04:24.960
you know Apache Solar, we're in good,

04:24.960 --> 04:27.960
some nodding here, so that's good.

04:27.960 --> 04:30.960
So you are able to do some analysis,

04:30.960 --> 04:33.960
thanks to Dashboards, and you're able to manage your tool,

04:33.960 --> 04:36.960
which is difficult to use interfaces for administrators,

04:36.960 --> 04:40.960
from playing with the weights, doing some promo links.

04:40.960 --> 04:43.960
I don't have time for that, but entity detection, etc.

04:43.960 --> 04:47.960
You can use the admin UI of many full CF and Apache Solar.

04:47.960 --> 04:49.960
So what does it not allow you to do?

04:49.960 --> 04:52.960
So I put easily, because we don't form it anything.

04:52.960 --> 04:55.960
Everything's open, so you can find the time to do it,

04:55.960 --> 04:58.960
just that in order to survive, this is what I post up,

04:58.960 --> 05:01.960
but we put as extra intent to price edition.

05:01.960 --> 05:04.960
And security, and here I'm talking about access control,

05:04.960 --> 05:06.960
paramission document-level permissions,

05:06.960 --> 05:09.960
and to connect to a key directory,

05:09.960 --> 05:11.960
they want to adapt this type of stuff.

05:11.960 --> 05:13.960
Does not easily allow you to do big data search,

05:13.960 --> 05:16.960
in the sense that if you need a cluster of data for a system

05:16.960 --> 05:18.960
with distributed components,

05:18.960 --> 05:20.960
this is not done automatically,

05:20.960 --> 05:22.960
there is no answer, but recipient is kind of stuff.

05:22.960 --> 05:24.960
And it's not easy to do a backup,

05:24.960 --> 05:27.960
well, a nest, of course, if you are the Docker,

05:27.960 --> 05:30.960
and it's persistent, you can just save it, but then.

05:30.960 --> 05:32.960
And we don't prevent anything, okay.

05:32.960 --> 05:34.960
So I hope you feel better now.

05:34.960 --> 05:37.960
So what does it look like in terms of architecture?

05:37.960 --> 05:40.960
So it's one, it's on our weeky.

05:40.960 --> 05:43.960
And yeah, so that seems a gray box,

05:43.960 --> 05:46.960
that's for the monolithic version of Datafari.

05:46.960 --> 05:48.960
So in the gray box, it's exposing,

05:48.960 --> 05:51.960
it's using interface to users on the left,

05:51.960 --> 05:55.960
and we fetch the data, which on the right of the schema.

05:55.960 --> 05:59.960
So you can see we have some Apache, some Tomcat,

05:59.960 --> 06:03.960
some Zuki Perse, some Solos, some Casanoa, some Puzz Gray,

06:03.960 --> 06:05.960
LogStash, for historical reasons,

06:05.960 --> 06:09.960
Tika and I think, I've mentioned all of them.

06:09.960 --> 06:12.960
And it's very, it's a big soup of stuff to manage

06:12.960 --> 06:13.960
and to deploy.

06:13.960 --> 06:16.960
So it lots of fun when doing the deployments scripts,

06:16.960 --> 06:18.960
but we work it out.

06:18.960 --> 06:20.960
So the main components for that,

06:20.960 --> 06:22.960
we're using, so really the main ones

06:22.960 --> 06:25.960
would be Apache mini 4 CF, because that's the very framework

06:25.960 --> 06:27.960
for crawling the documents.

06:27.960 --> 06:30.960
And it's Apache Solos, for the search part,

06:30.960 --> 06:33.960
and it's Apache Tomcat, for owned Datafari web app.

06:33.960 --> 06:38.960
And we provide an admin UI to install managing use.

06:38.960 --> 06:41.960
If we dig a bit into many false CF,

06:41.960 --> 06:43.960
anybody new, many false CF people?

06:43.960 --> 06:44.960
Yeah, no.

06:44.960 --> 06:45.960
No, okay.

06:45.960 --> 06:48.960
Nobody cares about crawling, but it's an important step in,

06:48.960 --> 06:50.960
oops, let's keep it.

06:50.960 --> 06:52.960
So many false CF allows you to connect,

06:52.960 --> 06:54.960
to, I don't know, for instance, I put,

06:54.960 --> 06:56.960
yeah, file shares, I take the directory,

06:56.960 --> 06:58.960
I'm an LDAP web, XWiki.

06:58.960 --> 06:59.960
I mean, I think all of us,

06:59.960 --> 07:01.960
many presentations today were mentioned,

07:01.960 --> 07:03.960
this week is, I did my homework.

07:03.960 --> 07:05.960
We have a connect of YWiki.

07:05.960 --> 07:09.960
Yeah, you can have a pipeline of transformation

07:09.960 --> 07:11.960
connectors to transform the dates,

07:11.960 --> 07:12.960
have a choice of extracted.

07:12.960 --> 07:14.960
Maybe you want to put some regular expressions

07:14.960 --> 07:17.960
to find entities, maybe you want to

07:17.960 --> 07:19.960
print out some particular document types,

07:19.960 --> 07:21.960
or, yeah, I don't know,

07:21.960 --> 07:24.960
if things can be done in this pipeline,

07:24.960 --> 07:27.960
and then we push whatever has been analyzed

07:27.960 --> 07:28.960
into the solar index.

07:28.960 --> 07:29.960
So that's how many false CFs,

07:29.960 --> 07:31.960
so you can see many false CFs

07:31.960 --> 07:34.960
as an octopuses able to connect to many types.

07:34.960 --> 07:37.960
Many types of data management systems.

07:37.960 --> 07:39.960
It's also a month array.

07:39.960 --> 07:41.960
I didn't draw them,

07:41.960 --> 07:44.960
but they are coupé left, I think I hope.

07:44.960 --> 07:47.960
I think the guy is dead when it was years ago,

07:47.960 --> 07:48.960
so that's fine.

07:48.960 --> 07:50.960
So he can open me formats,

07:50.960 --> 07:53.960
to Apache Tika, so PDF, obviously,

07:53.960 --> 07:55.960
liberal fees, Microsoft Office,

07:55.960 --> 07:58.960
database tables, any other types.

07:58.960 --> 08:03.960
So it goes quick, but that's fine.

08:03.960 --> 08:06.960
So now you're expressing that patchymane false CF.

08:06.960 --> 08:09.960
Yeah, of course, now about solar.

08:09.960 --> 08:12.960
So well, solar is job is to take what a patchymane

08:12.960 --> 08:14.960
false CF sends it,

08:14.960 --> 08:17.960
and analyze it and create the solar documents

08:17.960 --> 08:18.960
that it puts in Vindex,

08:18.960 --> 08:20.960
and it's job is then at the search phase

08:20.960 --> 08:23.960
to answer very fast to the user's query,

08:23.960 --> 08:27.960
and to deliver relevant documents classified.

08:27.960 --> 08:29.960
It's similar to open search,

08:29.960 --> 08:32.960
if you just know open search with them.

08:32.960 --> 08:34.960
And for the analytics components,

08:34.960 --> 08:36.960
well, but now it has become pretty easy,

08:36.960 --> 08:37.960
what we do, because in the past,

08:37.960 --> 08:38.960
we were using L,

08:38.960 --> 08:39.960
and then we were fed up with L,

08:39.960 --> 08:41.960
and then we used Zeppelin,

08:41.960 --> 08:43.960
and Zeppelin was grass roots.

08:43.960 --> 08:45.960
So now we got rid of all that,

08:45.960 --> 08:47.960
and we take everything that's come from

08:47.960 --> 08:49.960
from the logs of Apache Manifold CF,

08:49.960 --> 08:50.960
of Apache solar.

08:50.960 --> 08:52.960
We still have log-stash,

08:52.960 --> 08:53.960
we need to get rid of it,

08:53.960 --> 08:56.960
and we put it in the dedicated search collection of solar,

08:56.960 --> 08:59.960
and this is where how we feed our home-made

08:59.960 --> 09:02.960
dash balls in pure HTML.

09:02.960 --> 09:04.960
Also, we stopped frameworks,

09:04.960 --> 09:06.960
like Zeppelin.

09:06.960 --> 09:09.960
So if we detail a bit,

09:09.960 --> 09:11.960
what you currently have with the data for UI,

09:11.960 --> 09:15.960
this is the screenshot of the version 2 of React UI.

09:16.960 --> 09:18.960
It's actually very similar to previous ones.

09:18.960 --> 09:20.960
So this is, and it cost us a lot to do,

09:20.960 --> 09:22.960
but we forgot to modify,

09:22.960 --> 09:24.960
we look and feel so that you have the feeling it's brand new.

09:24.960 --> 09:27.960
So it's brand new, a bit like the old one.

09:27.960 --> 09:30.960
So you have a search bar on top.

09:30.960 --> 09:31.960
When you start typing,

09:31.960 --> 09:33.960
you have your dropdown that appears,

09:33.960 --> 09:34.960
or you can also remove it,

09:34.960 --> 09:35.960
but that's fine.

09:35.960 --> 09:37.960
You have the suggestions for queries,

09:37.960 --> 09:38.960
or the standard,

09:38.960 --> 09:39.960
or the complete,

09:39.960 --> 09:40.960
you can do,

09:40.960 --> 09:41.960
we can have auto suggestions,

09:41.960 --> 09:43.960
which is proposing documents right away,

09:43.960 --> 09:45.960
to add some suggestions based on specific fields.

09:45.960 --> 09:46.960
So if you have product numbers,

09:46.960 --> 09:47.960
for instance,

09:47.960 --> 09:49.960
if it could propose you product numbers,

09:49.960 --> 09:50.960
as well, to complete.

09:50.960 --> 09:52.960
So that's why we drop down.

09:52.960 --> 09:54.960
Then you can see a bit on the left,

09:54.960 --> 09:55.960
the tabs,

09:55.960 --> 09:59.960
which allow you to filter based on a specific type of metadata that you want.

09:59.960 --> 10:01.960
You can also opt for facets.

10:01.960 --> 10:04.960
Any metadata can be a facet,

10:04.960 --> 10:05.960
basically.

10:05.960 --> 10:07.960
And you have different types of interactions with your facets.

10:07.960 --> 10:09.960
Can we have a checkboxes?

10:09.960 --> 10:11.960
Can we have the Windows,

10:11.960 --> 10:13.960
for example time window, or price window,

10:13.960 --> 10:14.960
whatever you want,

10:14.960 --> 10:15.960
or if you missed that,

10:15.960 --> 10:16.960
wait too much.

10:16.960 --> 10:17.960
Thank you.

10:17.960 --> 10:20.960
I hope you have lots of questions of the one.

10:20.960 --> 10:22.960
Otherwise, I will ask you questions.

10:22.960 --> 10:24.960
So this I covered.

10:24.960 --> 10:25.960
You had the search tools.

10:25.960 --> 10:27.960
I didn't put the screenshot about it,

10:27.960 --> 10:29.960
but you can export your search results.

10:29.960 --> 10:31.960
You can save your searches.

10:31.960 --> 10:33.960
You can have email alerts.

10:33.960 --> 10:35.960
So if you don't want every day to come to your corpus,

10:35.960 --> 10:37.960
and ask that data vary,

10:37.960 --> 10:39.960
did the new document appear.

10:39.960 --> 10:42.960
Then the data vary will run these queries for you,

10:42.960 --> 10:43.960
and send you an email saying,

10:43.960 --> 10:45.960
if today there is something for you.

10:45.960 --> 10:47.960
And you can do advanced search,

10:47.960 --> 10:50.960
graphically to combine search for specific fields

10:50.960 --> 10:52.960
with Boolean operators.

10:52.960 --> 10:53.960
Okay, let's clear.

10:53.960 --> 10:54.960
You understood using the face.

10:54.960 --> 10:56.960
There's specific challenge normally,

10:56.960 --> 10:57.960
but we never know.

10:57.960 --> 10:58.960
And you can notice,

10:58.960 --> 11:00.960
little box asks your content.

11:00.960 --> 11:01.960
It's a teaser,

11:01.960 --> 11:03.960
because we will go back to it a bit later.

11:03.960 --> 11:05.960
But you can see it's fair.

11:05.960 --> 11:07.960
Ask your content.

11:07.960 --> 11:11.960
I'm seeing this because now it's time to talk about AI.

11:11.960 --> 11:13.960
So historically we could connect

11:13.960 --> 11:15.960
at the indexing phase to install

11:15.960 --> 11:18.960
other field party systems to do a translation,

11:18.960 --> 11:20.960
or imagine analysis.

11:20.960 --> 11:23.960
In our case, most of the time

11:23.960 --> 11:25.960
it was to do optical character recognition.

11:25.960 --> 11:29.960
We've test your act for for image recognition.

11:29.960 --> 11:31.960
And now it's kind of changing,

11:31.960 --> 11:33.960
and it's more focused on using

11:34.960 --> 11:36.960
Gen AI to do this type of stuff.

11:36.960 --> 11:38.960
So it's actually easier for us.

11:38.960 --> 11:40.960
And so we can connect

11:40.960 --> 11:42.960
at indexing time and at search time.

11:42.960 --> 11:43.960
And at search time,

11:43.960 --> 11:46.960
it's more for the vector search and the rug.

11:46.960 --> 11:49.960
And I will explain a bit more about the rug afterwards.

11:49.960 --> 11:52.960
Visorinjectivity is actually so not co-founded,

11:52.960 --> 11:55.960
but co-funded by European Union.

11:55.960 --> 11:58.960
More precisely the NNGI search project,

11:58.960 --> 12:00.960
called Neural Data Faber project.

12:01.960 --> 12:03.960
View maze by Vologo.

12:03.960 --> 12:04.960
Which is a great logo.

12:04.960 --> 12:06.960
And the N2N prototype is planned

12:06.960 --> 12:08.960
for what we end of this quarter.

12:08.960 --> 12:10.960
So we think we need to hurry for that.

12:10.960 --> 12:13.960
Visorinject is done in collaboration

12:13.960 --> 12:16.960
with CIS, S-E-A-S-E,

12:16.960 --> 12:18.960
company in the UK.

12:18.960 --> 12:21.960
More precisely Alessandro Benedetti,

12:21.960 --> 12:23.960
who is a committer in Apache Solar.

12:23.960 --> 12:25.960
And actually yesterday evening

12:25.960 --> 12:29.960
he did the second pull request on Solar

12:29.960 --> 12:36.960
to add VMBedit's call of the embedding model in Solar.

12:36.960 --> 12:38.960
So the PR has been done,

12:38.960 --> 12:40.960
now it's waiting for reviews,

12:40.960 --> 12:42.960
which means that from now on in Solar

12:42.960 --> 12:44.960
independently from data ferry,

12:44.960 --> 12:45.960
you just need to push the document,

12:45.960 --> 12:46.960
actually like the parameter,

12:46.960 --> 12:48.960
say which model you want,

12:48.960 --> 12:51.960
and Solar will take care of doing the embedding.

12:51.960 --> 12:53.960
Until yesterday it was not the case.

12:53.960 --> 12:55.960
Until yesterday you had to do the embedding externally.

12:55.960 --> 12:58.960
And then Solar was able to do the vector calculations

12:58.960 --> 12:59.960
and vector management,

12:59.960 --> 13:01.960
but not the embedding itself.

13:01.960 --> 13:04.960
So it's quite fresh actually.

13:04.960 --> 13:06.960
So Rag, what does it look like?

13:06.960 --> 13:08.960
So we're going back to videos into phase.

13:08.960 --> 13:11.960
And so the little box I should do on the lower rights,

13:11.960 --> 13:14.960
if you click on it, it's a chatbot.

13:14.960 --> 13:16.960
I'm sure it's not really a surprise for it,

13:16.960 --> 13:19.960
but what does it allow you to do?

13:19.960 --> 13:21.960
You can ask your content.

13:21.960 --> 13:25.960
So the first mechanism is the typical Rag

13:26.960 --> 13:28.960
recovery management generation.

13:28.960 --> 13:31.960
Everybody's familiar with Hague.

13:31.960 --> 13:33.960
Yeah, I have some handshake.

13:33.960 --> 13:36.960
I will take it as a yes for everybody.

13:36.960 --> 13:41.960
So we go as a Rag for all of the content.

13:41.960 --> 13:44.960
But you can also ask questions for specific documents.

13:44.960 --> 13:45.960
So in the search results,

13:45.960 --> 13:48.960
you can notice that there is a little ask with AI.

13:48.960 --> 13:50.960
But then if you pick on it,

13:50.960 --> 13:53.960
the chatbot changes and allows you to ask questions

13:53.960 --> 13:55.960
only for this particular document.

13:55.960 --> 13:57.960
Because I mean, we have customers.

13:57.960 --> 14:00.960
We have documents which are 6,000 pages long.

14:00.960 --> 14:02.960
Which is not a good idea actually.

14:02.960 --> 14:04.960
I think they should do several books.

14:04.960 --> 14:05.960
But okay.

14:05.960 --> 14:07.960
And at least with that,

14:07.960 --> 14:09.960
you can ask a question only for that.

14:09.960 --> 14:11.960
And we have ideals for quick actions.

14:11.960 --> 14:14.960
So it's prepared actually prompt for instance

14:14.960 --> 14:16.960
to do the summarization of a specific document.

14:16.960 --> 14:18.960
And then you can decide.

14:18.960 --> 14:21.960
You can create your own quick actions.

14:22.960 --> 14:25.960
This is meant for for his purpose.

14:25.960 --> 14:26.960
So what is Rags?

14:26.960 --> 14:27.960
I knew you said you knew it,

14:27.960 --> 14:28.960
but just behind the scenes.

14:28.960 --> 14:29.960
So it's a standard.

14:29.960 --> 14:31.960
So user, I did the drawings.

14:31.960 --> 14:33.960
I hope you're amazed.

14:33.960 --> 14:34.960
Okay.

14:34.960 --> 14:35.960
Really proud of them,

14:35.960 --> 14:36.960
especially for brain.

14:36.960 --> 14:37.960
It's amazing.

14:37.960 --> 14:39.960
So we use it as a query.

14:39.960 --> 14:41.960
He's happy before doing the query.

14:41.960 --> 14:43.960
But so data for it as a vector search.

14:43.960 --> 14:44.960
Thank you.

14:44.960 --> 14:46.960
So to the so-law, we request dimensions.

14:46.960 --> 14:48.960
We have a dedicated collection.

14:48.960 --> 14:50.960
Composive mini chunks.

14:51.960 --> 14:54.960
Of the vectors coming from the documents that I've been indexed.

14:54.960 --> 14:56.960
We classify them.

14:56.960 --> 14:59.960
We take the top end depending on the context of the capabilities.

14:59.960 --> 15:02.960
And we ask for GNAI.

15:02.960 --> 15:05.960
Please answer to the user's query.

15:05.960 --> 15:10.960
Using only these chunks as a context to the prompt.

15:10.960 --> 15:17.960
You can either ask an onsite or a cloud cloud,

15:17.960 --> 15:18.960
trade LLM.

15:18.960 --> 15:22.960
As long as it's compatible with OpenAI API interface.

15:22.960 --> 15:24.960
For instance, Mistral.

15:24.960 --> 15:25.960
Yeah.

15:25.960 --> 15:27.960
To promote some French innovation.

15:27.960 --> 15:30.960
I know he's just a gentleman one.

15:30.960 --> 15:32.960
I think it collapsed.

15:32.960 --> 15:34.960
No gentleman LLM.

15:34.960 --> 15:35.960
You don't know.

15:35.960 --> 15:36.960
There is no gentleman in the room.

15:36.960 --> 15:37.960
So that's fine.

15:37.960 --> 15:38.960
No.

15:38.960 --> 15:39.960
It's not good.

15:39.960 --> 15:40.960
Perfect.

15:40.960 --> 15:42.960
And then we have LLM generates a nice answer.

15:42.960 --> 15:45.960
And data for it takes it generates a user interface.

15:46.960 --> 15:51.960
The way it works, now that we have the Solovic top part.

15:51.960 --> 15:52.960
So why do we do rag?

15:52.960 --> 15:55.960
Well, why not proposing customers to train LLM.

15:55.960 --> 15:59.960
So there are these are the rationales for the enterprise edition.

15:59.960 --> 16:02.960
And this is the rational for the community edition.

16:02.960 --> 16:08.960
So the enterprise edition is because if you train LLM with the copies of the commands,

16:08.960 --> 16:11.960
LLM does not care about permission level systems.

16:11.960 --> 16:13.960
It will generate the statistics of words.

16:13.960 --> 16:16.960
And it will create an answer out of the complete copies.

16:16.960 --> 16:21.960
So you will disclose confidential information to potentially to an employee.

16:21.960 --> 16:23.960
So you can't, you can't do that.

16:23.960 --> 16:27.960
And we flag because we have this first step of doing the vector search.

16:27.960 --> 16:29.960
Based on the documents that Solov manages.

16:29.960 --> 16:33.960
We do filter out the documents that the users are not allowed to see.

16:33.960 --> 16:38.960
And this way, the answer generated is only based on what the user is allowed to see.

16:38.960 --> 16:42.960
But for the customer of the community edition,

16:42.960 --> 16:48.960
it's not easy to manage modifications in an LLM when you train it.

16:48.960 --> 16:50.960
Imagine you have train it with documents.

16:50.960 --> 16:52.960
And now the documents need to be modified.

16:52.960 --> 16:54.960
You cannot just say to be an M4, get this document.

16:54.960 --> 16:56.960
And here is a new version.

16:56.960 --> 16:57.960
Okay, there are many things to do.

16:57.960 --> 16:59.960
It was basically one being to retrain everything,

16:59.960 --> 17:01.960
which is something you may not want to do.

17:03.960 --> 17:08.960
I hope you're amazed by my roadmap compatible with the EU.

17:09.960 --> 17:12.960
So that's a roadmap for 2025.

17:12.960 --> 17:13.960
Q1.

17:13.960 --> 17:15.960
It's a recruitment generation prototype.

17:15.960 --> 17:18.960
As I said, the last PR was done yesterday.

17:18.960 --> 17:21.960
So normally everything is almost ready on data very side.

17:21.960 --> 17:24.960
So it should be complete in time.

17:24.960 --> 17:26.960
Second quarter.

17:26.960 --> 17:29.960
We'll finalize the transformation connectors.

17:29.960 --> 17:33.960
Okay, the transformation connectors are for the many full CF pipelines.

17:33.960 --> 17:35.960
So it will allow you to easily say,

17:35.960 --> 17:37.960
I want to have classification.

17:37.960 --> 17:39.960
I want to have entity extraction.

17:39.960 --> 17:40.960
I want to have translation.

17:40.960 --> 17:44.960
This type of actions when you do Vintexing phase.

17:44.960 --> 17:48.960
Second quarter as well, we provide Victor Search instead of BM25.

17:48.960 --> 17:49.960
So for the results list.

17:49.960 --> 17:54.960
So you will be able to choose either the good old BM25 or Victor Search.

17:54.960 --> 17:57.960
And then in Q3, you will have the hybrid search.

17:57.960 --> 17:59.960
Are you familiar with hybrid search?

17:59.960 --> 18:00.960
Yeah.

18:00.960 --> 18:02.960
Well, it's good.

18:02.960 --> 18:03.960
Okay, good.

18:03.960 --> 18:05.960
But the hybrid search is you do it with two queries.

18:05.960 --> 18:06.960
You do a BM25.

18:06.960 --> 18:07.960
So the base query.

18:07.960 --> 18:10.960
At the same time, you do a Victor Search query.

18:10.960 --> 18:15.960
And you merge the results into just one results set.

18:15.960 --> 18:17.960
And we will be using RRF.

18:17.960 --> 18:18.960
Reciprocare rank.

18:18.960 --> 18:19.960
Fusion.

18:19.960 --> 18:20.960
So it's always hard to say.

18:20.960 --> 18:22.960
Recipro.

18:22.960 --> 18:23.960
Okay.

18:23.960 --> 18:24.960
RRF.

18:24.960 --> 18:25.960
As an algorithm.

18:25.960 --> 18:27.960
And.

18:27.960 --> 18:29.960
Six minutes for questions.

18:29.960 --> 18:31.960
Because you're all eager to ask questions.

18:31.960 --> 18:34.960
You can go on our website for the downloads.

18:34.960 --> 18:35.960
I have already.

18:35.960 --> 18:36.960
So it's Apache license.

18:36.960 --> 18:37.960
Okay.

18:37.960 --> 18:38.960
In Java.

18:38.960 --> 18:40.960
We are not allergic to Java.

18:40.960 --> 18:42.960
If you are a doctor.

18:42.960 --> 18:43.960
Lovers.

18:43.960 --> 18:44.960
You can use the container on the curb.

18:44.960 --> 18:47.960
That's probably the quickest way for you to play with it.

18:47.960 --> 18:49.960
Lots of documentation.

18:49.960 --> 18:50.960
Okay.

18:50.960 --> 18:51.960
We are now.

18:51.960 --> 18:53.960
I think we have 300 pages of the group.

18:53.960 --> 18:55.960
Some of them are maybe.

18:55.960 --> 18:56.960
Deplicated.

18:56.960 --> 18:57.960
But we are not aware of.

18:57.960 --> 18:59.960
So if you end up with issues.

18:59.960 --> 19:01.960
But best is probably to ask us.

19:01.960 --> 19:04.960
You know, you do a GitHub discussion or GitHub issue.

19:04.960 --> 19:05.960
And you say.

19:05.960 --> 19:06.960
Is it?

19:06.960 --> 19:07.960
It's a documentation.

19:07.960 --> 19:08.960
Really.

19:08.960 --> 19:09.960
Okay.

19:09.960 --> 19:10.960
We don't know best.

19:10.960 --> 19:11.960
But it's not easy.

19:11.960 --> 19:13.960
So the code is normally on our GitHub.

19:13.960 --> 19:15.960
But we have a mirror on GitHub.

19:15.960 --> 19:18.960
But all the projects are on GitHub.

19:18.960 --> 19:19.960
And yeah.

19:19.960 --> 19:20.960
The main one is on GitHub.

19:20.960 --> 19:22.960
And the LinkedIn account.

19:22.960 --> 19:24.960
You want to follow our amazing activities.

19:24.960 --> 19:27.960
Because we just recruited the marketing and.

19:28.960 --> 19:31.960
So this day we have lots of LinkedIn contributions.

19:31.960 --> 19:32.960
So if you want to follow.

19:32.960 --> 19:33.960
That's what it was.

19:33.960 --> 19:34.960
Six months.

19:34.960 --> 19:35.960
It should follow.

19:35.960 --> 19:36.960
The.

19:36.960 --> 19:37.960
And I'm done.

19:37.960 --> 19:39.960
Thank you very much.

19:45.960 --> 19:46.960
Thank you very much.

19:46.960 --> 19:47.960
The trick.

19:47.960 --> 19:48.960
Do we have any question?

19:48.960 --> 19:49.960
Five minutes.

19:49.960 --> 19:50.960
Three.

19:50.960 --> 19:51.960
Two.

19:51.960 --> 19:52.960
Two.

19:52.960 --> 19:54.960
Two.

19:55.960 --> 19:56.960
Do you have a connector for next.

19:56.960 --> 19:58.960
Or import the documents?

19:58.960 --> 19:59.960
Uh, no.

19:59.960 --> 20:01.960
In a sense that it's it's easy.

20:01.960 --> 20:03.960
I mean, to have one without permission.

20:03.960 --> 20:04.960
It's easy because you just need to.

20:04.960 --> 20:05.960
You create an account.

20:05.960 --> 20:06.960
Next.

20:06.960 --> 20:07.960
I can't with full access to everything.

20:07.960 --> 20:08.960
And you.

20:08.960 --> 20:09.960
You index with mirror.

20:09.960 --> 20:10.960
Or which is local.

20:10.960 --> 20:14.960
Uh, but for doing it with the access control.

20:14.960 --> 20:16.960
So we asked to.

20:16.960 --> 20:17.960
A company specialized.

20:17.960 --> 20:19.960
It would not be expensive to do.

20:19.960 --> 20:20.960
But we don't have the money.

20:20.960 --> 20:21.960
For that.

20:21.960 --> 20:22.960
So.

20:22.960 --> 20:23.960
Yeah.

20:23.960 --> 20:24.960
It's doable.

20:24.960 --> 20:25.960
For release.

20:25.960 --> 20:26.960
10,000 euros.

20:26.960 --> 20:27.960
But not enough.

20:27.960 --> 20:28.960
But yeah.

20:28.960 --> 20:29.960
What would be.

20:29.960 --> 20:30.960
Thank you.

20:30.960 --> 20:32.960
Other questions.

20:32.960 --> 20:33.960
Um.

20:33.960 --> 20:34.960
Um.

20:34.960 --> 20:35.960
Is the.

20:35.960 --> 20:37.960
The index only updated.

20:37.960 --> 20:39.960
Uh, through crawling or is there.

20:39.960 --> 20:40.960
Some.

20:40.960 --> 20:41.960
So.

20:41.960 --> 20:42.960
And the passive way or is there some.

20:42.960 --> 20:45.960
Um, active way like if a document changes in one source.

20:45.960 --> 20:50.960
The source can notify the search engine to re-intex it.

20:50.960 --> 20:54.960
I mean, if it's here, uh, initially, it was thought to be both push and pull.

20:54.960 --> 20:57.960
Uh, they have implemented just the interface for the push.

20:57.960 --> 20:58.960
Okay.

20:58.960 --> 20:59.960
So for now, it's put.

20:59.960 --> 21:01.960
So we fetch the data regularly.

21:01.960 --> 21:02.960
We've crawl.

21:02.960 --> 21:03.960
Prior to crawl.

21:03.960 --> 21:04.960
So you can configure.

21:04.960 --> 21:05.960
Uh, but there is no.

21:05.960 --> 21:06.960
It.

21:06.960 --> 21:07.960
And you always crawl.

21:07.960 --> 21:08.960
Everything.

21:08.960 --> 21:09.960
I don't know.

21:09.960 --> 21:10.960
This depends on the API.

21:10.960 --> 21:11.960
So.

21:11.960 --> 21:12.960
Uh, five shares.

21:12.960 --> 21:13.960
For instance, yes.

21:13.960 --> 21:14.960
It grows everything.

21:14.960 --> 21:16.960
Although we have an agent.

21:16.960 --> 21:17.960
Uh, for.

21:17.960 --> 21:20.960
And, uh, as some be shares on windows, but monitors at the low level.

21:20.960 --> 21:22.960
Uh, the OS level, modifications.

21:22.960 --> 21:25.960
We saw those to do real delta in five shares.

21:25.960 --> 21:27.960
But for instance, for excuse.

21:27.960 --> 21:29.960
Uh, which is an amazing witty.

21:29.960 --> 21:31.960
Uh, the API gives us the delta.

21:31.960 --> 21:32.960
So it's really fast.

21:32.960 --> 21:35.960
We don't need to record everything.

21:35.960 --> 21:37.960
Um, do you have a strategy?

21:37.960 --> 21:39.960
How to deal with images and.

21:39.960 --> 21:40.960
And, uh, a lot of them.

21:40.960 --> 21:41.960
Why?

21:41.960 --> 21:44.960
I mean, let me refresh that.

21:44.960 --> 21:48.960
I mean, first, there's seem to be several strategies.

21:48.960 --> 21:49.960
How to index images.

21:49.960 --> 21:51.960
So do we have any plans for that?

21:51.960 --> 21:55.960
And the second thing is you could also consider images in the content.

21:55.960 --> 21:57.960
Uh, for the chat.

21:57.960 --> 21:59.960
So something you plan or you do.

21:59.960 --> 22:00.960
Uh, yeah.

22:00.960 --> 22:02.960
You mean as an input like we use a put to an image in the chat?

22:02.960 --> 22:06.960
No. I mean, if the content for example references an image or if the documentary index

22:06.960 --> 22:10.960
contents in image, you could provide the image to the LM.

22:10.960 --> 22:11.960
I mean, we have.

22:11.960 --> 22:12.960
Yeah.

22:12.960 --> 22:16.960
So that's that's way further than the 2025 or the map.

22:16.960 --> 22:18.960
It's how to handle the multi-modality.

22:18.960 --> 22:22.960
Uh, so we we have looked very briefly at Colpalee, for instance,

22:22.960 --> 22:25.960
for the followers of you for the multi-modality models.

22:25.960 --> 22:30.960
Um, but yeah, the graph, the holy grail of indexing this day is how to be able to really

22:30.960 --> 22:33.960
properly manage multi-modal documents.

22:33.960 --> 22:38.960
Like a document which has a graph, an image, text, formula, etc.

22:38.960 --> 22:40.960
There are ways for now.

22:41.960 --> 22:44.960
But considering where we are, it will take some time.

22:44.960 --> 22:48.960
But yeah, if you have a first pass when you do indexing, where you take everything as an image,

22:48.960 --> 22:51.960
basically, when it's looking at a text, everything is an image.

22:51.960 --> 22:55.960
And you have one LLM dedicated to detecting the types of modalities.

22:55.960 --> 22:58.960
Okay. And he says, okay, here very similar.

22:58.960 --> 23:01.960
And he calls an LLM dedicated to analyzing formulas.

23:01.960 --> 23:06.960
It stores as a metadata, a summary of what it has analyzed.

23:06.960 --> 23:08.960
So like this formula talks about this and that.

23:08.960 --> 23:10.960
This image talks about the cat, etc.

23:10.960 --> 23:12.960
And we store all that.

23:12.960 --> 23:16.960
And then when the user asks a query, the text query.

23:16.960 --> 23:18.960
So it's in the future.

23:18.960 --> 23:20.960
What the user will ask a query.

23:20.960 --> 23:23.960
We will look in this text description of the different modalities.

23:23.960 --> 23:26.960
Which ones would satisfy what the user asks?

23:26.960 --> 23:27.960
Okay.

23:27.960 --> 23:30.960
And then it will call for proper LLM specialized.

23:30.960 --> 23:35.960
So let's say I have a document where I have my financial statements as a table for the last two

23:35.960 --> 23:40.960
years. And the question is, what is the financial revenue expected for the next two years?

23:40.960 --> 23:41.960
Obviously.

23:41.960 --> 23:42.960
Obviously.

23:42.960 --> 23:48.960
The dream is that the LLM, the system will choose the LLM specialized in the graph analysis.

23:48.960 --> 23:51.960
And then do the computation for the forecast because obviously this is where it stands.

23:51.960 --> 23:53.960
So what's the goal?

23:53.960 --> 23:58.960
The pieces are more or less ver, but I heard that Colpally generates so many vectors for one single page.

23:58.960 --> 24:01.960
But it would compute in this row, it was calamity for now.

24:01.960 --> 24:03.960
So it's ongoing work.

24:03.960 --> 24:05.960
But not there yet.

24:05.960 --> 24:06.960
Thank you.

24:06.960 --> 24:08.960
Any other question?

24:08.960 --> 24:11.960
We have time for one more?

24:11.960 --> 24:12.960
No.

24:12.960 --> 24:13.960
Okay.

24:13.960 --> 24:14.960
And thank you very much.

24:14.960 --> 24:15.960
Thank you very much.

