WEBVTT

00:00.000 --> 00:10.880
So I think we are going to start.

00:10.880 --> 00:19.440
This is "When is an AI system free slash open", for lack of a better title.

00:19.440 --> 00:22.000
And I'm Richard Fontana, but I'm just the moderator.

00:22.000 --> 00:27.920
And we have an awesome set of panelists, and I think we'll just start out having everyone

00:27.920 --> 00:31.600
quickly introduce themselves.

00:31.600 --> 00:32.800
My name is Ciarán O'Riordan.

00:32.800 --> 00:37.680
I've worked on free software policy for about 20 years now, and I work for Open

00:37.680 --> 00:42.680
Forum Europe.

00:42.680 --> 00:50.560
Hello, my name is Zoë Kooyman, and I'm the executive director of the FSF.

00:50.640 --> 00:53.040
I'm Julia Ferraioli.

00:53.040 --> 01:06.880
I am here on my own, representing my own personal opinions; otherwise, most of the time, I work at AWS.

01:09.760 --> 01:15.840
Hi, I have a background of 25 years using, writing, and contributing to open source software.

01:15.840 --> 01:21.920
I'm also here on my own, so you might have thought I was affiliated with something else, but I'm just here as myself.

01:27.760 --> 01:32.800
Okay, maybe we should, I don't know if this is the most important question to get to, but

01:33.920 --> 01:38.800
I've observed that the title of this session is kind of problematic.

01:39.680 --> 01:46.560
Do any of you have thoughts on what the scope of this question ought to be?

01:48.960 --> 01:56.000
I think we're going to, at least touch upon the OSI's open source AI definition, which has captured a lot of

01:56.560 --> 02:02.720
attention and criticism over the past several months.

02:02.720 --> 02:14.160
It reuses this legislative definition, the OECD definition of AI, of a so-called AI system, which,

02:15.200 --> 02:24.240
I think, is kind of a problematic definition, and could be argued to be too broad or too narrow.

02:25.120 --> 02:30.720
When we talk about open or free software, AI, whatever you want to describe it,

02:31.680 --> 02:37.840
what do we mean by AI is basically the question, because I do want to hear your thoughts on this: should this be limited to

02:38.560 --> 02:45.600
machine learning, or a subset of machine learning, or should it be AI in the sort of academic sense, or what used to be the academic sense?

02:48.000 --> 02:56.960
So my background is, I have a bachelor's and master's in artificial intelligence and human-computer interaction,

02:57.040 --> 03:02.160
last I checked, and I've been working in open source for basically my entire career,

03:02.800 --> 03:09.440
I've done research in AI using open source software,

03:10.880 --> 03:19.760
20 years ago, I think at this point. So I think that it gets very complicated when you're talking about

03:20.320 --> 03:25.600
a system, because no one really knows what that is, despite the definition, we can't agree

03:26.000 --> 03:32.640
upon the interpretation of that definition, and it's not necessarily the right

03:34.000 --> 03:44.320
level to be talking about free and open source at the system level. It's like saying,

03:44.320 --> 03:49.680
okay, well we've got an entire stack that we're going to call free and open source, including the

03:49.680 --> 03:56.960
runnable components, which is unnecessarily complicated, and we could simplify that a lot

03:56.960 --> 04:03.760
by looking at what actually is in that system definition.

04:11.760 --> 04:17.120
I'm not sure if the question of the AI system is something that we're going to have problems with,

04:17.120 --> 04:22.000
because I know that a certain company disagrees with everyone

04:22.000 --> 04:27.200
on the stage at the moment, and they travel around the European Parliament. I'm a lobbyist;

04:27.200 --> 04:32.720
I cross paths with these people, and they have this diagram with 25 different modules, and they

04:32.720 --> 04:37.520
explain to the members of parliament, this is what an AI system is, and this is what all the

04:37.520 --> 04:43.440
different parts do. But if you have 25 modules on this slide, you're not actually expecting

04:43.520 --> 04:48.400
anyone to understand it. So that means you're expecting people to not understand it, and you're

04:48.400 --> 04:52.320
trying to put them into a position where you're hoping they'll take your word. So when I look at

04:52.320 --> 04:56.880
this set of modules, I see I look at each one and I'm like, well that's software, that's software,

04:56.880 --> 05:01.840
that's software; you end up with 24 modules that are software, and one module which is an AI model.

05:01.840 --> 05:06.720
So I don't really think in terms of an AI system, I think of a software system, and 24 parts

05:06.720 --> 05:10.720
are software, and we've already decided pretty much what to do with software, so all we really need to

05:10.720 --> 05:18.080
focus on is the AI model. I haven't yet found anyone who found a practical problem with the

05:18.080 --> 05:24.000
difference between machine learning and AI models, but I also haven't understood the distinction;

05:24.000 --> 05:27.040
I know that that would be described in tomorrow's presentation, or I think it would be described

05:27.040 --> 05:32.640
in tomorrow's presentation by the FSF. But I think in general the AI system is not the big

05:32.720 --> 05:41.040
stumbling block that we have to focus on; I think that will get sorted out over time.

05:42.160 --> 05:49.280
I think I agree with Julia that the word system is a vague concept that would do well

05:49.280 --> 05:54.000
to be better defined. I think it's also worth pointing out that for us, at least from a

05:55.920 --> 06:00.560
reception perspective, the word artificial intelligence is something that

06:01.280 --> 06:07.280
gives people the idea that the application is doing something that we don't necessarily believe

06:08.000 --> 06:13.680
them to be able to do, and so we at the FSF at least limit our wording to the

06:13.680 --> 06:22.560
words machine learning, because that's much more appropriate to what the application actually does,

06:22.560 --> 06:30.160
intelligence is something that is still specific to humans, and I think we should for now at least

06:30.160 --> 06:38.400
keep it there. And I think from where I sit as the person who I think was the first one to suggest

06:38.400 --> 06:44.320
during the definition process with the OSI that it use the OECD's definition,

06:44.320 --> 06:47.600
I actually pinged a friend of mine at the OECD at the time to be like, hey, fix this typo,

06:48.240 --> 06:54.960
while we were looking at it in the working sessions. Sorry. But my reason was: this is an emergent

06:55.040 --> 07:02.400
system, it's not intelligence, I agree, but it is trained. It's a database, in essence,

07:03.440 --> 07:09.840
it's all code except for the data, and then some properties emerge as that data gets compressed

07:09.840 --> 07:17.840
and put in a format that can be used whether to pattern match or to predict, to draw insights

07:17.840 --> 07:23.120
in ways that humans still really struggle to understand, how these things are making the outputs

07:23.200 --> 07:29.760
from the inputs that they receive, but in their use, they are automated decision-making systems;

07:30.480 --> 07:40.240
they are systems. Let me ask this question, and I don't know where this would best fit in,

07:40.240 --> 07:48.640
so I'm just starting out with it. Do you think we need a, do any of you think we need a definition?

07:49.280 --> 07:53.200
So we have the free software definition, we have the open source definition,

07:53.200 --> 08:00.720
those are flawed definitions, but they've kind of served us well for FOSS,

08:01.920 --> 08:05.520
and they've been applied in practice to things other than software,

08:06.880 --> 08:14.560
maybe you think that that fact is sort of trivial, but it is interesting. Do we need a definition

08:15.200 --> 08:20.000
as some people have been assuming or can we just build upon and interpret the existing definitions

08:20.000 --> 08:28.000
we have that work for free and open source software? I think the four freedoms were well

08:28.000 --> 08:35.280
articulated and still apply here; it's still a computer, we can build a system that is

08:35.920 --> 08:40.560
programming plus data and we want the recipients of it to have the same freedoms.

08:45.200 --> 08:54.560
This is a good question, and it's not one with a simple answer. I think the question

08:54.560 --> 09:01.200
becomes: what is the definition designed for, what is motivating the definition?

09:02.560 --> 09:11.120
If we're looking at it from a technical level for machine learning and the outputs of

09:11.120 --> 09:19.520
machine learning, whether those be models or user-facing applications,

09:23.200 --> 09:33.040
I don't think we do need a new definition for that. However, that's the

09:33.040 --> 09:41.440
kind of maybe five, ten years ago answer. If we're looking at it from a defensive

09:42.720 --> 09:50.400
position about pushing back against use of the word, or use of the term, open source,

09:50.400 --> 10:00.160
in conjunction with machine learning models, then maybe we do just to say, no, this is,

10:00.160 --> 10:12.320
this clearly is not that. I don't like that approach because I want things to be motivated from,

10:12.320 --> 10:17.600
okay, we need this. We're not just going to artificially introduce a new term, a new set of

10:17.600 --> 10:26.320
requirements, a new definition, when we have perfectly good ones that already cover both software,

10:26.400 --> 10:39.200
data, and the combination of those. So there might be a need for a new term. I don't think it needs to

10:39.200 --> 10:56.080
be called open source anything. So we thought about this as well on whether or not it should actually

10:56.560 --> 11:02.960
get a whole new definition. As more and more people are doing their computing through machine

11:02.960 --> 11:10.480
learning, the concept of machine learning is for people not quite the same as just software,

11:11.120 --> 11:20.320
and we have responded to people's requests to bring clarification to what a system like that is built

11:20.480 --> 11:27.360
like. And it's not just software, so it's software licenses, but it can also be

11:27.360 --> 11:34.640
Creative Commons licenses or other things like data, and papers that might have to be published with

11:34.640 --> 11:41.680
a completely free application, and those don't all necessarily fall within the concepts that we as the

11:41.680 --> 11:47.760
free software foundation have always advocated for, and so we're at least looking at expanding

11:51.040 --> 12:02.320
upon the information that we've given in the past, and we want to make sure that the two things

12:02.320 --> 12:06.320
are not considered the same. We want to make sure that they're not conflated, because we are

12:06.320 --> 12:12.160
very clear about what is free software. We want to stay clear on that, and machine learning is a

12:13.520 --> 12:19.040
trend that is moving forward really fast, and there's a lot of development in it, and it could go in

12:19.760 --> 12:23.200
many different directions, and we want to make sure that we're talking about two different things

12:24.320 --> 12:32.080
in approaching it. One of the issues is that, whether we want to or not, or whether we have a certain

12:32.080 --> 12:38.240
timeline, there's already, for example, Meta calling their Llama model open source, and now

12:38.240 --> 12:44.960
DeepSeek has been called open source as well, and then we also have legislation like the AI Act;

12:45.120 --> 12:49.360
it says that if your AI system is free and open source software, then you get a certain

12:49.360 --> 12:54.000
lighter regime, and then how do you know if your AI system is free and open source software,

12:54.000 --> 13:00.800
they have a 10-word definition that was copied from a page on gnu.org, and that's fantastic

13:00.800 --> 13:06.720
that they did that, but we also don't have an interpretation for how do you apply this definition

13:06.720 --> 13:11.680
to the concept of machine learning. So I think we're going to need it: people are going to

13:11.680 --> 13:18.320
be calling software free software after a model has been added, and then we have to have our own way

13:18.320 --> 13:24.080
of deciding, is that still free software or is it no longer free software, and we need a way to

13:24.080 --> 13:28.240
be able to correct people if they call something that is clearly not free or open source software,

13:28.240 --> 13:33.280
if they call it free, open source AI, then we have to have a standard that we can say to the media,

13:33.280 --> 13:38.240
for example, this is absolutely not open source AI. So I think, whether we want to or not, we

13:38.320 --> 13:42.960
need something to fill in the gap that will be created, because people are going to attach the

13:42.960 --> 13:47.840
two terms anyway, free software and this software package which now includes an AI system.

13:51.920 --> 14:00.080
So, and that goes back to the defensive posture that I was mentioning, because if the

14:00.080 --> 14:05.840
motivation is to be able to correct people when people are misusing the term open source,

14:05.920 --> 14:16.480
yes, we do need a term, I mean we can call it, I don't know, like, publicly available,

14:19.680 --> 14:28.320
we can call it, like, "available machine learning" or whatever, but I don't want it to have the

14:28.320 --> 14:36.640
term open source attached to it. So, in terms of how do we know if it's open source or not,

14:36.640 --> 14:43.680
I mean, there are well-established ways of analyzing something that's been received,

14:43.680 --> 14:48.320
a digital good to understand if it's open source software, the difference here only seems to

14:48.320 --> 14:55.280
be around the data. I would propose, at least from the kind of work I do around security,

14:55.280 --> 15:03.200
looking at: can we know certain security-essential properties about this component? If we cannot

15:03.200 --> 15:09.040
know them, if we cannot fix them, if we cannot study them or change them, then by implication,

15:09.040 --> 15:15.440
it cannot be open source, because if it were, we could do those things. So, I think you anticipated

15:15.440 --> 15:23.440
what I was going to next ask, which is also I think the, to me, the biggest question here

15:23.440 --> 15:33.760
is, or set of questions, is about source code. So, the free software definition has availability

15:33.760 --> 15:41.600
of source code as a requirement; the OSD, in talking about source code, uses this idea of

15:41.600 --> 15:46.880
the preferred form for making modifications, which is taken originally from the GPL,

15:47.840 --> 15:54.160
where it was kind of applied more narrowly. So this, to me, is, I think, a very

15:54.160 --> 15:58.720
difficult issue; I'm not sure if all of you think it is as difficult as I do.

16:00.320 --> 16:05.360
But when we think about, I mean, when we go to source code, and I think this is about model

16:05.360 --> 16:11.440
weights, basically, for the reasons you were kind of saying, when we think about what source code

16:11.520 --> 16:20.960
means, or the source code requirement means, for model weights. The issue is basically,

16:20.960 --> 16:26.320
primarily, whether training data is equivalent to the source code of the model weights.

16:27.120 --> 16:33.840
What does source code mean in this technical context? It's not source code in a strictly technical

16:33.840 --> 16:40.000
sense. So, is source code the preferred form for making modifications? Is the concern reproducibility,

16:40.000 --> 16:45.680
or is it the ability to inspect? A lot of people talk about the ability to inspect

16:46.640 --> 16:54.400
training data for evidence of bias in trained models. What do you all think?

16:56.400 --> 17:02.080
All of those are good points. I'll add a few to the list of questions. Identifying

17:02.080 --> 17:07.360
vulnerabilities or backdoors, if there's an edge case in, let's say, an image recognition model

17:07.440 --> 17:13.600
or language generation model, and certain types of inputs result in very unexpected outputs

17:13.600 --> 17:19.040
that would be potentially catastrophic if that model were used in the wild, say, as an entry

17:19.040 --> 17:26.400
sensor on a door in a security facility. If you cannot detect those, or once detected, if you

17:26.400 --> 17:32.800
lack enough information or source code of some type, to fix it yourself, then it's not open source.

17:32.880 --> 17:38.400
Right, if you don't have that digital sovereignty to fix it, even once you know about the

17:38.400 --> 17:46.400
vulnerability, how could you call it open source? I appreciate that you're bringing this security

17:46.400 --> 17:53.760
point of view, because I'm going to take a different approach, which is about the empowerment that

17:53.760 --> 18:00.960
free and open source software gives each and every one of us, which is the ability to, at a fundamental

18:00.960 --> 18:12.400
level understand what is going on in the code that you're using. And while for machine learning,

18:13.120 --> 18:19.040
part of it is heavily dependent on code, the type of algorithm you choose to train your model,

18:19.040 --> 18:27.040
or whatever other machine learning approach you're using, it is also incredibly heavily dependent

18:27.120 --> 18:39.920
on the data. If you use the same pipeline but change the data source or sources, then you're

18:39.920 --> 18:49.920
going to get a different model. Some exceptions apply. So, I want everyone, every person who

18:50.000 --> 19:01.120
wants to use a machine learning model to have that informed consent. This is what that model is

19:01.120 --> 19:10.800
bringing to me. This is what's going to be represented abstractly, but I want to know what risks

19:10.800 --> 19:17.600
that I'm taking on from a variety of perspectives. And I want everybody else who's adopting models

19:17.680 --> 19:28.560
to have that ability as well. Yeah, I actually don't have that much to add. We

19:31.520 --> 19:39.520
want to make sure people can fully understand and study the application to the full extent. And of

19:39.520 --> 19:45.040
course, the key word that we tend to use at the Free Software Foundation is the word control. We

19:45.040 --> 19:51.280
want to make sure that you can control your technology. And I think when you bring it down,

19:51.280 --> 19:56.400
it sort of boils down to all of the different elements that have been brought forward by the people

19:56.400 --> 20:04.400
on this stage. So, I think the question of source code is the core question. And so, it's almost

20:04.400 --> 20:08.720
asking for a definition and none of us have the definition. I think this is something we have to

20:08.720 --> 20:13.920
realize that we're still in the early stages of discussing how we're going to deal with this issue.

20:13.920 --> 20:21.280
And we might be 25% of the way to actually getting close to a solution. So, I think at the moment,

20:21.280 --> 20:28.800
do we or do I have a definition for source code? The answer is no. And we're hitting

20:28.800 --> 20:34.800
some dead ends in our attempts. So, when I'm looking for the opportunities, I do see

20:34.800 --> 20:39.360
that in the private sector and on the individual level, a lot of people are using AI already

20:39.360 --> 20:45.840
and they're using it through Facebook or through apps that are using non-free AI systems.

20:46.400 --> 20:51.680
And so, I think that's a difficult area, but there's also an area that hasn't yet adopted AI

20:51.680 --> 20:57.360
very much and that's the public sector. And that's also the area where there is a legal

20:57.360 --> 21:02.560
obligation to be transparent and to be aware of what you're doing with your citizens' data,

21:02.560 --> 21:07.520
and to be in control so that if you detect bias, you're able to correct that bias. And so,

21:07.600 --> 21:11.920
being able to detect bias and being able to correct that is basically another way of saying

21:11.920 --> 21:16.800
study and modify, which is the summary of the difficult questions of applying the free software

21:16.800 --> 21:22.400
definition to AI. So, I think rather than trying to come up with a definition of source code right

21:22.400 --> 21:26.880
now, or at least we should be trying, but rather than thinking that we might have it already,

21:26.880 --> 21:32.320
maybe the first step, which we can do now, would be to give the public sector a set of principles

21:32.320 --> 21:38.880
for what they should be able to do with any AI system that they plan on using. And by doing this,

21:38.880 --> 21:44.960
we at least get them to also think about how they would implement a policy that would give them

21:44.960 --> 21:49.840
the ability to study and modify, and it buys us some time, because we're still working on this as well.

21:51.680 --> 21:57.280
Do any of you think that there's an overemphasis, by some people talking about this issue, on

21:58.080 --> 22:04.320
modification or like the term modification? Because like one thing I've heard from some people is

22:05.920 --> 22:10.240
here's a couple of things I've heard that I think are kind of related. One is that,

22:11.280 --> 22:17.760
well in practice most people modify pre-trained models. So they aren't training

22:18.640 --> 22:23.040
models from scratch. So it isn't useful for them to have

22:23.040 --> 22:32.800
knowledge of and access to the corpus of training data to exercise the right to

22:32.800 --> 22:37.840
modify in practice. The other thing I've heard that I think is a little bit connected to this is that

22:39.120 --> 22:46.800
it just wouldn't be useful for something like an LLM; the amount of training data is

22:46.800 --> 22:53.200
so vast that if you just threw this enormous amount of training data at a user,

22:53.200 --> 22:57.920
they wouldn't know what to do with it anyway; they wouldn't have the hardware resources to make

22:57.920 --> 23:05.200
use of it. Is there anything to that argument? I think, off the cuff, it's pretty hard to say; if that

23:05.200 --> 23:16.320
were true, LAION and Common Crawl wouldn't be so widely downloaded. I think this illustrates,

23:16.400 --> 23:26.240
or shines a light on, a very common problem with the discussion, which is that

23:28.400 --> 23:34.880
the discussions typically focus on LLMs, and when we're talking about machine learning,

23:35.920 --> 23:42.880
there's way, way more than LLMs, and the rise of small language models,

23:43.040 --> 23:49.920
which, as much as I'm excited about anything these days, I'm really excited about; that's one of the

23:49.920 --> 23:59.280
areas that actually delivers on the idea of small language models. So we don't want to center

23:59.280 --> 24:08.080
the discussions on one specific part of the technology and then subject the rest of it to the same

24:08.160 --> 24:20.480
maybe dilution of the free and open source principles that we're seeing when

24:21.120 --> 24:32.480
this discussion centers around LLMs. So I need to pass that off so that someone else can pick it up.

24:33.360 --> 24:44.880
So I think the question is: is fine-tuning or incremental training sufficient to be able to take control,

24:44.880 --> 24:51.040
to make this software behave as you wish? And so I was very hopeful about

24:51.040 --> 24:57.360
this last year because there seems to be two ways that we could have an AI model that could be

24:57.360 --> 25:02.000
studied and modified: one would be to get the training data and training software and the process,

25:02.000 --> 25:06.080
and the second would be to directly study and modify the model, that'd be the retraining or

25:06.080 --> 25:11.680
fine-tuning. I looked into the second option because the first option is so difficult: with

25:11.680 --> 25:17.440
training data, sometimes there's copyright problems, there's privacy problems, sometimes

25:17.440 --> 25:23.360
the data doesn't exist by the time the model has been trained, sometimes the training would take a

25:23.360 --> 25:29.840
data center that is just not available to almost anyone on the planet. And so the training data

25:29.840 --> 25:35.680
approach looked like a dead end, and then I focused on directly modifying the model.

25:36.320 --> 25:42.480
You can do retraining, and it does give you some control, but I haven't yet found an

25:42.480 --> 25:46.160
answer to the question of: is it enough, does it really give you enough control?

25:56.000 --> 26:04.240
Yeah, wait... because actually, yeah, I mean, you go first.

25:56.000 --> 26:04.240
So specifically, retraining, additional training, fine-tuning of the model is mathematically shown to be

26:04.240 --> 26:11.440
insufficient to remove hidden layers and backdoors. There's plenty of good articles out there,

26:11.440 --> 26:17.040
some scholarly research, some by large security companies. I think CrowdStrike is one that's

26:17.040 --> 26:21.520
done some good research that I often point people to, to try to understand it; they wrote a nice walk-

26:21.520 --> 26:28.640
through for their customers and posted it. In ongoing learning systems, systems that are built

26:28.640 --> 26:32.240
and then continuously learn based on what they experience in the field, like detecting

26:33.200 --> 26:42.400
malicious attacks against networks: if they learn something that reduces their usefulness

26:42.400 --> 26:47.200
or adds a backdoor, you cannot remove it later; you have to roll back the training to a

26:47.200 --> 26:53.120
checkpoint and start over after you scrub the bad inputs from the data set; it's the only way. I think

26:53.120 --> 27:02.080
that tells me enough: you have to be able to modify the data set to retrain. So, thanks.

27:02.080 --> 27:06.800
I'm not a machine learning expert so I'm happy to hear what you're saying because it's something

27:06.800 --> 27:12.080
that we have been discussing as well, because it has been brought to us as well that

27:12.080 --> 27:19.280
people indeed say, like, retraining is actually possible, you can control your entire stack with that. But

27:19.280 --> 27:25.920
I'm glad to hear your thoughts about that, which assure us that we

27:27.680 --> 27:32.320
are on the right path there. The other thing that I wanted to point out is that

27:32.720 --> 27:44.320
retraining is in many cases possibly something that works for people that want to indeed modify

27:45.760 --> 27:52.000
a machine learning application. And the other objection was, of course, the vastness of the amount of

27:52.000 --> 27:58.320
training data that can go into a system, and that of course can run into availability issues, and you can

27:58.480 --> 28:05.280
run into privacy issues and computing power and all of those things and that makes it

28:06.480 --> 28:14.960
difficult. I think, to sum the two objections up: we genuinely believe that we should be

28:14.960 --> 28:22.320
creating a definition or criteria here at least that apply to all systems and not most systems

28:22.640 --> 28:32.080
people do train their own models people do recreate their own models and

28:33.840 --> 28:40.240
the idea that people don't is similar to arguments against open source software,

28:41.680 --> 28:46.880
when most people did not have the capability to recompile their kernel

28:47.120 --> 29:04.240
they did not have the ability to modify, I don't know, Fortran, but very passionate and principled

29:04.240 --> 29:11.120
folks pushed forward with the idea that the source code should be available for you to look at

29:11.120 --> 29:17.600
and study so just because it's hard is not a good enough reason not to push for it

29:25.440 --> 29:36.000
So, a question specifically about the OSI's OSAID, and maybe this should be specifically

29:36.000 --> 29:43.600
for Ciarán, because I think probably most of you would... What I was

29:43.600 --> 29:51.120
going to ask is: does anyone want to say anything in support of the way the OSI has approached this

29:51.120 --> 29:56.960
issue, which I would say is basically a compromise, a policy compromise, where

29:56.960 --> 30:04.800
they've said you have to provide, as I recall, sufficiently detailed information about the training data

30:04.800 --> 30:13.360
for a model such that a skilled person, a person skilled in the art, I guess, can maybe reproduce

30:14.560 --> 30:20.160
an equivalent, technically equivalent, model. I think most of you would probably reject that, but

30:20.160 --> 30:31.760
Ciarán, do you have a... No, that's indeed the weak link in the OSI's open source AI definition.

30:32.240 --> 30:45.520
The definition says that you have to distribute the training

30:45.520 --> 30:49.840
data unless that's not possible and if that's not possible then you have to give sufficient

30:49.840 --> 30:56.080
information about the training data such that somebody else can rebuild a comparable system

30:56.240 --> 31:01.120
or maybe study and modify it, I can't remember the detail. But this part of the definition,

31:01.840 --> 31:08.480
I expect, will change, possibly drastically, in the next version, and over the coming years

31:08.480 --> 31:16.480
in general. I suggested to my employer that we should endorse OSI's work, and, you know, we added

31:16.480 --> 31:22.960
an explanation to our endorsement. And the reason is that the definition that has been published by OSI

31:23.600 --> 31:29.440
is enough already to be able to look at that definition and say, well, you know, that explains why

31:29.440 --> 31:35.920
Gemini and DeepSeek and Llama and online AI services are not open source AI, and really that's

31:35.920 --> 31:41.360
all we have right now; we don't have a definition that covers all of the cases. We do have

31:41.360 --> 31:45.920
Meta running around the European Parliament saying that their AI things are open source,

31:45.920 --> 31:50.480
and we do have European legislation which says you get a lighter regime if your AI is free

31:50.480 --> 31:56.720
and open source software. So having something that we can use is positive, and, you know, it was

31:56.720 --> 32:03.520
flagged as version 1.0 there will be a process for developing a version 2.0 or 1.1 or you know

32:03.520 --> 32:11.920
and so the second reason why I think it was worth supporting this work is that I think like we

32:11.920 --> 32:20.400
had disunity 25, 26 years ago, when the open source term was created, and this

32:20.400 --> 32:27.440
took 10, 15 years to settle down, if it even did. The situation back then was actually

32:28.640 --> 32:33.360
a lot easier, because back then it was two groups of people who both cared about the software

32:33.360 --> 32:39.200
who were disagreeing about something. Today we have a situation where there might

32:39.200 --> 32:44.640
also be two or multiple groups that care about the software but there are also very large companies

32:44.640 --> 32:48.720
outside that would love to exploit the software, or would love the software to disappear. And I think,

32:49.120 --> 32:57.440
by having disunity right now, this year, we have a much higher risk that we give entities that

32:57.440 --> 33:04.960
would like to harm the free and open source ecosystem a lever. And so I think that the

33:04.960 --> 33:10.160
definition, in particular the part about what you do when the training data isn't available, that needs

33:10.160 --> 33:16.480
serious work. It's going to take probably years. I hope everyone will get involved, and this is part

33:16.960 --> 33:24.320
of that. And I think it's more that the unity of everyone around the work is the important thing,

33:24.320 --> 33:32.480
rather than the text of this version 1.0, which will change. It's very difficult for me to listen

33:32.480 --> 33:41.920
to that, because I just don't see a reason to adapt what we believe in just because the OSI

33:41.920 --> 33:56.080
wanted to publish a definition. And I think also, on some of the smaller arguments that you

33:56.080 --> 34:02.560
made: I mean, the Llama license is non-free in any interpretation that you can go with, so they

34:02.560 --> 34:07.520
can call themselves open source or free software all day long, but that's just not what it is. And I think

34:07.520 --> 34:15.440
most people also know that. And from what I understand, I don't walk the hallways in Brussels, but

34:15.440 --> 34:20.480
as far as I understand, most people in Brussels actually already have a concept of what open source

34:20.480 --> 34:30.880
is supposed to mean in these laws. And I think the way that that will develop, it doesn't necessarily

34:30.880 --> 34:35.840
have to be something that we need to rush towards. We need to make sure that we do it right

34:35.920 --> 34:43.760
and not fail because we see all kinds of issues with not publishing something today.

34:47.360 --> 34:59.120
I hear your concerns, I do, and I understand the motivation behind that. I also agree that we

34:59.120 --> 35:06.880
shouldn't compromise our principles for the sake of convenience. Here's the thing: if it's a 1.0,

35:06.880 --> 35:11.840
it's got to be workable. You have to assume that's the last thing that's ever going to be

35:11.840 --> 35:21.280
listened to or taken in. If it's a 1.0, then as long as people satisfy

35:21.280 --> 35:28.640
the requirements as spelled out in 1.0, that's what's going to be considered open source

35:28.640 --> 35:42.080
AI for the foreseeable future. The disunity is unforced; there could have been unity much

35:42.080 --> 35:50.240
earlier on in the process. The disunity that arose around data information, and the requirements

35:50.480 --> 35:58.960
for data to be made available or not: that disagreement was completely unforced. And so we could have

35:58.960 --> 36:08.960
been much further along if more people with the expertise in machine learning and open source had

36:08.960 --> 36:20.160
been allowed at the table. So, to your opening point: there are, I think, two camps,

36:21.120 --> 36:27.440
those who are ready to compromise the values, the principles of open source, in the name of consensus

36:27.440 --> 36:33.120
right now, to have a workable definition that achieves a modicum of forward progress, and those who

36:33.120 --> 36:40.000
say our principles really stand for themselves and should not be compromised. That side was unfortunately,

36:41.360 --> 36:46.080
I think, pretty overruled in the development of the definition. And so the principle

36:46.640 --> 36:51.360
that the users, the recipients of something, need to be able to enjoy the

36:51.760 --> 36:56.400
fullness of the four freedoms of free software, I don't think that should be compromised.

36:56.400 --> 37:00.720
I do agree that moving forward together is good, but we do need to find a way. And a number of

37:00.720 --> 37:08.800
proposals for how to do that, most of which revolved around using two terms, for example open weights or

37:08.800 --> 37:17.760
open science, address that gap. Yeah, that was actually a question I was thinking of asking: does anyone

37:18.320 --> 37:25.360
support this idea of, you know, solving this kind of debate or disagreement by having

37:26.320 --> 37:32.720
two different levels of openness, or two terms, whether it's open weights, open source, or something else?

37:32.800 --> 37:41.440
I've heard many proposals. But is this such a complex domain, I guess, that you

37:41.440 --> 37:47.840
really need them, that you can't solve the problem without having two different layers or two

37:47.840 --> 37:53.840
different types of openness or freeness? There's tremendous value to a recipient to be

37:53.840 --> 37:58.960
able to use the model as they see fit. I think everyone that I've ever spoken to agrees on that:

37:59.920 --> 38:05.520
that freedom to use. And the ability to do last-layer training or a little bit of customization, which is the

38:05.520 --> 38:12.400
freedom to study and modify to a degree, also very, very important. There's a whole large

38:13.440 --> 38:17.680
swath, I would say most of the models out there today, that would fall in the category where they cannot

38:17.680 --> 38:23.440
distribute the data, for whatever legal reason, or lack of availability, or size, as discussed already.

38:24.400 --> 38:29.520
Sharing the data gets complicated. There's value in that, but there's a higher principle

38:29.520 --> 38:38.320
in the full freedom to study and modify. That's where I draw the line. Is there utility in having

38:38.320 --> 38:47.040
multiple categories of transparency? Sure. It gets very complicated, though, because people have

38:47.040 --> 38:56.640
a hard enough time with the idea of free and open source software already. So there is utility, just

38:56.640 --> 39:07.440
like there's great utility in having freeware and open source software. And I would say that

39:07.440 --> 39:13.280
without the data, even with open weights, you're looking more towards kind of a freeware

39:13.280 --> 39:24.000
model than anything that could realistically be called open source, or open-something, as applied

39:24.000 --> 39:33.040
to machine learning models. So this is where, yes, having different specifications for

39:33.040 --> 39:41.920
what level of transparency you have in the system or model being discussed is useful. It's

39:41.920 --> 39:47.360
the practice that gets hard, because people like simple. People like a checkbox that's yes or no:

39:48.080 --> 40:02.320
is this free and open source, yes or no? Yeah, well, I mean, we do go by this binary of free

40:02.320 --> 40:10.800
versus non-free, and if something is non-free then it's non-free, and not a lot more. And that is something

40:10.800 --> 40:16.640
that we do want to hold on to as much as possible. But even we at the FSFE see that in some cases it can be,

40:17.920 --> 40:23.280
you know... Traditionally the FSFE has always said that if something is non-free it's also unethical,

40:24.160 --> 40:30.160
and we of course also see at the FSFE that there is difficulty there, that it can actually be

40:30.160 --> 40:37.520
that an application that is being built for ethical reasons cannot release some of its training data. And

40:37.600 --> 40:42.800
that's an issue that we struggle with because we really don't want to create an extra level of

40:43.680 --> 40:47.680
discussion there. We don't want to give people an out, so we want to make sure that

40:47.680 --> 40:53.520
that free/non-free binary continues existing, because as soon as you start eating away at it, it will

40:53.520 --> 41:00.560
eat away at your freedom. So, yeah. I do support the binary approach. Like, from what I've

41:00.560 --> 41:05.200
generally seen, companies that don't share our interests, they talk about spectrum, and they talk

41:05.200 --> 41:09.760
about different degrees, and they have a complicated system. And I'm usually just suspicious of

41:09.760 --> 41:15.680
people who present complicated systems and then expect politicians to draw up a policy based

41:15.680 --> 41:22.400
on taking their word for it. But if I could quickly respond to the previous question: I think the

41:22.400 --> 41:29.040
focus is incorrect. I think focusing on who endorsed this document, or who made what comment, or

41:29.040 --> 41:33.040
who participated in this forum... I think in five years' time we'll have forgotten all these things. I think,

41:33.040 --> 41:37.600
like, it's a 1.0, but it was specifically called a 1.0 because they said we're going to change it

41:37.600 --> 41:42.160
in the future; this is not, you know, it's not going to be set in stone. And I think that we

41:43.200 --> 41:47.920
have to avoid, you know, splitting into two camps. Everyone that I've talked to about AI

41:48.720 --> 41:52.480
in the last few days, at least, I've been telling them the FSFE has a presentation in the

41:52.480 --> 41:55.440
K building. And do you know where the K building is? It's over the top of the hill a little bit

41:55.440 --> 42:01.680
still further; it's the big one. You know, this is not a question of you support this or that;

42:02.320 --> 42:10.720
you know, either you share these values or you don't. So I think that the dynamic of all working together,

42:10.720 --> 42:15.680
you know, we're so early in this; like, we've done maybe 25% of the work, and there's so much more to

42:15.680 --> 42:21.360
do. And I think that, you know, we can all fall out maybe at the end, but don't fall out at the start.

42:21.360 --> 42:28.640
That's, I guess, my point. To quickly respond to that, and also to say that my comment earlier

42:28.640 --> 42:32.400
made it seem like I don't respect the work that was done. I want to make sure that it's

42:32.400 --> 42:39.200
clear: I respect all the work that went into getting them to where they are. I just don't necessarily

42:39.200 --> 42:45.520
believe that we should let ourselves be rushed by outside influences, and I think we need to be

42:45.520 --> 42:51.920
careful and sure about what we end up publishing. And it's fine to be aspirational in that

42:51.920 --> 43:01.280
and set an example that people can then live up to. And then I also find it

43:01.280 --> 43:07.040
challenging to think about a definition 1.0 that is definitely going to change as a definition,

43:07.680 --> 43:12.720
and so, yeah, that's a challenge that I have.

43:12.960 --> 43:22.480
Yeah, and I don't disagree with what either of you is saying. I think that moving

43:22.480 --> 43:31.680
together is good, but it's also about knowing when you're not accurately representing what's

43:31.680 --> 43:38.000
being discussed and what's being said, and we saw a lot of that in the early process.

43:38.320 --> 43:46.960
One of my big concerns with open source is you don't know who's leaving,

43:48.240 --> 43:54.400
generally, and you definitely don't know who never showed up in the first place. So

43:55.760 --> 44:02.560
we don't know whose voices went unrepresented, and we don't know whose voices were

44:02.560 --> 44:12.960
silenced, and we saw that. So it's not necessarily about the idea, the ambition, the concept of

44:13.840 --> 44:19.680
figuring out a way to navigate these challenges together; it's a matter of execution.

44:21.920 --> 44:28.000
So, unfortunately, we're running out of time. There were so many other issues we could have

44:28.080 --> 44:35.040
gotten to, but maybe we could just give everyone a chance to say whatever they want to about this

44:35.040 --> 44:41.600
topic. We'll go around and kind of close on that, if you want to say anything.

44:43.360 --> 44:47.840
I guess the most important thing I've thought of recently is just the idea of the public sector

44:47.840 --> 44:52.400
and, you know, maybe the next step is that we can make progress by explaining to the public sector

44:52.400 --> 44:57.280
that there's an obligation to detect and fix bias, which is study and modify, and that could

44:57.280 --> 45:01.120
bring them into the discussion. They could even help us get to a final decision on this.

45:03.920 --> 45:10.080
We have a talk about this tomorrow in a different building, so come over there as well if you

45:10.080 --> 45:13.680
want to add to this question or ask us more questions. We'll try and present at least

45:14.880 --> 45:23.040
a more complete image of what the FSFE's stance on this particular issue is. And I guess as a closing call,

45:23.120 --> 45:27.920
I would like to say to people: if you are a machine learning expert, or if you

45:27.920 --> 45:33.440
feel like your opinion is not heard, or you feel like it's worth having a discussion with

45:33.440 --> 45:39.120
us about anything that we've published so far, then please do contact us, because we want to hear

45:39.120 --> 45:44.160
your voices. And, you know, we need to learn from the people in this room, the experts, but also

45:44.160 --> 45:51.280
the believers and supporters of what we have all been doing all of these years. So, yeah, contact us

45:51.280 --> 45:59.040
and we'll discuss. My parting thought is this: when you're thinking about machine learning

45:59.040 --> 46:06.480
and open source, try to think about all of the benefit that you've gotten throughout the course

46:06.480 --> 46:12.960
of your lives from free and open source software, and let's not rob future generations of those

46:13.040 --> 46:27.040
same benefits and opportunities in the name of expediency. Nothing to add to that. So I think we'd better

46:27.040 --> 46:37.920
end it there because of time. I don't know if there's any time for a question? No? Okay, well, thank

46:37.920 --> 46:47.960
you to our great panelists.

