WEBVTT

00:00.000 --> 00:12.000
Hello, good afternoon to everybody here. Good morning or hopefully not good middle of the night to folks out on the internet.

00:12.000 --> 00:17.000
I am so obnoxiously excited to be here with you all today.

00:17.000 --> 00:25.000
I'm going to talk to you about multi-lingual speech technologies for a global world, which is a bit like saying ATM machine, global world.

00:25.000 --> 00:30.000
It's a world world; that's kind of the point.

00:30.000 --> 00:37.000
My name is Jess and I work for the Common Voice project over at the Mozilla Foundation.

00:37.000 --> 00:44.000
I am absolutely needlessly intense about Birmingham being the best British city.

00:44.000 --> 00:53.000
For folks watching on the video, I'm the only one miked, but somebody just, like, nodded in agreement.

00:53.000 --> 00:59.000
I'm also really excited about communication and human languages.

00:59.000 --> 01:03.000
I'm really bad at learning languages and I have no self-control.

01:03.000 --> 01:10.000
Please, please do not talk to me after this and tell me anything interesting about your language.

01:10.000 --> 01:15.000
You will ruin my summer. Please, please, please don't.

01:15.000 --> 01:20.000
We're going to be talking about data and models and multi-lingual models.

01:20.000 --> 01:27.000
So the first thing I wanted to do was just go ahead and make the nod to AI hype.

01:27.000 --> 01:33.000
Are these models going to change the world? Are these part of Gen AI? Is this the most important thing?

01:33.000 --> 01:35.000
Are they going to replace programmers?

01:35.000 --> 01:42.000
Joyously, this is so far above my pay grade that I just get to sort of nod and bump along.

01:42.000 --> 01:52.000
In a non-editorial way though, I'd like to suggest that maybe we could talk about something practical and useful.

01:52.000 --> 01:58.000
I want to talk to you about multi-lingual speech technologies.

01:58.000 --> 02:05.000
And I want to talk to you about speech technologies because hopeful technology in 2025 feels a little bit rare.

02:05.000 --> 02:08.000
So I'd love to see if we can all be excited together.

02:08.000 --> 02:17.000
If any of you are eventually going to be building some of these, I'd love to kill any excuses you may have to not make them linguistically inclusive.

02:17.000 --> 02:26.000
But for those of you who are using these speech technologies, I'd love for you to be really, really aggressive out in the world about asking: why doesn't this understand me?

02:26.000 --> 02:31.000
And why doesn't this understand people who aren't like me?

02:31.000 --> 02:37.000
Speech technologies are so boring. They are joyously boring.

02:37.000 --> 02:43.000
Almost all of them have terrible diagrams, and this was my favorite one which is quite good.

02:43.000 --> 02:47.000
A research team, Alsharhan and Ramsay,

02:47.000 --> 02:54.000
did this really fantastic one, which I think walks us through how ASR, automatic speech recognition,

02:55.000 --> 03:04.000
or speech-to-text, works: turning speech waveforms into a type of input a computer can recognize and act on.

03:04.000 --> 03:13.000
So we've got a bucket, because that's how we like our data: just a cylinder of transcribed speech data.

03:13.000 --> 03:19.000
So that's going to be clips of people talking and then text associated with that.

03:19.000 --> 03:29.000
And that gets built into a model to say, hey, when you hear these waveforms, it's associated with these words, this orthography.

03:29.000 --> 03:38.000
And then we've got text data that also feeds into a model for what we think a speaker is likely to be saying, and a model for pronunciation.

03:38.000 --> 03:42.000
So in this dialect, in this language, how are things pronounced?

03:42.000 --> 03:48.000
When a speaker comes to a speech technology, the technology will extract out the features.

03:48.000 --> 03:52.000
Usually, we'll get a cool look at the shape of those words.

03:52.000 --> 04:00.000
And all three of these models are going to feed into a decoder, and the thing I love best is, human speech is messy.

04:00.000 --> 04:03.000
Human speech is wild, even in the best of times.

04:03.000 --> 04:09.000
And speech technologies, automatic speech recognition is going to spit out hypothesized text.

04:09.000 --> 04:11.000
What we think that person said.
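
NOTE
The pipeline just described, an acoustic model, a language model, and a pronunciation model all feeding a decoder, can be sketched as a toy score combination. This is an editorial illustration, not Common Voice code: all the scores are invented, and a real decoder searches over huge lattices of hypotheses rather than two candidates.

```python
import math

# Toy decoder step: pick the transcription that best balances what the
# acoustic model heard against what the language model thinks is plausible.
def decode(candidates, acoustic_scores, lm_scores, lm_weight=0.5):
    """Return the candidate with the best combined log-probability."""
    best, best_score = None, -math.inf
    for text in candidates:
        # Log-probabilities add; lm_weight trades off the two models.
        score = acoustic_scores[text] + lm_weight * lm_scores[text]
        if score > best_score:
            best, best_score = text, score
    return best

candidates = ["recognise speech", "wreck a nice beach"]
acoustic = {"recognise speech": -11.2, "wreck a nice beach": -10.9}  # acoustics slightly prefer the mishearing
lm = {"recognise speech": -4.1, "wreck a nice beach": -9.7}          # the language model rescues it

print(decode(candidates, acoustic, lm))  # recognise speech
```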

04:11.000 --> 04:14.000
Is anybody Scottish?

04:14.000 --> 04:16.000
Cool.

04:16.000 --> 04:20.000
This works better or worse in some languages.

04:20.000 --> 04:25.000
And this works better and worse for some accents and variants and dialects.

04:25.000 --> 04:27.000
I'm not picking on Scottish people.

04:27.000 --> 04:36.000
They just seem to have the hardest time with speech technologies. And speech technologies really aren't just voice assistants.

04:36.000 --> 04:40.000
When I give this talk in other types of mainstream conferences, I love to ask,

04:40.000 --> 04:45.000
like, hey, who uses those little things that live in your house?

04:45.000 --> 04:48.000
And totally don't listen to you when you don't use the key words.

04:48.000 --> 04:54.000
Who uses speech assistants, voice assistants, or those, oh, oh, cool.

04:54.000 --> 05:00.000
I mean, I'm, yeah, me too.

05:00.000 --> 05:07.000
I would definitely do the same. So, speech technologies are boring, but they're also not just

05:07.000 --> 05:09.000
Siri or Alexa.

05:09.000 --> 05:13.000
They're really, really often exciting things.

05:13.000 --> 05:22.000
So they're not perfect, but real-time interpretation applications feel magical.

05:22.000 --> 05:28.000
I've seen them used to gather and share livestock disease information for pastoral farmers.

05:28.000 --> 05:31.000
Used for remote medical services platforms.

05:31.000 --> 05:35.000
So the ability to say, hey, I'm bleeding really bad.

05:35.000 --> 05:36.000
What can I do?

05:36.000 --> 05:40.000
It can be quite good when you don't have both hands free at the moment.

05:40.000 --> 05:44.000
Used in assistive technologies, which I absolutely love.

05:44.000 --> 05:47.000
They power the automatic transcriptions

05:47.000 --> 05:51.000
We're seeing more and more on videos for whatever value that's worth.

05:51.000 --> 06:00.000
And really at the end of the day, any time you talk to a machine, we're seeing ASR or speech-to-text at work.

06:00.000 --> 06:05.000
I'm hopeful because this is huge for lowering barriers.

06:05.000 --> 06:08.000
You don't need literacy skills.

06:08.000 --> 06:12.000
And all of us are hanging out in a well-developed Western European city.

06:12.000 --> 06:15.000
We know that a lot of people can't read.

06:15.000 --> 06:18.000
This could be due to cognitive challenges.

06:18.000 --> 06:21.000
This could be coming from a background where there's less literacy.

06:21.000 --> 06:28.000
I kind of don't care why. What I do care about is that people deserve access to this technology.

06:28.000 --> 06:33.000
And if we can remove literacy as a barrier and let people interact through speech,

06:33.000 --> 06:37.000
at least to me, that's so dazzlingly exciting.

06:37.000 --> 06:42.000
You don't need to use your hands either, whether because you don't have use of them now,

06:42.000 --> 06:45.000
or you're doing something else with them.

06:45.000 --> 06:47.000
You can use them.

06:47.000 --> 06:52.000
You can use speech technologies while probably not driving your car, please.

06:52.000 --> 06:53.000
And thank you.

06:53.000 --> 06:57.000
I'm not sure about local laws, but while using equipment or machinery,

06:57.000 --> 07:04.000
while doing other things, and speaking requires less directed focus than writing.

07:04.000 --> 07:09.000
If I have to type something, I need a screen in my field of vision.

07:09.000 --> 07:13.000
I should probably almost definitely not do this while driving or operating machinery.

07:13.000 --> 07:19.000
But I could probably use speech technologies while chatting with you all.

07:19.000 --> 07:26.000
And interactions with speech technologies mirror the ways many of us communicate with each other every day.

07:26.000 --> 07:34.000
This isn't a main point, but the ability to talk to our computers is retro sci-fi magic.

07:34.000 --> 07:42.000
They're probably not going to love us back, but the brief opportunity to feel like they talk to us is pretty cool.

07:42.000 --> 07:46.000
And linguistic exclusion, so saying,

07:46.000 --> 07:52.000
hey, these speech technologies work, but not for everyone, has huge impacts.

07:52.000 --> 07:57.000
When we're seeing people come to the web for the first time, whether in speech or text,

07:57.000 --> 08:02.000
we're seeing people join us on unequal hardware, with unequal connectivity,

08:02.000 --> 08:10.000
but we're also asking users to join us, often in a second or third language, or in a language they don't primarily speak.

08:10.000 --> 08:15.000
And language, this is a thing I bang on about all the time,

08:15.000 --> 08:18.000
but languages die every day.

08:18.000 --> 08:21.000
And the languages that die are the languages that don't get used.

08:21.000 --> 08:26.000
The languages we use are the ones we include in our tooling.

08:26.000 --> 08:38.000
And accent and variant-based barriers to speech technologies mean that they develop additional barriers based on class and race and region.

08:38.000 --> 08:48.000
Speech technology isn't just voice assistants, but voice assistants are a fantastic sort of lens through which to see what works and what doesn't.

08:48.000 --> 08:56.000
How many languages there are, and what's a language versus what's a variant, is kind of the tabs-versus-spaces of linguistics.

08:56.000 --> 09:00.000
But there are definitely more than 7,000 languages in the world.

09:00.000 --> 09:04.000
Voice assistants work really, really well with about 20.

09:04.000 --> 09:15.000
I'd go ahead and wager that we've got more than 20 first-language speakers of different languages here today.

09:15.000 --> 09:25.000
This means that the datasets that power them are coming from backgrounds that underrepresent and lock out people of color,

09:25.000 --> 09:30.000
people from indigenous backgrounds, not just different language backgrounds.

09:30.000 --> 09:39.000
While I'm super excited about speech technologies and how hopeful they are, they're not being built for everybody right now.

09:39.000 --> 09:44.000
When we saw this diagram, I was pointing at all of the things I was excited about.

09:44.000 --> 09:58.000
One of the things I'm also excited about is that the research that produced this diagram I love is actually about the impact of gender and dialect and training size.

09:58.000 --> 10:06.000
Gender and dialect in the training data for Arabic automatic speech recognition.

10:06.000 --> 10:09.000
And this is broken because of the datasets.

10:09.000 --> 10:19.000
When we come back here and see that we've got this transcribed speech data that begins all of our processes for doing automatic speech recognition,

10:19.000 --> 10:27.000
that data not being there or not being included for these models to be trained kind of breaks a lot of things.

10:27.000 --> 10:35.000
Almost all of these datasets are proprietary, closed source, and they are expensive as hell.

10:35.000 --> 10:37.000
Often they're limited in demographic scope.

10:37.000 --> 10:45.000
If you gave me a project where I needed to collect 17 hours of Belgian French data right now,

10:45.000 --> 10:48.000
I'd probably run to a university campus.

10:48.000 --> 10:53.000
I'd probably go talk to younger students who are chill with casual work.

10:53.000 --> 11:05.000
And often times when we do get regional languages, these are being collected by folks from outside the region who don't understand the language, and are going to wind up with weird data.

11:05.000 --> 11:14.000
I can talk to you about what we did, but in doing so, it's much less of a "hey, use this completely free and CC0 dataset" pitch,

11:14.000 --> 11:22.000
But more of a let's talk through the different aspects of the data and the different aspects of linguistic data that I'm really excited about.

11:22.000 --> 11:29.000
So what Common Voice did back in 2017 is start collecting speech data via a crowdsourcing platform.

11:29.000 --> 11:33.000
The big thing I want to stress is this is not our data.

11:33.000 --> 11:41.000
We never add a language unless someone from that language community asks us, and we see ourselves as librarians of it.

11:41.000 --> 11:51.000
We release that data every quarter under a CC0 license because it belongs to the people who made it.

11:52.000 --> 11:58.000
If you did want to collect your own data, let's kind of walk through how we did it and how we've been doing it.

11:58.000 --> 12:13.000
Somebody asks us to add a language. Right now we've got 131 languages on Common Voice, which doesn't really stack up to the 7,000 I was shouting about earlier.

12:13.000 --> 12:20.000
Because people are donating their voices on a website, we then ask people to help us localize the platform.

12:20.000 --> 12:30.000
If we ask you to donate your voice, and we ask you to agree to the terms in a language you don't speak, that is exceedingly uncool.

12:30.000 --> 12:40.000
Because we're a read speech corpus, we also need copyright-free sentences for folks to come to the website and read in their home language.

12:40.000 --> 12:46.000
We launch a new language, we have a little bit of a remote party in the office, it's fine.

12:46.000 --> 12:49.000
But people come and contribute their voices.

12:49.000 --> 12:54.000
Other people validate them because that data needs to be validated to be really valuable.
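NOTE
The contribute-then-validate flow just described can be sketched as a simple vote-threshold rule. This is an editorial illustration: the exact thresholds Common Voice applies may differ, so treat the numbers here as assumptions.

```python
# Toy sketch of crowd validation: each clip collects up/down votes from
# reviewers, and a threshold rule decides which bucket it lands in.
def bucket(up_votes: int, down_votes: int, threshold: int = 2) -> str:
    if up_votes >= threshold and up_votes > down_votes:
        return "validated"
    if down_votes >= threshold and down_votes > up_votes:
        return "invalidated"
    return "needs more votes"

print(bucket(2, 0))  # validated
print(bucket(0, 2))  # invalidated
print(bucket(1, 1))  # needs more votes
```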

12:54.000 --> 12:59.000
And then because it's not our data, we just go ahead and kick it out into the world every quarter.

12:59.000 --> 13:06.000
If you thought, I want to do this, I want to go ahead and collect my own data, love it, love it, love it, do it.

13:06.000 --> 13:09.000
Let's walk through all this stuff you got to do.

13:10.000 --> 13:15.000
This will absolutely show exactly how old I am.

13:15.000 --> 13:22.000
But Common Voice has gone for a CC0 license, which is the most YOLO license you could possibly source.

13:22.000 --> 13:27.000
It means that folks can package the data, resell the data, use it for whatever they want.

13:27.000 --> 13:30.000
We thought this was a really good fit for our project.

13:30.000 --> 13:37.000
The big thing I want to shout about for folks looking at speech data collection is, this is not a good fit for every language community.

13:37.000 --> 13:44.000
We've talked to folks from indigenous language backgrounds saying, how do I keep big tech from accessing this?

13:44.000 --> 13:49.000
We say: fantastic, not by using a CC0 license.

13:49.000 --> 13:53.000
So really look at where the data comes from.

13:53.000 --> 13:59.000
While I could not definitively speak to it, when we look at a lot of big pre-trained models these days,

13:59.000 --> 14:05.000
there's a lot of supposition that they're trained on non-consensually accessed data.

14:05.000 --> 14:12.000
So thinking about, I'm using speech technologies, where do I want to get my data from, and is it okay to use the data?

14:12.000 --> 14:16.000
is something that I really like to encourage folks to think about.

14:16.000 --> 14:24.000
Think about the age of the data: the age of the speakers in your speech data is massively important.

14:24.000 --> 14:33.000
For Common Voice, we ask that people be adults to contribute their voices, for a range of different ethical and legal reasons.

14:33.000 --> 14:43.000
And this is true of a lot of open datasets, but this means that speech technologies overwhelmingly work incredibly poorly with young voices.

14:43.000 --> 14:52.000
Speech technologies also tend to work very, very poorly with the elderly, also because these are underrepresented demographics.

14:52.000 --> 15:02.000
So, to recap: license, age of your speakers, and literally just the languages you want to include.

15:02.000 --> 15:15.000
And this is massive. Right now, we've got 130 plus. I love to ask folks, and y'all have been chatty so far, to guess which language you think we have the most data for.

15:15.000 --> 15:18.000
I don't know, y'all, can y'all guess? I mean, that's fine.

15:18.000 --> 15:20.000
Oh.

15:20.000 --> 15:24.000
Oh, okay, we've got English, Spanish, Chinese, Hindi.

15:24.000 --> 15:25.000
What?

15:25.000 --> 15:28.000
Yes.

15:29.000 --> 15:41.000
Okay, look, it's cheating if you know. So we got a lot of really good guesses, and usually folks guess English first, because tech tends to optimize for English data first.

15:41.000 --> 15:48.000
But we found that folks from research and language communities tend to be really, really passionate.

15:48.000 --> 15:50.000
I can't pick favorites.

15:50.000 --> 15:57.000
But folks like the Catalan language community, the Welsh language community, have just been really, really passionate:

15:57.000 --> 16:02.000
Hey, these are our languages. We want to make sure we can use tools that represent them.

16:02.000 --> 16:05.000
I'm also a little bit active in English.

16:05.000 --> 16:10.000
Yeah. Sorry, I'm trying desperately not to get sidetracked, but I was like, yeah, did you know?

16:10.000 --> 16:12.000
No, no.

16:12.000 --> 16:15.000
You haven't been doing that for an instant longer than?

16:16.000 --> 16:17.000
Maybe.

16:17.000 --> 16:19.000
See me after class.

16:19.000 --> 16:22.000
So we don't just have to think about the language.

16:22.000 --> 16:28.000
And one really exciting thing is if you open this up to communities and say, hey, what's your language?

16:28.000 --> 16:34.000
You are going to get a ton of very interesting controversy around standards and what is a language.

16:34.000 --> 16:36.000
Love it, love it, love it.

16:36.000 --> 16:38.000
But also variants.

16:38.000 --> 16:43.000
So the way I speak English may be completely different from the way someone in Glasgow speaks English.

16:43.000 --> 16:48.000
Major language variants can be hugely different.

16:48.000 --> 16:51.000
And then we come down another level to accent.

16:51.000 --> 16:58.000
So having a look at how to map metadata for accents, define variants and split those out,

16:58.000 --> 17:05.000
and find a way that includes languages in a way that represents what people think of themselves as speaking.

17:05.000 --> 17:10.000
What we've done on Common Voice is we tend to disambiguate languages based on ISO code.

17:11.000 --> 17:16.000
So the International Standards Organization often gets to choose what is and isn't a language.

17:16.000 --> 17:21.000
For variants, we use BCP 47.

17:21.000 --> 17:27.000
And for accents we've got a combination of an optional chance to set your accent.

17:27.000 --> 17:29.000
But it's also a free text field.

17:29.000 --> 17:33.000
So you get a drop down of American English, British English.

17:33.000 --> 17:36.000
Or you can type whatever it is in your heart.

17:36.000 --> 17:38.000
And people do.
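
NOTE
The language / variant / accent layering described above is roughly what BCP 47 tags encode. Here is a minimal editorial sketch of splitting a BCP-47-style tag; real BCP 47 (RFC 5646) also allows variant and extension subtags that this deliberately ignores.

```python
# Split a BCP-47-style tag of the common "language[-Script][-REGION]"
# shape into its pieces. Simplified for illustration; not a full parser.
def parse_tag(tag: str) -> dict:
    parts = tag.split("-")
    result = {"language": parts[0].lower(), "script": None, "region": None}
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():    # e.g. "Cyrl", "Latn"
            result["script"] = part.title()
        elif len(part) == 2 and part.isalpha():  # e.g. "GB", "US"
            result["region"] = part.upper()
    return result

print(parse_tag("en-GB"))    # {'language': 'en', 'script': None, 'region': 'GB'}
print(parse_tag("tg-Cyrl"))  # Tajik written in Cyrillic
```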

17:39.000 --> 17:44.000
Right now, the way we collect data on Common Voice is as a read speech corpus.

17:44.000 --> 17:47.000
So people come up to the website, they push the button.

17:47.000 --> 17:48.000
It's very technical.

17:48.000 --> 17:53.000
And they'll get a sentence, which is always extremely normal.

17:53.000 --> 17:59.000
Like: his research largely concerns the ecophysiology of lichens.

17:59.000 --> 18:00.000
Like.

18:01.000 --> 18:10.000
I was going to say this is the first linguistic heckle I've gotten.

18:10.000 --> 18:13.000
But it's the first non-academic linguistic heckle I've gotten.

18:13.000 --> 18:14.000
And I quite like it.

18:14.000 --> 18:18.000
So our read speech corpus is fantastic for its simplicity.

18:18.000 --> 18:19.000
Folks can come in.

18:19.000 --> 18:22.000
They don't have to think too much about what's going on.

18:22.000 --> 18:24.000
Hey, just read the sentence.

18:24.000 --> 18:26.000
It's not great for a couple of different reasons.

18:26.000 --> 18:29.000
First of all, the sentence is extremely weird.

18:29.000 --> 18:33.000
And I think that speech recognition technology will very rarely be asked

18:33.000 --> 18:36.000
to identify the word lichens.

18:36.000 --> 18:41.000
But also the way you read something and the way you say something is a very different vibe.

18:41.000 --> 18:43.000
His research largely.

18:47.000 --> 18:48.000
What are you doing?

18:48.000 --> 18:50.000
That's weird.

18:50.000 --> 18:51.000
Oh.

18:51.000 --> 18:58.000
Also, another problem. I'm so sorry if I look really excited about

18:58.000 --> 19:01.000
these technical problems, but they're fantastic.

19:01.000 --> 19:02.000
Oh, this.

19:02.000 --> 19:08.000
And like the language ones are even more complicated because they're people at the end of the day.

19:08.000 --> 19:11.000
So, a read speech corpus as well.

19:11.000 --> 19:13.000
All you got to do is write it down, right?

19:13.000 --> 19:15.000
That's not a problem.

19:15.000 --> 19:16.000
People know how.

19:16.000 --> 19:19.000
Like once you write it down, it's not even a big deal.

19:19.000 --> 19:23.000
So for example, if I wanted to write something in Tajik, I would use the Tajik alphabet.

19:23.000 --> 19:24.000
Yeah.

19:24.000 --> 19:30.000
And that's not a problem at all until we get to the next step where this is also the Tajik alphabet.

19:30.000 --> 19:35.000
And really if I wanted to, this is also the Tajik alphabet.

19:35.000 --> 19:39.000
So looking at how to handle multiple orthographies.

19:39.000 --> 19:44.000
So different communities, different contexts may use different characters,

19:44.000 --> 19:48.000
which is one of my very favorite language problems.

19:48.000 --> 19:56.000
When and how do you give folks the opportunity to switch between those?

19:56.000 --> 20:01.000
Sorry, I'm, yes.

20:01.000 --> 20:05.000
But also who here comes from an English first background?

20:05.000 --> 20:08.000
Like, I'm sorry, I need to.

20:08.000 --> 20:15.000
So folks who come from an anglophone background might get very used to using the same language

20:15.000 --> 20:19.000
throughout our conversations with no code switching at all.

20:19.000 --> 20:27.000
For folks here in Brussels, or for somebody in Nairobi, the opportunity to pop different languages' words in and out is called code switching.

20:27.000 --> 20:30.000
And is immensely common.

20:30.000 --> 20:35.000
So one thing we've just piloted for Common Voice to deal with the different orthographies,

20:35.000 --> 20:41.000
to deal with code switching, and to look at making it a little bit less lichen-y as well,

20:41.000 --> 20:48.000
is we've got a pilot coming out where we've asked people to spontaneously tell us what they think about something instead.

20:48.000 --> 20:50.000
So instead of reading, which is easy.

20:50.000 --> 20:55.000
This is a bit disappointing to give somebody who lives in Britain.

20:55.000 --> 20:59.000
Because it's a very, it's a very short summer.

20:59.000 --> 21:01.000
But it's fine.

21:01.000 --> 21:04.000
People instead get to record their responses.

21:04.000 --> 21:05.000
Oh, do you know what?

21:05.000 --> 21:12.000
I live near the equator, so I'm a reasonable person who gets 12 hours of sun every day.

21:12.000 --> 21:16.000
But it gives folks the opportunity to talk about something they care about.

21:16.000 --> 21:20.000
It gives other folks the opportunity to transcribe it and learn about it.

21:20.000 --> 21:28.000
And folks get to speak more naturally, with umms and ahs and pauses and code switching.

21:28.000 --> 21:31.000
And this could be a fantastic way to make a ton of money.

21:31.000 --> 21:38.000
You say, oh, Jess, you told me how all these proprietary data sets are really expensive.

21:38.000 --> 21:40.000
Like I did, I did, yeah.

21:40.000 --> 21:43.000
And all I got to do is these very simple things and I'll make a ton of money.

21:43.000 --> 21:45.000
I'm like, oh, cool, yeah.

21:45.000 --> 21:49.000
It's very easy. You have a good day.

21:49.000 --> 21:52.000
I wouldn't even worry about the marketing.

21:52.000 --> 21:56.000
Wait, wait, see me after class.

21:56.000 --> 22:00.000
But the big thing I would want to say is if you are using speech technology,

22:00.000 --> 22:05.000
if you're building something, please, please, please: Common Voice is free.

22:05.000 --> 22:08.000
It's thousands and thousands, it's tens of thousands of hours.

22:08.000 --> 22:13.000
There's no license on it. There's literally no excuse not to use it.
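
NOTE
If you do grab an open speech dataset, the metadata typically arrives as TSV files. Here is an editorial sketch of slicing such a file by accent metadata; the column names used (path, sentence, age, accents) mirror recent Common Voice releases but are assumptions you should check against the release you actually download.

```python
import csv, io

# A tiny inline stand-in for a Common Voice-style clips TSV.
sample_tsv = """path\tsentence\tage\taccents
clip1.mp3\tHello there.\ttwenties\tScottish English
clip2.mp3\tGood morning.\tforties\tUnited States English
clip3.mp3\tHow are you?\tthirties\tScottish English
"""

def filter_by_accent(tsv_text: str, accent: str):
    """Return clip paths whose accent metadata mentions the given accent."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row["path"] for row in reader if accent in (row["accents"] or "")]

print(filter_by_accent(sample_tsv, "Scottish"))  # ['clip1.mp3', 'clip3.mp3']
```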

22:13.000 --> 22:17.000
But it's not just me sort of flogging our data set.

22:17.000 --> 22:22.000
There are so many language demographic and domain data sets out there that are free,

22:22.000 --> 22:25.000
that are open source, a lot of them are academic.

22:25.000 --> 22:28.000
In 2025, there are very few excuses.

22:28.000 --> 22:33.000
if you're building and training models on languages, not to be adding these

22:33.000 --> 22:38.000
Very, very free, permissively licensed data sets.

22:38.000 --> 22:44.000
And why I'm pushing this on you so hard is I like to stand up and be intense about languages,

22:44.000 --> 22:51.000
because one of you is going to come tell me about Georgian verbs after this.

22:51.000 --> 22:56.000
But if it's okay, if it's not desperately uncool,

22:56.000 --> 23:02.000
I'd love the opportunity to be excited and hopeful about tech with you all.

23:02.000 --> 23:07.000
But really, I'd love to see some of you and some of you out there in the big wide internet,

23:07.000 --> 23:09.000
building speech technologies.

23:09.000 --> 23:13.000
There's a ton of open models. You can come get our data right now.

23:13.000 --> 23:15.000
I can't stop you.

23:16.000 --> 23:22.000
But also, even if you're not coming and telling me: giving feedback to the people building the speech technologies

23:22.000 --> 23:26.000
and building the data sets is critically important.

23:26.000 --> 23:33.000
When I screw something up, when we screw something up, please yell at us.

23:33.000 --> 23:39.000
Oftentimes when you see well funded technologies coming out of the west, you see a rest of world mindset.

23:39.000 --> 23:42.000
Well, we're going to build this for the California market,

23:42.000 --> 23:45.000
and then we're going to come into Europe, and then do you know what?

23:45.000 --> 23:48.000
Then we're going to talk about rest of world.

23:48.000 --> 23:53.000
And that's not the way the world really works, especially if you're building open source,

23:53.000 --> 23:58.000
especially if you're building interesting, beautiful, useful things,

23:58.000 --> 24:03.000
that you hope would change the world, thinking about where and how language comes in,

24:03.000 --> 24:07.000
whether this is text, whether this is text localization, whether this is speech,

24:07.000 --> 24:10.000
is something I would like to politely beg of you.

24:10.000 --> 24:17.000
But even if you don't ever build anything with speech, please take into the world my permission

24:17.000 --> 24:23.000
to get as loud and to get as mad and to get as weird as you want when you're talking to your computer

24:23.000 --> 24:25.000
and it doesn't respect you.

24:25.000 --> 24:27.000
Thank you so much.

