WEBVTT

01:01.000 --> 01:02.000
Thank you very much

01:05.000 --> 01:14.400
Well, this is for the recording. Yes, again, thank you for coming. I'm not going to repeat the whole thing, right? So we have

01:18.500 --> 01:21.000
We received lots and lots of

01:22.000 --> 01:29.200
presentation ideas. We could not fit more. We even got a full day, which was amazing, by the way

01:31.000 --> 01:33.000
Please softer

01:33.000 --> 01:36.000
Today or at some point in the next weeks and stuff like that

01:36.500 --> 01:43.800
write to the FOSDEM organizers and say whether you'd like this workshop to continue and stuff like that, because that's the way that they decide

01:44.000 --> 01:46.000
which rooms get which

01:47.000 --> 01:53.200
devroom proposals get accepted, which rooms they get, and which days or slots they get, and stuff like that

01:54.000 --> 01:56.000
so yeah,

01:57.000 --> 01:59.000
we usually have

01:59.000 --> 02:01.000
30-minute slots

02:05.000 --> 02:10.000
But in order to fit even more we went to

02:11.000 --> 02:17.000
20-minute slots in the middle of the day, and then you'll see my organizers are here

02:17.000 --> 02:19.000
Here

02:27.000 --> 02:29.000
So you'll get this after with the way

02:37.000 --> 02:46.000
Very basic information for the speakers: there's a camera there and a microphone here. You should be talking to the camera and speaking into the mic

02:46.000 --> 02:48.000
Everything is recorded, everything is streamed

02:49.000 --> 02:53.000
So yeah, please take care of that. You should stay on the

02:54.000 --> 02:59.000
correct side of this red line here, because otherwise you'll be out of the picture, and

03:01.000 --> 03:04.000
I think that's the basics

03:05.000 --> 03:07.000
For our staff

03:08.000 --> 03:10.000
I'm not sure we want to go through

03:11.000 --> 03:15.000
Oh, and the times that you see here

03:16.000 --> 03:20.000
As you can probably see, the end time is the start time of the next one, which is not correct, right?

03:21.000 --> 03:25.000
So the end time is not when a speaker should end the talk

03:26.000 --> 03:33.000
By then the speaker should have finished, all the questions should be done, the speaker and the people leaving should have left, and then we start the new one

03:34.000 --> 03:37.000
So imagine that all the end times are five minutes

03:40.000 --> 03:45.000
Way into the future. So yeah, and that's why, instead of 9:10,

03:46.000 --> 03:52.000
it's already 9:07 and I'm out of time. All right, so any questions?

03:55.000 --> 03:57.000
Good, perfect. Let's have a great day

05:48.000 --> 05:53.000
Hello, I don't know if it works. It works. Okay, perfect. Hello everyone

05:55.000 --> 06:03.000
First of all, thank you, Alexios and the organizers, for giving me the opportunity to be here today

06:03.000 --> 06:06.000
And yeah, let's get started

06:07.000 --> 06:12.000
Today I would like to start with a quick and simple analogy

06:13.000 --> 06:20.000
Imagine that you're going to a restaurant and you're going to book for dinner or lunch whatever

06:21.000 --> 06:27.000
And you suddenly start to see that each restaurant has its own unique system for ratings

06:27.000 --> 06:35.000
So, for example, maybe some of them use stars, maybe some use numbers,

06:36.000 --> 06:41.000
maybe some of them use emojis, words, and so on

06:42.000 --> 06:50.000
The question is, how do you even compare? I mean, this is actually the same challenge that we are facing

06:51.000 --> 06:55.000
When dealing with crypto algorithm detection

06:58.000 --> 07:04.000
And yeah, basically this blocks good security and compliance and that's why

07:05.000 --> 07:11.000
our work with SBOMs and crypto standardization is so important

07:12.000 --> 07:20.000
I'm [inaudible], I'm a software engineer at SCANOSS, where we provide open source software intelligence

07:21.000 --> 07:25.000
by giving access to our knowledge base to big corporations

07:27.000 --> 07:36.000
And today I would like to talk to you about two big updates that we have been working on over the last couple of years

07:36.000 --> 07:42.000
On one side, the release of the Crypto Algorithms Open Dataset

07:43.000 --> 07:51.000
And on the other side, the decision of SPDX to adopt it as an actual standard

07:52.000 --> 08:01.000
And by the end of this talk, I hope you will have learned how to actually leverage this data set to make your work easier, and also

08:01.000 --> 08:07.000
To potentially stop building the same tools over and over again

08:11.000 --> 08:15.000
To begin with, I would like to talk about the impact and why

08:17.000 --> 08:20.000
standardized crypto identification matters at all

08:21.000 --> 08:25.000
On one side, we have a couple of key stakeholders, which might be

08:26.000 --> 08:34.000
trade compliance teams, security teams, companies which are increasingly concerned about post-quantum crypto

08:35.000 --> 08:40.000
And maybe because auditing requirements tend to grow year over year

08:41.000 --> 08:44.000
But here's something that is crucial

08:45.000 --> 08:47.000
Standardization saves money

08:48.000 --> 09:00.000
And when we started this journey, we were spending significant resources on maintaining and updating our crypto detection methods

09:01.000 --> 09:09.000
So this doesn't only save us at SCANOSS money, but also helps the community to be more efficient

09:09.000 --> 09:16.000
This is our proposal for the community

09:17.000 --> 09:23.000
So we have a crypto algorithm definition list, which has a simple data structure

09:24.000 --> 09:29.000
which is written in a machine-readable format, so it's extensible

09:30.000 --> 09:38.000
And we even have some reference code, so you can check that out and use it as a starting point

09:39.000 --> 09:45.000
But the important thing here is that this is not only something that's theoretical

09:46.000 --> 09:55.000
It's also been battle-tested by us in production, scanning billions of files, and it is also helping big organizations

10:00.000 --> 10:06.000
So these are the key milestones that we've reached so far over the last couple of years

10:06.000 --> 10:14.000
And let's go deeper into each one of them

10:15.000 --> 10:21.000
In 2021, customers started to ask us if we could help them

10:22.000 --> 10:28.000
identify the crypto algorithms that were present in their open source projects

10:29.000 --> 10:31.000
And at first, it sounds simple, right?

10:31.000 --> 10:35.000
I mean, yeah, but it's not

10:36.000 --> 10:44.000
Because think about scale, when you're scanning billions of files and you have thousands of projects

10:45.000 --> 10:49.000
You have a big problem because you have to be as efficient as possible

10:50.000 --> 11:00.000
But we recognized that this wasn't just an internal need; this was actually coming from real customers and real use cases

11:01.000 --> 11:12.000
We started with keyword matching. For each crypto algorithm we created a definition file

11:13.000 --> 11:22.000
which contains some attributes such as the algorithm ID. I don't know if you can actually see anything on the screen or if it's too small

11:23.000 --> 11:25.000
But I'll just read it out

11:26.000 --> 11:37.000
So for each algorithm, we created this kind of definition file which contains the algorithm ID, the name, the security strength, if any

11:38.000 --> 11:47.000
And most importantly, the keyword list, which we use afterwards to actually do the matching
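
NOTE
A minimal sketch of the kind of definition file described in this cue, assuming hypothetical field names (algorithm_id, name, strength, keywords) and a hypothetical AES-256 entry. The real data set has its own schema, so treat this as an illustration only.
```python
from dataclasses import dataclass, field
from typing import List, Optional
@dataclass
class AlgorithmDefinition:
    """One entry of a definition list; all field names are assumptions."""
    algorithm_id: str
    name: str
    strength: Optional[int] = None  # security strength in bits, if any
    keywords: List[str] = field(default_factory=list)  # strings used for matching
# Hypothetical entry; keeping key sizes in separate entries lets a scanner
# tell AES-128 apart from AES-256.
aes256 = AlgorithmDefinition(
    algorithm_id="aes-256",
    name="AES-256",
    strength=256,
    keywords=["aes256", "aes_256", "AES-256"],
)
print(aes256)
```
The strength field is optional because the talk says a security strength is recorded only "if any".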

11:48.000 --> 11:58.000
So as simple as it may sound, we realized that this was quite effective for large-scale scanning

11:59.000 --> 12:04.000
And it also allowed us to be quite precise on the detection

12:04.000 --> 12:14.000
Because we could differentiate between AES-128, AES-256 and so on

12:17.000 --> 12:21.000
So something interesting happened

12:22.000 --> 12:28.000
Customers started asking, what about our non-open-source projects?

12:29.000 --> 12:36.000
And also, a lot of crypto libraries and frameworks started to pop up

12:37.000 --> 12:45.000
So we were not able to keep the data set updated as fast as we needed to

12:46.000 --> 12:54.000
So we realized that we had something valuable, so why keep it private?

12:55.000 --> 13:02.000
The community was already involved; our customers were already helping us

13:03.000 --> 13:09.000
update the data set and its rules, so we just needed to open the door

13:09.000 --> 13:11.000
And so we did

13:13.000 --> 13:17.000
In 2024, we made two big moves

13:18.000 --> 13:22.000
On one side, we released the data set under the CC0 license

13:23.000 --> 13:26.000
which is as close to public domain

13:27.000 --> 13:30.000
as there is in terms of copyright law

13:30.000 --> 13:41.000
And the second one is that we were being recognized as the de facto standard for crypto

13:42.000 --> 13:50.000
We started to talk with SPDX to collaborate and be adopted as the actual standard

13:50.000 --> 14:01.000
So, as you know, SPDX has a license list, so basically this is going to be the same

14:02.000 --> 14:04.000
But in terms of crypto detection

14:06.000 --> 14:12.000
So I have a small demo. I was going to do it live, but you know, sometimes things happen

14:13.000 --> 14:15.000
So I prefer to just

14:16.000 --> 14:19.000
You have a link to the repo there

14:20.000 --> 14:23.000
If you want you can also use the QR code

14:24.000 --> 14:30.000
And I will show you some screenshots of what it looks like

14:31.000 --> 14:38.000
So basically, in the same repo where the data set lives,

14:39.000 --> 14:45.000
We have an example script for the actual detection that leverages the data set

14:46.000 --> 14:54.000
And I also created a demo folder with some useful examples that outline some challenges that I will talk about later

14:59.000 --> 15:04.000
And basically it's very simple: you just have to execute the script

15:04.000 --> 15:13.000
And you pass the folder that you want to scan, and you'll basically get a JSON file as a result

15:14.000 --> 15:20.000
Which will contain the files where the crypto algorithms were found

15:21.000 --> 15:26.000
As well as the definition file for each and which keyword was the actual match
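
NOTE
A rough sketch of the scan flow described in these cues: walk a folder, match keywords from definition entries, and report file, algorithm and matched keyword as JSON. This is not the script from the repo; DEFINITIONS, scan_folder, demo and the output keys are all hypothetical names chosen for illustration.
```python
import json
import os
import tempfile
# Hypothetical definitions; the real data set has its own schema and entries.
DEFINITIONS = [
    {"algorithmId": "aes", "keywords": ["aes_encrypt", "AES256"]},
    {"algorithmId": "md5", "keywords": ["md5sum", "MD5_Init"]},
]
def scan_folder(root):
    """Return one hit per (file, algorithm, keyword) found under root."""
    hits = []
    for dirpath, _dirs, filenames in os.walk(root):
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            with open(path, "r", encoding="utf-8", errors="ignore") as handle:
                text = handle.read()
            for definition in DEFINITIONS:
                for keyword in definition["keywords"]:
                    if keyword in text:  # plain substring match, no context check
                        hits.append({
                            "file": os.path.relpath(path, root),
                            "algorithm": definition["algorithmId"],
                            "keyword": keyword,
                        })
    return hits
def demo():
    """Scan a throwaway folder containing a single C file."""
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "crypto.c"), "w", encoding="utf-8") as handle:
            handle.write("MD5_Init(&ctx);\n")
        return scan_folder(root)
print(json.dumps(demo(), indent=2))
```
Plain substring matching keeps the scan cheap at scale, at the cost of the false positives discussed later in the talk.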

15:27.000 --> 15:32.000
So very, very simple. It's not any fancy integration or anything like that

15:32.000 --> 15:38.000
So being very simple it presents some challenges

15:39.000 --> 15:46.000
So as you can see here in this example that you can check out in the repo

15:47.000 --> 15:56.000
We have for example keywords that are being matched but are in a completely different context

15:57.000 --> 16:03.000
Because, for example, as you can see, we have a match with Fortuna, which is a crypto algorithm

16:04.000 --> 16:09.000
But in the code it is not actually being used in a crypto context

16:10.000 --> 16:13.000
So sometimes a keyword match is just a coincidence

16:14.000 --> 16:20.000
And the other challenge is that sometimes you have comments in your code

16:20.000 --> 16:27.000
that actually don't do... well, in this case the code doesn't do anything because it's all commented out

16:28.000 --> 16:36.000
But sometimes you have comments that are misleading, or the code actually does the opposite of what the comment says
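
NOTE
One possible mitigation for the two challenges above, sketched under assumptions: drop C-style comments before matching and require whole-word matches. This is not something the talk says the reference script does; strip_comments and matches_keyword are hypothetical helpers, and the comment regex is naive (it ignores string literals).
```python
import re
# Matches // line comments and /* block */ comments.
COMMENT_PATTERN = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)
def strip_comments(source):
    """Remove // and /* */ comments from source text."""
    return COMMENT_PATTERN.sub("", source)
def matches_keyword(source, keyword):
    """True if keyword appears as a whole word outside comments."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, strip_comments(source), re.IGNORECASE) is not None
commented_out = "// fortuna_init(&prng);\nint x = 1;\n"
live_call = "fortuna_init(&prng);\n"
print(matches_keyword(live_call, "fortuna_init"))
print(matches_keyword(commented_out, "fortuna_init"))
print(matches_keyword("unfortunately this failed", "fortuna"))
```
Whole-word matching avoids hits inside longer words such as "unfortunately", but a variable that is simply named fortuna would still match, so the context problem remains open.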

16:40.000 --> 16:41.000
So

16:41.000 --> 16:45.000
Looking ahead, 2025 seems to be very exciting

16:46.000 --> 16:50.000
We are going to move the data set to the software transparency foundation

16:51.000 --> 16:55.000
We are going to see new implementations in different languages

16:56.000 --> 17:01.000
And we are going to see the community taking ownership

17:04.000 --> 17:05.000
So

17:06.000 --> 17:10.000
Usually, open source projects tend to get better as more people help

17:11.000 --> 17:18.000
So even if we have different skills and experiences we want you to be involved

17:19.000 --> 17:22.000
So we have a couple ways that you can help us to improve the data set

17:23.000 --> 17:32.000
You can even create new implementations or share real world use cases or even explore if AI has any role to play here

17:32.000 --> 17:41.000
regarding context awareness. And basically there are more ways, of course

17:42.000 --> 17:47.000
And yeah

17:48.000 --> 17:58.000
At the beginning of the talk I mentioned restaurant ratings, and in open source we try to work together, to collaborate

17:59.000 --> 18:02.000
But to do this effectively we need to speak the same language

18:03.000 --> 18:07.000
So that is what this is about

18:08.000 --> 18:13.000
We have this working solution, and we know it helps, and we want you to get involved

18:14.000 --> 18:15.000
The repo is open

18:16.000 --> 18:21.000
The tools are there, and we are really looking forward to what you are going to build with it

18:22.000 --> 18:25.000
And basically that's it

18:26.000 --> 18:30.000
Thank you very much and if you have any questions please ask me

18:32.000 --> 18:33.000
Question

18:34.000 --> 18:39.000
And so thank you very much and actually I still do not really understand the use case

18:40.000 --> 18:46.000
So the long-standing advice is: do not implement your own hash algorithm or crypto, use a library

18:46.000 --> 18:51.000
A group library that has a set of issues that can find those

18:52.000 --> 18:58.000
And probably you don't find the communication that we are flying in show use of any library

18:59.000 --> 19:03.000
So why would someone be interested in it?

19:05.000 --> 19:07.000
I don't know if I understood the question correctly

19:09.000 --> 19:13.000
The question was why would we actually use this

19:13.000 --> 19:15.000
Yeah, what's the use case

19:17.000 --> 19:19.000
What yeah, I was in

19:20.000 --> 19:23.000
For example, your export control department will need

19:26.000 --> 19:31.000
To know all the crypto algorithms that you have in your composition before you can actually

19:32.000 --> 19:36.000
ship the product to certain countries, because you don't

19:36.000 --> 19:38.000
But it doesn't make sense

19:39.000 --> 19:42.000
Yes, that's one example. The other one is security compliance: there are certain security compliance

19:43.000 --> 19:47.000
regimes that require what some people are starting to call a CBOM

19:50.000 --> 19:57.000
Another important one is that, in order to understand the end of life

19:58.000 --> 20:03.000
of your product, you need to understand the end of life of the crypto side of the product

20:03.000 --> 20:05.000
There is a lot of people

