WEBVTT

00:00.000 --> 00:18.000
Here we give them a big round of applause.

00:18.000 --> 00:30.000
All right, I think it's working.

00:30.000 --> 00:35.000
All right, so I'm here to talk about the EOS storage.

00:35.000 --> 00:42.000
So before we talk about the project, let's just talk about CERN.

00:42.000 --> 00:48.000
So CERN is one of the world's largest physics labs. We largely do particle collisions here.

00:48.000 --> 00:52.000
And I forgot about it. So let me do everything again.

01:02.000 --> 01:09.000
Hi everyone. So I'm here to talk about going from particle collisions to physics results.

01:09.000 --> 01:15.000
It's about the EOS project, which is a large storage project we use at CERN.

01:15.000 --> 01:22.000
So before we begin, let's talk about CERN, one of the world's largest physics labs.

01:22.000 --> 01:29.000
We largely do studies on the fundamental nature of matter. It's also a large international collaboration.

01:29.000 --> 01:36.000
It's composed of 24 member states and 10 associate members, plus a lot of other collaborating states around the world.

01:36.000 --> 01:45.000
Our mission is largely to gain understanding about the fundamental nature of matter and how our universe began.

01:45.000 --> 01:51.000
And how we do it is via particle collisions. So we largely collide particles.

01:51.000 --> 01:59.000
And we have a huge accelerator complex where we collide protons at near light speed.

02:00.000 --> 02:07.000
And the largest research project we have is the LHC program, the Large Hadron Collider.

02:07.000 --> 02:13.000
It's a 27-kilometre circular ring, which is 100 metres underground.

02:13.000 --> 02:19.000
And you have superconducting magnets that steer particles travelling at near light speed.

02:19.000 --> 02:24.000
And the particles eventually collide at near light speed.

02:24.000 --> 02:29.000
If you talk about the three pillars of the LHC program, it's composed of three things.

02:29.000 --> 02:33.000
So one is the accelerators themselves, which try to accelerate the particles.

02:33.000 --> 02:42.000
But what is more interesting is when they collide, we have very large 3D cameras of sorts called the detectors, where these collisions happen.

02:42.000 --> 02:49.000
And finally, to understand what actually happened, we have to analyze this data with worldwide compute.

02:50.000 --> 02:55.000
And so talking about the detectors themselves, they are mostly like 3D cameras.

02:55.000 --> 03:01.000
And what actually happens in the LHC ring is you would actually send bunches of protons, like 100 billion protons each,

03:01.000 --> 03:08.000
25 nanoseconds apart. So it gives rise to almost 40 megahertz of collisions.

03:08.000 --> 03:14.000
And you start processing this data to come down to what is actually relevant for you.

03:14.000 --> 03:22.000
And ultimately, of course, the particles do not remain, it's only the data that remains. And that's the challenge.

03:22.000 --> 03:29.000
And putting this into a more tangible perspective of the data flow, this is how it looks.

03:29.000 --> 03:34.000
So this is from a very large experiment, one of the bigger experiments we have at CERN.

03:34.000 --> 03:38.000
We have a workflow where you have this 40 megahertz of collision events.

03:38.000 --> 03:48.000
This initially passes to custom hardware, more like ASICs, you know, custom hardware that tries to actually filter this collision data.

03:48.000 --> 03:52.000
So you cut down the data from 40 megahertz to almost 100 kilohertz of data.

03:52.000 --> 04:00.000
It's still a lot of data, on the order of 160 gigabytes per second, which no storage system can sustain over, I mean, the years of an LHC run.

04:00.000 --> 04:10.000
And then this is further cut down with CPU and GPU event filter farms. You come down to a more acceptable rate of 10 to 40 kilohertz.
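
As a purely illustrative back-of-the-envelope for this trigger and filter chain, here is a small Python sketch. The event size is an assumed round number, not an official figure, and the rates are the approximate ones quoted above.

```python
# Illustrative arithmetic for the trigger/filter chain described above.
# EVENT_SIZE_BYTES is an assumption (~1.6 MB per recorded event), not an official figure.

BUNCH_SPACING_NS = 25                          # 25 ns between proton bunches
collision_rate_hz = 1e9 / BUNCH_SPACING_NS     # ~40 MHz of bunch crossings

EVENT_SIZE_BYTES = 1.6e6                       # assumption: ~1.6 MB per event

def data_rate_gb_per_s(event_rate_hz: float) -> float:
    """Data rate in GB/s for a given event rate."""
    return event_rate_hz * EVENT_SIZE_BYTES / 1e9

print(f"bunch-crossing rate: {collision_rate_hz / 1e6:.0f} MHz")
print(f"after hardware trigger (100 kHz): {data_rate_gb_per_s(100e3):.0f} GB/s")
print(f"after CPU/GPU filter farms (10 kHz): {data_rate_gb_per_s(10e3):.0f} GB/s")
```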

04:10.000 --> 04:15.000
And this flows into the large-scale disk storage system we have at CERN, called EOS.

04:15.000 --> 04:23.000
All the raw data is also archived with the tape system called CTA, which will be the next talk in the series.

04:23.000 --> 04:28.000
We largely analyze this data on batch farms using HTCondor.

04:28.000 --> 04:35.000
And finally, the data is not just analyzed at CERN itself, but also shared across the collaborating institutes all over the world.

04:35.000 --> 04:47.000
And this is the WLCG, and we use a data management middleware called Rucio, which will also be talked about two talks later.

04:47.000 --> 04:55.000
Talking about EOS itself, currently, as of this year, we are hosting almost an exabyte of physics data.

04:55.000 --> 05:00.000
And this is the evolution of storage needs over time. I mean, the actual storage we had.

05:00.000 --> 05:08.000
So from 2010, when the project just about barely started, it was around two petabytes, to almost an exabyte of data now.

05:08.000 --> 05:15.000
The data is also spread across, I mean, 8 billion files, and, you know, we are close to 70,000 hard disks.

05:15.000 --> 05:20.000
At this point in time, the largest instance we have is 180 petabytes in size.

05:20.000 --> 05:27.000
So this is for another large experiment we have at CERN, and this is the largest storage instance we have for that.

05:27.000 --> 05:33.000
And here we actually have data rates much higher than the usual experiments.

05:33.000 --> 05:38.000
So we have incoming data rates going up to almost 250 gigabytes per second.

05:38.000 --> 05:46.000
And then the same workflow that I explained earlier happens, wherein the data is also moved across the globe as well as, you know,

05:46.000 --> 05:55.000
archived to tape and various other things. So, why did we need to build another shared storage filesystem?

05:55.000 --> 06:03.000
So this project was started in 2010 and the main goal back then was, you know, suitability for physics analysis,

06:03.000 --> 06:10.000
because physics analysis is actually like a large streaming workflow. So it's equivalent to something like streaming, like, a hundred

06:10.000 --> 06:13.000
Netflix movies where people are just seeking around.

06:13.000 --> 06:19.000
Another important aspect is, of course, remote accessibility, because the data is not analyzed just locally at the CERN data center,

06:19.000 --> 06:25.000
but analyzed all across the globe. So you need something that can actually speak wide-area networks as well.

06:25.000 --> 06:33.000
Then of course the standard things that come with, you know, a large lab of 16,000 users: you would have resource sharing with

06:34.000 --> 06:42.000
quotas, ACLs and security. And you mainly want quality of service, because you have not only the machines writing data at a fairly fixed rate,

06:42.000 --> 06:50.000
but you have users analyzing this data and you need both of these to work together in the same system.

06:50.000 --> 06:58.000
And one of the largest constraints we have is cost: even though the data is constantly increasing, our budget is more or less fixed,

06:58.000 --> 07:05.000
which also brings us to the other point of heterogeneous hardware support, because every time you start buying disks for a data center,

07:05.000 --> 07:10.000
you would see that the market has evolved, the latest and the greatest sizes would be the most expensive,

07:10.000 --> 07:15.000
but then if you go for cheaper hardware, it may not be the same size as what you previously installed.

07:15.000 --> 07:20.000
So it's very important that the storage system actually supports multiple, you know, disk sizes.

07:21.000 --> 07:30.000
So, I mean, another aspect is that because of this, your data center's physical space does not always need to expand, which is actually at a big premium;

07:30.000 --> 07:36.000
you just buy larger disks, but then you cannot always retire the older disks.

07:36.000 --> 07:41.000
So this is the EOS project; it started in 2010.

07:41.000 --> 07:50.000
And if you look at the storage department at CERN, we actually host two major, I mean,

07:50.000 --> 07:56.000
storage systems: one is EOS, which we developed at CERN, the other is Ceph, which is largely used for the IT applications

07:56.000 --> 08:01.000
and many other applications that are run on the storage infrastructure.

08:01.000 --> 08:06.000
Otherwise, EOS is largely the file store that supports the physics analysis needs of the lab,

08:06.000 --> 08:11.000
but it also supports many of the user home folders and these kinds of use cases via CERNBox,

08:11.000 --> 08:19.000
which is a Dropbox-like sync-and-share platform that also gives you access to many other applications like Office and,

08:19.000 --> 08:24.000
I mean, also even analysis workflows.

08:24.000 --> 08:31.000
It's also the disk instance for the tape archival system, which will be talked about in the next talk.

08:31.000 --> 08:38.000
And as far as the protocols we expose, of course, the main protocol we expose for the data transfer,

08:38.000 --> 08:48.000
is something called root, XRootD, which is a common protocol that many of the high-energy physics clients speak.

08:48.000 --> 08:55.000
But not only that, we also expose a FUSE interface, because that's largely intuitive for end users who do analysis.

08:55.000 --> 08:58.000
So that's also a common interface that's heavily accessed.

08:58.000 --> 09:05.000
So any general-purpose client can mount EOS as a FUSE filesystem, which makes it usable everywhere.

09:05.000 --> 09:11.000
And as far as non-technical users and also mobile clients and everything else,

09:11.000 --> 09:20.000
we have a common interface exposed via CERNBox, which is accessed via a web UI and also has mobile Android clients and many other things.

09:20.000 --> 09:26.000
It also exposes a Windows interface so that Windows clients can also mount this.

09:27.000 --> 09:31.000
In addition, we also have HTTP and gRPC interfaces.

09:31.000 --> 09:39.000
So that the data upload can also happen via HTTP, which is largely useful for the transfer across the globe.

09:39.000 --> 09:47.000
And if you look at how the deployment architecture itself is, we largely split the,

09:47.000 --> 09:52.000
I mean, installation into multiple instances, each serving its own experiment or something like this,

09:52.000 --> 09:56.000
so that one experiment's workflow does not affect the others.

09:56.000 --> 10:00.000
And if we talk about CERN itself, the LHC is not the only experiment we do.

10:00.000 --> 10:05.000
We also have a lot of small and medium experiments, which are also hosted on EOS.

10:05.000 --> 10:10.000
In addition, there's all the users' storage for documentation and everything like this, which happens in CERNBox.

10:10.000 --> 10:12.000
All of this happens on the EOS storage.

10:12.000 --> 10:19.000
So we just split it into smaller, more logical deployments so that they can be run more easily.

10:19.000 --> 10:30.000
And talking about the peak data rates over the last year, it was on the order of 150 gigabytes per second for one of the experiments

10:30.000 --> 10:36.000
during the heavy-ion run, and otherwise it goes on the order of 20 to 30 gigabytes per second.

10:36.000 --> 10:43.000
This is only the data traffic, but then it's not only the data traffic that the storage system has to handle.

10:43.000 --> 10:48.000
It's also the user analysis workloads, which go on the order of 200 gigabytes per second.

10:48.000 --> 10:53.000
And if you look at the network traffic going in and out, this is what we are talking about.

10:53.000 --> 11:01.000
So when we have large streaming analysis workloads running, we hit a peak of almost a terabyte per second.

11:01.000 --> 11:06.000
And our incoming data rates also hit almost a terabyte per second.

11:06.000 --> 11:12.000
But when you're talking about incoming data rates, it's not purely the user write, because every write is always mirrored at least,

11:12.000 --> 11:25.000
or erasure coded, which means you'll have a write amplification of at least 100%, or like 20 to 30%, depending on what erasure coding configuration you run.
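
As a small sketch of that write amplification point: a two-replica layout doubles what hits the disks (100% overhead), while a k+m erasure-coded layout only adds m/k extra data. The specific k and m values below are examples, not the exact configurations used in production.

```python
# Illustrative write amplification: replication vs. erasure coding.

def amplification_replica(n_replicas: int) -> float:
    """Bytes written to disk per byte of user data with n replicas."""
    return float(n_replicas)

def amplification_ec(k: int, m: int) -> float:
    """Bytes written per byte of user data with k data + m parity stripes."""
    return (k + m) / k

print(f"2 replicas: {amplification_replica(2):.2f}x  (+100%)")
print(f"EC 10+2:    {amplification_ec(10, 2):.2f}x  (+20%)")
print(f"EC 10+3:    {amplification_ec(10, 3):.2f}x  (+30%)")
```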

11:25.000 --> 11:29.000
And talking about the architecture of the storage system itself.

11:29.000 --> 11:40.000
So you'll actually have a metadata server called MGM, which handles the authentication of clients, and then clients directly talk to the storage servers, which are called FSTs.

11:40.000 --> 11:51.000
The metadata itself is actually persisted into this persistence layer called QuarkDB, which we developed at CERN and which actually stores the persistent metadata.

11:51.000 --> 11:57.000
And the idea is that your metadata follows a scale-up kind of architecture, because it's usually very small.

11:57.000 --> 12:02.000
Even if you're talking about billions of files, you're just talking about terabytes of metadata.

12:02.000 --> 12:07.000
So it's just not something that actually exceeds an NVMe or SSD.

12:08.000 --> 12:12.000
And for the data storage, of course, you have to scale out across multiple file storage servers.

12:12.000 --> 12:16.000
So, I mean, you're talking about exabytes of data across these servers.

12:16.000 --> 12:22.000
These daemons are built on top of this framework called XRootD, which I'll talk about later.

12:22.000 --> 12:28.000
And this is largely how, I mean, it's similar to any shared filesystem, wherein you actually talk to a metadata server,

12:28.000 --> 12:35.000
and then it redirects you to the storage servers, where you actually write the actual data.

12:35.000 --> 12:39.000
And this is a simple example of the data workflow.

12:39.000 --> 12:44.000
Your client would actually contact the MGM and ask where to write a given file.

12:44.000 --> 12:47.000
And it will tell you which filesystems the file should be written to.

12:47.000 --> 12:50.000
It's all persisted into this persistent metadata store.

12:50.000 --> 12:56.000
And finally, once you've closed the write, the system is actually ensuring that,

12:56.000 --> 13:02.000
when you issue a close command, it's both replicated and also everything like the checksums is calculated.

13:02.000 --> 13:07.000
Any write would always involve multiple storage servers depending on the file layout.

13:07.000 --> 13:10.000
So for a replica layout it will be two servers for two-fold replication.

13:10.000 --> 13:17.000
If it's erasure coding, it's at least that number of, you know, K plus M stripes.
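
From the client's point of view, all of the MGM-to-FST redirection just described is transparent; the client simply opens a root:// URL. Below is a minimal sketch using the XRootD Python bindings (the "xrootd" package); the endpoint and path are placeholder values, not a real instance, and exact flag choices may vary with your setup.

```python
# Minimal sketch of a client-side write over the root:// protocol.
# The host and path below are hypothetical placeholders.
from XRootD import client
from XRootD.client.flags import OpenFlags

url = "root://eos-mgm.example.cern.ch//eos/demo/user/test.dat"

f = client.File()
status, _ = f.open(url, OpenFlags.NEW)   # the MGM picks the target filesystems and redirects
if not status.ok:
    raise RuntimeError(status.message)

f.write(b"hello physics data")
f.close()   # on close, replicas/stripes and checksums are finalized server-side
```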

13:17.000 --> 13:23.000
And talking about the daemons we run in EOS, one is the metadata daemon.

13:23.000 --> 13:30.000
It's quite complex, so I'm not explaining everything here, but the main thing is that it basically translates your namespace path

13:30.000 --> 13:33.000
into actual file IDs that are actually stored on disk.

13:33.000 --> 13:40.000
And it also does all the maintenance activities, like balancing and managing many other things.

13:40.000 --> 13:48.000
And talking about the persistent data store itself, it's actually a very simple thing that speaks

13:48.000 --> 13:54.000
Redis; I mean, it speaks the Redis protocol on top of RocksDB.

13:54.000 --> 14:02.000
So every metadata transaction is essentially a Raft transaction that eventually is persisted in RocksDB.

14:02.000 --> 14:08.000
And how we actually host the QuarkDB instances themselves: they are usually co-located with the MGM nodes,

14:08.000 --> 14:11.000
having an NVMe storage.

14:11.000 --> 14:19.000
And this is usually how these numbers scale: you'd store about 200 gigabytes for every 100 million files.
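
A tiny sizing sketch based on that rough figure (assumed here: roughly 200 GB of QuarkDB space per 100 million files, i.e. about 2 kB of persisted metadata per file) shows why this fits the scale-up, NVMe-backed model described above.

```python
# Back-of-the-envelope metadata sizing, based on the rough figure quoted above.
BYTES_PER_FILE = 200e9 / 100e6   # assumption: ~2 kB of persisted metadata per file

def quarkdb_size_tb(n_files: float) -> float:
    return n_files * BYTES_PER_FILE / 1e12

for n_files in (100e6, 1e9, 8e9):
    print(f"{n_files:>14,.0f} files -> ~{quarkdb_size_tb(n_files):.1f} TB of metadata")
```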

14:20.000 --> 14:24.000
And talking about logical concepts, you have, of course, nodes.

14:24.000 --> 14:29.000
And then, you know, you have the concept of a group, which is an aggregation of filesystems.

14:29.000 --> 14:35.000
And then you have a higher-level logical grouping of these called the space.

14:35.000 --> 14:42.000
Coming down to the actual code implementation, it's actually implemented as a plugin for the XRootD framework,

14:42.000 --> 14:52.000
which provides a POSIX-like API and also, I mean, other methods of authentication and these kinds of things.

14:52.000 --> 14:58.000
The XRootD project was started at the Stanford Linear Accelerator Center.

14:58.000 --> 15:03.000
And this is mainly a C++ framework that provides a POSIX-like namespace.

15:03.000 --> 15:07.000
And also, many other clustering functions.

15:07.000 --> 15:11.000
And it also gives you authentication mechanisms.

15:11.000 --> 15:17.000
And, you know, mechanisms like third-party copy, where you can actually copy directly from one file server to another.

15:17.000 --> 15:26.000
I'll not go into the details of how, you know, the clustering works and these kinds of things, but I've left it in the slides for, you know, later reference.
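
To make the "POSIX-like namespace" point concrete, here is a small sketch using the XRootD Python bindings to stat and list a path over the root:// protocol; third-party copy is driven through the same client library. The host and path are placeholders, not a real instance.

```python
# Small sketch of namespace operations against an XRootD-based service.
from XRootD import client
from XRootD.client.flags import DirListFlags

fs = client.FileSystem("root://eos-mgm.example.cern.ch")   # hypothetical endpoint

status, statinfo = fs.stat("/eos/demo/user")
if status.ok:
    print("size:", statinfo.size)

status, listing = fs.dirlist("/eos/demo/user", DirListFlags.STAT)
if status.ok:
    for entry in listing:
        print(entry.name)
```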

15:26.000 --> 15:32.000
Talking about EOS itself, when EOS is accessed, authentication is usually via Kerberos.

15:32.000 --> 15:38.000
Internally at CERN, it's largely Kerberos that we use to authenticate to EOS as well.

15:38.000 --> 15:41.000
Otherwise, you also have X.509 certificates.

15:41.000 --> 15:48.000
Then you have a token system where you can actually give a token of a given scope for a particular file and data access.

15:48.000 --> 15:54.000
And finally, you also have integration with OAuth and these kinds of things, so that you can plug into your authentication system.

15:54.000 --> 15:59.000
Access control is implemented similar to NFSv4 ACLs.

15:59.000 --> 16:04.000
These are stored mostly as attributes in the metadata server.

16:04.000 --> 16:07.000
And it's also not restricted to users alone.

16:07.000 --> 16:15.000
You also have the concepts of groups and e-groups and so on, where you can actually deny or grant access or particular permissions on files.

16:15.000 --> 16:20.000
You have a quota system at the group, user or e-group level.

16:20.000 --> 16:28.000
And as far as an administrator is concerned, you also have an access interface where you can actually ban users, groups or things like these.

16:28.000 --> 16:37.000
And this is quite essential when you actually have lots of people running very heavy analysis workloads, because these workloads can actually sometimes kill the entire instance.

16:37.000 --> 16:45.000
So we have to make sure that these controls, you know, in the rare case when you need them, are available for you.
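
As a purely illustrative sketch of the ACL idea mentioned above, here is a few lines of Python that evaluate NFSv4-style entries of the form "u:&lt;user&gt;:&lt;perms&gt;", "g:&lt;group&gt;:&lt;perms&gt;" or "egroup:&lt;egroup&gt;:&lt;perms&gt;" stored as a directory attribute. This mimics the concept only; it is not EOS's actual implementation, and the names used are made up.

```python
# Illustrative evaluation of an NFSv4-style ACL string (concept sketch, not EOS code).
def allowed(acl: str, who: dict, perm: str) -> bool:
    """who = {"user": ..., "groups": [...], "egroups": [...]}; perm is one of 'r', 'w', 'x'."""
    for entry in acl.split(","):
        kind, name, perms = entry.split(":")
        if kind == "u" and name == who["user"] and perm in perms:
            return True
        if kind == "g" and name in who["groups"] and perm in perms:
            return True
        if kind == "egroup" and name in who["egroups"] and perm in perms:
            return True
    return False

acl = "u:alice:rwx,egroup:exp-users:r"
print(allowed(acl, {"user": "bob", "groups": [], "egroups": ["exp-users"]}, "r"))  # True
print(allowed(acl, {"user": "bob", "groups": [], "egroups": ["exp-users"]}, "w"))  # False
```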

16:45.000 --> 16:49.000
Talking about the most fundamental features, one is the file layout.

16:49.000 --> 16:52.000
This is a policy specifying how you want to replicate a file.

16:52.000 --> 17:01.000
The default is at least two replicas, but for the large experiments it's usually erasure coded with, you know, two or three parities.

17:01.000 --> 17:05.000
And you can, I mean, go up to double-digit numbers of stripes.

17:05.000 --> 17:12.000
And these are not configured just at the file level; you can also do it at the directory level or the space level or something like this.

17:12.000 --> 17:15.000
Or you can even have policies at a user level.

17:15.000 --> 17:19.000
And you also have a system that can actually convert from one file layout to another.

17:19.000 --> 17:26.000
So you can actually write files in a replica mode and then later convert them to erasure coding with the system.

17:26.000 --> 17:32.000
And we support multiple checksums, of course, the usual MD5, SHA and these kinds of things.

17:32.000 --> 17:38.000
And all file checksums are guaranteed to be verified, I mean, when you issue a close command.

17:38.000 --> 17:44.000
And erasure coding, of course, always, I mean, computes block checksums, because that's very essential if you want to

17:44.000 --> 17:47.000
be able to rebuild easily.
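
A minimal sketch of the "verify the checksum at close" idea: compute a rolling Adler-32 (a cheap checksum commonly used in this domain alongside MD5/SHA; assumed here for illustration) while streaming the data, and refuse to commit the file if it does not match the expected value.

```python
# Concept sketch: verify a streaming checksum before committing (closing) a file.
import zlib

def stream_write(chunks, expected_adler32: int) -> None:
    cksum = 1  # Adler-32 seed value
    for chunk in chunks:
        cksum = zlib.adler32(chunk, cksum)
        # ... each chunk would be forwarded to the storage servers here ...
    if cksum != expected_adler32:
        raise IOError("checksum mismatch, refusing to commit the file")

data = [b"event-0", b"event-1", b"event-2"]
stream_write(data, zlib.adler32(b"".join(data), 1))
```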

17:47.000 --> 17:54.000
One of the other fundamental features is, of course, the converter interface, which actually converts files from one layout to another.

17:54.000 --> 18:02.000
And you also have an interface which can actually do soft deletes and this kind of thing, so that you can actually expire files.

18:02.000 --> 18:08.000
One very important feature we have is the I/O shaping policies.

18:08.000 --> 18:13.000
This is very essential, because you will have data from the machine, which always comes from the LHC program and is always critical.

18:13.000 --> 18:18.000
So these writes always have to go through, but a user doing analysis can actually wait.

18:18.000 --> 18:24.000
So you can actually set a priority for writes, which is configured for every experiment.

18:24.000 --> 18:28.000
And you also actually can cap bandwidth for experiments.

18:28.000 --> 18:37.000
So you can actually set policies like: this user does not get more than 250 megabytes per second for their data streams.

18:38.000 --> 18:47.000
And under the hood, on the XFS filesystems, we also, I mean, configure the Linux I/O scheduling priorities for these.
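
As an illustration of the bandwidth-capping idea, here is a small token-bucket sketch that throttles one user's streams to a fixed rate such as 250 MB/s. This is concept code only, not how EOS implements it.

```python
# Concept sketch: token-bucket bandwidth cap for one user's data streams.
import time

class BandwidthCap:
    def __init__(self, bytes_per_second: float):
        self.rate = bytes_per_second
        self.available = bytes_per_second
        self.last = time.monotonic()

    def throttle(self, nbytes: int) -> None:
        """Block until nbytes may be sent without exceeding the cap."""
        now = time.monotonic()
        self.available = min(self.rate, self.available + (now - self.last) * self.rate)
        self.last = now
        if nbytes > self.available:
            time.sleep((nbytes - self.available) / self.rate)
            self.last = time.monotonic()
            self.available = 0.0
        else:
            self.available -= nbytes

cap = BandwidthCap(250e6)              # 250 MB/s per user
for chunk in (b"x" * 1_000_000,) * 5:
    cap.throttle(len(chunk))           # would precede each network send
```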

18:47.000 --> 18:51.000
Not only for the data streams; for the metadata we also implement a quality of service.

18:51.000 --> 18:58.000
And here you can actually throttle operations like file opens or, you know, operations like listings and these kinds of things.

18:58.000 --> 19:03.000
You can also configure limits on the amount of threads that a user is occupying,

19:03.000 --> 19:06.000
so that you always keep the server responding.

19:06.000 --> 19:12.000
You can actually configure these things to have a hard limit, so that when the user actually comes with an operation like this, they are actually stalled,

19:12.000 --> 19:14.000
so that they can no longer do the operation.

19:14.000 --> 19:18.000
They have to wait for the, I mean, for the given time before they can actually continue.
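
A toy sketch of that "stall" behaviour: if a user exceeds a per-second budget of metadata operations, the server answers with a stall time instead of serving the call. The limits and names here are hypothetical, chosen just to show the concept.

```python
# Concept sketch: per-user rate limit on metadata operations with a "stall" response.
import time
from collections import defaultdict

OPS_PER_SECOND = {"open": 100, "ls": 20}           # hypothetical per-user limits
_counters = defaultdict(lambda: [0.0, 0])          # (user, op) -> [window_start, count]

def check_request(user: str, op: str, stall_seconds: int = 5):
    window_start, count = _counters[(user, op)]
    now = time.monotonic()
    if now - window_start >= 1.0:                  # start a new one-second window
        _counters[(user, op)] = [now, 1]
        return None                                # serve the request
    if count >= OPS_PER_SECOND[op]:
        return stall_seconds                       # tell the client to retry later
    _counters[(user, op)][1] += 1
    return None

for i in range(25):
    stall = check_request("alice", "ls")
    if stall:
        print(f"request {i}: stalled for {stall}s")
```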

19:18.000 --> 19:24.000
And this is actually because you don't want one user who is running a batch job on 5,000 compute nodes

19:24.000 --> 19:30.000
taking away all the resources in the server from other users who just want to check something very simple in their

19:30.000 --> 19:35.000
home directory or something like this.

19:35.000 --> 19:42.000
And if we talk about applications that are built on top of EOS, one of the main applications we run at CERN is something called CERNBox,

19:42.000 --> 19:53.000
which, you know, provides an interface similar to Dropbox and also hosts a lot of applications on top.

19:53.000 --> 20:00.000
As of now, we are talking about 31K user accounts in CERNBox.

20:00.000 --> 20:08.000
And there are many other applications, like, I mean, also server workloads like Apache Spark and these kinds of things.

20:08.000 --> 20:16.000
And then largely our analysis workloads with Python notebooks and these kinds of things also happen on top of this.

20:16.000 --> 20:22.000
And talking about the current and future development roadmap, one of the important things we have

20:22.000 --> 20:29.000
is the SMR feature, where we are trying to actually write data onto SMR disks.

20:29.000 --> 20:38.000
Currently we have added basic support for that, but we are actually waiting for a larger test pool to understand whether we can actually do it at scale.

20:38.000 --> 20:49.000
Our understanding is that HAMR hard drive support should just work out of the box, because in theory HAMR is very similar to your normal conventional magnetic recording.

20:49.000 --> 20:56.000
Talking about low-cost flash, we are also evaluating QLC flash and these kinds of technologies.

20:56.000 --> 21:03.000
We have a research project in the pipeline which should have a position opening very soon at CERN.

21:03.000 --> 21:06.000
So do look in our jobs portal.

21:06.000 --> 21:15.000
One of the other things we are working on is metadata server scalability, in terms of trying to make it more performant.

21:15.000 --> 21:27.000
One thing we are also working on is S3 gateway support, because that is one of the most prominent protocols nowadays for any of these filesystem-like interfaces.

21:27.000 --> 21:31.000
Another thing we are working on is the different quality of service storage tiers.

21:31.000 --> 21:39.000
So you can actually write to flash storage, and then this eventually gets lifecycled onto slower storage and then tape and these kinds of things.

21:39.000 --> 21:51.000
And also, I mean, exploration of faster network interfaces towards the farms.

21:51.000 --> 21:55.000
And this brings me almost to the end of the talk, as far as concluding remarks go.

21:55.000 --> 22:06.000
So, EOS was developed at CERN primarily as software to support the large-scale physics use cases, but it also supports use cases beyond physics.

22:07.000 --> 22:12.000
Largely, we saw CERNBox, which is actually used for user filesystems not related to physics alone.

22:12.000 --> 22:17.000
So, every analysis more or less involves EOS.

22:17.000 --> 22:24.000
And, I mean, it's also used even for data acquisition systems, which actually have very high transaction rates.

22:24.000 --> 22:34.000
It's particularly appreciated by the users because you have access to a unified namespace, where you actually access the same file from your physics analysis system,

22:34.000 --> 22:39.000
so you can actually use CERNBox to write your PDF files or something like this.

22:39.000 --> 22:47.000
With the next run of the LHC a few years away, we are also trying to improve the system so that we can actually support the data rates for the next run,

22:47.000 --> 22:51.000
which is expected to have 10 times the bandwidth requirements.

22:51.000 --> 22:58.000
The software is GPLv3 licensed, so we would love contributions; all the links are on the next page.

22:59.000 --> 23:04.000
And largely about operations itself, we don't actually need much active effort to run it.

23:04.000 --> 23:11.000
We run the system with basic support, but it still seems we run at four nines of availability even with this.

23:11.000 --> 23:15.000
We largely do not have large operational incidents running it.

23:15.000 --> 23:24.000
So, I mean, it seems to work pretty well for us, and it's not only used at CERN, but also at a few of the other high-energy physics sites like Fermilab.

23:24.000 --> 23:31.000
And the European Commission's JRC is using EOS as an analytics platform.

23:31.000 --> 23:37.000
And finally, coming to the project links themselves, you have the documentation and the code here.

23:37.000 --> 23:43.000
We host server packages for AlmaLinux and all the EL variants like Red Hat.

23:43.000 --> 23:47.000
Client packages we do have for Ubuntu and other things.

23:47.000 --> 23:53.000
You also have Docker images and these kind of things if you want to just run these as container images.

23:53.000 --> 24:04.000
And two months down the line, we also host the EOS workshop at this thing called Tech Week Storage, which I think will be covered in the next talks.

24:04.000 --> 24:08.000
And that brings me to the end of the presentation.

24:08.000 --> 24:17.000
And I'm open for questions.

24:17.000 --> 24:21.000
[Inaudible audience question.]

24:21.000 --> 24:26.000
Yes.

24:26.000 --> 24:29.000
So far not yet.

24:29.000 --> 24:34.000
So the question was, for the S3 support, have we considered RGW standalone?

24:34.000 --> 24:40.000
We haven't evaluated it yet, but I did see that RGW now supports POSIX file systems.

24:40.000 --> 24:43.000
Is this merged in RGW already?

24:44.000 --> 24:53.000
Okay. So the answer was, like, I mean, there is a feature developed in RGW to expose POSIX file systems.

24:53.000 --> 25:01.000
Once this is stable, we will probably evaluate that as a reference.

25:01.000 --> 25:04.000
More questions? Yes.

25:05.000 --> 25:29.000
So the question was, when we are talking about limits that we can apply to users for metadata operations and these kinds of things, is this feature built into XRootD or is it a feature that we developed?

25:29.000 --> 25:34.000
And the answer is, actually, it's not an XRootD feature; it's something that we developed on top of EOS.

25:34.000 --> 25:39.000
Primarily because it's something that we experience a lot when we have user activity.

25:39.000 --> 25:48.000
And this is something that we developed over time, understanding what physicists usually do, which is often not the correct way of accessing a filesystem.

25:48.000 --> 25:55.000
They do a lot of listing operations for checking whether a file exists, which shouldn't be done that way, but people do it anyway.

25:55.000 --> 25:59.000
So people run metadata-intensive workloads without intending to.

25:59.000 --> 26:04.000
So when you kind of rate-limit them, they kind of get creative and then they understand how to do it properly.

26:04.000 --> 26:10.000
So it's something that we built.

26:10.000 --> 26:21.000
So the question was, what kind of disks are we using? Hard disks?

26:21.000 --> 26:29.000
So the answer is, largely for all the physics storage it's all hard disks.

26:29.000 --> 26:36.000
For very limited use cases we have SSDs, but we are mainly bound by cost, so almost everywhere it's just hard disks.

26:36.000 --> 26:44.000
We don't have budget for running a large NVMe storage.

26:44.000 --> 26:49.000
Any more questions?

26:49.000 --> 26:55.000
Okay, hi, one more question.

26:55.000 --> 26:59.000
So the question was, how often do we replace hard disks?

26:59.000 --> 27:07.000
Normally, I mean, hard disks come with a three or four year warranty, but usually we try to run them almost to the end of their life cycle.

27:07.000 --> 27:13.000
It's there in one of the slides: we do have a slide of all the hardware generations we run.

27:13.000 --> 27:18.000
So the oldest hard disks we are running are almost 10 years old at this point.

27:18.000 --> 27:27.000
But this is not always a happy thing to run, because eventually over time you would actually have a lot of operational issues that come along with it.

27:27.000 --> 27:38.000
But more or less we are seeing that it's largely okay for large physics files, because, I mean, as long as you constantly do all of your corrections and these kinds of things, they are fine.

27:38.000 --> 27:41.000
It seems to work for us. Yeah, one more question.

27:41.000 --> 27:51.000
So do you have estimates that you keep in mind, given that some of these might not be working?

27:51.000 --> 27:58.000
All right, so the follow-up question was, do we take it into consideration that the hard disks may not work, given they are out of their life cycle.

27:58.000 --> 28:10.000
And yes, we do. I mean, this is one of the important aspects we have, and the other factor that comes with it is that you try to, I mean, cycle through different generations.

28:10.000 --> 28:19.000
So that, you know, we try to organize our groups in such a way that they don't consist of only one kind of hardware generation.

28:19.000 --> 28:24.000
So that you kind of, I mean, hedge this risk.

28:24.000 --> 28:29.000
You mentioned one or two for parity?

28:29.000 --> 28:32.000
Yeah, so the question was, for erasure coding, what's the minimum parity we use.

28:32.000 --> 28:37.000
We start with two, I mean, at least a base minimum of two parities for erasure coding.

28:37.000 --> 28:43.000
For replication, we might go with two replicas in some cases, but for erasure coding, it's usually a minimum of two parities.

28:43.000 --> 28:54.000
But we do support even one parity, though it is not very safe to run like this.

28:54.000 --> 28:58.000
Okay, it seems like it's time. Okay.

28:58.000 --> 29:08.000
Thank you.

