WEBVTT

00:00.000 --> 00:05.000
Thank you.

00:05.000 --> 00:10.000
Hello, so my name is Julien Leduc.

00:10.000 --> 00:12.000
I was living in Brussels, actually,

00:12.000 --> 00:15.000
and my wife was working at ULB, so it's nice to be back here.

00:15.000 --> 00:18.000
I felt that I left Brussels and landed in Geneva

00:18.000 --> 00:20.000
and started working at CERN.

00:20.000 --> 00:26.000
Now, I'm the service manager for the tape archive service at CERN,

00:26.000 --> 00:30.000
which is kind of a big responsibility.

00:30.000 --> 00:32.000
All the data ends up there, basically.

00:32.000 --> 00:35.000
I'll show you what happens at CERN,

00:35.000 --> 00:36.000
but for me, CERN is like this:

00:36.000 --> 00:39.000
There's a collision and everything lands on tape,

00:39.000 --> 00:41.000
and that's it.

00:41.000 --> 00:44.000
The end product of the LHC is data.

00:44.000 --> 00:46.000
Even if the connectivity to outside CERN is cut,

00:46.000 --> 00:49.000
the data can still land at CERN, in the CTA service.

00:49.000 --> 00:51.000
So this is what I'll show you.

00:51.000 --> 00:54.000
And it's kind of important because,

00:54.000 --> 00:59.000
I think it's one of the only metrics that comes out of IT

00:59.000 --> 01:03.000
and makes it into presentations by the Director-General of CERN.

01:03.000 --> 01:07.000
So, basically, these are the namespace statistics of what we have on tape.

01:07.000 --> 01:10.000
This is the amount of data we have on tape.

01:10.000 --> 01:14.000
CERN runs the LHC, the big thing;

01:14.000 --> 01:18.000
it operates in runs, and during runs we take data.

01:18.000 --> 01:21.000
So, basically, you see over the last 15 years,

01:21.000 --> 01:24.000
this increase, which is pretty much exponential.

01:24.000 --> 01:26.000
This is the start of Run 2,

01:26.000 --> 01:31.000
and the end of Run 2, so quite an increase.

01:31.000 --> 01:35.000
This is the long shutdown; not much happened there,

01:35.000 --> 01:39.000
but reprocessing happens and data is still created, at a slower rate.

01:39.000 --> 01:42.000
But then Run 3 started in July 2022,

01:42.000 --> 01:47.000
and in about 2.2 years we doubled all of that.

01:47.000 --> 01:50.000
So you see, the full historical data

01:50.000 --> 01:53.000
is just two years of the future data we're going to take.

01:53.000 --> 01:55.000
This is the kind of thing we're dealing with.

01:55.000 --> 01:59.000
And in 2024 alone, we took 250 petabytes

01:59.000 --> 02:02.000
of data on top of what we already had on tape.

02:02.000 --> 02:05.000
Okay, so this is what we're talking about.

02:05.000 --> 02:07.000
In terms of what we have on tape,

02:07.000 --> 02:11.000
these are the volumes for Runs 1, 2, 3.

02:11.000 --> 02:13.000
We expect to be at 1.3 exabytes

02:13.000 --> 02:18.000
at the end of Run 3, in June 2026,

02:19.000 --> 02:21.000
but it looks like it will be even more.

02:21.000 --> 02:24.000
And you see that the future is always green on this plot,

02:24.000 --> 02:29.000
so we won't lack work, that's for sure.

02:29.000 --> 02:35.000
This is what it looks like in terms of monthly written data to tape.

02:35.000 --> 02:39.000
So you see, in Run 1 the record was around 4.6 petabytes in a month.

02:39.000 --> 02:41.000
That was heavy-ion at the end of the year,

02:41.000 --> 02:46.000
when they collide lead ions together in the LHC.

02:46.000 --> 02:50.000
In Run 2 the record was 16 petabytes, and last year,

02:50.000 --> 02:54.000
2024, over 40 petabytes per month for four months.

02:54.000 --> 02:58.000
Okay, so records are broken all the time, basically.

02:58.000 --> 03:01.000
This is the infrastructure we have to deal with that,

03:01.000 --> 03:03.000
to deal with all those tapes.

03:03.000 --> 03:05.000
This is an old slide I'm showing you.

03:05.000 --> 03:07.000
This is the IBM hardware, which runs here.

03:07.000 --> 03:09.000
This is the enterprise world,

03:09.000 --> 03:11.000
and this is the Linear Tape-Open world.

03:11.000 --> 03:12.000
We have two different technologies,

03:12.000 --> 03:15.000
so that we share the risk and take the cheapest one.

03:15.000 --> 03:17.000
At any given moment in time,

03:17.000 --> 03:21.000
we don't put all our eggs in the same basket — that's the strategy.

03:21.000 --> 03:25.000
As you see, between the enterprise drives and the Linear

03:25.000 --> 03:29.000
Tape-Open drives, we have about 180 tape drives.

03:29.000 --> 03:31.000
The tape drive is where you mount those tapes.

03:31.000 --> 03:33.000
A tape gets picked by a robot and stuck in there.

03:33.000 --> 03:38.000
And then you write or read on the tape at 400 megabytes per second.

03:38.000 --> 03:43.000
And this is what we have to feed from the buffer in front of the tape drives.

03:43.000 --> 03:45.000
So let's see a bit about the evolution.

03:45.000 --> 03:47.000
This is historical stuff.

03:47.000 --> 03:49.000
It's how we did it in run one.

03:49.000 --> 03:53.000
Basically, there was just one storage system, Castor, that managed both disk and tape.

03:53.000 --> 03:56.000
As Abhishek showed you in the previous presentation,

03:56.000 --> 03:58.000
EOS was also introduced, in 2010.

03:58.000 --> 04:02.000
It was used for Run 2 to distribute

04:02.000 --> 04:04.000
the data where all the bandwidth requirements were.

04:04.000 --> 04:08.000
Castor was kept for data transfers to tape.

04:08.000 --> 04:09.000
But the problem here, as you see, is that

04:09.000 --> 04:13.000
the disk space of the experiment is split between two different storage

04:13.000 --> 04:14.000
systems.

04:14.000 --> 04:16.000
Both are still disk based: this is disk, this is disk.

04:16.000 --> 04:18.000
So if you want to use the full disk capacity you're paying for,

04:18.000 --> 04:20.000
you need to split your data.

04:20.000 --> 04:22.000
It's a bit problematic.

04:22.000 --> 04:27.000
And the tape drives read from this disk-based storage tier

04:27.000 --> 04:29.000
at 100 megabytes per second per file.

04:29.000 --> 04:32.000
This gets complicated: for Run 3,

04:32.000 --> 04:36.000
we would have needed 70 petabytes of hard drives to get the bandwidth,

04:36.000 --> 04:39.000
which would have left little capacity for the rest.

04:39.000 --> 04:41.000
So basically for run three,

04:41.000 --> 04:45.000
we made the decision to replace Castor with CTA,

04:45.000 --> 04:50.000
which has an SSD-based tape buffer.

04:50.000 --> 04:54.000
It is EOS-driven, so that we don't have to develop

04:54.000 --> 04:57.000
the protocols in two different systems like Castor and EOS.

04:57.000 --> 04:59.000
EOS handles the protocol side, and on the CTA side

04:59.000 --> 05:03.000
we just care about efficient transfers to and from tape.

05:04.000 --> 05:07.000
SSDs are also much better, because in the past, in Castor,

05:07.000 --> 05:10.000
we were multiplexing the files in memory.

05:10.000 --> 05:13.000
So you read 10 files at 100 megabytes per second each,

05:13.000 --> 05:17.000
and when a file is ready, you can stream it from memory to the drives.

05:17.000 --> 05:21.000
With multi-gigabyte files and faster drives,

05:21.000 --> 05:23.000
it's getting problematic to fit that in memory.

05:23.000 --> 05:26.000
So SSDs are the way we multiplex those files now.

05:26.000 --> 05:30.000
Native speed of SSDs at the time was 500 megabytes per second,

05:30.000 --> 05:35.000
which is higher than the tape drive speed.

05:35.000 --> 05:36.000
So it was nice.

05:36.000 --> 05:38.000
So we made the switch and tried this.

05:38.000 --> 05:41.000
For the Run 4 requirements you saw before,

05:41.000 --> 05:45.000
there's no way we could write two exabytes during three years

05:45.000 --> 05:49.000
with a hundred petabytes of disk in front.

05:49.000 --> 05:57.000
So basically, this simplification made other things a bit more complicated.

05:57.000 --> 06:00.000
But you see, a Formula One is efficient, but it's not simple.

06:00.000 --> 06:03.000
You don't start a Formula One the way you start a standard car;

06:03.000 --> 06:05.000
it's a much more complex process.

06:05.000 --> 06:07.000
So now we have EOS for the disk side.

06:07.000 --> 06:10.000
FTS is transferring the data to CTA.

06:10.000 --> 06:14.000
Now it's compulsory for the user to check that the file is on tape

06:14.000 --> 06:17.000
before removing it from EOS.

06:17.000 --> 06:20.000
We have just one copy on the tape buffer.

06:20.000 --> 06:24.000
If you lose the file on the way to tape, you just rewrite it.

06:24.000 --> 06:27.000
There's no need for us to have redundancy on the CTA side,

06:27.000 --> 06:30.000
so we are more efficient — no redundancy is fine:

06:30.000 --> 06:32.000
the file is still in EOS;

06:32.000 --> 06:34.000
just keep it there until it's on tape.

06:34.000 --> 06:37.000
So what is CTA about?

06:37.000 --> 06:44.000
Basically, it's the part that is queuing the transfers.

06:44.000 --> 06:51.000
Basically, you write a file on EOS, on the EOS cache instance in front of CTA.

06:51.000 --> 06:55.000
An event queues the file, and CTA will schedule the tape mount.

06:55.000 --> 06:57.000
And the tape drive will read the file from the buffer

06:57.000 --> 07:00.000
and write it to tape efficiently.

07:00.000 --> 07:05.000
So you can couple CTA with something else as a buffer in front.

07:05.000 --> 07:08.000
At CERN it's EOS plus CTA, but DESY uses dCache,

07:08.000 --> 07:11.000
and they use CTA as the tape movement engine.

07:11.000 --> 07:15.000
The architecture is this one.

07:15.000 --> 07:18.000
So we have the scheduler for the queue,

07:18.000 --> 07:25.000
and the catalogue to know which file is, for example, the 100th file on this tape,

07:25.000 --> 07:27.000
and from which EOS CTA instance it came.

07:27.000 --> 07:32.000
And when you read back, you trigger an event that triggers a mount of this tape:

07:32.000 --> 07:36.000
mount the tape, go to the 100th file, read it back, put it in the buffer,

07:36.000 --> 07:38.000
and queue it for transfer.

07:38.000 --> 07:41.000
So, a lot of little steps.
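
NOTE
A toy sketch (not CTA internals) of the queue-and-mount flow just described: a file closed on the EOS buffer raises an event, CTA queues the request, and a mounted drive later drains the queue to tape. All names, paths and sizes below are illustrative assumptions.
    from collections import deque
    archive_queue = deque()                        # in CTA this is kept per VO / per tape pool
    def on_file_closed(path, size_bytes):
        """EOS 'file closed' event: queue a copy request towards tape."""
        archive_queue.append({"path": path, "size": size_bytes})
    def drive_session(vid, batch=100):
        """One mounted tape drains queued files sequentially from the SSD buffer."""
        written = 0
        while archive_queue and written < batch:
            archive_queue.popleft()                # real code streams the file, then reports success
            written += 1
        print(f"wrote {written} files to tape {vid}")
    on_file_closed("/eos/ctaatlas/raw/f1.root", 8_000_000_000)   # hypothetical path
    drive_session("L12345")                        # hypothetical tape volume id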

07:41.000 --> 07:45.000
What does it look like as a production infrastructure?

07:45.000 --> 07:48.000
So basically, to drive all this traffic,

07:48.000 --> 07:51.000
we have only 64 hyper-converged servers.

07:51.000 --> 07:54.000
For me, a hyper-converged server is a streaming machine:

07:54.000 --> 08:00.000
16 times two-terabyte SATA SSDs per machine,

08:00.000 --> 08:03.000
connected on 25-gigabit Ethernet,

08:03.000 --> 08:10.000
with a very small blocking factor, because whatever enters one machine needs to go out at the same speed.

08:10.000 --> 08:13.000
Otherwise there's a bottleneck inside:

08:13.000 --> 08:18.000
I cannot push the same throughput out to tape, and data piles up.

08:18.000 --> 08:26.000
Basically, if you take those 64 servers, it gives 160 gigabytes per second

08:26.000 --> 08:28.000
of full-duplex throughput.

08:28.000 --> 08:32.000
You write 160 gigabytes per second to those 64 servers,

08:32.000 --> 08:40.000
and you get 160 gigabytes per second out to tape with drives at 400 megabytes per second.

08:40.000 --> 08:45.000
Every single VO, which is an experiment, has its own EOS CTA instance.

08:45.000 --> 08:48.000
VO stands for virtual organisation. And they all write to the same set of tape drives.

08:48.000 --> 08:52.000
So if one experiment writes less than the others, we can give more drives to another one,

08:52.000 --> 08:55.000
and we can distribute throughput more fairly.

08:55.000 --> 08:59.000
In terms of buffers, it's more or less very conservative.

08:59.000 --> 09:05.000
You see, we kept something like eight hours of buffer

09:05.000 --> 09:09.000
at 10 gigabytes per second, per experiment.

09:09.000 --> 09:16.000
This is roughly what we have: 60 gigabytes per second of throughput to tape with 180 drives.

09:16.000 --> 09:18.000
And that's kind of it.
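
NOTE
A rough back-of-the-envelope sketch of the dimensioning quoted above: 64 hyper-converged servers at roughly 2.5 gigabytes per second full duplex each, 180 drives at 400 megabytes per second, and eight hours of buffer at 10 gigabytes per second per experiment. Illustrative arithmetic only; the per-server figure is an assumption taken from the single-machine benchmark later in the talk.
    servers, per_server_gb_s = 64, 2.5        # GB/s each way per hyper-converged server (assumed)
    drives, drive_gb_s = 180, 0.4             # 180 tape drives at 400 MB/s
    network_gb_s = servers * per_server_gb_s  # ~160 GB/s in and out of the SSD buffer
    tape_gb_s = drives * drive_gb_s           # ~72 GB/s if every drive streamed flat out
    buffer_tb = 8 * 3600 * 10 / 1000          # 8 h at 10 GB/s per experiment ~= 288 TB
    print(network_gb_s, tape_gb_s, buffer_tb)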

09:18.000 --> 09:23.000
Inside one of those instances, we also separate what goes to tape:

09:23.000 --> 09:27.000
it's a different set of SSDs from what comes back from tape.

09:27.000 --> 09:32.000
Because, you see, when a file is written here, it's evicted once it lands on tape.

09:32.000 --> 09:35.000
When a file is over here, it's evicted when the user moves it out.

09:35.000 --> 09:40.000
So if we had only one buffer, a user not moving files out fast enough

09:40.000 --> 09:43.000
would fill the default space, and then we could not archive anymore.

09:43.000 --> 09:49.000
And archiving is critical, because this data has to keep moving to tape quite regularly.

09:49.000 --> 09:55.000
Another thing is that during the year — you saw the traffic and so on —

09:55.000 --> 09:59.000
if we look at archive volume and stage volume, we have some data-taking periods

09:59.000 --> 10:02.000
where the experiments write a lot to tape,

10:02.000 --> 10:06.000
and some periods with not so much data taking,

10:06.000 --> 10:10.000
technical stops and things like that, where they read back the data.

10:10.000 --> 10:15.000
So in this case, you see, you can anticipate what will happen.

10:15.000 --> 10:21.000
During data taking — for proton-proton you get a bit less archive throughput,

10:21.000 --> 10:26.000
a lot during heavy-ion — but then in between they read back, and they read back a lot at the end of the year,

10:26.000 --> 10:29.000
to distribute the data to other sites.

10:29.000 --> 10:35.000
Okay, so basically, we have 10 SSD servers, those hyper-converged servers, per experiment.

10:35.000 --> 10:40.000
During the data-taking period, we put 9 of them on the default space, for archival.

10:40.000 --> 10:45.000
That gives me 20 to 29 gigabytes per second of throughput to the drives.

10:45.000 --> 10:50.000
With this, I can drive 70 tape drives at full speed.

10:50.000 --> 10:55.000
And I still keep one server on the retrieve side, because they may occasionally need to read something back.

10:55.000 --> 10:58.000
With no server on the retrieve side, forget about reading.

10:58.000 --> 11:03.000
And when you enter the end of the year, the recall season, it's more balanced.

11:03.000 --> 11:07.000
But we still have at least 10 gigabytes per second of throughput each way,

11:07.000 --> 11:12.000
which can drive 34 drives for write and 34 for read,

11:12.000 --> 11:16.000
so that we can still do late archival.

11:16.000 --> 11:20.000
Okay, it gives us a lot of flexibility.

11:21.000 --> 11:25.000
How do you really dimension that? So, you see, I went back to the benchmarks

11:25.000 --> 11:33.000
I did in 2018, when I was buying the machines and testing them myself, before COVID,

11:33.000 --> 11:36.000
and they are still in production, basically.

11:36.000 --> 11:40.000
So, basically, I did a bunch of streaming performance tests using dd,

11:40.000 --> 11:43.000
simplex-wise: you do a lot of writes, a lot of reads,

11:43.000 --> 11:47.000
increasing the number of SSDs you use, and then you see that at one point

11:47.000 --> 11:53.000
you reach the bottleneck on the HBA: my HBA on the machine

11:53.000 --> 11:57.000
cannot go over seven gigabytes per second of throughput simplex.

11:57.000 --> 12:01.000
Then you test duplex throughput, because

12:01.000 --> 12:06.000
what we write is what we read, but we read it at a different time, on different SSDs,

12:06.000 --> 12:11.000
because the drives read later, when the file close happens and it's queued.

12:11.000 --> 12:16.000
So, basically, you see, you get 3.5 gigabytes per second of writes

12:16.000 --> 12:19.000
and reads at the same time, from different sets of SSDs. That's okay,

12:19.000 --> 12:24.000
because it will be statistically distributed over the number of SSDs we have:

12:24.000 --> 12:31.000
across an instance we see something like 70 read streams,

12:31.000 --> 12:37.000
and in terms of writing, about 120 write streams,

12:37.000 --> 12:42.000
so there are not a lot of collisions, and we can rely on statistical performance.

12:42.000 --> 12:47.000
So, basically, the HBA was good enough, in fact, because 7 gigabytes per second of internal throughput

12:47.000 --> 12:53.000
is more than the accumulated simplex throughput of the 25-gigabit-per-second

12:53.000 --> 12:56.000
network ports, which is around 5.5 gigabytes per second.
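
NOTE
A minimal sketch of the kind of dd streaming benchmark described here, scaling the number of parallel write streams until the HBA saturates. It is not the actual 2018 script; the SSD mount points, block size and stream counts are assumptions.
    import subprocess, time
    SSDS = [f"/srv/ssd{i:02d}" for i in range(1, 17)]    # assumed mount points, one per SSD
    def run(n_streams):
        """Write one 10 GiB file per SSD with dd in parallel, return aggregate GiB/s."""
        t0 = time.time()
        procs = [subprocess.Popen(
            ["dd", "if=/dev/zero", f"of={SSDS[i]}/bench.bin",
             "bs=256M", "count=40", "oflag=direct"],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
            for i in range(n_streams)]
        for p in procs:
            p.wait()
        return n_streams * 10 / (time.time() - t0)       # 40 x 256 MiB = 10 GiB per stream
    for n in (1, 2, 4, 8, 16):                           # throughput flattens once the HBA is the bottleneck
        print(f"{n:2d} streams: {run(n):.1f} GiB/s aggregate")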

12:56.000 --> 13:03.000
So no internal bottleneck. Now we need to look at what it looks like when we go through to tape.

13:03.000 --> 13:08.000
So, basically, I take one machine — here, this is just one machine — write to the SSDs,

13:08.000 --> 13:14.000
read from the SSDs to tape. One colour is one SSD; you get the 16 SSDs here,

13:14.000 --> 13:21.000
saturating the network at 2.4 gigabytes per second, and then the tapes get mounted.

13:21.000 --> 13:25.000
Here you see, one colour is one tape: when the tapes are getting mounted,

13:25.000 --> 13:29.000
we read from the different SSDs in that delayed manner, like I was explaining.

13:29.000 --> 13:33.000
This delay in production is around 20 minutes, okay.

13:34.000 --> 13:39.000
And you see here the streams: one server can drive 6 drives at full speed in parallel,

13:39.000 --> 13:42.000
which gives you back the numbers we had.

13:42.000 --> 13:47.000
So those were the tests I did with one server before entering production.

13:47.000 --> 13:52.000
Now, the next step is to scale up and put your nice network together.

13:52.000 --> 13:56.000
So we have a separate network for tape, to drive the tape servers

13:56.000 --> 14:02.000
at 400 megabytes per second here, with a blocking factor that is bigger than this one,

14:02.000 --> 14:05.000
and that is why we have this buffer here.

14:05.000 --> 14:10.000
And then in production, this is what we see.

14:10.000 --> 14:15.000
Green — my KPI is efficiency, so the greener, the more efficient.

14:15.000 --> 14:19.000
Whatever is green is over 350 megabytes per second,

14:19.000 --> 14:23.000
and those are the drives: one line is one drive, okay.

14:23.000 --> 14:27.000
You see here an orange one: that was one drive,

14:27.000 --> 14:30.000
one tape server, and the network port was locked at one gigabit

14:30.000 --> 14:36.000
instead of 10, so it stayed like that for a while — it's really bad.

14:36.000 --> 14:40.000
Now I have an alarm on that. And you see some drives that are not used a lot:

14:40.000 --> 14:44.000
basically, you see, those are the generations I showed you earlier.

14:44.000 --> 14:49.000
We have LTO-9 here and here, TS1170 here and here —

14:49.000 --> 14:52.000
that's the high-capacity media, up to 50 terabytes per tape —

14:52.000 --> 14:56.000
and these are slightly older generations that we use only when we need bandwidth.

14:56.000 --> 14:59.000
So you see some peaks here, where we used them a lot,

14:59.000 --> 15:02.000
because everything was mostly saturated, or we needed to do an intervention

15:02.000 --> 15:05.000
on one library with more recent media.

15:05.000 --> 15:08.000
And here, we had a peak of throughput during this period,

15:08.000 --> 15:10.000
and we kicked them in.

15:10.000 --> 15:13.000
The catch is that whatever we write using those older drives

15:13.000 --> 15:16.000
lands on older media, and we will have to read it back

15:16.000 --> 15:19.000
and write it to new media — that's what we call repack,

15:19.000 --> 15:22.000
which is a huge part of operations.

15:22.000 --> 15:25.000
So let's go to Repack.

15:26.000 --> 15:29.000
That's about operations; this is the kind of tooling we're using.

15:31.000 --> 15:33.000
Yep, and what is Repack?

15:33.000 --> 15:37.000
Basically, that's what I said: you take old media and you read it.

15:37.000 --> 15:41.000
Basically, when you have a 50-terabyte tape

15:41.000 --> 15:45.000
and the previous generation was 20 terabytes, you read

15:45.000 --> 15:52.000
2.5 of the 20-terabyte tapes, and you write them on a single 50-terabyte tape,

15:52.000 --> 15:56.000
so you make space in your libraries, because libraries are just slots.

15:56.000 --> 16:00.000
Then you change the drives to newer generations,

16:00.000 --> 16:02.000
so that you can use the new media,

16:02.000 --> 16:06.000
and then you fill up your library again,

16:06.000 --> 16:09.000
until you can repack again for the next generation.
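
NOTE
Quick arithmetic behind the consolidation just described, using the 20 TB and 50 TB figures from the talk; illustrative only.
    old_tb, new_tb = 20, 50
    tapes_per_new = new_tb / old_tb    # 2.5 old cartridges fill one new 50 TB cartridge
    slots_freed = tapes_per_new - 1    # net 1.5 library slots freed per new tape written
    print(tapes_per_new, slots_freed)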

16:09.000 --> 16:13.000
Repack is doing the same thing: it's reading from tapes,

16:13.000 --> 16:16.000
writing to an SSD buffer, and from the SSD buffer,

16:16.000 --> 16:19.000
whatever is queued there will be queued for archival

16:19.000 --> 16:22.000
on the new generation of media.

16:22.000 --> 16:29.000
We have this nice, complicated automatic repack state machine.

16:29.000 --> 16:33.000
I think that you can look at those presentations if you're interested,

16:33.000 --> 16:35.000
but that's the way we manage it automatically,

16:35.000 --> 16:39.000
and this is what happens in production, basically.

16:39.000 --> 16:45.000
You see, this is the state of the repack buffer.

16:45.000 --> 16:48.000
It's up to 100 terabytes

16:48.000 --> 16:52.000
of SSDs for repack, and on this instance,

16:52.000 --> 16:56.000
when it's filled up by too many files waiting for archival,

16:56.000 --> 17:00.000
we drain it, and we stop queuing new retrievals.

17:00.000 --> 17:03.000
Retrieval is reading from tape, filling the buffer;

17:03.000 --> 17:08.000
archival will lower the buffer. So we have a top limit

17:08.000 --> 17:13.000
and a low limit, so that we add and remove tapes for read and write.
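
NOTE
An illustrative sketch of the high/low watermark idea for the repack buffer; the limits, poll interval and callback names are assumptions, not CTA code.
    import time
    HIGH_TB, LOW_TB = 80, 40                    # assumed top and low limits on the 100 TB buffer
    def repack_loop(buffer_fill_tb, start_retrieves, stop_retrieves):
        """Gate tape reads so archival to new media can always drain the SSD buffer."""
        retrieving = True
        while True:
            fill = buffer_fill_tb()             # current repack buffer occupancy in TB
            if retrieving and fill >= HIGH_TB:
                stop_retrieves()                # stop reading old tapes, let archival drain
                retrieving = False
            elif not retrieving and fill <= LOW_TB:
                start_retrieves()               # buffer drained enough, mount more old tapes
                retrieving = True
            time.sleep(60)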

17:13.000 --> 17:16.000
You see here that during three months,

17:16.000 --> 17:19.000
we had three gigabytes per second,

17:19.000 --> 17:22.000
reading and writing at the same time for repack tapes,

17:22.000 --> 17:25.000
and we repacked, basically,

17:25.000 --> 17:29.000
35 petabytes and 2,000 tapes over three months,

17:29.000 --> 17:34.000
while we were still archiving 40 petabytes per month.

17:34.000 --> 17:41.000
So you see, repack is this thin line on top of archival here,

17:41.000 --> 17:45.000
like the three gigabytes per second line on top of the 20

17:45.000 --> 17:49.000
to 30 gigabytes per second of data taking,

17:49.000 --> 17:53.000
and it's a bit more obvious on the read side,

17:53.000 --> 17:55.000
because repack was the only thing reading,

17:55.000 --> 17:58.000
mainly during data taking.

17:58.000 --> 18:03.000
One thing I'm happy about is that CTA is also opening up

18:03.000 --> 18:05.000
to other communities.

18:05.000 --> 18:07.000
We're trying to drive the protocols —

18:07.000 --> 18:08.000
like Abhishek was saying,

18:08.000 --> 18:11.000
a good part of it is mostly HTTP-based now —

18:11.000 --> 18:13.000
and we wrote an HTTP

18:13.000 --> 18:15.000
tape REST API

18:15.000 --> 18:18.000
that clients can use to trigger a tape mount, to query

18:18.000 --> 18:19.000
where my file is,

18:19.000 --> 18:20.000
and stuff like that.

18:20.000 --> 18:22.000
We use it a lot.
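
NOTE
A rough illustration of this kind of HTTP tape REST call; the host, endpoint paths and payload fields below are placeholders, not the exact WLCG tape REST API specification.
    import requests
    BASE = "https://eosctapublic.example.cern.ch/api/v1"                  # hypothetical endpoint
    payload = {"files": [{"path": "/eos/ctaeos/archive/run3/f1.root"}]}   # hypothetical path
    r = requests.post(f"{BASE}/stage", json=payload, timeout=30)          # ask to bring the file back from tape
    r.raise_for_status()
    request_id = r.json().get("requestId")                                # assumed response field
    status = requests.get(f"{BASE}/stage/{request_id}", timeout=30)       # poll: still on tape or back on disk?
    print(status.json())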

18:22.000 --> 18:26.000
But then, CTA is not just about physics,

18:26.000 --> 18:27.000
because even at CERN,

18:27.000 --> 18:30.000
you see, we have plenty of custom users for various services.

18:30.000 --> 18:32.000
I'm showing this AFS one,

18:32.000 --> 18:34.000
because AFS backups actually land on CTA,

18:34.000 --> 18:37.000
and CephFS backups land on there as well.

18:37.000 --> 18:38.000
EOS namespace backups too.

18:38.000 --> 18:42.000
CTA's own backups are in CTA as well.

18:42.000 --> 18:44.000
And yeah,

18:44.000 --> 18:48.000
we also need to evolve our backup offering at CERN,

18:48.000 --> 18:53.000
and CTA is the de facto standard for whatever needs to land on tape.

18:53.000 --> 18:54.000
We would like to,

18:54.000 --> 18:57.000
we're looking at adding more protocols to CTA,

18:57.000 --> 18:59.000
like S3 Glacier.

18:59.000 --> 19:01.000
The HPC community's interest in tape is growing,

19:01.000 --> 19:03.000
but you see HPC is like,

19:03.000 --> 19:05.000
it's a closed community;

19:05.000 --> 19:10.000
we're not really crossing paths with that world today, I'd say.

19:10.000 --> 19:11.000
And yeah,

19:11.000 --> 19:14.000
basically, what we wanted to demonstrate was

19:14.000 --> 19:18.000
that the nominal performance and efficiency were good for Run 3.

19:18.000 --> 19:21.000
We also reached Run 4 rates during Run 3,

19:21.000 --> 19:23.000
with over 24 gigabytes per second

19:23.000 --> 19:25.000
for one instance.

19:25.000 --> 19:27.000
And the next steps are clearly

19:27.000 --> 19:29.000
oriented towards reading back efficiently,

19:29.000 --> 19:31.000
because the disk capacity,

19:31.000 --> 19:34.000
with regard to all the data that's coming for Run 4,

19:34.000 --> 19:35.000
is shrinking.

19:35.000 --> 19:37.000
So there will be more,

19:37.000 --> 19:40.000
much more reading from tape during Run 4,

19:40.000 --> 19:43.000
and we need to place files,

19:43.000 --> 19:44.000
intelligently.

19:44.000 --> 19:45.000
So one of my projects,

19:45.000 --> 19:46.000
these days,

19:46.000 --> 19:48.000
is to work on archive metadata,

19:48.000 --> 19:51.000
to give a colour to files

19:51.000 --> 19:53.000
and put them closer together,

19:53.000 --> 19:57.000
so that we know how efficient we will be when we read them back.
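
NOTE
A small sketch of the "colour" idea: group queued files by an archive-metadata tag so files likely to be read back together end up co-located on tape. Field names and paths are invented for the example.
    from collections import defaultdict
    queue = [
        {"path": "/eos/exp/runA/f1.root", "colour": "runA-raw"},   # hypothetical entries
        {"path": "/eos/exp/runA/f2.root", "colour": "runA-raw"},
        {"path": "/eos/exp/runB/f1.root", "colour": "runB-raw"},
    ]
    batches = defaultdict(list)
    for f in queue:                      # files sharing a colour go to the same archive batch,
        batches[f["colour"]].append(f)   # hence the same tapes, hence fewer mounts at recall time
    for colour, files in batches.items():
        print(colour, [f["path"] for f in files])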

19:57.000 --> 19:59.000
And that's it.

19:59.000 --> 20:00.000
Also,

20:00.000 --> 20:01.000
if you have,

20:01.000 --> 20:02.000
if you want

20:03.000 --> 20:05.000
to try to plug your own storage into tape,

20:05.000 --> 20:07.000
ping me here and contact us,

20:07.000 --> 20:11.000
because we're really eager to work with different outside sites.

20:13.000 --> 20:15.000
So thank you.

20:23.000 --> 20:26.000
Thank you. Any questions?

20:26.000 --> 20:28.000
[Inaudible question from the audience

20:28.000 --> 20:29.000
about how files are placed on the SSDs.]

20:29.000 --> 20:50.000
So the question was, when we move one file, so let's go back to, we were here, yes, kind of.

20:50.000 --> 20:56.280
So when we move one file from EOS to the SSDs, before it moves to tape, do we stripe it on the

20:56.280 --> 21:03.280
SSDs? In fact, we don't need to, because we have enough throughput on one SSD to

21:03.280 --> 21:10.280
feed a drive, and CTA manages single files only. So we take one file, the file is on

21:10.280 --> 21:17.280
EOS, it's moved to CTA, onto one SSD, on one file system, and then it's streamed from this

21:17.280 --> 21:25.280
file system onto one tape, at one fseq, one position on one tape, and that's it.

21:26.280 --> 21:35.280
Yes — once you archive something to tape, and it's not on the file

21:35.280 --> 21:41.280
system anymore, you only keep one copy on tape, and that's it? What about multiple

21:41.280 --> 21:49.280
backups of files? So the question is: how many copies of a file do we have on tape?

21:50.280 --> 21:57.280
So basically, at CERN, I mean, the physicists are like that: they wouldn't

21:57.280 --> 22:02.280
want to keep all the eggs in the same basket, especially if CERN or the data centre

22:02.280 --> 22:07.280
burns at one point, so they have a secondary point where they duplicate the data. So for physics

22:07.280 --> 22:13.280
data, we have only one copy at CERN, and the second copy is elsewhere. So basically, it's

22:13.280 --> 22:17.280
often distributed across the different Tier 1s. For example, if you take ATLAS,

22:17.280 --> 22:23.280
their main Tier 1 outside CERN is Brookhaven National Lab, so they usually

22:23.280 --> 22:28.280
have, if not a full copy, at least a good portion of the data. There are some periods where it's

22:28.280 --> 22:33.280
slightly more complicated, especially during heavy-ion, when the data rates are increasing, and

22:33.280 --> 22:39.280
we took, for example, 21 gigabytes per second for a sustained period for CMS;

22:39.280 --> 22:43.280
it's more complicated for them to move it at the same time to the Tier 1s, to the processing

22:43.280 --> 22:48.280
and so on, especially if they're running over the SLAs we agreed on before. Okay, if the machine is

22:48.280 --> 22:54.280
sized for 10 and you go to 21, expect it to be a bit more complicated.

22:54.280 --> 23:01.280
But for now, we have mostly one copy; we only have two copies for data that exists

23:01.280 --> 23:19.280
only at CERN. Yes?

23:19.280 --> 23:23.280
So you would first have to use EOS to use CTA.

23:23.280 --> 23:24.280
Question.

23:24.280 --> 23:26.280
That's the question.

23:26.280 --> 23:32.280
Can CTA work with something else than EOS, for example S3?

23:32.280 --> 23:39.280
So, in fact, CTA — if we go here, I have a backup slide.

23:39.280 --> 23:46.280
I already showed here that we have EOS at CERN in front of CTA, used

23:46.280 --> 23:54.280
as the write buffer, but at DESY in Germany they have the dCache disk system, and they hooked

23:54.280 --> 24:00.280
it onto CTA. This was the first step for us. As you see,

24:00.280 --> 24:06.280
one tape server will read either from EOS or from dCache, and we have to use common

24:06.280 --> 24:10.280
protocols for authentication, for triggering events, and anything like that —

24:10.280 --> 24:13.280
we're moving to gRPC for this reason.

24:14.280 --> 24:21.280
They use HTTP for the transfers from dCache to tape; we're still using XRootD, but we are thinking

24:21.280 --> 24:26.280
about how to evolve that. And, you see, at some point we will have a technical student who will work

24:26.280 --> 24:32.280
on how to deal with other protocols, like S3, in front of tape, so that we can interface

24:32.280 --> 24:36.280
backups in front of CTA.

24:37.280 --> 24:44.280
So we'll see at that moment if there are more things to change. And, you see,

24:44.280 --> 24:52.280
as I said, the current Run 3 goes up to mid-2026, which will leave us some

24:52.280 --> 24:57.280
time to redo things. We have things in the pipeline, like reading back efficiently;

24:57.280 --> 25:04.280
we will change the scheduler; we are likely to change the APIs and stuff like that.

25:04.280 --> 25:08.280
Very likely.

25:08.280 --> 25:13.280
Yes?

25:13.280 --> 25:19.280
Yes.

25:19.280 --> 25:25.280
So the question was, when a new generation of media comes, do you copy the whole

25:25.280 --> 25:34.280
exabyte of data to the next generation? We will have to, because we don't have anywhere

25:34.280 --> 25:41.280
else to keep our data, and the next repack we will have to do will be

25:41.280 --> 25:47.280
400 petabytes. So 400 petabytes, at 10 gigabytes per second of throughput reading plus 10 gigabytes per second

25:47.280 --> 25:52.280
of throughput writing, is one and a half years, so we have to be faster.
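
NOTE
Order-of-magnitude check of the figure quoted here, purely illustrative.
    volume_pb, rate_gb_s = 400, 10
    seconds = volume_pb * 1_000_000 / rate_gb_s   # 1 PB = 1,000,000 GB
    print(seconds / 86_400 / 365)                 # ~1.3 years of non-stop reading at 10 GB/s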

25:52.280 --> 25:58.280
So you see, yeah, that's my job, basically.

25:58.280 --> 26:02.280
So the good thing with repack is that it keeps us exercised during the shutdown:

26:02.280 --> 26:07.280
we have massive repacks that exercise our infrastructure before the next run comes.

26:07.280 --> 26:13.280
And you see, as I said, the historical data is just two years of the future data we'll take.

26:13.280 --> 26:17.280
So if we cannot deal with historical data, forget about dealing with what is coming.

26:17.280 --> 26:21.280
Okay, so we have to do it — it's a good exercise.

26:21.280 --> 26:28.280
And yeah, we have to go to the generation of media for the next run; keeping all

26:28.280 --> 26:32.280
your legacy media and drives is just insane.

26:32.280 --> 26:34.280
Plus we have to pay maintenance and spare drives.

26:34.280 --> 26:40.280
Older drives cost more and are slower, and you have to pay for more libraries if you keep all the media.

26:40.280 --> 26:46.280
So we really have to keep up with repack, so that we know, and we demonstrate, that we can deal with future data.

26:46.280 --> 26:54.280
And you see, in fact, we validated the CTA architecture with SSDs during LS2,

26:54.280 --> 26:59.280
while still using Castor in production, because we could exercise CTA with repack.

26:59.280 --> 27:00.280
Yes?

27:00.280 --> 27:04.280
When is the next generation of tape ready — so LTO-9, or what else?

27:04.280 --> 27:09.280
LTO-9 is already there, LTO-10 will come soon.

27:09.280 --> 27:11.280
And we follow the cycle.

27:11.280 --> 27:16.280
We don't know exactly — so, when do we expect the next generation of tape media to be available?

27:16.280 --> 27:23.280
So LTO-10 will be the next generation; I think it comes either at the end of this year or next year.

27:23.280 --> 27:28.280
But then, you see, we also have a technical stop of close to three years.

27:28.280 --> 27:37.280
Should we focus on improving performance so that we repack fast in the very last year of the shutdown?

27:37.280 --> 27:41.280
Do we need to deal with that using the two types of drives I showed you?

27:41.280 --> 27:43.280
We have enterprise, we have LTO.

27:43.280 --> 27:47.280
The cycle is more like: there's LTO one year, there's enterprise another year.

27:47.280 --> 27:54.280
So we'll see, because buying back a full exabyte of data is on the expensive side of things.

27:54.280 --> 27:59.280
So skipping one major generation could spare a bit of money.

27:59.280 --> 28:04.280
Can you use the compression and encryption from the tape drive?

28:04.280 --> 28:09.280
And how do you manage the tape drive cleaning process?

28:09.280 --> 28:16.280
Okay, so: do we use encryption and compression on the tape drives, and how do we manage the cleaning process?

28:16.280 --> 28:22.280
So basically, the data is mostly highly compressed already.

28:22.280 --> 28:28.280
We compress on top, but basically we don't get much more.

28:29.280 --> 28:33.280
We don't get a huge gain, because it's mostly ROOT files that are already very compressed.

28:33.280 --> 28:46.280
So if we see that a tape fits a lot more than nominal, we will contact the user and say, hey, please compress beforehand.

28:46.280 --> 28:53.280
And regarding encryption, we use encryption only for non-public data; all physics data is supposed to be open anyway.

28:53.280 --> 28:57.280
So that's not critical data we need to encrypt.

28:57.280 --> 29:01.280
We only encrypt backup-type data, and we manage the keys, basically.

29:01.280 --> 29:08.280
You configure the drive with a key and it will encrypt on the fly.

29:08.280 --> 29:10.280
And I think that's it.

29:10.280 --> 29:17.280
And the cleaning, that's part of the tape library: the libraries come with their own cleaning cycles.

29:17.280 --> 29:24.280
If a drive needs cleaning, the library will mount a cleaning cartridge every now and then.

29:24.280 --> 29:25.280
Yes?

29:25.280 --> 29:32.280
Do you have a specific procedure when a new generation of hardware comes out, or if you want to test a new type of media, let's say,

29:32.280 --> 29:35.280
to see if the performance is similar.

29:35.280 --> 29:46.280
So, do we have some procedure to test the performance of that software? We don't use it, by definition — we use CTA.

29:46.280 --> 30:00.280
For the next generation of tape drives, basically we put them on dual-copy tape pools, because, as I said, the physics stuff is one copy, but we have some sets of tapes that have two copies.

30:00.280 --> 30:05.280
And one of the two copies will be landing on the new media.

30:05.280 --> 30:15.280
So if we have some issues with the drives, an early series of tapes or whatever, we can lose this tape, because we know that the other copy

30:15.280 --> 30:20.280
is still valid, and we can just retire the tape and recreate the lost copy.

30:20.280 --> 30:25.280
So this is how we test it in production, along with the rest, with no risk.

30:25.280 --> 30:28.280
I guess time's up for questions.

30:28.280 --> 30:29.280
Thank you.

30:29.280 --> 30:39.280
Thank you.

