WEBVTT

00:00.000 --> 00:12.000
All right, welcome everybody. We're going to hear about CephFS again, and connecting

00:12.000 --> 00:15.000
the front line with some beautiful seats today.

00:15.000 --> 00:19.000
So, give it up for the next one, Mr. George.

00:19.000 --> 00:24.000
Hi everybody.

00:24.000 --> 00:26.000
I'm Patrick Donnelly.

00:26.000 --> 00:31.000
I've been working on CephFS for almost nine years now.

00:31.000 --> 00:34.000
Previously, I was the CephFS team lead.

00:34.000 --> 00:38.000
Now, just tackling larger projects.

00:38.000 --> 00:39.000
Thank you, Shankar.

00:39.000 --> 00:42.000
He's now the team lead.

00:42.000 --> 00:44.000
All right.

00:44.000 --> 00:45.000
Yeah, hello.

00:45.000 --> 00:47.000
So, my name is Günther Deschner.

00:47.000 --> 00:50.000
I have also worked for IBM for quite some time.

00:50.000 --> 00:52.000
I was working for Red Hat before that.

00:52.000 --> 00:56.000
And I have been working on the Samba code for a very long time now.

00:56.000 --> 01:01.000
I think since 2002 or 2004, so a very long time.

01:01.000 --> 01:02.000
Yeah.

01:02.000 --> 01:09.000
And at IBM, we are looking into providing SMB support for CephFS, which is a relatively new feature.

01:09.000 --> 01:13.000
It has just been provided as a tech preview.

01:13.000 --> 01:19.000
And of course, we are looking for a lot of performance improvements and feature addition

01:19.000 --> 01:20.000
and support.

01:20.000 --> 01:22.000
So, there's a whole bunch of stuff to do.

01:22.000 --> 01:26.000
And one of the most important things we realized is really the problem of case sensitivity.

01:26.000 --> 01:29.000
And this is what our talk is all about.

01:29.000 --> 01:34.000
So, I hope you are all familiar with what the core problem of case sensitivity is,

01:34.000 --> 01:35.000
what is actually meant by that.

01:35.000 --> 01:37.000
It's very easy: in POSIX file systems,

01:37.000 --> 01:40.000
you have, for example, if you take the term CEPHFS.

01:40.000 --> 01:41.000
Okay.

01:41.000 --> 01:43.000
And this may be a bad example because it's all uppercase.

01:43.000 --> 01:49.000
But typically, it's written with a capital C, then lowercase e, p, h, and then a capital F and S.

01:49.000 --> 02:00.000
And if you just imagine all the possible case variants of that name, then you can have all these different files on a Unix file system, which are all completely independent.

02:00.000 --> 02:02.000
They have nothing to do with each other.

02:02.000 --> 02:05.000
Well, in the Windows world, they're all considered to be the same file.
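
NOTE
Editor's sketch, not from the talk: a minimal Python illustration of the point above, assuming the six-letter name CephFS. It enumerates every upper/lower-case spelling; a case-sensitive POSIX file system could hold each one as an independent file, while a case-insensitive view folds them all to one key. The helper name case_variants is hypothetical.
```python
import itertools
def case_variants(name: str) -> set:
    """Return every upper/lower-case spelling of name."""
    choices = [{ch.lower(), ch.upper()} for ch in name]
    return {"".join(combo) for combo in itertools.product(*choices)}
variants = case_variants("CephFS")
# A case-sensitive (POSIX) file system treats each spelling as its own file.
posix_names = set(variants)
# A case-insensitive view collapses all spellings to one folded key.
folded_names = {v.casefold() for v in variants}
print(len(posix_names), len(folded_names))
```
Six letters with two cases each give 2**6 distinct names on the POSIX side and a single name on the Windows side.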

02:05.000 --> 02:11.000
And that's, of course, a fundamental problem of interoperability and of compatibility.

02:11.000 --> 02:17.000
And for that reason, we are taking this work up.

02:17.000 --> 02:22.000
So, first of all, Samba is not really part of Ceph.

02:22.000 --> 02:24.000
It's kind of an independent project.

02:24.000 --> 02:31.000
Obviously, it's really just running on top of Ceph and exposing the file system with Windows semantics

02:31.000 --> 02:35.000
to Windows clients or clients that are expecting Windows semantics.

02:35.000 --> 02:42.000
So, you would imagine that maybe with that additional layer on top, you have a way to kind of negotiate that behavior on the server side.

02:42.000 --> 02:49.000
So, is it maybe possible for the server to run in a case-sensitive mode or in a case-insensitive mode?

02:49.000 --> 02:54.000
And actually, in the old SMB1 protocol, you see a network trace here.

02:54.000 --> 02:59.000
You have a negotiate protocol packet being sent from the client to the server.

02:59.000 --> 03:01.000
Then the server can reply, yes, I do support that.

03:01.000 --> 03:08.000
There was actually a case-sensitive bit that would indicate whether path names are case-less or case-sensitive.

03:08.000 --> 03:13.000
But, as you are all aware, SMB1 is really deprecated and not used anymore.

03:13.000 --> 03:20.000
And if you look now at the follow-up protocol, SMB2, and all the corresponding protocol specifications,

03:20.000 --> 03:23.000
I mention them here, for example MS-SMB2.

03:24.000 --> 03:30.000
The ones in brackets are kind of the industry-standard definitions, controlled by Microsoft.

03:30.000 --> 03:35.000
You will find this bit still mentioned, this SMB2 case-sensitive bit.

03:35.000 --> 03:39.000
But you also have a footnote there saying this bit is ignored by Windows systems,

03:39.000 --> 03:42.000
which always handle path names as case-insensitive.

03:42.000 --> 03:49.000
So, in that case, you can, of course, look at the file system implementation that you have in Windows,

03:49.000 --> 03:54.000
or whatever, not the file system that is exposed to the network, but just the local file system.

03:54.000 --> 03:59.000
And they actually also have a flag that indicates a case-sensitive behavior.

03:59.000 --> 04:06.000
They have one for FILE_CASE_SENSITIVE_SEARCH, and one for preserving names.

04:06.000 --> 04:11.000
And then, if you look in another document, the FSCC, no, the FSA document it was actually,

04:11.000 --> 04:16.000
you will see that actually all the file systems that they have, like ReFS, NTFS,

04:16.000 --> 04:20.000
FAT, exFAT, and so forth, are all case-preserving.

04:20.000 --> 04:25.000
And the majority of the ones that you would encounter on a kind of a production machine,

04:25.000 --> 04:31.000
particularly NTFS, case-sensitive search is always set.

04:31.000 --> 04:35.000
So, from the Samba perspective,

04:35.000 --> 04:42.000
that basically means we have no way to negotiate case-sensitive behavior at the SMB protocol level.

04:42.000 --> 04:46.000
That said, of course, Samba itself has really much more flexibility

04:46.000 --> 04:51.000
than a Windows system, and in particular it had to address this problem in the first place.

04:51.000 --> 04:59.000
So, Samba came up in the past with a set of configuration options that you can set in the smb.conf file

04:59.000 --> 05:03.000
that control the server behavior: whether it is case-sensitive,

05:03.000 --> 05:08.000
whether it will preserve the case, and there's also a third setting for the default case.

05:08.000 --> 05:11.000
And if you go to the documentation in the Samba man page,

05:11.000 --> 05:16.000
there's a section about this specific behavior.

05:16.000 --> 05:20.000
And the default case setting has been designed,

05:20.000 --> 05:25.000
so that you can basically eliminate all these costly case-folding operations

05:25.000 --> 05:30.000
by just saying: okay, I assume that all the files in this directory

05:30.000 --> 05:32.000
will be either uppercase or lower case.

05:32.000 --> 05:36.000
And if you then turn the other knobs, like setting case sensitive to yes

05:36.000 --> 05:40.000
and preserve case to no, then you can really avoid this problem altogether,

05:40.000 --> 05:46.000
of course, at the cost that all the content in these directories will need to match that specific description.

05:46.000 --> 05:50.000
So, one can imagine use cases where this is appropriate

05:50.000 --> 05:56.000
and really a meaningful way to avoid these case-folding operations,

05:56.000 --> 06:03.000
but of course, for many other scenarios, like some default share that will just be written to by all kinds of clients,

06:03.000 --> 06:06.000
this is not applicable.

06:06.000 --> 06:10.000
So, if you look a little bit deeper at Samba, Samba has its own module stack,

06:10.000 --> 06:16.000
the VFS modules, as they are called, which allows file-system-specific modules to be created

06:16.000 --> 06:20.000
to control specific aspects of the file system.

06:20.000 --> 06:24.000
And there was one addition made a very long time ago already,

06:24.000 --> 06:27.000
which is the VFS get_real_filename call.

06:27.000 --> 06:32.000
And if a module implements that call, it basically just gets a request for a file name,

06:32.000 --> 06:34.000
which could really be in any case.

06:34.000 --> 06:39.000
And the function will then return the properly cased file name as it is on disk,

06:39.000 --> 06:41.000
really in the file system.

06:41.000 --> 06:46.000
So Samba has an easy way to really completely avoid all these lookup operations,

06:46.000 --> 06:50.000
and can just say: okay, I'm looking for CephFS as the file name,

06:50.000 --> 06:53.000
regardless of the case, and then the file system can report back:

06:53.000 --> 06:58.000
okay, actually it is with capital C or all capital letters or all lower case letters,

06:58.000 --> 07:00.000
or whatever.

07:01.000 --> 07:07.000
So, there's a whole bunch of VFS modules that actually do implement this specific API call

07:07.000 --> 07:12.000
for GPFS; there's an implementation for GlusterFS as well.

07:12.000 --> 07:16.000
But we don't have anything like this for CephFS right now.

07:16.000 --> 07:23.000
Then, inside the Samba file server, there's another capability function that will basically indicate

07:23.000 --> 07:28.000
what kind of properties the exported file system really has.

07:28.000 --> 07:35.000
There's a lot of knowledge already in place, which actually will be the vehicle for our

07:35.000 --> 07:36.000
implementation.

07:36.000 --> 07:38.000
that we're going to talk about.

07:38.000 --> 07:43.000
And of course, there are some special cases, like if you followed Volker's talk about the

07:43.000 --> 07:50.000
Unix extensions earlier today, he was also talking about POSIX path names,

07:50.000 --> 07:55.000
so that basically if you have POSIX extensions negotiated in the SMB2 world,

07:55.000 --> 08:01.000
which existed even in the SMB1 protocol, then you can also completely avoid all these

08:01.000 --> 08:07.000
case operations, by just assuming both client and server are running on

08:07.000 --> 08:11.000
POSIX systems and are following POSIX semantics.

08:11.000 --> 08:14.000
And then you can avoid all these lookup operations as well.

08:14.000 --> 08:19.000
And there's a bunch of clients available that support that: the kernel CIFS client or

08:20.000 --> 08:23.000
the smbclient utility; there might be others as well.

08:23.000 --> 08:27.000
And also there's of course the server support in Samba available.

08:27.000 --> 08:32.000
But again, this is really a special case, which is mostly addressing

08:32.000 --> 08:37.000
POSIX-to-POSIX communication over the SMB protocol.

08:37.000 --> 08:44.000
So, to go through what the flow of operations really looks like in these two examples,

08:44.000 --> 08:48.000
when you have a file system that supports case-insensitive

08:48.000 --> 08:53.000
lookups, or when you have one that doesn't: just imagine there's an operation where

08:53.000 --> 08:55.000
someone wants to open a file.

08:55.000 --> 08:59.000
So there's just a Windows client trying to open a file,

08:59.000 --> 09:04.000
and in this case it's FileName with a capital F and a capital N.

09:04.000 --> 09:08.000
So the Samba server will receive that request and will then look for

09:08.000 --> 09:12.000
exactly that formatted string in the file system.

09:12.000 --> 09:17.000
If it exists, fine, it will just open it and return to the caller.

09:17.000 --> 09:23.000
If it does not exist, it actually has to open the directory and iterate over the entire contents of the directory

09:23.000 --> 09:29.000
in order to find the exactly matching file name, because there could be, in the same directory,

09:29.000 --> 09:35.000
a file also called filename, but with a lowercase f or a lowercase n or something like that.

09:35.000 --> 09:41.000
So you can imagine multiple scenarios where you have really a long sequence of calls

09:41.000 --> 09:46.000
and really only done in order to find the appropriate file.

09:46.000 --> 09:51.000
Whereas if you have a file system that does support case-insensitive lookups,

09:51.000 --> 09:57.000
the flow of control is much shorter: we will have the same incoming request for the file name

09:57.000 --> 10:02.000
in that specific case; if it exists, it will just be opened and we are done.

10:02.000 --> 10:10.000
So obviously much shorter and we can avoid all these full directory traversal operations for this specific operation.
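
NOTE
Editor's sketch, not Samba's code: a minimal Python model of the longer flow described above, assuming the backing file system is case-sensitive. It tries the exact spelling first and otherwise scans the whole directory comparing folded names. The function name open_case_insensitive and the file names are hypothetical.
```python
import os
import tempfile
def open_case_insensitive(directory: str, requested: str):
    """Return (on-disk name or None, number of directory entries scanned)."""
    # Fast path: the exact spelling exists, so one lookup is enough.
    if os.path.lexists(os.path.join(directory, requested)):
        return requested, 0
    # Slow path: iterate over the directory and compare folded names.
    wanted = requested.casefold()
    scanned = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            scanned += 1
            if entry.name.casefold() == wanted:
                return entry.name, scanned
    return None, scanned
with tempfile.TemporaryDirectory() as share:
    for name in ("readme.txt", "filename", "notes.md"):
        open(os.path.join(share, name), "w").close()
    exact = open_case_insensitive(share, "filename")
    folded = open_case_insensitive(share, "FileName")
    missing = open_case_insensitive(share, "NoSuchFile")
```
The cost of the slow path grows with the directory size, and a miss always scans every entry; a file system with native case-insensitive lookup only ever needs the fast path.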

10:10.000 --> 10:18.000
And one colleague of mine actually did a test run, really just an experiment, by

10:18.000 --> 10:25.000
untarring the Linux kernel sources on an SMB share over libcephfs, and he counted;

10:25.000 --> 10:29.000
he did an analysis of what kind of system calls are called and he identified that

10:29.000 --> 10:36.000
140,204 readdir operations are called just really because of that specific operation.

10:36.000 --> 10:42.000
And of the entire time it took for the untarring to complete, the time spent in these operations was

10:42.000 --> 10:45.000
30.3% of the whole time.

10:45.000 --> 10:50.000
So really an enormous impact on the file system performance obviously.

10:50.000 --> 10:56.000
And then he repeated the same test with really these settings that I mentioned earlier,

10:56.000 --> 11:01.000
which basically just assume, in this case he decided, that the default case would be lower,

11:01.000 --> 11:04.000
basically eliminating all the need for these additional lookups.

11:04.000 --> 11:07.000
And the execution time really went down by almost a third.

11:07.000 --> 11:11.000
So just really with this simple configuration setting.

11:11.000 --> 11:18.000
But of course this is something which we can't really build into a production system or something;

11:18.000 --> 11:20.000
this has been really just a test.

11:20.000 --> 11:25.000
So we definitely need to address the problem really at the glue layer which is connecting

11:25.000 --> 11:46.000
the SMB and the Ceph world, and with that I hand over to Patrick.

11:46.000 --> 11:51.000
So let's move on to talking about case insensitivity in CephFS.

11:51.000 --> 11:55.000
Before we begin I'm going to bring up a slide we just saw in the last talk.

11:55.000 --> 12:00.000
For those who are not familiar with CephFS and just joining us:

12:00.000 --> 12:03.000
CephFS is a POSIX distributed file system.

12:03.000 --> 12:09.000
It's been around since about 2006, during Sage Weil's PhD thesis work

12:09.000 --> 12:12.000
at UC Santa Cruz.

12:12.000 --> 12:19.000
It is the original use case for Ceph's RADOS distributed object store, which was also

12:19.000 --> 12:23.000
developed at the same time.

12:23.000 --> 12:27.000
CephFS did something somewhat novel in the beginning by

12:27.000 --> 12:33.000
sharding metadata and data into separate pools and having metadata servers

12:33.000 --> 12:44.000
act as basically a cache and authoritative access point for all metadata in the Ceph file system.

12:44.000 --> 12:50.000
And clients are able to interact directly with the data pool doing

12:50.000 --> 12:57.000
reads and writes without having to go through the MDS, so long as they have appropriate access.

12:57.000 --> 13:06.000
The clients and the MDS collaboratively maintain the distributed cache of the metadata.

13:06.000 --> 13:10.000
And sometimes the client is authoritative for what the cache state is for a file,

13:10.000 --> 13:13.000
but generally the MDS is.

13:13.000 --> 13:19.000
The MDS efficiently writes out metadata changes to journals.

13:19.000 --> 13:29.000
It distributes metadata and exchanges metadata with other MDSs in the background.

13:29.000 --> 13:35.000
And it hands out rights to the clients as part of the distributed cache in the forms of capabilities

13:35.000 --> 13:39.000
which you may have heard about before.

13:39.000 --> 13:43.000
So jumping right into it.

13:43.000 --> 13:46.000
Directory entries.

13:46.000 --> 13:51.000
So something we don't really think about very often especially within the context of file systems,

13:51.000 --> 13:58.000
but it's a little part of directories that holds metadata we don't often think about.

13:58.000 --> 14:04.000
Here is the POSIX definition of a directory entry on the right.

14:04.000 --> 14:08.000
It holds the inode number.

14:08.000 --> 14:12.000
And you get this structure when you do a readdir call on a directory.

14:12.000 --> 14:16.000
It holds the inode for that particular directory entry.

14:16.000 --> 14:22.000
Some record length and offset information.

14:22.000 --> 14:28.000
The type of the directory entry, which won't change because inodes don't change type.

14:28.000 --> 14:33.000
So it can be like another directory or a file which can be helpful to cut out system calls.

14:33.000 --> 14:36.000
if you only care to look for directories, for instance.

14:36.000 --> 14:47.000
And then the directory entry name, which in Linux is limited to 256 characters; that seems to be a common choice among POSIX file systems.

14:47.000 --> 15:00.000
Now, a while ago we had the observation that it would be useful to add another bit of metadata to the directory entry.

15:00.000 --> 15:10.000
And that is this new field, alternate name, and it's just an opaque byte vector that we can stuff whatever we want into.

15:10.000 --> 15:16.000
The MDS does not actually care what's in this opaque structure.

15:16.000 --> 15:25.000
It only puts the data that it's been given by the client in that alternate name.

15:25.000 --> 15:29.000
So how do we actually use this and why does it exist?

15:29.000 --> 15:34.000
So the first use case for alternate name was actually encryption.

15:34.000 --> 15:46.000
A project that was worked on a few years ago was to plug the kernel library fscrypt into the CephFS kernel driver.

15:46.000 --> 15:54.000
And the idea there was that the client would be able to encrypt a directory tree, including the data, of course,

15:54.000 --> 15:57.000
and also the file names and the directory names.

15:57.000 --> 16:13.000
And the MDS, or any other entity that recovers that file system through whatever means, would not be able to decrypt and know what those directory entry names are or what the file data is.

16:13.000 --> 16:21.000
So you just bring your own key and you can encrypt an entire file system tree on CephFS.

16:21.000 --> 16:25.000
So what that would look like is: the client's trying to manipulate file.txt.

16:25.000 --> 16:32.000
And it sends a file create with the encrypted directory entry name to the MDS.

16:32.000 --> 16:39.000
And that's what the MDS stores in the directory.

16:39.000 --> 16:45.000
Now, before we had this alternate name field, we would just store the encrypted name

16:45.000 --> 16:48.000
and then the inode number for the file.

16:48.000 --> 16:53.000
The problem that we encountered was: well, we're going to encrypt a name.

16:53.000 --> 16:56.000
It's going to output binary data.

16:56.000 --> 17:01.000
Many of the characters in that binary data are not actually valid in file names.

17:01.000 --> 17:03.000
So we have to encode it.

17:03.000 --> 17:06.000
And when you encode it, the file name size increases.

17:06.000 --> 17:16.000
Well, if I give a valid long file name to fscrypt, the encoded name may be larger than the maximum size of a directory entry name.

17:16.000 --> 17:18.000
So we had to deal with that.

17:18.000 --> 17:24.000
And the way we do it is: if the file name is too long,

17:24.000 --> 17:28.000
we put it in this alternate name field in the directory entry.

17:28.000 --> 17:35.000
So now we can recover the long file name and decrypt it

17:35.000 --> 17:40.000
without overflowing the directory entry maximum length.

17:40.000 --> 17:42.000
Name maximum length.

17:42.000 --> 17:49.000
So the observation was that we could also use this alternate name functionality for a similar purpose:

17:49.000 --> 17:56.000
handling case folding in CephFS.

17:56.000 --> 18:06.000
And the idea here is we can have all the clients agree on how to transform a directory entry name,

18:06.000 --> 18:10.000
such that it no longer has case in it.

18:10.000 --> 18:14.000
We case-fold it.

18:14.000 --> 18:22.000
And then we store the actual directory entry name with the case in it.

18:22.000 --> 18:28.000
what I call the caseful name, in the alternate name field so that it can be recovered later.

18:28.000 --> 18:33.000
Like when Samba asks what's the actual file name.

18:33.000 --> 18:38.000
Or if I'm doing a readdir operation on the directory, I want to know what the file name is,

18:38.000 --> 18:42.000
complete with the case that was used when the file was created.

18:42.000 --> 18:47.000
Now the nice thing about this is that the MDS doesn't actually care at all about the alternate name.

18:47.000 --> 18:50.000
It's just storing what the client says the alternate name is;

18:50.000 --> 18:51.000
it doesn't look at it.

18:51.000 --> 18:59.000
All it cares about are the path names, and those would be the case-folded names that are actually used to name the file.

18:59.000 --> 19:06.000
Also, the client doesn't really care much either about the name and the alternate name.

19:06.000 --> 19:13.000
It only actually needs to unwrap that name for a readdir call, for the POSIX API.

19:13.000 --> 19:22.000
That's the only time an application learns what the name of a directory entry is: readdir.

19:22.000 --> 19:28.000
All other times the client is just using the case folded name.

19:28.000 --> 19:37.000
So whenever we send the create RPC to the MDS, the client attaches the alternate name and the MDS stores it.

19:37.000 --> 19:48.000
And then when a future lookup or readdir goes to the MDS, the client collects that alternate name from the MDS and reinterprets it as needed.

19:48.000 --> 19:53.000
So as a concrete example, we're stat-ing this path.

19:53.000 --> 20:03.000
The client's going to case-fold that to lowercase home.

20:03.000 --> 20:15.000
It's going to do a lookup operation on the root, 0x1, inode one, with lowercase home, and send that off to the MDS. Then the MDS finds it in its table.

20:15.000 --> 20:27.000
The alternate name is capital-H Home; that's the real caseful name of that directory. It sends it back to the client.

20:27.000 --> 20:36.000
It doesn't matter for the path traversal on the client side because it's not trying to recover what the real name is.

20:36.000 --> 20:44.000
Then we're going to look up Patrick: we case-fold it to lowercase patrick and send that lookup call to the MDS.

20:44.000 --> 20:57.000
It discovers the alternate name is capital-P Patrick and stores that. It doesn't need it right now. It's going to continue with the lookup and then search for file.txt, again case-folded,

20:57.000 --> 21:08.000
and find out that the alternate name that was used when the file was created was capital-F File, and the extension was capital TXT.

21:08.000 --> 21:14.000
Again, that's not needed for a lookup operation; it just stores it in its cache.

21:14.000 --> 21:24.000
For a readdir, the application is going to come to the client, the mount, and say: I want to readdir home/Patrick. It's going to

21:24.000 --> 21:49.000
do a readdir on this inode after it does the path discovery on it. It's going to get this table from the MDS and transfer it here, and then the trick is it's going to use this alternate name as what it's going to pass back to the client.

21:49.000 --> 21:58.000
That's the only time the alternate name is actually used and presented to the application.
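
NOTE
Editor's sketch, not CephFS code: a toy Python model of the scheme walked through above. The directory table is keyed by the case-folded name, and the original caseful spelling rides along as an opaque alternate name that only readdir unwraps. The class and method names are hypothetical, and Python's str.casefold stands in for the folding that the real clients agree on.
```python
from dataclasses import dataclass
@dataclass
class Dentry:
    ino: int
    alternate_name: bytes  # opaque to the table; here the caseful spelling
class Directory:
    """Toy stand-in for one directory's dentry table on the MDS."""
    def __init__(self):
        self.entries = {}
    def create(self, name: str, ino: int) -> None:
        key = name.casefold()  # the folded name is the real dentry key
        if key in self.entries:
            raise FileExistsError(name)
        self.entries[key] = Dentry(ino, name.encode("utf-8"))
    def lookup(self, name: str) -> int:
        # Path traversal only needs the folded key, never the alternate name.
        return self.entries[name.casefold()].ino
    def readdir(self) -> list:
        # readdir is the one place where the caseful name is recovered.
        return [d.alternate_name.decode("utf-8") for d in self.entries.values()]
home = Directory()
home.create("File.TXT", ino=0x1001)
print(home.lookup("file.txt"), home.readdir())
```
Any spelling resolves to the same inode, a second create that differs only in case is rejected, and listing the directory returns the spelling used at creation time.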

21:58.000 --> 22:12.000
So how do we set up case insensitivity in CephFS? We use a new suite of virtual extended attributes, virtual xattrs.

22:13.000 --> 22:19.000
They include ceph.dir.casesensitive, ceph.dir.normalization, ceph.dir.encoding,

22:19.000 --> 22:26.000
and then ceph.dir.charmap, which is just a read-only view of what the charmap is, and the charmap looks like this.

22:26.000 --> 22:38.000
It's just a JSON output, and you can see that for this example directory the case sensitivity is false, so it's a case-insensitive directory.

22:38.000 --> 22:46.000
We have a certain normalization setting that we'll get into, and then it's UTF-8 encoded directory entry names.

22:46.000 --> 22:56.000
The requirements to modify or set the charmap on a directory are that it must be empty and it must not be part of a snapshot.

22:56.000 --> 23:07.000
And that's important because I can't just suddenly mark a directory case-insensitive, because there could be a bunch of files which would conflict with each other if they were properly folded.

23:07.000 --> 23:11.000
So it has to be done when the directory is created.

23:11.000 --> 23:15.000
And the idea there would be that it would be used for Samba shares upfront.

23:15.000 --> 23:17.000
I want to use this directory tree for Samba.

23:17.000 --> 23:22.000
I'm going to mark it upfront that it's going to be insensitive.
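
NOTE
Editor's sketch under stated assumptions: the attribute names ceph.dir.casesensitive and ceph.dir.charmap come from the talk, but the exact JSON keys and value spellings in the sample string are assumed, as is the helper name mark_case_insensitive. The helper uses Python's os.setxattr and os.getxattr (Linux only) and is defined but not called here, since it needs an empty directory on a CephFS mount with this feature.
```python
import json
import os
def mark_case_insensitive(path: str) -> dict:
    """Set the vxattr on an empty CephFS directory and return its charmap."""
    os.setxattr(path, "ceph.dir.casesensitive", b"0")
    return json.loads(os.getxattr(path, "ceph.dir.charmap"))
# Offline illustration with the three settings described in the talk.
# The key names and value spellings below are assumptions.
sample = '{"casesensitive": false, "normalization": "nfd", "encoding": "utf8"}'
charmap = json.loads(sample)
print(charmap["casesensitive"], charmap["normalization"], charmap["encoding"])
```
Because the charmap can only be changed on an empty directory, a call like mark_case_insensitive would be made right after creating the share root, before any files land in it.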

23:22.000 --> 23:27.000
So the first charmap attribute we'll talk about is ceph.dir.normalization.

23:27.000 --> 23:32.000
There are four normalizations that you have to choose from, which come from the Unicode standard.

23:33.000 --> 23:42.000
These are supported by Boost; we're using the Boost.Locale library to actually implement these normalization routines.

23:42.000 --> 23:49.000
The default normalization is NFD, form D, canonical decomposition.

23:49.000 --> 23:58.000
And the way that looks is I'm going to just set the normalization for a directory to be NFD.

23:58.000 --> 24:06.000
And then if I create files, for example, that there how do I say this?

24:06.000 --> 24:07.000
Who is it?

24:07.000 --> 24:08.000
Who is it?

24:08.000 --> 24:09.000
Okay.

24:09.000 --> 24:11.000
I don't know any German.

24:11.000 --> 24:12.000
Sorry.

24:12.000 --> 24:14.000
But I love this word.

24:14.000 --> 24:27.000
So when it's normalized, it's going to translate the u with the umlaut into a u,

24:27.000 --> 24:31.000
a regular u, like an English u.

24:31.000 --> 24:36.000
And then the umlaut gets separated out as a separate Unicode character.

24:36.000 --> 24:46.000
And then this capital B, which is pronounced like two S's.

24:46.000 --> 24:53.000
It gets turned into this Unicode character, 00DF.

24:53.000 --> 24:59.000
And that's how the NFD transformation is done on that.

24:59.000 --> 25:04.000
Normalization is not optional for handling case-insensitive directories.

25:04.000 --> 25:11.000
And the reason for that is it's very easy to construct

25:11.000 --> 25:15.000
two directory entries which are rendered exactly the same

25:15.000 --> 25:21.000
on a screen, but the byte encoding is actually different.

25:21.000 --> 25:27.000
So the normalization is there to help with collating that properly.
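
NOTE
Editor's sketch, not from the talk's slides: a small Python check of the point above, using the standard unicodedata module rather than Boost.Locale. The example name is chosen for illustration. The two strings render identically but differ in bytes until both are put into NFD.
```python
import unicodedata
precomposed = "Gr\u00fc\u00dfen"    # u-umlaut as one code point, U+00FC
decomposed = "Gru\u0308\u00dfen"    # plain u followed by combining diaeresis, U+0308
# Same rendering on screen, different byte encodings on disk.
same_bytes = precomposed.encode("utf-8") == decomposed.encode("utf-8")
nfd_a = unicodedata.normalize("NFD", precomposed)
nfd_b = unicodedata.normalize("NFD", decomposed)
print(same_bytes, nfd_a == nfd_b)
```
NFD splits the umlaut off as a separate combining character and leaves U+00DF as it is, so both spellings end up as the same code point sequence.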

25:27.000 --> 25:29.000
So, ceph.dir.casesensitive.

25:29.000 --> 25:30.000
It's similar.

25:30.000 --> 25:34.000
We're going to set it to zero to mark the directory insensitive.

25:34.000 --> 25:37.000
And now we're just running the case folding,

25:37.000 --> 25:43.000
the standardized Unicode case folding algorithm, on the directory entry name.

25:43.000 --> 25:48.000
And then you can see that the capital G gets turned into a lowercase g.

25:49.000 --> 25:54.000
And the nice thing about this case folding table in Unicode is that it's locale-independent.

25:54.000 --> 25:58.000
So it doesn't matter, you know, what the locale of the client is.

25:58.000 --> 26:01.000
And again normalization is required.
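
NOTE
Editor's sketch under stated assumptions: Python's str.casefold implements Unicode full case folding, so it can illustrate the locale-independent folding described above. Applying NFD first and folding second is an assumed order for this illustration, not a statement about the CephFS implementation, and the helper name dentry_key is hypothetical.
```python
import unicodedata
def dentry_key(name: str) -> str:
    """Normalize to NFD, then apply Unicode case folding (assumed order)."""
    return unicodedata.normalize("NFD", name).casefold()
key = dentry_key("Gr\u00fc\u00dfen")
# The capital G becomes g, and U+00DF folds to the two letters "ss".
print(key.endswith("ssen"), key == dentry_key("GRU\u0308SSEN"))
```
Different spellings of the same name collapse to one key, which is what lets every client compute the same stored name regardless of its locale.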

26:01.000 --> 26:03.000
ceph.dir.encoding.

26:03.000 --> 26:08.000
This is really just to give us room for changing things in the future.

26:08.000 --> 26:10.000
If we want to support other encoding types.

26:10.000 --> 26:14.000
It's actually a complicated thing to change, because if you switch to,

26:14.000 --> 26:18.000
for example, UTF-16, then you can have nulls occurring in directory names.

26:18.000 --> 26:25.000
And that is a no-no for a lot of the code that we already have,

26:25.000 --> 26:27.000
because it assumes null-terminated names.

26:27.000 --> 26:33.000
So that's just there for future-proofing the API.

26:33.000 --> 26:39.000
There's an equivalent subvolume API, which we would expect to be used within the context of Ceph CSI

26:39.000 --> 26:43.000
with Samba exports.

26:43.000 --> 26:47.000
It works exactly the same as setting the xattrs.

26:47.000 --> 26:53.000
And then finally we have client access guards to prevent incompatible clients,

26:53.000 --> 27:00.000
and the main one would be kernel clients, from interacting with a case-insensitive

27:00.000 --> 27:06.000
directory, because we don't have an implementation yet for that.

27:07.000 --> 27:12.000
But there's a client feature bit that now protects it.

27:12.000 --> 27:16.000
So the MDS will not allow an incompatible client to create files,

27:16.000 --> 27:18.000
create links, et cetera.

27:18.000 --> 27:21.000
But unlink and rmdir are okay.

27:21.000 --> 27:26.000
So you could mount a kernel client and just nuke a directory if you're an admin.

27:26.000 --> 27:32.000
And you want to do that for some reason.

27:33.000 --> 27:35.000
Yeah, so here's an example.

27:35.000 --> 27:40.000
We're going to mark this directory case-insensitive.

27:40.000 --> 27:45.000
Here we're just getting it to have a look.

27:45.000 --> 27:47.000
It's case-insensitive.

27:47.000 --> 27:52.000
The normalization's NFD, the default, and the encoding's UTF-8.

27:52.000 --> 27:55.000
We're going to create this file.

27:55.000 --> 27:56.000
We LS it.

27:56.000 --> 28:01.000
We see that we actually get the caseful name back when we do the readdir.

28:01.000 --> 28:06.000
And then we're going to tell the MDS to dump the cache for that particular address.

28:06.000 --> 28:08.000
So we can just have a look at it.

28:08.000 --> 28:13.000
And here we're just, um, finding that particular directory entry,

28:13.000 --> 28:14.000
ending with ssen.

28:14.000 --> 28:18.000
And the reason I did that is because it's been case folded and normalized.

28:18.000 --> 28:22.000
So now, instead of this, where it's Grüßen,

28:22.000 --> 28:27.000
I actually can't type it correctly on my terminal with the normalization.

28:27.000 --> 28:33.000
And then you can see this is what it looks like on the MDS side.

28:33.000 --> 28:40.000
And then similarly, if we wanted to base64-decode the alternate name,

28:40.000 --> 28:45.000
we would see that, uh, we get back the correct name.

28:45.000 --> 28:49.000
So some closing thoughts, the alternate name, metadata,

28:49.000 --> 28:51.000
turned out to have some more use cases.

28:51.000 --> 28:55.000
I think it's, uh, pretty interesting that that was the case and we found another one so quickly.

28:55.000 --> 28:58.000
And it worked out very well. It's very, very performant.

28:58.000 --> 29:02.000
And now Samba on CephFS will enjoy

29:02.000 --> 29:06.000
efficient case-insensitive directory trees.

29:06.000 --> 29:08.000
With that, thank you, and questions?

29:08.000 --> 29:10.000
Thank you.

29:16.000 --> 29:17.000
Yep, please.

29:17.000 --> 29:24.000
Are there any plans for, like, converting existing data to use the new case insensitivity?

29:24.000 --> 29:29.000
Any plans to convert existing file system trees to be case-insensitive?

29:29.000 --> 29:32.000
Uh, no, we would, you'd have to copy it.

29:32.000 --> 29:39.000
There are no plans to change an existing directory tree.

29:39.000 --> 29:41.000
Time's up. Okay.

29:41.000 --> 29:43.000
We're happy to take questions elsewhere.

