WEBVTT

00:00.000 --> 00:11.640
OK, OK, well, let's get started.

00:11.640 --> 00:16.600
I think you guys already heard from a number of folks involved

00:16.600 --> 00:20.760
in the design and implementation of distributed databases.

00:20.760 --> 00:25.880
And I am going to talk to you about where

00:25.880 --> 00:30.280
you actually will need those distributed databases

00:30.280 --> 00:32.280
in the first place, right?

00:32.280 --> 00:34.840
Or if some bad search come to us.

00:34.840 --> 00:40.040
But let me first, maybe, define what I mean by distributed

00:40.040 --> 00:41.600
databases, right?

00:41.600 --> 00:44.360
I think if you look at the database landscape right now,

00:44.360 --> 00:44.520
right?

00:44.520 --> 00:47.640
we can clearly see two different database categories.

00:47.640 --> 00:51.280
There are some databases which have been originally

00:51.280 --> 00:52.880
designed for a single node:

00:52.880 --> 00:54.480
MySQL, PostgreSQL, right?

00:54.480 --> 00:58.800
And then there is a different generation of databases.

00:58.800 --> 01:03.720
which were designed to be distributed databases from the ground up, right?

01:03.720 --> 01:07.720
And a lot of databases designed for the cloud-native age

01:07.720 --> 01:14.720
are typically distributed databases.

01:14.720 --> 01:18.960
Now, what is the key difference in the approach

01:18.960 --> 01:22.520
to high availability and scalability?

01:22.520 --> 01:25.880
Now, if you have something like a single node database,

01:25.880 --> 01:30.400
like MySQL, well, if you need high availability,

01:30.400 --> 01:33.080
you have replication, right?

01:33.080 --> 01:35.760
You can scale that by using big boxes,

01:35.760 --> 01:38.600
maybe a read/write split, and that is essentially

01:38.600 --> 01:42.840
what you have: complete copies of the data.

01:42.840 --> 01:45.720
And then you execute the query on a single node.

01:45.720 --> 01:48.120
And that's kind of a relatively simple problem.

01:48.120 --> 01:52.400
Now, with distributed databases, we also have a lot of nodes,

01:52.400 --> 01:56.280
but typically each has only a partial copy of the data,

01:56.280 --> 02:00.080
because data can be much, much larger than fits on a single node.

02:00.080 --> 02:02.840
And then you will also have distributed execution.

02:02.840 --> 02:07.040
That means a distributed query is going to touch many nodes

02:07.040 --> 02:10.080
one way or another.

02:10.080 --> 02:13.840
Now, if you really look at the high end, of course,

02:13.840 --> 02:18.000
no single node can run Facebook, right?

02:18.000 --> 02:20.920
I mean, it's just, you know, too freaking big.

02:20.960 --> 02:23.240
Lots of data, lots of queries.

02:23.240 --> 02:28.880
So if you look at that large scale or extreme scale, right?

02:28.880 --> 02:32.200
Or if you happen to run, you know,

00:32.200 --> 02:35.000
a mid-sized application, but you happen to be in China,

02:35.000 --> 02:37.880
in all those cases, right?

02:37.880 --> 02:41.600
You really need distributed systems, right?

02:41.600 --> 02:42.680
One way or another.

02:42.680 --> 02:44.200
How can you approach that?

02:44.200 --> 02:47.960
Well, you can either do that at the application level,

02:48.920 --> 02:53.960
and that is what a lot of folks did in, you know,

02:53.960 --> 02:56.360
the early 2000s, right?

02:56.360 --> 02:58.000
That's when Facebook was started.

02:58.000 --> 03:00.840
Some may remember LiveJournal, right?

03:00.840 --> 03:03.200
Which popularized the term sharding, right,

03:03.200 --> 03:05.560
and some of the early approaches to that.

03:05.560 --> 03:09.120
Because actually, we did not have, at least in the open source world,

03:09.120 --> 03:12.120
any good distributed databases.

03:12.120 --> 03:17.880
Then there is also an approach, right, which I will call,

03:17.880 --> 03:19.520
you know, proxy.

03:19.520 --> 03:20.480
If no, it is just back.

03:20.480 --> 03:24.080
That is a pretty complicated approach.

03:24.080 --> 03:25.480
Something like Vitess.

03:25.480 --> 03:29.200
Where you have a full-blown database, right?

03:29.200 --> 03:30.680
And then you have some, you know,

03:30.680 --> 03:33.440
proxy and all of that, which deals with all that

03:33.440 --> 03:37.800
complicated stuff, so you as a user don't have to.

03:40.240 --> 03:44.440
So sharding and distributed processing are complicated,

03:44.440 --> 03:50.360
especially if you think about a really complete solution.

03:50.360 --> 03:55.240
In my day, right, when you had a lot of those applications

03:55.240 --> 03:57.520
doing their own sharding, so many people

03:57.520 --> 04:00.320
implemented some, you know, solution which would kind of work

04:00.320 --> 04:03.880
in 95%, I mean, maybe 99% of cases, right?

04:03.880 --> 04:07.200
But it all would be very, very fragile.

04:07.200 --> 04:10.920
And I think we have now come to understand that

04:10.920 --> 04:13.840
having application developers, right,

04:13.840 --> 04:19.560
try to dabble in writing those distributed data

04:19.560 --> 04:22.840
processing algorithms and distributed databases

04:22.840 --> 04:24.480
is not a good idea, right?

04:24.480 --> 04:27.720
That is just, you know, some people still use it, right?

04:27.720 --> 04:29.880
Like, if you think about Facebook

04:29.880 --> 04:32.760
and a bunch of other companies started in that era,

04:32.760 --> 04:37.280
they sort of have their own Vitess-like proxy, right?

04:37.280 --> 04:39.800
Which does a lot of that kind of magic.

04:39.800 --> 04:43.000
So application developers don't have to think

04:43.000 --> 04:45.720
about a distributed database, right?

04:45.720 --> 04:48.160
So, I think that is a very important thing.

04:48.160 --> 04:51.160
Hey, you know what, if, in this day and age,

04:51.160 --> 04:55.240
you actually need that distributed database,

04:55.240 --> 05:00.720
then you probably should not do manual sharding

05:00.720 --> 05:02.320
in the application.

05:02.320 --> 05:06.040
Now, to be fair, there comes sort of a "but", right?

05:06.040 --> 05:08.680
Because if you think about distributed systems,

05:08.720 --> 05:10.560
they are complicated.

05:10.560 --> 05:12.200
And I think it was kind of interesting

05:12.200 --> 05:14.880
to watch the previous presentations, like,

05:14.880 --> 05:17.640
well, how do you even think about time

05:17.640 --> 05:20.040
in those systems when you have many nodes, right?

05:20.040 --> 05:22.080
All those kind of complicated things,

05:22.080 --> 05:25.320
how you handle visibility, consistency,

05:25.320 --> 05:28.040
isolation modes in those kinds of distributed systems.

05:28.040 --> 05:31.560
That is quite hard, right?

05:31.560 --> 05:34.160
And even if it's a fully managed solution,

05:34.160 --> 05:37.000
that doesn't completely isolate you as an application

05:37.000 --> 05:41.040
developer, because you have to understand the system.

05:41.040 --> 05:43.400
You have to understand how it can fail.

05:43.400 --> 05:45.960
You have to understand what kinds of bugs there are,

05:45.960 --> 05:48.560
and hey, believe it or not, database engineers

05:48.560 --> 05:51.200
are not perfect; there are bugs in databases, right?

05:51.200 --> 05:54.720
And if you are dealing with some very complicated systems,

05:54.720 --> 05:58.040
you run into them, when a database is not behaving as it should,

05:58.040 --> 05:58.880
right?

05:58.880 --> 06:02.200
And understanding how it should behave in a distributed database,

06:02.200 --> 06:04.160
that is complicated, right?

06:04.160 --> 06:07.320
So if you are dealing with a complicated

06:07.320 --> 06:09.760
distributed system needlessly, right,

06:09.760 --> 06:15.000
then that can be a bad idea.

06:15.000 --> 06:16.800
So that brings us to the question,

06:16.800 --> 06:18.520
which is the premise of my presentation:

06:18.520 --> 06:22.960
Okay, when do you actually need that distributed database?

06:22.960 --> 06:25.440
And when can you just pick that nice little

06:25.440 --> 06:27.440
MySQL or PostgreSQL instance,

06:27.440 --> 06:29.920
and maybe, you know, replicate it for high

06:29.920 --> 06:32.640
availability, to scale reads, and this kind of stuff, right?

06:32.640 --> 06:34.840
How high can you go?

06:34.840 --> 06:38.440
And actually, a while ago, I put this little post

06:38.440 --> 06:40.760
on Twitter, you know, just to check where people

06:40.760 --> 06:42.480
feel the limits are.

06:42.480 --> 06:46.600
And I was actually surprised, especially by the

06:46.600 --> 06:48.800
PostgreSQL community folks, who said, well,

06:48.800 --> 06:52.840
actually, we are running some, you know,

06:52.840 --> 06:57.840
100-terabyte-plus instances on beefy hardware.

07:00.520 --> 07:02.320
And then I talked to some PostgreSQL guys

07:02.320 --> 07:03.560
and they said, yeah, I would just, you know,

07:03.560 --> 07:05.600
giving the sheet and pools right to use that real

07:05.600 --> 07:08.400
and said, well, you know what, that is not the most

07:08.400 --> 07:10.840
pleasurable thing to do, but you can do that

07:10.840 --> 07:12.720
in certain cases, right?

07:12.720 --> 07:17.280
Well, so that, I think, gives us some idea.

07:17.280 --> 07:20.440
Now, I think what is also interesting in this case

07:20.440 --> 07:25.160
is that the hardware available these days

07:25.160 --> 07:26.760
is actually pretty big.

07:26.760 --> 07:31.760
Well, how big? How many cores do you think

07:33.320 --> 07:35.360
you can get in an instance, let's say, on Amazon,

07:35.360 --> 07:39.120
how many cores can you get?

07:39.120 --> 07:40.920
What was that?

07:40.920 --> 07:41.920
200.

07:41.920 --> 07:46.400
Well, I looked today, right, and you can actually get

07:46.400 --> 07:50.120
almost 2,000 cores, right?

07:50.120 --> 07:51.840
And 32 terabytes of memory.

07:51.840 --> 07:55.240
And that requires a budget with, maybe, also

07:55.240 --> 07:56.920
2,000 zeros, right?

07:56.920 --> 07:59.840
But, you know, if money is no object, you can get,

07:59.840 --> 08:03.640
like, a huge, huge, huge instance out there.

08:03.640 --> 08:05.480
Now, if you look at, you know, hey,

08:05.480 --> 08:07.720
we are all reasonable people, you know,

08:07.720 --> 08:11.440
we can't buy a Mercedes every minute, right?

08:11.440 --> 08:14.360
Then, you know what, you can look at more

08:14.360 --> 08:16.920
off-the-shelf commodity hardware, right?

08:16.920 --> 08:20.320
And that would be something like this, right?

08:20.320 --> 08:22.240
Which is also pretty big.

08:22.240 --> 08:26.600
So if you look, in this case, it's interesting

08:26.600 --> 08:28.760
to think about what kind of performance

08:28.760 --> 08:30.520
and scalability we can get, right?

08:30.520 --> 08:34.120
And you can actually get multiple millions of queries

08:34.120 --> 08:38.920
from a single node of, let's say, MySQL, right?

08:38.920 --> 08:41.440
And in reality, where the queries are more complicated,

08:41.440 --> 08:44.760
it's still going to be hundreds of thousands in many cases.

08:44.760 --> 08:48.680
Note, though, that this is something you had better check,

08:48.680 --> 08:53.360
because both your scalability and your exact performance

08:53.360 --> 08:56.720
are not going to scale linearly with the number of CPU cores, right?

08:56.720 --> 09:00.920
And that's also going to be very workload dependent.

09:00.920 --> 09:06.400
I would also say that you need to mind maintenance.

09:06.400 --> 09:09.200
In many cases, what really bites you in the butt

09:09.200 --> 09:11.840
is not the performance of normal queries.

09:11.840 --> 09:14.480
You run small queries, you know, a couple of reads,

09:14.480 --> 09:16.320
mostly in memory, who cares.

09:16.320 --> 09:18.920
But then imagine if you have, for example,

09:18.920 --> 09:22.800
a 90-terabyte table in your 100-terabyte database

09:22.800 --> 09:25.800
and you need to add a column to it, right?

09:25.800 --> 09:28.600
Or build a new index.

09:28.600 --> 09:30.760
That can be very unpleasant.

09:30.760 --> 09:33.200
It can take a lot of time, right?

09:33.200 --> 09:34.960
And in this case, if you're bound to a single node,

09:34.960 --> 09:38.040
especially if you look at some solutions like MySQL,

09:38.040 --> 09:40.360
for example, which don't even implement

09:40.360 --> 09:43.720
parallel execution for many of those operations, right?

09:43.720 --> 09:49.720
So that becomes very important if you look at those systems.

09:49.720 --> 09:52.320
Especially if you need to hold this door.

09:52.320 --> 09:57.480
Yeah, double the storage, but that is, of course, another thing.

09:57.480 --> 09:59.680
Now, if you think about database uses, right?

09:59.680 --> 10:06.320
I would say there are a bunch of different uses

10:06.320 --> 10:09.360
you typically have in the system, right?

10:09.360 --> 10:12.360
That goes from your core production database, like, oh my gosh,

10:12.360 --> 10:15.160
you know, I am e-commerce, no matter what happens,

10:15.160 --> 10:18.200
I need people to be able to keep buying, paying us money, right?

10:18.200 --> 10:19.680
Then there's some secondary things, like, well,

10:19.680 --> 10:21.840
maybe I also want them to see the ad,

10:21.840 --> 10:24.440
so they buy something else, but if it doesn't work,

10:24.440 --> 10:25.720
it's not such a big deal, right?

10:25.720 --> 10:28.800
And then there's like telemetry, analytics, so on and so forth.

10:28.800 --> 10:31.640
And they all have different parameters, right?

10:31.640 --> 10:35.240
Like, for example, in terms of data, we can find that humans,

10:35.240 --> 10:37.400
even if you're posting a lot of, you know, like,

10:37.400 --> 10:39.760
you know, messages in the chat, right?

10:39.760 --> 10:42.880
don't generate as much data as machines,

10:42.880 --> 10:47.040
which can often generate tens to thousands of times more.

10:47.040 --> 10:50.160
And that is very often where a lot of those massive data scales

10:50.160 --> 10:55.160
come from, even in a small environment.

10:56.160 --> 10:57.520
Now, I think it's also good to think

10:57.520 --> 10:59.560
what exact applications you have, right?

10:59.560 --> 11:01.760
And you can think about the types in different ways,

11:01.760 --> 11:03.200
but basically, on one end, you can say,

11:03.200 --> 11:05.960
hey, there is this kind of little application

11:05.960 --> 11:07.960
we run in our company; it's hosted in Toronto,

11:07.960 --> 11:09.840
on our server, wherever it is, right?

11:09.840 --> 11:14.240
And then all the way to something like Facebook,

11:14.240 --> 11:16.120
you know, many users, very kind of

11:16.120 --> 11:18.160
intermingled data, right?

11:18.160 --> 11:20.040
Which requires a lot of connections.

11:20.040 --> 11:23.520
You want to maybe know what your friends post, right?

11:23.520 --> 11:24.680
All this kind of stuff, right?

11:24.680 --> 11:28.080
That is the landscape, I would say.

11:28.080 --> 11:31.320
And that, I think, places where a distributed database

11:31.320 --> 11:34.200
may be needed or not, right?

11:34.200 --> 11:38.200
If you have, like, a self-hosted application

11:38.200 --> 11:41.200
in the enterprise, even in a pretty large one,

11:41.200 --> 11:45.680
chances are the amount of data and amount of traffic, right,

11:45.680 --> 11:49.640
can be handled by your kind of conventional old-style

11:49.640 --> 11:52.440
single-node database, with some replication and everything,

11:52.440 --> 11:53.280
and stuff.

11:53.280 --> 11:57.360
But then, if you are going to these massive, you know,

11:57.360 --> 12:00.240
web-scale public applications, well,

12:00.240 --> 12:02.360
it's a different story altogether.

12:02.360 --> 12:06.400
Now, here is the challenge I would see.

12:06.400 --> 12:08.640
It kind of cuts both ways.

12:08.640 --> 12:11.240
In some cases, I see kind of developers, you know,

12:11.240 --> 12:12.720
attending talks like this, right?

12:12.720 --> 12:15.600
So maybe hearing from Facebook, now Meta,

12:15.600 --> 12:18.240
or something like that, and think, wow,

12:18.240 --> 12:20.080
I need a web-scale database.

12:20.080 --> 12:22.320
I need to future-proof myself, right?

12:22.320 --> 12:26.600
And they install something which is way, way more scalable

12:26.600 --> 12:30.480
and too complicated compared to what they need, right?

12:30.480 --> 12:33.600
Oh, I'm just going to have, like, a WordPress website,

12:33.600 --> 12:35.800
which two people a day are going to visit.

12:35.800 --> 12:39.840
Well, yeah, that's if my mom is not on vacation, right?

12:39.840 --> 12:42.040
No, no, only distributed databases for that, right?

12:42.040 --> 12:45.360
And on the other hand, you also don't want to be in a situation

12:45.360 --> 12:48.280
where you pick a database which is not distributed,

12:48.280 --> 12:49.600
kind of this way, right?

12:49.600 --> 12:54.400
And then you actually, you know,

12:54.400 --> 12:57.720
have massive scalability needs, you have to scale massively

12:57.720 --> 13:00.480
because the application became kind of super successful,

13:00.480 --> 13:03.600
you know, like, think about OpenAI, right?

13:03.600 --> 13:06.600
Like, going from zero to 100 million users,

13:06.600 --> 13:11.600
right, in this case you have to make different choices.

13:12.440 --> 13:15.800
And I think it's also important to highlight here,

13:15.800 --> 13:21.440
is that there is often not really a single answer.

13:21.440 --> 13:25.200
You would find that many organizations run multiple database

13:25.200 --> 13:27.920
technologies for a number of different needs, right?

13:27.920 --> 13:29.640
Because, you know, operational database

13:29.640 --> 13:31.760
and analytical databases are often different, right?

13:31.760 --> 13:34.520
And maybe you need to have some relational database

13:34.520 --> 13:36.560
and databases which are good at, you know,

13:36.560 --> 13:39.960
storing documents, right, or vectors, or search, right?

13:39.960 --> 13:45.360
And so often there is going to be a portfolio of databases.

13:45.360 --> 13:48.360
And this kind of trade-off between, you know,

13:48.360 --> 13:51.360
a simple single-node database

13:51.360 --> 13:54.760
and a distributed scalable database is one of those choices,

13:54.760 --> 13:57.360
right, which I would say you use to build the

13:57.360 --> 14:01.440
portfolio of databases your organization uses.

14:01.440 --> 14:04.920
And that's all I had to say.

14:04.920 --> 14:07.760
Yeah, with a minute to spare, huh?

14:07.760 --> 14:08.760
Yeah.

14:08.760 --> 14:09.760
Yeah.

14:13.760 --> 14:17.760
Maybe we have time for one question, if it's a quick one?

14:17.760 --> 14:18.760
No?

14:18.760 --> 14:19.760
It's all clear.

14:19.760 --> 14:23.760
You have more of them out, by your, you have more, like.

14:23.760 --> 14:24.760
Thank you.

14:24.760 --> 14:27.760
A three-hour talk if no breaks, right?

