WEBVTT

00:00.000 --> 00:13.680
Hello everybody, I'm Jesús Espino, I'm here on behalf of Mattermost, and I'm going to talk

00:13.680 --> 00:16.720
about our experience running Mattermost on YugabyteDB.

00:16.720 --> 00:22.680
Well, first I'm going to explain what Mattermost is. Mattermost is an open source communication

00:22.680 --> 00:29.920
platform, similar to Slack or other communication platforms. It's specialized in mission

00:29.920 --> 00:36.400
critical systems, where you need to be online or bad things happen, and we are

00:36.400 --> 00:39.760
very focused on stability and security.

00:39.760 --> 00:43.120
So why a distributed database?

00:43.120 --> 00:50.000
I think it's obvious from my previous definition of Mattermost, but yeah, it aligns a lot with

00:50.000 --> 00:58.320
the focus of Mattermost. The potential for higher scalability is also very interesting with

00:58.320 --> 01:06.400
distributed databases, geo-partitioning for having different regions, and things like that.

01:06.400 --> 01:09.800
And why did we decide to explore YugabyteDB?

01:09.800 --> 01:14.800
Well, one of the main things is that it's open source, so that's also aligned with Mattermost.

01:14.800 --> 01:19.280
It's highly compatible with Postgres, which is important for us, because it's

01:19.280 --> 01:20.880
our main database.

01:20.960 --> 01:28.640
It has very good metrics, good administration, and it has a cloud service that I can just use for testing

01:28.640 --> 01:31.600
that.

01:31.600 --> 01:38.480
I'm going to tell a story: at a conference three years ago, I was trying another database.

01:38.480 --> 01:41.680
I'm not going to name the database, because it's not fair; that was three years

01:41.680 --> 01:47.760
ago, and I don't even know if that is still the case, but the problem was this.

01:47.760 --> 01:54.800
I needed all these changes, a lot of changes, a lot of files modified to get this running;

01:54.800 --> 01:59.280
not running great, just running.

01:59.280 --> 02:07.520
And the results were really bad: the database was collapsing at 2,000 users. Mattermost is able

02:07.520 --> 02:15.760
to scale up to 200,000 users, so 2,000 users is, well, it was really bad.

02:15.920 --> 02:23.200
So I had a lot of back and forth with their support team, and we were able to get somewhere

02:23.200 --> 02:29.200
by modifying specific SQL for them and things like that, and we were able to get a decent

02:29.200 --> 02:35.040
performance. What you can see there is the P99 latency, which is four seconds for certain

02:35.040 --> 02:39.680
queries, which is not acceptable at all.

02:39.680 --> 02:45.280
So my lesson learned back then was: distributed databases are hard. It's not something

02:45.280 --> 02:50.320
that you can use as a drop-in replacement for Postgres, and you need to design your queries

02:50.320 --> 02:53.680
with distributed databases in mind.

02:53.680 --> 03:01.120
Then I started running Mattermost on YugabyteDB, because I met Frank at a conference, and he said,

03:01.120 --> 03:09.520
yeah, you should try YugabyteDB, and I said, okay. I was hesitant, but I tried it, and these are the

03:09.520 --> 03:11.280
changes that I needed to make.

03:11.520 --> 03:16.560
They were just small changes in the migrations, mostly related to things like

03:16.560 --> 03:18.560
autovacuuming and things like that.

03:18.560 --> 03:22.720
And actually, I think one of the changes wouldn't be needed today.

03:22.720 --> 03:34.720
But anyway, the first try was very, very bad: it was collapsing at two thousand users.

03:34.720 --> 03:40.080
And the performance was degrading over time; the more posts we had, the worse it was,

03:40.240 --> 03:48.240
and the P99 was growing, with wrong query plans all the time. And then I was talking with

03:48.240 --> 03:53.600
the Yugabyte people, and they said, just run ANALYZE, because it's something that you need to do when you

03:53.600 --> 03:59.520
are using the Postgres compatibility to get it working well. I don't know if they are working

03:59.520 --> 04:02.480
on fixing that, but we needed to do that back then.

04:02.560 --> 04:06.160
And suddenly, boom, everything was working like a charm.

04:06.160 --> 04:12.880
So, it was a great surprise. You can see the same kind of graph here,

04:12.880 --> 04:17.920
from YugabyteDB, and everything is going down and going flat,

04:17.920 --> 04:22.000
which is exactly what I wanted.

04:22.000 --> 04:26.560
So the database was working with 6,000 users, and the performance was stable over time.

04:27.520 --> 04:30.800
So that was a great experience.

04:30.800 --> 04:36.240
We were using our load tests, the Mattermost ones that we have for demonstrating performance.

04:36.960 --> 04:43.360
And my lesson learned from YugabyteDB was: it can be a drop-in replacement, so a different lesson

04:43.360 --> 04:49.280
learned for sure. It was working great without changes,

04:49.280 --> 04:54.880
although it requires ANALYZE so that the query planner makes the right decisions.

04:54.880 --> 05:00.960
Basically, ANALYZE updates the metadata, the statistical data, of the database.

05:02.480 --> 05:09.200
So, since this is open source, like Mattermost itself, the Yugabyte people

05:10.240 --> 05:15.840
went there and ran the load tests against their instances, and they tried to

05:15.840 --> 05:21.040
make sure the load tests were working well, and these are their results.

05:21.760 --> 05:31.280
Actually, they have an example of a Mattermost, sorry, YugabyteDB cloud instance

05:32.320 --> 05:39.120
that is bigger than the equivalent Aurora instance, but at a similar price,

05:39.120 --> 05:49.440
and it's getting the same number of users that you can run on that instance.

05:49.600 --> 05:57.040
So, it was 17,000 users, and the same for Aurora at a similar price,

05:57.840 --> 06:07.920
with a really good average time and P99 time. The average time and the P99 time on Aurora

06:07.920 --> 06:15.600
are lower than that, but the thing is, with Aurora you are not paying for a distributed database.

06:15.680 --> 06:20.080
So, at the end of the day, what you have is a whole distributed database for the same price

06:20.080 --> 06:26.800
as Aurora. I want to thank Mark from the Yugabyte team, who did all this work;

06:26.800 --> 06:32.480
they worked really hard to get this working, because running a load test on Mattermost is not an

06:32.480 --> 06:37.600
easy task, to be honest. Here is some extra information about the results:

06:37.600 --> 06:45.680
some graphs of the growing number of users and the performance. At the bottom,

06:45.680 --> 06:55.120
you see the store time, the time it takes to get the results from the database.

06:55.680 --> 07:03.440
So, you see that as you approach 17,000, it starts getting worse, but in general,

07:03.440 --> 07:14.080
at 13,000, for example, it's working very well. There were multiple test runs and every one was

07:14.080 --> 07:21.440
more or less consistent. So, this is my comparison with Postgres when I ran the load test:

07:21.440 --> 07:31.760
in general, it's similar; it flattens out at 13,000. But it has higher

07:31.840 --> 07:36.800
latencies, which is expected from a distributed database, because it is actually doing more work.

07:39.040 --> 07:44.160
Same here, this is about CPU, and there are some artifacts that make no sense, like the number of

07:44.160 --> 07:48.080
goroutines, but that's not very relevant. I think this is the important one.

07:49.920 --> 07:56.080
Some conclusions: Mattermost works and scales on YugabyteDB, that's great. YugabyteDB is very compatible

07:56.160 --> 08:04.480
with Postgres; I was surprised by that. And Postgres is able to handle more with

08:04.480 --> 08:10.560
fewer resources, as expected. YugabyteDB has higher latencies, as expected.

08:12.960 --> 08:16.960
At the end of the day, that's the price you pay for having a distributed database.

08:17.760 --> 08:18.720
And that's it. Thank you.

08:26.080 --> 08:34.720
Thank you, that was awesome. Thank you.

