WEBVTT

00:00.000 --> 00:07.000
We'll get started here.

00:07.000 --> 00:08.000
How is everybody doing?

00:08.000 --> 00:11.000
I hope you've been having a good time with the conference.

00:11.000 --> 00:14.000
And welcome to talk, so we're going to talk about,

00:14.000 --> 00:17.000
well, the question is, was Leslie Lamport right?

00:17.000 --> 00:19.000
And it's a bit of a loaded question and you could say,

00:19.000 --> 00:22.000
well, maybe it's a bit clickbaity like in a YouTube way.

00:22.000 --> 00:25.000
But what we're going to do is we're going to look at some of the papers

00:25.000 --> 00:29.000
that were written by Leslie Lamport over the past 50 years or so.

00:29.000 --> 00:32.000
And you can see, well, first evaluate what they,

00:32.000 --> 00:35.000
the concepts are in them, but also kind of look at the questions

00:35.000 --> 00:42.000
and sort of, are they still relevant in today's world?

00:42.000 --> 00:45.000
First, I think you should know who we are.

00:45.000 --> 00:48.000
So I'm Sarah Kristoff, I'm a staff software engineer at Adera.

00:48.000 --> 00:52.000
And I am the founder of the Leslie Lamport fan club.

00:52.000 --> 00:57.000
Everyone can join after the talk and please email Leslie himself

00:57.000 --> 01:00.000
and let him know that I started this club for him.

01:00.000 --> 01:06.000
And I'm Nick Jackson, I am member number two of the Leslie Lamport fan club.

01:06.000 --> 01:10.000
So, so our agenda, we're going to have a look at a little bit of a history lesson.

01:10.000 --> 01:13.000
We're going to look at consensus, consistency, concurrency,

01:13.000 --> 01:15.000
clocks and gossip.

01:15.000 --> 01:18.000
And let's get moving.

01:18.000 --> 01:22.000
So it was a kind of a bit of history, one of the most sort of attributable

01:22.000 --> 01:29.000
famous papers that Leslie has written is probably the Byzantine generals problem.

01:29.000 --> 01:34.000
So what we want to do with the talk is we're going to kind of take that theme,

01:34.000 --> 01:37.000
and we're going to run it through all of the other examples.

01:37.000 --> 01:39.000
Hopefully.

01:39.000 --> 01:43.000
Now, where this sort of came from, where did the Byzantine Empire come from?

01:43.000 --> 01:48.000
Well, it came from way, way back when when Emperor Constantine

01:48.000 --> 01:53.000
back in the Roman Empire decided that he wanted to create a new capital for Rome.

01:53.000 --> 01:59.000
So they created this capital on the old Greek settlement of Byzantium.

01:59.000 --> 02:03.000
Now, the capital was known as Constantinople.

02:03.000 --> 02:06.000
Are there any, they might be Giants fans?

02:06.000 --> 02:13.000
Because, you know, obviously now that Istanbul was Constantinople.

02:13.000 --> 02:17.000
And now that that happened in the third century, but by the fifth century,

02:17.000 --> 02:19.000
the Byzantine Empire had really grown.

02:19.000 --> 02:22.000
It was like growing in quite a lot of strength.

02:22.000 --> 02:28.000
So again, Emperor Justinian decided that he was running the eastern Roman Empire

02:28.000 --> 02:32.000
Byzantine that he was going to take over Rome.

02:32.000 --> 02:37.000
And that was his ambition, and he did, and he united the sort of the whole thing,

02:37.000 --> 02:42.000
which leads us into the setting for our theme, which is around 500 years later.

02:42.000 --> 02:48.000
So let's introduce our cast of Heroes and Villains.

02:48.000 --> 02:55.000
So Emperor Romanoz in 1034 was actually the Emperor of the Byzantine Empire.

02:55.000 --> 02:58.000
He was, well, he got sick.

02:58.000 --> 03:04.000
He was married to Emperor Szovi, who allegedly also poisoned him,

03:04.000 --> 03:08.000
because Szovi was having an affair, allegedly,

03:08.000 --> 03:13.000
with Michael the Money Changer, who then later became the Emperor of the Byzantine Empire.

03:13.000 --> 03:17.000
Now, Michael was a Money Changer.

03:17.000 --> 03:19.000
Michael was basically a counterfitter.

03:19.000 --> 03:25.000
But Michael was also not particularly well versed in government or business.

03:25.000 --> 03:29.000
So to run the kind of the complex side of governance,

03:29.000 --> 03:33.000
Michael employed his brother, John the Unic.

03:34.000 --> 03:37.000
Now, our generals, we have George Manalix.

03:37.000 --> 03:40.000
George was one of the most fierce generals,

03:40.000 --> 03:46.000
the most successful generals in the Byzantine army at this time.

03:46.000 --> 03:49.000
But he couldn't do this alone.

03:49.000 --> 03:52.000
And the Byzantine Empire, which I think was really fascinating,

03:52.000 --> 03:55.000
wasn't just kind of self-supported.

03:55.000 --> 03:59.000
They actually had an army of Vikings, the Varangians,

04:00.000 --> 04:06.000
led by Harold Sigginson, who later became the future king of Denmark,

04:06.000 --> 04:11.000
as a kind of a general, the top general in the army as well.

04:11.000 --> 04:15.000
I think it's pretty fascinating, but we can get into the real stuff.

04:15.000 --> 04:21.000
And that brings us to the two generals problem.

04:21.000 --> 04:26.000
So in this problem, this is kind of a runs back into a classic paper.

04:26.000 --> 04:32.000
Things around the sort of the 70s, and it was around the concept of database consistency.

04:32.000 --> 04:38.000
And the concept is that if only one general attacks, both will be defeated.

04:38.000 --> 04:42.000
If none of them attack, they both live to fight another day.

04:42.000 --> 04:46.000
And if they both attack, then they can take the city.

04:46.000 --> 04:50.000
So it's around that achieving consensus.

04:50.000 --> 04:55.000
And the problem is written to be, it's not solvable, right?

04:55.000 --> 05:04.000
Because to be able to ensure that consistency in a system where messages are not necessarily guaranteed,

05:04.000 --> 05:06.000
you can end up with a situation like this.

05:06.000 --> 05:11.000
So Harold and Georgeus, they want to attack a city.

05:11.000 --> 05:15.000
So George says to Harold, hey, I'm going to attack at 10 a.m.

05:15.000 --> 05:16.000
Can you confirm?

05:16.000 --> 05:18.000
And then I will attack with you.

05:18.000 --> 05:21.000
So Harold sends a message back and says, hey, yeah, sure.

05:21.000 --> 05:23.000
Let's attack at 10 a.m.

05:23.000 --> 05:26.000
Confirm you got my message, so I know we're both going to attack.

05:26.000 --> 05:31.000
And then George sends a message back, which says, I'm confirming your confirmation.

05:31.000 --> 05:36.000
Can you confirm that you receive my confirmation so that I know that we're both going to attack?

05:36.000 --> 05:39.000
And then Harold sends a message back, says I'm confirming your confirmation.

05:39.000 --> 05:41.000
Can you confirm you got my confirmation?

05:41.000 --> 05:44.000
And this goes on and on and on and on.

05:44.000 --> 05:51.000
And ultimately, because they never know whether any of these messages could be intercepted by the enemy,

05:51.000 --> 05:58.000
well, you know, you can't kind of have that consistency.

05:58.000 --> 06:05.000
So at the time, the kind of Michael is looking at this and he says, well, I need a solution.

06:05.000 --> 06:07.000
And what can that sort of solution be?

06:07.000 --> 06:13.000
So he speaks to John the Unic and they come up with this solution here,

06:13.000 --> 06:21.000
which is where John suggests that if Michael acts as a commander and then we have the two generals of George and Harold,

06:21.000 --> 06:30.000
then if Michael issues an order to George and if George exchanges that order with Harold, then they should be able to come to consensus.

06:30.000 --> 06:38.000
But it doesn't work because if a message gets intercepted, now in all of the sort of the paper on the Byzantine General,

06:38.000 --> 06:42.000
Leslie Lamport treats this or he calls it a traitor.

06:42.000 --> 06:49.000
So in this instance, Michael is a traitor, but what they're really implying is that it could be a faulty node inside of a system,

06:49.000 --> 06:52.000
or it could be a fault with a message.

06:52.000 --> 06:54.000
So you end up with this situation here.

06:54.000 --> 07:02.000
And then if you kind of boil that down, what you can see is that because of this faulty message, don't look at that.

07:02.000 --> 07:11.000
Then you actually end up with an inconclusive situation because both George and Harold have conflicting pieces of information.

07:11.000 --> 07:13.000
How do you fix it?

07:13.000 --> 07:18.000
Well, you fix it by introducing another general.

07:18.000 --> 07:27.000
So John says, well, if we introduce one more general into the mix and we're all exchanging messages in the same way,

07:27.000 --> 07:32.000
then we should still be able to, we should now be able to achieve consensus.

07:32.000 --> 07:36.000
So in the instance here, so John is acting as the traitor.

07:36.000 --> 07:41.000
Michael sends attack messages to all three generals.

07:41.000 --> 07:46.000
Harold and George exchange attack messages.

07:46.000 --> 07:56.000
John sends retreat message. He changes the instruction or the messages has been intercepted and has changed by the Syracuse in.

07:57.000 --> 08:00.000
But then can they receive consensus?

08:00.000 --> 08:12.000
And if you look at now when you kind of sum up all of those different votes, you can still see the George and Harold can achieve consensus because they have two attacks.

08:12.000 --> 08:17.000
They received a retreat, but they still can come to the consensus that they're going to have an attack.

08:17.000 --> 08:19.000
And that's one of the cause of the paper.

08:19.000 --> 08:30.000
The core of the paper is that for a Byzantine problem to be solved, everybody who is an acting part in the problem has to be able to agree on the same outcome.

08:30.000 --> 08:35.000
So let's kind of make look at it as another way.

08:35.000 --> 08:39.000
What if Michael is the traitor?

08:39.000 --> 08:42.000
What if his message gets incorrectly intercepted?

08:42.000 --> 08:46.000
So you can see here again, George and Harold receive attack.

08:46.000 --> 08:56.000
John now receives retreat and sends retreat, but if we look at the combination of that, we still can achieve consensus.

08:56.000 --> 09:06.000
So now we've proven that for four generals, we can achieve consensus in the presence of a single traitor.

09:06.000 --> 09:12.000
Now in the paper, Leslie writes this as three and plus one, where M is the number of traders.

09:12.000 --> 09:22.000
So the number of generals that you need to achieve consensus with one traitor is three plus one or four.

09:22.000 --> 09:26.000
Now if you have two traders, it's seven.

09:26.000 --> 09:32.000
Now I looked at this and I was like seven, you need seven generals to come to consensus, that doesn't make sense to me.

09:32.000 --> 09:41.000
I thought five would be fine because, you know, if two of them don't get a message, you've still got three, you can still come to consensus.

09:41.000 --> 09:46.000
It's not the case, so let's look at the proof.

09:46.000 --> 09:58.000
So one traitor for two traders, we need seven, three ten generals and four thirteen, and that's following on from the formula.

09:58.000 --> 10:09.000
So now we don't have a kind of a slide for this, it's kind of going to be, sorry, a picture, but you can see here that the commander issues attack,

10:09.000 --> 10:14.000
every one of the latinants, I've got five latinants here, six generals and total.

10:14.000 --> 10:23.000
Now what each lieutenant does is each lieutenant sends the commander's message to each other lieutenant.

10:23.000 --> 10:27.000
So they can all see what everybody else received.

10:27.000 --> 10:32.000
Lieutenant three in lieutenant four are traders, however.

10:32.000 --> 10:39.000
So to achieve consensus, they all have to agree on the same thing, and you can see here they all agree on attack.

10:39.000 --> 10:44.000
Now we're just kind of ignoring the traders out of this.

10:44.000 --> 10:49.000
So I looked at this, and I was like, but that doesn't prove three and plus one.

10:49.000 --> 10:55.000
Like this proves that this does work with less, unless the commander is a trader.

10:55.000 --> 11:06.000
So if we look at what happens when the commander is a trader, you can see that with the commander being a trader under lieutenant is a trader,

11:06.000 --> 11:10.000
then the overall kind of consensus can be affected.

11:10.000 --> 11:17.000
And now, lieutenant one has retreat, lieutenant five has retreat two and three are attacks.

11:17.000 --> 11:24.000
They don't have consensus, which means they would fail to kind of win the battle.

11:24.000 --> 11:28.000
So let's add the additional general link here.

11:28.000 --> 11:32.000
And we're going to add the additional general in, so now we have seven.

11:32.000 --> 11:37.000
And we're theoretically satisfying that formula, three and plus one.

11:37.000 --> 11:39.000
But we still have inconsistency.

11:39.000 --> 11:49.000
So we have this three and plus one, and we still have inconsistency because it's still possible with the two traders to influence the kind of the outcome.

11:49.000 --> 11:55.000
And the reason for this is when I read the paper and I'm looking through it is I missed a core step.

11:55.000 --> 12:08.000
The core step is that Leslie Lampord also states that to kind of identify and validate what you need are T plus one voting rounds.

12:08.000 --> 12:13.000
Where T is the number of traders and was previously known the traders, why don't make this up?

12:13.000 --> 12:15.000
This is this kind of average written.

12:15.000 --> 12:16.000
But we have T plus one.

12:16.000 --> 12:24.000
And if we can look at those rules on there, what what an example voting round would be is that every lieutenant sends their value to everybody else.

12:24.000 --> 12:26.000
They try to find an outcome.

12:26.000 --> 12:29.000
If they then they send the result onto others.

12:29.000 --> 12:33.000
And then the little ten's going to use that data to complete a final outcome.

12:33.000 --> 12:46.000
So if we kind of look at that, then we can now see that the lieutenant two receives all of the data from every other lieutenant.

12:46.000 --> 12:51.000
And they can see that actually the consensus amongst the other lieutenant is retreat.

12:51.000 --> 13:01.000
So then if they change their decision to retreat, they now have consensus among all of the generals and the formula works.

13:01.000 --> 13:11.000
But what if we kind of say, all right, let's now apply that additional round to the six generals?

13:11.000 --> 13:14.000
Can we achieve consensus with six generals?

13:14.000 --> 13:15.000
And the answer is no.

13:15.000 --> 13:26.000
Because the kind of that we can't, we're still not, well, we're getting a volume of attack, which still didn't give us consensus on the sort of the previous.

13:26.000 --> 13:28.000
So the formula stands.

13:28.000 --> 13:32.000
Three M plus one with T plus one voting rounds.

13:32.000 --> 13:35.000
And you can solve the Byzantine generals problem.

13:35.000 --> 13:37.000
And it's just cool.

13:37.000 --> 13:42.000
Like this goes back, obviously, to 1034.

13:42.000 --> 13:46.000
But also, I think the paper when it was originally written was 1978.

13:46.000 --> 13:54.000
And we'll talk about that later, but we kind of still see this today.

13:54.000 --> 13:55.000
Cool.

13:55.000 --> 13:57.000
Now we're going to talk about consistency.

13:57.000 --> 14:05.000
So in the time of the Byzantine Empire, we are still trying to figure out how to handle consistent communication.

14:05.000 --> 14:08.000
Even though we have the Byzantine generals problem,

14:08.000 --> 14:17.000
figuring out how to handle just transparent open communication, like you would in a relationship is pretty difficult.

14:17.000 --> 14:18.000
Thank you.

14:18.000 --> 14:22.000
So the best stages are built on open and honest communication.

14:22.000 --> 14:33.000
So how do we have that happen when we have different militias, different lieutenant and maybe faulty pigeons?

14:33.000 --> 14:37.000
So in typical Michael fashion, he asks John the Unic,

14:37.000 --> 14:41.000
Hey, how do I figure out how to handle like inconsistent communication from my generals,

14:41.000 --> 14:44.000
from my lieutenant, from myself, from you?

14:44.000 --> 14:55.000
And John the Unic identified three key areas to make modalities or models for better communication across these militias.

14:55.000 --> 15:00.000
And just to keep in the story, I've kept what they are really called underneath,

15:00.000 --> 15:03.000
and then what we're calling them in terms of the story on top.

15:03.000 --> 15:07.000
So he said we're going to have focus on message consistency, soldier availability,

15:07.000 --> 15:09.000
and tolerance to bad actors.

15:09.000 --> 15:15.000
Couldn't find a good way to say tolerance to network petitions in like 10-10-35 terms.

15:15.000 --> 15:19.000
So that's what we get.

15:19.000 --> 15:22.000
So John the Unic proposed four different models.

15:22.000 --> 15:29.000
Aventual week strong and sequential consistency.

15:29.000 --> 15:36.000
So for eventual consistency means that we can't really guarantee that the message would be delivered,

15:36.000 --> 15:42.000
but no promises also on when a different general inquires to the lieutenant.

15:42.000 --> 15:44.000
Hey, have you received this message?

15:44.000 --> 15:45.000
Are you up to date?

15:45.000 --> 15:47.000
You will eventually get it.

15:47.000 --> 15:50.000
You will eventually get the most up to date message.

15:50.000 --> 15:54.000
So we can see here, don't wait.

15:54.000 --> 15:55.000
Thank you.

15:55.000 --> 15:57.000
No.

15:57.000 --> 16:02.000
You can see here that general one says, hey, are we going to attack?

16:02.000 --> 16:03.000
I'm sending an update.

16:03.000 --> 16:04.000
Hey, we attack.

16:04.000 --> 16:06.000
John choose, are we attacking?

16:06.000 --> 16:08.000
No, we're eating.

16:08.000 --> 16:09.000
We're having dinner.

16:09.000 --> 16:13.000
But eventually the lieutenant comes up today and tells the third general.

16:13.000 --> 16:14.000
Yes, we are attacking.

16:14.000 --> 16:18.000
So eventually consistency.

16:18.000 --> 16:19.000
Yes.

16:19.000 --> 16:20.000
I'm going to do this now.

16:20.000 --> 16:22.000
I'm better at this.

16:22.000 --> 16:28.000
Anyways, for a week consistency, there are no promises, no guarantees.

16:28.000 --> 16:29.000
You can send updates.

16:29.000 --> 16:33.000
There's no guarantee from the lieutenant that he will get the update.

16:33.000 --> 16:39.000
But also on top of that, there's no guarantees or safety promises on when the other general

16:39.000 --> 16:43.000
will receive that update as well.

16:43.000 --> 16:49.000
For strong consistency means that every ask or update to the general and also every read

16:49.000 --> 16:51.000
will have the most up to date information.

16:51.000 --> 16:54.000
So this is very hard to actually deploy.

16:54.000 --> 16:58.000
So they created something called.

16:58.000 --> 16:59.000
Strong sequential.

16:59.000 --> 17:02.000
I can make a really cute image so I use the comic.

17:02.000 --> 17:07.000
So basically strong sequential is something that Lampart proposed or John the Unic.

17:07.000 --> 17:09.000
Choose your character here.

17:09.000 --> 17:12.000
What it means is that we don't really care about the things that happen in between.

17:12.000 --> 17:14.000
We only care about the end goal.

17:14.000 --> 17:20.000
So in terms of strong sequential, we can say that hey, the messages may arrive out of order.

17:20.000 --> 17:22.000
They may arrive in order.

17:22.000 --> 17:24.000
What cares is that we get to the right purpose.

17:24.000 --> 17:28.000
And a good video that I watched about this because I've been studying this all week.

17:28.000 --> 17:29.000
Which is fun.

17:29.000 --> 17:35.000
It's basically like if you've ever commented something on Facebook or you've twisted before.

17:35.000 --> 17:38.000
And you'll see that someone has commented before you after you refresh.

17:38.000 --> 17:40.000
That is sequential.

17:40.000 --> 17:46.000
Strong sequential consistency, which is that you'll see those updates happen after.

17:46.000 --> 17:52.000
And it will get to the right order, but you might see them not when you do it the first time.

17:52.000 --> 17:58.000
So consistently models are used in everyday seizures or in everyday distribution systems.

17:58.000 --> 18:01.000
You've probably heard about them a bunch of you deal with databases.

18:01.000 --> 18:04.000
And I think they're pretty good to know, especially when designing systems.

18:04.000 --> 18:08.000
Limer talks a lot about using math.

18:08.000 --> 18:10.000
Oh, sorry.

18:10.000 --> 18:14.000
Limer talks a lot about using math to design these distributed systems.

18:14.000 --> 18:20.000
And I think consistency models are things that aren't shown enough when making these systems.

18:20.000 --> 18:22.000
So that we can know.

18:22.000 --> 18:26.000
Nick, you're so bad at this.

18:26.000 --> 18:32.000
So in a more Limer-esque state of mind, knowing consistency models will help you deal with

18:32.000 --> 18:36.000
down time or when you're creating these systems.

18:36.000 --> 18:40.000
What you want to do with these types of messages and what consistency guarantees you want to give.

18:40.000 --> 18:42.000
And then you can give the next one.

18:42.000 --> 18:44.000
Okay, cool.

18:44.000 --> 18:46.000
And now on to concurrency.

18:46.000 --> 18:48.000
This one is short and sweet, too.

18:48.000 --> 18:52.000
Sorry.

18:52.000 --> 18:53.000
Cool.

18:53.000 --> 18:56.000
So Limer and I don't have a lot of things in common.

18:56.000 --> 19:00.000
But what we do in common is that he's never taken computer science course

19:00.000 --> 19:02.000
and neither have I.

19:02.000 --> 19:08.000
But he did write a talk or a white paper on how to teach concurrency from his point of view.

19:08.000 --> 19:12.000
So the sad part about this is after I attempted to teach this.

19:12.000 --> 19:18.000
I kind of can call it a computer science course and we will no longer have these things in common.

19:18.000 --> 19:24.000
So Limer starts about talking about how computer science is based off of the

19:24.000 --> 19:26.000
figuring out what is a computation.

19:26.000 --> 19:30.000
Because computer science is the study of computation.

19:30.000 --> 19:34.000
So for the purpose of this talk, thank you.

19:34.000 --> 19:37.000
We can state that a computation is a sequence of steps.

19:37.000 --> 19:43.000
But that leads us to our next question, which is what is a step?

19:43.000 --> 19:50.000
And for the purpose of this talk, a step is one state, a transition from one state to another.

19:50.000 --> 19:58.000
Now, Limer and his white papers like Staharp about how he thinks technology should work.

19:58.000 --> 19:59.000
And I think he's right.

19:59.000 --> 20:06.000
He says that in Western society, we focus a lot on actions and verbiage.

20:06.000 --> 20:14.000
And so in computation, we are focusing a lot on the transition and how these verbs influence how we think about computers.

20:14.000 --> 20:18.000
But what he says is that we should think of computations and computational hardware

20:18.000 --> 20:20.000
as just state machines, right?

20:20.000 --> 20:26.000
I only care about the present state I am in and how I transition to the next state and not the past.

20:26.000 --> 20:37.000
So with that in mind, we will only focus on basically how to transition from states and not the actions to transition from states.

20:37.000 --> 20:43.000
So Limer goes on to talk about invariances, which is a term I've never heard until his paper.

20:43.000 --> 20:49.000
So an invariance is a condition that can is always true throughout the execution of the program.

20:49.000 --> 20:59.000
And he talks a lot about inductive invariances, which is basically similar to strong sequential concurrency.

20:59.000 --> 21:05.000
Man, that's hard to say fast, which is basically we do not care about the things that happen in between an execution.

21:05.000 --> 21:10.000
We only care about that we get to that true and state goal.

21:10.000 --> 21:24.000
So a very simple way to understand invariances is if your base invariance is n equals 1, then your inductive invariance, which could also get to that true state, is n equals 1 n n plus 1.

21:24.000 --> 21:29.000
You get to the same n equals 1 value in either way.

21:29.000 --> 21:38.000
I found this quote really cute, mainly because when he talks about computer engineers, he says she, no, maybe happy.

21:38.000 --> 21:43.000
Anyway, what's an engineer understands what a computation is and how it is described.

21:43.000 --> 21:47.000
She can understand the most important concept of in concurrency.

21:47.000 --> 21:58.000
So I thought you were cool to lay out how to teach concurrency in a different way than just going into like talking about mutual exclusions and semifores, which we will get to.

21:58.000 --> 22:13.000
Awesome, so in 1965, a guy with a name, who I'm not going to try and say right now, had a paper called the solution of problem and concurrent programming control and he proposed the requirement of mutual exclusion.

22:13.000 --> 22:20.000
And if those of you who have really taken compsi 101, probably know about this and those who don't, what's up, good job.

22:20.000 --> 22:31.000
Mutual exclusion is a way to present race conditions and concurrent executions.

22:31.000 --> 22:40.000
Yes. Now, Lampart created his own mutual exclusion algorithm called the bakery algorithm.

22:40.000 --> 22:46.000
Lampart calls out that in the original mutual exclusion algorithm, it satisfies the deadlock freedom.

22:46.000 --> 22:53.000
Meaning that if one or more processes try to enter the critical section, one of them must succeed.

22:53.000 --> 23:03.000
Now, in more recent papers, which I guess are younger papers, they try to satisfy a more difficult.

23:03.000 --> 23:12.000
They call it, no, go back. They try to satisfy a stronger requirement, which is the starvation freedom requirement.

23:12.000 --> 23:16.000
Meaning that every process that tries to enter the critical section eventually does so.

23:16.000 --> 23:22.000
And so the bakery algorithm does satisfy the starvation requirement.

23:22.000 --> 23:30.000
As pretty simple, honestly, if you've ever been to a bakery, you know how this works, which is you go in, you take a number, you wait.

23:30.000 --> 23:37.000
And so every process that enters the bakery or every customer that enters the bakery goes in takes a number and waits.

23:37.000 --> 23:46.000
When it is cold, then it places its order and you take it and you leave. Now, there's a couple things to take into mind here.

23:46.000 --> 23:52.000
One is that when you take the number and you're waiting, you are in a non-critical section of the code.

23:52.000 --> 23:55.000
So you can walk around, you can text on your phone, you can do whatever.

23:55.000 --> 24:00.000
But when you are talking and you're placing that order, you're in critical code execution.

24:00.000 --> 24:10.000
Now, a couple of requirements are met by this, basically saying that you can't have a process that hangs and basically tries to take an order forever.

24:10.000 --> 24:15.000
Another thing that they also try to meet is a process halting.

24:15.000 --> 24:19.000
I didn't know a good way to say in the algorithm here how a process would all.

24:19.000 --> 24:23.000
It would just be like if you were taking an order and then you just like plopped over.

24:23.000 --> 24:28.000
So it basically says if that does happen, then you exit into the non-critical section.

24:28.000 --> 24:33.000
It would be like if you got pushed away back into the bakery.

24:33.000 --> 24:38.000
And you can at any time, as long as you're in the non-critical section, re-enter the critical section.

24:38.000 --> 24:43.000
So you can order as many times as you would like as long as you do not halt.

24:43.000 --> 24:51.000
This is a good way to stop concurrent programs from reading and writing and updating at the same time.

24:52.000 --> 24:55.000
Well, now, this is pretty simple.

24:55.000 --> 25:01.000
The bakery algorithm, this is his first, like, basic way of solving it.

25:01.000 --> 25:11.000
What he then does throughout the paper, which you can read on your own time, is go into the, do constructive bakery algorithm and the distributed bakery algorithm.

25:12.000 --> 25:17.000
Basically, we try to make sure that the lower number always goes first.

25:17.000 --> 25:19.000
No one halts in the critical section.

25:19.000 --> 25:24.000
And that everyone at least gets one attempt at ordering.

25:30.000 --> 25:34.000
And that leads us up to kind of looking at the concept of time.

25:34.000 --> 25:40.000
Now back in the, the Byzantine Empire, they did have distributed time.

25:40.000 --> 25:45.000
They had a distributed clock, which we still have today.

25:45.000 --> 25:48.000
It's the big orange ball up in the sky.

25:48.000 --> 25:52.000
And that was the kind of the primary source of understanding what the time was.

25:52.000 --> 25:59.000
You would have to use a sundial in, in combination with, with obviously the sun to understand time.

25:59.000 --> 26:06.000
Now, in general, if you just kind of want an approximation of time, that is absolutely fine.

26:06.000 --> 26:17.000
But the issue is when you need absolute time and you need to, to be kind of for some things like if you want to do message ordering based on time, then you need to be much more precise.

26:17.000 --> 26:22.000
Where kind of sort of in certain his paper in 78 Lampo is addressing this as a problem.

26:22.000 --> 26:31.000
And one of the kind of the examples that he's citing is that, you know, in later revisions, even with prompt things like NTP, you can still have clock drift.

26:31.000 --> 26:49.000
And even if clock drift is by milliseconds, if one machine doesn't have the same exact representation of time that another machine, when it comes to event ordering based on time, you can run into problems and you can end up with mismatched orders.

26:49.000 --> 26:55.000
So if we, we kind of look at our theoretical scenario and see how this, this could work, it would be like this.

26:55.000 --> 27:04.000
So we have Michael and George and Michael decides that, hey, we're going to attack the city and we're going to attack the city at 1030.

27:04.000 --> 27:12.000
Now, Michael uses a physical clock to time stamp his message at 842.

27:12.000 --> 27:21.000
George receives his message, but George then says, no, we shouldn't be attacking at 1030. We need to attack it 830.

27:21.000 --> 27:25.000
So what he does is he then sends a message to everybody.

27:25.000 --> 27:32.000
Time stamping it from what his clock says, which is 841, and sends it onto everything everybody else.

27:32.000 --> 27:39.000
Now the problem is when Harold and John receive this message on when to attack.

27:39.000 --> 27:42.000
We, you know, we can't guarantee the order of the messages.

27:42.000 --> 27:46.000
So what they're doing is they're using those time stamps to sort the order.

27:46.000 --> 27:51.000
Because Michael's time stamp is later than George's, they take Michaels.

27:51.000 --> 27:59.000
So Michael and George attack at 830 and Harold and John attack at 1030.

27:59.000 --> 28:10.000
So to kind of solve the problem of physical clocks in this way, lampor suggests a very, very straightforward and simple solution.

28:10.000 --> 28:23.000
And that is to use what is known, a lampor clock, which is that every message will have a distributed scalar clock value attached to it rather than the physical value.

28:23.000 --> 28:29.000
So then he proposes that the kind of the way that the system would function is like this.

28:29.000 --> 28:33.000
So every general has a logical counter T.

28:33.000 --> 28:39.000
Every time an event occurs, the new value of T is T plus 1.

28:39.000 --> 28:45.000
Every time a message is sent, the new value, again, is T plus 1.

28:45.000 --> 28:49.000
And then T is also added on to the message.

28:49.000 --> 28:55.000
So when another general receives a message, what they can do is they can look at that value of T.

28:55.000 --> 29:02.000
Now if that value of T is greater than their values or what we're using T, the message T there is T prime.

29:02.000 --> 29:06.000
They will replace it with their own value and add one.

29:06.000 --> 29:10.000
And then in principle this works really well.

29:10.000 --> 29:14.000
So we can see here if we kind of look at that.

29:14.000 --> 29:17.000
A couple more rules on that.

29:17.000 --> 29:20.000
I will show you the chart in a second.

29:20.000 --> 29:23.000
Lampor clocks mostly work.

29:23.000 --> 29:26.000
Now again, but there are issues.

29:26.000 --> 29:35.000
So what they can guarantee with a lampor clock is that if the event A happens before B, the lampor clock value of A will be less than B.

29:35.000 --> 29:43.000
But what you can't guarantee is that if a lampor clock value is less than another one, that that event happened before.

29:43.000 --> 29:47.000
And that's because clock values can run out of sync.

29:47.000 --> 29:54.000
It's also possible to have two events with the same timestamp and lampor clock doesn't do well with concurrency.

29:54.000 --> 30:00.000
But depending on the scenario, it's a very simple solution that works incredibly well.

30:00.000 --> 30:07.000
So if we kind of apply lampor clocks here, so initially Michael's initial clock value is zero.

30:07.000 --> 30:12.000
So he sends a message, increments to one, adds it to the message.

30:12.000 --> 30:15.000
George then receives that.

30:15.000 --> 30:22.000
He takes the timestamp value here because it's greater than his own clock value, adds one to it.

30:22.000 --> 30:30.000
And then sends his own message, which has now has a value of three.

30:30.000 --> 30:40.000
So when we look at Harold and John, well Harold and John now don't need to worry about the physical clock value, what they're looking at is that the lampor clock value.

30:40.000 --> 30:45.000
The original message that they got from Michael has two attributed to it.

30:45.000 --> 30:50.000
But then the later message that they got from George has four attributed to it.

30:50.000 --> 30:54.000
So they can identify what the correct order of those messages are.

30:54.000 --> 30:57.000
Very, very simple mechanism.

30:57.000 --> 31:07.000
What strikes me is really quite lovely about this is that a lot of this research when it was kind of done by Leslie certainly the earlier papers.

31:07.000 --> 31:14.000
He was writing about the problems and systems and distributed computing and probably didn't even have computer on his desk.

31:14.000 --> 31:18.000
I mean, I think there's a very sort of high value, maybe a new number typewriter.

31:18.000 --> 31:26.000
A lot of this is done very theoretically and through sort of hand computation, which I think is beautiful.

31:26.000 --> 31:35.000
But kind of back to the kind of the point here, let's look at some of the problems and then we can have an issue here with concurrency.

31:35.000 --> 31:43.000
So in the instance that Michael sends a message saying, hey, we're going to attack at 1030.

31:43.000 --> 31:49.000
But George doesn't wait for Michael's message to arrive. He says, no, we should attack at 830.

31:49.000 --> 31:57.000
So in effect, they're sending the messages exactly the same time. They both have a clock value of 1.

31:57.000 --> 32:04.000
So then Harold and John, when they receive those messages, they receiving two messages of value of 1.

32:04.000 --> 32:11.000
So what do they do? How do they distinguish which value, which message should take precedence over the other?

32:11.000 --> 32:20.000
And the way that kind of Leslie discusses this and attempts to solve this problem is same or look in a situation like this, what you need is a tie break.

32:20.000 --> 32:30.000
And a tie break, when you have two messages with the same timestamp, is just an algorithm that you'll use to kind of determine what the order will be.

32:30.000 --> 32:37.000
You know, this could be just a lexical rule sort of list of the names of the nodes.

32:37.000 --> 32:43.000
You could have a predetermined dictionary or a predetermined order.

32:43.000 --> 32:51.000
The key thing is that every single node in the system needs to have an implement the same tie break mechanism.

32:51.000 --> 32:59.000
So whilst you might not have guaranteed message ordering, at least you have consistency in the order.

32:59.000 --> 33:10.000
Now, if you kind of want to solve the problems and look at things in a more complex way, vector clocks are a different approach looking this.

33:10.000 --> 33:25.000
A vector clock works in the way that rather than keeping sort of a unified single scalar for all the clock value in the entire system, it maintains a vector, which is shared amongst all of the nodes.

33:25.000 --> 33:31.000
But each dimension in that vector has the individual clock value.

33:31.000 --> 33:42.000
And this allows you to do comparison to be able to identify things like concurrency better, a little bit of homework to kind of read that up on maybe.

33:42.000 --> 33:49.000
But I'm going to hand back over to Sarah and probably press the button really badly for her when she talks about gossip protocols.

33:49.000 --> 34:00.000
And do lots of pointing at you. Cool. Our last ish topic is going to be on gossip protocols, which is slightly import-esque, but not really.

34:00.000 --> 34:08.000
So if you've ever been on high school, which I've talked a lot about today for some reason, you are familiar with how a rumor spreads throughout a high school.

34:08.000 --> 34:16.000
Basically, someone starts saying that I haven't watched Mean Girls in a really long time, but I think Regina George is mean.

34:16.000 --> 34:21.000
And then eventually the whole school knows and then you have to leave school and like, you know, move states and whatever.

34:21.000 --> 34:37.000
The same thing as how like gossip protocols are created, it's a infectious or rumor style based protocol that is decentralized and peer to peer and focuses on nodes sharing information with another in their cluster to spread information about the cluster status.

34:38.000 --> 34:43.000
So how a gossip protocol works, and please don't look too hard. Did you change these?

34:43.000 --> 34:45.000
Yeah, I changed those.

34:45.000 --> 34:57.000
Sick. Anyways, for around one of gossip protocols, you can see that G is talking to B interest information with B about either status of the cluster or node status, sending an update.

34:57.000 --> 35:04.000
It could be like a livelyist probe or it could be saying, hey, I'm dead.

35:05.000 --> 35:15.000
And now that B has this information about G, G can go tell or B can go tell. We should chose names. B can go tell F and D all about this.

35:15.000 --> 35:25.000
And it will start to spread throughout the cluster itself. Basically, as G keeps propagating information about itself, B is now eating.

35:25.000 --> 35:39.000
G in that spread of this information and it will propagate throughout the cluster until we have everyone knowing all of G's drama.

35:39.000 --> 35:46.000
And you can see here it shows this image just because it looked really busy, not even going to hide it here.

35:46.000 --> 35:51.000
So you can see here that basically how gossip protocols work.

35:52.000 --> 36:01.000
Cool. So we talked a lot about Byzantine failures today. And in the beginning, Nick gave like actual discussion on how Byzantine failures and the paper was ran.

36:01.000 --> 36:05.000
Gossip protocols are very susceptible to Byzantine failures.

36:05.000 --> 36:13.000
This is because there is no like hardening techniques on it. It is just I'm sending messages to nodes trusted within my network to spread information.

36:13.000 --> 36:24.000
So it can be right with this information. Now there is a lot of work on Byzantine fault tolerant gossip protocols, but I would say that ones that we have worked with.

36:24.000 --> 36:32.000
Basically, you weren't built that way. And I'm talking more about like member list or any other type of like decentralized peer communication network that you use.

36:32.000 --> 36:49.000
You can use different types of authentication to try and prevent this, but add it to basis things like swim, which is a what is it scalable weekly consistent infections style protocol, which is the one I've seen most implemented.

36:49.000 --> 37:02.000
You can have any type of Byzantine fault tolerance. So you can have things where your cluster or server will fall over because we haven't protected against this.

37:02.000 --> 37:05.000
And now to talk about Byzantine fault tolerance.

37:06.000 --> 37:14.000
I'll talk about the cloud third one and then we'll talk about the robin hood one and even though this is recorded, this is all allegedly.

37:14.000 --> 37:20.000
So cloud fair in 2020 had a Byzantine fault tolerance that happened a huge outage.

37:20.000 --> 37:25.000
And so you can see that Byzantine fault tolerance is still very alive today.

37:25.000 --> 37:41.000
And then basically, in 2020, there was a person who you might be looking at who removed a lamp or time stamp that was on a basically like the gossip protocol moved one case statement as a little baby intern.

37:41.000 --> 37:54.000
And what happened is that there would be this thundering herd-esque problem where no one could come to like a consistent consensus on like how to communicate and nodes would start to topple over.

37:54.000 --> 37:57.000
This caused a huge robin hood outage that was on CNN.

37:57.000 --> 38:02.000
And then the person who might have done this found out via a team meeting on accident.

38:02.000 --> 38:10.000
So you can see that just removing one case statement can cause a loss of multi millions of dollars.

38:10.000 --> 38:16.000
And then the picture of the first Byzantine fault, which is basically that there was this like shuttle that NASA had.

38:16.000 --> 38:28.000
And this thing needed to be turned a little bit more because they kept getting like interchanging yes and no values from the the valve that was like monitoring it.

38:28.000 --> 38:35.000
And so the computer system wasn't able to like reach a consensus and give information because they just had to turn something.

38:35.000 --> 38:41.000
So Byzantine fault tolerance is all around us today.

38:41.000 --> 38:51.000
It's also the thing that probably enables most of your favorite meme kinds because Byzantine fault tolerance is used in things like blockchain.

38:51.000 --> 38:57.000
It's also used in a lot of systems where you need sort of absolute assurance.

38:57.000 --> 39:04.000
So things like maybe I'm I'm going to have something which affects human human life.

39:04.000 --> 39:10.000
I don't want a singular sort of to rely on a single point of failure. I want to have multiple points of failure.

39:10.000 --> 39:15.000
But I want multiple kind of I want to be able to achieve consensus on that.

39:15.000 --> 39:21.000
Lampart crooks. I mean one of the kind of core patterns that we use in distributed systems now is a venting.

39:21.000 --> 39:27.000
And the kind of the the ability to have a lamp or clock which is very low impact on the system.

39:27.000 --> 39:35.000
The ability very fast to calculate very very simple and has a low message overhead is still used in inventing based systems.

39:35.000 --> 39:40.000
You can obviously go a little bit more complex. You can use back to clocks.

39:40.000 --> 39:45.000
You can even use things like lamp or clocks and effect the clocks with Byzantine consensus.

39:45.000 --> 39:55.000
But all of these things got up Byzantine fault tolerance lamp or clocks concurrency consensus.

39:55.000 --> 40:00.000
They are the thing which empower the systems that we're building and using today.

40:00.000 --> 40:07.000
And it really does fascinate me that a lot of these core concepts come back from those kind of core papers that were written in the 70s.

40:07.000 --> 40:15.000
But the kind of the the concepts of which which were no idea of the kind of things that they would be applied in today.

40:15.000 --> 40:19.000
Still all true and are still relevant.

40:19.000 --> 40:22.000
And we we kind of want to thanks somebody for that to me.

40:22.000 --> 40:34.000
Yeah, I mean it brings us back to the original question which was fuzzy lamp or a man who has written so many papers that the scroll and never on his thing just released a book by the way everyone should read it.

40:34.000 --> 40:44.000
I printed it out and also was a man at his time would get into fights with newspapers and things like that who would call him.

40:45.000 --> 40:58.000
Maybe a little I don't know what the like politically correct term here is just so that he's just having a little bit too much fun with his papers and people would call them hard to read and maybe just like an interesting fun guy.

40:58.000 --> 41:06.000
So was he right and the answer I've come to and probably have said for like way too long it's yeah he was right.

41:06.000 --> 41:12.000
We have and come across these problems all the time in technology and he found them in the 1970s.

41:12.000 --> 41:20.000
But on top of that he made these papers fun and interesting even though everyone hates Paxos and they think it's over complicated and they rather choose raft.

41:20.000 --> 41:34.000
Like he was right to teach things like the part time parliament which is the original paper of Paxos in a fun consumable way that isn't hugely technical and full of like algebraic expressions.

41:35.000 --> 41:52.000
So just to say thank you so much for listening I hope you all can run away and and go and find a bunch of reading on on Leslie Lampot and some of his work because it is revolutionary use an absolute pioneer of the industry and a fascinating individual.

41:52.000 --> 41:54.000
Do we have time for questions.

41:55.000 --> 42:03.000
And here's some references but there's a ton more you should go check out his website your exit Microsoft it's the Azure Slabs website sick.

42:03.000 --> 42:10.000
But if anyone has questions you can raise your hand and I'll sprint up to you and if not you can walk down here.