This Blog has several posts dedicated to the topic of Distributed Computing and Byzantine Fault Tolerance in its records. Today I want to get back to this topic. As is well known the Byzantine fault-tolerant (BFT) algorithm has found implementation within the field of cryptographic protocols, namely the now ubiquitously known crytpocurrency Bitcoin.
But BFT algorithms are also relevant to distributed computing applications more generally. This trend will only certainly increase in the years to come, with distributed computing becoming of widespread use and adoption within various sorts of frameworks and use cases.
The Website ItsBlockchain today released an article about one version of BFTs, this one called practical byzantine fault tolerance algorithm (PBFT Consensus algorithm). It is a brief paragraph explaining roughly the algorithm, with the additional information about the current main projects applying it to their projects.
I would like to compliment this post with a talk given by one researcher that happens to be Portuguese and that has contributed decisively to this Computer Science development. His name is Miguel Castro and he was co-author with Barbara Liskov of one of the founding papers about PBFTs: Practical Byzantine Fault Tolerance, from 1999, where it is presented the Castro-Liskov algorithm for the first time. The video is from Microsoft Research’s Youtube channel, and it is a nice stress free presentation of the topic. Please check below the introductory script from the channel providing an overview of an almost lengthy 2 hour talk.
This lecture is about implementing Byzantine fault tolerant state machine replication. A Byzantine faulty replica can behave arbitrarily, for example, it may be controlled by an attacker, whereas algorithms like Paxos assume that faulty replicas fail by stopping. Byzantine fault tolerant state machine replication works correctly if less than a third of the replicas are faulty. The lecture starts by discussing assumptions in replication algorithms and how they can be invalidated by an attacker. Then it describes the Castro-Liskov algorithm, explains how it works, and presents a number of important optimizations. Next it discusses implementation issues including a methodology for reusing existing code to implement replicated state machines, and a Byzantine fault tolerant replicated file system built using this methodology. The performance of this implementation is discussed next followed by a brief discussion of other work in Byzantine fault tolerance.
The other two points worth to further mention in the video is the discussion around replication algorithms in the context of distributed computing, and the crucial need for consensus of different machines or computational agents as to what constitutes a valid message, signal or transaction; and the other point is what Miguel Castro thinks about other related work within the field and the possible future developments.
body text image: practical byzantine fault tolerance