
Modern distributed systems heavily rely on the coordination protocols that enable distributed computing primitives (e.g., distributed reset, token passing, leader election). Such primitives are utilized across the software stack for the development of more complex distributed algorithms, from system-on-chip (e.g., cache coherence) to computer networks (e.g., routing, Paxos). Thus, their dependability and scalability have profound impacts on the overall reliability of modern distributed systems (e.g., distributed key-value storage, distributed AI). However, developing protocols that preserve their functional and non-functional properties regardless of their scale is a complex problem, evidenced by protocols whose proof of correctness takes years (e.g., Chord for distributed hashing). The scalability requirement exacerbates the design and verification problems because scalable protocols are required to function correctly for an unbounded number of nodes and unbounded variable domains. Thus, it is highly desirable to develop methods and tools that automate the design and verification of distributed coordination protocols. In this talk, I will first present the most recent research results on the verification and synthesis of protocols that preserve their fault tolerance while being scalable. Then, I will discuss some practical guidelines that help mainstream engineers in the development of dependable and scalable distributed systems. Finally, I will present some real-world applications of automated design, followed by a discussion on open problems and ongoing research.

Associate Professor @ Michigan Technological University