The End to End Argument

Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away. – Antoine de Saint-Exupéry


I think my folly has always been expecting research papers to point at a problem and then meticulously work through a solution that addresses it. But that's a blinkered view. Some research instead gives a bird's-eye view of how things are done, and distills what is merely convention from what actually needs to be there. That is what the End-to-End Arguments in System Design paper was to me. While primarily a computer networks paper, the guidelines it lays out generalize to almost any design analysis.

The Gist

The paper describes a design principle for the placement of functions in a distributed system: when does it make sense to push a function down into a lower layer (the OSI model emerged around the time of this paper), what robustness and reliability guarantees should be weighed, what performance impacts should be factored in, and so on. Think of the end-to-end argument as an Occam's razor that guards against the allure of making the lower layers of a system take on unnecessary responsibilities to "aid" the higher layers.

Why Complexity is Best Avoided

The paper walks through several case studies to exemplify this. I'll cover a few.

1. The Careful File Transfer

Let's say you want to transfer a file from one host to another. There are umpteen failure modes that could hamper this: read errors, transfer errors, buffer errors, lost packets, and crashes.

One choice is to reinforce each step in the process. But this introduces overhead at every step (and multiplies the code spent on detection and correction mechanisms!).

Another option is to just do an end-to-end check and retry. Sure, when failures do happen, we won't have a way of detecting them early. But the failure probability is fairly low. Moreover, the application will have to do a final end-to-end check anyway, in spite of every careful check along the way. So why not push this functionality to the terminal ends? The lower layers can still run lightweight per-packet checks as a performance boost (so one bad packet doesn't force retransmitting the whole file), but they are relieved of the heavy lifting: correctness comes from the check at the ends.
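As a rough sketch (not code from the paper), the end-to-end version might look like this: checksum the whole file at the sender, transfer it however the lower layers see fit, and retry if the receiver's checksum disagrees. The `send` and `fetch_remote_digest` callables are hypothetical stand-ins for whatever transport and remote-digest lookup the application actually uses.

```python
import hashlib

def transfer_with_end_to_end_check(path, send, fetch_remote_digest, max_retries=3):
    """Transfer a file and verify it end to end, retrying on mismatch.

    `send` and `fetch_remote_digest` are hypothetical callables standing in
    for whatever transport and remote-digest RPC the application uses.
    """
    with open(path, "rb") as f:
        local_digest = hashlib.sha256(f.read()).hexdigest()

    for _ in range(max_retries):
        send(path)                                    # lower layers may drop or corrupt data
        if fetch_remote_digest(path) == local_digest:
            return True                               # the end-to-end check is what we trust
    return False                                      # give up after a few attempts
```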

2. The Reliable Delivery Conundrum

The very reason for acknowledging a packet (beyond congestion control) is to know whether the message was acted upon by the target: not merely delivered, but actually used to change the state of the target application. In other words, an "I did it" or an "I didn't". Making the target host sophisticated enough that, once it accepts a message, it takes responsibility for guaranteeing the message is used, is a worthwhile objective. But what would we do without self-clocking!
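As a loose illustration (again, not the paper's code), an application-level acknowledgment might look like this: the receiver only says "I did it" after its own state has actually changed. Here `apply_update` is a hypothetical callback that mutates application state, and `conn` is any connected socket-like object.

```python
import json

def handle_message(conn, apply_update):
    """Acknowledge a message only after it has been acted upon.

    `apply_update` is a hypothetical application callback; the ack means
    "I changed my state", not merely "the bytes arrived".
    """
    message = json.loads(conn.recv(4096))
    try:
        apply_update(message)                  # change application state first
        conn.sendall(b'{"ack": "did it"}')     # ack reflects use, not delivery
    except Exception:
        conn.sendall(b'{"ack": "did not"}')    # delivery succeeded, the action failed
```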

Moreover, reliability is also driven by application-specific requirements. Think of a VoIP call versus a storage system: completely different trade-offs between delay and reliability, so the call about those trade-offs has to be made at the application level. The lower layers need to be flexible (and loose) enough to support both kinds of traffic.
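This is also why, in practice, the application rather than the network picks its delivery guarantees. A minimal sketch with plain BSD sockets and an invented `kind` parameter:

```python
import socket

def open_transport(kind: str) -> socket.socket:
    """Pick the delivery guarantees the application actually needs."""
    if kind == "voice":
        # A VoIP-style stream tolerates loss but not delay: best-effort UDP.
        return socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # A storage-style transfer needs every byte: reliable, ordered TCP.
    return socket.socket(socket.AF_INET, socket.SOCK_STREAM)
```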

3. On Securing Packets

Even if encryption and decryption are handled by the communication system, there's always a risk on a time-shared machine that another application can see the data as it is fanned out by the transmission system. Moreover, we still need a means of verifying the authenticity of the message at the application layer. This makes security more desirable at the ends than within the network.
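Here's a minimal sketch of the authenticity half of that argument, assuming a shared application-level key provisioned out of band (the key name and helper functions are made up for illustration): the sender attaches a MAC and the receiver verifies it itself instead of trusting the network.

```python
import hashlib
import hmac

SHARED_KEY = b"app-level-secret"   # assumed to be provisioned out of band

def seal(payload: bytes) -> bytes:
    # Sender attaches an application-level MAC; the network never sees the key.
    tag = hmac.new(SHARED_KEY, payload, hashlib.sha256).digest()
    return tag + payload

def open_sealed(blob: bytes) -> bytes:
    # Receiver verifies authenticity at the end, not within the network.
    tag, payload = blob[:32], blob[32:]
    expected = hmac.new(SHARED_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("message failed end-to-end authenticity check")
    return payload
```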

4. The RISC Microprocessor

What if the instruction set supported by the microprocessor were as minimal as possible, with all the higher-level functionality achieved by chaining these primitives together? The result: a reduced instruction set computer. Avoid esoteric features and simplify the design.
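To make the "chaining primitives" idea concrete, here's a toy, entirely made-up register machine with just four primitive instructions. Summing an array isn't an instruction; it's a chain of the primitives.

```python
def run(program, memory, regs):
    """Execute a toy program built from four primitive instructions."""
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        if op == "load":                      # rX = memory[rY]
            regs[args[0]] = memory[regs[args[1]]]
        elif op == "add":                     # rX += rY
            regs[args[0]] += regs[args[1]]
        elif op == "addi":                    # rX += constant
            regs[args[0]] += args[1]
        elif op == "bne":                     # jump if rX != rY
            if regs[args[0]] != regs[args[1]]:
                pc = args[2] - 1
        pc += 1
    return regs

memory = [3, 1, 4, 1, 5]
program = [
    ("load", 2, 0),       # r2 = memory[r0]
    ("add",  3, 2),       # r3 += r2  (running total)
    ("addi", 0, 1),       # advance the index
    ("addi", 1, 1),       # bump the loop counter
    ("bne",  1, 4, 0),    # loop while r1 != r4 (array length)
]
print(run(program, memory, [0, 0, 0, 0, len(memory)])[3])   # prints 14
```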

5. The Open Operating System

Make the operating system as flexible as possible so that applications can leverage the unique quirks of the underlying hardware. That way we get the best performance with minimal overhead.

Conclusion

The End-to-End Arguments in System Design paper is a masterclass in simplicity and clarity, reminding us that not all problems require intricate, layered solutions. Instead, it champions the idea that sometimes, the most elegant and effective approach is to push functionality to the edges—where the real work happens.

So, the next time you’re tempted to add another layer of complexity to solve a problem, take a step back. Ask yourself: Does this really need to be here? The answer might just lead you to a cleaner, more robust solution. After all, as the paper shows, sometimes less really is more.