We work on a layer 4 load balancer called Shiv. Shiv routes packets to backends using a consistent hash of the packet's 5-tuple (source IP, destination IP, source port, destination port, and protocol). Shiv’s objective is to route all packets of a connection, which share the same 5-tuple, to the same backend for the duration of the connection. When it fails to do so, connections break and users are affected (for example, videos stall). Consistent hashing is quite resilient to changes, but when a large number of backends are added or removed, remappings occur and connections break. To protect against such changes, Shiv maintains a cache that maps each 5-tuple to a backend.

The logic Shiv uses to route packets can be summarized as follows: if the packet's 5-tuple is in the cache, route the packet to the backend indicated by the cache. Otherwise, compute the hash of the 5-tuple to obtain the destination backend, route the packet to that backend, and place the (5-tuple, backend) entry in the cache.

Shiv works well under the following conditions:
– In steady state, when the arrangement of Shivs and backends is unchanged.
– When the arrangement of Shivs changes. In this case, packets for a connection may land on a different Shiv host than earlier packets did, but both Shiv hosts use the same consistent hash function and therefore pick the same backend.
– When the arrangement of backends changes. In this case, packets for a connection continue to land on the same Shiv host, which uses its cache to route them to the same backend as before.

However, when the arrangements of both Shivs and backends change, a nontrivial number of misroutings occur, because the following sequence of events can happen:
– Packets for a connection C arrive at a Shiv host A, which picks a backend X.
– A large topology change occurs on the Shivs and backends.
– Packets for connection C now land at Shiv host B != A, which has no cache entry for C and picks a backend Y != X because the hash ring has changed.

We have implemented two solutions to this problem, which we will discuss:
– Embedding “Server ID” hints into packets, which enable Shiv to route the packets to a specific server without having to perform a consistent hash.
– Sharing the 5-tuple to backend cache among all Shivs in a cluster, thereby facilitating consistent decision making among them in the face of hash ring changes.
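The cache-then-hash routing logic and the failure sequence described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Shiv's implementation: the `HashRing` and `ShivHost` classes, the SHA-256 based ring with virtual nodes, the backend names, and the use of a plain shared dictionary to stand in for the cluster-wide cache are all hypothetical choices made for illustration.

```python
import bisect
import hashlib
from typing import Dict, List, Optional, Tuple

# (source IP, destination IP, source port, destination port, protocol)
FiveTuple = Tuple[str, str, int, int, str]


def _h(key: str) -> int:
    """Stable 64-bit hash. SHA-256 is an assumption; the source names no hash."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, backends: List[str], vnodes: int = 100) -> None:
        points = sorted((_h(f"{b}#{i}"), b) for b in backends for i in range(vnodes))
        self._keys = [k for k, _ in points]
        self._backends = [b for _, b in points]

    def pick(self, flow: FiveTuple) -> str:
        # First ring point clockwise from the flow's hash, wrapping around.
        idx = bisect.bisect_right(self._keys, _h(repr(flow))) % len(self._keys)
        return self._backends[idx]


class ShivHost:
    """Hypothetical load balancer host: consult the cache, else hash and cache."""

    def __init__(self, ring: HashRing,
                 cache: Optional[Dict[FiveTuple, str]] = None) -> None:
        self.ring = ring
        self.cache: Dict[FiveTuple, str] = {} if cache is None else cache

    def route(self, flow: FiveTuple) -> str:
        backend = self.cache.get(flow)
        if backend is None:
            backend = self.ring.pick(flow)
            self.cache[flow] = backend
        return backend


def simulate(num_flows: int = 1000) -> Dict[str, int]:
    """Count flows sent to a different backend than host A originally chose."""
    flows = [("10.0.0.1", "192.0.2.10", 1024 + i, 443, "tcp") for i in range(num_flows)]
    old_ring = HashRing([f"backend-{i}" for i in range(10)])
    new_ring = HashRing([f"backend-{i}" for i in range(30)])  # many backends added

    host_a = ShivHost(old_ring)
    original = {f: host_a.route(f) for f in flows}

    # Only the Shivs change: a fresh host with the same ring picks the same backends.
    shiv_only = sum(ShivHost(old_ring).route(f) != original[f] for f in flows)

    # Only the backends change: host A keeps its cache, so nothing moves.
    host_a.ring = new_ring
    backend_only = sum(host_a.route(f) != original[f] for f in flows)

    # Both change: host B has the new ring and an empty cache.
    both = sum(ShivHost(new_ring).route(f) != original[f] for f in flows)

    # Shared cache (modeled as the same dict): host B reuses host A's decisions.
    host_b_shared = ShivHost(new_ring, cache=host_a.cache)
    shared_cache = sum(host_b_shared.route(f) != original[f] for f in flows)

    return {"shiv_only": shiv_only, "backend_only": backend_only,
            "both": both, "shared_cache": shared_cache}


if __name__ == "__main__":
    print(simulate())
```

In this model, only the combined change produces misroutings, and letting host B read the same cache as host A removes them. The “Server ID” hint approach is not modeled here, since the source does not describe how the hint is encoded in packets.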