End-2-End Resource Accounting Leveraging Distributed Tracing
Transitive Resource Accounting (TRA) is a system that builds on top of Facebook’s distributed traces platform, Canopy, with the goal of capturing end-2-end request cost metrics and attributing them back to the originating caller. This data is pivotal in attributing capacity usage and understanding the efficiency of our infrastructure as a whole. Let’s say that a product is going to grow by 20% in the coming year. By how much do its backend service dependencies need to grow to support this added demand? To determine this we must understand what percentage of demand to the dependent services is driven by the product. Attributing this demand becomes increasingly more difficult as a distributed system grows and becomes ever more complex. Service owners can collect service specific data about who is calling them and who they are calling but this is a very narrow view that provides little insight about capacity & efficiency for the system as a whole. With the power of distributed tracing TRA measures the cost of the request from start to finish through every hop in the request path. We can then attribute the source of demand at any depth in the system.