By 2020, Instagram had accumulated a decade of incremental product development. While Instagram had grown significantly in functionality, users, and revenue, it had also grown significantly in other metrics: code base size and build times. I joined Instagram in 2017, when the app took 20 seconds to build in Xcode. By 2020, this had increased to two minutes—six times slower. The slow build times were a repeated complaint from our product engineers.
It’s extremely hard to maintain a ‘flow state’ … the times are so long that your phone screen will go to sleep 3-4 times while you’re waiting, or you’ll get distracted by messenger or a diff review, or like anything else.
Instagram is a highly visual and interactive app, and we care deeply about “craft”: whether the app feels well built and is nice to use. Achieving craft in an iOS app requires many small tweaks such as moving an element one pixel to the left or right, fine-tuning an animation, or making interactive gestures feel just right. In code, these changes can be made very quickly; when iterating on a single file, they can be done in seconds. Adding multiple minutes of waiting time before the changes can be seen on screen is devastating for productivity.
Developer productivity is difficult to measure. Build times happen to be one of the more measurable contributors, but we shouldn’t focus only on the most easily measurable work. We interviewed some of the more senior product engineers across Instagram to verify that this was an important issue that we could work on and that we weren’t missing any larger problems that were more qualitative or that were not quantified yet. A common response:
Pretty much every product engineer I’ve talked to has expressed frustration with build times. Incremental builds especially take a long time and make feature work pretty slow right now.
After interviewing engineers, we were confident that build times were a critical issue affecting iOS developer productivity at Instagram. Although our code base was now much larger, I wanted to return us to the 2017-era numbers: 20 seconds.
Why were builds slow?
To improve build speeds, we needed to know what was slowing them down. At a high level, an iOS app build progresses like this:
- Compilation: Source code files are translated into executable code.
- Linking: Combines those separate compiled files into a single executable.
- Bundling: Combines the executable code with resources, including images, fonts, and language translations, into a complete iOS app.
Our codebase contains millions of source code files, all of which need to be compiled to executable code. However, compilation wasn’t the problem. Our build system, Buck, is very good at incremental builds, which happen when a developer is making a small change and they have already built the app locally. Buck knows that it needs to redo compilation only for files that may have changed. Buck also has a remote cache. While our codebase contains millions of files, product engineers only modify a few of them in each change. Even on an initial build, Buck is able to download pre-built artifacts from the cache.
In addition to Buck’s optimizations, compilation itself is not that slow. In 2020, most of our code and ongoing development was still in Objective-C, with only a limited amount of Swift code. While Objective-C has many disadvantages compared to Swift, compilation performance is not one of them. Objective-C compiles very quickly, and its use of header files for importing code makes it extremely parallelizable.
We found that our biggest problems were linking and bundling. A build that modified only one source-code file and compiled that file in just one second would still take two minutes, because linking and bundling the app were so slow.
Linking and bundling did not work like compilation. Linking, in particular, was the biggest problem. Unlike compilation, it was not isolated to a single source-code file: We were combining every compiled file into a single executable. We weren’t able to parallelize this at all, and we weren’t taking advantage of Buck’s support for incremental builds; any one file change required the whole thing to be redone from scratch.
The same issue was the cause of slow bundling. Buck is smart, and it can incrementally update only the files in the bundle that have changed. When the gigantic Instagram executable file changed on every build, however, this did not help much.
Reducing Incremental Link Time
The simplest way for us to make linking faster was to link less code. We couldn’t just delete half of the app, but we could break the app up into separate link steps. While this wouldn’t reduce the overall amount of linking, particularly in an initial build, it could make linking more incremental in practice. In iOS, we’re able to use frameworks to subdivide linking.
In smaller apps, engineers create and organize frameworks manually. We had too much code and too many changes for this to be feasible. Frameworks can’t have circular dependencies, so if B.framework depends on A.framework, code in A.framework can’t directly use code in B.framework. Linker errors can also be quite confusing for engineers who aren’t familiar with them. We’d have to spend a lot of time supporting and debugging this, slowing down both ourselves and the product engineers who run into problems.
Instead, we decided that we needed to automatically assign code to frameworks, rather than manually curating them. Manual curation would place the burden of keeping the assignments updated and valid on engineers. One approach to automatic assignment would be an algorithm that analyzes the dependency graph and finds the optimal subdivision of code. However, we didn’t use that approach:
- To update for local changes, the algorithm would need to be run during builds. Buck didn’t have that capability, and even if it did, the algorithm would need to be very performant. The overall goal is to make builds faster, and adding graph analysis to the build would work against that.
- If not run automatically during local builds, we’d have the same issues as manually curated assignments. Local changes could make the framework assignment invalid, and engineers would need to rerun the analysis—and know that they needed to do that.
- While an algorithm may be deterministic, it would not be easily understandable or predictable. Seemingly small changes to product dependencies could drastically reshape framework assignments.
One of our core values at Instagram is simplicity. We decided to use an assignment strategy that could be easily understood: file structure. Meta uses a monorepo for all mobile code called fbsource. The basic structure looks like this:
The xplat directory contains cross-platform code shared with our Android apps. fbobjc contains shared cross-iOS-app code in Libraries, and Instagram-specific code in its own directory. If xplat code depends on fbobjc code, it’s not very cross-platform! The same applies to Libraries code depending on Instagram-specific code: Libraries is shared across all apps at Meta.
We decided to use the file prefix in the repository for assignment. This is simple for product engineers to remember, and Buck supported it natively. Dividing the codebase by file prefix allowed us to significantly reduce linking time for the Instagram binary.
While the total link time was still the same, there was a big improvement for incremental builds. If no included source files are changed, we don’t need to redo linking for a framework. Most Instagram product engineers make changes in the code that goes into the Instagram binary: our main product features, including Feed, Reels, Stories, and Direct.
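As a rough illustration of prefix-based assignment, the following Python sketch maps a source file to a link target by its longest matching directory prefix, then reports which targets contain changed files. It is not Buck configuration. The prefix table, the example paths (including the `InstagramApp` directory name), and the function names are assumptions made for illustration; only the target names xplat, fbobjc, and Instagram come from the text.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Hypothetical prefix table: directory prefix -> link target.
# The real repository layout and Buck rules are not shown in this article.
PREFIX_TO_TARGET = {
    "xplat": "xplat",
    "fbobjc/Libraries": "fbobjc",
}
DEFAULT_TARGET = "Instagram"  # everything else stays in the main binary


def target_for(path: str) -> str:
    """Return the link target for a source file, preferring the longest prefix."""
    parts = PurePosixPath(path).parts
    # Walk from the longest directory prefix down to the top-level directory.
    for length in range(len(parts) - 1, 0, -1):
        prefix = "/".join(parts[:length])
        if prefix in PREFIX_TO_TARGET:
            return PREFIX_TO_TARGET[prefix]
    return DEFAULT_TARGET


def touched_targets(changed_files: list[str]) -> set[str]:
    """Targets that contain at least one changed source file."""
    return {target_for(path) for path in changed_files}


if __name__ == "__main__":
    # A product-only change touches neither shared framework.
    print(touched_targets(["fbobjc/InstagramApp/Feed/FeedCell.m"]))
```

Because the rule depends only on the file path, an engineer can predict a file's target without running any analysis, which is the property the graph-based approach lacked.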
Although linking performance was significantly improved, we were still far from reaching 20-second build times. Linking alone still exceeded our desired build duration, and it’s just one part of the build. Overall build times remained at around a minute. We still needed to get much faster.
Splitting & Parallelizing Product Code
To get faster, we would need to improve linking for our product code. So far, we’d only been able to slice off generic infrastructure that our product engineers don’t often touch. This created a serial chain of frameworks: xplat, then fbobjc, then Instagram.
We decided that splitting our product code into frameworks by feature made the most sense: Feed.framework, Stories.framework, and so on. However, there’s not an obvious dependency order to use between relatively independent features. So we decided that rather than setting up a serial order for features, they should all be linked in parallel. This would provide the best performance: We’d be able to link multiple features at the same time, and the performance would be fair to all teams. If we linked features serially, teams whose code came later in the order would have faster builds.
Our dependency graph was not amenable to this, however.
Different features, with their associated, color-coded code shown above, had many bidirectional dependencies. These can be found in the Instagram product: Feed posts can be shared to Direct, and Direct can render Feed posts. Direct depends on Feed; Feed depends on Direct.
Since parallelized frameworks can’t have any dependencies on each other, we needed to refactor our code. We decided to use an internal build-time, dependency-injection framework, which allowed features to call each other without direct imports, and with very little runtime overhead.
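The idea of calling another feature without importing it can be sketched as follows. This is a simplified illustration, not our internal framework: it uses a runtime registry in Python where the real system resolves dependencies at build time, and every name in it (`PostSharing`, `register`, `resolve`, `DirectPostSharing`) is hypothetical.

```python
from typing import Callable, Dict, Protocol


class PostSharing(Protocol):
    """Interface declared in shared code; neither feature owns it."""

    def share_post(self, post_id: str, recipient: str) -> str: ...


# Registry in a shared layer that every feature is allowed to depend on.
_providers: Dict[type, Callable[[], object]] = {}


def register(interface: type, factory: Callable[[], object]) -> None:
    """Associate an interface with a factory for its implementation."""
    _providers[interface] = factory


def resolve(interface: type):
    """Create the implementation registered for an interface."""
    return _providers[interface]()


# --- Direct feature: implements the interface; Feed never imports this ---
class DirectPostSharing:
    def share_post(self, post_id: str, recipient: str) -> str:
        return f"sent post {post_id} to {recipient} via Direct"


# --- Feed feature: depends only on the shared interface ---
def feed_share_button_tapped(post_id: str, recipient: str) -> str:
    sharing: PostSharing = resolve(PostSharing)
    return sharing.share_post(post_id, recipient)


# --- Wiring step (done at runtime here; at build time in the real system) ---
register(PostSharing, DirectPostSharing)

if __name__ == "__main__":
    print(feed_share_button_tapped("42", "alice"))
```

In this arrangement, Feed and Direct both depend on the shared interface layer but not on each other, so they could be linked as independent frameworks.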
Reaching complete separation of features wasn’t something that we could accomplish in a short amount of time, so we made these changes incrementally. We chose a feature, then refactored its code to avoid direct calls to other features. That code would then link as a framework, while the remainder of the product code remained in the Instagram binary.
When we added a second feature, we’d add it in parallel.
We started with some internal-only developer features to validate this, and it worked: We could link in parallel for the first time. Next, we’d need to prove it with a “real” product feature. At this point, we needed to involve product engineers, with the motivation of faster build speeds.
But because we had created a collective-action problem, we weren’t able to convince teams to invest time in refactoring their code. With our framework linking order, teams that refactored their code would not get faster build speeds. Their frameworks came before the binary, so every other team would skip their framework and get faster builds. Unfortunately, they couldn’t skip linking their own framework—that’s where their changes are! Their build speeds would be the same as before.
If every team did the work at the same time, everyone would get faster build speeds. In a large product organization, however, that’s not realistic: Some teams won’t want to participate, and we wouldn’t be able to support hundreds of engineers doing this work and asking questions at the same time. While our technical approach was sound—it would have worked, if all of the work was done—it didn’t have strong organizational incentives. We needed to switch our direction.
Incentivizing Product Engineers
Product engineers wanted faster builds, and we wanted them to refactor their code. We flipped the direction of our linking strategy: Instead of linking refactored feature code before the remainder of the product code, we’d link it after. We created a new framework to hold this code, and we turned the main Instagram binary into a tiny shell that linked almost instantly.
This arrangement would allow product teams that refactored their feature to skip linking the rest of the product code in incremental builds, and it removed the dependency on other teams’ refactoring for build-speed improvements to be realized. This inverted the dependency-breaking work: Rather than breaking dependencies from a team’s feature code, they would need to break dependencies on their code. This wasn’t a problem in practice; the work was essentially the same, but the changes were made in different files.
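To make the effect of flipping the link order concrete, here is a small Python model. It assumes that changing a target forces a relink of that target and of every target that transitively depends on it, which matches the behavior described above but is a simplification of what the build system actually does. The target names `ProductCode` and `ShellBinary` and the cost of 60 units are made up for illustration, and the shared infrastructure frameworks are left out.

```python
from typing import Dict, List, Set


def targets_to_relink(deps: Dict[str, List[str]], changed: str) -> Set[str]:
    """The changed target plus everything that transitively depends on it."""
    relink = {changed}
    grew = True
    while grew:
        grew = False
        for target, target_deps in deps.items():
            if target not in relink and relink.intersection(target_deps):
                relink.add(target)
                grew = True
    return relink


def relink_cost(deps: Dict[str, List[str]], cost: Dict[str, int], changed: str) -> int:
    """Total link work, in illustrative units, for a change to one target."""
    return sum(cost[t] for t in targets_to_relink(deps, changed))


# Original order: the Direct framework links before the large binary,
# so the binary depends on it.
BEFORE_DEPS = {"Direct": [], "InstagramBinary": ["Direct"]}
BEFORE_COST = {"Direct": 1, "InstagramBinary": 60}

# Flipped order: remaining product code is a framework, Direct links after it,
# and the binary is a tiny shell.
AFTER_DEPS = {
    "ProductCode": [],
    "Direct": ["ProductCode"],
    "ShellBinary": ["ProductCode", "Direct"],
}
AFTER_COST = {"ProductCode": 60, "Direct": 1, "ShellBinary": 1}

if __name__ == "__main__":
    print(relink_cost(BEFORE_DEPS, BEFORE_COST, "Direct"))
    print(relink_cost(AFTER_DEPS, AFTER_COST, "Direct"))
```

Under these made-up costs, a Direct-only change requires 61 units of link work in the original order and 2 units after the flip, which is why the refactoring team itself benefits in the second layout.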
We held a week-long sprint with the Instagram Direct team to refactor their code. At the end of this, we were able to create a framework for Direct, linked after the remainder of our product code. It worked. The Direct framework linked in one second, and incremental builds affecting only this framework finished in just 19 seconds. We had reached our original 20-second goal—for the Direct team.
The Direct example, with real build-speed data, helped us to convince other product teams at Instagram to refactor their product code into a dedicated framework for faster build performance. We now have 31 separate product frameworks, all set up in parallel.
We named the builds that only link product frameworks Fast Link builds, and other builds Slow Link. We track both the relative speed improvement and adoption rate of Fast Link builds. Fast Link builds are 40% faster than Slow Link builds, and they are currently 35% of our local builds.
We’re continuing to improve our Fast Link build adoption rate at Instagram. The data and feedback from engineers on teams using Fast Link builds tell a compelling story to teams whose code does not yet have a dedicated framework. While we’ll never reach 100% adoption, because changes to our shared cross-product infrastructure libraries will always create Slow Link builds, Fast Link builds have significantly improved Instagram iOS developer experience both quantitatively and subjectively.