Mobile Configuration at Meta: The Key to Mobile Agile Development at Scale

Michael Leighton

TOPIC: Data, Systems and Networking

@SCALE SERIES: Systems @Scale

TYPE: ARTICLE

YEAR: 2023

Additional authors: Amit Adhikari, Tong Bao, Diedi Hu, Matt Guo, Zhao Wang, and Arjun Bhasin

Intro to MobileConfig

MobileConfig is a cross-platform configuration-management system that developers use to manage configuration values and distribute them to mobile clients. MobileConfig has been in production since 2015 and provides configuration values to some of the world’s largest and most widely used applications. Here at Meta, MobileConfig has become the key to agile mobile development. In this blog post, we will explore the key design features and optimizations of MobileConfig and discuss how it supports agile development practices.

What is MobileConfig?

A config in MobileConfig is defined as a set of strongly typed parameters that can be used to control various aspects of an application’s behavior. MobileConfig provides a cross-platform, strongly typed API to the various supported applications and services, allowing developers to read config parameters easily. Each parameter can be a static value (such as true or false) or can be tied to an internal Meta experimentation or configuration tool that will provide the value during a sync.

Figure 1: Example of MobileConfig tied to an A/B test and Gatekeeper backend

The MobileConfig client-server model works by pulling fresh values from the server on application start and then periodically throughout a session. This pull model keeps the client supplied with recent configuration values. When the MobileConfig client requests new values, the server must generate them based on the client context and the backend tools assigned to each parameter. This process, known as evaluation, is what lets developers use a single client API across several configuration, feature-flagging, and experimentation platforms within Meta: a feature-flagging platform (Gatekeeper, or GK), an experimentation platform (Quick Experiment, or QE), and a utility for providing static values. Each of these uses dynamic user and request context so that configuration values are tailored to individual end users. Developers can adjust values through these tools at any time, so parameter values remain dynamic, which makes experimentation quick and straightforward and the deployment of new performance features efficient.
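As a rough illustration of evaluation, the sketch below resolves each parameter through whichever backend it is currently tied to. Everything here is a hypothetical stand-in: the `ClientContext` fields, the `evaluate` function, and the gate and experiment logic are assumptions made for illustration, not the actual behavior of Gatekeeper or Quick Experiment.

```python
import hashlib
from dataclasses import dataclass
from typing import Any, Callable, Dict


# Hypothetical request context; the real context fields are not described in this post.
@dataclass(frozen=True)
class ClientContext:
    user_id: int
    app_version: str
    is_employee: bool


Backend = Callable[[ClientContext], Any]


def static_backend(value: Any) -> Backend:
    """Backend that returns the same value for every client."""
    return lambda ctx: value


def gate_backend(rule: Callable[[ClientContext], bool]) -> Backend:
    """Stand-in for a feature gate: a boolean rule over the client context."""
    return lambda ctx: rule(ctx)


def experiment_backend(name: str, control: Any, test: Any) -> Backend:
    """Stand-in for an A/B test: a deterministic split keyed on the user ID."""
    def pick(ctx: ClientContext) -> Any:
        digest = hashlib.sha256(f"{name}:{ctx.user_id}".encode()).digest()
        return test if digest[0] % 2 == 0 else control
    return pick


def evaluate(bindings: Dict[str, Backend], ctx: ClientContext) -> Dict[str, Any]:
    """Evaluate every parameter against the backend it is currently tied to."""
    return {param: backend(ctx) for param, backend in bindings.items()}


bindings = {
    "use_new_nav_button": gate_backend(lambda ctx: ctx.is_employee),
    "button_color": static_backend("blue"),
}
employee = ClientContext(user_id=1, app_version="1.0", is_employee=True)
public = ClientContext(user_id=2, app_version="1.0", is_employee=False)
print(evaluate(bindings, employee))
print(evaluate(bindings, public))
```

In this sketch, moving a parameter from a gate to an experiment only replaces one entry in `bindings` on the server side; the client keeps asking for the same parameter name.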

Figure 2: MobileConfig client-side cache

The MobileConfig client always reads from a cache that is updated on app start and periodically thereafter. While the user is interacting with the app, values are locked per session to ensure consistency and prevent unexpected changes. A flat-buffer file stores the client cache. It contains an array for each supported primitive type, including booleans, integers, doubles, and longs. At build time, MobileConfig generates an integer value, known as a parameter specifier, for each config/parameter used in the application. The parameter specifier has several fields that tell the client API the parameter’s type and its index into the flat-buffer array containing its value. Combining the flat-buffer cache with the code-generated specifiers makes reading configuration values very fast and efficient.
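To make the specifier idea concrete, here is a minimal sketch that packs a type tag and an array index into one integer and uses it to read from per-type arrays. The bit layout (16 index bits with the type tag above them), the `ParamType` tags, and the plain Python lists standing in for the flat-buffer arrays are all assumptions; the post does not describe the real layout.

```python
from enum import IntEnum
from typing import Dict, List


class ParamType(IntEnum):
    # Hypothetical type tags, one per typed array in the cache.
    BOOL = 0
    INT = 1
    DOUBLE = 2
    LONG = 3


TYPE_SHIFT = 16                      # assumed: low 16 bits hold the array index
INDEX_MASK = (1 << TYPE_SHIFT) - 1


def make_specifier(ptype: ParamType, index: int) -> int:
    """Build-time step: pack the type tag and array index into one integer."""
    if not 0 <= index <= INDEX_MASK:
        raise ValueError("index does not fit in the specifier")
    return (int(ptype) << TYPE_SHIFT) | index


def read(cache: Dict[ParamType, List[object]], specifier: int) -> object:
    """Runtime step: decode the specifier and index straight into the typed array."""
    ptype = ParamType(specifier >> TYPE_SHIFT)
    return cache[ptype][specifier & INDEX_MASK]


# Plain lists stand in for the typed arrays of the flat-buffer cache.
cache = {
    ParamType.BOOL: [True, False],
    ParamType.INT: [42],
    ParamType.DOUBLE: [0.5],
    ParamType.LONG: [2**40],
}

USE_NEW_NAV_BUTTON = make_specifier(ParamType.BOOL, 0)
MAX_RETRIES = make_specifier(ParamType.INT, 0)

print(read(cache, USE_NEW_NAV_BUTTON))
print(read(cache, MAX_RETRIES))
```

A read is two bit operations and an array lookup, with no string keys or hashing involved, which is the property the specifier design is after.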

Why MobileConfig

When Meta introduced mobile development, existing software life cycle and configuration products for feature flagging, running A/B tests, and serving static configuration values were extended to the mobile space. Each of these tools came with its own client-side SDK for each platform that Meta supported. While this was a pragmatic approach, it had several drawbacks, including the need for mobile developers to learn multiple client SDKs and the many engineering resources required for support. Additionally, mobile developers had to change their code when swapping between these different tools, because each had a custom client SDK. At scale, this approach became unwieldy: as more apps were developed and more backend products were created to satisfy new use cases, the effort needed to maintain the many client-side SDKs grew rapidly.

MobileConfig was developed as the solution to these problems. MobileConfig is a cross-platform SDK and backend service that developers use to author and distribute configuration values to the many different mobile apps and services that Meta develops. The MobileConfig SDK supports a diverse suite of platforms and programming languages, from Objective-C and Kotlin to C++. The MobileConfig client SDK works directly with the MobileConfig backend, providing a single level of indirection and allowing developers to take advantage of all existing products without needing specific client libraries. Developers can easily switch from feature flagging using Gatekeeper (GK) to running an A/B test with QuickExperiment (QE) without updating the code, significantly increasing efficiency while reducing complexity. With MobileConfig, a relatively small team can support many of Meta’s mobile apps and services while also working on SDK optimization, client-server protocol, and backend evaluation to improve apps and services.

The Journey

MobileConfig has come a long way since its initial rollout in 2015, replacing the legacy mobile tools supporting feature flagging and A/B testing. Today, it serves as the primary configuration provider for many of Meta’s mobile applications, reaching billions of users and serving thousands of configs daily. The adoption of MobileConfig was a journey, not a race, with iOS being the first fully supported platform by 2016. Android followed in 2017, while Oculus and Instagram were fully supported by 2021. Eight years after MobileConfig’s creation, it has further expanded to support several different platforms, becoming an essential tool for mobile developers across the company.

Figure 3: Timeline of the different stages of MobileConfig adoption

Mobile Agile Development

Mobile-software development at scale presents a unique set of challenges, particularly when implementing agile methodologies. One of the main difficulties is the sheer volume of code changes that must be managed, with our largest application receiving over 300 changes a week due to contributions by thousands of developers each year. This scale makes it challenging to ensure that all changes are adequately tested and integrated, increasing the risk of errors or conflicts. Additionally, for previously released apps, on most platforms Meta has no control over when app code updates are released. If a change introduces a bug that regresses the stability of the application, it can be days or weeks before a new version is deployed and the regression is mitigated.

if (getConfigValue(use_new_nav_button)) {
    // Code to add new navigation button
    btn.color = getConfigValue(button_color)
} else {
    // Use previous code
}

Figure 4: Code using MobileConfig to gate a feature

MobileConfig gives developers complete control over their feature release cycle, independent of the application release cycle. Developers can start by putting their client-side code changes behind a MobileConfig parameter and releasing it with the parameter statically set to false. From there, they can use internal tools to decide how to release the feature. For example, they could hook the parameter to Gatekeeper and enable the feature only for themselves or their team. Once enough testing has been completed to allow confidence in the feature, they could start an A/B test for a small set of users. When satisfied with the results, developers can tie the parameter back to Gatekeeper and begin an automated rollout to the public. All of this can be done without changing a single line of client-side code, making the process seamless and efficient.

Figure 5: Feature-release life cycle using MobileConfig

Feature Flagging and Experimentation

Gatekeeper is the feature-flagging and gating platform in Meta that provides dynamic access control to certain features or functionalities of services or products. Meta developers utilize Gatekeeper to define the target apps or versions for their features or products.

Gatekeeper is an evaluation framework built upon Configurator (Meta’s configuration system), and it can be used with various services. It empowers developers to incorporate customized rules into the framework by extending specific APIs and to work with diverse datasets, including mobile client details such as app type and version.

Gatekeeper provides an added functionality for running A/B tests to conduct experiments on different groups of users, providing data-driven decision making for our products. In a typical Gatekeeper A/B test, a defined percentage of users are split into control and test groups based on the rules specified by developers.

Gatekeeper APIs operate on the server side, and the evaluation results are used in mobile apps through MobileConfig abstractions. The A/B test is triggered only when MobileConfig signals that the server-side evaluation result from a specific Gatekeeper has been used in the mobile app.

Mobile Config Canary

Running canaries for configuration changes is an essential part of the software development process at Meta, as it allows us to catch issues before they hit production. Unlike traditional canaries, however, mobile canaries at scale are more complex because they need to account for the specific platforms on which they are executed.

Typical server-side canaries generally run on a small set of hardware platforms that are highly optimized for a specific task. Meta’s mobile suite of applications, in contrast, runs on a large and diverse set of hardware, everything from 10-year-old Android devices to the latest iOS devices. Targeting only a small set of devices could easily miss entire hardware types or configurations.

Meta’s apps are large and feature-rich, and a configuration value located deep in a submenu will prompt fewer interactions than the main application screen. Also, MobileConfig values take longer to propagate due to our pull-versus-push distribution model. Short-running canaries do not give users enough time to use less prominent features, and issues could easily be missed.

To address these issues, MobileConfig canary uses a multi-stage (pre-land and post-land) approach. The pre-land canary is run on a smaller set of users and devices with shorter canary times. Developers must wait for this canary to complete before their change takes effect. The shorter time allows developers to land configuration changes quickly and does not affect developer efficiency. This canary is designed to check for the most critical issues, such as application crashes.

The MobileConfig post-land canary is run on a much larger group of devices (0.05% of users), takes longer (four hours or more), and checks a large group of company health metrics. If any issues are detected post-land, developers are notified and can take steps to address the problem. The multi-stage approach has proven successful and prevented more than 40 SEVs in 2023 alone, demonstrating the importance of running canaries for mobile apps.

Key MobileConfig Features

Emergency Push

Emergency push is a feature in MobileConfig that quickly mitigates client-configuration issues. It allows for faster delivery of configuration fixes to devices. It also supports a post-delivery action, such as invalidating the cache or restarting the app, to ensure that all apps can apply the new config value. It’s an important feature because, under the standard pull model, it may take up to 24 hours for devices to pick up a change after it has been updated on the backend.

To implement an Emergency Push (EP), we first store an emergency version number (EVN) for every app/version in the server. When a developer initiates an EP, we send the EVN, selected config, and action to the affected user’s device. The client will then check the EVN; if it’s newer than its previous EVN, it will update and perform a config fetch. Finally, the client will update the cache or restart the application.
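The client-side half of this flow can be sketched as follows. The `EmergencyPush` message shape, the action names, and the injected `fetch` callable are hypothetical; the sketch only shows the EVN comparison, the targeted fetch, and the post-delivery action described above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class EmergencyPush:
    evn: int            # emergency version number sent by the server
    configs: List[str]  # configs selected by the developer
    action: str         # assumed values: "invalidate_cache" or "restart_app"


@dataclass
class Client:
    fetch: Callable[[List[str]], Dict[str, object]]  # stand-in for a config fetch
    evn: int = 0
    cache: Dict[str, object] = field(default_factory=dict)
    restart_requested: bool = False

    def on_emergency_push(self, push: EmergencyPush) -> bool:
        """Apply the push only if its EVN is newer than the one already seen."""
        if push.evn <= self.evn:
            return False  # stale or duplicate push, nothing to do
        self.evn = push.evn
        fresh = self.fetch(push.configs)
        for name in push.configs:
            self.cache.pop(name, None)   # drop the affected entries
        self.cache.update(fresh)         # then store the freshly fetched values
        if push.action == "restart_app":
            self.restart_requested = True
        return True


# Hypothetical server state for illustration.
server = {"button_color": "red"}
client = Client(
    fetch=lambda names: {n: server[n] for n in names},
    cache={"button_color": "blue"},
)
first = client.on_emergency_push(EmergencyPush(2, ["button_color"], "invalidate_cache"))
repeat = client.on_emergency_push(EmergencyPush(2, ["button_color"], "invalidate_cache"))
print(first, repeat, client.cache)
```

Comparing EVNs makes the push idempotent, so the same message can arrive over more than one channel without triggering repeated fetches.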

MobileConfig allows the application to select between several push channels and protocols to perform the Emergency Push. Some applications will use a custom version of MQTT (the Message Queuing Telemetry Transport protocol), while others choose to send the EVN in the header of each HTTP request. The protocol selected is specific to the properties of the application and its user base, but it will result in quicker config updates compared to the standard MobileConfig pull model. A recent internal experiment showed that without Emergency Push, an updated config value would reach 26% of all users in four hours; with Emergency Push the change was propagated to 40% of users in the same amount of time.

Scale and Optimization

Having a highly optimized client-server protocol and client SDK is important for the success of MobileConfig. With support for many of Meta’s different applications, some of which have more than one billion users, our system must handle massive scale. Our most-used apps have more than 1000 configuration parameters, and we see more than 1000 remote configuration changes daily.

Ensuring that our system can handle this volume is a challenge. Over the years, we have implemented various optimizations in our client SDK and network protocol. These optimizations allow us to efficiently manage and distribute configuration values to a large number of clients while also minimizing the impact on user experience. The following sections will delve into the specific optimizations we use to make MobileConfig work at scale. The network optimizations alone have been shown to reduce our network usage by more than 96%.

Partial Fetch

Partial fetch is a feature in MobileConfig that helps to address the trade-off between the freshest configuration values on start and the app startup time. Most of the time, MobileConfig uses a local cache to serve config values, which is updated asynchronously on app start or based on some time interval. In some cases, however, such as a fresh app install or app update, a cache may not be available. In these situations, MobileConfig can either continue to fetch asynchronously and serve default values for configs accessed during this time or block the start of the application and fetch the latest set of configs.

Providing default values is not ideal, as it may present users with outdated or unwanted experiences. On the other hand, blocking to wait for a fetch can also hurt user experience by slowing down the boot time of the application. To address this issue, MobileConfig Partial Fetch takes advantage of the fact that most configuration values are not used on application start. The small set of configs critical to the application’s operation is identified, and MobileConfig blocks the application only while fetching these essential configuration values, using an async fetch for all others. This way, we can ensure that the most critical config values are retrieved quickly and have the most up-to-date value while minimizing the impact on user experience.
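A minimal sketch of that split, assuming a hypothetical `partial_fetch` helper and a fake network call: the critical parameters are fetched synchronously, and everything else is handed to a background worker.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List, Set

Fetch = Callable[[List[str]], Dict[str, object]]


def partial_fetch(all_params: Set[str], critical: Set[str], fetch: Fetch,
                  cache: Dict[str, object], pool: ThreadPoolExecutor) -> Future:
    """Block on the critical parameters, then fetch the rest in the background."""
    cache.update(fetch(sorted(all_params & critical)))   # startup waits for this
    remaining = sorted(all_params - critical)
    return pool.submit(lambda: cache.update(fetch(remaining)))


# Hypothetical server state and a fake network call, for illustration only.
server = {"login_flow": "v2", "button_color": "blue", "use_new_nav_button": True}


def fake_fetch(names: List[str]) -> Dict[str, object]:
    return {name: server[name] for name in names}


cache: Dict[str, object] = {}
with ThreadPoolExecutor(max_workers=1) as pool:
    pending = partial_fetch(set(server), {"login_flow"}, fake_fetch, cache, pool)
    critical_ready = "login_flow" in cache  # present as soon as the call returns
    pending.result()                        # a real app would not wait here

print(critical_ready, sorted(cache))
```

The cost of blocking scales with the size of the critical set rather than with the full parameter list, which is why keeping that set small matters.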

Schema Hash Optimization

MobileConfig provides an optimized path for requesting config values. Given that MobileConfig is used across a range of apps and platforms, the server does not know in advance the exact subset of configs and parameters that a client wants to fetch. But when an app is compiled, each build has a fixed set of configs and parameters in use. As a result, the server only needs to know which build of the app is making the request. At compile time, the library generates a SHA-256 hash of the list of configs and parameters and embeds it within the app binary. This list of configs, together with its schema hash, is also stored in a key-value store on the server. Then, at runtime, the client only needs to send a request with the config schema hash, and the server can consult the key-value store to know which subset of configs and params to evaluate for that request.
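The schema-hash handshake can be sketched as below. The canonical JSON serialization, the in-memory dictionary standing in for the key-value store, and the `handle_fetch` name are assumptions; only the use of SHA-256 over the list of configs and parameters comes from the description above.

```python
import hashlib
import json
from typing import Dict, List

Schema = Dict[str, List[str]]  # config name -> parameter names


def schema_hash(schema: Schema) -> str:
    """Hash a canonical form of the schema so that ordering does not matter."""
    canonical = json.dumps(
        {config: sorted(params) for config, params in schema.items()},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Build time: compute the hash for this build and register the schema under it.
build_schema: Schema = {
    "ButtonCfg": ["color", "use_new_nav_button"],
    "NetworkCfg": ["max_retries"],
}
SCHEMA_HASH = schema_hash(build_schema)        # embedded in the app binary
schema_store: Dict[str, Schema] = {SCHEMA_HASH: build_schema}  # server-side store


def handle_fetch(request_hash: str) -> Schema:
    """Server side: recover the parameter list from the hash alone."""
    return schema_store[request_hash]


# Runtime: the client sends only the 64-character hash, not the full list.
print(len(SCHEMA_HASH), handle_fetch(SCHEMA_HASH))
```

The request size stays constant no matter how many parameters a build uses, because the list itself never travels over the network after build time.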

Value-Hash Optimization

MobileConfig supports more than 1000 config parameters in our major apps. To guarantee config-value freshness, the mobile clients make synchronization requests to the server every few hours. While more than 1000 changes are made daily, the majority of parameters stay unchanged between two consecutive synchronization requests. To minimize the response payload, we should avoid sending the unchanged config values back to the client. To do that, the synchronization request must carry enough information about the client-side values for the server to tell whether a value has been updated since the last sync.

Including the actual parameter values in the request, however, would greatly increase the request size. So, instead of sending the parameter values, a value hash is computed for each config, and the server can tell whether any parameter has an updated value by comparing the value hash. To ensure consistency in the value hash, the hash is calculated on the server side and sent back to the client in the synchronization response; the client then includes it in its next synchronization request.
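A small sketch of the comparison, under the assumption that the hash is a truncated SHA-256 over a canonical JSON form of each config's parameters; the `value_hash` and `sync` names and the response shape are hypothetical.

```python
import hashlib
import json
from typing import Dict

Values = Dict[str, Dict[str, object]]  # config name -> parameter values


def value_hash(params: Dict[str, object]) -> str:
    """Server-side hash of one config's values (hash choice is an assumption)."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def sync(server_values: Values, client_hashes: Dict[str, str]) -> Dict[str, dict]:
    """Return only the configs whose hash differs from what the client reported."""
    response = {}
    for config, params in server_values.items():
        current = value_hash(params)
        if client_hashes.get(config) != current:
            response[config] = {"values": dict(params), "hash": current}
    return response


server_values: Values = {
    "nav": {"use_new_nav_button": True},
    "button": {"color": "blue"},
}

first = sync(server_values, {})                       # nothing cached yet
client_hashes = {name: entry["hash"] for name, entry in first.items()}
unchanged = sync(server_values, client_hashes)        # no changes since last sync

server_values["button"]["color"] = "red"              # one remote change
changed = sync(server_values, client_hashes)

print(sorted(first), unchanged, sorted(changed))
```

Because the client only echoes back hashes the server produced, the two sides never need to agree on how to serialize values before hashing them.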

Boolean Optimization

Boolean optimization is a feature that reduces the HTTP response payload size for config fetch requests. As the table below shows, the majority of config parameters are booleans.

	Bool	Int	String	Double	Other
App1	69%	23%	4.9%	2.4%	1.3%
App2	66%	22%	4.8%	5.1%	1.5%

Figure 6: Breakdown of parameter types used in MobileConfig

To construct the response more efficiently, we allocate two bits for each boolean parameter. One bit is the null flag (null/valid) and the other is the boolean value (true/false). If the null flag bit is set, it tells the client to use the client-side default value. We concatenate all boolean bits into a compact byte array, ordered by the position of each boolean parameter in the config-fetch request. This array is then added to the response as a separate field. Once the client receives the array, it reads the value for each boolean parameter using that same ordering.
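The two-bit encoding can be sketched as follows. The exact bit positions (null flag in the low bit, value in the high bit, four parameters per byte starting from the least significant bits) are an assumption; the post only states that each boolean uses a null bit and a value bit.

```python
from typing import List, Optional


def pack_bools(values: List[Optional[bool]]) -> bytes:
    """Pack booleans at two bits each; None means 'use the client default'."""
    out = bytearray((len(values) * 2 + 7) // 8)
    for i, value in enumerate(values):
        if value is None:
            bits = 0b01            # null flag set, value bit ignored
        else:
            bits = 0b10 if value else 0b00
        out[i // 4] |= bits << ((i % 4) * 2)
    return bytes(out)


def unpack_bools(data: bytes, defaults: List[bool]) -> List[bool]:
    """Decode in the same parameter order, falling back to client defaults."""
    result = []
    for i, default in enumerate(defaults):
        bits = (data[i // 4] >> ((i % 4) * 2)) & 0b11
        result.append(default if bits & 0b01 else bool(bits & 0b10))
    return result


server_values = [True, False, None, True]      # third parameter has no server value
client_defaults = [False, False, True, False]

packed = pack_bools(server_values)
decoded = unpack_bools(packed, client_defaults)
print(len(packed), decoded)
```

Four boolean parameters fit in one byte here, and no parameter names appear in the payload because both sides rely on the shared ordering.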

Exposure Logging

One main use case of MobileConfig is running A/B tests for client-side features. To make the analysis more precise, we need to compare the people who actually see the new feature with the people who experience the old feature behavior. This requires client-side exposure logging, which fires a logging event when the config value is read on the client side and the feature is surfaced.

To tell which group the user belongs to, a logging ID is used to represent a test/control group. The logging ID is retrieved for parameters during the synchronization with the server. When the client side reads the parameter, an exposure event containing the logging ID will be sent to the server for logging and further analysis.

Since only a very small portion of users is considered for A/B testing, the majority of parameters have an empty logging ID. Additionally, multiple parameters may be controlled by the same A/B test and share the same logging ID. To reduce disk and memory usage on mobile, we don’t store the logging ID directly alongside the value and other meta info of the parameter. Instead, the logging IDs are stored in a separate array, and the parameter meta info contains a few bits that index into that array.
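The indirection can be sketched like this. The `LoggingIdTable` and `ParamMeta` types, the reserved slot 0 for the empty ID, and the in-memory `exposures` list standing in for the logging pipeline are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


class LoggingIdTable:
    """Stores each distinct logging ID once; slot 0 is reserved for 'no ID'."""

    def __init__(self) -> None:
        self._ids: List[str] = [""]
        self._slots: Dict[str, int] = {"": 0}

    def intern(self, logging_id: str) -> int:
        if logging_id not in self._slots:
            self._slots[logging_id] = len(self._ids)
            self._ids.append(logging_id)
        return self._slots[logging_id]

    def lookup(self, slot: int) -> str:
        return self._ids[slot]


@dataclass
class ParamMeta:
    value: object
    logging_slot: int  # small index instead of the full logging ID string


table = LoggingIdTable()
params = {
    "use_new_nav_button": ParamMeta(True, table.intern("exp_nav:test")),
    "button_color": ParamMeta("blue", table.intern("exp_nav:test")),
    "max_retries": ParamMeta(3, table.intern("")),  # not part of any test
}
exposures: List[str] = []  # stand-in for events sent to the server


def read_param(name: str) -> object:
    """Return the value and record an exposure if the parameter is in a test."""
    meta = params[name]
    logging_id = table.lookup(meta.logging_slot)
    if logging_id:
        exposures.append(logging_id)
    return meta.value


read_param("use_new_nav_button")
read_param("max_retries")
print(exposures)
```

Both experiment parameters point at the same slot, so the ID string is stored once, and the parameter outside any test costs only a zero index.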

Strongly Typed API

The MobileConfig library supports a strongly typed API for referencing config parameters. When a developer adds new configs to an app, the compiler transparently generates symbols for Java or C++ config classes, with parameters as fields. This allows the app developer to use a typed API such as getString(ButtonCfg.color), making it easier to work with configs in the code.

Generating thousands of classes and parameters, however, can inflate the app’s binary size and slow down app startup time. To address this issue, the compiler infrastructure replaces all the config symbols at their call sites with bit-encoded integer IDs, similar to Android resource IDs. These integer IDs encode a few fields, one of which serves as an index to efficiently locate parameter values at runtime.

Consistency Logging

Consistency logging is a feature that ensures the accuracy and reliability of configuration values in MobileConfig. It works by periodically batching all config values in the client cache into a JSON object and sending it to the MobileConfig backend. Additionally, a random set of clients is selected for sampling in any given session. When these clients call our client API, we send the parameter value back to the server. The MobileConfig backend then re-runs the evaluation and compares the client data to the result of the new evaluation. It tracks values that are the same versus values that are different and stores inconsistencies in a database.
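The backend comparison can be sketched as a simple diff between the client batch and a fresh evaluation. The `consistency_report` function, the flat JSON shape of the batch, and the returned tuple are assumptions made for illustration.

```python
import json
from typing import Dict, Tuple

_MISSING = object()  # marks parameters the re-evaluation did not produce


def consistency_report(
    client_batch_json: str, reevaluated: Dict[str, object]
) -> Tuple[float, Dict[str, Tuple[object, object]]]:
    """Compare client-reported values with re-evaluated ones.

    Returns the fraction of matching parameters and a map of
    parameter -> (client value, server value) for every mismatch.
    """
    client_values = json.loads(client_batch_json)
    mismatches = {}
    for name, client_value in client_values.items():
        server_value = reevaluated.get(name, _MISSING)
        if server_value is _MISSING or server_value != client_value:
            mismatches[name] = (
                client_value,
                None if server_value is _MISSING else server_value,
            )
    total = len(client_values)
    rate = 1.0 if total == 0 else (total - len(mismatches)) / total
    return rate, mismatches


# Hypothetical client batch and re-evaluation result.
batch = json.dumps(
    {"use_new_nav_button": True, "button_color": "blue", "max_retries": 3, "ratio": 0.5}
)
reevaluated = {"use_new_nav_button": True, "button_color": "red", "max_retries": 3, "ratio": 0.5}

rate, mismatches = consistency_report(batch, reevaluated)
print(rate, mismatches)
```

Aggregating this rate across clients and apps would give the kind of figure on which alerting and dashboards can be built.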

This allows us to calculate a consistency rate across the many different apps and services that we support. MobileConfig has an average consistency rate of 99.7%. This feature helps ensure that the configuration values used by clients are accurate and up to date, providing a reliable user experience. By adding alerting and dashboarding on top of our consistency rate, we can confirm that the service is healthy and identify potential problem areas quickly.

Conclusion

MobileConfig is a powerful tool that allows developers to control the software-release cycle and work more efficiently. A single mobile SDK supports many different apps and platforms, serving billions of daily requests from billions of users. Providing config reliably at this scale requires a large number of optimizations. Using MobileConfig, developers can save time and resources while delivering high-quality features to their users.
