EVENT AGENDA
Event times below are displayed in PT.
Performance @Scale is an invite-only conference for engineers working on the technical and organizational challenges of high-performance applications and services.
If you have ever wanted to learn best practices from the pros on how to detect performance anomalies, scale your web service, or speed up your mobile apps, then Performance @Scale is the place to be on Thursday, June 20th, 2019! If you have friends or colleagues who may also be interested in attending, feel free to forward them this invitation.
Performance @Scale will be held on Facebook’s campus in Menlo Park, California. Registration and breakfast start at 8:30 a.m. The Women in Technology panel will be held at 9:00 a.m., and talks begin at 10 a.m. Stick around after the day-long conference for Happy Hour.
Learn more about @Scale events, and follow us on Facebook for updates.
To kick off the event, Surupa takes us through a tour of how performance is done in Facebook’s apps. She describes how product teams and central performance teams work together to improve app size, startup times, crash rates, and more. Through better tools and partnerships, the teams have scaled to more than 150 metrics across multiple apps and platforms.
Surupa Biswas is an Engineering Director overseeing mobile app performance, reliability, and efficiency in Facebook’s Developer Infrastructure group. Since joining Facebook in 2013, she has overseen server and mobile infrastructure teams, previously leading the teams building Facebook’s open source web server infrastructure, HHVM and GraphQL.
In this talk, we describe our top-down methodology for uncovering inefficiencies in our production AI workloads, the tools and technologies we’ve built to support performance analysis, and the common pitfalls in optimizing accelerated code. Our tools and techniques are being used by thousands of ML engineers at Facebook on products that serve billions of users.
Kim Hazelwood is an Engineering Manager leading the AI Infra Foundation and AI Infra Research efforts at Facebook, which focus on the hardware and software platform design and efficiency for Facebook's many applied machine learning-based products and services. Prior to Facebook, Kim held positions including a tenured Associate Professor at the University of Virginia, Software Engineer at Google, and Director of Systems Research at Yahoo Labs. She received a PhD in Computer Science from Harvard University in 2004, and is the recipient of an NSF CAREER Award, the Anita Borg Early Career Award, the MIT Technology Review Top 35 Innovators under 35 Award, and the ACM SIGPLAN 10-Year Test of Time Award. She currently serves on the Board of Directors of CRA, MIT SystemsThatLearn, and EPFL EcoCloud. She has authored over 50 conference papers and one book.
Tensor Processing Units are Machine Learning accelerators developed at Google. A TPU v3 Pod offers over 100 PFLOPs of compute, leading to dramatic reductions in training time of Machine Learning models. In this talk, we will explore some of the scalability challenges, often not unique to TPUs, and techniques to address those challenges.
Naveen Kumar is a Software Engineer at Google. He currently leads Performance within Google Brain. Previously, Naveen worked on Google's second generation Tensor Processing Units. Prior to Google, Naveen focused on microprocessor research at Intel Labs. Naveen holds a PhD from University of Pittsburgh and enjoys outdoor life in the Bay Area.
The computational size, complexity and footprint of neural network training has been doubling about every 3.5 months, according to OpenAI. As well, the amount of data used for training has been increasing, for instance as researchers are able to take advantage of unsupervised training methods as in BERT. These researchers now require multiple systems for training their models (a trend similar to scientific simulations on HPC systems in the past). This talk will discuss the techniques needed for running deep learning training at scale on GPUs, and state of the art results. The discussion will also review how to deploy, scale, load balance and optimize the trained network inference (or prediction) throughput on GPUs, using tools such as TensorRT Inference Server.
Ujval has spent the last 10 years working on software and libraries for deep learning and HPC at NVIDIA. Previously, he co-founded Stream Processors, a fabless semiconductor startup building programmable processors for signal and image processing. Ujval earned his PhD in EE at Stanford and a BS at Brown University.
Performance is more than a numbers game. This talk will share how Bing leverages behavioral analytics to identify usability bottlenecks and optimize perceived performance. We will cover a wide range of performance experiments, including good ideas that failed, and summarize the lessons we learned along the way.
Sarvesh leads the performance team at Bing, Microsoft, and is passionate about solving complex data problems with rich visualizations. Sarvesh holds an M.S. in Computer Science from Columbia University, NY.
The Web as an application platform still lags behind native platforms like Android and Windows in performance and richness of integration APIs. This makes it challenging for developers to create sophisticated yet performant web apps that require a non-trivial amount of client-side JS code. The Browser Engineering team at Facebook finds bottlenecks in browser implementations, contributes code to open-source browsers, prototypes new Web technologies, and advances new API proposals through Web standards committees. This talk will cover our current and future projects for making Web apps as fast and as powerful as native apps, including the new isInputPending() API, the upcoming JS Self-Profiling API, and new ideas for eliminating JavaScript overheads.
Vladan is the tech lead for the Browser Engineering team at Facebook. His technical focus is browser technology, performance, and low-level systems. Previously, he led the Firefox performance team at Mozilla, working on browser startup, responsiveness, and performance measurement.
Even under constant load, the behavior of a system is never completely uniform: it is affected by variance, perturbations, single-threaded execution, and other time-based issues. Using profilers to analyze the performance of a system generally involves aggregating events or samples over a period of time, so identifying these small variations in the full profile becomes a needle-in-a-haystack problem. FlameScope solves this by combining a subsecond-offset heatmap, for navigating a profile and visualizing these perturbations, with flame graphs for code-path analysis.
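To illustrate the idea behind the subsecond-offset heatmap, here is a minimal sketch (not FlameScope’s actual implementation) that bins profiler sample timestamps so that each column is one second of wall-clock time and each cell counts samples at a given fractional offset within that second; the sample data is hypothetical:

```python
from collections import defaultdict

def subsecond_heatmap(timestamps, rows=50):
    """Bin sample timestamps (seconds, as floats) into a subsecond-offset
    heatmap: one column per whole second, `rows` cells per column for the
    fractional offset within that second."""
    heatmap = defaultdict(lambda: [0] * rows)
    for t in timestamps:
        second = int(t)                 # x axis: which second of the profile
        row = int((t - second) * rows)  # y axis: offset within that second
        heatmap[second][row] += 1
    return dict(heatmap)

# Hypothetical samples: steady load with a short burst around 2.10-2.14 s.
samples = [0.1, 0.5, 1.2, 1.7, 2.10, 2.11, 2.12, 2.13, 2.14, 2.9]
hm = subsecond_heatmap(samples, rows=10)
```

In an aggregate flame graph the burst at ~2.1 s would vanish into the totals; in the heatmap it shows up as a single hot cell, which a tool like FlameScope then lets you select to generate a flame graph for just that time range.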
For the past 13 years, Martin's career has revolved around technology and performance engineering, leading major initiatives at Netflix, Expedia, and other companies. Currently, as a Performance Architect at Netflix, Martin is responsible for improving the performance of the Netflix service for its 148+ million users, who watch hundreds of millions of hours of movies and TV shows every day. Martin is also a Venture Advisor at monashees+, one of the largest venture capital firms in Brazil, an angel investor and advisor to multiple startups, and an avid open source contributor.
At LinkedIn, we monitor our client side performance as experienced by our members (RUM/Real User Monitoring). In this talk, we will share our journey migrating to a new generation of RUM for native apps, challenges faced in building a generic instrumentation framework, tradeoffs made to fit in our mobile architecture, lessons learnt and best practices when designing new Tier 0 performance metrics for the company.
Ramya Pasumarti is a Staff Software Engineer at LinkedIn with the Performance Engineering team. She works on mobile and server side performance focusing on a variety of monitoring, tooling and optimization projects across the stack. She currently leads initiatives to enhance mobile performance measurement, monitoring and debugging experience for developers.
Startup time of an iOS app is an important performance metric for user experience. However, poor ordering of functions in the iOS binary can greatly increase page faults during startup and significantly hurt startup performance. An “order file” can be used to tell the linker how to better order functions in an iOS binary. To generate an order file for iOS apps, we usually use dtrace, but some apps have multiple startup scenarios that we want to optimize for with the order file. The dtrace approach does not scale well and is not easy to automate. In this talk, we describe more scalable approaches to generating order files.
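To make the multiple-scenario problem concrete, the sketch below (a hypothetical illustration, not the approach described in the talk) merges traced function symbols from several startup scenarios into a single order-file symbol list, round-robining across scenarios so symbols needed early by any scenario land near the front of the binary; the symbol names are made up:

```python
def merge_order_files(scenario_traces):
    """Merge function-call traces from multiple startup scenarios into one
    order file: each symbol appears once, interleaved across scenarios so
    no single scenario's layout dominates."""
    seen = set()
    order = []
    iters = [iter(trace) for trace in scenario_traces]
    while iters:
        still_active = []
        for it in iters:
            sym = next(it, None)
            if sym is None:
                continue  # this scenario's trace is exhausted
            still_active.append(it)
            if sym not in seen:
                seen.add(sym)
                order.append(sym)
        iters = still_active
    return order

# Hypothetical traces from two startup paths of the same app.
cold_start = ["_main", "_initLogging", "_loadFeed"]
warm_start = ["_main", "_restoreState", "_loadFeed"]
order = merge_order_files([cold_start, warm_start])
```

The resulting list (`_main`, `_initLogging`, `_restoreState`, `_loadFeed`) could then be written one symbol per line and passed to the linker as an order file; how to weight scenarios against each other is exactly the kind of policy question a scalable pipeline has to answer.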
Manman Ren is a Software Engineer at Facebook. She currently works on iOS app performance. Previously, Manman worked on Apple's compiler team and on bringing Android to support IA at Intel. Manman holds a PhD from Stanford University.