
Your ML Model Routing Just Got a Netflix-Scale Upgrade

Netflix replaced Switchboard with Lightbulb for ML model serving, streamlining routing configuration via JSON. Discover how this architectural shift sustains 1 million requests per second, and what it means for your ML operations.

Admin
May 03, 2026
3 min read

Editorial Note

Reviewed and analyzed by the ScoRpii Tech Editorial Team.

Introducing Lightbulb: A Refined Approach to ML Model Serving

If you operate a complex ML model serving system, you know the operational hurdles that come with it. Netflix's original routing component, Switchboard, relied on JavaScript for configuration, which enabled context-aware routing and A/B testing of model variants. Operating Switchboard at scale, however, introduced substantial challenges and forced a comprehensive re-evaluation of its core implementation.

What matters here is what Netflix chose to keep: critical capabilities like context-aware routing and A/B testing were retained while the underlying system was fundamentally reworked for resilience and manageability. According to the write-up by Nipun Kumar, Rajat Shah, and Peter Chng, Lightbulb is a direct response to these pressures, prioritizing operational simplicity and efficiency.

Lightbulb's Engineering: Mechanism and Scale

Lightbulb replaces JavaScript configuration with a JSON file that defines routing rules, trading executable complexity for a declarative, auditable description of routing logic. This refined architecture is engineered for substantial load, supporting up to 1 million requests per second of ML model serving traffic.
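Netflix's post does not publish Lightbulb's actual schema, so the following is only a hypothetical sketch of what a declarative rules file in this spirit could look like; the field names (match, route, weight), model names, and context keys are all our own illustrative assumptions:

```json
{
  "model": "homepage-ranker",
  "rules": [
    {
      "match": { "region": "US", "deviceType": "tv" },
      "route": [
        { "variant": "ranker-v2-canary", "weight": 5 },
        { "variant": "ranker-v1-stable", "weight": 95 }
      ]
    },
    {
      "match": {},
      "route": [
        { "variant": "ranker-v1-stable", "weight": 100 }
      ]
    }
  ]
}
```

Unlike an executable JavaScript config, a file like this can be diffed, schema-validated, and audited without running any code, which is where the "declarative and auditable" benefit comes from.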

The platform continues to support sophisticated traffic steering, including granular context-aware routing and A/B testing of model variants. Netflix's experience with Lightbulb shows these capabilities can be retained without sacrificing stability.

Key Features and Benefits

The key features of Lightbulb include:

  • Common Client Abstraction: Providing a single point of contact for all clients' model needs.
  • Context-Aware Routing: Enabling routing decisions based on a rich set of contextual features.
  • Dynamic Traffic Splitting: Supporting real-time traffic splitting for canary deployments and experimentation (see the sketch after this list).
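To make the routing mechanics concrete, here is a minimal TypeScript sketch of how a router could evaluate declarative rules like the JSON above, combining context matching with a weighted traffic split. Every type and function name is our own illustration, not Lightbulb's API:

```typescript
// Minimal sketch of declarative, context-aware routing with weighted
// traffic splitting. All names are illustrative, not Lightbulb's API.

type Context = Record<string, string>;

interface Split { variant: string; weight: number; } // weight as a percentage
interface Rule  { match: Context;  route: Split[];  }

// A rule matches when every key/value pair it declares is present in the
// request context; an empty `match` acts as a catch-all default.
function matches(rule: Rule, ctx: Context): boolean {
  return Object.entries(rule.match).every(([k, v]) => ctx[k] === v);
}

// Pick a variant by walking the cumulative weights with a random draw,
// which yields e.g. a 5%/95% canary split for the example rule below.
function pickVariant(route: Split[]): string {
  const total = route.reduce((sum, s) => sum + s.weight, 0);
  let draw = Math.random() * total;
  for (const s of route) {
    draw -= s.weight;
    if (draw <= 0) return s.variant;
  }
  return route[route.length - 1].variant; // guard against rounding drift
}

// Route a request: first matching rule wins, then split its traffic.
function routeRequest(rules: Rule[], ctx: Context): string {
  const rule = rules.find(r => matches(r, ctx));
  if (!rule) throw new Error("no routing rule matched");
  return pickVariant(rule.route);
}

// Example: a US TV request has a 5% chance of hitting the canary.
const rules: Rule[] = [
  { match: { region: "US", deviceType: "tv" },
    route: [ { variant: "ranker-v2-canary", weight: 5 },
             { variant: "ranker-v1-stable", weight: 95 } ] },
  { match: {}, route: [ { variant: "ranker-v1-stable", weight: 100 } ] },
];
console.log(routeRequest(rules, { region: "US", deviceType: "tv" }));
```

The design choice worth noting is that the router itself stays generic: all the experiment-specific logic lives in data, so changing a canary percentage is a config change rather than a code deployment.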

These features offer several benefits, including improved operational simplicity, increased efficiency, and enhanced scalability.

What This Means For Your ML Operations

For your organization, Netflix's architectural transition offers useful insight into managing ML inference at hyperscale. If you are grappling with complex, performance-sensitive routing for your own ML workloads, the Lightbulb experience suggests that moving to declarative JSON configuration can significantly reduce operational overhead compared to script-based systems.

Routing logic defined in a structured, easily parsable format is also easier to keep consistent and faster to debug, because it can be checked mechanically before it ever reaches production. And Netflix's stated figure of 1 million requests per second indicates that this simpler, declarative architecture holds up under serious load.
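As one example of what "checked mechanically" could mean, here is a hypothetical pre-deploy lint for a routing file of the assumed shape shown earlier; the file name routing.json and the checks themselves are illustrative, not anything Netflix describes:

```typescript
// Illustrative pre-deploy check for a declarative routing file: because the
// config is plain JSON, it can be linted in CI before it reaches production.
// The schema (`rules`, `match`, `route`, `weight`) is our assumed shape.
import { readFileSync } from "node:fs";

function lintRoutingConfig(path: string): string[] {
  const errors: string[] = [];
  const config = JSON.parse(readFileSync(path, "utf8"));

  // Each rule's traffic weights should account for 100% of requests.
  for (const [i, rule] of (config.rules ?? []).entries()) {
    const total = (rule.route ?? []).reduce(
      (sum: number, s: { weight: number }) => sum + s.weight, 0);
    if (total !== 100) {
      errors.push(`rule ${i}: traffic weights sum to ${total}, expected 100`);
    }
  }

  // At least one catch-all rule should exist so no request is unroutable.
  const hasDefault = (config.rules ?? []).some(
    (r: { match: object }) => Object.keys(r.match).length === 0);
  if (!hasDefault) errors.push("no catch-all rule: some requests would be unroutable");

  return errors;
}

// Usage: fail a CI step when the config is inconsistent.
const problems = lintRoutingConfig("routing.json");
if (problems.length > 0) {
  console.error(problems.join("\n"));
  process.exit(1);
}
```

This kind of static check is exactly what an executable JavaScript config makes hard: you cannot validate a script's routing behavior without running it.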

The Bottom Line for Developers

When it is time to rework your own serving stack, keep the lessons from Netflix's experience with Lightbulb in mind. Prioritizing operational simplicity and efficiency yields a more scalable and resilient system that can support your organization's growing ML needs, and declarative configuration, context-aware routing, and dynamic traffic splitting are concrete levers for getting there.

Originally reported by

Netflix Tech Blog (ML)
