USTA and IBM Live Likelihood to Win in SlamTracker at US Open, Flushing Meadows, New York; real-time win likelihood updated per point

Inside the IBM architecture that powers fan engagement at the US Open

IBM’s data, AI and hybrid cloud stack helped the USTA build a fast and scalable insight platform for tennis’ marquee event, providing fans with real-time context, predictions and narrative.

At the US Open, the most important moment on court isn’t always obvious from the score. A player can trail by two sets and still start building momentum. A casual fan opening the US Open app might see more than a dozen matches at once and need a fast answer: which match deserves my attention right now?

That is the challenge behind Live Likelihood to Win, an AI-powered prediction engine in IBM® SlamTracker®. Updated after every point, it predicts which player is more likely to win and shows how that probability changes as momentum shifts.

The United States Tennis Association (USTA) runs the annual US Open as its Grand Slam event in Flushing Meadows, New York. During the tournament, more than 14 million unique users engage across the US Open app and website. Behind that experience, there are more than 7 million data points, with each point producing more than 150 variables such as serve speed, rally length and shot placement.

The challenge was to make that live data useful in real time, at tournament scale, without turning the product into a data dashboard. By building with IBM’s AI and data products, the USTA turned match prediction into a scalable insight platform for unpredictable play, global traffic spikes and rising fan expectations.

Live scores tell fans who won the last point, game or set. They don’t always show whether a match is tightening, whether momentum has shifted or whether a routine match is becoming one to watch. Live Likelihood to Win fills that gap. In IBM SlamTracker, it appears as a point-by-point graph that gives fans a continuously updated view of the match’s direction.

That simplicity matters when fans might have as many as 20 matches to choose from at once. A power user might want deep player statistics. A casual fan might want to know whether to open a stream.
The same system must serve both behaviors.

The design principles were practical. The prediction had to reflect the current match point by point because tennis doesn’t have a fixed duration. The experience had to be fast, with only a 5- to 10-second window to ingest data, recompute probabilities and publish the update. The output also had to be trusted: responsive enough to capture decisive moments, but stable enough not to feel noisy.

That trust requirement shaped the model design. The team built internal storytelling metrics to measure whether the prediction stayed stable during steady play, reacted to real momentum shifts and identified the likely winner early without overreacting to noise.

The technical architecture behind Live Likelihood to Win uses a multilayer design that separates pre-match intelligence from live point-by-point execution. It combines historical and real-time data to generate predictions. The model is built on more than 20 years of historical data and over 7 million real-time data points.

A key engineering decision was to use the right computational methods for the right moments. The pre-match layer has more data and more time to compute, so it can use richer feature engineering and machine learning models. The point-by-point layer, by contrast, requires a faster, simulation-based approach to keep up with the incoming data points once play begins.

Starting in the blue “data sources” box on the left side of the diagram above, the pre-match layer establishes the baseline probability before a match begins. Its data comes from both structured and unstructured sources. These sources include Association of Tennis Professionals (ATP) and Women’s Tennis Association (WTA) player rankings and historical head-to-head performance data from Sportradar. They also include expert and media signals surfaced through IBM Discovery and custom-defined tennis predictors developed specifically for the tournament.
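
To make the idea of a pre-match baseline concrete, here is a minimal Python sketch. It is not the USTA and IBM model, whose features and weights are not described here. It assumes a generic Elo-style logistic curve over two hypothetical player ratings, then nudges the result with a head-to-head record. The function names, the `scale` of 400 and the `prior_weight` of 10 are all illustrative assumptions.

```python
def baseline_win_probability(rating_a: float, rating_b: float,
                             scale: float = 400.0) -> float:
    """Elo-style logistic estimate that player A beats player B.

    The ratings are hypothetical strength scores, not ATP/WTA ranking
    points or the Watson Power Index.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def blend_with_head_to_head(p_rating: float, wins_a: int, wins_b: int,
                            prior_weight: float = 10.0) -> float:
    """Adjust the rating-based estimate using the head-to-head record.

    The rating-based probability is treated as `prior_weight` imaginary
    matches, so a short head-to-head history moves the estimate only a
    little (an assumed shrinkage scheme, chosen for illustration).
    """
    total = wins_a + wins_b
    return (prior_weight * p_rating + wins_a) / (prior_weight + total)


if __name__ == "__main__":
    p = baseline_win_probability(2100, 1900)   # roughly 0.76
    print(round(blend_with_head_to_head(p, wins_a=2, wins_b=3), 3))
```

The point of the sketch is the shape of the computation: the expensive work (deriving ratings, gathering history) happens before play starts, and the output is a single probability that the live layer can start from.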
Then, as the match begins, data starts to flow into the point-by-point layer. Here, live scoring and gameplay telemetry are continuously ingested. Match statistics, including serve speed, aces, double faults, unforced errors, rally length, distance run, and volley and return statistics, are also ingested in real time.

What makes it especially powerful is that the win probability isn’t static. It updates dynamically after every point, continuously recalculating the likelihood of each player winning based on what’s happening live on the court.

IBM watsonx.data® is the unified data foundation (the pink “Unified Data” box from the diagram). This layer brings together historical tennis data, live feeds from Sportradar and real-time USTA match data. Data is managed across IBM Db2®, IBM Cloudant® and IBM Cloud® Object Storage, giving the system a foundation for both batch model training and live analytics.

The pre-match layer conducts significant analytical work before the match begins. Machine learning models ingest signals that include media reports on players’ previous performance and data from prior matchups between the opponents. These signals are combined into predictors such as the Watson Power Index, which provides a structured view of player strength and match context.

The system also accounts for cases where media attention or narrative signals could skew a purely statistical view of the predicted winner. A spike prediction module identifies those anomalies and adjusts their influence, so the model can include contextual signals without letting them overwhelm the core prediction.

As the match starts, live match events stream through a message queueing telemetry transport (MQTT) publish/subscribe framework and are processed point by point through 20+ Red Hat® OpenShift® pods. This processing step consolidates the live and pre-match data and curates it before sending it to the live prediction engine through Apache Kafka. The Kafka payload can include who won the point, whether it was an ace, rally length, shot type, ball position and player position.

The live layer model applies deterministic and statistical simulation techniques to the current match state, combining current momentum, on-court performance and historical ability to update the prediction after every point.

The separation between pre-match context and live match simulation makes the latency target achievable. The system doesn’t try to do every calculation after every point. It performs heavier computation upfront, then keeps the live layer focused on fast updates that can operate inside the required 5- to 10-second window.

Outputs are packaged as JSON match feeds, persisted in IBM Cloud Object Storage and distributed globally through IBM’s content delivery network (CDN). That feed powers the US Open web and mobile applications. The IBM CDN was chosen because it met the latency requirements and could scale through the global traffic spikes of the men’s and women’s finals.

Data distribution is a critical capability for this application. The US Open creates a bursty operating pattern: a relatively short annual event with enormous spikes in traffic. Global distribution requires high availability. The requirements called for handling traffic spikes up to 5,000% and maintaining 99.999% uptime. Combining Red Hat OpenShift, automated deployment, observability, fault-tolerant machines across AWS and IBM Cloud, and CDN-based delivery gives the USTA a more scalable operating model than a one-off prediction service would provide.
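
The live layer described above recomputes the prediction from the current match state after every point. The production model is not specified in detail, so the Python sketch below uses a textbook stand-in instead: a Markov-chain calculation of the probability that the server wins the current game, assuming every point is won independently with the same probability. The blending of a pre-match prior with live point counts, the function names and the `prior_weight` of 30 are illustrative assumptions, and the sketch covers a single game only, not sets or the full match.

```python
from functools import lru_cache


def blended_point_prob(prior_p: float, points_won: int, points_played: int,
                       prior_weight: float = 30.0) -> float:
    """Mix a pre-match estimate with live results for the server.

    The prior counts as `prior_weight` imaginary service points, so the
    estimate shifts gradually as real points accumulate (assumed scheme).
    """
    return (prior_weight * prior_p + points_won) / (prior_weight + points_played)


def game_win_probability(p: float, server_points: int,
                         returner_points: int) -> float:
    """Probability that the server wins the game from the given score.

    Scores are raw point counts (0, 1, 2, 3 for love, 15, 30, 40).
    `p` is the server's chance of winning any single point.
    """
    q = 1.0 - p
    # Closed form for winning from deuce: win two points in a row
    # before the opponent does.
    deuce = p * p / (p * p + q * q)

    @lru_cache(maxsize=None)
    def win(a: int, b: int) -> float:
        if a >= 4 and a - b >= 2:
            return 1.0            # server has won the game
        if b >= 4 and b - a >= 2:
            return 0.0            # returner has won the game
        if a >= 3 and b >= 3:
            if a == b:
                return deuce
            # Advantage server, or advantage returner.
            return p + q * deuce if a > b else p * deuce
        return p * win(a + 1, b) + q * win(a, b + 1)

    return win(server_points, returner_points)


if __name__ == "__main__":
    p = blended_point_prob(prior_p=0.62, points_won=18, points_played=30)
    print(round(game_win_probability(p, 0, 0), 3))   # start of a game
    print(round(game_win_probability(p, 1, 3), 3))   # server down 15-40
```

With `p = 0.6`, the chance of holding serve from the start of a game comes out at about 0.74, which shows how a modest per-point edge compounds over a game. Because the calculation is a small closed-form recursion, it runs in microseconds, which is the kind of cost profile a per-point update inside a 5- to 10-second window needs.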
The result is an AI experience that can entertain and help direct fans’ attention to the most important moments during live play.

Understanding when momentum swings happen in a match also creates useful downstream signals for other USTA teams. Momentum shifts can show editorial teams where a match turned, which can inform workflows such as creating highlight videos. These signals can eventually support proactive fan alerts, such as identifying when an upset might be developing or when a player has taken control of a match.

The platform runs on Red Hat OpenShift (as noted in the green box along the bottom of the diagram), with the components deployed as containerized microservices. Red Hat OpenShift provides orchestration, resilience and flexible scalability. IBM Terraform® supports Infrastructure-as-Code, helping automate environment provisioning and deployment. IBM Instana® provides observability across Red Hat OpenShift pods, services and dependencies, which is essential when a real-time microservices system must operate reliably during a short, high-visibility event window.

IBM Consulting® designed and integrated the end-to-end solution, connecting the data foundation, AI modeling, real-time processing, delivery layer and fan-facing experience into one production system.

Live Likelihood to Win debuted as a fan-facing feature, but its larger value comes from becoming a foundation for how the US Open turns live tennis data into useful insight.

For fans, the benefit is immediate. Instead of scanning scores and guessing which match is heating up, they can use a live prediction signal to understand the current state of play. Instead of seeing only raw match data, they get context, momentum and narrative.

For the USTA, the system creates a reusable layer for future digital experiences.
The same underlying signals can support IBM SlamTracker, app experiences, editorial workflows, highlights, match alerts and more personalized fan journeys. As the models and explanations improve, the experience can move beyond showing that a prediction changed to explaining why it changed.

For engineers and product teams, the lesson is clear: production AI succeeds when it is embedded in a real workflow, backed by reliable data infrastructure and tuned for the experience it serves. At the US Open, AI is not a stand-alone feature bolted onto a scoreboard. It is part of a real-time system, built on IBM’s data, AI and hybrid cloud stack, designed to help millions of fans understand what is happening on court, as it happens.