Image: USTA and IBM Live Likelihood to Win in SlamTracker at the US Open, Flushing Meadows, New York; real-time win likelihood updated per point

Inside the IBM architecture that powers fan engagement at the US Open

IBM’s data, AI and hybrid cloud stack helped the USTA build a fast and scalable insight platform for tennis’ marquee event, providing fans with real-time context, predictions and narrative.

At the US Open, the most important moment on court isn’t always obvious from the score. A player can trail by two sets and still start building momentum. A casual fan opening the US Open app might see more than a dozen matches at once and need a fast answer: which match deserves my attention right now?

That’s the challenge behind Live Likelihood to Win, an AI-powered prediction engine in IBM® SlamTracker®. Updated after every point, it predicts which player is more likely to win and how that probability changes as momentum shifts.

The United States Tennis Association (USTA) runs the annual US Open as its Grand Slam event in Flushing Meadows, New York. During the tournament, more than 14 million unique users engage across the US Open app and website. Behind that experience, there are more than 7 million data points, with each point played producing more than 150 variables such as serve speed, rally length and shot placement. The challenge was to make that live data useful in real time, at tournament scale, without turning the product into a data dashboard.

By building with IBM’s AI and data products, the USTA turned match prediction into a scalable insight platform for unpredictable play, global traffic spikes and rising fan expectations.

Live scores tell fans who won the last point, game or set. They don’t always show whether a match is tightening, momentum has shifted or a routine match is becoming one to watch. Live Likelihood to Win fills that gap. In IBM SlamTracker, it appears as a point-by-point graph that gives fans a continuously updated view of the match’s direction.
That simplicity matters when fans might have as many as 20 matches to choose from at once. A power user might want deep player statistics. A casual fan might want to know whether to open a stream. The same system must serve both behaviors.

The design principles were practical. The prediction had to reflect the current match point by point because tennis doesn’t have a fixed duration. The experience had to be fast, with only a 5- to 10-second window to ingest data, recompute probabilities and publish the update. The output also had to be trusted: responsive enough to capture decisive moments, but stable enough not to feel noisy.

That trust requirement shaped the model design. The team built internal storytelling metrics to measure whether the prediction stayed stable during steady play, reacted to real momentum shifts and identified the likely winner early without overreacting to noise.

The technical architecture behind Live Likelihood to Win uses a multilayer design that separates pre-match intelligence from live point-by-point execution. It combines historical and real-time data to generate predictions. The model is built on more than 20 years of historical data and over 7 million real-time data points.

A key engineering decision was to use the right computational methods for the right moments. The pre-match layer contains more data and can use richer feature engineering and machine learning models because it has more time to compute. The point-by-point layer, by contrast, requires a faster, simulation-based approach to keep up with the incoming data points once play begins.

Starting in the blue “data sources” box on the left side of the diagram above, the pre-match layer is used to establish the baseline probability before a match even begins. The data comes from both structured and unstructured data sources.
These sources include Association of Tennis Professionals (ATP) and Women’s Tennis Association (WTA) player rankings and historical head-to-head performance data from Sportradar. They also include expert and media signals surfaced through IBM Discovery and custom-defined tennis predictors developed specifically for the tournament.

Then, as the match begins, the data starts to flow into the point-by-point layer. Here, live scoring and gameplay telemetry are continuously ingested. Match statistics, including serve speed, aces, double faults, unforced errors, rally length, distance run, and volley and return statistics, are also ingested in real time. What makes this layer especially powerful is that the win probability isn’t static. It updates dynamically after every point, continuously recalculating the likelihood of each player winning based on what’s happening live on the court.

IBM watsonx.data® is the unified data foundation (the pink “Unified Data” box from the diagram). This layer brings together historical tennis data, live feeds from Sportradar and real-time USTA match data. Data is managed across IBM Db2®, IBM Cloudant® and IBM Cloud® Object Storage, giving the system a foundation for both batch model training and live analytics.

The pre-match layer conducts significant analytical work before the match begins. Machine learning models ingest signals such as media reports on players’ previous performance and data from prior matchups between the opponents. These signals are combined into predictors such as the Watson Power Index, which provides a structured view of player strength and match context.

The system also accounts for cases where media attention or narrative signals could skew a purely statistical view of the predicted winner. A spike prediction module identifies those anomalies and adjusts their influence, so the model can include contextual signals without letting them overwhelm the core prediction.
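
The article does not say how the simulation-based point-by-point layer is implemented. To make the idea concrete, the following is a minimal Monte Carlo sketch of how a win probability could be recomputed from the current score. All names here (`MatchState`, `win_probability`, `p_serve`) are hypothetical, the per-player probability of winning a point on serve is assumed to be a fixed input (in the system described, the pre-match baseline and live statistics would presumably inform it), and a 7-point tiebreak is assumed in every set as a simplification.

```python
import random
from dataclasses import dataclass


@dataclass
class MatchState:
    """Hypothetical snapshot of the score; players are indexed 0 and 1."""
    sets: tuple = (0, 0)    # sets won so far
    games: tuple = (0, 0)   # games won in the current set
    points: tuple = (0, 0)  # points won in the current game or tiebreak
    server: int = 0         # server of this game (or first server of the tiebreak)


def play_game(points, server, p_serve, rng):
    """Finish a regular game from `points`; return the winner's index."""
    pts = list(points)
    while True:
        winner = server if rng.random() < p_serve[server] else 1 - server
        pts[winner] += 1
        # A game needs at least 4 points and a 2-point margin.
        if pts[winner] >= 4 and pts[winner] - pts[1 - winner] >= 2:
            return winner


def play_tiebreak(points, first_server, p_serve, rng):
    """Finish a 7-point tiebreak from `points`; return the winner's index."""
    pts = list(points)
    played = sum(pts)
    while True:
        # First server serves one point, then serve changes every two points.
        swapped = ((played + 1) // 2) % 2 == 1
        server = 1 - first_server if swapped else first_server
        winner = server if rng.random() < p_serve[server] else 1 - server
        pts[winner] += 1
        played += 1
        if pts[winner] >= 7 and pts[winner] - pts[1 - winner] >= 2:
            return winner


def simulate_match(state, p_serve, sets_to_win, rng):
    """Play one random continuation of the match; return the winner's index."""
    sets, games = list(state.sets), list(state.games)
    points, server = state.points, state.server
    while max(sets) < sets_to_win:
        if games == [6, 6]:
            winner = play_tiebreak(points, server, p_serve, rng)
        else:
            winner = play_game(points, server, p_serve, rng)
        games[winner] += 1
        points, server = (0, 0), 1 - server  # serve alternates every game
        lead = games[winner] - games[1 - winner]
        if (games[winner] >= 6 and lead >= 2) or games[winner] == 7:
            sets[winner] += 1
            games = [0, 0]
    return 0 if sets[0] > sets[1] else 1


def win_probability(state, p_serve, sets_to_win=2, n_sims=5000, seed=0):
    """Estimate player 0's chance of winning from `state` by simulation."""
    rng = random.Random(seed)
    wins = sum(
        simulate_match(state, p_serve, sets_to_win, rng) == 0
        for _ in range(n_sims)
    )
    return wins / n_sims


if __name__ == "__main__":
    # Player 0 leads by a set and is serving at 3-2, 30-15 in the second.
    state = MatchState(sets=(1, 0), games=(3, 2), points=(2, 1), server=0)
    print(win_probability(state, p_serve=(0.64, 0.62)))
```

Calling `win_probability` again with an updated `MatchState` after each point would trace out the kind of point-by-point curve described above. A production system would also need to revise `p_serve` from the live statistics and keep each recomputation inside the stated 5- to 10-second window, both of which this sketch leaves out.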
As the match starts, live match events stream through a message queueing telemetry transport (MQTT) publish ...