On any given day, sports statistics from across the globe course through the headquarters of Stats, which occupies the 22nd and 23rd floors of one of the Loop’s less remarkable glass and steel towers. At first blush, the sports data company’s office resembles that of any tech firm. The open-plan space is populated mostly by young men who stare into monitors on row upon row of standing desks or Ikea-ish modular furniture. It takes a moment for the jockier elements to come into focus: the copies of SportsBusiness Journal in the waiting area, the Pop-A-Shot in the atrium, and a LeBron James shot chart from the 2016–17 season that could be considered fine art only insomuch as it resembles the work of an abstract pointillist painter.
It is here, in a glass-walled conference room, that Patrick Lucey is showing me the future of sports on his laptop: It’s skeletons playing basketball. To be more precise, it’s the skeletons of the Portland Trail Blazers playing the skeletons of the Golden State Warriors.
Lucey, the company’s director of artificial intelligence, has cued up a clip from an NBA game. But in this version, rainbow-colored stick figures overlay the players and move in tandem with them. These skeletons, he explains, are generated by a technology called OpenPose. Developed at Carnegie Mellon’s Robotics Institute, it can detect the precise movement of a human body, using only a TV feed. “Normally,” says Lucey, “a player would have to wear some kind of motion capture suit.”
To demonstrate the tremendous leap this represents, Lucey had earlier shown me a clip of the same play rendered with the once innovative but now aging technology his company developed called SportVU (pronounced “Sport View”), which can record the locations of players and the ball 25 times per second using cameras mounted in a stadium’s catwalk. The video looked like simple numbered dots moving around a two-dimensional basketball court. As the play unfolded, a Blazer in the paint kicked the ball out to a teammate behind the 3-point arc. He appeared to be wide open — no Warriors dots were nearby. But he didn’t take the shot.
Why not? It’s only when Lucey pulls up the TV feed that we see what happened: The open player had to bend to retrieve a low pass and bobbled the ball. That’s something SportVU can’t register, but OpenPose does.
If Lucey can perfect the technology’s application to sports, such nuanced details of every millisecond of every game could be automatically entered into a massive, searchable database that NBA teams and other Stats clients could access for all kinds of purposes. A coach who wants to know which players aren’t keeping their hands up on defense could quickly call up that info. “They could even do injury prediction,” Lucey says. A minor shift in a player’s gait, for example, might indicate he has a physical problem brewing. The technology could also be employed to improve technique, pinpointing hitches in everything from a basketball shooting stroke to a tennis backhand.
“This is the most exciting thing we’re doing,” Lucey says. “This is going to change sport.”
Lucey’s bosses are counting on such innovations to also change the trajectory of Stats (which stands for Sports Team Analysis and Tracking Systems). The undisputed leader in the sports data field as recently as a few years ago, Stats contends these days with a new crop of deep-pocketed, forward-thinking competitors that have bumped the company from its long-held place atop the industry.
Started in the 1980s by a plucky band of baseball stats nerds — among them a Chicago actuary named John Dewan and the godfather of sabermetrics, Bill James — Stats made its name thinking outside the traditional box score. By tallying and databasing aspects of games that were previously either ignored or thought to be insignificant (pitch counts, the location of batted balls), the nascent company began shining light into darkened corners of sports orthodoxy. Where the gut feelings of hardened managers and scouts had once prevailed, Stats countered with insights backed by quantifiable proof.
If Stats today is the Cubs circa 2011 — a storied organization needing a competitive shot in the arm — then its Theo Epstein, its white knight, is Patrick Lucey. One company executive described Lucey, with utter sincerity, as its “shining star.” The 37-year-old is a mathlete in the most literal sense: He’s a big-data engineer who happens to be a skilled jock. Six-foot-four and broad-shouldered, Lucey played semipro soccer for eight years in his native Australia while earning his doctorate in computer science.
On his laptop, he has something else to show: another glimpse into a sci-fi-ish near future of sports technology. He plays a video featuring a system he helped develop that turns an iPad into what he calls an “intelligent clipboard,” allowing a basketball coach to sketch a play on the sidelines and run a 2D simulation to see how the defenders are likely to respond. The coach can then make adjustments to improve the chances of scoring.
“It’s something called ‘deep imitation learning.’ We can imitate what we’ve seen in previous games,” Lucey explains. “Given enough data, basically you can predict what’s going to happen.”
The project, which is in the R&D phase, made waves when it debuted in February at the MIT Sloan Sports Analytics Conference, the annual quant fest cofounded by Houston Rockets general manager Daryl Morey. “We’re basically living in an NBA 2K sim,” the Ringer’s Kevin O’Connor wrote of the technology. “Next step — robot coaches,” cracked Ian Levy of FanSided. “Get ready for an NBA Finals featuring Bot Rivers against Mike D’Android.”
Those reactions produce in Lucey a big Aussie chuckle. Stats, he says, isn’t looking to replace San Antonio’s Gregg Popovich with IBM’s Watson. The objective of his AI team is to find ways to augment, not replace, the intuition and expertise of coaches. But Lucey also understands, better than almost anyone, that the rise of the machines in sports has only just begun.
High up in the press box at Guaranteed Rate Field, before a Sox-Mariners matinee in late April, Jeff Chernow finds his seat amid the beat reporters. Even as Patrick Lucey and his team dream up cutting-edge technologies that could automate much of the data-gathering process, most of that work at Stats, at least for now, continues to be done by humans like Chernow.
This army of around 400 freelance “reporters” manually records play-by-play stats and a host of other info at close to 125,000 events a year across some 600 leagues — everything from America’s big four (MLB, NBA, NFL, NHL) to international soccer, rugby, cricket, and camogie. Stats provides this info — and also analytical insights and technology — to pro and college teams, fantasy sports outfits, and the media. If you hear an announcer cite an interesting factoid during a game, chances are good that it came from Stats. While the company is tight-lipped about which teams employ it, Stats publicly counts among its clients ESPN, CBS, Google, Snapchat, fantasy sports site FanDuel, and the Washington Post. A former executive put the company’s annual revenue at more than $100 million.
Chernow has been scoring games for Stats since 1996. As an operations manager, the 49-year-old now oversees the company’s MLB and NBA coverage, but he continues to find time to do work “on the assembly line” and has become Stats’ most trusted baseball scorer. He’s assigned half a dozen games a month, a mix of Cubs and Sox. The gig pays $140 to $150 a pop. “Just a little bonus on my paycheck,” says Chernow, who was raised in a Rogers Park household with divided baseball loyalties; his mother was a White Sox fan, his father more of a Cubs guy. “Watching baseball for pleasure now is a little like Warren Buffett watching the financial talking heads on CNBC.”
He opens his laptop and logs on to the Stats baseball scoring system, whose low-tech interface looks like something out of a 1980s Nintendo game. Team lineups flank a baseball diamond in the center of the screen, below which appears a grid of keyboard shortcuts for logging the outcome of every pitch: B for a ball, F for a foul, S for a swinging strike, T for a taken strike, and so on. He also records the placement of every batted ball.
Chernow then signs in to Stats Pass, the company’s subscription-only repository for statistics, scouting reports, and historical research tools. This is where the information he’s recording will ultimately live, though it will also appear on stadium scoreboards across the country and websites around the world. Here, teams, broadcasters, and other clients can do what are commonly called “splits” — adding specific game situations to any research query. A team facing the Sox, for instance, can find out what type of pitch its batters are likely to see from Carlos Rodón on a 2–1 count. A color commentator reaching for on-air banter can inform the audience of how many home runs Kyle Schwarber has hit against the Cardinals in day games at Wrigley Field on 3–2 pitches. Chernow calls up the previous night’s Philadelphia 76ers basketball game and arbitrarily queries how many dunks Ben Simmons threw down. “Just tons and tons of info,” he says.
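In data terms, a split of this kind amounts to filtering a table of pitch records by game situation and summarizing what is left. Here is a minimal sketch in Python, assuming made-up records and hypothetical field names; nothing in it reflects how Stats Pass itself is built.

```python
from collections import Counter

# Hypothetical pitch records: field names and values are invented for illustration.
PITCHES = [
    {"pitcher": "Rodon", "balls": 2, "strikes": 1, "pitch_type": "slider"},
    {"pitcher": "Rodon", "balls": 2, "strikes": 1, "pitch_type": "fastball"},
    {"pitcher": "Rodon", "balls": 2, "strikes": 1, "pitch_type": "fastball"},
    {"pitcher": "Rodon", "balls": 0, "strikes": 2, "pitch_type": "slider"},
    {"pitcher": "Other", "balls": 2, "strikes": 1, "pitch_type": "curveball"},
]

def split(records, **conditions):
    """Keep only the records that match every key=value condition."""
    return [r for r in records if all(r.get(k) == v for k, v in conditions.items())]

def pitch_mix(records):
    """Return each pitch type's share of the given records."""
    counts = Counter(r["pitch_type"] for r in records)
    total = sum(counts.values())
    return {ptype: n / total for ptype, n in counts.items()}

# The 2-1 count split for one pitcher, then the mix of pitches thrown in it.
on_2_1 = split(PITCHES, pitcher="Rodon", balls=2, strikes=1)
mix = pitch_mix(on_2_1)
```

Each extra condition passed to the hypothetical `split` helper narrows the result further, which is why heavily split queries can end up resting on very few records.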
Few contests in sports feel as inconsequential as baseball in April, but Chernow records every event as if this were game 7 of the World Series. Knowing the mechanics of the sport is a prerequisite for the job, he says, but he’s seen plenty of self-styled baseball authorities struggle at it. “You can think too much when scoring a game, and clients who want data now-now-now are tired of waiting on you to decide whether or not to give that guy an RBI.”
In the bottom of the first, the Sox’s Yoán Moncada crushes a changeup into the mostly empty seats in right field. As Moncada rounds the bases, Chernow’s fingers flit around his keyboard. He taps H for a hit, which brings up a menu of options, including a home run. As with every batted ball, he rates Moncada’s dinger on a scale of 1 to 4 based on how hard it was hit. On the digital baseball diamond, he places a point where the ball landed. “Boom! All the data is out. Clients already know there’s a run on the board,” Chernow boasts. The customary fireworks show over left-center field has not even commenced yet. “Usually I have everything entered before he crosses home plate.”
Chernow is hardly the only Stats staffer dedicated to this relatively inconsequential matchup. Long before the game started, the company’s research group produced notes and dense analytics packages that it delivered to broadcasters. At the same time Chernow is taking in the game live, another Stats reporter is at home watching the broadcast, recording what’s known as “TVL data”: details related to the type, velocity, and location of every pitch. This assignment is usually given to ex-pitchers, because of the eye required to discern the difference between, say, a two-seam and a four-seam fastball. “The most talented former minor leaguers who threw their arm out once upon a time? We’ll give them a ring,” Chernow says. “If they get tired of scoring games for us and they’re smart, they’ve moved on to other departments at Stats. We’ve got one guy in legal, a couple in finance, a couple of full-timers in operations.”
Once the game ends, yet another reporter will rewatch the footage to record what Stats has dubbed “X-Info” — extra information that adds layers to the play-by-play data. “Stuff that’s not scientifically provable — subjective data,” Chernow says. That includes making judgment calls such as distinguishing a pop-up from a flyout, noting whether an outfielder took a bad route but recovered in time to make the catch, or clarifying that there was a rundown play that prevented runners from advancing. “Some people stuff envelopes, some work at a call center, some people do X-Info for college football games,” he says.
As the Sox game moves along to the sixth inning, Chernow reflects on his early years in the job. In the mid-’90s, Stats was feeding the increasing need for novel statistics during the proliferation of fantasy sports such as Rotisserie League Baseball. He worked in the “ops room” of the company’s previous office in suburban Northbrook alongside other sports nuts. “It wasn’t a frat, but it was like a cauldron, and you were bouncing ideas off of each other: ‘Maybe we should start covering deflections in NBA games.’ The internet was a couple years in, and the dial-up modems were buzzing. It was kind of like a startup before startups.”
The Sox ultimately lose, in a game that lasts two hours and 57 minutes, which Chernow duly records. He sends the completed scoring off to Stats HQ. A colleague from the ops room messages him to confirm receipt: “Tough to watch. See you tomorrow.” The game’s most impressive performance goes unrecorded: Chernow’s superhuman ability to avoid taking a single bathroom break.
Packing up his laptop, he begins to consider how much longer he’ll be able to score games. “You’ve heard me refer to [Stats reporters] as the guys on the assembly line. That’s the fear in a lot of blue-collar industries, that I’m going to be replaced by a robot,” he says. “The higher-ups like to talk about how the AI revolution is going to make us billions of dollars hand over fist. There’s a lot of data that can be collected and done very efficiently but probably” — and here he corrects himself — “most definitely would need a human being to guide and consult.” After all, he wonders aloud, “can a robot tell the difference between a hit and an error?”
One day in the mid-’80s, John Dewan, an actuary at insurance broker Aon, received a phone call that would change his life. It was from a man named Dick Cramer, a Harvard- and MIT-educated scientist, computer programmer, and stathead. The influential sabermetrician Bill James had recommended Dewan as a potential investor in Cramer’s four-year-old company, Stats Inc. Dewan had been serving as executive director of Project Scoresheet, James’s effort to organize volunteers across the country to record play-by-play data for every major-league baseball game on scorecards he had created to facilitate the info being entered into a database.
At that time, Stats’ business was built around supplying major-league baseball teams with a system that would enable them to track their own statistics. Known as the Edge 1.000 (pronounced “one-thousand”), it consisted of an Apple II personal computer, a Digital Equipment mainframe, and software Cramer had written. Teams paid $25,000 for the system when it debuted and would then hire an operator to lug the 200 pounds of equipment from press box to press box.
This was well before baseball’s Moneyball years, when analytics would find common acceptance in the major leagues. Even managers who used the Edge 1.000, like the young White Sox skipper Tony La Russa, kept it quiet. “I don’t always make the move printouts suggest,” he once told a reporter, “but at least I’m aware of what the percentages say when I choose to ignore them.” The Yankees, meanwhile, used the Edge 1.000’s insights mostly off the field, as ammo against players during contract negotiations. By 1985, Stats had contracts with only those two teams, and its debt-ridden parent company ceded it to Cramer, who was now asking John Dewan to invest.
And he did, along with his wife and Bill James. Still, he wasn’t crazy about dealing in hardware and software. “We quickly realized that wasn’t how we were going to make the company work,” says Dewan, who took on the role of president. What would work, he thought, was transitioning into a data company that would record statistics for every game and license them to teams and the media — a larger market. Stats rapidly built up its media partnerships, becoming the exclusive provider of expanded box scores to USA Today and the Associated Press, which helped fuel the explosion of fantasy sports.
In 1987, the first year that Stats kept an entire year’s worth of pitch-by-pitch game data, Dewan stumbled into a lucrative opportunity. NBC hired the company to furnish announcer Vin Scully with interesting nuggets to use during postseason broadcasts. Before game 4 of the National League Championship Series, between the St. Louis Cardinals and the San Francisco Giants, Stats dug up a tidbit about how Cards pitcher Danny Cox allowed a .268 batting average before his pitch count hit 70 but after that it jumped to .345. “So at pitch 70, Scully announces this,” Dewan says. “Pitch 73 is a double. Pitch 74 is a home run.” Cox then gave up two more hits. “Vin just went nuts. He almost fell out of the broadcast booth raving about this.”
This kind of alliance continues today. Senior researcher Ethan Cooperson, who has been with Stats since 1994, spends most of his week during the NFL season with his nose in Stats Pass, preparing game notes for CBS and Fox broadcasters. When game day rolls around, Cooperson is in the booth with CBS’s Jim Nantz and Tony Romo, offering them noteworthy stats, relevant trends, and historical trivia. “The one we got a lot of attention for was in 2008, Steelers and Chargers. Very late in the game, the score is 11–10,” Cooperson says. “I looked it up, and there had never been a game with that final score.” After 12,837 games in NFL history, this one would turn out to be a first.
Stats was also the earliest internet purveyor of real-time sports scores and statistics, beginning with a partnership with America Online in the mid-’90s. Then, in 1996, a collaboration with Motorola to distribute game updates on a pager product called SportsTrax triggered the NBA to file what became a landmark lawsuit against the two companies. The league claimed that Stats and Motorola had infringed on its intellectual property. The MLB, NFL, and NHL filed briefs supporting the NBA, while media giants such as AOL, the AP, and the New York Times lined up behind Stats and Motorola. The companies lost at trial but won on appeal, paving the way for all of the game updates available online today. “It turned into a groundbreaking case that is the basis for sports scores and statistics being in the public domain,” Dewan says. “If we hadn’t done that, everything would be controlled and owned by the leagues.”
Chernow speaks about Dewan today with great reverence, calling him “the George Washington of Stats”: “He had grand ideas about what to do on the business side. He was always of the mind that even if we didn’t make an explicit profit on things, we should still start collecting and building up a historical database that could be a great value down the line.”
Dewan left Stats shortly after orchestrating its sale to Rupert Murdoch’s News Corp. in 2000 for $45 million. (Bill James soon followed him out the door.) “We were small compared to the other units of News Corp., such as Fox and HarperCollins, so it was hard to garner resources,” says Alan Leib, who succeeded Dewan at Stats, serving as president for two years. “We were making money, but we really didn’t move the needle.”
The needle mover came in 2008, when Stats acquired the SportVU technology. Grantland’s Zach Lowe would call the optical tracking system “the most important innovation in the NBA in recent years.” Applying principles similar to those militaries use to visually track missiles, a pair of Israeli entrepreneurs had developed the system to follow players and the ball in soccer. Stats immediately saw the potential for basketball. While baseball lends itself to analytical study because it unfolds in a series of discrete events, basketball’s far more dynamic format — 10 bodies and a ball moving quickly and fluidly around a large court — made the sport maddeningly difficult to evaluate. Given the right instrument to turn all that movement into data, Stats figured, the NBA could usher in its own Moneyball moment and literally change the game.
In 2009, a Stats crew flew to Orlando to conduct a SportVU demo during the NBA Finals, between the Magic and Lakers, using the technology to capture footage of game 3. Brian Kopp, the company’s senior vice president of strategy and development, presented the results to the NBA’s top brass in the hallway of the arena before the start of game 4. The NBA’s deputy (and future) commissioner, Adam Silver, was standing in front of the pack.
“We picked a play, a really close goaltending call, where Dwight Howard swatted a ball away,” Kopp recalls. “A lot of people thought that wasn’t goaltending. You couldn’t see it with the naked eye. But when we put data to it, we showed that the ball was on its way down by a couple inches and that goaltending was the right call. The NBA’s head of officiating was there, and he was very happy the officials got it right. It got people thinking: Imagine everything we could do if we started tracking all of this.”
Four years later, before the 2013–14 season, the NBA agreed to a league-wide deal allowing Stats to outfit all 29 arenas with SportVU. Not only did SportVU automate the collection of play-by-play data and make it sortable, it also gave GMs, coaches, and scouts a trove of groundbreaking new insights. If a coach, for example, knows that an opposing point guard drives right 72 percent of the time, he can direct his team to pressure that player to go to his weak side. Teams could also better assess how a prospective player might fit into their scheme. If, for instance, the data showed a high percentage of a player’s shots came via isolations, a team running a democratic, motion-based offense might think twice before acquiring him. SportVU even allowed teams to play around with different five-man combinations to see what would happen to offensive and defensive effectiveness when certain players were subbed in.
One of SportVU’s biggest impacts was in helping evaluate defense. Blocks, for example, had been the only quantitative way to measure a player’s ability to protect the rim. But at the 2013 Sloan conference, two presenters used the system’s data to prove that a dominant interior defender such as Dwight Howard reduces both the shooting percentage around the rim and the frequency of shots within five feet of the hoop. “Howard’s mere presence,” the two wrote in their study, “ ‘blocks’ shots before they happen.”
And the current 3-point-shot revolution in the NBA? “It was supported in part by the data that the SportVU cameras helped unearth about more efficient and higher expected point value shots,” says Ben Shields, a lecturer at the MIT Sloan School of Management and adviser for the school’s analytics conference. Teams are getting smarter about where they’re taking 3-pointers too. SportVU made it clear that the corners are the most desirable spots because they are not only more than a foot and a half closer than the top of the arc but also where 3-pointers tend to be least contested.
One ESPN basketball analyst I spoke with likened SportVU’s impact on the NBA to TV’s transition from standard to high definition: “The old version seemed just fine until the new one came along, and now it would be difficult to go back.”
Even so, the SportVU era in the NBA would last only four seasons. In 2016, news broke that Stats was losing its licensing partnership with the league for both optical tracking and game statistics. Last season was the first that Los Angeles–based Second Spectrum served as the NBA’s tracking partner; Switzerland-based Sportradar landed the data distribution contract. But that wasn’t the only hit Stats was taking. Sportradar, which entered the U.S. market in 2013 backed by such high-profile investors as Michael Jordan and Mark Cuban, quickly snatched away deals Stats had with the NFL, NHL, and NASCAR. Sources familiar with those negotiations say Sportradar significantly outbid Stats. (The companies pay the leagues for the rights to be the official gatherer and distributor of data.)
The loss of the league contracts came on the heels of Stats changing hands again. In 2014, the private equity firm Vista Equity Partners purchased the company from News Corp. and the Associated Press, which had run it as a joint venture since 2005. Two rounds of layoffs followed, affecting more than 80 people. (The company now has about 600 employees.)
Stats’ sprawling reporter network ensures it can continue to service clients, even without the league contracts. But the lack of direct access to emerging new data will be the “challenging part,” Kopp says. “The real value over time is going to come in the new information that’s coming to bear in the player tracking.” For example, Zebra Technologies, based in suburban Lincolnshire, handles tracking for the NFL through radio frequency identification tags embedded in players’ shoulder pads and the ball.
At the same time, Stats is facing increased competition from an assortment of relative newcomers, including Sportvision and Blinkfire Analytics, both based in Chicago. Even John Dewan is a rival; in 2002, he cofounded a data and analytics company now called Sports Info Solutions that caters mostly to pro teams. So is Kopp; the architect of the SportVU deal left Stats in 2014 and now has his own firm, Stretch 4 Advisory.
“Everybody is trying to figure out the next best way to solve the game. It’s like an arms race, with all the different talent that they’re accumulating,” says Aaron Charlton, a Stats research analyst who has been with the company for 13 years. “You try to stay ahead of it, keep evolving, and stay at the forefront of things. But trying to choose which things to spend money on, which things are going to be the next big thing — that’s the difficult thing for our executives right now.”
Patrick Lucey often hears that he’s ruining sports. “A lot of people tell me that: ‘You’re taking the romance out of it by being too analytical.’ My response is always, ‘We are making sports better.’ ”
To argue his case, he cites Tom Brady and Peyton Manning. “Are they the best athletes?” he asks, disregarding Manning’s 2016 retirement. “Definitely not. Do they predict the best? Absolutely they do. Being the fastest or having the highest vertical leap is just not enough. You have to be smart and prepare against your opponent. You have to be able to predict what they are going to do and devise a strategy to maximize your strengths and minimize the opponent’s strengths. To do this, you have to learn like a computer. And to do that, you need a lot of data, which luckily we have at Stats.”
Growing up outside of Brisbane, Australia, Lucey played soccer, rugby, tennis, and golf — but he was particularly enamored of cricket. “Like baseball,” he says, “it is a game full of statistics.” As a teen, he took to memorizing stats from cricket cards. “The numbers basically told the story of a match, a season, and the career of a player. That is how I really fell in love with sports and numbers.”
Before joining Stats in 2015, he spent five years at Disney Research doing sports analytics work to benefit Disney-owned ESPN. At the 2014 Sloan conference, he presented a paper that had attendees buzzing. He and some of his Disney Research colleagues had studied player-tracking data from a season’s worth of soccer to demystify the notion of home field advantage. They found it has little to do with cheering fans or a team’s familiarity with the pitch and everything to do with how teams play more aggressively at home.
Now it’s his application of OpenPose that could be the sports data industry’s next breakthrough. At last year’s Sloan conference, Lucey’s intern Panna Felsen, a doctoral student from the University of California, Berkeley, presented the results of a paper in which she used the skeleton-tracking technology to analyze 3-point shooting styles in the NBA. Her case study on Stephen Curry revealed that, objectively, the Warriors star “moves much more than the average player in all phases of his shot, and he takes a higher proportion of off-balance shots compared to the average player.” The results added an interesting wrinkle to Curry’s reputation as one of the best shooters in the history of the league.
“People were very, very excited,” Lucey says of the reaction at Sloan. “A lot of the advanced teams are saying, ‘That would be very useful to have.’ It’s more information, and it could aid them in making better decisions.”
If Lucey and his AI team can hone OpenPose to collect 3D data from a 2D TV image with a high degree of precision, it would undoubtedly be a major boon to Stats, eliminating the need for it to have cameras inside the arenas. Beyond exposing a wealth of novel data related to body position, the system would give the company a way of gathering SportVU-like tracking data without having to compete with rivals for exclusive league contracts that cost millions of dollars.
“Computer vision has come a long way,” Lucey says of OpenPose, explaining that it uses a “convolutional neural network,” a type of machine-learning model, loosely inspired by the brain’s visual system, that learns to recognize patterns in images. “That’s how autonomous vehicles work. You can think about some of the stuff we’re doing here as like an autonomous vehicle for sport.”
Among those Sloan attendees who took notice was Ben Alamar, the director of analytics at ESPN. “In basketball, in the player-tracking data, you only have two dimensions of the players,” he says. “And so I don’t know the direction those players are facing. I don’t know if one of them is contesting a shot with his hands up. I don’t know if they’re jumping. Computer vision, if done in a complete way, gives me all of that.”
Alamar also sees an application of the system that could add new layers to perennial arguments such as the Jordan-versus-LeBron GOAT debate. “I could run the entire NBA history through their computer vision algorithms and collect all of that data. And that expands dramatically the kinds of stories and kinds of research that I can do.”
But does this universe of research, like the actual universe, expand infinitely? Once you have data on the microlevel of an athlete’s skeleton, what more is there to know? As a way into these questions, I tell Lucey about a comment Stats cofounder Bill James made on a podcast during the Sloan conference a few years ago. James was asked whether, after four decades of toil, the sabermetrics community had come up with “a direct answer for everything” in baseball. His response was characteristically self-effacing: “There are a million things that we don’t know. We have succeeded in taking a bucketful of knowledge out of an ocean of ignorance.”
“I agree with him,” Lucey says. “There’s so much information that we’re missing.”
“How do you fill that in?” I ask.
“We’re swimming in all this data, right? Well, we’ll never have enough data. Once you split game data three or four ways, you’re going to have only one or two examples per situation. That’s a permutation problem.”
“So how do you solve that?”
That’s when Lucey leans in, lowers his voice, and begins outlining a hazy concept that sounds both fantastical and nightmarish. It also seems to point the way toward the future of big data in sports. “In machine learning, there’s something called generative models, where you can start synthesizing new examples.”
“Synthesizing? As in making up scenarios?”
“That’s where machine learning is going,” he answers. “You have to synthesize new data to improve performance.”
Lucey can tell he’s blowing my mind, or at least testing the limits of my patience with artificial intelligence jargon, so he stutter-steps away from abstract ideas for a moment.
“Say the Golden State Warriors play the Houston Rockets four times in the regular season. How many times have they played in the playoffs?” (At that point, the teams were only one game into the 2018 Western Conference finals.) “This is not a representative sample. What you want is to have them play each other about 10,000 times. You need to simulate these games to reliably model what you’re possibly going to see in the forthcoming matchup.”
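The idea of replaying a matchup thousands of times can be illustrated with a simple Monte Carlo simulation. The sketch below is far cruder than the generative models Lucey alludes to and is not his method: it assumes one fixed per-game win probability (the 0.6 is invented) and treats every game as independent, then plays a best-of-seven series 10,000 times.

```python
import random

def simulate_series(p_win, rng, wins_needed=4):
    """Play one best-of-seven series; return True if team A wins it."""
    a_wins = b_wins = 0
    while a_wins < wins_needed and b_wins < wins_needed:
        # Each game is an independent coin flip weighted by p_win (an assumption).
        if rng.random() < p_win:
            a_wins += 1
        else:
            b_wins += 1
    return a_wins == wins_needed

def series_win_rate(p_win, n_sims=10_000, seed=0):
    """Estimate team A's chance of winning the series from n_sims simulated series."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same estimate
    wins = sum(simulate_series(p_win, rng) for _ in range(n_sims))
    return wins / n_sims

# Hypothetical input: team A wins any single game 60 percent of the time.
rate = series_win_rate(0.6)
```

Under these toy assumptions the exact answer is about 71 percent, and the simulated estimate lands close to it. A handful of real games could never pin that number down, which is the gap simulation is meant to fill.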
As a sports die-hard himself, Lucey is well aware that the notion of computer-generated games might be objectionable to certain fans. Although plenty of Silicon Valley startups are using “fake data” to train machine-learning algorithms to, say, identify human faces in photos, there seems to be something fundamentally wrong about applying the technique to sports, which has always bristled at fakeness, whether it be players throwing games, steroid-enhanced home runs, or theatrical flopping. But as an AI engineer, Lucey also realizes that any truly innovative technology will contend with some initial opposition.
So how, exactly, does one synthesize a basketball matchup 10,000 times? As I lob more questions at Lucey, he begins to get cagey, as if I’m Slugworth angling for the recipe to Willy Wonka’s Everlasting Gobstopper. He’s hyperaware that the last thing Stats needs at this moment is for a rival to get real insight into his R&D projects.
“There’s nothing else I can show you,” Lucey says. “Until we’ve productized it, I can’t really talk about it.”
As in sports, so, too, in the business of sports data: Winning isn’t everything — it’s the only thing.