As race cars roar past the grandstands, visually impaired spectators follow the action in real time thanks to AI. Now people with impaired vision can enjoy the thrill of live motorsports by listening to an AI-generated commentator.

Capable of tracking car positions and making predictions as the race progresses, the system can describe all the live action just like a professional broadcast commentator. This live commentary AI is called Voice Watch, and it has opened up new possibilities for the visually impaired to experience spectator sports.

The technology won the top award in the Artificial Intelligence category at the 102nd Annual ADC Awards, presented by the Art Directors Club, which was founded in 1920 in New York City. The annual awards are the world’s longest-running citations for advertising.

Further, the technology was included in Good Design Best 100 for 2023. The Japan Good Design Award, the largest Asian award, was founded in 1957 and reflects Japanese design values and principles that aim to enrich lives, industries, and society.

To tell us about how the technology and ideas behind Voice Watch came together, we spoke with Kazuhiro Shimura, a creative director at Dentsu Inc. who led the project to develop the AI-based live commentary.

Erasing experience gap in spectator sports

How did live commentary AI get started?

Kazuhiro Shimura: The Toyota Mobility Foundation was calling for proposals for projects that could help visually impaired people enjoy auto racing. I came across this by chance on the internet.

On childcare leave at that time, I was taking stock of things and thinking about how nice it would be to lead a project that could benefit society in some way. The possibilities that such a project offered fascinated me, so I formed a team and submitted a proposal.

We got started by talking with people who are visually impaired. I was introduced to some who work in the Dentsu Group, and spoke with them directly.

Before meeting them, I had assumed that visually impaired people would be concerned about how to travel to the racing venue and how to get around inside the stadium. When I actually spoke with them, however, I discovered how many other practical issues need to be addressed.

A number of the people told me that, even if they went to a race venue, they would not really understand what was happening. They would need a friend or family member to go with them and constantly explain how the race was unfolding. Some also said that, when the crowd got excited, they could not share the emotion that sighted attendees felt, and so had no interest in attending such races.

In short, there is a significant gap between the experiences of people at a venue who are, and those who are not, able to watch a race.

Since the psychological hurdle creates an experience gap that prevents the visually impaired from attending motor races, I wanted to find a solution. I hoped to make it possible for these people to enjoy watching sports together with everyone else, so they would not miss out on the excitement of spectator sports.

That was the concept and objective of the proposal we submitted. The Voice Watch live-action commentary AI system started with that.

Integrating three types of AI—object recognition, sign detection, and speech frame AI

How does Voice Watch generate commentary?

Shimura: Voice Watch is a single, comprehensive AI system. But, take a closer look, and one sees it actually comprises three types of AI system.

The Voice Watch system screen

The first AI is an object recognition system set up to track the race cars. Essentially, it serves as the eyes of visually impaired spectators. By tracking the race cars, the AI follows how the race is progressing in front of the grandstand: which team’s car is currently passing by and which cars are competing for the lead.
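As a rough illustration of how tracked positions could be turned into "which car is passing by", here is a minimal Python sketch. It is not Voice Watch's implementation, which the interview does not describe. It assumes a tracker already supplies each car's horizontal position per video frame; the function name `crossings`, the car labels, and the virtual line at `line_x` are all hypothetical.

```python
def crossings(prev_x, curr_x, line_x):
    """Return the cars that crossed a virtual line between two frames.

    prev_x and curr_x map a car label to its horizontal position (in pixels)
    in the previous and current frame. A car counts as passing when it moves
    from the left of line_x to on or beyond it. The result is ordered with
    the car furthest past the line first.
    """
    crossed = [
        car
        for car, x in curr_x.items()
        if car in prev_x and prev_x[car] < line_x <= x
    ]
    return sorted(crossed, key=lambda car: curr_x[car], reverse=True)


# Hypothetical tracker output for two consecutive frames.
previous_frame = {"Car A": 90, "Car B": 80, "Car C": 40}
current_frame = {"Car A": 130, "Car B": 110, "Car C": 70}

passing_now = crossings(previous_frame, current_frame, line_x=100)
```

In this example, Car A and Car B have crossed the line, with Car A ahead, while Car C has not yet reached it.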

The second AI system detects signs of change in the race by analyzing a vast amount of racing data supplied in real time, such as lap times and car positions. It also predicts how the race will progress, considering the likelihood of the car in second place taking the lead in the next lap, or whether the gap between the cars in second and third place is gradually narrowing.
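To make the idea of a "narrowing gap" concrete, the following Python sketch fits a straight line to the gap between two cars over recent laps and estimates how many laps remain until the gap closes. This is only an illustrative stand-in under simple assumptions (a linear trend and hypothetical gap values), not the prediction method Voice Watch actually uses; the function names are invented for the example.

```python
from statistics import mean


def gap_trend(gaps):
    """Least-squares slope of the gap in seconds per lap.

    A negative value means the chasing car is closing in.
    """
    n = len(gaps)
    if n < 2:
        raise ValueError("need at least two laps of gap data")
    laps = range(n)
    lap_mean = mean(laps)
    gap_mean = mean(gaps)
    numerator = sum((lap - lap_mean) * (gap - gap_mean) for lap, gap in zip(laps, gaps))
    denominator = sum((lap - lap_mean) ** 2 for lap in laps)
    return numerator / denominator


def laps_until_caught(gaps):
    """Estimate the laps left until the gap reaches zero, or None if it is not closing."""
    slope = gap_trend(gaps)
    if slope >= 0:
        return None
    return gaps[-1] / -slope


# Hypothetical gap (seconds) between second and third place over five laps.
recent_gaps = [2.0, 1.75, 1.5, 1.25, 1.0]
estimate = laps_until_caught(recent_gaps)
```

Here the gap shrinks by a quarter of a second per lap, so the sketch estimates that the chasing car would draw level in about four laps if the trend held.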

The third AI system is an original speech frame AI. It draws from the expertise of a professional broadcast commentator by analyzing their speech when covering previous races. Because this AI is linked to both the other two AI systems, it can reproduce live commentary on both the progress of the race and predictions of future developments.
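One simple way to picture how the third system connects to the other two is a set of phrase templates filled in with their output. The Python sketch below does exactly that. It is a deliberately simplified assumption: the interview says the speech frame AI learned from a professional commentator's speech, whereas these templates, the `RaceEvent` class, and the event kinds are hand-written and hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RaceEvent:
    kind: str        # "passing" (from tracking) or "closing" (from prediction)
    car: str
    rival: str = ""
    laps: int = 0


# Hypothetical phrase frames with slots for race details.
TEMPLATES = {
    "passing": "{car} flies past the grandstand!",
    "closing": "{car} is closing in on {rival} and could catch it within {laps} laps!",
}


def render(event):
    """Fill the phrase frame for this event; return an empty string if none fits."""
    template = TEMPLATES.get(event.kind)
    if template is None:
        return ""
    return template.format(car=event.car, rival=event.rival, laps=event.laps)


line = render(RaceEvent(kind="closing", car="Car 32", rival="Car 1", laps=3))
```

An event from the tracking side produces a line about what is happening now, and an event from the prediction side produces a line about what may happen next.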

Since the system has learned from a professional commentator, the AI commentary sounds natural and realistic, not wooden.

We want people who go to the racetrack not only to do so for the commentary but, of course, also to hear the roar of the powerful engines. For that reason, we made sure to balance the volume of the AI commentator’s voice with the actual venue sounds.
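The interview does not say how this balance was achieved. As one possible approach, the Python sketch below measures the loudness of the venue sound and picks a gain that keeps the commentary a fixed margin above it without overwhelming it. The 6 dB margin, the 12 dB cap, and both function names are assumptions chosen for illustration.

```python
import math


def rms_dbfs(samples):
    """RMS level of audio samples in the range [-1, 1], in dB relative to full scale."""
    if not samples:
        raise ValueError("need at least one sample")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(max(rms, 1e-9))  # floor avoids log of zero on silence


def commentary_gain_db(ambient_db, voice_db, margin_db=6.0, max_gain_db=12.0):
    """Gain to apply so the voice sits margin_db above the venue sound.

    The result is clamped to +/- max_gain_db so the commentary never
    drowns out the engines or drops away entirely.
    """
    needed = ambient_db + margin_db - voice_db
    return max(-max_gain_db, min(max_gain_db, needed))


# Hypothetical levels: venue sound at -30 dBFS, commentary at -26 dBFS.
gain = commentary_gain_db(ambient_db=-30.0, voice_db=-26.0)
```

With these example levels the voice needs only a small boost. When engines roar past, the required boost rises but stops at the cap.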

Live TV and radio sports coverage targets people not at an event; Voice Watch targets attendees at games and races

Shimura: When watching sports like baseball and soccer from the stands, there is no need to listen to live commentary because you can understand what is happening, based on visual information.

So, when developing Voice Watch, the team members and I wore eye masks while listening to the sounds at the racing circuit to try to understand what it would be like for visually impaired people sitting in the stands. It felt strange and even scary. Roaring sounds whizzed by unexpectedly, which felt really unpleasant.

However, when I wore the eye mask and listened to the race along with the Voice Watch live commentary, I gradually grew excited on hearing the sounds that I had earlier found unpleasant.

The human brain is truly fascinating. When one hears the roar of the race cars together with live commentary and sees the race in one’s mind’s eye, one again experiences those terrible noises. Yet this time, they are a source of excitement.

Voice Watch was used by visually impaired spectators at an auto race. What was their reaction?

Shimura: Yes, we had visually impaired people try it at Japan’s biggest endurance motorsports race, the Super Taikyu. Their response was most positive. Many told us they had enjoyed the race because of Voice Watch, and that they would be happy to attend sports events if they could use the commentary system. It was really moving for me and everyone on my team to give people who had never attended a motorsports event an opportunity to do so for the first time—and to enjoy it.

People with no visual impairment also tried out the system at the race. They said that using it made the race fun to watch and easy to follow. Although we had created Voice Watch for the visually impaired, in the end we discovered that anyone could enjoy using it while experiencing motorsports in a new way.

AI commentary was generated for a Super Taikyu race, Japan’s biggest endurance motorsport race

Exciting events that lack live commentary

Will Voice Watch be used for other sports?

Shimura: We are now examining various options. We asked the visually impaired people who used Voice Watch at the Super Taikyu race about what kind of sporting events they would like to experience next. I expected them to be interested in big league sports like baseball and soccer but, to my surprise, most answered, “children’s sporting events.”

Many told us that, when they go to their children’s games or school sports day, they don’t know what is going on. They hesitate to ask other parents there about how their child is doing because those moms and dads are busy cheering for their own offspring. This makes going to their child’s school sports day or event a major hurdle for visually impaired parents.

Our response was to have Voice Watch generate live commentary of a children’s running race at a school sports day. A boy’s father, who is visually impaired, enjoyed this event more than ever before because he could hear the commentary about his own child during the race.

AI commentary was generated for a 50-meter running race at an elementary school’s sports day.

Nowadays, there are many sporting events that people are passionate about, but that cannot be covered live by commentators because of the costs involved, just like the children’s sports day. It made me realize how useful Voice Watch could be in the future.

Can Voice Watch be used by a broader range of users?

Shimura: In the same way it provided commentary on the parent’s child running in a race, AI can focus its commentary on specific athletes in a sporting event. This is useful for fans of particular athletes, and I believe it will open up a wide range of possibilities for Voice Watch.

For example, in the case of Toyota’s racing team at the Super Taikyu, the system can generate live commentary specifically for that team’s race cars. In other words, we can use it to produce commentary tailored to meet the wishes of specific fans, rather than for all spectators.

We can also change the language of the live commentary. At an international sporting event attended by spectators from around the world, for example, we could generate live commentary focusing on the participating countries’ athletes in the languages of their respective countries.
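The two ideas above, focusing on chosen athletes and switching language, can be pictured as a filter followed by a choice of phrase template. The Python sketch below is a hypothetical illustration only: the event format, the `personalized_commentary` function, and the two templates are assumptions, not part of Voice Watch.

```python
# Hypothetical phrase templates, one per language code.
TEMPLATES = {
    "en": "{athlete} moves up to position {position}!",
    "ja": "{athlete}が{position}位に浮上！",
}


def personalized_commentary(events, focus, language="en"):
    """Return commentary lines only for the athletes or teams in focus.

    events is a list of dicts with "athlete" and "position" keys;
    focus is the set of names a particular listener wants to follow.
    """
    template = TEMPLATES[language]
    return [
        template.format(athlete=event["athlete"], position=event["position"])
        for event in events
        if event["athlete"] in focus
    ]


# Hypothetical events during a race.
race_events = [
    {"athlete": "Team A", "position": 2},
    {"athlete": "Team B", "position": 5},
]

lines_for_fan = personalized_commentary(race_events, focus={"Team A"}, language="en")
```

A listener following Team A hears only that team's updates, and changing the `language` argument switches the wording without touching the underlying race data.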

At first, our development team aimed to make the AI-generated commentary as realistic as possible. Now, however, we recognize that AI-generated commentary can do many interesting things that human commentators cannot. Therefore, we want to explore such possibilities offered by Voice Watch to create new experiences for spectators.

AI: a means of bringing ideas to fruition

You took part in developing Tuna Scope, an app that uses AI to judge tuna quality based on tail cross sections. It created a buzz some years ago. Were any concepts and technologies used for that applied to Voice Watch?

Shimura: Our approach was the same for both projects, in that we developed them based on a desire to create something new and exciting for society.

Both are powered by AI technology. Is that your starting point for creating new things?

Shimura: That’s how people tend to think about it, but to be honest, the process is a little different for me. I don’t start by asking whether I can create something new by using AI technology. Instead, I start by asking whether I can apply an idea in some way to solve a specific problem that people are facing today. AI comes next. To realize an idea that can solve such problems and excite people, AI is essential.

Since we are members of an advertising company, our work is not driven by technical concepts. Rather, it is based on combining technology and creativity to realize new experiences and provide solutions for people and for society as a whole.

If you want to create something new using AI technologies, you’ll find that there are lots of people around the world who intend to do the same thing. And, as a result, the same kinds of things are produced.

Our approach is to start with an idea and, if AI is needed to realize that idea, we will use it. Because our starting point is different, however, we should be able to continue creating completely new and original types of AI.

We hope to contribute to a better future by discovering challenges currently confronting people, and finding amazing ways to keep providing solutions. That is Dentsu’s strength, and it makes our work very fulfilling.

Voice Watch website: voicewatch-project.com

【Staff】
Creative Director: Kazuhiro Shimura (Dentsu Inc./Future Creative Center)
Art Director: Seri Tanaka (Dentsu Inc./Future Creative Center)
Planner: Susumu Tomita (Dentsu Inc./Future Creative Center)
Planner: Ryo Seki (Dentsu Inc./Future Creative Center)
Business Producer: Masafumi Kodama (Dentsu Inc.)
Data Scientist: Hatsumi Suzuki (Dentsu Digital Inc.)
Data Scientist: Toshiaki Uemura (Dentsu Digital Inc.)
Data Scientist: Samaneh Arzpeima (Dentsu Digital Inc.)
Producer: Yuusuke Michise (Dentsu Live Inc.)
Producer: Hiroki Gedou (Dentsu Live Inc.)
Producer: Masaya Ishii (Dentsu Live Inc.)
Director: Tomoyuki Katou (Dentsu Live Inc.)

Related Link

Voice Watch: How AI Live Comments Can Change Spectator Sports (Japanese language only)