As AI becomes more prevalent, will we still need humans around during a live event broadcast? Paul Milligan sits down with Patrick Daly from Diversified.
You’ll struggle to find anyone who doesn’t agree that all the different parts of the AV industry will rely more and more on AI as time goes on. Tasks that can be automated will be handled by AI. Automation makes life easier, that’s its job, but what happens when you are talking about a live event, with so many variables that can go wrong and so many decisions to be made in real time? Inavate sat down with Patrick Daly, innovation lead for the media business unit at Diversified, to find out if humans will remain a vital part of any live transmission or become just another link in the chain.
Before we answer that, I asked Daly what he thought the role of AI was in media operations right now. It’s a multi-fold answer, he says: “We are predominantly seeing the benefits of AI in the auto-tagging of assets, looking at each frame and discerning objects, brands, people, actions etc. Typically we would have a number of cameras coming into an operational centre and we’d have humans sitting in front of logging machines, with an X-keys panel to do the logging functions quickly, e.g. for a live sports game, indicating goals, the score, penalties and so on. Now, with AI models able to do that visual detection, these models can supply all of that metadata, oftentimes richer, more complete metadata, again on a frame-by-frame basis.” Downstream that metadata can be used, as it traditionally has been, to colour the asset management library, he adds, “to discern where we need to look for highlights, but increasingly feed that metadata into generative AI models that are then able to automatically create highlights.”
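To make that workflow concrete, here is a minimal Python sketch of the frame-by-frame logging Daly describes. Everything in it is illustrative: `detect_frame` is a stand-in for whatever vision model a real facility would run, and the field names are assumptions, not Diversified’s actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class FrameMetadata:
    """Tags recovered from a single video frame."""
    frame_number: int
    timecode: str
    objects: list[str] = field(default_factory=list)  # e.g. "ball", "goalpost"
    people: list[str] = field(default_factory=list)   # recognised players or presenters
    actions: list[str] = field(default_factory=list)  # e.g. "goal", "penalty"
    brands: list[str] = field(default_factory=list)   # sponsor logos in shot

def detect_frame(frame_number: int) -> FrameMetadata:
    """Stand-in for the vision model: a real system would decode the
    frame and run object, face and action detection on it here."""
    return FrameMetadata(
        frame_number=frame_number,
        timecode=f"00:00:{frame_number // 25:02d}:{frame_number % 25:02d}",  # assumes 25 fps
    )

def log_feed(total_frames: int) -> list[FrameMetadata]:
    """Replaces the human logger: every frame gets a metadata record,
    which downstream systems use to colour the asset library."""
    return [detect_frame(n) for n in range(total_frames)]

if __name__ == "__main__":
    library_metadata = log_feed(250)  # ten seconds of one camera feed at 25 fps
    print(f"Logged {len(library_metadata)} frames of metadata")
```

The point of the structure is that the output is the same kind of record a human with an X-keys panel would produce, just denser: one record per frame rather than one per event.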
This is where AI gets very clever: “We can target a particular audience. If we have first- or third-party data on the viewers, their behaviours, their preferences, we can assemble the logged material into personalised clips. If I know you have a preference for soccer, I can assemble soccer and all soccer-adjacent sports highlights and deliver those to you as a recap. In the case of the Olympics, that could be the day’s important games. And maybe leave out some of the floor gymnastics and things we don’t think you’re into, based on your consumer data.”
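The personalisation step might look something like the sketch below, assuming a clip library already tagged by the logging stage and a simple viewer-preference record. The names (`Clip`, `ViewerProfile`, `build_recap`) and the excitement score are hypothetical stand-ins for whatever a real pipeline uses.

```python
from dataclasses import dataclass

@dataclass
class Clip:
    asset_id: str
    sport: str
    excitement: float  # score from the tagging stage; higher = more notable

@dataclass
class ViewerProfile:
    viewer_id: str
    preferred_sports: set[str]  # derived from first- or third-party data

def build_recap(clips: list[Clip], viewer: ViewerProfile, max_clips: int = 5) -> list[Clip]:
    """Assemble a personalised recap: keep clips matching the viewer's
    preferences, rank by excitement, take the top few."""
    matching = [c for c in clips if c.sport in viewer.preferred_sports]
    matching.sort(key=lambda c: c.excitement, reverse=True)
    return matching[:max_clips]

if __name__ == "__main__":
    days_play = [
        Clip("oly-001", "soccer", 0.92),
        Clip("oly-002", "gymnastics", 0.88),
        Clip("oly-003", "futsal", 0.75),  # soccer-adjacent
    ]
    soccer_fan = ViewerProfile("viewer-42", preferred_sports={"soccer", "futsal"})
    for clip in build_recap(days_play, soccer_fan):
        print(clip.asset_id, clip.sport)
```

In Daly’s description, a generative model would then cut and voice the selected clips; the filtering and ranking shown here is the part driven by consumer data.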
How does Daly see the use of AI evolving over the next three to four years? “We’ll start to see an increasing use of Gen AI not only to assemble the highlights. In the case of the Olympics, we used some Gen AI to simulate the known announcer voices as part of assembling the review packages; you can send out a personalised review of the day with the NBC or BBC anchor voice talent on whom we’ve trained one of these Gen AI audio models, and now that announcer is able to speak that personalised summary. Today we have these voice models that we’ve trained; in two years we’ll have multi-modal models that can create the audio and the video assembly; four years from now we’ll be over that horizon, at a point that looks like pure magic from here when it comes to Gen AI.”
At the moment it feels as though AI will be taking a lot of what you could term ‘grunt work’ away from humans, i.e. time-consuming tasks that are not particularly enjoyable to perform. Is that a fair assumption? Absolutely, says Daly; this is where AI will excel. “It’s everyone’s mission right now to create more content with the same or fewer resources. Having the ability, down the long tail of content creation, to just churn out new content 24 hours a day, using either Gen AI or using AI to augment otherwise known workflows in the media supply chain to deliver those artefacts, that’s where my clients increasingly tell me they want to go.”
The downside to AI is always the apocalyptic effect it is predicted to have on the job market: if computers can do everything, why pay humans, who can only work eight hours a day instead of 24, to do the job at all? That fear is misplaced, says Daly. “AI is not to eliminate roles that exist today, it’s to augment those roles and enable those people to create increasingly higher quality content, or a greater quantity of content.”
If you look at high-profile events, he adds, you’ll see a heavier weighting towards human content creators, especially at the final edit stage or in shot selection. “There was a great shot in the Olympics (Belgium’s Remco Evenepoel winning the men’s road race) at the finish line, which was right in front of the Eiffel Tower. They pulled away and you could see the whole Eiffel Tower, with the bike still in the foreground. I find it hard to believe Gen AI or an automation system could have pulled that off without somebody staging it up front in automation. For some of the higher profile events and the more meaningful moments in history, we’re still going to choose to have people making these decisions about what are the important shots, and what needs to be saved for the historical record.”
Where does Daly think the right balance lies between AI and human intervention? It’s a decision you’ll have to make ahead of time, he says: “You have to determine: is this event important enough to send people out to cover it, or can we rely on machines, based on the level of confidence we have today in the systems we have in place, i.e. their ability to reliably capture and reproduce what happened?” So for Daly it’s a decision based on the importance of the event being filmed. “When we do a consulting engagement for our client upfront, we’ll discuss values. Revenue is important, that’s a given, but what else is part of the mission statement for this organisation? What do you want the reach of your message to be? What do you want the impact of your message to be? It’s understanding what those value drivers are for an organisation that will give you the framework for how we’re going to make the decision about what’s important. It’s easy to say there’s millions of dollars riding on this event, let’s put the rigour around the operation. You’ll have more discussion when the values are less objective and more subjective. Some global broadcasters value impact almost more than revenue.”
Does Daly think humans will always be needed in an ‘in case of emergency, break glass’ scenario, or does he think AI could potentially solve problems that happen in live broadcasts, such as camera failure? “For the foreseeable future (four to five years), there’s still a human in the loop on much of this. When you look at how you handle failure scenarios in the system, can AI be brought in for those problems? Potentially. You can get a lot done with a proper monitoring and automation system. If camera three goes down, I can detect whether it’s a failure of the head or the CCU or the router port and, as long as I programme the response, I can take those actions, e.g. remove camera three from the wall, remove it from the camera controller.”
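That pre-programmed response pattern, detect, classify, then act, is straightforward to express in code. The sketch below is illustrative only: `remove_from_multiviewer` and `remove_from_controller` are hypothetical hooks standing in for whatever calls the real control layer exposes.

```python
from enum import Enum, auto

class FaultLocation(Enum):
    CAMERA_HEAD = auto()
    CCU = auto()
    ROUTER_PORT = auto()

def diagnose(head_ok: bool, ccu_ok: bool, port_ok: bool) -> FaultLocation | None:
    """Walk down the signal chain, head first, and report the first fault."""
    if not head_ok:
        return FaultLocation.CAMERA_HEAD
    if not ccu_ok:
        return FaultLocation.CCU
    if not port_ok:
        return FaultLocation.ROUTER_PORT
    return None

def remove_from_multiviewer(camera_id: int) -> None:
    # Stand-in for a real control-system call.
    print(f"camera {camera_id}: dropped from the multiviewer wall")

def remove_from_controller(camera_id: int) -> None:
    # Stand-in for a real control-system call.
    print(f"camera {camera_id}: removed from the camera controller")

def on_camera_fault(camera_id: int, head_ok: bool, ccu_ok: bool, port_ok: bool) -> None:
    """The programmed response Daly describes: only act if a fault is found."""
    fault = diagnose(head_ok, ccu_ok, port_ok)
    if fault is None:
        return
    print(f"camera {camera_id}: fault detected at {fault.name}")
    remove_from_multiviewer(camera_id)
    remove_from_controller(camera_id)

if __name__ == "__main__":
    on_camera_fault(3, head_ok=True, ccu_ok=False, port_ok=True)
```

Note the caveat in the quote: the system only does what was programmed ahead of time; deciding what the responses should be remains a human job.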
You might be forgiven for thinking that, because we are talking about AI, industries using it must be forward-thinking, but Daly feels the broadcast and media and entertainment space has been a laggard on many technologies, even networking. “If I look at where we’re at now, (SMPTE) 2110 running 4K/8K flows across 100Gb links and 400Gb backbones. This comes straight out of the IT space and we’re not the first people using this IT interconnect technology.
“If we look at our media application vendors, and we want to stand up a live production piece of software, so many of them remain monolithic applications that run on Windows.” This is where an integrator can help, and Diversified is looking to help its clients understand the benefits of AI, concludes Daly: “It’s an effort we’re undertaking with our vendor community, to upskill them and start to bring some of the latest technologies to bear, one of which is machine learning, machine intelligence, AI.”