This will help make video searches less painful
No one likes being second. There’s a singular sense of accomplishment that comes from being first. First to get to the moon. First to climb Everest. Or, you know, the simple things, like being first to discover the video of Pizza Rat, the NY Subway pizza-hauling rodent, or @SkeletonTunes, the dancing skeleton taking Twitter by storm.
But how do you find that awesome clip people will want to watch again and again among the hundreds of hours of video put online every minute? Right now, that can be a painfully inefficient process. You can cross your fingers and hope to find something through the online communities you follow. You can do a search on YouTube and hope you stumble onto a gem. Or hope social media delivers you an unseen nugget.
“By the time something comes up as a leader, it’s been widely covered and everyone knows about it,” Marcus Moretti, a product manager at news site Mic.com, told me. “We’re trying to move the conversation forward and surface undiscovered videos, instead of overplayed ones.”
After all, if you’re in the business of breaking news, you don’t want to look like a lemming. Mic.com, a news site for millennials, has been able to whittle down the daily number of videos its staff needs to survey from millions to about 4,000, Moretti told me, and to home in on the ones that are relevant to topics the organization cares about, like Hurricane Joaquin, National Coming Out Day or life in Kunduz, Afghanistan. Mic.com’s recent success streamlining its search for video gems is due to a new partnership with a small New York-based AI company called Dextro. In September, Mic.com started testing Dextro’s new artificial-intelligence product, dubbed “Sight, Sound and Motion,” that adds labels to a video according to what is said and what happens in it, the same way Google now auto-tags your photos based on their content (if you let it).
The Mic.com staff, which set Dextro on videos uploaded through Twitter, used the video-curation tool during the papal visit to the U.S. to find otherwise untagged user-generated videos of Pope Francis. The system also surfaced unseen video from people on the ground at the tragic Hajj stampede last month. None of the videos they’ve found through Dextro has made it into a story yet, according to Moretti, but as the tool improves and the editors get comfortable using it, he thinks it will become an integral part of Mic’s editorial strategy for discovering videos useful for the reporting process, as well as videos that might go viral.
Dextro aims to make the world’s b-roll, that mountain of content you and I produce every day with our phones, tablets and GoPros, more searchable. For instance, Moretti says, the Mic Politics team plans to use the tool during their elections coverage. Political debates and conventions are heavily scripted, which makes for boring, predictable footage. But when politicians are out and about at smaller venues, they may be more spontaneous. If a juicy clip is captured by a smartphone and uploaded to the web, it can translate into millions of views and mentions for a news agency. Without a computer that understands video, it can be harder to find that content among the millions of other clips uploaded to the web each day.
The goal is to get machines to understand “everything a human interacts with when watching a video,” David Luan, Dextro’s CEO, told me. Right now, most video searches rely on tags or summaries that are inconsistent at best and incomplete at worst. It’s tedious and time-intensive to have humans label things. That’s why searching for videos doesn’t always go well. But for machines, this process is much faster. Given the right training, they can survey hundreds of thousands of videos simultaneously and decipher what’s in them. The results aren’t flawless, but a machine can scan more video than any one human could, for a fraction of the cost.
Dextro’s product isn’t perfect yet. I looked through some sample videos uploaded to the Dextro website to demonstrate how it labels footage, and the AI made some pretty bad mistakes, like labeling a video with the tag “human rights” when the woman in it is chatting about how her diet is making her butt big. That’s where the partnership with Mic benefits Dextro. The news site gives Dextro engineers feedback on what works and what doesn’t so that they can improve the product before it launches more widely.
Artificial intelligence that understands video well is important beyond the news industry. Dextro plans to market this product to user-generated video platforms (think Periscope, Vine or Twitch), as well as to content aggregators like Digg and to sectors of the economy that increasingly depend on video, like advertising. Dextro could also be used to better flag inappropriate content, like online bullying, shootings, porn or illegal streams of sporting events.
These days, online video is king. Facebook is experiencing a video-traffic explosion, with 4 billion daily views, according to Fortune. Analysts predict the social network will rake in $1.5 billion from video ad revenues next year, according to The Wall Street Journal. YouTube’s predicted revenues are a whopping $4 billion. With so much video being made, and so much to be made from it, there’s an incentive to get relevant video to the right eyeballs. For example, if AIs can see you’re posting videos of hoverboards, they might be able to better target you with ads. Dextro, whose other computer vision products are being used by security camera monitoring companies, wants to cash in on that game.
Dextro has serious competition, though. Facebook has a team of AI experts working on video analysis who, last year, unveiled software called C3D (get it?) that’s trained to recognize “objects, actions, scenes and other frequently occurring categories in videos,” according to a blog post by Facebook. Google wouldn’t comment on how YouTube search works, but it’s probable they’re taking “advantage of both keywords and content,” says Andrej Karpathy, a computer vision researcher at Stanford who’s done AI work at Google. Google’s and Facebook’s tech would likely be kept within their walled gardens, so companies like Dextro want to make this kind of technology more widely available to others who might not have the money to hire engineers with the chops to build the AI tools they need.
That collaboration between Mic.com’s humans and Dextro’s software speaks to another big trend in AI, in which we help our (future) machine overlords get smarter. With time, that may have implications beyond just better video search or filtering.
“We’re not even aware of all the potential applications and for what we can leverage this information,” said Hossein Mobahi, a computer vision researcher at MIT.
But, he says, it’s not hard to imagine better video analysis being useful for self-driving cars, especially the multi-sensory variety Dextro and tech titans like Google and Facebook are building. When you have both visual and audio signals to process what the road is like, your chances of making a mistake are lower, says Mobahi. A robocar might not “see” a child in its blind spot, but it could pick up her voice and figure out it should stay put, lest it risk hitting her. The more types of data available to computers, the better.
Daniela Hernandez is a senior writer at Fusion. She likes science, robots, pugs, and coffee.