A machine-learning model can identify the action in a video clip and label it, without the help of humans. Humans observe the world through a combination of different modalities, like vision, hearing, and our understanding of language. Machines, on the other hand, interpret the world through data.
Researchers at the Computer Science and Artificial Intelligence Laboratory have developed an artificial intelligence technique that allows machines to learn concepts shared between different modalities such as videos, audio clips, and images. The AI system can learn that a baby crying in a video is related to the spoken word “crying” in an audio clip, for example, and use this knowledge to identify and label actions in a video.
“The main challenge here is, how can a machine align those different modalities? As humans, this is easy for us. We see a car and then hear the sound of a car driving by, and we know these are the same thing. But for machine learning, it is not that straightforward,” says Alexander Liu, a graduate student in the Computer Science and Artificial Intelligence Laboratory and first author of a paper tackling this problem.
The researchers’ technique performs better than other machine-learning methods at cross-modal retrieval tasks, which involve finding a piece of data, like a video, that matches a user’s query given in another form, like spoken language. Their model also makes it easier for users to see why the machine thinks the video it retrieved matches their query.
Rather than encoding data from different modalities onto separate grids, their method employs a shared embedding space where two modalities can be encoded together. This enables the model to learn the relationship between representations from two modalities, like video that shows a person juggling and an audio recording of someone saying “juggling.”
“Just like a Google search, you type in some text and the machine tries to tell you the most relevant things you are searching for. Only we do this in the vector space,” Liu says.
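To make the idea of searching "in the vector space" concrete, here is a minimal sketch of cross-modal retrieval over a shared embedding space. It is an illustration only, not the researchers' model: the three-dimensional vectors are hand-picked stand-ins for what trained encoders would produce, the clip names are hypothetical, and ranking by cosine similarity is an assumption, since the article does not say which similarity measure the system uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, candidates):
    """Return candidate ids ordered from most to least similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), cid)
              for cid, vec in candidates.items()]
    scored.sort(reverse=True)
    return [cid for _, cid in scored]


# Hypothetical video embeddings (assumed output of a video encoder).
video_embeddings = {
    "juggling_clip": [0.9, 0.1, 0.0],
    "crying_baby_clip": [0.1, 0.9, 0.1],
    "car_driving_clip": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of someone saying "juggling"
# (assumed output of an audio encoder mapping into the same space).
spoken_query = [0.8, 0.2, 0.1]

ranking = retrieve(spoken_query, video_embeddings)
print(ranking[0])  # the clip whose vector lies closest to the query
```

Because both modalities are mapped into the same space, a single distance computation is enough to compare a spoken query with a video, and the ranked list shows how close each candidate was to the query.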