Multimodal Video Retrieval and Multimodal Language Modelling: MVRMLM 2024

Hui Wang, Josef Kittler, Mark Gales, Rob Cooper, Maurice Mulvenna, Wing Ng, Yang Hua, Richard Gault, Abbas Haider, Guanfeng Wu

Research output: Contribution to conference › Abstract › peer-review

Abstract

As video content continues to proliferate and many video archives lack suitable metadata, video retrieval, particularly through example-based search, has become increasingly crucial. Existing metadata often fails to meet the needs of specific types of searches, especially when videos contain elements from different modalities, such as visual and audio. Consequently, developing video retrieval methods that can handle multi-modal content is essential. In designing our novel video retrieval framework, Multi-modal Video Search by Examples (MVSE), we focused on accuracy (precision and recall), efficiency (retrieval time in seconds), interactivity, and extensibility. Its key components include advanced data processing and a user-friendly interface aimed at enhancing search effectiveness and user experience. With the advent of Large Language Models (LLMs), the interaction between multimodal data, including image and audio, has been transformed, a significant leap towards the bigger goal of artificial general intelligence. This workshop aims to bring together experts from diverse domains to explore novel ways of multimodal data search, understanding and interaction.
Original language: English
Pages: 1345-1346
Number of pages: 2
Publication status: Published (in print/issue) - 7 Jun 2024
Event: International Conference on Multimedia Retrieval - Phuket, Thailand
Duration: 10 Jun 2024 to 14 Jun 2024
https://icmr2024.org/

Conference

Conference: International Conference on Multimedia Retrieval
Abbreviated title: ICMR
Country/Territory: Thailand
City: Phuket
Period: 10/06/24 to 14/06/24

Bibliographical note

Publisher Copyright:
© 2024 Copyright held by the owner/author(s).

Keywords

  • Deep Learning
  • Information Retrieval
  • interaction
  • Large Language Models
  • Multimodal data retrieval
  • Multimodal data understanding
