This page presents the Text And MultiModal Imagery (TAMMI) project, funded by the JCJC program of the ANR.
In recent years, remote sensing images have become more available than ever. These images contain information that is already used to track climate change, improve security, and understand and manage the environment. However, this data is hard to interpret, and interpretation often involves manual processing. As the volume of data grows, interpretation becomes a limiting factor, affecting both the delay with which information is extracted and the domains in which such data can be used. The data is available, yet a large audience cannot use it. In this project, we aim to make the information contained in multimodal data easier to access and available to a new audience.
To this end, we propose to use natural language as a means to extract information from such data. We propose a generic approach: the learned data representation is not specific to a task. To achieve this objective, a new database will be constructed, targeting tasks such as Visual Question Answering, Image Captioning and Image Query. We will study cross-modal shared representations, with a focus on robustness to missing data. Furthermore, we will aim to enhance the interpretability of predictions made from text and multimodal data through new methodological developments, using Visual Question Answering as a specific example.