Generating Missing Modalities: A Conditional Diffusion and Transformer Approach for Emotion Recognition

He Yu; Chuwen Zhang

doi:10.56028/aetr.15.1.1238.2025

Authors

He Yu
Chuwen Zhang

DOI:

https://doi.org/10.56028/aetr.15.1.1238.2025

Keywords:

Diffusion Model, Transformer, Generative Model, Incomplete Modality, Multimodal Emotion Classification.

Abstract

Multimodal models have significantly advanced emotion recognition by combining information from the text, audio, and visual modalities, and many studies have pushed the boundaries of this field. However, missing modalities remain a major challenge because they hinder a model’s ability to capture and integrate cross-modal interactions effectively. In addition, conventional modality completion approaches often fail to preserve fine-grained details. To address these limitations, we propose a novel modality completion framework based on Conditional Diffusion and Transformer (CDTP). By incorporating three types of prompts and conditions, CDTP enables more detailed representations within and across modalities. Experiments and ablation studies demonstrate that our method substantially enhances emotion recognition performance and exhibits strong robustness in scenarios with missing modalities. The source code will be publicly available at https://github.com/cwzhang689/DPT.
