Abstract: Emotion recognition, a fundamental topic in computer vision, has made tremendous progress, yet recognizing emotions in unconstrained environments remains challenging. Existing methods mainly rely on face, posture, and scene information, but they ignore the uncertainty of individuals in context and do not fully exploit the emotional cues in the scene. To address these problems, a dual-branch network based on body and context cues is proposed. The two branches learn independently, and the emotion classification result is then obtained through early fusion. To handle the uncertainty of the person in context, a body gesture attention mechanism is used to estimate a confidence coefficient and obtain the body feature representation. For the context branch, a spatial attention mechanism and a feature pyramid network are employed to capture emotional cues of different granularities in the scene. Experimental results demonstrate the effectiveness of the proposed method on the EMOTIC dataset.