Abstract: Human behavior is complex and diverse, and it is closely related to contextual information such as scene, appearance, and location. To make efficient, comprehensive use of this information, a multi-person behavior recognition method that integrates scene and interaction features is proposed, in which individual appearance features and scene features are extracted by two separate channels. In the individual channel, an attention module focuses on the regions most relevant to the behavior, and the extracted individual appearance features, combined with location features, are fed into a graph convolutional network for relational reasoning. The graph convolutional network uses cosine similarity to measure the correlation between individual features and combines it with the positional features between individuals for relational reasoning. In the scene channel, scene features are extracted with a ResNet-50 pretrained on the Places365 dataset. Finally, the features output by the individual channel and the scene channel are fused by weighting to obtain behavior recognition results for the group and for every individual. Experimental results on the Collective Activity Dataset (CAD) show that the method improves the accuracy of behavior recognition, reaching 92.29% for group behavior and 78.19% for individual behavior.
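The relational reasoning and fusion steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify how position features enter the graph, so the distance threshold `max_dist` used here to mask distant pairs is an assumption. The softmax row normalization, the parameter-free propagation step (a real graph convolution layer would also apply learned weights), and the fusion weight `alpha` are likewise assumptions, and all function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def relation_graph(features, centers, max_dist):
    """Build a row-normalized relation graph over individuals.

    Edge scores are cosine similarities between appearance features.
    Pairs whose box centers are farther apart than max_dist are masked out
    (assumed way of using position), then each row is softmax-normalized.
    """
    n = len(features)
    graph = []
    for i in range(n):
        exps = []
        for j in range(n):
            if math.dist(centers[i], centers[j]) <= max_dist:
                exps.append(math.exp(cosine_similarity(features[i], features[j])))
            else:
                exps.append(0.0)  # masked: too far apart to interact
        total = sum(exps)  # > 0 because every person is within range of itself
        graph.append([e / total for e in exps])
    return graph


def propagate(graph, features):
    """One parameter-free aggregation step: each person averages its neighbors' features."""
    n, dim = len(features), len(features[0])
    return [
        [sum(graph[i][j] * features[j][d] for j in range(n)) for d in range(dim)]
        for i in range(n)
    ]


def fuse(individual_scores, scene_scores, alpha=0.5):
    """Weighted fusion of class scores from the individual and scene channels."""
    return [alpha * a + (1.0 - alpha) * s for a, s in zip(individual_scores, scene_scores)]


# Toy usage: three people, the third standing far from the other two.
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
centers = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
graph = relation_graph(features, centers, max_dist=2.0)
relational_features = propagate(graph, features)
```

In this toy setup the two nearby people share relation weight equally, while the distant person is connected only to itself, which is the intended effect of combining appearance similarity with position.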