CN111914594A - Group emotion recognition method based on motion characteristics - Google Patents

Info

Publication number
CN111914594A
CN111914594A (application number CN201910383943.XA)
Authority
CN
China
Prior art keywords
features
network
emotion recognition
level
channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910383943.XA
Other languages
Chinese (zh)
Other versions
CN111914594B
Inventor
卿粼波
许盛宇
吴晓红
何小海
滕奇志
周文俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University
Priority to CN201910383943.XA
Publication of CN111914594A
Application granted
Publication of CN111914594B
Status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53Recognition of crowd images, e.g. recognition of crowd congestion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Abstract

The invention provides a group emotion recognition method based on motion characteristics, which analyzes the emotion in scene video sequences with a multi-channel group emotion recognition network. The method comprises the following steps: constructing a multi-channel group emotion recognition network, using the network to extract the low-level motion features of different time segments in parallel, rearranging and fusing the low-level features extracted by each channel along the time dimension, and passing the result through a 3D residual module to obtain global high-level features, thereby realizing group emotion recognition. The method avoids the bias and long processing time of manually extracted features, giving it stronger adaptability. In addition, the multi-channel network extracts features over the long video sequence in temporal order, fully exploiting the temporal correlation between frames; rearranging and fusing the low-level temporal features along the time dimension reduces the coupling between features and improves both the accuracy and the efficiency of group emotion recognition.

Description

Group emotion recognition method based on motion characteristics
Technical Field
The invention relates to emotion recognition in the field of deep learning, and in particular to a group emotion recognition method based on motion characteristics.
Background
Crowd emotion analysis judges the emotional state of a crowd from its behavior, dress and other cues. Video is ubiquitous in daily life, for example in unmanned aerial vehicle surveillance, shared online video and 3D video. Analyzing the emotion of the crowd in a video makes it possible to follow the crowd's emotion and its changes dynamically, which has broad application prospects.
Existing group emotion recognition mainly analyzes the emotions of people in a scene when the subjects are close to the camera. In an era of rapid development, however, analyzing only clearly visible faces and small-group emotions no longer fully satisfies the need to perceive people's emotional states. Research therefore needs not only to move from the individual face to the group, but also from small groups to large-scale crowds far from the camera. With the world's population growing year by year, large-scale gatherings and crowd events are increasingly common, making the emotion analysis of crowds particularly important.
Traditional crowd emotion recognition algorithms mainly use shallow models to extract motion features between video frames. Shallow models (support vector machines, single-layer neural networks, etc.) require manually extracted features, and with a limited number of samples and computing units a shallow structure has difficulty representing a complex model; when the object of study is semantically rich, its generalization ability is clearly insufficient, so shallow structures have inherent limitations. Existing research on crowds focuses mainly on crowd behavior, with little work on crowd emotion, even though the basic type of group motion can reflect the representative mood of the group. Moreover, the features extracted by these traditional algorithms are often too homogeneous, so the resulting analysis is not deep enough. Only a small amount of related work exploits the advantages of deep learning to extract group motion features automatically, enrich the features, and analyze group emotion in video.
Disclosure of Invention
The invention aims to provide a group emotion recognition method based on motion characteristics, which combines deep learning with group emotion in video: it introduces a 3D residual convolutional neural network structure, analyzes the temporal features in group videos to obtain the motion state of the people in the video, and from that further analyzes their emotional information.
For convenience of explanation, the following concepts are first introduced:
Convolutional Neural Network (CNN): a multilayer feedforward neural network whose design is inspired by the mechanisms of the visual nervous system. Each layer is composed of several two-dimensional planes, each neuron on a plane works independently, and the network consists mainly of feature-extraction layers and feature-mapping layers.
3D Residual Module: to circumvent the difficulty of learning an identity mapping directly, the stacked layers are fitted to the residual F(x) = H(x) - x rather than to the desired mapping H(x); the main idea is to subtract the identical part, highlighting the small variations. Replacing the 2D convolutions in the residual module with 3D convolutions yields the 3D residual module.
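As a non-authoritative illustration of the residual idea above, the following NumPy sketch implements a single-kernel 3D residual block; the naive `conv3d`, the single-channel shapes and the ReLU on the residual branch are simplifying assumptions, not the patent's network.

```python
import numpy as np

def conv3d(x, w):
    """Naive 'same'-padded 3D convolution of one feature volume x (T, H, W)
    with one kernel w. Illustration only, not optimized."""
    kt, kh, kw = w.shape
    pad = [(k // 2, k - 1 - k // 2) for k in w.shape]
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    T, H, W = x.shape
    for i in range(T):
        for j in range(H):
            for k in range(W):
                out[i, j, k] = np.sum(xp[i:i + kt, j:j + kh, k:k + kw] * w)
    return out

def residual_block_3d(x, w):
    """y = x + F(x): the layers fit the residual F(x) = H(x) - x,
    so the identity part is carried by the skip connection."""
    return x + np.maximum(conv3d(x, w), 0.0)  # ReLU on the residual branch

x = np.random.rand(4, 8, 8)   # a tiny feature volume (T, H, W)
w = np.zeros((3, 3, 3))       # all-zero kernel: F(x) = 0, block acts as identity
assert np.allclose(residual_block_3d(x, w), x)
```

With a zero kernel the residual branch contributes nothing and the block reduces to the identity, which is exactly the degenerate case the skip connection makes easy to represent.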
The invention specifically adopts the following technical scheme:
A group emotion recognition method based on motion characteristics, characterized by comprising:
a. dividing the long video sequence into temporal segments and extracting the low-level motion features of each segment in a separate channel;
b. analyzing the low-level motion features of the group video with a 3D residual convolutional neural network;
c. rearranging and fusing the motion features of the multi-channel network from step a along the time dimension, and analyzing global high-level features;
the method mainly comprises the following steps:
(1) preprocessing the group scene video sequence by uniformly rescaling it to a resolution of 112 × 112;
(2) dividing the video sequence to be analyzed into 4 short videos and taking the first 4 frames of each short video as network input, yielding low-level motion features over different time segments;
(3) building a multi-channel group emotion recognition network (channels Channel1, Channel2, Channel3 and Channel4) based on the 3D residual convolutional neural network, and extracting the low-level motion features of each short video's time segment;
(4) recombining and fusing the acquired low-level motion features along the time dimension in a fusion module, feeding the combined global low-level features into a 3D residual module, analyzing global high-level features over the long video, and finally classifying them to obtain the group emotion.
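The sampling in steps (1)-(2) can be sketched as follows; `sample_inputs` is a hypothetical helper name, and the 80-frame input length is an assumption for illustration (the patent only fixes the 112 × 112 resolution, the 4 segments and the 4 frames per segment).

```python
import numpy as np

def sample_inputs(video, n_channels=4, frames_per_clip=4):
    """Split a long video (T, H, W, C) into n_channels equal temporal
    segments and take the first frames_per_clip frames of each segment,
    giving one short clip per network channel."""
    T = video.shape[0]
    seg = T // n_channels
    return [video[i * seg : i * seg + frames_per_clip] for i in range(n_channels)]

video = np.zeros((80, 112, 112, 3))   # already preprocessed to 112 x 112
clips = sample_inputs(video)          # four (4, 112, 112, 3) clips
assert len(clips) == 4
assert all(c.shape == (4, 112, 112, 3) for c in clips)
```

Each of the four clips would then be fed to its own channel of the network, so the four low-level feature extractions can run in parallel.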
The beneficial effects of the invention are:
(1) It fully exploits the self-learning capability of deep learning: the machine learns image features automatically, which avoids the bias and inefficiency of manually selected features and gives stronger adaptability.
(2) The original long video sequence is divided into short temporal segments, compressing the data volume while retaining global information and improving network speed and computational efficiency.
(3) A 3D convolutional neural network replaces the 2D convolutional neural network for feature extraction, fully preserving the temporal information between frames, and the 3D residual module optimizes the performance and efficiency of the network.
(4) The motion features extracted by the channels are rearranged and fused along the time dimension; correlated features from the 4 channels are fused together, which reduces the coupling between features, fully mines the temporal correlation of the motion features, and improves the network's group emotion analysis.
(5) Combining deep learning with emotion analysis of group scenes addresses the low accuracy of traditional methods and increases the research value.
Drawings
Fig. 1 shows the composition of the motion-characteristic group emotion recognition network based on a 3D convolutional neural network.
Fig. 2 illustrates how the low-level motion features extracted by the multiple channels are rearranged and fused along the time dimension.
Detailed Description
The present invention is described in further detail below with reference to the drawings and an embodiment. The embodiment only illustrates the invention and should not be construed as limiting its scope; insubstantial modifications and adaptations made by those skilled in the art on the basis of the above disclosure still fall within the scope of the invention.
The group emotion recognition method based on motion characteristics specifically comprises the following steps:
(1) A mixed data set combining the CUHK crowd data set, the UCF data set, the Web data set and the PET2009 data set is used. Each long video in the data set is divided into 4 short videos, and each short video is further split into groups of 4 frames and recombined, forming multiple recombined short video sequences for training and thereby expanding the training set.
(2) The model is first pre-trained on the Kinetics human action video data set; the expanded short video data set is then fed in batches into the 4 channels of the network, and the motion features of each time segment are extracted separately, yielding the corresponding low-level motion features.
(3) The acquired 4-channel low-level motion features are recombined by a short-video spatio-temporal feature fusion module: the low-level motion features from each channel are first split into 4 feature segments, and the correlated feature segments are then stacked together, yielding recombined global low-level features.
(4) The fused global low-level features are fed into the subsequent 3D residual module for further training, yielding global high-level features over the long video, which are finally classified to obtain the group emotion. The network parameters are optimized by back-propagation according to the classification result until the optimal network model is obtained.
(5) The test set is fed into the network to verify the model's performance.
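The split-and-stack recombination described above can be made concrete with a small NumPy sketch; the (time × feature) layout, the segment count and the stacking order here are illustrative assumptions based on the textual description of Fig. 2, not taken from the patent's figures.

```python
import numpy as np

def rearrange_fuse(channel_feats, n_segments=4):
    """Split each channel's low-level feature along time into n_segments
    pieces, then group the correlated pieces (same segment index) from
    all channels together, giving one global low-level feature."""
    pieces = [np.array_split(f, n_segments, axis=0) for f in channel_feats]
    # segment s of every channel is stacked together, in channel order
    return np.concatenate(
        [p for s in range(n_segments) for p in (ch[s] for ch in pieces)],
        axis=0)

# four channels, each with a (4, D) low-level feature (time x feature dim);
# channel c is filled with the constant c so the reordering is visible
feats = [np.full((4, 2), float(c)) for c in range(4)]
fused = rearrange_fuse(feats)
assert fused.shape == (16, 2)
# the first group now holds segment 0 of channels 0..3
assert [float(fused[i, 0]) for i in range(4)] == [0.0, 1.0, 2.0, 3.0]
```

Grouping same-index segments across channels is what brings the temporally correlated features next to each other before the global 3D residual module, which is the stated mechanism for reducing coupling between features.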

Claims (4)

1. A group emotion recognition method based on motion characteristics, characterized by comprising:
a. dividing the long video sequence into temporal segments and extracting the low-level motion features of each segment in a separate channel;
b. analyzing the low-level motion features of the group video with a 3D residual convolutional neural network;
c. rearranging and fusing the motion features of the multi-channel network from step a along the time dimension, and analyzing global high-level features;
the method mainly comprises the following steps:
(1) preprocessing the group scene video sequence by uniformly rescaling it to a resolution of 112 × 112;
(2) dividing the video sequence to be analyzed into 4 short videos and taking the first 4 frames of each short video as network input, yielding low-level motion features over different time segments;
(3) building a multi-channel group emotion recognition network (channels Channel1, Channel2, Channel3 and Channel4) based on the 3D residual convolutional neural network, and extracting the low-level motion features of each short video's time segment;
(4) recombining and fusing the acquired low-level motion features along the time dimension in a fusion module, feeding the combined global low-level features into a 3D residual module, analyzing global high-level features over the long video, and finally classifying them to obtain the group emotion.
2. The group emotion recognition method based on motion characteristics according to claim 1, characterized in that step (2) adopts uniform frame sampling: the video sequence to be analyzed is first divided into 4 short videos and the first 4 frames of each are taken, compressing the video sequence while retaining a degree of global information and improving computational efficiency.
3. The group emotion recognition method based on motion characteristics according to claim 1, characterized in that in step (3) a 3D convolutional neural network replaces a 2D convolutional neural network for feature extraction, fully preserving the temporal information between frames, and the 3D residual module optimizes the performance and efficiency of the network.
4. The group emotion recognition method based on motion characteristics according to claim 1, characterized in that in step (4) the motion features extracted by the 4 channels are rearranged and fused along the time dimension; correlated features from the 4 channels are fused together, which reduces the coupling between features, fully mines the temporal correlation of the motion features, and improves the network's group emotion analysis performance.
CN201910383943.XA 2019-05-08 2019-05-08 Group emotion recognition method based on motion characteristics Active CN111914594B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910383943.XA CN111914594B (en) 2019-05-08 2019-05-08 Group emotion recognition method based on motion characteristics


Publications (2)

Publication Number Publication Date
CN111914594A 2020-11-10
CN111914594B 2022-07-01

Family

ID=73242780

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910383943.XA Active CN111914594B (en) 2019-05-08 2019-05-08 Group emotion recognition method based on motion characteristics

Country Status (1)

Country Link
CN (1) CN111914594B (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090222388A1 (en) * 2007-11-16 2009-09-03 Wei Hua Method of and system for hierarchical human/crowd behavior detection
CN107169426A (en) * 2017-04-27 2017-09-15 广东工业大学 A kind of detection of crowd's abnormal feeling and localization method based on deep neural network
CN107368798A (en) * 2017-07-07 2017-11-21 四川大学 A kind of crowd's Emotion identification method based on deep learning
CN107958260A (en) * 2017-10-27 2018-04-24 四川大学 A kind of group behavior analysis method based on multi-feature fusion
US10089556B1 (en) * 2017-06-12 2018-10-02 Konica Minolta Laboratory U.S.A., Inc. Self-attention deep neural network for action recognition in surveillance videos
CN109299700A (en) * 2018-10-15 2019-02-01 南京地铁集团有限公司 Subway group abnormality behavioral value method based on crowd density analysis


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
卿粼波 et al.: "Group emotion recognition based on a multi-stream CNN-LSTM network", Application Research of Computers, 8 February 2018 *
张严浩: "Group behavior analysis based on structured cognitive computing", China Doctoral Dissertations Full-text Database, Information Science and Technology, 15 January 2018 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112699774A (en) * 2020-12-28 2021-04-23 深延科技(北京)有限公司 Method and device for recognizing emotion of person in video, computer equipment and medium
CN112699785A (en) * 2020-12-29 2021-04-23 中国民用航空飞行学院 Group emotion recognition and abnormal emotion detection method based on dimension emotion model
CN112699785B (en) * 2020-12-29 2022-06-07 中国民用航空飞行学院 Group emotion recognition and abnormal emotion detection method based on dimension emotion model

Also Published As

Publication number Publication date
CN111914594B 2022-07-01

Similar Documents

Publication Publication Date Title
CN108764072B (en) Blood cell subtype image classification method based on multi-scale fusion
Song et al. Temporal–spatial mapping for action recognition
CN111144448A (en) Video barrage emotion analysis method based on multi-scale attention convolutional coding network
CN112669325A (en) Video semantic segmentation method based on active learning
CN109858407B (en) Video behavior recognition method based on multiple information flow characteristics and asynchronous fusion
CN111914594B (en) Group emotion recognition method based on motion characteristics
CN111582122B (en) System and method for intelligently analyzing behaviors of multi-dimensional pedestrians in surveillance video
US20210056357A1 (en) Systems and methods for implementing flexible, input-adaptive deep learning neural networks
CN110795990A (en) Gesture recognition method for underwater equipment
CN110472622B (en) Video processing method and related device, image processing method and related device
CN107992937B (en) Unstructured data judgment method and device based on deep learning
GB2585261A (en) Methods for generating modified images
CN112862828B (en) Semantic segmentation method, model training method and device
WO2021184754A1 (en) Video comparison method and apparatus, computer device and storage medium
CN112132797B (en) Short video quality screening method
CN113392781A (en) Video emotion semantic analysis method based on graph neural network
Mansour et al. Design of integrated artificial intelligence techniques for video surveillance on iot enabled wireless multimedia sensor networks
CN113657272B (en) Micro video classification method and system based on missing data completion
CN111914600A (en) Group emotion recognition method based on space attention model
Aliakbarian et al. Deep action-and context-aware sequence learning for activity recognition and anticipation
CN111401116A (en) Bimodal emotion recognition method based on enhanced convolution and space-time L STM network
Chen et al. Design and implementation of video analytics system based on edge computing
Liu et al. Dap3d-net: Where, what and how actions occur in videos?
Yan et al. Self-supervised regional and temporal auxiliary tasks for facial action unit recognition
WO2023217138A1 (en) Parameter configuration method and apparatus, device, storage medium and product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant