CN114863563A - Gait information-based emotion recognition method and device


Info

Publication number
CN114863563A
Authority
CN
China
Prior art keywords
emotion recognition
target object
motion track
extraction module
training
Prior art date
Legal status
Pending
Application number
CN202210503748.8A
Other languages
Chinese (zh)
Inventor
黎明欣
黄淋
饶宇熹
刘金山
Current Assignee
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Application filed by Industrial and Commercial Bank of China Ltd (ICBC)
Priority to CN202210503748.8A
Publication of CN114863563A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V40/23 - Recognition of whole body movements, e.g. for sport training
    • G06V40/25 - Recognition of walking or running movements, e.g. gait recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features

Abstract

The invention provides a gait information-based emotion recognition method and device, and relates to the technical field of artificial intelligence. The method comprises the following steps: decomposing video data of a target object into an image frame sequence; obtaining a three-dimensional posture feature of the target object based on the image frame sequence and a posture extraction module of an emotion recognition model; obtaining a first motion track feature and a second motion track feature based on the image frame sequence and a motion track feature extraction module of the emotion recognition model; obtaining a fusion feature according to the three-dimensional posture feature, the first motion track feature, the second motion track feature and a feature fusion layer of the emotion recognition model; and obtaining an emotion recognition result according to the fusion feature and an output layer of the emotion recognition model. The emotion recognition model is obtained by training on training video data and the corresponding emotion labels. The device is used for executing the method. The gait information-based emotion recognition method and device provided by the embodiments of the invention improve the accuracy of emotion recognition.

Description

Gait information-based emotion recognition method and device
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a method and a device for emotion recognition based on gait information.
Background
With the rapid development of computer vision technology, human-computer interaction is gradually emphasized, and emotion recognition as an important part of human-computer interaction becomes a research hotspot in the field of computer vision.
In the prior art, a method has been proposed for recognizing emotion from a person's walking gait: a speaker emits an audio signal as a wave source and a microphone collects the signal reflected by the target pedestrian to obtain audio data; the audio data are then processed to cut out the signals containing gait information, from which macroscopic gait features, microscopic gait features and the embedded representation features of various neural networks are extracted; finally, these features are input into a trained classifier to obtain the emotion classification result of the target walker. Because this method extracts the gait information of the person from the reflection of a sound signal, it is easily affected by noise and obstacles, so the collected audio data containing the gait information are inaccurate, which in turn reduces the accuracy of emotion recognition.
Disclosure of Invention
Aiming at the problems in the prior art, embodiments of the present invention provide a method and an apparatus for emotion recognition based on gait information, which can at least partially solve the problems in the prior art.
In a first aspect, the present invention provides a method for emotion recognition based on gait information, including:
acquiring video data of a target object, and decomposing the video data into an image frame sequence;
acquiring three-dimensional posture characteristics of the target object based on the image frame sequence and a posture extraction module of an emotion recognition model;
obtaining a first motion track characteristic and a second motion track characteristic of the target object based on the image frame sequence and a motion track characteristic extraction module of the emotion recognition model;
obtaining a fusion characteristic of the target object according to the three-dimensional posture characteristic, the first motion track characteristic, the second motion track characteristic and a characteristic fusion layer of the emotion recognition model;
obtaining an emotion recognition result of the target object according to the fusion characteristics of the target object and an output layer of the emotion recognition model;
the emotion recognition model comprises a posture extraction module, a motion track characteristic extraction module, a characteristic fusion layer and an output layer; the emotion recognition model is obtained based on training video data and corresponding emotion label training.
In a second aspect, the present invention provides an emotion recognition apparatus based on gait information, including:
the decomposition module is used for acquiring video data of a target object and decomposing the video data into an image frame sequence;
the first feature extraction module is used for obtaining three-dimensional posture features of the target object based on the image frame sequence and a posture extraction module of an emotion recognition model;
the second feature extraction module is used for obtaining a first motion track feature and a second motion track feature of the target object based on the image frame sequence and the motion track feature extraction module of the emotion recognition model;
the feature fusion module is used for obtaining fusion features of the target object according to the three-dimensional posture features of the target object, the first motion track features, the second motion track features and the feature fusion layer of the emotion recognition model;
the recognition module is used for obtaining an emotion recognition result of the target object according to the fusion characteristics of the target object and an output layer of the emotion recognition model;
the emotion recognition model comprises a posture extraction module, a motion track characteristic extraction module, a characteristic fusion layer and an output layer; the emotion recognition model is obtained based on training video data and corresponding emotion label training.
In a third aspect, the present invention provides a computer device, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the computer program to implement the gait information-based emotion recognition method according to any of the above embodiments.
In a fourth aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, the computer program, when executed by a processor, implementing the gait information-based emotion recognition method according to any of the above embodiments.
In a fifth aspect, the present invention provides a computer program product, the computer program product comprising a computer program, the computer program being executed by a processor to implement the method for emotion recognition based on gait information according to any of the above embodiments.
The gait information-based emotion recognition method and device provided by the embodiments of the invention can acquire video data of a target object, decompose the video data into an image frame sequence, obtain a three-dimensional posture feature of the target object based on the image frame sequence and a posture extraction module of an emotion recognition model, obtain a first motion track feature and a second motion track feature of the target object based on the image frame sequence and a motion track feature extraction module of the emotion recognition model, obtain a fusion feature of the target object according to the three-dimensional posture feature, the first motion track feature, the second motion track feature and a feature fusion layer of the emotion recognition model, and obtain an emotion recognition result of the target object according to the fusion feature of the target object and an output layer of the emotion recognition model. Because emotion recognition is performed by using both the posture feature information and the gait time sequence feature information of the human body, the accuracy of emotion recognition is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in describing the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and that other drawings can be obtained from them by those skilled in the art without creative effort. In the drawings:
fig. 1 is a flowchart illustrating a method for emotion recognition based on gait information according to a first embodiment of the present invention.
Fig. 2 is a schematic structural diagram of an emotion recognition model provided in a second embodiment of the present invention.
Fig. 3 is a flowchart illustrating a method for emotion recognition based on gait information according to a third embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a gesture extraction module according to a fourth embodiment of the present invention.
Fig. 5 is a flowchart illustrating a gait information-based emotion recognition method according to a fifth embodiment of the present invention.
Fig. 6 is a schematic structural diagram of a gesture extraction module according to a sixth embodiment of the present invention.
Fig. 7 is a flowchart illustrating a gait information-based emotion recognition method according to a seventh embodiment of the present invention.
Fig. 8 is a flowchart illustrating a gait information-based emotion recognition method according to an eighth embodiment of the present invention.
Fig. 9 is a schematic structural diagram of an emotion recognition apparatus based on gait information according to a ninth embodiment of the present invention.
Fig. 10 is a schematic structural diagram of an emotion recognition apparatus based on gait information according to a tenth embodiment of the present invention.
Fig. 11 is a schematic structural diagram of an emotion recognition apparatus based on gait information according to an eleventh embodiment of the present invention.
Fig. 12 is a schematic structural diagram of an emotion recognition apparatus based on gait information according to a twelfth embodiment of the present invention.
Fig. 13 is a schematic structural diagram of an emotion recognition apparatus based on gait information according to a thirteenth embodiment of the invention.
Fig. 14 is a schematic physical structure diagram of an electronic device according to a fourteenth embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict. According to the technical scheme, the data acquisition, storage, use, processing and the like meet relevant regulations of national laws and regulations.
In order to facilitate understanding of the technical solutions provided in the present application, relevant background is first described. Psychological research shows that emotion can be expressed through gait; for example, a sad person tends to walk slowly with the head lowered. Therefore, the gait information-based emotion recognition method provided by the embodiments of the invention makes full use of the posture feature information and the gait time sequence feature information of the human body to recognize the emotion of a user, so as to improve the accuracy of emotion recognition.
The following describes a specific implementation process of the emotion recognition method based on gait information according to the embodiment of the present invention, taking a server as an execution subject.
Fig. 1 is a schematic flow chart of a gait information-based emotion recognition method according to a first embodiment of the present invention, and as shown in fig. 1, the gait information-based emotion recognition method according to the embodiment of the present invention includes:
s101, acquiring video data of a target object, and decomposing the video data into an image frame sequence;
specifically, the server can acquire video data of a target object, which is walking video data of the target object over a period of time. The server decomposes the video data into an image frame sequence, which is a frame of image arranged in time sequence. Wherein, the target object in the embodiment of the invention is a person.
For example, at a banking outlet, video data of a client entering and leaving the outlet can be collected through a camera, a video segment of the client to be used for emotion recognition can be extracted from the collected video data and sent to the server, and the server receives this video data of the client for emotion recognition.
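The following is a minimal sketch of step S101 under the assumption that OpenCV is used to read the video; the function name and structure are illustrative and not part of the original disclosure:

```python
import cv2

def decompose_video(video_path):
    """Read a walking video and return its frames in time order (illustrative sketch)."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)  # one BGR image array per frame, in chronological order
    cap.release()
    return frames  # the image frame sequence
```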
S102, acquiring three-dimensional posture characteristics of the target object based on the image frame sequence and a posture extraction module of an emotion recognition model;
specifically, the server inputs the image frame sequence to a gesture extraction module of an emotion recognition model, and performs gesture feature extraction on the image frame sequence through the gesture extraction module to obtain a three-dimensional gesture feature of the target object.
The emotion recognition model comprises a posture extraction module, a motion track feature extraction module, a feature fusion layer and an output layer. As shown in fig. 2, the image frame sequence is input into the posture extraction module and the motion track feature extraction module respectively; the output end of the posture extraction module and the output end of the motion track feature extraction module are connected to the input end of the feature fusion layer, the output end of the feature fusion layer is connected to the input end of the output layer, and the output end of the output layer outputs the emotion recognition result.
The emotion recognition model is obtained based on training video data and corresponding emotion label training. The emotion labels are set according to actual needs, such as three types including satisfied type, unsatisfied type and neutral type, and the embodiment of the invention is not limited.
S103, obtaining a first motion track characteristic and a second motion track characteristic of the target object based on the image frame sequence and a motion track characteristic extraction module of the emotion recognition model;
specifically, the server inputs the image frame sequence to a motion trajectory feature extraction module of the emotion recognition model, and motion trajectory feature extraction is performed on the image frame sequence by the motion trajectory feature extraction module, so that a first motion trajectory feature and a second motion trajectory feature of the target object can be obtained.
S104, obtaining fusion characteristics of the target object according to the three-dimensional posture characteristics, the first motion track characteristics, the second motion track characteristics and the feature fusion layer of the emotion recognition model;
specifically, the server inputs the three-dimensional posture characteristic, the first motion trajectory characteristic and the second motion trajectory characteristic of the target object into a characteristic fusion layer of the emotion recognition model, and the three-dimensional posture characteristic, the first motion trajectory characteristic and the second motion trajectory characteristic of the target object are subjected to characteristic fusion in a splicing mode through the characteristic fusion layer, so that the fusion characteristic of the target object can be obtained. Wherein, the execution of step S103 and step S104 has no precedence relationship.
For example, the feature fusion layer sequentially splices the three-dimensional posture feature, the first motion trajectory feature and the second motion trajectory feature in a serial manner to obtain the fusion feature of the target object.
S105, obtaining an emotion recognition result of the target object according to the fusion characteristics of the target object and an output layer of the emotion recognition model;
specifically, the server inputs the fusion features of the target object to an output layer of the emotion recognition model, and the fusion features are classified through the output layer to obtain an emotion recognition result of the target object. The output layer may be a full connection layer, and is set according to actual needs, which is not limited in the embodiments of the present invention.
The gait information-based emotion recognition method provided by the embodiment of the invention can acquire video data of a target object, decompose the video data into an image frame sequence, obtain a three-dimensional posture feature of the target object based on the image frame sequence and a posture extraction module of an emotion recognition model, obtain a first motion track feature and a second motion track feature of the target object based on the image frame sequence and a motion track feature extraction module of the emotion recognition model, obtain a fusion feature of the target object according to the three-dimensional posture feature, the first motion track feature, the second motion track feature and a feature fusion layer of the emotion recognition model, and obtain an emotion recognition result of the target object according to the fusion feature of the target object and an output layer of the emotion recognition model. Because emotion recognition is performed by using both the posture feature information and the gait time sequence feature information of the human body, the accuracy of emotion recognition is improved.
Fig. 3 is a schematic flowchart of an emotion recognition method based on gait information according to a third embodiment of the present invention, and as shown in fig. 3, further, on the basis of the foregoing embodiments, the obtaining a three-dimensional posture feature of the target object by the posture extraction module based on the image frame sequence and the emotion recognition model includes:
s301, obtaining a skeleton diagram of the target object according to the image frame sequence and a posture extraction network of the posture extraction module;
specifically, the server inputs the image frame sequence into a pose extraction network of the pose extraction module, and performs joint point extraction, shutdown point position drawing, and joint point connection on the image frame sequence through the pose extraction network, so as to obtain a skeleton diagram of the target object. The gesture extraction network may include a convolutional gesture machine (CPM), and the CPM is used to extract a joint point.
For example, each frame image in the image frame sequence has a size of 368 x 368. After these frame images are input into the posture extraction network, an output of size n x 3 x 17 can be obtained, where n represents the number of people in the image (since the present application performs recognition on a single person, n is 1), 3 indicates that each joint point in the image is represented by three-dimensional coordinates, and 17 is the number of joint points. The 17 joint points are generated in sequence according to a traversal of the tree structure, the spatial association relationship between the joint points is preserved, and the 17 joint points are connected to obtain the skeleton diagram.
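The following sketch illustrates how the n x 3 x 17 joint output could be connected into a skeleton diagram; the concrete 17-joint ordering and the parent-child edge list are assumptions, since the disclosure does not fix them:

```python
import numpy as np

# Assumed parent-child edges of a 17-joint, pelvis-rooted tree; purely illustrative.
SKELETON_EDGES = [
    (0, 1), (1, 2), (2, 3),           # pelvis -> right hip -> right knee -> right ankle
    (0, 4), (4, 5), (5, 6),           # pelvis -> left hip -> left knee -> left ankle
    (0, 7), (7, 8), (8, 9), (9, 10),  # pelvis -> spine -> thorax -> neck -> head
    (8, 11), (11, 12), (12, 13),      # thorax -> left shoulder -> left elbow -> left wrist
    (8, 14), (14, 15), (15, 16),      # thorax -> right shoulder -> right elbow -> right wrist
]

def build_skeleton(joints_3d):
    """joints_3d: array of shape (3, 17) for a single person (n = 1).
    Returns the joint coordinates and the bone segments connecting them."""
    assert joints_3d.shape == (3, 17)
    bones = [(joints_3d[:, i], joints_3d[:, j]) for i, j in SKELETON_EDGES]
    return joints_3d, bones
```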
S302, obtaining three-dimensional posture characteristics of the target object according to the skeleton diagram of the target object and a characteristic extraction network of the posture extraction module; the gesture extraction module comprises a gesture extraction network and a feature extraction network.
Specifically, the server inputs the skeleton diagram of the target object into the feature extraction network of the gesture extraction module to perform gesture feature extraction, so as to obtain the three-dimensional gesture feature of the target object. The feature extraction network is set according to actual needs, and the embodiment of the present invention is not limited.
The posture extraction module comprises the posture extraction network and the feature extraction network. As shown in fig. 4, the posture extraction network and the feature extraction network are connected in sequence. The feature extraction network may include a spatial structure extraction unit and a temporal relationship extraction unit connected in sequence: the spatial structure extraction unit may be implemented by two sequentially connected one-dimensional convolutional layers and extracts spatial structure features from the skeleton diagram, and the temporal relationship extraction unit may be implemented by a Long Short-Term Memory artificial neural network (LSTM) and further extracts the temporal relationship of the features. The three-dimensional posture feature output by the feature extraction network is a spatio-temporal sequence feature.
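A minimal sketch of such a feature extraction network (two one-dimensional convolutional layers followed by an LSTM) is given below; the channel sizes and the hidden dimension are illustrative assumptions:

```python
import torch
import torch.nn as nn

class PoseFeatureNetwork(nn.Module):
    """Spatial structure unit (two 1-D convolutions) followed by a temporal unit (LSTM)."""
    def __init__(self, in_channels=3 * 17, conv_channels=64, hidden_size=128):
        super().__init__()
        self.spatial = nn.Sequential(  # spatial structure extraction unit
            nn.Conv1d(in_channels, conv_channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(conv_channels, conv_channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.temporal = nn.LSTM(conv_channels, hidden_size, batch_first=True)  # time relationship unit

    def forward(self, skeleton_seq):
        # skeleton_seq: (batch, T, 3 * 17), the flattened joint coordinates of each frame's skeleton diagram
        x = self.spatial(skeleton_seq.transpose(1, 2)).transpose(1, 2)  # convolve along the time axis
        _, (h_n, _) = self.temporal(x)
        return h_n[-1]  # spatio-temporal posture feature of shape (batch, hidden_size)
```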
Fig. 5 is a schematic flowchart of an emotion recognition method based on gait information according to a fifth embodiment of the present invention, and as shown in fig. 5, based on the above embodiments, further, the obtaining a first motion trajectory feature and a second motion trajectory feature of the target object by the motion trajectory feature extraction module based on the image frame sequence and the emotion recognition model includes:
s501, obtaining a contour map corresponding to each frame of image of the image frame sequence according to the image frame sequence and the background extraction layer;
specifically, the server inputs the image frame sequence into a background extraction layer, and extracts the contour map of the target object from each frame image of the image frame sequence through the background extraction layer, so as to obtain the contour map corresponding to each frame image of the image frame sequence. The background extraction layer may adopt a background extraction algorithm such as a background difference method, an interframe difference method, a ViBe, and the like.
S502, acquiring a gait sequence diagram according to a contour map corresponding to each frame image of the image frame sequence and a gait sequence layer;
specifically, the server inputs a contour map corresponding to each frame image of the image frame sequence into a gait time sequence layer, and stacks each frame image through the gait time sequence layer, so that a gait time sequence chart can be obtained.
For example, for the contour map of the height H and the width W corresponding to each frame image of the image frame sequence, the gait sequence layer stacks the contour maps of the height H and the width W corresponding to each frame image, and a gait sequence diagram of [ T, H, W ] dimension can be obtained, where T represents the time length of the image frame sequence.
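Stacking the contour maps into the [T, H, W] gait sequence diagram is then a single array operation, for example:

```python
import numpy as np

def stack_gait_sequence(contour_maps):
    """Stack T contour maps of height H and width W into a [T, H, W] gait sequence diagram."""
    return np.stack(contour_maps, axis=0)  # shape (T, H, W)
```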
S503, acquiring a motion track of a first part of the target object as a first motion track and acquiring a motion track of a second part of the target object as a second motion track according to the gait sequence diagram and the slicing layer;
specifically, the server inputs the gait sequence diagram into a slicing layer, acquires a motion trail of a first part of the target object from the gait sequence diagram through the slicing layer as a first motion trail, and acquires a motion trail of a second part of the target object from the gait sequence diagram as a second motion trail. Wherein the first portion may be a hand of the target object, and the second portion may be a leg of the target object.
For example, different parts of the target object have different positions in the gait sequence diagram, and corresponding parts of the hand in the gait sequence diagram can be cut according to the positions of the hand of the target object in the gait sequence diagram to form a first motion trail. Similarly, the corresponding part of the leg in the gait sequence diagram can be cut according to the position of the leg of the target object in the gait sequence diagram to form a second motion trail. The positions of different parts of the target object in the gait sequence diagram can be set according to actual needs, and the embodiment of the invention is not limited.
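A hedged sketch of the slicing layer is given below; the row ranges chosen for the hand and leg regions are purely illustrative, since the disclosure only states that the positions are set according to actual needs:

```python
def slice_gait_sequence(gait_seq):
    """gait_seq: array of shape (T, H, W). Returns the hand and leg motion track slices."""
    T, H, W = gait_seq.shape
    # Assumed vertical bands: hands roughly at mid-body height, legs in the lower part of the silhouette.
    hand_track = gait_seq[:, int(0.35 * H):int(0.65 * H), :]  # first motion track (hand region)
    leg_track = gait_seq[:, int(0.60 * H):, :]                # second motion track (leg region)
    return hand_track, leg_track
```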
S504, extracting a network according to the first motion track, the second motion track and the motion characteristics to obtain the first motion track characteristics and the second motion track characteristics;
specifically, the server inputs the first motion trajectory into the motion feature extraction network, and performs feature extraction from the first motion trajectory through the motion feature extraction network, so as to obtain the first motion feature trajectory. And the server inputs the second motion track into the motion feature extraction network, and performs feature extraction on the second motion track through the motion feature extraction network to obtain the second motion feature track. The motion feature extraction network may be established based on a Convolutional Neural network (CNN for short).
The motion track feature extraction module comprises a background extraction layer, a gait time sequence layer, a slicing layer and a motion feature extraction network which are connected in sequence. As shown in fig. 6, the image frame sequence is input into the background extraction layer, the output end of the background extraction layer is connected to the input end of the gait time sequence layer, the output end of the gait time sequence layer is connected to the input end of the slicing layer, and the output end of the slicing layer is connected to the input end of the motion feature extraction network.
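A minimal CNN-based motion feature extraction network could look like the following sketch; the layer and feature sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class MotionFeatureNetwork(nn.Module):
    """CNN mapping a (T, H, W) motion track slice to a fixed-length motion track feature."""
    def __init__(self, in_frames, feat_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_frames, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling keeps the output size independent of H and W
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, track):
        # track: (batch, T, H, W); the T stacked contour maps are treated as input channels
        return self.fc(self.conv(track).flatten(1))  # motion track feature of shape (batch, feat_dim)
```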
Fig. 7 is a schematic flowchart of a gait information-based emotion recognition method according to a seventh embodiment of the present invention, and as shown in fig. 7, on the basis of the foregoing embodiments, the step of obtaining the emotion recognition model based on training video data and corresponding emotion labels further includes:
s701, acquiring training video data and corresponding emotion labels, wherein the training video data comprises a plurality of video segments, and each video segment corresponds to one emotion label;
specifically, walking video segments of different people can be collected as training video data, and each segment of video segment included in the training video data is classified and labeled manually, so that an emotion label corresponding to each video segment is obtained. The server may obtain training video data and corresponding emotion labels. The emotion label is set according to actual needs, and the embodiment of the invention is not limited.
For example, a camera may be installed at a doorway of a business hall of a banking outlet to collect gait information of a customer, a video clip may be captured from collected video data of different customers as training video data, and an expert classifies and tags emotions of the customer according to information such as expressions of the customer in the video clip to obtain emotion tags corresponding to each video clip included in the training video data.
S702, decomposing each video segment into a corresponding image frame sequence;
specifically, the server decomposes each video segment included in the training video data into an image frame sequence corresponding to each video segment. The specific implementation process of this step is similar to the obtaining process of the image frame sequence in step S101, and is not described herein again.
S703, training a preset model according to the image frame sequence corresponding to each video clip and the corresponding emotion label to obtain the emotion recognition model; the preset model comprises a gesture extraction module, a motion track feature extraction module, a feature fusion layer and an output layer.
Specifically, the server trains the preset model according to the image frame sequence corresponding to each video clip and the corresponding emotion label, so as to obtain the emotion recognition model. The preset model corresponds to the emotion recognition model and likewise comprises a posture extraction module, a motion track feature extraction module, a feature fusion layer and an output layer. It can be understood that both the preset model and the emotion recognition model contain model parameters for the posture extraction module, the motion track feature extraction module, the feature fusion layer and the output layer, but the parameter values of the two models are different.
Fig. 8 is a schematic flowchart of a method for emotion recognition based on gait information according to an eighth embodiment of the present invention, and as shown in fig. 8, on the basis of the foregoing embodiments, further, the training a preset model according to an image frame sequence corresponding to each video clip and a corresponding emotion label, and the obtaining the emotion recognition model includes:
s801, acquiring training three-dimensional posture characteristics corresponding to the video clip based on an image frame sequence corresponding to the video clip and a posture extraction module included by the preset model;
specifically, the server inputs the image frame sequence corresponding to the video clip into the gesture extraction module of the preset model, and performs gesture feature extraction on the image frame sequence corresponding to the video clip through the gesture extraction module of the preset model to obtain the training three-dimensional gesture feature corresponding to the video clip. The gesture extraction module of the preset model comprises a gesture extraction network and a feature extraction network.
S802, obtaining a first training motion track characteristic and a second training motion track characteristic corresponding to the video segment based on the image frame sequence corresponding to the video segment and a motion track characteristic extraction module included by the preset model;
specifically, the server inputs the image frame sequence corresponding to the video segment into the motion trajectory feature extraction module of the preset model, and the motion trajectory feature extraction module of the preset model extracts the motion trajectory feature of the image frame sequence corresponding to the video segment, so as to obtain a first training motion trajectory feature and a second training motion trajectory feature of the video segment.
S803, obtaining training fusion characteristics according to the training three-dimensional posture characteristics, the first training motion track characteristics, the second training motion track characteristics and the characteristic fusion layer of the preset model corresponding to the video clip;
specifically, the server inputs the training three-dimensional posture characteristic, the first training motion trajectory characteristic and the second training motion trajectory characteristic corresponding to the video segment into a characteristic fusion layer of the preset model, and performs characteristic fusion on the training three-dimensional posture characteristic, the first training motion trajectory characteristic and the second training motion trajectory characteristic of the video segment in a splicing manner through the characteristic fusion layer, so as to obtain a fusion characteristic corresponding to the video segment.
S804, obtaining a training output result corresponding to the video clip according to the training fusion feature corresponding to the video clip and the output layer of the preset model.
Specifically, the server inputs the fusion feature corresponding to the video clip into the output layer of the preset model, and the output layer classifies the fusion feature to obtain the training output result corresponding to the video clip, where the training output result indicates the emotion classification of the video clip. The model parameters of the preset model are then updated according to the difference between the training output result corresponding to the video clip and the emotion label corresponding to the video clip. The output layer may comprise a loss function, which is selected according to actual needs, for example a cross-entropy loss function, and the embodiments of the present invention are not limited in this respect.
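A hedged sketch of the training procedure described in this and the preceding embodiment is given below; the optimiser, learning rate and data loader are illustrative assumptions, and cross-entropy is taken from the disclosure as one possible loss function:

```python
import torch
import torch.nn as nn

def train_emotion_model(preset_model, data_loader, epochs=10, lr=1e-3):
    """Train the preset model on (image frame sequence, emotion label) pairs."""
    criterion = nn.CrossEntropyLoss()  # loss between the training output result and the emotion label
    optimizer = torch.optim.Adam(preset_model.parameters(), lr=lr)
    preset_model.train()
    for _ in range(epochs):
        for frame_sequences, emotion_labels in data_loader:
            logits = preset_model(frame_sequences)   # training output result for each video clip
            loss = criterion(logits, emotion_labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                         # update the model parameters of the preset model
    return preset_model  # its parameters now constitute the trained emotion recognition model
```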
On the basis of the foregoing embodiments, further, the decomposing the video data into a sequence of image frames includes:
each frame of image obtained from the video data is cropped and scaled, and the resulting frames of the same pixel size form the image frame sequence.
Specifically, after the server decomposes a frame of image from the video data, the frame may be cropped and scaled into an image of a preset pixel size; the frames of the same pixel size obtained from the video data constitute the image frame sequence, that is, all frames in the image frame sequence have the same size. The preset pixel size is set according to actual needs, for example 368 x 368 pixels, which is not limited in the embodiments of the present invention.
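A minimal sketch of this cropping and scaling step, assuming a centre crop and the 368 x 368 example size mentioned above:

```python
import cv2

def crop_and_scale(frame, size=368):
    """Centre-crop a frame to a square and scale it to the preset pixel size."""
    h, w = frame.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    square = frame[top:top + side, left:left + side]
    return cv2.resize(square, (size, size))  # every frame in the sequence ends up the same size
```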
In addition to the above embodiments, the first motion trajectory feature is a motion trajectory feature corresponding to a hand of the target object, and the second motion trajectory feature is a motion trajectory feature corresponding to a leg of the target object.
Specifically, the first motion trail feature is obtained based on motion trail extraction of the hand of the target object and is a motion trail feature corresponding to the hand of the target object. The second motion trail feature is obtained by extracting the motion trail of the leg of the target object and is the motion trail feature corresponding to the leg of the target object.
The gait information-based emotion recognition method provided by the embodiments of the invention combines the spatio-temporal posture sequence features of a pedestrian with the motion track features of the hands and legs, which move noticeably during walking, to recognize emotion, and compared with traditional approaches based on faces, voiceprints and the like, it has advantages such as being contact-free and working at a long distance.
Fig. 9 is a schematic structural diagram of a gait information-based emotion recognition apparatus according to a ninth embodiment of the present invention, and as shown in fig. 9, the gait information-based emotion recognition apparatus according to the embodiment of the present invention includes a decomposition module 901, a first feature extraction module 902, a second feature extraction module 903, a feature fusion module 904, and a recognition module 905, where:
the decomposition module 901 is configured to obtain video data of a target object and decompose the video data into an image frame sequence; the first feature extraction module 902 is configured to obtain a three-dimensional pose feature of the target object based on the image frame sequence and a pose extraction module of an emotion recognition model; the second feature extraction module 903 is configured to obtain a first motion trajectory feature and a second motion trajectory feature of the target object based on the image frame sequence and a motion trajectory feature extraction module of the emotion recognition model; the feature fusion module 904 is configured to obtain a fusion feature of the target object according to the three-dimensional posture feature of the target object, the first motion trajectory feature, the second motion trajectory feature, and the feature fusion layer of the emotion recognition model; the recognition module 905 is configured to obtain an emotion recognition result of the target object according to the fusion feature of the target object and an output layer of the emotion recognition model; the emotion recognition model comprises a posture extraction module, a motion track characteristic extraction module, a characteristic fusion layer and an output layer; the emotion recognition model is obtained based on training video data and corresponding emotion label training.
Specifically, the decomposition module 901 can acquire video data of a target object, namely walking video data of the target object over a period of time. The decomposition module 901 decomposes the video data into an image frame sequence, i.e., a sequence of image frames arranged in chronological order. The target object in the embodiments of the invention is a person.
The first feature extraction module 902 inputs the image frame sequence to a gesture extraction module of the emotion recognition model, and performs gesture feature extraction on the image frame sequence through the gesture extraction module to obtain a three-dimensional gesture feature of the target object.
The second feature extraction module 903 inputs the image frame sequence to a motion trajectory feature extraction module of the emotion recognition model, and performs motion trajectory feature extraction on the image frame sequence through the motion trajectory feature extraction module, so as to obtain a first motion trajectory feature and a second motion trajectory feature of the target object.
The feature fusion module 904 inputs the three-dimensional posture feature, the first motion trajectory feature and the second motion trajectory feature of the target object into a feature fusion layer of the emotion recognition model, and performs feature fusion on the three-dimensional posture feature, the first motion trajectory feature and the second motion trajectory feature of the target object in a splicing manner through the feature fusion layer, so as to obtain a fusion feature of the target object.
The recognition module 905 inputs the fusion features of the target object into an output layer of the emotion recognition model, and classifies the fusion features through the output layer to obtain an emotion recognition result of the target object. The output layer may be a full connection layer, and is set according to actual needs, which is not limited in the embodiments of the present invention.
The gait information-based emotion recognition device provided by the embodiments of the invention can acquire video data of a target object, decompose the video data into an image frame sequence, obtain a three-dimensional posture feature of the target object based on the image frame sequence and a posture extraction module of an emotion recognition model, obtain a first motion track feature and a second motion track feature of the target object based on the image frame sequence and a motion track feature extraction module of the emotion recognition model, obtain a fusion feature of the target object according to the three-dimensional posture feature, the first motion track feature, the second motion track feature and a feature fusion layer of the emotion recognition model, and obtain an emotion recognition result of the target object according to the fusion feature of the target object and an output layer of the emotion recognition model. Because emotion recognition is performed by using both the posture feature information and the gait time sequence feature information of the human body, the accuracy of emotion recognition is improved.
Fig. 10 is a schematic structural diagram of an emotion recognition apparatus based on gait information according to a tenth embodiment of the present invention, and as shown in fig. 10, on the basis of the foregoing embodiments, further, the first feature extraction module 902 includes a first extraction unit 9021 and a second extraction unit 9022, where:
the first extraction unit 9021 is configured to obtain a skeleton diagram of the target object according to the image frame sequence and a posture extraction network of the posture extraction module; the second extraction unit 9022 is configured to obtain a three-dimensional posture feature of the target object according to the skeleton diagram of the target object and a feature extraction network of the posture extraction module; the gesture extraction module comprises a gesture extraction network and a feature extraction network.
Fig. 11 is a schematic structural diagram of an emotion recognition apparatus based on gait information according to an eleventh embodiment of the present invention, and as shown in fig. 11, the second feature extraction module 903 includes a third extraction unit 9031, an obtaining unit 9032, a segmentation unit 9033, and a fourth extraction unit 9034, where:
a third extraction unit 9031 obtains a contour map corresponding to each frame image of the image frame sequence according to the image frame sequence and the background extraction layer; the obtaining unit 9032 is configured to obtain a gait sequence diagram according to a contour map corresponding to each frame image of the image frame sequence and a gait sequence layer; the segmentation unit 9033 is configured to acquire a motion trajectory of a first portion of the target object as a first motion trajectory and acquire a motion trajectory of a second portion of the target object as a second motion trajectory according to the gait sequence diagram and the segmentation layer; a fourth extraction unit 9034 is configured to extract a network according to the first motion trajectory, the second motion trajectory, and the motion feature, to obtain the first motion trajectory feature and the second motion trajectory feature; the motion trail feature extraction module comprises a background extraction layer, a gait time sequence layer, a cutting layer and a motion feature extraction network which are sequentially connected.
Fig. 12 is a schematic structural diagram of an emotion recognition apparatus based on gait information according to a twelfth embodiment of the present invention, as shown in fig. 12, on the basis of the foregoing embodiments, further, the emotion recognition apparatus based on gait information according to the embodiment of the present invention further includes an obtaining module 906, a preprocessing module 907, and a training module 908, where:
the obtaining module 906 is configured to obtain training video data and corresponding emotion labels, where the training video data includes a plurality of video segments, and each video segment corresponds to one emotion label; the pre-processing module 907 is used for decomposing each video segment into a corresponding image frame sequence; the training module 908 is configured to train a preset model according to the image frame sequence corresponding to each video segment and the corresponding emotion label to obtain the emotion recognition model; the preset model comprises a gesture extraction module, a motion track feature extraction module, a feature fusion layer and an output layer.
Fig. 13 is a schematic structural diagram of an emotion recognition apparatus based on gait information according to a thirteenth embodiment of the present invention, and as shown in fig. 13, on the basis of the foregoing embodiments, the training module 908 further includes a first feature extraction unit 9081, a second feature extraction unit 9082, a feature fusion unit 9083, and a classification unit 9084, where:
the first feature extraction unit 9081 is configured to obtain a training three-dimensional pose feature based on the image frame sequence corresponding to the video clip and a pose extraction module included in the preset model; the second feature extraction unit 9082 is configured to obtain a first training motion trajectory feature and a second training motion trajectory feature corresponding to the video segment based on the image frame sequence corresponding to the video segment and a motion trajectory feature extraction module included in the preset model; the feature fusion unit 9083 is configured to obtain a training fusion feature according to the training three-dimensional posture feature, the first training motion trajectory feature, the second training motion trajectory feature, and the feature fusion layer of the preset model, which correspond to the video segment; the classification unit 9084 is configured to obtain a training output result corresponding to the video clip according to the training fusion feature corresponding to the video clip and the output layer of the preset model.
On the basis of the foregoing embodiments, further, the decomposition module 901 is specifically configured to:
and obtaining each frame of image from the video data to perform cutting and scaling, and obtaining each frame of image with the same pixel size to form the image frame sequence.
In addition to the above embodiments, the first motion trajectory feature is a motion trajectory feature corresponding to a hand of the target object, and the second motion trajectory feature is a motion trajectory feature corresponding to a leg of the target object.
The embodiment of the apparatus provided in the embodiment of the present invention may be specifically configured to execute the processing flows of the above method embodiments, and the functions of the apparatus are not described herein again, and refer to the detailed description of the above method embodiments.
It should be noted that the emotion recognition method and apparatus based on gait information provided in the embodiments of the present invention can be used in the financial field, and can also be used in any technical field other than the financial field.
Fig. 14 is a schematic physical structure diagram of an electronic device according to a fourteenth embodiment of the present invention, and as shown in fig. 14, the electronic device may include: a processor (processor)1401, a communication Interface (Communications Interface)1402, a memory (memory)1403, and a communication bus 1404, wherein the processor 1401, the communication Interface 1402, and the memory 1403 communicate with each other via the communication bus 1404. The processor 1401 may call logical instructions in the memory 1403 to perform the following method: acquiring video data of a target object, and decomposing the video data into an image frame sequence; acquiring three-dimensional posture characteristics of the target object based on the image frame sequence and a posture extraction module of an emotion recognition model; obtaining a first motion track characteristic and a second motion track characteristic of the target object based on the image frame sequence and a motion track characteristic extraction module of the emotion recognition model; obtaining a fusion characteristic of the target object according to the three-dimensional posture characteristic, the first motion track characteristic, the second motion track characteristic and a characteristic fusion layer of the emotion recognition model; obtaining an emotion recognition result of the target object according to the fusion characteristics of the target object and an output layer of the emotion recognition model; the emotion recognition model comprises a posture extraction module, a motion track characteristic extraction module, a characteristic fusion layer and an output layer; the emotion recognition model is obtained based on training video data and corresponding emotion label training.
In addition, the logic instructions in the memory 1403 can be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The present embodiment discloses a computer program product comprising a computer program stored on a computer-readable storage medium, the computer program comprising program instructions, which when executed by a computer, enable the computer to perform the method provided by the above-mentioned method embodiments, for example, including: acquiring video data of a target object, and decomposing the video data into an image frame sequence; acquiring three-dimensional posture characteristics of the target object based on the image frame sequence and a posture extraction module of an emotion recognition model; obtaining a first motion track characteristic and a second motion track characteristic of the target object based on the image frame sequence and a motion track characteristic extraction module of the emotion recognition model; obtaining a fusion characteristic of the target object according to the three-dimensional posture characteristic, the first motion track characteristic, the second motion track characteristic and a characteristic fusion layer of the emotion recognition model; obtaining an emotion recognition result of the target object according to the fusion characteristics of the target object and an output layer of the emotion recognition model; the emotion recognition model comprises a posture extraction module, a motion track characteristic extraction module, a characteristic fusion layer and an output layer; the emotion recognition model is obtained based on training video data and corresponding emotion label training.
The present embodiment provides a computer-readable storage medium, which stores a computer program, where the computer program causes the computer to execute the method provided by the above method embodiments, for example, the method includes: acquiring video data of a target object, and decomposing the video data into an image frame sequence; acquiring three-dimensional posture characteristics of the target object based on the image frame sequence and a posture extraction module of an emotion recognition model; obtaining a first motion track characteristic and a second motion track characteristic of the target object based on the image frame sequence and a motion track characteristic extraction module of the emotion recognition model; obtaining a fusion characteristic of the target object according to the three-dimensional posture characteristic, the first motion track characteristic, the second motion track characteristic and a characteristic fusion layer of the emotion recognition model; obtaining an emotion recognition result of the target object according to the fusion characteristics of the target object and an output layer of the emotion recognition model; the emotion recognition model comprises a posture extraction module, a motion track characteristic extraction module, a characteristic fusion layer and an output layer; the emotion recognition model is obtained based on training video data and corresponding emotion label training.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In the description herein, reference to the description of the terms "one embodiment," "a particular embodiment," "some embodiments," "for example," "an example," "a particular example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (11)

1. A gait information-based emotion recognition method is characterized by comprising the following steps:
acquiring video data of a target object, and decomposing the video data into an image frame sequence;
acquiring three-dimensional posture characteristics of the target object based on the image frame sequence and a posture extraction module of an emotion recognition model;
obtaining a first motion track characteristic and a second motion track characteristic of the target object based on the image frame sequence and a motion track characteristic extraction module of the emotion recognition model;
obtaining a fusion characteristic of the target object according to the three-dimensional posture characteristic, the first motion track characteristic, the second motion track characteristic and a characteristic fusion layer of the emotion recognition model;
obtaining an emotion recognition result of the target object according to the fusion characteristics of the target object and an output layer of the emotion recognition model;
the emotion recognition model comprises a posture extraction module, a motion track characteristic extraction module, a characteristic fusion layer and an output layer; the emotion recognition model is obtained based on training video data and corresponding emotion label training.
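Purely as a non-limiting illustration of how the modules recited in claim 1 could be wired together at inference time, the following PyTorch sketch composes placeholder posture and motion track extractors with a feature fusion layer and an output layer. The feature dimensions, the four-class emotion set and the placeholder extractors are assumptions of this sketch and are not taken from the patent.

    import torch
    import torch.nn as nn


    class EmotionRecognitionModel(nn.Module):
        """Composes the four claimed components; all dimensions are illustrative."""

        def __init__(self, pose_dim=128, track_dim=128, num_emotions=4):
            super().__init__()
            # Placeholder sub-modules standing in for the posture extraction module
            # and the motion track characteristic extraction module.
            self.pose_extractor = nn.LazyLinear(pose_dim)
            self.track_extractor_1 = nn.LazyLinear(track_dim)
            self.track_extractor_2 = nn.LazyLinear(track_dim)
            self.fusion = nn.Linear(pose_dim + 2 * track_dim, 128)  # feature fusion layer
            self.output = nn.Linear(128, num_emotions)               # output layer

        def forward(self, frames):
            # frames: (batch, time, channels, height, width) image frame sequence
            flat = frames.flatten(start_dim=1)
            pose_feat = self.pose_extractor(flat)         # stands in for 3-D posture features
            track_feat_1 = self.track_extractor_1(flat)   # first motion track characteristic
            track_feat_2 = self.track_extractor_2(flat)   # second motion track characteristic
            fused = torch.relu(self.fusion(
                torch.cat([pose_feat, track_feat_1, track_feat_2], dim=-1)))
            return self.output(fused)                     # emotion logits

    # Example: an 8-frame 64x64 RGB clip -> probability per assumed emotion class.
    model = EmotionRecognitionModel()
    probs = torch.softmax(model(torch.randn(1, 8, 3, 64, 64)), dim=-1)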
2. The method of claim 1, wherein the acquiring three-dimensional posture characteristics of the target object based on the image frame sequence and a posture extraction module of an emotion recognition model comprises:
obtaining a skeleton diagram of the target object according to the image frame sequence and a posture extraction network of the posture extraction module;
acquiring three-dimensional posture characteristics of the target object according to the skeleton diagram of the target object and a characteristic extraction network of the posture extraction module; the posture extraction module comprises the posture extraction network and the characteristic extraction network.
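One way of reading claim 2 is a two-stage pipeline: a posture extraction network that regresses a per-frame skeleton diagram, followed by a characteristic extraction network over that skeleton sequence. The sketch below assumes 17 joints and a GRU for the second stage; both are illustrative choices, not claimed features.

    import torch
    import torch.nn as nn


    class PostureExtractionModule(nn.Module):
        """Posture extraction network (frames -> skeleton diagram) followed by a
        characteristic extraction network (skeleton diagram -> posture features)."""

        def __init__(self, num_joints=17, feat_dim=128):
            super().__init__()
            # Stand-in for the posture extraction network; a real system would use a
            # 3-D pose estimator here to regress (x, y, z) coordinates per joint.
            self.posture_net = nn.Sequential(nn.Flatten(start_dim=2),
                                             nn.LazyLinear(num_joints * 3))
            # Characteristic extraction network over the per-frame skeletons.
            self.feature_net = nn.GRU(input_size=num_joints * 3,
                                      hidden_size=feat_dim, batch_first=True)

        def forward(self, frames):
            # frames: (batch, time, channels, height, width)
            skeleton = self.posture_net(frames)        # (batch, time, joints * 3)
            _, last_hidden = self.feature_net(skeleton)
            return last_hidden.squeeze(0)              # (batch, feat_dim) posture feature

    posture_feat = PostureExtractionModule()(torch.randn(2, 8, 3, 64, 64))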
3. The method of claim 1, wherein the obtaining a first motion track characteristic and a second motion track characteristic of the target object based on the image frame sequence and a motion track characteristic extraction module of the emotion recognition model comprises:
obtaining a contour map corresponding to each frame image of the image frame sequence according to the image frame sequence and a background extraction layer;
acquiring a gait sequence diagram according to the contour map corresponding to each frame image of the image frame sequence and a gait time sequence layer;
acquiring a motion track of a first part of the target object as a first motion track and a motion track of a second part of the target object as a second motion track according to the gait sequence diagram and a slicing layer;
obtaining the first motion track characteristic and the second motion track characteristic according to the first motion track, the second motion track and a motion characteristic extraction network; the motion track characteristic extraction module comprises the background extraction layer, the gait time sequence layer, the slicing layer and the motion characteristic extraction network which are sequentially connected.
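An illustrative reading of claim 3 is sketched below: the background extraction layer is approximated by frame-difference silhouettes, the slicing layer by splitting each silhouette into an upper (hand) band and a lower (leg) band, and the motion characteristic extraction network by a GRU. Each of these approximations is an assumption of the sketch rather than a claimed detail.

    import torch
    import torch.nn as nn


    class MotionTrackModule(nn.Module):
        """Background extraction layer -> gait sequence diagram -> slicing layer ->
        motion characteristic extraction network, read loosely from claim 3."""

        def __init__(self, height=64, width=64, feat_dim=128):
            super().__init__()
            self.half = height // 2
            # Shared motion characteristic extraction network applied to each track.
            self.motion_net = nn.GRU(input_size=self.half * width,
                                     hidden_size=feat_dim, batch_first=True)

        def forward(self, frames):
            # frames: (batch, time, channels, height, width)
            gray = frames.mean(dim=2)                      # grayscale frames
            # Background extraction layer: crude frame-difference silhouettes.
            diff = (gray[:, 1:] - gray[:, :-1]).abs()
            silhouettes = (diff > 0.1).float()             # gait sequence diagram
            # Slicing layer: upper band taken as the hand region (first motion track)
            # and lower band as the leg region (second motion track) - an assumption.
            upper = silhouettes[:, :, :self.half, :]
            lower = silhouettes[:, :, self.half:, :]
            _, h1 = self.motion_net(upper.flatten(start_dim=2))
            _, h2 = self.motion_net(lower.flatten(start_dim=2))
            return h1.squeeze(0), h2.squeeze(0)            # two motion track characteristics

    track_1, track_2 = MotionTrackModule()(torch.randn(2, 8, 3, 64, 64))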
4. The method of claim 1, wherein the step of training to obtain the emotion recognition model based on training video data and corresponding emotion labels comprises:
acquiring training video data and corresponding emotion labels, wherein the training video data comprises a plurality of video segments, and each video segment corresponds to one emotion label;
decomposing each video segment into a corresponding image frame sequence;
training a preset model according to the image frame sequence corresponding to each video segment and the corresponding emotion label to obtain the emotion recognition model; the preset model comprises a posture extraction module, a motion track characteristic extraction module, a characteristic fusion layer and an output layer.
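For illustration only, the training data of claim 4 could be organised as follows, with each video segment already decomposed into an image frame sequence; the tensor shapes and integer label encoding are assumptions of this sketch.

    import torch
    from torch.utils.data import DataLoader, Dataset


    class GaitEmotionDataset(Dataset):
        """Training video segments, each already decomposed into an image frame
        sequence and paired with a single emotion label."""

        def __init__(self, frame_sequences, labels):
            # frame_sequences: list of (time, channels, height, width) tensors
            # labels: one integer emotion label per segment (label set assumed)
            assert len(frame_sequences) == len(labels)
            self.frame_sequences = frame_sequences
            self.labels = labels

        def __len__(self):
            return len(self.frame_sequences)

        def __getitem__(self, idx):
            return self.frame_sequences[idx], self.labels[idx]

    # Example with two synthetic 8-frame segments labelled 0 and 1.
    dataset = GaitEmotionDataset([torch.randn(8, 3, 64, 64) for _ in range(2)], [0, 1])
    loader = DataLoader(dataset, batch_size=2, shuffle=True)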
5. The method of claim 4, wherein the training of the preset model according to the image frame sequence corresponding to each video segment and the corresponding emotion label to obtain the emotion recognition model comprises:
acquiring training three-dimensional posture characteristics corresponding to the video segment based on the image frame sequence corresponding to the video segment and the posture extraction module included in the preset model;
obtaining a first training motion track characteristic and a second training motion track characteristic corresponding to the video segment based on the image frame sequence corresponding to the video segment and the motion track characteristic extraction module included in the preset model;
obtaining training fusion characteristics corresponding to the video segment according to the training three-dimensional posture characteristics, the first training motion track characteristic and the second training motion track characteristic corresponding to the video segment and the characteristic fusion layer of the preset model;
and obtaining a training output result corresponding to the video segment according to the training fusion characteristics corresponding to the video segment and the output layer of the preset model.
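Claim 5 only recites the forward pass to the training output result; a conventional way to complete the training loop is sketched below, where the cross-entropy loss and the optimiser update are standard assumptions rather than claimed steps, and preset_model is assumed to follow the EmotionRecognitionModel sketch above.

    import torch
    import torch.nn as nn


    def training_step(preset_model, frames, labels, optimizer):
        """One training iteration over a batch of decomposed video segments."""
        logits = preset_model(frames)      # posture + motion tracks -> fusion -> output
        loss = nn.functional.cross_entropy(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    # Example, reusing the sketches above (loss and optimiser are assumptions):
    # model = EmotionRecognitionModel()
    # optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # for frames, labels in loader:
    #     training_step(model, frames, labels, optimizer)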
6. The method of claim 1, wherein the decomposing the video data into an image frame sequence comprises:
obtaining each frame of image from the video data, and cropping and scaling each frame of image to the same pixel size to form the image frame sequence.
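A minimal sketch of claim 6 using OpenCV is given below; the crop box and target pixel size are placeholder values chosen for illustration.

    import cv2
    import numpy as np


    def decompose_video(path, crop_box=(0, 0, 480, 480), size=(224, 224)):
        """Decompose video data into an image frame sequence of uniform pixel size.

        crop_box (x, y, width, height) and size are illustrative defaults only.
        """
        x, y, w, h = crop_box
        capture = cv2.VideoCapture(path)
        frames = []
        ok, frame = capture.read()
        while ok:
            cropped = frame[y:y + h, x:x + w]          # cut the region of interest
            frames.append(cv2.resize(cropped, size))   # scale to the same pixel size
            ok, frame = capture.read()
        capture.release()
        return np.stack(frames) if frames else np.empty((0, *size, 3), dtype=np.uint8)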
7. The method according to any one of claims 1 to 6, wherein the first motion track characteristic is a motion track characteristic corresponding to a hand of the target object, and the second motion track characteristic is a motion track characteristic corresponding to a leg of the target object.
8. An emotion recognition apparatus based on gait information, comprising:
the decomposition module is used for acquiring video data of a target object and decomposing the video data into an image frame sequence;
the first feature extraction module is used for obtaining three-dimensional posture features of the target object based on the image frame sequence and a posture extraction module of an emotion recognition model;
the second feature extraction module is used for obtaining a first motion track feature and a second motion track feature of the target object based on the image frame sequence and the motion track feature extraction module of the emotion recognition model;
the feature fusion module is used for obtaining fusion features of the target object according to the three-dimensional posture features of the target object, the first motion track features, the second motion track features and the feature fusion layer of the emotion recognition model;
the recognition module is used for obtaining an emotion recognition result of the target object according to the fusion characteristics of the target object and an output layer of the emotion recognition model;
the emotion recognition model comprises a posture extraction module, a motion track characteristic extraction module, a characteristic fusion layer and an output layer; the emotion recognition model is obtained based on training video data and corresponding emotion label training.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1 to 7.
11. A computer program product, characterized in that the computer program product comprises a computer program which, when being executed by a processor, carries out the method of any one of claims 1 to 7.
CN202210503748.8A 2022-05-10 2022-05-10 Gait information-based emotion recognition method and device Pending CN114863563A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210503748.8A CN114863563A (en) 2022-05-10 2022-05-10 Gait information-based emotion recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210503748.8A CN114863563A (en) 2022-05-10 2022-05-10 Gait information-based emotion recognition method and device

Publications (1)

Publication Number Publication Date
CN114863563A (en) 2022-08-05

Family

ID=82636995

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210503748.8A Pending CN114863563A (en) 2022-05-10 2022-05-10 Gait information-based emotion recognition method and device

Country Status (1)

Country Link
CN (1) CN114863563A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115205949A (en) * 2022-09-05 2022-10-18 腾讯科技(深圳)有限公司 Image generation method and related device
WO2024051445A1 (en) * 2022-09-05 2024-03-14 腾讯科技(深圳)有限公司 Image generation method and related device

Similar Documents

Publication Publication Date Title
KR101887637B1 (en) Robot system
KR20220097118A (en) Mouth shape synthesis device and method using artificial neural network
CN110532883A (en) On-line tracking is improved using off-line tracking algorithm
CN111768438B (en) Image processing method, device, equipment and computer readable storage medium
Wang et al. Transfer reinforcement learning-based road object detection in next generation IoT domain
KR20120120858A (en) Service and method for video call, server and terminal thereof
CN114863563A (en) Gait information-based emotion recognition method and device
Rwelli et al. Gesture based Arabic sign language recognition for impaired people based on convolution neural network
CN115187704A (en) Virtual anchor generation method, device, equipment and storage medium
Neverova Deep learning for human motion analysis
KR20160049191A (en) Wearable device
CN112738555B (en) Video processing method and device
CN116453024B (en) Video emotion recognition system and method
Datcu et al. Automatic recognition of facial expressions using bayesian belief networks
CN115937853A (en) Document generation method, document generation device, electronic device, and storage medium
CN114818609B (en) Interaction method for virtual object, electronic device and computer storage medium
CN110956599A (en) Picture processing method and device, storage medium and electronic device
CN114898018A (en) Animation generation method and device for digital object, electronic equipment and storage medium
CN114862716A (en) Image enhancement method, device and equipment for face image and storage medium
Usman et al. Skeleton-based motion prediction: A survey
CN112990123B (en) Image processing method, apparatus, computer device and medium
CN115439582A (en) Method for driving avatar model, avatar driving apparatus, and storage medium
CN115147516A (en) Virtual image video generation method and device, computer equipment and storage medium
CN114647361A (en) Touch screen object positioning method and device based on artificial intelligence
Pagariya et al. Facial emotion recognition in videos using hmm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination