CN112101095A - Suicide and violence tendency emotion recognition method based on language and limb characteristics - Google Patents

Suicide and violence tendency emotion recognition method based on language and limb characteristics

Info

Publication number
CN112101095A
Authority
CN
China
Prior art keywords: layer, vector, neural network, text description, LSTM
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010764407.7A
Other languages
Chinese (zh)
Other versions
CN112101095B (en)
Inventor
杜广龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202010764407.7A priority Critical patent/CN112101095B/en
Publication of CN112101095A publication Critical patent/CN112101095A/en
Application granted granted Critical
Publication of CN112101095B publication Critical patent/CN112101095B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08Feature extraction
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a suicide and violence tendency emotion recognition method based on language and limb characteristics. The method comprises the following steps: collecting video and audio with a Kinect, and converting the voice features and visual features extracted from them into text descriptions; fusing the text descriptions through a neural network with a self-organizing mapping layer to obtain a text-description embedding vector; and analyzing suicidal and violent tendencies from the text-description embedding vector using the Softmax function. The invention takes both static body motion and dynamic body motion into consideration and achieves higher efficiency.

Description

Suicide and violence tendency emotion recognition method based on language and limb characteristics
Technical Field
The invention belongs to the field of emotion recognition, and particularly relates to a suicide and violence tendency emotion recognition method based on language and limb characteristics.
Background
Detecting a person's emotional state is useful for preventing self-harm and violent behaviour. Human emotion can be recognized by various means, such as the electrocardiogram, the electroencephalogram, speech and facial expression. Among these signals, physiological signals are widely used for emotion recognition, and in recent years human motion has also become a new feature. Conventional approaches fall into two categories: measuring the subject's physiological indices by contact, or observing the subject's physiological properties by non-contact means. Although a non-invasive approach is preferable, subjects can mask their mood. Technically, both audio and video carry emotional cues (Beijing university Proc., 2006, 5(1): 165-), so a fusion of the two is necessary. Although existing methods have achieved significant results, improvements are still needed.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a suicide and violence tendency emotion recognition method based on language and limb characteristics. A Kinect with an infrared (IR) camera makes facial-image capture robust to illumination. Therefore, information such as speech and body movement is collected with the Kinect. The invention takes the spectral and prosodic characteristics of speech into account to help recognize the emotion carried by the speech content: by extracting prosodic and spectral features, the speech can be converted into text descriptions that include information such as intonation and pace. To describe motion accurately, body motion is divided into static motion and dynamic motion, which are analyzed by a convolutional neural network (CNN) and a bidirectional long short-term memory conditional random field (Bi-LSTM-CRF), respectively. Multi-sensor data fusion requires a reliable fusion method, and fusing such information into text is effective for emotion recognition. Finally, the speech, body-movement and other information is fused into the text description.
The purpose of the invention is realized by at least one of the following technical solutions.
The suicide and violence tendency emotion recognition method based on language and limb characteristics comprises the following steps:
S1, collecting video and audio with a Kinect, and converting the voice features and visual features extracted from them into text descriptions;
S2, fusing the text descriptions through a neural network with a self-organizing mapping layer to obtain a text-description embedding vector;
S3, analyzing suicidal and violent tendencies from the text-description embedding vector using a Softmax function (an end-to-end sketch of these three steps is given after this list).
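By way of illustration only, the following end-to-end Python sketch shows how steps S1-S3 chain together. Every helper in it is a hypothetical stub standing in for a component detailed later in this description (Kinect speech recognition, the BPNN, the CNN, the Bi-LSTM-CRF, the fusion network and the Softmax classifier); none of it is an actual Kinect SDK call, and the returned strings are placeholders.

```python
# Hypothetical stubs for the S1 components; the real modules are described below.
def speech_to_content_text(audio):           # Kinect speech recognition
    return "content text description"

def prosody_to_state_text(audio):            # BPNN over prosodic/spectral features
    return "speech-state text description"

def frame_to_static_text(frame):             # CNN over a selected video frame
    return "static-motion text description"

def skeleton_to_dynamic_text(skeleton):      # Bi-LSTM-CRF over joint sequences
    return "dynamic-motion text description"

def fuse_descriptions(*texts):               # S2: fusion network -> embedding vector x
    return [float(len(t)) for t in texts]    # placeholder embedding

def classify_tendency(x):                    # S3: Softmax over the embedding
    return "no suicidal or violent tendency" # placeholder decision

def recognize(audio, frame, skeleton):
    texts = (frame_to_static_text(frame), skeleton_to_dynamic_text(skeleton),
             prosody_to_state_text(audio), speech_to_content_text(audio))
    return classify_tendency(fuse_descriptions(*texts))

print(recognize(audio=None, frame=None, skeleton=None))
```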
Further, in step S1, the speech features include speech content, prosody and frequency spectrum; the visual characteristics are the limb actions of the human body, and the limb actions are divided into static actions and dynamic actions.
Further, step S1 includes the steps of:
S1.1, the voice content is converted directly into a content text description through the Kinect for Windows SDK v2.0 public preview; the prosodic and spectral features of the speech are converted into a speech-state text description through a back-propagation neural network (BPNN) with a classical structure;
S1.2, a selected single frame of the captured video is processed by a convolutional neural network (CNN) and converted into a static-motion text description; the skeleton joint points are acquired and represented from the Kinect, and their positions are recorded at each moment, finally forming sequential skeleton data; the skeleton-point sequences corresponding to the continuous actions, namely the N predefined actions, are encoded into vectors and processed by a bidirectional long short-term memory conditional random field (Bi-LSTM-CRF) to obtain the action sequence, which is finally classified by a Softmax classifier into the corresponding dynamic-motion text description.
Further, the back-propagation neural network (BPNN) has the following structure: there are n training samples in the training sample space Ω, denoted $(x_k, \hat{y}_k)$, $k = 1, \ldots, n$. The output (i.e. predicted value) of sample k after passing through the network is $y_k = \{y_{k1}, \ldots, y_{kl}\}$; the feature vector $x_k$ of the k-th training sample has dimension m, and the predicted vector $y_k$ and the true-value vector $\hat{y}_k$ both have dimension l. The network has a 3-layer structure: layer 1 is the input layer, layer 3 is the output layer and layer 2 is the hidden layer. The BP algorithm updates every weight in the network by gradient descent; with the batch size set to p, the average sum of squared errors over the batch is used as the objective function, i.e.

$E = \frac{1}{p}\sum_{k=1}^{p}\frac{1}{2}\sum_{q=1}^{l}\left(y_{kq}-\hat{y}_{kq}\right)^{2}$,

where k indexes the samples in the batch and q indexes the nodes of the output layer.
Further, the convolutional neural network (CNN) comprises an input layer, a hidden layer and a fully connected layer, the hidden layer comprising two convolutional layers and two pooling layers.

The formula of a convolutional layer is as follows:

$x_{i}^{l,j} = f\left(\sum_{a} w_{a}^{j}\, x_{i+a}^{l-1} + b_{j}\right)$,

where l denotes the l-th convolutional layer, i denotes the i-th component of the convolution output matrix, and j indexes the corresponding output matrix, ranging from 0 to N, with N the number of convolution output matrices; f is a nonlinear sigmoid-type function; $x_{i}^{l,j}$ is the i-th component of the j-th output matrix of the l-th convolutional layer; $b_{j}$ is the bias of the j-th output matrix; and $w_{a}^{j}$ is the a-th weight of the convolution kernel for the j-th output matrix.
A pooling layer is built with mean pooling; the input of the mean-pooling layer comes from the preceding convolutional layer and its output serves as the input of the next convolutional layer. The calculation formula is as follows:

$y_{i}^{l,j} = \frac{1}{r}\sum_{s=1}^{r} x_{(i-1)r+s}^{l,j}$,

where $y_{i}^{l,j}$ denotes the local output after the pooling operation (taken over a window of size r) and $x^{l,j}$ denotes the output matrix of the preceding convolutional layer.
Further, in the bidirectional long short-term memory conditional random field (Bi-LSTM-CRF), the bidirectional long short-term memory neural network (Bi-LSTM) is given an input sequence $\{x_{1}, x_{2}, \ldots, x_{t}, \ldots, x_{T}\}$, where t denotes the t-th coordinate and T the total number of coordinates. The hidden-layer output is computed as

$h_{t} = \sigma_{h}\left(W_{xh} x_{t} + W_{hh} h_{t-1} + b_{h}\right)$,

where $h_{t}$ is the output of the hidden layer at time t, $W_{xh}$ is the weight from the input layer to the hidden layer, $W_{hh}$ is the weight from the hidden layer to the hidden layer, $b_{h}$ is the bias of the hidden layer, and $\sigma_{h}$ is the activation function. The Bi-LSTM hidden layers strengthen the bilateral relation: the first layer is a forward LSTM and the second layer is a backward LSTM.
Further, step S2 comprises the following step:
S2.1, a long short-term memory (LSTM) neural network connects the fixed-size static-motion, dynamic-motion and speech-state text descriptions into a vector A; the content text description is converted by the word2vec method into a space vector of a certain fixed length, which is embedded into a fixed-size vector B by a long short-term memory (LSTM) neural network, namely the forward LSTM of the bidirectional long short-term memory neural network (Bi-LSTM); vector A and vector B have the same size; vector A and vector B are combined by element-wise multiplication to obtain their cross effect, yielding the text-description embedding vector x, which is then normalized.
Further, in step S3, suicidal and violent tendencies are analyzed from the text-description embedding vector x using the Softmax function, calculated as follows:

$P(j \mid x) = \frac{\exp\left(W_{j} x + b\right)}{\sum_{j'} \exp\left(W_{j'} x + b\right)}$,

where $W_{j}$ is the weight matrix of the j-th emotional-tendency class and b denotes the bias; the emotional-tendency classes are "suicidal and violent emotional tendency" and "no suicidal and violent emotional tendency", respectively.
Compared with the prior art, the invention has the following advantages:
(1) The present invention aligns multimodal data at the text level. The textual intermediate representation and the proposed fusion method form a framework that fuses limb movements and facial expressions. The invention reduces the dimensionality of the limb-action and facial-expression information and unifies the two types of information into a single representation.
(2) The invention takes static body motion and dynamic body motion into consideration, and achieves higher efficiency.
(3) The Kinect is adopted for data acquisition, so that the performance is high and the operation is convenient.
Drawings
FIG. 1 is a flow chart of the suicide and violence tendency emotion recognition method based on language and limb characteristics.
Detailed Description
Specific implementations of the present invention will be further described with reference to the following examples and drawings, but the embodiments of the present invention are not limited thereto.
Example (b):
the suicide and violence tendency emotion recognition method based on language and limb characteristics, as shown in fig. 1, comprises the following steps:
s1, collecting videos and audios by using a Kinect, and respectively converting voice features and visual features extracted from the videos and the audios into text descriptions;
the speech features include speech content, prosody, and frequency spectrum; the visual characteristics are the limb actions of the human body, and the limb actions are divided into static actions and dynamic actions.
Step S1 includes the following steps:
S1.1, the voice content is converted directly into a content text description through the Kinect for Windows SDK v2.0 public preview; the prosodic and spectral features of the speech are converted into a speech-state text description through a back-propagation neural network (BPNN) with a classical structure;
the Back Propagation Neural Network (BPNN) structure is as follows: there are n training samples in the training sample space Ω, which are respectively
Figure BDA0002614118800000041
The output value (i.e. predicted value) of the sample k after passing through the neural network is yk={yk1,...,yklH, the feature vector x of the kth training samplekDimension m, predictor vector ykAnd the true value vector
Figure BDA0002614118800000042
The vector dimensions are all l. The neural network has a 3-layer structure, wherein the 1 st layer is an input layer, the 3 rd layer is an output layer, the 2 nd layer is a hidden layer, the BP algorithm updates each weight in the network by using a gradient descent algorithm, the size of the batch is p, the sum of squared errors is adopted as a calculation formula, the sum of squared errors is used as a target function, namely the target function is
Figure BDA0002614118800000043
k denotes a kth node of the hidden layer and q denotes a qth node of the hidden layer.
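As a minimal illustration of the BPNN just described, the following NumPy sketch implements a 3-layer network (input, hidden, output) trained by mini-batch gradient descent on the averaged sum-of-squared-errors objective. The layer sizes, learning rate and sigmoid activation are illustrative assumptions, not values specified by the invention.

```python
# Minimal 3-layer BPNN sketch: input -> hidden -> output, trained by mini-batch
# gradient descent on E = (1/p) * sum_k 0.5 * sum_q (y_kq - y_hat_kq)^2.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class BPNN:
    def __init__(self, m, hidden, l, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(m, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(scale=0.1, size=(hidden, l))
        self.b2 = np.zeros(l)
        self.lr = lr

    def forward(self, X):
        self.h = sigmoid(X @ self.W1 + self.b1)        # hidden layer
        self.y = sigmoid(self.h @ self.W2 + self.b2)   # output layer (predictions y_k)
        return self.y

    def train_batch(self, X, Y_true):
        """One gradient-descent step on a batch of size p; returns the objective E."""
        p = X.shape[0]
        Y = self.forward(X)
        E = 0.5 * np.sum((Y - Y_true) ** 2) / p        # averaged sum of squared errors
        # Backpropagation through the sigmoid output and hidden layers.
        d_out = (Y - Y_true) * Y * (1 - Y) / p
        d_hid = (d_out @ self.W2.T) * self.h * (1 - self.h)
        self.W2 -= self.lr * self.h.T @ d_out
        self.b2 -= self.lr * d_out.sum(axis=0)
        self.W1 -= self.lr * X.T @ d_hid
        self.b1 -= self.lr * d_hid.sum(axis=0)
        return E

# Usage with random data: an 8-sample batch, m=10 input features, l=3 outputs.
net = BPNN(m=10, hidden=16, l=3)
X, Y = np.random.rand(8, 10), np.random.rand(8, 3)
print(net.train_batch(X, Y))
```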
S1.2, a selected single frame of the captured video is processed by a convolutional neural network (CNN) and converted into a static-motion text description; the convolutional neural network (CNN) comprises an input layer, a hidden layer and a fully connected layer, the hidden layer comprising two convolutional layers and two pooling layers.

The formula of a convolutional layer is as follows:

$x_{i}^{l,j} = f\left(\sum_{a} w_{a}^{j}\, x_{i+a}^{l-1} + b_{j}\right)$,

where l denotes the l-th convolutional layer, i denotes the i-th component of the convolution output matrix, and j indexes the corresponding output matrix, ranging from 0 to N, with N the number of convolution output matrices; f is a nonlinear sigmoid-type function; $x_{i}^{l,j}$ is the i-th component of the j-th output matrix of the l-th convolutional layer; $b_{j}$ is the bias of the j-th output matrix; and $w_{a}^{j}$ is the a-th weight of the convolution kernel for the j-th output matrix.
A pooling layer is built with mean pooling; the input of the mean-pooling layer comes from the preceding convolutional layer and its output serves as the input of the next convolutional layer. The calculation formula is as follows:

$y_{i}^{l,j} = \frac{1}{r}\sum_{s=1}^{r} x_{(i-1)r+s}^{l,j}$,

where $y_{i}^{l,j}$ denotes the local output after the pooling operation (taken over a window of size r) and $x^{l,j}$ denotes the output matrix of the preceding convolutional layer.
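By way of example, the following PyTorch sketch mirrors the CNN structure described above for the static-motion branch: an input layer, two convolutional layers with mean (average) pooling, and a fully connected layer that maps the selected frame to a static-motion label, which is then used as the static-motion text description. The 64x64 single-channel input size, channel counts and the label set are illustrative assumptions, not values taken from the patent.

```python
# Sketch of the static-motion CNN: two conv layers, two mean-pooling layers,
# one fully connected layer.
import torch
import torch.nn as nn

STATIC_LABELS = ["arms crossed", "head lowered", "fist clenched"]  # hypothetical labels

class StaticMotionCNN(nn.Module):
    def __init__(self, num_classes=len(STATIC_LABELS)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),   # first convolutional layer
            nn.Sigmoid(),                                # sigmoid-type nonlinearity f
            nn.AvgPool2d(2),                             # first mean-pooling layer
            nn.Conv2d(8, 16, kernel_size=3, padding=1),  # second convolutional layer
            nn.Sigmoid(),
            nn.AvgPool2d(2),                             # second mean-pooling layer
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 16 * 16, num_classes),        # fully connected layer
        )

    def forward(self, frame):                            # frame: (B, 1, 64, 64)
        return self.classifier(self.features(frame))

# Usage: classify one captured frame and emit its static-motion text description.
frame = torch.rand(1, 1, 64, 64)
logits = StaticMotionCNN()(frame)
print(STATIC_LABELS[logits.argmax(dim=1).item()])
```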
The skeleton joint points are acquired and represented from the Kinect, and their positions are recorded at each moment, finally forming sequential skeleton data; the skeleton-point sequences corresponding to the continuous actions, namely the N predefined actions, are encoded into vectors and processed by a bidirectional long short-term memory conditional random field (Bi-LSTM-CRF) to obtain the action sequence, which is finally classified by a Softmax classifier into the corresponding dynamic-motion text description.
In the bidirectional long short-term memory conditional random field (Bi-LSTM-CRF), the bidirectional long short-term memory neural network (Bi-LSTM) is given an input sequence $\{x_{1}, x_{2}, \ldots, x_{t}, \ldots, x_{T}\}$, where t denotes the t-th coordinate and T the total number of coordinates. The hidden-layer output is computed as

$h_{t} = \sigma_{h}\left(W_{xh} x_{t} + W_{hh} h_{t-1} + b_{h}\right)$,

where $h_{t}$ is the output of the hidden layer at time t, $W_{xh}$ is the weight from the input layer to the hidden layer, $W_{hh}$ is the weight from the hidden layer to the hidden layer, $b_{h}$ is the bias of the hidden layer, and $\sigma_{h}$ is the activation function. The Bi-LSTM hidden layers strengthen the bilateral relation: the first layer is a forward LSTM and the second layer is a backward LSTM.
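For the dynamic-motion branch, the sketch below encodes a Kinect skeleton-joint sequence and passes it through a bidirectional LSTM whose hidden layers follow the update $h_{t} = \sigma_{h}(W_{xh}x_{t} + W_{hh}h_{t-1} + b_{h})$; a linear layer then produces per-step scores for the predefined actions. The CRF transition scoring of the full Bi-LSTM-CRF and the final Softmax mapping to a dynamic-motion text description are only indicated; the joint count, hidden size and action labels are illustrative assumptions.

```python
# Bi-LSTM over skeleton-joint sequences; per-step emission scores for N actions.
import torch
import torch.nn as nn

N_JOINTS, HIDDEN = 25, 64                        # Kinect v2 tracks 25 joints per body
ACTIONS = ["waving fist", "pacing", "striking"]  # hypothetical set of N actions

class DynamicMotionBiLSTM(nn.Module):
    def __init__(self, num_actions=len(ACTIONS)):
        super().__init__()
        # Forward + backward LSTM layers over the encoded skeleton vectors.
        self.bilstm = nn.LSTM(input_size=N_JOINTS * 3, hidden_size=HIDDEN,
                              batch_first=True, bidirectional=True)
        self.emission = nn.Linear(2 * HIDDEN, num_actions)  # per-step label scores

    def forward(self, skeleton_seq):             # (B, T, 75): x, y, z of each joint
        h, _ = self.bilstm(skeleton_seq)         # (B, T, 2*HIDDEN)
        return self.emission(h)                  # (B, T, num_actions); a CRF layer
                                                 # would decode these scores jointly

# Usage: score a 30-frame skeleton sequence and read off the most likely action per
# frame (a stand-in for CRF decoding), which would then be reported as text.
seq = torch.rand(1, 30, N_JOINTS * 3)
scores = DynamicMotionBiLSTM()(seq)
print([ACTIONS[i] for i in scores.argmax(dim=-1)[0].tolist()][:5])
```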
S2, fusing the text descriptions through a neural network with a self-organizing mapping layer to obtain a text-description embedding vector, which comprises the following step:
S2.1, a long short-term memory (LSTM) neural network connects the fixed-size static-motion, dynamic-motion and speech-state text descriptions into a vector A; the content text description is converted by the word2vec method into a space vector of a certain fixed length, which is embedded into a fixed-size vector B by a long short-term memory (LSTM) neural network, namely the forward LSTM of the bidirectional long short-term memory neural network (Bi-LSTM); vector A and vector B have the same size; vector A and vector B are combined by element-wise multiplication to obtain their cross effect, yielding the text-description embedding vector x, which is then normalized.
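The following sketch illustrates step S2.1 under stated assumptions: one LSTM folds the tokenized static-motion, dynamic-motion and speech-state text descriptions into a fixed-size vector A; a forward LSTM folds the word-embedded content text into a vector B of the same size (an nn.Embedding stands in for the word2vec vectors); A and B are multiplied element-wise to capture their cross effect, and the result is normalized to give the text-description embedding vector x. The vocabulary, tokenization and dimensions are assumptions, and the self-organizing mapping layer of the fusion network is not shown.

```python
# Text fusion sketch: descriptions -> vector A, content text -> vector B,
# element-wise product + normalization -> embedding vector x.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, WORD_DIM, EMB_DIM = 1000, 50, 32          # assumed sizes

class TextFusion(nn.Module):
    def __init__(self):
        super().__init__()
        self.word_vec = nn.Embedding(VOCAB, WORD_DIM)                # stand-in for word2vec
        self.lstm_a = nn.LSTM(WORD_DIM, EMB_DIM, batch_first=True)   # descriptions -> A
        self.lstm_b = nn.LSTM(WORD_DIM, EMB_DIM, batch_first=True)   # content text -> B

    def encode(self, lstm, token_ids):
        _, (h_n, _) = lstm(self.word_vec(token_ids))  # last hidden state of the LSTM
        return h_n[-1]                                # (B, EMB_DIM)

    def forward(self, description_ids, content_ids):
        a = self.encode(self.lstm_a, description_ids)  # vector A
        b = self.encode(self.lstm_b, content_ids)      # vector B
        return F.normalize(a * b, dim=-1)              # cross effect, then normalize

# Usage with dummy token-id sequences for the concatenated descriptions and the content text.
fusion = TextFusion()
x = fusion(torch.randint(0, VOCAB, (1, 12)), torch.randint(0, VOCAB, (1, 20)))
print(x.shape)  # torch.Size([1, 32])
```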
S3, analyzing suicidal and violent tendencies from the text-description embedding vector x using a Softmax function, calculated as follows:

$P(j \mid x) = \frac{\exp\left(W_{j} x + b\right)}{\sum_{j'} \exp\left(W_{j'} x + b\right)}$,

where $W_{j}$ is the weight matrix of the j-th emotional-tendency class and b denotes the bias; the emotional-tendency classes are "suicidal and violent emotional tendency" and "no suicidal and violent emotional tendency", respectively.
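Finally, a minimal sketch of step S3, assuming the 32-dimensional embedding from the fusion sketch above: a linear layer whose rows play the role of the per-class weights $W_{j}$ (with bias b), followed by Softmax over the two emotional-tendency classes. The trained weights would come from supervised training, which is not shown.

```python
# Softmax classification of the text-description embedding vector x into two classes.
import torch
import torch.nn as nn

CLASSES = ["suicidal and violent emotional tendency",
           "no suicidal and violent emotional tendency"]

classifier = nn.Linear(32, len(CLASSES))       # rows act as W_j, plus bias b
x = torch.rand(1, 32)                          # text-description embedding vector
probs = torch.softmax(classifier(x), dim=-1)   # P(j | x) = exp(W_j x + b) / sum_j' exp(...)
print(dict(zip(CLASSES, probs[0].tolist())))
```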

Claims (8)

1. The suicide and violence tendency emotion recognition method based on language and limb characteristics is characterized by comprising the following steps of:
S1, collecting video and audio with a Kinect, and respectively converting the voice features and visual features extracted from the video and audio into text descriptions;
S2, fusing the text descriptions through a neural network with a self-organizing mapping layer to obtain a text-description embedding vector;
S3, analyzing suicidal and violent tendencies from the text-description embedding vector using a Softmax function.
2. The suicide and violence tendency emotion recognition method based on language and limb characteristics, as claimed in claim 1, wherein in step S1, the speech characteristics include speech content, prosody and frequency spectrum; the visual characteristics are the limb actions of the human body, and the limb actions are divided into static actions and dynamic actions.
3. The suicide and violence tendency emotion recognition method based on language and limb characteristics, as claimed in claim 2, wherein the step S1 comprises the steps of:
S1.1, the voice content is converted directly into a content text description through the Kinect for Windows SDK v2.0 public preview; the prosodic and spectral features of the speech are converted into a speech-state text description through a back-propagation neural network (BPNN) with a classical structure;
S1.2, a selected single frame of the captured video is processed by a convolutional neural network (CNN) and converted into a static-motion text description; the skeleton joint points are acquired and represented from the Kinect, and their positions are recorded at each moment, finally forming sequential skeleton data; and the skeleton-point sequences corresponding to the continuous actions, namely the N predefined actions, are encoded into vectors and processed by a bidirectional long short-term memory conditional random field (Bi-LSTM-CRF) to obtain the action sequence, which is finally classified by a Softmax classifier into the corresponding dynamic-motion text description.
4. The language and limb feature-based suicide and violence tendency emotion recognition method as claimed in claim 3, wherein the Back Propagation Neural Network (BPNN) structure is as follows:
there are n training samples in the training sample space Ω, which are respectively
Figure FDA0002614118790000011
The output value (i.e. predicted value) of the sample k after passing through the neural network is yk={yk1,...,yklH, the feature vector x of the kth training samplekDimension m, predictor vector ykAnd the true value vector
Figure FDA0002614118790000012
The vector dimensions are all l; the neural network has a 3-layer structure, wherein the 1 st layer is an input layer, the 3 rd layer is an output layer, the 2 nd layer is a hidden layer, the BP algorithm uses a gradient descent algorithm to update each weight value in the network, the size of the batch is set to be p, a square error sum calculation formula is adopted, an average square error sum is used as a target function, namely the target function is:
Figure FDA0002614118790000013
k denotes a kth node of the hidden layer and q denotes a qth node of the hidden layer.
5. The language and limb feature-based suicide and violence tendency emotion recognition method as recited in claim 3, wherein the convolutional neural network (CNN) comprises an input layer, a hidden layer and a fully connected layer, the hidden layer comprising two convolutional layers and two pooling layers;

the formula of a convolutional layer is as follows:

$x_{i}^{l,j} = f\left(\sum_{a} w_{a}^{j}\, x_{i+a}^{l-1} + b_{j}\right)$,

where l denotes the l-th convolutional layer, i denotes the i-th component of the convolution output matrix, and j indexes the corresponding output matrix, ranging from 0 to N, with N the number of convolution output matrices; f is a nonlinear sigmoid-type function; $x_{i}^{l,j}$ is the i-th component of the j-th output matrix of the l-th convolutional layer; $b_{j}$ is the bias of the j-th output matrix; and $w_{a}^{j}$ is the a-th weight of the convolution kernel for the j-th output matrix;
and a pooling layer is built with mean pooling, the input of the mean-pooling layer coming from the preceding convolutional layer and its output serving as the input of the next convolutional layer, with the calculation formula:

$y_{i}^{l,j} = \frac{1}{r}\sum_{s=1}^{r} x_{(i-1)r+s}^{l,j}$,

where $y_{i}^{l,j}$ denotes the local output after the pooling operation (taken over a window of size r) and $x^{l,j}$ denotes the output matrix of the preceding convolutional layer.
6. The method as claimed in claim 3, wherein in the bidirectional long short-term memory conditional random field (Bi-LSTM-CRF) the bidirectional long short-term memory neural network (Bi-LSTM) is given an input sequence $\{x_{1}, x_{2}, \ldots, x_{t}, \ldots, x_{T}\}$, where t denotes the t-th coordinate and T the total number of coordinates, and the hidden-layer output is computed as

$h_{t} = \sigma_{h}\left(W_{xh} x_{t} + W_{hh} h_{t-1} + b_{h}\right)$,

where $h_{t}$ is the output of the hidden layer at time t, $W_{xh}$ is the weight from the input layer to the hidden layer, $W_{hh}$ is the weight from the hidden layer to the hidden layer, $b_{h}$ is the bias of the hidden layer, and $\sigma_{h}$ is the activation function; the Bi-LSTM hidden layers strengthen the bilateral relation, the first layer being a forward LSTM and the second layer a backward LSTM.
7. The suicide and violence tendency emotion recognition method based on language and limb characteristics as claimed in claim 6, wherein step S2 comprises the following step:
S2.1, a long short-term memory (LSTM) neural network connects the fixed-size static-motion, dynamic-motion and speech-state text descriptions into a vector A; the content text description is converted by the word2vec method into a space vector of a certain fixed length, which is embedded into a fixed-size vector B by a long short-term memory (LSTM) neural network, the LSTM being the forward LSTM of the bidirectional long short-term memory neural network (Bi-LSTM); vector A and vector B have the same size; and vector A and vector B are combined by element-wise multiplication to obtain their cross effect, yielding the text-description embedding vector x, which is then normalized.
8. The method for recognizing suicide and violence tendency based on language and limb characteristics as claimed in claim 1, wherein in step S3 suicidal and violent tendencies are analyzed from the text-description embedding vector x using the Softmax function, calculated as follows:

$P(j \mid x) = \frac{\exp\left(W_{j} x + b\right)}{\sum_{j'} \exp\left(W_{j'} x + b\right)}$,

where $W_{j}$ is the weight matrix of the j-th emotional-tendency class and b denotes the bias; the emotional-tendency classes are "suicidal and violent emotional tendency" and "no suicidal and violent emotional tendency", respectively.
CN202010764407.7A 2020-08-02 2020-08-02 Suicide and violence tendency emotion recognition method based on language and limb characteristics Active CN112101095B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010764407.7A CN112101095B (en) 2020-08-02 2020-08-02 Suicide and violence tendency emotion recognition method based on language and limb characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010764407.7A CN112101095B (en) 2020-08-02 2020-08-02 Suicide and violence tendency emotion recognition method based on language and limb characteristics

Publications (2)

Publication Number Publication Date
CN112101095A true CN112101095A (en) 2020-12-18
CN112101095B CN112101095B (en) 2023-08-29

Family

ID=73750550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010764407.7A Active CN112101095B (en) 2020-08-02 2020-08-02 Suicide and violence tendency emotion recognition method based on language and limb characteristics

Country Status (1)

Country Link
CN (1) CN112101095B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117414135A (en) * 2023-10-20 2024-01-19 郑州师范学院 Behavioral and psychological abnormality detection method, system and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103049751A (en) * 2013-01-24 2013-04-17 苏州大学 Improved weighting region matching high-altitude video pedestrian recognizing method
CN103279768A (en) * 2013-05-31 2013-09-04 北京航空航天大学 Method for identifying faces in videos based on incremental learning of face partitioning visual representations
CN103473801A (en) * 2013-09-27 2013-12-25 中国科学院自动化研究所 Facial expression editing method based on single camera and motion capturing data
CN106782602A (en) * 2016-12-01 2017-05-31 南京邮电大学 Speech-emotion recognition method based on length time memory network and convolutional neural networks
CN108363978A (en) * 2018-02-12 2018-08-03 华南理工大学 Using the emotion perception method based on body language of deep learning and UKF
CN109597891A (en) * 2018-11-26 2019-04-09 重庆邮电大学 Text emotion analysis method based on two-way length Memory Neural Networks in short-term

Also Published As

Publication number Publication date
CN112101095B (en) 2023-08-29

Similar Documents

Publication Publication Date Title
CN108899050B (en) Voice signal analysis subsystem based on multi-modal emotion recognition system
CN108805087B (en) Time sequence semantic fusion association judgment subsystem based on multi-modal emotion recognition system
CN108805089B (en) Multi-modal-based emotion recognition method
CN108877801B (en) Multi-turn dialogue semantic understanding subsystem based on multi-modal emotion recognition system
Wang et al. Human emotion recognition by optimally fusing facial expression and speech feature
CN108805088B (en) Physiological signal analysis subsystem based on multi-modal emotion recognition system
CN110188343B (en) Multi-mode emotion recognition method based on fusion attention network
CN105976809B (en) Identification method and system based on speech and facial expression bimodal emotion fusion
CN108363978B (en) Emotion sensing method based on body language by adopting deep learning and UKF
CN111339837B (en) Continuous sign language recognition method
De et al. Recognition of human behavior for assisted living using dictionary learning approach
CN110956953A (en) Quarrel identification method based on audio analysis and deep learning
CN113822192A (en) Method, device and medium for identifying emotion of escort personnel based on Transformer multi-modal feature fusion
CN113749657B (en) Brain electricity emotion recognition method based on multi-task capsule
CN112151030A (en) Multi-mode-based complex scene voice recognition method and device
CN110163156A (en) It is a kind of based on convolution from the lip feature extracting method of encoding model
CN111967433A (en) Action identification method based on self-supervision learning network
CN112101096A (en) Suicide emotion perception method based on multi-mode fusion of voice and micro-expression
CN112101097A (en) Depression and suicide tendency identification method integrating body language, micro expression and language
CN114724224A (en) Multi-mode emotion recognition method for medical care robot
CN112101095B (en) Suicide and violence tendency emotion recognition method based on language and limb characteristics
CN117198468A (en) Intervention scheme intelligent management system based on behavior recognition and data analysis
CN114882590B (en) Lip reading method based on event camera multi-granularity space-time feature perception
CN116758451A (en) Audio-visual emotion recognition method and system based on multi-scale and global cross attention
CN114694254A (en) Method and device for detecting and early warning robbery of articles in vertical ladder and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant