CN112101095B - Suicide and violence tendency emotion recognition method based on language and limb characteristics
- Publication number: CN112101095B
- Application number: CN202010764407.7A
- Authority: CN (China)
- Prior art keywords: layer, vector, text description, neural network, LSTM
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V40/20 Recognition of biometric, human-related or animal-related patterns in image or video data; Movements or behaviour, e.g. gesture recognition
- G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2411 Classification techniques based on the proximity to a decision surface, e.g. support vector machines
- G06F18/253 Fusion techniques of extracted features
- G06N3/044 Recurrent networks, e.g. Hopfield networks
- G06N3/045 Combinations of networks
- G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/084 Backpropagation, e.g. using gradient descent
- G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
- G06F2218/08 Aspects of pattern recognition specially adapted for signal processing; Feature extraction
- Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a suicide and violence tendency emotion recognition method based on language and limb characteristics. The method comprises the following steps: collecting video and audio with Kinect and converting the voice features and visual features extracted from them into text descriptions; fusing the text descriptions through a neural network with a self-organizing map layer to obtain a text description embedded vector; and analyzing suicide and violence tendencies from the text description embedded vector using a Softmax function. The invention considers both static and dynamic body movements, resulting in higher recognition efficiency.
Description
Technical Field
The invention belongs to the field of emotion recognition, and particularly relates to a suicide and violence tendency emotion recognition method based on language and limb characteristics.
Background
Detecting a person's emotional state is useful for preventing self-harm and violent behavior. Human emotion can be recognized in various ways, for example from the electrocardiogram, the electroencephalogram, speech, or facial expression. Among these signals, physiological signals are widely used for emotion recognition, and in recent years human motion has emerged as a new feature. There are two conventional approaches: one measures the subject's physiological indices by contact, while the other observes the subject's physiological characteristics by non-contact means. Although a non-invasive approach is preferable, subjects can disguise their mood. Audio and video (Journal of Beijing University, 2006, 5(1): 165-182) are readily available but susceptible to noise. Fusing multiple features is therefore necessary. Although existing methods achieve significant results, improvements are still needed.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a suicide and violence tendency emotion recognition method based on language and limb characteristics. A Kinect with an infrared (IR) camera keeps face images from being affected by illumination, so Kinect is used to collect voice, limb movements, and related information. The method considers the spectral and prosodic features of speech to help identify the emotion in the speech content: by extracting prosodic and spectral features, the speech can be converted into textual descriptions covering intonation and speech rate. To describe motion accurately, body movement is divided into static and dynamic body movement; a convolutional neural network (CNN) and a bidirectional long short-term memory conditional random field (Bi-LSTM-CRF) are adopted to analyze the static and dynamic motion of the human body, respectively. Multi-sensor data fusion requires a reliable fusion method, and fusing such information into text is effective for emotion recognition. Finally, the voice, limb-action, and related information is fused into the text description.
The object of the invention is achieved by at least one of the following technical solutions.
The suicide and violence tendency emotion recognition method based on language and limb characteristics comprises the following steps:
S1, collecting video and audio by using Kinect, and converting the voice features and the visual features extracted from the video and the audio into text descriptions, respectively;
s2, fusing the text description through a neural network with a self-organizing map layer to obtain a text description embedded vector;
s3, analyzing suicide and violence tendency by using a Softmax function according to the text description embedded vector.
Further, in step S1, the speech features include speech content, prosody and spectrum; the visual characteristics are limb movements of a human body, and the limb movements are divided into static movements and dynamic movements.
Further, step S1 includes the steps of:
S1.1, directly converting the voice content into a content text description through the Kinect for Windows SDK v2.0 public preview; converting the prosody and spectrum features into a voice-state text description through a back propagation neural network (BPNN) of classical structure;
S1.2, converting a single frame selected from the captured video into a static motion text description through convolutional neural network (CNN) processing; acquiring and representing bone joint points from Kinect, recording the joint positions at every moment, and finally forming sequential skeleton data; encoding the skeleton-point sequences corresponding to continuous actions, namely N predefined actions, into vectors, processing the vectors with a bidirectional long short-term memory conditional random field (Bi-LSTM-CRF) to obtain the action sequence, and finally classifying the action sequence into the corresponding dynamic motion text description with a Softmax classifier.
Further, the back propagation neural network (BPNN) has the following structure: the training sample space Ω contains n training samples $\{(x_k, \hat{y}_k)\}_{k=1}^{n}$. The output value (i.e., the predicted value) of sample k after passing through the network is $y_k = \{y_{k1}, \dots, y_{kl}\}$; the feature vector $x_k$ of the k-th training sample has dimension m, and the predicted-value vector $y_k$ and the true-value vector $\hat{y}_k$ both have dimension l. The network has a three-layer structure: layer 1 is the input layer, layer 2 is the hidden layer, and layer 3 is the output layer. The BP algorithm updates every weight in the network by gradient descent. With the batch size set to p, the mean of the sum of squared errors is taken as the objective function:

$$E = \frac{1}{2p}\sum_{k=1}^{p}\sum_{q=1}^{l}\left(y_{kq} - \hat{y}_{kq}\right)^{2},$$

where k indexes the samples in the batch and q indexes the nodes of the output layer.
Further, the convolutional neural network (CNN) comprises an input layer, a hidden layer, and a fully connected layer; the hidden layer comprises two convolutional layers and two pooling layers.

The convolutional layer is computed as

$$x_j^{l}(i) = f\left(\sum_{a} w_j^{a}\, x^{l-1}(i+a) + b_j\right),$$

where l denotes the l-th convolutional layer and i the i-th component of the convolution output matrix; j is the index of the corresponding output matrix and varies between 0 and N, where N is the number of convolution output matrices; f is a nonlinear sigmoid-type function; $x_j^{l}(i)$ is the i-th component of the j-th output matrix of the l-th convolutional layer; $b_j$ is the bias of the j-th output matrix; and $w_j^{a}$ is the weight of the a-th convolution kernel of the j-th output matrix.

The pooling layers are built with mean pooling; the input of a mean-pooling layer comes from the convolutional layer above it, and its output serves as the input of the next convolutional layer. The computation is

$$p_j^{l}(i) = \frac{1}{S}\sum_{s=1}^{S} x_j^{l}\big((i-1)S + s\big),$$

where $p_j^{l}(i)$ denotes the local output after pooling, $x_j^{l}$ denotes the output matrix of the convolutional layer, and S is the size of the pooling window.
Further, in the bidirectional long short-term memory conditional random field (Bi-LSTM-CRF), the bidirectional long short-term memory network (Bi-LSTM) is given an input sequence $\{x_1, x_2, \dots, x_t, \dots, x_T\}$, where t denotes the t-th time step and T the total number of time steps. The output of the hidden layer is computed as

$$h_t = \sigma_h\left(W_{xh} x_t + W_{hh} h_{t-1} + b_h\right),$$

where $h_t$ is the output of the hidden layer at time t, $W_{xh}$ are the input-to-hidden weights, $W_{hh}$ the hidden-to-hidden weights, $b_h$ the bias of the hidden layer, and $\sigma_h$ the activation function. A bidirectional hidden layer is used to strengthen the bilateral (past and future) context: the first layer is a forward LSTM and the second layer is a backward LSTM.
Further, step S2 includes the steps of:
S2.1, connecting the fixed-size static motion text description, dynamic motion text description, and voice-state text description into a vector A using a long short-term memory (LSTM) neural network; converting the content text description into a fixed-length space vector with the word2vec method, and embedding that space vector into a fixed-size vector B using an LSTM neural network that serves as the forward LSTM of a bidirectional long short-term memory network (Bi-LSTM); vector A and vector B are kept the same size, and they are combined by element-wise multiplication to obtain a cross effect, yielding the text description embedded vector x, which is then normalized.
Further, in step S3, suicide and violence tendencies are analyzed from the text description embedded vector using the Softmax function, computed as

$$P(j \mid x) = \frac{\exp\left(W_j x + b\right)}{\sum_{j'} \exp\left(W_{j'} x + b\right)},$$

where $W_j$ is the weight matrix of the j-th emotion tendency and b is the bias; the two emotion-tendency categories are with and without suicide and violence tendencies.
Compared with the prior art, the invention has the following advantages:
(1) The present invention aligns multimodal data at the text level. The intermediate text representation and the proposed fusion method form a framework for fusing limb movements and facial expressions; the invention reduces the dimensionality of limb actions and facial expressions and unifies the two kinds of information in a single representation.
(2) The invention considers both static and dynamic body movements, resulting in higher recognition efficiency.
(3) The invention adopts Kinect for data acquisition, which offers high performance and convenient operation.
Drawings
FIG. 1 is a flow chart of the suicide and violence tendency emotion recognition method based on language and limb characteristics of the present invention.
Detailed Description
Specific embodiments of the present invention will be described further below with reference to examples and drawings, but the embodiments of the present invention are not limited thereto.
Examples:
the suicide and violence tendency emotion recognition method based on language and limb characteristics, as shown in fig. 1, comprises the following steps:
S1, collecting video and audio by using Kinect, and converting the voice features and the visual features extracted from the video and the audio into text descriptions, respectively;
the speech features include speech content, prosody and spectrum; the visual characteristics are limb movements of a human body, and the limb movements are divided into static movements and dynamic movements.
Step S1 comprises the steps of:
S1.1, directly converting the voice content into a content text description through the Kinect for Windows SDK v2.0 public preview; converting the prosody and spectrum features into a voice-state text description through a back propagation neural network (BPNN) of classical structure;
The back propagation neural network (BPNN) has the following structure: the training sample space Ω contains n training samples $\{(x_k, \hat{y}_k)\}_{k=1}^{n}$. The output value (i.e., the predicted value) of sample k after passing through the network is $y_k = \{y_{k1}, \dots, y_{kl}\}$; the feature vector $x_k$ of the k-th training sample has dimension m, and the predicted-value vector $y_k$ and the true-value vector $\hat{y}_k$ both have dimension l. The network has a three-layer structure: layer 1 is the input layer, layer 2 is the hidden layer, and layer 3 is the output layer. The BP algorithm updates every weight in the network by gradient descent. With the batch size set to p, the mean of the sum of squared errors is taken as the objective function:

$$E = \frac{1}{2p}\sum_{k=1}^{p}\sum_{q=1}^{l}\left(y_{kq} - \hat{y}_{kq}\right)^{2},$$

where k indexes the samples in the batch and q indexes the nodes of the output layer.
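To make the training step concrete, the following is a minimal PyTorch sketch of such a three-layer BPNN; the feature dimension m, hidden width, output dimension l, batch size p, and learning rate are illustrative assumptions rather than values fixed by the patent.

```python
import torch
import torch.nn as nn

# Three-layer structure: input layer -> hidden layer -> output layer.
# All sizes here are placeholders, not values from the patent.
m, hidden, l, p = 24, 64, 8, 32   # feature dim, hidden nodes, output dim, batch size

bpnn = nn.Sequential(
    nn.Linear(m, hidden),
    nn.Sigmoid(),                  # classical BPNN hidden-layer activation
    nn.Linear(hidden, l),
)
optimizer = torch.optim.SGD(bpnn.parameters(), lr=0.1)  # plain gradient descent

x = torch.randn(p, m)              # batch of prosody/spectrum feature vectors
y_true = torch.randn(p, l)         # corresponding true-value vectors

optimizer.zero_grad()
y_pred = bpnn(x)
# Objective: E = 1/(2p) * sum_k sum_q (y_kq - y_hat_kq)^2
loss = ((y_pred - y_true) ** 2).sum() / (2 * p)
loss.backward()                    # backpropagation of the error
optimizer.step()                   # weight update
```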
S1.2, converting a single frame selected from the captured video into a static motion text description through convolutional neural network (CNN) processing. The CNN comprises an input layer, a hidden layer, and a fully connected layer; the hidden layer comprises two convolutional layers and two pooling layers.

The convolutional layer is computed as

$$x_j^{l}(i) = f\left(\sum_{a} w_j^{a}\, x^{l-1}(i+a) + b_j\right),$$

where l denotes the l-th convolutional layer and i the i-th component of the convolution output matrix; j is the index of the corresponding output matrix and varies between 0 and N, where N is the number of convolution output matrices; f is a nonlinear sigmoid-type function; $x_j^{l}(i)$ is the i-th component of the j-th output matrix of the l-th convolutional layer; $b_j$ is the bias of the j-th output matrix; and $w_j^{a}$ is the weight of the a-th convolution kernel of the j-th output matrix.

The pooling layers are built with mean pooling; the input of a mean-pooling layer comes from the convolutional layer above it, and its output serves as the input of the next convolutional layer. The computation is

$$p_j^{l}(i) = \frac{1}{S}\sum_{s=1}^{S} x_j^{l}\big((i-1)S + s\big),$$

where $p_j^{l}(i)$ denotes the local output after pooling, $x_j^{l}$ denotes the output matrix of the convolutional layer, and S is the size of the pooling window.
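As an illustration of this static-motion branch, here is a small sketch of a CNN with two convolution plus mean-pooling stages followed by a fully connected layer; the channel counts, kernel sizes, frame resolution, and number of static-motion classes are assumptions made for the example.

```python
import torch
import torch.nn as nn

# Hidden layer: two convolutional layers, each followed by mean (average) pooling;
# Sigmoid plays the role of the nonlinear function f in the convolution formula.
# All sizes below are illustrative, not taken from the patent.
cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=5), nn.Sigmoid(),
    nn.AvgPool2d(2),                   # mean pooling feeds the next convolutional layer
    nn.Conv2d(16, 32, kernel_size=5), nn.Sigmoid(),
    nn.AvgPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 13 * 13, 10),       # fully connected layer -> static-motion classes
)

frame = torch.randn(1, 3, 64, 64)      # a single frame selected from the captured video
static_motion_logits = cnn(frame)      # mapped to a static motion text description
```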
Bone joint points are acquired and represented from Kinect, their positions are recorded at every moment, and the result forms sequential skeleton data. The skeleton-point sequences corresponding to continuous actions, namely N predefined actions, are encoded into vectors, a bidirectional long short-term memory conditional random field (Bi-LSTM-CRF) processes the vectors to obtain the action sequence, and a Softmax classifier finally maps the action sequence to the corresponding dynamic motion text description.
In the bidirectional long short-term memory conditional random field (Bi-LSTM-CRF), the bidirectional long short-term memory network (Bi-LSTM) is given an input sequence $\{x_1, x_2, \dots, x_t, \dots, x_T\}$, where t denotes the t-th time step and T the total number of time steps. The output of the hidden layer is computed as

$$h_t = \sigma_h\left(W_{xh} x_t + W_{hh} h_{t-1} + b_h\right),$$

where $h_t$ is the output of the hidden layer at time t, $W_{xh}$ are the input-to-hidden weights, $W_{hh}$ the hidden-to-hidden weights, $b_h$ the bias of the hidden layer, and $\sigma_h$ the activation function. A bidirectional hidden layer is used to strengthen the bilateral (past and future) context: the first layer is a forward LSTM and the second layer is a backward LSTM.
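The sketch below shows the Bi-LSTM part of this dynamic-motion branch running over a skeleton sequence; the joint encoding (25 Kinect v2 joints times 3 coordinates), sequence length, and number of action classes are assumptions, and a per-step argmax stands in for the CRF decoding (e.g., Viterbi) that the full Bi-LSTM-CRF would perform.

```python
import torch
import torch.nn as nn

# A skeleton sequence: T time steps, each a flattened vector of joint coordinates.
# 25 joints x 3 coordinates matches Kinect v2, but the exact encoding is assumed here.
T, joint_dim, hidden, n_actions = 30, 25 * 3, 128, 10

bilstm = nn.LSTM(joint_dim, hidden, batch_first=True, bidirectional=True)
emission = nn.Linear(2 * hidden, n_actions)   # forward and backward states concatenated

skeleton_seq = torch.randn(1, T, joint_dim)
h, _ = bilstm(skeleton_seq)                   # h_t from both directions at every step
scores = emission(h)                          # per-step action scores
# A full Bi-LSTM-CRF would decode with CRF transition scores; argmax is a stand-in.
action_sequence = scores.argmax(dim=-1)       # sequence of predicted action labels
```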
S2, fusing the text description through a neural network with a self-organizing map layer to obtain a text description embedded vector, wherein the method comprises the following steps of:
S2.1, connecting the fixed-size static motion text description, dynamic motion text description, and voice-state text description into a vector A using a long short-term memory (LSTM) neural network; converting the content text description into a fixed-length space vector with the word2vec method, and embedding that space vector into a fixed-size vector B using an LSTM neural network that serves as the forward LSTM of a bidirectional long short-term memory network (Bi-LSTM); vector A and vector B are kept the same size, and they are combined by element-wise multiplication to obtain a cross effect, yielding the text description embedded vector x, which is then normalized.
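A sketch of step S2.1 under simplifying assumptions: the three state descriptions are represented as pre-embedded token vectors, a randomly initialized embedding table stands in for a trained word2vec model, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

emb_dim, hid = 100, 100                          # illustrative dimensions

# Vector A: one LSTM runs over the concatenated static motion, dynamic motion,
# and voice-state text descriptions (here already embedded as token vectors).
lstm_a = nn.LSTM(emb_dim, hid, batch_first=True)
state_tokens = torch.randn(1, 12, emb_dim)       # embedded state descriptions
_, (vec_a, _) = lstm_a(state_tokens)             # final hidden state is vector A

# Vector B: word2vec-style embeddings of the content text fed to a second LSTM
# (the forward LSTM of the Bi-LSTM in the patent's formulation).
embedding = nn.Embedding(5000, emb_dim)          # stand-in for a trained word2vec table
lstm_b = nn.LSTM(emb_dim, hid, batch_first=True)
content_ids = torch.randint(0, 5000, (1, 20))    # token ids of the speech content
_, (vec_b, _) = lstm_b(embedding(content_ids))

# Element-wise multiplication gives the cross effect; the result is normalized.
x = F.normalize((vec_a * vec_b).squeeze(0), dim=-1)  # text description embedded vector
```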
S3, analyzing suicide and violence tendencies from the text description embedded vector using the Softmax function, computed as

$$P(j \mid x) = \frac{\exp\left(W_j x + b\right)}{\sum_{j'} \exp\left(W_{j'} x + b\right)},$$

where $W_j$ is the weight matrix of the j-th emotion tendency and b is the bias; the two emotion-tendency categories are with and without suicide and violence tendencies.
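Step S3 then reduces to a linear layer followed by Softmax over the two tendency categories, as sketched below; the embedding dimension and the class ordering are assumed conventions.

```python
import torch
import torch.nn as nn

classifier = nn.Linear(100, 2)              # computes W_j x + b for each category j
x = torch.randn(100)                        # text description embedded vector from S2
probs = torch.softmax(classifier(x), dim=-1)
has_tendency = probs.argmax().item() == 0   # class 0 = "with tendency" is an assumed convention
```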
Claims (5)
1. The suicide and violence tendency emotion recognition method based on language and limb characteristics is characterized by comprising the following steps of:
S1, collecting video and audio by using Kinect, and converting the voice features and the visual features extracted from the video and the audio into text descriptions, respectively; specifically:
S1.1, directly converting the voice content into a content text description through the Kinect for Windows SDK v2.0 public preview; converting the prosody and spectrum features into a voice-state text description through a back propagation neural network (BPNN) of classical structure;

S1.2, converting a single frame selected from the captured video into a static motion text description through convolutional neural network (CNN) processing; acquiring and representing bone joint points from Kinect, recording the joint positions at every moment, and finally forming sequential skeleton data; encoding the skeleton-point sequences corresponding to continuous actions, namely N predefined actions, into vectors, processing the vectors with a bidirectional long short-term memory conditional random field (Bi-LSTM-CRF) to obtain the action sequence, and finally classifying the action sequence into the corresponding dynamic motion text description with a Softmax classifier;

wherein the back propagation neural network (BPNN) has the following structure: the training sample space Ω contains n training samples $\{(x_k, \hat{y}_k)\}_{k=1}^{n}$; the output value (i.e., the predicted value) of sample k after passing through the network is $y_k = \{y_{k1}, \dots, y_{kl}\}$; the feature vector $x_k$ of the k-th training sample has dimension m, and the predicted-value vector $y_k$ and the true-value vector $\hat{y}_k$ both have dimension l; the network has a three-layer structure: layer 1 is the input layer, layer 2 is the hidden layer, and layer 3 is the output layer; the BP algorithm updates every weight in the network by gradient descent; with the batch size set to p, the mean of the sum of squared errors is taken as the objective function:

$$E = \frac{1}{2p}\sum_{k=1}^{p}\sum_{q=1}^{l}\left(y_{kq} - \hat{y}_{kq}\right)^{2},$$

where k indexes the samples in the batch and q indexes the nodes of the output layer;
S2, fusing the text description through a neural network with a self-organizing map layer to obtain a text description embedded vector;

S3, analyzing suicide and violence tendencies from the text description embedded vector using the Softmax function, computed as

$$P(j \mid x) = \frac{\exp\left(W_j x + b\right)}{\sum_{j'} \exp\left(W_{j'} x + b\right)},$$

where $W_j$ is the weight matrix of the j-th emotion tendency and b is the bias; the two emotion-tendency categories are with and without suicide and violence tendencies.
2. The method for identifying suicidal and violent tendencies based on language and limb characteristics as in claim 1 wherein in step S1 the speech characteristics comprise speech content, prosody and spectrum; the visual characteristics are limb movements of a human body, and the limb movements are divided into static movements and dynamic movements.
3. The language and limb feature based suicide and violence tendency emotion recognition method of claim 1, wherein the convolutional neural network (CNN) comprises an input layer, a hidden layer, and a fully connected layer, the hidden layer comprising two convolutional layers and two pooling layers;

the convolutional layer is computed as

$$x_j^{l}(i) = f\left(\sum_{a} w_j^{a}\, x^{l-1}(i+a) + b_j\right),$$

where l denotes the l-th convolutional layer and i the i-th component of the convolution output matrix; j is the index of the corresponding output matrix and varies between 0 and N, where N is the number of convolution output matrices; f is a nonlinear sigmoid-type function; $x_j^{l}(i)$ is the i-th component of the j-th output matrix of the l-th convolutional layer; $b_j$ is the bias of the j-th output matrix; and $w_j^{a}$ is the weight of the a-th convolution kernel of the j-th output matrix;

the pooling layers are built with mean pooling; the input of a mean-pooling layer comes from the convolutional layer above it, and its output serves as the input of the next convolutional layer, computed as

$$p_j^{l}(i) = \frac{1}{S}\sum_{s=1}^{S} x_j^{l}\big((i-1)S + s\big),$$

where $p_j^{l}(i)$ denotes the local output after pooling, $x_j^{l}$ denotes the output matrix of the convolutional layer, and S is the size of the pooling window.
4. The method for identifying suicidal and violent tendencies based on language and limb characteristics according to claim 1, wherein in the bidirectional long short-term memory conditional random field (Bi-LSTM-CRF), the bidirectional long short-term memory network (Bi-LSTM) is given an input sequence $\{x_1, x_2, \dots, x_t, \dots, x_T\}$, where t denotes the t-th time step and T the total number of time steps, and the output of the hidden layer is computed as

$$h_t = \sigma_h\left(W_{xh} x_t + W_{hh} h_{t-1} + b_h\right),$$

where $h_t$ is the output of the hidden layer at time t, $W_{xh}$ are the input-to-hidden weights, $W_{hh}$ the hidden-to-hidden weights, $b_h$ the bias of the hidden layer, and $\sigma_h$ the activation function; a bidirectional hidden layer is used to strengthen the bilateral (past and future) context, the first layer being a forward LSTM and the second layer a backward LSTM.
5. The language and limb feature based suicide and violence tendency emotion recognition method as recited in claim 4, wherein step S2 comprises the following step:

S2.1, connecting the fixed-size static motion text description, dynamic motion text description, and voice-state text description into a vector A using a long short-term memory (LSTM) neural network; converting the content text description into a fixed-length space vector with the word2vec method, and embedding that space vector into a fixed-size vector B using an LSTM neural network that serves as the forward LSTM of a bidirectional long short-term memory network (Bi-LSTM); vector A and vector B are kept the same size, and they are combined by element-wise multiplication to obtain a cross effect, yielding the text description embedded vector x, which is then normalized.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202010764407.7A (granted as CN112101095B) | 2020-08-02 | 2020-08-02 | Suicide and violence tendency emotion recognition method based on language and limb characteristics
Publications (2)

Publication Number | Publication Date
---|---
CN112101095A (en) | 2020-12-18
CN112101095B (en) | 2023-08-29
Family ID: 73750550

Family Applications (1)

Application Number | Priority Date | Filing Date | Status
---|---|---|---
CN202010764407.7A (CN112101095B) | 2020-08-02 | 2020-08-02 | Active

Country Status (1)

Country | Link
---|---
CN | CN112101095B (en)
Families Citing this family (1)

Publication Number | Priority Date | Publication Date | Assignee | Title
---|---|---|---|---
CN117414135A * | 2023-10-20 | 2024-01-19 | 郑州师范学院 | Behavioral and psychological abnormality detection method, system and storage medium
Patent Citations (6)

Publication Number | Priority Date | Publication Date | Title
---|---|---|---
CN103049751A * | 2013-01-24 | 2013-04-17 | Improved weighted region matching method for recognizing pedestrians in high-altitude video
CN103279768A * | 2013-05-31 | 2013-09-04 | Method for identifying faces in videos based on incremental learning of face partitioning visual representations
CN103473801A * | 2013-09-27 | 2013-12-25 | Facial expression editing method based on single camera and motion capture data
CN106782602A * | 2016-12-01 | 2017-05-31 | Speech emotion recognition method based on long short-term memory network and convolutional neural networks
CN108363978A * | 2018-02-12 | 2018-08-03 | Body-language-based emotion perception method using deep learning and UKF
CN109597891A * | 2018-11-26 | 2019-04-09 | Text emotion analysis method based on bidirectional long short-term memory neural network
Also Published As

Publication Number | Publication Date
---|---
CN112101095A (en) | 2020-12-18
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant