CN112101095B - Suicide and violence tendency emotion recognition method based on language and limb characteristics - Google Patents

Suicide and violence tendency emotion recognition method based on language and limb characteristics

Info

Publication number
CN112101095B
CN112101095B (application CN202010764407.7A)
Authority
CN
China
Prior art keywords
layer
vector
text description
neural network
lstm
Prior art date
Legal status
Active
Application number
CN202010764407.7A
Other languages
Chinese (zh)
Other versions
CN112101095A (en)
Inventor
杜广龙 (Du Guanglong)
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202010764407.7A priority Critical patent/CN112101095B/en
Publication of CN112101095A publication Critical patent/CN112101095A/en
Application granted granted Critical
Publication of CN112101095B publication Critical patent/CN112101095B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06V 40/20 Movements or behaviour, e.g. gesture recognition (recognition of biometric, human-related or animal-related patterns in image or video data)
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G06F 18/253 Fusion techniques of extracted features
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Combinations of networks
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06F 2218/08 Feature extraction (aspects of pattern recognition specially adapted for signal processing)
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention discloses a suicide and violence tendency emotion recognition method based on language and limb characteristics. The method comprises the following steps: video and audio are collected with a Kinect, and the voice features and visual features extracted from them are converted into text descriptions; the text descriptions are fused through a neural network with a self-organizing map layer to obtain a text description embedded vector; suicide and violence tendencies are then analyzed from the text description embedded vector using a Softmax function. The invention takes both static and dynamic body movements into account, resulting in higher efficiency.

Description

Suicide and violence tendency emotion recognition method based on language and limb characteristics
Technical Field
The invention belongs to the field of emotion recognition, and particularly relates to a suicide and violence tendency emotion recognition method based on language and limb characteristics.
Background
To prevent people from harming themselves or committing violence, it is useful to detect their emotional state. Human emotion can be recognized in various ways, for example from the electrocardiogram, the electroencephalogram, speech, or facial expression. Among these emotion signals, physiological signals are widely used for emotion recognition, and in recent years human motion has also emerged as a new feature. There are two conventional approaches: one measures physiological indices of the subject by contact, and the other observes physiological characteristics of the subject by non-contact means. Although a non-invasive approach is preferable, subjects can disguise their mood. Audio and video (Beijing University journal, 2006, 5(1): 165-182) are readily available but susceptible to noise, so fusing features is necessary. Although existing methods achieve significant results, improvements are still needed.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a suicide and violence tendency emotion recognition method based on language and limb characteristics. A Kinect with an infrared (IR) camera prevents the face image from being affected by illumination, so the Kinect is used to collect information such as voice and limb movements. The invention considers the spectral and prosodic features of speech to help identify the emotion carried by the speech content: by extracting prosodic and spectral features, the speech can be converted into a textual description covering intonation and speaking rate. To describe movement accurately, body movement is divided into static body movement and dynamic body movement; a convolutional neural network (CNN) and a bidirectional long short-term memory conditional random field (Bi-LSTM-CRF) are adopted to analyze the static and the dynamic motion of the human body, respectively. Multi-sensor data requires a reliable fusion method, and fusing such information into text is effective for emotion recognition. Finally, the voice, limb-action and related information is fused into a text description.
The object of the invention is achieved by at least one of the following technical solutions.
The suicide and violence tendency emotion recognition method based on language and limb characteristics comprises the following steps:
s1, collecting video and audio by using Kinect, and respectively converting voice features and visual features extracted from the video and the audio into text description;
s2, fusing the text description through a neural network with a self-organizing map layer to obtain a text description embedded vector;
s3, analyzing suicide and violence tendency by using a Softmax function according to the text description embedded vector.
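Read end to end, steps S1 to S3 form a single pipeline from the raw Kinect streams to a binary tendency label. A minimal structural sketch in Python follows; every function name in it is a hypothetical placeholder introduced for illustration, not a name used by the invention.

```python
# Structural sketch of steps S1-S3 (all names are hypothetical placeholders).
from typing import Tuple

def extract_text_descriptions(video, audio) -> Tuple[str, str, str, str]:
    """S1: convert speech features (content, prosody, spectrum) and visual features
    (static and dynamic limb motion) into four text descriptions."""
    raise NotImplementedError  # Kinect SDK speech-to-text, BPNN, CNN, Bi-LSTM-CRF in practice

def fuse_descriptions(content: str, speech_state: str, static: str, dynamic: str):
    """S2: embed the descriptions and fuse them into the text description embedded vector x."""
    raise NotImplementedError  # LSTM embeddings combined by element-wise multiplication

def classify_tendency(x) -> int:
    """S3: Softmax over two classes (tendency present / tendency absent)."""
    raise NotImplementedError

def recognize(video, audio) -> int:
    content, speech_state, static, dynamic = extract_text_descriptions(video, audio)
    x = fuse_descriptions(content, speech_state, static, dynamic)
    return classify_tendency(x)
```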
Further, in step S1, the speech features include speech content, prosody and spectrum; the visual characteristics are limb movements of a human body, and the limb movements are divided into static movements and dynamic movements.
Further, step S1 includes the steps of:
S1.1, the voice content is converted directly into a content text description through the Kinect for Windows SDK v2.0 public preview; the two features of prosody and spectrum are converted into a speech state text description by a back-propagation neural network (BPNN) of classical structure;
S1.2, a single frame selected from the captured video is converted into a static motion text description by a convolutional neural network (CNN); skeletal joint points are acquired from the Kinect, their positions are recorded at each moment, and the records form sequential skeleton data; the skeleton-point sequences corresponding to continuous actions, i.e. N predefined actions, are encoded into vectors and processed by a bidirectional long short-term memory conditional random field (Bi-LSTM-CRF) to obtain an action sequence, which a Softmax classifier finally maps to the corresponding dynamic motion text description.
Further, the back-propagation neural network (BPNN) has the following structure: the training-sample space Ω contains n training samples, each consisting of a feature vector $x_k$ of dimension m and a true-value vector $\hat{y}_k$; the output (predicted) value of sample k after passing through the network is $y_k = \{y_{k1}, \ldots, y_{kl}\}$, and the predicted-value vector $y_k$ and the true-value vector $\hat{y}_k$ both have dimension l. The network has a 3-layer structure: layer 1 is the input layer, layer 2 the hidden layer, and layer 3 the output layer. The BP algorithm updates every weight in the network by gradient descent, with the batch size set to p; the averaged sum of squared errors is used as the objective function, i.e.

$$E = \frac{1}{p}\sum_{k=1}^{p}\frac{1}{2}\sum_{q=1}^{l}\left(y_{kq}-\hat{y}_{kq}\right)^{2},$$

where k indexes the samples in the batch and q indexes the nodes of the output layer.
Further, the convolutional neural network (CNN) comprises an input layer, a hidden layer and a fully connected layer, wherein the hidden layer comprises two convolution layers and two pooling layers;
the calculation formula of the convolution layer is:

$$x_{ij}^{l} = f\!\left(\sum_{a} w_{aj}\, x_{i+a}^{\,l-1} + b_j\right),$$

where l denotes the l-th convolution layer and i the i-th component of the convolution output matrix; j indexes the corresponding output matrix and ranges from 0 to N, where N is the number of convolution output matrices; f is a nonlinear sigmoid-type function; $x_{ij}^{l}$ is the i-th component of the j-th output matrix of the l-th convolution layer; $b_j$ is the bias of the j-th output matrix; and $w_{aj}$ is the weight of the a-th element of the convolution kernel for the j-th output matrix;
a pooling layer is constructed using mean pooling; the input of the mean-pooling layer comes from the preceding convolution layer and its output serves as the input of the next convolution layer, with the calculation formula:

$$p_{ij}^{l} = \frac{1}{s}\sum_{a=1}^{s} x_{(i-1)s+a,\;j}^{\,l},$$

where $p_{ij}^{l}$ denotes the local output after the pooling process, $x_{j}^{l}$ denotes the j-th output matrix of the preceding convolution layer, and s is the size of the pooling window.
Further, in the bidirectional long short-term memory conditional random field (Bi-LSTM-CRF), the bidirectional long short-term memory neural network (Bi-LSTM) is given an input sequence $\{x_1, x_2, \ldots, x_t, \ldots, x_T\}$, where t denotes the t-th coordinate and T the total number of coordinates; the output of the hidden layer is calculated as

$$h_t = \sigma_h\!\left(W_{xh}\, x_t + W_{hh}\, h_{t-1} + b_h\right),$$

where $h_t$ denotes the output of the hidden layer at time t, $W_{xh}$ the input-to-hidden weights, $W_{hh}$ the hidden-to-hidden weights, $b_h$ the bias of the hidden layer, and $\sigma_h$ the activation function. A bidirectional LSTM hidden layer is used to strengthen the bilateral relationship: the first layer is a forward LSTM and the second layer a backward LSTM.
Further, step S2 includes the steps of:
S2.1, a long short-term memory (LSTM) neural network is used to connect the fixed-size static motion, dynamic motion and speech state text descriptions into a vector A; the content text description is converted into a space vector of a fixed length by the word2vec method, and another long short-term memory (LSTM) neural network, used as the forward LSTM of a bidirectional long short-term memory neural network (Bi-LSTM), embeds this space vector into a fixed-size vector B. Vector A and vector B are kept the same size; they are combined by element-wise multiplication to obtain the cross effect, yielding the text description embedded vector x, which is then standardized.
Further, in step S3, suicide and violence tendencies are analyzed from the text description embedded vector using the Softmax function, with the calculation formula:

$$P(y = j \mid x) = \frac{e^{W_j x + b}}{\sum_{c} e^{W_c x + b}},$$

where $W_j$ is the weight matrix of the j-th emotion-tendency class and b denotes the bias; the two emotion-tendency classes are, respectively, with and without suicide and violence tendency.
Compared with the prior art, the invention has the following advantages:
(1) The present invention aligns multimodal data at the text level. The intermediate text representation and the proposed fusion method form a framework for fusing limb movements and facial expressions. The invention reduces the dimensionality of limb actions and facial expressions and unifies the two types of information into a single representation.
(2) The invention takes both static and dynamic body movements into account, resulting in higher efficiency.
(3) The invention adopts Kinect for data acquisition, and has high performance and convenient operation.
Drawings
FIG. 1 is a flow chart of the suicidal and violent predisposition emotion recognition method of the present invention based on language and limb characteristics.
Detailed Description
Specific embodiments of the present invention will be described further below with reference to examples and drawings, but the embodiments of the present invention are not limited thereto.
Examples:
the suicide and violence tendency emotion recognition method based on language and limb characteristics, as shown in fig. 1, comprises the following steps:
s1, collecting video and audio by using Kinect, and respectively converting voice features and visual features extracted from the video and the audio into text description;
the speech features include speech content, prosody and spectrum; the visual characteristics are limb movements of a human body, and the limb movements are divided into static movements and dynamic movements.
Step S1 comprises the steps of:
S1.1, the voice content is converted directly into a content text description through the Kinect for Windows SDK v2.0 public preview; the two features of prosody and spectrum are converted into a speech state text description by a back-propagation neural network (BPNN) of classical structure;
The back-propagation neural network (BPNN) has the following structure: the training-sample space Ω contains n training samples, each consisting of a feature vector $x_k$ of dimension m and a true-value vector $\hat{y}_k$; the output (predicted) value of sample k after passing through the network is $y_k = \{y_{k1}, \ldots, y_{kl}\}$, and the predicted-value vector $y_k$ and the true-value vector $\hat{y}_k$ both have dimension l. The network has a 3-layer structure: layer 1 is the input layer, layer 2 the hidden layer, and layer 3 the output layer. The BP algorithm updates every weight in the network by gradient descent, with the batch size set to p; the averaged sum of squared errors is used as the objective function, i.e.

$$E = \frac{1}{p}\sum_{k=1}^{p}\frac{1}{2}\sum_{q=1}^{l}\left(y_{kq}-\hat{y}_{kq}\right)^{2},$$

where k indexes the samples in the batch and q indexes the nodes of the output layer.
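A minimal sketch of such a three-layer back-propagation network, trained by gradient descent on the averaged sum-of-squared-errors objective E above, is given below, assuming PyTorch; the hidden width, learning rate, batch size, feature dimension m and output dimension l used here are illustrative choices, not values fixed by the invention.

```python
import torch
import torch.nn as nn

class BPNN(nn.Module):
    """3-layer network: input layer -> one hidden layer -> output layer."""
    def __init__(self, m: int, hidden: int, l: int):
        super().__init__()
        self.hidden = nn.Linear(m, hidden)   # layer 1 -> layer 2
        self.out = nn.Linear(hidden, l)      # layer 2 -> layer 3

    def forward(self, x):
        return self.out(torch.sigmoid(self.hidden(x)))

def train_step(model, x_batch, y_batch, lr=0.01):
    """One gradient-descent update on a batch of size p using the
    averaged sum-of-squared-errors objective E."""
    y_pred = model(x_batch)
    p = x_batch.shape[0]
    # (1/p) * sum_k (1/2) * sum_q (y_kq - yhat_kq)^2
    loss = 0.5 * ((y_pred - y_batch) ** 2).sum() / p
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for w in model.parameters():
            w -= lr * w.grad
    return loss.item()

# usage: m = 24 prosodic/spectral features, l = 5 speech-state classes (illustrative)
model = BPNN(m=24, hidden=32, l=5)
x = torch.randn(8, 24)
y = torch.zeros(8, 5); y[:, 0] = 1.0
print(train_step(model, x, y))
```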
S1.2, a single frame selected from the captured video is converted into a static motion text description by a convolutional neural network (CNN); the convolutional neural network (CNN) comprises an input layer, a hidden layer and a fully connected layer, wherein the hidden layer comprises two convolution layers and two pooling layers;
the calculation formula of the convolution layer is:

$$x_{ij}^{l} = f\!\left(\sum_{a} w_{aj}\, x_{i+a}^{\,l-1} + b_j\right),$$

where l denotes the l-th convolution layer and i the i-th component of the convolution output matrix; j indexes the corresponding output matrix and ranges from 0 to N, where N is the number of convolution output matrices; f is a nonlinear sigmoid-type function; $x_{ij}^{l}$ is the i-th component of the j-th output matrix of the l-th convolution layer; $b_j$ is the bias of the j-th output matrix; and $w_{aj}$ is the weight of the a-th element of the convolution kernel for the j-th output matrix;
a pooling layer is constructed using mean pooling; the input of the mean-pooling layer comes from the preceding convolution layer and its output serves as the input of the next convolution layer, with the calculation formula:

$$p_{ij}^{l} = \frac{1}{s}\sum_{a=1}^{s} x_{(i-1)s+a,\;j}^{\,l},$$

where $p_{ij}^{l}$ denotes the local output after the pooling process, $x_{j}^{l}$ denotes the j-th output matrix of the preceding convolution layer, and s is the size of the pooling window.
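A sketch of the described CNN (input layer, two convolution layers with sigmoid activations, two mean-pooling layers and a fully connected layer) is given below, assuming PyTorch; the kernel sizes, channel counts, 64x64 input resolution and number of static-motion classes are illustrative assumptions, not values specified by the invention.

```python
import torch
import torch.nn as nn

class StaticMotionCNN(nn.Module):
    """Hidden layer = two (convolution + mean-pooling) stages, then a fully connected layer."""
    def __init__(self, n_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=5), nn.Sigmoid(),   # convolution layer 1, f = sigmoid
            nn.AvgPool2d(2),                                # mean-pooling layer 1
            nn.Conv2d(8, 16, kernel_size=5), nn.Sigmoid(),  # convolution layer 2
            nn.AvgPool2d(2),                                # mean-pooling layer 2
        )
        self.fc = nn.LazyLinear(n_classes)                  # fully connected layer

    def forward(self, frame):
        h = self.features(frame)
        return self.fc(h.flatten(1))

# usage: one selected video frame, e.g. 3 x 64 x 64 (illustrative size)
frame = torch.randn(1, 3, 64, 64)
logits = StaticMotionCNN(n_classes=10)(frame)
print(logits.shape)  # torch.Size([1, 10])
```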
Skeletal joint points are acquired from the Kinect, their positions are recorded at each moment, and the records form sequential skeleton data; the skeleton-point sequences corresponding to continuous actions, i.e. N predefined actions, are encoded into vectors and processed by a bidirectional long short-term memory conditional random field (Bi-LSTM-CRF) to obtain an action sequence, which a Softmax classifier finally maps to the corresponding dynamic motion text description.
In the bidirectional long short-term memory conditional random field (Bi-LSTM-CRF), the bidirectional long short-term memory neural network (Bi-LSTM) is given an input sequence $\{x_1, x_2, \ldots, x_t, \ldots, x_T\}$, where t denotes the t-th coordinate and T the total number of coordinates; the output of the hidden layer is calculated as

$$h_t = \sigma_h\!\left(W_{xh}\, x_t + W_{hh}\, h_{t-1} + b_h\right),$$

where $h_t$ denotes the output of the hidden layer at time t, $W_{xh}$ the input-to-hidden weights, $W_{hh}$ the hidden-to-hidden weights, $b_h$ the bias of the hidden layer, and $\sigma_h$ the activation function. A bidirectional LSTM hidden layer is used to strengthen the bilateral relationship: the first layer is a forward LSTM and the second layer a backward LSTM.
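A minimal sketch of the bidirectional LSTM over an encoded skeleton-joint sequence is given below, assuming PyTorch. The CRF transition layer is omitted for brevity; only the Bi-LSTM recurrence and a per-step emission score are shown, and the joint count, hidden width and number of action classes N are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DynamicMotionBiLSTM(nn.Module):
    """Bi-LSTM over a skeleton sequence; a CRF layer would normally decode the
    per-step emission scores into an action sequence (omitted here for brevity)."""
    def __init__(self, joint_dim: int = 25 * 3, hidden: int = 64, n_actions: int = 8):
        super().__init__()
        # first direction = forward LSTM, second direction = backward LSTM
        self.bilstm = nn.LSTM(joint_dim, hidden, batch_first=True, bidirectional=True)
        self.emission = nn.Linear(2 * hidden, n_actions)  # per-step scores fed to the CRF

    def forward(self, skeleton_seq):
        # skeleton_seq: (batch, T, joint_dim) -- joint positions recorded at each moment
        h, _ = self.bilstm(skeleton_seq)
        return self.emission(h)  # (batch, T, n_actions)

# usage: a 30-frame sequence of 25 joints with (x, y, z) coordinates (illustrative)
seq = torch.randn(1, 30, 75)
scores = DynamicMotionBiLSTM()(seq)
print(scores.shape)  # torch.Size([1, 30, 8])
```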
S2, fusing the text description through a neural network with a self-organizing map layer to obtain a text description embedded vector, wherein the method comprises the following steps of:
S2.1, a long short-term memory (LSTM) neural network is used to connect the fixed-size static motion, dynamic motion and speech state text descriptions into a vector A; the content text description is converted into a space vector of a fixed length by the word2vec method, and another long short-term memory (LSTM) neural network, used as the forward LSTM of a bidirectional long short-term memory neural network (Bi-LSTM), embeds this space vector into a fixed-size vector B. Vector A and vector B are kept the same size; they are combined by element-wise multiplication to obtain the cross effect, yielding the text description embedded vector x, which is then standardized.
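The fusion of step S2.1 can be sketched as follows, assuming PyTorch: one LSTM embeds the motion and speech-state descriptions into vector A, a second LSTM (the forward LSTM of a Bi-LSTM) embeds the word2vec content vectors into a vector B of the same size, and the two are combined by element-wise multiplication and normalized. The embedding sizes and the use of the final hidden state as the sentence embedding are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextDescriptionFusion(nn.Module):
    """Fuse the non-content descriptions (vector A) with the content embedding (vector B)
    by element-wise multiplication, then normalize to obtain x."""
    def __init__(self, word_dim: int = 100, embed_dim: int = 128):
        super().__init__()
        self.lstm_a = nn.LSTM(word_dim, embed_dim, batch_first=True)  # static/dynamic motion + speech state
        self.lstm_b = nn.LSTM(word_dim, embed_dim, batch_first=True)  # content text (forward LSTM of a Bi-LSTM)

    def forward(self, desc_a_tokens, content_tokens):
        # inputs: (batch, seq_len, word_dim) word2vec-style token vectors
        _, (a, _) = self.lstm_a(desc_a_tokens)    # vector A: final hidden state
        _, (b, _) = self.lstm_b(content_tokens)   # vector B: same size as A
        x = a[-1] * b[-1]                         # element-wise product captures the cross effect
        return F.normalize(x, dim=-1)             # standardized text description embedded vector x

# usage with random token embeddings (illustrative shapes)
fusion = TextDescriptionFusion()
x = fusion(torch.randn(1, 12, 100), torch.randn(1, 20, 100))
print(x.shape)  # torch.Size([1, 128])
```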
S3, suicide and violence tendencies are analyzed from the text description embedded vector using a Softmax function, with the calculation formula:

$$P(y = j \mid x) = \frac{e^{W_j x + b}}{\sum_{c} e^{W_c x + b}},$$

where $W_j$ is the weight matrix of the j-th emotion-tendency class and b denotes the bias; the two emotion-tendency classes are, respectively, with and without suicide and violence tendency.
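Step S3 scores the fused vector x with a weight matrix $W_j$ and bias b for each emotion-tendency class and applies the Softmax; a minimal sketch with the two classes (tendency present / tendency absent) follows, assuming PyTorch and an illustrative embedding size.

```python
import torch
import torch.nn as nn

class TendencyClassifier(nn.Module):
    """Softmax over two emotion-tendency classes: with / without suicide and violence tendency."""
    def __init__(self, embed_dim: int = 128, n_classes: int = 2):
        super().__init__()
        self.linear = nn.Linear(embed_dim, n_classes)  # W_j x + b for each class j

    def forward(self, x):
        return torch.softmax(self.linear(x), dim=-1)   # P(y = j | x)

# usage: class probabilities for a fused text description embedded vector x (illustrative)
probs = TendencyClassifier()(torch.randn(1, 128))
print(probs)  # two probabilities summing to 1 (values depend on random initialization)
```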

Claims (5)

1. The suicide and violence tendency emotion recognition method based on language and limb characteristics is characterized by comprising the following steps of:
s1, collecting video and audio by using Kinect, and respectively converting voice features and visual features extracted from the video and the audio into text description; the method comprises the following steps:
S1.1, the voice content is converted directly into a content text description through the Kinect for Windows SDK v2.0 public preview; the two features of prosody and spectrum are converted into a speech state text description by a back-propagation neural network (BPNN) of classical structure;
S1.2, a single frame selected from the captured video is converted into a static motion text description by a convolutional neural network (CNN); skeletal joint points are acquired from the Kinect, their positions are recorded at each moment, and the records form sequential skeleton data; the skeleton-point sequences corresponding to continuous actions, i.e. N predefined actions, are encoded into vectors and processed by a bidirectional long short-term memory conditional random field (Bi-LSTM-CRF) to obtain an action sequence, which a Softmax classifier finally maps to the corresponding dynamic motion text description;
the back-propagation neural network (BPNN) has the following structure:
the training-sample space Ω contains n training samples, each consisting of a feature vector $x_k$ of dimension m and a true-value vector $\hat{y}_k$; the output (predicted) value of sample k after passing through the network is $y_k = \{y_{k1}, \ldots, y_{kl}\}$, and the predicted-value vector $y_k$ and the true-value vector $\hat{y}_k$ both have dimension l; the network has a 3-layer structure: layer 1 is the input layer, layer 2 the hidden layer, and layer 3 the output layer; the BP algorithm updates every weight in the network by gradient descent, with the batch size set to p; the averaged sum of squared errors is used as the objective function, i.e.:

$$E = \frac{1}{p}\sum_{k=1}^{p}\frac{1}{2}\sum_{q=1}^{l}\left(y_{kq}-\hat{y}_{kq}\right)^{2},$$

where k indexes the samples in the batch and q indexes the nodes of the output layer;
s2, fusing the text description through a neural network with a self-organizing map layer to obtain a text description embedded vector;
S3, suicide and violence tendencies are analyzed from the text description embedded vector using a Softmax function, with the calculation formula:

$$P(y = j \mid x) = \frac{e^{W_j x + b}}{\sum_{c} e^{W_c x + b}},$$

where $W_j$ is the weight matrix of the j-th emotion-tendency class and b denotes the bias; the two emotion-tendency classes are, respectively, with and without suicide and violence tendency.
2. The method for identifying suicidal and violent tendencies based on language and limb characteristics as in claim 1 wherein in step S1 the speech characteristics comprise speech content, prosody and spectrum; the visual characteristics are limb movements of a human body, and the limb movements are divided into static movements and dynamic movements.
3. The language and limb feature based suicide and violence tendency emotion recognition method of claim 1, wherein the convolutional neural network (CNN) comprises an input layer, a hidden layer and a fully connected layer, the hidden layer comprising two convolution layers and two pooling layers;
the calculation formula of the convolution layer is:

$$x_{ij}^{l} = f\!\left(\sum_{a} w_{aj}\, x_{i+a}^{\,l-1} + b_j\right),$$

where l denotes the l-th convolution layer and i the i-th component of the convolution output matrix; j indexes the corresponding output matrix and ranges from 0 to N, where N is the number of convolution output matrices; f is a nonlinear sigmoid-type function; $x_{ij}^{l}$ is the i-th component of the j-th output matrix of the l-th convolution layer; $b_j$ is the bias of the j-th output matrix; and $w_{aj}$ is the weight of the a-th element of the convolution kernel for the j-th output matrix;
a pooling layer is constructed using mean pooling; the input of the mean-pooling layer comes from the preceding convolution layer and its output serves as the input of the next convolution layer, with the calculation formula:

$$p_{ij}^{l} = \frac{1}{s}\sum_{a=1}^{s} x_{(i-1)s+a,\;j}^{\,l},$$

where $p_{ij}^{l}$ denotes the local output after the pooling process, $x_{j}^{l}$ denotes the j-th output matrix of the preceding convolution layer, and s is the size of the pooling window.
4. The method for identifying suicidal and violent tendencies based on language and limb characteristics according to claim 1, wherein in the bidirectional long short-term memory conditional random field (Bi-LSTM-CRF) the bidirectional long short-term memory neural network (Bi-LSTM) is given an input sequence $\{x_1, x_2, \ldots, x_t, \ldots, x_T\}$, where t denotes the t-th coordinate and T the total number of coordinates; the output of the hidden layer is calculated as

$$h_t = \sigma_h\!\left(W_{xh}\, x_t + W_{hh}\, h_{t-1} + b_h\right),$$

where $h_t$ denotes the output of the hidden layer at time t, $W_{xh}$ the input-to-hidden weights, $W_{hh}$ the hidden-to-hidden weights, $b_h$ the bias of the hidden layer, and $\sigma_h$ the activation function; a bidirectional LSTM hidden layer is used to strengthen the bilateral relationship, the first layer being a forward LSTM and the second layer a backward LSTM.
5. The language and limb feature-based suicide and violence tendency emotion recognition method as recited in claim 4, wherein step S2 includes the steps of:
S2.1, a long short-term memory (LSTM) neural network is used to connect the fixed-size static motion, dynamic motion and speech state text descriptions into a vector A; the content text description is converted into a space vector of a fixed length by the word2vec method, and another long short-term memory (LSTM) neural network, used as the forward LSTM of a bidirectional long short-term memory neural network (Bi-LSTM), embeds this space vector into a fixed-size vector B; vector A and vector B are kept the same size; they are combined by element-wise multiplication to obtain the cross effect, yielding the text description embedded vector x, which is then standardized.
CN202010764407.7A 2020-08-02 2020-08-02 Suicide and violence tendency emotion recognition method based on language and limb characteristics Active CN112101095B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010764407.7A CN112101095B (en) 2020-08-02 2020-08-02 Suicide and violence tendency emotion recognition method based on language and limb characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010764407.7A CN112101095B (en) 2020-08-02 2020-08-02 Suicide and violence tendency emotion recognition method based on language and limb characteristics

Publications (2)

Publication Number Publication Date
CN112101095A CN112101095A (en) 2020-12-18
CN112101095B true CN112101095B (en) 2023-08-29

Family

ID=73750550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010764407.7A Active CN112101095B (en) 2020-08-02 2020-08-02 Suicide and violence tendency emotion recognition method based on language and limb characteristics

Country Status (1)

Country Link
CN (1) CN112101095B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117414135A (en) * 2023-10-20 2024-01-19 郑州师范学院 (Zhengzhou Normal University) Behavioral and psychological abnormality detection method, system and storage medium


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103049751A (en) * 2013-01-24 2013-04-17 苏州大学 (Soochow University) Improved weighting region matching high-altitude video pedestrian recognizing method
CN103279768A (en) * 2013-05-31 2013-09-04 北京航空航天大学 (Beihang University) Method for identifying faces in videos based on incremental learning of face partitioning visual representations
CN103473801A (en) * 2013-09-27 2013-12-25 中国科学院自动化研究所 (Institute of Automation, Chinese Academy of Sciences) Facial expression editing method based on single camera and motion capturing data
CN106782602A (en) * 2016-12-01 2017-05-31 南京邮电大学 (Nanjing University of Posts and Telecommunications) Speech emotion recognition method based on long short-term memory network and convolutional neural networks
CN108363978A (en) * 2018-02-12 2018-08-03 华南理工大学 (South China University of Technology) Emotion perception method based on body language using deep learning and UKF
CN109597891A (en) * 2018-11-26 2019-04-09 重庆邮电大学 (Chongqing University of Posts and Telecommunications) Text emotion analysis method based on bidirectional long short-term memory neural networks

Also Published As

Publication number Publication date
CN112101095A (en) 2020-12-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant