CN112347766A - Multi-label classification method for processing microblog text cognitive distortion - Google Patents


Info

Publication number
CN112347766A
Authority
CN
China
Prior art keywords
text
model
label
classification method
training
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011351175.9A
Other languages
Chinese (zh)
Inventor
刘丰玮
李娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Beijing University of Technology
Priority to CN202011351175.9A
Publication of CN112347766A
Legal status: Pending

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a multi-label classification method for processing microblog text cognitive distortion. The method is a text classification method based on the fusion of BERT (Bidirectional Encoder Representations from Transformers), LSTM (Long Short-Term Memory) and an Attention mechanism: text preprocessing is performed on a plurality of Chinese corpora in a Chinese corpus data set to obtain a plurality of sequences corresponding to the Chinese corpora; the word embeddings of each sequence are extracted with a BERT model; feature extraction is performed on each sequence with LSTM and Attention to obtain the deep semantic features of the text corresponding to each sequence; and the model is trained and tested by classifying the obtained deep semantic features with a softmax classifier, realizing text classification that captures context information in the true sense. The method takes context information into account, avoids the problem of weakened historical memory caused by long sequences, and can effectively improve the classification effect.

Description

Multi-label classification method for processing microblog text cognitive distortion
Technical Field
The present invention belongs to the field of computer natural language processing and cognitive distortion analysis in depression. It mainly involves a multi-label deep learning text classification method based on BERT, LSTM and an attention mechanism.
Background
With the development of social networking platforms, more and more depression patients, especially young people, use microblogs as one of the ways to express negative emotions and suicidal ideation. In recent years, more and more such "tree holes" have appeared on microblog platforms. This vast amount of data contains the thoughts, emotions and even behaviors of depression patients in daily life. Studies have shown that depression patients tend to reveal depressive or suicidal intent in their daily speech and microblog posts. This phenomenon provides a new approach to the cognitive characteristic analysis of people with suspected depression, namely realizing cognitive characteristic analysis by means of microblog tree-hole data. At present, identifying and labeling cognitive distortion in microblog comment texts requires careful analysis, so it is mainly performed manually by experienced professionals. However, manual labeling is easily affected by highly subjective factors such as the annotators' mood and degree of fatigue, so the labeling results of different annotators often deviate from one another to a certain degree; the results lack stability and a large amount of labor is wasted.
So far, research on the causes of depression is not mature. Depression is strongly related to patients' physiology and psychology, and the surrounding environment also has a great influence; researchers in this field have done a great deal of work at home and abroad on depression identification, treatment and related aspects. The precondition and key to the treatment of depression lies in early identification and cognitive analysis: if the cognition of a suspected depression patient can be analyzed quickly and effectively under conditions that are relatively safe and do not intrude much on privacy, corresponding treatment measures can be applied to the patient as soon as possible. At present, however, cognitive analysis and prevention for depression patients in China are still at a relatively early stage. Traditional cognitive analysis mainly relies on authoritative diagnostic criteria and psychological scales. The well-known Hamilton Depression Rating Scale is often used by physicians to diagnose depressive conditions; it scores the patient's mood, suicidal tendency, sleep state and so on, with higher scores indicating more severe depression.
With the development of social networks, microblog text classification has become an important research topic in natural language processing and has attracted close attention at home and abroad. Text classification technology has mainly gone through three stages of development, from statistical language models to shallow machine learning and then to deep learning, and feature selection, an important link in classification, has likewise moved from manual extraction to semi-manual extraction and then to automatic extraction. Research in this field at home and abroad is now relatively mature, and most text classification algorithms are either shallow machine learning algorithms or deep learning algorithms; common examples of the two types are SVM, KNN, RNN and CNN.
For the training of text classifiers, a paper from the Federated Conference on Computer Science and Information Systems converts words into vectors using Word2vec and then classifies sets of wiki articles representing 7 topics with LSTM-based networks, comparing the classification effect with convolutional neural networks (CNNs) and achieving significantly higher accuracy than CNNs. Lichao et al. introduced a new fusion classification learning framework (LSTM-MFCNN) that synthesizes the respective strengths of the CNN and LSTM models, thereby enhancing the expression of word-order semantics and the mining of features. Another study pointed out the defects of the plain recurrent neural network (RNN) model and introduced LSTM and GRU on that basis, constructing a bidirectional RNN model: word vectors are represented with the Skip-gram model of Word2vec, LSTM and GRU are each combined with the constructed bidirectional RNN to extract features, and the features of the text data are finally classified with a softmax classifier. Zhang D et al. proposed a classification method based on Word2vec and SVMperf: first, Word2vec is used to cluster similar features, verifying Word2vec's ability to extract and mine semantic features from specific data in different subject fields and Chinese corpora; then the existing comment texts are vectorized and classified with the word2vec and SVMperf methods. Another method combines the advantages of CNN and RNN, providing C-LSTM, a new unified sentence representation and text classification model. Yang S et al. proposed a multi-layer neural network model (BiLSTM-CNN) built from the LSTM and CNN models, in which Bi-LSTM processes the text data and CNN processes the output of the upper layer to obtain the classification results; experimental comparison shows that this model clearly outperforms single-layer network models and traditional statistical methods. Wang JH et al. studied the effect of word embedding and LSTM on text classification and then combined LSTM and word embedding to improve the short text classification algorithm. Nowak J et al. proposed bidirectional LSTM as an improvement of LSTM and, compared with bag-of-words based deep learning models for multi-label short text classification, obtained clearly better results.
Disclosure of Invention
Based on the above analysis, the invention designs a deep learning multi-label text classification method for performing cognitive distortion analysis on texts. Text semantic representation methods have developed from the initial One-Hot representation to the current mainstream neural-network-based methods such as Word2Vec and GloVe; although these solve the problem of word context relationships to a certain extent, they still do not solve the polysemy problem whereby a word has different meanings in different contexts. The method uses BERT as the language feature extraction and representation method, which not only obtains the rich grammatical and semantic features of microblog comment text, but also overcomes the problem that traditional neural-network-based language feature representations ignore word polysemy.
Firstly, microblog comment content is crawled, relevant experts are asked to label the corresponding cognitive distortions, and the data is then further processed into the format required by the code. The data set is then divided into a training set, a validation set and a test set; the key point is that the corresponding labels are distributed uniformly across the three data sets, after which the three data sets are shuffled.
The data then flows into BERT. BERT, proposed in 2018 by Devlin et al. of the Google team, is applicable to a variety of natural language processing tasks. BERT adopts the Transformer language model; the Transformer is an Encoder-Decoder structure that abandons recursion and uses the attention mechanism to mine the relationship between input and output. The structure of the Transformer model is shown in FIG. 1, and the structure of the BERT model is shown in FIG. 2.
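For illustration, the following is a minimal sketch of extracting token-level word embeddings with a pre-trained BERT model in Python, assuming the Hugging Face transformers library and the public bert-base-chinese checkpoint; the patent does not name a specific library or checkpoint, and the example comment is invented.

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

text = "今天又失眠了，感觉一切都没有意义"  # an invented example microblog comment
inputs = tokenizer(text, padding="max_length", truncation=True,
                   max_length=128, return_tensors="pt")  # seq_len = 128
with torch.no_grad():
    outputs = bert(**inputs)

# Token-level embeddings of shape [batch_size, seq_len, bert_dim] (768 for
# BERT-base); the downstream dropout, BiLSTM and attention layers consume this.
token_embeddings = outputs.last_hidden_state
print(token_embeddings.shape)  # torch.Size([1, 128, 768])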
After BERT, the data is processed by LSTM. The hidden layer representation of the bidirectional long short-term memory network (BiLSTM) is formed by splicing the forward and backward outputs of the LSTM. Through information flowing in both directions, it adds information from the following text to each LSTM time step. Using LSTM not only filters information through gating, but also enables the network to learn long-distance relationships between phrases.
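The splicing of the forward and backward outputs can be seen directly from the tensor shapes. Below is a short PyTorch sketch; the hidden size of 300 is taken from the detailed description, while the batch size and input dimension are illustrative.

import torch
import torch.nn as nn

rnn_hidden_size = 300
bilstm = nn.LSTM(input_size=768, hidden_size=rnn_hidden_size,
                 batch_first=True, bidirectional=True)

x = torch.randn(2, 128, 768)   # [batch_size, seq_len, bert_dim]
hidden, _ = bilstm(x)
# Each time step holds the forward and backward outputs spliced together,
# so the last dimension is rnn_hidden_size * 2.
print(hidden.shape)            # torch.Size([2, 128, 600])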
A multi-label classification method for processing microblog text cognitive distortion comprises the following steps:
Step 1, crawling the comment data set under a microblog "tree hole" post, roughly cleaning the data set (removing junk information such as advertisements), and then having professionals label the data in the data set.
Step 2, randomly dividing the microblog text data set into a training set, a validation set and a test set, and converting the excel file into tsv files according to the format requirements.
Step 3, constructing the model: pre-training with a BERT model, then training an LSTM model, and adding an Attention layer before the output layer of a normal BiLSTM model. The core of the attention mechanism is that an attention vector is generated and the weight of each dimension is updated through similarity calculation with the input vector, so that the value of the key words in a sentence is raised, the model focuses its attention on the key words, the effect of other irrelevant words is reduced, and the precision of text classification is further improved.
Step 3.1, performing word encoding on the text processed in step 2, namely obtaining the word vectors of the text through the BERT model;
Step 3.2, processing the vector obtained in step 3.1 with dropout, inputting the output vector into the hidden layer, computing the hidden-layer vector sequence in one direction with a standard LSTM, computing the bidirectional LSTM over hidden layers in two different directions, and finally merging and outputting the results of the two directions.
Step 3.3, adding an attention mechanism before the output, assigning different weights and bias terms to each output value of the BiLSTM, and computing the weight score of each word in each text with a torch matrix operation.
Step 3.4, normalizing the results of the previous step with softmax.
Step 3.5, computing the score of each class from the computed feature vector and outputting it.
Step 4, training the model constructed in step 3 and, after training, selecting the model that performs best on the test set as the result.
Compared with the prior art, the invention has the following advantages:
The invention is based on a text classification method fusing BERT, LSTM and the Attention mechanism: text preprocessing is performed on a plurality of Chinese corpora in a Chinese corpus data set to obtain a plurality of sequences corresponding to the Chinese corpora; the word embeddings of each sequence are extracted with a BERT model; feature extraction is performed on each sequence with LSTM and Attention to obtain the deep semantic features of the text corresponding to each sequence; and the model is trained and tested by classifying the obtained deep semantic features with a softmax classifier, realizing text classification. The model also has the following specific advantages:
(1) word embeddings are extracted through the BERT model, replacing the word2vec pre-training process of the prior art; as a bidirectional deep system, the BERT model can capture context information in the true sense;
(2) based on LSTM, both past and future information can be acquired, taking context information into account;
(3) the added Attention mechanism can highlight important features, avoids the problem of weakened historical memory caused by long sequences, and effectively improves the classification effect.
Drawings
FIG. 1 is a diagram of a Transformer model.
FIG. 2 is a diagram of the BERT model.
FIG. 3 is a general structure diagram of the model proposed by the present invention.
Detailed Description
The present invention will be described in further detail below with reference to specific embodiments and with reference to the attached drawings.
The invention provides a multi-label classification method for processing microblog text cognitive distortion, which specifically comprises the following steps:
the hardware equipment used by the invention comprises 1 PC and 1 1080 video card;
for convenience of description, the related terms appearing in the detailed description are explained:
(1) BERT (Bidirectional Encoder Representations from Transformers): a bidirectional encoder representation based on the Transformer;
(2) LSTM (Long Short-Term Memory): a long short-term memory network model;
(3) Attention: an attention mechanism;
step 1, crawling a comment data set under a microblog meal carrier, roughly cleaning the data set (removing some junk information such as advertisements) and then carrying out professional labeling on data in the data set.
Step 2, randomly dividing the microblog text data set into a training set, a validation set and a test set, and converting the excel file into tsv files according to the format requirements.
Step 2.1, reading the file content through a java FileReader and generating a tsv text file in which each line has the sentence first and the label after, separated by \t.
Step 2.2, when generating the files, taking 60 percent of the data as the training set, 20 percent as the validation set and 20 percent as the test set, so that the data of each label is distributed as uniformly as possible across the three sets.
Step 2.3, after the three tsv files are generated, shuffling the data with the shuffle method of java's Collections.
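The patent performs steps 2.1 to 2.3 with Java's FileReader and Collections.shuffle; the sketch below reproduces the same 60/20/20 split and tsv format in Python for illustration. The file name labeled_comments.xlsx and the column names sentence and label are assumptions, and a plain shuffle-then-split stands in for the per-label uniform distribution required in step 2.2.

import random
import pandas as pd

df = pd.read_excel("labeled_comments.xlsx")         # assumed columns: sentence, label
rows = list(df.itertuples(index=False))
random.shuffle(rows)                                # rough stand-in for a stratified split

n = len(rows)
splits = {
    "train.tsv": rows[: int(0.6 * n)],              # 60% training set
    "dev.tsv":   rows[int(0.6 * n): int(0.8 * n)],  # 20% validation set
    "test.tsv":  rows[int(0.8 * n):],               # 20% test set
}
for name, subset in splits.items():
    random.shuffle(subset)                          # the patent shuffles each generated file
    with open(name, "w", encoding="utf-8") as f:
        for row in subset:
            # sentence first, label after, separated by \t, as in step 2.1
            f.write(f"{row.sentence}\t{row.label}\n")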
Step 3, constructing the model: pre-training with a BERT model, then training an LSTM model, and adding an Attention layer before the output layer of a normal BiLSTM model. The core of the attention mechanism is that an attention vector is generated and the weight of each dimension is updated through similarity calculation with the input vector, so that the value of the key words in a sentence is raised, the model focuses its attention on the key words, the effect of other irrelevant words is reduced, and the precision of text classification is further improved.
Step 3.1, inputting the original text data set T and preprocessing the text data to obtain a text data set T′, where T = {t1, t2, ..., ti, ..., tlen(T)}, len(T) is the number of texts in T, and ti is the i-th text in T; T′ = {t′1, t′2, ..., t′j, ..., t′len(T′)}, len(T′) is the number of texts in T′, and t′j is the j-th text in T′; each text t′j is unified to a fixed length seq_len;
Step 3.2, vectorizing the text data set T′ with the pre-trained BERT model;
and 3.3, processing the vector obtained in step 3.2 by utilizing dropout. dropout refers to temporarily discarding a neural network unit from a network according to a certain probability in the training process of a deep learning network. The purpose is to prevent overfitting. This procedure yields a three-dimensional matrix with the shape [ batch _ size, seq _ len, bert _ dim ].
Step 3.4, the model inputs the output vector of step 3.3 into the hidden layer, computes the hidden-layer vector sequence in one direction with a standard LSTM, computes the bidirectional LSTM over hidden layers in two different directions, and finally merges and outputs the results of the two directions, producing a three-dimensional matrix of shape [batch_size, seq_len, rnn_hidden_size × 2] (where rnn_hidden_size is the hidden feature dimension, 300 by default; the third dimension is rnn_hidden_size × 2 because the network is bidirectional).
Step 3.5, adding an attention mechanism before the output, assigning different weights and bias terms to each output value of the BiLSTM, and computing the weight score of each word in the text with a torch matrix operation.
Step 3.6, normalizing the previous result with softmax and computing a weight for each time step, obtaining a matrix of shape [batch_size, seq_len, 1].
Step 3.7, combining the hidden states obtained in step 3.4 with the matrix obtained in step 3.6 and applying a torch.sum operation to obtain feat: [batch_size, rnn_hidden_size × 2], namely the extracted feature vector.
Step 3.8, computing the score of each class from the computed feature vector and outputting it.
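Steps 3.2 to 3.8 can be summarized as a single PyTorch module. The sketch below follows the shapes stated above; because the exact torch call for the attention scores is elided in the original text, a learned attention vector combined with torch.bmm is used here as one plausible realization rather than the inventors' exact formula.

import torch
import torch.nn as nn

class BertBiLSTMAttention(nn.Module):
    def __init__(self, bert_dim=768, rnn_hidden_size=300, num_labels=14,
                 dropout=0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)                  # step 3.3
        self.bilstm = nn.LSTM(bert_dim, rnn_hidden_size,
                              batch_first=True, bidirectional=True)  # step 3.4
        # Weights, bias and attention vector for the BiLSTM outputs (step 3.5).
        self.att_proj = nn.Linear(rnn_hidden_size * 2, rnn_hidden_size * 2)
        self.att_vector = nn.Parameter(torch.randn(rnn_hidden_size * 2, 1))
        self.classifier = nn.Linear(rnn_hidden_size * 2, num_labels)  # step 3.8

    def forward(self, bert_out):      # [batch_size, seq_len, bert_dim]
        x = self.dropout(bert_out)
        hidden, _ = self.bilstm(x)    # [batch_size, seq_len, rnn_hidden_size * 2]
        scores = torch.bmm(torch.tanh(self.att_proj(hidden)),
                           self.att_vector.expand(hidden.size(0), -1, -1))
        weights = torch.softmax(scores, dim=1)     # [batch_size, seq_len, 1], step 3.6
        feat = torch.sum(hidden * weights, dim=1)  # [batch_size, rnn_hidden_size * 2], step 3.7
        return self.classifier(feat)               # per-class scores (logits)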
Step 4, training the model constructed in step 3 and, after training, selecting the model that performs best on the test set as the result.
Step 4.1, the model's label input is a 14-dimensional vector, where 0 indicates the label is absent and 1 indicates the label is present (14-dimensional because cognitive distortions fall into roughly fourteen types).
Step 4.2, computing the loss from the class scores (logits) obtained in step 3 with the nn.BCEWithLogitsLoss() function.
Step 4.3, if the number of gradient accumulation steps is greater than 1, correcting by dividing the loss by this number.
Step 4.4, outputting the model with the best result on the test set as the training result.
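A minimal sketch of the loss computation of steps 4.1 to 4.3: a 14-dimensional multi-hot target, nn.BCEWithLogitsLoss over the logits, and division of the loss by the gradient-accumulation count. The batch contents and the accumulation value are illustrative.

import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()
gradient_accumulation_steps = 4          # illustrative value

# Class scores from step 3.8 for a batch; requires_grad so backward() works here.
logits = torch.randn(8, 14, requires_grad=True)
targets = torch.zeros(8, 14)             # 0 = label absent, 1 = label present (step 4.1)
targets[0, [2, 5]] = 1.0                 # e.g. sample 0 carries distortion types 2 and 5

loss = criterion(logits, targets)        # step 4.2
if gradient_accumulation_steps > 1:      # step 4.3
    loss = loss / gradient_accumulation_steps
loss.backward()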
The above embodiments are only exemplary embodiments of the present invention, and are not intended to limit the present invention, and the scope of the present invention is defined by the claims. Various modifications and equivalents may be made by those skilled in the art within the spirit and scope of the present invention, and such modifications and equivalents should also be considered as falling within the scope of the present invention.

Claims (4)

1. A multi-label classification method for processing microblog text cognitive distortion, characterized by comprising the following steps:
step 1, crawling the comment data set under a microblog "tree hole" post, first roughly cleaning the data set, and then having professionals label the data in the data set;
step 2, randomly dividing the microblog text data set into a training set, a validation set and a test set, and converting the excel file into tsv files according to the format requirements;
step 3, constructing the model: pre-training with a BERT model, then training an LSTM model, and adding an Attention layer before the output layer of a normal BiLSTM model, the core of the attention mechanism being that an attention vector is generated and the weight of each dimension is updated through similarity calculation with the input vector, so that the value of the key words in a sentence is raised, the model focuses its attention on the key words, the effect of other irrelevant words is reduced, and the precision of text classification is further improved;
and step 4, training the model constructed in step 3 and, after training, selecting the model that performs best on the test set as the result.
2. The multi-label classification method for processing microblog text cognitive distortion according to claim 1, characterized in that in step 2:
step 2.1, the file content is read through a java FileReader to generate a tsv text file in which each line has the sentence first and the label after, separated by \t;
step 2.2, when the files are generated, 60 percent of the data is taken as the training set, 20 percent as the validation set and 20 percent as the test set, so that the data of each label is uniformly distributed across the three sets;
and step 2.3, after the three tsv files are generated, the data is shuffled with the shuffle method of java's Collections.
3. The multi-label classification method for processing microblog text cognitive distortion according to claim 1, characterized in that in step 3:
step 3.1, word encoding is performed on the text processed in step 2, namely the word vectors of the text are obtained through the BERT model;
step 3.2, the vector obtained in step 3.1 is processed with dropout; the output vector is input into the hidden layer, the hidden-layer vector sequence is computed in one direction with a standard LSTM, the bidirectional LSTM is computed over hidden layers in two different directions, and the results of the two directions are finally merged and output;
step 3.3, an attention mechanism is added before the output, different weights and bias terms are assigned to each output value of the BiLSTM, and the weight score of each word in each text is computed with a torch matrix operation;
step 3.4, the result of the previous step is normalized with softmax;
and step 3.5, the score of each class is computed from the computed feature vector and output.
4. The multi-label classification method for processing microblog text cognitive distortion according to claim 1, characterized in that in step 4:
step 4.1, the model's label input is a 14-dimensional vector, where 0 indicates the label is absent and 1 indicates the label is present;
step 4.2, the loss is computed from the class scores (logits) obtained in step 3 with the nn.BCEWithLogitsLoss() function;
step 4.3, if the number of gradient accumulation steps is greater than 1, the loss is divided by this number as a correction;
and step 4.4, the model with the best result on the test set is output as the training result.
CN202011351175.9A 2020-11-27 2020-11-27 Multi-label classification method for processing microblog text cognitive distortion Pending CN112347766A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011351175.9A CN112347766A (en) 2020-11-27 2020-11-27 Multi-label classification method for processing microblog text cognitive distortion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011351175.9A CN112347766A (en) 2020-11-27 2020-11-27 Multi-label classification method for processing microblog text cognitive distortion

Publications (1)

Publication Number Publication Date
CN112347766A true CN112347766A (en) 2021-02-09

Family

ID=74364948

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011351175.9A Pending CN112347766A (en) 2020-11-27 2020-11-27 Multi-label classification method for processing microblog text cognitive distortion

Country Status (1)

Country Link
CN (1) CN112347766A (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110309306A (en) * 2019-06-19 2019-10-08 淮阴工学院 A kind of Document Modeling classification method based on WSD level memory network
CN111401061A (en) * 2020-03-19 2020-07-10 昆明理工大学 Method for identifying news opinion involved in case based on BERT and Bi L STM-Attention

Non-Patent Citations (1)

Title
於张闲; 胡孔法: "Research on medical information classification based on the BERT-Att-biLSTM model" (基于BERT-Att-biLSTM模型的医学信息分类研究), 计算机时代 (Computer Era), no. 03, 15 March 2020 (2020-03-15), pages 1 - 4 *

Cited By (4)

Publication number Priority date Publication date Assignee Title
CN113191148A (en) * 2021-04-30 2021-07-30 西安理工大学 Rail transit entity identification method based on semi-supervised learning and clustering
CN113779252A (en) * 2021-09-09 2021-12-10 安徽理工大学 Emotion classification method for Chinese short text based on electra + atten + BilSTM
CN115392218A (en) * 2022-07-15 2022-11-25 哈尔滨工业大学 Method and system for constructing pre-training language model
CN115392218B (en) * 2022-07-15 2023-06-20 哈尔滨工业大学 Method and system for constructing pre-training language model

Similar Documents

Publication Publication Date Title
Abdullah et al. SEDAT: sentiment and emotion detection in Arabic text using CNN-LSTM deep learning
CN110210037B (en) Syndrome-oriented medical field category detection method
Sasidhar et al. Emotion detection in hinglish (hindi+ english) code-mixed social media text
CN111382565B (en) Emotion-reason pair extraction method and system based on multiple labels
CN112347766A (en) Multi-label classification method for processing microblog text cognition distortion
CN108460089A (en) Diverse characteristics based on Attention neural networks merge Chinese Text Categorization
CN110020671B (en) Drug relationship classification model construction and classification method based on dual-channel CNN-LSTM network
CN110287323B (en) Target-oriented emotion classification method
CN107247702A (en) A kind of text emotion analysis and processing method and system
CN110502753A (en) A kind of deep learning sentiment analysis model and its analysis method based on semantically enhancement
CN109492105B (en) Text emotion classification method based on multi-feature ensemble learning
Sheshikala et al. Natural language processing and machine learning classifier used for detecting the author of the sentence
Zhao et al. Multi-level fusion of wav2vec 2.0 and bert for multimodal emotion recognition
CN115952292B (en) Multi-label classification method, apparatus and computer readable medium
CN114065848A (en) Chinese aspect level emotion classification method based on pre-training emotion embedding
Antit et al. TunRoBERTa: a Tunisian robustly optimized BERT approach model for sentiment analysis
Parvin et al. Multi-class textual emotion categorization using ensemble of convolutional and recurrent neural network
CN112597304A (en) Question classification method and application thereof
CN112784601A (en) Key information extraction method and device, electronic equipment and storage medium
Hasnat et al. Understanding sarcasm from reddit texts using supervised algorithms
Marerngsit et al. A two-stage text-to-emotion depressive disorder screening assistance based on contents from online community
AlBatayha Multi-topic labelling classification based on LSTM
KR20200040032A (en) A method ofr classification of korean postings based on bidirectional lstm-attention
CN114519355A (en) Medicine named entity recognition and entity standardization method
Shaw et al. Investigations in psychological stress detection from social media text using deep architectures

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination