CN110245229A - Deep learning topic sentiment classification method based on data augmentation - Google Patents

Deep learning topic sentiment classification method based on data augmentation

Info

Publication number
CN110245229A
CN110245229A (application CN201910365005.7A)
Authority
CN
China
Prior art keywords
sentence
word
training
data
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910365005.7A
Other languages
Chinese (zh)
Other versions
CN110245229B (en)
Inventor
周晨星
赖韩江
印鉴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat-sen University
Original Assignee
Sun Yat-sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat-sen University
Priority to CN201910365005.7A priority Critical patent/CN110245229B/en
Publication of CN110245229A publication Critical patent/CN110245229A/en
Application granted granted Critical
Publication of CN110245229B publication Critical patent/CN110245229B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/353Clustering; Classification into predefined classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/355Class or cluster creation or modification

Abstract

The present invention provides a deep-learning topic sentiment classification method based on data augmentation. The method first lets each word obtain preliminary semantic information through the BERT pre-trained language model, and then learns the contextual semantic features between words through a bidirectional GRU network. At the same time, a data augmentation method is proposed: by removing the word that most influences the sentiment polarity of each sentence, the model is forced to learn to judge the sentiment polarity of more difficult sentences, while the expanded data set lets the model capture features from more data. Experiments on the corresponding data sets show a clear improvement over previous sentiment classification methods.

Description

Deep learning topic sentiment classification method based on data augmentation
Technical field
The present invention relates to the field of natural language processing, and more particularly to a deep-learning topic sentiment classification method based on data augmentation.
Background technique
In recent years, Internet technology has matured and people have grown accustomed to exchanging ideas and expressing their opinions online. This process leaves a great deal of text on the Internet, and sentiment analysis technology aims to mine from that text the opinions and emotional tendencies users express about particular things, providing technical support for subsequent concrete application scenarios such as shop improvement. Sentiment analysis technology therefore has high application value in both academia and industry.
Topic-level sentiment analysis judges the emotional tendency of a sentence with respect to some topic, and plays a decisive role in sentiment analysis. Current analysis methods mainly include sentiment-dictionary-based methods and machine-learning-based methods. The dictionary-based method finds the sentiment words related to a given topic in a sentence and evaluates the overall sentiment orientation of the sentence toward that topic from the number and polarities of those words; the key step is determining which sentiment words are relevant to the given topic before performing the statistical analysis. This method is simple and easy to operate, but its disadvantages are also obvious: 1. it places very high demands on the quality of the constructed sentiment dictionary, and words that express sentiment implicitly are easily ignored, which lowers the accuracy of the sentiment analysis; 2. it needs to precisely locate the sentiment words about the topic in the current sentence, and inaccurate positioning likewise reduces classification performance. Dictionary-based methods are therefore gradually being replaced by other methods. Much current research uses machine-learning sentiment analysis methods, which first treat the task as a classification problem: features helpful for topic-oriented sentiment analysis are selected from labeled training samples, and a classifier model (such as k-nearest neighbors, naive Bayes, or a support vector machine) is trained to predict the sentiment polarity of an unseen sentence about some topic. This performs better than dictionary-based classification, but still falls short of expectations.
One reason for the currently unsatisfactory classification performance is the small scale of the data sets. Consider expanding new data on the basis of the original data set so that the deep network gains stronger classification ability. Since the word that influences the sentiment polarity toward a topic in a sentence is not unique, cutting out the single word that most influences the polarity judgment and feeding the result back into the deep network as a new training sample serves two purposes: on the one hand it expands the data set, strengthening the deep network's extraction and learning of the data set's features; on the other hand it strengthens the deep network's ability to analyze sentences whose sentiment polarity is not obvious. In this way the classifier performs better and its accuracy is higher.
Summary of the invention
The present invention provides a deep-learning topic sentiment classification method based on data augmentation with higher accuracy.
In order to achieve the above technical effect, the technical scheme of the present invention is as follows:
A deep-learning topic sentiment classification method based on data augmentation, comprising the following steps:
S1: establish a deep learning network model G that produces the semantic information, feature representation, and classifier of a sentence;
S2: according to the deep learning network model G, pick out the word that most influences the sentiment analysis of each sentence in the training set to form a new training set;
S3: train the deep learning network model G again on the original training set and the new training set, then test.
Further, the detailed process of step S1 is:
S11: using the BERT pre-trained language model, each word in a pre-processed sentence is represented by a low-dimensional, dense real-valued vector; since the BERT pre-trained language model itself already models the semantics of each word, every word output by BERT carries semantic information, and the whole sentence is then expressed as X = [x_1, …, x_t, …, x_n], where n is the length of the sentence and each vector in the matrix X has 768 dimensions;
S12: the word vectors produced by the BERT layer already carry some semantic information, but the model must also learn the contextual information of each word of the sentence, so a bidirectional GRU network is used to learn the contextual information of the sentence; each word corresponds to one time step t, and the input of each GRU cell is the word vector x_t at the current time t together with the hidden-state output h_f(t-1) of the GRU cell at time t-1, giving the forward GRU representation H_f = [h_f1, …, h_ft, …, h_fn] and, similarly, the backward GRU representation H_b = [h_b1, …, h_bt, …, h_bn];
S13: in order to learn the relationship between each word of the sentence and the topic word, an attention layer is built to compute the weight of each word with respect to the topic word; the larger the weight, the more that word influences the sentiment polarity of the sentence toward the current topic. First, each word is represented via S12 as H = H_f + H_b and the word vector of the current topic word is denoted e_N; the two are concatenated and passed through a tanh activation, giving the vector M = tanh([H; e_N]); a parameter W is then learned to compute the weight of each word with respect to the topic word, and multiplying by the GRU output of the word at the corresponding position yields the overall representation r of the sentence with respect to the topic word, where r = H·softmax(W^T M);
S14: the last output layer is built; the sentence representation r obtained in S13 is mapped through two fully connected layers and one softmax layer into three classification categories, corresponding to the probabilities that the sentiment polarity of the current sentence is positive, negative, or neutral, and the sentiment polarity with the largest probability is then output as the result;
S15: the training data in the data set is trained once according to the above process, using cross entropy as the loss function, the ADAM optimizer for optimization, and L2 regularization to prevent overfitting; finally the parameters of the network are saved.
Further, the detailed process of step S2 is:
S21: each word of each sentence of the training data is replaced with [MASK], one at a time; if the current sentence is s = [w_1, …, w_t, …, w_n], where n is the number of words the sentence contains, then replacing word by word gives, for each sentence, a sentence set s' of n sentences, where s' is {[[MASK], …, w_t, …, w_n], …, [w_1, …, [MASK], …, w_n], …, [w_1, …, w_t, …, [MASK]]}.
S22: the network parameters saved in S15 are reloaded to obtain the previously trained network G; each sentence of s' is then input into the network G separately to obtain a predicted sentiment-polarity probability distribution, the sentence whose distribution is farthest from the true sentiment distribution is selected and placed into the new training set, and a new training set as large as the original is obtained; every one of its sentences has had the word that most influences its sentiment polarity cut out, which strengthens the classification ability of the model.
Further, the detailed process of step S3 is:
S31: the training set generated in S22 together with the original training data is used as the training set and fed into the network already trained in S15, which is then trained once more according to the process of S1; training still uses cross entropy as the loss function and ADAM as the optimizer, with L2 regularization, the learning rate is set to 0.01, and the model converges after 5 epochs of training.
S32: the test data is fed into the network trained in S31 for testing, and the test metric is measured by accuracy.
Further, the decision rule in step S22 for the sentence farthest from the true sentiment distribution is:
Assume the true sentiment distribution is y_1, y_2, y_3, …, y_n with true label y_t, and the set of all predicted probability distributions is {(x_11, x_12, x_13, …, x_1n), (x_21, x_22, x_23, …, x_2n), …, (x_m1, x_m2, x_m3, …, x_mn)}; find the smallest x_it, and the distribution it belongs to, (x_i1, x_i2, x_i3, …, x_in), is the sentence farthest from the true sentiment distribution.
Compared with the prior art, the beneficial effects of the technical solution of the present invention are:
The present invention first lets each word obtain preliminary semantic information through the BERT pre-trained language model, and then learns the contextual semantic features between words through a bidirectional GRU network. At the same time, it proposes a data augmentation method: by removing the word that most influences the sentiment polarity of each sentence, the model is forced to learn to judge the sentiment polarity of more difficult sentences, while the expanded data set lets the model capture features from more data. Experiments on the corresponding data sets show a clear improvement over previous sentiment classification methods.
Detailed description of the invention
Fig. 1 is a schematic diagram of the network flow of the data augmentation of the present invention;
Fig. 2 is a schematic diagram of the complete model of the present invention.
Specific embodiment
The attached figures are for illustrative purposes only and should not be understood as limiting this patent;
in order to better illustrate this embodiment, certain components in the figures may be omitted, enlarged, or reduced, and do not represent the size of the actual product;
for those skilled in the art, it is to be understood that certain known structures and their explanations may be omitted from the figures.
The technical solution of the present invention is further described below with reference to the accompanying drawings and embodiments.
Embodiment 1
As shown in Fig. 1, a deep-learning topic sentiment classification method based on data augmentation comprises the following steps:
S1: establish a deep learning network model G that produces the semantic information, feature representation, and classifier of a sentence;
S2: according to the deep learning network model G, pick out the word that most influences the sentiment analysis of each sentence in the training set to form a new training set;
S3: train the deep learning network model G again on the original training set and the new training set, then test.
The detailed process of step S1 is:
S11: using the BERT pre-trained language model, each word in a pre-processed sentence is represented by a low-dimensional, dense real-valued vector; since the BERT pre-trained language model itself already models the semantics of each word, every word output by BERT carries semantic information, and the whole sentence is then expressed as X = [x_1, …, x_t, …, x_n], where n is the length of the sentence and each vector in the matrix X has 768 dimensions;
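A minimal sketch of this step in Python, assuming the HuggingFace transformers library (the patent names no implementation); bert-base models emit the 768-dimensional token vectors described above:

import torch
from transformers import BertTokenizer, BertModel

# Assumed setup: bert-base-uncased emits one 768-dimensional vector per token.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

sentence = "The food is so good and delicious, but the staff is terrible."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = bert(**inputs)

# X corresponds to X = [x_1, ..., x_t, ..., x_n]; its shape is
# (1, number of tokens including BERT's special tokens, 768).
X = outputs.last_hidden_state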
S12: the word vectors produced by the BERT layer already carry some semantic information, but the model must also learn the contextual information of each word of the sentence, so a bidirectional GRU network is used to learn the contextual information of the sentence; each word corresponds to one time step t, and the input of each GRU cell is the word vector x_t at the current time t together with the hidden-state output h_f(t-1) of the GRU cell at time t-1, giving the forward GRU representation H_f = [h_f1, …, h_ft, …, h_fn] and, similarly, the backward GRU representation H_b = [h_b1, …, h_bt, …, h_bn];
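A sketch of the bidirectional GRU in PyTorch (framework assumed); a hidden size of 768 is an assumption chosen so that the sum H = H_f + H_b used in S13 keeps the 768 dimensions that appear later:

import torch
import torch.nn as nn

gru = nn.GRU(input_size=768, hidden_size=768,
             bidirectional=True, batch_first=True)

X = torch.randn(1, 14, 768)   # stand-in BERT output for a 14-token sentence
out, _ = gru(X)               # shape (1, 14, 2 * 768)

H_f = out[:, :, :768]         # forward states  [h_f1, ..., h_fn]
H_b = out[:, :, 768:]         # backward states [h_b1, ..., h_bn]
H = H_f + H_b                 # element-wise sum used by the attention layer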
S13: in order to learn the relationship between each word of the sentence and the topic word, an attention layer is built to compute the weight of each word with respect to the topic word; the larger the weight, the more that word influences the sentiment polarity of the sentence toward the current topic. First, each word is represented via S12 as H = H_f + H_b and the word vector of the current topic word is denoted e_N; the two are concatenated and passed through a tanh activation, giving the vector M = tanh([H; e_N]); a parameter W is then learned to compute the weight of each word with respect to the topic word, and multiplying by the GRU output of the word at the corresponding position yields the overall representation r of the sentence with respect to the topic word, where r = H·softmax(W^T M);
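A sketch of the attention layer under the same assumptions; the topic-word vector e_N is broadcast across the n word positions so that M = tanh([H; e_N]) has 768 + 300 = 1068 rows, matching the dimensions given in Embodiment 2:

import torch
import torch.nn as nn
import torch.nn.functional as F

n, d_h, d_e = 14, 768, 300
H = torch.randn(d_h, n)        # summed GRU outputs, one column per word
e_N = torch.randn(d_e, 1)      # stand-in GloVe vector of the topic word

# M = tanh([H; e_N]) with e_N repeated at every word position
M = torch.tanh(torch.cat([H, e_N.expand(d_e, n)], dim=0))   # (1068, n)

W = nn.Parameter(torch.randn(d_h + d_e, 1))    # the learned parameter W
alpha = F.softmax(W.t() @ M, dim=1)            # (1, n); weights sum to 1
r = H @ alpha.t()                              # r = H * softmax(W^T M), (768, 1)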
S14: the last output layer is built; the sentence representation r obtained in S13 is mapped through two fully connected layers and one softmax layer into three classification categories, corresponding to the probabilities that the sentiment polarity of the current sentence is positive, negative, or neutral, and the sentiment polarity with the largest probability is then output as the result;
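A sketch of the output layer: two fully connected layers followed by softmax over the three polarity classes (the width of the first layer and the activation between the layers are assumptions, since the patent does not specify them):

import torch
import torch.nn as nn

classifier = nn.Sequential(
    nn.Linear(768, 256),   # first fully connected layer (width assumed)
    nn.ReLU(),             # activation assumed
    nn.Linear(256, 3),     # second fully connected layer -> 3 classes
)

r = torch.randn(1, 768)                       # sentence representation from S13
probs = torch.softmax(classifier(r), dim=1)   # P(positive), P(negative), P(neutral)
prediction = probs.argmax(dim=1)              # polarity with the largest probability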
S15: the training data in the data set is trained once according to the above process, using cross entropy as the loss function, the ADAM optimizer for optimization, and L2 regularization to prevent overfitting; finally the parameters of the network are saved.
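A sketch of this training pass, assuming a model object wrapping the S11-S14 stack and a standard data loader; L2 regularization is realized as ADAM weight decay, whose value is an assumption:

import torch
import torch.nn as nn

def train_once(model, train_loader, epochs=5, lr=0.01, l2=1e-5):
    # One training pass: cross-entropy loss, ADAM optimizer, L2 regularization.
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=l2)
    for _ in range(epochs):
        for sentences, topics, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(sentences, topics), labels)
            loss.backward()
            optimizer.step()
    torch.save(model.state_dict(), "model_G.pt")   # parameters saved for S2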
The detailed process of step S2 is:
S21: each word of each sentence of the training data is replaced with [MASK], one at a time; if the current sentence is s = [w_1, …, w_t, …, w_n], where n is the number of words the sentence contains, then replacing word by word gives, for each sentence, a sentence set s' of n sentences, where s' is {[[MASK], …, w_t, …, w_n], …, [w_1, …, [MASK], …, w_n], …, [w_1, …, w_t, …, [MASK]]}.
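A sketch of the masking step; mask_variants is a hypothetical helper that produces the n copies of a sentence, each with one word replaced by [MASK]:

def mask_variants(words):
    # Return the n masked copies of a sentence, one per word position.
    return [words[:i] + ["[MASK]"] + words[i + 1:] for i in range(len(words))]

s = "The food is so good and delicious , but the staff is terrible .".split()
s_prime = mask_variants(s)
# s_prime[0]  == ['[MASK]', 'food', 'is', ...]
# s_prime[-1] == ['The', 'food', ..., 'is', 'terrible', '[MASK]']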
S22: the network parameters saved in S15 are reloaded to obtain the previously trained network G; each sentence of s' is then input into the network G separately to obtain a predicted sentiment-polarity probability distribution, the sentence whose distribution is farthest from the true sentiment distribution is selected and placed into the new training set, and a new training set as large as the original is obtained; every one of its sentences has had the word that most influences its sentiment polarity cut out, which strengthens the classification ability of the model.
The detailed process of step S3 is:
S31: the training set generated in S22 together with the original training data is used as the training set and fed into the network already trained in S15, which is then trained once more according to the process of S1; training still uses cross entropy as the loss function and ADAM as the optimizer, with L2 regularization, the learning rate is set to 0.01, and the model converges after 5 epochs of training.
S32: the test data is fed into the network trained in S31 for testing, and the test metric is measured by accuracy.
The decision rule in step S22 for the sentence farthest from the true sentiment distribution is:
Assume the true sentiment distribution is y_1, y_2, y_3, …, y_n with true label y_t, and the set of all predicted probability distributions is {(x_11, x_12, x_13, …, x_1n), (x_21, x_22, x_23, …, x_2n), …, (x_m1, x_m2, x_m3, …, x_mn)}; find the smallest x_it, and the distribution it belongs to, (x_i1, x_i2, x_i3, …, x_in), is the sentence farthest from the true sentiment distribution.
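A sketch of this rule, reading it as: the masked variant whose predicted probability x_it on the true label t is smallest is the farthest from the true distribution (the model and its calling convention are assumptions carried over from the earlier sketches):

import torch

def farthest_variant(model, variants, topic, true_label):
    # Pick the masked sentence whose predicted probability on the
    # true label is smallest -- the x_it of the decision rule.
    with torch.no_grad():
        scores = [torch.softmax(model(v, topic), dim=-1).squeeze()[true_label].item()
                  for v in variants]
    return variants[scores.index(min(scores))]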
Embodiment 2
The data set used by this method comes from the series of computational semantic parsing system evaluations designed in 2015 under the Special Interest Group on the Lexicon (SIGLEX) of the Association for Computational Linguistics; the data come from Task 12 of that evaluation and contain two parts: one part consists of customer reviews of restaurants, the other of customer reviews of laptop computers. There are 13 topic words for the restaurant reviews and 87 topic words for the laptop reviews, and both parts contain only three sentiment labels: positive, neutral, and negative. The basic statistics of the data set used in the present invention are shown in the following table:
Dataset Train Test Topics
Restaurant 1478 775 13
Laptop 1972 948 87
The construction of the network is shown in the left-hand part of Fig. 2.
Take the following sentence as an example: "The food is so good and delicious, but the staff is terrible. Topic: service" (the label is negative). The original sentence is input into the model, and after BERT a 768 × 14 matrix is obtained. This matrix is then input into the bidirectional GRU to learn the contextual information between words, giving the forward GRU output H_f = [h_f1, …, h_ft, …, h_fn] and the backward GRU output H_b = [h_b1, …, h_bt, …, h_bn]. The two outputs are added dimension-wise to obtain the overall representation of each word: H = [h_1, h_2, …, h_n]. Next, this matrix is concatenated with the word vector of the topic word in order to compute the relevance of each word to the topic word. The topic word is represented with GloVe vectors of dimension 300, so the matrix M after concatenation has dimension 1068 × 14. A parameter W is then learned to compute the weight of each word, calculated as softmax(W^T M); the resulting weight matrix is α = [α_1, α_2, …, α_n], whose elements sum to 1. The weights are multiplied by the corresponding word representations in H and summed to obtain the representation r of the whole sentence, where r is a 1 × 768 vector. It is then passed through two fully connected layers and finally mapped by a softmax layer into three classification categories, corresponding to the sentiment classes positive, negative, and neutral.
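The dimensions in this walkthrough can be checked with a small sketch (framework and stand-in tensors as assumed in the earlier sketches):

import torch

n = 14                                  # tokens in the example sentence
H = torch.randn(768, n)                 # summed BiGRU output: 768 x 14
e_N = torch.randn(300, 1)               # GloVe vector of "service": 300-dim

M = torch.cat([H, e_N.expand(300, n)], dim=0)
print(M.shape)                          # torch.Size([1068, 14])

alpha = torch.softmax(torch.randn(1, n), dim=1)   # stand-in attention weights
print(round(float(alpha.sum()), 4))     # 1.0 -- the weights sum to 1
r = (H @ alpha.t()).t()
print(r.shape)                          # torch.Size([1, 768])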
The model just built is first trained with the existing training set, using cross entropy as the loss function, the ADAM optimizer for optimization, and L2 regularization to prevent overfitting. The parameters are saved after training, and the model is then used to expand the data set. The idea of the expansion comes from how one would judge a sentence such as "The food is so good and delicious, but the staff is terrible. Topic: service". The sentiment polarity of this sentence toward the topic service can be judged to be negative from two angles: one is the word "terrible"; the other is the contrast introduced by "good" followed by "but" — since "good" is positive, the object described after "but" must be negative. So even if the word "terrible" is masked, the sentiment polarity toward service can still be inferred from "good" and "but". Forcing the model to use this kind of judgment makes it learn the semantic information within the sentence, strengthening its classification ability.
What kind of word should be masked to increase the difficulty of predicting the sentiment polarity of the current sentence while at the same time enlarging the data set? As in Fig. 2, the steps are as follows. Taking the sentence above, "The food is so good and delicious, but the staff is terrible.", each word from the first to the last is masked in turn, substituting it with [MASK]; this generates 14 sentences (because the sentence has 14 words). These sentences are each fed into the already-trained model to obtain a predicted probability distribution per sentence, and the sentence whose distribution is farthest from the true sentiment distribution is selected (for example, if two predicted distributions are 0.1 0.8 0.1 and 0.3 0.5 0.2 and the true distribution is 0 1 0, then the distribution 0.3 0.5 0.2 is considered farthest from the true distribution) and placed into the new training set. For the current sentence, "The food is so good and delicious, but the staff is [MASK]" would be selected and placed into the new training set. Performing this operation on every sentence of the training set yields a new training set as large as the original.
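Applying the rule from S22 to the numeric example above (the smaller the probability assigned to the true label, the farther the distribution):

true_label = 1                   # true distribution 0 1 0 -> label index 1
p1 = [0.1, 0.8, 0.1]
p2 = [0.3, 0.5, 0.2]

farthest = min([p1, p2], key=lambda p: p[true_label])
print(farthest)                  # [0.3, 0.5, 0.2] -- farthest from the truth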
After the new data set is obtained, it is merged with the original data set into one larger data set, and the network is retrained once on the basis of the previous model, still using cross entropy as the loss function and ADAM as the optimizer, with L2 regularization; the learning rate is set to 0.01, and the model begins to converge after 5 epochs of training. The test data is then fed into the trained model for testing.
To demonstrate the effectiveness of this experiment, it is compared for verification against a currently effective topic sentiment classification model (Word&Clause-level). The evaluation metric is accuracy, defined as the percentage of samples the model predicts correctly out of the total number of samples in the test data set. The experimental results are as follows:
As can be seen from the results, the present invention achieves a considerable improvement over previous methods. Starting from the angle of data augmentation, it improves the model's classification ability in topic sentiment analysis on sentences that are harder to classify, and in this way makes reasonable use of the BERT pre-trained language model.
The above embodiment is a preferred embodiment of the present invention, but the embodiments of the present invention are not limited by it. Any other change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention should be an equivalent replacement and is included within the protection scope of the present invention.
The same or similar reference numerals correspond to the same or similar components;
the positional relationships described in the drawings are for illustration only and should not be understood as a limitation on this patent;
obviously, the above embodiments are merely examples given to clearly illustrate the present invention and are not a limitation on its embodiments. For those of ordinary skill in the art, other variations or changes in different forms may also be made on the basis of the above description. It is neither necessary nor possible to exhaust all embodiments here. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included within the protection scope of the claims of the present invention.

Claims (5)

1. A deep-learning topic sentiment classification method based on data augmentation, characterized by comprising the following steps:
S1: establish a deep learning network model G that produces the semantic information, feature representation, and classifier of a sentence;
S2: according to the deep learning network model G, pick out the word that most influences the sentiment analysis of each sentence in the training set to form a new training set;
S3: train the deep learning network model G again on the original training set and the new training set, then test.
2. The deep-learning topic sentiment classification method based on data augmentation according to claim 1, characterized in that the detailed process of step S1 is:
S11: using the BERT pre-trained language model, each word in a pre-processed sentence is represented by a low-dimensional, dense real-valued vector; since the BERT pre-trained language model itself already models the semantics of each word, every word output by BERT carries semantic information, and the whole sentence is then expressed as X = [x_1, …, x_t, …, x_n], where n is the length of the sentence and each vector in the matrix X has 768 dimensions;
S12: the word vectors produced by the BERT layer already carry some semantic information, but the model must also learn the contextual information of each word of the sentence, so a bidirectional GRU network is used to learn the contextual information of the sentence; each word corresponds to one time step t, and the input of each GRU cell is the word vector x_t at the current time t together with the hidden-state output h_f(t-1) of the GRU cell at time t-1, giving the forward GRU representation H_f = [h_f1, …, h_ft, …, h_fn] and, similarly, the backward GRU representation H_b = [h_b1, …, h_bt, …, h_bn];
S13: in order to learn the relationship between each word of the sentence and the topic word, an attention layer is built to compute the weight of each word with respect to the topic word; the larger the weight, the more that word influences the sentiment polarity of the sentence toward the current topic. First, each word is represented via S12 as H = H_f + H_b and the word vector of the current topic word is denoted e_N; the two are concatenated and passed through a tanh activation, giving the vector M = tanh([H; e_N]); a parameter W is then learned to compute the weight of each word with respect to the topic word, and multiplying by the GRU output of the word at the corresponding position yields the overall representation r of the sentence with respect to the topic word, where r = H·softmax(W^T M);
S14: the last output layer is built; the sentence representation r obtained in S13 is mapped through two fully connected layers and one softmax layer into three classification categories, corresponding to the probabilities that the sentiment polarity of the current sentence is positive, negative, or neutral, and the sentiment polarity with the largest probability is then output as the result;
S15: the training data in the data set is trained once according to the above process, using cross entropy as the loss function, the ADAM optimizer for optimization, and L2 regularization to prevent overfitting; finally the parameters of the network are saved.
3. The deep-learning topic sentiment classification method based on data augmentation according to claim 2, characterized in that the detailed process of step S2 is:
S21: each word of each sentence of the training data is replaced with [MASK], one at a time; if the current sentence is s = [w_1, …, w_t, …, w_n], where n is the number of words the sentence contains, then replacing word by word gives, for each sentence, a sentence set s' of n sentences, where s' is {[[MASK], …, w_t, …, w_n], …, [w_1, …, [MASK], …, w_n], …, [w_1, …, w_t, …, [MASK]]}.
S22: the network parameters saved in S15 are reloaded to obtain the previously trained network G; each sentence of s' is then input into the network G separately to obtain a predicted sentiment-polarity probability distribution, the sentence whose distribution is farthest from the true sentiment distribution is selected and placed into the new training set, and a new training set as large as the original is obtained; every one of its sentences has had the word that most influences its sentiment polarity cut out, which strengthens the classification ability of the model.
4. The deep-learning topic sentiment classification method based on data augmentation according to claim 3, characterized in that the detailed process of step S3 is:
S31: the training set generated in S22 together with the original training data is used as the training set and fed into the network already trained in S15, which is then trained once more according to the process of S1; training still uses cross entropy as the loss function and ADAM as the optimizer, with L2 regularization, the learning rate is set to 0.01, and the model converges after 5 epochs of training.
S32: the test data is fed into the network trained in S31 for testing, and the test metric is measured by accuracy.
5. The deep-learning topic sentiment classification method based on data augmentation according to claim 4, characterized in that the decision rule in step S22 for the sentence farthest from the true sentiment distribution is:
Assume the true sentiment distribution is y_1, y_2, y_3, …, y_n with true label y_t, and the set of all predicted probability distributions is {(x_11, x_12, x_13, …, x_1n), (x_21, x_22, x_23, …, x_2n), …, (x_m1, x_m2, x_m3, …, x_mn)}; find the smallest x_it, and the distribution it belongs to, (x_i1, x_i2, x_i3, …, x_in), is the sentence farthest from the true sentiment distribution.
CN201910365005.7A 2019-04-30 2019-04-30 Deep learning topic sentiment classification method based on data augmentation Active CN110245229B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910365005.7A CN110245229B (en) Deep learning topic sentiment classification method based on data augmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910365005.7A CN110245229B (en) Deep learning topic sentiment classification method based on data augmentation

Publications (2)

Publication Number Publication Date
CN110245229A true CN110245229A (en) 2019-09-17
CN110245229B CN110245229B (en) 2023-03-28

Family

ID=67883613

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910365005.7A Active CN110245229B (en) Deep learning topic sentiment classification method based on data augmentation

Country Status (1)

Country Link
CN (1) CN110245229B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070112753A1 (en) * 2005-11-14 2007-05-17 Microsoft Corporation Augmenting a training set for document categorization
CN107577785A (en) * 2017-09-15 2018-01-12 南京大学 A kind of level multi-tag sorting technique suitable for law identification
CN109034092A (en) * 2018-08-09 2018-12-18 燕山大学 Accident detection method for monitoring system
CN109670169A (en) * 2018-11-16 2019-04-23 中山大学 A kind of deep learning sensibility classification method based on feature extraction

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110826336A (en) * 2019-09-18 2020-02-21 华南师范大学 Emotion classification method, system, storage medium and equipment
CN110728153A (en) * 2019-10-15 2020-01-24 天津理工大学 Multi-category emotion classification method based on model fusion
CN112685558A (en) * 2019-10-18 2021-04-20 普天信息技术有限公司 Emotion classification model training method and device
CN111104512A (en) * 2019-11-21 2020-05-05 腾讯科技(深圳)有限公司 Game comment processing method and related equipment
CN111104512B (en) * 2019-11-21 2020-12-22 腾讯科技(深圳)有限公司 Game comment processing method and related equipment
CN110956579A (en) * 2019-11-27 2020-04-03 中山大学 Text image rewriting method based on semantic segmentation graph generation
CN110956579B (en) * 2019-11-27 2023-05-23 中山大学 Text picture rewriting method based on generation of semantic segmentation map
CN111079406A (en) * 2019-12-13 2020-04-28 华中科技大学 Natural language processing model training method, task execution method, equipment and system
CN111309871A (en) * 2020-03-26 2020-06-19 普华讯光(北京)科技有限公司 Method for matching degree between requirement and output result based on text semantic analysis
CN111309871B (en) * 2020-03-26 2024-01-30 普华讯光(北京)科技有限公司 Method for matching degree between demand and output result based on text semantic analysis
US11468239B2 (en) 2020-05-22 2022-10-11 Capital One Services, Llc Joint intent and entity recognition using transformer models
CN111597328A (en) * 2020-05-27 2020-08-28 青岛大学 New event theme extraction method
CN111859908A (en) * 2020-06-30 2020-10-30 北京百度网讯科技有限公司 Pre-training method and device for emotion learning, electronic equipment and readable storage medium
CN111859908B (en) * 2020-06-30 2024-01-19 北京百度网讯科技有限公司 Emotion learning pre-training method and device, electronic equipment and readable storage medium
CN112069320B (en) * 2020-09-10 2022-06-28 东北大学秦皇岛分校 Span-based fine-grained sentiment analysis method
CN112069320A (en) * 2020-09-10 2020-12-11 东北大学秦皇岛分校 Span-based fine-grained emotion analysis method
CN112765993A (en) * 2021-01-20 2021-05-07 上海德拓信息技术股份有限公司 Semantic parsing method, system, device and readable storage medium
CN113297842A (en) * 2021-05-25 2021-08-24 湖北师范大学 Text data enhancement method
CN113255365A (en) * 2021-05-28 2021-08-13 湖北师范大学 Text data enhancement method, device and equipment and computer readable storage medium
CN113723075A (en) * 2021-08-28 2021-11-30 重庆理工大学 Specific target emotion analysis method for enhancing and counterlearning of fused word shielding data
CN114580430A (en) * 2022-02-24 2022-06-03 大连海洋大学 Method for extracting fish disease description emotion words based on neural network
CN114580430B (en) * 2022-02-24 2024-04-05 大连海洋大学 Method for extracting fish disease description emotion words based on neural network
CN115662435A (en) * 2022-10-24 2023-01-31 福建网龙计算机网络信息技术有限公司 Virtual teacher simulation voice generation method and terminal
US11727915B1 (en) 2022-10-24 2023-08-15 Fujian TQ Digital Inc. Method and terminal for generating simulated voice of virtual teacher

Also Published As

Publication number Publication date
CN110245229B (en) 2023-03-28

Similar Documents

Publication Publication Date Title
CN110245229A (en) A kind of deep learning theme sensibility classification method based on data enhancing
CN108399158B (en) Attribute emotion classification method based on dependency tree and attention mechanism
Vadicamo et al. Cross-media learning for image sentiment analysis in the wild
CN109558487A (en) Document Classification Method based on the more attention networks of hierarchy
CN110532554A (en) A kind of Chinese abstraction generating method, system and storage medium
CN111143576A (en) Event-oriented dynamic knowledge graph construction method and device
CN110222178A (en) Text sentiment classification method, device, electronic equipment and readable storage medium storing program for executing
CN107688870A (en) A kind of the classification factor visual analysis method and device of the deep neural network based on text flow input
CN112256866A (en) Text fine-grained emotion analysis method based on deep learning
CN112434164B (en) Network public opinion analysis method and system taking topic discovery and emotion analysis into consideration
CN108920446A (en) A kind of processing method of Engineering document
Sujana et al. Rumor detection on Twitter using multiloss hierarchical BiLSTM with an attenuation factor
CN109670169B (en) Deep learning emotion classification method based on feature extraction
CN111259147A (en) Sentence-level emotion prediction method and system based on adaptive attention mechanism
Nabil et al. Cufe at semeval-2016 task 4: A gated recurrent model for sentiment classification
CN113535949A (en) Multi-mode combined event detection method based on pictures and sentences
Torres, Carmen Vaca Cross-lingual perspectives about crisis-related conversations on Twitter
Wang et al. Distant supervised relation extraction with position feature attention and selective bag attention
Lisjana et al. Classifying complaint reports using rnn and handling imbalanced dataset
CN115906824A (en) Text fine-grained emotion analysis method, system, medium and computing equipment
Handayani et al. Sentiment Analysis Of Electric Cars Using Recurrent Neural Network Method In Indonesian Tweets
Sadr et al. A novel deep learning method for textual sentiment analysis
Jiang et al. Network public comments sentiment analysis based on multilayer convolutional neural network
Preetham et al. Comparative Analysis of Research Papers Categorization using LDA and NMF Approaches
Urkude et al. Comparative analysis on machine learning techniques: a case study on Amazon product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant