CN108388654A - Sentiment classification method based on turning sentence semantic block division mechanism - Google Patents
Sentiment classification method based on turning sentence semantic block division mechanism
- Publication number
- CN108388654A CN108388654A CN201810171490.XA CN201810171490A CN108388654A CN 108388654 A CN108388654 A CN 108388654A CN 201810171490 A CN201810171490 A CN 201810171490A CN 108388654 A CN108388654 A CN 108388654A
- Authority
- CN
- China
- Prior art keywords
- vector
- training
- sample
- term vector
- convolution kernel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a sentiment classification method based on a turning sentence semantic block division mechanism. The steps include: 1. using a known word-vector dictionary, express every sample in the training and test sets as a word-vector matrix; 2. select suitable convolution kernels to convolve the word-vector matrix and extract feature-mapping vectors, realizing dimensionality reduction; 3. build an adversative dictionary and, by locating the adversatives in a sample, divide the extracted feature maps into semantic segments, extracting the most important information from each segment to form the final feature space; 4. train a classifier on the final feature space and classify the samples in the test set. Based on the constructed adversative dictionary, the invention divides sentences into semantic blocks, obtains the important semantic information of each segment while taking the positional structure of the sentence into account, and thereby improves the accuracy of text sentiment classification.
Description
Technical field
The invention belongs to the field of sentiment classification in natural language processing, and in particular to effective sentiment classification of expressions that contain multiple sentiments, such as turning sentences that first praise and then criticize, or first criticize and then praise.
Background technology
With the rapid development of the Internet, text messages that use the network as a communication medium draw increasing attention from enterprises, institutions, and individuals. Online information can help government departments understand public intent; enterprises can open product-review channels to learn users' opinions and improve their products; and consumers can use product reviews to guide their purchasing behavior. However, a large number of new comments appear online every day, and a comment may begin with a positive attitude and then, after other factors are considered, turn negative; that is, attitudes may fall after rising, rise and then fall, or change repeatedly. For example, social networks (domestic Tencent/Sina Weibo and Renren, and Facebook and Twitter abroad) generate huge amounts of user data every day, including many texts in which people express personal views on events. One comment under the Tencent news story about the "little yellow bike" (shared-bicycle) death case reads: "Compensation on humanitarian grounds is understandable, but blaming the shared-bicycle provider is extremely illogical. Moreover, as the victim you were also at fault: if you had not used a shared bicycle without authorization, the present situation would not have occurred." The comment first takes a firm stance that the shared-bicycle provider should pay compensation, but then turns to state that the victim must also bear responsibility. Similarly, shopping platforms (JD.com, Suning, Tmall, etc.) generate massive numbers of purchase reviews every day. For example, a JD.com mobile-phone buyer wrote: "The sound is too quiet and calls are a strain to hear, and even at maximum volume it makes little difference, but the network speed is fast and the design is stylish; overall a positive review." This review states the drawbacks first and the satisfied, positive assessment afterwards. Besides sentiment words with clear polarity, such real-world comments contain adversatives and carry both positive and negative sentiments at the same time. This characteristic makes the text sentiment classification problem more complicated and poses severe challenges for traditional data-mining algorithms and existing machine-learning methods:
Challenge 1: Traditional unsupervised methods based on sentiment lexicons determine the polarity of the words in a sentence from the lexicon and decide the overall sentiment orientation of the sentence by simply summing those polarities. They make no distinction between the importance of different words and therefore rarely achieve good results.

Challenge 2: Machine-learning methods for sentiment-orientation analysis (k-nearest neighbors, support vector machines (SVM), naive Bayes, etc.) suffer from two main problems: 1) with the traditional bag-of-words representation, text vectors are high-dimensional and sparse, which hampers model training; 2) they consider only the syntactic relations between features and ignore their semantics, so the mapped features are semantically mismatched and cannot represent the meaning of a document well.

Challenge 3: Existing deep-learning methods can learn sentence features. Typical architectures such as recurrent neural networks (RNN) and convolutional neural networks (CNN) both represent the feature space with word vectors, extract sentence features by semantic composition, and finally use a classifier to decide the sentiment polarity. Compared with RNN models, a CNN has fewer parameters, captures the semantic features of text better, and has much lower time complexity. However, traditional CNNs ignore the structural features of a sentence when used for sentiment analysis: max-pooling extracts a single maximum value from the sentence's features purely by importance and makes no distinction based on sentence structure. As a result, such methods perform poorly on turning sentences.
Invention content
To solve the three challenges above, the present invention provides a sentiment classification method based on a turning sentence semantic block division mechanism. Using a constructed adversative dictionary, the feature-mapping space is segmented so that sentences are divided into semantic blocks; the important semantic information of each segment is obtained while the positional structure of the sentence is taken into account, thereby improving the accuracy of text sentiment-orientation analysis.

To achieve this goal, the present invention adopts the following technical scheme:

The sentiment classification method based on a turning sentence semantic block division mechanism of the present invention is characterized by proceeding as follows:
Step 1: word-vector representation of the samples in the training and test sets

Step 1.1: build a word-vector dictionary D

An external corpus obtained from the network is trained to produce the word-vector dictionary D, used to look up the word vectors of the words in the training and test sets; the dimension of the word vectors is set to |V|.

Step 1.2: represent the samples in the training and test sets as word vectors

Obtain |I| comment texts forming the training set DS = {s_1, s_2, ..., s_i, ..., s_|I|} and |J| comment texts forming the test set DT = {t_1, t_2, ..., t_j, ..., t_|J|}, where s_i and t_j denote the i-th training sample and the j-th test sample respectively; w_m^i denotes the m-th word of the i-th training sample s_i and w_n^j denotes the n-th word of the j-th test sample t_j; i = 1, 2, ..., |I|, m = 1, 2, ..., M, j = 1, 2, ..., |J|, n = 1, 2, ..., N.

According to the word-vector dictionary D, look up the word vector x_m^i of the m-th word of the i-th training sample s_i, obtaining the word-vector matrix S_i of the i-th training sample, an M × |V| matrix. Similarly obtain the word-vector matrix T_j of the j-th test sample t_j, where x_n^j denotes the word vector of its n-th word.
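Step 1.2 can be illustrated with a minimal sketch. The dictionary D below is a toy stand-in for the pretrained word-vector lookup, and the zero-vector fallback for unknown words is an assumption of this sketch, not something the method specifies:

```python
# Sketch of Step 1.2: map a tokenized sample to its word-vector matrix S_i.
V = 4  # embedding dimension |V| (300 in the embodiment)

# Toy dictionary D; real entries would come from a pretrained word2vec model.
D = {
    "sound": [0.1, 0.2, 0.0, 0.3],
    "is":    [0.0, 0.1, 0.1, 0.0],
    "small": [0.3, 0.0, 0.2, 0.1],
}

def to_matrix(tokens, dictionary, dim):
    """Return an M x |V| list of lists: one embedding row per token.
    Unknown words fall back to a zero vector (assumption of this sketch)."""
    return [dictionary.get(w, [0.0] * dim) for w in tokens]

S = to_matrix(["sound", "is", "small", "but"], D, V)
print(len(S), len(S[0]))  # M = 4 rows, |V| = 4 columns
```

Each sample thus becomes an M × |V| matrix with one embedding row per word.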
Step 2: set the convolution kernels and perform the convolution

Step 2.1: set K sets of convolution kernels of different sizes, denoted {W_1, W_2, ..., W_k, ..., W_K}, where W_k denotes the k-th set of convolution kernels, of height h_k and width n_k, i.e. h_k × n_k matrices; W_k^g denotes the g-th convolution kernel of the k-th size and is randomly initialized.

Step 2.2: with the g-th convolution kernel W_k^g as a sliding window, apply formula (1) to the segment S_i[τ : τ+h_k-1] of the word-vector matrix S_i of the i-th training sample s_i covered by the kernel, obtaining the τ-th value c_τ of the single-feature-mapping vector c_k^{g,i}, and thereby the whole vector c_k^{g,i}; convolving the kernels of all K sizes with S_i yields the feature-mapping vector C_i of the i-th training sample:

c_τ = σ(W_k^g ⊙ S_i[τ : τ+h_k-1] + b_c)    (1)

In formula (1), S_i[τ : τ+h_k-1] denotes the sub-matrix of rows τ to τ+h_k-1 covered by the current sliding window, b_c denotes a bias vector, σ(·) is the activation function, ⊙ denotes the convolution operation, c_k^{g,i} is a 1 × (M-h_k+1) vector, and 1 ≤ τ ≤ M-h_k+1.
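The convolution of Step 2.2 for a single kernel can be sketched as follows. Shapes are tiny and all values are illustrative; the ReLU activation follows the embodiment:

```python
# Sketch of Step 2.2 for one convolution kernel: slide a window of height h
# over the M x |V| matrix, take the element-wise product-and-sum with the
# kernel, add a bias, and apply ReLU.
def conv_single(S, W, b):
    """Return the length M-h+1 single-feature-mapping vector for kernel W."""
    M, h = len(S), len(W)
    c = []
    for t in range(M - h + 1):
        s = sum(W[e][f] * S[t + e][f]
                for e in range(h) for f in range(len(S[0])))
        c.append(max(0.0, s + b))  # ReLU activation
    return c

S = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]  # M = 4, |V| = 2
W = [[0.5, 0.5], [0.5, 0.5]]                           # h = 2 kernel
c = conv_single(S, W, 0.0)
print(c)  # 3 values: M - h + 1 = 4 - 2 + 1
```

Note the output length M - h + 1: a kernel of height h fits at exactly M - h + 1 window positions.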
Step 3: build the adversative dictionary ZD, look up adversatives in the i-th training sample s_i of the training set DS, segment the feature-mapping vector C_i according to the adversative positions, and extract the single most important feature from each segment, so that several segments yield several features.

Step 3.1: build the adversative dictionary ZD and use it to check whether the i-th training sample s_i of the training set DS contains an adversative; if s_i contains the L-th adversative z_L of ZD, record the position p_L of z_L within s_i.

Step 3.2: according to the size of the g-th convolution kernel W_k^g, obtain the position p_L - h_k + 1 of the adversative z_L within the single-feature-mapping vector c_k^{g,i} of the word-vector matrix S_i, and use it as the division point.

Step 3.3: according to the division point, split the single-feature-mapping vector c_k^{g,i} of the feature-mapping vector C_i into two segments.

Step 3.4: apply max-pooling to the two segments, obtaining the maximum value of each segment; these form the maximum-value pair of the g-th kernel of the k-th size, giving K × G maximum-value pairs in total.

Step 3.5: concatenate the K × G maximum-value pairs, obtaining the final feature-representation space x̂_i of the i-th training sample s_i.
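Steps 3.2 to 3.5, for one single-feature-mapping vector, amount to the following sketch (positions and values are illustrative):

```python
# Sketch of Steps 3.3-3.4: split a single-feature-mapping vector at the
# adversative division point and max-pool each segment.
def split_max_pool(c, p):
    """Split c at index p (the division point p_L - h_k + 1 of the method)
    and return the per-segment maxima as a pair."""
    left, right = c[:p], c[p:]
    return max(left), max(right)

c = [0.2, 0.9, 0.1, 0.4, 0.8, 0.3]  # one single-feature-mapping vector
v1, v2 = split_max_pool(c, 3)       # division point at index 3
print(v1, v2)                        # 0.9 0.8
```

With K sizes and G kernels per size, repeating this for every kernel and concatenating the pairs yields the final feature space of K × G maximum-value pairs.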
Step 4: build the classification model based on the feature-representation space x̂_i

Step 4.1: set a zeroing vector r drawn from a Bernoulli distribution; r has the same dimension as the feature-representation space x̂_i and its elements are 0 or 1.

Step 4.2: build the softmax classifier O for the training set DS using formula (2):

O(s_i) = f(W_o · (x̂_i ∘ r) + b_o)    (2)

In formula (2), f(·) is the activation function, W_o is a weight parameter, and b_o is another bias vector.

Step 4.3: optimize the loss function by gradient descent to train the softmax classifier O(·), obtaining the optimized softmax classifier O*(·) as the classification model.

Step 5: use the classification model O*(·) to classify the sentiment of the j-th test sample t_j in the test set DT, obtain the probabilities of the different sentiment categories, and take the category with the highest probability as the final sentiment classification result.
Compared with the prior art, the beneficial effects of the present invention are as follows:

1. To capture richer text features, the present invention performs the convolution with multiple kernels of different sizes, obtaining feature-mapping vectors from multiple sources and thus extracting higher-quality features.

2. The present invention segments sentences at adversative positions and chooses the most important feature from each segment. This overcomes both the dimensionality curse and sparsity of traditional methods, which ignore text semantics and thereby reduce classification accuracy, and the shortcoming of deep-learning methods that ignore the turning phenomenon of sentences and extract features only by importance, causing important feature information to be lost. The accuracy of text sentiment classification is therefore improved.

3. The present invention uses Bernoulli-distributed zeroing when building the final classification model, which effectively prevents model over-fitting and gives the model better generalization ability.

4. The present invention targets practical applications: the sentiment orientation of users' views on events in social networks can help government departments discover and track public-opinion trends in time; the sentiment orientation of online shoppers' product reviews can provide prediction and early-warning capability for merchants and consumers, offer suggestions for merchants' sales and service-quality strategies, and give recommendations for consumers' purchasing behavior.
Description of the drawings
Fig. 1 is the flow chart of the sentiment classification method of the present invention;
Fig. 2 is a schematic diagram of the convolution operation performed with multiple convolution kernels in the present invention;
Fig. 3 is a schematic diagram of the semantic segmentation based on adversative positions in the present invention;
Fig. 4 is a schematic diagram of building the classification model with only part of the features in the feature space in the present invention.
Specific implementation mode
In the present embodiment, as shown in Fig. 1, a sentiment classification method based on a turning sentence semantic block division mechanism is carried out as follows:
Step 1: word-vector representation of the samples in the training and test sets

Step 1.1: build a word-vector dictionary D

An external corpus obtained from the network is trained to produce the word-vector dictionary D, used to look up the word vectors of the words in the training and test sets. In the present embodiment the external corpus is the GoogleNews corpus (about 100 billion words); it is trained with word2vec, released by Google, and the resulting vector file googlenews-vectors-negative300.bin serves as the word-vector dictionary D. The word-vector dimension is set to |V|; in the present embodiment |V| = 300.

Step 1.2: represent the samples in the training and test sets as word vectors

Obtain |I| comment texts forming the training set DS = {s_1, s_2, ..., s_i, ..., s_|I|} and |J| comment texts forming the test set DT = {t_1, t_2, ..., t_j, ..., t_|J|}, where s_i and t_j denote the i-th training sample and the j-th test sample respectively; w_m^i denotes the m-th word of the i-th training sample s_i and w_n^j denotes the n-th word of the j-th test sample t_j; i = 1, 2, ..., |I|, m = 1, 2, ..., M, j = 1, 2, ..., |J|, n = 1, 2, ..., N.

According to the word-vector dictionary D, look up the word vector x_m^i of the m-th word of the i-th training sample s_i, obtaining the word-vector matrix S_i of the i-th training sample, an M × |V| matrix. Similarly obtain the word-vector matrix T_j of the j-th test sample t_j, where x_n^j denotes the word vector of its n-th word.
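The googlenews-vectors-negative300.bin file mentioned in Step 1.1 uses the word2vec binary format: an ASCII header "&lt;vocab_size&gt; &lt;dim&gt;", then for each word its token, a space, and dim little-endian float32 values. The following sketch writes and re-reads a two-word toy file to show the layout; the helper names are ours, and the real file holds about 3 million words at dim 300, so in practice a library such as gensim would be used instead:

```python
# Round-trip sketch of the word2vec binary layout used by the embodiment's
# dictionary file.  This is a toy illustration, not a production loader.
import io
import struct

def write_w2v(buf, vectors):
    """Write {word: vector} in word2vec binary format to a binary buffer."""
    dim = len(next(iter(vectors.values())))
    buf.write(f"{len(vectors)} {dim}\n".encode("utf-8"))
    for word, vec in vectors.items():
        buf.write(word.encode("utf-8") + b" ")
        buf.write(struct.pack(f"<{dim}f", *vec))  # little-endian float32

def read_w2v(buf):
    """Read the format back into a {word: list-of-floats} dictionary."""
    header = b""
    while not header.endswith(b"\n"):
        header += buf.read(1)
    vocab, dim = map(int, header.split())
    out = {}
    for _ in range(vocab):
        word = b""
        while True:
            ch = buf.read(1)
            if ch == b" ":
                break
            word += ch
        vec = struct.unpack(f"<{dim}f", buf.read(4 * dim))
        out[word.decode("utf-8")] = list(vec)
    return out

buf = io.BytesIO()
write_w2v(buf, {"good": [1.0, 0.0, 0.5], "bad": [0.0, 1.0, 0.25]})
buf.seek(0)
D = read_w2v(buf)
print(sorted(D))  # ['bad', 'good']
```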
Step 2: set the convolution kernels and perform the convolution

Step 2.1: set K sets of convolution kernels of different sizes, denoted {W_1, W_2, ..., W_k, ..., W_K}, where W_k denotes the k-th set of convolution kernels, of height h_k and width n_k, i.e. h_k × n_k matrices; W_k^g denotes the g-th convolution kernel of the k-th size and is randomly initialized.

Step 2.2: with the g-th convolution kernel W_k^g as a sliding window, apply formula (1) to the segment S_i[τ : τ+h_k-1] of the word-vector matrix S_i of the i-th training sample s_i covered by the kernel, obtaining the τ-th value c_τ of the single-feature-mapping vector c_k^{g,i}; convolving the kernels of all K sizes with S_i yields the feature-mapping vector C_i of the i-th training sample.

In formula (1), S_i[τ : τ+h_k-1] denotes the sub-matrix of rows τ to τ+h_k-1 covered by the current sliding window, b_c denotes a bias vector, σ(·) is the ReLU activation function, c_k^{g,i} is a 1 × (M-h_k+1) vector, and 1 ≤ τ ≤ M-h_k+1. The concrete operation of the convolution ⊙ is c_τ = σ(Σ_e Σ_f W_k^g[e,f] · S_i[τ+e-1, f] + b_c), where W_k^g[e,f] and S_i[τ : τ+h_k-1][e,f] denote the element in row e and column f of the g-th convolution kernel and of the covered segment respectively.
Fig. 2 shows a schematic convolution with kernels of 3 sizes: 2 × 300, 3 × 300 and 4 × 300, i.e. h_k = 2, 3 and 4, with n_1 = n_2 = n_3 = |V| = 300; there are 100 kernels of each size, i.e. G = 100. Each of the 3 × 100 kernels is convolved with the word-vector matrix S_i of the i-th training sample:

When the convolution kernel W_k^g is of size 2 × 300, the single-feature-mapping vector obtained after the convolution has M-2+1 dimensions, so the 100 kernels of size 2 × 300 yield 100 vectors of M-2+1 dimensions.

When the convolution kernel is of size 3 × 300, the single-feature-mapping vector has M-3+1 dimensions, so the 100 kernels of size 3 × 300 yield 100 vectors of M-3+1 dimensions.

When the convolution kernel is of size 4 × 300, the single-feature-mapping vector has M-4+1 dimensions, so the 100 kernels of size 4 × 300 yield 100 vectors of M-4+1 dimensions.

Convolving the 3 × 100 kernels with S_i thus gives 3 × 100 single-feature-mapping vectors, which together form the feature-mapping vector C_i of the i-th training sample.
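The shape bookkeeping of the Fig. 2 example is easy to check numerically. M below is illustrative, while the kernel heights and G = 100 follow the embodiment:

```python
# Shape check for the Fig. 2 example: a kernel of height h over a sentence
# of M words yields a feature map of length M - h + 1, and with G kernels
# per size there are len(heights) * G single-feature-mapping vectors.
M, G = 10, 100          # M is illustrative; the embodiment fixes G = 100
heights = [2, 3, 4]
map_lengths = {h: M - h + 1 for h in heights}
total_maps = len(heights) * G
print(map_lengths, total_maps)  # {2: 9, 3: 8, 4: 7} 300
```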
Step 3: build the adversative dictionary ZD, look up adversatives in the i-th training sample s_i of the training set DS, segment the feature-mapping vector C_i according to the adversative positions, and extract the single most important feature from each segment, so that several segments yield several features.

Step 3.1: build the adversative dictionary ZD and use it to check whether the i-th training sample s_i contains an adversative; if s_i contains the L-th adversative z_L of ZD, record the position p_L of z_L within s_i.

In the present embodiment, the adversatives published by Smart Words (http://www.smart-words.org/linking-words/transition-words.html) and MSU (https://msu.edu/user/jdowell/135/transw.html) are combined to construct a dictionary of 179 adversatives in total. The adversative dictionary ZD is shown in Table 1.
Table 1. The adversative dictionary
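Step 3.1 can be sketched as a simple membership lookup. Only a handful of ZD's 179 entries are reproduced here, and the 1-based position convention is an assumption of this sketch:

```python
# Toy version of the adversative dictionary ZD and the position lookup of
# Step 3.1.  The real ZD combines the smart-words.org and MSU lists.
ZD = {"but", "however", "although", "yet", "nevertheless"}

def adversative_position(tokens, zd):
    """Return the 1-based position of the first adversative, or None."""
    for p, w in enumerate(tokens, start=1):
        if w.lower() in zd:
            return p
    return None

tokens = "sound is small but networking speed is fast".split()
print(adversative_position(tokens, ZD))  # 4
```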
Step 3.2: according to the size of the g-th convolution kernel W_k^g, obtain the position p_L - h_k + 1 of the adversative z_L within the single-feature-mapping vector c_k^{g,i} of the word-vector matrix S_i, and use it as the division point.

Step 3.3: according to the division point, split the single-feature-mapping vector c_k^{g,i} into two segments.

Step 3.4: apply max-pooling to the two segments, obtaining the maximum value of each segment; these form the maximum-value pair of the g-th kernel of the k-th size, giving K × G maximum-value pairs in total.

Step 3.5: concatenate the K × G maximum-value pairs, obtaining the final feature-representation space x̂_i of the i-th training sample s_i.
Fig. 3 gives a concrete example of semantic division based on an adversative position, illustrating the adversative-based max-pooling with the 100 convolution kernels of size 3 × 300. The sample in Fig. 3 contains the adversative "but" at position 10 in the sample. Given the kernel size 3 × 300, the division point of "but" in the single-feature-mapping vector is 10 - 3 + 1 = 8, so each single-feature-mapping vector is split into two segments and the maximum of each segment is found, giving one maximum-value pair; the 100 kernels of size 3 × 300 then give 100 maximum-value pairs, which constitute the final feature-representation space.
Step 4: build the classification model based on the feature-representation space x̂_i

Step 4.1: to prevent the over-fitting that training the classifier in a fully connected fashion may cause, elements of the feature-representation space x̂_i are randomly set to 0 at a certain proportion ρ in the Bernoulli manner, and only the non-zero elements take part in building the classifier. The concrete operation is shown in Fig. 4: a zeroing vector r of the same dimension as x̂_i, with elements 0 or 1, is drawn from a Bernoulli distribution, and x̂_i ∘ r sets part of the elements of x̂_i to 0.

Step 4.2: build the softmax classifier O for the training set DS using formula (2):

O(s_i) = f(W_o · (x̂_i ∘ r) + b_o)    (2)

In formula (2), f(·) is the activation function (the sigmoid or tanh function may be used), W_o is a randomly initialized weight parameter, and b_o is another bias vector, initialized to the zero vector.

Using the classifier of formula (2), the probability that the i-th training sample s_i belongs to category l can be computed as formula (3):

P(l | s_i) = exp(O(s_i)_l) / Σ_{l'=1}^{|l|} exp(O(s_i)_{l'})    (3)

In formula (3), O(s_i)_l denotes the l-th element of the vector O(s_i) and |l| denotes the total number of categories.

Step 4.3: optimize the logarithmic loss function by gradient descent to train the softmax classifier O(·), obtaining the optimized softmax classifier O*(·) as the classification model.
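Steps 4.1 and 4.2 together amount to dropout followed by a linear layer and a softmax. In this sketch the weights, the feature vector, and the keep/drop decisions are fixed so the arithmetic is reproducible; during training the zeroing vector r would be re-sampled from a Bernoulli distribution at every step:

```python
# Sketch of Steps 4.1-4.2: Bernoulli zeroing (dropout) on the final feature
# space, then a linear layer and softmax over class scores.  All values are
# illustrative.
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

x = [0.9, 0.8, 0.7, 0.6]      # final feature space x_hat (flattened maxima)
r = [1, 0, 1, 1]              # fixed Bernoulli zeroing vector (one element dropped)
xd = [xi * ri for xi, ri in zip(x, r)]

W = [[0.5, -0.2, 0.1, 0.3],   # one weight row per sentiment class
     [-0.4, 0.6, 0.2, -0.1]]
b = [0.0, 0.0]
scores = [sum(wi * xi for wi, xi in zip(row, xd)) + bi
          for row, bi in zip(W, b)]
probs = softmax(scores)
print(probs.index(max(probs)))  # predicted class index
```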
In the present embodiment, the log-likelihood loss function is expressed as formula (4). The concrete steps of the gradient descent method are as follows:

Step 4.3.1: update the weight parameter W_o and the bias vector b_o according to formulas (5) and (6);

Step 4.3.2: put the maximum-value pair of the g-th kernel of the k-th size back to its original positions in the single-feature-mapping vector c_k^{g,i}, and set the remaining positions of c_k^{g,i} to 0;

Step 4.3.3: update the g-th convolution kernel W_k^g and the bias vector b_c according to formulas (7) and (8); in formulas (7) and (8), rot180(·) denotes rotating a matrix by 180 degrees;

Step 4.3.4: return to Step 2; Steps 2 through 4 are executed iteratively E times in total.
Step 5: use the classification model O*(·) to classify the sentiment of the j-th test sample t_j in the test set DT, obtaining the probabilities of the different sentiment categories, and take the category with the highest probability as the final sentiment classification result.

For the j-th test sample t_j in the test set DT, the final feature space x̂_j is obtained in the same way as for a training sample and substituted into the classifier O*(·); the probability that t_j belongs to category l is expressed as formula (9):

P(l | t_j) = exp(O*(t_j)_l) / Σ_{l'=1}^{|l|} exp(O*(t_j)_{l'})    (9)

In formula (9), O*(t_j)_l denotes the l-th element of the vector O*(t_j) and |l| denotes the total number of categories.
Claims (1)
1. a kind of sensibility classification method based on turnover sentence semantic chunk partition mechanism, it is characterized in that carrying out as follows:
Step 1:The term vector of sample indicates in training set and test set
Step 1.1 builds term vector dictionary D
External language material is obtained from network and is trained, and term vector dictionary D is obtained, for inquiring word in training set and test set
Term vector;The dimension set of term vector is | V |;
Step 1.2 carries out term vector expression to sample in training set and test set
Obtain | I | comment text composing training collection DS={ s1,s2,…si…s|I|And | I | comment text constitutes test
Collect DT={ t1,t2,…,tj,…t|J|, wherein siAnd tjI-th of training in the training set DS and test set DT is indicated respectively
Sample and j-th of test sample, and have: Indicate i-th of instruction in the training set DS
Practice m-th of word in sample si; Indicate j-th of test specimens in the test set DT
This tjIn n-th of word;I=1,2 ..., | I |, m=1,2 ..., M, j=1,2 ..., | J |, n=1,2 ..., N;
According to the term vector dictionary D, i-th of training sample s in the training set DS is inquirediIn m-th of wordWord
Vector isObtain i-th of training sample s in the training set DSiTerm vector matrixFor
One M × | V | matrix;
Similarly obtain j-th of test sample t in the test set DTjTerm vector matrix Table
Show j-th of test sample t in the test set DTjIn n-th of wordTerm vector;
Step 2:Setting convolution kernel simultaneously carries out convolutional calculation
The convolution kernel set of K kind different size sizes is arranged in step 2.1, is denoted as { W1,W2,…,Wk,…,WKWhereinIndicate a height of hk, width is the kth kind size convolution kernel set of nk,Indicate hk×nkMatrix;And have Indicate kth kind size convolution kernel set in g-th of convolution kernel, and carry out with
Machine initializes;
Step 2.2 is with g-th of convolution kernelFor sliding window, using formula (1) to g-th of convolution kernelWith it is described
G-th of convolution kernelI-th of training sample s under coveringiTerm vector matrix Siτ to τ+h-1 between segmentIt carries outConvolution operation obtains single features map vectorIn the τ valueTo obtain single features mapping
VectorThen by the convolution kernel of the K kinds size and i-th of training sample siWord
Vector matrix SiConvolution operation is carried out, i-th of training sample s is obtainediTerm vector matrix SiFeature Mapping vector
In formula (1),Indicate the vector matrix between the τ to τ+h-1 under the covering of current sliding window mouth,Indicate that bias vector, σ () are activation primitive,Indicate 1 × (M-hk+ 1) matrix, 1≤τ≤M-hk+
1;
Step 3:Structure turnover dictionary ZD, and i-th of training sample s in the training set DSiMiddle lookup adversative, according to
Adversative position is vectorial by the Feature MappingTo being segmented, and a most important spy is extracted in each segmentation
Sign, then several segments obtain several features;
Step 3.1 structure turnover dictionary ZD, and according to the adversative dictionary ZD, search i-th of training in the training set DS
Sample siIn whether contain adversative, if containing l-th adversative z in the turnover dictionary ZDL, then provide the l-th and turn
Roll over word zLI-th of training sample s in the training set DSiIn position be
Step 3.2 is according to g-th of convolution kernelSize obtain the adversative zLIn i-th of training sample si's
Term vector matrix SiSingle features map vectorIn position beAnd as division points;
Step 3.3 is according to the division pointsBy Feature Mapping vectorIn single features map vector
It is divided into two sections of single features map vectors, i.e.,With
Step 3.4 utilizes maximum pond two sections of single features map vectors of method pairWithIt is handled, respectively
To the maximum value in two sections of single features map vectorsWithAnd form the maximum value pair of kth kind g-th of convolution kernel of sizeTo obtain K × G maximum value pair
Step 3.5 is to the K × G maximum value pairIt is spelled
It connects, thus obtains i-th of training sample siFinal character representation space
Step 4:Based on the character representation spaceBuild disaggregated model
Step 4.1 is based on Bernoulli Jacob and is distributed setting zero setting vector for r, and the zero setting vector r and character representation spaceIt is same
The element of dimension be 0 or be 1 vector;
Step 4.2 is using formula (2) to training set DS structure softmax graders O:
In formula (2), f () is activation primitive, WoFor weighting parameter, boFor another bias vector;
Step 4.3 optimizes loss function using gradient descent method, to the softmax graders O ()
Training, the softmax graders after being optimizedAs the disaggregated model;
Step 5: Use the classification model O*(·) to perform sentiment classification on the j-th test sample tj in the test set DT, obtain the probability of each sentiment category, and take the sentiment category with the maximum probability as the final sentiment classification result.
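Steps 4.2–5 can be sketched as a softmax layer trained by gradient descent. Since formula (2) is not reproduced in this text, the form softmax(Wo·x + bo) with a cross-entropy loss is an assumption, as are the learning rate and epoch count:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the class axis.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def train_softmax(X, y, n_classes, lr=0.1, epochs=200):
    """Fit O(x) = softmax(x @ Wo + bo) by gradient descent on cross-entropy."""
    n, d = X.shape
    Wo = np.zeros((d, n_classes))
    bo = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                  # one-hot labels
    for _ in range(epochs):
        P = softmax(X @ Wo + bo)              # predicted class probabilities
        Wo -= lr * X.T @ (P - Y) / n          # gradient of the averaged loss
        bo -= lr * (P - Y).mean(axis=0)
    return Wo, bo

def classify(x, Wo, bo):
    """Step 5: pick the sentiment category with maximum probability."""
    return int(np.argmax(softmax(x[None, :] @ Wo + bo)))
```

In the patent's setting, x would be the concatenated piecewise-pooled feature vector of a sample, masked by the zeroing vector r during training.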
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810171490.XA CN108388654B (en) | 2018-03-01 | 2018-03-01 | Sentiment classification method based on turning sentence semantic block division mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108388654A true CN108388654A (en) | 2018-08-10 |
CN108388654B CN108388654B (en) | 2020-03-17 |
Family
ID=63069615
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810171490.XA Active CN108388654B (en) | 2018-03-01 | 2018-03-01 | Sentiment classification method based on turning sentence semantic block division mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108388654B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109271632A (en) * | 2018-09-14 | 2019-01-25 | 重庆邂智科技有限公司 | Supervised word vector learning method |
CN110377740A (en) * | 2019-07-22 | 2019-10-25 | 腾讯科技(深圳)有限公司 | Sentiment polarity analysis method, device, electronic equipment and storage medium |
CN110377739A (en) * | 2019-07-19 | 2019-10-25 | 出门问问(苏州)信息科技有限公司 | Text sentiment classification method, readable storage medium and electronic equipment |
CN110765769A (en) * | 2019-08-27 | 2020-02-07 | 电子科技大学 | Entity attribute dependent sentiment analysis method based on clause features |
CN111611375A (en) * | 2019-07-03 | 2020-09-01 | 北京航空航天大学 | Text emotion classification method based on deep learning and turning relation |
CN113806542A (en) * | 2021-09-18 | 2021-12-17 | 上海幻电信息科技有限公司 | Text analysis method and system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120253792A1 (en) * | 2011-03-30 | 2012-10-04 | Nec Laboratories America, Inc. | Sentiment Classification Based on Supervised Latent N-Gram Analysis |
CN104035992A (en) * | 2014-06-10 | 2014-09-10 | 复旦大学 | Method and system for processing text semantics by utilizing image processing technology and semantic vector space |
CN104731770A (en) * | 2015-03-23 | 2015-06-24 | 中国科学技术大学苏州研究院 | Chinese microblog emotion analysis method based on rules and statistical model |
KR101652486B1 (en) * | 2015-04-05 | 2016-08-30 | 주식회사 큐버 | Sentiment communication system based on multiple multimodal agents |
CN107608956A (en) * | 2017-09-05 | 2018-01-19 | 广东石油化工学院 | Reader emotion distribution prediction algorithm based on CNN-GRNN |
Non-Patent Citations (3)
Title |
---|
YANWEN WU et al.: "Building Chinese Sentiment Lexicon Based on HowNet", 《ADVANCED MATERIALS RESEARCH》 * |
YUHONG ZHANG et al.: "Sentiment Classification Based on Piecewise Pooling Convolutional Neural Network", 《TECH SCIENCE PRESS CMC》 * |
DI PENG et al.: "Text Sentiment Orientation Analysis Based on Turning Sentence Patterns", 《Computer Engineering and Design》 * |
Also Published As
Publication number | Publication date |
---|---|
CN108388654B (en) | 2020-03-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108388654A (en) | Sentiment classification method based on turning sentence semantic block division mechanism | |
CN109145112B (en) | Commodity comment classification method based on global information attention mechanism | |
CN109271522B (en) | Comment emotion classification method and system based on deep hybrid model transfer learning | |
Hota et al. | KNN classifier based approach for multi-class sentiment analysis of twitter data | |
CN107391483A (en) | Commodity comment data sentiment classification method based on convolutional neural networks | |
Rei et al. | Grasping the finer point: A supervised similarity network for metaphor detection | |
CN108763326B (en) | Emotion analysis model construction method of convolutional neural network based on feature diversification | |
CN108446271B (en) | Text emotion analysis method of convolutional neural network based on Chinese character component characteristics | |
Basari et al. | Opinion mining of movie review using hybrid method of support vector machine and particle swarm optimization | |
Ruangkanokmas et al. | Deep belief networks with feature selection for sentiment classification | |
CN108427670A (en) | Sentiment analysis method based on contextual word vectors and deep learning | |
Mamgain et al. | Sentiment analysis of top colleges in India using Twitter data | |
US11762990B2 (en) | Unstructured text classification | |
CN109299268A (en) | Text sentiment analysis method based on a dual-channel model | |
CN110263257B (en) | Deep learning based recommendation method for processing multi-source heterogeneous data | |
CN107908715A (en) | Microblog sentiment polarity discrimination method based on Adaboost and weighted classifier fusion | |
CN110543242A (en) | Expression input method based on BERT technology and device thereof | |
CN107515855A (en) | Microblog sentiment analysis method and system combining emoticons | |
CN110457562A (en) | Food safety event classification method and device based on a neural network model | |
Jiang et al. | Detecting hate speech from tweets for sentiment analysis | |
CN111814453A (en) | Fine-grained emotion analysis method based on BiLSTM-TextCNN | |
Sunarya et al. | Comparison of accuracy between convolutional neural networks and Naïve Bayes Classifiers in sentiment analysis on Twitter | |
CN110321918A (en) | Method for sentiment analysis and image labeling in a microblog-based public opinion robot system | |
CN109062958B (en) | Primary school composition automatic classification method based on TextRank and convolutional neural network | |
Huang | A CNN model for SMS spam detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||