CN108388654A - Sentiment classification method based on turning sentence semantic block division mechanism - Google Patents

Sentiment classification method based on turning sentence semantic block division mechanism

Info

Publication number
CN108388654A
CN108388654A
Authority
CN
China
Prior art keywords
vector
training
sample
term vector
convolution kernel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810171490.XA
Other languages
Chinese (zh)
Other versions
CN108388654B (en)
Inventor
张玉红
王勤勤
李玉玲
李培培
胡学钢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology
Priority to CN201810171490.XA
Publication of CN108388654A
Application granted
Publication of CN108388654B
Legal status: Active
Anticipated expiration


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G06F16/355 - Class or cluster creation or modification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G06F16/353 - Classification into predefined classes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/205 - Parsing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/289 - Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a sentiment classification method based on a semantic block division mechanism for turning (adversative) sentences. The steps include: 1. using a known word-vector dictionary, represent every sample in the training set and test set as a word-vector matrix; 2. choose suitable convolution kernels to convolve the word-vector matrix and extract feature-map vectors, achieving dimensionality reduction; 3. build an adversative-word dictionary and, by querying the position of the adversative word in the sample, divide the extracted feature maps semantically and extract the most important information in each division block to form the final feature space; 4. train a classifier on the final feature space and classify the samples in the test set. Based on the constructed adversative-word dictionary, the invention divides sentences into semantic blocks, obtains the important semantic information of each segment while taking the positional structure of the sentence into account, and thereby improves the accuracy of text sentiment classification.

Description

Sentiment classification method based on turning sentence semantic block division mechanism
Technical field
The invention belongs to the field of sentiment classification in natural language processing, and in particular provides effective sentiment classification for expressions that contain multiple sentiments, such as turning (adversative) sentences that first praise and then criticize, or first criticize and then praise.
Background art
With the rapid development of the Internet, text messages that use the network as a communication medium have drawn increasing attention from enterprises, institutions, and individuals. Through the network information, government departments can understand public intent; enterprises can learn users' opinions of their products from product reviews and improve product quality; and consumers can guide their purchasing behavior by reading product reviews. However, a large number of new comments appear online every day, and a comment may start with a positive attitude but, after other factors are considered, turn negative; that is, attitudes may rise then fall, fall then rise, or change repeatedly. For example, social networks (domestic Tencent/Sina Weibo and Renren, foreign Facebook and Twitter, etc.) generate large amounts of user data every day, containing text in which many people express personal views on events. One comment on Tencent News about the "little yellow bike death case" reads: "Compensation in the spirit of humanitarianism is understandable, but holding the bike-sharing provider responsible is extremely illogical. Moreover, as the victim, you yourself are also at fault; if you had not misused a shared bicycle, the present situation would not have arisen." It first takes the definite position that compensation should be given to the victim, but then turns to express that the victim also needs to bear responsibility. Likewise, shopping platforms (JD.com, Suning, Tmall, etc.) generate massive amounts of online-purchase review information every day. A review by a JD.com phone buyer reads: "The sound is too small, and phone calls are a struggle; even at maximum volume the effect is not obvious. But the network speed is fast and the appearance is brand new. Overall a favorable review." It first states disadvantages and then expresses satisfaction. Besides sentiment words that reveal polarity, these real-life comments contain adversative words and carry both positive and negative sentiment at once. This characteristic makes text sentiment classification more complicated and poses severe challenges to traditional data-mining algorithms and existing machine-learning methods:
Challenge one: traditional unsupervised methods based on sentiment dictionaries parse the sentiment polarity of the words in a sentence from the dictionary and determine the overall sentiment orientation of the sentence by simply summing those polarities. Because they do not distinguish the importance of individual words, it is clearly difficult for them to obtain good results.
Challenge two: machine-learning methods for text sentiment-orientation analysis (k-nearest neighbors, support vector machine (SVM), Bayes, etc.) have the following main problems: 1) with the traditional bag-of-words representation, the text vectors are high-dimensional and sparse, which is unfavorable to model training; 2) they consider only the syntactic structure of features and ignore their semantic information, causing semantic mismatches in the feature-mapping result and preventing a good representation of document semantics.
Challenge three: existing deep-learning methods can learn sentence features; typical network structures include the recurrent neural network (RNN) and the convolutional neural network (CNN). Both models represent the feature space with word vectors, extract sentence features by semantic composition, and finally use a classifier to decide sentiment polarity. Compared with RNN models, a CNN has fewer parameters, captures the semantic features of text better, and has far lower time complexity. However, a traditional CNN used for sentiment analysis ignores the structure of the sentence: its max-pooling step extracts a single maximum value from the sentence features according to importance alone and makes no distinction about sentence structure, so the method performs poorly on turning sentences.
Summary of the invention
To solve the three challenges above, the present invention provides a sentiment classification method based on a turning-sentence semantic block division mechanism. Based on a constructed adversative-word dictionary, it segments the feature-mapping space to divide a sentence into semantic blocks, obtains the important semantic information of each segment while considering the positional structure of the sentence, and thereby improves the accuracy of text sentiment-orientation analysis.
To achieve the above goal, the present invention adopts the following technical scheme:
The sentiment classification method based on a turning-sentence semantic block division mechanism of the present invention is characterized in that it is carried out as follows:
Step 1: word-vector representation of the samples in the training set and test set
Step 1.1: build the word-vector dictionary D
External corpora are obtained from the network and trained to obtain the word-vector dictionary D, which is used to look up the word vectors of the words in the training and test sets; the dimension of the word vectors is set to |V|.
Step 1.2: represent the samples in the training and test sets as word vectors
Obtain |I| comment texts to form the training set DS = {s1, s2, …, si, …, s|I|} and |J| comment texts to form the test set DT = {t1, t2, …, tj, …, t|J|}, where si and tj denote the i-th training sample in DS and the j-th test sample in DT respectively, with si = {w1^i, w2^i, …, wm^i, …, wM^i}, wm^i denoting the m-th word of the i-th training sample si, and tj = {w1^j, w2^j, …, wn^j, …, wN^j}, wn^j denoting the n-th word of the j-th test sample tj; i = 1, 2, …, |I|; m = 1, 2, …, M; j = 1, 2, …, |J|; n = 1, 2, …, N.
According to the word-vector dictionary D, look up the word vector xm^i of the m-th word wm^i of the i-th training sample si in DS, obtaining the word-vector matrix Si = [x1^i; x2^i; …; xM^i] of si, an M × |V| matrix.
Similarly obtain the word-vector matrix Tj = [x1^j; x2^j; …; xN^j] of the j-th test sample tj in DT, where xn^j denotes the word vector of the n-th word wn^j of tj.
Step 2: set convolution kernels and perform the convolution calculation
Step 2.1: set up convolution kernels of K different sizes, denoted {W1, W2, …, Wk, …, WK}, where Wk = {Wk^1, …, Wk^g, …, Wk^G} denotes the set of the k-th-size convolution kernels of height hk and width nk, each kernel being an hk × nk matrix; Wk^g denotes the g-th convolution kernel in the k-th-size set and is randomly initialized.
Step 2.2: with the g-th convolution kernel Wk^g as a sliding window, use formula (1) to convolve Wk^g with the segment Si[τ : τ+hk−1] of the word-vector matrix Si of the i-th training sample si covered by the window, obtaining the τ-th value cτ^{k,g} of the single-feature map vector Ci^{k,g} = [c1^{k,g}, c2^{k,g}, …, c_{M−hk+1}^{k,g}]; convolving the kernels of all K sizes with the word-vector matrix Si then yields the feature-map vector Ci of the i-th training sample si:
cτ^{k,g} = σ(Wk^g ⊗ Si[τ : τ+hk−1] + bc)    (1)
In formula (1), Si[τ : τ+hk−1] denotes the sub-matrix of rows τ to τ+hk−1 covered by the current sliding window, bc denotes a bias vector, σ(·) is the activation function, Ci^{k,g} is a 1 × (M−hk+1) vector, and 1 ≤ τ ≤ M−hk+1.
Step 3: build the turning-word dictionary ZD, look up adversative words in the i-th training sample si of the training set DS, segment the feature-map vector Ci according to the adversative-word position, and extract the single most important feature from each segment, so that several segments yield several features.
Step 3.1: build the turning-word dictionary ZD and, according to ZD, check whether the i-th training sample si in DS contains an adversative word; if si contains the L-th adversative word zL of ZD, record the position of zL in si as p.
Step 3.2: according to the size of the g-th convolution kernel Wk^g, obtain the position p − hk + 1 of the adversative word zL in the single-feature map vector Ci^{k,g} of the word-vector matrix Si of si, and take it as the division point.
Step 3.3: according to the division point, divide the single-feature map vector Ci^{k,g} of the feature-map vector Ci into two segments of single-feature map vectors, Ci^{k,g,1} and Ci^{k,g,2}.
Step 3.4: apply max pooling to the two segments Ci^{k,g,1} and Ci^{k,g,2}, obtaining the maximum of each segment, m1^{k,g} and m2^{k,g}, which form the maximum pair (m1^{k,g}, m2^{k,g}) of the g-th convolution kernel of the k-th size; K × G maximum pairs are obtained in this way.
Step 3.5: concatenate the K × G maximum pairs, thereby obtaining the final feature representation space Fi of the i-th training sample si.
Step 4: build the classification model based on the feature representation space Fi
Step 4.1: set a zeroing vector r according to a Bernoulli distribution; r is a vector of 0s and 1s of the same dimension as the feature representation space Fi.
Step 4.2: build the softmax classifier O for the training set DS using formula (2):
O(si) = f(Wo · (r ∘ Fi) + bo)    (2)
In formula (2), f(·) is the activation function, Wo is a weight parameter, and bo is another bias vector.
Step 4.3: optimize the loss function by gradient descent so as to train the softmax classifier O(·), obtaining the optimized softmax classifier O*(·) as the classification model.
Step 5: use the classification model O*(·) to perform sentiment classification on the j-th test sample tj in the test set DT, obtaining the probability of each sentiment category, and take the sentiment category with the maximum probability as the final sentiment classification result.
Compared with the prior art, the beneficial effects of the present invention are:
1. To capture richer text features, the present invention performs the convolution operation with multiple kernels of different sizes, obtaining multi-source feature-map vectors and thereby extracting higher-quality features.
2. The present invention segments sentences based on adversative-word positions and selects the most important feature from each segment. This overcomes the dimensionality curse and sparsity of conventional methods and their neglect of text semantics, which lower classification precision; it also overcomes the failure of deep-learning methods to account for the turning phenomenon in sentences, where extracting features by importance alone loses important feature information. The accuracy of text sentiment classification is therefore improved.
3. The present invention applies Bernoulli-distributed zeroing when building the final classification model, which effectively prevents the model from overfitting and gives it better generalization ability.
4. The present invention is oriented to practical applications. For example, the sentiment orientation of users' views on events in social networks can help government departments discover and track public-opinion trends in time; the sentiment orientation of online shoppers' product reviews can provide prediction and early warning for merchants and consumers, offering suggestions for merchants' sales and service-quality strategies and recommendations for consumers' purchasing behavior.
Description of the drawings
Fig. 1 is a flow chart of the sentiment classification method of the present invention;
Fig. 2 is a schematic diagram of the convolution operation performed with multiple kernels;
Fig. 3 is a schematic diagram of semantic segmentation based on adversative-word positions;
Fig. 4 is a schematic diagram of building the classification model with only part of the features of the feature space.
Detailed description of the embodiments
In this embodiment, as shown in Fig. 1, a sentiment classification method based on a turning-sentence semantic block division mechanism is carried out as follows:
Step 1: word-vector representation of the samples in the training set and test set
Step 1.1: build the word-vector dictionary D
External corpora are obtained from the network and trained to obtain the word-vector dictionary D, used to look up the word vectors of the words in the training and test sets. This embodiment is based on the GoogleNews corpus (about 100 billion words); the corpus is trained with the word2vec tool published by Google, and the resulting word-vector file GoogleNews-vectors-negative300.bin serves as the word-vector dictionary D. The word-vector dimension is |V| = 300 in this embodiment.
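For illustration only, the following minimal Python sketch (using the publicly available gensim library) shows how the word-vector dictionary D might be loaded and a tokenized sample turned into its M × |V| word-vector matrix; the file path, the toy sentence, and the zero-vector fallback for out-of-vocabulary words are assumptions, not details fixed by the patent.

    import numpy as np
    from gensim.models import KeyedVectors

    # Load the pre-trained GoogleNews word2vec vectors (|V| = 300) as dictionary D.
    D = KeyedVectors.load_word2vec_format(
        "GoogleNews-vectors-negative300.bin", binary=True)

    def word_vector_matrix(words, dim=300):
        """Stack the word vectors of a tokenized sample into an M x |V| matrix.
        Words absent from D fall back to a zero vector (an assumption here)."""
        rows = [D[w] if w in D else np.zeros(dim, dtype=np.float32) for w in words]
        return np.vstack(rows)  # shape: (M, 300)

    S_i = word_vector_matrix("the phone looks great but the speaker is weak".split())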
Step 1.2: represent the samples in the training and test sets as word vectors
Obtain |I| comment texts to form the training set DS = {s1, s2, …, si, …, s|I|} and |J| comment texts to form the test set DT = {t1, t2, …, tj, …, t|J|}, where si and tj denote the i-th training sample in DS and the j-th test sample in DT respectively, with si = {w1^i, w2^i, …, wm^i, …, wM^i}, wm^i denoting the m-th word of the i-th training sample si, and tj = {w1^j, w2^j, …, wn^j, …, wN^j}, wn^j denoting the n-th word of the j-th test sample tj; i = 1, 2, …, |I|; m = 1, 2, …, M; j = 1, 2, …, |J|; n = 1, 2, …, N.
According to the word-vector dictionary D, look up the word vector xm^i of the m-th word wm^i of the i-th training sample si in DS, obtaining the word-vector matrix Si = [x1^i; x2^i; …; xM^i] of si, an M × |V| matrix.
Similarly obtain the word-vector matrix Tj = [x1^j; x2^j; …; xN^j] of the j-th test sample tj in DT, where xn^j denotes the word vector of the n-th word wn^j of tj.
Step 2: set convolution kernels and perform the convolution calculation
Step 2.1: set up convolution kernels of K different sizes, denoted {W1, W2, …, Wk, …, WK}, where Wk = {Wk^1, …, Wk^g, …, Wk^G} denotes the set of the k-th-size convolution kernels of height hk and width nk, each kernel being an hk × nk matrix; Wk^g denotes the g-th convolution kernel in the k-th-size set and is randomly initialized.
Step 2.2: with the g-th convolution kernel Wk^g as a sliding window, use formula (1) to convolve Wk^g with the segment Si[τ : τ+hk−1] of the word-vector matrix Si covered by the window, obtaining the τ-th value cτ^{k,g} of the single-feature map vector Ci^{k,g} = [c1^{k,g}, c2^{k,g}, …, c_{M−hk+1}^{k,g}]; convolving the kernels of all K sizes with the word-vector matrix Si of the i-th training sample si then yields the feature-map vector Ci:
cτ^{k,g} = σ(Wk^g ⊗ Si[τ : τ+hk−1] + bc)    (1)
In formula (1), Si[τ : τ+hk−1] denotes the sub-matrix of rows τ to τ+hk−1 covered by the current sliding window, bc denotes a bias vector, σ(·) is the ReLU activation function, Ci^{k,g} is a 1 × (M−hk+1) vector, and 1 ≤ τ ≤ M−hk+1. The concrete operation of the convolution ⊗ is Wk^g ⊗ Si[τ : τ+hk−1] = Σe Σf Wk^g[e, f] · Si[τ : τ+hk−1][e, f], where Wk^g[e, f] and Si[τ : τ+hk−1][e, f] denote the elements in row e, column f of the g-th convolution kernel Wk^g and of the segment Si[τ : τ+hk−1], respectively.
Fig. 2 shows a schematic diagram of the convolution operation with kernels of three sizes: 2 × 300, 3 × 300, and 4 × 300, i.e., hk = 2, 3, 4, with n1 = n2 = n3 = |V| = 300. There are 100 kernels of each size, i.e., G = 100. In Fig. 2, the 3 × 100 kernels are each convolved with the word-vector matrix Si of the i-th training sample si.
When the kernel Wk^g is of size 2 × 300, the single-feature map vector obtained after convolution is an (M−2+1)-dimensional vector; the 100 kernels of size 2 × 300 yield 100 such (M−2+1)-dimensional vectors.
When the kernel Wk^g is of size 3 × 300, the single-feature map vector obtained after convolution is an (M−3+1)-dimensional vector; the 100 kernels of size 3 × 300 yield 100 such (M−3+1)-dimensional vectors.
When the kernel Wk^g is of size 4 × 300, the single-feature map vector obtained after convolution is an (M−4+1)-dimensional vector; the 100 kernels of size 4 × 300 yield 100 such (M−4+1)-dimensional vectors.
Convolving the 3 × 100 kernels in Fig. 2 with the word-vector matrix Si of the i-th training sample si thus yields 3 × 100 single-feature map vectors, which together form the feature-map vector Ci of Si.
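A minimal numpy sketch of step 2.2, assuming ReLU as σ(·) per the embodiment: each kernel of height hk slides over the word-vector matrix, and every window position contributes one value of the single-feature map vector via formula (1). The kernel count is cut to G = 2 per size only to keep the example small.

    import numpy as np

    def relu(x):
        return np.maximum(x, 0.0)

    def single_feature_map(W_kg, b_c, S_i):
        """Slide kernel W_kg (h_k x |V|) over S_i (M x |V|) as in formula (1)."""
        h_k = W_kg.shape[0]
        M = S_i.shape[0]
        return np.array([relu(np.sum(W_kg * S_i[tau:tau + h_k]) + b_c)
                         for tau in range(M - h_k + 1)])  # length M - h_k + 1

    rng = np.random.default_rng(0)
    M, V, G = 10, 300, 2  # the embodiment uses G = 100 kernels per size
    S_i = rng.standard_normal((M, V))
    kernels = {h: rng.standard_normal((G, h, V)) * 0.1 for h in (2, 3, 4)}
    C_i = {h: [single_feature_map(W_kg, 0.0, S_i) for W_kg in kernels[h]]
           for h in kernels}  # single-feature map vectors, grouped by height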
Step 3: build the turning-word dictionary ZD, look up adversative words in the i-th training sample si of the training set DS, segment the feature-map vector Ci according to the adversative-word position, and extract the single most important feature from each segment, so that several segments yield several features.
Step 3.1: build the turning-word dictionary ZD and, according to ZD, check whether the i-th training sample si in DS contains an adversative word; if si contains the L-th adversative word zL of ZD, record the position of zL in si as p.
In this embodiment, the adversative words published by Smart Words (http://www.smart-words.org/linking-words/transition-words.html) and MSU (https://msu.edu/user/jdowell/135/transw.html) are combined to construct a dictionary of 179 adversative words in total. The adversative-word dictionary ZD is shown in Table 1.
Table 1. Adversative-word dictionary ZD
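As a sketch, the lookup of step 3.1 reduces to a membership test over the tokenized sample; the three words below merely stand in for the 179-word dictionary of Table 1 and are not the patent's full list.

    # Tiny stand-in for the 179-word adversative dictionary ZD of Table 1.
    ZD = {"but", "however", "nevertheless"}

    def adversative_position(words):
        """Return the (1-based) position p of the first adversative word, else None."""
        for p, w in enumerate(words, start=1):
            if w.lower() in ZD:
                return p
        return None

    p = adversative_position("the phone looks great but the speaker is weak".split())  # p = 5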
Step 3.2: according to the size of the g-th convolution kernel Wk^g, obtain the position p − hk + 1 of the adversative word zL in the single-feature map vector Ci^{k,g} of the word-vector matrix Si of the i-th training sample si, and take it as the division point.
Step 3.3: according to the division point, divide the single-feature map vector Ci^{k,g} of the feature-map vector Ci into two segments of single-feature map vectors, Ci^{k,g,1} and Ci^{k,g,2}.
Step 3.4: apply max pooling to the two segments Ci^{k,g,1} and Ci^{k,g,2}, obtaining the maximum of each segment, m1^{k,g} and m2^{k,g}, which form the maximum pair (m1^{k,g}, m2^{k,g}) of the g-th convolution kernel of the k-th size; K × G maximum pairs are obtained in this way.
Step 3.5: concatenate the K × G maximum pairs, thereby obtaining the final feature representation space Fi of the i-th training sample si.
Fig. 3 gives a concrete example of semantic division based on adversative-word position, illustrating the adversative-based max-pooling process with the 100 kernels of size 3 × 300. The sample in Fig. 3 contains the adversative word "but" at position p = 10. With a kernel of size 3 × 300, the division point of "but" in the single-feature map vector Ci^{k,g} is 10 − 3 + 1, so Ci^{k,g} is divided into two segments Ci^{k,g,1} and Ci^{k,g,2}; the maximum of each segment is found, giving one maximum pair. The 100 kernels of size 3 × 300 then yield 100 maximum pairs, which constitute the final feature representation space Fi.
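The following self-contained numpy sketch illustrates steps 3.2 through 3.5 for one kernel size; the boundary guard that keeps both segments non-empty is an added assumption the patent does not spell out.

    import numpy as np

    def piecewise_max_pool(c, p, h_k):
        """Split the single-feature map c at the division point p - h_k + 1
        and max-pool each segment (steps 3.2 to 3.4)."""
        d = p - h_k + 1  # division point within the feature map
        d = max(1, min(d, len(c) - 1))  # assumed guard: both segments non-empty
        return np.array([c[:d].max(), c[d:].max()])

    def feature_space(C_i, p):
        """Step 3.5: concatenate all maximum pairs into F_i (length 2 * K * G)."""
        pairs = [piecewise_max_pool(c, p, h_k)
                 for h_k, maps in C_i.items() for c in maps]
        return np.concatenate(pairs)

    rng = np.random.default_rng(1)
    C_i = {3: [rng.standard_normal(8) for _ in range(2)]}  # toy maps, h_k = 3
    F_i = feature_space(C_i, p=5)  # adversative word at position 5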
Step 4: build the classification model based on the feature representation space Fi
Step 4.1: to prevent the overfitting that may arise from training the classifier in a fully connected manner, the elements of the feature representation space Fi are randomly set to 0 in proportion ρ according to a Bernoulli distribution, and only the non-zero elements participate in classifier construction. The concrete operation is shown in Fig. 4: a zeroing vector r is set according to a Bernoulli distribution, r being a vector of 0s and 1s of the same dimension as Fi, and r ∘ Fi is used to set part of the elements of Fi to 0.
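A sketch of the Bernoulli zeroing of step 4.1, which corresponds to standard dropout; the zeroing proportion ρ = 0.5 is an assumed value the patent does not fix.

    import numpy as np

    rng = np.random.default_rng(42)

    def bernoulli_zeroing(F_i, rho=0.5):
        """Step 4.1: r has i.i.d. 0/1 Bernoulli(1 - rho) entries, and r * F_i
        zeroes a proportion rho of the feature space during training."""
        r = rng.binomial(1, 1.0 - rho, size=F_i.shape)
        return r * F_i

    F_i = rng.standard_normal(12)
    F_masked = bernoulli_zeroing(F_i)  # only non-zero entries train the classifier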
Step 4.2: build the softmax classifier O for the training set DS using formula (2):
O(si) = f(Wo · (r ∘ Fi) + bo)    (2)
In formula (2), f(·) is the activation function, for which a sigmoid or tanh function may be used; Wo is a randomly initialized weight parameter, and bo is another bias vector, initialized to the 0 vector.
With the classifier of formula (2), the probability that the i-th training sample si belongs to category l can be calculated as formula (3):
P(l | si) = exp(O*(si)_l) / Σ_{l'=1}^{|l|} exp(O*(si)_{l'})    (3)
In formula (3), O*(si)_l denotes the l-th element of the vector O*(si), and |l| denotes the total number of categories.
Step 4.3: optimize the log-likelihood loss function by gradient descent so as to train the softmax classifier O(·), obtaining the optimized softmax classifier O*(·) as the classification model.
In this embodiment, the log-likelihood loss function is expressed as formula (4):
L = − Σ_{i=1}^{|I|} log P(l_i | si)    (4)
where l_i denotes the true category of the i-th training sample si.
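A self-contained sketch of formulas (2) through (4) as reconstructed above, taking f = tanh (one of the two activations the embodiment allows); the class count, dimensions, and integer label encoding are illustrative assumptions.

    import numpy as np

    def classifier(F, W_o, b_o):
        """Formula (2): O(s_i) = f(W_o (r ∘ F_i) + b_o), with f = tanh here."""
        return np.tanh(W_o @ F + b_o)

    def softmax(o):
        """Formula (3): category probabilities from the classifier output."""
        e = np.exp(o - o.max())
        return e / e.sum()

    def log_loss(probs, label):
        """Formula (4): negative log-likelihood of the true category."""
        return -np.log(probs[label])

    rng = np.random.default_rng(0)
    num_classes, dim = 2, 12
    W_o = rng.standard_normal((num_classes, dim)) * 0.1  # random initialization
    b_o = np.zeros(num_classes)  # initialized to the 0 vector, as in step 4.2
    F = rng.standard_normal(dim)
    probs = softmax(classifier(F, W_o, b_o))
    loss = log_loss(probs, label=1)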
The concrete steps of the gradient-descent method are as follows:
Step 4.3.1: update the weight parameter Wo and the bias vector bo according to formula (5) and formula (6):
Wo := Wo − η · ∂L/∂Wo    (5)
bo := bo − η · ∂L/∂bo    (6)
where η is the learning rate;
Step 4.3.2: put the maximum pair (m1^{k,g}, m2^{k,g}) of the g-th convolution kernel of the k-th size back into its home positions in the single-feature map vector Ci^{k,g}, and set the remaining positions of Ci^{k,g} to 0;
Step 4.3.3: update the g-th convolution kernel Wk^g and the bias vector bc according to formula (7) and formula (8), in which rot180(·) denotes rotating a matrix by 180°;
Step 4.3.4: return to Step 2; Steps 2 through 4 are executed iteratively E times in total.
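A minimal sketch of step 4.3.1 under the plain gradient-descent reading of formulas (5) and (6) given above; the learning rate η = 0.01 is an assumed value.

    import numpy as np

    def sgd_step(W_o, b_o, dL_dW_o, dL_db_o, eta=0.01):
        """Step 4.3.1: gradient-descent updates of Wo (formula (5)) and
        bo (formula (6)) with learning rate eta."""
        return W_o - eta * dL_dW_o, b_o - eta * dL_db_o

    W_o, b_o = np.zeros((2, 12)), np.zeros(2)
    W_o, b_o = sgd_step(W_o, b_o, np.ones((2, 12)), np.ones(2))  # one update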
Step 5: use the classification model O*(·) to perform sentiment classification on the j-th test sample tj in the test set DT, obtaining the probability of each sentiment category, and take the sentiment category with the maximum probability as the final sentiment classification result.
For the j-th test sample tj in the test set DT, the final feature space Fj is obtained by the same method as for the training samples and substituted into the classifier O*(·); the probability that tj belongs to category l is expressed as formula (9):
P(l | tj) = exp(O*(tj)_l) / Σ_{l'=1}^{|l|} exp(O*(tj)_{l'})    (9)
In formula (9), O*(tj)_l denotes the l-th element of the vector O*(tj), and |l| denotes the total number of categories.
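A self-contained sketch of step 5, under the same tanh/softmax assumptions as the training sketch: the trained classifier is applied to the feature space F_j of a test sample and the category of maximum probability is returned.

    import numpy as np

    def classify(F_j, W_o, b_o):
        """Step 5: sentiment category with maximum probability for sample t_j."""
        o = np.tanh(W_o @ F_j + b_o)  # formula (2), f = tanh assumed
        e = np.exp(o - o.max())
        probs = e / e.sum()  # formula (9)
        return int(np.argmax(probs)), probs

    rng = np.random.default_rng(1)
    W_o, b_o = rng.standard_normal((2, 12)) * 0.1, np.zeros(2)
    category, probs = classify(rng.standard_normal(12), W_o, b_o)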

Claims (1)

1. A sentiment classification method based on a turning-sentence semantic block division mechanism, characterized in that it is carried out as follows:
Step 1: word-vector representation of the samples in the training set and test set
Step 1.1: build the word-vector dictionary D
External corpora are obtained from the network and trained to obtain the word-vector dictionary D, which is used to look up the word vectors of the words in the training and test sets; the dimension of the word vectors is set to |V|;
Step 1.2: represent the samples in the training and test sets as word vectors
Obtain |I| comment texts to form the training set DS = {s1, s2, …, si, …, s|I|} and |J| comment texts to form the test set DT = {t1, t2, …, tj, …, t|J|}, where si and tj denote the i-th training sample in DS and the j-th test sample in DT respectively, with si = {w1^i, w2^i, …, wm^i, …, wM^i}, wm^i denoting the m-th word of the i-th training sample si, and tj = {w1^j, w2^j, …, wn^j, …, wN^j}, wn^j denoting the n-th word of the j-th test sample tj; i = 1, 2, …, |I|; m = 1, 2, …, M; j = 1, 2, …, |J|; n = 1, 2, …, N;
According to the word-vector dictionary D, look up the word vector xm^i of the m-th word wm^i of the i-th training sample si in DS, obtaining the word-vector matrix Si = [x1^i; x2^i; …; xM^i] of si, an M × |V| matrix;
Similarly obtain the word-vector matrix Tj = [x1^j; x2^j; …; xN^j] of the j-th test sample tj in DT, where xn^j denotes the word vector of the n-th word wn^j of tj;
Step 2: set convolution kernels and perform the convolution calculation
Step 2.1: set up convolution kernels of K different sizes, denoted {W1, W2, …, Wk, …, WK}, where Wk = {Wk^1, …, Wk^g, …, Wk^G} denotes the set of the k-th-size convolution kernels of height hk and width nk, each kernel being an hk × nk matrix; Wk^g denotes the g-th convolution kernel in the k-th-size set and is randomly initialized;
Step 2.2: with the g-th convolution kernel Wk^g as a sliding window, use formula (1) to convolve Wk^g with the segment Si[τ : τ+hk−1] of the word-vector matrix Si of the i-th training sample si covered by the window, obtaining the τ-th value cτ^{k,g} of the single-feature map vector Ci^{k,g} = [c1^{k,g}, c2^{k,g}, …, c_{M−hk+1}^{k,g}]; then convolve the kernels of all K sizes with the word-vector matrix Si to obtain the feature-map vector Ci of si:
cτ^{k,g} = σ(Wk^g ⊗ Si[τ : τ+hk−1] + bc)    (1)
In formula (1), Si[τ : τ+hk−1] denotes the sub-matrix of rows τ to τ+hk−1 covered by the current sliding window, bc denotes a bias vector, σ(·) is the activation function, Ci^{k,g} is a 1 × (M−hk+1) vector, and 1 ≤ τ ≤ M−hk+1;
Step 3: build the turning-word dictionary ZD, look up adversative words in the i-th training sample si of the training set DS, segment the feature-map vector Ci according to the adversative-word position, and extract the single most important feature from each segment, so that several segments yield several features;
Step 3.1: build the turning-word dictionary ZD and, according to ZD, check whether the i-th training sample si in DS contains an adversative word; if si contains the L-th adversative word zL of ZD, record the position of zL in si as p;
Step 3.2: according to the size of the g-th convolution kernel Wk^g, obtain the position p − hk + 1 of the adversative word zL in the single-feature map vector Ci^{k,g} of the word-vector matrix Si of si, and take it as the division point;
Step 3.3: according to the division point, divide the single-feature map vector Ci^{k,g} of the feature-map vector Ci into two segments of single-feature map vectors, Ci^{k,g,1} and Ci^{k,g,2};
Step 3.4: apply max pooling to the two segments Ci^{k,g,1} and Ci^{k,g,2}, obtaining the maximum of each segment, m1^{k,g} and m2^{k,g}, which form the maximum pair (m1^{k,g}, m2^{k,g}) of the g-th convolution kernel of the k-th size, so that K × G maximum pairs are obtained;
Step 3.5: concatenate the K × G maximum pairs, thereby obtaining the final feature representation space Fi of the i-th training sample si;
Step 4: build the classification model based on the feature representation space Fi
Step 4.1: set a zeroing vector r according to a Bernoulli distribution; r is a vector of 0s and 1s of the same dimension as the feature representation space Fi;
Step 4.2: build the softmax classifier O for the training set DS using formula (2):
O(si) = f(Wo · (r ∘ Fi) + bo)    (2)
In formula (2), f(·) is the activation function, Wo is a weight parameter, and bo is another bias vector;
Step 4.3: optimize the loss function by gradient descent so as to train the softmax classifier O(·), obtaining the optimized softmax classifier O*(·) as the classification model;
Step 5: use the classification model O*(·) to perform sentiment classification on the j-th test sample tj in the test set DT, obtaining the probability of each sentiment category, and take the sentiment category with the maximum probability as the final sentiment classification result.
CN201810171490.XA 2018-03-01 2018-03-01 Sentiment classification method based on turning sentence semantic block division mechanism Active CN108388654B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810171490.XA CN108388654B (en) 2018-03-01 2018-03-01 Sentiment classification method based on turning sentence semantic block division mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810171490.XA CN108388654B (en) 2018-03-01 2018-03-01 Sentiment classification method based on turning sentence semantic block division mechanism

Publications (2)

Publication Number Publication Date
CN108388654A 2018-08-10
CN108388654B CN108388654B (en) 2020-03-17

Family

ID=63069615

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810171490.XA Active CN108388654B (en) 2018-03-01 2018-03-01 Sentiment classification method based on turning sentence semantic block division mechanism

Country Status (1)

Country Link
CN (1) CN108388654B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120253792A1 (en) * 2011-03-30 2012-10-04 Nec Laboratories America, Inc. Sentiment Classification Based on Supervised Latent N-Gram Analysis
CN104035992A (en) * 2014-06-10 2014-09-10 复旦大学 Method and system for processing text semantics by utilizing image processing technology and semantic vector space
CN104731770A (en) * 2015-03-23 2015-06-24 中国科学技术大学苏州研究院 Chinese microblog emotion analysis method based on rules and statistical model
KR101652486B1 (en) * 2015-04-05 2016-08-30 주식회사 큐버 Sentiment communication system based on multiple multimodal agents
CN107608956A * 2017-09-05 2018-01-19 广东石油化工学院 Reader emotion distribution prediction algorithm based on CNN-GRNN

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YANWEN WU等: "Building Chinese Sentiment Lexicon Based on HowNet", 《ADVANCED MATERIALS RESEARCH》 *
YUHONG ZHANG等: "Sentiment Classification Based on Piecewise Pooling Convolutional Neural Network", 《TECH SCIENCE PRESS CMC》 *
邸鹏 (DI Peng) et al.: "Text sentiment orientation analysis based on adversative sentence patterns" (基于转折句式的文本情感倾向性分析), 《计算机工程与设计》 (Computer Engineering and Design) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109271632A * 2018-09-14 2019-01-25 重庆邂智科技有限公司 A supervised word-vector learning method
CN111611375A (en) * 2019-07-03 2020-09-01 北京航空航天大学 Text emotion classification method based on deep learning and turning relation
CN110377739A * 2019-07-19 2019-10-25 出门问问(苏州)信息科技有限公司 Text sentiment classification method, readable storage medium, and electronic device
CN110377740A * 2019-07-22 2019-10-25 腾讯科技(深圳)有限公司 Sentiment polarity analysis method and apparatus, electronic device, and storage medium
CN110765769A * 2019-08-27 2020-02-07 电子科技大学 Entity-attribute-dependent sentiment analysis method based on clause features
CN110765769B (en) * 2019-08-27 2023-05-02 电子科技大学 Clause feature-based entity attribute dependency emotion analysis method
CN113806542A (en) * 2021-09-18 2021-12-17 上海幻电信息科技有限公司 Text analysis method and system
CN113806542B (en) * 2021-09-18 2024-05-17 上海幻电信息科技有限公司 Text analysis method and system

Also Published As

Publication number Publication date
CN108388654B (en) 2020-03-17

Similar Documents

Publication Publication Date Title
CN108388654A Sentiment classification method based on turning sentence semantic block division mechanism
CN109145112B (en) Commodity comment classification method based on global information attention mechanism
CN109271522B (en) Comment emotion classification method and system based on deep hybrid model transfer learning
Hota et al. KNN classifier based approach for multi-class sentiment analysis of twitter data
CN107391483A Product-review sentiment classification method based on convolutional neural networks
Rei et al. Grasping the finer point: A supervised similarity network for metaphor detection
CN108763326B (en) Emotion analysis model construction method of convolutional neural network based on feature diversification
CN108446271B (en) Text emotion analysis method of convolutional neural network based on Chinese character component characteristics
Basari et al. Opinion mining of movie review using hybrid method of support vector machine and particle swarm optimization
Ruangkanokmas et al. Deep belief networks with feature selection for sentiment classification
CN108427670A Sentiment analysis method based on context word vectors and deep learning
Mamgain et al. Sentiment analysis of top colleges in India using Twitter data
US11762990B2 (en) Unstructured text classification
CN109299268A Text sentiment analysis method based on a dual-channel model
CN110263257B (en) Deep learning based recommendation method for processing multi-source heterogeneous data
CN107908715A Microblog sentiment polarity discrimination method based on Adaboost and weighted classifier fusion
CN110543242A (en) expression input method based on BERT technology and device thereof
CN107515855A Microblog sentiment analysis method and system combining emoticons
CN110457562A Food-safety event classification method and device based on a neural network model
Jiang et al. Detecting hate speech from tweets for sentiment analysis
CN111814453A (en) Fine-grained emotion analysis method based on BiLSTM-TextCNN
Sunarya et al. Comparison of accuracy between convolutional neural networks and Naïve Bayes Classifiers in sentiment analysis on Twitter
CN110321918A Method for sentiment analysis and image labeling in a microblog-based public-opinion robot system
CN109062958B (en) Primary school composition automatic classification method based on TextRank and convolutional neural network
Huang A CNN model for SMS spam detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant