CN110134934A - Text emotion analysis method and device - Google Patents


Info

Publication number
CN110134934A
Authority
CN
China
Prior art keywords
text
word
segmented word
vector
dictionary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810105796.5A
Other languages
Chinese (zh)
Inventor
张春荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Potevio Information Technology Co Ltd
Putian Information Technology Co Ltd
Original Assignee
Putian Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Putian Information Technology Co Ltd
Priority to CN201810105796.5A
Publication of CN110134934A
Pending (current legal status)


Classifications

    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F40/00 Handling natural language data > G06F40/20 Natural language analysis > G06F40/237 Lexical tools > G06F40/242 Dictionaries
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F40/00 Handling natural language data > G06F40/20 Natural language analysis > G06F40/279 Recognition of textual entities > G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS > G06N3/00 Computing arrangements based on biological models > G06N3/02 Neural networks > G06N3/04 Architecture, e.g. interconnection topology > G06N3/045 Combinations of networks

Abstract

This application discloses a text sentiment analysis method and device. The method includes: obtaining a text on which sentiment analysis is to be performed, and obtaining the segmented words the text contains; for each segmented word, computing its word vector in a preset dictionary and, from the word vector, computing the similarity between the segmented word and the other words in the dictionary; tagging the part of speech of each segmented word and taking the segmented words of specified parts of speech as candidate sentiment analysis target words; using the similarities of the candidate target words, extracting the aspects corresponding to the text from the candidate target words; and, according to the candidate target words and the aspects, performing sentiment polarity analysis on the text with a piecewise convolutional neural network. The invention enables effective and accurate sentiment analysis of text.

Description

Text emotion analysis method and device
Technical field
The present invention relates to computer application technology, and in particular to a text sentiment analysis method and device.
Background
Text sentiment analysis is the process of processing, analyzing, and reasoning about subjective text that carries human emotional coloring. Its main application is processing and analyzing users' comments on a given topic. For example, before going to a restaurant, people usually read its online reviews and then decide whether to eat there. Text sentiment analysis is also an important task in fields such as online public opinion analysis and data mining (e.g., of product reviews), where the sentiment polarity of the opinions and preferences users express in text must be judged. Such subjective, polarized texts are mainly users' reviews of products, restaurants, or services, and they give potential consumers a basis for deciding whether to buy a product or use a service.
At present, sentiment information in text is analyzed at many levels, such as document, paragraph, sentence, and word level. Methods for analyzing the sentiment orientation of text fall into two broad classes: dictionary-based methods and machine-learning-based methods. The former relies on sentiment dictionaries (for Chinese, mainly the HowNet sentiment dictionary and the NTUSD sentiment dictionary of National Taiwan University), computes the similarity between ordinary words and seed words, and then judges the sentiment orientation of a sentence. The latter uses machine learning algorithms to classify, extract, and analyze sentiment information in text. As the internet matures, the big-data era demands more fine-grained and accurate sentiment analysis of online text, classifying sentiment words precisely by aspect so as to grasp the sentiment orientation of both short sentences and long articles.
Text sentiment analysis helps greatly in uncovering users' latent needs and improving products and services. However, the volume of such comments grows every day, and analyzing them manually is both costly and slow, so suitable intelligent algorithms are needed to analyze their sentiment polarity.
As shown in Figure 1, existing multi-aspect text sentiment analysis schemes generally proceed in three steps: 1. identify the aspects (Aspect) of the reviewed object and their corresponding sentiment words; 2. detect the sentiment polarity of each identified aspect, i.e., classify the polarity of the sentiment words against predefined sentiment values, e.g., as positive or negative; 3. aggregate the classification results.
Existing multi-aspect text sentiment analysis schemes usually have the following problems:
First, they struggle to perform fine-grained sentiment analysis. Many current methods work at sentence or document level and have difficulty handling sentences in which several aspects conflict. A user comment is often not about a single topic: a restaurant review, for instance, may cover food, service, location, environment, and other aspects (Aspect), and may express conflicting sentiments about them, e.g., "The food at restaurant X is very good, but the service is very poor...". For a sentence containing such conflicting multi-aspect sentiments, sentence-level or document-level analysis cannot classify the sentiment effectively and accurately.
Second, online short reviews are mostly informal short texts, and their orientation is classified considering only positive and negative tendency. Because short texts are noisy, full of new words and frequent abbreviations, follow their own collocation habits, offer limited context, and show obvious segmentation ambiguity, applying the existing multi-aspect schemes to classify their orientation frequently produces poor word segmentation.
Summary of the invention
In view of this, the main purpose of the present invention is to provide a text sentiment analysis method and device capable of effective and accurate sentiment analysis of text.
To achieve the above purpose, the present invention proposes the following technical solution:
A text sentiment analysis method, comprising:
A. obtaining a text on which sentiment analysis is to be performed, and obtaining the segmented words contained in the text;
B. for each segmented word, computing its word vector in a preset dictionary and, from the word vector, computing the similarity between the segmented word and the other words in the dictionary;
C. tagging the part of speech of each segmented word, and determining the segmented words of specified parts of speech as candidate sentiment analysis target words;
D. using the similarities of the candidate sentiment analysis target words, extracting the aspects (Aspect) corresponding to the text from the candidate target words;
E. according to the candidate sentiment analysis target words and the aspects, performing sentiment polarity analysis on the text using a piecewise convolutional neural network.
Preferably, obtaining the text on which sentiment analysis is to be performed comprises:
cleaning and parsing the web page containing the text with a web page parser, and deleting the non-text information from the parsing result to obtain the text.
Preferably, obtaining the segmented words contained in the text comprises:
splitting the text into complete sentences;
for each sentence obtained by the split, performing word segmentation with a preset segmentation tool to obtain the segmented words that make up the sentence.
Preferably, step B comprises:
for each segmented word, computing its word vector in the dictionary with the CBOW model of Word2Vec, and obtaining the similarity between the segmented word and the other words in the dictionary by computing the cosine of the angle between their vectors.
Preferably, the specified parts of speech include: nouns, verbs, adjectives, and adverbs.
Preferably, step D comprises:
obtaining, from a preset seed aspect dictionary, the seed aspect words corresponding to the domain to which the text belongs, the seed aspect words in the seed aspect dictionary being included in the dictionary;
for each candidate sentiment analysis target word, comparing the similarity between the target word and each seed aspect word, one by one, with a preset similarity threshold, and, if the similarity is greater than the threshold, taking the corresponding seed aspect word as an aspect of the text.
Preferably, after step A and before step D, the method further comprises:
extracting seed aspect words from the segmented words using the term frequency-inverse document frequency (TF-IDF) algorithm;
adding the extracted seed aspect words to the seed aspect dictionary.
Preferably, step E comprises:
constructing a word sequence H from the candidate sentiment analysis target words, and feeding each word in H, in order, as one input state into the convolutional layer of an attention-based convolutional neural network model, where $H = \{h_1, \ldots, h_t, \ldots, h_T\}$, $1 \le t \le T$, $h_t$ is the t-th word in H, and T is the number of candidate sentiment analysis target words;
the convolutional layer of the model uses a randomly initialized context vector A to compute the weight $a_t$ of each input state $h_t$ from $e_t$ and A, constructs the text representation vector v of the text by weighting the input states with these weights, and outputs v to the pooling layer of the model, where $e_t = \tanh(W h_t + b)$, W is a weight of the model, b is a bias of the model, and tanh(·) is the hyperbolic tangent function;
the pooling layer of the model applies piecewise pooling to the text representation vector v to obtain the feature vector of the text, and outputs it to the sentiment computation layer of the model;
the sentiment computation layer of the model treats each aspect as a sentiment label and, from the feature vector of the text, uses a softmax classifier to construct a score vector for each sentiment label and converts the scores into a conditional probability distribution according to $p_i(x) = \exp(x_i) / \sum_{j=1}^{C} \exp(x_j)$, where $i = 1, 2, \ldots, C$, C is the number of sentiment aspect labels, $p_i(x)$ is the conditional probability distribution of the i-th sentiment aspect label, and $x_i$ is the score vector of the i-th sentiment aspect label.
Preferably, applying piecewise pooling to the text representation vector v to obtain the feature vector of the text comprises:
dividing v evenly into C segments and determining the maximum value in each segment;
concatenating the maximum values of all segments into one vector and applying the tanh function to it to obtain the feature vector of the text.
A text sentiment analysis device, comprising:
a word segmentation extraction unit, configured to obtain the text on which sentiment analysis is to be performed and obtain the segmented words contained in the text;
a similarity calculation unit, configured to compute, for each segmented word, its word vector in a preset dictionary and, from the word vector, compute the similarity between the segmented word and the other words in the dictionary;
a target word extraction unit, configured to tag the part of speech of each segmented word and determine the segmented words of specified parts of speech as candidate sentiment analysis target words;
an aspect extraction unit, configured to use the similarities of the candidate sentiment analysis target words to extract the aspects (Aspect) corresponding to the text from the candidate target words;
a sentiment polarity analysis unit, configured to perform, according to the candidate sentiment analysis target words and the aspects, sentiment polarity analysis on the text using a piecewise convolutional neural network.
Preferably, the word segmentation extraction unit is configured to, when obtaining the text on which sentiment analysis is to be performed, clean and parse the web page containing the text with a web page parser and delete the non-text information from the parsing result to obtain the text.
Preferably, the word segmentation extraction unit is configured to, when obtaining the segmented words contained in the text, split the text into complete sentences and, for each sentence obtained, perform word segmentation with a preset segmentation tool to obtain the segmented words that make up the sentence.
Preferably, the similarity calculation unit is configured to compute, for each segmented word, its word vector in the dictionary with the CBOW model of Word2Vec, and to obtain the similarity between the segmented word and the other words in the dictionary by computing the cosine of the angle between their vectors.
Preferably, the specified parts of speech include: nouns, verbs, adjectives, and adverbs.
Preferably, the aspect extraction unit is configured to obtain, from a preset seed aspect dictionary, the seed aspect words corresponding to the domain to which the text belongs, the seed aspect words in the seed aspect dictionary being included in the dictionary; and, for each candidate sentiment analysis target word, to compare the similarity between the target word and each seed aspect word, one by one, with a preset similarity threshold and, if the similarity is greater than the threshold, take the corresponding seed aspect word as an aspect of the text.
Preferably, the device further comprises a seed aspect dictionary updating unit, configured to:
extract seed aspect words from the segmented words using the term frequency-inverse document frequency (TF-IDF) algorithm;
add the extracted seed aspect words to the seed aspect dictionary.
Preferably, the sentiment polarity analysis unit is configured to: construct a word sequence H from the candidate sentiment analysis target words and feed each word in H, in order, as one input state into the convolutional layer of an attention-based convolutional neural network model, where $H = \{h_1, \ldots, h_t, \ldots, h_T\}$, $1 \le t \le T$, $h_t$ is the t-th word in H, and T is the number of candidate sentiment analysis target words; through the convolutional layer of the model, use a randomly initialized context vector A to compute the weight $a_t$ of each input state $h_t$ from $e_t$ and A, construct the text representation vector v of the text by weighting the input states with these weights, and output v to the pooling layer of the model, where $e_t = \tanh(W h_t + b)$, W is a weight of the model, b is a bias of the model, and tanh(·) is the hyperbolic tangent function; through the pooling layer of the model, apply piecewise pooling to v to obtain the feature vector of the text and output it to the sentiment computation layer of the model; and, through the sentiment computation layer, treat each aspect as a sentiment label and, from the feature vector of the text, use a softmax classifier to construct a score vector for each sentiment label and convert the scores into a conditional probability distribution according to $p_i(x) = \exp(x_i) / \sum_{j=1}^{C} \exp(x_j)$, where $i = 1, 2, \ldots, C$, C is the number of sentiment aspect labels, $p_i(x)$ is the conditional probability distribution of the i-th sentiment aspect label, and $x_i$ is the score vector of the i-th sentiment aspect label.
Preferably, the sentiment polarity analysis unit is configured to, when applying piecewise pooling to the text representation vector v through the pooling layer of the model, divide v evenly into C segments, determine the maximum value in each segment, concatenate the maxima of all segments into one vector, and apply the tanh function to it to obtain the feature vector of the text.
In conclusion text emotion analysis method proposed by the present invention and device, using sectional convolution neural network, deeply Fine-grained many-sided sentiment analysis is carried out to text inside sentence, text can be carried out effectively, accurately sentiment analysis.
Brief description of the drawings
Fig. 1 is a flow diagram of an existing multi-aspect text sentiment analysis method;
Fig. 2 is a flow diagram of the method of an embodiment of the present invention;
Fig. 3 is a structural diagram of the device of an embodiment of the present invention.
Detailed description of embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
The core idea of the present invention is to use a method based on attention and a piecewise convolutional neural network to mine the associations between words within a sentence, and to perform multi-aspect text sentiment analysis with piecewise convolution.
Fig. 2 is a flow diagram of the embodiment of the present invention. As shown in Fig. 2, the text sentiment analysis method implemented by this embodiment mainly includes the following steps:
Step 201: obtain the text on which sentiment analysis is to be performed, and obtain the segmented words contained in the text.
This step parses the text to be analyzed so that, in subsequent steps, fine-grained sentiment analysis can be performed on the segmented words, achieving effective and accurate sentiment analysis of the text.
Preferably, the text on which sentiment analysis is to be performed can be obtained as follows:
The web page containing the text is cleaned and parsed with a web page parser, and the non-text information in the parsing result is deleted to obtain the text.
In this method, the web page containing the text is cleaned and parsed: the HTML page is parsed into text and only useful text information is retained, from which finer-grained segmented words are then obtained for sentiment analysis.
The specific methods for cleaning and parsing web pages are known to those skilled in the art and are not described further here.
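The source leaves page cleaning to the skilled reader. As a minimal sketch only, assuming the page is already available as an HTML string, Python's standard-library `html.parser` can discard markup and the contents of `<script>` and `<style>` while keeping visible text. The class and function names are hypothetical, and a real crawler would likely need extra rules (navigation bars, comment widgets, character encodings).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text fragments, skipping <script> and <style> contents."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a tag whose content is discarded
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped tag.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    """Parse an HTML page and return its text fragments joined by newlines."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

For example, a page whose body holds `<p>餐厅的食物很好</p><div>但是服务很差</div>` yields those two sentences on separate lines, with any stylesheet or script text dropped.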
Preferably, the segmented words contained in the text can be obtained as follows:
The text is split into complete sentences; for each sentence obtained, word segmentation is performed with a preset segmentation tool to obtain the segmented words that make up the sentence.
In this method, the text is first split into sentences, and a segmentation tool (e.g., the Jieba Chinese word segmentation tool) then performs Chinese word segmentation on each sentence. The specific sentence-splitting and segmentation methods are known to those skilled in the art and are not described further here.
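A minimal sketch of sentence splitting followed by segmentation. The set of sentence-final punctuation marks is an assumption (the source does not list it), and `segment` calls `jieba.lcut` from the third-party `jieba` package when installed; the per-character fallback is a hypothetical stand-in for illustration only, not real Chinese segmentation.

```python
import re
from typing import List

# Assumed sentence delimiters: full-width and ASCII end-of-sentence marks.
_SENTENCE_END = re.compile(r"(?<=[。！？!?；;])")


def split_sentences(text: str) -> List[str]:
    """Split after each sentence-final mark and drop empty pieces."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def segment(sentence: str) -> List[str]:
    """Segment one sentence into words with jieba if it is available."""
    try:
        import jieba  # third-party: pip install jieba
    except ImportError:
        # Hypothetical fallback for illustration only: one token per character.
        return [ch for ch in sentence if not ch.isspace()]
    return jieba.lcut(sentence)
```

Usage: `[segment(s) for s in split_sentences(text)]` gives one token list per sentence, which is the input the later steps operate on.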
Step 202: for each segmented word, compute its word vector in the preset dictionary and, from the word vectors, compute the similarity between the segmented word and the other words in the dictionary.
This step determines the word vector of each segmented word obtained in step 201 and computes the similarity between each segmented word and each word in the preset dictionary.
Preferably, this step can be implemented as follows:
For each segmented word, its word vector in the dictionary is computed with the CBOW model of Word2Vec, and the similarity between the segmented word and the other words in the dictionary is obtained by computing the cosine of the angle between their vectors.
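The similarity in step 202 is the cosine of the angle between word vectors. The sketch below assumes the CBOW vectors have already been trained (for example, with gensim, `Word2Vec(sentences, sg=0)` selects CBOW and `model.wv.similarity(w1, w2)` returns the cosine similarity) and shows only the similarity computation. The three-dimensional toy vectors in the usage note are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """cos(theta) = (u . v) / (|u| |v|); returns 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarities(word: str, vectors: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Similarity of `word` to every other dictionary word, highest first."""
    target = vectors[word]
    scores = [(other, cosine_similarity(target, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

With toy vectors `{"食物": [0.9, 0.1, 0.0], "菜品": [0.8, 0.2, 0.1], "服务": [0.1, 0.9, 0.2]}`, "菜品" ranks as the word most similar to "食物".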
Step 203: tag the part of speech of each segmented word, and determine the segmented words of specified parts of speech as candidate sentiment analysis target words.
This step filters the set of segmented words obtained in step 201 by the specified parts of speech, yielding the target words used for sentiment analysis in subsequent steps, i.e., the candidate sentiment analysis target words.
Preferably, to improve the accuracy of sentiment analysis, the specified parts of speech include at least nouns, verbs, adjectives, and adverbs. For example, for the text "... the restaurant's food is very good, but the service is poor ...", a series of words such as "restaurant", "food", "very", "good", "but", "service", and "poor" can be obtained as candidate sentiment analysis target words.
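A minimal sketch of the part-of-speech filter, assuming the tagger emits ICTCLAS-style tags such as those in the `word`/`flag` pairs produced by `jieba.posseg`, where noun, verb, adjective, and adverb tags begin with `n`, `v`, `a`, and `d` (e.g. `n`, `vn`, `a`, `d`). Matching by tag prefix is an assumption of this sketch, not a rule stated in the source.

```python
from typing import Iterable, List, Tuple

# Assumed tag prefixes: nouns (n, nr, ns, ...), verbs (v, vn, ...),
# adjectives (a, ad, an, ...), adverbs (d, ...).
TARGET_POS_PREFIXES = ("n", "v", "a", "d")


def candidate_targets(tagged: Iterable[Tuple[str, str]]) -> List[str]:
    """Keep words whose tag marks a noun, verb, adjective, or adverb, in order."""
    return [word for word, tag in tagged if tag.startswith(TARGET_POS_PREFIXES)]
```

Given hypothetical tagger output `[("菜品", "n"), ("非常", "d"), ("新鲜", "a"), ("，", "x"), ("价格", "n"), ("偏", "d"), ("高", "a")]`, only the punctuation mark is dropped.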
Step 204: using the similarities of the candidate sentiment analysis target words, extract the aspects (Aspect) corresponding to the text from the candidate target words.
In this step, the aspects contained in the current text are determined from the candidate target words obtained in step 203, so that in subsequent steps a convolutional neural network can perform fine-grained sentiment polarity analysis of the text based on these aspects.
In practical applications, in reviews of a restaurant, a product, or a service, many words carry a specific aspect (Aspect) attribute. For example, in restaurant reviews, words such as "food", "service", and "location" can all serve as seed aspects (Seed Aspect) for restaurant reviews.
Preferably, this step can be implemented as follows:
The seed aspect words corresponding to the domain to which the text belongs are obtained from a preset seed aspect dictionary; the seed aspect words in the seed aspect dictionary are included in the dictionary.
For each candidate sentiment analysis target word, the similarity between the target word and each seed aspect word is compared, one by one, with a preset similarity threshold; if the similarity is greater than the threshold, the corresponding seed aspect word is taken as an aspect of the text.
In this method, the seed aspect words of the application domain to which the text belongs are first extracted from the preset seed aspect dictionary (these words are also contained in the dictionary). Then, using the similarities obtained in step 202, it is judged whether the similarity between each candidate target word and each of these seed aspect words exceeds the threshold; if so, the corresponding seed aspect word is taken as an aspect of the text.
The similarity threshold can be set to an appropriate value by those skilled in the art according to actual needs, as long as the aspects of the text are identified accurately.
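The threshold test can be sketched as follows, assuming word vectors are held in a plain dictionary and reusing cosine similarity. The threshold 0.8, the toy vectors, and the choice to skip out-of-vocabulary candidates are all hypothetical; the source leaves the threshold to the implementer.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def extract_aspects(candidates: Sequence[str], seed_words: Sequence[str],
                    vectors: Dict[str, Sequence[float]], threshold: float) -> List[str]:
    """Return seed aspect words whose similarity to some candidate exceeds threshold."""
    aspects: List[str] = []
    for word in candidates:
        if word not in vectors:  # assumption: out-of-vocabulary words are skipped
            continue
        for seed in seed_words:
            if seed in aspects:
                continue  # each aspect is reported once
            if cosine_similarity(vectors[word], vectors[seed]) > threshold:
                aspects.append(seed)
    return aspects
```

With seeds "食物", "服务", "位置" and candidates "菜品", "新鲜", "服务员" (toy vectors in the test), the text is assigned the aspects "食物" and "服务"; "新鲜" is equally far from every seed and matches none.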
Further, to make the seed aspect dictionary more comprehensive, in practical applications it can also be updated in real time from the current text. Specifically, after step 201 and before step 204, the method further includes a process of updating the seed aspect dictionary with the current text, as follows:
Seed aspect words are extracted from the segmented words with the term frequency-inverse document frequency (TF-IDF) algorithm, and the extracted seed aspect words are added to the seed aspect dictionary.
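A minimal TF-IDF sketch for this update. The source names only the algorithm, so the smoothed IDF variant, the top-k cutoff, and the helper names are assumptions; a practical system would probably also drop stopwords before ranking.

```python
import math
from collections import Counter
from typing import List, Sequence, Set


def tfidf_top_words(doc_tokens: Sequence[str], corpus: Sequence[Sequence[str]], k: int) -> List[str]:
    """Rank the words of one tokenized document by TF-IDF against a corpus."""
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))  # document frequency: count each word once per document
    tf = Counter(doc_tokens)
    total = len(doc_tokens)
    scores = {}
    for word, count in tf.items():
        idf = math.log((1 + n_docs) / (1 + df[word])) + 1  # one common smoothed IDF
        scores[word] = (count / total) * idf
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [word for word, _ in ranked[:k]]


def update_seed_dictionary(seed_words: Set[str], doc_tokens: Sequence[str],
                           corpus: Sequence[Sequence[str]], k: int = 2) -> Set[str]:
    """Add the top-k TF-IDF words of the current text to the seed aspect set."""
    seed_words.update(tfidf_top_words(doc_tokens, corpus, k))
    return seed_words
```

A word that appears in every document (such as "很" in the test corpus) gets the minimum IDF and sinks in the ranking, which is why TF-IDF tends to surface domain terms.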
Step 205: according to the candidate sentiment analysis target words and the aspects, perform sentiment polarity analysis on the text using a piecewise convolutional neural network.
This step uses a piecewise convolutional neural network to analyze the sentiment polarity of the text, refining the granularity of the analysis so that sentences in which several aspects conflict can be analyzed accurately and effectively.
Preferably, this step can be implemented as follows:
Step 2051: construct a word sequence H from the candidate sentiment analysis target words, and feed each word in H, in order, as one input state into the convolutional layer of the attention-based convolutional neural network model.
Here $H = \{h_1, \ldots, h_t, \ldots, h_T\}$ with $1 \le t \le T$, where $h_t$ is the t-th word in H and T is the number of candidate sentiment analysis target words.
Step 2052: the convolutional layer of the model uses a randomly initialized context vector A to compute the weight $a_t$ of each input state $h_t$ from $e_t$ and A, constructs the text representation vector v of the text by weighting the input states with these weights, and outputs v to the pooling layer of the model.
Here $e_t = \tanh(W h_t + b)$, where W is a weight of the model, b is a bias of the model, and tanh(·) is the hyperbolic tangent function.
In this method, the input state at each time step of the attention-based convolutional neural network model is related to a randomly initialized context vector A, which can be regarded as a representation of the input.
W, b, and A are parameters of the model and are learned together during training. Through this attention-based convolutional neural network, a fixed-length, context-sensitive text representation vector can be constructed for the input, and the information in the vector reflects the importance of each input state.
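The formulas for $a_t$ and v are not legible in this copy. The sketch below assumes the standard attention form consistent with the surrounding definitions: $e_t = \tanh(W h_t + b)$ (as stated), $a_t = \exp(e_t^\top A) / \sum_{k=1}^{T} \exp(e_k^\top A)$, and $v = \sum_{t=1}^{T} a_t h_t$. It also assumes each $h_t$ is the word vector of the t-th candidate word. Treat it as an illustration of that assumed form, not as the patent's exact equations; W, b, and A would come from training.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def matvec(W: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Matrix-vector product W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attention_pool(H: Sequence[Sequence[float]], W: Sequence[Sequence[float]],
                   b: Sequence[float], A: Sequence[float]) -> Tuple[Vector, Vector]:
    """Return (weights a_t, text vector v) for input states H = [h_1, ..., h_T]."""
    # e_t = tanh(W h_t + b), as defined in the source.
    E = [[math.tanh(z + bi) for z, bi in zip(matvec(W, h), b)] for h in H]
    # Assumed: a_t = softmax over t of (e_t . A); subtracting the max avoids overflow.
    scores = [sum(e * a for e, a in zip(e_t, A)) for e_t in E]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    # Assumed: v = sum_t a_t h_t, a weighted combination of the input states.
    dim = len(H[0])
    v = [sum(a_t * h[j] for a_t, h in zip(weights, H)) for j in range(dim)]
    return weights, v
```

With an all-zero context vector every state gets the same weight 1/T, so v is the mean of the inputs; a trained A shifts weight toward the states it scores highest.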
Step 2053: the pooling layer of the model applies piecewise pooling to the text representation vector v to obtain the feature vector of the text, and outputs it to the sentiment computation layer of the model.
This step differs from conventional convolutional neural network methods in that pooling is performed piecewise. When pooling, conventional methods typically take a single maximum from a convolution vector to represent its most salient feature. Here, to mine the fine-grained sentiment polarity of the text more accurately, the structural characteristics of the sentence are exploited and the key features of its different structural parts are captured piecewise: a convolution vector is divided evenly into several segments and the maximum of each segment is taken. After all convolution vectors are segmented and processed in the same way, the extracted maxima are concatenated into one vector, a nonlinear operation (the tanh function) is applied to it, and the resulting vector serves as the feature representation of the current text sentence. Preferably, for this purpose, the following method can be used to apply piecewise pooling to the text representation vector v and obtain the feature vector of the text:
The text representation vector v is divided evenly into C segments and the maximum value in each segment is determined; the maxima of all segments are concatenated into one vector, and the tanh function is applied to it to obtain the feature vector of the text.
In this method, the text representation vector v is divided into several segments whose number corresponds to the number of aspects obtained in step 204. This ensures that the feature vector of the text more fully contains the features corresponding to each aspect, so that the sentiment polarity of each aspect can be obtained accurately in the next step.
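Piecewise max pooling as described above can be sketched as follows. The source says v is divided "evenly" into C segments; when len(v) is not a multiple of C, this sketch assumes near-equal contiguous segments (boundaries at `i * n // C`), which is a choice of the sketch rather than something the source specifies.

```python
import math
from typing import List, Sequence


def piecewise_max_pool(v: Sequence[float], num_segments: int) -> List[float]:
    """Split v into num_segments contiguous pieces, take each max, then apply tanh."""
    n = len(v)
    if not 1 <= num_segments <= n:
        raise ValueError("num_segments must be between 1 and len(v)")
    pooled = []
    for i in range(num_segments):
        start = i * n // num_segments
        end = (i + 1) * n // num_segments  # never equal to start because num_segments <= n
        pooled.append(max(v[start:end]))
    # Concatenated maxima pass through the tanh nonlinearity.
    return [math.tanh(x) for x in pooled]
```

For v = [0.1, 0.5, -0.2, 0.3, 0.0, -0.4] and C = 3, the segment maxima are 0.5, 0.3, and 0.0, so the feature vector is [tanh(0.5), tanh(0.3), 0.0]. Ordinary max pooling would keep only 0.5 and lose the per-segment information.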
Step 2054: the sentiment computation layer of the model treats each aspect as a sentiment label; from the feature vector of the text, it uses a softmax classifier to construct a score vector for each sentiment label and converts the scores into a conditional probability distribution according to $p_i(x) = \exp(x_i) / \sum_{j=1}^{C} \exp(x_j)$.
Here $i = 1, 2, \ldots, C$, C is the number of sentiment aspect labels, $p_i(x)$ is the conditional probability distribution of the i-th sentiment aspect label, and $x_i$ is the score vector of the i-th sentiment aspect label.
In this step, each aspect extracted in step 204 is treated as a sentiment label and a score vector is determined for each label, yielding the conditional probability distribution for each aspect and hence, accurately, the sentiment polarity of each aspect of the text.
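A sketch of the softmax layer in step 2054. The source does not say how the scores are produced from the feature vector, so this assumes a linear scoring layer (hypothetical trained parameters `U` and `c`) that yields one scalar score $x_i$ per aspect label, and then applies $p_i(x) = \exp(x_i) / \sum_{j=1}^{C} \exp(x_j)$. Subtracting the maximum score first leaves the result unchanged mathematically while avoiding overflow.

```python
import math
from typing import Dict, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """p_i = exp(x_i) / sum_j exp(x_j), computed with the max subtracted for stability."""
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear_scores(features: Sequence[float], U: Sequence[Sequence[float]],
                  c: Sequence[float]) -> List[float]:
    """Assumed scoring layer: x_i = U_i . features + c_i for each aspect label i."""
    return [sum(u * f for u, f in zip(row, features)) + ci for row, ci in zip(U, c)]


def aspect_distribution(labels: Sequence[str], features: Sequence[float],
                        U: Sequence[Sequence[float]], c: Sequence[float]) -> Dict[str, float]:
    """Map each aspect label to its probability under the softmax classifier."""
    return dict(zip(labels, softmax(linear_scores(features, U, c))))
```

For example, scores of 2.0 for "食物" and 0.0 for "服务" give probabilities of about 0.88 and 0.12.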
The above embodiment shows that the present invention uses a convolutional neural network based on an attention mechanism, which weights each input state to construct a context-sensitive text representation vector for the input text. It then uses piecewise pooling to capture the key features of the different structural parts of a sentence: the convolution vector is divided into several segments, and the extracted maxima are concatenated into one vector that serves as the feature representation of the current text sentence. By going deep inside sentences to perform fine-grained multi-aspect sentiment analysis in this way, sentences whose aspects conflict with one another can be analyzed accurately and effectively.
Fig. 3 is a schematic structural diagram of a text sentiment analysis apparatus corresponding to the above method. As shown in Fig. 3, the apparatus includes:
Word segmentation extraction unit 301, configured to obtain the text on which sentiment analysis is to be performed and to obtain the segmented words contained in the text;
Similarity calculation unit 302, configured to compute, for each segmented word, the word vector of the word in a preset lexicon and, based on the word vector, the similarity between the word and other words in the lexicon;
Target word extraction unit 303, configured to tag the part of speech of each segmented word and to determine the words of specified parts of speech as candidate sentiment analysis target words;
Aspect extraction unit 304, configured to extract the aspects corresponding to the text from the candidate sentiment analysis target words, using the similarities of those candidate words;
Sentiment polarity analysis unit 305, configured to perform sentiment polarity analysis on the text using a piecewise convolutional neural network, according to the candidate sentiment analysis target words and the aspects.
Preferably, when obtaining the text on which sentiment analysis is to be performed, the word segmentation extraction unit 301 uses a web page parser to clean and parse the web page containing the text, and deletes the non-text information from the parsing result to obtain the text.
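The patent does not name a web page parser. As a minimal sketch, assuming Python's standard-library `html.parser` as the parser and treating `<script>` and `<style>` contents as the non-text information to delete, the cleaning step could look like this (`TextExtractor` and `extract_text` are hypothetical names):

```python
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = ("<html><head><style>p{color:red}</style></head><body>"
        "<p>屏幕很清晰</p><script>var x = 1;</script><p>但电池不耐用</p></body></html>")
print(extract_text(page))  # 屏幕很清晰 但电池不耐用
```

A production crawler would usually also strip navigation, comments and advertising blocks; which elements count as non-text is a design choice the patent leaves open.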
Preferably, when obtaining the segmented words contained in the text, the word segmentation extraction unit 301 splits the text into complete sentences and, for each resulting sentence, performs word segmentation with a preset word segmentation tool to obtain the words that make up the sentence.
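The preset segmentation tool is not specified; in practice a dedicated tool such as jieba would typically be used. The sketch below substitutes forward maximum matching over a small hypothetical lexicon so that it runs without third-party packages, and splits sentences on sentence-final punctuation (an assumed definition of "complete sentence").

```python
import re
from typing import List, Set

# Split after sentence-final punctuation (Chinese and ASCII), keeping the mark.
_SENT_END = re.compile(r"(?<=[。！？!?])")
# Punctuation tokens dropped from the segmentation output.
_PUNCT = set("，。！？!?；;、,.")


def split_sentences(text: str) -> List[str]:
    """Split text into complete sentences (zero-width split needs Python 3.7+)."""
    return [s.strip() for s in _SENT_END.split(text) if s.strip()]


def fmm_segment(sentence: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: take the longest lexicon word at each position."""
    words, i = [], 0
    while i < len(sentence):
        for n in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + n]
            # A single character is always accepted, so the scan always advances.
            if n == 1 or piece in lexicon:
                if piece not in _PUNCT:
                    words.append(piece)
                i += n
                break
    return words


LEXICON = {"屏幕", "清晰", "电池", "耐用", "客服", "态度"}  # hypothetical lexicon
text = "屏幕很清晰，但电池不耐用。客服态度很好！"
for sentence in split_sentences(text):
    print(fmm_segment(sentence, LEXICON))
```

Forward maximum matching is only a baseline; statistical segmenters handle ambiguous and out-of-vocabulary words far better.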
Preferably, for each segmented word, the similarity calculation unit 302 uses the CBOW model of Word2Vec to compute the word vector of the word in the lexicon, and obtains the similarity between the word and other words in the lexicon by computing the cosine of the angle between their vectors.
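The cosine-angle similarity can be sketched as follows. In practice the vectors would come from a trained CBOW model (in gensim, `Word2Vec(..., sg=0)` selects CBOW); here a hypothetical table of 3-dimensional vectors stands in for them so the example runs on the standard library alone.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either is a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def most_similar(word: str, vectors: Dict[str, List[float]], top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank every other lexicon word by cosine similarity to `word`."""
    target = vectors[word]
    scored = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda p: p[1], reverse=True)
    return scored[:top_k]


# Hypothetical word vectors standing in for CBOW output.
VECTORS = {
    "屏幕": [0.9, 0.1, 0.0],
    "显示": [0.8, 0.2, 0.1],
    "电池": [0.1, 0.9, 0.2],
    "续航": [0.0, 0.8, 0.3],
}
print(most_similar("屏幕", VECTORS, top_k=2))
```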
Preferably, the specified parts of speech include nouns, verbs, adjectives and adverbs.
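Selecting candidate target words by part of speech reduces to a filter over tagged words. This sketch assumes an ICTCLAS/jieba-style tag set in which noun, verb, adjective and adverb tags start with `n`, `v`, `a` and `d`; the patent does not state which tagger or tag set is used, and the tagged sentence is hand-made for illustration.

```python
from typing import Iterable, List, Tuple

# Assumed tag convention: n* = noun, v* = verb, a* = adjective, d* = adverb.
TARGET_PREFIXES = ("n", "v", "a", "d")


def candidate_targets(tagged: Iterable[Tuple[str, str]]) -> List[str]:
    """Keep words whose POS tag belongs to one of the specified parts of speech."""
    return [word for word, tag in tagged if tag.startswith(TARGET_PREFIXES)]


# Hypothetical tagger output: c = conjunction, x = punctuation.
tagged = [("屏幕", "n"), ("很", "d"), ("清晰", "a"), ("但", "c"),
          ("电池", "n"), ("不", "d"), ("耐用", "a"), ("。", "x")]
print(candidate_targets(tagged))
```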
Preferably, the aspect extraction unit 304 is configured to obtain, from a preset seed aspect lexicon, the seed aspect words corresponding to the domain to which the text belongs, where the words in the seed aspect lexicon are included in the lexicon; and, for each candidate sentiment analysis target word, to compare the similarity between the target word and each seed aspect word, one by one, with a preset similarity threshold and, if the similarity is greater than the threshold, to take the corresponding seed aspect word as an aspect of the text.
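The threshold comparison can be sketched as a double loop over candidate target words and seed aspect words. The similarity table, the seed words and the threshold of 0.8 below are hypothetical; in the described method the similarities would come from the word vectors computed earlier.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def extract_aspects(
    candidates: Sequence[str],
    seed_words: Sequence[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.8,
) -> List[str]:
    """Return seed aspect words whose similarity to some candidate exceeds the threshold."""
    aspects: List[str] = []
    for target in candidates:
        for seed in seed_words:
            # Compare each (target, seed) similarity with the preset threshold.
            if similarity(target, seed) > threshold and seed not in aspects:
                aspects.append(seed)
    return aspects


# Hypothetical precomputed similarities between target and seed words.
SIM: Dict[Tuple[str, str], float] = {
    ("显示", "屏幕"): 0.92,
    ("续航", "电池"): 0.88,
    ("清晰", "屏幕"): 0.41,
}


def lookup(a: str, b: str) -> float:
    if a == b:
        return 1.0
    return SIM.get((a, b), SIM.get((b, a), 0.0))


print(extract_aspects(["显示", "清晰", "续航", "很"], ["屏幕", "电池", "价格"], lookup))
```

Here "显示" (display) maps to the seed aspect "屏幕" (screen) and "续航" (battery life) maps to "电池" (battery); "价格" (price) is not matched.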
Preferably, the apparatus further comprises a seed aspect lexicon updating unit 306, configured to:
extract seed aspect words from the segmented words using the term frequency-inverse document frequency (TF-IDF) algorithm;
add the extracted seed aspect words to the seed aspect lexicon.
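A minimal TF-IDF sketch of this update follows. It assumes TF = count / document length, IDF = log(N / df), ranking each word by its highest score in any document, keeping the top k words, and passing in only noun tokens per review; the patent specifies none of these choices, and the reviews and existing seed word are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def tfidf_top_words(docs: List[List[str]], top_k: int) -> List[str]:
    """Rank words by their highest TF-IDF score over all documents."""
    n_docs = len(docs)
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each word once per doc
    best: Dict[str, float] = {}
    for doc in docs:
        tf = Counter(doc)
        for word, count in tf.items():
            score = (count / len(doc)) * math.log(n_docs / df[word])
            best[word] = max(best.get(word, 0.0), score)
    # Sort by score descending, then by word for deterministic ties.
    ranked = sorted(best.items(), key=lambda p: (-p[1], p[0]))
    return [word for word, _ in ranked[:top_k]]


# Hypothetical noun tokens from three phone reviews.
reviews = [
    ["屏幕", "屏幕", "手机"],
    ["电池", "手机"],
    ["屏幕", "价格", "手机"],
]
seed_lexicon: Set[str] = {"外观"}
seed_lexicon.update(tfidf_top_words(reviews, top_k=3))
print(sorted(seed_lexicon))
```

"手机" (phone) occurs in every review, so its IDF is zero and it is never promoted to a seed aspect word.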
Preferably, the sentiment polarity analysis unit 305 is configured to: construct a word sequence H from the candidate sentiment analysis target words, and input each word in H, in order, as one input state of an attention-based convolutional neural network model into the convolutional layer of the model, where $H = \{h_1, \ldots, h_t, \ldots, h_T\}$, $1 \le t \le T$, $h_t$ is the t-th word in H, and T is the number of candidate sentiment analysis target words; through the convolutional layer, use a randomly initialized context vector A to compute the weight $a_t$ of each input state $h_t$ according to $a_t = \frac{\exp(e_t^{\top} A)}{\sum_{k=1}^{T} \exp(e_k^{\top} A)}$, construct the text representation vector of the text according to $v = \sum_{t=1}^{T} a_t h_t$, and output v to the pooling layer of the model, where $e_t = \tanh(W h_t + b)$, W is a weight of the model, b is a bias of the model, and $\tanh(\cdot)$ is the hyperbolic tangent function; through the pooling layer, apply piecewise pooling to the text representation vector v to obtain the feature vector of the text, and output it to the sentiment computation layer of the model; and, through the sentiment computation layer, take each aspect as a sentiment label and, for each sentiment label, use a softmax classifier to construct the label's score vector from the feature vector of the text, converting the score vectors into a conditional probability distribution according to $p_i(x) = \frac{\exp(x_i)}{\sum_{j=1}^{C} \exp(x_j)}$, where $i = 1, 2, \ldots, C$, C is the number of sentiment aspect labels, $p_i(x)$ is the conditional probability distribution of the i-th sentiment aspect label, and $x_i$ is the score vector of the i-th sentiment aspect label.
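The attention weighting can be sketched in plain Python. This covers only the formulas for $e_t$, $a_t$ and $v$ (the patent attributes them to the convolutional layer but gives no convolution details). It assumes each $h_t$ is the word vector of the t-th target word and uses the form $a_t \propto \exp(e_t^{\top} A)$ for the weights; A is fixed here for reproducibility, whereas the patent initializes it randomly. `attention_representation` is a hypothetical name.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def _dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def attention_representation(H: List[Vector], W: Matrix, b: Vector, A: Vector) -> Tuple[List[float], Vector]:
    """Return the weights a_t and the text representation v = sum_t a_t * h_t."""
    # e_t = tanh(W h_t + b), computed one row of W at a time.
    es = [[math.tanh(_dot(row, h) + bias) for row, bias in zip(W, b)] for h in H]
    # a_t = exp(e_t . A) / sum_k exp(e_k . A), shifted by the max for stability.
    scores = [_dot(e, A) for e in es]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [x / z for x in exps]
    # v is the attention-weighted sum of the input states.
    v = [sum(a * h[k] for a, h in zip(weights, H)) for k in range(len(H[0]))]
    return weights, v


H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three hypothetical word vectors
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
A = [1.0, 0.0]  # fixed here; randomly initialized and learned in the model
weights, v = attention_representation(H, W, b, A)
print(weights, v)
```

Because A here rewards the first coordinate of $e_t$, the first and third words receive equal, larger weights than the second.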
Preferably, when applying piecewise pooling to the text representation vector v through the pooling layer of the convolutional neural network model, the sentiment polarity analysis unit 305 divides v evenly into C segments and determines the maximum value in each segment; it then concatenates the maximum values of all segments into one vector and applies the tanh function to that vector to obtain the feature vector of the text.
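Piecewise pooling as described (split v into C segments, take each segment's maximum, concatenate, apply tanh) can be sketched as below. The patent says "evenly" but does not say what happens when the length of v is not a multiple of C; the integer-boundary split used here is an assumption, and `piecewise_max_pool` is a hypothetical name.

```python
import math
from typing import List


def piecewise_max_pool(v: List[float], c: int) -> List[float]:
    """Split v into c contiguous segments, max-pool each, and apply tanh."""
    n = len(v)
    if c <= 0 or c > n:
        raise ValueError("c must be between 1 and len(v)")
    # Boundaries i*n//c give c non-empty segments, as equal in size as possible.
    bounds = [(i * n) // c for i in range(c + 1)]
    maxima = [max(v[bounds[i]:bounds[i + 1]]) for i in range(c)]
    return [math.tanh(m) for m in maxima]


v = [0.2, -1.0, 0.7, 0.1, 0.3, -0.5]
print(piecewise_max_pool(v, 3))  # tanh of the segment maxima 0.2, 0.7, 0.3
```

Using one segment per aspect keeps the output length equal to C, which matches the number of sentiment aspect labels fed to the softmax layer.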
In conclusion, the above are merely preferred embodiments of the present invention and are not intended to limit its scope. Any modification, equivalent replacement, improvement or the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (18)

1. A text sentiment analysis method, characterized by comprising:
A. obtaining the text on which sentiment analysis is to be performed, and obtaining the segmented words contained in the text;
B. for each segmented word, computing the word vector of the word in a preset lexicon and, based on the word vector, computing the similarity between the word and other words in the lexicon;
C. tagging the part of speech of each segmented word, and determining the words of specified parts of speech as candidate sentiment analysis target words;
D. extracting the aspects corresponding to the text from the candidate sentiment analysis target words, using the similarities of the candidate sentiment analysis target words;
E. performing sentiment polarity analysis on the text using a piecewise convolutional neural network, according to the candidate sentiment analysis target words and the aspects.
2. The method according to claim 1, wherein obtaining the text on which sentiment analysis is to be performed comprises:
cleaning and parsing the web page containing the text using a web page parser; and deleting the non-text information from the parsing result to obtain the text.
3. The method according to claim 1, wherein obtaining the segmented words contained in the text comprises:
splitting the text into complete sentences;
for each resulting sentence, performing word segmentation with a preset word segmentation tool to obtain the words that make up the sentence.
4. The method according to claim 1, wherein step B comprises:
for each segmented word, computing the word vector of the word in the lexicon using the CBOW model of Word2Vec; and obtaining the similarity between the word and other words in the lexicon by computing the cosine of the angle between their vectors.
5. The method according to claim 1, wherein the specified parts of speech include nouns, verbs, adjectives and adverbs.
6. The method according to claim 1, wherein step D comprises:
obtaining, from a preset seed aspect lexicon, the seed aspect words corresponding to the domain to which the text belongs, wherein the words in the seed aspect lexicon are included in the lexicon;
for each candidate sentiment analysis target word, comparing the similarity between the target word and each seed aspect word, one by one, with a preset similarity threshold and, if the similarity is greater than the threshold, taking the corresponding seed aspect word as an aspect of the text.
7. The method according to claim 6, further comprising, after step A and before step D:
extracting seed aspect words from the segmented words using the term frequency-inverse document frequency (TF-IDF) algorithm;
adding the extracted seed aspect words to the seed aspect lexicon.
8. The method according to claim 1, wherein step E comprises:
constructing a word sequence H from the candidate sentiment analysis target words, and inputting each word in H, in order, as one input state of an attention-based convolutional neural network model into the convolutional layer of the convolutional neural network model, where $H = \{h_1, \ldots, h_t, \ldots, h_T\}$, $1 \le t \le T$, $h_t$ is the t-th word in H, and T is the number of candidate sentiment analysis target words;
the convolutional layer of the convolutional neural network model computing, with a randomly initialized context vector A, the weight $a_t$ of each input state $h_t$ according to $a_t = \frac{\exp(e_t^{\top} A)}{\sum_{k=1}^{T} \exp(e_k^{\top} A)}$, constructing the text representation vector of the text according to $v = \sum_{t=1}^{T} a_t h_t$, and outputting v to the pooling layer of the convolutional neural network model, where $e_t = \tanh(W h_t + b)$, W is a weight of the model, b is a bias of the model, and $\tanh(\cdot)$ is the hyperbolic tangent function;
the pooling layer of the convolutional neural network model applying piecewise pooling to the text representation vector v to obtain the feature vector of the text, and outputting it to the sentiment computation layer of the convolutional neural network model;
the sentiment computation layer of the convolutional neural network model taking each aspect as a sentiment label, constructing, for each sentiment label and based on the feature vector of the text, a score vector for that label using a softmax classifier, and converting the score vectors into a conditional probability distribution according to $p_i(x) = \frac{\exp(x_i)}{\sum_{j=1}^{C} \exp(x_j)}$, where $i = 1, 2, \ldots, C$, C is the number of sentiment aspect labels, $p_i(x)$ is the conditional probability distribution of the i-th sentiment aspect label, and $x_i$ is the score vector of the i-th sentiment aspect label.
9. The method according to claim 8, wherein applying piecewise pooling to the text representation vector v to obtain the feature vector of the text comprises:
dividing the text representation vector v evenly into C segments, and determining the maximum value in each segment;
concatenating the maximum values of all segments into one vector, and applying the tanh function to that vector to obtain the feature vector of the text.
10. A text sentiment analysis apparatus, characterized by comprising:
a word segmentation extraction unit, configured to obtain the text on which sentiment analysis is to be performed and to obtain the segmented words contained in the text;
a similarity calculation unit, configured to compute, for each segmented word, the word vector of the word in a preset lexicon and, based on the word vector, the similarity between the word and other words in the lexicon;
a target word extraction unit, configured to tag the part of speech of each segmented word and to determine the words of specified parts of speech as candidate sentiment analysis target words;
an aspect extraction unit, configured to extract the aspects corresponding to the text from the candidate sentiment analysis target words, using the similarities of the candidate sentiment analysis target words;
a sentiment polarity analysis unit, configured to perform sentiment polarity analysis on the text using a piecewise convolutional neural network, according to the candidate sentiment analysis target words and the aspects.
11. The apparatus according to claim 10, wherein the word segmentation extraction unit is configured, when obtaining the text on which sentiment analysis is to be performed, to clean and parse the web page containing the text using a web page parser, and to delete the non-text information from the parsing result to obtain the text.
12. The apparatus according to claim 10, wherein the word segmentation extraction unit is configured, when obtaining the segmented words contained in the text, to split the text into complete sentences and, for each resulting sentence, to perform word segmentation with a preset word segmentation tool to obtain the words that make up the sentence.
13. The apparatus according to claim 10, wherein the similarity calculation unit is configured to compute, for each segmented word, the word vector of the word in the lexicon using the CBOW model of Word2Vec, and to obtain the similarity between the word and other words in the lexicon by computing the cosine of the angle between their vectors.
14. The apparatus according to claim 10, wherein the specified parts of speech include nouns, verbs, adjectives and adverbs.
15. The apparatus according to claim 10, wherein the aspect extraction unit is configured to obtain, from a preset seed aspect lexicon, the seed aspect words corresponding to the domain to which the text belongs, wherein the words in the seed aspect lexicon are included in the lexicon; and, for each candidate sentiment analysis target word, to compare the similarity between the target word and each seed aspect word, one by one, with a preset similarity threshold and, if the similarity is greater than the threshold, to take the corresponding seed aspect word as an aspect of the text.
16. The apparatus according to claim 15, further comprising a seed aspect lexicon updating unit, configured to:
extract seed aspect words from the segmented words using the term frequency-inverse document frequency (TF-IDF) algorithm;
add the extracted seed aspect words to the seed aspect lexicon.
17. The apparatus according to claim 10, wherein the sentiment polarity analysis unit is configured to: construct a word sequence H from the candidate sentiment analysis target words, and input each word in H, in order, as one input state of an attention-based convolutional neural network model into the convolutional layer of the convolutional neural network model, where $H = \{h_1, \ldots, h_t, \ldots, h_T\}$, $1 \le t \le T$, $h_t$ is the t-th word in H, and T is the number of candidate sentiment analysis target words; through the convolutional layer of the convolutional neural network model, use a randomly initialized context vector A to compute the weight $a_t$ of each input state $h_t$ according to $a_t = \frac{\exp(e_t^{\top} A)}{\sum_{k=1}^{T} \exp(e_k^{\top} A)}$, construct the text representation vector of the text according to $v = \sum_{t=1}^{T} a_t h_t$, and output v to the pooling layer of the convolutional neural network model, where $e_t = \tanh(W h_t + b)$, W is a weight of the model, b is a bias of the model, and $\tanh(\cdot)$ is the hyperbolic tangent function; through the pooling layer of the convolutional neural network model, apply piecewise pooling to the text representation vector v to obtain the feature vector of the text, and output it to the sentiment computation layer of the convolutional neural network model; and, through the sentiment computation layer of the convolutional neural network model, take each aspect as a sentiment label and, for each sentiment label, use a softmax classifier to construct the label's score vector from the feature vector of the text, converting the score vectors into a conditional probability distribution according to $p_i(x) = \frac{\exp(x_i)}{\sum_{j=1}^{C} \exp(x_j)}$, where $i = 1, 2, \ldots, C$, C is the number of sentiment aspect labels, $p_i(x)$ is the conditional probability distribution of the i-th sentiment aspect label, and $x_i$ is the score vector of the i-th sentiment aspect label.
18. The apparatus according to claim 17, wherein the sentiment polarity analysis unit is configured, when applying piecewise pooling to the text representation vector v through the pooling layer of the convolutional neural network model, to divide v evenly into C segments and determine the maximum value in each segment, and to concatenate the maximum values of all segments into one vector and apply the tanh function to that vector to obtain the feature vector of the text.
CN201810105796.5A 2018-02-02 2018-02-02 Text emotion analysis method and device Pending CN110134934A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810105796.5A CN110134934A (en) 2018-02-02 2018-02-02 Text emotion analysis method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810105796.5A CN110134934A (en) 2018-02-02 2018-02-02 Text emotion analysis method and device

Publications (1)

Publication Number Publication Date
CN110134934A 2019-08-16

Family

ID=67566916

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810105796.5A Pending CN110134934A (en) 2018-02-02 2018-02-02 Text emotion analysis method and device

Country Status (1)

Country Link
CN (1) CN110134934A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110909167A (en) * 2019-11-29 2020-03-24 重庆邮电大学 Microblog text classification system
CN110929029A (en) * 2019-11-04 2020-03-27 中国科学院信息工程研究所 Text classification method and system based on graph convolution neural network
CN111639152A (en) * 2019-08-29 2020-09-08 上海卓繁信息技术股份有限公司 Intention recognition method
CN112016298A (en) * 2020-08-28 2020-12-01 中移(杭州)信息技术有限公司 Method for extracting product characteristic information, electronic device and storage medium
CN112818682A (en) * 2021-01-22 2021-05-18 深圳大学 E-commerce data analysis method, equipment, device and computer-readable storage medium
CN113378545A (en) * 2021-06-08 2021-09-10 北京邮电大学 Aspect level emotion analysis method and device, electronic equipment and storage medium

Citations (6)

Publication number Priority date Publication date Assignee Title
CN103903164A (en) * 2014-03-25 2014-07-02 华南理工大学 Semi-supervised automatic aspect extraction method and system based on domain information
CN105512687A (en) * 2015-12-15 2016-04-20 北京锐安科技有限公司 Emotion classification model training and textual emotion polarity analysis method and system
CN105912720A (en) * 2016-05-04 2016-08-31 南京大学 Method for analyzing emotion-involved text data in computer
CN106547924A (en) * 2016-12-09 2017-03-29 东软集团股份有限公司 The sentiment analysis method and device of text message
CN106547740A (en) * 2016-11-24 2017-03-29 四川无声信息技术有限公司 Text message processing method and device
CN107301171A (en) * 2017-08-18 2017-10-27 武汉红茶数据技术有限公司 A kind of text emotion analysis method and system learnt based on sentiment dictionary

Cited By (10)

Publication number Priority date Publication date Assignee Title
CN111639152A (en) * 2019-08-29 2020-09-08 上海卓繁信息技术股份有限公司 Intention recognition method
CN111639152B (en) * 2019-08-29 2021-04-13 上海卓繁信息技术股份有限公司 Intention recognition method
CN110929029A (en) * 2019-11-04 2020-03-27 中国科学院信息工程研究所 Text classification method and system based on graph convolution neural network
CN110909167A (en) * 2019-11-29 2020-03-24 重庆邮电大学 Microblog text classification system
CN110909167B (en) * 2019-11-29 2022-07-01 重庆邮电大学 Microblog text classification system
CN112016298A (en) * 2020-08-28 2020-12-01 中移(杭州)信息技术有限公司 Method for extracting product characteristic information, electronic device and storage medium
CN112818682A (en) * 2021-01-22 2021-05-18 深圳大学 E-commerce data analysis method, equipment, device and computer-readable storage medium
CN112818682B (en) * 2021-01-22 2023-01-03 深圳大学 E-commerce data analysis method, equipment, device and computer-readable storage medium
CN113378545A (en) * 2021-06-08 2021-09-10 北京邮电大学 Aspect level emotion analysis method and device, electronic equipment and storage medium
CN113378545B (en) * 2021-06-08 2022-02-11 北京邮电大学 Aspect level emotion analysis method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190816