CN106557463A - Sentiment analysis method and device - Google Patents
- Publication number
- CN106557463A (application number CN201610966330.5A)
- Authority
- CN
- China
- Prior art keywords
- emotion
- word sequence
- vector
- emotion word
- target text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a sentiment analysis method and device, relates to the technical field of natural language processing, and improves the accuracy of sentiment analysis. The technical solution is as follows: an emotion word sequence is extracted from a target text, the emotion word sequence including emotion words and non-semantic words extracted in order; an emotion word sequence vector corresponding to the emotion word sequence is generated; an emotion label corresponding to the emotion word sequence vector of the target text is obtained according to a preset classification model, the preset classification model storing correspondences between emotion word sequence vectors and emotion labels; and the obtained emotion label is taken as the sentiment analysis result of the target text. The invention is mainly used for analyzing the emotion of a target text.
Description
Technical field
The present invention relates to the technical field of natural language processing, and in particular to a sentiment analysis method and device.
Background art
In recent years, sentiment analysis has become a hot topic in natural language processing research. Its goal is to mine the opinions and sentiment polarity that users express in text, and the sentiment orientation mined from text can help other users make decisions. Sentiment analysis has therefore attracted the attention of many researchers in natural language processing and has great application value.
At present, the sentiment orientation of a target text is determined by label sequence rules: label sequence rules are built from the emotion category label of each sentence in a training text together with the emotion label of the training text, and the emotion of the target text is then analyzed according to these label sequence rules.
However, the emotion category labels in a category label sequence are set manually. Given the current scarcity of labeled data, such labels are not easy to obtain, and when emotion category labels are limited or very few, too few sequence rules are mined, which reduces the accuracy of sentiment analysis.
Summary of the invention
In view of this, the present invention provides a sentiment analysis method and device, the main purpose of which is to improve the accuracy of sentiment analysis.
According to one aspect of the present invention, a sentiment analysis method is provided, including:
extracting an emotion word sequence from a target text, the emotion word sequence including emotion words and non-semantic words extracted in order;
generating an emotion word sequence vector corresponding to the emotion word sequence;
obtaining, according to a preset classification model, an emotion label corresponding to the emotion word sequence vector of the target text, the preset classification model storing correspondences between emotion word sequence vectors and emotion labels;
taking the obtained emotion label as the sentiment analysis result of the target text.
Further, the method also includes:
obtaining a word feature vector of the target text;
fusing the word feature vector and the emotion word sequence vector of the target text to obtain a feature vector of the target text.
Specifically, obtaining, according to the preset classification model, the emotion label corresponding to the emotion word sequence vector of the target text includes:
obtaining, according to the preset classification model, the emotion label corresponding to the feature vector of the target text, the preset classification model storing correspondences between feature vectors and emotion labels.
Further, the preset classification model is set up using the following method:
extracting an emotion word sequence from a training text;
taking the correspondence between the emotion word sequence of the training text and the emotion label of the training text as an emotion word sequence feature of the training text;
training the preset classification model according to the emotion word sequence features of the training text.
Further, before the preset classification model is trained according to the emotion word sequence features of the training text, the method also includes:
filtering the emotion word sequence features according to the class sequential rules (CSR) algorithm.
Specifically, training the preset classification model according to the emotion word sequence features of the training text includes:
converting the emotion word sequence of the training text into an emotion word sequence vector according to a bag-of-words model;
fusing the word feature vector and the emotion word sequence vector of the training text to obtain the feature vector of the training text;
training the preset classification model with the feature vector and the emotion label of each training text.
According to another aspect of the present invention, a sentiment analysis device is provided, including:
an extraction unit, configured to extract an emotion word sequence from a target text, the emotion word sequence including emotion words and non-semantic words extracted in order;
a generation unit, configured to generate an emotion word sequence vector corresponding to the emotion word sequence;
an acquisition unit, configured to obtain, according to a preset classification model, an emotion label corresponding to the emotion word sequence vector of the target text, the preset classification model storing correspondences between emotion word sequence vectors and emotion labels;
a determination unit, configured to take the obtained emotion label as the sentiment analysis result of the target text.
Further, the device also includes:
the acquisition unit, further configured to obtain a word feature vector of the target text;
a fusion unit, configured to fuse the word feature vector and the emotion word sequence vector of the target text to obtain a feature vector of the target text.
The acquisition unit is specifically configured to obtain, according to the preset classification model, the emotion label corresponding to the feature vector of the target text, the preset classification model storing correspondences between feature vectors and emotion labels.
Further, the device also includes:
the extraction unit, further configured to extract an emotion word sequence from a training text;
the determination unit, further configured to take the correspondence between the emotion word sequence of the training text and the emotion label of the training text as an emotion word sequence feature of the training text;
a training unit, configured to train the preset classification model according to the emotion word sequence features of the training text.
Further, the device also includes:
a filtering unit, configured to filter the emotion word sequence features according to the class sequential rules (CSR) algorithm.
Specifically, the training unit includes:
a conversion module, configured to convert the emotion word sequence of the training text into an emotion word sequence vector according to a bag-of-words model;
a fusion module, configured to fuse the word feature vector and the emotion word sequence vector of the training text to obtain the feature vector of the training text;
a training module, configured to train the preset classification model with the feature vector and the emotion label of each training text.
Through the above technical solutions, the technical solutions provided by the embodiments of the present invention have at least the following advantages:
In the sentiment analysis method and device provided by the embodiments of the present invention, an emotion word sequence is first extracted from the target text, an emotion word sequence vector corresponding to the emotion word sequence is then generated, an emotion label corresponding to the emotion word sequence vector of the target text is obtained according to a preset classification model, and the obtained emotion label is finally taken as the sentiment analysis result of the target text. The current approach obtains the sentiment orientation of a target text from label sequences. By contrast, the embodiments of the present invention extract an emotion word sequence from the target text, convert it into an emotion word sequence vector, obtain the corresponding emotion label according to the preset classification model, and take that label as the sentiment analysis result of the target text. Because the emotion word sequences of the present invention are easier to obtain than the emotion category label sequence resources of the prior art, and more information can be obtained by comparing them, the embodiments of the present invention solve the problem that emotion category labels in existing category label sequences are difficult to obtain, and improve the accuracy of sentiment analysis.
The above is only an overview of the technical solutions of the present invention. So that the technical means of the present invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the present invention are more apparent, specific embodiments of the present invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the present invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 shows a flowchart of a sentiment analysis method provided by an embodiment of the present invention;
Fig. 2 shows a structural block diagram of a sentiment analysis device provided by an embodiment of the present invention;
Fig. 3 shows a structural block diagram of another sentiment analysis device provided by an embodiment of the present invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings of this application are used to distinguish similar objects and not to describe a specific order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments of the application described here can be implemented in orders other than those illustrated or described here. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
According to an embodiment of the present application, a method embodiment of sentiment analysis is provided. It should be noted that the steps illustrated in the flowcharts of the drawings may be executed in a computer system, such as one running a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from the one given here.
To provide an implementation that improves the accuracy of sentiment analysis, embodiments of the present invention provide a sentiment analysis method and device. Preferred embodiments of the present invention are described below with reference to the drawings.
An embodiment of the present invention provides a sentiment analysis method. As shown in Fig. 1, the specific steps include:
101. Extract an emotion word sequence from the target text.
The emotion word sequence includes emotion words and non-semantic words extracted in order from the target text. A non-semantic word is a word that has practical significance in the target text, and may specifically be an adverb, a negation word, and/or a conjunction.
In this embodiment, the emotion word sequence is extracted from the target text as follows: the target text is first split into clauses; for each clause, the emotion words, adverbs, negation words, and conjunctions in it are extracted; and the extracted words are arranged in the order in which they appear in the original target text to form an emotion word sequence. For example, from the sentence "I was very glad this morning, but in the afternoon I lost my wallet, 555, what bad luck." the emotion word sequence obtained is: [glad, but, unlucky].
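The extraction step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lexicons `EMOTION_WORDS` and `NON_SEMANTIC_WORDS`, the punctuation-based clause splitting, and the whitespace tokenization are all hypothetical stand-ins (the patent does not specify the lexicons or the word segmenter), and the English sentence is a gloss of the example.

```python
import re

# Hypothetical toy lexicons; the patent does not list their contents.
EMOTION_WORDS = {"glad", "unlucky", "sad", "happy"}
NON_SEMANTIC_WORDS = {"but", "not", "and"}  # conjunctions and negation words

def extract_emotion_word_sequence(text):
    """Return emotion words and non-semantic words of `text` in original order."""
    sequence = []
    # Split the text into clauses on punctuation, then scan each clause left to right.
    for clause in re.split(r"[,.;!?]", text):
        for token in clause.lower().split():
            if token in EMOTION_WORDS or token in NON_SEMANTIC_WORDS:
                sequence.append(token)
    return sequence

sentence = "This morning I was glad, but in the afternoon I lost my wallet, 555, so unlucky."
print(extract_emotion_word_sequence(sentence))  # ['glad', 'but', 'unlucky']
```

A real implementation for Chinese text would need a word segmenter in place of `split()`; that choice is left open here.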
102. Generate an emotion word sequence vector corresponding to the emotion word sequence.
In this embodiment, the emotion word sequence vector corresponding to the emotion word sequence may be generated according to a bag-of-words model. That is, emotion word sequences are treated as the words of the bag-of-words model: the emotion word sequence of the target text is matched against each emotion word sequence in the bag-of-words model, the positions of the sequences that match are set to 1, and the positions of those that do not match are set to 0, which yields the emotion word sequence vector corresponding to the emotion word sequence of the target text. Note that two emotion word sequences match only if they are identical, that is, only if the content of the words, the number of words, and the order of the words are all the same.
For example, the emotion word sequence of the target text is [glad, but, unlucky], and the emotion word sequences in the bag-of-words model are [glad, but, unlucky], [happy, but, sad], [sad, and, joyful], and [unlucky, but, glad]. The target sequence [glad, but, unlucky] and the bag entry [glad, but, unlucky] contain exactly the same words, and "glad", "but", and "unlucky" appear in the same order in both, so the two sequences match. By contrast, although the bag entry [unlucky, but, glad] contains every word of the target sequence, the order of "glad", "but", and "unlucky" in it differs from the order in the target sequence, so the target sequence [glad, but, unlucky] does not match the bag entry [unlucky, but, glad]. The vector obtained for the emotion word sequence of the target text is therefore [1, 0, 0, 0].
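The matching rule in this example (same words, same count, same order) amounts to exact equality of sequences. A minimal Python sketch follows, with the function name `sequence_vector` and the English glosses of the words chosen here for illustration only:

```python
def sequence_vector(target_sequence, sequence_bag):
    """One position per bag entry: 1 if the entry equals the target sequence
    exactly (same words, same length, same order), otherwise 0."""
    target = tuple(target_sequence)
    return [1 if tuple(entry) == target else 0 for entry in sequence_bag]

# Bag of emotion word sequences from the example (English glosses).
sequence_bag = [
    ("glad", "but", "unlucky"),
    ("happy", "but", "sad"),
    ("sad", "and", "joyful"),
    ("unlucky", "but", "glad"),
]

vector = sequence_vector(["glad", "but", "unlucky"], sequence_bag)
print(vector)  # [1, 0, 0, 0]
```

Comparing tuples rather than sets is what makes the reversed entry fail to match.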
103. Obtain, according to the preset classification model, the emotion label corresponding to the emotion word sequence vector of the target text.
The preset classification model stores correspondences between emotion word sequence vectors and emotion labels. In this embodiment, the emotion word sequence vectors stored in the preset classification model are obtained from training texts, so each such vector represents the emotion word sequence of a training text, and the emotion label corresponding to the vector represents the sentiment polarity of that training text. Emotion labels are set by back-office staff according to the actual sentiment polarity of the training text and may specifically be: pain, contempt, hatred, envy, happiness, trust, gratitude, joy, and so on; the embodiment of the present invention does not specifically limit them.
In this embodiment, the emotion label corresponding to the emotion word sequence vector of the target text may be obtained from the preset classification model as follows: the first vector, the one with the highest similarity to the emotion word sequence vector of the target text, is looked up in the preset classification model, and the emotion label corresponding to the first vector is taken as the sentiment analysis result of the target text. For example, suppose the first vector found in the preset classification model was obtained from training text A, that is, from the emotion word sequence of training text A. If the emotion label of training text A is "happiness", then the emotion label corresponding to the first vector is "happiness", and the emotion label obtained for the emotion word sequence vector of the target text is "happiness".
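The lookup of the most similar stored vector can be sketched as a nearest-neighbour search. The patent does not name the similarity measure, so cosine similarity is an assumption made here; the stored pairs and the names `cosine_similarity` and `lookup_emotion_label` are likewise hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def lookup_emotion_label(target_vector, stored_pairs):
    """Return the label of the stored vector most similar to `target_vector`."""
    _, best_label = max(
        stored_pairs, key=lambda pair: cosine_similarity(target_vector, pair[0])
    )
    return best_label

# Hypothetical (vector, emotion label) correspondences held by the model.
stored_pairs = [
    ([1, 0, 0, 0], "happiness"),  # e.g. obtained from training text A
    ([0, 1, 0, 0], "pain"),
    ([0, 0, 1, 1], "gratitude"),
]

label = lookup_emotion_label([1, 0, 0, 0], stored_pairs)
print(label)  # happiness
```

Any other similarity or distance could be substituted without changing the overall flow.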
104. Take the obtained emotion label as the sentiment analysis result of the target text.
An embodiment of the present invention provides a sentiment analysis method that first extracts an emotion word sequence from the target text and converts it into an emotion word sequence vector, then obtains the emotion label corresponding to the emotion word sequence vector of the target text according to a preset classification model, and finally takes the obtained emotion label as the sentiment analysis result of the target text. Because the emotion word sequences of the present invention are easier to obtain than the emotion category label sequence resources of the prior art, and more information can be obtained by comparing them, the embodiment of the present invention solves the problem that emotion category labels in existing category label sequences are difficult to obtain, and improves the accuracy of sentiment analysis.
To better illustrate the sentiment analysis method provided by the embodiment of the present invention, the following embodiments refine and extend each of the above steps.
Further, the sentiment analysis method also includes: obtaining a word feature vector of the target text; and fusing the word feature vector and the emotion word sequence vector of the target text to obtain a feature vector of the target text. The word feature vector of the target text is obtained as follows: the target text is first segmented into words; meaningless words are then filtered out of the segmentation result; single-word features and/or adjacent-word features are chosen from the filtered result as the candidate word feature set; and finally an XNOR operation is performed between the words in the candidate word feature set and the words in the bag-of-words model to obtain the word feature vector of the target text.
For example, the words in the candidate word feature set are: "Ma Yun", "really", "rich"; the words in the bag-of-words model are: "Ma Yun", "make a fortune", " ". Performing the XNOR operation between the words in the candidate word feature set and the words in the bag-of-words model gives the word feature vector (1, 0, 0) for the target text: XNOR of "Ma Yun" in the candidate word feature set with "Ma Yun" in the bag-of-words model gives 1; XNOR of "really" in the candidate word feature set with "make a fortune" in the bag-of-words model gives 0; XNOR of "rich" in the candidate word feature set with " " in the bag-of-words model gives 0.
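Read literally, the example compares the candidate words and the bag words position by position and writes 1 where they are identical. The sketch below follows that literal reading. The function name `word_feature_vector` is hypothetical, and because the third bag word is not legible in this text, the string `"<unknown>"` is used purely as a stand-in.

```python
def word_feature_vector(candidate_words, bag_words):
    """Position-wise XNOR on word equality: 1 where the two words are identical,
    0 otherwise. Extra words in the longer list are ignored by zip()."""
    return tuple(1 if c == b else 0 for c, b in zip(candidate_words, bag_words))

candidate_words = ["Ma Yun", "really", "rich"]
# "<unknown>" is a placeholder: the third bag word is missing from the source text.
bag_words = ["Ma Yun", "make a fortune", "<unknown>"]

features = word_feature_vector(candidate_words, bag_words)
print(features)  # (1, 0, 0)
```

A conventional bag-of-words encoding would instead test, for each bag word, whether it occurs anywhere in the candidate set; which of the two the patent intends is not clear from the example alone.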
Fusing the word feature vector and the emotion word sequence vector of the target text to obtain the feature vector of the target text means concatenating the word feature vector of the target text with its emotion word sequence vector. For example, if the word feature vector of the target text is X1, ..., Xn and the emotion word sequence vector of the target text is B1, B2, ..., Bn, then the feature vector obtained by fusing the two is: X1, ..., Xn, B1, B2, ..., Bn.
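Since fusion here is plain concatenation, it reduces to joining two lists. A minimal sketch, with the function name `fuse` and the sample values chosen for illustration:

```python
def fuse(word_vector, sequence_vector):
    """Concatenate the word feature vector and the emotion word sequence vector."""
    return list(word_vector) + list(sequence_vector)

word_vector = (1, 0, 0)         # X1, ..., Xn
sequence_vector = [1, 0, 0, 0]  # B1, ..., Bn
feature_vector = fuse(word_vector, sequence_vector)
print(feature_vector)  # [1, 0, 0, 1, 0, 0, 0]
```

The two parts need not have the same length; the fused vector simply has the sum of their lengths.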
Further, obtaining, according to the preset classification model, the emotion label corresponding to the emotion word sequence vector of the target text includes: obtaining, according to the preset classification model, the emotion label corresponding to the feature vector of the target text, the preset classification model storing correspondences between feature vectors and emotion labels. The feature vectors in the preset classification model are obtained from training texts: the emotion word sequence vector and the word feature vector of a training text are obtained first, and the two are then fused to obtain the feature vector of the training text. Note that the emotion word sequence vector and the word feature vector of a training text are obtained in the same way as those of the target text, so the process is not repeated here.
In this embodiment, the word feature vector and the emotion word sequence vector of the target text are first fused to obtain the feature vector of the target text, the emotion label corresponding to that feature vector is then obtained according to the preset classification model, and the obtained emotion label is finally taken as the sentiment analysis result of the target text. Because the feature vector of the target text fuses the emotion word sequence vector and the word feature vector, it expresses both the emotion and the word-level features of the target text better, and the emotion label obtained from it can accurately express the sentiment orientation of the target text. The embodiment of the present invention thus further improves the accuracy of sentiment analysis.
Specifically, the preset classification model is set up using the following method: an emotion word sequence is extracted from a training text; the correspondence between the emotion word sequence of the training text and the emotion label of the training text is taken as an emotion word sequence feature of the training text; and the preset classification model is trained according to the emotion word sequence features of the training text. The process of extracting an emotion word sequence from a training text is the same as that of extracting one from the target text and is not repeated here.
Further, before the preset classification model is trained according to the emotion word sequence features of the training text, the method also includes: filtering the emotion word sequence features according to the CSR (class sequential rules) algorithm. The CSR algorithm can obtain a set of frequent itemsets that contains the maximal frequent itemsets; then, in descending order of itemset length, the frequent sub-itemsets of each frequent itemset are obtained in turn, which yields the candidate set of all frequent itemsets, and all association rules are obtained from this candidate set. The embodiment of the present invention can therefore use the CSR algorithm to filter out emotion word sequence features that do not occur frequently, and obtain all association rules from the emotion word sequence features that remain after filtering.
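The effect of the filtering step, dropping sequence features that occur too rarely, can be illustrated with a simple support and confidence threshold. This is a deliberately simplified stand-in and not the CSR mining algorithm itself: it counts whole sequences only and does not mine sub-patterns. The thresholds `min_support` and `min_confidence`, the function name, and the sample data are assumptions.

```python
from collections import Counter

def filter_sequence_features(features, min_support=2, min_confidence=0.6):
    """Keep (sequence, label) pairs that occur at least `min_support` times and
    whose label accounts for at least `min_confidence` of that sequence's uses."""
    pair_counts = Counter((tuple(seq), label) for seq, label in features)
    sequence_counts = Counter(tuple(seq) for seq, _ in features)
    kept = []
    for (seq, label), count in pair_counts.items():
        confidence = count / sequence_counts[seq]
        if count >= min_support and confidence >= min_confidence:
            kept.append((seq, label))
    return kept

# Hypothetical emotion word sequence features: (sequence, emotion label).
features = [
    (["glad", "but", "unlucky"], "pain"),
    (["glad", "but", "unlucky"], "pain"),
    (["glad", "but", "unlucky"], "happiness"),
    (["happy", "and", "glad"], "happiness"),
    (["happy", "and", "glad"], "happiness"),
    (["sad", "not"], "pain"),
]

kept = filter_sequence_features(features)
print(kept)
```

With these inputs, the two pairs that occur twice survive, and the pairs that occur only once are dropped.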
Specifically, training the preset classification model according to the emotion word sequence features of the training text includes: converting the emotion word sequence of the training text into an emotion word sequence vector according to a bag-of-words model; fusing the word feature vector and the emotion word sequence vector of the training text to obtain the feature vector of the training text; and training the preset classification model with the feature vector and the emotion label of each training text.
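The training steps above can be sketched end to end as follows. The sketch assumes that "training" means storing each fused feature vector with its emotion label, in line with the statement that the model stores such correspondences; a statistical classifier could be fitted on the same pairs instead. All names and sample data are hypothetical.

```python
def build_sequence_bag(training_sequences):
    """Collect the distinct emotion word sequences, keeping first-seen order."""
    bag = []
    for sequence in training_sequences:
        key = tuple(sequence)
        if key not in bag:
            bag.append(key)
    return bag

def train(training_samples):
    """training_samples: list of (word_vector, emotion_word_sequence, label).
    Returns the sequence bag and the stored (feature_vector, label) pairs."""
    bag = build_sequence_bag(seq for _, seq, _ in training_samples)
    model = []
    for word_vector, sequence, label in training_samples:
        # Bag-of-words over whole sequences: 1 only on an exact match.
        sequence_vector = [1 if tuple(sequence) == entry else 0 for entry in bag]
        # Fusion by concatenation of the two vectors.
        model.append((list(word_vector) + sequence_vector, label))
    return bag, model

samples = [
    ([1, 0], ["glad", "but", "unlucky"], "pain"),
    ([0, 1], ["happy", "and", "glad"], "happiness"),
]
bag, model = train(samples)
print(model)  # [([1, 0, 1, 0], 'pain'), ([0, 1, 0, 1], 'happiness')]
```

At analysis time, the same bag would be reused to build the target text's sequence vector so that positions line up with the stored vectors.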
Note that the bag-of-words model used to convert an emotion word sequence into an emotion word sequence vector is different from the bag-of-words model used to convert the target text into a word feature vector. The former stores emotion word sequences obtained from the training texts, that is, emotion word sequences serve as its words; the latter stores words obtained from the training texts.
Further, an embodiment of the present invention provides a sentiment analysis device. As shown in Fig. 2, the device includes: an extraction unit 21, a generation unit 22, an acquisition unit 23, and a determination unit 24.
The extraction unit 21 is configured to extract an emotion word sequence from the target text, the emotion word sequence including emotion words and non-semantic words extracted in order.
A non-semantic word is a word that has practical significance in the target text, and may specifically be an adverb, a negation word, and/or a conjunction. In this embodiment, the emotion word sequence is extracted from the target text as follows: the target text is first split into clauses; for each clause, the emotion words, adverbs, negation words, and conjunctions in it are extracted; and the extracted words are arranged in the order in which they appear in the original target text to form an emotion word sequence. For example, from the sentence "I was very glad this morning, but in the afternoon I lost my wallet, 555, what bad luck." the emotion word sequence obtained is: [glad, but, unlucky].
The generation unit 22 is configured to generate an emotion word sequence vector corresponding to the emotion word sequence.
In this embodiment, the emotion word sequence vector corresponding to the emotion word sequence may be generated according to a bag-of-words model. That is, emotion word sequences are treated as the words of the bag-of-words model: the emotion word sequence of the target text is matched against each emotion word sequence in the bag-of-words model, the positions of the sequences that match are set to 1, and the positions of those that do not match are set to 0, which yields the emotion word sequence vector corresponding to the emotion word sequence of the target text. Note that two emotion word sequences match only if they are identical, that is, only if the content of the words, the number of words, and the order of the words are all the same.
For example, the emotion word sequence of the target text is [glad, but, unlucky], and the emotion word sequences in the bag-of-words model are [glad, but, unlucky], [happy, but, sad], [sad, and, joyful], and [unlucky, but, glad]. The target sequence [glad, but, unlucky] and the bag entry [glad, but, unlucky] contain exactly the same words, and "glad", "but", and "unlucky" appear in the same order in both, so the two sequences match. By contrast, although the bag entry [unlucky, but, glad] contains every word of the target sequence, the order of "glad", "but", and "unlucky" in it differs from the order in the target sequence, so the target sequence [glad, but, unlucky] does not match the bag entry [unlucky, but, glad]. The vector obtained for the emotion word sequence of the target text is therefore [1, 0, 0, 0].
The acquisition unit 23 is configured to obtain, according to a preset classification model, the emotion label corresponding to the emotion word sequence vector of the target text, the preset classification model storing correspondences between emotion word sequence vectors and emotion labels.
In this embodiment, the emotion word sequence vectors stored in the preset classification model are obtained from training texts, so each such vector represents the emotion word sequence of a training text, and the emotion label corresponding to the vector represents the sentiment polarity of that training text. Emotion labels are set by back-office staff according to the actual sentiment polarity of the training text and may specifically be: pain, contempt, hatred, envy, happiness, trust, gratitude, joy, and so on; the embodiment of the present invention does not specifically limit them.
In the embodiments of the present invention, the emotion label corresponding to the emotion word sequence
vector of the target text may be obtained from the preset classification model as follows: first, the
first vector, namely the stored vector with the highest similarity to the emotion word sequence vector of
the target text, is looked up in the preset classification model; then the emotion label corresponding to
the first vector is taken as the sentiment analysis result of the target text. For example, suppose the
first vector retrieved from the preset classification model was obtained from training text A, that is,
from the emotion word sequence of training text A. If training text A is labeled "happiness", then the
emotion label of the first vector is "happiness", and so the emotion label obtained for the emotion word
sequence vector of the target text is "happiness".
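The lookup just described can be sketched as a nearest-neighbour search. This is a minimal sketch under stated assumptions: the text does not name a similarity measure, so cosine similarity is used here purely for illustration, and the model is held as a plain list of (vector, label) pairs. The names `cosine_similarity` and `lookup_label`, the sample vectors, and every label other than "happiness" are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def lookup_label(target: Sequence[float],
                 model: List[Tuple[Sequence[float], str]]) -> str:
    """Return the label stored with the vector most similar to `target`."""
    _, best_label = max(model, key=lambda pair: cosine_similarity(target, pair[0]))
    return best_label


# Hypothetical stored correspondences; the first stands in for training text A.
model = [
    ([1, 0, 0, 0], "happiness"),
    ([0, 1, 0, 0], "pain"),
    ([0, 0, 1, 1], "trust"),
]

print(lookup_label([1, 0, 0, 0], model))  # happiness
```

When several stored vectors tie for the highest similarity, `max` returns the first one; the text does not say how ties should be resolved.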
Determining unit 24, configured to take the obtained emotion label as the sentiment analysis result of the target text.
The embodiment of the present invention provides a sentiment analysis device that first converts the emotion
word sequence extracted from the target text into an emotion word sequence vector, then obtains the emotion
label corresponding to the emotion word sequence vector of the target text according to the preset
classification model, and finally takes the obtained emotion label as the sentiment analysis result of the
target text. Compared with the emotion category label sequences of the prior art, the emotion word
sequences of the present invention are easier to obtain and carry more information. The embodiment of the
present invention therefore solves the problem that the emotion category labels in existing category label
sequences are difficult to obtain, and improves the accuracy of sentiment analysis.
Further, as shown in FIG. 3, the device also includes:
The acquiring unit 23 is further configured to obtain the word feature vector of the target text;
Fusion unit 25, configured to fuse the word feature vector and the emotion word sequence vector of the
target text to obtain the feature vector of the target text.
The word feature vector of the target text is obtained as follows: first, the target text is segmented
into words; then meaningless words are filtered out of the segmentation result, and single-word features
and/or adjacent-word features are selected from the filtered result as the candidate word feature set;
finally, an XNOR operation is performed between the words in the candidate word feature set and the words
in the bag-of-words model to obtain the word feature vector of the target text.
For example, the words in the candidate word feature set are: "Ma Yun", "really", "rich"; the words in the
bag-of-words model are: "Ma Yun", "make a fortune", and a third word. Performing the XNOR operation between
the words in the candidate word feature set and the words in the bag-of-words model gives the word feature
vector (1,0,0) for the target text: the XNOR of "Ma Yun" in the candidate word feature set with "Ma Yun" in
the bag-of-words model is 1; the XNOR of "really" in the candidate word feature set with "make a fortune"
in the bag-of-words model is 0; and the XNOR of "rich" in the candidate word feature set with the third
word in the bag-of-words model is 0.
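The position-by-position XNOR comparison in this example can be sketched as follows. This is a minimal sketch that assumes, as the example does, that the candidate word at position i is compared with the bag-of-words entry at position i. The function name `word_feature_vector`, the English glosses of the words, and the placeholder `"<third word>"` for the third bag-of-words entry are assumptions, since that entry is not legible in the text.

```python
from typing import List, Sequence


def word_feature_vector(candidates: Sequence[str], bag_words: Sequence[str]) -> List[int]:
    """XNOR of word identity, position by position: 1 if the two words are equal, else 0."""
    return [1 if c == b else 0 for c, b in zip(candidates, bag_words)]


candidates = ["Ma Yun", "really", "rich"]
# "<third word>" is only a placeholder for the unknown third entry.
bag_words = ["Ma Yun", "make a fortune", "<third word>"]

print(word_feature_vector(candidates, bag_words))  # [1, 0, 0]
```

A more conventional bag-of-words encoding would test each vocabulary word for membership in the text instead of comparing by position; the sketch follows the comparison as the example states it.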
Fusing the word feature vector and the emotion word sequence vector of the target text to obtain the
feature vector of the target text means concatenating the word feature vector of the target text with the
emotion word sequence vector of the target text. For example, if the word feature vector of the target
text is X1, ..., Xn and the emotion word sequence vector of the target text is B1, B2, ..., Bn, then the
feature vector of the target text obtained by fusing the two is: X1, ..., Xn, B1, B2, ..., Bn.
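Fusion by concatenation amounts to appending one vector to the other. A minimal sketch, assuming both vectors are plain Python lists; the function name `fuse` and the sample values are illustrative only:

```python
from typing import List, Sequence


def fuse(word_features: Sequence[int], sequence_vector: Sequence[int]) -> List[int]:
    """Concatenate the word feature vector (X) and the emotion word sequence vector (B)."""
    return list(word_features) + list(sequence_vector)


x = [1, 0, 0]      # word feature vector X1..Xn (sample values)
b = [1, 0, 0, 0]   # emotion word sequence vector B1..Bn (sample values)

print(fuse(x, b))  # [1, 0, 0, 1, 0, 0, 0]
```

The two vectors need not have the same length; the fused vector's length is simply the sum of the two.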
The acquiring unit 23 is specifically configured to obtain, according to the preset classification model,
the emotion label corresponding to the feature vector of the target text; the preset classification model
stores the correspondence between feature vectors and emotion labels. The feature vectors in the preset
classification model are obtained from training texts: the emotion word sequence vector and the word
feature vector of a training text are obtained first, and the two are then fused to obtain the feature
vector of the training text. It should be noted that the emotion word sequence vector and the word feature
vector of a training text are obtained in the same way as those of the target text, so the process is not
repeated here.
In the embodiments of the present invention, the word feature vector and the emotion word sequence vector
of the target text are first fused to obtain the feature vector of the target text; the emotion label
corresponding to that feature vector is then obtained according to the preset classification model; and
finally the obtained emotion label is taken as the sentiment analysis result of the target text. Because
the feature vector of the target text fuses the emotion word sequence vector with the word feature vector,
it expresses both the emotion and the word-level features of the target text better, and the emotion label
obtained from it expresses the sentiment orientation of the target text accurately. The embodiments of the
present invention thereby further improve the accuracy of sentiment analysis.
Further, as shown in FIG. 3, the device also includes:
The extraction unit 21 is further configured to extract an emotion word sequence from a training text;
The determining unit 24 is further configured to take the correspondence between the emotion word sequence
of the training text and the emotion label of the training text as the emotion word sequence feature of
the training text;
Training unit 26, configured to train the preset classification model according to the emotion word
sequence features of the training texts.
Further, as shown in FIG. 3, the device also includes:
Filtering unit 27, configured to filter the emotion word sequence features according to the class sequential rules (CSR) algorithm.
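The text does not spell out how the CSR filter works. The sketch below is only one plausible reading, based on the usual definitions for class sequential rules: a rule "pattern -> label" covers a training instance when the pattern is a subsequence of the instance's emotion word sequence, its support is the fraction of all instances that it covers with the same label, and its confidence is that count divided by the number of covered instances. The function names, the thresholds `min_support` and `min_confidence`, and the sample labels are all assumptions.

```python
from typing import List, Sequence, Tuple

Feature = Tuple[Sequence[str], str]  # (emotion word sequence, emotion label)


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if the words of `pattern` occur in `sequence` in the same order (gaps allowed)."""
    it = iter(sequence)
    return all(word in it for word in pattern)


def filter_features(features: List[Feature],
                    min_support: float = 0.1,
                    min_confidence: float = 0.6) -> List[Tuple[List[str], str]]:
    """Keep each distinct (sequence, label) rule whose support and confidence reach the thresholds."""
    total = len(features)
    kept = []
    # dict.fromkeys removes duplicate rules while preserving first-seen order.
    for pattern, label in dict.fromkeys((tuple(s), l) for s, l in features):
        covered = [lab for seq, lab in features if is_subsequence(pattern, seq)]
        satisfied = sum(1 for lab in covered if lab == label)
        support = satisfied / total
        confidence = satisfied / len(covered)  # never empty: a rule covers its own instance
        if support >= min_support and confidence >= min_confidence:
            kept.append((list(pattern), label))
    return kept


# Hypothetical training features with hypothetical polarity labels.
features = [
    (["glad", "but", "unlucky"], "negative"),
    (["glad", "but", "unlucky"], "negative"),
    (["glad", "but", "unlucky"], "positive"),
    (["sad", "and", "happy"], "positive"),
    (["happy", "but", "sad"], "negative"),
]

for rule in filter_features(features):
    print(rule)
```

With these sample values the rule pairing [glad, but, unlucky] with "positive" is dropped, because only one of the three instances it covers carries that label.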
Specifically, as shown in FIG. 3, the training unit 26 includes:
Conversion module 261, configured to convert the emotion word sequence of the training text into an
emotion word sequence vector according to the bag-of-words model;
Fusion module 262, configured to fuse the word feature vector and the emotion word sequence vector of the
training text to obtain the feature vector of the target text;
Training module 263, configured to train the preset classification model with the feature vector and the
emotion label of each training text.
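The three modules can be read as a small pipeline: convert each training sequence to a vector, fuse it with the word feature vector, and record the result with its label. The sketch below assumes, in line with the statement that the model stores the correspondence between feature vectors and emotion labels, that "training" means storing (feature vector, label) pairs. The function name `train`, the input layout, and all sample data are hypothetical.

```python
from typing import List, Sequence, Tuple

# (emotion word sequence, word feature vector, emotion label) for one training text
TrainingText = Tuple[Sequence[str], Sequence[int], str]


def train(training_texts: List[TrainingText],
          sequence_bag: List[Sequence[str]]) -> List[Tuple[List[int], str]]:
    """Build the stored correspondence between fused feature vectors and emotion labels."""
    model = []
    for sequence, word_features, label in training_texts:
        # Conversion: 1 where the sequence matches a bag entry in words and order.
        sequence_vector = [1 if list(sequence) == list(entry) else 0
                           for entry in sequence_bag]
        # Fusion: concatenate word features with the sequence vector.
        feature_vector = list(word_features) + sequence_vector
        # Training, in this simplified reading: store the pair.
        model.append((feature_vector, label))
    return model


sequence_bag = [["glad", "but", "unlucky"], ["sad", "and", "happy"]]
training_texts = [
    (["glad", "but", "unlucky"], [1, 0, 0], "pain"),
    (["sad", "and", "happy"], [0, 1, 0], "happiness"),
]

model = train(training_texts, sequence_bag)
print(model)  # [([1, 0, 0, 1, 0], 'pain'), ([0, 1, 0, 0, 1], 'happiness')]
```

A statistical classifier could be fitted on the same fused vectors instead of storing them; the text leaves that choice open.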
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described
in detail in one embodiment, refer to the related descriptions of the other embodiments.
It can be understood that the related features of the above method and device may refer to each other. In
addition, "first", "second", and the like in the above embodiments are used to distinguish the embodiments
and do not indicate that one embodiment is superior to another.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the
specific working processes of the system, device, and units described above may refer to the corresponding
processes in the foregoing method embodiments and are not repeated here.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual
system, or other device. Various general-purpose systems may also be used with the teachings herein. From
the description above, the structure required to construct such a system is apparent. Moreover, the
present invention is not directed to any particular programming language. It should be understood that the
content of the invention described herein may be implemented in a variety of programming languages, and
that the description of a specific language above is given to disclose the best mode of the invention.
Numerous specific details are set forth in the specification provided herein. It will be understood,
however, that embodiments of the invention may be practiced without these specific details. In some
instances, well-known methods, structures, and techniques have not been shown in detail so as not to
obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of
one or more of the inventive aspects, features of the invention are sometimes grouped together into a
single embodiment, figure, or description thereof in the above description of exemplary embodiments of the
invention. However, this method of disclosure should not be interpreted as reflecting an intention that
the claimed invention requires more features than are expressly recited in each claim. Rather, as the
following claims reflect, inventive aspects lie in fewer than all features of a single embodiment
disclosed above. The claims following the detailed description are therefore expressly incorporated into
the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the device of an embodiment may be adaptively
changed and arranged in one or more devices different from that embodiment. The modules or units or
components of the embodiments may be combined into one module or unit or component, and may furthermore be
divided into multiple sub-modules or sub-units or sub-components. Except where at least some of such
features and/or processes or units are mutually exclusive, all features disclosed in this specification
(including the accompanying claims, abstract, and drawings) and all processes or units of any method or
device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature
disclosed in this specification (including the accompanying claims, abstract, and drawings) may be
replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
Furthermore, those skilled in the art will understand that, although some embodiments described herein
include some features included in other embodiments but not others, combinations of features of different
embodiments are meant to be within the scope of the invention and to form different embodiments. For
example, in the following claims, any of the claimed embodiments may be used in any combination.
The component embodiments of the present invention may be implemented in hardware, in software modules
running on one or more processors, or in a combination thereof. Those skilled in the art will understand
that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all
functions of some or all components of the sentiment analysis method and device according to embodiments
of the present invention (for example, a device for determining the level of links within a website). The
present invention may also be implemented as a device or apparatus program (for example, a computer
program and a computer program product) for performing part or all of the method described herein. Such a
program implementing the present invention may be stored on a computer-readable medium, or may take the
form of one or more signals. Such signals may be downloaded from an Internet website, provided on a
carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those
skilled in the art can design alternative embodiments without departing from the scope of the appended
claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting
the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim.
The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements.
The invention may be implemented by means of hardware comprising several distinct elements and by means of
a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may
be embodied by one and the same item of hardware. The use of the words first, second, and third does not
indicate any order; these words may be interpreted as names.
Claims (10)
1. A sentiment analysis method, characterized by comprising:
extracting an emotion word sequence from a target text, the emotion word sequence comprising emotion words
and non-semantic words extracted in order;
generating an emotion word sequence vector corresponding to the emotion word sequence;
obtaining, according to a preset classification model, an emotion label corresponding to the emotion word
sequence vector of the target text, the preset classification model storing the correspondence between
emotion word sequence vectors and emotion labels;
taking the obtained emotion label as the sentiment analysis result of the target text.
2. The method according to claim 1, characterized in that the method further comprises:
obtaining a word feature vector of the target text;
fusing the word feature vector and the emotion word sequence vector of the target text to obtain a feature
vector of the target text.
3. The method according to claim 2, characterized in that obtaining, according to the preset
classification model, the emotion label corresponding to the emotion word sequence vector of the target
text comprises:
obtaining, according to the preset classification model, an emotion label corresponding to the feature
vector of the target text, the preset classification model storing the correspondence between feature
vectors and emotion labels.
4. The method according to claim 1 or 3, characterized in that the preset classification model is set up as follows:
extracting an emotion word sequence from a training text;
taking the correspondence between the emotion word sequence of the training text and the emotion label of
the training text as the emotion word sequence feature of the training text;
training the preset classification model according to the emotion word sequence feature of the training text.
5. The method according to claim 4, characterized in that, before training the preset classification
model according to the emotion word sequence feature of the training text, the method further comprises:
filtering the emotion word sequence feature according to the class sequential rules (CSR) algorithm.
6. The method according to claim 4, characterized in that training the preset classification model
according to the emotion word sequence feature of the training text comprises:
converting the emotion word sequence of the training text into an emotion word sequence vector according
to a bag-of-words model;
fusing the word feature vector and the emotion word sequence vector of the training text to obtain the
feature vector of the target text;
training the preset classification model with the feature vector and the emotion label of each training text.
7. A sentiment analysis device, characterized by comprising:
an extraction unit, configured to extract an emotion word sequence from a target text, the emotion word
sequence comprising emotion words and non-semantic words extracted in order;
a generating unit, configured to generate an emotion word sequence vector corresponding to the emotion
word sequence;
an acquiring unit, configured to obtain, according to a preset classification model, an emotion label
corresponding to the emotion word sequence vector of the target text, the preset classification model
storing the correspondence between emotion word sequence vectors and emotion labels;
a determining unit, configured to take the obtained emotion label as the sentiment analysis result of the
target text.
8. The device according to claim 7, characterized in that the device further comprises:
the acquiring unit, further configured to obtain a word feature vector of the target text;
a fusion unit, configured to fuse the word feature vector and the emotion word sequence vector of the
target text to obtain a feature vector of the target text.
9. The device according to claim 8, characterized in that the acquiring unit is specifically configured
to obtain, according to the preset classification model, an emotion label corresponding to the feature
vector of the target text, the preset classification model storing the correspondence between feature
vectors and emotion labels.
10. The device according to claim 7 or 9, characterized in that the device further comprises:
the extraction unit, further configured to extract an emotion word sequence from a training text;
the determining unit, further configured to take the correspondence between the emotion word sequence of
the training text and the emotion label of the training text as the emotion word sequence feature of the
training text;
a training unit, configured to train the preset classification model according to the emotion word
sequence feature of the training text.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610966330.5A CN106557463A (en) | 2016-10-31 | 2016-10-31 | Sentiment analysis method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106557463A true CN106557463A (en) | 2017-04-05 |
Family
ID=58443772
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610966330.5A Pending CN106557463A (en) | 2016-10-31 | 2016-10-31 | Sentiment analysis method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106557463A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102200969A (en) * | 2010-03-25 | 2011-09-28 | 日电(中国)有限公司 | Text sentiment polarity classification system and method based on sentence sequence |
CN104063427A (en) * | 2014-06-06 | 2014-09-24 | 北京搜狗科技发展有限公司 | Expression input method and device based on semantic understanding |
CN104794208A (en) * | 2015-04-24 | 2015-07-22 | 清华大学 | Sentiment classification method and system based on contextual information of microblog text |
CN105930503A (en) * | 2016-05-09 | 2016-09-07 | 清华大学 | Combination feature vector and deep learning based sentiment classification method and device |
Non-Patent Citations (2)
Title |
---|
王磊 等: "基于主题的文本句情感分析", 《计算机科学》 * |
王飞跃 等: "《社会计算的基本方法与应用》", 31 May 2013 * |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107967258A (en) * | 2017-11-23 | 2018-04-27 | 广州艾媒数聚信息咨询股份有限公司 | The sentiment analysis method and system of text message |
CN110020420A (en) * | 2018-01-10 | 2019-07-16 | 腾讯科技(深圳)有限公司 | Text handling method, device, computer equipment and storage medium |
CN108647205A (en) * | 2018-05-02 | 2018-10-12 | 深圳前海微众银行股份有限公司 | Fine granularity sentiment analysis model building method, equipment and readable storage medium storing program for executing |
CN108647205B (en) * | 2018-05-02 | 2022-02-15 | 深圳前海微众银行股份有限公司 | Fine-grained emotion analysis model construction method and device and readable storage medium |
CN108763510A (en) * | 2018-05-30 | 2018-11-06 | 北京五八信息技术有限公司 | Intension recognizing method, device, equipment and storage medium |
CN109783800B (en) * | 2018-12-13 | 2024-04-12 | 北京百度网讯科技有限公司 | Emotion keyword acquisition method, device, equipment and storage medium |
CN109783800A (en) * | 2018-12-13 | 2019-05-21 | 北京百度网讯科技有限公司 | Acquisition methods, device, equipment and the storage medium of emotion keyword |
CN110097936A (en) * | 2019-05-08 | 2019-08-06 | 北京百度网讯科技有限公司 | Method and apparatus for exporting case history |
CN110097936B (en) * | 2019-05-08 | 2022-08-05 | 北京百度网讯科技有限公司 | Method and device for outputting medical records |
CN111126046A (en) * | 2019-12-06 | 2020-05-08 | 腾讯云计算(北京)有限责任公司 | Statement feature processing method and device and storage medium |
CN111126046B (en) * | 2019-12-06 | 2023-07-14 | 腾讯云计算(北京)有限责任公司 | Sentence characteristic processing method and device and storage medium |
CN111159412A (en) * | 2019-12-31 | 2020-05-15 | 腾讯科技(深圳)有限公司 | Classification method and device, electronic equipment and readable storage medium |
CN111159412B (en) * | 2019-12-31 | 2023-05-12 | 腾讯科技(深圳)有限公司 | Classification method, classification device, electronic equipment and readable storage medium |
CN111368555A (en) * | 2020-05-27 | 2020-07-03 | 腾讯科技(深圳)有限公司 | Data identification method and device, storage medium and electronic equipment |
CN111368555B (en) * | 2020-05-27 | 2020-08-28 | 腾讯科技(深圳)有限公司 | Data identification method and device, storage medium and electronic equipment |
CN116089602A (en) * | 2021-11-04 | 2023-05-09 | 腾讯科技(深圳)有限公司 | Information processing method, apparatus, electronic device, storage medium, and program product |
CN116089602B (en) * | 2021-11-04 | 2024-05-03 | 腾讯科技(深圳)有限公司 | Information processing method, apparatus, electronic device, storage medium, and program product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170405 |