CN106557463A - Sentiment analysis method and device - Google Patents


Info

Publication number
CN106557463A
Authority
CN
China
Prior art keywords
emotion
word sequence
vector
emotion word
target text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610966330.5A
Other languages
Chinese (zh)
Inventor
王明强
齐勇
张明亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neusoft Corp
Original Assignee
Neusoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neusoft Corp
Priority to CN201610966330.5A
Publication of CN106557463A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a sentiment analysis method and device, relates to the technical field of natural language processing, and improves the accuracy of sentiment analysis. The technical solution is as follows: an emotion word sequence is extracted from a target text, the emotion word sequence comprising emotion words and non-semantic words extracted in order; an emotion word sequence vector corresponding to the emotion word sequence is generated; an emotion label corresponding to the emotion word sequence vector of the target text is obtained according to a preset classification model, the preset classification model storing the correspondence between emotion word sequence vectors and emotion labels; and the obtained emotion label is taken as the sentiment analysis result of the target text. The invention is mainly used for analyzing the sentiment of a target text.

Description

Sentiment analysis method and device
Technical field
The present invention relates to the technical field of natural language processing, and in particular to a sentiment analysis method and device.
Background
In recent years, sentiment analysis has become a hot topic in natural language processing research. The goal of sentiment analysis is to mine the opinions and sentiment polarity that users express in text; the sentiment orientation mined from text can help other users make decisions. Sentiment analysis has therefore attracted the attention of many researchers in natural language processing and has great application value.
At present, the sentiment orientation of a target text is determined by label sequence rules: label sequence rules are constructed from the emotion category label of each sentence in a training text and the emotion label of the training text, and the sentiment of the target text is then analyzed according to these rules.
However, the emotion category labels in a category label sequence are assigned manually. Given the current scarcity of labeled data, such labels are not easy to obtain, and when the emotion category labels are limited or very few, too few sequence rules can be mined, which reduces the accuracy of sentiment analysis.
Summary of the invention
In view of this, the present invention provides a sentiment analysis method and device, the main purpose of which is to improve the accuracy of sentiment analysis.
According to one aspect of the invention, a sentiment analysis method is provided, comprising:
extracting an emotion word sequence from a target text, the emotion word sequence comprising emotion words and non-semantic words extracted in order;
generating an emotion word sequence vector corresponding to the emotion word sequence;
obtaining, according to a preset classification model, an emotion label corresponding to the emotion word sequence vector of the target text, the preset classification model storing the correspondence between emotion word sequence vectors and emotion labels;
taking the obtained emotion label as the sentiment analysis result of the target text.
Further, the method also comprises:
obtaining a word feature vector of the target text;
fusing the word feature vector and the emotion word sequence vector of the target text to obtain a feature vector of the target text.
Specifically, obtaining the emotion label corresponding to the emotion word sequence vector of the target text according to the preset classification model comprises:
obtaining, according to the preset classification model, an emotion label corresponding to the feature vector of the target text, the preset classification model storing the correspondence between feature vectors and emotion labels.
Further, the preset classification model is set up as follows:
extracting emotion word sequences from training texts;
taking the correspondence between the emotion word sequence of a training text and the emotion label of that training text as an emotion word sequence feature of the training text;
training the preset classification model according to the emotion word sequence features of the training texts.
Further, before the preset classification model is trained according to the emotion word sequence features of the training texts, the method also comprises:
filtering the emotion word sequence features according to the class sequential rules (CSR) algorithm.
Specifically, training the preset classification model according to the emotion word sequence features of the training texts comprises:
converting the emotion word sequence of the training text into an emotion word sequence vector according to a bag-of-words model;
fusing the word feature vector and the emotion word sequence vector of the training text to obtain the feature vector of the training text;
training the preset classification model with the feature vector and the emotion label of each training text.
According to another aspect of the invention, a sentiment analysis device is provided, comprising:
an extraction unit, configured to extract an emotion word sequence from a target text, the emotion word sequence comprising emotion words and non-semantic words extracted in order;
a generation unit, configured to generate an emotion word sequence vector corresponding to the emotion word sequence;
an acquisition unit, configured to obtain, according to a preset classification model, an emotion label corresponding to the emotion word sequence vector of the target text, the preset classification model storing the correspondence between emotion word sequence vectors and emotion labels;
a determination unit, configured to take the obtained emotion label as the sentiment analysis result of the target text.
Further, in the device:
the acquisition unit is further configured to obtain a word feature vector of the target text;
a fusion unit is configured to fuse the word feature vector and the emotion word sequence vector of the target text to obtain the feature vector of the target text;
the acquisition unit is specifically configured to obtain, according to the preset classification model, an emotion label corresponding to the feature vector of the target text, the preset classification model storing the correspondence between feature vectors and emotion labels.
Further, in the device:
the extraction unit is further configured to extract emotion word sequences from training texts;
the determination unit is further configured to take the correspondence between the emotion word sequence of a training text and the emotion label of that training text as an emotion word sequence feature of the training text;
a training unit is configured to train the preset classification model according to the emotion word sequence features of the training texts.
Further, the device also comprises:
a filtering unit, configured to filter the emotion word sequence features according to the class sequential rules (CSR) algorithm.
Specifically, the training unit comprises:
a conversion module, configured to convert the emotion word sequence of the training text into an emotion word sequence vector according to a bag-of-words model;
a fusion module, configured to fuse the word feature vector and the emotion word sequence vector of the training text to obtain the feature vector of the training text;
a training module, configured to train the preset classification model with the feature vector and the emotion label of each training text.
With the above technical solution, the technical solution provided by the embodiments of the present invention has at least the following advantages:
In the sentiment analysis method and device provided by the embodiments of the present invention, an emotion word sequence is first extracted from the target text, an emotion word sequence vector corresponding to the emotion word sequence is then generated, an emotion label corresponding to the emotion word sequence vector of the target text is obtained according to the preset classification model, and finally the obtained emotion label is taken as the sentiment analysis result of the target text. Compared with the current approach of obtaining the sentiment orientation of the target text from label sequences, the emotion word sequences of the present invention are easier to obtain than the emotion category label sequences of the prior art, and more information can be obtained from them. The embodiments of the present invention therefore solve the problem that emotion category labels in existing category label sequences are difficult to obtain, and improve the accuracy of sentiment analysis.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features and advantages of the invention become more apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered a limitation of the invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 shows a flowchart of a sentiment analysis method provided by an embodiment of the present invention;
Fig. 2 shows a structural block diagram of a sentiment analysis device provided by an embodiment of the present invention;
Fig. 3 shows a structural block diagram of another sentiment analysis device provided by an embodiment of the present invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
It should be noted that the terms "first", "second" and so on in the description, the claims and the above drawings of this application are used to distinguish similar objects and not to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described here. Furthermore, the terms "comprising" and "having" and any variants of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
According to an embodiment of this application, an embodiment of a sentiment analysis method is provided. It should be noted that the steps shown in the flowchart of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in an order different from the one given here.
In order to provide an implementation that improves the accuracy of sentiment analysis, the embodiments of the present invention provide a sentiment analysis method and device. Preferred embodiments of the invention are described below with reference to the drawings.
An embodiment of the present invention provides a sentiment analysis method. As shown in Fig. 1, the specific steps include:
101. Extract an emotion word sequence from the target text.
The emotion word sequence comprises emotion words and non-semantic words extracted in order from the target text. A non-semantic word is a word that carries practical meaning in the target text, and may specifically be an adverb, a negation word, and/or a conjunction.
In the embodiments of the present invention, the emotion word sequence is extracted from the target text as follows: the target text is first split into clauses; for each clause, the emotion words, adverbs, negation words and conjunctions in it are extracted; the extracted words are then arranged in the order in which they appear in the original target text to form an emotion word sequence. For example, from the sentence "I was very glad this morning, but in the afternoon I lost my wallet, 555, so unlucky." the emotion word sequence obtained is: [glad, but, unlucky].
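The extraction in step 101 can be sketched roughly as follows. This is only an illustrative sketch: it assumes the text has already been segmented into tokens, and the lexicons `EMOTION_WORDS` and `FUNCTION_WORDS`, the delimiter set and the function names are hypothetical, since the source does not specify them.

```python
# Hypothetical lexicons; the source does not say which word lists are used.
EMOTION_WORDS = {"glad", "happy", "sad", "unlucky"}
FUNCTION_WORDS = {"but", "and", "not"}  # stand-ins for adverbs, negations, conjunctions
CLAUSE_DELIMITERS = {",", ".", "!", "?", ";"}


def split_clauses(tokens):
    """Split a token list into clauses at punctuation tokens."""
    clauses, current = [], []
    for tok in tokens:
        if tok in CLAUSE_DELIMITERS:
            if current:
                clauses.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        clauses.append(current)
    return clauses


def extract_emotion_word_sequence(tokens, emotion_words, function_words):
    """Keep lexicon words clause by clause, preserving the original order."""
    keep = emotion_words | function_words
    sequence = []
    for clause in split_clauses(tokens):
        sequence.extend(tok for tok in clause if tok in keep)
    return sequence


tokens = ["this", "morning", "I", "was", "very", "glad", ",", "but", "in",
          "the", "afternoon", "I", "lost", "my", "wallet", ",", "555", ",",
          "so", "unlucky", "."]
sequence = extract_emotion_word_sequence(tokens, EMOTION_WORDS, FUNCTION_WORDS)
print(sequence)
```

Because clauses are visited in order and each clause is scanned left to right, the resulting sequence keeps the word order of the original text, which matters for the exact matching used in step 102.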
102. Generate an emotion word sequence vector corresponding to the emotion word sequence.
In the embodiments of the present invention, the emotion word sequence vector corresponding to the emotion word sequence can be generated according to a bag-of-words model. That is, emotion word sequences are treated as the words of the bag-of-words model: the emotion word sequence of the target text is matched against each emotion word sequence in the bag-of-words model, positions whose emotion word sequence matches are set to 1 and positions that do not match are set to 0, which yields the emotion word sequence vector corresponding to the emotion word sequence of the target text. It should be noted that two emotion word sequences match only if they are identical, that is, only if their words, the number of words and the order of the words are all the same.
For example, the emotion word sequence of the target text is [glad, but, unlucky], and the emotion word sequences in the bag-of-words model are [glad, but, unlucky], [happy, but, sad], [sad, and, happiness] and [unlucky, but, glad]. The emotion word sequence [glad, but, unlucky] of the target text and the sequence [glad, but, unlucky] in the bag-of-words model contain the same words in the same order, so they match. The sequence [unlucky, but, glad] contains all the words of the target text's emotion word sequence, but "glad", "but" and "unlucky" appear in a different order, so it does not match the target text's sequence [glad, but, unlucky]. The emotion word sequence vector obtained for the target text is therefore [1, 0, 0, 0].
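The exact-match encoding of step 102 can be illustrated with a short sketch. The function name `sequence_vector` and the list-of-lists representation of the bag-of-words model are assumptions made for illustration only.

```python
def sequence_vector(sequence, sequence_vocabulary):
    """Return a 0/1 vector with a 1 at each vocabulary entry that is
    identical to `sequence` (same words, same count, same order)."""
    target = tuple(sequence)
    return [1 if tuple(entry) == target else 0 for entry in sequence_vocabulary]


# The four sequences from the example above serve as the bag-of-words entries.
sequence_vocabulary = [
    ["glad", "but", "unlucky"],
    ["happy", "but", "sad"],
    ["sad", "and", "happiness"],
    ["unlucky", "but", "glad"],
]

vector = sequence_vector(["glad", "but", "unlucky"], sequence_vocabulary)
print(vector)
```

Comparing tuples enforces all three conditions at once: a sequence with the same words in a different order, such as [unlucky, but, glad], only matches its own entry.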
103. Obtain, according to the preset classification model, the emotion label corresponding to the emotion word sequence vector of the target text.
The preset classification model stores the correspondence between emotion word sequence vectors and emotion labels. In the embodiments of the present invention, the emotion word sequence vectors stored in the preset classification model are obtained from training texts; that is, an emotion word sequence vector in the preset classification model represents the emotion word sequence of a training text, and the emotion label corresponding to that vector represents the sentiment polarity of the training text. Emotion labels are set by back-office staff according to the actual sentiment polarity of the training text and may specifically be: pain, disdain, hate, envy, happiness, trust, gratitude, joy, and so on; the embodiments of the present invention impose no specific limitation.
In the embodiments of the present invention, the emotion label corresponding to the emotion word sequence vector of the target text may be obtained according to the preset classification model as follows: first, the first vector with the highest similarity to the emotion word sequence vector of the target text is looked up in the preset classification model; the emotion label corresponding to the first vector is then taken as the sentiment analysis result of the target text. For example, suppose the first vector with the highest similarity to the emotion word sequence vector of the target text was obtained from training text A, that is, from the emotion word sequence of training text A. If the emotion label of training text A is "happiness", then the emotion label corresponding to the first vector is "happiness", and the emotion label obtained for the emotion word sequence vector of the target text is "happiness".
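The lookup in step 103 can be sketched as a nearest-vector search. The source does not say which similarity measure is used; cosine similarity is an assumption here, and the stored pairs and function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def lookup_emotion_label(target_vector, stored_pairs):
    """Return the label of the stored vector most similar to `target_vector`.

    `stored_pairs` is a list of (vector, label) pairs standing in for the
    correspondence kept by the preset classification model.
    """
    _, best_label = max(
        stored_pairs,
        key=lambda pair: cosine_similarity(target_vector, pair[0]),
    )
    return best_label


# Hypothetical stored correspondence; the first entry plays the role of training text A.
stored_pairs = [
    ([1, 0, 0, 0], "happiness"),
    ([0, 1, 0, 0], "pain"),
    ([0, 0, 1, 1], "hate"),
]

label = lookup_emotion_label([1, 0, 0, 0], stored_pairs)
print(label)
```

If several stored vectors are equally similar, `max` returns the first of them; a real system would need an explicit tie-breaking rule.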
104. Take the obtained emotion label as the sentiment analysis result of the target text.
The embodiment of the present invention provides a sentiment analysis method: an emotion word sequence is first extracted from the target text and converted into an emotion word sequence vector, an emotion label corresponding to the emotion word sequence vector of the target text is then obtained according to the preset classification model, and finally the obtained emotion label is taken as the sentiment analysis result of the target text. Because the emotion word sequences of the present invention are easier to obtain than the emotion category label sequences of the prior art, and more information can be obtained from them, the embodiment solves the problem that emotion category labels in existing category label sequences are difficult to obtain, and improves the accuracy of sentiment analysis.
To better explain the sentiment analysis method provided by the embodiments of the present invention, the following embodiments refine and extend the steps above.
Further, the sentiment analysis method also comprises: obtaining a word feature vector of the target text; and fusing the word feature vector and the emotion word sequence vector of the target text to obtain the feature vector of the target text. The word feature vector of the target text is obtained as follows: the target text is first segmented into words; meaningless words are then filtered out of the segmentation result; single-word features and/or adjacent-word features are chosen from the filtered result as the candidate word feature set; finally, an XNOR operation is performed between the words in the candidate word feature set and the words in the bag-of-words model to obtain the word feature vector of the target text.
For example, the words in the candidate word feature set are "Ma Yun", "really" and "rich", and the bag-of-words model contains "Ma Yun", "making a fortune" and one further word. Performing the XNOR operation between the words in the candidate word feature set and the words in the bag-of-words model gives the word feature vector (1, 0, 0) for the target text: "Ma Yun" in the candidate word feature set and "Ma Yun" in the bag-of-words model give 1; "really" in the candidate word feature set and "making a fortune" in the bag-of-words model give 0; "rich" in the candidate word feature set and the third word in the bag-of-words model give 0.
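One possible reading of the word feature vector construction is sketched below. It interprets the XNOR comparison as a membership test (a position is 1 when the bag-of-words entry also occurs among the candidate features), which is an assumption rather than something the source states. The stop-word list, the function names, and the third vocabulary entry `"other"` are hypothetical stand-ins.

```python
def candidate_features(tokens, stopwords):
    """Drop meaningless words, then return single-word and adjacent-word features."""
    kept = [tok for tok in tokens if tok not in stopwords]
    unigrams = list(kept)
    bigrams = [f"{a} {b}" for a, b in zip(kept, kept[1:])]
    return unigrams + bigrams


def word_feature_vector(candidates, word_vocabulary):
    """Return a 0/1 vector with a 1 for each vocabulary word present in `candidates`."""
    present = set(candidates)
    return [1 if word in present else 0 for word in word_vocabulary]


tokens = ["Ma Yun", "is", "really", "rich"]  # assumed to be already segmented
stopwords = {"is"}                           # hypothetical list of meaningless words
# "other" is a hypothetical stand-in for the third bag-of-words entry.
word_vocabulary = ["Ma Yun", "making a fortune", "other"]

candidates = candidate_features(tokens, stopwords)
word_vector = word_feature_vector(candidates, word_vocabulary)
print(word_vector)
```

For the example above this reading yields the same vector as the position-by-position comparison described in the text.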
Fusing the word feature vector and the emotion word sequence vector of the target text to obtain the feature vector of the target text means concatenating the word feature vector of the target text with its emotion word sequence vector. For example, if the word feature vector of the target text is X1, ..., Xn and its emotion word sequence vector is B1, B2, ..., Bn, then the feature vector of the target text obtained by fusion is X1, ..., Xn, B1, B2, ..., Bn.
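Since fusion here is plain concatenation, it reduces to a one-line operation. The function name `fuse` and the sample vectors are illustrative assumptions.

```python
def fuse(word_vector, sequence_vector):
    """Concatenate the word feature vector and the emotion word sequence vector."""
    return list(word_vector) + list(sequence_vector)


# Sample vectors: a 3-dimensional word feature vector and a 4-dimensional
# emotion word sequence vector; the two parts need not have the same length.
feature_vector = fuse([1, 0, 0], [1, 0, 0, 0])
print(feature_vector)
```

The word features always occupy the leading positions and the sequence features the trailing ones, so vectors built for training texts and for target texts stay aligned as long as both use the same two bag-of-words models.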
Further, obtaining the emotion label corresponding to the emotion word sequence vector of the target text according to the preset classification model comprises: obtaining, according to the preset classification model, the emotion label corresponding to the feature vector of the target text, the preset classification model storing the correspondence between feature vectors and emotion labels. The feature vectors in the preset classification model are obtained from training texts: the emotion word sequence vector and the word feature vector of a training text are obtained first, and the two are then fused to give the feature vector of the training text. It should be noted that the emotion word sequence vector and the word feature vector of a training text are obtained in the same way as those of the target text, so the process is not described again here.
In the embodiments of the present invention, the word feature vector and the emotion word sequence vector of the target text are first fused to obtain the feature vector of the target text; the emotion label corresponding to that feature vector is then obtained according to the preset classification model; finally, the obtained emotion label is taken as the sentiment analysis result of the target text. Because the feature vector of the target text fuses the emotion word sequence vector with the word feature vector, it expresses both the emotion and the word-level features of the target text better, and the emotion label obtained from it can accurately express the sentiment orientation of the target text. The embodiments of the present invention thus further improve the accuracy of sentiment analysis.
Specifically, the preset classification model is set up as follows: emotion word sequences are extracted from training texts; the correspondence between the emotion word sequence of a training text and the emotion label of that training text is taken as an emotion word sequence feature of the training text; and the preset classification model is trained according to the emotion word sequence features of the training texts. The process of extracting an emotion word sequence from a training text is the same as that of extracting one from the target text and is not described again here.
Further, before the preset classification model is trained according to the emotion word sequence features of the training texts, the method also comprises: filtering the emotion word sequence features according to the CSR (class sequential rules) algorithm. The CSR algorithm can obtain a set of frequent itemsets that contains the maximal frequent itemsets; proceeding through the frequent itemsets in descending order of length, the frequent sub-itemsets of each frequent itemset are obtained in turn, which yields all candidate frequent itemsets, and all association rules are then obtained from the candidate frequent itemsets. The embodiments of the present invention can therefore use the CSR algorithm to filter out emotion word sequence features that do not occur frequently, and obtain all association rules from the filtered emotion word sequence features.
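The filtering idea can be illustrated with a deliberately simplified sketch. It is not the full CSR mining procedure: it treats each whole emotion word sequence as a single pattern, does no subsequence mining, and keeps a (sequence, label) feature only if it meets a minimum support and confidence. The thresholds, the function name and the sample data are hypothetical.

```python
from collections import Counter


def filter_sequence_features(features, min_support=2, min_confidence=0.6):
    """Keep (sequence, label) features that occur often enough.

    support    = number of training texts with this sequence and this label
    confidence = support / number of training texts with this sequence
    """
    pair_counts = Counter((tuple(seq), label) for seq, label in features)
    seq_counts = Counter(tuple(seq) for seq, _ in features)
    kept = []
    for (seq, label), count in pair_counts.items():
        confidence = count / seq_counts[seq]
        if count >= min_support and confidence >= min_confidence:
            kept.append((list(seq), label))
    return kept


# Hypothetical emotion word sequence features from four training texts.
features = [
    (["glad", "but", "unlucky"], "sad"),
    (["glad", "but", "unlucky"], "sad"),
    (["glad", "but", "unlucky"], "happiness"),
    (["happy", "but", "sad"], "sad"),
]

kept = filter_sequence_features(features)
print(kept)
```

Raising `min_support` removes rare sequences, while `min_confidence` removes sequences whose labels are too mixed to be informative.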
Specifically, training the preset classification model according to the emotion word sequence features of the training texts comprises: converting the emotion word sequence of the training text into an emotion word sequence vector according to a bag-of-words model; fusing the word feature vector and the emotion word sequence vector of the training text to obtain the feature vector of the training text; and training the preset classification model with the feature vector and the emotion label of each training text.
It should be noted that the bag-of-words model used to convert an emotion word sequence into an emotion word sequence vector is different from the bag-of-words model used to convert the target text into a word feature vector. The former stores the emotion word sequences obtained from the training texts, that is, emotion word sequences serve as its words; the latter stores the words obtained from the training texts.
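The training procedure and the two separate bag-of-words models can be sketched together as follows. This is a minimal illustration, not the patented implementation: "training" here only builds the two vocabularies and stores (feature vector, label) pairs, and the dictionary layout, the function names and the sample training texts are all assumptions.

```python
def build_vocabulary(items):
    """Collect distinct items in first-seen order (lists are stored as tuples)."""
    vocabulary, seen = [], set()
    for item in items:
        key = tuple(item) if isinstance(item, list) else item
        if key not in seen:
            seen.add(key)
            vocabulary.append(key)
    return vocabulary


def to_feature_vector(words, sequence, word_vocab, seq_vocab):
    """Concatenate the word feature vector and the emotion word sequence vector."""
    present = set(words)
    word_part = [1 if w in present else 0 for w in word_vocab]
    seq_part = [1 if tuple(sequence) == s else 0 for s in seq_vocab]
    return word_part + seq_part


def train(training_texts):
    """Build both bag-of-words models and store feature vector / label pairs."""
    word_vocab = build_vocabulary(w for t in training_texts for w in t["words"])
    seq_vocab = build_vocabulary(t["sequence"] for t in training_texts)
    pairs = [
        (to_feature_vector(t["words"], t["sequence"], word_vocab, seq_vocab),
         t["label"])
        for t in training_texts
    ]
    return {"word_vocab": word_vocab, "seq_vocab": seq_vocab, "pairs": pairs}


# Hypothetical training texts, already reduced to words and emotion word sequences.
training_texts = [
    {"words": ["wallet", "lost"], "sequence": ["glad", "but", "unlucky"], "label": "sad"},
    {"words": ["movie", "great"], "sequence": ["happy"], "label": "happiness"},
]

model = train(training_texts)
print(model["pairs"])
```

Keeping `word_vocab` and `seq_vocab` as separate lists mirrors the distinction drawn above: one bag-of-words model holds words, the other holds whole emotion word sequences.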
Further, an embodiment of the present invention provides a sentiment analysis device. As shown in Fig. 2, the device comprises: an extraction unit 21, a generation unit 22, an acquisition unit 23 and a determination unit 24.
The extraction unit 21 is configured to extract an emotion word sequence from the target text, the emotion word sequence comprising emotion words and non-semantic words extracted in order.
A non-semantic word is a word that carries practical meaning in the target text, and may specifically be an adverb, a negation word, and/or a conjunction. In the embodiments of the present invention, the emotion word sequence is extracted from the target text as follows: the target text is first split into clauses; for each clause, the emotion words, adverbs, negation words and conjunctions in it are extracted; the extracted words are then arranged in the order in which they appear in the original target text to form an emotion word sequence. For example, from the sentence "I was very glad this morning, but in the afternoon I lost my wallet, 555, so unlucky." the emotion word sequence obtained is: [glad, but, unlucky].
The generation unit 22 is configured to generate an emotion word sequence vector corresponding to the emotion word sequence.
In the embodiments of the present invention, the emotion word sequence vector corresponding to the emotion word sequence can be generated according to a bag-of-words model. That is, emotion word sequences are treated as the words of the bag-of-words model: the emotion word sequence of the target text is matched against each emotion word sequence in the bag-of-words model, positions whose emotion word sequence matches are set to 1 and positions that do not match are set to 0, which yields the emotion word sequence vector corresponding to the emotion word sequence of the target text. It should be noted that two emotion word sequences match only if they are identical, that is, only if their words, the number of words and the order of the words are all the same.
For example, the emotion word sequence of the target text is [glad, but, unlucky], and the emotion word sequences in the bag-of-words model are [glad, but, unlucky], [happy, but, sad], [sad, and, happiness] and [unlucky, but, glad]. The emotion word sequence [glad, but, unlucky] of the target text and the sequence [glad, but, unlucky] in the bag-of-words model contain the same words in the same order, so they match. The sequence [unlucky, but, glad] contains all the words of the target text's emotion word sequence, but "glad", "but" and "unlucky" appear in a different order, so it does not match the target text's sequence [glad, but, unlucky]. The emotion word sequence vector obtained for the target text is therefore [1, 0, 0, 0].
Acquiring unit 23, configured to obtain, according to a preset classification model, the emotion mark corresponding to the emotion word sequence vector of the target text; the preset classification model stores the correspondence between emotion word sequence vectors and emotion marks;
In the embodiment of the present invention, the emotion word sequence vectors stored in the preset classification model are obtained from training texts. That is, each emotion word sequence vector in the preset classification model represents the emotion word sequence of a training text, and the emotion mark corresponding to that vector represents the sentiment polarity of the training text. The emotion mark is set by back-office personnel according to the actual sentiment polarity of the training text, and may specifically be: pain, disdain, hatred, envy, happiness, trust, gratitude, joy, and so on; the embodiment of the present invention does not specifically limit this.
In the embodiment of the present invention, the specific process of obtaining the emotion mark corresponding to the emotion word sequence vector of the target text according to the preset classification model may be as follows: first search the preset classification model for the first vector, that is, the vector with the highest similarity to the emotion word sequence vector of the target text, and then take the emotion mark corresponding to the first vector as the sentiment analysis result of the target text. For example, suppose the first vector found in the preset classification model was obtained from training text A, that is, from the emotion word sequence of training text A. If the emotion mark of training text A is "happiness", then the emotion mark corresponding to the first vector is "happiness", and so the emotion mark obtained for the emotion word sequence vector of the target text is "happiness".
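The lookup described above can be sketched as a nearest-vector search. The text does not say which similarity measure is used, so cosine similarity is an assumption here, and the model contents and the names `cosine_similarity` and `lookup_emotion_mark` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def lookup_emotion_mark(target: Sequence[float],
                        model: List[Tuple[Sequence[float], str]]) -> str:
    """Return the emotion mark stored with the most similar vector (the "first vector")."""
    if not model:
        raise ValueError("the model holds no (vector, emotion mark) pairs")
    _, mark = max(model, key=lambda entry: cosine_similarity(target, entry[0]))
    return mark


# Hypothetical stored correspondence between vectors and emotion marks.
model = [
    ([1, 0, 0, 0], "happiness"),
    ([0, 1, 0, 0], "pain"),
    ([0, 0, 0, 1], "disdain"),
]

print(lookup_emotion_mark([1, 0, 0, 0], model))  # happiness
```

With 0/1 vectors that have a single set position, any reasonable similarity measure gives the same answer; the choice matters more once word features are fused in.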
Determining unit 24, configured to take the obtained emotion mark as the sentiment analysis result of the target text.
The embodiment of the present invention provides a sentiment analysis device that first converts the emotion word sequence extracted from the target text into an emotion word sequence vector, then obtains the emotion mark corresponding to the emotion word sequence vector of the target text according to the preset classification model, and finally takes the obtained emotion mark as the sentiment analysis result of the target text. Compared with the emotion category label sequences of the prior art, the emotion word sequences of the present invention are easier to obtain and carry more information. The embodiment of the present invention therefore solves the problem that the emotion category labels in existing category label sequences are difficult to obtain, and improves the accuracy of sentiment analysis.
Further, as shown in Fig. 3, the device also includes:
The acquiring unit 23 is further configured to obtain the word feature vector of the target text;
Fusion unit 25, configured to fuse the word feature vector and the emotion word sequence vector of the target text to obtain the feature vector of the target text.
The specific process of obtaining the word feature vector of the target text is as follows: first segment the target text into words; then filter meaningless words out of the segmentation result and select single-word features and/or adjacent-word features from the filtered result as the candidate word feature set; finally, perform an XNOR calculation between the words in the candidate word feature set and the words in the bag-of-words model to obtain the word feature vector of the target text.
For example, the words in the candidate word feature set are: Ma Yun, very, rich; the words in the bag-of-words model are: Ma Yun, makes a good deal of money, and a third word. Performing the XNOR calculation between the words in the candidate word feature set and the words in the bag-of-words model gives the word feature vector (1, 0, 0) for the target text: the XNOR of "Ma Yun" in the candidate word feature set with "Ma Yun" in the bag-of-words model gives 1; the XNOR of "very" in the candidate word feature set with "makes a good deal of money" in the bag-of-words model gives 0; and the XNOR of "rich" in the candidate word feature set with the third word in the bag-of-words model gives 0.
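The XNOR step in this example amounts to an equality test between paired words. The sketch below pairs the words by position, as the example does; whether the method compares by position or by membership is not stated, so this is an assumption. The function name is hypothetical, and the third bag word is a placeholder because it is not legible in this text.

```python
from typing import List, Sequence


def word_feature_vector(candidates: Sequence[str], bag_words: Sequence[str]) -> List[int]:
    """XNOR of paired words: 1 where the candidate word equals the bag word, else 0."""
    return [1 if cand == word else 0 for cand, word in zip(candidates, bag_words)]


candidates = ["Ma Yun", "very", "rich"]
# "<third word>" is a placeholder, not the actual word in the bag-of-words model.
bag_words = ["Ma Yun", "makes a good deal of money", "<third word>"]

print(word_feature_vector(candidates, bag_words))  # [1, 0, 0]
```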
The word feature vector and the emotion word sequence vector of the target text are fused to obtain the feature vector of the target text; that is, the feature vector is obtained by concatenating the word feature vector of the target text with its emotion word sequence vector. For example, if the word feature vector of the target text is X1, ..., Xn and the emotion word sequence vector of the target text is B1, B2, ..., Bn, then the feature vector obtained by fusing the two is X1, ..., Xn, B1, B2, ..., Bn.
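Because fusion here is plain concatenation, it can be sketched in one line. The function name `fuse` and the sample values are illustrative only, reusing the vectors from the earlier examples.

```python
from typing import List, Sequence


def fuse(word_vector: Sequence[int], sequence_vector: Sequence[int]) -> List[int]:
    """Concatenate the word feature vector and the emotion word sequence vector."""
    return list(word_vector) + list(sequence_vector)


feature_vector = fuse([1, 0, 0], [1, 0, 0, 0])
print(feature_vector)  # [1, 0, 0, 1, 0, 0, 0]
```

The word features come first and the sequence features second, matching the order X1, ..., Xn, B1, ..., Bn in the text; the same order has to be used for training texts and target texts.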
The acquiring unit 23 is specifically configured to obtain, according to the preset classification model, the emotion mark corresponding to the feature vector of the target text; the preset classification model stores the correspondence between feature vectors and emotion marks. The feature vectors in the preset classification model are obtained from training texts: the emotion word sequence vector and the word feature vector of a training text are obtained first, and the two are then fused to obtain the feature vector of the training text. It should be noted that the emotion word sequence vector and the word feature vector of a training text are obtained in the same way as those of the target text, which is not repeated here.
In the embodiment of the present invention, the word feature vector and the emotion word sequence vector of the target text are first fused to obtain the feature vector of the target text, the emotion mark corresponding to the feature vector of the target text is then obtained according to the preset classification model, and the obtained emotion mark is finally taken as the sentiment analysis result of the target text. Because the feature vector of the target text fuses the emotion word sequence vector and the word feature vector, it better expresses both the emotion and the word-meaning features of the target text, and the emotion mark obtained from it expresses the sentiment orientation of the target text more accurately. The embodiment of the present invention thereby further improves the accuracy of sentiment analysis.
Further, as shown in Fig. 3, the device also includes:
The extraction unit 21 is further configured to extract an emotion word sequence from a training text;
The determining unit 24 is further configured to take the correspondence between the emotion word sequence of the training text and the emotion mark of the training text as the emotion word sequence feature of the training text;
Training unit 26, configured to train the preset classification model according to the emotion word sequence features of the training text.
Further, as shown in Fig. 3, the device also includes:
Filter unit 27, configured to filter the emotion word sequence features according to the class sequential rules (CSR) algorithm.
Specifically, as shown in Fig. 3, the training unit 26 includes:
Conversion module 261, configured to convert the emotion word sequence of the training text into an emotion word sequence vector according to the bag-of-words model;
Fusion module 262, configured to fuse the word feature vector and the emotion word sequence vector of the training text to obtain the feature vector of the training text;
Training module 263, configured to train the preset classification model with the feature vector and the emotion mark of each training text.
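The three modules of the training unit can be read as a small pipeline: convert, fuse, then record the pair of feature vector and emotion mark. The sketch below stores those pairs directly, which matches the statement that the model stores the correspondence between feature vectors and emotion marks; a real classifier could be trained on the same pairs instead. The class `TrainingText`, the function `build_model`, and the sample data (including the emotion marks) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class TrainingText:
    emotion_sequence: Tuple[str, ...]  # emotion word sequence extracted from the text
    word_vector: List[int]             # word feature vector, assumed already computed
    emotion_mark: str                  # emotion mark assigned by a human annotator


def build_model(texts: Sequence[TrainingText],
                sequence_bag: Sequence[Tuple[str, ...]]) -> List[Tuple[List[int], str]]:
    """Build (feature vector, emotion mark) pairs from the training texts."""
    model = []
    for text in texts:
        # Conversion step: exact-match 0/1 vector over the bag of emotion word sequences.
        seq_vector = [1 if entry == text.emotion_sequence else 0 for entry in sequence_bag]
        # Fusion step: concatenate word features and sequence features.
        feature_vector = list(text.word_vector) + seq_vector
        # Training step (simplified): store the correspondence.
        model.append((feature_vector, text.emotion_mark))
    return model


sequence_bag = [("glad", "but", "unlucky"), ("unlucky", "but", "glad")]
texts = [
    TrainingText(("glad", "but", "unlucky"), [1, 0, 0], "pain"),
    TrainingText(("unlucky", "but", "glad"), [0, 1, 0], "happiness"),
]
model = build_model(texts, sequence_bag)
print(model)
```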
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
It can be understood that the related features of the above method and device may be referred to each other. In addition, "first", "second", and the like in the above embodiments are used to distinguish the embodiments and do not indicate that one embodiment is better than another.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the system, device, and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such systems is apparent. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in a variety of programming languages, and that the description above of a specific language is given to disclose the best mode of the invention.
Numerous specific details are set forth in the specification provided here. It is to be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all the features of a single embodiment disclosed above. Thus, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of the embodiments may be combined into one module, unit, or component, and may furthermore be divided into a plurality of sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
Furthermore, those skilled in the art will appreciate that although some embodiments described herein include some features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
The component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the sentiment analysis method and device according to embodiments of the invention (for example, a device for determining link levels within a website). The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any order. These words may be interpreted as names.

Claims (10)

1. A sentiment analysis method, characterized by comprising:
extracting an emotion word sequence from a target text, the emotion word sequence comprising emotion words and non-semantic words extracted in order;
generating an emotion word sequence vector corresponding to the emotion word sequence;
obtaining, according to a preset classification model, the emotion mark corresponding to the emotion word sequence vector of the target text, the preset classification model storing the correspondence between emotion word sequence vectors and emotion marks;
taking the obtained emotion mark as the sentiment analysis result of the target text.
2. The method according to claim 1, characterized in that the method further comprises:
obtaining the word feature vector of the target text;
fusing the word feature vector and the emotion word sequence vector of the target text to obtain the feature vector of the target text.
3. The method according to claim 2, characterized in that obtaining, according to the preset classification model, the emotion mark corresponding to the emotion word sequence vector of the target text comprises:
obtaining, according to the preset classification model, the emotion mark corresponding to the feature vector of the target text, the preset classification model storing the correspondence between feature vectors and emotion marks.
4. The method according to claim 1 or 3, characterized in that the preset classification model is set up by the following method:
extracting an emotion word sequence from a training text;
taking the correspondence between the emotion word sequence of the training text and the emotion mark of the training text as the emotion word sequence feature of the training text;
training the preset classification model according to the emotion word sequence features of the training text.
5. The method according to claim 4, characterized in that, before training the preset classification model according to the emotion word sequence features of the training text, the method further comprises:
filtering the emotion word sequence features according to the class sequential rules (CSR) algorithm.
6. The method according to claim 4, characterized in that training the preset classification model according to the emotion word sequence features of the training text comprises:
converting the emotion word sequence of the training text into an emotion word sequence vector according to a bag-of-words model;
fusing the word feature vector and the emotion word sequence vector of the training text to obtain the feature vector of the training text;
training the preset classification model with the feature vector and the emotion mark of each training text.
7. A sentiment analysis device, characterized by comprising:
an extraction unit, configured to extract an emotion word sequence from a target text, the emotion word sequence comprising emotion words and non-semantic words extracted in order;
a generating unit, configured to generate an emotion word sequence vector corresponding to the emotion word sequence;
an acquiring unit, configured to obtain, according to a preset classification model, the emotion mark corresponding to the emotion word sequence vector of the target text, the preset classification model storing the correspondence between emotion word sequence vectors and emotion marks;
a determining unit, configured to take the obtained emotion mark as the sentiment analysis result of the target text.
8. The device according to claim 7, characterized in that the device further comprises:
the acquiring unit, further configured to obtain the word feature vector of the target text;
a fusion unit, configured to fuse the word feature vector and the emotion word sequence vector of the target text to obtain the feature vector of the target text.
9. The device according to claim 8, characterized in that the acquiring unit is specifically configured to obtain, according to the preset classification model, the emotion mark corresponding to the feature vector of the target text, the preset classification model storing the correspondence between feature vectors and emotion marks.
10. The device according to claim 7 or 9, characterized in that the device further comprises:
the extraction unit, further configured to extract an emotion word sequence from a training text;
the determining unit, further configured to take the correspondence between the emotion word sequence of the training text and the emotion mark of the training text as the emotion word sequence feature of the training text;
a training unit, configured to train the preset classification model according to the emotion word sequence features of the training text.
CN201610966330.5A 2016-10-31 2016-10-31 Sentiment analysis method and device Pending CN106557463A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610966330.5A CN106557463A (en) 2016-10-31 2016-10-31 Sentiment analysis method and device

Publications (1)

Publication Number Publication Date
CN106557463A true CN106557463A (en) 2017-04-05

Family

ID=58443772

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610966330.5A Pending CN106557463A (en) 2016-10-31 2016-10-31 Sentiment analysis method and device

Country Status (1)

Country Link
CN (1) CN106557463A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102200969A (en) * 2010-03-25 2011-09-28 日电(中国)有限公司 Text sentiment polarity classification system and method based on sentence sequence
CN104063427A (en) * 2014-06-06 2014-09-24 北京搜狗科技发展有限公司 Expression input method and device based on semantic understanding
CN104794208A (en) * 2015-04-24 2015-07-22 清华大学 Sentiment classification method and system based on contextual information of microblog text
CN105930503A (en) * 2016-05-09 2016-09-07 清华大学 Combination feature vector and deep learning based sentiment classification method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
王磊 等: "基于主题的文本句情感分析", 《计算机科学》 *
王飞跃 等: "《社会计算的基本方法与应用》", 31 May 2013 *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107967258A (en) * 2017-11-23 2018-04-27 广州艾媒数聚信息咨询股份有限公司 The sentiment analysis method and system of text message
CN110020420A (en) * 2018-01-10 2019-07-16 腾讯科技(深圳)有限公司 Text handling method, device, computer equipment and storage medium
CN108647205A (en) * 2018-05-02 2018-10-12 深圳前海微众银行股份有限公司 Fine granularity sentiment analysis model building method, equipment and readable storage medium storing program for executing
CN108647205B (en) * 2018-05-02 2022-02-15 深圳前海微众银行股份有限公司 Fine-grained emotion analysis model construction method and device and readable storage medium
CN108763510A (en) * 2018-05-30 2018-11-06 北京五八信息技术有限公司 Intension recognizing method, device, equipment and storage medium
CN109783800B (en) * 2018-12-13 2024-04-12 北京百度网讯科技有限公司 Emotion keyword acquisition method, device, equipment and storage medium
CN109783800A (en) * 2018-12-13 2019-05-21 北京百度网讯科技有限公司 Acquisition methods, device, equipment and the storage medium of emotion keyword
CN110097936A (en) * 2019-05-08 2019-08-06 北京百度网讯科技有限公司 Method and apparatus for exporting case history
CN110097936B (en) * 2019-05-08 2022-08-05 北京百度网讯科技有限公司 Method and device for outputting medical records
CN111126046A (en) * 2019-12-06 2020-05-08 腾讯云计算(北京)有限责任公司 Statement feature processing method and device and storage medium
CN111126046B (en) * 2019-12-06 2023-07-14 腾讯云计算(北京)有限责任公司 Sentence characteristic processing method and device and storage medium
CN111159412A (en) * 2019-12-31 2020-05-15 腾讯科技(深圳)有限公司 Classification method and device, electronic equipment and readable storage medium
CN111159412B (en) * 2019-12-31 2023-05-12 腾讯科技(深圳)有限公司 Classification method, classification device, electronic equipment and readable storage medium
CN111368555A (en) * 2020-05-27 2020-07-03 腾讯科技(深圳)有限公司 Data identification method and device, storage medium and electronic equipment
CN111368555B (en) * 2020-05-27 2020-08-28 腾讯科技(深圳)有限公司 Data identification method and device, storage medium and electronic equipment
CN116089602A (en) * 2021-11-04 2023-05-09 腾讯科技(深圳)有限公司 Information processing method, apparatus, electronic device, storage medium, and program product
CN116089602B (en) * 2021-11-04 2024-05-03 腾讯科技(深圳)有限公司 Information processing method, apparatus, electronic device, storage medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170405

RJ01 Rejection of invention patent application after publication