Summary of the invention
The present invention is proposed in view of the above problems and provides a new technical solution that solves the technical problem of accurately analyzing the sentiment orientation of a document toward the current topic.
In view of this, one aspect of the present invention provides an information processing method, comprising: obtaining the document sentences in a current document and the words in the document sentences, and determining the word polarity value of each word according to a preset dictionary; calculating the sentence polarity value of each document sentence according to each word in the document sentence, the word polarity value of the word, and a sentence polarity computation model; and determining the sentiment orientation of the current document according to the sentence polarity value of each document sentence in the current document and a feature word set.
In this technical solution, the sentiment orientation of the current document is determined by the document sentences that constitute it, the sentiment orientation of a document sentence is determined by the words that constitute it, and the sentiment orientation of a word is determined by the characters that constitute it. Therefore, the sentence polarity value of each document sentence is accurately determined from statistics on the word polarity values of the words in the current document, and the sentiment orientation of the current document is then accurately determined from the sentence polarity values of its document sentences and the feature word set. For example, it can be accurately determined whether the current document expresses a commendatory, derogatory, or neutral viewpoint, which provides useful support for data analysis of the current document.
In the above technical solution, preferably, determining the sentiment orientation of the current document according to the sentence polarity value of each document sentence in the current document and the feature word set specifically includes: classifying the document sentences according to their sentence polarity values to obtain the commendatory sentences and the derogatory sentences among them; determining, according to the commendatory sentences, the derogatory sentences, and the feature word set, a first association coefficient between all the commendatory sentences and the feature word set and a second association coefficient between all the derogatory sentences and the feature word set; determining a topic polarity value of the current document according to the first association coefficient, the second association coefficient, each document sentence, and the sentence polarity value of each document sentence; and classifying the current document according to its topic polarity value and determining the sentiment orientation of the current document according to the classification result.
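The first sub-step, splitting the document sentences by their sentence polarity values, can be sketched as follows. This is a minimal Python illustration; the zero threshold and the separate neutral bucket are assumptions (this passage names only commendatory and derogatory sentences, and neutral sentences are mentioned later in the description).

```python
from typing import List, Tuple


def split_by_polarity(sentence_polarities: List[float]) -> Tuple[List[int], List[int], List[int]]:
    """Partition sentence indices by the sign of their sentence polarity value.

    Assumption: > 0 is commendatory, < 0 is derogatory, exactly 0 is neutral.
    """
    commendatory: List[int] = []
    derogatory: List[int] = []
    neutral: List[int] = []
    for idx, score in enumerate(sentence_polarities):
        if score > 0:
            commendatory.append(idx)
        elif score < 0:
            derogatory.append(idx)
        else:
            neutral.append(idx)
    return commendatory, derogatory, neutral


print(split_by_polarity([0.8, -0.3, 0.0, 0.5]))  # ([0, 3], [1], [2])
```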
In this technical solution, the document sentences are classified into commendatory sentences and derogatory sentences according to their sentence polarity values. The document sentences of the current document are not independent of one another but are associated to some degree, and different document sentences may use different feature words to describe the same current topic: for example, some feature words appear only in commendatory sentences, others only in derogatory sentences, and still others in both. As a result, the commendatory sentences and the derogatory sentences jointly affect the sentiment orientation of the current document. Therefore, a feature word set related to the current topic is obtained, and the association coefficients of all the commendatory sentences and of all the derogatory sentences in the current document with the feature word set are calculated, so that the topic polarity value of the current document is calculated accurately and the sentiment orientation of the current document toward the current topic is determined accurately.
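The text does not define how the association coefficients are computed. Purely to illustrate the idea that feature words may appear in commendatory sentences, derogatory sentences, or both, the following sketch uses a hypothetical measure: the share of the feature words found in the document that occur in each kind of sentence. The function name and the measure itself are illustrative assumptions, not the patented method.

```python
from typing import List, Set, Tuple

Sentence = List[str]  # a document sentence as a list of already-segmented words


def association_coefficients(
    commendatory: List[Sentence],
    derogatory: List[Sentence],
    feature_words: Set[str],
) -> Tuple[float, float]:
    """HYPOTHETICAL association measure (not defined in the source text).

    alpha_pos: share of the feature words found in the document that occur in commendatory sentences.
    beta_neg:  share of those feature words that occur in derogatory sentences.
    A feature word occurring in both kinds of sentence counts toward both coefficients.
    """
    in_pos = feature_words & {w for sent in commendatory for w in sent}
    in_neg = feature_words & {w for sent in derogatory for w in sent}
    seen = in_pos | in_neg
    if not seen:
        return 0.0, 0.0  # no feature word occurs in the document
    return len(in_pos) / len(seen), len(in_neg) / len(seen)


alpha, beta = association_coefficients(
    [["screen", "bright"], ["price", "fair"]],
    [["battery", "weak"]],
    {"screen", "price", "battery"},
)
print(round(alpha, 3), round(beta, 3))  # 0.667 0.333
```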
In the above technical solution, preferably, upon receiving a calculation command, the topic polarity value of the current document is calculated by an equation in which W(d) denotes the topic polarity value of the current document d; α_pos denotes the first association coefficient between all the commendatory sentences and the feature word set; β_neg denotes the second association coefficient between all the derogatory sentences and the feature word set; S_pos(p_x) denotes the sentence polarity value of the x-th commendatory sentence; S_neg(p_y) denotes the sentence polarity value of the y-th derogatory sentence; k denotes the number of commendatory sentences; l denotes the number of derogatory sentences; and k and l are positive integers.
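The equation itself is not reproduced in this text. From the symbol definitions, one plausible reconstruction is an association-weighted sum of the commendatory and derogatory sentence polarity values, assuming that derogatory sentences carry negative polarity values so that the sign of W(d) gives the viewpoint:

```latex
W(d) = \alpha_{pos} \sum_{x=1}^{k} S_{pos}(p_x) + \beta_{neg} \sum_{y=1}^{l} S_{neg}(p_y)
```

This form is an assumption consistent with the listed symbols, not the original formula.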
In this technical solution, the topic polarity value of the current document can be accurately calculated by the equation described above, and the sentiment orientation of the current document toward the current topic is then accurately determined from that value. For example, if the topic polarity value of the current document is greater than zero, the current document holds a commendatory viewpoint on the current topic; if it equals zero, the current document holds a neutral viewpoint; and if it is less than zero, the current document holds a derogatory viewpoint.
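A minimal Python sketch of the scoring and the three-way classification described above. It assumes the weighted-sum form W(d) = α_pos·ΣS_pos(p_x) + β_neg·ΣS_neg(p_y) with negative derogatory scores; the function names and the sample numbers are illustrative.

```python
from typing import List


def topic_polarity(alpha_pos: float, beta_neg: float,
                   pos_scores: List[float], neg_scores: List[float]) -> float:
    """ASSUMED form: W(d) = alpha_pos * sum(S_pos) + beta_neg * sum(S_neg).

    Derogatory sentence scores are assumed to be negative already.
    """
    return alpha_pos * sum(pos_scores) + beta_neg * sum(neg_scores)


def document_label(w: float) -> str:
    """Map the topic polarity value to the three viewpoints named in the text."""
    if w > 0:
        return "commendatory"
    if w < 0:
        return "derogatory"
    return "neutral"


w = topic_polarity(0.6, 0.4, [0.8, 0.5], [-0.3])
print(round(w, 2), document_label(w))  # 0.66 commendatory
```

Comparing a floating-point value with exactly zero is fragile; a practical implementation might treat a small band around zero as neutral.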
In the above technical solution, preferably, the preset dictionary includes a commendatory-term dictionary and a derogatory-term dictionary; and upon receiving a first determination command, the word polarity value of the word is determined by the following procedure: counting, for each character of the word, a first number of times the character occurs in the commendatory-term dictionary and a second number of times it occurs in the derogatory-term dictionary; determining a commendatory weight value and a derogatory weight value of the character according to the first number, the second number, the commendatory-term dictionary, and the derogatory-term dictionary; calculating a character polarity value of the character according to the commendatory weight value and the derogatory weight value; and determining the word polarity value of the word from statistics on the character polarity values of its characters.
In this technical solution, the number of times each character of the word occurs in the commendatory-term dictionary and in the derogatory-term dictionary is counted separately. If a character occurs more often in the commendatory-term dictionary, it has a commendatory tendency; if it occurs more often in the derogatory-term dictionary, it has a derogatory tendency; otherwise it is neutral. The character polarity value of each character can therefore be determined relatively accurately from the two dictionaries, and the word polarity value of each word can in turn be determined accurately from the character polarity values.
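The counting step can be sketched in Python. The two toy dictionaries below are hypothetical examples, and `raw_tendency` implements only the simple majority rule stated in the paragraph above, before any normalization.

```python
from collections import Counter
from typing import Iterable


def char_counts(dictionary: Iterable[str]) -> Counter:
    """Count how often each character occurs across all terms of a sentiment dictionary."""
    counts: Counter = Counter()
    for term in dictionary:
        counts.update(term)  # a string is iterated character by character
    return counts


def raw_tendency(char: str, pos_counts: Counter, neg_counts: Counter) -> str:
    first = pos_counts[char]   # "first number": occurrences in the commendatory-term dictionary
    second = neg_counts[char]  # "second number": occurrences in the derogatory-term dictionary
    if first > second:
        return "commendatory"
    if second > first:
        return "derogatory"
    return "neutral"


pos = char_counts(["美好", "美丽", "友好"])  # toy commendatory-term dictionary
neg = char_counts(["丑陋", "恶劣", "不好"])  # toy derogatory-term dictionary
print(raw_tendency("好", pos, neg), raw_tendency("丑", pos, neg))  # commendatory derogatory
```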
In the above technical solution, preferably, upon receiving a second determination command, the commendatory weight value and the derogatory weight value are determined by equations in which PO_ci denotes the commendatory weight value of the i-th character in the word; NO_ci denotes the derogatory weight value of the i-th character in the word; f(pc_i) denotes the first number, i.e., the number of occurrences of the i-th character in the commendatory-term dictionary; f(nc_i) denotes the second number, i.e., the number of occurrences of the i-th character in the derogatory-term dictionary; m0 denotes the number of distinct commendatory characters in the commendatory-term dictionary; n0 denotes the number of distinct derogatory characters in the derogatory-term dictionary; m0 and n0 are positive integers; f(pc_j) denotes the third number, i.e., the number of occurrences of the j-th commendatory character in the commendatory-term dictionary; and f(nc_j) denotes the fourth number, i.e., the number of occurrences of the j-th derogatory character in the derogatory-term dictionary.
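The weight equations themselves are not reproduced in this text. A reconstruction that uses every listed symbol, normalizing each count by the total character count of its dictionary, is shown below; treat it as an assumed form rather than the original equations:

```latex
PO_{c_i} = \frac{f(pc_i) \big/ \sum_{j=1}^{m_0} f(pc_j)}
                {f(pc_i) \big/ \sum_{j=1}^{m_0} f(pc_j) + f(nc_i) \big/ \sum_{j=1}^{n_0} f(nc_j)},
\qquad
NO_{c_i} = \frac{f(nc_i) \big/ \sum_{j=1}^{n_0} f(nc_j)}
                {f(pc_i) \big/ \sum_{j=1}^{m_0} f(pc_j) + f(nc_i) \big/ \sum_{j=1}^{n_0} f(nc_j)}
```

Under this reading PO_ci + NO_ci = 1, and a character polarity value could then be taken as PO_ci - NO_ci (also an assumption).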
In this technical solution, because the commendatory-term dictionary and the derogatory-term dictionary contain different numbers of words, character polarity values calculated from raw counts would not be accurate enough. The above equations therefore normalize the commendatory weight and the derogatory weight of each character, so that the character polarity value of the character, and in turn the word polarity value of the word, can be obtained accurately.
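The normalization can be sketched in Python. This assumes the weights take the relative-frequency form PO = p/(p+n) and NO = n/(p+n), where p and n are the character's counts divided by the total character count of each dictionary; that the character polarity value is PO - NO; and that the word polarity value is the mean of its character polarity values. All three are assumptions; the text only says the word value is obtained from statistics over its characters. The dictionaries are toy examples.

```python
from collections import Counter
from typing import Iterable, Tuple


def dictionary_chars(terms: Iterable[str]) -> Counter:
    """Character occurrence counts over a dictionary (f(pc_j) or f(nc_j) for every character j)."""
    return Counter("".join(terms))


def char_weights(ch: str, pos: Counter, neg: Counter) -> Tuple[float, float]:
    """Normalized commendatory / derogatory weights (PO, NO) of one character (assumed form)."""
    pos_total = sum(pos.values())  # sum over j = 1..m0 of f(pc_j)
    neg_total = sum(neg.values())  # sum over j = 1..n0 of f(nc_j)
    p = pos[ch] / pos_total if pos_total else 0.0
    n = neg[ch] / neg_total if neg_total else 0.0
    if p + n == 0:
        return 0.0, 0.0  # character absent from both dictionaries
    return p / (p + n), n / (p + n)


def word_polarity(word: str, pos: Counter, neg: Counter) -> float:
    """ASSUMPTION: character polarity = PO - NO; word polarity = mean over its characters."""
    if not word:
        return 0.0
    scores = [po - no for po, no in (char_weights(ch, pos, neg) for ch in word)]
    return sum(scores) / len(scores)


pos = dictionary_chars(["美好", "美丽", "友好"])
neg = dictionary_chars(["丑陋", "恶劣", "不好"])
print(round(word_polarity("美好", pos, neg), 3))  # 0.667
print(round(word_polarity("不好", pos, neg), 3))  # -0.333
```

Dividing by each dictionary's total character count is what keeps a larger dictionary from dominating, which is the imbalance the paragraph above describes.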
In the above technical solution, preferably, a word polarity influence value that the word exerts on subsequent words is set according to the word class to which the word belongs and the position of the word in the document sentence; and the sentence polarity computation model includes an equation in which S_p denotes the sentence polarity value of the document sentence p; S_wj denotes the word polarity value of the j-th word in the document sentence; g(w_j) denotes the word polarity influence value of the j-th word; a denotes the number of words in the document sentence; and a is a positive integer.
In this technical solution, a document sentence has a more complex form than a word. Words of different classes in a document sentence, especially words that modify sentiment-bearing words, such as degree adverbs and negation words, affect the sentiment orientation of the sentence. Examples of degree adverbs are "most" and "slightly", and examples of negation words are "not" and "non-"; "I dislike him" and "I do not dislike him", for example, express completely different meanings. The position of a word in the document sentence also affects its sentiment orientation: a degree adverb placed before a negation word has a different effect on the sentiment orientation of the sentence from a degree adverb placed after it. Therefore, when the sentence polarity value of a document sentence is calculated, degree adverbs, negation words, and their relative positions are considered together, so that the sentence polarity value is calculated accurately.
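The role of the influence value g(w_j) can be illustrated with a small Python sketch in which negation words and degree adverbs act as multipliers on the next sentiment-bearing word. The modifier lists, the factor values, and the rule that a degree adverb after a negation is halved are hypothetical choices for illustration; the text does not give them, and the original sentence equation is not reproduced here.

```python
from typing import Dict, List

# HYPOTHETICAL modifier lexicons and factors; the source text gives no concrete values.
DEGREE_ADVERBS: Dict[str, float] = {"最": 2.0, "很": 1.5, "稍微": 0.5}  # most, very, slightly
NEGATIONS = {"不", "非"}  # not, non-


def sentence_polarity(tokens: List[str], word_polarity: Dict[str, float]) -> float:
    """Sum word polarity values, each scaled by the modifiers that precede it.

    Hypothetical position rule: a degree adverb that follows a negation is halved,
    so "不很好" (not very good) is milder than "很不好" (very not good, i.e. bad).
    """
    total = 0.0
    multiplier = 1.0   # accumulated influence of pending modifiers on the next sentiment word
    negated = False    # whether a negation has been seen since the last sentiment word
    for tok in tokens:
        if tok in NEGATIONS:
            multiplier *= -1.0
            negated = True
        elif tok in DEGREE_ADVERBS:
            factor = DEGREE_ADVERBS[tok]
            multiplier *= (0.5 * factor) if negated else factor
        else:
            score = word_polarity.get(tok, 0.0)
            if score != 0.0:
                total += multiplier * score
                multiplier, negated = 1.0, False  # modifiers are consumed by this word
    return total


lexicon = {"好": 1.0, "讨厌": -1.0}  # toy word polarity values
print(sentence_polarity(["我", "讨厌", "他"], lexicon))        # -1.0 (I dislike him)
print(sentence_polarity(["我", "不", "讨厌", "他"], lexicon))  # 1.0 (I do not dislike him)
print(sentence_polarity(["很", "不", "好"], lexicon))          # -1.5
print(sentence_polarity(["不", "很", "好"], lexicon))          # -0.75
```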
Another aspect of the present invention provides an information processing system, comprising: a first determination unit, configured to obtain the document sentences in a current document and the words in the document sentences, and to determine the word polarity value of each word according to a preset dictionary; a computing unit, configured to calculate the sentence polarity value of each document sentence according to each word in the document sentence, the word polarity value of the word, and a sentence polarity computation model; and a processing unit, configured to determine the sentiment orientation of the current document according to the sentence polarity value of each document sentence in the current document and a feature word set.
In this technical solution, the sentiment orientation of the current document is determined by the document sentences that constitute it, the sentiment orientation of a document sentence is determined by the words that constitute it, and the sentiment orientation of a word is determined by the characters that constitute it. Therefore, the sentence polarity value of each document sentence is accurately determined from statistics on the word polarity values of the words in the current document, and the sentiment orientation of the current document is then accurately determined from the sentence polarity values of its document sentences and the feature word set. For example, it can be accurately determined whether the current document expresses a commendatory, derogatory, or neutral viewpoint, which provides useful support for data analysis of the current document.
In the above technical solution, preferably, the processing unit includes: a classification unit, configured to classify the document sentences according to their sentence polarity values to obtain the commendatory sentences and the derogatory sentences among them; and a second determination unit, configured to determine, according to the commendatory sentences, the derogatory sentences, and the feature word set, a first association coefficient between all the commendatory sentences and the feature word set and a second association coefficient between all the derogatory sentences and the feature word set; to determine the topic polarity value of the current document according to the first association coefficient, the second association coefficient, each document sentence, and the sentence polarity value of each document sentence; and to classify the current document according to its topic polarity value and determine the sentiment orientation of the current document according to the classification result.
In this technical solution, the document sentences are classified into commendatory sentences and derogatory sentences according to their sentence polarity values. The document sentences of the current document are not independent of one another but are associated to some degree, and different document sentences may use different feature words to describe the same current topic: for example, some feature words appear only in commendatory sentences, others only in derogatory sentences, and still others in both. As a result, the commendatory sentences and the derogatory sentences jointly affect the sentiment orientation of the current document. Therefore, a feature word set related to the current topic is obtained, and the association coefficients of all the commendatory sentences and of all the derogatory sentences in the current document with the feature word set are calculated, so that the topic polarity value of the current document is calculated accurately and the sentiment orientation of the current document toward the current topic is determined accurately.
In the above technical solution, preferably, the second determination unit is specifically configured to, upon receiving a calculation command, calculate the topic polarity value of the current document by an equation in which W(d) denotes the topic polarity value of the current document d; α_pos denotes the first association coefficient between all the commendatory sentences and the feature word set; β_neg denotes the second association coefficient between all the derogatory sentences and the feature word set; S_pos(p_x) denotes the sentence polarity value of the x-th commendatory sentence; S_neg(p_y) denotes the sentence polarity value of the y-th derogatory sentence; k denotes the number of commendatory sentences; l denotes the number of derogatory sentences; and k and l are positive integers.
In this technical solution, the topic polarity value of the current document can be accurately calculated by the equation described above, and the sentiment orientation of the current document toward the current topic is then accurately determined from that value. For example, if the topic polarity value of the current document is greater than zero, the current document holds a commendatory viewpoint on the current topic; if it equals zero, the current document holds a neutral viewpoint; and if it is less than zero, the current document holds a derogatory viewpoint.
In the above technical solution, preferably, the preset dictionary includes a commendatory-term dictionary and a derogatory-term dictionary; and the first determination unit is specifically configured to, upon receiving a first determination command, determine the word polarity value of the word by the following procedure: counting, for each character of the word, a first number of times the character occurs in the commendatory-term dictionary and a second number of times it occurs in the derogatory-term dictionary; determining a commendatory weight value and a derogatory weight value of the character according to the first number, the second number, the commendatory-term dictionary, and the derogatory-term dictionary; calculating a character polarity value of the character according to the commendatory weight value and the derogatory weight value; and determining the word polarity value of the word from statistics on the character polarity values of its characters.
In this technical solution, the number of times each character of the word occurs in the commendatory-term dictionary and in the derogatory-term dictionary is counted separately. If a character occurs more often in the commendatory-term dictionary, it has a commendatory tendency; if it occurs more often in the derogatory-term dictionary, it has a derogatory tendency; otherwise it is neutral. The character polarity value of each character can therefore be determined relatively accurately from the two dictionaries, and the word polarity value of each word can in turn be determined accurately from the character polarity values.
In the above technical solution, preferably, the first determination unit is specifically configured to, upon receiving a second determination command, determine the commendatory weight value and the derogatory weight value by equations in which PO_ci denotes the commendatory weight value of the i-th character in the word; NO_ci denotes the derogatory weight value of the i-th character in the word; f(pc_i) denotes the first number, i.e., the number of occurrences of the i-th character in the commendatory-term dictionary; f(nc_i) denotes the second number, i.e., the number of occurrences of the i-th character in the derogatory-term dictionary; m0 denotes the number of distinct commendatory characters in the commendatory-term dictionary; n0 denotes the number of distinct derogatory characters in the derogatory-term dictionary; m0 and n0 are positive integers; f(pc_j) denotes the third number, i.e., the number of occurrences of the j-th commendatory character in the commendatory-term dictionary; and f(nc_j) denotes the fourth number, i.e., the number of occurrences of the j-th derogatory character in the derogatory-term dictionary.
In this technical solution, because the commendatory-term dictionary and the derogatory-term dictionary contain different numbers of words, character polarity values calculated from raw counts would not be accurate enough. The above equations therefore normalize the commendatory weight and the derogatory weight of each character, so that the character polarity value of the character, and in turn the word polarity value of the word, can be obtained accurately.
In the above technical solution, preferably, the computing unit includes a setting unit, configured to set, according to the word class to which the word belongs and the position of the word in the document sentence, a word polarity influence value that the word exerts on subsequent words; and the sentence polarity computation model includes an equation in which S_p denotes the sentence polarity value of the document sentence p; S_wj denotes the word polarity value of the j-th word in the document sentence; g(w_j) denotes the word polarity influence value of the j-th word; a denotes the number of words in the document sentence; and a is a positive integer.
In this technical solution, a document sentence has a more complex form than a word. Words of different classes in a document sentence, especially words that modify sentiment-bearing words, such as degree adverbs and negation words, affect the sentiment orientation of the sentence. Examples of degree adverbs are "most" and "slightly", and examples of negation words are "not" and "non-"; "I dislike him" and "I do not dislike him", for example, express completely different meanings. The position of a word in the document sentence also affects its sentiment orientation: a degree adverb placed before a negation word has a different effect on the sentiment orientation of the sentence from a degree adverb placed after it. Therefore, when the sentence polarity value of a document sentence is calculated, degree adverbs, negation words, and their relative positions are considered together, so that the sentence polarity value is calculated accurately.
According to the technical solution of the present invention, the topic polarity value of the current document with respect to the current topic can be calculated accurately, so that the sentiment orientation of the current document can be analyzed accurately.
Specific embodiment
In order that the above objects, features, and advantages of the present invention may be understood more clearly, the present invention is further described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth to facilitate a full understanding of the present invention. However, the present invention may also be implemented in ways other than those described here; therefore, the protection scope of the present invention is not limited by the specific embodiments described below.
Fig. 1 is a flowchart of an information processing method according to an embodiment of the present invention.
As shown in Fig. 1, the information processing method according to an embodiment of the present invention includes:
Step 102: obtaining the document sentences in the current document and the words in the document sentences, and determining the word polarity value of each word according to a preset dictionary;
Step 104: calculating the sentence polarity value of each document sentence according to each word in the document sentence, the word polarity value of the word, and a sentence polarity computation model;
Step 106: determining the sentiment orientation of the current document according to the sentence polarity value of each document sentence in the current document and a feature word set.
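Steps 102 to 106 can be read as a three-stage pipeline. The sketch below wires the stages together with pluggable callables; the function name, the placeholder models, and the toy lexicon are hypothetical, and the feature word set of step 106 is assumed to be handled inside `document_model`.

```python
from typing import Callable, List, Sequence

Sentence = List[str]  # a document sentence as already-segmented words


def analyze_document(
    sentences: Sequence[Sentence],
    word_polarity: Callable[[str], float],
    sentence_model: Callable[[Sentence, List[float]], float],
    document_model: Callable[[List[float]], float],
) -> str:
    """Steps 102-106 as a pipeline; every scoring component is a pluggable (hypothetical) callable."""
    sentence_scores: List[float] = []
    for sent in sentences:
        word_scores = [word_polarity(w) for w in sent]              # step 102: word polarity values
        sentence_scores.append(sentence_model(sent, word_scores))  # step 104: sentence polarity value
    w = document_model(sentence_scores)                             # step 106: topic polarity value
    if w > 0:
        return "commendatory"
    if w < 0:
        return "derogatory"
    return "neutral"


lexicon = {"好": 1.0, "差": -1.0}
label = analyze_document(
    [["服务", "好"], ["价格", "差"], ["环境", "好"]],
    word_polarity=lambda w: lexicon.get(w, 0.0),
    sentence_model=lambda sent, scores: sum(scores),  # placeholder sentence model
    document_model=sum,                               # placeholder; ignores the feature word set
)
print(label)  # commendatory
```

Keeping each stage behind a callable lets the dictionary-based word scoring, the modifier-aware sentence model, and the association-weighted document model be developed and tested independently.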
In this technical solution, the sentiment orientation of the current document is determined by the document sentences that constitute it, the sentiment orientation of a document sentence is determined by the words that constitute it, and the sentiment orientation of a word is determined by the characters that constitute it. Therefore, the sentence polarity value of each document sentence is accurately determined from statistics on the word polarity values of the words in the current document, and the sentiment orientation of the current document is then accurately determined from the sentence polarity values of its document sentences and the feature word set. For example, it can be accurately determined whether the current document expresses a commendatory, derogatory, or neutral viewpoint, which provides useful support for data analysis of the current document.
Specifically, a current document related to the current topic is obtained, and the characters of each word are counted to accurately determine the word polarity value of the word. Then, when the sentence polarity value of a document sentence is determined, words of different classes in the sentence (especially words that modify sentiment-bearing words, such as degree adverbs and negation words) affect the sentiment orientation of the sentence; for example, degree adverbs include "most" and "slightly", and negation words include "not" and "non-". The position of a word in the document sentence also affects the sentiment orientation of the sentence; for example, a degree adverb placed before a negation word has a different effect from a degree adverb placed after it. Therefore, the sentence polarity value of a document sentence can be determined accurately by jointly considering the word class of each word and its position in the sentence. Finally, when the sentiment orientation of the current document is determined, because different document sentences use different feature words to describe the same current topic, the sentiment orientation of the current document toward the current topic can be determined accurately from the sentence polarity value of each document sentence in the current document and the feature word set related to the current topic. Preferably, when the current document is obtained, a current document related to the current topic is retrieved according to the feature word set related to the current topic, and the current document is then preprocessed to improve the accuracy of the sentiment orientation obtained for it. The preprocessing specifically includes deleting meaningless words and characters from the current document, for example, meaningless words such as " " and " ", and characters such as digits and punctuation marks. Here, a word may consist of multiple characters; of course, a single character may also constitute a word.
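The preprocessing described above can be sketched as a token filter. This assumes the document has already been segmented into words; the set of meaningless words is passed in by the caller (the example set is hypothetical), and the full-width punctuation list is an illustrative subset.

```python
import string
from typing import Iterable, List, Set

# Illustrative subset of full-width punctuation; extend as needed.
CJK_PUNCTUATION = "，。！？；：、“”‘’（）《》…"


def is_noise(token: str) -> bool:
    """True if the token consists only of digits, punctuation, or whitespace."""
    return all(
        ch.isdigit() or ch.isspace() or ch in string.punctuation or ch in CJK_PUNCTUATION
        for ch in token
    )


def preprocess(tokens: Iterable[str], meaningless_words: Set[str]) -> List[str]:
    """Drop meaningless words and digit/punctuation tokens from a segmented document."""
    return [t for t in tokens if t not in meaningless_words and not is_noise(t)]


tokens = ["我", "的", "手机", "，", "2023", "很", "好", "!"]
print(preprocess(tokens, meaningless_words={"的"}))  # ['我', '手机', '很', '好']
```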
In the above technical solution, preferably, step 106 specifically includes: classifying the document sentences according to their sentence polarity values to obtain the commendatory sentences and the derogatory sentences among them; determining, according to the commendatory sentences, the derogatory sentences, and the feature word set, a first association coefficient between all the commendatory sentences and the feature word set and a second association coefficient between all the derogatory sentences and the feature word set; determining the topic polarity value of the current document according to the first association coefficient, the second association coefficient, each document sentence, and the sentence polarity value of each document sentence; and classifying the current document according to its topic polarity value and determining the sentiment orientation of the current document according to the classification result.
In this technical solution, the document sentences are classified into commendatory sentences and derogatory sentences according to their sentence polarity values. The document sentences of the current document are not independent of one another but are associated to some degree, and different document sentences may use different feature words to describe the same current topic: for example, some feature words appear only in commendatory sentences, others only in derogatory sentences, and still others in both. As a result, the commendatory sentences and the derogatory sentences jointly affect the sentiment orientation of the current document. Therefore, a feature word set related to the current topic is obtained, and the association coefficients of all the commendatory sentences and of all the derogatory sentences in the current document with the feature word set are calculated, that is, the association coefficients of all the commendatory sentences and of all the derogatory sentences with the current topic. The topic polarity value of the current document is thereby calculated accurately, and the sentiment orientation of the current document toward the current topic is determined accurately. For example, if the topic polarity value of the current document is greater than zero, the current document holds a commendatory viewpoint on the current topic; if it equals zero, the current document holds a neutral viewpoint; and if it is less than zero, the current document holds a derogatory viewpoint.
Of course, if the document sentences of the current document are independent of one another, i.e., there is no association between them, the topic polarity value of the current document can be obtained directly by summing the sentence polarity values of the document sentences, and the sentiment orientation of the current document toward the current topic is then obtained from the topic polarity value. The document sentences include, but are not limited to, commendatory sentences and derogatory sentences; for example, they may also include neutral sentences, and the number of commendatory, derogatory, or neutral sentences in a document may be zero. In addition, the summations in the equation for the topic polarity value of the current document represent the influence of the commendatory sentences and the derogatory sentences on the topic polarity value of the current document, which improves the accuracy of the calculated topic polarity value.
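The difference between the independent-sentence case and the association-weighted case can be shown with a single Python function: the default weights of 1 give the plain sum, while other values (assumed here to come from the association coefficients) weight the two sentence classes differently. The function name and sample scores are illustrative.

```python
from typing import List


def topic_polarity(scores: List[float], alpha_pos: float = 1.0, beta_neg: float = 1.0) -> float:
    """Topic polarity from sentence polarity values.

    With the default alpha_pos = beta_neg = 1 this is the plain sum used when the
    sentences are independent; other values give an (assumed) association-weighted sum.
    Neutral sentences (score 0) contribute nothing, and any class may be empty.
    """
    pos = sum(s for s in scores if s > 0)
    neg = sum(s for s in scores if s < 0)
    return alpha_pos * pos + beta_neg * neg


scores = [0.8, -0.3, 0.0, 0.5]
print(round(topic_polarity(scores), 2))            # 1.0
print(round(topic_polarity(scores, 0.5, 2.0), 2))  # 0.05
```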
In the above technical solution, preferably, upon receiving a calculation command, the topic polarity value of the current document is calculated by an equation in which W(d) denotes the topic polarity value of the current document d; α_pos denotes the first association coefficient between all the commendatory sentences and the feature word set; β_neg denotes the second association coefficient between all the derogatory sentences and the feature word set; S_pos(p_x) denotes the sentence polarity value of the x-th commendatory sentence; S_neg(p_y) denotes the sentence polarity value of the y-th derogatory sentence; k denotes the number of commendatory sentences; l denotes the number of derogatory sentences; and k and l are positive integers.
In this technical solution, the topic polarity value of the current document can be accurately calculated by the equation described above, and the sentiment orientation of the current document toward the current topic is then accurately determined from that value. For example, if the topic polarity value of the current document is greater than zero, the current document holds a commendatory viewpoint on the current topic; if it equals zero, the current document holds a neutral viewpoint; and if it is less than zero, the current document holds a derogatory viewpoint.
In the above technical solution, preferably, the preset dictionary includes a commendatory-word dictionary and a derogatory-word dictionary, and, upon receiving a first determination command, the word polarity value of a word is determined as follows: count, for each character of the word, a first number (its occurrences in the commendatory-word dictionary) and a second number (its occurrences in the derogatory-word dictionary); determine the commendatory weight and the derogatory weight of the character from the first number, the second number, the commendatory-word dictionary, and the derogatory-word dictionary; calculate the character polarity value of the character from its commendatory weight and derogatory weight; and determine the word polarity value of the word by accumulating the character polarity values of its characters.
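The first step, counting how often each character of a word occurs in each dictionary, can be sketched as follows. The toy lexicons below are hypothetical Chinese word lists (the patent's dictionaries are not given); Chinese is used because the method works on individual characters.

```python
from collections import Counter


def character_counts(lexicon):
    """Count how many times each character occurs across all words of a lexicon."""
    counts = Counter()
    for word in lexicon:
        counts.update(word)  # a str is iterated character by character
    return counts


# Hypothetical toy lexicons, not the patent's dictionaries.
commendatory_dict = ["美好", "美丽", "好看"]
derogatory_dict = ["丑陋", "难看"]

pos_counts = character_counts(commendatory_dict)
neg_counts = character_counts(derogatory_dict)
word = "好看"
first = [pos_counts[ch] for ch in word]   # first numbers f(pc_i)
second = [neg_counts[ch] for ch in word]  # second numbers f(nc_i)
print(first, second)  # [2, 1] [0, 1]
```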
In this technical solution, the number of times each character of a word occurs in the commendatory-word dictionary and in the derogatory-word dictionary is counted separately. If a character occurs more often in the commendatory-word dictionary, it tends to be commendatory; if it occurs more often in the derogatory-word dictionary, it tends to be derogatory; otherwise it is neutral. The character polarity value of each character can therefore be determined relatively accurately from the two dictionaries, and the word polarity value of each word can in turn be determined accurately from the character polarity values.
In the above technical solution, preferably, upon receiving a second determination command, the commendatory weight and the derogatory weight are determined by the following equations:
$$PO_{c_i} = \frac{f(pc_i) / \sum_{j=1}^{m_0} f(pc_j)}{f(pc_i) / \sum_{j=1}^{m_0} f(pc_j) + f(nc_i) / \sum_{j=1}^{n_0} f(nc_j)}, \qquad NO_{c_i} = \frac{f(nc_i) / \sum_{j=1}^{n_0} f(nc_j)}{f(pc_i) / \sum_{j=1}^{m_0} f(pc_j) + f(nc_i) / \sum_{j=1}^{n_0} f(nc_j)}$$
where $PO_{c_i}$ denotes the commendatory weight of the i-th character in the word, $NO_{c_i}$ denotes the derogatory weight of the i-th character in the word, $f(pc_i)$ denotes the first number, i.e. the number of times the i-th character occurs in the commendatory-word dictionary, $f(nc_i)$ denotes the second number, i.e. the number of times the i-th character occurs in the derogatory-word dictionary, $m_0$ denotes the number of distinct commendatory characters in the commendatory-word dictionary, $n_0$ denotes the number of distinct derogatory characters in the derogatory-word dictionary, $m_0$ and $n_0$ are positive integers, $f(pc_j)$ denotes the third number, i.e. the number of times the j-th commendatory character occurs in the commendatory-word dictionary, and $f(nc_j)$ denotes the fourth number, i.e. the number of times the j-th derogatory character occurs in the derogatory-word dictionary.
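A minimal sketch of the normalized weights, assuming each character's frequency is divided by the total character count of its dictionary before the two weights are compared (so that PO + NO = 1). Treating a character absent from both dictionaries as neutral (0, 0) is an assumption of this sketch; the function name and sample counts are hypothetical.

```python
from collections import Counter


def character_weights(char, pos_counts, neg_counts):
    """Normalized commendatory/derogatory weights (PO, NO) of one character.

    pos_counts / neg_counts map each distinct character of the commendatory /
    derogatory dictionary to its occurrence count there, so their totals are
    the sums over j = 1..m0 and j = 1..n0.
    """
    pos_total = sum(pos_counts.values())
    neg_total = sum(neg_counts.values())
    p = pos_counts.get(char, 0) / pos_total if pos_total else 0.0
    n = neg_counts.get(char, 0) / neg_total if neg_total else 0.0
    if p + n == 0:
        return 0.0, 0.0  # in neither dictionary: treated as neutral (assumption)
    return p / (p + n), n / (p + n)


pos_counts = Counter({"好": 3, "美": 1})  # hypothetical counts
neg_counts = Counter({"坏": 2, "好": 2})
print(character_weights("好", pos_counts, neg_counts))  # (0.6, 0.4)
```

Dividing by the dictionary totals keeps a larger commendatory dictionary from inflating commendatory weights, which is the imbalance the standardization is meant to remove.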
In this technical solution, because the commendatory-word dictionary and the derogatory-word dictionary contain unequal numbers of words, character polarity values computed from raw counts would not be accurate enough. The above formulas therefore normalize the commendatory weight and the derogatory weight of each character, so that its character polarity value, and in turn the word polarity value of the word, can be obtained accurately.
In the above technical solution, preferably, the word polarity effect value that a word exerts on subsequent words is set according to the part-of-speech category of the word and its position in the document sentence, and the sentence polarity value calculation model includes the following formula:
$$S_p = \sum_{j=1}^{a} g(w_j)\, S_{w_j}$$
where $S_p$ denotes the sentence polarity value of the document sentence $p$, $S_{w_j}$ denotes the word polarity value of the j-th word in the document sentence, $g(w_j)$ denotes the word polarity effect value of the j-th word, $a$ denotes the number of words in the document sentence, and $a$ is a positive integer.
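A minimal sketch of the sentence polarity model, assuming S_p is the sum of each word polarity value multiplied by its effect value g(w_j). Here the effect values are supplied directly; the sample values for "I do not dislike him" are hypothetical.

```python
def sentence_polarity(word_polarities, effect_values):
    """S_p = sum over j of g(w_j) * S_{w_j} for the a words of a document sentence."""
    if len(word_polarities) != len(effect_values):
        raise ValueError("need one effect value per word")
    return sum(g * s for g, s in zip(effect_values, word_polarities))


# "I do not dislike him": hypothetical values where "dislike" has polarity -0.5
# and the preceding negative word gives it an effect value of -1.
print(sentence_polarity([0.0, 0.0, -0.5, 0.0], [1.0, 1.0, -1.0, 1.0]))  # 0.5
```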
In this technical solution, a document sentence has a more complex form than a word. Words of different part-of-speech categories in a document sentence, especially words that modify emotion words, such as degree adverbs and negative words, affect the sentiment orientation of the sentence. Examples of degree adverbs are "most" and "slightly", and examples of negative words are "not" and "non-"; "I dislike him" and "I do not dislike him", for instance, express completely different meanings. The position of a word in the document sentence also affects its sentiment orientation: a degree adverb placed before a negative word influences the sentence differently from one placed after it. Specifically, when the degree adverb precedes the negative word, the combination reinforces the reversed sentiment orientation; when the degree adverb follows the negative word, the combination weakens the reversed orientation to some extent, or may not change the orientation at all. The word polarity effect value of a word must therefore account for the effect the word exerts on subsequent words. Consequently, the calculation of the sentence polarity value of a document sentence takes into account degree adverbs, negative words, and the relative positions between them, so that the sentence polarity value is calculated accurately.
Fig. 2 shows a schematic structural diagram of an information processing system according to an embodiment of the invention.
As shown in Fig. 2, the information processing system 200 according to an embodiment of the invention includes: a first determination unit 202, configured to obtain the document sentences of the current document and the words in each document sentence, and to determine the word polarity value of each word according to a preset dictionary; a computing unit 204, configured to calculate the sentence polarity value of each document sentence from its words, their word polarity values, and the sentence polarity value calculation model; and a processing unit 206, configured to determine the sentiment orientation of the current document from the sentence polarity value of each document sentence in the current document and the feature word set.
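The division of work among the three units of system 200 can be sketched as a small pipeline. This is a hypothetical wiring, not the patent's implementation: each unit is modelled as a plain callable, and the lambdas in the usage example are toy stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class InformationProcessingSystem:
    """Hypothetical wiring of the three units of system 200."""
    word_polarity: Callable[[str], float]                         # first determination unit 202
    sentence_polarity: Callable[[List[str], List[float]], float]  # computing unit 204
    document_orientation: Callable[[List[float]], str]            # processing unit 206

    def analyse(self, sentences: List[List[str]]) -> str:
        """Score each word, then each sentence, then the whole document."""
        sentence_scores = [
            self.sentence_polarity(words, [self.word_polarity(w) for w in words])
            for words in sentences
        ]
        return self.document_orientation(sentence_scores)


lexicon = {"good": 1.0, "bad": -1.0}  # hypothetical word polarity values
system = InformationProcessingSystem(
    word_polarity=lambda w: lexicon.get(w, 0.0),
    sentence_polarity=lambda words, scores: sum(scores),
    document_orientation=lambda s: "commendatory" if sum(s) > 0
    else "derogatory" if sum(s) < 0 else "neutral",
)
print(system.analyse([["good", "day"], ["bad", "service"], ["good", "food"]]))  # commendatory
```

Keeping each unit behind a callable lets the character-based word scorer, the modifier-aware sentence model, and the topic-based document model be swapped independently.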
In this technical solution, the sentiment orientation of the current document is determined by the document sentences that compose it, the sentiment orientation of a document sentence is determined by the words that compose it, and the sentiment orientation of a word is determined by the characters that compose it. Therefore, the sentence polarity value of each document sentence can be determined accurately from the word polarity values of the words in the current document, and the sentiment orientation of the current document can then be determined accurately from the sentence polarity values of its document sentences and the feature word set. For example, it can be determined accurately whether the current document holds a commendatory, derogatory, or neutral view, which provides useful support for data analysis of the current document.
Specifically, the current document related to the current topic is obtained, and the characters of each word are counted to determine the word polarity value of the word accurately. When the sentence polarity value of a document sentence is determined, words of different part-of-speech categories in the sentence (especially words that modify emotion words, such as degree adverbs and negative words) affect its sentiment orientation; for example, degree adverbs include "most" and "slightly", and negative words include "not" and "non-". The position of a word in the document sentence also affects the sentiment orientation: a degree adverb placed before a negative word influences the sentence differently from one placed after it. The sentence polarity value can therefore be determined accurately by considering both the part-of-speech category of each word and its position in the document sentence. Finally, when the sentiment orientation of the current document is determined, different document sentences describe the same current topic with different feature words, so the orientation of the current document toward the current topic is determined accurately from the sentence polarity values of its document sentences together with the feature word set related to the current topic.
Preferably, the current document is obtained according to the feature word set related to the current topic and is then preprocessed. The preprocessing specifically includes deleting meaningless words and characters from the current document, for example deleting meaningless words such as " " " ", and deleting characters such as digits and punctuation marks, which improves the accuracy of the sentiment orientation obtained for the current document. Here a word is typically composed of several characters, although a single character can also constitute a word.
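The preprocessing step can be sketched as a token filter. The stop list below is hypothetical (the source's own examples of meaningless words are not preserved), and the sketch assumes the text has already been segmented into words; digits and punctuation are recognized through their Unicode general category.

```python
import unicodedata

# Hypothetical stop list of meaningless function words.
STOP_WORDS = {"的", "了"}


def is_digit_or_punct(token):
    """True if every character is a number (N*) or punctuation (P*) per Unicode."""
    return all(unicodedata.category(ch)[0] in "NP" for ch in token)


def preprocess(tokens):
    """Drop stop words and tokens made only of digits and/or punctuation."""
    return [t for t in tokens if t not in STOP_WORDS and not is_digit_or_punct(t)]


print(preprocess(["我", "的", "手机", "，", "2023", "很", "好", "。"]))
# ['我', '手机', '很', '好']
```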
In the above technical solution, preferably, the processing unit 206 includes: a classification unit 2062, configured to classify the document sentences according to their sentence polarity values to obtain the commendatory sentences and the derogatory sentences among them; and a second determination unit 2064, configured to determine, from the commendatory sentences, the derogatory sentences, and the feature word set, a first association coefficient between all commendatory sentences and the feature word set and a second association coefficient between all derogatory sentences and the feature word set; to determine the topic polarity value of the current document from the first association coefficient, the second association coefficient, and the sentence polarity value of each document sentence; to classify the current document according to its topic polarity value; and to determine the sentiment orientation of the current document from the classification result.
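The job of the classification unit, splitting scored sentences by the sign of their sentence polarity values, can be sketched as below. The function name and sample data are hypothetical; the resulting commendatory and derogatory lists are what the second determination unit would combine with the association coefficients.

```python
def classify_sentences(scored_sentences):
    """Split (sentence, polarity) pairs into commendatory, derogatory, and neutral lists."""
    commendatory, derogatory, neutral = [], [], []
    for sentence, score in scored_sentences:
        if score > 0:
            commendatory.append((sentence, score))
        elif score < 0:
            derogatory.append((sentence, score))
        else:
            neutral.append((sentence, score))
    return commendatory, derogatory, neutral


pos, neg, neu = classify_sentences([("s1", 0.8), ("s2", -0.3), ("s3", 0.0)])
print(pos, neg, neu)  # [('s1', 0.8)] [('s2', -0.3)] [('s3', 0.0)]
```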
In this technical solution, the document sentences are classified as commendatory or derogatory according to their sentence polarity values. The document sentences of the current document are not independent of one another but are associated to some degree, and different document sentences describe the same current topic with different feature words: some feature words occur only in commendatory sentences, others only in derogatory sentences, and still others in both. As a result, commendatory and derogatory sentences influence the sentiment orientation of the current document differently. A feature word set related to the current topic is therefore obtained, and the association coefficients between the feature word set and, respectively, all commendatory sentences and all derogatory sentences in the current document are calculated, that is, the association coefficients of the commendatory and derogatory sentences with the current topic. From these, the topic polarity value of the current document is calculated accurately and its sentiment orientation toward the current topic is determined. For example, if the topic polarity value of the current document is greater than zero, the current document holds a commendatory view on the current topic; if it equals zero, a neutral view; and if it is less than zero, a derogatory view.
Of course, if the document sentences of the current document are independent of one another, that is, there is no association between them, the topic polarity value of the current document can be obtained directly by accumulating the sentence polarity values of the document sentences, and the sentiment orientation of the current document toward the current topic is then obtained from the topic polarity value. Document sentences include, but are not limited to, commendatory sentences and derogatory sentences; they may also include neutral sentences, and the number of commendatory, derogatory, or neutral sentences in a document may be zero. In addition, the formula for the topic polarity value of the current document accumulates the commendatory sentences and the derogatory sentences separately, so that the influence of each kind of sentence on the topic polarity value is represented, which improves the accuracy of the calculated topic polarity value.
In the above technical solution, preferably, the second determination unit 2064 is specifically configured to, upon receiving a calculation command, calculate the topic polarity value of the current document by the following equation:
$$W(d) = \alpha_{pos} \sum_{x=1}^{k} S_{pos}(p_x) + \beta_{neg} \sum_{y=1}^{l} S_{neg}(p_y)$$
where $W(d)$ denotes the topic polarity value of the current document $d$, $\alpha_{pos}$ denotes the first association coefficient between all commendatory sentences and the feature word set, $\beta_{neg}$ denotes the second association coefficient between all derogatory sentences and the feature word set, $S_{pos}(p_x)$ denotes the sentence polarity value of the x-th commendatory sentence, $S_{neg}(p_y)$ denotes the sentence polarity value of the y-th derogatory sentence, $k$ denotes the number of commendatory sentences, $l$ denotes the number of derogatory sentences, and $k$ and $l$ are positive integers.
In this technical solution, the topic polarity value of the current document can be calculated accurately by the above formula, and the sentiment orientation of the current document toward the current topic is then determined from that value. For example, if the topic polarity value of the current document is greater than zero, the current document holds a commendatory view on the current topic; if it equals zero, the document holds a neutral view; and if it is less than zero, the document holds a derogatory view.
In the above technical solution, preferably, the preset dictionary includes a commendatory-word dictionary and a derogatory-word dictionary, and the first determination unit 202 is specifically configured to, upon receiving a first determination command, determine the word polarity value of a word as follows: count, for each character of the word, a first number (its occurrences in the commendatory-word dictionary) and a second number (its occurrences in the derogatory-word dictionary); determine the commendatory weight and the derogatory weight of the character from the first number, the second number, the commendatory-word dictionary, and the derogatory-word dictionary; calculate the character polarity value of the character from its commendatory weight and derogatory weight; and determine the word polarity value of the word by accumulating the character polarity values of its characters.
In this technical solution, the number of times each character of a word occurs in the commendatory-word dictionary and in the derogatory-word dictionary is counted separately. If a character occurs more often in the commendatory-word dictionary, it tends to be commendatory; if it occurs more often in the derogatory-word dictionary, it tends to be derogatory; otherwise it is neutral. The character polarity value of each character can therefore be determined relatively accurately from the two dictionaries, and the word polarity value of each word can in turn be determined accurately from the character polarity values.
In the above technical solution, preferably, the first determination unit 202 is specifically configured to, upon receiving a second determination command, determine the commendatory weight and the derogatory weight by the following equations:
$$PO_{c_i} = \frac{f(pc_i) / \sum_{j=1}^{m_0} f(pc_j)}{f(pc_i) / \sum_{j=1}^{m_0} f(pc_j) + f(nc_i) / \sum_{j=1}^{n_0} f(nc_j)}, \qquad NO_{c_i} = \frac{f(nc_i) / \sum_{j=1}^{n_0} f(nc_j)}{f(pc_i) / \sum_{j=1}^{m_0} f(pc_j) + f(nc_i) / \sum_{j=1}^{n_0} f(nc_j)}$$
where $PO_{c_i}$ denotes the commendatory weight of the i-th character in the word, $NO_{c_i}$ denotes the derogatory weight of the i-th character in the word, $f(pc_i)$ denotes the first number, i.e. the number of times the i-th character occurs in the commendatory-word dictionary, $f(nc_i)$ denotes the second number, i.e. the number of times the i-th character occurs in the derogatory-word dictionary, $m_0$ denotes the number of distinct commendatory characters in the commendatory-word dictionary, $n_0$ denotes the number of distinct derogatory characters in the derogatory-word dictionary, $m_0$ and $n_0$ are positive integers, $f(pc_j)$ denotes the third number, i.e. the number of times the j-th commendatory character occurs in the commendatory-word dictionary, and $f(nc_j)$ denotes the fourth number, i.e. the number of times the j-th derogatory character occurs in the derogatory-word dictionary.
In this technical solution, because the commendatory-word dictionary and the derogatory-word dictionary contain unequal numbers of words, character polarity values computed from raw counts would not be accurate enough. The above formulas therefore normalize the commendatory weight and the derogatory weight of each character, so that its character polarity value, and in turn the word polarity value of the word, can be obtained accurately.
In the above technical solution, preferably, the computing unit 204 includes a setting unit 2042, configured to set the word polarity effect value that a word exerts on subsequent words according to the part-of-speech category of the word and its position in the document sentence; and the sentence polarity value calculation model includes the following formula:
$$S_p = \sum_{j=1}^{a} g(w_j)\, S_{w_j}$$
where $S_p$ denotes the sentence polarity value of the document sentence $p$, $S_{w_j}$ denotes the word polarity value of the j-th word in the document sentence, $g(w_j)$ denotes the word polarity effect value of the j-th word, $a$ denotes the number of words in the document sentence, and $a$ is a positive integer.
In this technical solution, a document sentence has a more complex form than a word. Words of different part-of-speech categories in a document sentence, especially words that modify emotion words, such as degree adverbs and negative words, affect the sentiment orientation of the sentence. Examples of degree adverbs are "most" and "slightly", and examples of negative words are "not" and "non-"; "I dislike him" and "I do not dislike him", for instance, express completely different meanings. The position of a word in the document sentence also affects its sentiment orientation: a degree adverb placed before a negative word influences the sentence differently from one placed after it. Specifically, when the degree adverb precedes the negative word, the combination reinforces the reversed sentiment orientation; when the degree adverb follows the negative word, the combination weakens the reversed orientation to some extent, or may not change the orientation at all. The word polarity effect value of a word must therefore account for the effect the word exerts on subsequent words. Consequently, the calculation of the sentence polarity value of a document sentence takes into account degree adverbs, negative words, and the relative positions between them, so that the sentence polarity value is calculated accurately.
Fig. 3 shows a schematic diagram of the principle of an information processing system according to an embodiment of the invention.
As shown in Fig. 3, the information processing system 300 according to an embodiment of the invention (equivalent to the information processing system 200 in the embodiment shown in Fig. 2) includes a word polarity value determination unit, a sentence polarity value determination unit, and a sentiment orientation determination unit. First, multiple current documents related to the current topic are obtained and each is preprocessed. Preprocessing includes function-word removal, that is, deleting function words (meaningless words, such as) and characters (digits, punctuation marks, and the like) from each current document, which yields the document sentences of each current document and the words in each document sentence. The word polarity value determination unit then counts how many times each character of a word occurs in the preset dictionaries (the commendatory-word dictionary and the derogatory-word dictionary), determines the character polarity value of each character, and accumulates the character polarity values of the characters in a word to obtain its word polarity value. Building on an analysis of how word collocations in a document sentence affect its sentence polarity value, the sentence polarity value determination unit establishes common negative-word and degree-adverb dictionaries and a quantitative polarity calculation method, so that the sentence polarity value of a document sentence is determined from the different collocation relationships among its words. The sentiment orientation determination unit comprehensively considers the influence on the current document of the sentence polarity values of the document sentences, the feature words of the current topic, and the degree of association between the document sentences and the current topic, and establishes a quantitative analysis model of the current document relative to the current topic, thereby determining whether the current document holds a commendatory, derogatory, or neutral view on the current topic.
The technical solution of the present invention is described in detail below.
1. Word polarity value determination unit
Character polarity values are calculated by counting how many times each character occurs in the commendatory-word dictionary and in the derogatory-word dictionary, and word polarity values are then calculated by accumulating the character polarity values of the characters in each word. Specifically, a commendatory-word dictionary and a derogatory-word dictionary are constructed first. The sentiment orientation of the current document toward the current topic is determined by the sentence polarity values of its document sentences, the sentence polarity value of a document sentence is in turn determined by the word polarity values of its words, and the word polarity value of a word is ultimately determined by the characters that form it. A character-based word polarity algorithm is therefore used. Because the commendatory-word dictionary and the derogatory-word dictionary contain unequal numbers of sample words, which would distort the results, a normalized calculation is adopted. The commendatory weight and the derogatory weight are calculated as follows:
$$PO_{c_i} = \frac{f(pc_i) / \sum_{j=1}^{m_0} f(pc_j)}{f(pc_i) / \sum_{j=1}^{m_0} f(pc_j) + f(nc_i) / \sum_{j=1}^{n_0} f(nc_j)}, \qquad NO_{c_i} = \frac{f(nc_i) / \sum_{j=1}^{n_0} f(nc_j)}{f(pc_i) / \sum_{j=1}^{m_0} f(pc_j) + f(nc_i) / \sum_{j=1}^{n_0} f(nc_j)}$$
where $PO_{c_i}$ denotes the commendatory weight of the i-th character in the word, $NO_{c_i}$ denotes the derogatory weight of the i-th character in the word, $f(pc_i)$ denotes the first number, i.e. the number of times the i-th character occurs in the commendatory-word dictionary, $f(nc_i)$ denotes the second number, i.e. the number of times the i-th character occurs in the derogatory-word dictionary, $m_0$ denotes the number of distinct commendatory characters in the commendatory-word dictionary, $n_0$ denotes the number of distinct derogatory characters in the derogatory-word dictionary, $m_0$ and $n_0$ are positive integers, $f(pc_j)$ denotes the third number, i.e. the number of times the j-th commendatory character occurs in the commendatory-word dictionary, and $f(nc_j)$ denotes the fourth number, i.e. the number of times the j-th derogatory character occurs in the derogatory-word dictionary.
In the above formulas, if a character occurs more often in the commendatory-word dictionary, it tends to be commendatory; if it occurs more often in the derogatory-word dictionary, it tends to be derogatory; otherwise, the character is regarded as neutral. After $PO_{c_i}$ and $NO_{c_i}$ have been calculated, the character polarity value of the character is calculated by the following formula:
$$Z_{c_i} = PO_{c_i} - NO_{c_i}$$
where $Z_{c_i}$ denotes the character polarity value of the i-th character in the word, $PO_{c_i}$ denotes the commendatory weight of the i-th character in the word, and $NO_{c_i}$ denotes the derogatory weight of the i-th character in the word.
Assuming that a word $w$ contains $m_1$ characters, the word polarity value of $w$ is obtained by accumulating the character polarity values of its characters:
$$Y_w = \sum_{j=1}^{m_1} Z_{c_j}$$
where $Y_w$ denotes the word polarity value of the word $w$, $m_1$ denotes the number of characters in $w$, and $Z_{c_j}$ denotes the character polarity value of the j-th character of $w$.
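Once each character has its commendatory and derogatory weights, the character polarity value is their difference and the word polarity value is the sum over the word's characters. A minimal sketch follows; the weight table is hypothetical, and scoring characters missing from it as neutral is an assumption.

```python
def character_polarity(po, no):
    """Z_ci = PO_ci - NO_ci."""
    return po - no


def word_polarity(word, weights):
    """Y_w: sum of the character polarity values of the word's characters.

    weights maps a character to its (PO, NO) pair; characters missing from the
    mapping are treated as neutral (0, 0), an assumption of this sketch.
    """
    return sum(character_polarity(*weights.get(ch, (0.0, 0.0))) for ch in word)


weights = {"美": (0.75, 0.25), "丽": (0.5, 0.5)}  # hypothetical weights
print(word_polarity("美丽", weights))  # 0.5
```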
2. Sentence polarity value determination unit
A document sentence, as the smallest semantic unit that expresses a viewpoint, has a more complex form. Although the word polarity values of its words reflect, to some degree, the viewpoint held by the document sentence, the two are not fully consistent, so the sentence polarity value of a document sentence cannot be calculated by simply accumulating word polarity values. The following document sentences illustrate this:
A: I dislikes her.B: I very it is disagreeable she.
C: I does not dislike her.D: I less dislikes her.
If the sentence polarity values of these document sentences were calculated by simply accumulating word polarity values, all four would yield the same result, and all would be judged to express a "negative" attitude. Clearly, however, C and D actually express something different, mainly because the negative word "not" changes the sentiment orientation of "dislike". In addition, compared with A, B contains one more degree adverb, "very", so the viewpoint it expresses is stronger; compared with C, D contains one more degree adverb, "too", so the viewpoint it expresses also differs. Therefore, when the sentence polarity value of a document sentence is calculated, not only the orientation of the emotion words but also the influence of their context should be analyzed, in particular the influence of degree adverbs and negative words that modify emotion words on the sentiment orientation of the whole sentence. To address the polarity changes that negative words and degree adverbs bring to emotion words, common negative-word and degree-adverb word sets are established, and their effects are taken into account in the sentiment orientation analysis of document sentences, giving a more accurate analysis of sentence polarity values. Tables 1 and 2 list some of the negative words and degree adverbs, respectively:
Table 1
Table 2
Assume that a document sentence is $P = \{w_1, w_2, \ldots, w_a\}$, where the words are ordered by their positions in the sentence. Negative words and degree adverbs usually appear before the word they modify and often occur in combination, reinforcing or weakening one another. When calculating the sentence polarity value of a document sentence, the influence of the two words immediately preceding the emotion word $w_j$ must therefore be considered, and the sentence polarity value can be calculated by the following formula:
$$S_P = \sum_{j=1}^{a} g(w_j)\, S_{w_j}$$
where $S_P$ denotes the sentence polarity value of the document sentence $P$, $S_{w_j}$ denotes the word polarity value of the j-th word in the document sentence, $g(w_j)$ denotes the word polarity effect value of the j-th word, $a$ denotes the number of words in the document sentence, and $a$ is a positive integer.
In the above formula, $g(w_j)$ is a discrete function representing the word polarity effect that the preceding words exert on the subsequent emotion word. Taking the positional relationship between degree adverbs and negative words into account, the formula for calculating $g(w_j)$ is given as follows:
Analysis of a large amount of text data shows that different positional relationships between degree adverbs and negative words reflect different emotional intensities. When a degree adverb precedes a negative word, it often reinforces the reversed polarity of the emotion word; when a degree adverb follows a negative word, it often weakens the reversed polarity of the emotion word to some extent, or may not change the polarity of the emotion word at all. Based on the above analysis, the algorithm for calculating the sentence polarity value of a document sentence is as follows:
Input: P = {w_1, w_2, ..., w_a}; Output: P = {w_1, w_2, ..., w_a} with recalculated word polarity values
Step 1: for (each w_j) { // recalculate the polarity value of each word
    if (j == 1)
    if (j == 2) {
        case 1: w_j is a negative word;
        case 2: w_j is a degree adverb;
        case 3: w_j is neither a degree adverb nor a negative word; }
    if (j >= 3) {
        case 1: neither w_{j-1} nor w_{j-2} is a degree adverb or a negative word;
        case 2: one of w_{j-1} and w_{j-2} is a negative word and the other is not a degree adverb;
        case 3: one of w_{j-1} and w_{j-2} is a degree adverb and the other is not a negative word;
        case 4: w_{j-1} is a degree adverb and w_{j-2} is a negative word;
        case 5: w_{j-1} is a negative word and w_{j-2} is a degree adverb; }
}
Step 2: // calculate S_p
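The algorithm above can be sketched as runnable code. Because the source does not preserve the g(w_j) values or the contents of Tables 1 and 2, the lexicons and effect values below are hypothetical placeholders chosen only to follow the stated tendencies (a negative word reverses polarity, a degree adverb intensifies it, "degree adverb then negative word" reinforces the reversal, "negative word then degree adverb" weakens it). The sketch also reads the j == 2 cases as testing the preceding word w_{j-1}, which is an assumption, and it applies S_P as the sum of g(w_j) times the word polarity value.

```python
# Hypothetical sample lexicons standing in for Tables 1 and 2.
NEGATIVE_WORDS = {"不", "没", "非"}
DEGREE_ADVERBS = {"很", "太", "最", "稍"}

# Hypothetical effect values g(w_j); the source does not preserve the numbers.
G_NONE = 1.0      # no modifier among the two preceding words
G_NEG = -1.0      # a negative word reverses polarity
G_DEG = 2.0       # a degree adverb intensifies polarity
G_NEG_DEG = -0.5  # negative word then degree adverb ("not too"): weakened reversal
G_DEG_NEG = -2.0  # degree adverb then negative word ("very not"): reinforced reversal


def kind(word):
    """Classify a word as a negative word, a degree adverb, or other."""
    if word in NEGATIVE_WORDS:
        return "neg"
    if word in DEGREE_ADVERBS:
        return "deg"
    return "other"


def effect_value(words, j):
    """g(w_j) for 0-based index j, from the preceding words w_{j-2} and w_{j-1}."""
    if j == 0:
        return G_NONE
    prev1 = kind(words[j - 1])
    prev2 = kind(words[j - 2]) if j >= 2 else "other"
    if prev2 == "neg" and prev1 == "deg":  # case 4
        return G_NEG_DEG
    if prev2 == "deg" and prev1 == "neg":  # case 5
        return G_DEG_NEG
    if "neg" in (prev1, prev2):  # case 2
        return G_NEG
    if "deg" in (prev1, prev2):  # case 3
        return G_DEG
    return G_NONE  # case 1


def sentence_polarity(words, word_polarity):
    """S_P = sum_j g(w_j) * S_{w_j}; words absent from word_polarity score 0."""
    return sum(effect_value(words, j) * word_polarity.get(w, 0.0)
               for j, w in enumerate(words))


polarity = {"讨厌": -1.0}  # hypothetical word polarity of "dislike"
for label, words in [("A", ["我", "讨厌", "她"]), ("B", ["我", "很", "讨厌", "她"]),
                     ("C", ["我", "不", "讨厌", "她"]), ("D", ["我", "不", "太", "讨厌", "她"])]:
    print(label, sentence_polarity(words, polarity))
# A -1.0 / B -2.0 / C 1.0 / D 0.5
```

With these placeholder values, sentences A to D from the example above receive four different scores, whereas plain accumulation of word polarity values would give all four the same score.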
3. emotion tendency determination unit
Analyzing the emotion tendency of the current document toward the current topic means determining, on the whole, the viewpoint that the current document holds on the current topic, for example positive, negative, or neutral. The current document is an ordered sequence of document sentences combined according to certain grammatical and syntactic rules. If the document sentences were independent of one another, the emotion tendency of the document could be judged by accumulating the sentence polarity values of its document sentences: if commendatory document sentences form the majority, the current document is commendatory; if derogatory document sentences form the majority, the current document is derogatory; and if the two are roughly equal, the current document is neutral.
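The independence-based majority rule just described can be sketched as follows. This is a minimal illustration, assuming that a sentence counts as commendatory when its sentence polarity value is positive and as derogatory when it is negative, and reading "roughly equal" as "equal counts"; the function name majority_tendency is hypothetical.

```python
from typing import Sequence


def majority_tendency(sentence_polarities: Sequence[float]) -> str:
    """Naive baseline that treats document sentences as independent."""
    # ASSUMPTION: sign of the sentence polarity value decides the sentence class.
    commendatory = sum(1 for s in sentence_polarities if s > 0)
    derogatory = sum(1 for s in sentence_polarities if s < 0)
    if commendatory > derogatory:
        return "commendatory"
    if derogatory > commendatory:
        return "derogatory"
    return "neutral"  # equal counts are treated as neutral
```

Because it ignores conjunctions and topic features, this baseline is exactly what the relation analysis model below is meant to improve on.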
In practice, however, the document sentences of the current document are not independent, because conjunctions, and adversative (turning) conjunctions in particular, often change the emotion tendency of the document sentences in the current document. In addition, different document sentences describing the same topic use different feature words: the feature words of some topics appear only in commendatory document sentences, those of other topics appear only in derogatory document sentences, and those of still other topics appear in both kinds of document sentence. As a result, commendatory and derogatory document sentences do not carry equal weight when the emotion tendency of the current document is calculated. To solve these problems, this scheme uses a document polarity relation analysis model that jointly considers how document sentence polarity, topic features, and the degree of association between document sentences and the topic influence document polarity. The relation analysis model is described in detail as follows:
The set of feature words related to the current topic is $F_r = \{fe_1, fe_2, \dots, fe_{m2}\}$. The current document d contains n1 sentences, each either commendatory or derogatory, and together they form the sentence set $\{P_1, P_2, \dots, P_{n1}\}$. The topic polarity value of the current document can then be calculated by the following formula:
Here, $W(d)$ denotes the topic polarity value of the current document d, $\alpha_{pos}$ denotes the first association coefficient between all commendatory sentences and the current topic, $\beta_{neg}$ denotes the second association coefficient between all derogatory sentences and the current topic, $S_{pos}(p_x)$ denotes the sentence polarity value of the x-th commendatory sentence, $S_{neg}(p_y)$ denotes the sentence polarity value of the y-th derogatory sentence, k denotes the number of commendatory sentences, and l denotes the number of derogatory sentences, where k and l are positive integers and k + l = n1.
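The formula for W(d) itself does not survive in this text. The sketch below assumes the simplest form consistent with the definitions above, a weighted sum $W(d) = \alpha_{pos}\sum_{x=1}^{k} S_{pos}(p_x) + \beta_{neg}\sum_{y=1}^{l} S_{neg}(p_y)$, and further assumes that derogatory sentences carry negative sentence polarity values and that the sign of W(d) decides the tendency. Both are assumptions for illustration, not the formula of the invention, and the function names are hypothetical.

```python
from typing import Sequence


def topic_polarity(
    alpha_pos: float,
    beta_neg: float,
    pos_scores: Sequence[float],
    neg_scores: Sequence[float],
) -> float:
    """ASSUMED form: alpha_pos * sum of S_pos(p_x) + beta_neg * sum of S_neg(p_y)."""
    return alpha_pos * sum(pos_scores) + beta_neg * sum(neg_scores)


def tendency(w: float) -> str:
    """ASSUMPTION: the sign of W(d) gives the emotion tendency."""
    if w > 0:
        return "commendatory"
    if w < 0:
        return "derogatory"
    return "neutral"


w = topic_polarity(0.5, 0.25, [2.0, 1.0], [-4.0])  # 1.5 - 1.0
```

Weighting each class by its own association coefficient is what lets a few strongly topic-related derogatory sentences outweigh many loosely related commendatory ones, which a plain majority count cannot do.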
$\alpha_{pos}$ and $\beta_{neg}$ are calculated by the following formulas:
Here, m2 denotes the number of feature words in the feature word set related to the current topic, n1 denotes the total number of commendatory and derogatory sentences, and m2 and n1 are positive integers; $fr_{pos}(fe_j, P_i)$ denotes the number of times the j-th feature word occurs in the i-th commendatory sentence, and $fr_{neg}(fe_j, P_i)$ denotes the number of times the j-th feature word occurs in the i-th derogatory sentence.
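The formulas for $\alpha_{pos}$ and $\beta_{neg}$ are likewise missing here. The following sketch computes the occurrence counts $fr_{pos}(fe_j, P_i)$ and $fr_{neg}(fe_j, P_i)$ as defined above, with sentences represented as token lists. The normalization by m2 · n1 is a hypothetical choice, made only because both quantities appear in the definitions; the function names are illustrative.

```python
from typing import List, Sequence, Tuple


def total_occurrences(features: Sequence[str], sentences: Sequence[List[str]]) -> int:
    """Sum of fr(fe_j, P_i) over every feature word fe_j and every sentence P_i."""
    return sum(tokens.count(fe) for tokens in sentences for fe in features)


def association_coefficients(
    features: Sequence[str],
    pos_sentences: Sequence[List[str]],
    neg_sentences: Sequence[List[str]],
) -> Tuple[float, float]:
    """Return (alpha_pos, beta_neg) under a HYPOTHETICAL m2 * n1 normalization."""
    m2 = len(features)                            # number of feature words
    n1 = len(pos_sentences) + len(neg_sentences)  # commendatory + derogatory sentences
    if m2 == 0 or n1 == 0:
        return 0.0, 0.0
    alpha_pos = total_occurrences(features, pos_sentences) / (m2 * n1)
    beta_neg = total_occurrences(features, neg_sentences) / (m2 * n1)
    return alpha_pos, beta_neg


features = ["battery", "screen"]
pos = [["battery", "is", "great"], ["screen", "screen", "bright"]]
neg = [["battery", "drains", "fast"]]
alpha, beta = association_coefficients(features, pos, neg)  # 3/6 and 1/6
```

Under this normalization, a class of sentences that mentions the topic's feature words more often receives a larger coefficient, so its sentence polarity values count for more in W(d).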
The technical solution of the present invention has been described in detail above with reference to the accompanying drawings. With this solution, the topic polarity value of the current document with respect to the current topic can be calculated accurately, so that the emotion tendency of the current document can be analyzed accurately.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the invention. Those skilled in the art may make various modifications and variations to the invention. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.