CN106294312B - Information processing method and information processing system - Google Patents

Info

Publication number
CN106294312B
Authority
CN
China
Prior art keywords
sentence
word
document
character
polarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201510369322.8A
Other languages
Chinese (zh)
Other versions
CN106294312A (en)
Inventor
赵立永
杨建武
张丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New Founder Holdings Development Co ltd
Peking University
Beijing Founder Electronics Co Ltd
Original Assignee
Peking University
Peking University Founder Group Co Ltd
Beijing Founder Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University, Peking University Founder Group Co Ltd, Beijing Founder Electronics Co Ltd
Priority to CN201510369322.8A
Publication of CN106294312A
Application granted
Publication of CN106294312B
Expired - Fee Related
Anticipated expiration

Abstract

The invention provides an information processing method and an information processing system. The method comprises: obtaining the document sentences in a current document and the words in each document sentence, and determining the word polarity value of each word according to a preset dictionary; calculating the sentence polarity value of each document sentence according to the words in the document sentence, the word polarity values of those words, and a sentence polarity value computation model; and determining the sentiment orientation of the current document according to the sentence polarity value of each document sentence in the current document and a feature word set. With this technical solution, the sentiment orientation of the current document toward the current topic can be analyzed accurately.

Description

Information processing method and information processing system
Technical field
The present invention relates to the field of electronic technology, and in particular to an information processing method and an information processing system.
Background art
At present, the rapid development of interactive network services such as blogs, BBS forums, and microblogs has led to a sharp increase in user-generated documents. These documents contain personal comments on consumer products, film and television entertainment, news and current affairs, policies and regulations, public figures, and so on; they carry the users' emotions and reflect their personal views. Sentiment orientation analysis identifies the supporting, opposing, or neutral opinions contained in a document, giving a better understanding of a user's sentiment orientation in a specific comment and thereby providing data support for individuals, businesses, and government departments. In sentiment orientation analysis schemes in the related art, documents are processed with data mining and natural language processing techniques in order to mine the product features that users comment on, identify the document sentences that contain opinions, determine from the polarity words in those sentences whether the sentiment orientation of each sentence is positive or negative, and generate summary information based on the product features.
Because products have obvious attribute features, it is relatively easy to identify the sentiment words associated with product features by manual annotation and machine learning, and so to determine the sentiment orientation of a document. Documents related to a current topic, however, have no obvious attribute features, and the evaluative words or phrases related to the topic are complex, changeable, and hard to capture, so it is difficult to analyze the sentiment orientation of a document toward the current topic.
Therefore, how to accurately analyze the sentiment orientation of a document toward a current topic has become an urgent problem to be solved.
Summary of the invention
In view of the above problem, the present invention proposes a new technical solution that addresses the technical problem of accurately analyzing the sentiment orientation of a document toward a current topic.
In view of this, one aspect of the present invention provides an information processing method, comprising: obtaining the document sentences in a current document and the words in each document sentence, and determining the word polarity value of each word according to a preset dictionary; calculating the sentence polarity value of each document sentence according to each word in the document sentence, the word polarity value of the word, and a sentence polarity value computation model; and determining the sentiment orientation of the current document according to the sentence polarity value of each document sentence in the current document and a feature word set.
In this technical solution, the sentiment orientation of the current document is determined by the document sentences that make up the document, the sentiment orientation of a document sentence is determined by the words that make up the sentence, and the sentiment orientation of a word is determined by the characters that make up the word. Therefore, the sentence polarity value of each document sentence is determined accurately from the word polarity values of the words in the current document, and the sentiment orientation of the current document can then be determined accurately from the sentence polarity values of its document sentences and the feature word set. For example, it can be determined whether the current document expresses a commendatory, derogatory, or neutral view, which provides useful support for data analysis of the current document.
In the above technical solution, preferably, determining the sentiment orientation of the current document according to the sentence polarity value of each document sentence in the current document and the feature word set specifically comprises: classifying the document sentences according to the sentence polarity value of each document sentence, to obtain the commendatory sentences and the derogatory sentences among them; determining, according to the commendatory sentences, the derogatory sentences, and the feature word set, a first association coefficient between all commendatory sentences and the feature word set and a second association coefficient between all derogatory sentences and the feature word set; determining the topic polarity value of the current document according to the first association coefficient, the second association coefficient, each document sentence, and the sentence polarity value of each document sentence; and classifying the current document according to its topic polarity value and determining the sentiment orientation of the current document according to the classification result.
In this technical solution, the document sentences are classified into commendatory sentences and derogatory sentences according to their sentence polarity values. The document sentences in the current document are not independent of one another but have certain associations, and the feature words each sentence uses to describe the same current topic may differ: some feature words appear only in commendatory sentences, others appear only in derogatory sentences, and some appear in both. As a result, both the commendatory sentences and the derogatory sentences influence the sentiment orientation of the current document. The feature word set related to the current topic is therefore obtained in order to calculate the association coefficient between all commendatory sentences and the feature word set and the association coefficient between all derogatory sentences and the feature word set, so that the topic polarity value of the current document can be calculated accurately and the sentiment orientation of the current document toward the current topic determined accurately.
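The classification of sentences and the two association coefficients can be illustrated with a minimal Python sketch. The text does not state how the association coefficients are computed, so the definition used here (the share of feature words that occur in a group of sentences) is purely an assumption, as are the function names `split_by_polarity` and `association_coefficient`.

```python
from typing import Iterable, List, Set, Tuple


def split_by_polarity(scored: Iterable[Tuple[str, float]]) -> Tuple[List[str], List[str]]:
    """Split (sentence, sentence polarity value) pairs into two groups.

    Sentences with a positive value are treated as commendatory,
    sentences with a negative value as derogatory; zero is ignored.
    """
    commendatory, derogatory = [], []
    for sentence, value in scored:
        if value > 0:
            commendatory.append(sentence)
        elif value < 0:
            derogatory.append(sentence)
    return commendatory, derogatory


def association_coefficient(group: List[str], feature_words: Set[str]) -> float:
    """Hypothetical coefficient: share of feature words found in the group.

    This definition is an assumption made for illustration only; the
    source does not give the actual formula.
    """
    if not feature_words:
        return 0.0
    covered = sum(1 for w in feature_words if any(w in s for s in group))
    return covered / len(feature_words)
```

With the feature words `{"policy", "fee", "subsidy"}`, a commendatory group that mentions only "policy" would receive a coefficient of one third under this assumed definition.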
In the above technical solution, preferably, upon receiving a calculation command, the topic polarity value of the current document is calculated by the following formula: [formula missing from the source text] where W(d) denotes the topic polarity value of the current document d, α_pos denotes the first association coefficient between all commendatory sentences and the feature word set, β_neg denotes the second association coefficient between all derogatory sentences and the feature word set, S_pos(p_x) denotes the sentence polarity value of the x-th commendatory sentence, S_neg(p_y) denotes the sentence polarity value of the y-th derogatory sentence, k denotes the number of commendatory sentences, l denotes the number of derogatory sentences, and k and l are positive integers.
In this technical solution, the topic polarity value of the current document can be calculated accurately by the above formula, and the sentiment orientation of the current document toward the current topic can then be determined accurately from that value. For example, if the topic polarity value of the current document is greater than zero, the current document expresses a commendatory view of the current topic; if it is equal to zero, the document expresses a neutral view; and if it is less than zero, the document expresses a derogatory view.
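As an illustration of the sign-based decision described above, the following Python sketch computes a topic polarity value and maps it to a sentiment orientation. The formula itself is not available in the source text, so the weighted-sum form used in `topic_polarity` (α_pos times the sum of the commendatory sentence values plus β_neg times the sum of the derogatory sentence values) is an assumption inferred from the symbol definitions; only the three-way decision on the sign of W(d) comes from the text.

```python
from typing import Sequence


def topic_polarity(alpha_pos: float, beta_neg: float,
                   commendatory_values: Sequence[float],
                   derogatory_values: Sequence[float]) -> float:
    """Assumed form of W(d): weighted sum of the two groups of sentence values.

    commendatory_values holds S_pos(p_x) for x = 1..k (positive numbers),
    derogatory_values holds S_neg(p_y) for y = 1..l (negative numbers).
    """
    return alpha_pos * sum(commendatory_values) + beta_neg * sum(derogatory_values)


def orientation(w: float) -> str:
    """Map the topic polarity value to a sentiment orientation by its sign."""
    if w > 0:
        return "commendatory"
    if w < 0:
        return "derogatory"
    return "neutral"
```

For instance, `orientation(topic_polarity(0.5, 0.25, [0.8, 0.4], [-0.6]))` yields a commendatory result under the assumed formula, because the weighted commendatory sum outweighs the weighted derogatory sum.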
In the above technical solution, preferably, the preset dictionary comprises a commendatory word dictionary and a derogatory word dictionary; and, upon receiving a first determination command, the word polarity value of the word is determined by the following procedure: counting, for each character in the word, a first number of times the character occurs in the commendatory word dictionary and a second number of times the character occurs in the derogatory word dictionary; determining a commendatory weight value and a derogatory weight value of the character according to the first number, the second number, the commendatory word dictionary, and the derogatory word dictionary; and calculating the character polarity value of the character according to the commendatory weight value and the derogatory weight value, and determining the word polarity value of the word from the character polarity values of its characters.
In this technical solution, the number of times each character in the word occurs in the commendatory word dictionary and in the derogatory word dictionary is counted. If a character occurs more often in the commendatory word dictionary, it has a commendatory tendency; if it occurs more often in the derogatory word dictionary, it has a derogatory tendency; otherwise it is neutral. The character polarity value of a character can therefore be determined fairly accurately from the two dictionaries, and the word polarity value of each word can in turn be determined accurately from the character polarity values.
In the above technical solution, preferably, upon receiving a second determination command, the commendatory weight value and the derogatory weight value are determined by the following formulas: [formulas missing from the source text] where PO_ci denotes the commendatory weight value of the i-th character in the word, NO_ci denotes the derogatory weight value of the i-th character in the word, f(pc_i) denotes the first number, that is, the number of times the i-th character occurs in the commendatory word dictionary, f(nc_i) denotes the second number, that is, the number of times the i-th character occurs in the derogatory word dictionary, m0 denotes the number of distinct commendatory characters in the commendatory word dictionary, n0 denotes the number of distinct derogatory characters in the derogatory word dictionary, m0 and n0 are positive integers, f(pc_j) denotes a third number, that is, the number of times the j-th commendatory character of the commendatory word dictionary occurs in that dictionary, and f(nc_j) denotes a fourth number, that is, the number of times the j-th derogatory character of the derogatory word dictionary occurs in that dictionary.
In this technical solution, because the commendatory word dictionary and the derogatory word dictionary contain different numbers of words, character polarity values computed from raw counts would not be accurate enough. The above formulas therefore normalize the commendatory and derogatory weights of each character, so that the character polarity value of the character, and in turn the word polarity value of the word, can be obtained accurately.
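The character-level computation can be sketched in Python as follows. The weight formulas are missing from the source text, so the normalization used here is an assumption inferred from the symbol definitions: each count is divided by the total character count of its dictionary, and the two relative frequencies are then scaled to sum to one. Taking the character polarity value as PO_ci minus NO_ci and the word polarity value as the mean over the word's characters are also assumptions, and the tiny dictionaries are invented for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def char_counts(dictionary: Iterable[str]) -> Counter:
    """Count how often each character occurs across all words of a dictionary."""
    return Counter(ch for word in dictionary for ch in word)


def char_weights(ch: str, pos_counts: Counter, neg_counts: Counter) -> Tuple[float, float]:
    """Return (PO, NO) for one character under the assumed normalization."""
    pos_total = sum(pos_counts.values())  # sum of f(pc_j) over the m0 distinct characters
    neg_total = sum(neg_counts.values())  # sum of f(nc_j) over the n0 distinct characters
    p = pos_counts[ch] / pos_total if pos_total else 0.0
    n = neg_counts[ch] / neg_total if neg_total else 0.0
    if p + n == 0:
        return 0.0, 0.0  # character unseen in both dictionaries: neutral
    return p / (p + n), n / (p + n)


def word_polarity(word: str, pos_counts: Counter, neg_counts: Counter) -> float:
    """Assumed word polarity: mean of (PO - NO) over the word's characters."""
    if not word:
        return 0.0
    total = 0.0
    for ch in word:
        po, no = char_weights(ch, pos_counts, neg_counts)
        total += po - no
    return total / len(word)


# Hypothetical miniature dictionaries, for illustration only.
POS = char_counts(["好", "美好", "优秀"])
NEG = char_counts(["坏", "恶劣", "不好"])
```

Dividing by each dictionary's total character count is what compensates for dictionaries of different sizes: a character is compared by its relative frequency in each dictionary rather than by its raw count.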
In the above technical solution, preferably, a word polarity influence value of the word on subsequent words is set according to the word category to which the word belongs and the position of the word in the document sentence; and the sentence polarity value computation model comprises the following formula: [formula missing from the source text] where S_p denotes the sentence polarity value of the document sentence p, S_wj denotes the word polarity value of the j-th word in the document sentence, g(w_j) denotes the word polarity influence value of the j-th word, a denotes the number of words in the document sentence, and a is a positive integer.
In this technical solution, a document sentence has a more complex form than a word. Words of different word categories in a document sentence, especially words that modify sentiment-bearing words, such as degree adverbs and negation words, affect the sentiment orientation of the sentence. Examples of degree adverbs are "most" and "slightly", and examples of negation words are "not" and "non-"; "I dislike him" and "I do not dislike him", for instance, express completely different meanings. The position of a word in the document sentence also affects the sentiment orientation of the sentence: a degree adverb placed before a negation word influences the sentence differently from a degree adverb placed after it. Therefore, the calculation of the sentence polarity value takes into account degree adverbs, negation words, and the relative position of degree adverbs and negation words, so that the sentence polarity value of the document sentence is calculated accurately.
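A minimal Python sketch of a position-aware sentence score is given below. It is not the sentence polarity value computation model of the source, whose formula is missing; it only illustrates the idea that negation words and degree adverbs act on the following sentiment-bearing word and that their relative order matters. The word lists, the numeric weights, and the damping rule for a degree adverb that follows a negation word are all assumptions.

```python
from typing import Dict, Sequence

NEGATIONS = {"not", "never"}                              # hypothetical negation words
DEGREES = {"most": 2.0, "very": 2.0, "slightly": 0.5}     # hypothetical degree weights


def sentence_polarity(words: Sequence[str], word_polarity: Dict[str, float]) -> float:
    """Sum word polarity values, each scaled by the modifiers that precede it."""
    total = 0.0
    factor = 1.0      # accumulated influence of pending modifiers
    negated = False   # whether a negation word is pending
    for w in words:
        if w in NEGATIONS:
            factor *= -1.0
            negated = True
        elif w in DEGREES:
            # Assumption: a degree adverb after a negation word softens the
            # negation ("not very good"), while one before it intensifies
            # the negated meaning ("very not good").
            factor *= 0.5 if negated else DEGREES[w]
        else:
            s = word_polarity.get(w, 0.0)
            if s != 0.0:
                total += factor * s
                factor, negated = 1.0, False  # modifiers are consumed
            # Neutral words leave pending modifiers untouched.
    return total
```

Under these assumptions, the tokens of "I dislike him" and "I do not dislike him" receive scores of opposite sign, and placing a degree adverb before or after a negation word produces scores of different magnitude.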
Another aspect of the present invention provides an information processing system, comprising: a first determination unit, configured to obtain the document sentences in a current document and the words in each document sentence, and to determine the word polarity value of each word according to a preset dictionary; a computing unit, configured to calculate the sentence polarity value of each document sentence according to each word in the document sentence, the word polarity value of the word, and a sentence polarity value computation model; and a processing unit, configured to determine the sentiment orientation of the current document according to the sentence polarity value of each document sentence in the current document and a feature word set.
In this technical solution, the sentiment orientation of the current document is determined by the document sentences that make up the document, the sentiment orientation of a document sentence is determined by the words that make up the sentence, and the sentiment orientation of a word is determined by the characters that make up the word. Therefore, the sentence polarity value of each document sentence is determined accurately from the word polarity values of the words in the current document, and the sentiment orientation of the current document can then be determined accurately from the sentence polarity values of its document sentences and the feature word set. For example, it can be determined whether the current document expresses a commendatory, derogatory, or neutral view, which provides useful support for data analysis of the current document.
In the above technical solution, preferably, the processing unit comprises: a classification unit, configured to classify the document sentences according to the sentence polarity value of each document sentence, to obtain the commendatory sentences and the derogatory sentences among them; and a second determination unit, configured to determine, according to the commendatory sentences, the derogatory sentences, and the feature word set, a first association coefficient between all commendatory sentences and the feature word set and a second association coefficient between all derogatory sentences and the feature word set, to determine the topic polarity value of the current document according to the first association coefficient, the second association coefficient, each document sentence, and the sentence polarity value of each document sentence, and to classify the current document according to its topic polarity value and determine the sentiment orientation of the current document according to the classification result.
In this technical solution, the document sentences are classified into commendatory sentences and derogatory sentences according to their sentence polarity values. The document sentences in the current document are not independent of one another but have certain associations, and the feature words each sentence uses to describe the same current topic may differ: some feature words appear only in commendatory sentences, others appear only in derogatory sentences, and some appear in both. As a result, both the commendatory sentences and the derogatory sentences influence the sentiment orientation of the current document. The feature word set related to the current topic is therefore obtained in order to calculate the association coefficient between all commendatory sentences and the feature word set and the association coefficient between all derogatory sentences and the feature word set, so that the topic polarity value of the current document can be calculated accurately and the sentiment orientation of the current document toward the current topic determined accurately.
In the above technical solution, preferably, the second determination unit is specifically configured to receive a calculation command and calculate the topic polarity value of the current document by the following formula: [formula missing from the source text] where W(d) denotes the topic polarity value of the current document d, α_pos denotes the first association coefficient between all commendatory sentences and the feature word set, β_neg denotes the second association coefficient between all derogatory sentences and the feature word set, S_pos(p_x) denotes the sentence polarity value of the x-th commendatory sentence, S_neg(p_y) denotes the sentence polarity value of the y-th derogatory sentence, k denotes the number of commendatory sentences, l denotes the number of derogatory sentences, and k and l are positive integers.
In this technical solution, the topic polarity value of the current document can be calculated accurately by the above formula, and the sentiment orientation of the current document toward the current topic can then be determined accurately from that value. For example, if the topic polarity value of the current document is greater than zero, the current document expresses a commendatory view of the current topic; if it is equal to zero, the document expresses a neutral view; and if it is less than zero, the document expresses a derogatory view.
In the above technical solution, preferably, the preset dictionary comprises a commendatory word dictionary and a derogatory word dictionary; and the first determination unit is specifically configured to receive a first determination command and determine the word polarity value of the word by the following procedure: counting, for each character in the word, a first number of times the character occurs in the commendatory word dictionary and a second number of times the character occurs in the derogatory word dictionary; determining a commendatory weight value and a derogatory weight value of the character according to the first number, the second number, the commendatory word dictionary, and the derogatory word dictionary; and calculating the character polarity value of the character according to the commendatory weight value and the derogatory weight value, and determining the word polarity value of the word from the character polarity values of its characters.
In this technical solution, the number of times each character in the word occurs in the commendatory word dictionary and in the derogatory word dictionary is counted. If a character occurs more often in the commendatory word dictionary, it has a commendatory tendency; if it occurs more often in the derogatory word dictionary, it has a derogatory tendency; otherwise it is neutral. The character polarity value of a character can therefore be determined fairly accurately from the two dictionaries, and the word polarity value of each word can in turn be determined accurately from the character polarity values.
In the above technical solution, preferably, the first determination unit is specifically configured to receive a second determination command and determine the commendatory weight value and the derogatory weight value by the following formulas: [formulas missing from the source text] where PO_ci denotes the commendatory weight value of the i-th character in the word, NO_ci denotes the derogatory weight value of the i-th character in the word, f(pc_i) denotes the first number, that is, the number of times the i-th character occurs in the commendatory word dictionary, f(nc_i) denotes the second number, that is, the number of times the i-th character occurs in the derogatory word dictionary, m0 denotes the number of distinct commendatory characters in the commendatory word dictionary, n0 denotes the number of distinct derogatory characters in the derogatory word dictionary, m0 and n0 are positive integers, f(pc_j) denotes a third number, that is, the number of times the j-th commendatory character of the commendatory word dictionary occurs in that dictionary, and f(nc_j) denotes a fourth number, that is, the number of times the j-th derogatory character of the derogatory word dictionary occurs in that dictionary.
In this technical solution, because the commendatory word dictionary and the derogatory word dictionary contain different numbers of words, character polarity values computed from raw counts would not be accurate enough. The above formulas therefore normalize the commendatory and derogatory weights of each character, so that the character polarity value of the character, and in turn the word polarity value of the word, can be obtained accurately.
In the above technical solution, preferably, the computing unit comprises a setting unit, configured to set a word polarity influence value of the word on subsequent words according to the word category to which the word belongs and the position of the word in the document sentence; and the sentence polarity value computation model comprises the following formula: [formula missing from the source text] where S_p denotes the sentence polarity value of the document sentence p, S_wj denotes the word polarity value of the j-th word in the document sentence, g(w_j) denotes the word polarity influence value of the j-th word, a denotes the number of words in the document sentence, and a is a positive integer.
In this technical solution, a document sentence has a more complex form than a word. Words of different word categories in a document sentence, especially words that modify sentiment-bearing words, such as degree adverbs and negation words, affect the sentiment orientation of the sentence. Examples of degree adverbs are "most" and "slightly", and examples of negation words are "not" and "non-"; "I dislike him" and "I do not dislike him", for instance, express completely different meanings. The position of a word in the document sentence also affects the sentiment orientation of the sentence: a degree adverb placed before a negation word influences the sentence differently from a degree adverb placed after it. Therefore, the calculation of the sentence polarity value takes into account degree adverbs, negation words, and the relative position of degree adverbs and negation words, so that the sentence polarity value of the document sentence is calculated accurately. With the technical solution of the present invention, the topic polarity value of the current document with respect to the current topic can be calculated accurately, so that the sentiment orientation of the current document can be analyzed accurately.
Brief description of the drawings
Fig. 1 shows a flow diagram of an information processing method according to an embodiment of the present invention;
Fig. 2 shows a structural diagram of an information processing system according to an embodiment of the present invention;
Fig. 3 shows a schematic diagram of the principle of an information processing system according to an embodiment of the present invention.
Detailed description of the embodiments
To make the above objects, features, and advantages of the present invention easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another.
Many specific details are set forth in the following description to provide a full understanding of the present invention. The present invention may, however, also be implemented in ways other than those described here, and the scope of protection of the present invention is therefore not limited by the specific embodiments described below.
Fig. 1 shows a flow diagram of an information processing method according to an embodiment of the present invention.
As shown in Fig. 1, the information processing method according to an embodiment of the present invention comprises:
Step 102, obtaining the document sentences in a current document and the words in each document sentence, and determining the word polarity value of each word according to a preset dictionary;
Step 104, calculating the sentence polarity value of each document sentence according to each word in the document sentence, the word polarity value of the word, and a sentence polarity value computation model;
Step 106, determining the sentiment orientation of the current document according to the sentence polarity value of each document sentence in the current document and a feature word set.
In this technical solution, the sentiment orientation of the current document is determined by the document sentences that make up the document, the sentiment orientation of a document sentence is determined by the words that make up the sentence, and the sentiment orientation of a word is determined by the characters that make up the word. Therefore, the sentence polarity value of each document sentence is determined accurately from the word polarity values of the words in the current document, and the sentiment orientation of the current document can then be determined accurately from the sentence polarity values of its document sentences and the feature word set. For example, it can be determined whether the current document expresses a commendatory, derogatory, or neutral view, which provides useful support for data analysis of the current document.
Specifically, a current document related to the current topic is obtained, and the characters in each word are counted to determine the word polarity value of the word accurately. Next, when the sentence polarity value of a document sentence is determined, words of different word categories in the sentence, especially words that modify sentiment-bearing words, such as degree adverbs and negation words, affect the sentiment orientation of the sentence; examples of degree adverbs are "most" and "slightly", and examples of negation words are "not" and "non-". The position of a word in the sentence also affects the sentiment orientation of the sentence; for example, a degree adverb placed before a negation word influences the sentence differently from one placed after it. Taking into account both the word category of each word and its position in the document sentence therefore allows the sentence polarity value to be determined accurately. Finally, when the sentiment orientation of the current document is determined, the feature words that different document sentences use to describe the same current topic differ, so the sentiment orientation of the current document toward the current topic can be determined accurately from the sentence polarity value of each document sentence in the current document and the feature word set related to the current topic. Preferably, when the current document is obtained, it is retrieved according to the feature word set related to the current topic and then preprocessed. The preprocessing specifically comprises deleting meaningless words and characters from the current document, where the characters include digits and punctuation marks, in order to improve the accuracy of the sentiment orientation determined for the current document. A word may consist of multiple characters; a word may of course also consist of a single character.
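The preprocessing step can be sketched in Python as follows. The stop-word list, the function names, and the assumption that the document has already been split into sentences and tokens are all hypothetical; the source only states that meaningless words, digits, and punctuation marks are deleted.

```python
import unicodedata
from typing import Iterable, List, Set

STOP_WORDS: Set[str] = {"的", "了"}  # hypothetical list of meaningless words


def is_noise_char(ch: str) -> bool:
    """True for digits, whitespace, and any Unicode punctuation character."""
    return ch.isdigit() or ch.isspace() or unicodedata.category(ch).startswith("P")


def preprocess(tokens: Iterable[str], stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Drop stop words, then strip digits and punctuation from the remaining tokens.

    Sentence splitting is assumed to have been done beforehand, since
    removing punctuation would otherwise erase the sentence boundaries.
    """
    cleaned = []
    for tok in tokens:
        if tok in stop_words:
            continue
        kept = "".join(ch for ch in tok if not is_noise_char(ch))
        if kept:
            cleaned.append(kept)
    return cleaned
```

Using Unicode categories rather than an ASCII punctuation list means that full-width punctuation, which is common in Chinese text, is removed as well.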
In the above technical solution, preferably, step 106 specifically includes: classifying the document sentences according to the sentence polarity value of each document sentence, to obtain the commendatory sentences and the derogatory sentences among the document sentences; determining, from the commendatory sentences, the derogatory sentences, and the feature word set, a first association coefficient between all commendatory sentences and the feature word set and a second association coefficient between all derogatory sentences and the feature word set; determining the topic polarity value of the current document from the first association coefficient, the second association coefficient, each document sentence, and the sentence polarity value of each document sentence; and classifying the current document according to its topic polarity value and determining the emotion tendency of the current document from the classification result.
In this technical solution, the document sentences are classified into commendatory sentences and derogatory sentences according to their sentence polarity values. The document sentences in the current document are not independent of one another but are related to some degree, and the feature words each document sentence uses to describe the same current topic differ: some feature words appear only in commendatory sentences, others appear only in derogatory sentences, and still others appear in both. As a result, both the commendatory sentences and the derogatory sentences affect the emotion tendency of the current document. Therefore, the feature word set related to the current topic is obtained in order to calculate the association coefficients of all commendatory sentences and of all derogatory sentences in the current document with the feature word set, that is, their association coefficients with the current topic. The topic polarity value of the current document with respect to the current topic can then be calculated accurately, and the emotion tendency of the current document can be determined accurately from it. For example, if the topic polarity value of the current document is greater than zero, the current document holds a commendatory view of the current topic; if it is equal to zero, the current document holds a neutral view; and if it is less than zero, the current document holds a derogatory view.
Of course, if the document sentences of the current document are independent of one another, that is, no association exists between them, the topic polarity value of the current document can be obtained directly by summing the sentence polarity values of the document sentences, and the emotion tendency of the current document toward the current topic is then obtained from the topic polarity value. Document sentences include, but are not limited to, commendatory sentences and derogatory sentences; for example, they may also include neutral sentences, and the number of commendatory sentences, derogatory sentences, or neutral sentences in a document may be zero. In addition, the summations in the formula for the topic polarity value of the current document represent the influence of the commendatory sentences and the derogatory sentences on the topic polarity value, which improves the accuracy of the calculated topic polarity value.
In the above technical solution, preferably, a calculation command is received and the topic polarity value of the current document is calculated by the following formula: [formula not reproduced in this text] where $W(d)$ denotes the topic polarity value of the current document $d$, $\alpha_{pos}$ denotes the first association coefficient between all commendatory sentences and the feature word set, $\beta_{neg}$ denotes the second association coefficient between all derogatory sentences and the feature word set, $S_{pos}(p_x)$ denotes the sentence polarity value of the $x$-th commendatory sentence, $S_{neg}(p_y)$ denotes the sentence polarity value of the $y$-th derogatory sentence, $k$ denotes the number of commendatory sentences, $l$ denotes the number of derogatory sentences, and $k$ and $l$ are positive integers.
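The formula itself does not survive in this text. One form that is consistent with every symbol defined above and with the statement that the summations express the influence of the commendatory and derogatory sentences is given below; it is offered as an assumption, not as the patent's exact expression.

```latex
W(d) = \alpha_{pos} \sum_{x=1}^{k} S_{pos}(p_x) + \beta_{neg} \sum_{y=1}^{l} S_{neg}(p_y)
```

Under this reading the derogatory sentence polarity values are themselves negative, so the second term lowers $W(d)$, and setting $\alpha_{pos} = \beta_{neg} = 1$ recovers the plain sum described for mutually independent document sentences.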
In this technical solution, the topic polarity value of the current document can be calculated accurately by the above formula, so that the emotion tendency of the current document toward the current topic is determined accurately from the topic polarity value. For example, if the topic polarity value of the current document is greater than zero, the current document holds a commendatory view of the current topic; if it is equal to zero, the current document holds a neutral view; and if it is less than zero, the current document holds a derogatory view.
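The sign-based classification above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the weighted-sum form of the topic polarity value is assumed (the formula is not reproduced in this text), the function and parameter names are hypothetical, and the two association coefficients are passed in as plain numbers because this section does not say how they are derived.

```python
def topic_polarity(sentence_polarities, alpha_pos=1.0, beta_neg=1.0):
    """Assumed form: weighted sum of commendatory and derogatory sentence values.

    With both coefficients left at 1.0 this is the plain sum used when the
    document sentences are treated as mutually independent.
    """
    commendatory = [s for s in sentence_polarities if s > 0]
    derogatory = [s for s in sentence_polarities if s < 0]
    return alpha_pos * sum(commendatory) + beta_neg * sum(derogatory)


def classify(topic_value):
    """Map the topic polarity value to a view, following the sign rule in the text."""
    if topic_value > 0:
        return "commendatory"
    if topic_value < 0:
        return "derogatory"
    return "neutral"
```

Neutral sentences (value zero) contribute nothing to either sum, so they can be left in the input list.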
In the above technical solution, preferably, the preset dictionary includes a commendatory word dictionary and a derogatory word dictionary; and a first determination command is received and the word polarity value of the word is determined by the following procedure: counting, for each character in the word, a first count of occurrences of the character in the commendatory word dictionary and a second count of occurrences of the character in the derogatory word dictionary; determining a commendatory weight value and a derogatory weight value of the character from the first count, the second count, the commendatory word dictionary, and the derogatory word dictionary; and calculating the character polarity value of the character from the commendatory weight value and the derogatory weight value, and determining the word polarity value of the word by accumulating the character polarity values of its characters.
In this technical solution, the number of times each character of a word occurs in the commendatory word dictionary and in the derogatory word dictionary is counted separately. If a character occurs more often in the commendatory word dictionary, the character has a commendatory tendency; if it occurs more often in the derogatory word dictionary, it has a derogatory tendency; otherwise the character is neutral. The character polarity value of a character can therefore be determined fairly accurately from the commendatory word dictionary and the derogatory word dictionary, and the word polarity value of each word can in turn be determined accurately from the character polarity values.
In the above technical solution, preferably, a second determination command is received and the commendatory weight value and the derogatory weight value are determined by the following formulas: [formulas not reproduced in this text] where $PO_{c_i}$ denotes the commendatory weight value of the $i$-th character in the word, $NO_{c_i}$ denotes the derogatory weight value of the $i$-th character in the word, $f(pc_i)$ denotes the first count, that is, the number of times the $i$-th character occurs in the commendatory word dictionary, $f(nc_i)$ denotes the second count, that is, the number of times the $i$-th character occurs in the derogatory word dictionary, $m_0$ denotes the number of distinct commendatory characters in the commendatory word dictionary, $n_0$ denotes the number of distinct derogatory characters in the derogatory word dictionary, $m_0$ and $n_0$ are positive integers, $f(pc_j)$ denotes a third count, that is, the number of times the $j$-th commendatory character of the commendatory word dictionary occurs in that dictionary, and $f(nc_j)$ denotes a fourth count, that is, the number of times the $j$-th derogatory character of the derogatory word dictionary occurs in that dictionary.
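The formulas themselves are missing from this text. One form that uses exactly the quantities defined above, offered as an assumption rather than as the patent's exact expression, divides each count by the total character count of its dictionary before the two are compared:

```latex
PO_{c_i} = \frac{f(pc_i) \big/ \sum_{j=1}^{m_0} f(pc_j)}
                {f(pc_i) \big/ \sum_{j=1}^{m_0} f(pc_j) + f(nc_i) \big/ \sum_{j=1}^{n_0} f(nc_j)},
\qquad
NO_{c_i} = \frac{f(nc_i) \big/ \sum_{j=1}^{n_0} f(nc_j)}
                {f(pc_i) \big/ \sum_{j=1}^{m_0} f(pc_j) + f(nc_i) \big/ \sum_{j=1}^{n_0} f(nc_j)}
```

Under this assumed form $PO_{c_i} + NO_{c_i} = 1$ for any character that occurs in at least one dictionary, so neither weight is inflated merely because one dictionary is larger than the other.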
In this technical solution, because the commendatory word dictionary and the derogatory word dictionary contain unequal numbers of words, character polarity values calculated from raw counts would not be accurate enough. The commendatory weight and the derogatory weight of a character are therefore calculated in normalized form by the above formulas, so that the character polarity value of the character, and in turn the word polarity value of the word, can be obtained accurately.
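The effect of the normalization can be seen in a small sketch. The code below assumes the ratio-of-normalized-frequencies form for the two weights (the formulas are not reproduced in this text), uses toy English word lists in place of the dictionaries, and treats a character found in neither dictionary as neutral; all names are hypothetical, and both dictionaries are assumed to be non-empty.

```python
from collections import Counter


def char_counts(dictionary_words):
    """Count how often each character occurs across all words of a dictionary."""
    return Counter(ch for word in dictionary_words for ch in word)


def char_weights(ch, pos_counts, neg_counts):
    """Return (commendatory weight, derogatory weight) under the assumed normalization."""
    p = pos_counts[ch] / sum(pos_counts.values())  # share within commendatory dictionary
    n = neg_counts[ch] / sum(neg_counts.values())  # share within derogatory dictionary
    if p + n == 0:
        return 0.0, 0.0  # unseen in both dictionaries: treated as neutral (assumption)
    return p / (p + n), n / (p + n)


def char_polarity(ch, pos_counts, neg_counts):
    """Character polarity value: commendatory weight minus derogatory weight."""
    po, no = char_weights(ch, pos_counts, neg_counts)
    return po - no


# Toy dictionaries of unequal size, for illustration only.
POS = char_counts(["good", "glad"])
NEG = char_counts(["bad"])
```

Here the character `a` occurs once in each toy dictionary, yet `char_polarity("a", POS, NEG)` is negative, because one occurrence is a larger share of the smaller derogatory dictionary. Raw counts alone would have called it neutral.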
In the above technical solution, preferably, a word polarity influence value that the word exerts on subsequent words is set according to the word category to which the word belongs and the position of the word in the document sentence; and the sentence polarity value calculation model includes the following formula: [formula not reproduced in this text] where $S_p$ denotes the sentence polarity value of the document sentence $p$, $S_{w_j}$ denotes the word polarity value of the $j$-th word in the document sentence, $g(w_j)$ denotes the word polarity influence value of the $j$-th word, $a$ denotes the number of words in the document sentence, and $a$ is a positive integer.
In this technical solution, a document sentence has a more complex form than a word, and words of different word categories in a document sentence (especially words that modify emotion-bearing words, such as degree adverbs and negative words) affect the emotion tendency of the sentence. For example, degree adverbs include "most" and "slightly", and negative words include "not" and "non-"; the sentences "I dislike him" and "I do not dislike him" express entirely different meanings. The position of a word in the document sentence also affects the emotion tendency of the sentence: a degree adverb placed before a negative word influences the emotion tendency differently from a degree adverb placed after it. Specifically, when the degree adverb precedes the negative word, the combination reinforces the reversal of the emotion tendency; when the degree adverb follows the negative word, the combination weakens the reversal to some extent and may not change the emotion tendency at all. That is, when the word polarity influence value of a word is calculated, its influence on subsequent words needs to be considered. Therefore, in calculating the sentence polarity value of a document sentence, the degree adverbs, the negative words, and the relative positions of degree adverbs and negative words are considered together, so that the sentence polarity value is calculated accurately.
Fig. 2 shows a schematic structural diagram of an information processing system according to an embodiment of the present invention.
As shown in Fig. 2, an information processing system 200 according to an embodiment of the present invention includes: a first determination unit 202, configured to obtain the document sentences in a current document and the words in the document sentences, and to determine the word polarity value of each word according to a preset dictionary; a calculation unit 204, configured to calculate the sentence polarity value of a document sentence from each word in the document sentence, the word polarity value of the word, and a sentence polarity value calculation model; and a processing unit 206, configured to determine the emotion tendency of the current document from the sentence polarity value of each document sentence in the current document and a feature word set.
In this technical solution, the emotion tendency of the current document is determined by the document sentences that make it up, the emotion tendency of a document sentence is determined by the words that make it up, and the emotion tendency of a word is determined by the characters that make it up. Therefore, the word polarity value of each word in the current document is computed so that the sentence polarity value of each document sentence can be determined accurately, and the emotion tendency of the current document can then be determined accurately from the sentence polarity values of its document sentences and the feature word set. For example, it can be determined accurately whether the current document holds a commendatory, derogatory, or neutral view, which provides useful support for data analysis of the current document.
Specifically, a current document related to the current topic is obtained, and the characters in each word are counted so that the word polarity value of the word can be determined accurately. Next, when the sentence polarity value of a document sentence is determined, words of different word categories in the document sentence (especially words that modify emotion-bearing words, such as degree adverbs and negative words) affect the emotion tendency of the document sentence; for example, degree adverbs include "most" and "slightly", and negative words include "not" and "non-". The position of a word in the document sentence also affects the emotion tendency of the sentence; for example, a degree adverb placed before a negative word influences the emotion tendency differently from a degree adverb placed after it. Therefore, the sentence polarity value of a document sentence can be determined accurately by jointly considering the word category of each word and the position of the word in the document sentence. Finally, when the emotion tendency of the current document is determined, different document sentences describe the same current topic with different feature words; therefore, the emotion tendency of the current document toward the current topic can be determined accurately from the sentence polarity value of each document sentence in the current document and the feature word set related to the current topic. Preferably, when the current document is obtained, it is retrieved according to the feature word set related to the current topic and is then preprocessed. The preprocessing specifically includes deleting meaningless words and characters from the current document, where the deleted characters include digits, punctuation marks, and the like, so as to improve the accuracy of the emotion tendency determined for the current document. A word may consist of multiple characters; of course, a single character may also constitute a word.
In the above technical solution, preferably, the processing unit 206 includes: a classification unit 2062, configured to classify the document sentences according to the sentence polarity value of each document sentence, to obtain the commendatory sentences and the derogatory sentences among the document sentences; and a second determination unit 2064, configured to determine, from the commendatory sentences, the derogatory sentences, and the feature word set, a first association coefficient between all commendatory sentences and the feature word set and a second association coefficient between all derogatory sentences and the feature word set, to determine the topic polarity value of the current document from the first association coefficient, the second association coefficient, each document sentence, and the sentence polarity value of each document sentence, to classify the current document according to its topic polarity value, and to determine the emotion tendency of the current document from the classification result.
In this technical solution, the document sentences are classified into commendatory sentences and derogatory sentences according to their sentence polarity values. The document sentences in the current document are not independent of one another but are related to some degree, and the feature words each document sentence uses to describe the same current topic differ: some feature words appear only in commendatory sentences, others appear only in derogatory sentences, and still others appear in both. As a result, both the commendatory sentences and the derogatory sentences affect the emotion tendency of the current document. Therefore, the feature word set related to the current topic is obtained in order to calculate the association coefficients of all commendatory sentences and of all derogatory sentences in the current document with the feature word set, that is, their association coefficients with the current topic. The topic polarity value of the current document with respect to the current topic can then be calculated accurately, and the emotion tendency of the current document can be determined accurately from it. For example, if the topic polarity value of the current document is greater than zero, the current document holds a commendatory view of the current topic; if it is equal to zero, the current document holds a neutral view; and if it is less than zero, the current document holds a derogatory view.
Of course, if the document sentences of the current document are independent of one another, that is, no association exists between them, the topic polarity value of the current document can be obtained directly by summing the sentence polarity values of the document sentences, and the emotion tendency of the current document toward the current topic is then obtained from the topic polarity value. Document sentences include, but are not limited to, commendatory sentences and derogatory sentences; for example, they may also include neutral sentences, and the number of commendatory sentences, derogatory sentences, or neutral sentences in a document may be zero. In addition, the summations in the formula for the topic polarity value of the current document represent the influence of the commendatory sentences and the derogatory sentences on the topic polarity value, which improves the accuracy of the calculated topic polarity value.
In the above technical solution, preferably, the second determination unit 2064 is specifically configured to receive a calculation command and calculate the topic polarity value of the current document by the following formula: [formula not reproduced in this text] where $W(d)$ denotes the topic polarity value of the current document $d$, $\alpha_{pos}$ denotes the first association coefficient between all commendatory sentences and the feature word set, $\beta_{neg}$ denotes the second association coefficient between all derogatory sentences and the feature word set, $S_{pos}(p_x)$ denotes the sentence polarity value of the $x$-th commendatory sentence, $S_{neg}(p_y)$ denotes the sentence polarity value of the $y$-th derogatory sentence, $k$ denotes the number of commendatory sentences, $l$ denotes the number of derogatory sentences, and $k$ and $l$ are positive integers.
In this technical solution, the topic polarity value of the current document can be calculated accurately by the above formula, so that the emotion tendency of the current document toward the current topic is determined accurately from the topic polarity value. For example, if the topic polarity value of the current document is greater than zero, the current document holds a commendatory view of the current topic; if it is equal to zero, the current document holds a neutral view; and if it is less than zero, the current document holds a derogatory view.
In the above technical solution, preferably, the preset dictionary includes a commendatory word dictionary and a derogatory word dictionary; and the first determination unit 202 is specifically configured to receive a first determination command and determine the word polarity value of the word by the following procedure: counting, for each character in the word, a first count of occurrences of the character in the commendatory word dictionary and a second count of occurrences of the character in the derogatory word dictionary; determining a commendatory weight value and a derogatory weight value of the character from the first count, the second count, the commendatory word dictionary, and the derogatory word dictionary; and calculating the character polarity value of the character from the commendatory weight value and the derogatory weight value, and determining the word polarity value of the word by accumulating the character polarity values of its characters.
In this technical solution, the number of times each character of a word occurs in the commendatory word dictionary and in the derogatory word dictionary is counted separately. If a character occurs more often in the commendatory word dictionary, the character has a commendatory tendency; if it occurs more often in the derogatory word dictionary, it has a derogatory tendency; otherwise the character is neutral. The character polarity value of a character can therefore be determined fairly accurately from the commendatory word dictionary and the derogatory word dictionary, and the word polarity value of each word can in turn be determined accurately from the character polarity values.
In the above technical solution, preferably, the first determination unit 202 is specifically configured to receive a second determination command and determine the commendatory weight value and the derogatory weight value by the following formulas: [formulas not reproduced in this text] where $PO_{c_i}$ denotes the commendatory weight value of the $i$-th character in the word, $NO_{c_i}$ denotes the derogatory weight value of the $i$-th character in the word, $f(pc_i)$ denotes the first count, that is, the number of times the $i$-th character occurs in the commendatory word dictionary, $f(nc_i)$ denotes the second count, that is, the number of times the $i$-th character occurs in the derogatory word dictionary, $m_0$ denotes the number of distinct commendatory characters in the commendatory word dictionary, $n_0$ denotes the number of distinct derogatory characters in the derogatory word dictionary, $m_0$ and $n_0$ are positive integers, $f(pc_j)$ denotes a third count, that is, the number of times the $j$-th commendatory character of the commendatory word dictionary occurs in that dictionary, and $f(nc_j)$ denotes a fourth count, that is, the number of times the $j$-th derogatory character of the derogatory word dictionary occurs in that dictionary.
In this technical solution, because the commendatory word dictionary and the derogatory word dictionary contain unequal numbers of words, character polarity values calculated from raw counts would not be accurate enough. The commendatory weight and the derogatory weight of a character are therefore calculated in normalized form by the above formulas, so that the character polarity value of the character, and in turn the word polarity value of the word, can be obtained accurately.
In the above technical solution, preferably, the calculation unit 204 includes a setting unit 2042, configured to set a word polarity influence value that the word exerts on subsequent words according to the word category to which the word belongs and the position of the word in the document sentence; and the sentence polarity value calculation model includes the following formula: [formula not reproduced in this text] where $S_p$ denotes the sentence polarity value of the document sentence $p$, $S_{w_j}$ denotes the word polarity value of the $j$-th word in the document sentence, $g(w_j)$ denotes the word polarity influence value of the $j$-th word, $a$ denotes the number of words in the document sentence, and $a$ is a positive integer.
In this technical solution, a document sentence has a more complex form than a word, and words of different word categories in a document sentence (especially words that modify emotion-bearing words, such as degree adverbs and negative words) affect the emotion tendency of the sentence. For example, degree adverbs include "most" and "slightly", and negative words include "not" and "non-"; the sentences "I dislike him" and "I do not dislike him" express entirely different meanings. The position of a word in the document sentence also affects the emotion tendency of the sentence: a degree adverb placed before a negative word influences the emotion tendency differently from a degree adverb placed after it. Specifically, when the degree adverb precedes the negative word, the combination reinforces the reversal of the emotion tendency; when the degree adverb follows the negative word, the combination weakens the reversal to some extent and may not change the emotion tendency at all. That is, when the word polarity influence value of a word is calculated, its influence on subsequent words needs to be considered. Therefore, in calculating the sentence polarity value of a document sentence, the degree adverbs, the negative words, and the relative positions of degree adverbs and negative words are considered together, so that the sentence polarity value is calculated accurately.
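The order-dependent effect described above can be made concrete with a small lookup. This is only a sketch: the word sets, the numeric weights, and the choice of dividing by the degree weight to model a "weakened" reversal are all assumptions introduced for illustration, since the text gives the qualitative behaviour but not the values.

```python
# Hypothetical word sets and weights, using English glosses.
NEGATIONS = {"not"}
DEGREES = {"very": 2.0, "most": 3.0}


def modifier_factor(prev2, prev1):
    """Factor applied to an emotion word given the two words before it.

    prev2 comes first in the sentence; prev1 is adjacent to the emotion word.
    """
    if prev2 in DEGREES and prev1 in NEGATIONS:
        return -DEGREES[prev2]        # degree adverb before negation: reversal reinforced
    if prev2 in NEGATIONS and prev1 in DEGREES:
        return -1.0 / DEGREES[prev1]  # degree adverb after negation: reversal weakened
    if prev1 in NEGATIONS:
        return -1.0                   # plain negation
    if prev1 in DEGREES:
        return DEGREES[prev1]         # plain intensification
    return 1.0                        # no modifier
```

With these toy values, "very not X" scales the polarity of X by -2.0 while "not very X" scales it by -0.5, which reproduces the asymmetry the paragraph describes.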
Fig. 3 shows a schematic diagram of the principle of an information processing system according to an embodiment of the present invention.
As shown in Fig. 3, an information processing system 300 according to an embodiment of the present invention (corresponding to the information processing system 200 in the embodiment shown in Fig. 2) includes a word polarity value determination unit, a sentence polarity value determination unit, and an emotion tendency determination unit. First, multiple current documents related to the current topic are obtained and each current document is preprocessed. The preprocessing includes function-word processing, that is, deleting function words (meaningless words) and characters (digits, punctuation marks, and the like) from each current document, to obtain the document sentences in each current document and the words in those sentences. The word polarity value determination unit then counts the number of times each character of a word occurs in the preset dictionaries (the commendatory word dictionary and the derogatory word dictionary), determines the character polarity value of each character, and sums the character polarity values of the characters in the word to obtain the word polarity value of the word. The sentence polarity value determination unit analyzes how word collocations in a document sentence affect its sentence polarity value, builds dictionaries of common negative words and degree adverbs on that basis, and establishes a quantitative polarity calculation method, so that the sentence polarity value of a document sentence is determined according to the different collocations of its words. The emotion tendency determination unit jointly considers the sentence polarity values of the document sentences, the feature words of the current topic, and the effect on the current document of the degree of association between the document sentences and the current topic; it establishes a quantitative analysis model of the current document with respect to the current topic and determines the emotion tendency of the current document toward the current topic, for example whether the current document holds a commendatory, derogatory, or neutral view of the current topic.
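The data flow between the three units can be sketched as a small pipeline. The class and function names below are hypothetical, and the toy units are stand-ins that only show how values pass from words to sentences to the document; they do not implement the calculation models described in this document.

```python
class SentimentPipeline:
    """Wires together three units in the order described for Fig. 3."""

    def __init__(self, word_unit, sentence_unit, orientation_unit):
        self.word_unit = word_unit                # word -> word polarity value
        self.sentence_unit = sentence_unit        # (words, word values) -> sentence value
        self.orientation_unit = orientation_unit  # sentence values -> view label

    def run(self, document):
        """document: list of sentences, each a list of already-segmented words."""
        sentence_values = []
        for words in document:
            word_values = [self.word_unit(w) for w in words]
            sentence_values.append(self.sentence_unit(words, word_values))
        return self.orientation_unit(sentence_values)


# Stand-in units, for illustration only.
TOY_LEXICON = {"good": 1.0, "bad": -1.0}


def toy_word_unit(word):
    return TOY_LEXICON.get(word, 0.0)


def toy_sentence_unit(words, word_values):
    return sum(word_values)


def toy_orientation_unit(sentence_values):
    total = sum(sentence_values)
    if total > 0:
        return "commendatory"
    if total < 0:
        return "derogatory"
    return "neutral"
```

Passing the units in as callables keeps each stage replaceable, so a character-based word unit or a modifier-aware sentence unit can be substituted without changing `run`.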
The technical solution of the present invention is described in detail below:
1. Word polarity value determination unit
The character polarity value is calculated by counting the number of times each character occurs in the commendatory word dictionary and in the derogatory word dictionary, and the word polarity value is then obtained by accumulating the character polarity values of the characters contained in the word. Specifically, a commendatory word dictionary and a derogatory word dictionary are constructed first. The emotion tendency of the current document toward the current topic is determined by the sentence polarity values of the document sentences, the sentence polarity value of a document sentence is in turn determined by the word polarity values of its words, and the word polarity value of a word is ultimately determined by the characters that form the word. A character-based word polarity algorithm is therefore used. Because the unequal numbers of sample words in the commendatory word dictionary and the derogatory word dictionary would otherwise affect the result, a normalized calculation is adopted, and the commendatory weight value and the derogatory weight value are calculated by the following formulas:
[formulas not reproduced in this text]
where $PO_{c_i}$ denotes the commendatory weight value of the $i$-th character in the word, $NO_{c_i}$ denotes the derogatory weight value of the $i$-th character in the word, $f(pc_i)$ denotes the first count, that is, the number of times the $i$-th character occurs in the commendatory word dictionary, $f(nc_i)$ denotes the second count, that is, the number of times the $i$-th character occurs in the derogatory word dictionary, $m_0$ denotes the number of distinct commendatory characters in the commendatory word dictionary, $n_0$ denotes the number of distinct derogatory characters in the derogatory word dictionary, $m_0$ and $n_0$ are positive integers, $f(pc_j)$ denotes a third count, that is, the number of times the $j$-th commendatory character of the commendatory word dictionary occurs in that dictionary, and $f(nc_j)$ denotes a fourth count, that is, the number of times the $j$-th derogatory character of the derogatory word dictionary occurs in that dictionary.
In the above formulas, if a character occurs more often in the commendatory word dictionary, the character has a commendatory tendency; if it occurs more often in the derogatory word dictionary, it has a derogatory tendency; otherwise the character is considered neutral. After $PO_{c_i}$ and $NO_{c_i}$ have been calculated, the character polarity value of the character is calculated by the following formula:
$$Z_{c_i} = PO_{c_i} - NO_{c_i}$$
where $Z_{c_i}$ denotes the character polarity value of the $i$-th character in the word, $PO_{c_i}$ denotes the commendatory weight value of the $i$-th character in the word, and $NO_{c_i}$ denotes the derogatory weight value of the $i$-th character in the word.
Suppose a word $w$ contains $m_1$ characters. The word polarity value of $w$ is obtained by summing the character polarity values of its characters:
$$Y_w = \sum_{j=1}^{m_1} Z_{c_j}$$
where $Y_w$ denotes the word polarity value of the word $w$, $m_1$ denotes the number of characters contained in the word $w$, and $Z_{c_j}$ denotes the character polarity value of the $j$-th character in the word $w$.
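The accumulation of character polarity values into a word polarity value can be sketched as follows. The lookup table and its values are hypothetical, and treating a character that is absent from the table as contributing zero is an assumption.

```python
def word_polarity(word, char_polarity_table):
    """Word polarity value: sum of the character polarity values of the word's characters."""
    # Characters missing from the table are assumed to be neutral (0.0).
    return sum(char_polarity_table.get(ch, 0.0) for ch in word)


# Hypothetical character polarity values, for illustration only.
TOY_CHAR_POLARITY = {"g": 0.5, "o": 0.25, "d": -0.25}
```

For the toy table above, `word_polarity("good", TOY_CHAR_POLARITY)` adds 0.5, 0.25, 0.25, and -0.25, giving 0.75; a repeated character is counted once per occurrence.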
2. Sentence polarity value determination unit
As the smallest semantic unit that expresses a view, a document sentence has a more complex form than a word. Although the word polarity values of its words reflect the view held by the document sentence to some extent, the two are not entirely consistent, so the sentence polarity value of a document sentence cannot be calculated simply by summing word polarity values. The following document sentences illustrate this:
A: I dislikes her.B: I very it is disagreeable she.
C: I does not dislike her.D: I less dislikes her.
For above-mentioned several document sentences, if passing through the cumulative sentence to calculate document sentence of word polarity number merely Polarity number, then the result analyzed is consistent, all expresses a kind of " passiveness " to obtain the emotion tendency of document sentence Attitude, it is clear that and C, D are actually subjected to expression and mean inconsistent, this is primarily due to negative word " no " and changes " disagreeable " Emotion tendency.In addition, A and B are compared, a degree adverb " very ", expressed viewpoint more than B ratio A are stronger;C It is compared with D, a degree adverb " too ", expressed viewpoint be not also identical more than D ratio C.Therefore, in the language for calculating document sentence When sentence polarity number, the tendentiousness of emotion word is not only analyzed, but also context should be considered on the tendentious influence of emotion word, spy It is not that the influence of the degree adverb and negative word of modification to entire document sentence emotion tendency is played to emotion word.To understand Certainly negative word and degree adverb bring emotion word change in polarity problem, establish common negative word word set and degree adverb word Collection, by the process of the emotional orientation analysis in document sentence, comprehensively considers the effect of negative word and degree adverb, Lai Shixian The analysis of the sentence polarity number of more accurate document sentence.Tables 1 and 2 lists partial negation word and degree adverb respectively:
Table 1
Table 2
Suppose a document sentence P = {w_1, w_2, ..., w_a}, in which the words are ordered by their position in the document sentence. Negation words and degree adverbs usually appear before the word they modify and often occur in combination, reinforcing or weakening one another. Therefore, when calculating the sentence polarity number of a document sentence, the influence of the two words immediately preceding the emotion word w_j must be considered, and the sentence polarity number of the document sentence can be calculated by the following formula:
Wherein, S_p denotes the sentence polarity number of the document sentence P, S_{w_j} denotes the word polarity number of the j-th word in the document sentence, g(w_j) denotes the word polarity effect value of the j-th word, a denotes the number of words in the document sentence, and a is a positive integer.
In the above formula, g(w_j) is a discrete function that expresses the word polarity effect of a word on the subsequent emotion word. Taking into account the positional relationship between degree adverbs and negation words, g(w_j) is calculated by the following formula:
Analysis of a large amount of text data shows that different positional relationships between a degree adverb and a negation word reflect different emotional intensities. When the degree adverb precedes the negation word, it tends to strengthen the reversed polarity of the emotion word; when the degree adverb follows the negation word, it tends to weaken the intensity of the reversed polarity to some extent, and may even leave the polarity of the emotion word unchanged. Based on the above analysis, the algorithm for calculating the sentence polarity number of a document sentence is as follows:
Input: P = {w_1, w_2, ..., w_a}; Output: P = {w_1, w_2, ..., w_a}
Step 1: for (each w_j) // recalculate the word polarity number of each word
if (j == 1)
if (j == 2) {
Case 1: w_j is a negation word;
Case 2: w_j is a degree adverb;
Case 3: w_j is neither a degree adverb nor a negation word; }
if (j >= 3) {
Case 1: neither w_{j-1} nor w_{j-2} is a degree adverb or a negation word;
Case 2: one of w_{j-1} and w_{j-2} is a negation word, and the other is not a degree adverb;
Case 3: one of w_{j-1} and w_{j-2} is a degree adverb, and the other is not a negation word;
Case 4: w_{j-1} is a degree adverb, and w_{j-2} is a negation word;
Case 5: w_{j-1} is a negation word, and w_{j-2} is a degree adverb; }
Step 2: // calculate S_p
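The case analysis above can be sketched in Python as follows. The per-case formulas are not reproduced in this text, so the numeric factors below (-1 for negation, 2 for a degree adverb, -2 for a degree adverb followed by a negation word, -0.5 for a negation word followed by a degree adverb) are purely hypothetical values chosen to match the qualitative behaviour described. The word lists stand in for Tables 1 and 2, and all names are illustrative.

```python
NEGATION_WORDS = {"not", "never"}   # hypothetical stand-in for Table 1
DEGREE_ADVERBS = {"very", "too"}    # hypothetical stand-in for Table 2

NEG, DEG, OTHER = "neg", "deg", "other"


def kind(word):
    """Classify a word as negation word, degree adverb, or other."""
    if word in NEGATION_WORDS:
        return NEG
    if word in DEGREE_ADVERBS:
        return DEG
    return OTHER


def modifier_factor(prev2, prev1):
    """Factor contributed by the two words preceding w_j.

    prev1 is w_{j-1}, prev2 is w_{j-2}; either may be None at the start of
    the sentence. All numeric factors are ASSUMED, not taken from the source.
    """
    k1 = kind(prev1) if prev1 is not None else OTHER
    k2 = kind(prev2) if prev2 is not None else OTHER
    if k1 == NEG and k2 == DEG:   # degree adverb before negation word
        return -2.0               # strengthens the reversed polarity
    if k1 == DEG and k2 == NEG:   # negation word before degree adverb
        return -0.5               # weakens the reversed polarity
    if NEG in (k1, k2):           # negation word, no degree adverb
        return -1.0
    if DEG in (k1, k2):           # degree adverb, no negation word
        return 2.0
    return 1.0                    # no modifier


def sentence_polarity(words, word_polarity):
    """Sum each word's polarity scaled by the factor of its two predecessors."""
    total = 0.0
    for j, w in enumerate(words):
        prev1 = words[j - 1] if j >= 1 else None
        prev2 = words[j - 2] if j >= 2 else None
        total += modifier_factor(prev2, prev1) * word_polarity.get(w, 0.0)
    return total


# Hypothetical word polarity numbers; only the emotion word is non-zero.
POLARITY = {"dislike": -1.0}

if __name__ == "__main__":
    for s in (["I", "dislike", "her"],
              ["I", "very", "dislike", "her"],
              ["I", "not", "dislike", "her"],
              ["I", "not", "too", "dislike", "her"]):
        print(" ".join(s), sentence_polarity(s, POLARITY))
```

The token lists are word-by-word glosses of sentences A to D, with the modifiers placed before the emotion word as the text describes.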
3. emotion tendency determination unit
Emotion tendency analysis of the current document with respect to the current topic means analysing the viewpoint that the current document as a whole holds on the current topic, for example positive, negative or neutral. The current document is an ordered sequence of document sentences combined according to certain grammatical and syntactic rules. If the document sentences were independent of one another, the emotion tendency of the document could be judged by accumulating the sentence polarity numbers of the document sentences: if commendatory document sentences are in the majority, the current document is commendatory; if derogatory document sentences are in the majority, the current document is derogatory; and if the two are roughly equal, the current document is neutral.
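The independence-based judgement described above amounts to a majority vote over sentence polarities. A minimal Python sketch is given below; the function name and the `tolerance` parameter (how close the two counts must be to be treated as "roughly equal") are assumptions made for illustration.

```python
def naive_document_tendency(sentence_polarities, tolerance=0):
    """Majority vote over sentence polarity numbers.

    Sentences with a positive number count as commendatory, those with a
    negative number as derogatory. `tolerance` is a hypothetical parameter:
    the counts are treated as roughly equal if they differ by at most it.
    """
    pos = sum(1 for s in sentence_polarities if s > 0)
    neg = sum(1 for s in sentence_polarities if s < 0)
    if pos - neg > tolerance:
        return "positive"
    if neg - pos > tolerance:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(naive_document_tendency([1.0, 2.0, -0.5]))
```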
In practice, the document sentences contained in the current document are not independent. Conjunctions, and adversative conjunctions in particular, often change the emotion tendency of document sentences in the current document. In addition, the feature words with which individual document sentences describe the same topic also differ: the feature words of some topics appear only in commendatory document sentences, those of other topics appear only in derogatory document sentences, and those of yet other topics appear in both kinds of document sentence. As a result, commendatory and derogatory document sentences do not carry equal weight when the emotion tendency of the current document is calculated. To solve this problem, this solution uses a document polarity association analysis model that jointly considers the influence of document sentence polarity, topic features, and the degree of association between document sentences and the topic on the polarity of the document. The association analysis model is detailed as follows:
Let the feature word set related to the current topic be Fr = {fe_1, fe_2, ..., fe_{m2}}, and let the current document d contain n1 commendatory and derogatory sentences in total, which form the sentence set {P_1, P_2, ..., P_{n1}}. The topic polarity number of the current document can then be calculated by the following formula:
Wherein, W(d) denotes the topic polarity number of the current document d, α_pos denotes the first association coefficient between all the commendatory sentences and the current topic, β_neg denotes the second association coefficient between all the derogatory sentences and the current topic, S_pos(p_x) denotes the sentence polarity number of the x-th commendatory sentence, S_neg(p_y) denotes the sentence polarity number of the y-th derogatory sentence, k denotes the number of commendatory sentences, l denotes the number of derogatory sentences, k and l are positive integers, and k + l = n1.
α_pos and β_neg are calculated by the following formulas:
Wherein, m2 denotes the number of feature words in the feature word set related to the current topic, n1 denotes the total number of commendatory and derogatory sentences, m2 and n1 are positive integers, fr_pos(fe_j, P_i) denotes the number of times the j-th feature word occurs in the i-th commendatory sentence, and fr_neg(fe_j, P_i) denotes the number of times the j-th feature word occurs in the i-th derogatory sentence.
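The association analysis model can be sketched in Python as follows. Neither the expression for W(d) nor those for α_pos and β_neg are reproduced in this text, so both are assumptions here: W(d) is taken as α_pos times the sum of the commendatory sentence polarities plus β_neg times the sum of the derogatory sentence polarities, and each coefficient is taken as the share of feature word occurrences falling in the corresponding kind of sentence. Function names and the sample data are hypothetical.

```python
def association_coefficients(pos_sentences, neg_sentences, feature_words):
    """Return (alpha_pos, beta_neg).

    ASSUMPTION: each coefficient is the fraction of all feature word
    occurrences found in the commendatory (resp. derogatory) sentences.
    Sentences are lists of words.
    """
    def hits(sentences):
        return sum(s.count(fe) for s in sentences for fe in feature_words)

    pos_hits, neg_hits = hits(pos_sentences), hits(neg_sentences)
    total = pos_hits + neg_hits
    if total == 0:
        return 0.5, 0.5  # no feature word occurs: weight both sides equally
    return pos_hits / total, neg_hits / total


def topic_polarity(scored_sentences, feature_words):
    """ASSUMED form: W(d) = alpha_pos * sum(S_pos) + beta_neg * sum(S_neg).

    scored_sentences is a list of (words, sentence_polarity) pairs.
    Sentences with polarity 0 are neither commendatory nor derogatory.
    """
    pos = [(w, s) for w, s in scored_sentences if s > 0]
    neg = [(w, s) for w, s in scored_sentences if s < 0]
    alpha, beta = association_coefficients(
        [w for w, _ in pos], [w for w, _ in neg], feature_words)
    return alpha * sum(s for _, s in pos) + beta * sum(s for _, s in neg)


# Hypothetical document: three sentences with precomputed polarity numbers.
DOC = [
    (["battery", "lasts", "long"], 1.0),
    (["screen", "is", "dim"], -1.0),
    (["battery", "charges", "fast"], 2.0),
]

if __name__ == "__main__":
    print(topic_polarity(DOC, {"battery"}))
    print(topic_polarity(DOC, {"battery", "screen"}))
```

Under these assumptions, a document whose topic feature words occur only in commendatory sentences is scored from those sentences alone, which reflects the unequal weighting of the two kinds of sentence described above.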
The technical solution of the present invention has been described in detail above with reference to the accompanying drawings. With it, the topic polarity number of the current document with respect to the current topic can be calculated accurately, so that the emotion tendency of the current document can be analysed accurately.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the present invention. For those skilled in the art, the present invention may be modified and varied in various ways. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. An information processing method, characterized by comprising:
obtaining the document sentences in a current document and the words in the document sentences, and determining the word polarity number of each word according to a preset dictionary;
calculating the sentence polarity number of each document sentence according to each word in the document sentence, the word polarity number of the word, and a sentence polarity number computation model;
determining the emotion tendency of the current document according to the sentence polarity number of each document sentence in the current document and a feature word set;
wherein determining the emotion tendency of the current document according to the sentence polarity number of each document sentence in the current document and the feature word set specifically comprises:
classifying the document sentences according to the sentence polarity number of each document sentence, to obtain the commendatory sentences and the derogatory sentences among the document sentences;
determining, according to the commendatory sentences, the derogatory sentences and the feature word set, a first association coefficient between all the commendatory sentences and the feature word set and a second association coefficient between all the derogatory sentences and the feature word set;
determining the topic polarity number of the current document according to the first association coefficient, the second association coefficient, each document sentence and the sentence polarity number of each document sentence;
classifying the current document according to the topic polarity number of the current document, and determining the emotion tendency of the current document according to the classification result.
2. The information processing method according to claim 1, characterized in that
a calculation command is received, and the topic polarity number of the current document is calculated by the following equation:
Wherein, W(d) denotes the topic polarity number of the current document d, α_pos denotes the first association coefficient between all the commendatory sentences and the feature word set, β_neg denotes the second association coefficient between all the derogatory sentences and the feature word set, S_pos(p_x) denotes the sentence polarity number of the x-th commendatory sentence, S_neg(p_y) denotes the sentence polarity number of the y-th derogatory sentence, k denotes the number of commendatory sentences, l denotes the number of derogatory sentences, and k and l are positive integers.
3. The information processing method according to claim 1, characterized in that the preset dictionary comprises a commendatory term dictionary and a derogatory term dictionary; and
a first determination command is received, and the word polarity number of the word is determined by the following procedure:
counting, for each character of the word, a first count of occurrences of the character in the commendatory term dictionary and a second count of occurrences of the character in the derogatory term dictionary;
determining the commendatory weight value and the derogatory weight value of the character according to the first count, the second count, the commendatory term dictionary and the derogatory term dictionary;
calculating the character polarity number of the character according to the commendatory weight value and the derogatory weight value, and determining the word polarity number of the word by aggregating the character polarity numbers of its characters.
4. The information processing method according to claim 3, characterized in that
a second determination command is received, and the commendatory weight value and the derogatory weight value are determined by the following equations:
Wherein, PO_{c_i} denotes the commendatory weight value of the i-th character in the word, NO_{c_i} denotes the derogatory weight value of the i-th character in the word, f(pc_i) denotes the first count of occurrences of the i-th character in the commendatory term dictionary, f(nc_i) denotes the second count of occurrences of the i-th character in the derogatory term dictionary, m0 denotes the number of distinct commendatory characters in the commendatory term dictionary, n0 denotes the number of distinct derogatory characters in the derogatory term dictionary, m0 and n0 are positive integers, f(pc_j) denotes a third count of occurrences of the j-th commendatory character of the commendatory term dictionary in the commendatory term dictionary, and f(nc_j) denotes a fourth count of occurrences of the j-th derogatory character of the derogatory term dictionary in the derogatory term dictionary.
5. The information processing method according to any one of claims 1 to 4, characterized in that
the word polarity effect value of the word on subsequent words is set according to the word category to which the word belongs and the position of the word in the document sentence; and the sentence polarity number computation model comprises the following formula:
Wherein, S_p denotes the sentence polarity number of the document sentence p, S_{w_j} denotes the word polarity number of the j-th word in the document sentence, g(w_j) denotes the word polarity effect value of the j-th word, a denotes the number of words in the document sentence, and a is a positive integer.
6. An information processing system, characterized by comprising:
a first determination unit, configured to obtain the document sentences in a current document and the words in the document sentences, and to determine the word polarity number of each word according to a preset dictionary;
a computing unit, configured to calculate the sentence polarity number of each document sentence according to each word in the document sentence, the word polarity number of the word, and a sentence polarity number computation model;
a processing unit, configured to determine the emotion tendency of the current document according to the sentence polarity number of each document sentence in the current document and a feature word set;
wherein the processing unit comprises:
a classification unit, configured to classify the document sentences according to the sentence polarity number of each document sentence, to obtain the commendatory sentences and the derogatory sentences among the document sentences;
a second determination unit, configured to determine, according to the commendatory sentences, the derogatory sentences and the feature word set, a first association coefficient between all the commendatory sentences and the feature word set and a second association coefficient between all the derogatory sentences and the feature word set,
to determine the topic polarity number of the current document according to the first association coefficient, the second association coefficient, each document sentence and the sentence polarity number of each document sentence,
and to classify the current document according to the topic polarity number of the current document and determine the emotion tendency of the current document according to the classification result.
7. The information processing system according to claim 6, characterized in that
the second determination unit is specifically configured to receive a calculation command and to calculate the topic polarity number of the current document by the following equation:
Wherein, W(d) denotes the topic polarity number of the current document d, α_pos denotes the first association coefficient between all the commendatory sentences and the feature word set, β_neg denotes the second association coefficient between all the derogatory sentences and the feature word set, S_pos(p_x) denotes the sentence polarity number of the x-th commendatory sentence, S_neg(p_y) denotes the sentence polarity number of the y-th derogatory sentence, k denotes the number of commendatory sentences, l denotes the number of derogatory sentences, and k and l are positive integers.
8. The information processing system according to claim 6, characterized in that the preset dictionary comprises a commendatory term dictionary and a derogatory term dictionary; and
the first determination unit is specifically configured to receive a first determination command and to determine the word polarity number of the word by the following procedure:
counting, for each character of the word, a first count of occurrences of the character in the commendatory term dictionary and a second count of occurrences of the character in the derogatory term dictionary;
determining the commendatory weight value and the derogatory weight value of the character according to the first count, the second count, the commendatory term dictionary and the derogatory term dictionary;
calculating the character polarity number of the character according to the commendatory weight value and the derogatory weight value, and determining the word polarity number of the word by aggregating the character polarity numbers of its characters.
9. The information processing system according to claim 8, characterized in that
the first determination unit is specifically configured to receive a second determination command and to determine the commendatory weight value and the derogatory weight value by the following equations:
Wherein, PO_{c_i} denotes the commendatory weight value of the i-th character in the word, NO_{c_i} denotes the derogatory weight value of the i-th character in the word, f(pc_i) denotes the first count of occurrences of the i-th character in the commendatory term dictionary, f(nc_i) denotes the second count of occurrences of the i-th character in the derogatory term dictionary, m0 denotes the number of distinct commendatory characters in the commendatory term dictionary, n0 denotes the number of distinct derogatory characters in the derogatory term dictionary, m0 and n0 are positive integers, f(pc_j) denotes a third count of occurrences of the j-th commendatory character of the commendatory term dictionary in the commendatory term dictionary, and f(nc_j) denotes a fourth count of occurrences of the j-th derogatory character of the derogatory term dictionary in the derogatory term dictionary.
10. The information processing system according to any one of claims 6 to 9, characterized in that the computing unit comprises:
a setting unit, configured to set the word polarity effect value of the word on subsequent words according to the word category to which the word belongs and the position of the word in the document sentence; and the sentence polarity number computation model comprises the following formula:
Wherein, S_p denotes the sentence polarity number of the document sentence p, S_{w_j} denotes the word polarity number of the j-th word in the document sentence, g(w_j) denotes the word polarity effect value of the j-th word, a denotes the number of words in the document sentence, and a is a positive integer.
CN201510369322.8A 2015-06-29 2015-06-29 Information processing method and information processing system Expired - Fee Related CN106294312B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510369322.8A CN106294312B (en) 2015-06-29 2015-06-29 Information processing method and information processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510369322.8A CN106294312B (en) 2015-06-29 2015-06-29 Information processing method and information processing system

Publications (2)

Publication Number Publication Date
CN106294312A CN106294312A (en) 2017-01-04
CN106294312B true CN106294312B (en) 2019-02-26

Family

ID=57650357

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510369322.8A Expired - Fee Related CN106294312B (en) 2015-06-29 2015-06-29 Information processing method and information processing system

Country Status (1)

Country Link
CN (1) CN106294312B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304571B (en) * 2018-02-22 2020-10-09 湘潭大学 Portable network public opinion analysis system based on particle model topic analysis algorithm
KR102012968B1 (en) * 2018-08-07 2019-08-27 주식회사 서큘러스 Method and server for controlling interaction robot
CN110399484A (en) * 2019-06-25 2019-11-01 平安科技(深圳)有限公司 Sentiment analysis method, apparatus, computer equipment and the storage medium of long text

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6766287B1 (en) * 1999-12-15 2004-07-20 Xerox Corporation System for genre-specific summarization of documents
CN102236636A (en) * 2010-04-26 2011-11-09 富士通株式会社 Method and device for analyzing emotional tendency
CN102663046A (en) * 2012-03-29 2012-09-12 中国科学院自动化研究所 Sentiment analysis method oriented to micro-blog short text
CN103678278A (en) * 2013-12-16 2014-03-26 中国科学院计算机网络信息中心 Chinese text emotion recognition method

Also Published As

Publication number Publication date
CN106294312A (en) 2017-01-04

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220620

Address after: 100871 No. 5, the Summer Palace Road, Beijing, Haidian District

Patentee after: Peking University

Patentee after: New founder holdings development Co.,Ltd.

Patentee after: BEIJING FOUNDER ELECTRONICS Co.,Ltd.

Address before: 100871 No. 5, the Summer Palace Road, Beijing, Haidian District

Patentee before: Peking University

Patentee before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.

Patentee before: BEIJING FOUNDER ELECTRONICS Co.,Ltd.

TR01 Transfer of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190226

CF01 Termination of patent right due to non-payment of annual fee