CN104408035B - The analysis method and device of word affective style - Google Patents

Publication number
CN104408035B
Authority
CN
China
Prior art keywords
word
affective style
sample
affective
emotion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410779580.9A
Other languages
Chinese (zh)
Other versions
CN104408035A (en)
Inventor
刘粉香
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd filed Critical Beijing Gridsum Technology Co Ltd
Priority to CN201410779580.9A priority Critical patent/CN104408035B/en
Publication of CN104408035A publication Critical patent/CN104408035A/en
Application granted granted Critical
Publication of CN104408035B publication Critical patent/CN104408035B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a method and device for analyzing the affective style of a word. The method includes: performing word segmentation on a text sample to obtain a set of words; extracting a preset number of first words from the set of words to obtain a first word sample; setting the emotion attribute of each first word in the first word sample using preset emotion tagging values to obtain second word samples of multiple affective styles; calculating the Gaussian distribution parameters of each affective style based on the phrase vectors and emotion attributes of the second words in the second word samples; using the Gaussian distribution parameters to calculate the probability that a third word in the text sample corresponds to each affective style; and determining the affective style of the third word based on that probability. The invention solves the prior-art problem that a machine cannot accurately analyze the affective style of a word, so that a machine can automatically judge the emotion tendency of existing words in a corpus and automatically and accurately analyze the affective style of a given sample word.

Description

The analysis method and device of word affective style
Technical field
The present invention relates to the field of data processing, and in particular to a method and device for analyzing the affective style of a word.
Background art
For a topic of interest, given a sample word, the problem to be solved is how to analyze the affective style of that word quickly and effectively from a large amount of given text information, that is, how to determine the emotion tendency of the given sample word. To address this problem, prior-art methods for analyzing word affective style mainly rely on a corpus of emotion-tendency vocabulary: they match related words and traverse the corpus to look up the word tendency (i.e., affective style) of the given sample word. However, the prior art cannot be applied to words that are not included in the corpus (such as newly emerging Internet vocabulary) or to words that carry no emotion tendency of their own.
Because existing solutions look up the word tendency of the given sample word by traversal, computing and storing the data consumes considerable computer resources and processing is slow. Moreover, because the lookup works by matching against the vocabulary in an emotion-tendency corpus, the word tendency of vocabulary not included in the corpus cannot be analyzed.
No effective solution has yet been proposed for the problem that a machine in the prior art cannot accurately analyze the affective style of a word.
Summary of the invention
The main object of the present invention is to provide a method and device for analyzing the affective style of a word, so as to solve the prior-art problem that a machine cannot accurately analyze the affective style of a word.
To achieve the above object, according to one aspect of the embodiments of the present invention, a method for analyzing the affective style of a word is provided.
The analysis method according to the present invention includes: performing word segmentation on a text sample to obtain a set of words; extracting a preset number of first words from the set of words to obtain a first word sample; setting the emotion attribute of each first word in the first word sample using preset emotion tagging values to obtain second word samples of multiple affective styles; calculating the Gaussian distribution parameters of each affective style based on the phrase vectors and the emotion attributes of the second words in the second word samples; using the Gaussian distribution parameters to calculate the probability that a third word in the text sample corresponds to each affective style; and determining the affective style of the third word based on the probability that the third word corresponds to each affective style.
Further, the multiple affective styles include a first affective style and a second affective style, and determining the affective style of the third word based on the probability that the third word corresponds to each affective style includes: obtaining a first probability that the third word corresponds to the first affective style and a second probability that the third word corresponds to the second affective style; calculating the difference between the first probability and the second probability; judging whether the difference is greater than a first preset threshold; if the difference is greater than the first preset threshold, determining that the affective style of the third word is the first affective style; if the difference is not greater than the first preset threshold, judging whether the difference is less than a second preset threshold; if the difference is less than the second preset threshold, determining that the affective style of the third word is the second affective style; and if the difference is not less than the second preset threshold, determining that the affective style of the third word is a third affective style.
Further, setting the emotion attribute of each first word in the first word sample using preset emotion tagging values to obtain second word samples of multiple affective styles includes: using a preset first identification value to set the emotion attribute of the first words belonging to the first affective style, obtaining the second word sample of the first affective style; and using a preset second identification value to set the emotion attribute of the first words belonging to the second affective style, obtaining the second word sample of the second affective style.
Further, setting the emotion attribute of each first word in the first word sample using preset emotion tagging values to obtain second word samples of multiple affective styles includes: reading the emotion tendency word of the first word from a data table; and setting the emotion attribute of the first word using the emotion tagging value of the emotion tendency word, obtaining the second word samples of multiple affective styles.
Further, after word segmentation is performed on the text sample to obtain the set of words, the analysis method also includes: obtaining the phrase vector of each word in the text sample by a machine learning method, wherein each dimension of the phrase vector describes one item of attribute information of the word.
To achieve the above object, according to another aspect of the embodiments of the present invention, a device for analyzing the affective style of a word is provided.
The analysis device according to the present invention includes: a word segmentation module, for performing word segmentation on a text sample to obtain a set of words; an extraction module, for extracting a preset number of first words from the set of words to obtain a first word sample; a setting module, for setting the emotion attribute of each first word in the first word sample using preset emotion tagging values to obtain second word samples of multiple affective styles; a first computing module, for calculating the Gaussian distribution parameters of each affective style based on the phrase vectors and the emotion attributes of the second words in the second word samples; a second computing module, for using the Gaussian distribution parameters to calculate the probability that a third word in the text sample corresponds to each affective style; and a determining module, for determining the affective style of the third word based on the probability that the third word corresponds to each affective style.
Further, the multiple affective styles include a first affective style and a second affective style, and the determining module includes: a first acquisition module, for obtaining a first probability that the third word corresponds to the first affective style and a second probability that the third word corresponds to the second affective style; a calculating submodule, for calculating the difference between the first probability and the second probability; a judging module, for judging whether the difference is greater than a first preset threshold; a first determining submodule, for determining that the affective style of the third word is the first affective style when the difference is greater than the first preset threshold; a judging submodule, for judging whether the difference is less than a second preset threshold when the difference is not greater than the first preset threshold; a second determining submodule, for determining that the affective style of the third word is the second affective style when the difference is less than the second preset threshold; and a third determining submodule, for determining that the affective style of the third word is a third affective style when the difference is not less than the second preset threshold.
Further, the setting module includes: a first setting submodule, for using a preset first identification value to set the emotion attribute of the first words belonging to the first affective style, obtaining the second word sample of the first affective style; and a second setting submodule, for using a preset second identification value to set the emotion attribute of the first words belonging to the second affective style, obtaining the second word sample of the second affective style.
Further, the setting module includes: a reading module, for reading the emotion tendency word of the first word from a data table; and a third setting submodule, for setting the emotion attribute of the first word using the emotion tagging value of the emotion tendency word, obtaining the second word samples of multiple affective styles.
Further, the analysis device also includes: a second acquisition module, for obtaining, after word segmentation is performed on the text sample to obtain the set of words, the phrase vector of each word in the text sample by a machine learning method, wherein each dimension of the phrase vector describes one item of attribute information of the word.
With the embodiments of the present invention, after word segmentation is performed on the text sample to obtain the set of words, a preset number of words are extracted from the set of words by a random algorithm to obtain the first word sample; the emotion attribute of each word in the first word sample is set using preset emotion tagging values to obtain the second word samples; the Gaussian distribution parameters of each affective style are calculated based on the phrase vectors and emotion attributes of the words in the second word samples; the Gaussian distribution parameters are used to calculate the probability that a third word in the text sample (e.g., the given sample word) corresponds to each affective style; and the affective style of the third word is determined based on the probability of each affective style. In the embodiments of the present invention, the affective style of only part of the words in the set of words is labeled, and the Gaussian distribution parameters are used to calculate the probability that the given sample word corresponds to each affective style. This probability can be obtained accurately, and the affective style of the given sample word is determined from it. The given sample word does not need to be present in a corpus of emotion-tendency vocabulary for its affective style to be determined, which improves the accuracy of the analysis and removes the need to traverse a corpus to look up the affective style of the given sample word. The embodiments of the present invention thus solve the prior-art problem that a machine cannot accurately analyze the affective style of a word, so that a machine can automatically judge the emotion tendency of existing words in a corpus and automatically and accurately analyze the affective style of a given sample word.
Brief description of the drawings
The accompanying drawings, which form a part of this application, are provided for a further understanding of the present invention. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flow chart of the analysis method of word affective style according to an embodiment of the present invention;
Fig. 2 is a flow chart of an optional analysis method of word affective style according to an embodiment of the present invention; and
Fig. 3 is the schematic diagram of the analytical equipment of word affective style according to embodiments of the present invention.
Detailed description of the embodiments
First, some of the nouns and terms that appear in the description of the embodiments of the present invention are explained as follows:
Machine learning is a method of converting data into information by extracting rules or patterns from the data; the main machine learning methods are inductive learning and analytical learning. In the machine learning process, the data are first preprocessed to form features, and a model is then created from the features; the machine learning algorithm analyzes the collected data and assigns weights, thresholds, and other parameters to achieve the learning goal.
It should be noted that, where there is no conflict, the embodiments in this application and the features in the embodiments can be combined with one another. The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention, without creative effort, shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second", and so on in the description, the claims, and the above drawings are used to distinguish similar objects, and not to describe a specific order or sequence. It should be understood that data so used can be interchanged where appropriate, so that the embodiments of the invention described herein can be implemented. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
The embodiments of the present invention provide a method for analyzing the affective style of a word.
Fig. 1 is a flow chart of the analysis method of word affective style according to an embodiment of the present invention. As shown in Fig. 1, the analysis method may include the following steps:
Step S102, perform word segmentation on the text sample to obtain a set of words.
Step S104, extract a preset number of first words from the set of words to obtain a first word sample.
Specifically, the preset number of first words can be extracted from the set of words by a program that applies a random algorithm, obtaining the first word sample.
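The random extraction in step S104 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `extract_first_word_sample`, the `seed` parameter, and the toy word list are hypothetical and do not come from the patent, which does not specify which random algorithm is used.

```python
import random

def extract_first_word_sample(word_set, preset_number, seed=None):
    """Randomly draw `preset_number` distinct words from the segmented word set.

    Hypothetical helper: the patent only says that "a random algorithm" is applied.
    """
    rng = random.Random(seed)
    # Deduplicate and sort so that a fixed seed gives a reproducible draw.
    candidates = sorted(set(word_set))
    if preset_number > len(candidates):
        raise ValueError("preset_number exceeds the size of the word set")
    # random.sample draws without replacement.
    return rng.sample(candidates, preset_number)

# Toy word set (assumed for illustration only).
words = ["today", "weather", "very", "good", "bad", "rain"]
first_word_sample = extract_first_word_sample(words, 3, seed=0)
```

Sampling without replacement keeps the first word sample free of duplicates, so each extracted word is labeled once in the following step.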
Step S106, set the emotion attribute of each first word in the first word sample using preset emotion tagging values, obtaining second word samples of multiple affective styles.
Step S108, calculate the Gaussian distribution parameters of each affective style based on the phrase vectors and the emotion attributes of the second words in the second word samples.
Step S110, use the Gaussian distribution parameters to calculate the probability that a third word in the text sample corresponds to each affective style.
Step S112, determine the affective style of the third word based on the probability that the third word corresponds to each affective style.
In the above embodiments, the text information may be text obtained from the Internet (e.g., a news item or a microblog comment), electronic text obtained by scanning or typing in the content of a paper document, electronic text entered by a user through a terminal, and so on; the machine may be a computer or a computer program.
It should be further noted that performing word segmentation on the text information to obtain the set of words can be realized as follows: the text information is split into multiple words according to preset word combinations, and the multiple words are saved to obtain the set of words.
Specifically, the preset word combinations can be obtained from a term database, and the words in the text information are matched against the preset word combinations in the database. If a word in the text information is identical to a preset word combination, that word is split off from the text information, and multiple words are obtained in this way.
Optionally, a word segmentation tool can be used to perform word segmentation on the text information.
For example, if the text information is "today, weather was fine", then after the word segmentation tool has processed it, the resulting words may be "today", "weather", "very" and "good".
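The dictionary matching described above can be illustrated with forward maximum matching, one common dictionary-based segmentation strategy. The patent does not name a specific matching strategy, so this choice is an assumption, as are the function name `segment`, the `max_len` parameter, and the Chinese input string, which is only a guess at the sentence behind the translated example.

```python
def segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary entry.

    Assumption: the patent only says words are matched against preset word
    combinations; forward maximum matching is one possible way to do that.
    """
    words = []
    i = 0
    while i < len(text):
        match = text[i]  # fall back to a single character if nothing matches
        # Try the longest candidate first, down to length 2.
        for length in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                match = candidate
                break
        words.append(match)
        i += len(match)
    return words

# Hypothetical term database and sentence ("today", "weather", "very", "good").
term_database = {"今天", "天气", "很", "好"}
result = segment("今天天气很好", term_database)
```

In practice an off-the-shelf word segmentation tool would normally replace a hand-written matcher like this one.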
According to the above embodiment of the present invention, the multiple affective styles include a first affective style and a second affective style, and determining the affective style of the third word based on the probability that the third word corresponds to each affective style may include: obtaining a first probability that the third word corresponds to the first affective style and a second probability that the third word corresponds to the second affective style; calculating the difference between the first probability and the second probability; judging whether the difference is greater than a first preset threshold; if the difference is greater than the first preset threshold, determining that the affective style of the third word is the first affective style; if the difference is not greater than the first preset threshold, judging whether the difference is less than a second preset threshold; if the difference is less than the second preset threshold, determining that the affective style of the third word is the second affective style; and if the difference is not less than the second preset threshold, determining that the affective style of the third word is a third affective style.
Specifically, the first probability that the third word (e.g., the given sample word) corresponds to the first affective style and the second probability that it corresponds to the second affective style are obtained, and the difference between the first probability and the second probability is calculated. When the difference is greater than the first preset threshold, the affective style of the third word is determined to be the first affective style. When the difference is not greater than the first preset threshold, it is judged whether the difference is less than the second preset threshold: if it is, the affective style of the third word is determined to be the second affective style; if it is not, the affective style of the third word is determined to be the third affective style.
In an optional embodiment, the absolute values of the first preset threshold and the second preset threshold may be equal (this common value is denoted the preset probability value), with the first preset threshold positive and the second preset threshold negative. In this embodiment, when the absolute value of the difference between the first probability and the second probability is greater than the preset probability value, the word is judged to have an obvious emotion tendency (i.e., affective style), and the affective style corresponding to the larger probability is the affective style of the word. When the absolute value of the difference is not greater than the preset probability value, the affective style of the word is judged not to be obvious, and the word is assigned the third affective style (i.e., the neutral affective style).
In the above embodiment of the present invention, the first affective style may be the positive emotion type, the second affective style may be the negative emotion type, and the third affective style may be the neutral affective style.
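The two-threshold decision described above can be written as a short function. This is a minimal sketch: the function name, the string labels, and the example threshold values are hypothetical, since the patent does not fix concrete thresholds.

```python
POSITIVE, NEGATIVE, NEUTRAL = "positive", "negative", "neutral"

def decide_affective_style(first_prob, second_prob, first_threshold, second_threshold):
    """Map the probability difference to the first, second, or third affective style."""
    difference = first_prob - second_prob
    if difference > first_threshold:
        return POSITIVE   # first affective style
    if difference < second_threshold:
        return NEGATIVE   # second affective style
    return NEUTRAL        # third affective style

# Symmetric thresholds (+0.5 / -0.5) as in the optional embodiment; values are assumed.
style_a = decide_affective_style(0.875, 0.125, 0.5, -0.5)
style_b = decide_affective_style(0.125, 0.875, 0.5, -0.5)
style_c = decide_affective_style(0.75, 0.25, 0.5, -0.5)
```

With symmetric thresholds this reduces to comparing the absolute value of the difference with a single preset probability value, and a difference exactly equal to a threshold falls into the neutral case.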
It should be further noted that the third word is a word in the set of words.
With the above embodiment of the present invention, the affective style of the word to which the first probability and the second probability belong is determined according to preset thresholds, which improves the accuracy of the determined affective style.
According to the above embodiment of the present invention, setting the emotion attribute of each first word in the first word sample using preset emotion tagging values to obtain second word samples of multiple affective styles may include: using a preset first identification value to set the emotion attribute of the first words belonging to the first affective style, obtaining the second word sample of the first affective style; and using a preset second identification value to set the emotion attribute of the first words belonging to the second affective style, obtaining the second word sample of the second affective style.
Specifically, the emotion attributes of the words belonging to each affective style are labeled with the corresponding preset identification value; that is, the first identification value is used to set the emotion attribute of the words of the first affective style, and the second identification value is used to set the emotion attribute of the words of the second affective style.
In the above embodiment, the first identification value may be 1, representing the positive emotion type (i.e., the first affective style); the second identification value may be -1, representing the negative emotion type (i.e., the second affective style).
Further, the second word sample may also include other words in the set of words, besides the first words, whose affective style has not been labeled.
In the above embodiment of the present invention, setting the emotion attribute of each first word in the first word sample using preset emotion tagging values to obtain second word samples of multiple affective styles may include: reading the emotion tendency word of the first word from a data table; and setting the emotion attribute of the first word using the emotion tagging value of the emotion tendency word, obtaining the second word samples of multiple affective styles.
Specifically, the emotion tendency word of a word is read from the data table, and the emotion tagging value of that emotion tendency word (e.g., 1 for the positive emotion type and -1 for the negative emotion type) is used to set the emotion attribute of the word, obtaining the second word sample.
In the above embodiment, the emotion tendency words may include words representing the positive emotion type, such as words of praise or positive words; words representing the negative emotion type, such as derogatory words or negative words; and words representing the neutral affective style.
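The labeling step can be sketched as a lookup against a small table of emotion tendency words. The table contents, the function name `label_first_words`, and the choice to leave words without a 1 or -1 entry unlabeled are assumptions made for illustration; the patent gives tagging values only for the positive (1) and negative (-1) types.

```python
def label_first_words(first_word_sample, tendency_table):
    """Group first words into second word samples keyed by emotion tagging value.

    Words whose table entry is not 1 or -1 (or that are missing from the
    table) stay unlabeled here, which is an assumption of this sketch.
    """
    second_word_samples = {1: [], -1: []}
    for word in first_word_sample:
        value = tendency_table.get(word)
        if value in second_word_samples:
            second_word_samples[value].append(word)
    return second_word_samples

# Hypothetical data table of emotion tendency words and their tagging values.
tendency_table = {"good": 1, "excellent": 1, "bad": -1, "terrible": -1}
labeled = label_first_words(["good", "bad", "weather", "excellent"], tendency_table)
```

Here `labeled[1]` holds the second word sample of the positive emotion type and `labeled[-1]` the second word sample of the negative emotion type.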
According to the above embodiment of the present invention, after word segmentation is performed on the text sample to obtain the set of words, the analysis method may also include: obtaining the phrase vector of each word in the text sample by a machine learning method, wherein each dimension of the phrase vector describes one item of attribute information of the word.
Specifically, the phrase vector of each word in the text sample can be obtained with a machine learning method (e.g., a machine learning program). Optionally, the phrase vector in this embodiment may be a 500-dimensional vector; using 500-dimensional vectors in this embodiment can ensure both the operating efficiency of the terminal and the accuracy of the results.
The tool word2vec can be used to represent words as phrase vectors. word2vec is a tool that converts words into vector form.
With the above embodiments of the present invention, each word in the text sample is represented by a phrase vector. When analyzing the affective style of a given sample word, it is only necessary to calculate the Gaussian distribution parameters of each affective style from the phrase vectors and emotion attributes of the labeled words, use those parameters to calculate the probability that other words in the text sample (such as the given sample word) correspond to each affective style, and determine the affective style of the given sample word from those probabilities. The tendency of a word no longer has to be analyzed by traversal as in the prior art, which saves the space needed to store words and text samples; when the volume of text information is large, the affective style of a given sample word can still be analyzed rapidly and accurately.
Fig. 2 is a flow chart of an optional analysis method of word affective style according to an embodiment of the present invention. The above embodiment of the present invention is described in detail below with reference to Fig. 2.
As shown in Fig. 2, in this embodiment the emotion tendency of words is labeled using random sampling, the emotion tendency of words is characterized by Gaussian distributions, and the emotion tendency of a given sample word (i.e., the third word in the above embodiment of the present invention) is then judged. Specifically, the analysis method may include the following steps:
Step S202, perform word segmentation on the text training sample to obtain multiple words, and tag the part of speech of each word.
The text training sample is the text sample in the above embodiment of the present invention.
Step S204, represent each word as an array, obtaining the unique array corresponding to each word by a machine learning method.
The array is the phrase vector in the above embodiment, and may be a 500-dimensional array.
Step S206, label the emotion tendency of a sub-sample of words selected by random sampling.
Specifically, the emotion attribute of a word of the positive emotion type is labeled 1, and the emotion attribute of a word of the negative emotion type is labeled -1.
The sub-sample of words is the first word sample in the above embodiment of the present invention.
Step S208, for the words of the positive emotion type and the words of the negative emotion type, respectively, calculate their high-dimensional Gaussian distribution parameters by the maximum likelihood method.
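For a multivariate Gaussian, the maximum likelihood estimates are the sample mean and the sample covariance divided by the number of samples n (not n - 1). The sketch below computes both in plain Python for one emotion type; the function name `fit_gaussian_mle` and the 2-dimensional toy vectors are assumptions (the embodiment mentions 500-dimensional arrays), and the patent does not state whether a full or a diagonal covariance is used.

```python
def fit_gaussian_mle(vectors):
    """Maximum likelihood estimate of a Gaussian: mean vector and covariance matrix."""
    if not vectors:
        raise ValueError("at least one vector is required")
    n = len(vectors)
    dim = len(vectors[0])
    # MLE of the mean: the per-dimension average.
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    # MLE of the covariance: average outer product of deviations (divide by n).
    cov = [
        [sum((v[i] - mean[i]) * (v[j] - mean[j]) for v in vectors) / n
         for j in range(dim)]
        for i in range(dim)
    ]
    return mean, cov

# Toy 2-dimensional vectors standing in for the arrays of positive-type words.
positive_vectors = [[0.0, 0.0], [2.0, 2.0]]
mean, cov = fit_gaussian_mle(positive_vectors)
```

The same function would be called once per emotion type, giving one mean and covariance for the positive words and another for the negative words. With far fewer labeled words than dimensions the full covariance is singular, so a practical implementation would likely regularize it or restrict it to the diagonal.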
Step S210, look up the array corresponding to the given new sample word, and calculate the probability that it is of the positive emotion type and the probability that it is of the negative emotion type, respectively.
The sample word is the third word in the above embodiment of the present invention, and is a word contained in the text training sample.
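One way to turn the fitted Gaussians into per-type probabilities is to evaluate each Gaussian density at the word's array and normalize. The patent does not spell out this computation, so the sketch below rests on two labeled assumptions: a diagonal covariance (one variance per dimension) and equal prior weight for the two emotion types. All names and parameter values are hypothetical.

```python
import math

def log_density_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (assumed form)."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * s) + (xi - m) ** 2 / s
        for xi, m, s in zip(x, mean, var)
    )

def class_probabilities(x, params):
    """Normalize the per-type densities into probabilities (equal priors assumed)."""
    logs = {label: log_density_diag(x, m, v) for label, (m, v) in params.items()}
    top = max(logs.values())  # subtract the maximum for numerical stability
    weights = {label: math.exp(value - top) for label, value in logs.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}

# Hypothetical parameters: label 1 = positive type, label -1 = negative type.
params = {
    1: ([1.0, 1.0], [1.0, 1.0]),
    -1: ([-1.0, -1.0], [1.0, 1.0]),
}
probs_near_positive = class_probabilities([1.0, 1.0], params)
probs_midway = class_probabilities([0.0, 0.0], params)
```

Working in log space avoids underflow, which matters for high-dimensional arrays where raw densities become extremely small.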
Step S212, the emotion tendency of sample word is determined according to emotion tendency probability threshold value.
Wherein, emotion tendency is the affective style in the above embodiment of the present invention.
Specifically, obtain default emotion tendency probability threshold value, if the probability of the positive emotion type of sample word with The difference of the probability of negative emotion type is more than emotion tendency probability threshold value, it is determined that there is sample word obvious emotion to incline Tropism, and the affective style that the big affective style of probability is the sample word is determined, if the positive emotion type of sample word The difference of probability and the probability of negative emotion type is not more than emotion tendency probability threshold value, it is determined that the emotion of the sample word Tendentiousness is neutrality, i.e., the affective style of sample word is neutral affective style.
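The text does not spell out the formulas behind steps S208 to S212. As a hedged illustration only, the standard maximum-likelihood estimates for a multivariate Gaussian are given below, together with one plausible way of turning the two class densities into probabilities. The equal-prior normalization and the symbols $x_i$, $N_c$, $d$, and $\tau$ are assumptions introduced here, not taken from the source. $x_i$ are the $d$-dimensional arrays of the $N_c$ words labeled with class $c \in \{+1, -1\}$, and $\tau$ is the emotion tendency probability threshold.

```latex
% Step S208: maximum-likelihood parameters of each class
\mu_c = \frac{1}{N_c} \sum_{i=1}^{N_c} x_i ,
\qquad
\Sigma_c = \frac{1}{N_c} \sum_{i=1}^{N_c} (x_i - \mu_c)(x_i - \mu_c)^{\top}

% Step S210: class density of a new word vector x, normalized (equal priors assumed)
p(x \mid c) = (2\pi)^{-d/2} \, |\Sigma_c|^{-1/2}
  \exp\!\Big( -\tfrac{1}{2} (x - \mu_c)^{\top} \Sigma_c^{-1} (x - \mu_c) \Big) ,
\qquad
P(c \mid x) = \frac{p(x \mid c)}{p(x \mid +1) + p(x \mid -1)}

% Step S212: threshold decision
\hat{y}(x) =
\begin{cases}
\arg\max_{c} P(c \mid x), & \text{if } \lvert P(+1 \mid x) - P(-1 \mid x) \rvert > \tau \\
\text{neutral}, & \text{otherwise}
\end{cases}
```

With 500-dimensional arrays and only a small labeled sub-sample, the full covariance estimate may be singular, so a practical implementation would probably need some regularization or a diagonal covariance; the source does not say which.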
Through the above embodiment of the present invention, each word is represented as an array, and the unique array corresponding to each word is obtained by a machine learning method, so computation is fast; the words are labeled with an emotion tendency and the high-dimensional Gaussian distribution parameters are calculated by the maximum likelihood method, which makes the judgment of the emotion type more accurate; and the use of an emotion tendency probability threshold allows the accuracy of the judgment to be adjusted to the analyst's requirements, increasing the usability of the results.
An embodiment of the present invention further provides a device for analyzing the emotion type of a word. The analysis device may implement its functions through the method for analyzing the emotion type of a word in the above embodiment of the present invention.
Fig. 3 is a schematic diagram of a device for analyzing the emotion type of a word according to an embodiment of the present invention. As shown in Fig. 3, the analysis device may include: a word segmentation module 10, configured to perform word segmentation on a text sample to obtain a word set; an extraction module 30, configured to extract a preset number of first words from the word set to obtain a first word sample; a setting module 50, configured to set the emotion attribute of each first word in the first word sample using preset emotion label values to obtain second word samples of multiple emotion types; a first calculation module 70, configured to calculate the Gaussian distribution parameters of each emotion type based on the word vectors and the emotion attributes of the second words in the second word samples; a second calculation module 90, configured to calculate, using the Gaussian distribution parameters, the probability that a third word in the text sample corresponds to each emotion type; and a determination module 110, configured to determine the emotion type of the third word based on the probability that the third word corresponds to each emotion type.
With this embodiment of the present invention, after word segmentation is performed on the text sample to obtain the word set, a preset number of words are extracted from the word set to obtain the first word sample; the emotion attribute of each word in the first word sample is set using preset emotion label values to obtain the second word samples; the Gaussian distribution parameters of each emotion type are calculated from the word vectors and emotion attributes of the words in the second word samples; the Gaussian distribution parameters are used to calculate the probability that the third word in the text sample corresponds to each emotion type; and the emotion type of the third word (e.g., a given sample word) is determined from those probabilities. In this embodiment, only some of the words in the word set have their emotion type labeled; the Gaussian distribution parameters then yield the probability that the given sample word corresponds to each emotion type, and the emotion type of the given sample word is determined from these probabilities. The given sample word does not have to appear in the corpus of an emotion-tendency vocabulary for its emotion type to be determined, which improves the accuracy of the analysis and removes the need to traverse a corpus to look up the emotion type of the given sample word. This embodiment thereby solves the prior-art problem that a machine cannot accurately analyze the emotion type of a word, and enables the machine to judge the emotion tendency of existing words in the corpus automatically and to analyze the emotion type of a given sample word automatically and accurately.
Specifically, a program applying a random algorithm may be used to extract the preset number of first words from the word set, obtaining the first word sample.
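A minimal sketch of such a random extraction, using only the Python standard library. The function name `draw_first_word_sample`, the `seed` parameter, and the toy word set are hypothetical; the source only states that a random algorithm is applied.

```python
import random

def draw_first_word_sample(word_set, preset_count, seed=None):
    """Randomly draw `preset_count` distinct words from the segmented word set."""
    rng = random.Random(seed)
    # Sort first so the result depends only on the seed, not on set ordering.
    candidates = sorted(word_set)
    if preset_count > len(candidates):
        raise ValueError("preset_count exceeds the number of distinct words")
    return rng.sample(candidates, preset_count)

# Toy word set (an assumption for illustration).
word_set = {"today", "weather", "very", "good", "bad", "rain"}
first_word_sample = draw_first_word_sample(word_set, preset_count=3, seed=0)
```

Fixing the seed makes the labeled sub-sample reproducible, which may be useful when the labeling is done by hand.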
In the above embodiments, the text information may be text obtained from the Internet (e.g., a news item or a microblog comment), electronic text obtained by scanning or typing in the content of a paper document, or electronic text entered by a user through a terminal; the machine may be a computer or a computer program.
It should further be noted that word segmentation of the text information to obtain the word set may be implemented as follows: the text information is split into multiple words according to preset word combinations, and the multiple words are saved to obtain the word set.
Specifically, the preset word combinations may be obtained from a term database, and the words in the text information are matched against the preset word combinations in the database; if a word in the text information is identical to a preset word combination, the word is split off from the text information, yielding multiple words.
Optionally, a word segmentation tool may be used to segment the text information.
For example, if the text information is "the weather is very good today", then after segmentation with a word segmentation tool, the resulting words may be "today", "weather", "very", and "good".
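The dictionary matching described above can be sketched with greedy forward maximum matching, which is one common way to match text against a word list; the source does not name a specific matching strategy, so this choice is an assumption. The function `segment`, the `max_len` parameter, the toy lexicon, and the Chinese sentence (a guess at the untranslated form of the example) are all hypothetical.

```python
def segment(text, lexicon, max_len=4):
    """Split `text` by greedy forward maximum matching against `lexicon`."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words

# Toy stand-in for the preset word combinations in the term database.
lexicon = {"今天", "天气", "很", "好"}
words = segment("今天天气很好", lexicon)
```

Characters that match no lexicon entry are emitted one at a time, so the function always terminates and never drops text.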
According to the above embodiment of the present invention, the multiple emotion types include a first emotion type and a second emotion type, and the determination module may include: a first acquisition module, configured to obtain a first probability that the third word corresponds to the first emotion type and a second probability that the third word corresponds to the second emotion type; a calculation submodule, configured to calculate the difference between the first probability and the second probability; a judging module, configured to judge whether the difference is greater than a first preset threshold; a first determination submodule, configured to determine that the emotion type of the third word is the first emotion type when the difference is greater than the first preset threshold; a judging submodule, configured to judge whether the difference is less than a second preset threshold when the difference is not greater than the first preset threshold; a second determination submodule, configured to determine that the emotion type of the third word is the second emotion type when the difference is less than the second preset threshold; and a third determination submodule, configured to determine that the emotion type of the third word is a third emotion type when the difference is not less than the second preset threshold.
Specifically, the first probability that the third word (e.g., a given sample word) corresponds to the first emotion type and the second probability that the third word corresponds to the second emotion type are obtained, the difference between the first probability and the second probability is calculated, and it is judged whether the difference is greater than the first preset threshold. When the difference is greater than the first preset threshold, the emotion type of the third word is judged to be the first emotion type. When the difference is not greater than the first preset threshold, it is judged whether the difference is less than the second preset threshold: if it is, the emotion type of the third word is judged to be the second emotion type; if it is not, the emotion type of the third word is judged to be the third emotion type.
In an optional embodiment, the first and second preset thresholds may have the same absolute value (referred to as the preset probability value), with the first preset threshold positive and the second preset threshold negative. In this embodiment, when the absolute value of the difference between the first probability and the second probability is greater than the preset probability value, the word is judged to have an obvious emotion tendency (i.e., emotion type), and the emotion type corresponding to the larger probability is taken as the emotion type of the word; when the absolute value of the difference is not greater than the preset probability value, the emotion type of the word is judged to be not obvious, and the word is assigned the third emotion type (i.e., the neutral emotion type).
In the above embodiment of the present invention, the first emotion type may be the positive emotion type, the second emotion type may be the negative emotion type, and the third emotion type may be the neutral emotion type.
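The two-threshold decision described above can be sketched as follows. The function name `decide_emotion_type` and the default threshold values of 0.2 and -0.2 are hypothetical; the source leaves the threshold values to the analyst.

```python
POSITIVE, NEGATIVE, NEUTRAL = "positive", "negative", "neutral"

def decide_emotion_type(p_first, p_second,
                        first_threshold=0.2, second_threshold=-0.2):
    """Map two class probabilities to an emotion type via two thresholds."""
    diff = p_first - p_second
    if diff > first_threshold:
        return POSITIVE   # first emotion type
    if diff < second_threshold:
        return NEGATIVE   # second emotion type
    return NEUTRAL        # third emotion type

label_a = decide_emotion_type(0.9, 0.1)
label_b = decide_emotion_type(0.1, 0.9)
label_c = decide_emotion_type(0.5, 0.5)
```

Widening the gap between the two thresholds sends more words to the neutral type, which is how an analyst could trade coverage for confidence.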
It should further be noted that the third word is a word in the word set.
Through the above embodiment of the present invention, the emotion type of the word to which the first and second probabilities correspond is determined according to preset thresholds, which improves the accuracy of the determined emotion type.
According to the above embodiment of the present invention, the setting module may include: a first setting submodule, configured to set, using a preset first identification value, the emotion attribute of the first words belonging to the first emotion type, obtaining the second word sample of the first emotion type; and a second setting submodule, configured to set, using a preset second identification value, the emotion attribute of the first words belonging to the second emotion type, obtaining the second word sample of the second emotion type.
Specifically, preset identification values are used to mark the emotion attribute of the words belonging to each emotion type: the emotion attribute of words of the first emotion type is set using the first identification value, and the emotion attribute of words of the second emotion type is set using the second identification value.
In the above embodiment, the first identification value may be 1, representing the positive emotion type (i.e., the first emotion type); the second identification value may be -1, representing the negative emotion type (i.e., the second emotion type).
Further, the second word sample may also include the other words in the word set, apart from the first words, whose emotion type has not been labeled.
In the above embodiment of the present invention, the setting module may include: a reading module, configured to read the emotion-tendency word of the first word from a data table; and a third setting submodule, configured to set the emotion attribute of the first word using the emotion label value of the emotion-tendency word, obtaining the second word samples of multiple emotion types.
Specifically, the emotion-tendency word of a word is read from the data table, and the emotion attribute of the word is set using the emotion label value of that emotion-tendency word (e.g., 1 for the positive emotion type and -1 for the negative emotion type), obtaining the second word sample.
In the above embodiment, the emotion-tendency words may include words representing the positive emotion type, such as commendatory or positive words; words representing the negative emotion type, such as derogatory or negative words; and words representing the neutral emotion type.
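As a hedged illustration of the table-driven labeling above, the sketch below looks each sampled word up in a small in-memory table and groups the words by label value. The table contents, the name `label_sample`, and the use of a Python dict in place of a real data table are assumptions made for the example.

```python
# Hypothetical data table: word -> emotion label value (1 positive, -1 negative).
EMOTION_TABLE = {"praise": 1, "excellent": 1, "awful": -1, "blame": -1}

def label_sample(first_word_sample, table):
    """Group sampled words by emotion label value; collect unlabeled words."""
    second_word_samples = {1: [], -1: []}
    unlabeled = []
    for word in first_word_sample:
        value = table.get(word)
        if value in second_word_samples:
            second_word_samples[value].append(word)
        else:
            # Not in the table: the emotion type stays unmarked.
            unlabeled.append(word)
    return second_word_samples, unlabeled

labeled, unlabeled = label_sample(
    ["excellent", "awful", "table", "praise"], EMOTION_TABLE)
```

The two lists under keys 1 and -1 play the role of the second word samples of the first and second emotion types.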
According to the above embodiment of the present invention, the analysis device may further include: a second acquisition module, configured to obtain, after word segmentation is performed on the text sample to obtain the word set, the word vector of each word in the text sample by a machine learning method, where each one-dimensional datum in the word vector describes one item of attribute information of the word.
Specifically, obtaining the word vector of each word in the text sample may be implemented by a machine learning program. Optionally, the word vector in this embodiment may be a 500-dimensional vector; using 500-dimensional vectors in this embodiment can ensure both the operating efficiency of the terminal and the accuracy of the results.
The tool word2vec may be used to represent words as word vectors. word2vec is a tool that converts words into vector form.
Through the above embodiment of the present invention, each word in the text sample is represented by a word vector. When the emotion type of a given sample word is analyzed, only the word vectors and emotion attributes of the labeled words are needed to calculate the Gaussian distribution parameters of each emotion type; the Gaussian distribution parameters are then used to calculate the probability that other words in the text sample (such as the given sample word) correspond to each emotion type, and the emotion type of the given sample word is determined from those probabilities. There is no need to analyze the tendency of a word by traversal as in the prior art, which saves the space needed to store words and text samples; when the volume of text information is large, the emotion type of a given sample word can still be analyzed quickly and accurately.
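A minimal standard-library sketch of the two calculations named above: fitting Gaussian parameters per emotion type by maximum likelihood, then computing class probabilities for a new word vector. Several simplifications are assumptions made here and are not stated in the source: a diagonal covariance replaces the full covariance, a small variance floor avoids division by zero, equal priors are used to normalize the densities, and toy 2-dimensional vectors stand in for 500-dimensional ones. All function names are hypothetical.

```python
import math

def fit_diagonal_gaussian(vectors, var_floor=1e-6):
    """Maximum-likelihood mean and per-dimension variance of a list of vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    var = [max(sum((v[d] - mean[d]) ** 2 for v in vectors) / n, var_floor)
           for d in range(dim)]
    return mean, var

def log_density(x, mean, var):
    """Log of the diagonal-covariance Gaussian density at x."""
    return sum(-0.5 * (math.log(2 * math.pi * s) + (xi - m) ** 2 / s)
               for xi, m, s in zip(x, mean, var))

def class_probabilities(x, params):
    """Normalize per-class densities into probabilities (equal priors assumed)."""
    logs = {label: log_density(x, *p) for label, p in params.items()}
    top = max(logs.values())  # subtract the maximum for numerical stability
    weights = {label: math.exp(v - top) for label, v in logs.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}

# Toy word vectors for words labeled 1 (positive) and -1 (negative).
positive_vectors = [[1.0, 1.2], [0.8, 1.0], [1.2, 0.9]]
negative_vectors = [[-1.0, -0.9], [-1.1, -1.2], [-0.8, -1.0]]
params = {1: fit_diagonal_gaussian(positive_vectors),
          -1: fit_diagonal_gaussian(negative_vectors)}
probs = class_probabilities([0.9, 1.1], params)
```

Working in log space keeps the computation stable in high dimensions, where the raw densities would underflow.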
The modules provided in this embodiment are used in the same way as the corresponding steps of the method embodiment, and their application scenarios may also be the same. It should of course be noted that the solutions involving the above modules are not limited to the content and scenarios of the above embodiments, and that the modules may run on a computer terminal or a mobile terminal and may be implemented in software or hardware.
As can be seen from the above description, the present invention achieves the following technical effects:
With this embodiment of the present invention, after word segmentation is performed on the text sample to obtain the word set, a preset number of words are extracted from the word set to obtain the first word sample; the emotion attribute of each word in the first word sample is set using preset emotion label values to obtain the second word samples; the Gaussian distribution parameters of each emotion type are calculated from the word vectors and emotion attributes of the words in the second word samples; the Gaussian distribution parameters are used to calculate the probability that the third word in the text sample corresponds to each emotion type; and the emotion type of the third word (e.g., a given sample word) is determined from those probabilities. In this embodiment, only some of the words in the word set have their emotion type labeled; the Gaussian distribution parameters then yield the probability that the given sample word corresponds to each emotion type, and the emotion type of the given sample word is determined from these probabilities. The given sample word does not have to appear in the corpus of an emotion-tendency vocabulary for its emotion type to be determined, which improves the accuracy of the analysis and removes the need to traverse a corpus to look up the emotion type of the given sample word. This embodiment thereby solves the prior-art problem that the emotion type of a word cannot be accurately analyzed, and achieves the effect of accurately analyzing the emotion type of a given sample word.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above may be implemented by a general-purpose computing device. They may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they may each be fabricated as separate integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module. The present invention is thus not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

  1. A method for analyzing the emotion type of a word, characterized by comprising:
    performing word segmentation on a text sample to obtain a word set;
    extracting a preset number of first words from the word set to obtain a first word sample;
    setting the emotion attribute of each first word in the first word sample using preset emotion label values to obtain second word samples of multiple emotion types, wherein marking, with preset identification values, the emotion attributes of the words belonging to each emotion type comprises: setting the emotion attribute of words of a first emotion type using a first identification value, and setting the emotion attribute of words of a second emotion type using a second identification value;
    calculating Gaussian distribution parameters of each emotion type based on the word vectors of the second words in the second word samples and the emotion attributes of the second words;
    calculating, using the Gaussian distribution parameters, the probability that a third word in the text sample corresponds to each emotion type;
    determining the emotion type of the third word based on the probability that the third word corresponds to each emotion type,
    wherein performing word segmentation on the text sample to obtain the word set further comprises: obtaining preset word combinations from a term database; matching the words in the text sample against the preset word combinations in the term database; if a word in the text sample is identical to a preset word combination, splitting the word off from the text sample to obtain multiple words; and saving the multiple words to obtain the word set.
  2. The analysis method according to claim 1, characterized in that the multiple emotion types include a first emotion type and a second emotion type, wherein determining the emotion type of the third word based on the probability that the third word corresponds to each emotion type comprises:
    obtaining a first probability that the third word corresponds to the first emotion type and a second probability that the third word corresponds to the second emotion type;
    calculating the difference between the first probability and the second probability;
    judging whether the difference is greater than a first preset threshold;
    if the difference is greater than the first preset threshold, judging that the emotion type of the third word is the first emotion type;
    if the difference is not greater than the first preset threshold, judging whether the difference is less than a second preset threshold;
    if the difference is less than the second preset threshold, judging that the emotion type of the third word is the second emotion type;
    if the difference is not less than the second preset threshold, judging that the emotion type of the third word is a third emotion type.
  3. The analysis method according to claim 2, characterized in that setting the emotion attribute of each first word in the first word sample using preset emotion label values to obtain second word samples of multiple emotion types comprises:
    setting, using a preset first identification value, the emotion attribute of the first words belonging to the first emotion type, to obtain the second word sample of the first emotion type;
    setting, using a preset second identification value, the emotion attribute of the first words belonging to the second emotion type, to obtain the second word sample of the second emotion type.
  4. The analysis method according to claim 2, characterized in that setting the emotion attribute of each first word in the first word sample using preset emotion label values to obtain second word samples of multiple emotion types comprises:
    reading the emotion-tendency word of the first word from a data table;
    setting the emotion attribute of the first word using the emotion label value of the emotion-tendency word, to obtain the second word samples of multiple emotion types.
  5. The analysis method according to any one of claims 1 to 4, characterized in that, after performing word segmentation on the text sample to obtain the word set, the analysis method further comprises:
    obtaining the word vector of each word in the text sample by a machine learning method, wherein each one-dimensional datum in the word vector describes one item of attribute information of the word.
  6. A device for analyzing the emotion type of a word, characterized by comprising:
    a word segmentation module, configured to perform word segmentation on a text sample to obtain a word set;
    an extraction module, configured to extract a preset number of first words from the word set to obtain a first word sample;
    a setting module, configured to set the emotion attribute of each first word in the first word sample using preset emotion label values to obtain second word samples of multiple emotion types, wherein marking, with preset identification values, the emotion attributes of the words belonging to each emotion type comprises: setting the emotion attribute of words of a first emotion type using a first identification value, and setting the emotion attribute of words of a second emotion type using a second identification value;
    a first calculation module, configured to calculate Gaussian distribution parameters of each emotion type based on the word vectors of the second words in the second word samples and the emotion attributes of the second words;
    a second calculation module, configured to calculate, using the Gaussian distribution parameters, the probability that a third word in the text sample corresponds to each emotion type;
    a determination module, configured to determine the emotion type of the third word based on the probability that the third word corresponds to each emotion type,
    wherein the word segmentation module is further configured to obtain preset word combinations from a term database, match the words in the text sample against the preset word combinations in the term database, split a word off from the text sample if the word is identical to a preset word combination so as to obtain multiple words, and save the multiple words to obtain the word set.
  7. The analysis device according to claim 6, characterized in that the multiple emotion types include a first emotion type and a second emotion type, wherein the determination module comprises:
    a first acquisition module, configured to obtain a first probability that the third word corresponds to the first emotion type and a second probability that the third word corresponds to the second emotion type;
    a calculation submodule, configured to calculate the difference between the first probability and the second probability;
    a judging module, configured to judge whether the difference is greater than a first preset threshold;
    a first determination submodule, configured to determine that the emotion type of the third word is the first emotion type when the difference is greater than the first preset threshold;
    a judging submodule, configured to judge whether the difference is less than a second preset threshold when the difference is not greater than the first preset threshold;
    a second determination submodule, configured to determine that the emotion type of the third word is the second emotion type when the difference is less than the second preset threshold;
    a third determination submodule, configured to determine that the emotion type of the third word is a third emotion type when the difference is not less than the second preset threshold.
  8. The analysis device according to claim 7, characterized in that the setting module comprises:
    a first setting submodule, configured to set, using a preset first identification value, the emotion attribute of the first words belonging to the first emotion type, to obtain the second word sample of the first emotion type;
    a second setting submodule, configured to set, using a preset second identification value, the emotion attribute of the first words belonging to the second emotion type, to obtain the second word sample of the second emotion type.
  9. The analysis device according to claim 7, characterized in that the setting module comprises:
    a reading module, configured to read the emotion-tendency word of the first word from a data table;
    a third setting submodule, configured to set the emotion attribute of the first word using the emotion label value of the emotion-tendency word, to obtain the second word samples of multiple emotion types.
  10. The analysis device according to any one of claims 6 to 9, characterized in that the analysis device further comprises:
    a second acquisition module, configured to obtain, after word segmentation is performed on the text sample to obtain the word set, the word vector of each word in the text sample by a machine learning method, wherein each one-dimensional datum in the word vector describes one item of attribute information of the word.
CN201410779580.9A 2014-12-15 2014-12-15 The analysis method and device of word affective style Active CN104408035B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410779580.9A CN104408035B (en) 2014-12-15 2014-12-15 The analysis method and device of word affective style

Publications (2)

Publication Number Publication Date
CN104408035A CN104408035A (en) 2015-03-11
CN104408035B true CN104408035B (en) 2018-04-03

Family

ID=52645667

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410779580.9A Active CN104408035B (en) 2014-12-15 2014-12-15 The analysis method and device of word affective style

Country Status (1)

Country Link
CN (1) CN104408035B (en)

Also Published As

Publication number Publication date
CN104408035A (en) 2015-03-11

Similar Documents

Publication Publication Date Title
CN110175325B (en) Comment analysis method based on word vector and syntactic characteristics and visual interaction interface
CN104408191B (en) The acquisition methods and device of the association keyword of keyword
CN109815952A (en) Brand name recognition methods, computer installation and computer readable storage medium
CN107463658B (en) Text classification method and device
CN105808526A (en) Commodity short text core word extracting method and device
CN107544982A (en) Text message processing method, device and terminal
CN108319888B (en) Video type identification method and device and computer terminal
CN106126502A (en) A kind of emotional semantic classification system and method based on support vector machine
CN104778283B (en) A kind of user's occupational classification method and system based on microblogging
CN108959329B (en) Text classification method, device, medium and equipment
CN105843796A (en) Microblog emotional tendency analysis method and device
CN103617192B (en) Clustering method and device for data objects
CN106649250A (en) Method and device for identifying emotional new words
CN107133854A (en) Information recommendation method and device
CN109033212A (en) A kind of file classification method based on similarity mode
CN104462065B (en) Analysis method and device for event affective style
CN106844482B (en) Search engine-based retrieval information matching method and device
CN110059156A (en) Conjunctive-word-based coordinate retrieval method, apparatus, equipment and readable storage medium
CN104035955B (en) Searching method and device
CN112948575A (en) Text data processing method, text data processing device and computer-readable storage medium
CN107291775A (en) Method and device for generating a repair corpus for error samples
CN112699232A (en) Text label extraction method, device, equipment and storage medium
CN107291774A (en) Error sample recognition method and device
CN111144215A (en) Image processing method, image processing device, electronic equipment and storage medium
CN110928986A (en) Legal evidence sorting and recommending method, device, equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Word emotion type analysis method and device

Effective date of registration: 20190531

Granted publication date: 20180403

Pledgee: Shenzhen Black Horse World Investment Consulting Co., Ltd.

Pledgor: Beijing Guoshuang Technology Co.,Ltd.

Registration number: 2019990000503

CP02 Change in the address of a patent holder

Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Patentee after: Beijing Guoshuang Technology Co.,Ltd.

Address before: 100086 Floor 8A, Cuigong Hotel, No. 76 Zhichun Road, Shuangyushu Area, Haidian District, Beijing

Patentee before: Beijing Guoshuang Technology Co.,Ltd.