Summary of the Invention
A primary object of the present invention is to provide a method and a device for analyzing the sentiment type of a word, so as to solve the problem in the prior art that a machine cannot accurately analyze the sentiment type of a word.
To achieve the above object, according to one aspect of the embodiments of the present invention, a method for analyzing the sentiment type of a word is provided.
The analysis method according to the present invention includes: performing word segmentation on a text sample to obtain a word set; extracting a preset number of first words from the word set to obtain a first word sample; setting the sentiment attribute of each first word in the first word sample using preset sentiment label values, to obtain second word samples of multiple sentiment types; calculating Gaussian distribution parameters of each sentiment type based on the word vectors of the second words in the second word samples and the sentiment attributes of the second words; calculating, using the Gaussian distribution parameters, the probability that a third word in the text sample corresponds to each sentiment type; and determining the sentiment type of the third word based on the probabilities that the third word corresponds to each sentiment type.
Further, the multiple sentiment types include a first sentiment type and a second sentiment type, wherein determining the sentiment type of the third word based on the probabilities that the third word corresponds to each sentiment type includes: obtaining a first probability that the third word corresponds to the first sentiment type and a second probability that the third word corresponds to the second sentiment type; calculating the difference between the first probability and the second probability; judging whether the difference is greater than a first preset threshold; if the difference is greater than the first preset threshold, determining that the sentiment type of the third word is the first sentiment type; if the difference is not greater than the first preset threshold, judging whether the difference is less than a second preset threshold; if the difference is less than the second preset threshold, determining that the sentiment type of the third word is the second sentiment type; and if the difference is not less than the second preset threshold, determining that the sentiment type of the third word is a third sentiment type.
Further, setting the sentiment attribute of each first word in the first word sample using preset sentiment label values, to obtain second word samples of multiple sentiment types, includes: setting, using a preset first identifier value, the sentiment attribute of the first words belonging to the first sentiment type, to obtain a second word sample of the first sentiment type; and setting, using a preset second identifier value, the sentiment attribute of the first words belonging to the second sentiment type, to obtain a second word sample of the second sentiment type.
Further, setting the sentiment attribute of each first word in the first word sample using preset sentiment label values, to obtain second word samples of multiple sentiment types, includes: reading the sentiment-orientation word of the first word from a data table; and setting the sentiment attribute of the first word using the sentiment label value of the sentiment-orientation word, to obtain second word samples of multiple sentiment types.
Further, after performing word segmentation on the text sample to obtain the word set, the analysis method also includes: obtaining the word vector of each word in the text sample by a machine learning method, wherein each dimension of the word vector describes one item of attribute information of the word.
To achieve the above object, according to another aspect of the embodiments of the present invention, a device for analyzing the sentiment type of a word is provided.
The analysis device according to the present invention includes: a word segmentation module, configured to perform word segmentation on a text sample to obtain a word set; an extraction module, configured to extract a preset number of first words from the word set to obtain a first word sample; a setting module, configured to set the sentiment attribute of each first word in the first word sample using preset sentiment label values, to obtain second word samples of multiple sentiment types; a first calculation module, configured to calculate Gaussian distribution parameters of each sentiment type based on the word vectors of the second words in the second word samples and the sentiment attributes of the second words; a second calculation module, configured to calculate, using the Gaussian distribution parameters, the probability that a third word in the text sample corresponds to each sentiment type; and a determining module, configured to determine the sentiment type of the third word based on the probabilities that the third word corresponds to each sentiment type.
Further, the multiple sentiment types include a first sentiment type and a second sentiment type, wherein the determining module includes: a first acquisition module, configured to obtain a first probability that the third word corresponds to the first sentiment type and a second probability that the third word corresponds to the second sentiment type; a calculation submodule, configured to calculate the difference between the first probability and the second probability; a judging module, configured to judge whether the difference is greater than a first preset threshold; a first determining submodule, configured to determine, when the difference is greater than the first preset threshold, that the sentiment type of the third word is the first sentiment type; a judging submodule, configured to judge, when the difference is not greater than the first preset threshold, whether the difference is less than a second preset threshold; a second determining submodule, configured to determine, when the difference is less than the second preset threshold, that the sentiment type of the third word is the second sentiment type; and a third determining submodule, configured to determine, when the difference is not less than the second preset threshold, that the sentiment type of the third word is a third sentiment type.
Further, the setting module includes: a first setting submodule, configured to set, using a preset first identifier value, the sentiment attribute of the first words belonging to the first sentiment type, to obtain a second word sample of the first sentiment type; and a second setting submodule, configured to set, using a preset second identifier value, the sentiment attribute of the first words belonging to the second sentiment type, to obtain a second word sample of the second sentiment type.
Further, the setting module includes: a reading module, configured to read the sentiment-orientation word of the first word from a data table; and a third setting submodule, configured to set the sentiment attribute of the first word using the sentiment label value of the sentiment-orientation word, to obtain second word samples of multiple sentiment types.
Further, the analysis device also includes: a second acquisition module, configured to obtain, after word segmentation is performed on the text sample to obtain the word set, the word vector of each word in the text sample by a machine learning method, wherein each dimension of the word vector describes one item of attribute information of the word.
With the embodiments of the present invention, after word segmentation is performed on the text sample to obtain the word set, a preset number of words are extracted from the word set by a random algorithm to obtain the first word sample, and the sentiment attribute of each word in the first word sample is set using preset sentiment label values to obtain the second word samples. The Gaussian distribution parameters of each sentiment type are calculated based on the word vectors and sentiment attributes of the words in the second word samples, the probability that a third word in the text sample (such as a given sample word) corresponds to each sentiment type is calculated using the Gaussian distribution parameters, and the sentiment type of the third word is determined based on the probabilities of the sentiment types. In the embodiments of the present invention, the sentiment types of only some of the words in the word set are labeled, and the Gaussian distribution parameters are used to calculate the probability that a given sample word corresponds to each sentiment type, so that these probabilities can be obtained accurately and the sentiment type of the given sample word can be determined from them. Moreover, the sentiment type of a given sample word can be determined even if the word is not present in the corpus of a sentiment-orientation vocabulary, which improves the accuracy of analyzing the sentiment type of the given sample word and removes the need to traverse the corpus to look up its sentiment type. The embodiments of the present invention thus solve the problem in the prior art that a machine cannot accurately analyze the sentiment type of a word, and achieve the effect that a machine can automatically judge the sentiment orientation of the words already in the corpus and automatically and accurately analyze the sentiment type of a given sample word.
Detailed Description
First, some of the nouns or terms that appear in the description of the embodiments of the present invention are explained as follows:
Machine learning is a method of converting data into information by extracting rules or patterns from the data. The main machine learning methods are inductive learning and analytical learning. In the machine learning process, the data are first preprocessed to form features, and a model is then created from those features; the machine learning algorithm analyzes the collected data and assigns weights, thresholds, and other parameters to achieve the learning objective.
It should be noted that, provided there is no conflict, the embodiments in the present application and the features in the embodiments may be combined with one another. The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
To enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings of the embodiments. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings of the present invention are used to distinguish similar objects, and are not used to describe a specific order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments of the present invention described herein can be implemented. In addition, the terms "comprise" and "have", and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product, or device.
An embodiment of the present invention provides a method for analyzing the sentiment type of a word.
Fig. 1 is a flowchart of the method for analyzing the sentiment type of a word according to an embodiment of the present invention. As shown in Fig. 1, the analysis method may include the following steps:
Step S102: perform word segmentation on a text sample to obtain a word set.
Step S104: extract a preset number of first words from the word set to obtain a first word sample.
Specifically, a program that applies a random algorithm may be used to extract the preset number of first words from the word set, obtaining the first word sample.
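The random extraction in step S104 can be illustrated with a short sketch. The function name `extract_first_word_sample`, the `seed` parameter, and the toy word set are assumptions introduced here for illustration; the source only states that a random algorithm is applied.

```python
import random


def extract_first_word_sample(word_set, preset_number, seed=None):
    """Draw `preset_number` distinct words at random from the word set.

    Hypothetical helper: the source does not prescribe a particular
    random algorithm, so uniform sampling without replacement is assumed.
    """
    rng = random.Random(seed)
    # Sort first so that a fixed seed gives a repeatable draw from a set.
    words = sorted(word_set)
    return rng.sample(words, min(preset_number, len(words)))


# Toy word set standing in for the output of step S102.
word_set = {"today", "weather", "very", "good", "bad", "rain"}
first_word_sample = extract_first_word_sample(word_set, 3, seed=0)
```

Sampling without replacement keeps the first words distinct, so each one is labeled exactly once in step S106.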
Step S106: set the sentiment attribute of each first word in the first word sample using preset sentiment label values, to obtain second word samples of multiple sentiment types.
Step S108: calculate Gaussian distribution parameters of each sentiment type based on the word vectors of the second words in the second word samples and the sentiment attributes of the second words.
Step S110: calculate, using the Gaussian distribution parameters, the probability that a third word in the text sample corresponds to each sentiment type.
Step S112: determine the sentiment type of the third word based on the probabilities that the third word corresponds to each sentiment type.
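Steps S108 and S110 can be sketched in a few lines of standard-library Python. This is a simplified illustration rather than the method as claimed: it assumes a diagonal covariance (independent dimensions), a small variance floor, and equal priors when turning densities into probabilities. None of these choices comes from the source, and all function names are hypothetical.

```python
import math


def fit_diag_gaussian(vectors, var_floor=1e-6):
    """Maximum-likelihood mean and per-dimension variance.

    Assumption: diagonal covariance, with a floor to avoid zero variance.
    """
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    var = [max(sum((v[d] - mean[d]) ** 2 for v in vectors) / n, var_floor)
           for d in range(dim)]
    return mean, var


def log_density(x, mean, var):
    """Log of the diagonal Gaussian density evaluated at x."""
    return sum(-0.5 * (math.log(2 * math.pi * s) + (xi - m) ** 2 / s)
               for xi, m, s in zip(x, mean, var))


def type_probabilities(x, params):
    """Normalise per-type densities into probabilities (equal priors assumed)."""
    logs = {t: log_density(x, m, s) for t, (m, s) in params.items()}
    top = max(logs.values())  # subtract the maximum for numerical stability
    weights = {t: math.exp(value - top) for t, value in logs.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


# Toy 2-dimensional vectors stand in for the word vectors of labeled words.
positive = [[1.0, 1.2], [0.8, 1.0], [1.2, 0.9]]
negative = [[-1.0, -0.9], [-1.1, -1.2], [-0.8, -1.0]]
params = {"positive": fit_diag_gaussian(positive),
          "negative": fit_diag_gaussian(negative)}
probs = type_probabilities([0.9, 1.1], params)
```

Working in log space avoids underflow, which matters when the word vectors have hundreds of dimensions.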
With the embodiments of the present invention, after word segmentation is performed on the text sample to obtain the word set, a preset number of words are extracted from the word set by a random algorithm to obtain the first word sample, and the sentiment attribute of each word in the first word sample is set using preset sentiment label values to obtain the second word samples. The Gaussian distribution parameters of each sentiment type are calculated based on the word vectors and sentiment attributes of the words in the second word samples, the probability that a third word in the text sample (such as a given sample word) corresponds to each sentiment type is calculated using the Gaussian distribution parameters, and the sentiment type of the third word is determined based on the probabilities of the sentiment types. In the embodiments of the present invention, the sentiment types of only some of the words in the word set are labeled, and the Gaussian distribution parameters are used to calculate the probability that a given sample word corresponds to each sentiment type, so that these probabilities can be obtained accurately and the sentiment type of the given sample word can be determined from them. Moreover, the sentiment type of a given sample word can be determined even if the word is not present in the corpus of a sentiment-orientation vocabulary, which improves the accuracy of analyzing the sentiment type of the given sample word and removes the need to traverse the corpus to look up its sentiment type. The embodiments of the present invention thus solve the problem in the prior art that a machine cannot accurately analyze the sentiment type of a word, and achieve the effect that a machine can automatically judge the sentiment orientation of the words already in the corpus and automatically and accurately analyze the sentiment type of a given sample word.
In the above embodiment, the text information may be text obtained from the Internet (e.g., a news item or a microblog comment), electronic text obtained by scanning or typing in the content of a paper document, electronic text input by a user through a terminal, and so on; the machine may be a computer or a computer program.
It should be further noted that performing word segmentation on the text information to obtain the word set may be implemented as follows: the text information is split into multiple words according to preset word combinations, and the multiple words are saved to obtain the word set.
Specifically, the preset word combinations may be obtained from a word database, and the words in the text information are matched against the preset word combinations in the word database; if a word in the text information is identical to a preset word combination, that word is split off from the text information, yielding multiple words.
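One common way to realize this kind of dictionary matching is forward maximum matching, sketched below. The source does not name a matching strategy, so the choice of forward maximum matching, the function name `segment`, the in-memory `lexicon` standing in for the word database, and the example sentence are all assumptions made for illustration.

```python
def segment(text, lexicon):
    """Split text into words by forward maximum matching against a lexicon.

    Hypothetical sketch: at each position the longest lexicon entry is
    preferred, and an unmatched character is emitted on its own.
    """
    max_len = max((len(word) for word in lexicon), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, falling back to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Illustrative lexicon standing in for the preset word combinations.
lexicon = {"今天", "天气", "很", "好"}
words = segment("今天天气很好", lexicon)
word_set = set(words)
```

Characters that match no entry are kept as single-character words, so no part of the text information is lost during splitting.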
Optionally, a word segmentation tool may be used to perform word segmentation on the text information.
For example, if the text information is "the weather is very good today", then after the word segmentation tool performs word segmentation on the text information, the words obtained may be "today", "weather", "very", and "good".
According to the above embodiment of the present invention, the multiple sentiment types include a first sentiment type and a second sentiment type, wherein determining the sentiment type of the third word based on the probabilities that the third word corresponds to each sentiment type may include: obtaining a first probability that the third word corresponds to the first sentiment type and a second probability that the third word corresponds to the second sentiment type; calculating the difference between the first probability and the second probability; judging whether the difference is greater than a first preset threshold; if the difference is greater than the first preset threshold, determining that the sentiment type of the third word is the first sentiment type; if the difference is not greater than the first preset threshold, judging whether the difference is less than a second preset threshold; if the difference is less than the second preset threshold, determining that the sentiment type of the third word is the second sentiment type; and if the difference is not less than the second preset threshold, determining that the sentiment type of the third word is the third sentiment type.
Specifically, the first probability that the third word (such as a given sample word) corresponds to the first sentiment type and the second probability that the third word corresponds to the second sentiment type are obtained, the difference between the first probability and the second probability is calculated, and whether the difference is greater than the first preset threshold is judged. When the difference is greater than the first preset threshold, the sentiment type of the third word is determined to be the first sentiment type; when the difference is not greater than the first preset threshold, whether the difference is less than the second preset threshold is judged; when the difference is less than the second preset threshold, the sentiment type of the third word is determined to be the second sentiment type; and when the difference is not less than the second preset threshold, the sentiment type of the third word is determined to be the third sentiment type.
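The two-threshold decision described above maps directly onto a small function. The function name, the string labels for the three sentiment types, and the example threshold values of 0.3 and -0.3 are assumptions chosen for illustration.

```python
def determine_sentiment_type(first_probability, second_probability,
                             first_threshold, second_threshold):
    """Return the sentiment type implied by the probability difference."""
    difference = first_probability - second_probability
    if difference > first_threshold:
        return "first"
    # Reached only when the difference is not greater than the first threshold.
    if difference < second_threshold:
        return "second"
    # Not greater than the first threshold and not less than the second.
    return "third"


# Example thresholds (hypothetical values).
clearly_first = determine_sentiment_type(0.9, 0.1, 0.3, -0.3)
clearly_second = determine_sentiment_type(0.2, 0.8, 0.3, -0.3)
undecided = determine_sentiment_type(0.55, 0.45, 0.3, -0.3)
```

The checks run in the same order as in the text, so a difference that lies between the two thresholds falls through to the third sentiment type.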
In an optional embodiment, the absolute values of the first preset threshold and the second preset threshold may be equal (this absolute value may be denoted the preset probability value), with the first preset threshold positive and the second preset threshold negative. In this embodiment, when the absolute value of the difference between the first probability and the second probability is greater than the preset probability value, the word to which the two probabilities correspond is judged to have an obvious sentiment orientation (i.e., sentiment type), and the sentiment type corresponding to the larger probability is the sentiment type of the word; when the absolute value of the difference is not greater than the preset probability value, the sentiment type of the word is judged to be not obvious, and is the third sentiment type (i.e., the neutral sentiment type).
In the above embodiment of the present invention, the first sentiment type may be the positive sentiment type, the second sentiment type may be the negative sentiment type, and the third sentiment type may be the neutral sentiment type.
It should be further noted that the third word is a word in the word set.
Through the above embodiment of the present invention, the sentiment type of the word to which the first probability and the second probability correspond is determined according to thresholds set in advance, which improves the accuracy of the determined sentiment type.
According to the above embodiment of the present invention, setting the sentiment attribute of each first word in the first word sample using preset sentiment label values, to obtain second word samples of multiple sentiment types, may include: setting, using a preset first identifier value, the sentiment attribute of the first words belonging to the first sentiment type, to obtain a second word sample of the first sentiment type; and setting, using a preset second identifier value, the sentiment attribute of the first words belonging to the second sentiment type, to obtain a second word sample of the second sentiment type.
Specifically, the sentiment attributes of the words belonging to each sentiment type are marked with the respective preset identifier values; that is, the first identifier value is used to set the sentiment attribute of words of the first sentiment type, and the second identifier value is used to set the sentiment attribute of words of the second sentiment type.
In the above embodiment, the first identifier value may be 1, representing the positive sentiment type (i.e., the first sentiment type); the second identifier value may be -1, representing the negative sentiment type (i.e., the second sentiment type).
Further, the second word samples may also include other words in the word set, other than the first words, that have not been labeled with a sentiment type.
In the above embodiment of the present invention, setting the sentiment attribute of each first word in the first word sample using preset sentiment label values, to obtain second word samples of multiple sentiment types, may include: reading the sentiment-orientation word of the first word from a data table; and setting the sentiment attribute of the first word using the sentiment label value of the sentiment-orientation word, to obtain second word samples of multiple sentiment types.
Specifically, the sentiment-orientation word of a word is read from the data table, and the sentiment label value of that sentiment-orientation word (e.g., the sentiment label value of the positive sentiment type is 1, and the sentiment label value of the negative sentiment type is -1) is used to set the sentiment attribute of the word, obtaining the second word samples.
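As a rough illustration of this labeling step, the sketch below treats the data table as an in-memory dictionary that maps a word to the label value of its sentiment-orientation word. The dictionary `orientation_table`, the function name, and the example words are hypothetical; only the label values 1 and -1 come from the text.

```python
POSITIVE, NEGATIVE = 1, -1  # sentiment label values given in the text


def label_first_words(first_word_sample, orientation_table):
    """Group first words into second word samples keyed by label value.

    Words without a positive or negative entry in the table are returned
    separately as unlabeled (an assumption made for this sketch).
    """
    second_word_samples = {POSITIVE: [], NEGATIVE: []}
    unlabeled = []
    for word in first_word_sample:
        label = orientation_table.get(word)
        if label in second_word_samples:
            second_word_samples[label].append(word)
        else:
            unlabeled.append(word)
    return second_word_samples, unlabeled


# Hypothetical data table contents.
orientation_table = {"good": POSITIVE, "excellent": POSITIVE, "bad": NEGATIVE}
second_word_samples, unlabeled = label_first_words(
    ["good", "bad", "weather", "excellent"], orientation_table)
```

Each list in `second_word_samples` then supplies the words whose vectors are used to estimate the Gaussian distribution parameters of the corresponding sentiment type.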
In the above embodiment, the sentiment-orientation words may include words that represent the positive sentiment type, such as commendatory or positive words; they may also include words that represent the negative sentiment type, such as derogatory or negative words; and they may also include words that represent the neutral sentiment type.
According to the above embodiment of the present invention, after word segmentation is performed on the text sample to obtain the word set, the analysis method may also include: obtaining the word vector of each word in the text sample by a machine learning method, wherein each dimension of the word vector describes one item of attribute information of the word.
Specifically, the word vector of each word in the text sample may be obtained by a machine learning method (e.g., a machine learning program). Optionally, the word vector in this embodiment may be a 500-dimensional vector; in this embodiment, using 500-dimensional vectors can ensure both the operating efficiency of the terminal and the accuracy of the results.
The tool word2vec may be used to represent words as word vectors. word2vec is a tool that converts words into vector form.
Through the above embodiment of the present invention, each word in the text sample is vectorized using word vectors to obtain multiple word sets. When analyzing the sentiment type of a given sample word, it is only necessary to calculate the Gaussian distribution parameters of each sentiment type from the word vectors and sentiment attributes of the words, then use the Gaussian distribution parameters to calculate the probability that other words in the text sample (such as the given sample word) correspond to each sentiment type, and determine the sentiment type of the given sample word from those probabilities. There is no need to analyze the orientation of a word by traversal as in the prior art, which saves the space needed to store words and text samples; when the amount of text information is large, the sentiment type of a given sample word can be analyzed quickly and accurately.
Fig. 2 is a flowchart of an optional method for analyzing the sentiment type of a word according to an embodiment of the present invention. The above embodiment of the present invention is described in detail below with reference to Fig. 2.
As shown in Fig. 2, in this embodiment the sentiment orientation of words is labeled by random sampling, the sentiment orientation of words is partitioned using Gaussian distributions, and the sentiment orientation of a given sample word (i.e., the third word in the above embodiment of the present invention) is then judged. Specifically, the analysis method may include the following steps:
Step S202: perform word segmentation on the text training sample to obtain multiple words, and tag the part of speech of each word.
Here, the text training sample is the text sample in the above embodiment of the present invention.
Step S204: represent each word as an array, obtaining the unique array corresponding to each word by a machine learning method.
Here, the array is the word vector in the above embodiment, and the array may be a 500-dimensional array.
Step S206: label the sentiment orientation of the subsample words using random sampling.
Specifically, the sentiment attribute of a word of the positive sentiment type is labeled 1, and the sentiment attribute of a word of the negative sentiment type is labeled -1.
Here, the subsample words are the first word sample in the above embodiment of the present invention.
Step S208: use the maximum likelihood method to calculate the high-dimensional Gaussian distribution parameters of the words of the positive sentiment type and of the words of the negative sentiment type, respectively.
Step S210: look up the array corresponding to a given new sample word, and calculate the probability that it is of the positive sentiment type and the probability that it is of the negative sentiment type.
Here, the sample word is the third word in the above embodiment of the present invention, and is a word included in the text training sample.
Step S212: determine the sentiment orientation of the sample word according to the sentiment-orientation probability threshold.
Here, the sentiment orientation is the sentiment type in the above embodiment of the present invention.
Specifically, a preset sentiment-orientation probability threshold is obtained. If the difference between the probability that the sample word is of the positive sentiment type and the probability that it is of the negative sentiment type is greater than the sentiment-orientation probability threshold, the sample word is determined to have an obvious sentiment orientation, and the sentiment type with the larger probability is determined to be the sentiment type of the sample word. If the difference between the two probabilities is not greater than the sentiment-orientation probability threshold, the sentiment orientation of the sample word is determined to be neutral; that is, the sentiment type of the sample word is the neutral sentiment type.
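Steps S208 and S210 can also be written out explicitly. The following is a sketch under the assumption that each sentiment type $c$ is modeled by a $d$-dimensional Gaussian with a full covariance matrix fitted to the $N_c$ labeled word vectors $x_i$ of that type. The symbols and the equal-prior normalization in the last line are notation introduced here, not taken from the source.

```latex
\hat{\mu}_c = \frac{1}{N_c} \sum_{i=1}^{N_c} x_i, \qquad
\hat{\Sigma}_c = \frac{1}{N_c} \sum_{i=1}^{N_c}
  (x_i - \hat{\mu}_c)(x_i - \hat{\mu}_c)^{\top}

p(x \mid c) = \frac{1}{(2\pi)^{d/2} \, |\hat{\Sigma}_c|^{1/2}}
  \exp\!\left( -\tfrac{1}{2} (x - \hat{\mu}_c)^{\top}
  \hat{\Sigma}_c^{-1} (x - \hat{\mu}_c) \right)

P(c \mid x) = \frac{p(x \mid c)}{\sum_{c'} p(x \mid c')}
```

The first line gives the maximum likelihood estimates of the mean and covariance. The estimated covariance is invertible only when more than $d$ labeled vectors are available for the type, so with 500-dimensional arrays a regularized or diagonal covariance may be needed in practice.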
Through the above embodiment of the present invention, each word is represented as an array, and the unique array corresponding to each word is obtained by a machine learning method, so computation is fast; the words are labeled with sentiment orientation and the high-dimensional Gaussian distribution parameters are calculated by the maximum likelihood method, so the judgment of the sentiment type is more accurate; and a sentiment-orientation probability threshold is used, so the accuracy of judging the sentiment type of a word can be adjusted according to the analyst's requirements, which increases the usability of the results.
An embodiment of the present invention also provides a device for analyzing the sentiment type of a word. The analysis device may realize its functions through the method for analyzing the sentiment type of a word in the above embodiment of the present invention.
Fig. 3 is a schematic diagram of the device for analyzing the sentiment type of a word according to an embodiment of the present invention. As shown in Fig. 3, the analysis device may include: a word segmentation module 10, configured to perform word segmentation on a text sample to obtain a word set; an extraction module 30, configured to extract a preset number of first words from the word set to obtain a first word sample; a setting module 50, configured to set the sentiment attribute of each first word in the first word sample using preset sentiment label values, to obtain second word samples of multiple sentiment types; a first calculation module 70, configured to calculate Gaussian distribution parameters of each sentiment type based on the word vectors of the second words in the second word samples and the sentiment attributes of the second words; a second calculation module 90, configured to calculate, using the Gaussian distribution parameters, the probability that a third word in the text sample corresponds to each sentiment type; and a determining module 110, configured to determine the sentiment type of the third word based on the probabilities that the third word corresponds to each sentiment type.
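The way the six modules hand data to one another can be sketched as a small class. This is only a structural illustration: the class name, the field names, and the stub callables in the usage lines are assumptions, and real modules would implement the steps described for the method.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SentimentAnalysisDevice:
    """Hypothetical wiring of modules 10 to 110 as plain callables."""
    segment: Callable         # word segmentation module 10
    extract: Callable         # extraction module 30
    set_attributes: Callable  # setting module 50
    fit: Callable             # first calculation module 70
    score: Callable           # second calculation module 90
    determine: Callable       # determining module 110

    def analyze(self, text_sample, third_word):
        word_set = self.segment(text_sample)
        first_word_sample = self.extract(word_set)
        second_word_samples = self.set_attributes(first_word_sample)
        parameters = self.fit(second_word_samples)
        probabilities = self.score(third_word, parameters)
        return self.determine(probabilities)


# Stub modules, used only to show the data flow.
device = SentimentAnalysisDevice(
    segment=lambda text: text.split(),
    extract=lambda words: words[:2],
    set_attributes=lambda sample: {1: sample, -1: []},
    fit=lambda samples: samples,
    score=lambda word, parameters: {"first": 0.8, "second": 0.2},
    determine=lambda probabilities: max(probabilities, key=probabilities.get),
)
result = device.analyze("good day today", "good")
```

Keeping each module as an independent callable mirrors the module boundaries in Fig. 3 and lets any one of them be replaced without touching the others.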
With the embodiments of the present invention, after word segmentation is performed on the text sample to obtain the word set, a preset number of words are extracted from the word set to obtain the first word sample, and the sentiment attribute of each word in the first word sample is set using preset sentiment label values to obtain the second word samples. The Gaussian distribution parameters of each sentiment type are calculated based on the word vectors and sentiment attributes of the words in the second word samples, the probability that a third word in the text sample corresponds to each sentiment type is calculated using the Gaussian distribution parameters, and the sentiment type of the third word (such as a given sample word) is determined based on the probabilities of the sentiment types. In the embodiments of the present invention, the sentiment types of only some of the words in the word set are labeled, and the Gaussian distribution parameters are used to calculate the probability that a given sample word corresponds to each sentiment type, so that these probabilities can be obtained accurately and the sentiment type of the given sample word can be determined from them. Moreover, the sentiment type of a given sample word can be determined even if the word is not present in the corpus of a sentiment-orientation vocabulary, which improves the accuracy of analyzing the sentiment type of the given sample word and removes the need to traverse the corpus to look up its sentiment type. The embodiments of the present invention thus solve the problem in the prior art that a machine cannot accurately analyze the sentiment type of a word, and achieve the effect that a machine can automatically judge the sentiment orientation of the words already in the corpus and automatically and accurately analyze the sentiment type of a given sample word.
Specifically, a program that applies a random algorithm may be used to extract the preset number of first words from the word set, obtaining the first word sample.
In the above embodiment, the text information may be text obtained from the Internet (e.g., a news item or a microblog comment), electronic text obtained by scanning or typing in the content of a paper document, electronic text input by a user through a terminal, and so on; the machine may be a computer or a computer program.
It should be further noted that performing word segmentation on the text information to obtain the word set may be implemented as follows: the text information is split into multiple words according to preset word combinations, and the multiple words are saved to obtain the word set.
Specifically, the preset word combinations may be obtained from a word database, and the words in the text information are matched against the preset word combinations in the database; if a word in the text information is identical to a preset word combination, that word is split off from the text information, yielding the multiple words.
Optionally, word segmentation may be performed on the text information using a word segmentation tool.
For example, if the text information is "the weather is very good today", the words obtained after segmenting the text information with a word segmentation tool may be "today", "weather", "very" and "good".
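The dictionary-matching approach to word segmentation described above can be sketched as a greedy longest-match scan. This is only one way to match text against preset word combinations, and the text does not say which matching strategy is used. The function `segment`, the small dictionary and the whitespace-free input string (standing in for unsegmented text) are all assumptions made for illustration.

```python
def segment(text, dictionary):
    """Split `text` into words by greedy longest-match against `dictionary`."""
    max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                words.append(candidate)
                i += size
                break
        else:
            # No dictionary entry starts here: emit the single character.
            words.append(text[i])
            i += 1
    return words


# Hypothetical preset word combinations, as if read from a word database.
dictionary = {"today", "weather", "very", "good"}
words = segment("todayweatherverygood", dictionary)
# words == ["today", "weather", "very", "good"]
```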
According to the above embodiment of the present invention, the multiple emotion types include a first emotion type and a second emotion type, and the determining module may include: a first acquisition module for obtaining a first probability that the third word corresponds to the first emotion type and a second probability that the third word corresponds to the second emotion type; a calculation submodule for calculating the difference between the first probability and the second probability; a judgment module for judging whether the difference is greater than a first preset threshold; a first determination submodule for determining that the emotion type of the third word is the first emotion type when the difference is greater than the first preset threshold; a judgment submodule for judging whether the difference is less than a second preset threshold when the difference is not greater than the first preset threshold; a second determination submodule for determining that the emotion type of the third word is the second emotion type when the difference is less than the second preset threshold; and a third determination submodule for determining that the emotion type of the third word is a third emotion type when the difference is not less than the second preset threshold.
Specifically, the first probability that the third word (for example, a given sample word) corresponds to the first emotion type and the second probability that the third word corresponds to the second emotion type are obtained, the difference between the first probability and the second probability is calculated, and it is judged whether the difference is greater than the first preset threshold. When the difference is greater than the first preset threshold, the emotion type of the third word is judged to be the first emotion type. When the difference is not greater than the first preset threshold, it is judged whether the difference is less than the second preset threshold: if the difference is less than the second preset threshold, the emotion type of the third word is judged to be the second emotion type; if the difference is not less than the second preset threshold, the emotion type of the third word is judged to be the third emotion type.
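The two-threshold decision described above can be sketched as a small function. The name `classify`, the string labels it returns and the threshold values 0.3 and -0.3 in the usage lines are hypothetical; the text does not give concrete threshold values.

```python
def classify(p_first, p_second, first_threshold, second_threshold):
    """Pick an emotion type from the difference of two probabilities."""
    diff = p_first - p_second
    if diff > first_threshold:
        return "first"   # difference exceeds the first preset threshold
    if diff < second_threshold:
        return "second"  # difference falls below the second preset threshold
    return "third"       # neither threshold is crossed


# Hypothetical thresholds: 0.3 and -0.3.
strong_first = classify(0.8, 0.1, 0.3, -0.3)   # "first"
strong_second = classify(0.2, 0.7, 0.3, -0.3)  # "second"
undecided = classify(0.5, 0.4, 0.3, -0.3)      # "third"
```

Choosing thresholds of equal magnitude and opposite sign, as in the usage lines, makes the rule symmetric: a word receives the first or second emotion type only when one probability exceeds the other by more than that magnitude.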
In an optional embodiment, the absolute values of the first preset threshold and the second preset threshold (which may be denoted the preset probability value) may be equal, with the first preset threshold taken as positive and the second preset threshold as negative. In this embodiment, when the absolute value of the difference between the first probability and the second probability is greater than the preset probability value, the word to which the first probability and the second probability correspond is judged to have an obvious emotional orientation (i.e., emotion type), and the emotion type corresponding to the larger probability is the emotion type of the word; when the absolute value of the difference between the first probability and the second probability is not greater than the preset probability value, the emotion type of the word is judged not to be obvious, and the word is assigned the third emotion type (i.e., the neutral emotion type).
In the above embodiment of the present invention, the first emotion type may be a positive emotion type, the second emotion type may be a negative emotion type, and the third emotion type may be a neutral emotion type.
It should be further noted that the third word is a word in the word set.
By the above embodiment of the present invention, the emotion type of the word to which the first probability and the second probability correspond is determined according to preset thresholds, which improves the accuracy of the determined emotion type.
According to the above embodiment of the present invention, the setting module may include: a first setting submodule for setting, using a preset first identification value, the emotion attribute of the first words that belong to the first emotion type, obtaining the second word sample of the first emotion type; and a second setting submodule for setting, using a preset second identification value, the emotion attribute of the first words that belong to the second emotion type, obtaining the second word sample of the second emotion type.
Specifically, the emotion attributes of the words belonging to each emotion type are marked with the corresponding preset identification value; that is, the emotion attribute of the words of the first emotion type is set using the first identification value, and the emotion attribute of the words of the second emotion type is set using the second identification value.
In the above embodiment, the first identification value may be 1, representing the positive emotion type (i.e., the first emotion type); the second identification value may be -1, representing the negative emotion type (i.e., the second emotion type).
Further, the second word sample may also include the other words in the word set, apart from the first words, that have not been labelled with an emotion type.
In the above embodiment of the present invention, the setting module may include: a reading module for reading the emotion-orientation word of the first word from a data table; and a third setting submodule for setting the emotion attribute of the first word using the emotion label value of the emotion-orientation word, obtaining the second word samples of the multiple emotion types.
Specifically, the emotion-orientation word of a word is read from the data table, and the emotion attribute of the word is set using the emotion label value of that emotion-orientation word (e.g., the emotion label value of the positive emotion type is 1, and the emotion label value of the negative emotion type is -1), obtaining the second word sample.
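The labelling step described above can be sketched as a table lookup. The names `TAG_VALUES`, `label_words` and `orientation_table`, and the example words, are hypothetical; the data table is modelled here as a plain dictionary. Only the label values 1 and -1 come from the text, so words without a positive or negative entry are left unlabelled (`None`) in this sketch.

```python
# Emotion label values given in the text: 1 for positive, -1 for negative.
TAG_VALUES = {"positive": 1, "negative": -1}


def label_words(first_words, orientation_table):
    """Set the emotion attribute of each first word from the data table."""
    second_word_sample = {}
    for word in first_words:
        # Read the emotion-orientation word recorded for this word, if any.
        orientation = orientation_table.get(word)
        # Words without a known orientation keep the attribute None.
        second_word_sample[word] = TAG_VALUES.get(orientation)
    return second_word_sample


# Hypothetical data table mapping words to their emotion-orientation word.
orientation_table = {"good": "positive", "bad": "negative"}
second_word_sample = label_words(["good", "bad", "weather"], orientation_table)
# second_word_sample == {"good": 1, "bad": -1, "weather": None}
```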
In the above embodiment, the emotion-orientation words may include words that represent the positive emotion type, such as commendatory words or positive words; they may also include words that represent the negative emotion type, such as derogatory words or negative words; and they may also include words that represent the neutral emotion type.
According to the above embodiment of the present invention, the analysis apparatus may further include: a second acquisition module for obtaining, by a machine learning method, the word vector of each word in the text sample after word segmentation has been performed on the text sample to obtain the word set, wherein each dimension of a word vector describes one item of attribute information of the word.
Specifically, the word vector of each word in the text sample may be obtained by a machine learning method (e.g., a machine learning program). Optionally, the word vectors in this embodiment may be 500-dimensional; using 500-dimensional vectors in this embodiment can ensure both the operating efficiency of the terminal and the accuracy of the results.
Here, the tool word2vec may be used to represent words as word vectors. word2vec is a tool that converts words into vector form.
By the above embodiments of the present invention, each word in the text sample is represented as a word vector, yielding multiple word sets. When the emotion type of a given sample word is analyzed, only the word vectors and emotion attributes of the labelled words are needed to calculate the Gaussian distribution parameters of each emotion type; the Gaussian distribution parameters are then used to calculate the probability that each other word in the text sample (such as the given sample word) corresponds to each emotion type, and the emotion type of the given sample word is determined from those probabilities. The orientation of a word no longer has to be analyzed by traversal as in the prior art, which saves the space needed to store words and text samples, and the emotion type of the given sample word can be analyzed quickly and accurately even when the volume of text information is large.
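The calculation described above can be sketched end to end in plain Python. Several simplifications here are assumptions rather than details from the text: each emotion type is modelled as a Gaussian with independent dimensions (per-dimension mean and variance), all emotion types are given equal prior weight, and the toy vectors are 2-dimensional instead of 500-dimensional. The function names are hypothetical.

```python
import math


def fit_gaussian(vectors):
    """Per-dimension mean and variance of a list of equal-length vectors."""
    n = len(vectors)
    dims = len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dims)]
    variances = [
        # A small floor keeps every variance strictly positive.
        max(sum((v[d] - means[d]) ** 2 for v in vectors) / n, 1e-6)
        for d in range(dims)
    ]
    return means, variances


def log_density(vector, means, variances):
    """Log of the Gaussian density, assuming independent dimensions."""
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        for x, mu, var in zip(vector, means, variances)
    )


def class_probabilities(vector, params):
    """Probability of each emotion type for one word vector (equal priors)."""
    logs = {label: log_density(vector, *p) for label, p in params.items()}
    top = max(logs.values())  # subtract the maximum for numerical stability
    weights = {label: math.exp(v - top) for label, v in logs.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


# Toy labelled word vectors keyed by emotion label value (1 or -1).
labelled = {
    1: [[0.9, 0.8], [1.0, 0.7], [0.8, 0.9]],
    -1: [[-0.9, -0.7], [-1.0, -0.8], [-0.8, -0.9]],
}
params = {label: fit_gaussian(vecs) for label, vecs in labelled.items()}
probs = class_probabilities([0.7, 0.6], params)
```

Only the per-type means and variances need to be kept after fitting, which is in line with the storage saving described above: classifying a new word requires its vector and the fitted parameters, not a pass over the corpus.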
The modules provided in this embodiment are used in the same way as the corresponding steps of the method embodiment, and their application scenarios may also be the same. It should of course be noted that the solutions involving the above modules are not limited to the content and scenarios of the above embodiments, and that the above modules may run on a terminal or a mobile terminal and may be implemented in software or hardware.
As can be seen from the above description, the present invention achieves the following technical effects:
With the embodiment of the present invention, after word segmentation is performed on the text sample to obtain the word set, a preset number of words is extracted from the word set to obtain the first word sample, and the emotion attribute of each word in the first word sample is set using a preset emotion label value to obtain the second word sample. The Gaussian distribution parameters of each emotion type are calculated from the word vectors and emotion attributes of the words in the second word sample, the probability that a third word in the text sample (for example, a given sample word) corresponds to each emotion type is calculated using the Gaussian distribution parameters, and the emotion type of the third word is determined from those probabilities. In this embodiment of the present invention, only the emotion types of some of the words in the word set are labelled, and the Gaussian distribution parameters are used to calculate the probability that the given sample word corresponds to each emotion type, so these probabilities can be obtained accurately and the emotion type of the given sample word can be determined from them. The given sample word need not be present in the corpus of an emotion-orientation vocabulary for its emotion type to be determined, which improves the accuracy of the analysis and removes the need to traverse the corpus to look up the emotion type of the given sample word. The embodiment of the present invention thus solves the prior-art problem that the emotion type of a word cannot be analyzed accurately, and achieves the effect of accurately analyzing the emotion type of the given sample word.
Obviously, those skilled in the art should understand that the above modules or steps of the present invention may be implemented with a general-purpose computing device. They may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented with program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; alternatively, they may each be fabricated as an individual integrated circuit module, or multiple modules or steps among them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The foregoing describes only the preferred embodiments of the present invention and is not intended to limit the invention; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent substitution, improvement and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.