CN103020249A - Classifier construction method and device as well as Chinese text sentiment classification method and system - Google Patents


Info

Publication number
CN103020249A
Authority
CN
China
Prior art keywords
labeled
sample
sentiment polarity
sentiment word
negative
Legal status
Pending
Application number
CN2012105564463A
Other languages
Chinese (zh)
Inventor
李寿山
张小倩
周国栋
Current Assignee
Suzhou University
Original Assignee
Suzhou University
Application CN2012105564463A filed by Suzhou University; published as CN103020249A

Abstract

The invention provides a classifier construction method and device, and a Chinese text sentiment classification method and system. The method comprises the following steps: obtaining a sample to be labeled from a set of samples to be labeled; looking up the sentiment words in the sample and obtaining the sentiment polarity of each sentiment word; reversing the polarity of those sentiment words in the sample that match a sentiment polarity conversion rule; counting the positive and the negative sentiment words in the sample; determining the sentiment polarity of the sample from these two counts to obtain a labeled sample; using the labeled sample to label the other samples in the set, obtaining a labeled sample set; constructing a maximum entropy classifier from the labeled sample set; and classifying a Chinese text with the maximum entropy classifier. The method, device, and system provided by the invention shorten the time needed for Chinese text classification and improve classification accuracy.

Description

Classifier construction method and device, and Chinese text sentiment classification method and system
Technical field
The present invention relates to natural language processing and pattern recognition, and in particular to a classifier construction method and device, and a Chinese text sentiment classification method and system.
Background
With the rapid growth of Web 2.0, the Internet has accumulated a large volume of user reviews that express sentiment about people, events, products, and so on. By browsing these reviews, users can learn the public's opinion of a particular event or product. Because the volume of reviews is large, collecting and organizing them manually would cost users a great deal of time and effort. Computers are therefore urgently needed to help users quickly obtain and organize this review information, and text sentiment analysis technology has emerged to meet this need.
Text sentiment analysis uses computers to help users quickly obtain, organize, and analyze review information; it is the process of analyzing, processing, summarizing, and reasoning about subjective texts that carry sentiment. Text sentiment classification is the core task of text sentiment analysis and can be performed at different granularities, such as the sentence level and the document level. At both levels, text sentiment classification means dividing texts into positive texts and negative texts. For example, "I really like this product" would be classified as a positive text, while "This book is terrible" would be classified as a negative text.
Currently, the most common text sentiment classification methods are supervised: a classifier for a specific domain is trained on labeled data from that domain. Although such methods achieve good classification performance, they require a large manually labeled corpus, so constructing the classifier takes a long time. Moreover, moving to a new domain requires labeling a new corpus; that is, these methods are highly domain-dependent.
Summary of the invention
In view of this, the invention provides a classifier construction method and device, and a Chinese text sentiment classification method and system, to solve the problems that existing classification methods take a long time to construct a classifier and are highly domain-dependent. The technical solution is as follows:
A classifier construction method, comprising:
Obtaining a set of samples to be labeled, and obtaining one sample to be labeled from the set, wherein the set contains at least two samples to be labeled;
Looking up the sentiment words in the sample to be labeled and obtaining the sentiment polarity of each sentiment word, wherein the sentiment polarity is either positive or negative;
Reversing the sentiment polarity of each sentiment word in the sample that matches a sentiment polarity conversion rule;
Counting the number of positive sentiment words and the number of negative sentiment words in the sample;
Determining the sentiment polarity of the sample according to the number of positive sentiment words and the number of negative sentiment words, to obtain a labeled sample;
Labeling the other samples in the set by self-learning based on the labeled sample, to obtain a labeled sample set;
Constructing a maximum entropy classifier from the labeled samples in the labeled sample set.
Preferably, reversing the sentiment polarity of the sentiment words in the sample that match a sentiment polarity conversion rule comprises:
If a negation keyword appears in the sentence containing a sentiment word in the sample, reversing the sentiment polarity of that sentiment word;
If an adversative (turning) keyword appears in the sentence or paragraph that follows the sentence containing a sentiment word, reversing the sentiment polarity of that sentiment word;
And/or, if a hypothetical (wish) keyword appears in the sentence containing a sentiment word, reversing the sentiment polarity of that sentiment word.
Preferably, determining the sentiment polarity of the sample to be labeled according to the number of positive sentiment words and the number of negative sentiment words comprises:
If the number of positive sentiment words minus the number of negative sentiment words exceeds a set threshold, determining that the sentiment polarity of the sample is positive;
If the number of negative sentiment words minus the number of positive sentiment words exceeds the set threshold, determining that the sentiment polarity of the sample is negative.
Preferably, labeling the other samples in the set by self-learning based on the labeled sample, to obtain a labeled sample set, comprises:
Constructing a maximum entropy classifier from the labeled sample;
Classifying the other samples in the set with the maximum entropy classifier to obtain classification results;
Determining the sentiment polarity of each of those samples according to the classification results, to obtain the labeled sample set.
A Chinese text sentiment classification method, comprising the above classifier construction method and further comprising:
Classifying a Chinese text to be classified with the constructed maximum entropy classifier.
A classifier construction device, comprising: an acquiring unit, a lookup unit, a polarity conversion unit, a counting unit, a determining unit, a self-learning unit, and a classifier construction unit;
The acquiring unit is configured to obtain a set of samples to be labeled and obtain one sample to be labeled from the set, wherein the set contains at least two samples to be labeled;
The lookup unit is configured to look up the sentiment words in the sample to be labeled and obtain the sentiment polarity of each sentiment word, wherein the sentiment polarity is either positive or negative;
The polarity conversion unit is configured to reverse the sentiment polarity of each sentiment word in the sample that matches a sentiment polarity conversion rule;
The counting unit is configured to count the number of positive sentiment words and the number of negative sentiment words in the sample;
The determining unit is configured to determine the sentiment polarity of the sample according to the number of positive sentiment words and the number of negative sentiment words, to obtain a labeled sample;
The self-learning unit is configured to label the other samples in the set by self-learning based on the labeled sample, to obtain a labeled sample set;
The classifier construction unit is configured to construct a maximum entropy classifier from the labeled samples in the labeled sample set.
Preferably, the polarity conversion unit comprises: a first polarity conversion subunit, a second polarity conversion subunit, and/or a third polarity conversion subunit;
The first polarity conversion subunit is configured to reverse the sentiment polarity of a sentiment word when a negation keyword appears in the sentence containing that sentiment word in the sample;
The second polarity conversion subunit is configured to reverse the sentiment polarity of a sentiment word when an adversative keyword appears in the sentence or paragraph following the sentence containing that sentiment word;
The third polarity conversion subunit is configured to reverse the sentiment polarity of a sentiment word when a hypothetical keyword appears in the sentence containing that sentiment word.
Preferably, the determining unit comprises: a first determining subunit and a second determining subunit;
The first determining subunit is configured to determine that the sentiment polarity of the sample is positive when the number of positive sentiment words minus the number of negative sentiment words exceeds a set threshold;
The second determining subunit is configured to determine that the sentiment polarity of the sample is negative when the number of negative sentiment words minus the number of positive sentiment words exceeds the set threshold.
Preferably, the self-learning unit comprises: a classifier construction subunit, a classification subunit, and a third determining subunit;
The classifier construction subunit is configured to construct a maximum entropy classifier from the labeled sample;
The classification subunit is configured to classify the other samples in the set with the maximum entropy classifier to obtain classification results;
The third determining subunit is configured to determine the sentiment polarity of each of those samples according to the classification results.
A Chinese text sentiment classification system, comprising the above classifier construction device and further comprising a classification unit;
The classification unit is configured to classify a Chinese text to be classified with the maximum entropy classifier constructed by the classifier construction device.
In the classifier construction method and device and the Chinese text sentiment classification method and system provided by the invention, sentiment polarity conversion rules are applied to reverse the polarity of sentiment words where appropriate; the other samples in the set are labeled by self-learning based on the labeled samples; and the maximum entropy classifier constructed from the labeled sample set serves as the classifier for Chinese text sentiment classification. This avoids the labor cost of manually labeling training samples, shortens the time needed to construct a classifier for Chinese text sentiment classification, and improves the accuracy of Chinese text sentiment classification.
Brief description of the drawings
To describe the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in the embodiments or in the description of the prior art are briefly introduced below. Clearly, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the classifier construction method provided by an embodiment of the invention;
Fig. 2 is a schematic structural diagram of the Chinese text sentiment classification system provided by an embodiment of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. Clearly, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention.
The invention provides a classifier construction method. Fig. 1 is a flowchart of the method, which may comprise:
S101: obtaining a set of samples to be labeled and obtaining one sample to be labeled from the set, wherein the set contains at least two samples to be labeled.
S102: looking up the sentiment words in the sample to be labeled and obtaining the sentiment polarity of each sentiment word, wherein the sentiment polarity of a sentiment word is either positive or negative.
S103: reversing the sentiment polarity of each sentiment word in the sample that matches a sentiment polarity conversion rule.
S104: counting the number of positive sentiment words and the number of negative sentiment words in the sample.
S105: determining the sentiment polarity of the sample according to the number of positive sentiment words and the number of negative sentiment words, to obtain a labeled sample.
S106: labeling the other samples in the set by self-learning based on the labeled sample, to obtain a labeled sample set that contains all the labeled samples.
S107: constructing a maximum entropy classifier from the labeled samples in the labeled sample set.
In another embodiment of the invention, step S102 may comprise: looking up the sentiment words in the sample to be labeled against a preset table mapping sentiment words to sentiment polarities, and obtaining the polarity of each sentiment word from the table. Table 1 gives an example of such a table. Note that this embodiment is not limited to the sentiment words listed in Table 1; other sentiment words may also be included.
Table 1
Sentiment polarity | Sentiment words
Positive | like, willing, satisfied, good, fine
Negative | dislike, sick of, sad, bad
In yet another embodiment of the invention, the sentiment polarity conversion rules may comprise a negation rule, an adversative rule, and/or a hypothetical rule. Accordingly, step S103 may comprise: if a negation keyword appears in the sentence containing a sentiment word in the sample to be labeled, reversing the polarity of that sentiment word; if an adversative keyword appears in the sentence or paragraph following the sentence containing a sentiment word, reversing the polarity of that sentiment word; and/or, if a hypothetical keyword appears in the sentence containing a sentiment word, reversing the polarity of that sentiment word. Table 2 lists common negation, adversative, and hypothetical keywords. This embodiment is of course not limited to these keywords and may include other keywords expressing negation, contrast, or hypothesis.
Table 2
[Table 2 image not reproduced: common negation, adversative, and hypothetical keywords]
Three examples of reversing the polarity of sentiment words under the negation, adversative, and hypothetical rules are given below:
Example 1: I do not like this product.
In Example 1, the sentiment word is "like", and the negation keyword "not" appears in the same sentence, so the polarity of "like" is reversed from positive to negative.
Example 2: I like the idea of this product, but I cannot accept its quality.
In Example 2, the sentiment word is "like", and the adversative keyword "but" appears in the sentence following the one that contains it, so the polarity of "like" is reversed from positive to negative.
Example 3: If the color were red, it would be good.
In Example 3, the sentiment word is "good", and the hypothetical keyword "if" appears before "good" in the same sentence, so the polarity of "good" is reversed from positive to negative.
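The three conversion rules and the examples above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation, and it rests on several assumptions: each sample has already been segmented into sentences (or clauses) of lowercase tokens, the small keyword sets are hypothetical stand-ins for the unreproduced Table 2, each matching rule toggles the polarity once, and the "next paragraph" case of the adversative rule is omitted.

```python
# Hypothetical excerpts: Table 2 is not reproduced, so these sets are illustrative only.
LEXICON = {"like": "pos", "satisfied": "pos", "good": "pos", "bad": "neg", "sad": "neg"}
NEGATION = {"not", "no", "never"}
ADVERSATIVE = {"but", "however", "yet"}
HYPOTHETICAL = {"if"}
FLIP = {"pos": "neg", "neg": "pos"}


def sentiment_polarities(sentences):
    """Return (word, polarity) for every lexicon word after applying the rules.

    `sentences` is a list of sentences (or clauses), each a list of tokens.
    Each matching rule toggles the polarity once (an assumption of this sketch).
    """
    results = []
    for i, sentence in enumerate(sentences):
        here = set(sentence)
        following = set(sentences[i + 1]) if i + 1 < len(sentences) else set()
        for token in sentence:
            polarity = LEXICON.get(token)
            if polarity is None:
                continue  # not a sentiment word
            if here & NEGATION:          # negation rule: keyword in the same sentence
                polarity = FLIP[polarity]
            if following & ADVERSATIVE:  # adversative rule: keyword in the next sentence
                polarity = FLIP[polarity]
            if here & HYPOTHETICAL:      # hypothetical rule: keyword in the same sentence
                polarity = FLIP[polarity]
            results.append((token, polarity))
    return results


example1 = [["i", "do", "not", "like", "this", "product"]]
example2 = [["i", "like", "the", "idea", "of", "this", "product"],
            ["but", "i", "cannot", "accept", "its", "quality"]]
example3 = [["if", "the", "color", "were", "red", "it", "would", "be", "good"]]

for sample in (example1, example2, example3):
    print(sentiment_polarities(sample))
```

Running it prints `[('like', 'neg')]` for Examples 1 and 2 and `[('good', 'neg')]` for Example 3, matching the reversals described above. Whether overlapping rules should cancel out or reverse only once is not specified in the text, so a real system would need to settle that choice.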
In another embodiment of the invention, step S105 may comprise: if the number of positive sentiment words minus the number of negative sentiment words exceeds a set threshold, determining that the sentiment polarity of the sample to be labeled is positive; if the number of negative sentiment words minus the number of positive sentiment words exceeds the set threshold, determining that the sentiment polarity of the sample is negative. Let $N_+$ be the number of positive sentiment words, $N_-$ the number of negative sentiment words, and $N_{\max}$ the set threshold. If $N_+ - N_- > N_{\max}$, the sample is determined to be positive; if $N_- - N_+ > N_{\max}$, the sample is determined to be negative.
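A minimal Python sketch of the counting step (S104) and the threshold rule, assuming each sentiment word's final polarity is already available as the string "pos" or "neg". The default threshold of 3 mirrors the $N_{\max} = 3$ used in the experiment in this description. The text does not say what happens to a sample that meets neither condition; returning `None` (leaving it unlabeled for the self-learning step) is an assumption of this sketch.

```python
def count_polarities(polarities):
    """Step S104: count positive (N+) and negative (N-) sentiment words."""
    n_pos = sum(1 for p in polarities if p == "pos")
    n_neg = sum(1 for p in polarities if p == "neg")
    return n_pos, n_neg


def label_sample(n_pos, n_neg, n_max=3):
    """Threshold rule: 'pos' if N+ - N- > N_max, 'neg' if N- - N+ > N_max.

    Returns None otherwise; treating such samples as unlabeled is an assumption.
    """
    if n_pos - n_neg > n_max:
        return "pos"
    if n_neg - n_pos > n_max:
        return "neg"
    return None


# Five positive words and one negative word: 5 - 1 = 4 > 3, so the sample is positive.
print(label_sample(*count_polarities(["pos"] * 5 + ["neg"])))
```

Because the comparison is strict, a difference exactly equal to the threshold (for example 4 positive and 1 negative word) leaves the sample unlabeled.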
In another embodiment of the invention, step S106 may comprise: constructing a maximum entropy classifier from the labeled samples; classifying the other samples in the set with the maximum entropy classifier to obtain classification results; and determining the sentiment polarity of each of those samples according to the results, finally obtaining two standard sample sets: a positive labeled sample set and a negative labeled sample set.
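The self-learning step can be sketched as a single pass in Python: train on the seed samples labeled by the rules, classify the remaining samples, and split everything into a positive and a negative set. The classifier is passed in as a `train` function; in the method described here that role is played by the maximum entropy classifier. So that the block runs on its own, `word_vote_trainer` is a hypothetical stand-in (simple per-word label votes), not a maximum entropy model.

```python
from collections import Counter


def self_label(seed_docs, seed_labels, unlabeled_docs, train):
    """One pass of self-learning.

    `train(docs, labels)` must return a predict(doc) -> 'pos' | 'neg' callable.
    Returns the positive and the negative labeled sample sets.
    """
    predict = train(seed_docs, seed_labels)
    labeled = list(zip(seed_docs, seed_labels))                 # seed samples keep their labels
    labeled += [(doc, predict(doc)) for doc in unlabeled_docs]  # others get classifier labels
    positive = [doc for doc, y in labeled if y == "pos"]
    negative = [doc for doc, y in labeled if y == "neg"]
    return positive, negative


def word_vote_trainer(docs, labels):
    """Hypothetical stand-in trainer so this block is runnable (not a MaxEnt model)."""
    votes = {}
    for doc, label in zip(docs, labels):
        for word in set(doc):
            votes.setdefault(word, Counter())[label] += 1

    def predict(doc):
        total = Counter()
        for word in doc:
            total.update(votes.get(word, Counter()))  # add this word's label votes
        return "pos" if total["pos"] >= total["neg"] else "neg"

    return predict


seed_docs = [["great", "quality"], ["broken", "zipper"]]
positive, negative = self_label(seed_docs, ["pos", "neg"],
                                [["great", "hotel"], ["zipper", "broken", "again"]],
                                word_vote_trainer)
print(len(positive), len(negative))
```

Keeping the trainer as a parameter lets the same loop be reused with the maximum entropy classifier once one is available.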
The maximum entropy classifier is a machine learning classification method grounded in maximum entropy information theory. Its basic idea is to model all known factors and exclude all unknown ones; that is, to find a probability distribution that satisfies all known facts while leaving the unknown factors as random as possible. Its main advantage over naive Bayes is that it does not require the features to be conditionally independent of one another. The method is therefore well suited to combining many different kinds of features without having to account for the interactions between them.
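Under the maximum entropy model, the conditional probability $P(c \mid D)$ is predicted as follows:
$$P(c_i \mid D) = \frac{1}{Z(D)} \exp\left(\sum_k \lambda_{k,c}\, F_{k,c}(D, c_i)\right)$$
where $Z(D)$ is the normalization factor and $F_{k,c}$ is a feature function, defined as:
$$F_{k,c}(D, c') = \begin{cases} 1, & n_k(D) > 0 \text{ and } c' = c \\ 0, & \text{otherwise} \end{cases}$$
Here $n_k(D)$ is the number of times feature $k$ occurs in text $D$, and $\lambda_{k,c}$ are the feature weights learned during training.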
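The model above can be sketched as a small pure-Python classifier. This is an illustrative implementation under stated assumptions, not the authors' code: each distinct token in a text is treated as a feature $k$ with $n_k(D) > 0$; the weights $\lambda_{k,c}$ are fitted by plain batch gradient ascent on the L2-regularized conditional log-likelihood rather than the iterative scaling or L-BFGS optimizers common in maximum entropy toolkits; and the learning rate, epoch count, and regularization strength are arbitrary illustrative values.

```python
import math
from collections import defaultdict


class MaxEntClassifier:
    """Minimal maximum entropy text classifier with binary features F_{k,c}."""

    def __init__(self, lr=0.1, epochs=200, l2=0.01):
        self.lr, self.epochs, self.l2 = lr, epochs, l2
        self.classes = []
        self.weights = {}  # (feature k, class c) -> lambda_{k,c}

    def _proba(self, feats):
        # Score of class c: sum of lambda_{k,c} over the features that fire in D.
        scores = {c: sum(self.weights.get((k, c), 0.0) for k in feats)
                  for c in self.classes}
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {c: math.exp(s - top) for c, s in scores.items()}
        z = sum(exps.values())      # normalization factor Z(D)
        return {c: e / z for c, e in exps.items()}

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        data = [(set(doc), y) for doc, y in zip(docs, labels)]  # F fires iff n_k(D) > 0
        for _ in range(self.epochs):
            grad = defaultdict(float)
            for feats, y in data:
                probs = self._proba(feats)
                for c in self.classes:
                    # Observed minus expected value of F_{k,c} for this document.
                    delta = (1.0 if c == y else 0.0) - probs[c]
                    for k in feats:
                        grad[(k, c)] += delta
            for key, g in grad.items():
                w = self.weights.get(key, 0.0)
                self.weights[key] = w + self.lr * (g - self.l2 * w)
        return self

    def predict_proba(self, doc):
        return self._proba(set(doc))

    def predict(self, doc):
        probs = self.predict_proba(doc)
        return max(probs, key=probs.get)


docs = [["like", "good"], ["satisfied", "good"], ["bad", "sad"], ["dislike", "bad"]]
clf = MaxEntClassifier().fit(docs, ["pos", "pos", "neg", "neg"])
print(clf.predict(["good"]), clf.predict(["bad", "sad"]))
```

Since $F_{k,c}$ is binary, repeated occurrences of a word do not change the score, which matches the $n_k(D) > 0$ condition in the definition. A text containing only unseen words gets a uniform distribution.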
The invention also provides a Chinese text sentiment classification method which, in addition to steps S101 to S107 above, further comprises: classifying a Chinese text to be classified with the constructed maximum entropy classifier.
To compare the Chinese text sentiment classification method of this embodiment with an existing Chinese text sentiment classification method, this embodiment uses review corpora from several domains as the unlabeled samples to be classified and tests both methods on them. The corpora come from two domains: reviews of luggage and reviews of hotels. The evaluation criterion is accuracy, a standard overall metric for general classification problems. For each domain, accuracy is computed as Accuracy = (TP + NP) / A, where TP is the number of positive texts classified correctly, NP is the number of negative texts classified correctly, and A is the total number of samples classified.
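As a worked example of the accuracy formula, the following Python function computes it from gold and predicted labels. It assumes labels are the strings "pos" and "neg" and that A is the total number of classified texts; the variable names mirror TP and NP from the formula.

```python
def accuracy(gold, predicted):
    """Accuracy = (TP + NP) / A.

    TP: positive texts classified correctly; NP: negative texts classified
    correctly; A: total number of texts classified.
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and of equal length")
    tp = sum(1 for g, p in zip(gold, predicted) if g == p == "pos")
    np_ = sum(1 for g, p in zip(gold, predicted) if g == p == "neg")
    return (tp + np_) / len(gold)


# 1 positive and 2 negative texts classified correctly out of 4: (1 + 2) / 4.
print(accuracy(["pos", "pos", "neg", "neg"], ["pos", "neg", "neg", "neg"]))
```

For the four texts in the usage line, TP = 1, NP = 2, and A = 4, giving 0.75.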
Note that the correctness of the sentiment polarity assigned to a Chinese text to be classified is judged as follows: for a positive text, if it contains more positive sentiment words than negative ones, the positive text is classified correctly; if it contains fewer positive than negative sentiment words, or equally many, it is classified incorrectly. For a negative text, if it contains more negative sentiment words than positive ones, the negative text is classified correctly; if it contains fewer negative than positive sentiment words, or equally many, it is classified incorrectly.
Table 3 compares the results of classifying Chinese texts with the classification method provided by the invention and with a prior-art classification method:
Table 3
[Table 3 image not reproduced: classification accuracy of the proposed method and the conventional method]
In this experiment, different numbers of labeled samples were used for verification, with $N_{\max} = 3$.
The traditional classification method uses the counts of sentiment words computed in each sample as the basis for judging the sample's sentiment class. The method provided by the embodiment of the invention first applies the sentiment polarity conversion rules (the negation rule, the adversative rule, and the hypothetical rule) to decide whether each sentiment word's polarity should be reversed, which avoids misjudging sentiment words whose polarity has shifted. It then automatically labels the unlabeled samples and uses the maximum entropy classifier constructed from them for Chinese text sentiment classification.
The data in Table 3 show that the accuracy of the Chinese sentiment classification provided by this embodiment is higher than that of the traditional text sentiment classification method, with the largest improvement exceeding 3 percentage points. This shows that the classification method of this embodiment is accurate: while reducing the cost of manual labeling, it avoids the adverse effect that polarity-shifted sentiment words would otherwise have on the classification results, which helps improve text classification performance.
Corresponding to the above classifier construction method, an embodiment of the invention also provides a classifier construction device. Fig. 2 is a schematic structural diagram of the device, which may comprise: an acquiring unit 101, a lookup unit 102, a polarity conversion unit 103, a counting unit 104, a determining unit 105, a self-learning unit 106, and a classifier construction unit 107, wherein:
The acquiring unit 101 is configured to obtain a set of samples to be labeled and obtain one sample to be labeled from the set, wherein the set contains at least two samples to be labeled. The lookup unit 102 is configured to look up the sentiment words in the sample and obtain the sentiment polarity of each sentiment word, wherein the polarity is either positive or negative. The polarity conversion unit 103 is configured to reverse the polarity of each sentiment word in the sample that matches a sentiment polarity conversion rule. The counting unit 104 is configured to count the number of positive and the number of negative sentiment words in the sample. The determining unit 105 is configured to determine the sentiment polarity of the sample according to these two counts, to obtain a labeled sample. The self-learning unit 106 is configured to label the other samples in the set by self-learning based on the labeled sample, to obtain a labeled sample set. The classifier construction unit 107 is configured to construct a maximum entropy classifier from the labeled samples in the labeled sample set.
In another embodiment of the invention, the polarity conversion unit 103 may comprise: a first polarity conversion subunit, a second polarity conversion subunit, and/or a third polarity conversion subunit, wherein:
The first polarity conversion subunit is configured to reverse the polarity of a sentiment word when a negation keyword appears in the sentence containing it. The second polarity conversion subunit is configured to reverse the polarity of a sentiment word when an adversative keyword appears in the sentence or paragraph following the sentence containing it. The third polarity conversion subunit is configured to reverse the polarity of a sentiment word when a hypothetical keyword appears in the sentence containing it.
In another embodiment of the invention, the determining unit 105 may comprise: a first determining subunit and a second determining subunit, wherein:
The first determining subunit is configured to determine that the sentiment polarity of the sample is positive when the number of positive sentiment words minus the number of negative sentiment words exceeds a set threshold. The second determining subunit is configured to determine that the sentiment polarity of the sample is negative when the number of negative sentiment words minus the number of positive sentiment words exceeds the set threshold.
In yet another embodiment of the invention, the self-learning unit 106 may comprise: a classifier construction subunit, a classification subunit, and a third determining subunit, wherein:
The classifier construction subunit is configured to construct a maximum entropy classifier from the labeled samples. The classification subunit is configured to classify the other samples in the set with the maximum entropy classifier to obtain classification results. The third determining subunit is configured to determine the sentiment polarity of each of those samples according to the classification results.
Corresponding to the above Chinese text sentiment classification method, an embodiment of the invention also provides a Chinese text sentiment classification system which, in addition to the above classifier construction device, further comprises a classification unit. The classification unit is configured to classify a Chinese text to be classified with the maximum entropy classifier constructed by the classifier construction device.
In the classifier construction method and device and the Chinese text sentiment classification method and system provided by the embodiments of the invention, sentiment polarity conversion rules are applied to reverse the polarity of sentiment words where appropriate; the other samples in the set are labeled by self-learning based on the labeled samples; and the maximum entropy classifier constructed from the labeled sample set serves as the classifier for Chinese text sentiment classification. This avoids the labor cost of manually labeling training samples, shortens the time needed to construct a classifier for Chinese text sentiment classification, and improves the accuracy of Chinese text sentiment classification.
For convenience of description, the above device is described in terms of separate functional units. Of course, when the invention is implemented, the functions of the units may be realized in one or more pieces of software and/or hardware.
From the description of the embodiments above, those skilled in the art can clearly understand that the present invention may be implemented by software plus the necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, may be embodied as a software product. This computer software product may be stored in a storage medium such as ROM/RAM, a magnetic disk, or an optical disc, and includes instructions that cause a computer device (a personal computer, a server, a network device, etc.) to perform the methods described in the embodiments, or in certain parts of the embodiments, of the present invention.
The embodiments in this specification are described progressively: for identical or similar parts the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is basically similar to the method embodiment and is therefore described relatively briefly; for the relevant details, refer to the description of the method embodiment. The system embodiment described above is merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
The present invention can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronic devices, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
The present invention may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present invention may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
It should be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations.
The above are only specific embodiments of the present invention. It should be noted that those skilled in the art may make several improvements and modifications without departing from the principle of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A classifier construction method, characterized by comprising:
Obtaining a sample set to be labeled and obtaining one sample to be labeled from the sample set to be labeled, wherein the sample set to be labeled comprises at least two samples to be labeled;
Searching for sentiment words in the sample to be labeled and obtaining the sentiment polarity of each sentiment word, wherein the sentiment polarity is either positive or negative;
Shifting the sentiment polarity of those sentiment words in the sample to be labeled that satisfy a sentiment polarity shifting rule;
Counting the number of sentiment words with positive polarity and the number of sentiment words with negative polarity in the sample to be labeled;
Determining the sentiment polarity of the sample to be labeled according to the number of positive sentiment words and the number of negative sentiment words, to obtain a labeled sample;
Labeling the other samples to be labeled in the sample set to be labeled by a self-learning method according to the labeled sample, to obtain a labeled sample set;
Constructing a maximum entropy classifier using the labeled samples in the labeled sample set.
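As a rough illustration of the lookup and counting steps of claim 1, the sketch below scans a word-segmented sample against a sentiment lexicon and tallies positive and negative hits. The lexicon contents, the function names, and the assumption that the Chinese text has already been segmented into words are all hypothetical, since the patent does not name its dictionary or segmenter. Polarity shifting (claim 2) is deliberately left out of this step.

```python
from collections import Counter

# Hypothetical toy lexicon; the patent does not name its sentiment dictionary.
SENTIMENT_LEXICON = {
    "好": "positive",
    "喜欢": "positive",
    "满意": "positive",
    "差": "negative",
    "失望": "negative",
    "糟糕": "negative",
}


def find_sentiment_words(tokens, lexicon=SENTIMENT_LEXICON):
    """Return (position, word, polarity) for each token found in the lexicon."""
    return [(i, tok, lexicon[tok]) for i, tok in enumerate(tokens) if tok in lexicon]


def count_polarities(polarities):
    """Return (number of positive words, number of negative words)."""
    counts = Counter(polarities)
    return counts["positive"], counts["negative"]


# A pre-segmented review: "the service is good, but the food is bad, disappointing".
tokens = ["服务", "很", "好", "但", "菜", "很", "差", "让", "人", "失望"]
hits = find_sentiment_words(tokens)
print(hits)  # [(2, '好', 'positive'), (6, '差', 'negative'), (9, '失望', 'negative')]
print(count_polarities([pol for _, _, pol in hits]))  # (1, 2)
```

In a full pipeline, the polarity list passed to `count_polarities` would be the list produced after the shifting rules have been applied, not the raw lexicon polarities.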
2. The method according to claim 1, characterized in that shifting the sentiment polarity of those sentiment words in the sample to be labeled that satisfy a sentiment polarity shifting rule comprises:
If a negation keyword appears in the sentence containing a sentiment word in the sample to be labeled, shifting the sentiment polarity of that sentiment word;
If a contrast (adversative) keyword appears in the sentence or paragraph following the sentence containing a sentiment word in the sample to be labeled, shifting the sentiment polarity of that sentiment word;
And/or, if a wish keyword appears in the sentence containing a sentiment word in the sample to be labeled, shifting the sentiment polarity of that sentiment word.
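The three shifting rules of claim 2 can be sketched as follows, assuming each sample has already been split into sentences and each sentence into words. The keyword lists and lexicon are invented for illustration. Only the "next sentence" variant of the contrast rule is implemented; the paragraph-level variant is omitted. A word's polarity is flipped once if any rule matches, which is one possible reading of the claim's "and/or".

```python
# Hypothetical keyword lists; the patent does not publish its own lists.
NEGATION_WORDS = {"不", "没", "没有", "不是"}
CONTRAST_WORDS = {"但是", "但", "可是", "然而"}
WISH_WORDS = {"希望", "但愿", "要是"}

LEXICON = {"好": "positive", "满意": "positive", "差": "negative", "失望": "negative"}
FLIP = {"positive": "negative", "negative": "positive"}


def shift_polarities(sentences, lexicon=LEXICON):
    """Return the final polarity of every sentiment word, in reading order.

    `sentences` is a list of sentences, each a list of word tokens.
    """
    result = []
    for idx, sentence in enumerate(sentences):
        words = set(sentence)
        next_words = set(sentences[idx + 1]) if idx + 1 < len(sentences) else set()
        shifted = (
            bool(words & NEGATION_WORDS)          # rule 1: negation in the same sentence
            or bool(next_words & CONTRAST_WORDS)  # rule 2: contrast word in the next sentence
            or bool(words & WISH_WORDS)           # rule 3: wish word in the same sentence
        )
        for tok in sentence:
            if tok in lexicon:
                polarity = lexicon[tok]
                result.append(FLIP[polarity] if shifted else polarity)
    return result


sample = [["环境", "好"], ["但是", "菜", "差"], ["服务", "不", "满意"]]
print(shift_polarities(sample))  # ['negative', 'negative', 'negative']
```

Here the first 好 is flipped because the following sentence contains 但是 ("but"), and 满意 is flipped by the negation 不. Matching whole tokens rather than substrings keeps 但愿 (a wish word) from also triggering the contrast rule through 但.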
3. The method according to claim 1, characterized in that determining the sentiment polarity of the sample to be labeled according to the number of positive sentiment words and the number of negative sentiment words comprises:
If the number of positive sentiment words minus the number of negative sentiment words is greater than a set threshold, determining that the sentiment polarity of the sample to be labeled is positive;
If the number of negative sentiment words minus the number of positive sentiment words is greater than the set threshold, determining that the sentiment polarity of the sample to be labeled is negative.
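Claim 3 reduces to comparing the two counts with a margin. The sketch below assumes a non-negative threshold, so at most one of the two conditions can hold. The default value and the handling of the in-between case (returning `None`, i.e. leaving the sample unlabeled) are assumptions, since the claim specifies neither.

```python
def label_by_threshold(n_pos, n_neg, threshold=0):
    """Decide a sample's polarity from its sentiment-word counts (claim 3).

    threshold=0 is a placeholder; the patent leaves the value unspecified.
    Returns None when neither difference exceeds the threshold (assumed here
    to mean the sample is left unlabeled).
    """
    if n_pos - n_neg > threshold:
        return "positive"
    if n_neg - n_pos > threshold:
        return "negative"
    return None


print(label_by_threshold(3, 1, threshold=1))  # positive
print(label_by_threshold(1, 3, threshold=1))  # negative
print(label_by_threshold(2, 1, threshold=1))  # None
```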
4. The method according to claim 1, characterized in that labeling the other samples to be labeled in the sample set to be labeled by a self-learning method according to the labeled sample, to obtain a labeled sample set, comprises:
Constructing a maximum entropy classifier using the labeled sample;
Classifying the other samples to be labeled in the sample set to be labeled using the maximum entropy classifier, to obtain classification results;
Determining the sentiment polarity of each sample to be labeled according to the classification results, to obtain the labeled sample set.
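Claim 4, together with the last step of claim 1, describes one round of self-learning. A maximum entropy classifier is trained on the rule-labeled samples and used to label the remaining samples, and then a final classifier is trained on the enlarged set. The sketch below rests on several assumptions. The lexicon step is taken to have produced several seed samples, since one sample alone cannot train a two-class model. Labels are binary (1 = positive, 0 = negative) and features are bag-of-words indicators. A binary maximum entropy model over such features is equivalent to logistic regression, and it is trained here with plain stochastic gradient ascent for brevity. Class names, function names, and hyperparameters are hypothetical.

```python
import math


class MaxEnt:
    """Binary maximum entropy (logistic regression) classifier over word features."""

    def __init__(self, epochs=200, lr=0.5, l2=0.01):
        self.epochs, self.lr, self.l2 = epochs, lr, l2
        self.weights = {}
        self.bias = 0.0

    def prob_positive(self, tokens):
        z = self.bias + sum(self.weights.get(f, 0.0) for f in set(tokens))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in math.exp
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, tokens):
        return 1 if self.prob_positive(tokens) >= 0.5 else 0

    def fit(self, docs, labels):
        # Stochastic gradient ascent on the log-likelihood, with L2 shrinkage
        # applied to the weights of the features present in each example.
        for _ in range(self.epochs):
            for tokens, y in zip(docs, labels):
                err = y - self.prob_positive(tokens)
                for f in set(tokens):
                    w = self.weights.get(f, 0.0)
                    self.weights[f] = w + self.lr * (err - self.l2 * w)
                self.bias += self.lr * err
        return self


def self_learn(seed_docs, seed_labels, unlabeled_docs):
    """One round of self-learning: label the unlabeled docs with a classifier
    trained on the seed, then return the enlarged labeled set."""
    seed_clf = MaxEnt().fit(seed_docs, seed_labels)
    new_labels = [seed_clf.predict(doc) for doc in unlabeled_docs]
    return seed_docs + unlabeled_docs, seed_labels + new_labels


seed_docs = [["服务", "好", "满意"], ["菜", "好", "喜欢"], ["服务", "差", "失望"], ["菜", "差", "糟糕"]]
seed_labels = [1, 1, 0, 0]
unlabeled = [["好", "满意"], ["差", "失望"]]

docs, labels = self_learn(seed_docs, seed_labels, unlabeled)
print(labels)  # [1, 1, 0, 0, 1, 0]
final_clf = MaxEnt().fit(docs, labels)  # classifier built from the labeled sample set
print(final_clf.predict(["环境", "好"]))  # 1
```

The last line corresponds to claim 5: the final classifier is applied to new, segmented Chinese text. Production systems would more likely use an established maximum entropy or logistic regression implementation than this hand-rolled trainer.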
5. A Chinese text sentiment classification method, characterized by comprising the classifier construction method according to any one of claims 1 to 4, and further comprising:
Classifying the Chinese text to be classified using the constructed maximum entropy classifier.
6. A classifier construction device, characterized by comprising: an acquisition unit, a search unit, a polarity shifting unit, a counting unit, a determination unit, a self-learning unit, and a classifier construction unit;
The acquisition unit is configured to obtain a sample set to be labeled and to obtain one sample to be labeled from the sample set to be labeled, wherein the sample set to be labeled comprises at least two samples to be labeled;
The search unit is configured to search for sentiment words in the sample to be labeled and to obtain the sentiment polarity of each sentiment word, wherein the sentiment polarity is either positive or negative;
The polarity shifting unit is configured to shift the sentiment polarity of those sentiment words in the sample to be labeled that satisfy a sentiment polarity shifting rule;
The counting unit is configured to count the number of positive sentiment words and the number of negative sentiment words in the sample to be labeled;
The determination unit is configured to determine the sentiment polarity of the sample to be labeled according to the number of positive sentiment words and the number of negative sentiment words, to obtain a labeled sample;
The self-learning unit is configured to label the other samples to be labeled in the sample set to be labeled by a self-learning method according to the labeled sample, to obtain a labeled sample set;
The classifier construction unit is configured to construct a maximum entropy classifier using the labeled samples in the labeled sample set.
7. The device according to claim 6, characterized in that the polarity shifting unit comprises: a first polarity shifting subunit, a second polarity shifting subunit, and/or a third polarity shifting subunit;
The first polarity shifting subunit is configured to shift the sentiment polarity of a sentiment word when a negation keyword appears in the sentence containing that sentiment word in the sample to be labeled;
The second polarity shifting subunit is configured to shift the sentiment polarity of a sentiment word when a contrast (adversative) keyword appears in the sentence or paragraph following the sentence containing that sentiment word in the sample to be labeled;
The third polarity shifting subunit is configured to shift the sentiment polarity of a sentiment word when a wish keyword appears in the sentence containing that sentiment word in the sample to be labeled.
8. The device according to claim 6, characterized in that the determination unit comprises: a first determination subunit and a second determination subunit;
The first determination subunit is configured to determine that the sentiment polarity of the sample to be labeled is positive when the number of positive sentiment words minus the number of negative sentiment words is greater than a set threshold;
The second determination subunit is configured to determine that the sentiment polarity of the sample to be labeled is negative when the number of negative sentiment words minus the number of positive sentiment words is greater than the set threshold.
9. The device according to claim 6, characterized in that the self-learning unit comprises: a classifier construction subunit, a classification subunit, and a third determination subunit;
The classifier construction subunit is configured to construct a maximum entropy classifier using the labeled sample;
The classification subunit is configured to classify the other samples to be labeled in the sample set to be labeled using the maximum entropy classifier, to obtain classification results;
The third determination subunit is configured to determine the sentiment polarity of each sample to be labeled according to the classification results.
10. A Chinese text sentiment classification system, characterized by comprising the classifier construction device according to any one of claims 6 to 9, and further comprising: a classification unit;
The classification unit is configured to classify the Chinese text to be classified using the maximum entropy classifier constructed by the classifier construction device.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN2012105564463A (CN103020249A) | 2012-12-19 | 2012-12-19 | Classifier construction method and device as well as Chinese text sentiment classification method and system

Publications (1)

Publication Number | Publication Date
CN103020249A | 2013-04-03

Family

ID=47968852

Family Applications (1)

Application Number | Status | Priority Date | Filing Date | Title
CN2012105564463A (CN103020249A) | Pending | 2012-12-19 | 2012-12-19 | Classifier construction method and device as well as Chinese text sentiment classification method and system

Country Status (1)

Country | Link
CN (1) | CN103020249A

Citations (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20090125371A1 * | 2007-08-23 | 2009-05-14 | Google Inc. | Domain-Specific Sentiment Classification
CN102323944A * | 2011-09-02 | 2012-01-18 | Soochow University | Sentiment classification method based on polarity transfer rules
CN102682130A * | 2012-05-17 | 2012-09-19 | Soochow University | Text sentiment classification method and system
CN102682124A * | 2012-05-16 | 2012-09-19 | Soochow University | Emotion classifying method and device for text

Non-Patent Citations (1)

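Title
Dai Daming et al.: "Research on an unsupervised Chinese sentiment classification method based on emotion words" (基于情绪词的非监督中文情感分类方法研究), Journal of Chinese Information Processing (中文信息学报) *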

Cited By (15)

Publication number | Priority date | Publication date | Assignee | Title
CN103530283A * | 2013-10-25 | 2014-01-22 | Soochow University | Method for extracting emotional triggers
CN103617245A * | 2013-11-27 | 2014-03-05 | Soochow University | Bilingual sentiment classification method and device
CN104317965B * | 2014-11-14 | 2018-04-03 | Nanjing University of Science and Technology | Sentiment dictionary construction method based on language material
CN104317965A * | 2014-11-14 | 2015-01-28 | Nanjing University of Science and Technology | Establishment method of emotion dictionary based on linguistic data
CN108241650B * | 2016-12-23 | 2020-08-11 | Beijing Gridsum Technology Co., Ltd. | Training method and device for training classification standard
CN108241650A * | 2016-12-23 | 2018-07-03 | Beijing Gridsum Technology Co., Ltd. | The training method and device of training criteria for classification
CN106844743B * | 2017-02-14 | 2020-04-24 | State Grid Xinjiang Electric Power Co. Information & Communication Company | Emotion classification method and device for Uygur language text
CN106844743A * | 2017-02-14 | 2017-06-13 | State Grid Xinjiang Electric Power Co. Information & Communication Company | The sensibility classification method and device of Uighur text
WO2019042450A1 * | 2017-09-04 | 2019-03-07 | Huawei Technologies Co., Ltd. | Natural language processing method and apparatus
US11630957B2 | 2017-09-04 | 2023-04-18 | Huawei Technologies Co., Ltd. | Natural language processing method and apparatus
CN107644101A * | 2017-09-30 | 2018-01-30 | Baidu Online Network Technology (Beijing) Co., Ltd. | Information classification approach and device, information classification equipment and computer-readable medium
CN112445897A * | 2021-01-28 | 2021-03-05 | Jinghua Information Technology Co., Ltd. | Method, system, device and storage medium for large-scale classification and labeling of text data
CN114443849A * | 2022-02-09 | 2022-05-06 | Beijing Baidu Netcom Science Technology Co., Ltd. | Method and device for selecting marked sample, electronic equipment and storage medium
CN114443849B * | 2022-02-09 | 2023-10-27 | Beijing Baidu Netcom Science Technology Co., Ltd. | Labeling sample selection method and device, electronic equipment and storage medium
US11907668B2 | 2022-02-09 | 2024-02-20 | Beijing Baidu Netcom Science Technology Co., Ltd. | Method for selecting annotated sample, apparatus, electronic device and storage medium

Similar Documents

Publication Publication Date Title
CN103020249A (en) Classifier construction method and device as well as Chinese text sentiment classification method and system
CN102682124B (en) Emotion classifying method and device for text
CN103631961B (en) Method for identifying relationship between sentiment words and evaluation objects
CN102663139B (en) Method and system for constructing emotional dictionary
Kim Predicting L2 Writing Proficiency Using Linguistic Complexity Measures: A Corpus-Based Study.
CN105205124B (en) A kind of semi-supervised text sentiment classification method based on random character subspace
CN102682130B (en) Text sentiment classification method and system
Saha et al. A discriminative model approach for suggesting tags automatically for stack overflow questions
El-Halees Mining opinions in user-generated contents to improve course evaluation
CN105138653B (en) It is a kind of that method and its recommendation apparatus are recommended based on typical degree and the topic of difficulty
CN104794212A (en) Context sentiment classification method and system based on user comment text
CN105354327A (en) Interface API recommendation method and system based on massive data analysis
CN104573114A (en) Music classification method and device
CN102880600A (en) Word semantic tendency prediction method based on universal knowledge network
CN104346326A (en) Method and device for determining emotional characteristics of emotional texts
CN104268134A (en) Subjective and objective classifier building method and system
Agrawal et al. Identifying enrichment candidates in textbooks
CN110472257A (en) A kind of MT engine assessment preferred method and system based on sentence pair
Nasseri Is postgraduate English academic writing more clausal or phrasal? Syntactic complexification at the crossroads of genre, proficiency, and statistical modelling
CN103514279A (en) Method and device for classifying sentence level emotion
CN104572877A (en) Detection method and detection system of game public opinion
CN110134799A (en) A kind of text corpus based on BM25 algorithm build and optimization method
CN103473356B (en) Document-level emotion classifying method and device
CN105786898A (en) Domain ontology construction method and apparatus
Antunes et al. Readability of web content

Legal Events

Code | Title | Description
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
C12 | Rejection of a patent application after its publication |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20130403