CN102375838A - Method and device for constructing polarity morpheme database, and method and device for determining polarity of words - Google Patents

Publication number
CN102375838A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2010102576351A
Other languages
Chinese (zh)
Inventor
张洁
孟遥
于浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a method and a device for constructing a polarity morpheme database, and a method and a device for determining the polarity of words. The method for determining the polarity of a word comprises: calculating the mutual information between the word whose polarity is to be determined and each positive word in a predetermined polarity morpheme database, as well as the mutual information between the word and each negative word in the polarity morpheme database; calculating, from the mutual information between the word and each positive word, the relevance between the word and all positive words in the polarity morpheme database, and calculating, from the mutual information between the word and each negative word, the relevance between the word and all negative words in the polarity morpheme database; and comparing the two relevance values and determining the polarity of the word according to the comparison result.

Description

Method and device for constructing a polarity morpheme database, and method and device for determining the polarity of words
Technical field
The present invention relates generally to text processing. More specifically, the present invention relates to determining the polarity of words.
Background art
Determining the polarity of words is widely used in article classification, opinion mining, sentiment analysis, and the like. In conventional methods, to improve the performance of word polarity recognition, a dictionary is constructed that contains a large number of words with manually annotated part of speech. Constructing such a dictionary is time-consuming and costly.
Summary of the invention
According to a first aspect of the invention, a method for constructing a polarity morpheme database is provided, comprising: extracting monosyllabic words having polarity from a corpus and/or a lexicon; and annotating the monosyllabic words having polarity with their polarity, the annotated monosyllabic words constituting the polarity morpheme database.
According to a second aspect of the invention, a method for determining the polarity of a word is provided, comprising: for a word whose polarity is to be determined, calculating the mutual information between the word and each positive word in a predetermined polarity morpheme database and the mutual information between the word and each negative word in the polarity morpheme database; calculating, from the calculated mutual information between the word and each positive word in the polarity morpheme database, the relevance between the word and all positive words in the polarity morpheme database, and calculating, from the calculated mutual information between the word and each negative word in the polarity morpheme database, the relevance between the word and all negative words in the polarity morpheme database; and comparing the relevance between the word and all positive words with the relevance between the word and all negative words, and determining the polarity of the word according to the comparison result.
According to a third aspect of the invention, a device for constructing a polarity morpheme database is provided, comprising: an extraction unit configured to extract monosyllabic words having polarity from a corpus and/or a lexicon; and an annotation unit configured to annotate the monosyllabic words having polarity with their polarity, the annotated monosyllabic words constituting the polarity morpheme database.
According to a fourth aspect of the invention, a device for determining the polarity of a word is provided, comprising: a mutual information calculation unit configured to calculate, for a word whose polarity is to be determined, the mutual information between the word and each positive word in a predetermined polarity morpheme database and the mutual information between the word and each negative word in the polarity morpheme database; a relevance calculation unit configured to calculate, from the mutual information calculated by the mutual information calculation unit between the word and each positive word in the polarity morpheme database, the relevance between the word and all positive words in the polarity morpheme database, and to calculate, from the mutual information calculated by the mutual information calculation unit between the word and each negative word in the polarity morpheme database, the relevance between the word and all negative words in the polarity morpheme database; and a determination unit configured to compare the two relevance values calculated by the relevance calculation unit and to determine the polarity of the word according to the comparison result.
According to other embodiments of the invention, a corresponding computer-readable storage medium and computer program are also provided.
According to embodiments of the invention, a polarity morpheme database can be constructed, and the polarity of words can be determined, efficiently.
These and other advantages of the present invention will become more apparent from the following detailed description of preferred embodiments of the invention, taken in conjunction with the accompanying drawings.
Description of drawings
The above and other objects, features, and advantages of the present invention will be more readily understood with reference to the following description of embodiments of the invention in conjunction with the accompanying drawings. The components in the drawings are intended only to illustrate the principles of the invention. In the drawings, identical or similar technical features or components are denoted by identical or similar reference numerals. In the drawings:
Fig. 1 shows a flowchart of a method for constructing a polarity morpheme database according to an embodiment of the invention;
Fig. 2 shows a flowchart of a method for constructing a polarity morpheme database according to another embodiment of the invention;
Fig. 3 shows a flowchart of a method for constructing a polarity morpheme database according to another embodiment of the invention;
Fig. 4 shows a flowchart of a method for constructing a polarity morpheme database according to another embodiment of the invention;
Fig. 5 shows a flowchart of a method for determining the polarity of a word according to an embodiment of the invention;
Fig. 6 shows a flowchart of a method for determining the polarity of a word according to another embodiment of the invention;
Fig. 7 shows a block diagram of a device for constructing a polarity morpheme database according to an embodiment of the invention;
Fig. 8 shows a block diagram of a device for constructing a polarity morpheme database according to another embodiment of the invention;
Fig. 9 shows a block diagram of a device for determining the polarity of a word according to an embodiment of the invention;
Fig. 10 shows a block diagram of a device for determining the polarity of a word according to another embodiment of the invention; and
Fig. 11 shows a schematic block diagram of a computer that can be used to implement the methods and devices according to embodiments of the invention.
Detailed description of the embodiments
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings. For clarity and conciseness, not all features of an actual implementation are described in the specification. It should be understood, however, that in developing any such actual implementation, many implementation-specific decisions must be made to achieve the developer's specific goals, such as compliance with system-related and business-related constraints, and that these constraints may vary from one implementation to another. Moreover, it should be appreciated that although such development work may be complex and time-consuming, it is merely a routine task for those skilled in the art having the benefit of this disclosure.
It should also be noted that, to avoid obscuring the invention with unnecessary detail, the drawings and the description cover only the device structures and/or processing steps closely related to the solution of the invention, and omit representations and descriptions of components and processing that have little relation to the invention and are known to those of ordinary skill in the art.
In the following description, embodiments of the invention are introduced using Chinese as an example. However, the invention is not limited thereto and is also applicable to languages similar to Chinese.
Fig. 1 shows a flowchart of a method for constructing a polarity morpheme database according to an embodiment.
In step S101, monosyllabic words having polarity are extracted from a corpus and/or a lexicon. In step S102, the monosyllabic words having polarity are annotated with their polarity; the annotated monosyllabic words can constitute the polarity morpheme database.
The corpus may be any collection containing a large amount of language material (sentences). The lexicon may be a dictionary containing a large number of words.
In step S102, the polarity of a monosyllabic word, that is, positive or negative, may be annotated by querying an existing polarity morpheme database. Polarity may also be annotated by displaying an extracted monosyllabic word and receiving the polarity of that word as input from an operator.
In this embodiment, only monosyllabic words having polarity are used to constitute the polarity morpheme database. The number of monosyllabic words having polarity is relatively small, whereas the number of disyllabic or polysyllabic words having polarity is huge. Moreover, the polarity of the vast majority of disyllabic or polysyllabic words having polarity can be determined from the monosyllabic words they contain. In other words, the polarity of a disyllabic or polysyllabic word can usually be determined from the polarity of its monosyllabic words. Therefore, constructing a polarity morpheme database from this smaller number of monosyllabic words saves time and cost.
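The annotation of step S102 can be illustrated with a minimal Python sketch. It assumes the candidate monosyllabic words have already been extracted in step S101; the function name `build_polarity_db`, the label strings, and the `ask_operator` callback (standing in for the display-and-input route) are hypothetical and do not come from the source.

```python
from typing import Callable, Dict, Iterable, Optional

POSITIVE = "positive"
NEGATIVE = "negative"


def build_polarity_db(
    candidates: Iterable[str],
    existing_db: Dict[str, str],
    ask_operator: Optional[Callable[[str], Optional[str]]] = None,
) -> Dict[str, str]:
    """Annotate extracted monosyllabic words and collect the annotated ones.

    existing_db  -- hypothetical existing polarity morpheme database to query
    ask_operator -- hypothetical callback that returns a label typed by an operator
    """
    db: Dict[str, str] = {}
    for word in candidates:
        # Option 1 of step S102: look the word up in an existing database.
        label = existing_db.get(word)
        # Option 2 of step S102: fall back to operator input if available.
        if label is None and ask_operator is not None:
            label = ask_operator(word)
        # Only words that received a valid polarity label enter the database.
        if label in (POSITIVE, NEGATIVE):
            db[word] = label
    return db
```

Words that receive no label from either route are simply left out, which keeps the resulting database limited to annotated monosyllabic words.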
Fig. 2 shows a flowchart of a method for constructing a polarity morpheme database according to another embodiment of the invention.
In step S201, monosyllabic words having polarity are extracted from a corpus and/or a lexicon.
In step S202, a predetermined number of commonly used monosyllabic words are selected from the monosyllabic words having polarity.
In step S203, the selected predetermined number of commonly used monosyllabic words are annotated with their polarity. The annotated monosyllabic words can constitute the polarity morpheme database.
The embodiment of Fig. 2 differs from that of Fig. 1 in that only a predetermined number of commonly used monosyllabic words are selected for annotation. In one example, step S202 may include displaying an extracted monosyllabic word and receiving operator input indicating whether the word is commonly used. In another example, step S202 may include selecting, according to the frequency of occurrence of the monosyllabic words, the predetermined number of monosyllabic words with the highest frequency of occurrence. The frequency of occurrence of a monosyllabic word may be obtained by querying existing statistics, or the frequency with which the monosyllabic words having polarity occur in the corpus and/or lexicon may be counted in step S201.
In this embodiment, selecting only a predetermined number of commonly used monosyllabic words further reduces the size of the polarity morpheme database, and thus further saves time and cost.
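The frequency-based variant of step S202 can be sketched as follows. This is an illustration under assumptions: the corpus is taken to be already segmented into a token list, and the function name `select_common` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def select_common(candidates: Iterable[str], tokens: Iterable[str], n: int) -> List[str]:
    """Return the n candidate words that occur most often among the tokens."""
    counts = Counter(tokens)                  # occurrence count per token
    unique = list(dict.fromkeys(candidates))  # drop duplicates, keep input order
    # sorted() is stable, so candidates with equal counts keep their input order.
    ranked = sorted(unique, key=lambda w: counts[w], reverse=True)
    return ranked[:n]
```

A candidate that never occurs in the tokens gets a count of zero and therefore ranks last.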
Fig. 3 shows a flowchart of a method for constructing a polarity morpheme database according to another embodiment of the invention.
In step S301, the disyllabic words in a disyllabic word lexicon are split into monosyllabic words.
In step S302, the part of speech of each monosyllabic word obtained by the splitting is analyzed.
In step S303, according to the part of speech of each monosyllabic word obtained by the splitting and the relative position of each monosyllabic word within the disyllabic word, monosyllabic words that have polarity and are not the semantic core are selected from the monosyllabic words obtained by the splitting.
In step S304, the monosyllabic words having polarity are annotated with their polarity; the annotated monosyllabic words can constitute the polarity morpheme database.
In this embodiment, the monosyllabic words having polarity are extracted from a disyllabic word lexicon. The disyllabic word lexicon may be any existing disyllabic word lexicon, or may be a lexicon formed by selecting disyllabic words from a general-purpose word and phrase database.
In step S301, splitting a disyllabic word into monosyllabic words is something those skilled in the art can implement, and is not described in detail here.
In step S302, various known part-of-speech analysis methods may be used to analyze the part of speech of each monosyllabic word obtained by the splitting. For example, part-of-speech analysis of the disyllabic word "poor quality" yields "[bad .a.] [quality .n.]", which indicates that "bad" is an adjective and "quality" is a noun.
In step S303, various methods may be used to determine which monosyllabic word in a disyllabic word is the one having polarity. For example, for "[bad .a.] [quality .n.]", because the first monosyllabic word is an adjective and the second is a noun, "bad" can be determined to be the monosyllabic word having polarity. Here, in the disyllabic word "poor quality", "quality" is the semantic core and "bad" is not. Note that it is possible that neither of the two monosyllabic words in a disyllabic word is considered to have polarity, and it is also possible that both are. Those skilled in the art can also conceive of other rules for determining the monosyllabic words having polarity, which are not described in detail here.
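Steps S301 to S303 can be sketched with the single rule given in the example above (an adjective followed by a noun). The sketch assumes that each Chinese character is one monosyllabic word and that a lookup table `pos_of` stands in for a real part-of-speech analyzer; both function names and the table are hypothetical, and the characters in the usage note are chosen only for illustration.

```python
from typing import Dict, List, Tuple


def split_disyllabic(word: str) -> Tuple[str, str]:
    """Step S301: split a two-character word into its two monosyllabic words."""
    if len(word) != 2:
        raise ValueError("expected a disyllabic (two-character) word")
    return word[0], word[1]


def select_polar_candidates(word: str, pos_of: Dict[str, str]) -> List[str]:
    """Steps S302 and S303 with one illustrative rule.

    pos_of is a hypothetical table mapping a monosyllabic word to a tag
    ("a" for adjective, "n" for noun); a real system would call a tagger.
    """
    first, second = split_disyllabic(word)
    # Rule from the example: adjective + noun means the noun is the semantic
    # core and the adjective is the polarity-bearing, non-core word.
    if pos_of.get(first) == "a" and pos_of.get(second) == "n":
        return [first]
    # Other rules are possible; this sketch selects nothing otherwise.
    return []
```

For instance, `select_polar_candidates("劣质", {"劣": "a", "质": "n"})` returns `["劣"]`. A fuller rule set could return zero or two words for a given disyllabic word, as the text allows.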
Step S304 is similar to step S102 in Fig. 1 and is not described in detail here.
As in the embodiment of Fig. 2, in the embodiment of Fig. 3 it is also possible to select only a predetermined number of monosyllabic words to constitute the polarity morpheme database.
Fig. 4 shows a flowchart of a method for constructing a polarity morpheme database according to another embodiment of the invention.
In step S401, word segmentation is performed on the sentences in a corpus.
In step S402, the part of speech of each word in the segmented sentences is analyzed.
In step S403, according to the part of speech and relative position of each word in a segmented sentence, monosyllabic words that have polarity and are not the semantic core are selected from the words of the segmented sentence.
In step S404, the monosyllabic words having polarity are annotated with their polarity; the annotated monosyllabic words can constitute the polarity morpheme database.
In step S401, those skilled in the art can use various methods to perform word segmentation. In this embodiment, a sentence is generally segmented into monosyllabic and/or disyllabic words. For example, the sentence "Judging from the results, the color temperature that automatic white balance can obtain is very accurate." can be segmented as "from / result / [particle] / , / automatic / white / balance / [particle] / can / obtain / [particle] / color / temperature / is / very / accurate / [particle] / ." (the Chinese tokens are shown here as English glosses, with untranslated particles marked [particle]).
In step S402, various known part-of-speech analysis methods may be used to analyze the part of speech of each word in the segmented sentence. For example, part-of-speech analysis of the segmented sentence above yields "from/p result/n [particle]/u ,/w automatic/d white/d balance/a [particle]/u can/v obtain/v [particle]/u color/n temperature/Ng is/v very/d accurate/a [particle]/u ./w", where p denotes a preposition, n a noun, u an auxiliary word, w a punctuation mark, d an adverb, a an adjective, v a verb, and Ng a nominal morpheme.
In step S403, words having polarity can first be selected from the words of the segmented sentence according to the part of speech and relative position of each word. For example, "very" is an adverb and the following "accurate" is an adjective, so "accurate" can be determined to be a word having polarity. Then, if a word determined to have polarity is a disyllabic or polysyllabic word, it can be split into monosyllabic words, and, according to the part of speech of each monosyllabic word obtained by the splitting and its relative position within the word, monosyllabic words that have polarity and are not the semantic core are selected from them. For example, the monosyllabic word "standard" contained in "accurate" can be determined to be a monosyllabic word that has polarity and is not the semantic core.
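The first stage of step S403 can be sketched with the one rule given in the example (an adverb immediately followed by an adjective). The sketch assumes the sentence has already been segmented and tagged in steps S401 and S402; the function name `polar_words` is hypothetical and the rule set is deliberately minimal.

```python
from typing import List, Sequence, Tuple


def polar_words(tagged: Sequence[Tuple[str, str]]) -> List[str]:
    """Select words that directly follow an adverb and are tagged as adjectives.

    tagged is a sequence of (token, tag) pairs, with "d" for adverb and
    "a" for adjective as in the tag legend of the example.
    """
    selected: List[str] = []
    # Walk over adjacent pairs: (previous token, current token).
    for (_, prev_tag), (token, tag) in zip(tagged, tagged[1:]):
        if prev_tag == "d" and tag == "a":
            selected.append(token)
    return selected
```

Applied to the fragment `[("is", "v"), ("very", "d"), ("accurate", "a"), (".", "w")]`, the function returns `["accurate"]`. Disyllabic results would then be split and filtered in the same way as in the Fig. 3 embodiment.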
Step S404 is similar to step S203 in Fig. 2 and is not described in detail here.
As in the embodiment of Fig. 2, in the embodiment of Fig. 4 it is also possible to select only a predetermined number of monosyllabic words to constitute the polarity morpheme database.
In one embodiment, Fig. 3 and Fig. 4 may be combined; that is, the monosyllabic words having polarity obtained in step S303 and step S403 are merged and annotated with polarity, thereby constituting the polarity morpheme database.
Fig. 5 shows a flowchart of a method for determining the polarity of a word according to an embodiment of the invention.
In step S501, for a word whose polarity is to be determined, the mutual information between the word and each positive word in a predetermined polarity morpheme database, and the mutual information between the word and each negative word in the polarity morpheme database, are calculated.
In step S502, the relevance between the word and all positive words in the polarity morpheme database is calculated from the calculated mutual information between the word and each positive word, and the relevance between the word and all negative words in the polarity morpheme database is calculated from the calculated mutual information between the word and each negative word.
In step S503, the relevance between the word and all positive words in the polarity morpheme database is compared with the relevance between the word and all negative words in the polarity morpheme database, and the polarity of the word is determined according to the comparison result.
In step S501, the mutual information between two words can be calculated in various ways.
For example, the mutual information can be calculated with the following formula:
$$\mathrm{MI}(w_1, w_2) = \frac{2\,p(w_1, w_2)}{p(w_1) + p(w_2)}$$
where $w_1$ and $w_2$ denote the two words whose mutual information is to be calculated, $p(w_1, w_2)$ is the number of times $w_1$ and $w_2$ co-occur, $p(w_1)$ is the number of times $w_1$ occurs, $p(w_2)$ is the number of times $w_2$ occurs, and $\mathrm{MI}(w_1, w_2)$ is the mutual information between $w_1$ and $w_2$. The values $p(w_1, w_2)$, $p(w_1)$, and $p(w_2)$ can be obtained from various existing statistics.
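As a small illustration, the ratio above can be computed directly from three counts. The function name `mi_ratio` and the handling of the all-zero case are assumptions made for this sketch only.

```python
def mi_ratio(cooc: int, count1: int, count2: int) -> float:
    """Compute 2 * p(w1, w2) / (p(w1) + p(w2)) from occurrence counts.

    cooc   -- number of times w1 and w2 co-occur
    count1 -- number of times w1 occurs
    count2 -- number of times w2 occurs
    """
    total = count1 + count2
    if total == 0:
        # Assumption of this sketch: words that never occur score 0.
        return 0.0
    return 2.0 * cooc / total
```

With a co-occurrence count of 3 and occurrence counts of 5 and 7, the value is 2 × 3 / 12 = 0.5.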
Alternatively, pointwise mutual information (PMI) can be used to calculate the mutual information:
$$\mathrm{MI}(w_1, w_2) = \log_2 \frac{p(w_1, w_2)}{p(w_1)\,p(w_2)}$$
where $w_1$ and $w_2$ denote the two words whose mutual information is to be calculated, $p(w_1, w_2)$ is the number of times $w_1$ and $w_2$ co-occur, $p(w_1)$ is the number of times $w_1$ occurs, $p(w_2)$ is the number of times $w_2$ occurs, and $\mathrm{MI}(w_1, w_2)$ is the mutual information between $w_1$ and $w_2$. The values $p(w_1, w_2)$, $p(w_1)$, and $p(w_2)$ can be obtained from various existing statistics.
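A hedged sketch of the PMI variant follows. The source describes the $p$ terms as occurrence counts; this sketch instead divides each count by an assumed total number of observations, `total`, to obtain probabilities, which is the usual convention for PMI. That normalization, the function name `pmi`, and the treatment of a zero co-occurrence count are assumptions of the sketch and are not taken from the source.

```python
import math


def pmi(cooc: int, count1: int, count2: int, total: int) -> float:
    """Pointwise mutual information log2(p(w1, w2) / (p(w1) * p(w2))).

    Counts are converted to probabilities by dividing by total, an assumed
    number of observations (for example, sentences in the corpus).
    """
    if total <= 0 or count1 <= 0 or count2 <= 0:
        raise ValueError("count1, count2 and total must be positive")
    if cooc == 0:
        # log2(0) is undefined; this sketch reports negative infinity.
        return float("-inf")
    p12 = cooc / total
    p1 = count1 / total
    p2 = count2 / total
    return math.log2(p12 / (p1 * p2))
```

A value of 0 means the two words co-occur exactly as often as independence would predict, and positive values indicate association. If such values are later summed, a caller may prefer to clip negative infinity to a finite floor.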
In step S502, in one example, the relevance between the word and all positive words in the polarity morpheme database may be directly proportional to the mutual information between the word and each positive word in the polarity morpheme database, and the relevance between the word and all negative words in the polarity morpheme database may be directly proportional to the mutual information between the word and each negative word in the polarity morpheme database.
In step S503, the relevance between the word and all positive words in the polarity morpheme database can be compared with the relevance between the word and all negative words in the polarity morpheme database to see which is larger. If the relevance to the positive words is larger, the word is determined to be positive. If the relevance to the negative words is larger, the word is determined to be negative. If the two are equal, the word can be determined to have no polarity or to be a neutral word.
Fig. 6 shows a flowchart of a method for determining the polarity of a word according to another embodiment of the invention.
In step S601, the mutual information between the word and each positive word in a predetermined polarity morpheme database, and the mutual information between the word and each negative word in the polarity morpheme database, are calculated.
In step S602, the calculated mutual information values between the word and each positive word in the polarity morpheme database are summed to obtain a first summation result.
In step S603, the calculated mutual information values between the word and each negative word in the polarity morpheme database are summed to obtain a second summation result.
In step S604, if the first summation result is greater than the second summation result, the polarity of the word is determined to be positive; if the first summation result is less than the second summation result, the polarity of the word is determined to be negative.
Step S601 is similar to step S501 in Fig. 5 and is not described in detail here.
In step S602, the first summation result is calculated with the following formula:
$$\mathrm{MI}_1 = \sum_{pw \in p} \mathrm{MI}(w, pw)$$
where $w$ denotes the word whose polarity is to be determined, $pw$ denotes a positive word in the predetermined polarity morpheme database, $p$ denotes the set of positive words in the polarity morpheme database, and $\mathrm{MI}_1$ denotes the first summation result.
In step S603, the second summation result is calculated with the following formula:
$$\mathrm{MI}_2 = \sum_{nw \in n} \mathrm{MI}(w, nw)$$
where $w$ denotes the word whose polarity is to be determined, $nw$ denotes a negative word in the predetermined polarity morpheme database, $n$ denotes the set of negative words in the polarity morpheme database, and $\mathrm{MI}_2$ denotes the second summation result.
In step S604, if $\mathrm{MI}_1$ is greater than $\mathrm{MI}_2$, the polarity of the word is determined to be positive; if $\mathrm{MI}_1$ is less than $\mathrm{MI}_2$, the polarity of the word is determined to be negative; and if $\mathrm{MI}_1$ equals $\mathrm{MI}_2$, the word can be determined to have no polarity or to be a neutral word.
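Steps S601 to S604 can be put together in a short Python sketch. It assumes the polarity morpheme database is a mapping from word to the label "positive" or "negative", and that a mutual information function `mi(w1, w2)` is supplied by the caller; the function name `determine_polarity` and the label strings are hypothetical.

```python
from typing import Callable, Dict


def determine_polarity(
    word: str,
    polarity_db: Dict[str, str],
    mi: Callable[[str, str], float],
) -> str:
    """Sum mutual information over positive and negative words and compare."""
    # Step S602: first summation result, over all positive database words.
    mi1 = sum(mi(word, pw) for pw, label in polarity_db.items() if label == "positive")
    # Step S603: second summation result, over all negative database words.
    mi2 = sum(mi(word, nw) for nw, label in polarity_db.items() if label == "negative")
    # Step S604: compare the two sums.
    if mi1 > mi2:
        return "positive"
    if mi1 < mi2:
        return "negative"
    return "neutral"  # equal sums: no polarity, or a neutral word
```

Passing the mutual information function as a parameter lets either of the two formulas described for step S501 be used without changing the comparison logic.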
The polarity morpheme database used by the embodiments of the method for determining the polarity of a word shown in Fig. 5 and Fig. 6 may be a polarity morpheme database constructed as in the embodiments shown in Figs. 1 to 4. However, embodiments of the invention are not limited thereto. In the embodiments of Fig. 5 and Fig. 6, other polarity morpheme databases may also be used, for example a polarity morpheme database that contains both monosyllabic words and polysyllabic words.
Fig. 7 shows a block diagram of a device 700 for constructing a polarity morpheme database according to an embodiment of the invention. The device 700 comprises an extraction unit 701 and an annotation unit 702. The extraction unit 701 is configured to extract monosyllabic words having polarity from a corpus and/or a lexicon. The annotation unit 702 is configured to annotate the monosyllabic words having polarity with their polarity, the annotated monosyllabic words constituting the polarity morpheme database.
Optionally, the lexicon is a disyllabic word lexicon, and the extraction unit 701 comprises: a splitting module configured to split the disyllabic words in the disyllabic word lexicon into monosyllabic words; an analysis module configured to analyze the part of speech of each monosyllabic word produced by the splitting module; and a selection module configured to select, according to the part of speech of each monosyllabic word produced by the splitting and the relative position of each monosyllabic word within the disyllabic word, monosyllabic words that have polarity and are not the semantic core from the monosyllabic words produced by the splitting module.
Optionally, the extraction unit 701 comprises: a word segmentation module configured to perform word segmentation on the sentences in the corpus; an analysis module configured to analyze the part of speech of each word in a sentence segmented by the word segmentation module; and a selection module configured to select, according to the part of speech analyzed by the analysis module and the relative position of each word in the segmented sentence, monosyllabic words that have polarity and are not the semantic core from the words of the sentence segmented by the word segmentation module.
The annotation unit 702 may annotate the polarity of a monosyllabic word by querying an existing polarity morpheme database.
The annotation unit 702 may also annotate polarity by displaying an extracted monosyllabic word and receiving the polarity of that word as input from an operator. In this case, the annotation unit 702 may include a display module, such as a display, or an input module, such as a mouse.
For details of the operation and function of the parts of the device 700, reference may be made to the embodiments of the invention described in conjunction with Fig. 1, Fig. 3, and Fig. 4; they are not described in detail here.
Fig. 8 shows a block diagram of a device 800 for constructing a polarity morpheme database according to another embodiment of the invention. The device 800 comprises an extraction unit 801, a selection unit 802, and an annotation unit 803. The extraction unit 801 is configured to extract monosyllabic words having polarity from a corpus and/or a lexicon. The selection unit 802 is configured to select a predetermined number of commonly used monosyllabic words from the monosyllabic words having polarity. The annotation unit 803 is configured to annotate the predetermined number of commonly used monosyllabic words selected by the selection unit 802 with their polarity, the annotated monosyllabic words constituting the polarity morpheme database.
For details of the operation and function of the parts of the device 800, reference may be made to the embodiment of the invention described in conjunction with Fig. 2; they are not described in detail here.
Fig. 9 shows a block diagram of a device 900 for determining the polarity of a word according to an embodiment of the invention. Device 900 comprises: a mutual information computing unit 901, configured to calculate, for a word whose polarity is to be determined, the mutual information between the word and each positive-polarity word in a predetermined polarity morpheme database, and the mutual information between the word and each negative-polarity word in the polarity morpheme database; a relevance computing unit 902, configured to calculate the relevance between the word and all positive-polarity words in the polarity morpheme database from the mutual information, calculated by the mutual information computing unit, between the word and each positive-polarity word in the polarity morpheme database, and to calculate the relevance between the word and all negative-polarity words in the polarity morpheme database from the mutual information, calculated by the mutual information computing unit, between the word and each negative-polarity word in the polarity morpheme database; and a determination unit 903, configured to compare the relevance, calculated by the relevance computing unit, between the word and all positive-polarity words in the polarity morpheme database with the relevance between the word and all negative-polarity words in the polarity morpheme database, and to determine the polarity of the word according to the comparison result.
For details of the operation and function of each part of device 900, refer to the embodiment of the invention described with reference to Fig. 5; they are not described in detail here.
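By way of example only, the computation performed by a unit such as the mutual information computing unit 901 might be sketched as below. This sketch assumes that mutual information is estimated as pointwise mutual information from co-occurrence counts; the formula actually used is the one given in the embodiment described with reference to Fig. 5. The count tables `unigram` and `cooccur`, the polarity labels "positive" and "negative", and the zero value returned for unseen pairs are all assumptions of the sketch.

```python
import math


def pmi(joint, count_a, count_b, total):
    """Pointwise mutual information log2(P(a,b) / (P(a) * P(b))) from raw counts."""
    if joint == 0 or count_a == 0 or count_b == 0:
        return 0.0  # assumption: an unseen pair is treated as uninformative
    return math.log2(joint * total / (count_a * count_b))


def mutual_information_by_polarity(word, polarity_db, unigram, cooccur, total):
    """Return two dicts: MI of `word` with each positive and each negative entry.

    polarity_db: {morpheme: "positive" | "negative"}
    unigram:     {token: occurrence count}
    cooccur:     {(word, morpheme): co-occurrence count}
    total:       number of observation windows the counts were taken from
    """
    positive, negative = {}, {}
    for morpheme, polarity in polarity_db.items():
        score = pmi(cooccur.get((word, morpheme), 0),
                    unigram.get(word, 0),
                    unigram.get(morpheme, 0),
                    total)
        if polarity == "positive":
            positive[morpheme] = score
        elif polarity == "negative":
            negative[morpheme] = score
    return positive, negative
```

Returning the two groups separately keeps the later relevance calculation independent of how the individual mutual information values were estimated.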
Fig. 10 shows a block diagram of a device 1000 for determining the polarity of a word according to another embodiment of the invention. Device 1000 comprises a mutual information computing unit 1001, a relevance computing unit 1002, and a determination unit 1005.
The mutual information computing unit 1001 is configured to calculate, for a word whose polarity is to be determined, the mutual information between the word and each positive-polarity word in a predetermined polarity morpheme database, and the mutual information between the word and each negative-polarity word in the polarity morpheme database.
The relevance computing unit 1002 comprises a first summation module 1003, configured to sum the mutual information, calculated by the mutual information computing unit, between the word and each positive-polarity word in the polarity morpheme database to obtain a first summation result; and a second summation module 1004, configured to sum the mutual information, calculated by the mutual information computing unit, between the word and each negative-polarity word in the polarity morpheme database to obtain a second summation result.
The determination unit 1005 is configured to compare the first summation result and the second summation result obtained by the relevance computing unit 1002, to determine that the polarity of the word is positive if the first summation result is greater than the second summation result, and to determine that the polarity of the word is negative if the first summation result is less than the second summation result.
For details of the operation and function of each part of device 1000, refer to the embodiment of the invention described with reference to Fig. 6; they are not described in detail here.
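The summation and comparison performed by the two summation modules and the determination unit can be illustrated with the following minimal Python sketch. The function name and the string labels are hypothetical, and returning `None` when the two sums are equal is an assumption of the sketch, since the text above addresses only the "greater than" and "less than" cases.

```python
def judge_polarity(positive_mi, negative_mi):
    """Decide polarity from per-morpheme mutual information values.

    positive_mi: iterable of MI values with each positive-polarity entry
    negative_mi: iterable of MI values with each negative-polarity entry
    """
    first_sum = sum(positive_mi)   # first summation result
    second_sum = sum(negative_mi)  # second summation result
    if first_sum > second_sum:
        return "positive"
    if first_sum < second_sum:
        return "negative"
    return None  # assumption: equal sums are left undecided
```

For instance, `judge_polarity([1.0, 0.5], [-1.0])` compares a first sum of 1.5 with a second sum of -1.0 and therefore yields a positive determination.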
The polarity morpheme database used in device 900 of Fig. 9 and device 1000 of Fig. 10 may be a polarity morpheme database constructed by device 700 or device 800 shown in Figs. 7 and 8. However, embodiments of the invention are not limited thereto. Device 900 and device 1000 may also use other polarity morpheme databases, for example a polarity morpheme database that contains both monosyllables and polysyllabic words.
Fig. 11 shows a schematic block diagram of a computer that can be used to implement the methods and devices according to embodiments of the invention. In Fig. 11, a central processing unit (CPU) 1101 performs various processing according to a program stored in a read-only memory (ROM) 1102 or a program loaded from a storage section 1108 into a random-access memory (RAM) 1103. Data required when the CPU 1101 performs the various processing is also stored in the RAM 1103 as needed. The CPU 1101, the ROM 1102, and the RAM 1103 are connected to one another via a bus 1104. An input/output interface 1105 is also connected to the bus 1104.
The following components are connected to the input/output interface 1105: an input section 1106 (including a keyboard, a mouse, and the like), an output section 1107 (including a display, such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a speaker and the like), the storage section 1108 (including a hard disk and the like), and a communication section 1109 (including a network interface card such as a LAN card, a modem, and the like). The communication section 1109 performs communication processing via a network such as the Internet. A drive 1110 may also be connected to the input/output interface 1105 as needed. A removable medium 1111, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1110 as needed, so that a computer program read from it is installed into the storage section 1108 as needed.
When the above series of processing is implemented by software, the program constituting the software is installed from a network such as the Internet or from a storage medium such as the removable medium 1111.
Those skilled in the art will understand that this storage medium is not limited to the removable medium 1111 shown in Fig. 11, which stores the program and is distributed separately from the device in order to provide the program to the user. Examples of the removable medium 1111 include a magnetic disk (including a floppy disk (registered trademark)), an optical disk (including a compact disc read-only memory (CD-ROM) and a digital versatile disc (DVD)), a magneto-optical disk (including a MiniDisc (MD) (registered trademark)), and a semiconductor memory. Alternatively, the storage medium may be the ROM 1102, a hard disk contained in the storage section 1108, or the like, in which the program is stored and which is distributed to the user together with the device containing it.
The invention also proposes a program product storing machine-readable instruction code. When the instruction code is read and executed by a machine, the above methods according to the embodiments of the invention can be performed.
Accordingly, a storage medium carrying the above program product storing machine-readable instruction code is also included in the disclosure of the invention. The storage medium includes, but is not limited to, a floppy disk, an optical disk, a magneto-optical disk, a memory card, a memory stick, and the like.
In the above description of specific embodiments of the invention, features described and/or illustrated for one embodiment may be used in the same or a similar way in one or more other embodiments, combined with features of other embodiments, or substituted for features of other embodiments.
It should be emphasized that the term "comprises/comprising", when used herein, refers to the presence of a feature, element, step, or component, but does not exclude the presence or addition of one or more other features, elements, steps, or components.
Furthermore, the methods of the invention are not limited to being performed in the chronological order described in the specification; they may also be performed in another chronological order, in parallel, or independently. Therefore, the order of execution of the methods described in this specification does not limit the technical scope of the invention.
Although the invention has been disclosed above through the description of specific embodiments, it should be understood that all of the above embodiments and examples are illustrative and not restrictive. Those skilled in the art may devise various modifications, improvements, or equivalents of the invention within the spirit and scope of the appended claims. Such modifications, improvements, or equivalents should also be considered to fall within the scope of protection of the invention.
With regard to the implementations including the above embodiments, the following remarks are also disclosed.
Remarks
1. A method for constructing a polarity morpheme database, comprising:
extracting monosyllables having polarity from a corpus and/or a lexicon; and
labeling the monosyllables having polarity with their polarity, the labeled monosyllables constituting the polarity morpheme database.
2. The method according to remark 1, further comprising, before the labeling, selecting a predetermined number of commonly used monosyllables from the monosyllables having polarity, wherein the labeling comprises labeling the selected predetermined number of commonly used monosyllables with their polarity.
3. The method according to remark 1, wherein the lexicon is a lexicon of disyllabic words, and extracting the monosyllables having polarity comprises:
splitting the disyllabic words in the lexicon of disyllabic words into monosyllables;
analyzing the part of speech of each monosyllable obtained by the splitting; and
selecting, from the monosyllables obtained by the splitting, the monosyllables having polarity that are not the semantic core, according to the part of speech of each monosyllable and the relative position of each monosyllable in the disyllabic word.
4. The method according to remark 1, wherein extracting the monosyllables having polarity comprises:
performing word segmentation on the sentences in the corpus;
analyzing the part of speech of each word in the segmented sentences; and
selecting, from the words in the segmented sentences, the monosyllables having polarity that are not the semantic core, according to the part of speech and the relative position of each word in the segmented sentences.
5. A method for determining the polarity of a word, comprising:
for a word whose polarity is to be determined, calculating the mutual information between the word and each positive-polarity word in a predetermined polarity morpheme database, and the mutual information between the word and each negative-polarity word in the polarity morpheme database;
calculating the relevance between the word and all positive-polarity words in the polarity morpheme database from the calculated mutual information between the word and each positive-polarity word in the polarity morpheme database, and calculating the relevance between the word and all negative-polarity words in the polarity morpheme database from the calculated mutual information between the word and each negative-polarity word in the polarity morpheme database; and
comparing the relevance between the word and all positive-polarity words in the polarity morpheme database with the relevance between the word and all negative-polarity words in the polarity morpheme database, and determining the polarity of the word according to the comparison result.
6. The method according to remark 5, wherein the step of calculating the relevance comprises:
summing the calculated mutual information between the word and each positive-polarity word in the polarity morpheme database to obtain a first summation result; and
summing the calculated mutual information between the word and each negative-polarity word in the polarity morpheme database to obtain a second summation result.
7. The method according to remark 6, wherein the step of determining comprises: comparing the first summation result with the second summation result; determining that the polarity of the word is positive if the first summation result is greater than the second summation result; and determining that the polarity of the word is negative if the first summation result is less than the second summation result.
8. The method according to remark 5, wherein the polarity morpheme database is a polarity morpheme database constructed by the method according to any one of remarks 1 to 4.
9. A device for constructing a polarity morpheme database, comprising:
an extraction unit, configured to extract monosyllables having polarity from a corpus and/or a lexicon; and
a labeling unit, configured to label the monosyllables having polarity with their polarity, the labeled monosyllables constituting the polarity morpheme database.
10. The device according to remark 9, further comprising a selection unit, configured to select a predetermined number of commonly used monosyllables from the monosyllables having polarity, wherein the labeling unit is configured to label the predetermined number of commonly used monosyllables selected by the selection unit with their polarity.
11. The device according to remark 9, wherein the lexicon is a lexicon of disyllabic words, and the extraction unit comprises:
a splitting module, configured to split the disyllabic words in the lexicon of disyllabic words into monosyllables;
an analysis module, configured to analyze the part of speech of each monosyllable obtained by the splitting module; and
a selection module, configured to select, from the monosyllables obtained by the splitting module, the monosyllables having polarity that are not the semantic core, according to the part of speech of each monosyllable and the relative position of each monosyllable in the disyllabic word.
12. The device according to remark 9, wherein the extraction unit comprises:
a word segmentation module, configured to perform word segmentation on the sentences in the corpus;
an analysis module, configured to analyze the part of speech of each word in the sentences segmented by the word segmentation module; and
a selection module, configured to select, from the words in the sentences segmented by the word segmentation module, the monosyllables having polarity that are not the semantic core, according to the part of speech analyzed by the analysis module and the relative position of each word in the segmented sentences.
13. A device for determining the polarity of a word, comprising:
a mutual information computing unit, configured to calculate, for a word whose polarity is to be determined, the mutual information between the word and each positive-polarity word in a predetermined polarity morpheme database, and the mutual information between the word and each negative-polarity word in the polarity morpheme database;
a relevance computing unit, configured to calculate the relevance between the word and all positive-polarity words in the polarity morpheme database from the mutual information, calculated by the mutual information computing unit, between the word and each positive-polarity word in the polarity morpheme database, and to calculate the relevance between the word and all negative-polarity words in the polarity morpheme database from the mutual information, calculated by the mutual information computing unit, between the word and each negative-polarity word in the polarity morpheme database; and
a determination unit, configured to compare the relevance, calculated by the relevance computing unit, between the word and all positive-polarity words in the polarity morpheme database with the relevance between the word and all negative-polarity words in the polarity morpheme database, and to determine the polarity of the word according to the comparison result.
14. The device according to remark 13, wherein the relevance computing unit comprises:
a first summation module, configured to sum the mutual information, calculated by the mutual information computing unit, between the word and each positive-polarity word in the polarity morpheme database to obtain a first summation result; and
a second summation module, configured to sum the mutual information, calculated by the mutual information computing unit, between the word and each negative-polarity word in the polarity morpheme database to obtain a second summation result.
15. The device according to remark 14, wherein the determination unit is configured to: compare the first summation result and the second summation result obtained by the relevance computing unit; determine that the polarity of the word is positive if the first summation result is greater than the second summation result; and determine that the polarity of the word is negative if the first summation result is less than the second summation result.
16. The device according to remark 13, wherein the polarity morpheme database is a polarity morpheme database constructed by the device according to any one of remarks 9 to 12.

Claims (10)

1. A method for constructing a polarity morpheme database, comprising:
extracting monosyllables having polarity from a corpus and/or a lexicon; and
labeling the monosyllables having polarity with their polarity, the labeled monosyllables constituting the polarity morpheme database.
2. A method for determining the polarity of a word, comprising:
for a word whose polarity is to be determined, calculating the mutual information between the word and each positive-polarity word in a predetermined polarity morpheme database, and the mutual information between the word and each negative-polarity word in the polarity morpheme database;
calculating the relevance between the word and all positive-polarity words in the polarity morpheme database from the calculated mutual information between the word and each positive-polarity word in the polarity morpheme database, and calculating the relevance between the word and all negative-polarity words in the polarity morpheme database from the calculated mutual information between the word and each negative-polarity word in the polarity morpheme database; and
comparing the relevance between the word and all positive-polarity words in the polarity morpheme database with the relevance between the word and all negative-polarity words in the polarity morpheme database, and determining the polarity of the word according to the comparison result.
3. The method according to claim 2, wherein the step of calculating the relevance comprises:
summing the calculated mutual information between the word and each positive-polarity word in the polarity morpheme database to obtain a first summation result; and
summing the calculated mutual information between the word and each negative-polarity word in the polarity morpheme database to obtain a second summation result.
4. The method according to claim 3, wherein the step of determining comprises: comparing the first summation result with the second summation result; determining that the polarity of the word is positive if the first summation result is greater than the second summation result; and determining that the polarity of the word is negative if the first summation result is less than the second summation result.
5. The method according to claim 2, wherein the polarity morpheme database is a polarity morpheme database constructed by the method according to claim 1.
6. A device for constructing a polarity morpheme database, comprising:
an extraction unit, configured to extract monosyllables having polarity from a corpus and/or a lexicon; and
a labeling unit, configured to label the monosyllables having polarity with their polarity, the labeled monosyllables constituting the polarity morpheme database.
7. A device for determining the polarity of a word, comprising:
a mutual information computing unit, configured to calculate, for a word whose polarity is to be determined, the mutual information between the word and each positive-polarity word in a predetermined polarity morpheme database, and the mutual information between the word and each negative-polarity word in the polarity morpheme database;
a relevance computing unit, configured to calculate the relevance between the word and all positive-polarity words in the polarity morpheme database from the mutual information, calculated by the mutual information computing unit, between the word and each positive-polarity word in the polarity morpheme database, and to calculate the relevance between the word and all negative-polarity words in the polarity morpheme database from the mutual information, calculated by the mutual information computing unit, between the word and each negative-polarity word in the polarity morpheme database; and
a determination unit, configured to compare the relevance, calculated by the relevance computing unit, between the word and all positive-polarity words in the polarity morpheme database with the relevance between the word and all negative-polarity words in the polarity morpheme database, and to determine the polarity of the word according to the comparison result.
8. The device according to claim 7, wherein the relevance computing unit comprises:
a first summation module, configured to sum the mutual information, calculated by the mutual information computing unit, between the word and each positive-polarity word in the polarity morpheme database to obtain a first summation result; and
a second summation module, configured to sum the mutual information, calculated by the mutual information computing unit, between the word and each negative-polarity word in the polarity morpheme database to obtain a second summation result.
9. The device according to claim 8, wherein the determination unit is configured to: compare the first summation result and the second summation result obtained by the relevance computing unit; determine that the polarity of the word is positive if the first summation result is greater than the second summation result; and determine that the polarity of the word is negative if the first summation result is less than the second summation result.
10. The device according to claim 7, wherein the polarity morpheme database is a polarity morpheme database constructed by the device according to claim 6.
CN2010102576351A 2010-08-17 2010-08-17 Method and device for constructing polarity morpheme database, and method and device for determining polarity of words Pending CN102375838A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010102576351A CN102375838A (en) 2010-08-17 2010-08-17 Method and device for constructing polarity morpheme database, and method and device for determining polarity of words

Publications (1)

Publication Number Publication Date
CN102375838A true CN102375838A (en) 2012-03-14

Family

ID=45794461

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010102576351A Pending CN102375838A (en) 2010-08-17 2010-08-17 Method and device for constructing polarity morpheme database, and method and device for determining polarity of words

Country Status (1)

Country Link
CN (1) CN102375838A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663139A (en) * 2012-05-07 2012-09-12 苏州大学 Method and system for constructing emotional dictionary
WO2016197577A1 (en) * 2015-06-12 2016-12-15 百度在线网络技术(北京)有限公司 Method and apparatus for labelling comment information and computer device
CN109086285A (en) * 2017-06-14 2018-12-25 佛山辞荟源信息科技有限公司 Chinese intelligent processing method and system and device based on morpheme

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009163565A (en) * 2008-01-08 2009-07-23 Toyota Central R&D Labs Inc Sentence shaping device and sentence shaping program
CN101520778A (en) * 2008-02-27 2009-09-02 株式会社东芝 Apparatus and method for determing parts-of-speech in chinese
CN101751431A (en) * 2008-12-15 2010-06-23 北京大学 Method and device for positive and negative analysis of Chinese comments

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
孟凡博等: "文本褒贬倾向判定系统的研究", 《小型微型计算机系统》 *
王素格: "基于Web的评论文本情感分类问题研究", 《中国博士学位论文全文数据库》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663139A (en) * 2012-05-07 2012-09-12 苏州大学 Method and system for constructing emotional dictionary
CN102663139B (en) * 2012-05-07 2013-04-03 苏州大学 Method and system for constructing emotional dictionary
WO2016197577A1 (en) * 2015-06-12 2016-12-15 百度在线网络技术(北京)有限公司 Method and apparatus for labelling comment information and computer device
CN109086285A (en) * 2017-06-14 2018-12-25 佛山辞荟源信息科技有限公司 Chinese intelligent processing method and system and device based on morpheme
CN109086285B (en) * 2017-06-14 2021-10-15 佛山辞荟源信息科技有限公司 Intelligent Chinese processing method, system and device based on morphemes

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20120314