WO2013128984A1 - Predicate template collection device, specific phrase pair collection device, and computer programs therefor - Google Patents
- Publication number
- WO2013128984A1 (application PCT/JP2013/051326)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- template
- pair
- predicate
- noun
- phrase
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Definitions
- the present invention relates to a technique for recognizing relationships between natural language sentences or phrases, and more particularly to a technique for automatically recognizing and collecting expressions relating to causal relations, contradictions, etc. between sentences or phrases.
- Conventional techniques related to the components of such a technique are described in Non-Patent Documents 1 to 7 below.
- A technology for acquiring new causal relationships by machine learning from a large number of manually prepared causal relationship examples is described in Non-Patent Document 1. For Japanese, there is a technique that automatically recognizes the relationship between phrases based on the appearance in the text of conjunctions that explicitly indicate a causal relationship, such as “for” and “no” (Non-Patent Document 2).
- Non-Patent Document 6 Hypothesis Generation Method Using Language
- the object of the present invention is to provide a predicate template collection device that enables logical relationships between phrases, such as causal relationships and contradictory relationships, to be recognized automatically and with high accuracy.
- Another object of the present invention is to provide an apparatus capable of automatically and efficiently recognizing a phrase pair having a specific relationship such as a causal relationship or a contradiction relationship.
- the predicate template collection device is a predicate template collection device for collecting predicate templates from a set of predetermined sentences.
- the predicate template is a phrase that is combined with a noun.
- the predicate template can be assigned an activity value indicating the direction and magnitude of the activity according to the classification of activity, inactivity, and neutrality.
- Activity refers to describing an event in a direction in which the function or effect of the target pointed to by the noun linked to the predicate template is exhibited.
- “Inactive” indicates that an event in a direction in which the function or effect of the target indicated by the noun linked to the predicate template is not exhibited is described.
- Neutral indicates a predicate template that is neither active nor inactive.
- the distinction between active and inactive for a predicate template is called the predicate template polarity.
- the predicate template collection device includes a conjunction storage unit that stores conjunctions classified as forward or reverse, and a seed template storage unit that stores a seed template serving as a starting point for constructing a predicate template network. Each seed template is given a polarity and an activity value. Therefore, it can be said that the polarity of the predicate template indicates the sign of the activation value of the predicate template.
- the apparatus further includes noun pair collecting means for collecting noun pairs satisfying a certain relationship from a predetermined corpus and classifying the polarity of the relationship between nouns constituting each noun pair as positive or negative.
- the polarity of the relationship between nouns constituting a noun pair is defined as positive when the object indicated by one of the noun pairs promotes the appearance of the object indicated by the other, and negative when suppressed.
- the apparatus further includes predicate template pair collection means for collecting, from a predetermined corpus, predicate template pairs co-occurring with the noun pairs collected by the noun pair collecting means, and for determining, for each collected predicate template pair, whether the activity / inactivity of its predicate templates is the same or opposite, based on the polarity of the relationship of the noun pair co-occurring with that predicate template pair and on the conjunction connecting the predicate template pair.
- the apparatus further includes construction means for constructing a template network in which each predicate template is a node and the predicate templates making up each collected predicate template pair are joined by a link carrying the relationship between them.
- the apparatus further includes activity value calculating means for calculating the activity value to be assigned to each node from the relationships between the nodes in the template network, based on the activity values assigned in advance to the nodes corresponding to the seed templates, and for attaching the calculated activity value to the predicate template corresponding to each node and outputting the result.
- the noun pair collecting means includes collecting means for collecting, from a predetermined corpus, noun pairs co-occurring with a predicate template pair, using the conjunctions stored in the conjunction storage unit and the seed templates stored in the seed template storage unit, and polarity determining means for classifying the polarity of the relationship between the nouns constituting each noun pair as positive or negative.
- the collecting means includes means for collecting from the corpus, using the conjunctions stored in the conjunction storage unit and the seed templates stored in the seed template storage unit, noun pairs that co-occur with a predicate template pair in the corpus at a frequency equal to or higher than a predetermined frequency.
- the polarity determining means includes means for determining, for each noun pair collected by the collecting means, the polarity of the relationship between the nouns constituting the noun pair, based on the polarities of the predicate templates in the predicate template pair co-occurring with the noun pair and on the type of conjunction connecting the phrase pair formed by the noun pair and the predicate template pair, and means for counting the polarities so determined for each type of noun pair and deciding the polarity for each type of noun pair by majority vote.
- the predicate template collection device further includes determination means for determining, in response to completion of the output of the predicate templates by the activity value calculation means, whether an end condition for the process of calculating the activity values of the predicate templates is satisfied; update means for, in response to the determination means determining that the end condition is not satisfied, selecting new seed templates consisting of the predicate templates whose activity value calculated by the activity value calculation means has an absolute value greater than or equal to a threshold, and updating the stored contents of the seed template storage unit with the newly selected seed templates; and means for causing the predicate template collection processing to be executed again in response to the update by the update means.
- the constructing means includes means for adding a node corresponding to a predicate template when no node corresponding to that predicate template, which forms part of a predicate template pair collected by the predicate template pair collecting means, exists in the template network, and link means for generating a link between the predicate templates forming each predicate template pair collected by the predicate template pair collecting means.
- the link means assigns an attribute indicating activity match or mismatch to each link according to whether the activity of the predicate templates connected by each link is the same.
- the constructing means further includes weight assigning means for assigning a weight that is a function of the number of links with other nodes to each link generated by the link means. The sign of the weight assigned by the weight assigning unit is different depending on whether the attribute of the link is a value indicating a match or a value indicating a mismatch.
- the activity value calculation means may include means for estimating the activity value assigned to each node in the template network by optimizing the value of a function defined over the weight of each link in the template network and the activity value assigned to each node. Among the seed templates, those that are active are given a positive polarity and activity value, and those that are inactive are given a negative polarity and activity value.
- the computer program executable by the computer according to the second aspect of the present invention causes the computer to function as any of the predicate template collection devices described above.
- the specific phrase pair collection device includes any one of the above predicate template collection devices; predicate template storage means for storing the predicate templates collected by the predicate template collection device; phrase pair collection means for collecting, from a predetermined corpus, phrase pairs each including a specific type of conjunction and a predicate template pair consisting of a specific combination of active / inactive predicate templates among the predicate templates stored in the predicate template storage means; and phrase selection means for selecting phrase pairs that express a predetermined relationship, based on the noun pair co-occurring with the predicate templates in each phrase pair collected by the phrase pair collection means and on a specific combination of the polarities of the predicate templates in that phrase pair.
- the specific phrase pair collection device further includes score calculation means for calculating, for each phrase pair selected by the phrase selection means, a score representing the strength of the predetermined relationship, based on the activity values of the predicate templates constituting the phrase pair and on the noun pair included in the phrase pair as it occurs in the corpus, and means for ordering the phrase pairs selected by the phrase selection means by the scores calculated by the score calculation means.
- As the predetermined relationship, there may be a causal relationship in which one phrase causes the other phrase, an inconsistent relationship in which one phrase and the other phrase contradict each other, or a causal relationship in the corpus
- FIG. 1 It is a block diagram of the contradiction expression collection system which concerns on the 1st Embodiment of this invention. It is a more detailed block diagram of the template DB construction apparatus shown in FIG. It is a figure which shows the structure of a template network typically. It is a flowchart which shows the control structure of the program which implement
- FIG. 8 is a flowchart showing a control structure of a program part for realizing a process of selecting a contradictory phrase pair in the program showing the control structure in FIG. 7. It is a flowchart which shows the control structure of the program which ranks the contradictory phrase pair among the programs which show a control structure in FIG. It is a flowchart which shows the control structure of the program for implement
- FIG. 16 is a block diagram illustrating a hardware configuration of the computer illustrated in FIG. 15.
- An expression in which a noun and a verb (or another predicate such as an adjective or an adjectival verb) are connected through a particle is called a “phrase”.
- A combination of a particle and a predicate in a phrase (for example, <[particle], eat>) is called a “predicate template”.
- As already described, a verb, adjective, or adjectival verb that connects to one noun through a particle is called a “predicate template”. In the present embodiment, each predicate template is classified as active, inactive, or neutral.
- predicate templates appearing in text that is input to the entire system are classified into three types shown in Table 1 below.
- the above classification is automatically calculated from the text. At that time, a positive activity value is assigned to the active template, and a negative activity value is assigned to the inactive template. A specific method for calculating the activity value will be described later.
- a neutral verb is one for which the absolute value of the activity value obtained as a result of the calculation is below a certain threshold value.
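The three-way classification described above can be sketched as a simple thresholding step. This is a minimal illustration only: the function name `classify_template` and the default threshold of 0.1 are hypothetical and do not come from the specification.

```python
def classify_template(activity_value: float, threshold: float = 0.1) -> str:
    """Label a predicate template from its computed activity value.

    The threshold default is an illustrative assumption, not a value
    taken from the specification.
    """
    # Templates whose activity value is close to zero are treated as neutral.
    if abs(activity_value) < threshold:
        return "neutral"
    # Otherwise the sign of the activity value gives the polarity.
    return "active" if activity_value > 0 else "inactive"
```

A template with a clearly positive value is labelled active, one with a clearly negative value inactive, and anything inside the threshold band neutral.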
- both of the above-mentioned classification relating to “activity”, “inactivity”, and “neutral” and the activity value assigned to each template are collectively referred to as “polarity” of the predicate template. An example is given in Table 2 below.
- the following table 5 shows examples of these relationships.
- the noun pair <earthquake, tsunami> has a positive causal relationship
- <anticancer drug, cancer> has a negative causal relationship.
- both “prescribe (vaccine)” and “inject (vaccine)” are positive in polarity
- both “(earthquake) occurs” and “(earthquake) occur” are positive in polarity.
- the activity values are not necessarily the same.
- a large number of expressions of the kind described above, that is, two predicate templates each connected to a noun and connected to each other by a forward or reverse conjunction, are collected from the Internet.
- a network of predicate templates is created by establishing links between predicate templates connected by these conjunctions and between predicate template pairs having synonym / entailment relationships. As described above, information on whether the polarities of the predicate templates are the same is given to the link as an attribute.
- An activity value of +1 or -1 is manually assigned to a small number of predicate templates on this network in advance. On the network, it is possible to define energy similar to the energy of electron spin in quantum mechanics, as will be described later.
- See Non-Patent Document 8 for the calculation method of the activity value.
- The activity value assignment algorithm described below is merely an example, and there may be other methods for obtaining a specific activity value based on an energy function that takes the linguistic constraints into account.
- “~te” does not necessarily represent a causal relationship.
- the causal relationship between phrases can be obtained with high accuracy.
- Using such a causal relationship, it can be predicted, for example, from the information that “an earthquake has occurred” that there is a possibility of being hit by a tsunami.
- the causal relationship acquired in this way is a big factor for a very important technique of automatically acquiring a semantic relationship between phrases.
- Removing the common nouns from these phrase pairs leaves a predicate template pair. The two templates are also likely to contradict each other. If such predicate template pairs are made into a database, it can be used as a useful dictionary of antonymous expressions. In the above example, predicate templates with opposite meanings, such as “suck ...” and “stop”, can be collected as a pair.
- material relations can be automatically obtained by acquiring semantic relations between nouns based on patterns (existing technology can be used for this) from the base text. That is, the relationship between the product B and the material A can be automatically acquired by a pattern such as “Make B with A”. As a result of this automatic acquisition, it is assumed that information that the material of the product “carbide tool” is “tungsten” can be acquired.
- Next, for the pair of nouns found to have this material relationship, that is, “carbide tool” and “tungsten”, predicate templates that frequently co-occur with each noun are selected such that the product of their activity values is positive and the absolute values of the activity values are large.
- Each selected predicate template is aligned with its noun. Then a causal relationship can be acquired between verb phrases each consisting of a predicate (verb) and a noun, for example “import tungsten (activity value positive) and manufacture a cemented carbide tool (activity value positive)”. The causal relationship here is that “importing tungsten” is for “manufacturing carbide tools”.
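The selection step described above can be sketched as follows. This is a hedged illustration, not the method of the specification: the function `select_template_pairs`, the `min_abs` cut-off, ranking by the product of activity values, and the activity values in the usage example are all assumptions, and the "frequently co-occurs" condition is taken as already applied when the candidate dictionaries are built.

```python
from itertools import product


def select_template_pairs(templates_a, templates_b, min_abs=0.5):
    """Pick template pairs whose activity values have a positive product.

    templates_a / templates_b map a predicate template (already filtered to
    those co-occurring frequently with the material noun and the product noun
    respectively) to its activity value. min_abs is a hypothetical cut-off
    standing in for "the absolute value of the activity value is large".
    """
    selected = []
    for (ta, va), (tb, vb) in product(templates_a.items(), templates_b.items()):
        if va * vb > 0 and abs(va) >= min_abs and abs(vb) >= min_abs:
            selected.append((ta, tb, va * vb))
    # Assumption: rank candidates by the product of activity values.
    return sorted(selected, key=lambda item: item[2], reverse=True)


# Illustrative values only; none of these numbers come from the text.
tungsten = {"import": 0.9, "discard": -0.8}
carbide_tool = {"manufacture": 0.7, "break": -0.6}
hypotheses = select_template_pairs(tungsten, carbide_tool)
```

With these made-up values the top-ranked pair is ("import", "manufacture"), matching the "import tungsten, manufacture a cemented carbide tool" example in the text.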
- each of the predicate templates is inconsistent with the predicate template for the causal relationship between the predicate and the noun phrase including those acquired as described above.
- the hypothesis generation as described above starts only from information described in a simple and frequent pattern such as “making a carbide tool from tungsten” at the beginning.
- information on the activity / inactivity of the predicate template is extracted from a text having no relationship with the carbide tool or tungsten.
- a causal hypothesis is generated.
- Even if the only reference to the cemented carbide tool and tungsten in the input text is the expression “make the cemented carbide tool from tungsten”, it is possible to generate a hypothesis as described above.
- this technology has a wide range of applications and becomes a core technology for advanced use of information.
- the first embodiment relates to a system that automatically collects contradictory expressions, which is one of the logical relationships between phrases, using the phrase pair extraction technique described above.
- the second embodiment relates to a system that applies the phrase pair extraction technique described above to acquisition of a causal relationship, which is another example of a logical relationship between phrases.
- the third and fourth embodiments relate to a system for generating a causal relationship hypothesis.
- the contradiction expression collection system 30 includes a seed template storage device 32 for storing those of the predicate templates described above that serve as the core for building a template network (these are referred to as “seed templates”), a conjunction storage unit 34, a contradiction expression collection device 36, and a contradiction expression storage device 38 for storing the contradiction expressions collected by the contradiction expression collection device 36.
- the contradiction expression collection device 36 is connected to the seed template storage device 32, the conjunction storage unit 34, and the Internet 40, collects a large number of phrase pairs from a virtual corpus on the Internet 40, and extracts a large amount of predicate templates from them.
- the templates stored in the seed template storage device 32 are given positive or negative activity values in advance according to whether each template is active or inactive. At the beginning of the process described below, these values are +1 and -1, respectively.
- template DB construction device 60 includes a template pair generation unit 90, connected to seed template storage device 32 and conjunction storage unit 34, for generating all combinations in which two of the templates stored in seed template storage device 32 are connected by a conjunction stored in conjunction storage unit 34, and a template pair storage unit 92 for storing the template pairs generated by the template pair generation unit 90.
- An example of the shape of the template pair generated by the template pair generation unit 90 is as follows.
- the template DB construction device 60 further includes a noun pair collection unit 94 for collecting from the Internet 40, for each template pair stored in the template pair storage unit 92, noun pairs co-occurring with that template pair; a noun pair storage unit 96 for storing the noun pairs collected by the noun pair collection unit 94; and a noun pair polarity determination unit 98, connected to the noun pair storage unit 96, for determining the relationship between the nouns included in each noun pair stored in the noun pair storage unit 96, based on the polarities of the predicate templates co-occurring with those nouns and the type of the conjunction stored in the conjunction storage unit 34, and for attaching a tag indicating that relationship to each noun pair.
- the noun pair polarity determination unit 98 determines the relationship between nouns constituting the noun pair according to the method shown in Table 9 below.
- (1) When the two predicate templates have the same polarity and are connected by a forward conjunction, the relationship between the noun pair co-occurring with them is positive.
- (2) When the two predicate templates have the same polarity and are connected by a reverse conjunction, the relationship between the noun pair co-occurring with them is negative.
- (3) When the two predicate templates are opposite in polarity and are connected by a forward conjunction, the relationship between the noun pair co-occurring with them is negative.
- (4) When the two predicate templates are opposite in polarity and are connected by a reverse conjunction, the relationship between the noun pair co-occurring with them is positive.
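The decision rule for the relationship of a noun pair can be written as a small function. This is a sketch under stated assumptions: the function name, the encoding of polarity as +1 / -1, and the string labels "forward" / "reverse" are illustrative choices, and the two opposite-polarity cases are taken to mirror the same-polarity cases.

```python
def noun_pair_polarity(polarity_1: int, polarity_2: int, conjunction: str) -> str:
    """Classify the relationship of a noun pair as positive or negative.

    polarity_1 / polarity_2: +1 (active) or -1 (inactive) for the two
    predicate templates co-occurring with the nouns.
    conjunction: "forward" or "reverse" (label names are assumptions).
    """
    if conjunction not in ("forward", "reverse"):
        raise ValueError(f"unknown conjunction type: {conjunction!r}")
    same_polarity = polarity_1 == polarity_2
    is_forward = conjunction == "forward"
    # Same polarity + forward, or opposite polarity + reverse, gives positive.
    return "positive" if same_polarity == is_forward else "negative"
```

For instance, two active templates joined by a forward conjunction yield a positive relationship, while the same templates joined by a reverse conjunction yield a negative one.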
- the template DB construction device 60 further includes a template pair collection unit 100, connected to the noun pair storage unit 96, for collecting from the Internet 40, for each noun pair to which a relation tag has been attached by the noun pair polarity determination unit 98, the template pairs co-occurring with it; a template pair storage unit 102 for storing the template pairs collected by the template pair collection unit 100 in association with the noun pairs co-occurring with them; and a template activity match determination unit 104 for determining, for each template pair stored in the template pair storage unit 102, whether the templates constituting the template pair have the same activity / inactivity (whether they match), based on the relationship (positive / negative) of the noun pair that co-occurs with the template pair and on whether the conjunction connecting the templates is forward or reverse, and for assigning the result as a tag to each template pair.
- Whether the activity / inactivity of a template pair matches can be determined by the method shown below. As shown in Table 9, the relationship of the noun pair <earthquake, tsunami> is positive, the relationship of the noun pair <salivation, dry mouth> is negative, and the relationship of the noun pair <acetaldehyde, liver disorder> is positive.
- (1) The activity of a template pair that co-occurs with a noun pair whose relationship is positive, and that is connected by a forward conjunction, is the same.
- (2) The activity of a template pair that co-occurs with a noun pair whose relationship is positive, and that is connected by a reverse conjunction, is opposite.
- (3) The activity of a template pair that co-occurs with a noun pair whose relationship is negative, and that is connected by a forward conjunction, is opposite.
- (4) The activity of a template pair that co-occurs with a noun pair whose relationship is negative, and that is connected by a reverse conjunction, is the same.
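The four cases above reduce to a single comparison. The following is a minimal sketch; the function name and the string labels ("positive" / "negative", "forward" / "reverse") are illustrative assumptions rather than identifiers from the specification.

```python
def template_activity_matches(noun_relation: str, conjunction: str) -> bool:
    """Return True when the two templates of a pair have the same activity.

    noun_relation: "positive" or "negative" relationship of the co-occurring
    noun pair. conjunction: "forward" or "reverse".
    """
    if noun_relation not in ("positive", "negative"):
        raise ValueError(f"unknown noun relation: {noun_relation!r}")
    if conjunction not in ("forward", "reverse"):
        raise ValueError(f"unknown conjunction type: {conjunction!r}")
    # positive + forward and negative + reverse -> same activity;
    # positive + reverse and negative + forward -> opposite activity.
    return (noun_relation == "positive") == (conjunction == "forward")
```

The returned boolean corresponds to the match / opposite tag attached to each template pair.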
- the template DB construction device 60 further constructs a network between the templates based on the template pair stored in the template pair storage unit 102 and its match determination result.
- the template network 140 includes a plurality of nodes each corresponding to one template and links defined between the nodes.
- the link is established between nodes corresponding to the template for which the match determination shown by the table 9 is performed.
- Each link is assigned the attribute of the same polarity or opposite polarity according to the result of match determination between the templates of the nodes at both ends (table 9).
- links to which the same polarity is assigned are indicated by solid lines, and links to which the opposite polarity is assigned are indicated by dotted lines.
- the activity value of each template is calculated using this link.
- For the seed templates stored in the seed template storage device 32 (for example, “cause”, “generate”, and “suppress” in the figure), a value of +1 or -1 is manually given in advance.
- the active value of each node (template) is calculated using these values, the link between the nodes, and the attribute of the link. Specific contents of the calculation method will be described later.
- template DB construction device 60 further includes a template network storage unit 110, connected to template network construction unit 106, for storing the template network 140 constructed by template network construction unit 106; a template activity value calculation unit 112 for calculating the activity value of each node (template) of the template network 140 stored in the template network storage unit 110, based on the activity value of +1 or -1 assigned in advance to the seed templates, and for assigning these activity values to the nodes (templates); a high activity template extraction unit 114 for extracting, from the nodes (templates) of the template network 140 stored in the template network storage unit 110, only those whose activity value calculated by the template activity value calculation unit 112 has a large absolute value, and for constructing the template DB 62 from the extracted templates; an end determination unit 116 for determining whether a predetermined end condition for the template DB construction is satisfied; and a seed template update unit 118 for, when the end condition is not satisfied, updating the seed template storage device 32 with the templates stored in the template DB 62 as new seed templates and causing the template DB construction device 60 to execute the template DB construction processing again.
- When the end determination unit 116 determines that the end condition is satisfied, the operation of the template DB construction device 60 ends, and the contradiction expression acquisition unit 64 is activated.
- each unit of the contradiction expression collection device 36 is realized by computer hardware and a computer program executed by the computer hardware.
- the template pair generation unit 90 generates template pairs by simply combining all combinations of the seed templates stored in the seed template storage device 32 with all conjunctions stored in the conjunction storage unit 34. A typical example of a template pair is “cause (noun 1)” “so” “generate (noun 2)”.
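The exhaustive combination performed by the template pair generation unit can be sketched with a Cartesian product. This is only an illustration: the function name is hypothetical, and whether a template may be paired with itself is not stated in the text, so allowing it here is an assumption.

```python
from itertools import product


def generate_template_pairs(seed_templates, conjunctions):
    """Return every (template_1, conjunction, template_2) combination.

    Pairing a template with itself is allowed here; the text does not say
    whether such pairs are generated, so this is an assumption.
    """
    return list(product(seed_templates, conjunctions, seed_templates))


pairs = generate_template_pairs(["cause", "generate", "suppress"], ["so", "but"])
```

With three seed templates and two conjunctions this produces 3 x 2 x 3 = 18 patterns, including the “cause” “so” “generate” example from the text.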
- the noun pair collection unit 94 performs the following processing.
- Consider the noun pairs that co-occur in one sentence with a combination of the above template pair + conjunction.
- Such noun pairs are classified into those having a positive relationship with each other and those having a negative relationship with each other as exemplified below.
- Positive / negative of a noun pair is determined by a combination of activity / inactivity of a template pair co-occurring with the noun pair and a conjunction.
- the template pair collection unit 100 performs the following processing. Consider a noun pair determined by the noun pair polarity determination unit 98 to appear on the Internet 40 only as a positive relationship. Among them, the template pair collection unit 100 leaves only noun pairs whose appearance frequency is a predetermined number of times or more as positive relational noun pairs. Similarly, regarding noun pairs that appear on the Internet 40 only as a negative relationship, only noun pairs whose number of appearances is equal to or greater than the predetermined number are left as negative related noun pairs.
- the predetermined number of times as the threshold value may be different or the same when selecting a positive relational noun pair and when selecting a negative relational noun pair.
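The filtering of noun pairs by consistency and appearance frequency can be sketched as below. The function name, the input format (a list of observed `(noun_pair, relation)` tuples), and the default thresholds are hypothetical choices made for illustration.

```python
from collections import defaultdict


def filter_noun_pairs(observations, min_positive=2, min_negative=2):
    """Keep noun pairs seen with only one relation, often enough.

    observations: iterable of (noun_pair, relation) with relation being
    "positive" or "negative". The two thresholds may differ or be equal;
    their default values here are illustrative assumptions.
    """
    counts = defaultdict(lambda: {"positive": 0, "negative": 0})
    for noun_pair, relation in observations:
        counts[noun_pair][relation] += 1

    kept = {}
    for noun_pair, c in counts.items():
        # Pairs observed with both relations are discarded.
        if c["negative"] == 0 and c["positive"] >= min_positive:
            kept[noun_pair] = "positive"
        elif c["positive"] == 0 and c["negative"] >= min_negative:
            kept[noun_pair] = "negative"
    return kept
```

Noun pairs that appear with both relations, or too rarely, are dropped before template pairs are collected for them.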
- the template activity match determination unit 104 classifies each template pair that co-occurs in a sentence with a remaining positive / negative noun pair + conjunction as having the same (match) or the opposite (opposite) template activity, in accordance with a determination method based on Table 12 below. Some template pairs appear on the Internet 40 sometimes with matching template activity and sometimes with opposite activity. For these, the number of matching appearances and the number of opposite appearances are compared, and the classification is decided by majority vote.
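The majority vote over conflicting observations can be sketched with a counter. The function name is hypothetical, and the text does not say how ties are resolved, so returning `None` for a tie is an assumption.

```python
from collections import Counter


def majority_activity_label(labels):
    """Decide "match" or "opposite" for one template pair by majority vote.

    labels: iterable of "match" / "opposite" judgments, one per occurrence.
    Ties (including no observations) return None; tie handling is not
    specified in the text and is an assumption here.
    """
    counts = Counter(labels)
    if counts["match"] == counts["opposite"]:
        return None
    return "match" if counts["match"] > counts["opposite"] else "opposite"
```

Each template pair then carries a single tag even when its individual occurrences disagree.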
- The program that realizes template network construction unit 106 is started when template pairs have been stored in template pair storage unit 102 and the activity match determination for each template pair by template activity match determination unit 104 is complete.
- This program includes step 150 of securing a predetermined storage area in memory, assigning initial values to the areas for predetermined variables within that storage area, and constructing an initial, empty template network 140, and step 152 of executing process 154 on all template pairs stored in the template pair storage unit 102.
- Process 154 adds the templates constituting a template pair, and the link between them, to the template network 140.
- It is assumed that an empty network has been constructed in advance as the template network 140 in step 150.
- the process 154 includes step 180 of determining, for each template included in the template pair to be processed, whether the corresponding node exists in the template network 140, that is, whether the node should be added to the template network 140; step 182, executed when the determination in step 180 is affirmative, of adding the node or nodes (one or two) determined to be added to the template network 140; step 184, executed after steps 180 and 182, of determining whether a link already exists between the nodes corresponding to the template pair to be processed; and step 186 of, when the determination in step 184 is negative, adding the link to the template network 140 and ending the process 154. If the determination in step 184 is affirmative, execution of process 154 for this template pair ends.
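Process 154 can be sketched as an idempotent update of a small graph structure. The representation below (a dict holding a node set and a link dict keyed by an unordered pair) and the function name are assumptions made for illustration; the step numbers in the comments refer to the steps named in the text.

```python
def add_template_pair(network, template_a, template_b, attribute):
    """Add the nodes and link for one template pair if they are missing.

    network: {"nodes": set(), "links": {}} where links maps
    frozenset({a, b}) to "match" or "opposite" (representation assumed).
    """
    # Steps 180/182: add each template as a node only if it is not present.
    for template in (template_a, template_b):
        if template not in network["nodes"]:
            network["nodes"].add(template)

    # Steps 184/186: add the link only if none exists between the two nodes.
    key = frozenset((template_a, template_b))
    if key not in network["links"]:
        network["links"][key] = attribute
    return network


network = {"nodes": set(), "links": {}}  # the empty network of step 150
add_template_pair(network, "cause", "generate", "match")
add_template_pair(network, "cause", "suppress", "opposite")
```

Calling the function again for a pair that is already linked leaves the network unchanged, which mirrors the early exit when the determination in step 184 is affirmative.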
- the program that implements the template network construction unit 106 further includes step 164, executed after completion of the processing in step 152, of adding links to the constructed template network 140 by referring to the synonym / entailment relation dictionary 108; step 166 of deleting, from the template network 140 obtained as a result, nodes whose number of links with other nodes is equal to or less than a predetermined threshold value; and step 168 of calculating the weight of each link based on the number of nodes to which each node is linked (the calculation method will be described later), assigning it to each link, and ending the processing.
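The node deletion of step 166 can be sketched as a single pruning pass. The function name, the data layout (links as a dict keyed by `(node_a, node_b)` tuples), and the default threshold are hypothetical, and whether the deletion is repeated until no further node qualifies is not stated in the text; a single pass is assumed here.

```python
def prune_sparse_nodes(nodes, links, max_links_to_drop=1):
    """Delete nodes whose link count is at or below a threshold.

    nodes: set of templates. links: dict mapping (a, b) -> attribute.
    A single pass is performed (an assumption); links touching a deleted
    node are removed as well.
    """
    degree = {node: 0 for node in nodes}
    for a, b in links:
        degree[a] += 1
        degree[b] += 1

    kept_nodes = {node for node in nodes if degree[node] > max_links_to_drop}
    kept_links = {
        (a, b): attribute
        for (a, b), attribute in links.items()
        if a in kept_nodes and b in kept_nodes
    }
    return kept_nodes, kept_links
```

Pruning weakly connected templates in this way leaves only nodes with enough links to constrain their activity values.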
- the program part (routine) executed in step 164 of the link addition process of FIG. 4 is performed for all the pairs of nodes that do not have a link among the nodes in template network 140. It includes step 200 for performing the following process 202.
- the process 202 determines whether or not there is a specific relationship between the node pairs to be processed, and “identical” between the node pairs to be processed when the determination in step 210 is affirmative. And the step 212 of ending the process 202 by adding a link having the attribute ".” If the determination in step 210 is negative, the process 202 is also terminated.
- the grammatical information of the verb and the synonym / entailment relationship of the words stored in the synonym / entailment relationship dictionary 108 shown in FIG. 2 are used. .
- in step 168, the weight of each link is calculated.
- let $w_{ij}$ be the weight given to the link between template i and template j.
- the weight $w_{ij}$ is calculated by the following equation (1).
- $d(i)$ indicates the number of templates linked to the template i.
- $\mathrm{SAME}(i, j)$ indicates that the “match” attribute is attached to the link between the template i and the template j.
- $\mathrm{OPPOSITE}(i, j)$ indicates that the “opposite” attribute is attached to the link between template i and template j. That is, if the match attribute is attached to the link between template i and template j, the weight is a positive value, and if the opposite attribute is attached, the sign of the weight is reversed and becomes negative.
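The sign rule for link weights can be sketched as follows. Equation (1) itself is not reproduced in this text, so the magnitude used here, 1 / sqrt(d(i) d(j)), is an assumption made only for illustration; only the dependence on d(i) and the positive or negative sign are stated above. The function name `link_weight` and the sample degrees are hypothetical.

```python
import math


def link_weight(i, j, degree, attribute):
    """Weight of the link between templates i and j.

    ASSUMPTION: the magnitude 1 / sqrt(d(i) * d(j)) is illustrative;
    the text only states that the weight depends on the link counts
    and that its sign follows the link attribute.
    """
    magnitude = 1.0 / math.sqrt(degree[i] * degree[j])
    if attribute == "same":        # SAME(i, j): positive weight
        return magnitude
    if attribute == "opposite":    # OPPOSITE(i, j): sign reversed
        return -magnitude
    raise ValueError(f"unknown link attribute: {attribute}")


# d(i): number of templates linked to each template (toy values)
degree = {"cause X": 2, "X occurs": 1, "prevent X": 1}
w_same = link_weight("cause X", "X occurs", degree, "same")
w_opposite = link_weight("cause X", "prevent X", degree, "opposite")
```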
- the template activity value calculation unit 112 shown in FIG. 2 calculates the activity value of each node of the template network 140 stored in the template network storage unit 110 by the method described below.
- the computer program that realizes template activity value calculation unit 112 starts executing in response to template network 140 being stored in template network storage unit 110 and each link being weighted.
- This program first sets, for the nodes in the template network 140 that correspond to seed templates, the activation values assigned to them in advance (+1 for an active seed template and -1 for an inactive seed template), and sets a predetermined initial value for the other nodes.
- the value $E(x, W)$ defined by the following equation (2) is then optimized (here, minimized) to estimate the activity value of each node.
- $E(x, W) = -\frac{1}{2} \sum_{i,j} w_{ij} x_i x_j$ (2)
- $x_i$ and $x_j$ are the signed activation values of templates i and j, respectively
- $x$ is a vector composed of these activation values
- $W$ is a matrix composed of the link weights $w_{ij}$.
- This value E is similar to the formula for the energy of electron spins in quantum mechanics, and its minimization can be carried out in the same manner as energy minimization in quantum mechanics.
- after the energy minimization calculation, the value of $x_i x_j$ tends to be positive when the polarities of $x_i$ and $x_j$ are the same, and negative when they are different.
- in equation (2), because the coefficient “-1/2” precedes the summation, the value of $E(x, W)$ is minimized by maximizing the value of the summation.
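The estimation of activity values can be illustrated with a small sketch, assuming E(x, W) is -1/2 times the sum of w_ij x_i x_j over all node pairs, as the coefficient and symbols described above suggest. The text does not specify a solver; the coordinate-wise sign update below is one simple choice assumed for illustration, and `energy`, `minimize_energy` and the toy weights are hypothetical names and data.

```python
def energy(x, W):
    """E(x, W) = -1/2 * sum over i, j of w_ij * x_i * x_j."""
    n = len(x)
    return -0.5 * sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def minimize_energy(W, x, fixed, sweeps=20):
    """Lower E by updating one node at a time (an assumed solver).

    For a symmetric W with zero diagonal, setting x_i to the sign of
    sum_j w_ij * x_j never increases E. Seed nodes in `fixed` keep
    their preassigned values.
    """
    x = list(x)
    n = len(x)
    for _ in range(sweeps):
        for i in range(n):
            if i in fixed:
                continue
            field = sum(W[i][j] * x[j] for j in range(n) if j != i)
            if field != 0:
                x[i] = 1.0 if field > 0 else -1.0
    return x


# Node 0 is an active seed (+1); node 1 has a "same" link to it,
# node 2 an "opposite" link. Non-seed nodes start at 0.
W = [[0.0, 0.7, -0.7],
     [0.7, 0.0, 0.0],
     [-0.7, 0.0, 0.0]]
x0 = [1.0, 0.0, 0.0]
x = minimize_energy(W, x0, fixed={0})
```

In this toy network the node with a positive link to the seed ends up active and the node with a negative link ends up inactive, which is the behavior the minimization is meant to produce.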
- the contradiction expression acquisition unit 64 of FIG. 1 is also realized by a computer program.
- the computer program for realizing contradiction expression acquisition unit 64 includes step 280 of generating a phrase group, step 282 of selecting, from the phrase group generated at step 280, pairs of phrases that contradict each other (have conflicting meanings), and step 284 of ranking the selected pairs.
- the contradictory pair here refers to a phrase pair that satisfies the following conditions.
- Both phrases consist of one noun and one active or inactive template. For example, “be affected by (a cold)” and “prevent (a cold)”.
- The two nouns included in the two phrases are synonymous with (or identical to) each other. For example, a combination such as <cold, cold>, or a pair of two synonymous words for a cold.
- One of the two templates included in the two phrases is active and the other is inactive. For example, a pair of “being affected” (active) and “preventing” (inactive).
- The two templates share many nouns with which they co-occur (form dependency relationships) on the Internet. That is, these two templates have a high distribution similarity.
- the nouns that co-occur with “being affected” may include cold (and its synonyms), pneumonia, and the like.
- the nouns that co-occur with “prevent” include cold (and its synonyms), pneumonia, fire, disaster, and the like.
- the distribution similarity between the two is therefore high.
- Each phrase has an appearance frequency on the Internet that is equal to or higher than a predetermined threshold. That is, the noun and the template of each phrase form a dependency relationship with a frequency equal to or higher than the threshold value. For example, the occurrence frequency of “be affected by (a cold)” ≥ threshold value and the occurrence frequency of “prevent (a cold)” ≥ threshold value must both hold.
- the processing for extracting the contradictory pair by executing the above processing is executed in step 280 of FIG.
- the program portion 280 includes a step 320 for acquiring nouns from the Internet 40 and a step 322 for executing the following processing 324 for all the acquired nouns.
- the process 324 includes a step 360 of executing the following process 362 for all templates stored in the template DB 62 for the noun that is the processing target.
- the process 362 includes step 400 of generating a phrase by combining the template being processed with the noun being processed, step 402 of determining whether the frequency of appearance of the phrase on the Internet 40 is equal to or higher than the above threshold, and step 404, executed when the determination at step 402 is affirmative, of adding the phrase to the phrase group and ending the process 362. If the determination in step 402 is negative, the phrase is not added to the phrase group.
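The nested loops of steps 322 to 404 can be sketched as follows. This is a minimal sketch in which a dictionary of counts stands in for frequency lookups on the Internet 40; `generate_phrase_group`, `frequency` and the toy data are hypothetical.

```python
def generate_phrase_group(nouns, templates, frequency, threshold):
    """Build the phrase group from every noun/template combination."""
    phrases = []
    for noun in nouns:                                  # step 322, process 324
        for template in templates:                      # step 360, process 362
            phrase = (noun, template)                   # step 400
            if frequency.get(phrase, 0) >= threshold:   # step 402
                phrases.append(phrase)                  # step 404
    return phrases


# Toy counts standing in for appearance frequencies on the Internet 40
frequency = {
    ("cold", "prevent X"): 120,
    ("cold", "cause X"): 3,
    ("earthquake", "cause X"): 80,
}
phrase_group = generate_phrase_group(
    ["cold", "earthquake"], ["prevent X", "cause X"], frequency, threshold=50
)
```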
- a large number of active phrases and inactive phrases are generated by executing the program shown in FIG. For example, suppose there are “cause” and “being affected” as active templates, and “inhibit” and “prevent” as inactive templates. If the nouns obtained from the Internet 40 include “earthquake”, “tsunami”, “cold”, and the like, phrases combining them are generated as active phrases and inactive phrases with high appearance frequency. These are added to the phrase group and input to step 282 in FIG. 7.
- the program part that implements step 282 in FIG. 7 has a control structure as shown in FIG. 9. Referring to FIG. 9, this program part includes step 440 of clearing an area reserved in advance in the storage device for storing contradictory phrase pairs, and step 442 of performing the following process 444 on every active phrase in the phrase group obtained in step 280.
- Process 444 includes a step 470 of executing the following process 472 for all inactive phrases.
- the process 472 includes: step 490 of determining whether the nouns included in the active phrase and the inactive phrase being processed are the same; step 498, executed when the determination in step 490 is negative, of searching a dictionary of the same kind as the synonym / entailment dictionary 108 shown in FIG. 2 for words synonymous with the noun included in the active phrase; and step 500 of determining whether any of the words found in step 498 matches the noun of the inactive phrase. If the determination in step 500 is negative, execution of process 472 ends.
- Step 492 determines whether or not the distribution similarity between the active phrase and the inactive phrase to be processed is greater than a threshold value. If the determination in step 492 is affirmative, control proceeds to step 494. In step 494, it is determined whether the frequency of appearance of each phrase on the Internet 40 is equal to or higher than a predetermined threshold value. If the determination is affirmative, the active phrase / inactive phrase pair to be processed is added to the contradictory phrase pair group (step 496), otherwise the pair is discarded.
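The checks of steps 490 to 496 can be sketched as follows. The dictionaries for synonyms, distribution similarity and frequency are hypothetical stand-ins for the dictionary and Internet lookups, and `select_contradictory_pairs` is an illustrative name.

```python
def select_contradictory_pairs(active, inactive, synonyms, similarity,
                               frequency, sim_threshold, freq_threshold):
    """Pair each active phrase with each inactive phrase and filter."""
    pairs = []
    for a_noun, a_tmpl in active:                        # step 442
        for i_noun, i_tmpl in inactive:                  # step 470
            if a_noun != i_noun:                         # step 490
                # steps 498 and 500: look for a synonym match
                if i_noun not in synonyms.get(a_noun, ()):
                    continue
            sim = similarity.get(frozenset((a_tmpl, i_tmpl)), 0.0)
            if sim <= sim_threshold:                     # step 492
                continue
            freq = min(frequency.get((a_noun, a_tmpl), 0),
                       frequency.get((i_noun, i_tmpl), 0))
            if freq < freq_threshold:                    # step 494
                continue
            pairs.append(((a_noun, a_tmpl), (i_noun, i_tmpl)))  # step 496
    return pairs


active = [("cold", "be affected by X")]
inactive = [("cold", "prevent X"), ("fire", "prevent X")]
similarity = {frozenset(("be affected by X", "prevent X")): 0.6}
frequency = {("cold", "be affected by X"): 200,
             ("cold", "prevent X"): 150,
             ("fire", "prevent X"): 300}
contradictory = select_contradictory_pairs(
    active, inactive, {}, similarity, frequency,
    sim_threshold=0.5, freq_threshold=100)
```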
- the program part that realizes the ranking executed in step 284 of FIG. 7 has a control structure as shown in FIG. 10 in this embodiment.
- the program includes step 530 of executing step 532, which calculates a score indicating the degree of contradiction of a contradictory phrase pair, for all the contradictory phrase pairs selected in step 282 of FIG.
- the program further includes step 534 of sorting all the contradictory phrase pairs in descending order of score, outputting them, and ending the process.
- the score calculated in step 532 is a score $C_t(p_1, p_2)$ calculated by the following equation.
- $p_1$ and $p_2$ each represent a phrase constituting a contradictory pair
- $t_1$ and $t_2$ are the templates included in $p_1$ and $p_2$, respectively
- $s_1$ and $s_2$ are the activity values of templates $t_1$ and $t_2$, respectively
- $|s_1|$ represents the absolute value of the activity value $s_1$
- $\mathrm{sim}(t_1, t_2)$ represents the distribution similarity between the templates $t_1$ and $t_2$.
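Scoring and ranking in steps 532 and 534 can be sketched as follows. The scoring equation is not reproduced in this text; the product |s1| × |s2| × sim(t1, t2) used here is an assumption built from the quantities defined above. `contradiction_score` and the sample values are hypothetical.

```python
def contradiction_score(s1, s2, sim):
    """ASSUMED form of C_t: |s1| * |s2| * sim(t1, t2)."""
    return abs(s1) * abs(s2) * sim


# (phrase 1, phrase 2, activity value s1, activity value s2, similarity)
candidates = [
    ("cause earthquake", "prevent earthquake", 0.4, -0.5, 0.9),
    ("be affected by cold", "prevent cold", 0.8, -0.9, 0.5),
]
# Step 534: sort in descending order of score
ranked = sorted(
    candidates,
    key=lambda c: contradiction_score(c[2], c[3], c[4]),
    reverse=True,
)
top_score = contradiction_score(*ranked[0][2:])
```

Under this assumed form, a pair ranks high only when both templates have strong activity values and the templates are distributionally similar.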
- the contradiction expression collection system 30 operates as follows. Referring to FIG. 1, a small number of seed templates are stored in advance in seed template storage device 32. Whether each seed template is active is also determined in advance, and the tag is attached to each template. On the other hand, the conjunction storage unit 34 stores Japanese forward and reverse conjunctions. Also for these, information indicating whether the connection is forward or reverse is provided in advance.
- the template DB construction device 60 operates as follows to construct the template DB 62.
- template pair generation unit 90 generates all possible combinations of the seed templates stored in seed template storage device 32 and the conjunctions stored in conjunction storage unit 34. These are all stored in the template pair storage unit 92 as template pairs.
- the noun pair collection unit 94 collects the noun pairs that co-occur with the template pair from the Internet 40 and stores them in the noun pair storage unit 96.
- the noun pair polarity determination unit 98 determines, for each noun pair, whether its relationship is positive or negative according to the activity / inactivity of the templates in the template pair co-occurring with the noun pair and the type of the conjunction that joins the template pair, and attaches a corresponding tag to the noun pair.
- the template pair collection unit 100 collects, for each noun pair, a template pair that co-occurs with the noun pair from the Internet 40 and stores it in the template pair storage unit 102.
- the template activity match determination unit 104 determines whether the activity / inactivity of the templates constituting the template pair is the same or opposite, according to the positive / negative polarity of the co-occurring noun pair and the type of conjunction (forward or reverse).
- the template activity match determination unit 104 assigns a tag indicating whether each activity / inactivity is the same or opposite to each template pair stored in the template pair storage unit 102.
- the template network construction unit 106 constructs a template network 140 based on the template pairs stored in the template pair storage unit 102.
- the template network construction unit 106 adds the nodes corresponding to the two templates constituting a template pair to the network if they are not already in the network, and adds a link between them if no link exists. By executing this process for all template pairs, an initial version of the template network 140 is constructed.
- the template network construction unit 106 further refers to the synonym / implication relation dictionary 108 for all pairs of nodes that are not linked to each other in the network, and adds a link between two nodes when a specific relationship as shown in the table 13 exists between the templates corresponding to the nodes.
- the template network construction unit 106 assigns a weight calculated by the equation (1) to each link of the network constructed in this way.
- the template network 140 to which the link is added in this way is stored in the template network storage unit 110.
- the template activity value calculation unit 112 executes the process shown in FIG. That is, first, an activation value of +1 or -1 is given to each seed template according to whether it is active or inactive (step 240). Then, by executing a process for minimizing the value E(x, W), defined as an amount similar to the energy of electron spin (step 242), the activity value of each template is estimated and assigned to that template. Some of these activity values are negative and some are positive.
- the high activity level template extraction unit 114 selects, from the templates whose activity values have been estimated in this way, those whose activity value is larger than a predetermined threshold value, and constructs the template DB 62 from them. Instead of selecting by a threshold value, the templates may be ranked according to the magnitude of their activity values.
- the end determination unit 116 shown in FIG. 2 determines whether or not a predetermined end condition is satisfied when the template DB 62 is constructed.
- as the termination condition, for example, a condition that the number of repetitions exceeds a predetermined number, or that the number of templates exceeds a predetermined number, can be used. If the termination condition is satisfied, the template DB 62 is regarded as complete. If the termination condition is not satisfied, the seed template updating unit 118 updates the seed template storage device 32 by using the templates included in the template DB 62 as seed templates. Since these seed templates carry the activity values calculated by the above processing, the same processing as described above is executed using these activity values in the subsequent iteration.
- the contradiction expression acquisition unit 64 uses the template DB 62 to execute processing for acquiring the contradiction expression from the Internet 40.
- the contradiction expression acquisition unit 64 generates a phrase group as shown in FIG. That is, as shown in FIG. 8, nouns are acquired from the Internet 40 (step 320), and for each combination of a noun and a template in the template DB 62 (step 322, process 324, step 360), a phrase consisting of the noun and the template is generated (step 400). If the frequency of appearance of the phrase on the Internet 40 is equal to or higher than a predetermined value, the phrase is added to the phrase group used in the following processing; otherwise the phrase is discarded. By executing the above processing for all combinations of nouns and templates, phrases that appear on the Internet 40 at or above a certain frequency are obtained.
- the contradictory expression acquisition unit 64 selects contradictory phrase pairs from the phrase group thus generated as follows. First, the storage area for contradictory phrase pairs is cleared (step 440 in FIG. 9). Then, for every combination of an active phrase and an inactive phrase included in the phrase group (step 442, process 444, step 470), it is checked whether the nouns included in the two phrases are the same or synonymous (steps 490, 498, 500). If the nouns are the same or synonymous, it is further checked whether the distribution similarity of the template pair included in the phrase pair is greater than a threshold value (step 492). If the judgment is negative, the phrase pair is discarded.
- if the judgment is affirmative, it is next checked in step 494 whether the frequency of appearance of the phrase pair on the Internet 40 is greater than or equal to a threshold value. If the judgment is negative, the phrase pair is discarded. If the determination is affirmative, the phrase pair is added to the contradictory phrase pair group (step 496).
- the contradiction expression acquisition unit 64 repeats the above processing for a phrase pair composed of combinations of all active phrases and inactive phrases. As a result, a large number of contradictory phrase pair groups can be obtained automatically.
- a score of the degree of contradiction is calculated as indicated by step 532 in FIG. Using this score, contradictory phrase pairs are sorted and output in descending order of score.
- the template DB 62 can be obtained by the template DB construction device 60.
- This template DB 62 can be used not only for acquiring contradiction expressions as in the first embodiment but also for various processes.
- the second embodiment is an example in which the template DB 62 is used for acquiring a causal relationship.
- This second embodiment can be realized by employing a processing unit for acquiring a causal relationship expression from the Internet 40 instead of the contradiction expression acquiring unit 64 of FIG.
- a processing unit can be realized by a computer program.
- the causal relationship acquisition process can be performed as follows. First, in step 570, phrase pairs that co-occur in the virtual corpus on the Internet 40 within a sentence, connected to each other by a forward conjunction, are collected from the Internet 40 together with the noun pairs composed of the nouns in those phrase pairs.
- in step 572, it is determined whether the relationship between the nouns of the noun pair in the phrase pair being processed is a positive relationship. If the determination is positive, it is further determined whether the activity / inactivity of the templates in the template pair is the same (step 592). If the determination in step 592 is negative, the phrase pair is discarded. If the determination in step 592 is affirmative, the phrase pair is added to the causal relationship pair group (step 594).
- step 590 determines whether the activity / inactivity of the template pair is opposite to each other. If the judgment is negative, discard this phrase pair. If the determination is positive, the phrase pair is added to the causal relationship pair group.
- step 578 is executed for all the causal relationship phrase pairs in the causal relationship pair group obtained as a result (step 576).
- the processing performed for each causal relationship phrase pair in step 578 calculates the causal relationship strength score $C_s(p_1, p_2)$ by the following equation.
- $p_1$ and $p_2$ are the phrases constituting the causal relationship pair
- $s_1$ and $s_2$ are the activation values of the templates constituting the phrases $p_1$ and $p_2$, respectively, and the symbol $|s|$ denotes the absolute value of $s$
- $n_1$ and $n_2$ are the nouns included in the phrases $p_1$ and $p_2$, respectively
- $\mathrm{npfreq}(n_1, n_2)$ represents the frequency with which $n_1$ and $n_2$ co-occur in one sentence with a template pair whose activity / inactivity is the same, if the relationship between $n_1$ and $n_2$ is positive, or with a template pair whose activity / inactivity differs, if the relationship between $n_1$ and $n_2$ is negative.
- the causal relationship phrase pairs are sorted and output in descending order of the scores.
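The selection and scoring of causal relationship pairs can be sketched as follows. The scoring equation is not reproduced in this text, so the product |s1| × |s2| × npfreq(n1, n2) is an assumption assembled from the quantities defined above. The function names are hypothetical, and templates are treated as active when their activity value is positive; neutral templates are not handled in this sketch.

```python
def is_causal_candidate(noun_pair_positive, s1, s2):
    """Keep a phrase pair when template polarities fit the noun pair.

    Positive noun pair: both templates must share a polarity.
    Negative noun pair: the templates must have opposite polarity.
    """
    same_polarity = (s1 > 0) == (s2 > 0)
    return same_polarity if noun_pair_positive else not same_polarity


def causal_strength(s1, s2, npfreq):
    """ASSUMED form of C_s: |s1| * |s2| * npfreq(n1, n2)."""
    return abs(s1) * abs(s2) * npfreq


kept_positive = is_causal_candidate(True, 0.7, 0.6)
dropped_positive = is_causal_candidate(True, 0.7, -0.6)
kept_negative = is_causal_candidate(False, 0.7, -0.6)
strength = causal_strength(0.7, -0.6, 10)
```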
- the method of acquiring the causal relationship is not limited to this.
- a phrase pair that satisfies all of the following conditions may be acquired as a causal relationship pair.
- Both phrases consist of one noun and one active / inactive template. For example, a phrase pair such as “(earthquake)” and “(tsunami) occurs”.
- a template pair that co-occurs in one sentence together with a forward conjunction and a noun pair that co-occurs in that sentence are acquired. These are considered to be prototypes of causal pairs.
- a causal relationship strength score is calculated by the following equation (5). All the causal relationship phrase pairs are sorted and output in descending order of the causal relationship strength score.
- the equation (5) may be used for the score calculation in the above-described second embodiment.
- expressions that are considered to describe causal relationships are extracted from descriptions existing on the Internet 40.
- only those that are actually written as sentences on the Internet 40, that is, those that can be regarded as grounded in human expressive activity, are extracted.
- even a single causal relationship can be expressed in a wide variety of linguistic forms. For example, even limited to Japanese, the causal relationships surrounding a certain causal relationship “Obtain US beef → make beef bowl” can be expressed in various forms as shown below.
- suppose that there is a database (causal relation DB 632) describing causal relations, and that the inference system 630 is set to output hypotheses that can be inferred from an input by using the causal relation DB 632.
- This causal relationship DB 632 has a causal relationship 634 of “obtain US beef → make beef bowl”, and this is the only causal relationship regarding US beef included in the causal relationship DB 632.
- suppose instead that the causal relationship DB 672 includes not only the causal relationship 634 but also the other causal relationship group 674 shown in the table 15 above, and that the inference system 670 uses this causal relation DB 672 to infer an answer to a question. Then, when the same question as in the case of FIG. 12A is given, the inference system 670 can use a causal relation in the causal relation DB 672, “US beef is prohibited from being imported → beef can not be eaten”, to obtain an output 676 of “I can no longer eat beef bowl”. It is clear that such a system is much more useful than the inference system 630 shown in FIG.
- the problem is how to obtain the causal relationship group 674 and the like when only the causal relationship 634 is obtained from the Internet 40.
- it is useful to use the template DB 62 described in the first embodiment.
- a system that uses the template DB 62 to output, from causal relationships that can be found on the Internet 40, causal relationship hypotheses that do not exist directly on the Internet 40 can be realized by computer hardware and a program executed by the computer hardware. This system is used together with the template DB 62 shown in FIG.
- a program for realizing such a function includes step 710 of acquiring noun pairs in a causal relationship (causal relationship noun pairs), noun pairs in which one noun is a material of the other (material relationship noun pairs), and noun pairs in which one noun suppresses the other (suppression relationship noun pairs).
- Any of these noun pairs can be obtained from an expression that matches a predetermined pattern using existing techniques. For example, in the case of a causal noun pair, an expression that matches a pattern such as “A causes B” is selected, and the nouns A and B are acquired as a causal noun pair.
- in the case of a material relationship noun pair, the nouns A and B are acquired from an expression that matches a pattern such as “Make B with A”.
- in the case of a suppression relationship noun pair, the nouns A and B are acquired from an expression that matches a pattern such as “A prevents B”. Examples of these are shown in Table 16 below. Note that the noun pairs acquired in this step 710 to generate causal relationship hypotheses are not limited to the above three types of relationship noun pairs, and various other relationships can be considered.
- the program further includes a step 712 that executes the following program portion 714 for all noun pairs obtained in step 710.
- the program portion 714 includes: step 740 of identifying, for each noun in the noun pair being processed, templates that frequently have a dependency relationship with that noun; step 742 of determining whether the nouns of the pair are in a positive relationship with each other and branching the control flow accordingly; step 744, executed when the determination in step 742 is affirmative, of selecting, from among the combinations of the templates identified in step 740, the template pairs that frequently co-occur on the Internet 40 together with a forward conjunction and whose activity / inactivity is the same; and step 746, executed when the determination in step 742 is negative, of selecting the template pairs that frequently co-occur together with a forward conjunction and whose activity / inactivity is opposite to each other. “Frequently” in step 744 and step 746 refers to a case where the appearance frequency on the Internet 40 is greater than a predetermined value in the present embodiment.
- the program portion 714 further includes a step 748 that is executed subsequent to steps 744 and 746 and that executes the program portion 750 described below for all template pairs selected in step 744 or step 746.
- the program part 750 includes step 770 of generating a causal relationship hypothesis by attaching the template pair being processed to the noun pair being processed, and step 772 of adding the causal relationship hypothesis generated in step 770 to the causal relationship hypothesis group and ending the processing.
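Hypothesis generation for one noun pair (steps 740 to 772) can be sketched as follows. The sketch assumes that a positive noun pair is combined with template pairs of matching polarity and a negative noun pair with template pairs of opposite polarity; the lookup tables and the name `generate_hypotheses` are hypothetical stand-ins for the template DB 62 and Internet frequency counts.

```python
def generate_hypotheses(noun_pair, positive, templates_for, polarity,
                        cooccur_freq, min_freq):
    """Attach suitable template pairs to a noun pair."""
    n1, n2 = noun_pair
    hypotheses = []
    for t1 in templates_for[n1]:              # templates found in step 740
        for t2 in templates_for[n2]:
            # "frequently": frequency greater than a predetermined value
            if cooccur_freq.get((t1, t2), 0) <= min_freq:
                continue
            same = polarity[t1] == polarity[t2]
            if same == positive:              # step 744 or step 746
                hypotheses.append(((n1, t1), (n2, t2)))  # steps 770, 772
    return hypotheses


templates_for = {"earthquake": ["X occurs"],
                 "tsunami": ["be hit by X", "prevent X"]}
polarity = {"X occurs": "active", "be hit by X": "active",
            "prevent X": "inactive"}
cooccur_freq = {("X occurs", "be hit by X"): 40,
                ("X occurs", "prevent X"): 25}
hypotheses = generate_hypotheses(("earthquake", "tsunami"), True,
                                 templates_for, polarity, cooccur_freq, 10)
```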
- a large number of causal relationship hypotheses that do not exist on the Internet 40 can be generated based on the noun pairs collected from the Internet 40 using the template DB 62.
- not only the causal relationships obtained from the Internet 40 but also a far greater number of causal relationship hypotheses can be obtained. Therefore, it is possible to cover a wide range of causal relationships as the basis of the inference system, and the inference system can find answers to a wide range of questions.
- the method according to the fourth embodiment generates causal relationship hypotheses from a causal relationship pair acquired directly from the Internet 40 (a pair of phrases in a causal relationship) or a causal relationship hypothesis acquired by the method according to the third embodiment (a kind of causal relationship pair), together with the contradictory expressions (pairs consisting of mutually contradictory phrases) obtained in the first embodiment. Specifically, it is as follows.
- a new causal relationship hypothesis is automatically obtained by replacing each phrase of an acquired causal relationship or causal relationship hypothesis with a phrase that contradicts it.
- a program for realizing this embodiment by computer hardware presupposes that causal relationship pairs have already been obtained, and includes step 810 of executing the following program part 812 for all causal relationship pairs.
- the program part 812 includes a step 840 of executing the following program part 842 for all contradictory phrases for the left-side phrase (phrase representing the cause) in the causal relationship pair to be processed.
- the program part 842 includes a step of replacing the left phrase of the causal relationship pair being processed with a phrase that contradicts it (the contradictory phrase selected as the processing target in step 840), and step 872 of executing the following program portion 874 for all phrases that contradict the right phrase of the causal relationship pair being processed.
- the program part 874 includes step 910 of replacing the right phrase of the causal relationship pair being processed with a phrase that contradicts it, and a step of adding the phrase pair newly obtained by the processing of step 910 to the causal relationship hypothesis group as a new causal relationship hypothesis and ending the program portion 874.
- the left phrase is replaced with a phrase that contradicts the left phrase
- the right phrase is replaced with a phrase that contradicts the right phrase.
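The double replacement can be sketched as follows. This is a minimal sketch in which a dictionary maps each phrase to the phrases known to contradict it; the dictionary contents and the name `contradiction_hypotheses` are hypothetical toy data, not output of the described system.

```python
def contradiction_hypotheses(causal_pairs, contradictions):
    """Replace both sides of each causal pair with contradicting phrases."""
    hypotheses = []
    for cause, effect in causal_pairs:                         # step 810
        for new_cause in contradictions.get(cause, []):        # step 840
            for new_effect in contradictions.get(effect, []):  # step 872
                # step 910: the right phrase is replaced as well, and the
                # new pair is added to the hypothesis group
                hypotheses.append((new_cause, new_effect))
    return hypotheses


causal_pairs = [("obtain US beef", "make beef bowl")]
contradictions = {
    "obtain US beef": ["cannot obtain US beef"],
    "make beef bowl": ["cannot make beef bowl"],
}
new_hypotheses = contradiction_hypotheses(causal_pairs, contradictions)
```

A causal pair produces no hypothesis when either of its phrases has no known contradicting phrase, since both sides must be replaced.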
- the first example is to use a template pair to improve accuracy when acquiring synonyms and implications.
- synonyms and implications are obtained from text using the similarity (distribution similarity) of their appearance contexts. For example, the following examples can be considered.
- the appearance context of “import” is “noun 1”, and the appearance context of “import” is “noun 2”.
- noun 1 and noun 2 often coincide or are synonymous words. This is because the meanings of these two phrases are similar (synonymous).
- the appearance context of “No import” is “noun 3”, and the appearance context of “become difficult to obtain” is “noun 4”.
- the set formed by the noun 3 tends to be a subset of the set formed by the noun 4. Therefore, in the case of the synonymous relationship and the implication relationship, the appearance contexts of both templates are similar and the distribution similarity is high. Synonyms and implications are obtained using these relationships.
- the distribution similarity is high not only when two templates are in a synonym or implication relationship, but often also when they are in a contradictory (opposite) relationship. For example, “importing (noun 1)” and “(noun 2) are prohibited from importing” are contradictory. However, the noun 1 and the noun 2 often contain the same noun, resulting in a high distribution similarity. Therefore, when acquiring synonym / implication expressions, there is a problem that templates that actually have contradictory meanings are selected as synonym or implication expressions.
- Such problems can be solved by using the template DB 62. That is, it is checked whether or not the activity / inactivity of both templates matches for a linguistic expression pair that is extracted by a conventional method and is a candidate for synonym / implication expression. If the activity / inactivity of both is the same, the templates are considered to have the same meaning or implication, while if the activity / inactivity of the two is not the same, it can be determined that these templates are in a contradictory relationship.
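This check can be sketched as follows, assuming the template DB is available as a simple mapping from template to polarity. The mapping, the label strings and the name `classify_candidate` are hypothetical.

```python
def classify_candidate(t1, t2, polarity):
    """Label a distributionally similar template pair by polarity match."""
    p1, p2 = polarity.get(t1), polarity.get(t2)
    if p1 is None or p2 is None:
        return "unknown"                  # a template is not in the DB
    if p1 == p2:
        return "synonym_or_implication"   # activity / inactivity matches
    return "contradictory"                # activity / inactivity differs


polarity = {"import X": "active",
            "X becomes available": "active",
            "ban import of X": "inactive"}
label_match = classify_candidate("import X", "X becomes available", polarity)
label_clash = classify_candidate("import X", "ban import of X", polarity)
```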
- Causal noun pairs across sentence boundaries: most of the conventional techniques for acquiring causal noun pairs extract noun pairs that co-occur in a sentence with a certain pattern. However, with such a technique, only causal relationship pairs described in one sentence can be obtained. In reality, causality is also expressed in ways other than co-occurrence within a single sentence. In particular, many pairs of expressions that are close to each other in a text represent a causal relationship. An example is an expression such as “An earthquake occurred in Tohoku. Many people were hit by a tsunami after that.”
- the causal relation noun pair that appears across sentences as described above can be obtained as follows using the template DB 62.
- causal relationship or causal relationship hypothesis both phrase pairs + form of tangent conjunction
- the method described in Embodiment 2 or 3 may be used.
- template pairs that appear frequently are extracted. For example, it is possible to obtain template pairs such as “attack → attack” and “cause → attack”.
- Phrase pairs and noun pairs with a large amount of causality can be obtained regardless of whether they are written in text or not.
- the most practical use is considered to be the application of the present invention to a question answering system or the like that gives a clever, straightforward and accurate answer to a question written in an arbitrary language.
- the present invention is highly compatible with a system that, in combination with speech recognition technology, lets a user pose a question to a computer and obtain an answer, and with a system that maintains a database storing past cases in a call center or the like.
- a database of causal relations covering a very wide range can be maintained by the system of the embodiment as described above.
- questions related to causal relationships that is, “WHY-type questions” that have been weak in conventional question answering techniques.
- such a response can be easily acquired by using the causal relation database obtained by the embodiment described above.
- the hypothesis generation technique described above makes it possible to present information that does not exist on the Internet 40 as a hypothesis.
- conventional question answering systems have not been able to answer the consequences or causes of unknown events.
- the embodiments described above enable such a technique. For example, if the technique of the embodiment described above is used, then before the price of cemented carbide tools actually rises, it becomes possible to answer a question such as “What can be the cause if the price of the cemented carbide tool rises?” with the hypothesis that if China bans the export of tungsten, the price of cemented carbide tools may rise. Once this is possible, the questioner can also take steps to hedge the risk.
- a case can be considered in which a complaint about a trouble that contradicts the past case of a certain product is sent from the customer to the call center.
- a call center retrieves information from a database using keywords and obtains an appropriate answer.
- the system can recognize at least that a new complaint is inconsistent with past cases, and can notify the operator to that effect. With this result, the operator can respond much more accurately than with no information at all. For example, because the new complaint is inconsistent with past cases, the operator can point out to the customer that the customer may have misidentified the trouble, or that the trouble may be a previously unknown one. As a result, the problem can be resolved more smoothly than before.
- the causal relationship can be automatically recognized without relying on a clue expression such as “tame” (ため, “because”) or “node” (ので, “since”).
- Information equivalent to a dictionary can also be acquired automatically. Therefore, its application range is dramatically widened.
- verbs are classified into three types: active / inactive / neutral.
- such a classification into active / inactive / neutral has not been considered in the past. In particular, no classification corresponding to inactive predicates has been proposed.
- the combination of active and inactive predicates (templates) is an important factor, and the two classes are useful together. Therefore, conventional technology that lacks the active / inactive classification, and further the active / inactive / neutral classification, cannot achieve the effects of the above embodiment in automatic recognition of causality and of contradiction.
- the above embodiment acquires more than causal relationships between words, such as the relationship between “cholesterol” and “cerebral infarction” handled by the conventional technology.
- as for semantic relationships between words, the above embodiment can acquire relationships that cannot be acquired by the conventional method.
- the above embodiment makes the hypothesis generation technique regarding the semantic relationship between words more powerful than the conventional technique.
- verbs such as “cause” and “prevent” are classified under different labels (active / inactive) in advance, and the embodiment imposes the restriction that verbs classified under different labels are not recognized as synonymous. This restriction prevents antonyms from being recognized as synonyms or entailments merely because of their high distributional similarity. Therefore, the above embodiment can improve the accuracy of recognizing synonyms and entailments.
- FIG. 15 shows the external appearance of the computer system 930.
- FIG. 16 shows the internal configuration of the computer system 930.
- this computer system 930 includes a computer 940 having a memory port 952 and a DVD (Digital Versatile Disc) drive 950, a keyboard 946, a mouse 948, and a monitor 942.
- in addition to the memory port 952 and the DVD drive 950, the computer 940 includes a CPU (central processing unit) 956, a bus 966 connected to the CPU 956, the memory port 952, and the DVD drive 950, a read only memory (ROM) 958 that stores a boot-up program and the like, and a random access memory (RAM) 960 that is connected to the bus 966 and stores program instructions, system programs, work data, and the like.
- the computer system 930 further includes a network interface (I/F) 944 that provides a connection to a network that enables communication with other terminals.
- a computer program for causing the computer system 930 to function as each functional unit of the system according to each of the above-described embodiments is stored in the DVD 962 or the removable memory 964 attached to the DVD drive 950 or the memory port 952, and is further transferred to the hard disk 954.
- the program may be transmitted to the computer 940 through a network (not shown) and stored in the hard disk 954.
- the program is loaded into the RAM 960 when executed.
- the program may be loaded directly into the RAM 960 from the DVD 962, from the removable memory 964, or via a network.
- This program includes a plurality of instructions for causing the computer 940 to function as each functional unit of the system according to the above embodiment. Some of the basic functions required for this operation are provided by the operating system (OS) or third-party programs running on the computer 940, or by modules of various programming toolkits installed on the computer 940. Therefore, this program does not necessarily need to include all the functions necessary for realizing the system and method of this embodiment.
- This program need only include those instructions that realize the functions of the system described above by calling appropriate functions or appropriate program tools in a programming toolkit in a controlled manner so as to obtain the desired result.
- the operation of computer system 930 is well known. Therefore, it is not repeated here.
- the present invention can be used in methods and apparatuses that use natural language processing, and in particular in industries that manufacture, use, and lease a predicate template collecting device or the like capable of automatically and accurately recognizing the predicate templates that constitute phrases, and phrase pairs in a specific relationship.
- 30 contradictory expression collection system; 32 seed template storage device; 34 conjunction storage unit; 36 contradictory expression collection device; 38 contradictory expression storage device; 40 Internet; 60 template DB construction device; 62 template DB; 64 contradictory expression acquisition unit; 90 template pair generation unit; 92 template pair storage unit; 94 noun pair collection unit; 96 noun pair storage unit; 98 noun pair polarity determination unit; 100 template pair collection unit; 102 template pair storage unit; 104 template activity match determination unit; 106 template network construction unit; 108 synonym/entailment relation dictionary; 110 template network storage unit; 112 template activity value calculation unit; 114 high-activity template extraction unit; 116 termination determination unit; 118 seed template update unit; 140 template network; 630, 670 inference system; 632, 672 causal relation DB; 634 causal relation; 674 causal relation group
Abstract
Description
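As a technique for acquiring new causal relations by machine learning from a large number of manually prepared causal relation examples, there is the technique described in Non-Patent Document 1. As an example for Japanese, there is a technique that automatically recognizes relations between phrases using, as a clue, the appearance in text of conjunctions that explicitly indicate a causal relation, such as "tame" (ため, "because") and "node" (ので, "since") (Non-Patent Document 2).
There are also techniques that use manually constructed dictionaries such as WordNet (Non-Patent Document 3).
For units consisting of a combination of a noun and a verb, there is research that classifies verbs according to whether the verb describes an event in the direction of exerting or enhancing, for example, the function or effect of the object denoted by the noun, and research that automatically acquires verbs having such properties (for example, Non-Patent Documents 4 and 5).
There is a technique for generating hypotheses about a specific semantic relation between words, for example a causal relation (Non-Patent Document 6). For example, suppose a database holds the facts that there is a causal relation between "cholesterol" and "arteriosclerosis" and between "arteriosclerosis" and "cerebral infarction". The technique combines these causal relations to infer the new hypothesis that "cholesterol" causes "cerebral infarction".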
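The chaining step described above can be illustrated with a minimal Python sketch. It is only an illustration of combining two stored causal relations into a new hypothesis; the function name `chain_hypotheses` and the data layout (a list of cause/effect tuples) are assumptions made here and are not taken from Non-Patent Document 6.

```python
from collections import defaultdict

def chain_hypotheses(known_pairs):
    """Return (cause, effect) pairs implied by one chaining step but not already stored.

    known_pairs: iterable of (cause, effect) word pairs (hypothetical layout).
    """
    effects_of = defaultdict(set)
    for cause, effect in known_pairs:
        effects_of[cause].add(effect)

    known = set(known_pairs)
    hypotheses = set()
    for cause, intermediates in effects_of.items():
        for intermediate in intermediates:
            # .get() avoids inserting new keys while iterating over the dict
            for effect in effects_of.get(intermediate, ()):
                if effect != cause and (cause, effect) not in known:
                    hypotheses.add((cause, effect))
    return hypotheses

known = [
    ("cholesterol", "arteriosclerosis"),
    ("arteriosclerosis", "cerebral infarction"),
]
print(chain_hypotheses(known))
```

Only one chaining step is performed; longer chains would need repeated application, and nothing here ranks how plausible each hypothesis is.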
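Conventionally, for recognizing synonymy and entailment between words such as verbs, or between patterns such as "A causes B", there is a technique that obtains the probability distribution of the other words appearing around the word, or of the words appearing in the positions occupied by variables such as A and B in the pattern, and uses the statistical similarity between those distributions (called "distributional similarity") (Non-Patent Document 7). For example, the pattern "A causes B" and the pattern "A is the cause of B" are regarded as nearly synonymous. The technique recognizes this by obtaining the appearance probabilities of the nouns that fill the A and B positions, such as "dioxin" and "cancer", and measuring the similarity between those probabilities.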
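A minimal sketch of distributional similarity follows, assuming cosine similarity over raw counts of the noun pairs that fill the A and B slots. The choice of cosine similarity, the function name `cosine`, and the toy counts are assumptions for illustration; the cited technique may use a different similarity measure.

```python
import math
from collections import Counter

def cosine(p, q):
    """Cosine similarity between two sparse count vectors (Counter objects)."""
    dot = sum(p[key] * q[key] for key in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

# Toy counts of (A, B) noun pairs observed in each pattern (made-up numbers).
causes = Counter({("dioxin", "cancer"): 5, ("smoking", "cancer"): 3})
is_cause_of = Counter({("dioxin", "cancer"): 4, ("smoking", "cancer"): 2, ("stress", "insomnia"): 1})

print(round(cosine(causes, is_cause_of), 3))
```

A high value suggests that the two patterns take similar slot fillers, which is the signal this family of techniques treats as evidence of synonymy or entailment.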
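Below, the problems of the prior art are organized under five headings: (A) automatic recognition of causal relations between phrases, (B) automatic recognition of contradictions between phrases, (C) automatic classification of predicate templates (a pair of a particle and a verb, e.g. <を, 食べる> (<wo, taberu>, "eat")), (D) language-based automatic hypothesis generation, and (E) automatic recognition of synonymy and entailment between phrases.
One logical relation between phrases is the causal relation. The prior art recognizes causal relations using as its information source either conjunctions that explicitly indicate a causal relation, such as "tame" (ため) and "node" (ので), or manually constructed dictionaries. However, conjunctions such as "tame" usually do not appear very often in text, and manually created dictionaries cover few words, so few causal relations can be acquired. It is therefore desirable to widen the range of application.
Another example of a logical relation between phrases is contradiction. Contradiction here means that the two phrases have opposite meanings. The prior art on this point depends on manually constructed dictionaries. Such dictionaries list few words and cannot handle a wide range of expressions.
The prior art focuses only on whether a verb describes an event in the direction of exerting or enhancing, for example, the function or effect of the object denoted by a noun. However, focusing on this property of verbs alone is not sufficient for recognizing contradictions and causal relations.
With the techniques disclosed in the prior art, causal relations between words can be extracted, but causal relations cannot be extracted, and hypotheses about causal relations cannot be generated, over broader units.
The prior art obtains the probability distribution of the other words appearing around a word, phrase, or pattern, and then computes the similarity between those distributions. The resulting information is used to recognize synonymy or entailment between words, phrases, and patterns. These techniques, however, have the drawback that they are likely to recognize antonymous expressions as synonymous. Comparing the pattern "A causes B" with the pattern "A prevents B" makes this clear. "Cause" and "prevent" have exactly opposite meanings, yet the probability distributions of the words appearing around them show high similarity. This is because ambiguous nouns often appear frequently in both patterns, as in "meals cause lifestyle-related diseases" and "meals prevent disease". Because such usages are frequent, the probability distributions of the nouns appearing in patterns containing "cause" and "prevent" become similar. It is therefore necessary to recognize synonymy and entailment with higher accuracy as well.
《Templates》
As already described, a combination of one noun with one verb, adjective, or adjectival noun, linked through a particle, is called a "predicate template". In the present embodiment, each predicate template is classified as active, inactive, or neutral.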
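As a rough illustration of this definition, a predicate template could be represented as a small record holding the particle, the predicate, and a signed activity value. The class name, field names, and the convention that an activity value of 0.0 means "neutral" are all assumptions made for this sketch, not details given in the specification.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PredicateTemplate:
    particle: str           # e.g. "を" (wo)
    predicate: str          # e.g. "防ぐ" (fusegu, "prevent")
    activity: float = 0.0   # signed activity value; 0.0 is assumed to mean neutral

    @property
    def label(self) -> str:
        """Classification label derived from the sign of the activity value."""
        if self.activity > 0:
            return "active"
        if self.activity < 0:
            return "inactive"
        return "neutral"

    def phrase(self, noun: str) -> str:
        """Combine the template with a noun to form a phrase."""
        return f"{noun}{self.particle}{self.predicate}"

prevent = PredicateTemplate("を", "防ぐ", -0.7)
print(prevent.label, prevent.phrase("病気"))
```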
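The following first describes the classification labels for predicate templates, then the linguistic properties of the classification, and then the method for acquiring the classification automatically. Finally, applications of the classified predicate templates are described.
In the present embodiment, the predicate templates appearing in the text input to the overall system are classified into the three types shown in Table 1 below.
(2) Assignment of polarity to predicate templates
The polarity assigned to each predicate template is computed automatically. As clues for polarity assignment, consider the following linguistic properties and constraints. First, consider pairs of nouns, and introduce the notion of a causal relation between the nouns forming a pair.
Note that predicate templates in a synonymy/entailment relation have the same polarity. For example, "prescribe (a vaccine)" and "inject (a vaccine)" both have positive polarity, and "(an earthquake) occurs" and "(an earthquake) happens" likewise both have positive polarity. Their activity values, however, are not necessarily identical.
When a pair of predicate templates whose activity values have a positive product and large absolute values co-occurs with nouns having a positive causal relation and is connected by a resultative (and/thus-type) conjunction, the pair is likely to express a causal relation. Likewise, when a pair of predicate templates whose activity values have a negative product and large absolute values co-occurs with nouns having a negative causal relation and is connected by a resultative conjunction, it is also likely to express a causal relation. Using this property, expressions in text that express a causal relation and contain two predicate templates and a noun pair with a positive or negative causal relation can be recognized and acquired automatically.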
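The condition above can be sketched as a simple predicate. This is a minimal illustration, not the embodiment's scoring method: the function name, the encoding of the noun-pair relation as +1/-1, the string labels for conjunction types, and the threshold `min_abs` standing in for "large absolute value" are all assumptions.

```python
def is_causal_candidate(activity_a, activity_b, noun_relation, conjunction_type, min_abs=0.5):
    """Return True when a template pair is a likely causal expression.

    activity_a, activity_b: signed activity values of the two predicate templates
    noun_relation: +1 for a positive causal relation between the nouns, -1 for negative
    conjunction_type: "resultative" or "adversative" (hypothetical labels)
    min_abs: hypothetical threshold for "large absolute value"
    """
    if conjunction_type != "resultative":
        return False
    if min(abs(activity_a), abs(activity_b)) < min_abs:
        return False
    product = activity_a * activity_b
    if product == 0:
        return False
    product_sign = 1 if product > 0 else -1
    # The sign of the activity product must agree with the noun-pair relation.
    return product_sign == noun_relation

# "(earthquake) occurs" / "(tsunami) hits", with a positive relation between the nouns
print(is_causal_candidate(0.9, 0.8, +1, "resultative"))
```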
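The conjunction "~te" (~て) does not necessarily express a causal relation. There are countless expressions in which "~te" expresses something other than causation. For example, in "take a bath and have a meal" (風呂に入って、食事をする), no causal relation is normally recognized between the bath and the meal. The present embodiment successfully excludes such non-causal phrase pairs and acquires only causal relations with high accuracy.
Predicate templates of opposite polarity are likely to contradict each other when they appear attached to nouns of the same kind. Using this property, contradictory phrase pairs can be acquired automatically. Specifically, pairs of predicate templates that have opposite polarities and a high probability of appearing with a common noun are taken, and phrase pairs are collected by embedding the common noun in them. These phrase pairs can be acquired automatically as mutually contradictory phrase pairs. Examples of contradictory expressions are given below.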
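The collection step just described can be sketched as follows. The sketch approximates "a high probability of appearing with a common noun" by a minimum count of shared nouns, which is an assumption made here; the function name, the `min_shared` parameter, the data layout, and the toy templates are likewise hypothetical.

```python
from itertools import combinations

def contradiction_pairs(templates, min_shared=2):
    """Pair up opposite-polarity templates that share nouns and fill in each shared noun.

    templates: dict mapping a template string with a "{}" noun slot to
               (activity_value, set_of_nouns_seen_with_the_template)
    min_shared: hypothetical stand-in for "high probability of sharing nouns"
    """
    pairs = []
    for (text_a, (act_a, nouns_a)), (text_b, (act_b, nouns_b)) in combinations(sorted(templates.items()), 2):
        shared = nouns_a & nouns_b
        if act_a * act_b < 0 and len(shared) >= min_shared:
            for noun in sorted(shared):
                pairs.append((text_a.format(noun), text_b.format(noun)))
    return pairs

templates = {
    "cause {}": (0.8, {"disease", "accident"}),
    "prevent {}": (-0.7, {"disease", "accident", "fire"}),
    "trigger {}": (0.6, {"disease", "accident"}),
}
for pair in contradiction_pairs(templates):
    print(pair)
```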
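By combining the techniques of (A) automatic recognition of causal relations between phrases and (B) automatic recognition of contradictory information described above, causal relations that are not explicitly stated in the source text can be acquired automatically. An outline of the method is as follows.
Conventional techniques for automatically acquiring synonymy and entailment recognize them from the similarity of the distributions of appearance probabilities of the words around the expression of interest. As already noted, however, a word B that is an antonym of a word A was often misrecognized as being synonymous with, or entailed by, word A. This is because antonyms often appear in similar contexts. In contrast, the present embodiment can automatically compute, for example, that the activity value of "causes" (を引き起こす) is positive and that of "prevents" (を防ぐ) is negative. With this information, among the synonym candidates extracted by the conventional technique, true synonyms can be distinguished by checking whether the polarities of the predicate templates differ. As a result, using the technique of the present embodiment improves the accuracy of automatic acquisition of synonymy and entailment between words.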
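The polarity check described above can be sketched as a filter over synonym candidates produced by some distributional method. The function name, the tuple layout, and the decision to keep candidates whose activity value is unknown (treated as 0.0) are assumptions for illustration only.

```python
def filter_synonym_candidates(candidates, activity):
    """Drop candidate synonym pairs whose templates have opposite polarity.

    candidates: list of (template_a, template_b, similarity) tuples
    activity: dict mapping template -> signed activity value;
              templates missing from the dict are treated as 0.0 (assumption)
    """
    kept = []
    for first, second, similarity in candidates:
        product = activity.get(first, 0.0) * activity.get(second, 0.0)
        if product < 0:
            continue  # opposite polarity: likely antonyms, not synonyms
        kept.append((first, second, similarity))
    return kept

candidates = [
    ("A causes B", "A is the cause of B", 0.9),
    ("A causes B", "A prevents B", 0.8),
]
activity = {"A causes B": 0.8, "A is the cause of B": 0.7, "A prevents B": -0.9}
print(filter_synonym_candidates(candidates, activity))
```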
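The techniques (A) and (C) above make it possible to acquire a large number of causal relations. Building a database of them yields the pairs of predicate templates that appear frequently in expressions of causal relations. Suppose, for example, that the predicate template pair "occurred" (が起きた) and "was hit by" (に襲われた) appears many times in the causal relations in the database. Even when such predicate templates appear across sentences in text (in separate sentences), they are likely to express a causal relation if the "distance" between them, such as the number of sentences, words, or characters, is small. Consider, for example, the case where the expressions "an earthquake occurred" and "was hit by a tsunami" appear in two separate sentences, as in "Yesterday, an earthquake occurred. There are reports of areas being hit by a tsunami." Even in this case there is a causal relation between the events described by the two phrases "an earthquake occurred" and "was hit by a tsunami". There is also a causal relation between the nouns appearing there, namely "earthquake" and "tsunami". Using this property, causal relations stated across multiple sentences can be acquired automatically, both between phrases and between words.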
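A minimal sketch of the cross-sentence idea follows. It uses the number of sentences as the "distance" and plain substring matching as a stand-in for real template matching; both choices, along with the function name and the `max_gap` parameter, are assumptions made for this illustration.

```python
def cross_sentence_causal(sentences, causal_template_pairs, max_gap=1):
    """Find frequent causal template pairs whose two halves occur in nearby sentences.

    sentences: list of sentence strings, in document order
    causal_template_pairs: list of (cause_template, effect_template) strings
    max_gap: largest allowed sentence distance between the two templates
    Returns (cause_sentence_index, effect_sentence_index, cause_template, effect_template).
    """
    found = []
    for i, cause_sentence in enumerate(sentences):
        last = min(i + max_gap, len(sentences) - 1)
        for j in range(i, last + 1):
            effect_sentence = sentences[j]
            for cause_t, effect_t in causal_template_pairs:
                # Substring matching is a simplification of template matching.
                if cause_t in cause_sentence and effect_t in effect_sentence:
                    found.append((i, j, cause_t, effect_t))
    return found

sentences = [
    "Yesterday, an earthquake occurred.",
    "There are reports of areas being hit by a tsunami.",
]
print(cross_sentence_causal(sentences, [("occurred", "hit by")]))
```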
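[Configuration]
Referring to FIG. 1, a contradictory expression collection system 30 according to the first embodiment of the invention includes: a seed template storage device 32 for storing those of the predicate templates described above that serve as the core when building the template network (called "seed templates"); a conjunction storage unit 34 that stores the resultative (and/thus-type) and adversative (but-type) conjunctions that link predicate templates; a contradictory expression collection device 36 for collecting, from the seed template storage device 32, the conjunction storage unit 34, and a corpus on the Internet 40, a large number of phrase pairs each consisting of two phrases linked by a conjunction stored in the conjunction storage unit 34, and acquiring from them mutually contradictory (conflicting) expressions; and a contradictory expression storage device 38 for storing the contradictory expressions collected by the contradictory expression collection device 36.
(2) If the two predicate templates have the same polarity and are connected by an adversative conjunction, the relation of the noun pair co-occurring with them is negative.
(3) If the two predicate templates have opposite polarities and are connected by a resultative conjunction, the relation of the noun pair co-occurring with them is negative.
(4) If the two predicate templates have opposite polarities and are connected by an adversative conjunction, the relation of the noun pair co-occurring with them is positive.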
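The rules above reduce to a single comparison, sketched below. Rule (1) is not shown in this excerpt; the sketch assumes, by symmetry with rules (2) to (4), that templates of the same polarity joined by a resultative conjunction give a positive noun-pair relation. The function name and string labels are hypothetical.

```python
def noun_pair_relation(polarity_a, polarity_b, conjunction_type):
    """Classify the relation of a co-occurring noun pair as "positive" or "negative".

    polarity_a, polarity_b: "active" or "inactive" for the two predicate templates
    conjunction_type: "resultative" or "adversative"
    """
    if conjunction_type not in ("resultative", "adversative"):
        raise ValueError(f"unknown conjunction type: {conjunction_type}")
    same_polarity = polarity_a == polarity_b
    resultative = conjunction_type == "resultative"
    # Positive exactly when "same polarity" and "resultative" agree.
    return "positive" if same_polarity == resultative else "negative"

print(noun_pair_relation("active", "inactive", "adversative"))
```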
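The template DB construction device 60 further includes: a template pair collection unit 100, connected to the noun pair storage unit 96, for collecting from the Internet 40, for each noun pair tagged with a relation tag by the noun pair polarity determination unit 98, the template pairs that co-occur with it; a template pair storage unit 102 for storing the template pairs collected by the template pair collection unit 100 in association with the noun pairs with which they co-occurred; and a template activity match determination unit 104 for determining, for each template pair stored in the template pair storage unit 102, whether the templates forming the pair have the same activity/inactivity (whether they match), based on the relation (positive/negative) of the noun pair co-occurring with the template pair and on whether the conjunction linking the templates is resultative or adversative, and for attaching the result to each template pair as a tag.
(2) Template pairs that co-occur with a noun pair whose relation is positive and are connected by an adversative conjunction have opposite activity.
(3) Template pairs that co-occur with a noun pair whose relation is negative and are connected by a resultative conjunction have opposite activity.
(4) Template pairs that co-occur with a noun pair whose relation is negative and are connected by an adversative conjunction have the same activity.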
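The match determination can be sketched as tagging each collected template pair. As before, rule (1) is not shown in this excerpt, so the sketch assumes that a positive noun-pair relation with a resultative conjunction means the templates have the same activity. The function names, labels, and record layout are hypothetical.

```python
def templates_match(noun_relation, conjunction_type):
    """Return True when the two templates are judged to have the same activity."""
    if noun_relation not in ("positive", "negative"):
        raise ValueError(f"unknown noun relation: {noun_relation}")
    if conjunction_type not in ("resultative", "adversative"):
        raise ValueError(f"unknown conjunction type: {conjunction_type}")
    return (noun_relation == "positive") == (conjunction_type == "resultative")

def tag_template_pairs(records):
    """Attach a "same"/"opposite" tag to each (template_a, template_b, relation, conjunction) record."""
    return [
        (a, b, "same" if templates_match(relation, conjunction) else "opposite")
        for a, b, relation, conjunction in records
    ]

records = [
    ("cause {}", "generate {}", "positive", "resultative"),
    ("cause {}", "prevent {}", "negative", "resultative"),
]
print(tag_template_pairs(records))
```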
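The template DB construction device 60 further includes: a template network construction unit 106 for building a network among templates based on the template pairs stored in the template pair storage unit 102 and their match determination results; and a synonym/entailment relation dictionary 108 that the template network construction unit 106 uses to add links between templates when building the network. In this specification this network is called the "template network".
The contradictory expression collection system 30 according to the first embodiment operates as follows. Referring to FIG. 1, a small number of seed templates are stored in advance in the seed template storage device 32. Whether each seed template is active has also been determined in advance, and each template is tagged accordingly. The conjunction storage unit 34 stores Japanese resultative and adversative conjunctions. Each of these is likewise given, in advance, information indicating whether it is resultative or adversative.
Within the contradictory expression collection system 30 of the first embodiment, the template DB construction device 60 produces the template DB 62. The template DB 62 can be used not only for acquiring contradictory expressions as in the first embodiment but also for a variety of other processing. The second embodiment is an example that uses the template DB 62 to acquire causal relations. It can be realized by replacing the contradictory expression acquisition unit 64 of FIG. 1 with a processing unit for acquiring causal relation expressions from the Internet 40. Such a processing unit can be implemented as a computer program.
The second embodiment extracts, from statements existing on the Internet 40, those considered to describe a causal relation. However, there are countless things in the world that can be regarded as causal relations. The method of the second embodiment extracts only those that are actually written as sentences on the Internet 40, that is, those that can be regarded as grounded in human expressive activity. Moreover, even a single causal relation can be expressed in a wide variety of wordings. For example, restricting attention to Japanese, the causal relations surrounding the causal relation "obtain US beef → make gyudon (beef bowl)" can be expressed in diverse forms, as the following examples show.
Various methods other than that of the third embodiment are conceivable for generating causal relation hypotheses. The method of the fourth embodiment generates causal relation hypotheses from causal relation pairs (pairs of phrases in a causal relation) acquired directly from the Internet 40, or from causal relation hypotheses (a kind of causal relation pair) acquired by the method of the third embodiment, together with the contradictory expressions (pairs of mutually contradictory phrases) obtained in the first embodiment. Specifically, it proceeds as follows. As a premise for the processing below, it is assumed that the contradictory expression "beef is banned from import" has been obtained in advance for the phrase "import beef", and that the contradictory expression "cannot eat gyudon" has been obtained in advance for the phrase "eat gyudon".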
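The detailed steps are not reproduced in this excerpt, so the following is only a sketch under one assumption: that a new hypothesis is formed by replacing both phrases of a known causal pair with their contradictory expressions. The causal pair ("import beef", "eat gyudon") used as input, the function name, and the dictionary layout are hypothetical.

```python
def hypotheses_from_contradictions(causal_pairs, contradiction_of):
    """Generate hypothesis pairs by swapping both phrases for their contradictions.

    causal_pairs: list of (cause_phrase, effect_phrase)
    contradiction_of: dict mapping a phrase to a phrase that contradicts it
    Pairs for which either phrase has no known contradiction are skipped.
    """
    hypotheses = []
    for cause, effect in causal_pairs:
        if cause in contradiction_of and effect in contradiction_of:
            hypotheses.append((contradiction_of[cause], contradiction_of[effect]))
    return hypotheses

causal_pairs = [("import beef", "eat gyudon")]  # assumed input pair
contradiction_of = {
    "import beef": "beef is banned from import",
    "eat gyudon": "cannot eat gyudon",
}
print(hypotheses_from_contradictions(causal_pairs, contradiction_of))
```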
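《Improving the accuracy of synonym and entailment expressions》
Each of the above embodiments ultimately obtains phrase pairs of some form. However, the template pairs obtained by the present invention are not usable only in such embodiments. Various other uses are conceivable.
Most conventional methods for acquiring causal noun pairs extract pairs of nouns that co-occur within one sentence in a certain pattern. Such methods can acquire only causal pairs described within a single sentence. In reality, causal relations can also be expressed by expressions that do not co-occur within one sentence. In particular, many pairs of expressions located close to each other in text express a causal relation. An example is "An earthquake occurred in Tohoku. Afterwards, many people were hit by the tsunami."
As described above, the embodiments of the present invention provide the following effects.
The system according to the above embodiments can be realized by computer hardware and a computer program executed on that hardware. FIG. 15 shows the external appearance of this computer system 930, and FIG. 16 shows the internal configuration of the computer system 930.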
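32 Seed template storage device
34 Conjunction storage unit
36 Contradictory expression collection device
38 Contradictory expression storage device
40 Internet
60 Template DB construction device
62 Template DB
64 Contradictory expression acquisition unit
90 Template pair generation unit
92 Template pair storage unit
94 Noun pair collection unit
96 Noun pair storage unit
98 Noun pair polarity determination unit
100 Template pair collection unit
102 Template pair storage unit
104 Template activity match determination unit
106 Template network construction unit
108 Synonym/entailment relation dictionary
110 Template network storage unit
112 Template activity value calculation unit
114 High-activity template extraction unit
116 Termination determination unit
118 Seed template update unit
140 Template network
630, 670 Inference system
632, 672 Causal relation DB
634 Causal relation
674 Causal relation group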
Claims (12)
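- A predicate template collecting device for collecting predicate templates from a given set of sentences, wherein
a predicate template combines with a noun to form a phrase,
a predicate template can be assigned an activity value representing the direction and magnitude of its activity according to a classification into active, inactive, and neutral,
active indicates that the predicate template describes an event in the direction of exerting the function or effect of the object denoted by the noun combined with the predicate template,
inactive indicates that the predicate template describes an event in the direction of not exerting the function or effect of the object denoted by the noun combined with the predicate template,
neutral indicates that the predicate template is neither active nor inactive, and
the distinction between active and inactive for a predicate template is called polarity,
the predicate template collecting device comprising:
a conjunction storage unit that stores conjunctions classified as resultative or adversative; and
a seed template storage unit for storing seed templates that serve as starting points for building a template network,
each of the seed templates being assigned a polarity and an activity value,
the predicate template collecting device further comprising noun pair collecting means for collecting, from a given corpus, noun pairs satisfying a certain relation, and classifying the polarity of the relation between the nouns forming each noun pair as positive or negative,
the polarity of the relation between the nouns forming a noun pair being defined as positive when the object denoted by one noun of the pair promotes the appearance of the object denoted by the other, and as negative when it suppresses that appearance,
the predicate template collecting device further comprising:
predicate template pair collecting means for collecting, from a given corpus, predicate template pairs that co-occur with the noun pairs collected by the noun pair collecting means, and determining, for each collected predicate template pair, whether the activity/inactivity of the templates in the pair is the same or opposite, based on the polarity of the relation of the noun pair co-occurring with the predicate template pair and on the conjunction linking the predicate template pair;
construction means for building a template network, in which each predicate template is a node and the relation between the predicate templates forming a predicate template pair is a link, by relating predicate templates to one another using the predicate template pairs collected by the predicate template pair collecting means and the determination result, for each predicate template pair, of whether the activity/inactivity is the same; and
activity value calculating means for calculating the activity value to be assigned to each node, starting from the activity values assigned in advance to the nodes corresponding to seed templates in the template network and using the relations between nodes in the template network, and for assigning the calculated activity value to the predicate template corresponding to each node and outputting it.
- The predicate template collecting device according to claim 1, wherein the noun pair collecting means includes means for collecting, from a given corpus, noun pairs that co-occur with predicate template pairs, using the conjunctions stored in the conjunction storage unit and the seed templates stored in the seed template storage unit, and classifying the polarity of the relation between the nouns forming each noun pair as positive or negative.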
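- The predicate template collecting device according to claim 2, wherein the means for classifying includes means for collecting from the corpus, using the conjunctions stored in the conjunction storage unit and the seed templates stored in the seed template storage unit, noun pairs that co-occur with predicate template pairs and that appear in the corpus with at least a predetermined frequency, and classifying the polarity of the relation between the nouns forming each noun pair as positive or negative.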
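- The predicate template collecting device according to claim 2, wherein the means for classifying includes:
means for collecting from the corpus noun pairs that co-occur with predicate template pairs, using the conjunctions stored in the conjunction storage unit and the seed templates stored in the seed template storage unit; and
polarity determining means for determining, for each of the combinations of noun pairs collected by the means for collecting, the polarity of the relation between the nouns forming the combination, based on the polarities of the predicate template pair co-occurring with each noun pair and on the type of the conjunction linking the phrase pair formed by the noun pair and the predicate templates.
- The predicate template collecting device according to claim 4, wherein the means for collecting includes means for collecting from the corpus, using the conjunctions stored in the conjunction storage unit and the seed templates stored in the seed template storage unit, noun pairs that co-occur with predicate template pairs in the corpus at a frequency equal to or higher than a predetermined frequency.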
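- The predicate template collecting device according to claim 4, wherein the polarity determining means includes:
means for determining, for each of the noun pairs collected by the means for collecting, the polarity of the relation between the nouns forming the noun pair, based on the polarities of the predicate template pair co-occurring with the noun pair and on the type of the conjunction linking the phrase pair formed by the noun pair and the predicate templates; and
means for tallying, for each type of noun pair, the polarities determined for the noun pairs by the means for determining, and determining the polarity for each type of noun pair by majority vote.
- The predicate template collecting device according to claim 1 or claim 2, further comprising:
determination means for determining, in response to completion of the output of predicate templates by the activity value calculating means, whether a termination condition for the process of calculating activity values of predicate templates is satisfied;
update means for, in response to a determination by the determination means that the termination condition is not satisfied, selecting new seed templates consisting of those predicate templates calculated by the activity value calculating means whose activity values have an absolute value equal to or greater than a threshold, and updating the contents of the seed template storage unit with the newly selected seed templates; and
means for causing, in response to the update by the update means, the processing by the noun pair collecting means, the predicate template pair collecting means, the construction means, and the activity value calculating means to be executed again.
- The predicate template collecting device according to claim 7, wherein the construction means includes:
means for adding a node corresponding to a predicate template when no node corresponding to that predicate template, which forms part of a predicate template pair collected by the predicate template pair collecting means, exists in the template network; and
link means for generating a link between the predicate templates forming a predicate template pair collected by the predicate template pair collecting means,
the link means assigns to each link an attribute indicating match or mismatch of activity according to whether the predicate templates connected by the link have the same activity,
the construction means further includes weight assigning means for assigning to each link generated by the link means a weight that is a function of the number of links to other nodes, and
the weight assigned by the weight assigning means differs in sign between when the attribute of the link has the value indicating match and when it has the value indicating mismatch.
- A specific phrase pair collecting device comprising:
the predicate template collecting device according to any one of claims 1 to 9;
predicate template storage means for storing the predicate templates collected by the predicate template collecting device;
phrase pair collecting means for collecting, from a given corpus, phrase pairs each containing a predicate template pair consisting of a specific combination of active/inactive predicate templates, among the predicate templates stored in the predicate template storage means, and a specific type of conjunction; and
phrase selecting means for selecting phrase pairs expressing a predetermined relation by extracting, from the phrase pairs collected by the phrase pair collecting means, those in which the noun pair co-occurring with the predicate templates and the polarities of the predicate templates in the phrase pair form a specific combination.
- The specific phrase pair collecting device according to claim 11, further comprising:
score calculating means for calculating, for each of the phrase pairs selected by the phrase selecting means, a score representing the strength of the predetermined relation as a function of the activity values of the predicate templates forming the phrase pair and of the co-occurrence relation, in the corpus, of the noun pair contained in the phrase pair; and
means for sorting the phrase pairs selected by the phrase selecting means in order of the score calculated by the score calculating means.
- A computer program executable by a computer, the program causing the computer to function as all of the means according to any one of claims 1 to 11.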
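The majority-vote step recited in the claims (tallying the polarities determined for individual occurrences of a noun pair and deciding one polarity per noun pair type) can be sketched as follows. The function name, the +1/-1 encoding, and the choice to leave tied noun pairs undecided are assumptions made for this illustration.

```python
from collections import defaultdict

def majority_polarity(observations):
    """Decide one relation polarity per noun pair by majority vote.

    observations: iterable of (noun_pair, polarity) where polarity is +1 (positive)
                  or -1 (negative), one entry per observed occurrence.
    Noun pairs whose votes tie are omitted from the result (assumption).
    """
    totals = defaultdict(int)
    for noun_pair, polarity in observations:
        totals[noun_pair] += polarity
    return {
        noun_pair: ("positive" if total > 0 else "negative")
        for noun_pair, total in totals.items()
        if total != 0
    }

observations = [
    (("earthquake", "tsunami"), +1),
    (("earthquake", "tsunami"), +1),
    (("earthquake", "tsunami"), -1),
    (("vaccine", "influenza"), -1),
    (("meal", "disease"), +1),
    (("meal", "disease"), -1),
]
print(majority_polarity(observations))
```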
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP13754814.5A EP2821923B1 (en) | 2012-02-27 | 2013-01-23 | Predicate template gathering device, specified phrase pair gathering device and computer program for said devices |
KR1020147023682A KR101972408B1 (ko) | 2012-02-27 | 2013-01-23 | Predicate template collecting device, specific phrase pair collecting device, and computer program therefor |
US14/377,988 US9582487B2 (en) | 2012-02-27 | 2013-01-23 | Predicate template collecting device, specific phrase pair collecting device and computer program therefor |
CN201380011077.2A CN104137097B (zh) | 2012-02-27 | 2013-01-23 | Predicate template collecting device and specific phrase pair collecting device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2012-039966 | 2012-02-27 | ||
JP2012039966A JP5924666B2 (ja) | 2012-02-27 | 2012-02-27 | Predicate template collecting device, specific phrase pair collecting device, and computer program therefor |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2013128984A1 true WO2013128984A1 (ja) | 2013-09-06 |
Family
ID=49082189
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2013/051326 WO2013128984A1 (ja) | 2012-02-27 | 2013-01-23 | Predicate template collecting device, specific phrase pair collecting device, and computer program therefor |
Country Status (6)
Country | Link |
---|---|
US (1) | US9582487B2 (ja) |
EP (1) | EP2821923B1 (ja) |
JP (1) | JP5924666B2 (ja) |
KR (1) | KR101972408B1 (ja) |
CN (1) | CN104137097B (ja) |
WO (1) | WO2013128984A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3086239A4 (en) * | 2013-12-20 | 2017-12-06 | National Institute of Information and Communications Technology | Scenario generation device and computer program therefor |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
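JP5825676B2 (ja) * | 2012-02-23 | 2015-12-02 | National Institute of Information and Communications Technology | Non-factoid question answering system and computer program |
JP6150291B2 (ja) * | 2013-10-08 | 2017-06-21 | National Institute of Information and Communications Technology | Contradictory expression collecting device and computer program therefor |
JP5907393B2 (ja) * | 2013-12-20 | 2016-04-26 | National Institute of Information and Communications Technology | Complex predicate template collecting device and computer program therefor |
JP6403382B2 (ja) * | 2013-12-20 | 2018-10-10 | National Institute of Information and Communications Technology | Phrase pair collecting device and computer program therefor |
JP6551968B2 (ja) * | 2015-03-06 | 2019-07-31 | National Institute of Information and Communications Technology | Entailment pair expansion device, computer program therefor, and question answering system |
JP6347519B2 (ja) * | 2015-05-15 | 2018-06-27 | Nippon Telegraph and Telephone Corporation | Transitive contradiction collecting device, method, and program |
JP6618735B2 (ja) | 2015-08-31 | 2019-12-11 | National Institute of Information and Communications Technology | Training device for question answering system and computer program therefor |
JPWO2017104571A1 (ja) * | 2015-12-14 | 2018-10-04 | NEC Corporation | Information processing device, information processing method, and computer program |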
US10706044B2 (en) | 2016-04-06 | 2020-07-07 | International Business Machines Corporation | Natural language processing based on textual polarity |
US20170293621A1 (en) * | 2016-04-06 | 2017-10-12 | International Business Machines Corporation | Natural language processing based on textual polarity |
US20170293620A1 (en) * | 2016-04-06 | 2017-10-12 | International Business Machines Corporation | Natural language processing based on textual polarity |
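JP6721179B2 (ja) | 2016-10-05 | 2020-07-08 | National Institute of Information and Communications Technology | Causal relation recognition device and computer program therefor |
JP6929539B2 (ja) * | 2016-10-07 | 2021-09-01 | National Institute of Information and Communications Technology | Non-factoid question answering system and method, and computer program therefor |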
US20190065583A1 (en) * | 2017-08-28 | 2019-02-28 | International Business Machines Corporation | Compound q&a system |
US10915707B2 (en) * | 2017-10-20 | 2021-02-09 | MachineVantage, Inc. | Word replaceability through word vectors |
KR102111609B1 (ko) * | 2018-04-26 | 2020-05-15 | Republic of Korea | System and method for extracting disaster attribute information |
US20230020080A1 (en) * | 2021-04-12 | 2023-01-19 | Adishesh Kishore | Relationship builder to relate data across multiple entities/nodes |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
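WO2008075524A1 (ja) * | 2006-12-18 | 2008-06-26 | Nec Corporation | Polarity estimation system, information distribution system, polarity estimation method, polarity estimation program, and evaluation polarity estimation program |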
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
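KR100918338B1 (ko) | 2001-08-10 | 2009-09-22 | National Institute of Information and Communications Technology | Method and device for generating third-language text from input of parallel texts in multiple languages, and recording medium storing the program |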
WO2003027894A1 (en) * | 2001-09-26 | 2003-04-03 | The Trustees Of Columbia University In The City Of New York | System and method of generating dictionary entries |
US8155946B2 (en) * | 2002-12-23 | 2012-04-10 | Definiens Ag | Computerized method and system for searching for text passages in text documents |
JP2005031979A (ja) | 2003-07-11 | 2005-02-03 | National Institute Of Advanced Industrial & Technology | Information processing method, information processing program, information processing device, and remote controller |
US7970600B2 (en) * | 2004-11-03 | 2011-06-28 | Microsoft Corporation | Using a first natural language parser to train a second parser |
US7899666B2 (en) * | 2007-05-04 | 2011-03-01 | Expert System S.P.A. | Method and system for automatically extracting relations between concepts included in text |
US20090048823A1 (en) * | 2007-08-16 | 2009-02-19 | The Board Of Trustees Of The University Of Illinois | System and methods for opinion mining |
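CN101377770B (zh) | 2007-08-27 | 2017-03-01 | Microsoft Technology Licensing, LLC | Method and system for Chinese chunk analysis |
JP5536518B2 (ja) | 2009-04-23 | 2014-07-02 | International Business Machines Corporation | Method, apparatus and computer ... for automatically extracting, from a natural language specification of a system, a system modeling metamodel language model for the system |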
US8532981B2 (en) * | 2011-03-21 | 2013-09-10 | Xerox Corporation | Corpus-based system and method for acquiring polar adjectives |
US8650023B2 (en) * | 2011-03-21 | 2014-02-11 | Xerox Corporation | Customer review authoring assistant |
WO2012132388A1 (ja) * | 2011-03-28 | 2012-10-04 | NEC Corporation | Text analysis device, problematic behavior extraction method, and problematic behavior extraction program |
2012
- 2012-02-27 JP JP2012039966A patent/JP5924666B2/ja not_active Expired - Fee Related
2013
- 2013-01-23 CN CN201380011077.2A patent/CN104137097B/zh not_active Expired - Fee Related
- 2013-01-23 US US14/377,988 patent/US9582487B2/en not_active Expired - Fee Related
- 2013-01-23 WO PCT/JP2013/051326 patent/WO2013128984A1/ja active Application Filing
- 2013-01-23 EP EP13754814.5A patent/EP2821923B1/en not_active Not-in-force
- 2013-01-23 KR KR1020147023682A patent/KR101972408B1/ko active IP Right Grant
Non-Patent Citations (10)
Title |
---|
DEKANG LIN; PATRICK PANTEL: "Discovery of inference rules for question answering", NATURAL LANGUAGE ENGINEERING, vol. 7, no. 4, 2001, pages 343 - 360 |
HIROYA TAKAMURA; TAKASHI INUI; MANABU OKUMURA: "Extracting Semantic Orientations of Words using Spin Model", PROCEEDINGS OF THE 43RD ANNUAL MEETING OF THE ACL, 2005, pages 133 - 140, XP055222238 |
INUI TAKASHI; INUI KENTARO; YUJI MATSUMOTO: "Extracting Causal Knowledge from Text, The Case of Resultative Connectives 'tame'", INFORMATION PROCESSING SOCIETY OF JAPAN, SPECIAL INTEREST GROUP OF NATURAL LANGUAGE PROCESSING (NL-150-25), 2002, pages 171 - 178 |
JAMES PUSTEJOVSKY: "The Generative Lexicon", 1995, MIT PRESS |
KENTARO TORISAWA: "Automatically Acquiring Natural Language Expressions Representing Preparation and Utilization of an Object", NATURAL LANGUAGE PROCESSING, vol. 13, no. 2, 2006, pages 125 - 144 |
MASAAKI TSUCHIDA; KENTARO TORISAWA; STIJN DE SAEGER; JONG HOON OH; JUN'ICHI KAZAMA; CHIKARA HASHIMOTO; HAYATO OHWADA: "Toward Finding Semantic Relations not Written in a Single Sentence: An Inference Method using Auto-Discovered Rules", PROCEEDINGS OF THE 5TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (IJCNLP 2011), November 2011 (2011-11-01), pages 902 - 910 |
ROXANA GIRJU: "Automatic Detection of Causal Relations for Question Answering", PROCEEDINGS OF ACL WORKSHOP ON MULTILINGUAL SUMMARIZATION AND QUESTION ANSWERING, 2003 |
SAIF MOHAMMAD; BONNIE DORR; GRAEME HIRST: "Computing Word Pair Antonymy", PROCEEDINGS OF THE 2008 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, October 2008 (2008-10-01), pages 982 - 991 |
See also references of EP2821923A4 |
TETSUYA NASUKAWA ET AL.: "Acquisition of Sentiment Lexicon by Using Context Coherence", IPSJ SIG NOTES, vol. 2004, no. 73, 15 July 2004 (2004-07-15), pages 109 - 116, XP008174372 * |
Also Published As
Publication number | Publication date |
---|---|
EP2821923B1 (en) | 2016-09-07 |
CN104137097A (zh) | 2014-11-05 |
EP2821923A4 (en) | 2015-12-02 |
CN104137097B (zh) | 2017-02-22 |
EP2821923A1 (en) | 2015-01-07 |
KR20140129053A (ko) | 2014-11-06 |
KR101972408B1 (ko) | 2019-04-25 |
US20150039296A1 (en) | 2015-02-05 |
JP2013175097A (ja) | 2013-09-05 |
JP5924666B2 (ja) | 2016-05-25 |
US9582487B2 (en) | 2017-02-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 13754814 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 14377988 Country of ref document: US |
|
REEP | Request for entry into the european phase |
Ref document number: 2013754814 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2013754814 Country of ref document: EP |
|
ENP | Entry into the national phase |
Ref document number: 20147023682 Country of ref document: KR Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |