WO2013058118A1 - Text implication determination device, text implication determination method, and computer-readable recording medium - Google Patents
- Publication number
- WO2013058118A1 (PCT/JP2012/075765)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- text
- term structure
- predicate
- word
- implication
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3347—Query execution using vector based model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Definitions
- The present invention relates to a text implication determination device and a text implication determination method for determining whether a specific text implies another text, and to a computer-readable recording medium on which a program for realizing them is recorded.
- the text entailment determination is a task for determining whether “text H can be estimated from text T” when text T and text H are given.
- Non-Patent Document 1 discloses an example of a conventional text entailment determination system.
- The text entailment determination system disclosed in Non-Patent Document 1 first parses text T and text H and, for each, creates a tree structure in which the verb is the root (top node) and the words included in the verb's arguments (subject, object, and so on) become child nodes or grandchild nodes.
- The system then performs word substitution and syntactic paraphrasing on text T and tries to create, within a subtree of T, a tree structure that matches the tree structure of text H.
- The implication determination system determines that there is an implication when such a tree structure can be created in a subtree of text T.
- In Non-Patent Document 1, the tree structure match determination covers not only exact matches but also approximate matches. Specifically, after the above-described tree structure is created, the implication determination system creates data called a VAS (Verb-Argument Structure) from the created tree structure.
- VAS is a kind of so-called predicate term structure, and is composed of a root verb of a tree structure and a word set created separately for each argument type.
- For example, a VAS of “<kill, (object: Casey, Sheehan), (other: Iraq)>” is generated from the sentence “Casey Sheehan was killed in Iraq.”
- Non-Patent Document 1 also discloses a method of creating a word set from the entire child node and grandchild node without distinguishing the types of arguments when the root is a be verb.
- The system disclosed in Non-Patent Document 1 obtains, for the two VASs created from text T and text H, the word coverage between word sets that have the same argument type. The implication determination system then determines that the contents of an argument match between the two VASs when the word coverage is equal to or greater than a certain value, and further determines that the original tree structures of the two VASs also match when the proportion of matching arguments is equal to or greater than a certain ratio. In this way, once a VAS is generated, approximate matches as well as exact matches of the verb argument character strings can be determined.
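The approximate VAS match described above can be sketched as follows. This is only an illustration, not the system of Non-Patent Document 1: the dictionary layout, the function names `word_coverage` and `vas_match`, the direction of the coverage (share of H's words found in T), the requirement that the verbs be identical, and both threshold values are assumptions made for the example.

```python
def word_coverage(t_words, h_words):
    """Fraction of H's words that also appear in T's word set (assumed direction)."""
    h = set(h_words)
    if not h:
        return 1.0
    return len(h & set(t_words)) / len(h)


def vas_match(vas_t, vas_h, word_threshold=0.5, arg_ratio=0.5):
    """Approximate match of two VASs; both thresholds are hypothetical values."""
    # Assumption: the root verbs must be identical before arguments are compared.
    if vas_t["verb"] != vas_h["verb"]:
        return False
    h_args = vas_h["args"]
    if not h_args:
        return True
    # Count H arguments whose word set is sufficiently covered by the
    # T argument of the same type.
    matched = sum(
        1
        for arg_type, h_words in h_args.items()
        if word_coverage(vas_t["args"].get(arg_type, []), h_words) >= word_threshold
    )
    return matched / len(h_args) >= arg_ratio


vas_t = {"verb": "kill", "args": {"object": ["Casey", "Sheehan"], "other": ["Iraq"]}}
vas_h_same = {"verb": "kill", "args": {"object": ["Sheehan"]}}
vas_h_diff = {"verb": "die", "args": {"subject": ["Sheehan"]}}

print(vas_match(vas_t, vas_h_same))  # True
print(vas_match(vas_t, vas_h_diff))  # False
```

The second call fails because the verb and the argument type differ, even though the same person is mentioned. This is the kind of case the comparison by argument type cannot handle.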
- Because the implication determination system disclosed in Non-Patent Document 1 can determine whether one of two natural-language sentences implies the other, it is considered applicable to searching natural-language sentences.
- However, the system of Non-Patent Document 1 has the problem that implication determination cannot be performed when the term structures of the predicates differ, because the system collates the texts on the assumption that the term structures of the predicates are the same.
- For example, the system extracts “withdrawal (subject: company A, object: personal computer, business)” from text T as a VAS and “disappears (subject: Company A, PC)” from text H; because the two term structures differ, they cannot be collated.
- An example of an object of the present invention is to solve the above-described problem by providing a text implication determination device, a text implication determination method, and a computer-readable recording medium that enable implication determination even when the term structures of the predicates differ among the texts to be determined.
- An implication determination apparatus in one aspect of the present invention is an apparatus for determining whether or not a first text implies a second text, and is characterized by having:
- a vector generation unit that acquires the predicate term structures of each of the first text and the second text and, for each of the first text and the second text, generates a vector for each predicate term structure using words other than the words indicating the types of the predicate's arguments in that predicate term structure;
- a combination identifying unit that compares the vector generated for each predicate term structure of the first text with the vector generated for each predicate term structure of the second text and, based on the result of the comparison, identifies combinations of a predicate term structure of the first text and a predicate term structure of the second text; and
- an implication determination unit that obtains a feature amount for each of the identified combinations and determines, based on the obtained feature amounts, whether the first text implies the second text.
- An implication determination method in one aspect of the present invention is a method for determining whether or not a first text implies a second text, and is characterized by having the steps of:
- (A) acquiring the predicate term structures of each of the first text and the second text and, for each of the first text and the second text, generating a vector for each predicate term structure using words other than the words indicating the types of the predicate's arguments in that predicate term structure;
- (B) comparing the vector generated for each predicate term structure of the first text with the vector generated for each predicate term structure of the second text and, based on the result of the comparison, identifying combinations of a predicate term structure of the first text and a predicate term structure of the second text; and
- (C) obtaining a feature amount for each identified combination and determining, based on the obtained feature amounts, whether the first text implies the second text.
- A computer-readable recording medium in one aspect of the present invention records a program for causing a computer to determine whether or not a first text implies a second text.
- The program includes instructions that cause the computer to execute the steps of: (A) acquiring the predicate term structures of each of the first text and the second text and, for each of the first text and the second text, generating a vector for each predicate term structure using words other than the words indicating the types of the predicate's arguments in that predicate term structure; (B) comparing the vector generated for each predicate term structure of the first text with the vector generated for each predicate term structure of the second text and, based on the result of the comparison, identifying combinations of a predicate term structure of the first text and a predicate term structure of the second text; and (C) obtaining a feature amount for each identified combination and determining, based on the obtained feature amounts, whether the first text implies the second text.
- FIG. 1 is a block diagram showing a configuration of an implication determining apparatus according to an embodiment of the present invention.
- FIG. 2 is a flowchart showing the operation of the implication determining apparatus in the embodiment of the present invention.
- FIG. 3 is a diagram illustrating an example of a predicate term structure extracted from text.
- FIG. 4 is a diagram showing a vector generated from the predicate term structure shown in FIG.
- FIG. 5 is a diagram for explaining an example of a combination identification process executed in the present embodiment.
- FIG. 6 is a diagram for explaining an example of implication determination processing executed in the present embodiment.
- FIG. 7 is a block diagram illustrating an example of a computer that implements an implication determining apparatus according to an embodiment of the present invention.
- FIG. 1 is a block diagram showing a configuration of an implication determining apparatus according to an embodiment of the present invention.
- The implication determining apparatus 2 according to the present embodiment is an apparatus for determining whether or not a first text implies a second text.
- As illustrated in FIG. 1, the implication determining device 2 includes a vector generation unit 21, a combination identification unit 22, and an implication determination unit 23.
- The vector generation unit 21 first acquires the predicate term structures of the first text and the second text. Then, for each of the first text and the second text, the vector generation unit 21 generates a vector for each predicate term structure using words other than the words indicating the types of the predicate's arguments in that predicate term structure.
- The combination identifying unit 22 compares the vector generated for each predicate term structure of the first text with the vector generated for each predicate term structure of the second text and, based on the comparison result, identifies combinations of a predicate term structure of the first text and a predicate term structure of the second text.
- the implication determining unit 23 calculates a feature amount for each identified combination, and determines whether the first text implies the second text based on the calculated feature amount.
- In this way, the implication determination apparatus 2 identifies the combinations of predicate term structures to be determined from a comparison of vectors that ignore the term structure, and determines the implication between the texts based on the identified combinations. Therefore, according to the implication determination device 2, implication can be determined even if the term structures of the predicates differ among the texts to be determined.
- the implication determining device 2 is a device that operates by program control, and is realized by executing a program described later on a computer.
- The “predicate term structure” includes at least a predicate (verb) included in the text, the words serving as arguments of the predicate, and words (labels) indicating the types of the arguments (see FIG. 3, described later). Therefore, in the present embodiment, “a word other than a word indicating the type of argument of a predicate in the predicate term structure” means the predicate (verb) and the words serving as its arguments.
- The implication determining device 2 is connected to an input device 1, a storage device 3 that stores various data used by the implication determining device 2, and an output device 4 that outputs results.
- The implication determination device 2 constitutes an implication determination system together with the input device 1, the storage device 3, and the output device 4.
- the input device 1 inputs two texts to be subjected to implication determination by the implication determination device 2, that is, the first text and the second text, to the implication determination device 2.
- the first text may be referred to as “text T” or simply “T”
- the second text may be referred to as “text H” or simply “H”.
- The text T and the text H that are subject to implication determination may be text in an arbitrary unit defined by some method.
- Examples of text T and text H subject to implication determination include all or part of the text constituting a text file, text created by concatenating the character strings contained in an arbitrary subtree obtained by parsing, and text created by concatenating the character strings in a predicate term structure.
- the output device 4 outputs the result of the implication determination performed by the implication determination device 2 for the text T and text H input from the input device 1.
- Specific examples of the output device 4 include a display device and a printer.
- the storage device 3 includes an inter-word matching rule storage unit 30 and a predicate term inter-structure matching rule storage unit 31.
- The implication determining device 2 can use the information stored in the storage device 3, so the accuracy of implication determination can be improved compared with the case where the storage device 3 is not connected.
- The inter-word collation rule storage unit 30 stores implication rules (inter-word collation rules) that hold between words, such as synonym, hyponym, part-whole, and derivation relations.
- Examples of implication rules include “NEC → NEC”, “Run → Move”, “Tokyo → Japan”, and “Manufacturer → Make”.
- The predicate term inter-structure matching rule storage unit 31 stores relations between the arguments of predicate term structures that are to be collated at the time of implication determination (predicate term inter-structure matching rules).
- Examples of predicate term inter-structure matching rules include “withdraw (subject: X, object: Y) → disappear (subject: Y of X)” and “kill (object: X) → die (subject: X)”.
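One possible in-memory form of the two rule stores is sketched below, using the “Run → Move”, “Tokyo → Japan”, and “kill (object: X) → die (subject: X)” examples from the text. The source does not specify a storage format, so the names `WORD_RULES`, `STRUCTURE_RULES`, `word_entails`, and `rewrite`, and the single-argument rule shape, are assumptions for illustration.

```python
# Inter-word collation rules as directed (premise, conclusion) pairs.
WORD_RULES = {("run", "move"), ("Tokyo", "Japan")}

# Predicate term inter-structure matching rules, simplified to one argument:
# (source predicate, source argument type) -> (target predicate, target argument type)
STRUCTURE_RULES = [
    (("kill", "object"), ("die", "subject")),
]


def word_entails(a, b):
    """True if word a equals word b or a rule a -> b is stored."""
    return a == b or (a, b) in WORD_RULES


def rewrite(predicate, arguments):
    """Return the predicate term structures derivable by one rule application."""
    results = []
    for (src_pred, src_arg), (dst_pred, dst_arg) in STRUCTURE_RULES:
        if predicate == src_pred and src_arg in arguments:
            # The words bound to X are carried over to the target argument type.
            results.append((dst_pred, {dst_arg: arguments[src_arg]}))
    return results


print(rewrite("kill", {"object": ["Casey", "Sheehan"], "other": ["Iraq"]}))
# [('die', {'subject': ['Casey', 'Sheehan']})]
```

The rules are kept directed, so `word_entails("Tokyo", "Japan")` holds while the reverse does not.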
- the implication determining device 2 includes a predicate term structure analyzing unit 20 in addition to the vector generating unit 21, the combination identifying unit 22, and the implication determining unit 23 described above.
- The predicate term structure analysis unit 20 performs a structure analysis on the texts input by the input device 1 and extracts the predicate term structures from each text based on the result of the structure analysis. The predicate term structure analysis unit 20 then outputs the extracted predicate term structures to the vector generation unit 21.
- The vector generation unit 21 extracts the words other than the words indicating the types of the predicate's arguments in the predicate term structure, that is, the predicate and the words serving as its arguments, and generates a vector from them.
- The vector generation unit 21 generates a vector for each predicate term structure of each text; that is, when a text has a plurality of predicate term structures, a vector is generated for each of them.
- The vector generation unit 21 outputs the generated vectors to the combination identification unit 22.
- The combination identification unit 22 can read the collation rules from the inter-word collation rule storage unit 30 and the predicate term inter-structure collation rule storage unit 31 and identify combinations by referring to them. The combination identifying unit 22 then outputs the identified combinations to the implication determining unit 23.
- the combination identification unit 22 calculates the similarity between the vector generated for each predicate term structure for the text T and the vector generated for each predicate term structure for the text H.
- the combination identifying unit 22 identifies a combination of the predicate term structure of the text T and the predicate term structure of the text H based on the calculated similarity.
- The combination identifying unit 22 identifies, for each predicate term structure of the text H, a combination of that predicate term structure and a single predicate term structure of the text T. That is, in this case, as many combinations are identified as there are predicate term structures in the text H.
- The combination identifying unit 22 calculates the similarity for all possible pairs of a vector generated from a predicate term structure of the text H and a vector generated from a predicate term structure of the text T. Then, the combination identifying unit 22 identifies a pair whose similarity is equal to or higher than a threshold value, or the pair with the highest similarity, and identifies the two predicate term structures from which the identified pair was created.
- the implication determining unit 23 calculates a feature amount based on a word (a predicate and a word serving as an argument thereof) other than a word indicating the type of the predicate argument in the predicate term structure.
- examples of the feature amount include the degree of word coverage in the predicate term structure of the text T and the predicate term structure of the text H, and the degree of matching of words for only the word serving as an argument.
- For example, a threshold value can be set for the feature amount, and the implication determining unit 23 can determine that the text T implies the text H when the feature amount is equal to or greater than the set threshold value. Further, in this embodiment, the implication determining unit 23 can also perform the determination using structural features of the predicate term structure in addition to the feature amount.
- the implication determining unit 23 outputs the result of the implication determination to the output device 4.
- the determination criterion in implication determination is not particularly limited, and a determination criterion conventionally used for implication determination can also be used.
- FIG. 2 is a flowchart showing the operation of the implication determining apparatus in the embodiment of the present invention.
- In the following description, FIG. 1 is referred to as appropriate.
- the implication determination method is implemented by operating the implication determination apparatus 2. Therefore, the description of the implication determination method in the present embodiment is replaced with the following description of the operation of the implication determination device 2.
- the predicate term structure analysis unit 20 receives the input of the text T and the text H from the input device 1 and extracts the predicate term structure from the text T and the text H that have received the input. (Step S1).
- FIG. 3 is a diagram illustrating an example of a predicate term structure extracted from text.
- In the example of FIG. 3, text T and text H are illustrated, and two texts, T1 and T2, are given as text T.
- The implication determining device 2 determines whether the text T1 implies the text H and whether the text T2 implies the text H.
- From the text “T1: Mr. B is approved as the president of Company A (Tokyo) by the general meeting of shareholders”, the predicate term structure analysis unit 20 identifies “approval” as the predicate and extracts “approval (subject: general meeting of shareholders, object: Mr. B, target: president of company A (Tokyo))” as the predicate term structure.
- From the text “T2: Mr. B, who lives in Tokyo, takes office as the president of company A”, the predicate term structure analysis unit 20 extracts the predicate term structures “in office (subject: Mr. B, goal: president of company A)” and “living (subject: Mr. B, location: Tokyo)”.
- From the text “H: B has become the president of company A in Tokyo.”, the predicate term structure analysis unit 20 extracts the predicate term structures “become (subject: Mr. B, goal: president of company A in Tokyo)” and “be (subject: company A, location: Tokyo)”.
- In this example, the correct determination is that T1 implies H but T2 does not imply H. This is because the information that “Company A is in Tokyo” cannot be read from T2.
- The vector generation unit 21 acquires the predicate term structures of each text extracted in step S1 and, for each text, creates a vector whose components are the predicate in the predicate term structure and the words serving as arguments of the predicate (hereinafter referred to as a “predicate term structure vector”) (step S2).
- FIG. 4 is a diagram showing a vector generated from the predicate term structure shown in FIG.
- the vector generation unit 21 generates a predicate term structure vector using only content words such as a predicate and a word serving as an argument of the predicate.
- the predicate term structure vector only needs to include a word other than a word indicating the type of argument of the predicate in the predicate term structure as a component.
- For example, from the predicate term structure “approved (subject: general meeting of shareholders, object: Mr. B, target: president of company A (Tokyo))” extracted from T1, the vector generation unit 21 generates the predicate term structure vector (approval, shareholder, general meeting, Mr. B, company A, Tokyo, president).
- From “incumbent (subject: Mr. B, target: president of company A)” and “dwell (subject: Mr. B, location: Tokyo)” extracted from T2, the vector generation unit 21 generates (incumbent, Mr. B, Company A, President) and (Live, Mr. B, Tokyo).
- From “become (subject: Mr. B, target: president of company A in Tokyo)” and “be (subject: company A, location: Tokyo)” extracted from H, the vector generation unit 21 generates (become, Mr. B, Tokyo, Company A, President) and (be, Company A, Tokyo).
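Step S2 can be sketched as follows: the argument type labels are dropped, and only the predicate and the argument words are kept. The class `PredicateTermStructure` and the function `to_vector` are hypothetical names, and the sketch assumes that each argument has already been split into content words by an upstream analyzer.

```python
from dataclasses import dataclass, field


@dataclass
class PredicateTermStructure:
    predicate: str
    # Argument type label -> content words of that argument
    # (assumed to be pre-tokenized by the structure analysis step).
    arguments: dict = field(default_factory=dict)


def to_vector(pts):
    """Collect the predicate and the argument words, ignoring argument type labels."""
    words = [pts.predicate]
    for arg_words in pts.arguments.values():
        for word in arg_words:
            if word not in words:  # keep each word once
                words.append(word)
    return tuple(words)


t2_live = PredicateTermStructure(
    "live", {"subject": ["Mr. B"], "location": ["Tokyo"]}
)
print(to_vector(t2_live))  # ('live', 'Mr. B', 'Tokyo')
```

Because the labels “subject” and “location” never enter the vector, two structures that express the same content with different argument types still yield overlapping vectors.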
- The vector generation unit 21 can also add to the predicate term structure vector arbitrary information that can be acquired from the term structure, such as “predicate_argument type_word” (hereinafter referred to as “structure information”).
- For example, the vector generation unit 21 can generate a vector such as (approval, shareholder, general meeting, Mr. B, company A, Tokyo, president, predicate: approval, approval_subject_shareholder, approval_subject_general meeting, approval_object_Mr. B, 7).
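Structure information of the “predicate_argument type_word” form could be produced along the following lines. The function name `structure_features` and the exact string formats are assumptions modeled on the example vector in the text; the source does not fix a format.

```python
def structure_features(predicate, arguments):
    """Build structure-information strings for one predicate term structure.

    arguments maps an argument type label to its list of content words.
    """
    features = [f"predicate:{predicate}"]
    for arg_type, words in arguments.items():
        for word in words:
            # One feature per (predicate, argument type, word) triple.
            features.append(f"{predicate}_{arg_type}_{word}")
    return features


features = structure_features(
    "approval",
    {"subject": ["shareholder", "general meeting"], "object": ["Mr. B"]},
)
print(features)
```

These strings can simply be appended to the word components, so that a later comparison rewards structures that agree on argument types as well as on words.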
- The combination identification unit 22 calculates the similarity of each vector obtained from the predicate term structures of the text T1 and the text T2 with respect to each vector obtained from the predicate term structures of the text H, and identifies combinations of predicate term structures based on the similarity (step S3).
- the combination identifying unit 22 identifies a vector pair whose similarity is equal to or greater than a threshold value or a vector pair having the highest similarity, and identifies two predicate term structures from which the identified pair is created.
- The two predicate term structure vectors to be compared are converted into vectors whose number of dimensions is the total number of character strings in both vectors minus the number of character strings they have in common. A component whose character string is present is set to “1”, and a component whose character string is absent is set to “0”.
- For example, when one vector has three character strings, the other has six, and two are shared, the number of dimensions is 3 + 6 - 2 = 7; the former is converted to (1, 1, 1, 0, 0, 0, 0), and the latter is converted to (0, 1, 1, 1, 1, 1, 1).
- A weight value estimated by some method may be given to each component of the converted vectors.
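The conversion to 0/1 vectors over the combined vocabulary can be sketched as below. The function name `to_binary_pair` is hypothetical, and the words `w1` to `w4` are placeholders chosen only so that the result has the same shape as the seven-dimensional example in the text.

```python
def to_binary_pair(words_x, words_y):
    """Convert two word vectors into 0/1 vectors over their combined vocabulary."""
    # dict.fromkeys keeps first-seen order and drops duplicates, so the number of
    # dimensions is len(x) + len(y) minus the number of shared words.
    vocab = list(dict.fromkeys(list(words_x) + list(words_y)))
    x = [1 if word in words_x else 0 for word in vocab]
    y = [1 if word in words_y else 0 for word in vocab]
    return vocab, x, y


former = ("live", "Mr. B", "Tokyo")
latter = ("Mr. B", "Tokyo", "w1", "w2", "w3", "w4")  # placeholder words
vocab, x, y = to_binary_pair(former, latter)
print(len(vocab))  # 7
print(x)           # [1, 1, 1, 0, 0, 0, 0]
print(y)           # [0, 1, 1, 1, 1, 1, 1]
```

If component weights are used, the 1s can be replaced by the estimated weight of each word without changing the rest of the procedure.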
- FIG. 5 is a diagram for explaining an example of a combination identification process executed in the present embodiment.
- In the example of FIG. 5, the similarity threshold is set to 0.5.
- The cosine similarity sim is calculated by the following formula 1: $\mathrm{sim}(x, y) = \dfrac{x \cdot y}{|x|\,|y|}$
- Here, x and y indicate the two converted vectors to be compared, $x \cdot y$ represents the inner product of the vector x and the vector y, $|x|$ represents the length of the vector x, and $|y|$ represents the length of the vector y.
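The cosine similarity of two converted vectors can be computed as in the following sketch. The function name `cosine_similarity` is an assumption, and returning 0.0 for a zero-length vector is a choice made for the example rather than something stated in the source.

```python
import math


def cosine_similarity(x, y):
    """Inner product of x and y divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: an empty vector is similar to nothing
    return dot / (norm_x * norm_y)


x = (1, 1, 1, 0, 0, 0, 0)
y = (0, 1, 1, 1, 1, 1, 1)
print(round(cosine_similarity(x, y), 3))  # 0.471
```

For 0/1 vectors the inner product is simply the number of shared words, so here the value is 2 / sqrt(3 × 6), which falls below a threshold of 0.5.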
- First, consider the case where T1 and H are the objects of implication determination. Since T1 has only one predicate term structure, the combination of the predicate term structure of T1 with H's “be (subject: Company A, location: Tokyo)” and the combination of the predicate term structure of T1 with H's “become (subject: Mr. B, goal: president of company A in Tokyo)” are identified automatically.
- Next, for T2 and H, the similarity of the predicate term structure vector of T2's “taken (subject: Mr. B, goal: president of company A)” to the H-side vector is calculated to be 0.617, and the similarity of the predicate term structure vector of “dwell (subject: Mr. B, place: Tokyo)” is calculated to be 0.471.
- Since the latter similarity is below the threshold value, only “incumbent (subject: Mr. B, target: president of company A)”, which has the highest similarity, is identified as the predicate term structure to be determined.
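The selection in step S3 can be sketched as follows: for each H-side vector, keep the T-side vector with the highest similarity, provided it reaches the threshold. The function names are hypothetical, and the word lists are simplified English stand-ins, so the similarity values printed here are not the figures quoted in the text. Unlike the description above, this sketch applies the threshold even when T has a single predicate term structure.

```python
import math


def cosine(words_a, words_b):
    """Cosine similarity of two word vectors treated as 0/1 vectors."""
    a, b = set(words_a), set(words_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def identify_combinations(t_vectors, h_vectors, threshold=0.5):
    """Return (t_index, h_index, similarity) for each H vector that has a match."""
    pairs = []
    if not t_vectors:
        return pairs
    for h_index, h in enumerate(h_vectors):
        # Best-scoring T vector for this H vector.
        best_sim, best_t = max(
            (cosine(t, h), t_index) for t_index, t in enumerate(t_vectors)
        )
        if best_sim >= threshold:
            pairs.append((best_t, h_index, best_sim))
    return pairs


t2 = [
    ("take office", "Mr. B", "Company A", "president"),
    ("live", "Mr. B", "Tokyo"),
]
h = [
    ("become", "Mr. B", "Tokyo", "Company A", "president"),
    ("be", "Company A", "Tokyo"),
]
pairs = identify_combinations(t2, h)
print([(t_index, h_index) for t_index, h_index, _ in pairs])  # [(0, 0)]
```

With these stand-in vectors, the second H-side structure finds no T2-side structure above the threshold, so no combination is produced for it.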
- The combination identification unit 22 can also refer to the inter-word collation rules stored in the inter-word collation rule storage unit 30 and calculate the similarity by regarding two words defined in an inter-word collation rule as matching.
- In that case, the combination identifying unit 22 can, for example, regard a verb and a noun related by such a rule as matching words when calculating the similarity. As a result, the predicate term structure to be determined can be identified more appropriately without being bound by the predicate term structure.
- When the structure information described for step S2 has been added to the predicate term structure vector, the combination identification unit 22 can also refer to the predicate term inter-structure matching rules stored in the predicate term inter-structure matching rule storage unit 31.
- In this case, the combination identification unit 22 calculates the similarity by regarding the arguments of the two predicate term structures defined by a predicate term inter-structure matching rule as matching words.
- The combination identifying unit 22 can also refer to the inter-word matching rules when determining whether the arguments match.
- In step S3, in order to identify a predicate term structure that has no redundant information, the combination identifying unit 22 can also perform normalization according to the amount of information included in each predicate term structure vector when calculating the similarity. Examples of the amount of information included in a predicate term structure vector include the number of non-zero components of the vector and the weights of the components. Examples of similarities calculated with such normalization include the cosine similarity and the Jaccard coefficient.
- Consider, for example, the predicate term structure “announce (subject: president, object: Mr. B will be the chairman)”.
- The object of this predicate term structure contains the embedded predicate term structure “become (subject: Mr. B, goal: chairman)”. If the number of shared words is used as the similarity when identifying the determination target for the predicate term structure “become (subject: Mr. B, goal: president)”, the similarity is 3 with the former and 2 with the latter, which is embedded in the former. As a result, when the number of shared words is used as the similarity, the former is likely to be selected as the determination target.
- However, the former includes the word “president” even though it does not mean that Mr. B becomes president. Therefore, depending on the determination criteria of the implication determination unit 23 described later, the former may be wrongly determined to imply “become (subject: Mr. B, target: president)”.
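The effect of normalization can be illustrated with a small sketch comparing the raw shared-word count with the Jaccard coefficient. The word lists below are hypothetical: the outer structure is padded with extra words (`press`, `conference`, `yesterday`) that do not appear in the source, to stand in for the redundant information a long structure carries.

```python
def shared_count(a, b):
    """Number of words the two vectors have in common (no normalization)."""
    return len(set(a) & set(b))


def jaccard(a, b):
    """Shared words divided by all distinct words of both vectors."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


h = ("become", "Mr. B", "president")
# Hypothetical outer structure with redundant words.
outer = ("announce", "president", "Mr. B", "become", "chairman",
         "press", "conference", "yesterday")
inner = ("become", "Mr. B", "chairman")

print(shared_count(outer, h), shared_count(inner, h))  # 3 2
print(jaccard(outer, h), jaccard(inner, h))            # 0.375 0.5
```

The raw count prefers the outer structure, while the normalized score prefers the more compact inner one. Whether normalization changes the ranking in a given case depends on how much redundant material the larger structure contains.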
- The implication determining unit 23 obtains a feature amount for each combination, identified by the combination identifying unit 22 in step S3, of a predicate term structure on the H side and the predicate term structure to be determined on the T side, and determines based on the feature amounts whether T implies H (step S4). In the present embodiment, for example, the implication determining unit 23 calculates an implication score between T and H based on the obtained feature amounts, and determines that there is an implication if the implication score is equal to or greater than a certain value.
- As the feature amount, the implication determination unit 23 obtains the coverage of words between the predicate term structures, based on the words other than the words indicating the types of the predicate's arguments (that is, the predicates and their argument words), or alternatively the degree of matching of words for the argument words only. Further, like the combination identifying unit 22, the implication determining unit 23 can also obtain the feature amount by using one or both of the inter-word matching rules and the predicate term inter-structure matching rules.
- FIG. 6 is a diagram for explaining an example of implication determination processing executed in the present embodiment.
- In the example of FIG. 6, the word coverage between the predicate term structures is obtained as the feature amount.
- When a is the number of components that match between the predicate term structure vector of H and the predicate term structure vector of T1 or T2, and b is the number of all components of the predicate term structure vector of H, the coverage is calculated by the following formula 2: $\mathrm{coverage} = a / b$
- The average of the feature amounts calculated between T1 or T2 and H is then calculated, and this average is used as the implication score.
- When the implication score is 0.50 or more, it is determined that there is an implication.
- For T1, the implication score is at or above the threshold value of 0.50, so the implication determining unit 23 determines that “T1 implies H”.
- For T2, the implication determination unit 23 determines that “T2 does not imply H”.
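Step S4 with coverage as the feature amount can be sketched as follows. The function names are hypothetical, the word lists are simplified English stand-ins for the example texts, and scoring an H-side structure with no identified T-side partner as 0 is an assumption of this sketch, not something the source states.

```python
def coverage(t_words, h_words):
    """a / b: matching components (a) over all components of H's vector (b)."""
    h = set(h_words)
    if not h:
        return 0.0
    return len(h & set(t_words)) / len(h)


def implication_score(pairs):
    """Average coverage over (t_words or None, h_words) combinations."""
    if not pairs:
        return 0.0
    # Assumption: an H structure without a T partner contributes 0.
    scores = [coverage(t, h) if t is not None else 0.0 for t, h in pairs]
    return sum(scores) / len(scores)


def implies(pairs, threshold=0.5):
    return implication_score(pairs) >= threshold


h_become = ("become", "Mr. B", "Tokyo", "Company A", "president")
h_be = ("be", "Company A", "Tokyo")
t1 = ("approve", "shareholder", "general meeting", "Mr. B",
      "Company A", "Tokyo", "president")
t2_office = ("take office", "Mr. B", "Company A", "president")

t1_pairs = [(t1, h_become), (t1, h_be)]
t2_pairs = [(t2_office, h_become), (None, h_be)]

print(implies(t1_pairs))  # True
print(implies(t2_pairs))  # False
```

With these stand-in vectors the sketch reaches the same verdicts as the text (T1 implies H, T2 does not), but the intermediate scores are not the values shown in FIG. 6.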
- When obtaining the feature amount, the implication determining unit 23 can also assign a weight to the feature amount based on data obtained by machine learning. Specifically, when a large number of combinations of two texts that can be determined to be implications have been learned by machine learning, the implication determination unit 23 can correct the calculated feature amount based on the learned data. Specific examples of machine learning methods include decision trees, perceptrons, and support vector machines.
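As one illustration of learning feature weights, the sketch below trains a plain perceptron on two features per text pair (for instance, word coverage and argument-only match degree). The function names, the learning rate, the number of epochs, and the four training samples are all toy assumptions; the source names the perceptron only as one possible method and gives no training details.

```python
def train_perceptron(samples, epochs=20, learning_rate=0.1):
    """Learn weights w and bias b from (features, label) pairs with labels 0 or 1."""
    n_features = len(samples[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for features, label in samples:
            activation = sum(wi * fi for wi, fi in zip(w, features)) + b
            error = label - (1 if activation > 0 else 0)
            if error:
                # Standard perceptron update on a misclassified sample.
                w = [wi + learning_rate * error * fi for wi, fi in zip(w, features)]
                b += learning_rate * error
    return w, b


def predict(w, b, features):
    return 1 if sum(wi * fi for wi, fi in zip(w, features)) + b > 0 else 0


# Toy data: (coverage, argument match degree) -> 1 if implication, else 0.
samples = [
    ((0.9, 0.8), 1),
    ((0.8, 0.9), 1),
    ((0.2, 0.1), 0),
    ((0.3, 0.2), 0),
]
w, b = train_perceptron(samples)
print([predict(w, b, features) for features, _ in samples])  # [1, 1, 0, 0]
```

The learned weights play the role of the correction applied to the feature amounts; a decision tree or support vector machine could be substituted for the same purpose.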
- the implication determination unit 23 performs the implication determination using the feature amount obtained from the predicate included in the predicate term structure and the word as an argument for each combination.
- the present embodiment is not limited to the above example.
- the implication determining unit 23 can determine whether T implies H by using the structural features of the predicate term structure of each text in addition to the above feature amount.
- structural features include presence / absence of information such as “denial” and “modality (guess, possible, etc.)” given to the predicate term structure, types of predicate arguments, and the like.
- For example, when “negation” is given to the predicate term structure on the T side, the implication determination unit 23 can determine that there is no implication even if the coverage is high.
- Thus, when the structural features of the predicate term structure are used, accurate implication determination is possible even in cases where determination based on the feature amount alone is difficult.
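A minimal sketch of combining the feature amount with a structural feature is given below. The function name `decide_implication`, the representation of structural features as sets of strings, and the rule that negation on exactly one side blocks the implication are assumptions; the source only gives the case where negation is attached on the T side.

```python
def decide_implication(score, t_features, h_features, threshold=0.5):
    """Combine an implication score with a negation check.

    t_features and h_features are sets of structural feature names,
    such as {"negation"} or {"modality:guess"} (hypothetical labels).
    """
    # Assumption: a polarity mismatch rules out implication regardless of score.
    if ("negation" in t_features) != ("negation" in h_features):
        return False
    return score >= threshold


print(decide_implication(0.9, {"negation"}, set()))  # False
print(decide_implication(0.9, set(), set()))         # True
```

Modality labels such as guess or possibility could be handled with further checks of the same shape.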
- The implication determining unit 23 can also determine the structural similarity between the predicate term structure of T and the predicate term structure of H and, depending on the result, make the implication determination by giving priority to either the feature amount or the structural features.
- structural similarity is specified based on, for example, the degree of similarity between predicates, or the degree of similarity of the types of arguments included in each predicate term structure.
- the implication determination unit 23 performs the implication determination with priority on the feature amount.
- the program in the present embodiment may be any program that causes a computer to execute steps S1 to S4 shown in FIG. 2.
- a CPU (Central Processing Unit)
- a storage device such as a hard disk provided in the computer can function as the storage device 3.
- FIG. 7 is a block diagram illustrating an example of a computer that implements an implication determining apparatus according to an embodiment of the present invention.
- the computer 110 includes a CPU 111, a main memory 112, a storage device 113, an input interface 114, a display controller 115, a data reader / writer 116, and a communication interface 117. These units are connected to each other via a bus 121 so that data communication is possible.
- the CPU 111 performs various operations by loading the program (code) of the present embodiment, stored in the storage device 113, into the main memory 112 and executing it in a predetermined order.
- the main memory 112 is typically a volatile storage device such as a DRAM (Dynamic Random Access Memory).
- the program in the present embodiment is provided in a state of being stored in a computer-readable recording medium 120. Note that the program in the present embodiment may be distributed on the Internet connected via the communication interface 117.
- specific examples of the storage device 113 include a semiconductor storage device such as a flash memory, in addition to a hard disk.
- the input interface 114 mediates data transmission between the CPU 111 and an input device 118 such as a keyboard and a mouse.
- the display controller 115 is connected to the display device 119 and controls display on the display device 119.
- the data reader / writer 116 mediates data transmission between the CPU 111 and the recording medium 120, and reads a program from the recording medium 120 and writes a processing result in the computer 110 to the recording medium 120.
- the communication interface 117 mediates data transmission between the CPU 111 and another computer.
- specific examples of the recording medium 120 include general-purpose semiconductor storage devices such as CF (Compact Flash (registered trademark)) and SD (Secure Digital), magnetic storage media such as a flexible disk, and optical storage media such as a CD-ROM (Compact Disk Read Only Memory).
- A text implication determination device for determining whether a first text implies a second text, comprising: a vector generation unit that acquires the predicate term structure of each of the first text and the second text and, for each of the first text and the second text, generates a vector for each predicate term structure using words other than the words indicating the types of the predicate's arguments in that predicate term structure; a combination identification unit that compares the vectors generated for each predicate term structure of the first text with the vectors generated for each predicate term structure of the second text and, based on the result of the comparison, identifies combinations of a predicate term structure of the first text and a predicate term structure of the second text; and an implication determination unit that obtains a feature amount for each identified combination and determines, based on the obtained feature amounts, whether the first text implies the second text.
- The text implication determination device according to appendix 1, wherein the predicate term structure includes a predicate contained in the first text or the second text, the words serving as arguments of the predicate, and words indicating the types of the arguments, and the vector generation unit generates the vector using the predicate and the words serving as its arguments.
- The text implication determination device according to appendix 1 or 2, wherein the combination identification unit calculates the similarity between the vectors generated for each predicate term structure of the first text and the vectors generated for each predicate term structure of the second text and, based on the calculated similarity, identifies combinations of a predicate term structure of the first text and a predicate term structure of the second text.
- The text implication determination device according to appendix 3, wherein the combination identification unit performs normalization according to the amount of information included in the vector when the similarity is calculated.
- The text implication determination device according to any one of appendices 1 to 4, wherein the implication determination unit obtains, for each combination and based on words other than the words indicating the types of the predicate's arguments in the predicate term structure, as the feature amount, either the degree of word coverage between the predicate term structure of the first text and the predicate term structure of the second text or the degree of word matching restricted to the words serving as arguments.
- The text implication determination device according to appendix 5, wherein the implication determination unit determines whether the first text implies the second text by using structural features of the predicate term structure in addition to the feature amount.
- The text implication determination device according to appendix 6, wherein the implication determination unit makes the determination giving priority to either the feature amount or the structural features of the predicate term structure, according to the structural similarity between the predicate term structure of the first text and the predicate term structure of the second text.
- Appendix 8 The text implication determination device according to any one of appendices 5 to 7, wherein the implication determination unit assigns a weight to the feature amount based on data obtained by machine learning when the feature amount is obtained.
- A text implication determination method for determining whether a first text implies a second text, comprising the steps of: (a) acquiring the predicate term structure of each of the first text and the second text and, for each of the first text and the second text, generating a vector for each predicate term structure using words other than the words indicating the types of the predicate's arguments in that predicate term structure; (b) comparing the vectors generated for each predicate term structure of the first text with the vectors generated for each predicate term structure of the second text and, based on the result of the comparison, identifying combinations of a predicate term structure of the first text and a predicate term structure of the second text; and (c) obtaining a feature amount for each identified combination and determining, based on the obtained feature amounts, whether the first text implies the second text.
- the predicate term structure includes a predicate contained in the first text or the second text, the words serving as arguments of the predicate, and words indicating the types of the arguments, and in step (a) the vector is generated using the predicate and the words serving as its arguments.
- The text implication determination method according to appendix 9 or 10, wherein, in step (b), the similarity between the vectors generated for each predicate term structure of the first text and the vectors generated for each predicate term structure of the second text is calculated and, based on the calculated similarity, combinations of a predicate term structure of the first text and a predicate term structure of the second text are identified.
- in step (b), when the similarity is calculated, normalization is performed according to the amount of information included in the vector.
- The text implication determination method according to any one of appendices 9 to 12, wherein, in step (c), for each combination and based on words other than the words indicating the types of the predicate's arguments in the predicate term structure, either the degree of word coverage between the predicate term structure of the first text and the predicate term structure of the second text or the degree of word matching restricted to the words serving as arguments is obtained as the feature amount.
- in step (c), whether the first text implies the second text is determined by using structural features of the predicate term structure in addition to the feature amount.
- in step (c), the determination is made giving priority to either the feature amount or the structural features of the predicate term structure, according to the structural similarity between the predicate term structure of the first text and the predicate term structure of the second text.
- A computer-readable recording medium recording a program for determining, by a computer, whether a first text implies a second text, the program including instructions that cause the computer to execute the steps of: (a) acquiring the predicate term structure of each of the first text and the second text and, for each of the first text and the second text, generating a vector for each predicate term structure using words other than the words indicating the types of the predicate's arguments in that predicate term structure; (b) comparing the vectors generated for each predicate term structure of the first text with the vectors generated for each predicate term structure of the second text and, based on the result of the comparison, identifying combinations of a predicate term structure of the first text and a predicate term structure of the second text; and (c) obtaining a feature amount for each identified combination and determining, based on the obtained feature amounts, whether the first text implies the second text.
- the predicate term structure includes a predicate contained in the first text or the second text, the words serving as arguments of the predicate, and words indicating the types of the arguments, and in step (a) the vector is generated using the predicate and the words serving as its arguments.
- The computer-readable recording medium according to appendix 17 or 18, wherein, in step (b), the similarity between the vectors generated for each predicate term structure of the first text and the vectors generated for each predicate term structure of the second text is calculated and, based on the calculated similarity, combinations of a predicate term structure of the first text and a predicate term structure of the second text are identified.
- The computer-readable recording medium according to any one of appendices 17 to 20, wherein, in step (c), for each combination and based on words other than the words indicating the types of the predicate's arguments in the predicate term structure, either the degree of word coverage between the predicate term structure of the first text and the predicate term structure of the second text or the degree of word matching restricted to the words serving as arguments is obtained as the feature amount.
- in step (c), whether the first text implies the second text is determined by using structural features of the predicate term structure in addition to the feature amount.
- Appendix 24 The computer-readable recording medium according to any one of appendices 21 to 23, wherein, in step (c), when the feature amount is obtained, a weight is assigned to the feature amount based on data obtained by machine learning.
- the present invention is useful for applications such as semantic natural sentence retrieval in an information retrieval system.
- the present invention is also useful for uses such as opinion clustering in text mining.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Character Discrimination (AREA)
Abstract
Description
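An example of an object of the present invention is to solve the above problem and to provide a text implication determination device, a text implication determination method, and a computer-readable recording medium that make implication determination possible even when the argument structures of the predicates differ among the texts to be judged.
The text implication determination device is characterized by comprising:
a vector generation unit that acquires the predicate term structure of each of the first text and the second text and, for each of the first text and the second text, generates a vector for each predicate term structure using words other than the words indicating the types of the predicate's arguments in that predicate term structure;
a combination identification unit that compares the vectors generated for each predicate term structure of the first text with the vectors generated for each predicate term structure of the second text and, based on the result of the comparison, identifies combinations of a predicate term structure of the first text and a predicate term structure of the second text; and
an implication determination unit that obtains a feature amount for each identified combination and determines, based on the obtained feature amounts, whether the first text implies the second text.
The text implication determination method is characterized by comprising the steps of:
(a) acquiring the predicate term structure of each of the first text and the second text and, for each of the first text and the second text, generating a vector for each predicate term structure using words other than the words indicating the types of the predicate's arguments in that predicate term structure;
(b) comparing the vectors generated for each predicate term structure of the first text with the vectors generated for each predicate term structure of the second text and, based on the result of the comparison, identifying combinations of a predicate term structure of the first text and a predicate term structure of the second text; and
(c) obtaining a feature amount for each identified combination and determining, based on the obtained feature amounts, whether the first text implies the second text.
The computer-readable recording medium is characterized by recording a program including instructions that cause the computer to execute the steps of:
(a) acquiring the predicate term structure of each of the first text and the second text and, for each of the first text and the second text, generating a vector for each predicate term structure using words other than the words indicating the types of the predicate's arguments in that predicate term structure;
(b) comparing the vectors generated for each predicate term structure of the first text with the vectors generated for each predicate term structure of the second text and, based on the result of the comparison, identifying combinations of a predicate term structure of the first text and a predicate term structure of the second text; and
(c) obtaining a feature amount for each identified combination and determining, based on the obtained feature amounts, whether the first text implies the second text.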
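Hereinafter, an implication determination device, an implication determination method, and a program according to an embodiment of the present invention will be described with reference to FIGS. 1 to 7.
First, the configuration of the implication determination device according to the present embodiment will be described with reference to FIG. 1. FIG. 1 is a block diagram showing the configuration of the implication determination device according to the embodiment of the present invention.
Next, the operation of the implication determination device 2 according to the embodiment of the present invention will be described with reference to FIG. 2. FIG. 2 is a flowchart showing the operation of the implication determination device according to the embodiment of the present invention. In the following description, FIG. 1 is referred to as appropriate. In the present embodiment, the implication determination method is carried out by operating the implication determination device 2. The description of the implication determination method in the present embodiment is therefore replaced by the following description of the operation of the implication determination device 2.
First, as shown in FIG. 2, the predicate term structure analysis unit 20 accepts input of a text T and a text H from the input device 1 and extracts predicate term structures from the accepted text T and text H (step S1).
Next, the vector generation unit 21 acquires the predicate term structures of each text extracted in step S1 and, for each text, creates for each predicate term structure a vector whose components are the predicate in the predicate term structure and the words serving as the predicate's arguments (hereinafter referred to as a "predicate term structure vector") (step S2).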
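Step S2 can be illustrated with a minimal sketch. It assumes that a predicate term structure is already available as a predicate plus (argument type, word) pairs. The class name `PredicateTermStructure`, the function `to_vector`, and the sample words are hypothetical, and the bag-of-words `Counter` is only one possible vector representation, not necessarily the one used in the embodiment.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PredicateTermStructure:
    """Hypothetical container: a predicate and its (argument type, word) pairs."""
    predicate: str
    arguments: Tuple[Tuple[str, str], ...]


def to_vector(pas: PredicateTermStructure) -> Counter:
    """Build a bag-of-words vector from the predicate and its argument words.

    The argument-type labels (e.g. "agent") are deliberately left out,
    so structures whose argument types differ can still be compared.
    """
    words = [pas.predicate] + [word for _arg_type, word in pas.arguments]
    return Counter(words)


# Illustrative input (not taken from the patent's figures).
example = PredicateTermStructure(
    predicate="acquire",
    arguments=(("agent", "CompanyA"), ("object", "CompanyB")),
)
vector = to_vector(example)
```

Leaving the type labels out of the vector is what allows two texts whose predicates take differently typed arguments to be matched on their content words alone.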
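Next, the combination identification unit 22 calculates the similarity of each vector obtained from the predicate term structures of text T1 and text T2 to each vector obtained from the predicate term structures of text H, and identifies combinations of predicate term structures based on the similarity (step S3). For example, the combination identification unit 22 specifies the pairs of vectors whose similarity is at or above a threshold, or the pair of vectors with the highest similarity, and identifies the two predicate term structures from which the specified pair was created.
sim = (x · y) / (|x| |y|)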
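The similarity formula and the pairing in step S3 can be sketched as below. This is an illustration under assumptions: vectors are taken to be word-count `Counter` objects, the helper names `cosine_similarity` and `identify_combinations` are hypothetical, "highest similarity" is interpreted per H-side vector, and the normalization by information amount mentioned in the appendices is not modeled.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple


def cosine_similarity(x: Counter, y: Counter) -> float:
    """sim = (x . y) / (|x| |y|); returns 0.0 if either vector is empty."""
    dot = sum(x[w] * y[w] for w in x.keys() & y.keys())
    norm_x = math.sqrt(sum(v * v for v in x.values()))
    norm_y = math.sqrt(sum(v * v for v in y.values()))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return dot / (norm_x * norm_y)


def identify_combinations(
    h_vectors: List[Counter],
    t_vectors: List[Counter],
    threshold: Optional[float] = None,
) -> List[Tuple[int, int, float]]:
    """Return (h_index, t_index, similarity) triples.

    With a threshold, every pair at or above it is kept; without one,
    each H vector is paired with its most similar T vector.
    """
    pairs = []
    for i, h in enumerate(h_vectors):
        scored = [(cosine_similarity(h, t), j) for j, t in enumerate(t_vectors)]
        if threshold is not None:
            pairs.extend((i, j, s) for s, j in scored if s >= threshold)
        elif scored:
            best_sim, best_j = max(scored)
            pairs.append((i, best_j, best_sim))
    return pairs


# Illustrative vectors (not taken from the patent's figures).
h_vecs = [Counter({"appoint": 1, "CompanyA": 1})]
t_vecs = [
    Counter({"visit": 1, "CompanyA": 1}),
    Counter({"appoint": 1, "CompanyA": 1}),
]
best = identify_combinations(h_vecs, t_vecs)
```

Here the H-side vector is paired with the second T-side vector, which shares both of its words, rather than the first, which shares only one.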
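Finally, the implication determination unit 23 obtains a feature amount for each combination, identified by the combination identification unit 22 in step S3, of a predicate term structure on the H side and a predicate term structure to be judged on the T side, and determines, based on the feature amounts, whether T implies H (step S4). In the present embodiment, the implication determination unit 23, for example, calculates an implication score for T and H based on the obtained feature amounts and determines implication if the implication score is at or above a certain value.
coverage = a / b
Here, consider the case where implication determination is performed simply by using the coverage of the word set in the text as the implication score. Taking the example of FIG. 3, T1 and T2 both contain four (Company A, Mr. B, Tokyo, president) of the six content words of H. The coverage with H as the reference is therefore 0.66 (= 4/6) in both cases. This means that implication and non-implication cannot be distinguished.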
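The arithmetic of this example can be reproduced with a short sketch. It assumes that a is the number of distinct H words also found in T and b is the number of distinct H words; the function name `coverage` is hypothetical. Only the four shared words come from the text (A社, B氏, 東京, 社長); the remaining words are placeholders because the full sentences appear only in FIG. 3, and the 0.50 threshold is borrowed from the value mentioned elsewhere in the description.

```python
def coverage(h_words, t_words):
    """coverage = a / b, with a = distinct H words also present in T
    and b = distinct H words (an assumed reading of the formula)."""
    h = set(h_words)
    if not h:
        return 0.0
    return len(h & set(t_words)) / len(h)


THRESHOLD = 0.50  # threshold value mentioned elsewhere in the description

# "h_word_5", "h_word_6" and the "t*_other" entries are placeholders.
H = ["CompanyA", "MrB", "Tokyo", "president", "h_word_5", "h_word_6"]
T1 = ["CompanyA", "MrB", "Tokyo", "president", "t1_other"]
T2 = ["CompanyA", "MrB", "Tokyo", "president", "t2_other"]

score_t1 = coverage(H, T1)
score_t2 = coverage(H, T2)
both_judged_implication = score_t1 >= THRESHOLD and score_t2 >= THRESHOLD
```

Both scores equal 4/6, so a plain word-set coverage with any single threshold gives T1 and T2 the same verdict, which is the limitation the paragraph points out.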
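The program in the present embodiment may be any program that causes a computer to execute steps S1 to S4 shown in FIG. 2. By installing this program on a computer and executing it, the implication determination device 2 and the implication determination method in the present embodiment can be realized. In this case, the CPU (Central Processing Unit) of the computer functions as the predicate term structure analysis unit 20, the vector generation unit 21, the combination identification unit 22, and the implication determination unit 23, and performs the processing. In the present embodiment, a storage device such as a hard disk provided in the computer can function as the storage device 3.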
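(Appendix 1)
A text implication determination device for determining whether a first text implies a second text, comprising:
a vector generation unit that acquires the predicate term structure of each of the first text and the second text and, for each of the first text and the second text, generates a vector for each predicate term structure using words other than the words indicating the types of the predicate's arguments in that predicate term structure;
a combination identification unit that compares the vectors generated for each predicate term structure of the first text with the vectors generated for each predicate term structure of the second text and, based on the result of the comparison, identifies combinations of a predicate term structure of the first text and a predicate term structure of the second text; and
an implication determination unit that obtains a feature amount for each identified combination and determines, based on the obtained feature amounts, whether the first text implies the second text.
(Appendix 2)
The text implication determination device according to appendix 1, wherein the predicate term structure includes a predicate contained in the first text or the second text, the words serving as arguments of the predicate, and words indicating the types of the arguments, and
the vector generation unit generates the vector using the predicate and the words serving as arguments of the predicate.
(Appendix 3)
The text implication determination device according to appendix 1 or 2, wherein the combination identification unit calculates the similarity between the vectors generated for each predicate term structure of the first text and the vectors generated for each predicate term structure of the second text and, based on the calculated similarity, identifies combinations of a predicate term structure of the first text and a predicate term structure of the second text.
(Appendix 4)
The text implication determination device according to appendix 3, wherein the combination identification unit, when calculating the similarity, performs normalization according to the amount of information the vectors hold.
(Appendix 5)
The text implication determination device according to any one of appendices 1 to 4, wherein the implication determination unit obtains, for each combination and based on words other than the words indicating the types of the predicate's arguments in the predicate term structure, as the feature amount, either the degree of word coverage between the predicate term structure of the first text and the predicate term structure of the second text or the degree of word matching restricted to the words serving as arguments.
(Appendix 6)
The text implication determination device according to appendix 5, wherein the implication determination unit determines whether the first text implies the second text using structural features of the predicate term structure in addition to the feature amount.
(Appendix 7)
The text implication determination device according to appendix 6, wherein the implication determination unit makes the determination giving priority to either the feature amount or the structural features of the predicate term structure, according to the structural similarity between the predicate term structure of the first text and the predicate term structure of the second text.
(Appendix 8)
The text implication determination device according to any one of appendices 5 to 7, wherein the implication determination unit, when obtaining the feature amount, assigns a weight to the feature amount based on data obtained by machine learning.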
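(Appendix 9)
A text implication determination method for determining whether a first text implies a second text, comprising the steps of:
(a) acquiring the predicate term structure of each of the first text and the second text and, for each of the first text and the second text, generating a vector for each predicate term structure using words other than the words indicating the types of the predicate's arguments in that predicate term structure;
(b) comparing the vectors generated for each predicate term structure of the first text with the vectors generated for each predicate term structure of the second text and, based on the result of the comparison, identifying combinations of a predicate term structure of the first text and a predicate term structure of the second text; and
(c) obtaining a feature amount for each identified combination and determining, based on the obtained feature amounts, whether the first text implies the second text.
(Appendix 10)
The text implication determination method according to appendix 9, wherein the predicate term structure includes a predicate contained in the first text or the second text, the words serving as arguments of the predicate, and words indicating the types of the arguments, and
in step (a), the vector is generated using the predicate and the words serving as arguments of the predicate.
(Appendix 11)
The text implication determination method according to appendix 9 or 10, wherein, in step (b), the similarity between the vectors generated for each predicate term structure of the first text and the vectors generated for each predicate term structure of the second text is calculated and, based on the calculated similarity, combinations of a predicate term structure of the first text and a predicate term structure of the second text are identified.
(Appendix 12)
The text implication determination method according to appendix 11, wherein, in step (b), normalization is performed according to the amount of information the vectors hold when the similarity is calculated.
(Appendix 13)
The text implication determination method according to any one of appendices 9 to 12, wherein, in step (c), for each combination and based on words other than the words indicating the types of the predicate's arguments in the predicate term structure, either the degree of word coverage between the predicate term structure of the first text and the predicate term structure of the second text or the degree of word matching restricted to the words serving as arguments is obtained as the feature amount.
(Appendix 14)
The text implication determination method according to appendix 13, wherein, in step (c), whether the first text implies the second text is determined using structural features of the predicate term structure in addition to the feature amount.
(Appendix 15)
The text implication determination method according to appendix 14, wherein, in step (c), the determination is made giving priority to either the feature amount or the structural features of the predicate term structure, according to the structural similarity between the predicate term structure of the first text and the predicate term structure of the second text.
(Appendix 16)
The text implication determination method according to any one of appendices 13 to 15, wherein, in step (c), when the feature amount is obtained, a weight is assigned to the feature amount based on data obtained by machine learning.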
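(Appendix 17)
A computer-readable recording medium recording a program for determining, by a computer, whether a first text implies a second text,
the program including instructions that cause the computer to execute the steps of:
(a) acquiring the predicate term structure of each of the first text and the second text and, for each of the first text and the second text, generating a vector for each predicate term structure using words other than the words indicating the types of the predicate's arguments in that predicate term structure;
(b) comparing the vectors generated for each predicate term structure of the first text with the vectors generated for each predicate term structure of the second text and, based on the result of the comparison, identifying combinations of a predicate term structure of the first text and a predicate term structure of the second text; and
(c) obtaining a feature amount for each identified combination and determining, based on the obtained feature amounts, whether the first text implies the second text.
(Appendix 18)
The computer-readable recording medium according to appendix 17, wherein the predicate term structure includes a predicate contained in the first text or the second text, the words serving as arguments of the predicate, and words indicating the types of the arguments, and
in step (a), the vector is generated using the predicate and the words serving as arguments of the predicate.
(Appendix 19)
The computer-readable recording medium according to appendix 17 or 18, wherein, in step (b), the similarity between the vectors generated for each predicate term structure of the first text and the vectors generated for each predicate term structure of the second text is calculated and, based on the calculated similarity, combinations of a predicate term structure of the first text and a predicate term structure of the second text are identified.
(Appendix 20)
The computer-readable recording medium according to appendix 19, wherein, in step (b), normalization is performed according to the amount of information the vectors hold when the similarity is calculated.
(Appendix 21)
The computer-readable recording medium according to any one of appendices 17 to 20, wherein, in step (c), for each combination and based on words other than the words indicating the types of the predicate's arguments in the predicate term structure, either the degree of word coverage between the predicate term structure of the first text and the predicate term structure of the second text or the degree of word matching restricted to the words serving as arguments is obtained as the feature amount.
(Appendix 22)
The computer-readable recording medium according to appendix 21, wherein, in step (c), whether the first text implies the second text is determined using structural features of the predicate term structure in addition to the feature amount.
(Appendix 23)
The computer-readable recording medium according to appendix 22, wherein, in step (c), the determination is made giving priority to either the feature amount or the structural features of the predicate term structure, according to the structural similarity between the predicate term structure of the first text and the predicate term structure of the second text.
(Appendix 24)
The computer-readable recording medium according to any one of appendices 21 to 23, wherein, in step (c), when the feature amount is obtained, a weight is assigned to the feature amount based on data obtained by machine learning.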
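2 Implication determination device
3 Storage device
4 Output device
20 Predicate term structure analysis unit
21 Vector generation unit
22 Combination identification unit
23 Implication determination unit
30 Inter-word matching rule storage unit
31 Inter-predicate-term-structure matching rule storage unit
110 Computer
111 CPU
112 Main memory
113 Storage device
114 Input interface
115 Display controller
116 Data reader/writer
117 Communication interface
118 Input device
119 Display device
120 Recording medium
121 Bus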
Claims (10)
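- A text implication determination device for determining whether a first text implies a second text, comprising:
a vector generation unit that acquires the predicate term structure of each of the first text and the second text and, for each of the first text and the second text, generates a vector for each predicate term structure using words other than the words indicating the types of the predicate's arguments in that predicate term structure;
a combination identification unit that compares the vectors generated for each predicate term structure of the first text with the vectors generated for each predicate term structure of the second text and, based on the result of the comparison, identifies combinations of a predicate term structure of the first text and a predicate term structure of the second text; and
an implication determination unit that obtains a feature amount for each identified combination and determines, based on the obtained feature amounts, whether the first text implies the second text.
- The text implication determination device according to claim 1, wherein the predicate term structure includes a predicate contained in the first text or the second text, the words serving as arguments of the predicate, and words indicating the types of the arguments, and
the vector generation unit generates the vector using the predicate and the words serving as arguments of the predicate.
- The text implication determination device according to claim 1 or 2, wherein the combination identification unit calculates the similarity between the vectors generated for each predicate term structure of the first text and the vectors generated for each predicate term structure of the second text and, based on the calculated similarity, identifies combinations of a predicate term structure of the first text and a predicate term structure of the second text.
- The text implication determination device according to claim 3, wherein the combination identification unit, when calculating the similarity, performs normalization according to the amount of information the vectors hold.
- The text implication determination device according to any one of claims 1 to 4, wherein the implication determination unit obtains, for each combination and based on words other than the words indicating the types of the predicate's arguments in the predicate term structure, as the feature amount, either the degree of word coverage between the predicate term structure of the first text and the predicate term structure of the second text or the degree of word matching restricted to the words serving as arguments.
- The text implication determination device according to claim 5, wherein the implication determination unit determines whether the first text implies the second text using structural features of the predicate term structure in addition to the feature amount.
- The text implication determination device according to claim 6, wherein the implication determination unit makes the determination giving priority to either the feature amount or the structural features of the predicate term structure, according to the structural similarity between the predicate term structure of the first text and the predicate term structure of the second text.
- The text implication determination device according to any one of claims 5 to 7, wherein the implication determination unit, when obtaining the feature amount, assigns a weight to the feature amount based on data obtained by machine learning.
- A text implication determination method for determining whether a first text implies a second text, comprising the steps of:
(a) acquiring the predicate term structure of each of the first text and the second text and, for each of the first text and the second text, generating a vector for each predicate term structure using words other than the words indicating the types of the predicate's arguments in that predicate term structure;
(b) comparing the vectors generated for each predicate term structure of the first text with the vectors generated for each predicate term structure of the second text and, based on the result of the comparison, identifying combinations of a predicate term structure of the first text and a predicate term structure of the second text; and
(c) obtaining a feature amount for each identified combination and determining, based on the obtained feature amounts, whether the first text implies the second text.
- A computer-readable recording medium recording a program for determining, by a computer, whether a first text implies a second text,
the program including instructions that cause the computer to execute the steps of:
(a) acquiring the predicate term structure of each of the first text and the second text and, for each of the first text and the second text, generating a vector for each predicate term structure using words other than the words indicating the types of the predicate's arguments in that predicate term structure;
(b) comparing the vectors generated for each predicate term structure of the first text with the vectors generated for each predicate term structure of the second text and, based on the result of the comparison, identifying combinations of a predicate term structure of the first text and a predicate term structure of the second text; and
(c) obtaining a feature amount for each identified combination and determining, based on the obtained feature amounts, whether the first text implies the second text.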
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
SG2013018791A SG188994A1 (en) | 2011-10-20 | 2012-10-04 | Textual entailment recognition apparatus, textual entailment recognition method, and computer-readable recording medium |
JP2013511427A JP5387870B2 (ja) | 2011-10-20 | 2012-10-04 | テキスト含意判定装置、テキスト含意判定方法、及びプログラム |
CN201280003691.XA CN103221947B (zh) | 2011-10-20 | 2012-10-04 | 文本含意辨认装置、文本含意辨认方法和计算机可读记录介质 |
US13/823,546 US8762132B2 (en) | 2011-10-20 | 2012-10-04 | Textual entailment recognition apparatus, textual entailment recognition method, and computer-readable recording medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2011230773 | 2011-10-20 | ||
JP2011-230773 | 2011-10-20 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2013058118A1 (ja) | 2013-04-25 |
Family
ID=48140770
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2012/075765 WO2013058118A1 (ja) | 2011-10-20 | 2012-10-04 | テキスト含意判定装置、テキスト含意判定方法、及びコンピュータ読み取り可能な記録媒体 |
Country Status (5)
Country | Link |
---|---|
US (1) | US8762132B2 (ja) |
JP (1) | JP5387870B2 (ja) |
CN (1) | CN103221947B (ja) |
SG (1) | SG188994A1 (ja) |
WO (1) | WO2013058118A1 (ja) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2014229078A (ja) * | 2013-05-22 | 2014-12-08 | 大学共同利用機関法人情報・システム研究機構 | 自然言語推論システム、自然言語推論方法及びプログラム |
WO2016035273A1 (ja) * | 2014-09-05 | 2016-03-10 | 日本電気株式会社 | テキスト処理システム、テキスト処理方法、及び、コンピュータ・プログラムが記録された記憶媒体 |
WO2016147218A1 (ja) * | 2015-03-18 | 2016-09-22 | 日本電気株式会社 | テキスト監視システム、テキスト監視方法、及び、記録媒体 |
JP2018036725A (ja) * | 2016-08-29 | 2018-03-08 | 日本電信電話株式会社 | 整合性判定装置、方法、及びプログラム |
CN109791569A (zh) * | 2016-10-05 | 2019-05-21 | 国立研究开发法人情报通信研究机构 | 因果关系识别装置及用于其的计算机程序 |
JP2021005391A (ja) * | 2019-05-08 | 2021-01-14 | 日本電気株式会社 | テキスト監視システム、テキスト監視方法、及び、プログラム |
US11544455B2 (en) | 2017-06-21 | 2023-01-03 | Nec Corporation | Information processing device, information processing method, and recording medium |
WO2023144872A1 (ja) * | 2022-01-25 | 2023-08-03 | 日本電気株式会社 | 文書分類装置、文書分類方法、および文書分類プログラム |
WO2023144871A1 (ja) * | 2022-01-25 | 2023-08-03 | 日本電気株式会社 | 文書分類装置、文書分類方法、および文書分類プログラム |
Families Citing this family (42)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5743938B2 (ja) * | 2012-03-26 | 2015-07-01 | 株式会社日立製作所 | 連想検索システム、連想検索サーバ及びプログラム |
US20140372102A1 (en) * | 2013-06-18 | 2014-12-18 | Xerox Corporation | Combining temporal processing and textual entailment to detect temporally anchored events |
US9916289B2 (en) * | 2013-09-10 | 2018-03-13 | Embarcadero Technologies, Inc. | Syndication of associations relating data and metadata |
JP5907393B2 (ja) | 2013-12-20 | 2016-04-26 | 国立研究開発法人情報通信研究機構 | 複雑述語テンプレート収集装置、及びそのためのコンピュータプログラム |
JP5904559B2 (ja) | 2013-12-20 | 2016-04-13 | 国立研究開発法人情報通信研究機構 | シナリオ生成装置、及びそのためのコンピュータプログラム |
JP6403382B2 (ja) | 2013-12-20 | 2018-10-10 | 国立研究開発法人情報通信研究機構 | フレーズペア収集装置、及びそのためのコンピュータプログラム |
US20150199339A1 (en) * | 2014-01-14 | 2015-07-16 | Xerox Corporation | Semantic refining of cross-lingual information retrieval results |
US9619457B1 (en) * | 2014-06-06 | 2017-04-11 | Google Inc. | Techniques for automatically identifying salient entities in documents |
WO2016013175A1 (ja) * | 2014-07-22 | 2016-01-28 | 日本電気株式会社 | テキスト処理システム、テキスト処理方法およびテキスト処理プログラム |
WO2016013157A1 (ja) * | 2014-07-23 | 2016-01-28 | 日本電気株式会社 | テキスト処理システム、テキスト処理方法およびテキスト処理プログラム |
JP6551968B2 (ja) * | 2015-03-06 | 2019-07-31 | 国立研究開発法人情報通信研究機構 | 含意ペア拡張装置、そのためのコンピュータプログラム、及び質問応答システム |
CN106250370B (zh) * | 2016-08-02 | 2019-06-11 | 海信集团有限公司 | 一种获取近义词的方法和装置 |
US20220253611A1 (en) * | 2017-05-10 | 2022-08-11 | Oracle International Corporation | Techniques for maintaining rhetorical flow |
US10599885B2 (en) * | 2017-05-10 | 2020-03-24 | Oracle International Corporation | Utilizing discourse structure of noisy user-generated content for chatbot learning |
US11386274B2 (en) * | 2017-05-10 | 2022-07-12 | Oracle International Corporation | Using communicative discourse trees to detect distributed incompetence |
US11373632B2 (en) * | 2017-05-10 | 2022-06-28 | Oracle International Corporation | Using communicative discourse trees to create a virtual persuasive dialogue |
US10817670B2 (en) * | 2017-05-10 | 2020-10-27 | Oracle International Corporation | Enabling chatbots by validating argumentation |
US11586827B2 (en) * | 2017-05-10 | 2023-02-21 | Oracle International Corporation | Generating desired discourse structure from an arbitrary text |
US11960844B2 (en) * | 2017-05-10 | 2024-04-16 | Oracle International Corporation | Discourse parsing using semantic and syntactic relations |
US10679011B2 (en) * | 2017-05-10 | 2020-06-09 | Oracle International Corporation | Enabling chatbots by detecting and supporting argumentation |
US11615145B2 (en) | 2017-05-10 | 2023-03-28 | Oracle International Corporation | Converting a document into a chatbot-accessible form via the use of communicative discourse trees |
US12001804B2 (en) * | 2017-05-10 | 2024-06-04 | Oracle International Corporation | Using communicative discourse trees to detect distributed incompetence |
JP7086993B2 (ja) * | 2017-05-10 | 2022-06-20 | オラクル・インターナショナル・コーポレイション | コミュニケーション用談話ツリーの使用による修辞学的分析の可能化 |
US10839154B2 (en) * | 2017-05-10 | 2020-11-17 | Oracle International Corporation | Enabling chatbots by detecting and supporting affective argumentation |
JP6726638B2 (ja) * | 2017-05-11 | 2020-07-22 | 日本電信電話株式会社 | 含意認識装置、方法、及びプログラム |
US10839161B2 (en) | 2017-06-15 | 2020-11-17 | Oracle International Corporation | Tree kernel learning for text classification into classes of intent |
US11100144B2 (en) | 2017-06-15 | 2021-08-24 | Oracle International Corporation | Data loss prevention system for cloud security based on document discourse analysis |
US11182412B2 (en) | 2017-09-27 | 2021-11-23 | Oracle International Corporation | Search indexing using discourse trees |
CN117114001A (zh) | 2017-09-28 | 2023-11-24 | 甲骨文国际公司 | 基于命名实体的解析和识别确定跨文档的修辞相互关系 |
EP3688626A1 (en) | 2017-09-28 | 2020-08-05 | Oracle International Corporation | Enabling autonomous agents to discriminate between questions and requests |
EP3746916A1 (en) | 2018-01-30 | 2020-12-09 | Oracle International Corporation | Using communicative discourse trees to detect a request for an explanation |
US11537645B2 (en) | 2018-01-30 | 2022-12-27 | Oracle International Corporation | Building dialogue structure by using communicative discourse trees |
GB2572132A (en) | 2018-02-08 | 2019-09-25 | George Thompson Trevor | Document analysis method and apparatus |
US11023684B1 (en) * | 2018-03-19 | 2021-06-01 | Educational Testing Service | Systems and methods for automatic generation of questions from text |
JP7258047B2 (ja) | 2018-05-09 | 2023-04-14 | オラクル・インターナショナル・コーポレイション | 収束質問に対する回答を改善するための仮想談話ツリーの構築 |
US11455494B2 (en) | 2018-05-30 | 2022-09-27 | Oracle International Corporation | Automated building of expanded datasets for training of autonomous agents |
CN109165300B (zh) * | 2018-08-31 | 2020-08-11 | 中国科学院自动化研究所 | 文本蕴含识别方法及装置 |
US11449682B2 (en) | 2019-08-29 | 2022-09-20 | Oracle International Corporation | Adjusting chatbot conversation to user personality and mood |
US11775772B2 (en) | 2019-12-05 | 2023-10-03 | Oracle International Corporation | Chatbot providing a defeating reply |
CN113064983B (zh) * | 2021-04-23 | 2024-04-26 | 深圳壹账通智能科技有限公司 | 语义检测方法、装置、计算机设备及存储介质 |
CN113901215B (zh) * | 2021-10-09 | 2022-04-26 | 延边大学 | 一种融合高低层语义信息的文本蕴含识别方法 |
WO2023212524A1 (en) * | 2022-04-25 | 2023-11-02 | Gyan, Inc. (A Delaware Corporation) | An explainable natural language understanding platform |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH09128401A (ja) * | 1995-10-27 | 1997-05-16 | Sharp Corp | 動画像検索装置及びビデオ・オン・デマンド装置 |
JP2004272352A (ja) * | 2003-03-05 | 2004-09-30 | Nippon Telegr & Teleph Corp <Ntt> | 類似度計算方法、装置、プログラムおよび該プログラムを格納した記録媒体 |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060245641A1 (en) * | 2005-04-29 | 2006-11-02 | Microsoft Corporation | Extracting data from semi-structured information utilizing a discriminative context free grammar |
CN101238459A (zh) * | 2005-05-13 | 2008-08-06 | 柯廷技术大学 | 比较文本文件 |
US8180633B2 (en) * | 2007-03-08 | 2012-05-15 | Nec Laboratories America, Inc. | Fast semantic extraction using a neural network architecture |
US7890539B2 (en) * | 2007-10-10 | 2011-02-15 | Raytheon Bbn Technologies Corp. | Semantic matching using predicate-argument structure |
US8392436B2 (en) * | 2008-02-07 | 2013-03-05 | Nec Laboratories America, Inc. | Semantic search via role labeling |
US20110119050A1 (en) * | 2009-11-18 | 2011-05-19 | Koen Deschacht | Method for the automatic determination of context-dependent hidden word distributions |
US8554542B2 (en) * | 2010-05-05 | 2013-10-08 | Xerox Corporation | Textual entailment method for linking text of an abstract to text in the main body of a document |
CN101908042B (zh) * | 2010-08-09 | 2016-04-13 | 中国科学院自动化研究所 | 一种双语联合语义角色的标注方法 |
-
2012
- 2012-10-04 SG SG2013018791A patent/SG188994A1/en unknown
- 2012-10-04 CN CN201280003691.XA patent/CN103221947B/zh not_active Expired - Fee Related
- 2012-10-04 WO PCT/JP2012/075765 patent/WO2013058118A1/ja active Application Filing
- 2012-10-04 US US13/823,546 patent/US8762132B2/en active Active
- 2012-10-04 JP JP2013511427A patent/JP5387870B2/ja active Active
Non-Patent Citations (4)
Title |
---|
ASHER STERN ET AL.: "Rule Chaining and Approximate Match in textual inference", 27 October 2010 (2010-10-27), XP003031176, Retrieved from the Internet <URL:http://www.nist.gov/tac/publications/2010/participant.papers/BIU.proceedings.pdf> [retrieved on 20121022] * |
MICHITAKA ODANI ET AL.: "Iikae Hyogen no Jutsugoko Kozo eno Seikika to Text Gan'i Kankei Ninshiki deno Riyo", PROCEEDINGS OF THE 15TH ANNUAL MEETING OF THE ASSOCIATION FOR NATURAL LANGUAGE PROCESSING, 2 March 2009 (2009-03-02), pages 260 - 263 * |
SHUYA ABE ET AL.: "Event Relation Acquisition with Syntactic Patterns and Shared Arguments", JOURNAL OF NATURAL LANGUAGE PROCESSING, vol. 17, no. 1, January 2010 (2010-01-01), pages 121 - 139, XP003031177 * |
TOMOHIDE SHIBATA ET AL.: "Acquiring Strongly-related Events using Predicate-argument Co-occurring Statistics and Case Frames", IPSJ SIG NOTES, vol. 2011-NL-, no. 2, 15 October 2011 (2011-10-15), pages 1 - 8, XP003031178 * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
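JP2014229078A (ja) * | 2013-05-22 | 2014-12-08 | Research Organization of Information and Systems | Natural language inference system, natural language inference method, and program |
WO2016035273A1 (ja) * | 2014-09-05 | 2016-03-10 | NEC Corporation | Text processing system, text processing method, and storage medium storing computer program |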
US10339223B2 (en) | 2014-09-05 | 2019-07-02 | Nec Corporation | Text processing system, text processing method and storage medium storing computer program |
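WO2016147218A1 (ja) * | 2015-03-18 | 2016-09-22 | NEC Corporation | Text monitoring system, text monitoring method, and recording medium |
JPWO2016147218A1 (ja) * | 2015-03-18 | 2017-12-14 | NEC Corporation | Text monitoring system, text monitoring method, and program |
JP2018036725A (ja) * | 2016-08-29 | 2018-03-08 | Nippon Telegraph and Telephone Corporation | Consistency determination device, method, and program |
CN109791569A (zh) * | 2016-10-05 | 2019-05-21 | National Institute of Information and Communications Technology | Causal relation recognition device and computer program therefor |
CN109791569B (zh) * | 2016-10-05 | 2023-07-04 | National Institute of Information and Communications Technology | Causal relation recognition device and storage medium |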
US11544455B2 (en) | 2017-06-21 | 2023-01-03 | Nec Corporation | Information processing device, information processing method, and recording medium |
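JP2021005391A (ja) * | 2019-05-08 | 2021-01-14 | NEC Corporation | Text monitoring system, text monitoring method, and program |
WO2023144872A1 (ja) * | 2022-01-25 | 2023-08-03 | NEC Corporation | Document classification device, document classification method, and document classification program |
WO2023144871A1 (ja) * | 2022-01-25 | 2023-08-03 | NEC Corporation | Document classification device, document classification method, and document classification program |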
Also Published As
Publication number | Publication date |
---|---|
CN103221947A (zh) | 2013-07-24 |
JP5387870B2 (ja) | 2014-01-15 |
US20130204611A1 (en) | 2013-08-08 |
SG188994A1 (en) | 2013-05-31 |
JPWO2013058118A1 (ja) | 2015-04-02 |
CN103221947B (zh) | 2016-05-25 |
US8762132B2 (en) | 2014-06-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5387870B2 (ja) | Text implication determination device, text implication determination method, and program | |
US9858264B2 (en) | Converting a text sentence to a series of images | |
KR102345455B1 (ko) | Computer-generated natural language output in a question answering system | |
US9448772B2 (en) | Generating program fragments using keywords and context information | |
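KR101893090B1 (ko) | Vulnerability information management method and apparatus therefor | |
WO2019037258A1 (zh) | Information recommendation device, method, system, and computer-readable storage medium | |
JP6663826B2 (ja) | Computer and response generation method | |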
US20160188569A1 (en) | Generating a Table of Contents for Unformatted Text | |
CN112560483A (zh) | Automatically detecting personal information in free text | |
US20130159346A1 (en) | Combinatorial document matching | |
CA3207902A1 (en) | Auditing citations in a textual document | |
KR20210050139A (ko) | Apparatus and method for calculating image similarity | |
US20140236571A1 (en) | Inducing and Applying a Subject-Targeted Context Free Grammar | |
US8769700B2 (en) | Method, apparatus and computer program for supporting determination on degree of confidentiality of document | |
US20230095036A1 (en) | Method and system for proficiency identification | |
Nagy et al. | Improving fake news classification using dependency grammar | |
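JP6194180B2 (ja) | Text masking device and text masking program | |
JP2018077604A (ja) | Artificial intelligence device that automatically identifies infringement candidates for implementation means and methods from functional descriptions | |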
Wei et al. | Motif-based hyponym relation extraction from wikipedia hyperlinks | |
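JP6201702B2 (ja) | Semantic information classification program and information processing device | |
JP2014112306A (ja) | Request sentence extraction device, request content identification model learning device, method, and program | |
JP2018073298A (ja) | Method for automatic extraction and creation of means and methods by an artificial intelligence device | |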
EP4339817A1 (en) | Anomalous command line entry detection | |
JP2019003237A (ja) | Presentation method, presentation device, and presentation program | |
Gupta et al. | Using variant directional dis (similarity) measures for the task of textual entailment |
Legal Events
Code | Title | Description |
---|---|---|
ENP | Entry into the national phase | Ref document number: 2013511427; Country of ref document: JP; Kind code of ref document: A |
WWE | Wipo information: entry into national phase | Ref document number: 13823546; Country of ref document: US |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 12842600; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 12842600; Country of ref document: EP; Kind code of ref document: A1 |