US11301441B2 - Information processing system and information processing method - Google Patents

Information processing system and information processing method

Publication number
US11301441B2
Authority
US
United States
Prior art keywords
word
tree structure
specific
structure pattern
rule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US15/923,875
Other languages
English (en)
Other versions
US20190026324A1 (en)
Inventor
Misa SATO
Kohsuke Yanai
Toshihiko Yanase
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd
Assigned to HITACHI, LTD. Assignment of assignors' interest (see document for details). Assignors: SATO, MISA; YANAI, KOHSUKE; YANASE, TOSHIHIKO
Publication of US20190026324A1
Application granted
Publication of US11301441B2
Legal status: Active (current)
Adjusted expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/217 Database tuning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Definitions

  • the present invention relates to an information processing system and an information processing method for processing information.
  • Japanese Patent Application Laid-Open Publication No. 2006-171969 discloses a document processing apparatus capable of outputting a keyword having a specific attribute.
  • the document processing apparatus gives a morphological analysis unit a character string included in the medical report, and the morphological analysis unit divides the character string into words and generates a word list.
  • a word extraction unit determines whether a thesaurus code of a selected word in the word list specified by a thesaurus search unit meets an extraction condition, extracts the selected word in the case where the code meets the condition, and outputs the selected word as a keyword.
  • Japanese Patent Application Laid-Open Publication No. 2008429662 discloses an information extraction device for performing subtree matching at high speed.
  • the information extraction device includes a word dictionary; an analyzing unit for generating a syntax tree for each sentence in text; a parent-child index generation unit for generating a parent-child index by extracting a parent-child relation between words from the syntax tree, generating a key indicating the parent-child relation, and associating the key with a syntax tree ID for specifying the syntax tree and a node ID column in the syntax tree of words included in the parent-child relation; and an extraction unit for specifying an extraction target from a targeted syntax tree to perform action to the specified target, by reading an information extraction rule including a rule condition syntax tree and the action, generating a search key in the same format as the parent-child index from the rule condition syntax tree of a search condition, narrowing down the targeted syntax trees through searching for the parent-child index by use of the generated search key and performing matching of the rule condition syntax tree with the syntax tree, and performing mapping
  • the information extraction device assigns a unique index describing a parent-child relation of words to text and a rule, and in advance narrows down syntax trees to be targeted for information extraction.
  • Tgrep2 is a grep-like tool for syntax tree expressions.
  • the tool enables searching syntax tree expressions with a query that specifies a particular syntax tree.
  • Document Levy (Levy, R. and Andrew, G.: Tregex and Tsurgeon: tools for querying and manipulating tree data structures, in Proceedings of LREC-2006, 2006.) discloses a syntax tree query tool Tregex having richer expression than Tgrep2.
  • the syntax tree query tool Tregex extracts relations according to a syntax rule described in one line.
  • the object of the present invention is to facilitate database maintenance.
  • An aspect of the invention disclosed in this application is an information processing system comprising a processor for executing a program, a storage device for storing the program, a word dictionary database for storing a word group corresponding to a group of words grouped according to a predetermined attribute, and a rule database for storing a tree structure pattern obtained by abstracting, by use of the word group, tree structure data indicating relations between words in a sentence.
  • the processor executes acceptance processing of accepting a maintenance request, and maintenance processing of, when the maintenance request accepted in the acceptance processing is a maintenance request related to a word, maintaining in the word dictionary database the word group to which the word belongs, and, when the maintenance request is a maintenance request related to the tree structure pattern, maintaining the tree structure pattern in the rule database.
  • FIG. 1 is an explanatory diagram illustrating a database maintenance example 1.
  • FIG. 2 is an explanatory diagram illustrating a database maintenance example 2.
  • FIG. 3 is an explanatory diagram illustrating a database maintenance example 3.
  • FIG. 4 is a block diagram illustrating an example of a hardware configuration of a computer.
  • FIG. 5 is an explanatory diagram illustrating an example of storage contents in the word dictionary DB.
  • FIG. 6 is an explanatory diagram illustrating an example of storage contents in the rule DB.
  • FIG. 7 is an explanatory diagram illustrating an example of storage contents in the data store.
  • FIG. 8 is an explanatory diagram illustrating one example of the sentence.
  • FIG. 9 is an explanatory diagram illustrating one example of the tree structure data and the tree structure pattern.
  • FIG. 10 is an explanatory diagram illustrating one example of a pattern expression.
  • FIG. 11 is an explanatory diagram illustrating an example of conversion by use of the pattern expression shown in FIG. 10 .
  • FIG. 12 is a flowchart indicating an example of information processing procedure by the information processing system.
  • FIG. 13 is an explanatory diagram illustrating a use example of the information processing system.
  • FIG. 14 is an explanatory diagram illustrating a display screen example 1 of the information processing system.
  • FIG. 15 is an explanatory diagram illustrating a display screen example 2 of the information processing system.
  • FIG. 16 is an explanatory diagram illustrating a display screen example 3 of the information processing system.
  • FIG. 17 is an explanatory diagram illustrating a display screen example 4 of the information processing system.
  • FIG. 18 is an explanatory diagram illustrating a display screen example 5 of the information processing system.
  • FIG. 19 is an explanatory diagram illustrating a display screen example 6 of the information processing system.
  • FIG. 20 is a flowchart indicating an example of processing procedure in a use example of the information processing system.
  • FIG. 1 is an explanatory diagram illustrating a database maintenance example 1.
  • the word dictionary DB 101 stores one or more word groups.
  • a word group herein is a group of words grouped according to a predetermined attribute.
  • a predetermined attribute herein is a feature exhibited by a targeted word group. Specific examples of the predetermined attribute in a Japanese sentence include a verb whose subject is followed by the case particle “ga” (a postpositional particle) and a verb co-occurring with a specific adverb.
  • the predetermined attribute may be a synonym or a similar word, or a word used in a specific field (investment, medical care, etc.).
  • a word group Ga is a synonym group including “suppress” and “decrease.”
  • a rule DB 102 is a database for storing a tree structure pattern indicating a rule.
  • a tree structure pattern herein is data in which the tree structure data indicating relations between words in a sentence is abstracted by use of a word group.
  • Tree structure data herein is, for example, a syntax tree generated according to a phrase structure rule by morphological analysis and dependency analysis (hereinafter, referred to as parsing).
  • a rule Ra in FIG. 1 has a tree structure pattern in which a subject (wild card), a predicate and an object (wild card) are included in this word order, and a verb constituting a predicate belongs to the word group Ga.
  • a data store 103 stores text data of various types of sentences (for example, sentences in academic papers and books, sentences in newspapers and magazines, sentences described on web pages, etc.).
  • each sentence in the search result 112 is text data meeting the rule Ra; compared with the search result 111, the sentences “Z reduces D.” and “X is going to reduce E.”, each including “reduce”, are added.
  • Simply maintaining the word dictionary DB 101 enables searching so as to satisfy the maintenance result of the word dictionary DB 101 without maintaining the rule DB 102 .
  • each sentence in the search result 111 is text data meeting the rule Ra; the sentences “Z reduces D.” and “X is going to reduce E.” of the search result 112, each including “reduce”, are not found.
  • deletion or addition of a word may be performed as described above.
  • “reduce” is deleted from the word group Ga and “drop” is added.
  • simply maintaining the word dictionary DB 101 enables searching so as to satisfy the maintenance result of the word dictionary DB 101 without maintaining the rule DB 102 .
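The effect described for FIG. 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: sentences are assumed to be already parsed into (subject, verb lemma, object) triples, the sample sentences other than “Z reduces D.” are invented, and the names `word_dictionary`, `rule_ra`, `data_store` and `search` are hypothetical.

```python
# Hypothetical stand-in for the word dictionary DB 101: group ID -> verb lemmas.
word_dictionary = {"Ga": {"suppress", "decrease"}}

# Hypothetical stand-in for the rule Ra: wild-card subject and object,
# verb taken from the word group Ga.
rule_ra = {"subject": "*", "verb_group": "Ga", "object": "*"}

# Assumed pre-parsed data store: (sentence text, subject, verb lemma, object).
data_store = [
    ("X suppresses A.", "X", "suppress", "A"),
    ("Y decreases B.", "Y", "decrease", "B"),
    ("Z reduces D.", "Z", "reduce", "D"),
]


def search(rule, dictionary, store):
    """Return the sentences whose verb lemma belongs to the rule's word group."""
    verbs = dictionary[rule["verb_group"]]
    return [text for text, _subj, verb, _obj in store if verb in verbs]


before = search(rule_ra, word_dictionary, data_store)
word_dictionary["Ga"].add("reduce")  # maintain only the dictionary
after = search(rule_ra, word_dictionary, data_store)
```

In this sketch the rule object is never modified; the change in the result comes entirely from the edit to the word group.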
  • FIG. 2 is an explanatory diagram illustrating a database maintenance example 2. With reference to FIG. 2 , maintenance of the rule DB 102 is described.
  • (A) is the same as (A) shown in FIG. 1 .
  • (B) illustrates a rule Rb added newly.
  • the rule Rb has a tree structure pattern in which a subject (wild card), a predicate (auxiliary verb (wild card) and verb) and an object (wild card) are included in this word order, and a verb belongs to the word group Ga. That is, the rule Rb has the tree structure pattern in which an auxiliary verb is added to the rule Ra.
  • a search result 210 is obtained.
  • Each sentence in the search result 210 is text data meeting the rule Rb.
  • the rule Rb is simply deleted from the rule DB 102 , and there is no need to maintain the word dictionary DB 101 .
  • deletion or addition of a rule may be performed as described above.
  • the rule Ra may be called and an auxiliary verb (wild card) may be added in front of a verb (word group Ga).
  • simply maintaining the rule DB 102 enables searching so as to satisfy the maintenance result of the rule DB 102 without maintaining the word dictionary DB 101 .
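The converse case of FIG. 2 can be sketched in the same spirit. The encoding below is an assumption made only for illustration: a rule is a small dictionary, `"aux": "*"` means that some auxiliary verb must be present, `"aux": None` places no constraint on it, and the sample sentences are invented.

```python
# Hypothetical word dictionary; it is never modified in this sketch.
word_dictionary = {"Ga": {"suppress", "decrease"}}

# Hypothetical rule DB 102: rule ID -> pattern.
rule_db = {"Ra": {"verb_group": "Ga", "aux": None}}

# Assumed pre-parsed sentences: (text, auxiliary verb or None, verb lemma).
data_store = [
    ("X suppresses A.", None, "suppress"),
    ("Y will decrease B.", "will", "decrease"),
]


def matches(pattern, aux, verb, dictionary):
    """Check the verb against the word group, then the auxiliary constraint."""
    if verb not in dictionary[pattern["verb_group"]]:
        return False
    return pattern["aux"] is None or aux is not None


def search(rule_id):
    """Return the sentences meeting the rule registered under rule_id."""
    pattern = rule_db[rule_id]
    return [text for text, aux, verb in data_store
            if matches(pattern, aux, verb, word_dictionary)]


rule_db["Rb"] = {"verb_group": "Ga", "aux": "*"}  # add the rule Rb
result_rb = search("Rb")
del rule_db["Rb"]                                 # deleting Rb is equally local
```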
  • FIG. 3 is an explanatory diagram illustrating a database maintenance example 3.
  • maintenance of the rule DB 102 is described.
  • a word group is used in a rule
  • the rule Ra uses the word group Ga
  • FIG. 4 is a block diagram illustrating an example of a hardware configuration of a computer.
  • the computer 400 has a processor 401, a storage device 402, an input device 403, an output device 404 and a communication interface (communication IF) 405.
  • the processor 401, the storage device 402, the input device 403, the output device 404 and the communication IF 405 are connected by a bus 406.
  • the processor 401 controls the computer 400 .
  • the storage device 402 serves as a work area of the processor 401 .
  • the storage device 402 is a non-transitory or transitory storage medium for storing various types of programs and data.
  • Examples of the storage device 402 include a ROM (Read Only Memory), a RAM (Random Access Memory), an HDD (Hard Disk Drive), and a flash memory.
  • the input device 403 inputs data. Examples of the input device 403 include a keyboard, a mouse, a touch panel, a numeric keypad, and a scanner.
  • the output device 404 outputs data. Examples of the output device 404 include a display and a printer.
  • the communication IF 405 is connected to the network to transmit and receive data.
  • the word dictionary DB 101, the rule DB 102, and the data store 103 may be realized by the storage device 402 in the computer 400 shown in FIG. 4, or may be realized by another computer accessible via the communication IF 405. It is noted that in the following description of a database or a table, a value of an AA field bbb (AA is a field name, and bbb is a reference code) may be expressed as AA bbb. For example, the value of the group ID field 501 is expressed as a group ID 501.
  • FIG. 5 is an explanatory diagram illustrating an example of storage contents in the word dictionary DB 101 .
  • the word dictionary DB 101 has a group ID field 501 , an attribute field 502 , a word field 503 , and a part of speech field 504 .
  • the combination of values of the fields 501 to 504 in the same line defines an entry indicating one word group.
  • the group ID field 501 is a storage area for storing group IDs.
  • the group ID 501 is identification information for uniquely specifying word groups.
  • the attribute field 502 is a storage area for storing attributes.
  • the attribute 502 is a feature exhibited by a targeted word group. Specific examples in a Japanese sentence include a verb whose subject is followed by the case particle “ga” (a postpositional particle) and a verb co-occurring with a specific adverb.
  • an attribute herein may be a synonym or a similar word, or a word used in a specific field (investment, medical care, etc.).
  • the word field 503 is a storage area for storing words.
  • the word 503 is a word belonging to a targeted word group.
  • An operator (user or administrator) can add, change, and delete the word 503 with respect to the word field 503 .
  • the part of speech field 504 is a storage area for storing parts of speech.
  • the part of speech 504 is a classification of words belonging to a word group classified according to form and role. It is noted that the part of speech 504 may specify a form of words.
  • a verb is specified from among, for example, base form (present form), past tense, past participle and present progressive form;
  • a noun is specified from among, for example, uncountable noun, countable noun, singular form and plural form;
  • each of an adjective and an adverb is specified from among, for example, positive degree, comparative degree, and superlative degree.
  • all forms of the part of speech 504 may be included.
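The entry layout of FIG. 5 could be modeled roughly as below. This is a sketch only: the class and method names are hypothetical, one part of speech is stored per group for simplicity, and the behavior of deleting a word from every group when no group ID is given follows the deletion step described later for FIG. 12.

```python
from dataclasses import dataclass, field


@dataclass
class WordGroup:
    group_id: str                             # group ID 501
    attribute: str                            # attribute 502, e.g. "synonym"
    words: set = field(default_factory=set)   # words 503
    part_of_speech: str = "verb"              # part of speech 504


class WordDictionaryDB:
    """In-memory stand-in for the word dictionary DB 101."""

    def __init__(self):
        self._groups = {}

    def put_group(self, group):
        self._groups[group.group_id] = group

    def add_word(self, group_id, word):
        self._groups[group_id].words.add(word)

    def delete_word(self, word, group_id=None):
        """Delete from one group, or from every group when no ID is given."""
        targets = [self._groups[group_id]] if group_id else list(self._groups.values())
        for group in targets:
            group.words.discard(word)

    def words(self, group_id):
        return frozenset(self._groups[group_id].words)
```

For example, `db.add_word("Ga", "reduce")` corresponds to the addition shown in FIG. 1, and `db.delete_word("reduce")` removes the word from all groups.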
  • FIG. 6 is an explanatory diagram illustrating an example of storage contents in the rule DB 102 .
  • the rule DB 102 has a rule ID field 601 and a tree structure pattern field 602 .
  • the combination of values of the fields 601 and 602 in the same line defines an entry indicating one rule.
  • the rule ID field 601 is a storage area for storing rule IDs.
  • the rule ID 601 is identification information for uniquely specifying rules.
  • the tree structure pattern field 602 is a storage area for storing tree structure patterns. An operator can add, change, and delete the tree structure pattern 602 with respect to the tree structure pattern field 602 .
  • FIG. 3 illustrates a rule in which a word group is used as a verb in the tree structure pattern 602 , and a wild card is used as each of a subject and an object.
  • a word group may be used as a word or phrase other than a predicate, such as a subject or an object, and a wild card may be used as another word or phrase.
  • Another rule may be used, in which a plurality of word groups are specified in one tree structure pattern 602 .
  • FIG. 7 is an explanatory diagram illustrating an example of storage contents in the data store 103 .
  • the data store 103 has an index field 701, a sentence field 702, and a tree structure data field 703. The combination of values of the fields 701 to 703 in the same line defines an entry with respect to one sentence.
  • the index field 701 is a storage area for storing indexes, and is used for index search.
  • the index field 701 has a plurality of lemma fields ( FIG. 7 illustrates three fields of a lemma a 0 field 710 , a lemma a 1 field 711 and a lemma a 2 field 712 ).
  • the lemma a 0 field 710 is a storage area for storing the indexes 701 each having been set in advance as a lemma a 0.
  • Each of the lemma a 1 field 711 and the lemma a 2 field 712 is a storage area for storing the indexes 701 serving as a lemma a 1 and a lemma a 2, respectively.
  • the initial state of each of the lemma a 1 field 711 and the lemma a 2 field 712 is blank, and the lemmas a 1 and a 2 are added thereto respectively at the time of index-up
  • the sentence field 702 is a storage area for storing sentences.
  • the sentence 702 is text data to be parsed to obtain the tree structure data 703 .
  • the tree structure data field 703 is a storage area for storing tree structure data each obtained by parsing a sentence according to a phrase structure rule.
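One possible in-memory shape for an entry of FIG. 7, with a naive index search over the three lemma fields, is sketched below. The names `SentenceEntry` and `index_search` are hypothetical, and a real data store would presumably use a proper index rather than a linear scan.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SentenceEntry:
    sentence: str                  # sentence 702
    tree: Optional[tuple] = None   # tree structure data 703
    a0: Optional[str] = None       # lemma a0 field 710, set in advance
    a1: Optional[str] = None       # lemma a1 field 711, blank initially
    a2: Optional[str] = None       # lemma a2 field 712, blank initially


def index_search(entries, keyword):
    """Return the entries whose index (lemma a0, a1 or a2) equals the keyword."""
    if not keyword:
        return []  # never match blank lemma fields
    return [e for e in entries if keyword in (e.a0, e.a1, e.a2)]
```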
  • FIG. 8 is an explanatory diagram illustrating one example of the sentence 702 .
  • FIG. 8 illustrates one example of an English sentence st 1 .
  • the sentence 702 may be in another language such as in Japanese, without being limited to English.
  • FIG. 9 is an explanatory diagram illustrating one example of the tree structure data and the tree structure pattern.
  • a tree structure data tr 1 is a syntax tree obtained by parsing the sentence st 1 shown in FIG. 8 according to a phrase structure rule.
  • “POS” indicates a part of speech
  • “ROOT” indicates a root of the syntax tree.
  • An alphabet string having one to three letters in capitals indicates a type of part of speech (noun, verb, etc.).
  • a tree structure pattern tp 1 is a pattern obtained when an operator deletes unnecessary information from the tree structure data tr 1 and edits the result.
  • the tree structure pattern tp 1 indicates a rule specifying a subject (wild card), a predicate (“spin off” as a verb) and an object (wild card) in this word order.
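The relation between tree structure data and a tree structure pattern can be illustrated with nested tuples. Everything here is an assumption for illustration: the tuple encoding `(label, child, ...)`, the `"*"` wild card, the node labels, and the sample sentence content (the actual sentence st 1 of FIG. 8 is not reproduced in this text).

```python
WILDCARD = "*"


def match(pattern, tree):
    """True if tree has the same shape as pattern; "*" matches any subtree."""
    if pattern == WILDCARD:
        return True
    if isinstance(pattern, str) or isinstance(tree, str):
        return pattern == tree
    return (len(pattern) == len(tree)
            and all(match(p, t) for p, t in zip(pattern, tree)))


def find(pattern, tree):
    """True if the pattern matches the tree itself or any of its subtrees."""
    if match(pattern, tree):
        return True
    return not isinstance(tree, str) and any(find(pattern, c) for c in tree[1:])


# Hypothetical tree structure data for a sentence with the verb "spin off".
tr1 = ("ROOT",
       ("S",
        ("NP", "Nichiritsu"),
        ("VP", ("VB", "spin off"), ("NP", "home appliance"))))

# Pattern after editing: subject and object generalized to wild cards.
tp1 = ("S", ("NP", WILDCARD), ("VP", ("VB", "spin off"), ("NP", WILDCARD)))
```

With these definitions `find(tp1, tr1)` holds, while a tree whose verb leaf is a different word does not match.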
  • FIG. 10 is an explanatory diagram illustrating one example of a pattern expression.
  • a pattern expression 1000 is used when the information processing system executes information processing. By recognizing the pattern expression 1000 , an operator can edit the tree structure data 703 to generate the tree structure pattern 602 .
  • “_” expresses determination of a leaf node (leaf of a syntax tree); “
  • FIG. 11 is an explanatory diagram illustrating an example of conversion by use of the pattern expression shown in FIG. 10 .
  • the selection of “increase” or “cause” whose part of speech (POS) is a verb (VP) in a tree structure data tr 11 is converted to calling ( ⁇ dic.) of a word group of “affect” in the group ID 501 .
  • a tree structure pattern tp 11 including the word group is generated.
  • Such conversion is executed upon edit operation by an operator.
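The conversion of FIG. 11 can be sketched as a tree rewrite. The notation of the actual pattern expression 1000 is not recoverable from this text, so the marker `("GROUP", group_id)` used below to stand for a call of a word group is purely hypothetical, as are the tuple encoding and the node labels.

```python
def generalize(tree, group_id, words):
    """Replace leaf words that belong to `words` with a reference to the word group."""
    if isinstance(tree, str):
        return ("GROUP", group_id) if tree in words else tree
    # Keep the node label (first element) and rewrite the children.
    return (tree[0],) + tuple(generalize(child, group_id, words) for child in tree[1:])


# Hypothetical tree structure data containing the verb "increase".
tr11 = ("S", ("NP", "X"), ("VP", "increase"), ("NP", "A"))

# Word group "affect" is assumed to contain "increase" and "cause".
tp11 = generalize(tr11, "affect", {"increase", "cause"})
```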
  • FIG. 12 is a flowchart indicating an example of information processing procedure by the information processing system.
  • the information processing system waits for a maintenance request (step S 1201 ; No).
  • the maintenance request is issued by the processor 401, or given by a terminal via the communication IF 405 or by the input device 403.
  • the information processing system determines whether the maintenance request is a maintenance request related to a word or a maintenance request related to a rule (tree structure pattern), on the basis of the information included in the maintenance request (step S 1202 ).
  • in the case of a maintenance request related to a word (step S 1202: word), the information processing system determines whether the maintenance request related to a word is a request for addition or deletion of a word, on the basis of the information included in the maintenance request related to a word (step S 1203).
  • in the case of addition of a word (step S 1203: addition), the information processing system specifies a word group as the destination of addition from the word dictionary DB 101 (step S 1204). Specifically, in the case where the maintenance request related to a word includes a group ID of the destination of addition, the information processing system specifies the word group specified by the group ID 501 as the destination of addition of the word to be added included in the maintenance request related to a word.
  • the information processing system may automatically specify the word group of the destination of addition.
  • for example, in the case where the word to be added is a word extracted from the sentence 702 included in the maintenance request related to a word, the information processing system specifies a word group having the attribute corresponding to the feature of the sentence from the word dictionary DB 101. Then, the information processing system adds the word to be added to the specified word group of the destination of addition (step S 1205), and returns to step S 1201.
  • in the case of deletion of a word (step S 1203: deletion), the information processing system deletes the word to be deleted included in the maintenance request related to a word from the word group for deletion in the word dictionary DB 101 (step S 1206), and returns to step S 1201.
  • a word group for deletion herein is, for example, all entries in the word dictionary DB 101 in the case where the group ID 501 is not specified in the maintenance request relating to a word, or the entry specified by the group ID 501 in the case where the group ID 501 is specified.
  • in the case of a maintenance request related to a rule (step S 1202: rule), the information processing system determines whether the maintenance request related to a rule is a request for addition or deletion of a rule, on the basis of the information included in the maintenance request related to a rule (step S 1207). In the case of addition of a rule (step S 1207: addition), the information processing system adds to the rule DB 102 a rule to be added included in the maintenance request related to a rule (step S 1208), and returns to step S 1201.
  • in the case of deletion of a rule (step S 1207: deletion), the information processing system deletes an entry of the rule ID 601 included in the maintenance request related to a rule from the rule DB 102 (step S 1209), and returns to step S 1201.
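The branching of FIG. 12 can be summarized as a small dispatcher. This is a sketch under assumptions: the request is a plain dictionary with hypothetical keys (`target`, `action`, `word`, `group_id`, `rule_id`, `pattern`), both databases are plain dictionaries, and automatic selection of the destination word group is omitted, so an addition must name its group ID.

```python
def handle_maintenance(request, word_db, rule_db):
    """Dispatch one maintenance request along the branches of steps S1202 to S1209."""
    target, action = request["target"], request["action"]
    if target == "word":                                   # S1202: word
        word = request["word"]
        group_id = request.get("group_id")
        if action == "add":                                # S1203: addition
            word_db[group_id].add(word)                    # S1204, S1205
        elif action == "delete":                           # S1203: deletion
            # Without a group ID, every word group is a deletion target.
            groups = [word_db[group_id]] if group_id else list(word_db.values())
            for words in groups:                           # S1206
                words.discard(word)
        else:
            raise ValueError(f"unknown action: {action}")
    elif target == "rule":                                 # S1202: rule
        if action == "add":                                # S1207: addition
            rule_db[request["rule_id"]] = request["pattern"]   # S1208
        elif action == "delete":                           # S1207: deletion
            rule_db.pop(request["rule_id"], None)          # S1209
        else:
            raise ValueError(f"unknown action: {action}")
    else:
        raise ValueError(f"unknown target: {target}")
```

Note that each branch touches exactly one of the two databases, which mirrors the separation between word maintenance and rule maintenance described above.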
  • FIG. 13 is an explanatory diagram illustrating a use example of the information processing system.
  • the information processing system acquires a sentence stc 1 from the data store 103 .
  • the information processing system may directly acquire the sentence stc 1 , or may acquire the sentence stc 1 by index search by use of the index 701 .
  • the information processing system converts the acquired sentence stc 1 into tree structure data trc by parsing.
  • the information processing system may execute parsing.
  • the information processing system may transmit the sentence stc 1 to another computer, and the another computer may execute parsing and return the tree structure data trc to the information processing system.
  • the information processing system calls the tree structure data trc associated with the sentence stc 1 from the data store 103 .
  • the information processing system generates a tree structure pattern on the basis of the tree structure data trc upon edit operation by an operator, and sets it as a rule Rc.
  • a word group Gb of verbs is applied to the predicate in the rule Rc.
  • the information processing system extracts “X” corresponding to the subject in the sentence stc 1 from the tree structure pattern of the rule Rc as the lemma a 1 , extracts “A” corresponding to the object in the sentence stc 1 as the lemma a 2 , and displays them on a display screen.
  • the information processing system registers the rule Rc with the rule DB 102. It is noted that in the case where a rule having the same contents has been registered already, the information processing system does not register the rule Rc with the rule DB 102.
  • the information processing system registers the tree structure data trc of (2) and the lemmas a 1 and a 2 of (4) as the entry of the sentence stc 1 with the data store 103. This enables automatic generation of the index 701 of the acquired sentence stc 1 and improves the efficiency of subsequent index searches.
  • the information processing system searches the data store 103 for sentences other than the sentence stc 1 to specify a sentence stc 2 meeting the rule Rc, and registers “J” corresponding to the subject of the entry of the sentence stc 2 as the lemma a 1 and “K” corresponding to the object thereof as the lemma a 2 (index-updating). The rule thus also takes effect on the other sentence stc 2, which enables automatic generation of the index 701 and improves the efficiency of subsequent index searches.
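The registration and index-updating steps of FIG. 13 can be sketched as follows. The sketch assumes that each entry already carries a parsed (subject, verb lemma, object) triple, that the contents of the word group Gb and the sample sentences are invented, and that rule IDs are generated by a simple counter; `register_rule` and `update_index` are hypothetical names.

```python
word_group_gb = {"acquire", "buy"}  # hypothetical contents of the word group Gb

data_store = [
    {"sentence": "X acquires A.", "parsed": ("X", "acquire", "A"), "a1": None, "a2": None},
    {"sentence": "J buys K.", "parsed": ("J", "buy", "K"), "a1": None, "a2": None},
    {"sentence": "M sells N.", "parsed": ("M", "sell", "N"), "a1": None, "a2": None},
]


def register_rule(rule_db, pattern):
    """Register a pattern unless an identical one exists; return its rule ID."""
    for rule_id, existing in rule_db.items():
        if existing == pattern:
            return rule_id
    rule_id = f"R{len(rule_db) + 1}"
    rule_db[rule_id] = pattern
    return rule_id


def update_index(store, verb_group):
    """Fill lemma a1 (subject) and a2 (object) for every sentence meeting the rule."""
    updated = 0
    for entry in store:
        subject, verb, obj = entry["parsed"]
        if verb in verb_group:
            entry["a1"], entry["a2"] = subject, obj
            updated += 1
    return updated


rule_db = {}
rule_id = register_rule(rule_db, ("*", "Gb", "*"))
updated = update_index(data_store, word_group_gb)
```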
  • An example of a display screen in the use example shown in FIG. 13 is described with reference to FIG. 14 to FIG. 19.
  • FIG. 14 is an explanatory diagram illustrating a display screen example 1 of the information processing system.
  • a display screen 1400 has a SAMPLE tab 1401, a VALIDATE tab 1402 and an INDEX tab 1403.
  • FIG. 14 illustrates the SAMPLE tab 1401 .
  • the SAMPLE tab 1401 has a search keyword input box 1411 , a SEARCH button 1412 , and a SAVE button 1415 .
  • the search keyword input box 1411 is an input box in which an operator inputs a search keyword.
  • the SEARCH button 1412 is a button for index-searching the data store 103 for the index 701 upon operation by an operator to extract the corresponding sentence 702 .
  • the following description is about index search in the present example.
  • a full sentence of the sentence 702 may be searched for.
  • In FIG. 14, “spin off” is input into the search keyword input box 1411 and the SEARCH button 1412 is pressed. In this case, as shown in (1) of FIG. 13, the data store 103 is index-searched for the index 701, and the corresponding sentence 702 is displayed as a search result 1413.
  • Each sentence of the search result 1413 has a check box 1414 .
  • the information processing system selects the sentence corresponding to the check box 1414 ticked by an operator.
  • the sentence st 1 is selected.
  • the SAVE button 1415 is a button for saving the sentence corresponding to the check box 1414 selected from the search result 1413 . When the SAVE button 1415 is pressed, the sentence st 1 corresponding to the ticked check box 1414 is stored in the data store 103 .
  • FIG. 15 is an explanatory diagram illustrating a display screen example 2 of the information processing system.
  • the display screen example 2 is an example of the display screen in the case where the VALIDATE tab 1402 is selected with the check box 1414 ticked on the display screen example 1 shown in FIG. 14 .
  • the VALIDATE tab 1402 has a confirmation area 1501 , a copy area 1502 , a PARSING button 1503 , a LEMMA button 1504 , an ADD button 1505 , and an edit area 1506 .
  • the confirmation area 1501 has a selected sentence display area 1510 , a lemma a 1 display area 1511 and a lemma a 2 display area 1512 .
  • the selected sentence display area 1510 displays the sentence selected upon ticking in the check box 1414 in the display screen example 1 shown in FIG. 14 .
  • the lemma a 1 display area 1511 is an area for displaying the lemma a 1 (subject).
  • the lemma a 2 display area 1512 is an area for displaying the lemma a 2 (object).
  • the lemma a 1 display area 1511 has a text input box for lemma a 1 1513 .
  • an operator inputs a word or phrase (e.g., “Nichiritsu”) corresponding to the lemma a 1 (subject) into the text input box for lemma a 1 1513.
  • the lemma a 2 display area 1512 has a text input box for lemma a 2 1514.
  • an operator inputs a word or phrase (e.g., “home appliance”) corresponding to the lemma a 2 (object) into the text input box for lemma a 2 1514.
  • the combination of the sentence st 1 displayed in the confirmation area 1501, the word “Nichiritsu” input into the text input box for lemma a 1 1513, and the phrase “home appliance” input into the text input box for lemma a 2 1514 is called a data set for confirmation 1500.
  • a COPY button 1515 is a button for copying the sentence displayed in the selected sentence display area 1510 into the copy area 1502 upon operation by an operator.
  • the copy area 1502 is an area for displaying the sentence st 1 copied from the selected sentence display area 1510 when the COPY button 1515 is pressed.
  • the PARSING button 1503 is a button for parsing the sentence st 1 copied into the copy area 1502 (corresponding to (2) in FIG. 13 ).
  • the LEMMA button 1504 is a button for extracting a lemma of the sentence st 1 from the tree structure pattern edited in the edit area 1506 (corresponding to (4) in FIG. 13 ).
  • the ADD button 1505 is a button for adding the tree structure pattern edited in the edit area 1506 to the rule DB 102 as a rule (corresponding to (5) in FIG. 13 ).
  • FIG. 16 is an explanatory diagram illustrating a display screen example 3 of the information processing system.
  • the display screen example 3 is an example of the display screen in the case where the COPY button 1515 and the PARSING button 1503 are pressed on the display screen example 2 shown in FIG. 15 .
  • When the COPY button 1515 is pressed, the selected sentence st 1 is copied into the copy area 1502 .
  • When the PARSING button 1503 is pressed, the tree structure data tr 1 obtained by parsing the selected sentence st 1 is displayed in the edit area 1506 (corresponding to (2) in FIG. 13 ).
  • FIG. 17 is an explanatory diagram illustrating a display screen example 4 of the information processing system.
  • the display screen example 4 is an example of the display screen in the case where the tree structure data tr 1 in the edit area 1506 is edited on the display screen example 3 shown in FIG. 16 .
  • the information processing system assigns “a 0 ,” “a 1 ” and “a 2 ” indicating lemmas to the words to be extracted as lemmas.
  • the lemmas “a 0 ,” “a 1 ” and “a 2 ” define rules.
  • the lemma a 0 is a non-extraction-target lemma which is used as an extraction reference of other lemmas a 1 and a 2 .
  • In the case where the lemma a 0 is a word, the lemma a 0 is a non-extraction-target word matching other sentences. In the case where the lemma a 0 is a word group, the lemma a 0 is a non-extraction-target word group including words of other sentences.
  • The lemma a 1 is defined as a subject (noun phrase (NP)) for the lemma a 0 in the tree structure pattern tp 1 , and the lemma a 2 is defined as an object (noun phrase (NP)) for the lemma a 0 in the tree structure pattern tp 1 .
  • An operator deletes any subtree or “lemma” (base form of a word) that the operator judges to be unimportant.
  • In some cases, a word defined by the tree structure data tr 1 is replaced with a call to a word group that includes the word.
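
The roles of the lemmas a 0 , a 1 and a 2 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the nested-tuple tree encoding, the labels `S`, `NP`, `VP` and `V`, the verb "restructures", the word group `REORGANIZE`, and the function names. It does not reproduce the pattern syntax actually used in the edit area 1506 .

```python
from typing import Optional, Union

# Hypothetical encoding: a leaf is a word, an inner node is (label, child, ...).
Tree = Union[str, tuple]


def words(tree: Tree) -> list[str]:
    """Return the leaf words of a (sub)tree in sentence order."""
    if isinstance(tree, str):
        return [tree]
    out: list[str] = []
    for child in tree[1:]:
        out.extend(words(child))
    return out


def has_label(tree: Tree, label: str) -> bool:
    """True if the node is an inner node carrying the given label."""
    return not isinstance(tree, str) and tree[0] == label


def match_svo(tree: Tree, word_group: set[str]) -> Optional[dict[str, str]]:
    """Match the assumed pattern (S (NP=a1) (VP (V=a0) (NP=a2))).

    a0 is the non-extraction-target anchor: it must be a verb belonging to
    word_group. a1 (subject) and a2 (object) are the extracted noun phrases.
    """
    if not has_label(tree, "S") or len(tree) != 3:
        return None
    subj, vp = tree[1], tree[2]
    if not has_label(subj, "NP") or not has_label(vp, "VP") or len(vp) != 3:
        return None
    verb, obj = vp[1], vp[2]
    if not has_label(verb, "V") or not has_label(obj, "NP"):
        return None
    verb_words = words(verb)
    if len(verb_words) != 1 or verb_words[0] not in word_group:
        return None
    return {"a1": " ".join(words(subj)), "a2": " ".join(words(obj))}


# Hypothetical parse of a sentence built around the phrases quoted in the text.
tr1 = ("S",
       ("NP", "Japanese", "electronics", "maker", "Nichiritsu"),
       ("VP",
        ("V", "restructures"),
        ("NP", "its", "home", "appliance", "and",
         "industrial", "equipment", "divisions")))

REORGANIZE = {"restructures", "reorganizes"}  # assumed word group for a0
result = match_svo(tr1, REORGANIZE)
```

Because a 0 is checked against a word group rather than a single word, the same pattern would also match a sentence whose verb is any other member of the group.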
  • FIG. 18 is an explanatory diagram illustrating a display screen example 5 of the information processing system.
  • the display screen example 5 is an example of the display screen in the case where the LEMMA button 1504 is pressed on the display screen example 4 shown in FIG. 17 .
  • The information processing system extracts, from the selected sentence st 1 copied into the copy area 1502 , the character strings corresponding to the lemmas a 1 and a 2 that meet the tree structure pattern tp 1 (rule) edited in the edit area 1506 , and displays an extraction result 1800 (corresponding to (4) in FIG. 13 ).
  • “Japanese electronics maker Nichiritsu” is extracted as the noun phrase of the lemma a 1 , and “its home appliance and industrial equipment divisions” is extracted as the noun phrase of the lemma a 2 .
  • the extracted noun phrases of the lemmas a 1 and a 2 are displayed in the lemma a 1 display area 1511 and the lemma a 2 display area 1512 , respectively.
  • An operator compares the word “Nichiritsu” input into the text input box for lemma a 1 1513 with the noun phrase “Japanese electronics maker Nichiritsu” of the lemma a 1 extracted according to the rule, and can thereby confirm the certainty of the rule.
  • Likewise, an operator compares the phrase “home appliance” input into the text input box for lemma a 2 1514 with the noun phrase “its home appliance and industrial equipment divisions” of the lemma a 2 extracted according to the rule, and can thereby confirm the certainty of the rule.
  • When the ADD button 1505 is pressed, the character string in the edit area 1506 (edited tree structure data tr 1 ) is regarded as the tree structure pattern tp 1 , and is registered as a rule with the rule DB 102 (corresponding to (5) in FIG. 13 ).
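
The comparison that the operator performs by eye could also be approximated in code. The sketch below is only an illustration: the function name `rule_looks_plausible` and the criterion (case-insensitive substring containment) are assumptions, not something the description specifies.

```python
def rule_looks_plausible(expected: dict[str, str],
                         extracted: dict[str, str]) -> bool:
    """Check the data set for confirmation against an extraction result.

    Assumed criterion: every word or phrase the operator typed in advance
    must occur (case-insensitively) inside the phrase extracted for the
    same lemma slot.
    """
    return all(
        slot in extracted and text.lower() in extracted[slot].lower()
        for slot, text in expected.items()
    )


# Values quoted in the description.
expected = {"a1": "Nichiritsu", "a2": "home appliance"}
extracted = {
    "a1": "Japanese electronics maker Nichiritsu",
    "a2": "its home appliance and industrial equipment divisions",
}
plausible = rule_looks_plausible(expected, extracted)
```

A failed check would suggest editing the tree structure pattern further before registering it as a rule.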
  • FIG. 19 is an explanatory diagram illustrating a display screen example 6 of the information processing system.
  • the display screen example 6 is an example of the display screen in the case where the INDEX tab 1403 is selected on the display screen example 5 shown in FIG. 18 .
  • the INDEX tab 1403 has an UPDATE button 1900 .
  • the information processing system registers with the data store 103 the selected sentence st 1 in association with the tree structure data tr 1 , the noun phrase “Japanese electronics maker Nichiritsu” of the lemma a 1 , and the noun phrase “its home appliance and industrial equipment divisions” of the lemma a 2 , thereby index-updating the entry of the selected sentence st 1 (corresponding to (6) in FIG. 13 ).
  • For another sentence, the information processing system registers with the data store 103 the noun phrase of the lemma a 1 and the noun phrase of the lemma a 2 that meet the rule of the tree structure pattern tp 1 in association with that sentence, thereby index-updating the entry of that sentence (corresponding to (7) in FIG. 13 ).
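
The index update can be pictured as attaching the parse result and the extracted lemmas to the stored sentence. The following is a minimal sketch under stated assumptions: the `Entry` class, the in-memory dictionary standing in for the data store 103 , the sentence text, and both function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One sentence in a stand-in for the data store (hypothetical layout)."""
    sentence: str
    tree: object = None
    lemmas: dict[str, str] = field(default_factory=dict)


def index_update(store: dict[str, Entry], sentence_id: str,
                 tree: object, lemmas: dict[str, str]) -> None:
    """Associate tree structure data and extracted lemmas with a sentence."""
    entry = store[sentence_id]
    entry.tree = tree
    entry.lemmas.update(lemmas)


def search_by_lemma(store: dict[str, Entry], slot: str,
                    keyword: str) -> list[str]:
    """Return ids of sentences whose lemma in the given slot contains keyword."""
    needle = keyword.lower()
    return [sid for sid, entry in store.items()
            if needle in entry.lemmas.get(slot, "").lower()]


# Hypothetical sentence text; only the two noun phrases come from the description.
store = {
    "st1": Entry("Japanese electronics maker Nichiritsu restructures "
                 "its home appliance and industrial equipment divisions"),
}
index_update(
    store, "st1", tree=None,
    lemmas={
        "a1": "Japanese electronics maker Nichiritsu",
        "a2": "its home appliance and industrial equipment divisions",
    },
)
subjects = search_by_lemma(store, "a1", "nichiritsu")
```

Once the lemmas are stored per slot, a query can distinguish sentences where a term appears as the subject from those where it appears as the object.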
  • FIG. 20 is a flowchart indicating an example of processing procedure in a use example of the information processing system.
  • the information processing system accepts a search keyword input into the search keyword input box 1411 (step S 2001 ), and executes index search by use of the input search keyword when the SEARCH button 1412 is pressed (step S 2002 ).
  • the information processing system stores the selected sentence upon operation by an operator (step S 2003 ).
  • The information processing system sets the data set for confirmation 1500 upon operation by an operator (step S 2004 ). Then, as shown in FIG. 16 , the information processing system obtains the tree structure data tr 1 by parsing the selected sentence st 1 (step S 2005 ). The information processing system registers with the rule DB 102 the tree structure pattern tp 1 obtained by editing the tree structure data tr 1 , when an operator presses the ADD button 1505 (step S 2006 ). Pressing of the ADD button 1505 corresponds to addition in step S 1207 shown in FIG. 12 , and registration of the tree structure pattern tp 1 corresponds to step S 1208 shown in FIG. 12 .
  • the information processing system extracts a word or phrase of the lemma a 1 and a word or phrase of the lemma a 2 from the selected sentence st 1 according to the rule of the tree structure pattern tp 1 , and displays them as the extraction result 1800 (step S 2007 ).
  • In step S 2007 , the information processing system extracts lemmas from the selected sentence for each tree structure pattern tp 1 . Then, as shown in FIG. 19 , the information processing system index-updates the data store 103 with the extracted lemmas (step S 2008 ).
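
The first two steps of the flowchart (accepting a keyword in step S 2001 and running an index search in step S 2002 ) can be sketched with a small inverted index. This is an illustrative assumption about how such a search might work: the sentences, the tokenization, the function names, and the rule that a multi-word keyword must match all of its words are not taken from the description.

```python
import re
from collections import defaultdict


def build_index(sentences: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased token to the ids of the sentences containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for sentence_id, text in sentences.items():
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(sentence_id)
    return index


def index_search(index: dict[str, set[str]], keyword: str) -> set[str]:
    """Return ids of sentences containing every token of the keyword."""
    tokens = re.findall(r"\w+", keyword.lower())
    if not tokens:
        return set()
    hits = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        hits &= index.get(token, set())
    return hits


# Hypothetical stored sentences.
sentences = {
    "st1": "Nichiritsu restructures its home appliance division",
    "st2": "Analysts expect home appliance demand to grow",
}
index = build_index(sentences)
hits = index_search(index, "Nichiritsu")
```

The operator would then tick one of the returned sentences, which corresponds to storing the selected sentence in step S 2003 .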
  • the above-described information processing system has the word dictionary DB 101 and the rule DB 102 , and the processor 401 executes acceptance processing of accepting a maintenance request, and executes maintenance processing of performing, in the case where the maintenance request accepted in the acceptance processing is a maintenance request related to a word, maintenance of the word dictionary DB 101 as for the word group to which the word belongs, and performing, in the case where the maintenance request is a maintenance request related to a tree structure pattern, maintenance of the rule DB 102 as for the tree structure pattern.
  • The information processing system thus maintains only one of the word dictionary DB 101 and the rule DB 102 for a given maintenance request. Therefore, even if a certain word group in the word dictionary DB 101 is maintained, there is no need to maintain the rule that uses the word group in the rule DB 102 . Conversely, even if a certain rule in the rule DB 102 is maintained, there is no need to maintain the word group used by the rule. Accordingly, database maintenance is facilitated.
  • the processor 401 executes specification processing of specifying the attribute of the word group to which the word should belong on the basis of the word.
  • the processor 401 adds the word to the word group having the attribute specified by the specification processing.
  • the processor 401 deletes the word from the word group to which the word belongs.
  • the processor 401 registers with the rule DB 102 the tree structure pattern in the case where the tree structure pattern does not exist in the rule DB 102 .
  • the processor 401 deletes the tree structure pattern from the rule DB 102 .
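
The separation between the two databases can be sketched as a dispatcher that touches exactly one store per request. The classes, the request dictionary layout, the pattern string, and the fact that the attribute arrives with the request (rather than being determined by the specification processing) are all assumptions made to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class WordDictionaryDB:
    """Stand-in for the word dictionary: attribute -> words in that group."""
    groups: dict[str, set[str]] = field(default_factory=dict)

    def add(self, word: str, attribute: str) -> None:
        self.groups.setdefault(attribute, set()).add(word)

    def delete(self, word: str) -> None:
        for members in self.groups.values():
            members.discard(word)


@dataclass
class RuleDB:
    """Stand-in for the rule store: a set of tree structure patterns."""
    patterns: set[str] = field(default_factory=set)


def handle_maintenance(request: dict, words: WordDictionaryDB,
                       rules: RuleDB) -> None:
    """Route a maintenance request to exactly one of the two stores."""
    target, action = request["target"], request["action"]
    if target == "word" and action == "add":
        words.add(request["word"], request["attribute"])
    elif target == "word" and action == "delete":
        words.delete(request["word"])
    elif target == "pattern" and action == "add":
        rules.patterns.add(request["pattern"])  # no effect if already present
    elif target == "pattern" and action == "delete":
        rules.patterns.discard(request["pattern"])
    else:
        raise ValueError(f"unsupported request: {target}/{action}")


words_db = WordDictionaryDB({"ACQUIRE": {"acquires"}})
rules_db = RuleDB({"(S (NP=a1) (VP (V=a0:@ACQUIRE) (NP=a2)))"})

# Adding a word changes the word group but leaves the rule untouched.
handle_maintenance(
    {"target": "word", "action": "add", "word": "buys", "attribute": "ACQUIRE"},
    words_db, rules_db,
)
```

Because the hypothetical rule refers to the word group by name, the newly added word is covered by the existing rule without any edit to the rule store.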
  • the processor 401 is capable of accessing the data store 103 storing a plurality of sentences, and executes acquisition processing of acquiring a specific tree structure pattern by abstracting specific tree structure data corresponding to a resultant by parsing a specific sentence in the data store 103 including a specific word among the plurality of sentences, by use of a specific word group including the specific word, extraction processing of extracting, from the specific tree structure data, a word contained in a word or phrase co-occurring with the specific word group (for example, a subject or an object in the case where the specific word group is a predicate verb) in the specific tree structure pattern acquired in the acquisition processing, and outputting processing of outputting the word extracted in the extraction processing so as to be displayed on a display screen.
  • In the maintenance processing, in the case where a maintenance request related to a specific tree structure pattern is a request for addition of the specific tree structure pattern (for example, in the case where the ADD button 1505 is pressed), the processor 401 registers the specific tree structure pattern with the rule DB 102 .
  • a word meeting the specific tree structure pattern is enabled to be displayed as a lemma of the specific sentence. Accordingly, in an example, in the case where an operator has in advance selected a word or phrase co-occurring with the specific word group with respect to the specific sentence, the specific tree structure pattern is confirmed with respect to the certainty thereof through comparison of the selected word with the lemma, and registered with the rule DB 102 .
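
The abstraction step in the acquisition processing, in which a specific word in the tree structure data is replaced by the word group that contains it, can be sketched as a recursive rewrite. The nested-tuple tree encoding, the `@GROUP` notation, the example words, and the function name `abstract` are hypothetical choices for this illustration.

```python
from typing import Union

# Hypothetical encoding: a leaf is a word, an inner node is (label, child, ...).
Tree = Union[str, tuple]


def abstract(tree: Tree, word_groups: dict[str, set[str]]) -> Tree:
    """Replace each leaf word that belongs to a word group with '@<group>'."""
    if isinstance(tree, str):
        for name, members in word_groups.items():
            if tree in members:
                return "@" + name
        return tree
    return (tree[0],) + tuple(abstract(child, word_groups) for child in tree[1:])


word_groups = {"ACQUIRE": {"acquires", "buys"}}  # assumed word group
tree_data = ("VP", ("V", "acquires"), ("NP", "shares"))
pattern = abstract(tree_data, word_groups)
```

The resulting pattern no longer names the verb itself, so it would apply equally to a sentence that uses another member of the same word group.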
  • The processor 401 is capable of accessing the data store 103 storing a plurality of sentences, and executes acquisition processing of acquiring the specific tree structure pattern by abstracting the specific tree structure data corresponding to the resultant by parsing the specific sentence in the data store 103 including the specific word among the plurality of sentences, by use of the specific word group including the specific word, extraction processing of extracting, from the specific tree structure data, the word contained in the phrase co-occurring with the specific word group in the specific tree structure pattern acquired in the acquisition processing, and updating processing of updating the data store 103 by associating the word extracted in the extraction processing with the specific sentence. In the maintenance processing, in the case where a maintenance request related to a specific tree structure pattern is a request for addition of the specific tree structure pattern, the processor 401 registers the specific tree structure pattern with the rule DB 102 .
  • The processor 401 extracts another word contained in the phrase co-occurring with the specific word group in the specific tree structure pattern from other tree structure data corresponding to the resultant obtained by parsing a sentence other than the specific sentence among the plurality of sentences, and associates the other word extracted in the extraction processing with that other sentence to update the data store 103 .
  • Another sentence in the data store 103 can thus also be registered in association with the other word meeting the specific tree structure pattern as a lemma of that sentence, so that the rule defined by the specific tree structure pattern also takes effect on other sentences.
  • the information on the programs, tables, files, and the like for implementing the respective functions can be stored in a storage device such as a memory, a hard disk drive, or a solid state drive (SSD) or a recording medium such as an IC card, an SD card, or a DVD.
  • control lines and information lines that are assumed to be necessary for the sake of description are described, but not all the control lines and information lines that are necessary in terms of implementation are described. It may be considered that almost all the components are connected to one another in actuality.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
US15/923,875 2017-07-20 2018-03-16 Information processing system and information processing method Active 2038-05-17 US11301441B2 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP2017141076A JP7103763B2 (ja) 2017-07-20 2017-07-20 Information processing system and information processing method
JPJP2017-141076 2017-07-20
JP2017-141076 2017-07-20

Publications (2)

Publication Number Publication Date
US20190026324A1 US20190026324A1 (en) 2019-01-24
US11301441B2 true US11301441B2 (en) 2022-04-12

Family

ID=61800264

Family Applications (1)

Application Number Title Priority Date Filing Date
US15/923,875 Active 2038-05-17 US11301441B2 (en) 2017-07-20 2018-03-16 Information processing system and information processing method

Country Status (3)

Country Link
US (1) US11301441B2 (en)
EP (1) EP3432161A1 (en)
JP (1) JP7103763B2 (ja)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
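JP7135399B2 (ja) * 2018-04-12 2022-09-13 富士通株式会社 Specifying program, specifying method, and information processing device
CN111563385B (zh) * 2020-04-30 2023-12-26 北京百度网讯科技有限公司 Semantic processing method, apparatus, electronic device, and medium
CN114090721B (zh) * 2022-01-19 2022-04-22 支付宝(杭州)信息技术有限公司 Method and apparatus for querying and updating data based on natural language data
JP2023152343A (ja) * 2022-04-04 2023-10-17 株式会社日立製作所 Generation device, generation method, and generation program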

Citations (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5497319A (en) * 1990-12-31 1996-03-05 Trans-Link International Corp. Machine translation and telecommunications system
US6081774A (en) * 1997-08-22 2000-06-27 Novell, Inc. Natural language information retrieval system and method
US6119077A (en) * 1996-03-21 2000-09-12 Sharp Kasbushiki Kaisha Translation machine with format control
JP2001242885A (ja) 2000-02-28 2001-09-07 Sony Corp Speech recognition device, speech recognition method, and recording medium
US6411962B1 (en) * 1999-11-29 2002-06-25 Xerox Corporation Systems and methods for organizing text
US20020173958A1 (en) * 2000-02-28 2002-11-21 Yasuharu Asano Speech recognition device and speech recognition method and recording medium
US20030023442A1 (en) * 2001-06-01 2003-01-30 Makoto Akabane Text-to-speech synthesis system
US20040225646A1 (en) * 2002-11-28 2004-11-11 Miki Sasaki Numerical expression retrieving device
US20040243394A1 (en) * 2003-05-28 2004-12-02 Oki Electric Industry Co., Ltd. Natural language processing apparatus, natural language processing method, and natural language processing program
US6928448B1 (en) * 1999-10-18 2005-08-09 Sony Corporation System and method to match linguistic structures using thesaurus information
US20050188330A1 (en) * 2004-02-20 2005-08-25 Griffin Jason T. Predictive text input system for a mobile communication device
US20050246316A1 (en) * 2004-04-30 2005-11-03 Lawson Alexander J Method and software for extracting chemical data
JP2006171969A (ja) 2004-12-14 2006-06-29 Konica Minolta Holdings Inc Document processing device
US7231379B2 (en) * 2002-11-19 2007-06-12 Noema, Inc. Navigation in a hierarchical structured transaction processing system
US20070179776A1 (en) * 2006-01-27 2007-08-02 Xerox Corporation Linguistic user interface
US20080010259A1 (en) * 2006-07-10 2008-01-10 Nec (China) Co., Ltd. Natural language based location query system, keyword based location query system and a natural language and keyword based location query system
JP2008129662A (ja) 2006-11-16 2008-06-05 Nec Corp Information extraction device, information extraction method, and information extraction program
US7493252B1 (en) * 1999-07-07 2009-02-17 International Business Machines Corporation Method and system to analyze data
US20090198488A1 (en) * 2008-02-05 2009-08-06 Eric Arno Vigen System and method for analyzing communications using multi-placement hierarchical structures
US20090210411A1 (en) * 2008-02-15 2009-08-20 Oki Electric Industry Co., Ltd. Information Retrieving System
US20090240487A1 (en) * 2008-03-20 2009-09-24 Libin Shen Machine translation
JP2010267247A (ja) 2010-02-08 2010-11-25 Ntt Data Corp Information retrieval device, information retrieval method, terminal device, and program
US20110078167A1 (en) * 2009-09-28 2011-03-31 Neelakantan Sundaresan System and method for topic extraction and opinion mining
US20110307241A1 (en) * 2008-04-15 2011-12-15 Mobile Technologies, Llc Enhanced speech-to-speech translation system and methods
US20140214833A1 (en) * 2013-01-31 2014-07-31 Hewlett-Packard Development Company, L.P. Searching threads
US8898166B1 (en) * 2013-06-24 2014-11-25 Google Inc. Temporal content selection
US20150178271A1 (en) * 2013-12-19 2015-06-25 Abbyy Infopoisk Llc Automatic creation of a semantic description of a target language
US20150206031A1 (en) * 2014-01-02 2015-07-23 Robert Taaffe Lindsay Method and system of identifying an entity from a digital image of a physical text
US20160336004A1 (en) * 2015-05-14 2016-11-17 Nuance Communications, Inc. System and method for processing out of vocabulary compound words
US20170068655A1 (en) * 2015-09-09 2017-03-09 Quixey, Inc. System for Tokenizing Text in Languages without Inter-Word Separation
US20170371858A1 (en) * 2016-06-27 2017-12-28 International Business Machines Corporation Creating rules and dictionaries in a cyclical pattern matching process
US20180060302A1 (en) * 2016-08-24 2018-03-01 Microsoft Technology Licensing, Llc Characteristic-pattern analysis of text
US10332508B1 (en) * 2016-03-31 2019-06-25 Amazon Technologies, Inc. Confidence checking for speech processing and query answering
US10388274B1 (en) * 2016-03-31 2019-08-20 Amazon Technologies, Inc. Confidence checking for speech processing and query answering

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
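JP2003263428A (ja) 2002-03-08 2003-09-19 Nippon Telegr & Teleph Corp <Ntt> Semantic interpretation method and apparatus based on matching against sentence patterns, computer program for executing the method, and storage medium recording the computer program for executing the method
JP2006309446A (ja) 2005-04-27 2006-11-09 Toshiba Corp Classification dictionary updating device, classification dictionary updating program, and classification dictionary updating method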

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
European Search Report issued in counterpart European Application No. 18161495.9 dated May 25, 2018 with English translation (nine (9) pages).
Japanese-language Office Action issued in Japanese Application No. 2017-141076 dated Apr. 6, 2021 with English translation (17 pages).
Levy et al., "Tregex and Tsurgeon: tools for querying and manipulating tree data structures", Proceedings of LREC-2006, 2006, four pages.
Ramakrishnan et al., "Database Management Systems", 2003, McGraw-Hill Education, 3rd edition, pp. 63-66 (Year: 2003). *
Reiss et al., "An Algebraic Approach to Rule-Based Information Extraction," IEEE 24th international Conference on Data Engineering, ICDE, Apr. 7, 2008, pp. 933-942, Piscataway, New Jersey, XP031246051.
SAE1962 et al., "Database Normalization", Wikipedia, pp. 1-6, Jul. 19, 2017, XP055471792, Retrieved from the internet: https://en.wikipedia.org/w/index.php?title=Database_normalization&oldid=791282721.
Stevenson, M., Greenwood, M.A. (2009) "Dependency Pattern Models for Information Extraction". Res on Lang and Comput 7, 13. (Year: 2009). *

Also Published As

Publication number Publication date
JP7103763B2 (ja) 2022-07-20
EP3432161A1 (en) 2019-01-23
JP2019021194A (ja) 2019-02-07
US20190026324A1 (en) 2019-01-24

Legal Events

Date Code Title Description
FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

AS Assignment

Owner name: HITACHI, LTD., JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SATO, MISA;YANAI, KOHSUKE;YANASE, TOSHIHIKO;REEL/FRAME:045314/0867

Effective date: 20180226

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: ADVISORY ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4