JP4963245B2 - Syntax / semantic analysis result ranking model creation method and apparatus, program, and recording medium

Publication number
JP4963245B2
Authority
JP
Japan
Legal status
Expired - Fee Related
Application number
JP2007068208A
Other languages
Japanese (ja)
Other versions
JP2008233964A
Inventor
Takaaki Tanaka (田中貴秋)
Sanae Fujita (藤田早苗)
Original Assignee
Nippon Telegraph and Telephone Corporation (日本電信電話株式会社)
Application filed by Nippon Telegraph and Telephone Corporation
Priority to JP2007068208A
Publication of JP2008233964A
Application granted
Publication of JP4963245B2

Description

  The present invention relates to a natural language processing technique, and more particularly to a language model creation technique used to determine the most probable analysis result from a plurality of syntax analysis results (trees) and semantic analysis results obtained from input sentences.

In recent years, with the development of the Internet and the spread of computers, electronic documents written in a natural language have been distributed in large quantities. Along with this, there is an increasing demand for analyzing these electronic documents with a computer and performing information processing such as automatic summarization, machine translation, and information retrieval.
In order to improve the accuracy of such information processing, it is important to correctly parse natural language sentences. In order to realize more advanced information processing, it is necessary to perform not only syntactic analysis but also deeper semantic analysis.

  A major problem in such automatic syntactic and semantic analysis is the vast ambiguity of the analyses. In particular, parsing a long sentence normally yields a large number of candidate parse results. This stems from the syntactic ambiguity contained in the target sentence, and it is important to resolve this ambiguity and obtain the correct analysis result. Likewise, even when an N-best method is used, in which the N lowest-cost candidates are presented and one is selected, the more probable analysis results must be ranked near the top.
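The ranking task described above can be sketched as follows; the candidate analyses, feature names, and weights are hypothetical placeholders, not the output of any real parser.

```python
# Sketch of N-best reranking: a parser proposes several candidate analyses
# for one sentence, and a model score reorders them so that the most
# probable analysis comes first. All candidates and weights here are
# illustrative placeholders.

def rerank(candidates, score):
    """Sort candidate analyses by model score, best first."""
    return sorted(candidates, key=score, reverse=True)

# Hypothetical feature weights a statistical model might have learned.
weights = {"pred-has-arg1": 1.2, "gapless-relative-clause": -0.4}

candidates = [
    {"label": "parse-A", "features": ["gapless-relative-clause"]},
    {"label": "parse-B", "features": ["pred-has-arg1"]},
]

def score(candidate):
    return sum(weights.get(f, 0.0) for f in candidate["features"])

ranked = rerank(candidates, score)
print([c["label"] for c in ranked])  # ['parse-B', 'parse-A']
```

The scoring function here is deliberately trivial; in the invention, such weights are the product of machine learning over features derived from training data.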

  Conventional parsing technology improves ranking accuracy by introducing a statistical model into grammatical analysis frameworks that incorporate advanced linguistic knowledge, used not only for syntactic analysis but also for semantic analysis (see, for example, Non-Patent Documents 1 to 3). The statistical model used here is created from features derived only from information available in the training data itself, such as the occurrence probabilities of grammar rules and information about the head words of phrases.

Riezler, Stefan, Tracy H. King, Ronald M. Kaplan, Richard Crouch, John T. Maxwell & Mark Johnson: 2002, 'Parsing the Wall Street Journal using a Lexical-Functional Grammar and discriminative estimation techniques', in 40th Annual Meeting of the Association for Computational Linguistics: ACL-2002.
Oepen, Stephan, Dan Flickinger, Kristina Toutanova & Christopher D. Manning: 2004, 'LinGO Redwoods: A rich and dynamic treebank for HPSG', Research on Language and Computation, 2(4): 575-596.
Malouf, Robert & Gertjan van Noord: 2004, 'Wide coverage parsing with stochastic attribute value grammars', in IJCNLP-04 Workshop: Beyond Shallow Analyses - Formalisms and Statistical Modeling for Deep Analyses.
Malouf, Robert: 2002, 'A comparison of algorithms for maximum entropy parameter estimation', in CoNLL-2002, Taipei, Taiwan.

  However, these conventional techniques have the problem that, when ranking syntax / semantic analysis results, sufficient accuracy cannot be obtained by a ranking method that relies only on a general surface-form-based statistical model.

  For example, in the above prior art, the statistical model is trained using features created only from information available in the training data itself, which is inherently limiting. A word often has several meanings, but a conventional surface-form-based statistical model does not reflect which meaning is being used, even though a word's syntactic behavior often differs from meaning to meaning. As a result, even at the syntactic level, the parsing accuracy of the prior art remains around 80%. The accuracy of deeper semantic analysis is lower still, and has not reached a level at which automatic analysis results are practically usable.

  SUMMARY OF THE INVENTION The present invention solves these problems, and its object is to provide a syntax / semantic analysis result ranking model creation method, apparatus, program, and recording medium capable of creating a syntax / semantic analysis result ranking model that ranks syntax / semantic analysis results with high accuracy.

  In order to achieve this object, the syntax / semantic analysis result ranking model creation method according to the present invention is a method for creating, by machine learning of features, a syntax / semantic analysis result ranking model for automatically ranking analysis results for natural language, the features being created from sets each consisting of a processing target sentence composed of natural language data, its syntax / semantic analysis result, and an evaluation result indicating whether the analysis result is correct. The method includes: a storage step of storing, in a storage unit, the sets of processing target sentences, their syntax / semantic analysis results, and evaluation results indicating the correctness of the analysis results, and of storing a semantic information database that stores semantic information indicating the meanings of various phrases; a semantic information extraction step of extracting, by a semantic information extraction unit, the semantic analysis result from the syntax / semantic analysis result read from the storage unit, and, for a target phrase selected on the basis of the semantic analysis result from the processing target sentence read from the storage unit, searching the semantic information database of the storage unit to extract semantic information related to the target phrase; and a feature creation step of creating, by a feature creation unit, features to be used for creating the syntax / semantic analysis result ranking model by expanding the target phrase based on the semantic information extracted by the semantic information extraction unit.
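As an informal sketch of the three claimed steps, assuming the analyses and the semantic information database are represented as plain Python dictionaries (all entries and names below are hypothetical):

```python
# Storage step: a processing target sentence, its syntax / semantic
# analysis result, and an evaluation result, plus a semantic information
# database. All contents are hypothetical stand-ins.
storage = {
    "sentence": "person driving a car",
    "analysis": {"predicate": "drive", "term1": "person", "term2": "car"},
    "evaluation": "correct",
}
semantic_db = {
    "person": {"sense": "human being", "hypernym": "human"},
    "car": {"sense": "automobile", "hypernym": "vehicle"},
}

def extract_semantic_info(analysis, db):
    """Semantic information extraction step: look up each target phrase."""
    targets = [analysis["term1"], analysis["term2"]]
    return {t: db[t] for t in targets if t in db}

def create_features(analysis, sem_info):
    """Feature creation step: expand each target phrase with its semantic
    information, yielding features at several levels of generality."""
    feats = []
    for phrase, info in sem_info.items():
        feats.append(f"pred={analysis['predicate']}&word={phrase}")
        feats.append(f"pred={analysis['predicate']}&sense={info['sense']}")
        feats.append(f"pred={analysis['predicate']}&hyper={info['hypernym']}")
    return feats

sem_info = extract_semantic_info(storage["analysis"], semantic_db)
features = create_features(storage["analysis"], sem_info)
```

Each target phrase thus contributes not only a surface-form feature but also sense-level and broader-term features, which is the expansion the claim refers to.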

  At this time, the method may further include a syntax information extraction step of extracting, by a syntax information extraction unit, the syntax analysis result from the syntax / semantic analysis result read from the storage unit and outputting it as syntax information, and the feature creation step may create features used for creating the syntax / semantic analysis result ranking model based on the output syntax information.

  In addition, the semantic information database may be composed of a sense bank database that stores the sense assignment results for each target phrase of the processing target sentences, or a thesaurus / ontology database that stores various phrases and semantic categories placed in predetermined relationships with one another, and the semantic information extraction step may search this semantic information database to extract, as the semantic information, the meaning of the target phrase or another phrase or semantic category that has a predetermined relationship with the target phrase.

  In addition, the method may further include a storage step of storing, in the storage unit, a syntactic / semantic information dictionary that stores, for various phrases, syntactic / semantic information indicating the meaning of a phrase for each syntactic context in which the phrase is used, and the semantic information extraction step may search the syntactic / semantic information dictionary for the target phrase to extract the syntactic / semantic information of the target phrase as semantic information.

  At this time, the syntactic / semantic information dictionary may be composed of any one or more of a valence dictionary that stores valence information of target phrases, a representative word / typical word dictionary that stores representative or typical words for target phrases, and an example-based case frame dictionary that stores example case frames of target phrases, and the semantic information extraction step may search the syntactic / semantic information dictionary to extract, as semantic information, any one or more of the valence information, representative words, typical words, and example case frames of the target phrase.

  In addition, the syntax / semantic analysis result ranking model creation device according to the present invention is a device for creating, by machine learning of features, a syntax / semantic analysis result ranking model for automatically ranking analysis results for natural language, the features being created from sets each consisting of a processing target sentence composed of natural language data, its syntax / semantic analysis result, and an evaluation result indicating its correctness. The device includes: a storage unit that stores the sets of processing target sentences, their syntax / semantic analysis results, and evaluation results indicating whether the analysis results are correct, and that stores a semantic information database storing semantic information indicating the meanings of various phrases; a semantic information extraction unit that extracts the semantic analysis result from the syntax / semantic analysis result read from the storage unit and, for a target phrase selected on the basis of the semantic analysis result from the processing target sentence read from the storage unit, searches the semantic information database in the storage unit to extract semantic information related to the target phrase; and a feature creation unit that creates features used to create the syntax / semantic analysis result ranking model by expanding the target phrase based on the semantic information extracted by the semantic information extraction unit.

  At this time, the device may further include a syntax information extraction unit that extracts the syntax analysis result from the syntax / semantic analysis result read from the storage unit and outputs it as syntax information, and the feature creation unit may create features used to create the syntax / semantic analysis result ranking model based on the syntax information output from the syntax information extraction unit.

  In addition, the semantic information database may be composed of a sense bank database that stores the sense assignment results for each target phrase of the processing target sentences, or a thesaurus / ontology database that stores various phrases and semantic categories placed in predetermined relationships with one another, and the semantic information extraction unit may search this semantic information database to extract, as semantic information, the meaning of the target phrase or another phrase or semantic category that has a predetermined relationship with the target phrase.

  In addition, the storage unit may store a syntactic / semantic information dictionary that accumulates, for various phrases, syntactic / semantic information indicating the meaning of a phrase for each syntactic context in which the phrase is used, and the semantic information extraction unit may search this dictionary for the target phrase so that the syntactic / semantic information of the target phrase is extracted as semantic information.

  At this time, the syntactic / semantic information dictionary may store at least one of a valence dictionary that accumulates valence information of target phrases, a representative word / typical word dictionary that accumulates representative or typical words for target phrases, and an example-based case frame dictionary that accumulates example case frames of target phrases, and the semantic information extraction unit may search the syntactic / semantic information dictionary to extract, as semantic information, any one or more of the valence information, representative words, typical words, and example case frames of the target phrase.

A program according to the present invention is a program for causing a computer to execute each step of the syntax / semantic analysis result ranking model creation method.
A recording medium according to the present invention is a recording medium on which the program is recorded.

  According to the present invention, the semantic information extraction unit extracts the semantic analysis result from the syntax / semantic analysis result read from the storage unit and, for the target phrase selected on the basis of the semantic analysis result from the processing target sentence read from the storage unit, searches the semantic information database of the storage unit to extract the semantic information about the target phrase; the feature creation unit then expands the target phrase based on the semantic information extracted by the semantic information extraction unit, thereby creating the features used to create the syntax / semantic analysis result ranking model.

  As a result, given the syntax / semantic analysis results of the sentences to be processed, evaluation results indicating whether they are correct, and semantic information corresponding to these analysis results, this semantic information can be used to create a syntax / semantic analysis result ranking model that is a statistical model based on meaning rather than on surface form. Ranking can therefore be performed with higher accuracy than when syntax / semantic analysis results are ranked with a surface-form-based statistical model, which is extremely useful for natural language processing systems, information retrieval systems, machine translation systems, and the like.

Next, embodiments of the present invention will be described with reference to the drawings.
[First Embodiment]
First, a syntactic / semantic analysis model creation apparatus according to a first embodiment of the present invention will be described with reference to FIG. FIG. 1 is a block diagram showing the configuration of the syntax / semantic analysis model creation apparatus according to the first embodiment of the present invention.
The syntactic / semantic analysis model creation apparatus 10 is implemented on a general information processing apparatus such as a server or a personal computer; it processes each syntax / semantic analysis result contained in an input tree bank database (hereinafter referred to as tree bank DB) X, and creates and outputs a syntax / semantic analysis ranking model Y for ranking syntax / semantic analysis results.

  In this embodiment, the storage unit stores, in advance, sets each consisting of a processing target sentence composed of natural language data, its syntax / semantic analysis result, and an evaluation result indicating whether the analysis result is correct, together with a semantic information database that stores semantic information indicating the meanings of various phrases. The learning feature creation unit extracts the semantic analysis result from the syntax / semantic analysis result read from the storage unit, searches the semantic information database in the storage unit for the target phrase selected, on the basis of the semantic analysis result, from the processing target sentence read from the storage unit, thereby extracting the semantic information concerning the target phrase, and expands the target phrase based on the extracted semantic information, thereby creating the features used to create the syntax / semantic analysis result ranking model.

The configuration of the syntax / semantic analysis model creation apparatus according to the first embodiment of the present invention will be described below in detail with reference to FIG.
The syntactic / semantic analysis model creation device 10 includes, as its main functional units, an arithmetic processing unit 1, a storage unit 2, an input / output interface unit (hereinafter referred to as an input / output I / F unit) 3, a communication interface unit (hereinafter referred to as a communication I / F unit) 4, an operation input unit 5, and a screen display unit 6, as in a general information processing device.

The arithmetic processing unit 1 is composed of a microprocessor such as a CPU and its peripheral circuits, and reads and executes the program 20 stored in the storage unit 2, whereby the hardware and the program 20 cooperate to realize various processing units.
The main processing units realized by the arithmetic processing unit 1 include a learning feature creation unit 11 and a machine learning unit 12.

The storage unit 2 is composed of a storage device such as a hard disk or a memory, and stores the program 20 executed by the arithmetic processing unit 1 as well as various processing information used in the model creation process. The program 20 is, for example, read from the recording medium M via the input / output I / F unit 3, or read from an external device (not shown) via the communication I / F unit 4, and stored in the storage unit 2 in advance.
Main processing information stored in the storage unit 2 includes a tree bank database (hereinafter referred to as tree bank DB) 21, a semantic information database (hereinafter referred to as semantic information DB) 22, and a syntax / semantic analysis ranking model 23.

The input / output I / F unit 3 is composed of a dedicated data input / output circuit, is connected to a recording medium M such as a CD, a DVD, or a nonvolatile memory card, and has a function of inputting and outputting, in accordance with instructions from the arithmetic processing unit 1, various data and programs such as the tree bank DB X, the syntax / semantic analysis ranking model Y, dictionaries, and databases.
The communication I / F unit 4 is composed of a dedicated data communication circuit, communicates with external devices such as servers connected via a communication line such as a LAN, and has a function of transmitting and receiving, in accordance with instructions from the arithmetic processing unit 1, various data and programs such as the tree bank DB X, the syntax / semantic analysis ranking model Y, dictionaries, and databases.

The operation input unit 5 includes an operation input device such as a keyboard and a mouse, and has a function of detecting an operation of the operator and outputting the operation to the arithmetic processing unit 1.
The screen display unit 6 includes a screen display device such as an LCD or a PDP, and has a function of displaying various data, such as the tree bank DB X and the syntax / semantic analysis ranking model Y, and an operation screen in accordance with instructions from the arithmetic processing unit 1.

FIG. 2 is a block diagram showing a main part of the syntax / semantic analysis model creation apparatus according to the first embodiment of the present invention.
The learning feature creation unit 11 has a function of reading, from the tree bank DB 21 of the storage unit 2, the sets each consisting of a processing target sentence, its syntax / semantic analysis result, and an evaluation result indicating whether it is correct, and of creating features for ranking from them.
The tree bank DB 21 stores sets each consisting of a processing target sentence composed of natural language data, a syntax / semantic analysis result obtained by linguistically analyzing the processing target sentence in advance, and an evaluation result indicating whether the analysis result is correct or incorrect.

The learning feature creation unit 11 includes a semantic information extraction unit 11A, a syntax information extraction unit 11D, a feature creation unit 11E, and a feature selection unit 11H.
The semantic information extraction unit 11A extracts the semantic analysis result from the syntax / semantic analysis result read from the tree bank DB 21, and, for the processing target sentence read from the tree bank DB 21 or for a target phrase selected from the processing target sentence on the basis of the semantic analysis result, searches the semantic information DB 22 to extract semantic information related to the target phrase. A target phrase is selected from the processing target sentence as the smallest unit that carries meaning on its own.

The semantic information DB 22 is a database that stores semantic information indicating meanings related to various phrases, and is composed of either or both of the sense bank DB 22A and the thesaurus / ontology DB 22B.
The sense bank DB 22A is a database that accumulates, for each processing target sentence, the sense assignment results for each target phrase constituting that sentence.
The thesaurus / ontology DB 22B is a database that accumulates various phrases and semantic categories that have a predetermined relationship with each other.

Specifically, the semantic information extraction unit 11A includes a meaning assignment unit 11B and a meaning provision unit 11C.
The meaning assignment unit 11B has a function of searching the sense bank DB 22A for the processing target sentence to extract the sense assignment result for the target phrase, and of searching the thesaurus / ontology DB 22B to extract semantic information such as the broader terms, synonyms, semantic category, and broader semantic category of the target phrase.

When the meaning assignment unit 11B cannot extract the semantic information of a target phrase, the meaning provision unit 11C has a function of automatically estimating and outputting semantic information on the phrase, such as its broader terms, semantic category, and broader semantic category in the thesaurus / ontology DB 22B. As an automatic estimation method for semantic information, it is conceivable to perform statistical learning using an existing sense bank or thesaurus / ontology as training data and create a statistical model for estimating semantic information, or to use the most frequently occurring sense among the senses of the phrase.
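The second, frequency-based estimation method mentioned here might be sketched as follows, with a hypothetical set of sense-annotated examples standing in for an existing sense bank:

```python
from collections import Counter

# Fallback sense estimation: when a target phrase has no extractable
# semantic information, use the sense that occurs most frequently for
# that phrase in existing sense-annotated data. The examples below are
# hypothetical.
annotated = [
    ("bank", "bank/financial-institution"),
    ("bank", "bank/financial-institution"),
    ("bank", "bank/river-edge"),
]

def most_frequent_sense(word, examples):
    counts = Counter(sense for w, sense in examples if w == word)
    if not counts:
        return None  # no information available: do not guess
    return counts.most_common(1)[0][0]

print(most_frequent_sense("bank", annotated))  # bank/financial-institution
```

A statistical model trained on the same annotated data could replace this simple frequency heuristic, as the text notes.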
Which of the sense bank DB 22A and the thesaurus / ontology DB 22B to use is selected based on the conditions required of the syntax / semantic analysis ranking model and its creation process, for example, the range of feature variation, ranking accuracy, time required for ranking, and time required for model creation. Both the sense bank DB 22A and the thesaurus / ontology DB 22B may be provided, or only one of them.

  The syntax information extraction unit 11D has a function of extracting the syntax analysis result from the syntax / semantic analysis result read from the tree bank DB 21.

  The feature creation unit 11E has a function of creating features used for creating the syntax / semantic analysis result ranking model based on the semantic information extracted by the semantic information extraction unit 11A and the syntax information extracted by the syntax information extraction unit 11D. A feature is learning information used for creating the syntax / semantic analysis result ranking model, in particular for learning in the machine learning unit 12 described later.

The feature creation unit 11E specifically includes a semantic feature creation unit 11F and a syntactic feature creation unit 11G.
The semantic feature creation unit 11F has a function of creating semantic features used for creating the syntax / semantic analysis result ranking model by expanding the target phrase based on the meaning, broader terms, synonyms, semantic category, and broader semantic category of the target phrase included in the semantic information extracted by the semantic information extraction unit 11A.
The syntactic feature creation unit 11G has a function of creating syntactic features used for creating the syntax / semantic analysis result ranking model by expanding the target phrase based on the syntax information extracted by the syntax information extraction unit 11D, such as the grammatical rules applied to the target phrase and its part of speech.
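The expansion performed by the semantic feature creation unit can be illustrated with a hypothetical hypernym chain: one observed word yields features at several levels of generality, so that sparse word-level features are backed off by broader-term features.

```python
# Hypothetical thesaurus fragment: each word maps to its broader term.
hypernyms = {"sedan": "car", "car": "vehicle", "vehicle": "artifact"}

def hypernym_chain(word, limit=10):
    """Follow broader-term links upward from a word (limit guards cycles)."""
    chain = [word]
    while word in hypernyms and len(chain) < limit:
        word = hypernyms[word]
        chain.append(word)
    return chain

def semantic_features(predicate, phrase):
    """Expand one target phrase into features at each generality level."""
    return [f"{predicate}:arg={w}" for w in hypernym_chain(phrase)]

print(semantic_features("drive", "sedan"))
# ['drive:arg=sedan', 'drive:arg=car', 'drive:arg=vehicle', 'drive:arg=artifact']
```

Even if "sedan" never appears with "drive" in the training data, the broader-term features can still match, which is the benefit of the expansion.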

The feature selection unit 11H has a function of selecting, from among the features created by the feature creation unit 11E, those features that satisfy a predetermined selection condition, for use in learning.
The machine learning unit 12 has a function of creating a syntax / semantic analysis ranking model 23 by machine learning of the features selected by the feature selection unit 11H.
The creation result of the syntax / semantic analysis ranking model 23 is fed back to the feature creation unit 11E and the feature selection unit 11H in order to change the creation and selection of the feature.
Eventually, the syntax / semantic analysis ranking model 23 created in this way may be incorporated into a syntax / semantic analyzer (not shown), or used to rank analysis results in post-processing, so as to improve the accuracy of correct-answer selection.
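The patent does not fix a particular machine learning method (the cited non-patent literature uses maximum entropy estimation); as a simpler illustrative stand-in for the machine learning unit 12, a perceptron-style reranker can be trained on feature sets of candidate analyses and their correct/incorrect evaluations:

```python
from collections import defaultdict

def train(groups, epochs=5):
    """Learn feature weights from training groups.
    groups: list of (candidates, gold_index), where each candidate is the
    feature list of one analysis and gold_index marks the correct one."""
    w = defaultdict(float)
    for _ in range(epochs):
        for candidates, gold in groups:
            scores = [sum(w[f] for f in c) for c in candidates]
            pred = scores.index(max(scores))
            if pred != gold:  # promote gold features, demote mistaken ones
                for f in candidates[gold]:
                    w[f] += 1.0
                for f in candidates[pred]:
                    w[f] -= 1.0
    return w

def rank(candidates, w):
    """Order candidate analyses by learned score, best first."""
    return sorted(candidates, key=lambda c: sum(w[f] for f in c), reverse=True)

# Hypothetical training data: two sentences, each with two candidate
# analyses described by single features, and the index of the correct one.
groups = [
    ([["drive:term1=EMPTY"], ["drive:term1=person"]], 1),
    ([["drive:term1=person"], ["drive:term1=EMPTY"]], 0),
]
w = train(groups)
best = rank([["drive:term1=EMPTY"], ["drive:term1=person"]], w)[0]
```

The learned weights play the role of the syntax / semantic analysis ranking model 23: at analysis time, candidate analyses are scored and sorted by them.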

[Operation of First Embodiment]
Next, the operation of the syntax / semantic analysis model creation device according to the first embodiment of the present invention will be described with reference to FIG. FIG. 3 is a flowchart showing a model creation process of the syntax / semantic analysis model creation apparatus according to the first embodiment of the present invention.
The arithmetic processing unit 1 of the syntax / semantic analysis model creation apparatus 10 starts the model creation process of FIG. 3 when the operation input unit 5 detects the start operation of the model creation process by the operator.

  First, the arithmetic processing unit 1 uses the learning feature creation unit 11 to read the sets each consisting of a processing target sentence, its syntax / semantic analysis result, and an evaluation result indicating whether it is correct, stored in the tree bank DB 21 of the storage unit 2 (step 100).

The learning feature creation unit 11 extracts the semantic analysis result from the syntax / semantic analysis result using the meaning assignment unit 11B and the meaning provision unit 11C of the semantic information extraction unit 11A, and, for the processing target sentence or a target phrase selected from it on the basis of the semantic analysis result, searches the semantic information DB 22 to extract semantic information related to the target phrase (step 101).
Next, the learning feature creation unit 11 uses the semantic feature creation unit 11F of the feature creation unit 11E to expand the target phrase based on the semantic information extracted by the semantic information extraction unit 11A, thereby creating semantic features used for creating the syntax / semantic analysis result ranking model (step 102).

  In parallel with this, the learning feature creation unit 11 extracts the syntax analysis result from the syntax / semantic analysis result using the syntax information extraction unit 11D (step 103), and the syntactic feature creation unit 11G of the feature creation unit 11E expands the target phrase based on the syntax information extracted by the syntax information extraction unit 11D to create syntactic features to be used for creating the syntax / semantic analysis result ranking model (step 104).

  Thereafter, the learning feature creation unit 11 uses the feature selection unit 11H to select, from among the features created by the feature creation unit 11E, those satisfying a predetermined selection condition for use in learning (step 105), and the machine learning unit 12 of the arithmetic processing unit 1 creates the syntax / semantic analysis ranking model 23 by machine learning of the features selected by the feature selection unit 11H (step 106), ending the series of model creation processing steps.

[Syntax / semantic analysis results]
Next, with reference to FIG. 4 and FIG. 5, the syntax / semantic analysis results used in the present embodiment will be described. FIG. 4 is an example of a syntax analysis result. FIG. 5 is an example of a syntax / semantic analysis result. Here, a case where the processing target sentence is Japanese will be described as an example; however, any language, such as English, Chinese, Spanish, German, or French, may be used.

FIG. 4 shows the results of syntactic analysis of the processing target sentence "person driving a car"; the analysis result T1-1 is an example of a correct analysis, and the analysis result T1-2 is an example of an incorrect analysis.
FIG. 5, in turn, shows the results of syntax / semantic analysis of the same sentence; the analysis result T1-1-2 is an example of a correct analysis, and the analysis results T1-1-1, T1-2-1, and T1-2-2 are examples of incorrect analyses.

The analysis result T1-1 in FIG. 4 alone does not reveal the relationship between "driving (a car)" and "person", but from the analysis result T1-1-2, which is evaluated as correct among the semantic analysis results for T1-1 in FIG. 5, it can be seen that "person" fills term 1 of the two terms required by "drive".
T1-1-1 in FIG. 5 has the same syntactic structure as T1-1-2 (T1-1 in FIG. 4). In the case of T1-1-1, however, term 1 of "drive" is empty; in other words, "driving (a car)" is interpreted as a mere relative clause modifying "person".
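The distinction described here, whether "person" fills term 1 of "drive" or the relative clause is left gapless, can be made concrete with a hypothetical encoding of the two competing analyses:

```python
# Hypothetical encoding of the two competing semantic analyses of
# "person driving a car": the same syntactic structure, but T1-1-2 binds
# "person" to term 1 of "drive", while T1-1-1 leaves term 1 empty (a mere
# relative-clause reading).
t1_1_2 = {"predicate": "drive", "term1": "person", "term2": "car"}  # correct
t1_1_1 = {"predicate": "drive", "term1": None, "term2": "car"}      # incorrect

def semantic_features(analysis):
    """Features that let a ranking model separate the two readings."""
    return [
        f"{analysis['predicate']}:{slot}={analysis[slot] or 'EMPTY'}"
        for slot in ("term1", "term2")
    ]

# The correct analysis yields "drive:term1=person"; the incorrect one
# yields "drive:term1=EMPTY". A ranking model can weight these features
# differently and prefer the correct reading.
```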

  As the tree bank DB 21 from which such syntax / semantic analysis results are obtained, assuming the processing target sentences are Japanese, usable resources include, as a tagged corpus providing not only syntax analysis results but also semantic analysis results, the Hinoki Treebank ("The Hinoki Treebank: Working Toward Text Understanding", Francis Bond, Sanae Fujita, Chikara Hashimoto, Kaname Kasahara, Shigeko Nariyama, Eric Nichols, Akira Otani, Takaaki Tanaka, Shigeaki Amano, COLING-2004, Geneva, Switzerland, 2004), and, as tagged corpora from which predicate-term relationships can be extracted, the NAIST Text Corpus (http://cl.naist.jp/nldata/corpus/) and the Kyoto Text Corpus Version 4.0 (http://nlp.kuee.kyoto-u.ac.jp/nl-resource/corpus.html).

[Semantic information DB]
Next, the semantic information DB used in the present embodiment will be described with reference to FIGS. FIG. 6 is an example of meaning imparting information of the processing target sentence stored in the sense bank DB. FIG. 7 is an example of a dictionary entry that serves as a basis for assigning meaning to a processing target sentence stored in the sense bank DB. FIG. 8 is an example of the thesaurus / ontology of the processing target sentence stored in the thesaurus / ontology DB.

  The semantic information extraction unit 11A searches the sense bank DB 22A of FIG. 6 and the thesaurus / ontology DB 22B of FIG. 8 based on semantic analysis results such as those in FIG. 5, and extracts semantic information. However, when no semantic analysis result such as that of FIG. 5 is available, or when it is desired to create a syntax / semantic analysis ranking model for obtaining only syntax analysis results, it is also possible to use a predicate-argument structure that can be determined from the syntax analysis result alone, as in FIG. 4. The senses assigned in FIG. 6 are defined by the dictionary entries in FIG. 7.

  As the sense bank DB 22A, assuming that the sentence to be processed is Japanese, there are, for example, Lexeed ("Construction of a Basic Word Semantic Database: Lexeed", Kaname Kasahara, Hiroshi Sato, Francis Bond, Takaaki Tanaka, Sanae Fujita, Tomoko Kanesugi, Akinori Amano, 2004-NLC-159, pp.75-82, 2004; "The Hinoki Sensebank - A Large-Scale Word Sense Tagged Corpus of Japanese -", Takaaki Tanaka, Francis Bond, Sanae Fujita, Proceedings of the Workshop on Frontiers in Linguistically Annotated Corpora 2006, Sydney, pp.62-69, 2006 (ACL Workshop)) and the corpus tagged according to the Iwanami Japanese Dictionary (http://gsk.or.jp/doc/IWANAMI2004.pdf).

However, when the sense bank DB 22A does not exist, or when the sense bank DB 22A has no sense-assignment information for the processing target sentence, the semantic information extraction unit 11A may use the sense assignment unit 11C to assign senses to the processing target sentence and then acquire the semantic categories, hypernyms, hypernym semantic categories, and the like of the phrases from the thesaurus/ontology DB 22B.
Even when senses cannot be assigned, it is also conceivable to acquire and use only the semantic categories, hypernyms, hypernym semantic categories, and the like of the words from the thesaurus/ontology DB 22B.

  As the thesaurus/ontology DB 22B, assuming that the sentence to be processed is Japanese, there are, for example, the Japanese lexicon Goi-Taikei (Nippon Telegraph and Telephone Corporation, http://www.kecl.ntt.co.jp/icl/mtg/resources/GoiTaikei/index.html), the Word List by Semantic Principles, Revised and Enlarged Edition (National Institute for Japanese Language, http://www.kokken.go.jp/katsudo/kanko/data/index.html), and the Hinoki ontology ("Acquiring an Ontology for a Fundamental Vocabulary", Francis Bond, Eric Nichols, Sanae Fujita, Takaaki Tanaka, COLING-2004, Geneva, pp.1319-1325, 2004).

In these examples the headword is a verb, but it need not be; it may also be an adjective or a noun that takes arguments. Likewise, the headwords here are single words, but they need not be; they may also be multi-word expressions such as compound words.
The sense bank DB 22A need not hold a single sense bank; a plurality of sense banks may be stored. For example, both the Hinoki sensebank and the Iwanami-tagged corpus could be stored. When no sense bank exists, or when one exists but has no sense-assignment information for the target sentence, it is conceivable to use only the sense assignment unit 11C.
Similarly, the thesaurus/ontology DB 22B need not hold a single thesaurus/ontology; a plurality of them may be stored. For example, both Goi-Taikei and the Hinoki ontology could be stored.

[Semantic information extraction processing]
Next, semantic information extraction processing in the present embodiment will be described with reference to FIGS. 9 and 10. FIG. 9 is an example of words included in the thesaurus / ontology semantic category. FIG. 10 shows an example of the execution result of the semantic information extraction process.
The semantic information extraction unit 11A extracts syntax/semantic analysis results, together with determinations of whether each analysis result is correct or incorrect, from one or more treebanks stored in the tree bank DB 21 (FIGS. 4 and 5). Here, assuming that the analysis result T1 of FIG. 4 is provided from the tree bank DB 21, the semantic information extraction unit 11A acquires the sense-assignment result corresponding to the analysis result T1 from the sense bank DB 22A of FIG. 6.

  In the thesaurus of FIG. 8, the words belonging to each semantic category are also registered, as shown in FIG. 9. The semantic information extraction unit 11A therefore acquires from FIG. 9 the semantic categories of the target words, that is, the content words "person", "car", and "driving" of the analysis result T1, and obtains the execution result shown in FIG. 10.

  If, in the thesaurus/ontology DB 22B being used, the semantic categories contain word senses rather than surface words, for example if <control> contains "driving (D2-1)", the semantic information extraction unit 11A can acquire the semantic category corresponding to the sense via the sense-assignment result. Conversely, when the sense of a word is unknown, it is conceivable to acquire all semantic categories corresponding to the surface form, to acquire the semantic category of the sense in which the word is most commonly used, or to estimate which semantic category applies.

  Further, the semantic information extraction unit 11A acquires the upper category of each semantic category from FIG. 8. Although FIG. 8 shows only levels down to level 3, if <maneuver> is a level-9 semantic category under <human activity> and <vehicle (land)> is a level-7 semantic category under <inanimate>, upper categories such as those in FIG. 10 can be acquired. By varying the level of the upper categories to be acquired, words can be consolidated into semantic categories of various granularities.
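As one way to picture this processing, the following is a minimal sketch of semantic-category lookup and upper-category rollup. The thesaurus fragment, category names, and levels are illustrative assumptions made for this sketch, not the actual contents of the thesaurus/ontology DB 22B.

```python
# category -> (level, parent); a tiny illustrative thesaurus fragment
THESAURUS = {
    "noun":            (1, None),
    "concrete":        (2, "noun"),
    "abstract":        (2, "noun"),
    "person":          (3, "concrete"),
    "inanimate":       (3, "concrete"),
    "human activity":  (3, "abstract"),
    "driver":          (4, "person"),
    "vehicle (land)":  (4, "inanimate"),
    "maneuver":        (4, "human activity"),
}

# word -> semantic category (cf. the word lists of FIG. 9)
WORD_CATEGORY = {
    "person": "person",
    "car": "vehicle (land)",
    "driving": "maneuver",
}

def upper_category(category: str, level: int) -> str:
    """Climb parent links until the category is at or above `level`."""
    cat = category
    while THESAURUS[cat][0] > level:
        parent = THESAURUS[cat][1]
        if parent is None:
            break
        cat = parent
    return cat

def semantic_info(word: str, level: int = 3):
    """Return (semantic category, upper category at the given level)."""
    cat = WORD_CATEGORY[word]
    return cat, upper_category(cat, level)
```

Raising or lowering `level` coarsens or refines the consolidation, which is exactly the granularity adjustment described above.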

[Semantic feature creation process]
Next, the semantic feature creation processing in the present embodiment will be described with reference to FIG. 11. FIG. 11 shows an example of the execution result of the semantic feature creation processing. Of the features shown in FIG. 11, B0-B6 are surface-based features, of the kind used in previous studies.

  Using the semantic information of FIG. 10 extracted by the semantic information extraction unit 11A, the semantic feature creation unit 11F expands the content words of the features extracted from the analysis result T1-1 of FIG. 4 (here, the features B0-B5), or expands only some of the content words, and thereby creates the features F0-F5 based on sense information.

Similarly, the semantic feature creation unit 11F expands the content words using the semantic categories of FIG. 10, or expands only some of them, and creates the features C0-C7 based on semantic categories.
Likewise, it expands the content words using the upper categories of FIG. 10, or expands only some of them, and creates the features H0-H7 based on upper categories. Here, H0-H7 use the level-3 semantic category as the upper category.

  Besides those in FIG. 11, various feature creation methods are conceivable, such as replacing words with their parts of speech or combining features. It is also not necessary to create all the features shown in FIG. 11; for example, if no thesaurus/ontology DB 22B exists, the features using semantic categories and upper categories (C0-C7, H0-H7) would simply not be created.
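To make the expansion concrete, here is a hedged sketch of how one surface predicate-argument feature might be expanded into sense-, category-, and upper-category-based variants. The feature encoding and the annotation table are assumptions of this sketch, not the actual format of FIG. 11.

```python
from itertools import product

def expand_feature(pred, args, annotations):
    """Expand a surface predicate-argument feature into variants in which
    each content word is replaced by one of its annotations (sense ID,
    semantic category, or upper category).  `annotations` maps a word to
    its list of alternative representations, surface form included."""
    slots = [pred] + [word for _, word in args]
    choices = [annotations.get(w, [w]) for w in slots]
    feats = set()
    for combo in product(*choices):
        head, rest = combo[0], combo[1:]
        feats.add((head,) + tuple((role, r)
                                  for (role, _), r in zip(args, rest)))
    return feats

# Illustrative annotations: surface form plus a semantic category.
ANNOTATIONS = {
    "drive":  ["drive", "<maneuver>"],
    "person": ["person", "<person>"],
    "car":    ["car", "<vehicle (land)>"],
}
```

Expanding all content words of [drive, arg1: person, arg2: car] with two alternatives each yields eight feature variants; expanding "only some of the content words" corresponds to leaving certain slots with just their surface form.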

Here, the effect of using the created features is illustrated.
First, consider the effect of using sense information. Sentences (1) "Tighten the screw with a driver" and (2) "The driver tightened the screw" are superficially very similar, but the "driver" in (1) is a driver in the sense of a screwdriver (D4-1) and is the instrument of "tighten", while the "driver" in (2) is a driver in the sense of a person who drives (D4-2) and is the subject. Even such similar sentences can thus be entirely different depending on the senses involved.

When sense information is not used, the following features are created from the correct syntax/semantic analysis results, and spurious ambiguity arises in the relation between "tighten" and "driver":
[Tighten, Item 1: Driver, Item 2: Screw]
[Tighten, Item 1: Driver]
[Tighten, Item 1: -, Item 2: Screw, Special: Driver]
[Tighten, Excellent: Driver]

By using the senses, however, the features become
[Tighten, Item 1: Driver (D4-2), Item 2: Screw (D5-1)]
[Tighten, Item 1: Driver (D4-2)]
[Tighten, Item 1: -, Item 2: Screw (D5-1), Special: Driver (D4-1)]
[Tighten, Excellent: Driver (D4-1)]
and the two usages can be distinguished.

Next, the effect of using semantic categories is shown. (3) Consider "he drives a new car". The following surface-based feature does not appear in FIG. 11 at all:
[Driving, Item 1: He, Item 2: New car]
However, from FIG. 9 the semantic categories of "he", "new car", and "driving" are seen to be <person>, <vehicle (land)>, and <maneuver>, respectively, so the features corresponding to C0-C7 of FIG. 11 can be acquired.

  Next, the effect of using upper semantic categories is shown. (4) Consider "a test pilot who pilots an airplane". As in (3), the surface-based features do not appear. The semantic categories of "test pilot", "airplane", and "pilot" are <driver>, <vehicle (sky)>, and <maneuver>, respectively, and these semantic categories likewise do not appear in FIG. 11 at all. However, when these semantic categories are further expanded into upper categories, they become <person>, <inanimate>, and <human activity> at level 3, respectively, and the features corresponding to H0-H7 of FIG. 11 can be acquired.

[Feature selection process]
Next, the feature selection processing in the present embodiment will be described with reference to FIG. 11.
FIG. 11 shows the features created by the semantic feature creation unit 11F, but it is not necessary to use all of these features for creating the syntax/semantic analysis ranking model 23.
The feature selection unit 11H extracts or deletes features satisfying specific conditions, and outputs to the machine learning unit 12 the correct/incorrect determination of each syntax/semantic analysis result together with the features corresponding to that analysis result.

  As a feature selection method, conditions such as using, among the features shown in FIG. 11, only the basic features (B0-B6) and the semantic-category features (C0-C7) are conceivable. The feature selection conditions are not fixed, and may be changed arbitrarily according to, for example, the intended nature and use of the desired syntax/semantic analysis ranking model 23.
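As a minimal sketch of the selection condition mentioned above, assuming features are tagged with the type IDs of FIG. 11 (B0-B6, C0-C7, F0-F5, H0-H7), selection by type prefix could look like:

```python
def select_features(features, keep_prefixes=("B", "C")):
    """Keep only features whose type ID starts with one of the given
    prefixes, e.g. the basic features B0-B6 and the semantic-category
    features C0-C7.  Features are hypothetical (type_id, body) pairs."""
    return [f for f in features if f[0].startswith(keep_prefixes)]
```

Changing `keep_prefixes` corresponds to changing the selection condition to suit the intended use of the ranking model.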

[Machine learning processing]
Next, the machine learning processing in the present embodiment will be described with reference to FIG. 12. FIG. 12 shows an example of the execution result of the machine learning processing.
The machine learning unit 12 receives, from the learning feature creation unit 11, the correct/incorrect determination of each syntax/semantic analysis result and the feature set created for that analysis result by the feature creation unit 11E, statistically learns which features tend to appear in correct and in incorrect analysis results, and outputs the learning result to the syntax/semantic analysis ranking model 23.

  The machine learning unit 12 holds one or more statistical learners. For ranking, a learner that can assign a probability that an analysis result is correct can be used, for example maximum entropy / minimum divergence (MEMD) based on a log-linear model (Malouf (2002), http://tadm.sourceforge.net). For deciding whether an analysis result is correct or incorrect, a learner capable of binary classification can be used, such as the Support Vector Machine (SVM) (Vapnik, V.N.: Statistical Learning Theory, Adaptive and Learning Systems for Signal Processing, Communications, and Control, John Wiley & Sons (1998)).

  The processing results in FIG. 12 are examples of the parameters assigned to each feature when maximum entropy / minimum divergence is used as the machine learning unit 12. If a parameter is large and positive, the probability of the analysis being correct is high when that feature appears; conversely, if it is negative, the probability of the analysis being incorrect is high when the feature appears.

In general, a feature that appears in both correct and incorrect data receives a small parameter; a feature that appears only in correct data receives a large positive parameter, and one that appears only in incorrect data receives a large negative parameter.
The weight of a parameter also varies with the appearance frequency of its feature. When ranking new analysis results, a score (the probability of being correct) is calculated for each analysis result from its feature set and these parameters, and the results are ranked by that score.
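The scoring and ranking step can be sketched as follows for a log-linear model; the feature parameters here are toy values, not those of FIG. 12.

```python
import math

def score(feature_set, params):
    """Log-linear score: sum of the parameters of the features that
    fire; features with no learned parameter contribute 0."""
    return sum(params.get(f, 0.0) for f in feature_set)

def rank(candidates, params):
    """candidates: {analysis_id: feature_set}.  Returns (id, probability)
    pairs sorted best-first, with scores normalized into probabilities."""
    scores = {a: score(fs, params) for a, fs in candidates.items()}
    z = sum(math.exp(s) for s in scores.values())
    return sorted(((a, math.exp(s) / z) for a, s in scores.items()),
                  key=lambda pair: pair[1], reverse=True)
```

An analysis whose features carry mostly positive parameters rises to the top of the ranking; one whose features carry negative parameters sinks.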

[Syntax / Semantic Ranking Model]
Next, the function of the syntax/semantic analysis ranking model 23 in the present embodiment will be described.
The syntax/semantic analysis ranking model 23 has a database function: it not only accumulates the various features but is also itself evaluated, and feedback for changing the feature creation and selection methods according to the evaluation result is returned to the feature selection unit 11H and the feature creation unit 11E.

  For example, the treebank stored in the tree bank DB 21 is divided into training data and test data, and the test data is evaluated with the syntax/semantic analysis ranking model learned from the training data. The feature selection unit 11H then changes the features to be used, and the feature creation and selection methods are adjusted so as to improve the evaluation result on the test data, that is, the proportion of analysis results that are actually correct in the treebank and are also evaluated as correct by the machine learning unit 12. For example, the level of the upper category can be varied and the level giving the best evaluation result adopted; or, if features of type B3 degrade the evaluation result, type-B3 features can simply not be used.

  By judging the validity of the features from the evaluation result in the machine learning unit and changing the feature creation and selection methods accordingly, more effective features can be created and selected, and a more accurate syntax/semantic analysis ranking model can be built.
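The feedback loop described above — varying, say, the upper-category level and keeping the setting that evaluates best on held-out data — can be sketched as follows. `build_model` and `accuracy` are placeholders standing in for the machine learning unit 12 and the test-data evaluation; both they and the level grid are assumptions of this sketch.

```python
def tune_upper_level(train, test, levels, build_model, accuracy):
    """Grid-search the upper-category level on held-out data: learn a
    ranking model from `train` at each candidate level and keep the
    level whose model scores best on `test` (e.g. the proportion of
    truly correct analyses that are ranked first)."""
    return max(levels, key=lambda lv: accuracy(build_model(train, lv), test))
```

The same pattern applies to any other feature-creation choice, such as dropping a feature type that degrades the evaluation result.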

[Syntax / Semantic Analysis Ranking Device]
Next, a syntax/semantic analysis ranking apparatus that uses the syntax/semantic analysis ranking model created by the syntax/semantic analysis ranking model creation apparatus according to the first embodiment of the present invention will be described with reference to FIG. 13. FIG. 13 is a block diagram showing a schematic configuration of the syntax/semantic analysis ranking apparatus.

The syntax/semantic analysis ranking device 30 is composed of a general information processing device such as a server or a personal computer. It takes in an input sentence 3X consisting of natural language data, refers to the syntax/semantic analysis ranking model 23, and outputs a ranked syntax/semantic analysis solution 3Y, or a classification result indicating whether each analysis result is correct or incorrect.
As its main functional units, the syntax/semantic analysis ranking device 30 includes a language analyzer 31, a syntax/semantic analyzer 32, a learning feature creation unit 11, and a machine learning unit 12.

Of these, the learning feature creation unit 11 and the machine learning unit 12 are equivalent to those of the syntax/semantic analysis model creation device 10, as are the semantic information DB 22 and the syntax/semantic analysis ranking model 23.
The language analyzer 31 includes a general morphological analyzer, a chunker, and the like, and performs morphological analysis and phrase chunking of the input sentence 3X. The syntax/semantic analyzer 32 is a general syntax/semantic analyzer, and performs syntax/semantic analysis of the input sentence 3X based on the analysis result of the language analyzer 31.

In the syntax/semantic analysis ranking apparatus 30, the target sentence is first taken in as the input sentence 3X, basic language analysis is performed by the language analyzer 31, and syntax/semantic analysis is performed by the syntax/semantic analyzer 32. If a plurality of analysis result candidates are obtained at this point, the feature creation unit 11E creates features for the multiple syntax/semantic analysis solution candidates 33.
Then, using the syntax/semantic analysis ranking model 23 created in advance by the syntax/semantic analysis model creation device 10, the probability that each analysis result is correct is calculated, and the analysis results arranged in descending order of probability are output as the syntax/semantic analysis solution 3Y. Alternatively, each analysis result is classified as correct or incorrect and output with that classification attached.

  Various ways of using the ranked syntax/semantic analysis solution 3Y obtained as the output are conceivable: extracting only the analysis result with the highest probability of being correct, using several of the top analysis results with high probability, or using all analysis results whose probability is at least a certain threshold.
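These three ways of consuming the ranked output can be sketched as follows; the probabilities in the example are toy values, not actual outputs of the ranking device 30.

```python
def use_ranked_solutions(ranked, mode="best", k=3, threshold=0.5):
    """`ranked`: list of (analysis, probability) pairs sorted best-first,
    as output by the ranking device.  Selection strategies from the
    text: the single best analysis, the top-k analyses, or all analyses
    whose probability meets a threshold."""
    if mode == "best":
        return ranked[:1]
    if mode == "topk":
        return ranked[:k]
    if mode == "threshold":
        return [(a, p) for a, p in ranked if p >= threshold]
    raise ValueError(f"unknown mode: {mode}")
```

A downstream application such as machine translation might take only the best analysis, while an information retrieval system might prefer the threshold strategy to keep several plausible readings.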

  For example, for the syntax/semantic analysis of (4) "a test pilot who pilots an airplane", syntax analysis candidates and semantic analysis candidates are obtained as shown in FIGS. 14 and 15. FIG. 14 shows example syntax analysis candidates; FIG. 15 shows example semantic analysis candidates. As illustrated, there are a plurality of analysis results for the input sentence 3X, and at the stage where only the multiple syntax/semantic analysis solution candidates 33 have been produced, it is not known which of them is correct.

  In the syntax/semantic analysis ranking apparatus 30, on the other hand, when features are created through the feature creation unit 11E for each of the multiple syntax/semantic analysis solution candidates 33, the features created from U1-1-2 of FIG. 15, for example, include the features C5 and H4-H7 of FIG. 11 created from the correct data T1-1-2, while the features created from U1-1-1 of FIG. 15 include the features H6 and P6 of FIG. 11.

  However, many of the features created from U1-1-1 are the same as the features created from the incorrect data T1-1-1. Similarly, the features created from U1-2-1 include the features C5, H5, and P5 of FIG. 11, and the features created from U1-2-2 include none of the features of FIG. 11; the features created from U1-2-1 and U1-2-2 largely coincide with those created from the incorrect data T1-2-1 and T1-2-2. Features created from incorrect data tend to have negative parameters, and features created from correct data tend to have positive parameters.

  Therefore, when the scores of being correct are calculated, if the respective scores of U1-1-1, U1-1-2, U1-2-1, and U1-2-2 are, for example, 8.263206, 20.586221, -0.0199882, and -10.57533, the syntax/semantic analysis solution 3Y rearranged in descending order of score as U1-1-2, U1-1-1, U1-2-1, U1-2-2 is obtained.

[Effect of the first embodiment]
  As described above, according to the present embodiment, the storage unit 2 stores in advance sets each consisting of a processing target sentence composed of natural language data, its syntax/semantic analysis result, and an evaluation result indicating whether that analysis result is correct, together with a semantic information DB 22 storing semantic information indicating the meanings of various phrases. The semantic information extraction unit 11A extracts a semantic analysis result from the syntax/semantic analysis result read from the storage unit and, for the target phrases selected on the basis of that semantic analysis result from the processing target sentence read from the storage unit, searches the semantic information DB 22 in the storage unit 2 and extracts the semantic information related to each target phrase. The feature creation unit 11E then expands the target phrases based on the semantic information extracted by the semantic information extraction unit 11A, and thereby creates the features used in creating the syntax/semantic analysis result ranking model 23.

  As a result, given the syntax/semantic analysis results of processing target sentences, evaluation results indicating whether they are correct, and semantic information corresponding to those analysis results, this semantic information can be used to create a syntax/semantic analysis result ranking model 23 that is a statistical model based on meaning rather than on surface forms. Ranking can therefore be performed with higher accuracy than when syntax/semantic analysis results are ranked with a surface-based statistical model, which is extremely useful for natural language processing systems, information retrieval systems, machine translation systems, and the like.

  In the present embodiment, the syntax information extraction unit 11D extracts the syntax analysis result from the syntax/semantic analysis result read from the storage unit 2 and outputs it as syntax information, and the feature creation unit 11E also creates features used for creating the syntax/semantic analysis result ranking model based on this syntax information. Learning can therefore use both syntax information and semantic information, and the accuracy of the syntax/semantic analysis result ranking model 23 can be increased.

  In the present embodiment, the semantic information DB 22 is a sense bank database that accumulates sense-assignment results for each target phrase of the processing target sentences, or a thesaurus/ontology database that accumulates various phrases together with other phrases and semantic categories in predetermined relations to them, and the semantic information extraction unit 11A searches the semantic information DB 22 and extracts, as semantic information, the sense of the target phrase or another phrase or semantic category in a predetermined relation to the target phrase.

  Semantic information such as hypernyms and synonyms of the target phrase, its semantic category, and its upper semantic categories can therefore be extracted. For this reason, even when surface-based or sense-based features are too sparse, the statistical model can be smoothed to increase accuracy. Further, by adjusting the granularity of the semantic classes and superordinate concepts, the degree of smoothing can be tuned and a statistical model with optimal granularity can be constructed.

[Second Embodiment]
Next, a syntax/semantic analysis model creation apparatus according to the second embodiment of the present invention will be described with reference to FIGS. 16 and 17. FIG. 16 is a block diagram showing the configuration of the syntax/semantic analysis model creation apparatus according to the second embodiment, and FIG. 17 is a block diagram showing its main part; in both figures, parts that are the same as or equivalent to those described in the first embodiment are denoted by the same reference numerals.

In the first embodiment, the case where the learning feature creation unit 11 extracts semantic information from the semantic information DB 22 was described as an example. In the present embodiment, a case where semantic information is extracted by searching both the semantic information DB 22 and a syntactic/semantic information dictionary 24 will be described.
Compared to the first embodiment, in the syntax/semantic analysis model creation apparatus according to the present embodiment, the syntactic/semantic information dictionary 24 is added to the storage unit 2, and a syntactic/semantic information extraction unit 11I is added to the semantic information extraction unit 11A of the learning feature creation unit 11. The other components are the same as in the first embodiment, and detailed description thereof is omitted here.

  The syntactic/semantic information dictionary 24 is a dictionary database that accumulates, for each word or phrase used in various expressions, syntactic/semantic information indicating its behavior and meaning. It is composed of all or any of a valence dictionary database (hereinafter, valence dictionary DB) 24A, a representative word / typical word dictionary database (hereinafter, representative word / typical word dictionary DB) 24B, and an example-based case frame dictionary database (hereinafter, example case frame dictionary DB) 24C.

The valence dictionary DB 24A is a database accumulating a valence dictionary for the processing target sentences; each stored dictionary record contains at least the headword of the target phrase, its part of speech, and valence information. The valence information describes the co-occurring words (mainly idioms) and the conditions on the words (mainly nouns) and phrases (mainly noun phrases) that fill each slot.
The representative word / typical word dictionary DB 24B is a database storing a representative word dictionary and a typical word dictionary for the processing target sentences; each dictionary record contains at least the headword of the target phrase, its part of speech, and information on the words (mainly nouns) and phrases (mainly noun phrases) likely to co-occur with the headword.
The example case frame dictionary DB 24C is a database accumulating example-based case frame dictionaries for the processing target sentences; each stored record contains at least the headword of the target phrase, its part of speech, and information on examples of the words and phrases that actually co-occur with the headword.

  The syntactic/semantic information extraction unit 11I has a function of extracting from the syntactic/semantic information dictionary 24 the syntactic/semantic information related to the relations between the target phrase and other phrases, and to the hypernyms, synonyms, semantic category, upper semantic category, and the like of the target phrase obtained by the sense assignment units 11B and 11C, and outputting it as semantic information.

  FIG. 18 shows an example of valence dictionary entries. When the sentence to be processed is Japanese, examples of the valence dictionary DB 24A include the Japanese side of the ALT-J/E pattern pair dictionary (NTT, "Japanese-English Machine Translation Technology", NTT R&D vol.46, pp.107-141, 1997), IPAL (Information-technology Promotion Agency, Japan, "Computer Japanese Verb Dictionary IPAL (Basic Verbs)", 1987), and the EDR electronic dictionary (Japan Electronic Dictionary Research Institute, http://www.iijnet.or.jp/edr/J_index.html).

  As shown in FIG. 18, a valence dictionary entry for a processing target sentence registers at least a headword, its part of speech, and case frame information. For example, the valence dictionary entry PID1 registers the headword "drive", its part of speech "verbal (sahen) noun", and the case frame information "N1 drives N2", where N1 and N2 denote nouns or noun phrases.

FIG. 18 also registers the selection restrictions <person> and <vehicle>. Selection restrictions define the conditions on the nouns or noun phrases that may fill N1 or N2 in the case frame.
In FIG. 18, PID2 is registered in addition to PID1 as a valence dictionary entry for the headword "drive".

  FIG. 19 shows an example of entries in the representative word / typical word dictionary. Assuming that the sentence to be processed is Japanese, examples include the representative vocabulary of Akiba et al. ("Interactive Generalization of a Translation Example Using Queries Based on a Semantic Hierarchy", Yasuhiro Akiba, Hiromi Nakaiwa, Satoshi Shirai and Yoshifumi Ooyama, in ICTAI-00, pp.326-332, 2000) and the typical words of Nariyama et al. (Takaaki Tanaka, Hiromi Nakaiwa, in Machine Translation Summit X, Phuket, pp.3-10, 2005).

  FIG. 20 shows an example of entries in the example-based case frame dictionary. When the sentence to be processed is Japanese, examples include the Web case frames (http://www.kc.t.u-tokyo.ac.jp/c/deliverables.html).

  As shown in FIGS. 19 and 20, the representative word / typical word dictionary and the example-based case frame dictionary, like the valence dictionary, register relations between words and phrases: at least a headword, its part of speech, and case frame information. In the case of FIG. 19, information such as the words typically taken as arguments of a given headword, their semantic categories, and typical words likely to fill each argument is stored. In the case of FIG. 20, case frame structures acquired from newspaper and Web data, together with the words appearing in each case and their semantic categories, are stored.

In these examples the headword is a verb, but it need not be; it may also be an adjective or a noun that takes arguments. Likewise, the headwords here are single words, but they need not be; they may also be multi-word expressions such as compound words.
The valence dictionary DB 24A, the representative word / typical word dictionary DB 24B, and the example case frame dictionary DB 24C each may store more than one dictionary; for example, both the ALT-J/E valence dictionary and IPAL could be stored. Any of the valence dictionary, the representative word / typical word dictionary, and the example-based case frame dictionary may also be absent.

[Operation of Second Embodiment]
Next, the operation of the syntax/semantic analysis model creation apparatus according to the second embodiment of the present invention will be described with reference to FIG. 3.
The model creation processing in the present embodiment is almost the same as that of the first embodiment shown in FIG. 3, with the following differences in steps 101 and 102.

  In the first embodiment, in step 101 of FIG. 3, the learning feature creation unit 11 uses the sense assignment units 11B and 11C of the semantic information extraction unit 11A to extract the semantic analysis result from the syntax/semantic analysis result, and extracts semantic information related to the target phrases by searching the semantic information DB 22 for the processing target sentence or for the target phrases selected from it based on the semantic analysis result.

  In the present embodiment, in this step 101, the syntactic/semantic information extraction unit 11I further extracts from the syntactic/semantic information dictionary 24 the syntactic/semantic information related to the relations between the target phrase and other phrases and to the hypernyms, synonyms, semantic category, upper semantic category, and the like of the target phrase obtained by the sense assignment units 11B and 11C, and outputs it as semantic information.

  At this time, if information on a word used in the processing target sentence is registered as an entry of the valence dictionary, the representative word / typical word dictionary, or the example-based case frames, the syntactic/semantic information extraction unit 11I extracts the matching entry.

For example, consider the processing target sentence T1 in FIG. 6, "person driving a car." The word information used in this sentence matches PID1 in FIG. 18, RID2 in FIG. 19, and CID1 in FIG. 20.
That is, in FIG. 18 there are two entries, PID1 and PID2, whose headword "driving" and part of speech match. From FIG. 10, "person" and "car" in T1 match the semantic categories <person> and <vehicle (land)> respectively, so PID1 can be seen to be the more appropriate entry.
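The entry selection described above — headword and part of speech must match, and the semantic categories of the arguments must satisfy the entry's selectional restrictions — can be sketched as follows. The slot patterns for PID1 and PID2 are hypothetical reconstructions from the example, not the actual FIG. 18 entries.

```python
# Hypothetical valence dictionary fragment; slot values are invented
# stand-ins for the selectional restrictions of PID1 and PID2.
VALENCE_DICT = [
    {"id": "PID1", "headword": "driving", "pos": "verb",
     "slots": {"N1": "<person>", "N2": "<vehicle (land)>"}},
    {"id": "PID2", "headword": "driving", "pos": "verb",
     "slots": {"N1": "<person>", "N2": "<organization>"}},
]

def matching_entries(headword, pos, arg_categories):
    """Return ids of entries whose headword/POS match and whose
    selectional restrictions agree with the argument categories."""
    hits = []
    for entry in VALENCE_DICT:
        if entry["headword"] != headword or entry["pos"] != pos:
            continue
        if all(entry["slots"].get(slot) == cat
               for slot, cat in arg_categories.items()):
            hits.append(entry["id"])
    return hits

# T1 "person driving a car": N1 = <person>, N2 = <vehicle (land)>
print(matching_entries("driving", "verb",
                       {"N1": "<person>", "N2": "<vehicle (land)>"}))
# ['PID1']
```

With the argument categories of T1, only PID1 survives, mirroring the conclusion drawn from FIG. 10 and FIG. 18.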

  Here, various conditions are conceivable for extracting a matching entry: among entries whose headword and part of speech match, or whose headword alone matches, one may require that the selectional restrictions also match, that the selectional preferences match, that all terms registered in the entry are satisfied, or that only some of them are satisfied. Also, depending on how matching is performed, methods such as assigning a cost to each candidate and extracting the entry with the best cost, or making use of a plurality of cost-ranked entries, are conceivable.
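A minimal sketch of the cost-based variant: each candidate entry is charged one unit per unsatisfied selectional restriction, and the lowest-cost entry wins. The slot patterns are hypothetical stand-ins for the PID1 and PID2 entries, not the actual dictionary contents.

```python
def entry_cost(entry_slots, arg_categories):
    """Cost = number of selectional restrictions the arguments fail."""
    return sum(1 for slot, cat in entry_slots.items()
               if arg_categories.get(slot) != cat)

# Hypothetical slot patterns standing in for the PID1/PID2 entries.
candidates = {
    "PID1": {"N1": "<person>", "N2": "<vehicle (land)>"},
    "PID2": {"N1": "<person>", "N2": "<organization>"},
}
args = {"N1": "<person>", "N2": "<vehicle (land)>"}  # T1 "person driving a car"
best = min(candidates, key=lambda pid: entry_cost(candidates[pid], args))
print(best)  # PID1 (cost 0, vs. cost 1 for PID2)
```

A partial match (cost 1) is not discarded here; keeping several cost-ranked entries corresponds to the "plurality of entries" variant mentioned above.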

Further, in the present embodiment, in step 102 of FIG. 3, when the semantic feature creation unit 11F of the feature creation unit 11E receives from the semantic information extraction unit 11A semantic information that includes the syntax semantic information extracted by the syntax semantic information extraction unit 11I, it creates features by the following methods. FIG. 21 shows an example of the execution result of the semantic feature creation process.
First, there is a method of selecting entries that match both the semantic information of FIG. 10 and the syntax semantic information, namely PID1 of FIG. 18, RID2 of FIG. 19, and CID1 of FIG. 20. In this case, the features P0, R0, and Y0 in FIG. 21 are selected.
Alternatively, entries matching the valence dictionary, the representative word / typical word dictionary, or the example case frame dictionary may be selected. In this case, the features P1, R1, and Y1 in FIG. 21 are selected.

In addition, with the method of recording which word ("person", "car") and which semantic category (<person>, <vehicle>) match each term (case), the feature P4 in FIG. 21 is selected.
Further, with the method of recording which terms (cases) match or do not match, the features P3, P5, and P6 (match) and the feature Y5 (mismatch) in FIG. 21 are selected.
Also, with the method of inserting the SCORE when an entry matches, the feature P7 in FIG. 21 is selected; with the method of recording whether the SCORE is high or low, the feature P8 in FIG. 21 is selected; and with the method of recording whether the appearance frequency is high or low, the features Y0-Y4 and Y6-Y8 in FIG. 21 are selected.
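Several of these feature-creation methods (whole-entry match, per-term match/mismatch) can be sketched in one small function. The feature strings emitted here ("all-match", "N1:match", and so on) are invented for illustration and do not correspond to the actual feature encoding of FIG. 21.

```python
def make_features(entry_id, entry_slots, arg_categories):
    """Emit string features from one dictionary entry vs. one sentence."""
    feats = []
    # Whole-entry match (cf. the P0/R0/Y0-style features).
    if all(arg_categories.get(s) == c for s, c in entry_slots.items()):
        feats.append(f"{entry_id}:all-match")
    # Per-term match / mismatch (cf. the P3/P5/P6 and Y5-style features).
    for slot, cat in entry_slots.items():
        tag = "match" if arg_categories.get(slot) == cat else "mismatch"
        feats.append(f"{entry_id}:{slot}:{tag}")
    return feats

print(make_features("PID1",
                    {"N1": "<person>", "N2": "<vehicle (land)>"},
                    {"N1": "<person>", "N2": "<vehicle (land)>"}))
# ['PID1:all-match', 'PID1:N1:match', 'PID1:N2:match']
```

The resulting strings are exactly the kind of binary features a machine learning step can weight when ranking analysis candidates.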

  In addition to the features in FIG. 21, various feature creation methods are conceivable, such as replacing features with parts of speech or combining features. However, it is not necessary to create all the features shown in FIG. 21; various approaches are conceivable, such as not creating the features that use the valence dictionary (P0-P8) when no valence dictionary exists.

Here, as an example of the effect of the created features, the effect of using the syntax semantic information dictionary 24 will be described.
For example, consider sentence (5), "I accidentally stepped off the stairs." In the analysis of (5), ambiguity arises as to whether the verb is to be interpreted as "to mistake" or as "to step off." Referring to PID3 and PID4 in FIG. 18, under the "mistake" interpretation N1 of PID3 matches but N2 does not match at all, whereas under the "step off" interpretation a feature matching both N1 and N2 of PID4 can be created, so a more probable feature can be created.
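The disambiguation just described amounts to counting how many case slots of each candidate entry the sentence satisfies and preferring the entry with more matches. A sketch, where the slot patterns for PID3 and PID4 are hypothetical stand-ins for the FIG. 18 entries:

```python
def match_count(entry_slots, arg_categories):
    """Number of case slots whose selectional restriction is satisfied."""
    return sum(1 for slot, cat in entry_slots.items()
               if arg_categories.get(slot) == cat)

# Hypothetical slot patterns for the two readings of sentence (5).
pid3 = {"N1": "<person>", "N2": "<abstract>"}   # "mistake" reading
pid4 = {"N1": "<person>", "N2": "<facility>"}   # "step off" reading
args = {"N1": "<person>", "N2": "<facility>"}   # "I ... the stairs"

print(match_count(pid3, args), match_count(pid4, args))  # 1 2
```

PID4 satisfies both slots while PID3 satisfies only N1, so features built from PID4 let the ranking model prefer the "step off" reading.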

[Effect of the second embodiment]
As described above, in the present embodiment, the storage unit 2 stores in advance the syntax semantic information dictionary 24, which accumulates, for various phrases, syntax semantic information indicating the meaning of a phrase for each syntax in which the phrase is used. Since the semantic information extraction unit 11A extracts the syntax semantic information of the target phrase as semantic information by searching this dictionary for the target phrase, the information in existing syntax dictionaries can be reused, and the accuracy of the syntax / semantic analysis system can be improved efficiently.

Brief description of the drawings

FIG. 1 is a block diagram showing the structure of the syntax / semantic analysis model creation apparatus according to the first embodiment of the present invention.
FIG. 2 is a block diagram showing the principal part of the syntax / semantic analysis model creation apparatus according to the first embodiment of the present invention.
FIG. 3 is a flowchart showing the model creation process of the syntax / semantic analysis model creation apparatus according to the first embodiment of the present invention.
FIG. 4 is an example of a parsing result.
FIG. 5 is an example of a syntax / semantic analysis result.
FIG. 6 is an example of the meaning provision information of the processing target sentence stored in the sense bank DB.
FIG. 7 is an example of an entry of the dictionary used as the reference for meaning addition to the processing target sentence stored in the sense bank DB.
FIG. 8 is an example of a thesaurus / ontology of a processing target sentence stored in the thesaurus / ontology DB.
FIG. 9 is an example of the words contained in a semantic category of the thesaurus / ontology.
FIG. 10 is an execution result example of the semantic information extraction process.
FIG. 11 is an execution result example of the semantic feature creation process.
FIG. 12 is an execution result example of the machine learning process.
FIG. 13 is a block diagram showing the schematic structure of a syntax / semantic analysis ranking apparatus.
FIG. 14 is a candidate example of parsing.
FIG. 15 is a candidate example of semantic analysis.
FIG. 16 is a block diagram showing the structure of the syntax / semantic analysis model creation apparatus according to the second embodiment of the present invention.
FIG. 17 is a block diagram showing the principal part of the syntax / semantic analysis model creation apparatus according to the second embodiment of the present invention.
FIG. 18 is an example of an entry of the valence dictionary.
FIG. 19 is an example of an entry of the representative word / typical word dictionary.
FIG. 20 is an example of an entry of the example case frame.
FIG. 21 is an execution result example of the semantic feature creation process.

Explanation of symbols

  DESCRIPTION OF SYMBOLS 10 ... Syntax / semantic analysis model creation apparatus, 1 ... Operation processing unit, 11 ... Learning feature creation unit, 12 ... Machine learning unit, 2 ... Storage unit, 20 ... Program, 21 ... Tree bank DB, 22 ... Semantic information DB, 22A ... Sense bank DB, 22B ... Thesaurus / ontology DB, 23 ... Syntax / semantic analysis ranking model, 24 ... Syntax semantic information dictionary, 24A ... Valence dictionary DB, 24B ... Representative word / typical word dictionary DB, 24C ... Case frame dictionary with examples DB, 3 ... Input / output I/F unit, 4 ... Communication I/F unit, 5 ... Operation input unit, 6 ... Screen display unit, X ... Tree bank DB, Y ... Syntax / semantic analysis ranking model, M ... Recording medium, 30 ... Syntax / semantic ranking device, 31 ... Language analyzer, 32 ... Syntax / semantic analyzer, 33 ... Syntax / semantic analysis candidate, 3X ... Input sentence, 3Y ... Syntax / semantic analysis solution.

Claims (12)

  1. A syntax / semantic analysis result ranking model creation method for creating a syntax / semantic analysis result ranking model for automatically ranking analysis results for natural language by machine learning of features created from a combination of a processing target sentence consisting of natural language data, its syntax / semantic analysis result, and an evaluation result indicating correctness, the method comprising:
    Semantic information that stores a set of processing target sentences consisting of natural language data, their syntax / semantic analysis results, and evaluation results indicating the correctness of the analysis results, and stores semantic information indicating meanings of various phrases. A storage step for storing the database;
    a semantic information extraction step in which a semantic information extraction unit extracts the semantic analysis result from the syntax / semantic analysis result read from the storage unit, and extracts semantic information related to a target phrase by searching the semantic information database of the storage unit for the processing target sentence read from the storage unit, or for the target phrase selected from the processing target sentence, based on the semantic analysis result; and
    A feature creation step of creating a feature used to create a syntax / semantic analysis result ranking model by expanding the target phrase based on the semantic information extracted by the semantic information extraction unit by the feature creation unit; A syntactic and semantic analysis result ranking model creation method characterized by
  2. In the method of creating a syntax / semantic analysis result ranking model according to claim 1,
    The syntax information extraction unit further includes a syntax information extraction step of extracting a syntax analysis result from the syntax / semantic analysis result read from the storage unit and outputting the result as syntax information,
    the feature creation step creating a feature used to create the syntax / semantic analysis result ranking model based on the syntax information output from the syntax information extraction unit.
  3. In the method of creating a syntax / semantic analysis result ranking model according to claim 1,
    The semantic information database comprises a sense bank database that accumulates the meaning-giving results for each target phrase of the processing target sentence, or a thesaurus / ontology database that accumulates various phrases and semantic categories that have a predetermined relationship with each other,
    By searching the semantic information database in the semantic information extraction step, the meaning of the target phrase or another phrase or semantic category having a predetermined relationship with the target phrase is extracted as the semantic information. A syntax / semantic analysis result ranking model creation method.
  4. In the method of creating a syntax / semantic analysis result ranking model according to claim 1,
    The storage unit further includes a storage step of storing a syntax and semantic information dictionary that stores syntax and semantic information indicating a meaning of the word / phrase for each syntax in which the word / phrase is used for various words and phrases,
    A syntactic / semantic analysis result ranking model creation method, wherein the semantic semantic information of the target word / phrase is extracted as the semantic information by searching the syntactic / semantic information dictionary for the target word / phrase in the semantic information extraction step.
  5. In the method for creating a syntax / semantic analysis result ranking model according to claim 4,
    The syntax semantic information dictionary consists of one or more of: a valence dictionary that accumulates valence information of target words, a representative word / typical word dictionary that accumulates representative words or typical words of target words, and a case frame dictionary with examples that accumulates example case frames of target words, and
    By searching the syntactic and semantic information dictionary in the semantic information extraction step, any one or more of the valence information, representative words, typical words, and example case frames of the target phrase are converted into the semantic information. A syntactic / semantic analysis result ranking model creation method characterized by extracting as
  6. A syntax / semantic analysis result ranking model creation device for creating a syntax / semantic analysis result ranking model for automatically ranking analysis results for natural language by machine learning of features created from a combination of a processing target sentence consisting of natural language data, its syntax / semantic analysis result, and an evaluation result indicating correctness, the device comprising:
    a storage unit that stores a set of a processing target sentence consisting of natural language data, its syntax / semantic analysis result, and an evaluation result indicating whether the analysis result is correct, and that stores a semantic information database accumulating semantic information indicating the meanings of various phrases;
    A semantic analysis result is extracted from a syntax / semantic analysis result read from the storage unit, and a semantic information database of the storage unit is selected for a target phrase selected from a processing target sentence read from the storage unit based on the semantic analysis result. A semantic information extraction unit that extracts semantic information related to the target phrase by searching;
    A feature creation unit that creates a feature used for creating a syntax / semantic analysis result ranking model by expanding the target phrase based on the semantic information extracted by the semantic information extraction unit.・ Semantic analysis result ranking model creation device.
  7. In the syntactic / semantic analysis result ranking model creation device according to claim 6,
    A syntax information extraction unit that extracts the syntax analysis result from the syntax / semantic analysis result read from the storage unit and outputs the result as syntax information;
    The feature creation unit creates a feature used to create a syntax / semantic analysis result ranking model based on the syntax information output from the syntax information extraction unit. .
  8. In the syntactic / semantic analysis result ranking model creation device according to claim 6,
    The semantic information database comprises a sense bank database that accumulates the meaning-giving results for each target phrase of the processing target sentence, or a thesaurus / ontology database that accumulates various phrases and semantic categories that have a predetermined relationship with each other,
    The semantic information extraction unit searches the semantic information database to extract the meaning of the target phrase or another phrase or semantic category having a predetermined relationship with the target phrase as the semantic information. A syntactic / semantic analysis result ranking model creation device.
  9. In the syntactic / semantic analysis result ranking model creation device according to claim 6,
    The storage unit stores a syntax semantic information dictionary that accumulates syntax semantic information indicating the meaning of the phrase for each phrase in which the phrase is used for various phrases.
    The syntactic / semantic analysis result ranking model creation device, wherein the semantic information extraction unit extracts the syntactic and semantic information of the target word / phrase as the semantic information by searching the syntactic / semantic information dictionary for the target word / phrase.
  10. In the syntax / semantic analysis result ranking model creation device according to claim 9,
    The syntax semantic information dictionary stores one or more of: a valence dictionary that accumulates valence information of target words, a representative word / typical word dictionary that accumulates representative words or typical words of target words, and a case frame dictionary with examples that accumulates example case frames of target words, and
    The semantic information extraction unit searches the syntactic and semantic information dictionary to obtain any one or more of valence information, representative words, typical words, and example case frames of the target word / phrase as the semantic information. A syntactic / semantic analysis result ranking model creation device characterized by being extracted as
  11.   A program for causing a computer to execute each step of the syntax / semantic analysis result ranking model creation method according to claim 1.
  12.   A recording medium on which the program according to claim 11 is recorded.
JP2007068208A 2007-03-16 2007-03-16 Syntax / semantic analysis result ranking model creation method and apparatus, program, and recording medium Expired - Fee Related JP4963245B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2007068208A JP4963245B2 (en) 2007-03-16 2007-03-16 Syntax / semantic analysis result ranking model creation method and apparatus, program, and recording medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2007068208A JP4963245B2 (en) 2007-03-16 2007-03-16 Syntax / semantic analysis result ranking model creation method and apparatus, program, and recording medium

Publications (2)

Publication Number Publication Date
JP2008233964A JP2008233964A (en) 2008-10-02
JP4963245B2 true JP4963245B2 (en) 2012-06-27

Family

ID=39906720

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2007068208A Expired - Fee Related JP4963245B2 (en) 2007-03-16 2007-03-16 Syntax / semantic analysis result ranking model creation method and apparatus, program, and recording medium

Country Status (1)

Country Link
JP (1) JP4963245B2 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5302784B2 (en) * 2009-06-05 2013-10-02 株式会社日立製作所 Machine translation method and system
JP5254888B2 (en) * 2009-06-05 2013-08-07 日本電信電話株式会社 Language resource information generating apparatus, method, program, and recording medium
KR101061201B1 (en) 2009-09-03 2011-08-31 주식회사 다음커뮤니케이션 Search Ranking Model Simulation System and Its Methods
JP5881048B2 (en) * 2012-09-18 2016-03-09 株式会社日立製作所 Information processing system and information processing method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01113869A (en) * 1987-10-28 1989-05-02 Hitachi Ltd Japanese sentence analyzing system
JP2007011775A (en) * 2005-06-30 2007-01-18 Nippon Telegr & Teleph Corp <Ntt> Dictionary creating device, dictionary creation method, program, and recording medium

Also Published As

Publication number Publication date
JP2008233964A (en) 2008-10-02

Similar Documents

Publication Publication Date Title
Díaz-Negrillo et al. Error tagging systems for learner corpora
Rayson et al. The UCREL semantic analysis system.
Dey et al. Opinion mining from noisy text data
Bouma et al. Alpino: Wide-coverage computational analysis of Dutch
Rayson Matrix: A statistical method and software tool for linguistic analysis through corpus comparison
US8346534B2 (en) Method, system and apparatus for automatic keyword extraction
Roark et al. Computational approaches to morphology and syntax
Baroni et al. Introducing the La Repubblica Corpus: A Large, Annotated, TEI (XML)-compliant Corpus of Newspaper Italian.
CN1871597B (en) System and method for associating documents with contextual advertisements
JP4974445B2 (en) Method and system for providing confirmation
van Halteren Syntactic wordclass tagging
Avramidis et al. Enriching morphologically poor languages for statistical machine translation
US20150066484A1 (en) Systems and methods for an autonomous avatar driver
US6061675A (en) Methods and apparatus for classifying terminology utilizing a knowledge catalog
US20100180198A1 (en) Method and system for spell checking
Gambhir et al. Recent automatic text summarization techniques: a survey
Saggion et al. Automatic text summarization: Past, present and future
JP2008522332A (en) System and method for automatically expanding documents
US7805303B2 (en) Question answering system, data search method, and computer program
US7797303B2 (en) Natural language processing for developing queries
US7774198B2 (en) Navigation system for text
JP2005157524A (en) Question response system, and method for processing question response
KR101968102B1 (en) Non-factoid question answering system and computer program
US20130138696A1 (en) Method to build a document semantic model
US20170242840A1 (en) Methods and systems for automated text correction

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20090109

RD02 Notification of acceptance of power of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7422

Effective date: 20111125

RD04 Notification of resignation of power of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7424

Effective date: 20111125

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20120309

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20120321

A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20120322

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20150406

Year of fee payment: 3

S531 Written request for registration of change of domicile

Free format text: JAPANESE INTERMEDIATE CODE: R313531

R350 Written notification of registration of transfer

Free format text: JAPANESE INTERMEDIATE CODE: R350

LAPS Cancellation because of no payment of annual fees