JP2008242607A - Device, method and program for selecting proper candidate from language processing result

Info

Publication number: JP2008242607A
Application number: JP2007079381A
Authority: JP
Inventors: Kazuo Sumita; 一男住田; Takashi Masuko; 貴史益子
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2007-03-26
Filing date: 2007-03-26
Publication date: 2008-10-09

Abstract

PROBLEM TO BE SOLVED: To provide a language processing apparatus that selects an appropriate processing result from candidate processing results. SOLUTION: The language processing apparatus includes: a first storage unit 121 that stores constituent units of a sentence together with their occurrence probabilities; a second storage unit 122 that stores dependency relations, each expressed by a dependency-destination constituent unit and a dependency-source constituent unit, together with the conditional probability that the dependency-source unit appears given the dependency-destination unit; an input receiving unit 101 that receives candidate processing results; an analysis unit 103 that analyzes the dependency structure of each candidate processing result; a calculation unit 103a that, for each dependency structure candidate, acquires from the first storage unit 121 the occurrence probability of the sentence-final constituent unit, acquires from the second storage unit 122 the conditional probability of each dependency relation, and calculates the occurrence probability of the dependency structure candidate as the product of all the conditional probabilities multiplied by that occurrence probability; and a selection unit 104 that finds the dependency structure candidate with the maximum calculated occurrence probability and selects the candidate processing result corresponding to it. COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to an apparatus, a method, and a program that take as input a plurality of recognition candidate sequences obtained from speech recognition, character recognition, or the like, and select an appropriate sequence by dependency analysis.

Natural language recognition techniques such as speech recognition and character recognition, which convert human speech or images of characters written on paper into character strings or word strings, have long been widely known. In such recognition processing it is difficult to recognize the character string or word string intended by the user with 100% accuracy. In speech recognition, for example, recognition errors occur in most cases because of phonemes with similar features and because of background noise.

In typical speech recognition processing, feature data is first extracted from the user's speech signal, captured by a microphone or the like, by applying FFT (fast Fourier transform) analysis or a similar technique. The speech is then converted into a character string using a phoneme dictionary, which stores previously prepared standard patterns of feature data for each phoneme, and a word dictionary, which stores the correspondence between the phoneme symbol string making up each word and the word's headword.

In the HMM (hidden Markov model) method, a representative speech recognition technique, the transition relations between phonemes are expressed as a word network, and data in which probability values are assigned to the links between nodes (corresponding to phonemes) in the network is stored in the word dictionary. The feature data of the input speech is matched against the standard patterns stored in the phoneme dictionary to obtain a similarity to each phoneme. Based on these similarities, the word network stored in the word dictionary is used to find the word candidate most similar to the input speech.

When the recognition target is a sentence composed of multiple words, it is necessary not only to perform word recognition as described above but also to obtain the correct word string for the input speech signal. For example, to obtain the most probable word sequence from the multiple word sequences obtained as candidates, a language model such as an n-gram, which expresses how readily words connect to one another, is used.

With the HMM method, this n-gram-based narrowing of candidates can be performed in a unified manner with the word recognition processing. The transition probabilities between phonemes in the word dictionary and the transition probabilities between words in the n-gram can be learned in advance from speech data and text corpora. Using large amounts of speech data and text corpora makes highly reliable speech recognition possible.

Let I denote the input sequence and W the word sequence obtained as output. The recognition processing described above then corresponds to finding the word sequence W that maximizes the conditional probability P(W|I) given the input sequence I, written argmax_W P(W|I). By Bayes' theorem, P(W|I) = P(I|W)·P(W)/P(I). Because the input sequence I is given, the denominator P(I) can be regarded as fixed. The recognition processing can therefore be formulated as the problem of finding the word sequence W given by argmax_W P(I|W)·P(W).
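The reformulation above can be written out step by step. The following LaTeX block restates the same derivation using the symbols I and W from the text; the hat notation for the selected word sequence is introduced here for readability and does not appear in the source.

```latex
\begin{align}
\hat{W} &= \operatorname*{arg\,max}_{W} P(W \mid I) \\
        &= \operatorname*{arg\,max}_{W} \frac{P(I \mid W)\,P(W)}{P(I)}
           && \text{(Bayes' theorem)} \\
        &= \operatorname*{arg\,max}_{W} P(I \mid W)\,P(W)
           && \text{($P(I)$ does not depend on $W$)}
\end{align}
```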

Meanwhile, when recognition processing is performed on Japanese input, for example, the dependency structure between the phrases obtained from the recognized word sequence must be found by dependency analysis. Dependency analysis is processing that analyzes the modifier-modified relations between words to obtain a dependency structure. Methods of dependency analysis include performing syntactic analysis, which obtains the syntactic structure of a sentence using parts of speech as clues, and then narrowing down the candidate dependency destinations to obtain the dependency structure, and performing dependency analysis directly on the word sequence to obtain the dependency structure. Non-Patent Document 1 proposes a technique that models the strength of the dependency between words as a dependency probability and performs dependency analysis probabilistically.

Tomohiro Ohno (大野誠寛), Shigeki Matsubara (松原茂樹), Nobuo Kawaguchi (河口信夫), Yasuyoshi Inagaki (稲垣康善), "Statistical Dependency Analysis of Japanese Spoken Dialogue Sentences and Its Evaluation" (日本語音声対話文の統計的係り受け解析とその評価), Proceedings of the 65th National Convention of the Information Processing Society of Japan, Vol. 2, pp. 1-2, 2003.

However, the method of Non-Patent Document 1 selects the optimal dependency structure for a single input phrase sequence, and therefore cannot be applied to processing that selects the optimal phrase sequence from among multiple phrase sequences.

For example, if the candidate with the best dependency structure could be selected from the multiple recognition result candidates produced by speech recognition, the accuracy of speech recognition could be expected to improve. However, the method of Non-Patent Document 1 is designed to find the maximum-likelihood dependency structure for one given phrase sequence. Moreover, the resulting dependency structure is merely the most likely structure for that input word sequence, so it cannot be compared with a structure obtained from a different word sequence. The method of Non-Patent Document 1 therefore cannot select the optimal phrase sequence from multiple phrase sequences while taking dependency structure into account.

The present invention has been made in view of the above, and its object is to provide an apparatus, a method, and a program capable of selecting an appropriate processing result from candidate results of processing such as recognition, taking the result of dependency analysis into account.

To solve the problems described above and achieve the object, the present invention provides a language processing apparatus that selects a processing result from candidate processing results for constituent units of a sentence, the apparatus comprising: a first storage unit that stores the constituent units in association with their occurrence probabilities; a second storage unit that stores dependency relations, each represented by a constituent unit serving as the dependency destination and a constituent unit serving as the dependency source, in association with the conditional probability that the dependency-source unit appears given the dependency-destination unit; an input receiving unit that receives input of the candidate processing results; an analysis unit that, for each received candidate, analyzes dependency structures, each representing a combination of dependency relations between the constituent units; a calculation unit that, for each analyzed dependency structure candidate, acquires from the first storage unit the occurrence probability of the sentence-final constituent unit, acquires from the second storage unit the conditional probability of each dependency relation contained in the dependency structure, and calculates the occurrence probability of the dependency structure candidate as the product of all the acquired conditional probabilities multiplied by the acquired occurrence probability; and a selection unit that finds the dependency structure candidate whose calculated occurrence probability is the largest and selects, as the processing result, the candidate processing result corresponding to that dependency structure candidate.

The present invention also provides a method and a program that implement the operation of the above apparatus.

According to the present invention, an appropriate processing result can be selected from candidate results of processing such as recognition, taking the result of dependency analysis into account.

Preferred embodiments of the apparatus, method, and program according to the present invention are described in detail below with reference to the accompanying drawings.

(First embodiment)
The language processing apparatus according to the first embodiment receives multiple candidate results of language processing such as speech recognition, and selects the optimal processing result by referring to the result of dependency analysis. The following description uses language processing of Japanese as an example, but the target language is not limited to Japanese.

FIG. 1 is a block diagram showing the configuration of a language processing apparatus 100 according to the first embodiment. As shown in FIG. 1, the language processing apparatus 100 includes a first storage unit 121, a second storage unit 122, a third storage unit 123, an input receiving unit 101, a control unit 102, and an output unit 105.

The first storage unit 121 stores an occurrence probability table 121a that holds the occurrence probabilities of phrases. FIG. 2 is an explanatory diagram showing an example of the data structure of the occurrence probability table 121a in the first embodiment. As shown in FIG. 2, the occurrence probability table 121a stores each phrase in association with its occurrence probability. The occurrence probabilities stored in the table are values calculated in advance from large amounts of speech data and text corpora.

The second storage unit 122 stores a conditional probability table 122a that holds the conditional probabilities of dependency relations. The conditional probability of a dependency relation is the probability that the phrase serving as the dependency source appears given the phrase serving as the dependency destination. These conditional probabilities are likewise calculated in advance from large amounts of speech data and text corpora and stored in the conditional probability table 122a.

FIG. 3 is an explanatory diagram showing an example of the data structure of the conditional probability table 122a in the first embodiment. As shown in FIG. 3, the conditional probability table 122a stores a dependency-source phrase, a dependency-destination phrase, and a conditional probability in association with one another.

The occurrence probability table 121a and the conditional probability table 122a are referred to when the calculation unit 103a, described later, calculates the occurrence probability of a dependency structure.

The third storage unit 123 stores a dictionary table 123a that holds dictionary information such as the part of speech of each word. FIG. 4 is an explanatory diagram showing an example of the data structure of the dictionary table 123a. As shown in FIG. 4, the dictionary table 123a stores, in association with one another, the headword of a word, its part of speech, and a category indicating whether it is an independent word or an attached word.

As described later, this embodiment in principle receives phrase sequences as input and performs dependency analysis and related processing on them. It can also be configured to receive word sequences as input and perform the same processing on phrase sequences generated from those word sequences. The dictionary table 123a is referred to when a phrase sequence is generated from a word sequence. The details of phrase sequence generation are described later.

The first storage unit 121, the second storage unit 122, and the third storage unit 123 can be implemented with any commonly used storage medium, such as an HDD (hard disk drive), an optical disc, a memory card, or RAM (random access memory).

The input receiving unit 101 receives phrase sequences as the sentence constituent units to be subjected to dependency analysis. For example, the input receiving unit 101 receives phrase sequences obtained as the result of language processing such as speech recognition, character recognition, or morphological analysis.

FIG. 5 is an explanatory diagram showing an example of input phrase sequences. In the example shown, speech recognition of a Japanese utterance meaning "Taro ate bad food" has produced five similar recognition result candidates, and the five corresponding phrase sequences have been input. With the method of this embodiment, the dependency relations between phrases are analyzed for each of the five candidates, and the single candidate most appropriate as the speech recognition result can be selected by referring to the results of that analysis.

The sentence constituent units received are not limited to phrase sequences. The input receiving unit 101 may be configured to receive other constituent units, such as words, as long as they are the units on which dependency relations are analyzed. For example, when the target language has no particles, as is the case with Chinese, the apparatus can be configured to receive word sequences and perform dependency analysis on them.

Even when dependency analysis is performed on phrase sequences, as in Japanese, the apparatus may be configured to first receive a word sequence and generate a phrase sequence from it by referring to the dictionary table 123a. In this case, the input receiving unit 101 acquires the category of each word in the received word sequence from the dictionary table 123a and, when a word's category is attached word, appends that word to the immediately preceding word whose category is independent word, thereby building up the phrase sequence.
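The phrase generation rule just described can be sketched as follows. This is a minimal illustration rather than the apparatus's actual implementation: the `DICTIONARY` contents, the word segmentation, the function name `words_to_phrases`, and the handling of unknown words are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical stand-in for dictionary table 123a: headword -> category.
# The entries are illustrative only; the source does not list the table contents.
DICTIONARY: Dict[str, str] = {
    "太郎": "independent",
    "は": "attached",
    "まずい": "independent",
    "料理": "independent",
    "を": "attached",
    "食べ": "independent",
    "た": "attached",
}


def words_to_phrases(words: List[str], dictionary: Dict[str, str]) -> List[str]:
    """Build a phrase sequence by gluing attached words onto the preceding phrase."""
    phrases: List[str] = []
    for word in words:
        # Assumption: a word missing from the dictionary starts a new phrase.
        category = dictionary.get(word, "independent")
        if category == "attached" and phrases:
            phrases[-1] += word      # append to the phrase opened by an independent word
        else:
            phrases.append(word)     # an independent word opens a new phrase
    return phrases


phrases = words_to_phrases(["太郎", "は", "まずい", "料理", "を", "食べ", "た"], DICTIONARY)
```

With this illustrative dictionary, the seven words are grouped into four phrases, which is the granularity that the dependency analysis below operates on.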

The control unit 102 controls the processing that selects the optimal phrase sequence, as the processing result, from the phrase sequences received by the input receiving unit 101. It includes an analysis unit 103 and a selection unit 104.

The analysis unit 103 performs dependency analysis on the phrase sequences received by the input receiving unit 101 and generates dependency structure candidates, each represented by a combination of dependency relations between phrases. The analysis unit 103 also includes a calculation unit 103a that, during the dependency analysis, calculates the occurrence probability of each generated dependency structure candidate.

The selection unit 104 selects the optimal phrase sequence from the input phrase sequences as the processing result, by referring to the result of dependency analysis. Specifically, the selection unit 104 first refers to the dependency structure candidates produced by the analysis unit 103 and the occurrence probability calculated for each candidate by the calculation unit 103a, and finds the dependency structure candidate with the maximum occurrence probability. It then selects the phrase sequence corresponding to that dependency structure candidate as the processing result.

The output unit 105 outputs the processing result selected by the selection unit 104.

Before the candidate selection processing, which performs dependency analysis and selects the optimal candidate, is described in detail, the following explains how dependency structures are represented, how their occurrence probabilities are defined, and how dependency structures are generated step by step.

FIG. 6 is an explanatory diagram showing an example of how dependency structures are represented. FIG. 6 shows examples of phrase sequences consisting of four phrases (phrase sequences 201 to 205), dependency structures 206 to 210 that depict the dependency relations between the phrases of each sequence, and the same dependency structures expressed as lists (list structures 211 to 215), all in correspondence with one another.

The figure shows the dependency structures that can be obtained for a given phrase sequence under two assumptions: (1) dependency relations between phrases do not cross, and (2) an earlier phrase depends on a later phrase.

Given a phrase sequence, the semantically valid dependency structure is found among the dependency structures in which the dependency relations from earlier phrases to later phrases do not cross. For a sequence of four phrases, for example, there are five admissible dependency structures, as shown in FIG. 6. The numbers in the phrase sequences and dependency structures identify the phrases; they are serial numbers starting from 1 and assigned from the beginning of the sentence toward its end.

For example, phrase sequence 201 corresponds to dependency structure 206. Dependency structure 206 indicates that the first phrase (太郎の, "Taro's") depends on the second phrase (姉の, "sister's"), the second phrase depends on the third phrase (料理を, "cooking"), and the third phrase depends on the fourth phrase (食べた, "ate"). In the following, the i-th phrase is sometimes written simply as phrase i.

List structure 211 expresses dependency structure 206 in list form. Here d(i, j) denotes the dependency relation in which the i-th phrase depends on the j-th phrase.

Next, the occurrence probability of a dependency structure is explained with reference to FIG. 7. FIG. 7 is an explanatory diagram showing examples of the occurrence probabilities of dependency structures. It shows the occurrence probabilities 306 to 310 of the dependency structures 206 to 210 in FIG. 6, respectively.

In the figure, w_i denotes the i-th phrase, P(w_i) the occurrence probability of w_i, P(w_i, w_j) the probability that w_i and w_j occur together, and P(w_i|w_j) the conditional probability of w_i given w_j. Thus, for example, the occurrence probability of dependency structure 206 can be expressed as {P(w_1, w_2)/P(w_2)}·{P(w_2, w_3)/P(w_3)}·P(w_3, w_4) = P(w_1|w_2)·P(w_2|w_3)·P(w_3|w_4)·P(w_4).

Generalizing this, the occurrence probability of a dependency structure S obtained from a phrase sequence of N phrases is the product of the conditional probabilities P(w_i|w_j) for all dependency relations d(i, j) contained in S, multiplied by the occurrence probability P(w_N) of the sentence-final phrase, as expressed by the following equation (1).

P(S) = P(w_N) · ∏_{d(i,j)∈S} P(w_i|w_j)   (1)
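As a concrete illustration of equation (1), the following sketch multiplies the sentence-final occurrence probability by the conditional probability of every dependency relation in a structure. The dictionaries stand in for tables 121a and 122a, and all probability values are invented for the example; none of the numbers or names come from the source.

```python
from math import prod
from typing import Dict, List, Tuple

Dependency = Tuple[int, int]  # d(i, j): phrase i depends on phrase j (1-based)


def structure_probability(
    phrases: List[str],
    structure: List[Dependency],
    occurrence: Dict[str, float],
    conditional: Dict[Tuple[str, str], float],
) -> float:
    """Equation (1): P(S) = P(w_N) * product of P(w_i | w_j) over d(i, j) in S."""
    p_last = occurrence[phrases[-1]]  # occurrence probability of the sentence-final phrase
    return p_last * prod(
        conditional[(phrases[i - 1], phrases[j - 1])]  # key: (source phrase, destination phrase)
        for i, j in structure
    )


# Illustrative values only (stand-ins for tables 121a and 122a).
phrases = ["太郎の", "姉の", "料理を", "食べた"]
occurrence = {"食べた": 0.01}
conditional = {
    ("太郎の", "姉の"): 0.2,
    ("姉の", "料理を"): 0.1,
    ("料理を", "食べた"): 0.3,
}
structure_206 = [(1, 2), (2, 3), (3, 4)]  # list form of dependency structure 206
p_206 = structure_probability(phrases, structure_206, occurrence, conditional)
```

The sketch raises `KeyError` when a phrase or phrase pair is missing from the tables; how the apparatus treats unseen entries is not stated in this section.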

Next, the process of obtaining dependency structure candidates is explained with reference to FIG. 8. FIG. 8 is an explanatory diagram showing the process of obtaining the possible dependency structures from a phrase sequence of four phrases.

First, consider the dependency between the last two phrases of the sentence, that is, the third and fourth phrases. Only one dependency relation is possible between two phrases, which yields dependency structure 401, consisting of this single relation. Next, consider the dependency relations that become possible when the second phrase is added. The second phrase can depend only on the third phrase or on the fourth phrase. These two possibilities yield dependency structure 402 and dependency structure 403, respectively.

Finally, consider the dependency relations that become possible when the first phrase is added. Three dependency structures can be derived from dependency structure 402: dependency structures 206, 207, and 208. They correspond to the three possibilities for the first phrase as dependency source: it may depend on the second phrase, on the third phrase, or on the fourth phrase.

From dependency structure 403, two structures are derived: dependency structures 209 and 210. Given dependency structure 403, the first phrase can form non-crossing dependency relations with the second phrase and with the fourth phrase. A dependency from the first phrase to the third phrase, however, would cross the dependency from the second phrase to the fourth phrase and therefore violates the conditions on dependency structures stated above. For this reason only two structures are derived from dependency structure 403.

FIG. 8 illustrates the generation of dependency structure candidates for four phrases, but when the number of phrases is larger, structures can be generated in the same way, step by step, starting from the sentence-final phrase.
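The step-by-step generation described above can be sketched as a short enumeration. This is an illustrative sketch under the two stated conditions (non-crossing, each phrase depends on a later phrase); the function names `valid_destinations` and `enumerate_structures` are hypothetical. Because a newly added source phrase lies to the left of every existing phrase, its arc to a destination crosses an existing arc d(a, b) exactly when a < destination < b.

```python
from typing import List, Tuple

Dependency = Tuple[int, int]  # d(i, j): phrase i depends on phrase j (1-based)


def valid_destinations(source: int, structure: List[Dependency], n: int) -> List[int]:
    """Phrases source+1..n that `source` can depend on without crossing an existing arc."""
    return [
        dest
        for dest in range(source + 1, n + 1)
        if not any(a < dest < b for a, b in structure)
    ]


def enumerate_structures(n: int) -> List[List[Dependency]]:
    """Generate all admissible dependency structures for n phrases, from the sentence end."""
    if n < 2:
        raise ValueError("at least two phrases are needed to form a dependency")
    structures: List[List[Dependency]] = [[(n - 1, n)]]  # structure 401 when n = 4
    for source in range(n - 2, 0, -1):                   # add phrases right to left
        structures = [
            [(source, dest)] + s
            for s in structures
            for dest in valid_destinations(source, s, n)
        ]
    return structures


four_phrase_structures = enumerate_structures(4)
```

For four phrases this yields five structures, three extending the counterpart of structure 402 and two extending the counterpart of structure 403, in line with the walkthrough of FIG. 8.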

Next, the process of obtaining the occurrence probability given by equation (1) is explained with reference to FIGS. 9 to 11. FIGS. 9 to 11 show an example of the process of obtaining the occurrence probability of a dependency structure, in correspondence with the generation process of dependency structures shown in FIG. 8.

FIG. 9 is an explanatory diagram showing the process of calculating the occurrence probability of a dependency structure obtained by adding one phrase to the last two phrases of the sentence. For dependency structure 401 in FIG. 9, the occurrence probability is the product of the occurrence probability of the sentence-final phrase and the conditional probability of the immediately preceding phrase given the sentence-final phrase. In this example it is P(w_3|w_4)·P(w_4).

When dependency structures 402 and 403 are generated from dependency structure 401, a new dependency relation is added to dependency structure 401: from the second phrase to the third phrase for structure 402, and from the second phrase to the fourth phrase for structure 403. The occurrence probabilities of dependency structures 402 and 403 can therefore be calculated by multiplying the occurrence probability of dependency structure 401, P(w_3|w_4)·P(w_4), by the conditional probability of the added dependency relation, P(w_2|w_3) or P(w_2|w_4), respectively. The occurrence probabilities of dependency structures 402 and 403 are thus P(w_2|w_3)·P(w_3|w_4)·P(w_4) and P(w_2|w_4)·P(w_3|w_4)·P(w_4), respectively.

FIG. 10 is an explanatory diagram showing the process of calculating the occurrence probability of a dependency structure to which one more phrase has been added. It shows how the occurrence probabilities of the dependency structures 206 to 210 are calculated from those of the dependency structures 402 and 403.

For the dependency structures obtained by further adding the first phrase to the dependency structures 402 and 403, the occurrence probability can likewise be calculated by multiplying the occurrence probability of the dependency structure 402 or 403 by the conditional probability corresponding to the newly added dependency relation.

FIG. 11 is a schematic diagram for explaining a generalization of the occurrence-probability calculation described above. It shows how the occurrence probability is calculated when, given the occurrence probability P of a dependency structure S (dependency structure 1101) for the partial phrase sequence from phrase I+1 to phrase N, phrase I (phrase 1102) is added immediately before that sequence.

In this case, let phrase i be a phrase, among phrases I+1 to N, that phrase I can depend on without its dependency crossing any existing one; the added dependency relation is then d(I, i). The newly generated dependency structure S' is therefore [d(I, i)|S], where [d(I, i)|S] denotes the dependency structure obtained by adding the dependency relation d(I, i) to the dependency structure S. The occurrence probability P' of the dependency structure S' is P multiplied by the conditional probability P(w_I|w_i).
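The incremental update described above can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: the function name `extend_structure`, the tuple representation of dependency relations, and every probability value are assumptions made for the example.

```python
from typing import Dict, Tuple

Arc = Tuple[int, int]            # (source position, destination position); hypothetical representation
Structure = Tuple[Arc, ...]


def extend_structure(
    structure: Structure,
    prob: float,
    source: int,
    dest: int,
    phrases: Dict[int, str],
    cond_prob: Dict[Tuple[str, str], float],
) -> Tuple[Structure, float]:
    """Prepend d(source, dest) to a structure and update P' = P(w_I | w_i) * P."""
    new_structure = ((source, dest),) + structure
    new_prob = cond_prob[(phrases[source], phrases[dest])] * prob
    return new_structure, new_prob


# Illustrative values only; they are not taken from the patent.
phrases = {2: "w2", 3: "w3", 4: "w4"}
end_prob = {"w4": 0.2}
cond_prob = {("w3", "w4"): 0.4, ("w2", "w3"): 0.5, ("w2", "w4"): 0.25}

# Structure 401: the last two phrases, P(w3|w4) * P(w4).
s401: Structure = ((3, 4),)
p401 = cond_prob[("w3", "w4")] * end_prob["w4"]

# Structures 402 and 403: the second phrase depends on the third or the fourth.
s402, p402 = extend_structure(s401, p401, 2, 3, phrases, cond_prob)
s403, p403 = extend_structure(s401, p401, 2, 4, phrases, cond_prob)
```

Each extension costs one table lookup and one multiplication, because the probability of the shorter structure is reused unchanged.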

As described above, in the present embodiment, dependency structures are generated sequentially from the end of the sentence, and the occurrence probability of each generated dependency structure is calculated.

Next, the candidate selection processing performed by the language processing apparatus 100 according to the first embodiment, configured as described above, is described with reference to FIG. 12. FIG. 12 is a flowchart showing the overall flow of the candidate selection processing in the first embodiment.

First, the input receiving unit 101 receives input of a plurality of phrase sequences (step S1201). As described above, the input receiving unit 101 may instead be configured to receive word sequences and generate phrase sequences by referring to the dictionary table 123a.

Next, the control unit 102 initializes the maximum occurrence probability Pmax to 0 (step S1202). The analysis unit 103 then takes one phrase sequence from the received phrase sequences and executes dependency analysis processing, which analyzes the dependency relations between the phrases of that sequence (step S1203). The dependency analysis processing outputs the analyzed dependency structure candidates and the occurrence probability of each candidate. Details of the dependency analysis processing are described later.

Next, the selection unit 104 refers to the dependency structure candidates output by the dependency analysis processing and the occurrence probability of each candidate, and selects the maximum occurrence probability P (step S1204). The selection unit 104 then determines whether the selected maximum value P is greater than Pmax (step S1205).

If P is greater than Pmax (step S1205: YES), the selection unit 104 sets Pmax to P and selects the dependency structure candidate corresponding to P as the candidate processing result to be output (the output candidate) (step S1206).

After an output candidate is selected, or if it is determined in step S1205 that P is not greater than Pmax (step S1205: NO), the control unit 102 determines whether all phrase sequences have been processed (step S1207).

If not all phrase sequences have been processed (step S1207: NO), the analysis unit 103 selects the next phrase sequence and repeats the processing (step S1203). If all phrase sequences have been processed (step S1207: YES), the output unit 105 outputs the selected output candidate (step S1208), and the candidate selection processing ends.
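The selection loop of steps S1202 to S1208 can be sketched as follows. This is a minimal sketch under stated assumptions: `select_candidate` is a hypothetical name, and `analyze` stands in for the dependency analysis processing, assumed here to return a list of structures and a parallel list of occurrence probabilities. The stub results at the bottom are invented for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Structure = Tuple[Tuple[int, int], ...]
Analyzer = Callable[[Sequence[str]], Tuple[List[Structure], List[float]]]


def select_candidate(
    sequences: Sequence[Sequence[str]],
    analyze: Analyzer,
) -> Tuple[Optional[Tuple[Sequence[str], Structure]], float]:
    """Return the (sequence, structure) pair with the highest occurrence probability."""
    p_max = 0.0                                   # S1202: initialize Pmax to 0
    best: Optional[Tuple[Sequence[str], Structure]] = None
    for seq in sequences:                         # S1203 / S1207: one sequence at a time
        structures, probs = analyze(seq)
        if not probs:                             # nothing to compare for this sequence
            continue
        p = max(probs)                            # S1204: best structure for this sequence
        if p > p_max:                             # S1205
            p_max = p                             # S1206: remember the new maximum
            best = (seq, structures[probs.index(p)])
    return best, p_max                            # S1208: output the selected candidate


# Stub analysis results, invented for illustration only.
fake_results = {
    ("a", "b"): ([((1, 2),)], [0.02]),
    ("a", "c"): ([((1, 2),)], [0.05]),
}
best, p_max = select_candidate(
    [("a", "b"), ("a", "c")],
    lambda seq: fake_results[tuple(seq)],
)
```

Because the comparison in S1205 is strict, the first sequence that reaches a given maximum is kept when two sequences tie.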

Thus, whereas the conventional method only selects the optimal dependency relations for a single phrase sequence, the present embodiment analyzes the dependency relations of each of a plurality of phrase sequences and can select, as the optimal phrase sequence, the one that yields the optimal dependency relations.

Next, details of the dependency analysis processing in step S1203 are described with reference to FIGS. 13 and 14. FIG. 13 is a flowchart outlining the overall flow of the dependency analysis processing in the first embodiment. FIG. 14 is a flowchart showing the overall flow of that processing in detail.

That is, FIG. 14 is a detailed version of the outline flowchart of FIG. 13. The correspondence between the steps of FIG. 13 and those of FIG. 14 is indicated in FIG. 14.

In FIG. 13, the analysis unit 103 first determines whether the number of phrases in the specified phrase sequence is 2 (step S1301). As described later, the dependency analysis processing is executed recursively on the phrase sequence obtained by removing the first phrase from the phrase sequence. It is therefore necessary to determine whether repeated removal has finally reached the last two phrases of the sentence, because the dependency analysis processing executed in that case must return a specially calculated occurrence probability and related results. Step S1301 is this determination.

If the number of phrases is 2 (step S1301: YES), the calculation unit 103a generates and outputs the dependency structure for the last two phrases of the sentence, and calculates and outputs the occurrence probability of the generated dependency structure (step S1302). In this case, the calculation unit 103a generates the dependency structure and calculates the occurrence probability by the methods shown in FIGS. 8 and 9.

If the number of phrases is not 2 (step S1301: NO), the dependency analysis processing is executed recursively on the phrase sequence obtained by removing the first phrase from the specified phrase sequence (step S1303). As described above, the dependency analysis processing outputs the analyzed dependency structure candidates and the occurrence probability of each candidate.

When a phrase sequence consisting of a single phrase is input, no dependency analysis is needed and the processing ends; this case is omitted from the figure.

Next, for each dependency structure candidate returned by the dependency analysis processing, the analysis unit 103 generates dependency structures to which the immediately preceding phrase has been added (step S1304). In doing so, the analysis unit 103 generates only dependency structures in which the newly added dependency relation does not cross any dependency relation of the existing dependency structure.

Next, the calculation unit 103a calculates the occurrence probability of each generated dependency structure using the conditional probability corresponding to the dependency relation of the added phrase (step S1305). The calculation unit 103a acquires the conditional probability corresponding to the newly added dependency relation from the conditional probability table 122a and uses it to calculate the occurrence probability.

Because step S1304 may generate a plurality of new dependency structure candidates, the calculation unit 103a calculates the occurrence probability for each candidate in step S1305.

Next, the analysis unit 103 outputs the generated dependency structure candidates and the occurrence probability calculated for each candidate, and the dependency analysis processing ends (step S1306).

Thus, in the present embodiment, calling the dependency analysis processing recursively makes it possible to generate dependency structure candidates sequentially from the end of the sentence and to calculate the occurrence probabilities of the generated candidates sequentially. In doing so, the occurrence probability of each dependency structure is calculated step by step using the occurrence probability of the sentence-final phrase stored in the occurrence probability table 121a prepared in advance.

Because the occurrence probability of a dependency structure is calculated using the phrase occurrence probabilities stored in the occurrence probability table 121a, the occurrence probabilities of the dependency structures calculated for the individual phrase sequences can be compared with one another. By comparing them, the dependency structure with the maximum occurrence probability can be found, and the phrase sequence corresponding to that dependency structure can be selected as the optimal processing result.

Next, details of the dependency analysis processing are described with reference to FIG. 14. First, the analysis unit 103 acquires the specified start phrase I, the number of phrases N, and the phrase sequence W (step S1401).

Next, the analysis unit 103 determines whether the number of phrases N is 2 (step S1402). If it is 2 (step S1402: YES), the analysis unit 103 outputs [d(1, 2)] as the set SL of dependency structures and [P(w₁|w₂)·P(w₂)] as the set PL of occurrence probabilities (step S1403), and the dependency analysis processing ends. The output of step S1403 is what is returned when the dependency analysis processing has been executed recursively and is finally executed on the last two phrases of the sentence.

If the number of phrases is not 2 (step S1402: NO), the analysis unit 103 further determines whether the number of phrases is 2 or more (step S1404). If it is not (step S1404: NO), dependency analysis is not possible, and the dependency analysis processing ends.

If the number of phrases is 2 or more (step S1404: YES), the analysis unit 103 removes the first phrase and executes the dependency analysis processing recursively, specifying I+1 as the start phrase, N−1 as the number of phrases, and W as the phrase sequence (step S1405).

Next, the analysis unit 103 acquires, as the analysis result, the set L2 of dependency structure candidates and the set P2 of the occurrence probabilities of the candidates (step S1406). The analysis unit 103 then executes steps S1407 to S1417 below to generate the dependency structure candidates obtained by adding the immediately preceding phrase and to calculate the occurrence probability of each.

First, the analysis unit 103 initializes to empty lists the set L3, which stores the dependency structure candidates to be generated, and the set P3, which stores the occurrence probability of each candidate (step S1407).

Next, the analysis unit 103 acquires the dependency structure candidate S, which is the first element of L2, and the occurrence probability P of S, which is the first element of P2 (step S1408). The analysis unit 103 then sets the phrase position i to I+1, which represents the position of the start phrase (step S1409).

In the following processing (steps S1410 to S1415), the destination phrase position i is moved toward the end of the sentence. For each position, it is determined whether the dependency relation between the source phrase, which is the immediately preceding phrase (phrase position I), and the destination phrase (phrase position i) crosses any dependency relation in S; if it does not, a new dependency structure candidate containing that dependency relation is generated and its occurrence probability is calculated.

First, the analysis unit 103 specifies the phrase position i and the dependency structure candidate S and executes crossing determination processing, which determines whether dependency relations cross (step S1410). Details of the crossing determination processing are described later.

Next, based on the result of the crossing determination processing, the analysis unit 103 determines whether the dependency relation to be added crosses a dependency relation in S (step S1411). If it does not (step S1411: NO), the analysis unit 103 generates a dependency structure in which the dependency relation d(I, i) is prepended to the dependency structure candidate S, and adds it to the set L3 (step S1412).

The calculation unit 103a then calculates P(w_I|w_i)·P as the occurrence probability of the generated dependency structure and adds it to the set P3 (step S1413). P(w_I|w_i) is the conditional probability corresponding to the dependency relation between phrase w_I and phrase w_i, and can be acquired from the conditional probability table 122a.

After the occurrence probability is added to the set P3, or if the dependency relations cross (step S1411: YES), the analysis unit 103 shifts the phrase position i toward the end of the sentence by setting i = i + 1 (step S1414).

Next, the analysis unit 103 determines whether i is greater than the number of phrases N (step S1415). If it is not (step S1415: NO), the crossing determination processing is repeated for the new phrase position (step S1410).

If i is greater than the number of phrases (step S1415: YES), the crossing determination processing has been completed for all dependency relations for the dependency structure candidate S, so the analysis unit 103 deletes S and P from the sets L2 and P2, respectively (step S1416).

Next, the analysis unit 103 determines whether L2 is an empty list (step S1417). If it is not (step S1417: NO), the analysis unit 103 acquires the next dependency structure candidate and repeats the processing (step S1408).

If L2 is an empty list, that is, if all dependency structure candidates have been processed (step S1417: YES), the analysis unit 103 outputs the sets L3 and P3, which hold the new dependency structure candidates and their occurrence probabilities added so far (step S1418), and the dependency analysis processing ends.
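The recursive procedure of FIGS. 13 and 14 can be sketched as follows. This is a minimal sketch rather than the patent's implementation: the function name `analyze_dependencies` is hypothetical, positions are absolute and 1-based (a convention chosen for this example), plain dictionaries stand in for the occurrence probability table 121a and the conditional probability table 122a, and the probability values are invented.

```python
from typing import Dict, List, Sequence, Tuple

Arc = Tuple[int, int]            # (source position, destination position), 1-based
Structure = Tuple[Arc, ...]


def analyze_dependencies(
    phrases: Sequence[str],
    start: int,
    end_prob: Dict[str, float],
    cond_prob: Dict[Tuple[str, str], float],
) -> Tuple[List[Structure], List[float]]:
    """Enumerate non-crossing dependency structures for phrases[start-1:] with their probabilities."""
    n = len(phrases)
    remaining = n - start + 1
    if remaining < 2:                              # S1404: nothing to analyze
        return [], []
    if remaining == 2:                             # S1402/S1403: last two phrases
        prev, last = phrases[n - 2], phrases[n - 1]
        return [((n - 1, n),)], [cond_prob[(prev, last)] * end_prob[last]]

    # S1405/S1406: analyze the sequence without its first phrase.
    sub_structures, sub_probs = analyze_dependencies(phrases, start + 1, end_prob, cond_prob)

    structures: List[Structure] = []               # S1407: L3
    probs: List[float] = []                        # S1407: P3
    for s, p in zip(sub_structures, sub_probs):    # S1408/S1417: each candidate S
        for i in range(start + 1, n + 1):          # S1409/S1414/S1415: destination i
            if any(a < i < b for a, b in s):       # S1410/S1411: crossing check
                continue
            structures.append(((start, i),) + s)   # S1412: [d(I, i) | S]
            probs.append(cond_prob[(phrases[start - 1], phrases[i - 1])] * p)  # S1413
    return structures, probs                       # S1418


# Illustrative tables: every conditional probability is 0.5 (invented values).
phrases = ["w1", "w2", "w3", "w4"]
end_prob = {"w4": 0.1}
cond_prob = {(a, b): 0.5 for k, a in enumerate(phrases) for b in phrases[k + 1:]}
structures, probs = analyze_dependencies(phrases, 1, end_prob, cond_prob)
```

For four phrases this yields five structures, which matches the five dependency structures 206 to 210 mentioned for FIG. 10. The number of candidates grows quickly with sentence length, so a practical implementation would probably need pruning, which the text here does not describe.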

Next, details of the crossing determination processing in step S1410 are described with reference to FIG. 15. FIG. 15 is a flowchart showing the overall flow of the crossing determination processing in the first embodiment in detail.

First, the analysis unit 103 acquires the specified phrase position i and dependency structure candidate S (step S1501). Next, the analysis unit 103 acquires a dependency relation d(a, b) contained in the dependency structure S (step S1502).

The analysis unit 103 then determines, from the relationship between the phrase position i and the source phrase position a and destination phrase position b, whether a dependency relation whose destination is i crosses the dependency relation d(a, b). Specifically, the analysis unit 103 determines whether i is greater than a and less than b (step S1503). Unless i is greater than a and less than b (step S1503: NO), the dependency relation whose destination is i can be determined not to cross the dependency relation d(a, b).

In this case, the analysis unit 103 determines whether all dependency relations in S have been processed (step S1504). If not (step S1504: NO), it acquires the next dependency relation and repeats the processing (step S1502).

If it determines that all dependency relations have been processed (step S1504: YES), the analysis unit 103 determines that the dependency relation whose destination is i crosses none of the dependency relations in S (step S1505), and the crossing determination processing ends.

If, on the other hand, i is greater than a and less than b in step S1503 (step S1503: YES), the dependency relation whose destination is i is judged to cross the dependency relation d(a, b). The analysis unit 103 therefore determines that the dependency relations cross (step S1506), and the crossing determination processing ends.
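The crossing determination of FIG. 15 reduces to a single comparison per existing dependency relation. A minimal sketch follows; the function name `crosses_any` and the tuple representation of d(a, b) are assumptions for illustration. The test relies on the new source phrase lying before every phrase in S, which is the situation described in the text.

```python
from typing import Iterable, Tuple


def crosses_any(i: int, structure: Iterable[Tuple[int, int]]) -> bool:
    """Return True if a new dependency whose destination is i crosses some d(a, b) in structure.

    The new source phrase is assumed to precede every phrase in the structure,
    so the new arc crosses d(a, b) exactly when a < i < b.
    """
    for a, b in structure:          # S1502/S1504: visit each d(a, b) in S
        if a < i < b:               # S1503: i lies strictly inside the span of d(a, b)
            return True             # S1506: crossing found
    return False                    # S1505: no crossing with any relation in S


# Illustrative structures over phrase positions 2 to 4.
s402 = [(2, 3), (3, 4)]
s403 = [(2, 4), (3, 4)]
blocked = crosses_any(3, s403)      # an arc to phrase 3 would pass under d(2, 4)
allowed = crosses_any(3, s402)
```

Destinations equal to a or b never count as crossings, since the comparison is strict on both sides.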

Thus, the language processing apparatus according to the first embodiment receives a plurality of candidate phrase sequences as the results of language processing, refers to the result of the dependency analysis obtained for each phrase sequence, and can select the optimal candidate from among them as the processing result.

(Second Embodiment)
In the first embodiment, the occurrence probability of a dependency structure is calculated using occurrence probabilities and conditional probabilities obtained without regard to the number of phrases in the sentence. In contrast, the language processing apparatus according to the second embodiment calculates the occurrence probability of a dependency structure more accurately by using phrase occurrence probabilities and dependency-relation conditional probabilities obtained separately for each number of phrases in a sentence.

FIG. 16 is a block diagram showing the configuration of a language processing apparatus 1600 according to the second embodiment. As shown in FIG. 16, the language processing apparatus 1600 includes a first storage unit 1621, a second storage unit 1622, a third storage unit 123, an input receiving unit 101, a control unit 1602, and an output unit 105.

The second embodiment differs from the first in the configuration or function of the first storage unit 1621, the second storage unit 1622, and the control unit 1602. The other components and functions are the same as in FIG. 1, the block diagram showing the configuration of the language processing apparatus 100 according to the first embodiment; they are therefore given the same reference numerals and their description is omitted here.

The first storage unit 1621 stores an occurrence probability table 1621a, extended so as to store occurrence probabilities obtained in advance for each number of phrases in a sentence. FIG. 17 is an explanatory diagram showing an example of the data structure of the occurrence probability table 1621a of the second embodiment. As shown in FIG. 17, the occurrence probability table 1621a stores each phrase in association with the occurrence probability of that phrase obtained for each number of phrases in a sentence.

In FIG. 17, Ω_n denotes the set of phrase sequences consisting of n phrases. If the observation space Ω is taken to be the set of all possible phrase sequences, Ω can be expressed by equation (2).

Because the sets Ω_n for different n have no elements in common, equation (3) holds for the probability P(Ω_n) of each subspace Ω_n.
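Equations (2) and (3) themselves do not appear in this text. From the surrounding definitions, a plausible reading is that (2) writes Ω as the union of the sets Ω_n and (3) states that the probabilities of these disjoint sets sum to one. The forms below are a reconstruction under that assumption, not the equations as published; the index range starting at n = 1 is also assumed.

```latex
\Omega = \bigcup_{n=1}^{\infty} \Omega_n \qquad (2)

\sum_{n=1}^{\infty} P(\Omega_n) = 1 \qquad (3)
```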

The second storage unit 1622 stores a conditional probability table 1622a, extended so as to store conditional probabilities obtained in advance for each number of phrases in a sentence. FIG. 18 is an explanatory diagram showing an example of the data structure of the conditional probability table 1622a of the second embodiment. As shown in FIG. 18, the conditional probability table 1622a stores a source phrase, a destination phrase, and the conditional probability of the dependency relation obtained for each number of phrases in a sentence, in association with one another.

Like the control unit 102 of the first embodiment, the control unit 1602 controls the processing of selecting the optimal phrase sequence from the received phrase sequences as the processing result, but it differs from the control unit 102 in the function of its analysis unit 1603. The configuration and function of the selection unit 104 are the same as in FIG. 1 of the first embodiment; it is therefore given the same reference numeral and its description is omitted here.

The analysis unit 1603 differs from the analysis unit 103 of the first embodiment in that it includes a calculation unit 1603a, which calculates the occurrence probability of a dependency structure by referring to the extended tables described above.

The occurrence probability of a dependency structure calculated by the calculation unit 1603a can be expressed by equation (4). Here, P(w_i|w_j, Ω_n) is the conditional probability of w_i given w_j in Ω_n, P(w_N, Ω_n) is the occurrence probability of w_N in Ω_n, and P(Ω_n) is the occurrence probability of Ω_n. For P(Ω_n), a value calculated in advance and stored in a storage unit or the like (not shown) is used.

Next, the candidate selection processing performed by the language processing apparatus 1600 according to the second embodiment, configured as described above, is described. The overall flow of the candidate selection processing in the second embodiment is the same as that of the first embodiment shown in FIG. 12. However, the details of the dependency analysis processing in step S1203 of FIG. 12 differ from those of the first embodiment.

The dependency analysis processing in the second embodiment is described below with reference to FIG. 19. FIG. 19 is a flowchart showing the overall flow of the dependency analysis processing in the second embodiment.

First, the analysis unit 1603 determines whether the number of phrases in the specified phrase sequence is 2 (step S1901). If it is 2 (step S1901: YES), the calculation unit 1603a generates and outputs the dependency structure for the last two phrases of the sentence, and calculates and outputs its occurrence probability (step S1902). As the occurrence probability of the sentence-final phrase, the calculation unit 1603a acquires from the occurrence probability table 1621a the value corresponding to the number of phrases in the input phrase sequence.

Here, the number of phrases in the input phrase sequence is not the number of phrases tested in step S1901; it is the number of phrases of each phrase sequence as received in step S1201. This value may, for example, be specified separately from the phrase sequence when the dependency analysis processing is invoked, so that it can be referred to within the dependency analysis processing.

The dependency analysis processing (recursion) and the dependency structure generation processing in steps S1903 and S1904 are the same as steps S1303 and S1304 in the language processing apparatus 100 according to the first embodiment, and their description is omitted.

Next, the calculation unit 1603a calculates the occurrence probability of each generated dependency structure (step S1905). The calculation unit 1603a takes into account not only the dependency relation of the added phrase but also the number of phrases when acquiring the appropriate conditional probability from the conditional probability table 1622a. Specifically, it acquires from the conditional probability table 1622a the conditional probability corresponding to the dependency relation of the added phrase and to the number of phrases in the input phrase sequence, and uses it to calculate the occurrence probability.
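The length-dependent lookups of steps S1902 and S1905 can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the dictionary keys (phrase plus total phrase count), and all probability values are invented for illustration, and the dictionaries merely stand in for the tables 1621a and 1622a. How equation (4) combines P(Ω_n) with these factors is not reproduced in this text, so the sketch leaves that factor out.

```python
from typing import Dict, Tuple

# Hypothetical stand-ins for tables 1621a and 1622a, keyed by the total number of phrases.
OccurrenceTable = Dict[Tuple[str, int], float]          # (phrase, total phrases) -> probability
ConditionalTable = Dict[Tuple[str, str, int], float]    # (source, destination, total phrases) -> probability


def end_probability(last_phrase: str, total_phrases: int, table: OccurrenceTable) -> float:
    """S1902: occurrence probability of the sentence-final phrase for this sentence length."""
    return table[(last_phrase, total_phrases)]


def extend_probability(
    prob: float,
    source: str,
    dest: str,
    total_phrases: int,
    table: ConditionalTable,
) -> float:
    """S1905: multiply by the conditional probability stored for this sentence length."""
    return table[(source, dest, total_phrases)] * prob


# Invented values: the same phrases receive different probabilities for 3- and 4-phrase sentences.
occurrence = {("w3", 3): 0.3, ("w3", 4): 0.1}
conditional = {("w2", "w3", 3): 0.5, ("w2", "w3", 4): 0.2}

total = 3   # number of phrases of the sequence as received, passed down unchanged
p_end = end_probability("w3", total, occurrence)
p_structure = extend_probability(p_end, "w2", "w3", total, conditional)
```

The total phrase count is passed as its own argument so that it stays fixed while the recursion shortens the sequence, which is the arrangement the text suggests.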

Finally, the analysis unit 1603 outputs the generated dependency structure candidates together with the occurrence probability calculated for each candidate, and ends the dependency analysis processing (step S1906).
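The flow of steps S1901 to S1906 might be sketched as the recursion below. It is only an illustration: the function name analyze, the tuple representation of a dependency structure, the inline non-crossing check, and the uniform probability values are assumptions, and the first embodiment's actual generation and crossing determination steps are not reproduced here. The point it shows is that total_count, the clause count of the original input, is passed unchanged through every recursive call while the clause series itself shrinks.

```python
from collections import defaultdict


def analyze(clauses, total_count, occ_table, cond_table):
    """Return [(heads, probability), ...] for every non-crossing structure.

    heads[i] is the index of the clause that clause i depends on;
    the sentence-final clause has None. total_count is the clause count
    of the original input, not len(clauses).
    """
    if len(clauses) == 2:  # base case (S1901: YES, S1902)
        source, end = clauses
        prob = occ_table[(end, total_count)] * cond_table[(source, end, total_count)]
        return [((1, None), prob)]

    results = []
    first = clauses[0]
    # Recursive call on the series without its first clause (S1903).
    for heads, prob in analyze(clauses[1:], total_count, occ_table, cond_table):
        shifted = tuple(None if h is None else h + 1 for h in heads)
        for target in range(1, len(clauses)):  # attach the added clause (S1904)
            # Skip if a clause between 0 and target depends on something past target.
            if any(shifted[i - 1] > target for i in range(1, target)):
                continue
            key = (first, clauses[target], total_count)
            results.append(((target,) + shifted, prob * cond_table[key]))  # S1905
    return results  # S1906


# Uniform dummy tables; every lookup returns the invented value 0.5.
occ = defaultdict(lambda: 0.5)
cond = defaultdict(lambda: 0.5)
structures = analyze(["A", "B", "C", "D"], 4, occ, cond)
```

With four clauses this enumeration yields five structures, and every table key it touches carries the clause count 4 even though the base case is reached with a two-clause series.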

As described above, the language processing apparatus according to the second embodiment calculates the occurrence probability of a dependency structure using occurrence probabilities and conditional probabilities obtained separately for each number of clauses in a sentence, so the occurrence probability of the dependency structure can be calculated more accurately. The more accurate occurrence probability in turn makes it possible to select the optimal processing result with high accuracy.
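Once each dependency structure candidate has an occurrence probability, selecting the processing result reduces to finding the maximum over all candidates and all of their structures. The sketch below assumes a simple list-of-tuples layout and a function name select_best that do not appear in the original; the clause series and probabilities are invented.

```python
def select_best(candidates):
    """Pick the processing result candidate whose best dependency
    structure has the highest occurrence probability.

    candidates: list of (clause_series, [(structure, probability), ...]).
    Returns (k, l, probability): the index of the chosen candidate, the
    index of its best structure, and that structure's probability.
    """
    best = None
    for k, (_series, structures) in enumerate(candidates):
        for l, (_structure, prob) in enumerate(structures):
            # Strict comparison keeps the earliest candidate on ties.
            if best is None or prob > best[2]:
                best = (k, l, prob)
    if best is None:
        raise ValueError("no dependency structure candidates were given")
    return best


# Two hypothetical recognition candidates with invented probabilities.
candidates = [
    (["kyou-wa", "ame-ga", "furu"], [((1, 2, None), 0.02), ((2, 2, None), 0.05)]),
    (["kyou-wa", "ame-ga", "huru"], [((1, 2, None), 0.01)]),
]
k, l, prob = select_best(candidates)
selected_series = candidates[k][0]
```

Only the index k is needed to return the processing result; l identifies which dependency structure justified the choice.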

Next, the hardware configuration of the language processing apparatus according to the first or second embodiment will be described with reference to FIG. 20. FIG. 20 is an explanatory diagram showing the hardware configuration of the language processing apparatus according to the first or second embodiment.

The language processing apparatus according to the first or second embodiment includes a control device such as a CPU (Central Processing Unit) 51; storage devices such as a ROM (Read Only Memory) 52 and a RAM 53; a communication I/F 54 that connects to a network and performs communication; external storage devices such as an HDD (Hard Disk Drive) and a CD (Compact Disc) drive; a display device such as a display; input devices such as a keyboard and a mouse; and a bus 61 that connects these units. It thus has the hardware configuration of an ordinary computer.

The candidate selection program executed by the language processing apparatus according to the first or second embodiment is provided as a file in an installable or executable format, recorded on a computer-readable recording medium such as a CD-ROM (Compact Disk Read Only Memory), a flexible disk (FD), a CD-R (Compact Disk Recordable), or a DVD (Digital Versatile Disk).

The candidate selection program executed by the language processing apparatus according to the first or second embodiment may also be stored on a computer connected to a network such as the Internet and provided by being downloaded via the network. The candidate selection program may likewise be provided or distributed via a network such as the Internet.

The candidate selection program of the first or second embodiment may also be provided by being incorporated in advance in a ROM or the like.

The candidate selection program executed by the language processing apparatus according to the first or second embodiment has a module configuration including the units described above (the input reception unit, the control unit, and the output unit). As actual hardware, the CPU 51 (processor) reads the candidate selection program from the storage medium and executes it, whereby the above units are loaded onto, and generated on, the main storage device.

As described above, the apparatus, method, and program according to the present invention are suitable for an apparatus, method, and program that select the optimal processing result from a plurality of processing result candidates produced by processing such as speech recognition, character recognition, and morphological analysis.

FIG. 1 is a block diagram showing the configuration of the language processing apparatus according to the first embodiment.
FIG. 2 is an explanatory diagram showing an example of the data structure of the occurrence probability table of the first embodiment.
FIG. 3 is an explanatory diagram showing an example of the data structure of the conditional probability table of the first embodiment.
FIG. 4 is an explanatory diagram showing an example of the data structure of the dictionary table.
FIG. 5 is an explanatory diagram showing an example of input clause series.
FIG. 6 is an explanatory diagram showing an example of the representation format of a dependency structure.
FIG. 7 is an explanatory diagram showing an example of the occurrence probability of a dependency structure.
FIG. 8 is an explanatory diagram showing the process of obtaining the possible dependency structures from a clause series of four clauses.
FIG. 9 is a diagram showing an example of the process of obtaining the occurrence probability of a dependency structure.
FIG. 10 is an explanatory diagram showing the process of calculating the occurrence probability of a dependency structure.
FIG. 11 is a schematic diagram for explaining the generalized process of calculating the occurrence probability.
FIG. 12 is a flowchart showing the overall flow of the candidate selection processing in the first embodiment.
FIG. 13 is a flowchart showing an overview of the overall flow of the dependency analysis processing in the first embodiment.
FIG. 14 is a flowchart showing the details of the overall flow of the dependency analysis processing in the first embodiment.
FIG. 15 is a flowchart showing the details of the overall flow of the crossing determination processing in the first embodiment.
FIG. 16 is a block diagram showing the configuration of the language processing apparatus according to the second embodiment.
FIG. 17 is an explanatory diagram showing an example of the data structure of the occurrence probability table of the second embodiment.
FIG. 18 is an explanatory diagram showing an example of the data structure of the conditional probability table of the second embodiment.
FIG. 19 is a flowchart showing the overall flow of the dependency analysis processing in the second embodiment.
FIG. 20 is an explanatory diagram showing the hardware configuration of the language processing apparatus according to the first or second embodiment.

Explanation of symbols

51 CPU
52 ROM
53 RAM
54 Communication I/F
61 Bus
100 Language processing apparatus
101 Input reception unit
102 Control unit
103 Analysis unit
103a Calculation unit
104 Selection unit
105 Output unit
121 First storage unit
121a Occurrence probability table
122 Second storage unit
122a Conditional probability table
123 Third storage unit
123a Dictionary table
201, 202, 203, 204, 205 Clause series
206, 207, 208, 209, 210 Dependency structure
211, 212, 213, 214, 215 List structure
306, 307, 308, 309, 310 Occurrence probability
401, 402, 403 Dependency structure
1101 Dependency structure
1102 Clause
1600 Language processing apparatus
1602 Control unit
1603 Analysis unit
1603a Calculation unit
1621 First storage unit
1621a Occurrence probability table
1622 Second storage unit
1622a Conditional probability table

Claims

A language processing apparatus that selects a processing result from candidates for the processing result for constituent units of a sentence, the apparatus comprising:
a first storage unit that stores each constituent unit in association with the occurrence probability of the constituent unit;
a second storage unit that stores a dependency relationship, represented by the constituent unit that is the dependency destination and the constituent unit that is the dependency source, in association with a conditional probability that the constituent unit that is the dependency source appears given the constituent unit that is the dependency destination;
an input receiving unit that receives input of the candidates for the processing result;
an analysis unit that analyzes, for each of the received processing result candidates, dependency structures each representing a combination of the dependency relationships between the constituent units;
a calculation unit that, for each of the analyzed dependency structure candidates, acquires from the first storage unit the occurrence probability corresponding to the constituent unit at the end of the sentence, acquires from the second storage unit the conditional probability corresponding to each of the dependency relationships included in the dependency structure, and calculates the occurrence probability of the dependency structure candidate as the product of all the acquired conditional probabilities multiplied by the acquired occurrence probability; and
a selection unit that obtains the dependency structure candidate whose calculated occurrence probability is the maximum, and selects, as the processing result, the processing result candidate corresponding to the obtained dependency structure candidate.

The language processing apparatus according to claim 1, wherein the calculation unit calculates the occurrence probability of the dependency structure candidate by, for each non-final unit, that is, each constituent unit other than the constituent unit at the end of the sentence, in order from the non-final unit nearest the end of the sentence toward the non-final unit at the beginning of the sentence, acquiring from the second storage unit the conditional probability corresponding to the dependency relationship of that non-final unit on a constituent unit behind it, and sequentially multiplying the occurrence probability acquired from the first storage unit by the acquired conditional probabilities.

The language processing apparatus according to claim 1, wherein
the calculation unit calculates, by formula (1), the occurrence probability PL(k, l_k) of the l_k-th dependency structure candidate among the dependency structure candidates analyzed from the k-th processing result candidate, and
the selection unit obtains the integer k and the integer l_k that maximize the occurrence probability PL(k, l_k) of formula (1), and selects, as the processing result, the k-th processing result candidate corresponding to the obtained integer k.

The language processing apparatus according to claim 1, wherein
the first storage unit stores, in association with one another, the constituent unit, a unit count representing the number of constituent units in a sentence, and the occurrence probability of the constituent unit in a sentence whose number of constituent units equals the unit count;
the second storage unit stores, in association with one another, the dependency relationship, the unit count, and the conditional probability in a sentence whose number of constituent units equals the unit count; and
the calculation unit further determines, for each of the analyzed dependency structure candidates, the number of constituent units included in the processing result candidate corresponding to the dependency structure candidate, acquires from the first storage unit the occurrence probability corresponding to the constituent unit at the end of the sentence and the determined number, acquires from the second storage unit the conditional probability corresponding to each dependency relationship included in the dependency structure and the determined number, and calculates the occurrence probability of the dependency structure candidate as the product of the acquired occurrence probability and the acquired conditional probabilities.

The language processing apparatus according to claim 4, wherein
the calculation unit calculates, by formula (2), the occurrence probability PL(k, l_k) of the l_k-th dependency structure candidate among the dependency structure candidates analyzed from the k-th processing result candidate, and
the selection unit obtains the integer k and the integer l_k that maximize the occurrence probability PL(k, l_k) of formula (2), and selects, as the processing result, the k-th processing result candidate corresponding to the obtained integer k.

The language processing apparatus according to claim 1, wherein the input receiving unit receives, as the candidates for the processing result, input of recognition result candidates from speech recognition processing that recognizes speech and divides it into the constituent units.

The language processing apparatus according to claim 1, wherein the input receiving unit receives, as the candidates for the processing result, input of recognition result candidates from character recognition processing that recognizes characters and divides them into the constituent units.

The language processing apparatus according to claim 1, wherein the input receiving unit receives, as the candidates for the processing result, input of analysis result candidates from morphological analysis processing that morphologically analyzes a sentence and divides it into morphemes serving as the constituent units.

A candidate selection method performed in a language processing apparatus that selects a processing result from candidates for the processing result for constituent units of a sentence,
the language processing apparatus including:
a first storage unit that stores each constituent unit in association with the occurrence probability of the constituent unit; and
a second storage unit that stores a dependency relationship, represented by the constituent unit that is the dependency destination and the constituent unit that is the dependency source, in association with a conditional probability that the constituent unit that is the dependency source appears given the constituent unit that is the dependency destination,
the method comprising:
an input receiving step in which an input receiving unit receives input of the candidates for the processing result;
an analysis step in which an analysis unit analyzes, for each of the received processing result candidates, dependency structures each representing a combination of the dependency relationships between the constituent units;
a calculation step in which a calculation unit, for each of the analyzed dependency structure candidates, acquires from the first storage unit the occurrence probability corresponding to the constituent unit at the end of the sentence, acquires from the second storage unit the conditional probability corresponding to each of the dependency relationships included in the dependency structure, and calculates the occurrence probability of the dependency structure candidate as the product of all the acquired conditional probabilities multiplied by the acquired occurrence probability; and
a selection step in which a selection unit obtains the dependency structure candidate whose calculated occurrence probability is the maximum, and selects, as the processing result, the processing result candidate corresponding to the obtained dependency structure candidate.

A candidate selection program for a language processing apparatus that selects a processing result from candidates for the processing result for constituent units of a sentence,
the language processing apparatus including:
a first storage unit that stores each constituent unit in association with the occurrence probability of the constituent unit; and
a second storage unit that stores a dependency relationship, represented by the constituent unit that is the dependency destination and the constituent unit that is the dependency source, in association with a conditional probability that the constituent unit that is the dependency source appears given the constituent unit that is the dependency destination,
the program causing a computer to execute:
an input receiving procedure of receiving input of the candidates for the processing result;
an analysis procedure of analyzing, for each of the received processing result candidates, dependency structures each representing a combination of the dependency relationships between the constituent units;
a calculation procedure of, for each of the analyzed dependency structure candidates, acquiring from the first storage unit the occurrence probability corresponding to the constituent unit at the end of the sentence, acquiring from the second storage unit the conditional probability corresponding to each of the dependency relationships included in the dependency structure, and calculating the occurrence probability of the dependency structure candidate as the product of all the acquired conditional probabilities multiplied by the acquired occurrence probability; and
a selection procedure of obtaining the dependency structure candidate whose calculated occurrence probability is the maximum, and selecting, as the processing result, the processing result candidate corresponding to the obtained dependency structure candidate.