JP3059504B2

JP3059504B2 - Part of speech selection system

Info

Publication number: JP3059504B2
Application number: JP3043661A
Authority: JP
Inventors: 由紀子山口
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1991-03-08
Filing date: 1991-03-08
Publication date: 2000-07-04
Anticipated expiration: 2015-07-04
Also published as: JPH0628392A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to natural language processing, and more particularly to a part-of-speech selection system that automatically selects the part of speech of each word in a sentence (text).

[0002] In recent years, research on speech synthesis has advanced, and it has begun to be used in various fields such as text-reading systems and response systems. To improve the naturalness of synthesized speech, it is necessary to extract the linguistic information of the sentence to be output and to generate accent and intonation patterns that correspond to it. The part of speech forms the basis of such linguistic information. A part of speech is a class of word, such as noun, verb, or adverb. Because speech synthesis systems are required to be portable, a system that can extract the part of speech of each word in a sentence by simple means is strongly desired.

[0003]

[Prior Art] Recently, methods that determine the part of speech of each word in a sentence using statistical information have come into use. As shown, for example, in "A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text" (ACL Proceedings, 2nd Applied Natural Language Processing), such a method determines the part of speech of each word in a sentence using a vocabulary probability, which is the probability that a given word takes a given part of speech, and a context probability, which indicates the frequency of occurrence of a part-of-speech sequence.

[0004] FIG. 19 is a block diagram showing a configuration example of a conventional apparatus. In the figure, 1 is a vocabulary probability search unit that receives an input text and looks up the vocabulary probabilities of each word; 2 is a vocabulary probability dictionary, which stores words together with their vocabulary probabilities and is searched by the vocabulary probability search unit 1; and 3 is a part-of-speech sequence creation unit that creates part-of-speech sequences for the words based on the output of the vocabulary probability search unit 1.

[0005] 4 is a context probability detection unit that detects the context probabilities of a plurality of part-of-speech sequences based on the vocabulary probabilities retrieved by the vocabulary probability search unit 1; 5 is a context probability table that stores the context probabilities of part-of-speech sequences; and 6 is an evaluation value calculation unit that receives the output of the context probability detection unit 4 and calculates an evaluation value from the context probability of each part-of-speech sequence and the vocabulary probability of the first or last word of that sequence. The apparatus configured in this way operates as follows.

[0006] First, the vocabulary probability search unit 1 receives the input text (sentence) and reads the vocabulary probabilities of each word from the vocabulary probability dictionary 2. Suppose, for example, that the input text is the sentence "I love her very much." Here, three-word part-of-speech sequences are used as the category of part-of-speech sequence. The vocabulary probability search unit 1 extracts the vocabulary probabilities of each word of this sentence from the vocabulary probability dictionary 2.

[0007] The vocabulary probability dictionary 2 is configured, for example, as shown in FIG. 20. The figure shows only the entries for the words of the input sentence; in practice, all words are stored in alphabetical order, each with its probability of occurrence for each kind of part of speech.

[0008] Here, consider the case where the sentence is processed from the end toward the beginning.

[0009] The part-of-speech sequence creation unit 3 first creates the part-of-speech sequence (Z # #) for the sentence-final period ". # #" and passes it to the context probability detection unit 4. Here, # is a symbol indicating the end or the beginning of a sentence. The context probability detection unit 4 retrieves the context probability of the part-of-speech sequence (Z # #) from the context probability table 5. The context probability table 5 stores, as context probabilities, the probabilities of occurrence of every pattern formed by a combination of three parts of speech.

[0010] FIG. 21 is a diagram showing a configuration example of the context probability table. Only part of the table is shown. Three-word part-of-speech sequences are used here because, in the case of English, they are the easiest to classify. The probability of each combination of parts of speech is obtained statistically from a large body of text.

[0011] The evaluation value calculation unit 6 multiplies the context probability of the part-of-speech sequence (Z # #) by the vocabulary probability of (. Z) to obtain the evaluation value. For example, if the context probability of (Z # #) is 0.983 and the vocabulary probability of (. Z) is 1.00, the evaluation value is 0.983.

[0012] Next, the part-of-speech sequence creation unit 3 creates the possible part-of-speech sequences (ADJ Z #), (NOUN Z #), and (ADV Z #) for the word sequence "much . #". The context probability detection unit 4 retrieves the context probabilities of these sequences (ADJ Z #), (NOUN Z #), and (ADV Z #) from the context probability table 5. Here, ADJ is an adjective, NOUN is a noun, and ADV is an adverb. FIG. 24 shows the classification of parts of speech. In the case of English, as shown in the figure, there are 20 kinds of parts of speech, and the part-of-speech symbols shown in the figure are used for them here.

[0013] For each of these part-of-speech sequences, that is, for the cases where "much" is ADJ, NOUN, or ADV, the evaluation value calculation unit 6 calculates an evaluation value by multiplying the context probability, the vocabulary probability, and the corresponding evaluation value accumulated so far.

[0014] Next, the part-of-speech sequence creation unit 3 creates the part-of-speech sequences (ADV ADJ Z), (ADV NOUN Z), (ADV ADV Z), (ADJ ADJ Z), (ADJ NOUN Z), and (ADJ ADV Z) for the word sequence "very much .".

[0015] The context probability detection unit 4 retrieves the context probabilities of these sequences (ADV ADJ Z), (ADV NOUN Z), (ADV ADV Z), (ADJ ADJ Z), (ADJ NOUN Z), and (ADJ ADV Z) from the context probability table 5. For each sequence, that is, for the cases where "very" is ADV and where it is ADJ, the evaluation value calculation unit 6 calculates the evaluation value of the sequence by multiplying the context probability, the vocabulary probability, and the accumulated evaluation value.

[0016] Repeating the same operation up to "# # I" yields the evaluation values of all part-of-speech sequences that the five words of "I love her very much." can take. The sequence with the highest evaluation value is then selected as the part-of-speech string. In this case, the selected part-of-speech string is # # PPRON V PRON ADV ADV Z # #.
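The conventional procedure of paragraphs [0006] to [0016] can be sketched roughly as follows. This is only an illustration under stated assumptions, not the apparatus of FIG. 19: the names `VOCAB`, `CONTEXT`, `score`, and `best_tags` are hypothetical, every probability other than 0.983 and 1.00 (paragraph [0011]) is a made-up value, unseen sequences are assumed to have probability 0, and the sentence-start markers are omitted for brevity.

```python
from itertools import product

# Vocabulary probabilities P(tag | word). Only the value for "." comes from
# the text; the others are made-up illustration values.
VOCAB = {
    "very": {"ADV": 0.9, "ADJ": 0.1},
    "much": {"ADJ": 0.3, "NOUN": 0.1, "ADV": 0.6},
    ".": {"Z": 1.00},
}

# Context probabilities of three-tag sequences. Only (Z, #, #) = 0.983 comes
# from the text; the others are made-up illustration values.
CONTEXT = {
    ("Z", "#", "#"): 0.983,
    ("ADJ", "Z", "#"): 0.2,
    ("NOUN", "Z", "#"): 0.5,
    ("ADV", "Z", "#"): 0.3,
    ("ADV", "ADJ", "Z"): 0.4,
    ("ADV", "NOUN", "Z"): 0.1,
    ("ADV", "ADV", "Z"): 0.5,
    ("ADJ", "ADJ", "Z"): 0.1,
    ("ADJ", "NOUN", "Z"): 0.6,
    ("ADJ", "ADV", "Z"): 0.05,
}


def score(words, tags):
    """Evaluation value of one tag sequence, accumulated from the end."""
    padded = list(tags) + ["#", "#"]  # sentence-end markers
    value = 1.0
    for i in range(len(words) - 1, -1, -1):
        trigram = (padded[i], padded[i + 1], padded[i + 2])
        # context probability x vocabulary probability x accumulated value
        value *= CONTEXT.get(trigram, 0.0) * VOCAB[words[i]][tags[i]]
    return value


def best_tags(words):
    """Score every candidate tag sequence and return the highest-scoring one."""
    candidates = product(*(VOCAB[w].keys() for w in words))
    return max(candidates, key=lambda tags: score(words, tags))


selected = best_tags(["very", "much", "."])
```

Because every three-tag sequence needs its own entry in `CONTEXT`, the table grows quickly with the number of categories, which is the memory problem the description turns to next.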

[0017]

[Problems to be Solved by the Invention] In the conventional apparatus described above, the context probabilities of part-of-speech sequences are stored in a table (context probability table 5). As the number of part-of-speech categories and the length of the part-of-speech sequences increase, a large amount of memory is therefore required.

[0018] The present invention has been made in view of this problem, and its object is to provide a part-of-speech selection system that requires less memory.

[0019]

[Means for Solving the Problems] FIG. 1 is a block diagram showing the principle of the present invention. Components identical to those in FIG. 19 are given the same reference numerals. In the figure, 1 is a vocabulary probability search unit that receives an input text and looks up the vocabulary probabilities of each word, and 2 is a vocabulary probability dictionary, which stores words together with their vocabulary probabilities and is searched by the vocabulary probability search unit 1.

[0020] 10 is a context probability detection unit that detects the context probabilities of a plurality of part-of-speech sequences based on the vocabulary probabilities retrieved by the vocabulary probability search unit 1; 20 is a context probability calculation unit using a neural network, which receives the input pattern supplied by the context probability detection unit 10 and calculates the context probability for each kind of part of speech of the next word; and 6 is an evaluation value calculation unit that receives the output of the context probability detection unit 10 and calculates an evaluation value, according to a predetermined procedure, from the context probability of each part-of-speech sequence and the vocabulary probability of the first or last word of that sequence.

[0021]

[Operation] The context probability calculation unit 20 is trained in advance on the probabilities of part-of-speech sequences. For example, in the case of three-word sequences, the input pattern for the last two parts of speech is fed to the neural network, the probabilities of each kind of part of speech for the next word (the first word of the sequence) are given in advance as the teacher pattern, and the network is trained so that its output becomes equal to the teacher pattern.

[0022] Thereafter, the context probability calculation unit 20 receives the part-of-speech sequence pattern supplied by the context probability detection unit 10 and calculates the context probability for each kind of part of speech of the next word. For example, in the case of three-word sequences, the two known parts of speech are supplied as the input pattern, and the probability that the next word takes each part of speech (the context probability) is calculated for every kind of part of speech. Based on these context probabilities, the evaluation value calculation unit 6 calculates an evaluation value for each three-word part-of-speech pattern.

[0023] Thus, according to the present invention, the neural network constituting the context probability calculation unit 20 is trained on the probabilities of part-of-speech sequences. In the case of three-part-of-speech sequences, for example, when the pattern corresponding to the last two parts of speech is supplied as the input pattern, the network outputs, for each kind of part of speech, the probability (context probability) of the third part of speech (that of the first word). It is therefore unnecessary to hold the context probability information as a context probability table, and a part-of-speech selection system with a reduced memory requirement can be provided.

[0024]

[Embodiments] The operation of the present invention is described in detail below with reference to the drawings.

[0025] FIG. 2 is a diagram showing a configuration example of the context probability calculation unit 20. 21 is a neural network consisting of an input layer L1, an intermediate layer L2, and an output layer L3. The input layer L1 has (length of the part-of-speech sequence minus 1) groups of units, each group containing one unit per part-of-speech category. For example, when the sequence length is three as described above, the input layer L1 consists of two such groups. The output layer L3 has one such group. Each group contains, for example, 20 units, which is the number of English parts of speech.

[0026] 22 is a comparison unit that compares the output of the output layer L3 with the teacher pattern. The comparison unit 22 changes the synaptic weights between the input layer L1 and the intermediate layer L2 and the weights between the intermediate layer L2 and the output layer L3 so that the output (probabilities) of the output layer L3 becomes equal to the teacher pattern. Learning is complete when the output of the output layer L3 finally matches the teacher pattern. At that point, the weights between the input layer L1 and the intermediate layer L2 and between the intermediate layer L2 and the output layer L3 are fixed at the values reached.
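The learning loop of paragraph [0026] might look like the sketch below. The text says only that the weights are changed until the output matches the teacher pattern; the use of sigmoid units, squared error, and gradient descent with backpropagation is an assumption made here for illustration, as are the class name `TinyNet`, the layer sizes, the learning rate, and the input vector. Only the target values 0.07, 0.02, and 0.08 are taken from the description.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyNet:
    """Three-layer network (L1 -> L2 -> L3) with sigmoid units."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Each row holds the incoming weights of one unit; the last entry is a bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x):
        xb = list(x) + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = h + [1.0]
        y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return xb, hb, y

    def train_step(self, x, target, lr=0.5):
        """One weight update; returns the squared error before the update."""
        xb, hb, y = self.forward(x)
        # Output error terms: the comparison of L3 output and teacher pattern.
        d_out = [(yk - tk) * yk * (1.0 - yk) for yk, tk in zip(y, target)]
        # Hidden error terms, propagated back through the L2 -> L3 weights.
        d_hid = [
            hb[j] * (1.0 - hb[j]) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
            for j in range(len(self.w1))
        ]
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] -= lr * d_out[k] * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= lr * d_hid[j] * xb[i]
        return sum((yk - tk) ** 2 for yk, tk in zip(y, target))


# Hypothetical miniature: 4 input units, 3 hidden units, 3 output units.
net = TinyNet(n_in=4, n_hidden=3, n_out=3)
x = [1, 0, 1, 0]
teacher = [0.07, 0.02, 0.08]
first_error = net.train_step(x, teacher)
for _ in range(2000):
    last_error = net.train_step(x, teacher)
```

Once training stops, only the two weight matrices need to be stored, which is where the memory saving claimed for the invention would come from.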

[0027] FIG. 3 is a block diagram showing the configuration of an embodiment of the present invention. Components identical to those in FIG. 1 are given the same reference numerals. In the figure, the context probability detection unit 10 consists of a part-of-speech sequence creation unit 11, which receives the words from the vocabulary probability search unit 1 and creates part-of-speech sequences of a predetermined length (for example, three); an input pattern creation unit 12, which receives the output of the part-of-speech sequence creation unit 11 and creates input patterns; and an output value selection unit 13, which feeds the output of the input pattern creation unit 12 to the context probability calculation unit 20, receives the context probability for each part of speech obtained from the neural network, and selects an output value.

[0028] The evaluation value calculation unit 6 consists of evaluation value calculation means 31, which calculates evaluation values from the context probabilities supplied by the output value selection unit 13 and the vocabulary probabilities, and a maximum value selection unit 32, which selects the largest evaluation value among the outputs of the evaluation value calculation means 31. The output of the maximum value selection unit 32 is the desired part-of-speech string. The apparatus configured in this way operates as follows.

[0029] The neural network 21 in this embodiment can be trained, from the frequencies of part-of-speech sequences extracted in advance from a large amount of data, either for selecting parts of speech backward, from the last word of the input text toward the first, or for selecting them forward, from the first word toward the last.

(When selecting parts of speech starting from the last word)

FIG. 4 is a diagram showing an example of creating learning patterns when parts of speech are selected starting from the last word. Three-word part-of-speech sequences are used here. Consider, for example, a sequence of three adverbs, ADV ADV ADV. When a large amount of data was examined, this sequence occurred 307 times, and its context probability is 0.07. The value 0.07 is obtained by dividing the frequency 307 by the sum of the frequencies of all three-word sequences that share the same last two parts of speech. Likewise, the next sequence, NOT ADV ADV (NOT followed by two adverbs), occurs 78 times, and its context probability is 0.02. In the same way, FIG. 4 shows as context probabilities the probabilities that, with the last two parts of speech both ADV, the first takes each of the 20 part-of-speech categories. The method by which the neural network 21 learns these context probabilities is described next.
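The division described in paragraph [0029] can be sketched as follows. The counts 307 and 78 come from the text; the total for the remaining sequences ending in ADV ADV is not given, so the `OTHER` count below is a hypothetical figure chosen only so that the results round to the 0.07 and 0.02 quoted above. The function name is likewise an assumption.

```python
from collections import defaultdict


def context_probabilities(trigram_counts):
    """P(first tag | last two tags): count / total count sharing the last two tags."""
    totals = defaultdict(int)
    for (_first, second, third), count in trigram_counts.items():
        totals[(second, third)] += count
    return {
        trigram: count / totals[(trigram[1], trigram[2])]
        for trigram, count in trigram_counts.items()
    }


counts = {
    ("ADV", "ADV", "ADV"): 307,     # from the text
    ("NOT", "ADV", "ADV"): 78,      # from the text
    ("OTHER", "ADV", "ADV"): 4001,  # hypothetical remainder, not from the source
}
probs = context_probabilities(counts)
```

For the forward direction the same idea applies with the grouping key changed to the first two tags.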

[0030] Here, the neural network 21 of FIG. 2 has two groups of inputs, and the part of speech for both is ADV. ADV is represented by setting the first of the 20 units to 1. Twenty units are used per group because, in the case of English, there are 20 kinds of parts of speech, as shown in FIG. 24.

[0031] FIG. 5(a) is a diagram showing an example of the input pattern in this case. The input pattern creation unit 12 feeds the pattern shown in FIG. 5 to the neural network 21. If, for example, the last two parts of speech of the three-word sequence are NOUN ADV, the input pattern is as shown in FIG. 5(b). The number of NOUN in FIG. 24 is 4. Because the numbering in FIG. 24 starts at 0, number 4 denotes the fifth position, so for NOUN the fifth unit from the front is set to 1.
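The input patterns of paragraphs [0030] and [0031] amount to two concatenated one-hot groups, which can be sketched as below. Only the positions of ADV (first unit) and NOUN (number 4, counting from 0) are stated in the text; the numbers of the other categories would come from FIG. 24, which is not reproduced here, so they are left out. The names `TAG_INDEX`, `one_hot`, and `input_pattern` are hypothetical.

```python
NUM_CATEGORIES = 20

# Unit positions given in the text: ADV is the first unit, NOUN is number 4.
# The remaining categories would take their numbers from FIG. 24.
TAG_INDEX = {"ADV": 0, "NOUN": 4}


def one_hot(tag):
    """One group of 20 units with a 1 at the position of the given tag."""
    units = [0] * NUM_CATEGORIES
    units[TAG_INDEX[tag]] = 1
    return units


def input_pattern(second_tag, third_tag):
    """Concatenate the groups for the last two tags into 40 input units."""
    return one_hot(second_tag) + one_hot(third_tag)


pattern_a = input_pattern("ADV", "ADV")   # corresponds to FIG. 5(a)
pattern_b = input_pattern("NOUN", "ADV")  # corresponds to FIG. 5(b)
```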

[0032] FIG. 6 is an explanatory diagram of the learning in this case. The input layer L1 receives two groups of 20 units, for a total of 40 input units. The 1 in the first unit of each group indicates an adverb. The output has 20 units. The teacher pattern supplied to the comparison unit 22 of the neural network 21 is as shown in FIG. 7. This teacher pattern uses the context probabilities shown in FIG. 4 as they are.

[0033] That is, because the probability of the part-of-speech sequence ADV ADV ADV is 0.07, this value is given as the teacher signal to the first of the 20 units of the output layer L3. The next unit is given 0.02, the probability of the sequence NOT ADV ADV. The remaining values of FIG. 7 are given to the comparison unit 22 as the teacher pattern in the same way. The last unit is given 0.08, the probability of the sequence # ADV ADV. In the configuration of FIG. 6, the same input pattern is presented repeatedly to train the neural network 21 until the outputs of the units of the output layer L3 become 0.07, 0.02, ..., 0.08.

[0034] The above example concerned the context probabilities, for each kind of part of speech of the first word, when the last two parts of speech are ADV and ADV. Part-of-speech sequences are not limited to this one, however; besides the pattern shown in FIG. 5(b), a very large number of patterns exist (for example, with 20 categories there are 8000 three-part-of-speech sequences and 400 learning patterns). The neural network 21 of FIG. 6 is trained on each of these combinations as well.
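The figures 8000 and 400 in paragraph [0034] follow directly from the number of categories and the sequence length, as the short calculation below shows. The weight count is included only to hint at the memory argument: the size of the intermediate layer L2 is not given in the text, so `hidden=30` is a purely hypothetical value, and biases are ignored. All function names are assumptions.

```python
def table_entries(categories, length):
    """Entries a context probability table needs: one per tag sequence."""
    return categories ** length


def learning_patterns(categories, length):
    """Distinct network input patterns: one per (length - 1)-tag context."""
    return categories ** (length - 1)


def network_weights(categories, length, hidden):
    """Connection weights of a three-layer network, ignoring biases.

    `hidden` is the size of the intermediate layer L2, which the text does not give.
    """
    inputs = categories * (length - 1)
    return inputs * hidden + hidden * categories


entries = table_entries(20, 3)
patterns = learning_patterns(20, 3)
weights = network_weights(20, 3, hidden=30)  # hidden=30 is a hypothetical size
```

With 20 categories and sequences of three, the table has 8000 entries and there are 400 learning patterns, matching the text; under the assumed hidden-layer size the network would hold fewer values than the table.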

[0035] Once learning has been completed for all three-word part-of-speech sequences in this way, when an input text is supplied to the apparatus shown in FIG. 3, the context probability calculation unit 20 outputs the context probabilities of the three-part-of-speech sequences corresponding to the input pattern. From this output, the output value selection unit 13 selects the probability of the part-of-speech sequence of interest and passes it to the evaluation value calculation means 31.

[0036] The evaluation value calculation means 31 calculates an evaluation value by multiplying this context probability, the vocabulary probability of the first word, and the accumulated evaluation value of the corresponding part-of-speech string so far. Because the evaluation value calculation means 31 calculates evaluation values for every case the three-part-of-speech sequences can take, the maximum value selection unit 32 selects the part-of-speech sequence with the largest evaluation value and outputs it as the desired part-of-speech string.

(When selecting parts of speech starting from the first word)

Selecting parts of speech starting from the first word can be treated in the same way as selecting them starting from the last word, described above.
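The evaluation step of paragraph [0036] for backward selection, multiplying context probability, vocabulary probability, and accumulated value and then taking the maximum, can be sketched as follows. Here `context_prob` stands in for the output selected from the trained network; in this sketch it is a stub backed by made-up numbers. The names `extend` and `select_max` and all probabilities except 0.983 (taken from the prior-art example in paragraph [0011]) are hypothetical.

```python
def extend(accumulated, head_word_vocab, context_prob):
    """Extend every partial tag string by one tag for the new head word.

    accumulated: dict mapping a tuple of tags chosen so far to its evaluation value.
    head_word_vocab: dict mapping each candidate tag of the head word to its
        vocabulary probability.
    context_prob: callable (head_tag, second_tag, third_tag) -> context probability.
    """
    extended = {}
    for tags, value in accumulated.items():
        second, third = tags[0], tags[1]
        for head, vocab_p in head_word_vocab.items():
            extended[(head,) + tags] = context_prob(head, second, third) * vocab_p * value
    return extended


def select_max(evaluations):
    """Return the tag string with the largest evaluation value."""
    return max(evaluations, key=evaluations.get)


# Stub in place of the neural network output; values are made up.
STUB = {("ADJ", "Z", "#"): 0.2, ("NOUN", "Z", "#"): 0.5, ("ADV", "Z", "#"): 0.3}


def context_prob(head, second, third):
    return STUB.get((head, second, third), 0.0)


accumulated = {("Z", "#", "#"): 0.983}
accumulated = extend(accumulated, {"ADJ": 0.3, "NOUN": 0.1, "ADV": 0.6}, context_prob)
best = select_max(accumulated)
```

Passing the context probability in as a callable keeps the scoring code independent of whether the value comes from a table or from a network.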

[0037] FIG. 8 is a diagram showing an example of creating learning patterns when parts of speech are selected starting from the first word. Three-word part-of-speech sequences are used here. Consider, for example, a sequence of three adverbs, ADV ADV ADV. When a large amount of data was examined, this sequence occurred 307 times, and its context probability is 0.07. The value 0.07 is obtained by dividing the frequency 307 by the sum of the frequencies of all three-part-of-speech sequences whose first two parts of speech are ADV and ADV. Likewise, the next sequence, ADV ADV NOT (two adverbs followed by NOT), occurs 6 times, and its context probability is 0.00. In the same way, FIG. 8 shows as context probabilities the probabilities that, with the first two parts of speech both ADV, the last takes each of the 20 part-of-speech categories. The method by which the neural network 21 learns these context probabilities is described next.

[0038] Here, there are two groups of inputs, and the part of speech for both is ADV. ADV is represented by setting the first of the 20 units to 1.

[0039] FIG. 9 is a diagram showing an example of the input pattern in this case. The input pattern creation unit 12 feeds the pattern shown in FIG. 9 to the neural network 21. The configuration of the neural network in this case is the same as in FIG. 6.

[0040] The input layer L1 receives two groups of 20 units, for a total of 40 input units. The 1 in the first unit of each group indicates an adverb. The output has 20 units. The teacher pattern supplied to the comparison unit 22 of the neural network 21 is as shown in FIG. 10. This teacher pattern uses the context probabilities shown in FIG. 8 as they are.

[0041] That is, because the probability of the part-of-speech sequence ADV ADV ADV is 0.07, this value is given as the teacher signal to the first of the 20 units of the output layer L3. The next unit is given 0.00, the probability of the sequence ADV ADV NOT. The remaining values of FIG. 10 are given to the comparison unit 22 as the teacher pattern in the same way. The last unit is given 0.00, the probability of the sequence ADV ADV #. In the configuration of FIG. 6, the same input pattern is presented repeatedly to train the neural network 21 until the outputs of the units of the output layer L3 become 0.07, 0.00, ..., 0.00.

[0042] The above example concerned the context probabilities, for each kind of part of speech of the last word, when the first two parts of speech are ADV and ADV. Part-of-speech sequences are not limited to this one, however, and a very large number of patterns exist. The neural network 21 of FIG. 6 is trained on each of these combinations as well.

[0043] Once learning has been completed for all three-word part-of-speech sequences in this way, when an input text is supplied to the apparatus shown in FIG. 3, the context probability calculation unit 20 outputs the context probabilities of the three-part-of-speech sequences corresponding to the input pattern. From this output, the output value selection unit 13 selects the probability of the part-of-speech sequence of interest and passes it to the evaluation value calculation means 31.

[0044] The evaluation value calculation means 31 calculates an evaluation value by multiplying this context probability, the vocabulary probability of the last word, and the accumulated evaluation value of the corresponding part-of-speech string so far. Because the evaluation value calculation means 31 calculates evaluation values for every case the three-part-of-speech sequences can take, the maximum value selection unit 32 selects the part-of-speech sequence with the largest evaluation value and outputs it as the desired part-of-speech string.

[0045] Next, the operation of the apparatus of the present invention is described using a concrete example. Here, the parts of speech classified into the 20 categories shown in FIG. 24 are used, and the part of speech of each word of the input text is selected from the end toward the beginning using the context probabilities of three-part-of-speech sequences. The sentence processed is "I appealed to the children to make less noise." The vocabulary probability search unit 1 looks up each word of the input text in the vocabulary probability dictionary 2 and extracts the part-of-speech categories of each word and their probabilities. FIG. 11 is a diagram showing the part-of-speech categories and probabilities extracted in this way: for each word of the sentence, its part-of-speech categories and their vocabulary probabilities are shown.

【００４６】The part-of-speech sequence creation unit 11 creates three-word part-of-speech sequences from this search result. FIG. 12 shows the three-word part-of-speech sequences created in this way. Starting from the end of the sentence, every part-of-speech sequence that could occur is created.
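As a rough illustration of how every possible three-word part-of-speech sequence can be enumerated from per-word candidate categories, the following Python sketch takes the Cartesian product of the candidates. The tag names and the candidate sets for "make", "less" and "noise" are hypothetical placeholders, not the contents of FIG. 11 or FIG. 12.

```python
from itertools import product

def three_tag_sequences(tags_a, tags_b, tags_c):
    """Return every tag triple for three consecutive words, in word order."""
    return list(product(tags_a, tags_b, tags_c))

# Hypothetical candidate categories for the last three words of the example sentence.
candidates = {
    "make": ["V", "N"],
    "less": ["ADJ", "ADV"],
    "noise": ["N"],
}

sequences = three_tag_sequences(
    candidates["make"], candidates["less"], candidates["noise"]
)
for seq in sequences:
    print(seq)
```

With two, two and one candidates respectively, this yields four candidate sequences.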

【００４７】The input pattern creation unit 12 creates an input pattern of the kind shown in FIG. 5 from the last two parts of speech of each of these sequences, and supplies it to the context probability calculation unit 20, which uses a neural network. Given the last two parts of speech, the context probability calculation unit 20 outputs, for each part-of-speech category, the probability that the first word takes that category; this is the context probability.

【００４８】From the output of the context probability calculation unit 20 for the given input pattern, the output value selection unit 13 selects the value corresponding to the first part of speech of the sequence and takes it as the context probability of that sequence.
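The flow from input pattern to selected context probability can be sketched in Python as follows. This is only an illustration under stated assumptions: the four-tag set, the one-hot layout of the input pattern (one group of units per word), and `stand_in_network` (a fixed-weight softmax used in place of the trained neural network 21) are all hypothetical and not taken from the figures.

```python
import math

# Hypothetical small tag set; the patent uses 20 categories.
TAGS = ["N", "V", "ADV", "PREP"]

def make_input_pattern(second_tag, third_tag):
    """Encode the last two tags of a triple as a 1/0 pattern (assumed layout)."""
    pattern = [0.0] * (2 * len(TAGS))
    pattern[TAGS.index(second_tag)] = 1.0
    pattern[len(TAGS) + TAGS.index(third_tag)] = 1.0
    return pattern

def stand_in_network(pattern):
    """Placeholder for the trained network: fixed weights followed by softmax."""
    scores = []
    for i in range(len(TAGS)):
        weights = [(((i + 1) * (j + 1)) % 5) / 5.0 for j in range(len(pattern))]
        scores.append(sum(w * x for w, x in zip(weights, pattern)))
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [v / total for v in exps]

def select_output(outputs, first_tag):
    """Pick the output unit for the first tag of the triple (role of unit 13)."""
    return outputs[TAGS.index(first_tag)]

pattern = make_input_pattern("ADV", "ADV")
outputs = stand_in_network(pattern)   # one value per category
pb = select_output(outputs, "V")      # context probability of (V, ADV, ADV)
```

One forward pass yields the context probabilities for all candidate first tags at once, so a single network evaluation serves every triple that shares the same last two tags.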

【００４９】The evaluation value calculating means 31 multiplies the cumulative evaluation value e′ of the part-of-speech sequence built up from the last word of the input text by the context probability Pb obtained from the output value selection unit 13 and by the vocabulary probability Pv, retrieved by the vocabulary probability search unit 1, for the first part of speech of the sequence. The resulting value e is the new evaluation value and is given by the following equation.

【００５０】e = Pb × Pv × e′ (1)

FIGS. 13 and 14 show the evaluation value calculation results obtained in this way. The two figures are continuous, with FIG. 14 following FIG. 13. In the figures, A is the context probability obtained by the context probability calculation unit 20 through the sequence described above, B is the vocabulary probability for the part of speech of the first word, C is the cumulative evaluation value e obtained up to that point, and D is the part-of-speech sequence corresponding to that cumulative evaluation value.
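Equation (1) is a recursion, and unrolling it may make the meaning of the cumulative evaluation value clearer. The step index k and the initial value e_0 = 1 are assumptions introduced here for illustration; the text does not state how the first step is initialized.

```latex
\begin{align}
e_k &= P_{b,k} \times P_{v,k} \times e_{k-1}, \qquad e_0 = 1 \quad \text{(assumed initial value)} \\
e_K &= \prod_{k=1}^{K} P_{b,k}\, P_{v,k}
\end{align}
```

As a purely hypothetical numeric instance, a step with Pb = 0.5, Pv = 0.4 and e′ = 0.1 gives e = 0.5 × 0.4 × 0.1 = 0.02.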

【００５１】Working in order from the last word of the input text, for every part-of-speech category that the current word can take and for every part-of-speech sequence created so far, a three-part-of-speech sequence is formed by combining the category with the two following parts of speech from that sequence, and its context probability Pb is looked up. The product of this context probability Pb, the vocabulary probability Pv of the word for that category, and the evaluation value e′ of the sequence created so far is taken as the evaluation value e of the new sequence. As shown in the figures, once evaluation values have been calculated for all sequences up to the first word of the input text, the maximum value selection unit 32 selects the sequence with the largest evaluation value, which determines the part of speech of each word in the input text.
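The procedure of this paragraph can be sketched in Python as an exhaustive search that extends every sequence from the end of the text toward the beginning. The toy lexicon, the tag names, the `toy_context_prob` function (standing in for the neural network lookup), and the choice of Pb = 1.0 while fewer than two following tags exist are all assumptions made for illustration.

```python
def select_tags(words, lexicon, context_prob):
    """Extend every tag sequence from the last word to the first and keep the best.

    lexicon maps word -> {tag: Pv}; context_prob(first, second, third) returns Pb.
    """
    candidates = [((), 1.0)]  # (tag sequence so far, cumulative evaluation value e')
    for word in reversed(words):
        extended = []
        for tags, e_prev in candidates:
            for tag, pv in lexicon[word].items():
                if len(tags) >= 2:
                    pb = context_prob(tag, tags[0], tags[1])
                else:
                    pb = 1.0  # assumption: no context term until two tags follow
                extended.append(((tag,) + tags, pb * pv * e_prev))  # equation (1)
        candidates = extended
    return max(candidates, key=lambda c: c[1])

# Hypothetical toy data, not the values of FIG. 11.
toy_lexicon = {
    "to": {"PREP": 0.6, "INF": 0.4},
    "make": {"V": 0.9, "N": 0.1},
    "noise": {"N": 1.0},
}

def toy_context_prob(first, second, third):
    """Hypothetical stand-in for the network output."""
    return 0.9 if (first, second, third) == ("INF", "V", "N") else 0.1

best_tags, best_e = select_tags(["to", "make", "noise"], toy_lexicon, toy_context_prob)
```

Because every sequence is kept until the end, the number of candidates grows with the product of the per-word category counts; this mirrors the description above rather than an optimized search.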

【００５２】FIG. 15 shows the part-of-speech selection result obtained in this way. It matches the part-of-speech sequence at the final stage of FIG. 14.

【００５３】FIG. 16 is a configuration block diagram showing another embodiment of the present invention. Components identical to those in FIG. 3 are given the same reference numerals. In the figure, 1 is a vocabulary probability search unit that receives input text and retrieves the vocabulary probabilities of each word; 2 is a vocabulary probability dictionary that stores vocabulary entries and their probabilities and is searched by the vocabulary probability search unit 1; and 10 is an input pattern creation unit, serving as the context probability detection unit, that receives the output of the vocabulary probability search unit 1 and creates an input pattern.

【００５４】20 is a context probability calculation unit using a neural network, which receives the input pattern supplied by the input pattern creation unit 10 and calculates context probabilities according to the part-of-speech type of the next word; 31 is an evaluation value calculating means that receives the output of the context probability calculation unit 20 and calculates an evaluation value from the given context probability and vocabulary probability; and 32 is a maximum value selection unit that selects, from the multiple outputs of the evaluation value calculating means 31, the one with the largest evaluation value. The evaluation value calculating means 31 and the maximum value selection unit 32 together constitute the evaluation value calculation unit 6. The operation of the apparatus configured in this way is as follows.

【００５５】In this embodiment, the input pattern for the last two parts of speech is not given as "1" or "0" as shown in FIG. 5; instead, evaluation values indicating the probability that each of the two words takes each part-of-speech category are input.

【００５６】As in the embodiment of FIG. 3, the vocabulary probability search unit 1 searches the vocabulary probability dictionary 2 for each word in the input text and extracts the part-of-speech categories and their probabilities for each word, as shown in FIG. 11. Here, the evaluation value used in selecting a part of speech is expressed, for each word, in terms of the vocabulary probabilities of its part-of-speech categories, and the part-of-speech category probabilities of the two adjacent following words or the two adjacent preceding words are input to the context probability calculation unit 20, which uses a neural network.

【００５７】FIGS. 17 and 18 show an example of evaluation value calculation. They depict one continuous series of operations, with FIG. 18 following FIG. 17. When parts of speech are selected starting from the last word of the input text, the input pattern creation unit 10 inputs the evaluation values E of the last two words of the three-word sequence into the neural network 21, and the resulting output is the context probability Pb.

【００５８】The evaluation value E of the word is obtained by multiplying the probability of each part-of-speech category in this context probability Pb by the probability of the corresponding category for the word obtained by the vocabulary probability search unit 1.

【００５９】For each word of the input text, the maximum value selection unit 32 selects the part-of-speech category with the largest value in the evaluation value E as the part of speech of that word. For example, in FIG. 17, the word "make" is evaluated when Wn = 6. In the evaluation value E at this point, the tenth unit from the front has the maximum value, 0.34. According to FIG. 22, the tenth part of speech from the front is the verb (V). The result of selecting parts of speech in this way is also the same as FIG. 15.
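The second embodiment can be sketched in Python as follows: each word carries a vector E of per-category evaluation values, the vectors of the two following words are fed to the network, and the output Pb is multiplied category by category with the word's vocabulary probabilities. The two-category vectors, `uniform_network` (a placeholder for the trained neural network 21), and the treatment of the last two words (E set to the vocabulary probabilities) are assumptions for illustration only.

```python
def select_tags_soft(lexical, network):
    """Per-word selection using probability vectors as network input.

    lexical: list of per-word vocabulary probability vectors (one value per category).
    network: callable taking the concatenated E vectors of the two following words
             and returning a context probability vector Pb.
    """
    n = len(lexical)
    E = [None] * n
    for i in range(n - 1, -1, -1):
        if i + 2 < n:
            pb = network(E[i + 1] + E[i + 2])                  # soft input pattern
            E[i] = [b * v for b, v in zip(pb, lexical[i])]    # E = Pb x Pv per category
        else:
            E[i] = list(lexical[i])  # assumption: no context for the last two words
    # Index of the largest evaluation value per word (role of unit 32).
    tags = [max(range(len(e)), key=e.__getitem__) for e in E]
    return tags, E

def uniform_network(inputs):
    """Hypothetical stand-in: returns a uniform distribution over the categories."""
    size = len(inputs) // 2
    return [1.0 / size] * size

# Hypothetical two-category vocabulary probabilities for a three-word text.
toy_lexical = [[0.2, 0.8], [0.7, 0.3], [0.1, 0.9]]
chosen, values = select_tags_soft(toy_lexical, uniform_network)
```

Each word needs only one network evaluation here, regardless of how many category combinations the following words admit, which is consistent with the reduction in lookups described for this embodiment.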

【００６０】According to this embodiment, inputting evaluation values that indicate the probability of each part-of-speech category into the neural network 21 reduces the number of context probability lookups, so part-of-speech selection can be performed at high speed.

【００６１】The above embodiments describe the case where the present invention is applied to English, but the invention is not limited to this. It can also be applied to text in Japanese and other languages.

【００６２】[0062]

[Effects of the Invention] As described in detail above, according to the present invention, using a neural network to obtain the context probabilities eliminates the need to hold a context probability table, making it possible to provide a part-of-speech selection system with a reduced memory requirement.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the principle of the present invention.

FIG. 2 is a diagram showing a configuration example of the context probability calculation unit.

FIG. 3 is a configuration block diagram showing one embodiment of the present invention.

FIG. 4 is a diagram showing an example of creating learning patterns when parts of speech are selected from the last word.

FIG. 5 is a diagram showing an example of an input pattern.

FIG. 6 is an explanatory diagram of learning.

FIG. 7 is a diagram showing an example of a teacher pattern.

FIG. 8 is a diagram showing an example of creating learning patterns when parts of speech are selected from the first word.

FIG. 9 is a diagram showing an example of an input pattern.

FIG. 10 is a diagram showing an example of a teacher pattern.

FIG. 11 is a diagram showing the extracted part-of-speech categories and their probabilities.

FIG. 12 is a diagram showing the created three-word part-of-speech sequences.

FIG. 13 is a diagram showing evaluation value calculation results obtained by the present invention.

FIG. 14 is a diagram showing evaluation value calculation results obtained by the present invention.

FIG. 15 is a diagram showing a part-of-speech selection result according to the present invention.

FIG. 16 is a configuration block diagram showing another embodiment of the present invention.

FIG. 17 is a diagram showing an example of evaluation value calculation.

FIG. 18 is a diagram showing an example of evaluation value calculation.

FIG. 19 is a block diagram showing a configuration example of a conventional apparatus.

FIG. 20 is a diagram showing an example of the internal configuration of the vocabulary probability dictionary.

FIG. 21 is a diagram showing a configuration example of the context probability table.

FIG. 22 is a diagram showing the classification of parts of speech.

[Explanation of symbols]

1 vocabulary probability search unit
2 vocabulary probability dictionary
6 evaluation value calculation unit
10 context probability detection unit
20 context probability calculation unit

Continuation of the front page

(56) References
Yamaguchi et al., "Phrase boundary extraction function using a neural network", Proceedings of the IEICE National Convention, Autumn 1990, p. 1-126
Yamaguchi et al., "Language processing and phoneme symbol generation using neural networks: application to an English speech synthesis system", Transactions of the IEICE, 1992, Vol. J75-D-II, No. 5, pp. 852-860
Matsumoto T, Yamaguchi Y, "A Multi-Language Text-to-Speech System Using Neural Networks", Proceedings of the ESCA Workshop on Speech Synthesis, pp. 269-272

(58) Fields searched (Int. Cl.7, DB name)
G06F 17/27
G06F 17/28
G06F 15/18
JICST file (JOIS)

Claims

(57) [Claims]

1. A part-of-speech selection system comprising: a vocabulary probability search unit (1) that receives an input text and retrieves a vocabulary probability for each word; a vocabulary probability dictionary (2) in which vocabulary entries and their probabilities are stored and which is searched by the vocabulary probability search unit (1); a context probability detection unit (10) that detects context probabilities of a plurality of part-of-speech sequences based on the vocabulary probabilities retrieved by the vocabulary probability search unit (1); a context probability calculation unit (20) using a neural network, which receives an input pattern supplied from the context probability detection unit (10) and calculates a context probability according to the type of part of speech of the next word; and an evaluation value calculation unit (6) that receives the output of the context probability detection unit (10) and calculates an evaluation value, according to a predetermined procedure, from the context probabilities of a plurality of part-of-speech sequences and the vocabulary probability of the word at the beginning or end of the part-of-speech sequence.

2. The part-of-speech selection system according to claim 1, wherein, when calculating the context probability of an n-word part-of-speech sequence, the context probability calculation unit (20) receives as input the part-of-speech pattern of the first through (n−1)-th words of the sequence and outputs the probability that the n-th word takes each part-of-speech category given that (n−1)-word part-of-speech sequence.

3. The part-of-speech selection system according to claim 1, wherein, when calculating the context probability of an n-word part-of-speech sequence, the context probability calculation unit (20) receives as input the part-of-speech pattern of the second through n-th words of the sequence and outputs the probability that the first word takes each part-of-speech category given that (n−1)-word part-of-speech sequence.

4. The part-of-speech selection system according to claim 2, wherein, when calculating the context probability of an n-word part-of-speech sequence, the context probability calculation unit (20) receives as input, for the (n−1)-word part-of-speech pattern, evaluation values indicating the probability of taking each part of speech, and outputs the probability that the n-th or first word takes each part-of-speech category.