JPS6342279B2

Info

Publication number: JPS6342279B2
Application number: JP57172787A
Authority: JP
Inventors: Fumio Togawa; Mitsuhiro Toya
Original assignee: DENSHI KEISANKI KIPPON GIJUTSU KENKYU KUMIAI
Current assignee: DENSHI KEISANKI KIPPON GIJUTSU KENKYU KUMIAI
Priority date: 1982-09-30
Filing date: 1982-09-30
Publication date: 1988-08-22
Also published as: JPS5961898A

Description

[Detailed description of the invention]

〈技術分野〉本発明は認識装置の改良に関し、更に詳細には
例えば文節等の一区切りの音声等の一区切りの認
識すべき情報を音韻、かな、音節、文節等のより
細分化された単位要素で認識する認識装置の改良
に関するものである。〈従来技術〉文節等の一区切りの音声等を音韻、かな、音節
等のより細分化された単位で認識する場合、従来
一般的には入力された認識すべき一区切りの音声
情報等を例えば音響処理して音韻、音節等の単位
毎の特徴ベクトル入力パターンを得ると共に、こ
の入力パターンと予め記憶されている標準パター
ンとのマツチングを行つて、入力された情報を候
補単位列として類似度の高いものから出力し、こ
の出力された候補単位列と文節等の辞書の内容と
を照合して入力された情報に対する文節等の一区
切りの情報を認識している。しかし、このような従来の方法によれば、全て
の音韻、音節等の標準パターンと入力パターンと
のマツチングを行なつて類似度を算出し、類似度
の高いものから順に候補音節等として出力してい
る。したがつて、例えば拗音を含む単音節単位で認
識する場合、各音節単位全てについて100種以上
の単音節の標準パターンと入力パターンとの間で
マツチングを行う必要があり、その処理時間が多
大なものとなつていた。また、その後に類似度の高いものから出力され
る候補単位列の全てについて辞書照合処理を行な
う必要があり、その処理時間が長くなり、正しい
文節等を認識する確度が向上せず、結果的に全体
の認識に要する処理量が膨大なものになつてい
た。〈目的〉本発明は、上記従来の欠点を除去した認識装置
を提供することを目的とし、正しい文節等の一区
切りの認識すべき情報を認識する確度を向上させ
ると共に、結果的に全体の認識に要する処理量を
減少させることのできる認識装置を提供するもの
である。〈実施例〉以下、本発明の認識装置を文節等の一区切りの
音声入力を音節等のより細分化された単位要素で
認識する場合の例を実施例として説明する。本発明の実施例によれば、文節等の一区切りの
音声等の認識すべき情報を音韻、かな、音節等の
より細分化されたＮ個の単位要素で認識する認識
装置において、認識対象となる文節あるいは文章
等の文字（単位要素）列について（Ｎ＋１）個の
文字（単位要素）間の接続関係である遷移関係を
記述した遷移行列を作成する遷移行列作成手段
と、この遷移行列作成手段により作成された遷移
行列にもとずいて、音節（単位要素）ラテイス生
成時に、一音節（単位要素）前のどの候補音節
（単位要素）からも遷移しない音節（単位要素）
群は認識対象から除去し、及びまたは候補列作成
時に各候補列に対して遷移行列を参照し、遷移し
ない音節（単位要素）の組合せを含む候補列は除
外する等の認識処理を行う処理手段とを備えて、
次の高次の辞書照合の際の処理量の削減を図るよ
うに構成されている。まず、本発明の実施例の説明に先立ち、本発明
の認識装置に用いられる単位要素間の接続関係で
ある遷移関係を示した遷移行列について説明す
る。一般に日本語文章は、全てかな文字で表現した
場合、かな文字列に対応した音節列で表現でき
る。例えば文節「地球の」は“ち”“きゆ”“う”
“の”という４個の単音節といわれる単位要素か
ら成り立つている。２つの音節間の接続関係
（“ち”から“きゆ”，“きゆ”から“う”，“う”か
ら“の”）を、日本語全て、あるいは特定の分野、
話題における文章等について調べると接続（遷
移：以下遷移ということばを使う）しない音節対
がある。例えばぱ行の音節の前には“ん”，“つ”
以外はこない。また“にや”は語頭にこないし、
“へ”（へと発声するもの）は語尾にこない。こような文節を構成する音節の１次の遷移関係
を以下に示す式(1)に従つて記述して、第１図に示
すような遷移行列Ｍ（Ｘ，Ｙ）を作成する。第１図において遷移行列Ｍ（Ｘ，Ｙ）は単位要
素列である文字列の文字Ｘから次の文字Ｙへの遷
移を記述したものであり、単位要素（音節）がＮ
個の場合、（Ｎ＋１）×（Ｎ＋１）の行列であり、
ハード的にはROM等に記憶される。またY₀列に
は各単位要素（１〜Ｎ）が節頭に来るか否かを表
わし、X₀行には各単位要素（１〜Ｎ）が節尾に
来るか否かを表わすデータが書込まれる。例えば“赤い”という文字列の遷移を遷移行列
に書込んだ例を第２図に示す。遷移行列の要素は
０（遷移不可能）か１（遷移可能）の２値のどちら
かで表現され、１ビツトで記憶される。なお、第
２図においては表記“１”以外の行列要素は全て
“０”であり、その表示を省略している。次に遷移行列の作成について、今少し詳細に説
明する。まず遷移行列の作成にあたつて遷移行列メモリ
を“０”に初期セツト〔Ｍ（Ｘ，Ｙ）＝０〕する。次に文字列〓＝（a₁，a₂，a₃，……a_I）但し、Ｉ：列の文字数とした場合、次式(1) Ｍ（０，a₁）＝１，（ｉ＝１）Ｍ（a_i-1，a_i）＝１，（ｉ＝２〜Ｉ）Ｍ（a_I，０）＝１，（ｉ＝Ｉ＋１） …(1) に従つて、文字列〓の文字遷移関係を遷移行列Ｍ
（Ｘ，Ｙ）に書込む。同様に認識対象となる文字
列の全てについて遷移関係を書込み遷移行列（１
次）の作成を完了する。このようにして作成された具体的な遷移行列
（１次）Ｍ（Ｘ，Ｙ）の例を第３図に示している。
この第３図より明らかなように例えば（Ｘ，Ｙ）
＝（え，く）のビツト位置が“１”であるため、
“え”から“く”への遷移が存在し、また（Ｘ，
Ｙ）＝（え，け）のビツト位置が“０”であるた
め、“え”から“け”への遷移が存在しないこと
を表わしている。上記は１次の遷移であるが、２次遷移、更には
一般にＭ次へ拡張したＭ次遷移行列も同様に次式
(2)に従つて作成することが出来る。Ｍ次遷移行列：Ｍ（X₁，X₂，X₃，……，X_M，
Ｙ），（Ｎ＋１）^M+1次元Ｍ（a_i-M，a_i-(M-1)，……，a_i）＝１，（ｉ＝１〜Ｉ＋１） ……(2) 但しｌ０ｌ＞Ｉのときa_l＝０本発明の実施例は、この遷移行列を任意に作成
することが出来るように認識すべき所定の単位要
素列について（Ｎ＋１）個の単位要素間の接続関
係である遷移関係を記述した遷移行列を作成する
遷移行列作成手段を認識装置に備えるようにした
ものである。次に本発明の実施例を図面を参照して説明す
る。第４図は本発明の一実施例装置の構成を示すブ
ロツク図である。第４図において、１はフロツピーデイスク装置
であり、認識対象となる文節あるいは文章等の文
字列を記憶した記憶媒体が装着される。２は文字
コード入力端子であり、外部装置から文字列の文
字コードが入力される。また３はキーボード装
置、４は切換スイツチ手段、５は中央処理装置
（CPU）、６は文字バツフア、７は文字カウンタ、
８は遷移行列メモリ、９は認識処理部、１０は認
識すべき音声情報が入力される入力端子、１１は
上記CPUに対して遷移行列作成の指示信号を入
力するためのフアンクシヨンキーである。上記の如き構成において、遷移行列メモリ８に
所望の遷移行列情報を書込む場合、まずフアンク
シヨンキー１１を操作してCPUに対して遷移行
列作成の指示を行ない、次に切換スイツチ手段４
を操作してキーボード３、フロツピーデイスク装
置１あるいはその他の入力手段を選択し、認識対
象となる文字列を単位として入力する。上記入力手段より入力された文字列はCPU５
の制御の下に第５図の遷移行列作成の処理フロー
に従つて遷移行列メモリ８へ遷移行列情報を書き
込んでいく。即ちCPU５は最初遷移行列初期値設定動作を
実行して（ステツプn1）、遷移行列メモリ８の記
憶内容の全てを初期値“０”に設定する。次に具体的な遷移行列作成動作に移行し
（n2）、入力手段より入力された文字列はコード
化されて文字バツフア６に一時記憶され、またそ
の文字数は文字カウンタ７に記憶される（n3）。次に文字バツフア６に記憶されたコード化され
た文字列情報にもとずいて、文字遷移関係が遷移
行列メモリ８に書込まれる（n4）。この動作は上
記した式(1)に従つて実行される。具体的には例え
ば文字バツフア６に記憶されたデータは順次第６
図ａに示す一桁文字バツフア６Ｘ，６Ｙ及び一文
字遅延器６Ｄより成る遷移判定手段にシフト入力
され、このバツフア６Ｘ及び６Ｙに内容に対応し
て遷移行列メモリ８の（Ｘ，Ｙ）アドレスを指定
すると共にそのアドレス位置に“１”を書込む。
従つて最初のシフト動作によつて第１番目の文字
コードがバツフア６Ｙに入力され、メモリ８のＸ
＝０，Ｙ＝a₁のアドレス位置が指定され、そのア
ドレス位置に“１”が書込まれる。次のシフト動
作により第１番目の文字コードがバツフア６Ｘに
入力され、第２番目の文字コードがバツフア６Ｙ
に入力され、メモリ８のＸ＝a₁，Ｙ＝a₂のアドレ
ス位置が指定され、そのアドレス位置に“１”が
書込まれ、a₁からa₂への遷移関係が書込まれる。
以下同様の動作を文字カウンタ７の記憶内容に対
応して実行し、一文字列に対する遷移関係の書込
みを完了する。以下、同様の動作を認識対象となる文字列の全
てについて行ない、遷移行列の作成を完了する
（n5，n6）。また認識対象語を新たに追加する場合には、第
５図におけるステツプn1の初期設定動作を除い
て、ステツプn3〜n5の動作を実行して遷移行列
にその遷移関係を書込む。以上は１次の遷移であるが、２次遷移、更には
Ｍ次の遷移行列についても、同様に上記した式(2)
に従がつて第６図ｂ，ｃに示す遷移判定手段によ
つて作成することが出来る。次に上記のようにして作成された遷移行列を用
いた認識動作について説明する。第７図は上記第４図に示した認識処理部９の詳
ブツク図である。第７図において、文節音声入力部２１に入力さ
れた音声情報は次段の音響処理・比較部２２に入
力される。この音響処理・比較部２２は遷移行列
メモリ２６（第５図のメモリ８に対応）を用いた
処理部分を除いた部分は従来公知のものであり、
例えば文節音声入力部２１に入力された文節音声
信号が音響処理部２２により単音節毎に特徴抽出
処理が行なわれ、各単音節毎の特徴パターンが同
処理部２２内のバツフアに一時記憶される。一方
記憶装置２３には各単音節毎の標準パターンP_i
（ｉ＝１〜Ｎ）が記憶されており、の標準パター
ンP_iが順次読出されて処理・比較部２２において
該処理部内のバツフアに記憶れた入力音声の入力
特徴パターンとのマツチング計算が行なわれる。従来技術によれば、この標準パターンと入力特
徴パターンとのマツチング計算処理は全ての標準
パターンについて行なわれていたが、本発明の実
施例によれば、後述するように遷移行列メモリ２
６に記憶された情報にもとずいて前に候補として
認識した音節に接続可能な音節（最初の場合は先
頭に来る可能性のある音節）の標準パターンとの
マツチングが計算され、最も近似したものが第１
候補として、また順次近似したものが次候補とし
て選出され、その結果が候補音節メモリ２４に記
憶される。即ち音節ラテイス生成時に、一音節前
のどの候補音節からも遷移しない音節群は認識対
象から除外するように処理される。上記候補音節ラテイスメモリ２４に記憶された
複数個の候補音節の時系列は候補列作成部２５及
び遷移行列メモリ２６より成る候補列出力部２７
に入力され、該候補列出力部２７において、遷移
行列メモリ２６の内容を参照して遷移不可能な音
節遷移を含む候補列は除外して、遷移可能な候補
列のみ、信頼度の高い組合せ順に作成され、この
候補列と辞書２８に記憶された文節とが辞書照合
部２９により照合され、一致すればその結果が文
節出力部３０に出力されるように構成されてい
る。次に遷移行列Ｍ（Ｘ，Ｙ）を用いた音節認識処
理について第８図に示す遷移行列を用いた候補音
節作成処理ブロツク図を参照して説明する。本実施例においては、結果として得る候補音節
を時系列順に候補音節ラテイスバツフア２４に一
次記憶する。また上記した遷移行列情報はメモリ
２６に記憶されており、音節標準パターンはメモ
リ２３に記憶されている。候補音節ラテイス２４には認識結果が次表の如
く記憶されていくが、今第ｉ音節を認識する場合
には、以下の如く処理が実行される。

〈Technical Field〉 The present invention relates to an improvement of a recognition device, and more specifically to a recognition device that recognizes one segment of information to be recognized, such as one segment of speech (for example a phrase), in terms of more finely subdivided unit elements such as phonemes, kana, syllables, or phrases.

〈Prior Art〉 When one segment of speech such as a phrase is recognized in more finely subdivided units such as phonemes, kana, or syllables, the conventional approach is generally as follows. The input segment of speech information is, for example, acoustically processed to obtain a feature-vector input pattern for each unit such as a phoneme or syllable. This input pattern is matched against pre-stored standard patterns, and the input is output as candidate unit strings in descending order of similarity. The output candidate unit strings are then collated with the contents of a dictionary of phrases or the like to recognize the segment of information, such as a phrase, that corresponds to the input.

In this conventional method, however, the input pattern is matched against the standard patterns of all phonemes, syllables, etc. to compute similarities, and candidate syllables are output in descending order of similarity. Consequently, when recognition is performed in units of monosyllables including contracted sounds (yōon), for example, every syllable position must be matched against more than 100 monosyllable standard patterns, and the processing time becomes very large.
In addition, dictionary collation must then be performed on all of the candidate unit strings output in descending order of similarity. The processing time grows, the accuracy of recognizing the correct phrase does not improve, and the total amount of processing required for recognition becomes enormous.

〈Purpose〉 The present invention aims to provide a recognition device that eliminates these drawbacks of the conventional art: a device that improves the accuracy of recognizing the correct segment of information, such as a phrase, and consequently reduces the total amount of processing required for recognition.

〈Example〉 An embodiment is described below in which the recognition device of the present invention recognizes a segment of speech input, such as a phrase, in terms of more finely subdivided unit elements such as syllables.

According to the embodiment, a recognition device that recognizes information such as a segment of speech (a phrase or the like) in terms of N subdivided unit elements such as phonemes, kana, or syllables comprises two means. The first is transition matrix creation means that creates, from the character (unit element) strings of the phrases or sentences to be recognized, a transition matrix describing the transition relations, that is, the connection relations, among (N+1) characters (unit elements). The second is processing means that performs recognition processing based on the transition matrix so created. When the syllable (unit element) lattice is generated, it removes from the recognition targets any group of syllables that cannot be reached from any candidate syllable one syllable earlier; and/or, when candidate strings are created, it consults the transition matrix for each candidate string and excludes any string containing a combination of syllables for which no transition exists. The device is thereby configured to reduce the amount of processing in the subsequent higher-level dictionary collation.

Before the embodiment is described, the transition matrix used in the recognition device, which expresses the transition relations (connection relations) between unit elements, is explained. In general, a Japanese sentence written entirely in kana can be expressed as a syllable string corresponding to the kana string. For example, the phrase 「地球の」 (chikyuu no, "of the earth") consists of four unit elements called monosyllables: "chi", "kyu", "u", and "no". If the connections between pairs of syllables ("chi" to "kyu", "kyu" to "u", "u" to "no") are examined over all of Japanese, or over the sentences of a particular field or topic, some syllable pairs are found never to connect (hereinafter the term "transition" is used for connection). For example, a syllable of the pa-row is preceded only by "n" or "tsu". Likewise, "nya" does not occur at the beginning of a word,
and "he" (when pronounced "he") does not occur at the end of a word. The first-order transition relations of the syllables that make up such phrases are written according to equation (1) below to create a transition matrix M(X, Y) as shown in FIG. 1. In FIG. 1, the transition matrix M(X, Y) describes the transition from a character X to the next character Y in a character string (a unit element string). With N unit elements (syllables) it is an (N+1)×(N+1) matrix, and in hardware it is stored in a ROM or the like. The Y₀ column holds data indicating whether each unit element (1 to N) can come at the beginning of a phrase, and the X₀ row holds data indicating whether each unit element (1 to N) can come at the end of a phrase.

For example, FIG. 2 shows the transitions of the character string 「赤い」 (akai, "red") written into a transition matrix. Each element of the transition matrix takes one of two values, 0 (transition impossible) or 1 (transition possible), and is stored in one bit. In FIG. 2 all elements other than those marked "1" are "0" and are not shown.

Creation of the transition matrix is now explained in somewhat more detail. The transition matrix memory is first initialized to "0" [M(X, Y) = 0]. Then, for a character string (a₁, a₂, a₃, ..., a_I), where I is the number of characters in the string, the character transitions of the string are written into the transition matrix M(X, Y) according to equation (1):

$$
M(0, a_1) = 1 \;\; (i = 1), \qquad
M(a_{i-1}, a_i) = 1 \;\; (i = 2, \ldots, I), \qquad
M(a_I, 0) = 1 \;\; (i = I + 1)
\tag{1}
$$

Transition relations are written in the same way for all character strings to be recognized, which completes the (first-order) transition matrix. A concrete example of a (first-order) transition matrix M(X, Y) created in this way is shown in FIG. 3.
As FIG. 3 shows, the bit at (X, Y) = (e, ku) is "1", so a transition from "e" to "ku" exists, whereas the bit at (X, Y) = (e, ke) is "0", so no transition from "e" to "ke" exists.

The above concerns first-order transitions, but second-order transitions and, more generally, an M-th-order transition matrix can be created in the same way according to equation (2). The M-th-order transition matrix M(X₁, X₂, X₃, ..., X_M, Y) is of dimension (N+1)^(M+1):

$$
M(a_{i-M}, a_{i-(M-1)}, \ldots, a_i) = 1 \quad (i = 1, \ldots, I + 1)
\tag{2}
$$

where $a_l = 0$ when $l \le 0$ or $l > I$.

So that such a transition matrix can be created at will, the recognition device of the embodiment is provided with transition matrix creation means that creates, for given unit element strings to be recognized, a transition matrix describing the transition relations (connection relations) among (N+1) unit elements.

An embodiment of the present invention is now described with reference to the drawings. FIG. 4 is a block diagram showing the configuration of a device according to one embodiment. In FIG. 4, 1 is a floppy disk device into which a storage medium holding the character strings of the phrases or sentences to be recognized is loaded. 2 is a character code input terminal through which the character codes of character strings are input from an external device. Further, 3 is a keyboard device, 4 is changeover switch means, 5 is a central processing unit (CPU), 6 is a character buffer, 7 is a character counter,
8 is a transition matrix memory, 9 is a recognition processing section, 10 is an input terminal for the speech information to be recognized, and 11 is a function key for giving the CPU an instruction to create a transition matrix.

In this configuration, to write the desired transition matrix information into the transition matrix memory 8, the function key 11 is first operated to instruct the CPU to create a transition matrix. The changeover switch means 4 is then operated to select the keyboard 3, the floppy disk device 1, or another input means, and the character strings to be recognized are input one string at a time.

Under the control of the CPU 5, the input character strings are written as transition matrix information into the transition matrix memory 8 according to the transition matrix creation flow of FIG. 5. The CPU 5 first executes the initial value setting operation (step n1), setting the entire contents of the transition matrix memory 8 to the initial value "0". It then moves to the actual matrix creation operation (n2): the character string input from the input means is encoded and temporarily stored in the character buffer 6, and its character count is stored in the character counter 7 (n3). Next, the character transition relations are written into the transition matrix memory 8 on the basis of the encoded character string held in the character buffer 6 (n4). This operation follows equation (1) above. Specifically, the data in the character buffer 6 are shifted in sequence into the transition determination means shown in FIG. 6a, which consists of the one-character buffers 6X and 6Y and the one-character delay element 6D. The contents of buffers 6X and 6Y designate the (X, Y) address of the transition matrix memory 8, and "1" is written at that address.
Thus the first shift operation loads the first character code into buffer 6Y, the address X = 0, Y = a₁ of memory 8 is designated, and "1" is written at that address. The next shift operation loads the first character code into buffer 6X and the second character code into buffer 6Y; the address X = a₁, Y = a₂ of memory 8 is designated and "1" is written there, which records the transition from a₁ to a₂.

The same operation is then repeated as many times as the character counter 7 indicates, completing the writing of the transition relations for one character string. The same operation is carried out for all character strings to be recognized, which completes the transition matrix (n5, n6). When a new word is added to the recognition vocabulary, the initialization of step n1 in FIG. 5 is skipped and steps n3 to n5 are executed to write its transition relations into the transition matrix. The above concerns first-order transitions, but second-order and, more generally, M-th-order transition matrices can be handled in the same way according to equation (2).
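The creation procedure of equation (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the circuit of FIG. 6a: unit elements are assumed to be integer codes 1 to N with 0 reserved for the phrase boundary, and the function name `build_transition_matrix` and the coding of "akai" are hypothetical.

```python
def build_transition_matrix(strings, n_units):
    """Return an (N+1) x (N+1) list-of-lists M with M[x][y] in {0, 1}.

    Assumption for this sketch: unit elements are integer codes 1..N and
    code 0 stands for the phrase boundary, as in equation (1).
    """
    size = n_units + 1
    m = [[0] * size for _ in range(size)]   # step n1: clear the matrix memory
    for codes in strings:                   # one recognition-target string at a time
        if not codes:
            continue                        # nothing to record for an empty string
        x = 0                               # buffer 6X initially holds the boundary code
        for y in list(codes) + [0]:         # shift codes into 6Y, then the end boundary
            m[x][y] = 1                     # write "1" at address (X, Y)
            x = y                           # one-character delay: Y becomes the next X
    return m


# Hypothetical coding: 1 = "a", 2 = "ka", 3 = "i", so "akai" is [1, 2, 3].
M = build_transition_matrix([[1, 2, 3]], n_units=3)
```

Adding a word later corresponds to running only the inner loop on an existing matrix, without the clearing step.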
These higher-order matrices can be created by the transition determination means shown in FIGS. 6b and 6c.

Next, the recognition operation using the transition matrix created as above is described. FIG. 7 is a detailed block diagram of the recognition processing section 9 shown in FIG. 4. In FIG. 7, speech information input to the phrase speech input section 21 is passed to the acoustic processing/comparison section 22 in the next stage. Apart from the portion that uses the transition matrix memory 26 (corresponding to the memory 8 in FIG. 5), this acoustic processing/comparison section 22 is conventionally known.
For example, the phrase speech signal input to the phrase speech input section 21 undergoes feature extraction for each monosyllable in the acoustic processing section 22, and the feature pattern of each monosyllable is temporarily stored in a buffer within section 22. Meanwhile, the storage device 23 holds a standard pattern P_i (i = 1 to N) for each monosyllable. These standard patterns are read out in turn, and the processing/comparison section 22 computes the match between each of them and the input feature pattern held in its buffer.

In the prior art this matching was computed for all standard patterns. In the embodiment, as described later, matching is computed, on the basis of the information in the transition matrix memory 26, only against the standard patterns of syllables that can follow a syllable previously recognized as a candidate (for the first syllable, syllables that can occur at the beginning of a phrase). The closest match is selected as the first candidate and successively less close matches as further candidates, and the results are stored in the candidate syllable memory 24. In other words, when the syllable lattice is generated, any group of syllables that cannot be reached from any candidate syllable one syllable earlier is excluded from the recognition targets.

The time series of candidate syllables stored in the candidate syllable lattice memory 24 is input to the candidate string output section 27, which consists of the candidate string creation section 25 and the transition matrix memory 26. Referring to the contents of the transition matrix memory 26, the candidate string output section 27 excludes candidate strings that contain an impossible syllable transition and creates only the feasible candidate strings, in descending order of combined reliability. Each candidate string is collated with the phrases stored in the dictionary 28 by the dictionary collation section 29, and on a match the result is output to the phrase output section 30.

Next, syllable recognition using the transition matrix M(X, Y) is described with reference to FIG. 8, a block diagram of candidate syllable creation using the transition matrix. In this embodiment the resulting candidate syllables are temporarily stored in time-series order in the candidate syllable lattice buffer 24. The transition matrix information is stored in memory 26 and the syllable standard patterns in memory 23. Recognition results accumulate in the candidate syllable lattice 24 as shown in the following table; when the i-th syllable is to be recognized, processing proceeds as follows.

【表】今、前音節候補をＸ＝｛S_i-1，ｊ｝ｊ＝１〜Ｊ（ｉ−１）組合せ数：Ｊ（ｉ−１）（ｌ＝０のときS_l,j＝０）とした場合、次式(3)に従つて直前の複数個（Ｊ
（ｉ−１）個）の候補音節について遷移行列の各
行の和をとり、得られた行ｍ（Ｙ）が０である音
節は遷移不可能であると指定する。ｍ（Ｙ）＝VM（S_i-1,j，Ｙ） ……(3) ＝Ｍ（S_i-1,1，Ｙ）＋Ｍ（S_i-1,2，Ｙ）＋…＋Ｍ（S_i-1,J(i-1)，Ｙ）この(3)式においてｍ（Ｙ）＝０となり、遷移不可
能と指定された音節群は、除外して次の類似比較
の処理を行い、第ｉ音節の候補音節を出力し、候
補音節ラテイス７に書込む。但し、ｉ＝１（節頭
の音節）のときは第０行Ｍ（０，Ｙ）によつて遷
移不可能と指定された音節群を除外して類似比較
の処理を行なう。以上を繰返して、一文節音声の候補音節ラテイ
スの作成を完了する。今、一文節音声として「国民は」を入力した場
合、音響処理部２２により音節毎に特徴抽出が行
なわれ、その音節毎の特徴パターン〓_iが入力パ
ターン時系列バツフア３１に記憶される。次に遷
移行列を用いた候補音節作成処理に移り、最初に
第１音節の特徴パターン〓₁が入力パターンバツ
フア３２に読み込まれ、次にステツプn3に移行
して前候補音節群により式(3)にしたがつて遷移行
列の行を指定する。最初の場合はステツプn4に
おいて第０行のＭ（０，Ｙ）が指定されその内容
がバツフア３３に一時記憶され、ステツプn5の
生起音節の指定が成される。次にステツプn6に移行して入力パターンバツ
フア３２に記憶された第１音節〓₁の特徴パター
ンがロードされ、この特徴パターン〓₁と音節標
準パターンメモリ２３に記憶された標準パターン
の内バツフア３３によつて生起音節と指定されて
順次標準パターンバツフア３４に読出される標準
パターンとの間で類似比較が行なわれ（ステツプ
ｎ７）、その結果にもとずいて候補音節が出力さ
れ（ステツプn8）、その結果が候補音節ラテイス
２４に書かれる。この実施例においては第１音節
候補として“KO”，“GO”，“BO”が記憶され
る。次にステツプn2に戻り、第２音節特徴パター
ン〓₂がバツフア３２に入力され、ステツプn3に
移行して、候補音節ラテイス２４の第１候補音節
にもとずいて“KO”，“GO”，“BO”に対応した
各行のＭ（S_l,1〜3，Ｙ）が指定され、ステツプn4に
おいて、その遷移行列の和（OR）が作成されて
その結果がバツフア３３に一時記憶され、ステツ
プn5の生起音節の指定が成される。次にステツプn6に移行し、以下同様のステツ
プn6〜n9を実行して第２候補音節“KU”，“GU”
をメモリ２４に記憶する。以上の動作を繰返して一文節の候補音節ラテイ
スの作成を完了する。以上のようにして候補音節ラテイス２４に候補
列が記憶されることになるが、遷移行列を用いな
い場合の従来方式の場合と本方式の場合の実例を
入力音声「国民は」について次表に示す。

[Table] Let the candidates for the preceding syllable be X = {S_{i-1,j}}, j = 1 to J(i−1), where J(i−1) is the number of candidates (and S_{l,j} = 0 when l = 0). According to equation (3), the rows of the transition matrix that correspond to the J(i−1) immediately preceding candidate syllables are combined by a logical sum, and any syllable Y for which the resulting row m(Y) is 0 is designated as unreachable:

$$
m(Y) = \bigvee_{j=1}^{J(i-1)} M(S_{i-1,j}, Y)
     = M(S_{i-1,1}, Y) + M(S_{i-1,2}, Y) + \cdots + M(S_{i-1,J(i-1)}, Y)
\tag{3}
$$

The syllables for which m(Y) = 0 in equation (3) are excluded, the similarity comparison is performed on the remainder, and the candidate syllables for the i-th syllable are output and written into the candidate syllable lattice 7. When i = 1 (the syllable at the beginning of the phrase), the syllables designated as unreachable by row 0, M(0, Y), are excluded before the similarity comparison. Repeating the above completes the candidate syllable lattice for one phrase of speech.

Suppose 「国民は」 (kokumin wa) is input as one phrase of speech. The acoustic processing section 22 extracts features for each syllable, and the feature pattern of each syllable is stored in the input pattern time-series buffer 31. Processing then moves to candidate syllable creation using the transition matrix. First the feature pattern of the first syllable is read into the input pattern buffer 32; then, at step n3, the rows of the transition matrix are designated from the preceding candidate syllables according to equation (3). For the first syllable, row 0, M(0, Y), is designated at step n4, its contents are temporarily stored in buffer 33, and the syllables that can occur are thereby designated at step n5. At step n6 the feature pattern of the first syllable held in the input pattern buffer 32 is loaded. A similarity comparison is made between it and those standard patterns in the syllable standard pattern memory 23 that buffer 33 designates as able to occur and that are read out in turn into the standard pattern buffer 34 (step n7). Candidate syllables are output on the basis of the result (step n8) and written into the candidate syllable lattice 24. In this example "KO", "GO", and "BO" are stored as first-syllable candidates.

Processing then returns to step n2, the feature pattern of the second syllable is input to buffer 32, and at step n3 the rows M(S_{1,j}, Y), j = 1 to 3, corresponding to "KO", "GO", and "BO" are designated from the first-syllable candidates in the candidate syllable lattice 24. At step n4 the logical sum (OR) of these rows is formed and temporarily stored in buffer 33, which designates the syllables that can occur at step n5. Processing moves to step n6, steps n6 to n9 are executed as before, and the second-syllable candidates "KU" and "GU" are stored in memory 24. Repeating these operations completes the candidate syllable lattice for one phrase.

Candidate strings are thus stored in the candidate syllable lattice 24. For the input speech 「国民は」, the following table compares the conventional method, which uses no transition matrix, with the present method.

【表】【table】
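The lattice-generation pruning of equation (3) can be sketched as follows. This is a minimal illustration under stated assumptions: the names `allowed_next` and `build_lattice`, the `score` callback standing in for the similarity comparison, and the toy matrix and scores are all hypothetical and are not taken from the patent's tables.

```python
def allowed_next(m, prev_candidates):
    """Equation (3): OR the rows of M selected by the previous candidates."""
    return [int(any(m[x][y] for x in prev_candidates)) for y in range(len(m))]


def build_lattice(m, frames, score, top_k):
    """Return, per syllable position, up to top_k candidate codes, best first.

    `score(frame, y)` is a hypothetical similarity between one input pattern
    and the standard pattern of unit y; larger means more similar.
    """
    lattice = []
    prev = [0]                                   # i = 1: use row 0 (phrase boundary)
    for frame in frames:
        mask = allowed_next(m, prev)
        active = [y for y in range(1, len(m)) if mask[y]]   # pruned units are never scored
        ranked = sorted(active, key=lambda y: score(frame, y), reverse=True)
        prev = ranked[:top_k]
        lattice.append(prev)
    return lattice


# Toy data, not from the patent: 1 = KO, 2 = GO, 3 = KU, 4 = PI.
M = [
    [0, 1, 1, 0, 0],   # boundary -> KO, GO
    [0, 0, 0, 1, 0],   # KO -> KU
    [0, 0, 0, 1, 0],   # GO -> KU
    [1, 0, 0, 0, 0],   # KU -> boundary
    [0, 0, 0, 0, 0],   # PI never occurs in the vocabulary
]
frames = [{1: 0.9, 2: 0.8, 4: 0.95}, {3: 0.5, 4: 0.9}]
lattice = build_lattice(M, frames, lambda f, y: f.get(y, 0.0), top_k=2)
```

In this toy run PI has the highest raw score in both positions, yet it is never matched because no row selected by the preceding candidates allows it.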

【表】上記の例から明らかなように、本方式による方
が正しい文字列が候補列の上位に上がつている様
子がわかる。以上の遷移行列は１次遷移であるが、２次遷
移、更には一般的なＭ次遷移まで同じ手法で拡張
することができる。なおＭ次の遷移行列の作成は上述の式(2)に従
い、前候補音節（Ｍ音節前まで）からの音節指定
は次に示す式(4)によつて行なうことが出来る。即ちＭ次遷移行列Ｍ（X₁，X₂，…，X_M，Ｙ）
への拡張の場合、前音節候補列を｛X₁，X₂，…，X_M｝＝｛S_i-M,j1S_i-(M-1),j2 …S_i-1,jM｝ j₁＝１〜Ｊ（ｉ−Ｍ） j₂＝１〜Ｊ（ｉ−（Ｍ−１））〓 j_M＝１〜Ｊ（ｉ−１）組合せの数：Ｊ（ｉ−Ｍ）・Ｊ（ｉ−（Ｍ−１））
…Ｊ（ｉ−１）（ｌ０のときS_l,j＝０）とした場合、音節指定はｍ（Ｙ）＝VM（S_i-M,j1 ，S_i-(M-1),j2， …，S_i-1,jM，Ｙ） …(4) j₁＝１〜Ｊ（ｉ−Ｍ） j₂＝１〜Ｊ（ｉ
−（Ｍ−１））〓 j_M＝１〜Ｊ（ｉ−１）によつて行なうことになる。なお、Ｍの次数を大きくとれば、生成音節の限
定が強くなり、効果はより大きくなる。次に上記候補列出力部２７で実行されている遷
移行列を用いた候補音節列作成動作について、第
９図に示す遷移行列を用いた候補列作成の処理ブ
ロツク図を参照して説明する。上記第７図に示した音響処理・比較部２２から
出力された複数個の候補音節の時系列を記憶する
候補音節ラテイスメモリ２４の内容をもとに、候
補音節列作成部４１において信頼度の高い順に候
補列が作成され、その結果が候補音節列バツフア
４２に一次記憶される。この候補音節列バツフア
４２に記憶された候補音節列は遷移行列参照部４
３においてメモリ２６に記憶された遷移行列：Ｍ
（Ｘ，Ｙ）を参照して、遷移可能か不可能かを次
式(5)によつて判定部４４において判定し、可能な
候補列のみ候補音節列書込み部４５を介して候補
音節列出力バツフア４６に記憶していく。今第ｊ番目の候補音節列を〓_j＝（a₁，a₂，……a_I）但し、a_i：第ｉ番目の音節番号Ｉ：列の音節数とした場合、判定部４４による遷移行列Ｍ（Ｘ，
Ｙ）を用いた候補列否定はＭ（０，a₁）＝０（ｉ＝１）Ｍ（a_i-1，a_i）＝０（ｉ＝２〜１）Ｍ（a_I，０）＝０（ｉ＝Ｉ＋１） ……(5) のいずれか一つが成立した場合に成される。この(5)式において、いずれか一つが成立した遷
移不可能な音節列を含んだ候補音節列は除外さ
れ、次の候補音節列について同様の判定を行な
い、遷移可能な候補音節列のみが出力バツフア４
６に記憶される。今、一文節音声として「国民は」を入力した場
合、音響処理・比較部２２の処理により候補音節
ラテイスメモリ２４に次表の如き候補音節が時系
列に記憶される。

[Table] As the above example shows, with the present method the correct character string rises higher among the candidate strings. The transition matrix above is first-order, but the same technique extends to second-order and, in general, M-th-order transitions. The M-th-order transition matrix is created according to equation (2), and the syllables are designated from the preceding candidate syllables (up to M syllables back) according to equation (4). That is, for the extension to the M-th-order transition matrix M(X₁, X₂, ..., X_M, Y), let the preceding candidate sequences be

$$
\{X_1, X_2, \ldots, X_M\} = \{S_{i-M,j_1}, S_{i-(M-1),j_2}, \ldots, S_{i-1,j_M}\},
\qquad
\begin{aligned}
j_1 &= 1, \ldots, J(i-M) \\
j_2 &= 1, \ldots, J(i-(M-1)) \\
&\;\;\vdots \\
j_M &= 1, \ldots, J(i-1)
\end{aligned}
$$

with J(i−M)·J(i−(M−1))···J(i−1) combinations (and S_{l,j} = 0 when l ≤ 0). The syllable designation is then

$$
m(Y) = \bigvee_{j_1, j_2, \ldots, j_M} M(S_{i-M,j_1}, S_{i-(M-1),j_2}, \ldots, S_{i-1,j_M}, Y)
\tag{4}
$$

where j₁, j₂, ..., j_M range as above. The larger the order M, the more tightly the generated syllables are constrained and the greater the effect.

Next, the creation of candidate syllable strings using the transition matrix, carried out in the candidate string output section 27, is described with reference to FIG. 9, a processing block diagram of candidate string creation using the transition matrix. From the contents of the candidate syllable lattice memory 24, which holds the time series of candidate syllables output by the acoustic processing/comparison section 22 of FIG. 7, the candidate syllable string creation section 41 creates candidate strings in descending order of reliability and stores them temporarily in the candidate syllable string buffer 42. For each string in buffer 42, the transition matrix reference section 43 consults the transition matrix M(X, Y) stored in memory 26, and the determination section 44 decides by equation (5) whether the string is feasible. Only feasible candidate strings are stored in the candidate syllable string output buffer 46 via the candidate syllable string writing section 45.

Let the j-th candidate syllable string be (a₁, a₂, ..., a_I), where a_i is the number of the i-th syllable and I is the number of syllables in the string. The determination section 44 rejects the string using the transition matrix M(X, Y) when any one of the following holds:

$$
M(0, a_1) = 0 \;\; (i = 1), \qquad
M(a_{i-1}, a_i) = 0 \;\; (i = 2, \ldots, I), \qquad
M(a_I, 0) = 0 \;\; (i = I + 1)
\tag{5}
$$

A candidate string for which any of these conditions holds contains an impossible syllable sequence and is excluded. The same test is applied to the next candidate string, and only feasible candidate strings are stored in the output buffer 46.

Suppose 「国民は」 is input as one phrase of speech. Through the processing of the acoustic processing/comparison section 22, candidate syllables as shown in the following table are stored in time series in the candidate syllable lattice memory 24.
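The rejection test of equation (5) amounts to walking a candidate string from the phrase boundary through each syllable and back to the boundary. The sketch below assumes integer syllable codes with 0 as the boundary; the function names and the toy matrix are hypothetical and do not reproduce the patent's tables.

```python
def is_rejected(m, syllables):
    """Equation (5): True if any step of 0 -> a_1 -> ... -> a_I -> 0 has M = 0."""
    path = [0] + list(syllables) + [0]
    return any(m[x][y] == 0 for x, y in zip(path, path[1:]))


def feasible_candidates(m, candidates):
    """Keep, in their given reliability order, the strings that pass equation (5)."""
    return [c for c in candidates if not is_rejected(m, c)]


# Toy data, not from the patent: 1 = KO, 2 = GO, 3 = KU, 4 = PI.
M = [
    [0, 1, 1, 0, 0],   # boundary -> KO, GO
    [0, 0, 0, 1, 0],   # KO -> KU
    [0, 0, 0, 1, 0],   # GO -> KU
    [1, 0, 0, 0, 0],   # KU -> boundary
    [0, 0, 0, 0, 0],   # PI never occurs in the vocabulary
]
candidates = [[2, 4], [1, 3], [2, 3]]      # assumed already sorted by reliability
kept = feasible_candidates(M, candidates)
```

Only the strings in `kept` would go on to dictionary collation, which is where the saving in processing comes from.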

【表】このメモリ２４に記憶された音節ラテイスを基
に、信頼度の高い順に候補列が作成され、遷移行
列：Ｍ（Ｘ，Ｙ）を参照して作成された候補列が
遷移可能なもののみが出力され、この例の場合に
は候補音節列が次の如く出力される。

[Table] From the syllable lattice stored in memory 24, candidate strings are created in descending order of reliability. Each is checked against the transition matrix M(X, Y), and only the feasible strings are output. In this example the candidate syllable strings are output as follows.

[Table]

According to the conventional method, which does not refer to the transition matrix, "GOKUPINWA" would be output as the candidate string with the highest reliability. According to the present method, however, a syllable transition in this candidate string, for example the transition from "KU" to "PI", is judged to be impossible using the transition matrix M(X, Y), and the candidate string is excluded from the subsequent dictionary matching process.

The transition matrix described above is a first-order transition matrix, but the same technique can be extended to second-order transitions and, more generally, to M-th order transitions. An M-th order transition matrix is created according to equation (2) above, and a candidate syllable string can be rejected according to equation (6) below.

That is, when extending to an M-th order transition matrix $M(X_1, X_2, \ldots, X_M, Y)$, let the j-th candidate string be $(a_1, a_2, \ldots, a_I)$. The candidate string is rejected if any one of the following conditions holds:

$$M(a_{i-M},\, a_{i-(M-1)},\, \ldots,\, a_i) = 0 \qquad (i = 1, \ldots, I+1) \tag{6}$$

(where $a_l = 0$ when $l \le 0$ or $l > I$)

The larger the order M, the more strongly the candidate syllable strings are restricted, and the greater the effect.

In this way, when candidate strings are created, the matrix M is referred to for each candidate string, and candidate strings containing a combination of syllables for which no transition exists are excluded.

The recognition target of the recognition device described above is not limited to phrases (bunsetsu); it may be a syllable, a word, or a sentence. The subdivided unit is likewise not limited to syllables and may be a phoneme or a word. The target may also be a character string of alphabetic characters or the like, or a character string of a programming language such as FORTRAN.

In general, the present invention can be applied to any character string in which transition relationships exist between the subdivided units that make up the words to be recognized.

<Effects> As described above, according to the present invention, correct candidate strings can be extracted with high accuracy, so the accuracy of recognizing the correct phrase or the like is increased and, as a result, the amount of processing for higher-order dictionary matching and the like can be reduced. In addition, transition matrices for individual topics, fields, and the like can be created in the recognition device whenever needed, according to the type, content, topic, field, etc. of the information to be recognized, so the effect of recognition processing using the transition matrix can be made greater.

In the present invention, a new transition matrix M (M = M_i ∪ M_j) can easily be created by taking the union (logical sum) of transition matrices M_i and M_j of the same order but of different kinds, such as those created from the sentences or phrases of individual topics.
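The rejection test of equation (6) and the union M = M_i ∪ M_j can be illustrated with a minimal Python sketch. It is not the circuit of the embodiment: the transition matrix is held as a set of observed tuples instead of a memory array, and the function names, the syllable strings, and the way they are split into syllables are hypothetical choices made only for this illustration. The only points taken from the text are the padding value 0 outside the string, the window (a_{i-M}, ..., a_i) for i = 1 to I+1, and the observation that the "KU" to "PI" transition is not permitted.

```python
BOUNDARY = 0  # stands for a_l = 0 when l <= 0 or l > I (before/after the string)


def windows(syllables, order):
    """Return the tuples (a_{i-M}, ..., a_i) for i = 1 .. I+1, with M = order."""
    padded = [BOUNDARY] * order + list(syllables) + [BOUNDARY]
    return [tuple(padded[k:k + order + 1]) for k in range(len(syllables) + 1)]


def build_transitions(strings, order=1):
    """Collect every transition observed in the given syllable strings.

    A tuple in the returned set corresponds to a matrix entry equal to 1;
    a tuple that is absent corresponds to an entry equal to 0.
    """
    allowed = set()
    for s in strings:
        allowed.update(windows(s, order))
    return allowed


def is_rejected(candidate, allowed, order=1):
    """Equation (6): reject the candidate if any window has a matrix entry of 0."""
    return any(w not in allowed for w in windows(candidate, order))


# Hypothetical training strings for two topics, split into syllables by hand.
topic_a = [["GO", "KU", "HI", "N", "WA"], ["E", "N", "PI", "TSU"]]
topic_b = [["GO", "HA", "N", "WA"]]

m_a = build_transitions(topic_a)           # first-order matrix for topic A
m_b = build_transitions(topic_b)           # first-order matrix for topic B
merged = m_a | m_b                         # M = M_i ∪ M_j (same order only)

# "KU" -> "PI" was never observed, so this candidate is rejected.
print(is_rejected(["GO", "KU", "PI", "N", "WA"], m_a))

# A string that passes the first-order test can still fail at second order,
# because the triple ("HI", "N", "PI") never occurs in the training strings.
long_candidate = ["GO", "KU", "HI", "N", "PI", "TSU"]
m_a2 = build_transitions(topic_a, order=2)
print(is_rejected(long_candidate, m_a, order=1))
print(is_rejected(long_candidate, m_a2, order=2))
```

Storing only the observed tuples keeps the sketch short for higher orders, where a full array would need (N+1)^(M+1) cells; a dense memory, as in the embodiment, gives the same accept or reject decisions.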

[Brief explanation of the drawing]

FIG. 1 is a diagram showing a first-order transition matrix; FIG. 2 is a diagram showing an example of a transition matrix into which the transitions of a character string have been written; FIG. 3 is a diagram showing an example of a transition matrix for phrase (bunsetsu) character strings; FIG. 4 is a block diagram showing the configuration of an embodiment of a recognition device embodying the present invention; FIG. 5 is a processing flow diagram of transition matrix creation according to the present invention; FIG. 6 is a block diagram showing a specific example of the transition determination means; FIG. 7 is a detailed block diagram of the recognition processing section using the transition matrix; FIG. 8 is a processing flow diagram of candidate syllable creation using the transition matrix; and FIG. 9 is a processing block diagram of candidate string creation using the transition matrix.

Description of symbols: 1 ... floppy disk device, 3 ... keyboard, 4 ... changeover switch means, 5 ... central processing unit (CPU), 8 ... transition matrix memory, 9 ... recognition processing section, 11 ... function key for instructing transition matrix creation.

Claims

[Scope of Claims] 1. A recognition device that recognizes one section of information to be recognized in terms of N more finely subdivided unit elements, comprising: a transition matrix memory that stores, in a row-and-column relationship, information on whether connections between (N+1) unit elements are possible for a predetermined sequence of unit elements to be recognized; means for inputting a character string to be recognized; storage means for storing the input character string in encoded form; transition matrix creation means for sequentially reading the encoded information in the storage means character by character, designating the corresponding address in the transition matrix memory, and storing connectable information at that address; and processing means for performing recognition processing based on the transition matrix created by the transition matrix creation means.
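The matrix creation recited in claim 1 (reading the encoded string one character at a time, addressing a cell of the transition matrix memory, and writing "connectable" there) can be illustrated with a minimal Python sketch. Several details are assumptions made for the illustration and are not stated in the claim: the codebook, the function names, the use of code 0 as the boundary before and after a string, and the choice of rows for the preceding element and columns for the following element.

```python
def encode(text, codebook):
    """Map each character to its integer code 1..N (0 is kept for the boundary)."""
    return [codebook[ch] for ch in text]


def create_transition_matrix(strings, codebook):
    """Build an (N+1) x (N+1) first-order transition matrix from sample strings."""
    n = len(codebook)
    # Assumed layout: row = preceding element, column = following element.
    matrix = [[0] * (n + 1) for _ in range(n + 1)]
    for text in strings:
        codes = [0] + encode(text, codebook) + [0]   # boundary at both ends
        for prev, nxt in zip(codes, codes[1:]):
            matrix[prev][nxt] = 1                    # store "connectable" at this address
    return matrix


# Hypothetical three-symbol alphabet: A -> 1, B -> 2, C -> 3.
codebook = {ch: i for i, ch in enumerate("ABC", start=1)}
m = create_transition_matrix(["AB", "BC"], codebook)

for row in m:
    print(row)
```

Each pair of adjacent codes plays the role of the address designated in the transition matrix memory; cells that are never addressed keep the value 0 and mark transitions that the processing means can treat as impossible.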