JPH08146988A - Method and device for speech recognition

Method and device for speech recognition

Info

Publication number
JPH08146988A
JPH08146988A
Authority
JP
Japan
Prior art keywords
representative
representative pattern
pattern
patterns
index information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP6286850A
Other languages
Japanese (ja)
Inventor
Toshihiro Isobe
俊洋 磯部
Noriya Murakami
憲也 村上
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
N T T DATA TSUSHIN KK
NTT Data Corp
Original Assignee
N T T DATA TSUSHIN KK
NTT Data Communications Systems Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by N T T DATA TSUSHIN KK and NTT Data Communications Systems Corp
Priority to JP6286850A
Publication of JPH08146988A
Legal status: Pending

Abstract

PURPOSE: To improve the response from speech input to recognition-result output without degrading recognition performance, and to allow the number of categories to be increased without degrading that response, by using an identified representative pattern as search-key information and extracting from a memory the other representative patterns to which the search is narrowed.

CONSTITUTION: Input speech captured by a speech input device 60 is digitized by a feature extractor 61 and subjected to the prescribed analysis to extract its speech feature quantity. A collator 11 matches the speech feature quantity against all representative patterns in a vocabulary dictionary 63 and selects the representative pattern that best matches it. Using that representative pattern as search-key information, a representative-pattern mutual-rank table 10 is then searched, and the representative patterns stored in association with the index-information field of the selected pattern are retrieved. The speech feature quantity of the input speech is then matched against the category patterns subordinate to each of these representative patterns, and the category pattern with the highest degree of matching is extracted and output as the recognition result.

Description

[Detailed Description of the Invention]

[0001]

[Field of Industrial Application] The present invention relates to speech recognition technology, and more particularly to a technique for narrowing down the representative patterns in a vocabulary dictionary that are used for matching against an input speech feature quantity.

[0002]

[Prior Art] FIG. 6 shows the configuration of the main part of a conventional, general-purpose speech recognition apparatus. In this type of apparatus, speech captured by a speech input device 60 such as a microphone is digitized by, for example, a feature extractor 61 and subjected to the prescribed analysis to extract its speech feature quantity. A collator 62 matches the speech feature quantity supplied by the feature extractor 61 against the vocabulary (category) patterns stored in a vocabulary dictionary 63, and outputs the category that best matches the speech feature quantity of the input speech as the recognition result.

[0003] Because a speech recognition apparatus of this kind matches the input speech feature quantity against every category pattern in the vocabulary dictionary 63, the matching process takes a great deal of time when the dictionary holds many category patterns. For this reason, when the vocabulary dictionary 63 is built, all categories are usually grouped in advance by pattern similarity and a representative pattern characterizing each group is defined. At matching time, the input speech feature quantity is first matched against all representative patterns and the candidates are narrowed down to a predetermined number of representative patterns according to their degree of matching with the input speech feature quantity; the vocabularies belonging to the groups of the better-matching representative patterns are then matched in turn against the input speech feature quantity, and the best-matching vocabulary is output as the recognition result.

[0004] Consider, for example, the case in which all category patterns stored in the vocabulary dictionary 63 are grouped into seven similar-pattern groups as shown in FIG. 5, with representative patterns D1 to D7 defined for the respective groups. In this example, the first pattern, "Aichi", holds as its parameters the mean and variance of the feature vectors of multiple training utterances of "Aichi", and the representative pattern D1 holds as its parameters the group mean and variance of the parameters of the first pattern "Aichi" through the fourth pattern "Adachi". The relationship between each pattern from the fifth pattern onward and the representative patterns D2 to D7 of the respective groups is the same.
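
As an illustration of how such mean/variance parameters could enter a degree-of-matching computation, the sketch below models a pattern in Python. The `Pattern` class, the `match_score` name and the diagonal-Gaussian log-likelihood are assumptions for illustration only; the patent does not specify the actual matching measure.

```python
import numpy as np

class Pattern:
    """A category or representative pattern parameterised by the mean and
    variance of its training feature vectors (cf. D1-D7 in FIG. 5)."""

    def __init__(self, name, mean, var):
        self.name = name
        self.mean = np.asarray(mean, dtype=float)
        self.var = np.asarray(var, dtype=float)

def match_score(feature, pattern):
    """Degree of matching between a feature vector and a pattern (higher is
    better). The patent leaves the measure open; a diagonal-Gaussian
    log-likelihood is used here only as an illustrative stand-in."""
    feature = np.asarray(feature, dtype=float)
    diff = feature - pattern.mean
    return float(-0.5 * np.sum(diff ** 2 / pattern.var + np.log(pattern.var)))
```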

[0005] The processing procedure of the collator 62 when the input speech is "Aichi" will now be described with reference to FIG. 7, in which S denotes each processing step. The collator 62 first matches the speech feature quantity of "Aichi" against all representative patterns D1 to D7 in the vocabulary dictionary 63 (S201), sorts them according to their degree of matching (S202), and selects the three best-matching representative patterns D1, D3 and D5 (S203). It then matches the category patterns subordinate to the representative patterns D1, D3 and D5, namely "Aichi" to "Adachi", "Amagasa" to "Amami-Oshima", and "Akita" to "Akiho", against the speech feature quantity of the input "Aichi" (S204), extracts the category pattern "Aichi" with the highest degree of matching (S205), and outputs it as the recognition result (S206).
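
A minimal sketch of this conventional flow (S201-S206), reusing the hypothetical `Pattern`/`match_score` helpers introduced above, might look as follows. The `groups` argument, a list of (representative pattern, subordinate category patterns) pairs mirroring FIG. 5, is likewise an assumption.

```python
def recognize_conventional(feature, groups, top_n=3):
    """Conventional collator 62 (S201-S206): score every representative
    pattern, sort, keep the top_n groups, then match their category patterns.
    `groups` is an assumed list of (representative Pattern, [category
    Patterns]) pairs mirroring FIG. 5."""
    # S201: match the input feature against all representative patterns
    scored = [(match_score(feature, rep), rep, cats) for rep, cats in groups]
    # S202: the sorting step the patent identifies as the response bottleneck
    scored.sort(key=lambda item: item[0], reverse=True)
    # S203: select the top_n best-matching representative patterns
    candidates = scored[:top_n]
    # S204-S205: match the subordinate category patterns and keep the best
    best_cat, best_score = None, float("-inf")
    for _, _, cats in candidates:
        for cat in cats:
            score = match_score(feature, cat)
            if score > best_score:
                best_cat, best_score = cat, score
    # S206: output the best-matching category as the recognition result
    return best_cat.name if best_cat is not None else None
```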

[0006]

[Problems to Be Solved by the Invention] As described above, when the three best-matching representative patterns D1, D3 and D5 are selected from the results of matching the input speech feature quantity against all representative patterns D1 to D7 in the vocabulary dictionary 63, the sorting process shown in S202 of FIG. 7 has conventionally been performed as a preliminary step. Consequently, when the vocabulary dictionary 63 contains many representative patterns, the sorting takes a long time and the response from speech input to recognition-result output deteriorates markedly. Increasing the number of categories subordinate to each representative pattern reduces the number of representative patterns and so relieves the response problem, but it degrades recognition performance instead. Conversely, improving recognition performance requires increasing the number of category patterns, which makes the above problem even more pronounced.

[0007] An object of the present invention is therefore to provide a speech recognition method that improves the response from speech input to recognition-result output without degrading recognition performance and that allows the number of categories to be increased without degrading that response, and to provide a speech recognition apparatus that realizes this method.

[0008]

[Means for Solving the Problems] The speech recognition method provided by the present invention is a method in which a plurality of categories are grouped by pattern similarity, representative patterns characterizing the respective groups are stored in a vocabulary dictionary, and, when an input speech feature quantity is matched against the plurality of categories, the input speech feature quantity is first matched against all representative patterns stored in the vocabulary dictionary to narrow the candidates down to a predetermined number of well-matching representative patterns, after which the categories belonging to the groups of these representative patterns are matched against the input speech feature quantity and the best-matching category is output as the recognition result. The method comprises: a first step of mutually matching the feature quantities of all other representative patterns against the feature quantity of each representative pattern stored in the vocabulary dictionary, and storing in a memory the index information of every other representative pattern whose degree of matching with a given representative pattern exceeds a predetermined value, in association with the index information of that representative pattern; a second step of identifying the representative pattern whose feature quantity best matches the input speech feature quantity, together with its index information; and a third step of extracting from the memory, using the index information of the identified representative pattern as search-key information, the index information of the associated other representative patterns, and matching the categories in the vocabulary dictionary corresponding to the extracted index information against the input speech feature quantity. The index information of a representative pattern may be, for example, the address of the representative pattern in the vocabulary dictionary or the feature-quantity data of the representative pattern; any kind of index information may be used as long as the representative pattern can be uniquely derived from it.

[0009] The first step may comprise, for example, a step of deriving the degree of matching with each of the other representative patterns when one representative pattern is treated as a pseudo speech feature quantity, and a step of sorting the index information of the other representative patterns whose derived degree of matching exceeds a predetermined value in descending order of matching and storing it in the memory together with the index information of that representative pattern. The third step may comprise a step of extracting from the memory the index information of the associated other representative patterns in descending order of their degree of matching with the identified representative pattern, and a step of sequentially matching the categories in the vocabulary dictionary corresponding to the extracted index information against the input speech feature quantity.

[0010] The speech recognition apparatus provided by the present invention comprises: input-speech feature extraction means for extracting the feature quantity of the input speech to be recognized; a vocabulary dictionary in which a plurality of categories are grouped by pattern similarity and representative patterns characterizing the respective groups are stored; a collator that matches the extracted input speech feature quantity against the representative patterns and categories in the vocabulary dictionary in stages and outputs the best-matching category as the recognition result; and a memory in which, based on the mutual matching results of the feature quantities of all other representative patterns against the feature quantity of each representative pattern stored in the vocabulary dictionary, the index information of the other representative patterns whose degree of matching with a given representative pattern exceeds a predetermined value is stored in order of matching in association with the index information of that representative pattern. In this configuration, the collator comprises: first means for matching the input speech feature quantity against the feature quantities of the representative patterns stored in the vocabulary dictionary and identifying the best-matching representative pattern and its index information; second means for extracting from the memory, using the index information of the identified representative pattern as search-key information, the index information of the associated other representative patterns; and third means for matching the categories in the vocabulary dictionary corresponding to the extracted index information against the input speech feature quantity.

[0011]

[Operation] In the present invention, the feature quantities of all other representative patterns are matched in advance against the feature quantity of each representative pattern stored in the vocabulary dictionary, and the index information of the other representative patterns whose degree of matching with a given representative pattern exceeds a predetermined value is stored in a memory in association with the index information of that representative pattern. Preferably, the degree of matching with each of the other representative patterns is derived with one representative pattern treated as a pseudo speech feature quantity, and the index information of the other representative patterns whose derived degree of matching exceeds the predetermined value is sorted in descending order of matching and stored together with the index information of that representative pattern. This has the advantage that a result equivalent to that of the conventional sorting process is obtained in a very short time.

[0012] At matching time, the feature quantity of the input speech to be recognized is extracted by the input-speech feature extraction means, and the first means of the collator matches this input speech feature quantity against the vocabulary dictionary to identify the single best-matching representative pattern and its index information. The second means of the collator then searches the memory using the identified representative pattern as search-key information and extracts the index information of the associated other representative patterns; when, as described above, the associated representative patterns are stored in order of matching, the index information is extracted in that order. The third means of the collator matches the categories subordinate to the groups of the representative patterns in the vocabulary dictionary that are uniquely derived from the extracted index information against the input speech feature quantity. As a result, the single category with the highest degree of matching is output as the recognition result.

[0013]

[Embodiments] Embodiments of the present invention will now be described in detail with reference to the drawings. FIG. 1 shows the configuration of the main part of a speech recognition apparatus according to one embodiment of the present invention; elements having the same functions as those of the conventional apparatus of FIG. 6 are given the same reference numerals. For convenience, the vocabulary dictionary is assumed to have the contents shown in FIG. 5. In this embodiment, a representative-pattern mutual-rank table 10 is provided in memory, and the configuration of the collator 11 is partly modified accordingly.

[0014] The representative-pattern mutual-rank table 10 is created in advance, before speech recognition processing, by the following procedure.

[0015] (1) Each representative pattern in turn, for example the representative pattern D1 of the first group, is treated as a pseudo speech feature quantity, and the degree of matching between this pseudo speech feature quantity and every other representative pattern D2 to D7 is computed.
(2) Next, the patterns whose computed degree of matching exceeds a predetermined value are selected. This predetermined value may be adjusted freely according to the kind of vocabulary to be recognized or the intended application; this selection step may also be omitted in some cases.
(3) The computation or selection results are then sorted in descending order of matching, and the sorted result is stored in the reference field of the index of representative pattern D1 as the representative patterns whose subordinate patterns should be matched when representative pattern D1 has the highest degree of matching with the actual input speech feature quantity.
(4) The above processing is repeated for the other representative patterns D2 to D7.
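
Under the same assumptions as the earlier sketches, the table-building procedure (1)-(4) could be sketched as below. Treating a representative pattern's mean vector as its pseudo speech feature quantity is one plausible reading of step (1), not something the patent fixes.

```python
def build_mutual_rank_table(groups, threshold, top_p=None):
    """Precompute the representative-pattern mutual-rank table 10 following
    steps (1)-(4). Returns {representative name: [associated representative
    names, best match first]}. Using a pattern's mean vector as its pseudo
    speech feature quantity is an assumed reading, not fixed by the patent."""
    reps = [rep for rep, _ in groups]
    table = {}
    for rep in reps:                      # (4) repeat for every representative
        scored = []
        for other in reps:
            if other is rep:
                continue
            # (1) treat `rep` itself as a pseudo speech feature quantity
            score = match_score(rep.mean, other)
            # (2) keep only patterns exceeding the predetermined value
            if score > threshold:
                scored.append((score, other.name))
        # (3) sort in descending order of matching and store under rep's index
        scored.sort(reverse=True)
        names = [name for _, name in scored]
        table[rep.name] = names[:top_p] if top_p is not None else names
    return table
```

In this form the table only needs to be rebuilt, or partially updated, when representative patterns are added, which is consistent with the maintenance property described later in paragraph [0023].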

[0016] FIG. 2 shows the contents of the representative-pattern mutual-rank table 10 created in this way. In FIG. 2, reference numeral 20 denotes the field that stores the index information of the representative pattern having the highest degree of matching with the input speech feature quantity, and 21 denotes the field associated with field 20, provided to store, in order of matching, the index information of the top P (P being a natural number) representative patterns whose subordinate patterns should be matched.

[0017] The top row of the illustrated example shows that there are two other representative patterns whose degree of matching with the representative pattern D1 of the first group exceeds the predetermined value, and that of these, the representative pattern D3 of the third group matched better than the representative pattern D5 of the fifth group. The next row shows that when the representative pattern D2 of the second group is taken as the pseudo speech feature quantity F, only one representative pattern (D3) exceeds the predetermined degree of matching. In other words, according to the representative-pattern mutual-rank table 10, once one representative pattern has been identified, the other representative patterns that match it can be determined immediately. In this figure, the identification labels D1 to D7 of the representative patterns shown in FIG. 5 are used as the index information for convenience, but as noted above, the index information is not limited to these labels.
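
Read that way, the rows of FIG. 2 described in this paragraph would correspond to a mapping along the lines of the following, with the labels D1-D7 standing in for the index information:

```python
# Rows of table 10 implied by the FIG. 2 description: the key plays the role
# of field 20 (index of the best-matching representative pattern) and the
# value plays the role of field 21 (associated patterns, best match first).
mutual_rank_table = {
    "D1": ["D3", "D5"],  # D3 matched D1 more closely than D5 did
    "D2": ["D3"],        # only D3 exceeded the threshold for D2
    # ... one row for each remaining representative pattern D3-D7
}
```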

[0018] Next, the recognition processing performed by the speech recognition apparatus of this embodiment, configured as described above, when the input speech is "Aichi" will be described with reference also to FIG. 3, in which S denotes each processing step.

[0019] The speech "Aichi" captured by the speech input device 60 shown in FIG. 1 is digitized by the feature extractor 61 and subjected to the prescribed analysis to extract its speech feature quantity; the processing up to this point is the same as in the conventional apparatus. As shown in FIG. 3, the collator 11 of this embodiment matches the speech feature quantity of "Aichi" against all representative patterns D1 to D7 in the vocabulary dictionary 63 (S101) and selects the representative pattern D1 that best matches it (S102: first means). This processing can easily be realized with known pattern recognition techniques.

[0020] The collator then searches the representative-pattern mutual-rank table 10 using the representative pattern D1 as search-key information and selects the representative patterns D3 and D5 stored in association with the index-information field 20 of D1 (S103: second means). It then matches the category patterns subordinate to the representative patterns D1, D3 and D5, namely "Aichi" to "Adachi", "Amagasa" to "Amami-Oshima", and "Akita" to "Akiho", against the speech feature quantity of the input "Aichi" (S104: third means), extracts the category pattern "Aichi" with the highest degree of matching (S105), and outputs it as the recognition result (S106).
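
For comparison with the conventional sketch given earlier, the table-lookup flow of this embodiment (S101-S106) might be written as follows, again using the assumed helpers and table format from the earlier sketches.

```python
def recognize_with_table(feature, groups, table):
    """Collator 11 of the embodiment (S101-S106): a single arg-max over the
    representative patterns plus a table lookup replaces the sorting step.
    Reuses the assumed `groups`, `match_score` and table format above."""
    by_name = {rep.name: cats for rep, cats in groups}
    # S101-S102 (first means): find the single best representative pattern
    best_rep = max((rep for rep, _ in groups),
                   key=lambda rep: match_score(feature, rep))
    # S103 (second means): look up its associated patterns in table 10
    candidate_names = [best_rep.name] + table.get(best_rep.name, [])
    # S104-S105 (third means): match the subordinate category patterns
    best_cat, best_score = None, float("-inf")
    for name in candidate_names:
        for cat in by_name[name]:
            score = match_score(feature, cat)
            if score > best_score:
                best_cat, best_score = cat, score
    # S106: output the best-matching category as the recognition result
    return best_cat.name if best_cat is not None else None
```

The key difference from the conventional sketch is that the sort over all representative patterns is replaced by a single maximum and a constant-time table lookup, which is where the response improvement claimed by the patent comes from.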

[0021] FIG. 4 plots, for a word recognition experiment, the measured degree of agreement between the top P representative patterns selected by this embodiment and the top P' representative patterns selected by the conventional sorting process. The dictionary used here contains m = 2,000 vocabulary patterns and Q = 64 representative patterns. In the figure, the vertical axis shows the degree of agreement [%], and the horizontal axis shows the number k of vocabulary patterns to be matched after the representative patterns have been selected.

[0022] As shown in FIG. 4, when the top P representative patterns are selected so that the number k of vocabulary patterns to be matched is roughly 600, the agreement between the top representative patterns chosen by the conventional method and by the method of this embodiment is about 75%. The recognition performance at this point is 89% for the speech recognition apparatuses of both methods, with no difference at all.

[0023] In this way, the speech recognition apparatus of this embodiment selects the top P representative patterns with the highest degree of matching with the input speech feature quantity by searching the representative-pattern mutual-rank table 10 instead of performing the conventional sorting process, so the amount of computation required for matching can be reduced without degrading recognition performance. Moreover, because the representative-pattern mutual-rank table 10 can be used as an index into the vocabulary dictionary 63 as described above, the response of the recognition processing itself does not change even when a large number of vocabularies and representative patterns are stored in the vocabulary dictionary 63, and a substantial improvement in recognition performance can also be expected. Furthermore, even when vocabularies and representative patterns are later added to the vocabulary dictionary 63, only a partial change to the representative-pattern mutual-rank table 10 is required, so the scheme is versatile and advantageous in terms of operating cost.

[0024]

[Effects of the Invention] As is clear from the above description, according to the speech recognition method of the present invention, after the single representative pattern that best matches the input speech feature quantity has been identified, the other representative patterns to which the search is narrowed are extracted from the memory using the identified representative pattern as search-key information, so the response from speech input to recognition-result output can be improved without degrading recognition performance. The number of category patterns can also be increased without lowering that response, which in turn contributes to improved recognition performance.

[0025] Furthermore, according to the speech recognition apparatus of the present invention, the information to be stored in the memory can be changed freely according to the number of category patterns and representative patterns stored in the vocabulary dictionary, without changing the configuration of the collator. This not only improves the response from speech input to recognition-result output and the recognition performance, but also allows the number of category patterns to be changed freely according to the application, realizing a more versatile apparatus configuration.

[Brief Description of the Drawings]

[FIG. 1] A configuration diagram of the main part of a speech recognition apparatus according to one embodiment of the present invention.

[FIG. 2] An explanatory diagram of the contents of the representative-pattern mutual-rank table provided in the speech recognition apparatus of this embodiment.

[FIG. 3] An outline flow diagram of the speech recognition processing according to this embodiment.

[FIG. 4] A plot of measured values showing the degree of agreement between the representative-pattern selection results of this embodiment and those of the conventional sorting process in a word recognition experiment.

[FIG. 5] An explanatory diagram showing a configuration example of the vocabulary dictionary used in this type of speech recognition apparatus.

[FIG. 6] A configuration diagram of the main part of a conventional speech recognition apparatus.

[FIG. 7] An outline flow diagram of conventional speech recognition processing.

[Explanation of Symbols]

10 Representative-pattern mutual-rank table
11, 62 Collators
60 Speech input device such as a microphone
61 Feature extractor that extracts the feature quantity of the input speech
63 Vocabulary dictionary storing a plurality of representative patterns

Claims (3)

[Claims]

[Claim 1] A speech recognition method in which a plurality of vocabularies are grouped by pattern similarity, representative patterns characterizing the respective groups are stored in a vocabulary dictionary, and, when an input speech feature quantity is matched against the plurality of vocabularies, the input speech feature quantity is first matched against all representative patterns stored in the vocabulary dictionary to narrow the candidates down to a predetermined number of well-matching representative patterns, after which the vocabularies belonging to the groups of these representative patterns are matched against the input speech feature quantity and the best-matching vocabulary is output as the recognition result, the method comprising: a first step of mutually matching the feature quantities of all other representative patterns against the feature quantity of each representative pattern stored in the vocabulary dictionary, and storing in a memory the index information of every other representative pattern whose degree of matching with a given representative pattern exceeds a predetermined value, in association with the index information of that representative pattern; a second step of identifying the representative pattern whose feature quantity best matches the input speech feature quantity, together with its index information; and a third step of extracting from the memory, using the index information of the identified representative pattern as search-key information, the index information of the associated other representative patterns, and matching the vocabularies in the vocabulary dictionary corresponding to the extracted index information against the input speech feature quantity.
[Claim 2] The speech recognition method according to claim 1, wherein the first step comprises: a step of deriving the degree of matching with each of the other representative patterns when one representative pattern is treated as a pseudo speech feature quantity; and a step of sorting the index information of the other representative patterns whose derived degree of matching exceeds a predetermined value in descending order of matching and storing it in the memory together with the index information of that representative pattern; and wherein the third step comprises: a step of extracting from the memory the index information of the associated other representative patterns in descending order of their degree of matching with the identified representative pattern; and a step of sequentially matching the vocabularies in the vocabulary dictionary corresponding to the extracted index information against the input speech feature quantity.
[Claim 3] A speech recognition apparatus comprising: input-speech feature extraction means for extracting the feature quantity of the input speech to be recognized; a vocabulary dictionary in which a plurality of vocabularies are grouped by pattern similarity and representative patterns characterizing the respective groups are stored; and a collator that matches the extracted input speech feature quantity against the representative patterns and vocabularies in the vocabulary dictionary in stages and outputs the best-matching vocabulary as the recognition result; the apparatus further comprising a memory in which, based on the mutual matching results of the feature quantities of all other representative patterns against the feature quantity of each representative pattern stored in the vocabulary dictionary, the index information of the other representative patterns whose degree of matching with a given representative pattern exceeds a predetermined value is stored in order of matching in association with the index information of that representative pattern; wherein the collator comprises: first means for matching the input speech feature quantity against the feature quantities of the representative patterns stored in the vocabulary dictionary and identifying the best-matching representative pattern and its index information; second means for extracting from the memory, using the index information of the identified representative pattern as search-key information, the index information of the associated other representative patterns; and third means for matching the vocabularies in the vocabulary dictionary corresponding to the extracted index information against the input speech feature quantity.
JP6286850A 1994-11-21 1994-11-21 Method and device for speech recognition Pending JPH08146988A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP6286850A JPH08146988A (en) 1994-11-21 1994-11-21 Method and device for speech recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP6286850A JPH08146988A (en) 1994-11-21 1994-11-21 Method and device for speech recognition

Publications (1)

Publication Number Publication Date
JPH08146988A true JPH08146988A (en) 1996-06-07

Family

ID=17709845

Family Applications (1)

Application Number Title Priority Date Filing Date
JP6286850A Pending JPH08146988A (en) 1994-11-21 1994-11-21 Method and device for speech recognition

Country Status (1)

Country Link
JP (1) JPH08146988A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06290272A (en) * 1993-04-02 1994-10-18 Sharp Corp High-speed matching system
JPH08123460A (en) * 1994-10-26 1996-05-17 Sony Corp Searching method and speech recognition device
JP2522154B2 (en) * 1993-06-03 1996-08-07 日本電気株式会社 Voice recognition system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06290272A (en) * 1993-04-02 1994-10-18 Sharp Corp High-speed matching system
JP2522154B2 (en) * 1993-06-03 1996-08-07 日本電気株式会社 Voice recognition system
JPH08123460A (en) * 1994-10-26 1996-05-17 Sony Corp Searching method and speech recognition device

Similar Documents

Publication Publication Date Title
US5329609A (en) Recognition apparatus with function of displaying plural recognition candidates
US5774588A (en) Method and system for comparing strings with entries of a lexicon
US7003519B1 (en) Method of thematic classification of documents, themetic classification module, and search engine incorporating such a module
JP5111607B2 (en) Computer-implemented method and apparatus for interacting with a user via a voice-based user interface
JPH01167896A (en) Voice input device
EP2548202A1 (en) Methods and apparatus for extracting alternate media titles to facilitate speech recognition
EP2631815A1 (en) Method and device for ordering search results, method and device for providing information
US5640488A (en) System and method for constructing clustered dictionary for speech and text recognition
WO2008062822A1 (en) Text mining device, text mining method and text mining program
JPH08146988A (en) Method and device for speech recognition
JPH06325092A (en) Customer information retrieval system
JPH07210569A (en) Method and device for retrieving information
JP3678360B2 (en) Kanji character string specifying apparatus and method using voice input
KR930000593B1 (en) Voice information service system and method utilizing approximate matching
JPS59117673A (en) Postprocessing system of character recognizing device
JP2007072961A (en) Database retrieval method, program, and system
JPS60233782A (en) Address reader
JPS62191924A (en) Information registration and retrieval device
JPH03266898A (en) Voice recognition processing system for large-vocabulary
CN115905831A (en) Feature selection method based on improved Boruta algorithm
JPH05181719A (en) Variable length data storage and reference system
CN113589957A (en) Method and system for rapidly inputting professional words of laws and regulations
JPH05225248A (en) Data base retrieval system
JPH08227427A (en) Character recognition device
CN117874361A (en) Data pushing method and system