JPH08146988A - Method and device for speech recognition

Method and device for speech recognition

Info

Publication number
JPH08146988A
JPH08146988A
Authority
JP
Japan
Prior art keywords
representative
representative pattern
pattern
patterns
index information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP6286850A
Other languages
Japanese (ja)
Inventor
Toshihiro Isobe
俊洋 磯部
Noriya Murakami
憲也 村上
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
N T T DATA TSUSHIN KK
NTT Data Corp
Original Assignee
N T T DATA TSUSHIN KK
NTT Data Communications Systems Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by N T T DATA TSUSHIN KK and NTT Data Communications Systems Corp
Priority to JP6286850A
Publication of JPH08146988A
Legal status: Pending

Abstract

PURPOSE: To improve the response from speech input to recognition-result output without degrading recognition performance, and to allow the number of categories to be increased without degrading that response, by using an identified representative pattern as search-key information and extracting from a memory the other representative patterns to which the search is narrowed.

CONSTITUTION: Input speech captured by a speech input device 60 is digitized by a feature extractor 61 and subjected to the prescribed analysis to extract its speech feature quantity. A collator 11 matches the speech feature quantity against all representative patterns in a vocabulary dictionary 63 and selects the representative pattern that best matches it. Using that representative pattern as search-key information, a representative-pattern mutual-rank table 10 is then searched, and the representative patterns stored in association with the index-information field of the selected pattern are retrieved. The speech feature quantity of the input speech is then matched against the category patterns subordinate to each of these representative patterns, and the category pattern with the highest degree of matching is extracted and output as the recognition result.

Description

[Detailed Description of the Invention]

[0001]

[Field of Industrial Application] The present invention relates to speech recognition technology, and more particularly to a technique for narrowing down the representative patterns in a vocabulary dictionary that are used for matching against an input speech feature quantity.

[0002]

[Prior Art] FIG. 6 shows the configuration of the main part of a conventional, general-purpose speech recognition apparatus. In this type of apparatus, speech captured by a speech input device 60 such as a microphone is digitized by, for example, a feature extractor 61 and subjected to the prescribed analysis to extract its speech feature quantity. A collator 62 matches the speech feature quantity supplied by the feature extractor 61 against the vocabulary (category) patterns stored in a vocabulary dictionary 63, and outputs the category that best matches the speech feature quantity of the input speech as the recognition result.

[0003] Because a speech recognition apparatus of this kind matches the input speech feature quantity against every category pattern in the vocabulary dictionary 63, the matching process takes a great deal of time when the dictionary holds many category patterns. For this reason, when the vocabulary dictionary 63 is built, all categories are usually grouped in advance by pattern similarity and a representative pattern characterizing each group is defined. At matching time, the input speech feature quantity is first matched against all representative patterns and the candidates are narrowed down to a predetermined number of representative patterns according to their degree of matching with the input speech feature quantity; the vocabularies belonging to the groups of the better-matching representative patterns are then matched in turn against the input speech feature quantity, and the best-matching vocabulary is output as the recognition result.

[0004] Consider, for example, the case in which all category patterns stored in the vocabulary dictionary 63 are grouped into seven similar-pattern groups as shown in FIG. 5, with representative patterns D1 to D7 defined for the respective groups. In this example, the first pattern, "Aichi", holds as its parameters the mean and variance of the feature vectors of multiple training utterances of "Aichi", and the representative pattern D1 holds as its parameters the group mean and variance of the parameters of the first pattern "Aichi" through the fourth pattern "Adachi". The relationship between each pattern from the fifth pattern onward and the representative patterns D2 to D7 of the respective groups is the same.
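
As an illustration of how such mean/variance parameters could enter a degree-of-matching computation, the sketch below models a pattern in Python. The `Pattern` class, the `match_score` name and the diagonal-Gaussian log-likelihood are assumptions for illustration only; the patent does not specify the actual matching measure.

```python
import numpy as np

class Pattern:
    """A category or representative pattern parameterised by the mean and
    variance of its training feature vectors (cf. D1-D7 in FIG. 5)."""

    def __init__(self, name, mean, var):
        self.name = name
        self.mean = np.asarray(mean, dtype=float)
        self.var = np.asarray(var, dtype=float)

def match_score(feature, pattern):
    """Degree of matching between a feature vector and a pattern (higher is
    better). The patent leaves the measure open; a diagonal-Gaussian
    log-likelihood is used here only as an illustrative stand-in."""
    feature = np.asarray(feature, dtype=float)
    diff = feature - pattern.mean
    return float(-0.5 * np.sum(diff ** 2 / pattern.var + np.log(pattern.var)))
```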

[0005] The processing procedure of the collator 62 when the input speech is "Aichi" will now be described with reference to FIG. 7, in which S denotes each processing step. The collator 62 first matches the speech feature quantity of "Aichi" against all representative patterns D1 to D7 in the vocabulary dictionary 63 (S201), sorts them according to their degree of matching (S202), and selects the three best-matching representative patterns D1, D3 and D5 (S203). It then matches the category patterns subordinate to the representative patterns D1, D3 and D5, namely "Aichi" to "Adachi", "Amagasa" to "Amami-Oshima", and "Akita" to "Akiho", against the speech feature quantity of the input "Aichi" (S204), extracts the category pattern "Aichi" with the highest degree of matching (S205), and outputs it as the recognition result (S206).
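
A minimal sketch of this conventional flow (S201-S206), reusing the hypothetical `Pattern`/`match_score` helpers introduced above, might look as follows. The `groups` argument, a list of (representative pattern, subordinate category patterns) pairs mirroring FIG. 5, is likewise an assumption.

```python
def recognize_conventional(feature, groups, top_n=3):
    """Conventional collator 62 (S201-S206): score every representative
    pattern, sort, keep the top_n groups, then match their category patterns.
    `groups` is an assumed list of (representative Pattern, [category
    Patterns]) pairs mirroring FIG. 5."""
    # S201: match the input feature against all representative patterns
    scored = [(match_score(feature, rep), rep, cats) for rep, cats in groups]
    # S202: the sorting step the patent identifies as the response bottleneck
    scored.sort(key=lambda item: item[0], reverse=True)
    # S203: select the top_n best-matching representative patterns
    candidates = scored[:top_n]
    # S204-S205: match the subordinate category patterns and keep the best
    best_cat, best_score = None, float("-inf")
    for _, _, cats in candidates:
        for cat in cats:
            score = match_score(feature, cat)
            if score > best_score:
                best_cat, best_score = cat, score
    # S206: output the best-matching category as the recognition result
    return best_cat.name if best_cat is not None else None
```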

[0006]

[Problems to Be Solved by the Invention] As described above, when the three best-matching representative patterns D1, D3 and D5 are selected from the results of matching the input speech feature quantity against all representative patterns D1 to D7 in the vocabulary dictionary 63, the sorting process shown in S202 of FIG. 7 has conventionally been performed as a preliminary step. Consequently, when the vocabulary dictionary 63 contains many representative patterns, the sorting takes a long time and the response from speech input to recognition-result output deteriorates markedly. Increasing the number of categories subordinate to each representative pattern reduces the number of representative patterns and so relieves the response problem, but it degrades recognition performance instead. Conversely, improving recognition performance requires increasing the number of category patterns, which makes the above problem even more pronounced.

[0007] An object of the present invention is therefore to provide a speech recognition method that improves the response from speech input to recognition-result output without degrading recognition performance and that allows the number of categories to be increased without degrading that response, and to provide a speech recognition apparatus that realizes this method.

[0008]

[Means for Solving the Problems] The speech recognition method provided by the present invention is a method in which a plurality of categories are grouped by pattern similarity, representative patterns characterizing the respective groups are stored in a vocabulary dictionary, and, when an input speech feature quantity is matched against the plurality of categories, the input speech feature quantity is first matched against all representative patterns stored in the vocabulary dictionary to narrow the candidates down to a predetermined number of well-matching representative patterns, after which the categories belonging to the groups of these representative patterns are matched against the input speech feature quantity and the best-matching category is output as the recognition result. The method comprises: a first step of mutually matching the feature quantities of all other representative patterns against the feature quantity of each representative pattern stored in the vocabulary dictionary, and storing in a memory the index information of every other representative pattern whose degree of matching with a given representative pattern exceeds a predetermined value, in association with the index information of that representative pattern; a second step of identifying the representative pattern whose feature quantity best matches the input speech feature quantity, together with its index information; and a third step of extracting from the memory, using the index information of the identified representative pattern as search-key information, the index information of the associated other representative patterns, and matching the categories in the vocabulary dictionary corresponding to the extracted index information against the input speech feature quantity. The index information of a representative pattern may be, for example, the address of the representative pattern in the vocabulary dictionary or the feature-quantity data of the representative pattern; any kind of index information may be used as long as the representative pattern can be uniquely derived from it.

[0009] The first step may comprise, for example, a step of deriving the degree of matching with each of the other representative patterns when one representative pattern is treated as a pseudo speech feature quantity, and a step of sorting the index information of the other representative patterns whose derived degree of matching exceeds a predetermined value in descending order of matching and storing it in the memory together with the index information of that representative pattern. The third step may comprise a step of extracting from the memory the index information of the associated other representative patterns in descending order of their degree of matching with the identified representative pattern, and a step of sequentially matching the categories in the vocabulary dictionary corresponding to the extracted index information against the input speech feature quantity.

[0010] The speech recognition apparatus provided by the present invention comprises: input-speech feature extraction means for extracting the feature quantity of the input speech to be recognized; a vocabulary dictionary in which a plurality of categories are grouped by pattern similarity and representative patterns characterizing the respective groups are stored; a collator that matches the extracted input speech feature quantity against the representative patterns and categories in the vocabulary dictionary in stages and outputs the best-matching category as the recognition result; and a memory in which, based on the mutual matching results of the feature quantities of all other representative patterns against the feature quantity of each representative pattern stored in the vocabulary dictionary, the index information of the other representative patterns whose degree of matching with a given representative pattern exceeds a predetermined value is stored in order of matching in association with the index information of that representative pattern. In this configuration, the collator comprises: first means for matching the input speech feature quantity against the feature quantities of the representative patterns stored in the vocabulary dictionary and identifying the best-matching representative pattern and its index information; second means for extracting from the memory, using the index information of the identified representative pattern as search-key information, the index information of the associated other representative patterns; and third means for matching the categories in the vocabulary dictionary corresponding to the extracted index information against the input speech feature quantity.

[0011]

[Operation] In the present invention, the feature quantities of all other representative patterns are matched in advance against the feature quantity of each representative pattern stored in the vocabulary dictionary, and the index information of the other representative patterns whose degree of matching with a given representative pattern exceeds a predetermined value is stored in a memory in association with the index information of that representative pattern. Preferably, the degree of matching with each of the other representative patterns is derived with one representative pattern treated as a pseudo speech feature quantity, and the index information of the other representative patterns whose derived degree of matching exceeds the predetermined value is sorted in descending order of matching and stored together with the index information of that representative pattern. This has the advantage that a result equivalent to that of the conventional sorting process is obtained in a very short time.

[0012] At matching time, the feature quantity of the input speech to be recognized is extracted by the input-speech feature extraction means, and the first means of the collator matches this input speech feature quantity against the vocabulary dictionary to identify the single best-matching representative pattern and its index information. The second means of the collator then searches the memory using the identified representative pattern as search-key information and extracts the index information of the associated other representative patterns; when, as described above, the associated representative patterns are stored in order of matching, the index information is extracted in that order. The third means of the collator matches the categories subordinate to the groups of the representative patterns in the vocabulary dictionary that are uniquely derived from the extracted index information against the input speech feature quantity. As a result, the single category with the highest degree of matching is output as the recognition result.

[0013]

[Embodiments] Embodiments of the present invention will now be described in detail with reference to the drawings. FIG. 1 shows the configuration of the main part of a speech recognition apparatus according to one embodiment of the present invention; elements having the same functions as those of the conventional apparatus of FIG. 6 are given the same reference numerals. For convenience, the vocabulary dictionary is assumed to have the contents shown in FIG. 5. In this embodiment, a representative-pattern mutual-rank table 10 is provided in memory, and the configuration of the collator 11 is partly modified accordingly.

[0014] The representative-pattern mutual-rank table 10 is created in advance, before speech recognition processing, by the following procedure.

[0015] (1) Each representative pattern in turn, for example the representative pattern D1 of the first group, is treated as a pseudo speech feature quantity, and the degree of matching between this pseudo speech feature quantity and every other representative pattern D2 to D7 is computed.
(2) Next, the patterns whose computed degree of matching exceeds a predetermined value are selected. This predetermined value may be adjusted freely according to the kind of vocabulary to be recognized or the intended application; this selection step may also be omitted in some cases.
(3) The computation or selection results are then sorted in descending order of matching, and the sorted result is stored in the reference field of the index of representative pattern D1 as the representative patterns whose subordinate patterns should be matched when representative pattern D1 has the highest degree of matching with the actual input speech feature quantity.
(4) The above processing is repeated for the other representative patterns D2 to D7.
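
Under the same assumptions as the earlier sketches, the table-building procedure (1)-(4) could be sketched as below. Treating a representative pattern's mean vector as its pseudo speech feature quantity is one plausible reading of step (1), not something the patent fixes.

```python
def build_mutual_rank_table(groups, threshold, top_p=None):
    """Precompute the representative-pattern mutual-rank table 10 following
    steps (1)-(4). Returns {representative name: [associated representative
    names, best match first]}. Using a pattern's mean vector as its pseudo
    speech feature quantity is an assumed reading, not fixed by the patent."""
    reps = [rep for rep, _ in groups]
    table = {}
    for rep in reps:                      # (4) repeat for every representative
        scored = []
        for other in reps:
            if other is rep:
                continue
            # (1) treat `rep` itself as a pseudo speech feature quantity
            score = match_score(rep.mean, other)
            # (2) keep only patterns exceeding the predetermined value
            if score > threshold:
                scored.append((score, other.name))
        # (3) sort in descending order of matching and store under rep's index
        scored.sort(reverse=True)
        names = [name for _, name in scored]
        table[rep.name] = names[:top_p] if top_p is not None else names
    return table
```

In this form the table only needs to be rebuilt, or partially updated, when representative patterns are added, which is consistent with the maintenance property described later in paragraph [0023].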

[0016] FIG. 2 shows the contents of the representative-pattern mutual-rank table 10 created in this way. In FIG. 2, reference numeral 20 denotes the field that stores the index information of the representative pattern having the highest degree of matching with the input speech feature quantity, and 21 denotes the field associated with field 20, provided to store, in order of matching, the index information of the top P (P being a natural number) representative patterns whose subordinate patterns should be matched.

[0017] The top row of the illustrated example shows that there are two other representative patterns whose degree of matching with the representative pattern D1 of the first group exceeds the predetermined value, and that of these, the representative pattern D3 of the third group matched better than the representative pattern D5 of the fifth group. The next row shows that when the representative pattern D2 of the second group is taken as the pseudo speech feature quantity F, only one representative pattern (D3) exceeds the predetermined degree of matching. In other words, according to the representative-pattern mutual-rank table 10, once one representative pattern has been identified, the other representative patterns that match it can be determined immediately. In this figure, the identification labels D1 to D7 of the representative patterns shown in FIG. 5 are used as the index information for convenience, but as noted above, the index information is not limited to these labels.
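
Read that way, the rows of FIG. 2 described in this paragraph would correspond to a mapping along the lines of the following, with the labels D1-D7 standing in for the index information:

```python
# Rows of table 10 implied by the FIG. 2 description: the key plays the role
# of field 20 (index of the best-matching representative pattern) and the
# value plays the role of field 21 (associated patterns, best match first).
mutual_rank_table = {
    "D1": ["D3", "D5"],  # D3 matched D1 more closely than D5 did
    "D2": ["D3"],        # only D3 exceeded the threshold for D2
    # ... one row for each remaining representative pattern D3-D7
}
```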

[0018] Next, the recognition processing performed by the speech recognition apparatus of this embodiment, configured as described above, when the input speech is "Aichi" will be described with reference also to FIG. 3, in which S denotes each processing step.

[0019] The speech "Aichi" captured by the speech input device 60 shown in FIG. 1 is digitized by the feature extractor 61 and subjected to the prescribed analysis to extract its speech feature quantity; the processing up to this point is the same as in the conventional apparatus. As shown in FIG. 3, the collator 11 of this embodiment matches the speech feature quantity of "Aichi" against all representative patterns D1 to D7 in the vocabulary dictionary 63 (S101) and selects the representative pattern D1 that best matches it (S102: first means). This processing can easily be realized with known pattern recognition techniques.

[0020] The collator then searches the representative-pattern mutual-rank table 10 using the representative pattern D1 as search-key information and selects the representative patterns D3 and D5 stored in association with the index-information field 20 of D1 (S103: second means). It then matches the category patterns subordinate to the representative patterns D1, D3 and D5, namely "Aichi" to "Adachi", "Amagasa" to "Amami-Oshima", and "Akita" to "Akiho", against the speech feature quantity of the input "Aichi" (S104: third means), extracts the category pattern "Aichi" with the highest degree of matching (S105), and outputs it as the recognition result (S106).
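
For comparison with the conventional sketch given earlier, the table-lookup flow of this embodiment (S101-S106) might be written as follows, again using the assumed helpers and table format from the earlier sketches.

```python
def recognize_with_table(feature, groups, table):
    """Collator 11 of the embodiment (S101-S106): a single arg-max over the
    representative patterns plus a table lookup replaces the sorting step.
    Reuses the assumed `groups`, `match_score` and table format above."""
    by_name = {rep.name: cats for rep, cats in groups}
    # S101-S102 (first means): find the single best representative pattern
    best_rep = max((rep for rep, _ in groups),
                   key=lambda rep: match_score(feature, rep))
    # S103 (second means): look up its associated patterns in table 10
    candidate_names = [best_rep.name] + table.get(best_rep.name, [])
    # S104-S105 (third means): match the subordinate category patterns
    best_cat, best_score = None, float("-inf")
    for name in candidate_names:
        for cat in by_name[name]:
            score = match_score(feature, cat)
            if score > best_score:
                best_cat, best_score = cat, score
    # S106: output the best-matching category as the recognition result
    return best_cat.name if best_cat is not None else None
```

The key difference from the conventional sketch is that the sort over all representative patterns is replaced by a single maximum and a constant-time table lookup, which is where the response improvement claimed by the patent comes from.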

[0021] FIG. 4 plots, for a word recognition experiment, the measured degree of agreement between the top P representative patterns selected by this embodiment and the top P' representative patterns selected by the conventional sorting process. The dictionary used here contains m = 2,000 vocabulary patterns and Q = 64 representative patterns. In the figure, the vertical axis shows the degree of agreement [%], and the horizontal axis shows the number k of vocabulary patterns to be matched after the representative patterns have been selected.

[0022] As shown in FIG. 4, when the top P representative patterns are selected so that the number k of vocabulary patterns to be matched is roughly 600, the agreement between the top representative patterns chosen by the conventional method and by the method of this embodiment is about 75%. The recognition performance at this point is 89% for the speech recognition apparatuses of both methods, with no difference at all.

[0023] In this way, the speech recognition apparatus of this embodiment selects the top P representative patterns with the highest degree of matching with the input speech feature quantity by searching the representative-pattern mutual-rank table 10 instead of performing the conventional sorting process, so the amount of computation required for matching can be reduced without degrading recognition performance. Moreover, because the representative-pattern mutual-rank table 10 can be used as an index into the vocabulary dictionary 63 as described above, the response of the recognition processing itself does not change even when a large number of vocabularies and representative patterns are stored in the vocabulary dictionary 63, and a substantial improvement in recognition performance can also be expected. Furthermore, even when vocabularies and representative patterns are later added to the vocabulary dictionary 63, only a partial change to the representative-pattern mutual-rank table 10 is required, so the scheme is versatile and advantageous in terms of operating cost.

[0024]

[Effects of the Invention] As is clear from the above description, according to the speech recognition method of the present invention, after the single representative pattern that best matches the input speech feature quantity has been identified, the other representative patterns to which the search is narrowed are extracted from the memory using the identified representative pattern as search-key information, so the response from speech input to recognition-result output can be improved without degrading recognition performance. The number of category patterns can also be increased without lowering that response, which in turn contributes to improved recognition performance.

[0025] Furthermore, according to the speech recognition apparatus of the present invention, the information to be stored in the memory can be changed freely according to the number of category patterns and representative patterns stored in the vocabulary dictionary, without changing the configuration of the collator. This not only improves the response from speech input to recognition-result output and the recognition performance, but also allows the number of category patterns to be changed freely according to the application, realizing a more versatile apparatus configuration.

[Brief Description of the Drawings]

[FIG. 1] A configuration diagram of the main part of a speech recognition apparatus according to one embodiment of the present invention.

[FIG. 2] An explanatory diagram of the contents of the representative-pattern mutual-rank table provided in the speech recognition apparatus of this embodiment.

[FIG. 3] An outline flow diagram of the speech recognition processing according to this embodiment.

[FIG. 4] A plot of measured values showing the degree of agreement between the representative-pattern selection results of this embodiment and those of the conventional sorting process in a word recognition experiment.

[FIG. 5] An explanatory diagram showing a configuration example of the vocabulary dictionary used in this type of speech recognition apparatus.

[FIG. 6] A configuration diagram of the main part of a conventional speech recognition apparatus.

[FIG. 7] An outline flow diagram of conventional speech recognition processing.

[Explanation of Symbols]

10 Representative-pattern mutual-rank table
11, 62 Collators
60 Speech input device such as a microphone
61 Feature extractor that extracts the feature quantity of the input speech
63 Vocabulary dictionary storing a plurality of representative patterns

Claims (3)

[Claims]

[Claim 1] A speech recognition method in which a plurality of vocabularies are grouped by pattern similarity, representative patterns characterizing the respective groups are stored in a vocabulary dictionary, and, when an input speech feature quantity is matched against the plurality of vocabularies, the input speech feature quantity is first matched against all representative patterns stored in the vocabulary dictionary to narrow the candidates down to a predetermined number of well-matching representative patterns, after which the vocabularies belonging to the groups of these representative patterns are matched against the input speech feature quantity and the best-matching vocabulary is output as the recognition result, the method comprising: a first step of mutually matching the feature quantities of all other representative patterns against the feature quantity of each representative pattern stored in the vocabulary dictionary, and storing in a memory the index information of every other representative pattern whose degree of matching with a given representative pattern exceeds a predetermined value, in association with the index information of that representative pattern; a second step of identifying the representative pattern whose feature quantity best matches the input speech feature quantity, together with its index information; and a third step of extracting from the memory, using the index information of the identified representative pattern as search-key information, the index information of the associated other representative patterns, and matching the vocabularies in the vocabulary dictionary corresponding to the extracted index information against the input speech feature quantity.
[Claim 2] The speech recognition method according to claim 1, wherein the first step comprises: a step of deriving the degree of matching with each of the other representative patterns when one representative pattern is treated as a pseudo speech feature quantity; and a step of sorting the index information of the other representative patterns whose derived degree of matching exceeds a predetermined value in descending order of matching and storing it in the memory together with the index information of that representative pattern; and wherein the third step comprises: a step of extracting from the memory the index information of the associated other representative patterns in descending order of their degree of matching with the identified representative pattern; and a step of sequentially matching the vocabularies in the vocabulary dictionary corresponding to the extracted index information against the input speech feature quantity.
[Claim 3] A speech recognition apparatus comprising: input-speech feature extraction means for extracting the feature quantity of the input speech to be recognized; a vocabulary dictionary in which a plurality of vocabularies are grouped by pattern similarity and representative patterns characterizing the respective groups are stored; and a collator that matches the extracted input speech feature quantity against the representative patterns and vocabularies in the vocabulary dictionary in stages and outputs the best-matching vocabulary as the recognition result; the apparatus further comprising a memory in which, based on the mutual matching results of the feature quantities of all other representative patterns against the feature quantity of each representative pattern stored in the vocabulary dictionary, the index information of the other representative patterns whose degree of matching with a given representative pattern exceeds a predetermined value is stored in order of matching in association with the index information of that representative pattern; wherein the collator comprises: first means for matching the input speech feature quantity against the feature quantities of the representative patterns stored in the vocabulary dictionary and identifying the best-matching representative pattern and its index information; second means for extracting from the memory, using the index information of the identified representative pattern as search-key information, the index information of the associated other representative patterns; and third means for matching the vocabularies in the vocabulary dictionary corresponding to the extracted index information against the input speech feature quantity.
JP6286850A 1994-11-21 1994-11-21 Method and device for speech recognition Pending JPH08146988A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP6286850A JPH08146988A (en) 1994-11-21 1994-11-21 Method and device for speech recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP6286850A JPH08146988A (en) 1994-11-21 1994-11-21 Method and device for speech recognition

Publications (1)

Publication Number Publication Date
JPH08146988A true JPH08146988A (en) 1996-06-07

Family

ID=17709845

Family Applications (1)

Application Number Title Priority Date Filing Date
JP6286850A Pending JPH08146988A (en) 1994-11-21 1994-11-21 Method and device for speech recognition

Country Status (1)

Country Link
JP (1) JPH08146988A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06290272A (en) * 1993-04-02 1994-10-18 Sharp Corp High-speed matching system
JPH08123460A (en) * 1994-10-26 1996-05-17 Sony Corp Searching method and speech recognition device
JP2522154B2 (en) * 1993-06-03 1996-08-07 日本電気株式会社 Voice recognition system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06290272A (en) * 1993-04-02 1994-10-18 Sharp Corp High-speed matching system
JP2522154B2 (en) * 1993-06-03 1996-08-07 日本電気株式会社 Voice recognition system
JPH08123460A (en) * 1994-10-26 1996-05-17 Sony Corp Searching method and speech recognition device

Similar Documents

Publication Publication Date Title
US5329609A (en) Recognition apparatus with function of displaying plural recognition candidates
US5774588A (en) Method and system for comparing strings with entries of a lexicon
US7003519B1 (en) Method of thematic classification of documents, themetic classification module, and search engine incorporating such a module
JP5111607B2 (en) Computer-implemented method and apparatus for interacting with a user via a voice-based user interface
JPH01167896A (en) Voice input device
EP2548202A1 (en) Methods and apparatus for extracting alternate media titles to facilitate speech recognition
EP2631815A1 (en) Method and device for ordering search results, method and device for providing information
US5640488A (en) System and method for constructing clustered dictionary for speech and text recognition
WO2008062822A1 (en) Text mining device, text mining method and text mining program
JPH08146988A (en) Method and device for speech recognition
JPH06325092A (en) Customer information retrieval system
JPH07210569A (en) Method and device for retrieving information
JP3678360B2 (en) Kanji character string specifying apparatus and method using voice input
KR930000593B1 (en) Voice information service system and method utilizing approximate matching
JPS59117673A (en) Postprocessing system of character recognizing device
JP2007072961A (en) Database retrieval method, program, and system
JPS60233782A (en) Address reader
JPS62191924A (en) Information registration and retrieval device
JPH03266898A (en) Voice recognition processing system for large-vocabulary
CN115905831A (en) Feature selection method based on improved Boruta algorithm
JPH05181719A (en) Variable length data storage and reference system
CN113589957A (en) Method and system for rapidly inputting professional words of laws and regulations
JPH05225248A (en) Data base retrieval system
JPH08227427A (en) Character recognition device
CN117874361A (en) Data pushing method and system