JPH11282490A

JPH11282490A - Speech recognition device and storage medium

Info

Publication number: JPH11282490A
Application number: JP10104152A
Authority: JP
Inventors: Nobukimi Kobayashi; 宣公小林
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 1998-03-30
Filing date: 1998-03-30
Publication date: 1999-10-15

Abstract

PROBLEM TO BE SOLVED: To shorten the time required for recognition by reducing the amount of calculation needed to obtain a recognition result from the feature parameters obtained by analyzing a speech signal. SOLUTION: A standard pattern group storage unit 20 stores a plurality of groups and a plurality of representative standard patterns in association with each other. Each group collects standard patterns with mutually similar feature parameters from 101 known standard patterns. Each representative standard pattern is a specific standard pattern selected from its group to represent that group. The representative standard patterns 16a stored in the standard pattern group storage unit 20 are also stored in a representative standard pattern storage unit 16. A comparison recognition unit 18 selects, from the representative standard pattern storage unit 16, the representative standard patterns that are similar to the feature parameters extracted by an analysis unit 14. A control unit 22 then selects the groups corresponding to the selected representative standard patterns from the standard pattern group storage unit 20 and outputs them.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

Field of the Invention: The present invention relates to a speech recognition device that recognizes speech, and to a storage medium storing a computer program with which the speech recognition device performs speech recognition. The invention reduces the amount of calculation required when speech recognition is performed by analyzing single syllables of speech.

[0002]

Description of the Related Art: A conventional speech recognition device of this type has, for example, the configuration shown in FIG. 5. An input unit 12 of the speech recognition device 40 receives the speech signal to be recognized. An analysis unit 14 analyzes the speech signal output from the input unit 12 at a predetermined period (for example, every 4 to 20 msec) to extract feature parameters. The feature parameters are, for example, the spectrum of the speech as extracted by a band-pass filter bank. The speech recognition device 40 also includes a standard pattern memory 42. This memory stores standard patterns, which are the standard forms of the feature parameters of 101 single syllables. In a speaker-dependent recognition device, each standard pattern is the standard set of feature parameters of a single syllable for that particular speaker. In a speaker-independent recognition device, each standard pattern is the standard set of feature parameters of a single syllable, extracted in advance from the speech of many people.
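The analysis step can be sketched as follows. This is a minimal illustration, not the device's analysis unit. It makes three assumptions: a 10 ms analysis period (within the 4 to 20 msec range above), non-overlapping frames, and a band-pass filter bank approximated by summing DFT power over equal-width frequency bands. The names `frame_signal`, `band_energies`, and `extract_features` are hypothetical.

```python
import cmath
import math


def frame_signal(samples, sample_rate, period_ms=10.0):
    """Split a sampled signal into non-overlapping frames of period_ms milliseconds."""
    size = int(sample_rate * period_ms / 1000)
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def band_energies(frame, num_bands=8):
    """Approximate a band-pass filter bank by summing DFT power in equal-width bands."""
    n = len(frame)
    half = n // 2  # keep only the non-negative frequency bins below Nyquist
    power = []
    for k in range(half):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        power.append(abs(coeff) ** 2)
    width = half // num_bands
    return [sum(power[b * width:(b + 1) * width]) for b in range(num_bands)]


def extract_features(samples, sample_rate, period_ms=10.0, num_bands=8):
    """Return one band-energy vector per analysis period: a time-frequency pattern."""
    return [band_energies(frame, num_bands)
            for frame in frame_signal(samples, sample_rate, period_ms)]
```

Consider a 1 kHz tone sampled at 8 kHz. Each 80-sample frame has a bin spacing of 100 Hz, so the energy lands in the third band (bins 10 to 14, covering 1000 to 1400 Hz).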

[0003] The feature parameters analyzed by the analysis unit 14 are supplied to a comparison recognition unit 44. The comparison recognition unit 44 performs a matching calculation to determine which standard pattern the supplied feature parameters resemble, and it does this for all 101 standard patterns stored in the standard pattern memory 42. The most similar pattern is output to a control unit 46 as the first candidate, and the progressively less similar patterns follow as the next candidates. The candidates are then displayed on a display unit 26 such as a CRT or LCD. When the user finds the desired recognition result among the displayed candidates, the user moves the cursor on the display unit 26 with a keyboard 28 and selects that result. The selected recognition result is output to the control unit 46 and then from the control unit 46 to a data processing unit 48.
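For comparison with the invention, the conventional exhaustive matching can be sketched as below. The sketch simplifies in two ways. Each standard pattern is reduced to one fixed-length vector, and similarity is plain Euclidean distance; the document does not specify the matching measure. The function also returns how many distance calculations were made, since that is the cost the invention aims to reduce. `recognize_exhaustive` is a hypothetical name.

```python
import math


def recognize_exhaustive(features, standard_patterns, num_candidates=4):
    """Match the input against every standard pattern (the conventional approach).

    standard_patterns maps a syllable label to its reference feature vector.
    Returns the ranked candidate labels (closest first) and the number of
    distance calculations performed, which equals len(standard_patterns).
    """
    scored = sorted((math.dist(features, ref), label)
                    for label, ref in standard_patterns.items())
    return [label for _, label in scored[:num_candidates]], len(scored)
```

With 101 stored standard patterns, the second return value is 101 for every input.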

[0004]

Problems to Be Solved by the Invention: However, the conventional single-syllable recognition device 40 compares the feature parameters extracted by the analysis unit 14 against all 101 standard patterns stored in the standard pattern memory 42. The amount of matching calculation is therefore extremely large, and recognition takes a long time.

[0005] Accordingly, an object of the present invention is to provide a speech recognition device and a storage medium that shorten the time required for recognition. They do so by reducing the amount of calculation needed to produce a recognition result from the feature parameters obtained by analyzing a speech signal.

[0006]

Means for Solving the Problems: To achieve the above object, the invention according to claim 1 adopts the following technical means.

Storage means stores representative parameters, which are obtained as follows: single syllables of speech are analyzed to extract feature parameters, mutually similar feature parameters are gathered into groups, and for each group a parameter representing that group is created on the basis of the feature parameters in it.

Speech input means inputs speech.

Output means analyzes a single syllable of the speech input by the speech input means to extract its feature parameters. It then selects from the storage means, and outputs, the representative parameter that is similar to the extracted feature parameters.
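One way to build and use such storage means is sketched below. The claim does not say how similar feature parameters are gathered or how the representative is created, so both choices here are assumptions:

- Grouping is greedy: a pattern joins the first group whose seed lies within a distance threshold.
- The representative is created as the element-wise mean of its group.

Vectors and Euclidean distance are simplifications, and all function names are hypothetical.

```python
import math


def build_groups(patterns, threshold):
    """Greedily gather mutually similar feature vectors into groups.

    patterns maps a label to a feature vector. A pattern joins the first group
    whose seed (first member) lies within `threshold`; otherwise it starts a new group.
    """
    groups = []
    for label, vec in patterns.items():
        for group in groups:
            if math.dist(patterns[group[0]], vec) <= threshold:
                group.append(label)
                break
        else:
            groups.append([label])
    return groups


def build_representatives(patterns, groups):
    """Create one representative per group as the element-wise mean of its members."""
    reps = []
    for group in groups:
        vectors = [patterns[label] for label in group]
        reps.append([sum(col) / len(vectors) for col in zip(*vectors)])
    return reps


def select_representative(features, representatives):
    """Return the index of the representative closest to the input features."""
    return min(range(len(representatives)),
               key=lambda i: math.dist(features, representatives[i]))
```

At recognition time only `len(representatives)` distances are computed, one per group, instead of one per stored pattern.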

[0007] The invention according to claim 2 is the speech recognition device according to claim 1, in which the storage means stores each representative parameter in association with the contents of the group it represents. The output means analyzes a single syllable of the speech input by the speech input means to extract its feature parameters. It then reads out from the storage means, and outputs, the contents of the group associated with the representative parameter that is similar to the extracted feature parameters.
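The association in claim 2 amounts to a lookup table keyed by representative. A minimal sketch follows. It assumes each table entry pairs a representative vector with the labels of its group, with labels standing in for the stored group contents. The "ka" group follows FIG. 2; the second group in the usage is hypothetical.

```python
import math


def recognize_group(features, table):
    """Return the contents of the group whose representative is closest to the input.

    table is a list of (representative_vector, member_labels) pairs. Only one
    distance per entry is computed; the group members are read out, not matched.
    """
    _, members = min(table, key=lambda entry: math.dist(features, entry[0]))
    return list(members)
```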

[0008] The invention according to claim 3 is the speech recognition device according to claim 1, in which the output means analyzes a single syllable of the speech input by the speech input means to extract its feature parameters. Of the representative parameters similar to the extracted feature parameters, it selects from the storage means, and outputs, a predetermined number that rank highest in similarity.
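Selecting a predetermined number of top-ranked representatives is a partial sort. The sketch below uses Python's `heapq.nsmallest` and treats a smaller distance as a higher degree of similarity. `top_representatives` is a hypothetical name.

```python
import heapq
import math


def top_representatives(features, representatives, k):
    """Return the labels of the k representatives most similar to the input, best first.

    representatives maps a label to its feature vector; a smaller Euclidean
    distance is treated as a higher degree of similarity.
    """
    return heapq.nsmallest(
        k, representatives, key=lambda label: math.dist(features, representatives[label]))
```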

[0009] The invention according to claim 4 is the speech recognition device according to claim 2, in which the output means analyzes a single syllable of the speech input by the speech input means to extract its feature parameters. Of the representative parameters similar to the extracted feature parameters, it selects from the storage means a predetermined number that rank highest in similarity. It then outputs the contents of the groups associated with each of the selected representative parameters.
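Claim 4 combines ranking with lookup: rank the representatives, keep the top k, and read out each one's group. The sketch below keeps the same assumptions of vector features and Euclidean distance. The function name and all group contents other than the "ka" group are hypothetical.

```python
import heapq
import math


def top_groups(features, table, k):
    """Return (representative, group contents) for the k closest representatives.

    table maps a representative label to (representative_vector, member_labels).
    Groups are returned best first, so a display can list them in that order.
    """
    best = heapq.nsmallest(k, table, key=lambda rep: math.dist(features, table[rep][0]))
    return [(rep, table[rep][1]) for rep in best]
```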

[0010] The invention according to claim 5 is the speech recognition device according to claim 4, in which the output means analyzes a single syllable of the speech input by the speech input means to extract its feature parameters. Of the representative parameters similar to the extracted feature parameters, it selects from the storage means a predetermined number that rank highest in similarity. From the feature parameters in the groups associated with each of those representative parameters, it then selects a predetermined number that rank highest in similarity, and outputs them.
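The two-stage selection of claim 5 can be sketched as follows. The text does not say whether the second predetermined number applies per group or across all selected groups; this sketch assumes per group. It also assumes that each group member keeps its own standard pattern, so that it can be scored in the second stage. Names and data are hypothetical.

```python
import heapq
import math


def two_stage_select(features, reps, groups, patterns, k, m):
    """Stage 1: keep the k representatives closest to the input.
    Stage 2: within each of those groups, keep the m members closest to the input.

    reps: representative label -> vector; groups: representative label -> member
    labels; patterns: member label -> vector. Returns {representative: members},
    best representative first (dicts preserve insertion order).
    """
    def dist_to(vectors):
        return lambda label: math.dist(features, vectors[label])

    best_reps = heapq.nsmallest(k, reps, key=dist_to(reps))
    return {rep: heapq.nsmallest(m, groups[rep], key=dist_to(patterns)) for rep in best_reps}
```

Only the members of the k selected groups are scored in the second stage, so weakly similar members of those groups never reach the display.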

[0011] The invention according to claim 6 is the speech recognition device according to claim 2, 4, or 5, in which the storage means stores the feature parameters in each group in association with their utterance frequencies. The output means outputs the feature parameters in descending order of utterance frequency.
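Ordering by utterance frequency is a stable sort on a per-syllable count. In this sketch the counts are hypothetical placeholders; the document does not say where the frequency statistics come from.

```python
def order_by_frequency(members, frequency):
    """Sort group members so that the most frequently uttered syllables come first.

    frequency maps a label to an utterance count; unknown labels count as 0.
    Python's sort is stable, so members with equal counts keep their stored order.
    """
    return sorted(members, key=lambda label: frequency.get(label, 0), reverse=True)
```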

[0012] The invention according to claim 7 is the speech recognition device according to any one of claims 1 to 6, in which the representative parameter is a predetermined feature parameter selected from within the group.

[0013] The invention according to claim 8 is a storage medium with two parts.

The first is a storage area that stores representative parameters. These are obtained as follows: single syllables of speech are analyzed to extract feature parameters, mutually similar extracted feature parameters are gathered into groups, and for each group a parameter representing that group is created on the basis of the feature parameters in it.

The second is a stored computer program. The program inputs speech, analyzes a single syllable of the input speech to extract its feature parameters, and selects from the storage area, and outputs, the representative parameter that is similar to the extracted feature parameters.

[0014] The invention according to claim 9 is the storage medium according to claim 8, in which the storage area stores each representative parameter in association with the contents of the group it represents. The computer program analyzes a single syllable of the speech input by the speech input means to extract its feature parameters. It then reads out from the storage area, and outputs, the contents of the group associated with the representative parameter that is similar to the extracted feature parameters.

[0015]

Operation: In the inventions according to claims 1 to 7, the storage means stores representative parameters. These are obtained as follows: single syllables of speech are analyzed to extract feature parameters, mutually similar extracted feature parameters are gathered into groups, and for each group a parameter representing that group is created on the basis of the feature parameters in it. The output means analyzes a single syllable of the speech input by the speech input means to extract its feature parameters. It then selects from the storage means, and outputs, the representative parameter that is similar to the extracted feature parameters.

In other words, to find a feature parameter similar to the one extracted from the speech signal, the device does not judge similarity against every feature parameter stored in the storage means. Instead, similar feature parameters are gathered into groups, and a similar one is selected from among the representative parameters of those groups and output. The inventions according to claims 1 to 7 therefore realize a speech recognition device that needs less calculation, and hence less recognition time, than a device that judges similarity against every stored feature parameter.
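The saving can be stated as a count of matching calculations. Write N for the number of stored standard patterns (101 here), n for the number of groups G_1, ..., G_n, and k for the number of top-ranked representatives kept. The sketch assumes that the groups partition the stored patterns. This notation is introduced only for illustration, and the document gives no value of n.

```latex
\begin{aligned}
C_{\text{conventional}} &= N = \sum_{j=1}^{n} \lvert G_j \rvert = 101,\\
C_{\text{claims 1 to 4}} &= n \le N \quad \text{(equality only if every group has a single member)},\\
C_{\text{claim 5}} &= n + \sum_{j \in \text{top-}k} \lvert G_j \rvert \;\le\; n + k \max_j \lvert G_j \rvert .
\end{aligned}
```

For example, if no group has more than four members (like the "ka", "kya", "ga", "gya" group) and k = 4, the second stage of claim 5 adds at most 16 calculations to the n comparisons with representatives.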

[0016] In particular, in the invention according to claim 2, the storage means stores each representative parameter in association with the contents of the group it represents. The output means analyzes a single syllable of the speech input by the speech input means to extract its feature parameters. It then reads out from the storage means, and outputs, the contents of the group associated with the representative parameter that is similar to the extracted feature parameters. Because the contents of that group can be output, they can be displayed together with the recognition result.

[0017] In the invention according to claim 3, the output means analyzes a single syllable of the speech input by the speech input means to extract its feature parameters. Of the representative parameters similar to the extracted feature parameters, it selects from the storage means, and outputs, a predetermined number that rank highest in similarity. Only these top-ranked representative parameters are output, rather than every similar one. This shortens the time needed to output the recognition result, and it also reduces the number of results displayed, which makes the desired result easier to find.

[0018] Further, in the invention according to claim 4, the output means analyzes a single syllable of the speech input by the speech input means to extract its feature parameters. Of the representative parameters similar to the extracted feature parameters, it selects from the storage means a predetermined number that rank highest in similarity. It then outputs the contents of the groups associated with the selected representative parameters. Because the contents of these top-ranked groups can be output, they can be displayed together with the recognition result.

[0019] In the invention according to claim 5, the output means analyzes a single syllable of the speech input by the speech input means to extract its feature parameters. Of the representative parameters similar to the extracted feature parameters, it selects from the storage means a predetermined number that rank highest in similarity. From the feature parameters in the groups associated with each of those representative parameters, it then selects a predetermined number that rank highest in similarity, and outputs them.

The device therefore does not output every feature parameter in the top-ranked groups. It outputs only the most similar feature parameters within those groups, so members of a group with low similarity are left out. As a result, dissimilar feature parameters do not appear when the recognition result is displayed, and the desired recognition result is easier to find.

[0020] Further, in the invention according to claim 6, the storage means stores the feature parameters in each group in association with their utterance frequencies. When outputting a group, the output means outputs the feature parameters in that group in descending order of utterance frequency, rather than in arbitrary order. When the recognition result is displayed, the result the user wants to select is therefore more likely to appear near the top, so it can be found easily.

[0021] In the invention according to claim 7, the representative parameter is a predetermined feature parameter selected from within the group. A representative parameter can be obtained in two ways:

- create a new feature parameter on the basis of the feature parameters in the group, and use it as the representative parameter; or
- select a desired feature parameter from the group, and use it as the representative parameter.

Claim 7 uses the latter method. Unlike the former, it does not require creating a new feature parameter; for example, the conventional 101 single-syllable patterns described above can be used as they are.
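The two ways of obtaining a representative can be contrasted in a short sketch. The centroid stands for the former method, a newly created parameter. For the latter method, the embodiment simply takes the basic fifty-sound syllable of each group. The medoid rule below is only one hypothetical alternative for picking an existing member automatically.

```python
import math


def centroid(vectors):
    """Former method: create a new parameter as the element-wise mean of the group."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def medoid(labels, patterns):
    """Latter method (one possible rule): pick the existing member whose summed
    distance to all other members is smallest, so no new pattern is created."""
    return min(labels, key=lambda a: sum(math.dist(patterns[a], patterns[b]) for b in labels))
```

`medoid` returns one of the stored labels, so its pattern can be reused directly from the existing table of 101 patterns. This is the advantage of the selection approach.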

[0022] The invention according to claim 8 is a storage medium with two parts. The first is a storage area that stores representative parameters. These are obtained as follows: single syllables of speech are analyzed to extract feature parameters, mutually similar extracted feature parameters are gathered into groups, and for each group a parameter representing that group is created on the basis of the feature parameters in it. The second is a stored computer program, which inputs speech, analyzes a single syllable of the input speech to extract its feature parameters, and selects from the storage area, and outputs, the representative parameter that is similar to the extracted feature parameters.

This storage medium can be used to realize the speech recognition device according to claim 1. As described in the embodiment below, the speech recognition device is controlled by a CPU built into the device or by a computer connected to it. The device can therefore be realized in either of two ways: by providing a storage unit that serves as the storage medium inside the device, or by installing the computer program stored in the storage medium on a computer. Either way, the device needs less calculation and less time for recognition.

[0023] In particular, in the invention according to claim 9, the storage area stores each representative parameter in association with the contents of the group it represents. The computer program analyzes a single syllable of the speech input by the speech input means to extract its feature parameters. It then reads out from the storage area, and outputs, the contents of the group associated with the representative parameter that is similar to the extracted feature parameters. Because the contents of that group can be output, the invention realizes a speech recognition device that can display the group's contents together with the recognition result.

[0024]

Embodiment of the Invention: An embodiment of the speech recognition device of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the schematic configuration of the speech recognition device of this embodiment. Components identical to those of the conventional device are given the same reference numerals, and their description is omitted.

As shown in FIG. 1, the speech recognition device 10 of this embodiment includes the following components:

- an input unit 12, which constitutes the speech input means of the present invention;
- a representative standard pattern storage unit 16;
- an analysis unit 14;
- a comparison recognition unit 18;
- a standard pattern group storage unit 20;
- a control unit 22;
- a data processing unit 24;
- a display unit 26; and
- a keyboard 28.

The analysis unit 14, the comparison recognition unit 18, and the control unit 22 function as the output means of the present invention.

[0025] The input unit 12 includes a microphone, an amplifier, an A/D converter, a filter, and similar components. The analysis unit 14 includes a band-pass filter bank and similar components. A CPU serves as the comparison recognition unit 18. Storage media such as ROM, EEPROM, or RAM serve as the representative standard pattern storage unit 16 and the standard pattern group storage unit 20. The display unit 26 is, for example, a monitor television or an LCD (liquid crystal display).

The contents of the representative standard pattern storage unit 16 and the standard pattern group storage unit 20 are described with reference to FIG. 2. FIG. 2(A) shows the contents of the standard pattern group storage unit 20, and FIG. 2(B) shows the contents of the representative standard pattern storage unit 16. What these units actually store is standard pattern data, that is, the standard forms of the feature parameters of single syllables. For clarity, however, FIG. 2 shows the stored contents as the single syllables corresponding to the standard patterns.

As shown in FIG. 2(A), the standard pattern group storage unit 20 stores n entries. Each entry associates a group 20a with a representative standard pattern 16a:

- The group 20a consists of standard patterns of four kinds: basic syllables of the Japanese fifty-sound table, contracted sounds, voiced sounds, and semi-voiced sounds.
- The representative standard pattern 16a (hereinafter called a representative standard pattern) is the standard pattern of the basic fifty-sound syllable, selected from the group's standard patterns to represent the group.

For example, the first entry associates a group 20a consisting of the four standard patterns "ka", "kya", "ga", and "gya" with "ka", which is the representative standard pattern 16a of that group.

As shown in FIG. 2(B), the representative standard pattern storage unit 16 stores the n representative standard patterns 16a held in the standard pattern group storage unit 20. The representative standard pattern corresponds to the representative parameter of the present invention.
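The two tables of FIG. 2 can be modelled as one list of entries, from which the contents of storage unit 16 are derived. As in the figure, syllable labels stand in for the stored feature data. This sketch checks two consistency rules: the representative is a member of its group, and no syllable belongs to two groups. Both rules are assumptions of the sketch, not requirements stated in the document. Only the first entry is taken from FIG. 2.

```python
from dataclasses import dataclass, field


@dataclass
class GroupEntry:
    number: int                 # entry position in storage unit 20 (1 to n)
    representative: str         # representative standard pattern 16a
    members: list = field(default_factory=list)  # standard patterns of group 20a


def representative_table(group_table):
    """Derive the contents of storage unit 16 (FIG. 2(B)) from storage unit 20 (FIG. 2(A))."""
    return [entry.representative for entry in group_table]


def check_table(group_table):
    """Raise ValueError if a representative is outside its group or groups overlap."""
    seen = set()
    for entry in group_table:
        if entry.representative not in entry.members:
            raise ValueError(f"entry {entry.number}: representative not in its group")
        overlap = seen.intersection(entry.members)
        if overlap:
            raise ValueError(f"entry {entry.number}: {sorted(overlap)} already grouped")
        seen.update(entry.members)
```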

[0026] Next, the processing flow for speech recognition in the speech recognition device 10 of this embodiment is described with reference to the flowchart of FIG. 3. The analysis unit 14, the comparison recognition unit 18, and the control unit 22 described below are mainly functions realized by a CPU (not shown) executing a computer program. The flow proceeds as follows.

- Step 2: The user of the speech recognition device 10 utters a single syllable into the microphone or other device of the input unit 12. The A/D converter, filter, and other components of the input unit 12 convert the syllable into a digital speech signal.
- Step 4: The analysis unit 14 analyzes the speech signal output from the input unit 12, at every predetermined period, into the time variation of its frequency spectrum, and extracts feature parameters.
- Step 6: The comparison recognition unit 18 compares the extracted feature parameters in turn with each representative standard pattern 16a stored in the representative standard pattern storage unit 16.

[0027] From the compared representative standard patterns 16a, the comparison/recognition unit 18 selects the most similar one as the first candidate and the next most similar ones, in order, as subsequent candidates (step 8). It outputs the selected representative standard patterns to the control unit 22 (step 10). The control unit 22 then selects the groups 20a associated with the received representative standard patterns and outputs data indicating the standard patterns in those groups to the display unit 26. The display unit 26 displays the monosyllables indicated by that data (step 12).

For example, suppose the user utters "kya". The comparison/recognition unit 18 selects the first through fourth candidates from the representative standard pattern storage unit 16 shown in FIG. 2(B), such as "ka" as the first candidate and "ta" as the second (step 8). The control unit 22 then selects from the standard pattern group storage unit 20 shown in FIG. 2(A) the four groups associated with those four candidates. It outputs data indicating the standard patterns in each of the four groups to the display unit 26. The display unit 26 displays the four selected groups, as shown in FIG. 4, which illustrates the recognition result (step 12).
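
Steps 8 through 12 rank the representatives by similarity and then expand each selected representative into the group stored with it. The minimal sketch below assumes that a smaller matching distance means a closer match. The grouping in `GROUPS` is invented for illustration because the actual contents of FIG. 2(A) are not reproduced here, and `select_candidates` and `expand_groups` are hypothetical names.

```python
from typing import Dict, List, Optional

# Hypothetical contents of the standard pattern group storage unit 20:
# representative standard pattern -> monosyllables in its group (invented grouping).
GROUPS: Dict[str, List[str]] = {
    "ka": ["ka", "kya", "ga", "gya"],
    "ta": ["ta", "da", "cha"],
    "sa": ["sa", "sha", "za", "ja"],
    "na": ["na", "nya", "ma"],
    "a": ["a", "ya", "wa"],
}


def select_candidates(distances: Dict[str, float], n: Optional[int] = 4) -> List[str]:
    """Step 8: order representatives from most to least similar and keep the top n.

    A smaller distance means a closer match; n=None keeps every representative.
    """
    ranked = sorted(distances, key=distances.get)
    return ranked if n is None else ranked[:n]


def expand_groups(candidates: List[str], groups: Dict[str, List[str]]) -> List[List[str]]:
    """Steps 10-12: fetch the group associated with each selected representative."""
    return [groups[rep] for rep in candidates]


if __name__ == "__main__":
    # Invented distances for an utterance of "kya".
    dists = {"ka": 1.2, "ta": 2.5, "sa": 3.1, "na": 3.4, "a": 5.0}
    top = select_candidates(dists)
    print(top)  # ['ka', 'ta', 'sa', 'na']
    print(expand_groups(top, GROUPS))
```

The parameter `n` reflects that the number of candidates need not be four. It can be two or three, or left unlimited (`None`).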

[0028] After finding the desired recognition result among those displayed on the display unit 26, the user operates the keyboard 28 to move the on-screen cursor onto that result and selects it (step 14). The selected recognition result is output to the control unit 22 and then to the data processing unit 24 (step 16). Alternatively, numbers may be assigned to the standard patterns and representative standard patterns, as shown in FIG. 2(A). The recognition results can then be displayed on the display unit 26 together with those numbers, as shown in FIG. 4. With this configuration, the user can select the desired recognition result directly by entering its number on a numeric keypad or the like. The data output from the control unit 22 may also be output to, and stored in, a storage medium (not shown).

[0029] As described above, the speech recognition device 10 of this embodiment groups standard patterns whose feature parameters are similar and assigns each group a representative standard pattern. This reduces the number of comparison targets: the feature parameters extracted from the speech signal are checked for similarity only against the small set of representative standard patterns 16a. The conventional approach makes this check against all 101 standard patterns. Compared with it, the new approach substantially reduces the amount of computation and shortens the time required for recognition. For example, if the representative standard pattern storage unit 16 stores 20 representative standard patterns, the comparison/recognition unit 18 performs 20 matching operations. That is about 1/5 of the conventional computation.
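
Assume, for simplicity, that every matching operation costs about the same. The matching cost then scales with the number of patterns compared, which gives the 1/5 figure above:

```latex
\frac{\text{matchings against representatives}}{\text{matchings against all patterns}}
  = \frac{20}{101} \approx 0.198 \approx \frac{1}{5}
```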

[0030] In the above embodiment, the comparison/recognition unit 18 selects from the representative standard pattern storage unit 16 up to the four representative standard patterns it judges most similar. The number is not limited to four. For example, two or three may be selected, or the selection may be made without fixing any particular number. Selecting fewer representative standard patterns shortens the processing time of the comparison/recognition unit 18 and the control unit 22. It also reduces the number of recognition results shown on the display unit 26, which makes the desired result easier to find. This configuration corresponds to the invention of claim 3 or claim 4.

[0031] Furthermore, a configuration may be provided that works within the groups the control unit 22 selects from the standard pattern group storage unit 20. It selects the standard patterns that are similar to the feature parameters extracted from the speech signal, and the display unit 26 shows those standard patterns together with the representative standard patterns. In other words, the display does not show the top predetermined number of representative standard patterns together with every standard pattern in their groups. Only the similar standard patterns within each group are selected and shown with the representative standard pattern, so dissimilar standard patterns in the group are not output. When the recognition result is displayed, feature parameters that are not very similar are left out, which makes the desired result easier to find. This configuration corresponds to the invention of claim 5.
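
The refinement above is a two-stage search. A coarse match is made against the representatives, and a fine match is then made only inside the selected groups. The minimal sketch below assumes fixed-length feature vectors compared by squared Euclidean distance, a stand-in for whatever matching the device actually uses. The limits `n_groups` and `members_per_group` are hypothetical parameters, and the pattern data are invented.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]  # assumed fixed-length feature-parameter vector


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (stand-in for the device's matching score)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_stage_recognize(query: Vector,
                        representatives: Dict[str, Vector],
                        groups: Dict[str, Dict[str, Vector]],
                        n_groups: int = 2,
                        members_per_group: int = 2) -> List[Tuple[str, List[str]]]:
    """Coarse match against representatives, then fine match inside chosen groups."""
    # Stage 1: keep the n_groups representatives closest to the input.
    top_reps = sorted(representatives,
                      key=lambda r: sq_dist(query, representatives[r]))[:n_groups]
    results = []
    for rep in top_reps:
        members = groups[rep]
        # Stage 2: rank only this group's standard patterns and drop dissimilar ones.
        closest = sorted(members, key=lambda s: sq_dist(query, members[s]))
        results.append((rep, closest[:members_per_group]))
    return results


if __name__ == "__main__":
    reps = {"ka": [1.0, 0.0], "ta": [0.0, 1.0], "a": [5.0, 5.0]}
    groups = {
        "ka": {"ka": [1.0, 0.0], "kya": [1.2, 0.1], "ga": [0.8, 0.6]},
        "ta": {"ta": [0.0, 1.0], "da": [0.2, 1.3], "cha": [0.9, 0.9]},
        "a": {"a": [5.0, 5.0]},
    }
    print(two_stage_recognize([1.2, 0.1], reps, groups))
    # [('ka', ['kya', 'ka']), ('ta', ['cha', 'ta'])]
```

Stage 2 matches only the members of the selected groups. Its extra cost is therefore bounded by the size of those groups, not by the full set of standard patterns.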

[0032] The control unit 22 may also be configured to output recognition results in descending order of utterance frequency. For example, the utterance frequency of each monosyllable is computed in advance. The result is stored in the standard pattern group storage unit 20 as utterance-frequency data associated with each standard pattern. When the control unit 22 outputs a selected group to the display unit 26, it uses the frequency data associated with each standard pattern in that group. It outputs those standard patterns from most to least frequent. Because the results appear in that order, the result the user wants is more likely to be near the top of the display and is easier to find. This configuration corresponds to the invention of claim 6.
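
Ordering a group by utterance frequency is a plain sort on a stored key. The minimal sketch below assumes the frequencies are kept in a dictionary keyed by monosyllable. The frequency values are invented for illustration, and `order_by_frequency` is a hypothetical name.

```python
from typing import Dict, List


def order_by_frequency(group: List[str], frequency: Dict[str, float]) -> List[str]:
    """Return the group's monosyllables, most frequently uttered first.

    Monosyllables missing from the table count as frequency 0 and go last.
    Ties keep their stored order, because sorted() remains stable with reverse=True.
    """
    return sorted(group, key=lambda s: frequency.get(s, 0.0), reverse=True)


if __name__ == "__main__":
    freq = {"ka": 0.031, "ga": 0.012, "kya": 0.002}  # invented relative frequencies
    print(order_by_frequency(["ka", "kya", "ga", "gya"], freq))
    # ['ka', 'ga', 'kya', 'gya']
```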

[0033] In the above embodiment, matching against standard patterns is used to recognize a monosyllable. A recognition method based on hidden Markov models, neural networks, or the like may be used instead. Also in the above embodiment, the 101 known standard patterns are first collected into groups of patterns with similar feature parameters. A predetermined standard pattern is then selected from each group and set as that group's representative standard pattern. As an alternative, the monosyllables may be divided into groups with similar feature parameters, and a representative feature parameter may be generated for each group by learning from its monosyllables. That learned parameter then serves as the representative standard pattern. When DP matching is used for monosyllable recognition, this learning takes the average of the feature parameters in the group or extracts the part they have in common. When hidden Markov models are used, learning is performed with the Baum-Welch algorithm (an expectation-maximization algorithm) or a similar method.
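
Paragraph [0033] names averaging as one way to generate a representative for DP matching. The minimal sketch below shows that averaging route alongside a selection route that picks an existing member. It assumes every pattern in a group has already been time-normalized to the same number of frames and the same feature dimension; the patent does not say how patterns are aligned before averaging. For the selection route, it uses the medoid (the member with the smallest total distance to the others) purely as an example criterion, since the patent does not state how its predetermined pattern is chosen. All names and data are hypothetical.

```python
from typing import Dict, List, Sequence

Frame = List[float]
Pattern = List[Frame]  # assumed time-normalized: same frame count and dimension


def mean_pattern(patterns: Sequence[Pattern]) -> Pattern:
    """Generate a new representative as the frame-by-frame average of the group."""
    n_frames, dim = len(patterns[0]), len(patterns[0][0])
    if any(len(p) != n_frames or any(len(f) != dim for f in p) for p in patterns):
        raise ValueError("patterns must be time-normalized to a common shape")
    return [[sum(p[t][k] for p in patterns) / len(patterns) for k in range(dim)]
            for t in range(n_frames)]


def pattern_distance(a: Pattern, b: Pattern) -> float:
    """Sum of squared per-frame differences between equal-shape patterns."""
    return sum((x - y) ** 2 for fa, fb in zip(a, b) for x, y in zip(fa, fb))


def medoid_label(group: Dict[str, Pattern]) -> str:
    """Select an existing member: the one with the smallest total distance to the rest."""
    return min(group, key=lambda s: sum(pattern_distance(group[s], group[o]) for o in group))


if __name__ == "__main__":
    group = {
        "ka": [[1.0, 0.0], [2.0, 1.0]],
        "ga": [[1.0, 2.0], [2.0, 3.0]],
        "kya": [[1.0, 1.0], [2.0, 2.0]],
    }
    print(mean_pattern(list(group.values())))  # [[1.0, 1.0], [2.0, 2.0]]
    print(medoid_label(group))                 # kya
```

The averaging route produces a new pattern that need not match any real utterance. The selection route reuses an existing pattern, which is the trade-off paragraph [0042] describes for claim 7.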

[0034] The computer program with which the speech recognition device 10 of the above embodiment performs speech recognition can be provided in two ways. It can be stored on a storage medium such as a ROM at a predetermined location in the device 10. Alternatively, it can be stored on a CD-ROM, floppy disk, or the like and installed with a reader provided in the device. In either case, the ROM, CD-ROM, floppy disk, or the like functions as the storage medium of claim 8 or claim 9. The computer program can also be loaded from an external information processing device over a wired or wireless communication means and run.

[0035] The representative standard pattern storage unit 16 and the standard pattern group storage unit 20 in the above embodiment correspond to the storage means of the present invention. When their contents are stored as part of the computer program, the area in which they are stored corresponds to the storage area of claim 8 or claim 9.

[0036]

[Effects of the Invention] As described above, the inventions of claims 1 to 7 comprise storage means and output means. The storage means holds representative parameters built in three steps. Monosyllables of speech are analyzed to extract feature parameters, and similar feature parameters are collected into groups. A representative parameter is then created for each group from the feature parameters in that group. The output means analyzes a monosyllable of speech received through voice input means and extracts its feature parameters. It then selects and outputs from the storage means the representative parameters similar to those feature parameters. A device that checks similarity against every feature parameter in its storage means needs more computation and more time for recognition. This design avoids that cost.

[0037] In particular, according to the invention of claim 2, the storage means stores each representative parameter in association with the contents of the group it represents. The output means analyzes a monosyllable of speech received through the voice input means and extracts its feature parameters. It then reads from the storage means, and outputs, the contents of the groups associated with the representative parameters similar to those feature parameters. The contents of those groups can therefore be displayed when the output recognition result is shown.

[0038] According to the invention of claim 3, the output means analyzes a monosyllable of speech received through the voice input means and extracts its feature parameters. It then selects from the storage means, and outputs, a predetermined number of the representative parameters most similar to those feature parameters. This shortens the time needed to output the recognition result. Because fewer results are displayed, the desired result is also easier to find.

[0039] Further, according to the invention of claim 4, the output means analyzes a monosyllable of speech received through the voice input means and extracts its feature parameters. It selects from the storage means a predetermined number of the representative parameters most similar to those feature parameters. It then outputs the contents of the group associated with each selected representative parameter. When the output recognition result is displayed, the contents of that predetermined number of most similar groups can therefore be shown.

[0040] According to the invention of claim 5, the output means analyzes a monosyllable of speech received through the voice input means and extracts its feature parameters. It selects from the storage means a predetermined number of the representative parameters most similar to those feature parameters. From the feature parameters in the groups associated with those representative parameters, it then selects a predetermined number of the most similar ones and outputs them. Feature parameters that are not very similar are therefore left out of the displayed recognition result, which makes the desired result easier to find.

[0041] Further, according to the invention of claim 6, the storage means stores the feature parameters in each group in association with utterance frequencies. The output means outputs the feature parameters in descending order of utterance frequency. When the recognition results are displayed, the result the user wants is therefore more likely to rank near the top, which makes it even easier to find and select.

[0042] According to the invention of claim 7, the representative parameter is a predetermined feature parameter selected from within the group. A representative parameter can be created in one of two ways. The first generates a new feature parameter from the feature parameters in the group and uses it as the representative parameter. The second selects a desired feature parameter from the group and uses that as the representative parameter. The invention of claim 7 uses the second approach, so, unlike the first, it does not need to create a new feature parameter. For example, the conventional 101 monosyllable patterns described above can be used as they are.

[0043] According to the invention of claim 8, a storage medium stores two things. The first is a storage area holding representative parameters. These are built by analyzing monosyllables of speech to extract feature parameters, collecting similar feature parameters into groups, and creating a representative parameter for each group from the feature parameters in that group. The second is a computer program that receives speech and analyzes a monosyllable of it to extract feature parameters. The program then selects from the storage area, and outputs, the representative parameters similar to those feature parameters. The storage medium can be provided as the storage unit of a speech recognition device. Alternatively, the computer program on it can be installed in a speech recognition device or in a computer connected to one. Either way, the result is a speech recognition device that needs less computation and less time for recognition.

[0044] In particular, according to the invention of claim 9, the storage area stores each representative parameter in association with the contents of the group it represents. The computer program analyzes a monosyllable of speech received through the voice input means and extracts its feature parameters. It then reads from the storage area, and outputs, the contents of the groups associated with the representative parameters similar to those feature parameters. This yields a speech recognition device that can output the group contents associated with each representative parameter and show them when the recognition result is displayed.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the schematic configuration of a speech recognition device according to an embodiment of the present invention.

FIG. 2(A) is an explanatory diagram showing the contents of the standard pattern group storage unit 20, and FIG. 2(B) is an explanatory diagram showing the contents of the representative standard pattern storage unit 16.

FIG. 3 is a flowchart showing the processing flow of the speech recognition device according to the embodiment of the present invention.

FIG. 4 is an explanatory diagram showing a recognition result displayed on the display unit 26.

FIG. 5 is a block diagram showing the schematic configuration of a conventional speech recognition device.

[Explanation of symbols]

10 Speech recognition device
12 Input unit
14 Analysis unit
16 Representative standard pattern storage unit
16a Representative standard pattern
18 Comparison/recognition unit
20 Standard pattern group storage unit
20a Group
22 Control unit
24 Data processing unit
26 Display unit
28 Keyboard

[Claims]

1. A speech recognition device comprising: storage means for storing representative parameters, wherein monosyllables of speech are analyzed to extract feature parameters, similar extracted feature parameters are collected into groups of feature parameters, and for each group a representative parameter representing that group is created on the basis of the feature parameters in the group; voice input means for inputting speech; and output means for analyzing a monosyllable of speech input through the voice input means to extract feature parameters of the monosyllable, and for selecting from the storage means and outputting a representative parameter similar to the extracted feature parameters.

2. The speech recognition device according to claim 1, wherein the storage means stores each representative parameter in association with the contents of the group represented by that representative parameter, and the output means analyzes a monosyllable of speech input through the voice input means to extract feature parameters of the monosyllable, and reads from the storage means and outputs the contents of the group associated with a representative parameter similar to the extracted feature parameters.

3. The speech recognition device according to claim 1, wherein the output means analyzes a monosyllable of speech input through the voice input means to extract feature parameters of the monosyllable, and selects from the storage means and outputs a predetermined number of the representative parameters similar to the extracted feature parameters, taken in descending order of similarity.

4. The speech recognition device according to claim 2, wherein the output means analyzes a monosyllable of speech input through the voice input means to extract feature parameters of the monosyllable, selects from the storage means a predetermined number of the representative parameters similar to the extracted feature parameters, taken in descending order of similarity, and outputs the contents of the groups respectively associated with the selected representative parameters.

5. The speech recognition device according to claim 4, wherein the output means analyzes a monosyllable of speech input through the voice input means to extract feature parameters of the monosyllable, selects from the storage means a predetermined number of the representative parameters similar to the extracted feature parameters, taken in descending order of similarity, further selects a predetermined number of the feature parameters in the groups respectively associated with the selected representative parameters, taken in descending order of similarity, and outputs the selected feature parameters.

6. The speech recognition device according to claim 2, claim 4, or claim 5, wherein the storage means stores the feature parameters in each group in association with utterance frequencies, and the output means outputs the feature parameters in descending order of utterance frequency.

7. The speech recognition apparatus according to claim 1, wherein the representative parameter is a predetermined characteristic parameter selected from the group.

8. A storage medium storing: a storage area holding representative parameters, wherein monosyllables of speech are analyzed to extract feature parameters, similar extracted feature parameters are collected into groups of feature parameters, and for each group a representative parameter representing that group is created on the basis of the feature parameters in the group; and a computer program for receiving speech, analyzing a monosyllable of the received speech to extract feature parameters of the monosyllable, and selecting from the storage area and outputting a representative parameter similar to the extracted feature parameters.

9. The storage medium according to claim 8, wherein the storage area stores each representative parameter in association with the contents of the group represented by that representative parameter, and the computer program analyzes a monosyllable of speech input through voice input means to extract feature parameters of the monosyllable, and reads from the storage area and outputs the contents of the group associated with a representative parameter similar to the extracted feature parameters.