
JPS6266299A - Voice recognition equipment

Info

Publication number: JPS6266299A
Application number: JP20713285A
Authority: JP
Inventors: 潤一郎藤本
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1985-09-19
Filing date: 1985-09-19
Publication date: 1987-03-25

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

Technical Field: The present invention relates to a speech recognition device.

Prior Art: The present applicant has previously proposed BTSP (Binary TSP), a method in which speech uttered word by word is binarized to obtain a feature pattern, and this binarized feature pattern is recognized by linear matching against dictionary patterns.

However, in the BTSP method vowels carry a large weight in recognition and differences between consonants hardly show up, so misrecognitions are likely, for example between 「濃く」 (koku) and 「億」 (oku).

Object: The present invention was made in view of the circumstances described above, with the particular aim of improving the recognition accuracy of consonant portions in a speech recognition device.

Configuration: To achieve the above object, the present invention is a speech recognition device that comprises a voice collection unit, a feature conversion unit, a voice section detection unit, and a standard pattern storage unit; that converts the voice obtained by the voice collection unit into features in the feature conversion unit, extracts only the portion belonging to the voice section, determines its similarity (distance) to standard patterns registered in advance, and outputs the pattern with the highest similarity (lowest distance) as the recognition result. The device is characterized in that it determines the power of the voice and calculates the similarity with weights that bear an inverse (direct) relationship to the increase or decrease in power.

The invention is described below on the basis of embodiments.

FIG. 1 is an electrical block diagram illustrating an example of a speech recognition device to which the present invention is applied. In the figure, 1 is a microphone, 2 is a voice section detection unit, 3 is a feature extraction unit, 4 is a pattern storage unit, 5 is a changeover switch, 6 is a matching unit, 7 is a result output unit, 8 is a DP matching unit, 9 is an inter-frame distance unit, and 10 is a frame power multiplication unit. The distinguishing feature is that the matching unit 6 is configured as shown in FIG. 2.

The present invention was made by noting that consonants have lower power than vowels. It is a speech recognition device that comprises a voice collection unit, a feature conversion unit, a voice section detection unit, and a standard pattern storage unit; that converts the voice obtained by the voice collection unit into features by feature conversion, extracts only the portion belonging to the voice section, determines its similarity (distance) to standard patterns registered in advance, and outputs the pattern with the highest similarity (lowest distance) as the recognition result. In this device the power of the voice is determined, and the similarity is calculated with weights that bear an inverse (direct) relationship to the increase or decrease in power.

As described above, in the present invention the matching unit 6 of a general speech recognition device such as that shown in FIG. 1 is configured as shown in FIG. 2. First, the voice section is cut out of the microphone signal and features are extracted. The voice section may be obtained by a method such as taking the interval from the moment the voice power exceeds a fixed value until it falls below it again, and the feature extraction unit may use frequency analysis by a bank of band-pass filters or the like. When standard patterns are created, the outputs of these band-pass filters are sampled every 10 ms at about 12 to 16 bits and stored. At recognition time, the unknown input pattern obtained by feature extraction is matched against the registered standard patterns, and the most similar one is output as the recognition result. Matching may use any known method, such as dynamic programming based on inter-frame distances (DP matching). Let the number of band-pass filters be n, the input pattern be a_i, and the standard pattern be b_i; the inter-frame distance is then given by: [equation missing in the source].
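The power-threshold rule for cutting out the voice section, as described above, can be sketched roughly as follows. This is a minimal illustration, not the patent's circuit: the function name `detect_voice_section`, the per-frame power list, and the threshold value are all assumptions introduced here, and the sketch returns only the first run of frames above the threshold.

```python
from typing import List, Optional, Tuple


def detect_voice_section(
    frame_powers: List[float], threshold: float
) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices of the first voice section.

    The section starts at the first frame whose power exceeds `threshold`
    and ends (exclusive) at the first later frame whose power has dropped
    back to `threshold` or below. Returns None if no frame exceeds it.
    """
    start: Optional[int] = None
    for i, power in enumerate(frame_powers):
        if start is None:
            if power > threshold:
                start = i          # power rose above the fixed value
        elif power <= threshold:
            return (start, i)      # power fell back below the fixed value
    if start is not None:
        # The signal ended while the power was still above the threshold.
        return (start, len(frame_powers))
    return None


# Example with one frame per 10 ms (hypothetical power values).
section = detect_voice_section([0.1, 0.2, 1.5, 2.0, 1.2, 0.3, 0.1], threshold=1.0)
```

In the example, `section` covers frames 2 through 4, and only those frames would be passed on to matching.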

During DP matching, the inter-frame distance is determined and multiplied by the frame power, which yields a distance proportional to the frame power. Frames with smaller power then carry a larger weight, so recognition that focuses on the low-power portions becomes possible.
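A rough sketch of DP matching with a power-scaled inter-frame distance is given below. Several parts are assumptions made for illustration, because the source does not preserve them: the inter-frame distance is taken to be the city-block distance over the n filter channels, the frame power is taken to be the sum of the input frame's filter outputs, and the DP recursion uses the simplest symmetric step pattern. The names `frame_distance`, `frame_power`, `dp_match`, and `recognize` are hypothetical.

```python
from typing import Dict, List, Sequence

Frame = Sequence[float]  # one frame = n band-pass filter outputs


def frame_distance(a: Frame, b: Frame) -> float:
    """Assumed inter-frame distance: city-block distance over the channels."""
    return sum(abs(x - y) for x, y in zip(a, b))


def frame_power(a: Frame) -> float:
    """Assumed frame power: sum of the filter outputs of the frame."""
    return sum(a)


def dp_match(inp: List[Frame], ref: List[Frame]) -> float:
    """Accumulated power-scaled distance along the best DP alignment path.

    Both patterns must contain at least one frame.
    """
    inf = float("inf")
    rows, cols = len(inp), len(ref)
    g = [[inf] * cols for _ in range(rows)]
    for i in range(rows):
        weight = frame_power(inp[i])  # frame power multiplication
        for j in range(cols):
            d = frame_distance(inp[i], ref[j]) * weight
            if i == 0 and j == 0:
                best_prev = 0.0
            else:
                best_prev = min(
                    g[i - 1][j] if i > 0 else inf,
                    g[i][j - 1] if j > 0 else inf,
                    g[i - 1][j - 1] if i > 0 and j > 0 else inf,
                )
            g[i][j] = d + best_prev
    return g[rows - 1][cols - 1]


def recognize(inp: List[Frame], templates: Dict[str, List[Frame]]) -> str:
    """Return the name of the standard pattern with the lowest distance."""
    return min(templates, key=lambda name: dp_match(inp, templates[name]))
```

Note that with plain multiplication an input frame whose power is 0 contributes a distance of 0 whatever the standard pattern contains.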

Distance is used here; when similarity is used instead, the step that multiplies by the frame power should divide by it. In addition, multiplying by the power makes the distance 0 wherever the output is 0, which causes a problem. To avoid this, the power may be subtracted from a value α that is larger than the maximum power, and the inter-frame distance divided by the result.
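The two weightings can be written side by side to show why the second avoids the zero-distance problem. The symbols here are notation introduced for illustration and do not appear in the source: d is the unweighted inter-frame distance, P the frame power, and P_max the maximum power.

```latex
\begin{align*}
d_{\times} &= P \cdot d
  &&\Rightarrow\quad d_{\times} = 0 \ \text{when } P = 0, \text{ regardless of } d, \\
d_{\alpha} &= \frac{d}{\alpha - P}, \quad \alpha > P_{\max}
  &&\Rightarrow\quad d_{\alpha} = \frac{d}{\alpha} > 0 \ \text{when } P = 0 \text{ and } d > 0.
\end{align*}
```

Because α exceeds the maximum power, the denominator α − P is always positive, so the division is well defined for every frame.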

FIG. 3 is a configuration diagram of the main part of an embodiment for the case described above. In the figure, 11 is a power calculation unit, 12 is an (α − power) unit, and 13 is a division unit. In this embodiment, as illustrated, the subtraction unit 12 subtracts the power from α, which is larger than the maximum power, and the division unit 13 divides the inter-frame distance by the subtracted value as described above.
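The three blocks of FIG. 3 might be sketched as one small function each, as below. This is only an illustrative sketch: the source does not say how the power calculation unit derives the power or which frame it uses, so the sum of the input frame's filter outputs is assumed, and the unweighted inter-frame distance is again assumed to be a city-block distance. All function names are hypothetical.

```python
from typing import Sequence

Frame = Sequence[float]  # one frame = n band-pass filter outputs


def power_calculation(frame: Frame) -> float:
    """Unit 11 (assumed form): frame power as the sum of the filter outputs."""
    return sum(frame)


def alpha_minus_power(alpha: float, power: float) -> float:
    """Unit 12: subtract the power from alpha (alpha must exceed the power)."""
    if alpha <= power:
        raise ValueError("alpha must be larger than the maximum frame power")
    return alpha - power


def division(distance: float, denominator: float) -> float:
    """Unit 13: divide the inter-frame distance by the subtracted value."""
    return distance / denominator


def weighted_frame_distance(inp: Frame, ref: Frame, alpha: float) -> float:
    """Inter-frame distance divided by (alpha - power of the input frame)."""
    # Assumed unweighted distance: city-block distance over the channels.
    d = sum(abs(x - y) for x, y in zip(inp, ref))
    p = power_calculation(inp)
    return division(d, alpha_minus_power(alpha, p))
```

With this form a zero-power input frame still yields a nonzero weighted distance whenever the two frames differ, which is the case the plain multiplication cannot handle.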

Effect: As is clear from the above description, according to the present invention a large weight is given to consonant portions, which makes highly accurate speech recognition possible.

[Brief Description of the Drawings]

FIG. 1 is a diagram showing an example of a speech recognition device to which the present invention is applied, FIG. 2 is a detailed diagram of the matching unit shown in FIG. 1, and FIG. 3 is a configuration diagram of the main part of another embodiment of the present invention.

1 ... microphone, 2 ... voice section detection unit, 3 ... feature extraction unit, 4 ... pattern storage unit, 5 ... changeover switch, 6 ... matching unit, 7 ... result output unit, 8 ... DP matching unit, 9 ... inter-frame distance unit, 10 ... frame power multiplication unit, 11 ... power calculation unit, 12 ... (α − power) unit, 13 ... division unit.

Claims

[Claims]

A speech recognition device comprising a voice collection unit, a feature conversion unit, a voice section detection unit, and a standard pattern storage unit, which converts the voice obtained by the voice collection unit into features in the feature conversion unit, extracts only the portion belonging to the voice section, determines its similarity (distance) to standard patterns registered in advance, and outputs the pattern with the highest similarity (lowest distance) as the recognition result, characterized in that the power of the voice is determined and the similarity is calculated with weights that bear an inverse (direct) relationship to the increase or decrease in power.