JPS6091397A

JPS6091397A - Voice recognition equipment

Info

Publication number: JPS6091397A
Application number: JP58200188A
Authority: JP
Inventors: 中谷　奉文
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1983-10-26
Filing date: 1983-10-26
Publication date: 1985-05-22

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

Detailed Description of the Invention

Technical Field

The present invention relates to a speech recognition device that can recognize speech efficiently.

Prior Art

In a speech recognition device, when several speech inputs arrive at a single recognition device at overlapping times, the conventional approach has been to record them temporarily on a recording medium such as a tape recorder and then, once time becomes available, play them back and recognize them one after another. This method has the drawback that playback takes time, which limits the amount of speech that can be recognized.

Purpose

The present invention has been made to eliminate the drawbacks of the prior art described above. In particular, its object is to provide a speech recognition device that, when many inputs overlap in time, plays back the speech recorded by a recorder or similar means faster than it was recorded and thereby recognizes it efficiently.

Configuration

The configuration of the present invention is described below on the basis of an embodiment.

The present invention was made in view of the circumstances described above. In particular, it plays back recorded speech at a speed higher than the recording speed so that the input speech can be recognized in a time-efficient manner. To make the explanation easier to follow, the case of playback and recognition at double speed is described below.

The figure is an electrical block diagram for explaining one embodiment of the present invention. In the figure, 1 is a microphone; 2 is a means for playing back, at double speed, speech recorded on, for example, a tape recorder; 3 is a switch for selecting between these two inputs; 4 is a pre-emphasis circuit for compensating for the acoustic attenuation characteristic of the human voice; 5 is a BPF section that extracts the power spectrum, which serves as the feature parameter, using for example a group of BPFs (band-pass filters); 6 and 7 are a detection section and a smoothing section for extracting power from the output of the filters 5, the smoothing section being composed of LPFs (low-pass filters); 8 is an A/D converter that quantizes the power spectrum; and 9 is a switch that routes the feature pattern consisting of the quantized power spectrum to either the dictionary section 10 or the matching section 11 in the next stage. The dictionary 10 stores the feature patterns of registered speech, and the matching section 11 matches the input speech against the feature patterns in the dictionary 10. 12 is a determination section that determines, from the matching result, what the input speech is, and 14 is a control section that, in response to the switching signal 13 from the switch 3, sets the parameters of the pre-emphasis circuit 4 through the A/D converter 8 for either double speed or normal speed.
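The back end of this signal path (switch 9, dictionary 10, matching section 11, determination section 12) and the role of the control section 14 can be sketched in code. This is a minimal illustration only: the class and function names are hypothetical, and the squared-Euclidean nearest-pattern match is an assumption, since the text does not say how the matching section compares patterns.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class Recognizer:
    """Toy model of switch 9, dictionary 10, matching section 11 and determination section 12."""

    dictionary: Dict[str, List[float]] = field(default_factory=dict)

    def register(self, label: str, pattern: Sequence[float]) -> None:
        # Switch 9 in the "dictionary" position: store the feature pattern.
        self.dictionary[label] = list(pattern)

    def recognize(self, pattern: Sequence[float]) -> Optional[str]:
        # Switch 9 in the "matching" position: compare the input pattern with
        # every stored pattern and report the closest label.
        # Assumption: patterns have equal length and are compared by squared
        # Euclidean distance; the source does not specify the measure.
        if not self.dictionary:
            return None

        def distance(label: str) -> float:
            stored = self.dictionary[label]
            return sum((a - b) ** 2 for a, b in zip(stored, pattern))

        return min(self.dictionary, key=distance)


def select_speed_factor(source: str) -> int:
    """Control section 14: derive the speed factor from switching signal 13.

    The string values are hypothetical stand-ins for the two positions of switch 3.
    """
    if source == "microphone":
        return 1  # direct input, normal-speed parameters
    if source == "playback":
        return 2  # recorded input reproduced at double speed
    raise ValueError(f"unknown input source: {source!r}")
```

Because the dictionary is filled through the same feature-extraction path, the matcher itself never needs to know which input was selected; only the extraction parameters change.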

Table 1 shows the parameters for normal speed and for double speed. At double speed, the playback time is half the recording time, and the frequencies of the reproduced signal are shifted up by a factor of two. The parameters are therefore set so as to process this frequency-doubled signal.
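This relationship can be written out explicitly. The symbols are introduced here for illustration and do not appear in the original: x(t) is the recorded signal of duration T, y(t) is the signal reproduced at n times the recording speed (n = 2 in this embodiment), and X(f), Y(f) are their Fourier transforms.

```latex
y(t) = x(nt), \qquad 0 \le t \le \frac{T}{n},
\qquad\qquad
Y(f) = \frac{1}{n}\, X\!\left(\frac{f}{n}\right)
```

A component at frequency f_0 in the recording therefore appears at n f_0 in the reproduced signal, while the duration shrinks from T to T/n.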

Table 1

As is clear from Table 1, the setting values of the pre-emphasis circuit 4, the BPF group 5, the LPF section 7 and the A/D converter 8 are each twice their normal-speed values. The frame time, however, is half its normal-speed value.

If the parameters are set as described above, the number of samples per unit time doubles but the playback time is halved, so the total amount of data is unchanged. The feature-pattern data registered in the dictionary 10 at normal speed and the data processed at double speed therefore correspond one to one and can be matched as they are.
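The bookkeeping behind this statement can be checked with a short calculation. The sketch below uses hypothetical numbers (a 10 kHz sampling rate, 10 ms frames and a 1.2 s utterance are assumptions, since the values of Table 1 are not reproduced in this text) and exact fractions to avoid rounding.

```python
from fractions import Fraction


def frame_layout(sample_rate_hz, frame_time_s, duration_s):
    """Return (samples per frame, number of frames) for one utterance."""
    samples_per_frame = sample_rate_hz * frame_time_s
    frames = duration_s / frame_time_s
    return samples_per_frame, frames


# Hypothetical normal-speed settings: 10 kHz sampling, 10 ms frames, 1.2 s of speech.
normal = frame_layout(Fraction(10_000), Fraction(10, 1000), Fraction(12, 10))

# Double speed: sampling rate doubled, frame time and playback time halved.
double = frame_layout(Fraction(20_000), Fraction(5, 1000), Fraction(6, 10))

# Both layouts have the same samples per frame and the same number of frames.
print(normal == double)
```

Since both the samples per frame and the frame count are unchanged, each frame of the double-speed pattern lines up with exactly one frame of the pattern registered at normal speed.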

Although double speed has been used as the example above, the present invention is not limited to this embodiment. For a playback speed of n times, for example, the playback time and the frame time become 1/n and the other parameters in Table 1 are multiplied by n, which naturally allows recognition at n times the speed.
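The n-times rule can be expressed as a small helper. This is a sketch under assumptions: the field names and the reading of each "setting value" as a frequency (corner frequency, center frequencies, cutoff, sampling rate) are hypothetical, because Table 1 itself is not reproduced in this text; the numbers in the usage example are likewise made up.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ExtractionParams:
    """Hypothetical parameter set for blocks 4, 5, 7 and 8 plus the frame time."""

    preemphasis_corner_hz: float        # pre-emphasis circuit 4
    bpf_center_hz: Tuple[float, ...]    # BPF group 5
    lpf_cutoff_hz: float                # smoothing section (LPF) 7
    adc_sample_rate_hz: float           # A/D converter 8
    frame_time_s: float


def scale_for_playback(p: ExtractionParams, n: float) -> ExtractionParams:
    """Multiply the frequency-type settings by n and divide the frame time by n."""
    if n <= 0:
        raise ValueError("playback speed factor must be positive")
    return ExtractionParams(
        preemphasis_corner_hz=p.preemphasis_corner_hz * n,
        bpf_center_hz=tuple(f * n for f in p.bpf_center_hz),
        lpf_cutoff_hz=p.lpf_cutoff_hz * n,
        adc_sample_rate_hz=p.adc_sample_rate_hz * n,
        frame_time_s=p.frame_time_s / n,
    )


# Usage with made-up normal-speed values.
normal_params = ExtractionParams(500.0, (250.0, 500.0, 1000.0), 25.0, 100.0, 0.01)
double_params = scale_for_playback(normal_params, 2)
```

With n = 1 the function returns the normal-speed settings unchanged, so the same code path can serve both positions of switch 3.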

Also, although the above describes an example that uses a BPF group to capture the speech, the parameters are set in the same way when the input speech is quantized immediately and the processing from pre-emphasis onward is implemented with digital filters or the like. Needless to say, recognition using feature parameters other than those described above (for example autocorrelation coefficients, PARCOR coefficients, or zero-crossing counts) can be handled in exactly the same way.
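As an illustration of one of these alternative features, the sketch below counts zero crossings per frame for a test tone at normal speed and for the same tone reproduced at n times the speed. It is a simplified digital model, not the circuit of the embodiment: n-times playback is modeled as a tone at n times the frequency, and the tone frequency, sampling rate and frame length are hypothetical values.

```python
import math


def sample_tone(freq_hz, sample_rate_hz, num_samples):
    # A fixed phase offset keeps samples from landing exactly on zero.
    return [
        math.sin(2 * math.pi * freq_hz * k / sample_rate_hz + 0.3)
        for k in range(num_samples)
    ]


def zero_crossings_per_frame(samples, samples_per_frame):
    """Count sign changes between neighboring samples inside each full frame."""
    counts = []
    for start in range(0, len(samples) - samples_per_frame + 1, samples_per_frame):
        frame = samples[start:start + samples_per_frame]
        counts.append(sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)))
    return counts


n = 2  # playback speed factor

# Normal speed: 440 Hz tone, 8 kHz sampling, 80 samples per frame (10 ms frames).
normal_counts = zero_crossings_per_frame(sample_tone(440.0, 8000.0, 800), 80)

# n-times playback: frequency and sampling rate scaled by n. The frame still
# holds 80 samples, which now span 10 ms / n.
fast_counts = zero_crossings_per_frame(sample_tone(440.0 * n, 8000.0 * n, 800), 80)

print(normal_counts == fast_counts)
```

The two count sequences agree frame by frame, which is the one-to-one correspondence the description relies on when it says other feature parameters can be handled the same way.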

Effect

As is clear from the above description, according to the present invention, even when recorded speech is played back faster than it was recorded, merely changing the setting parameters of the speech capture stage allows accurate recognition using the registered dictionary and the same matching operation as at normal speed. A speech recognition device that recognizes efficiently can thus be provided.

[Brief explanation of the drawing]

The figure is an electrical block diagram for explaining one embodiment of the present invention.

Description of symbols: 1: microphone, 2: playback device, 4: pre-emphasis circuit, 5: BPF group, 6: detection section, 7: smoothing section (LPF), 8: A/D converter, 10: dictionary section, 11: matching section, 12: determination section, 14: control section.

Claims

(1) A speech recognition device comprising an extraction section that extracts feature parameters of a speech signal, a dictionary section that stores the feature parameters of speech to be registered, a matching section that matches the feature parameters of input speech against the feature parameters stored in the dictionary section, and a determination section that identifies the input speech from the matching result, characterized in that the device switches between direct speech input and speech input obtained by playing back a previously recorded speech signal, and recognizes each.

(2) The speech recognition device according to claim (1), further comprising a control means for obtaining a control signal from a selection signal for switching between direct speech input and recorded speech input.

(3) The speech recognition device according to claim (2), characterized in that the extraction parameters of the feature extraction section are changed according to the playback speed (n times) of the recorder by a control signal from the control means.