JPS6152698A

JPS6152698A - Voice recognition equipment

Info

Publication number: JPS6152698A
Application number: JP59174141A
Authority: JP
Inventors: Toshihiro Kimura (木村 俊宏)
Original assignee: Hitachi Electronics Engineering Co Ltd
Current assignee: Hitachi High Tech Corp
Priority date: 1984-08-23
Filing date: 1984-08-23
Publication date: 1986-03-15

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] [Industrial Field of Application] The present invention relates to an apparatus that receives a human voice signal and recognizes the data expressed by that voice. In particular, it relates to such an apparatus designed to prevent misrecognition caused by meaningless filler words such as "ē" ("uh") and "ēto" ("um") mixed into the input.

[Prior Art]

As one way of improving the man-machine interface, speech recognition devices that recognize speech pronounced word by word, with pauses between the words, have recently reached practical use. For example, in bank cash-card services, devices have been put into practical use that recognize a depositor's personal identification number (PIN) from the depositor's spoken input when the card's PIN is being verified.

Such a speech recognition device holds in advance, according to its field of use, standard patterns of the speech spectrum for various specific words (for example, digits). When a speech signal is input, the device performs spectral analysis on it to obtain spectrum data, and sequentially compares the spectrum data with every one of the standard patterns. It then selects, from among all the standard patterns, the one most similar to the spectrum of the input speech signal, and outputs data indicating the predetermined word corresponding to that pattern as speech recognition data (data indicating the word recognized from the input speech).
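The conventional procedure is, in effect, nearest-template classification over a fixed vocabulary. The sketch below is illustrative only: it assumes each standard pattern is a single fixed-length spectral vector and uses negative Euclidean distance as the similarity measure, neither of which the text specifies, and the values in the hypothetical `DIGIT_PATTERNS` table are invented.

```python
import math

# A standard pattern is modeled as one fixed-length spectral vector (an assumption;
# real devices store a sequence of spectra per word).
Pattern = list[float]


def similarity(a: Pattern, b: Pattern) -> float:
    """Assumed similarity measure: negative Euclidean distance (higher = more similar)."""
    return -math.dist(a, b)


def recognize_conventional(spectrum: Pattern, patterns: dict[str, Pattern]) -> str:
    """Compare the input with every standard pattern and return the best-matching word."""
    return max(patterns, key=lambda word: similarity(spectrum, patterns[word]))


# Hypothetical standard patterns for three digits.
DIGIT_PATTERNS = {
    "2": [0.9, 0.1, 0.3],
    "3": [0.2, 0.8, 0.5],
    "6": [0.4, 0.4, 0.9],
}

print(recognize_conventional([0.85, 0.15, 0.3], DIGIT_PATTERNS))  # prints 2
```

Note that `max` always returns some vocabulary word: there is no way for this recognizer to answer "none of the above".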

[Problems to Be Solved by the Invention] In such a conventional speech recognition device, however, the comparison and selection described above are performed not only when speech expressing the required data is input but also when other, meaningless speech is input. The standard pattern with the highest relative similarity is then selected, and the word prepared in advance for that pattern is misrecognized as the input data expressed by the speech.

Taking the cash-card service above as an example, suppose a depositor entering a PIN by voice says "ēto (um), 2, 6, 1, 5". The device performs the comparison described above on the initial "ēto" as well, and selects the digit whose standard pattern has the highest relative similarity to the spectrum of "ēto" (suppose it is "3"). As a result, the depositor's speech is recognized as "3, 2, 6, 1, 5", and the first four digits, "3, 2, 6, 1", are taken to be the depositor's PIN. Besides meaningless words such as "ēto", the same phenomenon can occur when a cough or certain noises are detected.
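The arithmetic of this failure can be traced with a toy model. In the hypothetical `FORCED_LABEL` table below, each utterance is replaced by the vocabulary word a conventional recognizer would be forced to choose; mapping "eto" to "3" follows the assumption made in the text, and taking the first four recognized digits as the PIN mirrors the example.

```python
# Hypothetical forced labels: a conventional recognizer must map every utterance
# to some vocabulary word, so the filler "eto" becomes its nearest digit ("3").
FORCED_LABEL = {"eto": "3", "2": "2", "6": "6", "1": "1", "5": "5"}


def conventional_pin(utterances: list[str], pin_length: int = 4) -> str:
    """Label each utterance, then take the first pin_length digits as the PIN."""
    digits = [FORCED_LABEL[u] for u in utterances]
    return "".join(digits[:pin_length])


print(conventional_pin(["eto", "2", "6", "1", "5"]))  # prints 3261, not the intended 2615
```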

As described above, conventional speech recognition devices have the problem that the input of meaningless sounds, other than speech expressing the required data, causes the data to be misrecognized.

The present invention was made in view of the above points, and its object is to provide a speech recognition device capable of recognizing data accurately.

[Means and Operation for Solving the Problems] In addition to the standard patterns of the speech spectrum for the predetermined data of the relevant field of use, the speech recognition device according to the present invention holds in a storage means standard patterns of the speech spectrum for meaningless words that people tend to utter, and for meaningless sounds such as coughs and noise. In the comparison calculation device, the captured spectrum is sequentially compared not only with the standard patterns for the data but also with all the standard patterns for these meaningless words and sounds, and a degree of similarity is obtained for each.

When the determining means selects one of the standard patterns for the predetermined data as having the greatest similarity to a given spectralized utterance, it outputs the data represented by that standard pattern to a host computer or the like. When, however, it selects one of the standard patterns for the meaningless words as having the greatest similarity, it ignores the meaningless word represented by that pattern and outputs nothing.

Consequently, even if a speaker utters a meaningless word whose standard pattern is stored in the storage means while entering predetermined data by voice, that word is ignored, and only the correct data that was spoken is output from the determining means.

[Embodiment]

One embodiment of the present invention is described in detail below with reference to the accompanying drawings.

In FIG. 1, a speech analysis section 1 sequentially analyzes the waveform of the detected speech to obtain spectrum data of the speech. The analysis section 1 analyzes the waveform at a rate of, for example, 800 to 1600 bytes per second. The analysis section 1 normally converts the waveform into a digital signal before analyzing it, but it may instead analyze the analog waveform directly.
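The text does not say how section 1 computes its spectra. As a stand-in, this sketch assumes an already digitized waveform and applies a plain discrete Fourier transform frame by frame; `frame_spectra`, `frame_len`, and `hop` are hypothetical names, not from the source.

```python
import cmath
import math


def frame_spectra(samples: list[float], frame_len: int, hop: int) -> list[list[float]]:
    """Split a digitized waveform into frames and return each frame's magnitude spectrum.

    Uses a direct DFT (O(n^2) per frame) for readability.
    """
    spectra = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        bins = []
        # Keep bins 0..frame_len/2 only: for a real signal the upper bins mirror them.
        for k in range(frame_len // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(frame))
            bins.append(abs(acc))
        spectra.append(bins)
    return spectra


# A cosine completing 2 cycles per 8-sample frame peaks in bin 2 of each frame.
tone = [math.cos(2 * math.pi * 2 * n / 8) for n in range(16)]
spec = frame_spectra(tone, frame_len=8, hop=8)
print(len(spec), max(range(len(spec[0])), key=lambda k: spec[0][k]))  # prints 2 2
```

The direct DFT is used only for clarity; a fast Fourier transform would give the same magnitudes.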

A data compression section 2 sequentially takes in the speech spectrum data obtained by the speech analysis section 1. The data compression section 2 performs data compression, such as thinning out (decimating) or averaging the spectrum data, in units of a predetermined time interval. This reduces the enormous volume of spectrum data, supplied at 800 to 1600 bytes per second as noted above, to an amount suitable for sequential arithmetic processing. The thinning or averaging is performed within a range that does not impair the characteristics of the speech, for example at a time interval that compresses the spectrum data by a factor of about 10.
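Both compression operations can be sketched in a few lines, assuming the spectrum data arrives as a list of per-frame vectors. The factor of 10 reflects the roughly tenfold compression mentioned above; the function names are hypothetical.

```python
def compress_by_averaging(spectra: list[list[float]], factor: int = 10) -> list[list[float]]:
    """Average every `factor` consecutive spectral frames into one frame.

    A trailing group shorter than `factor` is averaged over its actual length.
    """
    out = []
    for start in range(0, len(spectra), factor):
        group = spectra[start:start + factor]
        out.append([sum(column) / len(group) for column in zip(*group)])
    return out


def compress_by_thinning(spectra: list[list[float]], factor: int = 10) -> list[list[float]]:
    """Keep only every `factor`-th spectral frame (decimation)."""
    return spectra[::factor]


frames = [[float(i), 1.0] for i in range(20)]
print(compress_by_averaging(frames))  # prints [[4.5, 1.0], [14.5, 1.0]]
print(compress_by_thinning(frames))   # prints [[0.0, 1.0], [10.0, 1.0]]
```

Averaging smooths frame-to-frame fluctuation, whereas thinning is cheaper but simply discards frames.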

The comparison calculation section 3 sequentially takes in the spectrum data thinned out or averaged by the data compression section 2 and compares it with standard patterns stored in advance. It consists of a microcomputer or a dedicated hardware device.

According to this invention, a memory 4 and a memory 5 are provided for storing standard patterns. Memory 4 stores the standard spectral patterns of speech representing the data required in the field of use. For train reservations, for example, memory 4 stores standard patterns for digits, train names, the distinction between ordinary cars and Green (first-class) cars, station names, and so on. Memory 5, on the other hand, stores standard spectral patterns for sounds that do not represent the data but are likely to occur during data input. Memory 5 stores, for example, standard spectral patterns for meaningless words that people commonly tend to utter, such as "ēto" ("um"), "ē" ("uh"), and "anō" ("er"), as well as for coughs, noise specific to the place of use, and the like.

When spectrum data begins to arrive from the data compression section 2, the comparison calculation section 3 reads all of these standard patterns in turn from memories 4 and 5 and sequentially compares the spectrum data with every standard pattern. As a result, the comparison calculation section 3 sequentially outputs signals indicating the similarity between the spectrum data and each standard pattern.
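A minimal sketch of this exhaustive comparison, assuming toy spectral vectors for the contents of memories 4 and 5 (the `MEMORY_4` and `MEMORY_5` entries are invented) and negative Euclidean distance as the similarity signal. Each result is tagged with the memory it came from, so that a later decision stage can tell data patterns from meaningless ones.

```python
import math

Pattern = list[float]

# Hypothetical memory contents; real entries would be spectral templates of recorded speech.
MEMORY_4 = {"Hikari": [0.9, 0.2, 0.1], "217": [0.1, 0.9, 0.3], "Shin-Osaka": [0.3, 0.3, 0.9]}
MEMORY_5 = {"ē": [0.5, 0.5, 0.5], "anō": [0.6, 0.1, 0.6], "cough": [0.0, 0.0, 0.0]}


def similarity(a: Pattern, b: Pattern) -> float:
    """Assumed similarity measure: negative Euclidean distance."""
    return -math.dist(a, b)


def compare_all(spectrum: Pattern) -> list[tuple[str, int, float]]:
    """Return (word, memory number, similarity) for every stored standard pattern."""
    scores = []
    for memory_no, memory in ((4, MEMORY_4), (5, MEMORY_5)):
        for word, pattern in memory.items():
            scores.append((word, memory_no, similarity(spectrum, pattern)))
    return scores


for word, memory_no, score in compare_all([0.9, 0.2, 0.1]):
    print(f"memory {memory_no}: {word:<10} {score:.3f}")
```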

The output signals of the comparison calculation section 3 are sequentially taken into a determination section 6. The determination section 6 consists of a microcomputer or a dedicated hardware device and, based on these signals, selects the standard pattern with the greatest similarity to the spectrum data of each input utterance. If the selected standard pattern is one stored in memory 4 (that is, one that represents meaningful data), the determination section 6 outputs data indicating the corresponding word (which may be a single word, a phrase, or the like) to a host computer or the like (not shown) as the speech recognition data for the input speech. If the selected standard pattern is one stored in memory 5 (that is, one corresponding to a meaningless word), the determination section 6 ignores it and outputs no speech recognition data.

Incidentally, if the comparison calculation section 3 and the determination section 6 are implemented by a microcomputer, the program to be executed by that microcomputer is outlined in FIG. 2.

Steps 7 and 8 correspond to the functions of the comparison calculation section 3. In step 7, the spectrum data supplied from the data compression section 2 is sequentially compared with the true standard patterns read from memory 4, and the similarity to each is obtained.

When step 7 is completed, the program proceeds to step 8.

In step 8, the spectrum data is sequentially compared with the standard patterns for meaningless words and the like read from memory 5, and the similarity to each is obtained. When step 8 is completed, the program proceeds to step 9.

Steps 9 to 11 correspond to the functions of the determination section 6. In step 9, the standard pattern with the greatest similarity to the spectrum data is selected based on the comparison results of steps 7 and 8.

Next, in step 10, it is determined whether the standard pattern selected in step 9 belongs to the standard patterns for meaningless words and the like stored in memory 5. If it does, step 10 yields YES and the program returns immediately (so no speech recognition data is output to the host computer). If it does not (that is, if it belongs to the true standard patterns stored in memory 4), step 10 yields NO and the program proceeds to step 11, where the speech recognition data corresponding to the standard pattern selected in step 9 is output to the host computer. When step 11 is completed, the program returns.

Next, an example of the operation of the speech recognition device according to the present invention is described using a concrete case. Suppose that, when making a train reservation, the person making the reservation says "Hikari, ē, No. 217" when entering the train name by voice, and "anō, Shin-Osaka" when entering the station name. This speech is converted into spectra and thinned out or averaged by the speech analysis section 1 and the data compression section 2. The comparison calculation section 3 then compares it with all the standard patterns stored in memories 4 and 5 for digits, train names, station names, meaningless words, and so on, and sends signals indicating the resulting similarities to the determination section 6.

For the speech spectra of "Hikari", "No. 217", and "Shin-Osaka", the determination section 6 selects as most similar the train-name data "Hikari" and "No. 217" and the station-name data "Shin-Osaka", whose standard patterns are stored in memory 4, and outputs them externally. For the speech spectra of "ē" and "anō", on the other hand, it selects as most similar the meaningless words "ē" and "anō", whose standard patterns are stored in memory 5, and ignores them. Consequently, this speech recognition outputs the correct data: "Hikari No. 217" for the train name and "Shin-Osaka" for the station name.
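The reservation example can be replayed end to end with toy data. The spectra and templates below are invented four-element vectors chosen so that each utterance lies closest to its intended word, and similarity is again assumed to be negative Euclidean distance. The `reject_fillers` flag is a hypothetical switch that drops memory 5 from the comparison, reproducing the conventional behavior for contrast.

```python
import math

# Hypothetical templates: memory 4 holds data words, memory 5 holds meaningless words.
MEMORY_4 = {"Hikari": [1.0, 0.0, 0.0, 0.0], "No. 217": [0.0, 1.0, 0.0, 0.0],
            "Shin-Osaka": [0.0, 0.0, 1.0, 0.0]}
MEMORY_5 = {"ē": [0.0, 0.0, 0.0, 1.0], "anō": [0.5, 0.0, 0.0, 1.0]}

# Toy spectra for "Hikari", "ē", "No. 217", "anō", "Shin-Osaka", in spoken order.
UTTERANCES = [
    [0.9, 0.1, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.9],
    [0.0, 0.95, 0.05, 0.0],
    [0.45, 0.0, 0.05, 0.9],
    [0.0, 0.1, 0.9, 0.0],
]


def recognize(utterances: list[list[float]], reject_fillers: bool = True) -> list[str]:
    """Output the best-matching word per utterance, skipping memory-5 matches."""
    results = []
    for spectrum in utterances:
        candidates = [(-math.dist(spectrum, p), w, False) for w, p in MEMORY_4.items()]
        if reject_fillers:
            candidates += [(-math.dist(spectrum, p), w, True) for w, p in MEMORY_5.items()]
        _, word, is_filler = max(candidates, key=lambda c: c[0])
        if not is_filler:
            results.append(word)
    return results


print(recognize(UTTERANCES))                        # ['Hikari', 'No. 217', 'Shin-Osaka']
print(recognize(UTTERANCES, reject_fillers=False))  # fillers become 'Hikari'
```

With memory 5 disabled, "ē" and "anō" are forced onto the nearest data word (here "Hikari"), which is the misrecognition the invention is meant to prevent.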

In this way, when data whose standard patterns are stored in memory 4 is entered by voice, the speech recognition device ignores meaningless words and the like, recognizes the spoken data accurately, and outputs it externally.

Memories 4 and 5 need not be physically separate memories: both the standard patterns corresponding to the true data and the standard patterns for meaningless words and the like may be stored in a single memory hardware circuit.

[Effect of the Invention]

As described above, the speech recognition device according to the present invention can recognize the true input data accurately, unaffected by meaningless words and the like, even when such words are uttered during spoken data entry. Moreover, because the spoken data entry need not be repeated when a meaningless word is uttered, the overall data input time can be shortened. This makes it possible to improve the service in the various service operations that use speech recognition devices.

[Brief Description of the Drawings]

FIG. 1 is a schematic block diagram of an embodiment of the speech recognition device according to the present invention, and FIG. 2 is a flowchart outlining the program to be executed by a microcomputer when a microcomputer is used as the comparison calculation section and the determination section in that embodiment.

1: speech analysis section; 2: data compression section; 3: comparison calculation section; 4, 5: memories; 6: determination section.

Claims

[Claims] A speech recognition device comprising: speech analysis means for sequentially analyzing an input speech signal to obtain spectrum data of the speech; first storage means storing standard patterns of the speech spectrum for a plurality of specific words according to the field of use; comparison calculation means for receiving the spectrum data analyzed by the speech analysis means, comparing the spectrum data with all the standard patterns stored in the storage means, and calculating their degrees of similarity; and determination means for selecting, based on the output of the comparison calculation means, the standard pattern with the greatest similarity and outputting speech recognition data corresponding to that standard pattern; the speech recognition device being characterized in that it further comprises second storage means storing standard patterns of the speech spectrum for each of a predetermined plurality of easily uttered meaningless words and other meaningless sounds, in that the comparison calculation means further compares the spectrum data analyzed by the speech analysis means with the standard patterns of all the meaningless words and the like stored in the second storage means, and in that the determination means does not output the speech recognition data when it selects a standard pattern of a meaningless word or the like as having the greatest similarity.