JPS58224399A

JPS58224399A - Voice recognition equipment

Info

Publication number: JPS58224399A
Application number: JP57108741A
Authority: JP
Inventors: 宇佐美　隆一; 横溝　信一; 松本　正至; 三郎安藤; 新家　修
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1982-06-24
Filing date: 1982-06-24
Publication date: 1983-12-26

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

Technical Field of the Invention

The present invention relates to a speech recognition device, and in particular to the registration of speech parameters in its speech dictionary.

Background of the Technology

Speech recognition generally uses a speech dictionary that stores data describing the features of speech (speech parameters). The device computes speech parameters from the input speech and compares them with the speech parameters in the dictionary. It then recognizes the input as the dictionary speech (word) whose parameters match it, that is, whose distance from the input is within a threshold.

A general-purpose dictionary that can handle any speaker (speaker-independent) would be preferable. However, this type has a low recognition rate. Raising that rate would require an enormous amount of stored data and would make the comparisons cumbersome and time-consuming. For these reasons, current practice is to give each speaker a dedicated dictionary.

Such a per-speaker speech dictionary is created as follows:
- The speaker speaks each word they intend to use into a microphone and keys in the corresponding word.
- The recognition device analyzes the spoken input (frequency division, sampling, feature extraction, and so on) to obtain its speech parameters.
- The device stores those parameters in memory together with the keyed-in word.
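The matching step described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation. It assumes the following:
- Each word is reduced to one fixed-length list of floats.
- Distance is the sum of absolute differences.
- The dictionary contents and the threshold are made-up values.

```python
from typing import Dict, List, Optional

Params = List[float]  # assumption: one fixed-length vector per word


def distance(a: Params, b: Params) -> float:
    """Sum of absolute element-wise differences (one simple choice of metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def recognize(inp: Params, dictionary: Dict[str, Params], threshold: float) -> Optional[str]:
    """Return the dictionary word whose parameters lie within `threshold`
    of the input and are closest to it, or None if no word is close enough."""
    scored = [(distance(inp, params), word) for word, params in dictionary.items()]
    within = [(d, word) for d, word in scored if d <= threshold]
    if not within:
        return None
    return min(within)[1]


# Hypothetical per-speaker dictionary: keyed-in word -> stored speech parameters.
speaker_dictionary = {
    "Kanagawa": [1.0, 4.0, 2.0],
    "Kawasaki": [3.0, 1.0, 5.0],
}
print(recognize([1.2, 3.9, 2.1], speaker_dictionary, threshold=1.0))  # Kanagawa
print(recognize([9.0, 9.0, 9.0], speaker_dictionary, threshold=1.0))  # None
```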

Prior Art and Problems

One important application of speech recognition is voice control of machines, devices, systems, and the like. Voice control needs no keyboard or similar operation, so no specialized operator is required, and it is convenient for remote control. At present, however, there is a risk of misrecognition, and a misrecognition can cause a serious accident. Commands used for voice control in particular therefore need a high recognition rate.

As described above, speech recognition computes distances and takes as the answer the entry within the threshold (the entry whose difference lies within the allowable range). However, two or three such entries may appear. There should of course be only one answer, and the rest are errors. Narrowing the allowable range makes multiple answers less likely, but it increases the number of cases with no answer at all. To raise the recognition rate while leaving the threshold unchanged, it suffices to increase the distances between the speech parameters.
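The trade-off between ambiguous answers and missing answers can be made concrete with a toy example. The word names and parameter values are hypothetical. Distance is assumed to be the sum of absolute differences.

The example shows three things:
- A wide threshold returns two candidates.
- A tight threshold returns none.
- Moving the two stored words further apart leaves a single answer, even at the wide threshold.

```python
from typing import Dict, List


def distance(a: List[float], b: List[float]) -> float:
    """Sum of absolute element-wise differences (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def candidates(inp: List[float], dictionary: Dict[str, List[float]], threshold: float) -> List[str]:
    """All dictionary words whose distance from the input is within the threshold."""
    return sorted(w for w, p in dictionary.items() if distance(inp, p) <= threshold)


close = {"word_a": [2.0, 2.0], "word_b": [2.5, 2.0], "word_c": [8.0, 8.0]}
spoken = [2.2, 2.0]  # intended as word_a
for t in (0.5, 0.25, 0.1):
    print(t, candidates(spoken, close, t))
# 0.5 ['word_a', 'word_b']   two answers (ambiguous)
# 0.25 ['word_a']
# 0.1 []                     no answer

# Same threshold, but word_b stored further away from word_a:
separated = {"word_a": [2.0, 2.0], "word_b": [4.0, 2.0], "word_c": [8.0, 8.0]}
print(candidates(spoken, separated, 0.5))  # ['word_a']
```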

Purpose of the Invention

The present invention focuses on this point. When a speech dictionary is created, important utterances such as control commands are arranged to lie at a large distance from all other utterances. Specifically, when an input utterance is too close to another one, the speaker is warned to rephrase it with a different-sounding synonym and register that instead. This keeps the distance between important utterances and the others large, which prevents misrecognition.

Structure of the Invention

The present invention is a speech recognition device comprising:
- an input processing section that extracts the speech parameters of input speech;
- a speech dictionary that stores speech parameters;
- a control section that compares the input speech parameters with those in the dictionary to find the corresponding one.

It is characterized in two ways. First, the speech dictionary has an area that specifies the minimum allowable distance between an entry's own speech parameters and the other registered speech parameters. Second, the device has a warning function that prohibits registering speech parameters whose mutual distance falls within that minimum distance. This is described below with reference to an embodiment.

Embodiment of the Invention

Fig. 1 shows an outline of the structure of a speech dictionary according to the invention.

Its main part is the speech parameter 10, which represents the speech features of one word, such as "Kanagawa" or "Kawasaki".

Specifically, the speaker says "Kanagawa" into a microphone. The resulting sound passes through about 16 band-pass filters, which split the speech band of roughly 200 Hz to 5 kHz into about 16 channels. Each channel's output lasts on the order of one second and is sampled with a clock period of about 10 ms. This yields first, second, third, ... speech parameters representing power, amount of variation, high-band power, and so on, arranged by sampling instant.

In other words, the figure simply says "speech parameter", but it actually consists of first through n-th speech parameters. Each of these varies over the sampling instants t1, t2, ..., tm.
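The framing and band-splitting step can be approximated in software. This is a rough sketch under stated assumptions, not the patent's circuit:
- A plain DFT stands in for the analog band-pass filter bank.
- The sample rate is a hypothetical 16 kHz, and the 16 bands are assumed to be equal-width between 200 Hz and 5 kHz.
- Only per-band energy is computed. Derived parameters such as total power, variation, or high-band power could be built from these rows.

```python
import math
from typing import List


def band_energies(signal: List[float], rate: int = 16000, frame_ms: int = 10,
                  n_bands: int = 16, lo_hz: float = 200.0,
                  hi_hz: float = 5000.0) -> List[List[float]]:
    """Split `signal` into non-overlapping frames of `frame_ms` and return,
    for each frame, the spectral energy in `n_bands` equal-width bands
    between `lo_hz` and `hi_hz` (a DFT stands in for the analog filters)."""
    frame_len = rate * frame_ms // 1000          # 160 samples at 16 kHz / 10 ms
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        bands = [0.0] * n_bands
        for k in range(frame_len // 2 + 1):      # DFT bins up to Nyquist
            freq = k * rate / frame_len
            if not lo_hz <= freq < hi_hz:
                continue
            re = sum(x * math.cos(2 * math.pi * k * n / frame_len) for n, x in enumerate(frame))
            im = -sum(x * math.sin(2 * math.pi * k * n / frame_len) for n, x in enumerate(frame))
            band = min(int((freq - lo_hz) / (hi_hz - lo_hz) * n_bands), n_bands - 1)
            bands[band] += re * re + im * im     # accumulate bin energy into its band
        frames.append(bands)
    return frames


# Usage: a 1 kHz tone lasting 20 ms gives two frames whose energy sits in band 2
# (the 800 to 1100 Hz channel under these hypothetical settings).
tone = [math.sin(2 * math.pi * 1000 * n / 16000) for n in range(320)]
feats = band_energies(tone)
print(len(feats), max(range(16), key=lambda b: feats[0][b]))  # 2 2
```

Each row of the result corresponds to one sampling instant t1, t2, ..., and each column to one channel.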

Item 12 is the speech or word corresponding to the speech parameter 10: "Kanagawa" or "Kawasaki" in the example above. Concretely, it is the address of a table that stores these words. Accessing the table with this item as the address yields the input signal data for voice or display output of "Kanagawa" or "Kawasaki".

Category 14 is the category of the item. Since the items in this example are Kanagawa and Kawasaki, it is a prefecture, a city/town/village, or the like (more precisely, a code indicating it). For example:
- category 1 denotes the group of voice commands such as "start" and "end";
- category 2 denotes the data set of prefecture names;
- category 3 denotes the data set of municipality names.

Categories make speech recognition easier and more reliable. For example, 日本 (nihon, "Japan") and 二本 (nihon, "two [long objects]") are easily confused. Suppose "country name" is stored as the category of the item 日本, and "count" as the category of the item 二本. If the category "country name" is also entered by a key operation or the like when 日本 is spoken, only that category's dictionary speech parameters are extracted and compared at recognition time. 日本 is then obtained as the recognition result simply and quickly.
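A dictionary entry with the fields described so far, and recognition restricted to one category, might look like the sketch below. It assumes the following:
- Field 12 is a plain string rather than a table address.
- Category codes are hypothetical string labels.
- Distance is the sum of absolute differences.
- The parameter values are invented so that the two homophones are almost indistinguishable.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    params: List[float]  # speech parameter (field 10)
    item: str            # item (field 12); the patent stores a table address instead
    category: str        # category code (field 14); string labels are hypothetical


def distance(a: List[float], b: List[float]) -> float:
    """Sum of absolute element-wise differences (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def recognize(inp: List[float], entries: List[Entry], threshold: float,
              category: Optional[str] = None) -> Optional[str]:
    """Return the closest item within `threshold`; when `category` is given,
    only entries of that category are compared."""
    pool = [e for e in entries if category is None or e.category == category]
    best = None
    for e in pool:
        d = distance(inp, e.params)
        if d <= threshold and (best is None or d < best[0]):
            best = (d, e.item)
    return None if best is None else best[1]


dictionary = [
    Entry([3.0, 5.0, 4.0], "日本 (Japan)", "country name"),
    Entry([3.1, 5.0, 4.0], "二本 (two)", "count"),
]
spoken = [3.08, 5.0, 4.0]  # the speaker means "Japan"
print(recognize(spoken, dictionary, 0.5))                  # 二本 (two)
print(recognize(spoken, dictionary, 0.5, "country name"))  # 日本 (Japan)
```

Without a category the nearer homophone wins. With the keyed-in category, the other entry is never compared at all.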

In the present invention, the dictionary is given a minimum distance field 16. It indicates the minimum allowable distance to the other items or speech parameters in the dictionary. If an item (word) being registered is at or below this minimum distance, the recognition device issues a warning. It then instructs the speaker to substitute a word with the same meaning but a different sound.

For example, suppose 主導 (shudō, "initiative") is already registered and 手動 (shudō, "manual") is to be registered. Saying "shudō" falls below the minimum distance, so, following the warning, the speaker rephrases it as "tedō".

Furthermore, the present invention sets a large minimum distance for important items such as control commands, so that similar-sounding utterances cannot be registered. There is then no risk of misrecognition even if the threshold is made fairly large. And because the threshold is large, recognition still succeeds when the speaker's pronunciation varies somewhat.
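The registration check can be sketched as follows. The patent only says that registration is refused when the mutual distance is within the minimum distance. It does not say which entry's minimum applies, so this sketch assumes the larger of the two (the new entry's and the existing entry's). That assumption lets an important command's large minimum distance keep other words away.

The exception class and all parameter values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    item: str
    params: List[float]
    min_distance: float  # minimum-distance field 16


class RegistrationWarning(Exception):
    """Raised when a new utterance is too close to a registered one."""


def distance(a: List[float], b: List[float]) -> float:
    """Sum of absolute element-wise differences (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def register(dictionary: List[Entry], new: Entry) -> None:
    """Append `new` unless it lies within the minimum distance of an existing entry."""
    for old in dictionary:
        limit = max(old.min_distance, new.min_distance)  # assumption: stricter of the two
        d = distance(old.params, new.params)
        if d <= limit:
            raise RegistrationWarning(
                f"{new.item!r} is {d:.2f} from {old.item!r} (minimum {limit}); "
                "please register a synonym that sounds different")
    dictionary.append(new)


dictionary = [Entry("主導 (initiative)", [1.0, 1.0, 1.0], 2.0)]
try:
    register(dictionary, Entry("手動 (manual), said 'shudō'", [1.2, 1.1, 1.0], 0.5))
except RegistrationWarning as w:
    print("warning:", w)
register(dictionary, Entry("手動 (manual), said 'tedō'", [3.0, 2.0, 1.0], 0.5))
print(len(dictionary))  # 2
```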

Fig. 2 shows an example application. At recognition time, the speaker first specifies, by a keyboard operation or the like, that the next input belongs to the "voice command" category. The speaker then says "start" (開始). The recognition device performs recognition using the entered category and the speech. When it determines that this is the "start" instruction, it outputs a prompt indicating that input is accepted, "〜をどうぞ" ("..., please"), either by voice or as text on the CRT screen.

The speaker then keys in data set 1, indicating that the category of the next input is prefecture names, and says "Kanagawa". The recognition device performs recognition using the entered category and speech, obtains "Kanagawa", and asks for confirmation: "Kanagawa, correct?" The speaker specifies the "voice command" category and says "yes". The device thereby knows that the recognition result was correct. It then outputs the prompt "..., please" to request the next spoken input.

The speaker then specifies data set 2 as the category of the next utterance and says "Kawasaki". The procedure continues in the same way.
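The exchange in Fig. 2 can be modelled as a small loop over (category, utterance) turns. This is an illustrative sketch only. The prompt strings, the category labels, and the pass-through `echo` recognizer are hypothetical stand-ins for the device's real recognition step. Only the behavior described above is modelled, so no handling of "no" is included.

```python
from typing import Callable, List, Optional, Tuple

Turn = Tuple[str, str]  # (category keyed in, utterance)


def run_session(turns: List[Turn], recognize: Callable[[str, str], str]) -> List[str]:
    """Replay the Fig. 2 dialogue and return the device's outputs."""
    outputs: List[str] = []
    pending: Optional[str] = None  # data result awaiting confirmation
    for category, utterance in turns:
        word = recognize(category, utterance)
        if category == "voice command":
            if word == "start" or (word == "yes" and pending is not None):
                outputs.append("..., please")  # prompt for the next input
                pending = None
        else:
            pending = word
            outputs.append(f"{word}, correct?")  # ask for confirmation
    return outputs


def echo(category: str, utterance: str) -> str:
    return utterance  # stand-in recognizer: returns the utterance unchanged


session = [("voice command", "start"), ("data set 1", "Kanagawa"),
           ("voice command", "yes"), ("data set 2", "Kawasaki"),
           ("voice command", "yes")]
for line in run_session(session, echo):
    print(line)
```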

Specifying a category limits the expected values (inputs) for routine processing. The comparison range therefore becomes smaller and the recognition rate improves. The minimum distance can also give each item (speech or word) a priority within the same category. Giving important items a large minimum distance, so that no other registered item lies close to them, is effective in avoiding misrecognition.

The speaker enters the minimum distance into the recognition device at registration time, together with the item and its category. It is probably advantageous to register items with a large minimum distance, that is, important items, before the other items. Furthermore, when categories are used, the minimum distance need only be determined within the same category.
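The sketch below shows why registration order matters. It applies two assumptions:
- Minimum distances are enforced only within a category, which is one reading of the sentence above.
- The applicable limit is the larger of the two entries' minimum distances.

Entries are registered in descending order of minimum distance. An ordinary word that clashes with an important command is therefore flagged for rephrasing, and the command is kept. If the ordinary word had been registered first, the command would have been the one refused. Python's `sorted` is stable even with `reverse=True`, so entries with equal minimum distances keep their input order. All names and values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Entry:
    item: str
    params: List[float]
    category: str
    min_distance: float


def distance(a: List[float], b: List[float]) -> float:
    """Sum of absolute element-wise differences (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def register_all(entries: List[Entry]) -> Tuple[List[str], List[str]]:
    """Register entries most-important-first; return (accepted, to_rephrase)."""
    accepted: List[Entry] = []
    to_rephrase: List[str] = []
    for new in sorted(entries, key=lambda e: e.min_distance, reverse=True):
        clash = any(
            old.category == new.category  # only compare within the same category
            and distance(old.params, new.params) <= max(old.min_distance, new.min_distance)
            for old in accepted)
        if clash:
            to_rephrase.append(new.item)
        else:
            accepted.append(new)
    return [e.item for e in accepted], to_rephrase


batch = [
    Entry("word_x", [1.0, 0.0], "voice command", 0.5),    # ordinary word, keyed in first
    Entry("stop", [0.0, 0.0], "voice command", 3.0),      # important command
    Entry("Kawasaki", [0.5, 0.0], "municipality", 0.5),  # close, but another category
]
print(register_all(batch))  # (['stop', 'Kawasaki'], ['word_x'])
```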

Fig. 3 shows an outline of the recognition device's configuration. In it, 20 is a CRT display, 22 is a control section, 24 is an input processing section, and 26 is a speech dictionary.

Speech enters the input processing section 24 through a microphone, which extracts the speech parameters as described above. The control section 22 is connected to a host computer HOST and a keyboard KB. It takes the category information entered from the keyboard and the speech parameters extracted by the input processing section 24. Referring to the speech dictionary 26, it performs the distance calculations described above together with the host computer to recognize the speech. It also outputs the guidance information, warnings, and rephrasing requests described above by voice, and displays them on the CRT display 20.

There are several ways to calculate the distance. The simplest is to sum the differences between the input speech parameters and the dictionary speech parameters.

Because the speech parameters form a time series, define the following for sampling times 1, 2, ..., m:
- IP1, IP2, ..., IPm are the input speech parameters. There are actually several parameters, but one represents them here.
- DP1, DP2, ..., DPm are the dictionary speech parameters.

One method compares parameters at the same timing, IPi with DPi for i = 1, 2, ..., m:

$$D = \sum_{i=1}^{m} \lvert IP_i - DP_i \rvert$$

Another method also compares parameters at different timings: IPi with DPj, where i and j both range over 1, 2, ..., m and i may differ from j. An example is the DP matching method.
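Both comparison styles can be sketched side by side. The DP matching shown here is the textbook dynamic time warping recurrence with symmetric steps (i-1, j), (i, j-1), (i-1, j-1). That step pattern is one common choice and is not necessarily the one used in the patent. The frame distance is assumed to be the sum of absolute differences over the n parameters, and the sequences are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[float]  # the n parameters at one sampling time


def frame_distance(a: Frame, b: Frame) -> float:
    """Sum of absolute differences between two frames (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def linear_distance(inp: List[Frame], ref: List[Frame]) -> float:
    """Same-timing comparison: IP_i against DP_i for i = 1..m."""
    if len(inp) != len(ref):
        raise ValueError("linear matching needs equal-length sequences")
    return sum(frame_distance(a, b) for a, b in zip(inp, ref))


def dp_matching_distance(inp: List[Frame], ref: List[Frame]) -> float:
    """DP matching: IP_i may be aligned with DP_j for i != j (classic DTW)."""
    n, m = len(inp), len(ref)
    inf = float("inf")
    # acc[i][j] = lowest cumulative cost of aligning the first i input frames
    # with the first j reference frames.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = frame_distance(inp[i - 1], ref[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


ref = [[0.0], [1.0], [2.0], [3.0], [3.0]]  # stored utterance
inp = [[0.0], [0.0], [1.0], [2.0], [3.0]]  # same word, spoken a little later
print(linear_distance(inp, ref))       # 3.0
print(dp_matching_distance(inp, ref))  # 0.0
```

The same-timing sum penalizes a slight timing shift. DP matching absorbs the shift by aligning frames at different indices.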

Any of these distance calculation methods may be used in the present invention.

Effect of the Invention

As described above, the present invention gives the speech dictionary a minimum distance section that specifies the minimum allowable distance to other items. It also prohibits registering other items that fall within that minimum distance. Misrecognition is therefore avoided without changing the margin that recognition requires (the threshold mentioned above), and speech can be recognized accurately.

[Brief explanation of drawings]

Fig. 1 is an explanatory diagram outlining the structure of the speech dictionary. Fig. 2 is an explanatory diagram showing the speech recognition procedure. Fig. 3 is a block diagram outlining the recognition device. In the drawings, 24 is the input processing section, 26 is the speech dictionary, 22 is the control section, and 16 is the minimum distance designation area.

Applicant: Fujitsu Ltd.
Agent: Minoru Aoyagi, Patent Attorney

Claims

[Claims]

A speech recognition device comprising an input processing section that extracts speech parameters of input speech, a speech dictionary that stores the speech parameters, and a control section that compares the input speech parameters with the speech parameters in the dictionary to find the corresponding ones, characterized in that the speech dictionary is provided with an area for specifying the minimum allowable distance between its own speech parameters and other registered speech parameters, and in that the device has a warning generation function that prohibits registration of speech parameters when their mutual distance is within the minimum distance.