JPS6370296A

JPS6370296A - Word registration

Info

Publication number: JPS6370296A
Application number: JP61214366A
Authority: JP
Inventors: 敏奥山
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1986-09-11
Filing date: 1986-09-11
Publication date: 1988-03-30

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] [Summary] When a specific speaker's words are registered for word speech recognition, the speech feature parameters of the word uttered for registration are compared with the speech feature parameters of words already stored and registered in memory. If a closely similar entry exists, the speaker is asked either to utter the word again while emphasizing its difference from the similar word, or to utter a differently phrased word with the same meaning, and registration is performed again. Because words whose speech feature parameters closely resemble those of already registered words are not registered, the recognition rate during word speech recognition improves.

[Industrial application field]

The present invention relates to an improved word registration method for registering the words of a specific speaker for speaker-dependent word speech recognition.

Word speech recognition devices for a specific speaker can be made simple and compact, so they are used in many devices.

To be usable, such a word speech recognition device must make few recognition errors.

Work on reducing misrecognition is ongoing, but it has reached something of a limit, and a word registration method that can further reduce misrecognition is desired.

[Conventional technology]

A conventional example is described below with reference to the drawings.

FIG. 3 is a block diagram of a conventional word registration device, and FIG. 4 is a block diagram of an example speech feature parameter extractor.

When a word is registered with the device shown in FIG. 3, the specific speaker utters the word, which is input through the microphone 5 to the A/D converter 6 and converted into a digital signal. The digital signal is input to the speech feature parameter extractor 1, which extracts the speech feature parameters, and these are stored and registered in the memory 2.

At this time, a character code representing the word is input from the keyboard 12 and is stored and registered in the memory 2 together with the speech feature parameters.

These operations are controlled by the processor 7; 13 denotes a bus.

Next, the speech feature parameter extractor is described. The type mainly used at present obtains a spectral envelope using band-pass filters, as shown in FIG. 4.

In this extractor, a word utterance lasting about 2 seconds is converted into a digital signal by the A/D converter 6 and processed in frames of about 512 samples. For each frame, a band-pass filter group 20 of about 5 to 30 filters, each with a passband width of 100 to 200 Hz, performs spectral analysis over a range from about 100 Hz at the low end to about 7000 Hz at the high end (3400 Hz for telephone speech). The results are normalized by the normalization group 21 and passed through the low-pass filter group 22, whose cutoff frequencies range from several Hz to several tens of Hz. The resulting analysis values are combined to obtain the spectral envelope, which is the speech feature parameter.
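The frame-by-frame analysis described above can be sketched roughly as follows. This is only an illustration under several assumptions that are not in the source: band energies are computed from naive DFT bins rather than by actual band-pass filters, each frame is normalized so its band energies sum to 1, and the low-pass filter group is approximated by a one-pole smoother across frames. All function names and the `alpha` parameter are hypothetical.

```python
import cmath
import math


def band_energies(frame, sample_rate, bands):
    """Energy of one frame in each (low_hz, high_hz) band, using naive DFT bins.

    Assumption: DFT bins stand in for the band-pass filter group.
    """
    n = len(frame)
    energies = []
    for low_hz, high_hz in bands:
        k_low = math.ceil(low_hz * n / sample_rate)
        k_high = min(math.floor(high_hz * n / sample_rate), n // 2)
        energy = 0.0
        for k in range(k_low, k_high + 1):
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(frame))
            energy += abs(coeff) ** 2
        energies.append(energy)
    return energies


def normalize(energies):
    """Scale band energies so they sum to 1 (assumed normalization)."""
    total = sum(energies)
    return [e / total if total > 0 else 0.0 for e in energies]


def smooth(frames, alpha=0.5):
    """One-pole low-pass filter applied per band across successive frames."""
    smoothed = []
    prev = None
    for vec in frames:
        if prev is None:
            prev = list(vec)
        else:
            prev = [alpha * v + (1 - alpha) * p for v, p in zip(vec, prev)]
        smoothed.append(prev)
    return smoothed


def spectral_envelope(samples, sample_rate, bands, frame_len=512, alpha=0.5):
    """Return one smoothed, normalized band-energy vector per full frame."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    per_frame = [normalize(band_energies(f, sample_rate, bands)) for f in frames]
    return smooth(per_frame, alpha)
```

For a pure 1000 Hz tone sampled at 16000 Hz, nearly all of the normalized energy should fall in a band that covers 1000 Hz, which is a quick sanity check for the band layout.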

[Problem that the invention seeks to solve]

However, the conventional word registration method extracts the speech feature parameters of the input word and stores and registers them in memory as they are. Words with similar features may therefore be registered, which increases misrecognition during word speech recognition.

[Means for solving problems]

As shown in the principle block diagram of FIG. 1, the above problem is solved by the word registration method of the present invention. In this method, the speech feature parameters extracted by the speech feature parameter extractor 1 from the word uttered for registration are compared, in the speech feature parameter comparison unit 3, with the speech feature parameters of words already stored and registered in the memory 2. If a closely similar entry exists, the speaker is asked either to utter the word again while emphasizing its difference from the similar word, or to utter a differently phrased word with the same meaning, and registration is performed again.

[Operation]

According to the present invention, when a word is registered, its speech feature parameters are compared with those of the words already stored and registered in the memory 2. If a closely similar entry exists, the speaker is asked, for example, to say the word more clearly or to utter a word with the same meaning, such as a nickname, and registration is performed again. Because highly similar words are thus not registered, misrecognition during word speech recognition can be reduced.

[Example]

An embodiment of the present invention is described below with reference to the drawings.

FIG. 2 is a block diagram of a word registration device according to an embodiment of the present invention.

Word registration in FIG. 2 proceeds as follows. When the specific speaker utters a word, it is input through the microphone 5 to the A/D converter 6 and converted into a digital signal. The digital signal is input to the speech feature parameter extractor 1, which extracts the speech feature parameters, and these are input to the speech feature parameter comparison unit 3, which uses a method such as dynamic programming matching.

Meanwhile, the speech feature parameters already stored and registered in the memory 2 are read out sequentially under the control of the processor 7 and sent to the speech feature parameter comparison unit 3, where they are compared one after another. If a closely similar entry is found, the character code stored and registered with it in the memory 2 is displayed on the display unit 8. The character code is also sent to the speech synthesis unit 9, which synthesizes a speech signal, and the word is output as speech from the speaker 11.

The specific speaker thus learns that the word just uttered closely resembles speech feature parameters already stored and registered in the memory 2. The speaker is then asked to say the word again more clearly, or to utter a word with the same meaning, such as a nickname, so misrecognition during word speech recognition can be reduced.
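The registration check described in this embodiment could be sketched as below. This is a hedged illustration, not the patented implementation: `try_register`, `mean_abs_distance`, the list-of-tuples memory layout, and the threshold value are all hypothetical, and the simple per-element distance merely stands in for whatever the comparison unit 3 actually computes (the text names DP matching as one option). Stopping at the first similar entry is also an assumption.

```python
def mean_abs_distance(a, b):
    """Placeholder distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def try_register(memory, char_code, features, threshold,
                 distance=mean_abs_distance):
    """Register (char_code, features) unless a stored entry is too similar.

    Returns None when the word was registered. Otherwise returns the
    character code of the similar stored word, so the caller can display
    or synthesize it and ask the speaker to try again.
    """
    # Read the stored entries one after another and compare each in turn.
    for stored_code, stored_features in memory:
        if distance(features, stored_features) < threshold:
            return stored_code  # too close: do not register
    memory.append((char_code, features))
    return None
```

A caller would loop on this function: while it returns a character code, present that code to the speaker, capture a clearer or alternative utterance, extract its features, and call `try_register` again.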

In this case as well, these operations are controlled by the processor 7.

The speech feature parameters stored and registered in the memory 2 are used as they are during word speech recognition.

Next, the dynamic programming matching method (DP matching method), which is mainly used at present to compare speech feature parameters, is described.

In this method, the time axes of the input word speech feature parameters and of the word speech feature parameters being compared are aligned so as to obtain the best match. This time normalization removes the effect of uneven stretching and compression of the time axis in word speech. The corresponding points of the two patterns are then compared to obtain time-normalized distances, these values are calculated over the patterns and summed, and the similarity is judged to be high when the sum is smaller than a predetermined value.
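A minimal sketch of DP matching along these lines is given below. The source does not state the recurrence, the local distance, or the normalization, so the choices here are assumptions: a common symmetric recurrence (diagonal steps weighted twice), a city-block local distance, and division by the sum of the two sequence lengths. The names `dp_matching_distance` and `is_similar` are hypothetical.

```python
def dp_matching_distance(a, b):
    """Time-normalized DP matching distance between two feature sequences.

    a and b are lists of feature vectors (one vector per frame) and may
    have different lengths.
    """
    def local(u, v):
        # City-block distance between two frames (assumed local distance).
        return sum(abs(x - y) for x, y in zip(u, v))

    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local(a[i - 1], b[j - 1])
            cost[i][j] = min(
                cost[i - 1][j] + d,          # stretch b against a
                cost[i][j - 1] + d,          # stretch a against b
                cost[i - 1][j - 1] + 2 * d,  # advance both (double weight)
            )
    # Divide by the total path weight so sequence length does not dominate.
    return cost[n][m] / (n + m)


def is_similar(a, b, threshold):
    """Similarity is judged high when the distance is below the threshold."""
    return dp_matching_distance(a, b) < threshold
```

With this formulation, a pattern and a uniformly slowed-down copy of it have distance zero, which illustrates how the alignment absorbs differences in speaking rate.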

[Effect of the invention]

As described in detail above, according to the present invention, if an entry closely resembles the speech feature parameters of a word already stored and registered in memory, the speaker is asked either to utter the word again while emphasizing its difference from the similar word, or to utter a differently phrased word with the same meaning, and registration is performed again. Words whose speech feature parameters closely resemble those of already registered words are therefore not registered, which has the effect of improving the recognition rate during word speech recognition.

[Brief explanation of the drawing]

FIG. 1 is a block diagram of the principle of the present invention, FIG. 2 is a block diagram of a word registration device according to an embodiment of the present invention, FIG. 3 is a block diagram of a conventional word registration device, and FIG. 4 is a block diagram of an example speech feature parameter extractor. In the figures, 1 is a speech feature parameter extractor, 2 is a memory, 3 is a speech feature parameter comparison unit, 5 is a microphone, 6 is an A/D converter, 7 is a processor, 8 is a display unit, 9 is a speech synthesis unit, and 11 is a speaker.

Claims

[Claims] A word registration method for registering words of a specific speaker for word speech recognition, characterized in that the speech feature parameters extracted by a speech feature parameter extractor (1) from a word uttered for registration are compared, in a speech feature parameter comparison unit (3), with the speech feature parameters of words already stored and registered in a memory (2), and, if a closely similar entry exists, the speaker is asked either to utter the word while emphasizing its difference from the similar word or to utter a differently phrased word with the same meaning, and registration is performed again.