JPH04128898A

JPH04128898A - Voice recognition device

Info

Publication number: JPH04128898A
Application number: JP2251551A
Authority: JP
Inventors: Motoaki Koyama; 元昭児山
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1990-09-20
Filing date: 1990-09-20
Publication date: 1992-04-30

Abstract

PURPOSE: To easily increase the vocabulary and the number of speakers to be recognized by generating a standard pattern from an input voice. CONSTITUTION: When an input voice is used for registration, a switch 12 is switched to the side of a standard pattern generation part 14, and a standard pattern of the vocabulary is generated from the voice pattern converted by a signal conversion part 11, using data stored in a memory part 15. When the input voice is used for discrimination, the switch 12 is switched to the side of a similarity calculation part 13, the similarity between each registered standard pattern and the input voice pattern is calculated, and a decision part 17 outputs the standard pattern with the highest similarity as the recognition result. Standard patterns corresponding to three vocabulary items of one speaker are registered in a memory part 16 for standard patterns; when a sufficient recognition result is obtained for the voicing of a different speaker, no new voice registration is performed.

Description

[Detailed Description of the Invention] [Object of the Invention] (Field of Industrial Application) This invention relates to a speech recognition device in which a speaker's voice is registered in advance as a standard pattern and input speech is recognized by calculating the similarity between the input speech pattern and this standard pattern.

(Prior Art) In a speech recognition device, the similarity between a standard pattern for speech recognition and an input speech pattern is calculated, and the input speech is recognized on the basis of the result.

Speech recognition devices that recognize input speech in this way are broadly classified into devices for unspecified speakers and devices for specific speakers. In the former, the speech recognition device for unspecified speakers, the vocabulary to be recognized is determined in advance, and the standard patterns for speech recognition are created from speech in which many speakers utter that vocabulary.

In the latter, the speech recognition device for specific speakers, the vocabulary to be recognized is registered in advance by the specific speaker who will use the device.

(Problem to be Solved by the Invention) The former, the speech recognition device for unspecified speakers, does not require the user to utter speech for registration before recognition, but it has the problem that it can recognize only speech from the vocabulary prepared in advance.

In the latter, the speech recognition device for specific speakers, there is, unlike the former, no restriction on the speakers to be recognized, but recognition performance deteriorates when someone other than the registered speaker uses the device. For a person other than the speaker who registered the standard patterns to use the device, the standard patterns must therefore be registered anew. However, as the number of speakers using the device increases, the memory required to store the standard patterns grows in proportion.

This invention was made in view of the above circumstances. Its object is to provide a speech recognition device in which the vocabulary can be expanded easily, the number of speakers to be recognized can be increased easily, and an increase in the number of speakers does not entail an increase in standard pattern storage capacity proportional to that number.

[Structure of the Invention] (Means for Solving the Problem and Their Operation) The speech recognition device of this invention comprises: a standard pattern creation section that creates a standard pattern for registration from input speech; a first memory section that stores data used by the standard pattern creation section when creating the standard pattern for registration; a second memory section that stores in advance standard patterns corresponding to various speech and also stores the standard patterns created by the pattern creation section; and a similarity calculation section that calculates the similarity between the standard patterns stored in the second memory section and a speech input pattern for recognition.

In the above invention, when speech of the same vocabulary as a standard pattern already stored in the second memory section is uttered by a different speaker at registration time, the pattern creation section creates a new standard pattern for registration from the input speech and the data stored in the first memory section, and this pattern is re-stored in the area of the second memory section where the standard pattern of the corresponding vocabulary was previously stored.

When speech of a new vocabulary item, for which no standard pattern is stored in the second memory section, is uttered for the first time by a different speaker at registration time, the pattern creation section creates a standard pattern for registration from the input speech and the data stored in the first memory section, and this pattern is newly stored in the second memory section.

The speech recognition device of this invention also comprises: a memory section that stores in advance standard patterns corresponding to various speech; a pattern creation section that creates a new standard pattern using input speech and a standard pattern stored in the memory section and re-stores it in the memory section; and a similarity calculation section that calculates the similarity between the standard patterns stored in the memory section and a speech input pattern for recognition.

In the above invention, when speech of the same vocabulary as a standard pattern stored in the memory section is uttered by a different speaker at registration time, the pattern creation section creates a new standard pattern from the input speech and the standard pattern stored in the memory section, and the new pattern is re-stored in the memory section.

(Embodiments) The invention is described below by way of embodiments, with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of a first embodiment of the speech recognition device according to this invention. Speech input by utterance is converted into a speech pattern in the signal conversion section 11.

The converted speech pattern is supplied selectively, via the switch 12, to either the similarity calculation section 13 or the standard pattern creation section 14. The standard pattern creation section 14 creates a standard pattern for registration from the input speech pattern, using the input speech pattern together with the standard pattern creation data stored in the standard pattern creation data memory section 15. The standard pattern creation data memory section 15 stores in advance the core data for standard pattern creation, which was prepared using speech of specific vocabulary uttered by a plurality of speakers. The standard pattern created by the standard pattern creation section 14 is sent to the standard pattern memory section 16 and stored in a predetermined area.

The similarity calculation section 13, on the other hand, calculates the similarity between the input speech pattern and each standard pattern stored in the standard pattern memory section 16, and sends the results to the decision section 17. On the basis of the similarities calculated between each standard pattern and the input speech pattern, the decision section 17 outputs a recognition result indicating that the vocabulary item corresponding to the standard pattern with the highest similarity is the input speech.
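
The similarity calculation and decision steps described above can be sketched in a few lines of Python. The text does not fix a similarity measure at this point, so cosine similarity over fixed-length feature vectors is an assumption made only for illustration, as are the names `cosine_similarity`, `decide`, and the sample contents of `memory_16`.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)

def decide(input_pattern, standard_patterns):
    """Role of the decision step: pick the word whose standard pattern scores highest."""
    scores = {word: cosine_similarity(input_pattern, pattern)
              for word, pattern in standard_patterns.items()}
    best_word = max(scores, key=scores.get)
    return best_word, scores

# Hypothetical contents of the standard pattern memory (one vector per vocabulary item).
memory_16 = {"w1": [1.0, 0.0, 0.0], "w2": [0.0, 1.0, 0.0], "w3": [0.0, 0.0, 1.0]}
word, scores = decide([0.1, 0.9, 0.2], memory_16)
print(word, scores)
```

Any measure that ranks patterns consistently could replace `cosine_similarity` without changing `decide`.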

Next, the operation of the device configured as above is described. When speech is input, the switch 12 is set according to whether the input speech is to be used for recognition or for registration. The switching of the switch 12 is performed under the control of a control circuit (not shown). When the input speech is to be used for registration, the switch 12 is switched to the standard pattern creation section 14 side.

The speech pattern converted by the signal conversion section 11 is therefore supplied to the standard pattern creation section 14. The standard pattern creation section 14 creates a standard pattern for the corresponding vocabulary item using the data stored in advance in the standard pattern creation data memory section 15 and the converted speech pattern, and the pattern is then stored in the standard pattern memory section 16.

When the input speech is to be used for recognition, the switch 12 is switched to the similarity calculation section 13 side, and the speech pattern converted by the signal conversion section 11 is supplied to the similarity calculation section 13. The similarity calculation section 13 calculates the similarity between the input speech pattern and the standard patterns corresponding to the vocabulary items registered (stored) in advance, and the decision section 17 then outputs a recognition result indicating that the vocabulary item corresponding to the standard pattern with the highest similarity is the input speech.
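
The two positions of the switch 12 can be sketched as a simple routing function. This is a minimal illustration, not the device's actual processing: the blending rule in `create_standard_pattern`, the distance-based `similarity`, and all names and sample vectors are assumptions introduced here.

```python
from enum import Enum

class Mode(Enum):
    REGISTER = "register"    # switch 12 toward the standard pattern creation side
    RECOGNIZE = "recognize"  # switch 12 toward the similarity calculation side

def create_standard_pattern(input_pattern, core_data, weight=0.5):
    """Placeholder for pattern creation: blend the input with the core data.
    The blending rule is an assumption; the text names several possible methods."""
    return [weight * x + (1.0 - weight) * c for x, c in zip(input_pattern, core_data)]

def similarity(a, b):
    """Placeholder similarity: negative squared Euclidean distance (assumed)."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))

def process(mode, input_pattern, core_data, memory_16, word=None):
    """Route one converted speech pattern according to the switch position."""
    if mode is Mode.REGISTER:
        # Registration: create a standard pattern and store it under the given word.
        memory_16[word] = create_standard_pattern(input_pattern, core_data)
        return None
    # Recognition: score every stored pattern and report the best-matching word.
    return max(memory_16, key=lambda w: similarity(input_pattern, memory_16[w]))

core_15 = [0.5, 0.5]   # hypothetical core data for pattern creation
memory_16 = {}         # hypothetical standard pattern memory, initially empty
process(Mode.REGISTER, [1.0, 0.0], core_15, memory_16, word="w1")
process(Mode.REGISTER, [0.0, 1.0], core_15, memory_16, word="w2")
print(process(Mode.RECOGNIZE, [0.9, 0.1], core_15, memory_16))
```

Keeping registration and recognition behind one entry point mirrors the single signal path that the switch splits after signal conversion.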

Next, the operation when a plurality of speakers use the device of the above embodiment is described. Assume that standard patterns W1, W2, and W3 corresponding to three vocabulary items w1, w2, and w3 have been registered (stored) in advance in the standard pattern memory section 16 by a certain speaker, as shown in FIG. 2. In this state, when a speaker other than the one who performed the registration utters speech corresponding to w1, w2, and w3 for recognition, no new speech is registered if sufficient recognition results are obtained. However, when a new vocabulary item must be registered, or the user's speech must be registered for a vocabulary item whose recognition performance is unsatisfactory, a new registration is performed. For example, when speaker A utters speech for vocabulary items w2 and w4 as shown in FIG. 2, the standard pattern creation section 14 creates standard patterns W2' and W4 for the respective items. The created standard patterns are sent to the standard pattern memory section 16: the standard pattern W2', corresponding to speaker A's vocabulary item w2, is re-stored in the area where W2 had been stored, and the standard pattern W4, corresponding to the new vocabulary item w4, is stored in a new area of the standard pattern memory section 16. In this state, when a speaker B, different from those above, utters speech corresponding to the four vocabulary items w1, w2, w3, and w4 for recognition, no new speech is registered if sufficient recognition results are obtained.

However, when speech for a new vocabulary item must be additionally registered, or speech must be registered for a vocabulary item whose recognition performance is low for speaker B, a new registration is performed in the same way as above. For example, when speaker B utters speech for vocabulary items w1 and w5 as shown in FIG. 2, the standard pattern creation section 14 creates standard patterns W1' and W5 for the respective items. The created standard patterns are sent to the standard pattern memory section 16: the standard pattern W1', corresponding to speaker B's vocabulary item w1, is re-stored in the area where W1 had been stored, and the standard pattern W5, corresponding to the new vocabulary item w5, is stored in a new area of the standard pattern memory section 16. In the end, compared with the initial state, standard patterns W4 and W5 corresponding to vocabulary items w4 and w5 have been added, and the patterns for w1 and w2 have become updated standard patterns W1' and W2' that include the characteristics of multiple speakers.
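
The sequence of registrations in FIG. 2 can be traced with a small Python sketch. Pattern contents are represented only by their labels, and the dictionary `memory_16` and the function `register` are hypothetical stand-ins for the standard pattern memory and its write operation.

```python
def register(memory, word, pattern_label):
    """Store a pattern: overwrite the existing area for a known word,
    or allocate a new area for a word seen for the first time."""
    reused = word in memory
    memory[word] = pattern_label
    return "re-stored in existing area" if reused else "stored in new area"

# Initial state: three vocabulary items registered by the first speaker.
memory_16 = {"w1": "W1", "w2": "W2", "w3": "W3"}

# Speaker A registers w2 (already known) and w4 (new).
print("A, w2:", register(memory_16, "w2", "W2'"))
print("A, w4:", register(memory_16, "w4", "W4"))

# Speaker B registers w1 (already known) and w5 (new).
print("B, w1:", register(memory_16, "w1", "W1'"))
print("B, w5:", register(memory_16, "w5", "W5"))

# Five areas are in use after three speakers: one per vocabulary item.
print(memory_16)
```

The number of occupied areas tracks the number of vocabulary items (five here), not the number of speakers who have registered.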

Thus, in the speech recognition device of the above embodiment, as the number of speakers registering the same vocabulary increases, the standard patterns stored in the standard pattern memory section 16 come to contain the characteristics of many speakers. The device therefore approaches a speech recognition device for unspecified speakers, and the recognition rate for speakers who have not registered improves.

Moreover, because the standard patterns stored in the standard pattern memory section 16 change gradually as many speakers perform registration, there is no need to collect utterance data from many people at once, as with a speech recognition device for unspecified speakers, nor to spend a long time creating the standard patterns. Furthermore, when a plurality of speakers register standard patterns for the same vocabulary, memory capacity proportional to the number of speakers is not required, so memory can be used more efficiently.

In the above embodiment, the standard pattern creation section 14 may be one that creates a standard pattern for registration from input speech by a well-known method such as the dominant speaker equality method, the dominant speaker identification method, or the smoothing/differential pattern orthogonalization method. The standard pattern creation data memory section 15 may be a mask ROM (Read Only Memory), a PROM (Programmable ROM), an EPROM (Erasable PROM), an E2PROM (Electrically EPROM), a CD (Compact Disk) device, an FD (Floppy Disk) device, an HD (Hard Disk) device, or the like. The standard pattern memory section 16 may be any recording medium to which data can be written, such as a RAM (Random Access Memory), an E2PROM, an FD device, an HD device, or an optical disk device.

The data stored in the standard pattern creation data memory section 15 can be created by, for example, the K-L (Karhunen-Loeve) expansion method, the differential filter method, the differential orthogonalization filter method, the vector quantization method, or the back propagation method. As the calculation method in the similarity calculation section 13, the similarity between the input pattern and a standard pattern can be calculated using DP (Dynamic Programming) matching, the composite similarity method, a similarity method using an HMM (Hidden Markov Model), a similarity method using fuzzy inference, or the like.
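
Of the similarity methods listed, DP matching is the simplest to illustrate. The text gives no formulation, so the following is the textbook dynamic time warping recurrence with a Euclidean local cost, offered as an assumption about what such a calculation might look like; `dtw_distance` is a hypothetical name, and a smaller distance corresponds to a higher similarity.

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic-programming alignment cost between two sequences of feature vectors.
    cost[i][j] holds the best cumulative cost of aligning seq_a[:i] with seq_b[:j]."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames being aligned.
            local = sum((x - y) ** 2 for x, y in zip(seq_a[i - 1], seq_b[j - 1])) ** 0.5
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Hypothetical one-dimensional feature sequences.
standard = [[0.0], [1.0], [2.0]]
slow_utterance = [[0.0], [0.0], [1.0], [2.0], [2.0]]   # same shape, spoken more slowly
other_word = [[2.0], [1.0], [0.0]]

print(dtw_distance(standard, slow_utterance))
print(dtw_distance(standard, other_word))
```

The alignment absorbs differences in speaking rate, which is why the time-stretched utterance matches the standard pattern at zero cost while the different word does not.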

FIG. 3 is a block diagram showing the configuration of a second embodiment of the speech recognition device according to this invention. This device differs from the device of FIG. 1 in two respects: the standard pattern creation data memory section 15 is not provided, and at registration time the standard pattern creation section 14 creates a new standard pattern for registration using the input speech pattern supplied via the switch 12 and the standard pattern already stored in the standard pattern memory section 16. The standard patterns stored in the standard pattern memory section 16 therefore change successively each time a different speaker performs registration and come to merge the characteristics of many speakers, but the initial standard patterns are lost.
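
One way to picture the update in the second embodiment is a running average that folds each new speaker's pattern into the stored one. The text does not say how the input and the stored pattern are combined, so the averaging rule, the per-word speaker count, and the name `merge_into_standard_pattern` are all assumptions for illustration.

```python
def merge_into_standard_pattern(stored_pattern, speaker_count, input_pattern):
    """Fold one more speaker's pattern into the stored pattern (assumed rule:
    running mean over all speakers registered so far)."""
    merged = [(speaker_count * s + x) / (speaker_count + 1)
              for s, x in zip(stored_pattern, input_pattern)]
    return merged, speaker_count + 1

# Hypothetical memory: each word maps to (pattern, number of speakers merged so far).
memory_16 = {"w1": ([1.0, 3.0], 1)}

# A second speaker registers w1; the result overwrites the previous entry.
pattern, count = memory_16["w1"]
memory_16["w1"] = merge_into_standard_pattern(pattern, count, [3.0, 1.0])
print(memory_16["w1"])

# A third speaker registers w1.
pattern, count = memory_16["w1"]
memory_16["w1"] = merge_into_standard_pattern(pattern, count, [5.0, 5.0])
print(memory_16["w1"])
```

Because each update overwrites the entry in place, the storage per word stays constant and the initial pattern cannot be recovered afterwards, which matches the trade-off noted above.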

However, because the standard pattern creation data memory section 15 is unnecessary, the memory capacity can be reduced compared with the embodiment of FIG. 1.

FIG. 4 is a block diagram showing the configuration of a third embodiment of the speech recognition device according to this invention. This device differs from the device of FIG. 1 in that a switch 18 is newly provided between the standard pattern memory section 16 on one side and the standard pattern creation section 14 and the standard pattern creation data memory section 15 on the other.

In this device, when the standard pattern creation section 14 creates a standard pattern for registration, switching the switch 18 selects either the data stored in the standard pattern creation data memory section 15 or a standard pattern stored in the standard pattern memory section 16 for use. At the first registration, the switch 18 is set to the standard pattern creation data memory section 15 side, and the standard pattern for registration is created using the data stored there. Thereafter, the switch 18 is set to the standard pattern memory section 16 side, and the standard patterns stored in the standard pattern memory section 16 are updated successively as the speech of different speakers is registered. Consequently, even if an inappropriate standard pattern is registered in the standard pattern memory section 16 because of noise, malfunction, or the like, registration can be redone from the initial state by setting the switch 18 to the standard pattern creation data memory section 15 side and creating a standard pattern for registration from the data stored there.
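
The role of the switch 18 can be sketched as a choice of reference data at registration time. As elsewhere, the equal-weight `blend` rule, the `Source` enum, and the sample vectors are hypothetical; the sketch only shows how selecting the core data again discards a corrupted pattern.

```python
from enum import Enum

class Source(Enum):
    CORE_DATA = 15        # switch 18 toward the creation data memory
    STORED_PATTERN = 16   # switch 18 toward the standard pattern memory

def blend(input_pattern, reference, weight=0.5):
    """Assumed creation rule: weighted mix of the input and the selected reference."""
    return [weight * x + (1.0 - weight) * r for x, r in zip(input_pattern, reference)]

def register(word, input_pattern, source, core_15, memory_16):
    """Create a pattern from the input and whichever reference the switch selects."""
    reference = core_15 if source is Source.CORE_DATA else memory_16[word]
    memory_16[word] = blend(input_pattern, reference)
    return memory_16[word]

core_15 = [0.0, 0.0]   # hypothetical read-only core data
memory_16 = {}

# First registration uses the core data; later ones update the stored pattern.
register("w1", [2.0, 2.0], Source.CORE_DATA, core_15, memory_16)
register("w1", [3.0, 1.0], Source.STORED_PATTERN, core_15, memory_16)

# A noisy utterance corrupts the stored pattern...
corrupted = register("w1", [100.0, -100.0], Source.STORED_PATTERN, core_15, memory_16)

# ...and switching back to the core data restarts registration from the initial state.
restored = register("w1", [2.0, 2.0], Source.CORE_DATA, core_15, memory_16)
print(corrupted, restored)
```

The reset works because the core data is never overwritten, so a pattern built from it carries no trace of the corrupted entry.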

[Effects of the Invention] As described above, this invention provides a speech recognition device in which the vocabulary can be expanded easily, the number of speakers to be recognized can be increased easily, and an increase in the number of speakers does not entail an increase in standard pattern storage capacity proportional to that number.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of a first embodiment of this invention; FIG. 2 is a diagram showing how standard patterns are stored in the standard pattern memory section of that embodiment; FIG. 3 is a block diagram showing the configuration of a second embodiment of this invention; and FIG. 4 is a block diagram showing the configuration of a third embodiment of this invention. 11: signal conversion section; 12, 18: switches; 13: similarity calculation section; 14: standard pattern creation section; 15: standard pattern creation data memory section; 16: standard pattern memory section; 17: decision section.

Claims

[Claims]

(1) A speech recognition device comprising: a standard pattern creation section that creates a standard pattern for registration from input speech; a first memory section that stores data used by the standard pattern creation section when creating the standard pattern for registration; a second memory section that stores in advance standard patterns corresponding to various speech and also stores the standard patterns created by the standard pattern creation section; and a similarity calculation section that calculates the similarity between the standard patterns stored in the second memory section and a speech input pattern for recognition.

(2) The speech recognition device according to claim 1, wherein, when speech of the same vocabulary is input by a different speaker, the new standard pattern for registration created by the standard pattern creation section is re-stored in the area of the second memory section where the standard pattern of the corresponding vocabulary was previously stored.

(3) A speech recognition device comprising: a memory section that stores in advance standard patterns corresponding to various speech; a standard pattern creation section that creates a new standard pattern using input speech and a standard pattern stored in the memory section and re-stores it in the memory section; and a similarity calculation section that calculates the similarity between the standard patterns stored in the memory section and a speech input pattern for recognition.

(4) The speech recognition device according to claim 1 or 3, wherein the standard pattern creation section is configured to modify the standard pattern stored in the second memory section and create a new standard pattern when the same vocabulary is input by a different speaker.

(5) The speech recognition device according to claim 1 or 3, wherein the standard pattern creation section creates a standard pattern for registration from input speech using a dominant speaker equality method.

(6) The speech recognition device according to claim 1 or 3, wherein the standard pattern creation section creates a standard pattern for registration from input speech using a dominant speaker identification method.

(7) The speech recognition device according to claim 1 or 3, wherein the standard pattern creation section creates a standard pattern for registration from input speech using a smoothing/differential pattern orthogonalization method.