JPH0619492A

JPH0619492A - Speech recognizing device

Info

Publication number: JPH0619492A
Application number: JP4178226A
Authority: JP
Inventors: Shinichi Tsurufuji; 真一鶴藤
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1992-07-06
Filing date: 1992-07-06
Publication date: 1994-01-28

Abstract

PURPOSE: To provide a speech recognition device that reduces matching errors and improves recognition performance when recognizing speech input in a noisy environment, even if the input level of the speech falls below a threshold value set according to the noise level. CONSTITUTION: The device is equipped with threshold value setting means 3, which sets a threshold value on the basis of a noise parameter of the input level that varies as the speech signal to be recognized is input in the presence of noise; speech section determining means 4, which determines, according to the threshold value set by the threshold value setting means 3, the effective speech sections of both the speech pattern to be recognized and a standard speech pattern; and discriminating means 7, which compares the speech pattern whose section has been determined by the speech section determining means 4 with the standard speech pattern in order to identify the speech.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a voice recognition device, and more particularly to a voice recognition device that can recognize speech regardless of the influence of ambient noise.

[0002]

2. Description of the Related Art
FIG. 6 shows a block diagram of a conventional voice recognition device.

[0003] In FIG. 6, 1a is a microphone for inputting speech; 2a is feature extracting means that analyzes the speech input from the microphone 1a and extracts its feature parameters; 3a is a noise updating unit that updates the noise parameter of the sound input from the microphone 1a; and 3b is a threshold setting unit that sets, on the basis of the noise parameter updated by the noise updating unit 3a, a threshold for determining a voice section. The noise updating unit 3a and the threshold setting unit 3b together constitute threshold setting means 3.

[0004] 4 is input voice section determining means that detects the start and end of the input speech on the basis of the feature parameters extracted by the feature extracting means 2a and the threshold set by the threshold setting unit 3b, and cuts out the voice section that is effective for recognition; 5 is input voice pattern creating means that performs noise removal and pattern creation on the basis of the output of the input voice section determining means 4 and the noise parameter of the noise updating unit 3a; 6 is first standard voice pattern storage means that stores the standard voice patterns created by the input voice pattern creating means 5 from speech input through the microphone 1a; and 7 is identification means that compares and collates the voice pattern of the input speech created by the input voice pattern creating means 5 with the voice patterns stored in the first standard voice pattern storage means 6.

[0005] With this configuration, the following describes how a standard voice pattern is registered in the first standard voice pattern storage means 6 when noise-free standard speech, for example the word "up" shown in FIG. 7(a), is input.

[0006] First, a registration switch (not shown) is pressed to select the registration mode, which makes the first standard voice pattern storage means 6 writable. The speech for the standard voice pattern is then input through the microphone 1a.

[0007] The feature extracting means 2a sends the successively changing input level of the input standard speech to the noise updating unit 3a, and extracts the feature parameters of the standard speech using, for example, the filter bank method. Specifically, the speech band is divided by eight bandpass filters to extract the features of the speech, and the filter outputs are A/D converted at fixed time intervals. This yields a time series of speech spectra, that is, the feature parameters that make up the voice pattern.
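The filter-bank analysis described in [0007] can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the device's circuitry: the eight analog bandpass filters are modelled as digital second-order (biquad) band-pass sections using the RBJ "Audio EQ Cookbook" formulas, the centre frequencies are assumed to be geometrically spaced between 300 Hz and 3400 Hz at an 8 kHz sampling rate, and the periodic A/D sampling of each filter output is approximated by the mean absolute output over each 10 ms frame (80 samples). The function names are hypothetical.

```python
import math


def bandpass_coeffs(f0, q, fs):
    """Band-pass biquad (RBJ cookbook, 0 dB peak gain), normalised so that a0 == 1."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2 * math.cos(w0) / a0, (1 - alpha) / a0)
    return b, a


def biquad(x, b, a):
    """Filter the sequence x with one direct-form-I second-order section."""
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def filter_bank_features(signal, fs=8000, n_bands=8, f_lo=300.0, f_hi=3400.0,
                         frame_len=80, q=2.0):
    """Return (centre_frequencies, frames); each frame holds one level per band.

    Assumed parameters: geometric band spacing, Q = 2, 80-sample (10 ms) frames.
    """
    ratio = (f_hi / f_lo) ** (1 / (n_bands - 1))
    centres = [f_lo * ratio ** k for k in range(n_bands)]
    outputs = [biquad(signal, *bandpass_coeffs(f0, q, fs)) for f0 in centres]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        # Stand-in for "A/D conversion at fixed intervals": one level per band per frame.
        frames.append([sum(abs(v) for v in out[start:start + frame_len]) / frame_len
                       for out in outputs])
    return centres, frames
```

For a steady tone at one of the centre frequencies, that band shows the largest level once the filters have settled. One biquad per band keeps the sketch close to the "eight bandpass filters" wording; an FFT-based band-energy computation is a common alternative.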

[0008] The noise updating unit 3a updates the noise parameter on the basis of the input levels successively sent from the feature extracting means 2a, and sends the noise parameter to the threshold setting unit 3b. The threshold setting unit 3b sets a threshold on the basis of the noise parameter sent from the noise updating unit 3a, and sends this threshold information to the input voice section determining means 4. Once the final threshold has been determined, the input voice section determining means 4 sends the noise updating unit 3a a command to stop updating the noise parameter.

[0009] The threshold is preferably set to about 1.5 to 2 times the input level of the sound that is ordinarily input.
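Paragraphs [0008] and [0009] describe the noise updating unit 3a, the threshold setting unit 3b, and the stop-updating command from means 4. The sketch below models them as one small Python class. The update rule is an assumption, since the text does not give one: the noise parameter is taken as an exponential moving average of the frame input level, seeded with the first frame, and the threshold is a fixed factor in the 1.5 to 2 range times that parameter. The class and method names are hypothetical.

```python
class NoiseThreshold:
    """Hypothetical model of noise updating unit 3a plus threshold setting unit 3b."""

    def __init__(self, factor=1.5, smoothing=0.9):
        # factor: threshold multiplier (the text suggests about 1.5 to 2).
        # smoothing: assumed exponential-moving-average weight for the noise estimate.
        self.factor = factor
        self.smoothing = smoothing
        self.noise = None
        self.frozen = False

    def update(self, level):
        """Track one frame's input level (unit 3a) and return the current threshold."""
        if not self.frozen:
            if self.noise is None:
                self.noise = level
            else:
                self.noise = self.smoothing * self.noise + (1 - self.smoothing) * level
        return self.threshold()

    def threshold(self):
        """Threshold set by unit 3b: a fixed multiple of the noise parameter."""
        if self.noise is None:
            return None
        return self.factor * self.noise

    def stop_updating(self):
        """Command from means 4 once the final threshold has been determined."""
        self.frozen = True
```

After `stop_updating()` is called, louder frames no longer move the threshold, which is the behaviour [0008] describes once the final threshold is fixed.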

[0010] The input voice section determining means 4 compares the feature parameters sent from the feature extracting means 2a with the threshold set by the threshold setting unit 3b, and determines the start and end of the voice section.
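The start/end decision of [0010] can be sketched as a scan over per-frame input levels (for example, the sum of the eight band levels of each frame; that choice of level is an assumption). The section starts at the first frame whose level exceeds the threshold and ends at the last frame before the level falls back to or below it. Practical endpoint detectors usually add minimum-duration and hangover rules, which are omitted here. The function name is hypothetical.

```python
def find_voice_section(levels, threshold):
    """Return (start, end) frame indices, end inclusive, or None if no frame exceeds threshold."""
    start = None
    for i, level in enumerate(levels):
        if start is None:
            if level > threshold:      # start: first frame above the threshold
                start = i
        elif level <= threshold:       # end: level falls back to the threshold
            return start, i - 1
    if start is None:
        return None
    return start, len(levels) - 1      # still above the threshold at the last frame


levels = [1, 1, 2, 5, 8, 9, 7, 4, 2, 1, 1]
print(find_voice_section(levels, 1.5))  # (2, 8): threshold for a quiet room
print(find_voice_section(levels, 4.5))  # (3, 6): threshold raised by noise
```

With the same level contour, raising the threshold shortens the detected section from frames 2 to 8 to frames 3 to 6, which is the A-B versus C-D effect discussed in [0014] to [0017].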

[0011] The input voice pattern creating means 5 sends the standard voice pattern whose voice section A-B has been determined, as shown in FIG. 7(a), to the first standard voice pattern storage means 6, where the voice pattern is registered. By repeating this input for a number of different standard utterances, many standard voice patterns are registered in the first standard voice pattern storage means 6.

[0012] Next, the case in which unknown speech is recognized on the basis of these standard voice patterns, after they have been registered, will be described.

[0013] A recognition switch (not shown) is pressed to select the recognition mode. When speech is then input together with noise from the microphone 1a, the feature extracting means 2a sends the input level of the speech to the noise updating unit 3a and extracts the feature parameters of the speech in the same way as during registration.

[0014] The noise updating unit 3a updates the noise parameter on the basis of the input level sent from the feature extracting means 2a. Because the speech is input together with the ambient noise, its initial input level is higher than the input level observed when the standard voice pattern was input. Consequently, as shown in FIG. 7(b), the threshold set by the threshold setting unit 3b follows the gradual rise in input level caused by the noise and becomes larger as well.

[0015] The input voice section determining means 4 determines the voice section C-D as the span in which the input level exceeds the threshold determined by the threshold setting unit 3b.

[0016] The input voice pattern creating means 5 removes noise from the feature parameters within the voice section C-D, re-creates the voice pattern at fixed time intervals, and sends it to the identification means 7. The identification means 7 compares the standard voice patterns stored in the first standard voice pattern storage means 6 with the voice pattern from the input voice pattern creating means 5 using, for example, the linear matching method or the DP matching method, and selects the most similar standard pattern.

[0017]

PROBLEMS TO BE SOLVED BY THE INVENTION
As described above, with existing recognition methods the standard voice patterns are created in an environment with almost no noise, whereas the speech to be recognized contains a great deal of noise. The noise therefore raises the threshold, so that when the effective voice section of the speech to be recognized is cut out, what should be the voice section A-B is misidentified as the voice section C-D. As a result, the voice section detection error leads to misrecognition of the speech.

[0018] An object of the present invention is to provide a voice recognition device that, when recognizing speech input in a noisy environment, can reduce matching errors and improve recognition performance even if the input level of the speech falls below the threshold set on the basis of the noise level.

[0019]

DISCLOSURE OF THE INVENTION
The present invention is characterized by comprising: feature extracting means for extracting features of the voice signal of input speech to be recognized that contains noise; threshold setting means for setting a threshold on the basis of a noise parameter of the input level, which changes in response to the input of the voice signal; input voice section determining means for determining the effective voice section of the voice pattern to be recognized on the basis of the threshold set by the threshold setting means; input voice pattern creating means for creating the voice pattern of the effective voice section of the speech to be recognized; first standard voice pattern storage means for storing standard voice patterns; standard voice section determining means for determining, on the basis of the threshold set by the threshold setting means, the voice section of each standard voice pattern stored in the first standard voice pattern storage means; standard voice pattern creating means for creating the pattern of each standard voice pattern whose voice section has been determined by the standard voice section determining means; and identification means for comparing the voice pattern of the speech to be recognized, created by the input voice pattern creating means, with the standard voice patterns created by the standard voice pattern creating means, in order to identify the speech.

[0020]

OPERATION
Standard voice patterns are stored in advance in the first standard voice pattern storage means. On the basis of a threshold level set according to the ambient noise present when the speech to be recognized is input, the voice sections of both the voice pattern of the speech to be recognized and the standard voice patterns stored in the first standard voice pattern storage means are cut out, and the resulting voice patterns are compared with each other for identification.

[0021]

DESCRIPTION OF THE PREFERRED EMBODIMENTS
The present invention will be described with reference to FIGS. 1 to 5. Components 1a to 7 are the same as in the conventional configuration of FIG. 6, and their description is omitted.

[0022] In FIG. 1, 8 is standard voice section determining means that cuts out the voice section of each standard voice pattern stored in the first standard voice pattern storage means 6 on the basis of the noise parameter sent from the noise updating unit 3a; 9 is standard voice pattern creating means that performs noise removal and pattern creation within the voice section of the standard voice pattern cut out by the standard voice section determining means 8, on the basis of the noise parameter of the noise updating unit 3a; and 10 is second standard voice pattern storage means that stores the standard voice patterns created by the standard voice pattern creating means 9.

[0023] Standard voice patterns are registered in the first standard voice pattern storage means 6 by the same method as in the conventional device, so that description is omitted.

[0024] The difference from the conventional device is that the voice section of each standard voice pattern stored in the first standard voice pattern storage means 6 is cut out on the basis of the threshold level determined according to the noise present when the noisy speech to be recognized is input, and this standard voice pattern is then compared with the voice pattern to be recognized for identification.

[0025] The first standard voice pattern storage means 6 stores, for example, the standard voice pattern of the word "up" shown in FIG. 2(a). When speech to be recognized is then input from the microphone 1a, the noise parameter, which is the input level at the time the speech is input, is sent from the noise updating unit 3a to the standard voice section determining means 8. As in the conventional device, the input voice pattern creating means 5 creates the voice pattern of the input speech to be identified, as shown in FIG. 2(b), and sends this voice pattern to the identification means 7.

[0026] Meanwhile, the standard voice section determining means 8 cuts out the voice section of the standard voice pattern stored in the first standard voice pattern storage means 6, as shown in FIG. 2(c), on the basis of the noise parameter sent from the noise updating unit 3a, and sends that voice pattern to the standard voice pattern creating means 9. The standard voice pattern creating means 9 creates the pattern of the cut-out standard voice pattern and sends it to the second standard voice pattern storage means 10.
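The key step of [0024] to [0026] is that the stored, noise-free standard pattern is cut out again with the threshold derived from the current noise, so that the input and the reference are trimmed by the same rule. The sketch below assumes that each entry of the first store keeps a per-frame level alongside its feature vectors (the text does not say how levels are stored) and skips noise removal. The names `clip_standard` and `build_second_store` are hypothetical.

```python
def clip_standard(levels, features, threshold):
    """Sketch of standard voice section determining means 8: keep the frames from the
    first level above `threshold` up to the frame before the level falls back."""
    start = end = None
    for i, level in enumerate(levels):
        if start is None:
            if level > threshold:
                start = i
        elif level <= threshold:
            end = i - 1
            break
    if start is None:
        return []
    if end is None:
        end = len(levels) - 1
    return features[start:end + 1]


def build_second_store(first_store, threshold):
    """Sketch of means 9 and storage means 10: re-cut every stored standard pattern.

    first_store maps label -> (levels, features) recorded in a quiet environment.
    """
    second = {}
    for label, (levels, features) in first_store.items():
        clipped = clip_standard(levels, features, threshold)
        if clipped:
            second[label] = clipped
    return second
```

With a low threshold the stored "up" keeps its quiet tail frames; with a threshold raised by noise those tail frames are dropped from the reference too, just as they are dropped from the noisy input.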

[0027] The identification means 7 compares and collates the voice pattern of the speech to be recognized, sent from the input voice pattern creating means 5, with the standard voice patterns sent from the second standard voice pattern storage means 10, and selects the most similar standard pattern.

[0028] FIG. 3 is a flowchart of the recognition processing of the voice recognition device of the present invention. It is assumed that standard voice patterns are already stored in the first standard voice pattern storage means 6.

[0029] In step S1, when the speech to be recognized is input from the microphone 1a, its feature parameters are obtained. In step S2, detection of the start of the valid voice section to be recognized is performed: if the start has been detected, the process proceeds to step S4; if not, it proceeds to step S3. In step S3, the noise parameter is updated to follow the input level of the incoming sound. In step S5, the threshold is updated on the basis of the noise parameter set in step S3, and the process returns to step S1. The threshold is set to about 1.5 to 2 times the input level of the sound that is ordinarily input.

[0030] In step S1, the feature parameters of the input speech are then obtained on the basis of the threshold set in step S5.

[0031] In step S4, detection of the end of the voice section effective for recognizing the speech is performed: if the end has been detected, the process proceeds to step S6; if not, it returns to step S1.

[0032] In step S6, a pattern is created for the voice section determined with the threshold set in step S5. In step S7, the final noise parameter updated in step S3 is read. In step S8, on the basis of the noise parameter read in step S7, the voice section of each standard voice pattern stored in the first standard voice pattern storage means 6 is cut out.

[0033] In step S9, noise is removed from the standard voice pattern on the basis of the noise parameter read in step S7, and the pattern of the cut-out standard voice pattern is created. In step S10, the voice pattern to be recognized is compared with the cut-out standard voice patterns, the most similar standard pattern is selected, and that pattern is output as the recognition result.
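Steps S1 to S10 of FIG. 3 can be sketched end to end as follows. This is an illustrative sketch, not the patented implementation: it works on precomputed per-frame (level, feature vector) pairs, assumes an exponential-moving-average noise update, compares patterns with linear matching (the other method named in [0016]), and omits the noise-removal part of S6 and S9 because the text does not specify it. All names are hypothetical.

```python
import math


def first_section(levels, threshold):
    """(start, end) of the first run of frames above threshold, end inclusive, or None."""
    start = None
    for i, level in enumerate(levels):
        if start is None and level > threshold:
            start = i
        elif start is not None and level <= threshold:
            return start, i - 1
    return None if start is None else (start, len(levels) - 1)


def linear_match(a, b):
    """Linear matching: map a's frames proportionally onto b; mean Euclidean distance."""
    total = 0.0
    for i, vec in enumerate(a):
        j = round(i * (len(b) - 1) / (len(a) - 1)) if len(a) > 1 else 0
        total += math.dist(vec, b[j])
    return total / len(a)


def recognize(frames, first_store, factor=1.5, smoothing=0.9):
    """frames: list of (level, features) for the noisy input (S1).
    first_store: label -> list of (level, features) registered in quiet."""
    noise = threshold = start = None
    end = len(frames) - 1
    for i, (level, _) in enumerate(frames):                    # S1
        if start is None:                                      # S2: start not yet found
            if threshold is not None and level > threshold:
                start = i                                      # start found; go on to S4
            else:
                # S3: track the noise parameter (assumed exponential moving average)
                noise = level if noise is None else smoothing * noise + (1 - smoothing) * level
                threshold = factor * noise                     # S5, then back to S1
                continue
        if level <= threshold:                                 # S4: end detected
            end = i - 1
            break
    if start is None:
        return None
    pattern = [f for _, f in frames[start:end + 1]]            # S6
    # S7: `threshold` now holds the value derived from the final noise parameter.
    best, best_dist = None, math.inf
    for label, ref in first_store.items():
        sec = first_section([lv for lv, _ in ref], threshold)  # S8
        if sec is None:
            continue
        ref_pattern = [f for _, f in ref[sec[0]:sec[1] + 1]]   # S9 (noise removal omitted)
        dist = linear_match(pattern, ref_pattern)              # S10
        if dist < best_dist:
            best, best_dist = label, dist
    return best
```

Freezing the noise estimate as soon as the start is detected mirrors the S2 branch: S3 and S5 only run while no start has been found.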

[0034] In the embodiment described above, once the threshold has been set from the noise parameter, it is held fixed while the voice section is determined. If, however, the noise input level rises sharply, as shown in FIG. 5(a), the input level of the speech to be recognized rises with it. Even if the start of the voice section can be determined, its end cannot, and it ultimately becomes impossible to determine the voice section. To supplement the embodiment for this case, as shown in FIG. 4, a microphone 1b that picks up only the ambient noise present while the speech to be recognized is being input is provided in addition to the microphone 1a, into which the noisy speech to be recognized is input.

[0035] To determine the voice section, the noise voice feature extracting means 2b extracts the feature parameters of the successively changing noise signal input from the microphone 1b and sends them through the noise updating unit 3a to the threshold setting unit 3b.

[0036] The threshold setting unit 3b follows the successively changing noise parameter, sets a threshold of about 1.5 to 2 times the noise level, and sends this threshold information to the input voice section determining means 4, which determines as the voice section the span in which the input level exceeds the threshold.

[0037] That is, as shown in FIG. 5(b), the span from the moment the speech to be recognized exceeds the threshold set from the noise parameter until it falls below that threshold can be determined as the voice section E-F.
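With the second microphone 1b of [0034] to [0037], the threshold is recomputed for every frame from the noise-only channel instead of being frozen at the start of the utterance. The sketch below assumes that both channels have already been reduced to per-frame levels and that the threshold is 1.5 times the reference-channel noise level; the function name is hypothetical.

```python
def section_with_noise_reference(speech_levels, noise_levels, factor=1.5):
    """Voice section E-F (sketch) with a per-frame threshold from noise-only microphone 1b.

    Returns (start, end) frame indices, end inclusive, or None.
    """
    start = None
    for i, (level, noise) in enumerate(zip(speech_levels, noise_levels)):
        threshold = factor * noise            # threshold follows the current noise level
        if start is None:
            if level > threshold:
                start = i                     # E: speech rises above the threshold
        elif level <= threshold:
            return start, i - 1               # F: speech falls back to or below it
    return None if start is None else (start, len(speech_levels) - 1)


noise = [1, 1, 2, 3, 4, 5, 6, 7]      # rising ambient noise picked up by microphone 1b
speech = [1, 1, 2, 9, 12, 14, 6, 7]   # microphone 1a: speech plus the same noise
print(section_with_noise_reference(speech, noise))     # (3, 5): E-F found
print(section_with_noise_reference(speech, [1] * 8))   # (2, 7): frozen threshold, no real end
```

The second call imitates a threshold frozen at the initial noise level: because the rising noise keeps the level above it, the section runs to the last frame, which is the failure [0034] describes.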

[0038] Therefore, when the noise level changes successively, the threshold can be determined while following the ambient noise present as the speech to be recognized is input, so the voice section can be determined accurately.

[0039]

EFFECTS OF THE INVENTION
According to the present invention, standard voice patterns are stored in advance in the first standard voice pattern storage means, and voice sections are cut out from both the voice pattern of the speech to be recognized and the standard voice patterns stored in the first standard voice pattern storage means, on the basis of a threshold level set according to the ambient noise present when the speech to be recognized is input. Therefore, when speech input in a noisy environment is recognized, matching errors can be reduced and recognition performance improved even if the input level at the end of a word falls below the threshold set on the basis of the noise level.

[Brief description of drawings]

FIG. 1 is a block diagram of a voice recognition device according to the present invention.

FIG. 2 is a diagram showing voice patterns used in the voice recognition processing of the present invention.

FIG. 3 is a flowchart of the voice recognition processing according to the present invention.

FIG. 4 is a block diagram of another voice recognition device according to the present invention.

FIG. 5 is a diagram showing voice patterns used in the voice recognition device of the present invention shown in FIG. 4.

FIG. 6 is a block diagram of a conventional voice recognition device.

FIG. 7 is a diagram showing voice patterns used in conventional voice recognition processing.

[Explanation of symbols]

1a, 1b Microphone
2a Voice feature extracting means
2b Noise voice feature extracting means
3 Threshold setting means
3a Noise updating unit
3b Threshold setting unit
4 Input voice section determining means
5 Input voice pattern creating means
6 First standard voice pattern storage means
7 Identification means
8 Standard voice section determining means
9 Standard voice pattern creating means
10 Second standard voice pattern storage means

Claims


1. A voice recognition device comprising: feature extracting means for extracting features of a voice signal of input speech to be recognized that contains noise; threshold setting means for setting a threshold on the basis of a noise parameter of an input level that changes according to the input of the voice signal; input voice section determining means for determining a valid voice section of the voice pattern to be recognized on the basis of the threshold set by the threshold setting means; input voice pattern creating means for creating a voice pattern of the valid voice section of the speech to be recognized; first standard voice pattern storage means for storing standard voice patterns; standard voice section determining means for determining, on the basis of the threshold set by the threshold setting means, the voice section of each standard voice pattern stored in the first standard voice pattern storage means; standard voice pattern creating means for creating the pattern of each standard voice pattern whose voice section has been determined by the standard voice section determining means; and identification means for comparing the voice pattern of the speech to be recognized, created by the input voice pattern creating means, with the standard voice patterns created by the standard voice pattern creating means, in order to identify the speech.