JP3114757B2

JP3114757B2 - Voice recognition device

Info

Publication number: JP3114757B2
Application number: JP04015491A
Authority: JP
Inventors: 教幸藤本
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1992-01-30
Filing date: 1992-01-30
Publication date: 2000-12-04
Anticipated expiration: 2015-12-04
Also published as: JPH05210397A

Abstract

PURPOSE: To provide a speech recognition device with improved noise rejection capability by taking into account, during recognition processing, the pitch period peculiar to speech, so that only the speech is recognized in an input in which noise and speech are mixed. CONSTITUTION: The device comprises a recognition unit 1 that outputs speech recognition candidates for sections of the input speech signal estimated to be speech; a pitch extraction unit 2 that extracts the pitch period from the input speech signal; and a determination unit 3 that outputs a speech recognition result based on the speech recognition candidate output by the recognition unit 1 for a section and on the pitch extraction result obtained by the pitch extraction unit 2 for the signal in that section.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Industrial Field of Application] The present invention relates to a speech recognition device that recognizes speech input through an input means. Today, speech recognition devices are increasingly used for entering data into computers, making reservations by telephone, and controlling the progress of work in steelworks, automobile factories, and similar sites.

[0002] Consequently, a speech recognition device with a high noise rejection capability is needed in order to recognize only the speech in an input in which uttered speech and noise are mixed.

[0003]

[Prior Art] FIG. 3 shows a conventional example. The input means 101 is a sound-collecting device such as a microphone. The input speech is converted into an analog signal, which the A/D conversion unit 102 then converts into a digital signal.

[0004] The feature extraction unit 103 divides the digitized signal into frames of fixed duration, divides a preset frequency range into several frequency bands, and extracts features from the signal value of each frequency band in each frame. The section detection unit 104 detects, as a recognition processing section (a section to be processed for recognition as speech), a section of the signal that satisfies a preset threshold on the signal value. During this detection, suppose the signal value goes from satisfying the threshold to not satisfying it and, after some time, satisfies it again. If the time during which the threshold was not satisfied is within a preset time, that part of the signal is also included in the recognition processing section.
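The section detection with gap bridging described in [0004] can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `detect_sections`, the per-frame `levels` input, and the reading of "satisfies the threshold" as `level >= threshold` are all assumptions made for the example.

```python
from typing import List, Optional, Tuple


def detect_sections(levels: List[float], threshold: float,
                    max_gap: int) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices (end inclusive) of detected sections.

    Assumption: a frame "satisfies the threshold" when level >= threshold.
    A run of below-threshold frames no longer than max_gap is bridged and
    stays inside the surrounding section.
    """
    sections: List[Tuple[int, int]] = []
    start: Optional[int] = None
    last_above: Optional[int] = None
    for i, level in enumerate(levels):
        if level < threshold:
            continue
        if start is None:
            start = i
        elif i - last_above - 1 > max_gap:
            # The gap was too long: close the previous section, open a new one.
            sections.append((start, last_above))
            start = i
        last_above = i
    if start is not None:
        sections.append((start, last_above))
    return sections


levels = [0, 5, 5, 0, 0, 5, 0, 0, 0, 0, 5, 0]
print(detect_sections(levels, threshold=3, max_gap=2))  # [(1, 5), (10, 10)]
```

In the usage line, the two-frame gap at indices 3 and 4 is bridged, while the four-frame gap at indices 6 to 9 splits the signal into two sections.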

[0005] The dictionary 105 stores standard speech patterns created in advance. The matching unit 106 calculates the distance between the features of the signal extracted by the feature extraction unit 103 in the recognition processing section detected by the section detection unit 104 and every standard pattern stored in the dictionary 105. Of the distances obtained by this calculation, the standard pattern with the smallest distance is output as the recognition candidate for the signal in the recognition processing section detected by the section detection unit 104.
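The matching step of [0005] amounts to a nearest-pattern search over the dictionary. The sketch below is illustrative only. The patent does not specify the distance measure or the feature layout, so Euclidean distance over fixed-length feature vectors is an assumption, as are the names `best_match` and `dictionary`.

```python
import math
from typing import Dict, Sequence, Tuple


def best_match(features: Sequence[float],
               dictionary: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return (label, distance) of the standard pattern closest to features.

    Assumption: Euclidean distance between equal-length feature vectors.
    """
    if not dictionary:
        raise ValueError("dictionary must contain at least one standard pattern")
    best_label, best_dist = "", math.inf
    for label, pattern in dictionary.items():
        dist = math.dist(features, pattern)  # requires Python 3.8+
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


# Hypothetical two-word dictionary with two-dimensional features.
patterns = {"yes": [1.0, 0.0], "no": [0.0, 1.0]}
label, distance = best_match([0.9, 0.1], patterns)
print(label)  # yes
```

The returned minimum distance is the value that is later compared against the distance threshold.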

[0006] The recognition determination unit 108 then determines whether the minimum distance obtained by the matching unit 106 satisfies the threshold on distance stored in the distance threshold storage unit 107. If the minimum distance satisfies the threshold, the signal in the recognition processing section detected by the section detection unit 104 is recognized as speech, and the recognition candidate output by the matching unit 106 is output as the recognition result. If it does not, the signal in the recognition processing section is rejected as noise, and the speech must be input again through the input means 101.

[0007]

[Problems to Be Solved by the Invention] As described above, when the recognition determination unit 108 determines that the distance between the pattern of the signal in the recognition processing section detected by the section detection unit 104 and the pattern of the recognition candidate output by the matching unit 106 satisfies the threshold on distance stored in the distance threshold storage unit 107, the recognition candidate is output as the recognition result. When it determines that the threshold is not satisfied, the signal is rejected as noise.

[0008] Whether the signal in the recognition processing section detected by the section detection unit 104 is recognized as speech or rejected as noise therefore depends on how the threshold on distance stored in the distance threshold storage unit 107 is set. This threshold is determined experimentally or empirically. If it is set low, then even when the signal in the detected recognition processing section is noise, the minimum distance obtained by the matching unit 106 may satisfy the threshold, and the recognition candidate obtained by the matching unit 106 is erroneously recognized as speech.

[0009] Conversely, if the threshold on distance is set high, then even when the signal in the recognition processing section is speech, the minimum distance obtained by the matching unit 106 may fail to satisfy the threshold, and the signal is rejected as noise. Taken together with the low-threshold case described above, it is impossible to set a threshold that separates speech from noise completely.

[0010] An object of the present invention is to provide a speech recognition device whose noise rejection capability is improved by taking the pitch period peculiar to speech into account during recognition processing, so that only the speech is recognized in an input in which noise and speech are mixed.

[0011]

[Means for Solving the Problems] FIG. 1 is a diagram illustrating the principle of the present invention. In the figure, reference numeral 1 denotes a recognition unit, which outputs a speech recognition candidate for a section of the input speech signal that is estimated to be speech. Reference numeral 2 denotes a pitch extraction unit, which extracts the pitch period from the input speech signal.

[0012] Reference numeral 3 denotes a determination unit, which outputs a speech recognition result based on the speech recognition candidate output by the recognition unit 1 for the signal in a section and on the pitch extraction result obtained by the pitch extraction unit 2 for the signal in that section. Reference numeral 4 denotes a period range storage unit, which stores a period range for the pitch period extracted by the pitch extraction unit 2.

[0013]

[Operation] As in the prior art, the present invention converts the input speech into a signal, extracts the features of the signal, detects as a recognition processing section a section in which the signal value satisfies a preset value, calculates the distances between the features of the signal in the recognition processing section and the standard patterns stored in the dictionary, takes the standard pattern with the minimum distance as the recognition candidate, and determines whether the minimum distance satisfies the threshold on distance.

[0014] In claim 1 of the present invention, the pitch period, which indicates the vibration period of the vocal cords when speech is uttered, is additionally extracted from the signal in the detected recognition processing section. If a pitch period is extracted, the signal in the recognition processing section is determined to be a speech candidate. If no pitch period is extracted, the signal is determined to be noise. If the minimum distance is determined to satisfy the threshold on distance and the pitch processing determines that the signal in the recognition processing section is a speech candidate, the recognition candidate is output as the speech recognition result for that recognition processing section.

[0015] If the minimum distance satisfies the threshold on distance but the pitch processing determines that the signal in the recognition processing section is noise, the signal in that section is judged to be noise and rejected. Accordingly, when the threshold on distance is set low, there are more cases in which the minimum distance, obtained by calculating the distances between the features of the signal in the recognition processing section and the standard patterns, satisfies the threshold even though the detected signal is noise. However, since a pitch period is hardly ever extracted from noise, the signal in that section is highly likely to be rejected as noise.
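The combined decision of claim 1, as described in [0014] and [0015], can be sketched as a conjunction of the distance test and the pitch test. This is only an illustration: the function name `judge_claim1`, the use of `None` to represent "no pitch period extracted", and the reading of "satisfies the threshold" as `min_distance <= distance_threshold` are assumptions, not details given in the patent.

```python
from typing import Optional


def judge_claim1(candidate: str, min_distance: float,
                 distance_threshold: float,
                 pitch_period_s: Optional[float]) -> Optional[str]:
    """Return the candidate as the recognition result, or None when rejected.

    Assumptions: the distance "satisfies the threshold" when it is less than
    or equal to it, and pitch_period_s is None when no pitch was extracted.
    """
    distance_ok = min_distance <= distance_threshold
    pitch_found = pitch_period_s is not None
    return candidate if (distance_ok and pitch_found) else None


print(judge_claim1("hello", 0.2, 0.5, 0.008))  # hello
print(judge_claim1("hello", 0.2, 0.5, None))   # None
```

The second call shows the case discussed in [0015]: the distance test passes, but the section is still rejected because no pitch period was found.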

[0016] In claim 2 of the present invention, in addition to the pitch extraction processing of claim 1, the signal in the recognition processing section is determined to be a speech candidate if a pitch period is extracted and that pitch period is within a preset set period range for the pitch period. If the pitch period is outside that range, the signal is determined to be noise. If the minimum distance is determined to satisfy the threshold on distance and the pitch processing determines that the signal in the recognition processing section is a speech candidate, the recognition candidate is output as the speech recognition result for that recognition processing section.

[0017] If the minimum distance satisfies the threshold on distance but the pitch processing determines that the signal in the recognition processing section is noise, the signal in that section is judged to be noise and rejected. With the means of claim 2, therefore, even when the signal in the recognition processing section detected as in claim 1 is noise and a pitch period is nevertheless extracted from it, that pitch period is in most cases outside the set period range, so the signal is rejected as noise. A speech recognition device with even more accurate rejection capability than that of claim 1 can thus be realized.
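The additional range test of claim 2 can be sketched as follows. The function name `is_speech_candidate` and the example range are hypothetical. The range of 2.5 ms to 12.5 ms (roughly 80 Hz to 400 Hz) is an illustrative value chosen for the example and does not come from the patent.

```python
from typing import Optional, Tuple


def is_speech_candidate(pitch_period_s: Optional[float],
                        period_range_s: Tuple[float, float]) -> bool:
    """Return True when a pitch period was extracted and lies in the set range.

    Assumptions: None means no pitch period was extracted, and the range
    bounds are inclusive.
    """
    if pitch_period_s is None:
        return False
    low, high = period_range_s
    return low <= pitch_period_s <= high


# Illustrative set period range (not taken from the patent).
SET_RANGE_S = (0.0025, 0.0125)
print(is_speech_candidate(0.008, SET_RANGE_S))  # True
print(is_speech_candidate(0.020, SET_RANGE_S))  # False
```

A periodic noise source whose period falls outside the set range is rejected here even though a pitch period was extracted from it.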

[0018] In claim 3 of the present invention, the set period range that is referred to in claim 2 when a pitch period is extracted and it is determined whether that pitch period indicates speech can be changed according to the pitch period of the speaker's voice. Because the distribution of pitch periods differs from speaker to speaker, setting the set period range to match a given speaker's pitch period makes it possible to recognize only the speech of that specific speaker.
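One way to derive a speaker-specific set period range, as contemplated in [0018], is sketched below. The patent does not say how the range is obtained from the speaker's voice. Taking the minimum and maximum of previously measured pitch periods and widening them by a relative margin is purely an assumption for illustration, as are the names `range_for_speaker` and `margin`.

```python
from typing import Sequence, Tuple


def range_for_speaker(enrolled_periods_s: Sequence[float],
                      margin: float = 0.1) -> Tuple[float, float]:
    """Return a (low, high) set period range covering a speaker's pitch periods.

    Assumption: the range is the observed min and max, widened by a relative
    margin on each side. The patent does not prescribe this rule.
    """
    if not enrolled_periods_s:
        raise ValueError("at least one enrolled pitch period is required")
    low = min(enrolled_periods_s) * (1.0 - margin)
    high = max(enrolled_periods_s) * (1.0 + margin)
    return (low, high)


# Hypothetical pitch periods measured from one speaker, in seconds.
low, high = range_for_speaker([0.008, 0.009, 0.010])
print(round(low, 4), round(high, 4))  # 0.0072 0.011
```

A narrower margin makes the device more selective toward the enrolled speaker, at the cost of rejecting that speaker's own utterances more often when their pitch varies.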

[0019] In claim 4 of the present invention, when a pitch period is extracted as in claim 2, the set period range is automatically changed so that it is centered on the extracted pitch period, even if the extracted pitch period is already within the preset set period range. The range of pitch periods extracted from human speech is limited, but each speaker's pitch period range differs. By changing the set period range in accordance with the pitch period extracted in the recognition processing section of the speaker's input speech, speech recognition processing optimized for the speaker becomes possible.
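The automatic update of [0019] can be sketched as re-centering the set period range on the extracted pitch period. The patent states only that the range is changed so as to be centered on the extracted period. Keeping the width of the range unchanged is an assumption made for this example, and the name `recenter_range` is hypothetical.

```python
from typing import Tuple


def recenter_range(period_range_s: Tuple[float, float],
                   extracted_period_s: float) -> Tuple[float, float]:
    """Return a range of the same width centered on the extracted pitch period.

    Assumption: the width of the set period range is preserved.
    """
    low, high = period_range_s
    half_width = (high - low) / 2.0
    return (extracted_period_s - half_width, extracted_period_s + half_width)


new_low, new_high = recenter_range((0.004, 0.012), 0.006)
print(round(new_low, 4), round(new_high, 4))  # 0.002 0.01
```

Applied after each accepted utterance, this lets the range follow the current speaker's pitch without any explicit setup step.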

[0020]

[Embodiments] Embodiments of the present invention are described in detail below with reference to the drawings. The first embodiment is described first. FIG. 2 shows an embodiment of the present invention. The following steps are performed as in the conventional example described above, so their detailed description is omitted: input of speech and conversion into an analog signal by the input means 201; conversion of the analog signal into a digital signal by the A/D conversion unit 202; extraction of the features of the signal by the feature extraction unit 203; detection of the recognition processing section of the signal by the section detection unit 204; and, in the matching unit 206, calculation of the distances between the standard patterns in the dictionary 205 and the features of the signal in the recognition processing section, followed by output of the minimum-distance standard pattern as the recognition candidate.

[0021] The pitch extraction unit 207 shown in FIG. 2 extracts the pitch period of the signal output by the A/D conversion unit 202. For the recognition processing section detected by the section detection unit 204, the noise/speech determination unit 208 first determines whether the pitch extraction unit 207 extracted a pitch period. If it determines that a pitch period was extracted, it then determines whether that pitch period is within the set period range for the pitch period stored in the period range storage unit 209.
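The patent does not state how the pitch extraction unit 207 obtains the pitch period. One common technique is to look for the peak of the normalized autocorrelation within a plausible lag range. The sketch below uses that technique purely as an assumed stand-in. The function name, the default lag bounds, and the `voicing_threshold` of 0.5 are all illustrative choices, not values from the patent.

```python
import math
from typing import Optional, Sequence


def extract_pitch_period(samples: Sequence[float], sample_rate: int,
                         min_period_s: float = 0.0025,
                         max_period_s: float = 0.0125,
                         voicing_threshold: float = 0.5) -> Optional[float]:
    """Return the pitch period in seconds, or None if no clear pitch is found.

    Assumed method: pick the lag with the largest autocorrelation, normalized
    by the signal energy, and require it to exceed voicing_threshold.
    """
    n = len(samples)
    min_lag = max(1, int(sample_rate * min_period_s))
    max_lag = min(n - 1, int(sample_rate * max_period_s))
    energy = sum(x * x for x in samples)
    if energy == 0.0 or min_lag > max_lag:
        return None
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(samples[i] * samples[i + lag] for i in range(n - lag)) / energy
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    if best_corr < voicing_threshold:
        return None
    return best_lag / sample_rate


# A 100 Hz sine sampled at 8 kHz for 0.1 s: the period is 80 samples (0.01 s).
tone = [math.sin(2 * math.pi * 100 * i / 8000) for i in range(800)]
print(extract_pitch_period(tone, 8000))          # 0.01
print(extract_pitch_period([0.0] * 800, 8000))   # None
```

Returning `None` corresponds to the case in which the noise/speech determination unit 208 finds that no pitch period was extracted.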

[0022] If the extracted pitch period is within the set period range for the pitch period stored in the period range storage unit 209, the signal in the recognition processing section is determined to be a speech candidate. If it is outside the set period range, the signal in the recognition processing section is determined to be noise. When the noise/speech determination unit 208 determines that the recognition processing section detected by the section detection unit 204 is a speech candidate, the recognition determination unit 211 determines whether the minimum distance obtained by the distance calculation in the matching unit 206 satisfies the threshold on distance stored in the distance threshold storage unit 210. If the threshold is satisfied, the recognition candidate output by the matching unit 206 is output as the recognition result.

[0023] If the recognition determination unit 211 determines that the minimum distance does not satisfy the threshold on distance stored in the distance threshold storage unit 210, the recognition processing section is determined to be noise and rejected. Likewise, if the noise/speech determination unit 208 determines that the pitch extraction unit 207 extracted no pitch period in the recognition processing section, the section is determined to be noise and rejected, even if the recognition determination unit 211 determines that the minimum distance obtained by the matching unit 206 satisfies the threshold on distance stored in the distance threshold storage unit 210.

[0024] When the signal is rejected as noise, the speech must be input again through the input means 201, as in the prior art. The second embodiment is described next. The second embodiment differs from the first embodiment in that the set period range in the period range storage unit 209 can be changed.

[0025] As in the first embodiment, after the pitch extraction unit 207 extracts the pitch period of the signal, the noise/speech determination unit 208 determines whether a pitch period was extracted in the recognition processing section detected by the section detection unit 204. If it determines that a pitch period was extracted in that section, it further determines whether the pitch period of the section is within the set period range for the pitch period stored in the period range storage unit 209.

[0026] In this embodiment, the set period range for the pitch period held in the period range storage unit 209 can be changed. By replacing the set period range preset in the period range storage unit 209 with a set period range that matches the pitch period of a given speaker, speech recognition processing can be performed for that specific speaker only. The third embodiment is described next.

[0027] The third embodiment differs from the first embodiment in that the set period range for the pitch period in the period range storage unit 209 is changed according to the pitch period of the input speech. Description of the processing that is the same as in the first and second embodiments is omitted. In this embodiment, when the noise/speech determination unit 208 determines that the signal in the recognition processing section detected by the section detection unit 204 is speech, the set period range for the pitch period preset in the period range storage unit 209 is changed according to the pitch period extracted from that recognition processing section.

[0028]

[Effects of the Invention] As described above, the present invention makes it possible to perform speech recognition processing with a high noise rejection capability on input speech mixed with noise. This reduces the cases in which recognition processing is applied to noise and an incorrect recognition result is output, and thereby improves the accuracy rate of the recognition results.

[Brief description of the drawings]

FIG. 1 is a diagram illustrating the principle of the present invention.

FIG. 2 is a diagram showing an embodiment of the present invention.

FIG. 3 is a diagram showing a conventional example.

[Explanation of symbols]

1: Recognition unit
2: Pitch extraction unit
3: Determination unit
4: Period range storage unit
101, 201: Input means
102, 202: A/D conversion unit
103, 203: Feature extraction unit
104, 204: Section detection unit
105, 205: Dictionary
106, 206: Matching unit
107, 210: Distance threshold storage unit
108, 211: Recognition determination unit
207: Pitch extraction unit
208: Noise/speech determination unit
209: Period range storage unit

Continuation of the front page

(56) References: JP-A-4-115299 (JP, A); JP-A-2-238493 (JP, A); JP-A-1-159697 (JP, A); JP-A-64-40997 (JP, A); JP-A-63-163495 (JP, A); JP-A-63-123100 (JP, A); JP-B-61-29517 (JP, B2); Furui, "Digital Speech Processing" (1985-9-25), Tokai University Press, pp. 17-18

(58) Fields searched (Int. Cl.7, DB name): G10L 15/00 - 17/00

Claims

(57) [Claims]

A speech recognition device comprising: a recognition unit that estimates, from an input speech signal, a section containing a speech signal and outputs a speech recognition candidate for the signal in that section; a pitch extraction unit that extracts a pitch period from the input speech signal; and a determination unit that outputs the speech recognition candidate output by the recognition unit as a speech recognition result when the pitch period extracted by the pitch extraction unit is within a set period range held in a period range storage unit, wherein, when the pitch period extracted by the pitch extraction unit is within the set period range preset in the period range storage unit, the period range set in the period range storage unit is changed to a set period range based on the extracted pitch period.