JPH11119793A

JPH11119793A - Speech recognition device

Info

Publication number: JPH11119793A
Application number: JP9283324A
Authority: JP
Inventors: Dairo Katayama; 大朗片山; Junichi Nakabashi; 順一中橋; Mitsuhiko Serikawa; 光彦芹川; Yoshihisa Nakato; 良久中藤
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1997-10-16
Filing date: 1997-10-16
Publication date: 1999-04-30

Abstract

PROBLEM TO BE SOLVED: To automatically set the cumulative-likelihood threshold and the target number of active paths used for pruning, by determining the pruning threshold of the beam search method used in the speech recognition means from a detected perplexity. SOLUTION: A speech input means 102 detects a speech section from the time waveform, divides the detected speech section into frames, and parameterizes the framed speech for recognition. A speech recognition means 104 compares the signal parameterized by the speech input means 102 with standard patterns read from a dictionary, using the beam search method, and outputs the recognition result as a character string or the like. A perplexity detecting means 105 reads the dictionary stored in a storage means 103 and detects the perplexity F, a parameter representing the complexity of the recognition task. A threshold setting means 106 determines, from the detected perplexity, the pruning threshold α for the beam search method carried out in the speech recognition means 104.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a speech recognition device that takes speech, such as a word uttered by a person, as an input signal, recognizes it by comparing it with standard patterns and searching for the most similar pattern, and outputs the result as a character string or the like.

[0002]

2. Description of the Related Art

A conventional technique will be described with reference to FIG. 6.

[0003] A conventional speech recognition apparatus 601 recognizes input speech and outputs a result. It comprises a speech input means 602, a storage means 603, a speech recognition means 604, and a threshold setting means 605.

[0004] The speech input means 602 detects a speech section from the time waveform, divides the detected speech section into frames, and parameterizes the framed speech for recognition.
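The front-end steps in [0004] can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's method: the speech section is located with a simple short-time energy threshold, and each frame is reduced to a hypothetical two-element feature vector (log energy and zero-crossing rate). A practical recognizer would use spectral features such as cepstra. The function names are illustrative.

```python
import math


def detect_speech_section(samples, frame_len, threshold):
    """Return (start, end) sample indices spanning the first through the last
    non-overlapping frame whose mean energy exceeds `threshold`, or None."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len


def split_frames(section, frame_len, hop):
    """Divide the detected section into (possibly overlapping) frames."""
    return [section[i:i + frame_len]
            for i in range(0, len(section) - frame_len + 1, hop)]


def parameterize(frame):
    """Hypothetical feature vector: (log energy, zero-crossing rate)."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return (math.log(energy + 1e-10), crossings / (len(frame) - 1))


# Silence, a 100-sample burst, then silence again.
signal = [0.0] * 100 + [1.0, -1.0] * 50 + [0.0] * 100
start, end = detect_speech_section(signal, 50, 0.1)   # (100, 200)
features = [parameterize(f) for f in split_frames(signal[start:end], 50, 25)]
```

Overlapping frames (hop smaller than the frame length) are a common choice; the patent itself does not say whether its frames overlap.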

[0005] The storage means 603 stores in advance, as a dictionary, standard patterns of parameterized speech.

[0006] The speech recognition means 604 compares the signal parameterized by the speech input means with the standard patterns read from the dictionary, using a DP matching method or the like, and outputs the recognition result as a character string or the like. To reduce the amount of computation, the speech recognition means 604 performs recognition by combining the DP matching method (or a similar method) with a beam search method.
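The DP matching step of [0006] can be illustrated with a minimal dynamic time warping (DTW) sketch. It assumes feature vectors are tuples of floats, uses the Euclidean distance between frames, and allows the three classic DTW steps; the patent does not fix these details, and the dictionary contents are hypothetical. Beam search is deliberately left out here: recognition simply picks the word whose standard pattern has the smallest cumulative distance.

```python
import math


def dtw_distance(x, y):
    """Cumulative distance of the best alignment between feature sequences
    x and y, allowing steps from (i-1, j), (i, j-1) and (i-1, j-1)."""
    n, m = len(x), len(y)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(x[i - 1], y[j - 1])   # Euclidean frame distance
            D[i][j] = local + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def recognize(features, dictionary):
    """Return the word whose standard pattern is closest to `features`."""
    return min(dictionary, key=lambda word: dtw_distance(features, dictionary[word]))


dictionary = {"a": [(0.0,), (1.0,), (2.0,)], "b": [(5.0,), (5.0,)]}
print(recognize([(0.0,), (0.0,), (1.0,), (2.0,)], dictionary))   # a
```

Filling the whole matrix for every word is exactly the cost that beam search is meant to cut.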

[0007] The threshold setting means 605 supplies the speech recognition means 604 with the threshold α used for pruning in the beam search performed by the speech recognition means.

[0008] The speech recognition means 604 solves an optimal path search problem: starting from the first frame, it compares the parameterized input speech with the standard patterns and accumulates the likelihood along each path. Pruning by the beam search method, as performed in the speech recognition means 604, means that among the paths computed from the first frame onward in this optimal path search, computation is stopped partway for those with a low cumulative likelihood. This is done to reduce the amount of computation in the speech recognition means. It is called pruning because it cuts off branches that multiply as the frames advance.
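A frame-synchronous variant makes the pruning of [0008] concrete. In this sketch (an assumption, not the patent's implementation) the cumulative log-likelihood is taken as minus the summed Euclidean distance, each path may stay on or advance one frame of a standard pattern per input frame, and after every frame any path scoring below `best + alpha` (with `alpha <= 0`) is dropped. The function also returns the number of active paths per frame, the quantity later written n(t).

```python
import math


def beam_search(features, dictionary, alpha):
    """Frame-synchronous DP matching with beam pruning.

    A hypothesis (word, j) means the current input frame is aligned with
    frame j of that word's standard pattern.  Its score is a cumulative
    log-likelihood, taken here as minus the summed Euclidean distance.
    After each frame, hypotheses scoring below best + alpha (alpha <= 0)
    are pruned.  Returns (best_word, number of active paths per frame).
    """
    active = {(w, 0): -math.dist(features[0], pat[0]) for w, pat in dictionary.items()}
    counts = [len(active)]
    for x in features[1:]:
        extended = {}
        for (w, j), score in active.items():
            pat = dictionary[w]
            for k in (j, j + 1):            # stay, or advance one pattern frame
                if k < len(pat):
                    s = score - math.dist(x, pat[k])
                    if s > extended.get((w, k), -math.inf):
                        extended[(w, k)] = s
        best = max(extended.values())
        active = {h: s for h, s in extended.items() if s >= best + alpha}  # pruning
        counts.append(len(active))
    finished = {w: s for (w, j), s in active.items() if j == len(dictionary[w]) - 1}
    best_word = max(finished, key=finished.get) if finished else None
    return best_word, counts


dictionary = {"a": [(0.0,), (1.0,), (2.0,)], "b": [(5.0,), (5.0,)]}
feats = [(0.0,), (0.0,), (1.0,), (2.0,)]
print(beam_search(feats, dictionary, -1.0))     # ('a', [2, 2, 2, 2])
print(beam_search(feats, dictionary, -100.0))   # ('a', [2, 4, 5, 5])
```

A narrow beam (alpha = -1.0) keeps only two paths per frame here and still finds the same word as the wide beam; if alpha were set too narrow the optimal path itself could be pruned.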

[0009] Pruning requires a threshold on the cumulative likelihood below which paths are cut. In the conventional speech recognition apparatus 601, either this threshold or the number of paths to be kept had to be supplied from outside. The number of paths to be kept is called the target number of active paths.
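The two externally supplied settings described in [0009] correspond to two pruning rules, sketched below on hypothetical path scores: a cumulative-likelihood threshold measured from the best path (an assumed convention; the patent does not say whether its threshold is absolute or relative), and a target number of active paths that keeps only the highest-scoring paths.

```python
def prune_by_threshold(scores, alpha):
    """Keep hypotheses whose cumulative log-likelihood is >= best + alpha."""
    best = max(scores.values())
    return {h: s for h, s in scores.items() if s >= best + alpha}


def prune_to_target(scores, target):
    """Keep only the `target` highest-scoring hypotheses."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:target])


scores = {"p1": -1.0, "p2": -2.5, "p3": -4.0, "p4": -0.5}
print(sorted(prune_by_threshold(scores, -2.0)))   # ['p1', 'p2', 'p4']
print(sorted(prune_to_target(scores, 2)))         # ['p1', 'p4']
```

Either way, a good value depends on the task, which is the difficulty raised in [0012].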

[0010] With the speech recognition apparatus 601 configured as above, input speech can be recognized and the result output as a character string or the like.

[0011]

Problems to be Solved by the Invention

In the conventional speech recognition apparatus, setting the pruning threshold for the beam search method required either a cumulative-likelihood threshold or a target number of active paths to be supplied from outside.

[0012] However, the cumulative-likelihood threshold and the target number of active paths depend on the vocabulary size of the recognition task, the similarity among the words registered in the dictionary, the accuracy of the standard patterns, and so on. Therefore, to set the cumulative-likelihood threshold or the target number of active paths for pruning, the relationship between these settings and the recognition rate had to be investigated in advance for each recognition task. This was very laborious.

[0013] The present invention has been made in view of the above problems, and its object is to set the cumulative-likelihood threshold for pruning and the target number of active paths automatically.

[0014]

Means for Solving the Problems

To solve the above problems, a speech recognition device of the present invention recognizes input speech and outputs a recognition result, and comprises: speech input means that receives speech, detects a speech section from the time waveform of the speech, divides the detected section into frames, and converts the framed speech into parameters for recognition; storage means that stores standard patterns of speech prepared in advance as a dictionary; speech recognition means that compares the signal converted into parameters by the speech input means with the standard patterns read from the dictionary, using a beam search method, and outputs a recognition result; perplexity detecting means that reads the dictionary stored in the storage means and detects a perplexity representing the complexity of the recognition task performed by the speech recognition means; and threshold setting means that determines, from the perplexity detected by the perplexity detecting means, the pruning threshold for the beam search method performed by the speech recognition means.

[0015] Further, a speech recognition device of the present invention recognizes input speech and outputs a recognition result, and comprises: speech input means that receives speech, detects a speech section from the time waveform of the speech, divides the detected section into frames, and parameterizes the framed speech for recognition; storage means that stores standard patterns of speech prepared in advance as a dictionary; speech recognition means that compares the signal parameterized by the speech input means with the standard patterns read from the dictionary, using a beam search method, and outputs a recognition result; section perplexity detecting means that receives from the speech recognition means the number of the frame currently being processed, reads the dictionary stored in the storage means, and detects a perplexity representing the complexity of a given frame section of the recognition task performed by the speech recognition means; and threshold setting means that determines, from the perplexity detected by the section perplexity detecting means, the pruning threshold for the beam search method performed by the speech recognition means.

[0016] Further, a speech recognition device of the present invention recognizes input speech and outputs a recognition result, and comprises: speech input means that receives speech, detects a speech section from the time waveform of the speech, divides the detected section into frames, and parameterizes the framed speech for recognition; storage means that stores standard patterns of speech prepared in advance as a dictionary; speech recognition means that compares the signal parameterized by the speech input means with the standard patterns read from the dictionary, using a beam search method, and outputs a recognition result; perplexity detecting means that reads the dictionary stored in the storage means and detects a perplexity representing the complexity of the recognition task performed by the speech recognition means; actual active path number detecting means that detects the number of paths actually active in the speech recognition means; and threshold setting means that receives the perplexity output from the perplexity detecting means and the number of active paths output from the actual active path number detecting means, determines the pruning threshold for the beam search method, and outputs that threshold to the speech recognition means.

[0017] Further, a speech recognition device of the present invention recognizes input speech and outputs a recognition result, and comprises: speech input means that receives speech, detects a speech section from the time waveform of the speech, divides the detected section into frames, and parameterizes the framed speech for recognition; storage means that stores standard patterns of speech prepared in advance as a dictionary; speech recognition means that compares the signal parameterized by the speech input means with the standard patterns read from the dictionary, using a beam search method, and outputs a recognition result; perplexity detecting means that reads the dictionary stored in the storage means and detects a perplexity representing the complexity of the recognition task performed by the speech recognition means; active path number calculating means that receives from the speech recognition means the number of the frame currently being processed, reads the dictionary stored in the storage means, and calculates the computed number of active paths of the recognition task at that frame number; target active path number setting means that receives the perplexity output from the perplexity detecting means and the computed number of active paths output from the active path number calculating means, and determines and outputs the target number of active paths for the beam search method in the speech recognition device; actual active path number detecting means that detects the actual number of active paths in the speech recognition device; and threshold setting means that receives the target number of active paths output from the target active path number setting means and the number of active paths output from the actual active path number detecting means and, when the number of active paths exceeds the target number of active paths, updates the pruning threshold and outputs the new pruning threshold to the speech recognition device.

[0018] Further, a speech recognition device of the present invention recognizes input speech and outputs a recognition result, and comprises: speech input means that receives speech, detects a speech section from the time waveform of the speech, divides the detected section into frames, and parameterizes the framed speech for recognition; storage means that stores standard patterns of speech prepared in advance as a dictionary; speech recognition means that compares the signal parameterized by the speech input means with the standard patterns read from the dictionary, using a beam search method, and outputs a recognition result; perplexity detecting means that reads the dictionary stored in the storage means and detects a perplexity representing the complexity of the recognition task performed by the speech recognition means; active path number calculating means that receives from the speech recognition means the number of the frame currently being processed, reads the dictionary stored in the storage means, and calculates the computed number of active paths of the recognition task at that frame number; target active path number setting means that receives the perplexity output from the perplexity detecting means and the computed number of active paths output from the active path number calculating means, determines an appropriate target number of active paths for the beam search method in the speech recognition device, and outputs that target number to the speech recognition means; actual active path number detecting means that detects the actual number of active paths in the speech recognition means; comparison means that receives the number of active paths output from the actual active path number detecting means and the target number of active paths and, when the number of active paths is larger than the target number of active paths, outputs a threshold update command; mean/variance detecting means that detects the mean and the variance of the cumulative likelihoods of the active paths computed in a given frame by the speech recognition means; and threshold setting means that, when it receives the mean and variance output from the mean/variance detecting means together with the threshold update command output from the comparison means, updates the pruning threshold of the speech recognition means.

[0019]

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Embodiments of the present invention will be described below with reference to FIGS. 1 to 5.

[0020] (Embodiment 1) Embodiment 1 of the present invention will be described with reference to FIG. 1.

[0021] The speech recognition apparatus 101 of Embodiment 1 comprises a speech input means 102, a storage means 103, a speech recognition means 104, a perplexity detecting means 105, and a first threshold setting means 106.

[0022] The speech input means 102 detects a speech section from the time waveform, divides the detected speech section into frames, and parameterizes the framed speech for recognition.

[0023] The storage means 103 stores standard patterns of speech prepared in advance as a dictionary.

[0024] The speech recognition means 104 compares the signal parameterized by the speech input means with the standard patterns read from the dictionary, using a beam search method, and outputs the recognition result as a character string or the like.

[0025] The perplexity detecting means 105 reads the dictionary stored in the storage means 103 and detects the perplexity F, a parameter representing the complexity of the recognition task. Here, the perplexity F is a constant. The perplexity detecting means 105 detects either the perplexity of the entire recognition task or the perplexity of the initial frame portion of the recognition task.
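The patent does not define how perplexity is computed from the dictionary. One plausible reading, assumed here, treats each word as a string of symbols (for example, phonemes), gives every word equal probability, and measures the per-symbol perplexity of a span of positions as 2 raised to the conditional entropy of those symbols divided by the span length, i.e. a geometric-mean branching factor. For V distinct words of length L this gives V**(1/L) for the whole task. The dictionary below is hypothetical.

```python
import math
from collections import Counter


def prefix_entropy(words, k):
    """Entropy in bits of the first k symbols of a word drawn uniformly
    from the dictionary `words`."""
    n = len(words)
    counts = Counter(w[:k] for w in words)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def perplexity(words, start=0, end=None):
    """Per-symbol perplexity of symbol positions start..end-1 given the
    preceding symbols (a geometric-mean branching factor)."""
    if end is None:
        end = max(len(w) for w in words)
    if end <= start:
        raise ValueError("end must be greater than start")
    bits = prefix_entropy(words, end) - prefix_entropy(words, start)
    return 2 ** (bits / (end - start))


words = ["aka", "aki", "ao", "umi"]       # hypothetical phoneme-string dictionary
F_task = perplexity(words)                # whole task: 4 words over 3 symbols
F_initial = perplexity(words, 0, 1)       # first symbol only
print(round(F_task, 3), round(F_initial, 3))   # 1.587 1.755
```

With this definition the initial portion of the example dictionary has a higher perplexity than the task as a whole, in line with the tendency described in [0027].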

[0026] From the perplexity detected by the perplexity detecting means 105, the first threshold setting means 106 determines the threshold α for pruning in the beam search performed by the speech recognition means 104. Since perplexity is a parameter representing the complexity of the recognition task, the cumulative-likelihood threshold α should be made small when the perplexity is large and, conversely, large when the perplexity is small.
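A hypothetical mapping from F to α is sketched below. It assumes α is a threshold on the cumulative log-likelihood measured from the best path in the current frame (a path survives if its score is at least best + α, so α ≤ 0), and that α decreases linearly in log2 F. Both the functional form and the constants `base_width` and `width_per_bit` are illustrative choices, not values from the patent. Under this convention, the smaller α prescribed for a large perplexity lets more paths survive.

```python
import math


def threshold_from_perplexity(F, base_width=5.0, width_per_bit=3.0):
    """Map task perplexity F (>= 1) to a pruning threshold alpha <= 0.

    Convention (assumed): a path survives if its cumulative log-likelihood
    is at least best + alpha.  Larger F gives a smaller (more negative)
    alpha, so fewer paths are pruned on harder tasks.
    base_width and width_per_bit are hypothetical tuning constants.
    """
    if F < 1:
        raise ValueError("perplexity must be at least 1")
    return -(base_width + width_per_bit * math.log2(F))


print(threshold_from_perplexity(1.0), threshold_from_perplexity(4.0))   # -5.0 -11.0
```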

[0027] In many recognition tasks, the perplexity in the initial frame portion tends to be larger than that of the task as a whole. Therefore, if the perplexity detecting means 105 detects the perplexity of the initial frame portion of the recognition task and the pruning threshold is set from it, the risk of setting the pruning threshold too narrow and pruning even the optimal path is reduced.

[0028] With the above configuration, the speech recognition apparatus of the present invention can automatically set the pruning threshold α using the perplexity F of the recognition task.

[0029] (Embodiment 2) Embodiment 2 of the present invention will be described with reference to FIG. 2.

[0030] The speech recognition apparatus 201 of Embodiment 2 comprises a speech input means 202, a storage means 203, a speech recognition means 204, a section perplexity detecting means 205, and a second threshold setting means 206.

[0031] The configuration of Embodiment 2 is almost the same as that of Embodiment 1; the difference is that it has the section perplexity detecting means 205 in place of the perplexity detecting means 105.

[0032] The section perplexity detecting means 205 receives from the speech recognition means the number t of the frame currently being processed, and reads the dictionary stored in the storage means 203. Referring to the dictionary of the recognition task, it detects the perplexity Fb of a limited section, for example the initial, intermediate, or late frame portion.
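How the frame number t selects a section is not specified in the patent. The sketch below assumes, hypothetically, that every dictionary symbol lasts a fixed number of frames (`frames_per_symbol`), maps t to a symbol position, and computes Fb as the per-symbol perplexity of that span: 2 raised to the conditional entropy per symbol, with all words equally likely. The dictionary is illustrative.

```python
import math
from collections import Counter


def prefix_entropy(words, k):
    """Entropy (bits) of the first k symbols of a uniformly chosen word."""
    n = len(words)
    counts = Counter(w[:k] for w in words)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def section_perplexity(words, t, frames_per_symbol=10, width=1):
    """Perplexity Fb of the dictionary section aligned with frame t.

    Assumes every symbol lasts `frames_per_symbol` frames, so frame t
    falls in symbol position p = t // frames_per_symbol; the section is
    positions p .. p + width - 1.
    """
    p = t // frames_per_symbol
    bits = prefix_entropy(words, p + width) - prefix_entropy(words, p)
    return 2 ** (bits / width)


words = ["aka", "aki", "ao", "umi"]     # hypothetical phoneme-string dictionary
for t in (0, 15, 25, 35):               # initial, intermediate, late, past the end
    print(t, round(section_perplexity(words, t), 3))
```

In this example Fb falls from the initial to the late section, so a separate threshold per section, as in [0034], would prune more tightly toward the end of the word.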

[0033] In general, the perplexity differs among the initial, intermediate, and late frame portions of the words registered in the dictionary.

[0034] With the section perplexity detecting means of the present invention it is possible, for example, to detect the perplexity Fb corresponding to each of the initial, intermediate, and late sections of the input speech and to set a pruning threshold for each. This reduces both the redundancy of computing the cumulative likelihood of unnecessary paths in the beam search and the risk of pruning paths that should be kept.

[0035] (Embodiment 3) A speech recognition apparatus 301 according to Embodiment 3 of the present invention will be described with reference to FIG. 3.

[0036] The speech input means 302, the storage means 303, the speech recognition means 304, and the perplexity detecting means 305 are the same as in Embodiment 1. Embodiment 3 differs from the preceding embodiments in that it comprises an actual active path number detecting means 306 and a third threshold setting means 307.

[0037] The actual active path number detecting means 306 detects, for each frame, the number n(t) of active paths in the beam search performed by the speech recognition means 304, and sends it to the third threshold setting means 307.

[0038] The third threshold setting means 307 receives the perplexity F output from the perplexity detecting means 305 and the number of active paths n(t) output from the actual active path number detecting means 306, and determines the threshold α(t) for pruning active paths in the beam search performed by the speech recognition means 304.
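The patent does not give the rule by which F and n(t) determine α(t). One hypothetical feedback rule is sketched below: an active-path budget proportional to F is assumed, α(t) is tightened by a fixed step when n(t) exceeds the budget, and loosened when n(t) falls below half of it. The convention is score ≥ best + α with α ≤ 0; `paths_per_unit`, `step`, and `alpha_min` are illustrative constants.

```python
def update_threshold(alpha, F, n_active, paths_per_unit=20.0, step=0.5,
                     alpha_min=-50.0):
    """One per-frame update of alpha(t), where a path survives if its
    cumulative log-likelihood is at least best + alpha (alpha <= 0)."""
    budget = paths_per_unit * F              # hypothetical: budget grows with F
    if n_active > budget:
        return min(0.0, alpha + step)        # too many active paths: tighten
    if n_active < budget / 2:
        return max(alpha_min, alpha - step)  # comfortably few: loosen
    return alpha


alpha = -10.0
for n in (100, 100, 30, 10):                 # hypothetical n(t) values, F = 2
    alpha = update_threshold(alpha, 2.0, n)
print(alpha)   # -9.5
```

The half-budget dead zone keeps α from oscillating every frame when n(t) hovers near the budget.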

[0039] The speech recognition apparatus 301 of this embodiment is characterized by using both the perplexity F and the number of active paths n(t) to set the pruning threshold α(t). Compared with setting the threshold from the perplexity F alone, this allows α(t) to be set automatically to a more appropriate value. That is, it reduces two problems that arise when the number n(t) of active paths actually being processed is unknown: the redundancy of leaving unnecessary paths in the computation because pruning is insufficient, and the risk of pruning even the optimal path because pruning is excessive.

[0040] (Embodiment 4) A speech recognition apparatus 401 according to Embodiment 4 of the present invention will be described with reference to FIG. 4.

[0041] The speech input means 402, the storage means 403, the speech recognition means 404, the perplexity detecting means 405, and the actual active path number detecting means 408 are the same as in Embodiment 3. The speech recognition apparatus 401 of Embodiment 4 further comprises an active path number calculating means 406 connected to the storage means 403, a target active path number setting means 407, and a fourth threshold setting means 409.

[0042] The active path number calculating means 406 receives from the speech recognition means 404 the number t of the frame currently being processed, and reads the standard patterns stored in the storage means 403. It then calculates the number of active paths M(t) whose cumulative likelihood would have to be computed at frame t if the speech recognition means 404 performed no pruning.
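M(t) can be computed from the dictionary alone once a path topology is fixed. Assuming, hypothetically, that every path starts at the first frame of a standard pattern and advances at most one pattern frame per input frame, a word whose pattern has L frames contributes min(t + 1, L) reachable cells at frame t. The sketch ignores any requirement that a path must still be able to reach the end of its pattern, and includes a brute-force enumeration that checks the closed form.

```python
def computed_active_paths(pattern_lengths, t):
    """M(t): (word, pattern-frame) cells reachable at input frame t when
    nothing is pruned, under the stay-or-advance-by-one assumption."""
    return sum(min(t + 1, length) for length in pattern_lengths)


def reachable_cells(pattern_lengths, t):
    """Brute-force check: enumerate the reachable cells frame by frame."""
    cells = {(w, 0) for w in range(len(pattern_lengths))}
    for _ in range(t):
        cells = {(w, k) for (w, j) in cells for k in (j, j + 1)
                 if k < pattern_lengths[w]}
    return len(cells)


lengths = [3, 2, 5]     # hypothetical standard-pattern lengths in frames
print([computed_active_paths(lengths, t) for t in range(5)])   # [3, 6, 8, 9, 10]
```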

[0043] The target active path number setting means 407 receives the perplexity F output from the perplexity detecting means 405 and the computed number of active paths M(t) output from the active path number calculating means 406, sets a target number of active paths N(t) suited to the current recognition task, and outputs it to the fourth threshold setting means 409. Here, the perplexity F is a constant, whereas the computed number of active paths M(t) changes from frame to frame.
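The rule that combines F and M(t) into N(t) is likewise left open. A hypothetical choice is to keep a fraction of M(t) that grows with the perplexity, N(t) = ceil(M(t) · F / (F + K)), clipped to the range 1..M(t). Here `half_point` (K) is an illustrative constant: at F = K, half of the unpruned paths are kept.

```python
import math


def target_active_paths(F, M, half_point=4.0):
    """Hypothetical N(t) from perplexity F and computed active paths M = M(t)."""
    if M <= 0:
        return 0
    # The kept fraction F / (F + half_point) grows toward 1 as F grows.
    return min(M, max(1, math.ceil(M * F / (F + half_point))))


print(target_active_paths(4.0, 100), target_active_paths(16.0, 100))   # 50 80
```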

[0044] The fourth threshold setting means 409 receives the actual number of active paths n(t) output from the actual active path number detecting means 408 and the target number of active paths N(t) output from the target active path number setting means 407, and updates the pruning threshold α(t) when the actual number n(t) is larger than the target N(t).
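One way to perform the update of [0044], assumed here for illustration, is to tighten α(t) just enough that the N(t) best-scoring paths survive the rule score ≥ best + α(t). The sketch never loosens α, and tied scores may let slightly more than N(t) paths through.

```python
def update_alpha(alpha, scores, target):
    """If more than `target` paths are active, tighten alpha so that the
    `target` best paths survive score >= best + alpha (alpha <= 0)."""
    if target < 1 or len(scores) <= target:
        return alpha                          # n(t) <= N(t): no update
    ranked = sorted(scores, reverse=True)
    return max(alpha, ranked[target - 1] - ranked[0])


scores = [0.0, -1.0, -2.0, -3.0]              # hypothetical n(t) = 4 path scores
print(update_alpha(-5.0, scores, 2))          # -1.0
```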

[0045] According to the present invention, when setting the pruning threshold α(t), the target number of active paths N(t) is set from the perplexity F of the recognition task and the computed number of active paths M(t), which is calculated from the dictionary of standard patterns of the recognition task; α(t) is then updated while N(t) is compared with the actual number of active paths n(t) in every frame. A more appropriate threshold can therefore be set automatically.

[0046] This reduces, in the beam search method, both the redundancy of computing unnecessary paths because pruning is insufficient and the risk of pruning even the optimal path because pruning is excessive.

[0047] (Embodiment 5) A speech recognition apparatus 501 according to Embodiment 5 of the present invention will be described with reference to FIG. 5.

[0048] The speech input means 502, the storage means 503, the speech recognition means 504, the perplexity detecting means 505, the active path number calculating means 506, the target active path number setting means 507, and the actual active path number detecting means 508 are the same as in Embodiment 4.

[0049] The speech recognition apparatus 501 of Embodiment 5 further comprises: a comparison means 509 that receives the actual number of active paths n(t) output from the actual active path number detecting means 508 and the target number of active paths N(t) output from the target active path number setting means 507; a mean/variance detecting means 510 that detects the mean A(t) and the variance B(t) of the cumulative likelihoods of all active paths computed in a given frame by the speech recognition means 504, which together describe how widely those cumulative likelihoods are spread; and a fifth threshold setting means 511 that receives the threshold update command, output by the comparison means 509 when the actual number of active paths n(t) exceeds the target N(t), together with the mean A(t) and variance B(t) of the cumulative likelihoods output from the mean/variance detecting means 510.

[0050] For each frame, the mean/variance detection means 510 detects the mean A(t) and variance B(t) of the cumulative likelihoods of the active paths computed by the speech recognition means 504. It then outputs them to the fifth threshold setting means 511.
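The per-frame statistics in [0050] are ordinary summary statistics over the active paths. The following minimal sketch makes two assumptions. First, each active path's cumulative likelihood is held as a log-likelihood score in a plain list. Second, B(t) is the population variance; the section does not say whether a population or an unbiased estimator is meant. The function name `likelihood_mean_variance` is hypothetical.

```python
def likelihood_mean_variance(scores):
    """Return (A(t), B(t)) for one frame.

    scores -- cumulative log-likelihoods of the active paths in frame t.
    B(t) is computed as the population variance (divide by n); this is an
    assumption, since the source does not specify the estimator.
    """
    if not scores:
        raise ValueError("no active paths in this frame")
    n = len(scores)
    mean = sum(scores) / n                               # A(t)
    variance = sum((s - mean) ** 2 for s in scores) / n  # B(t)
    return mean, variance
```

For example, `likelihood_mean_variance([-10, -12, -14, -16])` returns `(-13.0, 5.0)`.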

[0051] On receiving a threshold update command from the comparison means 509, the fifth threshold setting means 511 sets a new threshold α(t). It derives this threshold from two inputs: the mean A(t) and variance B(t) of the active-path cumulative likelihoods output by the mean/variance detection means 510, and the current threshold. It then sends the new threshold to the speech recognition means 504.

[0052] According to the present invention, the pruning threshold α(t) is obtained from three quantities. The first is the target number of active paths N(t), which is calculated from the perplexity F of the recognition task and the calculated number of active paths M(t). The other two are the mean A(t) and variance B(t) of the cumulative likelihoods of the active paths. A more accurate threshold can therefore be set automatically than in the preceding embodiments.

[0053] This reduces two problems in the beam search method. One is redundancy, where insufficient pruning leaves unnecessary paths to be computed. The other is the danger that excessive pruning cuts off even the optimal path.

[0054]

[Effects of the Invention] As described above, the present invention makes it possible to set two values automatically: the cumulative-likelihood threshold used for pruning and the target number of active paths.

[Brief description of the drawings]

FIG. 1 is a block diagram showing a speech recognition device according to a first embodiment of the present invention.

FIG. 2 is a block diagram showing a speech recognition device according to a second embodiment of the present invention.

FIG. 3 is a block diagram showing a speech recognition device according to a third embodiment of the present invention.

FIG. 4 is a block diagram showing a speech recognition device according to a fourth embodiment of the present invention.

FIG. 5 is a block diagram showing a speech recognition device according to a fifth embodiment of the present invention.

FIG. 6 is a block diagram showing a conventional speech recognition device.

[Explanation of symbols]

101 speech recognition device
102 speech input means
103 storage means
104 speech recognition means
105 perplexity detection means
106 first threshold setting means
201 speech recognition device
202 speech input means
203 storage means
204 speech recognition means
205 section perplexity detection means
206 second threshold setting means
301 speech recognition device
302 speech input means
303 storage means
304 speech recognition means
305 perplexity detection means
306 actual active path number detection means
307 third threshold setting means
401 speech recognition device
402 speech input means
403 storage means
404 speech recognition means
405 perplexity detection means
406 active path number calculation means
407 target active path number setting means
408 actual active path number detection means
409 fourth threshold setting means
501 speech recognition device
502 speech input means
503 storage means
504 speech recognition means
505 perplexity detection means
506 active path number calculation means
507 target active path number setting means
508 actual active path number detection means
509 comparison means
510 mean/variance detection means
511 fifth threshold setting means

Continuation of front page: (72) Inventor: Yoshihisa Nakato, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma City, Osaka

Claims

[Claims]

1. A speech recognition apparatus for recognizing input speech and outputting a recognition result, comprising: a speech input means that receives speech, detects a speech section from the time waveform of the speech, divides the detected speech section into frames, and converts the frame-divided speech into parameters for recognition; a storage means that stores, as a dictionary, standard patterns of speech prepared in advance; a speech recognition means that compares the signal parameterized by the speech input means with the standard patterns read from the dictionary using a beam search method and outputs a recognition result; a perplexity detection means that reads the dictionary stored in the storage means and detects a perplexity representing the complexity of the recognition task performed by the speech recognition means; and a threshold setting means that determines, from the perplexity detected by the perplexity detection means, a pruning threshold for the beam search method performed by the speech recognition means.

2. The speech recognition apparatus according to claim 1, wherein said perplexity detection means detects a perplexity of an initial frame portion of said input speech.

3. A speech recognition apparatus for recognizing input speech and outputting a recognition result, comprising: a speech input means that receives speech, detects a speech section from the time waveform of the speech, divides the detected speech section into frames, and parameterizes the frame-divided speech for recognition; a storage means that stores, as a dictionary, standard patterns of speech prepared in advance; a speech recognition means that compares the signal parameterized by the speech input means with the standard patterns read from the dictionary using a beam search method and outputs a recognition result; a section perplexity detection means that receives, from the speech recognition means, the number of the frame currently being processed, reads the dictionary stored in the storage means, and detects a perplexity representing the complexity of a given frame section of the recognition task performed by the speech recognition means; and a threshold setting means that determines, from the perplexity detected by the section perplexity detection means, a pruning threshold for the beam search method performed by the speech recognition means.

4. A speech recognition device for recognizing input speech and outputting a recognition result, comprising: a speech input means that receives speech, detects a speech section from the time waveform of the speech, divides the detected speech section into frames, and parameterizes the frame-divided speech for recognition; a storage means that stores, as a dictionary, standard patterns of speech prepared in advance; a speech recognition means that compares the signal parameterized by the speech input means with the standard patterns read from the dictionary using a beam search method and outputs a recognition result; a perplexity detection means that reads the dictionary stored in the storage means and detects a perplexity representing the complexity of the recognition task performed by the speech recognition means; an actual active path number detection means that detects the number of paths actually active in the speech recognition means; and a threshold setting means that receives the perplexity output from the perplexity detection means and the number of active paths output from the actual active path number detection means, determines the pruning threshold for the beam search method, and outputs the pruning threshold to the speech recognition means.

5. A speech recognition apparatus for recognizing input speech and outputting a recognition result, comprising: a speech input means that receives speech, detects a speech section from the time waveform of the speech, divides the detected speech section into frames, and parameterizes the frame-divided speech for recognition; a storage means that stores, as a dictionary, standard patterns of speech prepared in advance; a speech recognition means that compares the signal parameterized by the speech input means with the standard patterns read from the dictionary using a beam search method and outputs a recognition result; a perplexity detection means that reads the dictionary stored in the storage means and detects a perplexity representing the complexity of the recognition task performed by the speech recognition means; an active path number calculation means that receives, from the speech recognition means, the number of the frame currently being processed, reads the dictionary stored in the storage means, and calculates the calculated number of active paths of the recognition task at that frame number; a target active path number setting means that receives the perplexity output from the perplexity detection means and the calculated number of active paths output from the active path number calculation means, and determines and outputs a target number of active paths for the beam search method in the speech recognition means; an actual active path number detection means that detects the actual number of active paths in the speech recognition means; and a threshold setting means that receives the target number of active paths output from the target active path number setting means and the number of active paths output from the actual active path number detection means and, when the number of active paths exceeds the target number of active paths, updates the pruning threshold and outputs the new pruning threshold to the speech recognition means.

6. A speech recognition apparatus for recognizing input speech and outputting a recognition result, comprising: a speech input means that receives speech, detects a speech section from the time waveform of the speech, divides the detected speech section into frames, and parameterizes the frame-divided speech for recognition; a storage means that stores, as a dictionary, standard patterns of speech prepared in advance; a speech recognition means that compares the signal parameterized by the speech input means with the standard patterns read from the dictionary using a beam search method and outputs a recognition result; a perplexity detection means that reads the dictionary stored in the storage means and detects a perplexity representing the complexity of the recognition task performed by the speech recognition means; an active path number calculation means that receives, from the speech recognition means, the number of the frame currently being processed, reads the dictionary stored in the storage means, and calculates the calculated number of active paths of the recognition task at that frame number; a target active path number setting means that receives the perplexity output from the perplexity detection means and the calculated number of active paths output from the active path number calculation means, and determines and outputs an appropriate target number of active paths for the beam search method in the speech recognition means; an actual active path number detection means that detects the actual number of active paths in the speech recognition means; a comparison means that receives the number of active paths output from the actual active path number detection means and the target number of active paths, and outputs a threshold update command when the number of active paths exceeds the target number of active paths; a mean/variance detection means that detects the mean and variance of the cumulative likelihoods of the active paths computed in a given frame by the speech recognition means; and a threshold setting means that receives the mean and variance output from the mean/variance detection means and, when the threshold update command output from the comparison means is input, updates the pruning threshold of the speech recognition means.
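Every claim above starts from a perplexity that characterizes the recognition task, but the claims do not say how it is computed from the dictionary. One standard definition, used here purely as an illustration, is PP = 2^H. Here H is the entropy in bits of the probability distribution over the entries the recognizer must choose among. For a dictionary of V equally likely words, PP reduces to V. The function name `task_perplexity` and the use of explicit word probabilities are assumptions. The mapping from perplexity to a pruning threshold is omitted because it is not specified here.

```python
import math

def task_perplexity(word_probs):
    """Perplexity 2**H of a recognition task.

    word_probs -- probability of each dictionary entry (must sum to 1).
    H is the entropy of that distribution in bits; zero-probability
    entries contribute nothing.
    """
    if not math.isclose(sum(word_probs), 1.0):
        raise ValueError("word probabilities must sum to 1")
    entropy = -sum(p * math.log2(p) for p in word_probs if p > 0)
    return 2.0 ** entropy
```

A uniform eight-word dictionary, `task_perplexity([1/8] * 8)`, gives 8.0. A skewed three-word task, `task_perplexity([0.5, 0.25, 0.25])`, gives 2^1.5 (about 2.83), less than its vocabulary size of 3.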