JP4461646B2

JP4461646B2 - Speech recognition apparatus, beam search method, and beam search program

Info

Publication number: JP4461646B2
Application number: JP2001195050A
Authority: JP
Inventors: 孝友枝
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2001-06-27
Filing date: 2001-06-27
Publication date: 2010-05-12
Anticipated expiration: 2021-06-27
Also published as: JP2003015683A

Description

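[0001]
[Technical Field of the Invention]
The present invention relates to a speech recognition apparatus, a beam search method, and a beam search program.
[0002]
[Prior Art]
A method called beam search is known in speech recognition. In tasks such as large-vocabulary continuous speech recognition, it reduces computation and memory by keeping the number of retained hypotheses (recognition candidates) within a fixed bound. At each frame, beam search applies a predetermined beam width to the hypothesis set. Only the hypotheses with high evaluation values (scores) are kept, and those with low values are pruned.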
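[0003]
Portable terminals and car navigation systems need continuous speech recognition apparatuses that run on embedded CPUs with little memory. Typical applications are entering an address to search for a destination, or entering the name of a destination facility. For such tasks, an embedded speech recognition engine is required to provide the following functions.
[0004]
○ Recognition of a large vocabulary of proper nouns
○ Tolerance of pauses for breath during an utterance
○ After a misrecognition, tolerance of re-uttering from the misrecognized point while omitting the part that was recognized correctly
○ Tolerance of filler words such as "er"
○ Tolerance of utterances in a different word order
○ Operation within a given, fixed amount of memory (exceeding that amount, even temporarily, is undesirable)
and so on.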
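[0005]
An example of a conventional beam search method is given in the paper by Volker Steinbiss et al., "IMPROVEMENTS IN BEAM SEARCH", ICSLP 94, Yokohama, 1994. This conventional method is described below with reference to Figs. 10 and 11. For simplicity, only the processing of one frame is described.
[0006]
For a given frame, the set of hypotheses before hypothesis expansion is called the "pre-expansion hypotheses". The set of hypotheses after hypothesis expansion is called the "post-expansion hypotheses".
[0007]
Step 1: The search control unit 22 takes one hypothesis from the pre-expansion hypotheses and performs the processing up to Step 3 below. It runs this loop for each pre-expansion hypothesis in turn.
[0008]
Step 2: The hypothesis is expanded to its defined transition destinations, including the self-transition. The expansion follows the network recorded in the network management unit 23, whose arcs are recognition units such as phonemes. Network expansion means loading the part of the network that the search needs from external storage into memory. If it is required, the network management unit 23 expands the network.
[0009]
Step 3: If a word transition occurs when the hypothesis is expanded to a transition destination, the word end table management unit 24 records the word transition information.
[0010]
Step 4: When all pre-expansion hypotheses have been expanded, the beam adjustment unit 21 finds the highest score S among the post-expansion hypotheses.
[0011]
Step 5: The beam adjustment unit 21 sets the pruning threshold th to the highest score S from Step 4 minus a predetermined beam width b. It then prunes (discards) every post-expansion hypothesis whose score is less than or equal to th.
[0012]
Step 6: The beam adjustment unit 21 counts the number n of post-expansion hypotheses that remain after pruning.
[0013]
Step 7: If n is greater than a prespecified maximum number of hypotheses Nmax, the beam adjustment unit 21 finds a hypothesis score threshold th' that would leave Nmax hypotheses after pruning. This threshold is found with a histogram. If n is less than or equal to Nmax, hypothesis expansion for this frame ends.
[0014]
Step 8: The search control unit 22 prunes the post-expansion hypotheses again, using the threshold th' found by the beam adjustment unit 21 in Step 7. As a result, at most Nmax hypotheses survive. This completes hypothesis expansion for this frame.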
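The sketch below shows the expand-then-prune order of Steps 1 to 8. It also reports the peak number of hypotheses held before any pruning. The toy network format, the per-state Viterbi recombination, and the omission of word-end recording (Step 3) are all assumptions. For Step 7 it uses an exact sort in place of the source's histogram, for brevity.

```python
from typing import Dict, List, Tuple

# Hypothetical toy network: state -> [(next_state, transition log-prob)].
Network = Dict[str, List[Tuple[str, float]]]

def conventional_frame(pre: Dict[str, float], net: Network,
                       beam: float, n_max: int) -> Tuple[Dict[str, float], int]:
    """One frame of expand-then-prune beam search (Steps 1-8), as a sketch.

    Returns the surviving post-expansion hypotheses and the peak number of
    hypotheses held during expansion, which is what an embedded system
    would have to budget memory for.
    """
    post: Dict[str, float] = {}
    for state, score in pre.items():                 # Step 1: loop over hypotheses
        for nxt, logp in net.get(state, []):         # Step 2: self-loops listed explicitly
            cand = score + logp
            if cand > post.get(nxt, float("-inf")):  # assumed per-state recombination
                post[nxt] = cand
    peak = len(post)                                 # nothing has been pruned yet
    th = max(post.values()) - beam                   # Steps 4-5: th = S - b
    post = {s: v for s, v in post.items() if v > th} # prune scores <= th
    if len(post) > n_max:                            # Steps 6-7
        # The source uses a histogram here; an exact sort is used for brevity.
        th2 = sorted(post.values(), reverse=True)[n_max - 1]
        post = {s: v for s, v in post.items() if v >= th2}  # Step 8 (ties may keep more)
    return post, peak
```

In a small example where two hypotheses each branch into ten states and Nmax is 5, the function returns only five survivors. Even so, it has held twenty hypotheses at the peak. That temporary overshoot is the problem described in the next section.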
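[0015]
[Problems to Be Solved by the Invention]
The conventional beam search method controls the number of hypotheses only after hypothesis expansion. During expansion, the number of hypotheses therefore temporarily exceeds the prespecified number. This is a problem for speech recognition in embedded applications, where the maximum memory usage is limited.
[0016]
The present invention was made in view of this problem. Its object is to provide a speech recognition apparatus, a beam search method, and a beam search program that dynamically adjust and control the beam width. By doing so, they keep the number of hypotheses within a prespecified number both during and after hypothesis expansion. The memory required for processing therefore stays within a prespecified capacity.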
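[0017]
[Means for Solving the Problems]
To achieve this object, the invention according to claim 1 comprises:
- search control means for controlling a frame-synchronous beam search in continuous speech recognition;
- beam adjustment means for dynamically adjusting the beam width during the beam search, so that the expanded hypotheses stay within a predetermined maximum allowable number of hypotheses and a predetermined memory capacity for holding hypotheses;
- network management means for storing the network used in the beam search; and
- word end table management means for holding word history information of the hypotheses.
Before hypothesis expansion in the current frame, the beam adjustment means examines the behavior of hypothesis expansion in past frames. It predicts the number of post-expansion hypotheses in the current frame from three quantities: the hypothesis expansion speed in past frames, the hypothesis expansion acceleration in past frames, and the number of hypotheses before expansion in the current frame. It then adjusts the beam width based on that prediction.
[0018]
The invention according to claim 2 comprises:
- search control means for controlling a beam search on a trellis in discrete word recognition;
- beam adjustment means for dynamically adjusting the beam width during the beam search, so that the expanded hypotheses stay within a predetermined maximum allowable number of hypotheses and a predetermined memory capacity for holding hypotheses; and
- network management means for storing the network used in the beam search.
Before hypothesis expansion in the current frame, the beam adjustment means examines the behavior of hypothesis expansion in past frames. It predicts the number of post-expansion hypotheses in the current frame from the hypothesis expansion speed and acceleration in past frames and the number of hypotheses before expansion in the current frame. It then adjusts the beam width based on that prediction.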
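[0019]
The invention according to claim 3 comprises:
- search control means for controlling a frame-synchronous beam search in continuous speech recognition;
- beam adjustment means for dynamically adjusting the beam width during the beam search, so that the expanded hypotheses stay within a predetermined maximum allowable number of hypotheses and a predetermined memory capacity for holding hypotheses;
- network management means for storing the network used in the beam search; and
- word end table management means for holding word history information of the hypotheses.
Before hypothesis expansion in the current frame, the beam adjustment means examines the behavior of hypothesis expansion in past frames. It judges whether that behavior is a good indicator of the behavior of hypothesis expansion in the current frame. If it judges that it is not, it switches to another method of predicting that behavior. It then adjusts the beam width based on the prediction.
[0020]
The invention according to claim 4 comprises:
- search control means for controlling a beam search on a trellis in discrete word recognition;
- beam adjustment means for dynamically adjusting the beam width during the beam search, so that the expanded hypotheses stay within a predetermined maximum allowable number of hypotheses and a predetermined memory capacity for holding hypotheses; and
- network management means for storing the network used in the beam search.
Before hypothesis expansion in the current frame, the beam adjustment means examines the behavior of hypothesis expansion in past frames. It judges whether that behavior is a good indicator of the behavior of hypothesis expansion in the current frame. If it judges that it is not, it switches to another method of predicting that behavior. It then adjusts the beam width based on the prediction.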
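[0021]
The invention according to claim 5 is a beam search method in speech recognition, comprising:
- a step of calculating evaluation values for a set of hypotheses in a time frame;
- a threshold calculation step of calculating an evaluation-value threshold for pruning, by subtracting a predetermined beam width from the highest evaluation value derived in that step;
- a step, after the threshold calculation step, of counting the hypotheses whose evaluation values are greater than or equal to the threshold, and judging from that count whether to predict the number of post-expansion hypotheses;
- a prediction step, performed when that step judges that the prediction is to be made, of predicting the number of post-expansion hypotheses for the hypotheses whose evaluation values are greater than or equal to the threshold, and, if the predicted number exceeds a predetermined maximum allowable number, adjusting the beam width to bring it within that number and correcting the threshold;
- a pruning step of pruning the set of hypotheses using the threshold obtained in the above steps;
- an expansion step of expanding, according to the network, the hypotheses remaining after the pruning step; and
- a step, performed when it becomes certain during the expansion step that the number of expanded hypotheses will exceed the maximum allowable number, of canceling the expansion to restore the state before hypothesis expansion, readjusting the beam width to obtain a new threshold, returning to the pruning step to prune again with that threshold, and redoing the expansion.
[0022]
The invention according to claim 6 is a beam search method in speech recognition, comprising:
- a step of calculating evaluation values for a set of hypotheses in a time frame;
- a threshold calculation step of calculating an evaluation-value threshold for pruning, by subtracting a predetermined beam width from the highest evaluation value derived in that step;
- a prediction step of predicting the number of post-expansion hypotheses for the hypotheses whose evaluation values are greater than or equal to the threshold, and, if the predicted number exceeds a predetermined maximum allowable number, adjusting the beam width to bring it within that number and correcting the threshold;
- a pruning step of pruning the set of hypotheses using the threshold obtained in the above steps;
- an expansion step of expanding, according to the network, the hypotheses remaining after the pruning step; and
- a step, performed when it becomes certain during the expansion step that the number of expanded hypotheses will exceed the maximum allowable number, of canceling the expansion to restore the state before hypothesis expansion, readjusting the beam width to obtain a new threshold, returning to the pruning step to prune again with that threshold, and redoing the expansion.
The prediction step predicts the number of post-expansion hypotheses from the hypothesis expansion speed and acceleration in past frames and the number of hypotheses before expansion in the current frame.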
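[0023]
The invention according to claim 7 is a beam search method in speech recognition, comprising:
- a step of calculating evaluation values for a set of hypotheses in a time frame;
- a threshold calculation step of calculating an evaluation-value threshold for pruning, by subtracting a predetermined beam width from the highest evaluation value derived in that step;
- a prediction step of predicting the number of post-expansion hypotheses for the hypotheses whose evaluation values are greater than or equal to the threshold, and, if the predicted number exceeds a predetermined maximum allowable number, adjusting the beam width to bring it within that number and correcting the threshold;
- a pruning step of pruning the set of hypotheses using the threshold obtained in the above steps;
- an expansion step of expanding, according to the network, the hypotheses remaining after the pruning step; and
- a step, performed when it becomes certain during the expansion step that the number of expanded hypotheses will exceed the maximum allowable number, of canceling the expansion to restore the state before hypothesis expansion, readjusting the beam width to obtain a new threshold, returning to the pruning step to prune again with that threshold, and redoing the expansion.
The prediction step changes the method of predicting the number of post-expansion hypotheses according to the behavior of hypothesis expansion in past frames.
[0024]
The invention according to claim 8 is a beam search method in speech recognition, comprising:
- a step of calculating evaluation values for a set of hypotheses in a time frame;
- a threshold calculation step of calculating an evaluation-value threshold for pruning, by subtracting a predetermined beam width from the highest evaluation value derived in that step;
- a prediction step of predicting the number of post-expansion hypotheses for the hypotheses whose evaluation values are greater than or equal to the threshold, and, if the predicted number exceeds a predetermined maximum allowable number, adjusting the beam width to bring it within that number and correcting the threshold;
- a pruning step of pruning the set of hypotheses using the threshold obtained in the above steps;
- an expansion step of expanding, according to the network, the hypotheses remaining after the pruning step; and
- a step, performed when it becomes certain during the expansion step that the number of expanded hypotheses will exceed the maximum allowable number, of canceling the expansion to restore the state before hypothesis expansion, readjusting the beam width to obtain a new threshold, returning to the pruning step to prune again with that threshold, and redoing the expansion.
The prediction step predicts the number of post-expansion hypotheses from the hypothesis expansion speed and acceleration in past frames and the number of hypotheses before expansion in the current frame. It also changes the prediction method according to the behavior of hypothesis expansion in past frames.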
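[0025]
The invention according to claim 9 is a beam search program for speech recognition that causes a computer to execute:
- a process of calculating evaluation values for a set of hypotheses in a time frame;
- a threshold calculation process of calculating an evaluation-value threshold for pruning, by subtracting a predetermined beam width from the highest evaluation value derived by that process;
- a process, after the threshold calculation process, of counting the hypotheses whose evaluation values are greater than or equal to the threshold, and judging from that count whether to predict the number of post-expansion hypotheses;
- a prediction process, performed when that process judges that the prediction is to be made, of predicting the number of post-expansion hypotheses for the hypotheses whose evaluation values are greater than or equal to the threshold, and, if the predicted number exceeds a predetermined maximum allowable number, adjusting the beam width to bring it within that number and correcting the threshold;
- a pruning process of pruning the set of hypotheses using the threshold obtained in the above processes;
- an expansion process of expanding, according to the network, the hypotheses remaining after the pruning process; and
- a process, performed when it becomes certain during the expansion process that the number of expanded hypotheses will exceed the maximum allowable number, of canceling the expansion process to restore the state before hypothesis expansion, readjusting the beam width to obtain a new threshold, returning to the pruning process to prune again with that threshold, and redoing the expansion process.
[0026]
The invention according to claim 10 is a beam search program for speech recognition that causes a computer to execute:
- a process of calculating evaluation values for a set of hypotheses in a time frame;
- a threshold calculation process of calculating an evaluation-value threshold for pruning, by subtracting a predetermined beam width from the highest evaluation value derived by that process;
- a prediction process of predicting the number of post-expansion hypotheses for the hypotheses whose evaluation values are greater than or equal to the threshold, and, if the predicted number exceeds a predetermined maximum allowable number, adjusting the beam width to bring it within that number and correcting the threshold;
- a pruning process of pruning the set of hypotheses using the threshold obtained in the above processes;
- an expansion process of expanding, according to the network, the hypotheses remaining after the pruning process; and
- a process, performed when it becomes certain during the expansion process that the number of expanded hypotheses will exceed the maximum allowable number, of canceling the expansion process to restore the state before hypothesis expansion, readjusting the beam width to obtain a new threshold, returning to the pruning process to prune again with that threshold, and redoing the expansion process.
The prediction process predicts the number of post-expansion hypotheses from the hypothesis expansion speed and acceleration in past frames and the number of hypotheses before expansion in the current frame.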
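[0027]
The invention according to claim 11 is a beam search program for speech recognition that causes a computer to execute:
- a process of calculating evaluation values for a set of hypotheses in a time frame;
- a threshold calculation process of calculating an evaluation-value threshold for pruning, by subtracting a predetermined beam width from the highest evaluation value derived by that process;
- a prediction process of predicting the number of post-expansion hypotheses for the hypotheses whose evaluation values are greater than or equal to the threshold, and, if the predicted number exceeds a predetermined maximum allowable number, adjusting the beam width to bring it within that number and correcting the threshold;
- a pruning process of pruning the set of hypotheses using the threshold obtained in the above processes;
- an expansion process of expanding, according to the network, the hypotheses remaining after the pruning process; and
- a process, performed when it becomes certain during the expansion process that the number of expanded hypotheses will exceed the maximum allowable number, of canceling the expansion process to restore the state before hypothesis expansion, readjusting the beam width to obtain a new threshold, returning to the pruning process to prune again with that threshold, and redoing the expansion process.
The prediction process changes the method of predicting the number of post-expansion hypotheses according to the behavior of hypothesis expansion in past frames.
[0028]
The invention according to claim 12 is a beam search program for speech recognition that causes a computer to execute:
- a process of calculating evaluation values for a set of hypotheses in a time frame;
- a threshold calculation process of calculating an evaluation-value threshold for pruning, by subtracting a predetermined beam width from the highest evaluation value derived by that process;
- a prediction process of predicting the number of post-expansion hypotheses for the hypotheses whose evaluation values are greater than or equal to the threshold, and, if the predicted number exceeds a predetermined maximum allowable number, adjusting the beam width to bring it within that number and correcting the threshold;
- a pruning process of pruning the set of hypotheses using the threshold obtained in the above processes;
- an expansion process of expanding, according to the network, the hypotheses remaining after the pruning process; and
- a process, performed when it becomes certain during the expansion process that the number of expanded hypotheses will exceed the maximum allowable number, of canceling the expansion process to restore the state before hypothesis expansion, readjusting the beam width to obtain a new threshold, returning to the pruning process to prune again with that threshold, and redoing the expansion process.
The prediction process predicts the number of post-expansion hypotheses from the hypothesis expansion speed and acceleration in past frames and the number of hypotheses before expansion in the current frame. It also changes the prediction method according to the behavior of hypothesis expansion in past frames.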
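[0037]
[Embodiments of the Invention]
Embodiments of the present invention are described in detail below with reference to the accompanying drawings.
[0038]
Fig. 1 is a block diagram showing the configuration of a speech recognition apparatus according to an embodiment of the present invention. The apparatus comprises a beam adjustment unit 1, a search control unit 2, a network management unit 3, and a word end table management unit 4. It also has units that are not shown, such as a speech input processing unit and a recognition result output processing unit. Each unit operates according to the beam search method and program of the present invention.
- The beam adjustment unit 1 dynamically adjusts the beam width during the beam search.
- The search control unit 2 performs the main control of the beam search, such as pruning and hypothesis expansion.
- The network management unit 3 stores and manages the network referenced during the beam search.
- The word end table management unit 4 is required for continuous speech recognition. It holds and manages word history information, including word end information and word transition information.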
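[0039]
This section gives an example of the network used by the speech recognition apparatus of this embodiment. The apparatus writes a context-free grammar (CFG) as a network grammar that allows recursion, and performs speech recognition with it. The whole network grammar is composed of multiple sub-network grammars, called rules. Representing the grammar as a set of rules makes it possible, for example, to use tree-structured rules efficiently.
[0040]
For example, in the 110,000-word nationwide address recognition task shown in Fig. 3, the nationwide addresses are organized as three levels of tree-structured rules. The output of the prefecture-name rule connects to a separate tree-structured rule for each prefecture, for example "a tree-structured rule collecting the cities of Kanagawa Prefecture".
[0041]
As shown in Fig. 4, the same tree-structured rule (such as "place name" or "category name") may appear many times in the network grammar. Because the rule's structure is shared, only one copy of each rule needs to be held. This also increases the freedom of word order.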
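[0042]
Fig. 7 is a flowchart showing the operation of the beam search method and beam search program in the speech recognition apparatus according to the embodiment. Figs. 7 and 6 are used to describe examples of the following parts of the apparatus's beam search: the pruning method, beam width determination, recording of word end information, and the other processes needed for search control. All other operations follow the conventional frame-synchronous beam search method for continuous speech recognition or discrete word recognition. As before, for a given frame, the hypothesis set before expansion is called the "pre-expansion hypotheses" and the set after expansion is called the "post-expansion hypotheses" (see Fig. 6).
[0043]
The speech recognition apparatus according to the first embodiment performs the following processing. For a given frame, the beam adjustment unit 1 finds the highest score S among the pre-expansion hypotheses (step S1). It sets the pruning threshold th to S minus a predetermined beam width b (step S2; see Fig. 2). It then counts the number n of hypotheses whose scores are greater than or equal to th (step S3).
[0044]
The beam adjustment unit 1 compares n from step S3 with a prespecified number N to judge whether the number of post-expansion hypotheses must be predicted (step S4).
- If n is smaller than N (step S4: NO), the pruning threshold is set to th from step S2.
- If n is larger than N (step S4: YES), the post-expansion hypotheses might exceed the prespecified maximum number of hypotheses Nmax. The beam adjustment unit 1 then predicts the number of post-expansion hypotheses by a predetermined method, described later (step S5). The predicted number is denoted np.
It then compares np with Nmax (step S6). If np is less than or equal to Nmax, the threshold th from step S2 is used as the hypothesis score threshold for pruning (step S7). If np is larger than Nmax, the unit calculates a hypothesis score threshold th' for which the number of post-expansion hypotheses becomes Nmax, or the closest number that does not exceed Nmax (step S8).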
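The threshold selection in steps S1 to S8 can be sketched as follows. The source leaves the step S5 predictor to later embodiments. Here it is assumed to be a single expansion speed V, so that k surviving hypotheses are predicted to expand to k × V. Step S8 uses an exact sort to pick the k-th best score. The parameter names are hypothetical, and the source does not say which branch n = N takes.

```python
import math
from typing import List

def first_embodiment_threshold(scores: List[float], beam: float, n_trigger: int,
                               n_max: int, speed: float) -> float:
    """Steps S1-S8 (sketch): choose the pruning threshold before expansion.

    `scores` are the pre-expansion hypothesis scores. `speed` is an assumed
    estimate V of how many post-expansion hypotheses each survivor produces;
    it stands in for the "predetermined method" of step S5.
    """
    th = max(scores) - beam                        # S1-S2: th = S - b
    n = sum(1 for s in scores if s >= th)          # S3
    if n < n_trigger:                              # S4 NO (n == N treated as YES here)
        return th
    n_pred = n * speed                             # S5: predicted post-expansion count
    if n_pred <= n_max:                            # S6 -> S7: keep th
        return th
    # S8: raise the threshold so that about floor(n_max / V) hypotheses survive.
    k = max(1, math.floor(n_max / speed))
    return sorted(scores, reverse=True)[k - 1]     # ties may keep a few more
```

With ten hypotheses, V = 3, and Nmax = 12, the predicted count is 30. The threshold therefore rises to the fourth-best score, since 4 × 3 = 12.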
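[0045]
The search control unit 2 prunes the pre-expansion hypotheses using the threshold (th or th') determined in the above steps (step S9).
[0046]
The search control unit 2 takes the pre-expansion hypotheses one at a time and runs the hypothesis expansion loop up to step S17 below (step S10). It expands each hypothesis to its defined transition destinations, including the self-transition, according to the network recorded in the network management unit 3 (step S11). If network expansion (loading into memory) is required, the network management unit 3 expands the network (steps S12 and S13). During expansion, the search control unit 2 counts the number na of expanded hypotheses (step S14). If a word transition occurs during expansion, the word end table management unit 4 records the word transition information (steps S16 and S17).
[0047]
During expansion, it may become certain that na will exceed Nmax (step S15: YES). In that case the search control unit 2 performs hypothesis expansion cancel processes A to C, described below, and requests a new pruning threshold th'' from the beam adjustment unit 1 (step S18). The beam adjustment unit 1 finds a threshold th'' larger than the current one (th or th') and returns it to the search control unit 2. The search control unit 2 prunes again with th'' to reduce the number of pre-expansion hypotheses. It then redoes the hypothesis expansion loop from step S10 onward.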
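[0048]
Cancel process A: The search control unit 2 undoes every change made to the pre-expansion hypotheses during expansion. It also initializes the memory that holds the post-expansion hypotheses, so that new hypotheses can be recorded.
[0049]
Cancel process B: In continuous speech recognition, the search control unit 2 deletes all word end information that the word end table management unit 4 recorded during this frame. It then initializes the word end table management unit 4, so that the word end information produced when this frame is processed again can be recorded.
[0050]
Cancel process C: If expansion changed the network information stored in the network management unit 3, the search control unit 2 restores the changed parts to their state before expansion. This completes the cancel processing needed to redo hypothesis expansion.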
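Steps S9 to S18 and cancel processes A and B can be sketched as a bounded expansion that aborts as soon as a new hypothesis would exceed Nmax, followed by a retry loop. Several details are assumptions: the toy network format, per-state recombination, and the rule used to derive th''. The sketch narrows the beam by a fixed factor, similar to Example 1 of the fourth embodiment. Cancel process C (network rollback) is not modeled, and the pre-expansion set is never mutated, so cancel process A reduces to discarding the partial result.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy network: state -> [(next_state, log-prob, is_word_end)].
Network = Dict[str, List[Tuple[str, float, bool]]]
WordEnd = Tuple[int, str, float]                      # (frame, state, score)

def expand_bounded(pre: Dict[str, float], net: Network, n_max: int,
                   word_ends: List[WordEnd], frame: int) -> Optional[Dict[str, float]]:
    """S10-S17: expand without ever holding more than n_max hypotheses.
    Returns None after cancel process B as soon as S15 detects an overflow."""
    mark = len(word_ends)                             # first word-end entry of this frame
    post: Dict[str, float] = {}
    for state, score in pre.items():                  # S10
        for nxt, logp, is_word_end in net.get(state, []):  # S11
            cand = score + logp
            if nxt not in post and len(post) == n_max:     # S14-S15: na would exceed n_max
                del word_ends[mark:]                  # cancel B: drop this frame's word ends
                return None                           # cancel A: caller discards partial work
            if cand > post.get(nxt, float("-inf")):
                post[nxt] = cand
            if is_word_end:                           # S16-S17
                word_ends.append((frame, nxt, cand))
    return post

def search_frame(pre: Dict[str, float], net: Network, th: float, n_max: int,
                 word_ends: List[WordEnd], frame: int, shrink: float = 0.5,
                 max_retries: int = 32) -> Tuple[Dict[str, float], float]:
    """S9-S18: prune, expand, and on overflow narrow the beam and retry.
    `shrink` is a hypothetical beam-narrowing factor used to derive th''."""
    best = max(pre.values())
    for _ in range(max_retries):
        pruned = {s: v for s, v in pre.items() if v >= th}  # S9
        post = expand_bounded(pruned, net, n_max, word_ends, frame)
        if post is not None:
            return post, th
        th = best - (best - th) * shrink              # S18: new threshold th'' > th
    raise RuntimeError("frame does not fit within n_max")
```

The peak count can never exceed n_max, because the overflow check runs before a new hypothesis is stored. The cost is that a frame may be expanded more than once.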
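[0051]
A speech recognition apparatus according to the second embodiment is described with reference to Fig. 8. In the second embodiment, the beam adjustment unit 1 uses the behavior of hypothesis expansion in past frames to predict a suitable beam width for the current frame, and adjusts it dynamically. For the i-th frame, let Nb(i) be the number of hypotheses before expansion and Na(i) the number after expansion. The coefficient V(i) that satisfies Na(i) = Nb(i) × V(i) is defined as the hypothesis expansion speed. With a predetermined maximum number of hypotheses Nmax, the Nb(i) that satisfies Nb(i) × V(i) = Nmax is Nmax / V(i). If Nmax / V(i) < Nb(i), a pruning threshold is computed that reduces the Nb(i) hypotheses to Nmax / V(i). As in the conventional method, using a histogram for this computation keeps the computational cost small.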
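The second embodiment's rule can be sketched as follows. Compute V from frame counts, set the target pre-expansion count to floor(Nmax / V), and find the threshold with a score histogram, as the text suggests. The bin count of 64 and all function names are assumptions. The histogram threshold is approximate: scores that tie, or that fall exactly on a bin edge, may shift the count slightly.

```python
import math
from typing import List

def expansion_speed(n_before: int, n_after: int) -> float:
    """V(i), defined by Na(i) = Nb(i) * V(i)."""
    return n_after / n_before

def histogram_threshold(scores: List[float], target: int, n_bins: int = 64) -> float:
    """Approximate threshold that keeps at most `target` scores (score >= threshold)."""
    lo, hi = min(scores), max(scores)
    if len(scores) <= target or hi == lo:     # nothing to cut, or all scores tie
        return lo
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in scores:
        counts[min(int((s - lo) / width), n_bins - 1)] += 1
    kept = 0
    for k in range(n_bins - 1, -1, -1):       # walk the bins from the best score down
        if kept + counts[k] > target:
            # Keep only the bins above k (for the top bin, keep the best score).
            return hi if k == n_bins - 1 else lo + (k + 1) * width
        kept += counts[k]
    return lo

def second_embodiment_threshold(scores: List[float], speed: float, n_max: int) -> float:
    target = math.floor(n_max / speed)        # Nb such that Nb * V = Nmax
    if target >= len(scores):                 # Nmax / V >= Nb(i): no extra pruning
        return min(scores)
    return histogram_threshold(scores, max(target, 1))
```

For example, if the previous frame went from 40 to 100 hypotheses, V = 2.5. With Nmax = 20, about 8 pre-expansion hypotheses are kept. The histogram needs only one pass over the scores plus one pass over the bins, which is why it is cheaper than a full sort.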
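[0052]
A speech recognition apparatus according to the third embodiment is described with reference to Fig. 8. In the third embodiment, the beam adjustment unit 1 predicts the current frame's hypothesis expansion speed V(i) from the previous frame's speed V(i-1), and adjusts the beam width accordingly. The hypothesis expansion acceleration in the i-th frame is defined as A(i) = V(i) / V(i-1). Ideally, the input speech does not change abruptly from one frame to the next, so A(i) ≈ 1 often holds. The apparatus can therefore assume, for example, that V(i) ≈ V(i-1), and adjust the beam width by the method of the second embodiment.
[0053]
Besides predicting Na(i) by assuming A(i) = V(i) / V(i-1) ≈ 1, V(i) can also be predicted by regression over the speeds V(j) (j < i) of the past several frames.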
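The two predictors of the third embodiment can be sketched as follows. The regression variant fits an ordinary least-squares line to the last few speeds and extrapolates one frame ahead. The source does not specify the regression, so the linear form, the window of 4 frames, and the positive floor on the result are all assumptions.

```python
from typing import List

def predict_speed_last(history: List[float]) -> float:
    """Assume A(i) ~= 1, so V(i) ~= V(i-1)."""
    return history[-1]

def predict_speed_regression(history: List[float], window: int = 4,
                             min_speed: float = 1e-3) -> float:
    """Fit a least-squares line to the last `window` speeds V(j), j < i,
    and extrapolate it one frame ahead to estimate V(i)."""
    ys = history[-window:]
    m = len(ys)
    if m < 2:
        return ys[-1]
    x_mean = (m - 1) / 2                      # past frames indexed 0 .. m-1
    y_mean = sum(ys) / m
    sxx = sum((x - x_mean) ** 2 for x in range(m))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(range(m), ys))
    predicted = y_mean + (sxy / sxx) * (m - x_mean)   # evaluate the line at x = m
    return max(predicted, min_speed)          # a speed must stay positive (assumed floor)
```

On a steadily rising history such as 1.0, 1.2, 1.4, 1.6, the regression predicts about 1.8, whereas the A(i) ≈ 1 rule predicts 1.6. The regression therefore reacts earlier to a trend, at the cost of overshooting when the trend breaks.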
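[0054]
In the fourth embodiment, when the beam adjustment unit 1 receives a request from the search control unit 2 to change the pruning threshold, it raises the threshold (that is, narrows the beam width) so that more hypotheses are pruned. Some examples of how the threshold can be changed follow.
[0055]
Example 1: The beam adjustment unit 1 multiplies the beam width at the time of the request by a predetermined value smaller than 1, which narrows the beam. It subtracts the new beam width from the best score among the hypotheses that currently remain, and passes the result to the search control unit 2 as the new threshold.
[0056]
Example 2: This example is described with reference to Fig. 9. Let Ne be the number of pre-expansion hypotheses whose expansion had been completed at the time of the request. At that point the number of post-expansion hypotheses has reached the prespecified maximum Nmax, so the hypothesis expansion speed is taken to be Nmax / Ne. To keep the post-expansion count at Nmax, the pre-expansion count should therefore be Ne. The beam adjustment unit 1 calculates a threshold that leaves Ne pre-expansion hypotheses and passes it to the search control unit 2. Alternatively, Ne may be multiplied by a predetermined safety factor s smaller than 1, so that s × Ne pre-expansion hypotheses remain.
[0057]
Example 3: A new hypothesis expansion speed is interpolated between the speed Nmax / Ne from Example 2 and the speed V calculated at the time of the request. For example, with a constant a (0 ≤ a ≤ 1), the new speed is aV + (1 - a)Nmax / Ne. With this speed, the number of pre-expansion hypotheses becomes Nmax / {aV + (1 - a)Nmax / Ne}. The beam adjustment unit 1 finds a threshold that yields this number and passes it to the search control unit 2.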
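The three ways of deriving the stricter threshold th'' can be sketched as follows. Hypotheses with a score greater than or equal to the threshold survive. The source leaves some choices open, and the sketch fills them with assumptions:

- The narrowing factor of 0.5 and the interpolation constant a = 0.5 are assumed values.
- Counts are rounded down.
- The threshold for a target count is taken as the k-th best score using an exact sort. A histogram, as in the second embodiment, would serve equally.

```python
import math
from typing import List, Tuple

def kth_best(scores: List[float], k: int) -> float:
    """Threshold that keeps the k best scores (exact sort; ties may keep more)."""
    k = max(1, min(k, len(scores)))
    return sorted(scores, reverse=True)[k - 1]

def example1(scores: List[float], beam: float, factor: float = 0.5) -> Tuple[float, float]:
    """Shrink the beam by a factor < 1; return (new threshold, new beam)."""
    new_beam = beam * factor
    return max(scores) - new_beam, new_beam

def example2(scores: List[float], n_e: int, safety: float = 1.0) -> float:
    """Expansion speed taken as Nmax / Ne, so keep (safety *) Ne hypotheses."""
    return kth_best(scores, math.floor(safety * n_e))

def example3(scores: List[float], n_e: int, n_max: int, v: float, a: float = 0.5) -> float:
    """Interpolate the speed: aV + (1 - a) Nmax / Ne, then keep Nmax / speed."""
    v_new = a * v + (1 - a) * n_max / n_e
    return kth_best(scores, math.floor(n_max / v_new))
```

Example 1 needs no counting at all, so it is the cheapest. Examples 2 and 3 use how far the interrupted expansion actually got, which usually gives a threshold closer to the largest one that fits.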
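[0058]
In the fifth embodiment, if the number of post-expansion hypotheses exceeds the prespecified maximum Nmax, the search control unit 2 cancels every change made by the hypothesis expansion already performed on the pre-expansion hypotheses in that frame, restoring the earlier state.
[0059]
For example, in the conventional method described above, each network arc must hold a pointer to a hypothesis. During expansion, this pointer is rewritten from the pre-expansion hypothesis on the arc to the post-expansion hypothesis. The pointers held by the arcs must therefore be restored to their values before expansion.
[0060]
The word transition information generated in that frame has also been recorded in the word end table management unit 4, and this record must be deleted as well. The search control unit 2 of the fifth embodiment initializes all information that needs such cancel processing, restoring it to its state before expansion. Canceling the network expansion itself, however, is optional, and the choice is a trade-off:
- If the network expansion is canceled, it must be performed again when the frame is searched again with a different threshold. Less memory is needed, however, because no memory is kept for unneeded network portions that were expanded before the hypothesis expansion was redone.
- If the network expansion is not canceled, it need not be repeated when the frame's beam search is rerun with a different threshold, which saves computation. The unneeded network portions stay in memory, however, so extra memory is required.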
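One way to make the fifth embodiment's rollback cheap is an undo log that records each arc pointer's old value before overwriting it. The sketch below also tracks which network parts were loaded during the frame, so their unloading can be switched on or off. The class, its fields, and the integer hypothesis indices are hypothetical, and it assumes that every arc already has an entry in `arc_ptr`.

```python
from typing import Dict, List, Optional, Set, Tuple

class FrameUndoLog:
    """Hypothetical undo log for one frame's hypothesis expansion (sketch).

    Records the old value of every arc -> hypothesis pointer before it is
    overwritten, and which network parts were loaded during this frame.
    """

    def __init__(self, arc_ptr: Dict[int, Optional[int]], loaded: Set[str]) -> None:
        self.arc_ptr = arc_ptr            # arc id -> index of the hypothesis on it
        self.loaded = loaded              # network parts currently held in memory
        self._old: List[Tuple[int, Optional[int]]] = []
        self._new_parts: List[str] = []

    def set_pointer(self, arc: int, hyp: Optional[int]) -> None:
        self._old.append((arc, self.arc_ptr.get(arc)))
        self.arc_ptr[arc] = hyp

    def load_part(self, part: str) -> None:
        if part not in self.loaded:
            self.loaded.add(part)
            self._new_parts.append(part)

    def rollback(self, unload_network: bool) -> None:
        for arc, old in reversed(self._old):   # undo newest first
            self.arc_ptr[arc] = old
        self._old.clear()
        if unload_network:                     # saves memory, costs reloading later
            for part in self._new_parts:
                self.loaded.discard(part)
        self._new_parts.clear()
```

Passing `unload_network=False` corresponds to skipping the network-expansion cancel. The loaded parts stay in memory, and the retry does not need to load them again.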
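[0061]
In the sixth embodiment, the search control unit 2 may ask the word end table management unit 4 to delete the word end information recorded during hypothesis expansion in the current frame. On such a request, the word end table management unit 4 deletes all of that information, freeing memory that is no longer needed.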
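If word end entries are only ever appended, deleting everything recorded in the current frame reduces to truncating the table at the position where the frame began. The sketch below shows such a table. The entry fields, including the back-pointer `prev` to the preceding word end, are assumptions about what word history information might contain.

```python
from typing import List, NamedTuple

class WordEnd(NamedTuple):
    frame: int
    word: str
    score: float
    prev: int          # index of the preceding word end (-1 = utterance start)

class WordEndTable:
    """Append-only word end table with a per-frame mark (sketch)."""

    def __init__(self) -> None:
        self.entries: List[WordEnd] = []
        self._frame_start = 0

    def begin_frame(self) -> None:
        self._frame_start = len(self.entries)

    def record(self, entry: WordEnd) -> int:
        self.entries.append(entry)
        return len(self.entries) - 1

    def delete_current_frame(self) -> None:
        # Drop only this frame's entries; earlier frames' history is untouched.
        del self.entries[self._frame_start:]
```

Because entries from earlier frames never point forward into the current frame, truncation cannot leave a dangling back-pointer.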
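[0062]
In the seventh embodiment, the beam adjustment unit 1 changes its prediction method when it judges that hypothesis expansion is hard to predict. Prediction is hard, for example, when the searched network contains a point where the number of branches rises sharply. Hypotheses that reach such a point expand into many hypotheses at once. It is therefore better to switch to another prediction method when the hypothesis expansion acceleration exceeds a prespecified constant, denoted Amax. Possible prediction methods include the following.
[0063]
Method 1: Set the hypothesis expansion speed to 1, and use this speed to predict the number of post-expansion hypotheses by the method of the second embodiment.
[0064]
Method 2: Instead of the currently obtained hypothesis expansion speed, use V(i-1) / Amax as the speed, and predict the number of post-expansion hypotheses by the method of the second embodiment.
[0065]
Method 3: For the arcs on which surviving hypotheses currently lie, obtain from the network the total number of successor arcs. Use V = ([total number of successor arcs] + [number of arcs on which hypotheses currently lie]) / [number of arcs on which hypotheses currently lie] as the hypothesis expansion speed, and predict the number of post-expansion hypotheses by the method of the second embodiment. V may also be multiplied by a predetermined safety factor s (0 < s < 1), using s × V as the speed.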
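The switching logic of the seventh embodiment can be sketched as follows. The text does not say which acceleration is compared with Amax before the current frame is expanded. This sketch assumes the latest observed one, V(i-1) / V(i-2). Method 2 is implemented literally as V(i-1) / Amax, as written in the text. The successor-list network representation is hypothetical.

```python
from typing import Dict, List, Optional

def method3_speed(active_arcs: List[int], successors: Dict[int, List[int]],
                  s: float = 1.0) -> float:
    """Method 3: V = (successor arcs + active arcs) / active arcs, times safety factor s."""
    n_active = len(active_arcs)
    n_succ = sum(len(successors.get(arc, [])) for arc in active_arcs)
    return s * (n_succ + n_active) / n_active

def choose_speed(v_hist: List[float], a_max: float, method: int,
                 active_arcs: Optional[List[int]] = None,
                 successors: Optional[Dict[int, List[int]]] = None) -> float:
    """Pick the expansion speed V used to predict Na = Nb * V (sketch)."""
    v_prev = v_hist[-1]
    # Assumption: the acceleration compared with a_max is the latest observed one.
    if len(v_hist) < 2 or v_prev / v_hist[-2] <= a_max:
        return v_prev                              # predictable: V(i) ~= V(i-1)
    if method == 1:
        return 1.0                                 # Method 1
    if method == 2:
        return v_prev / a_max                      # Method 2, as written in the text
    if active_arcs is None or successors is None:
        raise ValueError("method 3 needs the active arcs and the network")
    return method3_speed(active_arcs, successors)  # Method 3
```

Method 3 is the only one that looks ahead at the network itself. That makes it the natural fallback for the branching-point case described above, at the cost of walking the successor lists of every active arc.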
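[0066]
The embodiments above are examples of preferred embodiments of the present invention. The invention is not limited to them and can be modified in various ways without departing from its gist.
[0067]
[Effects of the Invention]
As the above description shows, the present invention keeps both the number of retained hypotheses and the memory they occupy within predetermined values. This reduces memory usage and computation.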
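[Brief Description of the Drawings]
[Fig. 1] A block diagram showing the configuration of a speech recognition apparatus according to an embodiment of the present invention.
[Fig. 2] A diagram showing how the pruning threshold is determined.
[Fig. 3] A diagram showing an example network (110,000-word nationwide address task).
[Fig. 4] A diagram showing an example network (facility name input task).
[Fig. 5] A diagram showing the pre-expansion and post-expansion hypotheses in a frame.
[Fig. 6] A diagram explaining the beam search method of the present invention.
[Fig. 7] A flowchart showing the processing of the beam search method and beam search program in the speech recognition apparatus according to the embodiment.
[Fig. 8] A diagram explaining the prediction method in the third embodiment.
[Fig. 9] A diagram explaining threshold change Example 2 in the fourth embodiment.
[Fig. 10] A block diagram showing the configuration of a conventional speech recognition apparatus that performs beam search.
[Fig. 11] A flowchart showing the operation of an example of a conventional beam search method.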
【符号の説明】
１ビーム調整部
２サーチ制御部
３ネットワーク管理部
４ワードエンドテーブル管理部
２１ビーム調整部
２２サーチ制御部
２３ネットワーク管理部
２４ワードエンドテーブル管理部[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a speech recognition apparatus, a beam search method, and a beam search program.
[0002]
[Prior art]
In speech recognition processing, a method called a beam search method is known. This is one of the methods for reducing the amount of calculation and memory capacity by storing hypotheses (recognition candidates) to be held within a certain number in large vocabulary continuous speech recognition and the like. In the beam search method, in a hypothesis group in each frame, only a high evaluation value is left using a predetermined beam width, and a low one is pruned.
[0003]
In portable terminals and car navigation systems, there is a demand for a continuous speech recognition device that operates on a CPU for embedded use with a small memory capacity. The application includes input of an address for searching for a destination and input of a facility name as a destination. Assuming these tasks, the functions required for a built-in speech recognition engine include the following.
[0004]
○ Recognition of large vocabulary proper nouns ○ Allowance of breathing during utterance ○ In case of misrecognition, omission of correct recognition and omission of recurrence from the place of misrecognition ○ Eh ○ Allowed additional words ○ Allowed utterance with reversed word order ○ Operation within a given amount of memory (it is not desirable to exceed that amount of memory even temporarily).
[0005]
An example of a conventional beam search method is published in a paper by Volker Steinbiss et al. Entitled “IMPROVEMENTS IN BEAM SEARCH” in ICSLP94, YOKOHAMA 1994. This conventional method will be described with reference to FIGS. Here, the description will be simplified and processing for one frame will be described.
[0006]
A hypothesis group before a hypothesis development process for a certain frame is referred to as a “pre-deployment hypothesis”. The hypothesis group after the hypothesis development processing is performed is referred to as “post-expansion hypothesis”.
[0007]
Step 1: The search control unit 22 extracts one hypothesis from the pre-development hypotheses, and performs the processing up to Step 3 below. The search control unit 22 sequentially performs this loop process for all the pre-deployment hypotheses.
[0008]
Step 2: The hypothesis is developed to a predetermined transition destination (including self-transition) according to a network recorded in the network management unit 23 (= a network in which recognition units such as phonemes are arcs). If network expansion processing (= processing for expanding a network portion necessary for search processing from an external storage to a memory) is necessary, the network management unit 23 expands the network.
[0009]
Step 3: When a word transition occurs when the hypothesis is expanded to the transition destination, the word end table management unit 24 records the word transition information.
[0010]
Step 4: When the hypothesis development process for the pre-deployment hypothesis is completed, the beam adjustment unit 21 obtains the highest score S in the post-deployment hypothesis.
[0011]
Step 5: The beam adjustment unit 21 determines a value obtained by subtracting the predetermined beam width b from the highest score S obtained in Step 4 as a pruning threshold th, and has a score equal to or lower than the threshold th among the post-deployment hypotheses. Pruning the hypothesis and rejecting it.
[0012]
Step 6: The beam adjustment unit 21 obtains the number n of hypotheses remaining after the pruning process among the expanded hypotheses.
[0013]
Step 7: When the number n of post-pruning hypotheses obtained in step 6 is larger than the maximum hypothesis number Nmax specified in advance, the beam adjustment unit 21 makes a hypothesis score such that the number of hypotheses after pruning is Nmax. The threshold value th ′ is obtained. Here, the threshold value is obtained using a histogram. If the number of hypotheses n obtained in step 6 is less than or equal to Nmax, the hypothesis development process in this frame ends.
[0014]
Step 8: The search control unit 22 further prunes the post-deployment hypothesis using the threshold th ′ newly obtained by the beam adjustment unit 21 in Step 7. As a result, the number of hypotheses remaining without being pruned falls below Nmax. This completes the hypothesis development process in this frame.
[0015]
[Problems to be solved by the invention]
However, in the conventional beam search method, since the number of hypotheses is controlled after the hypothesis development process, the number of hypotheses temporarily exceeds the number designated in advance during the hypothesis development process. This has the problem that it is inconvenient for speech recognition processing in embedded applications where the maximum amount of memory used is limited.
[0016]
The present invention has been made in view of such problems. By dynamically adjusting and controlling the beam width, the number of hypotheses during and after the hypothesis development processing is suppressed to a predetermined number. An object of the present invention is to provide a speech recognition device, a beam search method, and a beam search program that can store a memory amount necessary for processing within a predetermined capacity.
[0017]
[Means for Solving the Problems]
In order to achieve such an object, the invention according to claim 1 is characterized in that search control means for controlling frame-synchronized beam search in continuous speech recognition, and a hypothesis to be developed in the beam search is a predetermined maximum allowable number of hypotheses and Beam adjustment means for dynamically adjusting the beam width so as to fit within a predetermined memory capacity for holding a hypothesis, network management means for storing a network used for beam search, and word history of the hypothesis A word end table management means for holding information; , Have The beam adjustment means examines the behavior of the hypothesis development in the past frame before the hypothesis development in the current frame, and the hypothesis development speed in the past frame, the hypothesis development acceleration, and the hypothesis development in the current frame. The number of hypotheses is used to predict the number of hypotheses after deployment in the current frame, and the beam width is adjusted based on the prediction. It is characterized by that.
[0018]
According to the second aspect of the present invention, there is provided search control means for controlling beam search on a trellis in discrete word recognition, a predetermined hypothesis number for holding a predetermined maximum allowable hypothesis number and a hypothesis when a hypothesis to be developed in the beam search is Beam adjusting means for performing dynamic adjustment of the beam width so as to fit within the memory capacity, and network management means for storing a network used in beam search , Have The beam adjustment means examines the behavior of the hypothesis development in the past frame before the hypothesis development in the current frame, and the hypothesis development speed in the past frame, the hypothesis development acceleration, and the hypothesis development in the current frame. The number of hypotheses is used to predict the number of hypotheses after deployment in the current frame, and the beam width is adjusted based on the prediction. It is characterized by that.
[0019]
The invention described in claim 3 Search control means for controlling frame-synchronized beam search in continuous speech recognition, and beam so that the hypothesis to be developed falls within a predetermined memory capacity for holding a predetermined maximum allowable number of hypotheses and hypotheses during the beam search. Beam adjustment means for performing dynamic adjustment processing of width, network management means for storing a network used for beam search, and word end table management means for holding hypothesis word history information, The beam adjustment means examines the behavior of hypothesis development in the past frame before the hypothesis development in the current frame. If the behavior of hypothesis development in the past frame is a good index for predicting the behavior of hypothesis development in the current frame, and if it is not a good index, Change the behavior prediction to another method, and adjust the beam width based on the prediction It is characterized by that.
[0020]
The invention according to claim 4 Search control means for controlling beam search on the trellis in discrete word recognition, and so that the developed hypothesis is within a predetermined memory capacity for holding a predetermined maximum allowable number of hypotheses and hypotheses during the beam search Beam adjustment means for performing dynamic adjustment processing of the beam width, and network management means for storing a network used in the beam search. Examine the behavior of hypothesis expansion in the frame, determine whether the behavior of hypothesis expansion in the past frame is a good index for predicting the behavior of hypothesis expansion in the current frame, and determine that it does not become a good index In this case, the prediction of the behavior of hypothesis development in the current frame is changed to another method, and Adjust the width It is characterized by that.
[0021]
The invention according to claim 5 A beam search method in speech recognition, the step of calculating an evaluation value for a hypothesis group on a time frame, and an evaluation value threshold for pruning by subtracting a predetermined beam width from the highest evaluation value derived in the step Calculating a threshold value after calculating the threshold value, determining the number of hypotheses having an evaluation value equal to or greater than the threshold value, and determining whether to predict the number of hypotheses after expansion based on the hypothesis number; When it is determined that the number of hypotheses after expansion is predicted in the step, the number of hypotheses after expansion is predicted for a hypothesis having an evaluation value equal to or greater than the threshold value, and the predicted number of hypotheses exceeds a predetermined maximum allowable number A prediction step of adjusting the beam width so as to be within the maximum allowable number and correcting the threshold, a pruning step of pruning a hypothesis group using the threshold obtained in the above steps, For a hypothesis group that has been pruned in the pruning step, a deployment step that performs a deployment process according to the network, and if it is determined that the number of deployed hypotheses exceeds the maximum allowable number during the deployment step, the deployment process To return to the state before the hypothesis development, to re-adjust the beam width to obtain a new threshold value, to return to the pruning step, perform pruning again with the threshold value, and perform the deployment process again. It is characterized by that.
[0022]
The invention described in claim 6 A beam search method in speech recognition, the step of calculating an evaluation value for a hypothesis group on a time frame, and an evaluation value threshold for pruning by subtracting a predetermined beam width from the highest evaluation value derived in the step A threshold calculation step for calculating the number of hypotheses after the expansion for a hypothesis having an evaluation value equal to or greater than the threshold, and if the predicted number of hypotheses exceeds a predetermined maximum allowable number, it falls within the maximum allowable number Prediction step for adjusting the beam width and correcting the threshold value, a pruning step for pruning a hypothesis group using the threshold value obtained in the above steps, and a hypothesis after pruning in the pruning step For a group, if a deployment step that performs deployment processing according to the network and the number of hypotheses deployed during the deployment step is determined to exceed the maximum allowable number, Canceling the process and returning to the state before the hypothesis development, re-adjusting the beam width to obtain a new threshold value, returning to the pruning step, pruning again with the threshold value, and performing the deployment process again. And the prediction step predicts the number of hypotheses after expansion using the hypothesis expansion speed, hypothesis expansion acceleration in the past frame, and the number of hypotheses before hypothesis expansion in the current frame. It is characterized by that.
[0023]
The invention described in claim 7 A beam search method in speech recognition, the step of calculating an evaluation value for a hypothesis group on a time frame, and an evaluation value threshold for pruning by subtracting a predetermined beam width from the highest evaluation value derived in the step A threshold calculation step for calculating the number of hypotheses after the expansion for a hypothesis having an evaluation value equal to or greater than the threshold, and if the predicted number of hypotheses exceeds a predetermined maximum allowable number, it falls within the maximum allowable number Prediction step for adjusting the beam width and correcting the threshold value, a pruning step for pruning a hypothesis group using the threshold value obtained in the above steps, and a hypothesis after pruning in the pruning step For a group, if a deployment step that performs deployment processing according to the network and the number of hypotheses deployed during the deployment step is determined to exceed the maximum allowable number, Canceling the process and returning to the state before the hypothesis development, re-adjusting the beam width to obtain a new threshold value, returning to the pruning step, pruning again with the threshold value, and performing the deployment process again. The prediction step changes the post-expansion hypothesis number prediction method according to the behavior of the hypothesis expansion in the past frame. It is characterized by that.
[0024]
The invention described in claim 8 A beam search method in speech recognition, the step of calculating an evaluation value for a hypothesis group on a time frame, and an evaluation value threshold for pruning by subtracting a predetermined beam width from the highest evaluation value derived in the step A threshold calculation step for calculating the number of hypotheses after the expansion for a hypothesis having an evaluation value equal to or greater than the threshold, and if the predicted number of hypotheses exceeds a predetermined maximum allowable number, it falls within the maximum allowable number Prediction step for adjusting the beam width and correcting the threshold value, a pruning step for pruning a hypothesis group using the threshold value obtained in the above steps, and a hypothesis after pruning in the pruning step For a group, if a deployment step that performs deployment processing according to the network and the number of hypotheses deployed during the deployment step is determined to exceed the maximum allowable number, Canceling the process and returning to the state before the hypothesis development, re-adjusting the beam width to obtain a new threshold value, returning to the pruning step, pruning again with the threshold value, and performing the deployment process again. The prediction step predicts the number of hypotheses after deployment using the hypothesis development speed, hypothesis development acceleration in the past frame, and the number of hypotheses before the hypothesis development in the current frame, and the hypothesis in the past frame. Change post-expansion hypothesis number prediction method according to the development behavior It is characterized by that.
[0025]
The invention according to claim 9 A beam search program in speech recognition, which calculates an evaluation value for a hypothesis group on a time frame, and an evaluation value threshold for pruning by subtracting a predetermined beam width from the highest evaluation value derived by the processing A threshold value calculation process for calculating the number of hypotheses having an evaluation value equal to or greater than the threshold value after the threshold value calculation process, and determining whether to predict the number of hypotheses after expansion based on the number of hypotheses; When it is determined that the number of hypotheses after expansion is predicted by processing, the number of hypotheses after expansion is predicted for a hypothesis having an evaluation value equal to or greater than the threshold value, and the predicted number of hypotheses exceeds a predetermined maximum allowable number A prediction process for adjusting the beam width so as to be within the maximum allowable number and correcting the threshold, a pruning process for pruning a hypothesis group using the threshold obtained in the above process, and the pruning process Pruned in If the hypothesis group is expanded in accordance with the network, and if it is determined that the number of expanded hypotheses exceeds the maximum allowable number during the expansion process, the expansion process is canceled before the hypothesis expansion. , Readjust the beam width to obtain a new threshold value, return to the pruning process, perform pruning again with the threshold value, and restart the deployment process It is characterized by that.
[0026]
The invention according to claim 10 is a beam search program for speech recognition that causes a computer to execute: a process of calculating evaluation values for a group of hypotheses in a time frame; a threshold calculation process of calculating a pruning threshold for the evaluation value by subtracting a predetermined beam width from the highest evaluation value so derived; a prediction process of predicting the number of hypotheses after expansion for the hypotheses whose evaluation value is equal to or greater than the threshold and, if the predicted number exceeds a predetermined maximum allowable number, adjusting the beam width so that the number falls within the maximum allowable number and correcting the threshold accordingly; a pruning process of pruning the hypothesis group using the threshold obtained in the above processes; an expansion process of expanding, according to the network, the hypotheses that survive the pruning process; and a process of, when it is determined during the expansion process that the number of expanded hypotheses exceeds the maximum allowable number, canceling the expansion to return to the state before hypothesis expansion, readjusting the beam width to obtain a new threshold, returning to the pruning process to prune again with that threshold, and performing the expansion again. The prediction process predicts the number of hypotheses after expansion from the hypothesis expansion speed and hypothesis expansion acceleration in past frames and the number of hypotheses before expansion in the current frame.
[0027]
The invention according to claim 11 is a beam search program for speech recognition that causes a computer to execute: a process of calculating evaluation values for a group of hypotheses in a time frame; a threshold calculation process of calculating a pruning threshold for the evaluation value by subtracting a predetermined beam width from the highest evaluation value so derived; a prediction process of predicting the number of hypotheses after expansion for the hypotheses whose evaluation value is equal to or greater than the threshold and, if the predicted number exceeds a predetermined maximum allowable number, adjusting the beam width so that the number falls within the maximum allowable number and correcting the threshold accordingly; a pruning process of pruning the hypothesis group using the threshold obtained in the above processes; an expansion process of expanding, according to the network, the hypotheses that survive the pruning process; and a process of, when it is determined during the expansion process that the number of expanded hypotheses exceeds the maximum allowable number, canceling the expansion to return to the state before hypothesis expansion, readjusting the beam width to obtain a new threshold, returning to the pruning process to prune again with that threshold, and performing the expansion again. The prediction process changes the method of predicting the number of hypotheses after expansion according to the behavior of hypothesis expansion in past frames.
[0028]
The invention according to claim 12 is a beam search program for speech recognition that causes a computer to execute: a process of calculating evaluation values for a group of hypotheses in a time frame; a threshold calculation process of calculating a pruning threshold for the evaluation value by subtracting a predetermined beam width from the highest evaluation value so derived; a prediction process of predicting the number of hypotheses after expansion for the hypotheses whose evaluation value is equal to or greater than the threshold and, if the predicted number exceeds a predetermined maximum allowable number, adjusting the beam width so that the number falls within the maximum allowable number and correcting the threshold accordingly; a pruning process of pruning the hypothesis group using the threshold obtained in the above processes; an expansion process of expanding, according to the network, the hypotheses that survive the pruning process; and a process of, when it is determined during the expansion process that the number of expanded hypotheses exceeds the maximum allowable number, canceling the expansion to return to the state before hypothesis expansion, readjusting the beam width to obtain a new threshold, returning to the pruning process to prune again with that threshold, and performing the expansion again. The prediction process predicts the number of hypotheses after expansion from the hypothesis expansion speed and hypothesis expansion acceleration in past frames and the number of hypotheses before expansion in the current frame, and changes the prediction method according to the behavior of hypothesis expansion in past frames.
[0037]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
[0038]
FIG. 1 is a block diagram showing the configuration of a speech recognition apparatus according to an embodiment of the present invention. The apparatus includes a beam adjustment unit 1, a search control unit 2, a network management unit 3, and a word end table management unit 4, as well as a speech input processing unit, a recognition result output processing unit, and other units that are not shown. Each unit operates according to the beam search method and program of the present invention. The beam adjustment unit 1 dynamically adjusts the beam width during the beam search. The search control unit 2 performs the main control of the beam search, namely pruning and hypothesis expansion. The network management unit 3 stores and manages the network referred to during the beam search. The word end table management unit 4, which is required for continuous speech recognition, stores and manages word history information (including word end information and word transition information).
[0039]
An example of a network used in the speech recognition apparatus of this embodiment will be described. In this apparatus, a context-free grammar (CFG) is written in a network grammar format that permits recursion, and speech recognition is performed against it. The overall network grammar consists of a plurality of sub-network grammars, called rules. Representing the overall grammar as a set of rules allows tree-structured rules to be used efficiently.
[0040]
For example, in the 110,000-word nationwide address recognition task shown in FIG. 3, addresses are described by tree-structured rules in three levels, and each output of the prefecture-name tree-structured rule leads to a separate tree-structured rule for that prefecture (for example, "a tree-structured rule that collects the cities of Kanagawa Prefecture").
[0041]
Also, as shown in FIG. 4, even when the same tree-structured rule (such as "place name" or "category name") appears several times in the network grammar, the rule is shared, so only one copy of it needs to be held. This also increases the freedom of word order that can be accepted.
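The rule sharing described above can be illustrated with a small sketch. It assumes a hypothetical `Rule` class whose alternatives are sequences of words or references to other `Rule` objects; the rule names and words are invented for illustration and are not taken from FIG. 4. Both word orders reference the same two rule objects, so only three rules are stored even though five rule references appear.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(eq=False)  # eq=False: rules are compared by identity, so sharing is explicit
class Rule:
    """Hypothetical sub-network grammar: a named list of alternative item sequences.

    Each item is either a word (str) or a reference to another Rule object.
    """
    name: str
    alternatives: List[List[Union[str, "Rule"]]] = field(default_factory=list)


def stored_rules(root: Rule) -> List[Rule]:
    """Return each distinct Rule reachable from root; a shared rule is counted once."""
    seen = {}
    stack = [root]
    while stack:
        rule = stack.pop()
        if id(rule) in seen:
            continue
        seen[id(rule)] = rule
        for alt in rule.alternatives:
            stack.extend(item for item in alt if isinstance(item, Rule))
    return list(seen.values())


def sentences(rule: Rule) -> List[List[str]]:
    """Enumerate the word sequences a rule accepts (small, non-recursive grammars only)."""
    out = []
    for alt in rule.alternatives:
        partial = [[]]
        for item in alt:
            tails = sentences(item) if isinstance(item, Rule) else [[item]]
            partial = [p + t for p in partial for t in tails]
        out.extend(partial)
    return out


# Hypothetical toy grammar: both word orders reuse the same two rule objects.
place = Rule("place_name", [["Shibuya"], ["Shinjuku"]])
category = Rule("category_name", [["station"], ["hospital"]])
facility = Rule("facility", [[place, category], [category, place]])

print(len(stored_rules(facility)))  # 3 rules stored, although place/category are referenced twice
print(len(sentences(facility)))     # 8 accepted word sequences
```

A real recognizer would compile such rules into a network of arcs rather than enumerate sentences; `sentences` is only there to show which word orders the shared rules accept.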
[0042]
FIG. 7 is a flowchart showing the operation of the beam search method and the beam search program in the speech recognition apparatus according to the embodiment of the present invention. Examples of the pruning method, the determination of the beam width, the recording of word end information, and the various processes needed for search control in the beam search of this apparatus are described below with reference to FIGS. Other operations follow the conventional frame-synchronous beam search used in continuous speech recognition or discrete word recognition. Hereinafter, the group of hypotheses in a given frame before hypothesis expansion is called the "pre-expansion hypotheses", and the group after hypothesis expansion is called the "post-expansion hypotheses" (see FIG. 6).
[0043]
The speech recognition apparatus according to the first embodiment of the present invention performs the following processing. For a given frame, the beam adjustment unit 1 obtains the highest score S among the pre-expansion hypotheses (step S1). It sets the value th, obtained by subtracting a predetermined beam width b from S, as the pruning threshold (step S2; see FIG. 2), and then obtains the number n of hypotheses whose score is equal to or higher than th (step S3).
[0044]
The beam adjustment unit 1 compares the number of hypotheses n obtained in step S3 with a predetermined number N to determine whether the number of post-expansion hypotheses needs to be predicted (step S4). If n is smaller than N (NO in step S4), the pruning threshold is set to th obtained in step S2. If n is larger than N (YES in step S4), the beam adjustment unit 1 judges that the post-expansion hypotheses may exceed the predetermined maximum number Nmax, and predicts the number of post-expansion hypotheses np by a predetermined method described later (step S5). It then compares np with Nmax (step S6). If np is less than or equal to Nmax, th obtained in step S2 is used as the pruning threshold (step S7). If np is larger than Nmax, the beam adjustment unit 1 calculates a score threshold th′ such that the number of post-expansion hypotheses becomes Nmax, or the closest number that does not exceed Nmax (step S8).
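Steps S1 to S8 can be sketched as a single function. This is a minimal sketch under stated assumptions: the source defers the prediction method of step S5 to later paragraphs, so here the prediction is assumed to be np = n × V with a caller-supplied expansion speed V (the form used in the second embodiment); `n_check` stands for N; and th′ is found by sorting, whereas the source uses a histogram. All function and parameter names are hypothetical.

```python
import math
from typing import Sequence


def threshold_keeping_at_most(scores: Sequence[float], k: int) -> float:
    """Return a score t such that at most k scores are >= t (k >= 1).

    Ties at the cut are excluded by moving t up to the next higher score.
    If the best score itself is tied beyond k, the whole top tie group is kept.
    """
    ranked = sorted(scores, reverse=True)
    if k >= len(ranked):
        return ranked[-1]            # everything fits
    i = k - 1
    while i > 0 and ranked[i] == ranked[k]:
        i -= 1                       # step above scores tied with the (k+1)-th best
    return ranked[i]


def pruning_threshold(scores, beam_width, n_check, n_max, expansion_speed):
    """Steps S1-S8 for one frame, assuming the predictor np = n * expansion_speed."""
    best = max(scores)                                    # S1: highest score S
    th = best - beam_width                                # S2: th = S - b
    n = sum(1 for s in scores if s >= th)                 # S3: hypotheses that survive th
    if n <= n_check:                                      # S4: few enough, no prediction
        return th
    n_pred = n * expansion_speed                          # S5: predicted post-expansion count
    if n_pred <= n_max:                                   # S6 -> S7: keep th
        return th
    k = max(1, math.floor(n_max / expansion_speed))       # S8: pre-expansion budget
    return max(th, threshold_keeping_at_most(scores, k))  # never widen the beam


scores = [10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 1.0]
print(pruning_threshold(scores, beam_width=6.0, n_check=3, n_max=8, expansion_speed=2.0))  # 7.0
```

With these numbers the beam alone keeps six hypotheses, which are predicted to expand to twelve; raising the threshold to 7.0 keeps four, predicted to expand to eight.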
[0045]
The search control unit 2 prunes the pre-expansion hypotheses using the threshold (th or th′) determined in the above steps (step S9).
[0046]
The search control unit 2 takes the pre-expansion hypotheses one at a time and performs the hypothesis expansion loop through step S17 (step S10). It expands each hypothesis to the transition destinations (including self-transitions) defined by the network recorded in the network management unit 3 (step S11). If network expansion (loading the required part of the network into memory) is necessary, the network management unit 3 performs it (steps S12 and S13). During hypothesis expansion, the search control unit 2 counts the number of expanded hypotheses na (step S14). When a word transition occurs during hypothesis expansion, the word end table management unit 4 records the word transition information (steps S16 and S17).
[0047]
If it is determined during hypothesis expansion that na exceeds Nmax (YES in step S15), the search control unit 2 performs the hypothesis expansion cancel processes A to C described below and requests a new pruning threshold th″ from the beam adjustment unit 1 (step S18). In response, the beam adjustment unit 1 obtains a threshold th″ larger than the current threshold (th or th′) and returns it to the search control unit 2. The search control unit 2 prunes again with th″ to reduce the number of pre-expansion hypotheses, and then repeats the hypothesis expansion loop from step S10.
[0048]
Cancel process A: The search control unit 2 undoes all changes made to the pre-expansion hypotheses during hypothesis expansion, and initializes the memory that holds the post-expansion hypotheses so that hypotheses can be recorded in it anew.
[0049]
Cancel process B: In continuous speech recognition, the search control unit 2 deletes all word end information recorded by the word end table management unit 4 in this frame, and initializes the word end table management unit 4 so that the word end information produced by reprocessing this frame can be recorded again.
[0050]
Cancel process C: If the network information stored in the network management unit 3 was changed during hypothesis expansion, the search control unit 2 restores the changed portions to their state before hypothesis expansion. This completes the cancellation needed to redo hypothesis expansion.
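The prune, expand, cancel, and retry loop of steps S9 to S18 can be sketched as follows. This is a simplified sketch under stated assumptions: the network is a hypothetical dictionary mapping a state to (next state, transition score, word-end flag) tuples; recombination keeps the better score per destination state (a common Viterbi convention, not stated in the source); and `raise_threshold` stands in for the beam adjustment unit. Because each attempt builds its post-expansion hypotheses and word end records in fresh containers, cancel processes A and B reduce to discarding them; cancel process C (network changes) is not modeled.

```python
from typing import Callable, Dict, List, Tuple

# network[state] -> list of (next_state, transition_score, is_word_end); hypothetical format
Network = Dict[str, List[Tuple[str, float, bool]]]


def expand_frame(pre: Dict[str, float], network: Network, threshold: float,
                 n_max: int, raise_threshold: Callable[[float], float]):
    """Prune (S9), expand (S10-S17) and, on overflow, cancel and retry (S15, S18)."""
    while True:
        survivors = {s: sc for s, sc in pre.items() if sc >= threshold}   # S9
        post: Dict[str, float] = {}
        word_ends: List[Tuple[str, str]] = []
        overflow = False
        for state, score in survivors.items():                           # S10
            for nxt, delta, is_word_end in network.get(state, []):        # S11
                new_score = score + delta
                # Recombination: keep the better path per destination (assumption)
                if nxt not in post or new_score > post[nxt]:
                    post[nxt] = new_score
                if is_word_end:
                    word_ends.append((state, nxt))                        # S16-S17
                if len(post) > n_max:                                     # S14-S15
                    overflow = True
                    break
            if overflow:
                break
        if not overflow:
            return post, word_ends, threshold
        # Cancel: this attempt's post hypotheses and word ends are simply dropped.
        new_threshold = raise_threshold(threshold)                        # S18
        if new_threshold <= threshold:
            raise ValueError("raise_threshold must return a larger threshold")
        threshold = new_threshold


pre = {"a": 0.0, "b": -1.0, "c": -5.0}
network = {
    "a": [("a", -1.0, False), ("x", -2.0, False)],
    "b": [("b", -1.0, False), ("y", -2.0, True)],
    "c": [("c", -1.0, False), ("z", -2.0, False)],
}
post, ends, th = expand_frame(pre, network, threshold=-6.0, n_max=4,
                              raise_threshold=lambda t: t + 1.0)
print(th, sorted(post), ends)  # -4.0 ['a', 'b', 'x', 'y'] [('b', 'y')]
```

The word end record for b is produced in every attempt but appears only once in the result, because records from canceled attempts are discarded along with their hypotheses.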
[0051]
A speech recognition apparatus according to the second embodiment of the present invention will be described with reference to FIG. The beam adjustment unit 1 in the second embodiment predicts an appropriate beam width for hypothesis expansion in the current frame from the behavior of hypothesis expansion in past frames, and adjusts it dynamically. In the i-th frame, let Nb(i) be the number of hypotheses before expansion and Na(i) the number after expansion. The coefficient V(i) satisfying Na(i) = Nb(i) × V(i) is defined as the hypothesis expansion speed. With a predetermined maximum number of hypotheses Nmax, the value of Nb(i) that satisfies Nb(i) × V(i) = Nmax is Nmax / V(i). If Nmax / V(i) < Nb(i), a pruning threshold that reduces the Nb(i) hypotheses to Nmax / V(i) is obtained. As in the conventional method, deriving this threshold with a histogram keeps the amount of computation small.
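The budget Nmax / V(i) and the histogram-based threshold can be sketched as below. This is a sketch under assumptions: V(i) is passed in by the caller (its prediction is the subject of the third embodiment), the histogram has a hypothetical 32 equal-width bins, and whole bins are accepted from the best scores downward. Because only whole bins are accepted, the result is approximate, and a score lying exactly on a bin edge can be kept by the `>=` rule even though its bin was rejected.

```python
from typing import Sequence


def histogram_threshold(scores: Sequence[float], target: int, num_bins: int = 32) -> float:
    """Approximate score threshold keeping about `target` hypotheses, via a score histogram.

    Bin 0 holds the best scores. Whole bins are accepted from the top until the next
    bin would push the count past `target`; the threshold is that bin's upper edge.
    """
    hi, lo = max(scores), min(scores)
    if len(scores) <= target or hi == lo:
        return lo
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for s in scores:
        counts[min(int((hi - s) / width), num_bins - 1)] += 1
    kept = 0
    for b, c in enumerate(counts):
        if kept + c > target:
            return hi - b * width
        kept += c
    return lo


def second_embodiment_threshold(scores, beam_width, n_max, v):
    """Beam threshold, tightened so that Nb(i) <= Nmax / V(i)."""
    th = max(scores) - beam_width
    survivors = [s for s in scores if s >= th]
    budget = n_max / v                     # Nb(i) such that Nb(i) * V(i) = Nmax
    if budget >= len(survivors):
        return th
    return max(th, histogram_threshold(survivors, int(budget)))


scores = [float(x) for x in range(100)]
t = second_embodiment_threshold(scores, beam_width=1000.0, n_max=50, v=2.0)
print(t, sum(1 for s in scores if s >= t))   # 74.25 25
```

Building the histogram costs one pass over the scores plus one pass over the bins, which is the computational saving the text refers to, compared with fully sorting the scores.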
[0052]
A speech recognition apparatus according to the third embodiment of the present invention will be described with reference to FIG. The beam adjustment unit 1 in the third embodiment predicts the hypothesis expansion speed V(i) in the current frame from the hypothesis expansion speed V(i−1) in the previous frame, and adjusts the beam width accordingly. The hypothesis expansion acceleration in the i-th frame is defined as A(i) = V(i) / V(i−1). Because the input speech to be recognized usually does not change abruptly from one frame to the next, A(i) ≈ 1 often holds. Therefore, for example, V(i) ≈ V(i−1) can be assumed, and the beam width adjusted by the method of the second embodiment.
[0053]
Besides predicting Na(i) on the assumption A(i) = V(i) / V(i−1) ≈ 1, V(i) can also be predicted by a regression calculation over V(j) (j < i) for the past several frames.
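Both predictors for V(i) can be sketched in a few lines. The source does not specify the regression, so this sketch assumes an ordinary least-squares straight line through the last `window` speeds, extrapolated one frame ahead; `window` and the function names are hypothetical. With `window=1` it falls back to V(i) ≈ V(i−1), which corresponds to A(i) ≈ 1.

```python
from typing import List, Sequence


def expansion_speeds(nb: Sequence[int], na: Sequence[int]) -> List[float]:
    """V(j) = Na(j) / Nb(j) for each past frame j."""
    return [a / b for b, a in zip(nb, na)]


def accelerations(v: Sequence[float]) -> List[float]:
    """A(j) = V(j) / V(j-1) for consecutive past frames."""
    return [cur / prev for prev, cur in zip(v, v[1:])]


def predict_speed(v: Sequence[float], window: int = 1) -> float:
    """Predict V(i) from past speeds.

    window == 1: assume A(i) ~ 1, i.e. V(i) ~ V(i-1).
    window >= 2: least-squares line through the last `window` speeds, extrapolated one frame.
    """
    if window <= 1 or len(v) < 2:
        return v[-1]
    ys = list(v[-window:])
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(range(n), ys))
    slope = sxy / sxx
    return mean_y + slope * (n - mean_x)   # fitted line evaluated at x = n (current frame)


speeds = expansion_speeds(nb=[100, 100, 100, 100], na=[100, 150, 200, 250])
print(speeds)                           # [1.0, 1.5, 2.0, 2.5]
print(predict_speed(speeds))            # 2.5  (V(i) ~ V(i-1))
print(predict_speed(speeds, window=4))  # 3.0  (linear trend)
```

The constant predictor lags behind a steadily growing speed, while the linear fit follows the trend; the price is that a fit over a short window reacts strongly to a single noisy frame.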
[0054]
When the beam adjustment unit 1 of the speech recognition apparatus according to the fourth embodiment of the present invention receives a request from the search control unit 2 to change the pruning threshold, it raises the threshold (that is, narrows the beam width) so that more hypotheses are pruned. Some examples of how the threshold can be changed are given below.
[0055]
Example 1: The beam adjustment unit 1 narrows the beam width by multiplying the beam width in effect when the change request is received by a predetermined value smaller than 1, and passes to the search control unit 2, as the new threshold, the best score among the remaining hypotheses minus the new beam width.
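Example 1 amounts to a multiplicative shrink of the beam, re-anchored at the best remaining score. In this minimal sketch the shrink factor of 0.8 is a hypothetical default (the source only requires a predetermined value smaller than 1), and "remaining hypotheses" is taken to mean the pre-expansion hypotheses that survived the current pruning.

```python
def narrowed_threshold(remaining_scores, beam_width, shrink=0.8):
    """Example 1: shrink the beam by a factor < 1 and re-anchor it at the best remaining score.

    `shrink` = 0.8 is a hypothetical default; the source only says "a value smaller than 1".
    Returns (new_threshold, new_beam_width).
    """
    if not 0.0 < shrink < 1.0:
        raise ValueError("shrink must be in (0, 1)")
    new_beam = beam_width * shrink
    return max(remaining_scores) - new_beam, new_beam


print(narrowed_threshold([10.0, 7.0, 4.0], beam_width=5.0, shrink=0.5))  # (7.5, 2.5)
```

Because the factor is fixed, repeated change requests within one frame narrow the beam geometrically, so the retry loop terminates after a bounded number of attempts for any finite score range.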
[0056]
Example 2: This will be described with reference to FIG. Let Ne be the number of pre-expansion hypotheses whose expansion had been completed when the change request was received. At that point the number of post-expansion hypotheses has reached the predetermined maximum Nmax, so the hypothesis expansion speed is taken to be Nmax / Ne. To keep the number of post-expansion hypotheses at Nmax, the number of pre-expansion hypotheses should therefore be Ne. The beam adjustment unit 1 calculates a threshold that leaves Ne pre-expansion hypotheses and passes it to the search control unit 2. Alternatively, Ne may be multiplied by a predetermined safety coefficient s smaller than 1, so that s × Ne hypotheses remain before expansion.
[0057]
Example 3: A new hypothesis expansion speed is obtained by interpolating between the speed Nmax / Ne obtained in Example 2 and the hypothesis expansion speed V calculated when the change request was received, for example aV + (1 − a)Nmax / Ne with a constant a (0 ≤ a ≤ 1). With this speed, the number of pre-expansion hypotheses is Nmax / {aV + (1 − a)Nmax / Ne}. The beam adjustment unit 1 obtains a threshold that leaves this number of pre-expansion hypotheses and passes it to the search control unit 2.
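Examples 2 and 3 differ only in how many pre-expansion hypotheses they aim to keep. The sketch below computes those target counts and converts a count into a threshold. Rounding down to an integer, the lower bound of one hypothesis, and all names and numbers are assumptions for illustration; the count-to-threshold helper uses sorting, where the source would use a histogram.

```python
import math
from typing import Sequence


def pre_expansion_target_ex2(ne: int, safety: float = 1.0) -> int:
    """Example 2: Ne hypotheses produced Nmax, so keep s * Ne of them (s = 1 keeps Ne)."""
    return max(1, math.floor(safety * ne))


def pre_expansion_target_ex3(ne: int, n_max: int, v: float, a: float) -> int:
    """Example 3: interpolate the speed, then keep Nmax / (a*V + (1 - a) * Nmax / Ne)."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must be in [0, 1]")
    speed = a * v + (1.0 - a) * n_max / ne
    return max(1, math.floor(n_max / speed))


def threshold_for_count(scores: Sequence[float], k: int) -> float:
    """Score of the k-th best hypothesis (hypotheses tied with it are also kept)."""
    return sorted(scores, reverse=True)[min(k, len(scores)) - 1]


# Hypothetical numbers: Ne = 40 hypotheses had been expanded when Nmax = 200 was exceeded.
print(pre_expansion_target_ex2(40, safety=0.9))               # 36
print(pre_expansion_target_ex3(40, n_max=200, v=7.0, a=0.5))  # 33
```

Setting a = 0 reduces Example 3 to Example 2 without a safety coefficient, since the speed becomes Nmax / Ne and the target becomes Ne.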
[0058]
When the number of post-expansion hypotheses exceeds the predetermined maximum Nmax, the search control unit 2 of the speech recognition apparatus according to the fifth embodiment of the present invention cancels the hypothesis expansion already performed on pre-expansion hypotheses in that frame and undoes all changes made by that processing.
[0059]
For example, in the conventional method described above, each arc of the network must hold a pointer to a hypothesis. During hypothesis expansion, this pointer is rewritten from the pre-expansion hypothesis on the arc to a post-expansion hypothesis. In this case, the pointer held by each arc must be restored to its value before hypothesis expansion.
[0060]
In addition, the word transition information generated in the frame is recorded in the word end table management unit 4, and these records must also be deleted. The search control unit 2 of the fifth embodiment initializes all the information required for the cancellation described above (returning it to the state before expansion). Cancellation of network expansion, however, is optional. If network expansion is canceled, the network must be expanded again when this frame is searched again with a different threshold, but network parts that were expanded before the retry and are no longer needed do not occupy memory, so memory usage is reduced. If network expansion is not canceled, the network need not be expanded again when this frame is searched again with another threshold, which saves the computation for network expansion; however, the network expansions that are no longer needed remain recorded and require extra memory.
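One way to implement this cancellation is an undo log kept for the current frame. The sketch below is hypothetical: the source does not describe the data structures, so the class, its fields, and the string identifiers are invented for illustration. It records each arc pointer rewrite, remembers the word end table length at the start of the frame, and tracks network parts loaded in the frame; `keep_network` selects between the two trade-offs described above.

```python
from typing import Any, Dict, List, Set

_MISSING = object()  # marks an arc that had no hypothesis pointer before this frame


class FrameUndoLog:
    """Hypothetical per-frame undo log for canceling hypothesis expansion.

    Records arc -> hypothesis pointer rewrites, word end records, and network parts
    loaded during the frame so that rollback() can restore the pre-expansion state.
    """

    def __init__(self, arc_hyp: Dict[str, Any], word_end_table: List[Any], loaded: Set[str]):
        self.arc_hyp = arc_hyp
        self.word_end_table = word_end_table
        self.loaded = loaded
        self._arc_changes = []                     # (arc, previous pointer) in change order
        self._word_end_mark = len(word_end_table)  # table length at the start of the frame
        self._new_parts: List[str] = []            # network parts loaded in this frame

    def set_arc(self, arc: str, hyp: Any) -> None:
        self._arc_changes.append((arc, self.arc_hyp.get(arc, _MISSING)))
        self.arc_hyp[arc] = hyp

    def add_word_end(self, record: Any) -> None:
        self.word_end_table.append(record)

    def load_part(self, part: str) -> None:
        if part not in self.loaded:
            self.loaded.add(part)
            self._new_parts.append(part)

    def rollback(self, keep_network: bool) -> None:
        # Restore arc pointers newest-first so repeated rewrites of one arc unwind correctly.
        for arc, old in reversed(self._arc_changes):
            if old is _MISSING:
                del self.arc_hyp[arc]
            else:
                self.arc_hyp[arc] = old
        self._arc_changes.clear()
        # Delete every word end record added in this frame.
        del self.word_end_table[self._word_end_mark:]
        # Optionally unload network parts loaded in this frame (saves memory, costs reloads).
        if not keep_network:
            for part in self._new_parts:
                self.loaded.discard(part)
            self._new_parts.clear()


arcs, table, loaded = {"arc1": "h0"}, ["w0"], {"root"}
log = FrameUndoLog(arcs, table, loaded)
log.set_arc("arc1", "h1")
log.set_arc("arc2", "h2")
log.add_word_end("w1")
log.load_part("kanagawa_cities")
log.rollback(keep_network=True)
print(arcs, table, sorted(loaded))  # {'arc1': 'h0'} ['w0'] ['kanagawa_cities', 'root']
```

Keeping the loaded part (`keep_network=True`) saves reloading it on the retry at the cost of memory; passing `False` frees it, matching the memory-saving option in the text.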
[0061]
When the word end table management unit 4 of the speech recognition apparatus according to the sixth embodiment of the present invention receives from the search control unit 2 a request to delete the word end information recorded during hypothesis expansion in the current frame, it deletes all such word end information, freeing memory that is no longer needed.
[0062]
The beam adjustment unit 1 of the speech recognition apparatus according to the seventh embodiment of the present invention changes the prediction method when it determines that hypothesis expansion is difficult to predict. One such case is when the network being searched contains points where the number of branches increases sharply, so that a hypothesis reaching such a point expands into many hypotheses at once. Therefore, when the hypothesis expansion acceleration exceeds a predetermined value Amax, it is preferable to switch to another prediction method, such as the following.
[0063]
Method 1: The hypothesis expansion speed is set to 1, and the number of post-expansion hypotheses is predicted from it by the method of the second embodiment.
[0064]
Method 2: V(i−1) / Amax is used as the hypothesis expansion speed instead of the currently calculated speed, and the number of post-expansion hypotheses is predicted from it by the method of the second embodiment.
[0065]
Method 3: For the arcs on which currently surviving hypotheses exist, the total number of succeeding arcs is obtained from the network, and the hypothesis expansion speed is defined as V = ([total number of succeeding arcs] + [number of arcs on which current hypotheses exist]) / [number of arcs on which current hypotheses exist]. The number of post-expansion hypotheses is predicted from this speed by the method of the second embodiment. Alternatively, V may be multiplied by a predetermined safety factor s (0 < s < 1) and s × V used as the hypothesis expansion speed.
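The switch between the normal predictor and Methods 1 to 3 can be sketched as a single selection function. Assumptions: the acceleration compared against Amax is the most recent one, A = V(i−1) / V(i−2); the normal predictor is V(i) ≈ V(i−1) from the third embodiment; the arc counts for Method 3 are supplied by the caller rather than read from a network; and `safety` = 1.0 means no safety factor. The function names are hypothetical.

```python
from typing import Sequence


def choose_expansion_speed(v_hist: Sequence[float], a_max: float, method: int,
                           successor_arcs: int = 0, active_arcs: int = 1,
                           safety: float = 1.0) -> float:
    """Pick V for the current frame, switching method when acceleration exceeds Amax.

    v_hist holds past speeds ..., V(i-2), V(i-1). The normal predictor is V(i) ~ V(i-1).
    """
    v_prev = v_hist[-1]
    accel = v_hist[-1] / v_hist[-2] if len(v_hist) >= 2 else 1.0
    if accel <= a_max:
        return v_prev                  # expansion is predictable: keep V(i) ~ V(i-1)
    if method == 1:
        return 1.0                     # Method 1: assume no growth
    if method == 2:
        return v_prev / a_max          # Method 2: as stated in the text
    if method == 3:
        # Method 3: (successor arcs + active arcs) / active arcs, optionally scaled by s
        return safety * (successor_arcs + active_arcs) / active_arcs
    raise ValueError("method must be 1, 2 or 3")


def predicted_post_count(n_pre: int, speed: float) -> float:
    """Second-embodiment prediction: Na = Nb * V."""
    return n_pre * speed


print(choose_expansion_speed([2.0, 2.2], a_max=1.5, method=1))    # 2.2
print(choose_expansion_speed([1.0, 4.0], a_max=2.0, method=2))    # 2.0
print(choose_expansion_speed([1.0, 4.0], a_max=1.5, method=3,
                             successor_arcs=30, active_arcs=10))  # 4.0
```

Method 3 differs from the other two in that it looks at the network structure ahead of the surviving hypotheses instead of extrapolating from past frames, which is what makes it usable at a sudden branching point.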
[0066]
The embodiments described above are examples of preferred embodiments of the present invention; the invention is not limited to them, and various modifications can be made without departing from its scope.
[0067]
【The invention's effect】
As is clear from the above description, according to the present invention, keeping the number of hypotheses and the memory usage within predetermined limits reduces both memory usage and computation.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a speech recognition apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a method for determining a pruning threshold.
FIG. 3 is a diagram showing an example of a network (110,000-word nationwide address task).
FIG. 4 is a diagram showing an example of a network (facility name input task).
FIG. 5 is a diagram showing the pre-expansion and post-expansion hypotheses in a given frame.
FIG. 6 is a diagram for explaining the beam search method of the present invention.
FIG. 7 is a flowchart showing the processing of the beam search method and beam search program in the speech recognition apparatus according to the embodiment of the present invention.
FIG. 8 is a diagram for explaining the prediction method in the third embodiment.
FIG. 9 is a diagram for explaining a threshold change example 2 in the fourth embodiment.
FIG. 10 is a block diagram illustrating a configuration of a conventional speech recognition apparatus that performs a beam search.
FIG. 11 is a flowchart showing an operation of an example of a conventional beam search method.
[Explanation of symbols]
1 Beam adjustment unit
2 Search control unit
3 Network management unit
4 Word end table management unit
21 Beam adjustment unit
22 Search control unit
23 Network management unit
24 Word end table management unit

Claims

A speech recognition apparatus comprising:
search control means for controlling a frame-synchronous beam search in continuous speech recognition;
beam adjusting means for dynamically adjusting the beam width during the beam search so that the expanded hypotheses fit within a predetermined maximum allowable number of hypotheses and a predetermined memory capacity for holding them;
network management means for storing the network used in the beam search; and
word end table management means for holding word history information of the hypotheses;
wherein the beam adjusting means examines the behavior of hypothesis expansion in past frames before hypothesis expansion in the current frame, predicts the number of post-expansion hypotheses in the current frame from the hypothesis expansion speed and hypothesis expansion acceleration in past frames and the number of pre-expansion hypotheses in the current frame, and adjusts the beam width based on the prediction.

A speech recognition apparatus comprising:
search control means for controlling a beam search on a trellis in discrete word recognition;
beam adjusting means for dynamically adjusting the beam width during the beam search so that the expanded hypotheses fit within a predetermined maximum allowable number of hypotheses and a predetermined memory capacity for holding them; and
network management means for storing the network used in the beam search;
wherein the beam adjusting means examines the behavior of hypothesis expansion in past frames before hypothesis expansion in the current frame, predicts the number of post-expansion hypotheses in the current frame from the hypothesis expansion speed and hypothesis expansion acceleration in past frames and the number of pre-expansion hypotheses in the current frame, and adjusts the beam width based on the prediction.

A speech recognition apparatus comprising:
search control means for controlling a frame-synchronous beam search in continuous speech recognition;
beam adjusting means for dynamically adjusting the beam width during the beam search so that the expanded hypotheses fit within a predetermined maximum allowable number of hypotheses and a predetermined memory capacity for holding them;
network management means for storing the network used in the beam search; and
word end table management means for holding word history information of the hypotheses;
wherein the beam adjusting means examines the behavior of hypothesis expansion in past frames before hypothesis expansion in the current frame and, if it determines that this behavior is not a good indicator for predicting the behavior of hypothesis expansion in the current frame, switches the prediction of hypothesis expansion in the current frame to another method and adjusts the beam width based on that prediction.

A speech recognition apparatus comprising:
search control means for controlling a beam search on a trellis in discrete word recognition;
beam adjusting means for dynamically adjusting the beam width during the beam search so that the expanded hypotheses fit within a predetermined maximum allowable number of hypotheses and a predetermined memory capacity for holding them; and
network management means for storing the network used in the beam search;
wherein the beam adjusting means examines the behavior of hypothesis expansion in past frames before hypothesis expansion in the current frame and, if it determines that this behavior is not a good indicator for predicting the behavior of hypothesis expansion in the current frame, switches the prediction of hypothesis expansion in the current frame to another method and adjusts the beam width based on that prediction.

A beam search method in speech recognition, comprising:
a step of calculating evaluation values for a group of hypotheses in a time frame;
a threshold calculation step of calculating a pruning threshold for the evaluation value by subtracting a predetermined beam width from the highest evaluation value derived in the preceding step;
a step of obtaining, after the threshold calculation step, the number of hypotheses whose evaluation value is equal to or greater than the threshold, and determining from that number whether to predict the number of hypotheses after expansion;
a prediction step of, when it is determined in the preceding step that the number of hypotheses after expansion is to be predicted, predicting that number for the hypotheses whose evaluation value is equal to or greater than the threshold and, if the predicted number exceeds a predetermined maximum allowable number, adjusting the beam width so that the number falls within the maximum allowable number and correcting the threshold accordingly;
a pruning step of pruning the hypothesis group using the threshold determined in the above steps;
an expansion step of expanding, according to the network, the hypotheses that survive the pruning step; and
a step of, when it is determined during the expansion step that the number of expanded hypotheses exceeds the maximum allowable number, canceling the expansion to return to the state before hypothesis expansion, readjusting the beam width to obtain a new threshold, returning to the pruning step to prune again with that threshold, and performing the expansion again.

A beam search method in speech recognition, comprising:
a step of calculating evaluation values for a group of hypotheses in a time frame;
a threshold calculation step of calculating a pruning threshold for the evaluation value by subtracting a predetermined beam width from the highest evaluation value derived in the preceding step;
a prediction step of predicting the number of hypotheses after expansion for the hypotheses whose evaluation value is equal to or greater than the threshold and, if the predicted number exceeds a predetermined maximum allowable number, adjusting the beam width so that the number falls within the maximum allowable number and correcting the threshold accordingly;
a pruning step of pruning the hypothesis group using the threshold determined in the above steps;
an expansion step of expanding, according to the network, the hypotheses that survive the pruning step; and
a step of, when it is determined during the expansion step that the number of expanded hypotheses exceeds the maximum allowable number, canceling the expansion to return to the state before hypothesis expansion, readjusting the beam width to obtain a new threshold, returning to the pruning step to prune again with that threshold, and performing the expansion again;
wherein the prediction step predicts the number of hypotheses after expansion from the hypothesis expansion speed and hypothesis expansion acceleration in past frames and the number of hypotheses before expansion in the current frame.

A beam search method in speech recognition,
Calculating an evaluation value for a hypothesis group on a time frame;
A threshold calculation step of calculating an evaluation value threshold for pruning by subtracting a predetermined beam width from the highest evaluation value derived in the step;
The number of hypotheses after expansion is predicted for a hypothesis having an evaluation value equal to or greater than the threshold, and if the predicted number of hypotheses exceeds a predetermined maximum allowable number, the beam width is adjusted so as to be within the maximum allowable number And a prediction step of correcting the threshold value;
A pruning step of pruning the hypothesis group using the threshold value determined in the above steps;
An expansion step for performing an expansion process according to the network for the hypothesis group that has been pruned in the pruning step;
A step of canceling the expansion process and returning to the state before hypothesis expansion when it is determined during the expansion step that the number of expanded hypotheses exceeds the maximum allowable number, readjusting the beam width to obtain a new threshold, returning to the pruning step to prune again with that threshold, and redoing the expansion process;
wherein
the prediction step changes the method of predicting the number of hypotheses after expansion according to the behavior of hypothesis expansion in past frames.
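One possible way to change the prediction method according to past behavior is sketched below. The switching rule (linear extrapolation with speed plus acceleration when the growth ratio has been stable, the largest recent ratio otherwise), the `window` and `tolerance` parameters, and the function names are all assumptions made for illustration, not the rule specified in this patent.

```python
from typing import Sequence


def growth_ratios(counts: Sequence[tuple[int, int]]) -> list[float]:
    """n_after / n_before for each past frame; counts holds (n_before, n_after)."""
    return [after / before if before else 1.0 for before, after in counts]


def predict_after_count_adaptive(counts: Sequence[tuple[int, int]],
                                 n_before_now: int,
                                 window: int = 3,
                                 tolerance: float = 0.2) -> int:
    """Choose a prediction method from the recent behavior of hypothesis expansion.

    Assumed rule: if the growth ratio changed by less than `tolerance` between
    every pair of consecutive frames in the last `window` frames, extrapolate
    linearly; otherwise use the largest recent ratio (a conservative guess).
    """
    ratios = growth_ratios(counts)[-window:]
    if not ratios:
        return n_before_now  # no history: assume no growth
    steps = [b - a for a, b in zip(ratios, ratios[1:])]
    if all(abs(s) < tolerance for s in steps):
        # Stable behavior: speed (latest ratio) plus acceleration (latest step).
        accel = steps[-1] if steps else 0.0
        ratio = max(ratios[-1] + accel, 0.0)
    else:
        # Erratic behavior: assume the worst growth seen recently.
        ratio = max(ratios)
    return round(n_before_now * ratio)
```

Over-predicting when the behavior is erratic narrows the beam earlier, trading a little accuracy for a lower chance of triggering the costly cancel-and-redo path during expansion.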

A beam search method in speech recognition,
A step of calculating evaluation values for a hypothesis group in a time frame;
A threshold calculation step of calculating an evaluation value threshold for pruning by subtracting a predetermined beam width from the highest evaluation value obtained in the preceding step;
A prediction step of predicting the number of hypotheses after expansion for the hypotheses whose evaluation values are equal to or greater than the threshold and, if the predicted number exceeds a predetermined maximum allowable number, adjusting the beam width so that the number falls within the maximum allowable number, thereby correcting the threshold;
A pruning step of pruning the hypothesis group using the threshold value determined in the above steps;
An expansion step for performing an expansion process according to the network for the hypothesis group that has been pruned in the pruning step;
A step of canceling the expansion process and returning to the state before hypothesis expansion when it is determined during the expansion step that the number of expanded hypotheses exceeds the maximum allowable number, readjusting the beam width to obtain a new threshold, returning to the pruning step to prune again with that threshold, and redoing the expansion process;
wherein
the prediction step predicts the number of hypotheses after expansion using the hypothesis expansion speed and hypothesis expansion acceleration of past frames and the number of hypotheses before hypothesis expansion in the current frame, and changes the method of predicting the number of hypotheses after expansion according to the behavior of hypothesis expansion in past frames.
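The threshold correction performed by the prediction step in these claims (narrowing the beam width so that the predicted post-expansion count fits within the maximum allowable number) might look like the following sketch. It assumes the prediction yields a per-hypothesis growth factor `predicted_ratio` and that hypotheses scoring at or above the threshold survive; `corrected_threshold` is a hypothetical name.

```python
import math
from typing import Sequence


def corrected_threshold(scores: Sequence[float], beam_width: float,
                        predicted_ratio: float, n_max: int) -> float:
    """Return a pruning threshold whose survivors should expand to at most n_max.

    Hypotheses with score >= threshold survive pruning.
    """
    best = max(scores)
    threshold = best - beam_width  # threshold calculation step
    survivors = sorted((s for s in scores if s >= threshold), reverse=True)
    if predicted_ratio <= 0 or len(survivors) * predicted_ratio <= n_max:
        return threshold  # the prediction fits: keep the original beam width
    # Number of hypotheses we can afford to expand (at least the best one).
    k = max(1, math.floor(n_max / predicted_ratio))
    # Narrowed beam: the new threshold is the score of the k-th best survivor.
    return survivors[k - 1]
```

With scores `[10, 9, 8, 7, 6, 2]`, a beam width of 5, and a predicted growth of 3, the five survivors would become about 15 hypotheses; with `n_max = 10` only 3 can be afforded, so the threshold rises from 5 to 8. Ties at the new threshold can let slightly more than `k` hypotheses through, which is one case the cancel-and-redo step during expansion would catch.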

A beam search program for speech recognition,
A process of calculating evaluation values for a hypothesis group in a time frame;
A threshold calculation process of calculating an evaluation value threshold for pruning by subtracting a predetermined beam width from the highest evaluation value obtained in the preceding process;
A determination process of obtaining, after the threshold calculation process, the number of hypotheses whose evaluation values are equal to or greater than the threshold, and determining on the basis of that number whether to predict the number of hypotheses after expansion;
A prediction process that, when the determination process decides to predict, predicts the number of hypotheses after expansion for the hypotheses whose evaluation values are equal to or greater than the threshold and, if the predicted number exceeds a predetermined maximum allowable number, adjusts the beam width so that the number falls within the maximum allowable number, thereby correcting the threshold;
A pruning process for pruning the hypothesis group using the threshold value obtained in the above process;
An expansion process for performing an expansion according to the network for the hypothesis group that has been pruned in the pruning process;
A process of canceling the expansion and returning to the state before hypothesis expansion when it is determined during the expansion process that the number of expanded hypotheses exceeds the maximum allowable number, readjusting the beam width to obtain a new threshold, returning to the pruning process to prune again with that threshold, and redoing the expansion process;
the beam search program causing a computer to execute the above processes.
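The overall flow of this program claim (deciding whether to predict, pruning, expanding, and rolling back on overflow) can be combined into one frame-level sketch. Everything concrete here is an assumption: hypotheses are `(state, score)` pairs, the hypothetical `expand` callback stands in for expansion along the network, prediction is attempted only when more than `predict_trigger * n_max` hypotheses survive, the beam is simply halved on each correction, and a bounded retry count with a fallback keeps the loop from running forever.

```python
from typing import Callable, Sequence

Hyp = tuple[str, float]  # (state id, score): hypothetical representation


def search_frame(hyps: Sequence[Hyp], expand: Callable[[Hyp], list[Hyp]],
                 beam_width: float, n_max: int, predicted_ratio: float,
                 predict_trigger: float = 0.5, max_retries: int = 20) -> list[Hyp]:
    """Run one frame of beam search with prediction and rollback (a sketch)."""
    best = max(score for _, score in hyps)
    for _ in range(max_retries):
        threshold = best - beam_width  # threshold calculation
        survivors = [h for h in hyps if h[1] >= threshold]  # pruning
        # Determination: run the prediction only when many hypotheses survive.
        if len(survivors) > predict_trigger * n_max:
            # Prediction: if the forecast overflows, narrow the beam up front.
            if len(survivors) * predicted_ratio > n_max:
                beam_width /= 2
                continue
        expanded: list[Hyp] = []
        for h in survivors:
            expanded.extend(expand(h))
            if len(expanded) > n_max:
                break  # overflow detected during expansion
        else:
            return expanded  # expansion finished within n_max
        # Rollback: `hyps` was never modified, so discard `expanded`,
        # narrow the beam, and prune and expand again.
        beam_width /= 2
    # Fallback (assumption): expand only the best-scoring hypotheses, capped.
    top = [h for h in hyps if h[1] == best]
    fallback = [s for h in top for s in expand(h)]
    return sorted(fallback, key=lambda s: s[1], reverse=True)[:n_max]
```

Because pruning only reads `hyps` and builds new lists, "returning to the state before hypothesis expansion" reduces to discarding the partial `expanded` list. A real engine that expands in place would need an explicit undo or a snapshot.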

A beam search program for speech recognition,
A process of calculating evaluation values for a hypothesis group in a time frame;
A threshold calculation process of calculating an evaluation value threshold for pruning by subtracting a predetermined beam width from the highest evaluation value obtained in the preceding process;
A prediction process of predicting the number of hypotheses after expansion for the hypotheses whose evaluation values are equal to or greater than the threshold and, if the predicted number exceeds a predetermined maximum allowable number, adjusting the beam width so that the number falls within the maximum allowable number, thereby correcting the threshold;
A pruning process for pruning the hypothesis group using the threshold value obtained in the above process;
An expansion process for performing an expansion according to the network for the hypothesis group that has been pruned in the pruning process;
A process of canceling the expansion and returning to the state before hypothesis expansion when it is determined during the expansion process that the number of expanded hypotheses exceeds the maximum allowable number, readjusting the beam width to obtain a new threshold, returning to the pruning process to prune again with that threshold, and redoing the expansion process;
the beam search program causing a computer to execute the above processes, wherein
the prediction process predicts the number of hypotheses after expansion using the hypothesis expansion speed and hypothesis expansion acceleration of past frames and the number of hypotheses before hypothesis expansion in the current frame.

A beam search program for speech recognition,
A process of calculating evaluation values for a hypothesis group in a time frame;
A threshold calculation process of calculating an evaluation value threshold for pruning by subtracting a predetermined beam width from the highest evaluation value obtained in the preceding process;
A prediction process of predicting the number of hypotheses after expansion for the hypotheses whose evaluation values are equal to or greater than the threshold and, if the predicted number exceeds a predetermined maximum allowable number, adjusting the beam width so that the number falls within the maximum allowable number, thereby correcting the threshold;
A pruning process for pruning the hypothesis group using the threshold value obtained in the above process;
An expansion process for performing an expansion according to the network for the hypothesis group that has been pruned in the pruning process;
A process of canceling the expansion and returning to the state before hypothesis expansion when it is determined during the expansion process that the number of expanded hypotheses exceeds the maximum allowable number, readjusting the beam width to obtain a new threshold, returning to the pruning process to prune again with that threshold, and redoing the expansion process;
the beam search program causing a computer to execute the above processes, wherein
the prediction process changes the method of predicting the number of hypotheses after expansion according to the behavior of hypothesis expansion in past frames.

A beam search program for speech recognition,
A process of calculating evaluation values for a hypothesis group in a time frame;
A threshold calculation process of calculating an evaluation value threshold for pruning by subtracting a predetermined beam width from the highest evaluation value obtained in the preceding process;
A prediction process of predicting the number of hypotheses after expansion for the hypotheses whose evaluation values are equal to or greater than the threshold and, if the predicted number exceeds a predetermined maximum allowable number, adjusting the beam width so that the number falls within the maximum allowable number, thereby correcting the threshold;
A pruning process for pruning the hypothesis group using the threshold value obtained in the above process;
An expansion process for performing an expansion according to the network for the hypothesis group that has been pruned in the pruning process;
A process of canceling the expansion and returning to the state before hypothesis expansion when it is determined during the expansion process that the number of expanded hypotheses exceeds the maximum allowable number, readjusting the beam width to obtain a new threshold, returning to the pruning process to prune again with that threshold, and redoing the expansion process;
the beam search program causing a computer to execute the above processes, wherein
the prediction process predicts the number of hypotheses after expansion using the hypothesis expansion speed and hypothesis expansion acceleration of past frames and the number of hypotheses before hypothesis expansion in the current frame, and changes the method of predicting the number of hypotheses after expansion according to the behavior of hypothesis expansion in past frames.