JP3473704B2 - Voice recognition device

Info

Publication number: JP3473704B2
Application number: JP02738093A
Authority: JP
Inventors: 浩明小川; 和夫石井
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1993-01-22
Filing date: 1993-01-22
Publication date: 2003-12-08
Anticipated expiration: 2018-12-08
Also published as: JPH06222790A

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech recognition apparatus suitable for use in recognizing speech.

[0002]

[Prior Art] A conventional speech recognition apparatus broadly consists of two parts: a speech recognition unit, which word-spots, for example, words (vocabulary) in the speech uttered by a speaker, and an analysis unit, which parses, sentence by sentence, the word candidate string obtained by the word spotting in the speech recognition unit, using, for example, syntax information prepared in advance.

[0003] In a speech recognition apparatus configured in this way, when the analysis unit requests the speech recognition unit to word-spot certain words, the speech recognition unit spots those words in the speech uttered by the speaker and outputs them to the analysis unit. The analysis unit then uses the syntax information to parse, sentence by sentence, the word candidate string obtained as the word spotting result.

[0004] Based on the parsing result, incorrect word candidates are removed from the word candidate string obtained as the word spotting result, so that a correct sentence is obtained.

[0005]

[Problems to Be Solved by the Invention] In the conventional speech recognition apparatus, however, when the analysis unit requests word spotting for a large number of words, the speech recognition unit is sometimes unable to finish spotting all of those words by the time the speaker finishes speaking.

[0006] In that case, the response to the input speech is delayed.

[0007] One countermeasure is to have the speech recognition unit defer the spotting of some of the words for which the analysis unit has requested word spotting.

[0008] With this method, however, words that the analysis unit needs may not be spotted, so the response to the input speech may still be delayed.

[0009] The present invention has been made in view of these circumstances and improves the response speed of the apparatus.

[0010]

[Means for Solving the Problems] The speech recognition apparatus of the present invention comprises a word spotting processing unit 4 as recognition means for recognizing speech, and a syntax analysis unit 5 as analysis means that requests the word spotting processing unit 4 to perform speech recognition processing of words in the speech, analyzes the recognition results of the word spotting processing unit 4, and understands the speech. The syntax analysis unit 5 attaches to each word a priority for speech recognition processing and supplies the words to the word spotting processing unit 4. The word spotting processing unit 4 changes the number of words to be subjected to speech recognition processing based on the word priorities or on its own speech recognition processing capability, and performs speech recognition processing of words in the speech based on the word priorities.

[0011] This speech recognition apparatus may further comprise a speech section detection unit 3 as detection means for detecting the speech section of the speech, and the word spotting processing unit 4 may be made to change, based on the word priorities, which words it recognizes in the speech during the speech section and after the speech section has ended.

[0012] Furthermore, in this speech recognition apparatus, the syntax analysis unit 5 may be made to change the word priorities based on the result of analyzing the recognition results of the word spotting processing unit 4.

[0013]

[0014]

[Operation] In the speech recognition apparatus configured as above, the word spotting processing unit 4 changes the number of words to be subjected to speech recognition processing based on the priorities attached to the words by the syntax analysis unit 5 or on its own speech recognition processing capability, and further performs speech recognition processing of words in the speech based on the word priorities. The response to the input speech can therefore be made faster.

[0015] When the word spotting processing unit 4 can be made to change, based on the word priorities, which words it recognizes in the speech during the speech section and after the speech section has ended, it is possible, for example, to perform speech recognition processing only on the highest-priority words after the speech section has ended, which improves the real-time performance of the apparatus.

[0016] When the syntax analysis unit 5 can be made to change the word priorities based on the result of analyzing the recognition results of the word spotting processing unit 4, the response to the input speech can be made faster still.

[0017]

[0018]

[Embodiment] FIG. 1 is a block diagram showing the configuration of one embodiment of an external device controller to which the speech recognition apparatus of the present invention is applied. This external device controller allows an external device (not shown), such as an AV device, connected to the external device operation unit 6 to be operated by voice.

[0019] Specifically, the speech input unit 1 converts input speech into a speech signal as an electrical signal, A/D-converts it, and outputs it to the speech analysis unit 2. The speech analysis unit 2 extracts speech feature parameters, such as linear prediction coefficients, from the speech signal from the speech input unit 1 frame by frame, and outputs them in time series to the speech section detection unit 3 and the word spotting processing unit 4.

[0020] Based on the feature parameters from the speech analysis unit 2, the speech section detection unit 3 detects the frame at which the utterance starts and the frame at which it ends, that is, the speech section, and outputs them to the word spotting processing unit 4 and the syntax analysis unit 5.
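The description does not state how the speech section detection unit 3 decides where the utterance starts and ends. Purely as an illustration, the following minimal sketch assumes a per-frame power value and a fixed threshold; the function name `detect_speech_section`, the power values, and the threshold are all hypothetical, and the sketch works on a complete recording rather than frame by frame in real time.

```python
def detect_speech_section(frame_power, threshold):
    """Return (start_frame, end_frame) of the span at or above threshold, or None.

    Assumption: a frame counts as speech when its power reaches the threshold.
    The patent itself does not specify the detection criterion.
    """
    active = [i for i, p in enumerate(frame_power) if p >= threshold]
    if not active:
        return None
    return active[0], active[-1]


# Hypothetical per-frame power values for a short recording.
power = [0.0, 0.1, 0.9, 1.2, 0.8, 0.05]
section = detect_speech_section(power, threshold=0.5)
print(section)  # (2, 4)
```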

[0021] When the speech section detection unit 3 detects the frame at which the utterance starts, the word spotting processing unit 4 word-spots, in the input speech, each word that the syntax analysis unit 5 has output together with its priority, proceeding in order of priority, and outputs the word spotting results to the syntax analysis unit 5.

[0022] Specifically, when the speech section detection unit 3 detects the frame at which the utterance starts, the word spotting processing unit 4 first stores the speech feature parameters from the speech analysis unit 2 sequentially in a built-in input buffer (not shown) and reads the stored feature parameters out one frame at a time. The word spotting processing unit 4 then matches the standard pattern of each word output by the syntax analysis unit 5 together with its priority, in order of priority, against the feature parameters read sequentially from the input buffer as the input pattern. When the resulting score is equal to or greater than a predetermined threshold, it outputs the score and the start and end points of the matched section to the syntax analysis unit 5.

[0023] Here, the score denotes the likelihood of the spotted word; the larger its value, the more probable the spotted word.
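The reporting rule described above (a match is passed on only when its score reaches the threshold, together with its start and end points) can be sketched as follows. This is a minimal illustration; the `SpotResult` record, the function name, and the numeric scores are hypothetical, and the matching itself is not shown.

```python
from dataclasses import dataclass


@dataclass
class SpotResult:
    word: str
    score: float  # likelihood of the spotted word; larger means more probable
    start: int    # first frame of the matched section
    end: int      # last frame of the matched section


def filter_by_threshold(candidates, threshold):
    """Keep only matches whose score is equal to or greater than the threshold."""
    return [c for c in candidates if c.score >= threshold]


# Hypothetical matcher output for one utterance.
candidates = [
    SpotResult("play", 0.82, 3, 17),
    SpotResult("pause", 0.41, 5, 16),
    SpotResult("CD", 0.77, 18, 30),
]
reported = filter_by_threshold(candidates, threshold=0.6)
print([c.word for c in reported])  # ['play', 'CD']
```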

[0024] Before the speech section detection unit 3 detects the frame at which the utterance starts, the syntax analysis unit 5 forms hypotheses about the speech to be input (sentence hypotheses), attaches the highest priority to all words needed in the initial stage of analyzing the speech, outputs them to the word spotting processing unit 4, and requests word spotting processing.

[0025] After the speech section detection unit 3 detects the frame at which the utterance starts, when the word spotting processing unit 4 supplies the score, start point, and end point of each word for which word spotting was requested, the syntax analysis unit 5 analyzes the input speech based on the score or the length of the section from the start point to the end point, together with the syntax information registered in advance in a built-in syntax dictionary (not shown), and forms new sentence hypotheses based on the analysis result.

[0026] The syntax analysis unit 5 then attaches priorities to the words that have become necessary for analyzing the new sentence hypotheses and outputs them to the word spotting processing unit 4.

[0027] In this case, the priority is attached to each word according to the method by which the syntax analysis unit 5 searches for (selects) sentence hypotheses. For example, when the syntax analysis unit 5 performs parsing by the best-first method, words associated with sentence hypotheses of higher confidence are given higher priority.
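One way to read the best-first rule above is that each word inherits the confidence of the most confident sentence hypothesis that needs it. The sketch below follows that reading; the function name, the use of a maximum, and the example hypotheses are assumptions made for illustration and are not stated in the description.

```python
def assign_priorities(hypotheses):
    """Map each word to the highest confidence among the hypotheses that use it.

    hypotheses: list of (confidence, words) pairs (hypothetical representation).
    """
    priorities = {}
    for confidence, words in hypotheses:
        for word in words:
            priorities[word] = max(priorities.get(word, 0.0), confidence)
    return priorities


# Hypothetical sentence hypotheses with their confidence values.
hypotheses = [
    (0.9, ["CD", "play"]),
    (0.6, ["tape", "play"]),
    (0.2, ["tape", "rewind"]),
]
priorities = assign_priorities(hypotheses)
# Highest priority first; ties are broken alphabetically for a stable order.
ranked = sorted(priorities, key=lambda w: (-priorities[w], w))
print(ranked)  # ['CD', 'play', 'tape', 'rewind']
```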

[0028] After the speech section detection unit 3 detects the frame at which the utterance ends, the syntax analysis unit 5 removes, based on its own parsing result, incorrect (apparently incorrect) word candidates from the word candidate string obtained as the word spotting result and, on obtaining a correct (apparently correct) sentence, outputs it to the external device operation unit 6.

[0029] The external device operation unit 6 operates the external device connected to it in accordance with the content of the sentence output from the syntax analysis unit 5.

[0030] The operation will now be described. First, in the speech input unit 1, the input speech is converted into a speech signal as an electrical signal, A/D-converted, and output to the speech analysis unit 2. In the speech analysis unit 2, speech feature parameters are extracted frame by frame from the speech signal from the speech input unit 1 and output to the speech section detection unit 3 and the word spotting processing unit 4.

[0031] In the speech section detection unit 3, the frame at which the utterance starts and the frame at which it ends, that is, the speech section, are detected based on the speech feature parameters from the speech analysis unit 2 and output to the word spotting processing unit 4 and the syntax analysis unit 5.

[0032] At the same time, in the syntax analysis unit 5, before the speech section detection unit 3 detects the frame at which the utterance starts, sentence hypotheses for the speech to be input are formed, the highest priority is attached to all words needed in the initial stage of analyzing the speech, and the words are output to the word spotting processing unit 4.

[0033] In the word spotting processing unit 4, word spotting of each word output by the syntax analysis unit 5 together with its priority is performed on the input speech, for example in accordance with the flowchart shown in FIG. 2.

[0034] Specifically, in the word spotting processing unit 4, first, in step S1, the variable W, which indicates the total number of words output from the syntax analysis unit 5 (that is, words for which the syntax analysis unit 5 has issued a word spotting processing request (search request)), is initialized to 0, and the process proceeds to step S2. In step S2, the total number of words output from the syntax analysis unit 5 is set in the variable W, and the process proceeds to step S3, where it is determined whether the speech section detection unit 3 has detected the frame at which the utterance starts.

[0035] If it is determined in step S3 that the speech section detection unit 3 has not detected the frame at which the utterance starts, the process returns to step S2, and steps S2 and S3 are repeated until it is determined in step S3 that the speech section detection unit 3 has detected that frame.

[0036] If it is determined in step S3 that the speech section detection unit 3 has detected the frame at which the utterance starts, that is, speech input to the speech input unit 1 has begun, the process proceeds to step S4. There, the variable N, which indicates the number of words to be word-spotted by the word spotting processing unit 4, is set to the value stored in the variable W (the number W of words for which the syntax analysis unit 5 requested processing from the word spotting processing unit 4 before the speech section detection unit 3 detected the frame at which the utterance starts), and the process proceeds to step S5.

[0037] In step S5, one frame of the speech feature parameters that were output from the speech analysis unit 2 and are already stored in the built-in input buffer of the word spotting processing unit 4 is read out, and the process proceeds to step S6, where it is determined whether data (speech feature parameters output from the speech analysis unit 2) remains in the input buffer. If it is determined in step S6 that data remains in the input buffer, the process proceeds to step S7, where the variable N, which indicates the number of words to be word-spotted by the word spotting processing unit 4, is decremented by 1, and the process proceeds to step S8.

[0038] On the other hand, if it is determined in step S6 that no data remains in the built-in input buffer of the word spotting processing unit 4, step S7 is skipped and the process proceeds to step S8. In step S8, among the standard patterns of the words output by the syntax analysis unit 5 together with their priorities, the standard patterns of the N words of highest priority are matched, in descending order of priority, against the feature parameters read in time series from the input buffer as the input pattern. When a resulting score is equal to or greater than a predetermined threshold, the score, start point, and end point are supplied to and stored in a built-in output buffer (not shown) of the word spotting processing unit 4.

[0039] In other words, in step S8, the N highest-priority words among the words output by the syntax analysis unit 5 together with their priorities are word-spotted.

[0040] After the word spotting processing of step S8, the process proceeds to step S9, where it is determined whether data (speech feature parameters output from the speech analysis unit 2) remains in the built-in input buffer of the word spotting processing unit 4. If it is determined in step S9 that no data remains in the input buffer, the process proceeds to step S10, where the variable N, which indicates the number of words to be word-spotted by the word spotting processing unit 4, is incremented by 1, and the process proceeds to step S11.

[0041] On the other hand, if it is determined in step S9 that data remains in the built-in input buffer of the word spotting processing unit 4, step S10 is skipped and the process proceeds to step S11. In step S11, if there is a new word spotting processing request (search request) from the syntax analysis unit 5, the number of requested words is added to the variable W, and the process proceeds to step S12.

[0042] In step S12, if a score, start point, and end point are stored in the built-in output buffer of the word spotting processing unit 4 as a word spotting result, they are supplied to the syntax analysis unit 5. The process then proceeds to step S13, where it is determined whether the speech section detection unit 3 has detected the frame at which the utterance ends and whether the built-in input buffer of the word spotting processing unit 4 is empty.

[0043] If it is determined in step S13 that the speech section detection unit 3 has not detected the frame at which the utterance ends, or that the built-in input buffer of the word spotting processing unit 4 is not empty, the process returns to step S5, and the processing from step S5 is repeated.
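The loop of steps S4 to S13 can be sketched as a small simulation. This is a simplified illustration under stated assumptions, not the implementation of the embodiment: frames arrive only at the top of each pass (so the checks of steps S6 and S9 see the same buffer state), N is kept between 1 and W (the description states no such bounds), new requests (step S11) are omitted, and `match` is a hypothetical stand-in for the pattern matcher.

```python
from collections import deque


def spot_during_utterance(frame_batches, requests, match):
    """Simulate steps S4 to S13 for one utterance.

    frame_batches: frames arriving before each loop pass (hypothetical model
                   of the analysis unit filling the input buffer).
    requests:      list of (priority, word) pairs from the parser.
    match:         callable(frame, word) -> result or None (stand-in matcher).
    Returns the collected results and the value of N used on each pass.
    """
    ranked = [w for _, w in sorted(requests, key=lambda r: -r[0])]
    w_total = len(ranked)
    n = w_total                          # S4: N starts at W
    buffer, results, n_trace = deque(), [], []
    batches = iter(frame_batches)
    ended = False
    while True:
        batch = next(batches, None)
        if batch is None:
            ended = True                 # no more speech will arrive
        else:
            buffer.extend(batch)
        if not buffer:
            if ended:
                break                    # S13: utterance over and buffer empty
            continue                     # nothing to read yet
        frame = buffer.popleft()         # S5: read one frame
        if buffer:                       # S6: backlog remains, we are behind
            n = max(1, n - 1)            # S7 (floor of 1 is an assumption)
        n_trace.append(n)
        for word in ranked[:n]:          # S8: spot the N highest-priority words
            result = match(frame, word)
            if result is not None:
                results.append(result)
        if not buffer:                   # S9: caught up with the input
            n = min(w_total, n + 1)      # S10 (cap at W is an assumption)
    return results, n_trace


requests = [(3, "play"), (3, "CD"), (2, "stop"), (1, "tape")]
batches = [[0, 1, 2], [3], [], [], [4]]
results, n_trace = spot_during_utterance(batches, requests, lambda f, w: (f, w))
print(n_trace)  # [3, 2, 1, 1, 2]
```

In this run the backlog created by the first batch drives N down to 1, and N grows again once the buffer has been drained, so the lowest-priority words are the ones skipped while the unit is behind.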

[0044] If it is determined in step S13 that the speech section detection unit 3 has detected the frame at which the utterance ends and that the built-in input buffer of the word spotting processing unit 4 is empty, the process proceeds to step S21 shown in FIG. 3. There, among the words for which the syntax analysis unit 5 has so far requested word spotting processing and which have not yet been word-spotted, the number of words having the highest priority is set in the variable N, and the process proceeds to step S22.

[0045] In step S22, word spotting processing is performed, on the entire speech whose input has ended, for the N highest-priority words among the words for which the syntax analysis unit 5 has so far requested word spotting processing and which have not yet been word-spotted. The process then proceeds to step S23, where the word spotting results of step S22 (score, start point, and end point) are output to the syntax analysis unit 5, and then to step S24.

[0046] In step S24, the variable N is subtracted from the variable W, the difference is set in the variable W, and the process proceeds to step S25. In other words, in step S24, the number of words word-spotted in step S22 is subtracted from the number of words for which the syntax analysis unit 5 has so far requested word spotting processing and which have not yet been word-spotted.

[0047] In step S25, if there is a new word spotting processing request (search request) from the syntax analysis unit 5, the number of requested words is added to the variable W. The process then proceeds to step S26, where it is determined whether the word spotting processing requests (search requests) from the syntax analysis unit 5 have ended and whether the variable W is 0.

[0048] If it is determined in step S26 that the word spotting processing requests (search requests) from the syntax analysis unit 5 have not ended, or that the variable W is not 0, the process returns to step S21, and the processing from step S21 is repeated.

[0049] If it is determined in step S26 that the word spotting processing requests (search requests) from the syntax analysis unit 5 have ended and that the variable W is 0, the processing ends.
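The post-utterance phase (steps S21 to S26) processes the outstanding words one priority group at a time. The sketch below illustrates that control flow under assumptions: `spot_group` and `new_requests` are hypothetical callables standing in for the matcher and for the parser's follow-up requests, and the end of requests is modelled simply by `new_requests` returning nothing.

```python
def spot_after_utterance(pending, spot_group, new_requests):
    """Simulate steps S21 to S26 after the utterance has ended.

    pending:      dict mapping each not-yet-spotted word to its priority.
    spot_group:   callable(words) -> list of results over the whole utterance.
    new_requests: callable(results) -> dict of further word -> priority.
    Returns all results and the order in which the groups were processed.
    """
    pending = dict(pending)               # do not modify the caller's dict
    done, results, order = set(), [], []
    while pending:                        # S26: continue while W is not 0
        top = max(pending.values())       # S21: find the highest priority
        group = sorted(w for w, p in pending.items() if p == top)
        batch = spot_group(group)         # S22: spot that group of N words
        results.extend(batch)             # S23: pass results to the parser
        for word in group:                # S24: W <- W - N
            del pending[word]
            done.add(word)
        for word, prio in new_requests(batch).items():   # S25: W grows
            if word not in done:
                pending[word] = prio
        order.append(group)
    return results, order


def followups(batch):
    # Hypothetical parser behaviour: spotting "CD" triggers a request for "track".
    return {"track": 2} if ("CD", 1.0) in batch else {}


pending = {"tape": 1, "CD": 3, "play": 3, "stop": 2}
results, order = spot_after_utterance(
    pending, lambda words: [(w, 1.0) for w in words], followups
)
print(order)  # [['CD', 'play'], ['stop', 'track'], ['tape']]
```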

[0050] Meanwhile, in the syntax analysis unit 5, when the word spotting processing unit 4 supplies the score, start point, and end point of each word for which word spotting processing was requested, the input speech is analyzed based on the score or the length of the section from the start point to the end point, together with the syntax information registered in advance in the built-in syntax dictionary. New sentence hypotheses are formed based on the analysis result, priorities are attached as described above to the words that have become necessary for analyzing the new sentence hypotheses (for words for which word spotting processing was requested previously, the priority is changed), and the words are output to the word spotting processing unit 4.

[0051] Then, in the syntax analysis unit 5, after the speech section detection unit 3 detects the frame at which the utterance ends, incorrect (apparently incorrect) word candidates are removed, based on the unit's own parsing result, from the word candidate string obtained as the word spotting result, and when a correct (apparently correct) sentence is obtained, it is output to the external device operation unit 6.

[0052] In the external device operation unit 6, the external device connected to it is operated in accordance with the content of the sentence output from the syntax analysis unit 5. For example, if the external device connected to the external device operation unit 6 is an AV device and the sentence output from the syntax analysis unit 5 is "play CD", the external device operation unit 6 operates the AV device so that CD playback starts.

[0053] As described above, during speech input (steps S1 to S13), when data (speech feature parameters) remains in the built-in input buffer of the word spotting processing unit 4 and the processing in the word spotting processing unit 4 is falling behind, the number N of words to be word-spotted is decreased (step S7). When the built-in input buffer is empty and the word spotting processing unit 4 has processing capacity to spare, the number N of words to be word-spotted is increased (step S10). The word spotting processing unit 4 then word-spots the N highest-priority words, in descending order of priority, among the words for which the syntax analysis unit 5 has requested word spotting processing (step S8).

[0054] After speech input has ended (steps S21 to S26), the word spotting processing unit 4 word-spots the words for which the syntax analysis unit 5 has requested word spotting processing group by group, starting from the group of words with the highest priority.

[0055] Accordingly, the word spotting processing unit 4 performs word spotting processing in order starting from the high-priority words, that is, the words the syntax analysis unit 5 needs most for parsing, so the response to the speaker's utterance can be improved.

[0056] Furthermore, because the number of words to be word-spotted is changed according to the processing capability of the word spotting processing unit 4, processing can be carried out promptly even when the syntax analysis unit 5 supplies more words than the word spotting processing unit 4 can process in real time.

[0057] The case in which the speech recognition apparatus of the present invention is applied to an external device controller has been described above, but the present invention can be applied to any apparatus that recognizes speech, not only to an external device controller.

[0058] Although this embodiment has not specified the word spotting method used in the word spotting processing unit 4, the unit may perform word spotting based on any speech recognition algorithm, for example the DP matching method, the HMM method, or the speech recognition algorithms of the speech recognition apparatuses disclosed in JP-A-60-249198, JP-A-60-249199, JP-A-60-252396, and the like.

[0059] Furthermore, in this embodiment the speech analysis unit 2 extracts linear prediction coefficients from the speech as the speech feature parameters, but the invention is not limited to this. The speech analysis unit 2 may extract any feature parameters from the speech, for example the power in each predetermined frequency band, cepstrum coefficients, PARCOR coefficients, formants, or the zero-crossing count.
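As a concrete illustration of frame-by-frame feature extraction, the following minimal sketch computes two of the simpler parameters named above for each frame: power and zero-crossing count. It is an assumption-laden simplification: it uses the power of the whole frame rather than the power in each frequency band, non-overlapping frames, and a hypothetical function name and sample values.

```python
def frame_features(samples, frame_len):
    """Return a (power, zero_crossings) pair for each full frame of samples.

    Assumptions: non-overlapping frames, mean-square power over the whole
    frame, and a zero crossing counted whenever consecutive samples differ
    in sign.
    """
    features = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        power = sum(x * x for x in frame) / frame_len
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        features.append((power, crossings))
    return features


# Hypothetical samples: one rapidly alternating frame, then one constant frame.
samples = [1, -1, 1, -1, 2, 2, 2, 2]
features = frame_features(samples, frame_len=4)
print(features)  # [(1.0, 3), (4.0, 0)]
```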

【００６０】[0060]

【発明の効果】請求項１に記載の音声認識装置によれ
ば、認識手段が、音声認識処理する単語の単語数を、解
析手段により付加された単語の優先度または自身の音声
認識処理能力に基づいて変更し、さらに、単語の優先度
に基づいて、音声中からの単語の音声認識処理を行う。
従って、入力された音声に対する応答処理の迅速化を図
ることができる。According to the speech recognition apparatus of claim 1, the recognition means changes the number of words subjected to speech recognition processing based on the priority attached to the words by the analysis means or on its own speech recognition processing capability, and further performs speech recognition processing of the words in the speech based on the priority of the words. Therefore, response processing for the input speech can be made faster.

【００６１】請求項２に記載の音声認識装置によれば、
認識手段に、音声区間中と音声区間終了後とで、単語の
優先度に基づいて、音声中から音声認識する単語を変更
させる。従って、例えば音声区間終了後には、優先度の
最も高い単語だけの音声認識処理を行うようにすること
ができるので、装置のリアルタイム性を向上させること
ができる。According to the speech recognition apparatus of claim 2, the recognition means is made to change, based on the priority of the words, which words are recognized from the speech during the speech section and after the end of the speech section. Therefore, for example, after the end of the speech section, speech recognition processing can be performed only on the words with the highest priority, so that the real-time performance of the apparatus can be improved.
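
The switch described above can be sketched as follows. The text gives only the example of keeping the highest-priority words after the speech section ends; processing every requested word while the section is still in progress is an assumption of this sketch, as are the function name `words_to_recognize`, the `(word, priority)` pair format, and the convention that a larger number means a higher priority.

```python
from typing import List, Tuple


def words_to_recognize(
    requested: List[Tuple[str, int]], in_speech_section: bool
) -> List[str]:
    """Choose the words to recognize for the current phase.

    During the speech section every requested word is kept (an assumed policy);
    after it ends, only the words sharing the highest priority are kept.
    """
    if in_speech_section or not requested:
        return [word for word, _ in requested]
    top = max(priority for _, priority in requested)
    return [word for word, priority in requested if priority == top]


requested = [("volume", 2), ("channel", 3), ("record", 1)]
during = words_to_recognize(requested, in_speech_section=True)
after = words_to_recognize(requested, in_speech_section=False)
```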

【００６２】請求項３に記載の音声認識装置によれば、
解析手段に、認識手段の認識結果の解析結果に基づい
て、単語の優先度を変更させるので、入力された音声に
対する応答処理の迅速化をさらに図ることができる。According to the speech recognition apparatus of claim 3, the analysis means is made to change the priority of the words based on its analysis of the recognition result of the recognition means, so that response processing for the input speech can be made faster still.

【００６３】[0063]

[Brief description of drawings]

【図１】本発明の音声認識装置を応用した外部機器コン
トローラの一実施例の構成を示すブロック図である。FIG. 1 is a block diagram showing a configuration of an embodiment of an external device controller to which a voice recognition device of the present invention is applied.

【図２】図１の実施例のワードスポッティング処理部４
の動作を説明するフローチャートである。FIG. 2 is a flowchart illustrating the operation of the word spotting processing unit 4 of the embodiment of FIG. 1.

【図３】図２のフローチャートに続くフローチャートで
ある。FIG. 3 is a flowchart continuing from the flowchart of FIG. 2.

[Explanation of symbols]

1 Voice input unit (音声入力部)
2 Voice analysis unit (音声分析部)
3 Voice section detection unit (音声区間検出部)
4 Word spotting processing unit (ワードスポッティング処理部)
5 Syntactic analysis unit (構文解析部)
6 External device operation unit (外部機器操作部)

フロントページの続き
(56) 参考文献: 特開平１−255925（ＪＰ，Ａ)、特開平６−161488（ＪＰ，Ａ)、特開平３−177899（ＪＰ，Ａ)、特開昭63−165900（ＪＰ，Ａ)
(58) 調査した分野 (Int.Cl.⁷，ＤＢ名): G10L 15/00 - 15/28、ＪＩＣＳＴファイル（ＪＯＩＳ)

Continuation of the front page
(56) References: JP-A-1-255925 (JP, A); JP-A-6-161488 (JP, A); JP-A-3-177899 (JP, A); JP-A-63-165900 (JP, A)
(58) Fields searched (Int. Cl.⁷, DB name): G10L 15/00 - 15/28; JICST file (JOIS)

Claims

(57) [Claims]

1. A speech recognition apparatus comprising: recognition means for recognizing speech; and analysis means for requesting the recognition means to perform speech recognition processing of words in the speech and for analyzing the recognition result of the recognition means to understand the speech, wherein the analysis means attaches to each word a priority for speech recognition processing and supplies the word to the recognition means, and the recognition means changes the number of words subjected to speech recognition processing based on the priority attached to the words or on its own speech recognition processing capability, and further performs speech recognition processing of the words in the speech based on the priority of the words.

2. The speech recognition apparatus according to claim 1, further comprising detection means for detecting a speech section of the speech, wherein the recognition means changes, based on the priority of the words, the words to be recognized from the speech during the speech section and after the end of the speech section.

3. The speech recognition apparatus according to claim 1, wherein the analysis means changes the priority of the words based on its analysis of the recognition result of the recognition means.