JP2000181487A

JP2000181487A - Voice recognition device

Info

Publication number: JP2000181487A
Application number: JP10354332A
Authority: JP
Inventors: Takahide Takahashi; 隆英高橋; Kenichi Yamamoto; 健一山本
Original assignee: Toshiba TEC Corp
Current assignee: Toshiba TEC Corp
Priority date: 1998-12-14
Filing date: 1998-12-14
Publication date: 2000-06-30

Abstract

PROBLEM TO BE SOLVED: To improve the recognition rate of 2nd and succeeding voice recognition when the voice of the same word or phrase is recognized again. SOLUTION: This device is provided with a voice input part 17 for inputting the voice of a speaker, a 1st voice recognition resource 31 which stores the word or phrase to be recognized in advance, a voice recognition part 32 which recognizes the word or phrase from the voice inputted from the voice input part 17, extracts the recognized word or phrase from the 1st voice recognition resource 31, and outputs it, and a voice recognition resource generation part 34 which narrows down words and phrases stored in the 1st voice recognition resource 31 according to the extracted word or phrase to generate a 2nd voice recognition resource 35. Here, when a word or phrase inputted again from the voice input part 17 is recognized, a word or phrase is extracted from the 2nd voice recognition resource 35.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention: The present invention relates to a voice recognition device that recognizes input voice and extracts, from phrases stored in advance, the phrase corresponding to the recognition result.

[0002]

2. Description of the Related Art: As shown in FIG. 15, a conventional voice recognition device comprises a voice input unit 1 having a microphone 1a for inputting voice and an A/D converter 1b that converts the voice from the microphone into a digital signal; a voice recognition resource 2 consisting of the phrases to be recognized and a recognition code defined for each phrase; a voice recognition unit 3 that recognizes a phrase based on the output of the voice input unit 1 and extracts the recognition code corresponding to that phrase based on the voice recognition resource 2; and an application program unit 4 that registers products, calculates product sales amounts, and so on based on the recognition code extracted by the voice recognition unit 3.

[0003] As shown in FIG. 16, the voice recognition resource 2 is written in the common BN notation (Backus-Naur form, a metalanguage used to formally define computer languages; also called Backus notation or BNF).

[0004] FIG. 16 contains the rule "<SENTENCE> = <product price>-yen <product type>, <sales quantity> sheets, right." This means that when, for example, the sentence "50-yen plain paper, three sheets, right?" is input by voice, "50" is defined as the product price, "plain paper" as the product type, and "3" as the sales quantity.

[0005] The rule "<product price> = 1 to 99999." means that the product price is any value from 1 to 99999. The rule "<product type> = photo | card | paper | rough paper ... | bromide." means that the product type is one of photo, card, paper, rough paper, ..., or bromide. Likewise, the rule "<sales quantity> = 1 to 99999." means that the sales quantity is any value from 1 to 99999. These phrases (including the amount phrases) are the ones defined in advance as phrases to be recognized.
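As a rough illustration of how a resource like the one in FIG. 16 constrains what can be recognized, the sketch below encodes the three rules as Python data and checks a text utterance against them. It is only a minimal sketch under stated assumptions: the English sentence template, the names `PRODUCT_TYPES`, `in_range`, and `parse`, and the use of a regular expression are all hypothetical, and the product list contains only the items the description names (the source elides the rest).

```python
import re

# Hypothetical stand-in for the <product type> rule; the source list is
# abbreviated with "...", so only the named items appear here.
PRODUCT_TYPES = ["photo", "card", "paper", "rough paper", "bromide"]

# English rendering of the <SENTENCE> rule (an assumption for illustration).
SENTENCE = re.compile(r"^(\d+)-yen (.+), (\d+) sheets, right\?$")


def in_range(token, low=1, high=99999):
    """Check the '1 to 99999' rule used for price and quantity."""
    return token.isdigit() and low <= int(token) <= high


def parse(utterance):
    """Return the slots of a sentence accepted by the grammar, else None."""
    match = SENTENCE.match(utterance)
    if match is None:
        return None
    price, product_type, quantity = match.groups()
    if not (in_range(price) and in_range(quantity)):
        return None
    if product_type not in PRODUCT_TYPES:
        return None
    return {"price": int(price), "type": product_type, "quantity": int(quantity)}
```

A call such as `parse("50-yen paper, 3 sheets, right?")` fills the three slots, while a sentence whose product type is outside the list is rejected.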

[0006] In such a device, when the operator (speaker) inputs, for example, the utterance "50-yen plain paper, three sheets, right?" through the voice input unit 1, the predefined phrases are recognized, namely "50" as the product price, "plain paper" as the product type, and "3" as the sales quantity, and the recognition code corresponding to each phrase is output to the application program unit 4.

[0007] However, the voice recognition resource in such a device stores an enormous number of phrases, and the number of combinations of items is also enormous. Because a single recognition result has to be selected from among them, misrecognition was frequent. For this reason, input has conventionally been made in a conversational format, so that after a phrase is input by voice, the same phrase can be input by voice again for confirmation.

[0008] For example, on receiving an order from a customer such as "Plain paper, please," the operator says "Plain paper, right?" as voice input and asks the customer "How much would you like?" If the customer answers "Three sheets of 50-yen plain paper, please," the operator says "To confirm, 50-yen plain paper, three sheets, right?" as voice input. The same phrase "plain paper" is thus input by voice twice.

[0009]

Problems to Be Solved by the Invention: In the conventional device, however, even when the same phrase is input by voice a second or subsequent time, recognition is redone using the same voice recognition resource 2. The second voice input can therefore be misrecognized with roughly the same high probability as the first. As a result, the benefit of having a phrase input by voice several times for confirmation cannot be fully obtained.

[0010] In general, misrecognition becomes more likely as the number of phrases defined in the voice recognition resource 2 grows, and as the number of similar-sounding phrases grows. In the second and subsequent recognition, therefore, the recognition rate should improve if the phrases that may be recognized are narrowed down based on the result of the first recognition.

[0011] SUMMARY OF THE INVENTION: The present invention therefore aims to provide a voice recognition device that can improve the recognition rate of the second and subsequent recognition when the same phrase is recognized again.

[0012]

Means for Solving the Problems: The invention of claim 1 is a voice recognition device comprising: voice input means for inputting the voice of a speaker; voice recognition means for recognizing a phrase from the voice input through the voice input means; a first voice recognition resource that stores in advance a plurality of phrases to be recognized; phrase extraction means for extracting the phrase recognized by the voice recognition means from the first voice recognition resource; and voice recognition resource generation means for generating a second voice recognition resource by narrowing down the phrases stored in the first voice recognition resource based on the phrase extracted by the phrase extraction means, wherein, when a phrase re-input through the voice input means is recognized, the phrase is extracted from the second voice recognition resource.

[0013] The invention of claim 2 is the voice recognition device according to claim 1, wherein the voice recognition resource generation means generates the second voice recognition resource with the phrase extracted from the first voice recognition resource by the phrase extraction means as the phrase to be recognized.

[0014] The invention of claim 3 is the voice recognition device according to claim 1, wherein the first voice recognition resource stores the plurality of phrases to be recognized by phrase group, and the voice recognition resource generation means generates, as the second voice recognition resource, the voice recognition resource of the phrase group that, among the phrase groups of the first voice recognition resource, contains the phrase extracted from the first voice recognition resource by the phrase extraction means.

[0015] The invention of claim 4 is a voice recognition device comprising: voice input means for inputting the voice of a speaker; voice recognition means for recognizing a phrase from the voice input through the voice input means; a first voice recognition resource that stores in advance a plurality of phrases to be recognized; phrase extraction means for extracting the phrase recognized by the voice recognition means from the first voice recognition resource; and voice recognition resource generation means for generating a second voice recognition resource in which the phrase extracted from the first voice recognition resource by the phrase extraction means is a candidate phrase A and the phrases stored in the first voice recognition resource are candidate phrases B, wherein, when a phrase re-input through the voice input means is recognized, the phrase is extracted from either the candidate phrase A or the candidate phrases B of the second voice recognition resource.

[0016] The invention of claim 5 is a voice recognition device comprising: voice input means for inputting the voice of a speaker; voice recognition means for recognizing a phrase from the voice input through the voice input means; a first voice recognition resource that stores in advance a plurality of phrases to be recognized, by phrase group; phrase extraction means for extracting the phrase recognized by the voice recognition means from the first voice recognition resource; and voice recognition resource generation means for generating a second voice recognition resource in which the phrase extracted from the first voice recognition resource by the phrase extraction means is a candidate phrase A and, among the phrase groups of the first voice recognition resource, the phrase group containing the phrase extracted by the phrase extraction means is candidate phrases B, wherein, when a phrase re-input through the voice input means is recognized, the phrase is extracted from either the candidate phrase A or the candidate phrases B of the second voice recognition resource.

[0017] The invention of claim 6 is a voice recognition device comprising: voice input means for inputting the voice of a speaker; voice recognition means for recognizing a phrase from the voice input through the voice input means; a first voice recognition resource that stores amount phrases as the plurality of phrases to be recognized; phrase extraction means for extracting the phrase recognized by the voice recognition means from the first voice recognition resource; and voice recognition resource generation means for generating, as a second voice recognition resource, the amount phrases of the first voice recognition resource with the phrases for amounts lower than the product sales amount deleted, wherein, when the amount tendered is input by voice through the voice input means, the amount phrase is extracted from the second voice recognition resource.
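The narrowing described for claim 6 can be pictured as a simple filter over the amount phrases. The following is a minimal sketch, not the patented implementation: the function name `narrow_amounts`, the representation of amount phrases as integers, the 1 to 99999 range (borrowed from the price rule in FIG. 16), and the sample sales amount are all assumptions for illustration.

```python
def narrow_amounts(first_amounts, sales_amount):
    """Keep only amounts that are not lower than the sales amount."""
    return [amount for amount in first_amounts if amount >= sales_amount]


# Assumed range for the first resource; claim 6 itself does not state one.
FIRST_AMOUNTS = range(1, 100000)

# Hypothetical sales amount of 1070 yen: the amount tendered cannot be lower.
second_amounts = narrow_amounts(FIRST_AMOUNTS, 1070)
```

Under these assumptions, the second resource no longer contains any amount below 1070, so a tendered amount is matched against a smaller set of candidates.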

[0018]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS: A first embodiment of the present invention is described below with reference to FIGS. 1 to 7. FIG. 1 is a block diagram showing the configuration of the voice recognition device according to this embodiment. Reference numeral 11 denotes a CPU (central processing unit) constituting the main body of the control unit; 12, a ROM (read-only memory) storing the program data executed by the CPU 11; 13, a RAM (random access memory) providing memory used for various data processing; 14, a hard disk drive (HDD); 15, a keyboard control unit to which is connected a keyboard 16 with keys for the operator (speaker) to perform various operations; 17, a voice input unit serving as voice input means, having a microphone 18 that inputs voice as an analog signal and an A/D converter 19 that converts the analog signal from the microphone 18 into a digital signal; and 20, a display control unit that controls a display unit 21, which displays, among other things, the result of recognizing the input voice.

[0019] The CPU 11 is connected to the ROM 12, the RAM 13, the hard disk drive 14, the keyboard control unit 15, the A/D converter 19, and the display control unit 20 by bus lines such as a data bus, a control bus, and an address bus.

[0020] FIG. 2 is a functional block diagram showing the configuration of the voice recognition device according to this embodiment. Reference numeral 31 denotes a first voice recognition resource consisting of the phrases to be recognized and the recognition code defined for (associated with) each phrase. Reference numeral 32 denotes a voice recognition unit that, based on the output of the voice input unit 17, recognizes the phrase corresponding to the input voice (voice recognition means), extracts and outputs the recognition code corresponding to that phrase from the first voice recognition resource 31 or from the second voice recognition resource 35 described later, and outputs the recognized text obtained from the recognition result (phrase extraction means). Reference numeral 33 denotes an application program unit that displays the corresponding product based on the recognition code from the voice recognition unit 32 and passes the recognized text from the voice recognition unit 32 to the voice recognition resource generation unit 34 described later. Reference numeral 34 denotes a voice recognition resource generation unit, serving as voice recognition resource generation means, that narrows down the first voice recognition resource 31 based on the recognized text from the application program unit 33 to generate the second voice recognition resource 35.
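The data flow among the first resource 31, the recognition unit 32, and the generation unit 34 can be sketched as follows. This is a toy model under stated assumptions: real recognition works on audio features, whereas here `recognize` merely looks for known phrases inside a text string, and the class `Resource`, the function names, the English utterance, and the numeric recognition codes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Phrases to be recognized, each mapped to its recognition code."""
    codes: dict


def recognize(utterance, resource):
    """Toy stand-in for unit 32: return (code, text) pairs in spoken order."""
    hits = [(utterance.find(p), p) for p in resource.codes if p in utterance]
    return [(resource.codes[p], p) for _, p in sorted(hits)]


def generate_second(first, recognized_texts):
    """Toy stand-in for unit 34: keep only the phrases already recognized."""
    kept = {p: c for p, c in first.codes.items() if p in recognized_texts}
    return Resource(kept)


# Hypothetical first resource 31 (phrases and codes are made up).
first = Resource({"photo": 1, "card": 2, "plain paper": 3,
                  "thermal paper": 4, "bromide": 5})

results = recognize("plain paper and thermal paper and bromide, right?", first)
codes = [code for code, _ in results]   # would go to the application program
texts = [text for _, text in results]   # would go to the generation unit
second = generate_second(first, texts)  # plays the role of resource 35
```

In this sketch the recognition codes and the recognized text travel separately, mirroring the description: the codes drive the application, and the text drives the narrowing.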

[0021] Although not illustrated, each phrase to be recognized is stored in advance in association with voice feature data of a standard speaker (speaker-independent type). Alternatively, voice feature data from utterances actually made by the user may be associated with the phrases (speaker-dependent type).

[0022] The first voice recognition resource 31 stores the original data used for the first recognition and is built in advance on, for example, the hard disk drive 14. Specifically, it defines the phrases to be recognized. FIG. 3 shows an example, written in the common BN notation, covering the product types in particular.

[0023] FIG. 3 contains the rule "<SENTENCE> = <product type> and? is it?." This means that when, for example, the sentence "Plain paper and photo and card, right?" is input by voice, "plain paper," "photo," and "card" are defined as product types. The rule "<product type> = photo | card | paper | rough paper ... | bromide." means that the product type is one of photo, card, paper, rough paper, ..., or bromide.

[0024] The second voice recognition resource 35 is used for the second and subsequent recognition and is built by the voice recognition resource generation unit 34 on, for example, the hard disk drive 14. Specifically, it defines the phrases to be recognized (including the amount phrases). FIG. 4 shows an example of this second voice recognition resource 35 written in the common BN notation.

[0025] FIG. 4 contains the rule "<SENTENCE> = to confirm? <product price>-yen <product type>, <sales quantity> sheets and? right?." This means that when, for example, the sentence "50-yen plain paper, three sheets, right?" is input by voice, "50" is defined as the product price, "plain paper" as the product type, and "3" as the sales quantity. The rule "<product price> = 1 to 99999." means that the product price is any value from 1 to 99999.

[0026] The rule "<product type> = plain paper | thermal paper | bromide." means that the product type is one of plain paper, thermal paper, or bromide. In other words, from the original product types to be recognized the first time, as shown in FIG. 3, the products to be recognized the second and subsequent times have been narrowed down to plain paper, thermal paper, and bromide.

[0027] The rule "<sales quantity> = 1 to 99999." means that the sales quantity is any value from 1 to 99999.

[0028] The voice recognition unit 32, the application program unit 33, and the voice recognition resource generation unit 34 are implemented, specifically, as software programs that are stored on, for example, the hard disk drive 14 or the ROM 12 and are readable by the CPU 11.

[0029] The voice recognition resource generation unit 34 generates the second voice recognition resource 35 with the recognized text passed from the application program unit 33 (a character string such as a product name, output by the voice recognition unit 32 in the first recognition) as the phrase to be recognized the second and subsequent times.
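One way to picture this generation step is as a copy of the first resource in which only the product-type slot is narrowed, while the price and quantity rules carry over unchanged (as in FIG. 4). The sketch below assumes a dictionary representation of the grammar; the slot names, the function `generate_second_resource`, and the sample product list are hypothetical.

```python
def generate_second_resource(first_resource, recognized_texts, slot="product_type"):
    """Copy the first resource, narrowing one slot to the recognized texts."""
    second = dict(first_resource)  # shallow copy: untouched slots are shared
    second[slot] = [p for p in first_resource[slot] if p in recognized_texts]
    return second


# Hypothetical first resource; the product list is illustrative only.
first_resource = {
    "product_price": range(1, 100000),
    "product_type": ["photo", "card", "plain paper", "thermal paper", "bromide"],
    "sales_quantity": range(1, 100000),
}

second_resource = generate_second_resource(
    first_resource, {"plain paper", "thermal paper", "bromide"}
)
```

The first resource is left intact, so it remains available for the first recognition of the next order.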

[0030] In this embodiment, voice input is performed in a conversational format such as the following (note that the microphone 18 is installed on the operator's side, and the customer's voice is not input to the microphone 18).

[0031] For example, in response to a customer's order, "Um, let's see, plain paper, thermal paper, and bromide, please," the operator (speaker) says into the microphone 18, "Plain paper, thermal paper, and bromide, is it?" The voice recognition unit 32 then performs the first recognition of the product types based on the first voice recognition resource 31 shown in FIG. 3.

[0032] Specifically, "plain paper," "thermal paper," and "bromide" are each recognized, and the recognition codes corresponding to them, together with the recognized texts "plain paper," "thermal paper," and "bromide," are passed to the application program unit 33. Based on the recognition codes, the application program unit 33 displays "plain paper," "thermal paper," and "bromide" as product types on the display unit 21 and passes the recognized texts to the voice recognition resource generation unit 34.

[0033] Next, the voice recognition resource generation unit 34 narrows the product types to be recognized in the second and subsequent recognition down to the recognized texts "plain paper," "thermal paper," and "bromide," and newly generates the second voice recognition resource 35 shown in FIG. 4.

[0034] When the customer answers "Yes, that's right" to the operator's confirmation, "Plain paper, thermal paper, and bromide, is it?", the operator (speaker) asks, "How much would you like?" Suppose the customer answers, "Um, three sheets of 50-yen plain paper, four sheets of 60-yen thermal paper, and ten sheets of 80-yen bromide, please." The operator (speaker) then says into the microphone 18, "To confirm, 50-yen plain paper, three sheets, 60-yen thermal paper, four sheets, and 80-yen bromide, ten sheets, right?" The voice recognition unit 32 then performs the second recognition of the product types based on the second voice recognition resource 35 shown in FIG. 4.

[0035] In this case, the product types in the second voice recognition resource 35 are already limited to "plain paper," "thermal paper," and "bromide," so product types are recognized from among these alone and the recognition rate rises. The application program unit 33 then performs business processing, such as product registration and calculation of the product sales amount, based on the recognition codes of the recognized product price, product type, and sales quantity.

[0036] In this way, the first recognition of a phrase uses the first voice recognition resource 31, which holds a large amount of data, while the second and subsequent recognition of the same phrase uses the second voice recognition resource 35, generated by narrowing down to the phrases of the recognized text obtained from the first recognition. Because the range of phrases to be recognized is restricted, the recognition rate of the second and subsequent recognition can be improved. The benefit of the second and subsequent voice input, that is, of voice input for confirmation, can thereby be fully obtained.

[0037] In addition, because the second and subsequent recognition uses the second voice recognition resource 35, in which the phrases of the first voice recognition resource 31 have been narrowed down, the processing load on the voice recognition unit 32 for the second and subsequent recognition can be reduced and the processing time shortened.
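The reduction in processing can be made concrete by counting how many slot combinations each resource admits. The sketch below is a back-of-the-envelope illustration only: the dictionary layout, the function `search_space`, and the assumption of exactly five product types in the first resource are hypothetical (the source elides the full list), and a real recognizer's cost need not scale exactly with this count.

```python
def search_space(resource):
    """Count the slot combinations a resource admits."""
    size = 1
    for alternatives in resource.values():
        size *= len(alternatives)
    return size


# Hypothetical resources: five product types before narrowing, three after.
first = {
    "product_price": range(1, 100000),
    "product_type": ["photo", "card", "plain paper", "thermal paper", "bromide"],
    "sales_quantity": range(1, 100000),
}
second = dict(first, product_type=["plain paper", "thermal paper", "bromide"])

# Fraction of the original combinations that remain after narrowing.
remaining = search_space(second) / search_space(first)
```

With these assumed sizes, three fifths of the combinations remain; the saving grows as the first resource's product list gets longer.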

[0038] This embodiment has been described for the case in which the recognized text from the voice recognition unit 32 is received via the application program unit 33. The invention is not necessarily limited to this, however, and the text may be received directly from the voice recognition unit 32, as shown in FIG. 5.

[0039] Alternatively, the first voice recognition resource 31 of this embodiment may be divided into a plurality of voice recognition resource groups, as shown in FIG. 6. From these groups, the voice recognition resource containing the phrase of the recognized text obtained by the voice recognition unit 32 in the first recognition of a phrase is taken out and used as the second voice recognition resource 35. This also restricts the range of phrases to be recognized the second and subsequent times, so the recognition rate can be improved.

[0040] In this case, the first voice recognition resource 31 consists of a group of voice recognition resources, for example voice recognition resource (1), voice recognition resource (2), voice recognition resource (3), and so on, as shown in FIG. 7. When, in the first recognition, the voice recognition resource generation unit 34 receives, for example, the recognized text "plain paper" from the voice recognition unit 32, voice recognition resource (1), which contains "plain paper," becomes the second voice recognition resource 35, and the second and subsequent recognition is performed based on it. The voice recognition resource generation unit 34 thus only has to select, from the resource group of the first voice recognition resource, the voice recognition resource to use for the second and subsequent recognition. Because no new voice recognition resource needs to be generated, the storage capacity that it would occupy is not required.
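The group-selection variant can be sketched as a lookup that returns an existing group rather than building a new list. This is a minimal sketch under stated assumptions: the contents of FIG. 7 are not reproduced in the text, so the group names and their phrases below are made up, as is the function name `select_group`.

```python
def select_group(groups, recognized_text):
    """Return (name, phrases) of the first group containing the text, or None."""
    for name, phrases in groups.items():
        if recognized_text in phrases:
            return name, phrases
    return None


# Hypothetical grouping of the first resource; contents are illustrative.
groups = {
    "resource_1": ["plain paper", "thermal paper", "rough paper"],
    "resource_2": ["photo", "bromide"],
    "resource_3": ["card"],
}

name, second_resource = select_group(groups, "plain paper")
```

Because `select_group` hands back the stored list itself rather than a copy, no additional storage is allocated for the second resource, which matches the storage-saving point made in the description.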

[0041] Next, a second embodiment of the present invention is described with reference to FIGS. 8 to 11. The block diagram and functional block diagram showing the configuration of the voice recognition device according to this embodiment are the same as those shown in FIGS. 1 and 2, so their detailed description is omitted.

[0042] This embodiment differs from the first embodiment in the following respect. Instead of generating a second voice recognition resource 35 in which the range of phrases stored in the first voice recognition resource 31 is restricted based on the phrase (recognized text) recognized in the first recognition of a phrase, it generates a second voice recognition resource 35 in which the probability that the phrase recognized in the first recognition is recognized again is raised.

[0043] Specifically, the voice recognition resource generation unit 34 in this embodiment treats the phrase recognized in the first recognition as a candidate phrase A and the phrases stored in the first voice recognition resource 31 as candidate phrases B, and generates a second voice recognition resource 35 such that, in the second and subsequent recognition, a phrase can be recognized from either the candidate phrase A or the candidate phrases B.
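The effect of offering a phrase through both candidate A and candidate B can be illustrated with a simple weighting. The sketch below is an assumption-laden model, not the recognizer's actual scoring: it supposes that the two alternatives are equally weighted and that phrases within an alternative are equally weighted, and the function names, the dictionary layout, and the five-item product list are hypothetical.

```python
from fractions import Fraction


def second_resource(first_types, recognized):
    """Candidate A is the recognized phrase; candidate B is the full list."""
    return {"candidate_a": [recognized], "candidate_b": list(first_types)}


def weight(resource, phrase):
    """Share of the total weight that falls on a phrase (uniform assumption)."""
    branches = list(resource.values())
    total = Fraction(0)
    for alternatives in branches:
        share = Fraction(alternatives.count(phrase), len(alternatives))
        total += Fraction(1, len(branches)) * share
    return total


# Hypothetical first-resource product types.
types = ["photo", "card", "plain paper", "thermal paper", "bromide"]
resource = second_resource(types, "plain paper")
```

Under these assumptions, "plain paper" receives one half of the weight through candidate A plus one tenth through candidate B, while every other phrase receives one tenth, so the previously recognized phrase is favored without the others being excluded.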

[0044] Written in the common BN notation, this is as shown in FIG. 9. The second voice recognition resource 35 shown in FIG. 9 is the one generated by the voice recognition resource generation unit 34 when "plain paper" was recognized as the product type in the first recognition. It differs from the first voice recognition resource 31 shown in FIG. 8 in that the rule "<product type> = <product type 1> | <product type 2>" is added, defining the product type as either product type 1 (corresponding to candidate phrase A) or product type 2 (corresponding to candidate phrases B), with product type 1 defined as "plain paper" and product type 2 defined as any of the phrases stored in the first voice recognition resource 31. As a result, when the voice recognition unit 32 recognizes the product type using the second voice recognition resource 35, the probability that "plain paper" is recognized rises. That is, because the product type is either product type 1 or product type 2, the probability that "plain paper" is recognized as product type 1 is 50%; and because product type 2 also includes "plain paper," the overall probability that "plain paper" is recognized is at least 50%.

[0045] In this embodiment as well, voice input proceeds in conversational form, as in the first embodiment. For example, in response to a customer's order, "Um, plain paper, please," the operator (speaker) says "Plain paper, correct?" into the microphone 18, and the speech recognition unit 32 performs the first speech recognition for the product type based on the speech recognition resource shown in FIG. 8. The speech recognition resource shown in FIG. 8 corresponds to the first speech recognition resource 31 for the product price phrases, the product type phrases, and the sales quantity phrases.

[0046] Specifically, "plain paper" is recognized, and the recognition code corresponding to "plain paper" and the recognized text "plain paper" are passed to the application program unit 33. The application program unit 33 then displays "plain paper" on the display unit 21 as the product type based on the recognition code, and passes the recognized text to the speech recognition resource generation unit 34. The speech recognition resource generation unit 34 thereby generates a new speech recognition resource, as shown in FIG. 9, that raises the probability that "plain paper" is recognized for the product type. The speech recognition resource shown in FIG. 9 corresponds to the second speech recognition resource 35 for the product type phrases; for the other phrases, namely the product price and sales quantity phrases, no narrowing has been performed yet, so it still corresponds to the first speech recognition resource 31.
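The rule rewriting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary representation, the function names `add_candidate_rule` and `to_bn`, and the vocabulary entries other than "plain paper" are hypothetical and do not come from the patent or its figures.

```python
def add_candidate_rule(grammar, category, recognized):
    """Return a new grammar in which `category` is split into
    candidate A (the recognized phrase) and candidate B (all original phrases)."""
    cand_a = f"{category} 1"   # corresponds to candidate phrase A
    cand_b = f"{category} 2"   # corresponds to candidate phrase B
    new_grammar = dict(grammar)                      # leave the first resource untouched
    new_grammar[category] = [f"<{cand_a}>", f"<{cand_b}>"]
    new_grammar[cand_a] = [recognized]
    new_grammar[cand_b] = list(grammar[category])
    return new_grammar


def to_bn(grammar):
    """Render the grammar as BN-style rules, one per line."""
    return "\n".join(
        f"<{name}> = {' | '.join(alternatives)}."
        for name, alternatives in grammar.items()
    )


# Hypothetical first resource; only "plain paper" appears in the source text.
first_resource = {"product type": ["plain paper", "recycled paper", "glossy paper"]}
second_resource = add_candidate_rule(first_resource, "product type", "plain paper")
print(to_bn(second_resource))
```

Categories that have not yet been recognized are simply copied over unchanged, which mirrors the statement that the FIG. 9 resource still corresponds to the first resource for the product price and sales quantity.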

[0047] Next, when the operator (speaker) asks "Which price?" and the customer answers "The 50-yen one," the operator (speaker) says "50 yen, correct?" into the microphone 18, and the speech recognition unit 32 performs the first speech recognition for the product price based on the speech recognition resource shown in FIG. 9.

[0048] Specifically, "50" is recognized, and the recognition code corresponding to "50" and the recognized text "50" are passed to the application program unit 33. The application program unit 33 then displays "50" on the display unit 21 as a product price of 50 yen based on the recognition code, and passes the recognized text to the speech recognition resource generation unit 34. The speech recognition resource generation unit 34 thereby generates a new speech recognition resource, as shown in FIG. 10, that raises the probability that "50" is recognized for the product price.

[0049] That is, for the product price, the rule "<product price> = <product price 1> | <product price 2>" is added, defining the product price as either product price 1 (corresponding to candidate phrase A) or product price 2 (corresponding to candidate phrase B), with product price 1 defined as "50" and product price 2 defined as any of the phrases stored in the first speech recognition resource 31 (1 to 99999). As a result, when the speech recognition unit 32 recognizes the product price using the speech recognition resource shown in FIG. 10, the probability that "50" is recognized rises. That is, because the product price is either product price 1 or product price 2, the probability of recognizing "50" as product price 1 is 50%; and because product price 2 also includes "50", the overall probability of recognizing "50" is at least 50%.
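The "at least 50%" figure can be made concrete with a short calculation. It rests on a simplifying assumption that is not stated in the source: the two alternatives of the rule are weighted equally, and within candidate phrase B each of the $N$ stored phrases is equally likely. A real recognizer would also weigh acoustic evidence, so this is only an illustration of the reasoning.

\[
P(\text{``50''}) \;=\; \underbrace{\tfrac{1}{2}\cdot 1}_{\text{candidate A}} \;+\; \underbrace{\tfrac{1}{2}\cdot\tfrac{1}{N}}_{\text{candidate B}} \;=\; \tfrac{1}{2} + \tfrac{1}{2N} \;\ge\; \tfrac{1}{2}
\]

With $N = 99999$ (the range 1 to 99999), this gives $P \approx 0.500005$, compared with $1/N \approx 0.00001$ under the same uniform assumption when only the first speech recognition resource is used.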

[0050] The speech recognition resource shown in FIG. 10 corresponds to the second speech recognition resource 35 for the product type phrases and the product price phrases; for the sales quantity phrases, no narrowing has been performed yet, so it still corresponds to the first speech recognition resource 31.

[0051] Next, when the operator (speaker) asks "How many sheets?" and the customer answers "Four sheets, please," the operator (speaker) says "Four sheets, correct?" into the microphone 18, and the speech recognition unit 32 performs the first speech recognition for the sales quantity based on the speech recognition resource shown in FIG. 10. Specifically, "4" is recognized, and the recognition code corresponding to "4" and the recognized text "4" are passed to the application program unit 33. The application program unit 33 then displays "4" on the display unit 21 as a sales quantity of four sheets based on the recognition code, and passes the recognized text to the speech recognition resource generation unit 34. The speech recognition resource generation unit 34 thereby generates a new speech recognition resource, as shown in FIG. 11, that raises the probability that "4" is recognized for the sales quantity.

[0052] That is, for the sales quantity, the rule "<sales quantity> = <sales quantity 1> | <sales quantity 2>" is added, defining the sales quantity as either sales quantity 1 (corresponding to candidate phrase A) or sales quantity 2 (corresponding to candidate phrase B), with sales quantity 1 defined as "4" and sales quantity 2 defined as any of the phrases stored in the first speech recognition resource 31 (1 to 99999). As a result, when the speech recognition unit 32 recognizes the sales quantity using the speech recognition resource shown in FIG. 11, the probability that "4" is recognized rises for the same reason as for the product price. The speech recognition resource shown in FIG. 11 corresponds to the second speech recognition resource 35 for the product type phrases, the product price phrases, and the sales quantity phrases.

[0053] The operator (speaker) then says into the microphone 18, "To confirm, that is four sheets of 50-yen plain paper, correct?" The speech recognition unit 32 then performs the second speech recognition for the product price phrase, the product type phrase, and the sales quantity phrase.

[0054] In this case, speech recognition is performed based on the speech recognition resource shown in FIG. 11 (which corresponds to the second speech recognition resource 35 for each phrase), so the recognition rate is higher. Thereafter, based on the recognition codes of the recognized product price, product type, and sales quantity, the application program unit 33 performs business processing such as registering the product and calculating the sales amount of the product.

[0055] As described above, the speech recognition resource generation unit 34 generates the second speech recognition resource 35, in which the phrase recognized in the first speech recognition is set as candidate phrase A and the phrases stored in the first speech recognition resource 31 are set as candidate phrase B, and the second and subsequent speech recognitions recognize from either candidate phrase A or candidate phrase B. This raises the probability that the phrase recognized in the first speech recognition is recognized again, and so improves the recognition rate of the second and subsequent speech recognitions. The approach is therefore fully effective for the second and subsequent voice inputs, that is, voice input made for confirmation.

[0056] Next, a third embodiment of the present invention will be described with reference to FIGS. 12 to 14. The block diagram showing the configuration of the speech recognition device according to this embodiment is the same as that shown in FIG. 1, so its detailed description is omitted. FIG. 12 is a functional block diagram of the speech recognition device according to this embodiment. It differs from FIG. 2 in that the speech recognition resource generation unit 41 receives from the application program unit 33 a sales amount code indicating the sales amount calculated from the recognition codes for the product price, the sales quantity, and so on, and generates a second speech recognition resource 43 in which the range of amount phrases stored in the first speech recognition resource 42, which is used to recognize the "deposit amount", is limited based on the sales amount code. Specifically, the second speech recognition resource 43 is obtained by deleting, from the amount phrases of the first speech recognition resource 42, the phrases for amounts lower than the sales amount indicated by the sales amount code.

[0057] The first speech recognition resource 42 in this embodiment is written in general BN notation as shown in FIG. 13, and the second speech recognition resource 43 is generated from it. In FIG. 13, the rule "<SENTENCE> = <deposit amount> yen received." means that when, for example, the sentence "1005 yen received" (in Japanese, "1005-en o-azukari shimasu") is input by voice, "1005" is taken as the deposit amount. The rule "<deposit amount> = 1 to 99999." means that the deposit amount is any value from 1 to 99999.

[0058] In this embodiment, as described in detail for the first and second embodiments, the speech recognition unit 32 recognizes the product price and the sales quantity from the voice input of the operator (speaker) and outputs the recognition codes. Based on these recognition codes, the application program unit 33 calculates the sales amount and passes the sales amount code to the speech recognition resource generation unit 41.

[0059] The speech recognition resource generation unit 41 then generates the second speech recognition resource 43, whose amount phrases are those remaining after the amounts lower than the sales amount indicated by the sales amount code have been deleted. For example, when the sales amount is 1005 yen, the second speech recognition resource 43 generated by the speech recognition resource generation unit 41 is written as "<deposit amount> = 1005 to 99999.", as shown in FIG. 14. In other words, 1 to 1004 are deleted from the definition of <deposit amount> in the first speech recognition resource 42. This is because, with a sales amount of 1005 yen, an amount of 1004 yen or less can never be the deposit amount received from the customer.
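The narrowing step for the deposit amount can be sketched as follows. This is a minimal illustration, assuming the amount range is held as a `(low, high)` tuple; the function names `narrow_deposit_range` and `to_rule` are hypothetical and are not taken from the patent.

```python
def narrow_deposit_range(first_range, sales_amount):
    """Drop every amount below the sales amount from the first resource's range."""
    low, high = first_range
    if sales_amount > high:
        # No deposit amount in the first resource could cover this sale.
        raise ValueError("sales amount exceeds the largest recognizable amount")
    return (max(low, sales_amount), high)


def to_rule(amount_range):
    """Render the range as a BN-style rule in the manner of FIG. 14."""
    low, high = amount_range
    return f"<deposit amount> = {low} to {high}."


first_range = (1, 99999)                        # range given for FIG. 13
second_range = narrow_deposit_range(first_range, 1005)
print(to_rule(second_range))                    # <deposit amount> = 1005 to 99999.
```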

[0060] This prevents an impossible deposit amount from being recognized by mistake and improves the recognition rate for the deposit amount. For example, the deposit amounts 1000 yen (sen-en) and 3 yen (san-en) sound alike and are easily confused. If the sales amount is, say, 800 yen, however, the 1000-yen deposit amount is recognized with the second speech recognition resource, from which amounts lower than 800 yen have been excluded, so it cannot be misrecognized as 3 yen.
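The sen-en / san-en example can be illustrated with a small sketch. Everything numeric here is an assumption: the scores are invented to stand in for acoustic likelihoods, and a real recognizer would constrain its search with the grammar rather than filter afterwards. The function name `best_allowed` is hypothetical.

```python
def best_allowed(hypotheses, allowed_range):
    """Pick the highest-scoring amount that lies inside the allowed range.

    hypotheses: dict mapping a candidate amount to an (invented) score.
    Returns None if no candidate falls inside the range.
    """
    low, high = allowed_range
    allowed = {amount: score for amount, score in hypotheses.items()
               if low <= amount <= high}
    if not allowed:
        return None
    return max(allowed, key=allowed.get)


# Invented scores: "san-en" (3) narrowly beats "sen-en" (1000).
scores = {3: 0.52, 1000: 0.48}

with_first_resource = best_allowed(scores, (1, 99999))     # 3 is still allowed
with_second_resource = best_allowed(scores, (800, 99999))  # amounts below 800 excluded
print(with_first_resource, with_second_resource)
```

With the full range the confusable 3-yen hypothesis wins; once amounts below the 800-yen sales amount are excluded, only 1000 yen remains as a candidate.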

[0061] In addition, because the deposit amount is recognized with the second speech recognition resource 43, in which the amount phrases of the first speech recognition resource 42 have been narrowed down, the processing load on the speech recognition unit 32 for recognizing the deposit amount is reduced and the processing time is shortened.

[0062] The third embodiment of the present invention configured in this way may be applied to a device in which only the deposit amount is input by voice, or may be applied in combination with the first or second embodiment.

【００６３】[0063]

[Effects of the Invention] As described in detail above, according to the present invention, the first speech recognition of a given phrase uses the first speech recognition resource, which holds a large amount of data, and the second and subsequent speech recognitions use the second speech recognition resource, which is generated by narrowing down based on the phrase obtained from the first speech recognition. Because the range of phrases to be recognized is thereby limited, the recognition rate of the second and subsequent speech recognitions is improved. In addition, the processing load on the speech recognition means for the second and subsequent speech recognitions is reduced and the processing time is shortened.

[0064] Further, the speech recognition resource generation means generates a second speech recognition resource in which the phrase recognized in the first speech recognition is set as candidate phrase A, and the phrases stored in the first speech recognition resource, or the phrase group of the first speech recognition resource that includes that phrase, are set as candidate phrase B, and the second and subsequent speech recognitions recognize from either candidate phrase A or candidate phrase B. This raises the probability that the phrase recognized in the first speech recognition is recognized again, and so improves the recognition rate of the second and subsequent speech recognitions.

[0065] Further, the second speech recognition resource is generated by deleting, from the amount phrases of the first speech recognition resource, the phrases for amounts lower than the sales amount, and when the deposit amount received from the customer is recognized, the amount phrase is extracted from the second speech recognition resource. This prevents an impossible deposit amount from being recognized by mistake and improves the recognition rate for the deposit amount. In addition, because the deposit amount is recognized with the second speech recognition resource, in which the amount phrases of the first speech recognition resource have been narrowed down, the processing load on the speech recognition means for recognizing the deposit amount is reduced and the processing time is shortened.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a speech recognition device according to a first embodiment of the present invention.

FIG. 2 is a functional block diagram of the embodiment.

FIG. 3 is a diagram showing an example of the first speech recognition resource in the embodiment.

FIG. 4 is a diagram showing an example of the second speech recognition resource in the embodiment.

FIG. 5 is a functional block diagram showing another example of the speech recognition device according to the embodiment.

FIG. 6 is a functional block diagram showing another example of the speech recognition device in the embodiment.

FIG. 7 is a diagram showing an example of the first speech recognition resource in the speech recognition device shown in FIG. 6.

FIG. 8 is a diagram showing an example of the first speech recognition resource in a second embodiment of the present invention.

FIG. 9 is a diagram showing an example of the second speech recognition resource for the product type in the embodiment.

FIG. 10 is a diagram showing an example of the second speech recognition resource for the product type phrases and the product price phrases in the embodiment.

FIG. 11 is a diagram showing an example of the second speech recognition resource for the product type phrases, the product price phrases, and the sales quantity phrases in the embodiment.

FIG. 12 is a functional block diagram of a third embodiment of the present invention.

FIG. 13 is a diagram showing an example of the first speech recognition resource in the embodiment.

FIG. 14 is a diagram showing an example of the second speech recognition resource in the embodiment.

FIG. 15 is a functional block diagram of a conventional speech recognition device.

FIG. 16 is a diagram showing an example of a speech recognition resource in a conventional speech recognition device.

[Explanation of symbols]

11 ... CPU
17 ... Voice input unit
31, 42 ... First speech recognition resource
32 ... Speech recognition unit
33 ... Application program unit
34 ... Speech recognition resource generation unit
35, 43 ... Second speech recognition resource
41 ... Speech recognition resource generation unit

Claims

1. A speech recognition device comprising: voice input means for inputting a speaker's voice; speech recognition means for recognizing a phrase from the voice input from the voice input means; a first speech recognition resource storing in advance a plurality of phrases to be recognized; phrase extraction means for extracting the phrase recognized by the speech recognition means from the first speech recognition resource; and speech recognition resource generation means for generating a second speech recognition resource by narrowing down the phrases stored in the first speech recognition resource based on the phrase extracted by the phrase extraction means, wherein, when a phrase input again from the voice input means is recognized, the phrase is extracted from the second speech recognition resource.

2. The speech recognition device according to claim 1, wherein the speech recognition resource generation means generates the second speech recognition resource with the phrase extracted from the first speech recognition resource by the phrase extraction means as the phrase to be recognized.

3. The speech recognition device according to claim 1, wherein the first speech recognition resource stores in advance a plurality of phrases to be recognized for each phrase group, and the speech recognition resource generation means generates, as the second speech recognition resource, the speech recognition resource of the phrase group that includes the phrase extracted from the first speech recognition resource by the phrase extraction means.

4. A speech recognition device comprising: voice input means for inputting a speaker's voice; speech recognition means for recognizing a phrase from the voice input from the voice input means; a first speech recognition resource storing in advance a plurality of phrases to be recognized; phrase extraction means for extracting the phrase recognized by the speech recognition means from the first speech recognition resource; and speech recognition resource generation means for generating a second speech recognition resource in which the phrase extracted from the first speech recognition resource by the phrase extraction means is set as candidate phrase A and the phrases stored in the first speech recognition resource are set as candidate phrase B, wherein, when a phrase input again from the voice input means is recognized, the phrase is extracted from either candidate phrase A or candidate phrase B of the second speech recognition resource.

5. A speech recognition device comprising: voice input means for inputting a speaker's voice; speech recognition means for recognizing a phrase from the voice input from the voice input means; a first speech recognition resource storing in advance a plurality of phrases to be recognized for each phrase group; phrase extraction means for extracting the phrase recognized by the speech recognition means from the first speech recognition resource; and speech recognition resource generation means for generating a second speech recognition resource in which the phrase extracted from the first speech recognition resource by the phrase extraction means is set as candidate phrase A and, among the phrases of the first speech recognition resource, the phrase group that includes the extracted phrase is set as candidate phrase B, wherein, when a phrase input again from the voice input means is recognized, the phrase is extracted from either candidate phrase A or candidate phrase B of the second speech recognition resource.

6. A speech recognition device comprising: voice input means for inputting a speaker's voice; speech recognition means for recognizing a phrase from the voice input from the voice input means; a first speech recognition resource storing in advance amount phrases as a plurality of phrases to be recognized; phrase extraction means for extracting the phrase recognized by the speech recognition means from the first speech recognition resource; and speech recognition resource generation means for generating, as a second speech recognition resource, the result of deleting, from the amount phrases of the first speech recognition resource, the phrases for amounts lower than the sales amount of the product, wherein, when the deposit amount received from a customer is input by voice, the amount phrase is extracted from the second speech recognition resource.