JPH08202388A

JPH08202388A - Voice recognition device and voice recognition method

Info

Publication number: JPH08202388A
Application number: JP7028857A
Authority: JP
Inventors: Naoyuki Okazaki; 尚行岡崎; Daishirou Katou; 大志朗加藤; Kenji Kobayashi; 賢治小林; Masatomo Hashimoto; 政朋橋本
Original assignee: Omron Corp; Omron Tateisi Electronics Co
Current assignee: Omron Corp
Priority date: 1995-01-24
Filing date: 1995-01-24
Publication date: 1996-08-09

Abstract

PURPOSE: To improve the recognition rate by using, in a complementary manner, a plurality of voice recognition processing means that are relatively easy to obtain, without having to establish a recognition method that satisfies sophisticated and stringent specifications, and to adapt these means gradually and autonomously to changes in the operating environment and to the actual recognition targets. CONSTITUTION: The voice recognition device has a plurality of voice recognition processing means 1A, 1B, each with its own recognition method; a recognition result consolidating means 2 that aggregates the recognition results of the voice recognition processing means 1A, 1B, evaluates them together using the reliability of each voice recognition processing means 1A, 1B as the predetermined criterion, and estimates and outputs a final result; and a consolidated learning means 3 that consolidates the data for collectively training the voice recognition processing means 1A, 1B and the recognition result consolidating means 2, based on feedback data from users and knowledge held by the learning means itself.

Description

Detailed Description of the Invention

【０００１】[0001]

[Field of Industrial Application] The present invention relates to a voice recognition device and a voice recognition method that can be effectively built into, for example, various ticket vending machines. A computer extracts feature quantities from voice data of a target domain given as input, and the input voice data is recognized and output on the basis of the extracted feature quantities.

【０００２】[0002]

[Prior Art] In a conventional general voice recognition device, as shown in FIG. 11, voice input for template creation is passed through a template creation device 100 to create a standard template 101, and a single voice recognition processing device 102 equipped with this standard template 101 performs recognition processing on new voice input and outputs the recognition result. As for learning, known methods include giving feedback when the standard template 101 is created, as shown in FIG. 11, and giving learning feedback using new voice input, as shown in FIG. 12.

【０００３】[0003]

[Problems to Be Solved by the Invention] However, in the conventional voice recognition device and voice recognition method described above, a voice recognition processing means with a sophisticated recognition method and stringent required specifications must be established at the outset: what value or set of values can serve as the feature quantity of the voice data of the target domain, and what calculation method derives that feature quantity from the input voice data. As the target to be recognized becomes more complex and the features to be output as the recognition result are required to be more sophisticated, it becomes very difficult to decide in advance which feature quantities are to be calculated at recognition time. As a result, if the quality of the standard template 101 is poor, the recognition rate declines over time, and improving the quality requires great labor and advanced technology.

[0004] Regarding learning as well, the conventional methods described above are limited to learning within a single module and are not effective learning such as sequential learning. Consequently, the recognition rate inevitably falls below the required specification under the influence of changes in the operating environment, deviations between the actual recognition targets and the assumptions made when the device was built, and aging of the device itself.

[0005] The present invention has been made in view of the above circumstances. Its main object is to provide a voice recognition device and a voice recognition method that improve the recognition rate not by establishing a recognition method that satisfies sophisticated and stringent required specifications, but by using, in a complementary manner, a plurality of voice recognition processing means that are relatively easy to obtain.

[0006] Another object of the present invention is to enable gradual, autonomous adaptation to changes in the operating environment and to the actual recognition targets, so that the recognition rate improves over time.

【０００７】[0007]

[Means for Solving the Problems] To achieve the main object, the voice recognition device according to the invention of claim 1 comprises a plurality of voice recognition processing means, and a recognition result integrating means that gathers the recognition results of the plurality of voice recognition processing means, evaluates them comprehensively according to a predetermined criterion, and outputs a final result.

[0008] To achieve the other object, the voice recognition device according to the invention of claim 2 comprises a plurality of voice recognition processing means; a recognition result integrating means that gathers the recognition results of the plurality of voice recognition processing means, evaluates them comprehensively according to a predetermined criterion, and outputs a final result; and an integrated learning means that integrates the data for collectively training the plurality of voice recognition processing means.

[0009] As the recognition result integrating means in the voice recognition device of claim 1 or 2, it is preferable, as in claim 3, to use one having a recognition selection means that selects, from the recognition results of the plurality of voice recognition processing means, the recognition results of those recognition processing means that satisfy a predetermined criterion, comprehensively evaluates the selected recognition results, and outputs a final result.

[0010] In the voice recognition device of any one of claims 1 to 3, it is preferable, as in claim 4, to provide a plurality of voice section cutout means upstream of the plurality of voice recognition processing means, and it is desirable, as in claim 5, to use voice recognition processing means of mutually different types as the plurality of voice recognition processing means.

[0011] Further, as in claim 6, it is desirable that the voice recognition processing means in the voice recognition device according to the invention of claim 2 consist of three types: a voice recognition means having a standard pattern, a voice recognition means having a short-term learning standard pattern, and a voice recognition means having a long-term learning standard pattern; and that the integrated learning means reflect learning results in the short-term learning standard pattern over a first predetermined period and reflect learning results in the long-term learning standard pattern over a second predetermined period longer than the first. Here, as in claim 7, it is desirable that the integrated learning means gather the recognition results of the three types of voice recognition processing means, evaluate them comprehensively according to a predetermined criterion, and output a final result. Furthermore, as in claim 8, it is particularly desirable to provide a pattern evaluation means that evaluates the usage frequency of each short-term and long-term learning standard pattern on a predetermined scale and deletes unnecessary learning standard patterns based on that evaluation.

[0012] The voice recognition method according to the invention of claim 9 is characterized in that the recognition results obtained by each of a plurality of voice recognition processing means are gathered and comprehensively evaluated according to a predetermined criterion, and a final result is output.

[0013] The voice recognition method according to the invention of claim 10 is also characterized in that the recognition results obtained by each of a plurality of voice recognition processing means are gathered and comprehensively evaluated according to a predetermined criterion, and a final result is output.

[0014] In the voice recognition method of claim 9 or 10, it is preferable, as in claim 11, to select, from the recognition results obtained by each of the plurality of voice recognition processing means, those recognition results that satisfy a predetermined criterion, comprehensively evaluate the selected recognition results, and output a final result.

[0015] In the voice recognition method of any one of claims 9 to 11, it is preferable, as in claim 12, to provide a plurality of voice section cutout means that cut the input voice into a plurality of sections, and to have the plurality of voice recognition processing means recognize the voice of the sections cut out by these cutout means. It is also desirable, as in claim 13, to use voice recognition processing means of mutually different types as the plurality of voice recognition processing means.

[0016] As in claim 14, it is desirable that the voice recognition method according to the invention of claim 10 use three types of voice recognition processing means, namely a voice recognition means having a standard pattern, a voice recognition means having a short-term learning standard pattern, and a voice recognition means having a long-term learning standard pattern, and that learning results be reflected in the short-term learning standard pattern over a first predetermined period and in the long-term learning standard pattern over a second predetermined period longer than the first. Here, as in claim 15, it is desirable to gather the recognition results obtained by each of the three types of voice recognition processing means, evaluate them comprehensively according to a predetermined criterion, and output a final result. Furthermore, as in claim 16, it is particularly desirable to evaluate the usage frequency of each short-term and long-term learning standard pattern on a predetermined scale and to delete unnecessary learning standard patterns based on that evaluation.

【００１７】[0017]

[Operation] According to the inventions of claims 1 and 9, when voice data is input, the plurality of voice recognition processing means each output a recognition result according to their own recognition methods. These recognition results are aggregated, evaluated comprehensively according to a predetermined criterion, and a final result is output. By using several relatively easily obtained recognition methods in a complementary manner and evaluating their results together, a final voice recognition result with a high recognition rate that satisfies the required specification can be output.

[0018] According to the inventions of claims 2 and 10, while the final voice recognition result is output as described above, the plurality of voice recognition processing means are trained collectively, on the basis of user feedback on that result and similar information, so that their recognition fit to the current input voice data improves. The device thereby adapts gradually and autonomously to changes in the operating environment and to the actual recognition targets, and the recognition rate can be improved over time.

[0019] According to the inventions of claims 3 and 11, when the recognition results of the plurality of voice recognition processing means are evaluated together, the recognition results of those recognition processing means that satisfy a predetermined criterion are selected, and the selected recognition results are comprehensively evaluated to output the final result. A final voice recognition result with an even higher recognition rate can thus be output.

[0020] In executing the voice recognition processing described above, a plurality of voice section cutout means that cut the input voice into a plurality of sections may be provided, as in claims 4 and 12, and the voice of the sections cut out by these means may be recognized by the plurality of voice recognition processing means. Among the several cutout means, whose relative merit varies with conditions, the one that is most accurate at the time of voice input is then used effectively, which raises the accuracy of voice section cutout and, as a result, improves the voice recognition rate.

[0021] In executing the voice recognition processing described above, using a plurality of voice recognition processing means of mutually different types, as in claims 5 and 13, makes it easy to increase the recognizable vocabulary in proportion to the number of voice recognition processing means without increasing the computational cost.

[0022] According to the inventions of claims 6 and 14, in the initial state a learning processing algorithm is executed: predetermined voice recognition processing is performed using the voice recognition means having the standard pattern, which stocks voice data registered in advance (for example, at factory shipment), while feature data is extracted at the same time; once more than a certain amount of feature data has been stocked, it is clustered to create the short-term learning standard pattern and the long-term learning standard pattern. Thereafter, a normal learning processing algorithm is executed in which predetermined voice recognition processing is performed using the three types of voice recognition means having the standard pattern, the short-term learning standard pattern, and the long-term learning standard pattern. The learning results of each pattern are thereby reflected efficiently. Regardless of the quality of the standard pattern, and without the large amount of computation required by, for example, a neural network, fast and flexible voice recognition that responds to various changes in circumstances is realized, and the recognition rate can be improved.
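The distinction between the short-term and long-term learning standard patterns can be illustrated with a minimal sketch. The patent creates these patterns by clustering stocked feature data and does not give the procedure, so the sketch substitutes a plain average over a sliding window; the class name `LearnedPattern`, the window sizes, and measuring the "predetermined period" in utterances rather than time are all assumptions made for illustration.

```python
from collections import deque


class LearnedPattern:
    """Template learned from the most recent `window` feature vectors.

    Hypothetical stand-in for a learning standard pattern: a short window
    tracks recent conditions, a long window changes more slowly.
    """

    def __init__(self, window: int) -> None:
        # deque(maxlen=...) silently drops the oldest sample when full.
        self.samples = deque(maxlen=window)

    def learn(self, features) -> None:
        self.samples.append(list(features))

    def template(self):
        # Element-wise mean of the stored vectors (a simplification of clustering).
        if not self.samples:
            return None
        n = len(self.samples)
        return [sum(column) / n for column in zip(*self.samples)]


short_term = LearnedPattern(window=2)  # first (shorter) period
long_term = LearnedPattern(window=4)   # second (longer) period
for features in ([1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]):
    short_term.learn(features)
    long_term.learn(features)

print(short_term.template())  # [3.5, 35.0]
print(long_term.template())   # [2.5, 25.0]
```

With the same input stream, the short-term template follows the latest utterances while the long-term template averages over a longer history, which is the behaviour the two periods are meant to provide.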

[0023] Further, in the voice recognition described above, gathering the recognition results of the plurality of voice recognition processing means, evaluating them comprehensively according to a predetermined criterion, and outputting the final result, as in claims 7 and 15, makes it possible to output a final voice recognition result with a high recognition rate that satisfies the required specification.

[0024] In particular, in the voice recognition described above, evaluating the usage frequency of each short-term and long-term learning standard pattern on a predetermined scale, deleting unnecessary learning standard patterns, and keeping only the effective ones, as in claims 8 and 16, makes it possible to reduce the computing power consumed.
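The pattern evaluation of claims 8 and 16 can be sketched as a simple filter. The patent only speaks of "a predetermined scale", so the usage rate (uses divided by total recognitions) and the threshold below are assumptions, as are the function name `prune_patterns` and the pattern identifiers.

```python
from __future__ import annotations


def prune_patterns(
    usage_counts: dict[str, int],
    total_recognitions: int,
    min_rate: float,
) -> dict[str, int]:
    """Keep only learned patterns whose usage rate reaches `min_rate`.

    usage_counts maps a (hypothetical) pattern id to how often that pattern
    produced the accepted result.
    """
    if total_recognitions <= 0:
        # No evidence yet: keep everything rather than delete blindly.
        return dict(usage_counts)
    return {
        pattern_id: count
        for pattern_id, count in usage_counts.items()
        if count / total_recognitions >= min_rate
    }


counts = {"short-001": 40, "short-002": 1, "long-001": 25}
kept = prune_patterns(counts, total_recognitions=100, min_rate=0.05)
print(sorted(kept))  # ['long-001', 'short-001']
```

Fewer stored patterns means fewer template comparisons per utterance, which is where the saving described above would come from.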

【００２５】[0025]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings. Embodiment 1: FIG. 1 is a schematic configuration diagram of a voice recognition device according to Embodiment 1, which has two voice recognition processing means. In the figure, 1A and 1B are the two voice recognition processing means. Each consists of a standard template 1a1, 1b1 that stocks the knowledge data used for recognition, and a voice recognition processing unit 1a2, 1b2 that uses the knowledge data of the standard template 1a1, 1b1 to output a recognition result according to its own recognition method. Reference numeral 2 denotes a voice section cutout unit that cuts the input voice into predetermined sections; the voice of each section cut out is input to the voice recognition processing units 1a2, 1b2 of the two voice recognition processing means 1A, 1B.

[0026] Reference numeral 3 denotes a recognition result integrating means, which aggregates the recognition results of the two voice recognition processing means 1A, 1B, evaluates them comprehensively using the reliability (described later) of each voice recognition processing means 1A, 1B as the predetermined criterion, and estimates and outputs a final result. Reference numeral 4 denotes an integrated learning means, which, based on feedback data from the user and the knowledge held by the integrated learning means 4 itself, integrates the data for collectively training the voice recognition processing means 1A, 1B and the recognition result integrating means 3.

[0027] Next, the voice recognition operation of Embodiment 1 configured as above is described with reference to the flowchart of FIG. 2. Voice input through, for example, a microphone is cut into predetermined sections by the voice section cutout unit 2, and the voice of the sections cut out is input to the voice recognition processing units 1a2, 1b2 of the two voice recognition processing means 1A, 1B (step S10). Each of the voice recognition processing units 1a2, 1b2 performs recognition processing on the input voice according to its own recognition method (step S11) and outputs the result.

[0028] Next, the recognition results output from the voice recognition processing units 1a2, 1b2 of the voice recognition processing means 1A, 1B are aggregated in the recognition result integrating means 3 (step S12). Using the reliability as the predetermined criterion, a corrected distance is obtained for the recognition result from each voice recognition processing means 1A, 1B, and the recognition results are comprehensively evaluated on the basis of the corrected distances to estimate and output the final result (step S13). Based on the final result that is output, feedback data from the user is input to the integrated learning means 4 (step S14). Based on this user feedback data and the knowledge held by the integrated learning means 4 itself, the reliability of the recognition method of each voice recognition processing means 1A, 1B is updated (step S15), and learning processing is performed on the voice recognition processing units 1a2, 1b2 of the voice recognition processing means 1A, 1B so as to improve their recognition fit to the current input voice (step S16).

[0029] Then, once the learning processing of the voice recognition processing units 1a2, 1b2 of the voice recognition processing means 1A, 1B is complete, the learning processing of the integrated learning means 4 is performed, in accordance with the recognition fit after that learning, so that the final result matches the feedback data from the user.

[0030] The reliability used for the comprehensive evaluation in the recognition result integrating means 3, and the updating of the reliability of each recognition method by the integrated learning means 4, are now described in detail. The reliability (S) of each of the voice recognition processing means 1A, 1B is given by

S = hit ÷ bat ... (1)

where bat is the number of times voice recognition has been executed, and hit is the number of those executions in which the correct answer was among the top three recognition results (the top-three recognition count). The closer the reliability (S) is to 1, the more important the voice recognition processing means; the closer it is to 0, the less important. Teaching this reliability (S) to the recognition result integrating means 3 (learning processing) improves the reliability of the final recognition result.
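Equation (1) can be sketched as a small counter kept per voice recognition processing means. The names `bat` and `hit` follow the text; the class name `ReliabilityTracker`, the `update` method, and returning 0.0 before any recognition has run are assumptions made for this illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ReliabilityTracker:
    """Tracks S = hit / bat for one recognizer, as in equation (1)."""

    bat: int = 0  # number of recognition runs
    hit: int = 0  # runs whose top-3 candidates contained the correct word

    def update(self, ranked_candidates: list[str], correct_word: str, top_n: int = 3) -> None:
        """Record one run; `correct_word` would come from user feedback."""
        self.bat += 1
        if correct_word in ranked_candidates[:top_n]:
            self.hit += 1

    @property
    def reliability(self) -> float:
        # Assumption: with no runs yet there is no evidence, so report 0.0.
        return self.hit / self.bat if self.bat else 0.0


tracker = ReliabilityTracker()
tracker.update(["Tokyo", "Kyoto", "Osaka"], "Kyoto")         # hit: 2nd place
tracker.update(["Nara", "Kobe", "Osaka", "Kyoto"], "Kyoto")  # miss: 4th place
print(tracker.reliability)  # 0.5
```

The place names are made-up vocabulary entries, not data from the patent's tables.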

[0031] Table 1 shows an example of the reliability data obtained for each of the voice recognition processing means 1A, 1B using equation (1).

【００３２】[0032]

【表１】 [Table 1]

[0033] Using the reliability (S) above, the corrected distance (cd) is obtained from the distance (d), a common measure of the similarity between the input voice and the templates 1a1, 1b1 computed by each of the voice recognition processing means 1A, 1B:

cd = d ÷ S ... (2)

Table 2 shows an example of the distance (d) and corrected distance (cd) obtained for each of the voice recognition processing means 1A, 1B using equation (2).

【００３４】[0034]

【表２】 [Table 2]
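Equation (2) and the integration in step S13 can be sketched as follows. The text states that the results are evaluated on the basis of the corrected distances but does not give the exact combination rule, so ranking each word by its smallest corrected distance across recognizers is an assumption, as are the function names and the numeric values (they are illustrative and not taken from Table 2).

```python
def corrected_distance(d: float, s: float) -> float:
    """cd = d / S, as in equation (2); a smaller value is a better match."""
    if s <= 0:
        # Assumption: a recognizer with zero reliability can never win.
        return float("inf")
    return d / s


def integrate(results, reliabilities):
    """Rank words by their best corrected distance over all recognizers.

    results: {recognizer name: [(word, distance), ...]}
    reliabilities: {recognizer name: S}
    """
    best = {}
    for name, candidates in results.items():
        for word, d in candidates:
            cd = corrected_distance(d, reliabilities[name])
            if word not in best or cd < best[word]:
                best[word] = cd
    return sorted(best.items(), key=lambda item: item[1])


results = {
    "1A": [("Kyoto", 0.30), ("Tokyo", 0.45)],
    "1B": [("Tokyo", 0.25), ("Kyoto", 0.40)],
}
reliabilities = {"1A": 0.9, "1B": 0.5}
ranking = integrate(results, reliabilities)
print(ranking[0][0])  # Kyoto
```

In this example recognizer 1B reports the smallest raw distance (for "Tokyo"), yet the more reliable recognizer 1A decides the outcome once distances are divided by S.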

[0035] In Embodiment 1, if the templates 1a1, 1b1 of the two voice recognition processing means 1A, 1B are created for the same words but for different manners of utterance (by sex, age, region, and so on) and in different formats, using voice recognition processing units 1a2, 1b2 that each correspond to their template 1a1, 1b1 and integrating the recognition results improves the recognition rate. If the voice recognition processing units 1a2, 1b2 of the two voice recognition processing means 1A, 1B are identical and the templates 1a1, 1b1 are created for different words, the number of recognizable words, that is, the recognition vocabulary, can be increased without raising the computational cost. Further, if the templates 1a1, 1b1 of the two voice recognition processing means 1A, 1B are created for different words, for different manners of utterance (by sex, age, region, and so on), and in different formats, using voice recognition processing units 1a2, 1b2 that each correspond to their template 1a1, 1b1 allows the recognition vocabulary to be increased in proportion to the number of voice recognition processing units 1a2, 1b2 without raising the computational cost.

[0036] Embodiment 2: FIG. 3 is a schematic configuration diagram of a voice recognition device according to Embodiment 2, which has three voice recognition processing means. In the figure, 1A, 1B, and 1C are the three voice recognition processing means. Each consists of a standard template 1a1, 1b1, 1c1 that stocks the knowledge data used for recognition, and a voice recognition processing unit 1a2, 1b2, 1c2 that uses the knowledge data of the standard template 1a1, 1b1, 1c1 to output a recognition result according to its own recognition method. The rest of the configuration is the same as in Embodiment 1 shown in FIG. 1, so identical parts are given identical reference numerals and their description is omitted.

[0037] The voice recognition operation of Embodiment 2 configured in this way is almost the same as in Embodiment 1. Table 3 shows an example of the reliability data obtained for each of the voice recognition processing means 1A, 1B, 1C using equation (1).

【００３８】[0038]

【表３】 [Table 3]

[0039] Table 4 shows an example of the distance (d) and corrected distance (cd) obtained for each of the voice recognition processing means 1A, 1B, 1C using equation (2).

【００４０】[0040]

【表４】 [Table 4]

【００４１】表４を見てみると、音声認識処理手段１Ａ
のみが１位となっている。その距離は最も近い訳ではな
いが、信頼度（Ｓ）が高いため、総合評価による統合結
果では１位になる。このように多数決の原理では、１位
認識になるはずの単語が２位以下に落ちている場合があ
る。このような結果を踏まえて、上記認識結果統合手段
３に、信頼度（Ｓ）が低く統合結果の悪い音声認識処理
手段の認識結果は順次捨棄し、信頼度（Ｓ）の高い音声
認識処理手段の認識結果のみを選択して、それら選択さ
れた認識結果を総合評価して最終結果を出力する認識選
択機能をもたせることにより、音声認識結果の信頼性を
一層向上することができる。Looking at Table 4, only the voice recognition processing means 1A ranks the word first. Its distance is not the shortest, but because its reliability (S) is high, the word is ranked first in the integrated result of the comprehensive evaluation. Under a simple majority-vote principle, by contrast, a word that should be recognized in first place can fall to second place or lower. In view of this, the recognition result integrating means 3 may be given a recognition selection function that successively discards the recognition results of voice recognition processing means having low reliability (S) and poor integration results, selects only the recognition results of voice recognition processing means having high reliability (S), comprehensively evaluates the selected results, and outputs the final result. This further improves the reliability of the voice recognition result.
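The recognition selection function described above can be sketched as follows. This is only an illustration under stated assumptions: equations (1) and (2) are not reproduced in this passage, so the corrected distance is stood in for by the hypothetical rule cd = d / S, and the names `Candidate`, `integrate`, and `min_reliability`, as well as all numeric values, are invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    recognizer: str     # e.g. "1A", "1B", "1C"
    word: str           # first-place word reported by that recognizer
    distance: float     # distance d (smaller is closer)
    reliability: float  # reliability S, assumed to lie in (0, 1]


def integrate(candidates: List[Candidate], min_reliability: float = 0.5) -> Candidate:
    """Discard low-reliability recognizers, then pick the best corrected distance.

    Assumption: cd = d / S is a hypothetical stand-in for equation (2).
    """
    kept = [c for c in candidates if c.reliability >= min_reliability]
    if not kept:
        # If every recognizer is below the threshold, fall back to using all of them.
        kept = list(candidates)
    return min(kept, key=lambda c: c.distance / c.reliability)


# Illustrative values only: 1A is not the closest, but it is the most reliable.
results = [
    Candidate("1A", "Kyoto", distance=0.40, reliability=0.9),
    Candidate("1B", "Kyoko", distance=0.30, reliability=0.4),
    Candidate("1C", "Kyoko", distance=0.35, reliability=0.6),
]
best = integrate(results)
```

With these made-up numbers a plain majority vote would favor the word reported by 1B and 1C, whereas the reliability-aware selection returns the result of 1A.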

【００４２】実施例３：図４は、音声区間切り出し手段
および音声認識処理手段をそれぞれ２つ備えた実施例３
による音声認識装置の概略構成図であり、同図におい
て、実施例１と相違する点は、２つの音声認識処理手段
１Ａ，１Ｂの前段に、入力音声を所定の区間で切り出す
２つの音声区間切り出し手段２Ａ，２Ｂを設けて、これ
ら２つの音声区間切り出し手段２Ａ，２Ｂにより切り出
した区間の音声を上記２つの音声認識処理手段１Ａ，１
Ｂに入力してそれぞれ認識処理するようにした点であ
り、その他の構成は、図１に示す実施例１と同一である
ため、同一部分に同一の符号を付して、それらの説明を
省略する。Embodiment 3: FIG. 4 is a schematic configuration diagram of a voice recognition device according to a third embodiment, which is provided with two voice section cutout means and two voice recognition processing means. In the figure, the difference from the first embodiment is that two voice section cutout means 2A and 2B, which cut out the input voice in predetermined sections, are provided before the two voice recognition processing means 1A and 1B, and the voice of the sections cut out by the two voice section cutout means 2A and 2B is input to the two voice recognition processing means 1A and 1B for recognition processing. The other configurations are the same as those of the first embodiment shown in FIG. 1, so the same parts are given the same reference numerals and their description is omitted.

【００４３】上記構成の実施例３による場合は、パラメ
ータの設定などの種々の条件によって優劣が変化する２
つの音声区間切り出し手段２Ａ，２Ｂのうち、音声入力
の時点で最も精度のよい切り出し手段を効率的に使用す
ることにより、音声区間切り出し精度を高めることが可
能で、その結果として、音声認識率の向上が図れる。In the third embodiment configured as above, the relative performance of the two voice section cutout means 2A and 2B varies with conditions such as parameter settings. By efficiently using whichever cutout means is most accurate at the time of voice input, the voice section cutout accuracy can be increased and, as a result, the voice recognition rate can be improved.

【００４４】実施例４：図５は、上記実施例１における
音声認識装置の２つの音声認識処理手段１Ａ，１Ｂそれ
ぞれの具体的な構成を示すブロック図であり、同図にお
いて、５はＡ／Ｄ変換部で、入力音声をディジタルデー
タに変換する。６は時間正規化・特徴抽出部で、ディジ
タルデータに変換さけた入力音声から特徴データを抽出
する。７は標準パターン作成部で、上記時間正規化・特
徴抽出部６により抽出された音声特徴データを用いて、
３種類の標準パターン（実施例１のテンプレートに相当
する）を作成する。このような３種類の標準パターンを
作成するために上記音声特徴データをストックしておく
ための領域として、図６に明示するように、有効学習デ
ータ領域Ｔｅｍｐと２つの学習データ領域Ｔｅｍｐ１，
Ｔｅｍｐ２の３種類を用意している。このうち、２つの
学習データ領域Ｔｅｍｐ１，Ｔｅｍｐ２は短期学習標準
パターン作成用の音声特徴データを蓄積し、かつ、有効
学習データ領域Ｔｅｍｐは後述の短期学習標準パターン
および長期学習標準パターンを標準データとして使用し
ていく上で、利用頻度である貢献度を評価して、その貢
献度の高い音声特徴データを長期有効データとして保管
する。Embodiment 4: FIG. 5 is a block diagram showing a specific configuration of each of the two voice recognition processing means 1A and 1B of the voice recognition device in the first embodiment. In the figure, 5 is an A/D conversion unit, which converts the input voice into digital data. 6 is a time normalization/feature extraction unit, which extracts feature data from the input voice that has been converted into digital data. 7 is a standard pattern creation unit, which uses the voice feature data extracted by the time normalization/feature extraction unit 6 to create three types of standard patterns (corresponding to the templates of the first embodiment). As areas for stocking the voice feature data used to create these three types of standard patterns, three areas are prepared, as shown in FIG. 6: an effective learning data area Temp and two learning data areas Temp1 and Temp2. The two learning data areas Temp1 and Temp2 accumulate voice feature data for creating the short-term learning standard pattern. The effective learning data area Temp stores, as long-term effective data, the voice feature data whose contribution degree (frequency of use) is evaluated as high while the short-term learning standard pattern and the long-term learning standard pattern described later are used as standard data.

【００４５】上記３種類の標準パターン（テンプレー
ト）は、図７に明示のように、工場出荷時に予め登録さ
れた音声データをクラスタリングして作成された標準パ
ターンＤｅｆａｕｌｔと上記学習データ領域Ｔｅｍｐ１
またはＴｅｍｐ２に蓄積された音声特徴データを短期学
習標準データとしてクラスタリングすることによって作
成された短期学習標準パターンＬｅａｒｎ１と上記の貢
献度の高い音声特徴データを長期有効データとして有効
学習データ領域Ｔｅｍｐからクラスタリングすることに
よって作成された長期学習標準パターンＬｅａｒｎ２の
３種類である。As shown in FIG. 7, the three types of standard patterns (templates) are: the standard pattern Default, created by clustering voice data registered in advance at the time of factory shipment; the short-term learning standard pattern Learn1, created by clustering, as short-term learning standard data, the voice feature data accumulated in the learning data area Temp1 or Temp2; and the long-term learning standard pattern Learn2, created by clustering, as long-term effective data, the high-contribution voice feature data from the effective learning data area Temp.

【００４６】図５において、８はパターン照合部で、上
記３種類の標準パターンに保管されている音声特徴デー
タと現実の入力音声特徴データとを照合する。９は結果
選択部で、上記パターン照合部８での照合結果を選択し
て認識結果を出力する。１０は貢献度評価部で、上記短
期学習標準パターンＬｅａｒｎ１および長期学習標準パ
ターンＬｅａｒｎ２個々の利用頻度を所定の尺度で評価
する。上記の構成によって音声認識処理手段１Ａ，１Ｂ
それぞれが、標準パターンＤｅｆａｕｌｔをもつ音声認
識手段と短期学習標準パターンＬｅａｒｎ１をもつ音声
認識手段と長期学習標準パターンＬｅａｒｎ２をもつ音
声認識手段の３種類からなり、このほかに、この実施例
４では、２つの音声認識処理手段１Ａ，１Ｂそれぞれの
パターン照合部８から出力される認識結果を総合評価し
て最終結果を出力する認識結果統合手段および統合学習
手段を備えているが、それらは実施例１で述べたものと
同一であるため、それらについての図示および説明は省
略する。In FIG. 5, 8 is a pattern matching unit, which matches the voice feature data stored in the three types of standard patterns against the actual input voice feature data. 9 is a result selection unit, which selects from the matching results of the pattern matching unit 8 and outputs the recognition result. 10 is a contribution degree evaluation unit, which evaluates the frequency of use of each of the short-term learning standard pattern Learn1 and the long-term learning standard pattern Learn2 on a predetermined scale. With the above configuration, each of the voice recognition processing means 1A and 1B consists of three types of voice recognition means: one having the standard pattern Default, one having the short-term learning standard pattern Learn1, and one having the long-term learning standard pattern Learn2. In addition, the fourth embodiment includes a recognition result integrating means, which comprehensively evaluates the recognition results output from the pattern matching units 8 of the two voice recognition processing means 1A and 1B and outputs the final result, and an integrated learning means. Since these are the same as those described in the first embodiment, their illustration and description are omitted.

【００４７】なお、上記貢献度評価部１０での貢献度評
価方法としては、発声された音声データと標準パターン
との距離の近かったものの分割行列を参照してそのグレ
ートの高いデータに高い点数を付ける方法や、単純に音
声認識に使用された回数を貢献度（ｓｃｏｒｅ）とする
方法や、ヒット率を貢献度（ｓｃｏｒｅ）とする方法な
どが考えられるが、そのうち、該実施例が採用するヒッ
ト率を用いた例を説明すると、貢献度（ｓｃｏｒｅ）
は、ｓｃｏｒｅ＝ｈｉｔ÷ｂａｔ ……（３）但し、ｂａｔ：学習データ領域Ｔｅｍｐ１またはＴｅｍ
ｐ２に蓄積されている音声特徴データ（単語）の発声回
数ｈｉｔ：音声特徴データ（単語）の発声回数のうち、１
位として認識された回数で求められ、その具体例を示すと、表５のようになる。Several contribution evaluation methods are conceivable for the contribution degree evaluation unit 10: referring to the partition matrix of the items whose distance between the uttered voice data and the standard pattern was small and giving a high score to data with a high grade; simply taking the number of times an item was used for voice recognition as the contribution degree (score); or taking the hit rate as the contribution degree (score). This embodiment adopts the hit rate, in which case the contribution degree (score) is given by
score = hit ÷ bat ... (3)
where bat is the number of utterances of the voice feature data (word) accumulated in the learning data area Temp1 or Temp2, and hit is the number of those utterances that were recognized in first place. A specific example is shown in Table 5.

【００４８】[0048]

【表５】 [Table 5]
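As a minimal sketch of equation (3), the hit rate can be computed as below. The function name `contribution_score` and the handling of a word that has never been uttered (bat = 0, treated here as a score of 0.0) are assumptions for illustration; the text does not specify that case, and the values of Table 5 are not reproduced.

```python
def contribution_score(hit: int, bat: int) -> float:
    """Contribution degree per equation (3): score = hit / bat.

    bat: number of utterances of the word stored in Temp1 or Temp2.
    hit: number of those utterances recognized in first place.
    Assumption: a word with no utterances yet (bat == 0) scores 0.0.
    """
    if bat == 0:
        return 0.0
    return hit / bat


# Illustrative values only: 3 first-place recognitions out of 4 utterances.
example = contribution_score(hit=3, bat=4)
```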

【００４９】つぎに、上記実施例４による音声認識処理
手段の学習アルゴリズムについて説明する。初期状態か
らの学習処理アルゴリズムは、図８のフローチャートに
示す通りであって、マイクロホンなどを通して音声が入
力されると（ステップＳ２０）、その入力された音声は
Ａ／Ｄ変換部５でディジタルデータに変換されたのち、
時間正規化・特徴抽出部６で特徴データが抽出される
（ステップＳ２１）。ついで、標準パターンＤｅｆａｕ
ｌｔをもつ音声認識手段のみを使って認識処理が行われ
（ステップＳ２２）、それが１位で認識されたか否かの
判定（ステップＳ２３）においてｎｏの場合は、上記抽
出した特徴データを学習データ領域Ｔｅｍｐ２へ格納す
る（ステップＳ２４）。そして、学習データ領域Ｔｅｍ
ｐ２が一杯になると（ステップＳ２５）、該学習データ
領域Ｔｅｍｐ２をクラスタリングすることによって、短
期学習標準パターンＬｅａｒｎ１を作成する（ステップ
Ｓ２６）。Next, the learning algorithm of the voice recognition processing means according to the fourth embodiment will be described. The learning processing algorithm from the initial state is as shown in the flowchart of FIG. 8. When voice is input through a microphone or the like (step S20), the input voice is converted into digital data by the A/D conversion unit 5, and feature data is then extracted by the time normalization/feature extraction unit 6 (step S21). Next, recognition processing is performed using only the voice recognition means having the standard pattern Default (step S22). If the determination as to whether the input was recognized in first place (step S23) is no, the extracted feature data is stored in the learning data area Temp2 (step S24). When the learning data area Temp2 becomes full (step S25), the short-term learning standard pattern Learn1 is created by clustering the learning data area Temp2 (step S26).
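Steps S23 to S26 of the initial learning flow can be sketched as follows. This is an illustration, not the patented implementation: the clustering method is not specified in this passage, so `mean_centroid` is a hypothetical stand-in that returns a single centroid, and the names `initial_learning_step` and `capacity` are assumptions.

```python
from typing import Callable, List, Optional

Vector = List[float]


def mean_centroid(vectors: List[Vector]) -> List[Vector]:
    """Hypothetical stand-in for clustering: one centroid, the per-dimension mean."""
    n = len(vectors)
    return [[sum(column) / n for column in zip(*vectors)]]


def initial_learning_step(
    features: Vector,
    recognized_first: bool,
    temp2: List[Vector],
    capacity: int,
    cluster: Callable[[List[Vector]], List[Vector]] = mean_centroid,
) -> Optional[List[Vector]]:
    """Return a new Learn1 pattern when Temp2 fills up, otherwise None."""
    if recognized_first:          # S23: yes, nothing is stored
        return None
    temp2.append(features)        # S24: store the feature data in Temp2
    if len(temp2) >= capacity:    # S25: Temp2 is full
        return cluster(temp2)     # S26: cluster Temp2 to create Learn1
    return None


temp2_area: List[Vector] = []
first = initial_learning_step([1.0, 3.0], False, temp2_area, capacity=2)
skipped = initial_learning_step([9.0, 9.0], True, temp2_area, capacity=2)
learn1 = initial_learning_step([3.0, 5.0], False, temp2_area, capacity=2)
```

Only inputs that the Default pattern failed to recognize in first place are accumulated, so Learn1 is built from the cases the factory pattern handles poorly.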

【００５０】それ以降は通常の学習処理アルゴリズムに
移行する。この通常学習処理アルゴリズムは、図９のフ
ローチャートに示す通りであって、音声が入力されると
（ステップＳ３０）、その入力された音声はＡ／Ｄ変換
部５でディジタルデータに変換されたのち、時間正規化
・特徴抽出部６で特徴データが抽出される（ステップＳ
３１）。ついで、その抽出特徴データについて、標準パ
ターンＤｅｆａｕｌｔをもつ音声認識手段、短期学習標
準パターンＬｅａｒｎ１および長期学習標準パターンＬ
ｅａｒｎ２を使った認識処理が行われる（ステップＳ３
２）とともに、短期学習標準パターンＬｅａｒｎ１およ
び長期学習標準パターンＬｅａｒｎ２の貢献度が評価さ
れる（ステップＳ３３）。そして、上記の認識処理が１
位認識であったか否かが判定され（ステップＳ３４）、
ｎｏの場合は、抽出された特徴データを学習データ領域
Ｔｅｍｐ１へ格納し（ステップＳ３５）、この学習デー
タ領域Ｔｅｍｐ１が一杯になった時点（ステップＳ３
６）で、上記短期学習標準パターンＬｅａｒｎ１の中で
貢献度の高いものを学習データ領域Ｔｅｍｐ２から有効
学習データ領域Ｔｅｍｐへ格納し、残りは削除する（ス
テップＳ３７）。After that, the process shifts to the normal learning processing algorithm, which is as shown in the flowchart of FIG. 9. When voice is input (step S30), the input voice is converted into digital data by the A/D conversion unit 5, and feature data is then extracted by the time normalization/feature extraction unit 6 (step S31). Next, recognition processing is performed on the extracted feature data using the voice recognition means having the standard pattern Default, the short-term learning standard pattern Learn1, and the long-term learning standard pattern Learn2 (step S32), and the contribution degrees of the short-term learning standard pattern Learn1 and the long-term learning standard pattern Learn2 are evaluated (step S33). It is then determined whether the recognition was a first-place recognition (step S34). If no, the extracted feature data is stored in the learning data area Temp1 (step S35). When the learning data area Temp1 becomes full (step S36), the data with a high contribution degree within the short-term learning standard pattern Learn1 is moved from the learning data area Temp2 to the effective learning data area Temp, and the rest is deleted (step S37).

【００５１】同時に、一杯になった学習データ領域Ｔｅ
ｍｐ１から空白の生じた他の学習データ領域Ｔｅｍｐ２
へ抽出特徴データを転送し格納し（ステップＳ３８）、
この学習データ領域Ｔｅｍｐ２をクラスタリングするこ
とによって、短期学習標準パターンＬｅａｒｎ１を作成
する（ステップＳ３９）。これの繰り返しによって、上
記有効学習データ領域Ｔｅｍｐが一杯になったことが判
定される（ステップＳ４０）と、この有効学習データ領
域Ｔｅｍｐをクラスタリングすることにより長期学習標
準パターンＬｅａｒｎ２を作成する（ステップ４１）。At the same time, the extracted feature data is transferred from the full learning data area Temp1 to the other learning data area Temp2, which has been emptied, and stored there (step S38), and the short-term learning standard pattern Learn1 is created by clustering this learning data area Temp2 (step S39). When, through repetition of this process, it is determined that the effective learning data area Temp has become full (step S40), the long-term learning standard pattern Learn2 is created by clustering the effective learning data area Temp (step S41).
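The buffer rotation in steps S34 to S41 can be sketched as below. The sketch rests on several assumptions: the class `LearningStores`, its capacity fields, and the `is_high_contribution` predicate (standing in for the outcome of the step S33 evaluation) are hypothetical names, `cluster` is whatever clustering routine is used, and Temp is left untouched after Learn2 is created because the text does not say whether it is cleared.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class LearningStores:
    capacity: int                 # size at which Temp1 counts as full (assumed)
    temp_capacity: int            # size at which Temp counts as full (assumed)
    temp: List[Any] = field(default_factory=list)   # effective learning data area Temp
    temp1: List[Any] = field(default_factory=list)  # learning data area Temp1
    temp2: List[Any] = field(default_factory=list)  # learning data area Temp2
    learn1: Optional[Any] = None  # short-term learning standard pattern
    learn2: Optional[Any] = None  # long-term learning standard pattern


def normal_learning_step(
    s: LearningStores,
    features: Any,
    recognized_first: bool,
    is_high_contribution: Callable[[Any], bool],
    cluster: Callable[[List[Any]], Any],
) -> None:
    if recognized_first:                      # S34: yes, nothing to store
        return
    s.temp1.append(features)                  # S35
    if len(s.temp1) < s.capacity:             # S36: Temp1 not yet full
        return
    # S37: keep high-contribution data from Temp2 in Temp, delete the rest
    s.temp.extend(f for f in s.temp2 if is_high_contribution(f))
    s.temp2.clear()
    # S38: move the contents of the full Temp1 into the emptied Temp2
    s.temp2.extend(s.temp1)
    s.temp1.clear()
    s.learn1 = cluster(s.temp2)               # S39: rebuild Learn1
    if len(s.temp) >= s.temp_capacity:        # S40: Temp is full
        s.learn2 = cluster(s.temp)            # S41: build Learn2


# Illustrative run with string placeholders instead of real feature vectors.
stores = LearningStores(capacity=2, temp_capacity=1, temp2=["a", "b"])
for item in ["x", "y"]:
    normal_learning_step(stores, item, False, lambda f: f == "a", list)
```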

【００５２】なお、上記図９のフローチャートにおける
ステップ３３の動作で、短期学習標準パターンＬｅａｒ
ｎ１および長期学習標準パターンＬｅａｒｎ２それぞれ
の貢献度（ｓｃｏｒｅ）が上述した表５の状態にあると
きの貢献度評価時のアルゴリズムは、図１０に示す通り
である。すなわち、特徴データ（単語）の発声回数が所
定値（ｃｏｎｓｔ１）を超えたか否かを判定し（ステッ
プＳ５０）、発声回数が所定値（ｃｏｎｓｔ１）を超え
たときはヒット率が所定値（ｃｏｎｓｔ２）以下である
か否かを判定し（ステップＳ５１）、ヒット率が所定値
（ｃｏｎｓｔ２）以下である場合は、学習データ領域Ｔ
ｅｍｐ１、Ｔｅｍｐ２へ格納されている特徴データを削
除する（ステップＳ５２）。これによって、計算機の消
費パワーを可及的に低減することが可能である。The algorithm used for contribution evaluation in step S33 of the flowchart of FIG. 9, when the contribution degrees (score) of the short-term learning standard pattern Learn1 and the long-term learning standard pattern Learn2 are in the state shown in Table 5 above, is as shown in FIG. 10. That is, it is determined whether the number of utterances of the feature data (word) has exceeded a predetermined value (const1) (step S50). If it has, it is determined whether the hit rate is equal to or less than a predetermined value (const2) (step S51). If the hit rate is equal to or less than the predetermined value (const2), the feature data stored in the learning data areas Temp1 and Temp2 is deleted (step S52). This makes it possible to reduce the power consumed by the computer as much as possible.
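The decision in steps S50 to S52 reduces to two threshold checks, sketched below. The function name `should_delete` and the example thresholds are assumptions; the text names the thresholds const1 and const2 but gives no values.

```python
def should_delete(bat: int, hit: int, const1: int, const2: float) -> bool:
    """Decide whether a word's stored feature data should be deleted (S52).

    S50: only words uttered more than const1 times are considered.
    S51: among those, a hit rate (hit / bat) of const2 or less triggers deletion.
    """
    if bat <= const1 or bat == 0:   # S50: not enough utterances to judge
        return False
    return hit / bat <= const2      # S51 leading to S52


# Illustrative thresholds only: judge after more than 5 utterances, cut at 20 %.
rarely_hit = should_delete(bat=10, hit=1, const1=5, const2=0.2)
too_few = should_delete(bat=3, hit=0, const1=5, const2=0.2)
often_hit = should_delete(bat=10, hit=5, const1=5, const2=0.2)
```

Waiting until the utterance count exceeds const1 avoids discarding a word on the basis of only a few samples.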

【００５３】[0053]

【発明の効果】以上のように、請求項１および請求項
９の発明によれば、複数の音声識別処理手段を用いて、
これらが有する比較的容易な認識方式を相補的に使用
し、それらの結果を纏めて総合評価することにより、認
識されるべき対象が複雑な場合であっても、高度で厳し
い要求仕様を満たす認識方式を確立することなく、認識
率の著しい向上を実現することができるという効果を奏
する。As described above, according to the inventions of claims 1 and 9, a plurality of voice recognition processing means are used, the relatively simple recognition methods they have are used complementarily, and their results are collected and comprehensively evaluated. As a result, even when the object to be recognized is complex, a marked improvement in the recognition rate can be achieved without establishing a recognition method that satisfies advanced and strict requirement specifications.

【００５４】また、請求項２および請求項１０の発明に
よれば、上記のような最終音声認識結果を出力しなが
ら、その結果に基づくユーザーからのフィードバックな
どをもとにして、上記複数の音声識別処理手段を現在の
入力音声データにおける認識の適合度が向上するように
一括して学習させることにより、自律的に作動環境の変
化や実際の認識対象に漸次的に適応させて、認識率の経
年的な向上を図ることができるという効果を奏する。Further, according to the inventions of claims 2 and 10, while the final voice recognition result is output as described above, the plurality of voice recognition processing means are trained collectively, based on user feedback on that result and the like, so that their recognition fit to the current input voice data improves. The means thereby adapt autonomously and gradually to changes in the operating environment and to the actual recognition targets, so that the recognition rate improves over the years.

【００５５】また、請求項３および請求項１１の発明に
よれば、複数の音声認識処理手段それぞれの認識結果の
うち、所定基準を満たす認識処理手段の認識結果を選択
し、その選択された認識結果を総合評価して最終結果を
出力することが可能で、認識率のより一層の向上を図る
ことができる。According to the inventions of claims 3 and 11, from among the recognition results of the plurality of voice recognition processing means, the recognition results of those means that satisfy a predetermined criterion are selected, and the selected recognition results are comprehensively evaluated to output the final result, so that the recognition rate can be improved still further.

【００５６】また、上記の音声認識処理の実行にあたっ
て、請求項４および請求項１２のように、入力音声を複
数の音声区間切り出し手段により切り出して複数の音声
認識処理手段によって認識処理させることにより、種々
の条件によって優劣が変化する複数の切り出し手段のう
ち、音声入力の時点で最も精度のよい切り出し手段を効
率的に使用して音声区間切り出し精度を高め、その結果
として、音声認識率の一層の向上を図ることができる。Further, in executing the above voice recognition processing, when the input voice is cut out by a plurality of voice section cutout means and recognized by a plurality of voice recognition processing means, as in claims 4 and 12, the cutout means that is most accurate at the time of voice input can be used efficiently from among a plurality of cutout means whose relative performance varies with conditions. This increases the voice section cutout accuracy and, as a result, further improves the voice recognition rate.

【００５７】また、上記の音声認識処理の実行にあたっ
て、請求項５および請求項１３のように、それぞれが異
なる種類の複数の音声認識処理手段を使用することによ
り、認識可能な語彙数を音声認識処理手段の数だけ計算
コストを増やさずに容易に増大することができる。Further, in executing the above voice recognition processing, by using a plurality of voice recognition processing means of mutually different types, as in claims 5 and 13, the recognizable vocabulary can easily be increased in proportion to the number of voice recognition processing means without increasing the calculation cost.

【００５８】また、請求項６および請求項１４の発明に
よれば、工場出荷時などに予め登録された音声データを
ストックしている標準パターンをもつ音声認識手段とそ
の標準パターンを使っての音声認識処理時に抽出される
特徴データを格納して作成される短期学習標準パターン
および長期学習標準パターンをもつ音声認識手段という
３種類の音声認識手段を使用することによって、各パタ
ーンの学習結果を効率的に反映させて、標準パターンの
良否に関係なく、かつ、例えばニューラルネットのよう
な多大な計算量を要することなく、種々の状況変化に対
応して高速で、かつ、柔軟な音声認識を実現し、認識率
の向上を図ることができるという効果を奏する。Further, according to the inventions of claims 6 and 14, three types of voice recognition means are used: a voice recognition means having a standard pattern in which voice data registered in advance, for example at the time of factory shipment, is stocked, and voice recognition means having a short-term learning standard pattern and a long-term learning standard pattern, which are created by storing the feature data extracted during voice recognition processing with the standard pattern. The learning results of each pattern are thereby reflected efficiently, so that fast and flexible voice recognition that responds to various changes in conditions is realized regardless of the quality of the standard pattern and without the large amount of calculation required by, for example, a neural network, and the recognition rate can be improved.

【００５９】さらに、上記のような音声認識にあたっ
て、請求項７および請求項１５のように、複数の音声認
識処理手段それぞれの認識結果を所定基準により纏めて
総合的に評価して最終結果を出力することにより、高度
で厳しい要求仕様を満たす高い認識率を得ることができ
る。Further, in the above voice recognition, by collecting the recognition results of the plurality of voice recognition processing means according to a predetermined standard, evaluating them comprehensively, and outputting the final result, as in claims 7 and 15, a high recognition rate that satisfies advanced and strict requirement specifications can be obtained.

【００６０】特に、上記の音声認識にあたって、請求項
８および請求項１６のように、上記短期学習標準パター
ンおよび長期学習標準パターン個々の利用頻度を所定の
尺度で評価して不必要な学習標準パターンを消去し、有
効な学習標準パターンのみを残すことで、計算機の消費
パワーを低減することができる。In particular, in the above voice recognition, by evaluating the frequency of use of each of the short-term learning standard pattern and the long-term learning standard pattern on a predetermined scale, erasing unnecessary learning standard patterns, and leaving only the effective learning standard patterns, as in claims 8 and 16, the power consumed by the computer can be reduced.

[Brief description of drawings]

【図１】この発明の実施例１による音声認識装置の概略
構成図である。FIG. 1 is a schematic configuration diagram of a voice recognition device according to a first embodiment of the present invention.

【図２】実施例１による音声認識装置の動作を説明する
フローチャートである。FIG. 2 is a flowchart illustrating an operation of the voice recognition device according to the first embodiment.

【図３】この発明の実施例２による音声認識装置の概略
構成図である。FIG. 3 is a schematic configuration diagram of a voice recognition device according to a second embodiment of the present invention.

【図４】この発明の実施例３による音声認識装置の概略
構成図である。FIG. 4 is a schematic configuration diagram of a voice recognition device according to a third embodiment of the present invention.

【図５】図１に示す実施例１における音声認識装置の２
つの音声認識処理手段それぞれの具体的な構成を示すブ
ロック図である。FIG. 5 is a block diagram showing the specific configuration of each of the two voice recognition processing means of the voice recognition device in the first embodiment shown in FIG. 1.

【図６】図４に示す実施例４の音声特徴データの構成図
である。FIG. 6 is a configuration diagram of voice feature data according to the fourth embodiment illustrated in FIG. 4.

【図７】図４に示す実施例４の標準パターンの構成図で
ある。FIG. 7 is a configuration diagram of the standard patterns of the fourth embodiment shown in FIG. 4.

【図８】図４に示す実施例４の初期状態からの学習処理
アルゴリズムを説明するフローチャートである。FIG. 8 is a flowchart illustrating the learning processing algorithm from the initial state according to the fourth embodiment shown in FIG. 4.

【図９】図４に示す実施例４の通常学習処理アルゴリズ
ムを説明するフローチャートである。FIG. 9 is a flowchart illustrating the normal learning processing algorithm of the fourth embodiment shown in FIG. 4.

【図１０】図４に示す実施例４の貢献度評価時のアルゴ
リズムを説明するフローチャートである。FIG. 10 is a flowchart illustrating the algorithm used for contribution evaluation in the fourth embodiment shown in FIG. 4.

【図１１】従来の音声認識装置の概略構成図である。FIG. 11 is a schematic configuration diagram of a conventional voice recognition device.

【図１２】従来の音声認識学習装置の概略構成図であ
る。FIG. 12 is a schematic configuration diagram of a conventional speech recognition learning device.

[Explanation of symbols]

１Ａ，１Ｂ音声認識処理手段２，２Ａ，２Ｂ音声区間切り出し手段３認識結果統合手段４統合学習手段１０貢献度評価部Ｄｅｆａｕｌｔ標準パターンＬｅａｒｎ１短期学習標準パターンＬｅａｒｎ２長期学習標準パターン
1A, 1B: Voice recognition processing means
2, 2A, 2B: Voice section cutout means
3: Recognition result integrating means
4: Integrated learning means
10: Contribution degree evaluation unit
Default: Standard pattern
Learn1: Short-term learning standard pattern
Learn2: Long-term learning standard pattern

フロントページの続き (72)発明者橋本政朋京都府京都市左京区高野泉町68 ベルジュ 56−606 Continuation of the front page: (72) Inventor Masatomo Hashimoto, 68 Takano Izumi-cho, Sakyo-ku, Kyoto-shi, Kyoto, Berge 56-606

Claims

1. A voice recognition device comprising: a plurality of voice recognition processing means; and a recognition result integrating means for collectively evaluating the recognition results of the plurality of voice recognition processing means according to a predetermined standard and outputting a final result.

2. A voice recognition device comprising: a plurality of voice recognition processing means; a recognition result integrating means for collectively evaluating the recognition results of the plurality of voice recognition processing means according to a predetermined standard and outputting a final result; and an integrated learning means for integrating data for collectively training the plurality of voice recognition processing means.

3. The voice recognition device according to claim 1 or 2, wherein the recognition result integrating means has a recognition selection function of selecting, from among the recognition results of the plurality of voice recognition processing means, the recognition results of the recognition processing means that satisfy a predetermined criterion, comprehensively evaluating the selected recognition results, and outputting a final result.

4. The voice recognition device according to any one of claims 1 to 3, wherein a plurality of voice section cutout means are provided before the plurality of voice recognition processing means.

5. The voice recognition device according to claim 1, wherein the plurality of voice recognition processing means are of different types.

6. The voice recognition device according to claim 2, wherein the voice recognition processing means comprises three types of voice recognition means, namely a voice recognition means having a standard pattern, a voice recognition means having a short-term learning standard pattern, and a voice recognition means having a long-term learning standard pattern, and wherein the integrated learning means reflects the learning result in the short-term learning standard pattern over a first predetermined period and reflects the learning result in the long-term learning standard pattern over a second predetermined period longer than the first predetermined period.

7. The voice recognition device according to claim 6, wherein the integrated learning means collectively evaluates the recognition results of the three types of voice recognition means according to a predetermined standard and outputs a final result.

8. The voice recognition device according to claim 6 or 7, further comprising a pattern evaluation means for evaluating the frequency of use of each of the short-term learning standard pattern and the long-term learning standard pattern on a predetermined scale and erasing unnecessary learning standard patterns based on the evaluation.

9. A voice recognition method, wherein the recognition results obtained by each of the plurality of voice recognition processing means are collectively evaluated according to a predetermined standard and a final result is output.

10. A voice recognition method, wherein the recognition results obtained by each of a plurality of voice recognition processing means are collectively evaluated according to a predetermined standard, a final result is output, and the plurality of voice recognition processing means are collectively trained.

11. The voice recognition method according to claim 10, wherein a recognition result satisfying a predetermined criterion is selected from the recognition results obtained by each of the plurality of voice recognition processing means, and the selected recognition result is comprehensively evaluated to output a final result.

12. The voice recognition method according to any one of claims 9 to 11, wherein a plurality of voice section cutout means for cutting out the input voice into sections are provided, and the voice of the sections cut out by the plurality of cutout means is recognized by the plurality of voice recognition processing means.

13. The voice recognition method according to claim 9, wherein voice recognition processing means of mutually different types are used as the plurality of voice recognition processing means.

14. The voice recognition method according to claim 10, wherein three types of voice recognition means are used as the voice recognition processing means, namely a voice recognition means having a standard pattern, a voice recognition means having a short-term learning standard pattern, and a voice recognition means having a long-term learning standard pattern, and wherein the learning result is reflected in the short-term learning standard pattern over a first predetermined period and is reflected in the long-term learning standard pattern over a second predetermined period longer than the first predetermined period.

15. The voice recognition method according to claim 14, wherein the recognition results obtained by each of the three types of voice recognition means are collectively evaluated according to a predetermined standard and a final result is output.

16. The voice recognition method according to claim 10, wherein the frequency of use of each of the short-term learning standard pattern and the long-term learning standard pattern is evaluated on a predetermined scale, and unnecessary learning standard patterns are erased based on the evaluation.