JP2005227673A

JP2005227673A - Voice recognition device

Info

Publication number: JP2005227673A
Application number: JP2004038102A
Authority: JP
Inventors: Seiji Mori
Original assignee: Clarion Co Ltd
Current assignee: Faurecia Clarion Electronics Co Ltd
Priority date: 2004-02-16
Filing date: 2004-02-16
Publication date: 2005-08-25

Abstract

PROBLEM TO BE SOLVED: To provide a voice recognition device that can be operated easily with almost no reliance on sight.

SOLUTION: When a negative evaluation is input, the voice recognition device announces the phoneme text of the next recognition candidate and prompts the user to evaluate it. A positive evaluation is input by stroking a touch pad 21 and a negative evaluation by patting the touch pad 21, so that evaluations can be entered by intuitive operations.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to a speech recognition device that recognizes phoneme text from a speech signal.

Speech recognition devices that recognize phoneme text (an uttered character string) from a speech signal are conventionally known. In one such device, a plurality of candidate phoneme texts are recognized from the user's speech, the user operates a predetermined input device to select the correctly recognized phoneme text, and this feedback from the user is used to improve the subsequent recognition rate (see, for example, Patent Document 1).

Patent Document 1: JP-A-8-202388

In recent years, speech recognition devices of this type have been built into in-vehicle equipment such as car navigation systems so that various operations can be performed by voice.

In the conventional speech recognition device, however, multiple operation buttons are arranged at different locations, and to settle on the correct phoneme text the user must both select one phoneme text from the candidates (a selection operation) and confirm the selected text (a decision operation). These operations rely on sight, which is undesirable for operating in-vehicle equipment.

The present invention has been made in view of the above circumstances, and its object is to provide a speech recognition device that can be operated easily with little reliance on sight.

To solve the above problem, the present invention provides a speech recognition device comprising: speech recognition means for recognizing a plurality of candidate phoneme texts from a speech signal; notification means for notifying the user of one phoneme text among the candidates recognized by the speech recognition means; and an input device for inputting a positive or negative evaluation of the phoneme text notified by the notification means, wherein, when a negative evaluation is input via the input device, the notification means notifies another phoneme text among the plurality of candidates so as to prompt input of an evaluation of that other phoneme text. With this configuration, a negative evaluation automatically causes another phoneme text to be notified for evaluation, so the user does not need to perform a separate operation to select a phoneme text. In this configuration, the input device preferably allows the positive evaluation and the negative evaluation to be input by different modes of operation.

The present invention also provides a speech recognition device comprising: speech recognition means for recognizing phoneme text from a speech signal; notification means for notifying the phoneme text recognized by the speech recognition means; and an input device for inputting a positive or negative evaluation of the phoneme text notified by the notification means, wherein the input device allows the positive evaluation and the negative evaluation to be input by different modes of operation. Because the two evaluations are entered by different operations, the user can input an evaluation easily. In this configuration, the positive evaluation is preferably input by a stroking operation; in that case, a swingably mounted operation body is preferably provided, and the stroking operation is detected by detecting the swing of the operation body.

Also in this configuration, the negative evaluation is preferably input by a tapping operation; in that case, an operation body that can move up and down is preferably provided, and the tapping operation is detected by detecting the vertical movement of the operation body. Each of the above configurations may further include sound emitting means for emitting a sound according to the evaluation input via the input device, and process execution means for executing a process corresponding to the phoneme text when a positive evaluation is input.

This configuration may further include storage means for storing a plurality of recognition algorithms and a plurality of recognition parameters. In that case, the speech recognition means performs speech recognition using one of the stored recognition algorithms and one of the stored recognition parameters; based on the combination of recognition algorithm and recognition parameters used and the frequencies of positive and negative evaluations input for that combination, it determines a combination of recognition algorithm and recognition parameters that has a high correlation and a high rate of positive evaluations, and thereafter performs speech recognition with the determined combination.

According to the present invention, the user does not need to perform operations for choosing a phoneme text from among multiple recognition candidates (selection and decision operations) and only needs to input evaluations. Furthermore, because the input device accepts positive and negative evaluations through different modes of operation, it can be operated without relying on sight.

Embodiments of the present invention will now be described with reference to the drawings.

FIG. 1 is a block diagram outlining an in-vehicle speech recognition device 10 according to the present embodiment. The speech recognition device 10 is configured as in-vehicle equipment such as a car navigation device or a car audio device. Apart from the components related to speech recognition, its configuration is substantially the same as that of conventional in-vehicle equipment, so only the speech recognition components are described.

A control unit (process execution means) 11 functions as a speech recognition engine that performs speech recognition processing to recognize phoneme text from a speech signal input through a microphone 12, and also controls the speech recognition device 10 as a whole. An algorithm database 13 (hereinafter, "database" is abbreviated as "DB") stores a plurality of recognition algorithms, and a recognition parameter DB 14 stores a plurality of recognition parameters (for example, acoustic parameters by gender, age group, and region). The recognition algorithms may include, for example, the DP matching method, which uses dynamic programming to normalize variations in utterance duration; a statistical method based on hidden Markov models (HMMs); or the n-gram method, which uses statistical probabilities over sequences of n words.
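To illustrate the DP matching approach mentioned above, the following is a minimal dynamic time warping (DTW) sketch. It is not the device's implementation: the per-frame feature vectors, the Euclidean local cost, the three-way step pattern, and the function names `dtw_distance` and `recognize` are assumptions made for illustration. Ranking vocabulary words by their warped distance to stored templates yields an ordered candidate list of the kind the device later notifies one by one.

```python
import math
from typing import Dict, List, Sequence

Frames = Sequence[Sequence[float]]  # one feature vector per analysis frame (assumed)


def dtw_distance(a: Frames, b: Frames) -> float:
    """DP matching (dynamic time warping) distance between two frame sequences.

    The warping lets a slowly or quickly spoken utterance align with a template
    of a different length; the local cost is the Euclidean frame distance.
    """
    n, m = len(a), len(b)
    # cost[i][j]: minimum accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            # Allowed moves: advance both, stretch the input, or stretch the template
            cost[i][j] = local + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]


def recognize(utterance: Frames, templates: Dict[str, Frames]) -> List[str]:
    """Return vocabulary words ordered from closest to farthest template (hypothetical helper)."""
    scored = [(dtw_distance(utterance, tpl), word) for word, tpl in templates.items()]
    return [word for _, word in sorted(scored)]
```

For example, an utterance whose frames repeat a template's values at a slower rate still aligns with that template at zero cost, so that word is ranked first.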

That is, by combining the recognition algorithms stored in the algorithm DB 13 with the recognition parameters stored in the recognition parameter DB 14, the control unit 11 can perform speech recognition using a plurality of recognition schemes.

A condition storage memory 15 is a non-volatile memory that stores speech recognition conditions. It also stores learning data and data for a statistical language model (a model that estimates the probability of a sentence from the occurrence probabilities of sequences of N consecutive words).
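As an illustration of the statistical language model described above, the sketch below uses a bigram model (N = 2), which scores a sentence as the product of the conditional probabilities of each word given the previous one. The add-alpha smoothing, the `<s>`/`</s>` sentence markers, the toy romanized command corpus, and the helper names are assumptions, not details taken from this document.

```python
import math
from collections import Counter
from typing import List, Tuple


def train_bigram(sentences: List[List[str]]) -> Tuple[Counter, Counter]:
    """Count history words and bigrams, padding each sentence with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        unigrams.update(tokens[:-1])                  # words that act as a history
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # (previous word, word) pairs
    return unigrams, bigrams


def sentence_log_prob(words: List[str], unigrams: Counter, bigrams: Counter,
                      vocab_size: int, alpha: float = 1.0) -> float:
    """log P(sentence) as the sum of log P(w_i | w_{i-1}) with add-alpha smoothing.

    vocab_size counts every token that can follow a history, including </s>.
    """
    tokens = ["<s>"] + words + ["</s>"]
    logp = 0.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        num = bigrams[(prev, cur)] + alpha
        den = unigrams[prev] + alpha * vocab_size
        logp += math.log(num / den)
    return logp
```

Trained on a few command phrases such as `["chizu", "kakudai"]`, the model assigns that word order a higher probability than `["kakudai", "chizu"]`, which is how such a model can help choose between acoustically similar candidates.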

Under the control of the control unit 11, a speech synthesis unit 16 performs speech synthesis to generate a synthesized speech signal and outputs it to a speaker 17, which emits the synthesized speech. The speaker 17 may be a speaker already installed in the vehicle in which the speech recognition device 10 is mounted.

A user evaluation input device 20 is an input device through which the user inputs evaluations of speech recognition results. Its configuration is described in detail below.

FIG. 2 is a top view of the user evaluation input device 20, and FIG. 3 is a side sectional view. The user evaluation input device 20 includes a touch pad (operation body) 21 operated by the user, and an outer case 24 consisting of a support base 22 that supports the touch pad 21 and a wall portion 23 surrounding the touch pad 21. The device is attached to the main body (not shown) of the speech recognition device 10.

More specifically, the touch pad 21 is made of a resin such as plastic and includes a pad portion 21A with a gently curved upper surface and a spring insertion portion 21B formed integrally beneath the pad portion 21A. A coil spring 25 made of a conductive material such as metal is inserted into the spring insertion portion 21B, and the other end of the coil spring 25 is fitted over a protrusion 22A on the upper surface of the support base 22. The touch pad 21 is thereby supported on the support base 22 so that it can both swing and move up and down.

The wall portion 23 is a cylinder whose inner diameter is larger than the outer diameter of the pad portion 21A, and it is fixed to the support base 22 so as to leave a gap between itself and the pad portion 21A of the touch pad 21, as shown in FIGS. 2 and 3.

The touch pad 21 has an electrode 30A around the periphery of the pad portion 21A, and the electrode 30A is electrically connected to the coil spring 25. An electrode 30B is provided on the lower surface of the spring insertion portion 21B. The wall portion 23 has an electrode 30C at a position facing the electrode 30A of the pad portion 21A, and the support base 22 has an electrode 30D on the upper surface of the protrusion 22A (that is, at a position facing the electrode 30B). The support base 22 further has three metal terminals (conduction terminals) 40a, 40b, and 40c, which are electrically connected to the electrode 30C, the coil spring 25, and the electrode 30D, respectively. The control unit 11 of the speech recognition device 10 detects continuity between pairs of the three metal terminals 40a, 40b, and 40c.

With this configuration, when the touch pad 21 is swung, the electrode 30A of the pad portion 21A contacts the electrode 30C of the wall portion 23, and the metal terminals 40a and 40b become conductive via the coil spring 25 (that is, this arrangement functions as swing (stroke) operation detection means). In other words, as shown in FIG. 4, when the user performs an operation on the touch pad 21 that imitates stroking a person, the metal terminals 40a and 40b become conductive. In the present embodiment, this operation is assigned to input of a positive evaluation ("correctly recognized"), and the control unit 11 detects that the user has input a positive evaluation by detecting continuity between the metal terminals 40a and 40b.

When the touch pad 21 is pressed, the electrode 30B of the touch pad 21 contacts the electrode 30D of the support base 22, and the metal terminals 40b and 40c become conductive (that is, this arrangement functions as vertical movement (tap) operation detection means). In other words, as shown in FIG. 5, when the user performs an operation on the touch pad 21 that imitates patting a person, the metal terminals 40b and 40c become conductive. In the present embodiment, this operation is assigned to input of a negative evaluation ("incorrectly recognized"), and the control unit 11 detects that the user has input a negative evaluation by detecting continuity between the metal terminals 40b and 40c.
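The mapping from terminal continuity to an evaluation, as described above, can be sketched as follows. Representing a reading as the set of terminals found to be connected, the names `Evaluation` and `decode_evaluation`, and the choice to discard ambiguous readings (both pairs conducting at once) are assumptions; a real controller would also need contact debouncing, which this sketch omits.

```python
from enum import Enum
from typing import FrozenSet, Optional


class Evaluation(Enum):
    POSITIVE = "positive"  # stroke: pad swings, electrode 30A touches 30C
    NEGATIVE = "negative"  # tap: pad moves down, electrode 30B touches 30D


# Terminal pair that conducts for each operation, per the description
TERMINAL_PAIRS = {
    frozenset({"40a", "40b"}): Evaluation.POSITIVE,
    frozenset({"40b", "40c"}): Evaluation.NEGATIVE,
}


def decode_evaluation(conducting: FrozenSet[str]) -> Optional[Evaluation]:
    """Map the terminals currently found to be connected to an evaluation.

    Returns None when no known pair conducts (pad at rest) or when the
    reading is ambiguous because both pairs conduct at the same time.
    """
    matches = [ev for pair, ev in TERMINAL_PAIRS.items() if pair <= conducting]
    return matches[0] if len(matches) == 1 else None
```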

Next, the speech recognition procedure of the speech recognition device 10 will be described. FIG. 6 is a flowchart of the procedure. When the power is turned on, the control unit 11 first sets the initial conditions (step S1): it reads the learning data stored in the condition storage memory 15, selects the default recognition algorithm from the recognition algorithms registered in the algorithm DB 13, and selects the default parameters from the recognition parameters registered in the recognition parameter DB 14.

The control unit 11 then executes speech recognition processing when speech is input through the microphone 12 (step S2). Because this processing is substantially the same as in conventional devices, it is described only briefly: the control unit 11 applies the recognition algorithm and recognition parameters selected in step S1 to the speech signal input through the microphone 12, also using the learning data if any was read, computes a plurality of phoneme texts, sorts them in order of priority, and stores them in a temporary memory (not shown).

Next, the control unit 11 performs notification processing to notify the user of the most likely phoneme text among the recognition candidates (the phoneme text with the highest priority) (step S3). Specifically, the control unit 11 causes the speech synthesis unit 16 to emit synthesized speech corresponding to that phoneme text from the speaker 17. The user (for example, the driver) can thus learn the recognition result without relying on sight. Because this notification also prompts the user to evaluate whether the recognition result is correct, the announcement may take the form of a question, for example "Is XXX (the recognized phoneme text) correct?". If the speech recognition device 10 includes a display device such as a liquid crystal display, the recognition result may also be shown as text or an image during the notification processing, either together with or instead of the announcement.

After the notification processing, the control unit 11 waits for the user to input an evaluation of the recognition result. When an evaluation is input via the user evaluation input device 20 (step S4), the process proceeds to step S5.

As described above, a positive evaluation is input by stroking the touch pad 21, as shown in FIG. 4; from the user's point of view this is an easy operation and the natural gesture a person makes when approving. Likewise, a negative evaluation is input by patting the touch pad 21, as shown in FIG. 5, which is also easy and is the natural gesture a person makes when disapproving. The user can therefore input evaluations intuitively.

When a negative evaluation is input, the control unit 11 returns to step S3 and notifies the next phoneme text so as to prompt an evaluation of the next recognition candidate. Steps S3 to S5 are repeated until a positive evaluation is input, so the candidate phoneme texts are notified in turn. From the user's point of view, merely inputting a negative evaluation causes the next candidate to be selected and announced automatically, with no separate operation needed to select a phoneme text.
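Steps S3 to S5 amount to a simple loop over the prioritized candidates. In this sketch, `announce` and `wait_for_evaluation` are hypothetical callbacks standing in for the speech synthesis unit 16 and the user evaluation input device 20, and returning `None` once every candidate has been rejected is an assumption, since the description does not say what happens in that case.

```python
from typing import Callable, Iterable, Optional


def confirm_candidate(
    candidates: Iterable[str],
    announce: Callable[[str], None],
    wait_for_evaluation: Callable[[], bool],
) -> Optional[str]:
    """Announce candidates in priority order until one is positively evaluated.

    announce: plays a prompt to the user (e.g. through speech synthesis).
    wait_for_evaluation: blocks until the user strokes (True) or pats (False).
    Returns the accepted phoneme text, or None if every candidate is rejected.
    """
    for text in candidates:              # step S3: notify the next candidate
        announce(f"{text}, is that right?")
        if wait_for_evaluation():        # steps S4/S5: a positive evaluation ends the loop
            return text
        # A negative evaluation simply falls through to the next candidate.
    return None
```

The user never selects a list entry; each pat advances the loop, which is the point made in the paragraph above.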

When a positive evaluation is input, on the other hand, the control unit 11 executes the process corresponding to the positively evaluated phoneme text. For example, in a car navigation device, if the phoneme text is "chizu kakudai" (enlarge map), the displayed map is enlarged; in a car audio device, if the phoneme text is "shi-di saisei" (play CD), CD playback is started. Thus, the only operation the speech recognition device 10 requires of the user during speech recognition is inputting a positive or negative evaluation.
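Executing a process for the accepted phoneme text can be illustrated as a lookup table from recognized text to a handler. The table contents, the romanized keys, and the placeholder handlers `enlarge_map` and `play_cd` are hypothetical; they stand in for the navigation and audio functions of the host equipment.

```python
from typing import Callable, Dict


def enlarge_map() -> str:
    return "map enlarged"          # placeholder for the navigation unit's zoom-in


def play_cd() -> str:
    return "CD playback started"   # placeholder for the audio unit's CD player


# Hypothetical table mapping confirmed phoneme text (romanized) to a handler
COMMANDS: Dict[str, Callable[[], str]] = {
    "chizu kakudai": enlarge_map,
    "shi-di saisei": play_cd,
}


def execute(phoneme_text: str) -> str:
    """Run the handler registered for the accepted phoneme text, if any."""
    handler = COMMANDS.get(phoneme_text)
    if handler is None:
        return f"no command registered for {phoneme_text!r}"
    return handler()
```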

Next, based on the evaluation results, the control unit 11 performs statistical processing, such as calculating the correlation between the combination of recognition algorithm and recognition parameters used and the frequencies of "positive" and "negative" evaluations (step S6), and selects a combination of recognition algorithm and recognition parameters with a high correlation and a high frequency of positive evaluations (step S7).
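Steps S6 and S7 can be approximated as below. This is a simplification rather than the correlation calculation the description mentions: it ranks each (algorithm, parameter set) combination by its rate of positive evaluations and ignores combinations with too few evaluations. The `min_trials` threshold, the string labels, and the function name are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Combo = Tuple[str, str]  # (recognition algorithm, recognition parameter set)


def select_combination(history: Iterable[Tuple[Combo, bool]], min_trials: int = 5) -> Combo:
    """Pick the combination whose evaluations were most often positive.

    history: (combination used, True for positive / False for negative) pairs.
    Combinations with fewer than min_trials evaluations are skipped so that a
    single lucky result cannot win.
    """
    counts: Dict[Combo, List[int]] = defaultdict(lambda: [0, 0])  # [positives, total]
    for combo, positive in history:
        counts[combo][0] += int(positive)
        counts[combo][1] += 1
    eligible = {c: pos / tot for c, (pos, tot) in counts.items() if tot >= min_trials}
    if not eligible:
        raise ValueError("not enough evaluations to choose a combination")
    return max(eligible, key=eligible.get)
```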

The control unit 11 then performs speech recognition processing (step S2) on subsequent speech input through the microphone 12 using the combination of recognition algorithm and recognition parameters selected in step S7. When, for example, the power is turned off, it updates the learning data stored in the condition storage memory 15 so that the combination in use at that time becomes the default (learning processing, step S8). In this way, the recognition algorithm and recognition parameters are improved in real time to reflect the user's evaluations. This completes the speech recognition procedure of the speech recognition device 10.
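Step S8 makes the selected combination the default for the next power-on, where step S1 reads it back. A minimal sketch, assuming the condition storage memory 15 can be modeled as a JSON file; the file format, the key names, the fallback `DEFAULT_COMBO`, and both function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Tuple

DEFAULT_COMBO = ("DP matching", "default")  # hypothetical factory default


def save_default(path: Path, algorithm: str, parameters: str) -> None:
    """Store the combination in use so it becomes the default (step S8)."""
    path.write_text(json.dumps({"algorithm": algorithm, "parameters": parameters}))


def load_default(path: Path) -> Tuple[str, str]:
    """Read the stored default at power-on (step S1), else the factory default."""
    if not path.exists():
        return DEFAULT_COMBO
    data = json.loads(path.read_text())
    return data["algorithm"], data["parameters"]
```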

As described above, when a negative evaluation is input, the speech recognition device 10 of the present embodiment announces the phoneme text of the next recognition candidate and prompts the user to evaluate it, so the user does not need to select a phoneme text from among multiple candidates (selection and decision operations) and only needs to input evaluations. Furthermore, because a positive evaluation is input by stroking the touch pad 21 and a negative evaluation by patting it, evaluations can be entered by easy, natural operations, which reduces the fatigue and stress of input. As a result, the user only needs to know where the touch pad 21 is in order to operate the device easily without relying on sight.

The embodiment described above illustrates one aspect of the present invention, does not limit the invention, and may be modified as desired within the scope of the present invention. The configurations shown in the embodiment are likewise not limiting and may be changed as appropriate without departing from the spirit of the invention. For example, although the embodiment inputs positive and negative evaluations by stroking and patting the touch pad 21, it suffices that the two evaluations can be input by different modes of operation, and other operation methods may be used.

In the embodiment described above, a sound may also be emitted from the speaker 17 in response to an input evaluation. For example, words of apology may be synthesized and spoken when a negative evaluation is input, and words of delight when a positive evaluation is input, as a reaction to the evaluation. The user can then operate the device interactively, which eases the stress caused by imperfect speech recognition.

Although the above embodiment applies the present invention to the in-vehicle speech recognition device 10, the invention is also applicable to speech recognition devices other than in-vehicle ones (for example, a personal computer on which a speech recognition application program is installed).

FIG. 1 is a block diagram outlining the speech recognition device according to the present embodiment.
FIG. 2 is a top view of the user evaluation input device of the speech recognition device.
FIG. 3 is a side sectional view of the user evaluation input device.
FIG. 4 is a diagram illustrating the input operation for a positive evaluation.
FIG. 5 is a diagram illustrating the input operation for a negative evaluation.
FIG. 6 is a flowchart showing the speech recognition procedure.

Explanation of symbols

10 Speech recognition device
11 Control unit
12 Microphone
16 Speech synthesis unit
17 Speaker
20 User evaluation input device
30A to 30D Electrodes

Claims

1. A speech recognition device comprising:
speech recognition means for recognizing a plurality of candidate phoneme texts from a speech signal;
notification means for notifying one phoneme text among the plurality of candidates recognized by the speech recognition means; and
an input device for inputting a positive or negative evaluation of the phoneme text notified by the notification means,
wherein, when a negative evaluation is input through the input device, the notification means notifies another phoneme text among the plurality of candidate phoneme texts so as to prompt input of an evaluation of that other phoneme text.

2. The speech recognition device according to claim 1, wherein the input device allows the positive evaluation and the negative evaluation to be input by different modes of operation.

3. A speech recognition device comprising:
speech recognition means for recognizing phoneme text from a speech signal;
notification means for notifying the phoneme text recognized by the speech recognition means; and
an input device for inputting a positive or negative evaluation of the phoneme text notified by the notification means,
wherein the input device allows the positive evaluation and the negative evaluation to be input by different modes of operation.

4. The speech recognition device according to claim 2, wherein the mode of operation for the positive evaluation is a stroking operation.

5. The speech recognition device according to claim 4, wherein the input device includes a swingably mounted operation body and detects the stroking operation by detecting the swing of the operation body.

6. The speech recognition device according to claim 2, wherein the mode of operation for the negative evaluation is a tapping operation.

7. The speech recognition device according to claim 6, wherein the input device includes an operation body mounted so as to be movable up and down and detects the tapping operation by detecting the vertical movement of the operation body.

8. The speech recognition device according to claim 1, further comprising sound emitting means for emitting a sound according to the evaluation input via the input device.

9. The speech recognition device according to claim 1, further comprising process execution means for executing a process corresponding to the phoneme text when the positive evaluation is input.

10. The speech recognition device according to claim 1, further comprising storage means for storing a plurality of recognition algorithms and a plurality of recognition parameters,
wherein the speech recognition means performs speech recognition based on one of the recognition algorithms and one of the recognition parameters stored in the storage means; determines, based on the combination of recognition algorithm and recognition parameters used for the speech recognition and the frequencies of positive and negative evaluations input for that combination, a combination of recognition algorithm and recognition parameters having a high correlation and a high rate of positive evaluations; and thereafter performs speech recognition with the determined combination.