JP3340163B2

JP3340163B2 - Voice recognition device

Info

Publication number: JP3340163B2
Application number: JP32813292A
Authority: JP
Inventors: 博松浦
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1992-12-08
Filing date: 1992-12-08
Publication date: 2002-11-05
Anticipated expiration: 2017-11-05
Also published as: JPH06175688A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Industrial Field of Application] The present invention relates to a speech recognition apparatus suitable for controlling equipment and the like by recognizing speech uttered by a human.

[0002]

[Prior Art] In a speech recognition apparatus of this type, speech uttered by a user is recognized by a predetermined recognition method, and the recognition results are displayed as selection candidates, for example the top three in descending order of similarity, so that the user can select one of them.

[0003]

[Problem to Be Solved by the Invention] Conventionally, however, when selection candidates were displayed in order of similarity under a particular recognition method, candidates that bear no resemblance to one another to the user's ear were frequently listed together, which struck the user as highly incongruous.

[0004] The present invention has been made in view of the above circumstances. Its object is to provide a speech recognition apparatus that, when recognition candidates obtained by speech recognition are output externally for the user to choose from, can prevent extremely dissimilar candidates from being displayed.

[0005]

[Means for Solving the Problem] The present invention comprises: speech input means for inputting speech uttered by a user; analysis and feature extraction means for analyzing the input speech and extracting features; recognition means for recognizing the speech using the extracted features and obtaining one or more recognition candidates; registration means in which similar candidates that are easily misheard for one another are grouped and registered in advance; selection candidate determining means for identifying, on the basis of the first-ranked recognition candidate obtained by the recognition means, a candidate group registered in the registration means, and for finding the candidates that match between the candidates belonging to this group and the second-ranked and subsequent recognition candidates obtained by the recognition means; and means for displaying, together, the first-ranked recognition candidate obtained by the recognition means and the candidates found by the selection candidate determining means, so that these candidates can be selected.

[0006]

[Operation] In the above configuration, speech uttered by the user is input by the speech input means and analyzed by the analysis and feature extraction means, which extracts its features. The recognition means then performs recognition processing using these features and obtains recognition candidates. The selection candidate determining means refers to the registration means on the basis of the first-ranked recognition candidate obtained by the recognition means. In the registration means, similar candidates that users are apt to mishear for one another are grouped and registered in advance. This registration information can be obtained experimentally by having a plurality of speakers utter words one at a time and tallying how a plurality of listeners recognized them.

[0007] The selection candidate determining means thus identifies a candidate group by referring to the registration means on the basis of the first-ranked recognition candidate, as described above, and finds the candidates that match between the candidates belonging to this group and the second-ranked and subsequent recognition candidates obtained by the recognition means. The candidate display means displays the first-ranked recognition candidate together with the candidates found by the selection candidate determining means, so that the user can select from among them.

[0008] With the above configuration, the candidates recognized by the recognition means are not simply displayed as they are (for example, in descending order of similarity) for the user to choose from. Instead, among the second-ranked and subsequent recognition candidates, only those that resemble the first-ranked recognition candidate and are easily misheard for it are offered for selection, so the user is not given a sense of incongruity.

[0009]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings, taking as an example its application to a speech recognition apparatus used in a station ticket vending machine. FIG. 1 is a block diagram schematically showing the configuration of the speech recognition apparatus of this embodiment.

[0010] In FIG. 1, reference numeral 1 denotes an A/D (analog/digital) converter that converts the speech signal (input speech) fed to the apparatus. The A/D converter 1 quantizes the input speech at, for example, a sampling frequency of 12 kHz and 12 bits.

[0011] The input speech quantized by the A/D converter 1 is supplied to an analysis and feature extraction unit 2, which analyzes the speech and extracts features. The analysis and feature extraction unit 2 calculates the speech power of the quantized input speech and performs LPC (Linear Predictive Coding) analysis. This LPC analysis is performed with, for example, a frame length of 16 msec and a frame period of 8 msec, using the 16th-order LPC mel-cepstrum as the analysis parameter. The analysis in the analysis and feature extraction unit 2 is not limited to LPC analysis; BPF (Band Pass Filter) analysis or the like may also be used.
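As a rough illustration of the framing figures quoted above, the following Python sketch converts the 16 msec frame length and 8 msec frame period into sample counts at 12 kHz and computes a per-frame power. The function names and the mean-square definition of power are assumptions made for illustration; the text does not say how the speech power is computed.

```python
SAMPLE_RATE_HZ = 12_000   # sampling frequency given in the text
FRAME_LENGTH_MS = 16      # frame length given in the text
FRAME_PERIOD_MS = 8       # frame period (shift) given in the text


def frame_params(sample_rate_hz=SAMPLE_RATE_HZ,
                 frame_length_ms=FRAME_LENGTH_MS,
                 frame_period_ms=FRAME_PERIOD_MS):
    """Return (samples per frame, samples per frame shift)."""
    frame_len = sample_rate_hz * frame_length_ms // 1000
    hop = sample_rate_hz * frame_period_ms // 1000
    return frame_len, hop


def frame_power(samples, frame_len, hop):
    """Mean-square power of each complete frame (an assumed definition)."""
    powers = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        powers.append(sum(x * x for x in frame) / frame_len)
    return powers
```

With the values in the text this gives 192 samples per frame and a 96-sample shift, so consecutive frames overlap by half.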

[0012] The feature parameters obtained by the analysis and feature extraction unit 2 are supplied to a continuous matching unit 3. The continuous matching unit 3 performs matching continuously along the time axis against recognition dictionaries in predetermined PS units registered in a phonetic segment (PS) composite dictionary unit (hereinafter, PS composite dictionary unit) 4, and obtains the first- through n-th-ranked label sequences (PS label sequences) and their similarities. Phonetic segments (PS) are described in detail in, for example, Japanese Patent Application No. Hei 2-306061. The recognition dictionaries consist of discrimination dictionaries created for each PS (PS label) from a plurality of standard patterns. The continuous PS matching in the continuous matching unit 3 is performed using the composite LPC mel-cepstrum similarity measure given by the following equation.

[0013]

[Equation 1]

[0014] In equation (1), C is the LPC mel-cepstrum, and W_m^(Ki) and φ_m^(Ki) are, respectively, the weight derived from the eigenvalues and the eigenvector for PS name Ki. Further, (·) denotes the inner product and ‖ ‖ denotes the norm.
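The body of equation (1) is not reproduced in this text. Given only the symbols defined above (a weight and an eigenvector per PS name, an inner product, and a norm), one form consistent with them is the multiple (composite) similarity measure sketched below. This is a reconstruction under that assumption, not the verified text of the patent; the upper limit M (the number of eigenvectors used) and the symbol S for the similarity are introduced here for illustration.

```latex
S^{(K_i)}[C] \;=\; \sum_{m=1}^{M} W_m^{(K_i)}\,
  \frac{\left(C \cdot \varphi_m^{(K_i)}\right)^{2}}{\lVert C \rVert^{2}}
```

Under this form, the similarity for PS name Ki grows as the input vector C aligns with the heavily weighted eigenvectors of that PS, and dividing by the squared norm makes the score independent of the overall magnitude of C.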

[0015] Of the PS label sequences obtained by the continuous matching unit 3, the first-ranked sequence is sent to an HMM recognition unit 5, which performs word matching using HMMs (hidden Markov models).

[0016] Word matching in the HMM recognition unit 5 is now described. Word matching is performed by passing the first-ranked PS label sequence sent from the continuous matching unit 3 through the HMM of each word (each category).

[0017] The general formulation of an HMM is as follows. An HMM has N states S₁, S₂, …, S_N, and the initial state is distributed probabilistically over these N states. For speech, a model is used that makes a state transition with a certain probability (transition probability) at every fixed frame period. On a transition, a label is output with a certain probability (output probability), although null transitions, which change state without outputting a label, may also be introduced. Even when an output label sequence is given, the state transition sequence is not uniquely determined. Because only the label sequence can be observed, the model is called a hidden Markov model (HMM). An HMM model M is defined by the following six parameters.

N: number of states (states S₁, S₂, …, S_N)
K: number of labels (labels R = 1, 2, …, K)
p_ij: transition probability; the probability of a transition from S_i to S_j
q_ij(k): the probability of outputting label k on the transition from S_i to S_j
m_i: initial state probability; the probability that the initial state is S_i
F: the set of final states

[0018] Next, constraints on transitions that reflect the characteristics of speech are imposed on the model M. In speech, loop transitions that return from state S_i to a previously visited state (S_i-1, S_i-2, …) are generally not allowed, because they would disturb the temporal order.
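The transition constraint described here can be expressed as a simple check on the transition matrix. The sketch below is an illustration under the assumption that states are indexed in temporal order, so that "returning to a previously visited state" means a nonzero probability p[i][j] with j < i; the function name is hypothetical. Staying in the same state (j equal to i) is not excluded by the text and is therefore allowed here.

```python
def is_left_to_right(p):
    """True if no transition goes back to an earlier state.

    p[i][j] is the probability of a transition from state i to state j.
    """
    return all(p[i][j] == 0.0
               for i in range(len(p))
               for j in range(i))
```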

[0019] An HMM is evaluated by obtaining the probability Pr(O/M) that the model M outputs the first-ranked label sequence O₁ = o₁₁, o₂₁, …, o_T1. At recognition time, the HMM recognition unit 5 hypothesizes each model in turn and, using the first-ranked label sequence (PS label sequence) sent from the continuous matching unit 3, first finds the model M1 that maximizes the probability Pr(O/M) and takes it as the first-ranked recognition candidate. Likewise, the HMM recognition unit 5 takes the model M2 with the next highest probability as the second-ranked recognition candidate, and the model M3 with the next highest probability after that as the third-ranked recognition candidate. A model whose probability is smaller than a predetermined value, however, is not taken as a recognition candidate. The models (their parameters) hypothesized by the HMM recognition unit 5 are obtained by HMM training and are stored in an HMM buffer 6. By recognizing the uttered input speech in this way, the input speech, for example the name of a destination station, can be recognized with high accuracy.
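The evaluation and ranking steps above can be sketched with the standard forward recursion for an HMM whose labels are emitted on transitions. This is a minimal illustration, not the patent's implementation: it assumes no null transitions, uses 0-based label and state indices, and the names `Hmm`, `sequence_probability`, and `rank_candidates` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hmm:
    p: list      # p[i][j]: probability of a transition from state i to state j
    q: list      # q[i][j][k]: probability of emitting label k on transition i -> j
    m: list      # m[i]: probability that the initial state is i
    final: set   # indices of the final states (F)


def sequence_probability(model, labels):
    """Pr(O/M): probability that the model emits `labels` and ends in F."""
    n = len(model.m)
    alpha = list(model.m)  # alpha[i]: probability of being in state i so far
    for k in labels:
        alpha = [sum(alpha[i] * model.p[i][j] * model.q[i][j][k]
                     for i in range(n))
                 for j in range(n)]
    return sum(alpha[i] for i in model.final)


def rank_candidates(models, labels, min_probability, top_n=3):
    """Words ordered by descending Pr(O/M), dropping those below the threshold."""
    scored = [(sequence_probability(model, labels), word)
              for word, model in models.items()]
    kept = [(pr, word) for pr, word in scored if pr >= min_probability]
    kept.sort(key=lambda item: item[0], reverse=True)
    return [word for _, word in kept[:top_n]]
```

Here `models` maps each word (category) to its trained model, and `min_probability` plays the role of the predetermined value below which a model is not treated as a recognition candidate.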

[0020] In addition to the A/D converter 1, analysis and feature extraction unit 2, continuous matching unit 3, PS composite dictionary unit 4, HMM recognition unit 5, and HMM buffer 6 described above, the speech recognition apparatus of FIG. 1 has a display unit 7, a selection unit 8, a candidate display restriction unit 9, and a control unit 10.

[0021] Under the control of the control unit 10, the display unit 7 displays, for example, a prompt to utter the destination station name (here, the character string "Please speak") and the recognition results (recognition candidates) of the HMM recognition unit 5. The recognition results may be displayed either by showing only the first-ranked recognition candidate (here, "菰野" (Komono)), as in FIGS. 3(a) and 3(b), or by showing a plurality of candidates, for example the first-ranked recognition candidate (here, "菰野" (Komono)) and the second-ranked recognition candidate (here, "駒野" (Komano)), as in FIG. 3(c).

[0022] The selection unit 8 is configured so that the user can select a recognition candidate displayed on the display unit 7. The selection unit 8 and the display unit 7 are implemented using, for example, a touch panel that combines a pressure-sensitive transparent tablet with a liquid crystal display (a display monitor such as a CRT display may be used instead).

[0023] The candidate display restriction unit 9 imposes a restriction when a plurality of recognition candidates obtained by the HMM recognition unit 5 are displayed on the display unit 7, so that, for example, second-ranked and lower candidates that are extremely dissimilar to the first-ranked candidate are not displayed. As shown in FIG. 4, the candidate display restriction unit 9 is a table (database) in which, for each candidate that can be ranked first, the candidates permitted to be displayed in second place or below are registered (stored); specifically, these are the words that a human listener is apt to mishear when the first-ranked candidate is uttered. The registration information in this table can be obtained experimentally by having a plurality of speakers utter words (station names in this example) one at a time and tallying how a plurality of listeners recognized them.

[0024] The control unit 10 controls the display on the display unit 7 in accordance with the recognition results of the HMM recognition unit 5, the registered contents of the candidate display restriction unit 9, and so on, and controls an external device (here, a ticket vending machine) in accordance with selection instruction information from the selection unit 8. FIG. 2 shows the configuration of the display unit 7 and the selection unit 8 in the speech recognition apparatus of FIG. 1.

[0025] As shown in FIG. 2, the display unit 7 consists of a liquid crystal display 71 and a display memory 72 that stores the display information to be shown on the display 71. The display information in the display memory 72 is written by the control unit 10 shown in FIG. 1. Besides the utterance prompt and the recognition candidates described above, the contents displayed on the liquid crystal display 71 include: display information prompting the user to confirm the displayed recognition candidate when only the first-ranked candidate is displayed (the character string "Please confirm"); display information for the area (key area) used for the user's on-screen confirmation operation (the item key [Confirm]); display information for the area (key area) that accepts a re-utterance by the user when the displayed recognition candidate is wrong, that is, when the candidate the user intended is not displayed (the item key [Say again]); and display information prompting the user to select among the recognition results (the character string "Please select").

[0026] The selection unit 8, on the other hand, consists of a pressure-sensitive sheet-type transparent tablet 81 laminated on the display screen of the display unit 7, that is, on the liquid crystal display 71, and formed integrally with the display 71; an indicated coordinate detection unit 82 that detects the coordinate position on the surface of the transparent tablet 81 when the tablet is pressed by the user's finger or the like; and an instruction information determination unit 83. From the coordinates detected by the indicated coordinate detection unit 82 and the contents of the display memory 72, the instruction information determination unit 83 determines which item key (display information) on the screen the user has selected, and sends the result to the control unit 10 as selection instruction information.
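The determination made by the instruction information determination unit 83 amounts to a hit test of the touched coordinates against the key areas currently on screen. The sketch below illustrates that idea only; the `ItemKey` record, the rectangular key areas, and the pixel coordinates are assumptions, since the text does not describe how the display memory 72 lays out its contents.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ItemKey:
    name: str     # e.g. "Confirm", "Say again", or a candidate station name
    left: int
    top: int
    width: int
    height: int

    def contains(self, x, y):
        """True if the point (x, y) lies inside this key area."""
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)


def resolve_touch(display_memory, x, y) -> Optional[str]:
    """Name of the item key at the touched point, or None if no key was hit."""
    for key in display_memory:
        if key.contains(x, y):
            return key.name
    return None
```

The returned name stands in for the selection instruction information sent to the control unit 10; a touch outside every key area yields `None` and would simply be ignored.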

[0027] Suppose that, in this configuration, the user says "Komono" as the destination station name in response to the utterance prompt issued under the control of the control unit 10, that the HMM recognition unit 5 performs recognition processing on that speech, and that a plurality of recognition candidates are sent to the control unit 10 as the recognition result. Suppose further that this recognition result includes the first-ranked recognition candidate "菰野" (Komono) and the second-ranked recognition candidate "駒野" (Komano).

[0028] The control unit 10 is configured to accept all recognition results from the HMM recognition unit 5 during a fixed period that begins a fixed time before the utterance prompt timing and spans that timing (or during the period until the user selects a candidate). This relaxes the constraint on when the user must speak. Instead of setting a period during which the control unit 10 accepts the recognition results of the HMM recognition unit 5, the operating period of the HMM recognition unit 5 may be set. Alternatively, a switch may be provided for setting the input period of the speech to be recognized, so that the speech uttered by the user is input to the apparatus only while the user holds the switch on, all of that input speech is subjected to recognition processing by the HMM recognition unit 5, and the recognition results are accepted by the control unit 10.
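The acceptance period described above can be pictured as a time window around the prompt. The following sketch is illustrative only: the timestamps, the `lead` and `tail` durations, and the function name are assumptions, as the text gives no concrete values.

```python
def accepted_results(results, prompt_time, lead, tail):
    """Keep recognition results whose timestamp falls inside the window.

    results: list of (timestamp, candidates) tuples.
    The window opens `lead` seconds before the prompt and closes
    `tail` seconds after it, so speech that starts slightly early
    is still accepted.
    """
    start, end = prompt_time - lead, prompt_time + tail
    return [result for result in results if start <= result[0] <= end]
```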

[0029] The control unit 10 receives the plurality of recognition candidates, including the first-ranked recognition candidate "菰野" (Komono), sent from the HMM recognition unit 5, and displays only the first-ranked recognition candidate "菰野" on the display screen of the display unit 7 (its liquid crystal display 71), as shown for example in FIG. 3(a). At the same time, the screen shows, for example, the prompt message "Please confirm" at the top, the item key [Confirm] at the lower right, and the item key [Say again] at the lower left.

[0030] If the recognition candidate "菰野" (Komono) displayed on the screen is correct, the user follows the "Please confirm" request and touches the (area of the) item key [Confirm] at the lower right of the display screen with a finger on the transparent tablet 81.

[0031] The coordinates of the [Confirm] item key are then detected by the indicated coordinate detection unit 82. From the coordinates detected by the indicated coordinate detection unit 82 and the contents of the display memory 72, the instruction information determination unit 83 determines that the item key indicated by the display information shown at the detected coordinate position has been selected, that is, that [Confirm] has been selected, and sends selection instruction information to that effect to the control unit 10. The control unit 10 thereby treats the first-ranked candidate "菰野" (Komono) as confirmed and controls the external device (ticket vending machine).

[0032] Instead of displaying a [Confirm] item key as in FIG. 3(a), the first-ranked recognition candidate itself may be given the role of the [Confirm] item key, as in FIG. 3(b), so that the user can enter the confirmation by touching the display area of "菰野" (Komono) with a finger.

[0033] When the user says "Komono" and, as shown for example in FIG. 3(c), the first-ranked candidate "菰野" (Komono) and the second-ranked candidate "駒野" (Komano) are displayed in a form that doubles as [Confirm] item keys, the user follows the "Please select" request and touches the display area of "菰野" with a finger; the instruction information determination unit 83 then determines that "菰野" has been selected and sends selection instruction information to that effect to the control unit 10. If the candidate the user intended is not displayed, the user touches the (area of the) item key [Say again] with a finger on the transparent tablet 81.

[0034] Then, in the same way as when the (area of the) [Confirm] item key is touched, the instruction information determination unit 83 in the selection unit 8 determines that [Say again] has been selected and sends selection instruction information to that effect to the control unit 10. The control unit 10 thereby judges that a re-utterance has been requested, sets the re-utterance mode, and prompts the user to speak again, for example with a beep. In response to this prompt, the user can utter again the destination station name that was not recognized correctly.

[0035] In this embodiment, the control unit 10 decides whether to display only the first-ranked candidate, as in FIG. 3(a) or 3(b), or to display candidates down to the second rank, as in FIG. 3(c). The control unit 10 bases this decision on two conditions.

[0036] The first condition is whether the difference between the first-ranked and second-ranked similarities exceeds a first predetermined value, or whether the second-ranked similarity is less than a second predetermined value. The second condition is whether the second-ranked candidate is registered in the candidate display restriction unit 9 paired with the first-ranked candidate.

[0037] If the first condition holds, only the first-ranked candidate is displayed, regardless of whether the second condition holds. If the first condition does not hold, candidates down to the second rank are displayed when the second condition holds, and only the first-ranked candidate is displayed when it does not.
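The two-condition decision can be written out as a short function. This is a sketch of the logic as stated, with hypothetical names: `max_gap` stands for the first predetermined value, `min_second_sim` for the second, and `table` for the contents of the candidate display restriction unit 9. The similarity values used in any call are illustrative.

```python
def candidates_to_display(first, second, sim_first, sim_second,
                          table, max_gap, min_second_sim):
    """Return the list of candidates the control unit would display."""
    # First condition: the second candidate is clearly less likely.
    first_condition = ((sim_first - sim_second) > max_gap
                       or sim_second < min_second_sim)
    if first_condition:
        return [first]

    # Second condition: the second candidate is registered as
    # easily confused with the first.
    second_condition = second in table.get(first, set())
    return [first, second] if second_condition else [first]
```

The order of the checks mirrors the text: the registered pairing is consulted only when the similarity scores alone do not already rule out the second candidate.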

[0038] Thus, in this embodiment, when the second-ranked candidate is unlikely to be correct, only the first-ranked candidate is displayed, which makes the user's choice easier. Even when the second-ranked candidate is likely, if it is not registered in the candidate display restriction unit 9 paired with the first-ranked candidate, it is regarded as extremely dissimilar to the first-ranked candidate and is not displayed, so that the user is not given a sense of incongruity.

[0039] A specific example is now given of how, when the first condition does not hold, the decision between displaying candidates down to the second rank and displaying only the first-ranked candidate is made according to whether the second condition holds.

[0040] FIG. 5 shows the first- through third-ranked candidates in the recognition results of the HMM recognition unit 5 for utterances by a certain user (speaker). Only the cases in which the uttered word did not become the first-ranked candidate are shown.

[0041] In FIG. 5, for the utterance "二上" (Nijo), for example, the first-ranked recognition candidate is "十条" (Jujo), the second is "二上" (Nijo), and the third is "西の京" (Nishinokyo). Meanwhile, as shown in FIG. 4, for the first-ranked candidate "十条" the candidate display restriction unit 9 permits only "新庄" (Shinjo) and "二上" to be displayed (in second place or below). Therefore, when candidates down to the second rank are to be displayed because the first condition does not hold, the second-ranked candidate "二上" is displayed on the display unit 7 in addition to the first-ranked candidate "十条".

[0042] In the above example, even if candidates down to the third rank were to be displayed, when the first-ranked candidate is "十条" (Jujo) the third-ranked candidate "西の京" (Nishinokyo) is not permitted by the candidate display restriction unit 9, so nothing other than the first-ranked "十条" and the second-ranked "二上" (Nijo) is displayed. That is, in this embodiment the third-ranked candidate "西の京", which the user perceives as bearing no resemblance to "十条" or "二上", is not displayed even when candidates down to the third rank are to be shown, and the user is not given a sense of incongruity.

[0043] As another example, for the utterance "白木" (Shiraki), the first-ranked recognition candidate is "白木" and the second is "千代崎" (Chiyozaki). Meanwhile, as shown in FIG. 4, the candidate display restriction unit 9 does not permit any word to be displayed for the first-ranked candidate "白木". Therefore, even when candidates down to the second rank are to be displayed because the first condition does not hold, only the first-ranked candidate "白木" is displayed, and the second-ranked candidate "千代崎", which the user perceives as bearing no resemblance to "白木", is not displayed.
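The two worked examples can be reproduced with a small filter over the ranked candidates. The table below contains only the two FIG. 4 entries that the text spells out, written with romanized station names; the function name and the `max_shown` parameter are assumptions for illustration.

```python
# Entries of the FIG. 4 table that are stated in the text:
# first-ranked candidate -> candidates allowed in second place or below.
RESTRICTION_TABLE = {
    "Jujo": {"Shinjo", "Nijo"},   # 十条 -> 新庄, 二上
    "Shiraki": set(),             # 白木 -> nothing allowed
}


def displayable(ranked, table, max_shown=3):
    """Keep the first candidate plus any allowed lower-ranked candidates."""
    first, rest = ranked[0], ranked[1:max_shown]
    allowed = table.get(first, set())
    return [first] + [candidate for candidate in rest if candidate in allowed]
```

Calling `displayable(["Jujo", "Nijo", "Nishinokyo"], RESTRICTION_TABLE)` keeps "Jujo" and "Nijo" and drops "Nishinokyo", while `displayable(["Shiraki", "Chiyozaki"], RESTRICTION_TABLE)` keeps only "Shiraki", matching the two cases described above.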

[0044] In the above embodiment, the table (database) constituting the candidate display restriction unit 9 was described as registering (storing), for each candidate that can be ranked first, the candidates that are permitted to be displayed in second place or lower for that first-place candidate; however, the invention is not limited to this. For example, candidates may be registered in groups, such that if any candidate in a group is the first-place candidate, candidates not included in that group are not permitted to be displayed.
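The group-based variant can be sketched in the same way. Again this is only an illustration under stated assumptions: `GROUPS`, `group_of`, and `select_by_group` are hypothetical names, and the single group shown (romanized as Jujo, Shinjo, Nijo) is assumed for the example rather than taken from a figure.

```python
# Hypothetical registration of mutually confusable words as groups.
# The contents are assumed for illustration only.
GROUPS = [
    {"Jujo", "Shinjo", "Nijo"},
]


def group_of(word):
    """Return the registered group containing word, or a singleton set."""
    for group in GROUPS:
        if word in group:
            return group
    # A word registered in no group has no permitted companions.
    return {word}


def select_by_group(ranked):
    """Keep the first-place candidate and lower-ranked members of its group."""
    if not ranked:
        return []
    group = group_of(ranked[0])
    return [ranked[0]] + [w for w in ranked[1:] if w in group]


print(select_by_group(["Nijo", "Jujo", "Nishinokyo"]))
print(select_by_group(["Shiraki", "Chiyozaki"]))
```

Compared with a per-word table, one group entry covers every member as a possible first-place candidate, so the same registration serves whichever word of the group is ranked first.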

[0045] The above embodiment was described for the case where the selection candidates are displayed on a screen, but the invention is equally applicable where the candidates are presented to the user by voice output in order, starting from the first-place candidate. Furthermore, the present invention is not limited to a speech recognition device used in a station ticket vending machine and is applicable to speech recognition devices in general.

【００４６】[0046]

[Effects of the Invention] As described above, according to the speech recognition device of the present invention, of the recognition candidates obtained by the recognition means from speech input through the voice input means, only those second-place and lower candidates that resemble the first-place recognition candidate and are easily misheard for it are selected (that is, candidates that do not resemble the first-place recognition candidate are excluded), and they are displayed together with the first-place recognition candidate so that the user can select among them. Consequently, even if errors peculiar to the recognition method in use cause the candidates obtained by the recognition means to include ones that bear no resemblance to the utterance from the user's point of view, such candidates are prevented from becoming selection candidates, and the user is spared a sense of incongruity.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the basic configuration of a speech recognition device according to one embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of the display unit 7 and the selection unit 8 in FIG. 1.

FIG. 3 is a diagram showing examples of display screens for explaining the operation of the embodiment.

FIG. 4 is a diagram showing an example of the contents of the table constituting the candidate display restriction unit 9 in FIG. 1.

FIG. 5 is a diagram showing an example of the recognition results produced by the HMM recognition unit 5 in FIG. 1 for a user's utterance input.

[Explanation of symbols]

1: A/D converter (voice input means); 2: analysis and feature extraction unit; 3: continuous matching unit; 4: PS composite dictionary unit; 5: HMM recognition unit; 6: HMM buffer; 7: display unit; 8: selection unit; 9: candidate display restriction unit (registration means); 10: control unit (selection candidate determining means); 71: liquid crystal display; 72: display memory; 81: transparent tablet; 82: designated coordinate detection unit; 83: designation information determination unit.

Continued from the front page: (58) Fields searched (Int. Cl.⁷, DB name): G10L 15/22

Claims

(57) [Claims]

1. A speech recognition device comprising: voice input means for inputting speech uttered by a user; analysis and feature extraction means for analyzing the speech input by the voice input means and extracting a feature amount; recognition means for recognizing the speech using the feature amount extracted by the analysis and feature extraction means to obtain one or more recognition candidates; registration means in which similar candidates that are apt to be misheard for one another are grouped and registered in advance; selection candidate determining means for referring to the candidate group registered in the registration means for the first-place recognition candidate obtained by the recognition means, and determining, as selection candidates, those second-place and lower recognition candidates that match candidates belonging to that group; and means for displaying the first-place recognition candidate obtained by the recognition means together with the selection candidates determined by the selection candidate determining means so that the candidates can be selected.