JP2011221293A - Command processing device

Info

Publication number: JP2011221293A
Application number: JP2010090492A
Authority: JP
Inventors: Akio Horii; 昭男堀井; Yohei Okato; 洋平岡登; Toshiyuki Hanazawa; 利行花沢; Tomohiro Iwasaki; 知弘岩崎
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2010-04-09
Filing date: 2010-04-09
Publication date: 2011-11-04

Abstract

PROBLEM TO BE SOLVED: To provide a command processing device that reduces the screen operations needed for correction work when a selected command is not the command the user intended.

SOLUTION: The command processing device comprises: a command selection section 3 that, for the recognition result obtained when a voice recognition process section 2 performs vocabulary recognition on input data entered by a user according to an acoustic model and a language model, calculates a likelihood and selects probable command candidates as intermediate commands; an operation cost database memory 7 that stores the operation cost of operations between commands, the operation cost being defined as the operation required when the user has made an operation error; and a likelihood correction section 4 that selects, from the intermediate commands, the command that minimizes an evaluation value calculated from the operation cost.

Description

The present invention relates to a command processing apparatus that executes processing in response to an operation command input by voice.

Multimodal information devices that offer both screen operation and voice operation, such as navigation devices, have been developed. In screen operation, the screen transitions follow a hierarchical structure of commands, each of which is associated with a specific function of the device. By proceeding through the screens, the user gradually approaches the desired command and can then execute it. However, each screen transition requires one screen operation, so some commands take several operations to execute, which is inconvenient. Voice operation can execute any command with a single utterance and can therefore compensate for this inconvenience. However, misrecognition may cause a command other than the one the user wanted to be executed.

A command processing apparatus that maps the vocabulary obtained by speech recognition processing to a command is used in the field of speech recognition and is effective for operating a device with a varied vocabulary. For example, the command processing apparatus disclosed in Patent Document 1 selects a command based on the vocabulary that speech recognition judges acoustically and linguistically most likely to have been uttered by the user. The command processing apparatus disclosed in Patent Document 2 takes the most probable vocabulary and, using a conversion database in which commands are associated with related phrase information, selects the most probable command as shown in Equation (1) below.

In Equation (1), C is a command, X is the input speech, P(C) is the occurrence probability of each command, known in advance, and P(X|C) is the likelihood of the input speech based on a probability model trained in advance. arg max is a function that returns the element with the maximum value among the elements.
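Equation (1) itself is not reproduced in this text. From the symbol definitions above, it presumably takes the standard maximum a posteriori form. The following is a reconstruction under that assumption, not a copy of the original formula; the final command is written C^ elsewhere in this text and as \hat{C} here.

```latex
\hat{C} = \operatorname*{arg\,max}_{C} \; P(X \mid C)\, P(C) \tag{1}
```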

Patent Document 1: Japanese Patent Laid-Open No. 9-50291
Patent Document 2: International Publication WO2007/114226

Because the conventional command processing apparatus is configured as described above, it selects commands based on Equation (1) and either executes the command with the highest likelihood P(X|C) or presents several high-likelihood command candidates for the user to choose from. When several candidates are presented, they are determined by likelihood alone, so the apparatus may present candidates whose functions are entirely different from one another and confuse the user. This problem arises particularly as the functions become more complex.
Furthermore, when misrecognition occurs, the apparatus cannot select a hierarchically close command from which the user can see how to navigate to the desired one, so the correction work costs the user considerable effort.

The present invention has been made to solve the above problems. Its object is to reduce the effort of the screen operations that make up the correction work performed when the selected command is not the command the user wanted, by selecting commands with the hierarchical relationship among commands taken into account.

The command processing apparatus according to the present invention comprises: a recognition processing unit that performs vocabulary recognition on input data entered by a user and outputs probable vocabulary as a recognition result; a command selection unit that calculates, for the recognition result, a likelihood, that is, the plausibility of a model as inferred in reverse from the input data on the assumption that results arise according to that model, and selects intermediate commands, which are probable command candidates, based on the likelihood; an operation cost database memory that stores the operation costs associated with operations between commands, the operation cost being defined as the operation required when the user has made an operation error; and a likelihood correction unit that selects, from the intermediate commands, the command that minimizes an evaluation value calculated from the operation costs.

According to the present invention, the apparatus comprises: a command selection unit that calculates a likelihood for the recognition result obtained when the recognition processing unit performs vocabulary recognition on input data entered by the user, the likelihood being the plausibility of a model as inferred in reverse from the input data on the assumption that results arise according to that model, and selects intermediate commands, which are probable command candidates, based on the likelihood; an operation cost database memory that stores the operation costs associated with operations between commands, the operation cost being defined as the operation required when the user has made an operation error; and a likelihood correction unit that selects, from the intermediate commands, the command that minimizes an evaluation value calculated from the operation costs. As a result, commands can be selected with the hierarchical relationship among commands taken into account, and the effort of the screen operations needed to correct the selected command can be reduced.

FIG. 1 is an explanatory diagram showing an example of the hierarchical relationship in the command processing of the present invention.
FIG. 2 is a diagram showing a database of operation costs based on screen operations in the command processing of the present invention.
FIG. 3 is an explanatory diagram showing an example of the average values of the operation cost in the command processing of the present invention.
FIG. 4 is an explanatory diagram showing an example of the hierarchical relationship in the command processing of the present invention.
FIG. 5 is an explanatory diagram showing the calculation of the expected value of the operation cost in the command processing of the present invention.
FIG. 6 is a block diagram showing the configuration of the command processing apparatus according to Embodiment 1 of the present invention.
FIG. 7 is a flowchart showing the operation of the command processing apparatus according to Embodiment 1 of the present invention.
FIG. 8 is an explanatory diagram showing an example of the conversion database of the command processing apparatus according to Embodiment 1 of the present invention.
FIG. 9 is a block diagram showing the configuration of the command processing apparatus according to Embodiment 2 of the present invention.
FIG. 10 is a flowchart showing the operation of the command processing apparatus according to Embodiment 2 of the present invention.

In the present invention, the command processing apparatus defines an operation cost based on screen operations, taking into account the correction needed when misrecognition occurs, and selects the command so as to minimize the average value (evaluation value) of the operation cost. The description first covers the hierarchy of commands under screen operation and the operation cost, and then the procedure for selecting the voice command that minimizes the average value of the operation cost.

FIG. 1 is an explanatory diagram showing an example of the hierarchical relationship between commands in the operation cost database according to the present invention.
With commands based on screen operation, an operation screen is presented, the screen changes each time the user selects an item from it, and the desired command is finally executed. An intermediate node, which has lower nodes, represents a selection screen for a category of commands, and a terminal node represents a specific command. By screen operation, the user starts from the initial node F0, passes through the intermediate node F10 or F20, and executes the specific command associated with one of the terminal nodes F11 to F13 and F21 to F23. In voice operation, by contrast, the user utters the phrase corresponding to a command directly and can thereby execute any of the terminal nodes F11 to F13 and F21 to F23 straight from the initial node F0.

Next, the operation cost database will be described. When operating by screen, the user checks what the screen presents and repeatedly selects from the menu. If the user mistakenly selects something other than the desired item, the user must return to the upper level and select a different menu item. The operation cost is associated with screen transitions and, specifically, with the average operation time. The cost incurred for each screen operation is described with reference to FIG. 2. FIG. 2 shows the operation costs based on screen operations as a database; specifically, the database lists the operation costs incurred when moving between the commands shown in FIG. 1.

The hierarchy number is "0" for the initial node, "1" for an intermediate node, and "2" for a terminal node, so the value increases with each level moved down. The corresponding command ID and command name are listed alongside. The operation cost of moving between commands is also listed. For example, moving from command F11 to F12 incurs an operation cost of "1" for returning from F11 to F10 and an operation cost of "1" for proceeding from F10 to F12, so the expected value of the operation cost is calculated as "12", the product of the 2 operations and the likelihood 6 of F12.
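The hierarchy numbers and the F11 to F12 cost example above can be sketched in Python. This is a minimal illustration, assuming the hierarchy of FIG. 1 is a tree in which F11 to F13 sit under F10 and F21 to F23 sit under F20, and that every move between adjacent screens costs 1. The `PARENT` map and the function names are hypothetical and do not come from the patent.

```python
# Hypothetical parent map for the hierarchy of FIG. 1 (node IDs from the text;
# the grouping of F21 to F23 under F20 is an assumption).
PARENT = {
    "F0": None, "F10": "F0", "F20": "F0",
    "F11": "F10", "F12": "F10", "F13": "F10",
    "F21": "F20", "F22": "F20", "F23": "F20",
}

def path_to_root(node):
    """Return the list of nodes from `node` up to the initial node."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def hierarchy_number(node):
    """0 for the initial node, 1 for intermediate nodes, 2 for terminal nodes."""
    return len(path_to_root(node)) - 1

def operation_cost(src, dst):
    """Number of screen moves from src to dst, at a cost of 1 per move."""
    up, down = path_to_root(src), path_to_root(dst)
    for steps_up, ancestor in enumerate(up):
        if ancestor in down:
            # Go up to the first shared ancestor, then down to dst.
            return steps_up + down.index(ancestor)
    raise ValueError("nodes are not in the same tree")

# Expected cost of the F11 -> F12 move, weighted by the likelihood 6 of F12.
expected_cost = operation_cost("F11", "F12") * 6
```

Storing only the parent of each node keeps the sketch short; a table of precomputed pairwise costs, as FIG. 2 is described, would serve the same purpose.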

In FIG. 2, suppose for example that the commands output from the command selection unit 3 (described later), "TOLL_FIRST_ROUTE", "RECOMMENDED_ROUTE", and "SHORTEST_ROUTE", all have the same or similar likelihoods. The optimum command is then obtained from Equation (2) below under the following conditions.
<Conditions>
・ In the simplest case, each single operation (one move between commands) is counted as an operation cost of "1".
・ F10 has a score of "3", F11 and F12 each have a score of "6", and F13 has a score of "5".
Equation (2) calculates the average value of the operation cost and outputs the command with the smallest value as the final command C^. It is expressed as follows.

In Equation (2), C and C′ are individual voice commands and X is the input speech. l() is the loss function, which returns the operation cost incurred by the transition given in parentheses, and P(C′|X) is the likelihood that the command is C′ when the speech X is observed. arg min is a function that returns the element with the minimum value among the elements. Equation (2) corresponds to Equation (1) with the loss function l of Bayesian decision theory taken into account; under the condition of Equation (3) it can be transformed into Equation (4) and is then equal to Equation (1). Bayesian decision theory is described in Reference 1.
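Equations (2) to (4) are not reproduced in this text. Based on the definitions above and on standard Bayesian decision theory, they presumably have the following form. This is a reconstruction under assumptions: the argument order of l, and the reading of Equation (3) as the 0-1 loss, are not confirmed by the source.

```latex
\begin{align}
\hat{C} &= \operatorname*{arg\,min}_{C} \sum_{C'} l(C, C')\, P(C' \mid X) \tag{2} \\
l(C, C') &= \begin{cases} 0 & (C = C') \\ 1 & (C \neq C') \end{cases} \tag{3} \\
\hat{C} &= \operatorname*{arg\,min}_{C} \bigl( 1 - P(C \mid X) \bigr)
         = \operatorname*{arg\,max}_{C} P(C \mid X) \tag{4}
\end{align}
```

Under this reading, Equation (4) agrees with Equation (1) because, by Bayes' rule, P(C|X) is proportional to P(X|C)P(C) for a fixed input X.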

Reference 1.
Taizo Iijima: Pattern Recognition Theory, Basic Information Engineering Series 6, Morikita Publishing (1989)

FIG. 3 shows the average value of the operation cost, calculated using Equation (2), for the case where each command is the command the user wanted, that is, the correct command. Based on FIG. 3, F10, for which the average value of the operation cost is smallest, is finally output as the command.
In this example, no command that the user did not want is selected. Instead, the options are presented to the user at the intermediate node and the desired command can be selected with the next operation, which keeps the operation cost down. In addition, compared with the conventional technique of always moving to the intermediate node when the user's utterance is ambiguous, the user's operation cost is reduced because, when a relatively close command exists, that command is presented.
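The selection of F10 can be checked with a short Python sketch. It assumes the conditions stated earlier (cost 1 per move; scores of 3 for F10, 6 for F11 and F12, and 5 for F13) and, as in the F11 to F12 example, weights each operation cost by the score of the command that might have been intended and sums the products. The function names are hypothetical, and the values of FIG. 3 are not available in this text, so the numbers below are what the sketch computes rather than a transcription of the figure.

```python
# Scores (likelihoods) from the conditions stated in the text.
SCORES = {"F10": 3, "F11": 6, "F12": 6, "F13": 5}
# F10 is the intermediate node; F11 to F13 are its terminal nodes.
PARENT = {"F10": None, "F11": "F10", "F12": "F10", "F13": "F10"}

def operation_cost(selected, desired):
    """Screen moves needed to get from the selected node to the desired one."""
    if selected == desired:
        return 0
    if PARENT[selected] == desired or PARENT[desired] == selected:
        return 1  # one step between the intermediate node and a terminal node
    return 2      # terminal to terminal: one step up, one step down

def evaluation_value(selected):
    """Operation cost summed over every command the user may have wanted."""
    return sum(operation_cost(selected, desired) * score
               for desired, score in SCORES.items())

values = {command: evaluation_value(command) for command in SCORES}
final_command = min(values, key=values.get)

print(values)         # {'F10': 17, 'F11': 25, 'F12': 25, 'F13': 27}
print(final_command)  # F10
```

The scores are not normalized to sum to 1, so these are weighted sums rather than true averages. Dividing every value by the same constant would not change which command is smallest.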

To allow for cases such as F11 and F21 being unlikely to be confused with each other, the average value of the operation cost need not be calculated over all commands; it can instead be calculated over a specific group only (the group F10 to F13 in the example hierarchy shown in FIG. 4). The specific configuration is described in Embodiment 2.

Equation (2) can also be modified, as shown in Equation (5), so that the weights of the expected operation cost and of the likelihood can be changed.

In Equation (5), C and C′ are individual voice commands and X is the input speech. l() is the loss function, which returns the expected value of the operation cost incurred by the transition between the events given in parentheses, and P(C′|X) is the likelihood that the command is C′ when the speech X is observed. arg min is a function that returns the element with the minimum value among the elements, λ₁ is the weight of the operation cost, and λ₂ is the weight of the likelihood. As shown in FIG. 5, when the operation cost weight λ₁ is 0, Equation (1) and Equation (2) output the same final command C^.

The example above assumes that each screen operation incurs an operation cost of "1". The operation cost may instead be calculated using, alone or in combination, costs such as the cognitive cost, which is incurred when selecting among functions on the same level; the misrecognition cost, which arises from how easily misrecognition occurs; the return cost, which is incurred when it is not possible to return to the upper-level function; and the reselection cost, which is incurred by returning to the menu screen or the like and following the same hierarchy again.
The example above also selects commands from the recognition result of speech recognition processing, but commands may instead be selected from text obtained by text input.

Embodiment 1.
Next, a command processing apparatus that performs the command processing described above will be described. FIG. 6 is a block diagram showing the configuration of the command processing apparatus according to Embodiment 1 of the present invention.
The command processing apparatus 10 comprises a voice input unit 1, a voice recognition processing unit (recognition processing unit) 2, a command selection unit 3, a likelihood correction unit 4, a recognition dictionary database memory 5, a conversion database memory 6, and an operation cost database memory 7.

The voice input unit 1 receives the speech uttered by the user, performs A/D conversion on it, and outputs voice data. The voice recognition processing unit 2 receives the voice data output from the voice input unit 1, refers to the recognition dictionary database memory 5, performs vocabulary recognition on the voice data, and outputs a recognition result. In doing so, the voice recognition processing unit 2 takes the most probable vocabulary as the first candidate, obtains the possible vocabulary down to, for example, the fifth candidate, and outputs as the recognition result a candidate list arranged in order of probability together with the recognition likelihood, a numerical value representing the probability of each candidate (see Reference 2).

Reference 2.
JP-A-60-166997

The command selection unit 3 receives the recognition result, refers to the conversion database memory 6, calculates the likelihood, and outputs the intermediate commands and their likelihoods. In doing so, the command selection unit 3 takes the most probable command as the first candidate, obtains the possible commands down to, for example, the fifth candidate, and outputs a candidate list arranged in order of probability as the intermediate commands. The likelihood correction unit 4 receives the intermediate commands and likelihoods, refers to the operation cost database memory 7, calculates the average value of the operation cost from the operation costs, and outputs the command with the smallest average operation cost.

The recognition dictionary database memory 5 is a memory that stores a database of the acoustic models (models) and language models (models) used when the voice recognition processing unit 2 calculates the likelihood. The conversion database memory 6 is a memory that stores a database describing the correspondence between command names and related phrases, which is used to convert the phrases uttered by the user into commands. That is, the conversion database memory 6 is built as a table structure that stores a plurality of pieces of conversion information.

The operation cost database memory 7 represents the hierarchical relationship between commands and is a memory that stores a database of the operation costs associated with transitions between commands, which the likelihood correction unit 4 uses to calculate the average value of the operation cost. That is, as shown in FIG. 2, the operation cost database memory 7 is a memory that stores the operation costs associated with transitions between commands.

Next, the operation of the command processing apparatus according to Embodiment 1 will be described. FIG. 7 is a flowchart showing the operation of the command processing apparatus according to Embodiment 1.
The voice input unit 1 is generally installed near the user, who is the speaker, or is held by the user, and receives the user's speech. The voice input unit 1 performs A/D conversion on this input at, for example, 16 kHz sampling and 16 bits, and outputs the voice data to the voice recognition processing unit 2 (step ST1).

The voice recognition processing unit 2 determines the speech section of the voice data input in step ST1 by a known method, matches the voice data of that speech section against the acoustic model and language model in the recognition dictionary database memory 5, and outputs the most probable phrase to the command selection unit 3 as the recognition result (step ST2).

Based on the recognition result input in step ST2, the command selection unit 3 refers to the conversion database memory 6 and outputs the intermediate commands and their likelihoods to the likelihood correction unit 4 (step ST3). FIG. 8 shows the conversion database that the command selection unit 3 of Embodiment 1 uses when calculating the likelihood. The operation of the command selection unit 3 is described in detail below with reference to FIG. 8.
In FIG. 8, suppose for example that the user's utterance input to the voice recognition processing unit 2 is “fast route”. Taking as the likelihood the probability value calculated with the acoustic model and language model described in the recognition dictionary, "TOLL_FIRST_ROUTE" (toll road priority route search) and "RECOMMENDED_ROUTE" (recommended route search) have a likelihood of 6, "SHORTEST_ROUTE" (shortest distance search) has a likelihood of 5, and "ROUTE_SEARCH" has a likelihood of 3, and these are selected as the intermediate commands. Curly quotation marks (“ ”) enclose a recognition result, and straight quotation marks (" ") enclose a command name.
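The ranking step of the command selection unit 3 can be sketched as follows. This is only an illustration: the likelihoods are the ones given in the example above, while the function name, the dictionary layout, and the cutoff parameter `max_candidates` are hypothetical. How the likelihoods themselves are derived from the conversion database of FIG. 8 is not shown.

```python
def select_intermediate_commands(likelihoods, max_candidates=5):
    """Rank commands by likelihood, highest first, and keep the top candidates."""
    ranked = sorted(likelihoods.items(), key=lambda item: item[1], reverse=True)
    return ranked[:max_candidates]

# Likelihoods from the “fast route” example in the text.
likelihoods = {
    "TOLL_FIRST_ROUTE": 6,
    "RECOMMENDED_ROUTE": 6,
    "SHORTEST_ROUTE": 5,
    "ROUTE_SEARCH": 3,
}

intermediate_commands = select_intermediate_commands(likelihoods)
for command, likelihood in intermediate_commands:
    print(command, likelihood)
```

Python's `sorted` is stable, so commands with equal likelihood keep the order in which they appear in the input.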

Based on the intermediate commands and likelihoods input in step ST3, the likelihood correction unit 4 refers to the operation cost database memory 7 and outputs the final command (step ST4). The final command C^ output in step ST4 is given by Equation (2) above. The likelihood correction unit 4 operates as described above with reference to FIG. 2.

As described above, according to Embodiment 1, the apparatus includes the operation cost database memory 7, which stores a database of the operation costs associated with transitions between commands, and the likelihood correction unit 4, which refers to the operation cost database in the operation cost database memory 7, calculates the average value of the operation cost from the operation costs, and outputs the command with the smallest average value as the final command. As a result, commands can be selected with the hierarchical relationship among commands taken into account, and the effort of the screen operations needed to correct the selected command can be reduced.

Embodiment 1 has been described using the average value of the operation cost as the evaluation value. The evaluation value may instead be the value given by Equation (5), in which the weights of the expected operation cost and of the likelihood are applied to the average value of the operation cost. The same applies to Embodiment 2 below.

Embodiment 2.
In addition to the configuration of Embodiment 1, Embodiment 2 shows a configuration in which specific commands are selected before the average value of the operation cost is calculated.
FIG. 9 is a block diagram showing the configuration of the command processing apparatus according to Embodiment 2, in which a command selection unit 8 is added to the command processing apparatus 10 of Embodiment 1. The command selection unit 8 is a further unit, separate from the command selection unit 3. In the following, parts that are the same as or equivalent to the components of the command processing apparatus according to Embodiment 1 are given the same reference numerals as in Embodiment 1, and their description is omitted or simplified.

The command selection unit 8 receives the intermediate commands and likelihoods selected by the command selection unit 3 and refers to the operation cost database memory 7. It takes as selection commands, for example, those intermediate commands whose operation cost to the intermediate command with the highest likelihood is at or below a fixed value, and outputs the selection commands and their likelihoods to the likelihood correction unit 4a.
The likelihood correction unit 4a receives the selection commands and likelihoods, refers to the operation cost database memory 7, calculates the average value of the operation cost from the operation costs, and outputs the command for which the average value of the operation cost is smallest.
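The narrowing performed by the command selection unit 8 can be sketched as below. The sketch assumes the tree of FIG. 1 with F11 to F13 under F10 and F21 to F23 under F20, and a cost of 1 per screen move. The threshold `max_cost`, the likelihood given to F21, and all function names are hypothetical values chosen for illustration.

```python
# Hypothetical parent map for the hierarchy of FIG. 1.
PARENT = {
    "F0": None, "F10": "F0", "F20": "F0",
    "F11": "F10", "F12": "F10", "F13": "F10",
    "F21": "F20", "F22": "F20", "F23": "F20",
}

def path_to_root(node):
    """Return the list of nodes from `node` up to the initial node."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def operation_cost(src, dst):
    """Number of screen moves from src to dst, at a cost of 1 per move."""
    up, down = path_to_root(src), path_to_root(dst)
    for steps_up, ancestor in enumerate(up):
        if ancestor in down:
            return steps_up + down.index(ancestor)
    raise ValueError("nodes are not in the same tree")

def narrow_candidates(likelihoods, max_cost):
    """Keep the commands within max_cost moves of the most likely command."""
    best = max(likelihoods, key=likelihoods.get)  # first one wins on a tie
    return {command: likelihood
            for command, likelihood in likelihoods.items()
            if operation_cost(best, command) <= max_cost}

# F21 and its likelihood of 2 are invented to show a distant candidate.
candidates = {"F11": 6, "F12": 6, "F13": 5, "F10": 3, "F21": 2}
selection_commands = narrow_candidates(candidates, max_cost=2)
print(selection_commands)  # {'F11': 6, 'F12': 6, 'F13': 5, 'F10': 3}
```

F21 is four moves away from F11 and is dropped, so the likelihood correction unit 4a would evaluate only the F10 to F13 group.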

Next, the operation of the command processing apparatus according to Embodiment 2 will be described with reference to the flowchart of FIG. 10. Steps that perform the same processing as in the command processing apparatus of Embodiment 1 are given the same reference signs as in FIG. 7, and their description is omitted or simplified.
Based on the recognition result input in step ST2, the command selection unit 3 refers to the conversion database memory 6 and outputs the intermediate commands and their likelihoods to the command selection unit 8 (step ST11). Based on the intermediate commands and likelihoods input from the command selection unit 3 in step ST11, the command selection unit 8 refers to the operation cost database memory 7, obtains the selection commands and their likelihoods, and outputs them to the likelihood correction unit 4a (step ST12).

Based on the selection commands and likelihoods input in step ST12, the likelihood correction unit 4a refers to the operation cost database memory 7 and outputs the final command (step ST13). The final command C^ output in step ST13 is given by Equation (2) above. The likelihood correction unit 4a operates as described above with reference to FIG. 2.

As described above, according to Embodiment 2, the apparatus includes the operation cost database memory 7, which stores a database of the operation costs associated with transitions between commands, and the likelihood correction unit 4a, which refers to the operation cost database in the operation cost database memory 7, calculates the average value of the operation cost from the operation costs, and outputs the command with the smallest average value as the final command. As a result, commands can be selected with the hierarchical relationship among commands taken into account, and the effort of the screen operations needed to correct the selected command can be reduced.

Furthermore, according to Embodiment 2, the apparatus includes the command selection unit 8, which refers to the operation cost database memory 7 and selects the commands that satisfy a predetermined condition. This prevents a situation in which, depending on how the operation costs are set, the intermediate commands are always ignored and a command higher in the operation hierarchy is selected. It also reduces the amount of computation needed to calculate the evaluation value.

DESCRIPTION OF SYMBOLS: 1 voice input unit, 2 voice recognition processing unit, 3 command selection unit, 4, 4a likelihood correction unit, 5 recognition dictionary database memory, 6 conversion database memory, 7 operation cost database memory, 8 command selection unit, 10 command processing apparatus.

Claims

1. A command processing apparatus comprising:
a recognition processing unit that performs vocabulary recognition on input data entered by a user and outputs probable vocabulary as a recognition result;
a command selection unit that calculates, for the recognition result, a likelihood, that is, the plausibility of a model as inferred in reverse from the input data on the assumption that results arise according to that model, and selects intermediate commands, which are probable command candidates, based on the likelihood;
an operation cost database memory that stores the operation costs associated with operations between commands, the operation cost being defined as the operation required when the user has made an operation error; and
a likelihood correction unit that selects, from the intermediate commands, the command that minimizes an evaluation value calculated based on the operation cost.

2. The command processing apparatus according to claim 1, further comprising a command selection unit that refers to the operation costs and the likelihoods of the intermediate commands and selects, from among the intermediate commands, the intermediate commands to be output to the likelihood correction unit.