JP2006195576A

JP2006195576A - Onboard voice recognizer

Info

Publication number: JP2006195576A
Application number: JP2005004360A
Authority: JP
Inventors: Masaaki Ichihara; 雅明市原
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2005-01-11
Filing date: 2005-01-11
Publication date: 2006-07-27
Anticipated expiration: 2025-01-11
Also published as: JP4466379B2

Abstract

PROBLEM TO BE SOLVED: To provide a voice recognizer capable of providing a highly convenient search function.

SOLUTION: This onboard voice recognizer 10 includes a voice recognizing means 20 that performs voice recognition processing on speech detected in the cabin; a display control means 30 that displays each identification character sequence included in the speech in individual, mutually distinguishable display areas on a touch panel display 44 and detects a user's touch operation on each display area; and a retrieval system 70 that retrieves information from a database according to a prescribed retrieval condition, based on each identification character sequence displayed on the touch panel display. In response to a prescribed touch operation on a display area on the touch panel display, a state is formed in which the identification character sequence displayed in that display area is treated as an exclusion condition in the retrieval condition.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to an in-vehicle speech recognition apparatus that performs searches using speech recognition.

Conventionally, a technique is known in which low-ranked recognition results are stored as preliminary recognition candidates (misrecognition candidates) during speech recognition, the high-ranked recognition result is displayed, and, when the user indicates that the high-ranked result is a misrecognition, the user can select the desired recognition result from among the preliminary candidates (see, for example, Patent Document 1). In this prior art, when the user operates a function key displayed on the display screen, the preliminary recognition candidates (misrecognition candidates) are listed on the screen and the user can select the desired result from among them, so the user does not need to speak again.

Patent Document 1: JP 2000-259178 A

Incidentally, this type of speech recognition apparatus mounted on a vehicle can be linked with the destination search and setting function of a navigation device. For example, a user who wants to go to a nearby hamburger shop says “hamburger”, has the speech recognition apparatus recognize “hamburger”, and operates a search button. The apparatus then searches for nearby hamburger shops based on the current vehicle position and predetermined map data, and shows all of the search results together on the display.

Here, even among hamburger shops, the user may want to go to a hamburger shop other than “ABC Burger (store name)”, or may want to go to either “DCE Burger House (store name)” or “FGH Burger (store name)”.

However, with a conventional speech recognition apparatus as described above, the user must personally find and select the desired hamburger shop (for example, a hamburger shop other than “ABC Burger”) from search results that can be enormous (a list of many hamburger shops). This is inadequate in terms of convenience and user-friendliness.

On the other hand, with the speech recognition apparatuses that can currently be mounted on vehicles, narrowing down search results by speech recognition runs into a recognition accuracy problem: at present it is difficult to recognize exclusion conditions such as “other than” or “not”, and OR conditions such as “or”.

Accordingly, an object of the present invention is to provide a speech recognition apparatus that can offer a highly convenient search function without requiring a high speech recognition capability.

To solve the above problem, one aspect of the present invention provides an in-vehicle speech recognition apparatus comprising:
speech recognition means for performing speech recognition processing on an utterance detected in the vehicle;
display control means for displaying, on a touch panel display, each identification character string contained in the utterance in separate, mutually distinguishable display areas, and for detecting a user's touch operation on each display area on the touch panel display; and
a search system for searching information in a database according to a predetermined search condition, based on the identification character strings displayed on the touch panel display,
wherein, in response to a predetermined touch operation on a display area on the touch panel display, a state is formed in which the identification character string displayed in that display area is treated as an exclusion condition in the search condition.

In this aspect, depending on the number of touch operations on a display area on the touch panel display, one of the following states may be formed:
(1) a first state in which the identification character string displayed in the display area is treated in the search condition as an AND condition with respect to the other identification character strings;
(2) a second state in which the identification character string is not considered at all in the search condition; and
(3) a third state in which the identification character string is treated as an exclusion condition in the search condition.

The display control means may change the display state of the identification character string displayed in a display area according to the manner of the touch operation on that display area on the touch panel display. The speech recognition means may be one that does not recognize words relating to search conditions, such as “or”, “and”, “other than”, and “not”.

Two or more identification character strings displayed in two or more predetermined display areas among the display areas may be treated in the search condition as OR conditions with respect to other identification character strings. The two or more predetermined display areas may be arranged in a direction that distinguishes them from the other display areas. The display control means may display identification character strings of the same kind in the two or more predetermined display areas.

Further, where the identification character strings displayed in the display areas are initially set to be joined to one another by AND conditions in the search condition,
one operation area may be assigned to each predetermined pair of adjacent display areas, and
in response to a touch operation on one such operation area, the link in the search condition between the two identification character strings displayed in the two display areas associated with that operation area may be switched from an AND condition to an OR condition. In this case, in a first mode suited to a user progressively settling on a not-yet-decided destination, the first, second, or third state is selectively formed, whereas in a second mode suited to a user speaking an already decided destination for recognition, only the first or second state may be formed. The two modes may be switched by a switch operation by the user or according to a specific keyword contained in the utterance. In the second mode, the speech recognition means may perform recognition based on a recognition dictionary containing only map-related terms.

According to the present invention, a speech recognition apparatus can be obtained that offers a highly convenient search function without requiring a high speech recognition capability.

The best mode for carrying out the present invention is described below with reference to the drawings.

FIG. 1 is a system configuration diagram showing one embodiment of the in-vehicle speech recognition apparatus according to the present invention. The speech recognition apparatus 10 of this embodiment consists of a microcomputer equipped with a speech recognition engine 20. The speech recognition engine 20 includes a preprocessing unit 22, a feature extraction unit 24, an acoustic model processing/matching unit 26, and a language model processing/matching unit 28.

The speech recognition apparatus 10 includes a microphone 40 that picks up sound (speech) in the passenger compartment. Speech detected by the microphone 40 undergoes predetermined processing such as amplification and noise removal in the preprocessing unit 22 and is sent to the feature extraction unit 24. The feature extraction unit 24 extracts features from the detected speech signal (utterance data), and recognition candidates are then determined via the acoustic model processing/matching unit 26 and the language model processing/matching unit 28. The present invention does not specify the details of the speech recognition processing and may be based on any speech recognition technology.

The speech recognition apparatus 10 further includes a dialogue control unit 30 that controls the dialogue with the user. The recognition candidates obtained by the speech recognition engine 20 (its recognition results) are input to the dialogue control unit 30 as character data. As described later, the dialogue control unit 30 displays the recognition results of the speech recognition engine 20 on a display 44 installed in the vehicle. The dialogue control unit 30 may also output the recognition results as speech from a speaker 42 installed in the vehicle, via a speech synthesis unit 32.

The display 44 is a touch panel display that accepts various inputs through the user's touch operations. The dialogue control unit 30 includes means for detecting the user's touch operations on each display area on the touch panel display 44 (such as the keyword frames 90 described later), and controls the screen so that dialogue input is realized through a graphical user interface.

A search system 70 is connected to the dialogue control unit 30. As described later, the dialogue control unit 30 accepts search instructions and the like for the search system 70 from the user via the touch panel display 44, and displays the search results from the search system 70 on the touch panel display 44.

The search system 70 includes a map database 72 that stores various information such as place names, facilities, and their locations and genres. As detailed below, the search system 70 has a function of searching for and extracting appropriate information from the map database 72 according to the search conditions the user sets via the touch panel display 44.

For example, when a user who wants to go to a nearby hamburger shop says “hamburger” toward the microphone 40, the dialogue control unit 30 displays “hamburger” on the display 44 as the recognition result of the speech recognition engine 20. If the user then touches, for example, the search switch 86 on the touch panel display 44 (see FIG. 3), the dialogue control unit 30 has the search system 70 extract restaurant information belonging to the category [hamburger shop] from the map database 72, and displays the extracted restaurant information on the display 44. The user then selects and sets the desired hamburger shop as the destination from the restaurant information shown on the display 44. As a result, the dialogue control unit 30 has the search system 70 run a route search to that hamburger shop and displays the resulting route on the display 44 together with a map. This is the basic flow of dialogue from destination setting to the start of route guidance.

Incidentally, the amount of information that can be shown on the display 44 is generally limited, and displaying a huge amount of information is actually inconvenient. When the amount of matching information (in the example above, restaurant information belonging to the category [hamburger shop]) is at or above a predetermined amount, search conditions for narrowing down the information therefore need to be added.

However, in a configuration where search conditions are entered by speech recognition as described above, entering complex search conditions is difficult, unlike a search system operated on a personal computer with a keyboard or the like. Narrowing of search conditions has therefore generally meant adding, for example, a regional condition as an AND condition, or changing to a more detailed condition (for example, a store name in the example above).

In contrast, in this embodiment, as detailed below, a wide variety of narrowing conditions can be set by supplementing speech-recognition input, which remains the main form of input, with very simple switch operations.

FIG. 2 is a flowchart showing the flow of the characteristic processing executed by the speech recognition apparatus 10 of this embodiment. This example assumes that the user wants to go to a hamburger shop other than “ABC Burger (store name)” in Toyota City.

First, in step 100, the speech recognition apparatus 10 is started, for example when the ignition switch is turned on, and enters a standby state waiting for the user's utterance. The speech recognition apparatus 10 may be configured to perform the speech recognition processing described above on speech detected by the microphone 40 only when a predetermined condition is satisfied (for example, only when a predetermined button is operated).

When the user speaks, the speech recognition apparatus 10 executes speech recognition processing and displays the recognition results on the display 44 (step 110). Suppose the user says, “I want to eat a hamburger, but not ABC Burger. Toyota City would be nice if possible,” and the speech recognition apparatus 10 recognizes “Hamburg”, “ABC Burger”, and “Toyota City” as keyword candidates (identification character strings). In this case, as shown in FIG. 3, the recognition results are displayed on the display 44 inside keyword frames 90, together with various function switches 80.

A keyword frame 90 is provided for each keyword. That is, when i keywords are found in a given piece of utterance data, i keyword frames 90_i (i = 1, 2, …) are prepared. With a recognition engine that recognizes multiple keyword candidates for a single keyword, the candidate with the highest reliability is displayed in the keyword frame 90.
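The frame-per-keyword rule above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Candidate` type, the `build_keyword_frames` function, and the confidence scores are all hypothetical, and the recognizer is assumed to return a scored candidate list per keyword.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str          # recognized character string
    confidence: float  # hypothetical recognizer score, higher is better


def build_keyword_frames(recognized):
    """Return the string shown in each keyword frame.

    `recognized` holds one list of candidates per keyword found in the
    utterance; each frame shows the most reliable candidate for its keyword.
    """
    return [max(cands, key=lambda c: c.confidence).text for cands in recognized]


# Illustrative scores only: "Hamburg" narrowly outranks "hamburger".
recognized = [
    [Candidate("Hamburg", 0.62), Candidate("hamburger", 0.58)],
    [Candidate("ABC Burger", 0.91)],
    [Candidate("Toyota City", 0.88)],
]
frames = build_keyword_frames(recognized)
print(frames)
```

One frame is produced per keyword, so the number of frames always equals the number of keywords found, regardless of how many candidates each keyword has.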

The keyword frame 90 of this embodiment not only displays a keyword candidate but also functions as a touch switch for determining the search condition applied to that candidate. That is, the keyword frame 90 is configured so that the search condition changes according to how the user operates it.

After displaying each recognition result in a keyword frame 90 on the display 44 as described above, the dialogue control unit 30 enters, in step 120, a state of waiting for further input from the user.

At this time, the dialogue control unit 30 determines the search condition for the keyword candidate in a keyword frame 90 based on the number of times S the user has touched that frame. The touch count S has an initial value of 0, increases by 1 with each touch, and returns to 0 on the third touch. That is, S starts at 0, each touch sets S = S + 1, and S is reset to 0 when it reaches 3. A separate count is kept for each keyword frame 90_i; the touch count for keyword frame 90_i is S(i).

The dialogue control unit 30 of this example not only determines the search condition of keyword frame 90_i according to the user's touch count S(i) for that frame, but also changes the display state of keyword frame 90_i.

For example, when keyword frame 90_i is touched once (S(i) = 1), the dialogue control unit 30 makes the color of keyword frame 90_i lighter than in the initial state, as shown in FIG. 3 (step 130). This means that the keyword candidate in that frame is removed from the search condition. In this example, the user can drop “Hamburg”, a misrecognition of “hamburger”, from the keyword candidates by touching its keyword frame 90_i (i = 1) once.

When keyword frame 90_i is touched twice (S(i) = 2), the dialogue control unit 30 generates a “×” mark beside the keyword candidate in keyword frame 90_i, as shown in FIG. 3 (step 140). This means that the keyword candidate in that frame is an exclusion condition. In this example, the user can set “ABC Burger” as an exclusion condition by touching its keyword frame 90_i (i = 3) twice.

When keyword frame 90_i is touched three times (S(i) = 3, which wraps to 0), the dialogue control unit 30 returns the display state of keyword frame 90_i to the initial state (step 150).
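The three-way touch cycle described above behaves like a small state machine. The following is a minimal sketch under stated assumptions: the class and method names are hypothetical, and the text rendering (parentheses for the lighter color, a trailing “×” for exclusion) only stands in for the actual display changes.

```python
from enum import Enum


class FrameState(Enum):
    AND = 0        # S = 0: initial state, keyword is an AND condition
    IGNORED = 1    # S = 1: keyword is left out of the search condition
    EXCLUDED = 2   # S = 2: keyword is an exclusion condition


class KeywordFrame:
    """One keyword frame with its own touch count S(i)."""

    def __init__(self, text):
        self.text = text
        self.s = 0  # touch count, initial value 0

    def touch(self):
        # Each touch adds 1; the third touch wraps the count back to 0.
        self.s = (self.s + 1) % 3
        return self.state

    @property
    def state(self):
        return FrameState(self.s)

    def render(self):
        if self.state is FrameState.IGNORED:
            return f"({self.text})"   # stands in for the lighter frame color
        if self.state is FrameState.EXCLUDED:
            return f"{self.text} ×"   # the "×" mark beside the keyword
        return self.text


frame = KeywordFrame("ABC Burger")
frame.touch()
frame.touch()
print(frame.state.name, frame.render())
```

Because each `KeywordFrame` holds its own counter, touching one frame never affects the state of another, which matches the per-frame count S(i).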

Thus, in this embodiment, the user can easily tell the search condition applied to the keyword candidate in a keyword frame 90_i at a glance from the display state of that frame. The display state of keyword frame 90_i can be changed in many ways, not only by changing the lightness or luminance of the display but also by highlighting such as superimposing an additional mark or blinking; the present invention is not limited to the changes described above.

In the above example, to put “hamburger” into keyword frame 90_1 in place of the dropped “Hamburg”, the user may touch the re-recognition switch 84 on the display 44. In this case, in response to a command from the dialogue control unit 30, the speech recognition engine 20 may remove “Hamburg” from the recognition dictionary and run the recognition processing again on the utterance data held in a buffer. Alternatively, in a configuration in which the speech recognition engine 20 can output multiple candidates from the start, “hamburger”, stored in memory as the next candidate after “Hamburg”, may be displayed in keyword frame 90_1 in response to operation of the re-recognition switch 84.
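The second variant above (falling back to a stored next candidate) can be sketched as follows. This is an assumption-laden illustration: `next_candidate`, the `(text, score)` tuple layout, and the scores are hypothetical, and the buffered re-recognition variant is not modeled.

```python
def next_candidate(candidates, rejected):
    """Return the best remaining candidate text, or None if none is left.

    `candidates` is a list of (text, score) pairs kept in memory from the
    original recognition pass; `rejected` is the set of texts the user has
    already dismissed.
    """
    remaining = [(text, score) for text, score in candidates if text not in rejected]
    if not remaining:
        return None
    return max(remaining, key=lambda pair: pair[1])[0]


# Candidates stored for the first keyword; the scores are illustrative.
stored = [("Hamburg", 0.62), ("hamburger", 0.58)]
replacement = next_candidate(stored, rejected={"Hamburg"})
print(replacement)
```

Returning `None` when every candidate has been rejected leaves it to the caller to decide what to do next, for example prompting the user to speak again.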

In this example, a reset switch 88 is also provided as another function switch 80. When the reset switch 88 is operated (step 160), the dialogue control unit 30 returns the display state of all keyword frames 90_i to the initial state, and the touch counts S(i) of all keyword frames 90_i are accordingly reset to the initial value 0. If the user then speaks again, the processing is executed again from step 100.

When the search-condition setting operations described above are complete, the user touches the search switch 86 on the display 44. When the search switch 86 is operated (step 170), the search system 70 executes a search on a command from the dialogue control unit 30. The search system 70 runs the search according to the search condition of each keyword candidate in the keyword frames 90_i (that is, according to the values of S(i)). In the above example, assuming “hamburger” has been re-entered in keyword frame 90_1, the search system 70 searches the map database 72 for, and extracts, hamburger shops other than “ABC Burger” in the “Toyota City” area, as the user wants. Alternatively, where a hamburger shop can be inferred from “ABC Burger” as in this example, hamburger shops other than “ABC Burger” in “Toyota City” can still be found appropriately even when the touch count of keyword frame 90_1 is S(1) = 1 (that is, simply by dropping “Hamburg” from the keyword candidates). In this search, the keyword candidates in the keyword frames 90_i are, as usual, regarded as joined by AND conditions.
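The way the touch counts S(i) translate into a search can be sketched as a simple filter. This is a minimal illustration only: the `Facility` record, its fields, the exact-match rule, and the sample database are assumptions, and the inference of “hamburger shop” from a store name is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Facility:
    name: str
    genre: str
    city: str

    def matches(self, keyword):
        # Hypothetical matching rule: a keyword hits if it equals the
        # facility's name, genre, or city (case-insensitive).
        return keyword.lower() in (self.name.lower(), self.genre.lower(), self.city.lower())


def search(db, frames):
    """Filter `db` using (keyword, S) pairs taken from the keyword frames.

    S = 0 keywords are joined by AND, S = 1 keywords are ignored, and
    S = 2 keywords exclude any facility they match.
    """
    required = [kw for kw, s in frames if s == 0]
    excluded = [kw for kw, s in frames if s == 2]
    return [
        f for f in db
        if all(f.matches(kw) for kw in required)
        and not any(f.matches(kw) for kw in excluded)
    ]


db = [
    Facility("ABC Burger", "hamburger", "Toyota City"),
    Facility("DCE Burger House", "hamburger", "Toyota City"),
    Facility("FGH Burger", "hamburger", "Nagoya"),
    Facility("Noodle Bar", "ramen", "Toyota City"),
]
frames = [("hamburger", 0), ("ABC Burger", 2), ("Toyota City", 0)]
results = [f.name for f in search(db, frames)]
print(results)
```

In this sample data only “DCE Burger House” satisfies both AND keywords without matching the excluded store name.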

Thus, according to this embodiment, exclusion conditions can be set with simple switch operations as described above. This widens the options for narrowing down search conditions, makes it possible to set search conditions that match the user's wishes, and improves the convenience of a search system based on speech recognition.

In addition, because exclusion conditions can be set with simple switch operations as described above, there is no need to recognize exclusion expressions such as “other than” or “not” by speech. Accordingly, hard-to-recognize particles, conjunctions, and the like in the user's utterance data can be removed from the recognition targets (that is, the keywords), and only specific words such as addresses, genres, and nouns need be recognized. As a result, highly accurate recognition results can be obtained without demanding a very high recognition capability from the speech recognition engine 20.

In this embodiment, operation of the function switches 80 other than the keyword frames 90_i, such as the search switch 86, may be replaced by voice input. This reduces the user's manual operations as far as possible and may further improve the convenience of voice input.

FIG. 4 is an explanatory diagram of a search condition setting mode according to another embodiment of the present invention, showing a display screen on the display 44 (a screen similar to that of FIG. 3). This example assumes that the user wants to go to “DCE Burger House (store name)” or “FGH Burger (store name)” in Toyota City.

In this case, the user says, for example, “I want to eat a hamburger, and I feel like going to DCE Burger House or FGH Burger. Search only within Toyota City.” Suppose the speech recognition apparatus 10 correctly recognizes “hamburger”, “DCE Burger House”, “FGH Burger”, and “Toyota City” as keyword candidates.

As shown in FIG. 4, the dialogue control unit 30 of this embodiment treats “DCE Burger House” and “FGH Burger” as keyword candidates of the same kind and displays them in keyword frames 90 arranged side by side in a row on the display 44. For the purposes of this description, the keyword frames 90 in which such same-kind keyword candidates are displayed are called “similar-keyword frames 90”. A similar-keyword frame 90 also retains the functions of a keyword frame 90 as in the embodiment described above. That is, for every keyword frame 90, including the similar-keyword frames 90, an exclusion condition can be set or the keyword candidate can be deleted from the recognition dictionary according to the touch count.

As shown in FIG. 4, an inversion switch 89 is provided between adjacent similar-keyword frames 90. The inversion switch 89 is configured so that its search condition is an OR condition in the initial (default) state and is inverted to an AND condition according to the user's operation on it.

The dialogue control unit 30 determines the search condition between two adjacent similar-keyword frames 90 based on the number of times G the user has touched the inversion switch 89 between them. The touch count G has an initial value of 0, increases by 1 with each touch, and returns to 0 on the second touch. Like the keyword frames 90_i, the inversion switches 89 are operated independently of one another.

Depending on the touch count G of an inversion switch 89, the dialogue control unit 30 not only determines the search condition for that switch but also changes how the switch is displayed. For example, an inversion switch 89 in the initial state (G = 0) may be labeled “OR”, and one in the inverted state (G = 1) may be labeled “AND”.
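The touch-count rule for the inversion switch 89 can be sketched as a two-state toggle. The following is only an illustrative model: the class name `InversionSwitch` and its attributes are hypothetical and do not appear in the source, which specifies only that G starts at 0, increments per touch, wraps back to 0 on the second touch, and that G = 0 means OR and G = 1 means AND.

```python
from dataclasses import dataclass


@dataclass
class InversionSwitch:
    """Hypothetical model of one inversion switch 89."""

    g: int = 0  # touch count G; the initial value is 0

    def touch(self) -> None:
        # G increases by 1 per touch and returns to 0 on the second touch.
        self.g = (self.g + 1) % 2

    @property
    def condition(self) -> str:
        # G = 0 is the default OR condition; G = 1 is the inverted AND condition.
        return "OR" if self.g == 0 else "AND"

    @property
    def label(self) -> str:
        # The text shown on the switch follows the current condition.
        return self.condition
```

Because each instance keeps its own `g`, several switches modeled this way behave independently, which matches the statement that the inversion switches are operated independently of one another.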

In the example above, the user can start the search by touching the search switch 86 on the display 44 without operating the inversion switch 89 at all. When the search switch 86 is operated, the search system 70 executes a search on a command from the dialogue control unit 30. In doing so, the search system 70 follows the search condition of each keyword candidate in the keyword frames 90_i described above (that is, the value of S(i)) and the search condition of the inversion switch 89 (that is, the value of G). In this example, the inversion switch 89 between the two similar keyword frames 90 containing “DCE Burger House” and “FGH Burger” is in the OR condition, so the search system 70 searches the map database 72 for hamburger shops named “DCE Burger House” or “FGH Burger” in the “Toyota City” area and extracts them, as the user intends. In this search, the keyword candidates in the similar keyword frames 90 and the keyword candidates in the keyword frames 90_i are treated as being joined by an AND condition.
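How the per-frame conditions and the inversion switch combine into one search can be sketched as follows. This is a simplified illustration under several assumptions: the names `FrameState`, `Frame`, `matches`, and `search` are hypothetical, database records are modeled as a name plus a set of tags, a single switch condition is applied to the whole group of similar keyword frames, and the three frame states are named after the states in the claims rather than tied to specific values of S(i).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Set


class FrameState(Enum):
    AND = "and"          # keyword must match
    IGNORED = "ignored"  # keyword is not considered in the search
    EXCLUDE = "exclude"  # keyword acts as an exclusion condition


@dataclass
class Frame:
    keyword: str
    state: FrameState = FrameState.AND


def matches(tags: Set[str], frames: List[Frame], similar: List[Frame],
            switch_condition: str = "OR") -> bool:
    # Any excluded keyword that is present rejects the record.
    for f in frames + similar:
        if f.state is FrameState.EXCLUDE and f.keyword in tags:
            return False
    # Ordinary keyword frames are joined by AND.
    if not all(f.keyword in tags for f in frames if f.state is FrameState.AND):
        return False
    # Similar keyword frames are joined by the inversion switch condition,
    # and the group as a whole is ANDed with the ordinary frames.
    group = [f.keyword in tags for f in similar if f.state is FrameState.AND]
    if not group:
        return True
    return any(group) if switch_condition == "OR" else all(group)


def search(database: List[Dict], frames: List[Frame], similar: List[Frame],
           switch_condition: str = "OR") -> List[str]:
    return [r["name"] for r in database
            if matches(r["tags"], frames, similar, switch_condition)]


# Made-up records for illustration only.
DATABASE = [
    {"name": "DCE Burger House (Toyota)", "tags": {"Toyota City", "DCE Burger House"}},
    {"name": "FGH Burger (Toyota)", "tags": {"Toyota City", "FGH Burger"}},
    {"name": "FGH Burger (Nagoya)", "tags": {"Nagoya City", "FGH Burger"}},
]
```

With “Toyota City” in an ordinary frame and the two burger names in similar frames, the default OR condition keeps both Toyota City shops, while switching to AND would require a single record to carry both names.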

Thus, according to this embodiment, multiple keyword candidates can be joined by an OR condition or an AND condition with the simple switch operation described above. This widens the options for narrowing a search, lets the user set search conditions that match what he or she wants, and improves the convenience of a search system based on speech recognition.

In this embodiment, keyword candidates in the similar keyword frames 90 may be interchangeable with keyword candidates in the keyword frames 90_i. For example, if “FGH Burger” is mistakenly displayed in an ordinary keyword frame 90 in the example above, the user operates, for example, the swap switch 82 and then touches the keyword frame 90 containing “FGH Burger”. In response, the dialogue control unit 30 may move the keyword candidate in that keyword frame 90 (that is, “FGH Burger”) into a similar keyword frame 90. Alternatively, the user may press and hold the keyword frame 90 containing “FGH Burger”, and in response the dialogue control unit 30 may change that keyword frame 90 itself into a similar keyword frame 90. In this case, the dialogue control unit 30 may display, at an appropriate position, an inversion switch 89 for determining the search condition between the two similar keyword frames 90.

The embodiments described above are particularly suitable when the user has no fixed destination at the outset and settles on one interactively. The speech recognition apparatus 10 of this embodiment may therefore be selectively operable in an interactive setting mode, in which the destination is set interactively, and an ordinary destination setting mode. The reason is that in the destination setting mode, where the destination is fixed from the start, the user simply utters that destination. There is little need to narrow the various search conditions as described above, and restricting the recognition targets to map-related terms (place names) is more useful from the standpoint of recognition accuracy. Accordingly, the destination setting mode may use display control and a recognition dictionary different from those shown in FIGS. 3 and 4. For example, in the destination setting mode, a keyword frame 90 may only allow a keyword candidate to be deleted from the recognition dictionary, depending on whether or not the frame is touched. Also, in the destination setting mode, when multiple keyword candidates are recognized for a single keyword, the candidates may be displayed in keyword frames 90 so that the user can select or exclude them by operating the frames.

Switching between these two modes (the interactive setting mode and the destination setting mode) may be done by a switch operation by the user, or it may be done automatically by estimating the user's intention from the recognition results for the user's conversation or utterances. In the latter case, for example, specific keywords may be used to judge whether the user's destination is already decided or still under consideration.
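Automatic switching based on specific keywords could look like the sketch below. The source does not say which keywords are used, so the cue phrases, the mode names, and the function `infer_mode` are all hypothetical placeholders chosen for illustration.

```python
# Hypothetical cue phrases; the source does not list the actual keywords.
DECIDED_CUES = ("go to", "take me to")
UNDECIDED_CUES = ("somewhere", "hungry", "what about")


def infer_mode(utterance: str, current_mode: str) -> str:
    """Return "interactive" or "destination" based on cue phrases.

    If no cue is found, the current mode is kept unchanged.
    """
    text = utterance.lower()
    # Cues suggesting the destination is still under consideration win first.
    if any(cue in text for cue in UNDECIDED_CUES):
        return "interactive"
    # Cues suggesting the destination is already decided.
    if any(cue in text for cue in DECIDED_CUES):
        return "destination"
    return current_mode
```

Checking the undecided cues first is a design choice of this sketch: an utterance that mixes both kinds of cue is treated as still undecided, so the richer interactive mode remains available.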

Preferred embodiments of the present invention have been described in detail above. The present invention is not limited to these embodiments, and various modifications and substitutions can be made to them without departing from the scope of the present invention.

For example, in the embodiments described above the search condition changes according to the number of times a keyword frame 90 is touched, but it may instead change according to another form of touch operation, such as a long press.

In the embodiments described above, the number and arrangement of the keyword frames 90 and the position and arrangement of the inversion switches 89 on the display screen of the display 44 can also be changed in various ways. For example, as indicated by the broken line in FIG. 4, an inversion switch 89b that represents an AND condition by default may be placed between vertically arranged keyword frames 90.

The embodiments described above mainly concern processing up to the setting of the destination, but they are also applicable to searching for and selecting a guidance route after the destination has been set. For example, during a guidance route search, the dialogue control unit 30 may respond to the speech recognition result “expressway” by displaying a touch switch that asks the user whether a route using the expressway is desired. In this case, when the user operates the touch switch, only guidance routes that do not use the expressway may be searched.

In the embodiments described above, when no regional condition (Toyota City in the example above) is given, the search system 70 may extract only information within a predetermined area relative to the current position of the vehicle. In this case, the current position of the vehicle may be detected using a GPS (Global Positioning System) receiver, a beacon receiver, an FM multiplex receiver, and various sensors such as a vehicle speed sensor and a gyro sensor.
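One way to picture the fallback to a predetermined area around the vehicle is a simple radius filter. The source does not define the shape or size of the area, so the circular area, the 10 km default radius, the record fields `lat` and `lon`, and the function names below are all assumptions made for illustration. Distance is computed with the standard haversine formula.

```python
import math
from typing import Dict, List, Tuple


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def within_area(records: List[Dict], current: Tuple[float, float],
                radius_km: float = 10.0) -> List[Dict]:
    """Keep only records inside an assumed circular area around the vehicle."""
    lat, lon = current
    return [rec for rec in records
            if haversine_km(lat, lon, rec["lat"], rec["lon"]) <= radius_km]
```

Such a filter would be applied only when the user has not supplied a regional keyword; otherwise the regional keyword itself narrows the search.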

In the embodiments described above, the dialogue control unit 30 may also output audio through the speaker 42 in response to the user's touch operation on a keyword frame 90_i. For example, the changed search condition may be announced by voice, such as “exclusion condition”.

As is also clear from the above, in the embodiments described above all or part of the speech recognition apparatus 10 and the other components may be implemented as part of a navigation device. For example, the display 44 and the map database 72 may be the display and map data that a navigation device ordinarily includes.

FIG. 1 is a system configuration diagram showing an embodiment of the in-vehicle speech recognition apparatus according to the present invention.
FIG. 2 is a flowchart showing the flow of the characteristic processing executed by the speech recognition apparatus 10 of this embodiment.
FIG. 3 is a diagram showing a display screen on the display 44 according to an embodiment of the present invention.
FIG. 4 is a diagram showing a display screen on the display 44 according to another embodiment of the present invention.

Explanation of symbols

10 Speech recognition apparatus
20 Speech recognition engine
22 Preprocessing unit
24 Feature extraction unit
26 Acoustic model processing / matching unit
28 Language model processing / matching unit
30 Dialogue control unit
40 Microphone
42 Speaker
44 Display
70 Search system
72 Map database
80 Function switch
84 Re-recognition switch
86 Search switch
88 Reset switch
89 Inversion switch
90 Keyword frame

Claims

An in-vehicle speech recognition device comprising:
speech recognition means for performing speech recognition processing on utterances detected in the vehicle cabin;
display control means for displaying each identification character string included in the utterances in a separate display area, distinguishable from the others, on a touch panel display, and for detecting a user's touch operation on each display area on the touch panel display; and
a search system for searching information in a database according to a predetermined search condition based on each identification character string displayed on the touch panel display,
wherein, in response to a predetermined touch operation on a display area on the touch panel display, a state is formed in which the identification character string displayed in that display area is taken into account as an exclusion condition in the search condition.

The in-vehicle speech recognition device according to claim 1, wherein, according to the number of touch operations on the display area on the touch panel display, one of the following states is formed:
(1) a first state in which the identification character string displayed in the display area is taken into account in the search condition as an AND condition with respect to the other identification character strings;
(2) a second state in which the identification character string is not taken into account in the search condition; and
(3) a third state in which the identification character string is taken into account as an exclusion condition in the search condition.

The in-vehicle speech recognition device according to claim 1, wherein the display control means changes the display state of the identification character string displayed in the display area according to the form of the touch operation on that display area on the touch panel display.

The in-vehicle speech recognition device according to claim 1, wherein the speech recognition means does not recognize words that express search conditions, such as “or”, “and”, “other than”, and “not”.

The in-vehicle speech recognition device according to claim 1, wherein two or more identification character strings displayed in two or more predetermined display areas among the display areas are taken into account in the search condition as OR conditions with respect to the other identification character strings.

The in-vehicle speech recognition device according to claim 5, wherein the two or more predetermined display areas are arranged in a direction that makes them distinguishable from the other display areas.

The in-vehicle speech recognition device according to claim 5, wherein the display control means displays identification character strings of the same kind, among the identification character strings, in the two or more predetermined display areas.

The in-vehicle speech recognition device according to claim 1, wherein the identification character strings displayed in the respective display areas are initially set to be joined to one another by AND conditions in the search condition,
one operation area is assigned to each pair of adjacent display areas, and
in response to a touch operation on the operation area, the connection in the search condition between the two identification character strings displayed in the two display areas associated with that operation area is switched from the AND condition to the OR condition.

The in-vehicle speech recognition device according to claim 2, wherein,
in a first mode suited to a user settling on a destination that is not yet decided, the first state, the second state, or the third state is selectively formed, and
in a second mode suited to speech recognition of an uttered destination that the user has already decided, only the first state or the second state is formed.

The in-vehicle speech recognition device according to claim 9, wherein switching between the two modes is performed in response to a switch operation by the user or a specific keyword included in the utterance.

The in-vehicle speech recognition device according to claim 9, wherein, in the second mode, the speech recognition means performs speech recognition based on a recognition dictionary that includes only map-related terms.