JP2011203349A

JP2011203349A - Speech recognition system and automatic retrieving system

Info

Publication number: JP2011203349A
Application number: JP2010068463A
Authority: JP
Inventors: Toshiyuki Nanba; 利行難波; Hiroaki Sekiyama; 博昭関山; Minako Fujishiro; 実奈子藤城; Kazuomi Ota; 和臣太田; Taisuke Katayama; 泰輔片山
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2010-03-24
Filing date: 2010-03-24
Publication date: 2011-10-13
Anticipated expiration: 2030-03-24
Also published as: JP5434731B2

Abstract

PROBLEM TO BE SOLVED: To provide a speech recognition system and an automatic retrieving system capable of reducing incorrect recognition of keywords during natural conversation. SOLUTION: An automatic retrieving system 1 includes a telephone unit 3 installed in a vehicle P, and a controller unit 4, a telephone unit 5, and an operator terminal 6 installed in a call center Q. The controller unit 4 includes: a speech recognition processor 7 which obtains utterance information of a driver and an operator from the telephone unit 5 and which, when it determines that the utterance information includes a characteristic utterance, extracts a keyword relating to the driver's intention from the utterance contents before and after the characteristic utterance; and an automatic retrieving processor 8 which performs automatic point of interest (POI) retrieval using the keyword extracted by the speech recognition processor 7.

Description

The present invention relates to a speech recognition system that performs speech recognition during a call between a driver and an operator, and to an automatic search system that uses the speech recognition system.

A known conventional speech recognition system, described for example in Patent Document 1, performs speech recognition on a speaker's utterance, detects a keyword in the recognition result, and outputs information related to that keyword when a condition associated with the keyword is satisfied.

Patent Document 1: JP 2005-215726 A

In a natural, unscripted call, however, the speaker utters many words, which makes it difficult to pick out any particular word. As a result, the system may miss the important words that are actually needed (keywords) and pick up unnecessary words instead.

An object of the present invention is to provide a speech recognition system and an automatic search system that can reduce misrecognition of keywords during a natural call.

The speech recognition system of the present invention comprises: utterance information acquisition means for acquiring utterance information during a call between a driver and an operator; characteristic utterance determination means for determining whether the utterance information acquired by the utterance information acquisition means includes a characteristic utterance; and keyword extraction means for extracting a keyword from the utterance content before and after the characteristic utterance when the characteristic utterance determination means determines that a characteristic utterance is included.

Thus, the speech recognition system of the present invention acquires utterance information from a natural call between the driver and the operator and, when it determines that the utterance information includes a characteristic utterance, extracts a keyword from the utterance content before and after that characteristic utterance. Important keywords reflecting the driver's intent tend to appear before and after a characteristic utterance by the operator. Therefore, even in a natural call in which the driver and the operator utter many words, identifying the characteristic utterances around which keywords tend to appear reduces keyword misrecognition.

Preferably, the system further comprises dictionary switching means for switching the speech recognition dictionary used to recognize the speech of the driver and the operator. In this case, switching to an appropriate speech recognition dictionary at an appropriate time further reduces keyword misrecognition and shortens the keyword recognition time.

In this case, the system preferably further comprises dialogue phase discrimination means for discriminating the dialogue phase of the characteristic utterance, and the dictionary switching means preferably includes means for switching the speech recognition dictionary according to the dialogue phase discriminated by the dialogue phase discrimination means. Dialogue phases include, for example, service inquiry (asking what the driver needs) and condition elicitation. Using a speech recognition dictionary suited to the dialogue phase reduces keyword misrecognition more reliably.

The dictionary switching means may also include means for switching the speech recognition dictionary according to the words that appear before and after the characteristic utterance. In this case, using a speech recognition dictionary suited to those words reduces keyword misrecognition more reliably.

The system may further comprise search operation means with which the operator performs operations related to information search, and the dictionary switching means may include means for switching the speech recognition dictionary according to the search word when a search word is entered through the search operation means and the start of search processing is instructed. When the operator runs a search through the search operation means, using a speech recognition dictionary suited to the entered search word reduces keyword misrecognition more reliably.

The automatic search system of the present invention comprises: utterance information acquisition means for acquiring utterance information during a call between a driver and an operator; characteristic utterance determination means for determining whether the utterance information acquired by the utterance information acquisition means includes a characteristic utterance; keyword extraction means for extracting a keyword from the utterance content before and after the characteristic utterance when the characteristic utterance determination means determines that a characteristic utterance is included; and search means for performing an information search based on the keyword extracted by the keyword extraction means.

Thus, the automatic search system of the present invention acquires utterance information from a natural call between the driver and the operator and, when it determines that the utterance information includes a characteristic utterance, extracts a keyword from the utterance content before and after that characteristic utterance. Important keywords reflecting the driver's intent tend to appear before and after a characteristic utterance by the operator. Therefore, even in a natural call in which the driver and the operator utter many words, identifying the characteristic utterances around which keywords tend to appear reduces keyword misrecognition. In addition, because the information search is performed automatically from appropriate keywords with few misrecognitions, the operator does not have to enter search words, which improves the operator's work efficiency.

According to the present invention, misrecognition of keywords during a natural call can be reduced. This makes it possible to perform automatic information searches using those keywords appropriately.

FIG. 1 is a block diagram showing the schematic configuration of a first embodiment of the automatic search system according to the present invention.
FIG. 2 is a flowchart showing the details of the speech recognition process executed by the speech recognition processing unit shown in FIG. 1.
FIG. 3 is a table showing examples of characteristic utterances.
FIG. 4 is a flow diagram showing an example of a call between a driver and an operator.
FIG. 5 is a block diagram showing the schematic configuration of a second embodiment of the automatic search system according to the present invention.
FIG. 6 is a flowchart showing the details of the speech recognition dictionary switching process executed by the dictionary switching processing unit shown in FIG. 5.
FIG. 7 is a table showing examples of dialogue phases and characteristic utterances.
FIG. 8 is a flow diagram showing an example of switching the speech recognition dictionary during a call between a driver and an operator.

Preferred embodiments of the speech recognition system and the automatic search system according to the present invention are described in detail below with reference to the drawings.

FIG. 1 is a block diagram showing the schematic configuration of a first embodiment of the automatic search system according to the present invention. In the figure, the automatic search system 1 of this embodiment automatically performs a POI (Point of Interest) search based on a natural call between the driver of a vehicle P and an operator at a call center Q, and it includes a speech recognition system. A POI search is a search for facility information such as restaurants, hotels, convenience stores, and gas stations.

The automatic search system 1 comprises a car navigation system 2 and a telephone unit 3 mounted in the vehicle P, and a controller unit 4, a telephone unit 5, and an operator terminal 6 installed at the call center Q.

The telephone unit 3 is a mobile phone or in-vehicle radio with which the driver makes a voice call to the operator, and it is connected to the car navigation system 2.

The telephone unit 5 and the operator terminal 6 are connected to the controller unit 4. The operator uses the telephone unit 5 to make a voice call to the driver. The operator uses the operator terminal 6 to perform operations related to POI search and to view search results on its screen.

The controller unit 4 has a speech recognition processing unit 7 and an automatic search processing unit 8. The speech recognition processing unit 7 receives the utterance information of the driver and the operator from the telephone unit 5, performs predetermined speech recognition processing using a speech recognition dictionary 9, and outputs the recognition result to the automatic search processing unit 8. The automatic search processing unit 8 executes automatic POI search processing using a facility information database 10 and displays the search result on the screen of the operator terminal 6.

FIG. 2 is a flowchart showing the details of the speech recognition process executed by the speech recognition processing unit 7. First, the utterance information of the driver and the operator is acquired from the telephone unit 5 (step S51).

Next, it is determined whether the acquired utterance information includes any characteristic utterance prepared in advance (step S52). A characteristic utterance is a phrase before and after which keywords related to the driver's intent can be expected to appear.

As shown in FIG. 3, examples of characteristic utterances include service inquiries such as "How may I help you today?" (本日はいかがいたしましょうか), condition elicitation such as "What would you prefer?" (ご希望は), and expressions of understanding such as "Certainly" (かしこまりました). An expression of understanding, for example, is what the operator says upon understanding the driver's request. Consequently, words with which the operator confirms the driver's intent tend to appear before and after an expression of understanding.

If it is determined that the acquired utterance information does not include a characteristic utterance, the process returns to step S51. If it is determined that a characteristic utterance is included, the position at which the characteristic utterance appears is stored, and keywords related to the driver's intent are extracted from the utterance content before and after the characteristic utterance (step S53).
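Steps S52 and S53 can be illustrated with a minimal Python sketch. It assumes the call has already been transcribed into a list of turns; the `Utterance` type, the English cue phrases, and the one-turn window (`span=1`) are hypothetical choices made for illustration and are not specified in the description. Words that follow the cue phrase within the same turn are ignored here for simplicity.

```python
from dataclasses import dataclass

# Hypothetical English stand-ins for the characteristic utterances of FIG. 3.
CHARACTERISTIC_UTTERANCES = ("how may i help you", "preference", "certainly")


@dataclass
class Utterance:
    speaker: str  # "driver" or "operator"
    text: str


def find_characteristic(utterances):
    """Step S52: return the positions of turns that contain a cue phrase."""
    return [
        i
        for i, u in enumerate(utterances)
        if any(cue in u.text.lower() for cue in CHARACTERISTIC_UTTERANCES)
    ]


def context_window(utterances, index, span=1):
    """Input to step S53: turns within `span` turns before and after `index`."""
    lo = max(0, index - span)
    hi = min(len(utterances), index + span + 1)
    return [u for j, u in enumerate(utterances[lo:hi], start=lo) if j != index]
```

Keyword extraction itself would then run only on the turns returned by `context_window`, rather than on the whole call.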

Next, it is determined whether the extracted keywords include anything usable as a search keyword, such as an area or a genre word (step S54). If no area or genre word is found among the extracted keywords, the process returns to step S51. If one is found, it is determined whether the area and genre words needed as search keywords are all available (step S55). If they are not, the process returns to step S51.

If it is determined that the area and genre words needed as search keywords are all available, those search keywords are sent to the automatic search processing unit 8 with a request to execute an automatic POI search (step S56).
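The checks in steps S54 to S56 resemble simple slot filling, sketched below in Python. The sketch assumes that the search keywords are "all available" when both an area and a genre word have been collected, which is one reading of the description. The vocabularies and function names are hypothetical.

```python
# Hypothetical vocabularies of words usable as search keywords.
AREA_WORDS = {"koenji", "shinjuku"}
GENRE_WORDS = {"japanese food", "ramen"}


def update_slots(slots, keywords):
    """Step S54: keep only keywords usable as an area or genre search term."""
    for keyword in keywords:
        k = keyword.lower()
        if k in AREA_WORDS:
            slots["area"] = k
        elif k in GENRE_WORDS:
            slots["genre"] = k
    return slots


def ready_for_search(slots):
    """Step S55 (assumed reading): both an area and a genre word are present."""
    return "area" in slots and "genre" in slots


def build_query(slots):
    """Step S56: the keywords handed to the automatic search processing unit."""
    return [slots["area"], slots["genre"]]
```

Until `ready_for_search` returns true, the loop would keep returning to step S51 and accumulating keywords from later turns.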

The automatic search processing unit 8 then automatically performs a POI search using the search keywords from the speech recognition processing unit 7 and presents the search result to the operator by displaying it on the screen of the operator terminal 6. The operator then reports the search result to the driver over the call.

FIG. 4 shows an example of an unscripted call between the driver and the operator. In the figure, the operator's utterances "How may I help you today?", "preference", "Certainly", and "budget" are characteristic utterances. "Koenji" and "Japanese food", uttered by the driver and the operator before and after those characteristic utterances, are the important keywords related to the driver's intent. Accordingly, "Koenji" and "Japanese food" are sent as search keywords from the speech recognition processing unit 7 to the automatic search processing unit 8, which executes an automatic POI search for "Koenji" and "Japanese food".

In the above, the telephone unit 5 and step S51 of the speech recognition processing unit 7 constitute the utterance information acquisition means for acquiring utterance information during a call between the driver and the operator. Step S52 constitutes the characteristic utterance determination means for determining whether the utterance information acquired by the utterance information acquisition means includes a characteristic utterance. Steps S53 to S55 constitute the keyword extraction means for extracting a keyword from the utterance content before and after the characteristic utterance when the characteristic utterance determination means determines that a characteristic utterance is included. Step S56 and the automatic search processing unit 8 constitute the search means for performing an information search based on the keyword extracted by the keyword extraction means.

As described above, this embodiment extracts, from the utterance information of a call between the driver and the operator, characteristic utterances that make it easy to locate keywords related to the driver's intent before and after them, and then extracts keywords from the utterance information before and after those characteristic utterances. This suppresses misrecognition of the keywords used for the automatic POI search when the driver and the operator hold a natural, unscripted call.

In addition, because the POI search using keywords related to the driver's intent is executed automatically, the operator does not have to enter each search word on the operator terminal 6 or operate the search instruction button. The operator's work efficiency therefore improves. As a result, the driver receives the search result sooner, which reduces the stress on the driver.

FIG. 5 is a block diagram showing the schematic configuration of a second embodiment of the automatic search system according to the present invention. In the figure, elements that are the same as or equivalent to those of the first embodiment are given the same reference signs, and their description is omitted.

In the figure, the automatic search system 1 of this embodiment has a controller unit 20 in place of the controller unit 4. The controller unit 20 has a dictionary switching processing unit 21 in addition to the speech recognition processing unit 7 and the automatic search processing unit 8 described above.

The dictionary switching processing unit 21 switches the speech recognition dictionary 9 in use, triggered either by the flow of the call between the driver and the operator or by the operator's operation of the operator terminal 6. Several kinds of speech recognition dictionary 9 are provided, for example a dictionary containing all words; a dictionary containing genre words, area words, and request words; a dictionary containing meal-related words; and a dictionary containing menu-related words and associated words (see FIG. 8).

FIG. 6 is a flowchart showing the details of the speech recognition dictionary switching process executed by the dictionary switching processing unit 21. First, the utterance information of the driver and the operator is acquired from the telephone unit 5 (step S61).

Next, it is determined whether the operator has pressed the search instruction button of the operator terminal 6 (step S62). If it is determined that the search instruction button has been pressed, the system switches to the speech recognition dictionary 9 suited to the search keyword entered on the operator terminal 6 (step S63).

When a genre word or area word that is essential as a search keyword comes up during the call, the operator may enter that word on the operator terminal 6 and then press the search instruction button of the operator terminal 6. The operator's operation of the operator terminal 6 therefore plays an important role as a trigger for switching the speech recognition dictionary 9. For example, if the genre word is "restaurant", the system switches to the speech recognition dictionary 9 containing words related to meal menus.

If it is determined that the search instruction button has not been pressed, it is determined whether the utterance information acquired in step S61 includes any characteristic utterance prepared in advance (step S64).

As described above, a characteristic utterance is a phrase before and after which keywords related to the driver's intent can be expected to appear; FIG. 7 shows examples. Characteristic utterances are grouped by dialogue phase. A dialogue phase is a segment obtained by dividing the call between the driver and the operator according to its progress or changes of topic. The dialogue phases are service inquiry, condition elicitation (overview), and condition elicitation (details).

If it is determined that the acquired utterance information does not include a characteristic utterance, the process returns to step S61. If it is determined that a characteristic utterance is included, the dialogue phase of the characteristic utterance is discriminated (step S65). The system then switches to the speech recognition dictionary 9 suited to the discriminated dialogue phase (step S66).

For example, if the characteristic utterance is "How may I help you today?", the dialogue phase is service inquiry, which indicates that the call has moved into a phase intended to obtain the main search keywords. The transition to this dialogue phase is therefore used as a trigger for switching the speech recognition dictionary 9. When the dialogue phase is service inquiry, the vocabulary can be narrowed to genre words and areas, so the system switches to the speech recognition dictionary 9 containing genre words and area words.
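Steps S65 and S66 amount to two table lookups: cue phrase to dialogue phase, then dialogue phase to dictionary. The Python sketch below illustrates this under stated assumptions. The cue phrases are English stand-ins, their assignment to phases is hypothetical (the FIG. 7 table is not reproduced in the text), and only the service-inquiry mapping is taken from the description. Other phases keep the current dictionary.

```python
# Hypothetical cue phrases mapped to the dialogue phases named in the text.
PHASE_OF_CUE = {
    "how may i help you today": "service_inquiry",
    "what would you prefer": "condition_elicitation_overview",
    "what is your budget": "condition_elicitation_detail",
}

# Only this mapping is stated in the description; the dictionary name is
# a hypothetical label for the dictionary of genre words and area words.
DICTIONARY_FOR_PHASE = {
    "service_inquiry": "genre_and_area_words",
}


def discriminate_phase(utterance_text):
    """Step S65: return the dialogue phase of a characteristic utterance."""
    text = utterance_text.lower()
    for cue, phase in PHASE_OF_CUE.items():
        if cue in text:
            return phase
    return None


def switch_by_phase(current_dictionary, utterance_text):
    """Step S66: pick the dictionary for the phase, else keep the current one."""
    phase = discriminate_phase(utterance_text)
    if phase is None:
        return current_dictionary
    return DICTIONARY_FOR_PHASE.get(phase, current_dictionary)
```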

In parallel with steps S65 and S66, the system switches to the speech recognition dictionary 9 suited to the words that appear before and after the characteristic utterance (step S67). Because such words are often important words related to the driver's intent, they are treated as triggers for switching the speech recognition dictionary 9.

FIG. 8 shows an example of switching the speech recognition dictionary during a call between the driver and the operator. At the start, the speech recognition dictionary containing all words is used.

When the operator says "What can I do for you today?", the dialogue phase of that characteristic utterance (service inquiry) triggers a switch to the speech recognition dictionary containing genre words, area words, and request words. When the driver responds to that characteristic utterance with "I'd like to have a meal in Shinjuku", the word "meal" triggers a switch to the speech recognition dictionary containing genre words (here, words related to restaurants where one can have a meal).

Later, when "ramen shop", an important word related to the driver's intent, comes up in the call, the operator enters "Shinjuku ramen shop" as the search word on the operator terminal 6 and presses its search instruction button. The operation of the search instruction button then triggers a switch to the speech recognition dictionary containing words related to ramen menus and associated words.

After that, when "tonkotsu ramen", another important word related to the driver's intent, comes up in the call, the automatic search processing unit 8 executes an automatic POI search using, for example, "Shinjuku tonkotsu ramen" as the search keyword.
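The FIG. 8 walkthrough can be traced as a sequence of trigger events, each of which may change the active dictionary. The Python sketch below is an illustration only: the `Event` type, the three trigger tables, and the dictionary names are hypothetical labels modeled on the example, not structures defined in the description.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str   # "phase", "word", or "search"
    value: str


# Hypothetical trigger tables modeled on the FIG. 8 example.
PHASE_TRIGGERS = {"service_inquiry": "genre_area_request_words"}
WORD_TRIGGERS = {"meal": "restaurant_genre_words"}
SEARCH_TRIGGERS = {"ramen shop": "ramen_menu_words"}


def next_dictionary(current, event):
    """Return the dictionary to use after `event`, or `current` if no trigger fires."""
    if event.kind == "phase":
        return PHASE_TRIGGERS.get(event.value, current)
    if event.kind == "word":
        return WORD_TRIGGERS.get(event.value, current)
    if event.kind == "search":
        for term, name in SEARCH_TRIGGERS.items():
            if term in event.value:
                return name
    return current


def trace(events, start="all_words"):
    """Return the dictionary in use at the start and after each event."""
    history = [start]
    for event in events:
        history.append(next_dictionary(history[-1], event))
    return history
```

Feeding `trace` the three events of the example (service-inquiry phase, the word "meal", and a search for "shinjuku ramen shop") yields a progressively narrower dictionary at each step, which is the behavior the walkthrough describes.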

In the above, the telephone unit 5 and step S61 of the automatic search processing unit 8 constitute the utterance information acquisition means for acquiring utterance information during a call between the driver and the operator. Step S64 constitutes the characteristic utterance determination means for determining whether the utterance information acquired by the utterance information acquisition means includes a characteristic utterance. Steps S62, S63, S66, and S67 constitute the dictionary switching means for switching the speech recognition dictionary used to recognize the speech of the driver and the operator. Step S65 constitutes the dialogue phase discrimination means for discriminating the dialogue phase of the characteristic utterance.

The more vocabulary is registered in the speech recognition dictionary 9, the lower the speech recognition rate tends to be. One conventional approach is therefore to prepare speech recognition dictionaries by category and to switch through them in sequence whenever a misrecognition occurs. With that approach, however, speech recognition takes longer when misrecognitions continue.

In this embodiment, by contrast, the flow of the call between the driver and the operator and the operator's operation of the search instruction button of the operator terminal 6 are used as triggers to switch, at an appropriate time, to an appropriate speech recognition dictionary 9 with a small registered vocabulary, which raises the accuracy of speech recognition. This further suppresses misrecognition of the keywords used for the automatic POI search and also shortens the time needed to recognize a keyword.

The present invention is not limited to the above embodiments. For example, the speech recognition system of the present invention can also be applied to systems other than the automatic search system described above, provided that it performs speech recognition during a call between a driver and an operator.

DESCRIPTION OF SYMBOLS: 1 ... automatic search system, 4 ... controller unit, 5 ... telephone unit (utterance information acquisition means), 6 ... operator terminal (search operation means), 7 ... speech recognition processing unit (utterance information acquisition means, characteristic utterance determination means, keyword extraction means, search means), 8 ... automatic search processing unit (search means), 9 ... speech recognition dictionary, 20 ... controller unit, 21 ... dictionary switching processing unit (utterance information acquisition means, characteristic utterance determination means, dictionary switching means, dialogue phase discrimination means).

Claims

1. A speech recognition system comprising:
utterance information acquisition means for acquiring utterance information during a call between a driver and an operator;
characteristic utterance determination means for determining whether or not a characteristic utterance is included in the utterance information acquired by the utterance information acquisition means; and
keyword extraction means for extracting a keyword from utterance contents before and after the characteristic utterance when the characteristic utterance determination means determines that the characteristic utterance is included.

2. The speech recognition system according to claim 1, further comprising dictionary switching means for switching a speech recognition dictionary used for recognition of speech of the driver and the operator.

3. The speech recognition system according to claim 2, further comprising dialogue phase discrimination means for discriminating a dialogue phase of the characteristic utterance,
wherein the dictionary switching means includes means for switching the speech recognition dictionary in accordance with the dialogue phase discriminated by the dialogue phase discrimination means.

4. The speech recognition system according to claim 2, wherein the dictionary switching means includes means for switching the speech recognition dictionary according to words appearing before and after the characteristic utterance.

5. The speech recognition system according to any one of claims 2 to 4, further comprising search operation means with which the operator performs an operation related to information search,
wherein the dictionary switching means includes means for switching the speech recognition dictionary according to a search word when the search word is input by the search operation means and an instruction to start search processing is given.

6. An automatic search system comprising:
utterance information acquisition means for acquiring utterance information during a call between a driver and an operator;
characteristic utterance determination means for determining whether or not a characteristic utterance is included in the utterance information acquired by the utterance information acquisition means;
keyword extraction means for extracting a keyword from utterance contents before and after the characteristic utterance when the characteristic utterance determination means determines that the characteristic utterance is included; and
search means for performing information search based on the keyword extracted by the keyword extraction means.