JP2016218361A - Speech recognition system, in-vehicle device, and server device

Info

Publication number: JP2016218361A
Application number: JP2015105783A
Authority: JP
Inventors: Atsushi Yamaguchi; Akiko Arakawa; Ryosuke Takeuchi
Original assignee: Clarion Co Ltd
Current assignee: Faurecia Clarion Electronics Co Ltd
Priority date: 2015-05-25
Filing date: 2015-05-25
Publication date: 2016-12-22

Abstract

PROBLEM TO BE SOLVED: To improve the response speed to user utterances while reducing the burden of communication costs.

SOLUTION: When the words of a user utterance are not registered in additional dictionary information that specifies vehicle devices and operation contents, an in-vehicle device transmits voice information of the user utterance to a server device. The server device determines whether the voice information acquired from the in-vehicle device is intended to operate a vehicle device. If it is, the server device transmits to the in-vehicle device additional dictionary registration information that associates the words of the user utterance with information indicating an operation command for the vehicle device. The in-vehicle device uses the additional dictionary registration information acquired from the server device to generate additional dictionary information that associates the words of the user utterance with the operation command, and uses the operation command to output an operation execution instruction to the corresponding vehicle device.

SELECTED DRAWING: Figure 3

Description

The present invention relates to a speech recognition system, an in-vehicle device, and a server device.

Patent Document 1 describes a speech recognition system that creates and updates a simple dictionary for speech recognition by successively registering a correspondence list that associates at least the utterance data of voice commands input to a voice input unit with the recognition results obtained at an information center from that utterance data, while normally having the information center perform the speech recognition. When the system determines that speech recognition at the information center is not possible, it performs speech recognition on the navigation device side by using the latest simple dictionary to obtain the recognition result corresponding to the utterance data of the voice command input to the voice input unit.

Patent Document 1: JP 2010-224301 A

In the speech recognition system of Patent Document 1, the utterance data received from the user is transmitted to the speech recognition server and a speech recognition result is received, except in cases such as when communication between the in-vehicle terminal and the speech recognition server cannot be established. That is, even when utterance data matching the user utterance is already registered in the simple dictionary, the user's utterance data is transmitted to the speech recognition server except in those predetermined cases. However, transmitting the utterance data to the speech recognition server every time adds communication time, so the response to the user utterance becomes slower and the burden of communication costs increases.

Accordingly, an object of the present invention is to provide a speech recognition system that improves the response speed to user utterances while reducing the burden of communication costs.

To solve the above problem, a speech recognition system according to the present invention includes an in-vehicle device and a server device. When the words of a user utterance are not registered in additional dictionary information that specifies vehicle devices and operation contents, the in-vehicle device transmits voice information of the user utterance to the server device. The server device determines whether the voice information acquired from the in-vehicle device is intended to operate a vehicle device and, when it determines that it is, transmits to the in-vehicle device additional dictionary registration information that associates the words of the user utterance with information indicating an operation command for the vehicle device. The in-vehicle device uses the additional dictionary registration information acquired from the server device to generate additional dictionary information that associates the words of the user utterance with the operation command, and uses the operation command to output an operation execution instruction to the corresponding vehicle device.
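The division of work described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `RegistrationInfo`, `handle_utterance`, `ask_server`, and `execute` are hypothetical, the dictionary is modeled as a plain mapping from utterance text to a pair of identification numbers, and the server is represented by a callable that returns `None` when it judges that no vehicle device operation was intended.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# An operation command modeled as (vehicle device id, operation id).
OperationCommand = Tuple[int, int]


@dataclass
class RegistrationInfo:
    """Hypothetical stand-in for the additional dictionary registration information."""
    utterance_text: str
    command: OperationCommand


def handle_utterance(
    utterance_text: str,
    audio: bytes,
    additional_dictionary: Dict[str, OperationCommand],
    ask_server: Callable[[bytes], Optional[RegistrationInfo]],
    execute: Callable[[OperationCommand], None],
) -> bool:
    """Return True if a vehicle device operation was executed."""
    command = additional_dictionary.get(utterance_text)
    if command is None:
        # Not registered locally: only in this case is the audio sent out.
        info = ask_server(audio)
        if info is None:
            return False  # the server found no device operation intent
        # Register the new association so the next identical utterance
        # is resolved without contacting the server.
        additional_dictionary[info.utterance_text] = info.command
        command = info.command
    execute(command)
    return True
```

In this sketch the second occurrence of the same utterance is resolved from the local mapping, which is where the reduction in communication and the faster response would come from.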

An in-vehicle device according to the present invention includes: a storage unit that stores additional dictionary information associating words indicating an operation of a vehicle device with an operation command that combines the identification numbers of the vehicle device and the operation content; an input receiving unit that receives input of voice information of a user utterance; a speech recognition unit that generates utterance text information by converting the voice information into text and searches the additional dictionary information using the utterance text information; and a communication unit that transmits the voice information of the user utterance to a predetermined server device when the search shows that the words of the user utterance indicated by the text information are not registered in the additional dictionary information.

A server device according to the present invention includes an operation command specifying unit that performs speech recognition on voice information of a user utterance acquired from an in-vehicle device and generates utterance text information by converting the voice information into text. The operation command specifying unit uses the utterance text information to determine whether the words of the user utterance are intended to operate a vehicle device mounted on the in-vehicle device.

According to the speech recognition system of the present invention, the response speed to user utterances can be improved while the burden of communication costs is reduced.

Problems, configurations, and effects other than those described above will become clear from the following description of the embodiment.

FIG. 1 is a diagram showing an example of the hardware configuration of an in-vehicle device that implements a head unit according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the hardware configuration of a server device according to an embodiment of the present invention.
FIG. 3 is a diagram showing an example of the functional blocks of the in-vehicle device and the server device according to an embodiment of the present invention.
FIG. 4(a) is a diagram showing an example of operation command information according to an embodiment of the present invention. FIG. 4(b) is a diagram showing an example of additional dictionary information according to an embodiment of the present invention.
FIG. 5 is a flowchart showing an example of the flow of speech recognition processing according to an embodiment of the present invention.
FIG. 6 is a flowchart showing an example of the flow of operation command specifying processing according to an embodiment of the present invention.
FIG. 7 is a flowchart showing an example of the flow of additional dictionary registration processing according to an embodiment of the present invention.

An embodiment of the present invention is described below.

The speech recognition (VR) system according to the present embodiment includes a head unit (H/U) and a server device. FIG. 1 is a diagram showing an example of the hardware configuration of the in-vehicle device 100 that implements the head unit. The in-vehicle device 100 is an electronic device that can be mounted in a vehicle, including devices distributed in an unmounted state. The in-vehicle device 100 according to the present embodiment is a navigation device with so-called navigation functions, such as searching for a recommended route and displaying map information and traffic information. However, the in-vehicle device 100 is not limited to a navigation device and may be any type of device (for example, an in-vehicle audio system) as long as it is a component with a voice input/output function, a speech recognition function, and a function for communicating with the server device.

The in-vehicle device 100 has an arithmetic processing device 1, a display 2, a storage device 3, a voice input/output device 4 (with a microphone 41 as a voice input device and a speaker 42 as a voice output device), an input device 5, a ROM device 6, a vehicle speed sensor 7, a gyro sensor 8, a GPS (Global Positioning System) receiving device 9, an FM multiplex broadcast receiving device 10, a beacon receiving device 11, a communication device 12, and a CAN I/F (Controller Area Network interface) 13.

The arithmetic processing device 1 is the central unit that performs the various processes of the in-vehicle device 100. For example, it detects the current location using information output from sensors such as the vehicle speed sensor 7 and from the GPS receiving device 9. Based on the obtained current location information, it reads the map information needed for display from the storage device 3 or the ROM device 6. It renders the read map information as graphics, superimposes a mark indicating the current location, and outputs a signal for displaying the result on the display 2. Using the map information and other data stored in the storage device 3 or the ROM device 6, it also searches for a recommended route connecting the departure point and the destination specified by the user, and it outputs predetermined signals to the speaker 42 and the display 2 to provide route guidance.

The arithmetic processing device 1 also performs speech recognition processing using the voice information of a user utterance input via the microphone 41. In predetermined cases, it transmits the voice information of the user utterance to the server device via the communication device 12.

In addition, the arithmetic processing device 1 acquires from the server device, via the communication device 12, additional dictionary registration information that includes text information obtained by converting the voice information of the user utterance into text and an operation command for a vehicle device. It uses the additional dictionary registration information to generate and update the additional dictionary, and outputs an execution instruction for the vehicle device operation specified by the operation command.

Here, vehicle devices are devices that are electrically connected to one another over a CAN (Controller Area Network), such as the air conditioner, audio system, navigation device, turn signals, and wipers, as well as devices that can be operated by the user.

The arithmetic processing device 1 is configured with its devices connected by a bus. Specifically, it has a CPU 21 (Central Processing Unit) that executes various processes such as numerical calculation and control of each device; a RAM 22 (Random Access Memory) that stores map information, calculation data, and the like read from the storage device 3 or the ROM 23; a ROM 23 (Read Only Memory) that stores a boot program run by the CPU 21 and other programs executed by the CPU 21 (for example, a program that implements the speech recognition (VR) function); an I/F 24 (interface) for connecting various hardware to the arithmetic processing device 1; and a bus 25 that interconnects them.

The display 2 is a unit that displays graphics information. It is, for example, a liquid crystal display or an organic EL display.

The storage device 3 is a storage medium that is at least readable and writable, such as an HDD (Hard Disk Drive) or a nonvolatile memory card. It stores various information used by the arithmetic processing device 1 (for example, map information).

The voice input/output device 4 has a microphone 41 as a voice input device and a speaker 42 as a voice output device. The microphone 41 picks up sound from outside the in-vehicle device 100, such as the voice of the driver or a passenger (a user utterance). The speaker 42 outputs, as audio, guidance for the driver and others generated by the arithmetic processing device 1.

The input device 5 is a device that receives instruction input from the user. It consists of a touch panel 51, a dial switch 52, and other hard switches such as scroll keys (not shown). The input device 5 outputs information corresponding to the operation of each key or switch to other devices such as the arithmetic processing device 1.

The ROM device 6 is a storage medium that is at least readable, such as a ROM like a CD-ROM or DVD-ROM, or an IC (Integrated Circuit) card. The storage medium holds, for example, video data and audio data.

The vehicle speed sensor 7 outputs a value used to calculate the vehicle speed. The gyro sensor 8 is an optical fiber gyro, a vibration gyro, or the like, and detects the angular velocity caused by rotation of the moving body. The GPS receiving device 9 receives signals from GPS satellites and measures, for three or more satellites, the distance between the moving body and each satellite and the rate of change of that distance, thereby determining the current location, travel speed, and travel direction of the moving body. These devices are used by the arithmetic processing device 1 to detect the current location of the vehicle in which the in-vehicle device 100 is mounted.

The FM multiplex broadcast receiving device 10 receives FM multiplex broadcast signals sent from FM broadcast stations. FM multiplex broadcasts include summary current traffic information from VICS (Vehicle Information Communication System: registered trademark) information, regulation information, SA/PA (service area/parking area) information, parking lot information, weather information, and text information provided by radio stations as FM multiplex general information.

The beacon receiving device 11 receives summary current traffic information such as VICS information, regulation information, SA/PA (service area/parking area) information, parking lot information, weather information, emergency alerts, and the like. Examples of the beacon receiving device 11 include an optical beacon receiver that communicates by light and a radio beacon receiver that communicates by radio waves.

The communication device 12 exchanges information with an external device (in this example, the server device). Specifically, it transmits the voice information of a user utterance to a predetermined server device and receives additional dictionary registration information from the server device.

The CAN I/F 13 is an interface that inputs and outputs information to and from the in-vehicle network (CAN), which carries multiplexed communication with the various vehicle devices installed in the vehicle. The CAN I/F 13 may be implemented by the I/F 24 described above.

The hardware configuration of the in-vehicle device 100 has been described above.

Next, the hardware configuration of the server device 200 is described. FIG. 2 is a diagram showing an example of the hardware configuration of the server device 200. The server device 200 is an information processing device such as a workstation or a PC (personal computer).

As shown in the figure, the server device 200 has an arithmetic device 201, an external storage device 202, a transmission/reception device 203, and a bus 204 that interconnects these devices.

The arithmetic device 201 has a CPU 211 that executes various processes such as numerical calculation and control of each device; a RAM 212 that stores map information, calculation data, and the like read from the external storage device 202 or the ROM 213 described next; a ROM 213 that stores a boot program run by the CPU 211 and other programs executed by the CPU 211 (for example, a program that implements the speech recognition (VR) function); and the bus 204 that interconnects the devices.

The external storage device 202 is a nonvolatile storage device such as a hard disk device or flash memory.

The transmission/reception device 203 is, for example, a communication module or similar device that exchanges information with an external device (in this example, the in-vehicle device 100).

The hardware configuration of the server device 200 has been described above.

Next, the functional blocks showing the functional configuration of the in-vehicle device 100 and the server device 200 are described. FIG. 3 is a diagram showing an example of the functional blocks of the in-vehicle device 100 and the server device 200. The in-vehicle device 100 has an input receiving unit 301, an output processing unit 302, a speech recognition unit 303, an additional dictionary registration unit 304, a storage unit 305, and a communication unit 306.

The input receiving unit 301 is a functional unit that receives instructions and information input from the user via the input device 5 of the in-vehicle device 100. Specifically, it receives an instruction from the user to execute speech recognition processing via the input device 5, such as a hard switch. More specifically, when the input receiving unit 301 detects that the user has pressed a predetermined hard switch, it notifies the speech recognition unit 303 and starts the PTT (Push To Talk) function.

The output processing unit 302 is a functional unit that generates the screen information to be displayed on the display 2 of the in-vehicle device 100. Specifically, it generates screen information showing the result of speech recognition and outputs it to the display 2. For example, when the user utterance is a display request for a specific kind of place, such as "restaurant" or "bank", the output processing unit 302 acquires coordinate information from the map information, generates screen information in which icons indicating such places are superimposed on the map, and outputs it to the display 2.

The speech recognition unit 303 is a functional unit that performs speech recognition on input speech. Specifically, when it acquires the voice information of a user utterance input via the microphone 41, it performs acoustic analysis using an acoustic model and executes speech recognition processing on the input speech using the VR dictionary. As a result of the speech recognition processing, it converts the voice information of the user utterance into text information. The speech recognition method is not particularly limited, and any known speech recognition technique may be used.

The additional dictionary registration unit 304 is a functional unit that generates and updates the additional dictionary information 315. Specifically, it generates and updates the additional dictionary information 315 using the additional dictionary registration information acquired from the server device 200 via the communication unit 306.

The storage unit 305 is a functional unit that stores various information. Specifically, it holds map information 311, a VR dictionary 312, operation command information 313, standby vocabulary information 314, and additional dictionary information 315.

The map information 311 is road configuration information that stores mesh area information, including link information about the roads on the map.

The VR dictionary 312 is dictionary information in which phonemes and words are registered in association with each other. The speech recognition unit 303 uses it when performing speech recognition processing on the voice information of a user utterance.

The operation command information 313 is information about the operation of vehicle devices. FIG. 4(a) is a diagram showing an example of the operation command information 313. The operation command information 313 consists of command group 1 and command group 2. Command group 1 holds information that specifies a vehicle device, and command group 2 holds information that specifies the operation content of the vehicle device specified in command group 1.

Specifically, command group 1 registers an identification number, which is the identification information of a vehicle device, in association with the vehicle device name. For example, command group 1 contains entries such as "1. Air conditioner", "2. Audio", and "3. Navigation", where "1." to "3." are the vehicle device identification numbers and "Air conditioner", "Audio", and "Navigation" are the vehicle device names.

Command group 2 registers, for each vehicle device specified in command group 1, an operation identification number in association with the operation content. For example, the command group 2 associated with "1. Air conditioner" in command group 1 contains entries such as "1. ON", "2. OFF", "3. Raise temperature", "4. Lower temperature", "5. Increase airflow", and "6. Decrease airflow". Here, "1." to "6." are the operation identification numbers, and "ON", "Raise temperature", and so on are the operation contents.

The operation command for each vehicle device is specified by the combination of the identification numbers in command group 1 and command group 2. For example, the operation command for "lower the air conditioner temperature" is specified by the identification number combination (command group 1 = 1, command group 2 = 4).
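As a rough illustration of this two-level numbering, the operation command information can be modeled as nested mappings. The sketch below uses only the entries named in the text; the variable and function names are hypothetical, and the operations for the audio and navigation devices are left out because the text does not list them.

```python
from typing import Dict, Optional, Tuple

# Command group 1: vehicle device identification number -> device name.
COMMAND_GROUP_1: Dict[int, str] = {
    1: "air conditioner",
    2: "audio",
    3: "navigation",
}

# Command group 2: per device, operation identification number -> operation content.
# Only the air conditioner operations are given in the text.
COMMAND_GROUP_2: Dict[int, Dict[int, str]] = {
    1: {
        1: "ON",
        2: "OFF",
        3: "raise temperature",
        4: "lower temperature",
        5: "increase airflow",
        6: "decrease airflow",
    },
}


def describe_command(device_id: int, operation_id: int) -> str:
    """Turn an (id, id) operation command back into readable form."""
    device = COMMAND_GROUP_1[device_id]
    operation = COMMAND_GROUP_2[device_id][operation_id]
    return f"{device}: {operation}"


def find_operation_command(device_name: str, operation: str) -> Optional[Tuple[int, int]]:
    """Look up the identification number pair for a device name and operation."""
    for device_id, name in COMMAND_GROUP_1.items():
        if name != device_name:
            continue
        for operation_id, content in COMMAND_GROUP_2.get(device_id, {}).items():
            if content == operation:
                return (device_id, operation_id)
    return None


print(describe_command(1, 4))  # air conditioner: lower temperature
```

Looking up `("air conditioner", "lower temperature")` with `find_operation_command` yields the pair `(1, 4)`, matching the example in the text.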

Returning to FIG. 3, the standby vocabulary information 314 is information in which a predetermined vocabulary is registered. For example, it contains predetermined words such as "destination", "home", "screen brightness", "map (MAP)", "route", "restaurant", and "bank", and it is searched using the utterance text information generated by the speech recognition unit 303.

The additional dictionary information 315 is information in which a user's utterance, held as text information, is associated with an operation command for a vehicle device. FIG. 4(b) shows an example of the additional dictionary information 315. Specifically, the additional dictionary information 315 has records in which an utterance voice column 321, a command group 1 column 322, and a command group 2 column 323 are associated with one another.

The information registered in the utterance voice column 321 indicates the content of a user utterance converted to text by the speech recognition unit 303. For example, the column holds utterance content such as "cold", "hot", and "lower the air conditioner". The information registered in the command group 1 column 322 is an identification number that specifies a vehicle device and corresponds to an identification number in command group 1 of the operation command information 313. The information registered in the command group 2 column 323 is an operation identification number that specifies the operation content of the vehicle device and corresponds to an operation identification number in command group 2 of the operation command information 313. The additional dictionary information 315 is generated by the additional dictionary registration unit 304 using the additional dictionary registration information acquired from the server device 200.
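The record layout described here can be sketched as a small data structure with a lookup and a register-or-update step. The class and function names are hypothetical, and the mapping of "lower the air conditioner" to the pair (1, 4) is an assumption chosen to match the air conditioner example in the operation command information; the text does not state which command each sample utterance maps to.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AdditionalDictionaryRecord:
    utterance: str        # utterance voice column 321 (stored as text)
    command_group_1: int  # column 322: vehicle device identification number
    command_group_2: int  # column 323: operation identification number


def find_command(
    records: List[AdditionalDictionaryRecord], utterance_text: str
) -> Optional[Tuple[int, int]]:
    """Return the operation command for an utterance, or None if unregistered."""
    for record in records:
        if record.utterance == utterance_text:
            return (record.command_group_1, record.command_group_2)
    return None


def register(
    records: List[AdditionalDictionaryRecord],
    utterance_text: str,
    command_group_1: int,
    command_group_2: int,
) -> None:
    """Add a record, or replace the existing record for the same utterance."""
    new_record = AdditionalDictionaryRecord(utterance_text, command_group_1, command_group_2)
    for index, record in enumerate(records):
        if record.utterance == utterance_text:
            records[index] = new_record
            return
    records.append(new_record)


records: List[AdditionalDictionaryRecord] = []
register(records, "lower the air conditioner", 1, 4)  # assumed mapping
print(find_command(records, "lower the air conditioner"))  # (1, 4)
```

An exact string match is used only to keep the sketch short; how closely an utterance must match a registered entry is not specified in this part of the text.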

Returning to FIG. 3, the communication unit 306 is a functional unit that exchanges information with an external device (in this example, the server device 200). Specifically, the communication unit 306 transmits the voice information of a user utterance to the predetermined server device 200 via a predetermined network N such as the Internet. The communication unit 306 also receives from the server device 200, via the network N, additional dictionary registration information that includes the text information of the user utterance and an operation command for a vehicle device.

The server device 200 includes an operation command specifying unit 401, an output unit 402, an additional dictionary registration information generation unit 403, an information storage unit 404, and a transmission/reception unit 405.

The operation command specifying unit 401 is a functional unit that specifies an operation command for a vehicle device from the voice information of a user utterance. Specifically, on acquiring the voice information of a user utterance from the in-vehicle device 100 via the transmission/reception unit 405, the operation command specifying unit 401 performs voice recognition processing and converts the voice information into text information. Using this text information, it determines whether the user utterance is intended to operate a vehicle device. It then uses the operation command information 414 to specify the operation command for the vehicle device indicated by the user utterance.

The output unit 402 is a functional unit that identifies the information requested by a user utterance and transmits it to the in-vehicle device 100 via the transmission/reception unit 405. Specifically, when the user utterance is not an operation command for a vehicle device but a request to display a specific point, the output unit 402 identifies the point coordinates using the map information 411 and transmits them to the in-vehicle device 100 via the transmission/reception unit 405.

The additional dictionary registration information generation unit 403 is a functional unit that generates the information to be registered in the additional dictionary information of the in-vehicle device 100. Specifically, it generates additional dictionary registration information that includes the text information of the user utterance and the operation command for the vehicle device, and transmits it to the in-vehicle device 100 via the transmission/reception unit 405.

The information storage unit 404 is a functional unit that stores various kinds of information. Specifically, it stores map information 411, a VR dictionary 412, a context dictionary 413, and operation command information 414. The map information 411, the VR dictionary 412, and the operation command information 414 are the same as those stored in the storage unit 305 of the in-vehicle device 100, so their description is omitted.

The context dictionary 413 is dictionary information that registers context-dependent associations between words. The operation command specifying unit 401 uses it when performing voice recognition processing on the voice information of a user utterance.

The functional blocks of the in-vehicle device 100 and the server device 200 have been described above. The input receiving unit 301, the output processing unit 302, the voice recognition unit 303, and the additional dictionary registration unit 304 of the in-vehicle device 100 are realized by a program that causes the CPU 21 to perform processing. The operation command specifying unit 401, the output unit 402, and the additional dictionary registration information generation unit 403 of the server device 200 are realized by a program that causes the CPU 211 to perform processing. These programs are stored in the ROM 23 or the storage device 3 of the in-vehicle device 100 and in the ROM 213 or the external storage device 202 of the server device 200, respectively. For execution they are loaded onto the RAM 22 and the RAM 212 and executed by the CPU 21 and the CPU 211. The VR dictionary 312 of the in-vehicle device 100 need not be stored in the storage device 3 and may instead be stored in the ROM 23. Likewise, the VR dictionary 412 and the context dictionary 413 of the server device 200 need not be stored in the external storage device 202 and may instead be stored in the ROM 213.

Each functional block is classified according to its main processing content to make the functions of the in-vehicle device 100 and the server device 200 realized in this embodiment easier to understand. The present invention is therefore not limited by how the functions are classified or by their names. The components of the in-vehicle device 100 and the server device 200 can also be divided into more components according to processing content, or classified so that one component performs more processes.

All or part of each functional unit may be built from hardware mounted on a computer (for example, an integrated circuit such as an ASIC). The processing of each functional unit may be executed by a single piece of hardware or by multiple pieces of hardware.

The storage unit 305 of the in-vehicle device 100 is realized by the storage device 3, and the communication unit 306 of the in-vehicle device 100 is realized by the communication device 12. The information storage unit 404 of the server device 200 is realized by the external storage device 202, and the transmission/reception unit 405 of the server device 200 is realized by the transmission/reception device 203.

[Description of operation]
Next, the voice recognition process executed by the in-vehicle device 100 will be described. FIG. 5 is a flowchart showing an example of the flow of the voice recognition process. The input receiving unit 301 starts this process when it receives an instruction to start voice recognition from the user via the input device 5.

When the voice recognition process starts, the input receiving unit 301 determines whether a voice input of a user utterance has been received via the microphone 41 (step S001). When a voice input of a user utterance has been received (Yes in step S001), the input receiving unit 301 passes the input voice information to the voice recognition unit 303.

The voice recognition unit 303 performs voice recognition processing using a predetermined acoustic model and the VR dictionary, and generates utterance text information by converting into text the voice information of the user utterance acquired via the input receiving unit 301 (step S002).

Next, the voice recognition unit 303 determines whether the words of the user utterance, now converted into utterance text information, are registered in the standby vocabulary information 314 (step S003). Specifically, the voice recognition unit 303 searches the standby vocabulary information 314 using the utterance text information. When the words of the user utterance indicated by the utterance text information are registered in the standby vocabulary information 314 (Yes in step S003), the output processing unit 302 executes a predetermined process corresponding to the standby vocabulary entry that was found (step S004). For example, when the word of the user utterance is "bank", the output processing unit 302 uses the map information to acquire the point coordinates of banks around the current location, generates screen information in which marks indicating those points are superimposed on the map, and outputs the generated screen information to the display.

On the other hand, when the words of the user utterance indicated by the utterance text information are not registered in the standby vocabulary information 314 (No in step S003), the voice recognition unit 303 determines whether the words are registered in the additional dictionary information 315 (step S005). Specifically, the voice recognition unit 303 searches the additional dictionary information 315 using the utterance text information.

When the words of the user utterance indicated by the utterance text information are registered in the additional dictionary information 315 (Yes in step S005), the voice recognition unit 303 outputs an operation instruction for the vehicle device (step S006). Specifically, the voice recognition unit 303 identifies the record of the additional dictionary information 315 whose utterance voice matches the utterance text information. It then refers to the operation command information 313 using the identification numbers stored in the command group 1 column 322 and the command group 2 column 323 of that record, and specifies the target vehicle device and the operation content. Finally, it outputs an execution instruction for the specified operation content to the specified vehicle device via the CAN I/F 13.

On the other hand, when the words of the user utterance indicated by the utterance text information are not registered in the additional dictionary information 315 (No in step S005), the voice recognition unit 303 transmits the voice information acquired from the input receiving unit 301 to the server device 200 via the communication unit 306, and the processing of this flow ends.
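The branching in steps S003 to S006 can be sketched as follows in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function name, the action labels, and the use of a set and a dict for the standby vocabulary information 314 and the additional dictionary information 315 are all hypothetical, and the lookup is a plain exact-match search.

```python
def handle_utterance(text, standby_vocabulary, additional_dictionary):
    """Decide what the in-vehicle device does with recognized text.

    Returns an (action, payload) tuple; the action labels are hypothetical.
    """
    # Step S003: is the word registered in the standby vocabulary?
    if text in standby_vocabulary:
        # Step S004: run the predetermined process for that word.
        return ("run_standby_process", text)
    # Step S005: is the word registered in the additional dictionary?
    if text in additional_dictionary:
        # Step S006: output the operation instruction for the device.
        return ("operate_device", additional_dictionary[text])
    # No in step S005: the voice information goes to the server device.
    return ("send_audio_to_server", None)


standby = {"bank"}
additional = {"hot": (1, 4)}  # (command group 1 id, command group 2 id)
print(handle_utterance("hot", standby, additional))
```

Only the last branch involves the network, which is why a growing additional dictionary reduces how often voice information has to be sent.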

The voice recognition process executed by the in-vehicle device 100 has been described above.

Next, the operation command specifying process of the server device 200, which has acquired the voice information of a user utterance from the in-vehicle device 100, will be described. FIG. 6 is a flowchart showing an example of the flow of the operation command specifying process. This process starts when the operation command specifying unit 401 of the server device 200 acquires the voice information of a user utterance from the in-vehicle device 100 via the transmission/reception unit 405. In other words, the operation command specifying process is performed when the words of the user utterance are registered in neither the standby vocabulary information 314 nor the additional dictionary information 315 of the in-vehicle device 100.

On acquiring the voice information of the user utterance, the operation command specifying unit 401 of the server device 200 performs voice recognition processing using a predetermined acoustic model, the VR dictionary 412, and the context dictionary 413, and generates utterance text information by converting the voice information into text (step S011).

Next, the operation command specifying unit 401 uses the utterance text information to determine whether the voice information of the user utterance indicates an operation of a vehicle device (step S012). Specifically, it determines from the utterance text information whether the content of the user utterance is intended to operate a vehicle device or is something else. When it determines that operation of a vehicle device is not intended (No in step S012), the operation command specifying unit 401 executes a predetermined process corresponding to the voice information (step S013). For example, when the utterance text information is "three banks around here", the operation command specifying unit 401 acquires from the map information 411 the point coordinates of the three banks closest to the current location and transmits them to the in-vehicle device 100 via the transmission/reception unit 405.

On the other hand, when it determines that the content of the user utterance is intended to operate a vehicle device (Yes in step S012), the operation command specifying unit 401 specifies the operation command for the vehicle device using the operation command information 414 (step S014). For example, when the utterance text information is "hot", the operation command specifying unit 401 determines that the vehicle device to be operated is the "air conditioner", which adjusts the cabin temperature, and selects identification number "1" from command group 1. It also determines from the utterance text information that the operation content is "lower the temperature" and selects operation identification number "4" from command group 2. In this way, the operation command specifying unit 401 specifies the operation commands "1" and "4" corresponding to the user utterance "hot".

Next, the additional dictionary registration information generation unit 403 generates additional dictionary registration information (step S015) and transmits it to the in-vehicle device 100 via the transmission/reception unit 405 (step S016). Specifically, the additional dictionary registration information generation unit 403 generates additional dictionary registration information that includes the utterance text information generated from the voice information of the user utterance and the operation command for the vehicle device specified using that utterance text information, and transmits it to the in-vehicle device 100 via the transmission/reception unit 405.
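The server-side steps S012 to S015 might look roughly like the Python sketch below. The description does not say how the server decides that an utterance is intended to operate a vehicle device, so the lookup table used here is purely a stand-in assumption, as are the function name and the dictionary keys of the returned registration information. Only the mapping of "hot" to commands 1 and 4 comes from the example above.

```python
# Hypothetical stand-in for the intent determination of step S012.
# The real determination method is not specified in the description.
INTENT_TABLE = {
    "hot": (1, 4),  # air conditioner, lower temperature (from the example)
}


def specify_operation_command(utterance_text):
    """Return additional dictionary registration information, or None.

    None means the utterance was not judged to be a device operation,
    in which case step S013 would handle it instead.
    """
    command = INTENT_TABLE.get(utterance_text)   # steps S012 and S014
    if command is None:
        return None
    device_id, operation_id = command
    # Step S015: pair the utterance text with the specified command.
    return {
        "utterance": utterance_text,
        "command_group_1": device_id,
        "command_group_2": operation_id,
    }


print(specify_operation_command("hot"))
```

The returned dictionary corresponds to what would be transmitted to the in-vehicle device in step S016; the transport itself is left out of the sketch.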

The operation command specifying process for vehicle devices executed by the server device 200 has been described above.

Next, the additional dictionary registration process of the in-vehicle device 100, which has acquired additional dictionary registration information from the server device 200, will be described. FIG. 7 is a flowchart showing an example of the flow of the additional dictionary registration process. This process starts when the additional dictionary registration unit 304 of the in-vehicle device 100 acquires additional dictionary registration information from the server device 200 via the communication unit 306.

The additional dictionary registration unit 304 of the in-vehicle device 100 generates or updates the additional dictionary information 315 using the acquired additional dictionary registration information (step S021). Specifically, it extracts the utterance text information from the additional dictionary registration information and stores it in the utterance voice column 321 of the additional dictionary information 315. It also extracts the operation commands from the additional dictionary registration information and stores them in the command group 1 column 322 and the command group 2 column 323, respectively. In this example, the additional dictionary registration unit 304 stores the utterance text information "hot" in the utterance voice column 321 and stores the operation commands "1" and "4" in the command group 1 column 322 and the command group 2 column 323, respectively.

Next, the voice recognition unit 303 outputs, to the corresponding vehicle device, an execution instruction for the operation specified by command group 1 and command group 2 (step S022). Specifically, the voice recognition unit 303 identifies the record of the additional dictionary information 315 that the additional dictionary registration unit 304 generated or updated. It reads the identification numbers stored in the command group 1 column 322 and the command group 2 column 323 of that record and uses them to specify the target vehicle device and the operation content from the operation command information 313. It then outputs an execution instruction for the specified operation content to the specified vehicle device. In this example, the voice recognition unit 303 outputs an instruction to lower the temperature to the air conditioner, which is the target vehicle device. After outputting the execution instruction, the voice recognition unit 303 ends the processing of this flow.
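A compact way to picture steps S021 and S022 is the Python sketch below. It assumes the registration information arrives as a dictionary with the hypothetical keys `utterance`, `command_group_1`, and `command_group_2`, and it represents the output to the vehicle device as a caller-supplied callback rather than a real CAN interface; none of these names appear in the description.

```python
def register_and_execute(registration, additional_dictionary, send_to_device):
    """Register a new utterance, then trigger the matching operation."""
    # Step S021: store the utterance text and its two command numbers.
    key = registration["utterance"]
    additional_dictionary[key] = (
        registration["command_group_1"],
        registration["command_group_2"],
    )
    # Step S022: read the record back and output the execution instruction.
    device_id, operation_id = additional_dictionary[key]
    send_to_device(device_id, operation_id)


sent = []
dictionary = {}
register_and_execute(
    {"utterance": "hot", "command_group_1": 1, "command_group_2": 4},
    dictionary,
    lambda device_id, operation_id: sent.append((device_id, operation_id)),
)
print(dictionary, sent)
```

Because the record is written before the instruction is issued, the same utterance can be handled locally the next time it is spoken.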

The additional dictionary registration process has been described above.

With this voice recognition system, when a user utterance instructs operation of a vehicle device and the corresponding operation command is already registered in the additional dictionary information, the in-vehicle device 100 can specify the operation command for the vehicle device by itself, without transmitting the voice information to the server device 200 to obtain the operation command. The voice recognition system according to the present invention can therefore reduce the number of occasions on which voice information is transmitted to the server device 200, which reduces the communication cost burden. In addition, because the in-vehicle device 100 can specify the operation command from the next time onward, the time spent waiting for a response from the server device 200 is eliminated and the response speed to user utterances improves.

The voice recognition system also performs voice recognition on the voice information of a user utterance and generates additional dictionary information in which the text information of the utterance is associated with an operation command for a vehicle device. A single common operation command can therefore be associated with each of several differently worded phrases. By using such additional dictionary information, the in-vehicle device 100 can absorb the user's habitual wording and still specify one operation command. In other words, the user no longer needs to remember a specific word as the operation command when instructing operation of a vehicle device.

In addition, the words registered in the standby vocabulary information are normally updated when the map information is updated. In the present invention, however, the additional dictionary information in which operation commands for vehicle devices are registered is stored independently of the standby vocabulary information, so it is not affected by such updates.

The present invention is not limited to the embodiment described above, and various modifications are possible. For example, when updating the additional dictionary information 315, the storage unit 305 of the in-vehicle device 100 may determine whether the number of entries registered in the additional dictionary information 315 is equal to or greater than a predetermined number (for example, 100). If it is, the storage unit 305 may delete infrequently used vocabulary from the standby vocabulary information 314 to enlarge the area available to the additional dictionary information 315.

Alternatively, when the number of utterance voices and operation commands registered in the additional dictionary information 315 reaches a predetermined number or more (for example, 100 or more), the storage unit 305 may delete entries in order starting from the least frequently used.
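One possible reading of this deletion rule is sketched below in Python. The description gives only the threshold example of 100 and the ordering by usage frequency; how usage is counted and how many entries are removed at once are not stated, so the `use_counts` mapping and the choice to delete until the count falls below the threshold are assumptions.

```python
MAX_ENTRIES = 100  # example threshold given in the description


def evict_least_used(additional_dictionary, use_counts, max_entries=MAX_ENTRIES):
    """Delete least frequently used entries while the threshold is reached.

    use_counts maps each utterance to how often it has been used; this
    bookkeeping is an assumption, not something the description defines.
    """
    while additional_dictionary and len(additional_dictionary) >= max_entries:
        least_used = min(additional_dictionary,
                         key=lambda utterance: use_counts.get(utterance, 0))
        del additional_dictionary[least_used]
        use_counts.pop(least_used, None)


entries = {"hot": (1, 4), "cold": (1, 3), "lower the air conditioner": (1, 4)}
counts = {"hot": 12, "cold": 5, "lower the air conditioner": 1}
evict_least_used(entries, counts, max_entries=3)
print(sorted(entries))
```

In the usage example the threshold is lowered to 3 so that the effect is visible with only three entries.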

With the in-vehicle device of such a voice recognition system, the area used by the additional dictionary information 315 can be enlarged as needed.

The present invention is not limited to the embodiments and modifications above and includes various other embodiments and modifications. For example, the embodiments above are described in detail to make the present invention easy to understand, and the invention is not necessarily limited to configurations that include every element described. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment or modification, and the configuration of another embodiment can be added to the configuration of a given embodiment. For part of the configuration of each embodiment, other configurations can be added, deleted, or substituted.

Some or all of the configurations, functions, processing units, and processing means described above may be realized by programs with which a processor implements the respective functions. Information such as the programs, tables, and files that implement each function can be placed in a memory, in a storage device such as a hard disk or SSD (Solid State Drive), or on a recording medium such as an IC card, an SD (Secure Digital) memory card, or a DVD (Digital Versatile Disk). The control lines and information lines shown are those considered necessary for the explanation; not all control lines and information lines of an actual product are necessarily shown.

DESCRIPTION OF SYMBOLS
100 ... in-vehicle device, 1 ... arithmetic processing device, 2 ... display, 3 ... storage device, 4 ... voice input/output device, 41 ... microphone, 42 ... speaker,
5 ... input device, 51 ... touch panel, 52 ... dial switch,
6 ... ROM device, 7 ... vehicle speed sensor, 8 ... gyro sensor,
9 ... GPS receiver, 10 ... FM multiplex broadcast receiver, 11 ... beacon receiver,
12 ... communication device, 13 ... CAN I/F,
200 ... server device, 201 ... arithmetic device, 211 ... CPU,
212 ... RAM, 213 ... ROM, 202 ... external storage device,
203 ... transmission/reception device, 204 ... bus

Claims

A speech recognition system including an on-vehicle device and a server device,
The in-vehicle device is
When the words of the user utterance are not registered in the additional dictionary information that identifies the vehicle device and the operation content, the voice information of the user utterance is transmitted to the server device,
The server device determines whether the voice information acquired from the in-vehicle device is intended to operate a vehicle device and, when it determines that operation of the vehicle device is intended, transmits to the in-vehicle device additional dictionary registration information that associates the words of the user utterance with information indicating the operation command for the vehicle device,
The in-vehicle device is
Using the additional dictionary registration information acquired from the server device, to generate additional dictionary information that associates the words of the user utterance and the operation command,
An operation execution instruction is output to the corresponding vehicle device using the operation command.

The speech recognition system according to claim 1,
The in-vehicle device is
A storage unit that stores additional dictionary information in which words indicating the operation of the vehicle device and an operation command combining the identification information of the vehicle device and the operation content are associated;
An input receiving unit that receives input of voice information of a user utterance;
A voice recognition unit that generates utterance text information by converting the voice information into text and searches the additional dictionary information using the utterance text information;
As a result of the search, when a word of a user utterance indicated by the text information is not registered in the additional dictionary information, a communication unit that transmits voice information of the user utterance to the server device,
The server device
It includes an operation command specifying unit that recognizes voice information of a user utterance acquired from the in-vehicle device and generates utterance text information obtained by converting the voice information into text,
The operation command specifying unit determines, using the utterance text information, whether the words indicated by the user utterance are intended to operate a vehicle device mounted on the vehicle-mounted device.

The speech recognition system according to claim 2,
The server device
An information storage unit storing operation command information in which the vehicle device and operation content are associated with identification information;
An additional dictionary registration information generating unit for generating additional dictionary registration information including information related to the operation command of the vehicle device,
The operation command specifying unit includes:
When it is determined that the word indicated by the user utterance is intended to operate the vehicle device, the operation command of the vehicle device is identified using the operation command information,
The additional dictionary registration information generation unit sends, to the in-vehicle device, additional dictionary registration information in which the operation command is associated with the utterance text information, and
The in-vehicle device further includes an additional dictionary generation unit that generates and updates the additional dictionary information using the additional dictionary registration information when the additional dictionary registration information is acquired from the server device via the communication unit.
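
The generate-and-update step performed by the additional dictionary generation unit might look like the sketch below. The `Registration` record and the function `apply_registration` are assumptions made for illustration; the claim does not specify a data layout for the additional dictionary registration information.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# (device identification, operation content); an assumed representation.
OperationCommand = Tuple[str, str]


@dataclass(frozen=True)
class Registration:
    """Hypothetical shape of the additional dictionary registration information."""
    utterance_text: str
    device_id: str
    operation: str


def apply_registration(
    additional_dictionary: Optional[Dict[str, OperationCommand]],
    registration: Registration,
) -> Dict[str, OperationCommand]:
    """Generate the dictionary if it does not exist yet, then add or overwrite the entry."""
    if additional_dictionary is None:
        additional_dictionary = {}
    additional_dictionary[registration.utterance_text] = (
        registration.device_id,
        registration.operation,
    )
    return additional_dictionary


# Usage with made-up data: the first call generates, the second updates.
d = apply_registration(None, Registration("it is hot", "air_conditioner", "lower_temperature"))
d = apply_registration(d, Registration("it is raining", "wiper", "turn_on"))
```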

The speech recognition system according to claim 3,
The voice recognition unit specifies the target vehicle device and the operation content using the utterance text information, the additional dictionary information, and the operation command information, and
Outputs an instruction to execute the operation content to the specified vehicle device.

An in-vehicle device comprising:
A storage unit that stores additional dictionary information in which words indicating the operation of the vehicle device and an operation command combining the identification information of the vehicle device and the operation content are associated;
An input receiving unit that receives input of voice information of a user utterance;
A voice recognition unit that generates utterance text information by converting the voice information into text and searches the additional dictionary information using the utterance text information; and
A communication unit that transmits the voice information of the user utterance to a predetermined server device when the search shows that the words of the user utterance indicated by the utterance text information are not registered in the additional dictionary information.

The in-vehicle device according to claim 5,
The storage unit further stores operation command information in which the vehicle device and the operation content are associated with identification information, and
The voice recognition unit, when the words of the user utterance indicated by the utterance text information are registered in the additional dictionary information, specifies the target vehicle device and the operation content using the additional dictionary information and the operation command information, and
Outputs an instruction to execute the operation content to the specified vehicle device.

The in-vehicle device according to claim 6,
Further comprising an additional dictionary generation unit that generates and updates the additional dictionary information using additional dictionary registration information when the additional dictionary registration information is acquired from the server device via the communication unit.

The in-vehicle device according to claim 7,
The storage unit
When updating the additional dictionary information, determines whether the number of entries registered in the additional dictionary information is greater than or equal to a predetermined number, and, if so, deletes less frequently used vocabulary from the standby vocabulary information in which predetermined vocabulary is registered.
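
The size check and deletion described in this claim could be sketched as below. Several details are assumptions, since the claim does not state them: usage frequency is kept as a per-word counter, exactly one word is deleted per update, and the names `prune_standby_vocabulary`, `standby_usage_counts`, and `max_entries` are hypothetical.

```python
from typing import Dict, Optional


def prune_standby_vocabulary(
    additional_dictionary: Dict[str, str],
    standby_usage_counts: Dict[str, int],
    max_entries: int,
) -> Optional[str]:
    """Delete the least-used standby word once the additional dictionary is full.

    Returns the deleted word, or None if nothing was deleted.
    """
    # Below the predetermined number: nothing to do.
    if len(additional_dictionary) < max_entries:
        return None
    if not standby_usage_counts:
        return None
    # Assumption: "less frequently used" means the lowest usage counter.
    least_used = min(standby_usage_counts, key=standby_usage_counts.get)
    del standby_usage_counts[least_used]
    return least_used


# Usage with made-up data: three registered entries and a limit of three.
dictionary = {"it is hot": "CMD-1", "it is cold": "CMD-2", "it is raining": "CMD-3"}
standby = {"navigate home": 12, "play radio": 2, "call office": 7}
removed = prune_standby_vocabulary(dictionary, standby, max_entries=3)
not_removed = prune_standby_vocabulary(dictionary, standby, max_entries=10)
```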

A server device comprising an operation command specifying unit that recognizes voice information of a user utterance acquired from an in-vehicle device and generates utterance text information by converting the voice information into text,
The operation command specifying unit determines, using the utterance text information, whether the words of the user utterance are intended to operate a vehicle device mounted on the in-vehicle device.

The server device according to claim 9, further comprising:
An information storage unit that stores operation command information in which the vehicle device and the operation content are associated with identification information; and
An additional dictionary registration information generation unit that generates additional dictionary registration information including information related to the operation command of the vehicle device,
The operation command specifying unit identifies the operation command of the vehicle device using the operation command information when it determines that the words of the user utterance are intended to operate the vehicle device, and
The additional dictionary registration information generation unit generates additional dictionary registration information in which the operation command is associated with the utterance text information.
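
The server-side flow in claims 9 and 10 (decide whether the utterance intends a vehicle operation, identify the command, and build the registration information) is sketched below. The claims do not say how intent is determined; the keyword table here is a stand-in assumption, and `OPERATION_COMMAND_INFO`, `INTENT_KEYWORDS`, the command identifiers, and both function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Operation command information: identification info -> (vehicle device, operation content).
OPERATION_COMMAND_INFO: Dict[str, Tuple[str, str]] = {
    "CMD-AC-DOWN": ("air_conditioner", "lower_temperature"),
    "CMD-WIPER-ON": ("wiper", "turn_on"),
}

# Assumed intent rule: a keyword in the utterance implies a device operation.
INTENT_KEYWORDS: Dict[str, Tuple[str, str]] = {
    "hot": ("air_conditioner", "lower_temperature"),
    "raining": ("wiper", "turn_on"),
}


def specify_operation_command(utterance_text: str) -> Optional[str]:
    """Return the command identifier if the utterance intends a vehicle operation."""
    words = utterance_text.lower().split()
    for keyword, target in INTENT_KEYWORDS.items():
        if keyword in words:
            for command_id, pair in OPERATION_COMMAND_INFO.items():
                if pair == target:
                    return command_id
    return None


def generate_registration(utterance_text: str) -> Optional[Dict[str, str]]:
    """Associate the operation command with the utterance text, if one was identified."""
    command_id = specify_operation_command(utterance_text)
    if command_id is None:
        return None
    return {"utterance_text": utterance_text, "operation_command": command_id}


# Usage with made-up utterances.
registered = generate_registration("It is hot in here")
ignored = generate_registration("What a nice song")
```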