JP7278830B2

JP7278830B2 - TERMINAL DEVICE, TERMINAL DEVICE CONTROL METHOD, AND PROGRAM

Info

Publication number: JP7278830B2
Application number: JP2019059873A
Authority: JP
Inventors: 慎一菊池; 昌宏暮橋; 正樹栗原; 裕本田
Original assignee: Honda Motor Co Ltd
Current assignee: Honda Motor Co Ltd
Priority date: 2019-03-27
Filing date: 2019-03-27
Publication date: 2023-05-22
Anticipated expiration: 2039-03-27
Also published as: JP2020160281A; CN111755007A; CN111755007B

Description

The present invention relates to a terminal device, a terminal device control method, and a program.

Conventionally, research has been conducted on technology for recognizing speech uttered by occupants of a vehicle. Patent Document 1 discloses an in-vehicle device comprising: an audio signal input unit to which an audio signal based on a user's utterance is input; an operation signal input unit to which an operation signal based on the user's manual operation is input; an SN ratio calculation unit that calculates a noise index, which is an index of the noise contained in the audio signal; and a control unit that accepts either the audio signal or the operation signal as an input signal and executes control based on that input signal. The control unit determines, based on the noise index calculated by the SN ratio calculation unit, whether the audio signal or the operation signal is to be accepted as the input signal for the next operation input.

Patent Document 1: JP 2017-102822 A

In the field of speech recognition, the mainstream arrangement is for a terminal device to transmit speech to a server device over a network, for the server device to perform speech recognition, and for the result to be returned to the terminal device. However, the conventional technology has not sufficiently considered a mechanism for requesting speech recognition from a plurality of server devices selectively or in parallel. For this reason, the conventional technology has in some cases been unable to make effective use of a plurality of server devices having a speech recognition function.

The present invention has been made in view of these circumstances, and one of its objects is to provide a terminal device, a terminal device control method, and a program that can make effective use of a plurality of mutually different server devices each having a speech recognition function.

The terminal device, terminal device control method, and program according to the present invention employ the following configurations.

(1): A terminal device according to one aspect of the present invention is a terminal device mounted in a vehicle, comprising: two or more preprocessing units that each perform, on speech in the vehicle cabin picked up by a microphone, preprocessing suited to a respective one of two or more server devices among a plurality of server devices having a speech recognition function; and a communication control unit that uses a communication unit to transmit the speech preprocessed by each of the two or more preprocessing units to the corresponding server device.

(2): In aspect (1) above, the communication control unit transmits, to a first server device among the plurality of server devices, at least speech that has not been preprocessed by the preprocessing units.

(3): In aspect (2) above, the terminal device is equipped with a first OS that implements, among the two or more preprocessing units, the preprocessing unit that performs preprocessing suited to a server device other than the first server device, and a second OS for extracting the speech that has not been preprocessed by the preprocessing units.

(4): In any of aspects (1) to (3) above, some or all of the two or more preprocessing units perform their processing sequentially.

(5): In any of aspects (1) to (4) above, the terminal device comprises a plurality of the communication control units, each corresponding to one of the plurality of server devices.

(6): A terminal device control method according to another aspect of the present invention is a method of controlling a terminal device mounted in a vehicle, in which each of two or more preprocessing units provided in the terminal device performs, on speech in the vehicle cabin picked up by a microphone, preprocessing suited to a respective one of two or more server devices among a plurality of server devices having a speech recognition function, and the speech preprocessed by each of the two or more preprocessing units is transmitted to the corresponding server device using a communication unit.

(7): A program according to another aspect of the present invention is a program executed by a terminal device mounted in a vehicle, the program causing the terminal device to perform, on speech in the vehicle cabin picked up by a microphone, preprocessing suited to each of two or more server devices among a plurality of server devices having a speech recognition function, and to transmit the speech preprocessed for each of those server devices to the corresponding server device using a communication unit.

According to aspects (1) to (7) above, a plurality of mutually different server devices each having a speech recognition function can be used effectively.

FIG. 1 is a configuration diagram of a service system 1 including a terminal device 100.
FIG. 2 is a diagram showing the configuration of the terminal device 100 according to the first embodiment and the equipment mounted in the vehicle M.
FIG. 3 is a diagram showing an arrangement example of a display/operation device 20.
FIG. 4 is a diagram showing an arrangement example of a speaker unit 30.
FIG. 5 is a diagram showing an example of the configuration of a server device 200.
FIG. 6 is a diagram for explaining the processing executed by the preprocessing units.

Embodiments of the terminal device, terminal device control method, and program of the present invention are described below with reference to the drawings. The terminal device is a device that implements part or all of a service system. The terminal device is mounted, for example, in a vehicle (hereinafter, vehicle M). The service system is a system in which speech collected in the cabin of the vehicle M is transmitted to a server device, the result of information processing performed by the server device, including speech recognition, is returned to the vehicle M, and some service (information provision, device control, or any other service) is provided in the vehicle M.

The service system is implemented by using, in an integrated manner, for example, a speech recognition function that recognizes the speech of an occupant (a function that converts speech into text), a natural language processing function (a function that understands the structure and meaning of text), and various other service functions. Some or all of these functions may be implemented using AI (Artificial Intelligence) technology.

[Overall configuration]
FIG. 1 is a configuration diagram of a service system 1 including a terminal device 100. The service system 1 includes, for example, the terminal device 100 and a plurality of server devices 200-1, 200-2, 200-3, 200-4, and so on. The number following the hyphen at the end of each reference sign is an identifier for distinguishing the services. When no distinction is made between the server devices, they may be referred to simply as the server device 200. Although three server devices 200 are shown in FIG. 1, the number of server devices 200 may be two or fewer, or three or more. The server devices 200 are operated by mutually different service providers. Accordingly, the services in the present invention are realized by mutually different providers. Any entity (a corporation, an organization, an individual, or the like) can be a service provider.

The terminal device 100 communicates with the server devices 200 via a network NW. The network NW includes, for example, some or all of the Internet, a cellular network, a Wi-Fi network, a WAN (Wide Area Network), a LAN (Local Area Network), a public line, a telephone line, a wireless base station, and the like.

The terminal device 100 transmits speech uttered by an occupant of the vehicle M to a server device 200 and provides the occupant with a service based on the information returned from the server device 200. The service may simply display the recognized speech content, may translate it into another language, or may be any other service.

[Vehicle]
FIG. 2 is a diagram showing the configuration of the terminal device 100 according to the first embodiment and the equipment mounted in the vehicle M. The vehicle M is equipped with, for example, one or more microphones 10, a display/operation device 20, a speaker unit 30, an in-vehicle communication device 60, and the terminal device 100. In addition, a general-purpose communication device 70 such as a smartphone may be brought into the cabin and used as a communication device. These devices are connected to one another by multiplex communication lines such as CAN (Controller Area Network) communication lines, serial communication lines, a wireless communication network, or the like. The configuration shown in FIG. 2 is merely an example; part of the configuration may be omitted, and other components may be added.

The microphone 10 is a sound pickup unit that collects speech uttered in the cabin. The display/operation device 20 is a device (or group of devices) that displays images and can accept input operations. The display/operation device 20 includes, for example, a display device configured as a touch panel. The display/operation device 20 may further include a HUD (Head Up Display) or a mechanical input device. The speaker unit 30 includes, for example, a plurality of speakers (sound output units) arranged at mutually different positions in the cabin. The display/operation device 20 may be shared by the terminal device 100 and a navigation device 40.

The in-vehicle communication device 60 is, for example, a wireless communication device that can access the network NW using a cellular network or a Wi-Fi network.

FIG. 3 is a diagram showing an arrangement example of the display/operation device 20. The display/operation device 20 includes, for example, a first display 22, a second display 24, and an operation switch ASSY 26. The display/operation device 20 may further include a HUD 28.

The vehicle M has, for example, a driver's seat DS provided with a steering wheel SW, and a passenger seat AS located beside the driver's seat DS in the vehicle width direction (Y direction in the figure). The first display 22 is a horizontally elongated display device that extends from around the midpoint between the driver's seat DS and the passenger seat AS on the instrument panel to a position facing the left end of the passenger seat AS. The second display 24 is installed around the midpoint between the driver's seat DS and the passenger seat AS in the vehicle width direction, below the first display 22. For example, the first display 22 and the second display 24 are both configured as touch panels and include an LCD (Liquid Crystal Display), an organic EL (Electroluminescence) display, a plasma display, or the like as a display unit. The operation switch ASSY 26 is an assembly of dial switches, button switches, and the like. The display/operation device 20 outputs the content of operations performed by an occupant to the terminal device 100. The content displayed by the first display 22 or the second display 24 may be determined by the terminal device 100.

FIG. 4 is a diagram showing an arrangement example of the speaker unit 30. The speaker unit 30 includes, for example, speakers 30A to 30H. The speaker 30A is installed on the window pillar (the so-called A pillar) on the driver's seat DS side. The speaker 30B is installed in the lower part of the door near the driver's seat DS. The speaker 30C is installed on the window pillar on the passenger seat AS side. The speaker 30D is installed in the lower part of the door near the passenger seat AS. The speaker 30E is installed in the lower part of the door near the right rear seat BS1. The speaker 30F is installed in the lower part of the door near the left rear seat BS2. The speaker 30G is installed near the second display 24. The speaker 30H is installed on the ceiling (roof) of the cabin.

In this arrangement, for example, when sound is output exclusively from the speakers 30A and 30B, the sound image is localized near the driver's seat DS. When sound is output exclusively from the speakers 30C and 30D, the sound image is localized near the passenger seat AS. When sound is output exclusively from the speaker 30E, the sound image is localized near the right rear seat BS1. When sound is output exclusively from the speaker 30F, the sound image is localized near the left rear seat BS2. When sound is output exclusively from the speaker 30G, the sound image is localized near the front of the cabin, and when sound is output exclusively from the speaker 30H, the sound image is localized near the upper part of the cabin. The speaker unit 30 is not limited to these cases: by adjusting the distribution of the sound output from each speaker using a mixer or an amplifier, it can localize the sound image at any position in the cabin.
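The distribution of output among the speakers can be illustrated with a minimal sketch. The source does not specify how the mixer or amplifier computes the distribution; the inverse-distance weighting, the `sharpness` parameter, the function name `localization_gains`, and all speaker coordinates below are assumptions introduced purely for illustration.

```python
import math

# Hypothetical cabin coordinates in metres (x: forward, y: toward the passenger
# side, z: up). These positions are illustrative and do not come from the source.
SPEAKERS = {
    "30A": (1.2, -0.7, 1.0),   # A pillar, driver's seat side
    "30B": (0.8, -0.8, 0.3),   # lower door, driver's seat side
    "30C": (1.2, 0.7, 1.0),    # window pillar, passenger seat side
    "30D": (0.8, 0.8, 0.3),    # lower door, passenger seat side
    "30E": (-0.3, -0.8, 0.3),  # lower door near right rear seat BS1
    "30F": (-0.3, 0.8, 0.3),   # lower door near left rear seat BS2
    "30G": (1.3, 0.0, 0.7),    # near the second display 24
    "30H": (0.3, 0.0, 1.2),    # cabin roof
}


def localization_gains(target, speakers=SPEAKERS, sharpness=2.0):
    """Return per-speaker gains that favour speakers close to `target`.

    Gains are weighted by inverse distance (an assumed rule) and normalised
    so that the total output power (sum of squared gains) equals 1.
    """
    weights = {}
    for name, position in speakers.items():
        distance = math.dist(position, target)
        # The small offset avoids division by zero when the target
        # coincides with a speaker position.
        weights[name] = 1.0 / (distance + 0.1) ** sharpness
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {name: w / norm for name, w in weights.items()}


# Example: place the sound image near the right rear seat BS1.
gains = localization_gains((-0.3, -0.5, 0.6))
loudest = max(gains, key=gains.get)
```

With these assumed coordinates, the speaker nearest the target receives the largest gain, which corresponds to the behaviour described above for the case where a single speaker dominates.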

[Terminal device]
Returning to FIG. 2, the terminal device 100 includes, for example, a first management unit 110, a second management unit 120, communication control units 150-1 to 150-3, and a pairing application execution unit 152. The first management unit 110 includes preprocessing units 112-1 and 112-2, a display control unit 116, and an audio control unit 118. The second management unit 120 includes a preprocessing unit 122-3, a display control unit 126, and an audio control unit 128. When no distinction is made between the communication control units, they are referred to simply as the communication control unit 150. Three communication control units 150 are shown merely as an example matching the number of server devices 200 in FIG. 1; the number of communication control units 150 may be two, or four or more. The software arrangement shown in FIG. 2 is simplified for the sake of explanation and can in practice be modified arbitrarily, for example so that the management unit 110 is interposed between the communication control unit 150 and the in-vehicle communication device 60.

Each component of the terminal device 100 is implemented, for example, by a hardware processor such as a CPU (Central Processing Unit) executing a program (software). Some or all of these components may be implemented by hardware (circuitry) such as an LSI (Large Scale Integration), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a GPU (Graphics Processing Unit), or by cooperation between software and hardware. The program may be stored in advance in a storage device (a storage device having a non-transitory storage medium) such as an HDD (Hard Disk Drive) or flash memory, or may be stored on a removable storage medium (a non-transitory storage medium) such as a DVD or CD-ROM and installed when the storage medium is loaded into a drive device.

The first management unit 110 functions when programs such as an OS (Operating System) and middleware are executed. The same applies to the second management unit 120, and the terminal device 100 may be equipped with separate OSs: one for implementing the first management unit 110 and one for implementing the second management unit 120.

The preprocessing unit 112-1 and the communication control unit 150-1 perform processing corresponding to the server device 200-1. The preprocessing unit 112-2 and the communication control unit 150-2 perform processing corresponding to the server device 200-2. The preprocessing unit 122-3 and the communication control unit 150-3 perform processing corresponding to the server device 200-3. Each preprocessing unit either applies acoustic processing and the like to the speech, or leaves it unprocessed, so that the speech is in a state suited to speech recognition by the corresponding server device 200. Each communication control unit 150 transmits the speech output from the corresponding preprocessing unit, or the result of the speech processing, to the corresponding server device 200. The details are described later.

Some of the communication control units 150 may communicate with a server device 200 by cooperating with the general-purpose communication device 70 via the pairing application execution unit 152. The communication control unit 150-1 communicates with the server device 200-1 using the in-vehicle communication device 60. The communication control unit 150-2 communicates with the server device 200-2 using the in-vehicle communication device 60. The communication control unit 150-3 cooperates with the general-purpose communication device 70 via the pairing application execution unit 152 and communicates with the server device 200-3. The in-vehicle communication device 60 and the general-purpose communication device 70 are each an example of a "communication unit". The pairing application execution unit 152 pairs with the general-purpose communication device 70 by, for example, Bluetooth (registered trademark), and connects the communication control unit 150-3 to the general-purpose communication device 70. The communication control unit 150-3 may instead be connected to the general-purpose communication device 70 by wired communication using USB (Universal Serial Bus) or the like.
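The correspondence between server devices, preprocessing units, communication control units, and communication units in this embodiment can be summarized as a small lookup table. The following is a minimal sketch only: the `Route` type, the `ROUTES` table, and the `route_for` helper are hypothetical names introduced for illustration and are not part of the disclosed device.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    """One server device and the units that serve it (illustrative type)."""
    server: str
    preprocessor: Optional[str]  # None would mean the preprocessing unit is omitted
    controller: str
    transport: str


# Mapping taken from the embodiment described in the text.
ROUTES = {
    "200-1": Route("200-1", "112-1", "150-1", "in-vehicle communication device 60"),
    "200-2": Route("200-2", "112-2", "150-2", "in-vehicle communication device 60"),
    # Server 200-3 is reached through the paired general-purpose communication
    # device 70; the text notes that preprocessing unit 122-3 may be omitted.
    "200-3": Route("200-3", "122-3", "150-3", "general-purpose communication device 70"),
}


def route_for(server: str) -> Route:
    """Look up which units handle the given server device."""
    return ROUTES[server]
```

For example, `route_for("200-3").transport` yields the general-purpose communication device 70, reflecting the pairing-based path, while the other two servers share the in-vehicle communication device 60.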

The display control units 116 and 126 cause the first display 22 or the second display 24 to display an image based on the information returned from the server device 200.

The audio control units 118 and 128 cause some or all of the speakers included in the speaker unit 30 to output audio based on the information returned from the server device 200.

[Server device]
FIG. 5 is a diagram showing an example of the configuration of the server device 200. A description of the physical communication from the terminal device 100 to the network NW is omitted here.

The server device 200 includes a communication unit 210. The communication unit 210 is a network interface such as a NIC (Network Interface Card). The server device 200 further includes, for example, a speech recognition unit 220, a natural language processing unit 222, and a reply information generation unit 224. These components are implemented, for example, by a hardware processor such as a CPU executing a program (software). Some or all of these components may be implemented by hardware (circuitry) such as an LSI, ASIC, FPGA, or GPU, or by cooperation between software and hardware. The program may be stored in advance in a storage device (a storage device having a non-transitory storage medium) such as an HDD or flash memory, or may be stored on a removable storage medium (a non-transitory storage medium) such as a DVD or CD-ROM and installed when the storage medium is loaded into a drive device.

The server device 200 includes a storage unit 250. The storage unit 250 is implemented by the various storage devices described above. The storage unit 250 stores data and programs, such as a dictionary DB (database) 252.

When the communication unit 210 acquires speech, or the result of speech processing, from the terminal device 100, the speech recognition unit 220 performs speech recognition and outputs the resulting text, and the natural language processing unit 222 interprets the meaning of the text while referring to the dictionary DB 252. The dictionary DB 252 associates abstracted semantic information with text. The dictionary DB 252 may include list information of synonyms and near-synonyms. The processing of the speech recognition unit 220 and the processing of the natural language processing unit 222 need not be clearly separated into stages and may influence each other; for example, the speech recognition unit 220 may correct its recognition result in response to the processing result of the natural language processing unit 222. The reply information generation unit 224 generates the information to be returned to the terminal device 100 (reply information) based on the processing result of the natural language processing unit 222. The reply information may be of any kind; for example, it may be the speech converted into another language. The natural language processing unit 222 may be omitted, in which case the server device 200 simply outputs the recognized text.
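The flow from recognition to reply on the server side can be sketched as follows. This is a simplified illustration under stated assumptions: the recognizer is passed in as a callable because the source does not describe the recognition algorithm, and the dictionary contents, the `interpret` and `handle_request` names, and the reply format are all hypothetical.

```python
# Hypothetical stand-in for dictionary DB 252: text mapped to abstracted
# semantic information, with synonymous phrasings sharing one meaning.
DICTIONARY_DB = {
    "turn on the air conditioner": "AC_ON",
    "switch on the ac": "AC_ON",
    "what is the weather today": "WEATHER_TODAY",
}


def interpret(text):
    """Stand-in for natural language processing unit 222 (dictionary lookup only)."""
    return DICTIONARY_DB.get(text.strip().lower())


def handle_request(audio, recognize, use_nlp=True):
    """Recognize speech, optionally interpret it, and build reply information.

    `recognize` is any callable mapping audio to text; it stands in for
    speech recognition unit 220. When `use_nlp` is False the server simply
    returns the recognized text, as in the variant that omits unit 222.
    """
    text = recognize(audio)
    reply = {"text": text}
    if use_nlp:
        reply["meaning"] = interpret(text)
    return reply


# Example with a dummy recognizer that ignores the audio bytes.
reply = handle_request(b"\x00\x01", lambda audio: "Switch on the AC")
```

The mutual influence between recognition and language processing described above is not modelled here; the sketch runs the two stages strictly in sequence.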

[Preprocessing]
The preprocessing executed by the preprocessing units in the terminal device 100 is described below. FIG. 6 is a diagram for explaining the processing executed by the preprocessing units. The speech picked up by the microphone 10 is supplied to, for example, the preprocessing unit 112-1 and at least the communication control unit 150-3. The example shown in FIG. 6 is merely an example, and the speech picked up by the microphone 10 may be supplied to the preprocessing units in parallel.

The preprocessing unit 112-1, for example, applies processing such as beamforming, noise cancellation, and equalizing to the input speech, and outputs a processing result (1) containing the resulting processed speech (1) to the communication control unit 150-1. The preprocessing unit 112-1 may also perform simple speech recognition on the processed speech (1) and include the result in the processing result (1) output to the communication control unit 150-1. The communication control unit 150-1 causes the in-vehicle communication device 60 to transmit the processing result (1) to the server device 200-1. The processed speech (1) is also output to the preprocessing unit 112-2.

The preprocessing unit 112-2, for example, applies to the input processed speech (1) the processing that is still lacking after the processing by the preprocessing unit 112-1 alone, and outputs a processing result (2) containing the resulting processed speech (2) to the communication control unit 150-2. The preprocessing unit 112-2 may also perform simple speech recognition on the processed speech (2) and include the result in the processing result (2) output to the communication control unit 150-2. The communication control unit 150-2 causes the in-vehicle communication device 60 to transmit the processing result (2) to the server device 200-2.

The preprocessing unit 122-3 may be omitted; whether or not the preprocessing unit 122-3 is present, the speech picked up by the microphone 10 is input to the communication control unit 150-3. When the preprocessing unit 122-3 is present, it applies processing such as beamforming, noise cancellation, and equalizing to the input speech and outputs a processing result (3) containing the resulting processed speech (3) to the communication control unit 150-3. The preprocessing unit 122-3 may also perform simple speech recognition on the processed speech (3) and include the result in the processing result (3) output to the communication control unit 150-3. The communication control unit 150-3 instructs the general-purpose communication device 70 via the pairing application execution unit 152 to transmit at least the speech picked up by the microphone 10 to the server device 200-3. The communication control unit 150-3 may additionally transmit the processing result (3) to the server device 200-3. The server device 200-3 is an example of the "first server device".
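The data flow of FIG. 6 described above (unit 112-1 feeding unit 112-2, with the unprocessed microphone signal going to server device 200-3) can be sketched as below. The actual beamforming, noise cancellation, and equalizing are not specified in the source, so the two processing functions are deliberately trivial placeholders (DC-offset removal and peak normalization); these operations and all function names are assumptions made for illustration only.

```python
def preprocess_1(samples):
    """Placeholder for preprocessing unit 112-1: remove the DC offset.

    Stands in for the processing shared by servers 200-1 and 200-2.
    """
    if not samples:
        return []
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def preprocess_2(processed_1):
    """Placeholder for preprocessing unit 112-2: peak-normalize.

    Takes processed speech (1) as input and adds only the processing that
    unit 112-1 did not already perform.
    """
    peak = max((abs(s) for s in processed_1), default=0.0) or 1.0
    return [s / peak for s in processed_1]


def build_payloads(raw):
    """Return the speech to be sent to each server device."""
    processed_1 = preprocess_1(raw)          # processed speech (1)
    processed_2 = preprocess_2(processed_1)  # processed speech (2), built on (1)
    return {
        "200-1": processed_1,
        "200-2": processed_2,
        "200-3": list(raw),  # unprocessed microphone signal
    }


payloads = build_payloads([1.0, 2.0, 6.0])
```

Because `preprocess_2` consumes the output of `preprocess_1`, the shared step is implemented once, which is the structure the sequential arrangement relies on.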

As described above, the terminal device 100 of the embodiment includes two or more preprocessing units that each perform preprocessing suited to a respective one of two or more server devices 200 among the plurality of server devices 200 having a speech recognition function, and the communication control unit 150 transmits the speech preprocessed by each of the two or more preprocessing units to the corresponding server device 200. This makes it possible to transmit to each server device 200 a processing result that has been preprocessed according to the characteristics of that server device 200. For example, when one server device 200 has high noise tolerance and another server device 200 has low noise tolerance, data close to the speech picked up by the microphone 10 is transmitted to the former, and data to which strong noise cancellation has been applied is transmitted to the latter, so that data close to the required quality can be transmitted to both server devices 200. As a result, a plurality of mutually different server devices 200 each having a speech recognition function can be used effectively.
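The noise-tolerance example can be made concrete with a short sketch. The source does not state how noise cancellation strength is chosen or applied; here a simple amplitude gate stands in for noise cancellation, and the per-server strength values, the server labels, and the function names are hypothetical.

```python
# Assumed noise-cancellation strength per server: 0.0 leaves the microphone
# signal untouched, larger values suppress more low-level content.
CANCEL_STRENGTH = {
    "noise_tolerant_server": 0.0,
    "noise_sensitive_server": 0.5,
}


def noise_gate(samples, strength):
    """Zero out samples whose magnitude is below `strength` times the peak.

    A crude stand-in for noise cancellation, used only to show that the same
    input yields different data for different servers.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    threshold = strength * peak
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def prepare_for(server, samples):
    """Apply the preprocessing strength assumed for the given server."""
    return noise_gate(samples, CANCEL_STRENGTH[server])


microphone = [0.1, -0.8, 0.3, 1.0]
for_tolerant = prepare_for("noise_tolerant_server", microphone)
for_sensitive = prepare_for("noise_sensitive_server", microphone)
```

In this sketch the noise-tolerant server receives the microphone signal unchanged, while the noise-sensitive server receives a signal in which the low-amplitude samples have been suppressed.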

In the terminal device 100, the communication control unit 150-3 transmits, to the server device 200-3, at least voice that has not been preprocessed by the preprocessing unit 122-3. This makes it possible to transmit data suitable for speech recognition even to the server device 200-3, which requests voice that has not been preprocessed.

The terminal device 100 is also equipped with a first OS that realizes the preprocessing units performing preprocessing for the server devices 200 other than the server device 200-3, and a second OS for extracting voice that has not been preprocessed by a preprocessing unit, for transmission to the server device 200-3. This eliminates the burden of arbitration and similar coordination that would be required if the same software performed both kinds of processing.

In the terminal device 100, the preprocessing unit 112-1 and the preprocessing unit 112-2 perform processing sequentially (in series, one after the other). Consequently, when the two units share common processing, only the preprocessing unit 112-1 needs to implement the function for that common processing, which saves memory resources. In addition, because the processing is not performed in parallel, the concern that the two units might produce mutually exclusive processing results is also eliminated.
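The benefit of sequential processing can be sketched as below. The sketch assumes that the second unit takes the output of the first unit as its input, which is one plausible reading of "sequentially" and is not stated explicitly in the embodiment. Peak normalization as the common processing, clipping as the additional step, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def common_step(samples: Sequence[float]) -> List[float]:
    """Shared processing (here: peak normalization), implemented only once."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    return [s / peak for s in samples]


def preprocess_1(samples: Sequence[float]) -> List[float]:
    """First unit: owns the common processing."""
    return common_step(samples)


def preprocess_2(output_of_1: Sequence[float]) -> List[float]:
    """Second unit: reuses the first unit's output and adds only its own
    step (here: clipping to [-0.5, 0.5]), so it does not need its own
    copy of the common processing."""
    return [max(-0.5, min(0.5, s)) for s in output_of_1]


def run_sequentially(samples: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Run unit 1, then unit 2 on unit 1's result, strictly in series."""
    first = preprocess_1(samples)
    second = preprocess_2(first)
    return first, second
```

Because `preprocess_2` starts only after `preprocess_1` has finished, the two results are always derived from the same intermediate data, which mirrors the point that serial execution avoids conflicting results.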

According to the terminal device 100 of the embodiment described above, a plurality of mutually different server devices each having a speech recognition function can be utilized effectively.

Although a mode for carrying out the present invention has been described above using an embodiment, the present invention is in no way limited to this embodiment, and various modifications and substitutions can be made without departing from the gist of the present invention.

10 Microphone
20 Display/operation device
30 Speaker unit
60 In-vehicle communication device
70 General-purpose communication device
100 Terminal device
110 First management unit
112-1, 112-2, 122-3 Preprocessing units
116, 126 Display control units
118, 128 Voice control units
120 Second management unit
150-1, 150-2, 150-3 Communication control units
152 Pairing application execution unit
200-1, 200-2, 200-3 Server devices

Claims

A terminal device mounted on a vehicle, comprising:
two or more preprocessing units that perform, on voice in the vehicle interior picked up by a microphone, preprocessing corresponding to each of two or more server devices among a plurality of server devices having a speech recognition function; and
a communication control unit that uses a communication unit to transmit the voice preprocessed by each of the two or more preprocessing units to the corresponding server device,
wherein the communication control unit transmits, to a first server device among the plurality of server devices, at least voice that has not been preprocessed by a preprocessing unit, and
wherein the terminal device is equipped with a first OS that realizes, among the two or more preprocessing units, a preprocessing unit that performs preprocessing corresponding to a server device other than the first server device, and a second OS for extracting the voice that has not been preprocessed by a preprocessing unit.

A terminal device mounted on a vehicle, comprising:
two or more preprocessing units that perform, on voice in the vehicle interior picked up by a microphone, preprocessing corresponding to each of two or more server devices among a plurality of server devices having a speech recognition function; and
a communication control unit that uses a communication unit to transmit the voice preprocessed by each of the two or more preprocessing units to the corresponding server device,
wherein some or all of the two or more preprocessing units perform processing sequentially.

The terminal device according to claim 2, wherein the communication control unit transmits, to a first server device among the plurality of server devices, at least voice that has not been preprocessed by a preprocessing unit.

The terminal device according to any one of claims 1 to 3, comprising a plurality of the communication control units, each corresponding to one of the plurality of server devices.

A control method for a terminal device mounted on a vehicle, the method comprising:
performing, by each of two or more preprocessing units provided in the terminal device, on voice in the vehicle interior picked up by a microphone, preprocessing corresponding to each of two or more server devices among a plurality of server devices having a speech recognition function;
transmitting, using a communication unit, the voice preprocessed by each of the two or more preprocessing units to the corresponding server device; and
transmitting, to a first server device among the plurality of server devices, at least voice that has not been preprocessed by a preprocessing unit,
wherein the terminal device is equipped with a first OS that realizes, among the two or more preprocessing units, a preprocessing unit that performs preprocessing corresponding to a server device other than the first server device, and a second OS for extracting the voice that has not been preprocessed by a preprocessing unit.

A control method for a terminal device mounted on a vehicle, the method comprising:
performing, by each of two or more preprocessing units provided in the terminal device, on voice in the vehicle interior picked up by a microphone, preprocessing corresponding to each of two or more server devices among a plurality of server devices having a speech recognition function; and
transmitting, using a communication unit, the voice preprocessed by each of the two or more preprocessing units to the corresponding server device,
wherein some or all of the two or more preprocessing units perform processing sequentially.

A program executed by a terminal device mounted on a vehicle, the program causing the terminal device to:
perform, using each of two or more preprocessing units provided in the terminal device, on voice in the vehicle interior picked up by a microphone, preprocessing corresponding to each of two or more server devices among a plurality of server devices having a speech recognition function;
transmit, using a communication unit, the voice preprocessed by each of the two or more preprocessing units to the corresponding server device; and
transmit, to a first server device among the plurality of server devices, at least voice that has not been preprocessed by a preprocessing unit,
wherein the terminal device is equipped with a first OS that realizes, among the two or more preprocessing units, a preprocessing unit that performs preprocessing corresponding to a server device other than the first server device, and a second OS for extracting the voice that has not been preprocessed by a preprocessing unit.

A program executed by a terminal device mounted on a vehicle, the program causing the terminal device to:
perform, using each of two or more preprocessing units provided in the terminal device, on voice in the vehicle interior picked up by a microphone, preprocessing corresponding to each of two or more server devices among a plurality of server devices having a speech recognition function; and
cause some or all of the two or more preprocessing units to perform processing sequentially.