JP7178630B2

JP7178630B2 - VOICE OPERATION METHOD, PROGRAM, VOICE OPERATION SYSTEM, AND MOVING OBJECT

Info

Publication number: JP7178630B2
Application number: JP2018224255A
Authority: JP
Inventors: 知浩小沼; 伸一芳澤; 直樹田中
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2018-11-29
Filing date: 2018-11-29
Publication date: 2022-11-28
Anticipated expiration: 2038-11-29
Also published as: JP2020086320A

Description

TECHNICAL FIELD
The present disclosure relates to a voice operation method, a program, a voice operation system, and a moving body. In particular, it relates to a voice operation method, a program, a voice operation system, and a moving body for operating the moving body by voice.

Patent Literature 1 discloses a vehicle device. This device communicates wirelessly with a portable device. In response to an instruction signal from the portable device, it controls a vehicle movement device to adjust the parking position of the vehicle.

Patent Literature 1: JP 2016-7959 A

Patent Literature 1 also discloses that the portable device may have a microphone, so that the user can instruct the parking-position adjustment by voice. For example, the user may say "a little further forward" (もう少し前に) into the portable device, and the vehicle device may then move the vehicle forward by 10 cm.

However, Patent Literature 1 merely associates the adverbial phrase "a little further forward" with the control "move the vehicle forward by 10 cm". The moving body therefore cannot be operated by natural speech.

An object of the present disclosure is to provide a voice operation method, a program, a voice operation system, and a moving body that allow the moving body to be operated by natural speech.

A voice operation method according to one aspect of the present disclosure is executed by a computer. It includes a conversion step, an extraction step, an interpretation step, and a generation step.

- The conversion step converts voice data for operating a moving body into character data.
- The extraction step extracts an ambiguous part from the character string represented by the character data.
- The interpretation step interprets the ambiguous part using moving-body information. This information includes at least one of an operation history of the moving body and a state of the moving body.
- The generation step generates an operation command for the moving body from the character data, based on the result of interpreting the ambiguous part.

The interpretation step uses a trained model to interpret the ambiguous part extracted in the extraction step. The model has learned the relationship between the moving-body information and the interpretation result of the ambiguous part.
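The four steps can be sketched as a small pipeline. This is a minimal illustration, not the patented implementation. Each stage is injected as a callable, and the trained model is hidden behind `interpret`. The names `OperationCommand`, `voice_operation`, and the stub stages in the usage lines are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class OperationCommand:
    parameter: str   # hypothetical naming, e.g. "speed"
    delta: float     # signed amount of change to apply


def voice_operation(
    audio: bytes,
    convert: Callable[[bytes], str],
    extract: Callable[[str], Optional[str]],
    interpret: Callable[[str, dict], float],
    generate: Callable[[str, Optional[float]], OperationCommand],
    moving_body_info: dict,
) -> OperationCommand:
    text = convert(audio)        # conversion step: voice data -> character data
    ambiguous = extract(text)    # extraction step: ambiguous part, or None if absent
    interpretation = None
    if ambiguous is not None:
        # interpretation step: resolve the ambiguous part using moving-body info
        interpretation = interpret(ambiguous, moving_body_info)
    return generate(text, interpretation)  # generation step: operation command


# Usage with stub stages (all hypothetical):
cmd = voice_operation(
    b"<pcm>",
    convert=lambda _audio: "速度をもう少し上げて",
    extract=lambda t: "もう少し" if "もう少し" in t else None,
    interpret=lambda part, info: info["small_step"],
    generate=lambda t, x: OperationCommand("speed", x if x is not None else 0.0),
    moving_body_info={"small_step": 5.0},
)
```

Injecting the stages keeps the control flow identical whether speech recognition and interpretation run on board or remotely.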

A program according to one aspect of the present disclosure causes one or more processors to execute the voice operation method described above.

A voice operation system according to one aspect of the present disclosure includes a conversion unit, an extraction unit, an interpretation unit, and a generation unit.

- The conversion unit converts voice data for operating a moving body into character data.
- The extraction unit extracts an ambiguous part from the character string represented by the character data.
- The interpretation unit interprets the ambiguous part using moving-body information, which includes at least one of an operation history of the moving body and a state of the moving body.
- The generation unit generates an operation command for the moving body from the character data, based on the result of interpreting the ambiguous part.

The interpretation unit uses a trained model to interpret the ambiguous part extracted by the extraction unit. The model has learned the relationship between the moving-body information and the interpretation result of the ambiguous part.

A moving body according to one aspect of the present disclosure includes the voice operation system described above and a main body on which the voice operation system is mounted.

The aspects of the present disclosure allow a moving body to be operated by natural speech.

FIG. 1 is an explanatory diagram of a moving body equipped with a voice operation system according to an embodiment.
FIG. 2 is a block diagram of the voice operation system of the embodiment.
FIG. 3 is a flowchart of the operation of the voice operation system of the embodiment.

1. Embodiment
1.1 Overview
FIG. 1 shows a moving body 100. The moving body 100 includes a voice operation system 10 that executes the voice operation method of one embodiment. The method includes a conversion step, an extraction step, an interpretation step, and a generation step.

- The conversion step converts voice data for operating the moving body 100 into character data.
- The extraction step extracts an ambiguous part from the character string represented by the character data.
- The interpretation step interprets the ambiguous part using moving-body information 141, which includes at least one of the operation history and the state of the moving body 100.
- The generation step generates an operation command for the moving body 100 from the character data, based on the result of interpreting the ambiguous part.

In this method, the ambiguous part of the character string is interpreted using moving-body information that includes at least one of the operation history and the state of the moving body 100. An operation command for the moving body 100 is then generated from the character data based on that interpretation. As a result, even when the character string contains an ambiguous part, the command given to the moving body 100 reflects what the ambiguous part means. The user therefore does not have to avoid ambiguous expressions deliberately when operating the moving body 100. The voice operation method thus allows the moving body 100 to be operated by natural speech.

1.2 Details
The voice operation method of this embodiment is described in more detail below. The method is executed by the voice operation system 10 shown in FIG. 1. The voice operation system 10 lets the user operate the moving body 100 by voice, and is also known as a voice user interface (VUI) or a spoken dialogue system. In this embodiment, the moving body 100 is an automobile.

In this embodiment, operating the moving body 100 covers two things: operating the moving body 100 itself, and operating devices or systems associated with it. Operating the moving body 100 itself is assumed to mean driving it.

Associated devices or systems include the various devices or systems mounted on the main body 101 of the moving body 100. Examples are an air conditioning system, a car navigation system, an audio system, a seat adjustment (reclining) system, and a power window system. An associated device or system need not be mounted on the main body 101. It may instead be any device or system that can communicate with the moving body 100, such as a smartphone, a tablet terminal, or a wearable terminal.

As shown in FIG. 1, the voice operation system 10 is mounted on the main body 101 of the moving body 100. The main body 101 also carries a voice input system 20, an information acquisition system 30, a travel control device 40, an air conditioner 50, a car navigation system 60, and an audio system 70.

As shown in FIG. 2, the voice operation system 10 includes an input/output unit 11, a processing unit 12, a communication unit 13, and a storage unit 14. The voice operation system 10 may be implemented by a computer system. Such a system may include one or more processors, one or more connectors, one or more communication devices, and one or more memories.

The input/output unit 11 exchanges information with systems mounted on the main body 101 of the moving body 100. In this embodiment, it is communicably connected to the voice input system 20, the information acquisition system 30, the travel control device 40, the air conditioner 50, the car navigation system 60, and the audio system 70. The input/output unit 11 includes one or more input/output devices and uses one or more input/output interfaces.

The voice input system 20 inputs voice data for operating the moving body 100. It includes one or more microphones, collects voice with them, and outputs voice data representing the collected voice. The input/output unit 11 acquires this voice data from the voice input system 20.

The information acquisition system 30 acquires the state of the moving body 100, in particular its internal state. The acquired state includes the dynamic internal state of the moving body 100. This dynamic internal state may include a state related to movement of the moving body 100, a state related to a person riding in it, or both.

States related to movement include the current speed of the moving body 100, its steering angle, and its remaining energy (gasoline or electricity). They also include the current temperature around the moving body 100, the surrounding noise level, and the surrounding situation, such as whether objects are present. A state related to a person riding in the moving body 100 is, for example, that person's current condition.

The information acquisition system 30 may include various sensors, such as an acceleration sensor, a camera, an ultrasonic sensor, a microphone, a temperature sensor, a pressure sensor, a millimeter-wave (terahertz-wave) sensor, and a breath sensor. It may also include an input/output interface that acquires information on the moving body 100 from other devices such as the travel control device 40. Examples of such information are the steering angle and the remaining energy (gasoline or electricity).

The travel control device (travel control system) 40 controls the travel of the moving body 100. It may adjust the speed and the steering angle of the moving body 100 and switch between forward and reverse. The travel control device 40 can therefore slow down or stop the moving body 100 without any operation by the driver.

The air conditioner (air conditioning system) 50 adjusts the temperature inside the moving body 100. The car navigation system 60 provides guidance to a destination and similar functions. It may include an image display device, such as a liquid crystal display, an organic EL display, or a 3D head-up display (HUD), and an audio output device. The audio system 70 reproduces speech, music, and other audio, and may include one or more speakers. The functions of the air conditioner 50, the car navigation system 60, and the audio system 70 are well known, so a detailed description is omitted.

The communication unit 13 exchanges information with systems that are not mounted on the main body 101 of the moving body 100. In this embodiment, it is communicably connected to an external system 80. The communication unit 13 includes one or more communication devices and communicates over a communication network.

The network may be a single network conforming to one communication protocol, or several networks conforming to different protocols. The protocols may be selected from various well-known wired and wireless communication standards. Communication between the communication unit 13 and the external system 80 is, however, preferably performed by mobile communication. The network may include data communication equipment such as repeater hubs, switching hubs, bridges, gateways, and routers as needed.

The external system 80 acquires the state of the moving body 100, in particular its external state. The external state may include at least one of the following:

- a state related to the position of the moving body 100, such as its position information;
- the state of its surroundings, such as traffic information (for example congestion information), noise level, weather information, and time information;
- the state of systems that cooperate with it, such as information from a moving body other than the moving body 100, or information from a service provider concerning the moving body 100.

For example, when the other moving body is a vehicle ahead of the moving body 100, its information may report a detected object that might be collided with, or the occurrence of sudden braking. Information from a service provider may include the average speed on the road on which the moving body 100 is traveling.

The external system 80 may include one or more information providing systems, such as the Global Positioning System (GPS) and a road traffic information and communication system.

The storage unit 14 stores information used by the processing unit 12. It includes one or more storage devices, such as a RAM (Random Access Memory) or an EEPROM (Electrically Erasable Programmable Read-Only Memory).

The storage unit 14 stores moving-body information 141. This information may include the operation history of the moving body 100, its state, or both. In this embodiment, it includes both.

The operation history is the history of operations of the moving body 100. It may include operations performed through the voice operation system 10 and manual operations performed by the user. As described above, operating the moving body 100 covers both the moving body 100 itself and the devices or systems associated with it. In this embodiment, the operations are therefore:

- operations of the travel control device 40 (operations of the moving body 100 itself);
- operations of the air conditioner 50;
- operations of the car navigation system 60;
- operations of the audio system 70.

The state of the moving body 100 may include its internal state, its external state, or both.

The internal state may include at least one of a state related to movement of the moving body 100, a state related to its performance, and a state related to a person riding in it. The movement-related and occupant-related states can be acquired from the information acquisition system 30. The performance-related state consists of the specifications of the moving body 100, such as the dimensions of the main body 101 and the fuel efficiency.

The external state may include at least one of a state related to the position of the moving body, the state of its surroundings, and the state of an external system that cooperates with it. The external state can be acquired from the external system 80.

Static states of the moving body 100 may be stored in the storage unit 14 in advance or acquired from the external system 80 as needed.
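One way to hold moving-body information 141 in memory is shown below. This is a hypothetical sketch: the class and field names are assumptions, and the fields are a small sample of the states listed above. It separates the operation history from the internal and external state. It also records whether each operation came from the voice operation system 10 or from manual input, because the history may contain both.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class OperationRecord:
    parameter: str   # hypothetical naming, e.g. "audio.volume"
    delta: float     # signed change that was applied
    by_voice: bool   # True: via voice operation system 10; False: manual operation


@dataclass
class InternalState:
    speed_kmh: Optional[float] = None       # movement-related (from system 30)
    occupant_state: Optional[str] = None    # occupant-related, e.g. "drowsy"
    body_length_m: Optional[float] = None   # static specification, may be pre-stored


@dataclass
class ExternalState:
    position: Optional[Tuple[float, float]] = None  # e.g. (lat, lon) from GPS
    traffic: Optional[str] = None                    # e.g. congestion information
    road_avg_speed_kmh: Optional[float] = None       # from a service provider


@dataclass
class MovingBodyInfo:
    history: List[OperationRecord] = field(default_factory=list)
    internal: InternalState = field(default_factory=InternalState)
    external: ExternalState = field(default_factory=ExternalState)

    def last_operation(self) -> Optional[OperationRecord]:
        """Most recent operation, or None if nothing has been recorded yet."""
        return self.history[-1] if self.history else None
```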

The processing unit 12 may be implemented by, for example, one or more processors (microprocessors). The processors function as the processing unit 12 by executing one or more programs (computer programs) stored in one or more memories. The programs may be prerecorded in the memories. They may also be provided over a telecommunication line such as the Internet, or on a non-transitory recording medium such as a memory card.

As shown in FIG. 2, the processing unit 12 has an acquisition unit 121, a conversion unit 122, an extraction unit 123, an interpretation unit 124, and a generation unit 125. These units are not physical components; they represent functions realized by the processing unit 12.

The acquisition unit 121 acquires voice data for operating the moving body 100. In other words, the processing unit 12 executes an acquisition step of acquiring such voice data. The acquisition unit 121 acquires the voice data from the voice input system 20 through the input/output unit 11.

In this embodiment, the acquisition unit 121 acquires voice data only from the voice of a specific person. The specific person may be someone registered in the voice operation system 10 in advance, for example the driver of the moving body 100.

The acquisition unit 121 uses speaker recognition technology to decide, from the voice data received from the voice input system 20, whether the speaker is the specific person. If the speaker is the specific person, the acquisition unit 121 acquires the voice data; otherwise, it discards the data. Only voice data from the specific person therefore reaches the conversion unit 122. This reduces the risk that an unintended person operates the moving body 100 through the voice operation system 10. Well-known speaker recognition techniques can be used, so a detailed description is omitted.
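The speaker check can be sketched as a gate in front of the conversion unit. The source does not name a speaker recognition technique. This sketch assumes that some speaker-recognition model has already turned each utterance and the enrolled driver into fixed-length embedding vectors, and that a cosine-similarity threshold decides acceptance. The embedding source and the 0.8 threshold are illustrative assumptions.

```python
import math
from typing import Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def accept_if_registered(
    audio: bytes,
    embedding: Sequence[float],  # embedding of this utterance (assumed given)
    enrolled: Sequence[float],   # embedding of the registered person (assumed given)
    threshold: float = 0.8,      # hypothetical acceptance threshold
) -> Optional[bytes]:
    """Return the voice data if the speaker matches the registered person, else None (discard)."""
    if cosine(embedding, enrolled) >= threshold:
        return audio
    return None
```

Returning `None` mirrors the "discard" branch: rejected audio never reaches the conversion unit.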

The conversion unit 122 converts voice data for operating the moving body 100 into character data. In other words, the processing unit 12 executes the conversion step. The character data represents a character string (text) in a specific language, which in this embodiment is Japanese.

The conversion unit 122 converts the voice data from the acquisition unit 121 into character data using speech recognition technology. Well-known techniques can be used, so a detailed description is omitted.

In this embodiment, the conversion unit 122 starts passing character data to the extraction unit 123 when the data contains a specific character string, a so-called wake-up phrase. The wake-up phrase can be chosen freely. It should, however, not be confusable with everyday expressions or with phrases used to operate the moving body 100.

Until an end condition is satisfied, the conversion unit 122 may keep passing character data to the extraction unit 123 even if the data does not contain the wake-up phrase. The user then does not have to say the wake-up phrase every time. The end condition may be that no voice data arrives from the acquisition unit 121 for a predetermined period. This period should be long enough for the user to say the wake-up phrase and then say the desired operation of the moving body 100.
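The wake-up behavior can be sketched as a small state machine. The wake-up phrase opens a session, and each later input keeps it open. Once no input has arrived for `timeout_s` seconds, the session closes. This is a minimal sketch under stated assumptions:

- The phrase "ねえクルマ" is a hypothetical placeholder.
- Timestamps are passed in explicitly so the logic is testable.
- The wake-up phrase is stripped before the text is forwarded. This is a design choice; the source does not specify it.

```python
from typing import Optional


class WakePhraseGate:
    """Forward recognized text to the extraction stage only while a session is open."""

    def __init__(self, wake_phrase: str, timeout_s: float):
        self.wake_phrase = wake_phrase
        self.timeout_s = timeout_s
        self._last_input: Optional[float] = None  # time of last input in an open session

    def feed(self, text: str, now: float) -> Optional[str]:
        """Return text to forward to the extraction unit, or None to drop it."""
        session_open = (
            self._last_input is not None and now - self._last_input <= self.timeout_s
        )
        if self.wake_phrase in text:
            self._last_input = now  # wake-up phrase opens (or refreshes) the session
            rest = text.replace(self.wake_phrase, "", 1).strip()
            return rest or None     # phrase alone forwards nothing
        if session_open:
            self._last_input = now  # any input extends the session
            return text
        self._last_input = None     # end condition met: silence exceeded timeout
        return None


gate = WakePhraseGate("ねえクルマ", timeout_s=5.0)  # hypothetical phrase and period
```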

The extraction unit 123 extracts an ambiguous part from the character string represented by the character data. In other words, the processing unit 12 executes the extraction step. An ambiguous part is part or all of a character string whose meaning cannot be uniquely determined on its own. Here it is an adverbial part, meaning an adverb or an adverbial phrase.

Adverbs come in several types, such as manner and degree. This embodiment assumes adverbial parts that express degree. Degree adverbs (or adverbial phrases) are grouped into three degrees:

- large: "more" (もっと), "much" (ずっと), "considerably" (かなり);
- medium: "fairly" (そこそこ), "moderately" (ほどほど);
- small: "a little more" (もう少し), "a bit" (ちょっと).

Examples of strings and their ambiguous parts:

- "raise the speed a little more" (速度をもう少し上げて): "a little more" (もう少し);
- "lower the temperature a bit" (温度をちょっと下げて): "a bit" (ちょっと);
- "show the map further ahead" (もっと先の地図を出して): "more" (もっと);
- "turn the volume up considerably" (音量をかなり上げて): "considerably" (かなり).

The ambiguous part is not always only part of the string. The entire string is ambiguous when it does not form a complete sentence, for example when it lacks a predicate (verb) or a required object. The following strings are ambiguous in their entirety: "more" (もっと), "still" (まだ), "still more" (まだまだ), "much" (ずっと), "very" (とても), "considerably" (かなり), "a little more" (もう少し), and "just a bit more" (あとちょっと). So are "stop" (ストップ), "no" (だめ), and "quit it" (やめて).
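A dictionary lookup is one simple way to realize this extraction. The sketch below uses the degree adverbs and whole-string examples from the text. It treats a string as ambiguous in its entirety when the string is exactly one of the listed expressions. That is a crude stand-in for "does not form a sentence", and it is an assumption. A production system would more likely use morphological analysis of the Japanese text.

```python
from typing import Optional, Tuple

# Degree classes taken from the examples in the text.
DEGREE_LEXICON = {
    "もう少し": "small", "ちょっと": "small",
    "そこそこ": "medium", "ほどほど": "medium",
    "もっと": "large", "ずっと": "large", "かなり": "large",
}

# Other expressions the text lists as ambiguous when they form the whole string.
WHOLE_STRING_AMBIGUOUS = {"まだ", "まだまだ", "とても", "あとちょっと", "ストップ", "だめ", "やめて"}


def extract_ambiguous(text: str) -> Optional[Tuple[str, str]]:
    """Return (ambiguous_part, kind), where kind is 'whole' or 'adverb'; None if unambiguous."""
    s = text.strip()
    # Heuristic: a bare listed expression is taken as an incomplete sentence.
    if s in DEGREE_LEXICON or s in WHOLE_STRING_AMBIGUOUS:
        return s, "whole"
    # Prefer longer entries in case lexicon entries ever overlap.
    for word in sorted(DEGREE_LEXICON, key=len, reverse=True):
        if word in s:
            return word, "adverb"
    return None
```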

In this embodiment, the extraction unit 123 determines whether the character string contains an ambiguous part. If it does, the extraction unit 123 extracts the ambiguous part and passes it to the interpretation unit 124. If it does not, the extraction unit 123 passes the character string to the generation unit 125.

The interpretation unit 124 interprets the ambiguous part using mobile object information that includes at least one of the operation history of the mobile object 100 and the state of the mobile object 100. In other words, the processing unit 12 executes an interpretation step of interpreting the ambiguous part using such mobile object information.

The interpretation unit 124 interprets the ambiguous part extracted by the extraction unit 123. When interpreting an adverbial part, the interpretation unit 124 determines the amount of change or the target value of a parameter related to the operation of the mobile object 100 according to the degree the adverbial part expresses. Specifically, it classifies that degree (for example, into the three categories large, medium, and small) and reflects the classification in the parameter's amount of change or target value.

To do so, the interpretation unit 124 first determines which parameter to change in response to the voice data, using at least one of the part of the character string other than the ambiguous part and the mobile object information 141. In this embodiment, the interpretation unit 124 determines whether the part of the string other than the ambiguous part contains an object.

If there is an object, the interpretation unit 124 determines the parameter from the object. For example, if the string is "Raise the speed a little more" (速度をもう少し上げて), the ambiguous part is "a little more" (もう少し), and the remaining part contains the predicate (verb) "raise" (上げて) and the object "speed" (速度). In this case, the interpretation unit 124 uses the remaining part (the object "speed") to determine the parameter; that is, it selects the speed in the cruise control device 40 as the parameter to be changed in response to the voice data. The interpretation unit 124 may also decide from the verb whether to increase or decrease the parameter: since the verb here is "raise", it can decide to increase the parameter. Similarly, if the string is "Lower the temperature a little" (温度をちょっと下げて), the remaining part contains the object "temperature" (温度), and the interpretation unit 124 selects the temperature (set temperature) of the air conditioner 50 as the parameter to change. If the string is "Show the map further ahead" (もっと先の地図を表示して), the remaining part contains the object "map" (地図), and the interpretation unit 124 selects the distance from the current location of the map displayed by the car navigation system 60 as the parameter to change. If the string is "Turn the volume up considerably" (音量をかなり上げて), the remaining part contains the object "volume" (音量), and the interpretation unit 124 selects the volume of the audio system 70 as the parameter to change.
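Selecting the parameter from the object and the direction from the verb might look like the minimal sketch below. The table `OBJECT_TO_PARAMETER`, its device and parameter labels, and `VERB_DIRECTION` are hypothetical names covering only the examples above; substring tests stand in for real parsing.

```python
# Hypothetical mapping: object noun -> (target device, parameter to change).
OBJECT_TO_PARAMETER = {
    "速度": ("cruise_control_40", "speed"),
    "温度": ("air_conditioner_50", "set_temperature"),
    "地図": ("car_navigation_60", "map_distance_from_current_location"),
    "音量": ("audio_system_70", "volume"),
}
# Hypothetical mapping: verb -> +1 (increase) or -1 (decrease).
VERB_DIRECTION = {"上げて": +1, "下げて": -1}


def select_parameter(text: str, ambiguous_part: str):
    """Return ((device, parameter) or None, direction or None)."""
    # Work only on the part of the string other than the ambiguous part.
    rest = text.replace(ambiguous_part, "", 1)
    target = next(
        (param for noun, param in OBJECT_TO_PARAMETER.items() if noun in rest),
        None,
    )
    direction = next(
        (sign for verb, sign in VERB_DIRECTION.items() if verb in rest),
        None,
    )
    return target, direction
```

When `target` is None (no object, as with a bare もう少し), the caller would fall back to the mobile object information, as the next paragraph describes.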

If there is no object, the interpretation unit 124 uses the mobile object information 141 to determine the parameter to change in response to the voice data, for example by referring to the operation history of the mobile object 100 contained in that information. For example, the string "a little more" (もう少し) consists of an adverb only and contains no object. In this case, the interpretation unit 124 determines the parameter from the operation history in the mobile object information 141; that is, it determines from the operation history of the mobile object 100 which word or phrase the ambiguous part modifies. For example, if the operation history shows that the driver increased the speed immediately before the voice data was input, the interpretation unit 124 selects the speed in the cruise control device 40 as the parameter to change. The interpretation unit 124 may also decide from the operation history whether to increase or decrease the parameter: since the history shows that the speed was increased, it can decide to increase the parameter. Likewise, if the driver raised the temperature (set temperature) of the air conditioner 50 immediately before the voice input, the interpretation unit 124 selects the set temperature of the air conditioner 50 as the parameter to change. If the driver enlarged the screen of the car navigation system 60 immediately before the voice input, it selects the screen size of the car navigation system 60. If the driver turned up the volume of the audio system 70 immediately before the voice input, it selects the volume of the audio system 70.
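The operation-history fallback could be sketched as follows. The `Operation` record, the 10-second `window_s` standing in for "immediately before", and the choice of the single most recent operation are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    parameter: str    # e.g. "speed", "set_temperature", "volume"
    delta: float      # signed change the driver applied manually
    timestamp: float  # seconds


def parameter_from_history(history, utterance_time, window_s=10.0):
    """Return (parameter, direction) of the last manual operation within
    window_s seconds before the utterance, or None if there is none."""
    recent = [
        op for op in history if 0.0 <= utterance_time - op.timestamp <= window_s
    ]
    if not recent:
        return None
    last = max(recent, key=lambda op: op.timestamp)
    # Direction follows the driver's last change: +1, -1, or 0 for no change.
    direction = (last.delta > 0) - (last.delta < 0)
    return last.parameter, direction
```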

In this way, the interpretation unit 124 determines the parameter to change in response to the voice data using at least one of the part of the string other than the ambiguous part and the mobile object information 141. At that point, however, the string alone does not reveal how much the parameter should be changed. The interpretation unit 124 therefore determines the amount of change or the target value of the parameter related to the operation of the mobile object 100 according to the degree expressed by the adverbial part.

In this embodiment, the interpretation unit 124 interprets the ambiguous part according to one or more preset rules. Table 1 below shows an example of a rule for the speed parameter. The rule in Table 1 determines the amount of change in speed from the degree of the adverbial part and the current speed; that is, the amount of change depends on the current value of the parameter being changed.

Following this rule, the interpretation unit 124 obtains the current speed from the mobile object information and determines the amount of change in speed according to the degree of the adverbial part and the current speed. For example, if the ambiguous part is "a little more" (もう少し), the degree of the adverbial part is "small". If the current speed is 90 km/h, the interpretation unit 124 interprets the degree of "a little more" as 5 km/h for the speed parameter; if the current speed is 15 km/h, it interprets it as 1 km/h.
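A rule of the kind described for Table 1 can be represented as speed bands mapped to per-degree changes. Table 1 itself is not reproduced in this text, so this sketch encodes only the two documented points for the "small" degree (1 km/h at 15 km/h, 5 km/h at 90 km/h); the 60 km/h band boundary and the name `SPEED_RULE` are hypothetical.

```python
# Only the two "small"-degree examples from the text are encoded here;
# the 60 km/h boundary is a hypothetical placeholder.
SPEED_RULE = [
    # (lower bound of current speed in km/h, {degree: change in km/h})
    (0.0, {"small": 1.0}),
    (60.0, {"small": 5.0}),
]


def speed_change(degree: str, current_speed_kmh: float, rule=SPEED_RULE) -> float:
    """Return the speed change for a degree adverb at the current speed.

    `rule` must be sorted by ascending lower bound; the last band whose
    lower bound is <= the current speed applies.
    """
    applicable = None
    for lower_bound, changes in rule:
        if current_speed_kmh >= lower_bound:
            applicable = changes
    if applicable is None or degree not in applicable:
        raise KeyError(f"no rule for degree {degree!r} at {current_speed_kmh} km/h")
    return applicable[degree]
```

For example, `speed_change("small", 90)` returns 5.0, matching the worked example above; degrees not covered by the encoded entries raise `KeyError` rather than guessing.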

Table 2 below shows an example of a rule for the parameter "distance from the current location of the map displayed on the screen of the car navigation system 60". The rule in Table 2 determines a target value, here the distance of the displayed map from the current location, from the degree of the adverbial part and the current speed. In this example, the current value of the speed is used to determine the target value of the distance parameter. That is, when interpreting an ambiguous part, the amount of change or target value of one parameter may be determined by referring to another parameter.

Following this rule, the interpretation unit 124 obtains the current speed from the mobile object information and determines the target distance of the displayed map from the current location according to the degree of the adverbial part and the current speed. For example, when the ambiguous part is "more" (もっと), the degree of the adverbial part is "small". If the current speed is 90 km/h, the interpretation unit 124 interprets the degree of "more" as 2 km for the parameter "distance of the displayed map from the current location"; if the current speed is 15 km/h, it interprets it as 300 m.
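The Table 2 rule differs in that the current value of one parameter (speed) selects the target value of another (map distance). A minimal sketch, again encoding only the two documented points for the "small" degree (300 m at 15 km/h, 2 km at 90 km/h) with a hypothetical 60 km/h band boundary:

```python
import bisect

# Lower bounds of the current-speed bands in km/h (60 km/h split is hypothetical).
SPEED_BAND_BOUNDS_KMH = [0.0, 60.0]
# Target distance of the displayed map from the current location, in metres,
# one entry per speed band. Only the "small"-degree examples are encoded.
MAP_DISTANCE_TARGET_M = {"small": [300.0, 2000.0]}


def map_distance_target(degree: str, current_speed_kmh: float) -> float:
    """Look up the map-distance target using a different parameter (speed)."""
    band = bisect.bisect_right(SPEED_BAND_BOUNDS_KMH, current_speed_kmh) - 1
    band = max(band, 0)  # treat negative speed readings as the lowest band
    return MAP_DISTANCE_TARGET_M[degree][band]
```

The result is a target value rather than a change: the map would be redrawn at that distance ahead regardless of its current offset.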

If the ambiguous part is not an adverbial part expressing degree, the interpretation unit 124 determines whether it has a negative connotation, for example by checking whether the ambiguous part extracted by the extraction unit 123 matches one of the ambiguous parts known to be negative, such as "stop" (ストップ), "no" (だめ), and "quit it" (やめて). If the ambiguous part has a negative connotation, the interpretation unit 124 passes an interpretation result stating so to the generation unit 125.

As described above, the interpretation unit 124 interprets the ambiguous part extracted by the extraction unit 123 and passes the result of the interpretation to the generation unit 125.

The generation unit 125 generates an operation command for the mobile object 100 from the character data. That is, the generation unit 125 executes a generation step of generating an operation command for the mobile object 100 from the character data.

When the character data contains no ambiguous part, the generation unit 125 generates the operation command for the mobile object 100 directly from the character data. For example, for the string "Raise the speed by 5 km" (速度を５キロ上げて), the generation unit 125 determines the parameter from the object of the string and decides from the predicate (verb) whether to increase or decrease it; since the verb here is "raise", it decides to increase the parameter. It then determines the amount of change from the string: based on the part "5 km" (５キロ), it sets the change in the speed parameter to 5 km/h. As a result, the generation unit 125 generates an operation command instructing a 5 km/h increase in speed and outputs it to the cruise control device 40 via the input/output unit 11. Processing of character data without ambiguous parts can be implemented with conventionally known techniques and is therefore not described in detail.
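The text leaves the unambiguous case to conventional techniques; purely as an illustration of the object, amount, and verb decomposition, a toy regular-expression parser might look like this. The pattern and the mapping are hypothetical and handle only strings of the form <object>を<number><unit><verb>. In Python 3, `\d` in a `str` pattern and `int()` both accept full-width digits such as ５.

```python
import re

# Hypothetical pattern for "<object>を<number><unit><verb>".
PATTERN = re.compile(
    r"(?P<obj>速度|温度|音量)を(?P<num>\d+)(?P<unit>キロ|度)?(?P<verb>上げて|下げて)"
)
OBJECT_TO_PARAMETER = {"速度": "speed", "温度": "set_temperature", "音量": "volume"}


def parse_explicit(text: str):
    """Return (parameter, signed change) for an unambiguous command, else None."""
    m = PATTERN.search(text)
    if m is None:
        return None
    sign = 1 if m.group("verb") == "上げて" else -1
    amount = int(m.group("num"))  # works for "5" and full-width "５"
    return OBJECT_TO_PARAMETER[m.group("obj")], sign * amount
```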

When the character data contains an ambiguous part, the generation unit 125 generates the operation command for the mobile object 100 from the character data based on the interpretation result of the interpretation unit 124. If the ambiguous part is an adverbial part expressing degree, the generation unit 125 obtains from that result the parameter to change in response to the voice data and the parameter's target value or amount of change, and generates the operation command from the character data. For example, suppose the string is "Raise the speed a little more" (速度をもう少し上げて) and the interpretation unit 124 has interpreted the degree of the ambiguous (adverbial) part "a little more" as 5 km/h for the speed parameter. Based on this interpretation, the generation unit 125 generates, from the character data representing that string, an operation command instructing a 5 km/h increase in speed, and outputs it to the cruise control device 40 via the input/output unit 11. Thus, even when the character string contains an ambiguous part, the voice operation system 10 allows the mobile object 100 to be operated by voice.

Conversely, when the interpretation unit 124 has interpreted the ambiguous part as having a negative connotation, the generation unit 125 generates an operation command that cancels an operation of the mobile object 100, specifically the immediately preceding operation, for example the operation performed by the most recently generated command. For example, when the driver of the mobile object 100 says "stop" (ストップ), the extraction unit 123 extracts "stop" as the ambiguous part from the character data representing that string, and the interpretation unit 124 interprets it as having a negative connotation. If the generation unit 125 has just generated a command instructing a 5 km/h increase in speed, it now generates a command that cancels that operation and outputs it to the cruise control device 40 via the input/output unit 11. The voice operation system 10 can thus handle the case where the driver has made an erroneous operation and wants to cancel it immediately.
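Generating a command from an interpretation result, and cancelling the preceding command on a negative utterance, could be sketched as follows. Representing a command as a signed change and implementing "cancel" as the inverse change are assumptions; the text only says that a command cancelling the immediately preceding operation is generated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Command:
    device: str     # e.g. "cruise_control_40" (hypothetical label)
    parameter: str  # e.g. "speed"
    delta: float    # signed change to apply


class CommandGenerator:
    def __init__(self) -> None:
        self._last: Optional[Command] = None  # most recently generated command

    def from_interpretation(self, device, parameter, direction, amount) -> Command:
        """Build a command from (parameter, direction, amount) and remember it."""
        cmd = Command(device, parameter, direction * amount)
        self._last = cmd
        return cmd

    def cancel_last(self) -> Optional[Command]:
        """Negative utterance: undo the previous command by its inverse change."""
        if self._last is None:
            return None
        undo = Command(self._last.device, self._last.parameter, -self._last.delta)
        self._last = None  # a second "stop" has nothing further to cancel
        return undo
```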

1.3 Operation
The operation of the voice operation system 10 is briefly described below with reference to the flowchart of FIG. 3. First, when voice is input to the voice input system 20 (S11: Yes), the acquisition unit 121 of the voice operation system 10 acquires voice data from the voice input system 20 through the input/output unit 11 (S12). In this embodiment, the acquisition unit 121 acquires voice data only when the voice is that of a specific person; voice data of anyone else is discarded. The conversion unit 122 converts the voice data acquired by the acquisition unit 121 into character data (S13). The extraction unit 123 then determines whether the character string represented by the character data contains an ambiguous part (S14). If there is no ambiguous part (S14: No), the generation unit 125 generates an operation command from the character data (S15) and outputs it to the corresponding system (S16). If there is an ambiguous part (S14: Yes), the interpretation unit 124 interprets it (S17). The generation unit 125 then generates an operation command from the character data based on the interpretation result (S18) and outputs it to the corresponding system (S16).
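The flowchart steps S12 to S18 can be summarized as a single dispatch function. In this sketch every processing stage is passed in as a callable, so the parameter names and signatures are hypothetical placeholders rather than the system's actual interfaces; S11 (detecting that voice was input) is assumed to happen in the caller.

```python
from typing import Callable, Optional


def handle_utterance(
    audio: bytes,
    is_specific_person: Callable[[bytes], bool],
    speech_to_text: Callable[[bytes], str],
    extract: Callable[[str], Optional[str]],
    interpret: Callable[[str, str], dict],
    generate_plain: Callable[[str], dict],
    generate_from_interpretation: Callable[[str, dict], dict],
    output: Callable[[dict], None],
) -> Optional[dict]:
    # S12: acquire voice data only from the specific person; discard others.
    if not is_specific_person(audio):
        return None
    text = speech_to_text(audio)                  # S13: voice -> characters
    ambiguous = extract(text)                     # S14: ambiguous part?
    if ambiguous is None:
        command = generate_plain(text)            # S15 (S14: No)
    else:
        result = interpret(text, ambiguous)       # S17 (S14: Yes)
        command = generate_from_interpretation(text, result)  # S18
    output(command)                               # S16: send to the device
    return command
```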

1.4 Summary
The voice operation system 10 described above includes the conversion unit 122, the extraction unit 123, the interpretation unit 124, and the generation unit 125. The conversion unit 122 converts voice data for operating the mobile object 100 into character data. The extraction unit 123 extracts an ambiguous part from the character string represented by the character data. The interpretation unit 124 interprets the ambiguous part using the mobile object information 141, which includes at least one of the operation history of the mobile object 100 and the state of the mobile object 100. The generation unit 125 generates an operation command for the mobile object 100 from the character data based on the interpretation of the ambiguous part. The voice operation system 10 thus allows the mobile object 100 to be operated by natural speech.

In other words, the voice operation system 10 can be said to execute the following method (voice operation method). The voice operation method includes a conversion step, an extraction step, an interpretation step, and a generation step. The conversion step converts voice data for operating the mobile object 100 into character data. The extraction step extracts an ambiguous part from the character string represented by the character data. The interpretation step interprets the ambiguous part using the mobile object information 141, which includes at least one of the operation history of the mobile object 100 and the state of the mobile object 100. The generation step generates an operation command for the mobile object 100 from the character data based on the interpretation of the ambiguous part. Like the voice operation system 10, this voice operation method allows the mobile object 100 to be operated by natural speech.

The voice operation method is implemented by one or more processors executing a program (computer program) that causes the one or more processors to execute the voice operation method. Like the voice operation method itself, this program allows the mobile object 100 to be operated by natural speech.

2. Modifications
Embodiments of the present disclosure are not limited to the embodiment described above, which can be modified in various ways according to design and other factors as long as the object of the present disclosure is achieved. Modifications of the above embodiment are listed below.

In one modification, the input/output unit 11, the communication unit 13, and the storage unit 14 are not essential. The voice operation system 10 may be configured so that the processing unit 12 exchanges information with the voice input system 20, the information acquisition system 30, the cruise control device 40, the air conditioner 50, the car navigation system 60, the audio system 70, and the external system 80.

In one modification, the acquisition unit 121 need not restrict voice data acquisition to the voice of a specific person. Also, the conversion unit 122 may output the character data to the extraction unit 123 even if a specific character string is not found in the character string represented by the character data. Alternatively, the acquisition unit 121 may acquire voice data and pass it to the conversion unit 122 when a specific operation is performed, such as pressing a button that signals the start of speech, whereupon the conversion unit 122 outputs character data to the extraction unit 123. This enables push-to-talk (PTT) style voice input.

In one modification, the rules used by the interpretation unit 124 may be functions, that is, functions that give the target value or amount of change of a parameter from the degree of the adverbial part and numerical values taken from the mobile object information 141. This increases the freedom in setting the target values or amounts of change of parameters.
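As one hypothetical example of a function-based rule, piecewise-linear interpolation through the two documented "small"-degree points (1 km/h at 15 km/h, 5 km/h at 90 km/h) yields a speed change for any current speed, clamped outside that range. Interpolation is chosen only for illustration; the text does not specify the form of the function.

```python
# The two documented "small"-degree examples: (current speed km/h, change km/h).
SMALL_DEGREE_POINTS = ((15.0, 1.0), (90.0, 5.0))


def piecewise_linear(x: float, points) -> float:
    """Interpolate linearly between (x, y) points sorted by x; clamp at the ends."""
    if x <= points[0][0]:
        return points[0][1]
    if x >= points[-1][0]:
        return points[-1][1]
    for (xa, ya), (xb, yb) in zip(points, points[1:]):
        if xa <= x <= xb:
            return ya + (yb - ya) * (x - xa) / (xb - xa)
    raise ValueError("points must be sorted by x")


def speed_change_small(current_speed_kmh: float) -> float:
    """Speed change for a "small"-degree adverb such as もう少し."""
    return piecewise_linear(current_speed_kmh, SMALL_DEGREE_POINTS)
```

Unlike a banded table, this gives a smooth response: at 52.5 km/h, halfway between the two points, the change is 3 km/h.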

In one modification, the interpretation unit 124 may use a trained model. For example, the interpretation unit 124 may interpret the ambiguous part extracted by the extraction unit 123 using a trained model that has learned the relationship between the mobile object information 141 and the interpretation results of ambiguous parts. In other words, the interpretation step may interpret the ambiguous part extracted in the extraction step using such a trained model.

In this case, the storage unit 14 stores an interpretation model used by the interpretation unit 124. The interpretation model is generated by having an artificial intelligence program (algorithm) learn the relationship between the mobile object information 141 and the interpretation results of ambiguous parts from a training data set that represents that relationship. The artificial intelligence program is a machine learning model, for example a neural network, which is a type of hierarchical model. The interpretation model is generated by having the neural network perform machine learning (for example, deep learning) on the training data set. The interpretation model is therefore a trained model that has learned the relationship between the mobile object information 141 and the interpretation results of ambiguous parts. It may be generated by the processing unit 12 of the voice operation system 10 or by the external system 80.

The interpretation unit 124 uses the interpretation model stored in the storage unit 14 to interpret the ambiguous part of character data whose string contains one. On receiving such character data, the interpretation unit 124 inputs the character data and the mobile object information 141 into the interpretation model and has it output the interpretation of the ambiguous part. Once the interpretation result is obtained, the interpretation unit 124 passes it to the generation unit 125.

In the voice operation system 10, the interpretation unit 124 may also collect and accumulate training data for generating the interpretation model. Such newly collected data can be used to retrain the interpretation model (trained model) and thereby improve its performance. In particular, if an ambiguous part with a negative connotation is received right after an adverbial part of degree has been interpreted, that interpretation was probably wrong. Using such misinterpretations of degree adverbs as training data can therefore improve the performance of the interpretation model.
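Collecting training data with this kind of negative feedback might be sketched as below. The `Sample` and `TrainingLog` structures and the rule "mark the most recent degree interpretation as wrong when a negative utterance follows it" are a hypothetical reading of the paragraph above, not a specified data format.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    features: dict        # mobile object information used for the interpretation
    ambiguous_part: str   # e.g. "もう少し"
    output: float         # interpreted change or target value
    label_ok: bool = True # set to False when the driver immediately objects


@dataclass
class TrainingLog:
    samples: list = field(default_factory=list)

    def record_interpretation(self, features, ambiguous_part, output):
        """Store one degree-adverb interpretation as a candidate training sample."""
        self.samples.append(Sample(features, ambiguous_part, output))

    def record_negative_feedback(self):
        """A negative utterance right after an interpretation suggests it was wrong."""
        if self.samples:
            self.samples[-1].label_ok = False
```

Samples flagged `label_ok=False` could then be used as negative examples when the interpretation model is retrained.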

In the above embodiment, when the parameter to be changed by voice is the speed or the distance of the displayed map from the current location, the current speed is used as the mobile object information. However, the mobile object information 141 is not limited to the current speed; any of the operation history of the mobile object 100 and the information on the mobile object 100 (internal and external information) described above may be used. For example, to interpret ambiguous parts related to the cruise control device 40, the current speed of the mobile object 100, its steering angle, its performance, and the average speed on the road it is traveling can be used. For the air conditioner 50, the ambient temperature (current temperature) around the mobile object 100, the current state of the people riding in it, its position information, weather information, and time information can be used. For the car navigation system 60, the current speed of the mobile object 100, its steering angle, its remaining energy (gasoline or electricity), its position information, weather information, and time information can be used. For the audio system 70, the noise level around the mobile object 100, the current state of the people riding in it, weather information, and time information can be used.

In the above embodiment, the voice operation system 10 is mounted on the main body 101 of the mobile object 100, but it may instead be mounted on a device capable of communicating with the mobile object 100. That is, the voice operation system 10 can also be used for remote operation of the mobile object 100. Furthermore, the mobile object 100 operated with the voice operation system 10 is not limited to automobiles; the system is also applicable to other mobile objects such as two-wheeled vehicles, trains, drones, aircraft, construction machinery, and ships.

In one variation, the voice operation system 10 may be composed of a plurality of computers. For example, the functions of the voice operation system 10 (in particular, the acquisition unit 121, the conversion unit 122, the extraction unit 123, the interpretation unit 124, and the generation unit 125) may be distributed among a plurality of devices. Furthermore, at least some of the functions of the voice operation system 10 may be implemented by, for example, the cloud (cloud computing).

The entity that executes the voice operation system 10 described above includes a computer system. The computer system has a processor and a memory as hardware. The processor executes a program recorded in the memory of the computer system, which implements the functions of the voice operation system 10 of the present disclosure as the executing entity.

The program may be provided in any of the following ways:
- recorded in advance in the memory of the computer system;
- provided through a telecommunication line; or
- recorded on a non-transitory recording medium readable by the computer system, such as a memory card, an optical disc, or a hard disk drive.

The processor of the computer system consists of one or more electronic circuits including a semiconductor integrated circuit (IC) or a large-scale integrated circuit (LSI). The following devices can also be used for the same purpose:
- a field-programmable gate array (FPGA) programmed after the LSI is manufactured;
- an application-specific integrated circuit (ASIC); or
- a reconfigurable logic device in which the connection relationships inside the LSI can be reconfigured, or the circuit partitions inside the LSI can be set up.

The plurality of electronic circuits may be integrated on one chip or distributed over a plurality of chips. The plurality of chips may be integrated in one device or distributed over a plurality of devices.

3. Aspects
As is clear from the above embodiment and variations, the present disclosure includes the following aspects. In the following, reference signs are given in parentheses solely to clarify the correspondence with the embodiment.

A first aspect is a voice operation method including a conversion step, an extraction step, an interpretation step, and a generation step. The conversion step converts voice data for operating a mobile object (100) into character data. The extraction step extracts an ambiguous part from the character string represented by the character data. The interpretation step interprets the ambiguous part using mobile object information including at least one of an operation history of the mobile object (100) and a state of the mobile object (100). The generation step generates an operation command for the mobile object (100) from the character data based on the result of interpreting the ambiguous part. According to this aspect, the mobile object (100) can be operated by natural speech.
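The four steps of the first aspect can be chained as plain functions. This is a minimal sketch under several stated assumptions:
- the conversion step is a stand-in for real speech recognition and simply decodes bytes that already hold text;
- the ambiguous-term lexicon is a hypothetical English example;
- scaling the degree by the current speed is an illustrative interpretation rule, not the disclosed one.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lexicon of ambiguous degree expressions -> relative degree.
AMBIGUOUS_TERMS = {"a little": 0.2, "much": 0.6}


@dataclass
class OperationCommand:
    parameter: str
    target_value: float


def convert(voice_data: bytes) -> str:
    """Conversion step. Stand-in for speech recognition: the voice data is
    assumed to already hold UTF-8 text."""
    return voice_data.decode("utf-8")


def extract(text: str) -> Optional[str]:
    """Extraction step: return the first ambiguous term found, if any."""
    for term in AMBIGUOUS_TERMS:
        if term in text:
            return term
    return None


def interpret(term: Optional[str], mobile_info: dict) -> float:
    """Interpretation step: scale the degree by the current speed (assumption)."""
    return AMBIGUOUS_TERMS.get(term, 0.0) * mobile_info["speed"]


def generate(text: str, change: float, mobile_info: dict) -> OperationCommand:
    """Generation step: build a speed command from the text and the interpretation."""
    direction = -1.0 if "slower" in text else 1.0
    return OperationCommand("speed", mobile_info["speed"] + direction * change)


def voice_operation(voice_data: bytes, mobile_info: dict) -> OperationCommand:
    text = convert(voice_data)
    term = extract(text)
    change = interpret(term, mobile_info)
    return generate(text, change, mobile_info)
```

With `{"speed": 50.0}` as the mobile object information, "go a little faster" yields a target speed of about 60.0. The same words spoken at a different speed would yield a different target, which is the point of using the mobile object's state.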

The second aspect is based on the voice operation method of the first aspect. In the second aspect, the ambiguous part is an adverbial part that is an adverb or an adverbial phrase. According to this aspect, the mobile object (100) can be operated with more natural speech.

A third aspect is based on the voice operation method of the second aspect. In the third aspect, the adverbial part expresses a degree. According to this aspect, the mobile object (100) can be operated with more natural speech.
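Extracting an adverbial part that expresses a degree can be sketched with a small lexicon and word-boundary matching. This is a minimal example only. The English phrases and their degree scores are hypothetical, and a production system would more likely rely on a morphological analyzer or parser than on a fixed list.

```python
import re
from typing import Optional, Tuple

# Hypothetical lexicon of English degree adverbs / adverbial phrases and a
# relative degree for each (0 = no change, 1 = maximum change).
DEGREE_ADVERBS = {
    "a little bit": 0.1,
    "a little": 0.2,
    "somewhat": 0.3,
    "much": 0.6,
    "a lot": 0.8,
}


def extract_degree_adverb(text: str) -> Optional[Tuple[str, float]]:
    """Return (phrase, degree) for the longest degree expression in text.

    Matching is on word boundaries, so "a lot" does not match "a lottery".
    The longest match wins, so "a little bit" is preferred over "a little".
    """
    lowered = text.lower()
    matches = [
        phrase for phrase in DEGREE_ADVERBS
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
    ]
    if not matches:
        return None
    phrase = max(matches, key=len)
    return phrase, DEGREE_ADVERBS[phrase]
```

Preferring the longest match keeps nested phrases such as "a little bit" from being read as the weaker "a little".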

A fourth aspect is based on the voice operation method of the third aspect. In the fourth aspect, in interpreting the adverbial part, an amount of change or a target value of a parameter related to the operation of the mobile object (100) is determined according to the degree expressed by the adverbial part. According to this aspect, the mobile object (100) can be operated with more natural speech.
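The fourth aspect allows the degree to be turned into either an amount of change or a target value. A minimal sketch of both follows. It assumes a hypothetical `ParameterSpec` that states the parameter's range and the change corresponding to the maximum degree; these calibration values are illustrative, not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParameterSpec:
    """Hypothetical description of an operable parameter."""
    name: str
    minimum: float
    maximum: float
    full_step: float  # change applied for degree 1.0 (assumed calibration)


def amount_of_change(degree: float, spec: ParameterSpec) -> float:
    """Amount of change implied by a degree, clamped to [0, 1] first."""
    degree = max(0.0, min(1.0, degree))
    return degree * spec.full_step


def target_value(current: float, degree: float, direction: int, spec: ParameterSpec) -> float:
    """Target value: current value moved by the amount of change, kept in range."""
    target = current + direction * amount_of_change(degree, spec)
    return max(spec.minimum, min(spec.maximum, target))
```

For example, with a cabin-temperature spec spanning 16 to 30 and a `full_step` of 5.0, "a little cooler" (degree 0.2) from 25.0 gives about 24.0. A large increase from 29.0 is clamped at 30.0.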

A fifth aspect is based on the voice operation method of the fourth aspect. In the fifth aspect, the interpretation step determines the parameter to be changed in response to the voice data, using at least one of the portion of the character string other than the ambiguous part and the mobile object information. According to this aspect, the mobile object (100) can be operated with more natural speech.
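Deciding the parameter from the rest of the character string, with a fallback to mobile object information, might look like the sketch below. The keyword table and the `"last_operated"` entry in the mobile object information are hypothetical assumptions for illustration.

```python
import re
from typing import Optional, Tuple

# Hypothetical keyword table: word in the non-ambiguous remainder of the
# character string -> (parameter to change, direction of change).
KEYWORDS = {
    "faster": ("speed", +1), "slower": ("speed", -1),
    "warmer": ("cabin_temperature", +1), "cooler": ("cabin_temperature", -1),
    "louder": ("volume", +1), "quieter": ("volume", -1),
}


def decide_parameter(text: str, ambiguous: Optional[str],
                     mobile_info: dict) -> Tuple[Optional[str], Optional[int]]:
    """Decide which parameter the utterance should change.

    First look at the remainder of the text with the ambiguous part removed.
    If it names no parameter, fall back to the mobile object information
    (here, an assumed "last_operated" entry), with the direction left unknown.
    """
    remainder = text.lower()
    if ambiguous:
        remainder = remainder.replace(ambiguous.lower(), " ")
    for word in re.findall(r"[a-z]+", remainder):
        if word in KEYWORDS:
            return KEYWORDS[word]
    return mobile_info.get("last_operated"), None
```

"a little cooler" resolves to the cabin temperature from the text alone. "a little more" carries no parameter word, so the result depends entirely on the mobile object information.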

A sixth aspect is based on the voice operation method of any one of the first to fifth aspects. In the sixth aspect, the interpretation step interprets the ambiguous part extracted in the extraction step using a trained model that has learned the relationship between mobile object information and results of interpreting ambiguous parts. According to this aspect, the accuracy of interpreting the ambiguous part improves.
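The disclosure does not specify the type of trained model. As a stand-in, this sketch fits a one-feature least-squares line that maps a piece of mobile object information (current speed) to an interpretation result (the speed change meant by "a little faster"). The training pairs would come from past operations and are hypothetical here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LinearInterpreter:
    """Toy stand-in for a trained model: change = slope * feature + intercept."""
    slope: float = 0.0
    intercept: float = 0.0

    def fit(self, features: Sequence[float], amounts: Sequence[float]) -> "LinearInterpreter":
        """Ordinary least squares on one feature (e.g. current speed)."""
        n = len(features)
        mean_x = sum(features) / n
        mean_y = sum(amounts) / n
        sxx = sum((x - mean_x) ** 2 for x in features)
        if sxx == 0.0:
            raise ValueError("features must not all be equal")
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, amounts))
        self.slope = sxy / sxx
        self.intercept = mean_y - self.slope * mean_x
        return self

    def predict(self, feature: float) -> float:
        """Interpretation result (amount of change) for a new feature value."""
        return self.slope * feature + self.intercept
```

Trained on speeds `[20, 40, 60, 80]` paired with accepted changes `[4, 8, 12, 16]`, the model predicts a change of about 10 at speed 50. Any richer regressor or classifier could replace this class behind the same `fit`/`predict` shape.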

A seventh aspect is based on the voice operation method of any one of the first to fifth aspects. In the seventh aspect, the interpretation step interprets the ambiguous part according to one or more preset rules. According to this aspect, the accuracy of interpreting the ambiguous part improves.
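Preset rules can be held in a table keyed by parameter and ambiguous term, with each rule reading the mobile object information. This is a minimal sketch. The rule values (10% of current speed with a 2 km/h floor, 1 °C steps, and so on) are hypothetical examples, not values from the disclosure.

```python
from typing import Callable, Dict, Tuple

Rule = Callable[[dict], float]

# Hypothetical preset rules keyed by (parameter, ambiguous term); each returns
# an amount of change computed from the mobile object information.
RULES: Dict[Tuple[str, str], Rule] = {
    ("speed", "a little"): lambda info: max(2.0, 0.1 * info["speed"]),
    ("speed", "much"): lambda info: max(5.0, 0.3 * info["speed"]),
    ("cabin_temperature", "a little"): lambda info: 1.0,
    ("cabin_temperature", "much"): lambda info: 3.0,
}


def interpret_by_rules(parameter: str, term: str, info: dict) -> float:
    """Apply the preset rule for (parameter, term); fail loudly if none exists."""
    rule = RULES.get((parameter, term))
    if rule is None:
        raise KeyError(f"no rule for {parameter!r} / {term!r}")
    return rule(info)
```

Unlike a trained model, a rule table is easy to audit. On the other hand, every combination of parameter and term has to be written out by hand.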

An eighth aspect is based on the voice operation method of any one of the first to seventh aspects. In the eighth aspect, when the interpretation step interprets the ambiguous part as having a negative connotation, the generation step generates an operation command for stopping the operation of the mobile object (100). According to this aspect, it becomes easier to interrupt the motion of the mobile object (100).
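Replacing a planned command with a stop command when the ambiguous part reads as negative can be sketched as follows. The cue-word list and the dictionary shape of a command are hypothetical assumptions.

```python
from typing import Optional

# Hypothetical cue words that give an ambiguous part a negative connotation.
NEGATIVE_CUES = frozenset({"not", "no", "stop", "wait", "enough"})


def is_negative(ambiguous: Optional[str]) -> bool:
    """True if any word of the ambiguous part is a negative cue."""
    if not ambiguous:
        return False
    words = ambiguous.lower().replace(",", " ").split()
    return any(word in NEGATIVE_CUES for word in words)


def generate_command(ambiguous: Optional[str], planned: dict) -> dict:
    """Return a stop command instead of the planned one when the ambiguous part
    is read as negative; otherwise pass the planned command through."""
    if is_negative(ambiguous):
        return {"type": "stop", "target": planned.get("target")}
    return planned
```

For example, "hmm, not quite" turns a planned speed change into a stop command for the same target.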

A ninth aspect is based on the voice operation method of any one of the first to eighth aspects. In the ninth aspect, the interpretation step determines, based on the operation history of the mobile object (100), the phrase that the ambiguous part modifies. According to this aspect, the accuracy of interpreting the ambiguous part improves.
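One way to use the operation history to decide which phrase an ambiguous part modifies is to prefer the candidate that was operated most recently. This sketch assumes that heuristic and a simple list-of-names history. Neither is specified by the disclosure, and the last-candidate fallback is also an assumption.

```python
from typing import List, Optional


def resolve_modified_phrase(candidates: List[str], history: List[str]) -> Optional[str]:
    """Pick the candidate phrase the ambiguous part most likely modifies.

    Assumption: the most recently operated candidate in the history wins.
    If none appears in the history, fall back to the last candidate in the
    utterance, i.e. the one nearest to the ambiguous part.
    """
    for operated in reversed(history):
        if operated in candidates:
            return operated
    return candidates[-1] if candidates else None
```

Suppose someone says "lower the temperature and the volume a little" right after adjusting the temperature. This heuristic then reads "a little" as applying to the temperature.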

A tenth aspect is based on the voice operation method of any one of the first to ninth aspects. The tenth aspect further includes, before the conversion step, an acquisition step of acquiring the voice data from the voice of a specific person. According to this aspect, the possibility that the mobile object (100) is operated by an unintended person can be reduced.
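Acquiring voice data only from a specific person is commonly done by speaker verification. This sketch compares a speaker embedding of the incoming voice with an enrolled embedding by cosine similarity. It rests on two assumptions: the embeddings come from some speaker-recognition model that is not shown, and the 0.8 threshold is arbitrary.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def acquire(voice_data: bytes, embedding: Sequence[float],
            enrolled: Sequence[float], threshold: float = 0.8) -> Optional[bytes]:
    """Acquisition step: pass voice_data on only if its speaker embedding is
    close enough to the enrolled person's embedding; otherwise drop it."""
    if cosine_similarity(embedding, enrolled) >= threshold:
        return voice_data
    return None
```

Returning `None` for other speakers means the conversion step never sees their voice data.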

An eleventh aspect is based on the voice operation method of any one of the first to tenth aspects. In the eleventh aspect, the state of the mobile object (100) includes at least one of an internal state of the mobile object (100) and an external state of the mobile object (100). According to this aspect, the accuracy of interpreting the ambiguous part improves.

A twelfth aspect is based on the voice operation method of the eleventh aspect. In the twelfth aspect, the internal state includes at least one of a state related to movement of the mobile object (100), a state related to performance of the mobile object (100), and a state related to a person on board the mobile object (100). The external state includes at least one of a state related to the position of the mobile object (100), a state of the surroundings of the mobile object (100), and a state of a system that cooperates with the mobile object (100). According to this aspect, the accuracy of interpreting the ambiguous part improves.
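The grouping of mobile object information in the eleventh and twelfth aspects maps naturally onto nested records. This is a minimal sketch. The individual field names and units are hypothetical examples of each category, and `available_items` is an illustrative helper.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional, Tuple


@dataclass
class InternalState:
    speed_kmh: Optional[float] = None             # movement of the mobile object
    steering_angle_deg: Optional[float] = None    # movement of the mobile object
    remaining_energy_pct: Optional[float] = None  # performance of the mobile object
    occupant_state: Optional[str] = None          # person on board (e.g. "drowsy")


@dataclass
class ExternalState:
    position: Optional[Tuple[float, float]] = None  # position (latitude, longitude)
    ambient_temperature_c: Optional[float] = None   # surroundings
    ambient_noise_db: Optional[float] = None        # surroundings
    linked_system_status: Optional[str] = None      # cooperating system (e.g. a traffic service)


@dataclass
class MobileObjectInfo:
    operation_history: List[str] = field(default_factory=list)
    internal: InternalState = field(default_factory=InternalState)
    external: ExternalState = field(default_factory=ExternalState)

    def available_items(self) -> List[str]:
        """Names of the state items that currently hold a value."""
        names = []
        for group_name in ("internal", "external"):
            group = getattr(self, group_name)
            for f in fields(group):
                if getattr(group, f.name) is not None:
                    names.append(f"{group_name}.{f.name}")
        return names
```

Leaving unknown items as `None` lets an interpreter check which kinds of information it can actually rely on for a given utterance.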

A thirteenth aspect is a program (computer program) for causing one or more processors to execute the voice operation method of any one of the first to twelfth aspects. According to this aspect, the mobile object (100) can be operated by natural speech.

A fourteenth aspect is a voice operation system (10) including a conversion unit (122), an extraction unit (123), an interpretation unit (124), and a generation unit (125). The conversion unit (122) converts voice data for operating a mobile object (100) into character data. The extraction unit (123) extracts an ambiguous part from the character string represented by the character data. The interpretation unit (124) interprets the ambiguous part using mobile object information including at least one of an operation history of the mobile object (100) and a state of the mobile object (100). The generation unit (125) generates an operation command for the mobile object (100) from the character data based on the result of interpreting the ambiguous part. According to this aspect, the mobile object (100) can be operated by natural speech.
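Architecturally, the system of the fourteenth aspect can be seen as a composition of four units. In this sketch each unit is an injected callable, so any unit could run locally or be swapped for a remote (for example, cloud-hosted) implementation, as in the distributed variation. The class and attribute names are hypothetical, and the callable signatures are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VoiceOperationSystem:
    """Composes the four units; each one is an injected callable."""
    conversion_unit: Callable[[bytes], str]
    extraction_unit: Callable[[str], Optional[str]]
    interpretation_unit: Callable[[Optional[str], dict], float]
    generation_unit: Callable[[str, float], dict]

    def handle(self, voice_data: bytes, mobile_info: dict) -> dict:
        """Run conversion, extraction, interpretation, and generation in order."""
        text = self.conversion_unit(voice_data)
        ambiguous = self.extraction_unit(text)
        interpretation = self.interpretation_unit(ambiguous, mobile_info)
        return self.generation_unit(text, interpretation)
```

Usage with trivial stand-in units:

```python
system = VoiceOperationSystem(
    conversion_unit=lambda b: b.decode("utf-8"),
    extraction_unit=lambda t: "a little" if "a little" in t else None,
    interpretation_unit=lambda a, info: 1.0 if a else 0.0,
    generation_unit=lambda t, x: {"text": t, "change": x},
)
system.handle(b"a little warmer", {})
```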

A fifteenth aspect is a mobile object (100) including the voice operation system (10) of the fourteenth aspect and a main body (101) on which the voice operation system (10) is mounted. According to this aspect, the mobile object (100) can be operated by natural speech.

10 voice operation system
122 conversion unit
123 extraction unit
124 interpretation unit
125 generation unit
100 mobile object
101 main body

Claims

A computer-implemented voice operation method comprising:
a conversion step of converting voice data for operating a mobile object into character data;
an extraction step of extracting an ambiguous part from a character string represented by the character data;
an interpretation step of interpreting the ambiguous part using mobile object information including at least one of an operation history of the mobile object and a state of the mobile object; and
a generation step of generating an operation command for the mobile object from the character data based on a result of interpreting the ambiguous part,
wherein
the interpretation step interprets the ambiguous part extracted in the extraction step using a trained model that has learned a relationship between the mobile object information and results of interpreting the ambiguous part.

the ambiguous part is an adverbial part that is an adverb or an adverbial phrase;
2. The voice operation method according to claim 1.

The adverbial part expresses a degree,
3. The voice operation method according to claim 2.

in interpreting the adverbial part, an amount of change or a target value of a parameter related to the operation of the mobile object is determined according to the degree expressed by the adverbial part;
4. The voice operation method according to claim 3.

The interpretation step determines a parameter to be changed in response to the voice data, using at least one of a part of the character string excluding the ambiguous part and the mobile object information,
5. The voice operation method according to claim 4.

The interpretation step interprets the ambiguous part according to one or more preset rules.
The voice operation method according to any one of claims 1-5.

If the interpretation step interprets the ambiguous part as having a negative connotation, the generation step generates an operation command for stopping the operation of the mobile object,
The voice operation method according to any one of claims 1 to 6.

The interpretation step determines, based on the operation history of the mobile object, the phrase that the ambiguous part modifies,
The voice operation method according to any one of claims 1-7.

further comprising, before the conversion step, an acquisition step of acquiring the voice data from the voice of a specific person,
The voice operation method according to any one of claims 1-8.

The state of the mobile object includes at least one of an internal state of the mobile object and an external state of the mobile object,
The voice operation method according to any one of claims 1-9.

The internal state includes at least one of a state related to movement of the mobile object, a state related to performance of the mobile object, and a state related to a person on board the mobile object,
The external state includes at least one of a state related to the position of the mobile object, a state of the surroundings of the mobile object, and a state of a system that cooperates with the mobile object,
11. The voice operation method of claim 10.

A program for causing one or more processors to execute the voice operation method according to any one of claims 1 to 11.

A voice operation system comprising:
a conversion unit that converts voice data for operating a mobile object into character data;
an extraction unit that extracts an ambiguous part from a character string represented by the character data;
an interpretation unit that interprets the ambiguous part using mobile object information including at least one of an operation history of the mobile object and a state of the mobile object; and
a generation unit that generates an operation command for the mobile object from the character data based on a result of interpreting the ambiguous part,
wherein
the interpretation unit interprets the ambiguous part extracted by the extraction unit using a trained model that has learned a relationship between the mobile object information and results of interpreting the ambiguous part.

A mobile object comprising:
the voice operation system according to claim 13; and
a main body on which the voice operation system is mounted.