JP2024067341A

JP2024067341A - Information presentation method and information presentation device for vehicle

Info

Publication number: JP2024067341A
Application number: JP2022177333A
Authority: JP
Inventors: 乘西山; Nori Nishiyama
Original assignee: Nissan Motor Co Ltd
Current assignee: Nissan Motor Co Ltd
Priority date: 2022-11-04
Filing date: 2022-11-04
Publication date: 2024-05-17

Abstract

To provide an information presentation method and an information presentation device for a vehicle that prevent a driver who is talking with another person from missing a guidance voice. SOLUTION: When a driver of a vehicle is talking with another person, a guidance voice that resembles the voice of the other person and presents information to the driver is generated, and the generated guidance voice is output from the speech position of the other person. SELECTED DRAWING: Figure 1

Description

The present invention relates to an information presentation method and an information presentation device for a vehicle.

A navigation method is known in which acquired voice data is recorded in a voice file, voice recognition is performed on the voice data to extract expressions used for route guidance, and the extracted expressions are recorded in association with the voice file, the travel route, and the position of the host vehicle at the time the voice was acquired. When the vehicle later travels the same route as the recorded travel route and its position is detected to be within a predetermined range of the recorded position, the voice file corresponding to that recorded position is output (Patent Document 1).

Patent Document 1: International Publication No. 2013/051072

Even when various sounds are produced at the same time, humans can unconsciously pick out and hear the information that is necessary or important to them. This is known as the cocktail party effect.

In the above conventional technology, when the vehicle travels the same route as the recorded travel route, the recorded voice file is output, within a predetermined range of the recorded vehicle position, as a guidance voice that presents information to the driver. If the driver is talking with another person when this guidance voice is output, the cocktail party effect causes the driver to selectively hear only the voice of the other person, so the driver misses the guidance voice.

The problem to be solved by the present invention is to provide an information presentation method and an information presentation device for a vehicle that can keep a driver who is talking with another person from missing a guidance voice.

The present invention solves the above problem as follows: when the driver of a vehicle is talking with another person, a guidance voice that resembles the voice of the other person and presents information to the driver is generated, and the generated guidance voice is output from the speech position of the other person.

According to the present invention, a driver who is talking with another person can be kept from missing a guidance voice.

FIG. 1 is a block diagram showing an example of an information presentation system including the information presentation device of the present invention.
FIG. 2 is a cross-sectional view, perpendicular to the height direction of the vehicle, showing an example of the arrangement in the vehicle cabin of the in-vehicle camera, microphones, and speakers shown in FIG. 1.
Flowcharts (parts 1 to 4) showing an example of a processing procedure in the information presentation system of FIG. 1.

An embodiment of the present invention is described below with reference to the drawings.

[Configuration of information presentation system]
FIG. 1 is a block diagram showing an information presentation system 1 according to the present invention. The information presentation system 1 is, for example, an in-vehicle system and, as shown in FIG. 1, includes an in-vehicle camera 11, a microphone 12, a communication device 13, a navigation device 14, a speaker 15, and an information presentation device 16. The devices constituting the information presentation system 1 are connected by a CAN (Controller Area Network) or another in-vehicle LAN and can exchange information with one another.

The in-vehicle camera 11 is a camera installed inside the vehicle. Examples include a camera with an imaging element such as a CCD or CMOS sensor, an ultrasonic camera, and an infrared camera. The in-vehicle camera 11 photographs objects inside the vehicle, primarily the occupants (specifically, their faces). It is installed at a position from which the occupants' faces can be photographed so that each occupant can be identified. It may also be installed at a position from which the number of occupants and their seating positions can be determined from the captured images. If a single in-vehicle camera 11 cannot photograph all occupants, multiple in-vehicle cameras 11 may be installed.

The microphone 12 is a device that acquires sound inside the vehicle as audio data. Examples include a stand microphone, a close-talking microphone, and a shotgun microphone. The microphone 12 may be directional or omnidirectional and may use either wired or wireless communication. The sound acquired by the microphone 12 is the sound inside the vehicle cabin, including voices of the occupants and sound output from the communication device 13 and the speaker 15. The microphone 12 can be installed at any suitable position from which sound inside the cabin can be detected, and the number of microphones 12 is chosen according to their detection range and the size of the cabin.

The communication device 13 is a device with which the driver talks to a person outside the vehicle. It can communicate over a network with a communication device 13a held by that person. In this embodiment, the communication device 13a is not part of the information presentation system 1. The communication device 13 is, for example, a mobile terminal such as a smartphone, and a call can be made using the in-vehicle microphone 12 and speaker 15 instead of the microphone and speaker built into the communication device 13.

The navigation device 14 is a device that refers to map information (not shown) and calculates a travel route from the current position of the vehicle to a destination set by an occupant. The current position of the vehicle is acquired from, for example, a positioning system using GPS (Global Positioning System). The navigation device 14 searches for a travel route from the current position to the destination using, for example, road information and facility information in high-definition map information (HD map). The travel route includes information on the roads the vehicle will travel, the travel lanes, and the travel direction of the vehicle, and is displayed, for example, as a line.

As shown in FIG. 1, the image data acquired by the in-vehicle camera 11, the audio data acquired by the microphone 12, the audio data output from the communication device 13, and the guidance information for the travel route calculated by the navigation device 14 are output from the respective devices as needed and acquired by the information presentation device 16.

The speaker 15 is a device that converts electrical signal energy into acoustic energy and emits it into space; it is also called a loudspeaker. As shown in FIG. 1, the speaker 15 converts the guidance voice signal output from the information presentation device 16 into sound and emits (outputs) it into the vehicle cabin. The speaker 15 can be installed at any suitable position from which sound can reach the occupants, and the number of speakers 15 is chosen according to their coverage and the size of the cabin.

FIG. 2 shows an example of the arrangement of the in-vehicle camera 11, the microphones 12, and the speakers 15 in the vehicle cabin. FIG. 2 is a cross-sectional view perpendicular to the height direction of the vehicle V. In the vehicle V shown in FIG. 2, a driver's seat A1 is provided at the right front, a front passenger seat A2 at the left front, and rear seats A3 and A4 at the rear. A driver B1 sits in the driver's seat A1, an occupant B2 in the front passenger seat A2, an occupant B3 in the rear seat A3, and an occupant B4 in the rear seat A4. The occupants B2 to B4 are passengers riding in the same vehicle V as the driver B1.

In the vehicle V shown in FIG. 2, the in-vehicle camera 11 is installed between the driver's seat A1 and the front passenger seat A2 to photograph the driver B1 and the occupants B2 to B4. Specifically, the in-vehicle camera 11 is attached to the rearview mirror at the top of the windshield. A microphone 12a is installed in front of the driver's seat A1 to acquire the voice of the driver B1, and a microphone 12b is installed in front of the front passenger seat A2 to acquire the voice of the occupant B2. Specifically, the microphone 12a is installed on the steering wheel or the instrument panel of the vehicle V, and the microphone 12b is installed on the glove box. A microphone 12c is installed on the roof of the vehicle V near the center of the cabin to acquire the voices of the occupants B3 and B4. Further, a speaker 15a is installed to the right front of the driver's seat A1 to deliver sound to the driver B1, and a speaker 15b is installed to the left front of the front passenger seat A2 to deliver sound to the occupant B2. To deliver sound to the occupants B3 and B4, a speaker 15c is installed to the right rear of the rear seat A3 and a speaker 15d to the left rear of the rear seat A4.

Returning to FIG. 1, the information presentation device 16 is a device that controls the devices constituting the information presentation system 1, makes them work together, and presents information to the occupants of the vehicle. The information presentation device 16 is, for example, a computer and includes a CPU (Central Processing Unit) as a processor, a ROM (Read Only Memory) in which programs are stored, and a RAM (Random Access Memory) that functions as an accessible storage device. The CPU is an operating circuit that executes the programs stored in the ROM to realize the functions of the information presentation device 16.

The information presentation device 16 has an information presentation function for presenting information to the occupants of the vehicle. A program that realizes this function is stored in the ROM of the information presentation device 16, and the function is realized when the CPU executes that program. For convenience, FIG. 1 shows an identification unit 2, an acquisition unit 3, a model generation unit 4, a voice generation unit 5, and an output unit 6 as the functional blocks that realize the information presentation function. The function of each block shown in FIG. 1 is described below.

[Functions of each functional block]
The identification unit 2 has a function of identifying the number of occupants and the seating position of each occupant. Using the function of the identification unit 2, the information presentation device 16 analyzes the image acquired from the in-vehicle camera 11 and determines, for each seat, whether an occupant is sitting in it. For example, it performs pattern matching on the image acquired from the in-vehicle camera 11 and, when it detects a pattern of an occupant sitting in a seat, determines that the seat is occupied. In doing so, it may also recognize the face of the occupant in each seat to identify who the occupant is.

Alternatively or additionally, the information presentation device 16 may acquire the pressure value of a pressure sensor installed in each seat and determine whether that value is equal to or greater than a predetermined pressure value. The predetermined pressure value is, for example, the value measured when an adult sits in the seat. If the pressure value is equal to or greater than the predetermined pressure value, the device determines that an occupant is sitting in the seat; if it is less, the device determines that no occupant is sitting in the seat.

For the vehicle V shown in FIG. 2, the information presentation device 16 detects the pressure values of the pressure sensors installed in the driver's seat A1, the front passenger seat A2, and the rear seats A3 and A4, and, because every pressure value is equal to or greater than the predetermined pressure value, determines that there are four occupants. The information presentation device 16 also performs pattern matching on the image acquired from the in-vehicle camera 11 to recognize the faces of the driver B1 and the occupants B2 to B4, and determines that the driver B1 is seated in the driver's seat A1, the occupant B2 in the front passenger seat A2, the occupant B3 in the rear seat A3, and the occupant B4 in the rear seat A4.
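The pressure-based occupancy check can be illustrated with a minimal sketch. The threshold value, the sensor units, and the function name `occupied_seats` are assumptions made for illustration; the description only states that the threshold corresponds to a seated adult.

```python
from typing import Dict, List

# Hypothetical threshold in arbitrary sensor units (assumption, not from the source).
SEATED_THRESHOLD = 200.0


def occupied_seats(pressures: Dict[str, float],
                   threshold: float = SEATED_THRESHOLD) -> List[str]:
    """Return the seats whose pressure value is equal to or greater than the threshold."""
    return [seat for seat, value in pressures.items() if value >= threshold]


# Illustrative readings for seats A1 to A4 of FIG. 2 (made-up numbers).
readings = {"A1": 610.0, "A2": 540.0, "A3": 480.0, "A4": 455.0}
seats = occupied_seats(readings)
print(len(seats), seats)
```

The comparison uses `>=` so that a reading exactly at the threshold counts as occupied, matching the "equal to or greater than" wording of the description.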

The acquisition unit 3 has a function of determining whether the driver B1 of the vehicle V is talking with another person and, when it determines that the driver B1 is doing so, acquiring the voice of that person. In this embodiment, "another person" means a person with whom the driver B1 can talk: specifically, at least one of a passenger riding in the vehicle V and a caller outside the vehicle who talks with the driver B1 via the communication device 13. In other words, even when the identification unit 2 determines that there is no occupant other than the driver B1, the driver B1 may still be talking with a caller.

To determine whether the driver B1 is talking with another person, the information presentation device 16 uses the function of the acquisition unit 3 to acquire the sound inside the cabin of the vehicle V from the microphone 12 and extract the occupants' voices from it. Specifically, it removes from the cabin sound the sound output from equipment of the vehicle V, the sound generated by components of the vehicle V as it travels, noise from outside the vehicle, and the like. Examples of sound output from equipment of the vehicle V include radio audio output from the speaker 15 and sound effects of the navigation device 14. Examples of sound generated by components of the vehicle V include tire rolling noise, engine noise, and brake noise during braking. The information presentation device 16 superimposes on the acquired cabin sound signals whose waveforms are in antiphase to these sounds, thereby removing everything other than the occupants' voices.
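The antiphase superposition can be sketched as follows. This is a simplified illustration that assumes the unwanted sound (here a synthetic "radio" tone) is available as a reference signal that is already time-aligned and amplitude-matched to what the microphone picked up; a real cabin would need an estimate of the acoustic path, which the description does not detail. The function name `cancel_known_sound` and all signal values are hypothetical.

```python
import math
from typing import List


def cancel_known_sound(cabin: List[float], known: List[float]) -> List[float]:
    """Superimpose the phase-inverted known sound on the cabin signal, sample by sample."""
    inverted = [-s for s in known]  # waveform in antiphase to the known sound
    return [c + i for c, i in zip(cabin, inverted)]


rate = 8000  # samples per second (assumption)
n = 80
voice = [0.5 * math.sin(2 * math.pi * 200 * t / rate) for t in range(n)]   # occupant voice stand-in
radio = [0.3 * math.sin(2 * math.pi * 1000 * t / rate) for t in range(n)]  # speaker output stand-in
cabin = [v + r for v, r in zip(voice, radio)]

residual = cancel_known_sound(cabin, radio)
max_error = max(abs(a - b) for a, b in zip(residual, voice))
print(max_error < 1e-9)
```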

The information presentation device 16 also analyzes the image acquired from the in-vehicle camera 11 to detect the mouth movements of each occupant. It then compares the timing of the utterances in the acquired sound with the mouth movements of the driver B1 to determine whether the driver B1 is speaking. More specifically, it determines whether the driver B1 has spoken multiple times within a certain period. This period can be set to any suitable length that allows a conversation between the driver B1 and another person to be confirmed, for example 5 to 30 seconds. If the information presentation device 16 determines that the driver B1 has spoken multiple times within the period, it recognizes that the driver B1 is talking with a passenger and/or a caller. If it determines that the driver B1 has not spoken multiple times within the period, it recognizes that the driver B1 is not talking with a passenger and/or a caller.
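The "multiple utterances within a certain period" test can be sketched as below. The sketch assumes that utterance start times attributed to the driver are already available (for example from the mouth-movement comparison), that "multiple" means at least two, and that a 15-second window is used; the window merely falls inside the 5 to 30 second range given as an example. The function name `spoke_multiple_times` is hypothetical.

```python
from typing import List


def spoke_multiple_times(utterance_starts: List[float], now: float,
                         window_s: float = 15.0, min_count: int = 2) -> bool:
    """Return True if at least min_count utterances started within the last window_s seconds."""
    recent = [t for t in utterance_starts if now - window_s <= t <= now]
    return len(recent) >= min_count


# Driver utterances at 1 s and 8 s, evaluated at t = 10 s.
print(spoke_multiple_times([1.0, 8.0], now=10.0))
```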

For example, suppose the driver B1 and the occupant B2 are talking in the vehicle V shown in FIG. 2. The information presentation device 16 acquires the cabin sound from the microphones 12a to 12c and removes from it the sound output from the speaker of the communication device 13 and from the speaker 15. It also removes friction noise, vibration noise, and the like generated while the vehicle V is traveling. In this way, only the voices of the driver B1 and the occupant B2 are extracted from the acquired cabin sound. Next, the device analyzes the image acquired from the in-vehicle camera 11 to detect the mouth movements of the driver B1. It then compares the mouth movements of the driver B1 with the timing of the utterances in the extracted voices and determines that the driver B1 has spoken multiple times within the certain period. The information presentation device 16 thereby recognizes that the driver B1 is talking with a passenger.

When it is determined that there is an occupant other than the driver B1, the information presentation device 16 may determine whether both the driver B1 and at least one occupant other than the driver B1 have spoken within a certain period. If both have spoken within the period, the information presentation device 16 recognizes that the driver B1 is talking with a passenger. If they have not, it recognizes that the driver B1 is not talking with a passenger. When there are multiple occupants other than the driver B1, the same method may be used to determine whether those occupants are talking with one another.

When it is determined that there is no occupant other than the driver B1, or that no occupant other than the driver B1 is speaking, the information presentation device 16 may determine whether the communication device 13 is in a call with the communication device 13a outside the information presentation system 1. If the communication device 13 is in a call with the external communication device 13a, the information presentation device 16 determines that the driver B1 is talking with a caller. If it is not, the information presentation device 16 determines that the driver B1 is talking with neither a passenger nor a caller.
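The passenger-then-caller decision order described above can be sketched as a small function. It is a simplification: it returns a single conversation partner, whereas the embodiment also allows the driver to be talking with both a passenger and a caller. The names `Partner` and `conversation_partner` and the boolean inputs are hypothetical.

```python
from enum import Enum


class Partner(Enum):
    PASSENGER = "passenger"
    CALLER = "caller"
    NONE = "none"


def conversation_partner(driver_spoke: bool, passenger_spoke: bool,
                         call_active: bool) -> Partner:
    """Decide who the driver is talking with, checking passengers first, then the call state."""
    # Driver and at least one other occupant both spoke within the period.
    if driver_spoke and passenger_spoke:
        return Partner.PASSENGER
    # No other occupant (or none speaking): fall back to the call state of device 13.
    if call_active:
        return Partner.CALLER
    return Partner.NONE


print(conversation_partner(driver_spoke=True, passenger_spoke=False, call_active=True))
```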

Further, the information presentation device 16 may identify which occupant uttered which voice based on the phase differences between the waveforms acquired by the multiple microphones 12 and the positions at which the microphones 12 are installed. For example, when the driver B1 speaks in the vehicle V shown in FIG. 2, a voice with a certain waveform is detected by the microphone 12a, and a voice with the same waveform is then detected by the microphone 12c. The difference in detection timing between the microphone 12a and the microphone 12c is proportional to the difference between the distance from the driver B1 to the microphone 12a and the distance from the driver B1 to the microphone 12c, so the voice can be identified as having been uttered by the driver B1. In this case, whether the driver B1 is speaking (that is, whether the driver B1 is talking with another person) can be determined without using the image acquired from the in-vehicle camera 11.
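The arrival-time reasoning can be sketched as follows: for each seat, compute the delay that a voice from that seat would show between two microphones, and pick the seat whose expected delay best matches the measured one. The 2D coordinates (metres) for the seats and microphones are made-up values loosely following the FIG. 2 layout, and `expected_delay` and `locate_speaker` are hypothetical names. With only two microphones some positions can be ambiguous, so this is an illustration of the principle rather than a complete localizer.

```python
import math
from typing import Dict, Tuple

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C

Point = Tuple[float, float]

# Hypothetical layout: x is lateral (right positive), y is longitudinal (front positive).
SEATS: Dict[str, Point] = {"A1": (0.4, 0.0), "A2": (-0.4, 0.0),
                           "A3": (0.4, -0.9), "A4": (-0.4, -0.9)}
MIC_12A: Point = (0.4, 0.5)   # in front of the driver's seat
MIC_12C: Point = (0.0, -0.4)  # roof, near the cabin centre


def expected_delay(source: Point, mic_a: Point, mic_b: Point) -> float:
    """Arrival time at mic_b minus arrival time at mic_a for a sound emitted at source."""
    return (math.dist(source, mic_b) - math.dist(source, mic_a)) / SPEED_OF_SOUND


def locate_speaker(measured_delay: float, seats: Dict[str, Point],
                   mic_a: Point, mic_b: Point) -> str:
    """Return the seat whose expected inter-microphone delay is closest to the measured delay."""
    return min(seats, key=lambda name: abs(expected_delay(seats[name], mic_a, mic_b)
                                           - measured_delay))


# A delay measured for a voice that actually came from the driver's seat A1.
delay = expected_delay(SEATS["A1"], MIC_12A, MIC_12C)
print(locate_speaker(delay, SEATS, MIC_12A, MIC_12C))
```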

When the information presentation device 16 determines that the driver B1 is talking with another person, it uses the function of the acquisition unit 3 to acquire the voice of that person. If the driver B1 is talking with a passenger riding in the same vehicle V, it acquires the voice of the passenger; if the driver B1 is talking with a caller outside the vehicle via the communication device 13, it acquires the voice of the caller. If it determines that the driver B1 is not talking with another person, the information presentation device 16 repeats the determination of whether the driver B1 is talking with another person, or alternatively enters a standby state. Hereinafter, the voice of a passenger is also called the first voice, and the voice of a caller is also called the second voice.

The first voice, that of a passenger, is acquired by extracting the passenger's voice from the cabin sound in the same manner as described above. For example, when the driver B1 and the occupant B2 are talking in the vehicle V shown in FIG. 2, the cabin sound is acquired from the microphones 12a to 12c, the sound output from the speaker 15 and the like is removed from it, and only the voices of the driver B1 and the occupant B2 are extracted. Next, the mouth movements of the occupant B2 are detected from the image acquired from the in-vehicle camera 11, and from the voices of the driver B1 and the occupant B2, only the portions in which the mouth of the occupant B2 is open (that is, in which the occupant B2 is speaking) are further extracted. The second voice, that of a caller, is acquired from the caller's voice output from the speaker of the communication device 13 or from the speaker 15 of the vehicle V.

The model generation unit 4 has a function of generating a voice model used to generate the guidance voice. A voice model is a model in which voice features are set; for example, it is software that reads out input text to generate an artificial voice. A feature is a numerical value representing a characteristic of a voice. Specific examples include speaking rate, loudness, the waveforms of vowels and consonants, word usage frequency, and the types of dialect used and how often they are used.

These features are obtained by analyses such as morphological analysis of the spoken content and frequency analysis of the voice waveform. In particular, the information presentation device 16 performs frequency analysis on the acquired voice and calculates the resulting frequency components and their amplitudes as features. Frequency analysis means decomposing a waveform into multiple simple sine and cosine waves to obtain the frequency components contained in the waveform and their amplitudes. The information presentation device 16 performs the above analyses on a predetermined range of the acquired voice waveform. The predetermined range is, for example, the range from when the occupant talking with the driver B1 starts speaking until the driver B1 or another occupant starts speaking. Alternatively, the predetermined range may be the range from when the occupant talking with the driver B1 starts speaking until that occupant finishes one sentence.
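The decomposition into sine and cosine components can be illustrated with a plain discrete Fourier transform. The description does not name a specific transform, so the DFT here is an assumption, as are the function name `frequency_features`, the sample rate, and the synthetic two-tone test signal.

```python
import cmath
import math
from typing import List, Tuple


def frequency_features(samples: List[float], rate: float) -> List[Tuple[float, float]]:
    """Return (frequency in Hz, amplitude) pairs for bins 0..N/2 using a direct DFT."""
    n = len(samples)
    features = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        # The DC bin (and the Nyquist bin for even N) is not mirrored, so it is not doubled.
        scale = 1.0 if k == 0 or (n % 2 == 0 and k == n // 2) else 2.0
        features.append((k * rate / n, scale * abs(coeff) / n))
    return features


rate = 800.0  # samples per second (assumption)
n = 80
signal = [0.5 * math.sin(2 * math.pi * 100 * t / rate)
          + 0.2 * math.sin(2 * math.pi * 250 * t / rate) for t in range(n)]

features = frequency_features(signal, rate)
peak = max(features, key=lambda f: f[1])  # strongest frequency component
print(peak[0])
```

The direct DFT costs O(N²) and is used only to keep the sketch dependency-free; a fast Fourier transform would give the same frequency components and amplitudes.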

Using the function of the model generation unit 4, the information presentation device 16 performs frequency analysis and other analyses on the voice, calculates its features, and generates a voice model capable of producing a guidance voice that resembles the acquired voice. The information presentation device 16 may generate a new voice model from the calculated features using machine learning (for example, deep learning), or may generate a voice model by modifying a pre-registered voice model with the calculated features. It may also select, from multiple pre-registered voice models, the one whose features are most similar to the calculated features. Selecting one of the pre-registered voice models makes it possible to skip the time-consuming step of generating a voice model.

なお、特徴量が最も類似するとは、特徴量同士の差が最も小さい（つまり最も似ている）ことを言う。例えば、特徴量（特に周波数成分とその振幅）の差が最も小さくなることを言う。一例として、周波数成分とその振幅について、小さい周波数成分から順番に周波数を比較し、周波数の差を足し合わせた時に差の総和が最も小さくなるものが、最も類似する特徴量である。また、別の例として、小さい周波数成分から順番に振幅の差を求め、振幅の差を足し合わせた時に差の総和が最も小さくなるものが、最も類似する特徴量である。 The feature quantities being most similar means that the difference between the feature quantities is the smallest (i.e., they are the most similar). For example, this means that the difference between feature quantities (particularly frequency components and their amplitudes) is the smallest. As an example, when comparing frequency components and their amplitudes in order from the smallest frequency components, the feature quantity with the smallest sum of the differences when the differences in frequency are added is the most similar feature quantity. As another example, when comparing amplitude differences in order from the smallest frequency components, the feature quantity with the smallest sum of the differences when the differences in amplitude are added is the most similar feature quantity.
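The selection of the most similar pre-registered voice model can be sketched as follows, assuming each model is summarized by a list of frequency components. The names `total_difference`, `most_similar_model`, and the model data are hypothetical; how lists of unequal length are handled is not stated in the embodiment, and this sketch simply compares the leading pairs.

```python
def total_difference(a, b):
    """Sum of absolute differences between values paired in ascending order."""
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b)))

def most_similar_model(target_freqs, registered):
    """Return the name of the registered model with the smallest total difference."""
    return min(registered, key=lambda name: total_difference(target_freqs, registered[name]))

# Hypothetical frequency components (Hz) of two pre-registered voice models.
registered_models = {
    "model_a": [110.0, 220.0, 330.0],
    "model_b": [200.0, 400.0, 600.0],
}
selected = most_similar_model([120.0, 235.0, 350.0], registered_models)
```

The same comparison could be applied to amplitudes instead of frequencies by passing the amplitude lists ordered by ascending frequency.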

例えば、図２に示す車両Ｖにおいて運転者Ｂ１と乗員Ｂ２が会話をしている場合に、乗員Ｂ２の音声を取得したときは、情報提示装置１６は、乗員Ｂ２の音声に対して周波数解析を行い、乗員Ｂ２の音声に含まれる周波数成分とその振幅を算出する。そして、当該周波数成分と振幅に基づき、ナビゲーション装置１４のデフォルトの音声モデルを変更し、乗員Ｂ２の音声に類似する案内音声を生成するための音声モデルを生成する。 For example, in the case where the driver B1 and the occupant B2 are having a conversation in the vehicle V shown in FIG. 2, when the voice of the occupant B2 is acquired, the information presentation device 16 performs a frequency analysis on the voice of the occupant B2 and calculates the frequency components and their amplitudes contained in the voice of the occupant B2. Then, based on the frequency components and the amplitudes, the default voice model of the navigation device 14 is changed, and a voice model is generated for generating a guidance voice similar to the voice of the occupant B2.

音声生成部５は、運転者Ｂ１に情報を提示する案内音声を生成する機能を有する。運転者Ｂ１に提示する情報とは、例えばナビゲーション装置１４から出力される情報であり、特に、車両Ｖが、設定された走行経路に沿って走行するための情報である。具体的には、車両Ｖの走行状態、交差点における右左折の要否、右左折専用車線の有無、車線変更の要否、回避の必要がある障害物の位置などが挙げられる。また、案内音声により提示される情報には、目的地付近の観光情報、近隣の施設情報、ニュースなどの時事情報、車両Ｖの故障情報などの、走行経路に沿って走行するための情報以外の情報が含まれる。これらの情報を総称して案内情報とも言う。 The voice generating unit 5 has a function of generating guidance voice that presents information to the driver B1. The information presented to the driver B1 is, for example, information output from the navigation device 14, and in particular, information for the vehicle V to travel along a set travel route. Specifically, this includes the travel state of the vehicle V, whether or not to turn right or left at an intersection, whether or not there is a dedicated lane for right or left turns, whether or not to change lanes, and the location of obstacles that need to be avoided. In addition, the information presented by the guidance voice includes information other than information for traveling along the travel route, such as tourist information near the destination, information about nearby facilities, current events such as news, and information about malfunctions of the vehicle V. This information is collectively referred to as guidance information.

情報提示装置１６は、上述の案内情報を運転者Ｂ１に伝達するため、音声生成部５の機能により、音声モデルを用いて案内音声を生成する。具体的には、同乗者の第１音声と類似する案内音声と、通話者の第２音声と類似する案内音声のうち少なくとも一方を生成する。生成される案内音声の内容は、図１に示すとおり、ナビゲーション装置１４から取得した案内情報に基づく。以下、同乗者の第１音声と類似する案内音声を第１案内音声とも言い、通話者の第２音声と類似する案内音声を第２案内音声とも言うこととする。 In order to convey the above-mentioned guidance information to the driver B1, the information presentation device 16 generates a guidance voice using a voice model through the function of the voice generation unit 5. Specifically, at least one of a guidance voice similar to the passenger's first voice and a guidance voice similar to the caller's second voice is generated. The content of the generated guidance voice is based on the guidance information acquired from the navigation device 14, as shown in FIG. 1. Hereinafter, the guidance voice similar to the passenger's first voice is also referred to as the first guidance voice, and the guidance voice similar to the caller's second voice is also referred to as the second guidance voice.

情報提示装置１６が生成する案内音声は、運転者Ｂ１と会話している他者の音声と類似している。案内音声が他者の音声と類似しているとは、案内音声が他者の音声と同一であることを含み、他者と会話をしている運転者Ｂ１に対して案内音声を出力した場合に、カクテルパーティー効果が抑制され、運転者Ｂ１が案内音声を聞き逃すことが抑制できることを言う。つまり、会話している乗員の音声と似た音声であれば、運転者Ｂ１は、会話中であっても案内音声を無意識に選択して聞き取ることができる。 The guidance voice generated by the information presentation device 16 is similar to the voice of another person who is conversing with the driver B1. The guidance voice being similar to the voice of another person includes the guidance voice being identical to the voice of the other person, and means that when a guidance voice is output to the driver B1 who is conversing with another person, the cocktail party effect is suppressed, and it is possible to prevent the driver B1 from missing the guidance voice. In other words, if the voice is similar to the voice of the passenger who is conversing, the driver B1 can unconsciously select and hear the guidance voice even during a conversation.

一例として、他者の音声の波形と、生成した案内音声の波形とに対し、上述の所定範囲において周波数解析を行い、周波数成分とその振幅を算出する。次に、他者の音声に含まれる周波数成分と、案内音声に含まれる周波数成分を、値の小さい順に互いに対応付ける。次に、他者の音声の周波数成分の振幅と、他者の音声の周波数成分に対応する、案内音声の周波数成分の振幅とを比較する。そして、対応する各周波数成分において、案内音声の振幅が、他者の音声の振幅の±１０％の範囲内であるか否かを判定する。案内音声の振幅が、他者の音声の振幅の±１０％の範囲内にあると判定した場合は、案内音声は他者の音声と類似すると判定する。これに対し、案内音声の振幅が、他者の音声の振幅の－１０％未満の範囲内又は＋１０％を超える範囲内であると判定した場合は、案内音声は他者の音声と類似しないと判定する。なお、他者の音声に対する周波数解析は、上述の所定範囲にて行い、案内音声に対する周波数解析は、生成した案内音声全体に対して行うものとする。 As an example, frequency analysis is performed on the waveform of the other person's voice and on the waveform of the generated guidance voice within the above-mentioned predetermined range to calculate their frequency components and amplitudes. Next, the frequency components contained in the other person's voice and those contained in the guidance voice are paired with each other in ascending order of value. The amplitude of each frequency component of the other person's voice is then compared with the amplitude of the paired frequency component of the guidance voice, and it is determined, for each pair, whether the amplitude of the guidance voice is within ±10% of the amplitude of the other person's voice. If the amplitude of the guidance voice is within ±10% of the amplitude of the other person's voice, the guidance voice is determined to be similar to the other person's voice. In contrast, if the amplitude of the guidance voice is more than 10% below or more than 10% above the amplitude of the other person's voice, the guidance voice is determined not to be similar to the other person's voice. Note that the frequency analysis of the other person's voice is performed within the above-mentioned predetermined range, and the frequency analysis of the guidance voice is performed on the entire generated guidance voice.
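The ±10% amplitude determination can be sketched as below. The function name `amplitudes_similar` and the sample values are hypothetical, and treating unequal numbers of frequency components as dissimilar is an assumption of this sketch, since the embodiment does not address that case.

```python
def amplitudes_similar(other, guidance, tolerance=0.10):
    """Compare amplitudes of components paired in ascending order of frequency.

    other, guidance: lists of (frequency_hz, amplitude) pairs.
    """
    other_sorted = sorted(other)
    guidance_sorted = sorted(guidance)
    if len(other_sorted) != len(guidance_sorted):
        return False  # assumption: unequal counts are treated as dissimilar
    for (_, amp_other), (_, amp_guidance) in zip(other_sorted, guidance_sorted):
        # The guidance amplitude must lie within +/-10% of the other person's amplitude.
        if abs(amp_guidance - amp_other) > tolerance * amp_other:
            return False
    return True

# Hypothetical features of the other person's voice and of two candidate guidance voices.
other_voice = [(100.0, 1.0), (200.0, 0.5)]
close_guidance = [(102.0, 1.05), (198.0, 0.48)]
far_guidance = [(102.0, 1.2), (198.0, 0.48)]
```

With these values, `close_guidance` would be judged similar and `far_guidance` would not, because its first amplitude deviates by 20%.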

別の例として、他者の音声に含まれる周波数成分と、案内音声に含まれる周波数成分とを、値が小さい順に対応させる。そして、対応する二つの周波数成分の差の最大値が５～５０［Ｈｚ］以下であるか否かを判定する。対応する二つの周波数成分の差の最大値が５～５０［Ｈｚ］以下であると判定した場合は、案内音声は他者の音声と類似すると判定する。これに対し、対応する二つの周波数成分の差の最大値が５～５０［Ｈｚ］を超えると判定した場合は、案内音声は他者の音声と類似しないと判定する。なお、他者の音声と案内音声に対して周波数解析を行う範囲は、上述の例と同様である。 As another example, the frequency components contained in the other person's voice and those contained in the guidance voice are paired in ascending order of value. It is then determined whether the maximum difference between paired frequency components is at or below a threshold set within the range of 5 to 50 [Hz]. If the maximum difference is at or below the threshold, the guidance voice is determined to be similar to the other person's voice. In contrast, if the maximum difference exceeds the threshold, the guidance voice is determined not to be similar to the other person's voice. The ranges over which frequency analysis is performed on the other person's voice and on the guidance voice are the same as in the example above.
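The maximum-difference determination can be sketched as follows. The function name `frequencies_similar` is hypothetical, and the 20 Hz default is merely one value chosen from the 5 to 50 Hz range for illustration.

```python
def frequencies_similar(other_freqs, guidance_freqs, threshold_hz=20.0):
    """Pair components in ascending order and compare the largest gap to a threshold."""
    pairs = list(zip(sorted(other_freqs), sorted(guidance_freqs)))
    if not pairs:
        return False  # assumption: nothing to compare is treated as dissimilar
    largest_gap = max(abs(f_other - f_guidance) for f_other, f_guidance in pairs)
    return largest_gap <= threshold_hz

# Hypothetical frequency components in Hz.
other_voice_freqs = [100.0, 200.0, 300.0]
```

For instance, guidance components of 105, 210, and 290 Hz would give a largest gap of 10 Hz and be judged similar under the assumed threshold.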

例えば、図２に示す車両Ｖにおいて運転者Ｂ１と乗員Ｂ２が会話をしている場合に、乗員Ｂ２の音声を取得し、乗員Ｂ２の音声に類似する案内音声を生成するための音声モデルを生成したときは、情報提示装置１６は、ナビゲーション装置１４から案内情報を取得する。案内情報が、前方の交差点を右折する必要があるとの情報である場合は、情報提示装置１６は、取得した案内情報に基づき、音声モデルを用いて、例えば「次の交差点を右に曲がります」という案内音声を生成する。当該案内音声は、乗員Ｂ２の音声と類似しており、一例として、乗員Ｂ２の音声に含まれる周波数成分ごとに、当該案内音声に含まれる周波数成分の中から最も値が近い周波数成分を選択し、二つの周波数成分を対応させた場合に、対応する二つの周波数成分の差の最大値が５～５０［Ｈｚ］以下である。 For example, in the case where the driver B1 and the passenger B2 are having a conversation in the vehicle V shown in FIG. 2, when the voice of the passenger B2 is acquired and a voice model for generating a guidance voice similar to the voice of the passenger B2 is generated, the information presentation device 16 acquires guidance information from the navigation device 14. If the guidance information is information that it is necessary to turn right at the intersection ahead, the information presentation device 16 uses the voice model based on the acquired guidance information to generate a guidance voice such as "Turn right at the next intersection." The guidance voice is similar to the voice of the passenger B2, and as an example, for each frequency component included in the voice of the passenger B2, the frequency component with the closest value is selected from the frequency components included in the guidance voice, and the two frequency components are matched, and the maximum value of the difference between the two corresponding frequency components is 5 to 50 [Hz] or less.
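The nearest-value pairing mentioned in this example differs from pairing in ascending order, and can be sketched as below. The function name `max_nearest_gap` and the sample values are hypothetical.

```python
def max_nearest_gap(other_freqs, guidance_freqs):
    """For each component of the other person's voice, find the closest guidance
    component, then return the largest of those closest distances (Hz)."""
    return max(
        min(abs(f_other - f_guidance) for f_guidance in guidance_freqs)
        for f_other in other_freqs
    )

# Hypothetical components: passenger B2's voice and the generated guidance voice.
gap = max_nearest_gap([100.0, 200.0], [95.0, 150.0, 210.0])
```

The resulting gap would then be compared with the threshold chosen from the 5 to 50 Hz range.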

出力部６は、音声生成部５の機能により生成した案内音声を、運転者Ｂ１と会話している他者の発話位置から出力する機能を有する。他者の発話位置とは、他者の音声が放出（又は出力）される位置であり、例えば他者の頭部の位置であり、特に口の位置である。他者が同乗者である場合は、発話位置は、例えば同乗者の着座位置であり、より具体的には同乗者の頭部の位置である。情報提示装置１６は、運転者Ｂ１が同乗者と会話している場合は、出力部６の機能により、同乗者から運転者に向けて第１案内音声を出力する。これに対し、他者が通話者である場合は、発話位置は、第２音声を出力するスピーカーであり、例えば、通信装置１３のスピーカー又は車両Ｖのスピーカー１５である。情報提示装置１６は、運転者Ｂ１が通話者と会話している場合は、出力部６の機能により、第２音声を出力するスピーカーから運転者に向けて第２案内音声を出力する。 The output unit 6 has a function of outputting the guidance voice generated by the function of the voice generation unit 5 from the speaking position of the other person who is talking to the driver B1. The speaking position of the other person is the position where the voice of the other person is emitted (or output), for example, the position of the head of the other person, and particularly the position of the mouth. If the other person is a passenger, the speaking position is, for example, the seating position of the passenger, more specifically, the position of the head of the passenger. When the driver B1 is talking to the passenger, the information presentation device 16 outputs the first guidance voice from the passenger to the driver by the function of the output unit 6. On the other hand, when the other person is the caller, the speaking position is a speaker that outputs the second voice, for example, the speaker of the communication device 13 or the speaker 15 of the vehicle V. When the driver B1 is talking to the caller, the information presentation device 16 outputs the second guidance voice from the speaker that outputs the second voice to the driver by the function of the output unit 6.

例えば、図２に示す車両Ｖにおいて運転者Ｂ１と乗員Ｂ２が会話をしている場合に、乗員Ｂ２の音声に類似する案内音声を生成したときは、情報提示装置１６は、助手席Ａ２の左前側に設置されたスピーカー１５ｂから運転者Ｂ１（又は運転席Ａ１）に向けて案内音声を出力する。これにより、運転者Ｂ１は、乗員Ｂ２が案内音声を発話したように認識し、乗員Ｂ２と会話している場合であっても、案内音声を無意識に選択して聞き取ることができる。 For example, in the case where the driver B1 and the passenger B2 are having a conversation in the vehicle V shown in FIG. 2, when a guidance voice similar to the voice of the passenger B2 is generated, the information presentation device 16 outputs the guidance voice to the driver B1 (or the driver's seat A1) from the speaker 15b installed on the left front side of the passenger seat A2. This allows the driver B1 to recognize that the guidance voice is being spoken by the passenger B2, and can unconsciously select and hear the guidance voice even when the driver B1 is having a conversation with the passenger B2.

情報提示装置１６は、出力部６の機能により、運転者Ｂ１と会話している同乗者の着座位置周辺から案内音声を出力してもよい。例えば、複数あるスピーカー１５から、同乗者の着座位置に最も近い位置のスピーカーから案内音声を出力する。また、情報提示装置１６は、出力部６の機能により、運転者Ｂ１に、同乗者が発話したと認識させるように車内のスピーカー１５を制御してもよい。具体的には、スピーカー１５から出力される音声の大きさ、音声の波形の位相、出力のタイミングなどを制御する。また、運転者Ｂ１に、同乗者が発話したと認識させるよう、車室内でスピーカー１５を設置する位置を設定してもよい。 The information presentation device 16 may use the function of the output unit 6 to output a guidance voice from the vicinity of the seating position of the passenger who is conversing with the driver B1. For example, out of the multiple speakers 15, the guidance voice may be output from the speaker closest to the seating position of the passenger. Furthermore, the information presentation device 16 may use the function of the output unit 6 to control the speaker 15 in the vehicle so that the driver B1 recognizes that the passenger has spoken. Specifically, the information presentation device 16 controls the volume of the voice output from the speaker 15, the phase of the voice waveform, the output timing, etc. Furthermore, the position at which the speaker 15 is installed in the vehicle may be set so that the driver B1 recognizes that the passenger has spoken.
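Selecting the speaker closest to the passenger's seating position can be sketched as follows. The cabin coordinates, the placement assigned to each speaker label, and the function name `nearest_speaker` are all assumptions made for illustration; the embodiment does not give numerical positions.

```python
import math

def nearest_speaker(speakers, seat_position):
    """Return the label of the speaker closest to the given seat position.

    speakers: dict mapping a speaker label to (x, y) cabin coordinates in metres.
    """
    return min(speakers, key=lambda label: math.dist(speakers[label], seat_position))

# Hypothetical layout: x is lateral (negative = left), y is longitudinal (negative = rear).
cabin_speakers = {
    "15a": (0.5, 0.0),
    "15b": (-0.5, 0.0),
    "15c": (0.5, -1.2),
    "15d": (-0.5, -1.2),
}
chosen = nearest_speaker(cabin_speakers, (-0.4, -0.3))  # assumed head position of the front-left passenger
```

Volume, phase, and output timing control, which the embodiment also mentions, are outside the scope of this sketch.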

例えば、図２に示す車両Ｖにおいて運転者Ｂ１と乗員Ｂ２が会話をしている場合は、乗員Ｂ２の音声に類似した案内音声を、スピーカー１５ｂ、１５ｄから運転者Ｂ１に向けて出力する。この場合に、スピーカー１５ｂから出力した案内音声と、スピーカー１５ｄから出力した案内音声とが互いに打ち消し合わないよう、いずれか一方のスピーカーから出力する案内音声の波形の位相を制御する。 For example, in the vehicle V shown in FIG. 2, when the driver B1 and passenger B2 are having a conversation, a guidance voice similar to the voice of passenger B2 is output from speakers 15b and 15d to the driver B1. In this case, the phase of the waveform of the guidance voice output from one of the speakers is controlled so that the guidance voice output from speaker 15b and the guidance voice output from speaker 15d do not cancel each other out.

情報提示装置１６は、出力部６の機能により、所定時間における運転者Ｂ１と、運転者Ｂ１と会話している他者の発言時間の占める割合が、所定値以下になったか否かを判定してもよい。所定時間は、運転者Ｂ１と他者との会話に案内音声を割り込ませるタイミングが把握できる範囲内で適宜の時間を設定でき、例えば１５～６０秒である。所定値は、運転者Ｂ１と他者との会話に案内音声を割り込ませた場合に、運転者Ｂ１が案内音声を聞き逃すことを抑制できる範囲内で適宜の値を設定でき、例えば０～２０％である。情報提示装置１６は、所定時間に占める、運転者Ｂ１と他者の発言時間の割合が所定値以下になったと判定した場合は、当該割合が所定値以下になったと判定したタイミングで案内音声を出力する。 The information presentation device 16 may use the function of the output unit 6 to determine whether the ratio of speech time between the driver B1 and another person who is conversing with the driver B1 in a predetermined time period is equal to or less than a predetermined value. The predetermined time period can be set appropriately within a range in which the timing for interrupting the conversation between the driver B1 and another person with a guidance voice can be grasped, for example, 15 to 60 seconds. The predetermined value can be set appropriately within a range in which the driver B1 is prevented from missing the guidance voice when the guidance voice is interrupted in the conversation between the driver B1 and another person, for example, 0 to 20%. When the information presentation device 16 determines that the ratio of speech time between the driver B1 and another person in a predetermined time period is equal to or less than a predetermined value, it outputs the guidance voice at the timing when it is determined that the ratio is equal to or less than the predetermined value.

これに対し、所定時間に占める、運転者Ｂ１と他者の発言時間の割合が所定値を超えると判定した場合は、案内音声を出力しない。これに代え、情報提示装置１６は、所定時間に占める、運転者Ｂ１と他者の発言時間の割合が所定値を超えると判定した場合は、発話者が運転者Ｂ１であるか否かを判定し、発話者が運転者Ｂ１であると判定した場合は、案内音声を出力してもよい。これに対し、発話者が運転者Ｂ１でないと判定した場合は、案内音声を出力しない。またこれに代え、情報提示装置１６は、案内音声を生成したタイミングで直ちに案内音声を出力してもよい。つまり、案内音声の生成完了と同時に案内音声を出力してもよい。 In contrast, if it is determined that the ratio of speech time of driver B1 and others to a predetermined time exceeds a predetermined value, the guidance voice is not output. Alternatively, if the information presentation device 16 determines that the ratio of speech time of driver B1 and others to a predetermined time exceeds a predetermined value, it may determine whether the speaker is driver B1 or not, and output the guidance voice if it is determined that the speaker is driver B1. In contrast, if it is determined that the speaker is not driver B1, the guidance voice is not output. Alternatively, the information presentation device 16 may output the guidance voice immediately at the timing when the guidance voice is generated. In other words, the guidance voice may be output at the same time as the generation of the guidance voice is completed.

例えば、図２に示す車両Ｖにおいて運転者Ｂ１と乗員Ｂ２が会話をしている場合に、車室内の音声を３０秒間取得し、運転者Ｂ１と乗員Ｂ２の発言時間が３秒であると検出されたときは、所定時間における発言時間の割合が１０％であるため、情報提示装置１６は、この割合を算出したタイミングで案内音声を出力する。なお、所定時間における発言時間の割合が所定値以下である状態を、運転者Ｂ１と他者の会話が途切れた状態とも言うこととする。 For example, when driver B1 and passenger B2 are having a conversation in the vehicle V shown in FIG. 2, if the voice in the vehicle cabin is acquired for 30 seconds and the speech time of driver B1 and passenger B2 is detected to be 3 seconds, the ratio of speech time to the predetermined time is 10%, so the information presentation device 16 outputs the guidance voice at the timing when this ratio is calculated. Note that a state in which the ratio of speech time to the predetermined time is equal to or less than the predetermined value is also referred to as a state in which the conversation between driver B1 and the other person has broken off.
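The speech-time ratio in this example can be worked through in a short sketch. The function names, the representation of speech as non-overlapping (start, end) intervals in seconds, and the 20% default (the upper end of the 0 to 20% range given earlier) are assumptions for illustration.

```python
def speech_ratio(speech_intervals, window_start, window_end):
    """Fraction of the observation window occupied by speech.

    speech_intervals: (start, end) times in seconds, assumed not to overlap.
    """
    spoken = 0.0
    for start, end in speech_intervals:
        # Count only the part of each utterance that falls inside the window.
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > 0:
            spoken += overlap
    return spoken / (window_end - window_start)

def conversation_broken_off(speech_intervals, window_start, window_end, max_ratio=0.20):
    """True when the speech ratio is at or below the predetermined value."""
    return speech_ratio(speech_intervals, window_start, window_end) <= max_ratio

# 3 seconds of speech detected within a 30-second window, as in the example above.
utterances = [(2.0, 3.5), (10.0, 11.5)]
ratio = speech_ratio(utterances, 0.0, 30.0)
```

With these intervals the ratio is 10%, which is at or below the assumed 20% value, so the guidance voice would be output at that point.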

情報提示装置１６は、出力部６の機能により、ナビゲーション装置１４から案内情報を取得し、案内情報に基づく案内音声が、運転者Ｂ１に操作を要求する音声であるか否かを判定してもよい。操作を要求する音声とは、例えば、運転者Ｂ１に運転操作の実行を指示する音声であり、交差点における右左折を指示する案内音声、右左折専用車線への進入を指示する案内音声、接近する他車両の存在を報知する案内音声などが挙げられる。これに対し、操作を要求しない声とは、運転者Ｂ１に単に情報を提示するのみの音声であり、近隣に駐車場が存在することを知らせる音声、ニュースを伝達する音声などが挙げられる。案内音声が操作を要求する音声であると判定された場合は、情報提示装置１６は、操作を要求する音声を、操作を要求しない他の音声に優先して出力する。これに対し、案内音声が操作を要求する音声でないと判定された場合は、情報提示装置１６は、操作を要求する他の音声を優先して出力し、その後、操作を要求しない音声を出力する。 The information presentation device 16 may obtain guidance information from the navigation device 14 using the function of the output unit 6, and may determine whether the guidance voice based on the guidance information is a voice requesting an operation from the driver B1. The voice requesting an operation is, for example, a voice instructing the driver B1 to perform a driving operation, such as a guidance voice instructing the driver B1 to turn right or left at an intersection, a guidance voice instructing the driver to enter a right or left turn exclusive lane, and a guidance voice notifying the driver of the presence of an approaching vehicle. In contrast, a voice that does not request an operation is a voice that simply presents information to the driver B1, such as a voice informing the driver B1 of the existence of a parking lot nearby and a voice conveying news. If it is determined that the guidance voice is a voice requesting an operation, the information presentation device 16 outputs the voice requesting an operation in priority over other voices that do not request an operation. In contrast, if it is determined that the guidance voice is not a voice requesting an operation, the information presentation device 16 outputs the other voices that request an operation in priority, and then outputs a voice that does not request an operation.
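The prioritization of guidance voices that request an operation can be sketched as a stable sort. The message representation as (text, requires_operation) tuples and the function name `order_for_output` are hypothetical.

```python
def order_for_output(messages):
    """Place operation-requesting messages first, keeping arrival order within each group.

    messages: list of (text, requires_operation) tuples.
    """
    # sorted() is stable, and False sorts before True, so negating the flag
    # moves messages that request an operation to the front.
    return sorted(messages, key=lambda message: not message[1])

pending = [
    ("There is a parking lot nearby", False),
    ("Turn right at the next intersection", True),
    ("Here is the latest news", False),
]
ordered = order_for_output(pending)
```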

情報提示装置１６は、案内音声を出力する場合に、取得部３の機能により、運転者Ｂ１と会話する他者が発話しているか否かを判定してもよい。情報提示装置１６は、例えば、車内カメラ１１から取得した画像を解析し、運転者Ｂ１と会話している乗員の口の動きを検出する。そして、当該乗員の口の動き（例えば口の開き方）から、当該乗員が発話しているか否かを判定する。また、情報提示装置１６は、通信装置１３から通話者の第２音声を取得し、当該通話者が発話しているか否かを判定する。 When outputting the guidance voice, the information presentation device 16 may use the function of the acquisition unit 3 to determine whether or not another person who is conversing with the driver B1 is speaking. For example, the information presentation device 16 analyzes an image acquired from the in-vehicle camera 11 and detects the mouth movement of the occupant who is conversing with the driver B1. Then, based on the mouth movement of the occupant (e.g., the way the mouth is open), it determines whether or not the occupant is speaking. The information presentation device 16 also acquires the second voice of the caller from the communication device 13 and determines whether or not the caller is speaking.

運転者Ｂ１と会話している他者が発話していると判定した場合は、出力部６の機能により、当該他者の発話する音声の波形と逆位相の波形を有する異なる音声を出力する。これにより、当該他者の音声が打ち消され、運転者Ｂ１が案内音声を聞き逃すことを抑制できる。これに対し、運転者Ｂ１と会話している他者が発話していないと判定した場合は、当該他者の音声に類似する案内音声を出力する。なお、他者が同乗者である場合は、逆位相の波形を有する音声は、例えば、車両Ｖのスピーカー１５から出力され、他者が通話者である場合は、逆位相の波形を有する音声は、第２音声を出力するスピーカーから出力される。 If it is determined that the other person who is conversing with the driver B1 is speaking, the output unit 6 functions to output a different voice having a waveform in the opposite phase to the waveform of the voice of the other person. This cancels out the voice of the other person, preventing the driver B1 from missing the guidance voice. In contrast, if it is determined that the other person who is conversing with the driver B1 is not speaking, a guidance voice similar to the voice of the other person is output. Note that if the other person is a passenger, the voice having the opposite phase waveform is output, for example, from the speaker 15 of the vehicle V, and if the other person is the caller, the voice having the opposite phase waveform is output from the speaker that outputs the second voice.
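The relationship between a waveform and its opposite-phase counterpart can be illustrated minimally as below. The function name `antiphase` and the sample values are hypothetical. This sketch only shows that the sample-wise sum cancels; cancelling speech in an actual cabin would additionally have to account for propagation delay and speaker placement, which the embodiment does not detail.

```python
def antiphase(samples):
    """Return the waveform with inverted sign, i.e. shifted by 180 degrees in phase."""
    return [-x for x in samples]

# Hypothetical samples of the other person's voice.
voice = [0.0, 0.3, 0.5, 0.3, 0.0, -0.3, -0.5, -0.3]
cancelling = antiphase(voice)
residual = [v + c for v, c in zip(voice, cancelling)]
```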

例えば、図２に示す車両Ｖにおいて運転者Ｂ１と乗員Ｂ２が会話をしている場合は、情報提示装置１６は、乗員Ｂ２の口の動きを検出し、乗員Ｂ２の口の動きから、乗員Ｂ２が発話しているか否かを判定する。この場合、乗員Ｂ２が発話していると判定されるため、情報提示装置１６は、当該乗員の発話する音声の波形と逆位相の波形を有する異なる音声を、スピーカー１５ａ又はスピーカー１５ｂから出力する。 For example, when driver B1 and occupant B2 are having a conversation in the vehicle V shown in FIG. 2, the information presentation device 16 detects the movement of occupant B2's mouth and determines from that movement whether occupant B2 is speaking. In this case, since occupant B2 is determined to be speaking, the information presentation device 16 outputs, from speaker 15a or speaker 15b, a different voice having a waveform in the opposite phase to the waveform of the voice spoken by the occupant.

情報提示装置１６は、モデル生成部４の機能により、運転者Ｂ１と会話した他者の音声モデルを生成した場合は、生成した音声モデルを、当該他者と対応付けて登録してもよい。例えば、情報提示装置１６のＲＯＭなどの記録媒体又は情報提示システム１の外部のサーバに、予め音声モデルを登録しておく。そして、取得部３の機能により、運転者Ｂ１と会話している他者の音声を取得した場合に、特定部２の機能により特定された当該他人が、運転者Ｂ１と過去に会話したことがあるか否かを判定する。例えば、他者の顔の画像が、音声モデルと対応付けて予め登録されている場合は、車内カメラ１１から画像を取得し、取得した画像から他者の顔を認識する。そして、車両Ｖに乗車中の他者の顔と一致する顔の画像が、予め登録された画像の中あるか否かを判定する。 When the information presentation device 16 generates a voice model of another person who has conversed with the driver B1 using the function of the model generation unit 4, the information presentation device 16 may register the generated voice model in association with the other person. For example, the voice model is registered in advance in a recording medium such as a ROM of the information presentation device 16 or in a server external to the information presentation system 1. Then, when the information presentation device 16 acquires the voice of another person who is conversing with the driver B1 using the function of the acquisition unit 3, it determines whether or not the other person identified by the function of the identification unit 2 has conversed with the driver B1 in the past. For example, if an image of the face of the other person has been registered in advance in association with the voice model, it acquires an image from the in-vehicle camera 11 and recognizes the face of the other person from the acquired image. Then, it determines whether or not there is an image of a face that matches the face of the other person who is riding in the vehicle V among the pre-registered images.

車両Ｖに乗車中の他者の顔と一致する顔の画像が、予め登録された画像の中あると判定した場合は、情報提示装置１６は、車両Ｖの乗車中の他者が、運転者Ｂ１と過去に会話したことがあると認識し、当該他者に対応する、予め登録された音声モデルを音声生成部５に出力する。そして、出力された音声モデルを用いて案内音声を生成する。これに対し、車両Ｖに乗車中の他者の顔と一致する顔の画像が、予め登録された画像の中にないと判定した場合は、車両Ｖの乗車中の他者が、運転者Ｂ１と過去に会話したことがないと認識し、新たに音声モデルを生成する。 If it is determined that there is a facial image in the pre-registered images that matches the face of the other person in the vehicle V, the information presentation device 16 recognizes that the other person in the vehicle V has previously spoken with the driver B1, and outputs a pre-registered voice model corresponding to that other person to the voice generation unit 5. The output voice model is then used to generate a guidance voice. On the other hand, if it is determined that there is no facial image in the pre-registered images that matches the face of the other person in the vehicle V, it recognizes that the other person in the vehicle V has not previously spoken with the driver B1, and generates a new voice model.

例えば、図２に示す車両Ｖにおいて運転者Ｂ１と乗員Ｂ２が会話をしている場合は、情報提示装置１６は、車内カメラ１１の画像から乗員Ｂ２の顔を識別し、識別した乗員Ｂ２の顔と同じ顔の画像が、予め登録された画像の中に存在するか否かを判定する。識別した乗員Ｂ２の顔と同じ顔の画像が、予め登録された画像の中に存在すると判定した場合は、乗員Ｂ２に対応する、登録された音声モデルを用いて案内音声を生成する。これに対し、識別した乗員Ｂ２の顔と同じ顔の画像が、予め登録された画像の中に存在しないと判定した場合は、乗員Ｂ２の音声の特徴量を用いて、新たに音声モデルを生成する。 For example, when driver B1 and occupant B2 are having a conversation in vehicle V shown in FIG. 2, information presentation device 16 identifies the face of occupant B2 from the image from in-vehicle camera 11, and determines whether an image of the same face as the identified face of occupant B2 exists in pre-registered images. If it is determined that an image of the same face as the identified face of occupant B2 exists in pre-registered images, a guidance voice is generated using a registered voice model corresponding to occupant B2. On the other hand, if it is determined that an image of the same face as the identified face of occupant B2 does not exist in pre-registered images, a new voice model is generated using the voice features of occupant B2.
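The reuse of a registered voice model can be sketched as a lookup with a fallback. Face matching is abstracted here into a hypothetical identifier `person_id`, and the names `voice_model_for`, `build_model`, and the registry contents are likewise assumptions rather than elements of the embodiment.

```python
def voice_model_for(person_id, registry, features, build_model):
    """Return the registered voice model for a person, or build and register a new one.

    registry: dict mapping a person identifier to a voice model.
    build_model: callable that creates a voice model from voice features.
    """
    if person_id in registry:
        # The person has conversed with the driver before; reuse the stored model.
        return registry[person_id]
    model = build_model(features)
    registry[person_id] = model  # register for future conversations
    return model

# Hypothetical registry and a stand-in model builder.
registry = {"B2": "registered_model_B2"}
reused = voice_model_for("B2", registry, [120.0, 235.0], lambda f: {"built_from": f})
created = voice_model_for("B3", registry, [180.0, 360.0], lambda f: {"built_from": f})
```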

また、運転者Ｂ１と会話している他者が複数存在する場合は、情報提示装置１６は、モデル生成部４の機能により、発言時間が最も長い他者を選択し、音声生成部５の機能により、発言時間が最も長い他者の音声と類似する案内音声を生成してもよい。発言時間は、例えば、運転者Ｂ１が他者と会話をしていると判定してから、音声モデルの生成が完了するまでの発言時間とする。またこれに代え、運転者Ｂ１が他者と会話をしていると判定してから、上述の所定時間（例えば１５～６０秒）が経過するまでの発言時間としてもよい。情報提示装置１６は、出力部６の機能により、発言時間が最も長い他者の発話位置から案内音声を出力する。 In addition, if there are multiple other people conversing with the driver B1, the information presentation device 16 may use the function of the model generation unit 4 to select the other person who has spoken the longest, and use the function of the voice generation unit 5 to generate a guidance voice similar to the voice of the other person who has spoken the longest. The speech time may be, for example, the speech time from when it is determined that the driver B1 is conversing with another person to when the generation of the voice model is completed. Alternatively, it may be the speech time from when it is determined that the driver B1 is conversing with another person to when the above-mentioned predetermined time (for example, 15 to 60 seconds) has passed. The information presentation device 16 uses the function of the output unit 6 to output the guidance voice from the speaking position of the other person who has spoken the longest.
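Selecting the other person with the longest speech time can be sketched as follows, assuming utterances are logged as (speaker, start, end) tuples in seconds. The log format, the speaker label "B3", and the function name `longest_speaker` are hypothetical.

```python
def longest_speaker(speech_log):
    """Return the speaker with the greatest accumulated speech time.

    speech_log: list of (speaker_id, start_s, end_s) tuples.
    """
    totals = {}
    for speaker, start, end in speech_log:
        totals[speaker] = totals.get(speaker, 0.0) + (end - start)
    return max(totals, key=totals.get)

# Hypothetical log collected after a conversation with the driver was detected.
log = [("B2", 0.0, 4.0), ("B3", 4.0, 6.0), ("B2", 7.0, 9.0)]
target = longest_speaker(log)
```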

同様に、運転者Ｂ１と会話している他者が複数存在する場合は、情報提示装置１６は、音声生成部５の機能により他者ごとに案内音声を生成し、出力部６の機能により、生成した案内音声の中から、他者の音声と最も類似する案内音声を選択し、選択した案内音声を出力してもよい。他者の音声と最も類似するとは、他者の音声に含まれる周波数成分とその振幅と、生成した案内音声に含まれる周波数成分とその振幅との相違が最も小さいことを言う。なお、周波数成分とその振幅を算出する周波数解析は、生成した案内音声の全体に対して行う。 Similarly, if there are multiple other people conversing with the driver B1, the information presentation device 16 may generate a guidance voice for each other person using the function of the voice generation unit 5, and use the function of the output unit 6 to select from the generated guidance voices the guidance voice that is most similar to the voice of the other person, and output the selected guidance voice. "Most similar to the voice of the other person" means that the difference between the frequency components and their amplitudes contained in the voice of the other person and the frequency components and their amplitudes contained in the generated guidance voice is the smallest. Note that the frequency analysis to calculate the frequency components and their amplitudes is performed on the entire generated guidance voice.

同様に、運転者Ｂ１と会話している他者が複数存在する場合は、情報提示装置１６は、音声生成部５の機能により他者ごとに案内音声を生成し、取得部３の機能により、他者の中に発話していない他者が存在するか否かを判定する。情報提示装置１６は、例えば、車内カメラ１１から取得した画像を解析し、運転者Ｂ１と会話している乗員の口の動きを検出する。そして、当該乗員の口の動き（例えば口の開き方）から、当該乗員が発話しているか否かを判定する。情報提示装置１６は、発話していない他者が存在すると判定した場合は、発話していない他者の案内音声を出力する。これに対し、発話していない他者が存在しないと判定した場合は、案内音声の出力を一度停止し、発話していない他者が存在すると判定されるまで、当該判定を繰り返す。 Similarly, if there are multiple other people conversing with the driver B1, the information presentation device 16 generates a guidance voice for each other person using the function of the voice generation unit 5, and determines whether or not there is a non-speaking other person among the other people using the function of the acquisition unit 3. The information presentation device 16, for example, analyzes images acquired from the in-vehicle camera 11 to detect the mouth movement of the occupant conversing with the driver B1. Then, based on the mouth movement of the occupant (e.g., the opening of the mouth), determines whether or not the occupant is speaking. If the information presentation device 16 determines that there is a non-speaking other person, it outputs a guidance voice for the non-speaking other person. On the other hand, if it determines that there is no non-speaking other person, it stops outputting the guidance voice once, and repeats the determination until it determines that there is a non-speaking other person.
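The choice of a non-speaking other person can be sketched as below. The speaking states would come from the mouth-movement analysis described above; here they are given directly as a hypothetical dictionary, and the function name `choose_silent_other` is an assumption.

```python
def choose_silent_other(speaking_state):
    """Return the first other person who is not speaking, or None if all are speaking.

    speaking_state: dict mapping a person identifier to True while that person speaks.
    """
    for person, is_speaking in speaking_state.items():
        if not is_speaking:
            return person
    return None  # the caller holds the guidance voice and repeats the determination

# Hypothetical states for two passengers conversing with the driver.
silent = choose_silent_other({"B2": True, "B3": False})
```

When `None` is returned, the determination would be repeated until a non-speaking person is found, as described above.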

［情報提示システムにおける処理］ [Processing in Information Presentation System]
図３～６を参照して、情報提示装置１６が情報を処理する際の手順を説明する。図３～６は、本実施形態の情報提示システム１において実行される、情報の処理を示すフローチャートの一例である。以下に説明する処理は、情報提示装置１６のプロセッサ（ＣＰＵ）により所定の時間間隔で実行される。 The procedure for processing information by the information presentation device 16 will be described with reference to Figures 3 to 6. Figures 3 to 6 are examples of flowcharts showing information processing executed in the information presentation system 1 of this embodiment. The processing described below is executed at predetermined time intervals by the processor (CPU) of the information presentation device 16.

なお、図３～６に示すフローチャートは、車両Ｖが運転者Ｂ１による手動運転で走行している走行シーンを前提とする。手動運転とは、車載機器ではなく、ドライバーの操作により車両の走行を制御することを言うものとする。 The flowcharts shown in Figures 3 to 6 are based on the premise that the vehicle V is being driven manually by the driver B1. Manual driving refers to controlling the vehicle's driving by the driver's operation, rather than by on-board equipment.

まず、図３のステップＳ１にて、特定部２の機能により、車内カメラ１１が撮影した画像から、車両Ｖの乗員の人数と各乗員の着座位置とを特定する。続くステップＳ２にて、取得部３の機能により、マイクロフォン１２から車室内の音声を取得し、ステップＳ３にて、車内カメラ１１が撮影した画像から、乗員（特に運転者Ｂ１）の口の動きを検出する。ステップＳ４にて、取得部３の機能により、運転者Ｂ１が他者と会話しているか否かを判定する。運転者Ｂ１が他者と会話していないと判定した場合は、ステップＳ２に進む。これに対し、運転者Ｂ１が他者と会話していると判定した場合は、ステップＳ５に進む。 First, in step S1 of FIG. 3, the identification unit 2 functions to identify the number of occupants in the vehicle V and the seating positions of each occupant from the image captured by the in-vehicle camera 11. In the following step S2, the acquisition unit 3 functions to acquire voices within the vehicle cabin from the microphone 12, and in step S3, the mouth movements of the occupants (particularly the driver B1) are detected from the image captured by the in-vehicle camera 11. In step S4, the acquisition unit 3 functions to determine whether the driver B1 is talking to another person. If it is determined that the driver B1 is not talking to another person, the process proceeds to step S2. On the other hand, if it is determined that the driver B1 is talking to another person, the process proceeds to step S5.

ステップＳ５にて、運転者Ｂ１が会話している他者が、車両Ｖの同乗者であるか否かを判定する。当該他者が、車両Ｖの同乗者でないと判定した場合は、図４のステップＳ２１に進む。これに対し、当該他者が、車両Ｖの同乗者であると判定した場合は、ステップＳ６に進む。図４については後述する。ステップＳ６にて、同乗者が複数人であるか否かを判定する。同乗者が複数人であると判定した場合は、図５のステップＳ４１又は図６のステップＳ６１に進む。これに対し、同乗者が複数人でないと判定した場合は、ステップＳ７に進む。 In step S5, it is determined whether the other person with whom the driver B1 is talking is a passenger in the vehicle V. If it is determined that the other person is not a passenger in the vehicle V, the process proceeds to step S21 in FIG. 4. On the other hand, if it is determined that the other person is a passenger in the vehicle V, the process proceeds to step S6. FIG. 4 will be described later. In step S6, it is determined whether there are multiple passengers. If it is determined that there are multiple passengers, the process proceeds to step S41 in FIG. 5 or step S61 in FIG. 6. On the other hand, if it is determined that there are not multiple passengers, the process proceeds to step S7.
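The branching in steps S4 to S6 can be summarized in a small dispatch sketch. The function name `next_step` and its boolean and integer inputs are hypothetical stand-ins for the determinations made by the device.

```python
def next_step(in_conversation, other_is_passenger, passenger_count):
    """Return the label of the step that follows the determinations in S4 to S6."""
    if not in_conversation:
        return "S2"        # S4: no conversation, acquire cabin voice again
    if not other_is_passenger:
        return "S21"       # S5: the other person is a caller (FIG. 4)
    if passenger_count > 1:
        return "S41/S61"   # S6: multiple passengers (FIG. 5 or FIG. 6)
    return "S7"            # single passenger, continue in FIG. 3

step = next_step(in_conversation=True, other_is_passenger=True, passenger_count=1)
```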

In step S7, the acquisition unit 3 removes the audio output from the vehicle's devices from the audio in the vehicle cabin, and in step S8, the first voice is extracted from the audio in the vehicle cabin. In step S9, frequency analysis is performed on the first voice, and in step S10, the features of the first voice are acquired. In step S11, the model generation unit 4 generates a voice model using the features, and in step S12, the voice generation unit 5 generates the first guidance voice using the generated voice model.
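As a rough illustration of steps S9 and S10, the sketch below runs a naive discrete Fourier transform over a block of samples and derives two simple spectral features. The publication does not say which features are used, so the choice of dominant frequency and spectral centroid, as well as the function names, are assumptions made for this example only.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0 .. n/2 (O(n^2), for illustration)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        spectrum.append(abs(acc))
    return spectrum


def voice_features(samples, sample_rate):
    """Return assumed example features: dominant frequency and centroid."""
    spectrum = magnitude_spectrum(samples)
    n = len(samples)
    freqs = [k * sample_rate / n for k in range(len(spectrum))]
    total = sum(spectrum)
    peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
    centroid = (sum(f * m for f, m in zip(freqs, spectrum)) / total
                if total else 0.0)
    return {"dominant_hz": freqs[peak_bin], "centroid_hz": centroid}
```

A practical system would more likely use a fast Fourier transform and richer speaker features; the naive transform is used here only to keep the example free of external libraries.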

In step S13, the output unit 6 determines whether the proportion of a predetermined time period occupied by the speech of the driver B1 and the other person is equal to or less than a predetermined value. If the proportion is equal to or less than the predetermined value, the process proceeds to step S14, where the first guidance voice is output from the passenger's position toward the driver, and execution of this routine ends. If the proportion exceeds the predetermined value, step S13 is repeated.
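The timing check of step S13 can be sketched as follows, assuming that speech activity is available as a list of (start, end) intervals in seconds. The interval representation, the function names, and the numeric values in the usage note are assumptions; the publication only states that the proportion is compared against a predetermined value.

```python
def speech_ratio(intervals, window_start, window_end):
    """Fraction of the window covered by at least one speech interval."""
    # Clip intervals to the window and sort them by start time.
    clipped = sorted(
        (max(start, window_start), min(end, window_end))
        for start, end in intervals
        if end > window_start and start < window_end
    )
    covered = 0.0
    cur_start = cur_end = None
    for start, end in clipped:
        if cur_end is None or start > cur_end:
            # Close the previous merged interval before starting a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping speech (driver and partner) is counted once.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / (window_end - window_start)


def should_output_guidance(intervals, window_start, window_end, threshold):
    """Step S13: output only when the speech ratio is at or below threshold."""
    return speech_ratio(intervals, window_start, window_end) <= threshold
```

For example, with speech at `[(0, 2), (1, 3), (8, 12)]` inside a window from 0 to 10, the covered time is 5 seconds, so the ratio is 0.5 and guidance would be output for any threshold of 0.5 or higher.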

Next, the case where it is determined in step S5 of FIG. 3 that the other person is not a passenger of the vehicle V will be described with reference to FIG. 4.

First, in step S21 of FIG. 4, the acquisition unit 3 acquires the second voice from the communication device 13. In step S22, frequency analysis is performed on the second voice, and in step S23, the features of the second voice are acquired. In step S24, the model generation unit 4 determines whether the caller has conversed with the driver B1 in the past. If the caller has conversed with the driver B1 in the past, the process proceeds to step S25, where the second guidance voice is generated using a pre-registered voice model. If the caller has not conversed with the driver B1 in the past, the process proceeds to step S26, where a voice model is generated using the features, and in step S27, the voice generation unit 5 generates the second guidance voice using the generated voice model.
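Steps S24 to S26 amount to a lookup in a store of registered voice models with a fallback to model generation. The sketch below assumes callers can be told apart by some identifier and that a model-building function is supplied from outside; `VoiceModelRegistry`, `model_for`, and `build_model` are hypothetical names. Registering the newly built model for later calls is also an assumption, made here because the embodiment registers the voice models of people who have conversed with the driver.

```python
class VoiceModelRegistry:
    """Hypothetical store of voice models keyed by conversation partner."""

    def __init__(self, build_model):
        self._build_model = build_model  # callable: features -> model
        self._models = {}

    def model_for(self, caller_id, features):
        """Return (model, newly_built) for the given caller."""
        if caller_id in self._models:
            # Step S24 "yes": reuse the registered model (step S25).
            return self._models[caller_id], False
        # Step S24 "no": generate a model from the features (step S26).
        model = self._build_model(features)
        self._models[caller_id] = model  # assumed: register for next time
        return model, True
```

Reusing a registered model skips the generation step on repeat calls.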

In step S28, the output unit 6 determines whether the caller is speaking. If the caller is speaking, the process proceeds to step S29, where a separate sound whose waveform is in antiphase with the waveform of the second voice is output, and then proceeds to step S30. If the caller is not speaking, the process proceeds directly to step S30. In step S30, the second guidance voice is output toward the driver from the speaker that outputs the second voice, and execution of this routine ends.
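The antiphase output of step S29 can be illustrated on sampled waveforms: inverting the sign of every sample yields a signal that cancels the original when the two are summed. The sketch assumes the idealized case of perfectly aligned digital samples; the function names are hypothetical, and the publication does not describe how delay or cabin acoustics are handled.

```python
def antiphase(samples):
    """Return the waveform with every sample inverted (180 degree shift)."""
    return [-x for x in samples]


def mix(a, b):
    """Sum two equally long waveforms sample by sample."""
    return [x + y for x, y in zip(a, b)]
```

With `voice = [0.5, -0.25, 0.0, 1.0]`, `mix(voice, antiphase(voice))` is all zeros, which is the effect that lets the guidance voice be heard while the caller is speaking.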

Next, the case where it is determined in step S6 of FIG. 3 that there are multiple passengers will be described with reference to FIG. 5.

First, in step S41 of FIG. 5, the first voice of each passenger is extracted from the audio in the vehicle cabin. In step S42, frequency analysis is performed on each passenger's first voice, and in step S43, the features of each passenger's first voice are acquired. In step S44, the model generation unit 4 selects, for each passenger, the voice model whose features are most similar to the features of that passenger's first voice, and in step S45, the voice generation unit 5 generates the first guidance voice using the selected voice model. In step S46, it is determined whether there is a passenger who is not speaking.

If there is a passenger who is not speaking, the process proceeds to step S47, where the output unit 6 outputs the first guidance voice of the passenger who is not speaking, and in step S48, the speakers in the vehicle are controlled so that the driver perceives that passenger as having spoken. Execution of this routine then ends. If there is no passenger who is not speaking, the process proceeds to step S49, where the first guidance voice most similar to the first voice is selected, and in step S50, the selected first guidance voice is output toward the driver B1 from the speaker 15 closest to the passenger's seating position. Execution of this routine then ends.
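Step S50 selects the speaker 15 closest to the passenger's seating position. A minimal sketch, assuming that seats and speakers are given as coordinates in a common cabin frame: the coordinate values in the usage note are invented for illustration, and only the speaker labels 15a to 15d come from the reference signs list.

```python
import math


def nearest_speaker(seat_position, speakers):
    """Return the label of the speaker closest to the seat position.

    `speakers` maps a speaker label to its (x, y) position; the coordinate
    frame is an assumption of this sketch.
    """
    return min(speakers,
               key=lambda label: math.dist(speakers[label], seat_position))
```

For instance, with speakers at `{"15a": (0, 0), "15b": (1, 0), "15c": (0, 2), "15d": (1, 2)}` and a seat at `(0.9, 0.2)`, the function returns `"15b"`.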

Next, for the case where it is determined in step S6 of FIG. 3 that there are multiple passengers, a process different from the one shown in FIG. 5 will be described with reference to FIG. 6.

First, in step S61 of FIG. 6, the model generation unit 4 selects the passenger with the longest speaking time. In step S62, the first voice of that passenger is extracted; in step S63, frequency analysis is performed on the extracted first voice; and in step S64, the features of that passenger's first voice are acquired. In step S65, the voice model whose features are most similar to the acquired features is selected, and in step S66, the voice generation unit 5 generates the first guidance voice using the selected voice model.
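Step S61 can be sketched by summing each passenger's utterance durations and taking the maximum. The (passenger, start, end) tuple format and the function name are assumptions; the passenger labels B2 to B4 follow the reference signs list.

```python
def longest_speaker(utterances):
    """Return the passenger whose total speaking time is the longest.

    `utterances` is an iterable of (passenger, start, end) tuples in seconds.
    """
    totals = {}
    for passenger, start, end in utterances:
        totals[passenger] = totals.get(passenger, 0.0) + (end - start)
    return max(totals, key=totals.get)
```

Given `[("B2", 0, 3), ("B3", 3, 5), ("B2", 6, 7), ("B4", 7, 10)]`, the totals are 4, 2, and 3 seconds, so `"B2"` is selected.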

In step S67, the output unit 6 determines whether the guidance voice is a voice requesting an operation from the driver. If the guidance voice requests an operation from the driver, the process proceeds to step S68, where the first guidance voice is output preferentially from the position of the passenger with the longest speaking time toward the driver B1. If the guidance voice does not request an operation from the driver, the process proceeds to step S69, where another guidance voice that requests an operation from the driver is output preferentially.

In step S70, it is determined whether the other guidance voice requesting an operation from the driver B1 has been output. If it has been output, the process proceeds to step S71, where the first guidance voice is output from the position of the passenger with the longest speaking time toward the driver B1, and execution of this routine ends. If it has not been output, the process returns to step S69. Note that the information presentation device 16 according to this embodiment is applicable not only when the driver B1 is conversing with another person but also when other people are conversing with each other.
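The prioritization in steps S67 to S71 amounts to outputting guidance that requests an operation before guidance that does not. A minimal sketch, assuming pending guidance is held in a list: the `Guidance` class, the `output_order` function, and the example texts in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guidance:
    text: str
    requests_operation: bool  # True if the driver must perform an operation


def output_order(pending):
    """Order guidance so that operation requests are output first.

    `sorted` is stable, so items with the same priority keep their
    original relative order.
    """
    return sorted(pending, key=lambda g: not g.requests_operation)
```

For a pending list of "Gas station ahead" (no operation), "Turn left in 300 m" (operation), and "Traffic on route" (no operation), the turn instruction is moved to the front and the other two keep their order.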

[Embodiments of the Invention]
As described above, the present embodiment provides an information presentation method for a vehicle V, executed by a processor, in which, when the driver B1 of the vehicle V is conversing with another person, the processor acquires the voice of the other person, generates a guidance voice that is similar to that voice and presents information to the driver B1, and outputs the guidance voice from the speaking position of the other person. This makes it less likely that the driver B1 misses the guidance voice while conversing with another person, without impeding the driving operation of the driver B1.

According to the information presentation method of this embodiment, when the other person is a passenger riding in the vehicle V, the processor acquires a first voice of the passenger, generates a first guidance voice that is similar to the first voice and presents to the driver B1 information for the vehicle V to travel along a travel route, and outputs the first guidance voice from the passenger toward the driver B1. This makes it less likely that the driver B1 misses the guidance voice while conversing with the passenger, without impeding the driving operation of the driver B1.

According to the information presentation method of this embodiment, when the other person is a caller outside the vehicle who is conversing with the driver B1 via the communication device 13, the processor acquires a second voice of the caller, generates a second guidance voice that is similar to the second voice and presents to the driver B1 information for the vehicle V to travel along a travel route, and outputs the second guidance voice toward the driver B1 from the speaker that outputs the second voice. This makes it less likely that the driver B1 misses the guidance voice while conversing with the caller, without impeding the driving operation of the driver B1.

According to the information presentation method of this embodiment, the processor performs frequency analysis on the voice to calculate features of the voice, selects, from a plurality of pre-registered voice models, the voice model whose features are most similar to the calculated features, and generates the guidance voice using the selected voice model. This makes it possible to omit the step of generating a voice model.
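Selecting the most similar pre-registered voice model can be sketched as a nearest-neighbor search over feature vectors. The publication does not name a similarity measure, so cosine similarity is used here purely as an assumption; the function names and the model labels in the usage note are likewise hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equally long feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_voice_model(features, registered):
    """Return the label of the registered model most similar to `features`.

    `registered` maps a model label to that model's feature vector.
    """
    return max(registered,
               key=lambda label: cosine_similarity(features, registered[label]))
```

With registered vectors `{"model_a": [1, 0, 0], "model_b": [0.6, 0.8, 0], "model_c": [0, 0, 1]}` and measured features `[0.5, 0.9, 0.1]`, the selection is `"model_b"`.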

According to the information presentation method of this embodiment, when there are multiple other persons, the processor selects the other person with the longest speaking time, generates the guidance voice so that it is similar to the voice of that person, and outputs the guidance voice from the speaking position of that person. This makes it possible to use the voice of the person at the center of the conversation and to output the guidance voice that the driver B1 is most likely to notice.

According to the information presentation method of this embodiment, when there are multiple other persons, the processor generates the guidance voice for each of the other persons, selects from the generated guidance voices the one most similar to the voice, and outputs the selected guidance voice. This reduces any sense of unnaturalness that the driver B1 feels toward the guidance voice.

According to the information presentation method of this embodiment, the processor determines whether the proportion of a predetermined time period occupied by the speech of the driver B1 and the other person has become equal to or less than a predetermined value, and outputs the guidance voice at the timing when the proportion is determined to have become equal to or less than the predetermined value. This further reduces the likelihood that the driver B1 misses the guidance voice during a conversation.

According to the information presentation method of this embodiment, the processor determines whether the guidance voice is a voice requesting an operation from the driver B1, and when the guidance voice is determined to request an operation, outputs it in preference to voices that do not request an operation. This allows highly important information to be conveyed first while also limiting interruptions to the conversation.

According to the information presentation method of this embodiment, when outputting the guidance voice, the processor determines whether the other person is speaking, and when the other person is determined to be speaking, outputs a separate sound whose waveform is in antiphase with the waveform of the voice. This further reduces the likelihood that the driver B1 misses the guidance voice during a conversation.

According to the information presentation method of this embodiment, the processor registers in advance a voice model of each other person who has conversed with the driver B1, in association with that person, determines whether the other person has conversed with the driver B1 in the past, and when the other person is determined to have done so, generates the guidance voice using the pre-registered voice model corresponding to that person. This makes it possible to omit the step of generating a voice model.

According to the information presentation method of this embodiment, when there are multiple other persons, the processor generates the guidance voice for each of the other persons, determines whether any of them is not speaking, and when one of them is determined not to be speaking, outputs the guidance voice of that person. This further reduces the likelihood that the driver B1 misses the guidance voice during a conversation.

According to the information presentation method of this embodiment, the processor identifies the number of passengers and the seating position of each passenger. This makes it possible to confirm that multiple people are riding in the vehicle V.

According to the information presentation method of this embodiment, the processor acquires the audio in the cabin of the vehicle V and extracts the first voice from that audio. This reduces noise that affects the generation of the voice model and makes the guidance voice more similar to the first voice.

According to the information presentation method of this embodiment, when extracting the first voice from the audio in the vehicle cabin, the processor removes from that audio the audio output from the devices of the vehicle V. This reduces noise that affects the generation of the voice model more precisely.

According to the information presentation method of this embodiment, the processor controls the speakers 15 in the vehicle so that the driver B1 perceives the passenger as having spoken. This allows multiple speakers 15 to operate in concert and further reduces the likelihood that the driver B1 misses the guidance voice during a conversation.

According to the information presentation method of this embodiment, the processor outputs the guidance voice from the speaker 15 closest to the passenger's seating position. This brings the position from which the guidance voice is output closer to the passenger and further reduces the likelihood that the driver B1 misses the guidance voice during a conversation.

This embodiment also provides an information presentation device 16 for a vehicle V, comprising: an acquisition unit 3 that acquires the voice of another person when the driver B1 of the vehicle V is conversing with that person; a voice generation unit 5 that generates a guidance voice that is similar to the voice and presents information to the driver B1; and an output unit 6 that outputs the guidance voice from the speaking position of the other person. This makes it less likely that the driver B1 misses the guidance voice while conversing with another person, without impeding the driving operation of the driver B1.

Reference Signs List
1 … Information presentation system
11 … In-vehicle camera
12, 12a, 12b, 12c … Microphone
13, 13a … Communication device
14 … Navigation device
15, 15a, 15b, 15c, 15d … Speaker
16 … Information presentation device
2 … Identification unit
3 … Acquisition unit
4 … Model generation unit
5 … Voice generation unit
6 … Output unit
A1 … Driver's seat
A2 … Front passenger seat
A3, A4 … Rear seat
B1 … Driver
B2, B3, B4 … Occupant
V … Vehicle

Claims

A vehicle information presentation method executed by a processor, wherein the processor:
acquires, when a driver of the vehicle is conversing with another person, a voice of the other person;
generates a guidance voice that is similar to the voice and presents information to the driver; and
outputs the guidance voice from a speaking position of the other person.

The vehicle information presentation method according to claim 1, wherein, when the other person is a passenger riding in the vehicle, the processor:
acquires a first voice of the passenger;
generates a first guidance voice that is similar to the first voice and presents to the driver information for the vehicle to travel along a travel route; and
outputs the first guidance voice from the passenger toward the driver.

The vehicle information presentation method according to claim 1, wherein, when the other person is a caller outside the vehicle who is conversing with the driver via a communication device, the processor:
acquires a second voice of the caller;
generates a second guidance voice that is similar to the second voice and presents to the driver information for the vehicle to travel along a travel route; and
outputs the second guidance voice toward the driver from a speaker that outputs the second voice.

The vehicle information presentation method according to claim 1, wherein the processor:
performs frequency analysis on the voice to calculate features of the voice;
selects, based on the features, the voice model whose features are most similar from a plurality of pre-registered voice models; and
generates the guidance voice using the selected voice model.

The vehicle information presentation method according to claim 1, wherein, when there are a plurality of other persons, the processor:
selects the other person with the longest speaking time;
generates the guidance voice so that it is similar to the voice of the other person with the longest speaking time; and
outputs the guidance voice from the speaking position of the other person with the longest speaking time.

The vehicle information presentation method according to claim 1 or 4, wherein, when there are a plurality of other persons, the processor:
generates the guidance voice for each of the other persons;
selects, from among the generated guidance voices, the guidance voice most similar to the voice; and
outputs the guidance voice most similar to the voice.

The vehicle information presentation method according to claim 1, wherein the processor:
determines whether a proportion of a predetermined time period occupied by the speech of the driver and the other person has become equal to or less than a predetermined value; and
outputs the guidance voice at the timing when the proportion is determined to have become equal to or less than the predetermined value.

The vehicle information presentation method according to claim 1 or 4, wherein the processor:
determines whether the guidance voice is a voice requesting an operation from the driver; and
when the guidance voice is determined to be a voice requesting the operation, outputs the voice requesting the operation in preference to a voice not requesting the operation.

The vehicle information presentation method according to claim 1, wherein the processor:
determines, when outputting the guidance voice, whether the other person is speaking; and
when the other person is determined to be speaking, outputs a separate sound whose waveform is in antiphase with the waveform of the voice.

The vehicle information presentation method according to claim 1, wherein the processor:
registers in advance a voice model of the other person who has conversed with the driver, in association with the other person;
determines whether the other person has conversed with the driver in the past; and
when the other person is determined to have conversed with the driver in the past, generates the guidance voice using the pre-registered voice model corresponding to the other person.

The vehicle information presentation method according to claim 1 or 4, wherein, when there are a plurality of other persons, the processor:
generates the guidance voice for each of the other persons;
determines whether any of the other persons is not speaking; and
when one of the other persons is determined not to be speaking, outputs the guidance voice of the other person who is not speaking.

The vehicle information presentation method according to claim 2, wherein the processor identifies the number of passengers and the seating position of each passenger.

The vehicle information presentation method according to claim 2 or 12, wherein the processor:
acquires audio in the cabin of the vehicle; and
extracts the first voice from the audio in the vehicle cabin.

The vehicle information presentation method according to claim 13, wherein, when extracting the first voice from the audio in the vehicle cabin, the processor removes from the audio in the vehicle cabin the audio output from a device of the vehicle.

The vehicle information presentation method according to claim 2 or 12, wherein the processor controls a speaker in the vehicle so that the driver perceives the passenger as having spoken.

The vehicle information presentation method according to claim 2 or 12, wherein the processor outputs the guidance voice from the speaker closest to the seating position of the passenger.

A vehicle information presentation device comprising:
an acquisition unit that acquires a voice of another person when a driver of the vehicle is conversing with the other person;
a voice generation unit that generates a guidance voice that is similar to the voice and presents information to the driver; and
an output unit that outputs the guidance voice from a speaking position of the other person.