JP5853746B2 - Audio output device and audio output system

Info

Publication number: JP5853746B2
Application number: JP2012025052A
Authority: JP
Inventors: 俊兵花田
Original assignee: Denso Corp
Current assignee: Denso Corp
Priority date: 2012-02-08
Filing date: 2012-02-08
Publication date: 2016-02-09
Anticipated expiration: 2032-02-08
Also published as: JP2013161038A

Description

The present invention relates to an audio output device that outputs, as guidance voice for a preset guidance point, a hybrid voice composed of a pre-recorded recorded voice and a newly generated synthesized voice, and to an audio output system comprising this audio output device and an external device communicably connected to it.

Devices are conventionally known that carry a speech synthesis library, generate synthesized voice data from input speech, and output synthesized voice based on that data (see, for example, Patent Document 1). However, generating synthesized voice data is a heavy processing load, so its generation tends to be delayed. Consequently, when a hybrid voice composed of a pre-recorded recorded voice and a newly generated synthesized voice is output, a gap in the sound occurs between the recorded voice and the late synthesized voice; that is, the output of the recorded voice and the output of the synthesized voice do not continue smoothly and are interrupted partway.

Patent Document 1: JP 2005-142926 A

The present invention has been made in view of the above circumstances. Its object is to provide an audio output device that, when outputting a hybrid voice composed of a pre-recorded recorded voice and a newly generated synthesized voice, can avoid an interruption between the output of the recorded voice and the output of the synthesized voice, and to provide an audio output system comprising this audio output device and an external device communicably connected to it.

According to the present invention, the audio output device mounted on a moving body generates hybrid voice data corresponding to a guidance point located ahead in the traveling direction of the moving body before the moving body reaches that guidance point. It does so using recorded voice data that the recorded voice data extraction means has extracted from the recorded voice data storage unit and synthesized voice data generated by an external device communicably connected to the audio output device. When the moving body reaches the guidance point, the device outputs the hybrid voice based on that hybrid voice data.
In other words, the hybrid voice data corresponding to a guidance point ahead of a moving body such as a vehicle is generated in advance, before the moving body reaches that guidance point. As a result, even if generation of the synthesized voice data, which is a heavy processing load, is delayed, the hybrid voice data composed of recorded voice data and synthesized voice data can be generated with time to spare before the moving body carrying the audio output device reaches the guidance point. Therefore, when the hybrid voice is output as guidance voice based on the generated hybrid voice data, an interruption between the output of the recorded voice and the output of the synthesized voice can be avoided.

FIG. 1 is a functional block diagram schematically showing the configuration of an audio output system according to an embodiment of the present invention.
FIG. 2(a) shows an example of the voice data table, and FIG. 2(b) shows an example of the necessary voice data table.
FIG. 3 schematically shows the synthesized voice data buffer unit and the hybrid voice data buffer unit.
FIG. 4 shows an example of the route guidance screen.
FIG. 5 is a flowchart showing the operation of the audio output system.
FIG. 6 shows the process by which hybrid voice data is generated: (a) shows the state in which synthesized voice data has been stored, (b) shows the state in which recorded voice data has additionally been stored, and (c) shows the state in which the hybrid voice data has been generated.
FIG. 7 is a flowchart showing the hybrid voice data generation processing.
FIG. 8 is a diagram corresponding to FIG. 7 for a modification.
FIG. 9 is a diagram corresponding to FIG. 4 for a modification.

An embodiment of the present invention is described below with reference to the drawings. As shown in FIG. 1, the audio output system 10 comprises a navigation device 11 mounted on a moving body such as a vehicle, and a mobile communication terminal 41 communicably connected to the navigation device 11. The navigation device 11 corresponds to the “audio output device” recited in the claims, and the mobile communication terminal 41 corresponds to the “external device” recited in the claims.
The navigation device 11 includes a control unit 12, a position detection unit 13, a data communication unit 14, an external storage unit 15, an internal storage unit 16, a display output unit 17, an audio output unit 18, an operation input unit 19, a navigation function unit 20, a user setting function unit 21, a speech synthesis function unit 22, and so on. The control unit 12 is built mainly around a microcomputer having a CPU, RAM, ROM, an I/O bus, and the like (none shown). In accordance with a computer program stored in a storage medium such as the ROM, the control unit 12 controls the overall operation of the navigation device 11, including the various display output operations, audio output operations, and route guidance operations.

By executing the computer program, the control unit 12 also virtually implements, in software, a guidance point detection processing unit 31, a recorded voice data extraction processing unit 32, a synthesized voice data holding processing unit 33, a hybrid voice data generation processing unit 34, and a hybrid voice output processing unit 35. The recorded voice data extraction processing unit 32 corresponds to the recorded voice data extraction means recited in the claims, the hybrid voice data generation processing unit 34 corresponds to the hybrid voice data generation means, the hybrid voice output processing unit 35 corresponds to the hybrid voice output means, and the synthesized voice data generation processing unit 36 corresponds to the synthesized voice data generation means provided in the audio output device.

The position detection unit 13 is a detection module for detecting the current position of the vehicle carrying the navigation device 11. It has various sensors, including a direction sensor 13a, a gyro sensor 13b, a distance sensor 13c, and a positioning radio wave receiver 13d. The direction sensor 13a detects the heading of the vehicle. The gyro sensor 13b detects the rotation angle of the vehicle. The distance sensor 13c detects the distance traveled by the vehicle. The positioning radio wave receiver 13d receives radio waves transmitted from positioning satellites (not shown) so that the current position of the vehicle can be determined by a positioning system. The position detection unit 13 detects the current position of the vehicle while letting the detection data from these sensors complement one another, and outputs current position information indicating that position to the control unit 12.

The data communication unit 14 is a communication module that establishes a communication link with the data communication unit 43 of the mobile communication terminal 41, described later, and exchanges various data with the mobile communication terminal 41 over that link. In this case, the data communication unit 14 and the data communication unit 43 establish a wireless short-range communication link. Short-range communication in this embodiment mainly assumes a range that covers the cabin of a typical vehicle. The range can, however, be changed as appropriate according to factors such as the type and size of the vehicle and the communication performance of the navigation device 11 and the mobile communication terminal 41, and a communication function suited to that range can be adopted.

The external storage unit 15 corresponds to the recorded voice data storage unit recited in the claims. In this case it is a storage medium, such as an SD card, that can be attached to and detached from the navigation device 11.
The external storage unit 15 stores the voice data table T1 shown in FIG. 2(a), in which recorded voice data or text data is stored in association with each voice ID. Recorded voice data is data of a voice recorded in advance; for example, the recorded voice data [In half of a mile,] is data for outputting the pre-recorded utterance “In half of a mile”. The text data [ABC Street] is data storing the character string “A”, “B”, “C”, “S”, “t”, “r”, “e”, “e”, “t”.

The external storage unit 15 also stores the necessary voice data table T2 shown in FIG. 2(b). The necessary voice data table T2 is a data table containing the “necessary voice identification information” recited in the claims. For each of a plurality of preset guidance points, it specifies, by the voice IDs assigned to the individual voice data, the recorded voice data and synthesized voice data needed to generate the hybrid voice data for that guidance point. For example, the voice data needed to generate the hybrid voice data for guidance point a is the voice data with voice IDs 1, 2, 3, and 4: the recorded voice data [In half of a mile,] of voice ID [1], the recorded voice data [right turn] of voice ID [2], and the recorded voice data [onto] of voice ID [3] shown in FIG. 2(a), together with the synthesized voice data [ABC Street], which is newly generated, as described in detail later, from the text data [ABC Street] of voice ID [4].
Note that a “guidance point” here is not a point such as the intersection between the road currently being traveled and the road to be guided onto next. It is the point, set short of such an intersection, at which output of the guidance voice starts (the guidance voice output start point).
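The relationship between the voice data table T1 and the necessary voice data table T2 can be pictured with a minimal Python sketch. The dictionary layout, the `VoiceEntry` type, and the function name are illustrative assumptions, and plain strings stand in for the actual recorded audio; only the voice IDs, the example phrases, and guidance point a come from the description.

```python
from typing import NamedTuple


class VoiceEntry(NamedTuple):
    kind: str     # "recorded" or "text" (assumed labels)
    content: str  # stand-in for a recorded clip, or the text to synthesize


# Voice data table T1: voice ID -> recorded voice data or text data.
T1 = {
    1: VoiceEntry("recorded", "In half of a mile,"),
    2: VoiceEntry("recorded", "right turn"),
    3: VoiceEntry("recorded", "onto"),
    4: VoiceEntry("text", "ABC Street"),
}

# Necessary voice data table T2: guidance point -> voice IDs in output order.
T2 = {"a": [1, 2, 3, 4]}


def required_voice_data(point):
    """Split the voice IDs needed for a guidance point into
    recorded-voice IDs and IDs whose text must be synthesized."""
    recorded, to_synthesize = [], []
    for voice_id in T2[point]:
        if T1[voice_id].kind == "recorded":
            recorded.append(voice_id)
        else:
            to_synthesize.append(voice_id)
    return recorded, to_synthesize
```

For guidance point a, this lookup separates voice IDs 1 to 3, which can be read directly from storage, from voice ID 4, which requires speech synthesis.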

The internal storage unit 16 is a storage medium, such as a memory, built into the navigation device 11. In this case it has a synthesized voice data buffer unit 16A and a hybrid voice data buffer unit 16B. As shown in FIG. 3, each of these buffer units has a plurality of storage areas, and each storage area holds one item of voice data. The synthesized voice data buffer unit 16A is a storage area for temporarily holding synthesized voice data received from the mobile communication terminal 41, as described in detail later. The hybrid voice data buffer unit 16B is a storage area in which hybrid voice data composed of recorded voice data and synthesized voice data is generated and held. The hybrid voice data buffer unit 16B is provided with a recorded voice data storage section 16Ba for storing recorded voice data and a synthesized voice data storage section 16Bb for storing synthesized voice data.
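As a rough illustration of the two buffer units, the following Python sketch models buffer unit 16A as a dictionary and buffer unit 16B as a class with two sections. The class and function names, the keying of slots by voice ID, and the joining of data by simple byte concatenation are all assumptions made for illustration; the description only states that each storage area holds one item of voice data.

```python
class HybridVoiceBuffer:
    """Stand-in for hybrid voice data buffer unit 16B."""

    def __init__(self):
        self.recorded = {}     # assumed model of storage section 16Ba
        self.synthesized = {}  # assumed model of storage section 16Bb

    def store_recorded(self, voice_id, data):
        self.recorded[voice_id] = data

    def store_synthesized(self, voice_id, data):
        self.synthesized[voice_id] = data

    def is_complete(self, order):
        """True when every voice ID in the output order has data."""
        return all(v in self.recorded or v in self.synthesized for v in order)

    def assemble(self, order):
        """Join the stored items in output order (assumed: concatenation)."""
        if not self.is_complete(order):
            raise ValueError("voice data missing for some voice IDs")
        return b"".join(
            self.recorded[v] if v in self.recorded else self.synthesized[v]
            for v in order
        )


def transfer(synth_buffer, hybrid, voice_id):
    """Move one synthesized item from the 16A stand-in into section 16Bb."""
    hybrid.store_synthesized(voice_id, synth_buffer.pop(voice_id))
```

Keeping 16A separate from 16B lets synthesized data that arrives early wait until the hybrid data for its guidance point is being put together.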

The display output unit 17 has a display screen consisting of a color display such as a liquid crystal or organic EL display, and displays various information based on display output signals from the control unit 12. The screen of the display output unit 17 is a touch panel. The display output unit 17 shows, for example, the route guidance screen G1 shown in FIG. 4 and various setting screens (not shown).

The audio output unit 18 is, for example, a speaker, and outputs various sounds based on audio output signals from the control unit 12. The sounds it outputs include guidance voice based on the hybrid voice data described in detail later, other route guidance voice, and voice explaining operations.
The operation input unit 19 consists of various switches, such as mechanical switches provided near the screen of the display output unit 17 and touch panel switches provided on that screen. The user can perform various setting operations using the switches of the operation input unit 19.

The navigation function unit 20 performs the so-called navigation function: for example, it searches for a route from the current position of the vehicle, identified from the current position information input from the position detection unit 13, to a destination entered through the operation input unit 19, and guides the vehicle along that route. When route guidance is not being performed, the navigation function unit 20 displays, as the default display, a map of the area around the current position of the vehicle on the display output unit 17 and overlays on it a current position mark indicating the current position and traveling direction of the vehicle. In doing so, the navigation function unit 20 performs map matching, which aligns the current position mark with a road on the map. The current position mark moves over the map as the vehicle travels, and the map shown on the screen of the display output unit 17 scrolls according to the current position of the vehicle. When route guidance is being performed, the navigation function unit 20 displays, for example, the route guidance screen G1 shown in FIG. 4 on the display output unit 17, highlights the searched guidance route R1, and guides the vehicle along the guidance route R1. In this case too, the navigation function unit 20 performs map matching to align the current position mark N with a road on the map, moves the current position mark N over the map as the vehicle travels, and scrolls the map shown on the screen of the display output unit 17 according to the current position of the vehicle.

The user setting function unit 21 sets the user of the audio output system 10 and holds information about that user as user information in a user information storage unit (not shown). The held information includes at least a terminal ID identifying the mobile communication terminal 41 owned by the user of the audio output system 10. The user setting function unit 21 holds as user information the information that the user enters through, for example, a user setting screen (not shown) displayed on the display output unit 17. The user setting function unit 21 can hold user information for a plurality of users; when it does, the current user of the audio output system 10 can be switched through, for example, a user switching screen (not shown) displayed on the display output unit 17. The user information storage unit may be shared with the external storage unit 15, the internal storage unit 16, or a memory of the control unit 12, or may be provided separately from them.

The speech synthesis function unit 22 has a speech synthesis library, a speech synthesis dictionary, and the like. It converts speech input through an in-vehicle microphone (not shown) connected to the navigation device 11, or text entered through the operation input unit 19, into voice data using the speech synthesis library and dictionary, and outputs that voice data as synthesized voice data.

In this case, the guidance point detection processing unit 31 detects a guidance point located ahead in the traveling direction of the vehicle on the guidance route set for route guidance by the navigation function unit 20. Once the guidance route has been determined, the coordinate data of the guidance points on it can be identified from map data and the like. The detection performed by the guidance point detection processing unit 31 can therefore be regarded as processing in which the control unit 12 recognizes the guidance points on the guidance route from the determined guidance route data and the map data.

The recorded voice data extraction processing unit 32 uses the necessary voice data table T2 to identify the recorded voice data needed to generate the hybrid voice data for the guidance point that the guidance point detection processing unit 31 has detected ahead in the traveling direction of the moving body, and extracts the identified recorded voice data from the voice data table T1 in the external storage unit 15.
The synthesized voice data holding processing unit 33 holds, in the synthesized voice data buffer unit 16A, the synthesized voice data that the mobile communication terminal 41 newly generates, as described in detail later, for use in generating the hybrid voice data for the guidance point detected by the guidance point detection processing unit 31 ahead in the traveling direction of the moving body.

The hybrid voice data generation processing unit 34 generates, in the hybrid voice data buffer unit 16B, hybrid voice data composed of the recorded voice data extracted by the recorded voice data extraction processing unit 32 and the synthesized voice data held by the synthesized voice data holding processing unit 33. In this case, the hybrid voice data generation processing unit 34 sets a point a predetermined distance short of the guidance point as the data generation completion point, and generates the hybrid voice data by the time the vehicle reaches that point. To ensure that generation of the hybrid voice data is complete before the vehicle reaches the guidance point, this predetermined distance can be changed as appropriate according to, for example, the processing capability of the control unit 12 and the speed of the vehicle.
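The description leaves open how the predetermined distance is derived from vehicle speed and processing capability. One hypothetical rule, sketched below in Python, reserves a fixed amount of time between the data generation completion point and the guidance point and converts it to a distance at the current speed. The function names, the `reserve_s` parameter, and the 50 m floor are assumptions for illustration, not values from the disclosure.

```python
def completion_lead_distance_m(speed_mps, reserve_s, min_m=50.0):
    """Distance short of the guidance point at which hybrid voice data
    should be complete: speed times a time reserve, with a floor."""
    if speed_mps < 0 or reserve_s < 0:
        raise ValueError("speed and reserve must be non-negative")
    return max(min_m, speed_mps * reserve_s)


def past_completion_point(distance_to_guidance_m, lead_m):
    """True once the vehicle is at or beyond the completion point."""
    return distance_to_guidance_m <= lead_m
```

Under this rule a slower control unit would simply be given a larger `reserve_s`, so the completion point moves further ahead of the guidance point.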

When the vehicle reaches the guidance point detected by the guidance point detection processing unit 31, the hybrid voice output processing unit 35 outputs the hybrid voice through the audio output unit 18 based on the hybrid voice data generated by the hybrid voice data generation processing unit 34. The synthesized voice data generation processing unit 36 generates synthesized voice data using the speech synthesis function unit 22 described above. That is, because the navigation device 11 has the speech synthesis function unit 22 and the synthesized voice data generation processing unit 36, it can also generate synthesized voice data by itself without relying on an external device.

Next, the configuration of the mobile communication terminal 41 is described. The mobile communication terminal 41 includes a control unit 42, a data communication unit 43, a speech synthesis function unit 44, and so on. The control unit 42 is built mainly around a microcomputer having a CPU, RAM, ROM, an I/O bus, and the like (none shown). In accordance with a computer program stored in a storage medium such as the ROM, the control unit 42 controls the overall operation of the mobile communication terminal 41, including the various display output operations, audio output operations, call operations, and speech synthesis operations.
By executing the computer program, the control unit 42 also virtually implements a synthesized voice data generation processing unit 51 in software. The synthesized voice data generation processing unit 51 corresponds to the synthesized voice data generation means provided in the external device recited in the claims.
As described above, the data communication unit 43 is a communication module that establishes a communication link with the data communication unit 14 and exchanges various data with the navigation device 11 over that link.

The speech synthesis function unit 44 has a speech synthesis library, a speech synthesis dictionary, and the like. It converts speech input through a microphone (not shown) of the mobile communication terminal 41, or text data received from the navigation device 11 as described in detail later, into voice data using the speech synthesis library and dictionary, and outputs that voice data as synthesized voice data. For example, when the spoken word “street” is input through the microphone, the speech synthesis function unit 44 analyzes the speech and generates synthesized voice data that outputs the sound “street”. When the text data [ABC Street] is input, the speech synthesis function unit 44 analyzes the text data and generates synthesized voice data [ABC Street] that outputs the sound “A-B-C Street”.
Based on the text data received from the navigation device 11, the synthesized voice data generation processing unit 51 identifies the synthesized voice data needed to generate the hybrid voice data for the guidance point detected by the guidance point detection processing unit 31, and generates the identified synthesized voice data using the speech synthesis function unit 44.
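The terminal-side role of the synthesized voice data generation processing unit 51 can be sketched in Python as a function that runs each received text through a synthesizer. The function names and the result keyed by voice ID are assumptions, and `fake_synthesize` is only a placeholder that encodes the text as bytes; it does not model the speech synthesis library or dictionary of unit 44.

```python
def generate_synthesized_voice_data(items, synthesize):
    """For each (voice_id, text) pair received from the navigation device,
    produce synthesized voice data with the supplied synthesizer."""
    return {voice_id: synthesize(text) for voice_id, text in items}


def fake_synthesize(text):
    """Placeholder for speech synthesis function unit 44 (assumption):
    returns tagged bytes instead of real audio."""
    return b"TTS:" + text.encode("utf-8")
```

Passing the synthesizer in as an argument keeps the sketch independent of any particular speech synthesis engine.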

Next, the operation of the audio output system 10 configured as above is described with reference to FIG. 5. For convenience, the processing below is described as being performed by the “navigation device 11” and the “mobile communication terminal 41”, but it is actually executed by the control unit 12 of the navigation device 11 and the control unit 42 of the mobile communication terminal 41.
When the navigation device 11 starts up, it reads the user information of the current user stored in the user information storage unit (step A1). It then searches for a mobile communication terminal 41 within a predetermined range of the navigation device 11, establishes a communication link with the terminal found, and connects to it communicably (step A2).

The mobile communication terminal 41 connected to the navigation device 11 transmits its terminal ID to the navigation device 11 (step B1). On receiving the terminal ID, the navigation device 11 compares it with the terminal ID of the mobile communication terminal contained in the user information read in step A1, and stores the comparison result, that is, whether the two terminal IDs match or do not match (step A3). If the two terminal IDs match, the navigation device 11 recognizes that the mobile communication terminal 41 of the currently set user has been communicably connected.
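The comparison of step A3 amounts to an equality check whose result is kept for later use in step A6. A minimal Python sketch follows; the `terminal_id` key, the function name, and the example ID strings are hypothetical.

```python
def compare_terminal_ids(user_info, received_id):
    """Step A3 (sketch): True when the terminal ID registered for the
    current user equals the ID received from the connected terminal."""
    registered = user_info.get("terminal_id")
    return registered is not None and registered == received_id


# The result is stored so that a later step can branch on it.
state = {"id_match": compare_terminal_ids({"terminal_id": "T-001"}, "T-001")}
```

A missing registration is treated as a mismatch here, which is a design choice of this sketch rather than something the description specifies.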

Next, based on the currently set guidance route and the current position of the vehicle, the navigation device 11 detects a guidance point that lies on the guidance route ahead in the traveling direction of the vehicle and within a predetermined distance, in this case 1 km, of the current position of the vehicle (step A4). In the example shown in FIG. 4, the navigation device 11 detects guidance point a, which lies on the guidance route R1 ahead in the traveling direction of the vehicle and within the predetermined distance of the current position N of the vehicle. This predetermined distance is preferably made changeable according to, for example, the speed of the vehicle, congestion around the guidance point, and weather and road conditions.
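The look-ahead check of step A4 can be sketched in Python by assuming that the vehicle and each guidance point are represented by their distance along the guidance route in meters. That one-dimensional representation, the function name, and the example offsets are assumptions; the 1 km default comes from the description.

```python
def detect_guidance_point(vehicle_offset_m, point_offsets_m, lookahead_m=1000.0):
    """Step A4 (sketch): return the nearest guidance point that lies ahead
    of the vehicle and within the look-ahead distance, or None."""
    candidates = [
        (offset - vehicle_offset_m, name)
        for name, offset in point_offsets_m.items()
        if 0 < offset - vehicle_offset_m <= lookahead_m
    ]
    return min(candidates)[1] if candidates else None
```

Making `lookahead_m` a parameter reflects the remark that the distance may be varied with speed, congestion, weather, or road conditions.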

When a guidance point is detected (step A4: YES), the navigation device 11 reads from the necessary voice data table T2 the voice IDs of the voice data needed to generate the hybrid voice data corresponding to that guidance point, and identifies the necessary voice data based on the voice IDs read out (step A5). The navigation device 11 then determines whether the comparison result of step A3 is "match" (step A6). If it is "match" (step A6: YES), the navigation device 11 proceeds to step A7 and executes the synthesized voice data advance generation request process. If the comparison result of step A3 is "mismatch" (step A6: NO), the navigation device 11 skips steps A7 and A8 and proceeds to step A9, which is described in detail later.

In the synthesized voice data advance generation request process of step A7, the navigation device 11 transmits synthesized voice data advance generation request information to the mobile communication terminal 41. The navigation device 11 attaches to this request information the text data itself contained in the voice data identified in step A5, together with the voice ID of that text data. For example, if the voice data identified in step A5 contains the text data [ABC Street], the navigation device 11 transmits the text data [ABC Street] itself with its voice ID [4] attached.

On receiving the synthesized voice data advance generation request information from the navigation device 11, the mobile communication terminal 41 analyzes the received text data [ABC Street] with the voice synthesis function unit 44 and thereby generates the synthesized voice data [ABC Street] (step B2). The mobile communication terminal 41 then transmits the generated synthesized voice data [ABC Street] to the navigation device 11 (step B3), attaching to it the voice ID contained in the received request information, in this case the voice ID [4]. The navigation device 11 stores the received synthesized voice data [ABC Street] in the synthesized voice data buffer unit 16A, as shown in FIG. 6(a) (step A8). The navigation device 11 then executes the hybrid voice data generation process (step A9).
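The exchange in steps A7, B2, B3, and A8 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the message classes, `fake_synthesize`, and the dictionary used for the buffer are all assumptions, and the speech synthesis is replaced by a placeholder that merely tags the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynthesisRequest:      # hypothetical shape of the advance generation request
    voice_id: int
    text: str


@dataclass(frozen=True)
class SynthesisResponse:     # hypothetical shape of the terminal's reply
    voice_id: int
    audio: bytes


def fake_synthesize(text: str) -> bytes:
    # Placeholder for a real text-to-speech engine (assumption for illustration).
    return ("<tts:" + text + ">").encode("utf-8")


def terminal_handle_request(req: SynthesisRequest) -> SynthesisResponse:
    # Steps B2 and B3: synthesize the text and echo the voice ID back.
    return SynthesisResponse(voice_id=req.voice_id, audio=fake_synthesize(req.text))


# Step A7: the navigation device sends the text together with its voice ID.
request = SynthesisRequest(voice_id=4, text="ABC Street")
response = terminal_handle_request(request)

# Step A8: the received data is held in a buffer keyed by voice ID.
synth_buffer = {response.voice_id: response.audio}
print(synth_buffer)  # {4: b'<tts:ABC Street>'}
```

Echoing the voice ID in the response is what later allows the navigation device to check that the buffered data belongs to the text it actually needs.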

The hybrid voice data generation process is now described with reference to FIG. 7. In this process, the navigation device 11 stores the recorded voice data contained in the voice data identified in step A5 in the recorded voice data storage unit 16Ba (step C1). For example, if the voice data identified in step A5 contains the recorded voice data [In half of a mile,], [right turn], and [onto], the navigation device 11 stores these items in the recorded voice data storage unit 16Ba in ascending order of voice ID, as shown in FIG. 6(b).

The navigation device 11 then compares the voice ID of the text data contained in the voice data identified in step A5 with the voice ID attached to the synthesized voice data received from the mobile communication terminal 41 (step C2). In this case, the voice ID [4] of the text data [ABC Street] matches the voice ID [4] attached to the received synthesized voice data [ABC Street] (step C2: YES), so the navigation device 11 moves the synthesized voice data [ABC Street] stored in the synthesized voice data buffer unit 16A to the synthesized voice data storage unit 16Bb, as shown in FIG. 6(c) (step C3). The navigation device 11 thereby generates, as the hybrid voice data corresponding to the detected guidance point, a continuous sequence consisting of the recorded voice data [In half of a mile,], [right turn], and [onto] followed by the synthesized voice data [ABC Street].
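Steps C1 to C3 can be sketched in Python as below. The sketch assumes that the necessary voice data is given as a list of (voice ID, kind) pairs and that the recorded items have voice IDs 1 to 3; only the voice ID [4] for [ABC Street] comes from the description. The function and variable names are hypothetical, and strings stand in for audio data.

```python
def generate_hybrid_voice_data(required, recorded_store, synth_buffer):
    """Return (recorded_area, synthesized_area), modeled on units 16Ba and 16Bb.

    required: list of (voice_id, kind), where kind is "recorded" or "text".
    """
    ordered = sorted(required)  # ascending voice ID, as in step C1
    # Step C1: place the recorded voice data in the recorded area.
    recorded_area = [recorded_store[vid] for vid, kind in ordered if kind == "recorded"]
    # Steps C2 and C3: move buffered synthesized data only when its voice ID matches.
    synthesized_area = []
    for vid, kind in ordered:
        if kind == "text" and vid in synth_buffer:
            synthesized_area.append(synth_buffer.pop(vid))
    return recorded_area, synthesized_area


required = [(1, "recorded"), (2, "recorded"), (3, "recorded"), (4, "text")]
recorded_store = {1: "In half of a mile,", 2: "right turn", 3: "onto"}
synth_buffer = {4: "ABC Street"}

recorded_area, synthesized_area = generate_hybrid_voice_data(
    required, recorded_store, synth_buffer
)
print(recorded_area + synthesized_area)
# ['In half of a mile,', 'right turn', 'onto', 'ABC Street']
```

When the buffer holds data with a different voice ID, the synthesized area stays empty and the buffer is left untouched, which corresponds to the "step C2: NO" branch.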

If the two voice IDs do not match (step C2: NO), the navigation device 11 does not move the synthesized voice data [ABC Street] stored in the synthesized voice data buffer unit 16A to the synthesized voice data storage unit 16Bb. In this case, as shown in FIG. 8, the navigation device 11 may be configured to execute the synthesized voice data advance generation request process again (step C4). On receiving this repeated request, the mobile communication terminal 41 determines whether the voice ID of the newly received text data matches the voice ID to be attached to the synthesized voice data it is currently generating. If the two voice IDs match, the mobile communication terminal 41 transmits the synthesized voice data to the navigation device 11 as soon as generation completes, and the navigation device 11 generates hybrid voice data containing the received synthesized voice data. If they do not match, the mobile communication terminal 41 generates synthesized voice data corresponding to the newly received text data and transmits it to the navigation device 11, and the navigation device 11 generates hybrid voice data containing that synthesized voice data.

If, despite executing the synthesized voice data advance generation request process again, the navigation device 11 cannot receive the synthesized voice data from the mobile communication terminal 41 within a predetermined time, that is, at the latest before the vehicle reaches the guidance point and more preferably before the vehicle reaches the data generation completion point short of the guidance point, it executes a synthesized voice data generation stop request process to stop the generation of synthesized voice data by the mobile communication terminal 41. The navigation device 11 then generates voice data that contains no synthesized voice data and consists only of recorded voice data, in this case the recorded voice data [In half of a mile,], [right turn], and [onto].

Once the hybrid voice data has been generated in the hybrid voice data generation process of step A9, the navigation device 11 outputs the hybrid voice based on that data (step A10). That is, when a continuous sequence consisting of the recorded voice data [In half of a mile,], [right turn], and [onto] followed by the synthesized voice data [ABC Street] has been generated, the navigation device 11 outputs the voice "In half of a mile, right turn onto ABC Street". If the hybrid voice data generation process instead produced voice data without synthesized voice data, in this case voice data consisting of the recorded voice data [In half of a mile,], [right turn], and [onto], the navigation device 11 deletes the recorded voice data at the end of that voice data, here [onto], and outputs the voice "In half of a mile, right turn".
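The output rule of step A10, including the fallback that drops the trailing recorded item when no synthesized voice data is available, can be sketched as follows. The function name `phrases_to_output` is hypothetical, and strings stand in for the audio segments.

```python
def phrases_to_output(recorded_area, synthesized_area):
    """Choose the segments to play at the guidance point (step A10)."""
    if synthesized_area:
        # Normal case: recorded segments followed by the synthesized segment(s).
        return recorded_area + synthesized_area
    # Fallback: no synthesized data, so drop the trailing recorded segment
    # (here [onto]), which would otherwise be left dangling.
    return recorded_area[:-1]


recorded = ["In half of a mile,", "right turn", "onto"]
print(" ".join(phrases_to_output(recorded, ["ABC Street"])))
# In half of a mile, right turn onto ABC Street
print(" ".join(phrases_to_output(recorded, [])))
# In half of a mile, right turn
```

Dropping only the last recorded segment keeps the guidance grammatical, at the cost of omitting the street name.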

The mobile communication terminal 41 also does not generate synthesized voice data when the comparison result of step A3 is "mismatch" (step A6: NO). In this case too, the hybrid voice data generation process produces voice data without synthesized voice data, here voice data consisting of the recorded voice data [In half of a mile,], [right turn], and [onto]. The navigation device 11 therefore deletes the recorded voice data at the end of the voice data and outputs the voice "In half of a mile, right turn".

As described above, in the present embodiment a guidance point lying ahead in the traveling direction on a guidance route set for route guidance of, for example, a vehicle is detected in advance, and the hybrid voice data corresponding to that guidance point is generated before the vehicle reaches it. Consequently, even if generation of the synthesized voice data, which imposes a heavy processing load, is delayed, the hybrid voice data consisting of recorded voice data and synthesized voice data can be generated with time to spare before the vehicle reaches the guidance point. When the hybrid voice is output as guidance voice based on the generated hybrid voice data, a break between the output of the recorded voice and the output of the synthesized voice can thus be avoided.

In addition, the synthesized voice data generation processing unit 51 is provided in the external mobile communication terminal 41, which is separate from the navigation device 11. The navigation device 11 requests the external mobile communication terminal 41 to generate the synthesized voice data and generates the hybrid voice data using the synthesized voice data that the mobile communication terminal 41 has generated. In other words, the navigation device 11, which is responsible for voice output, does not itself perform the heavily loaded generation of synthesized voice data but has the external mobile communication terminal 41 perform it. This reduces the processing load on the navigation device 11 and allows the voice output process to be executed without strain.

Furthermore, the hybrid voice data is generated before the vehicle reaches the data generation completion point, which is set a predetermined distance before the guidance point at which output of the guidance voice starts. The hybrid voice data can therefore be prepared with ample margin before the vehicle reaches the guidance point, and the guidance voice based on the hybrid voice data can be output at the guidance point smoothly and without delay.
Based on the necessary voice data table T2 prepared in advance, the navigation device 11 can accurately identify the recorded voice data to be extracted from the external storage unit 15 and the synthesized voice data to be newly generated, and can consequently generate the final hybrid voice data accurately.

The present invention is not limited to the embodiment described above. It can be applied to various embodiments without departing from its gist and can, for example, be modified or extended as follows.
The navigation device 11 may be configured so that the guidance point detection processing unit 31 detects, as the vehicle travels, the nearest guidance point ahead of the vehicle among the guidance points on the guidance route, and the hybrid voice data corresponding to that single nearest guidance point is generated each time. Alternatively, the navigation device 11 may be configured to detect all guidance points on the guidance route at the time the guidance route is set and to generate the hybrid voice data for all of these guidance points in a batch. In this case, each time the vehicle reaches a guidance point, the navigation device 11 outputs the guidance voice based on the hybrid voice data corresponding to that guidance point.

When no mobile communication terminal 41 is present, the navigation device 11 can generate the relevant synthesized voice data with its own voice synthesis function unit 22 and synthesized voice data generation processing unit 36. Because the load of generating the synthesized voice data then falls on the navigation device 11, the predetermined distance used to set the data generation completion point is preferably set longer so that the distance from the data generation completion point to the guidance point is further increased. The navigation device 11 can then generate the hybrid voice data, using the synthesized voice data it has generated itself, before the vehicle reaches the changed data generation completion point. Accordingly, even when the navigation device 11 generates the synthesized voice data, the hybrid voice data can be prepared with ample margin before the vehicle reaches the guidance point, and the guidance voice can be output at the guidance point smoothly and without delay.
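The adjustment of the data generation completion point can be illustrated with a short Python sketch. The description states only that the predetermined distance is set longer when the navigation device must synthesize the voice data itself; the base margin of 300 m and the extension factor of 2 used below are hypothetical values chosen for illustration, as are the function and parameter names.

```python
def data_generation_completion_offset(
    guidance_offset_m,
    external_available,
    base_margin_m=300.0,      # assumed value, not from the specification
    extension_factor=2.0,     # assumed value, not from the specification
):
    """Return the route offset by which the hybrid voice data should be complete.

    When no external device can supply synthesized voice data, the margin before
    the guidance point is lengthened to allow for on-device synthesis.
    """
    margin = base_margin_m if external_available else base_margin_m * extension_factor
    return guidance_offset_m - margin


print(data_generation_completion_offset(1500.0, external_available=True))   # 1200.0
print(data_generation_completion_offset(1500.0, external_available=False))  # 900.0
```

A longer margin moves the completion point closer to the vehicle, so the same deadline check against the vehicle position can be reused in both cases.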

The external device provided with the synthesized voice data generation processing unit 51 is not limited to the mobile communication terminal 41. For example, the synthesized voice data generation processing unit 51 may be provided in an information providing server that supplies the navigation device 11 with traffic information such as congestion and accident information, weather information, and the like. The synthesized voice data generation processing unit 51 may also be provided in the navigation device 11.

To allow for a change of the guidance route once it has been set, for example when the vehicle leaves the guidance route and a new guidance route is set by the reroute function of the navigation device 11, the navigation device 11 may be configured to detect not only guidance points lying ahead in the traveling direction on the guidance route currently being traveled but also guidance points on routes the vehicle might travel, that is, routes different from the current guidance route, and to generate in advance the hybrid voice data corresponding to each of the detected guidance points. For example, as shown in FIG. 9, the navigation device 11 detects not only the guidance point a but also the guidance point b on the route R2 that the vehicle might travel, and generates the hybrid voice data corresponding to the guidance point b in advance as well.

Furthermore, the navigation device 11 is preferably configured to delete the voice data related to a guidance point that the vehicle can no longer pass. In the example shown in FIG. 9, when the vehicle leaves the guidance route R1 and starts traveling on the route R2, the previously generated hybrid voice data corresponding to the guidance point a is deleted. This allows the storage medium of the navigation device 11 to be used effectively and without waste.
In addition, if the navigation device 11 has not yet received the synthesized voice data from the mobile communication terminal 41 when the vehicle leaves the guidance route R1 and starts traveling on the route R2, it may be configured to execute the synthesized voice data generation stop request process and stop the generation, by the mobile communication terminal 41, of the synthesized voice data corresponding to the guidance point a.
The navigation device 11 and the mobile communication terminal 41 may also be connected to each other by a data communication cable so that they can communicate by wire.
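The variation in which voice data for guidance points that can no longer be passed is deleted can be sketched as follows. The cache layout (guidance point name mapped to a route identifier and its hybrid voice data) and the function name `purge_unreachable` are assumptions made for illustration only.

```python
def purge_unreachable(cache, reachable_routes):
    """Delete cached hybrid voice data whose route can no longer be traveled.

    cache: dict mapping guidance point name -> (route_id, hybrid_voice_data).
    Returns the names of the guidance points whose data was removed.
    """
    removed = [
        name for name, (route_id, _) in cache.items()
        if route_id not in reachable_routes
    ]
    for name in removed:
        del cache[name]
    return removed


cache = {
    "a": ("R1", ["In half of a mile,", "right turn", "onto", "ABC Street"]),
    "b": ("R2", ["In half of a mile,", "right turn"]),
}
# The vehicle has left route R1 and is now traveling on route R2.
print(purge_unreachable(cache, reachable_routes={"R2"}))  # ['a']
print(list(cache))  # ['b']
```

The returned names could also be used to decide which pending synthesis requests to cancel at the external device.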

In the drawings, 11 denotes the navigation device (voice output device), 15 the external storage unit (recorded voice data storage unit), 31 the guidance point detection processing unit (guidance point detecting means), 32 the recorded voice data extraction processing unit (recorded voice data extracting means), 33 the synthesized voice data holding processing unit (synthesized voice data holding means), 34 the hybrid voice data generation processing unit (hybrid voice data generating means), 35 the hybrid voice output processing unit (hybrid voice output means), 36 the synthesized voice data generation processing unit (synthesized voice data generating means), 41 the mobile communication terminal (external device), and 51 the synthesized voice data generation processing unit (synthesized voice data generating means).

Claims

A voice output device (11) that is mounted on a mobile body, includes a recorded voice data storage unit (15) storing pre-recorded recorded voice data, and, on reaching a guidance point existing ahead in the traveling direction of the mobile body, outputs a hybrid voice based on hybrid voice data composed of the recorded voice data extracted from the recorded voice data storage unit and synthesized voice data, the voice output device comprising:
recorded voice data extracting means (32) for extracting, from the recorded voice data storage unit, the recorded voice data necessary for generating the hybrid voice data corresponding to the guidance point;
hybrid voice data generating means (34) for generating, before the mobile body reaches the guidance point and as the hybrid voice data corresponding to the guidance point, the hybrid voice data composed of the recorded voice data extracted by the recorded voice data extracting means and the synthesized voice data generated by an external device communicably connected to the voice output device;
hybrid voice output means (35) for outputting, when the mobile body reaches the guidance point, the hybrid voice based on the hybrid voice data generated by the hybrid voice data generating means; and
synthesized voice data generating means (36) for generating the synthesized voice data necessary for generating the hybrid voice data corresponding to the guidance point,
wherein the hybrid voice data generating means sets a point a predetermined distance before the guidance point as a data generation completion point and generates the hybrid voice data before the mobile body reaches the data generation completion point, and, when the synthesized voice data cannot be obtained from the external device, sets the predetermined distance longer so as to increase the distance from the data generation completion point to the guidance point and generates the hybrid voice data, using the synthesized voice data generated by the synthesized voice data generating means, before the mobile body reaches the changed data generation completion point.

The voice output device according to claim 1, further comprising necessary voice identification information that indicates, for each guidance point, the recorded voice data and the synthesized voice data necessary for generating the corresponding hybrid voice data.

The voice output device according to claim 1 or 2, wherein the guidance point existing ahead in the traveling direction of the mobile body is a guidance point existing on a guidance route of the mobile body.

The voice output device according to claim 1 or 2, wherein the guidance point existing ahead in the traveling direction of the mobile body is a guidance point existing on a route different from the guidance route of the mobile body.

A voice output system (10) comprising a voice output device (11) that is mounted on a mobile body, includes a recorded voice data storage unit (15) storing pre-recorded recorded voice data, and, on reaching a guidance point existing ahead in the traveling direction of the mobile body, outputs a hybrid voice based on hybrid voice data composed of the recorded voice data extracted from the recorded voice data storage unit and synthesized voice data, and an external device (41) communicably connected to the voice output device, the voice output system comprising:
recorded voice data extracting means (32), provided in the voice output device, for extracting, from the recorded voice data storage unit, the recorded voice data necessary for generating the hybrid voice data corresponding to the guidance point;
synthesized voice data generating means (51), provided in the external device, for generating the synthesized voice data necessary for generating the hybrid voice data corresponding to the guidance point;
hybrid voice data generating means (34), provided in the voice output device, for generating, before the mobile body reaches the guidance point, the hybrid voice data composed of the recorded voice data extracted by the recorded voice data extracting means and the synthesized voice data generated by the synthesized voice data generating means;
hybrid voice output means (35), provided in the voice output device, for outputting, when the mobile body reaches the guidance point, the hybrid voice based on the hybrid voice data generated by the hybrid voice data generating means; and
synthesized voice data generating means (36), provided in the voice output device, for generating the synthesized voice data necessary for generating the hybrid voice data corresponding to the guidance point,
wherein the hybrid voice data generating means sets a point a predetermined distance before the guidance point as a data generation completion point and generates the hybrid voice data before the mobile body reaches the data generation completion point, and, when the synthesized voice data cannot be obtained from the external device, sets the predetermined distance longer so as to increase the distance from the data generation completion point to the guidance point and generates the hybrid voice data, using the synthesized voice data generated by the synthesized voice data generating means provided in the voice output device, before the mobile body reaches the changed data generation completion point.

The voice output system according to claim 5, wherein the synthesized voice data generating means provided in the external device generates the synthesized voice data in response to a synthesized voice data generation request from the voice output device.

The voice output system according to claim 6, wherein, when the requested synthesized voice data cannot be obtained, the voice output device issues the synthesized voice data generation request again.