JP5807451B2

JP5807451B2 - Voice processing device, voice processing method, program, and guidance system

Info

Publication number: JP5807451B2
Application number: JP2011186489A
Authority: JP
Inventors: 鈴木 雄介
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2011-08-29
Filing date: 2011-08-29
Publication date: 2015-11-10
Anticipated expiration: 2031-08-29
Also published as: JP2013047653A

Description

The present invention relates to a voice processing device, a voice processing method, a program, and a guidance system.

In recent years, guidance techniques for leading a user to a target position have become widespread. For example, a car navigation system, a typical guidance device, creates a route from the current position to a target position and, based on that route, guides the user to the target position by combining map display with voice output.

For visually impaired users, however, guidance that depends on a map display is difficult. Techniques have therefore been proposed for guiding a visually impaired user by voice output alone.

For example, Patent Document 1 below proposes a guidance device that detects the orientation of the user, determines the direction in which the user should proceed based on that orientation, and announces that direction by voice.

Patent Document 1: JP 2002-257581 A

However, with the guidance device of Patent Document 1, the user cannot intuitively grasp the correct direction of travel or how to return after straying from the route. For example, to find out whether they are heading in the right direction, the user must ask the guidance device each time by speaking into a microphone. In addition, a user who has strayed from the route cannot easily locate the route or the position where the movement started.

The present invention therefore aims to provide a new and improved voice processing device, voice processing method, program, and guidance system that allow the user to intuitively grasp the correct direction of travel and how to return after straying from the route.

According to the present invention, there is provided a voice processing device comprising: a detection unit that detects a user position; a sound source setting unit that sets a first virtual sound source position corresponding to a start position, which is the user position detected by the detection unit at the time guidance of the user is started, and a second virtual sound source position corresponding to the target position of the guidance or to a position on the route between the user position and the target position; and a voice creation unit that uses the user position detected by the detection unit to create first voice data, for which the user perceives the first virtual sound source position as the sound source position, and second voice data, for which the user perceives the second virtual sound source position as the sound source, wherein the first voice data and the second voice data have different voice patterns, and the voice creation unit arranges the first voice data and the second voice data at different positions on the time axis.

The first voice data and the second voice data may have different voice patterns, and the voice creation unit may arrange the first voice data and the second voice data at different positions on the time axis.

The detection unit may further detect the orientation of the user, and the voice creation unit may create each of the first voice data and the second voice data based on the relative relationship between the user position or the user orientation and the first virtual sound source position or the second virtual sound source position, respectively.

The voice creation unit may create each of the first voice data and the second voice data with a volume or a voice pattern corresponding to the relative relationship.

The voice creation unit may arrange each of the first voice data and the second voice data at different positions on the time axis with a frequency corresponding to the relative relationship.
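As an illustration only, the arrangement on the time axis might be sketched as follows. The interval mapping, the cue length, the pattern names, and the names `Cue`, `repeat_interval_s`, and `schedule` are all assumptions made for this sketch and do not appear in the specification; the sketch also assumes that a nearer virtual sound source is repeated more often, which is only one possible reading of "a frequency corresponding to the relative relationship".

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start_s: float  # position on the time axis, in seconds
    source: str     # "start" (first voice data) or "target" (second voice data)
    pattern: str    # voice pattern; the two sources use different patterns


def repeat_interval_s(distance_m, min_s=0.5, max_s=2.0, far_m=20.0):
    """Hypothetical mapping: a nearer source repeats at a shorter interval."""
    ratio = min(max(distance_m / far_m, 0.0), 1.0)
    return min_s + (max_s - min_s) * ratio


def schedule(dist_start_m, dist_target_m, duration_s, cue_len_s=0.3):
    """Place cues for both sources so that no two cues overlap in time."""
    raw = []
    for source, pattern, dist in (
        ("start", "piyo", dist_start_m),
        ("target", "piyo-piyo", dist_target_m),
    ):
        period = repeat_interval_s(dist)
        t = 0.0
        while t < duration_s:
            raw.append(Cue(t, source, pattern))
            t += period
    raw.sort(key=lambda c: c.start_s)

    placed = []
    next_free = 0.0
    for cue in raw:
        # Delay a cue that would collide with the previous one.
        start = max(cue.start_s, next_free)
        placed.append(Cue(start, cue.source, cue.pattern))
        next_free = start + cue_len_s
    return placed
```

With a user 18 m from the start position and 4 m from the target, `schedule(18.0, 4.0, 6.0)` yields more "target" cues than "start" cues, and every cue begins after the previous one has ended.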

The relative relationship may include the distance between the user position and the first or second virtual sound source position, or the angle between the user orientation and the direction of the first or second virtual sound source position.
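The distance and angle described above can be computed with elementary geometry. The following is a minimal sketch, assuming plane coordinates, a user orientation given in radians measured counter-clockwise from the positive x-axis, and a hypothetical function name `relative_relationship`; none of these conventions is fixed by the specification.

```python
import math


def relative_relationship(user_xy, user_heading_rad, source_xy):
    """Return (distance, angle) from the user to a virtual sound source.

    The angle is the signed difference between the direction of the source
    and the user orientation, wrapped to [-pi, pi). Under the assumed
    convention, a positive angle means the source lies to the user's left.
    """
    dx = source_xy[0] - user_xy[0]
    dy = source_xy[1] - user_xy[1]
    distance = math.hypot(dx, dy)
    bearing = math.atan2(dy, dx)
    angle = (bearing - user_heading_rad + math.pi) % (2 * math.pi) - math.pi
    return distance, angle
```

For a user at the origin facing along the positive x-axis and a source at (0, 5), this gives a distance of 5 and an angle of π/2.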

The voice creation unit may create the first voice data and the second voice data in a stereo format.

The voice creation unit may create the first voice data and the second voice data by convolution with a head-related transfer function.
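As a rough illustration of such a convolution, the sketch below convolves a monaural signal with a pair of head-related impulse responses (the time-domain counterpart of a head-related transfer function), one per ear. The impulse responses used in the usage note are toy values invented for this example, and the function names are hypothetical; a practical implementation would use measured responses selected according to the direction of the virtual sound source.

```python
def convolve(signal, impulse_response):
    """Direct-form discrete convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Create a two-channel (left, right) signal from a monaural signal."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)
```

For a source on the user's left, the toy responses `hrir_left = [1.0, 0.0, 0.0]` and `hrir_right = [0.0, 0.0, 0.6]` make the right-ear signal a delayed and attenuated copy of the left-ear signal, which is the kind of interaural difference that lets the listener localize the source.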

According to the present invention, there is also provided a voice processing method including: a step in which a detection unit detects a user position; a step in which a sound source setting unit sets a first virtual sound source position corresponding to a start position, which is the user position detected by the detection unit at the time guidance of the user is started, and a second virtual sound source position corresponding to the target position of the guidance or to a position on the route between the user position and the target position; a step in which a voice creation unit uses the user position detected by the detection unit to create first voice data, for which the user perceives the first virtual sound source position set by the sound source setting unit as the sound source position, and second voice data, which has a voice pattern different from that of the first voice data and for which the user perceives the second virtual sound source position set by the sound source setting unit as the sound source; and a step in which the voice creation unit arranges the first voice data and the second voice data at different positions on the time axis.

According to the present invention, there is also provided a program for causing a computer to function as: a detection unit that detects a user position; a sound source setting unit that sets a first virtual sound source position corresponding to a start position, which is the user position detected by the detection unit at the time guidance of the user is started, and a second virtual sound source position corresponding to the target position of the guidance or to a position on the route between the user position and the target position; and a voice creation unit that uses the user position detected by the detection unit to create first voice data, for which the user perceives the first virtual sound source position as the sound source position, and second voice data, which has a voice pattern different from that of the first voice data and for which the user perceives the second virtual sound source position as the sound source, and that arranges the first voice data and the second voice data at different positions on the time axis.

According to the present invention, there is also provided a guidance system including a sensor, a voice output device, and a voice processing device, wherein the voice processing device comprises: a detection unit that detects a user position based on input from the sensor; a sound source setting unit that sets a first virtual sound source position corresponding to a start position, which is the user position detected by the detection unit at the time guidance of the user is started, and a second virtual sound source position corresponding to the target position of the guidance or to a position on the route between the user position and the target position; and a voice creation unit that uses the user position detected by the detection unit to create first voice data, for which the user perceives the first virtual sound source position as the sound source position, and second voice data, for which the user perceives the second virtual sound source position as the sound source, wherein the first voice data and the second voice data have different voice patterns, the voice creation unit arranges the first voice data and the second voice data at different positions on the time axis, and the voice output device outputs the sound of the first voice data and the sound of the second voice data.

As described above, the voice processing device, voice processing method, program, and guidance system according to the present invention make it possible to intuitively grasp the correct direction of travel and how to return after straying from the route.

FIG. 1 is an explanatory diagram showing an example of the schematic configuration of a guidance system according to an embodiment.
FIG. 2 is a block diagram showing an example of the configuration of a voice processing device according to the embodiment.
FIG. 3 is an explanatory diagram illustrating an example of the detected user orientation.
FIG. 4 is an explanatory diagram illustrating an example of a route created by the guidance information creation unit.
FIG. 5 is an explanatory diagram illustrating the creation of voice data by the voice creation unit.
FIG. 6 is an explanatory diagram illustrating the creation of voice data until the user turns toward the target position.
FIG. 7 is an explanatory diagram illustrating the creation of voice data while the user moves in a straight line toward the target position.
FIG. 8 is an explanatory diagram illustrating the creation of voice data when the user changes orientation partway through the movement.
FIG. 9 is an explanatory diagram illustrating the arrangement of voice data on the time axis by the voice creation unit.
FIG. 10 is an explanatory diagram illustrating a modification of the setting of the virtual sound source position.
FIG. 11 is a flowchart showing an example of the schematic flow of voice processing according to the embodiment.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. In this specification and the drawings, components having substantially the same functional configuration are given the same reference numerals, and redundant description of them is omitted.

The embodiments of the present invention are described below in the following order: <1. Introduction>, <2. Schematic configuration of guidance system>, <3. Configuration of voice processing device>, <4. Processing flow>.

<1. Introduction>
In recent years, guidance techniques for leading a user to a target position have become widespread. For example, a car navigation system, a typical guidance device, first acquires current position information from GPS (Global Positioning System). It then creates a route from the current position to the target position and, based on that route, guides the user to the target position by combining map display with voice output. With such a car navigation system, the user can look at the displayed map and easily grasp the positional relationship between their own position and the target position or the guidance start position, whether they are heading in the correct direction, whether they have strayed from the route, and so on. By listening to voice announcements of, for example, when to turn right or left, the user also obtains useful information for reaching the target position. In this way, many guidance techniques lead the user to the target position through a combination of map display and voice.

Because it is difficult for a visually impaired person to read a displayed map, however, it is difficult to guide a visually impaired user to a target position by relying on a map display as described above. Techniques have therefore been proposed for guiding a visually impaired user by voice output alone. As one example, a guidance device has been proposed that detects the orientation of the user, determines the direction in which the user should proceed based on that orientation, and announces that direction by voice (JP 2002-257581 A).

With conventional guidance techniques, however, the user cannot intuitively grasp the correct direction of travel or how to return after straying from the route. For example, when using the guidance device cited above, the user must ask the device each time by speaking into a microphone in order to find out whether they are heading in the right direction. Moreover, a user who has strayed from the route cannot easily tell that they have strayed, or how to return to the original route or to the position where the movement started.

Recognizing these problems, the inventor set out to study guidance techniques that allow the user to intuitively grasp the correct direction of travel and how to return after straying from the route. In the course of that study, the inventor focused on the "heterogeneous call-and-response method" (異種鳴き交わし方式).

The heterogeneous call-and-response method is a voice guidance technique introduced at traffic signals so that visually impaired people can cross roads safely (see "平成十五年１０月２２日警視庁丁規発７７号視覚障害者用付加装置に関する設置・運用指針の制定について", a notice dated October 22, 2003 on establishing installation and operation guidelines for additional devices for visually impaired persons). In this method, the traffic signal at one end of a pedestrian crossing emits a sound such as "kakko" (cuckoo) or "piyo", and the traffic signal at the other end emits a sound such as "ka-kakko" or "piyo-piyo". By listening to these sounds, the user can intuitively grasp roughly where the two ends of the crossing are. Because the sounds the user hears change as the user moves, the user can also tell intuitively, at any time, whether they are heading in the correct direction and whether they have strayed from the crossing. And because sounds are emitted from the signals at both ends, a user who has strayed from the crossing can return to the crossing or to the end from which they started.
The heterogeneous call-and-response method thus has the advantage of letting the user intuitively grasp the correct direction of travel and how to return after straying from the crossing. Because the two signals emit different sounds, the user can distinguish the two signals (or the two ends of the crossing). Because the sounds are emitted alternately at different times, they do not blend together, so the user can hear each one easily. However, because the sounds are emitted by loudspeakers or the like installed at specific places such as pedestrian crossings, this method cannot guide the user anywhere other than those specific places.

This embodiment therefore describes a guidance system that, without being limited to such specific places, lets the user intuitively grasp the correct direction of travel and how to return after straying from the route, in the manner of the heterogeneous call-and-response method.

<2. Schematic configuration of guidance system>
First, the schematic configuration of the guidance system 1 according to this embodiment is described with reference to FIG. 1. FIG. 1 is an explanatory diagram showing an example of the schematic configuration of the guidance system 1 according to this embodiment. As shown in FIG. 1, the guidance system 1 includes a sensor 10, a voice output device 20, and a voice processing device 100.

(Sensor 10)
The sensor 10 is a device that senses the position of the user 3 (hereinafter, the "user position"). The sensor 10 may include, for example, a GPS (Global Positioning System) receiver. Alternatively, the sensor 10 may include a receiver that receives position information by radio waves, infrared light, or the like from active or passive markers installed in the environment in which the user 3 moves, as disclosed in, for example, JP 2003-91794 A. In that case, the sensor 10 may instead obtain an ID, rather than position information, from the active or passive marker through the receiver, and acquire the position information by accessing a server device that stores the correspondence between IDs and position information.
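The ID-based variant can be pictured as a simple lookup. The sketch below is illustrative only: the in-memory table `MARKER_POSITIONS` stands in for the server device, and the marker IDs, coordinates, and function name are hypothetical values chosen for this example.

```python
from typing import Optional, Tuple

# Hypothetical stand-in for the server-side table that maps marker IDs
# to position information.
MARKER_POSITIONS = {
    "marker-001": (12.0, 3.5),
    "marker-002": (12.0, 9.0),
}


def position_from_marker(marker_id: str) -> Optional[Tuple[float, float]]:
    """Resolve a marker ID to a position, or None if the ID is unknown."""
    return MARKER_POSITIONS.get(marker_id)
```

Returning `None` for an unknown ID leaves it to the caller to decide whether to keep the last known user position or to fall back to another positioning method.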

The sensor 10 also senses, for example, the orientation of the user 3 (hereinafter, the "user orientation"). For this purpose the sensor 10 may include a gyro sensor, a geomagnetic sensor, or an acceleration sensor. Alternatively, the sensor 10 may include the above receiver for receiving information from active or passive markers, and may sense the orientation of the user 3 from the positional relationship between the receiver and the markers.

As shown in FIG. 1, as one example, a sensor 10 that includes a GPS receiver and an acceleration sensor is worn at the waist of the user 3. A sensor 10 that is a receiver for information from active or passive markers may instead be formed in the same shape as the white cane used by the user 3 and be carried by the user 3.

(Voice output device 20)
The voice output device 20 is a device that outputs the sound of the voice data created by the voice processing device 100. The voice output device 20 is, for example, a pair of headphones that emits two-channel stereo sound. The voice output device 20 acquires from the voice processing device 100, for example, an analog audio signal obtained by digital-to-analog conversion (hereinafter, "D/A conversion") of the voice data, and outputs sound based on that analog audio signal. Alternatively, the voice output device 20 may acquire the voice data itself from the voice processing device 100 and perform the D/A conversion. As shown in FIG. 1, as one example, a voice output device 20 in the form of headphones is worn on the head of the user 3.

(Voice processing device 100)
The voice processing device 100 is a device that creates voice data allowing the user to intuitively grasp the correct direction of travel and how to return after straying from the route. The voice processing device 100 is connected to the sensor 10 and the voice output device 20 by wire or wirelessly. The voice processing device 100 creates voice data based on, for example, the user position and user orientation sensed by the sensor 10. It then outputs to the voice output device 20, for example, an analog audio signal obtained by D/A conversion of the voice data. When the voice output device 20 performs the D/A conversion, the voice processing device 100 may output the voice data itself to the voice output device 20. The specific configuration of the voice processing device 100 and the specific voice processing it performs are described later in <3. Configuration of voice processing device> and <4. Processing flow>.

An example of the configuration of the guidance system 1 according to the embodiment of the present invention has been described above with reference to FIG. 1, but the configuration of the guidance system 1 is not limited to this. For example, although the sensor 10, the voice output device 20, and the voice processing device 100 have been described as physically separate devices, any two or more of them may be physically integrated into a single device.

<3. Configuration of voice processing device>
Next, an example of the configuration of the voice processing device 100 according to this embodiment is described with reference to FIGS. 2 to 10. FIG. 2 is a block diagram showing an example of the configuration of the voice processing device 100 according to this embodiment. As shown in FIG. 2, the voice processing device 100 includes a detection unit 110, a target input unit 120, a storage unit 130, a guidance information creation unit 140, a sound source setting unit 150, a voice creation unit 160, and a voice output unit 170.

(Detection unit 110)
The detection unit 110 detects the user position P_u. More specifically, the detection unit 110 detects the user position P_u by, for example, acquiring from the sensor 10 the user position sensed by the sensor 10. The user position P_u is represented by, for example, plane coordinates (x_u, y_u) consisting of an x coordinate and a y coordinate. For example, when the sensor 10 includes a GPS receiver, the x coordinate and the y coordinate may be latitude and longitude, respectively. The user position P_u may instead be represented by spatial coordinates (x_u, y_u, z_u) consisting of an x coordinate, a y coordinate, and a z coordinate. In that case, the z coordinate may be a value indicating height, such as elevation.

The detection unit 110 also detects, for example, the user orientation. More specifically, the detection unit 110 detects the user orientation by, for example, acquiring from the sensor 10 the user orientation sensed by the sensor 10. The user orientation here is, for example, the direction the user is facing, and is represented by the angle θ_u between that facing direction and a predetermined direction. FIG. 3 is an explanatory diagram illustrating an example of the detected user orientation. As shown in FIG. 3, when the user position P_u is represented by the plane coordinates (x_u, y_u), for example, the user orientation is represented by the angle θ_u between the positive direction of the x-axis and the user orientation.
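One way to hold the detected values together is sketched below. The class name `UserPose`, the use of radians, and the counter-clockwise sign convention for θ_u are assumptions made for this sketch; the specification states only that θ_u is the angle between the positive x-axis and the user orientation.

```python
import math
from dataclasses import dataclass


@dataclass
class UserPose:
    x_u: float      # x coordinate of the user position P_u
    y_u: float      # y coordinate of the user position P_u
    theta_u: float  # user orientation, radians from the positive x-axis (assumed CCW)

    def facing(self):
        """Unit vector pointing in the direction the user is facing."""
        return (math.cos(self.theta_u), math.sin(self.theta_u))
```

For example, `UserPose(2.0, 1.0, math.pi / 2).facing()` is approximately (0, 1), that is, the user faces along the positive y-axis.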

(Target input unit 120)
The target input unit 120 acquires the final target position of the movement by the user 3 (hereinafter, the "final target position"). The final target position is represented, for example, in the same format as the user position P_u. The target input unit 120 acquires the final target position in response to, for example, an input operation by the user 3. More specifically, the target input unit 120 may present candidates for the final target position to the user 3 by voice and acquire, as the final target position, the candidate that the user 3 selects with an operation unit such as a button, switch, or lever. Alternatively, the target input unit 120 may detect numbers that the user 3 selects with the operation unit and acquire those numbers as the plane coordinates of the final target position. Alternatively, the target input unit 120 may recognize the collected speech of the user 3 and acquire the final target position corresponding to the recognized speech. The target input unit 120 may prompt the user 3 for an input operation in response to a start instruction given by the user through the operation unit, or when the user position P_u reaches a specific position such as a subway ticket-gate exit or the end of a pedestrian crossing. Instead of relying on an input operation by the user 3, the target input unit 120 may also acquire the final target position automatically. For example, when the user position P_u reaches a specific position such as a subway ticket-gate exit or the end of a pedestrian crossing, the target input unit 120 may automatically acquire a predetermined position associated with that specific position as the final target position.

(Storage unit 130)
The storage unit 130 stores information that is to be held temporarily or permanently in the voice processing device 100. The storage unit 130 stores, for example, map information for the space in which the user moves and the route created by the guidance information creation unit 140 described later. The storage unit 130 also stores sample audio data used to create audio data. The storage unit 130 may be a magnetic recording medium such as a hard disk, or a non-volatile memory such as an EEPROM (Electrically Erasable and Programmable Read Only Memory), a flash memory, an MRAM (Magnetoresistive Random Access Memory), an FeRAM (Ferroelectric Random Access Memory), or a PRAM (Phase change Random Access Memory).

(Guidance information creation unit 140)
The guidance information creation unit 140 sets a target position P_g to which the user 3 is to be guided and a start position P_s from which guidance to that target position begins. For example, the guidance information creation unit 140 creates a route to the final target position acquired by the target input unit 120 based on the map information stored in the storage unit 130, and sets the target position P_g and the start position P_s based on that route. This point is described more specifically below with reference to FIG. 4.

FIG. 4 is an explanatory diagram illustrating examples of the route created by the guidance information creation unit 140. As shown in 4-1 of FIG. 4, in one example the final target position 43 can be reached from the user position 41 at the time of route creation by moving in a straight line. In this case, the guidance information creation unit 140 creates a route that includes the user position 41 at the time of route creation and the final target position 43. The guidance information creation unit 140 then sets, for example, the user position 41 at the time of route creation as the start position P_s and the final target position 43 as the target position P_g.

As shown in 4-2 of FIG. 4, in another example the final target position 43 can be reached from the user position 41 at the time of route creation only by changing the traveling direction along the way. In this case, the guidance information creation unit 140 creates a route that includes the user position 41 at the time of route creation, the final target position 43, and the positions 45 on the route at which the traveling direction should change. The guidance information creation unit 140 first sets, for example, the user position 41 at the time of route creation as the start position P_s1, and sets the position 45a on the route, which can be reached from the start position P_s1 by moving in a straight line, as the target position P_g1. When the user 3 then reaches the target position P_g1, that is, the position 45a on the route, the guidance information creation unit 140 sets the position 45a as a new start position P_s2 and sets the position 45b on the route, which can be reached from the start position P_s2 by moving in a straight line, as a new target position P_g2. In the same way, the guidance information creation unit 140 subsequently sets the position 45b on the route as a new start position P_s3 and the final target position 43 as a new target position P_g3. The guidance information creation unit 140 stores the created route in, for example, the storage unit 130.

The guidance information creation unit 140 also determines, for example, whether the user 3 has reached the target position P_g. The guidance information creation unit 140 may determine that the user 3 has reached the target position P_g when the target position P_g and the user position P_u coincide, or when the distance between the target position P_g and the user position P_u is equal to or less than a predetermined threshold. The guidance information creation unit 140 may also take the user orientation into account in this determination. For example, the guidance information creation unit 140 may determine that the user 3 has reached the target position P_g when the above distance is equal to or less than the predetermined threshold and the user orientation coincides with the direction of the next target position P_g, that is, when the user 3 starts to head toward the next target position P_g.
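The arrival determination described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `has_arrived`, the 1 m distance threshold, and the 15 degree heading tolerance are hypothetical and do not come from the patent, which specifies only "a predetermined threshold".

```python
import math

def has_arrived(user_pos, target_pos, threshold_m=1.0,
                user_heading_deg=None, next_target_pos=None,
                heading_tol_deg=15.0):
    """Judge whether the user has reached target_pos (positions are (x, y) tuples)."""
    dist = math.hypot(target_pos[0] - user_pos[0], target_pos[1] - user_pos[1])
    if dist > threshold_m:
        return False
    if user_heading_deg is None or next_target_pos is None:
        return True  # distance-only criterion
    # Optional criterion: the user already faces the next target position.
    bearing = math.degrees(math.atan2(next_target_pos[1] - user_pos[1],
                                      next_target_pos[0] - user_pos[0]))
    # Wrap the heading error into [-180, 180) before comparing (assumed convention).
    diff = (bearing - user_heading_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= heading_tol_deg
```

When this returns true for a waypoint such as 45a, the caller would promote that waypoint to the new start position and take the next waypoint as the new target position.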

(Sound source setting unit 150)
The sound source setting unit 150 sets a first virtual sound source position corresponding to the start position P_s (hereinafter referred to as the "start-side virtual sound source position") and a second virtual sound source position corresponding to the target position P_g (hereinafter referred to as the "target-side virtual sound source position"). For example, the sound source setting unit 150 sets the start position P_s set by the guidance information creation unit 140 as the start-side virtual sound source position, and sets the target position P_g set by the guidance information creation unit 140 as the target-side virtual sound source position. The following description assumes that the two virtual sound source positions are set in this way. To make this assumption easy to follow, the start-side virtual sound source position is written as the start-side virtual sound source position P_s, and the target-side virtual sound source position is written as the target-side virtual sound source position P_g.

(Voice creation unit 160)
Using the user position P_u detected by the detection unit 110, the voice creation unit 160 creates first audio data in which the user 3 perceives the start-side virtual sound source position P_s as the sound source position (hereinafter referred to as "start-side audio data"), and second audio data in which the user 3 perceives the target-side virtual sound source position P_g as the sound source position (hereinafter referred to as "target-side audio data"). The voice creation unit 160 creates each of the start-side audio data and the target-side audio data based on, for example, the relative relationship between the user position P_u or the user orientation and the corresponding one of the start-side virtual sound source position P_s and the target-side virtual sound source position P_g. This point is described more specifically below with reference to FIG. 5.

FIG. 5 is an explanatory diagram illustrating the creation of audio data by the voice creation unit 160. Referring to FIG. 5, the voice creation unit 160 calculates, for example, the distance d_1 between the user position P_u and the start-side virtual sound source position P_s, and the angle θ_1 between the user orientation (the direction the user faces) and the direction of the start-side virtual sound source position P_s. As the start-side audio data, the voice creation unit 160 then creates audio data in which a position at the distance d_1 in the direction offset by the angle θ_1 from the front direction is perceived as the sound source. Similarly, the voice creation unit 160 calculates, for example, the distance d_2 between the user position P_u and the target-side virtual sound source position P_g, and the angle θ_2 between the user orientation (the direction the user faces) and the direction of the target-side virtual sound source position P_g. As the target-side audio data, the voice creation unit 160 then creates audio data in which a position at the distance d_2 in the direction offset by the angle θ_2 from the front direction is perceived as the sound source. In other words, the relative relationship mentioned above includes, for example, the distance d_1 between the user position P_u and the start-side virtual sound source position P_s, the distance d_2 between the user position P_u and the target-side virtual sound source position P_g, the angle θ_1 between the user orientation and the direction of the start-side virtual sound source position P_s, or the angle θ_2 between the user orientation and the direction of the target-side virtual sound source position P_g.

The method for calculating the distances d_1 and d_2 and the angles θ_1 and θ_2 is as follows. Referring to FIG. 5, the voice creation unit 160 first acquires the user position P_u and the user orientation θ_u detected by the detection unit 110, together with the start-side virtual sound source position P_s and the target-side virtual sound source position P_g set by the sound source setting unit 150. Here, the user position P_u, the start-side virtual sound source position P_s, and the target-side virtual sound source position P_g are assumed to be expressed as the plane coordinates (x_u, y_u), (x_s, y_s), and (x_g, y_g), respectively. The voice creation unit 160 then calculates the distances d_1 and d_2 by the following equations (1) and (2).

$$ d_1 = \sqrt{(x_s - x_u)^2 + (y_s - y_u)^2} \tag{1} $$

$$ d_2 = \sqrt{(x_g - x_u)^2 + (y_g - y_u)^2} \tag{2} $$

The voice creation unit 160 also calculates the direction θ_s from the user position P_u to the start-side virtual sound source position P_s and the direction θ_g from the user position P_u to the target-side virtual sound source position P_g by the following equations (3) and (4).

$$ \theta_s = \tan^{-1}\left(\frac{y_s - y_u}{x_s - x_u}\right) \tag{3} $$

$$ \theta_g = \tan^{-1}\left(\frac{y_g - y_u}{x_g - x_u}\right) \tag{4} $$

The voice creation unit 160 then calculates the angles θ_1 and θ_2 by the following equations (5) and (6).

$$ \theta_1 = \theta_s - \theta_u \tag{5} $$

$$ \theta_2 = \theta_g - \theta_u \tag{6} $$
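The distance and angle calculation described above can be sketched in a few lines. The sketch makes assumptions that the text does not state: directions are measured counterclockwise from the x axis in degrees, `math.atan2` is used so that all four quadrants are handled, and the resulting angle is wrapped into [-180, 180). The function name `relative_geometry` is hypothetical.

```python
import math

def relative_geometry(user_pos, user_heading_deg, source_pos):
    """Return (distance, angle_deg) of a virtual sound source relative to the user.

    user_pos and source_pos are (x, y) plane coordinates; angle_deg is the
    offset of the source direction from the direction the user faces.
    """
    dx = source_pos[0] - user_pos[0]
    dy = source_pos[1] - user_pos[1]
    distance = math.hypot(dx, dy)                       # distance d
    direction_deg = math.degrees(math.atan2(dy, dx))    # direction to the source
    # Offset from the user's orientation, wrapped into [-180, 180) (assumption).
    angle_deg = (direction_deg - user_heading_deg + 180.0) % 360.0 - 180.0
    return distance, angle_deg
```

Calling it once with P_s and once with P_g yields (d_1, θ_1) and (d_2, θ_2).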

Using, for example, one of the sound source localization techniques described below, the voice creation unit 160 creates start-side audio data in which a position at the distance d_1 in the direction offset by the angle θ_1 from the front direction is perceived as the sound source, and target-side audio data in which a position at the distance d_2 in the direction offset by the angle θ_2 from the front direction is perceived as the sound source. The voice creation unit 160 creates the start-side audio data and the target-side audio data in a stereo format.

As a first example, the voice creation unit 160 creates the start-side audio data and the target-side audio data by convolution with a head-related transfer function (HRTF). An HRTF is a function that represents the transfer characteristics of sound from a sound source to the ear. An HRTF can be obtained, for example, by using a dummy head with microphones mounted at its ears to measure the impulse responses of sounds emitted from discretely placed sound sources. The voice creation unit 160 can create the start-side audio data by convolving, in the time domain, the first sample audio data stored in the storage unit 130 with the HRTF corresponding to the angle θ_1 and the distance d_1. Similarly, the voice creation unit 160 can create the target-side audio data by convolving, in the time domain, the second sample audio data stored in the storage unit 130 with the HRTF corresponding to the angle θ_2 and the distance d_2.
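As a rough illustration of the time-domain convolution in this first example, the sketch below convolves a mono sample with a left-ear and a right-ear impulse response. It assumes that the impulse response pair for the required angle and distance has already been selected from a measured set; the names `convolve` and `render_binaural` are hypothetical, and a practical implementation would more likely use an FFT-based routine from a numerical library.

```python
def convolve(signal, impulse_response):
    """Direct time-domain convolution of two sequences of float samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def render_binaural(sample, hrir_left, hrir_right):
    """Return (left, right) channels for a mono sample and an impulse response pair.

    hrir_left and hrir_right stand for the head-related impulse responses
    measured for the angle and distance of the virtual sound source.
    """
    return convolve(sample, hrir_left), convolve(sample, hrir_right)
```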

As a second example, the voice creation unit 160 creates the start-side audio data and the target-side audio data by adjusting the volume balance between the channels of the stereo format. For example, when each piece of audio data is created in a two-channel (left and right) stereo format, the voice creation unit 160 creates the start-side audio data by adjusting the volume balance between the left channel and the right channel according to the angle θ_1 and the distance d_1. Similarly, the voice creation unit 160 creates the target-side audio data by adjusting the volume balance between the left channel and the right channel according to the angle θ_2 and the distance d_2.
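One possible way to derive a left/right balance from an angle and a distance is sketched below. The patent does not specify the mapping, so everything here is an assumption: a constant-power pan law, a positive angle meaning that the source lies to the user's left, and inverse-distance attenuation beyond a reference distance. The function name `stereo_gains` is hypothetical.

```python
import math

def stereo_gains(angle_deg, distance_m, ref_distance_m=1.0):
    """Return (left_gain, right_gain) for a source at the given angle and distance."""
    # Assumed convention: positive angle = source to the user's left.
    pan = -math.sin(math.radians(angle_deg))       # -1 = full left, +1 = full right
    left = math.cos((pan + 1.0) * math.pi / 4.0)   # constant-power pan law
    right = math.sin((pan + 1.0) * math.pi / 4.0)
    # Assumed attenuation: inverse distance, never louder than at ref_distance_m.
    attenuation = ref_distance_m / max(distance_m, ref_distance_m)
    return left * attenuation, right * attenuation
```

A balance of this kind cannot by itself distinguish a source in front from one behind, since both give the same left/right ratio.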

The methods by which the voice creation unit 160 creates the start-side audio data and the target-side audio data have been described above. Next, how the audio data is created while the user 3 moves from the start position P_s to the target position P_g is described with reference to FIGS. 6 to 8.

FIG. 6 is an explanatory diagram illustrating the creation of audio data until the user 3 faces the direction of the target position P_g. Referring to FIG. 6, guidance of the user 3 from the start position P_s to the target position P_g begins at 6-1. From 6-1 to 6-3, the user 3 turns so that the user orientation coincides with the direction of the target position P_g. Because the angle θ_2 between the user orientation and the direction of the target position P_g (that is, the target-side sound source position P_g) changes as the user turns, the voice creation unit 160 continually creates target-side audio data in which a position at the distance d_2 in the direction offset by the angle θ_2 from the front direction is perceived as the sound source. Since the start position P_s (that is, the start-side sound source position P_s) coincides with the user position P_u, the angle θ_1 between the user orientation and the direction of the start-side virtual sound source position P_s is not defined. However, as shown in FIG. 6, the voice creation unit 160 regards, for example, the user orientation at the start of guidance in 6-1 as the direction of the start-side virtual sound source position P_s. That is, the voice creation unit 160 creates start-side audio data in which some position in the direction offset by the angle θ_1, here the angle between the current user orientation and the user orientation at the start of guidance (the direction of the start-side virtual sound source position P_s), is perceived as the sound source. Because the user orientation at the start of guidance is regarded as the direction of the start-side virtual sound source position P_s while the user is at the start position P_s, the user 3 can at any time turn back to the orientation held at the start of guidance.

FIG. 7 is an explanatory diagram illustrating the creation of audio data while the user 3 moves in a straight line toward the target position P_g. Referring to FIG. 7, the user 3 starts moving toward the target position P_g at 7-1. From 7-1 to 7-3, the user 3 moves in the direction of the target position P_g. Because the distance d_2 between the user position P_u and the target position P_g (that is, the target-side sound source position P_g) changes during this movement, the voice creation unit 160 continually creates target-side audio data in which a position at the distance d_2 in the front direction is perceived as the sound source. Likewise, because the distance d_1 between the user position P_u and the start position P_s (that is, the start-side sound source position P_s) changes, the voice creation unit 160 continually creates start-side audio data in which a position at the distance d_1 in the rear direction is perceived as the sound source. In creating the audio data in this way, the voice creation unit 160 changes, for example, the volume of the start-side audio data and of the target-side audio data according to the distance d_1 and the distance d_2. As described above, the user 3 can reach the target position P_g from the start position P_s by moving as shown in FIGS. 6 and 7. However, as described next, the user 3 may also change orientation partway through the movement, for example when trying to return to the start position P_s or after straying from the route.

FIG. 8 is an explanatory diagram illustrating the creation of audio data when the user 3 changes orientation partway through the movement. Referring to FIG. 8, the user 3 stops moving toward the target position P_g at 8-1. From 8-1 to 8-3, the user 3 turns to the right. Because the angle θ_2 between the user orientation and the direction of the target position P_g (that is, the target-side sound source position P_g) changes as the user turns, the voice creation unit 160 continually creates target-side audio data in which a position at the distance d_2 in the direction offset by the angle θ_2 from the front direction is perceived as the sound source. Similarly, because the angle θ_1 between the user orientation and the direction of the start position P_s (that is, the start-side sound source position P_s) changes, the voice creation unit 160 continually creates start-side audio data in which a position at the distance d_1 in the direction offset by the angle θ_1 from the front direction is perceived as the sound source.

Creating the start-side audio data and the target-side audio data in this way lets the user 3 grasp intuitively where the start position P_s and the target position P_g are, even when no sound-emitting device such as a speaker is installed at a specific position. In addition, because the sounds of this audio data that the user 3 hears change as the user moves, the user can tell intuitively at any time, from the changes in the sounds, whether they are heading in the correct direction and whether they have strayed from the route. Furthermore, because start-side audio data is created in addition to target-side audio data, the user can also return to the correct route or to the start position P_s. That is, even in a place where no sound-emitting device is installed, the user can intuitively grasp the correct traveling direction and how to return after leaving the route, in the same way as with the alternating different-call method.

The start-side audio data and the target-side audio data created as described above have, for example, different sound patterns. The sound pattern here is, for example, the type of sound. The start-side audio data has, for example, the sound type "kakkō" (a cuckoo call), and the target-side audio data has, for example, the sound type "kakakkō" (a variant cuckoo call). To provide more specific information, the start-side audio data may instead have the sound type "start position" or "behind", and the target-side audio data may have the sound type "target position" or "ahead". The sound pattern is not limited to the type of sound and may be, for example, the pitch or tempo of the sound. When the sound patterns of the start-side audio data and the target-side audio data differ from each other in this way, the user 3 can tell the two sounds apart. As a result, the user 3 can identify which sound corresponds to the start position P_s and which corresponds to the target position P_g.

As described above, the voice creation unit 160 creates the start-side audio data and the target-side audio data. The voice creation unit 160 then places the created start-side audio data and target-side audio data at different positions on the time axis. FIG. 9 is an explanatory diagram illustrating how the voice creation unit 160 arranges audio data on the time axis. Referring to FIG. 9, as shown in the time period t1, the voice creation unit 160 arranges, for example, the start-side audio data A and the target-side audio data B alternately on the time axis. Alternatively, as shown in the time period t2, the voice creation unit 160 may arrange the start-side audio data A and the target-side audio data B at different frequencies. When the start-side audio data and the target-side audio data are placed at different positions on the time axis in this way, the sound of the start-side audio data and the sound of the target-side audio data do not overlap. As a result, the user 3 can easily tell the two sounds apart. The voice creation unit 160 places the start-side audio data and the target-side audio data at different positions on the time axis by, for example, setting a parameter that indicates the output frequency or the output timing of each sound.
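The arrangement on the time axis can be pictured as filling a sequence of non-overlapping slots with the labels A and B. The sketch below is a hypothetical illustration: the slot model, the function name `arrange_on_time_axis`, and the parameter `target_per_start` (how many B slots follow each A slot) are assumptions chosen to mimic alternating output and output at different frequencies.

```python
def arrange_on_time_axis(total_slots, target_per_start=1):
    """Return one label per time slot: 'A' = start-side sound, 'B' = target-side sound.

    With target_per_start=1 the two sounds simply alternate; larger values
    make the target-side sound B occur more often than the start-side sound A.
    """
    pattern = ["A"] + ["B"] * target_per_start
    return [pattern[i % len(pattern)] for i in range(total_slots)]
```

Because each slot holds exactly one label, the two sounds never play at the same time.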

(Audio output unit 170)
The audio output unit 170 generates an analog audio signal by D/A-converting the start-side audio data and the target-side audio data created by the voice creation unit 160, and outputs the analog audio signal to the audio output device 20. The audio output unit 170 outputs the analog audio signal in accordance with the arrangement of the audio data on the time axis determined by the voice creation unit 160. When the audio output device 20 itself performs the D/A conversion, the audio output unit 170 may output the audio data as is to the audio output device 20.

An example of the configuration of the voice processing device 100 according to the present embodiment has been described above with reference to FIGS. 2 to 10, but the configuration of the voice processing device 100 according to the present embodiment is not limited to this example.

(Modification)
- Creation of audio data
For example, the voice creation unit 160 may create each of the start-side audio data and the target-side audio data with a volume or a sound pattern that depends on the relative relationship between the user position P_u or the user orientation and the corresponding one of the start-side virtual sound source position P_s and the target-side virtual sound source position P_g. As already described, the relative relationship includes, for example, the distance d_1 between the user position P_u and the start-side virtual sound source position P_s, the distance d_2 between the user position P_u and the target-side virtual sound source position P_g, the angle θ_1 between the user orientation and the direction of the start-side virtual sound source position P_s, or the angle θ_2 between the user orientation and the direction of the target-side virtual sound source position P_g. The sound pattern is, for example, the type, pitch, or tempo of the sound.

As a specific example, referring again to FIG. 7, the voice creation unit 160 may make the tempo of the sound in the start-side audio data or the target-side audio data faster as the distance d_1 or the distance d_2 becomes smaller. Alternatively, the voice creation unit 160 may change the tempo of the sound in the start-side audio data and the tempo of the sound in the target-side audio data according to the ratio between the distance d_1 and the distance d_2. Referring again to FIG. 6 or FIG. 8, the voice creation unit 160 may increase the volume of the start-side audio data or the target-side audio data as the angle θ_1 or the angle θ_2 becomes smaller. Alternatively, the voice creation unit 160 may change the volume of the start-side audio data and the volume of the target-side audio data according to the ratio between the angle θ_1 and the angle θ_2. Alternatively, when the angle θ_1 or the angle θ_2 is 0° or smaller than a predetermined threshold, that is, when the user 3 is facing the direction of the start position P_s or the target position P_g, the voice creation unit 160 may use a sound type for the start-side audio data or the target-side audio data that differs from the sound type used otherwise.
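A hypothetical mapping from distance and angle to tempo, volume, and sound type might look like the sketch below. All numeric ranges (60 to 180 beats per minute, a 20 m maximum distance, a 10 degree facing tolerance, a minimum volume of 0.2) and the name `cue_parameters` are assumptions for illustration; the text only states the direction of each change.

```python
def cue_parameters(distance_m, angle_deg, max_distance_m=20.0, facing_tol_deg=10.0):
    """Return (tempo_bpm, volume, sound_type) for one virtual sound source."""
    # Tempo rises linearly as the distance shrinks (assumed range 60-180 bpm).
    clamped = min(max(distance_m, 0.0), max_distance_m)
    closeness = 1.0 - clamped / max_distance_m
    tempo_bpm = 60.0 + 120.0 * closeness
    # Volume rises as the angle shrinks (assumed range 0.2-1.0).
    volume = 1.0 - 0.8 * min(abs(angle_deg), 180.0) / 180.0
    # A different sound type is used when the user faces the source.
    sound_type = "facing" if abs(angle_deg) <= facing_tol_deg else "normal"
    return tempo_bpm, volume, sound_type
```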

As another specific example, the voice creation unit 160 may create each of the start-side audio data and the target-side audio data from guidance information (that is, a type of sound) that depends on the relative relationship. For example, the guidance information creation unit 140 creates guidance information about the start position P_s or the target position P_g, together with sample audio data in which that guidance information is rendered as speech. The voice creation unit 160 then creates the start-side audio data or the target-side audio data using the sample audio data created by the guidance information creation unit 140. The guidance information is, for example, "d_2 meters to the target, angle θ_2 degrees". Alternatively, the guidance information may indicate what is located at the start position P_s or the target position P_g.

As described above, creating the start-side voice data and the target-side voice data with a volume or voice pattern that depends on the relative relationship allows the user 3 to recognize the start position P_s or the target position P_g more easily. For example, the user 3 can easily recognize the direction of the start position P_s or the target position P_g, or can easily recognize whether he or she is approaching the start position P_s or the target position P_g.

- Arrangement of voice data on the time axis
The voice creation unit 160 may also place each of the start-side voice data and the target-side voice data at different positions on the time axis at a frequency that depends on the relative relationship.

Specifically, the voice creation unit 160 may, for example, place the start-side voice data or the target-side voice data more frequently as the distance d_1 or the distance d_2 becomes smaller. Alternatively, the voice creation unit 160 may, for example, place the start-side voice data or the target-side voice data more frequently as the angle θ_1 or the angle θ_2 becomes smaller. Alternatively, the voice creation unit 160 may vary the frequency of the start-side voice data and the target-side voice data according to the ratio between the distance d_1 and the distance d_2, or the ratio between the angle θ_1 and the angle θ_2. Referring again to FIG. 9, for example, when the user 3 moves from the start position P_s toward the target position P_g and the ratio d_1/d_2 consequently becomes larger in time period t_2 than in time period t_1, the voice creation unit 160 may increase the frequency of the target-side voice data B.
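One way to picture this frequency control is a fixed number of time slots per period, each holding exactly one cue, so that the two kinds of voice data never overlap in time. The sketch below is only an assumption about how the ratio might be turned into a schedule: the slot model, the name `schedule_cues`, and the rule of keeping at least one slot for each cue are not taken from the specification. The labels "A" and "B" stand for the start-side and target-side voice data.

```python
def schedule_cues(d1, d2, slots=8):
    """Return a list of 'A'/'B' labels, one per time slot (requires slots >= 2).

    The share of 'B' slots grows with d1 / (d1 + d2), which increases
    whenever the ratio d1 / d2 increases.
    """
    total = d1 + d2
    share_b = 0.5 if total == 0 else d1 / total
    # Keep at least one slot for each cue so both remain audible.
    n_b = min(max(round(share_b * slots), 1), slots - 1)
    schedule = []
    acc = 0
    for _ in range(slots):
        # Error accumulation spreads the n_b 'B' slots evenly over the period.
        acc += n_b
        if acc >= slots:
            acc -= slots
            schedule.append("B")
        else:
            schedule.append("A")
    return schedule
```

For example, with d_1 equal to d_2 the two cues alternate, and as the user nears the target (d_1 grows, d_2 shrinks) more slots are given to the target-side cue.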

As described above, varying the frequency of placement on the time axis according to the relative relationship also allows the user 3 to recognize the start position P_s or the target position P_g more easily.

- Setting of the virtual sound source positions
The sound source setting unit 150 may set a position other than the start position P_s as the start-side virtual sound source position, and may set a position other than the target position P_g as the target-side virtual sound source position. Specific modified examples of setting the virtual sound source positions are described below with reference to FIG. 10.

FIG. 10 is an explanatory diagram for describing modified examples of setting the virtual sound source positions. As one example, as shown in 10-1 and 10-2 of FIG. 10, the sound source setting unit 150 may set any position 51 between the start position P_s and the user position P_u as the start-side virtual sound source position. Similarly, the sound source setting unit 150 may set any position 53 between the target position P_g and the user position P_u as the target-side virtual sound source position.
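A position between the user and the start or target position can be obtained, for instance, by linear interpolation. The helper below is a hypothetical illustration; the name `point_between` and the `fraction` parameter are assumptions, and the specification does not say how the intermediate position is chosen.

```python
def point_between(user_xy, anchor_xy, fraction=0.5):
    """Return a point on the straight segment from the user to the anchor.

    fraction = 0.0 gives the user position P_u; fraction = 1.0 gives the
    anchor (the start position P_s or the target position P_g).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be within [0, 1]")
    ux, uy = user_xy
    ax, ay = anchor_xy
    return (ux + fraction * (ax - ux), uy + fraction * (ay - uy))
```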

As another example, as shown in 10-3 and 10-4 of FIG. 10, the sound source setting unit 150 may set, as the start-side virtual sound source position, a position 55 that lies on the route from the start position P_s to the target position P_g and that the user must pass through to return from the user position P_u to the start position P_s. Similarly, the sound source setting unit 150 may set, as the target-side virtual sound source position, a position 57 that lies on the route from the start position P_s to the target position P_g and that the user must pass through to reach the target position P_g from the user position P_u.
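The specification does not state how the on-route position is selected. One plausible choice, shown here purely as an assumption, is the point of the route polyline closest to the user, where a user who has strayed would rejoin the route. The names `closest_point_on_segment` and `rejoin_point` are hypothetical.

```python
import math


def closest_point_on_segment(p, a, b):
    """Project point p onto segment a-b and clamp the result to the segment."""
    ax, ay = a
    bx, by = b
    px, py = p
    abx, aby = bx - ax, by - ay
    denom = abx * abx + aby * aby
    t = 0.0 if denom == 0 else ((px - ax) * abx + (py - ay) * aby) / denom
    t = max(0.0, min(1.0, t))
    return (ax + t * abx, ay + t * aby)


def rejoin_point(route, user_xy):
    """Return the point on the route polyline nearest to the user position."""
    if len(route) == 1:
        return route[0]
    best, best_dist = None, math.inf
    for a, b in zip(route, route[1:]):
        q = closest_point_on_segment(user_xy, a, b)
        dist = math.dist(q, user_xy)
        if dist < best_dist:
            best, best_dist = q, dist
    return best
```

For a route with vertices (0, 0), (10, 0), (10, 10) and a user at (4, 3), this returns the point (4, 0) on the first segment.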

As yet another example, the sound source setting unit 150 may set any position in the vicinity of the start position P_s and any position in the vicinity of the target position P_g as the start-side virtual sound source position and the target-side virtual sound source position, respectively.

From another viewpoint, the sound source setting unit 150 may set the start-side virtual sound source position and the target-side virtual sound source position in a three-dimensional space rather than on a two-dimensional plane. In this case, the voice creation unit 160 may use an HRTF to create start-side voice data or target-side voice data in which a position higher or lower than the user 3 is perceived as the sound source. With such voice data, the user 3 can intuitively grasp, for example, that the target position is on a lower or higher floor.
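In a three-dimensional setting, the elevation angle of the virtual sound source relative to the user is one quantity an HRTF lookup would typically need. The sketch below computes that angle and a coarse label; the function names and the 10° tolerance are assumptions, and the HRTF convolution itself is not shown.

```python
import math


def elevation_deg(user_xyz, source_xyz):
    """Elevation of the source as seen from the user, in degrees.

    Positive values mean the source is above the user, negative below.
    """
    dx = source_xyz[0] - user_xyz[0]
    dy = source_xyz[1] - user_xyz[1]
    dz = source_xyz[2] - user_xyz[2]
    return math.degrees(math.atan2(dz, math.hypot(dx, dy)))


def vertical_label(user_xyz, source_xyz, tolerance_deg=10.0):
    """Classify the source as 'above', 'below', or 'level' (tolerance is assumed)."""
    e = elevation_deg(user_xyz, source_xyz)
    if e > tolerance_deg:
        return "above"
    if e < -tolerance_deg:
        return "below"
    return "level"
```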

<4. Process flow>
An example of the position estimation process according to the present embodiment is described below with reference to FIG. 11. FIG. 11 is a flowchart showing an example of the schematic flow of the position estimation process according to the present embodiment.

First, in step S201, the target input unit 120 acquires a final target position. In step S203, the detection unit 110 detects the user position P_u and the orientation of the user. Then, in step S205, the guidance information creation unit 140 creates a route from the user position P_u to the final target position based on the map information stored in the storage unit 130.

Next, in step S207, the guidance information creation unit 140 sets, based on the created route, a target position P_g to which the user 3 is to be guided and a start position P_s of the guidance to that target position. Then, in step S209, the sound source setting unit 150 sets a start-side virtual sound source position corresponding to the start position P_s and a target-side virtual sound source position corresponding to the target position P_g.

Next, in step S211, the voice creation unit 160 uses the user position P_u detected by the detection unit 110 to create start-side voice data in which the start-side virtual sound source position P_s is perceived by the user 3 as the sound source position, and target-side voice data in which the target-side virtual sound source position P_g is perceived by the user 3 as the sound source. Then, in step S213, the voice creation unit 160 places the start-side voice data and the target-side voice data at different positions on the time axis. Then, in step S215, the voice output unit 170 generates an analog voice signal by D/A-converting the start-side voice data and the target-side voice data created by the voice creation unit 160, and outputs the analog voice signal to the voice output device 20.

Next, in step S217, the detection unit 110 detects the user position P_u and the orientation of the user. Next, in step S219, the voice creation unit 160 determines whether the user position P_u or the orientation of the user has changed. If the user position P_u or the orientation of the user has changed, the process proceeds to step S221. Otherwise, the process returns to step S213.

In step S221, the guidance information creation unit 140 determines whether the user 3 has reached the target position P_g. If the user 3 has reached the target position P_g, the process proceeds to step S223. Otherwise, the process returns to step S211.

In step S223, the guidance information creation unit 140 determines whether the target position P_g is the final target position. If the target position P_g is the final target position, the process ends. Otherwise, the process returns to step S207.
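The loop structure of the flow above can be sketched as follows. This is a simplified simulation, not the embodiment itself: the route creation of steps S201 and S205 is replaced by a ready-made list of target positions, sensor reads are replaced by a list of positions, only the position (not the orientation) is compared in step S219, and the name `run_guidance` and the `reach_radius` parameter are assumptions. The final check is labeled S223, the step to which S221 hands over.

```python
import math


def run_guidance(targets, detections, reach_radius=1.0):
    """Simulate the flow; returns (log of step labels, list of (start, target) legs).

    targets: successive target positions P_g, the last being the final target.
    detections: user positions P_u returned by successive sensor reads; the
    caller must supply enough readings for the user to reach every target.
    """
    log, legs = [], []
    readings = iter(detections)
    user = next(readings)                       # S203: initial detection
    log.append("S203")
    for target in targets:
        legs.append((user, target))             # S207: start P_s and target P_g
        log.append("S207")
        log.append("S209")                      # set both virtual sound source positions
        while True:
            log.append("S211")                  # create both kinds of voice data
            moved = False
            while not moved:
                log.append("S213")              # place them at different times
                log.append("S215")              # output
                new_user = next(readings)       # S217: detect again
                log.append("S217")
                moved = new_user != user        # S219: did the user move?
                user = new_user
            log.append("S221")                  # reached the target position?
            if math.dist(user, target) <= reach_radius:
                break
        log.append("S223")                      # final target? the loop ends if so
    return log, legs
```

Note how an unchanged reading returns only to S213, while a changed reading that has not yet reached the target returns to S211 so the voice data is recreated for the new user position.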

As described above for one embodiment of the present invention, the present embodiment allows the user to intuitively grasp the correct direction of travel and how to return after leaving the route, as in the alternating different-sound (call-and-response) method, even when no sound-emitting device is installed at a specific position.

Although a preferred embodiment of the present invention has been described with reference to the accompanying drawings, it goes without saying that the present invention is not limited to this example. It is clear that those skilled in the art can conceive of various changes and modifications within the scope of the claims, and these are naturally understood to fall within the technical scope of the present invention.

The steps of the voice processing in this specification need not necessarily be processed in time series in the order shown in the flowchart. For example, the steps of the voice processing may be processed in an order different from the order shown in the flowchart, or may be processed in parallel.

It is also possible to create a computer program that causes hardware such as a CPU, ROM, and RAM built into the voice processing device 100 to perform functions equivalent to those of the components of the voice processing device 100. A storage medium storing the computer program is also provided.

DESCRIPTION OF SYMBOLS
1 Guidance system
3 User
10 Sensor
20 Voice output device
100 Voice processing device
110 Detection unit
120 Target input unit
130 Storage unit
140 Guidance information creation unit
150 Sound source setting unit
160 Voice creation unit
170 Voice output unit

Claims

A voice processing device comprising:
a detection unit that detects a user position;
a sound source setting unit that sets a first virtual sound source position corresponding to a start position, which is the user position detected by the detection unit at the time guidance of the user is started, and a second virtual sound source position corresponding to a target position of the guidance or to a position on a route between the user position and the target position; and
a voice creation unit that, using the user position detected by the detection unit, creates first voice data in which the first virtual sound source position is perceived by the user as a sound source position and second voice data in which the second virtual sound source position is perceived by the user as a sound source,
wherein the first voice data and the second voice data have different voice patterns, and
the voice creation unit places the first voice data and the second voice data at different positions on a time axis.

The voice processing device according to claim 1, wherein the detection unit further detects an orientation of the user, and
the voice creation unit creates the first voice data or the second voice data, respectively, based on a relative relationship between the user position or the orientation of the user and the first virtual sound source position or the second virtual sound source position.

The voice processing device according to claim 2, wherein the voice creation unit creates each of the first voice data and the second voice data with a volume or a voice pattern corresponding to the relative relationship.

The voice processing device according to claim 2 or 3, wherein the voice creation unit places each of the first voice data and the second voice data at different positions on the time axis at a frequency corresponding to the relative relationship.

The voice processing device according to any one of claims 2 to 4, wherein the relative relationship includes a distance between the user position and the first virtual sound source position or the second virtual sound source position, or an angle between the orientation of the user and the direction of the first virtual sound source position or the direction of the second virtual sound source position.

The voice processing device according to any one of claims 1 to 5, wherein the voice creation unit creates the first voice data and the second voice data in a stereo format.

The voice processing device according to claim 6, wherein the voice creation unit creates the first voice data and the second voice data by convolution with a head-related transfer function.

A voice processing method comprising:
detecting, by a detection unit, a user position;
setting, by a sound source setting unit, a first virtual sound source position corresponding to a start position, which is the user position detected by the detection unit at the time guidance of the user is started, and a second virtual sound source position corresponding to a target position of the guidance or to a position on a route between the user position and the target position;
creating, by a voice creation unit using the user position detected by the detection unit, first voice data in which the first virtual sound source position set by the sound source setting unit is perceived by the user as a sound source position, and second voice data that has a voice pattern different from that of the first voice data and in which the second virtual sound source position set by the sound source setting unit is perceived by the user as a sound source; and
placing, by the voice creation unit, the first voice data and the second voice data at different positions on a time axis.

A program for causing a computer to function as:
a detection unit that detects a user position;
a sound source setting unit that sets a first virtual sound source position corresponding to a start position, which is the user position detected by the detection unit at the time guidance of the user is started, and a second virtual sound source position corresponding to a target position of the guidance or to a position on a route between the user position and the target position; and
a voice creation unit that, using the user position detected by the detection unit, creates first voice data in which the first virtual sound source position is perceived by the user as a sound source position and second voice data that has a voice pattern different from that of the first voice data and in which the second virtual sound source position is perceived by the user as a sound source, and that places the first voice data and the second voice data at different positions on a time axis.

A guidance system comprising a sensor, a voice output device, and a voice processing device,
wherein the voice processing device comprises:
a detection unit that detects a user position based on an input from the sensor;
a sound source setting unit that sets a first virtual sound source position corresponding to a start position, which is the user position detected by the detection unit at the time guidance of the user is started, and a second virtual sound source position corresponding to a target position of the guidance or to a position on a route between the user position and the target position; and
a voice creation unit that, using the user position detected by the detection unit, creates first voice data in which the first virtual sound source position is perceived by the user as a sound source position and second voice data in which the second virtual sound source position is perceived by the user as a sound source,
the first voice data and the second voice data have different voice patterns,
the voice creation unit places the first voice data and the second voice data at different positions on a time axis, and
the voice output device outputs the voice of the first voice data and the voice of the second voice data.