JP2019185107A - Information processing apparatus, terminal apparatus, information processing system, program and information processing method

Info

Publication number: JP2019185107A
Application number: JP2018070861A
Authority: JP
Inventors: 慧渡部; Akira Watanabe
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2018-04-02
Filing date: 2018-04-02
Publication date: 2019-10-24

Abstract

PROBLEM TO BE SOLVED: To provide an information processing apparatus, a terminal apparatus, an information processing system, a program, and an information processing method that present route guidance information according to not only the utterance content of a speaker but also the speaker's age group or gender, and thereby present information suited to the speaker. SOLUTION: In an information processing system 1, an information processing apparatus 100 acquires voice data related to an utterance of a speaker via a server communication unit 101, identifies the age group or gender of the speaker with a gender and age determination unit 112 by referring to the acquired voice data, and determines, with a guidance determination unit 113, the route guidance information to be presented to the speaker according to the identified age group or gender. SELECTED DRAWING: Figure 1

Description

The present invention relates to an information processing apparatus, a terminal device, an information processing system, a program, and an information processing method.

Conventionally, navigation systems are known that set a destination based on an utterance in which the user specifies the destination only vaguely, and then search for a guidance route to that destination.

JP 2002-267472 A (published September 18, 2002)

However, a configuration that sets a destination and guides the user based on an utterance that specifies the destination only vaguely does not always provide information appropriate for the user, and an improvement in user convenience has been desired.

An object of one aspect of the present invention is to provide a technique capable of presenting information suited to a user based on the user's input voice.

In order to solve the above problem, an information processing apparatus according to one aspect of the present invention is an information processing apparatus including an acquisition unit and a control unit, wherein the control unit acquires, via the acquisition unit, voice data related to an utterance of a speaker, identifies the age group or gender of the speaker by referring to the acquired voice data, and determines route guidance information to be presented to the speaker according to the identified age group or gender.

In order to solve the above problem, a terminal device according to one aspect of the present invention is a terminal device including an acquisition unit, a presentation unit, and a control unit, wherein the control unit acquires, via the acquisition unit, voice data related to an utterance of a speaker, identifies the age group or gender of the speaker by referring to the acquired voice data, determines route guidance information to be presented to the speaker according to the identified age group or gender, and presents the determined route guidance information via the presentation unit.

In order to solve the above problem, an information processing system according to one aspect of the present invention is an information processing system including one or more terminal devices and an information processing apparatus. Any one of the one or more terminal devices transmits voice data to the information processing apparatus. The information processing apparatus includes a communication unit and a control unit. The control unit acquires, via the communication unit, voice data related to an utterance of a speaker, identifies the age group or gender of the speaker by referring to the acquired voice data, determines route guidance information to be presented to the speaker according to the identified age group or gender, and transmits the route guidance information to any one of the one or more terminal devices via the communication unit. The terminal device that receives the route guidance information presents the route guidance information.

In order to solve the above problem, an information processing method according to one aspect of the present invention includes a voice data acquisition step of acquiring voice data, an attribute identification step of identifying the age group or gender of a speaker by referring to the acquired voice data, and a route guidance information determination step of determining route guidance information to be presented to the speaker according to the identified age group or gender.

According to one aspect of the present invention, information suited to a user can be presented based on the user's input voice.

FIG. 1 is a block diagram showing the main configuration of the information processing system 1 according to Embodiment 1.
FIGS. 2(a) to 2(c) are diagrams showing the external configuration of the terminal device.
FIG. 3 is a diagram schematically showing an overview of the information processing system according to Embodiment 1.
FIG. 4 is a diagram showing an example of a display of route guidance information provided to a speaker.
FIG. 5 is a diagram showing a data table stored in the storage unit.
FIG. 6 is a diagram showing a data table stored in the storage unit.
FIG. 7 is a flowchart showing the processing flow of the information processing system according to Embodiment 1.
FIG. 8 is a flowchart showing the processing flow of the information processing system according to Embodiment 1.
FIG. 9 is a diagram schematically showing an overview of the information processing system according to Embodiment 2.
FIG. 10 is a diagram showing a data table stored in the storage unit.
FIG. 11 is a flowchart showing the processing flow of the information processing system according to Embodiment 3.
FIG. 12 is a block diagram illustrating the configuration of a computer usable as the terminal device and the information processing apparatus.

[Embodiment 1]
Hereinafter, Embodiment 1 of the present invention will be described in detail.

FIG. 1 is a block diagram showing the main configuration of the information processing system 1 according to Embodiment 1. FIGS. 2(a) to 2(c) are diagrams showing examples of the external configuration of the terminal device 10. FIG. 3 is a diagram schematically showing an overview of the information processing system 1. FIG. 4 is a diagram showing an example of a display of route guidance information provided to a speaker.

As shown in FIG. 1, the information processing system 1 includes an information processing apparatus 100 and a terminal device 10. In the information processing system 1, the information processing apparatus 100 and the terminal device 10 are configured to communicate with each other by wireless communication via a network. Although not illustrated, the information processing system 1 may be configured such that one or more information processing apparatuses 100 are communicably connected to one or more terminal devices 10.

In the information processing system 1, the information processing apparatus 100 functions as a server on the network for the one or more terminal devices 10. The terminal device 10 is a terminal for presenting information processed by the information processing apparatus 100 to a speaker.

(Configuration of terminal device 10)
The terminal device 10 includes a terminal communication unit 11, a terminal control unit 12, a voice input unit 13, a voice output unit 14, and a display unit 15. In the present embodiment, the information processing system 1 is described using an example in which the terminal device 10 and the information processing apparatus 100 are separate units, but the present invention is not limited to this. For example, the terminal device 10 may itself include the functional units of the information processing apparatus 100, or a plurality of terminal devices 10 may perform processing jointly to realize the functional units of the information processing apparatus 100.

The terminal communication unit 11 communicates wirelessly with the information processing apparatus 100 via a network such as the Internet. The terminal communication unit 11 functions as a receiving unit that receives information from the information processing apparatus 100 via the network. The terminal communication unit 11 also functions as a transmitting unit that transmits information to the information processing apparatus 100 via the network.

The terminal control unit 12 is an arithmetic device having a function of comprehensively controlling each unit of the terminal device 10. The terminal control unit 12 controls each component of the terminal device 10 by, for example, having one or more processors (for example, a CPU) execute a program stored in one or more memories (for example, a RAM or a ROM).

The voice input unit 13 has a function of collecting and storing surrounding sound, and includes, for example, a microphone that converts sound into an electrical signal and stores it. The voice input unit 13 collects the speech of the speaker and converts it into voice data. The voice data generated by the voice input unit 13 is transmitted to the information processing apparatus 100 via the terminal communication unit 11.

The voice output unit 14 includes a speaker and outputs sound under the control of the terminal control unit 12. In addition to the voice output unit 14, the terminal device 10 may be configured to perform various other kinds of output, for example through a drive output unit that drives a movable part or a light emission output unit that lights an LED. The voice output unit 14 presents information received from the information processing apparatus 100 via the terminal communication unit 11 to the speaker by outputting it as sound.

The display unit 15 is a display device that can display images under the control of the terminal control unit 12. The display unit 15 presents information received from the information processing apparatus 100 via the terminal communication unit 11 to the speaker by displaying it.

The voice output unit 14 and the display unit 15 are also collectively referred to as a presentation unit. The presentation unit, however, is not limited to the configuration of the voice output unit 14 and the display unit 15, and may be provided separately from the terminal device 10. For example, the information processing system 1 may include a presentation unit that is installed on a floor or wall of a building and gives route guidance using light or images.

The terminal device 10 may be, for example, a robot terminal that presents information to a speaker, as shown in FIG. 2(a). The robot terminal 10a is a robot that speaks by means of the voice output unit 14, and may present information by speaking it to the speaker. The robot terminal 10a may also present information to the speaker by displaying it on the display unit 15. Furthermore, the robot terminal 10a may be a robot that has one or more movable parts and can take various postures and perform various motions by driving the movable parts under the control of the terminal control unit 12.

The terminal device 10 may also be, for example, a display device 10b that displays images on the display unit 15, as shown in FIG. 2(b). The display device 10b presents information to the speaker by displaying it on the display unit 15, which includes an LED display or an organic EL display. The display device 10b may also be able to present information to the speaker by outputting it as sound from the voice output unit 14.

The terminal device 10 may also be, for example, a portable terminal 10c such as a smartphone, a mobile phone, or a tablet PC, as shown in FIG. 2(c). The portable terminal 10c can present information to the speaker by at least one of display on the display unit 15 and sound output from the voice output unit 14.

(Main configuration of information processing apparatus 100)
As shown in FIG. 1, the information processing apparatus 100 includes a server communication unit 101 (acquisition unit), a control unit 110, and a storage unit (guidance DB) 120. In the present embodiment, the information processing system 1 is described using an example in which it includes a single information processing apparatus 100, but the present invention is not limited to this. For example, the functional units of the information processing apparatus 100 may be implemented on separate servers so that the information processing apparatus 100 is made up of a plurality of servers, or a plurality of information processing apparatuses 100 may perform processing jointly.

The server communication unit 101 communicates wirelessly with the one or more terminal devices 10 via a network such as the Internet. The server communication unit 101 functions as an acquisition unit that acquires information from the one or more terminal devices 10 via the network. The server communication unit 101 also functions as an output unit that outputs information to the one or more terminal devices 10 via the network.

The server communication unit 101 acquires, from the terminal device 10, voice data based on the utterance of a speaker. The voice data transmitted from the terminal device 10 contains information for identifying the terminal device 10, including the terminal ID of the terminal device 10. By referring to the terminal ID, the server communication unit 101 identifies which of the plurality of terminal devices 10 the voice data was received from, and also identifies which terminal device 10 the information is to be transmitted to.
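
The role of the terminal ID can be illustrated with a minimal Python sketch. The names `VoiceMessage`, `GuidanceReply`, and `handle_voice_message`, the field layout, and the `decide_guidance` callback are assumptions made for illustration. The description only states that the voice data carries identifying information including the terminal ID, and that the ID is used both to identify the sender and to address the reply.

```python
from dataclasses import dataclass
from typing import Callable, Set


@dataclass(frozen=True)
class VoiceMessage:
    """Voice data sent by a terminal (hypothetical layout)."""
    terminal_id: str   # identifies the sending terminal device
    audio: bytes       # encoded speech of the speaker


@dataclass(frozen=True)
class GuidanceReply:
    """Route guidance information addressed to one terminal (hypothetical layout)."""
    terminal_id: str
    text: str


def handle_voice_message(
    message: VoiceMessage,
    known_terminal_ids: Set[str],
    decide_guidance: Callable[[bytes], str],
) -> GuidanceReply:
    """Identify the sender by its terminal ID and address the reply to it."""
    if message.terminal_id not in known_terminal_ids:
        raise KeyError(f"unknown terminal ID: {message.terminal_id}")
    text = decide_guidance(message.audio)
    # The reply goes back to the terminal that received the speaker's voice.
    return GuidanceReply(terminal_id=message.terminal_id, text=text)
```

In this sketch the reply is always addressed to the sending terminal; the description would equally allow the server to pick a different terminal based on the ID.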

The control unit 110 is an arithmetic device having a function of comprehensively controlling each unit of the information processing apparatus 100. The control unit 110 controls each component of the information processing apparatus 100 by, for example, having one or more processors (for example, a CPU) execute a program stored in one or more memories (for example, a RAM or a ROM).

The storage unit 120 is storage that holds the various data used by the control unit 110. The storage unit 120 is realized by, for example, any one of rewritable nonvolatile memories such as an EPROM, an EEPROM (registered trademark), an HDD, or a flash memory, or by a combination of one or more of them.

FIGS. 5 and 6 are diagrams showing data tables stored in the storage unit 120. FIG. 5 shows a destination information table in which information on destination candidates is written, and FIG. 6 shows a terminal information table in which information on each of the one or more terminal devices 10 is written.

As shown in FIG. 5, the destination information table contains, for each destination candidate, information on the facility, name, location, coordinate position, keywords, and utterance. The coordinate position of a destination is expressed in three-dimensional coordinates, and may be given as latitude, longitude, and altitude, or as XYZ coordinates relative to a predetermined reference position (0, 0, 0). The keywords in the destination information table are words related to the facility or name with which they are associated, such as words indicating the category of goods that can be purchased at the destination or words indicating the services available at the destination.

As shown in FIG. 6, the terminal information table contains information on the installation facility, installation location, and coordinate position in association with the terminal ID unique to each terminal device 10. The coordinate position of a terminal device 10 is expressed in three-dimensional coordinates, and may be given as latitude, longitude, and altitude, or as XYZ coordinates relative to the same predetermined reference position (0, 0, 0) as that used for the destination candidates in the destination information table.
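
The two tables described above could be modeled roughly as follows. This is a sketch under assumptions: the class and field names (`Destination`, `Terminal`, `position`, and so on) and every row value are hypothetical and chosen only to mirror the columns named in the description. The coordinates use the XYZ variant with a shared reference position (0, 0, 0).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (x, y, z) relative to a shared reference position (0, 0, 0)
Coordinate = Tuple[float, float, float]


@dataclass(frozen=True)
class Destination:
    """One row of the destination information table (hypothetical field names)."""
    facility: str
    name: str
    place: str
    position: Coordinate
    keywords: Tuple[str, ...]
    utterance: str


@dataclass(frozen=True)
class Terminal:
    """One row of the terminal information table (hypothetical field names)."""
    terminal_id: str
    facility: str
    place: str
    position: Coordinate


# Illustrative rows only; none of these values come from the description.
DESTINATIONS: List[Destination] = [
    Destination("Store A", "Women's clothing", "3F", (10.0, 20.0, 3.0),
                ("clothes", "women"), "The women's clothing section is on 3F."),
    Destination("Store A", "Restroom", "1F", (0.0, 5.0, 1.0),
                ("toilet", "restroom"), "The restroom is on 1F."),
]

# Keyed by terminal ID so a received ID can be resolved to a position directly.
TERMINALS: Dict[str, Terminal] = {
    "T001": Terminal("T001", "Store A", "1F entrance", (2.0, 4.0, 1.0)),
}
```

Keying the terminal table by ID reflects how the description uses it: the server receives an ID with the voice data and needs the matching installation position.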

(Configuration of control unit 110)
As shown in FIG. 1, the control unit 110 includes a voice recognition unit 111, a gender and age determination unit 112, a guidance determination unit 113, and a guidance content generation unit 114.

The voice recognition unit 111 refers to the voice data that the server communication unit 101 has acquired from the terminal device 10 and extracts the utterance words contained in the utterance content of the speaker.

The gender and age determination unit 112 refers to the voice data on the speaker's utterance that the server communication unit 101 has acquired from the terminal device 10 and identifies the age group or gender of the speaker. The gender and age determination unit 112 can be configured to identify the age group or gender of the speaker from the speaker's speech.

For example, the gender and age determination unit 112 extracts, from the speaker's speech, the mean and standard deviation of the frequencies contained in that speech. The gender and age determination unit 112 may then identify whether the speaker is an adult male, an adult female, or a child according to whether the extracted mean and standard deviation fall within preset ranges. The gender and age determination unit 112 may also extract a physical feature of the speaker's speech, for example a feature of the sound pressure level of the speech waveform, and use it to identify whether the speaker is an adult male, an adult female, or a child. Furthermore, the gender and age determination unit 112 may refer to the shape of the spectrum of the speech waveform to identify whether the speaker is an adult male, an adult female, or a child.

The gender and age determination unit 112 may also be able to identify the gender and age of the speaker by machine learning using an acoustic feature known as an i-vector. The gender and age determination unit 112 is not limited to classifying speakers as adult males, adult females, and children, and may be able to determine the age group and gender of the speaker in finer detail. For example, in addition to adult males and adult females, the gender and age determination unit 112 may be able to separately identify elderly males (for example, aged 66 or older) and elderly females (for example, aged 66 or older). Likewise, the gender and age determination unit 112 may be able to divide children into boys (for example, aged 12 or younger) and girls (for example, aged 12 or younger). When the speaker's voice is difficult to classify, or when a sound other than a human voice is input, the gender and age determination unit 112 may output an "undetermined" result.
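
The range-based determination using the mean and standard deviation of frequency, together with the "undetermined" result, might be sketched as follows. This is a rough illustration under assumptions, not the determination the apparatus actually performs: the input is assumed to be a list of per-frame frequency estimates in Hz produced elsewhere, and the numeric ranges in `PRESET_RANGES`, the label strings, and the function name `classify_speaker` are all hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

# Hypothetical preset ranges in Hz: (label, (mean_low, mean_high), (std_low, std_high)).
# The description only says that ranges are preset; these numbers are placeholders.
PRESET_RANGES: List[Tuple[str, Tuple[float, float], Tuple[float, float]]] = [
    ("adult_male", (75.0, 165.0), (0.0, 60.0)),
    ("adult_female", (165.0, 255.0), (0.0, 80.0)),
    ("child", (255.0, 450.0), (0.0, 120.0)),
]


def classify_speaker(frame_frequencies_hz: Sequence[float]) -> str:
    """Return the first label whose preset ranges contain the mean and std."""
    if len(frame_frequencies_hz) < 2:
        return "unknown"  # too little speech to compute meaningful statistics
    m = mean(frame_frequencies_hz)
    s = pstdev(frame_frequencies_hz)
    for label, (mean_lo, mean_hi), (std_lo, std_hi) in PRESET_RANGES:
        if mean_lo <= m < mean_hi and std_lo <= s <= std_hi:
            return label
    return "unknown"  # difficult voice or non-speech sound
```

A finer-grained classifier (elderly male, girl, and so on) would extend the table with more rows or replace the lookup with a learned model such as one built on i-vector features.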

The guidance determination unit 113 determines the route guidance information to be presented to the speaker according to the utterance words extracted from the speaker's utterance content by the voice recognition unit 111 and the age group or gender of the speaker identified by the gender and age determination unit 112.

With these configurations, route guidance information that reflects not only the speaker's utterance content but also the speaker's age group or gender can be presented, so that information suited to each speaker can be presented.

When there are a plurality of candidates for the route guidance information to be presented to the speaker, the guidance determination unit 113 selects the route guidance information to be presented by referring to at least one of the age group or gender of the speaker identified by the gender and age determination unit 112 and the position information of the terminal device 10 to which the speaker's voice data was input.

The guidance determination unit 113 matches the utterance words extracted from the speaker's utterance content by the voice recognition unit 111 against the keywords in the destination information table and determines the destination candidates.

The guidance determination unit 113 also narrows down the plurality of destination candidates according to the age group or gender of the speaker identified by the gender and age determination unit 112.

Furthermore, the guidance determination unit 113 may be able to narrow down the destination candidates according to the position of the terminal device 10 to which the speaker's voice was input. For example, when the speaker's destination is a restroom, the guidance determination unit 113 may determine the restroom closest to the position of the terminal device 10 to which the speaker's voice was input as the destination.
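
The three narrowing stages described above (keyword matching, narrowing by speaker attribute, and choosing the nearest candidate) can be sketched as below. This is a sketch under assumptions: the `audience` field is a hypothetical column added here to make attribute-based narrowing concrete, the fallback when the attribute filter removes every candidate is an illustrative choice, and all function names and table rows are invented for the example.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Coordinate = Tuple[float, float, float]


@dataclass(frozen=True)
class Destination:
    name: str
    position: Coordinate
    keywords: Tuple[str, ...]
    # Hypothetical column: speaker attributes the destination targets.
    # An empty tuple means the destination suits every speaker.
    audience: Tuple[str, ...] = ()


def match_keywords(words: Sequence[str], table: Sequence[Destination]) -> List[Destination]:
    """Keep destinations having at least one keyword among the utterance words."""
    return [d for d in table if any(w in d.keywords for w in words)]


def narrow_by_attribute(cands: Sequence[Destination], attribute: str) -> List[Destination]:
    """Keep candidates aimed at the speaker's attribute or at everyone."""
    narrowed = [d for d in cands if not d.audience or attribute in d.audience]
    # Assumption: if nothing survives, keep the unfiltered candidates.
    return narrowed or list(cands)


def nearest(cands: Sequence[Destination], terminal_pos: Coordinate) -> Optional[Destination]:
    """Pick the candidate closest to the terminal that received the voice."""
    if not cands:
        return None
    return min(cands, key=lambda d: math.dist(d.position, terminal_pos))


# Illustrative table; none of these rows come from the description.
TABLE = [
    Destination("Women's wear 3F", (10.0, 20.0, 3.0), ("clothes",), ("adult_female",)),
    Destination("Women's wear 4F", (10.0, 20.0, 4.0), ("clothes",), ("adult_female",)),
    Destination("Men's wear 5F", (10.0, 20.0, 5.0), ("clothes",), ("adult_male",)),
    Destination("Restroom 1F", (0.0, 5.0, 1.0), ("toilet",)),
    Destination("Restroom 3F", (30.0, 40.0, 3.0), ("toilet",)),
]
```

With this table, an utterance word "clothes" from a speaker classified as `adult_female` leaves the two women's wear entries, while "toilet" heard at a terminal near (2, 4, 1) resolves to the 1F restroom through `nearest`.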

The guidance content generation unit 114 generates guidance content according to the destination determined by the guidance determination unit 113. For example, the guidance content generation unit 114 may generate the guidance content by referring to the information in the destination information table that corresponds to the determined destination. The guidance content generation unit 114 may also determine guidance content for route guidance from the speaker's current position to the destination, according to the position of the destination determined by the guidance determination unit 113 and the position of the terminal device 10 to which the speaker's voice was input. In the present embodiment, the information on destination candidates generated by the guidance content generation unit 114 and the information including route guidance to the destination are collectively referred to as route guidance information.

Thus, according to the information processing system 1, as shown in FIG. 3, when a speaker says to the terminal device 10, for example, "Where is the clothing department?", voice data based on that utterance is provided from the terminal device 10 to the information processing apparatus 100. The information processing apparatus 100 refers to the acquired voice data, determines that the speaker is an adult female, extracts information matching the speaker's attribute from the destination information table, generates route guidance information, and provides it to the terminal device 10. The terminal device 10 presents the route guidance information received from the information processing apparatus 100, "The women's clothing section is on 3F and 4F", to the speaker by at least one of sound output and display.

As described above, the route guidance information provided by the information processing apparatus 100 includes information on route guidance to the destination. When providing information to the speaker according to the route guidance information acquired from the information processing apparatus 100, the terminal device 10 may be configured not only to announce the destination but also to show the route to the destination, as shown in FIG. 4.

(Processing flow of the information processing system 1)
FIG. 7 is a flowchart showing an example of the processing flow of the information processing system 1.

(Step S1)
The speaker's voice is input to the terminal device 10.

(Step S2)
The terminal control unit 12 of the terminal device 10 transmits the voice data generated by the voice input unit 13 to the information processing apparatus 100 via the terminal communication unit 11.

(Step S3)
The server communication unit 101 of the information processing apparatus 100 acquires the voice data.

(Step S4)
The control unit 110 of the information processing apparatus 100 uses the voice recognition unit 111 to extract an utterance word from the acquired voice data. The control unit 110 also uses the gender and age determination unit 112 to identify the age group or gender of the speaker by referring to the acquired voice data.

(Step S5)
The control unit 110 of the information processing apparatus 100 uses the guidance determination unit 113 to match the utterance word extracted from the voice data by the voice recognition unit 111 against the keywords contained in the destination information table of the storage unit 120. The guidance determination unit 113 determines whether the utterance word matches a keyword. If it determines that the utterance word matches a keyword (YES in step S5), the process proceeds to step S6. If it determines that the utterance word does not match any keyword (NO in step S5), the process proceeds to step S10.

(Step S6)
The guidance determination unit 113 determines whether exactly one destination candidate was selected from the destination information table as a result of the match between the utterance word and the keyword. If there is exactly one destination candidate (YES in step S6), the process proceeds to step S10. If there are multiple destination candidates (NO in step S6), the process proceeds to step S7.

(Step S7)
The guidance determination unit 113 narrows down the multiple destination candidates that matched the utterance word by the age group or gender of the speaker identified by the gender and age determination unit 112, and selects the destination candidates that correspond to the speaker's age group or gender.

(Step S8)
The guidance determination unit 113 determines whether exactly one destination candidate corresponds to the speaker's age group or gender. If there is exactly one destination candidate (YES in step S8), the process proceeds to step S10. If there are multiple destination candidates (NO in step S8), the process proceeds to step S9.

(Step S9)
The guidance determination unit 113 narrows down the multiple destination candidates that correspond to the speaker's age group or gender according to the position of the terminal device 10 at which the speaker made the voice input. For example, the guidance determination unit 113 selects, from the multiple destination candidates, the destination closest to the position of that terminal device 10.

Although not illustrated, a priority may be set for each destination candidate in the destination information table, and the guidance determination unit 113 may be configured to select the candidate with the highest priority from the multiple destination candidates.
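The narrowing performed in steps S5 to S9 can be sketched as follows. This is only an illustrative sketch, not the implementation described in the specification: the `Destination` record, the `audience` encoding (such as `"adult_female"`), the sample table, and the fallback used when the attribute filter removes every candidate are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from math import dist
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Destination:
    name: str
    keywords: frozenset                   # keywords matched against the utterance word
    audience: Optional[str]               # assumed encoding; None means "any speaker"
    position: Tuple[float, float, float]  # XYZ position inside the facility


def choose_destination(table: Sequence[Destination], word: str, attribute: str,
                       terminal_pos: Tuple[float, float, float]) -> Optional[Destination]:
    # Step S5: match the utterance word against the keywords in the table.
    candidates = [d for d in table if word in d.keywords]
    if not candidates:
        return None  # step S10 would then report that nothing was found
    # Step S6: a single candidate needs no further narrowing.
    if len(candidates) == 1:
        return candidates[0]
    # Step S7: narrow by the speaker's age group or gender.
    narrowed = [d for d in candidates if d.audience in (None, attribute)]
    if not narrowed:
        narrowed = candidates  # assumption: keep all candidates if the filter removes everything
    # Step S8: stop if exactly one candidate remains.
    if len(narrowed) == 1:
        return narrowed[0]
    # Step S9: pick the candidate closest to the terminal that received the voice input.
    return min(narrowed, key=lambda d: dist(d.position, terminal_pos))


# Hypothetical destination information table.
TABLE = [
    Destination("Women's clothing 3F", frozenset({"clothing"}), "adult_female", (10.0, 0.0, 3.0)),
    Destination("Women's clothing 4F", frozenset({"clothing"}), "adult_female", (10.0, 0.0, 4.0)),
    Destination("Men's clothing 5F", frozenset({"clothing"}), "adult_male", (10.0, 0.0, 5.0)),
    Destination("Children's clothing 8F", frozenset({"clothing"}), "child", (10.0, 0.0, 8.0)),
]
```

A priority-based variant, as mentioned in the note above, could replace the distance key in the last step with a priority field; that field is not modeled here.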

(Step S10)
The control unit 110 of the information processing apparatus 100 uses the guidance content generation unit 114 to generate route guidance information corresponding to the destination determined by the guidance determination unit 113 in step S6, step S8, or step S9. If the utterance word did not match any keyword in step S5, the guidance content generation unit 114 generates route guidance information indicating that no destination candidate was found.

(Step S11)
The control unit 110 of the information processing apparatus 100 transmits the generated route guidance information to the terminal device 10 via the server communication unit 101.

(Step S12)
The terminal communication unit 11 of the terminal device 10 receives the route guidance information from the information processing apparatus 100.

(Step S13)
The terminal control unit 12 of the terminal device 10 presents the route guidance information to the speaker by at least one of voice guidance from the voice output unit 14 and a guidance screen on the display unit 15.

(Processing flow of the information processing apparatus 100)
FIG. 8 is a flowchart showing an example of the processing flow of the information processing apparatus 100, in particular the processing flow for gender and age determination.

(Step S21)
The information processing apparatus 100 acquires voice data from the terminal device 10 through the server communication unit 101.

(Step S22)
The control unit 110 of the information processing apparatus 100 uses the gender and age determination unit 112 to analyze the acquired voice data and identify the age group of the speaker.

(Step S23)
The gender and age determination unit 112 refers to the age group identified in step S22 and determines whether the speaker is a child. If it determines that the speaker is a child (YES in step S23), the process proceeds to step S26. If it determines that the speaker is not a child (NO in step S23), the process proceeds to step S24.

(Step S24)
The gender and age determination unit 112 further analyzes the voice data acquired in step S21 and identifies the gender of the speaker.

(Step S25)
The gender and age determination unit 112 determines whether the gender of the speaker could be identified in step S24. If the gender could be identified (YES in step S25), the process proceeds to step S26. If the gender could not be identified (NO in step S25), the process proceeds to step S27.

(Step S26)
The gender and age determination unit 112 outputs the determination result to the guidance determination unit 113.

(Step S27)
The gender and age determination unit 112 outputs to the guidance determination unit 113 a determination-failure result indicating that the gender and age determination failed.

If the determination by the gender and age determination unit 112 fails, the control unit 110 may transmit to the terminal device 10 guidance information that prompts the speaker to speak again. Also, if the determination fails, the control unit 110 may transmit to the terminal device 10 route guidance information that includes all of the destination candidates extracted according to the speaker's utterance word.

The control unit 110 may refer to the acquired voice data to identify the age group of the speaker and, if the identified age group is above a predetermined age, further identify the gender of the speaker. Because the apparatus first determines whether the speaker is a child and determines the speaker's gender only when the speaker is not a child, it can provide information appropriate for the speaker and improve the speaker's convenience.
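The flow of steps S22 to S27 (age group first, gender only for non-children, and an explicit failure result) might be sketched as below. The two estimator callables and the result dictionary layout are hypothetical stand-ins; the specification does not state how the voice analysis itself is performed.

```python
from typing import Callable, Dict, Optional

CHILD = "child"
FAILED = "determination_failed"


def determine_attributes(voice_data: bytes,
                         estimate_age_group: Callable[[bytes], str],
                         estimate_gender: Callable[[bytes], Optional[str]]) -> Dict[str, Optional[str]]:
    # Step S22: identify the age group from the voice data.
    age_group = estimate_age_group(voice_data)
    # Step S23: for a child, skip the gender analysis and go straight to step S26.
    if age_group == CHILD:
        return {"status": "ok", "age_group": CHILD, "gender": None}
    # Step S24: analyze the same voice data further to identify the gender.
    gender = estimate_gender(voice_data)
    # Step S25 / S27: report failure if the gender could not be identified.
    if gender is None:
        return {"status": FAILED}
    # Step S26: output the determination result.
    return {"status": "ok", "age_group": age_group, "gender": gender}
```

On a `determination_failed` result, the caller could either prompt the speaker to speak again or return every matched destination candidate, as described above.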

Although not illustrated, the terminal device 10 may include an imaging unit that captures an image including the speaker, and may be able to provide the captured image including the speaker to the information processing apparatus 100 together with the voice data. The information processing apparatus 100 may then be able to identify the age group or gender of the speaker by referring to the captured image of the speaker together with the voice data.

[Embodiment 2]
Embodiment 2 of the present invention is described below. For convenience of explanation, members having the same functions as those described in Embodiment 1 are given the same reference numerals, and their description is not repeated. The main configuration of the information processing system 1 according to Embodiment 2 is the same as that of the information processing system 1 described in Embodiment 1, and its description is omitted.

FIG. 9 is a diagram schematically showing an overview of the information processing system 1 according to Embodiment 2, using the case where the speaker is a child as an example. The information processing system 1 according to Embodiment 2 is a system that can change the presentation form of the route guidance information presented to the speaker according to the speaker's age group or gender.

As shown in FIG. 9, when a speaker says to the terminal device 10, for example, "Where is the clothing department?", voice data based on the utterance is provided from the terminal device 10 to the information processing apparatus 100. The information processing apparatus 100 refers to the acquired voice data, determines that the speaker is a child, extracts from the destination information table the information corresponding to the content of the utterance and the speaker's attributes, and generates route guidance information. When the speaker is a child, the control unit 110 of the information processing apparatus 100 changes the wording of the route guidance information presented to the speaker to a tone and wording that are familiar and friendly to children. For example, if the route guidance information is 「子供服コーナーは８Ｆ」 ("The children's clothing section is on 8F"), the control unit 110 may change the sentence ending to produce the casual form 「子供服コーナーは８Ｆだよ」.

The control unit 110 of the information processing apparatus 100 may also include, in the generated route guidance information, control information for the terminal device 10 that corresponds to the speaker's age group or gender, and provide it to the terminal device 10. When the terminal device 10 is a robot terminal 10a, the control unit 110 may generate control information for driving a movable part of the robot terminal 10a and provide it to the terminal device 10. As a result, when the speaker is a child, the robot terminal 10a may be able to present the route guidance information in a form such as outputting voice while rotating its movable part.

When the terminal device 10 is a display device 10b, the control unit 110 of the information processing apparatus 100 may generate route guidance information that includes images intended for children, or route guidance information written in hiragana, so that the route guidance information can be presented to the speaker in a child-oriented form.

In this way, because the information processing system 1 can change the presentation form of the route guidance information according to the speaker's age group or gender, it can present the route guidance information in a form that is easy for the speaker to understand and improve the speaker's convenience. It can also provide a user interface that feels friendly to the speaker.
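As a rough sketch of how the presentation form could be switched, the function below changes the sentence ending and attaches terminal-specific hints. The terminal type strings, the dictionary keys, and the `"rotate"` command are hypothetical; only the child-oriented ending 「だよ」, the polite ending 「です」, the rotating movable part, and the hiragana and child-oriented image options come from the description.

```python
from typing import Any, Dict


def build_presentation(base_text: str, age_group: str, terminal_type: str) -> Dict[str, Any]:
    is_child = age_group == "child"
    # Change only the sentence ending: casual for children, polite otherwise.
    text = base_text + ("だよ" if is_child else "です")
    presentation: Dict[str, Any] = {"text": text}
    if is_child and terminal_type == "robot":
        # Hypothetical control information for the movable part of the robot terminal 10a.
        presentation["control"] = {"movable_part": "rotate"}
    if is_child and terminal_type == "display":
        # Hypothetical rendering hints for the display device 10b.
        presentation["style"] = {"hiragana": True, "child_image": True}
    return presentation
```

Actual conversion of the text to hiragana is left out; the sketch only flags that the display should use it.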

[Embodiment 3]
Embodiment 3 of the present invention is described below. For convenience of explanation, members having the same functions as those described in Embodiment 1 are given the same reference numerals, and their description is not repeated. The main configuration of the information processing system 1 according to Embodiment 3 is the same as that of the information processing system 1 described in Embodiment 1, and its description is omitted.

FIG. 10 is a diagram showing an example of the waypoint information table stored in the storage unit 120. FIG. 11 is a flowchart showing the processing flow of the control unit 110 of the information processing apparatus 100, specifically the processing performed by the guidance content generation unit 114. The information processing system 1 according to Embodiment 3 is a system that can change the path to the destination included in the route guidance information presented to the speaker according to the speaker's age group or gender.

As shown in FIG. 10, the waypoint information table contains, for each facility, information on recommended waypoints (spots): name, location, coordinate position, and utterance-related information. The coordinate position of a waypoint is expressed in three-dimensional coordinates. It may be given as latitude, longitude, and altitude, or as XYZ coordinates relative to the same predetermined reference position (0, 0, 0) used for the destination candidates in the destination information table.

Next, the processing flow of the guidance content generation unit 114 is described with reference to FIG. 11.

(Step S31)
The guidance content generation unit 114 extracts waypoint candidates from the waypoint information table by referring to the position of the terminal device 10 at which the speaker made the voice input, the destination determined by the guidance determination unit 113, and the age group or gender of the speaker.

For example, if the terminal device 10 is installed on 4F of a department store, the speaker is an adult male, and the destination is the toilet on 6F, the guidance content generation unit 114 extracts a hobby-related store on 5F that is registered for male speakers as a waypoint candidate.

(Step S32)
The guidance content generation unit 114 generates path guidance information indicating a path from the current location to the destination via the waypoint candidates extracted in step S31.

(Step S33)
The guidance content generation unit 114 generates route guidance information that includes the path guidance information generated in step S32.

(Step S34)
The guidance content generation unit 114 outputs the route guidance information generated in step S33 to the server communication unit 101, which transmits it to the terminal device 10.
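Steps S31 and S32 can be illustrated with the small sketch below. The `Waypoint` record, the attribute encoding, the sample table, and the rule that a waypoint must lie on a floor between the terminal and the destination are assumptions made for this example (the rule is suggested by the 4F, 5F, 6F case above but is not stated in the specification).

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Waypoint:
    name: str
    floor: int
    audience: str  # assumed encoding of the speaker attribute the spot is recommended for


def extract_waypoints(table: Sequence[Waypoint], terminal_floor: int,
                      destination_floor: int, attribute: str) -> List[Waypoint]:
    # Step S31: keep spots for this speaker that lie between the terminal and the destination.
    low, high = sorted((terminal_floor, destination_floor))
    return [w for w in table if w.audience == attribute and low <= w.floor <= high]


def build_path(terminal_floor: int, destination_name: str, destination_floor: int,
               waypoints: Sequence[Waypoint]) -> List[str]:
    # Step S32: visit the waypoints in order of distance from the terminal, then the destination.
    ordered = sorted(waypoints, key=lambda w: abs(w.floor - terminal_floor))
    stops = [f"Current location ({terminal_floor}F)"]
    stops += [f"{w.name} ({w.floor}F)" for w in ordered]
    stops.append(f"{destination_name} ({destination_floor}F)")
    return stops


# Hypothetical waypoint information table.
WAYPOINTS = [
    Waypoint("Hobby shop", 5, "adult_male"),
    Waypoint("Cosmetics counter", 5, "adult_female"),
    Waypoint("Toy corner", 8, "child"),
]
spots = extract_waypoints(WAYPOINTS, 4, 6, "adult_male")
path = build_path(4, "Toilet", 6, spots)
```

With the sample table, an adult male asking at the 4F terminal for the 6F toilet is routed past the 5F hobby shop.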

In this way, because the guidance content generation unit 114 changes the path to the destination included in the route guidance information according to the speaker's age group or gender, the speaker can enjoy the walk to the destination. The facility, in turn, can present recommended spots to the speaker.

The guidance content generation unit 114 is not limited to including in the route guidance information a path that passes through recommended waypoints for the speaker's age group or gender. For example, when the speaker is a child, it may generate path guidance information by selecting a safe path with heavy foot traffic.

In addition to information on recommended spots, the waypoint information table may contain information on means of movement such as stairs, elevators, and escalators, and the guidance content generation unit 114 may generate path guidance information that uses a means of movement appropriate to the speaker's age group or gender. Thus, for example, when the speaker is elderly, the guidance content generation unit 114 can guide the speaker from the current location to the destination along a barrier-free path.
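One way to picture the selection of a means of movement is the sketch below. The policy table is an assumption made for illustration: the description only says that an elderly speaker can be guided along a barrier-free path, so the elevator-only rule and the nearest-first ordering of `available` are hypothetical.

```python
from typing import Dict, List, Optional, Set

ALL_MEANS: Set[str] = {"stairs", "escalator", "elevator"}

# Assumed policy: elderly speakers are routed via barrier-free means only.
ALLOWED_MEANS: Dict[str, Set[str]] = {
    "elderly": {"elevator"},
}


def pick_means(available: List[str], age_group: str) -> Optional[str]:
    """Return the first (nearest) available means of movement suitable for the age group."""
    allowed = ALLOWED_MEANS.get(age_group, ALL_MEANS)
    for means in available:  # assumed to be ordered nearest first
        if means in allowed:
            return means
    return None  # no suitable means on this segment
```

A fuller version could also take the speaker's situation into account, for example routing someone with a stroller via the elevator.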

Furthermore, when the speaker is a child and, in particular, a young child, the guidance content generation unit 114 may be able to confirm that an adult is accompanying the child, for example by generating and outputting guidance information that prompts the accompanying adult to speak.

In a configuration in which the information processing apparatus 100 identifies the speaker's age group or gender by referring to a captured image including the speaker together with the voice data, the control unit 110 may also be able to identify the speaker's situation, for example that the speaker is pushing a cart or a stroller. The control unit 110 may then include in the route guidance information a path that uses a means of movement suited to the identified situation.

[Embodiment 4]
Each of the above embodiments describes an example that uses the information processing apparatus 100 as a single server, but the functions of the information processing apparatus 100 may be implemented on separate servers. When multiple servers are used, they may be managed by the same operator or by different operators.

[Embodiment 5]
Each block of the terminal device 10 and of the information processing apparatus 100, which is a server, may be implemented by a logic circuit (hardware) formed on an integrated circuit (IC chip) or the like, or may be implemented by software. In the latter case, each of the terminal device 10 and the information processing apparatus 100 can be configured using a computer such as the one shown in FIG. 12.

FIG. 12 is a block diagram illustrating the configuration of a computer 910 that can be used as the terminal device 10 and the information processing apparatus 100. The computer 910 includes an arithmetic device 912, a main storage device 913, an auxiliary storage device 914, an input/output interface 915, and a communication interface 916, which are connected to one another via a bus 911. The arithmetic device 912, the main storage device 913, and the auxiliary storage device 914 may be, for example, a processor (such as a CPU: Central Processing Unit), a RAM (random access memory), and a hard disk drive, respectively. Connected to the input/output interface 915 are an input device 920 with which the user inputs various information to the computer 910 and an output device 930 with which the computer 910 outputs various information to the user. The input device 920 and the output device 930 may be built into the computer 910 or may be connected to it externally. For example, the input device 920 may be a keyboard, a mouse, or a touch sensor, and the output device 930 may be a display, a printer, or a speaker. A device that serves as both the input device 920 and the output device 930, such as a touch panel in which a touch sensor and a display are integrated, may also be used. The communication interface 916 is an interface through which the computer 910 communicates with external devices.

The auxiliary storage device 914 stores various programs for operating the computer 910 as the terminal device 10 and the information processing apparatus 100. The arithmetic device 912 loads the programs stored in the auxiliary storage device 914 into the main storage device 913 and executes the instructions they contain, thereby causing the computer 910 to function as the units of the terminal device 10 and the information processing apparatus 100. The recording medium of the auxiliary storage device 914 that records information such as the programs need only be a computer-readable "non-transitory tangible medium", and may be, for example, a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit. If the computer can execute a program recorded on the recording medium without loading it into the main storage device 913, the main storage device 913 may be omitted. Each of the above devices (the arithmetic device 912, the main storage device 913, the auxiliary storage device 914, the input/output interface 915, the communication interface 916, the input device 920, and the output device 930) may be provided singly or in plurality.

The programs may also be acquired from outside the computer 910, in which case they may be acquired via any transmission medium (such as a communication network or broadcast waves). The present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the programs are embodied by electronic transmission.

〔まとめ〕
本発明の態様１に係る情報処理装置（１００）は、取得部（サーバ通信部１０１）と、制御部（１１０）とを備えた情報処理装置（１００）であって、上記制御部（１１０）は、発話者の発話に関する音声データを、上記取得部（１０１）を介して取得し、取得した上記音声データを参照して、上記発話者の年齢層又は性別を特定し、特定した上記年齢層又は性別に応じて、上記発話者に提示する道案内情報を決定する構成である。 [Summary]
An information processing apparatus (100) according to an aspect 1 of the present invention is an information processing apparatus (100) including an acquisition unit (server communication unit 101) and a control unit (110), and includes the control unit (110). Acquires voice data related to the utterance of the speaker via the acquisition unit (101), refers to the acquired voice data, identifies the age group or gender of the speaker, and specifies the specified age group Or it is the structure which determines the route guidance information shown to the said speaker according to sex.

上記の構成によれば、発話者の年齢層又は性別に応じた道案内情報を発話者に提示することができ、発話者のそれぞれに適した情報を提示することができる。 According to said structure, the route guidance information according to the speaker's age group or sex can be shown to a speaker, and the information suitable for each speaker can be shown.

本発明の態様２に係る情報処理装置（１００）は、上記の態様１において、上記制御部（１１０）は、上記音声データを参照して、上記発話者の年齢層を特定し、特定した年齢層が所定の年齢よりも大きい場合には、更に、当該発話者の性別を特定する構成としてもよい。 In the information processing apparatus (100) according to the second aspect of the present invention, in the first aspect, the control unit (110) identifies the age group of the speaker by referring to the voice data, and the identified age When the layer is older than a predetermined age, the gender of the speaker may be specified.

上記の構成によれば、発話者にとって適切な情報を提供することができ、発話者の利便性の向上を図ることができる。 According to the above configuration, it is possible to provide appropriate information for the speaker, and to improve the convenience of the speaker.

本発明の態様３に係る情報処理装置（１００）は、上記の態様１又は２において、上記制御部（１１０）は、上記発話者に提示する上記道案内情報の候補が複数ある場合に、上記発話者の年齢層、上記発話者の性別、及び上記音声データが取得された場所の少なくとも何れか一つを参照して、上記発話者に提示する道案内情報を選択する構成としてもよい。 The information processing apparatus (100) according to aspect 3 of the present invention is the information processing apparatus (100) according to aspect 1 or 2, wherein the control unit (110) is configured as described above when there are a plurality of candidates for the route guidance information presented to the speaker. The route guidance information to be presented to the speaker may be selected with reference to at least one of the age group of the speaker, the sex of the speaker, and the location where the voice data is acquired.

上記の構成によれば、複数の目的地の候補から、目的地の候補を絞り込んで発話者に提示することができ、発話者にとって適切な情報を提供することができる。 According to the above configuration, destination candidates can be narrowed down and presented to the speaker from a plurality of destination candidates, and appropriate information for the speaker can be provided.

本発明の態様４に係る情報処理装置（１００）は、上記の態様１から３の何れか一項において、上記制御部（１１０）は、上記発話者の年齢層又は性別に応じて、上記発話者に提示する上記道案内情報の提示形態を変更する構成としてもよい。 The information processing device (100) according to aspect 4 of the present invention is the information processing apparatus (100) according to any one of aspects 1 to 3, wherein the control unit (110) determines the utterance according to the age group or gender of the speaker. It is good also as a structure which changes the presentation form of the said route guidance information shown to a person.

上記の構成によれば、発話者の年齢層又は性別に応じた適切な提示形態で、道案内情報を発話者に提示することができる。 According to said structure, route guidance information can be shown to a speaker by the suitable presentation form according to the speaker's age group or sex.

本発明の態様５に係る情報処理装置（１００）は、上記の態様１から４の何れか一項において、上記制御部（１１０）は、上記発話者の年齢層又は性別に応じて、上記発話者に提示する上記道案内情報に含まれる目的地までの経路を変更する構成としてもよい。 The information processing device (100) according to aspect 5 of the present invention is the information processing apparatus (100) according to any one of the aspects 1 to 4, wherein the control unit (110) includes the utterance according to the age group or gender of the speaker. It is good also as a structure which changes the route to the destination contained in the said route guidance information shown to a person.

上記の構成によれば、発話者の年齢層又は性別に応じた適切な経路を含んだ道案内情報を発話者に提示することができる。 According to said structure, the route guidance information containing the suitable path | route according to the speaker's age group or sex can be shown to a speaker.

本発明の態様６に係る端末装置（１０）は、取得部（音声入力部１２）と、提示部（表示部１５）と、制御部（１１０）とを備えた端末装置（１０）であって、上記制御部（１１０）は、発話者の発話に関する音声データを、上記音声入力部（１２）を介して取得し、取得した上記音声データを参照して、上記発話者の年齢層又は性別を特定し、特定した上記年齢層又は性別に応じて、上記発話者に提示する道案内情報を決定し、決定した道案内情報を、上記提示部を介して提示する構成である。 A terminal device (10) according to aspect 6 of the present invention is a terminal device (10) including an acquisition unit (speech input unit 12), a presentation unit (display unit 15), and a control unit (110). The control unit (110) acquires voice data related to the utterance of the speaker via the voice input unit (12), and refers to the acquired voice data to determine the age group or gender of the speaker. According to the specified age group or sex, the route guidance information to be presented to the speaker is determined, and the determined route guidance information is presented via the presenting unit.

The information processing apparatus 100 according to each aspect of the present invention may be realized by a computer. In that case, a control program that realizes the information processing apparatus 100 on a computer by causing the computer to operate as each unit (software element) of the information processing apparatus 100, and a computer-readable recording medium on which that program is recorded, also fall within the scope of the present invention.

The present invention is not limited to the embodiments described above. Various modifications are possible within the scope of the claims, and embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention. Furthermore, new technical features can be formed by combining the technical means disclosed in the respective embodiments.

DESCRIPTION OF SYMBOLS
1 Information processing system
10 Terminal device
11 Terminal communication unit
12 Terminal control unit
13 Voice input unit
14 Voice output unit
15 Display unit
100 Information processing apparatus
101 Server communication unit (acquisition unit, communication unit)
110 Control unit
111 Voice recognition unit
112 Gender and age determination unit
113 Guidance information determination unit
114 Guidance content generation unit
113 Guidance determination unit
120 Storage unit

Claims

An information processing apparatus comprising an acquisition unit and a control unit, wherein
the control unit:
acquires voice data relating to an utterance of a speaker via the acquisition unit;
identifies the age group or gender of the speaker by referring to the acquired voice data; and
determines route guidance information to be presented to the speaker according to the identified age group or gender.

The information processing apparatus according to claim 1, wherein
the control unit identifies the age group of the speaker by referring to the voice data and,
when the identified age group is above a predetermined age, further identifies the gender of the speaker.

The information processing apparatus according to claim 1, wherein,
when there are a plurality of candidates for the route guidance information to be presented to the speaker, the control unit selects the route guidance information to be presented to the speaker by referring to at least one of the age group of the speaker, the gender of the speaker, and the place where the voice data was acquired.

The information processing apparatus according to any one of claims 1 to 3, wherein
the control unit changes a presentation form of the route guidance information presented to the speaker according to the age group or gender of the speaker.

The information processing apparatus according to any one of claims 1 to 4, wherein
the control unit changes the route to the destination included in the route guidance information presented to the speaker according to the age group or gender of the speaker.

A terminal device comprising an acquisition unit, a presentation unit, and a control unit, wherein
the control unit:
acquires voice data relating to an utterance of a speaker via the acquisition unit;
identifies the age group or gender of the speaker by referring to the acquired voice data;
determines route guidance information to be presented to the speaker according to the identified age group or gender; and
presents the determined route guidance information via the presentation unit.

An information processing system comprising one or more terminal devices and an information processing apparatus, wherein
one of the one or more terminal devices transmits voice data to the information processing apparatus,
the information processing apparatus comprises
a communication unit and a control unit,
the control unit:
acquires voice data relating to an utterance of a speaker via the communication unit;
identifies the age group or gender of the speaker by referring to the acquired voice data;
determines route guidance information to be presented to the speaker according to the identified age group or gender; and
transmits the route guidance information to one or more of the terminal devices via the communication unit, and
the terminal device that has received the route guidance information
presents the route guidance information.

A program for causing a computer to function as the information processing apparatus according to claim 1, the program causing the computer to function as the acquisition unit and the control unit.

An information processing method comprising:
a voice data acquisition step of acquiring voice data relating to an utterance of a speaker;
an attribute identification step of identifying the age group or gender of the speaker by referring to the acquired voice data; and
a route guidance information determination step of determining route guidance information to be presented to the speaker according to the identified age group or gender.
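
The three steps of the method (acquisition, attribute identification, guidance determination) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions: the `Attributes` type, the callables passed in for acquisition and identification, the lookup table keyed by age group and gender, and the default message are all hypothetical and stand in for whatever voice input and classification the actual apparatus uses.

```python
from typing import Callable, Dict, NamedTuple, Optional, Tuple


class Attributes(NamedTuple):
    age_group: str
    gender: Optional[str]  # None when gender was not identified


# Hypothetical table: (age_group, gender) -> route guidance text.
GuidanceTable = Dict[Tuple[str, Optional[str]], str]


def determine_guidance(attrs: Attributes, table: GuidanceTable, default: str) -> str:
    """Guidance determination step: exact match, then age group only, then default."""
    exact = table.get((attrs.age_group, attrs.gender))
    if exact is not None:
        return exact
    return table.get((attrs.age_group, None), default)


def run_method(
    acquire: Callable[[], bytes],
    identify: Callable[[bytes], Attributes],
    table: GuidanceTable,
    default: str = "Please ask the staff.",
) -> str:
    voice_data = acquire()        # voice data acquisition step
    attrs = identify(voice_data)  # attribute identification step (classifier is assumed)
    return determine_guidance(attrs, table, default)
```

Passing the acquisition and identification steps in as callables keeps the sketch independent of any particular microphone interface or speaker classifier, neither of which is specified here.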