JP2022067685A - Interactive communication device, communication system, and program

Info

Publication number: JP2022067685A
Application number: JP2020176397A
Authority: JP
Inventors: 公太平瀬; Kota Hirase; 拓也川田; Takuya Kawada; 明子大野; Akiko Ono
Original assignee: Tokyo Gas Co Ltd
Current assignee: Tokyo Gas Co Ltd
Priority date: 2020-10-21
Filing date: 2020-10-21
Publication date: 2022-05-09
Anticipated expiration: 2040-10-21
Also published as: JP7469211B2

Abstract

To provide an interactive communication device and the like that, through conversation with a user, reads aloud in a manner suited to the occasion, in contrast to conventional reading that is always identical.
SOLUTION: An interactive communication device that advances communication through conversation with a user comprises: a request acquisition unit 13 that acquires a request concerning reading aloud from the user; a detection unit 14 that detects a state of the user from the conversation with the user; a storage unit 15 that stores reading patterns; a selection unit 16 that selects, from the storage unit 15, a reading pattern according to the detected state of the user; and a voice output unit 18 that reads a book aloud in the selected reading pattern.
SELECTED DRAWING: Figure 3

Description

The present invention relates to an interactive communication device, a communication system, and a program.

There are interactive communication devices that advance communication through conversation with a user.

Patent Document 1 describes a legged robot. When reading aloud a story printed or recorded in a book or other print or recording medium, or a story downloaded over a network, this legged robot does not simply read the text verbatim. Instead, it uses external factors such as changes in time, changes in season, or changes in the user's emotions to dynamically rework the story within a range substantially identical to the original content, so that it can read different content each time.

Japanese Unexamined Patent Publication No. 2002-205291

Audio content consisting of recordings of narrators or voice actors reading books aloud has become available for download via the Internet. Home robots that read such audio content aloud to young children have also appeared. Normally, this audio content merely plays back the recorded voice of the narrator or voice actor, so the same voice is output in the same tone every time. Young children, however, tend to want to hear the same story again and again, and identical audio output each time can lack interest. Because it is also desirable for the parents of young children to take an interest in the content of the book while accompanying the child, variation in the way a book is read aloud has been desired.
An object of the present invention is to provide an interactive communication device and the like that, through conversation with a user, reads aloud using a book and a reading pattern suited to the occasion, in contrast to conventional reading that is always identical.

Thus, according to the present invention, there is provided an interactive communication device that advances communication through conversation with a user, the device comprising: request acquisition means for acquiring a request concerning reading aloud from the user; detection means for detecting a state of the user from the conversation with the user; storage means for storing reading patterns; selection means for selecting a reading pattern from the storage means according to the detected state of the user; and voice output means for reading a book aloud in the selected reading pattern.

Further, the selection means can select the book to be read aloud based on the detected state of the user and/or user information registered in advance. In this case, the selection of the book becomes more accurate.
The device may further include voice acquisition means for acquiring the user's voice, and the detection means can detect the state of the user based on the voice acquired by the voice acquisition means. In this case, the state of the user can be detected from the wording contained in the user's voice.
Further, the selection means can change the reading pattern based on the history of past readings of the same book to the same user. In this case, it is possible to prevent the same reading pattern from always being selected when the user's state is the same.
Furthermore, the selection means can select, based on the detected state of the user, a reading pattern specified by a combination of one or more parameters among reading speed, pitch level of the voice, voice quality, and intonation. In this case, it becomes easier to vary the reading pattern.
Further, the detection means can also detect, from the conversation with the user, the user's evaluation of the reading of a book, and the selection means can select the reading pattern while also taking the user's evaluation into account. In this case, the user's evaluation can be fed back into the selection of the reading pattern.
The detection means can also distinguish among a plurality of users, and the selection means can select the reading pattern according to the state of any one of the plurality of users. In this case, the user who is to listen to the reading can be identified from among the plurality of users.
In addition, the selection means can select the reading pattern according to the state of a child among the plurality of users. In this case, a reading pattern that is effective for reading aloud to a child can be selected.
Further, the detection means can also detect the situation around the device itself, and the selection means can select the reading pattern based on the detected situation. In this case, the reading pattern can be selected with the surrounding situation also taken into account.

Further, according to the present invention, there is provided a communication system comprising an interactive communication device that reads a book aloud and a storage device that stores data of audio content in which a book is read aloud, wherein the interactive communication device advances communication through conversation with a user and comprises: request acquisition means for acquiring a request concerning reading aloud from the user; detection means for detecting a state of the user from the conversation with the user; storage means for storing reading patterns; selection means for selecting a reading pattern from the storage means according to the detected state of the user; and voice output means for reading the book aloud in the selected reading pattern.

Furthermore, according to the present invention, there is provided a program for causing a computer to implement: a request acquisition function of acquiring a request concerning reading aloud from a user; a detection function of detecting a state of the user from conversation with the user; a selection function of selecting a reading pattern according to the detected state of the user; and a voice output function of reading a book aloud in the selected reading pattern.

According to the present invention, it is possible to provide an interactive communication device and the like that, through conversation with a user, reads aloud in a manner suited to the occasion, in contrast to conventional reading that is always identical.

A diagram showing a configuration example of the communication system in the present embodiment.
A diagram illustrating the case where the terminal device is a robot.
A block diagram showing a functional configuration example of the communication system.
A flowchart illustrating an example of the operation of the communication system of the present embodiment.
(a) to (b) are diagrams showing the data structure stored in the storage unit with respect to book titles.
A diagram showing the data structure stored in the storage unit with respect to reading patterns.
(a) to (c) are diagrams showing the case where voice is divided into a fundamental frequency and an aperiodic component.
A diagram showing an example of a spectral envelope.
A diagram showing an example of a method of estimating the age of a user.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

<Description of the entire communication system 1>
FIG. 1 is a diagram showing a configuration example of the communication system 1 according to the present embodiment.
As illustrated, the communication system 1 of the present embodiment is configured by connecting a terminal device 10 and a management server 20 via a network 70, a network 80, and an access point 90.

The terminal device 10 is an example of an interactive communication device that reads books aloud. The terminal device 10 can communicate with the user by some means, such as words or actions, and can advance communication through conversation with the user. That is, when the user utters a question or instruction, the terminal device 10 returns some response to it. The response is, for example, voice, an image, or a gesture. Conversely, the terminal device 10 may pose a question or give an instruction, and the user may respond with voice or a gesture. Communication between the user and the terminal device 10 is established through these exchanges. The terminal device 10 can be, for example, a robot. The robot is placed, for example, in the home of the user who owns it.

FIG. 2 is a diagram illustrating the case where the terminal device 10 is a robot.
The terminal device 10 as a robot shown in FIG. 2 may be a mobile type that can move, for example by walking, or may be a stationary type that does not move.
The terminal device 10 includes a communication antenna 101 that transmits and receives information, a microphone 102 that acquires voice, a speaker 103 that outputs sound such as voice, an operation button 104 operated by the user, and a control unit 105 that controls the terminal device 10 as a whole.

The management server 20 is a server computer that manages the communication system 1 as a whole. The management server 20 is an example of a storage device, and stores data of audio content in which a narrator or voice actor reading a book aloud has been recorded. The terminal device 10 can download the audio content data from the management server 20, store it in the terminal device 10, and output it as audio. Alternatively, the terminal device 10 can receive the audio content from the management server 20 in streaming form and output it as audio.

The management server 20 includes a CPU (Central Processing Unit) as computing means and a main memory as storage means. The CPU executes various software such as an OS (operating system) and applications. The main memory is a storage area that stores the various software and the data used to execute it. The management server 20 further includes a communication interface for communicating with the outside (hereinafter referred to as "communication I/F"), a display mechanism including a video memory, a display, and the like, and an input mechanism such as input buttons, a touch panel, or a keyboard. The management server 20 also includes storage such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive) as an auxiliary storage device.

The network 70 is a communication means used for information communication between the terminal device 10 and the management server 20, and is, for example, the Internet.
Like the network 70, the network 80 is also a communication means used for information communication between the terminal device 10 and the management server 20, and is, for example, a LAN (Local Area Network).

The access point 90 is a device that performs wireless communication using a wireless communication line. The access point 90 mediates the transmission and reception of information between the terminal device 10 and the network 70.
As the wireless communication line, a mobile phone line, a PHS (Personal Handy-phone System) line, Wi-Fi (Wireless Fidelity), Bluetooth (registered trademark), ZigBee, UWB (Ultra Wideband), or the like can be used.

Next, the detailed functional configuration and operation of the communication system 1 of the present embodiment will be described.

<Description of the functional configuration of the communication system 1>
FIG. 3 is a block diagram showing a functional configuration example of the communication system 1.
Of the various functions of the communication system 1, only those related to the present embodiment are selected and shown here.
In the communication system 1, the terminal device 10 includes a transmission/reception unit 11 that receives audio content data and the like, a voice acquisition unit 12 that acquires the user's voice, a request acquisition unit 13 that acquires the user's request, a detection unit 14 that detects the user's state, a storage unit 15 that stores audio content data, a selection unit 16 that selects a reading pattern, a voice creation unit 17 that creates voice according to the selected reading pattern, and a voice output unit 18 that outputs the voice.

The transmission/reception unit 11 transmits a request to download audio content to the management server 20. The transmission/reception unit 11 also receives the audio content data. The transmission/reception unit 11 is, for example, a communication I/F or a CPU, which corresponds to, for example, the communication antenna 101 or the control unit 105. The transmission/reception unit 11 exchanges this information with the management server 20 via the network 70, the network 80, and the access point 90.

The voice acquisition unit 12 is an example of voice acquisition means, and acquires sound such as the user's voice. The voice acquisition unit 12 corresponds to, for example, the microphone 102. Various existing types of microphone, such as dynamic or condenser microphones, may be used. The microphone is preferably an omnidirectional MEMS (Micro Electro Mechanical Systems) microphone.

The request acquisition unit 13 is an example of request acquisition means, and acquires the user's request based on the user's voice. For example, the request acquisition unit 13 converts the user's voice into text by speech-to-text conversion, and then determines the user's request from this text. Here, the request acquisition unit 13 acquires a request concerning reading aloud from the user.

The detection unit 14 is an example of detection means, and detects the user's state from conversation with the user. In this case, the detection unit 14 detects the user's state based on the voice acquired by the voice acquisition unit 12. The user's state refers to, for example, the user being busy, in a hurry, angry, or tired.
The storage unit 15 is an example of storage means, and stores audio content. The storage unit 15 also stores reading patterns, that is, patterns for reading a book aloud. A reading pattern is, for example, a pattern of reading speed, pitch level of the voice, voice quality, and intonation. The storage unit 15 further stores the user states detected by the detection unit 14 and user information. The user information is not particularly limited as long as it is information about the user; examples include the user's gender, age, family structure, family relationship, and date of birth. The user can set the user information in advance by operating the operation button 104. As described later, the terminal device 10 can also estimate the user information.

The selection unit 16 is an example of selection means, and selects a reading pattern from the storage unit 15 according to the detected state of the user. The selection unit 16 also selects the book to be read aloud based on the detected state of the user and/or user information registered in advance.
The voice creation unit 17 creates voice according to the selected reading pattern. Based on the audio content data acquired by the transmission/reception unit 11, the voice creation unit 17 converts the voice and creates voice that matches the selected reading pattern.
The voice output unit 18 is an example of voice output means, and reads the book aloud in the selected reading pattern.

The request acquisition unit 13, the detection unit 14, and the selection unit 16 are, for example, a CPU, and correspond to the control unit 105. The storage unit 15 is, for example, a main memory or storage, and also corresponds to the control unit 105. The voice output unit 18 corresponds to, for example, the speaker 103.

The management server 20 includes a transmission/reception unit 21 that transmits audio content data and the like, a storage unit 22 that stores audio content, and a control unit 23 that controls the management server 20 as a whole.
Upon receiving a request to download audio content from the terminal device 10, the transmission/reception unit 21 transmits the audio content data to the terminal device 10. The transmission/reception unit 21 corresponds to, for example, a communication I/F.

The storage unit 22 stores the audio content data. The storage unit 22 corresponds to, for example, storage.
The control unit 23 selects the necessary audio content data in response to a download request from the terminal device 10. It then acquires the selected audio content data from the storage unit 22 and sends it to the terminal device 10 via the transmission/reception unit 21. The control unit 23 corresponds to, for example, a CPU and a main memory.

<Description of the operation of the communication system 1>
Next, the operation of the communication system 1 of the present embodiment will be described in more detail.
FIG. 4 is a flowchart illustrating an example of the operation of the communication system 1 of the present embodiment.
First, the voice acquisition unit 12 of the terminal device 10 acquires the user's voice (step 101).
Next, the request acquisition unit 13 determines whether the user has requested that a book be read aloud (step 102). This can be determined by whether the user's voice acquired by the voice acquisition unit 12 contains wording that requests the reading of a book. That is, when the user's voice contains an utterance such as "Read a book.", "Please read XX.", or "Hey, read something.", the request acquisition unit 13 determines that a request to read a book has been made. Here, "XX" is the title of a book.
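The decision in step 102 can be illustrated with a minimal sketch. It assumes the user's voice has already been converted to English text; the regular expression, the set of generic targets, and the function name `parse_reading_request` are hypothetical and are not taken from the embodiment.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: the word "read" followed by what should be read.
_REQUEST_PATTERN = re.compile(r"\bread\s+(?P<target>.+?)[.!?]?$", re.IGNORECASE)

# Targets that request a reading without naming a specific book (assumed list).
_GENERIC_TARGETS = {"a book", "something", "me a book", "me something"}


def parse_reading_request(text: str) -> Tuple[bool, Optional[str]]:
    """Return (is_request, title). title is None when no book is named."""
    match = _REQUEST_PATTERN.search(text.strip())
    if match is None:
        return False, None
    target = match.group("target").strip()
    if target.lower() in _GENERIC_TARGETS:
        return True, None
    return True, target
```

A returned title of `None` corresponds to the case where the book must be chosen by the device rather than by the user.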

As a result, when the user has not requested that a book be read aloud (No in step 102), the detection unit 14 detects the user's state (step 103), and the process returns to step 101. The user's state can be determined from the wording in the user's voice acquired by the voice acquisition unit 12. Specifically, when wording such as "There's no time" is included, the detection unit 14 determines that the user is busy. When wording such as "Hurry up" is included, the detection unit 14 determines that the user is in a hurry. When wording such as "That's enough" is included, the detection unit 14 determines that the user is angry. When wording such as "I'm tired" is included, the detection unit 14 determines that the user is tired. The detected user states are stored sequentially in the storage unit 15.
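The wording-based detection of step 103 can be sketched as a simple phrase lookup. The English phrases below are stand-ins for the Japanese example utterances, and the names `STATE_KEYWORDS`, `detect_states`, and `record_states` are hypothetical; a practical device would presumably need a far larger phrase list.

```python
from typing import Dict, List

# Assumed English stand-ins for the example phrases in the description.
STATE_KEYWORDS: Dict[str, List[str]] = {
    "busy": ["no time"],
    "in a hurry": ["hurry up"],
    "angry": ["that's enough"],
    "tired": ["tired"],
}


def detect_states(utterance: str) -> List[str]:
    """Return every state whose phrase appears in the recognized text."""
    text = utterance.lower()
    return [
        state
        for state, phrases in STATE_KEYWORDS.items()
        if any(phrase in text for phrase in phrases)
    ]


def record_states(utterance: str, history: List[str]) -> List[str]:
    """Detect states and append them to the history, in order of detection."""
    found = detect_states(utterance)
    history.extend(found)
    return found
```

The `history` list plays the role of the sequential record kept in the storage unit 15.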

The detection unit 14 may also detect the user's state by a method using distributed representations. Specifically, the user's voice is converted to text by speech recognition, and the words making up the text are represented as high-dimensional real-valued vectors using distributed representations. Word2vec, for example, can be used to represent words as such vectors. Word2vec analyzes natural language using a neural network and can express the latent representation of the words that appear in sentences in the form of vectors. Words whose vectors are close in Euclidean distance can be considered to have similar meanings. The detection unit 14 therefore defines predetermined regions in this vector space and checks which region each word falls in. Each region is associated with a user state, that is, the states described above such as busy, in a hurry, angry, or tired. The user's state can then be determined from the number of words belonging to each region. For example, if many words fall in the region corresponding to "busy" in this vector space, the user's state can be determined to be "busy".
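The region-counting idea can be sketched as follows. To keep the example self-contained, hand-written two-dimensional vectors stand in for real word2vec embeddings, and each region is assumed to be a sphere given by a centre and a radius; the vectors, regions, and names are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple

Vector = Tuple[float, float]

# Toy 2-D vectors standing in for high-dimensional word2vec embeddings.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "deadline": (0.90, 0.10),
    "meeting": (0.80, 0.20),
    "sleepy": (0.10, 0.90),
    "exhausted": (0.15, 0.85),
    "banana": (0.50, 0.50),
}

# Each user state is assumed to be a spherical region: (centre, radius).
STATE_REGIONS: Dict[str, Tuple[Vector, float]] = {
    "busy": ((0.85, 0.15), 0.2),
    "tired": ((0.10, 0.90), 0.2),
}


def detect_state(words: Sequence[str]) -> Optional[str]:
    """Count the words falling in each region and return the fullest one."""
    counts: Counter = Counter()
    for word in words:
        vector = TOY_EMBEDDINGS.get(word)
        if vector is None:
            continue  # word unknown to the embedding model
        for state, (centre, radius) in STATE_REGIONS.items():
            # math.dist computes the Euclidean distance (Python 3.8+).
            if math.dist(vector, centre) <= radius:
                counts[state] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

With a trained model, the lookup in `TOY_EMBEDDINGS` would be replaced by a query to that model; the counting logic would stay the same.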

When the user has requested that a book be read aloud (Yes in step 102), the selection unit 16 selects a book (step 104). If a book title is specified in the user's request, the selection unit 16 selects the book with that title. If the user's request contains no book title, the selection unit 16 selects the book to be read aloud based on the detected state of the user and the user information. For example, when the user is tired, the selection unit 16 selects a book likely to relax the user. If the user information indicates that the user is a 3-year-old girl, the selection unit 16 selects a fairy tale likely to interest her.

Book titles are associated in advance with user states and user information and are stored in the storage unit 15. It is preferable that a plurality of book titles be stored for each user state and item of user information so that a selection can be made from among them. This prevents the same book from always being selected.

FIGS. 5(a) and 5(b) are diagrams showing the data structure stored in the storage unit 15 with respect to book titles.
The illustrated data structure associates the user's age, which is user information, with the user's state and book titles.
The data structure shown in FIG. 5(a) associates the user's age with a corresponding group. That is, books are grouped by user age so that a book suited to the user's age can be selected. Here, the groups are Group A, Group B, and so on, according to the age brackets.
The data structure shown in FIG. 5(b) is set for each group and associates the user's state with book titles. A plurality of book titles is associated with each user state; in this example, three book titles are associated with each state, and the selection unit 16 may select any of them. With this data structure, the selection unit 16 can select a book according to the user's age and state.
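The two-level lookup described for FIG. 5 can be sketched as a pair of tables. The age brackets, group contents, and book titles below are placeholders, since the figure's actual values are not reproduced in the text, and the function names are hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple

# FIG. 5(a) analogue: age bracket -> group. The brackets are assumed values.
AGE_GROUPS: List[Tuple[int, int, str]] = [
    (0, 5, "A"),
    (6, 12, "B"),
]

# FIG. 5(b) analogue: per group, user state -> candidate titles (placeholders).
BOOKS_BY_GROUP: Dict[str, Dict[str, List[str]]] = {
    "A": {
        "tired": ["Title A1", "Title A2", "Title A3"],
        "busy": ["Title A4", "Title A5", "Title A6"],
    },
    "B": {
        "tired": ["Title B1", "Title B2", "Title B3"],
        "busy": ["Title B4", "Title B5", "Title B6"],
    },
}


def group_for_age(age: int) -> str:
    """Map an age to its group, or raise ValueError if no bracket matches."""
    for low, high, group in AGE_GROUPS:
        if low <= age <= high:
            return group
    raise ValueError(f"no group defined for age {age}")


def select_book(age: int, state: str, requested_title: Optional[str] = None,
                rng: random.Random = random.Random()) -> str:
    """Use the requested title if given; otherwise pick one by age and state."""
    if requested_title:
        return requested_title
    candidates = BOOKS_BY_GROUP[group_for_age(age)][state]
    return rng.choice(candidates)
```

Choosing at random among the candidates is one simple way to avoid always returning the same title; the embodiment only states that any of the associated titles may be selected.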

Returning to FIG. 4, the transmission/reception unit 11 next requests the management server 20 to download the audio content data for the book title selected by the selection unit 16 (step 105).
The download request is received by the transmission/reception unit 21 of the management server 20, and the control unit 23 acquires the audio content data for the requested book title from the storage unit 22 (step 106).
The control unit 23 transmits the audio content data to the terminal device 10 via the transmission/reception unit 21, and the transmission/reception unit 11 of the terminal device 10 receives it (step 107).

Next, the selection unit 16 selects a reading pattern from the storage unit 15 according to the user's state detected by the detection unit 14 (step 108). Based on the detected state, the selection unit 16 selects a reading pattern specified by a combination of one or more of the following parameters: reading speed, voice pitch level, voice quality, and intonation.

Reading patterns are associated in advance with user states and are stored in the storage unit 15.
FIG. 6 is a diagram showing the data structure stored in the storage unit 15 with respect to reading patterns.
The illustrated data structure associates user states with reading patterns. A plurality of reading patterns are associated with each user state; in this case, three reading patterns are associated with each state, and the selection unit 16 may select any of them. Providing a plurality of reading patterns prevents the same reading pattern from always being selected whenever the user is in the same state.

When the detection unit 14 determines that the user is busy or in a hurry, the selection unit 16 selects, for example, a reading pattern with a fast reading speed. When the detection unit 14 determines that the user is tired, the selection unit 16 selects, for example, a reading pattern with a slightly lower voice and a slow tone.
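A reading pattern as described for step 108 and FIG. 6 can be sketched as a small record of the four parameters, with three candidate patterns per user state. The class name `ReadingPattern`, the field encodings, and every numeric value below are hypothetical; the source names the parameters but gives no values.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadingPattern:
    speed: float       # relative reading speed (1.0 = unchanged), assumed scale
    pitch: float       # relative voice pitch level (1.0 = unchanged), assumed scale
    quality: str       # label for the voice quality, placeholder values
    intonation: float  # relative strength of intonation, assumed scale


# Hypothetical table in the spirit of FIG. 6: three patterns per user state.
PATTERNS_BY_STATE = {
    # Busy or in a hurry: faster reading.
    "busy": [
        ReadingPattern(1.3, 1.0, "clear", 1.0),
        ReadingPattern(1.4, 1.05, "clear", 0.9),
        ReadingPattern(1.5, 1.0, "bright", 1.0),
    ],
    # Tired: slightly lower voice and a slow tone.
    "tired": [
        ReadingPattern(0.8, 0.9, "soft", 0.8),
        ReadingPattern(0.85, 0.95, "soft", 0.7),
        ReadingPattern(0.75, 0.9, "warm", 0.8),
    ],
}


def select_pattern(state, rng=random):
    """Return any one of the patterns associated with the detected state."""
    return rng.choice(PATTERNS_BY_STATE[state])


print(select_pattern("tired"))
```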

The voice creation unit 17 creates a voice according to the selected reading pattern (step 109). The created voice is then output from the voice output unit 18 (step 110).
The voice creation unit 17 converts the voice based on the audio content data acquired by the transmission/reception unit 11 and creates a voice according to the selected reading pattern. The voice output unit 18 then outputs a voice that reads the book aloud at the speed, pitch level, voice quality, and intonation corresponding to the selected reading pattern.

The voice creation unit 17 can convert the voice by, for example, the following method.
First, the voice creation unit 17 separates the voice into a fundamental frequency and an aperiodic component.
FIGS. 7(a) to 7(c) are diagrams showing a voice separated into a fundamental frequency and an aperiodic component.
FIG. 7(a) shows the voice signal, FIG. 7(b) shows the fundamental frequency of the voice signal, and FIG. 7(c) shows the aperiodic component of the voice signal. In FIGS. 7(a) to 7(c), the horizontal axis is time and the vertical axis is signal strength.
That is, the voice signal shown in FIG. 7(a) can be separated into two parts: the fundamental frequency shown in FIG. 7(b) and the aperiodic component shown in FIG. 7(c).
Changing the fundamental frequency changes the pitch of the voice, that is, the pitch level of the voice. The aperiodic component represents the timbre of the voice, so changing its magnitude changes the voice quality. For example, the smaller the aperiodic component, the less husky the voice; the larger it is, the huskier the voice. Resynthesizing the converted waveform then yields a voice of different quality. Intonation can also be changed by changing the magnitude of the aperiodic component.
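The source does not name the analysis method used to separate the voice. As a rough illustration only, the sketch below estimates a fundamental frequency from the autocorrelation peak and uses one minus the normalized autocorrelation at the pitch period as a crude stand-in for the size of the aperiodic component. The functions `estimate_f0`, `aperiodicity`, and `make_voice`, the synthetic test signal, and all parameter values are assumptions introduced here, not the method of the embodiment.

```python
import math
import random


def estimate_f0(samples, sample_rate, f_min=80.0, f_max=500.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def aperiodicity(samples, lag):
    """Return 1 - normalized autocorrelation at `lag` (near 0 if periodic)."""
    n = len(samples) - lag
    head, tail = samples[:n], samples[lag:lag + n]
    num = sum(a * b for a, b in zip(head, tail))
    den = math.sqrt(sum(a * a for a in head) * sum(b * b for b in tail))
    return 1.0 - num / den if den else 1.0


def make_voice(f0, sample_rate=8000, n=800, noise=0.0, seed=0):
    """Synthetic test signal: a tone at f0, one harmonic, optional noise."""
    rng = random.Random(seed)
    return [math.sin(2 * math.pi * f0 * t / sample_rate)
            + 0.5 * math.sin(2 * math.pi * 2 * f0 * t / sample_rate)
            + rng.gauss(0.0, noise)
            for t in range(n)]


clean = make_voice(200.0)
husky = make_voice(200.0, noise=0.3)
period = 8000 // 200  # samples per period of the 200 Hz tone
print(estimate_f0(clean, 8000))
print(aperiodicity(clean, period) < aperiodicity(husky, period))
```

In this toy setting, adding noise raises the aperiodicity measure while the periodic part stays the same, which corresponds to the statement that a larger aperiodic component gives a huskier voice. An actual converter would also need a resynthesis step, which is omitted here.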

A different voice can also be obtained by transforming the spectral envelope.
In this case, the voice creation unit 17 applies a Fourier transform to the voice signal to obtain the frequency spectrum and extracts the spectral envelope from it. The spectral envelope is obtained by applying a further Fourier transform to the logarithm of the frequency spectrum; it is, so to speak, a spectrum of the spectrum.
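In standard signal-processing terms, taking a further transform of the logarithm of the spectrum is cepstral analysis, and a smooth envelope is commonly recovered by keeping only the low-order cepstral coefficients. The following is a sketch under that assumption. The symbols $x[n]$, $X[k]$, $c[n]$, $w[n]$, the frame length $N$, and the cutoff $L$ are notation introduced here and do not appear in the source; $S_h$ follows the label used for the spectral envelope in FIG. 8.

```latex
\begin{align}
X[k] &= \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}
  && \text{(frequency spectrum of one frame)} \\
c[n] &= \frac{1}{N} \sum_{k=0}^{N-1} \log \lvert X[k] \rvert \, e^{\,j 2\pi k n / N}
  && \text{(cepstrum: transform of the log spectrum)} \\
\log S_h[k] &= \sum_{n=0}^{N-1} w[n]\, c[n]\, e^{-j 2\pi k n / N},
  \quad
  w[n] =
  \begin{cases}
    1, & n \le L \ \text{or}\ n \ge N - L \\
    0, & \text{otherwise}
  \end{cases}
  && \text{(smoothed log envelope)}
\end{align}
```

Because $\log\lvert X[k]\rvert$ is real and even for a real signal, the forward and inverse transforms differ only by the factor $1/N$ in the second line, which is consistent with the description of a "further Fourier transform". A small $L$ keeps only the gradual variation of the spectrum and discards the fine structure.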

FIG. 8 is a diagram showing an example of a spectral envelope.
In FIG. 8, the horizontal axis represents frequency and the vertical axis represents spectral intensity. The line labeled Ss is the frequency spectrum, and the line labeled Sh is the spectral envelope. The spectral envelope Sh represents the gradual variation of the frequency spectrum Ss, that is, what remains after the fine variations (spectral fine structure) are separated from the frequency spectrum Ss. The spectral envelope Sh represents the characteristics of the human vocal tract. Transforming the spectral envelope Sh therefore reproduces the spectral envelope of a different vocal tract, yielding a voice different from the original. In this way, the pitch level, quality, and intonation of the voice can be changed.

According to the embodiment described above, reading can be performed with a book and a reading pattern suited to the occasion through conversation with the user, in contrast to conventional reading that is always the same.
Furthermore, according to the embodiment described above, the terminal device 10 selects a book and a reading pattern according to the user's state and user information. Consequently, when a parent reads a book such as a picture book to a child, the way it is read can be varied. As a result, the reading more readily captures the interest of the listening child, and the parent also becomes more interested in the reading. For the parent, this not only reduces the effort of reading books aloud but also lets parent and child share a satisfying time with a high level of interest. It can also contribute to building a sense of togetherness between parent and child.

<Modifications>
Modifications of the communication system 1 are described below.
(Modification 1)
In Modification 1, the selection unit 16 changes the reading pattern based on the history of past readings of the same book to the same user. In such a case, the user has already listened to a reading of the book, so the selection unit 16 selects a reading pattern different from the previous one. For example, the selection unit 16 selects a reading pattern that reads at a slightly higher speed.
The selection unit 16 may select a different reading pattern every time, or may limit the same reading pattern to once within a predetermined number of readings. This allows the user to listen with a fresher feeling even when hearing the same book again.
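One way to realize the history-based selection of Modification 1 is to exclude the patterns used in the most recent readings of the same book to the same user. The sketch below assumes that the caller keeps one history list per user and book; the function name `select_with_history`, the `window` parameter, and the pattern names are hypothetical.

```python
import random


def select_with_history(candidates, history, window=1, rng=random):
    """Pick a pattern not used in the last `window` readings, if possible.

    `history` is the list of patterns already used for this user and book;
    the chosen pattern is appended to it.
    """
    recent = set(history[-window:]) if window > 0 else set()
    fresh = [p for p in candidates if p not in recent]
    # Fall back to all candidates if every pattern was used recently.
    choice = rng.choice(fresh or candidates)
    history.append(choice)
    return choice


patterns = ["slow", "normal", "slightly_fast"]
history = ["slow"]  # the previous reading of this book used "slow"
print(select_with_history(patterns, history))  # never prints "slow"
```

With `window=1` the result always differs from the previous reading. A larger `window` corresponds to allowing the same pattern only once within a predetermined number of readings.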

(Modification 2)
In Modification 2, the detection unit 14 further detects, from the conversation with the user, the user's evaluation of the reading of the book. The selection unit 16 then selects a reading pattern taking the user's evaluation into account as well. In other words, the selection unit 16 feeds the user's evaluation back into the selection of the reading pattern. For example, when the terminal device 10 or a parent asks about a book, "Was this book interesting?", the child's answers, such as "It was fun" or "It was scary", are stored in the storage unit 15, and the selection unit 16 uses them as feedback when selecting the next reading pattern. This makes it possible to grasp the tendencies of the user's interest in books and to select reading patterns and books that the user is more likely to find interesting.
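The feedback loop of Modification 2 could be sketched as a per-pattern weight that is raised or lowered according to the stored reply and then used for a weighted random choice. The source does not specify how evaluations are interpreted or applied, so the keyword lists, the step size, the lower bound, and the function names below are all assumptions. The keyword match is deliberately crude and would, for instance, misread "not fun".

```python
import random

# Hypothetical keyword lists for interpreting a spoken reply.
POSITIVE = ("fun", "interesting", "liked")
NEGATIVE = ("scary", "boring")


def score_reply(reply):
    """Map a reply to +1 (positive), -1 (negative), or 0 (unknown)."""
    text = reply.lower()
    if any(word in text for word in POSITIVE):
        return 1
    if any(word in text for word in NEGATIVE):
        return -1
    return 0


def update_weights(weights, pattern, reply, step=0.5, floor=0.1):
    """Raise or lower the weight of the pattern that was just used."""
    new_weight = weights.get(pattern, 1.0) + step * score_reply(reply)
    weights[pattern] = max(floor, new_weight)


def select_weighted(weights, rng=random):
    """Choose a pattern with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


weights = {"gentle": 1.0, "lively": 1.0}
update_weights(weights, "lively", "It was fun!")
update_weights(weights, "gentle", "It was scary")
print(weights, select_weighted(weights))
```

The lower bound keeps every pattern selectable, so a single negative reply does not remove a pattern permanently.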

(Modification 3)
In Modification 3, the detection unit 14 distinguishes a plurality of users. The selection unit 16 then selects a reading pattern according to the state of one of the plurality of users. The plurality of users are, for example, a parent and a child. In this case, the selection unit 16 selects the reading pattern according to the state of the child among the plurality of users. Similarly, the selection unit 16 selects a book according to the child's user information and state. When a robot such as that shown in FIG. 2 is used as the terminal device 10, book reading is more often used for reading picture books and the like to children than for adults. Doing so therefore makes the selection of reading patterns and books more accurate.

(Modification 4)
In Modification 4, the detection unit 14 further detects the situation around the device itself, and the selection unit 16 selects a reading pattern based on the detected situation.
Here, the detection unit 14 identifies the environmental sound around the terminal device 10 as the situation around the device, and the selection unit 16 selects a reading pattern according to the environmental sound.
The environmental sound is sound heard from the user's surroundings, such as the sound of rain, waves, or wind, the calls of birds or cicadas, the noise of crowds, and the sound of passing cars, trains, or airplanes. When the environmental sound is loud, the user has difficulty hearing the reading. Therefore, when the sound pressure of the environmental sound is determined to be high, the selection unit 16 responds by, for example, raising the volume at which the book is read or slowing the reading speed.
The selection unit 16 may also set the reading volume according to the time of day, for example raising the volume during the day and lowering it at night.
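The adjustments of Modification 4 can be sketched as two small rules, one for the time of day and one for the ambient sound pressure. The thresholds, gains, night hours, and function names below are hypothetical, since the source gives no numeric values. The sketch applies both a volume increase and a speed reduction in loud surroundings, although the source presents them as examples rather than a fixed combination.

```python
def volume_for_hour(hour, day_volume=0.7, night_volume=0.4,
                    night_start=21, night_end=7):
    """Return a base volume (0.0 to 1.0): higher by day, lower at night."""
    is_night = hour >= night_start or hour < night_end
    return night_volume if is_night else day_volume


def adjust_for_noise(volume, speed, ambient_db, loud_threshold_db=60.0):
    """Raise the volume and slow the reading when the surroundings are loud."""
    if ambient_db >= loud_threshold_db:
        return min(1.0, volume * 1.25), speed * 0.9
    return volume, speed


base = volume_for_hour(23)  # night: lower base volume
print(adjust_for_noise(base, 1.0, ambient_db=70.0))
```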

(Modification 5)
In Modification 5, the terminal device 10 estimates user information. Here, a case in which the user's age and gender are estimated as the user information is described.
FIG. 9 is a diagram showing an example of a method of estimating the user's age.
FIG. 9 shows the frequency spectrum of a voice. The horizontal axis represents frequency and the vertical axis represents spectral intensity. That is, the frequency spectrum shows the relationship between the frequency and the intensity of the frequency components contained in the voice.
The figure shows examples of the voice frequency spectra of persons aged 40, 50, 60, and 70. As illustrated, the spectral intensity at 4 kHz and above increases with age. In practice, this increase in spectral intensity at 4 kHz and above makes the voice hoarser.
The detection unit 14 therefore estimates the user's age by examining the spectral intensity at 4 kHz and above in the frequency spectrum.
In addition, the fundamental frequency shown in FIG. 7 represents the pitch of the voice. For example, the fundamental frequency of a male voice is 100 Hz to 200 Hz, and that of a female voice is 250 Hz to 500 Hz. The detection unit 14 can therefore estimate the user's gender from the fundamental frequency.
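The two estimates of Modification 5 can be sketched as follows. The gender ranges are the ones stated above; the handling of frequencies outside those ranges, the representation of the spectrum as (frequency, linear intensity) pairs, and the function names are assumptions. The source gives no numeric relation between high-band intensity and age, so the sketch only computes the share of intensity at 4 kHz and above and leaves any mapping to an age to calibration data.

```python
def estimate_gender(f0_hz):
    """Classify by fundamental frequency using the ranges given in the text."""
    if 100.0 <= f0_hz <= 200.0:
        return "male"
    if 250.0 <= f0_hz <= 500.0:
        return "female"
    return "unknown"  # assumed handling for values outside both ranges


def high_band_ratio(spectrum, cutoff_hz=4000.0):
    """Share of total intensity at or above `cutoff_hz`.

    `spectrum` is a list of (frequency_hz, intensity) pairs with linear
    (not decibel) intensities; this representation is an assumption.
    """
    total = sum(intensity for _, intensity in spectrum)
    high = sum(intensity for freq, intensity in spectrum if freq >= cutoff_hz)
    return high / total if total else 0.0


# Placeholder spectra; a larger ratio would suggest an older speaker.
younger = [(1000.0, 6.0), (3000.0, 3.0), (5000.0, 1.0)]
older = [(1000.0, 6.0), (3000.0, 2.0), (5000.0, 2.0)]
print(estimate_gender(120.0), high_band_ratio(younger) < high_band_ratio(older))
```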

In the embodiment detailed above, the communication system 1 is configured by connecting the terminal device 10 and the management server 20 via the network 70, the network 80, and the access point 90; however, the functions of the management server 20 may be consolidated in the terminal device 10. In this case, the terminal device 10 can be regarded as the communication system 1. Likewise, the processing performed by the terminal device 10 can also be performed by the management server 20. That is, the functions of the terminal device 10 may be consolidated in the management server 20, in which case the management server 20 can be regarded as the communication system 1.
Furthermore, although the terminal device 10 is a robot in the example described above, it is not limited to this. For example, it may be a mobile terminal such as a mobile computer, a mobile phone, a smartphone, or a tablet, or it may be a desktop computer.

<Description of the program>
The processing performed by the terminal device 10 in the present embodiment described above is provided, for example, as a program such as application software. This processing is realized by software and hardware resources working together. That is, a CPU (not shown) inside the computer provided in the terminal device 10 executes a program that realizes each of the functions described above, thereby realizing those functions.

Accordingly, the processing performed by the terminal device 10 in the present embodiment can also be regarded as a program for causing a computer to realize: a request acquisition function of acquiring a request concerning reading aloud from a user; a detection function of detecting the user's state from conversation with the user; a selection function of selecting a reading pattern according to the detected state of the user; and a voice output function of reading a book aloud in the selected reading pattern.

The program that realizes the present embodiment can be provided not only via communication means but also stored on a recording medium such as a CD-ROM.

Although the present embodiment has been described above, the technical scope of the present invention is not limited to the scope described in the embodiment. It is clear from the claims that forms obtained by adding various changes or improvements to the above embodiment are also included in the technical scope of the present invention.

1 ... communication system, 10 ... terminal device, 11 ... transmission/reception unit, 12 ... voice acquisition unit, 13 ... request acquisition unit, 14 ... detection unit, 15 ... storage unit, 16 ... selection unit, 17 ... voice creation unit, 18 ... voice output unit, 20 ... management server

Claims

An interactive communication device that advances communication through conversation with a user, the device comprising:
request acquisition means for acquiring a request concerning reading aloud from the user;
detection means for detecting a state of the user from the conversation with the user;
storage means for storing reading patterns;
selection means for selecting a reading pattern from the storage means according to the detected state of the user; and
voice output means for reading a book aloud in the selected reading pattern.

The interactive communication device according to claim 1, wherein the selection means further selects a book to be read aloud based on the detected state of the user and/or user information registered in advance.

The interactive communication device according to claim 1, further comprising voice acquisition means for acquiring the user's voice,
wherein the detection means detects the state of the user based on the voice acquired by the voice acquisition means.

The interactive communication device according to claim 1, wherein the selection means changes the reading pattern based on a history of past readings of the same book to the same user.

The interactive communication device according to claim 1, wherein the selection means selects, based on the detected state of the user, a reading pattern specified by a combination of one or more parameters among reading speed, voice pitch level, voice quality, and intonation.

The interactive communication device according to claim 1, wherein the detection means further detects the user's evaluation of the reading of the book from the conversation with the user, and
the selection means selects the reading pattern taking the user's evaluation into account as well.

The interactive communication device according to claim 1, wherein the detection means distinguishes a plurality of users, and
the selection means selects the reading pattern according to the state of one of the plurality of users.

The interactive communication device according to claim 1, wherein the selection means selects the reading pattern according to the state of a child among a plurality of users.

The interactive communication device according to claim 1, wherein the detection means further detects the situation around the device itself, and
the selection means selects the reading pattern based on the detected situation.

A communication system comprising:
an interactive communication device that reads books aloud; and
a storage device that stores data of audio content in which books are read aloud,
wherein the interactive communication device advances communication through conversation with a user and comprises:
request acquisition means for acquiring a request concerning reading aloud from the user;
detection means for detecting a state of the user from the conversation with the user;
storage means for storing reading patterns;
selection means for selecting a reading pattern from the storage means according to the detected state of the user; and
voice output means for reading a book aloud in the selected reading pattern.

A program for causing a computer to realize:
a request acquisition function of acquiring a request concerning reading aloud from a user;
a detection function of detecting a state of the user from conversation with the user;
a selection function of selecting a reading pattern according to the detected state of the user; and
a voice output function of reading a book aloud in the selected reading pattern.