JP7182997B2

JP7182997B2 - picture book display system

Info

Publication number: JP7182997B2
Application number: JP2018210793A
Authority: JP
Inventors: 友希新田; 岳陽冨田; 彩華清石; 賢太郎坂元
Original assignee: Tokyo Gas Co Ltd
Current assignee: Tokyo Gas Co Ltd
Priority date: 2018-11-08
Filing date: 2018-11-08
Publication date: 2022-12-05
Anticipated expiration: 2038-11-08
Also published as: JP2020076892A

Description

The present invention relates to a display system, a picture book display system, and a program.

As conventional technology, there are picture book display devices in which, for example, a page of a picture book is displayed on a television when the reader reads aloud text information displayed on the television.

Patent Document 1 describes an electronic picture book system having: a text information storage means that stores page numbers, page images, and the page numbers of subsequent pages in association with one another; a display image control unit that displays text information and page images on a predetermined display means; an output audio control unit that outputs audio information obtained by a microphone unit to a predetermined output means; a branch processing unit that analyzes the obtained audio information, determines its features, and, from among the subsequent page numbers associated with the page number being displayed, identifies the page number associated with the feature determined by the feature determination unit as the subsequent page number; and a progress control unit that sets the identified subsequent page number as the next page number to be displayed.
Patent Document 2 describes an electronic picture book display device comprising: a recording medium on which the images, text, and the like for a picture book are written in advance; a background memory that stores background image data read from the recording medium and a moving-image memory that stores moving image data read from it; and a synthesizing unit that combines the background data and the moving image data read from these memories into a picture book. The device is configured to display the image data combined by the synthesizing unit as a picture book.

Patent Document 1: JP 2009-122498 A
Patent Document 2: JP H05-120400 A

Conventionally, however, the story of a picture book is prepared in advance, and the reader cannot freely create one. Moreover, even when the reader reads a prepared story, switching the displayed picture requires the reader to perform an advance operation using a controller or the like.
An object of the present invention is to provide a display system and the like in which display images such as pictures and text corresponding to the content of the reader's utterances are displayed in real time on the listener's mobile terminal or the like, based on what the reader says.

Thus, according to the present invention, there is provided a picture book display system comprising: a voice acquisition means that acquires the uttered voice of a reader; a comprehension means that grasps the meaning of the reader's uttered voice; an image acquisition means that acquires an image corresponding to the meaning grasped by the comprehension means; and an arrangement means that arranges the images acquired according to the meaning to form a picture book, wherein: the image acquisition means checks whether the reader's uttered voice contains a pre-registered wording that specifies an image and, when it does, acquires the image corresponding to that wording; the arrangement means, when the reader's uttered voice contains pre-registered feature information representing a feature of the image acquired by the image acquisition means, processes the image to match that feature; the wording and the feature information are registered in association with page numbers indicating pages of the picture book; the arrangement means arranges background images and foreground images separately; and, when the image acquisition means acquires a new background image, the arrangement means treats the result as the page whose page number is associated with that new background image.
Here, the comprehension means may further grasp the meaning of the listener's uttered voice, and the image acquisition means may acquire an image corresponding to that meaning. In this case, a picture book can be created based on the listener's uttered voice as well as the reader's.
The system may further include a separation means that separates the reader's uttered voice from the listener's uttered voice. In this case, a picture book can be created based on the reader's uttered voice.

According to the present invention, it is possible to provide a display system and the like in which display images such as pictures and text corresponding to the content of the reader's utterances are displayed in real time on the listener's mobile terminal or the like, based on what the reader says.

FIG. 1 is a diagram showing a configuration example of the display system according to the present embodiment.
FIG. 2 is a diagram showing an example of the schematic operation of the display system.
FIG. 3 is a block diagram showing an example of the functional configuration of the display system in the first embodiment.
FIG. 4 is a flowchart illustrating an example of the operation of the display system in the first embodiment.
FIG. 5 is a diagram showing how the separation unit separates the reader's uttered voice from the listener's uttered voice.
FIGS. 6(a) and 6(b) are diagrams showing the data structure of the storage unit used in the first embodiment.
FIGS. 7(a) to 7(c) are diagrams showing the processing in which the arrangement unit arranges images to form a picture book.
FIG. 8 is a block diagram showing an example of the functional configuration of the display system in the second embodiment.
FIG. 9 is a flowchart illustrating an example of the operation of the display system of the second embodiment.
FIG. 10 is a block diagram showing an example of the functional configuration of the display system in the third embodiment.
FIG. 11 is a flowchart illustrating an example of the operation of the display system of the third embodiment.
FIG. 12 is a flowchart illustrating an example of the operation of the display system of the fourth embodiment.
FIG. 13 is a diagram showing the data structure of the storage unit used in the fourth embodiment.
FIGS. 14(a) and 14(b) are diagrams showing examples of the picture book created from the second line of text onward.
FIG. 15 is a block diagram showing an example of the functional configuration of the display system in the fifth embodiment.
FIG. 16 is a flowchart illustrating an example of the operation of the display system of the fifth embodiment.

Embodiments of the present invention will now be described in detail with reference to the accompanying drawings.

<Description of the Overall Display System 1>
FIG. 1 is a diagram showing a configuration example of the display system 1 according to the present embodiment.
As illustrated, the display system 1 of the present embodiment is configured by connecting mobile terminals 20a and 20b and a management server 40 via a network 70, a network 80, and an access point 90. Hereinafter, when the mobile terminal 20a and the mobile terminal 20b need not be distinguished, they may be referred to simply as the "mobile terminal 20".

The mobile terminal 20 is a mobile device such as a mobile computer, mobile phone, smartphone, or tablet. The mobile terminals 20a and 20b connect to the access point 90 for wireless communication and, through the access point 90, connect to the network 70. As described in detail later, the mobile terminal 20a is carried by the reader during a read-aloud session, and the mobile terminal 20b is carried by the listener.

The management server 40 is a server computer that manages the display system 1 as a whole. As described in detail later, the management server 40, for example, acquires from the reader's mobile terminal 20a the voice uttered when the reader reads text such as that of a picture book. Based on the content of the uttered voice, it acquires images and the like and creates a display image such as a picture book. It then transmits the information of the display image to the listener's mobile terminal 20b.

The mobile terminal 20 and the management server 40 each include a CPU (Central Processing Unit) as computing means and a main memory as storage means. The CPU executes various kinds of software, such as an OS (operating system) and applications. The main memory is a storage area that holds this software and the data used in executing it. The mobile terminal 20 further includes a communication interface (hereinafter "communication I/F") for communicating with external devices, a display mechanism consisting of a video memory, a display, and the like, and an input mechanism such as input buttons, a touch panel, or a keyboard. The mobile terminal 20 also includes a speaker for audio output and a microphone for audio input. The management server 40 includes an auxiliary storage device such as an HDD (Hard Disk Drive) or SSD (Solid State Drive).

The network 70 is communication means used for information communication between the mobile terminal 20 and the management server 40, and is, for example, the Internet.
The network 80, like the network 70, is communication means used for information communication between the mobile terminal 20 and the management server 40, and is, for example, a LAN (Local Area Network).

The access point 90 is a device that performs wireless communication over a wireless communication line. It relays information between the mobile terminal 20 or the management server 40 and the network 70 or the network 80.
Usable types of wireless communication line include mobile phone lines, PHS (Personal Handy-phone System) lines, Wi-Fi (Wireless Fidelity), Bluetooth (registered trademark), ZigBee, and UWB (Ultra Wideband).

<Overview of the Operation of the Display System 1>
FIG. 2 is a diagram showing an example of the schematic operation of the display system 1.
First, the reader, who carries the mobile terminal 20a, launches the dedicated application used by this system and speaks text, such as that of a picture book, into the microphone of the mobile terminal 20a, thereby inputting the uttered voice (1A). The microphone converts the uttered voice into an electrical signal corresponding to its sound pressure. The mobile terminal 20a first amplifies this audio signal, then samples it at a predetermined sampling frequency and digitizes it to create the information of the uttered voice. The listener is near the reader and can hear the reader's uttered voice. The reader is, for example, a parent or grandparent, and the listener is a child.

Next, the mobile terminal 20a transmits the information of the uttered voice to the management server 40 as transmission information (1B). The transmission information is sent to the management server 40 via the access point 90, the network 70, and the network 80.
The management server 40 grasps the meaning of the uttered voice from the received information and acquires display elements representing that meaning from a database (1C). The meaning of the uttered voice can be grasped by speech recognition. Here, a "display element" is an individual element of the display image shown on the mobile terminal 20b, corresponding to the grasped meaning. Specifically, a display element is an image, text, or the like. For example, if the grasped meaning is "bear", an image of a bear serves as a display element; the image may be either a still image or a moving image. As text, words such as "熊" (the Japanese word for bear) and "bear" serve as display elements. The text may be a single word or a sentence composed of multiple words.
The management server 40 then arranges the images acquired from the database and creates a display image, such as a picture book, for the listener to view (1D).
Finally, the management server 40 transmits the information of the created display image, as transmission information, to the mobile terminal 20b carried by the listener (1E). The information of the display image is sent to the mobile terminal 20b via the network 80, the network 70, and the access point 90.
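The lookup in step 1C can be pictured as a mapping from a grasped meaning to its registered display elements. The following is a minimal sketch only: the `DisplayElement` type, the `DISPLAY_ELEMENTS` table, and the file name `bear.png` are hypothetical stand-ins for the database, which the description does not specify in this detail.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DisplayElement:
    kind: str     # "image" or "text"
    content: str  # file name for an image, literal string for text


# Hypothetical database contents; "bear.png" is a placeholder file name.
DISPLAY_ELEMENTS: dict[str, list[DisplayElement]] = {
    "bear": [
        DisplayElement("image", "bear.png"),
        DisplayElement("text", "熊"),
        DisplayElement("text", "bear"),
    ],
}


def get_display_elements(meaning: str) -> list[DisplayElement]:
    """Return the display elements registered for a meaning (step 1C).

    An unregistered meaning yields an empty list.
    """
    return list(DISPLAY_ELEMENTS.get(meaning, []))
```

Calling `get_display_elements("bear")` returns one image element and two text elements, mirroring the bear example in the text.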

On the mobile terminal 20b, the display image is shown on the display mechanism based on the received display image information (1F). The listener views the displayed image while listening to the reader's uttered voice, and in this way the read-aloud session takes place.

Next, the detailed functional configuration and operation of the display system 1 of the present embodiment will be described.

<Description of the Functional Configuration of the Display System 1>
[First Embodiment]
First, a first embodiment of the functional configuration of the display system 1 is described. In the display system 1 of the first embodiment, the management server 40 acquires images corresponding to the meaning of the reader's uttered voice, arranges the acquired images, and creates a picture book as the display image. The listener then views the picture book on the mobile terminal 20b. In this case, the display system 1 functions as a picture book display system.
FIG. 3 is a block diagram showing an example of the functional configuration of the display system 1 in the first embodiment.
Of the various functions of the display system 1, only those related to the present embodiment are shown here.
In the display system 1, the mobile terminal 20a and the mobile terminal 20b have the same functional configuration: a transmission/reception unit 21 that transmits and receives transmission information, a display unit 22 that displays images, an input unit 23 for inputting information, and a voice acquisition unit 24 that acquires uttered voice.

The transmission/reception unit 21 transmits and receives transmission information such as uttered-voice information and picture book information. It is, for example, a communication I/F, and exchanges information with the management server 40 via the access point 90, the network 70, and the network 80.

The display unit 22 displays images such as a picture book. It is, for example, a touch panel. In this case, the display unit 22 includes a display on which various information is shown and a position detection sheet that detects the position touched by a finger, stylus pen, or the like. Any means of detecting the touched position may be used, such as a resistive film type, which detects the position from the pressure of the contact, or a capacitive type, which detects it from the static electricity of the contacting object.

The input unit 23 is an operation mechanism with which the reader or listener of the picture book performs predetermined operations.
It is, for example, the touch panel described above. In this case, the touch panel serves as both the display unit 22 and the input unit 23. That is, it displays images such as a picture book, and touching the displayed screen allows the user to start or exit the dedicated application and to operate it. The input unit 23 is not limited to this, and may be a keyboard, a mouse, or the like.

The voice acquisition unit 24 acquires uttered voice. It is, for example, a microphone. Various existing microphone types, such as dynamic and condenser types, may be used. An omnidirectional MEMS (Micro Electro Mechanical Systems) microphone is preferable.

The management server 40 has a transmission/reception unit 41 that communicates with external devices, a separation unit 42 that separates the reader's uttered voice from the listener's uttered voice, a comprehension unit 43 that grasps the meaning of the uttered voice, an image acquisition unit 44 that acquires images corresponding to the grasped meaning, a storage unit 45 that stores images, and an arrangement unit 46 that arranges the images to form a picture book.

The transmission/reception unit 41 communicates with the mobile terminal 20 and exchanges predetermined information. It is an example of voice acquisition means, acquiring as transmission information the reader's uttered voice sent from the mobile terminal 20a. It is also an example of output means, sending the information of a display image such as a picture book to the listener's mobile terminal 20b.
The separation unit 42 is an example of separation means and separates the reader's uttered voice from the listener's uttered voice. The separation method is described later.

The comprehension unit 43 is an example of comprehension means and grasps the meaning of the reader's uttered voice. As described in detail later, the comprehension unit 43 determines whether the uttered voice contains a pre-registered wording and, if it does, takes the meaning of that wording as one of the meanings of the uttered voice.
The image acquisition unit 44 is an example of element acquisition means and of image acquisition means, and acquires display elements representing the meaning grasped by the comprehension unit 43. Here, it acquires, as the display element, an image corresponding to that meaning.

The storage unit 45 stores the images to be acquired by the image acquisition unit 44 in association with the above meanings. It also stores, in association with those meanings, the content of the processing to be applied to the images. This processing is described later.
The arrangement unit 46 is an example of arrangement means; it arranges the acquired display elements to produce the display image viewed by the listener. Here, the arrangement unit 46 arranges the images acquired according to the grasped meaning to form a picture book. That is, it applies predetermined processing, according to the grasped meaning, to the images acquired by the image acquisition unit 44 and arranges the processed images to compose the pictures of the picture book. As described in detail later, a picture is composed by superimposing the foreground on the background.
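The superimposition of foreground on background can be illustrated with a simple layer model. This is a sketch under stated assumptions rather than the described implementation: the `Page` class, the `place` helper, and the file names are hypothetical, and images are represented only by their file names instead of pixel data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Page:
    background: Optional[str] = None                      # at most one background image
    foreground: List[str] = field(default_factory=list)   # images drawn on top of it

    def layers(self) -> List[str]:
        """Return the draw order: background first, then each foreground image."""
        base = [self.background] if self.background is not None else []
        return base + self.foreground


def place(page: Page, image: str, attribute: str) -> Page:
    """Place an image on the page according to its attribute."""
    if attribute == "background":
        page.background = image        # a background replaces any earlier one
    else:
        page.foreground.append(image)  # foreground images accumulate
    return page
```

Whatever order the images arrive in, `layers()` lists the background first, so a renderer that draws the list in sequence ends up with the foreground on top.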

The transmission/reception unit 41 is, for example, a communication I/F. The functions of the separation unit 42, the comprehension unit 43, the image acquisition unit 44, and the arrangement unit 46 can be realized by, for example, the CPU. The storage unit 45 is, for example, a database built on an auxiliary storage device such as an HDD or SSD.

<Description of the Operation of the Display System 1>
Next, the operation of the display system 1 of the first embodiment is described in more detail.
FIG. 4 is a flowchart illustrating an example of the operation of the display system 1 in the first embodiment.
First, the reader of the picture book operates the dedicated application using the input unit 23 of the mobile terminal 20a and inputs uttered voice, which is acquired by the voice acquisition unit 24 (step 101).
Next, the transmission/reception unit 21 of the reader's mobile terminal 20a transmits the information of the uttered voice to the management server 40 as transmission information (step 102).

In the management server 40, the transmission/reception unit 41 receives the information of the uttered voice (step 103), which is temporarily stored in the storage unit 45.
Next, the separation unit 42 of the management server 40 separates the reader's uttered voice from the listener's uttered voice (step 104). During a read-aloud session, the listener often reacts in various ways to what the reader says, for example by asking the reader a question about the content of the picture book, or by exclaiming "Amazing!" or "Scary!" in response to it. In the present embodiment, the separation unit 42 separates the two voices in order to extract only the reader's uttered voice.

FIG. 5 is a diagram showing how the separation unit 42 separates the reader's uttered voice from the listener's uttered voice.
Here, the vertical axis indicates the sound pressure of the reader's uttered voice, taking the sound pressure of the listener's uttered voice as 1. In this example, the sound pressure of the reader's uttered voice is four times that of the listener's. The reader speaks toward the mobile terminal 20a, so the distance between the mobile terminal 20a and the reader's mouth, where the voice is produced, is very short. The listener, although near the reader, is comparatively far from the mobile terminal 20a. This difference in distance produces the fourfold difference in sound pressure in this example. A threshold can therefore be set for the sound pressure: when a sound pressure greater than the threshold is detected, the separation unit 42 judges it to be the reader's uttered voice, and when a sound pressure equal to or less than the threshold is detected, it judges it to be the listener's uttered voice. In the illustrated example, the threshold is a sound pressure of 2.
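The threshold rule can be sketched in a few lines. The sketch assumes that each speech segment has already been reduced to a single sound pressure value expressed relative to the listener's level (listener = 1), as on the vertical axis of FIG. 5; the function names and the segment representation are illustrative, not taken from the description.

```python
from typing import Iterable, List, Tuple


def classify_speaker(relative_pressure: float, threshold: float = 2.0) -> str:
    """Judge the speaker from relative sound pressure.

    Greater than the threshold -> reader; equal to or less than it -> listener.
    """
    return "reader" if relative_pressure > threshold else "listener"


def split_segments(
    segments: Iterable[Tuple[str, float]], threshold: float = 2.0
) -> Tuple[List[str], List[str]]:
    """Split (segment_id, relative_pressure) pairs into reader and listener lists."""
    reader: List[str] = []
    listener: List[str] = []
    for segment_id, pressure in segments:
        if classify_speaker(pressure, threshold) == "reader":
            reader.append(segment_id)
        else:
            listener.append(segment_id)
    return reader, listener
```

With the values from the example, a segment at relative pressure 4 is attributed to the reader and one at 1 to the listener; a segment exactly at the threshold of 2 falls on the listener side, matching the "equal to or less than" wording.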

Returning to FIG. 4, the comprehension unit 43 next grasps the meaning of the reader's uttered voice separated by the separation unit 42 (step 105). Specifically, the comprehension unit 43 converts the voice into a character string using known speech recognition technology, for example a statistical method based on a hidden Markov model, or dynamic time warping.
The comprehension unit 43 then checks whether the converted character string contains a pre-registered wording that specifies an image (step 106). This wording may be a single word or a sentence composed of multiple words.
If it does not (No in step 106), the process returns to step 101.
If it does (Yes in step 106), the image acquisition unit 44 acquires the image corresponding to this wording from the storage unit 45 (step 107).
Next, the arrangement unit 46 checks whether the reader's uttered voice contains pre-registered feature information representing a feature of the image acquired by the image acquisition unit 44 (step 108).
If it does not (No in step 108), the process proceeds to step 110.
If it does (Yes in step 108), the arrangement unit 46 processes the image to match this feature (step 109).
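The branching in steps 106 to 109 can be sketched as follows for a transcript that speech recognition has already produced. This is a rough illustration under simplifying assumptions: wordings are matched by plain substring search, every feature word found in the transcript is attached to every matched image, and the tables `IMAGE_WORDS` and `FEATURE_WORDS` with their entries are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical registrations; the wordings, file names, and processing are placeholders.
IMAGE_WORDS: Dict[str, str] = {"forest": "forest.jpg", "bear": "bear.png"}
FEATURE_WORDS: Dict[str, str] = {"big": "enlarge"}


def plan_images(
    transcript: str,
    image_words: Dict[str, str],
    feature_words: Dict[str, str],
) -> List[Tuple[str, List[str]]]:
    """Return (image_file, processing_list) pairs for one transcript.

    An empty result corresponds to "No" in step 106: nothing is drawn and
    the system waits for further speech.
    """
    planned: List[Tuple[str, List[str]]] = []
    for wording, image_file in image_words.items():
        if wording not in transcript:  # step 106: registered wording present?
            continue
        # step 107: the image for this wording is acquired (here, its file name).
        # step 108: collect processing for any feature words in the transcript.
        processing = [proc for feat, proc in feature_words.items() if feat in transcript]
        # step 109 would apply `processing` to the image; an empty list means "No" in step 108.
        planned.append((image_file, processing))
    return planned
```

For instance, `plan_images("a big bear", IMAGE_WORDS, FEATURE_WORDS)` pairs the bear image with the enlarge processing, while a transcript with no registered wording produces an empty list.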

FIGS. 6(a) and 6(b) are diagrams showing the data structure of the storage unit 45 used in the first embodiment.
As illustrated, the data stored in the storage unit 45 has two data structures: one for image information, shown in FIG. 6(a), and one for feature information, shown in FIG. 6(b). Here, "feature information" is information that characterizes the things appearing in the picture book.
The data structure for image information shown in FIG. 6(a) is used in steps 106 and 107 of FIG. 4. It consists of four fields: No., wording, attribute, and image. "No." is a number assigned to each wording. "Wording" is the pre-registered wording mentioned in step 106. "Attribute" is the attribute of the image and indicates whether it is a background or a foreground. "Image" is the name of the file in which the image data is stored.

また、図６（ｂ）に示す特徴情報に対するデータ構造は、図４のステップ１０８～ステップ１０９で使用される。このデータ構造は、Ｎｏ．文言、処理の３つからなる。このうち、「Ｎｏ．」は、各文言毎に付与される番号である。また、「文言」は、ステップ１０６で述べた予め登録された文言である。さらに、「処理」は、画像に対して行う処理の内容を示す。 The data structure for the feature information shown in FIG. 6(b) is used in steps 108 to 109 of FIG. 4. This data structure consists of three items: No., wording, and processing. Of these, "No." is a number assigned to each wording. "Wording" is the pre-registered wording mentioned in step 106. "Processing" indicates the content of the processing to be performed on the image.
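As an illustration only, the two tables of FIG. 6 might be represented as follows. This is a minimal Python sketch, not the storage format of the storage unit 45. The class names, field names, the helper `lookup_image`, the file name `bear.jpg`, and the free-text processing description are assumptions. Only the numbers No. 1001 (forest), No. 1006 (bear) and No. 5004 (number + 匹), the attributes of those rows, and the forest file name come from this description.

```python
from dataclasses import dataclass
from enum import Enum


class Attribute(Enum):
    BACKGROUND = "background"
    FOREGROUND = "foreground"


@dataclass(frozen=True)
class ImageEntry:
    """One row of the image-information table, FIG. 6(a)."""
    no: int
    wording: str
    attribute: Attribute
    image_file: str


@dataclass(frozen=True)
class FeatureEntry:
    """One row of the feature-information table, FIG. 6(b)."""
    no: int
    wording: str
    processing: str


IMAGE_TABLE = [
    ImageEntry(1001, "森", Attribute.BACKGROUND, "□×◇.jpg"),
    # "bear.jpg" is a placeholder; the description gives no file name for the bear.
    ImageEntry(1006, "熊", Attribute.FOREGROUND, "bear.jpg"),
]

FEATURE_TABLE = [
    # The processing text is an assumed paraphrase of the behaviour described.
    FeatureEntry(5004, "数字＋匹", "set the number of animal images"),
]


def lookup_image(wording):
    """Return the row registered for a wording, or None if it is not registered."""
    for entry in IMAGE_TABLE:
        if entry.wording == wording:
            return entry
    return None
```

Keeping the attribute on the image row means the arrangement step can decide between background and foreground without inspecting the image itself.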

図４のステップ１０６において、把握部４３は、変換された文字列の中に、図６（ａ）に挙げた文言が存在するか否かを調べる。そして、存在した場合、画像取得部４４は、ステップ１０７において、この文言に対応する画像を取得する。図６（ａ）で図示した例は、例えば、文言として、「森」、「空」等が登録される。そして、森を表す画像として、「□×◇．ｊｐｇ」のファイル名で示す画像のデータが用意され、空を表す画像として、「○×△．ｊｐｇ」のファイル名で示す画像のデータが用意される。 In step 106 of FIG. 4, the grasping unit 43 checks whether any wording listed in FIG. 6(a) exists in the converted character string. If it does, the image acquisition unit 44 acquires the image corresponding to this wording in step 107. In the example shown in FIG. 6(a), wordings such as "forest" and "sky" are registered. Image data with the file name "□×◇.jpg" is prepared as the image representing the forest, and image data with the file name "○×△.jpg" is prepared as the image representing the sky.

また、図４のステップ１０８において、配置部４６は、変換された文字列の中に、図６（ｂ）に挙げた文言が存在するか否かを調べる。そして、存在した場合、配置部４６は、ステップ１０９において、この画像に対し、この特徴に合わせる処理を行う。図６（ｂ）で図示した例は、例えば、文言として、「大きい」、「小さい」等が登録される。そして、文言として「大きい」が存在したときは、対応する画像を拡大する。対して、文言として「小さい」が存在したときは、対応する画像を縮小する。また、例えば、「３匹」などの「数字＋匹（ひき）」の組み合わせの場合は、動物等の画像の数を３匹になるように増減する。さらに、文言として、例えば、「走る」が存在したときは、対応する画像を速く動かし、走る様を表す。またさらに、文言として、例えば、「青い」が存在したときは、対応する画像を青色に着色する。 Also, in step 108 of FIG. 4, the arrangement unit 46 checks whether any wording listed in FIG. 6(b) exists in the converted character string. If it does, the arrangement unit 46 processes the image in step 109 so that it matches this feature. In the example shown in FIG. 6(b), wordings such as "large" and "small" are registered. When the wording "large" is present, the corresponding image is enlarged; when the wording "small" is present, the corresponding image is reduced. For a combination of a number and the counter "hiki" (匹, used for counting animals), such as "3 hiki" (three animals), the number of images of the animal or the like is increased or decreased so that there are three. When a wording such as "run" is present, the corresponding image is moved quickly to depict running. When a wording such as "blue" is present, the corresponding image is colored blue.
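The checks of steps 106 and 108 amount to searching the transcribed character string for registered wordings. The following Python sketch shows one possible way to do this, assuming plain substring matching and a regular expression for the "number + 匹" pattern. The dictionary names, the function names, and the scale factors 2.0 and 0.5 are hypothetical and are not taken from FIG. 6.

```python
import re

# Hypothetical stand-ins for the wordings of FIG. 6(a) and FIG. 6(b).
IMAGE_WORDINGS = {"森": "background", "熊": "foreground"}
SCALE_WORDINGS = {"大きい": 2.0, "小さい": 0.5}  # factors are assumptions
# A run of ASCII or full-width digits followed by the counter 匹.
COUNT_PATTERN = re.compile(r"([0-9０-９]+)匹")


def find_image_wordings(text):
    """Step 106: return the registered wordings found, in order of appearance."""
    hits = [(text.find(w), w) for w in IMAGE_WORDINGS if w in text]
    return [w for _, w in sorted(hits)]


def find_features(text):
    """Step 108: collect the processing requested by feature wordings."""
    features = {}
    for wording, factor in SCALE_WORDINGS.items():
        if wording in text:
            features["scale"] = factor
    match = COUNT_PATTERN.search(text)
    if match:
        # int() accepts full-width decimal digits such as "３".
        features["count"] = int(match.group(1))
    return features
```

For instance, `find_features("熊さんは、３匹いました。")` yields a count of 3, which step 109 would then apply to the bear image.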

再び図４に戻り、次に、配置部４６は、画像を配置し、絵本とする（ステップ１１０）。
図７（ａ）～（ｃ）は、配置部４６が、画像を配置し、絵本とする処理について示した図である。なお、ここでは、読み手が、以下の文章を発話した場合を例に取り説明を行う。

――――――――――――――――――――――――――――――――――――――――
読み手：「深い深い森の中に、熊さんがいました。熊さんは、３匹いました。」
――――――――――――――――――――――――――――――――――――――――
Returning to FIG. 4 again, the placement unit 46 then places the images to form a picture book (step 110).
FIGS. 7(a) to 7(c) are diagrams showing the process in which the arranging unit 46 arranges images to form a picture book. Here, a case where the reader utters the following sentences is described as an example.

――――――――――――――――――――――――――――――――――――――――
Reader: "There was a bear in a deep, deep forest. There were three bears."
――――――――――――――――――――――――――――――――――――――――

このうち、図７（ａ）は、読み手の発話音声の中に、「森」の文言が存在した場合に、配置部４６が森に対応する画像Ｇｍを配置した状態を示している。「森」の文言は、図６（ａ）に示すＮｏ．１００１の「森」の文言に合致するため、画像取得部４４は、森の画像Ｇｍを取得し、配置部４６が森の画像Ｇｍを配置する。これは、読み手が、「深い深い森の中に、」の発話をした場合が該当する。また、配置部４６は、画像を配置するときに、図６（ａ）で示した属性を考慮する。即ち、属性には、背景と前景があり、配置部４６は、背景の画像と前景の画像とを区別して配置する。この場合、「森」の文言の属性は、背景であるため、配置部４６は、森の画像Ｇｍを背景として配置する。 Of these, FIG. 7(a) shows the state in which the arrangement unit 46 has arranged the image Gm corresponding to the forest when the wording "forest" is present in the reader's uttered voice. Because this wording matches the wording "forest" of No. 1001 shown in FIG. 6(a), the image acquisition unit 44 acquires the forest image Gm and the arrangement unit 46 arranges it. This corresponds to the case where the reader utters "in a deep, deep forest". When arranging an image, the arrangement unit 46 also takes into account the attribute shown in FIG. 6(a). That is, the attribute is either background or foreground, and the arrangement unit 46 arranges background images and foreground images differently. In this case, since the attribute of the wording "forest" is background, the arrangement unit 46 arranges the forest image Gm as the background.

次に、図７（ｂ）は、読み手の発話音声の中に、「熊」の文言が存在した場合に、配置部４６が熊に対応する画像Ｇｋを配置した状態を示している。「熊」の文言は、図６（ａ）に示すＮｏ．１００６の「熊」の文言に合致するため、画像取得部４４は、熊の画像Ｇｋを取得し、配置部４６が熊の画像Ｇｋを配置する。これは、読み手が、「深い深い森の中に、」の後に、「熊さんがいました。」の発話をした場合が該当する。このとき、上述した場合と同様に、配置部４６は、画像を配置するときに、図６（ａ）で示した属性を考慮する。この場合、「熊」の文言の属性は、前景であるため、配置部４６は、熊の画像Ｇｋを前景として配置する。具体的には、背景の森の画像Ｇｍを隠すようにして、熊の画像Ｇｋを重畳させて配置する。 Next, FIG. 7(b) shows the state in which the arrangement unit 46 has arranged the image Gk corresponding to the bear when the wording "bear" is present in the reader's uttered voice. Because this wording matches the wording "bear" of No. 1006 shown in FIG. 6(a), the image acquisition unit 44 acquires the bear image Gk and the arrangement unit 46 arranges it. This corresponds to the case where the reader utters "There was a bear." after "In a deep, deep forest,". As in the case described above, the arrangement unit 46 takes into account the attribute shown in FIG. 6(a) when arranging the image. In this case, since the attribute of the wording "bear" is foreground, the arrangement unit 46 arranges the bear image Gk as the foreground. Specifically, the bear image Gk is superimposed on the background forest image Gm so that it hides the background behind it.

さらに、図７（ｃ）は、読み手の発話音声の中に、「３匹」の文言が存在した場合に、配置部４６がこれに対応する画像を配置した状態を示している。この場合、「３匹」の文言は、図６（ｂ）に示すＮｏ．５００４の「数字＋匹（ひき）」の文言に合致するため、配置部４６は、熊の画像Ｇｋを３匹になるように増加させ、３匹となった熊の画像Ｇｋを配置する。これは、例えば、読み手が、「深い深い森の中に、熊さんがいました。」の後に、「熊さんは、３匹いました。」等の発話をした場合が該当する。 Furthermore, FIG. 7(c) shows the state in which the arrangement unit 46 has arranged the corresponding image when the wording "3 hiki" (three animals; "hiki", 匹, is the counter for animals) is present in the reader's uttered voice. In this case, because this wording matches the wording "number + hiki" of No. 5004 shown in FIG. 6(b), the arrangement unit 46 increases the bear image Gk to three and arranges the three bear images Gk. This corresponds, for example, to the case where the reader utters "There were three bears." after "There was a bear in a deep, deep forest."
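The progression of FIGS. 7(a) to 7(c) can be traced with a small scene model. The sketch below is an assumption about how the background and foreground attributes could drive the layout; the class `Scene`, its methods, and the use of the labels "Gm" and "Gk" as list entries are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Scene:
    background: Optional[str] = None
    # Foreground image label -> number of copies to draw.
    foreground: Dict[str, int] = field(default_factory=dict)

    def place(self, image: str, attribute: str) -> None:
        """Step 110: place an image according to its attribute."""
        if attribute == "background":
            self.background = image
        else:
            self.foreground.setdefault(image, 1)

    def set_count(self, image: str, count: int) -> None:
        """Step 109 for a 'number + 匹' wording: change the number of copies."""
        self.foreground[image] = count

    def draw_order(self) -> List[str]:
        """Background first, then foreground images drawn over it."""
        order = [self.background] if self.background else []
        for image, count in self.foreground.items():
            order.extend([image] * count)
        return order


scene = Scene()
scene.place("Gm", "background")   # "forest"  -> FIG. 7(a)
fig_7a = scene.draw_order()
scene.place("Gk", "foreground")   # "bear"    -> FIG. 7(b)
fig_7b = scene.draw_order()
scene.set_count("Gk", 3)          # "3 hiki"  -> FIG. 7(c)
fig_7c = scene.draw_order()
```

Drawing the background first and the foreground afterwards gives the superimposition described for FIG. 7(b).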

図４に戻り、送受信部４１は、配置部４６が作成した絵本の情報を、聞き手の携帯端末２０ｂに送信する（ステップ１１１）。本実施の形態では、図７（ａ）～（ｃ）に示した絵本の各画像が、順次送られる。
絵本の情報は、携帯端末２０ｂの送受信部２１が受信し（ステップ１１２）、表示部２２にて、絵本が表示される（ステップ１１３）。この場合、聞き手は、読み手の音声として、「深い深い森の中に、熊さんがいました。熊さんは、３匹いました。」を聞きつつ、これに合わせて、図７（ａ）～（ｃ）の画像を順次見ることになり、絵本の読み聞かせをすることができる。 Returning to FIG. 4, the transmitting/receiving section 41 transmits the picture book information created by the arranging section 46 to the portable terminal 20b of the listener (step 111). In this embodiment, the picture book images shown in FIGS. 7A to 7C are sequentially sent.
The picture book information is received by the transmitting/receiving unit 21 of the mobile terminal 20b (step 112), and the picture book is displayed on the display unit 22 (step 113). In this case, the listener hears the reader say, "There was a bear in a deep, deep forest. There were three bears," and, in step with this, sees the images of FIGS. 7(a) to 7(c) in sequence, so that the picture book can be read aloud to the listener.

なお、画像取得部４４が、新たな背景の画像を取得したときは、絵本における新しいページとして扱うことができる。つまり、背景が新しくなったときは、今までのページとは、異なる場面であり、新たなページであるとみなすことができる。よって、配置部４６は、新しいページを用意し、新たな背景上に前景の画像を配置する。即ち、これによりページめくりをすることができる。 When the image acquiring unit 44 acquires a new background image, it can be treated as a new page in the picture book. In other words, when the background is new, it can be regarded as a different scene from the previous page and a new page. Therefore, the placement unit 46 prepares a new page and places the foreground image on the new background. That is, this enables page turning.
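The page-turning rule described here can be sketched as follows. This is a minimal Python illustration under two assumptions that the description does not state: a background counts as "new" when it differs from the background of the current page, and a foreground image spoken before any background opens a page with no background. The class `PictureBook` and the image labels are hypothetical.

```python
class PictureBook:
    def __init__(self):
        # Each page is a dict with one background and a list of foreground images.
        self.pages = []

    def place(self, image, attribute):
        if attribute == "background":
            current = self.pages[-1]["background"] if self.pages else None
            if image != current:
                # A new background is treated as a new scene, i.e. a new page.
                self.pages.append({"background": image, "foreground": []})
        else:
            if not self.pages:
                self.pages.append({"background": None, "foreground": []})
            self.pages[-1]["foreground"].append(image)


book = PictureBook()
book.place("forest", "background")
book.place("bear", "foreground")
book.place("town", "background")   # new background -> page turn
book.place("girl", "foreground")
```

After these four calls the book has two pages, and the girl appears only on the second one.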

［第２の実施形態］
次に、第２の実施形態について説明を行う。第２の実施形態では、読み手の発話音声だけでなく、聞き手の発話音声を加えて、絵本の作成を行う。
図８は、第２の実施形態における表示システム１の機能構成例を示したブロック図である。
この表示システム１において、携帯端末２０ａおよび携帯端末２０ｂとは、図３に示した第１の実施形態における表示システム１と同様の機能構成を有し、送信情報の送受信を行う送受信部２１と、画像の表示を行う表示部２２と、情報を入力する入力部２３と、発話音声を取得する音声取得部２４とを備える。
一方、管理サーバ４０は、分離部４２が存在しないことを除き、第１の実施形態と同様の機能構成を有する。即ち、送受信部４１と、把握部４３と、画像取得部４４と、記憶部４５と、配置部４６とを有する。
携帯端末２０ａ、携帯端末２０ｂおよび管理サーバ４０の各機能部は、第１の実施形態と同様の動作を行う。

[Second embodiment]
Next, a second embodiment will be described. In the second embodiment, a picture book is created by adding not only the uttered voice of the reader but also the uttered voice of the listener.
FIG. 8 is a block diagram showing a functional configuration example of the display system 1 according to the second embodiment.
In this display system 1, the mobile terminal 20a and the mobile terminal 20b have the same functional configuration as in the display system 1 of the first embodiment shown in FIG. 3, and each includes a transmitting/receiving unit 21 that transmits and receives information, a display unit 22 that displays images, an input unit 23 for inputting information, and a voice acquisition unit 24 that acquires uttered voices.
On the other hand, the management server 40 has the same functional configuration as that of the first embodiment except that the separation unit 42 is not present. That is, it has a transmitting/receiving section 41 , a grasping section 43 , an image obtaining section 44 , a storage section 45 and an arrangement section 46 .
Each functional unit of the mobile terminal 20a, the mobile terminal 20b, and the management server 40 operates in the same manner as in the first embodiment.

図９は、第２の実施形態の表示システム１の動作の例について説明したフローチャートである。
図９に示したフローチャートのステップ２０１～ステップ２１２は、図４に示したフローチャートのステップ１０１～ステップ１０３、ステップ１０５～ステップ１１３と同様である。即ち、ステップ１０４がないことを除き同様である。よって、第２の実施形態では、読み手の発話音声と聞き手の発話音声との分離を行わない。そして、双方の発話音声についてステップ２０４以降の処理を行うため、読み手の発話音声の意味のみならず、聞き手の発話音声の意味についても把握し、絵本の作成に反映させる。なお、図５で示したように、聞き手の発話音声の音圧は、小さいため、聞き手の発話音声は、携帯端末２０ａではなく、携帯端末２０ｂで取得するようにしてもよい。この場合、携帯端末２０ｂは、聞き手が所持しており、携帯端末２０ａよりも、より近い距離に存在するため、より大きい音圧で発話音声を取得することができる。 FIG. 9 is a flow chart explaining an example of the operation of the display system 1 of the second embodiment.
Steps 201 to 212 of the flowchart shown in FIG. 9 are the same as steps 101 to 103 and steps 105 to 113 of the flowchart shown in FIG. That is, it is the same except that step 104 is omitted. Therefore, in the second embodiment, the uttered voice of the reader and the uttered voice of the listener are not separated. Then, since the processing from step 204 onwards is performed for both uttered voices, the meaning of not only the reader's uttered voice but also the listener's uttered voice is grasped and reflected in the creation of the picture book. As shown in FIG. 5, since the sound pressure of the listener's uttered voice is low, the listener's uttered voice may be acquired by the mobile terminal 20b instead of the mobile terminal 20a. In this case, since the mobile terminal 20b is owned by the listener and is located closer than the mobile terminal 20a, the utterance can be obtained with a higher sound pressure.

第２の実施形態では、把握部４３は、読み手の発話音声だけでなく、さらに聞き手の発話音声の意味を把握し、画像取得部４４は、把握部４３が把握した聞き手の発話音声の意味に応じた画像を取得する。さらに、配置部４６は、画像取得部４４が、取得した画像を配置して絵本を作成する。 In the second embodiment, the comprehending unit 43 comprehends not only the uttered voice of the reader but also the uttered voice of the listener, and the image acquiring unit 44 comprehends the meaning of the uttered voice of the listener comprehended by the comprehending unit 43. Get the corresponding image. Furthermore, the placement unit 46 creates a picture book by placing the images acquired by the image acquisition unit 44 .

この具体例を、再び図７（ａ）～（ｃ）を用いて説明する。この場合、読み手と聞き手との間に、次のような会話があった場合が該当する。

――――――――――――――――――――――――――――――――――――――――
読み手：「深い深い森の中に、熊さんがいました。」
聞き手：「何匹いるの？」
読み手：「３匹。」
――――――――――――――――――――――――――――――――――――――――
This specific example will be described again with reference to FIGS. 7(a) to 7(c). In this case, the following conversation occurs between the reader and the listener.

――――――――――――――――――――――――――――――――――――――――
Reader: "There was a bear in a deep, deep forest."
Listener: "How many are there?"
Reader: "Three."
――――――――――――――――――――――――――――――――――――――――

この場合、読み手の「深い深い森の中に、熊さんがいました。」の発話音声により、配置部４６は、図７（ａ）～（ｃ）に示す画像を配置する点は、第１の実施形態と同様である。
一方、第２の実施形態では、把握部４３は、聞き手の発話音声の意味として、聞き手の「何匹いるの？」により、熊の数を質問していることを把握する。そして、次の読み手の「３匹。」がその回答であるとして、配置部４６は、熊の画像を３匹になるように増加させ、３匹となった熊の画像を配置する。その結果、図７（ｃ）に示すような画像となる。
第２の実施形態の場合、読み手の発話音声のみならず、聞き手の発話音声を反映させて、絵本を作成することができる。 In this case, as in the first embodiment, the arranging unit 46 arranges the images shown in FIGS. 7(a) to 7(c) in response to the reader's utterance "There was a bear in a deep, deep forest."
On the other hand, in the second embodiment, the comprehension unit 43 comprehends, as the meaning of the listener's utterance "How many are there?", that the listener is asking about the number of bears. Then, treating the reader's next utterance "Three." as the answer, the arrangement unit 46 increases the bear images to three and arranges the three bear images. As a result, an image as shown in FIG. 7(c) is obtained.
In the case of the second embodiment, a picture book can be created by reflecting not only the uttered voice of the reader but also the uttered voice of the listener.
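One conceivable way to link the listener's question to the reader's answer is to remember that a count was asked for and apply the next "number + 匹" utterance to the most recently mentioned animal. The Python sketch below illustrates this idea only. The function name, the question marker "何匹", and the rule itself are assumptions; the utterances are processed in order without speaker labels, since the second embodiment does not separate the two voices.

```python
import re

COUNT_PATTERN = re.compile(r"([0-9０-９]+)匹")
COUNT_QUESTION = "何匹"  # "how many (animals)"


def apply_dialogue(utterances, animals=("熊",)):
    """Return {animal: count} after processing the utterances in order."""
    counts = {}
    last_animal = None
    count_asked = False
    for text in utterances:
        for animal in animals:
            if animal in text:
                counts.setdefault(animal, 1)
                last_animal = animal
        if COUNT_QUESTION in text:
            # Remember the question; the next number is treated as its answer.
            count_asked = True
            continue
        match = COUNT_PATTERN.search(text)
        if match and count_asked and last_animal is not None:
            counts[last_animal] = int(match.group(1))
            count_asked = False
    return counts


result = apply_dialogue([
    "深い深い森の中に、熊さんがいました。",
    "何匹いるの？",
    "３匹。",
])
```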

［第３の実施形態］
次に、第３の実施形態について説明を行う。第３の実施形態では、絵本の中に登場するキャラクタが発話するときは、この発話音声を、聞き手の携帯端末２０ｂで実際の音声として、出力するものである。
図１０は、第３の実施形態における表示システム１の機能構成例を示したブロック図である。
この表示システム１において、管理サーバ４０は、第１の実施形態と同様の機能構成を有する。即ち、送受信部４１と、分離部４２と、把握部４３と、画像取得部４４と、記憶部４５と、配置部４６とを有する。
一方、携帯端末２０ａおよび携帯端末２０ｂとは、第１の実施形態に対し、音声出力部２５が加わる点で異なる。

[Third Embodiment]
Next, a third embodiment will be described. In the third embodiment, when a character appearing in a picture book speaks, the spoken voice is output as an actual voice by the portable terminal 20b of the listener.
FIG. 10 is a block diagram showing a functional configuration example of the display system 1 according to the third embodiment.
In this display system 1, the management server 40 has the same functional configuration as in the first embodiment. That is, it has a transmitting/receiving section 41 , a separation section 42 , a comprehension section 43 , an image acquisition section 44 , a storage section 45 and an arrangement section 46 .
On the other hand, the mobile terminals 20a and 20b differ from the first embodiment in that an audio output unit 25 is added.

音声出力部２５は、管理サーバ４０から送られ、表示画像の中に登場するキャラクタの発話音声を出力する。表示画像が絵本の場合、「キャラクタ」は、絵本の中に絵として登場するものである。キャラクタは、特に限られるものではなく、現実に存在するキャラクタでもよく、現実に存在しないキャラクタでもよい。また、現実に存在するキャラクタであっても、人や動物などのように実際に音声等を発する能力がある場合に限られるものではなく、この能力がないキャラクタに発話させてもよい。この例としては、昆虫、木・花等の植物、太陽・月などの天体、おもちゃ等が挙げられる。また、現実に存在しないキャラクタとしては、例えば、妖精、恐竜、怪物、神様、幽霊等が挙げられる。 The voice output unit 25 outputs the uttered voice, sent from the management server 40, of a character appearing in the display image. When the display image is a picture book, a "character" is something that appears as a picture in the picture book. The character is not particularly limited and may be one that exists in reality or one that does not. Moreover, a character that exists in reality need not be one that can actually produce a voice, such as a person or an animal; a character without this ability may also be made to speak. Examples of the latter include insects, plants such as trees and flowers, celestial bodies such as the sun and the moon, and toys. Characters that do not exist in reality include, for example, fairies, dinosaurs, monsters, gods, and ghosts.

図１１は、第３の実施形態の表示システム１の動作の例について説明したフローチャートである。
図１１に示したフローチャートのステップ３０１～ステップ３１３は、図４に示したフローチャートのステップ１０１～ステップ１１３と同様である。そして、新たにステップ３１４が加わる。ステップ３１４では、上述したように、絵本の中に登場するキャラクタの発話音声を出力する。 FIG. 11 is a flow chart explaining an example of the operation of the display system 1 of the third embodiment.
Steps 301 to 313 of the flowchart shown in FIG. 11 are the same as steps 101 to 113 of the flowchart shown in FIG. Then, step 314 is newly added. At step 314, as described above, the uttered voices of the characters appearing in the picture book are output.

この場合、読み手は、例えば、以下のような文章を発話する。

――――――――――――――――――――――――――――――――――――――――
読み手：「走りながら熊さんはこう言いました。まて～！」
――――――――――――――――――――――――――――――――――――――――
In this case, the reader utters, for example, the following sentences.

――――――――――――――――――――――――――――――――――――――――
Reader: "While running, Mr. Bear said, 'Wait!'"
――――――――――――――――――――――――――――――――――――――――

この場合、把握部４３は、読み手の発話音声の意味として、「熊さんはこう言いました。」により、絵本の中に登場するキャラクタの発話であることを把握する。この場合、キャラクタは、熊である。そして、送受信部４１は、絵本の情報として、熊の発話音声をさらに聞き手の携帯端末２０ｂに送る。その結果、ステップ３１４で述べたように、携帯端末２０ｂでは、絵本の中に登場する熊の発話音声が出力される。具体的には、聞き手は、以下のような音声を聞く。

――――――――――――――――――――――――――――――――――――――――
読み手：「走りながら熊さんはこう言いました。まて～！」
携帯端末２０ｂ：「まて～！」
――――――――――――――――――――――――――――――――――――――――
In this case, the comprehension unit 43 comprehends, from the phrase "Mr. Bear said," in the reader's utterance, that what follows is the utterance of a character appearing in the picture book. In this case, the character is a bear. The transmitting/receiving unit 41 then additionally sends the bear's uttered voice, as picture book information, to the listener's portable terminal 20b. As a result, as described for step 314, the portable terminal 20b outputs the uttered voice of the bear appearing in the picture book. Specifically, the listener hears the following.

――――――――――――――――――――――――――――――――――――――――
Reader: "While running, Mr. Bear said, 'Wait!'"
Portable terminal 20b: "Wait!"
――――――――――――――――――――――――――――――――――――――――
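Detecting that the reader is quoting a character could be done, for example, by looking for the marker phrase and taking the text that follows it as the character's line. The following Python sketch assumes the marker "こう言いました。" from the example above and a hypothetical list of registered character names; the function name and the rule of choosing the character mentioned closest to the marker are illustrative assumptions.

```python
CHARACTERS = ("熊", "妖精")     # hypothetical registered character names
SPEECH_MARKER = "こう言いました。"


def extract_character_speech(text):
    """Return (character, line) if the text quotes a character, else None."""
    head, marker, line = text.partition(SPEECH_MARKER)
    if not marker or not line:
        return None
    # Pick the registered character mentioned closest to the marker.
    position, name = max((head.rfind(c), c) for c in CHARACTERS)
    if position < 0:
        return None
    return name, line


speech = extract_character_speech("走りながら熊さんはこう言いました。まて～！")
```

The returned line is what the portable terminal 20b would output in the character's voice.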

このとき、携帯端末２０ｂで出力される発話音声は、予め用意していたものでもよく、合成音声であってもよい。また、読み手の発話音声を加工したものであってもよい。そして、この発話音声は、キャラクタのイメージに合致した声質で出力することが好ましい。例えば、熊の場合は、低い音声で出力する。また、妖精の場合は、高い音声で出力する。この場合、携帯端末２０ｂで出力される発話音声を、読み手の発話音声を加工したものとする場合、元の発話音声に対し、周波数変換等を行うことで実現できる。
第３の実施形態の場合、聞き手は、携帯端末２０ｂから、キャラクタの発話音声を聞くことができ、臨場感がより向上する。 At this time, the uttered voice output from the mobile terminal 20b may be one prepared in advance or may be synthesized speech. It may also be a processed version of the reader's uttered voice. This uttered voice is preferably output with a voice quality that matches the image of the character. For example, a bear is given a low-pitched voice and a fairy a high-pitched voice. When the uttered voice output by the portable terminal 20b is a processed version of the reader's uttered voice, this can be achieved by applying frequency conversion or the like to the original uttered voice.
In the case of the third embodiment, the listener can hear the character's uttered voice from the portable terminal 20b, which further improves the sense of presence.
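As a rough illustration of the frequency conversion mentioned above, the sketch below shifts pitch by resampling a list of audio samples with linear interpolation. This is a deliberately naive technique chosen for brevity, not the method of this description: it also changes the duration of the clip, which a practical voice changer would normally compensate for. The function name and the factor values in `PITCH_FACTOR` are assumptions.

```python
# Assumed pitch factors: below 1 lowers the voice, above 1 raises it.
PITCH_FACTOR = {"熊": 0.8, "妖精": 1.25}


def shift_pitch(samples, factor):
    """Resample by linear interpolation.

    Reading the input `factor` times faster raises every frequency by that
    factor (and shortens the clip); a factor below 1 lowers the pitch.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(len(samples) / factor)
    out = []
    for i in range(n_out):
        pos = i * factor
        j = int(pos)
        frac = pos - j
        nxt = samples[min(j + 1, len(samples) - 1)]
        out.append(samples[j] * (1 - frac) + nxt * frac)
    return out
```

For a bear, the reader's samples would be passed through `shift_pitch(samples, PITCH_FACTOR["熊"])` before being sent to the portable terminal 20b.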

［第４の実施形態］
次に、第４の実施形態について説明を行う。第４の実施形態では、読み手は、予め定められた音声を読むことで発話を行う。つまり、絵本のストーリーは、読み手の創作や実物の本を読む等でもよいが、携帯端末２０ａに表示し、これを読み、発話するようにすれば、読み手の負担を軽減することができる。この場合、絵本のストーリーは、管理サーバ４０に用意されている。

[Fourth embodiment]
Next, a fourth embodiment will be described. In the fourth embodiment, the reader speaks by reading a predetermined text aloud. The story of the picture book may be made up by the reader or read from a physical book, but if the story is displayed on the mobile terminal 20a and the reader reads it aloud from there, the burden on the reader can be reduced. In this case, the picture book story is prepared in the management server 40.

第４の実施形態における表示システム１の機能構成例を示すブロック図は、図３に示した第１の実施形態と同様であるので、ここでは説明を省略する。 A block diagram showing an example of the functional configuration of the display system 1 according to the fourth embodiment is the same as that according to the first embodiment shown in FIG. 3, so description thereof will be omitted here.

図１２は、第４の実施形態の表示システム１の動作の例について説明したフローチャートである。
まず、絵本の読み手が、携帯端末２０ａの入力部２３を使用して専用アプリを操作し、読み聞かせをする絵本を選択する（ステップ４０１）。これは、専用アプリが、管理サーバ４０に保存されている絵本の一覧を表示し、この一覧から選択することで行うことができる。
次に、管理サーバ４０の送受信部４１が、絵本の文章であり、予め用意され、読み手が読む文章を読み手の携帯端末２０ａに対し出力する（ステップ４０２）。これにより、携帯端末２０ａの表示部２２には、この文章が表示され、読み手は、これを絵本の文章として読む。 FIG. 12 is a flow chart explaining an example of the operation of the display system 1 of the fourth embodiment.
First, a picture book reader operates a dedicated application using the input unit 23 of the portable terminal 20a to select a picture book to be read aloud (step 401). This can be done by the dedicated application displaying a list of picture books stored in the management server 40 and selecting from this list.
Next, the transmitting/receiving unit 41 of the management server 40 outputs the text of the picture book, which is prepared in advance and read by the reader, to the portable terminal 20a of the reader (step 402). As a result, the text is displayed on the display unit 22 of the mobile terminal 20a, and the reader reads it as the text of a picture book.

以下のステップ４０３～ステップ４１５は、第１の実施形態で説明した図４のステップ１０１～ステップ１１３と同様である。 Steps 403 to 415 below are the same as steps 101 to 113 in FIG. 4 described in the first embodiment.

図１３は、第４の実施形態で用いられる記憶部４５のデータ構造を示した図である。
図示するデータ構造は、Ｎｏ．ページ、文言、属性、画像、処理の６つからなる。このうち、「Ｎｏ．」、「文言」、「属性」、「画像」、「処理」は、第１の実施形態において、図６で説明した場合と同様である。また、「ページ」は、絵本のページ数を表す。
このデータ構造が用意される絵本のストーリーは、例えば、下記に示すような場合である。

――――――――――――――――――――――――――――――――――――――――
読み手：「深い深い森の中に、熊さんがいました。熊さんは、３匹いました。」
読み手：「そこに、少女が１人現れました。少女は驚き、逃げだしました。」
読み手：「そして、熊さんも少女を追って走りだしました。」
読み手：「走りながら熊さんはこう言いました。まて～！」
読み手：「町に逃げ戻った少女は、助けを求めました。」
――――――――――――――――――――――――――――――――――――――――
FIG. 13 is a diagram showing the data structure of the storage unit 45 used in the fourth embodiment.
The illustrated data structure consists of six items: No., page, wording, attribute, image, and processing. Of these, "No.", "wording", "attribute", "image", and "processing" are the same as described with reference to FIG. 6 in the first embodiment. "Page" indicates the page number of the picture book.
A picture book story for which this data structure is prepared is, for example, the case shown below.

――――――――――――――――――――――――――――――――――――――――
Reader: "There was a bear in a deep, deep forest. There were three bears."
Reader: "There, a girl appeared. She was frightened and ran away."
Reader: "And Mr. Bear also started running after the girl."
Reader: "While running, Mr. Bear said, 'Wait!'"
Reader: "A girl who ran back to town asked for help."
――――――――――――――――――――――――――――――――――――――――

この場合、読み手の発話内容は、決まっており、把握部４３は、画像を特定する文言として、発話音声の中に、図１３に示した文言が登場するか否かを調べる。そして、その文言が登場したときに、画像取得部４４は、ステップ４０９において、この文言に対応する画像を取得する。また、配置部４６は、変換された文字列の中に、図１３に示した文言が登場するか否かを調べる。そして、その文言が登場したときに、配置部４６は、ステップ４１１において、この画像に対し、この特徴に合わせる処理を行う。 In this case, the utterance content of the reader is fixed, and the comprehension unit 43 checks whether the wording shown in FIG. 13 appears in the uttered voice as the wording specifying the image. Then, when the wording appears, the image acquiring section 44 acquires an image corresponding to this wording in step 409 . Also, the placement unit 46 checks whether the wording shown in FIG. 13 appears in the converted character string. Then, when the wording appears, the arrangement unit 46 performs a process of matching this feature to this image in step 411 .

上述した文章の場合、図１３において、Ｎｏ．７００１～Ｎｏ．７００９で示す箇所に対応する。
ここでは、まず、図７（ａ）～（ｃ）に挙げた絵本が作成される。具体的には、図７（ａ）～（ｃ）で説明したように、文章１行目の「深い深い森の中に、熊さんがいました。熊さんは、３匹いました。」により、Ｎｏ．７００１～Ｎｏ．７００３が参照され、図７（ａ）～（ｃ）の絵本が作成される。即ち、配置部４６は、「森」の文言により、背景として森の画像を配置する（図７（ａ））。さらに、配置部４６は、「熊」の文言により、前景として熊の画像を配置する（図７（ｂ））。そして、配置部４６は、「３匹」の文言により、熊を３匹に増加させる（図７（ｃ））。 The sentences described above correspond to the rows indicated by No. 7001 to No. 7009 in FIG. 13.
Here, first, the picture book shown in FIGS. 7(a) to 7(c) is created. Specifically, as explained with reference to FIGS. 7(a) to 7(c), the first line of the text, "There was a bear in a deep, deep forest. There were three bears.", causes No. 7001 to No. 7003 to be referenced, and the picture book of FIGS. 7(a) to 7(c) is created. That is, the arranging unit 46 arranges the image of the forest as the background in response to the wording "forest" (FIG. 7(a)). Further, the arranging unit 46 arranges an image of a bear as the foreground in response to the wording "bear" (FIG. 7(b)). Then, the arranging unit 46 increases the number of bears to three in response to the wording "three" (FIG. 7(c)).

図１４（ａ）～（ｂ）は、文章２行目以降に作成される絵本の例を示した図である。
この場合、文章２行目の「そこに、少女が１人現れました。少女は驚き、逃げだしました。」により、Ｎｏ．７００４～Ｎｏ．７００５が参照され、図１４（ａ）の絵本が作成される。そして、矢印Ｙｓで示した方向に少女が速く移動する。即ち、配置部４６は、「少女」の文言により、前景として少女の画像を配置する。そして、「逃げだしました」の文言により、逃げだす様子を表すように、矢印Ｙｓで示した方向に少女を速く移動させる。
次に、文章３行目の「そして、熊さんも少女を追って走りだしました。」により、Ｎｏ．７００６が参照され、矢印Ｙｋで示した方向に熊が速く移動する。即ち、配置部４６は、「走り出しました」の文言により、少女を追いかける様子を表すように、矢印Ｙｋで示した方向に熊を速く移動させる。 FIGS. 14A and 14B are diagrams showing examples of picture books created after the second line of text.
In this case, the second line of the text, "There, a girl appeared. She was frightened and ran away.", causes No. 7004 to No. 7005 to be referenced, and the picture book of FIG. 14(a) is created. The girl then moves quickly in the direction indicated by the arrow Ys. That is, the arrangement unit 46 arranges the image of the girl as the foreground in response to the wording "girl". Then, in response to the wording "ran away", it moves the girl quickly in the direction indicated by the arrow Ys to depict her running away.
Next, the third line of the text, "And Mr. Bear also started running after the girl.", causes No. 7006 to be referenced, and the bear moves quickly in the direction indicated by the arrow Yk. That is, in response to the wording "started running", the arrangement unit 46 moves the bear quickly in the direction indicated by the arrow Yk to depict it chasing the girl.

さらに、文章４行目の「走りながら熊さんはこう言いました。まて～！」により、Ｎｏ．７００７が参照され、第３の実施形態で説明したように、聞き手の携帯端末２０ｂに、「まて～！」という熊の発話音声が出力される。
そして、文章５行目の「町に逃げ戻った少女は、助けを求めました。」により、Ｎｏ．７００８～Ｎｏ．７００９が参照され、図１４（ｂ）の絵本が作成される。即ち、配置部４６は、「町」の文言により、背景として町の画像を配置する。なお、この場合、ページ数が１から２に変化するため、ページめくりがされた状態となる。そして、配置部４６は、「少女」の文言により、前景として少女の画像を配置する。
第４の実施形態の場合、読み手は、携帯端末２０ａに表示される文章を読み上げるだけで、読み聞かせを行うことができ、読み手の負担を軽減することができる。 Furthermore, the fourth line of the text, "While running, Mr. Bear said, 'Wait!'", causes No. 7007 to be referenced, and, as described in the third embodiment, the bear's uttered voice "Wait!" is output from the listener's mobile terminal 20b.
Then, the fifth line of the text, "A girl who ran back to town asked for help.", causes No. 7008 to No. 7009 to be referenced, and the picture book of FIG. 14(b) is created. That is, the arrangement unit 46 arranges the image of the town as the background in response to the wording "town". In this case, since the page changes from 1 to 2, the page is turned. The arrangement unit 46 then arranges the image of the girl as the foreground in response to the wording "girl".
In the case of the fourth embodiment, the reader can read aloud only by reading the text displayed on the mobile terminal 20a, and the burden on the reader can be reduced.
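Because the story is fixed in the fourth embodiment, the rows of FIG. 13 can be followed in order as the reader speaks. The Python sketch below reconstructs a plausible table from the walk-through above and steps through it. The row numbers, pages, and the sentence-to-row mapping follow this description, but the exact wordings stored in FIG. 13, the processing labels, and the function `follow_story` are assumptions.

```python
# (No., page, wording, attribute, processing); wordings and processing
# labels are assumed, reconstructed from the walk-through of the story.
ROWS = [
    (7001, 1, "森", "background", None),
    (7002, 1, "熊", "foreground", None),
    (7003, 1, "３匹", None, "set count to 3"),
    (7004, 1, "少女", "foreground", None),
    (7005, 1, "逃げだしました", None, "move quickly"),
    (7006, 1, "走りだしました", None, "move quickly"),
    (7007, 1, "こう言いました", None, "output character speech"),
    (7008, 2, "町", "background", None),
    (7009, 2, "少女", "foreground", None),
]

STORY = [
    "深い深い森の中に、熊さんがいました。熊さんは、３匹いました。",
    "そこに、少女が１人現れました。少女は驚き、逃げだしました。",
    "そして、熊さんも少女を追って走りだしました。",
    "走りながら熊さんはこう言いました。まて～！",
    "町に逃げ戻った少女は、助けを求めました。",
]


def follow_story(sentences):
    """Return (No., page, page_turned) for each row referenced, in order."""
    events = []
    cursor = 0   # index of the next row expected in the fixed story
    page = None
    for sentence in sentences:
        pos = 0
        while cursor < len(ROWS):
            no, row_page, wording, _, _ = ROWS[cursor]
            found = sentence.find(wording, pos)
            if found < 0:
                break  # wait for the next sentence
            turned = page is not None and row_page != page
            events.append((no, row_page, turned))
            page = row_page
            pos = found + len(wording)
            cursor += 1
    return events


events = follow_story(STORY)
```

With this table the only page turn occurs at No. 7008, when the background changes to the town.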

［第５の実施形態］
次に、第５の実施形態について説明を行う。第５の実施形態の表示システム１では、管理サーバ４０は、読み手の発話音声の意味に応じたテキストを取得し、取得したテキストを配置し、表示画像として文章にする。そして、聞き手は、携帯端末２０ｂで、表示画像として文章を閲覧する。
図１５は、第５の実施形態における表示システム１の機能構成例を示したブロック図である。
この表示システム１において、携帯端末２０ａおよび携帯端末２０ｂとは、図３に示した第１の実施形態における表示システム１と同様の機能構成を有する。即ち、携帯端末２０ａおよび携帯端末２０ｂは、送信情報の送受信を行う送受信部２１と、画像の表示を行う表示部２２と、情報を入力する入力部２３と、発話音声を取得する音声取得部２４とを備える。これらの各機能部は、第１の実施形態と同様の動作を行う。
一方、管理サーバ４０は、第１の実施形態に比較して、画像取得部４４の代わりにテキスト取得部４７が入る。また、第１の実施形態に比較して、把握部４３、画像取得部４４および記憶部４５の動作が異なる。よって、以下、この事項を中心に説明を行う。

[Fifth Embodiment]
Next, a fifth embodiment will be described. In the display system 1 of the fifth embodiment, the management server 40 acquires text corresponding to the meaning of the uttered voice of the reader, arranges the acquired text, and forms a sentence as a display image. Then, the listener views the text as a display image on the portable terminal 20b.
FIG. 15 is a block diagram showing a functional configuration example of the display system 1 according to the fifth embodiment.
In this display system 1, the mobile terminals 20a and 20b have the same functional configuration as in the display system 1 of the first embodiment shown in FIG. 3. That is, the mobile terminal 20a and the mobile terminal 20b each include a transmitting/receiving unit 21 that transmits and receives information, a display unit 22 that displays images, an input unit 23 for inputting information, and a voice acquisition unit 24 that acquires uttered voices. Each of these functional units operates in the same manner as in the first embodiment.
On the other hand, the management server 40 includes a text acquisition section 47 instead of the image acquisition section 44 as compared with the first embodiment. Also, the operations of the grasping unit 43, the image acquisition unit 44, and the storage unit 45 are different from those of the first embodiment. Therefore, the following description will focus on this matter.

把握部４３は、読み手の発話音声の意味を把握する。この場合、音声認識等の手法により、読み手の発話音声の意味の全てを把握することが好ましい。
テキスト取得部４７は、要素取得手段の一例であり、把握部４３が把握した意味を現す表示要素を取得する。ここでは、表示要素として、把握部４３が把握した意味に応じたテキストを取得する。 The comprehension unit 43 comprehends the meaning of the uttered voice of the reader. In this case, it is preferable to grasp all the meanings of the speech uttered by the reader by means of speech recognition or the like.
The text acquisition unit 47 is an example of element acquisition means, and acquires a display element representing the meaning grasped by the grasping unit 43 . Here, as a display element, a text corresponding to the meaning grasped by the grasping unit 43 is acquired.

記憶部４５は、上記意味と関連付けて画像取得部４４が取得するテキストを記憶する。
配置部４６は、把握した意味に応じて取得したテキストを配置し、表示情報とする。つまり、配置部４６は、画像取得部４４が取得したテキストを配置して、文章を構成する。配置部４６は、例えば、縦書き、横書き、改行等を考慮してテキストの配置を行い、文章とすることが好ましい。 The storage unit 45 stores the text acquired by the image acquisition unit 44 in association with the meaning.
The arranging unit 46 arranges the acquired text according to the comprehended meaning and uses it as display information. That is, the arranging unit 46 arranges the text acquired by the image acquiring unit 44 to compose a sentence. The arrangement unit 46 preferably arranges the text in consideration of vertical writing, horizontal writing, line breaks, etc., and forms sentences.
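A simple way to picture the text arrangement is to start a new line after each sentence and to fold long sentences at a fixed width. The Python sketch below does this for horizontal writing only; the function name and the width of 10 characters are assumptions, and vertical writing is not covered.

```python
def arrange_text(text, width=10):
    """Break after each sentence-ending '。' and fold lines at `width` characters."""
    lines = []
    for sentence in text.replace("。", "。\n").splitlines():
        for start in range(0, len(sentence), width):
            lines.append(sentence[start:start + width])
    return lines


recognized = "深い深い森の中に、熊さんがいました。熊さんは、３匹いました。"
page_lines = arrange_text(recognized, width=10)
```

Joining `page_lines` back together reproduces the recognized text, so nothing spoken by the reader is lost in the layout.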

図１６は、第５の実施形態の表示システム１の動作の例について説明したフローチャートである。
図１６に示したフローチャートのステップ５０１～ステップ５０５は、図４に示したフローチャートのステップ１０１～ステップ１０５と同様である。
ステップ５０６以降は、配置部４６は、テキストを配置し、文章とする（ステップ５０６）。
そして、送受信部４１は、配置部４６が作成した文章の情報を、聞き手の携帯端末２０ｂに送信する（ステップ５０７）。
文章の情報は、携帯端末２０ｂの送受信部２１が受信し（ステップ５０８）、表示部２２にて、文章が表示される（ステップ５０９）。この場合も、文章の読み聞かせをすることができる。 FIG. 16 is a flow chart explaining an example of the operation of the display system 1 of the fifth embodiment.
Steps 501 to 505 of the flowchart shown in FIG. 16 are the same as steps 101 to 105 of the flowchart shown in FIG.
Subsequently, the arrangement unit 46 arranges the text to form sentences (step 506).
Then, the transmitting/receiving section 41 transmits the information of the sentence created by the arranging section 46 to the portable terminal 20b of the listener (step 507).
The text information is received by the transmitting/receiving section 21 of the portable terminal 20b (step 508), and the text is displayed on the display section 22 (step 509). In this case also, the text can be read aloud.

以上詳述した表示システム１によれば、第１の実施形態～第４の実施形態では、把握部４３が、読み手の発話音声の意味を把握し、配置部４６が、これに応じた絵を配置し、絵本を作成する。これにより、読み手が自由に話し、ストーリーを創作するような場合でも絵本が作成される。また、読み手は、コントローラのボタン等を押すような作業は必要なく、絵本の絵の変更、ページめくりなどが、いわば自動的に行われ、ストーリーが進行するため、読み手の負担が軽減される。またその結果、絵本となる読み手の話の内容に基づいて、聞き手の携帯端末等にリアルタイムで絵本の絵が表示されるので、聞き手は、読み手の発話音声を聞きながら、携帯端末等で臨場感あふれる絵本を楽しむことができる。また、第５の実施形態では、把握部４３が、読み手の発話音声の意味を把握し、配置部４６が、これに応じたテキストを配置し、文章を作成する。これにより、文章を基に読み聞かせを行うことができる。
また、それぞれが、携帯端末２０を使用することで、読み聞かせを行う場所などの制限が緩和され、読み手や聞き手が、好きな場所で読み聞かせを行うことができる。さらに、それぞれが携行可能な携帯端末２０を所持することで、利便性が向上する。 According to the display system 1 detailed above, in the first to fourth embodiments the comprehension unit 43 comprehends the meaning of the reader's uttered voice, and the arrangement unit 46 arranges pictures accordingly to create a picture book. As a result, a picture book is created even when the reader speaks freely and makes up the story. In addition, the reader does not need to perform operations such as pressing controller buttons; changes of the pictures, page turning, and the like take place automatically, so to speak, as the story progresses, which reduces the burden on the reader. Furthermore, because the pictures are displayed in real time on the listener's mobile terminal or the like based on the content of the reader's story, the listener can enjoy a highly immersive picture book on the mobile terminal or the like while listening to the reader's voice. In the fifth embodiment, the comprehension unit 43 comprehends the meaning of the reader's uttered voice, and the arrangement unit 46 arranges text accordingly to create sentences. This makes it possible to read aloud based on the text.
In addition, by using the portable terminal 20, the restrictions on the place where the reading is performed are relaxed, and the reader and the listener can read the story at their favorite place. Furthermore, having a mobile terminal 20 that can be carried by each person improves convenience.

In the embodiments described in detail above, the display system 1 is configured by connecting the mobile terminals 20 and the management server 40 via the network 70, the network 80, and the access point 90. However, the management server 40 alone can also be regarded as a display system. Moreover, the processing performed by the management server 40 can equally be performed on the mobile terminal 20, in which case the mobile terminal 20 can also be regarded as a display system.

Furthermore, although the examples above use the mobile terminal 20, the present invention is not limited to this. For example, a desktop computer or a television can be used instead of the mobile terminal 20.
Furthermore, although the examples above assume that the reader and the listener are near each other, the present invention is not limited to this, and the reader and the listener may be apart. In that case, the reader's uttered voice does not reach the listener directly. The reader's uttered voice is therefore sent from the mobile terminal 20a via the management server 40 to the listener's mobile terminal 20b and output from a speaker or the like provided in the mobile terminal 20b, so that the listener can hear it. At this time, the mobile terminal 20b may display not only the picture book or text but also video of the reader captured by the mobile terminal 20a. Likewise, the mobile terminal 20a may output video of the listener and the listener's uttered voice acquired by the mobile terminal 20b. In this case, since the reader's and the listener's uttered voices are each acquired only by their own mobile terminals 20a and 20b, the separation unit 42, which separates the reader's uttered voice from the listener's uttered voice, may be unnecessary.
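As a rough illustration of the remote arrangement just described, the following Python sketch shows why a separation step may be unnecessary when each terminal captures only its own user's voice: the speaker can be inferred from the capturing terminal. The `Utterance` type and the `relay` function are hypothetical names introduced for this sketch and do not appear in the disclosure; the strings "20a" and "20b" simply reuse the reference numerals as identifiers.

```python
from dataclasses import dataclass
from typing import Tuple

# Reference numerals from the description, reused here as plain identifiers.
READER_TERMINAL = "20a"
LISTENER_TERMINAL = "20b"


@dataclass
class Utterance:
    terminal_id: str  # terminal whose microphone captured the audio
    audio: bytes      # raw audio payload (format left unspecified)


def relay(utterance: Utterance) -> Tuple[str, str]:
    """Return (speaker, destination terminal) for one captured utterance.

    Assumption: each terminal picks up only its own user's voice, so the
    speaker follows from the capturing terminal and no separation step runs.
    """
    if utterance.terminal_id == READER_TERMINAL:
        return "reader", LISTENER_TERMINAL
    if utterance.terminal_id == LISTENER_TERMINAL:
        return "listener", READER_TERMINAL
    raise ValueError(f"unknown terminal: {utterance.terminal_id}")
```

When the reader and the listener share one room and one microphone, this inference no longer holds, which is the situation in which a separation unit would still be needed.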

<Description of the program>
The processing performed by the management server 40 in the present embodiment described above is provided as a program such as application software, for example. This processing is realized by software and hardware resources working together. That is, a CPU (not shown) inside a computer provided in the management server 40 executes a program that realizes each of the functions described above.

Accordingly, the processing performed by the management server 40 in the present embodiment can also be regarded as a program that causes a computer to realize: a voice acquisition function of acquiring the reader's uttered voice; a comprehension function of comprehending the meaning of the reader's uttered voice; an element acquisition function of acquiring display elements that represent the meaning comprehended by the comprehension function; and an arrangement function of arranging the acquired display elements to form display information to be viewed by the listener.
The processing performed by the management server 40 in the present embodiment can also be regarded as a program that causes a computer to realize: a voice acquisition function of acquiring the reader's uttered voice; a comprehension function of comprehending the meaning of the reader's uttered voice; an image acquisition function of acquiring images corresponding to the meaning comprehended by the comprehension function; and an arrangement function of arranging the images acquired according to the meaning to form a picture book.
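For illustration only, the four functions listed above might be sketched in Python as follows. This is not the disclosed implementation. The word registry `REGISTERED_WORDS`, the `Page` type, and the image file names are hypothetical, speech is assumed to arrive already transcribed to text, and comprehension is reduced to matching registered words. The split into background and foreground images, and the start of a new page when a new background appears, loosely follow the claims.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical registry: spoken word -> (layer, image file).
REGISTERED_WORDS = {
    "forest": ("background", "forest.png"),
    "sea": ("background", "sea.png"),
    "bear": ("foreground", "bear.png"),
    "rabbit": ("foreground", "rabbit.png"),
}


@dataclass
class Page:
    background: Optional[str] = None
    foreground: List[str] = field(default_factory=list)


def acquire_voice(utterance: str) -> str:
    """Voice acquisition: speech recognition is assumed to be done already."""
    return utterance.lower()


def grasp_meaning(text: str) -> List[str]:
    """Comprehension: keep only the registered words, in spoken order."""
    tokens = [t.strip(".,!?") for t in text.split()]
    return [t for t in tokens if t in REGISTERED_WORDS]


def acquire_images(words: List[str]) -> List[Tuple[str, str]]:
    """Image acquisition: look up the image registered for each word."""
    return [REGISTERED_WORDS[w] for w in words]


def arrange(book: List[Page], images: List[Tuple[str, str]]) -> List[Page]:
    """Arrangement: place images on pages; a new background opens a new page."""
    for layer, image in images:
        if layer == "background":
            if book and book[-1].background == image:
                continue  # same scene, stay on the current page
            if book and book[-1].background is None:
                book[-1].background = image  # fill in a page without a scene
            else:
                book.append(Page(background=image))
        else:
            if not book:
                book.append(Page())
            if image not in book[-1].foreground:
                book[-1].foreground.append(image)
    return book


book: List[Page] = []
for sentence in ("A bear lived in the forest.", "At the sea, a rabbit appeared."):
    arrange(book, acquire_images(grasp_meaning(acquire_voice(sentence))))
```

In this sketch the second sentence mentions a different background, so the rabbit is placed on a second page rather than next to the bear.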

The program that implements the present embodiment can be provided not only via communication means but also stored on a recording medium such as a CD-ROM.

Although the present embodiment has been described above, the technical scope of the present invention is not limited to the scope described in the above embodiment. It is clear from the claims that embodiments with various modifications or improvements added to the above embodiment are also included in the technical scope of the present invention.

Description of symbols: 1 ... display system; 20, 20a, 20b ... mobile terminal; 40 ... management server; 41 ... transmitting/receiving unit; 42 ... separation unit; 43 ... comprehension unit; 44 ... image acquisition unit; 45 ... storage unit; 46 ... arrangement unit; 47 ... text acquisition unit

Claims

1. A picture book display system comprising:
voice acquisition means for acquiring a voice uttered by a reader;
grasping means for grasping the meaning of the reader's uttered voice;
image acquisition means for acquiring an image corresponding to the meaning grasped by the grasping means; and
arranging means for arranging the images acquired according to the meaning to form a picture book,
wherein the image acquisition means checks whether the reader's uttered voice includes a pre-registered word specifying an image and, when the word is included, acquires the image corresponding to the word,
when the reader's uttered voice includes pre-registered feature information representing a feature of the image acquired by the image acquisition means, the arranging means processes the image to match the feature,
the word and the feature information are registered in association with page numbers indicating pages of the picture book,
the arranging means arranges a background image and a foreground image separately, and
when the image acquisition means acquires a new background image, the page number is set to the page number corresponding to the new background image.

2. The picture book display system according to claim 1, wherein the grasping means further grasps the meaning of a listener's uttered voice, and the image acquisition means acquires an image corresponding to the meaning of the listener's uttered voice grasped by the grasping means.

3. The picture book display system according to claim 1, further comprising separating means for separating the reader's uttered voice from the listener's uttered voice.