JP2005321706A

JP2005321706A - Method for reproducing digital book and apparatus for the same

Info

Publication number: JP2005321706A
Application number: JP2004141165A
Authority: JP
Inventors: Akihiro Yoshida; 明弘吉田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2004-05-11
Filing date: 2004-05-11
Publication date: 2005-11-17

Abstract

PROBLEM TO BE SOLVED: To provide a method and an apparatus for reproducing a digital book that can inexpensively supply a plurality of natural, human-sounding voices of differing voice quality.
SOLUTION: The reproducing apparatus includes a plurality of speech synthesis databases 11, one for each speaker, created by separately recording the voices of a plurality of speakers. A reproduction controller 16 switches among the databases 11 according to control information in an executable file 1 of the digital book or designation information input by the user, so that a speech synthesis section 12 synthesizes the speech corresponding to the text information in the executable file 1.
COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a method and an apparatus for reproducing an electronic book that is expressed by auditory information such as speech together with visual information such as characters and images.

Conventionally, content such as novels, picture books, and manga has been distributed with its characters, pictures, and photographs (still images) printed on paper, that is, with paper as the medium. With today's advances in computer technology, however, such content has come to be distributed on various highly portable storage media, such as CD-ROMs (Compact Disk Read Only Memory) and flexible disks, and over networks such as the Web, as so-called electronic publications (electronic books) (see Non-Patent Document 1).

In addition, the reproducing apparatus used for an electronic book is usually a personal computer, a PDA, a mobile phone, or a similar device equipped with a display capable of showing moving images and a speaker capable of reproducing audible sound. Electronic books therefore often include information such as moving images and speech in addition to characters and still images, and can provide more varied information than paper.
Non-Patent Document 1: "What is an e-book?", Electronic Book Consortium, [retrieved April 26, 2004], Internet <URL: http://www.ebj.gr.jp/concept/a0101.html>

There are two conceivable ways to provide speech: using recorded speech and using synthesized speech. Recorded speech sounds fully human, but it must be recorded for each piece of content, which raises the cost. With synthesized speech, once a speech synthesis database and a speech synthesis engine are prepared, there is no need to record speech for each piece of content; however, if multiple voices of differing voice quality are produced by voice quality conversion through signal processing, the resulting speech sounds unnatural.

Furthermore, while an electronic book is being reproduced, the user may want to pause reproduction, to go back a little and replay a passage that was missed, or to ask a question to resolve a doubt that arose during reproduction. For example, children often ask questions when something puzzles them while they are looking at a picture book, and answering those questions is considered very important educationally. Conventional electronic books, however, made no provision for such requests.

An object of the present invention is to realize an electronic book reproducing method and apparatus that can inexpensively provide a plurality of human-sounding voices of differing voice quality and that can handle user requests, such as repeated reproduction and questions, during reproduction.

The electronic book reproducing method and apparatus of the present invention use electronic means to present, in a convenient way, content such as novels, picture books, and manga that has conventionally been presented on media such as paper.

The electronic book in the present invention is assumed to be described in a markup language or the like, and is distributed on various highly portable storage media such as CD-ROMs and flexible disks, or via a network such as the Web.

Using this markup language or the like, the characters among the visual information are included as text information, while image information for still images and moving images is specified by reference to data files. The specified data files can also be obtained from various storage media or via a network and can be stored in a memory of the reproducing apparatus.

For speech, the content to be uttered is registered as text information using a programming language for text utterance that is compatible with the markup language. The same language is also used to identify the voice quality of the speech to be reproduced, so that speech of a desired voice quality can be specified.

The desired speech can be realized by generating synthesized speech with speech synthesis technology. Speech synthesis requires a speech synthesis database, which contains the speech units that make up the language, and a speech synthesis engine. To produce many voice qualities, the present invention does not rely on having the speech synthesis engine vary the voice quality, for example by changing the fundamental frequency. Instead, the voices of a plurality of speakers are recorded separately to create a plurality of speech synthesis databases, one for each speaker, and these databases are used selectively according to the situation. In other words, the voice quality of the speech depends on the speech synthesis database used. These speech synthesis databases can also be stored in the memory of the reproducing apparatus.

By using a speech synthesis database for each voice quality in this way, rather than relying on signal processing in the speech synthesis engine, a plurality of voices of differing voice quality can be uttered more naturally, and a natural, distinct voice quality can be assigned to each character, for example. Moreover, by preparing a speech synthesis database for a speaker with a desired voice quality, synthesized speech of any voice quality can be generated. Signal processing may additionally be applied to the synthesized speech created from such a database in order to further improve speech quality.

By executing the markup language in which the reproduction control is described, the apparatus can time its use of the visual and audio information sources stored in its memory, so that the electronic book is reproduced smoothly.

Next, the means for handling user requests during reproduction of an electronic book are described.

When the user has a question or a request, the user can notify the reproducing apparatus, for example by operating a predetermined button. The reproducing apparatus then pauses the reproduction process, stores the current position in memory, and puts the microphone into a state of waiting for voice input. The speech input through the microphone is recognized by the speech recognition unit, and the question response processing unit performs semantic analysis of the recognized sentence. For a request concerning operation control, it passes the user's intention to the dialogue control unit; for a question about the content of the book, it passes the answer derived from the question response dictionary. On receiving this instruction, the dialogue control unit executes the appropriate programmed operation control or presents the answer to the question using the speaker and the display. After the question about the content of the book has been handled, the paused position is read from the memory and the reproduction process is resumed.

According to the present invention, synthesized speech is used, which greatly reduces the cost of generating the reproduced speech of an electronic book, and differences in voice quality are expressed not by signal processing but by using a plurality of speech synthesis databases, one for each of a plurality of speakers. This makes it possible to generate more human-sounding speech.

Further, according to the present invention, speech of the voice quality the user prefers can be used for any character.

Further, according to the present invention, the user's questions and requests during reproduction of an electronic book can be handled appropriately.

Embodiments of the present invention will be described below with reference to the drawings.

<First Embodiment>
FIG. 1 shows a first embodiment of the electronic book reproducing apparatus of the present invention, together with an electronic book.

In the present invention, an electronic book consists of an executable file 1 and a data file 2. The executable file 1 contains text information for the characters and the synthesized speech, and describes in a markup language the control information for reproduction, that is, the reproduction procedure and the designation of the files and databases used for reproduction. The data file 2 contains image information, either still images, moving images, or both, to be presented as visual information.

The executable file 1 of an electronic book such as a picture book, novel, or manga can be written using a markup language such as HTML, XML, VoiceXML, or SALT (Speech Application Language Tags), a custom-defined language that can handle speech, or custom tags that define speech objects. Synchronization between visual information such as characters and images and auditory information such as speech can likewise be achieved with the markup language, custom language, or custom tags used in the executable file 1.
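As a rough illustration of such an executable file, the sketch below embeds a small, invented XML dialect in Python and flattens it into playback steps with the standard library. The element and attribute names (`book`, `page`, `say`, `voice`, `image`) are assumptions made for this example only; the patent does not define a concrete schema, and an actual file might use VoiceXML, SALT, or custom tags instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for an executable file; the schema is invented for illustration.
EXECUTABLE_FILE = """
<book title="Sample picture book">
  <page id="p1" image="page1.png">
    <say voice="narrator">Once upon a time there was a fox.</say>
    <say voice="fox">Good morning!</say>
  </page>
  <page id="p2" image="page2.png">
    <say voice="narrator">The fox walked into the forest.</say>
  </page>
</book>
"""


def load_playback_steps(markup: str) -> list[dict]:
    """Flatten the markup into an ordered list of playback steps."""
    root = ET.fromstring(markup)
    steps = []
    for page in root.iter("page"):
        for say in page.iter("say"):
            steps.append({
                "page_id": page.get("id"),     # unit that the utterance belongs to
                "image": page.get("image"),    # data file to display with it
                "voice": say.get("voice"),     # which speaker's database to use
                "text": (say.text or "").strip(),
            })
    return steps


steps = load_playback_steps(EXECUTABLE_FILE)
```

Each step carries both the text to be spoken and the image to be shown, which is one simple way to keep the visual and auditory information in step with each other.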

The executable file 1 and the data file 2 of the electronic book can be obtained through an information providing medium 3, such as a highly portable storage medium (a CD-ROM, flexible disk, semiconductor memory, magneto-optical disk, or the like) or the Web.

In FIG. 1, 11 is a speech synthesis database, 12 is a speech synthesis unit (speech synthesis engine), 13 is a display memory, 14 is a display control unit, 15 is a markup language memory, and 16 is a reproduction control unit. Together these constitute the main part (reproduction-only system unit) 10 of the electronic book reproducing apparatus of the present invention. Reference numeral 21 denotes a speaker, and 22 denotes a display.

Reproduction of the electronic book is basically carried out by the reproduction control unit 16 executing the markup language described in the executable file 1.

The speech to be uttered, which is described as text information in the executable file 1, is synthesized using the speech synthesis database 11 and the speech synthesis unit 12 and output from the speaker 21.

At this time, the speech synthesis database 11 to be used is selected according to the control information in the executable file 1 or designation information input by the user, so that each character can be reproduced in a different voice quality, or the book can be reproduced in a voice quality the user prefers. The reproducing apparatus is assumed to contain in advance speech synthesis databases for a sufficient number of speakers, that is, a plurality of speech synthesis databases 11, one for each speaker, created by separately recording the voices of a plurality of speakers. If a speech synthesis database for yet another speaker is needed, it can be obtained via a network such as the Web or from a highly portable storage medium.
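The database switching described above can be sketched as a simple lookup. This is a minimal illustration under stated assumptions, not the apparatus itself: the speaker names, the `SpeechDatabase` class, and the placeholder `synthesize` function are hypothetical, and the rule that a user designation overrides the voice named in the executable file is a choice made for this sketch (the text only says either source may drive the selection).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechDatabase:
    """Stand-in for one speaker's speech synthesis database (unit inventory omitted)."""
    speaker: str


# One database per recorded speaker; the speaker names are hypothetical.
DATABASES = {name: SpeechDatabase(name) for name in ("narrator", "child", "grandfather")}


def select_database(voice_in_file, user_choice=None, default="narrator"):
    """Pick the database for one utterance.

    Assumption of this sketch: a user designation, when present and known,
    takes priority over the voice named in the executable file, and unknown
    names fall back to a default speaker.
    """
    for name in (user_choice, voice_in_file, default):
        if name in DATABASES:
            return DATABASES[name]
    raise KeyError("no speech synthesis database available")


def synthesize(text, database):
    """Placeholder for the speech synthesis engine; returns a description, not audio."""
    return f"[{database.speaker}] {text}"
```

For example, `synthesize("Good morning!", select_database("child"))` would use the child speaker's database, while passing `user_choice="grandfather"` would reproduce the same line in the user's preferred voice.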

Images must be shown on the display 22 in synchronization with the speech generated in this way. As part of the electronic book, the data file 2 containing image information, obtained in advance from the Web or a storage medium, is stored in the display memory 13. In synchronization with the utterance, the reproduction control unit 16 issues a command to the display control unit 14 that designates the data file 2 containing the image information to be displayed. The display control unit 14 retrieves the designated data file 2 from the display memory 13 and shows it on the display 22.

For an electronic book that consists mostly of text, such as a novel, it is considered desirable to display, as the visual information, an image resembling an open paper book.

If there are illustrations, they can be displayed in the same layout as in a paper book, or an illustration can be used as the background.

To indicate which part is currently being reproduced, changing the color of the portion being read aloud is considered effective. The range whose color is changed may be a sentence, a line, or a paragraph.
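A sentence-level version of this highlighting might look like the following sketch. The function names are hypothetical, the sentence splitter assumes English-style punctuation, and square brackets stand in for the color change that a real display would apply.

```python
import re


def split_sentences(text):
    """Split on sentence-ending punctuation (a simplification for this sketch)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def render_with_highlight(sentences, current):
    """Return the text with the sentence being read aloud wrapped in a marker.

    Square brackets stand in for a color change on a real display.
    """
    return " ".join(f"[{s}]" if i == current else s for i, s in enumerate(sentences))
```

Calling `render_with_highlight` once per utterance, with `current` advanced in step with the speech output, would keep the marked sentence aligned with what is being read.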

Because this control can be realized with the markup language, the publisher of an electronic book can produce an electronic book with the behavior described above by writing the executable file 1 in the prescribed notation.

On the other hand, when the electronic book contains many images, as a picture book does, the pictures themselves are an important part of the expression, so it is undesirable for characters (text) to be displayed over the pictures.

This can be avoided by separating the display area for pictures from the display area for text. In this case too, the part currently being reproduced can be indicated by color, as with a novel.

Unlike a paper picture book, an electronic picture book can also display moving images. In a form similar to moving images, still images could also be displayed one after another at short intervals to create a sense of presence unique to picture books.

<Second Embodiment>
For an electronic picture book aimed at young children, an approach suited to viewing by young children is considered necessary. Picture books play a very important role in fostering humanity, and answering the questions that arise in children is considered especially important.

It is also necessary to handle cases where the user wants to pause reproduction partway through, to go back a little and replay a passage that was missed (rewind reproduction), or to skip a passage of little interest and reproduce what follows (fast-forward reproduction).

For these reasons, an electronic book with a function for responding to requests, such as answering questions, is considered necessary.

FIG. 2 shows a second embodiment of the electronic book reproducing apparatus of the present invention, here a form provided with a simple dialogue response function. In FIG. 2, components identical to those in FIG. 1 are denoted by the same reference numerals. That is, 10 is the reproduction-only system unit, 21 is the speaker, 22 is the display, 31 is a microphone, 32 is a speech recognition unit, 33 is a question response processing unit, 34 is a question response dictionary, 35 is a dialogue control unit, 36 is a dialogue button, and 37 is a reproduction ID tag memory.

When the user has a question or a request, pressing the dialogue button 36 conveys that intention to the reproduction-only system unit 10 via the dialogue control unit 35, and reproduction is paused. At that point, the apparatus needs to know which part of the markup is currently being executed. For example, ID tags are attached in advance to units of a certain size, such as paragraphs or pages, and the ID tag of the unit being reproduced is stored in the memory 37 as reproduction proceeds. When reproduction is paused, the memory 37 holds the ID tag of the place where it stopped, so after the request has been processed, reproduction of the electronic book resumes from the beginning of the unit identified by the ID tag stored in the memory 37.
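The pause-and-resume behavior built on ID tags can be sketched as follows. The class and method names are hypothetical, and the book is reduced to an ordered list of `(id_tag, text)` pairs, one per paragraph; the `stored_tag` attribute plays the role of the memory 37.

```python
class PlaybackPosition:
    """Tracks the ID tag of the unit being reproduced (the role of memory 37)."""

    def __init__(self, units):
        # units: ordered list of (id_tag, text) pairs, e.g. one per paragraph
        self.units = units
        self.index = 0
        self.stored_tag = None  # contents of the ID tag memory
        self.paused = False

    def play_next(self):
        """Reproduce one unit, recording its ID tag; return None if paused or finished."""
        if self.paused or self.index >= len(self.units):
            return None
        tag, text = self.units[self.index]
        self.stored_tag = tag
        self.index += 1
        return text

    def pause(self):
        """Called when the dialogue button is pressed."""
        self.paused = True

    def resume(self):
        """Restart from the head of the unit whose ID tag is stored."""
        self.paused = False
        if self.stored_tag is not None:
            tags = [tag for tag, _ in self.units]
            self.index = tags.index(self.stored_tag)
```

Because the stored tag identifies a whole unit, resuming replays the interrupted paragraph from its beginning rather than from the exact word where the button was pressed, which matches the description above.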

To respond to a user's question, the dialogue control unit 35 starts the operation of the speech recognition unit 32 when the dialogue button 36 is pressed. More specifically, the apparatus enters a state in which it accepts voice input from the microphone 31.

The user speaks the question into the microphone 31, the speech recognition unit 32 recognizes the speech, and based on the recognition result the question response processing unit 33 derives a reply to the question while referring to the question response dictionary 34.

As an example, suppose the question input from the microphone 31 is one for which an answer has been prepared in advance, and the question is identified from the result of semantic analysis by the question response processing unit 33. The dialogue control unit 35 can then receive the response content and presentation material from the question response dictionary 34 via the question response processing unit 33, and these are presented through the speaker 21 and the display 22 via the reproduction-only system unit 10.

The user is then asked to confirm, using the dialogue button 36, whether the question has been resolved, and reproduction of the electronic book is resumed by the method described above.

A user request such as rewind reproduction is handled as follows. When the dialogue button 36 is pressed, the dialogue control unit 35 starts the operation of the speech recognition unit 32, and the user speaks the request into the microphone 31, where it is recognized by the speech recognition unit 32. Based on the recognition result, the content of the request is analyzed by the question response processing unit 33 and the question response dictionary 34. The dialogue control unit 35 then sends the command corresponding to the analysis result to the reproduction-only system unit 10, for example a command to go back to a paragraph before the one at which reproduction was interrupted and reproduce from there, together with the paragraph designation given by the ID tag or the like described above.
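The handling of recognized requests, both questions and operation commands, can be sketched as a small dispatcher. Everything here is an assumption made for illustration: the dictionary contents, the control phrases, and the simple keyword matching, which stands in for the semantic analysis that the patent leaves unspecified. Speech recognition itself is outside the sketch, so the function starts from already recognized text.

```python
# Hypothetical question response dictionary: keyword -> prepared answer.
QUESTION_DICTIONARY = {
    "fox": "A fox is a small wild animal related to dogs.",
    "forest": "A forest is a large area covered with trees.",
}

# Hypothetical control phrases -> paragraph offset relative to the paused one.
CONTROL_PHRASES = {"go back": -1, "again": 0, "skip": 1}


def analyze_request(recognized_text, paused_index, paragraph_ids):
    """Classify a recognized utterance as an operation command or a question."""
    text = recognized_text.lower()
    for phrase, offset in CONTROL_PHRASES.items():
        if phrase in text:
            # Clamp so the target paragraph stays inside the book.
            target = min(max(paused_index + offset, 0), len(paragraph_ids) - 1)
            return {"type": "command", "resume_at": paragraph_ids[target]}
    for keyword, answer in QUESTION_DICTIONARY.items():
        if keyword in text:
            return {"type": "answer", "text": answer,
                    "resume_at": paragraph_ids[paused_index]}
    # Nothing matched: resume where reproduction was paused.
    return {"type": "unknown", "resume_at": paragraph_ids[paused_index]}
```

The returned `resume_at` value is the ID tag that would be handed to the reproduction-only system unit, so a rewind request and an answered question both end by resuming from a designated paragraph.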

The microphone 31 and the speech recognition unit 32 described above constitute the voice recognition means referred to in the claims; the question response processing unit 33 and the question response dictionary 34 constitute the request recognition means referred to in the claims; and the dialogue control unit 35, the dialogue button 36, and the memory 37 constitute the dialogue control means referred to in the claims.

The electronic book reproducing apparatus described above can reproduce an electronic book as long as its components are available: the executable file 1 written in a markup language or the like, and the data file 2 containing image information.

In other words, an executable file can be created with some knowledge of a markup language, and data files can be prepared easily from simple drawings, images taken with a digital camera, or recorded video. This has the advantage that users can easily create their own electronic books.

One factor that makes electronic books easy to create is that speech is generated by synthesis. The desired speech can be reproduced simply by writing it into the executable file as text information.

Used in this way, the electronic book reproducing apparatus can also be regarded as a system with excellent entertainment value.

FIG. 1 is a block diagram showing the first embodiment of the electronic book reproducing apparatus of the present invention.
FIG. 2 is a block diagram showing the second embodiment of the electronic book reproducing apparatus of the present invention.

Explanation of symbols

1: executable file, 2: data file, 3: information providing medium, 10: reproduction-only system unit, 11: speech synthesis database, 12: speech synthesis unit (speech synthesis engine), 13: display memory, 14: display control unit, 15: markup language memory, 16: reproduction control unit, 21: speaker, 22: display, 31: microphone, 32: speech recognition unit, 33: question response processing unit, 34: question response dictionary, 35: dialogue control unit, 36: dialogue button, 37: reproduction ID tag memory.

Claims

A method of reproducing an electronic book that is expressed by auditory information such as speech together with visual information such as characters and images and that is distributed via a storage medium or a network,
the electronic book comprising an executable file, which contains text information for characters and synthesized speech and in which control information for reproduction is described in a markup language, and a data file containing image information,
the method using a reproducing apparatus that includes a speech synthesis database, a speech synthesis engine, a display control unit, and a reproduction control unit,
wherein the reproduction control unit,
according to the control information in the executable file of the electronic book, causes the display control unit to display the characters corresponding to the text information in the executable file and the image corresponding to the data file, and, according to the control information in the executable file or designation information input by a user, causes the speech synthesis engine to synthesize the speech corresponding to the text information in the executable file.

The electronic book reproducing method according to claim 1,
using, as the speech synthesis database, a reproducing apparatus provided with a plurality of speech synthesis databases, one for each speaker, created by separately recording the speech of a plurality of speakers,
wherein the reproduction control unit,
according to the control information in the executable file of the electronic book or the designation information input by the user, switches among the plurality of speech synthesis databases and causes the speech synthesis engine to synthesize the speech corresponding to the text information in the executable file.

The electronic book reproducing method according to claim 2,
using a reproducing apparatus that further includes voice recognition means, request response means, and dialogue control means,
wherein the dialogue control means
accepts a request from the user during reproduction of the electronic book and causes the reproduction control unit to pause reproduction,
causes the voice recognition means and the request response means to analyze the request from the user, and
causes the reproduction control unit to execute processing according to the analysis result and then to resume reproduction.

An apparatus for reproducing an electronic book that is expressed by auditory information such as speech together with visual information such as characters and images and that is distributed via a storage medium or a network, the apparatus comprising:
a plurality of speech synthesis databases, one for each speaker, created by separately recording the speech of a plurality of speakers;
a speech synthesis engine that synthesizes speech corresponding to text information using a speech synthesis database; and
a reproduction control unit that, according to control information in an executable file of the electronic book or designation information input by a user, switches among the plurality of speech synthesis databases and causes the speech synthesis engine to synthesize the speech corresponding to the text information in the executable file.

The electronic book reproducing apparatus according to claim 4, further comprising:
voice recognition means for recognizing speech that includes a request from the user;
request response means for analyzing the content of the request from the user based on the recognition result and deriving an answer or a command in response to the request; and
dialogue control means for accepting a request from the user during reproduction of the electronic book, causing the reproduction control unit to pause reproduction, causing the voice recognition means and the request response means to analyze the request from the user, and causing the reproduction control unit to execute processing according to the analysis result and then to resume reproduction.