JPH05333891A - Automatic reading device

Info

Publication number: JPH05333891A
Application number: JP4138525A
Authority: JP
Inventors: Hiroshi Mori (森 弘); Masaaki Horikawa (堀川 昌明)
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1992-05-29
Filing date: 1992-05-29
Publication date: 1993-12-17
Anticipated expiration: 2014-01-20
Also published as: JP2849504B2

Abstract

PURPOSE: To obtain an automatic reading device that, simultaneously with synthesized speech reading out the contents of a book, outputs a sound effect matching the scene described or displays an image showing that scene. CONSTITUTION: An optical character reading unit 13 optically reads the characters of a book 2 and segments the individual characters to generate character-recognized text data. While vocalizing the text data, a sounding device unit 14 automatically detects, by means of an image keyword detection unit 23, an image keyword indicating a scene in the text data, and a sound effect data reproduction unit 26 reproduces and outputs the corresponding sound effect data. Alternatively, a speech synthesis processing and display control unit automatically detects the image keyword indicating a scene in the text data by means of the unit 23 and draws a corresponding image as a picture. The reader is thereby given a sense of presence and a concrete image.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an automatic reading device that automatically recognizes the printed characters of a book, such as a novel, and outputs them as synthesized speech.

[0002]

[Prior Art] A conventional automatic reading device of this kind is the device for blind users whose appearance is shown in FIG. 12. On its exterior, this device has a drawer mechanism (1) for setting a book (2) in the device, an operation panel (3), a Braille instruction panel (4) that explains the operating procedure and the like to blind users, a tape loading section (5) for a recording cassette tape, and headphones (6).

[0003] The operation is described next with reference to FIG. 13, which shows the system configuration. The blind user sets the book (2) in the device body using the drawer mechanism (1) and the operation panel (3), puts on the headphones (6), and otherwise prepares for reading (A). In the optical character reading unit (7), a CCD sensor then optically reads the printed characters of the book (2) and inputs them as image data, and a character recognition unit segments this image data into individual characters and recognizes each character by, for example, matching against standard patterns. Next, the sentence analysis unit (8) looks up a dictionary held in VLSI memory and analyzes the sentence structure and the like to obtain the text data of the sentences. The speech synthesis unit (9) outputs this text data from the headphones (6) as synthesized speech with intonation and accent, and the speech is also recorded on a cassette tape (51).

[0004]

[Problems to Be Solved by the Invention] Because the automatic reading device described above allows reading by ear, it is extremely convenient for blind users: they can learn the contents of all kinds of ordinary, non-Braille books (2) without troubling anyone else. However, the device conveys the contents of the book (2) only through synthesized speech that amounts to a monotone recitation, so it lacks the persuasive, concrete imagery and the sense of presence that come when a person reads the book (2) aloud. Users therefore tire of it easily and often do not keep using it. And even if one tries to use it for purposes other than aiding the blind, for example for children's learning, it is of little use because, as noted above, it merely reads the text aloud in a dry, flat manner.

[0005] The technical object of the present invention is therefore to provide an automatic reading device that, together with the synthesized speech reading out the contents of a book, either simultaneously outputs sound effects matching the scenery or scene described, or displays an image of that scenery or scene as video.

[0006]

[Means for Solving the Problems] As technical means for achieving the above object, the automatic reading device of the present invention is configured as follows. It comprises an optical character reading unit, which segments image data obtained by optically reading the printed characters of a book into individual characters and recognizes them to create text data of the book's sentences, and a sounding device unit, which outputs the text data as synthesized speech. The sounding device unit comprises: speech synthesis means for converting the text data into synthesized speech; image keyword detection means for determining whether a preset image keyword indicating the scene of a sentence is present in the text data; sound effect data detection means for reading, from a sound effect data memory, the sound effect data corresponding to the image keyword detected by the image keyword detection means; sound effect data reproduction means for reproducing the sound effect data thus read; and audio output means for mixing the reproduced sound effect with the synthesized speech from the speech synthesis means and outputting the result.

[0007] Alternatively, a speech synthesis processing and display control unit is provided in place of the sounding device unit. This unit comprises: speech synthesis means for converting the text data into synthesized speech; image keyword detection means for determining whether a preset image keyword indicating the scene of a sentence is present in the text data; a video data search unit for searching for the video data corresponding to the image keyword detected by the image keyword detection means; and signal processing control means for reading the video data found by the video data search unit from a video data memory, causing it to be displayed on an image display unit, and causing the synthesized speech from the speech synthesis means to be output from sounding means.

[0008]

[Operation] In the former automatic reading device, the optical character reading unit captures the text on a page of the book as image data, for example by photoelectric conversion with an image sensor, segments individual characters from this image data, has the character recognition unit identify each segmented character, and stores the recognized characters as text data. The sounding device unit then reads in the text data one reading unit at a time, for example up to each full stop "。", and the image keyword detection means determines whether an image keyword indicating the scene of the sentence is present in the text data read in. Suppose an image keyword is present; for example, in the sentence "... the sound of the rain suddenly grew fierce.", "sound of the rain" is the image keyword. The sound effect data detection means searches for the sound effect data corresponding to the image keyword, and that data, stored in advance in the sound effect data memory, is read out. The sound effect data reproduction means reproduces it while the speech synthesis means converts the text data into synthesized speech, and the synthesized speech and the reproduced sound effect are mixed and output from audio output means such as a speaker. Because a sound effect matching the scene of the sentence being read is output at the same time, the device can convey a sense of presence and a concrete image to the reader.

[0009] In the latter automatic reading device, the optical character reading unit is the same as in the former. When the speech synthesis processing and display control unit reads in the text data one reading unit at a time, the image keyword detection unit determines whether an image keyword indicating the scene of the sentence is present in the text data read in. Suppose an image keyword is present; for example, in the sentence "It was the snow country.", "snow country" is the image keyword. The video data search means searches for the video data corresponding to the image keyword "snow country", the "snowy landscape" video data representing a snow-country scene is drawn on the image display unit, and the speech synthesis unit converts the text data into synthesized speech, which is output from the speaker. Because an image matching the scene of the sentence being read is displayed at the same time, the device can convey a sense of presence and a concrete image to the reader and is particularly well suited to uses such as children's learning.

[0010]

[Embodiments] Preferred embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 shows the appearance of one embodiment. A book insertion tray (10) for setting a book (2) in the device is provided at the lower center of the front face, speakers (11) are arranged on the left and right sides of the tray (10), and an operation unit (12) is provided on the top face. FIG. 2 is a functional block diagram of this embodiment, which broadly consists of an optical character reading unit (13), a sounding device unit (14), and a control circuit (15). These components are described below by following the operation shown in the flowcharts of FIGS. 4 and 5.

FIG. 4 shows the operation of the optical character reading unit (13). The text on a page of the book (2) is captured into the character reading control unit (17) as image data by photoelectric conversion in the image sensor of the image input unit (19) (step S1). The character reading control unit (17) segments individual characters from the input image data and, referring to the character recognition unit (18), identifies each segmented character (step S2), then temporarily stores the recognized characters as text data in the text data memory (16) (step S3). Until the reading operation of the optical character reading unit (13) is judged to have finished (step S4), each time all characters in the input image data have been recognized, the page turning mechanism (20) automatically turns the page of the book (2), and the process jumps back to step S1 and repeats the same routine.
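The loop of steps S1 to S4 in FIG. 4 can be sketched roughly as follows. This is only an illustration under stated assumptions: a page is modeled as a list of already-segmented glyph bitmaps, the table `STANDARD_PATTERNS` stands in for the character recognition unit (18), exact-match lookup is a placeholder for whatever recognition method that unit actually uses, and all function names are hypothetical.

```python
# Hypothetical sketch of the FIG. 4 loop (steps S1 to S4); names are illustrative.

# Stand-in for the character recognition unit (18): glyph bitmap -> character.
STANDARD_PATTERNS = {
    ("#.#", ".#.", "#.#"): "X",
    ("###", "#.#", "###"): "O",
}

def recognize(glyph):
    """Step S2: identify one segmented glyph; '?' marks an unknown glyph."""
    return STANDARD_PATTERNS.get(glyph, "?")

def read_book(pages):
    """Run S1 to S4 over every page and return the accumulated text data."""
    text_data_memory = []              # stand-in for the text data memory (16)
    for page_glyphs in pages:          # S1: capture one page as image data
        for glyph in page_glyphs:      # S2: recognize each segmented character
            text_data_memory.append(recognize(glyph))  # S3: store as text data
        # S4: reading not finished, so the page is turned and the loop repeats
    return "".join(text_data_memory)

X = ("#.#", ".#.", "#.#")
O = ("###", "#.#", "###")
print(read_book([[X, O], [O, X, X]]))
```

In this sketch the end of the `pages` list plays the role of the "reading finished" decision in step S4.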

[0011] FIG. 5 shows the operation of the sounding device unit (14). The sounding control unit (21) reads in the text data stored in the text data memory (16) one reading unit at a time, for example up to each full stop "。" (step S6). The image keyword detection unit (23) determines whether an image keyword indicating the scene of the sentence is present in the text data read into the sounding control unit (21) (step S7). The image keywords are set to correspond to the individual items of sound effect data stored in advance in the sound effect data memory (25). If no image keyword is present, the text data is converted into synthesized speech by the speech synthesis unit (22) and output from the speakers (11).
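Steps S6 and S7 might be sketched as below. The sketch assumes plain substring matching for keyword detection, which the description does not specify; the keyword list and the function names are hypothetical, and only "雨の音" ("sound of the rain") is taken from the text.

```python
# Hypothetical sketch of steps S6 and S7; the matching method is an assumption.

# "雨の音" is the example from the description; "風の音" is a made-up second entry.
IMAGE_KEYWORDS = ["雨の音", "風の音"]

def reading_units(text):
    """Step S6: yield the text one reading unit at a time, ending at each '。'."""
    unit = ""
    for ch in text:
        unit += ch
        if ch == "。":
            yield unit
            unit = ""
    if unit:                 # trailing text without a closing full stop
        yield unit

def detect_image_keyword(unit):
    """Step S7: return the first keyword (in list order) found in the unit, else None."""
    for keyword in IMAGE_KEYWORDS:
        if keyword in unit:
            return keyword
    return None

text = "夜になった。外の雨の音がにわかに激しくなった。"
for unit in reading_units(text):
    print(unit, "->", detect_image_keyword(unit))
```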

[0012] If, on the other hand, an image keyword is present in the text data, for example "sound of the rain" in the sentence "... the sound of the rain suddenly grew fierce.", the sound effect data detection unit (24) searches for the sound effect data corresponding to that image keyword (step S8). Specifically, using a data search table such as the one shown in FIG. 6(a), the sound effect data detection unit (24) finds the sound effect data number "1" corresponding to the image keyword "sound of the rain", and the "sound of falling rain" sound effect data stored in advance in the sound effect data memory (25) under sound effect data number "1", as shown in FIG. 6(b), is then read out.
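The two-stage lookup of FIG. 6 (keyword to data number, then data number to stored data) can be illustrated with two dictionaries. This is a minimal sketch, assuming a text label in place of real audio data; the names `SEARCH_TABLE`, `SOUND_EFFECT_MEMORY`, and `find_sound_effect` are hypothetical, and only entry "1" comes from the description.

```python
# Hypothetical sketch of the FIG. 6 lookup; only entry 1 comes from the text.

# FIG. 6(a): data search table, image keyword -> sound effect data number.
SEARCH_TABLE = {"雨の音": 1}

# FIG. 6(b): sound effect data memory (25), data number -> stored effect.
# A text label stands in for the actual audio data.
SOUND_EFFECT_MEMORY = {1: "sound of falling rain"}

def find_sound_effect(keyword):
    """Step S8: resolve keyword -> data number -> sound effect data, or None."""
    number = SEARCH_TABLE.get(keyword)
    if number is None:
        return None
    return SOUND_EFFECT_MEMORY.get(number)

print(find_sound_effect("雨の音"))
```

One plausible benefit of going through a data number is that several keywords could point at the same stored effect, although the description does not say so.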

[0013] Next, the "sound of falling rain" sound effect data that was read out is reproduced by the sound effect data reproduction unit (26) while the text data is converted into synthesized speech by the speech synthesis unit (22) (step S9), and the synthesized speech and the reproduced sound effect are mixed and output from the speakers (11) (step S10). Until it is judged that all the text data has been read from the text data memory (16) (step S11), the process jumps back to step S6 and repeats the same routine to read the remaining text data. Because a sound effect matching the scene of the sentence being read is output at the same time, the device can convey a sense of presence and a concrete image to the reader.
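The mixing of steps S9 and S10 can be pictured with a small digital stand-in. Note that this swaps in sample-by-sample summing of digital signals purely for illustration; the function `mix`, the `effect_gain` parameter, and the sample values are assumptions and do not describe the device's actual circuitry.

```python
# Hypothetical sketch of the mixing in steps S9 and S10; sample values are made up.

def mix(speech, effect, effect_gain=0.5):
    """Sum speech and attenuated effect sample by sample, clipping to [-1.0, 1.0].

    The shorter signal is treated as silence once it runs out.
    """
    length = max(len(speech), len(effect))
    mixed = []
    for i in range(length):
        s = speech[i] if i < len(speech) else 0.0
        e = effect[i] if i < len(effect) else 0.0
        mixed.append(max(-1.0, min(1.0, s + effect_gain * e)))
    return mixed

speech = [0.5, -0.25, 0.75]
effect = [0.5, 0.5, 0.5, 0.5]
print(mix(speech, effect))
```

Attenuating the effect relative to the speech is one way to keep the narration intelligible; the gain value here is arbitrary.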

[0014] The control circuit (15) controls the optical character reading unit (13) and the sounding device unit (14) individually and also controls the transfer of text data between them. The optical character reading unit (13) can therefore drive the page turning mechanism (20) and read the next page independently of the signal processing state of the sounding device unit (14), so the next page is ready to be read aloud in advance. This makes it possible to read aloud continuously across many pages.
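The decoupling described here resembles a producer and consumer sharing a buffer. The following sketch models it with two threads and a queue; this is an analogy chosen for illustration, not the device's actual control scheme, and the worker names and the `None` sentinel are assumptions.

```python
# Hypothetical sketch of the decoupling in [0014]; all names are illustrative.
import queue
import threading

def ocr_worker(pages, text_queue):
    """Stand-in for the optical character reading unit (13): one text per page."""
    for page_text in pages:
        text_queue.put(page_text)   # hand recognized text to the shared buffer
    text_queue.put(None)            # sentinel: no more pages

def speech_worker(text_queue, spoken):
    """Stand-in for the sounding device unit (14): consume pages in order."""
    while True:
        page_text = text_queue.get()
        if page_text is None:
            break
        spoken.append(page_text)    # a real device would synthesize speech here

def run(pages):
    text_queue = queue.Queue()      # plays the role of the text data hand-off
    spoken = []
    reader = threading.Thread(target=ocr_worker, args=(pages, text_queue))
    speaker = threading.Thread(target=speech_worker, args=(text_queue, spoken))
    reader.start()
    speaker.start()
    reader.join()
    speaker.join()
    return spoken

print(run(["page 1 text", "page 2 text", "page 3 text"]))
```

Because the queue preserves insertion order, pages are spoken in the order they were read even when the reader runs ahead.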

[0015] FIG. 3 is a block diagram of a concrete implementation of the embodiment. Parts identical or equivalent to those in FIG. 2 carry the same reference numerals, and only matters that supplement the description of FIG. 2 are explained. In the optical character reading unit (13), the printed characters of the book (2) are read by the image sensor (28) and converted into a digital signal by the A/D converter (27). Segmenting each character from the resulting image data and recognizing the segmented characters are carried out under the control of the character reading control CPU (17), to which a program ROM (29), a dictionary ROM (30), and a working RAM (31) are connected.

[0016] In the sounding device unit (14), an image keyword is detected from the character-recognized text data, the sound effect data corresponding to the detected image keyword is retrieved, and that data is converted into an analog signal by the D/A converter (40). At the same time, the digital signal processor (39) synthesizes speech from the text data, and the D/A converter (41) converts it into an analog signal. The two analog signals, one derived from the text data and one from the sound effect data, are mixed in the amplifier (42) and output through the speaker (43). These operations are controlled and processed by the sounding control CPU (21), to which a program ROM (35), a working RAM (36), a speech synthesis dictionary ROM, and a sound effect data ROM (38) are connected.

[0017] In the embodiment above, the page turning mechanism (20) turns the pages automatically so that the book (2) need not be taken out each time, but the pages of the book (2) may instead be turned by hand. The sound effect data may be either natural sounds or synthesized sounds. The block diagram of FIG. 3 illustrates the use of three CPUs (17), (21), and (32), but a single shared CPU can perform the control instead. The playback duration of a sound effect can also be made adjustable according to the content of the sentence being read.
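One conceivable way to adjust the playback duration of a sound effect is to loop or truncate it to a target length, for example the length of the spoken sentence. The description does not say how the duration is chosen, so the criterion, the function `fit_effect`, and the sample values below are all assumptions.

```python
# Hypothetical sketch of fitting an effect to a target length; not from the patent.

def fit_effect(effect, target_len):
    """Loop or truncate the effect samples so they span exactly target_len samples."""
    if target_len <= 0 or not effect:
        return []
    repeats = -(-target_len // len(effect))   # ceiling division
    return (effect * repeats)[:target_len]

rain = [1, 2, 3]
print(fit_effect(rain, 7))
print(fit_effect(rain, 2))
```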

[0018] FIG. 7 shows the appearance of another embodiment of the present invention, which uses images in place of the sound effects of the previous embodiment. Parts identical or equivalent to those in FIG. 1 carry the same reference numerals. The only external difference is that an image display unit (44) consisting of a liquid crystal display is provided above the book insertion tray (10). FIG. 8 is the functional block diagram, in which parts identical or equivalent to those in FIG. 2 carry the same reference numerals. The only difference is that a speech synthesis processing and display control unit (45) is provided in place of the sounding device unit (14). The speech synthesis processing and display control unit (45) consists of a speech synthesis processing and image display control unit (46) that controls the whole, the speech synthesis unit (22), the image keyword detection unit (23), a video data search unit (47) that looks up a video data number from the detected image keyword, a video data memory (48) in which predetermined video data is stored in advance, and the image display unit (44) that displays this video data.

[0019] The optical character reading unit (13) is the same as that of FIG. 2 and operates according to the flowchart of FIG. 4. The operation of the speech synthesis processing and display control unit (45) is described next with reference to the flowchart of FIG. 10. The speech synthesis processing and image display control unit (46) reads in the text data stored in the text data memory (16) one reading unit at a time, for example up to each full stop "。" (step S13). The image keyword detection unit (23) determines whether an image keyword indicating the scene of the sentence is present in the text data read into the control unit (46) (step S14). The image keywords are set to correspond to the individual items of video data stored in advance in the video data memory (48). If no image keyword is present in the text data, the text data is converted into synthesized speech by the speech synthesis unit (22) and output from the speakers (11) (step S17).

[0020] If, on the other hand, an image keyword is present in the text data, for example "snow country" in the sentence "It was the snow country.", the video data search unit (47) searches for the video data corresponding to that image keyword (step S15). Specifically, using a data search table such as the one shown in FIG. 11(a), the video data search unit (47) finds the video data number "1" corresponding to the image keyword "snow country", and the "snowy landscape" video data representing a snow-country scene, stored in advance in the video data memory (48) in association with video data number "1" as shown in FIG. 11(b), is then read out.

[0021] Next, the speech synthesis processing and image display control unit (46) causes the image display unit (44) to draw the "snowy landscape" video data that was read out (step S16), and the text data is converted into synthesized speech by the speech synthesis unit (22) and output from the speakers (11) (step S17). Until it is judged that all the text data has been read from the text data memory (16) (step S18), the process jumps back to step S13 and repeats the same routine to read the remaining text data. Because video matching the scene of the sentence being read is drawn at the same time, the device can convey a sense of presence and a concrete image to the reader and is extremely well suited to uses such as children's learning.
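The whole loop of FIG. 10 (steps S13 to S18) can be sketched as a function that records what would be drawn and spoken, in order. This is an illustration only: the event tuples, the substring matching, the regular expression used to split reading units, and the second sample sentence are assumptions, and text labels stand in for real video data.

```python
# Hypothetical sketch of the FIG. 10 loop (steps S13 to S18); names are illustrative.
import re

SEARCH_TABLE = {"雪国": 1}               # FIG. 11(a): keyword -> video data number
VIDEO_MEMORY = {1: "snowy landscape"}    # FIG. 11(b): data number -> video data label

def run_display_loop(text):
    """Return the ordered list of ('draw', image) and ('speak', unit) events."""
    events = []
    # S13: one reading unit per full stop; a trailing fragment is kept as is.
    units = re.findall(r"[^。]*。|[^。]+$", text)
    for unit in units:
        # S14: look for a preset image keyword in the unit.
        keyword = next((k for k in SEARCH_TABLE if k in unit), None)
        if keyword is not None:
            image = VIDEO_MEMORY[SEARCH_TABLE[keyword]]   # S15: look up video data
            events.append(("draw", image))                # S16: draw on display (44)
        events.append(("speak", unit))                    # S17: synthesized speech
    return events                                         # S18: all text consumed

for event in run_display_loop("雪国であった。夜になった。"):
    print(event)
```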

[0022] FIG. 9 is a block diagram of a concrete implementation of this embodiment. Parts identical or equivalent to those in FIGS. 3 and 8 carry the same reference numerals, and only matters that supplement the description of FIG. 8 are explained. The optical character reading unit (13) and the control circuit (15) are the same as in FIG. 3, so their description is omitted. In the speech synthesis processing and display control unit (45), an image keyword is detected from the character-recognized text data, the video data corresponding to the detected image keyword is retrieved, and the display driver (50) displays this video data on the image display unit (44). In addition, the digital signal processor (39) synthesizes speech from the text data, the D/A converter (41) converts it into an analog signal, and this analog signal is output from the speakers (11) via the amplifier (42). These operations are controlled and processed by the speech synthesis and display control CPU (43), to which a program ROM (35), a working RAM (36), a speech synthesis dictionary ROM, and a video data ROM (49) are connected.

[0023] As the video data, moving-image data other than landscapes may be used in addition to image data created from pictures or photographs.

[0024]

[Effects of the Invention] As described above, the automatic reading device of the present invention optically reads the characters of a book, segments them into individual characters, recognizes them, and outputs the resulting text data as synthesized speech, while also automatically recognizing descriptions of scenery or scenes in the text data and simultaneously outputting a sound effect that represents the scene. Because a sound effect matching the scene of the sentence being read is output at the same time, the device can convey a sense of presence and a concrete image to the reader.

[0025] Likewise, the device optically reads and recognizes the characters of a book, outputs the resulting text data as synthesized speech, and automatically recognizes descriptions of scenery or scenes in the text data to draw an image representing the scene as video. Because an image matching the scene of the sentence being read is displayed, the device can convey a sense of presence and a concrete image to the reader, and it has the particular advantage of being usable as a learning device for children.

[Brief description of drawings]

FIG. 1 is a perspective view showing the appearance of one embodiment of the present invention.

FIG. 2 is a functional block diagram of the same embodiment.

FIG. 3 is a block diagram of a concrete implementation of the same embodiment.

FIG. 4 is a flowchart showing the operation of the optical character reading unit of the same embodiment.

FIG. 5 is a flowchart showing the operation of the sounding device unit of the same embodiment.

FIG. 6 is an explanatory diagram of the same embodiment, in which (a) shows the data search table for image keyword lookup and (b) shows how the sound effect data is stored.

FIG. 7 is a perspective view showing the appearance of another embodiment of the present invention.

FIG. 8 is a functional block diagram of that embodiment.

FIG. 9 is a block diagram of a concrete implementation of that embodiment.

FIG. 10 is a flowchart showing the operation of the speech synthesis processing and display control unit of that embodiment.

FIG. 11 is an explanatory diagram of that embodiment, in which (a) shows the data search table for image keyword lookup and (b) shows how the video data is stored.

FIG. 12 is a perspective view showing the appearance of a conventional automatic reading device for blind users.

FIG. 13 is a system configuration diagram of that conventional device.

[Explanation of symbols]

2: Book
11: Speaker (audio output means)
13: Optical character reading unit
14: Sounding device unit
22: Speech synthesis unit (speech synthesis means)
23: Image keyword detection unit (image keyword detection means)
24: Sound effect data detection unit (sound effect data detection means)
25: Sound effect data memory
26: Sound effect data reproduction unit (sound effect data reproduction means)
44: Image display unit
45: Speech synthesis processing and display control unit
46: Speech synthesis processing and image display control unit (signal processing control means)
47: Video data search unit (video data search means)
48: Video data memory

Claims

[Claims]

1. An automatic reading device comprising: an optical character reading unit that segments image data obtained by optically reading the printed characters of a book into individual characters and recognizes them to create text data of the book's sentences; and a sounding device unit that outputs the text data as synthesized speech, wherein the sounding device unit comprises: speech synthesis means for converting the text data into synthesized speech; image keyword detection means for determining whether a preset image keyword indicating the scene of a sentence is present in the text data; sound effect data detection means for reading, from a sound effect data memory, sound effect data corresponding to the image keyword detected by the image keyword detection means; sound effect data reproduction means for reproducing the sound effect data thus read; and audio output means for mixing the reproduced sound effect with the synthesized speech from the speech synthesis means and outputting the result.

2. An automatic reading device in which a speech synthesis processing and display control unit is provided in place of the sounding device unit according to claim 1, the speech synthesis processing and display control unit comprising: speech synthesis means for converting the text data into synthesized speech; image keyword detection means for determining whether a preset image keyword indicating the scene of a sentence is present in the text data; a video data search unit for searching for video data corresponding to the image keyword detected by the image keyword detection means; and signal processing control means for reading the video data found by the video data search unit from a video data memory, causing it to be displayed on an image display unit, and causing the synthesized speech from the speech synthesis means to be output from sounding means.