JP2000099624A

JP2000099624A - Device for reading aloud text including image characters

Info

Publication number: JP2000099624A
Application number: JP10267589A
Authority: JP
Inventors: Tetsuya Sakayori; 哲也酒寄; Junichiro Fujimoto; 潤一郎藤本; Hiroo Kitagawa; 博雄北川; Takashi Ariyoshi; 敬有吉; Yuichi Kojima; 裕一小島; Yoshibumi Sakuramata; 義文櫻又; Junichi Takami; 淳一鷹見; Akira Ro; 彬呂
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1998-09-22
Filing date: 1998-09-22
Publication date: 2000-04-07

Abstract

PROBLEM TO BE SOLVED: To make characters expressed as images easier to understand by outputting them as synthesized speech. SOLUTION: A character code/image separating means 1 separates the character codes from the image characters in a text. A character recognizing and character encoding means 2 recognizes the separated image characters and replaces the recognized characters with character codes. A reading-order control part 3 then arranges the encoded image characters and the original character codes according to the character order of the text, and a voice synthesizing means 4 outputs the ordered character codes as speech. Thus, even when a text contains image characters, they can be converted into character codes and read aloud.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a device for reading aloud text that includes image characters, and more particularly to a device that can read aloud, by synthesized speech, text containing characters represented by images (image characters expressed as bitmap or vector data).

[0002]

2. Description of the Related Art

As communication through electronic documents such as e-mail and electronic bulletin boards has spread, the content of electronic documents is increasingly checked by synthesized speech, for example when the documents are accessed by telephone while away from the office or are used by visually impaired people. On home pages and similar documents, however, characters are often represented by images rather than by character codes to enhance expressiveness, and such characters cannot be converted into speech as they are.

[0003]

PROBLEMS TO BE SOLVED BY THE INVENTION

As described above, characters are represented by images on home pages and the like, but image characters cannot be converted into speech as they are.

[0004] The present invention has been made in view of the above circumstances. Its object is to make it possible to output characters represented by images as synthesized speech, thereby making character expressions rendered as images easier to understand.

[0005]

MEANS FOR SOLVING THE PROBLEMS

The invention of claim 1 comprises: a character code/image separating means that separates the character codes and the image characters in a text; a character recognition/character coding means that recognizes the separated image characters and replaces the recognized characters with character codes; a reading order control means that arranges the character-coded image characters and the original character codes according to the character order of the text; and a speech synthesizing means that synthesizes speech from the order-controlled character codes. Image characters are thereby converted into character codes and read aloud.

[0006] The invention of claim 2 is the invention of claim 1, further comprising a visual importance determining means that determines the degree of visual attention of the separated image characters, wherein the reading order is controlled according to the determination result of the visual importance determining means.

[0007] The invention of claim 3 is the invention of claim 1, further comprising a visual importance determining means that determines the degree of visual attention of the separated image characters, wherein the speech synthesis unit is controlled according to the determination result of the visual importance determining means so as to change the auditory attributes of the separated image character portion.

[0008] The invention of claim 4 is the invention of claim 2 or 3, wherein the size of the image characters is measured as the degree of visual attention.

[0009] The invention of claim 5 is the invention of claim 2 or 3, wherein the contrast between the color of the image characters and the background color is measured as the degree of visual attention.
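The patent does not say how the contrast is computed. As one possibility, the sketch below swaps in the WCAG 2.x contrast ratio, which compares the relative luminance of two sRGB colors and ranges from 1:1 (identical colors) to 21:1 (black on white). It assumes the character color and the background color have already been extracted from the image as 8-bit RGB tuples; the function names are illustrative only.

```python
def _linearize(channel: int) -> float:
    """8-bit sRGB channel -> linear-light value, per the WCAG 2.x definition."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb) -> float:
    """Weighted sum of the linearized R, G, B channels (weights sum to 1)."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(text_rgb, background_rgb) -> float:
    """Contrast ratio between 1.0 and 21.0; the argument order does not matter."""
    lighter, darker = sorted(
        (relative_luminance(text_rgb), relative_luminance(background_rgb)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 2))        # 21.0
print(round(contrast_ratio((119, 119, 119), (255, 255, 255)), 2))  # 4.48
```

A ratio like this only measures legibility; turning it into a degree of attention (for example, relative to the contrast of the surrounding body text) is a separate design decision.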

[0010] The invention of claim 6 is the invention of claim 2 or 3, wherein the shape characteristics of the characters are measured as the degree of visual attention.
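One shape characteristic named later in the description is line thickness. A crude, hypothetical way to estimate it from a binarized image character is to collect the lengths of all horizontal and vertical runs of ink pixels and take the median: runs across a stroke are short, runs along it are long, and for stroke-like shapes (longer than they are wide) the short runs outnumber the long ones. This is an illustrative assumption, not the patent's method, and it does not cover the "dissimilarity with a standard font" measure.

```python
from statistics import median
from typing import Iterable, List, Sequence


def _ink_runs(lines: Iterable[Sequence[int]]) -> List[int]:
    """Lengths of consecutive ink (non-zero) pixels along each line."""
    runs: List[int] = []
    for line in lines:
        run = 0
        for px in line:
            if px:
                run += 1
            elif run:
                runs.append(run)
                run = 0
        if run:  # a run that reaches the end of the line
            runs.append(run)
    return runs


def estimate_stroke_width(bitmap: List[List[int]]) -> float:
    """Median of horizontal and vertical ink-run lengths; 0.0 for a blank image.

    Assumes a rectangular, binarized bitmap (1 = ink, 0 = background).
    """
    columns = zip(*bitmap)  # transpose rows into columns
    runs = _ink_runs(bitmap) + _ink_runs(columns)
    return float(median(runs)) if runs else 0.0


bar = [[0, 1, 1, 1, 0] for _ in range(10)]  # vertical stroke, 3 px wide
print(estimate_stroke_width(bar))  # 3.0
```

For the 3-pixel-wide, 10-pixel-tall bar there are ten horizontal runs of length 3 and three vertical runs of length 10, so the median is 3. A bold rendering of the same glyph would yield a larger value.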

[0011] The invention of claim 7 is the invention of any one of claims 1 to 6, wherein, when coordinate information dividing a single image is set, the image is handled as independent character information for each of the corresponding plurality of areas.

[0012]

BEST MODE FOR CARRYING OUT THE INVENTION

The present invention outputs characters represented by images on home pages and the like as synthesized speech. Specifically, it is implemented in the following forms.
1) A text-to-speech conversion system that recognizes the characters in an image, replaces them with character codes, and reads them aloud.
1.1) The degree of visual attention of the characters is also determined and reflected in the reading order and the auditory attributes.
1.1.1) The character size is used as the degree of attention.
1.1.2) The relationship between the character color and the background color is used as the degree of attention.
1.1.3) The shape characteristics of the characters are used as the degree of attention.
1.2) Speech is output with auditory attributes different from those of the text portion given by the original character codes.
1.3) When coordinate information dividing a single image is set, the image is handled as independent character information for each of the corresponding plurality of areas.

[0013] FIG. 1 is a block diagram of the main part of an example of a text-to-speech device according to the present invention. In the figure, 1 is a code/image separation unit that separates the character codes (portions in which characters are represented by codes) from the images (portions in which characters are represented by images) in a text; 2 is an image character recognition/character coding unit that recognizes the characters in the separated images and encodes the recognized characters; 3 is a reading order control unit that arranges, according to the text, the character codes separated by the code/image separation unit 1 and the character codes produced by the image character recognition/character coding unit 2; and 4 is a speech synthesis unit that synthesizes speech from the character codes supplied by the reading order control unit 3 and outputs it.
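The flow through units 1 to 4 can be sketched as follows. This is a minimal sketch under stated assumptions: the text is modeled as a list of segments in display order, each either a string of character codes or a hypothetical ImageChar object, and the OCR engine and speech synthesizer are passed in as plain functions because the patent does not name either. Recording each segment's position during separation is what lets unit 3 restore the original order.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


@dataclass
class ImageChar:
    """Hypothetical stand-in for characters drawn as bitmap/vector data."""
    data: bytes
    alt: str = ""  # only used by the stub recognizer in the usage example


Segment = Union[str, ImageChar]


def separate(segments: List[Segment]):
    """Unit 1: split into character codes and image characters, keeping positions."""
    codes: List[Tuple[int, str]] = []
    images: List[Tuple[int, ImageChar]] = []
    for pos, seg in enumerate(segments):
        if isinstance(seg, ImageChar):
            images.append((pos, seg))
        else:
            codes.append((pos, seg))
    return codes, images


def recognize(images, ocr: Callable[[ImageChar], str]):
    """Unit 2: replace each image character with recognized character codes."""
    return [(pos, ocr(img)) for pos, img in images]


def order_for_reading(codes, recognized) -> List[str]:
    """Unit 3: merge both streams back into the order they had in the text."""
    merged = sorted(codes + recognized, key=lambda item: item[0])
    return [text for _, text in merged]


def read_aloud(segments, ocr, synthesize: Callable[[str], object]):
    """Units 1 to 4 chained; `synthesize` stands in for a TTS engine (unit 4)."""
    codes, images = separate(segments)
    ordered = order_for_reading(codes, recognize(images, ocr))
    return synthesize("".join(ordered))


doc = ["Welcome to the ", ImageChar(b"<bitmap>", alt="Spring Sale"), " page."]
spoken = read_aloud(doc, ocr=lambda img: img.alt, synthesize=lambda s: s)
print(spoken)  # Welcome to the Spring Sale page.
```

Here the stub recognizer simply returns the image's alt text and the stub synthesizer returns the string unchanged; in a real system these would be a character recognizer operating on the image data and a speech engine.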

[0014] As described above, the present invention separates the character codes and the image characters in a text in which both are mixed, recognizes the image characters and converts them into character codes, and outputs them as synthesized speech together with the character codes originally present in the text. In addition, the degree of visual attention of the separated image characters is determined and reflected in the reading order and the auditory attributes. For example, a character size measuring unit 5 measures the size of the image characters; a character color contrast measuring unit 6 measures the contrast between the character color and the background color; and a character shape similarity measuring unit 7 measures shape characteristics of the characters, such as line thickness and dissimilarity from a standard font. An importance determining unit 8 determines the most significant of these degrees of visual attention and controls the reading order control unit 3 accordingly; alternatively, an auditory attribute setting unit 9 changes the speech output by the speech synthesis unit 4 so that the image character portion is output with auditory attributes different from those of the text portion given by the original character codes.
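Units 5 to 9 can be sketched as follows, purely as an illustration. The scoring rule is an assumption: each cue is divided by the corresponding value measured on the page's ordinary body text (the hypothetical Baseline), and the largest ratio is taken as the degree of visual attention, echoing the description's "most significant" cue. reading_order shows the claim 2 variant (salient image characters are read first), and to_ssml shows the claim 3 variant, expressing different auditory attributes with the SSML prosody element. The threshold, the baseline values, and the choice of volume and rate are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional
from xml.sax.saxutils import escape


@dataclass
class Segment:
    text: str               # character codes (original or produced by recognition)
    position: int           # index in the original text order
    is_image: bool = False  # True if the text came from an image character
    height_px: float = 0.0  # unit 5: character size
    contrast: float = 1.0   # unit 6: character/background contrast ratio
    stroke_px: float = 0.0  # unit 7: a shape cue such as stroke width


@dataclass
class Baseline:
    """Hypothetical measurements of the page's ordinary body text."""
    height_px: float = 16.0
    contrast: float = 7.0
    stroke_px: float = 2.0


def attention(seg: Segment, base: Optional[Baseline] = None) -> float:
    """Unit 8 (sketch): the strongest cue, relative to body text, wins."""
    if not seg.is_image:
        return 0.0
    base = base or Baseline()
    return max(seg.height_px / base.height_px,
               seg.contrast / base.contrast,
               seg.stroke_px / base.stroke_px)


def reading_order(segs: List[Segment], threshold: float = 1.5,
                  base: Optional[Baseline] = None) -> List[Segment]:
    """Claim 2 variant: salient image characters first, each group in text order."""
    return sorted(segs, key=lambda s: (attention(s, base) < threshold, s.position))


def to_ssml(segs: List[Segment], threshold: float = 1.5,
            base: Optional[Baseline] = None) -> str:
    """Claim 3 variant (unit 9): image-derived text is spoken with different prosody."""
    parts = []
    for s in segs:
        text = escape(s.text)
        if s.is_image:
            volume = "x-loud" if attention(s, base) >= threshold else "loud"
            text = f'<prosody volume="{volume}" rate="slow">{text}</prosody>'
        parts.append(text)
    return ('<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
            'xml:lang="en-US">' + " ".join(parts) + "</speak>")


segs = [
    Segment("Welcome to our page.", 0),
    Segment("SALE", 1, is_image=True, height_px=48, contrast=7.0, stroke_px=4),
    Segment("Menu", 2, is_image=True, height_px=16, contrast=7.0, stroke_px=2),
]
print([s.text for s in reading_order(segs)])  # ['SALE', 'Welcome to our page.', 'Menu']
print(to_ssml(segs))
```

In this example "SALE" is three times the body-text height, so its attention score (3.0) exceeds the threshold and it is moved to the front and spoken louder; "Menu" matches the body text on every cue and keeps its original position.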

[0015] FIG. 2 shows an example in which, when coordinate information dividing a single image is set, the image is handled as independent character information for each of the corresponding plurality of areas. In other words, in image map processing the image is divided according to the accompanying coordinate information, so that each area can be read aloud as an independent menu item and given its own link destination.

[0016] For example, when reading aloud an image map consisting of one image file and several lines of script, as shown in Table 1, the three areas specified by the three <AREA…> lines of the script (shown by dotted lines in FIG. 2) are each processed as independent images, and the separate character strings "previous page", "next page", and "return to table of contents" are extracted.
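Table 1 is not reproduced here, so the markup below is hypothetical. The sketch uses Python's html.parser to collect rectangular area regions (shape "rect", coords "left,top,right,bottom"), crops each region out of the image, and passes each crop to a recognizer separately, so that each area yields its own string and keeps its own link destination. The image is modeled as a list of pixel rows, crops use half-open slices as a simplification, circle and polygon areas are ignored, and the recognizer is a stub standing in for real character recognition.

```python
from html.parser import HTMLParser
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # left, top, right, bottom


class AreaCollector(HTMLParser):
    """Collects rectangular <area> regions and their link targets from an image map."""

    def __init__(self) -> None:
        super().__init__()
        self.areas: List[Tuple[Box, Optional[str]]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "area":  # HTMLParser reports tag names in lower case
            return
        a = dict(attrs)
        shape = (a.get("shape") or "rect").lower()
        coords = a.get("coords")
        if shape not in ("rect", "rectangle") or not coords:
            return  # circle and poly areas are left out of this sketch
        left, top, right, bottom = (int(v) for v in coords.split(","))
        self.areas.append(((left, top, right, bottom), a.get("href")))


def crop(bitmap: List[list], box: Box) -> List[list]:
    left, top, right, bottom = box
    return [row[left:right] for row in bitmap[top:bottom]]


def read_image_map(html: str, bitmap: List[list], ocr: Callable[[List[list]], str]):
    """Treat each <area> as an independent image; return (text, href) pairs."""
    parser = AreaCollector()
    parser.feed(html)
    parser.close()
    return [(ocr(crop(bitmap, box)), href) for box, href in parser.areas]


# Hypothetical markup and "image": each pixel is a letter naming its button.
html = """<map name="nav">
<area shape="rect" coords="0,0,3,2" href="prev.html">
<area shape="rect" coords="3,0,6,2" href="next.html">
<area shape="rect" coords="6,0,9,2" href="toc.html">
</map>"""
image = [list("AAABBBCCC"), list("AAABBBCCC")]
labels = {"A": "previous page", "B": "next page", "C": "return to table of contents"}


def stub_ocr(region: List[list]) -> str:
    return labels[region[0][0]]


for text, href in read_image_map(html, image, stub_ocr):
    print(text, "->", href)
```

Because each area is recognized on its own crop, the three buttons become three separate menu items rather than one run-together string read from the whole image.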

[0017]

[Table 1]

[0018]

EFFECTS OF THE INVENTION

As described above, characters are represented by images on home pages and the like, but image characters cannot be converted into speech as they are. The present invention makes it possible to output such characters represented by images as synthesized speech, thereby making character expressions rendered as images easier to understand.

[Brief description of the drawings]

FIG. 1 is a block diagram of the main part of an embodiment of a text-to-speech device for text including image characters according to the present invention.

FIG. 2 illustrates an example in which a single image is divided and the corresponding plurality of areas are treated as independent character information.

[Explanation of symbols]

1: code/image separation unit; 2: image character recognition/character coding unit; 3: reading order control unit; 4: speech synthesis unit; 5: character size measuring unit; 6: character color contrast measuring unit; 7: character shape similarity measuring unit; 8: importance determining unit; 9: auditory attribute setting unit.

Continued from front page
(72) Inventor: Hiroo Kitagawa, c/o Ricoh Co., Ltd., 1-3-6 Nakamagome, Ota-ku, Tokyo
(72) Inventor: Takashi Ariyoshi, c/o Ricoh Co., Ltd., 1-3-6 Nakamagome, Ota-ku, Tokyo
(72) Inventor: Yuichi Kojima, c/o Ricoh Co., Ltd., 1-3-6 Nakamagome, Ota-ku, Tokyo
(72) Inventor: Yoshibumi Sakuramata, c/o Ricoh Co., Ltd., 1-3-6 Nakamagome, Ota-ku, Tokyo
(72) Inventor: Junichi Takami, c/o Ricoh Co., Ltd., 1-3-6 Nakamagome, Ota-ku, Tokyo
(72) Inventor: Akira Ro, c/o Ricoh Co., Ltd., 1-3-6 Nakamagome, Ota-ku, Tokyo
F-terms (reference): 5B064 AA10 FA16

Claims


1. A device for reading aloud text including image characters, comprising: a character code/image separating means for separating the character codes and the image characters in a text; a character recognition/character coding means for recognizing the separated image characters and replacing the recognized characters with character codes; a reading order control means for arranging the character-coded image characters and the original character codes according to the character order of the text; and a speech synthesizing means for synthesizing speech from the order-controlled character codes, wherein the image characters are converted into character codes and read aloud.

2. The device for reading aloud text including image characters according to claim 1, further comprising a visual importance determining means for determining the degree of visual attention of the separated image characters, wherein the reading order is controlled according to the determination result of the visual importance determining means.

3. The device for reading aloud text including image characters according to claim 1, further comprising a visual importance determining means for determining the degree of visual attention of the separated image characters, wherein the speech synthesis unit is controlled according to the determination result of the visual importance determining means so as to change the auditory attributes of the separated image character portion.

4. The device for reading aloud text including image characters according to claim 2 or 3, wherein the size of the image characters is measured as the degree of visual attention.

5. The device for reading aloud text including image characters according to claim 2 or 3, wherein the contrast between the color of the image characters and the background color is measured as the degree of visual attention.

6. The device for reading aloud text including image characters according to claim 2 or 3, wherein the shape characteristics of the characters are measured as the degree of visual attention.

7. The device for reading aloud text including image characters according to any one of claims 1 to 6, wherein, when coordinate information dividing a single image is set, the image is handled as independent character information for each of the corresponding plurality of areas.