JPH05108716A

JPH05108716A - Machine translation system

Info

Publication number: JPH05108716A
Application number: JP3272445A
Authority: JP
Inventors: Masaki Matsudaira; 正樹松平
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1991-10-21
Filing date: 1991-10-21
Publication date: 1993-04-30

Abstract

PURPOSE: To provide a machine translation system that can automatically restore the format of the input together with the converted character string. CONSTITUTION: An original image data input part 1 reads the original to be translated, converts it into image data, and supplies it to a frame dividing part 2. The frame dividing part 2 extracts the blank portions of the image from the image data, divides the image data at those blank portions to form frames, judges whether the content of each frame is a character string, a table, or a figure, and supplies the result to a translating part 3 as image information. The translating part 3 translates the character strings among the character strings, tables, and figures of this image information into the desired language and supplies the translated result to a format information changing part 4. The format information changing part 4 changes the frame size, the character spacing, and the character size so that the translated character strings, including those inside tables, fit within their frames, and supplies the changed result to a translated result output part 5 as the translated result.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a machine translation device, and more specifically to one suited to translating originals that contain, for example, text, figures, and tables.

[0002]

[Prior Art] Many machine translation devices have been commercialized and are gradually coming into wide use. Recently, because entering text by hand is burdensome, there has been demand for connecting them to OCR. Simply connecting an OCR device, however, requires the user to remove figures and tables and recombine them during post-editing. Japanese Patent Laid-Open No. 1-137369 therefore proposes a machine translation method that attaches format information to an input text consisting only of character strings, takes both in, translates based on the captured information, and outputs a cleanly formatted copy.

[0003]

[Problems to Be Solved by the Invention] However, the method described in the above document does not consider restoring image data such as figures and tables in the translation result when the original contains them. For example, it takes no account of text inside a table, or of adjusting the table size when the input sentence before translation and the output sentence after translation differ greatly in length. As a result, manual post-editing is needed to restore the format completely.

[0004] There is a demand for a device that lets even users unfamiliar with machine translation devices translate into a desired format through simple operations, while taking the input and output format information into account.

[0005] The present invention has been made in view of the above problems, and its object is to provide a machine translation device that can automatically restore the format of the input together with the converted character string.

[0006]

[Means for Solving the Problems] To achieve the above object, the machine translation device of the present invention is provided with the following characteristic means.

[0007] Specifically, the device comprises: image data capture means for capturing an original as image data; character recognition means for recognizing characters from the image data; classification means for classifying the image data into frames according to the type of image making up the image data; format information creation means for creating, for each classified frame, format information representing the layout structure on the original and the content structure of the image; translation means for converting the string of characters recognized by the character recognition means into a desired character string of a different kind; and format information changing means for using the converted character string and the format information to produce new format information suited to the converted character string. Image data of the desired character string is obtained based on the changed format information.

[0008] The image type may be any one or more of a character string (including symbol strings, mathematical formulas, and the like), a figure (including pictures and the like), and a table.

[0009] Furthermore, the format information may consist, for each image type, of the frame coordinates, the character spacing within the frame, the character size within the frame, the number of lines within the frame, the character position control information within the frame, and the character content.
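The format information listed above maps naturally onto a record type. The following is a minimal sketch in Python, not the data layout used in the patent: the class name `FrameFormat`, the field names, and the sample coordinates are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FrameFormat:
    """Format information for one frame (hypothetical field names)."""
    frame_id: str                        # e.g. "S1" (text), "F1" (figure), "L1" (table)
    kind: str                            # "text", "figure", or "table"
    start: Tuple[int, int]               # frame coordinates: top-left (x, y)
    end: Tuple[int, int]                 # frame coordinates: bottom-right (x, y)
    char_spacing: Optional[int] = None   # character spacing within the frame
    char_size: Optional[int] = None      # character size within the frame
    line_count: Optional[int] = None     # number of lines within the frame
    control: Optional[str] = None        # position control: "center", "right", "left", "standard"
    content: str = ""                    # recognized character content

    @property
    def width(self) -> int:
        return self.end[0] - self.start[0]

    @property
    def height(self) -> int:
        return self.end[1] - self.start[1]


# Illustrative values only; the patent gives no concrete coordinates here.
title = FrameFormat("S1", "text", (200, 40), (400, 70),
                    char_spacing=2, char_size=24, line_count=1,
                    control="center", content="Report")
print(title.width, title.height)
```

Figure and table frames would leave the text-specific fields unset, which is why those fields are optional in this sketch.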

[0010]

[Operation] According to the present invention, an original is captured as image data without regard to whether it contains text, figures, or tables, and is automatically classified into frames according to the type of each input image. For each classified image, format information representing its layout structure on the original and its content structure is created. The character recognition means recognizes the characters of a given language, and these are converted into a desired different character string. The sizes of tables, figures, and characters are then changed according to, for example, the length of the converted character string, so that the tables and figures of the input can be restored to suit the converted character string and output.

[0011] The translation means may perform translation between different languages, conversion of text written in Japanese hiragana and kanji into katakana or romaji text, conversion of a sentence in some language into numerical control information or command information, and so on.

[0012]

[Embodiment] A preferred embodiment of the machine translation device of the present invention will now be described with reference to the drawings.

[0013] The purpose of this embodiment is to restore the input format completely in every case, including when a table contains text and when the input and output character strings differ greatly in length.

[0014] FIG. 2 is a hardware block diagram of the machine translation device according to this embodiment when implemented as a computer system.

[0015] In FIG. 2, the computer system for the machine translation device comprises, for example, a CPU 101, an image scanner 102, a keyboard 103, a magnetic disk device 104, a display 105, a printer 106, and a main memory 107. The image scanner 102 may be an OCR device or the like that can read figures and tables.

[0016] The magnetic disk device 104 stores a user interface program, a translation program, a translation dictionary library, a format control program, a translation control program, and the like. These programs are loaded into the main memory 107 and executed by the CPU 101. The image scanner 102 reads an original, converts it into image data, and supplies it to the CPU 101. The keyboard 103 is used to enter data for controlling the magnetic disk device 104 and for controlling document reading by the image scanner 102 and the like, as well as various other data. The display 105 displays the input original image data, the translation result, and the like. The printer 106 prints out the translation result and the like.

[0017] FIG. 1 is a functional block diagram of the machine translation device according to this embodiment. In FIG. 1, the machine translation device comprises an original image data input unit 1, a frame division unit 2, a translation unit 3, a format information changing unit 4, and a translation result output unit 5.

[0018] The original image data input unit 1 reads the original to be translated using the image scanner 102 or the like, converts it into image data, and supplies the image data to the frame division unit 2. The frame division unit 2 extracts the blank portions of the image from the supplied image data, divides the image at those blank portions to form frames, determines whether the content of each frame is a character string (including symbol strings, mathematical formulas, and the like), a table, or a figure (for example, a drawing, photograph, or picture), and supplies the result to the translation unit 3 as image information. The translation unit 3 translates the character strings among the character strings, tables, and figures of this image information into the desired language (for example, English) and supplies the translation result to the format information changing unit 4.

[0019] For the character strings among the character strings, tables, and figures of the translation result, including the character strings inside tables, the format information changing unit 4 changes the frame size, character spacing, and character size so that each string fits within its frame, and supplies the result of these changes to the translation result output unit 5 as the translation result. The translation result output unit 5 outputs the translation result to the display 105 or the printer 106 for display or printing.

[0020] FIG. 3 shows a processing flowchart of the machine translation device of FIG. 1.

[0021] In FIG. 3, the original image data input unit 1 captures the image data of the original (examples are shown in FIGS. 4(a) and 4(b)) using the image scanner 102 or the like, converts it into a bit string, and supplies it to the frame division unit 2 (S10). Next, the frame division unit 2 extracts the blank portions of the image from the supplied image data and forms frames as shown in FIGS. 5(a) and 5(b), thereby performing frame division (S20). FIG. 5(a) corresponds to FIG. 4(a). For example, the "Report" (報告書) portion of FIG. 4(a) corresponds to frame S1 in FIG. 5(a). Frames S1 to S5, frame L1, and frame F1 are formed in this way. Likewise, FIG. 5(b) corresponds to FIG. 4(b). For example, the "Fig. 2" (第２図) portion of FIG. 4(b) corresponds to frame S8 in FIG. 5(b). Frames S6 to S9 and frame F2 are formed in this way.
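Step S20 splits the page at its blank portions. One common way to do this is a projection profile: scan the rows of a binarized image and cut wherever a row contains no ink. The sketch below shows only that horizontal pass, under the assumption that the page is a list of rows of 0/1 pixels. The function name `split_on_blank_rows` is hypothetical, and the patent does not specify the actual segmentation algorithm.

```python
from typing import List, Tuple


def split_on_blank_rows(image: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges of non-blank bands; bottom is exclusive."""
    bands: List[Tuple[int, int]] = []
    top = None  # row index where the current band started, if any
    for y, row in enumerate(image):
        has_ink = any(row)
        if has_ink and top is None:
            top = y                    # a new band begins
        elif not has_ink and top is not None:
            bands.append((top, y))     # a blank row closes the band
            top = None
    if top is not None:                # band running to the bottom edge
        bands.append((top, len(image)))
    return bands


page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
print(split_on_blank_rows(page))  # [(1, 2), (3, 5)]
```

Running the same pass over the columns of each band would give rectangular frames. A real page would also need a minimum gap width so that the spacing between lines of one paragraph does not split it.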

[0022] Next, the frame division unit 2 obtains the start point and end point coordinates of each frame and, from the characteristics of the frames, recognizes that S1 (1 line) through S9 (5 lines) are character string frames, F1 and F2 are figure frames, and L1 is a table frame (2 × 3). It further determines frame control information such as centering, right alignment, or standard from the start point and end point coordinates of the frames above and below, and writes all of this into the format information (FIG. 6 shows an example of this format information) (S21).
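The text says only that the control information is determined from the start and end coordinates of neighboring frames, without giving the rule. One plausible reading, sketched below, compares a frame's left and right edges with a reference extent taken from the frames above and below. The function `classify_alignment`, the tolerance `tol`, and the decision order are assumptions, not the patent's method.

```python
def classify_alignment(x0: int, x1: int, ref_x0: int, ref_x1: int, tol: int = 5) -> str:
    """Guess frame control information from horizontal edges.

    (x0, x1) is the frame's horizontal extent; (ref_x0, ref_x1) is the extent
    of the neighboring frames (assumed here to span the text column).
    """
    left_gap = x0 - ref_x0
    right_gap = ref_x1 - x1
    left_flush = abs(left_gap) <= tol
    right_flush = abs(right_gap) <= tol
    if left_flush and right_flush:
        return "standard"              # fills the whole column
    if left_flush:
        return "left"
    if right_flush:
        return "right"
    if abs(left_gap - right_gap) <= tol:
        return "center"                # equal margins on both sides
    return "standard"                  # fallback when no rule applies


print(classify_alignment(200, 400, 50, 550))  # center
print(classify_alignment(350, 550, 50, 550))  # right
```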

[0023] FIG. 6 is an example of the format information before frame merging. The entries with identifiers S1 to S5 ... are frames that contain characters (including symbols and the like); each records the start point coordinates, end point coordinates, character spacing, character size, number of lines, control information (centering, right alignment, left alignment, standard, etc.), content, and so on. Identifier F1 is a figure frame, for which the start point coordinates, end point coordinates, and so on are recorded. Identifier L1 is a table frame, for which the start point coordinates, end point coordinates, size, and so on are recorded.

[0024] Next, the frame division unit 2 recognizes the character strings and the text inside tables, converting them from image data into character codes (S22). It then examines whether the character string frames can be merged, and merges frames S2, S3, S5, S6, S7, and S9 (S23). In doing so, it inserts one blank line between frames S2 and S3 and between frames S6 and S7. FIG. 7 shows an example of the format information of the character string frames after merging. The format information obtained in this way is supplied to the translation unit 3.
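The merging step (S23) can be pictured as taking the bounding box of the merged frames and joining their contents with a blank line. The sketch below does exactly that. The dictionary keys, the function `merge_text_frames`, and the sample values are hypothetical, and the test that decides which frames may be merged is not described in the text, so it is left out.

```python
from typing import Dict, List


def merge_text_frames(frames: List[Dict], blank_lines: int = 1) -> Dict:
    """Merge vertically ordered text frames into one frame.

    The merged frame keeps the first frame's identifier and spans the
    bounding box of all inputs; `blank_lines` empty lines separate the parts.
    """
    separator = "\n" * (blank_lines + 1)
    gaps = len(frames) - 1
    return {
        "id": frames[0]["id"],
        "start": (min(f["start"][0] for f in frames),
                  min(f["start"][1] for f in frames)),
        "end": (max(f["end"][0] for f in frames),
                max(f["end"][1] for f in frames)),
        "lines": sum(f["lines"] for f in frames) + gaps * blank_lines,
        "content": separator.join(f["content"] for f in frames),
    }


s2 = {"id": "S2", "start": (50, 100), "end": (550, 160), "lines": 3, "content": "first"}
s3 = {"id": "S3", "start": (50, 180), "end": (550, 220), "lines": 2, "content": "second"}
merged = merge_text_frames([s2, s3])
print(merged["id"], merged["lines"])  # S2 6
```

Keeping the first identifier matches the post-merge listing in FIG. 7, where S1, S2, S4, ... remain.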

[0025] FIG. 7 is an example of the format information after the character string frames have been merged. It lists identifiers S1, S2, S4, ..., each with start point coordinates, end point coordinates, character spacing, character size, number of lines, control information, content, and so on.

[0026] The translation unit 3 translates the character string portions of the supplied format information (from Japanese into English) and supplies the result to the format information changing unit 4 (S30). Next, the format information changing unit 4 checks, for each character string portion of the supplied format information, whether the translated text fits within the frame at the original character spacing and character size (S40). If it fits, the frame is reduced as shown in FIG. 8: a frame whose frame control information is centering is reduced equally on the left and right (FIG. 8(a)); a right-aligned frame is reduced from the left to the smallest frame that holds the character string (FIG. 8(b)); a left-aligned frame is reduced from the right to the smallest frame that holds the character string (FIG. 8(c)); and a standard frame is reduced from the bottom to the smallest frame that holds the character string (FIG. 8(d)). The frame start and end coordinates in the format information are then changed (S40, S41).
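The four reduction cases of FIG. 8 differ only in which edge moves. The sketch below illustrates them, assuming the width and height of the translated text (`text_w`, `text_h`) have already been measured at the original character spacing and size. The function name and the control labels are illustrative, not taken from the patent.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def shrink_frame(box: Box, text_w: int, text_h: int, control: str) -> Box:
    """Shrink a frame that is larger than its translated text."""
    x0, y0, x1, y1 = box
    if control == "center":
        slack = (x1 - x0) - text_w
        left = slack // 2                      # split the slack evenly
        return (x0 + left, y0, x1 - (slack - left), y1)
    if control == "right":
        return (x1 - text_w, y0, x1, y1)       # right edge stays, left moves in
    if control == "left":
        return (x0, y0, x0 + text_w, y1)       # left edge stays, right moves in
    return (x0, y0, x1, y0 + text_h)           # standard: bottom edge moves up


frame = (100, 0, 300, 50)
print(shrink_frame(frame, 120, 20, "center"))    # (140, 0, 260, 50)
print(shrink_frame(frame, 120, 20, "standard"))  # (100, 0, 300, 20)
```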

[0027] If the character string does not fit within the frame, the frame is enlarged as shown in FIG. 9: a frame whose frame control information is centering is enlarged equally to the left and right without crossing the extension lines of the frames above and below (FIG. 9(a)); a right-aligned frame is likewise enlarged to the left without crossing the extension lines of the frames above and below (FIG. 9(b)); a left-aligned frame is likewise enlarged to the right without crossing the extension lines of the frames above and below (FIG. 9(c)); and a standard frame is enlarged downward to the smallest frame that holds the character string, without crossing the extension lines of the frames to the left and right and staying within the page (FIG. 9(d)). The frame start and end coordinates in the format information are then changed (S40, S42).
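Enlargement mirrors reduction, but each moving edge is clamped by a limit. In the sketch below the limits are passed in as plain numbers: `left_limit` and `right_limit` stand for the extension lines of the neighboring frames, and `bottom_limit` for the lowest position allowed on the page. These parameter names, and the assumption that the frame already lies inside its limits, are illustrative only.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def expand_frame(box: Box, text_w: int, text_h: int, control: str,
                 left_limit: int, right_limit: int, bottom_limit: int) -> Box:
    """Enlarge a frame toward the size of its translated text, within limits."""
    x0, y0, x1, y1 = box
    need = max(0, text_w - (x1 - x0))          # extra width required
    if control == "center":
        # Grow both sides by the same amount, stopping at the nearer limit.
        grow = min((need + 1) // 2, x0 - left_limit, right_limit - x1)
        return (x0 - grow, y0, x1 + grow, y1)
    if control == "right":
        return (x0 - min(need, x0 - left_limit), y0, x1, y1)
    if control == "left":
        return (x0, y0, x1 + min(need, right_limit - x1), y1)
    # standard: width is fixed, the bottom edge moves down
    return (x0, y0, x1, min(max(y1, y0 + text_h), bottom_limit))


print(expand_frame((140, 0, 260, 50), 200, 20, "center", 50, 550, 700))   # (100, 0, 300, 50)
print(expand_frame((140, 0, 260, 50), 400, 20, "center", 120, 550, 700))  # (120, 0, 280, 50)
```

In the second call the left limit stops the growth early, so the text still does not fit and the spacing and size reductions described next would take over.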

[0028] Next, if the character string still does not fit after the frame has been enlarged, the character spacing is reduced to the largest value at which the string fits within the frame, as shown in FIG. 10, and the character spacing in the format information is changed (S43, S44). If the characters still do not fit after the character spacing has been reduced, the character size is reduced to the largest value at which they fit within the frame, as shown in FIG. 11, and the character size in the format information is changed (S45, S46).
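The two fallbacks form a cascade: first shrink the spacing, and only if that fails shrink the characters. The sketch below uses a deliberately simple width model (a single line of fixed-width characters with integer pixel sizes) that is an assumption rather than something the patent states. The function names are hypothetical, and spacing is assumed to stay at zero once the size reduction begins.

```python
from typing import Tuple


def line_width(n_chars: int, char_size: int, spacing: int) -> int:
    """Width of one line under a fixed-width model (an assumed simplification)."""
    return n_chars * char_size + max(0, n_chars - 1) * spacing


def fit_text(n_chars: int, frame_w: int, char_size: int, spacing: int,
             min_size: int = 1) -> Tuple[int, int]:
    """Return (char_size, spacing) adjusted so the line fits in frame_w."""
    # First fallback (S43, S44): largest spacing, up to the original, that fits.
    for s in range(spacing, -1, -1):
        if line_width(n_chars, char_size, s) <= frame_w:
            return char_size, s
    # Second fallback (S45, S46): largest character size that fits at zero spacing.
    for size in range(char_size - 1, min_size - 1, -1):
        if line_width(n_chars, size, 0) <= frame_w:
            return size, 0
    return min_size, 0                 # nothing fits; return the smallest size


print(fit_text(10, 130, 10, 4))  # (10, 3)
print(fit_text(10, 90, 10, 4))   # (9, 0)
```

Searching downward from the original value returns the largest setting that fits, which matches the "largest value" wording in the text.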

[0029] The format information changing unit 4 supplies the format information changed in this way to the translation result output unit 5, which displays or prints it as shown in FIG. 12 (S46), and the translation work ends.

[0030] According to the above embodiment, whatever the format of the original, as long as it can be captured as image data, the device performs frame division, creation of format information, character recognition, conversion to character codes, translation of the character strings, and adjustment of the text frames, character spacing, and character size. It can thus translate the input text while automatically restoring the input format, without requiring the user to intervene at each step.

[0031] Therefore, whatever the format of the original being translated, the input format can be restored accurately without post-editing.

[0032] In the above embodiment the original image data was captured with an image scanner, but the invention is not limited to this. The image data may instead be supplied from any source capable of providing it, such as a video camera, an OCR device that can read figures and tables, a VTR, or an image storage device. Alternatively, the text and the non-text elements such as figures and tables may be generated separately with a word processor or the like and then translated as in the above embodiment, or the text and the non-text elements may be captured and processed separately.

[0033] The above embodiment was described using the example of translating the input text from Japanese into English, but the invention is not limited to this; it may translate from English into Japanese, and it can also be applied to translation between other languages. It can also be applied to a device that converts Japanese written in kanji and kana in the original image data into katakana or romaji text while restoring the input format. It can further be applied to generating a computer command language or numerical control information from a sentence in some language.

[0034] The format information changing unit 4 may also be configured so that the automatically changed format information can be modified further through data entered by the user.

[0035]

[Effects of the Invention] As described above, according to the present invention, an original is captured as image data without regard to whether it contains text, figures, or tables, and is automatically classified according to the type of each input image. Format information is created for each classified image, the character recognition means recognizes the characters, and these are converted into a desired different character string. The sizes of tables, figures, and characters are then changed according to, for example, the length of the converted character string, so that the tables and figures of the input can be restored to suit the converted character string and output.

[0036] Therefore, even an inexperienced user can perform machine translation efficiently.

[Brief description of drawings]

FIG. 1 is a functional block diagram of the machine translation device according to this embodiment.

FIG. 2 is a hardware configuration diagram of the machine translation device according to this embodiment when implemented as a computer system.

FIG. 3 is a processing flowchart of the machine translation device according to this embodiment.

FIG. 4 shows an example of an input image for the machine translation device according to this embodiment.

FIG. 5 shows an example of frame division in the machine translation device according to this embodiment.

FIG. 6 shows an example of format information before frame merging in the machine translation device according to this embodiment.

FIG. 7 shows an example of format information after frame merging in the machine translation device according to this embodiment.

FIG. 8 shows an example of frame reduction in the machine translation device according to this embodiment.

FIG. 9 shows an example of frame enlargement in the machine translation device according to this embodiment.

FIG. 10 shows an example of changing the character spacing in the machine translation device according to this embodiment.

FIG. 11 shows an example of changing the character size in the machine translation device according to this embodiment.

FIG. 12 shows an example of translation output from the machine translation device according to this embodiment.

[Explanation of symbols]

1 ... Original image data input unit, 2 ... Frame division unit, 3 ... Translation unit, 4 ... Format information changing unit, 5 ... Translation result output unit.

Claims

[Claims]

1. A machine translation device comprising: image data capture means for capturing an original as image data; character recognition means for recognizing characters from the image data; classification means for classifying the image data into frames according to the type of image making up the image data; format information creation means for creating, for each classified frame, format information representing the layout structure on the original and the content structure of the image; translation means for converting the string of characters recognized by the character recognition means into a desired character string of a different kind; and format information changing means for using the converted character string and the format information to produce new format information suited to the converted character string, wherein image data of the desired character string is obtained based on the changed format information.

2. The machine translation device according to claim 1, wherein the type of the image is one or more of a character string, a figure, and a table.

3. The machine translation device according to claim 1, wherein the format information consists, for each image type, of frame coordinates, character spacing within the frame, character size within the frame, the number of lines within the frame, character position control information within the frame, and character content.