JPH0756924A

JPH0756924A - Bilingual device

Info

Publication number: JPH0756924A
Application number: JP5162193A
Authority: JP
Inventors: Takako Satou; 佐藤多加子
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1993-06-30
Filing date: 1993-06-30
Publication date: 1995-03-03

Abstract

PURPOSE: To reduce the amount of memory and work by storing joined words and idioms in the same structure as a word when searching a dictionary to translate optically recognized text, and to output the retrieved translations in an arbitrary format. CONSTITUTION: When the bilingual mode is selected on an operation unit 5, an original image read by a scanner 2 is binarized in an image processing unit 3. An OCR/spell-check unit 8 performs character recognition on each area that an area identification unit 7 has recognized as a character area. When the original is in English, the recognized English words are spell-checked. A dictionary search unit 9 searches a dictionary 11 for the recognized words and idioms in order to translate the English text of the original into Japanese. During the dictionary search, a bilingual processing CPU 6 searches for idioms and for words joined by a symbol, and the joined words and idioms are stored in a memory 13 in the same structure as a word. Output is then produced in an arbitrary format using the retrieved translations.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a bilingual apparatus that reads English text on an original document, translates it into Japanese, and outputs the result.

[0002]

[Prior Art] Japanese Patent Laid-Open No. 4-77965 describes a digital copying machine that translates from an arbitrary language into a different arbitrary language. However, this digital copying machine could not output an idiom made up of several words side by side with its translation (hereinafter, "bilingual format output"). In addition, this digital copying machine stored divided English words joined by a symbol such as a hyphen separately from ordinary English words, which increased both the amount of memory and the amount of work.

[0003] Furthermore, when bilingual format output was produced for an original with many characters per page, the number of translated characters also grew, so the layout was changed and the output was spread over two or more output sheets. Specifically, the original shown in FIG. 10(1) is first divided into areas by area recognition, and the areas are translated in order from area A (101) to area E (105). The translated text, together with the text of the original, forms area A' (106), area B' (107), area C' (108), area D' (109), and area E' (110). Areas A' (106) to E' (110) are output one after another into the free space on the output sheets, giving the bilingual format output shown in FIGS. 10(2) and 10(3).

[0004] However, the bilingual format output shown in FIGS. 10(2) and 10(3) has a layout completely different from that of the original shown in FIG. 10(1), so it is hard to tell which part of the original corresponds to which part of the output.

[0005] Recent OCR based on optical character recognition technology is close to 100% accurate, but misrecognition still inevitably occurs depending on the condition of the scanned original and the form of the character font. If a misrecognized word happens to exist in the dictionary, a translation that has nothing to do with the original text may be output, and an ordinary user cannot tell where the mistake occurred.

[0006] In addition, an apparatus equipped only with a widely used general dictionary cannot provide satisfactory translations for specialized documents.

[0007] In addition, bilingual processing could not be applied to reversed (white-on-dark) characters or outline characters.

[0008]

[Problems to Be Solved by the Invention] The present invention has been made to solve the above problems. Its first object is to reduce the amount of memory and the amount of work.

[0009] A second object is to make it possible to output translations of both an idiom and the words that make it up.

[0010] A third object is to make it easy to cross-reference the original and the bilingual output.

[0011] A fourth object is to make it easy to see where in the process of producing the translation a mistake occurred.

[0012] A fifth object is to prevent mistranslations and untranslatable words.

[0013] A sixth object is to make it possible to translate reversed characters and outline fonts.

[0014]

[Means for Solving the Problems] To achieve the first object, the invention of claim 1 is a digital image copying machine having image reading means for reading an image of an original, image storage means for storing the read image information, image processing means for applying image processing to the read image information, and image recording means for recording the image signal output by the image processing means on a recording medium, and comprises: identifying means for identifying a character area in the image read by the reading means; recognizing means for performing optical character recognition on the identified character area; dictionary search means for searching a dictionary in order to translate the language made up of the recognized characters into a different language; means for searching, in the dictionary search, for words joined by a symbol and for idioms; means for storing the joined words and idioms in the same structure as a word when the retrieved translations are stored; and means for producing output in an arbitrary format using the retrieved translations.

[0015] The invention of claim 2 is the first invention further comprising means for marking the retrieved idioms in the output.

[0016] The invention of claim 3 is the first invention further comprising means for outputting the original text, the translations, and the optically recognized characters on the same recording sheet in a predetermined format.

[0017] The invention of claim 4 is the first invention further comprising means for distinguishing idioms from words by line drawings, color, or the like, and means for outputting an idiom list.

[0018] The invention of claim 5 is the first invention further comprising dictionary selection means and means for simultaneously outputting multiple translations of one word from the selected dictionaries.

[0019] The invention of claim 6 is the first invention further comprising means for designating an image processing range and image manipulation processing means.

[0020]

[Operation] According to the invention of claim 1, a digital image copying machine having image reading means for reading an image of an original, image storage means for storing the read image information, image processing means for applying image processing to the read image information, and image recording means for recording the image signal output by the image processing means on a recording medium comprises: identifying means for identifying a character area in the image read by the reading means; recognizing means for performing optical character recognition on the identified character area; dictionary search means for searching a dictionary in order to translate the language made up of the recognized characters into a different language; means for searching, in the dictionary search, for words joined by a symbol and for idioms; means for storing the joined words and idioms in the same structure as a word when the retrieved translations are stored; and means for producing output in an arbitrary format using the retrieved translations. As a result, the amount of memory and the amount of work are reduced.

[0021] According to the invention of claim 2, the bilingual apparatus of claim 1 comprises means for marking the retrieved idioms in the output, so translations of both an idiom and the words that make it up are output.

[0022] According to the invention of claim 3, the bilingual apparatus of claim 1 comprises means for outputting, as one of the output formats, the corresponding translation below the original text in the same layout as the original, so the original and the bilingual output are easy to cross-reference.

[0023] According to the invention of claim 4, the bilingual apparatus of claim 1 comprises means for outputting, as one of the output formats, not only the corresponding translation below the original text but also the recognized characters, so it is easy to see where in the process of producing the translation a mistake occurred.

[0024] According to the invention of claim 5, the bilingual apparatus of claim 1 comprises means for using both a widely used general dictionary and a dictionary of technical terms, and means for selecting how the dictionaries are used, so mistranslations and untranslatable words are avoided.

[0025] According to the invention of claim 6, the bilingual apparatus of claim 1 comprises input range designating means, so reversed characters and outline fonts can be translated.

[0026]

[Embodiments] FIGS. 1 and 2 show the configuration of the digital copying machine and the bilingual apparatus. A cable 400 carries commands and image input/output between the digital copying machine 100 and the bilingual apparatus 200. When the bilingual processing mode is selected on the LCD 300, which serves as the operation unit, the multi-level image read by the scanner of the digital copying machine 100 is converted into binarized data and then stored, via an AI base board 201, in an IC card 202 that serves as the memory of the bilingual apparatus 200. The binarized data then undergoes bilingual processing on the AI base board 201 and is output to the digital copying machine 100. A control unit 203 controls the bilingual processing and the timing with the digital copying machine.

[0027] The configuration of the digital copying machine and the bilingual apparatus is described in more detail with reference to FIG. 3.

[0028] First, the digital copying machine has a CPU 1 that controls the digital copying machine, a scanner 2 that reads the original image, an image processing unit 3 that applies image processing to the read original image, a printer 4 that records the input image on transfer paper, and an operation unit 5, such as an LCD, for selecting modes such as the bilingual mode.

[0029] The bilingual apparatus has: a bilingual processing CPU 6, which controls timing and the like; an area identification unit 7, which recognizes whether an input image area is a character area or a picture area; an OCR unit, which performs character recognition on the areas that the area identification unit 7 recognized as character areas, and a spell-check unit 8, which spell-checks the recognized English words when the original image is in English; a dictionary search unit 9, which searches the dictionary for the recognized words and idioms; a drawing unit 10, which combines the original image, the translated text, and the recognized characters; a dictionary 11, in which at least two languages are associated with each other; a font 12, which stores the fonts used when the drawing unit 10 generates characters; and a memory (RAM) 13, which stores the original image, the character recognition results, and the composited image, and also serves as working memory during bilingual processing.

[0030] FIG. 4 outlines the control operation of the CPUs of the digital copying machine and the bilingual apparatus of the present invention.

[0031] When the bilingual mode is selected on the operation unit 5, the scanner 2 reads the original (step 5). The read original image is binarized in the image processing unit 3 and then input to the bilingual apparatus (step 7). Each area of the input original image is recognized as either a character area or a picture area (step 10). For each character area, the number of text lines, the position of each line, the number of characters in each line, and the position of each character are recognized, and the text is cut into words (step 20). The characters in the character area are recognized by OCR (step 30), and the results of steps 10, 20, and 30 are stored in the memory 13 in a predetermined format (step 35). Using the results of steps 10, 20, and 30, the dictionary is searched for the words made up of the recognized characters (step 40). Drawing data is created in a predetermined format from the results stored in step 35 and the dictionary search results, and is stored in the memory 13 (step 50). An output image is created from the drawing data of step 50 and stored in the memory 13, yielding output images such as the bilingual format, a word list, or an idiom list (140). In step 1000, the data created in step 50 is used to calculate how many sheets would be needed with the currently selected transfer paper. The calculated number of pages is displayed on the operation unit 5 of the digital copying machine.
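The data flow of steps 10 to 50 can be pictured with a small Python sketch. Everything in it is an illustrative assumption rather than the apparatus's actual implementation: the scanned page is replaced by a list of `(kind, text)` pairs, OCR is replaced by plain string handling, and the function name `run_bilingual_mode` and the record layout are hypothetical.

```python
def run_bilingual_mode(page_areas, dictionary):
    """Toy walk-through of steps 10 to 50.

    page_areas: list of (kind, text) pairs standing in for the binarized scan,
                where kind is "text" or "picture" (an assumption of this sketch).
    dictionary: mapping from lower-case English word to its translation.
    """
    # Step 10: keep only the areas identified as character areas.
    text_lines = [text for kind, text in page_areas if kind == "text"]

    # Step 20: cut each line into words, remembering line and position.
    cut = [
        (line_no, pos, word)
        for line_no, line in enumerate(text_lines)
        for pos, word in enumerate(line.split())
    ]

    # Steps 30 and 35: "recognize" each word (here: take it as is) and store it.
    recognized = [
        {"line": line_no, "pos": pos, "word": word} for line_no, pos, word in cut
    ]

    # Step 40: dictionary search for every recognized word.
    for record in recognized:
        record["translation"] = dictionary.get(record["word"].lower())

    # Step 50: the records now pair each word with its translation (or None)
    # and can serve as drawing data.
    return recognized


drawing_data = run_bilingual_mode(
    [("text", "Paris is large"), ("picture", "")],
    {"paris": "パリ", "large": "大きい"},
)
```

Words without a dictionary entry keep `None` as their translation, so a later drawing step could leave the space under them blank.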

[0032] The first embodiment is described next. The structure of a word searched for in the dictionary in step 40 is as follows.

<Word information>
Start X coordinate
Start Y coordinate
End X coordinate
End Y coordinate
Number of characters in the word
Word identification code
Address where the corresponding recognized word data is stored

[0033] The above is the information for one word. The word identification code distinguishes ordinary words from idioms; as one example, the following codes are used.

Ordinary word: 0xff
Idiom: 0x01
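As a minimal sketch of the word information record, the Python below models it as a dataclass. The field names, the use of a list index as the "address", and the helper `make_word_info` are assumptions made for illustration; only the list of fields and the two example codes come from the description.

```python
from dataclasses import dataclass

# Example codes quoted in the description.
NORMAL_WORD = 0xFF
IDIOM = 0x01


@dataclass
class WordInfo:
    start_x: int
    start_y: int
    end_x: int
    end_y: int
    char_count: int
    word_code: int
    data_address: int  # assumption: an index into a list of recognized strings


def make_word_info(store, text, box, word_code=NORMAL_WORD):
    """Append text to the store and return the record that points at it."""
    store.append(text)
    start_x, start_y, end_x, end_y = box
    return WordInfo(start_x, start_y, end_x, end_y, len(text), word_code,
                    len(store) - 1)


store = []
paris = make_word_info(store, "Paris", (120, 40, 180, 56))
in_spite_of = make_word_info(store, "in spite of", (10, 60, 110, 76), IDIOM)
```

The point of the sketch is that an idiom is held in exactly the same record type as an ordinary word; only `word_code` differs, so no second structure is needed.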

[0034] When the dictionary search of step 40 is performed, the results are stored in memory in a structure that sits between a line and a word (called an "intermediate"). The structure of the intermediate is as follows.

[0035] <Intermediate information>
Number of words contained in the intermediate
Address where the word information of the first word of the intermediate is stored

[0036] The intermediate structure is referenced as follows. In step 40, for each identified area, the dictionary search result structure below is stored in memory once for every intermediate in that area. The result of doing this for all areas is stored in step 50.

[0037] <Dictionary search information>
Number of translations
Address where the corresponding intermediate data is stored
Address where the corresponding translation data is stored
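The two records above chain together: a dictionary search entry points at an intermediate, which in turn points at its first word. The Python sketch below shows one way to follow those references. It assumes that addresses are list indices and that the words of an intermediate are stored contiguously; all class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Intermediate:
    word_count: int          # number of words in the intermediate
    first_word_address: int  # index of its first word


@dataclass
class DictionarySearchInfo:
    translation_count: int     # number of translations
    intermediate_address: int  # index of the intermediate
    translation_address: int   # index of the first translation


def words_of(entry, intermediates, words):
    """Follow entry -> intermediate -> words."""
    inter = intermediates[entry.intermediate_address]
    start = inter.first_word_address
    return words[start:start + inter.word_count]


def translations_of(entry, translations):
    """Return the translations that belong to one search entry."""
    start = entry.translation_address
    return translations[start:start + entry.translation_count]


words = ["in", "spite", "of", "Paris"]
intermediates = [Intermediate(3, 0), Intermediate(1, 3)]
translations = ["にもかかわらず", "パリ"]
results = [DictionarySearchInfo(1, 0, 0), DictionarySearchInfo(1, 1, 1)]
```

Here the idiom "in spite of" and the single word "Paris" are both just intermediates with different word counts, which is what lets the output stage treat them uniformly.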

[0038] FIG. 5 illustrates the intermediate structure. Thick lines indicate line positions and thin lines indicate words.

[0039] FIG. 5(1) shows the recognition result after OCR, and FIG. 5(2) shows the words (intermediates) after the dictionary search has been run on the OCR result. Comparing (1) and (2) shows that the recognized word positions differ for idioms and joined words.

[0040] The following describes how data is stored in order to handle an idiom that has other words in between. For the idiomatic phrase "too complicated to prove", the word data of the OCR result is stored in the order: too, complicated, to, prove. The intermediate data after the dictionary search is stored in the order: too, complicated, to, prove, complicated, prove. By referring to this intermediate information, the related idiom can be marked.

[0041] FIG. 6 shows the output format. Areas A (101), B (102), C (103), D (104), and E (105) of the original in FIG. 10(1) correspond to areas A'' (111), B'' (112), C'' (113), D'' (114), and E'' (115), respectively. Although the output in FIG. 6 still runs to two sheets, it follows the layout of the original shown in FIG. 10(1).

[0042] FIG. 7 shows the bilingual output format when a picture area is present. The area filled in black is the picture area. Blank space results in this case, but the drawing unit stores the coordinate values and lays out the output so that the layout of the original is preserved.

[0043] The second embodiment operates as follows. The character strings recognized by the OCR unit 8 are stored word by word in step 35 as recognition result data, each associated with its position on the original. In step 120, each recognized English word stored in step 35 is drawn between the start and end coordinates of the corresponding word in the original text. FIG. 8 shows an example of the bilingual output format of the second invention. Here, "Paris" on the third line should have been translated as "パリ" (Paris) but has been translated as "平鍋" (pan). The OCR result shows "Pans": the letters r and i were taken as the single letter n, and the word was then translated as Pan = 平鍋.

[0044] The third embodiment operates as follows. When the word identification code in the word information is 0x01, the drawing process of step 120 encloses the span from the start coordinates to the end coordinates in a line, producing output such as that shown in FIG. 8(1).

[0045] Alternatively, when the 0x01 word identification codes are not consecutive, that fact can be used to enclose the words separately, as shown in FIG. 8(2). In that case, a connecting line such as the one in FIG. 8(2) may be added to show that the words are related.
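The enclosing rule for idioms can be sketched in Python as follows. The sketch assumes a single text line on which every word coded 0x01 belongs to the same idiom, and it represents each word as a `(code, box)` pair; the function name `idiom_boxes` and this representation are illustrative, not part of the description.

```python
from itertools import groupby

IDIOM = 0x01
NORMAL_WORD = 0xFF


def idiom_boxes(words):
    """Return one enclosing box per run of consecutive idiom-coded words.

    words: list of (code, (start_x, start_y, end_x, end_y)) in reading order.
    """
    boxes = []
    for code, run in groupby(words, key=lambda w: w[0]):
        if code != IDIOM:
            continue
        rects = [box for _, box in run]
        boxes.append((
            min(r[0] for r in rects),
            min(r[1] for r in rects),
            max(r[2] for r in rects),
            max(r[3] for r in rects),
        ))
    return boxes


# A contiguous idiom gives a single box.
contiguous = [
    (IDIOM, (0, 0, 20, 10)),
    (IDIOM, (25, 0, 60, 10)),
    (IDIOM, (65, 0, 80, 10)),
]

# An idiom split by ordinary words gives several boxes to be linked by a line.
split = [
    (IDIOM, (0, 0, 30, 10)),
    (NORMAL_WORD, (35, 0, 120, 10)),
    (IDIOM, (125, 0, 140, 10)),
    (NORMAL_WORD, (145, 0, 190, 10)),
]
```

When `idiom_boxes` returns more than one box for a line, a drawing step could add the connecting line between them; when it returns exactly one, the idiom is contiguous and a single frame is enough.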

[0046] In addition, only the entries whose word identification code is 0x01 can be output as an idiom sheet.

[0047] The fourth embodiment operates as follows. Before bilingual processing starts, the user can choose one of the following.

Standard dictionary only
Specialized dictionary only
Both

[0048] When both dictionaries are selected and a word has a translation in each, the apparatus lets the user choose whether to give priority to the standard dictionary's translation, to give priority to the specialized dictionary's translation, or to display both translations. This produces output such as that shown in FIGS. 9(1), 9(2), and 9(3).
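The dictionary choices above amount to a small lookup policy, sketched here in Python. The parameter names `use` and `priority`, their string values, and the fallback to the other dictionary when the preferred one has no entry are assumptions of this sketch; the description only names the three usage modes and the three display options.

```python
def look_up(word, standard, specialized, use="both", priority="standard"):
    """Return the list of translations to print for one word.

    use:      "standard", "specialized", or "both"
    priority: applies only when use == "both":
              "standard", "specialized", or "both" (show both translations)
    """
    if use not in ("standard", "specialized", "both"):
        raise ValueError(f"unknown use: {use}")
    if priority not in ("standard", "specialized", "both"):
        raise ValueError(f"unknown priority: {priority}")

    std = standard.get(word)
    spec = specialized.get(word)

    if use == "standard":
        candidates = [std]
    elif use == "specialized":
        candidates = [spec]
    elif priority == "standard":
        # Assumption: fall back to the other dictionary if there is no entry.
        candidates = [std if std is not None else spec]
    elif priority == "specialized":
        candidates = [spec if spec is not None else std]
    else:
        candidates = [std, spec]

    return [t for t in candidates if t is not None]


standard = {"bus": "バス", "paper": "紙"}
specialized = {"bus": "母線"}

both = look_up("bus", standard, specialized, priority="both")
```

An empty list means the word is untranslatable under the current setting, which is the case the combined-dictionary mode is meant to make rare.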

[0049] The fifth embodiment operates as follows. When the image is input in step 5, the areas of the original that contain reversed (white-on-dark) characters or outline characters are designated. After binarization in the image processing unit 3, the well-known negative-positive inversion and interior fill processing are applied to the designated areas as image manipulation processing. When this data is input to the bilingual apparatus, the reversed and outline characters can be recognized in the same way as words that ordinary OCR can handle.
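The two image operations named here can be illustrated on a tiny binary image in Python. This is a sketch under stated assumptions: pixels are 0 or 1 with 1 meaning ink, the designated area is a rectangle `(top, left, bottom, right)` with exclusive bottom and right edges, and the interior fill is done with a flood fill from the area's border, which is one common way to fill outlines but is not necessarily the method the apparatus uses.

```python
from collections import deque


def invert_region(image, region):
    """Negative-positive inversion of the pixels inside region."""
    top, left, bottom, right = region
    out = [row[:] for row in image]
    for y in range(top, bottom):
        for x in range(left, right):
            out[y][x] = 1 - out[y][x]
    return out


def fill_enclosed(image, region):
    """Turn background pixels that are sealed off from the region's border into ink."""
    top, left, bottom, right = region
    out = [row[:] for row in image]

    # Seed the flood fill with every background pixel on the region's border.
    seen = set()
    queue = deque()
    for y in range(top, bottom):
        for x in range(left, right):
            on_border = y in (top, bottom - 1) or x in (left, right - 1)
            if on_border and out[y][x] == 0:
                seen.add((y, x))
                queue.append((y, x))

    # Spread through 4-connected background pixels.
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            inside = top <= ny < bottom and left <= nx < right
            if inside and out[ny][nx] == 0 and (ny, nx) not in seen:
                seen.add((ny, nx))
                queue.append((ny, nx))

    # Background pixels never reached are enclosed by ink: fill them.
    for y in range(top, bottom):
        for x in range(left, right):
            if out[y][x] == 0 and (y, x) not in seen:
                out[y][x] = 1
    return out


outline = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
filled = fill_enclosed(outline, (0, 0, 5, 5))

reversed_glyph = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]
normal_glyph = invert_region(reversed_glyph, (0, 0, 3, 3))
```

Both functions return a new image and leave the input untouched, so the unmodified original stays available for compositing into the output.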

[0050] Because the bilingual format cuts the image of the original out as is and composites it into the output, the image produced by the negative-positive inversion and interior fill processing is stored only temporarily, until the OCR result is obtained.

[0051]

[Effects of the Invention] According to the invention of claim 1, a digital image copying machine having image reading means for reading an image of an original, image storage means for storing the read image information, image processing means for applying image processing to the read image information, and image recording means for recording the image signal output by the image processing means on a recording medium comprises: identifying means for identifying a character area in the image read by the reading means; recognizing means for performing optical character recognition on the identified character area; dictionary search means for searching a dictionary in order to translate the language made up of the recognized characters into a different language; means for searching, in the dictionary search, for words joined by a symbol and for idioms; means for storing the joined words and idioms in the same structure as a word when the retrieved translations are stored; and means for producing output in an arbitrary format using the retrieved translations. As a result, the amount of memory and the amount of work can be reduced.

[0052] According to the invention of claim 2, the bilingual apparatus of claim 1 comprises means for marking the retrieved idioms in the output, so translations of both an idiom and the words that make it up can be output. Marking idioms in the output to distinguish them from words also makes it easy to see that they are idioms. Furthermore, depending on how the marks are applied, for an idiom whose words are not contiguous it is possible either to mark every word related to the idiom or to mark only the idiom itself.

[0053] According to the invention of claim 3, the bilingual apparatus of claim 1 comprises means for outputting, as one of the output formats, the corresponding translation below the original text in the same layout as the original, so the original and the bilingual output can be cross-referenced easily.

[0054] According to the invention of claim 4, the bilingual apparatus of claim 1 comprises means for outputting, as one of the output formats, not only the corresponding translation below the original text but also the recognized characters, so it is easy to see where in the process of producing the translation a mistake occurred.

[0055] According to the invention of claim 5, the bilingual apparatus of claim 1 comprises means for using both a widely used general dictionary and a dictionary of technical terms, and means for selecting how the dictionaries are used, so mistranslations and untranslatable words can be eliminated.

[0056] According to the invention of claim 6, the bilingual apparatus of claim 1 comprises input range designating means, so reversed characters and outline fonts can be translated.

[0057]

[Brief description of drawings]

FIG. 1 is an external view of the digital copying machine and bilingual apparatus according to the present invention.

FIG. 2 is a configuration diagram of the digital copying machine and bilingual apparatus according to the present invention.

FIG. 3 is a block diagram of the digital copying machine and bilingual apparatus according to the present invention.

FIG. 4 is a flowchart of the CPU according to the present invention.

FIG. 5 is an explanatory diagram showing the intermediate structure according to the present invention.

FIG. 6 is an explanatory diagram showing an output format according to the present invention.

FIG. 7 is an explanatory diagram showing an output format according to the present invention.

FIG. 8 is an explanatory diagram showing an output format according to the present invention.

FIG. 9 is an explanatory diagram showing an output format according to the present invention.

FIG. 10 is an explanatory diagram showing a conventional output format.

[Explanation of symbols]

1: CPU
2: Scanner
3: Image processing unit
4: Printer
6: Bilingual processing CPU
7: Area identification unit
8: OCR unit and spell-check unit
9: Dictionary search unit
10: Drawing unit
11: Dictionary
12: Font
13: Memory
400: Cable between the digital copying machine and the bilingual apparatus

Claims

[Claims]

1. A bilingual apparatus in a digital image copying machine having image reading means for reading an image of an original, image storage means for storing the read image information, image processing means for applying image processing to the read image information, and image recording means for recording an image signal output by the image processing means on a recording medium, the bilingual apparatus comprising: identifying means for identifying a character area in the image read by the reading means; recognizing means for performing optical character recognition on the identified character area; dictionary search means for searching a dictionary in order to translate the language made up of the recognized characters into a different language; means for searching, in the dictionary search, for words joined by a symbol and for idioms; means for storing the joined words and idioms in the same structure as a word when the retrieved translations are stored; and means for producing output in an arbitrary format using the retrieved translations.

2. The bilingual apparatus according to claim 1, further comprising means for marking the retrieved idioms in the output and for separately outputting the idioms alongside their translations.

3. The bilingual apparatus according to claim 1, further comprising means for outputting, as one of the output formats, the corresponding translation below the original text in the same layout as the original.

4. The bilingual apparatus according to claim 1, further comprising means for outputting, as one of the output formats, not only the corresponding translation below the original text but also the recognized characters.

5. The bilingual apparatus according to claim 1, further comprising means for using both a widely used general dictionary and a dictionary of technical terms as the dictionaries to be used, and means for selecting how the dictionaries are used, wherein the translations from both dictionaries are output in the output format.

6. The bilingual apparatus according to claim 1, further comprising input range designating means, wherein bilingual processing is performed on an image that has been processed by image manipulation processing means.