JPH0883280A

JPH0883280A - Document processor

Info

Publication number: JPH0883280A
Application number: JP6220007A
Authority: JP
Inventors: Hidezo Kugimiya; 秀造釘宮
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1994-09-14
Filing date: 1994-09-14
Publication date: 1996-03-26

Abstract

PURPOSE: To perform highly reliable translation processing by replacing each text ID number with the second-language text data obtained from the translation processing. CONSTITUTION: A text data extraction means 203 extracts text data from original document data, associates the extracted text data with a text ID number and stores it in the text data storage unit of a storage means 202, adds the text ID number at the corresponding location after the text data has been extracted, and stores the non-text data in a non-text data storage unit in the same layout as the original document. When an editing means 204 performs editing processing on the extracted text data, a translation means 205 translates the edited text data from a first language into a second language. A text data replacement means 206 then reads the stored text ID numbers and non-text data and replaces each text ID number with the second-language text data obtained from the translation processing.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a document processing apparatus that performs document data processing, such as editing and translation, on documents in which text data forming sentences is mixed with non-text data consisting of format information.

[0002]

[Prior Art] Recently, document processing apparatuses with DTP (Desk Top Publishing) functions have led to the creation of many documents that contain figures and tables in addition to text data (language data). Document data in which markup symbols specifying typesetting information, such as typeface and font size, are embedded in the document, as in TeX, is also increasing. Demand for technology that translates such document data into another language is growing accordingly.

[0003] However, a conventional document processing apparatus with a machine translation function can process only the text data (language data) in the input document data. To translate a DTP document containing figures and tables, for example, the user must first remove the non-text data (non-language data), such as figures, tables, and layout information, from the original document, translate only the text data, and then recombine the translated text with the non-text data. This places a heavy burden on the user and makes the translation work very inefficient.

[0004] A document processing system has therefore been proposed that extracts only the language data from the original document, translates it, and replaces the language data of the original document with the translated language data, so that documents containing non-language data can be processed (see Japanese Patent Laid-Open No. 4-259057).

[0005]

[Problems to Be Solved by the Invention] However, the document processing system of Japanese Patent Laid-Open No. 4-259057 keeps no information about where in the original document the processed language data was located, so it needs a corresponding-location determining means in order to replace the language data in the original document with the processed language data. This means determines corresponding locations by searching for conditions under which a sentence is taken to correspond to an original sentence, for example: sentences that match each other, sentences that contain the same keyword, sentences that both begin a paragraph, and sentences at the same distance from a sentence whose correspondence has already been established. As a result, the document processing system becomes complicated. Moreover, because the corresponding-location determining means does not always determine the correct location, the system is both inefficient and unreliable.

[0006] The present invention has been made in view of the above circumstances. It provides a document processing apparatus that can accurately extract only the text data from the layout of document data in which text data and non-text data are mixed, and can thereby perform highly reliable translation processing.

[0007]

[Means for Solving the Problems] The present invention is a document processing apparatus comprising: an input means for inputting original document data in which text data in a first language and non-text data are mixed, and for inputting processing instructions such as editing, layout, and translation; a storage means having a text data storage unit that stores text data and a non-text data storage unit that stores non-text data; a text data extraction means that extracts text data from the input original document data, stores the extracted text data in the text data storage unit in association with a text ID number, adds the text ID number at the corresponding location after the text data has been extracted, and stores the non-text data in the non-text data storage unit in the same layout as the original document; an editing means that performs editing processing on the extracted text data; a translation means that translates the edited text data from the first language into a second language; a text data replacement means that reads the text ID numbers and non-text data stored in the non-text data storage unit and replaces each text ID number with the second-language text data obtained from the translation processing; and an output means that outputs document data in which the replaced second-language text data and the non-text data are mixed.

[0008] Another invention is a document processing apparatus comprising: an input means for inputting original document data in which text data in a first language and non-text data are mixed, and for inputting processing instructions such as editing, layout, and translation; a storage means having a text data storage unit that stores text data and a text data correspondence storage unit that stores text data correspondence information; a text data extraction means that extracts text data from the input original document data and stores the extracted text data in the text data storage unit of the storage means; an editing means that performs editing processing on the extracted text data and stores, in the text data correspondence storage unit of the storage means, text data correspondence information relating the edited text data to the original document data; a translation means that performs translation processing on the edited text data; a replacement means that, based on the text data correspondence information stored in the text data correspondence storage unit, replaces the first-language text data with the second-language text data obtained from the translation processing; and an output means that outputs document data in which the replaced second-language text data and the non-text data are mixed.

[0009] The text data extraction means preferably includes a text data extraction unit and a text data discrimination unit, configured so that when the text data extraction unit extracts text data from the input original document data, the text data discrimination unit determines whether each piece of data is text data or non-text data.

[0010] In the present invention, an input device such as a keyboard or OCR is used as the input means. A microcomputer comprising a CPU, RAM, ROM, and I/O ports is used as the storage means (text data storage unit, non-text data storage unit, text data correspondence storage unit), the text data extraction means (text data extraction unit, text data discrimination unit), the editing means, the translation means, and the text data replacement means. RAM in particular is used as the storage means, and a ROM storing the translation dictionary, grammar rules, and the like is used for the translation means. A display device such as a CRT or LCD, or a printing device such as a thermal transfer printer or laser printer, is used as the output means.

[0011]

[Operation] According to the configuration of the present invention, original document data in which text data in a first language and non-text data are mixed is input through the input means, together with processing instructions such as editing, layout, and translation. The storage means comprises a text data storage unit that stores text data and a non-text data storage unit that stores non-text data. The text data extraction means extracts text data from the input original document data, stores the extracted text data in the text data storage unit in association with a text ID number, adds the text ID number at the corresponding location after the text data has been extracted, and stores the non-text data in the non-text data storage unit in the same layout as the original document. When the editing means performs editing processing on the extracted text data, the translation means translates the edited text data from the first language into a second language. The text data replacement means reads the text ID numbers and non-text data stored in the non-text data storage unit and replaces each text ID number with the second-language text data obtained from the translation processing. The output means can therefore output document data in which the replaced second-language text data and the non-text data are mixed. A translated document with the same layout as the original document is thus obtained, the efficiency of translation processing is greatly improved, and the burden on the user of the translation is greatly reduced.

[0012] According to the configuration of the other invention, original document data in which text data in a first language and non-text data are mixed is input through the input means, together with processing instructions such as editing, layout, and translation. The storage means comprises a text data storage unit that stores text data and a text data correspondence storage unit that stores text data correspondence information. When the text data extraction means extracts text data from the input original document data and stores it in the text data storage unit of the storage means, the editing means performs editing processing on the extracted text data and stores, in the text data correspondence storage unit, text data correspondence information relating the edited text data to the original document data. When the translation means performs translation processing on the edited text data, the replacement means, based on the text data correspondence information stored in the text data correspondence storage unit, replaces the first-language text data with the second-language text data obtained from the translation processing. The output means can therefore output document data in which the replaced second-language text data and the non-text data are mixed. A translated document with the same layout as the original document is thus obtained, the efficiency of translation processing is greatly improved, and the burden on the user of the translation is greatly reduced.

[0013] If the text data extraction means includes a text data extraction unit and a text data discrimination unit, then when the text data extraction unit extracts text data from the input original document data, the text data discrimination unit can determine whether each piece of data is text data or non-text data.

[0014]

[Embodiment] The present invention is described in detail below on the basis of the embodiment shown in the drawings. The invention is not limited to this embodiment. The invention is suited mainly to machine translation apparatuses installed in computers, DTP (Desk Top Publishing) systems, and the like, and each component provides a document data editing function in addition to the translation processing function of the invention.

[0015] FIG. 1 is a block diagram showing one embodiment of the document processing apparatus of the present invention. In FIG. 1, reference numeral 1 denotes a control unit consisting of a CPU (Central Processing Unit) 2, which controls the entire apparatus as the discrimination means, data storage means, and data processing means, and a main memory 3, a RAM (Random Access Memory) that stores data while the CPU 2 is operating. The main memory 3 consists of a text data storage unit, which stores the text data, the edited text data, and the translated text data, and a non-text data storage unit, which stores the non-text data, the text ID numbers (text data identification numbers), and the like.

[0016] Reference numeral 4 denotes a translation module consisting of a CPU that functions as the translation means, a ROM storing the translation program, a translation buffer (RAM), and the like. Reference numeral 5 denotes a dictionary memory, a ROM (Read Only Memory) storing the translation dictionary, grammar rules, and the like that the translation module 4 uses when translating the input source text. The translation module 4 and the dictionary memory 5 are connected to the control unit 1.

[0017] Reference numeral 6 denotes a text data extraction unit. It extracts text data from a source file stored in the external storage device 10, in which text data and non-text data are mixed, and stores the text data in the text data storage unit of the main memory 3. At the same time, it generates a text ID number at each location from which text data was extracted and stores it, together with the non-text data, in the non-text data storage unit of the main memory 3. Reference numeral 7 denotes an editing unit, which edits the text data extracted by the text data extraction unit 6.

[0018] Reference numeral 8 denotes a text data replacement unit. It replaces each text ID that the text data extraction unit 6 stored in the non-text data storage unit of the main memory 3 with the second-language text data obtained when the translation module 4 translates the text data extracted by the text data extraction unit 6. Reference numeral 9 denotes an input device, consisting of a keyboard, OCR, and the like, for entering the characters of the source text and instructions for document editing and translation. Reference numeral 10 denotes an external storage device, consisting of an FD (floppy disk) device and an HD (hard disk) device, for storing the source text files entered from the input device 9 and the translated text files. Reference numeral 11 denotes a display device such as a CRT (Cathode Ray Tube) or LCD (Liquid Crystal Display).

[0019] When the translation module 4 receives text in the source language, that is, the first language used in the text to be translated, it translates the text and outputs it in the target language, that is, the second language used in the translation. Specifically, under the control of the CPU 2, one previously designated source file among the source files stored in the external storage device 10 is transferred from the external storage device 10 to the main memory 3, and one sentence of source-language text from that file is sent to the translation module 4. The translation module 4 translates the input source-language text into the target language using the dictionary and grammar rules stored in the dictionary memory 5. The translated sentence is stored temporarily in the main memory 3 and displayed on the screen of the display device 11.

[0020] FIG. 2 is a block diagram showing the functional configuration when the present invention is applied to the machine translation apparatus 1. In FIG. 2, reference numeral 201 denotes the input means (input device), consisting of a keyboard and OCR, which is used to enter the characters of the source text and instructions for document editing and translation. Reference numeral 202 denotes the storage means (main memory), consisting of a text data storage unit and a non-text data storage unit. The text data storage unit stores the text data, the edited text data, and the translated text data; the non-text data storage unit stores the non-text data and the text ID numbers.

[0021] Reference numeral 203 denotes the text data extraction means (text data extraction unit), consisting of a text data discrimination unit, a text data extraction unit, a text ID generation unit, and a data storage unit. It determines, for a document in which text data in a first language and non-text data are mixed, whether each piece of data is text data or non-text data, and extracts the text data. It adds a text ID number at the corresponding location after the text data has been extracted, stores the non-text data in the non-text data storage unit in accordance with the original document layout, and stores the extracted text data in the text data storage unit in association with the text ID number.

[0022] Reference numeral 204 denotes the editing means (editing unit), consisting of a deletion processing unit, a concatenation processing unit, a division processing unit, and an insertion processing unit; these units perform editing processing such as deletion, concatenation, division, and insertion on the extracted text. Reference numeral 205 denotes the translation means (translation unit), consisting of a sentence segmentation unit and a translation execution unit, which performs translation processing on the edited text. Reference numeral 206 denotes the text data replacement means (text data replacement unit), consisting of a text data discrimination unit, a text data search unit, and a data storage unit. Based on the document layout given by the text ID numbers stored in the non-text data storage unit, it replaces the first-language text data with the second-language text data obtained from the translation processing. Reference numeral 207 denotes the output means, consisting of a display device or a printer, which displays or prints the text data, non-text data, and the like.

[0023] FIG. 3 is a flowchart outlining the data processing in the machine translation apparatus 1 of the present invention. The procedure for translating a document in which text data and non-text data are mixed is described below with reference to FIG. 3.
Step 301: First, the original document to be translated, in which text data and non-text data are mixed, is read and stored in the main memory 3.
Step 302: Text data is extracted from the input original document. This yields the text data 308, which is the target of translation processing, and the non-text data 307.

[0024] At this time, a text ID number is added at the corresponding location after the text data has been extracted, the non-text data is stored in the non-text data storage unit in accordance with the original document layout, and the extracted text data is stored in the text data storage unit in association with the text ID number.
Step 303: If necessary, the editing unit 7 performs editing processing on the text data extracted in step 302.
Step 304: The translation module 4 (see FIG. 1) receives the edited text data 309 and performs translation processing. The result is the translated text, that is, the second-language text data 310.

[0025] Step 305: At each location where a text ID number was stored together with the non-text data 307 obtained in step 302, the text ID number is replaced with the second-language text data 310 obtained in step 304.
Step 306: The resulting document, in which the first-language text data of the original document has been replaced with the second-language text data, is output.
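The replacement in step 305 can be sketched as follows. This is a minimal illustration, not the apparatus's actual implementation: the record layout `(block_id, is_text, content)`, the function name `replace_text_ids`, and the sample data are all assumptions introduced here; only the `#n#` form of the text ID number and the reference numerals 307 and 310 come from the description.

```python
import re

# Text ID numbers are written as a number between "#" symbols, e.g. "#1#".
TEXT_ID = re.compile(r"#(\d+)#")


def replace_text_ids(non_text_records, translated):
    """Swap each text ID placeholder for the translated text stored under it.

    non_text_records: list of (block_id, is_text, content) tuples in original
    layout order (hypothetical layout). For text blocks, content is a text ID.
    translated: dict mapping text ID -> second-language text.
    """
    result = []
    for block_id, is_text, content in non_text_records:
        if is_text:
            if TEXT_ID.fullmatch(content) is None:
                raise ValueError(f"block {block_id}: expected a text ID, got {content!r}")
            content = translated[content]
        # Non-text blocks are copied through unchanged, preserving the layout.
        result.append((block_id, is_text, content))
    return result


# Hypothetical sample data standing in for non-text data 307 and text data 310.
non_text_307 = [
    (1, True, "#1#"),
    (2, True, "#2#"),
    (3, False, "<figure data>"),
]
translated_310 = {"#1#": "first translated block", "#2#": "second translated block"}

output_document = replace_text_ids(non_text_307, translated_310)
```

Because every placeholder carries its own ID, no search for corresponding sentences is needed; the lookup is a direct dictionary access.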

[0026] FIG. 4 is a flowchart showing the details of the text data extraction processing corresponding to step 302 of FIG. 3. FIG. 5 is an explanatory diagram showing the layout of one page of an input document to be translated by the machine translation apparatus 1 of the present invention. As shown in FIG. 5, in this layout blocks 1 to 2 and blocks 4 to 5, for example, consist of document data that is text data, while block 3 contains graphic data, which is non-text data. FIG. 6 is an explanatory diagram showing an example of how the input document of FIG. 5 is stored as document data. The display device 11 displays the document as shown in FIG. 5, but the main memory 3 stores it as data in the format shown in FIG. 6. FIG. 7 is an explanatory diagram showing the non-text data after text data extraction processing has been performed on the document data of FIG. 6. FIG. 8 is an explanatory diagram showing the text data after text data extraction processing has been performed on the document data of FIG. 6.

[0027] Step 401: First, the original document to be translated, in which text data and non-text data are mixed, is read.
Step 402: Next, the data pointer is set to the beginning of the original document. In the example of FIG. 6, it points to block 1.
Step 403: The initial value of the text ID number is set to "#1#". The number is placed between "#" symbols so that a text ID number can be distinguished from other data. The number is a serial number that is incremented for each piece of data.

[0028] Step 404: The CPU 2 determines whether the data indicated by the data pointer is text data or non-text data. In the example of FIG. 6, this can be determined from the flag that indicates text or non-text. How text data is distinguished from non-text data depends on the format of the original document. In a TeX document, for example, anything beginning with "/" is non-text data, and the same applies to documents in RTF (Rich Text Format, a format for document data interchange). In a document created with frame-maker, anything beginning with "<string" is text data.
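The format-dependent discrimination in step 404 might be sketched as a set of small predicates, one per document format. The function names and the record layout are assumptions made for illustration. The marker strings are passed in as parameters rather than hard-coded, since the exact marker depends on the format (TeX and RTF control words are normally introduced by a backslash, for instance).

```python
def is_text_by_flag(record):
    """FIG. 6 style: each block carries an explicit text/non-text flag.

    The dict keys used here are hypothetical.
    """
    return record["flag"] == "text"


def is_text_by_prefix(line, non_text_prefix):
    """Markup style: data starting with the given marker is treated as non-text."""
    return not line.startswith(non_text_prefix)


def is_text_by_tag(line, text_tag="<string"):
    """Tagged style: only data starting with the given tag is treated as text."""
    return line.lstrip().startswith(text_tag)


# Usage with hypothetical inputs.
fig6_record = {"id": 3, "flag": "non-text", "content": "<figure data>"}
flag_result = is_text_by_flag(fig6_record)                 # False
markup_result = is_text_by_prefix("/section{Intro}", "/")  # False
tag_result = is_text_by_tag("<string `Hello'>")            # True
```

Keeping each rule as a separate predicate lets the extraction loop stay the same while the discrimination rule is swapped per input format.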

[0029] Step 405: If the data is determined to be text data in step 404, the text ID number is written to the text data file 308.
Step 406: The data in the "content" part of the input document is copied to the text data file 308. In the example document of FIG. 6, the text data file entry for block 1 becomes the one shown under ID "#1#" in FIG. 8.

[0030] Step 407: The block ID and flag of the input document are then copied to the non-text data file 307.
Step 408: The text ID number is written to the content part of the non-text data file 307. In the example of FIG. 6, the "content" part of block 1 is replaced with the text ID number, producing the non-text data file entry shown as block 1 in FIG. 7.

[0031] Step 409: The text ID number is then incremented by 1.
Step 412: If, on the other hand, the data is determined in step 404 not to be text data, as with block 3 in FIG. 6, the data indicated by the data pointer is written to the non-text data file 307, because it is non-text data. In the example of FIG. 6, the data of block 3 is written unchanged to block 3 of FIG. 7.
Step 410: After the processing appropriate to text data or non-text data has been performed as described above, the data pointer is incremented by 1.
Step 411: It is determined whether any input document data remains. If data remains, processing returns to step 404.

【００３２】[0032]

In this way, the above processing is repeated for block 2, block 3, and so on, and ends when no data remains. This yields the text data 308 to be translated and the non-text data 307. FIG. 7 shows the non-text data file and FIG. 8 shows the text data file obtained in this way from the input document data of FIG. 6.
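
The extraction loop of steps 404 to 412 can be sketched as follows. This is a minimal illustration under stated assumptions: the `Block` record (block ID, text/non-text flag, content) mirrors the structure described for FIG. 6, while the names `Block` and `extract_text` and the use of in-memory lists and dictionaries in place of the files 307 and 308 are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    block_id: int
    is_text: bool   # the text / non-text flag
    content: str


def extract_text(document):
    """Split a document into a non-text file and a text file."""
    non_text_file = []   # same layout as the input, text replaced by IDs
    text_file = {}       # text ID number -> extracted text
    text_id = 1
    for block in document:                  # the data pointer walks the blocks
        if block.is_text:                   # step 404
            key = f"#{text_id}#"
            text_file[key] = block.content  # steps 405 and 406
            # Steps 407 and 408: keep the block ID and flag, store the ID as content.
            non_text_file.append(Block(block.block_id, True, key))
            text_id += 1                    # step 409
        else:
            non_text_file.append(block)     # step 412: copied unchanged
    return non_text_file, text_file
```

Because each text block leaves exactly one ID placeholder behind, the non-text file alone preserves the original layout, and the text file can be edited and translated separately.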

【００３３】[0033]

FIG. 9 is an explanatory diagram showing the text data of FIG. 8 after editing; that is, the text data extracted in step 302 after editing operations such as deletion, concatenation, and division have been performed in the editing process of step 303. FIG. 10 is an explanatory diagram showing the text data of FIG. 9 after the sentence segmentation performed as part of the translation process; that is, the text data at the point where the sentence segmentation within the translation process of step 304 has finished. FIG. 11 is an explanatory diagram showing the second-language text data obtained by translating the text data of FIG. 10; that is, the second-language text data obtained by the translation process of step 304.

【００３４】[0034]

FIG. 12 is a flowchart showing details of the text data replacement process corresponding to step 305 of FIG. 3.
Step 1201: First, the non-text data file 307 obtained by the text data extraction process of step 302 is read.
Step 1202: Next, the translation obtained by the translation process of step 304, that is, the second-language text data file 310, is read.

【００３５】[0035]

Step 1203: Next, the non-text data pointer is set to the beginning of the non-text data. In the example shown in FIG. 6, it indicates block 1.
Step 1204: It is then determined whether the data indicated by the non-text data pointer is text data or non-text data. In the example shown in FIG. 7, this can be determined from the flag indicating text/non-text.

【００３６】[0036]

Step 1205: If the data is determined to be text data in step 1204, the block ID and flag of the data indicated by the non-text data pointer are copied from the non-text data file to the output file.
Step 1206: Next, the text data file 310 is searched using, as the key, the text ID number in the content portion of the data indicated by the non-text data pointer, and the text data with the matching text ID number is retrieved. When block 1 of FIG. 7 is processed, the text ID number in the content portion is "#1#", so the text in FIG. 11 whose text ID number is "#1#" is retrieved.

【００３７】[0037]

Step 1207: The retrieved text data is then stored in the content portion of the output file. For block 1 of FIG. 7, the result is as shown in block 1 of FIG. 13.
Step 1210: If, on the other hand, the data is determined in step 1204 not to be text data (for example, when block 3 of FIG. 7, which is non-text data, is processed), the block ID, flag, and content data are copied from the non-text data file to the output file. The result is as shown in block 3 of FIG. 13.

【００３８】[0038]

FIG. 13 is an explanatory diagram showing an output document obtained by the machine translation device 1 of the present invention.
Step 1208: Next, the non-text data pointer is incremented by 1.
Step 1209: It is determined whether there is data at the location indicated by the non-text data pointer. If there is, the process returns to step 1204. In this way, the above processing is repeated for block 2, block 3, and so on of the non-text data file, and ends when no data remains.

【００３９】[0039]

As a result, each text ID number in the stored non-text data file 307 is replaced with the corresponding second-language text data 310 output by the translation process of step 304. In this way, from the non-text data of FIG. 7 and the text data of FIG. 11, a document is obtained in which the "content" of the document data shown in FIG. 6 has been converted into second-language text data (see FIG. 13).
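
The replacement loop of steps 1203 to 1210 can be sketched as follows. This is a minimal illustration, assuming each entry of the non-text data file is a `(block_id, is_text, content)` tuple and the translated text data file is a dictionary keyed by text ID number; the function name `replace_text` and these in-memory representations are hypothetical.

```python
def replace_text(non_text_file, translated):
    """Rebuild the output document from the non-text file and the translation.

    non_text_file: list of (block_id, is_text, content) tuples, where the
                   content of a text block is a text ID such as "#1#".
    translated:    dict mapping text ID -> second-language text.
    """
    output = []
    for block_id, is_text, content in non_text_file:   # non-text data pointer
        if is_text:                                     # step 1204
            # Step 1206: look up the translation by text ID number.
            text = translated[content]
            # Steps 1205 and 1207: keep block ID and flag, store the translation.
            output.append((block_id, is_text, text))
        else:
            # Step 1210: non-text blocks are copied unchanged.
            output.append((block_id, is_text, content))
    return output
```

The lookup is a direct key match on the text ID number, which is why no separate step is needed to work out where each translated text belongs in the layout.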

【００４０】[0040]

Thus, according to the present invention, the text data of the original document data is replaced with the translated text data by storing text ID numbers in the non-text data file and later replacing each text ID number with the second-language text data. Because no special means is needed to determine where in the original document data the translated text data was located, the invention can be realized with a simple configuration. Because no processing to determine this correspondence is needed, the translation processing time is also shortened. Furthermore, because text data and text ID numbers correspond one to one, the correspondence cannot be mistaken, and the text data of the original document data can be accurately replaced with the translated text data. The efficiency of the translation work can therefore be improved.

【００４１】[0041]

Next, another embodiment of the present invention will be described with reference to the drawings. FIG. 14 is a block diagram showing another embodiment of the document processing apparatus of the present invention. As shown in FIG. 14, reference numeral 1401 denotes a control unit, which consists of a CPU (Central Processing Unit) 1402 that controls the entire apparatus and serves as the determination means, data storage means, and data processing means, and a main memory 1403 consisting of RAM (Random Access Memory) that stores data used while the CPU 1402 operates. The main memory 1403 comprises a text data storage unit, which stores the text data, the edited text data, and the translated text data, and a text data correspondence storage unit, which stores text data correspondence information and the like.

【００４２】[0042]

Reference numeral 1404 denotes a translation module consisting of a CPU functioning as the translation means, a ROM storing a translation program, a translation buffer (RAM), and the like. Reference numeral 1405 denotes a dictionary memory consisting of ROM (Read Only Memory) that stores the translation dictionaries, grammar rules, and the like used by the translation module 1404 when translating the input source text. The translation module 1404 and the dictionary memory 1405 are connected to the control unit 1401.

【００４３】[0043]

Reference numeral 1406 denotes a text data extraction unit. The text data extraction unit 1406 extracts text data from a source file, stored in the external storage device 1410, in which text data and non-text data are mixed, performs sentence segmentation, and then stores the text data in the text data storage unit of the main memory 1403. Reference numeral 1407 denotes an editing unit. The editing unit 1407 edits the text data extracted by the text data extraction unit 1406 and, at the same time, stores the correspondence between the edited text data and the text data of the original document in the text data correspondence storage unit of the main memory 1403.

【００４４】[0044]

Reference numeral 1404 denotes a translation module consisting of a CPU functioning as the translation means, a ROM storing a translation program, a translation buffer (RAM), and the like. Reference numeral 1405 denotes a dictionary memory consisting of ROM (Read Only Memory) that stores the translation dictionaries, grammar rules, and the like used by the translation module 1404 when translating the input source text. The translation module 1404 and the dictionary memory 1405 are connected to the control unit 1401.

【００４５】[0045]

Reference numeral 1408 denotes a text data replacement unit. Referring to the text data correspondence stored in the text data correspondence storage unit, the text data replacement unit 1408 replaces the text data of the input document with the second-language text data obtained by having the translation module 1404 translate the extracted text data.

【００４６】[0046]

Reference numeral 1409 denotes an input device consisting of a keyboard, an OCR, and the like, used for entering the characters of the source text and for entering instructions for document editing and translation. Reference numeral 1410 denotes an external storage device consisting of an FD (floppy disk) device and an HD (hard disk) device, which stores the source text files entered from the input device 1409 and the translation files. Reference numeral 1411 denotes a display device such as a CRT (Cathode Ray Tube) or an LCD (liquid crystal display).

【００４７】[0047]

When the source language, that is, the first language used in the text to be translated, is input, the translation module 1404 translates it and outputs the target language, that is, the second language used in the translation. Specifically, under the control of the CPU 1402, one source file designated in advance from among the plurality of source files stored in the external storage device 1410 is transferred from the external storage device 1410 to the main memory 1403, and the source-language text of one sentence in that file is sent to the translation module 1404. The translation module 1404 translates the input source language into the target language using the dictionaries and grammar rules stored in the dictionary memory 1405. The translated sentence is temporarily stored in the main memory 1403 and is also displayed on the screen of the display device 1411.

【００４８】[0048]

FIG. 15 is a block diagram showing a functional configuration in which the present invention is applied to a machine translation device 2. In FIG. 15, reference numeral 1501 denotes input means (an input device) consisting of a keyboard and an OCR, used for entering the characters of the source text and for entering instructions for document editing and translation. Reference numeral 1502 denotes storage means (main memory) consisting of a text data storage unit and a text data correspondence storage unit. The text data storage unit stores the text data, the edited text data, and the translated text data, and the text data correspondence storage unit stores text data correspondence information.

【００４９】[0049]

Reference numeral 1503 denotes text data extraction means (a text data extraction unit) consisting of a text data determination unit, a text data extraction unit, a sentence segmentation unit, and a data storage unit. For a document in which first-language text data and non-text data are mixed, it determines whether each item is text data or non-text data and extracts the text data. Reference numeral 1504 denotes editing means (an editing unit) consisting of a deletion processing unit, a concatenation processing unit, a division processing unit, and an insertion processing unit. It performs editing such as deletion, concatenation, division, and insertion on the extracted text using these processing units and, at the same time, stores information on the correspondence between the edited text data and the text data of the original document in the text data correspondence storage unit of the main memory 1502.
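
One way the editing means could record the correspondence while editing is sketched below. This is an illustration only, not the method of the specification: it assumes each edited sentence carries the list of original sentence numbers it came from, and all names (`start`, `delete`, `concatenate`, `split`, `insert`, `build_correspondence`) are hypothetical.

```python
def start(sentences):
    """Wrap each original sentence with its own number (1-based)."""
    return [([no], text) for no, text in enumerate(sentences, start=1)]


def delete(entries, i):
    """Remove the entry at index i."""
    return entries[:i] + entries[i + 1:]


def concatenate(entries, i):
    """Merge entries i and i + 1; the result keeps both sets of sources."""
    (s1, t1), (s2, t2) = entries[i], entries[i + 1]
    return entries[:i] + [(s1 + s2, t1 + " " + t2)] + entries[i + 2:]


def split(entries, i, pos):
    """Divide entry i at character position pos; both halves share its sources."""
    sources, text = entries[i]
    left, right = text[:pos].strip(), text[pos:].strip()
    return entries[:i] + [(sources, left), (sources, right)] + entries[i + 1:]


def insert(entries, i, text):
    """Insert new text that has no original sentence as its source."""
    return entries[:i] + [([], text)] + entries[i:]


def build_correspondence(n_original, entries):
    """Map each original sentence No. to the edited sentence Nos. derived from it."""
    table = {no: [] for no in range(1, n_original + 1)}
    for edited_no, (sources, _text) in enumerate(entries, start=1):
        for src in sources:
            table[src].append(edited_no)
    return table
```

In this representation a deleted sentence maps to an empty list, concatenated sentences map to the same edited number, and a divided sentence maps to two edited numbers, which are the three cases a sentence-level correspondence table has to express.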

【００５０】[0050]

Reference numeral 1505 denotes translation means (a translation unit) consisting of a translation execution unit, which translates the edited text. Reference numeral 1506 denotes text data replacement means (a text data replacement unit) consisting of a text data determination unit, a text data search unit, and a data storage unit. Based on the document layout given by the text data correspondence stored in the text data correspondence storage unit, it replaces the first-language text data with the second-language text data obtained by the translation process. Reference numeral 1507 denotes output means consisting of a display device or a printer, which displays or prints text data, non-text data, and the like.

【００５１】[0051]

FIG. 16 is a flowchart showing an outline of data processing in the machine translation device 2 of the present invention. The procedure for translating a document in which text data and non-text data are mixed is described next with reference to FIG. 16.
Step 1601: First, the original document to be translated, in which text data and non-text data are mixed, is read and stored in the main memory 1403.
Step 1602: Text data is extracted from the input original document. This yields the text data 1608 to be translated.

【００５２】[0052]

Step 1603: If necessary, the text data extracted in step 1602 is edited. This yields the text data 1609 to be translated and a text data correspondence file 1607 showing the correspondence between the text data 1608 extracted from the original document and the edited text data 1609. The correspondence here is expressed sentence by sentence.
Step 1604: The edited text data 1609 is received and translated by the translation module 1404 (see FIG. 14). The translation process yields the translation, that is, the second-language text data 1610.

【００５３】[0053]

Step 1605: Based on the information stored in the text data correspondence file 1607 obtained in step 1603, the second-language text data 1610 obtained in step 1604 is substituted at the locations where the text of the original document was stored.
Step 1606: The resulting document, in which the first-language text data of the original document has been replaced with second-language text data as described above, is output.

【００５４】[0054]

FIG. 17 is a flowchart showing details of the text data extraction process corresponding to step 1602 of FIG. 16. FIG. 18 is an explanatory diagram showing the layout of one page of a document translated by the machine translation device 2 of the present invention. As shown in FIG. 18, in this layout blocks 1 to 2 and blocks 4 to 5 consist of sentence data, which is text data, and block 3 contains graphic data, which is non-text data. FIG. 19 is an explanatory diagram showing an example of how the input document of FIG. 18 is stored as document data. The document is displayed on the display device 1411 as shown in FIG. 18 but is stored in the main memory 1403 as data in the format shown in FIG. 19. FIG. 20 is an explanatory diagram of the text data obtained by performing the text data extraction process on the document data of FIG. 19.

【００５５】[0055]

Step 1701: First, the original document to be translated, in which text data and non-text data are mixed, is read.
Step 1702: Next, the data pointer is set to the beginning of the original document. In the example shown in FIG. 19, it indicates block 1.
Step 1703: The text No. is set to 1.
Step 1704: The CPU 1402 determines whether the data indicated by the data pointer is text data or non-text data. In the example shown in FIG. 19, this can be determined from the flag indicating text/non-text.

【００５６】[0056]

Step 1705: If the data is determined to be text data in step 1704, the data in the "content" portion of the input document is retrieved.
Step 1706: Sentence segmentation is performed on this data. Sentence segmentation divides consecutive sentences into individual sentences at periods and question marks.
Step 1707: Each sentence obtained by segmentation is written to the text data file with a text No. attached. If, on the other hand, the data is determined in step 1704 not to be text data (for example, when block 3 shown in FIG. 19, which is non-text data, is indicated), no processing is performed and the process proceeds to step 1708.

【００５７】[0057]

Step 1708: After the processing appropriate to text data or non-text data has been performed as described above, the data pointer is incremented by 1.
Step 1709: It is determined whether there is data at the location indicated by the data pointer. If there is, the process returns to step 1704. In this way, the above processing is repeated for block 2, block 3, and so on of the document data, and ends when no data remains. This yields the text data 1608 to be translated. FIG. 20 shows the text data file obtained in this way from the document data of FIG. 19.
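
The extraction with sentence segmentation in steps 1702 to 1709 can be sketched as follows. This is a minimal illustration, assuming the document is a list of `(block_id, is_text, content)` tuples and that a sentence ends at a period or question mark followed by whitespace; the names `split_sentences` and `extract_sentences` are hypothetical, and real segmentation would need to handle abbreviations and similar cases.

```python
import re


def split_sentences(text):
    """Divide consecutive sentences at periods and question marks (step 1706)."""
    parts = re.split(r"(?<=[.?])\s+", text.strip())
    return [p for p in parts if p]


def extract_sentences(document):
    """Return a text data file mapping text No. -> sentence."""
    text_file = {}
    text_no = 1                                     # step 1703
    for _block_id, is_text, content in document:    # the data pointer walks the blocks
        if not is_text:                             # step 1704
            continue                                # non-text data: nothing to do
        for sentence in split_sentences(content):   # steps 1705 and 1706
            text_file[text_no] = sentence           # step 1707
            text_no += 1
    return text_file
```

Unlike the first embodiment, nothing is written back into the document here; the original document itself is kept and later re-read during replacement.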

【００５８】[0058]

FIG. 21 is an explanatory diagram showing the text data of FIG. 20 after editing; that is, the text data extracted in step 1602 after deletion, concatenation, and division have been performed in the editing process of step 1603. FIG. 22 is an explanatory diagram showing the text data correspondence between the original document text of FIG. 20 and the edited text of FIG. 21. This correspondence is obtained by comparing the text before and after the editing process of step 1603.

【００５９】[0059]

FIG. 23 is an explanatory diagram showing the second-language text data obtained by translating the text data of FIG. 21; that is, the second-language text data obtained by the translation process of step 1604. Translation is performed sentence by sentence, so the text data of FIG. 23 corresponds one to one with the text data of FIG. 21. Consequently, FIG. 22, which shows the correspondence between the original document text and the edited text data, also shows the correspondence between the original document text and the second-language text data.

【００６０】[0060]

FIG. 24 is a flowchart showing details of the text data replacement process corresponding to step 1605 of FIG. 16.
Step 2401: First, the original document is read.
Step 2402: Next, the translation obtained by the translation process of step 1604, that is, the second-language text data file 1610, is read.
Step 2403: The text data correspondence file 1607 obtained by the editing process of step 1603 is also read.
Step 2404: Next, the original document data pointer is set to the beginning of the original document. In the example shown in FIG. 19, it indicates block 1. The original document data pointer advances through block 1, block 2, and so on.

【００６１】[0061]

Step 2405: The original document text pointer is set to the first sentence of the original document. In the example of FIG. 19, it indicates the first sentence of the text in the "content" portion of block 1. The original document text pointer advances one sentence at a time.
Step 2406: It is determined whether the data indicated by the original document data pointer is text data or non-text data. In the example shown in FIG. 19, this can be determined from the flag indicating text/non-text.
Step 2407: If the data is determined to be text data in step 2406, the block ID and flag data indicated by the original document text pointer are copied to the output file.

【００６２】[0062]

Step 2408: Next, the text data No. corresponding to the original document text pointer is obtained from the text data correspondence file. In the example of FIG. 22, original document text No. 1 corresponds to text No. 1; that is, original document text No. 1 corresponds to text No. 1 in both the text data of FIG. 21 and the text data of FIG. 23.
Step 2409: The text data file 1610 is searched and the text with that text data No. is retrieved. In the example of FIG. 23, text No. 1 is retrieved.

【００６３】[0063]

Step 2410: The text data retrieved in step 2409 is then written to the output file. In FIG. 23, text No. 1 is written to the output file. In other words, the first sentence of the "content" portion of block 1 in FIG. 19 is replaced with the first text data of FIG. 23 and written to the output file.
Step 2411: The original document text pointer is incremented by 1.
Step 2412: It is determined whether there is data at the location indicated by the original document text pointer, that is, whether any sentences remain in the block currently being processed. If there is, the process returns to step 2408. In the example of block 1 in FIG. 19, there is a second sentence, so the process returns to step 2408 and the above processing is repeated.

【００６４】[0064]

To explain some of the cases in the example of FIG. 22: original document text No. 5 has no corresponding text, so the portion of the original document occupied by text No. 5 is blank in the output. Original document texts No. 7 and No. 8 both correspond to the same text No. 6, so the two text portions No. 7 and No. 8 of the original document are replaced with the single text No. 6. Original document text No. 11 corresponds to texts No. 9 and No. 10, so the single text portion No. 11 of the original document is replaced with the two texts No. 9 and No. 10. In this way, the above processing is repeated until no text data remains in the block indicated by the original document data pointer.
o. No. 11 is the original document text. 9 and No. Corresponds to 10 texts. Therefore, the original document No. One text part of No. 11 is No. 9 and No. Replaced by 2 texts of 10. In this way, the above process is repeated until there is no text data in the block pointed to by the original document data pointer.

【００６５】[0065]

Step 2415: If, on the other hand, the data is determined in step 2406 not to be text data (for example, when block 3 in FIG. 19, which is non-text data, is indicated), the block ID, flag, and content indicated by the original document data pointer are copied to the output file.
Step 2413: Next, the original document data pointer is incremented by 1.
Step 2414: It is determined whether there is data at the location indicated by the original document data pointer, that is, whether any blocks remain in the original document. If there is, the process returns to step 2406. In this way, the above processing is repeated for block 2, block 3, and so on of the original document data file, and ends when no data remains.
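
The replacement loop of steps 2404 to 2415 can be sketched as follows. This is a minimal illustration under stated assumptions: the document is a list of `(block_id, is_text, content)` tuples, the correspondence maps each original sentence No. to a list of translated text Nos. (empty for a deleted sentence), and sentences end at a period or question mark followed by whitespace. Writing a shared translated text only once when two original sentences map to it is an assumption of this sketch, chosen to match the FIG. 22 case in which two original texts are replaced with one. All names are hypothetical.

```python
import re


def count_sentences(text):
    """Count sentences, splitting at periods and question marks."""
    return len([s for s in re.split(r"(?<=[.?])\s+", text.strip()) if s])


def replace_with_correspondence(document, correspondence, translated):
    """Rebuild the document with second-language text in the original layout."""
    output = []
    sentence_no = 1    # original document text pointer (step 2405)
    emitted = set()    # translated Nos. already written (assumption of this sketch)
    for block_id, is_text, content in document:      # original document data pointer
        if not is_text:                               # step 2406
            output.append((block_id, is_text, content))   # step 2415
            continue
        pieces = []
        for _ in range(count_sentences(content)):
            for no in correspondence.get(sentence_no, []):   # step 2408
                if no not in emitted:
                    pieces.append(translated[no])            # steps 2409 and 2410
                    emitted.add(no)
            sentence_no += 1                                 # step 2411
        output.append((block_id, is_text, " ".join(pieces)))
    return output
```

A deleted sentence contributes nothing to its block, a merged pair contributes one translated text, and a divided sentence contributes two, so the three cases described for FIG. 22 fall out of the same lookup.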

【００６６】[0066]

As a result, the text of the original document is replaced with the second-language text data 1610 output by the translation process of step 1604. In this way, from the text data of FIG. 23 and the text data correspondence file of FIG. 22, a document is obtained in which the "content" of the document data shown in FIG. 19 has been converted into second-language text data (see FIG. 25). FIG. 25 is an explanatory diagram showing an output document obtained by the machine translation device 2.

[0067] Therefore, according to the present invention, the replacement of the text data of the original document data with the translated text data can be realized by storing text data correspondence information in the text data correspondence information storage unit and later substituting the second-language text data based on that information. Consequently, only the text data can be accurately extracted from the layout of document data in which text data and non-text data are mixed, and highly reliable translation processing can be performed. The efficiency of translation processing can therefore be greatly improved.
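The role of the text data correspondence information can be illustrated with a short sketch. It assumes, purely for illustration, that the correspondence is held as a mapping from an original text number to the list of edited text numbers derived from it, and that translations are keyed by edited text number; the function name, parameter names, and the space separator are hypothetical.

```python
from typing import Dict, List


def resolve_replacement(
    original_no: int,
    correspondence: Dict[int, List[int]],
    translations: Dict[int, str],
    separator: str = " ",
) -> str:
    """Return the second-language text that replaces one original text."""
    # Look up which edited texts were produced from this original text.
    edited_nos = correspondence[original_no]
    # Join their translations in order to form the replacement text.
    return separator.join(translations[n] for n in edited_nos)


# An original text that editing split into two texts, as with
# original No. 11 and edited No. 9 and No. 10 in the description.
correspondence = {11: [9, 10]}
translations = {9: "First sentence.", 10: "Second sentence."}
print(resolve_replacement(11, correspondence, translations))
# prints "First sentence. Second sentence."
```

Keeping the correspondence separate from the translated text means the editing step is free to split or merge sentences without losing track of where the result belongs in the original layout.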

[0068]

[Effects of the Invention] According to the present invention, only the text data can be accurately extracted from the layout of document data in which text data and non-text data are mixed, and highly reliable translation processing can be performed. As a result, a translated document having the same layout as the original document is obtained, and the efficiency of translation processing is greatly improved, so that the burden on the translation user is also significantly reduced.

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of the document processing apparatus of the present invention.

FIG. 2 is a block diagram showing a functional configuration in which the present invention is applied to a machine translation device 1.

FIG. 3 is a flowchart showing an outline of data processing in the machine translation device 1 of the present invention.

FIG. 4 is a flowchart showing details of the text data extraction processing corresponding to step 302 of FIG. 3.

FIG. 5 is an explanatory diagram showing the layout of one page of an input document translated by the machine translation device 1 of the present invention.

FIG. 6 is an explanatory diagram showing a storage example when the input document of FIG. 5 is stored as document data.

FIG. 7 is an explanatory diagram showing the non-text data after text data extraction processing is performed on the document data of FIG. 6.

FIG. 8 is an explanatory diagram showing the text data after text data extraction processing is performed on the document data of FIG. 6.

FIG. 9 is an explanatory diagram showing the text data after editing processing is performed on the text data of FIG. 8.

FIG. 10 is an explanatory diagram showing the text data after the sentence segmentation step of the translation processing is performed on the text data of FIG. 9.

FIG. 11 is an explanatory diagram showing the second-language text data obtained as a result of performing translation processing on the text data of FIG. 10.

FIG. 12 is a flowchart showing details of the text data replacement processing corresponding to step 305 of FIG. 3.

FIG. 13 is an explanatory diagram showing an output document obtained by the machine translation device 1 of the present invention.

FIG. 14 is a block diagram showing another embodiment of the document processing apparatus of the present invention.

FIG. 15 is a block diagram showing a functional configuration in which the present invention is applied to a machine translation device 2.

FIG. 16 is a flowchart showing an outline of data processing in the machine translation device 2 of the present invention.

FIG. 17 is a flowchart showing details of the text data extraction processing corresponding to step 1602 of FIG. 16.

FIG. 18 is an explanatory diagram showing the layout of one page of an input document translated by the machine translation device 2 of the present invention.

FIG. 19 is an explanatory diagram showing a storage example when the input document of FIG. 18 is stored as document data.

FIG. 20 is an explanatory diagram showing the text data after text data extraction processing is performed on the document data of FIG. 19.

FIG. 21 is an explanatory diagram showing the text data after editing processing is performed on the text data of FIG. 20.

FIG. 22 is an explanatory diagram showing the text data correspondence between the original document text of FIG. 20 and the edited text of FIG. 21.

FIG. 23 shows the second-language text data obtained as a result of performing translation processing on the text data of FIG. 21.

FIG. 24 is a flowchart showing details of the text data replacement processing corresponding to step 1605 of FIG. 16.

FIG. 25 is an explanatory diagram showing an output document obtained by the machine translation device 2 of the present invention.

[Explanation of symbols]

1, 1401 Control unit
2, 1402 CPU
3, 1403 Main memory
4, 1404 Translation module (translation means)
5, 1405 Dictionary memory
6, 1406 Text data extraction unit (text data extraction means)
7, 1407 Editing unit (editing means)
8, 1408 Text data replacement unit (text data replacement means)
9, 1409 Input device
10, 1410 External storage device
11, 1411 Display device

Claims

[Claims]

1. A document processing apparatus comprising: input means for inputting original document data in which text data in a first language and non-text data are mixed, and for inputting processing instructions such as editing, layout, and translation; storage means having a text data storage unit for storing text data and a non-text data storage unit for storing non-text data; text data extraction means for extracting text data from the input original document data, storing the extracted text data in association with a text ID number, adding the text ID number to the corresponding portion from which the text data was extracted, and storing the non-text data in the same layout as the original document; editing means for executing editing processing on the extracted text data; translation means for executing translation processing from the first language to a second language on the edited text data; text data replacement means for reading the text ID number and the non-text data stored in the non-text data storage unit and replacing the text ID number with the second-language text data obtained as a result of the translation processing; and output means for outputting document data in which the replaced second-language text data and the non-text data are mixed.

2. A document processing apparatus comprising: input means for inputting original document data in which text data in a first language and non-text data are mixed, and for inputting processing instructions such as editing, layout, and translation; storage means having a text data storage unit for storing text data and a text data correspondence storage unit for storing text data correspondence information; text data extraction means for extracting text data from the input original document data and storing the extracted text data in the text data storage unit of the storage means; editing means for executing editing processing on the extracted text data and storing text data correspondence information between the edited text data and the original document data in the text data correspondence storage unit of the storage means; translation means for executing translation processing on the edited text data; replacement means for replacing the text data in the first language with the second-language text data obtained as a result of the translation processing, based on the text data correspondence information stored in the text data correspondence storage unit; and output means for outputting document data in which the replaced second-language text data and the non-text data are mixed.

3. The document processing apparatus according to claim 1, wherein the text data extraction means includes a text data extraction section and a text data discrimination section, and when the text data extraction section extracts text data from the input original document data, the text data discrimination section determines whether the data is text data or non-text data.