JP3477308B2

JP3477308B2 - Machine translation equipment

Info

Publication number: JP3477308B2
Application number: JP07381096A
Authority: JP
Inventors: Kazuo Nishiura (西浦一夫)
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1996-03-28
Filing date: 1996-03-28
Publication date: 2003-12-10
Anticipated expiration: 2016-03-28
Also published as: JPH09265468A

Description

Detailed Description of the Invention

[0001]

[Field of the Invention] The present invention relates to a machine translation device that converts a document written in one language into a document in another language.

[0002]

[Description of the Related Art] Most document data handled on computers, such as word processors and machine translation devices, contains more than the character information that forms the text itself. It also carries information on document structure and layout and on character decoration, such as added ruled lines or typeface changes.

[0003] Several standards have already been proposed for attaching this kind of information (hereinafter called additional information). Known examples include SGML, for describing the logical structure of a document, and HTML, for describing hypertext documents.

[0004] Accordingly, machine translation devices have been developed that translate the ordinary text portions of a document while preserving, as far as possible, the additional information contained in the source document data. A document translated by such a device can be output in almost the same format as the source document data.

[0005] Consider how such a machine translation device processes a sentence in which additional information is attached to a specific span by means of symbols called start tags and end tags.

[0006]

Time flies <itaric> like an arrow </itaric> .

In the sentence above, "<itaric>" and "</itaric>" are the start tag and the end tag, respectively. Together they indicate that the span enclosed between them is to be output in italics.

[0007] This sentence is converted into a translation carrying the same additional information by the procedure described below, with reference to the processing block diagram of FIG. 7. First, the tag separation unit 2 identifies the tags appearing in the input document 1 and stores which words each tag applies to. FIG. 8 shows how this tag information is stored (original text tag information 6). After the tag information has been stored, the tags are deleted from the source text.
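The tag separation step can be pictured with a short Python sketch. It assumes, as in the examples, that tags are separated from words by spaces. It records the original text tag information as the list of tags open at each word. The actual storage format of FIG. 8 is not reproduced here, so this representation is an illustrative assumption, and `separate_tags` and `TAG_RE` are hypothetical names.

```python
import re
from typing import List, Tuple

# A tag token such as "<itaric>" or "</itaric>"; group 1 is "/" for end tags.
TAG_RE = re.compile(r"<(/?)([^<>]+)>")


def separate_tags(sentence: str) -> Tuple[List[str], List[List[str]]]:
    """Strip tags from `sentence` and record which tags apply to each word.

    Returns (words, tags_per_word), where tags_per_word[i] lists the tags
    open at words[i], outermost first. Assumes tags are space-separated
    tokens and are correctly paired.
    """
    words: List[str] = []
    tags_per_word: List[List[str]] = []
    open_tags: List[str] = []
    for token in sentence.split():
        m = TAG_RE.fullmatch(token)
        if m is None:                   # an ordinary word
            words.append(token)
            tags_per_word.append(list(open_tags))
        elif m.group(1) == "/":         # end tag: close the latest start tag
            open_tags.pop()
        else:                           # start tag: applies to following words
            open_tags.append(m.group(2))
    return words, tags_per_word


words, info = separate_tags("Time flies <itaric> like an arrow </itaric> .")
print(words)  # ['Time', 'flies', 'like', 'an', 'arrow', '.']
print(info)   # [[], [], ['itaric'], ['itaric'], ['itaric'], []]
```

Keeping a snapshot of the open tags per word also handles nested tags. An unpaired end tag would raise an error here, which is exactly the situation the invention addresses.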

[0008] Next, the translation unit 3 translates the sentence and stores the correspondence between source words and translated words obtained during translation. FIG. 9 shows an example of this correspondence (translated word correspondence information 7).

[0009] Finally, the tag restoration unit 4 attaches new tags to the translated sentence. It uses the translation result and word correspondence obtained by the translation unit 3, together with the original text tag information stored by the tag separation unit. FIG. 10 shows the resulting translated text tag information. Tags are added to the translated text accordingly, and the output document 5 is produced.
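A minimal sketch of the tag restoration unit 4 follows. It assumes that the translated word correspondence information 7 maps each source word index to the index of one target segment. It also assumes that a target segment inherits the tags of every source word aligned to it. The Japanese segmentation and the alignment in the usage lines are hand-made to match the example in [0011]; they are not output from a real translation engine. `restore_tags` is a hypothetical name.

```python
from typing import Dict, List


def restore_tags(target: List[str], source_tags: List[List[str]],
                 alignment: Dict[int, int]) -> str:
    """Re-insert tags into the segmented translation `target`.

    alignment maps a source word index to a target segment index. Each target
    segment inherits the tags of the source words aligned to it, and runs of
    adjacent segments with the same tags share one start/end tag pair.
    """
    seg_tags: List[List[str]] = [[] for _ in target]
    for src, tgt in alignment.items():
        for tag in source_tags[src]:
            if tag not in seg_tags[tgt]:
                seg_tags[tgt].append(tag)

    out: List[str] = []
    current: List[str] = []
    for segment, tags in zip(target, seg_tags):
        if tags != current:             # tag set changes: close old, open new
            out.extend(f"</{t}>" for t in reversed(current))
            out.extend(f"<{t}>" for t in tags)
            current = tags
        out.append(segment)
    out.extend(f"</{t}>" for t in reversed(current))  # close anything left open
    return " ".join(out)


# Source words: Time flies like an arrow .  (tags as recorded by separation)
source_tags = [[], [], ["itaric"], ["itaric"], ["itaric"], []]
target = ["時間は", "矢のように", "飛ぶ。"]
alignment = {0: 0, 1: 2, 2: 1, 3: 1, 4: 1, 5: 2}
print(restore_tags(target, source_tags, alignment))
# 時間は <itaric> 矢のように </itaric> 飛ぶ。
```

Because the three tagged source words all map to one target segment, a single tag pair surrounds 矢のように in the result.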

[0010] The translation result obtained in this example is as follows.

[0011] 時間は <itaric> 矢のように </itaric> 飛ぶ。

(Word for word: "Time <itaric> like an arrow </itaric> flies.")

[0012]

[Problem to Be Solved by the Invention] Suppose that the following sentence is translated by the method described above.

[0013]

<tag1> Time flies </tag2> like an <tag3> arrow </tag3> .

In this sentence the tags do not correspond correctly, so the effective ranges of tag1 and tag2 cannot be determined. If the tags carry information about how characters are displayed, a display device (output device) receiving this sentence must do one of the following: notify the user of the tag error and abort processing, treat the tags as invalid, or automatically modify the tags to remove the inconsistency and then continue.

[0014] For a user who wants to translate this sentence, however, invalidating the tags means that their tag information is simply lost. As a result, the display format may end up differing significantly from the source document data, not only for this sentence but for the whole document.

[0015] For example, suppose that in the example sentence above the effective range of the start tag <tag1> is taken to extend up to just before </tag2>, where the erroneous end tag appears. Suppose also that </tag2>, which has no matching start tag, is ignored. The translation result is then:

[0016] <tag1> 時間 </tag1> は <tag3> 矢 </tag3> のように <tag1> 飛ぶ </tag1>。

(Word for word: "<tag1> time </tag1> [topic marker] <tag3> arrow </tag3> like <tag1> flies </tag1>.")

[0017] Alternatively, if every tag with an erroneous correspondence is treated as invalid, the result is:

時間は <tag3> 矢 </tag3> のように飛ぶ。

(Word for word: "Time <tag3> arrow </tag3> like flies.")

[0018]

[0019] In both of the results above, a considerable part of the source text's tag information is lost. As noted earlier, this leaves open the possibility that the display format changes significantly.

[0020] An object of the present invention is to produce translation output that accurately reflects the tag information of the source text, even when such erroneous tags are present. The translation device should achieve this without arbitrarily interpreting the effective ranges of the tags.

[0021]

[Means for Solving the Problem] According to claim 1 of the present invention, a machine translation device comprises the following units:

- an input unit that receives an input sentence consisting of character data, which is source text written in a first language, and additional information that adds various kinds of information to a span delimited by a pair of tags indicating its start and end positions;
- a tag separation unit that stores the relationship between the character data and the additional information as original text tag information and separates the additional information from the input sentence;
- a translation unit that translates the input sentence remaining after the tag separation unit has removed the tags into a desired second language, and stores the correspondence between the first and second languages as translated word correspondence information;
- a tag restoration unit that restores, in the second-language sentence obtained by the translation unit, the tags temporarily separated by the tag separation unit;
- an output unit that outputs the second-language sentence tagged by the tag restoration unit.

The device is characterized as follows. A tag correspondence inspection unit inspects the correspondence of the tags within one sentence and detects a span in which a pair of tags indicating start and end positions corresponds erroneously. The translation unit then separately translates three parts: the input sentence in the span divided off by the tag correspondence inspection unit and judged to have an erroneous tag correspondence, the input sentence preceding that span, and the input sentence following it.

[0022] According to claim 2 of the present invention, the tag correspondence inspection unit outputs the input sentence in the span judged to have an erroneous tag correspondence as the original text in the first language, without having the translation unit translate it.

[0023]

[0024]

[Embodiments of the Invention] Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the present invention. The following components have the same configuration as in the prior art shown in FIG. 7, so their description is omitted here: input document 1, tag separation unit 2, translation unit 3, tag restoration unit 4, output document 5, original text tag information 6, and translated word correspondence information 7.

[0025] The tag correspondence inspection unit 8 is now described in detail. FIG. 2 shows the tag correspondence inspection unit broken down into finer processing blocks. The character string buffer 81 holds the input document one sentence at a time, exactly as given, that is, with its tags still attached.

[0026] The tag identification unit 82 scans the character string buffer from the beginning and successively obtains the position and name of each tag in the sentence. Each tag obtained by the tag identification unit 82 is stored in the tag buffer 83.
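The scan performed by the tag identification unit 82 can be sketched as follows. The sketch assumes "<" and ">" as the tag start and end symbols, as in the examples. It yields each tag's position together with its name, keeping the leading "/" of end tags so that a later step can tell them apart. `identify_tags` is a hypothetical name.

```python
from typing import Iterator, Tuple

TAG_START = "<"  # tag start symbol in the examples
TAG_END = ">"    # tag end symbol in the examples


def identify_tags(buffer: str) -> Iterator[Tuple[int, str]]:
    """Scan `buffer` from the front, yielding (position, tag_name) per tag.

    position is the index of the tag start symbol; tag_name is the text
    between the start and end symbols, so end tags keep their leading "/".
    """
    pos = 0                                   # scanning position pointer
    while pos < len(buffer):
        if buffer[pos] != TAG_START:
            pos += 1                          # not a tag: advance the pointer
            continue
        end = buffer.find(TAG_END, pos + 1)   # scan ahead to the tag end symbol
        if end == -1:
            break                             # unterminated "<": stop scanning
        yield pos, buffer[pos + 1:end]
        pos = end + 1


print(list(identify_tags("Time flies <itaric> like an arrow </itaric> .")))
# [(11, 'itaric'), (34, '/itaric')]
```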

[0027] The tag stack 84 is a last-in, first-out stack that stores the state of the tags up to the current scanning position in the character string buffer. When the tag identification unit 82 detects a start tag, the contents of the tag buffer (the start tag) and its position are pushed onto the stack.

[0028] When the tag identification unit 82 detects an end tag, the tag comparison unit 85 compares the end tag in the tag buffer with the start tag at the top of the tag stack 84. Any tag correspondence error in the sentence set in the character string buffer is therefore detected by the tag comparison unit 85. When the tag comparison unit 85 detects a tag error, the sentence division processing unit 86 divides the sentence immediately before and immediately after the erroneous tag.
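The interplay of the tag stack 84 and the tag comparison unit 85 amounts to the classic stack-based bracket check. In this sketch, start tags are pushed as (name, position) pairs, and each end tag is compared with the start tag on top of the stack. The function returns the position of the first end tag that either meets an empty stack or does not match, and `None` otherwise. The sketch makes two simplifications. It finds tags with a regular expression rather than the character-by-character scan of FIG. 3. It also does not report start tags left open at the end of the sentence. `first_bad_end_tag` is a hypothetical name.

```python
import re
from typing import List, Optional, Tuple

TAG_RE = re.compile(r"<(/?)([^<>]+)>")  # group 1 is "/" for an end tag


def first_bad_end_tag(sentence: str) -> Optional[int]:
    """Return the position of the first end tag that arrives with an empty
    stack or does not match the start tag on top of the stack, else None."""
    stack: List[Tuple[str, int]] = []  # last-in first-out (name, position)
    for m in TAG_RE.finditer(sentence):
        name = m.group(2)
        if m.group(1) != "/":
            stack.append((name, m.start()))    # start tag: push
        elif not stack or stack.pop()[0] != name:
            return m.start()                   # end tag with no matching start tag
    return None


print(first_bad_end_tag("Time flies <itaric> like an arrow </itaric> ."))  # None
print(first_bad_end_tag("<tag1> Time flies </tag2> like an <tag3> arrow </tag3> ."))  # 18
```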

[0029] The three parts produced by the division are stored in the sentence first half buffer 86a, the error tag buffer 86b, and the sentence latter half buffer 86c. They are then processed as follows:

- The contents of the sentence first half buffer 86a are translated by the conventional translation process (process [1] in the figure).
- After that translation result has been output, the contents of the error tag buffer are output unchanged (process [2]).
- The contents of the sentence latter half buffer are set anew in the character string buffer 81, and the processing is repeated (process [3]).

When no tag correspondence error is found in the sentence, the sentence division processing unit 86 does nothing. The contents of the character string buffer 81 are then sent directly to the translation process (process [4] in the figure).

[0030] The flow of processing is now described with reference to the flowchart of FIG. 3. First, in step S1, the input sentence to the tag correspondence inspection unit 8 is stored in the character string buffer 81. Next, in step S2, the scanning position of the tag identification unit 82 in the character string buffer is initialized by setting the scanning position pointer to 0.

[0031] Steps S3, S4, S5, S8, and S10 are the processing by which the tag identification unit 82 extracts tags from the sentence. The character string buffer 81 is scanned from the beginning. At each position, it is determined whether the character is the tag start symbol ("<" in the examples) (step S3).

[0032] If the character at the scanning position is not the tag start symbol, the scanning pointer is advanced (step S4). It is then checked whether the advanced pointer has reached the end of the sentence (step S10). If the character at the scanning position is the tag start symbol, the pointer position is stored in the tag buffer 83 (step S4). Scanning then proceeds to the tag end symbol (">" in the examples), and the tag name is stored in the tag buffer 83 (step S5).

[0033] It is then checked whether the tag name begins with "/" (step S6). If it does not, the tag is judged to be a start tag. The pair formed by the tag name stored in the tag buffer and the pointer value at the tag's start position is then pushed onto the tag stack 84 (step S7).

[0034] If the tag name begins with "/", the tag is an end tag. It is first checked whether the tag stack 84 is empty (step S9). If it is not, the topmost (most recently stored) entry of the tag stack 84 is taken out, and its tag name is compared with the contents of the tag buffer 83 (step S14).

[0035] If the tag name taken from the tag stack 84 matches the tag name in the tag buffer 83, the start tag and end tag are judged to correspond correctly, and processing simply continues (returning to step S3). FIG. 4 shows an example of the scanning position pointer in the character string buffer, and FIG. 5 shows an example of the tag stack 84. When the pointer reaches position 24 in FIG. 4, the tag stack 84 in FIG. 5 holds three tags. They are stacked from the bottom in order of appearance, each as a pair of tag name and position pointer.

[0036] This processing is repeated until the scanning position reaches the end of the sentence in the character string buffer (step S10). If the tag stack 84 is empty at that point, the input sentence contains no tag correspondence errors at all. The contents of the character string buffer are then passed unchanged to the conventional machine translation process (step S13).

[0037] Tag correspondence errors fall into the following three patterns. The first is an end tag that appears while the tag stack is empty. The tag comparison unit 85 detects this error when an end tag is stored in the tag buffer 83 (that is, when scanning has found an end tag) while the tag stack 84 is empty. In the flowchart it is detected in step S9.

[0038] On detecting the error, the sentence division processing unit 86 divides the contents of the character string buffer 81 into three parts (step S17):

- the part up to just before the end tag, stored in the sentence first half buffer 86a;
- the end tag itself, stored in the error tag buffer 86b;
- the part following the end tag, stored in the sentence latter half buffer 86c.

[0039] The contents of the sentence first half buffer 86a are then passed to the translation process (step S18), after which the contents of the error tag buffer 86b are output (step S19). Next, the contents of the sentence latter half buffer 86c are written to the character string buffer 81, rewriting its contents (step S20). Processing then returns to step S2.
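The division in steps S17 to S20 can be sketched as a function that cuts the character string buffer around one tag. It returns the three pieces destined for buffers 86a, 86b, and 86c. Stripping the surrounding spaces is an assumption, chosen to match the buffer contents shown in the worked example of [0049]. `split_at_tag` is a hypothetical name.

```python
from typing import Tuple


def split_at_tag(buffer: str, start: int) -> Tuple[str, str, str]:
    """Split `buffer` around the tag beginning at index `start`.

    Returns (first_half, error_tag, latter_half), mirroring buffers
    86a, 86b and 86c. Surrounding spaces are stripped (an assumption).
    """
    end = buffer.index(">", start) + 1  # the tag runs up to and including ">"
    return buffer[:start].strip(), buffer[start:end], buffer[end:].strip()


buffer = "Time flies </tag2> like an <tag3> arrow </tag3> ."
print(split_at_tag(buffer, buffer.index("</tag2>")))
# ('Time flies', '</tag2>', 'like an <tag3> arrow </tag3> .')
```

The latter half returned here is what would be written back to the character string buffer 81 before the scan restarts at step S2.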

[0040] The second pattern is an end tag whose name does not match the name of the start tag stored in the tag stack 84. The tag comparison unit 85 detects this error when an end tag is stored in the tag buffer 83 (that is, when scanning has found an end tag) and that end tag does not correspond to the start tag stored in the tag stack 84. In the flowchart it is detected in step S14.

[0041] When this error is detected, all contents of the tag stack 84 are taken out, and the position pointer of the start tag at the bottom of the stack is obtained (step S16). The sentence division processing unit 86 then divides the character string buffer into the part before that tag, the tag itself, and the part after the tag, in the same way as described above. It stores these parts in buffers 86a to 86c. Likewise, the contents of the latter half of the sentence are set anew in the character string buffer 81, and processing returns to step S2.

[0042] The third pattern is the case in which tags remain in the tag stack 84 when scanning has reached the end of the buffer. This error is detected when the tag identification unit 82 has scanned the character string buffer 81 to the end of the sentence and tags remain in the tag stack 84. In the flowchart it is detected in step S12.

[0043] When this error is detected, the sentence division processing unit 86 takes out all contents of the tag stack 84 and obtains the position pointer of the start tag at the bottom of the stack (step S16). It then divides the character string buffer into the part before that tag, the tag itself, and the part after the tag, in the same way as described above. It stores these parts in buffers 86a to 86c. Likewise, the contents of the latter half of the sentence are set anew in the character string buffer 81, and processing returns to step S2.
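Putting the three patterns together, the choice of the tag at which the sentence is divided can be sketched as below. For an orphan end tag (pattern 1), the split point is that end tag. For a name mismatch (pattern 2) or tags left on the stack at the end (pattern 3), it is the start tag at the bottom of the stack, as in step S16. Tags are located with a regular expression instead of the pointer scan of FIG. 3, and `find_error_tag` is a hypothetical name.

```python
import re
from typing import List, Optional, Tuple

TAG_RE = re.compile(r"<(/?)([^<>]+)>")  # group 1 is "/" for an end tag


def find_error_tag(sentence: str) -> Optional[int]:
    """Return the start index of the tag at which to split `sentence`,
    or None if all tags correspond correctly."""
    stack: List[Tuple[str, int]] = []  # (tag name, position), bottom first
    for m in TAG_RE.finditer(sentence):
        name = m.group(2)
        if m.group(1) != "/":
            stack.append((name, m.start()))   # start tag: push (S7)
        elif not stack:
            return m.start()                  # pattern 1: orphan end tag (S9)
        elif stack[-1][0] != name:
            return stack[0][1]                # pattern 2: bottom start tag (S14, S16)
        else:
            stack.pop()                       # matching pair: keep scanning
    if stack:
        return stack[0][1]                    # pattern 3: tags left open (S12, S16)
    return None                               # no error: translate as is (S13)


print(find_error_tag("<tag1> Time flies </tag2> like an <tag3> arrow </tag3> ."))  # 0
print(find_error_tag("Time flies </tag2> like an <tag3> arrow </tag3> ."))         # 11
print(find_error_tag("like an <tag3> arrow ."))                                    # 8
print(find_error_tag("like an <tag3> arrow </tag3> ."))                            # None
```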

[0044] The processing described so far is now illustrated with an actual input sentence, namely the one given earlier:

<tag1> Time flies </tag2> like an <tag3> arrow </tag3> .

[0045] First, <tag1> is stored in the tag buffer 83. The tag comparison unit 85 recognizes it as a start tag, and it is pushed onto the tag stack 84. Next, </tag2> is stored in the tag buffer 83, and the tag comparison unit 85 judges it to be an end tag. The entry taken from the tag stack 84 is <tag1>, which does not match, so this is a tag name mismatch. <tag1> is therefore treated as the error tag for the division. The sentence division processing unit 86 stores:

- "" (empty) in the sentence first half buffer 86a;
- "<tag1>" in the error tag buffer 86b;
- "Time flies </tag2> like an <tag3> arrow </tag3> ." in the sentence latter half buffer 86c.

[0046] The contents of the sentence first half buffer 86a are then passed to the translation process. Because what is passed in this case is empty (an empty sentence), the translation result is of course also empty.

[0047] Next, the tag "<tag1>" stored in the error tag buffer 86b is output.

[0048] Next, "Time flies </tag2> like an <tag3> arrow </tag3> .", stored in the sentence latter half buffer 86c, is set once more in the character string buffer 81, and the processing is repeated. This time the tag stack 84 is empty when </tag2> is found, so the error is a missing start tag.

[0049] The error tag is "</tag2>". The sentence division processing unit 86 stores:

- "Time flies" in the sentence first half buffer 86a;
- "</tag2>" in the error tag buffer 86b;
- "like an <tag3> arrow </tag3> ." in the sentence latter half buffer 86c.

[0050] The contents of the sentence first half buffer are passed to the translation process, which yields "時間は飛ぶ" ("time flies"). The error tag "</tag2>" is output next. Then "like an <tag3> arrow </tag3> .", stored in the sentence latter half buffer 86c, is returned to the character string buffer 81, and the processing is repeated. Because no tag correspondence errors remain, the sentence is not divided and is passed unchanged to the translation process. This produces and outputs "<tag3>矢</tag3>のように。" ("like an <tag3>arrow</tag3>.").

[0051] In the end, therefore, the following translation is obtained:

<tag1>時間は飛ぶ</tag2><tag3>矢</tag3>のように。
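The whole loop of FIG. 2 and FIG. 3 can be traced end to end with the sketch below, which reproduces the result in [0051]. The translation unit, together with the conventional tag separation and restoration, is replaced by a hypothetical lookup table. That table holds the three translations quoted in [0046] to [0050]; a real system would call its translation engine at that point. Pieces are joined without spaces because the target language in this example is Japanese; this is an assumption about output formatting. All function names are hypothetical.

```python
import re
from typing import Callable, List, Optional, Tuple

TAG_RE = re.compile(r"<(/?)([^<>]+)>")


def find_error_tag(sentence: str) -> Optional[int]:
    """Start index of the tag to split at, or None if all tags correspond."""
    stack: List[Tuple[str, int]] = []
    for m in TAG_RE.finditer(sentence):
        name = m.group(2)
        if m.group(1) != "/":
            stack.append((name, m.start()))
        elif not stack:
            return m.start()                    # pattern 1: orphan end tag
        elif stack[-1][0] != name:
            return stack[0][1]                  # pattern 2: bottom of the stack
        else:
            stack.pop()
    return stack[0][1] if stack else None       # pattern 3, or no error


def translate_with_tag_check(sentence: str, translate: Callable[[str], str]) -> str:
    pieces: List[str] = []
    buffer = sentence                            # character string buffer 81
    while True:
        pos = find_error_tag(buffer)
        if pos is None:
            pieces.append(translate(buffer))     # process [4]
            return "".join(pieces)
        end = buffer.index(">", pos) + 1
        first_half = buffer[:pos].strip()        # buffer 86a
        error_tag = buffer[pos:end]              # buffer 86b
        latter_half = buffer[end:].strip()       # buffer 86c
        pieces.append(translate(first_half))     # process [1]
        pieces.append(error_tag)                 # process [2]
        buffer = latter_half                     # process [3]: rescan


# Hypothetical stand-in for translation unit 3 plus tag separation/restoration.
TOY_TRANSLATIONS = {
    "": "",
    "Time flies": "時間は飛ぶ",
    "like an <tag3> arrow </tag3> .": "<tag3>矢</tag3>のように。",
}

result = translate_with_tag_check(
    "<tag1> Time flies </tag2> like an <tag3> arrow </tag3> .",
    TOY_TRANSLATIONS.__getitem__,
)
print(result)  # <tag1>時間は飛ぶ</tag2><tag3>矢</tag3>のように。
```

Each pass removes at least one tag from the buffer, so the loop always terminates.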

[0052] Thus, in the translation result produced by the present invention, the order of tags and words matches the source text. In other words, the translation is output without distorting the tag information.

[0053] Alternatively, when a tag error is detected, the sentence may be output as the original text. FIG. 6 shows a flowchart of this processing. Steps S1 to S14 are the same as in FIG. 3. When a tag correspondence error is detected in any of steps S9, S12, and S14, the original sentence is output as is, without translation, in step S21.
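For the variant of [0053] (and claim 2), the inspection only has to answer whether the sentence is well formed. The sketch below assumes the same "<"/">" tag syntax and uses a hypothetical lookup table in place of the translation engine. It translates a sentence only when its tags correspond, and otherwise returns the source sentence unchanged, as in step S21. The names `tags_correspond` and `translate_or_keep` are hypothetical.

```python
import re
from typing import Callable, List

TAG_RE = re.compile(r"<(/?)([^<>]+)>")  # group 1 is "/" for an end tag


def tags_correspond(sentence: str) -> bool:
    """True if every end tag closes the most recent open start tag and no
    start tag is left open, i.e. none of the three error patterns occurs."""
    stack: List[str] = []
    for m in TAG_RE.finditer(sentence):
        if m.group(1) != "/":
            stack.append(m.group(2))
        elif not stack or stack.pop() != m.group(2):
            return False
    return not stack


def translate_or_keep(sentence: str, translate: Callable[[str], str]) -> str:
    """Translate only sentences whose tags correspond; otherwise return the
    source sentence unchanged (step S21)."""
    return translate(sentence) if tags_correspond(sentence) else sentence


# Hypothetical stand-in for the conventional tag-preserving translation.
TOY_TRANSLATIONS = {"like an <tag3> arrow </tag3> .": "<tag3>矢</tag3>のように。"}

good = translate_or_keep("like an <tag3> arrow </tag3> .", TOY_TRANSLATIONS.__getitem__)
bad = translate_or_keep("<tag1> Time flies </tag2> like an <tag3> arrow </tag3> .",
                        TOY_TRANSLATIONS.__getitem__)
print(good)  # <tag3>矢</tag3>のように。
print(bad)   # <tag1> Time flies </tag2> like an <tag3> arrow </tag3> .
```

This trades translation quality for untouched tags, which is the trade-off described in [0057].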

[0054]

[Effects of the Invention] The device according to the present invention checks the correspondence between start tags and end tags in each sentence of the input document. A sentence with an erroneous tag correspondence is divided before and after the erroneous tag. The parts are translated, and the tag is put back. As a result, a translation can be obtained without distorting the tag information of the source text.

[0055]

[0056] Alternatively, a sentence with an erroneous tag correspondence can be output as the original text. This naturally prevents any loss or distortion of its tag information.

[0057] This is highly effective when the user wants the tag information, which strongly affects the document data, to stay as close as possible to the source text. It remains worthwhile even at the cost of some loss in translation quality.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.

FIG. 2 is a block diagram showing the detailed configuration of the tag correspondence inspection unit 8 in the embodiment of the present invention.

FIG. 3 is a flowchart showing the flow of processing in the embodiment of the present invention.

FIG. 4 is a diagram showing an example of the character string buffer 81 and of the scanning position pointer in the embodiment of the present invention.

FIG. 5 is a diagram showing an example of the tag stack 84 in the embodiment of the present invention.

FIG. 6 is a flowchart showing the flow of processing in the embodiment of the present invention.

FIG. 7 is a block diagram showing the configuration of the prior art.

FIG. 8 is a diagram showing an example of how tag information is stored in the prior art.

FIG. 9 is a diagram showing an example of how translated word information is stored in the prior art.

FIG. 10 is a diagram showing an example of how translated text tag information is stored in the prior art.

[Explanation of Symbols]

1 Input document
2 Tag separation unit
3 Translation unit
4 Tag restoration unit
5 Output document
6 Original text tag information
7 Translated word correspondence information
8 Tag correspondence inspection unit
81 Character string buffer
82 Tag identification unit
83 Tag buffer
84 Tag stack
85 Tag comparison unit
86 Sentence division processing unit
86a Sentence first half buffer
86b Error tag buffer
86c Sentence latter half buffer

Claims

(57) [Claims]

1. A machine translation device comprising:

- an input unit that receives an input sentence consisting of character data, which is source text written in a first language, and additional information that adds various kinds of information to a span delimited by a pair of tags indicating a start position and an end position;
- a tag separation unit that stores the relationship between the character data and the additional information as original text tag information and separates the additional information from the input sentence;
- a translation unit that translates the input sentence remaining after the tag separation unit has removed the tags into a desired second language, and stores the correspondence between the first language and the second language as translated word correspondence information;
- a tag restoration unit that restores, in the second-language sentence obtained by the translation unit, the tags temporarily separated by the tag separation unit;
- an output unit that outputs the second-language sentence tagged by the tag restoration unit;

the machine translation device being characterized in that a tag correspondence inspection unit inspects the correspondence of the tags within one sentence and detects a span in which the pair of tags indicating a start position and an end position corresponds erroneously. The translation unit separately translates three parts: the input sentence in the span divided off by the tag correspondence inspection unit and judged to have an erroneous tag correspondence, the input sentence preceding the span, and the input sentence following the span.

2. The machine translation device according to claim 1, wherein the tag correspondence inspection unit outputs the input sentence in the span judged to have an erroneous tag correspondence as the original text in the first language, without causing the translation unit to translate it.