JP2015228170A - Machine translation method, machine translation program and machine translation apparatus

Info

Publication number: JP2015228170A
Application number: JP2014114109A
Authority: JP
Inventors: 富士　秀; Hide Fuji; 秀富士; 友樹長瀬; Yuki Nagase
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2014-06-02
Filing date: 2014-06-02
Publication date: 2015-12-17
Anticipated expiration: 2034-06-02
Also published as: JP6292036B2

Abstract

PROBLEM TO BE SOLVED: To provide a machine translation method, a machine translation program, and a machine translation apparatus capable of improving the translation accuracy of document components obtained by dividing a sentence to be translated. SOLUTION: A sentence to be translated is divided into a plurality of document components. For each document component to be translated, the analysis candidate to apply is selected from one or more analysis candidates, based on the similarity between the analysis candidates of the document components and on a corpus of previously analyzed document components, and the translation is then performed.

Description

The present invention relates to a machine translation technique for translating a first-language sentence into a second-language sentence.

Machine translation is conventionally known in which a sentence in a first language to be translated is translated, using a bilingual dictionary of the first and second languages, into a sentence in a second language, the target language, which differs from the first language (see, for example, Patent Documents 1 to 5).

Many kinds of documents require machine translation. Among them, demand is particularly high for translating industrial documents such as operation manuals, laws and regulations, contracts, and patent specifications.

Industrial documents typically contain many long sentences. In machine translation, a longer source sentence has more ambiguity of interpretation, which leads to problems such as longer processing time and more analysis errors, and translation quality tends to drop. There is therefore a strong need for techniques that translate long sentences with high accuracy (high quality).

For example, to translate the claims of English patent specifications, which contain many long sentences, a technique has been disclosed that focuses on the style specific to claims and divides each input claim into parts such as constituent elements, thereby obtaining an appropriate translation result (see, for example, Patent Document 6).

Specifically, the input English claim is first divided by matching it against patterns (predetermined rules), stored in advance, for dividing the claim-specific style. Next, each divided part is analyzed, and a hierarchical structure (dependency relations) is built from the modification relations. The hierarchical structure is then inverted to obtain Japanese word order. Finally, each level of the hierarchy is translated and the result is output. Compared with translating the whole sentence as is, the shorter translation units reduce the ambiguity of interpretation, so translation quality improves.

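Patent Document 1: Japanese Unexamined Patent Application Publication No. H8-278970
Patent Document 2: Japanese Unexamined Patent Application Publication No. 2008-233955
Patent Document 3: Japanese Translation of PCT International Application Publication No. 2010-535377
Patent Document 4: Japanese Unexamined Patent Application Publication No. 2009-230561
Patent Document 5: Japanese Unexamined Patent Application Publication No. 2012-79081
Patent Document 6: Japanese Unexamined Patent Application Publication No. H9-293075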

FIG. 1 is a diagram for explaining the problem with the prior art.
In FIG. 1, input sentence 100, “A device comprising text storing means for new data and data ranking part for the stored data.”, is the first-language sentence to be translated. Pattern 200, “[part P0] comprising [part P1] and [part P2].”, is stored in advance as a predetermined rule for dividing long sentences.

The conventional machine translation apparatus first matches input sentence 100 against pattern 200 and divides input sentence 100 into “A device”, “comprising”, “text storing means for new data”, “and”, “data ranking part for the stored data”, and “.”. “Part P0” of pattern 200 corresponds to document part 110, “A device”. “Part P1” of pattern 200 corresponds to document part 120, “text storing means for new data”. “Part P2” of pattern 200 corresponds to document part 130, “data ranking part for the stored data”.

The conventional machine translation apparatus next analyzes each divided part. The analysis of document part 120, “text storing means for new data”, yields a dependency from phrase 121 “text” to phrase 122 “storing”, a dependency from phrase 122 “storing” to phrase 123 “means”, and a dependency from phrase 124 “for new data” to phrase 123 “means”. Similarly, the analysis of document part 130, “data ranking part for the stored data”, yields a dependency from phrase 134 “for the stored data” to phrase 133 “part”, a dependency from phrase 133 “part” to phrase 132 “ranking”, and a dependency from phrase 132 “ranking” to phrase 131 “data”.

Phrases 121 to 124 are then translated using the bilingual dictionary, and “新規データ用” (“for new data”), “テキスト” (“text”), “格納” (“storing”), and “手段” (“means”) are joined in dependency order to obtain translated part 210, “新規データ用テキスト格納手段”. Likewise, phrases 131 to 134 are translated using the bilingual dictionary and joined in dependency order to obtain translated part 230, “格納データ用部品をソートするデータ” (roughly, “data that sorts parts for stored data”). Document part 110, “A device”, yields translated part 250, “装置” (“device”). These translations are performed independently of one another.

The conventional machine translation apparatus then combines these translation results with the translation results of the fixed parts of pattern 200 (“comprising” and “and”), namely translated part 220 “と” and translated part 240 “を備える”.

As described above, document part 110 “A device”, document part 120 “text storing means for new data”, and document part 130 “data ranking part for the stored data” are each translated independently. In this example, the translation of document part 130, “data ranking part for the stored data”, is wrong; it should have been translated as “格納データ用データソート部” (“data sorting unit for stored data”).

In other words, conventional machine translation translates the document parts obtained by dividing the source sentence one at a time, independently of each other. As a result, it cannot resolve ambiguity by taking into account how the document parts are organized within the source sentence as a whole.

In one aspect, an object of the present invention is to provide a machine translation method, a machine translation program, and a machine translation apparatus capable of improving the translation accuracy of the document parts obtained by dividing a sentence to be translated.

In one proposal, the machine translation method is executed by a computer of a machine translation apparatus that translates a source text written in a first language into a target text written in a second language. The source text is divided into document parts based on a predetermined rule. One or more analysis candidates are created for each of the created document parts. When a parallel structure is found among the created document parts, combinations of the analysis candidates created for the document parts making up the parallel structure are created. A parallel similarity value, indicating the similarity between the analysis candidates in each created combination, is calculated. A corpus value, indicating how often each created analysis candidate occurs, is calculated based on an analyzed corpus, which holds the results of past analyses. Based on the calculated parallel similarity value and the calculated corpus value, the analysis candidate to apply when translating the document part is selected from the one or more created analysis candidates.

According to the embodiments, translation accuracy can be improved when translating the document parts obtained by dividing a sentence to be translated.

FIG. 1 is a diagram for explaining the problem with the prior art.
FIG. 2 is a diagram for explaining the features of the present embodiment.
FIG. 3 is a diagram showing the functional blocks of the machine translation apparatus according to the first embodiment.
FIG. 4 is a flowchart showing the flow of the machine translation process executed by the computer of the machine translation apparatus.
FIG. 5 is a diagram showing an example of patterns.
FIG. 6 is a diagram for explaining how the analyzed corpus is created.
FIG. 7 is a diagram showing an example of the analyzed corpus database.
FIG. 8 is a diagram for explaining how the corpus value is obtained.
FIG. 9 is a diagram for explaining the conditions for calculating the parallel similarity value.
FIG. 10 is a diagram showing an example of calculating the parallel similarity value.
FIG. 11 is a diagram showing an example of calculating the total value.
FIG. 12 is a diagram showing an example of a correctly translated result.
FIG. 13 is a diagram showing the functional blocks of the machine translation apparatus according to the second embodiment.
FIG. 14 is a diagram showing an example in which no parallel structure exists between document parts.
FIG. 15 is a diagram showing an example in which a parallel structure exists between document parts.
FIG. 16 is a diagram showing the functional blocks of the machine translation apparatus according to the third embodiment.
FIG. 17 is a configuration diagram of an information processing apparatus.

Embodiments for carrying out the present invention are described in detail below with reference to the drawings.
FIG. 2 is a diagram for explaining the features of the present embodiment.

The present embodiment is described using the example shown in FIG. 1.
The processing up to the generation of document part 120 is the same as that described with reference to FIG. 1, so its description is omitted.

In FIG. 2, the present embodiment obtains a plurality of analysis candidates, 120A (P1-a) and 120B (P1-b), for document part 120, “text storing means for new data”. Analysis candidate 120A is the same as in the example of FIG. 1: a dependency from the phrase “text” to the phrase “storing”, a dependency from the phrase “storing” to the phrase “means”, and a dependency from the phrase “for new data” to the phrase “means”. Analysis candidate 120B is a different analysis: a dependency from the phrase “for new data” to the phrase “means”, a dependency from the phrase “means” to the phrase “storing”, and a dependency from the phrase “storing” to the phrase “text”. Similarly, a plurality of analysis candidates, 130A (P2-a) and 130B (P2-b), are obtained for document part 130, “data ranking part for the stored data”.

Next, whether the divided document parts 120 and 130 are in a parallel relationship can be determined from the pattern used to divide the source document, by checking whether they sit on either side of an “and” or “or” in that pattern. Document parts (120, 130) that are in a parallel relationship within a sentence, that is, joined by “and”, “or”, or the like, often have similar structures because they describe parallel elements. Focusing on this structural characteristic, a “parallel similarity value” is assigned to each parallel combination of analysis candidates according to the similarity of the document structures (analysis candidates) of the document parts in the parallel relationship. In the present embodiment, analysis candidates 120A and 120B are obtained for document part 120, and analysis candidates 130A and 130B are obtained for document part 130. Each combination of an analysis candidate of document part 120 (120A or 120B) with an analysis candidate of document part 130 (130A or 130B) is given a parallel similarity value according to how much the candidates in that combination have in common.

The parallel similarity value alone may not resolve the ambiguity sufficiently. A “corpus value” is therefore assigned to each analysis candidate, based on its frequency in an analyzed corpus for the target field, classified in advance by document type (fields such as legal documents, papers, and operation manuals). The corpus value is a value based on how often a candidate appears in the corpus; how it is obtained is described later.

Based on the parallel similarity value and the corpus value, the parallel combination with the highest total value, (P1-a) and (P2-b), is selected. As a result, for document part 130, “data ranking part for the stored data”, translated part 330, “格納データ用データソート部” (“data sorting unit for stored data”), is obtained. This translation has a high parallel similarity value with respect to document part 120, the other member of the parallel structure, and a high corpus value in the target field, so it can be judged to be correct.

FIG. 3 is a diagram showing the functional blocks of the machine translation apparatus according to the first embodiment.
In FIG. 3, machine translation apparatus 500A translates a source text written in a first language into a target text written in a second language. Machine translation apparatus 500A includes an input sentence division unit 501, a multiple analysis candidate creation unit 502, a parallel combination creation unit 503, a parallel similarity value assignment unit 504, a frequency acquisition unit 505, a corpus value assignment unit 506, and a total value calculation unit 508. Machine translation apparatus 500A further includes a candidate selection unit 509, a part translation unit 510, and a combining unit 511.

The input sentence division unit 501 divides the source text, input sentence 100, into document parts based on pattern 200 (the predetermined rule). For example, as described with reference to FIG. 1, it matches input sentence 100 against pattern 200 and creates the document parts corresponding to the parts that make up the pattern.

The multiple analysis candidate creation unit 502 creates one or more analysis candidates for each of the document parts created by the input sentence division unit 501. In the example above, it creates analysis candidates 120A and 120B from document part 120, and analysis candidates 130A and 130B from document part 130.

When the document parts created by the input sentence division unit 501 are determined to have a parallel structure, the parallel combination creation unit 503 creates combinations of the analysis candidates (120A, 120B / 130A, 130B) that the multiple analysis candidate creation unit 502 created for the document parts making up the parallel structure. For example, it creates the combinations 120A with 130A, 120A with 130B, 120B with 130A, and 120B with 130B.

The parallel similarity value assignment unit 504 calculates a parallel similarity value indicating the similarity between the analysis candidates in each combination created by the parallel combination creation unit 503. The specific calculation of the parallel similarity value is described later.

The frequency acquisition unit 505 obtains frequency data from the analyzed corpus database (DB) 507 for each of the analysis candidates generated by the multiple analysis candidate creation unit 502. The analyzed corpus is a corpus of past analysis and translation results, classified by document type (fields such as legal documents, papers, and operation manuals).

The corpus value assignment unit 506 calculates, from the frequency data that the frequency acquisition unit 505 obtained for each analysis candidate, a corpus value indicating how often each analysis candidate created by the multiple analysis candidate creation unit 502 occurs. The specific calculation of the corpus value is described later.

The total value calculation unit 508 calculates a total value based on the parallel similarity value calculated by the parallel similarity value assignment unit 504 and the corpus value calculated by the corpus value assignment unit 506. The calculation of the total value is described later.

Based on the total value calculated by the total value calculation unit 508, the candidate selection unit 509 selects, from the analysis candidates created by the multiple analysis candidate creation unit 502, the analysis candidate to apply when translating the document part.

The part translation unit 510 translates the document part based on the analysis candidate selected by the candidate selection unit 509.

The combining unit 511 then joins the translated document parts produced by the part translation unit 510 and outputs the result as the output sentence.

FIG. 4 is a flowchart showing the flow of the machine translation process executed by the computer of the machine translation apparatus. FIG. 5 is a diagram showing an example of patterns. FIG. 6 is a diagram for explaining how the analyzed corpus is created. FIG. 7 is a diagram showing an example of the analyzed corpus database. FIG. 8 is a diagram for explaining how the corpus value is obtained. FIG. 9 is a diagram for explaining the conditions for calculating the parallel similarity value. FIG. 10 is a diagram showing an example of calculating the parallel similarity value. FIG. 11 is a diagram showing an example of calculating the total value. FIG. 12 is a diagram showing an example of a correctly translated result.

An example of translating a patent document written in English into a patent document written in Japanese is described below with reference to FIGS. 4 to 12.

First, in step S401, the input sentence division unit 501 of machine translation apparatus 500A matches input sentence 100 against the patterns shown in FIG. 5, one after another. Here, input sentence 100 is “A device comprising text storing means for new data and data ranking part for the stored data.”. In the example of FIG. 5, this input sentence matches the pattern “[part P0] comprising [part P1] and [part P2]”. The input sentence division unit 501 then creates the document part corresponding to each part of this pattern: “A device” for part P0, “text storing means for new data” for part P1, and “data ranking part for the stored data” for part P2. The “present” or “none” entry at the right end of each pattern in FIG. 5 indicates whether the document parts created from that pattern have a parallel structure.
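The matching in step S401 can be sketched as follows. This is a minimal illustration, assuming that a pattern can be encoded as a regular expression with one named group per part and a flag for the parallel-structure entry of FIG. 5; the names `PATTERNS` and `split_sentence` are hypothetical and do not come from the specification.

```python
import re

# Hypothetical encoding of one FIG. 5 style pattern:
# "[part P0] comprising [part P1] and [part P2]".
# The boolean mirrors the "present"/"none" parallel-structure entry.
PATTERNS = [
    (re.compile(r"^(?P<P0>.+?) comprising (?P<P1>.+?) and (?P<P2>.+?)\.?$"), True),
]


def split_sentence(sentence):
    """Try each pattern in order; return (parts, has_parallel) or None."""
    for regex, has_parallel in PATTERNS:
        match = regex.match(sentence.strip())
        if match:
            return match.groupdict(), has_parallel
    return None


parts, has_parallel = split_sentence(
    "A device comprising text storing means for new data "
    "and data ranking part for the stored data."
)
for name, text in parts.items():
    print(name, "->", text)
```

A real pattern set would need to handle sentences in which “and” also occurs inside a part; the lazy groups above simply split at the first occurrence.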

In step S402, the multiple analysis candidate creation unit 502 creates one or more analysis candidates for the document parts created in step S401, at least for those document parts that sit on either side of an “and” or “or” in the pattern. For example, from the document part “text storing means for new data” it creates analysis candidates 120A and 120B described above with reference to FIG. 2. From the document part “data ranking part for the stored data” it creates analysis candidates 130A and 130B described above with reference to FIG. 2.

In step S403, the frequency acquisition unit 505 obtains frequency data from the analyzed corpus DB 507 for each of the analysis candidates generated by the multiple analysis candidate creation unit 502. The corpus value assignment unit 506 then calculates, from the frequency data obtained for each analysis candidate, a corpus value indicating how often each analysis candidate created in step S402 occurs, and assigns it to that candidate.

The analyzed corpus DB 507 is created as follows.
The analyzed corpus is a database that accumulates the results of past translations. It is classified by document type (fields such as legal documents, papers, and operation manuals), and each source document is registered together with its translated document. The registered data includes the analysis results obtained at translation time. The case of the document part “text storing means for new data”, shown in FIG. 6(A), is described below as an example.

The analysis yields four phrases: “text”, “storing”, “means”, and “for new data”. The phrases “text” and “means” are nouns, and the phrase “for new data” is a noun phrase. The phrase “storing”, however, can be either an adjective or a verb. Depending on whether “storing” is treated as an adjective or as a verb, the analysis result is either that of FIG. 6(B) or that of FIG. 6(C). Analysis results such as those in FIGS. 6(B) and 6(C) are judged to be ambiguous.

Analyzing the document part “communication means for new data”, shown in FIG. 6(D), yields three phrases: “communication”, “means”, and “for new data”. The phrases “communication” and “means” are nouns, and the phrase “for new data” is a noun phrase. An analysis result such as that in FIG. 6(D) is judged to be unambiguous.

In addition to the above analysis results, the phrase (a word, a phrase, or the like) that is the final target of the dependencies is taken as the main part.
Of analysis results such as those in FIGS. 6(B), 6(C), and 6(E), only FIG. 6(E), which remains after the ambiguous analysis results are excluded, goes into the analyzed corpus.

In this way, the analyzed corpus DB 507, which stores only unambiguous analysis results, is created as shown in FIG. 7.
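The exclusion of ambiguous results can be sketched as a simple filter. This is only an illustration, assuming that each document part maps to a list of alternative analyses and that a part counts as unambiguous when exactly one analysis exists; the data layout and the name `build_analyzed_corpus` are hypothetical.

```python
def build_analyzed_corpus(parsed_parts):
    """Keep only document parts whose analysis produced exactly one result."""
    return {
        text: analyses[0]
        for text, analyses in parsed_parts.items()
        if len(analyses) == 1
    }


# Hypothetical analyses as (phrase, part of speech) sequences.
parsed_parts = {
    "text storing means for new data": [
        # "storing" read as an adjective
        [("text", "noun"), ("storing", "adjective"),
         ("means", "noun"), ("for new data", "noun phrase")],
        # "storing" read as a verb
        [("text", "noun"), ("storing", "verb"),
         ("means", "noun"), ("for new data", "noun phrase")],
    ],
    "communication means for new data": [
        [("communication", "noun"), ("means", "noun"),
         ("for new data", "noun phrase")],
    ],
}

corpus = build_analyzed_corpus(parsed_parts)
print(list(corpus))
```

Only the unambiguous part survives the filter; how the stored entries are turned into the frequency data of FIG. 7 is not covered by this sketch.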

The corpus value is calculated as follows.
First, input sentence 100, “A device comprising text storing means for new data and data ranking part for the stored data.”, is divided based on the matching pattern, as described above. The divided input sentence 100 includes part P1, “text storing means for new data”, and part P2, “data ranking part for the stored data” (FIG. 8(A)).

Part P1 yields two analysis candidates, (P1-a) and (P1-b), as shown in FIGS. 8(B) and 8(C). Part P2 yields two analysis candidates, (P2-a) and (P2-b), as shown in FIGS. 8(D) and 8(E).

The analyzed corpus DB is consulted for each of these analysis candidates (P1-a), (P1-b), (P2-a), and (P2-b). For analysis candidate (P1-a), for example, the frequency of the candidate's main part and the frequency of the main part as a phrase are obtained as main part frequency 538 and phrase frequency 19, respectively. The corpus value is calculated as α × main part frequency + β × phrase frequency, where α and β are constants determined empirically, for example α = 1 and β = 4. In this case, the corpus value of analysis candidate (P1-a) is 1 × 538 + 4 × 19 = 614. Similarly, the corpus value of analysis candidate (P1-b) is 7, that of analysis candidate (P2-a) is 23, and that of analysis candidate (P2-b) is 925.
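The corpus value formula can be checked with a few lines of code. The sketch below uses the example constants α = 1 and β = 4 and the frequencies given for candidate (P1-a); the function name `corpus_value` is hypothetical, and the frequencies of the other candidates are not given in the text, so only (P1-a) is reproduced.

```python
ALPHA = 1  # example weight for the main part frequency
BETA = 4   # example weight for the phrase frequency


def corpus_value(main_part_freq, phrase_freq, alpha=ALPHA, beta=BETA):
    """Corpus value = alpha * main part frequency + beta * phrase frequency."""
    return alpha * main_part_freq + beta * phrase_freq


# Candidate (P1-a): main part frequency 538, phrase frequency 19.
p1a = corpus_value(538, 19)
print(p1a)  # 614
```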

Next, in step S404 of FIG. 4, the parallel combination creation unit 503 creates combinations of the analysis candidates created in step S402 for the parallel structure between the document parts created in step S401. For example, it creates the combination of (P1-a) shown in (B) of FIG. 8 with (P2-a) shown in (D) of FIG. 8, the combination of (P1-a) with (P2-b) shown in (E) of FIG. 8, and so on.
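One way to picture step S404 is as a Cartesian product over the candidate lists of the parallel parts. The sketch below assumes that candidates are identified by plain string labels and that every pairing is enumerated; the specification does not prescribe a data structure.

```python
from itertools import product

# Hypothetical representation: each parallel document part maps to the
# labels of its analysis candidates.
candidates = {
    "P1": ["P1-a", "P1-b"],
    "P2": ["P2-a", "P2-b"],
}


def parallel_combinations(cands):
    """Enumerate every pairing of one candidate per parallel part."""
    return list(product(*cands.values()))


combos = parallel_combinations(candidates)
for combo in combos:
    print(combo)
```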

Then, in step S405, the combinations created in step S404 are extracted one at a time.

In step S406, the parallel similarity value assignment unit 504 calculates a parallel similarity value, which indicates the similarity between the analysis candidates constituting the combination created in step S404, and assigns it to that pair of analysis candidates.

The parallel similarity value is calculated as the ratio of the number of matches of a predetermined condition to the total number of phrases constituting the analysis results. The number of matches is the number of pairs of phrases, appearing at the same position in each analysis result, for which both the part of speech and the dependency direction match. For example, (A) of FIG. 9 shows a dependency from the noun “text” to the verb “storing” (left side) and a dependency from the gerund “ranking” to the noun “data” (right side). In (A) of FIG. 9, neither the part of speech nor the dependency direction matches. (B) of FIG. 9 shows a dependency from the gerund “storing” to the noun “means” (left side) and a dependency from the noun “part” to the verb “ranking” (right side). In (B) of FIG. 9, again neither the part of speech nor the dependency direction matches. (C) of FIG. 9 shows a dependency from the noun phrase “for new data” to the noun “means” (left side) and a dependency from the noun phrase “for the stored data” to the noun “part” (right side). In (C) of FIG. 9, both the part of speech and the dependency direction match.

Therefore, for the combination of the analysis candidates (P1-a) and (P2-a), the total number of phrases constituting the two analysis candidates is 6 and the number of dependency-direction matches is 1, as shown in (A) of FIG. 10. Expressed as a percentage, the parallel similarity value is (1/6) × 100 = 17 (%). Similarly, for the combination of (P1-a) and (P2-b), the total number of phrases is 6 and the number of matches is 3, as shown in (B) of FIG. 10, so the parallel similarity value is (3/6) × 100 = 50 (%). For the combination of (P1-b) and (P2-a), the total number of phrases is 6 and the number of matches is 3, as shown in (C) of FIG. 10, so the parallel similarity value is (3/6) × 100 = 50 (%). For the combination of (P1-b) and (P2-b), the total number of phrases is 6 and the number of matches is 1, as shown in (D) of FIG. 10, so the parallel similarity value is (1/6) × 100 = 17 (%).
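The calculation above can be sketched as a position-wise comparison. The encoding below is an assumption made purely for illustration: each analysis candidate is a list of (part of speech, dependency direction) tuples in positional order, and the labels are invented rather than read from FIGS. 9 and 10.

```python
def parallel_similarity(cand_a, cand_b):
    """Percentage of position-wise matches over the total phrase count.

    A position matches when both the part of speech and the dependency
    direction agree. The denominator is the phrase count of both
    candidates together, as in the worked example (3 + 3 = 6).
    """
    matches = sum(1 for a, b in zip(cand_a, cand_b) if a == b)
    total = len(cand_a) + len(cand_b)
    return round(100 * matches / total) if total else 0


# Hypothetical encodings (not the actual contents of the figures).
left = [("noun", "right"), ("gerund", "right"), ("noun_phrase", "left")]
one_match = [("gerund", "left"), ("noun", "left"), ("noun_phrase", "left")]
all_match = list(left)

print(parallel_similarity(left, one_match))
print(parallel_similarity(left, all_match))
```

Because the denominator counts the phrases of both candidates, two fully matching three-phrase candidates score 50 rather than 100 under this definition, which agrees with the figures quoted in the text.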

In step S407, the total value calculation unit 508 calculates a total value based on the parallel similarity value calculated in step S406 and the corpus values calculated in step S403.

The total value is calculated, for example, as (constant γ) × (parallel similarity value) + (constant δ) × (sum of the corpus values). Here, γ and δ are determined empirically; for example, γ = 1 and δ = 0.1. For the combination of the analysis candidates (P1-a) and (P2-a), as shown in FIG. 11, the parallel similarity value is 17, the corpus value of (P1-a) is 614, and the corpus value of (P2-a) is 23. The total value calculated with the above formula is therefore 81. Similarly, for the combination of (P1-a) and (P2-b), the parallel similarity value is 50, the corpus value of (P1-a) is 614, and the corpus value of (P2-b) is 925, so the total value is 204. For the combination of (P1-b) and (P2-a), the parallel similarity value is 50, the corpus value of (P1-b) is 7, and the corpus value of (P2-a) is 23, so the total value is 53. For the combination of (P1-b) and (P2-b), the parallel similarity value is 17, the corpus value of (P1-b) is 7, and the corpus value of (P2-b) is 925, so the total value is 110.
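The total value formula can be checked with a short sketch. The dictionaries and function name are assumptions for illustration; the numbers are the corpus values and parallel similarity values of the worked example, and results are rounded to the nearest integer as the quoted totals appear to be.

```python
GAMMA = 1.0  # example weight for the parallel similarity value
DELTA = 0.1  # example weight for the summed corpus values

# Corpus values and parallel similarity values from the worked example.
corpus = {"P1-a": 614, "P1-b": 7, "P2-a": 23, "P2-b": 925}
similarity = {
    ("P1-a", "P2-a"): 17,
    ("P1-a", "P2-b"): 50,
    ("P1-b", "P2-a"): 50,
    ("P1-b", "P2-b"): 17,
}


def total_value(combo, sim, corpus_values, gamma=GAMMA, delta=DELTA):
    """gamma * similarity + delta * (sum of the corpus values in the combination)."""
    return gamma * sim + delta * sum(corpus_values[c] for c in combo)


totals = {combo: round(total_value(combo, sim, corpus))
          for combo, sim in similarity.items()}
for combo, value in totals.items():
    print(combo, value)
```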

Then, in step S408, it is determined whether the total value has been calculated for every combination up to the last one. Once the total value has been calculated for the last combination (step S408: YES), in step S409 the candidate selection unit 509 selects, based on the total values calculated in step S407, the analysis candidate to be applied in translating each document part from among the plurality of analysis candidates created in step S402. In the example shown in FIG. 11, the combination of the analysis candidates (P1-a) and (P2-b), which has the largest of the calculated total values, is selected.
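Selection in step S409 amounts to taking the combination with the largest total value. A minimal sketch, assuming the totals of the worked example are held in a dictionary keyed by combination (the function name is hypothetical, and tie-breaking is not addressed in the text):

```python
# Total values of the four combinations in the worked example.
totals = {
    ("P1-a", "P2-a"): 81,
    ("P1-a", "P2-b"): 204,
    ("P1-b", "P2-a"): 53,
    ("P1-b", "P2-b"): 110,
}


def select_best(total_values):
    """Return the combination with the maximum total value.

    On ties, max() returns the first maximal entry in insertion order;
    this is an arbitrary choice made for the sketch.
    """
    return max(total_values, key=total_values.get)


best = select_best(totals)
print(best)
```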

In step S410, the component translation unit 510 translates each document part based on the analysis candidate selected in step S409.

Then, in step S411, the combining unit 511 combines the translated document parts produced in step S410 and outputs the result as an output sentence such as the one shown in FIG. 12.

As described above, according to the first embodiment, when each document part obtained by dividing the sentence to be translated is translated, an analysis candidate for each document part is selected based on the parallel similarity value between document parts in a parallel relationship and on the corpus value for the field corresponding to the sentence type. This selection resolves ambiguity while taking into account how the document parts are arranged across the whole sentence to be translated, and thereby improves translation accuracy.

For example, 30% or more of a patent document consists of long sentences of 40 or more morphemes. A parallel structure appears in 90% or more of such long input sentences, so using parallel-structure information is effective.

FIG. 13 is a diagram illustrating the functional blocks of the machine translation apparatus according to the second embodiment. FIG. 14 is a diagram illustrating an example in which no parallel structure exists between document parts. FIG. 15 is a diagram illustrating an example in which a parallel structure exists between document parts.

In FIG. 13, the machine translation apparatus 500B translates a source text written in a first language into a target text written in a second language. The machine translation apparatus 500B includes an input sentence division unit 501, a multiple analysis candidate creation unit 502, a parallel structure presence/absence determination unit 1501, and a first processing control unit 1502. The machine translation apparatus 500B further includes a parallel combination creation unit 503, a parallel similarity value assignment unit 504, a frequency acquisition unit 505, a corpus value assignment unit 506, a total value calculation unit 508, a candidate selection unit 509, a component translation unit 510, and a combining unit 511.

The units other than the parallel structure presence/absence determination unit 1501 and the first processing control unit 1502 are the same as those of the machine translation apparatus 500A described with reference to FIG. 3, and their description is omitted.

The parallel structure presence/absence determination unit 1501 determines whether a parallel structure exists between the analysis candidates created by the multiple analysis candidate creation unit 502. For example, as shown in FIG. 14, when an input sentence 100 such as “A device comprising text storing means for new data.” is matched against the pattern “[part P0] comprising [part P1]”, there is no document part that forms a parallel structure with “part P0” or “part P1”. In contrast, as shown in FIG. 15, when an input sentence 100 such as “A device comprising text storing means for new data and data ranking part for the stored data.” is matched against the pattern “[part P0] comprising [part P1] and [part P2]”, “part P1” and “part P2” are document parts that form a parallel structure.

When the parallel structure presence/absence determination unit 1501 determines that a parallel structure exists, the first processing control unit 1502 performs control so that the processing of the parallel combination creation unit 503 and the parallel similarity value assignment unit 504 is carried out.

As a result, in the machine translation apparatus 500B, the process described with reference to FIG. 4, which selects an appropriate analysis candidate from a plurality of analysis candidates for the document parts constituting a parallel structure, is executed only for input sentences that have a parallel structure. Translation according to appropriate analysis candidates can therefore be performed more efficiently for the sentences that actually need this processing, namely those with a parallel structure.
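The determination and gating of the second embodiment can be sketched as follows. A regular expression stands in here for the pattern matching of the input sentence division unit; this is an assumption for illustration only, and it is deliberately naive (it would split on any “ and ” after “comprising”). All names are hypothetical.

```python
import re

# Stand-in for the patterns "[P0] comprising [P1]" and
# "[P0] comprising [P1] and [P2]".
PATTERN = re.compile(r"^(?P<p0>.+?) comprising (?P<p1>.+?)(?: and (?P<p2>.+))?$")


def split_parts(sentence):
    """Return the matched document parts, omitting parts that are absent."""
    m = PATTERN.match(sentence.strip().rstrip("."))
    if m is None:
        return {}
    return {name: text for name, text in m.groupdict().items() if text is not None}


def has_parallel_structure(parts):
    """P1 and P2 form a parallel structure only when both are present."""
    return "p1" in parts and "p2" in parts


single = split_parts("A device comprising text storing means for new data.")
parallel = split_parts(
    "A device comprising text storing means for new data "
    "and data ranking part for the stored data."
)

for parts in (single, parallel):
    if has_parallel_structure(parts):
        print("run combination and similarity steps:", parts["p1"], "|", parts["p2"])
    else:
        print("skip parallel processing")
```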

FIG. 16 is a diagram illustrating the functional blocks of the machine translation apparatus according to the third embodiment.
In FIG. 16, the machine translation apparatus 500C translates a source text written in a first language into a target text written in a second language. The machine translation apparatus 500C includes an input sentence division unit 501, a multiple analysis candidate creation unit 502, a parallel combination creation unit 503, a parallel similarity value assignment unit 504, a parallel similarity value threshold determination unit 1601, a second processing control unit 1602, a frequency acquisition unit 505, a corpus value assignment unit 506, a total value calculation unit 508, a candidate selection unit 509, a component translation unit 510, and a combining unit 511.

The units other than the parallel similarity value threshold determination unit 1601 and the second processing control unit 1602 are the same as those of the machine translation apparatus 500A described with reference to FIG. 3, and their description is omitted.

The parallel similarity value threshold determination unit 1601 determines whether the parallel similarity value calculated by the parallel similarity value assignment unit 504 is equal to or greater than a predetermined threshold.

When the parallel similarity value threshold determination unit 1601 determines that the parallel similarity value is equal to or greater than the threshold, the second processing control unit 1602 performs control so that the total value calculation unit 508 calculates the total value.

As a result, only analysis candidates of input sentences with high parallel similarity, that is, analysis candidates that are highly similar across the document parts constituting the parallel structure, are made subject to analysis candidate selection, which reduces the processing performed by the total value calculation unit 508.
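The threshold gating of the third embodiment can be sketched as a filter applied before the total value is computed. The threshold of 30 is a hypothetical value chosen for illustration (the text only says the threshold is predetermined), and the similarity values are those of the worked example in the first embodiment.

```python
THRESHOLD = 30  # hypothetical; the specification does not give a value

# Parallel similarity values from the worked example of the first embodiment.
similarity = {
    ("P1-a", "P2-a"): 17,
    ("P1-a", "P2-b"): 50,
    ("P1-b", "P2-a"): 50,
    ("P1-b", "P2-b"): 17,
}


def filter_by_similarity(sim_values, threshold=THRESHOLD):
    """Keep only combinations whose parallel similarity is at or above the threshold."""
    return {combo: sim for combo, sim in sim_values.items() if sim >= threshold}


kept = filter_by_similarity(similarity)
print(sorted(kept))
```

With these numbers, only two of the four combinations would proceed to the total value calculation.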

FIG. 17 is a configuration diagram of an information processing apparatus.
The machine translation apparatuses 500A, 500B, and 500C of FIG. 1 can be realized using, for example, an information processing apparatus (computer) such as the one shown in FIG. 17. The information processing apparatus of FIG. 17 includes a CPU (Central Processing Unit) 1701, a memory 1702, an input device 1703, an output device 1704, an external recording device 1705, a medium driving device 1706, and a network connection device 1707. These are connected to one another by a bus 1708.

The memory 1702 is a semiconductor memory such as a ROM (Read Only Memory), a RAM (Random Access Memory), or a flash memory, and stores the programs and data used in the machine translation processing executed by the machine translation apparatuses 500A, 500B, and 500C. For example, the CPU 1701 performs the machine translation processing described above by executing a program using the memory 1702.

The input device 1703 is, for example, a keyboard or a pointing device, and is used by the operator to enter instructions and information. The output device 1704 is, for example, a display device, a printer, or a speaker, and is used to present inquiries to the operator and to output processing results.

The external recording device 1705 is, for example, a magnetic disk device, an optical disk device, a magneto-optical disk device, or a tape device. Hard disk drives are also included among external recording devices 1705. The information processing apparatus can store programs and data in the external recording device 1705 and load them into the memory 1702 for use.

The medium driving device 1706 drives a portable recording medium 1709 and accesses its recorded contents. The portable recording medium 1709 is a memory device, a flexible disk, an optical disk, a magneto-optical disk, or the like. Portable recording media 1709 also include Compact Disk Read Only Memory (CD-ROM), Digital Versatile Disk (DVD), and Universal Serial Bus (USB) memory. The operator can store programs and data on the portable recording medium 1709 and load them into the memory 1702 for use.

Thus, the computer-readable recording media that store the programs and data used in the machine translation processing include physical (non-transitory) recording media such as the memory 1702, the external recording device 1705, and the portable recording medium 1709.

The network connection device 1707 is a communication interface that is connected to a communication network 1710 and performs the data conversion that accompanies communication. The information processing apparatus can receive programs and data from an external apparatus via the network connection device 1707 and load them into the memory 1702 for use.

Although the disclosed embodiments and their advantages have been described in detail, those skilled in the art will be able to make various changes, additions, and omissions without departing from the scope of the invention as clearly set forth in the claims.

The following supplementary notes are further disclosed with respect to the embodiments described with reference to the drawings.
(Appendix 1)
A machine translation method executed by a computer of a machine translation apparatus that translates a source text written in a first language into a target text written in a second language, the method comprising:
dividing the source text based on a predetermined rule to create document parts;
creating one or more analysis candidates for each of the created document parts;
when a parallel structure is recognized between the created document parts, creating combinations of the analysis candidates created for the respective document parts constituting the parallel structure;
calculating a parallel similarity value indicating the similarity between the analysis candidates constituting each created combination;
calculating, based on an analyzed corpus that is a result of past analysis, a corpus value indicating the degree of occurrence of each created analysis candidate; and
selecting, based on the calculated parallel similarity value and the calculated corpus value, the analysis candidate to be applied in translating the document part to be translated from among the one or more created analysis candidates.
(Appendix 2)
The machine translation method according to Appendix 1, further comprising:
determining whether a parallel structure exists between the created analysis candidates; and
performing control so that the combinations of the analysis candidates are created when it is determined that the parallel structure exists.
(Appendix 3)
The machine translation method according to Appendix 1, further comprising:
determining whether the calculated parallel similarity value is equal to or greater than a predetermined threshold; and
performing control so that the corpus value is calculated for the analysis candidates whose parallel similarity value is determined to be equal to or greater than the threshold.
(Appendix 4)
A machine translation program that causes a computer of a machine translation apparatus, which translates a source text written in a first language into a target text written in a second language, to execute a process comprising:
dividing the source text based on a predetermined rule to create document parts;
creating one or more analysis candidates for each of the created document parts;
when a parallel structure is recognized between the created document parts, creating combinations of the analysis candidates created for the respective document parts constituting the parallel structure;
calculating a parallel similarity value indicating the similarity between the analysis candidates constituting each created combination;
calculating, based on an analyzed corpus that is a result of past analysis, a corpus value indicating the degree of occurrence of each created analysis candidate; and
selecting, based on the calculated parallel similarity value and the calculated corpus value, the analysis candidate to be applied in translating the document part to be translated from among the one or more created analysis candidates.
(Appendix 5)
The machine translation program according to Appendix 4, wherein the process further comprises:
determining whether a parallel structure exists between the created analysis candidates; and
performing control so that the combinations of the analysis candidates are created when it is determined that the parallel structure exists.
(Appendix 6)
The machine translation program according to Appendix 4, wherein the process further comprises:
determining whether the calculated parallel similarity value is equal to or greater than a predetermined threshold; and
performing control so that the corpus value is calculated for the analysis candidates whose parallel similarity value is determined to be equal to or greater than the threshold.
(Appendix 7)
A machine translation apparatus that translates a source text written in a first language into a target text written in a second language, the apparatus comprising:
an input sentence division unit that divides the source text based on a predetermined rule to create document parts;
a multiple analysis candidate creation unit that creates one or more analysis candidates for each of the document parts created by the input sentence division unit;
a parallel combination creation unit that, when a parallel structure is recognized between the document parts created by the input sentence division unit, creates combinations of the analysis candidates created for the respective document parts constituting the parallel structure;
a parallel similarity value assignment unit that calculates a parallel similarity value indicating the similarity between the analysis candidates constituting each combination created by the parallel combination creation unit;
a corpus value assignment unit that calculates, based on an analyzed corpus that is a result of past analysis, a corpus value indicating the degree of occurrence of each analysis candidate created by the multiple analysis candidate creation unit; and
a candidate selection unit that selects, based on the parallel similarity value calculated by the parallel similarity value assignment unit and the corpus value calculated by the corpus value assignment unit, the analysis candidate to be applied in translating the document part to be translated from among the one or more analysis candidates created by the multiple analysis candidate creation unit.
(Appendix 8)
The machine translation apparatus according to Appendix 7, further comprising:
a parallel structure presence/absence determination unit that determines whether a parallel structure exists between the analysis candidates created by the multiple analysis candidate creation unit; and
a first processing control unit that, when the parallel structure presence/absence determination unit determines that a parallel structure exists, performs control so that the parallel combination creation unit creates the combinations of analysis candidates.
(Appendix 9)
The machine translation apparatus according to Appendix 7, further comprising:
a parallel similarity value threshold determination unit that determines whether the parallel similarity value calculated by the parallel similarity value assignment unit is equal to or greater than a predetermined threshold; and
a second processing control unit that performs control so that the corpus value is calculated for the analysis candidates whose parallel similarity value is determined by the parallel similarity value threshold determination unit to be equal to or greater than the threshold.

100 Input sentence
110, 120, 130 Document part
120A, 120B, 130A, 130B Analysis candidate
121, 122, 123, 124, 131, 132, 133, 134 Phrase
200 Pattern
210, 220, 230, 240, 250 Translation part
500A, 500B, 500C Machine translation apparatus
501 Input sentence division unit
502 Multiple analysis candidate creation unit
503 Parallel combination creation unit
504 Parallel similarity value assignment unit
505 Frequency acquisition unit
506 Corpus value assignment unit
507 Analyzed corpus database (DB)
508 Total value calculation unit
509 Candidate selection unit
510 Component translation unit
511 Combining unit
1501 Parallel structure presence/absence determination unit
1502 First processing control unit
1601 Parallel similarity value threshold determination unit
1602 Second processing control unit
1701 CPU (Central Processing Unit)
1702 Memory
1703 Input device
1704 Output device
1705 External recording device
1706 Medium driving device
1707 Network connection device
1708 Bus
1709 Portable recording medium
1710 Communication network

Claims

A machine translation method executed by a computer of a machine translation apparatus that translates a source text written in a first language into a target text written in a second language, the method comprising:
dividing the source text based on a predetermined rule to create document parts;
creating one or more analysis candidates for each of the created document parts;
when a parallel structure is recognized between the created document parts, creating combinations of the analysis candidates created for the respective document parts constituting the parallel structure;
calculating a parallel similarity value indicating the similarity between the analysis candidates constituting each created combination;
calculating, based on an analyzed corpus that is a result of past analysis, a corpus value indicating the degree of occurrence of each created analysis candidate; and
selecting, based on the calculated parallel similarity value and the calculated corpus value, the analysis candidate to be applied in translating the document part to be translated from among the one or more created analysis candidates.

The machine translation method according to claim 1, further comprising:
determining whether a parallel structure exists between the created analysis candidates; and
performing control so that the combinations of the analysis candidates are created when it is determined that the parallel structure exists.

The machine translation method according to claim 1, further comprising:
determining whether the calculated parallel similarity value is equal to or greater than a predetermined threshold; and
performing control so that the corpus value is calculated for the analysis candidates whose parallel similarity value is determined to be equal to or greater than the threshold.
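The threshold control in this claim can be sketched as a simple filter: the corpus value is looked up only for combinations whose parallel similarity value reaches the threshold, which avoids corpus lookups for combinations that would be discarded anyway. The function name `corpus_values_above_threshold`, the input format (pairs of a combination and its already calculated similarity), and the summing of per-candidate corpus values are assumptions for illustration, not details taken from the claim.

```python
from typing import Callable, List, Sequence, Tuple

Combo = Tuple[str, ...]  # hypothetical: a combination of candidate identifiers

def corpus_values_above_threshold(
    scored: Sequence[Tuple[Combo, float]],
    threshold: float,
    corpus_value: Callable[[str], float],
) -> List[Tuple[Combo, float, float]]:
    """Calculate corpus values only for combinations at or above the threshold."""
    results = []
    for combo, similarity in scored:
        if similarity < threshold:
            continue  # below the threshold: no corpus value is calculated
        total_corpus = sum(corpus_value(c) for c in combo)
        results.append((combo, similarity, total_corpus))
    return results
```

A combination whose similarity equals the threshold is kept, matching the "equal to or greater than" wording of the claim.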

A machine translation program that causes a computer of a machine translation device, which translates a sentence to be translated, written in a first language, into a translated sentence written in a second language, to execute a process comprising:
creating document parts by dividing the sentence to be translated based on a predetermined rule;
creating one or more analysis candidates for each of the created document parts;
when a parallel structure is recognized between the created document parts, creating combinations of the analysis candidates created for the respective document parts constituting the parallel structure;
calculating a parallel similarity value indicating the similarity between the analysis candidates constituting each created combination;
calculating, based on an analyzed corpus that is a result of past analysis, a corpus value indicating the degree of appearance of each created analysis candidate; and
selecting, based on the calculated parallel similarity value and the calculated corpus value, an analysis candidate to be applied when translating a document part to be translated from among the created one or more analysis candidates.

A machine translation device that translates a sentence to be translated, written in a first language, into a translated sentence written in a second language, the device comprising:
an input sentence dividing unit that creates document parts by dividing the sentence to be translated based on a predetermined rule;
a multiple analysis candidate creation unit that creates one or more analysis candidates for each of the document parts created by the input sentence dividing unit;
a parallel combination creation unit that, when a parallel structure is recognized between the document parts created by the input sentence dividing unit, creates combinations of the analysis candidates created for the respective document parts constituting the parallel structure;
a parallel similarity value giving unit that calculates a parallel similarity value indicating the similarity between the analysis candidates constituting each combination created by the parallel combination creation unit;
a corpus value giving unit that calculates, based on an analyzed corpus that is a result of past analysis, a corpus value indicating the degree of appearance of each analysis candidate created by the multiple analysis candidate creation unit; and
a candidate selection unit that selects, based on the parallel similarity value calculated by the parallel similarity value giving unit and the corpus value calculated by the corpus value giving unit, an analysis candidate to be applied when translating a document part to be translated from among the one or more analysis candidates created by the multiple analysis candidate creation unit.
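One way to picture how the units of the device fit together is the Python sketch below, in which each unit is a pluggable callable on a small class. This is only an illustrative wiring under stated assumptions: the class name `MachineTranslationDevice`, the field names, the `is_parallel` predicate, the fallback to the corpus value alone when no parallel structure is recognized, and the unweighted sum used for selection are all hypothetical and are not specified by the claim.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Sequence, Tuple

Part = str        # hypothetical: a document part is plain text
Candidate = str   # hypothetical: an analysis candidate is a labeled string

@dataclass
class MachineTranslationDevice:
    divide: Callable[[str], List[Part]]                    # input sentence dividing unit
    create_candidates: Callable[[Part], List[Candidate]]   # multiple analysis candidate creation unit
    is_parallel: Callable[[Sequence[Part]], bool]          # assumed parallel-structure check
    similarity: Callable[[Sequence[Candidate]], float]     # parallel similarity value giving unit
    corpus_value: Callable[[Candidate], float]             # corpus value giving unit

    def select(self, sentence: str) -> Tuple[Candidate, ...]:
        """Candidate selection unit: pick one analysis candidate per document part."""
        parts = self.divide(sentence)
        per_part = [self.create_candidates(p) for p in parts]
        if not self.is_parallel(parts):
            # Assumption: without a parallel structure, use the corpus value alone.
            return tuple(max(cands, key=self.corpus_value) for cands in per_part)
        # Parallel combination creation unit: every combination across the parts.
        combos = product(*per_part)
        # Assumption: maximize the unweighted sum of both values.
        return max(
            combos,
            key=lambda combo: self.similarity(combo)
            + sum(self.corpus_value(c) for c in combo),
        )
```

Keeping each unit as a separate callable mirrors the unit-by-unit structure of the claim and lets any single unit (for example, the similarity measure) be swapped without touching the rest.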