JP2021022203A

JP2021022203A - Translation evaluation device, program for controlling translation evaluation device, and translation evaluation method using translation evaluation device

Info

Publication number: JP2021022203A
Application number: JP2019138768A
Authority: JP
Inventors: 豊椿; Yutaka Tsubaki
Original assignee: Tsubaki IP Service Co Ltd
Current assignee: Tsubaki IP Service Co Ltd
Priority date: 2019-07-29
Filing date: 2019-07-29
Publication date: 2021-02-18

Abstract

To provide a translation evaluation device capable of easily evaluating a translated sentence, for example by checking whether any part of the translation has been omitted. SOLUTION: A translation evaluation device evaluates the result of translating a sentence written in a first language into a sentence written in a second language. The sentence written in the first language includes a plurality of character strings. The translation evaluation device includes: determination means for determining whether a character string corresponding to a character string included in the first language is included in the sentence written in the second language; and presentation means for presenting the determination result of the determination means. SELECTED DRAWING: Figure 1

Description

The present invention relates to a translation evaluation device, a control program for the translation evaluation device, and a translation evaluation method using the translation evaluation device.

Research on the techniques underlying machine translation (MT), such as morphological analysis and dependency analysis, began in the 1950s. In the 1980s, as computer performance improved, rule-based machine translation technology reached a certain level of practicality.

In the 1990s, the method of statistical machine translation (STM: Statistical Base Machine Translation) was developed. This method statistically processes the correspondences between words and sentence structures in different languages, and selects from the translation candidates the words and translated sentences with the highest probability of being the translation. In statistical machine translation, a translation model determines, according to probability, the words corresponding to the original words and the order of those words. In addition, a language model determines, according to probability, the correct arrangement of the translated words. Statistical machine translation requires a large amount of bilingual data (training data) to build the translation engine (the translation model and the language model). The translation engine is tuned as appropriate by comparing the sentences it produces with the correct translations.

In statistical machine translation, a technique is known in which, in order to determine the word order correctly, the sentences in the source language are parsed and the dependency relationships are used to rearrange the words of the source language, in advance, into the word order of the target language.

In the 2010s, neural machine translation (NMT), which uses deep learning with neural networks, appeared. It has been put into practical use rapidly since around 2015 and is replacing statistical machine translation. Neural machine translation generally does not preprocess the source text (for example, by rearranging the word order), and instead translates a sentence directly (end to end) with a neural network.

In neural machine translation, words are handled as real-valued vectors, that is, as distributed representations with several hundred dimensions. The vector expresses the meaning of a word and its syntactic information (word embedding). Because the semantic relationship between one source word and its translation makes it possible to infer, to some extent, the semantic relationship between another source word and its translation, flexible translation is possible.

In machine translation, the length of the input data (the length of the source text) is not constant, and data input in the past must be used. For this reason, neural machine translation generally uses a recurrent neural network (RNN). The words (or characters) of the source text are input sequentially into the recurrent neural network, and when a code indicating the end of the sentence (EOS: End of String) is output, the translation of one sentence is complete.

In neural machine translation, the weighting coefficients of the individual neurons (nodes) constitute the translation knowledge, so a table for translation (a table that associates source words with translated words, such as a phrase table) is not required. Unlike statistical machine translation, neural machine translation does not translate by replacing and rearranging the constituent words of the source text. Rather, it can be said to generate a new translated sentence from the input text so that the result is consistent with the learned language model.

Japanese Unexamined Patent Application Publication No. 2018-120584 (JP-A-2018-120584)

Compared with rule-based machine translation and statistical machine translation, neural machine translation has the advantages that its translations are fluent and that it can flexibly translate even sentences it has not learned (by means of word embedding and the like).

However, unlike rule-based machine translation and statistical machine translation, neural machine translation is not a technology that translates by replacing parts of the source text. It is therefore difficult to translate the information contained in the input sentence strictly, with nothing added and nothing left out (omissions in the translation may occur). In addition, duplicated portions of the translation were sometimes output.

Furthermore, neural machine translation cannot use the technique, employed in statistical machine translation, of rearranging the words of the source text into the word order of the target language in advance. As a result, it is difficult to make use of the learning data that has been used so far (dictionary data, word order data, and the like).

Furthermore, while word embedding enables flexible translation in neural machine translation, a word with a completely different meaning may be selected, producing a translated sentence with a completely different meaning (this applies especially to words and proper nouns that appear infrequently). In addition, for long or complicated sentences, a translated sentence that makes no sense was sometimes produced.

Furthermore, in neural machine translation in general, unlike rule-based machine translation and statistical machine translation, it is difficult for humans to understand the translation process, and it is hard to predict whether an accurate translation will be output. As a result, neural machine translation is prone to mistranslation, and mistranslated sentences are easily delivered as the final product.

For everyday conversation, where grasping the gist is enough even if there are some mistranslations or omissions, and for documents that convey information of little importance, the fluency and flexibility of the generated translation may matter more than the accuracy of its content. Neural machine translation is extremely useful for such documents.

On the other hand, for documents that require strict accuracy, such as legal and patent documents, mistranslations and omissions are fatal. For this reason, even when neural machine translation was used, considerable time and effort had to be spent checking for mistranslations and omissions. In addition, insufficient checking could lead to fatal mistranslations or omissions and thus to financial damage.

Another issue with neural machine translation is that the handling of character decorations and styles, such as superscripts, subscripts, italics, bold, and underlined characters, can cause problems. In a text, a superscript, for example, may be indicated in HTML by enclosing ordinary characters in <sup> ... </sup> tags, or it may be expressed directly by a character code such as Unicode. If these representations are mixed in the data used for machine learning, or if character decorations and styles are handled differently in the document at translation time than they were at learning time, an incorrect translation result may be output.
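The mixture of the two superscript representations described above can be illustrated with a small normalization step. The following is a minimal sketch, not something the specification itself describes: it assumes that only superscript digits matter and that `<sup>` markup is the chosen common form. The function name `normalize_superscripts` is hypothetical.

```python
import re

# Unicode superscript digits and their plain counterparts (digits only;
# other superscript characters are outside the scope of this sketch).
_SUPERSCRIPT_DIGITS = "⁰¹²³⁴⁵⁶⁷⁸⁹"
_TO_PLAIN = str.maketrans(_SUPERSCRIPT_DIGITS, "0123456789")
_SUPERSCRIPT_RUN = re.compile(f"[{_SUPERSCRIPT_DIGITS}]+")


def normalize_superscripts(text: str) -> str:
    """Rewrite runs of Unicode superscript digits as <sup>...</sup> markup.

    Text that already uses <sup> tags is left unchanged, so both input
    styles end up in the same representation.
    """
    def to_markup(match: re.Match) -> str:
        return "<sup>" + match.group(0).translate(_TO_PLAIN) + "</sup>"

    return _SUPERSCRIPT_RUN.sub(to_markup, text)
```

Applying the same normalization to both the training data and the text submitted for translation is one conceivable way to keep the two representations from being mixed.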

Superscripts and subscripts are used in chemical symbols, chemical formulas, mathematical formulas, variables, and the like. In neural machine translation, however, the effect of word embedding may cause a chemical symbol, chemical formula, mathematical formula, or variable that looks similar but is completely different to be output as the translation result. Because errors in chemical symbols, chemical formulas, mathematical formulas, and variables are hard to spot at a glance, they often remain as mistranslations in the translated deliverable.

Furthermore, neural machine translation has the problem that the same word in the source text may be translated into several different words. For example, the word "address" in the original document, although it has the same meaning throughout, may be translated as 「アドレス」, 「住所」, or 「宛名」 depending on where it appears. Similarly, the word 「表示部」 in the original document, although it refers to the same thing throughout, may be translated as "display unit", "display portion", or "display means" depending on where it appears. Such inconsistency in wording, and the output of words with different meanings, is undesirable, particularly in the translation of legal and patent documents, which demand strictness.
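The inconsistency described above can be detected mechanically once the candidate renderings of a term are known. The following is a minimal sketch under that assumption. The function name `find_inconsistent_terms` and the `renderings` mapping are hypothetical and do not appear in the specification.

```python
def find_inconsistent_terms(
    translation: str, renderings: dict[str, list[str]]
) -> dict[str, list[str]]:
    """Report source terms that were rendered in more than one way.

    `renderings` maps a source term to the target-language renderings
    that might appear for it. Matching is a case-insensitive substring
    search, which is a simplification.
    """
    lowered = translation.lower()
    report: dict[str, list[str]] = {}
    for source_term, candidates in renderings.items():
        used = [c for c in candidates if c.lower() in lowered]
        if len(used) > 1:
            report[source_term] = used
    return report


if __name__ == "__main__":
    text = (
        "The display unit shows an image. "
        "The display portion is connected to a controller."
    )
    candidates = {"表示部": ["display unit", "display portion", "display means"]}
    print(find_inconsistent_terms(text, candidates))
```

A term with a single rendering, or with none, is not reported. Only terms with two or more renderings in the same translation are flagged.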

Furthermore, neural machine translation requires a huge number of bilingual sentence pairs to be prepared for learning. Moreover, because the amount of computation needed for learning is large, learning requires a great deal of time and computer resources (GPUs and the like).

To reduce the cost of neural machine translation, it is conceivable to have the translation performed by a trained external computer that is not dedicated to one's own company (an Internet-connected site, server, or the like operated by another company or another person acting as a service provider). In this arrangement, one provider offers a trained neural machine translation system on the Internet, and multiple users (an unspecified large number of users) use that system over the Internet. Use by an unspecified large number of users here means that the site (server) is open to the public and can be accessed from anywhere.

When such a system is used, the user sends document data over the Internet to a trained neural machine translation computer operated by another company, and receives over the Internet the result of the machine translation performed on that server. Transmission and reception can be performed using an API (Application Programming Interface) provided by the system provider, or using a general communication protocol such as HTTP or HTTPS.

Transmission and reception can also be performed via a web browser serving as the user interface. In this case, the user pastes the sentence (text) to be translated into a form displayed in the web browser, for example by copy and paste, and sends it to the external neural machine translation computer. The translation result returned by the neural machine translation computer is displayed in the web browser. A format such as JSON (JavaScript Object Notation) is used as the data exchange format for these communications.

With a shared neural machine translation system of this kind, the system user does not need to prepare a huge number of bilingual sentence pairs for learning, and does not need to spend a great deal of time on learning or on maintenance. The system user also does not need to own the computer resources (servers, GPUs, and the like) for building the system. In other words, the user can use a neural machine translation service while reducing the time required to train the neural network, the hardware cost, and the cost and effort of maintenance to almost zero, which is convenient.

On the other hand, when translating highly confidential documents (for example, legal documents such as contracts, and in particular unpublished patent documents, internal confidential documents, and research and development documents), sending such documents to an outside party over the Internet must be avoided.

Even if the communication path is encrypted, for example by using HTTPS, the transmitted document is naturally converted back into plaintext at the destination. Therefore, as long as a service provided by another company is used, sending highly confidential documents over the Internet must ultimately be avoided. Ensuring security by encrypting the communication path and the security risk of handing information that should be kept secret to an outside party (such as another company) are problems of different dimensions and must be addressed separately and independently.

In this regard, it is conceivable to conclude a confidentiality agreement with the destination of the document (the provider of the neural machine translation service on the Internet). However, in a networked society information leaks easily, and leaks are hard to detect, so relying on a confidentiality agreement to keep information secret is impractical and dangerous. Nor is reliable performance of such a confidentiality agreement guaranteed. Furthermore, it is difficult to supervise and verify on a regular basis that the confidentiality agreement continues to be honored.

Today, personal computers and smartphones have become familiar tools, and the Internet and web browsers have become daily necessities that anyone can use casually. At the same time, partly because of a lack of understanding of Internet systems, communication protocols, and the paths along which information travels, there is no end to cases in which highly confidential documents are carelessly sent to machine translation services on the Internet. In addition, the risk of information leakage from using machine translation services on the Internet is often overlooked.

In web-based translation services and dictionary services, the user enters the sentence or word to be translated from a web browser or the like (input methods include typing, voice input, and copy and paste). The entered sentences and words are sent to a server owned (or managed) by the provider of the translation service or dictionary service, and are stored together with information about the sender.

Providers of translation services and dictionary services can define a usage policy (terms of use) for the stored information, but some providers do not define one. In addition, some terms of use contain a clause to the effect that "when a user uploads, provides, transmits, or otherwise submits content, the user grants the service provider (and third parties working with the service provider) a worldwide license to use, store, reproduce, modify, create derivative works of, (publicly) transmit, publish, publicly display, and distribute that content."

There are many cases in which highly confidential text leaks to outside parties (especially overseas) because a translator at a translation company, or a freelance translator who has received a job from a translation company, carelessly enters highly confidential text (or some of its sentences or words) into a web browser on a personal computer or smartphone. A further problem is that such leaks continue for years without being noticed by the translator, the translator's manager, the translation company, or the client who ordered the translation.

In particular, leaking highly confidential documents such as contracts, research and development documents, and unpublished patent documents to outside parties (especially abroad) must be avoided absolutely, not only to protect the interests of an individual company but also to protect national interests.

For example, the press release issued by the Information-technology Promotion Agency, Japan on February 20, 2015, titled "[Alert] Beware of unintended leakage of information entered into cloud services", states the following. (Quoted from https://www.ipa.go.jp/about/press/20150220.html.)

『IPA (Information-technology Promotion Agency, Japan; Chairman: Kazumasa Fujie) has decided to issue a renewed alert about the mindset needed when using cloud services, with the aim of raising user awareness, after it came to light that text entered into an online translation service had been published on the Internet as it was. In recent years, a wide variety of cloud services have become available, and the use of services aimed at individuals as well as businesses is growing. For example, there are services that store data such as photos and documents on the Internet so that they can be used anytime and anywhere, and services that make tasks such as translation easy. However, while cloud services have spread rapidly because of their convenience, problems of unintended information leakage, caused by users using a service without correctly understanding its content and risks, have been pointed out repeatedly (*1).

In 2013, a problem occurred in which users of Google Groups, one of the services provided by Google, had not correctly understood the setting for the scope of information disclosure, leaving their exchanges viewable by people other than those concerned. In addition, in an IPA survey (*2), 31.3% of all respondents answered "not concerned at all", "not very concerned", or the like regarding the fact that "information entered into a browser, search history, and so on are collected by the company that provides the browser", which also shows that users' awareness and knowledge of cloud services need to be improved.

Service providers, for their part, are also required to explain the content of their services and how information is handled on the service side in an easy-to-understand manner, in order to prevent information leakage caused by insufficient explanation to users or insufficient understanding on the part of users.』

In addition, the "February 2014 Call for Attention" issued by the Information-technology Promotion Agency, Japan states the following. (Quoted from https://www.ipa.go.jp/security/txt/2014/02outline.html.)

『"Are you leaking information to outside parties without realizing it?": Key points in using cloud services (part omitted)

While cloud services are convenient to use, handing some information over to the service provider is unavoidable.

The three cloud services in Table 1 below are services that people use casually in their everyday work. In particular, an "online translation service" lets you translate text easily without installing translation software on your computer, but it can also be said that you are "handing the source text itself over to the provider".

When handling confidential information at work, please be aware of this and use such services with care.

Table 1: Ways of using cloud services that carry a risk of leading to information leakage

Online translation service

Overview: When the document to be translated is copied and pasted into a web page, the translation program on the service provider's side translates it automatically.

Risk when using: The content of the copied and pasted source document is sent to an outside party. If that content is confidential information, the user ends up violating company rules without realizing it.

Example of possible damage: An employee was negotiating a contract with an overseas company by encrypted email. Because the decrypted email was written in English, the employee used an online translation service to translate it into Japanese. As a result, the contract information leaked. Although encrypted email had been used specifically to prevent information leakage, the information leaked through a different route, namely online translation.』

Although alerts such as those above have been issued, the security awareness of people who use external translation systems (cloud-based systems and the like) remains low because such systems are so convenient.

An object of the present invention is to provide a translation evaluation device capable of easily evaluating a translated sentence.

To achieve the above object, according to one aspect of the present invention, in a translation evaluation device that evaluates the result of translating a sentence written in a first language into a sentence written in a second language, the sentence written in the first language includes a plurality of character strings, and the translation evaluation device includes determination means for determining whether a character string corresponding to a character string included in the first language is included in the sentence written in the second language, and presentation means for presenting the determination result of the determination means.
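The determination means and presentation means described above can be sketched as two small functions. This is a minimal illustration under stated assumptions, not the implementation in the specification: the correspondence between source and target character strings is assumed to be given as a dictionary, and the names `Evaluation`, `evaluate`, `present`, and `glossary` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Evaluation:
    """Determination result: target strings that should appear but do not."""

    missing: list[str]

    @property
    def missing_count(self) -> int:
        return len(self.missing)


def evaluate(source: str, translation: str, glossary: dict[str, str]) -> Evaluation:
    """Determination step (sketch).

    For every source string that occurs in the source sentence, check
    whether its corresponding target string occurs in the translation.
    Matching is a case-insensitive substring search, which is an assumption.
    """
    lowered = translation.lower()
    missing = [
        target
        for src, target in glossary.items()
        if src in source and target.lower() not in lowered
    ]
    return Evaluation(missing)


def present(result: Evaluation) -> str:
    """Presentation step (sketch): format the result for display."""
    if not result.missing:
        return "No omissions detected."
    return f"{result.missing_count} string(s) missing: " + ", ".join(result.missing)


if __name__ == "__main__":
    glossary = {"表示部": "display unit", "制御部": "control unit"}
    source = "表示部は画像を表示する。制御部は表示部を制御する。"
    translation = "The display unit displays an image."
    print(present(evaluate(source, translation, glossary)))
```

In this sketch a missing target string suggests a possible omission in the translation. A real system would also need to handle inflected forms and alternative renderings, which plain substring matching does not cover.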

Preferably, the sentence written in the second language is a sentence obtained by machine-translating the sentence written in the first language.

Preferably, the character string is any one of a character string indicating a word, a character string indicating a word together with the reference sign immediately following it, and a coded character string, or is a character string that includes any one of these.

Preferably, the translation evaluation device acquires the sentence written in the second language as the machine translation result of a sentence in which a character string indicating a certain word, or a character string including a character string indicating a certain word, in the sentence written in the first language has been replaced with another character string corresponding to it, and the determination means determines whether the character string after the replacement is included in the sentence written in the second language.
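The replacement-based variant above can be sketched as follows: selected words in the source text are replaced with code strings before machine translation, and the translation is then checked for those code strings. This is a minimal sketch under stated assumptions. The code format `XQ0000` and the function names `mask_terms` and `find_lost_codes` are hypothetical, and the machine translation step itself is outside the sketch.

```python
def mask_terms(source: str, terms: list[str]) -> tuple[str, dict[str, str]]:
    """Replace each listed term found in `source` with a code string.

    Longer terms are replaced first so that a term contained in another
    term does not break the longer one. Returns the masked text and a
    mapping from code string to original term.
    """
    masked = source
    code_to_term: dict[str, str] = {}
    for term in sorted(terms, key=len, reverse=True):
        if term in masked:
            code = f"XQ{len(code_to_term):04d}"  # hypothetical code format
            masked = masked.replace(term, code)
            code_to_term[code] = term
    return masked, code_to_term


def find_lost_codes(translation: str, code_to_term: dict[str, str]) -> list[str]:
    """Return the code strings that no longer appear in the translation."""
    return [code for code in code_to_term if code not in translation]


if __name__ == "__main__":
    masked, mapping = mask_terms("表示部と制御部", ["表示部", "制御部"])
    print(masked)
    # Suppose the machine translation of the masked text came back as:
    print(find_lost_codes("XQ0000 and the controller", mapping))
```

A code string that survives translation unchanged can be located by exact matching, which is what makes the check simple. A lost code string points to a place where the translation dropped or altered the corresponding word.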

Preferably, the determination means makes the determination for each of the units constituting a document, and the presentation means presents the determination result for each of the units constituting the document.

Preferably, the determination means makes the determination for the entire document, and the presentation means presents the determination result for the entire document.

Preferably, the determination result includes at least one of the character strings that are not included in the sentence written in the second language and the number of character strings that are not included in the sentence written in the second language.

Preferably, the determination means determines whether a character string corresponding to a character string included in the second language is included in the sentence written in the first language.
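Checking in the reverse direction as well makes it possible to flag strings that appear in the translation without a counterpart in the source. The following is a minimal sketch that combines both directions, assuming the correspondence is given as a list of (source string, target string) pairs. The function name `check_both_directions` and the `pairs` argument are hypothetical.

```python
def check_both_directions(
    source: str, translation: str, pairs: list[tuple[str, str]]
) -> tuple[list[str], list[str]]:
    """Compare source and translation in both directions (sketch).

    Returns two lists:
    - omitted: target strings whose source string occurs in the source
      but which do not occur in the translation;
    - unsupported: source strings that do not occur in the source even
      though their target string occurs in the translation.
    """
    lowered = translation.lower()
    omitted = [
        target
        for src, target in pairs
        if src in source and target.lower() not in lowered
    ]
    unsupported = [
        src
        for src, target in pairs
        if target.lower() in lowered and src not in source
    ]
    return omitted, unsupported


if __name__ == "__main__":
    pairs = [("表示部", "display unit"), ("制御部", "control unit")]
    print(
        check_both_directions(
            "表示部は明るい。",
            "The display unit and the control unit are bright.",
            pairs,
        )
    )
```

The first list corresponds to possible omissions, and the second to content in the translation that the source does not support.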

According to another aspect of the present invention, in a control program for a translation evaluation device that evaluates the result of translating a sentence written in a first language into a sentence written in a second language, the translation evaluation device includes a computer, the sentence written in the first language includes a plurality of character strings, and the control program causes the computer to execute a determination step of determining whether a character string corresponding to a character string included in the first language is included in the sentence written in the second language, and a presentation step of presenting the determination result of the determination step.

この発明のさらに他の局面に従うと、第１の言語で記述された文章を第２の言語で記述された文章に翻訳した結果を評価する翻訳評価装置を用いた翻訳評価方法において、前記翻訳評価装置は、コンピュータを含み、前記第１の言語で記述された文章は、文字列を複数含んでおり、前記翻訳評価方法は、前記第１の言語に含まれる文字列に対応する文字列が、前記第２の言語で記述された文章に含まれているかを判定する判定ステップと、前記判定ステップの判定結果を提示する提示ステップとを含む。 According to still another aspect of the present invention, in a translation evaluation method using a translation evaluation device that evaluates the result of translating a sentence written in a first language into a sentence written in a second language, the translation evaluation is performed. The device includes a computer, the sentence written in the first language contains a plurality of character strings, and the translation evaluation method includes a character string corresponding to the character string included in the first language. It includes a determination step of determining whether or not the sentence is included in the sentence described in the second language, and a presentation step of presenting the determination result of the determination step.

この発明によると、上記課題の少なくとも１つを解決することができる。 According to the present invention, at least one of the above problems can be solved.

本発明の第１の実施の形態における翻訳システムの構成を示すブロック図である。 A block diagram showing the configuration of the translation system in the first embodiment of the present invention.
本発明の第１の実施の形態における翻訳システムの構成を示す機能ブロック図である。 A functional block diagram showing the configuration of the translation system in the first embodiment of the present invention.
データベース２０７に格納されるデータ構造の具体例を示す図である。 A diagram showing a specific example of the data structure stored in the database 207.
本発明の第１の実施の形態における翻訳システムに含まれるコンピュータプログラムの日英翻訳処理を示すフローチャートである。 A flowchart showing the Japanese-English translation process of the computer program included in the translation system in the first embodiment of the present invention.
図４に続くフローチャートである。 A flowchart continuing from FIG. 4.
本発明の第１の実施の形態における翻訳システムに含まれるコンピュータプログラムの単語、暗号登録処理を示すフローチャートである。 A flowchart showing the word and cipher registration process of the computer program included in the translation system in the first embodiment of the present invention.
本発明の第２の実施の形態における翻訳システムに含まれるコンピュータプログラムの単語、暗号登録処理を示すフローチャートである。 A flowchart showing the word and cipher registration process of the computer program included in the translation system in the second embodiment of the present invention.
本発明の第３の実施の形態における翻訳システムに含まれるコンピュータプログラムの日英翻訳処理を示すフローチャートである。 A flowchart showing the Japanese-English translation process of the computer program included in the translation system in the third embodiment of the present invention.
本発明の第４の実施の形態における翻訳システムに含まれるコンピュータプログラムの日英翻訳処理を示すフローチャートである。 A flowchart showing the Japanese-English translation process of the computer program included in the translation system in the fourth embodiment of the present invention.
本発明の第５の実施の形態における翻訳システムで、データベース２０７中の第１のデータベースに格納されるデータ構造の具体例を示す図である。 A diagram showing a specific example of the data structure stored in the first database in the database 207 in the translation system in the fifth embodiment of the present invention.
本発明の第５の実施の形態における翻訳システムで、データベース２０７中の第２のデータベースに格納されるデータ構造の具体例を示す図である。 A diagram showing a specific example of the data structure stored in the second database in the database 207 in the translation system in the fifth embodiment of the present invention.
本発明の第５の実施の形態における翻訳システムに含まれるコンピュータプログラムの日英翻訳処理を示すフローチャートである。 A flowchart showing the Japanese-English translation process of the computer program included in the translation system in the fifth embodiment of the present invention.
図１２に続くフローチャートである。 A flowchart continuing from FIG. 12.

［第１の実施の形態］ [First Embodiment]

図１は、本発明の第１の実施の形態における翻訳システムの構成を示すブロック図である。 FIG. 1 is a block diagram showing a configuration of a translation system according to the first embodiment of the present invention.

図を参照して、翻訳システムは、ユーザ（システム利用者）が操作を行うコンピュータ１００と、ＬＡＮ（ＬｏｃａｌＡｒｅａＮｅｔｗｏｒｋ）３００と、インターネット４００と、他社コンピュータ資源２００と、自社コンピュータ資源５００とから構成される。 With reference to the figure, the translation system is composed of a computer 100 operated by a user (system user), a LAN (Local Area Network) 300, the Internet 400, another company's computer resource 200, and an in-house computer resource 500.

ユーザが操作を行うコンピュータ１００は、ＣＰＵ１０１と、通信部１０３と、Ｉ／Ｏ１０５と、ＲＯＭ１０７と、ＲＡＭ１０９と、記憶装置１１１と、音声入力／出力部１１３と、ディスプレイ１１５と、グラフィックユニット１１７と、キーボード１１９と、マウス１２１とを含んで構成されている。 The computer 100 operated by the user includes a CPU 101, a communication unit 103, an I/O 105, a ROM 107, a RAM 109, a storage device 111, an audio input/output unit 113, a display 115, a graphic unit 117, a keyboard 119, and a mouse 121.

コンピュータ１００は、ＬＡＮ３００に接続され、ＬＡＮ３００はインターネット４００に接続される。ＬＡＮ３００には自社コンピュータ資源５００が接続されており、インターネット４００には他社コンピュータ資源２００が接続されている。 The computer 100 is connected to the LAN 300, and the LAN 300 is connected to the Internet 400. The company's computer resource 500 is connected to the LAN 300, and another company's computer resource 200 is connected to the Internet 400.

ここでコンピュータ資源とは、サーバ、パーソナルコンピュータ、記憶装置（コンピュータ内のストレージ、ＮＡＳ（ＮｅｔｗｏｒｋＡｔｔａｃｈｅｄＳｔｏｒａｇｅ）など）、および情報通信経路（ネットワーク、ロードバランサ、スイッチ、ルータなど）、並びに、それらを構成するＣＰＵ、メモリ、記憶装置（ハードディスク、光学的または磁気的記憶装置、ＳＳＤほか半導体デバイス）、ＲＯＭ、マザーボード、キーボード、マウス、マイクなどの入力装置、ディスプレイ、およびスピーカなどの出力装置、並びに記憶装置から読み出され、一般にはメモリ上で動作するコンピュータプログラム、データなどのソフトウェアの全てまたは一部を示している。 Here, computer resources refer to all or part of the following: servers, personal computers, storage devices (storage inside a computer, NAS (Network Attached Storage), etc.), and information communication paths (networks, load balancers, switches, routers, etc.); the components that constitute them, namely CPUs, memory, storage devices (hard disks, optical or magnetic storage devices, SSDs and other semiconductor devices), ROMs, motherboards, input devices such as keyboards, mice, and microphones, and output devices such as displays and speakers; and software such as computer programs and data that are read from a storage device and generally run in memory.

自社コンピュータ資源５００とは、上記ユーザ（または上記ユーザが所属する組織）が所有しているコンピュータ資源である。組織は、ユーザ１名からなる組織であっても良いし、複数の人員から構成される組織であってもよい。 The in-house computer resource 500 is a computer resource owned by the user (or the organization to which the user belongs). The organization may be an organization consisting of one user or an organization composed of a plurality of personnel.

自社コンピュータ資源５００やコンピュータ１００は、インターネット４００に直接接続されておらず、インターネット４００には、ＬＡＮ３００のルータ（図示せず）を介して接続される。また、ルータ、自社コンピュータ資源５００、コンピュータ１００にファイヤウォール機能を設けることにより、インターネット４００側から自社コンピュータ資源５００やコンピュータ１００にアクセスすることが禁じられている。これにより、自社コンピュータ資源５００やコンピュータ１００がインターネット４００を介して外部から操作されることはなく、また、自社コンピュータ資源５００やコンピュータ１００に記憶されたデータに外部からアクセスすることが禁止されている。 The in-house computer resource 500 and the computer 100 are not directly connected to the Internet 400; they are connected to it via a router (not shown) of the LAN 300. In addition, a firewall function provided on the router, the in-house computer resource 500, and the computer 100 prohibits access to the in-house computer resource 500 and the computer 100 from the Internet 400 side. As a result, the in-house computer resource 500 and the computer 100 cannot be operated from outside via the Internet 400, and external access to the data stored in them is prohibited.

他社コンピュータ資源２００とは、上記ユーザ（または上記ユーザが所属する組織）が所有するものではないコンピュータ資源である。上記ユーザは、コンピュータ１００からアクセスすることで他社コンピュータ資源２００の計算能力やストレージを一時的に借り受けることができる。その利用形態は、有償であると無償であるとを問わない。 The other company's computer resource 200 is a computer resource that is not owned by the user (or the organization to which the user belongs). The user can temporarily borrow the computing power and storage of another company's computer resource 200 by accessing from the computer 100. The usage pattern does not matter whether it is paid or free of charge.

秘密性の高い文章である翻訳の原文データは、ＨＴＭＬやプレーンなテキストで記述され、自社コンピュータ資源５００やコンピュータ１００に記録される。翻訳の原文データが画像である場合には、自社コンピュータ資源５００やコンピュータ１００に記録されたＯＣＲ（ＯｐｔｉｃａｌＣｈａｒａｃｔｅｒＲｅｃｏｇｎｉｔｉｏｎ／Ｒｅａｄｅｒ）のソフトウェアによりそれがＨＴＭＬ文書やテキスト文書に変換され、処理の対象とされる。 The source text data to be translated, which is highly confidential, is written in HTML or plain text and recorded on the in-house computer resource 500 or the computer 100. When the source text data is an image, OCR (Optical Character Recognition/Reader) software recorded on the in-house computer resource 500 or the computer 100 converts it into an HTML document or a text document, which is then processed.

自社コンピュータ資源５００およびコンピュータ１００と、他社コンピュータ資源２００との間の通信プロトコルは、ＨＴＴＰＳ（ＨｙｐｅｒｔｅｘｔＴｒａｎｓｆｅｒＰｒｏｔｏｃｏｌＳｅｃｕｒｅ）等により行われる。すなわち、ＳＳＬ（Secure Sockets Layer）／ＴＬＳ（Transport Layer Security）プロトコルによって提供されるセキュアな暗号化（秘密鍵、公開鍵を使ったデータ暗号化手法）の上でＨＴＴＰ通信が行われるので、自社コンピュータ資源５００およびコンピュータ１００と他社コンピュータ資源２００との間の通信内容を秘匿化することができ、その通信内容が第三者に漏洩することは防止される。通信内容の第三者への漏洩を防ぐことができるのであれば、ＨＴＴＰＳ以外の暗号化通信を行っても良い。 Communication between the in-house computer resource 500 and the computer 100 on one side and the other company's computer resource 200 on the other is carried out using a protocol such as HTTPS (Hypertext Transfer Protocol Secure). That is, HTTP communication is performed over the secure encryption (a data encryption method using private and public keys) provided by the SSL (Secure Sockets Layer) / TLS (Transport Layer Security) protocol, so the content of the communication between the in-house computer resource 500 and the computer 100 and the other company's computer resource 200 can be concealed and is prevented from leaking to a third party. Encrypted communication other than HTTPS may be used as long as it prevents the communication content from leaking to a third party.

ＨＴＴＰＳなどで暗号化された通信内容は、当然に他社コンピュータ資源２００で復号化される。このため、たとえＨＴＴＰＳなどの通信プロトコルを用いるとしても、上記ユーザ、または上記ユーザが所属する組織内で秘密にしておくべき文書の平文を他社コンピュータ資源２００に送信することは望ましくない。また、他社コンピュータ資源２００の運営業者が、利用規約として、送られてきたデータに関する公開、利用などの権利を留保するように取り決めている場合がある。このような場合、上記ユーザ、または上記ユーザが所属する組織内で秘密にしておくべき文書の平文を他社コンピュータ資源２００に送信することは、技術的には可能であるが、ビジネス的には不可能である。 Communication content encrypted by HTTPS or the like is, as a matter of course, decrypted at the other company's computer resource 200. Therefore, even if a communication protocol such as HTTPS is used, it is undesirable to send to the other company's computer resource 200 the plaintext of a document that should be kept secret by the user or within the organization to which the user belongs. In addition, the operator of the other company's computer resource 200 may stipulate in its terms of use that it reserves rights to publish or use the data sent to it. In such a case, sending the plaintext of a document that should be kept secret by the user or within the user's organization to the other company's computer resource 200 is technically possible but not possible from a business standpoint.

他社コンピュータ資源２００で平文に戻された通信内容が、現実的には当該他社でどのように保存、利用されるかユーザにとって詳細は解らない。また、一般にはユーザは他社コンピュータ資源２００の構成、接続を知ることができず、当該他社内部の受信データの具体的な取り扱いを知ることができない。 The user does not know the details of how the communication content returned to plain text by the computer resource 200 of the other company is actually stored and used by the other company. Further, in general, the user cannot know the configuration and connection of the computer resource 200 of the other company, and cannot know the specific handling of the received data inside the other company.

現実に秘密情報の流出が頻繁に起こっており、それを防ぐことが困難であることを考えると、たとえ他社コンピュータ資源２００の運営者との間で秘密保持契約を締結するとしても、やはり秘密にしておくべき文書の平文を他社コンピュータ資源２００に送信することは望ましくないといえる。 Considering that leaks of confidential information do occur frequently and are difficult to prevent, it is still undesirable to send the plaintext of a document that should be kept secret to the other company's computer resource 200, even if a non-disclosure agreement is concluded with its operator.

すなわち、ＨＴＴＰＳなどの暗号化通信を用いるとしても、秘密性の高い文章である翻訳対象の文章、またはその一部は、外部の他社コンピュータ資源２００に送信すべきではない。 That is, even if encrypted communication such as HTTPS is used, the text to be translated, which is a highly confidential text, or a part thereof should not be transmitted to an external computer resource 200 of another company.

そこで、本実施例におけるコンピュータ１００は、秘密性の高い文章である翻訳対象の文章の一部を、解読困難な文字列に入替えることによって暗号化し（以下、これを「第１の暗号化」という。）、その後、その文章を外部の他社コンピュータ資源２００に送信するときにＨＴＴＰＳ（ＳＳＬ暗号化通信）などのプロトコルによってさらに暗号化（以下、これを「第２の暗号化」という。）する。 Therefore, the computer 100 in this embodiment encrypts part of the text to be translated, which is highly confidential, by replacing it with character strings that are difficult to decipher (hereinafter referred to as the "first encryption"). Then, when the text is sent to the external computer resource 200 of the other company, it is further encrypted by a protocol such as HTTPS (SSL-encrypted communication) (hereinafter referred to as the "second encryption").

他社コンピュータ資源２００で、上記第２の暗号化によって暗号化された通信内容は、ＨＴＴＰＳなどのプロトコルによって平文に戻される（「第２の暗号化」に対応するものであるため、以下これを「第２の復号化」という。）。しかしながら、第１の暗号化については、復号化されることはなく、第１の暗号化によって暗号化されたままの文書が他社コンピュータ資源２００内の翻訳プログラムによって翻訳される。その翻訳によって作成された翻訳文（テキスト文書、またはＨＴＭＬ文書）は、ＨＴＴＰＳなどのプロトコルにて暗号化（すなわち「第２の暗号化」の手法による暗号化）され、インターネット４００を通じてコンピュータ１００に送信される。コンピュータ１００は、「第２の復号化」の手法であるＨＴＴＰＳなどのプロトコルによって、受信した文章を平文に戻す。この状態では、まだ第１の暗号化については復号化されていない。 At the other company's computer resource 200, the communication content encrypted by the second encryption is returned to plaintext by a protocol such as HTTPS (hereinafter referred to as the "second decryption", since it corresponds to the "second encryption"). The first encryption, however, is not decrypted, and the document, still encrypted by the first encryption, is translated by the translation program in the other company's computer resource 200. The translated text produced by that translation (a text document or an HTML document) is encrypted by a protocol such as HTTPS (that is, by the "second encryption" method) and sent to the computer 100 via the Internet 400. The computer 100 returns the received text to plaintext by a protocol such as HTTPS, which is the "second decryption" method. At this point, the first encryption has not yet been decrypted.

コンピュータ１００では、受信された翻訳後のデータに対して、「第１の暗号化」に対する復号処理である「第１の復号化」を行う。また、必要に応じてその後処理を行う。これにより、完全な翻訳後の文章をコンピュータ１００のユーザは得ることができる。 The computer 100 performs "first decryption", which is a decryption process for "first encryption", on the received translated data. In addition, subsequent processing is performed as necessary. As a result, the user of the computer 100 can obtain the completely translated sentence.

図１に示されている自社、他社以外に通信データが漏洩したとしても、それは第１の暗号化と第２の暗号化により２重の暗号化がされたものである。これにより、通信を行う当事者以外の第三者への情報の漏洩が防止される。 Even if communication data leaks to a party other than the in-house side and the other company shown in FIG. 1, that data is doubly encrypted by the first encryption and the second encryption. This prevents information from leaking to third parties other than the communicating parties.

また、図１に示されている他社は、第２の暗号化については復号化が行われた情報を得ることができるが、第１の暗号化については復号化することができない。このため、図１に示されている自社から、図１に示されている他社への情報の漏洩も防止される。すなわち図１に示されている他社（コンピュータ資源２００の運営者側の人間）は、第１の暗号化がなされた文章しか手に入れることができない。また、翻訳後の文章も第１の暗号化がなされたままである。このため、図１に示されている他社は、暗号化されていない完全な翻訳前のデータ、および暗号化されていない完全な翻訳後のデータのいずれも得ることができない。 Further, the other company shown in FIG. 1 can obtain information for which the second encryption has been decrypted, but it cannot decrypt the first encryption. Therefore, leakage of information from the in-house side shown in FIG. 1 to the other company shown in FIG. 1 is also prevented. That is, the other company shown in FIG. 1 (the people on the operator side of the computer resource 200) can obtain only text to which the first encryption has been applied. The translated text also remains under the first encryption. For this reason, the other company shown in FIG. 1 can obtain neither the complete unencrypted pre-translation data nor the complete unencrypted post-translation data.

第１の暗号化は、他社コンピュータ資源２００にとって暗号化後の文章の機械翻訳ができる程度の暗号化であり、かつ、自社以外の者がその内容を理解できない程度の強度の暗号化である必要がある。 The first encryption needs to be such that the computer resource 200 of another company can perform machine translation of the encrypted text, and the encryption is strong enough that no one other than the company can understand the contents. There is.

図２は、本発明の第１の実施の形態における翻訳システムの構成を示す機能ブロック図である。 FIG. 2 is a functional block diagram showing a configuration of a translation system according to the first embodiment of the present invention.

図１の記憶装置１１１には、ＣＰＵ１０１によって順に実行されることで、コンピュータ１００を翻訳装置として動作させるコンピュータ読取り可能な実行形式のプログラムが複数記録されている。プログラムは、記憶装置１１１からから読み出され、ＲＡＭ１０９上に展開される。プログラムがＲＡＭ１０９上でＣＰＵ１０１によって実行される。 In the storage device 111 of FIG. 1, a plurality of computer-readable executable programs that operate the computer 100 as a translation device by being sequentially executed by the CPU 101 are recorded. The program is read from the storage device 111 and expanded on the RAM 109. The program is executed by the CPU 101 on the RAM 109.

プログラムの実行により、図２に示されるように、コンピュータ１００は、制御部２０１、通信部１０３、検索・置換部２０３、文書編集・単語登録部２０５、データベース２０７、メモリー２０９、記憶装置１１１、表示／出力部２１１、および入力部２１３としての機能を発揮する。 By executing the programs, as shown in FIG. 2, the computer 100 functions as a control unit 201, a communication unit 103, a search/replace unit 203, a document editing/word registration unit 205, a database 207, a memory 209, a storage device 111, a display/output unit 211, and an input unit 213.

制御部２０１は、翻訳装置全体の各種制御を行う機能ブロックである。通信部１０３は、社内・社外のコンピュータ資源と通信を行うための機能ブロックである。検索・置換部２０３は、翻訳の対象となる文書および機械翻訳後の文書の少なくとも一方について、特定の要素（単語、文節など）の検索を行ったり、検索された要素を対応する文字列や暗号に置換する機能ブロックである。文書編集・単語登録部２０５は、翻訳の対象となる文書および機械翻訳後の文書の少なくとも一方を編集したり、単語変換のための辞書を登録する機能ブロックである。 The control unit 201 is a functional block that performs various kinds of control for the translation device as a whole. The communication unit 103 is a functional block for communicating with computer resources inside and outside the company. The search/replace unit 203 is a functional block that searches at least one of the document to be translated and the machine-translated document for specific elements (words, phrases, etc.) and replaces the elements found with the corresponding character strings or ciphers. The document editing/word registration unit 205 is a functional block that edits at least one of the document to be translated and the machine-translated document, and registers a dictionary for word conversion.

データベース２０７は、主に単語辞書を登録するデータベースである。データベース２０７としてはデータベースサーバのソフトウェアを採用しても良いし、ＣＳＶファイルなどの単なるテキストファイルや、表計算ソフト（ＥＸＣＥＬなど）のテーブルを記録し、それを検索し、対応するデータを読み出すことでデータベースとしてもよい。 The database 207 is a database that mainly registers a word dictionary. Database server software may be used as the database 207, or by recording a simple text file such as a CSV file or a table of spreadsheet software (EXCEL, etc.), searching for it, and reading the corresponding data. It may be a database.

メモリ２０９は、データを一時的に記録するワーキングエリアである。記憶装置１１１は、不揮発性の記憶装置であり、翻訳前のデータ、翻訳後のデータ、データベースに記録されるデータ、メモリ２０９のデータなどを記憶する。表示／出力部２１１は、ディスプレイ、スピーカなどのユーザインタフェースである。入力部２１３は、マウス、キーボード、マイクなどのユーザインタフェースである。 The memory 209 is a working area for temporarily recording data. The storage device 111 is a non-volatile storage device, and stores data before translation, data after translation, data recorded in a database, data in memory 209, and the like. The display / output unit 211 is a user interface for a display, a speaker, or the like. The input unit 213 is a user interface such as a mouse, a keyboard, and a microphone.

図３は、データベース２０７に格納されるデータ構造の具体例を示す図である。 FIG. 3 is a diagram showing a specific example of the data structure stored in the database 207.

ここではデータベース２０７は列名（カラム名、フィールド名）として、番号（レコード番号であり、データの通し番号）、単語、暗号、対訳単語などを有するテーブルからなっている。 Here, the database 207 is composed of a table having numbers (record numbers, serial numbers of data), words, codes, bilingual words, etc. as column names (column names, field names).

単語としては、名詞が登録されるが、動詞、副詞、形容詞などを登録しても良い。また名詞には、複数の名詞からなる複合名詞（電気＋自動車である「電気自動車」など）も含まれる。 Nouns are registered as words, but verbs, adverbs, adjectives, and the like may also be registered. Nouns also include compound nouns made up of several nouns (such as 「電気自動車」 "electric vehicle", which is 「電気」 "electric" + 「自動車」 "automobile").

番号のカラムは、１レコードの通し番号を記録する。単語のカラムは、翻訳原文の単語を登録する。暗号のカラムは、その単語に対応する暗号を登録する。対訳単語のカラムは、その単語（および暗号）に対応する翻訳後の単語を登録する。 The number column records the serial number of one record. In the word column, the words in the original translation are registered. The cipher column registers the cipher corresponding to the word. The parallel word column registers the translated word corresponding to the word (and cipher).

例えば図３では、日本語と英語の翻訳で用いる翻訳データが登録されている。すなわち、日本語の単語と、それに対応する英語の単語（対訳単語）と、それに対応する暗号が登録されている。図３のテーブルを用いることで、日本語の単語をそれに対応する暗号に変換したり、日本語の単語をそれに対応する英単語に変換したり、暗号をそれに対応する日本語の単語に変換したり、暗号をそれに対応する英単語に変換したり、英単語をそれに対応する暗号に変換したり、英単語をそれに対応する日本語の単語に変換することが可能である。 For example, in FIG. 3, translation data used for Japanese and English translation is registered. That is, a Japanese word, a corresponding English word (translation word), and a corresponding code are registered. By using the table in Fig. 3, Japanese words can be converted to the corresponding ciphers, Japanese words can be converted to the corresponding English words, and ciphers can be converted to the corresponding Japanese words. It is possible to convert the cipher to the corresponding English word, convert the English word to the corresponding cipher, and convert the English word to the corresponding Japanese word.

また、カラムを追加し、他の言語の対訳単語も図３のテーブルに登録することで、３以上の言語間における翻訳が可能である。例えば、対訳中国語単語のカラムを追加することで、日英、英日、日中、中日、英中、中英の翻訳（および各言語の単語と暗号との変換）が可能となる。 In addition, by adding a column and registering bilingual words in other languages in the table of FIG. 3, translation between three or more languages is possible. For example, by adding a column of bilingual Chinese words, it is possible to translate Japanese-English, English-Japanese, Japanese-Chinese, Chinese-Japanese, English-Chinese, and Chinese-English (and conversion of words in each language into ciphers).

例として、番号「０」のレコードとして、「電気自動車」の原文の単語と、「ＡＡＡ」の暗号文字列と、「electric vehicle」の対訳単語とをデータベースは対応付けて記録している。 As an example, as a record of the number "0", the database records the original word of "electric vehicle", the encrypted character string of "AAA", and the bilingual word of "electric vehicle" in association with each other.

このようなデータベースは、事前にユーザが作成しても良いし、図６または７のフローチャートで示される処理を繰り返すことで、翻訳作業時または翻訳作業時以外に作成しても良い。また、業者がデータベースを作成し、ユーザに提供することもできる。 Such a database may be created by the user in advance, or may be created during translation work or other than translation work by repeating the process shown in the flowchart of FIG. 6 or 7. In addition, the vendor can create a database and provide it to the user.

このようなデータベースを用いることで、単語を暗号に変換することができるし、単語を対訳単語に変換することもできる。また、暗号を対訳単語に変換することなどもできる。さらに、対訳単語を元の単語に変換（逆翻訳）することも可能である。 By using such a database, words can be converted into ciphers, and words can be converted into bilingual words. It is also possible to convert the code into a bilingual word. Furthermore, it is also possible to convert a bilingual word into the original word (reverse translation).
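As a rough illustration of how such a table supports conversion in every direction, the sketch below holds two records in memory and builds one lookup dictionary per direction. The `Record` class and the dictionary names are assumptions introduced for this example; the document itself only requires a table with number, word, cipher, and translated-word columns, which could equally be a database server, a CSV file, or a spreadsheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    number: int       # serial record number
    word: str         # word in the source text
    cipher: str       # cipher assigned to the word
    translation: str  # translated word in the target language


# Two records consistent with the examples in the text.
TABLE = [
    Record(0, "電気自動車", "AAA", "electric vehicle"),
    Record(1, "エンジン", "AAB", "engine"),
]

# One dictionary per conversion direction.
word_to_cipher = {r.word: r.cipher for r in TABLE}
word_to_translation = {r.word: r.translation for r in TABLE}
cipher_to_word = {r.cipher: r.word for r in TABLE}
cipher_to_translation = {r.cipher: r.translation for r in TABLE}
translation_to_cipher = {r.translation: r.cipher for r in TABLE}
translation_to_word = {r.translation: r.word for r in TABLE}  # reverse translation

print(word_to_cipher["電気自動車"], cipher_to_translation["AAA"])
```

Adding a further field to `Record` (for example a Chinese translation) would extend the same approach to three or more languages.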

単語、対訳単語には、１つの基本の単語（「車」、「手段」、「部」、「vehicle」、「device」、「unit」など）が１レコードに登録されても良いし、複合語（「電気自動車」、「信号入力手段」、「表示部」、「electric vehicle」、「signal input unit」、「display unit」など）が１レコードに登録されても良い。 One basic word ("car", "means", "part", "vehicle", "device", "unit", etc.) may be registered in one record as a word or a bilingual word, or a compound. Words (“electric vehicle”, “signal input means”, “display unit”, “electric vehicle”, “signal input unit”, “display unit”, etc.) may be registered in one record.

暗号は、ここではアルファベット３文字からなっており、機械翻訳時の仮想単語として機能する。ここでは暗号は、ＡＡＡから始まり、１０進数で記載されたレコード番号（番号）を、Ａ〜Ｚのアルファベットをそれぞれ０〜２５の数値に当てた２６進数として表したものである。 The cipher here consists of three letters of the alphabet and functions as a virtual word during machine translation. Starting from AAA, the cipher is the record number (written in decimal) expressed in base 26, with the letters A to Z assigned to the values 0 to 25, respectively.

すなわち暗号の下１桁は、２６の０乗の位であり、暗号の下２桁目は、２６の１乗の位であり、暗号の下３桁目（最上位）は、２６の２乗の位である。 That is, the last digit of the cipher is the 26^0 place, the second digit from the end is the 26^1 place, and the third digit from the end (the most significant digit) is the 26^2 place.

例えば「番号」のカラムが０であれば、「ＡＡＡ」（Ａは２６進数のゼロを示す）の暗号が当てられ、「番号」のカラムが１であれば、「ＡＡＢ」（Ａは２６進数のゼロを示し、Ｂは２６進数の１を示す）の暗号が当てられる。同様に例えば「番号」のカラムが３５０１であれば、「ＦＥＲ」（２６進数の３５０１）の暗号が当てられる。 For example, if the "number" column is 0, the cipher "AAA" is assigned (A represents zero in base 26), and if the "number" column is 1, the cipher "AAB" is assigned (A represents zero and B represents 1 in base 26). Similarly, if the "number" column is 3501, the cipher "FER" (3501 in base 26, since 3501 = 5 × 676 + 4 × 26 + 17) is assigned.

すなわち暗号の最上位（下３桁目）は、番号を２６の２乗（すなわち６７６）で割った値の整数部分に対応するアルファベットが当てられる。暗号の中位（下２桁目）は、上記割り算の余りを２６の１乗（すなわち２６）で割った値の整数部分に対応するアルファベットが当てられる。暗号の最下位（下１桁目）は、その割り算の余りに対応するアルファベットが当てられる。これは、１０進数から２６進数への変換に等しい。 That is, the most significant position of the cipher (third digit from the end) is assigned the letter corresponding to the integer part of the number divided by 26 squared (that is, 676). The middle position (second digit from the end) is assigned the letter corresponding to the integer part of the remainder of that division divided by 26. The least significant position (last digit) is assigned the letter corresponding to the remainder of this second division. This is equivalent to converting a decimal number to base 26.
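The conversion described above can be sketched in a few lines. The function names `number_to_cipher` and `cipher_to_number` are hypothetical, and ASCII capital letters are used here in place of the full-width letters that appear in the Japanese text.

```python
def number_to_cipher(number, digits=3):
    """Convert a decimal record number to a fixed-width base-26 cipher (A=0 ... Z=25)."""
    if not 0 <= number < 26 ** digits:
        raise ValueError("record number does not fit in the given number of digits")
    letters = []
    for _ in range(digits):
        # Peel off the least significant base-26 digit each time.
        number, remainder = divmod(number, 26)
        letters.append(chr(ord("A") + remainder))
    return "".join(reversed(letters))


def cipher_to_number(cipher):
    """Inverse conversion: base-26 cipher back to the decimal record number."""
    value = 0
    for letter in cipher:
        value = value * 26 + (ord(letter) - ord("A"))
    return value


print(number_to_cipher(0), number_to_cipher(1), number_to_cipher(3501))
```

With the examples in the text, record numbers 0, 1, and 3501 give AAA, AAB, and FER, and `cipher_to_number("FER")` returns 3501.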

なお、暗号の桁数は３に限るものではない。また、３桁の暗号のうち言葉としての意味を有するもの（例えば、ＣＰＵ、ＲＡＭ、ＵＳＢ、ＮＯＸなど）は、誤解、誤訳が生じることを防ぐため、予約語としてそれに対応する番号と共に登録しないこととしても良い。 The number of digits in the cipher is not limited to three. In addition, three-letter ciphers that have a meaning as a word (for example, CPU, RAM, USB, and NOX) may be treated as reserved words and left unregistered, together with their corresponding numbers, in order to prevent misunderstanding and mistranslation.
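One way to leave reserved words unregistered is to skip their record numbers when handing out new ciphers. The sketch below assumes the base-26 scheme of this embodiment; the names `RESERVED`, `to_cipher`, and `free_ciphers` are hypothetical.

```python
RESERVED = {"CPU", "RAM", "USB", "NOX"}  # ciphers that read as real words


def to_cipher(number):
    """Three-letter base-26 cipher for a record number (A=0 ... Z=25)."""
    high, rest = divmod(number, 26 * 26)
    mid, low = divmod(rest, 26)
    return "".join(chr(ord("A") + d) for d in (high, mid, low))


def free_ciphers(start=0):
    """Yield (number, cipher) pairs, skipping numbers whose cipher is reserved."""
    for number in range(start, 26 ** 3):
        cipher = to_cipher(number)
        if cipher not in RESERVED:
            yield number, cipher


generator = free_ciphers(start=1761)
first_three = [next(generator) for _ in range(3)]
print(first_three)  # record 1762 ("CPU") is skipped
```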

またここではレコード番号を２６進数に変換したものを暗号としたが、暗号はランダムに決めても良いし、他の法則で決めても良い。 Further, here, the record number converted into a 26-ary number is used as the encryption, but the encryption may be randomly determined or may be determined by another rule.

単語と対訳単語は、同じものが登録されても良い。例えば化学式や略語（ＨＣｌ、ＣＰＵなど）は、単語と対訳単語とを同じものとして登録してもよい。この場合、日本語でも英語でも同じ単語が登録される。さらに、単語（日本語）が全角文字であり、対訳単語（英語）がその半角文字であってもよい。 The same word and translated word may be registered. For example, for chemical formulas and abbreviations (HCl, CPU, etc.), words and translated words may be registered as the same. In this case, the same word is registered in both Japanese and English. Further, the word (Japanese) may be a full-width character and the translated word (English) may be a half-width character.

また単語が「情報表示 unit 」であり、その対訳単語が「information display unit」であるなど、原文の単語の一部が翻訳後の言語で記述されているものを登録しても良い。 Further, a word may be registered in which a part of the original word is described in the translated language, such as the word is "information display unit" and the translated word is "information display unit".

暗号は、通常の１６進数表記（０〜９、Ａ〜Ｆを用いる）に倣った２６進数表記とし、０〜９、Ａ〜Ｐを用いて表現しても良い。また、アルファベット小文字や記号（！”＃＄％＆など）を用いても良い。暗号を、複数の暗号間で重複しないランダムな文字列としても良い。その文字列を構成する文字は、アルファベット、記号、またはアルファベットと記号の組み合わせからなることが望ましい。顔文字などピクトグラムを暗号としても良い。 The cipher may instead use a base-26 notation modeled on ordinary hexadecimal notation (which uses 0 to 9 and A to F), expressed with the characters 0 to 9 and A to P. Lowercase letters and symbols (such as ! " # $ % &) may also be used. The cipher may be a random character string that is unique among the ciphers. The characters making up that string are preferably letters of the alphabet, symbols, or a combination of the two. Pictograms such as emoticons may also be used as ciphers.

本実施の形態のように、暗号を、ＡＡＡ、ＡＡＢ、ＡＡＣ、・・・、ＺＺＹ、ＺＺＺとし、１桁をＡ〜Ｚまでの２６進数とし、３桁（３文字）のアルファベットで表すのであれば、２６＊２６＊２６＝１７５７６の単語をデータベースに登録することができる。不足であれば、暗号の桁数を増やすことにより、２６のべき数分のレコード（単語と暗号と対訳単語の組）をデータベースに登録することができる。 If, as in this embodiment, the ciphers are AAA, AAB, AAC, ..., ZZY, ZZZ, with each digit a base-26 digit from A to Z and each cipher written as three letters, then 26 × 26 × 26 = 17576 words can be registered in the database. If that is not enough, increasing the number of digits in the cipher allows a number of records (sets of word, cipher, and translated word) equal to the corresponding power of 26 to be registered.
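The capacity arithmetic above is easy to check. The helper names `capacity` and `digits_needed` below are hypothetical and only illustrate how the number of cipher digits relates to the number of words that can be registered.

```python
def capacity(digits):
    """Number of distinct ciphers available with the given number of base-26 digits."""
    return 26 ** digits


def digits_needed(word_count):
    """Smallest number of base-26 digits that can hold word_count distinct ciphers."""
    digits = 1
    while capacity(digits) < word_count:
        digits += 1
    return digits


print(capacity(3), digits_needed(17576), digits_needed(17577))
```

Three digits give 17576 ciphers, so a dictionary of 17577 words would need a fourth digit, which raises the capacity to 456976.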

本実施の形態のように、Ａ〜Ｚをそれぞれ０〜２５とする２６進数で暗号を表記することで、一見してそれが何を示す語であるかわからないという利点がある。また、ニューラル機械翻訳において、アルファベットや記号の文字列は１つの単語として認識されるため、暗号を１つの単語として原文に残したままで機械翻訳することができるという利点がある。 As in the present embodiment, by expressing the code in a 26-ary number with A to Z as 0 to 25, there is an advantage that it is not possible to know at first glance what the word indicates. Further, in neural machine translation, since the character strings of alphabets and symbols are recognized as one word, there is an advantage that the machine translation can be performed while leaving the code as one word in the original text.

すなわち、ニューラル機械翻訳を用いて翻訳するときに、アルファベットや記号の文字列には訳語が当てられずそのまま出力される（但し、例外もある）。 That is, when translating using neural machine translation, the translated word is not applied to the character strings of alphabets and symbols and is output as it is (however, there are exceptions).

例えば、「電気自動車１００は、エンジン１０１を含む。」の翻訳原文を、図３のテーブルによって置き換えると、「ＡＡＡ１００は、ＡＡＢ１０１を含む。」の暗号文が生成される。この文は、ニューラル機械翻訳により、「AAA 100 includes AAB 101.」の文に翻訳される。ニューラル機械翻訳が行われた後に、図３のテーブルを用いて、その中の暗号を元の単語の訳語に変換することで、「electric vehicle 100 includes engine 101.」の翻訳後の文章を得ることができる。 For example, replacing the words in the source sentence 「電気自動車１００は、エンジン１０１を含む。」 ("The electric vehicle 100 includes the engine 101.") using the table of FIG. 3 generates the ciphertext 「ＡＡＡ１００は、ＡＡＢ１０１を含む。」. Neural machine translation translates this sentence into "AAA 100 includes AAB 101." After the neural machine translation, converting the ciphers in that sentence into the translations of the original words using the table of FIG. 3 yields the translated sentence "electric vehicle 100 includes engine 101."

なお、例えば「複数のＡＡＡ」の原文が「a plural of AAAs」に翻訳されるなど、暗号が複数形に変換されることはありうる。この場合も、「AAAs」の「AAA」の暗号部分をその単語に対応する英単語（例えば「book」）に置き換えることで、語尾に「ｓ」を付けた「books」の翻訳語を得ることができる。 Note that a cipher may be converted into a plural form; for example, the source text 「複数のＡＡＡ」 may be translated as "a plural of AAAs". In this case as well, replacing the "AAA" cipher portion of "AAAs" with the English word corresponding to the original word (for example, "book") yields the translation "books", with "s" at the end.

但し、この方法では「bus」、「leaf」、「city」などの単語（「ｓ」を付けるだけでは正しい複数形にならない単語）がつづり違い（スペルミス）となってしまう。このため、翻訳後に従来技術であるスペルチェックのルーチンを実行して、これらの単語が正しいスペルとなるように対処する必要がある。またたとえば、「buss」を「buses」に、「leafs」を「leaves」に、「citys」を「cities」に対応付けるテーブル（ミススペルと正しいスペルとを対応付けるテーブル）を用意しておき、機械翻訳後に一括変換することとしてもよい。 However, with this method, words such as "bus", "leaf", and "city" (words that do not form a correct plural just by adding "s") end up misspelled. For this reason, a conventional spell-check routine needs to be run after translation so that these words are spelled correctly. Alternatively, a table that maps "buss" to "buses", "leafs" to "leaves", and "citys" to "cities" (a table that maps misspellings to correct spellings) may be prepared and applied as a batch conversion after machine translation.
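The round trip in this example can be sketched as below. No translation service is called: the machine translation output is simply the string quoted in the text. The function names `encrypt` and `decrypt`, the in-memory `TABLE`, the use of ASCII letters for the ciphers, and the longest-word-first replacement order are assumptions made for the illustration.

```python
import re

# Hypothetical in-memory copy of the FIG. 3 table: (word, cipher, translated word).
TABLE = [
    ("電気自動車", "AAA", "electric vehicle"),
    ("エンジン", "AAB", "engine"),
]


def encrypt(source_text):
    """First encryption: replace registered words with their ciphers."""
    # Longer words are replaced first so that a compound noun is not split
    # by a shorter word it happens to contain (an assumption of this sketch).
    for word, cipher, _ in sorted(TABLE, key=lambda row: len(row[0]), reverse=True):
        source_text = source_text.replace(word, cipher)
    return source_text


def decrypt(translated_text):
    """First decryption: replace ciphers with the translated words."""
    lookup = {cipher: translation for _, cipher, translation in TABLE}
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, lookup)) + r")\b")
    return pattern.sub(lambda match: lookup[match.group(1)], translated_text)


encrypted = encrypt("電気自動車１００は、エンジン１０１を含む。")
machine_output = "AAA 100 includes AAB 101."  # output quoted in the text, not computed here
final = decrypt(machine_output)
print(encrypted)
print(final)
```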
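The plural handling and the misspelling table can be combined in a small post-processing step, sketched below. The cipher assignments AAB and AAC are hypothetical and used only for this example, as are the function names `decode` and `fix_plurals`.

```python
import re

# Hypothetical cipher assignments for this example only.
CIPHER_TO_WORD = {"AAA": "book", "AAB": "bus", "AAC": "city"}

# Table mapping naive "+s" plurals to their correct spellings.
PLURAL_FIXES = {"buss": "buses", "leafs": "leaves", "citys": "cities"}


def decode(text):
    """Replace each cipher with its English word, keeping a trailing plural 's'."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, CIPHER_TO_WORD)) + r")(s?)\b")
    return pattern.sub(lambda m: CIPHER_TO_WORD[m.group(1)] + m.group(2), text)


def fix_plurals(text):
    """Batch-convert known misspelled plurals to the correct forms."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, PLURAL_FIXES)) + r")\b")
    return pattern.sub(lambda m: PLURAL_FIXES[m.group(1)], text)


decoded = decode("two AAAs and three AABs in many AACs")
corrected = fix_plurals(decoded)
print(decoded)
print(corrected)
```

A general-purpose spell checker could replace `fix_plurals`; the fixed table is just the simpler of the two options mentioned in the text.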

また、ニューラルネットワークの学習結果によっては、入力される原文中の暗号と、それに対応する翻訳文中の暗号とが異なるものとなってしまう場合もある（誤訳の一種である）。 In addition, depending on the learning result of the neural network, the cipher in the input original text and the cipher in the corresponding translated text may be different (a kind of mistranslation).

これを防ぐために、翻訳原文をニューラル機械翻訳する際に、暗号の部分を、それが特殊な文字列であることを示すキャラクターで囲む（暗号部分の前後に、特殊なキャラクタを挿入する）とよい。例えば、「ＡＡＡ１００は、ＡＡＢ１０１を含む。」のように、暗号の部分の前後にスペース（空白）のキャラクターを挿入してニューラルネットワークに送信することで、暗号が他の暗号に変換されること（誤訳の一種）を防ぐことができる。 To prevent this, when translating the original translation by neural machine translation, it is advisable to enclose the cipher part with a character indicating that it is a special character string (insert a special character before and after the cipher part). .. For example, the cipher is converted to another cipher by inserting a space (blank) character before and after the cipher part and transmitting it to the neural network, such as "AAA 100 includes AAA 101." This (a kind of mistranslation) can be prevented.

スペースを示すキャラクター以外に、翻訳前の文書内の暗号部分を鉤括弧（「」）で囲んでおくことも有効である。同様に、クオーテーションキャラクタ（''、""、””、’’など）、丸括弧、二重丸括弧、二重鉤括弧、角括弧、二重角括弧、波括弧、亀甲括弧、二重亀甲括弧、山括弧、二重山括弧、ギュメ、または隅付き括弧などで囲むことも有効である。これによって、囲まれる部分が他の部分とは異なる特殊な意味を有する部分（暗号）であることを示す状態とした上で、ニューラル機械翻訳することができる。 Besides the space character, it is also effective to enclose the cipher portion of the pre-translation document in corner brackets (「」). Likewise, it is effective to enclose it in quotation characters ('', "", ””, ’’, etc.), parentheses, double parentheses, double corner brackets, square brackets, double square brackets, curly braces, tortoise-shell brackets, double tortoise-shell brackets, angle brackets, double angle brackets, guillemets, lenticular brackets, or the like. This marks the enclosed portion as one that has a special meaning different from the other portions (a cipher) before neural machine translation is performed.

これらの処理により挿入（追加）されたキャラクターは、機械翻訳後の後処理で削除される。例えば、暗号を鉤括弧でくくり、「ＡＡＡ」１００などとして機械翻訳した場合、それは"AAA" 100に翻訳される。ダブルクォーテーションは、元々の文章にはない、上記処理時に追加されたものであるため、機械翻訳後に削除されてAAA 100とされる。削除は、正規表現を用いることで可能である。 Characters inserted (added) by these processes are deleted in post-processing after machine translation. For example, when a cipher is enclosed in corner brackets and machine translated as 「ＡＡＡ」１００, it is translated into "AAA" 100. Because the double quotation marks are not part of the original text but were added by the above processing, they are deleted after machine translation, giving AAA 100. The deletion can be done with a regular expression.
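The wrapping and later removal of delimiter characters might look like the following sketch. It assumes that a cipher is exactly three uppercase half-width letters and that the translation engine turns corner brackets into double quotation marks. Both are assumptions for illustration, and the function names are hypothetical.

```python
import re

CODE = r"[A-Z]{3}"  # assumed cipher format: three uppercase half-width letters


def wrap_codes(text: str) -> str:
    """Enclose every cipher in corner brackets before machine translation."""
    return re.sub(rf"(?<![A-Za-z])({CODE})(?![A-Za-z])", r"「\1」", text)


def strip_wrappers(translated: str) -> str:
    """Delete the delimiters that were added around ciphers (post-processing)."""
    return re.sub(rf'["「]({CODE})["」]', r"\1", translated)
```

Note that `strip_wrappers` would also remove quotation marks around any three-letter uppercase word that was quoted in the original text, so a real implementation would need to record which delimiters it inserted.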

暗号に漢字やひらがななど、アルファベットや記号以外の文字を用いると、それが英語に翻訳されてしまい、復号化できなくなるという問題がある（例えば暗号文字列として割り当てた「あさい」の文字列が「shallow」（「浅い」の英訳）に英訳されたり、「ああい」のような意味を持たない文字列が「Aai」に英訳されたりする）。よって、暗号はアルファベットや記号の文字列とすることが望ましい。 If characters other than alphabetic letters and symbols, such as kanji or hiragana, are used for a cipher, they may be translated into English, making decryption impossible (for example, the character string 「あさい」 assigned as a cipher string may be translated into "shallow" (the English translation of 「浅い」), or a meaningless string such as 「ああい」 may be rendered as "Aai"). It is therefore desirable that ciphers be strings of alphabetic letters and symbols.

また、図３のテーブルを用いると、翻訳後の英語文章の１文の先頭文字が小文字となってしまう場合がある（英語であれば、文頭は大文字で始めるべきである）。これに対しても後のスペルチェックで、文頭を大文字にする処理を行うと良い。例えば、文頭（またはピリオド後の文字）のアルファベットを正規表現で検索し、全て大文字とする処理などである。 In addition, when the table of FIG. 3 is used, the first letter of a sentence in the translated English text may be lowercase (in English, a sentence should begin with an uppercase letter). This can also be handled in the later spell check by capitalizing the beginning of each sentence, for example by searching with a regular expression for the letter at the beginning of a sentence (or following a period) and converting all such letters to uppercase.
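The capitalization pass could be sketched as below. The function name is hypothetical, and the pattern is deliberately simple: it treats any period, exclamation mark, or question mark followed by whitespace as a sentence boundary.

```python
import re


def capitalize_sentences(text: str) -> str:
    """Uppercase the first letter of the text and of each sentence after . ! ?"""
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )
```

Because abbreviations such as "e.g. " also match this pattern, the result would still benefit from the visual check by the translator.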

図３のデータベースのデータは、業者から購入、入手することでそれを登録しても良いし、ユーザが登録してもよい。 The data in the database of FIG. 3 may be registered by purchasing and obtaining it from a vendor, or may be registered by a user.

図４は、本発明の第１の実施の形態における翻訳システムに含まれるコンピュータプログラムの日英翻訳処理を示すフローチャートである。図５は、図４に続くフローチャートである。 FIG. 4 is a flowchart showing a Japanese-English translation process of a computer program included in the translation system according to the first embodiment of the present invention. FIG. 5 is a flowchart following FIG.

このフローチャートで示される処理は、記憶装置１１１に記録されたコンピュータプログラムがＲＡＭ１０９上に読み出され、それをＣＰＵ１０１が順次実行することで実行される。プログラムは、コンパイルされた機械語の実行形式で保存されてもよいし、ソースコードをインタプリタが逐次実行する形式としてもよいし、中間言語で記載された形式としても良い。 The process shown in this flowchart is executed by reading the computer program recorded in the storage device 111 onto the RAM 109 and sequentially executing the computer program by the CPU 101. The program may be saved in a compiled machine language executable format, the source code may be sequentially executed by the interpreter, or the program may be written in an intermediate language.

ここではワープロソフト（マイクロソフト社のＷＯＲＤ、オープンソース方式で公開されているＯｐｅｎＯｆｆｉｃｅなど）を用い、そのマクロ（ＶＢＡ：ＶｉｓｕａｌＢａｓｉｃＦｏｒＡｐｐｌｉｃａｔｉｏｎｓなど）を用いて図４のフローチャートのプログラムが実行されるものとする。データベースは、専用のデータベースでもＣＳＶファイルであってもよいが、ここではマイクロソフト社のＥＸＣＥＬなどの表計算ソフトを用いてデータベースが形成され、ワープロソフトのＶＢＡを用いてデータベースへのアクセスが行われるものとする。なお使用されるワープロソフト、プログラム言語の種類は、発明の実施において制限されるものではない。 Here, it is assumed that word-processing software (such as Microsoft WORD or the open-source OpenOffice) is used, and that the program of the flowchart of FIG. 4 is executed using its macro language (such as VBA: Visual Basic for Applications). The database may be a dedicated database or a CSV file, but here it is assumed that the database is built with spreadsheet software such as Microsoft EXCEL and is accessed from the word-processing software's VBA. The types of word-processing software and programming language used are not limited in the practice of the invention.

図４を参照して、ステップＳ１０１においてユーザはワープロソフトによって翻訳対象の文章が記録された文書ファイルを開く（文書ファイルを補助記憶装置からＲＡＭへ展開する）。また、表計算ソフトも開いておく。 With reference to FIG. 4, in step S101, the user opens the document file in which the sentence to be translated is recorded by the word processing software (expands the document file from the auxiliary storage device to the RAM). Also, open the spreadsheet software.

文書ファイルは、プレーンテキスト形式で記載されたテキストファイルであってもよいし、ワープロソフトの文書ファイルであってもよいし、ＨＴＭＬファイルであってもよい。また、画像に含まれる文字がＯＣＲによりテキスト化されたファイルであってもよい。画像ファイルを入力し、ステップＳ１０１でＯＣＲ機能により文字部分をテキストデータに変換したファイルを作成しても良い。他、文章を記述することができるのであれば、ファイルのフォーマットは特定のものに限定されない。文書は、他のコンピュータからファイル転送プロトコルや電子メールソフトウェアを用いて受信しても良いし、ＵＳＢメモリなどのストレージから入力されても良い。また、キーボード１１９やマイクによって入力されても良い。文書は、インターネットからダウンロードすることとしてもよい。 The document file may be a text file described in plain text format, a document file of word processing software, or an HTML file. Further, the characters included in the image may be a file converted into text by OCR. An image file may be input, and a file in which the character portion is converted into text data by the OCR function in step S101 may be created. In addition, the format of the file is not limited to a specific one as long as it can describe sentences. The document may be received from another computer using a file transfer protocol or e-mail software, or may be input from a storage such as a USB memory. Further, it may be input by a keyboard 119 or a microphone. The document may be downloaded from the Internet.

ここでは仮に電気自動車について記載された特許明細書を翻訳するものとし、文書ファイルに、 Here, suppose that a patent specification describing an electric vehicle is to be translated, and that the document file contains the following passage:

「［００２３］ "[0023]

電気自動車１００は、エンジン１０１と表示部１０２を備え、エンジン１０１は、信号入力手段１０３と表示部１０２に接続される。信号入力手段１０３の入力がハイである場合、表示部１０２は警告を表示する。」の文章が記載されていたものとする。この文章の翻訳を例として、本実施の形態における翻訳処理について説明する。なお、［００２３］は、文書中の段落番号である。 The electric vehicle 100 includes an engine 101 and a display unit 102, and the engine 101 is connected to a signal input means 103 and the display unit 102. When the input of the signal input means 103 is high, the display unit 102 displays a warning." Taking the translation of this passage as an example, the translation process in the present embodiment will be described. Note that [0023] is a paragraph number in the document.

ステップＳ１０３で翻訳の対象となる文書に対し、前処理が行われる。これは、以下を目的とするものである。 Preprocessing is performed on the document to be translated in step S103. This is for the following purposes:

（１）１度に他社コンピュータ資源２００に送信する文章の単位（文章の区切り）を明確にする。 (1) Clarify the unit of sentences (sentence breaks) to be sent to the computer resources 200 of other companies at one time.

（２）誤訳を少なくするために、他社コンピュータ資源２００において一度に処理される１文の長さを短くする。 (2) In order to reduce mistranslation, the length of one sentence processed at one time in the computer resource 200 of another company is shortened.

（３）１文中の意味が区切られる部分を明確にする。 (3) Clarify the part where the meaning in one sentence is separated.

例えば本実施の形態では、１度に他社コンピュータ資源２００に送信する文章の単位は、文頭から句点（。）までとする。段落番号も文章の単位であるものとして、前処理で、段落番号部分を正規表現を用いてサーチし、その後ろに句点を付与する（上記例では、［００２３］の後に句点が付与される）。 For example, in the present embodiment, the unit of text transmitted to the computer resource 200 of another company at one time runs from the beginning of a sentence to the full stop (。). A paragraph number is also treated as a unit of text: in the preprocessing, paragraph numbers are located with a regular expression and a full stop is appended after each (in the above example, a full stop is appended after [0023]).

また、重文（２以上の等位の節（主語と述語の組合わせを含む語の集合）によって構成される文）は、短文に分解した方が誤訳が少なくなる。このため、前処理では、重文を２以上の文に分解する（上記例では、「電気自動車１００は、エンジン１０１と表示部１０２を備え、エンジン１０１は、信号入力手段１０３と表示部１０２に接続される。」の文が、「電気自動車１００は、エンジン１０１と表示部１０２を備える。エンジン１０１は、信号入力手段１０３と表示部１０２に接続される。」の２文に変換される）。 In addition, a compound sentence (a sentence composed of two or more coordinate clauses, a clause being a group of words that includes a subject-predicate pair) is less likely to be mistranslated when broken into short sentences. Therefore, the preprocessing splits a compound sentence into two or more sentences (in the above example, the sentence "The electric vehicle 100 includes an engine 101 and a display unit 102, and the engine 101 is connected to a signal input means 103 and the display unit 102." is converted into the two sentences "The electric vehicle 100 includes an engine 101 and a display unit 102. The engine 101 is connected to the signal input means 103 and the display unit 102.").

条件、時、原因、理由などを示す副詞節と主節とからなる複文は、副詞節と主節に関連があるため、２つの文に変換して処理するよりも、１つの文として処理した方が正確な翻訳が可能である。一方で、１文が長くなると誤訳が生じやすいという二律背反がある。本実施の形態では、副詞節と主節とを１つの処理単位とするが、その間に改行コードを挿入することで、両者が別の節であるものとして機械翻訳をすることとしている（複文を構成する節と節との間に改行コードを挿入した後に機械翻訳を行う）。これにより、翻訳をより正確にすることが可能となる。（上記例では、「信号入力手段１０３の入力がハイである場合、表示部１０２は警告を表示する。」の文章の「ハイである場合、」の後に改行コードが挿入される）。 A complex sentence consisting of a main clause and an adverbial clause indicating a condition, time, cause, reason, or the like is translated more accurately when processed as one sentence than when converted into two sentences, because the adverbial clause and the main clause are related. On the other hand, there is a trade-off: the longer a sentence becomes, the more likely mistranslation is. In the present embodiment, the adverbial clause and the main clause are treated as one processing unit, but a line feed code is inserted between them so that they are machine translated as separate clauses (machine translation is performed after a line feed code is inserted between the clauses that make up the complex sentence). This makes the translation more accurate. (In the above example, a line feed code is inserted after "When ... is high," in the sentence "When the input of the signal input means 103 is high, the display unit 102 displays a warning.")

副詞節のみならず、副詞句、形容詞節、形容詞句に対しても同様の処理を行ってもよい。 The same processing may be performed not only for adverbial clauses but also for adverbial phrases, adjective clauses, and adjective phrases.

ステップＳ１０３での処理により、 Through the process in step S103, the passage

「［００２３］ "[0023]

電気自動車１００は、エンジン１０１と表示部１０２を備え、エンジン１０１は、信号入力手段１０３と表示部１０２に接続される。信号入力手段１０３の入力がハイである場合、表示部１０２は警告を表示する。」の文章は、 The electric vehicle 100 includes an engine 101 and a display unit 102, and the engine 101 is connected to a signal input means 103 and the display unit 102. When the input of the signal input means 103 is high, the display unit 102 displays a warning." is converted into the passage

「［００２３］。 "[0023].

電気自動車１００は、エンジン１０１と表示部１０２を備える。エンジン１０１は、信号入力手段１０３と表示部１０２に接続される。信号入力手段１０３の入力がハイである場合、 The electric vehicle 100 includes an engine 101 and a display unit 102. The engine 101 is connected to the signal input means 103 and the display unit 102. When the input of the signal input means 103 is high,

表示部１０２は警告を表示する。」の文章に変換される。（「ハイである場合、」の後に改行コードが挿入されている）。 the display unit 102 displays a warning." (A line feed code has been inserted after "When ... is high,".)
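Two of the preprocessing operations of step S103 lend themselves to a short regular-expression sketch: appending a full stop to paragraph numbers, and inserting a line feed after a conditional clause. The patterns and the function name are assumptions for illustration. Only the clause ending 「場合、」 from the example is handled, and splitting compound sentences is omitted because it generally requires syntactic analysis.

```python
import re

# Paragraph number such as ［００２３］ or [0023], not already followed by a full stop.
PARA_NO = re.compile(r"([［\[][0-9０-９]+[］\]])(?!。)")
# End of a conditional clause, not already followed by a line feed.
CLAUSE_END = re.compile(r"(場合、)(?!\n)")


def preprocess(text: str) -> str:
    """Mark paragraph numbers as units and separate adverbial clauses by line feeds."""
    text = PARA_NO.sub(r"\1。", text)
    return CLAUSE_END.sub(r"\1\n", text)
```

The negative lookaheads make the function idempotent, so running it twice on the same document leaves the result unchanged.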

ステップＳ１０５において、図３のデータベースを用いて、ステップＳ１０３の処理後の翻訳対象の文内の単語が、対訳単語に一括変換される。 In step S105, using the database of FIG. 3, the words in the text to be translated, as processed in step S103, are batch-converted into bilingual words.

この処理は、図３のデータベースの番号０のレコードから順にレコードを読みとり、そのレコードに記載された単語を対象語として翻訳対象の文書の全文検索を行い、発見された単語を、図３の同じレコードに記載された対訳単語に置き換える処理である。なお、ここで単語をそのレコードに記載された暗号に置き換えることとしてもよいが、後のチェックでの人間にとっての可読性が低下するため、ここでは人間（翻訳者）に意味の分かる対訳単語に置き換えることが望ましい。 In this process, records are read in order starting from record number 0 of the database of FIG. 3; the word listed in each record is used as the target word for a full-text search of the document to be translated; and each word found is replaced with the bilingual word listed in the same record of FIG. 3. The word could instead be replaced here with the cipher listed in that record, but because this would reduce readability for the human performing the later check, it is desirable at this stage to replace it with a bilingual word that a human (translator) can understand.

図３の例であれば、まずレコード番号０の「電気自動車」が翻訳対象文書中の検索単語とされ、「electric vehicle」がそれを置換する単語（置換後の単語）とされる。ＣＰＵの処理により、翻訳対象文書の先頭から「電気自動車」の語（対象語）が検索され、存在すると、それが「electric vehicle」に置き換えられる。なお、置き換え前にユーザに「○○を△△に置き換えますか？」のようなダイアログボックスを表示し、ユーザのＹＥＳ／ＮＯの入力に基づいてその単語の置換を行うかどうかを決定することとしてもよい。また、ユーザの同意を得ることなく、機械的に全文検索、一括置換（文書中の全ての対象語を置換すること）を行ってもよい。レコード番号０の処理が終了すると、次のレコードがあるかが判定され、最終レコードまで同様の処理が行われる。 In the example of FIG. 3, 「電気自動車」 of record number 0 is first taken as the search word in the document to be translated, and "electric vehicle" as the word that replaces it (the word after replacement). The CPU searches the document from its beginning for the word 「電気自動車」 (the target word) and, where it is found, replaces it with "electric vehicle". Before each replacement, a dialog box such as "Replace ○○ with △△?" may be displayed, and whether to replace the word may be decided based on the user's YES/NO input. Alternatively, the full-text search and batch replacement (replacement of all target words in the document) may be performed mechanically without obtaining the user's consent. When the processing of record number 0 is finished, it is determined whether there is a next record, and the same processing is performed up to the final record.

なお、置換語の単語は、可読性を高める観点と、単語の区切りであることを機械翻訳時に明確にするために、その前後に半角または全角のスペース（空白記号）を挿入することが望ましい。また、確定した単語であることを明確にするために、置換語の単語を鉤括弧（「」）で囲んでもよい。クオーテーションキャラクタ（''、""、””、’’など）、丸括弧、二重丸括弧、二重鉤括弧、角括弧、二重角括弧、波括弧、亀甲括弧、二重亀甲括弧、山括弧、二重山括弧、ギュメ、または隅付き括弧、それ以外の記号などで囲んでもよい。このように、囲まれる部分が他の部分とは異なることを示す記号で囲んでも良い。 To improve readability and to make the word boundary clear at the time of machine translation, it is desirable to insert a half-width or full-width space (blank symbol) before and after each word after replacement. To make clear that it is a settled word, the word after replacement may also be enclosed in corner brackets (「」). It may likewise be enclosed in quotation characters ('', "", ””, ’’, etc.), parentheses, double parentheses, double corner brackets, square brackets, double square brackets, curly braces, tortoise-shell brackets, double tortoise-shell brackets, angle brackets, double angle brackets, guillemets, lenticular brackets, or other symbols. In this way, the word may be enclosed in symbols indicating that the enclosed portion differs from the other portions.

なお、まだデータベース（図３）にデータが登録されていない状態（システム導入初期など）であれば、図４のステップＳ１０５の処理はパスされる（実行されず、翻訳対象の文章は変化しない）。 Note that if no data has yet been registered in the database (FIG. 3), for example just after the system is introduced, the process of step S105 in FIG. 4 is skipped (it is not executed, and the text to be translated is unchanged).

仮にデータベースに図３のデータが登録されていたとすると、 Assuming that the data of FIG. 3 is registered in the database, the passage

「［００２３］。 "[0023].

表示部１０２は警告を表示する。」の文章は、 The display unit 102 displays a warning." is converted into the passage

「［００２３］。 "[0023].

electric vehicle １００は、 engine １０１と display １０２を備える。 engine １０１は、 signal input unit １０３と display １０２に接続される。 signal input unit １０３の入力がハイである場合、 The electric vehicle 100 includes an engine 101 and a display 102. The engine 101 is connected to the signal input unit 103 and the display 102. When the input of the signal input unit 103 is high,

display １０２は警告を表示する。」の文に変換される。（ここでは、可読性を高めるために、置換された単語の前後に半角のスペース（空白記号）を挿入することとしている。） the display 102 displays a warning." (Here, to improve readability, a half-width space (blank symbol) is inserted before and after each replaced word.)
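The batch replacement of step S105 can be sketched as a loop over the records in order. The glossary below contains only the word pairs visible in the example; its layout as a list of tuples and the function name are assumptions, since the actual database of FIG. 3 is held in a spreadsheet.

```python
# (source word, bilingual word) pairs in record order; a stand-in for FIG. 3.
GLOSSARY = [
    ("電気自動車", "electric vehicle"),
    ("エンジン", "engine"),
    ("信号入力手段", "signal input unit"),
    ("表示部", "display"),
]


def replace_terms(text: str, glossary=GLOSSARY) -> str:
    """Replace every occurrence of each registered word with its bilingual word,
    padded with half-width spaces to mark the word boundary."""
    for word, target in glossary:
        text = text.replace(word, f" {target} ")
    return text
```

With an empty glossary the text is returned unchanged, which corresponds to the case where no data has been registered yet.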

ステップＳ１０７において、ユーザは、ステップＳ１０５までの処理で作成された文書を目視により確認し、必要であれば、文章の編集、新規単語および暗号の新規登録、新規登録単語の文書内の一括変換を行う。 In step S107, the user visually checks the document produced by the processing up to step S105 and, where necessary, edits the text, newly registers new words and ciphers, and batch-converts the newly registered words throughout the document.

これは、以下を目的として行われる。 This is done for the following purposes:

（１）依然として長いままで残されている文章を、短い文章に変更する。 (1) Change sentences that are still long to short sentences.

（２）ステップＳ１０５の一括変換で誤変換された単語を修正する。 (2) Correct the erroneously converted words in the batch conversion in step S105.

（３）対訳単語に置換えるべき単語、また、暗号化すべき単語を一括変換し、データベースに登録する。 (3) Words to be replaced with bilingual words and words to be encrypted are collectively converted and registered in the database.

すなわち、ユーザは目視によりステップＳ１０５までの処理で作成された文書を確認し、依然として長いままで残されている文章を、短い文章に変換する。例えば２以上の文章からなる重文を、句点で切ることで、同じ意味を有する２以上の文章とする。 That is, the user visually confirms the document created by the process up to step S105, and converts the sentence that is still left long into a short sentence. For example, a compound sentence consisting of two or more sentences is cut at a punctuation mark to obtain two or more sentences having the same meaning.

またユーザは、ステップＳ１０５の一括変換で誤変換された単語を修正する。技術分野などの違いによって、同一の単語であっても違う単語に翻訳すべきケースが存在する。そのような場合、ユーザは一括変換によりその単語を変換する。また、必要であればデータベースを正しい単語にアップデートしたり、技術分野などによって使い分ける複数のデータベースを準備したりする。 In addition, the user corrects the erroneously converted words in the batch conversion in step S105. There are cases where even the same word should be translated into different words due to differences in technical fields. In such a case, the user converts the word by batch conversion. Also, if necessary, update the database to the correct word, or prepare multiple databases to be used properly according to the technical field.

単語がデータベースに登録されていなかったことから、ステップＳ１０５で変換されなかった単語（とくに、参照符号前の単語）については、このステップＳ１０７においてユーザは新規単語、対訳単語および暗号の新規登録、ならびに、新規登録単語の文書内の一括変換を行う。 For words that were not converted in step S105 because they were not registered in the database (in particular, words preceding a reference numeral), the user in this step S107 newly registers the new word, its bilingual word, and its cipher, and batch-converts the newly registered word throughout the document.

図６は、本発明の第１の実施の形態における翻訳システムに含まれるコンピュータプログラムの単語、暗号登録処理を示すフローチャートである。この処理は、例えば図４のステップＳ１０７でユーザの入力に応じて実行される。 FIG. 6 is a flowchart showing a word and cryptographic registration process of a computer program included in the translation system according to the first embodiment of the present invention. This process is executed, for example, in step S107 of FIG. 4 in response to a user input.

図６を参照して、ステップＳ２０３においてユーザは文書内の文章の入力、編集処理を行っているものとする。新規単語のデータベースへの登録が必要であるとユーザが考えた場合、ユーザは、ステップＳ２０５で文書中のその単語を選択する。これは、ワープロソフトで文書が表示されているときに、登録すべき単語の先頭（または末尾）にカーソルを移動させ、シフトキーを押下しながら登録すべき単語の末尾（または先頭）まで方向キーを押下することで、登録すべき単語を反転表示させる（または色を変えたり、アンダーラインを付するなどで他の部分と区別できるようにする）ものである。単語の選択は、その単語をマウスでドラッグすることで行っても良い。 With reference to FIG. 6, it is assumed that the user is inputting and editing the text in the document in step S203. If the user thinks that the new word needs to be registered in the database, the user selects the word in the document in step S205. This is done by moving the cursor to the beginning (or end) of the word to be registered when the document is displayed in word processing software, and pressing the shift key while pressing the direction key to the end (or beginning) of the word to be registered. By pressing it, the word to be registered is highlighted (or changed in color or underlined so that it can be distinguished from other parts). A word may be selected by dragging the word with the mouse.

単語登録のためのショートカットキーが押下される（あるいは、表示されたメニューから単語登録を示す表示が選択される）と、ステップＳ２０７においてダイアログボックスが表示される。ダイアログボックスは、選択された単語の対訳語を入力するフィールドを有している。ユーザはこのフィールドに選択された単語の対訳語を入力することで、選択された単語の対訳語を確定させる。また、図３とは異なるデータベースや、学習済みニューラルネットワークによって、選択された単語の対訳語の候補をダイアログボックスに表示し、ユーザから選択を受け付けることで、選択された単語の対訳語を確定させることとしてもよい。 When the shortcut key for word registration is pressed (or an item indicating word registration is selected from a displayed menu), a dialog box is displayed in step S207. The dialog box has a field for entering the bilingual word for the selected word. The user confirms the bilingual word for the selected word by entering it in this field. Alternatively, candidate bilingual words for the selected word may be obtained from a database different from that of FIG. 3 or from a trained neural network and displayed in the dialog box, and the bilingual word may be confirmed by accepting the user's selection.

なお、ステップＳ２０５で選択される単語は、その一部が図４のステップＳ１０５で変換された単語であってもよい。例えば、データベースに「表示部」を「display unit」とする対訳が記録されていたとき、ステップＳ１０５の処理により、文書中の「情報表示部」は、「情報 display unit 」に変換される。ユーザは、「情報」を選択し、「information」の語を対訳語としてデータベースに登録しても良いが、「情報 display unit」を選択し、「information display unit」の語を対訳語としてデータベースに登録しても良い。 Note that the word selected in step S205 may partly consist of a word already converted in step S105 of FIG. 4. For example, when the database holds an entry translating 「表示部」 as "display unit", the process of step S105 converts 「情報表示部」 in the document into 「情報 display unit 」. The user may select 「情報」 and register "information" as its bilingual word in the database, or may select 「情報 display unit」 and register "information display unit" as its bilingual word.

ステップＳ２０９において、ステップＳ２０５で選択された単語と同じ単語を全文検索し、ステップＳ２１１でそれをステップＳ２０７で確定された対訳語に置換する。この置換においても、図４のステップＳ１０５と同様に、置き換え前にユーザに「○○を△△に置き換えますか？」のようなダイアログボックスを表示し、ユーザのＹＥＳ／ＮＯの入力に基づいてその単語の置換を行うかどうかを決定することとしてもよい。また、ユーザの同意を得ることなく、機械的に全文検索、一括置換（文書中の全ての対象語を置換すること）を行ってもよい。 In step S209, the same word as the word selected in step S205 is searched in full text, and in step S211 it is replaced with the bilingual word determined in step S207. Also in this replacement, as in step S105 of FIG. 4, a dialog box such as "Do you want to replace XX with △△?" Is displayed to the user before the replacement, and based on the user's YES / NO input. You may decide whether to replace the word. In addition, full-text search and batch replacement (replacement of all target words in a document) may be performed mechanically without obtaining the consent of the user.

ステップＳ２１３において、データベース（図３）のデータ登録が行われている最下行を検索する。その１つ下の行を今回の単語の登録行とし、ステップＳ２１５でそのレコード番号から暗号を作成する。ステップＳ２１７でデータベース最下行の１つ下の行に、番号、単語、暗号、対訳単語などが新規に登録される。これにより、翻訳資産であるデータベースがアップデートされる。 In step S213, the bottom row in which the data of the database (FIG. 3) is registered is searched. The line immediately below it is used as the registration line for the word this time, and the code is created from the record number in step S215. In step S217, a number, a word, a code, a bilingual word, and the like are newly registered in the line immediately below the bottom line of the database. As a result, the database, which is a translation asset, is updated.
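Steps S213 to S217 can be illustrated as follows. The text states only that the cipher is created from the record number; the base-26 mapping to three uppercase letters used here is an assumption, chosen because it reproduces AAA, AAB, and AAC for records 0, 1, and 2. The record layout and function names are likewise hypothetical.

```python
from string import ascii_uppercase


def code_from_number(n: int, width: int = 3) -> str:
    """Assumed scheme: write the record number in base 26 using the letters A-Z."""
    if not 0 <= n < 26 ** width:
        raise ValueError("record number out of range for the cipher width")
    letters = []
    for _ in range(width):
        n, r = divmod(n, 26)
        letters.append(ascii_uppercase[r])
    return "".join(reversed(letters))


def register(db: list, word: str, target: str) -> dict:
    """Append a new record below the last registered row of the database."""
    number = len(db)  # the row just below the bottom row
    record = {"no": number, "word": word,
              "code": code_from_number(number), "target": target}
    db.append(record)
    return record
```

Deriving the cipher from the record number guarantees that no two records share a cipher, provided records are never renumbered.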

ステップＳ２１７での処理の後、ステップＳ２０３からの処理に戻る。また、ステップＳ２０５で単語の登録が行われないときは、ステップＳ２０３からの処理を行う。 After the process in step S217, the process returns to the process from step S203. If the word is not registered in step S205, the process from step S203 is performed.

なお、ステップＳ２０５での単語の登録処理開始のイベントは、単語登録のためのショートカットキーが押下されることや、表示されたメニューから単語登録を示す表示が選択されることや、（「単語登録」などの）音声入力などであればよいが、（マウスや音声入力を用いずに単語登録ができるため、）ショートカットキーを用いることが望ましい。ショートカットキーは、文字入力の邪魔にならないよう、例えばコントロールキーと特定のキーの双方の押下などに割り当てられていることが望ましい。 Note that the event that starts the word registration process in step S205 may be the pressing of a shortcut key for word registration, the selection of an item indicating word registration from a displayed menu, a voice input (such as "word registration"), or the like; however, a shortcut key is preferable (because a word can then be registered without using a mouse or voice input). So as not to interfere with character input, the shortcut is preferably assigned to, for example, the control key pressed together with a specific key.

図４のステップＳ１０７での処理が終了したのであれば、ステップＳ１０９において、ステップＳ１０７までの処理で生成された文書内に、機械翻訳しにくい部分があるかどうかのチェックが行われる。これは、具体的には、以下のものである。 When the process in step S107 of FIG. 4 is completed, in step S109, it is checked whether or not there is a part in the document generated by the processes up to step S107 that is difficult to machine translate. Specifically, this is as follows.

（１）１つの文章（文頭から句点まで）の長さが所定の長さ以上あれば、その文章を機械翻訳しにくい部分であるとして、警告を出力する。 (1) If the length of one sentence (from the beginning of the sentence to the punctuation mark) is longer than the predetermined length, a warning is output as it is difficult to machine translate the sentence.

（２）１つの文章の文頭、または改行コードから、次の改行コード、または句点までの長さが所定の長さ以上あれば、その文章を機械翻訳しにくい部分であるとして、警告を出力する。 (2) If the length from the beginning of a sentence, or from a line feed code, to the next line feed code or full stop is equal to or greater than a predetermined length, a warning is output on the grounds that the text is difficult to machine translate.

（３）所定回数以上出現する単語であって、ステップＳ１０５またはＳ１０７の処理で変換されていない（データベースに登録されていない）単語があれば、変換するよう、警告を出力する。 (3) If there is a word that appears a predetermined number of times or more and has not been converted (not registered in the database) in the process of step S105 or S107, a warning is output to convert.

（４）参照符号の前に出現する単語であって、ステップＳ１０５またはＳ１０７の処理で変換されていない（データベースに登録されていない）単語があれば、変換するよう、警告を出力する。 (4) If there is a word that appears before the reference code and has not been converted (not registered in the database) in the process of step S105 or S107, a warning is output to convert it.

（５）１文中の主語と述語の対応がとれていない場合、警告を出力する。 (5) If the subject and predicate in one sentence do not correspond, a warning is output.

（６）１文中に、主語と述語が１つずつではない場合、警告を出力する。 (6) If there is not one subject and one predicate in one sentence, a warning is output.

ステップＳ１０９において、チェックに引っかかった場合、ユーザはステップＳ１０７での処理を続ける。なお、ステップＳ１０９での処理は省略してもよい。 If a check in step S109 flags a problem, the user continues the processing of step S107. Note that the processing of step S109 may be omitted.
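Checks (1) and (2) of step S109 are simple length tests and might be sketched as below. The thresholds of 60 and 40 characters are placeholders, since the text speaks only of "a predetermined length", and the function name is hypothetical. Checks (5) and (6) would need syntactic analysis and are not covered.

```python
import re


def find_long_units(text: str, max_sentence: int = 60, max_segment: int = 40):
    """Return (kind, text) warnings for over-long sentences and over-long
    segments between line feed codes."""
    warnings = []
    for sentence in re.findall(r"[^。]+。?", text):
        sentence = sentence.strip()
        if len(sentence) > max_sentence:  # check (1): sentence start to full stop
            warnings.append(("sentence", sentence))
        for segment in sentence.split("\n"):  # check (2): between line feeds
            if len(segment.strip()) > max_segment:
                warnings.append(("segment", segment.strip()))
    return warnings
```

An empty result corresponds to the case where the document passes these two checks and processing can move on to step S111.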

ステップＳ１１１において、ステップＳ１０９までの処理で得られた文書に対し、英語単語を暗号に一括変換する処理が行われる。 In step S111, a process of batch-converting English words into ciphers is performed on the documents obtained in the processes up to step S109.

この処理は、ステップＳ１０５での処理と類似する処理であり、図３のデータベースの番号０のレコードから順にレコードを読みとり、そのレコードに記載された対訳単語を対象語として翻訳対象の文書の全文検索を行い、発見された単語を、図３の同じレコードに記載された暗号に置き換える処理である。 This process is similar to the process in step S105, and records are read in order from the record number 0 in the database of FIG. 3, and the full-text search of the document to be translated is performed using the bilingual word described in the record as the target word. Is performed, and the found word is replaced with the code described in the same record of FIG.

これにより、 As a result, the passage

「［００２３］。 "[0023].

display １０２は警告を表示する。」の文章は、 display 102 displays a warning." is converted into

「［００２３］。 "[0023].

AAA １００は、 AAB １０１と FER １０２を備える。 AAB １０１は、 AAC １０３と FER １０２に接続される。 AAC １０３の入力がハイである場合、 AAA 100 includes AAB 101 and FER 102. AAB 101 is connected to AAC 103 and FER 102. When the input of AAC 103 is high,

FER １０２は警告を表示する。」に変換される。 FER 102 displays a warning."

このように、単語がそのレコードに記載された暗号に置き換えられるため、人間に意味の分からない文書が生成される（第１の暗号化）。 Because each word is thus replaced with the cipher listed in its record, a document whose meaning humans cannot understand is generated (first encryption).

図３のデータベースの番号０のレコードから順にレコードを読みとる場合、ステップＳ１１１での処理では、対訳単語が長い順（または、より多くの単語を含む順）にデータをソートしておくことが望ましい。 When reading records in order from the record number 0 in the database of FIG. 3, in the process in step S111, it is desirable to sort the data in the order of longest translation words (or in order of including more words).

例えば、データベースに対訳単語として、「display unit」と「information display unit」とが登録される場合がある。文章中に「information display unit」の語があるときに、その中の「display unit」の部分のみが先に検索され、暗号化されると、「information」が暗号化されないままとなる。従って、対訳単語の長い順（より多くの単語を含む順）に検索を行う事により、長い単語を先に暗号化することができる。データベースのデータをソートしなくても、データベースから対訳単語の長い順（より多くの単語を含む順）に検索語をピックアップすることとしてもよい。 For example, "display unit" and "information display unit" may be registered as bilingual words in the database. When there is a word "information display unit" in a sentence, if only the "display unit" part in it is searched first and encrypted, "information" remains unencrypted. Therefore, the long words can be encrypted first by performing the search in the order of the longest translation words (the order containing more words). Instead of sorting the data in the database, the search terms may be picked up from the database in the order of longest translation words (in order of inclusion of more words).
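The effect of searching longest-first can be seen in a short sketch. The cipher "GAB" for "information display unit" is hypothetical (only "FER" appears in the example), and the tuple layout and function name are assumptions.

```python
# (bilingual word, cipher) pairs; "GAB" is a hypothetical cipher for illustration.
RECORDS = [
    ("display unit", "FER"),
    ("information display unit", "GAB"),
]


def encode_terms(text: str, records=RECORDS) -> str:
    """Replace bilingual words with ciphers, longest bilingual word first."""
    for target, code in sorted(records, key=lambda r: len(r[0]), reverse=True):
        text = text.replace(target, code)
    return text
```

Sorting inside the function means the database itself does not have to be stored in length order, which matches the alternative of picking up search words in order of length.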

ステップＳ１１３において、参照符号前の単語（特に名詞）で暗号に変換できないものがあるかを検索してもよい。すなわち、参照符号前の単語で暗号に変換できていないものがあれば、ステップＳ１１５で、ステップＳ１０９と同様にユーザにデータベースへの登録を促すものである。ステップＳ１１３、Ｓ１１５の処理により、参照符号前の単語を確実に暗号化することができる。なお、ステップＳ１１３、Ｓ１１５の処理は、省略しても良い。 In step S113, it may be searched whether there is a word (particularly a noun) before the reference code that cannot be converted into a cipher. That is, if there is a word before the reference code that cannot be converted into a cipher, in step S115, the user is urged to register in the database as in step S109. By the processing of steps S113 and S115, the word before the reference code can be reliably encrypted. The processing of steps S113 and S115 may be omitted.
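A rough version of the search in step S113 is sketched below. It assumes that an unconverted word shows up as a run of kanji or katakana immediately before a reference numeral. This is a heuristic chosen for illustration rather than the method of the embodiment, and the function name is hypothetical.

```python
import re

# A run of kanji/katakana directly before a half- or full-width number.
UNENCODED = re.compile(r"([一-龯ァ-ヶー]+)\s*[0-9０-９]+")


def unencoded_terms(text: str) -> list[str]:
    """Return words that precede a reference numeral but were not replaced by a
    cipher, in order of first appearance and without duplicates."""
    seen = []
    for word in UNENCODED.findall(text):
        if word not in seen:
            seen.append(word)
    return seen
```

Each returned word is a candidate that the user could be prompted to register in the database in step S115.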

ステップＳ１１７において、ステップＳ１１５までの処理によってワープロソフトで作成された文書データを、テキストデータ（ＵＴＦ−８などのデータ）に変換する。ＨＴＭＬデータなどに変換しても良いが、通信されるデータ量や翻訳されるデータの量を削減するためには、テキストデータとすることが望ましい。 In step S117, the document data created by the word processor software by the processing up to step S115 is converted into text data (data such as UTF-8). Although it may be converted into HTML data, it is desirable to convert it into text data in order to reduce the amount of data to be communicated and the amount of data to be translated.

次に図５のステップＳ１２１において、ステップＳ１１７で得られたデータの先頭から順に、処理対象の１文（１度に機械翻訳する処理単位）を特定する。これは、データの先頭から次の句点までを第１番目の処理データとし、さらに次の句点までを第２番目の処理データとし、それを文末（第ｎ番目の処理データの最後）まで続ける処理である。第ｎ番目の処理データの最後（データ末尾）は、句点、またはデータの最後のキャラクタである。 Next, in step S121 of FIG. 5, one sentence (a processing unit to be machine translated at one time) to be processed is specified in order from the beginning of the data obtained in step S117. In this process, the first processing data is from the beginning of the data to the next punctuation, the second processing data is from the next punctuation, and it is continued until the end of the sentence (the end of the nth processing data). Is. The end of the nth processed data (the end of the data) is a punctuation mark or the last character of the data.

「［００２３］。 "[0023].

FER １０２は警告を表示する。」の文であれば、 FER 102 displays a warning. If it is a sentence of

第１番目の文は、「［００２３］。」となる。 The first sentence is "[0023]."

第２番目の文は、「（改行コード） AAA １００は、 AAB １０１と FER １０２を備える。」となる。 The second sentence is "(line feed code) AAA 100 includes AAB 101 and FER 102."

第３番目の文は、「 AAB １０１は、 AAC １０３と FER １０２に接続される。」となる。 The third sentence is "AAB 101 is connected to AAC 103 and FER 102."

第４番目の文は、「 AAC １０３の入力がハイである場合、（改行コード） The fourth sentence is "If the input of AAC 103 is high (line feed code).

FER １０２は警告を表示する。」となる。これらが第１番目の文から順に処理対象とされ、ステップＳ１２１で処理される。なお、第２番目および第４番目の文は、処理対象の１文であるが、改行コードがその中に含まれている。 FER 102 displays a warning. ". These are processed in order from the first sentence, and are processed in step S121. The second and fourth sentences are one sentence to be processed, but the line feed code is included in the sentence.

ステップＳ１２１においては、処理対象の１文が翻訳不要であるかを判定する。上記の例であれば、第１番目の文は段落番号「［００２３］。」であり、記号と数字しか含まれていない。このように、記号、数字、アルファベットのみからなる１文であれば、翻訳する必要はないため、翻訳結果を代入する変数（キュー）にそのまま追加する。または、全角文字を半角文字に変換するなどの処理を行ってから追加することとしてもよい。すなわち、ステップＳ１２１でＹＥＳであれば、その１文がステップＳ１２３で必要に応じて処理され、キューの末尾に追加される。これにより、通信コストや翻訳コストを削減することができる。また、誤訳が生じることを防ぐことができる。 In step S121, it is determined whether or not one sentence to be processed does not require translation. In the above example, the first sentence is the paragraph number "[0023].", Which contains only symbols and numbers. In this way, if it is one sentence consisting of only symbols, numbers, and alphabets, it is not necessary to translate it, so it is added as it is to the variable (queue) to which the translation result is assigned. Alternatively, it may be added after performing processing such as converting full-width characters to half-width characters. That is, if YES in step S121, the sentence is processed in step S123 as necessary and added to the end of the queue. As a result, communication costs and translation costs can be reduced. In addition, it is possible to prevent mistranslation from occurring.
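The division into processing units and the translation-unnecessary test of step S121 can be sketched as follows. The character ranges used to define "symbols, numbers, and alphabetic letters" (ASCII, CJK punctuation, and full-width ASCII variants) are an assumption, and the function names are hypothetical. `unicodedata.normalize` with the NFKC form is used for the optional full-width to half-width conversion.

```python
import re
import unicodedata

# ASCII, CJK symbols and punctuation, and full-width ASCII variants only.
ONLY_SYMBOLS = re.compile(r"[\x00-\x7F\u3000-\u303F\uFF01-\uFF5E]*")


def split_units(text: str) -> list[str]:
    """Cut the text into processing units, each ending at a full stop (。)."""
    return re.findall(r"[^。]+。?", text)


def needs_translation(unit: str) -> bool:
    """False when the unit consists solely of symbols, digits, and letters."""
    return ONLY_SYMBOLS.fullmatch(unit) is None


def to_halfwidth(unit: str) -> str:
    """Convert full-width letters, digits, and brackets to half-width."""
    return unicodedata.normalize("NFKC", unit)
```

A unit made up only of ciphers and reference numerals would also be classified as not needing translation, which is consistent with the rule stated above.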

ステップＳ１２１でＮＯであれば、ステップＳ１２５でその１文をＨＴＴＰＳ通信により外部の他社コンピュータ資源２００に送信する。ＨＴＴＰＳ通信で行われる暗号化は、第２の暗号化である。 If NO in step S121, the sentence is transmitted to an external computer resource 200 of another company by HTTPS communication in step S125. The encryption performed in HTTPS communication is the second encryption.

ステップＳ１２７において、その１文の翻訳結果を外部の他社コンピュータ資源２００から受信する。この通信もＨＴＴＰＳで暗号化が行われており、コンピュータ１００で復号化（第２の復号化）が行われることで解読される。翻訳結果は、キューの末尾に付加される。 In step S127, the translation result of the one sentence is received from an external computer resource 200 of another company. This communication is also encrypted by HTTPS, and is decrypted by decryption (second decryption) by the computer 100. The translation result is added to the end of the queue.

When a translation result is appended to the end of the queue, a space (or two spaces) may be inserted if the queue currently ends with a period. In other words, the received texts may be joined with a space between them. This improves the readability of the translated text.

ステップＳ１２９で最後の文（第ｎ番目の文）まで処理が終わったかが判定され、ＮＯであればステップＳ１２１からの処理（次の番の文の処理）を繰り返す。これにより、第１番目の文から第ｎ番目の文までの処理が完了する。 In step S129, it is determined whether the processing has been completed up to the last sentence (nth sentence), and if NO, the processing from step S121 (processing of the next sentence) is repeated. As a result, the processing from the first sentence to the nth sentence is completed.
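
The loop of steps S121 to S129 can be outlined as below. This is only a sketch: `translate` stands in for the HTTPS round trip to the external computer resource 200 and `is_translatable` for the check of step S121; both are hypothetical callables supplied by the caller, and no real translation API is implied.

```python
from typing import Callable


def translate_units(
    units: list[str],
    translate: Callable[[str], str],
    is_translatable: Callable[[str], bool],
) -> list[str]:
    """Build the result queue, sending only translatable units out."""
    queue = []
    for unit in units:
        if is_translatable(unit):
            queue.append(translate(unit))  # steps S125 and S127
        else:
            queue.append(unit)  # step S123: kept without translation
    return queue


def join_queue(queue: list[str]) -> str:
    """Join queued results, inserting one space after a trailing period."""
    out = ""
    for piece in queue:
        if out.endswith("."):
            out += " "
        out += piece
    return out
```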

If the result of step S129 is YES, the queue contains the translation result. In the example above, the text running from "[0023]." to "FER 102 displays a warning." is converted into text such as

"[0023]。

AAA 100 includes AAB 101 and FER 102. AAB 101 is connected to AAC 103 and FER 102. When an input of AAC 103 is high,

FER 102 displays warning."

and this text is held in the queue. Because the translated text is thus also encrypted, a party outside the company who obtains the translated text cannot learn its content.

In step S131, the ciphers in the queue are then converted into English words.

This processing is similar to that of step S105. Records are read in order starting from record number 0 of the database of FIG. 3, a full-text search of the document being translated is performed for the cipher held in each record, and every cipher found is replaced with the translated word held in the same record of FIG. 3.

As a result, the text above, running from "[0023]。" to "FER 102 displays warning.", is converted into

"[0023]。

electric vehicle 100 includes engine 101 and display 102. engine 101 is connected to signal input unit 103 and display 102. When an input of signal input unit 103 is high,

display 102 displays warning."

Performing this conversion in-house ensures the confidentiality of the document.

In this conversion, it is desirable to set the whole-word option of the full-text search so that a cipher does not match part of a longer word. This prevents, for example, the "ABS" in the word "ABSTRACT" from being mistaken for a cipher and converted. With Microsoft VBA, for instance, this is achieved by running the search with the MatchWholeWord property set to True.
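
The whole-word replacement of step S131 can be illustrated outside VBA as well. The sketch below swaps in Python regular expressions, using the `\b` word boundary in place of the MatchWholeWord property; the function name `decode_ciphers` and the sample table are assumptions made for the example.

```python
import re


def decode_ciphers(text: str, table: list[tuple[str, str]]) -> str:
    """Replace each cipher with its translated word, matching whole words only."""
    for cipher, word in table:
        pattern = rf"\b{re.escape(cipher)}\b"
        # A function replacement avoids interpreting backslashes in `word`.
        text = re.sub(pattern, lambda _m, w=word: w, text)
    return text


# Hypothetical records: (cipher, translated word)
table = [("ABS", "brake unit"), ("AAB", "engine")]
decoded = decode_ciphers("ABSTRACT: ABS 100 includes AAB 101.", table)
```

Here "ABSTRACT" is left untouched because the "ABS" inside it is followed by another letter, so no word boundary is found.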

Further, to prevent meaningful words such as CPU and RAM from being treated as ciphers and converted, it is desirable not to register such words in the database as ciphers (that is, to treat them as reserved words).

ステップＳ１３３において、後処理を行う。これは具体的には以下の処理である。 In step S133, post-processing is performed. Specifically, this is the following process.

(1) Deleting unnecessary spaces. For example, where two or more consecutive spaces appear anywhere other than after a period, they are reduced to one. Also, deleting the space between the digits and the letter of a reference code made up of digits and a letter (such as 100A).

(2) Deleting line feed codes that follow a comma, or more generally any line feed code other than one that follows a period. In the example above, this deletes the line feed code after "When an input of signal input unit 103 is high,".

(3) Correcting wrongly formed plurals, for example converting "boxs" to "boxes".

(4) Replacing "said", used as the translation of 「前記」, with the more common word "the".

(5) Removing any special code that was added for translation, such as corner brackets placed around a cipher. For example, if the cipher in AAA100 is machine-translated with corner brackets added, as in 「AAA」100, the output contains double quotation marks that were not in the original, as in "AAA" 100, and these are deleted.

(6) Correcting articles. For example, a word is given an indefinite article at its first occurrence and a definite article at its second and later occurrences.

(7) Deleting full-width full stops (for example, the full stop in "[0023]。" above).
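
Some of the post-processing items above lend themselves to simple pattern replacement. The sketch below covers only items (1), (2), (4), and (7); plural and article correction are left out. The function name `post_process` is hypothetical, and replacing a removed line feed with a single space (rather than nothing) is an assumption made so that the words on either side do not run together.

```python
import re


def post_process(text: str) -> str:
    """Apply a subset of the step S133 clean-ups to translated text."""
    # (2) Drop line feeds unless they directly follow a period, a
    #     full-width stop, or a line feed that was itself kept.
    text = re.sub(r"(?<![.。\n])\n+", " ", text)
    # (1) Collapse runs of spaces that do not begin right after a period.
    text = re.sub(r"(?<![. ]) {2,}", " ", text)
    # (4) Replace "said" with "the", preserving an initial capital.
    text = re.sub(r"\bsaid\b", "the", text)
    text = re.sub(r"\bSaid\b", "The", text)
    # (7) Delete full-width full stops.
    return text.replace("。", "")


raw = (
    "[0023]。\nelectric vehicle 100 includes  said engine 101. "
    "When an input is high,\n\ndisplay 102 displays warning."
)
cleaned = post_process(raw)
```

The line feed rule runs before the full-width stop is deleted so that the paragraph number stays on its own line.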

ステップＳ１３５において、ユーザは後処理後の文書をチェックし、誤りがあればそれを修正する。 In step S135, the user checks the post-processed document and corrects any errors.

ステップＳ１３７において、ワープロソフトでのスペルチェッカーにより、単語、構文の誤りがあればそれを修正し、同時に複数形の誤りがあればそれを修正する。誤りをユーザが修正・編集することで、最終的な翻訳文が完成する。 In step S137, the spell checker in the word processing software corrects any errors in the word or syntax, and at the same time corrects any errors in the plural form. The final translation is completed by the user correcting and editing the error.

このようにして、翻訳原文、および翻訳文双方の秘匿性を保ったまま、外部のコンピュータ資源を用いた機械翻訳を行う事ができる。 In this way, machine translation using an external computer resource can be performed while maintaining the confidentiality of both the original translated text and the translated text.

Further, when a word containing superscript, subscript, italic, bold, or underlined characters is registered in the database, it is machine-translated in its cipher form. Because the cipher can later be converted back into the word with its superscript, subscript, italic, bold, or underlined characters, such words are prevented from being mistranslated by the machine translation. For example, the database may hold the word "SiO₂" with the corresponding cipher "ABC" and the translated word "SiO₂", or the registration may be made as in record No. 3504 of FIG. 3. If the word is the same before and after translation, only the word and the cipher may be registered, with no translated word. The word may be registered in full-width characters and the translated word in half-width characters (or vice versa).

なお、秘密性の低い文書であれば、ステップＳ１１１〜Ｓ１１５の暗号化と、ステップＳ１３１での復号化を行わず、一部が対訳単語に変換されている文書を機械翻訳してもよい。この場合、第１の暗号化と第１の復号化は行われず、第２の暗号化と第２の復号化のみが行われることとなる。この方法は、図１の他社に対して翻訳の内容を秘密にする必要が無い場合に有効である。この方法では、機械翻訳後の文書の単語のゆらぎを防ぐことができる。この方法において、一括変換により変換された対訳単語に対して、それを囲む鉤括弧やダブルコーテーションなどを付加した後に機械翻訳を行ってもよい。これにより、翻訳しなくて良い単語であることを示した状態で機械翻訳を行う事ができるため、機械翻訳後の文書の単語のゆらぎ、誤訳がより少なくなる。 If the document has low confidentiality, the document partially converted into a bilingual word may be machine-translated without performing the encryption in steps S111 to S115 and the decryption in step S131. In this case, the first encryption and the first decryption are not performed, and only the second encryption and the second decryption are performed. This method is effective when it is not necessary to keep the content of the translation secret from the other companies in FIG. In this method, it is possible to prevent fluctuations in the words of the document after machine translation. In this method, machine translation may be performed after adding brackets or double quotation marks surrounding the bilingual words converted by batch conversion. As a result, machine translation can be performed in a state indicating that the word does not need to be translated, so that the fluctuation and mistranslation of the word in the document after machine translation are reduced.

「電気自動車」の訳語としては、「electric vehicle」、「electrical vehicle」、「electric-powered vehicle」、「electronic vehicle」、「battery car」、「battery vehicle」など複数の訳語が存在する。ニューラルネットワークによる機械翻訳を行うと、文脈によってどの訳語が当てられるかわからず、翻訳後の単語表現にゆらぎが生じる。また、翻訳対象の文書の分野の違いによって、適切な訳語が異なることも多い。本実施の形態においては、ニューラルネットワークによる機械翻訳の前に、予め単語が統一された訳語に変換される。このため、表現の揺らぎがなくなる。 There are multiple translations of "electric vehicle" such as "electric vehicle", "electrical vehicle", "electric-powered vehicle", "electronic vehicle", "battery car", and "battery vehicle". When machine translation is performed by a neural network, it is not possible to know which translated word is applied depending on the context, and the word expression after translation fluctuates. In addition, appropriate translations often differ depending on the field of the document to be translated. In the present embodiment, the words are converted into a unified translation in advance before the machine translation by the neural network. Therefore, there is no fluctuation in expression.

Note that if Japanese-to-English machine translation is performed with the ciphers written in full-width characters, the ciphers come back as half-width characters after translation. The conversion in step S131 must therefore take this into account when performing the full-text replacement. This problem does not arise if the ciphers are written in half-width characters from the start.

［第２の実施の形態］ [Second Embodiment]

FIG. 7 is a flowchart showing the word and cipher registration process of a computer program included in the translation system according to the second embodiment of the present invention.

図６で示された処理は、図１の翻訳システム内での処理に限らず、単語登録処理（データベース作成処理）として、別個独立に行う事ができる。 The process shown in FIG. 6 is not limited to the process in the translation system of FIG. 1, and can be performed separately and independently as a word registration process (database creation process).

すなわち、図７のステップＳ２０１でワープロソフトで文章を開いた後に、ステップＳ２０３以降の処理（図６のステップＳ２０３以降の処理と同じ）を行う事も可能である。 That is, it is also possible to perform the processing after step S203 (same as the processing after step S203 in FIG. 6) after opening the sentence with the word processor software in step S201 of FIG.

なお、第１、第２の実施の形態において、暗号を用いずに翻訳を行うのであれば（秘密にする必要が無い文書を翻訳する場合）、データベースに暗号を登録する必要はない。この場合、ステップＳ２１５の処理は不要となる。 In the first and second embodiments, if the translation is performed without using the cipher (when translating a document that does not need to be kept secret), it is not necessary to register the cipher in the database. In this case, the process of step S215 becomes unnecessary.

また、暗号はデータベースに登録することとしたが、例えばレコード番号から一意に決定される暗号を用いることとし、暗号はデータベースに記録せず、コンピュータ１００でレコード番号から演算により求めることとしてもよい。 Further, although the cipher is registered in the database, for example, a cipher uniquely determined from the record number may be used, and the cipher may not be recorded in the database and may be obtained by calculation from the record number on the computer 100.

このようなフローチャートで示される処理は、記憶装置１１１に記録されたコンピュータプログラムがＲＡＭ１０９上に読み出され、それをＣＰＵ１０１が順次実行することで実行される。プログラムは、コンパイルされた機械語の実行形式で保存されてもよいし、ソースコードをインタプリタが逐次実行する形式としてもよいし、中間言語で記載された形式としても良い。 The process shown in such a flowchart is executed by reading the computer program recorded in the storage device 111 onto the RAM 109 and sequentially executing the computer program by the CPU 101. The program may be saved in a compiled machine language executable format, the source code may be sequentially executed by the interpreter, or the program may be written in an intermediate language.

The programs of the flowcharts of FIGS. 6 and 7 may be executed using word processing software (such as Microsoft WORD or the open-source OpenOffice) and its macro facility (such as VBA: Visual Basic for Applications). The database may be a dedicated database or a CSV file; here, however, it is assumed that the database is built with spreadsheet software such as Microsoft EXCEL and is accessed through the VBA of the word processing software.

図７を参照して、ステップＳ２０１においてユーザはワープロソフトによって翻訳対象の文章が記録された文書ファイルを開く（文書ファイルを補助記憶装置からＲＡＭへ展開する）。また、表計算ソフトも開いておく。 With reference to FIG. 7, in step S201, the user opens the document file in which the sentence to be translated is recorded by the word processing software (expands the document file from the auxiliary storage device to the RAM). Also, open the spreadsheet software.

The document file may be a plain-text file, a word processor document file, or an HTML file. It may also be a file in which the characters contained in an image have been converted to text by OCR. An image file may be input and, in step S201, a file may be created in which its character portions are converted into text data by an OCR function. More generally, the file format is not limited to any particular one as long as it can hold text. The document may be received from an external computer via a file transfer protocol or e-mail software, or read from storage such as a USB memory. It may also be entered with the keyboard 119 or a microphone, or downloaded from the Internet.

Assume that the document contains the text "The electric vehicle 100 includes an engine 101 and a display unit 102, and the engine 101 is connected to a signal input means 103 and the display unit 102. When the input of the signal input means 103 is high, the display unit 102 displays a warning."

図４のステップＳ２０３において、ユーザは開かれた文書の内容をチェックし、必要に応じてそれを訂正する。また必要に応じて文章を追加（入力）する。 In step S203 of FIG. 4, the user checks the contents of the opened document and corrects it if necessary. Also, add (input) sentences as needed.

In step S205 of FIG. 4, the user reads the text and selects a word to be registered in the database. The user moves the cursor to the beginning or end of the word and, by pressing SHIFT plus an arrow key or by dragging with the mouse, highlights the target word (or displays it with a text and/or background color different from the rest). Registration is performed mainly for words that immediately precede a reference code, but words without a reference code may also be registered. A reference code is a sign used to refer to a sign shown in the drawings (a number, letter, or the like that indicates a component and is used together with a leader line). For example, in "electric vehicle 100", "100" is the reference code, and the word preceding the reference code is "electric vehicle".

単語の選択は、上記のように人間の判断（キーやマウスの入力）によっても良いが、例えば参照符号の前に存在する単語を自動的に検出し、それを選択することとしてもよい。また、参照符号の有無に限らず、単語を自動的に検出し、それを選択することとしてもよい。 The word may be selected by human judgment (key or mouse input) as described above, but for example, the word existing before the reference code may be automatically detected and selected. Further, the word may be automatically detected and selected regardless of the presence or absence of the reference code.

If the registration button displayed on the screen (or a shortcut key for registration) is pressed while a word is selected, then in step S207 the CPU 101 accepts input of the translated word corresponding to the word selected in step S205. This may be done by displaying a dialog box and accepting keyboard or voice input from the user, or through a UI (user interface) that displays several candidate words from a dictionary and lets the user choose the translated word from among them. This establishes the correspondence between the word before translation and the word after translation.

In step S209, the CPU 101 performs a full-text search of the document to be translated for the source word determined in the processing up to step S207, and in step S211 converts it into the corresponding translated word. The replacement may be carried out without asking the user for confirmation, or, each time an occurrence is found, a dialog box such as "Replace with (translated word)? (YES/NO)" may be displayed so that the user chooses whether to replace it. In this way, every occurrence in the text of the word selected in step S205, including the selected occurrence itself, is converted in one batch into the translated word determined in step S207. For example, if 「電気自動車」 has been associated with "electric vehicle", every occurrence of 「電気自動車」 in the document is converted to "electric vehicle".

ステップＳ２１３において、ＣＰＵ１０１は、データベース２０７の情報の登録されている最下行（図３の最下行）を検索し、その次の行である空白行の番号（通し番号）を取得する。 In step S213, the CPU 101 searches for the bottom line (bottom line in FIG. 3) in which the information in the database 207 is registered, and acquires the number (serial number) of the blank line that is the next line.

In step S215, the acquired number is converted to base 26 to create the cipher for the target word. In step S217, the cipher is recorded, together with the word before translation and the word after translation, as one record in the row below the last row of the database in which information is registered. The information registered in the database becomes an asset as a translation dictionary for future use.
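
The base-26 conversion of step S215 might look like the sketch below. The text does not spell out the digit alphabet or the padding, so the mapping used here (A = 0 through Z = 25, left-padded with "A" to three letters) is an assumption inferred from the example ciphers "AAA", "AAB", and "AAC"; the function name `record_number_to_cipher` is likewise hypothetical.

```python
def record_number_to_cipher(number: int, width: int = 3) -> str:
    """Convert a record number to a base-26 string of capital letters."""
    if number < 0:
        raise ValueError("record number must be non-negative")
    letters = []
    while True:
        number, remainder = divmod(number, 26)
        letters.append(chr(ord("A") + remainder))  # assumed: A = 0 ... Z = 25
        if number == 0:
            break
    # Assumed padding: "A" plays the role of a leading zero.
    return "".join(reversed(letters)).rjust(width, "A")
```

Under these assumptions, consecutive record numbers yield consecutive ciphers, and a reserved-word check such as the one suggested for CPU and RAM could be applied to the result before registration.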

ステップＳ２１７の処理の後、ステップＳ２０３からの処理が繰り返し行われる。また、ステップＳ２０５で登録ボタンが押されていないときは、ステップＳ２０３からの処理が繰り返し行われる。 After the process of step S217, the process from step S203 is repeated. When the registration button is not pressed in step S205, the process from step S203 is repeated.

全文の処理が終了すると、利用者は文書を保存する。これは文書中の一部が翻訳された原文データであり、その後のニューラル機械翻訳の対象とされてもよい。 When the full text processing is complete, the user saves the document. This is the original data in which a part of the document is translated, and may be the target of subsequent neural machine translation.

The words registered in the database by the processing of FIGS. 6 and 7 are mainly words that appear before a reference code. For example, if the source text to be translated is "The electric vehicle 100 includes an engine 101 and a display unit 102, and the engine 101 is connected to a signal input means 103 and the display unit 102. When the input of the signal input means 103 is high, the display unit 102 displays a warning.", then "electric vehicle", "engine", "display unit", and "signal input means" are the words to be registered.

Note that words without a reference code, such as "input" and "warning", may also be registered in the database 207 of FIG. 4 together with a cipher and a translated word.

さらに、参照符号を伴う単語の少なくとも一部が登録の対象外となっていてもよい。すなわち、「電気自動車」、「エンジン」、「表示部」、「信号入力手段」のうちの少なくとも一部が登録されなくても良い。 Further, at least a part of the word with the reference code may be excluded from the registration. That is, at least a part of the "electric vehicle", the "engine", the "display unit", and the "signal input means" need not be registered.

After the registration in step S213, the records may be sorted by the length of the source word, longest first. That is, longer words are placed nearer number 0 (toward the top) and shorter words nearer the larger numbers (toward the bottom).
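
The longest-first ordering can be sketched as a sort on the length of the source word. The record layout (source word, cipher, translated word) and the name `sort_longest_first` are assumptions for this example. The text does not state the purpose of the sort; a plausible reading is that it lets a longer word be replaced before any shorter word contained in it.

```python
Record = tuple[str, str, str]  # (source word, cipher, translated word)


def sort_longest_first(records: list[Record]) -> list[Record]:
    """Return records ordered by source-word length, longest first."""
    # sorted() is stable, so records of equal length keep their order.
    return sorted(records, key=lambda rec: len(rec[0]), reverse=True)


records = [
    ("電気自動車", "AAA", "electric vehicle"),
    ("エンジン", "AAB", "engine"),
    ("信号入力手段", "AAC", "signal input unit"),
    ("表示部", "FER", "display"),
]
ordered = sort_longest_first(records)
```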

Through the above processing, the text "The electric vehicle 100 includes an engine 101 and a display unit 102, and the engine 101 is connected to a signal input means 103 and the display unit 102. When the input of the signal input means 103 is high, the display unit 102 displays a warning." is converted into the text

「 electric vehicle １００は、 engine １０１と display １０２を備え、 engine １０１は、 signal input unit １０３と display １０２に接続される。 signal input unit １０３の入力がハイである場合、 display １０２は警告を表示する。」

that is, the Japanese text with each registered word replaced by its English translation. The user checks this text for errors and, if any are found, edits it in the word processing software.

［第３の実施の形態］ [Third Embodiment]

図８は、本発明の第３の実施の形態における翻訳システムに含まれるコンピュータプログラムの日英翻訳処理を示すフローチャートである。 FIG. 8 is a flowchart showing a Japanese-English translation process of a computer program included in the translation system according to the third embodiment of the present invention.

Referring to the figure, in step S101 the CPU 101 receives the document (text) to be translated in accordance with the user's instruction. The document input here is assumed to be text that has already been converted by the processing of FIG. 7. The following describes an example in which the text 「 electric vehicle １００は、 engine １０１と display １０２を備え、 engine １０１は、 signal input unit １０３と display １０２に接続される。 signal input unit １０３の入力がハイである場合、 display １０２は警告を表示する。」 is processed.

In step S103, the CPU 101 searches the input document for parts that are difficult to machine translate. For example, when translating from Japanese, the document is broken into sentences at full stops such as 「。」 (at periods when translating from English), and it is determined whether each sentence is within a predetermined length, that is, whether the length of each sentence is at or below a predetermined threshold. A threshold of, for example, about 80 to 120 characters is chosen according to the performance of the other company's computer resource 200.
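
The length check of step S103 can be sketched as below. The threshold of 100 characters is an arbitrary value taken from within the 80 to 120 range mentioned above, and the function name `find_long_sentences` is hypothetical.

```python
import re


def find_long_sentences(text: str, max_chars: int = 100) -> list[str]:
    """Return the sentences whose length exceeds max_chars."""
    # Each match runs up to and including the next full-width full stop;
    # trailing text without a full stop is treated as a final sentence.
    sentences = re.findall(r"[^。]+。?", text)
    return [s for s in sentences if len(s) > max_chars]


document = "短い文。" + "あ" * 120 + "。"
too_long = find_long_sentences(document)
```

The returned sentences are the candidates to be split automatically or shown to the user for manual correction.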

The longer a sentence is, the more likely machine translation is to be inaccurate, so sentence length is checked in advance and long sentences are treated as not translatable. A long sentence is converted and edited into shorter ones: for example, a compound sentence such as 「ＡはＢであり、ＣはＤである。」 ("A is B, and C is D.") is converted into several sentences such as 「ＡはＢである。ＣはＤである。」 ("A is B. C is D."). This conversion may be performed automatically, or the sentences judged to be too long may be shown to the user, who is prompted to correct them manually. When the conversion is automatic, it is desirable to show the user the converted text (and, if necessary, the text before conversion) so that the user can check that the conversion was performed correctly.

Checks for parts that are difficult to machine translate may also include whether a sentence has a subject (for example, whether it contains a phrase of the form 「○○は、」 or 「○○が、」) and whether the subject and predicate of the sentence agree. For more accurate machine translation free of mistranslation, the condition for a translatable sentence may be, for example, that it contains exactly one subject and exactly one predicate.

これらのチェックについては、正規表現を利用して文のパターンをマッチングさせることで行ってもよい。形態素解析によりチェックを行ってもよい。また例えば学習済みのリカレントニューラルネットワーク（ＲＮＮ）を用い、チェック対象の文章を入力とし、出力として機械翻訳に難がない文章を出力させる、または出力として難あり／なしの信号を出力させ、難がある場合にユーザに修正を促す、などを行うことも可能である。 These checks may be performed by matching sentence patterns using regular expressions. The check may be performed by morphological analysis. Also, for example, using a learned recurrent neural network (RNN), the sentence to be checked is input, and the sentence that is not difficult for machine translation is output as the output, or the signal with / without difficulty is output as the output. It is also possible to urge the user to make corrections in some cases.

ステップＳ１０５において、機械翻訳が可能な文章であるかのチェックを行い、難のある部分が存在するのであれば、ステップＳ１０３での処理を繰り返し行う。 In step S105, it is checked whether the sentence can be machine translated, and if there is a difficult part, the process in step S103 is repeated.

If it is determined in step S105 that machine translation is possible, the text is preprocessed in step S107. This processing makes the text easier to machine translate so that translation errors do not occur. For example, a complex sentence has the structure 「ＡがＢのとき、ＣはＤを行う。」 ("When A is B, C performs D."); to prevent translation errors, a marker code is embedded at the clause boundary of the complex sentence (in this example, immediately after 「とき、」). The code may be a line feed code, a space, a delimiter character, or the like, and is chosen so that it does not affect the translation while making the boundary easy for the machine to recognize. Likewise, for a compound sentence with a structure such as 「ＡはＢであり、ＣはＤである。」 ("A is B, and C is D."), the marker code may be embedded at the clause boundary (in this example, immediately after 「であり、」). Because the processing of step S107 makes the units handled in machine translation smaller, it prevents the selection of a wrong subject-predicate pairing and other mistranslations.
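
Embedding a marker at clause boundaries, as in step S107, can be approximated with a pattern match on typical boundary phrases. The list of phrases below (「とき、」, 「場合、」, 「であり、」) is a small hypothetical sample rather than an exhaustive rule set, and `embed_markers` is a name introduced for this sketch.

```python
import re

# Assumed boundary phrases; a real rule set would need to be broader.
_BOUNDARIES = re.compile(r"(とき、|場合、|であり、)")


def embed_markers(sentence: str, marker: str = "\n") -> str:
    """Insert a marker code immediately after each clause boundary."""
    return _BOUNDARIES.sub(lambda m: m.group(1) + marker, sentence)


marked = embed_markers(
    "signal input unit 103の入力がハイである場合、display 102は警告を表示する。"
)
```

With the default marker, the line feed code lands right after 「場合、」, so the two clauses can be handled as smaller units.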

Through the processing of steps S103 to S107, the text above,

「 electric vehicle １００は、 engine １０１と display １０２を備え、 engine １０１は、 signal input unit １０３と display １０２に接続される。 signal input unit １０３の入力がハイである場合、 display １０２は警告を表示する。」

is converted, for example, into

「 electric vehicle １００は、 engine １０１と display １０２を備える。 engine １０１は、 signal input unit １０３と display １０２に接続される。 signal input unit １０３の入力がハイである場合、 (line feed code)

display １０２は警告を表示する。」

In step S103 the text is split into simple sentences, and in step S107 a line feed code is inserted at the clause boundary of the complex sentence.

In step S109, with reference to the database in the storage device 111, the words (in particular nouns) that precede reference codes in the document to be translated are converted into ciphers (first encryption). The database associates each source word with a cipher string and a translated string.

When the words before the reference signs are batch-converted into ciphers in step S109, the text above is replaced, for example, by the ciphertext

「ＡＡＡ１００は、ＡＡＢ１０１とＦＥＲ１０２を備える。ＡＡＢ１０１は、ＡＡＣ１０３とＦＥＲ１０２に接続される。ＡＡＣ１０３の入力がハイである場合、(line feed code)

ＦＥＲ１０２は警告を表示する。」

("AAA100 comprises AAB101 and FER102. AAB101 is connected to AAC103 and FER102. When the input of AAC103 is high, (line feed code) FER102 displays a warning.").

In such ciphertext, base-26 strings such as "AAA" are handled as words, so the text can be machine-translated on the external third-party computer resource 200 without revealing its true meaning.
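One way to produce such base-26 strings is to count with the letters A to Z as digits. The sketch below assumes that a cipher is derived from an integer index and padded to three letters; the text does not state how ciphers are derived, and the function name `make_code` and its `width` parameter are hypothetical.

```python
from string import ascii_uppercase


def make_code(index: int, width: int = 3) -> str:
    """Render a non-negative integer as a fixed-width base-26 string (0 -> 'AAA')."""
    if not 0 <= index < 26 ** width:
        raise ValueError("index out of range for the chosen width")
    letters = []
    for _ in range(width):
        # Peel off the least significant base-26 digit and map it to a letter.
        index, digit = divmod(index, 26)
        letters.append(ascii_uppercase[digit])
    return "".join(reversed(letters))
```

Under this assumed scheme, the indices 0, 1 and 2 give "AAA", "AAB" and "AAC", and the index 3501 gives "FER".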

The processing in step S109 is performed as follows.

(1) Starting from the beginning of the document, search for a component of the text that consists of a word followed by a reference sign.

(2) Look up the word of the found component in the database.

(3) Obtain the corresponding cipher from the database.

(4) Replace the word looked up in (2) with the cipher obtained in (3).

(5) Search for the next component in the text that consists of a word followed by a reference sign.

(6) Repeat the processing from (1) until all components have been processed. When all components have been processed, end.
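The steps above can be sketched as a single regular-expression scan. Several points are assumptions made for illustration: the in-memory dictionary `CODEBOOK` stands in for the database, only 「電気自動車」 is taken from this document while the other Japanese words are hypothetical source terms, and a "word" is approximated as a run of kanji or katakana directly followed by digits.

```python
import re

# Hypothetical stand-in for the database: source word -> cipher.
CODEBOOK = {
    "電気自動車": "AAA",
    "エンジン": "AAB",
    "信号入力部": "AAC",
    "表示部": "FER",
}

# Assumed shape of a "word + reference sign" component: a run of kanji or
# katakana, then digits, then an optional one-letter suffix.
SET_PATTERN = re.compile(r"([一-龥ァ-ヶー]+)([0-9０-９]+[a-zａ-ｚ]?)")


def encrypt_words_before_refs(text: str, codebook=CODEBOOK) -> str:
    """Replace each registered word that precedes a reference sign with its cipher."""

    def substitute(match):
        word, ref = match.group(1), match.group(2)
        cipher = codebook.get(word)  # steps (2) and (3): database lookup
        # Step (4): swap in the cipher; unregistered words are left unchanged.
        return match.group(0) if cipher is None else cipher + ref

    # Steps (1), (5) and (6): visit every component from the start of the text.
    return SET_PATTERN.sub(substitute, text)
```

Words that are not in the codebook pass through unchanged, which leaves them available for a later check for unconverted words.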

Alternatively, the processing in step S109 may be performed as follows.

(1) Read the first word registered in the database.

(2) Search the document from its beginning for every occurrence of the word that was read, and replace each occurrence with the corresponding cipher registered in the database (full-text batch replacement).

(3) Read the next word registered in the database and repeat the processing of (2). When all words registered in the database have been processed, end.

In the processing from step S119 onward, the pre-translation data obtained through step S109 is sent over HTTPS, one sentence at a time starting from the first, to the third-party computer resource 200, which runs a trained neural network.

More specifically, in step S119 the first sentence of the pre-translation data (for example, the text up to the full stop 「。」) is sent over HTTPS to the external third-party computer resource 200. This transmission is encrypted (second encryption). In response to the request, the external third-party computer resource 200 translates the received data into the target language and sends it to the computer 100.

In step S121, the computer 100 obtains the translated sentence from the external third-party computer resource 200 as the HTTPS response.

In step S123, it is determined whether translation has been completed through the last sentence. If NO, the next sentence is processed in step S119. If translation has been completed through the last sentence, the process proceeds to step S125.
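The sentence-by-sentence loop of steps S119 to S123 can be sketched as below. The callable `translate_sentence` is a hypothetical placeholder for whatever HTTPS request the external service requires; no real service API is assumed, and the splitting rule (cut after each 「。」) follows the example given in the text.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Split after each full stop 「。」, keeping the full stop with its sentence."""
    pieces = re.findall(r"[^。]+。?", text)
    return [piece for piece in pieces if piece.strip()]


def translate_document(text: str, translate_sentence: Callable[[str], str]) -> List[str]:
    """Send the sentences one at a time and collect the translated sentences."""
    translated = []
    for sentence in split_sentences(text):  # S123: loop until the last sentence
        # S119 (request) and S121 (response) are hidden behind the callable.
        translated.append(translate_sentence(sentence))
    return translated
```

Passing the transport in as a function keeps the loop testable without network access; a marker such as a line feed inside a sentence stays with that sentence.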

In step S125, the translated document is post-processed. Here, the sentence-by-sentence translations obtained in steps S119 to S123 are combined into a single document. Because the translated text produced with the first encryption of steps S109 to S113 still contains ciphers, those ciphers are batch-converted into the corresponding translated words. This is done, for example, as follows.

(1) Read the first cipher registered in the database (such as "AAA").

(2) Search the document from its beginning for every occurrence of the cipher that was read, and replace each occurrence with the corresponding translated word in the database (full-text batch replacement).

(3) Read the next cipher registered in the database and repeat the processing from (2). When all ciphers registered in the database have been processed, end.
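The restoration steps above can be sketched as a loop over the registered ciphers. The list `ROWS` is a hypothetical stand-in for the database, and inserting a space between a restored word and a directly following reference sign is an assumption made so that the output resembles the restored example text in this description.

```python
import re

# Hypothetical database rows: (cipher, translated word).
ROWS = [
    ("AAA", "electric vehicle"),
    ("AAB", "engine"),
    ("AAC", "signal input unit"),
    ("FER", "display"),
]


def restore_translations(text: str, rows=ROWS) -> str:
    """Replace every registered cipher in `text` with its translated word."""
    for cipher, translation in rows:
        # Cipher directly followed by a reference sign: add one separating space.
        text = re.sub(
            re.escape(cipher) + r"\s*(?=\d)",
            lambda _match, word=translation: word + " ",
            text,
        )
        # Any remaining occurrence of the cipher without a reference sign.
        text = text.replace(cipher, translation)
    return text
```

The sketch relies on the ciphers never occurring inside the translated words themselves, which holds for all-uppercase strings such as "AAA" in ordinary English text.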

In step S127, a spell check is performed, including a check of plural forms. The user gives the translation a final check and corrects or edits it if there are any problems.

Through the processing in step S127, the text

"AAA100 comprises AAB101 and FER102. AAB101 is connected to AAC103 and FER102. When an input of AAC103 is high, (line feed code)

FER102 displays alert."

yields the text

"electric vehicle 100 comprises engine 101 and display 102. engine 101 is connected to signal input unit 103 and display 102. When an input of signal input unit 103 is high, (line feed code)

display 102 displays alert."

By performing in steps S125 and S127 the same processing as in steps S135 and S137 of FIG. 5, the same effects as in the first embodiment can be obtained.

[Fourth Embodiment]

FIG. 9 is a flowchart showing the Japanese-to-English translation processing of the computer program included in the translation system according to the fourth embodiment of the present invention.

The processing in this flowchart adds steps S111 and S113 to the processing of FIG. 8.

After the processing in step S109, step S111 determines whether any word before a reference sign could not be converted because it is not registered in the database. This may be determined with a regular expression that tests whether a component consisting of a word followed by a reference sign still exists, or the user may check visually.
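The regular-expression check of step S111 might look like the following sketch. It assumes that ciphers are ASCII strings and that an unconverted word is a run of kanji or katakana directly followed by digits; both the pattern and the function name `find_unconverted_words` are illustrative assumptions.

```python
import re

# Assumed shape of a leftover "word + reference sign" component.
LEFTOVER_PATTERN = re.compile(r"([一-龥ァ-ヶー]+)[0-9０-９]+")


def find_unconverted_words(text: str) -> list:
    """Return, in order of first appearance, words still followed by a reference sign."""
    found = []
    for match in LEFTOVER_PATTERN.finditer(text):
        word = match.group(1)
        if word not in found:  # report each word only once
            found.append(word)
    return found
```

An empty result corresponds to NO in step S111; a non-empty result lists the words that still need to be registered.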

If the result of step S111 is YES, the process proceeds to step S113. In step S113, each word identified in step S111 as not converted is registered in the database as a set together with its cipher and its translation. For example, the unconverted word may be highlighted in reverse video, and a dialog box containing the message "Please enter a translation of XX" and a word input field may be displayed to accept input from the user. For instance, the word 「電気自動車」 in the document is shown in reverse video, a dialog box containing the message "Please enter a translation of 「電気自動車」" and a word input field is displayed, and the input "electric vehicle" is accepted from the user. The word, its cipher, and its translation are registered in the database as a set. The cipher is preferably generated automatically.

The processing in step S113 is repeated until the result of step S111 is NO.

Once the result of step S111 is NO, the process proceeds to step S119.

[Others]

The embodiments described above, and the elements they contain (parts of their configurations and parts of their processing), may be combined or interchanged to form further embodiments.

When a full-text batch conversion is performed (processing that converts words into their translations, or translations into ciphers), the search terms are preferably read from the database in order from the longest word. For example, suppose the database contains two records, one converting 「検出手段」 to "detection means" and one converting 「情報検出手段」 to "information detection means". The longer word 「情報検出手段」 is first batch-converted to "information detection means", and only then is the shorter word 「検出手段」 batch-converted to "detection means". This prevents only the 「検出手段」 portion of 「情報検出手段」 in the document from being converted, and it improves translation efficiency. Likewise, when words are converted into ciphers, reading the words from the database longest first prevents only part of a multi-word term with a single meaning from being converted into a cipher.

When words are read and batch-converted in ascending order of record number, the records may be sorted by the number of characters in the registered word so that longer words are read first.
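The longest-first ordering can be sketched in a few lines. The dictionary `TABLE` holds the two records from the example above; the function name `batch_replace` is an assumption.

```python
# The two example records: source word -> translated word.
TABLE = {
    "検出手段": "detection means",
    "情報検出手段": "information detection means",
}


def batch_replace(text: str, table: dict) -> str:
    """Apply every record to the whole text, longest source word first."""
    for source in sorted(table, key=len, reverse=True):
        text = text.replace(source, table[source])
    return text
```

If the shorter record were applied first, 「情報検出手段」 would be left half converted, with 「情報」 still in Japanese.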

A patent specification has a "Description of Reference Signs" section at its end, which generally lists the important reference signs and the components (words) they denote. Only the words listed in that section may be encrypted. Before machine translation, a warning may be issued if a word listed in that section has not been encrypted in the document.

Also, in the case of a patent specification, headings such as "Mode for Carrying Out the Invention", "Brief Description of the Drawings", "Effects of the Invention", "Means for Solving the Problem", "Document Name: Specification", "Title of the Invention", "Technical Field", "Background Art", "Prior Art Documents", "Patent Documents", "Summary of the Invention", "Description of Reference Signs", "Document Name: Claims", and "Document Name: Abstract" may each be treated as a word and encrypted. This makes it harder to tell that the text being translated is a patent specification.

In documents such as patent specifications that are written with reference to the reference signs of drawings, the nouns that precede reference signs carry particular importance. The present invention is highly effective when at least some of the words preceding reference signs are replaced with ciphers (or translated words). Of course, at least some of the words other than those preceding reference signs may also be replaced with ciphers (or translated words). Alternatively, the words preceding reference signs may be left unreplaced while at least some of the other words are replaced with ciphers (or translated words).

For each set of a noun and a reference sign appearing in the document (a set consisting of a noun and the reference sign immediately following it), it is preferable to register only the noun in the database as a target of batch conversion and to exclude the reference sign. This allows the same word carrying a different reference sign to be processed with the same database, so the database can be used more effectively when other documents are translated in the future.

The present invention is also applicable to documents that contain no reference signs. In that case, at least some of the words in the document are replaced with ciphers (or translated words) before machine translation.

The database-driven batch conversion in step S105 of FIG. 4 and step S109 of FIG. 8 may be performed several times in succession. For example, if the user has previously recorded in the database a record converting 「検出手段」 to "detection means", then 「情報検出手段」 in the document being translated may be converted to 「情報 detection means」. This is a case in which only part of a word has been converted. The user then selects 「情報 detection means」 as a word and records "information detection means" as its translation in a new database record. When that database is used to translate the next document and the document contains 「情報検出手段」, the batch conversion first turns it into 「情報 detection means」. If the search term 「情報 detection means」 comes later in the database search order, 「情報 detection means」 is then converted to "information detection means" without any problem. However, if 「情報 detection means」 comes before 「検出手段」 in the search order, a single pass of the batch conversion does not convert 「情報 detection means」 to "information detection means".

Therefore, the database-driven batch conversion is repeated twice (or more). As a result, even a term such as 「情報 detection means」, in which part of the source text has already been translated, is converted into the correct term, such as "information detection means".
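The repeated conversion can be sketched as follows. Two points are assumptions: the leading space in the replacement for 「検出手段」 is added only so that the intermediate result matches the form 「情報 detection means」 used in the text, and the loop runs until nothing changes (capped by `max_passes`) rather than a fixed two times.

```python
# Records in database search order; the unfavourable order from the example.
RECORDS = [
    ("情報 detection means", "information detection means"),
    ("検出手段", " detection means"),  # leading space: an assumption
]


def one_pass(text: str, records) -> str:
    """Apply every record once, in database order."""
    for source, target in records:
        text = text.replace(source, target)
    return text


def convert_repeatedly(text: str, records, max_passes: int = 5) -> str:
    """Repeat the batch conversion until the text stops changing."""
    for _ in range(max_passes):
        converted = one_pass(text, records)
        if converted == text:
            break
        text = converted
    return text
```

With these records, one pass leaves 「情報 detection means」 in the text, and the second pass completes the conversion.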

Although the embodiments describe machine translation using a neural network, statistical machine translation, phrase-based or rule-based machine translation, artificial-intelligence machine translation, deep-learning machine translation, and the like may also be used; the present invention is not limited to machine translation using a neural network.

Information may be exchanged with the external third-party computer resource 200 through a web page. It may be exchanged by automating browser operations, or through an API provided by the operator of the external third-party computer resource.

The post-processing may include removing unnecessary spaces from the text, unifying terminology, converting between uppercase and lowercase letters, spell checking, and correcting the notation of plural forms.

[Effects of the Embodiments]

According to the embodiments above, both the source text and the translated text are protected by two stages of encryption: the first encryption and the second encryption. Against third parties, both encryptions keep the document confidential. Against the translation service provider, the first encryption keeps the document confidential. This makes it possible to use an external translation service while avoiding leakage of information.

Confidentiality can be ensured even when the computer resource to which the information is sent is used by an unspecified large number of people (or by multiple persons, or by others). Here, "use by an unspecified large number of people" means that the site (server) is open to the public and accessible from anywhere.

Conventional neural machine translation also has the problem that the information in an input sentence cannot always be translated exactly, with nothing missing and nothing added (parts of the translation may be omitted). It may also output the same passage twice. In the present embodiments, encryption compresses the source text (makes it shorter and simpler), and the sentence structure is simplified as well. As a result, even when neural machine translation is adopted, the information in an input sentence can be translated without excess or omission, so omissions are less likely. Duplicated passages are also less likely to be output.

In addition, learning data used so far (bilingual dictionary data, word order data, and the like) can readily be used as data to register in the database.

Furthermore, while word embedding gives conventional neural machine translation flexibility, it can also select a completely different word (sometimes a meaningless character string) and so produce a translation with a completely different meaning, especially for infrequent words and infrequent proper nouns. Long or complicated sentences have sometimes produced translations that make no sense. In the present embodiments, words are batch-converted into translated words or ciphers beforehand and then machine-translated (preferably with corner brackets or similar marks added). This prevents machine translation from rendering a source word as a translated word with a completely different meaning.

[Fifth Embodiment]

The hardware configuration of the translation system according to the fifth embodiment of the present invention is the same as in the first to fourth embodiments, so its description is not repeated here. In the fifth embodiment, the text written in the first language, which is the translation target, includes sets each consisting of a noun and the reference sign immediately following it. The translation device has a replacement unit that replaces each such set with a character string corresponding to that set. The text after replacement is sent over the Internet to an external computer. The translation device receives the translated text, written in the second language, from that computer over the Internet.

The differences between the fifth embodiment and the embodiments above are described below.

FIG. 10 shows a specific example of the data structure stored in the first database in the database 207 of the translation system according to the fifth embodiment of the present invention.

Unlike the data of FIG. 3, the data registered in this first database has no cipher column. Each first-language word is recorded in association (as a pair) with its translation, a second-language word. The data of the first database may be registered by the same registration processing as described in the first to fourth embodiments, or existing dictionary data or external dictionary data may be registered in the first database. Corresponding words in third to N-th languages may also be registered to enable translation into multiple languages.

FIG. 11 shows a specific example of the data structure stored in the second database in the database 207 of the translation system according to the fifth embodiment of the present invention.

The data registered in the second database is created from the data recorded in the first database and from the document to be translated, which contains multiple sets each consisting of a noun and the reference sign immediately following it. The second database has columns for number, word, translated word, and cipher. Here too, corresponding words in third to N-th languages may be registered to enable translation into multiple languages. In the fifth embodiment, the second database serves as the codebook used for the first encryption and the first decryption.

To create the second database, the contents of the first database are first copied (see the rows numbered 0 to 3504), and a cipher identifying each word is recorded. The ciphers are the same as in the first to fourth embodiments.

Next, the sets consisting of a noun and the reference sign immediately following it are detected in the text written in the first language, which is the translation target. For this detection, word segmentation and morphological analysis may be used to detect, as a set, a run of one or more consecutive nouns followed by a reference sign of one or more characters such as digits or letters; alternatively, a regular expression may be used. It is also possible to search the document for the words registered in the first database of FIG. 10 and, when a reference sign follows such a word, to detect the word and the reference sign as a set. In a set consisting of a noun and the reference sign immediately following it, the word may be a first-language word or a second-language word (translated word).

Specifically, if the text to be translated is written in the first language, sets consisting of a first-language word and the reference sign immediately following it are detected. If the text is written in the first language but some words have already been replaced with second-language words, sets consisting of a second-language word and the reference sign immediately following it are detected.

Suppose that the text to be translated contains the set 「電気自動車１００」, consisting of a word and the reference sign immediately following it. Using the data of the first database of FIG. 10 (or the second database of FIG. 11), the word 「電気自動車」 in the set is replaced with "electric vehicle". This establishes a correspondence between 「電気自動車１００」 and "electric vehicle 100"; a new cipher is assigned to that correspondence, and it is newly registered in the second database of FIG. 11 (see the row numbered 4012).

Here, the first-language entry may be registered as 「電気自動車１００」, or as 「"AAA" 100」, in which the 「電気自動車」 portion has been encrypted using the data of row number 0. In FIG. 11, 「"AAA" 100」 is registered.

Enclosing a cipher in special symbols such as double quotation marks makes it clear that the enclosed part is a cipher, which is convenient for later processing. That is, when a word is converted into a cipher, the cipher is enclosed in special symbols such as double quotation marks before it is inserted into the text, and when a cipher is registered in the database of FIG. 11, it is likewise enclosed in such symbols. Of course, the cipher need not be enclosed in special symbols.

In the same way, the other word-and-reference-sign sets contained in the text to be translated are registered in the second database one after another. The same word may carry different reference signs within the document to be translated (for example, 「歯車５０１ａ」 and 「歯車５０１ｂ」). In that case, each is given its own cipher and registered in its own row (the translated words are "gear 501a" and "gear 501b", respectively, and each receives a different cipher).
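The construction of the second database can be sketched as below, under several stated assumptions: the databases are plain Python lists, rows are numbered contiguously (the figure described here jumps to row 4012), ciphers are base-26 strings derived from the row number, and sets are detected by searching for registered words followed by digits. All names are hypothetical.

```python
import re
from string import ascii_uppercase


def make_code(index: int, width: int = 3) -> str:
    """Assumed cipher scheme: the row number written as a base-26 string."""
    letters = []
    for _ in range(width):
        index, digit = divmod(index, 26)
        letters.append(ascii_uppercase[digit])
    return "".join(reversed(letters))


def build_second_db(first_db, document: str) -> list:
    """Copy the first database, then add one row per word + reference sign set."""
    rows = [
        {"no": i, "word": word, "translation": translation, "cipher": make_code(i)}
        for i, (word, translation) in enumerate(first_db)
    ]
    by_word = {row["word"]: row for row in rows}
    if not by_word:
        return rows
    # A registered word (longest first) directly followed by a reference sign.
    alternatives = "|".join(re.escape(w) for w in sorted(by_word, key=len, reverse=True))
    set_pattern = re.compile(f"({alternatives})([0-9０-９]+[a-zａ-ｚ]?)")
    registered = set()
    for match in set_pattern.finditer(document):
        word, ref = match.groups()
        if (word, ref) in registered:  # each set gets exactly one row
            continue
        registered.add((word, ref))
        base = by_word[word]
        number = len(rows)
        rows.append({
            "no": number,
            "word": f'"{base["cipher"]}" {ref}',  # quoted cipher + reference sign
            "translation": f'{base["translation"]} {ref}',
            "cipher": make_code(number),
        })
    return rows
```

For a first database containing 「電気自動車」 and 「歯車」, a document mentioning 「電気自動車100」, 「歯車501a」 and 「歯車501b」 yields three additional rows, each with its own cipher.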

With the second database, as in the first to fourth embodiments, bidirectional replacement is possible among words, translated words, and ciphers, and also among words with reference signs, translated words with reference signs, and ciphers.

Moreover, by keeping the first database as a translation-knowledge database (a database for word conversion and word translation) and using the second database as a translation database tailored to the document being translated (a so-called made-to-order database), ciphers and translated-word-plus-reference-sign sets can be associated with word-plus-reference-sign sets even across multiple documents that assign different reference signs to the same word.

In this embodiment, each word-plus-reference-sign set can be replaced with a cipher before translation. This shortens the text to be translated (the text sent to the server), which reduces the amount of data and the translation cost. Because the text to be machine-translated is shorter, mistranslations are also less likely.

翻訳においては、翻訳対象の文書が第２のデータベースを用いて暗号化される。このとき、翻訳対象の文書内の上記セット中の単語が第１の言語であれば、図１１のデータの「単語」のカラムに記載された単語が、その行の暗号に変換される。 In translation, the document to be translated is encrypted using a second database. At this time, if the word in the above set in the document to be translated is the first language, the word described in the "word" column of the data in FIG. 11 is converted into the code of that line.

The document to be translated may contain a word + reference code set and also contain the same word without a reference code. To keep the encrypted text short, it is desirable to first replace word + reference code sets with ciphers (in FIG. 11, using the rows numbered 4012 and later), and only then replace bare words with ciphers (in FIG. 11, using the rows from number 0 up to, but not including, number 4012). In this way, a word accompanied by a reference code is encrypted as a word + reference code set, and a word without a reference code is encrypted as the word alone.

If, as shown in FIG. 11, cipher + reference code is registered in the word column, the cipher may first be decoded with the same database into word + reference code, and the result may then be converted into a cipher.

If the words of the sets in the document to be translated have already been replaced with the second language, the word listed in the "translated word" column of the data in FIG. 11 is converted into the cipher in the same row. Here too, the document may contain a translated word + reference code set and also contain the same translated word without a reference code. To keep the encrypted text short, it is desirable to first replace translated word + reference code sets with ciphers (in FIG. 11, using the rows numbered 4012 and later), and only then replace bare translated words with ciphers (in FIG. 11, using the rows before number 4012). In this way, a translated word accompanied by a reference code is encrypted as a translated word + reference code set, and a translated word without a reference code is encrypted as the translated word alone.

図１１に示されるように対訳単語のカラムに、対訳単語＋参照符号が登録されるのであれば、それがそのまま文章中で検索された上で暗号への変換が行われる。対訳単語も一部を暗号化して（"AAA" 100などとして）データベースに登録することとしてもよい。 If the bilingual word + reference code is registered in the column of the bilingual word as shown in FIG. 11, it is searched in the sentence as it is and then converted into the code. The bilingual words may also be partially encrypted and registered in the database (such as "AAA" 100).

If, as in FIG. 11, a word + reference code set is registered in the second database as cipher + reference code, the second database may be managed in two parts: the part that associates words with ciphers (numbers 0 to 3504 in FIG. 11) and the part that associates word + reference code with ciphers (numbers 4012 and later in FIG. 11). The step of converting words into ciphers using the word-to-cipher part is executed first, followed by the step of converting into ciphers using the word + reference code part. As a result, a word accompanied by a reference code is encrypted as a word + reference code set, and a word without a reference code is encrypted as the word alone. This processing can be changed as appropriate depending on the algorithm. For example, the two parts of the second database may be handled as separate databases.

When encrypting words or translated words, it is desirable to encrypt longer words (those with more characters) first. Suppose, for example, that the database records both "入力手段" ("input means"; cipher JKL) and "信号入力手段" ("signal input means"; cipher YSH), each with its own cipher. If the shorter "入力手段" were processed first, "信号入力手段" in the text being processed would be converted into "信号 JKL", which is undesirable from the standpoint of both shortening the text and reducing mistranslation.

より長い（文字数の多い）単語や対訳単語を優先して暗号化することで、処理対象の文章に含まれる「信号入力手段」は「YSH」の暗号に正しく変換され、文章を短くでき、誤訳を少なくすることができる。 By preferentially encrypting longer (more characters) words and bilingual words, the "signal input means" included in the text to be processed is correctly converted to the "YSH" cipher, and the text can be shortened and mistranslated. Can be reduced.
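The longest-first rule can be sketched as follows. This is a minimal Python illustration rather than the implementation of the embodiment: the function name `encrypt_longest_first` and the sample sentence are assumptions, while the two codebook entries are the ones given in the example above.

```python
def encrypt_longest_first(text: str, codebook: dict[str, str]) -> str:
    """Replace codebook words with their ciphers, longest word first."""
    for word in sorted(codebook, key=len, reverse=True):
        text = text.replace(word, codebook[word])
    return text


# Entries taken from the example in the text.
codebook = {"入力手段": "JKL", "信号入力手段": "YSH"}

# Hypothetical sentence that contains both terms.
sentence = "信号入力手段は入力手段に接続される。"
print(encrypt_longest_first(sentence, codebook))  # YSHはJKLに接続される。
```

Sorting by length means the order in which rows were registered does not matter; a database query ordered by descending word length would serve the same purpose.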

In this way, by using the second database of FIG. 11, each word + reference code set contained in the text to be translated, which is written in the first language, is replaced with the corresponding cipher, and each word contained in that text is likewise replaced with its corresponding cipher (first encryption). Sending the document that has undergone the first encryption to an external server allows it to be processed (translated) while its confidentiality is maintained.

The second-language document returned by the external server still contains ciphers, so it is decrypted using the second database of FIG. 11 (cipher table, codebook). That is, each cipher in the text is replaced with the corresponding entry in the translated word column of FIG. 11.

For example, suppose that a patent specification describing an electric vehicle is to be translated and that the document file contains the following text:

「［００２３］

電気自動車１００は、エンジン１０１と表示部１０２を備え、エンジン１０１は、信号入力手段１０３と表示部１０２に接続される。信号入力手段１０３の入力がハイである場合、表示部１０２は警告を表示する。」

(In English: "[0023] The electric vehicle 100 includes an engine 101 and a display unit 102, and the engine 101 is connected to a signal input means 103 and the display unit 102. When the input of the signal input means 103 is high, the display unit 102 displays a warning.")

In the present embodiment, the text after the first encryption is:

「［００２３］

"HSD" は、"HSE" と "HEF" を備え、"HSE" は、"HEG"と"HEF"に接続される。"HEG"の入力がハイである場合、"HEF"は警告を表示する。」

(For simplicity of explanation, no preprocessing is assumed.) Because each word is encrypted together with its reference code as a single unit, the text is shorter than in the first to fourth embodiments, and the possibility of mistranslation is reduced. Words without a reference code can also be encrypted.
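The first encryption of this example paragraph can be sketched in Python as two passes: sets of word + reference code first, bare words second. The four set entries follow the example; the bare-word ciphers "AAA" and "AAB", the table names, and the function names are hypothetical, and the sketch emits ciphers in double quotes without the surrounding spaces that appear in the example text.

```python
# Hypothetical excerpt of a second database. The set rows follow the example
# in the text; the bare-word ciphers are invented for illustration.
SET_ROWS = {
    "電気自動車１００": "HSD",
    "エンジン１０１": "HSE",
    "表示部１０２": "HEF",
    "信号入力手段１０３": "HEG",
}
WORD_ROWS = {"電気自動車": "AAA", "エンジン": "AAB"}


def replace_all(text: str, rows: dict[str, str]) -> str:
    """Replace every key in rows with its quoted cipher, longest key first."""
    for key in sorted(rows, key=len, reverse=True):
        text = text.replace(key, f'"{rows[key]}"')
    return text


def first_encryption(text: str) -> str:
    """Encrypt word + reference code sets first, then the remaining bare words."""
    text = replace_all(text, SET_ROWS)
    return replace_all(text, WORD_ROWS)


source = (
    "電気自動車１００は、エンジン１０１と表示部１０２を備え、"
    "エンジン１０１は、信号入力手段１０３と表示部１０２に接続される。"
    "信号入力手段１０３の入力がハイである場合、表示部１０２は警告を表示する。"
)
print(first_encryption(source))
```

Running the set pass first ensures that "エンジン１０１" becomes a single cipher instead of a bare-word cipher followed by a dangling "１０１".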

When this text is translated by an external computer resource, the result is, for example:

「[0023]。

"HSD" includes "HSE" and "HEF". "HSE" is connected to "HEG" and "HEF". When an input of "HEG" is high, "HEF" displays warning.」

Decrypting this with the second database yields, as in the first to fourth embodiments, the translation:

「[0023]。

electric vehicle 100 includes engine 101 and display 102. engine 101 is connected to signal input unit 103 and display 102. When an input of signal input unit 103 is high, display 102 displays warning.」

(For simplicity of explanation, no post-processing is assumed.)
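The decryption step can be sketched as a single regular-expression substitution. This is an illustrative Python sketch: the `DECODE` table reproduces only the four pairs from the example, and the function name and the choice to leave unknown ciphers untouched are assumptions.

```python
import re

# Cipher -> translated word + reference code, as in the example above.
DECODE = {
    "HSD": "electric vehicle 100",
    "HSE": "engine 101",
    "HEF": "display 102",
    "HEG": "signal input unit 103",
}


def decode(text: str, table: dict[str, str]) -> str:
    """Replace each quoted three-letter cipher with its table entry.

    Ciphers that are not in the table are left unchanged (an assumption).
    """
    return re.sub(
        r'"([A-Z]{3})"',
        lambda m: table.get(m.group(1), m.group(0)),
        text,
    )


translated = (
    '"HSD" includes "HSE" and "HEF". "HSE" is connected to "HEG" and "HEF". '
    'When an input of "HEG" is high, "HEF" displays warning.'
)
print(decode(translated, DECODE))
```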

FIGS. 12 and 13 are flowcharts showing the Japanese-to-English translation processing performed by the computer program included in the translation system according to the fifth embodiment of the present invention.

このフローチャートが図４および図５のフローチャートと異なる点について説明する。 The difference between this flowchart and the flowcharts of FIGS. 4 and 5 will be described.

In step S107, the selected word and its translation are registered in the first database (FIG. 10).

ステップＳ１０９での処理の後、ステップＳ１１０において、第１のデータベースの内容と、翻訳対象の文とから、第２のデータベース（図１１）を作成する処理が行われる。ステップＳ１１１において、単語または単語＋参照符号を暗号化する処理が行われる。この処理においては、第２のデータベースが用いられる。 After the process in step S109, in step S110, a process for creating a second database (FIG. 11) is performed from the contents of the first database and the sentence to be translated. In step S111, a process of encrypting the word or the word + reference code is performed. A second database is used in this process.

In step S131 of FIG. 13, the second database is used to convert each cipher into a word (translated word), or into a word (translated word) + reference code (first decryption).

なお、第１、第２のデータベースは、それぞれ異なる種類のプログラムで動作するものであってもよい。例えば第１のデータベースは、ＣＳＶファイルやＥＸＣＥＬファイルとし、第２のデータベースはＳＱＬｉｔｅやＭｙＳＱＬなどのソフトウェアを用いるなどである。 The first and second databases may operate with different types of programs. For example, the first database is a CSV file or an EXCEL file, and the second database uses software such as SQLite or MySQL.

In the present embodiment, word + reference code is registered in the database as one unit together with a cipher. Alternatively, prefix + word, or prefix + word + reference code, may be registered as one unit together with a cipher. For example, with "Ｓｉ" in "Ｓｉ基板３００" ("Si substrate 300") treated as a prefix, when a prefix precedes the word "基板" ("substrate"), either "Ｓｉ基板" or "Ｓｉ基板３００" may be registered as one unit together with a cipher.

Alternatively, the whole source text may first be encrypted using the word-to-cipher database of FIG. 11 (numbers 0 to 3504), after which the cipher + reference code portions of the document are searched for in order to create the database of word + reference code and cipher combinations (numbers 4012 onward). Encryption is then performed again using that database.

It is desirable to prevent duplicate registration of words and translated words in the database. This means preventing the same word / translated word pair from being registered twice, or preventing the same entry from being registered twice for either one of the two.
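Both kinds of duplicate check can be sketched as follows. This is a hypothetical Python helper, assuming the database is held as a plain list of (word, translated word) pairs; the name `register` and the `unique_each` flag are not from the specification.

```python
def register(db: list[tuple[str, str]], word: str, translation: str,
             unique_each: bool = False) -> bool:
    """Append (word, translation) to db unless it would be a duplicate.

    With unique_each=False only an identical pair is rejected.
    With unique_each=True the entry is also rejected when either the word
    or the translation alone is already registered.
    Returns True if the entry was added.
    """
    for known_word, known_translation in db:
        if (known_word, known_translation) == (word, translation):
            return False
        if unique_each and (known_word == word or known_translation == translation):
            return False
    db.append((word, translation))
    return True


db: list[tuple[str, str]] = []
print(register(db, "装置", "apparatus"))  # True: first registration
print(register(db, "装置", "apparatus"))  # False: same pair again
```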

［その他］ [Other]

本実施の形態におけるフローチャートは、複数のソフトウェアにより実行されてもよい。例えば、一部がワープロソフト（マクロなど）によって実行され、一部はスクリプト言語やコンパイルされた実行形式ファイルによって実行されてもよい。一部が人間により実行されてもよい。 The flowchart in this embodiment may be executed by a plurality of software. For example, some may be executed by word processing software (macro, etc.), and some may be executed by a scripting language or a compiled executable file. Some may be performed by humans.

In addition, full-width characters may be converted to half-width, spaces may be deleted, or runs of spaces may be shortened, as appropriate, at any stage of the processing.
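One possible way to carry out such normalization in Python is Unicode NFKC normalization followed by a space-collapsing substitution. This is only a sketch under that assumption: NFKC also rewrites other compatibility characters (for example, half-width katakana become full-width), which may or may not be wanted, and the function name `normalize_text` is hypothetical.

```python
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Fold full-width letters, digits and spaces, then collapse space runs."""
    # NFKC maps full-width alphanumerics and the ideographic space (U+3000)
    # to their ASCII counterparts.
    text = unicodedata.normalize("NFKC", text)
    # Reduce any run of two or more spaces to a single space.
    return re.sub(r" {2,}", " ", text).strip()


print(normalize_text("電気自動車１００　　は"))  # 電気自動車100 は
```

If normalization is applied, it should be applied consistently to both the text and the database entries; otherwise a full-width entry such as "電気自動車１００" would no longer match.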

In the above embodiments, words in the first language are replaced with words in the second language before encryption (steps S105, S107, and so on). Alternatively, this replacement may be skipped, and the database may be used to encrypt the first-language words directly before they are sent to the external computer resource.

文書中のすべての単語（および／または単語＋参照符号）を暗号に置換してもよいし、その一部のみを置換してもよい。 All words (and / or words + reference codes) in the document may be replaced with ciphers, or only some of them may be replaced.

［第６の実施の形態］ [Sixth Embodiment]

本発明の第６の実施の形態における翻訳システムのハードウェア構成は、第１〜第５の実施の形態におけるそれと同じであるためここでの説明を繰り返さない。第６の実施の形態において翻訳システムは、上述の実施の形態による方法（または他の方法）で翻訳された文章の原文と翻訳文とを比較し、翻訳の結果を評価するものである。 Since the hardware configuration of the translation system according to the sixth embodiment of the present invention is the same as that in the first to fifth embodiments, the description thereof will not be repeated. In the sixth embodiment, the translation system compares the original text of the text translated by the method (or other method) according to the above-described embodiment with the translated text, and evaluates the translation result.

The translation is evaluated by determining whether information present in the source text remains in the translation, and/or whether information absent from the source text is also absent from the translation. The former determination detects loss of source information; the latter detects unnecessary information added to the translation (so-called "springing out" of information).

For example, the text after the first encryption illustrated in the fifth embodiment is

「［００２３］

"HSD" は、"HSE" と "HEF" を備え、"HSE" は、"HEG"と"HEF"に接続される。"HEG"の入力がハイである場合、"HEF"は警告を表示する。」

and when this text is translated by an external computer resource, the result is

「[0023]。

"HSD" includes "HSE" and "HEF". "HSE" is connected to "HEG" and "HEF". When an input of "HEG" is high, "HEF" displays warning.」

ここで、第１の暗号化後の文章には、"HSD"が１つ、"HSE"が２つ、"HEF"が３つ、"HEG"が２つ含まれている。 Here, the first encrypted text contains one "HSD", two "HSE", three "HEF", and two "HEG".

また、翻訳後の文章にも、"HSD"が１つ、"HSE"が２つ、"HEF"が３つ、"HEG"が２つ含まれている。 The translated text also contains one "HSD", two "HSE", three "HEF" and two "HEG".

This shows that the information of every word before translation remains in the translated text without loss (the word loss rate is determined to be 0%). It also shows that the translated text contains no word that was absent from the text before translation (the word addition rate, that is, the rate of "springing out", is determined to be 0%).
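The specification does not give a formula for the two rates. One plausible definition, assumed here purely for illustration, divides the number of lost (or added) occurrences by the total number of occurrences in the source text. The function name `loss_and_addition_rates` is hypothetical.

```python
from collections import Counter


def loss_and_addition_rates(source_tokens, target_tokens):
    """Return (loss rate, addition rate) in percent.

    Assumed definition: occurrences lost or added, divided by the number of
    occurrences in the source. Returns (0.0, 0.0) for an empty source.
    """
    src, tgt = Counter(source_tokens), Counter(target_tokens)
    total = sum(src.values())
    if total == 0:
        return 0.0, 0.0
    lost = sum((src - tgt).values())   # occurrences in source but not in target
    added = sum((tgt - src).values())  # occurrences in target but not in source
    return 100.0 * lost / total, 100.0 * added / total


# Ciphers of the example, in order of appearance (1 HSD, 2 HSE, 3 HEF, 2 HEG).
source = ["HSD", "HSE", "HEF", "HSE", "HEG", "HEF", "HEG", "HEF"]
print(loss_and_addition_rates(source, list(source)))  # (0.0, 0.0)
```

With this definition, a translation that drops one of the two "HSE" occurrences would give a loss rate of 12.5%.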

When a word has been lost, a message may be presented to the user (shown on a display, announced by voice, and so on), for example: 'One "HSE" is missing.', 'Both "HSE" are missing.', 'One of the two "HSE" in the source text is missing.', 'The "HSD" in the source text is missing.', or 'The word loss rate of this sentence is XX%.' When nothing has been lost, a message such as 'Nothing is missing.' or 'Loss rate 0%' may be presented to the user.

The warning message may be displayed only when there is a loss, with no message displayed otherwise.

When words have sprung out (that is, unnecessary information has been added to the translation), a message may be presented to the user, for example: 'One extra "HSE" has been added.', 'Two extra "HSE" have been added.', 'An "HSD" that is not in the source text has been added.', 'The source text has one "HSE", but the translation has two.', or 'The word addition rate of this sentence is XX%.' When nothing has been added, a message such as 'Nothing has been added.' or 'Addition rate 0%' may be presented to the user.

The warning message may be displayed only when something has been added, with no message displayed otherwise.

So that the user can tell which sentence a warning refers to, the warning message may be inserted into the translation (for example, before the beginning or after the end of that sentence), or the sentence and the content of the warning may be made clear to the user by underlining, shading, attaching a comment to the sentence, or the like.

Instead of presenting a cipher such as "HSD", the corresponding term "electric vehicle 100" (or "electric vehicle" without the reference code) may be read from the database and presented, or the term "電気自動車１００" (or "電気自動車" without the reference code) may be read from the database and presented.

A message containing both the pre-translation and post-translation words may be presented to the user as a warning, for example: 'The translation does not contain "electric vehicle 100", the translated term corresponding to "電気自動車１００" in the source text.'

The translation may also be evaluated on unencrypted text. Suppose that the unencrypted text 「電気自動車１００は、エンジン１０１と表示部１０２を備え、エンジン１０１は、信号入力手段１０３と表示部１０２に接続される。信号入力手段１０３の入力がハイである場合、表示部１０２は警告を表示する。」 is translated and that the translation "electric vehicle 100 includes engine 101 and display 102. engine 101 is connected to signal input unit 103 and display 102. When an input of signal input unit 103 is high, display 102 displays warning." is obtained. The translation can then be evaluated by determining whether each pre-translation word ("電気自動車１００", "電気自動車", and so on) and its corresponding post-translation word ("electric vehicle 100", "electric vehicle", and so on) occur the same number of times.

Such a determination requires deciding the post-translation word from the pre-translation word (or, conversely, the pre-translation word from the post-translation word). This can be done by looking up the corresponding word in a database such as those of FIGS. 3 and 10. Alternatively, the translated term may be determined from the word by using machine translation or another dictionary.

"HSE"などの暗号が翻訳前の文章と翻訳後の文章に含まれるかを調べるのであれば、データベースを用いなくても可能である。すなわち、アルファベット大文字３文字を正規表現で見つけ、各暗号ごとに出現回数をカウントするなどで、各単語（各文字列）の出現数を調べることができる。 If you want to find out if a code such as "HSE" is included in the pre-translation text and the post-translation text, you can do it without using a database. That is, the number of occurrences of each word (each character string) can be checked by finding three uppercase letters of the alphabet with a regular expression and counting the number of occurrences for each cipher.

このようにして、第１の言語で記述された文章を第２の言語で記述された文章に翻訳した結果を評価することができる。すなわち、第１の言語で記述された文章は、文字列を複数含んでいる。第１の言語で記述された文章に含まれる文字列に対応する文字列が、第２の言語で記述された文章に含まれているかを判定し、その判定結果が表示される。 In this way, the result of translating a sentence written in the first language into a sentence written in the second language can be evaluated. That is, the sentence written in the first language contains a plurality of character strings. It is determined whether or not the character string corresponding to the character string included in the sentence described in the first language is included in the sentence described in the second language, and the determination result is displayed.
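The database-free check described above, finding runs of three uppercase letters with a regular expression and counting each cipher, can be sketched in Python as follows. The pattern and the function name `count_ciphers` are assumptions; the lookarounds keep longer uppercase runs from being counted as ciphers.

```python
import re
from collections import Counter

# Exactly three consecutive ASCII uppercase letters.
CIPHER = re.compile(r"(?<![A-Z])[A-Z]{3}(?![A-Z])")


def count_ciphers(text: str) -> Counter:
    """Count how many times each three-letter cipher occurs in text."""
    return Counter(CIPHER.findall(text))


encrypted = (
    '"HSD" は、"HSE" と "HEF" を備え、"HSE" は、"HEG"と"HEF"に接続される。'
    '"HEG"の入力がハイである場合、"HEF"は警告を表示する。'
)
translated = (
    '"HSD" includes "HSE" and "HEF". "HSE" is connected to "HEG" and "HEF". '
    'When an input of "HEG" is high, "HEF" displays warning.'
)

# True when every cipher occurs equally often before and after translation.
print(count_ciphers(encrypted) == count_ciphers(translated))  # True
```

A three-letter pattern would also match ordinary acronyms such as "CPU" in the text, so a real codebook would need ciphers that cannot collide with such words.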

Note that Japanese, for example, does not clearly distinguish singular and plural nouns, whereas English does. Consequently, even when a translation is correct, the number of a noun may differ between the source text and the translation. Suppose a word corresponding to a first-language word (for example "装置") correctly appears in the second-language text as "apparatuses", while the database registers "apparatus" as the word corresponding to "装置". It is undesirable for "apparatus" and "apparatuses" to be judged as not matching (that is, for a word to be judged missing) merely because of this difference in number.

従って、例えば複数形の単語は単数形に変換する（または逆に単数形の単語は複数形に変換する）、単数形の単語と複数形の単語とは同じものとして処理する、などにより、翻訳前の文章に含まれる単語と、翻訳後の文章に含まれる単語のマッチング（両者に存在するかの判定）を行うこととしてもよい。 So, for example, a plural word is converted to a singular form (or singular word is converted to a plural form), a singular form word and a plural form word are treated as the same, and so on. Matching (determination of existence in both) between the word contained in the previous sentence and the word contained in the translated sentence may be performed.
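Treating singular and plural forms as the same word can be approximated by letting the registered singular form match with an optional "s" or "es" suffix. The Python sketch below is deliberately naive: it does not handle irregular plurals, and it cannot tell a plural noun from a verb form such as "displays". The function name `count_term` is hypothetical.

```python
import re


def count_term(term: str, text: str) -> int:
    """Count occurrences of term, accepting a regular plural suffix."""
    pattern = r"\b" + re.escape(term) + r"(?:es|s)?\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


translation = "The apparatuses are connected. Each apparatus has a display."
print(count_term("apparatus", translation))  # 2
```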

同様に、動詞が主語の人称などにより変化する言語を扱う場合において、第１の言語で記述された文章に含まれる動詞の単語が、第２の言語で記述された文章に含まれているかを判断するときに、例えば動詞は原形に変換する、語形が変化しても動詞は同じものとして処理する、などにより、翻訳前の文章に含まれる単語と、翻訳後の文章に含まれる単語のマッチングを行うこととしてもよい。 Similarly, when dealing with a language in which the verb changes depending on the person of the subject, whether the word of the verb contained in the sentence described in the first language is included in the sentence described in the second language. When making a judgment, for example, the verb is converted to the original form, the verb is treated as the same even if the word form changes, etc., so that the word contained in the sentence before translation and the word contained in the sentence after translation are matched. May be done.

As described above, with current machine translation, part of the information in the source text may be lost after translation, or information absent from the source text may appear in the translation (springing out of information). Checking for such loss and addition by eye takes a great deal of effort, because both the source text and the machine translation must be checked.

When machine-translated text is evaluated, the above determination may be made for each unit that makes up the document (a sentence, clause, phrase, paragraph, line, or page, several of any of these, or the entire document), and the result presented for each unit. The user can then see, for each unit, whether any translation has been omitted (information lost) or unnecessary information added and, if so, which words are affected. This reduces the effort the user spends checking the translation.

すなわち、１つの単位の翻訳結果に対応付けて、単語の抜けや追加を利用者に提示することができるため、利用者はその１単位を詳細にチェックすべきであるかどうか容易に判断することができる。翻訳の抜け（情報の消失）や不要な情報の追加がある単位部分についてのみ、利用者に注意を喚起するものである（利用者はその単位部分を詳細にチェックして、翻訳文の推敲を行うことができる）。警告が出されなかった単位については、利用者によるチェック、推敲の労力が減少される。 That is, since it is possible to present the user with missing words or additions in association with the translation result of one unit, the user can easily determine whether or not the one unit should be checked in detail. Can be done. It calls attention to the user only for the unit part where there is omission of translation (loss of information) or addition of unnecessary information (the user checks the unit part in detail and refines the translated text. It can be carried out). For units for which no warning has been issued, the labor of checking and elaboration by the user will be reduced.

また、翻訳が行われた文章を評価するときに、文書全体について判定を行い、文書全体についての判定結果を提示することとしてもよい。これにより、文書全体としての翻訳の抜け（情報の消失）や不要な情報の追加（翻訳文の評価）の判定を行うことができる。この判定結果は、翻訳の品質の評価として利用することができる。例えば、機械翻訳や、機械翻訳エンジン、人の手による翻訳についての評価が可能となる。 Further, when evaluating the translated text, the judgment may be made for the entire document and the judgment result for the entire document may be presented. As a result, it is possible to determine whether the translation of the entire document is missing (loss of information) or addition of unnecessary information (evaluation of the translated text). This determination result can be used as an evaluation of the quality of translation. For example, it is possible to evaluate machine translation, machine translation engine, and manual translation.

例えば、１文（開始から句点（またはピリオドなど）で区切られるまでの文の単位）ごとに評価を行う場合の処理は、以下のように行うことができる。 For example, the process for evaluating each sentence (the unit of a sentence from the beginning to the time when it is separated by a punctuation mark (or a period, etc.)) can be performed as follows.

（１）図５のステップＳ１２５で送信した１文と、ステップＳ１２７で受信した１文の翻訳結果とを比較・評価し、翻訳の抜け（情報の消失）や不要な情報の追加があれば、抜けた文字列（暗号、単語など）や追加された文字列（暗号、単語など）についての情報（文字列、その数）を利用者に提示する。 (1) The translation result of one sentence transmitted in step S125 of FIG. 5 and the translation result of one sentence received in step S127 are compared and evaluated, and if there is any omission of translation (loss of information) or addition of unnecessary information, Information (character string, its number) about the missing character string (encryption, word, etc.) and the added character string (encryption, word, etc.) is presented to the user.

（２）全て翻訳が完了した原文の文書と、その翻訳文の文書とを準備し、両者を先頭から１文（１つの文の開始から句点（またはピリオドなど）で区切られるまでの文の単位）ずつ読み出し、双方を比較・評価し、翻訳の抜け（情報の消失）や不要な情報の追加があれば、その文字列についての情報（文字列、その数）を利用者に提示する。 (2) Prepare the original document that has been completely translated and the document of the translated sentence, and divide both from the beginning by one sentence (from the beginning of one sentence to a punctuation mark (or period, etc.)). ) By reading, comparing and evaluating both, and if there is a translation omission (loss of information) or unnecessary information added, the information about the character string (character string, its number) is presented to the user.
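Per-unit evaluation along the lines of (1) and (2) can be sketched as follows, assuming the source units and their translations have already been paired (for example, each sentence sent in step S125 with the result received in step S127). The function name `evaluate_units`, the message wording, and the faulty second translation are hypothetical.

```python
import re
from collections import Counter

CIPHER = re.compile(r"(?<![A-Z])[A-Z]{3}(?![A-Z])")


def evaluate_units(pairs):
    """Return warning messages for each (source unit, translated unit) pair."""
    reports = []
    for number, (source, translated) in enumerate(pairs, start=1):
        src = Counter(CIPHER.findall(source))
        tgt = Counter(CIPHER.findall(translated))
        # Counter subtraction keeps only positive differences.
        for cipher, n in sorted((src - tgt).items()):
            reports.append(f'unit {number}: {n} x "{cipher}" missing')
        for cipher, n in sorted((tgt - src).items()):
            reports.append(f'unit {number}: {n} x "{cipher}" added')
    return reports


pairs = [
    ('"HSD" は、"HSE" と "HEF" を備え、"HSE" は、"HEG"と"HEF"に接続される。',
     '"HSD" includes "HSE" and "HEF". "HSE" is connected to "HEG" and "HEF".'),
    # Hypothetical faulty translation in which "HEG" has been dropped.
    ('"HEG"の入力がハイである場合、"HEF"は警告を表示する。',
     'When an input is high, "HEF" displays warning.'),
]
for message in evaluate_units(pairs):
    print(message)  # unit 2: 1 x "HEG" missing
```

Pairing by request and response avoids having to align sentences afterwards; in the example, the first Japanese sentence comes back as two English sentences, so a naive split on periods would misalign the units.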

なお、上記実施の形態において、翻訳は外部サーバを用いて行ってもよいし、スタンドアローンのＰＣ内やＬＡＮ内のＰＣ内で行ってもよい。 In the above embodiment, the translation may be performed using an external server, or may be performed in a stand-alone PC or a PC in the LAN.

上述の実施の形態における処理は、ソフトウェアにより行っても、ハードウェア回路を用いて行ってもよい。また、上述の実施の形態における処理を実行するプログラムを提供することもできるし、そのプログラムをＣＤ−ＲＯＭ、フレキシブルディスク、ハードディスク、ＲＯＭ、ＲＡＭ、メモリカードなどの記録媒体に記録してユーザーに提供することにしてもよい。プログラムは、ＣＰＵなどのコンピューターにより実行される。また、プログラムはインターネットなどの通信回線を介して、装置にダウンロードするようにしてもよい。 The processing in the above-described embodiment may be performed by software or by using a hardware circuit. It is also possible to provide a program that executes the processing according to the above-described embodiment, and record the program on a recording medium such as a CD-ROM, a flexible disk, a hard disk, a ROM, a RAM, or a memory card and provide the program to the user. You may decide to do it. The program is executed by a computer such as a CPU. Further, the program may be downloaded to the device via a communication line such as the Internet.

上記実施の形態は、すべての点で例示であって制限的なものではないと考えられるべきである。本発明の範囲は上記した説明ではなくて特許請求の範囲によって示され、特許請求の範囲と均等の意味及び範囲内でのすべての変更が含まれることが意図される。 It should be considered that the above embodiments are exemplary in all respects and not restrictive. The scope of the present invention is shown by the scope of claims rather than the above description, and is intended to include all modifications within the meaning and scope equivalent to the scope of claims.

100 Computer
101 CPU
103 Communication unit
109 RAM
111 Storage device
200 Computer resources of other companies
203 Search / replace unit
205 Document editing / word registration unit
207 Database
209 Memory
211 Display / output unit
213 Input unit
400 Internet
500 Own computer resources

Claims

A translation evaluation device that evaluates the result of translating a sentence written in a first language into a sentence written in a second language, wherein
the sentence written in the first language contains a plurality of character strings, the device comprising:
determination means for determining whether a character string corresponding to a character string included in the sentence written in the first language is included in the sentence written in the second language; and
presentation means for presenting a determination result of the determination means.

The translation evaluation device according to claim 1, wherein the sentence described in the second language is a sentence obtained by machine-translating the sentence described in the first language.

The translation evaluation device according to claim 1 or 2, wherein the character string is any one of a character string indicating a word, a character string indicating a word and a reference sign immediately following the word, and a coded character string, or is a character string that includes any one of a character string indicating a word, a character string indicating a word and a reference sign immediately following the word, and a coded character string.

The translation evaluation device according to any one of claims 1 to 3, wherein
the sentence written in the second language is acquired as a machine translation result of a sentence in which a character string indicating a certain word, or a character string including a character string indicating a certain word, in the sentence written in the first language has been replaced with another corresponding character string, and
the determination means determines whether or not the replaced character string is included in the sentence written in the second language.

The translation evaluation device according to any one of claims 1 to 4, wherein
the determination means makes a determination for each unit constituting a document, and
the presentation means presents a determination result for each unit constituting the document.

The translation evaluation device according to any one of claims 1 to 5, wherein
the determination means makes a determination for an entire document, and
the presentation means presents a determination result for the entire document.

The translation evaluation device according to any one of claims 1 to 6, wherein the determination result includes at least one of a character string not included in the sentence written in the second language and the number of character strings not included in the sentence written in the second language.

The translation evaluation device according to any one of claims 1 to 7, wherein the determination means further determines whether or not a character string corresponding to the character string included in the second language is included in the sentence written in the first language.

A control program for a translation evaluation device that evaluates the result of translating a sentence written in a first language into a sentence written in a second language, wherein
the translation evaluation device includes a computer,
the sentence written in the first language contains a plurality of character strings, and
the control program causes the computer to execute:
a determination step of determining whether or not a character string corresponding to the character string included in the first language is included in the sentence written in the second language; and
a presentation step of presenting a determination result of the determination step.

A translation evaluation method using a translation evaluation device that evaluates the result of translating a sentence written in a first language into a sentence written in a second language, wherein
the translation evaluation device includes a computer,
the sentence written in the first language contains a plurality of character strings, and
the translation evaluation method comprises:
a determination step of determining whether or not a character string corresponding to the character string included in the first language is included in the sentence written in the second language; and
a presentation step of presenting a determination result of the determination step.