JP2015095182A

JP2015095182A - Character string processing device, method, and program

Info

Publication number: JP2015095182A
Application number: JP2013235384A
Authority: JP
Inventors: Akira Ono; Masahide Mizushima; Hajime Uchino; Sen Yoshida; Takafumi Hikichi; Taichi Katayama
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2013-11-13
Filing date: 2013-11-13
Publication date: 2015-05-18
Anticipated expiration: 2033-11-13
Also published as: JP5833087B2

Abstract

PROBLEM TO BE SOLVED: To obtain the character positions of an input character string before replacement even when the character positions are changed by performing replacement processing on the input character string. SOLUTION: A character string processing device 10 includes a replacement unit 14 that: replaces a first partial character string included in an input character string with a second partial character string on the basis of a set of predetermined replacement rules for replacing the first partial character string with the second partial character string; and generates and assigns character position information and length information of the second partial character string in the input character string after the replacement, on the basis of character position information and length information of the first partial character string in the input character string before the replacement.

Description

The present invention relates to a character string processing device, method, and program, and more particularly to a character string processing device, method, and program for processing a character string before translation in machine translation.

Machine translation, in which a computer program mechanically converts a sentence in one natural language into a sentence in another natural language, has long been known. In recent years, statistical machine translation, which realizes machine translation automatically using a statistical model, has also become known (see, for example, Non-Patent Document 1).

In statistical machine translation, for example when an English sentence is translated into a Japanese sentence, there is frequently a need to know which character position in the English sentence before translation corresponds to the character position of each character in the Japanese sentence that is the translation result.

In statistical machine translation, it is also common to perform replacement processing as pre-translation processing, such as converting mathematical formulas, chemical formulas, long proper nouns, and the like into virtual words, or deleting articles.

P. Koehn et al. 2007. "Moses: Open Source Toolkit for Statistical Machine Translation"

However, general statistical machine translation only establishes phrase correspondences between the text before and after translation. Consequently, if the character positions change because the pre-translation processing described above is executed on the input sentence to be translated, it becomes difficult to associate character positions of the input sentence with the translated sentence.

The present invention has been made in view of the above circumstances, and an object thereof is to provide a character string processing device, method, and program that can obtain the character positions of the input character string before replacement even when the character positions have changed because replacement processing was executed on the input character string.

To achieve the above object, the character string processing device of the present invention includes a replacement unit that, based on a set of predetermined replacement rules each replacing a first partial character string with a second partial character string, replaces the first partial character string included in an input character string with the second partial character string according to the replacement rule, and that generates and assigns character position information and length information of the second partial character string in the input character string after replacement, based on the character position information and length information of the first partial character string in the input character string before replacement.

The first partial character string in the replacement rules may include the empty string, and the replacement unit may include an information generation unit and a character string processing unit. The information generation unit generates, for each character obtained by dividing the input character string into characters, the character position information indicating the position of that character, with the first character of the input character string at position 0, and the length information, with the length of each character of the input character string set to 1. When the first partial character string replaced according to the replacement rule is the empty string, the character string processing unit generates 0 as the character position information of each character of the second partial character string and 0 as the length information of each of those characters. When the first partial character string replaced according to the replacement rule is not the empty string, the character string processing unit generates the minimum of the character position information of the characters of the first partial character string as the character position information of each character of the second partial character string, and the length of the first partial character string as the length information of each of those characters.

The character string processing method of the present invention is a method executed in a character string processing device including a replacement unit. The method includes a step in which the replacement unit, based on a set of predetermined replacement rules each replacing a first partial character string with a second partial character string, replaces the first partial character string included in an input character string with the second partial character string according to the replacement rule, and generates and assigns character position information and length information of the second partial character string in the input character string after replacement, based on the character position information and length information of the first partial character string in the input character string before replacement.

The character string processing program of the present invention is a program for causing a computer to function as each unit of the character string processing device according to claim 1 or claim 2.

As described above, the character string processing device, method, and program of the present invention have the effect that the character positions of the input character string before replacement can be obtained even when the character positions have changed because replacement processing was executed on the input character string.

FIG. 1 is a block diagram showing an example of the functional configuration of the character string processing device according to the present embodiment.
FIG. 2 is a diagram showing an example of a pre-replacement character string, character position information, and length information.
FIG. 3 is a diagram showing an example of a post-replacement character string, character position information, and length information.
FIG. 4 is a diagram showing an example of a pre-replacement character string, character position information, and length information.
FIG. 5 is a diagram showing an example of a post-replacement character string, character position information, and length information.
FIG. 6 is a flowchart showing the character string processing routine in the present embodiment.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
<Summary of the invention>
First, an outline of the embodiment of the present invention will be described.
In the present embodiment, as pre-translation processing, a first partial character string included in the pre-replacement character string, which is the input sentence, is replaced with a second partial character string. At this time, character position information and length information of the second partial character string after replacement are generated and assigned based on the character position information and length information of the first partial character string. As a result, even when the character positions change because pre-translation processing is executed on the input sentence to be translated, the character positions of the input sentence can be appropriately associated with the translated sentence.

<System configuration>
The character string processing device 10 according to the present embodiment includes an input unit 12, a replacement unit 14, and an output unit 16.

The input unit 12 accepts the input sentence to be translated. The input unit 12 is realized by an input device such as a keyboard, a mouse, or a storage device.

Functionally, the replacement unit 14 includes a replacement rule storage unit 18, an information generation unit 20, and a character string processing unit 22. The replacement unit 14 is realized, for example, by a computer including a CPU (Central Processing Unit), a RAM (Random Access Memory), and a ROM (Read Only Memory) that stores the replacement rules and a character string processing program for executing the character string processing routine described later. A nonvolatile memory may be used instead of the ROM.

The replacement rule storage unit 18 stores a set of replacement rules, each describing how a first partial character string included in the input sentence to be translated is replaced with a second partial character string as preprocessing for statistical machine translation. In the present embodiment, "character" includes not only characters used to write the languages of various countries, such as Roman letters and kanji, but also digits and symbols. A "partial character string" may consist of a single character, and may also be the empty string. In the following, the input sentence received by the input unit 12 may be referred to as the pre-translation character string, and the character string obtained by replacing the pre-translation character string according to the replacement rules may be referred to as the post-replacement character string.

In the present embodiment, two replacement rules are described. The first replacement rule is, for example: "If no specific character is present at the end of the input sentence, insert the specific character at the end of the sentence. The character position information of the inserted specific character is "0", and the length information of the inserted specific character is "0"."

The specific character is a character representing a full stop (period) that is normally written at the end of a sentence, such as "．" or "。". Put another way, the first replacement rule replaces the empty string at the end of the pre-replacement character string (the first partial character string) with the specific character (the second partial character string).

The second replacement rule is, for example: "A specific character string consisting of consecutive symbols and/or digits is replaced with a virtual word. The character position information of every character of the inserted virtual word is set equal to the smallest character position in the specific character string. The length information of every character of the inserted virtual word is set to (character position of the last character of the specific character string + length of that last character - smallest character position among the characters of the specific character string)."

The virtual word is, for example, a word such as "スウシキ" ("suushiki", the Japanese word for "formula" written in katakana) when the specific character string is a mathematical formula containing digits and arithmetic symbols. The quantity (character position of the last character of the specific character string + length of that last character - smallest character position among the characters of the specific character string) is, in other words, the length of the specific character string.

The information generation unit 20 divides the input sentence received by the input unit 12, that is, the pre-replacement character string, into individual characters, and generates and assigns character position information and length information to each character. For example, as shown in FIG. 2, when the pre-replacement character string is "This is a pencil", it consists of 16 characters including the blank characters (spaces), so it is divided into 16 characters. For the first (leftmost) character of the pre-replacement character string, character position information with the character position "0" is generated, and the character position is incremented by one for each character toward the last (rightmost) character. As a result, as shown in FIG. 2, "0" is generated as the character position information for the first character "T", and "15" is generated for the last character "l". As the length information, "1" is generated for every character, as shown in FIG. 2.
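The per-character annotation performed by the information generation unit 20 can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the implementation of the device; the names `AnnotatedChar` and `annotate` are hypothetical and do not appear in the specification.

```python
from typing import NamedTuple


class AnnotatedChar(NamedTuple):
    """One character with its annotations (hypothetical structure)."""
    char: str      # a single character of the string
    position: int  # 0-based character position in the original input
    length: int    # number of original characters this character stands for


def annotate(text: str) -> list[AnnotatedChar]:
    """Split text into characters; positions count from 0, every length is 1."""
    return [AnnotatedChar(ch, i, 1) for i, ch in enumerate(text)]


annotated = annotate("This is a pencil")
print(len(annotated))   # 16 characters, spaces included
print(annotated[0])     # first character "T": position 0, length 1
print(annotated[-1])    # last character "l": position 15, length 1
```

Keeping the position and length next to each character means that later replacements only have to rewrite the annotations of the characters they touch.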

The character string processing unit 22 replaces the pre-replacement character string received by the input unit 12 based on the replacement rules stored in the replacement rule storage unit 18. For example, for a sentence such as "This is a pencil" shown in FIG. 2, in which no period is present as the specific character at the end of the pre-translation character string, a period is inserted at the end according to the first replacement rule, giving "This is a pencil." as shown in FIG. 3. That is, the empty string at the end of the pre-replacement character string (the first partial character string) is replaced with the specific character (the second partial character string).

As shown in FIG. 3, the character string processing unit 22 also generates and assigns "0" as the character position information and "0" as the length information for the period inserted as the specific character.
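The first replacement rule can be sketched in Python as below. This is a hedged illustration only: the function name `apply_first_rule`, the tuple layout `(character, position, length)`, the set of accepted sentence-ending characters, and the choice to insert an ASCII period are all assumptions made for the example.

```python
def apply_first_rule(annotated, terminators=(".", "．", "。")):
    """Append a period if the sentence does not already end with one.

    annotated: list of (character, position, length) tuples.
    terminators: assumed set of sentence-ending characters.
    """
    if annotated and annotated[-1][0] in terminators:
        return list(annotated)  # rule does not fire
    # The replaced first partial character string is empty, so the inserted
    # character gets position 0 and length 0.
    return list(annotated) + [(".", 0, 0)]


annotated = [(ch, i, 1) for i, ch in enumerate("This is a pencil")]
result = apply_first_rule(annotated)
print("".join(ch for ch, _, _ in result))  # This is a pencil.
print(result[-1])                          # ('.', 0, 0)
```

A length of 0 marks the period as having no counterpart in the input sentence, which is what distinguishes it from a real character at position 0 (whose length is 1).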

Likewise, for a sentence containing a specific character string (first partial character string) representing a mathematical formula such as "2+3=5", as shown in FIG. 4, the specific character string is replaced with a virtual word (second partial character string) such as "スウシキ" according to the second replacement rule, as shown in FIG. 5.

Here, the character position of "5", the last character of the specific character string, is "4", and its length is "1". The smallest character position among the characters of the specific character string, that is, the character position of its first character "2", is "0". The character string processing unit 22 therefore generates, for each character of the virtual word, "0" as the character position information and "5" (= 4 + 1 - 0) as the length information. In other words, the length information of each character of the virtual word is set equal to the length of the specific character string.
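The second replacement rule and the arithmetic above can be sketched in Python as follows. This is a sketch under assumptions, not the specification's implementation: the regular expression used to detect a "specific character string", the function name `apply_second_rule`, and the sample sentence "2+3=5 is correct." are all hypothetical (the full sentence of FIG. 4 is not reproduced in the text).

```python
import re

# Assumed definition of a "specific character string": a run of two or more
# digits and arithmetic symbols.
FORMULA = re.compile(r"[0-9+\-*/=]{2,}")
VIRTUAL_WORD = "スウシキ"


def apply_second_rule(annotated):
    """Replace each formula-like run with the virtual word.

    annotated: list of (character, position, length) tuples.
    """
    text = "".join(ch for ch, _, _ in annotated)
    out, cursor = [], 0
    for m in FORMULA.finditer(text):
        out.extend(annotated[cursor:m.start()])
        span = annotated[m.start():m.end()]
        min_pos = min(pos for _, pos, _ in span)
        _, last_pos, last_len = span[-1]
        # last position + last length - smallest position
        length = last_pos + last_len - min_pos
        out.extend((ch, min_pos, length) for ch in VIRTUAL_WORD)
        cursor = m.end()
    out.extend(annotated[cursor:])
    return out


annotated = [(ch, i, 1) for i, ch in enumerate("2+3=5 is correct.")]
result = apply_second_rule(annotated)
print("".join(ch for ch, _, _ in result))  # スウシキ is correct.
print(result[0])                           # ('ス', 0, 5), since 4 + 1 - 0 = 5
```

Every character of the virtual word carries the same position and length, so any one of them is enough to locate the whole replaced formula in the input.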

In this way, when the first partial character string to be replaced is the empty string, the character string processing unit 22 generates and assigns "0" as the character position information and "0" as the length information of each character of the second partial character string. When the first partial character string to be replaced is not the empty string, it generates and assigns the minimum character position of the first partial character string as the character position information of each character of the second partial character string, and the length of the first partial character string as the length information of each of those characters.

<Operation of the character string processing device>
Next, the character string processing routine executed in the character string processing device 10 according to the present embodiment will be described with reference to FIG. 6.

In step S100, input of the character string to be translated is accepted. In the present embodiment, as an example, the input character string is an English character string, and the case where this English character string is replaced according to the first and second replacement rules is described.

In step S102, the input sentence received in step S100 is divided into individual characters, and character position information representing the position of each character and length information representing the length of each character are generated and assigned. For example, as shown in FIG. 2, when the input sentence serving as the pre-replacement character string is "This is a pencil", it is divided into 16 characters, and the character position information and length information shown in FIG. 2 are generated for each character.

In step S104, the pre-replacement character string is replaced based on the set of replacement rules stored in the replacement rule storage unit 18. When the first partial character string to be replaced is the empty string, the character position information of each character of the second partial character string is set to "0" and the length information of each character is set to "0". When the first partial character string to be replaced is not the empty string, character string processing is performed in which the character position information of each character of the second partial character string is set to the minimum character position of the first partial character string, and the length information of each character is set to the length of the first partial character string.

For example, when the pre-replacement character string is the one shown in FIG. 2, there is no period at the end of the sentence. Therefore, according to the first replacement rule, a period is added to the end of the pre-replacement character string, as shown in FIG. 3. That is, the empty string at the end of the pre-replacement character string (the first partial character string) is replaced with the specific character (the second partial character string).

Because the first partial character string being replaced is the empty string, "0" is generated as the character position information and "0" as the length information for the period, which is the second partial character string.

The pre-replacement character string shown in FIG. 2 does not contain a character string of consecutive symbols and/or digits, so no replacement according to the second replacement rule is performed.

When the pre-replacement character string is the one shown in FIG. 4, on the other hand, a period is present at the end of the sentence, so no replacement based on the first replacement rule is performed. However, the character string shown in FIG. 4 contains the formula "2+3=5" as a specific character string (first partial character string) of consecutive symbols and/or digits, so this formula is replaced with "スウシキ" as the virtual word (second partial character string) according to the second replacement rule.

Because the formula being replaced as the first partial character string is not the empty string, "0" is generated as the character position information and "5" as the length information for each character of "スウシキ", which is the second partial character string.

In step S106, the post-replacement character string produced in step S104 is output to a downstream translation device. The downstream translation device is, for example, a device that converts the post-replacement character string into another language, that is, translates it.
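Steps S102 to S106 can be tied together in a short Python sketch. It is offered as one possible reading of the routine, not as the device's implementation: the helper names `annotate`, `substitute`, and `preprocess`, the regular expression for formulas, the order in which the two rules are applied, and the use of an ASCII period are assumptions made for illustration.

```python
import re


def annotate(text):
    """S102: one (character, position, length) tuple per character."""
    return [(ch, i, 1) for i, ch in enumerate(text)]


def substitute(annotated, start, end, replacement):
    """S104: replace annotated[start:end] and derive the new annotations."""
    first = annotated[start:end]
    if not first:
        # Empty first partial character string: position 0, length 0.
        pos, length = 0, 0
    else:
        pos = min(p for _, p, _ in first)
        _, last_pos, last_len = first[-1]
        length = last_pos + last_len - pos
    new = [(ch, pos, length) for ch in replacement]
    return annotated[:start] + new + annotated[end:]


def preprocess(text):
    annotated = annotate(text)
    # Second rule (assumed pattern). Matches are handled right to left so
    # that the indices of earlier matches remain valid.
    for m in reversed(list(re.finditer(r"[0-9+\-*/=]{2,}", text))):
        annotated = substitute(annotated, m.start(), m.end(), "スウシキ")
    # First rule: insert a period when the sentence lacks one.
    if not text.endswith((".", "．", "。")):
        n = len(annotated)
        annotated = substitute(annotated, n, n, ".")
    return annotated  # S106: this would be handed to the translation device


print(preprocess("This is a pencil")[-1])  # ('.', 0, 0)
print(preprocess("x = 10+20")[4])          # ('ス', 4, 5)
```

Expressing both rules through a single `substitute` helper mirrors the text: the only thing that varies between rules is which span is replaced and with what, while the annotation logic depends solely on whether the replaced span is empty.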

As described above, according to the present embodiment, even when a first partial character string included in the pre-replacement character string, which is the input sentence, is replaced with a second partial character string as pre-translation processing, character position information and length information of the second partial character string after replacement are generated based on the character position information and length information of the first partial character string. Therefore, even when the character positions change because pre-translation processing is executed on the input sentence to be translated, the character positions of the input sentence can be appropriately associated with the translated sentence.
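How the stored annotations might be used to get back to the input sentence can be sketched as follows. The specification does not describe this lookup step, so the function `original_span` and the sample sentence are purely hypothetical; the annotations are built by hand to match the values the second replacement rule would produce.

```python
def original_span(annotated_char, source):
    """Return the part of the input sentence that a character stands for."""
    _, pos, length = annotated_char
    return source[pos:pos + length]


source = "2+3=5 is correct."
# Post-replacement string "スウシキ is correct." with its annotations:
# the virtual word maps to position 0, length 5; the rest is unchanged.
replaced = [(ch, 0, 5) for ch in "スウシキ"] + [
    (ch, i, 1) for i, ch in enumerate(source) if i >= 5
]

print(original_span(replaced[0], source))   # 2+3=5
print(original_span(replaced[5], source))   # i
print(repr(original_span((".", 0, 0), source)))  # '' (inserted character)
```

An inserted character with length 0 yields an empty span, which signals that it has no counterpart in the input sentence.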

The present invention is not limited to the embodiment described above, and various modifications and applications are possible without departing from the gist of the invention.

For example, the first partial character string in the second replacement rule is not limited to a mathematical formula, and may be a chemical formula or a proper noun of at least a predetermined length. The replacement rules are also not limited to the first and second replacement rules described above, and other replacement rules may be stored.

Although the character string processing device 10 described above contains a computer system, the "computer system" also includes a website providing environment (or display environment) when a WWW system is used.

Although this specification describes an embodiment in which the program is installed in advance, the program may also be provided stored on a computer-readable recording medium such as a CD-ROM or a memory card.

Description of symbols
10 Character string processing device
12 Input unit
14 Replacement unit
16 Output unit
18 Replacement rule storage unit
20 Information generation unit
22 Character string processing unit

Claims

A character string processing device comprising a replacement unit that, based on a set of predetermined replacement rules each replacing a first partial character string with a second partial character string, replaces the first partial character string included in an input character string with the second partial character string according to the replacement rule, and that generates and assigns character position information and length information of the second partial character string in the input character string after replacement, based on character position information and length information of the first partial character string in the input character string before replacement.

The character string processing device according to claim 1, wherein
the first partial character string in the replacement rule includes the empty string, and
the replacement unit includes:
an information generation unit that generates, for each character obtained by dividing the input character string into characters, the character position information indicating the character position of the character, with the first character position of the input character string being 0, and the length information, with the length of each character of the input character string being 1; and
a character string processing unit that, when the first partial character string replaced according to the replacement rule is the empty string, generates 0 as the character position information of each character of the second partial character string and 0 as the length information of each character of the second partial character string, and that, when the first partial character string replaced according to the replacement rule is not the empty string, generates the minimum value of the character position information of the characters of the first partial character string as the character position information of each character of the second partial character string and the length of the first partial character string as the length information of each character of the second partial character string.

A character string processing method executed in a character string processing device including a replacement unit, the method comprising a step in which the replacement unit, based on a set of predetermined replacement rules each replacing a first partial character string with a second partial character string, replaces the first partial character string included in an input character string with the second partial character string according to the replacement rule, and generates and assigns character position information and length information of the second partial character string in the input character string after replacement, based on character position information and length information of the first partial character string in the input character string before replacement.

A character string processing program for causing a computer to function as each part of the character string processing device according to claim 1.