JP2014075005A

JP2014075005A - Information processor, information processing method and program

Info

Publication number: JP2014075005A
Application number: JP2012221653A
Authority: JP
Inventors: Qing He; 青何
Original assignee: Zenrin Datacom Co Ltd
Current assignee: Zenrin Datacom Co Ltd
Priority date: 2012-10-03
Filing date: 2012-10-03
Publication date: 2014-04-24
Anticipated expiration: 2032-10-03
Also published as: JP6021571B2

Abstract

PROBLEM TO BE SOLVED: To provide an information processing apparatus, an information processing method, and a program capable of suitably converting Japanese text into the notation of a language that has no long sounds, even when the text contains long sounds. SOLUTION: The information processing apparatus includes: an input unit 101 that accepts a text including kanji and the reading kana (furigana) of the text; a long-sound kanji determination unit 107 that determines whether the text includes a long-sound kanji; a long-sound kana determination unit 103 that determines whether the reading kana includes a long-sound kana; a long-sound determination unit 111 that, when the text includes a long-sound kanji, the reading kana includes a long-sound kana, and the reading of the long-sound kanji coincides with the long-sound kana, identifies the long-sound kanji and the long-sound kana as a long sound; and a translation unit 113 that converts the reading kana into the notation of a language that has no long sounds, depending on whether a long sound is included.

Description

Some aspects of the present invention relate to an information processing apparatus, an information processing method, and a program.

In recent years, devices that translate between Japanese and other languages have become widespread. Japanese has long sounds (long vowels), whereas languages such as English, Korean, and Thai do not. As a result, when Japanese proper nouns and the like are written in a language without long sounds, the resulting notations may be inconsistent.
To address this problem, Patent Document 1, for example, discloses an apparatus that restores omitted long sounds when converting a romanized string into a kana string.

Patent Document 1: Japanese Re-publication of PCT International Publication No. 2008/072734

However, the technique described in Patent Document 1 considers only conversion from romanized notation into a kana string, and gives no consideration to converting Japanese notation into the notation of another language.

Some aspects of the present invention have been made in view of the above problem. One object is to provide an information processing apparatus, an information processing method, and a program that make it possible to suitably convert Japanese text into the notation of a language without long sounds, even when the text contains long sounds.

An information processing system according to the present invention includes: input means for receiving a text including kanji and the furigana (reading kana) of the text; first determination means for determining whether the text includes a long-sound kanji; second determination means for determining whether the furigana includes a long-sound kana; means for identifying the long-sound kanji and the long-sound kana as a long sound when, as a result of the determinations, the text includes a long-sound kanji, the furigana includes a long-sound kana, and the reading of the long-sound kanji matches the long-sound kana; and means for converting the furigana into the notation of a language without long sounds, depending on whether a long sound is included.

In an information processing method according to the present invention, an information processing apparatus performs: a step of receiving a text including kanji and the furigana of the text; a step of determining whether the text includes a long-sound kanji; a step of determining whether the furigana includes a long-sound kana; a step of identifying the long-sound kanji as a long sound when, as a result of the determinations, the text includes a long-sound kanji, the furigana includes a long-sound kana, and the reading of the long-sound kanji matches the long-sound kana; and a step of converting the furigana into the notation of a language without long sounds, depending on whether a long sound is included.

A program according to the present invention causes a computer to execute: a step of receiving a text including kanji and the furigana of the text; a step of determining whether the text includes a long-sound kanji; a step of determining whether the furigana includes a long-sound kana; a step of identifying the long-sound kanji as a long sound when, as a result of the determinations, the text includes a long-sound kanji, the furigana includes a long-sound kana, and the reading of the long-sound kanji matches the long-sound kana; and a step of converting the furigana into the notation of a language without long sounds, depending on whether a long sound is included.

In the present invention, the terms "unit", "means", "apparatus", and "system" do not simply mean physical means; they also cover cases where the functions of the "unit", "means", "apparatus", or "system" are realized by software. In addition, the functions of one "unit", "means", "apparatus", or "system" may be realized by two or more physical means or apparatuses, and the functions of two or more "units", "means", "apparatuses", or "systems" may be realized by a single physical means or apparatus.

According to the present invention, it is possible to provide an information processing apparatus, an information processing method, and a program that can suitably convert Japanese text into the notation of a language without long sounds, even when the text contains long sounds.

FIG. 1 is a diagram for explaining the processing in an embodiment of the present invention.
FIG. 2 is a diagram showing the configuration of the information processing apparatus in the embodiment of the present invention.
FIG. 3 is a diagram showing a specific example of the long-vowel kana dictionary shown in FIG. 2.
FIG. 4 is a diagram showing a specific example of the long-sound kanji dictionary shown in FIG. 2.
FIG. 5 is a flowchart for explaining the processing flow of the information processing apparatus shown in FIG. 2.
FIG. 6 is a block diagram showing a hardware configuration on which the information processing apparatus shown in FIG. 2 can be implemented.

An embodiment of the present invention is described below. In the following description and in the drawings referred to, identical or similar components are denoted by identical or similar reference signs.

(Embodiment)
FIGS. 1 to 6 are diagrams for explaining the embodiment. The embodiment is described below with reference to these figures in the following order. First, "1" outlines the processing performed by the information processing apparatus according to the embodiment. Then "2" describes the system configuration of the information processing apparatus, and "3" describes the processing flow. "4" shows a specific example of a hardware configuration on which the system can be implemented. Finally, "5" and the subsequent sections describe the effects of the embodiment and related matters.

(1 Overview)
When Japanese text is machine-translated into another language, parts such as place names, personal names, and addresses are normally not translated; only their notation is changed. However, changing the notation from Japanese into a language such as English, Korean, or Thai can cause problems, because Japanese has long sounds and those languages do not.

For example, when "東京" (とうきょう) and "大阪" (おおさか) are written in English (romanized), they are normally written "Tokyo" and "Osaka", not "Toukyou" and "Oosaka". This is because some syllables that are followed by a vowel kana (the あ row: あ, い, う, え, お), such as "とう" and "きょう" in "東京" and "おお" in "大阪", are pronounced as long sounds, and since English has no long sounds it is more natural to omit that vowel in the notation.

To handle cases such as "大阪", one could mechanically delete, from the reading kana, every vowel that follows a preceding kana, but this approach also has drawbacks. For example, for "長雨" (ながあめ) and "由宇" (ゆう), simply deleting such vowels treats "があ" and "ゆう" as long sounds, so the results are "Nagame" and "Yu". Judging from the kanji, however, they should be written "Nagaame" and "Yuu".
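
The drawback described above can be reproduced with a few lines of Python. This is only a sketch: the function name `naive_drop_vowels` and the exact rule (drop any vowel kana that is not the first character) are assumptions made for illustration, since the text does not specify the mechanical rule in detail.

```python
# Vowel kana (the あ row). The rule below is an assumed, deliberately naive
# reading of "mechanically delete vowels"; it looks only at the kana.
VOWEL_KANA = set("あいうえお")

def naive_drop_vowels(reading: str) -> str:
    """Drop every vowel kana that directly follows another character."""
    kept = []
    for i, ch in enumerate(reading):
        if i > 0 and ch in VOWEL_KANA:
            continue  # treated as the tail of a long sound and removed
        kept.append(ch)
    return "".join(kept)

for reading in ["とうきょう", "おおさか", "ながあめ", "ゆう"]:
    print(reading, "->", naive_drop_vowels(reading))
```

The first two readings shrink to "ときょ" and "おさか", which romanize as desired, but "ながあめ" and "ゆう" shrink to "ながめ" and "ゆ", which is exactly the "Nagame" and "Yu" problem: the kana alone do not say whether the vowel belongs to a separate kanji.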

The information processing apparatus according to this embodiment therefore determines whether a sound is long by considering both the kanji notation and the furigana (kana notation). This technique is described below with reference to FIG. 1, which outlines the long-sound determination performed by the information processing apparatus according to this embodiment.

The information processing apparatus according to this embodiment receives two inputs: Japanese text including kanji, and the furigana (reading kana) of that text. In the example of FIG. 1, the Japanese text "東京スカイツリー" (Tokyo Skytree) and the furigana "とうきょうすかいつりー" are input to the information processing apparatus.
The information processing apparatus performs long-sound determination on both the Japanese text and the furigana, in order from the beginning. The details are as follows.

For the Japanese text, the apparatus first refers to the long-sound kanji dictionary to determine whether the leading character "東" is a long-sound kanji (a kanji whose reading can be a long sound). The long-sound kanji dictionary associates each kanji that can be read as a long sound (long-sound kanji) with its reading kana when read as a long sound. The long-sound kanji dictionary is described in detail later with reference to FIG. 4.

For the furigana, the apparatus refers to the long-vowel kana dictionary and checks, in order from the beginning, for a long-sound kana (long-sound reading): a string of two or more characters (usually two or three) that ends in a vowel kana (あ row) and may be pronounced as a long sound. In the example of FIG. 1, the first two characters, "とう", form a long-sound kana. The long-vowel kana dictionary is a dictionary for identifying long-sound kana and is described in detail later with reference to FIG. 3.

If the long-sound reading of the long-sound kanji identified in the Japanese text ("東" in the example of FIG. 1) matches the long-sound kana identified in the furigana, then "東", read as "とう", can be identified as a kanji read with a long sound.

Although not shown in FIG. 1, for the text "東京スカイツリー" and the furigana "とうきょうすかいつりー", continuing the same processing after "東" and "とう" also identifies "京", read as "きょう", as a kanji read with a long sound.

Once the long sounds have been identified in this way, the information processing apparatus can convert the notation into another language accordingly. In "とうきょう", both "とう" and "きょう" are long sounds, so when writing it in English (romanized), the apparatus can drop the vowel part of each and produce the suitable notation "Tokyo".

This embodiment is described using conversion from Japanese notation into English notation as an example, but it is not limited to this and is applicable to any language without long sounds, such as Korean or Thai.

(2 System overview)
FIG. 2 is a diagram showing a schematic configuration of the information processing apparatus 100 according to this embodiment. As described above, the information processing apparatus 100 has a function of converting Japanese notation into another notation, and can be realized as, for example, a personal computer or a server. The data whose notation the information processing apparatus 100 converts from Japanese into other languages may include, for example, map-related information such as place names, addresses, store names, station names, road names, and facility names, as well as personal names.

As shown in FIG. 2, the information processing apparatus 100 includes an input unit 101, a long-sound kana determination unit 103, a long-vowel kana dictionary 105, a long-sound kanji determination unit 107, a long-sound kanji dictionary 109, a long-sound determination unit 111, and a translation unit 113.

The input unit 101 receives Japanese text including kanji and the furigana (reading kana) of that text. Various input methods are conceivable: for example, input may be received directly from a user through an input device such as a keyboard, or data recorded on a storage medium such as an HDD (hard disk drive) or flash memory may be read.

The long-sound kana determination unit 103 identifies strings in the furigana received by the input unit 101 that can be long-sound kana. As described above, a long-sound kana is a string of two or more characters that ends in a vowel kana (あ row) and may be pronounced as a long sound.

To do so, the long-sound kana determination unit 103 refers to the long-vowel kana dictionary 105 and identifies long-sound kana by determining whether a string in the furigana is a string contained in the long-vowel kana dictionary 105.

As described above, the long-vowel kana dictionary 105 is a dictionary for identifying long-sound kana. FIG. 3 shows a specific example. In the long-vowel kana dictionary 105 of FIG. 3, each vowel is associated with the characters that can form a long sound when they come immediately before that vowel. For example, "あ" is associated with characters such as "あ", "か", "さ", "た", and "な". This indicates that strings such as "ああ", "かあ", "さあ", "たあ", and "なあ" are long-sound kana, that is, that these strings can be long sounds when they appear in the furigana.
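
As a rough illustration of this lookup, the sketch below models the dictionary as a mapping from a vowel kana to the set of strings that may precede it. Only the "あ" entries come from the text; the other entries, the names `LONG_VOWEL_KANA` and `long_kana_prefix`, and the choice to store a contracted sound such as "きょ" as a single preceding unit are assumptions made for illustration, since the full contents of FIG. 3 are not reproduced here.

```python
# vowel kana -> strings that may precede it to form a long-sound kana
LONG_VOWEL_KANA = {
    "あ": {"あ", "か", "さ", "た", "な"},  # entries named in the text
    "う": {"と", "きょ", "ゆ"},            # assumed entries
    "お": {"お"},                          # assumed entry
}

def long_kana_prefix(reading: str):
    """Return the long-sound kana candidate at the head of `reading`.

    Tries a 3-character prefix, then a 2-character prefix, and returns
    None when neither is registered in the dictionary.
    """
    for length in (3, 2):
        cand = reading[:length]
        if len(cand) == length and cand[:-1] in LONG_VOWEL_KANA.get(cand[-1], ()):
            return cand
    return None

print(long_kana_prefix("とうきょう"))  # とう
print(long_kana_prefix("きょう"))      # きょう
print(long_kana_prefix("すかい"))      # None
```

A hit here only means the string can be a long sound; whether it actually is one still depends on the kanji side.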

The long-sound kanji determination unit 107 identifies characters in the Japanese text received by the input unit 101 that can be long-sound kanji. As described above, a long-sound kanji is a kanji whose reading can be a long sound.

To do so, the long-sound kanji determination unit 107 refers to the long-sound kanji dictionary 109 and identifies long-sound kanji by determining whether a character in the Japanese text is a character contained in the long-sound kanji dictionary 109.

As described above, the long-sound kanji dictionary 109 is a dictionary for identifying long-sound kanji. FIG. 4 shows a specific example. In the long-sound kanji dictionary 109 of FIG. 4, each kanji that can be a long sound is associated with its reading kana when it is read as a long sound. In the example of FIG. 4, referring to the long-sound kanji dictionary 109 shows, for instance, that "鳴" and "飯" are long-sound kanji and that, when read as long sounds, they are read "ああ" and "いい", respectively.
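
A minimal sketch of such a dictionary and its lookup is shown below. The entries for "鳴" and "飯" are the ones named in the text; the entries for "東" and "京", and the names `LONG_KANJI` and `long_kanji_in`, are assumptions added so that the FIG. 1 example can be run.

```python
# long-sound kanji -> readings it takes when read as a long sound
LONG_KANJI = {
    "鳴": {"ああ"},    # named in the text
    "飯": {"いい"},    # named in the text
    "東": {"とう"},    # assumed, for the FIG. 1 example
    "京": {"きょう"},  # assumed, for the FIG. 1 example
}

def long_kanji_in(text: str):
    """List each long-sound kanji in `text` with its registered readings."""
    return [(ch, sorted(LONG_KANJI[ch])) for ch in text if ch in LONG_KANJI]

print(long_kanji_in("東京スカイツリー"))
# [('東', ['とう']), ('京', ['きょう'])]
```

Storing a set of readings per kanji leaves room for kanji with more than one long-sound reading, which the matching step can then test by simple membership.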

The long-sound determination unit 111 determines whether a sound is actually long, based on the results from the long-sound kana determination unit 103 and the long-sound kanji determination unit 107. More specifically, the long-sound determination unit 111 determines that a sound is long when the long-sound kana determination unit 103 has determined that the furigana includes a long-sound kana, the long-sound kanji determination unit 107 has determined that the text includes a long-sound kanji, and the long-sound reading of that kanji matches the long-sound kana. A specific example is as described above with reference to FIG. 1.

The translation unit 113 converts the notation into a predetermined language based on the result of the determination by the long-sound determination unit 111. For example, when converting "東京" into English notation, the long-sound determination unit 111 determines that "とう" and "きょう" are long sounds, as described above, so the translation unit 113 converts the notation into "Tokyo" based on "ときょ", which is "とうきょう" with the long vowels (the あ-row characters) removed.
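
The conversion step can be sketched as follows, assuming the reading has already been split into segments flagged as long or not long. The segment format, the function names, and the tiny `ROMAJI` table are all hypothetical; the table covers only the kana needed for the examples in this document and is not a complete romanization scheme.

```python
# Minimal kana -> romaji table (assumed; covers only the examples here).
ROMAJI = {
    "と": "to", "きょ": "kyo", "お": "o", "さ": "sa", "か": "ka",
    "な": "na", "が": "ga", "あ": "a", "め": "me", "ゆ": "yu", "う": "u",
}

def romanize(kana: str) -> str:
    """Romanize kana, preferring 2-character units such as きょ."""
    out, i = [], 0
    while i < len(kana):
        two = kana[i:i + 2]
        if len(two) == 2 and two in ROMAJI:
            out.append(ROMAJI[two])
            i += 2
        else:
            out.append(ROMAJI[kana[i]])  # KeyError for kana not in the table
            i += 1
    return "".join(out)

def to_romaji(segments) -> str:
    """segments: list of (kana, is_long); drop the final vowel of long ones."""
    kana = "".join(seg[:-1] if is_long else seg for seg, is_long in segments)
    return romanize(kana).capitalize()

print(to_romaji([("とう", True), ("きょう", True)]))  # Tokyo
print(to_romaji([("おお", True), ("さか", False)]))   # Osaka
print(to_romaji([("ながあめ", False)]))               # Nagaame
```

Because only segments flagged as long lose their final vowel, "ながあめ" keeps its "あ" and comes out as "Nagaame" rather than "Nagame".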

(3 Process flow)
The processing flow of the information processing apparatus 100 is described below with reference to FIG. 5, which is a flowchart showing the flow of the long-sound determination processing in the information processing apparatus 100. The processing for notation conversion (translation) is omitted from the example of FIG. 5.

The processing steps described below may be reordered or executed in parallel as long as no contradiction arises in the processing, and other steps may be added between them. Furthermore, a step described as a single step for convenience may be executed as multiple steps, and steps described separately for convenience may be executed as a single step.
First, the input unit 101 receives Japanese text including kanji and the furigana (reading kana) of that text (S501).

The long-sound kana determination unit 103 extracts a string from the beginning of the furigana and compares it with the long-vowel kana dictionary 105 (S503). In the example of FIG. 1 described above, it looks up "とう", the leading string of the furigana, in the long-vowel kana dictionary 105.

If the extracted string is not in the long-vowel kana dictionary 105, that is, if it is not a long-sound kana (No in S505), the long-sound kana determination unit 103 deletes the first character from the furigana (S507) and, if unprocessed furigana remains (Yes in S509), returns to S503.

If the string extracted from the furigana is a long-sound kana (Yes in S505), the long-sound kanji determination unit 107 looks up the leading character of the Japanese text ("東" in the example of FIG. 1) in the long-sound kanji dictionary 109 (S511).

If the extracted character is not in the long-sound kanji dictionary 109 (No in S513), the character is not a long-sound kanji, so the long-sound kanji determination unit 107 deletes it from the text (S515) and, if unprocessed text remains (Yes in S517), returns to S511.

If the character extracted from the text is a long-sound kanji (Yes in S513), the long-sound determination unit 111 compares the long-sound readings of that kanji ("とう" in the example of FIG. 1) with the long-sound kana identified in S505 ("とう" in the example of FIG. 1) (S519). If the long-sound readings of the kanji include the long-sound kana, that is, if there is a match (Yes in S521), the kanji and the kana can be judged to be a long sound, so the long-sound determination unit 111 records them as a long sound (S523) and deletes them from the original text and furigana (S525). Then, if unprocessed furigana remains (Yes in S509), processing returns to S503.

In S521, if the readings of the identified long-sound kanji do not include the long-sound kana (No in S521), the kanji is not being read as a long sound, so it is deleted from the text (S515) and, if unprocessed text remains (Yes in S517), processing returns to S511.

When, after repeating the processing of S503 to S517, all of the furigana or all of the text has been processed (No in S509 or No in S517), the long-sound determination processing ends.
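
The flow of S501 to S525 can be sketched in Python as below. This is one possible reading of the flowchart, not the actual implementation: the dictionary contents, the helper `long_kana_prefix`, and the function name `find_long_sounds` are assumptions, and "が" is registered before "あ" on purpose so that "があ" in "ながあめ" becomes a kana-side candidate.

```python
LONG_VOWEL_KANA = {  # assumed contents: vowel -> strings that may precede it
    "あ": {"あ", "か", "が", "さ", "た", "な"},
    "う": {"と", "きょ", "ゆ"},
    "お": {"お"},
}
LONG_KANJI = {  # assumed contents: kanji -> long-sound readings
    "東": {"とう"}, "京": {"きょう"}, "大": {"おお"}, "長": {"ちょう"},
}

def long_kana_prefix(reading):
    """Return the 3- or 2-character long-sound kana at the head, or None."""
    for length in (3, 2):
        cand = reading[:length]
        if len(cand) == length and cand[:-1] in LONG_VOWEL_KANA.get(cand[-1], ()):
            return cand
    return None

def find_long_sounds(text, reading):
    """Return (kanji, kana) pairs judged to be long sounds (inputs: S501)."""
    found = []
    while reading:                                 # S509
        kana = long_kana_prefix(reading)           # S503
        if kana is None:                           # S505: No
            reading = reading[1:]                  # S507
            continue
        matched = False
        while text:                                # S517
            kanji, text = text[0], text[1:]        # S511; head removed (S515/S525)
            if kana in LONG_KANJI.get(kanji, ()):  # S513, S519, S521
                found.append((kanji, kana))        # S523
                reading = reading[len(kana):]      # S525
                matched = True
                break
        if not matched:                            # text exhausted: S517 No
            break
    return found

print(find_long_sounds("東京スカイツリー", "とうきょうすかいつりー"))
# [('東', 'とう'), ('京', 'きょう')]
print(find_long_sounds("長雨", "ながあめ"))  # []
```

For "長雨", the kana side proposes "があ", but neither "長" nor "雨" has that long-sound reading in the assumed dictionary, so nothing is marked as long and the full reading is kept for conversion.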

(4 Hardware configuration)
An example of a hardware configuration for realizing the information processing apparatus 100 described above with a computer is described below with reference to FIG. 6. The functions of the information processing apparatus 100 may also be distributed across multiple apparatuses connected via a network.

As shown in FIG. 6, the information processing apparatus 100 includes a processor 601, a memory 603, a storage device 605, an input I/F (interface) 607, a data I/F 609, a communication I/F 611, and a display device 613.

The processor 601 controls the various processes in the information processing apparatus 100 by executing programs stored in the memory 603. For example, the input unit 101, the long-sound kana determination unit 103, the long-sound kanji determination unit 107, the long-sound determination unit 111, and the translation unit 113 described with reference to FIG. 2 can be realized as programs that are temporarily stored in the memory 603 and run mainly on the processor 601.

The memory 603 is a storage medium such as a RAM (random access memory). The memory 603 temporarily stores the program code of the programs executed by the processor 601 and the data needed while the programs run. For example, a stack area needed during program execution is reserved in the storage area of the memory 603.

The storage device 605 is a non-volatile storage medium such as a hard disk or flash memory. The storage device 605 stores the operating system; the programs for realizing the input unit 101, the long-sound kana determination unit 103, the long-sound kanji determination unit 107, the long-sound determination unit 111, and the translation unit 113; and various data, such as the databases for realizing the long-vowel kana dictionary 105 and the long-sound kanji dictionary 109. Programs and data stored in the storage device 605 are loaded into the memory 603 as needed and then referenced by the processor 601.

The input I/F 607 is a device for accepting input from a user. Specific examples of the input I/F 607 include a keyboard, a mouse, a touch panel, and various sensors. The input I/F 607 may be connected to the information processing apparatus 100 via an interface such as USB (Universal Serial Bus).

The data I/F 609 is a device for inputting data from outside the information processing apparatus 100. Specific examples of the data I/F 609 include drive devices for reading data stored on various storage media. The data I/F 609 may be provided outside the information processing apparatus 100, in which case it is connected to the information processing apparatus 100 via an interface such as USB.

The communication I/F 611 is a device for wired or wireless data communication with apparatuses outside the information processing apparatus 100. The communication I/F 611 may be provided outside the information processing apparatus 100, in which case it is connected to the information processing apparatus 100 via an interface such as USB.

The display device 613 is a device for displaying various information. For example, the text converted by the translation unit 113 may be displayed on the display device 613. Specific examples of the display device 613 include a liquid crystal display and an organic EL (electroluminescence) display. The display device 613 may be provided outside the information processing apparatus 100, in which case it is connected to the information processing apparatus 100 via, for example, a display cable.

(5 Effects of this embodiment)
As described above, the information processing apparatus 100 according to the present embodiment checks whether the Japanese text contains a long-sound kanji and whether the reading kana (furigana) contains a long-sound kana. Then, when the reading of the long-sound kanji matches the long-sound kana, it identifies that long-sound kanji and long-sound kana as a long sound.

In other words, because both the kana notation and the kanji notation are used, long sounds can be identified appropriately, and as a result the notation can be appropriately converted into languages that have no long sounds. For example, when 「東京」 and 「大阪」 are written in English, they can be written as "Tokyo" and "Osaka" rather than "Toukyou" and "Oosaka", and when 「長雨」 and 「由宇」 are written in English, they can be written as "Nagaame" and "Yuu".
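The effect described above can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment: the dictionary contents, the romaji table, the function names, and the simplified left-to-right alignment of kanji readings against the reading kana are all assumptions made for illustration. A vowel kana is dropped from the romanized output only when it is a long-sound kana candidate and it also lies inside the matched reading of a long-sound kanji.

```python
# Hypothetical long-sound kanji dictionary: kanji -> readings containing a long sound.
LONG_KANJI = {"東": ("とう",), "京": ("きょう",), "大": ("おお",), "長": ("ちょう",)}

# Hypothetical long-vowel kana rule: vowel kana -> vowels of the preceding
# mora that it can lengthen (e.g. "う" lengthens a mora ending in u or o).
LENGTHENS = {"あ": "a", "い": "ie", "う": "uo", "え": "e", "お": "o"}

# Tiny romaji table covering only the examples used here.
ROMAJI = {"と": "to", "う": "u", "きょ": "kyo", "お": "o", "さ": "sa", "か": "ka",
          "な": "na", "が": "ga", "あ": "a", "め": "me", "ゆ": "yu"}

SMALL_KANA = "ゃゅょ"


def split_morae(kana):
    """Split reading kana into (offset, mora) pairs, keeping e.g. 'きょ' together."""
    morae, i = [], 0
    while i < len(kana):
        step = 2 if i + 1 < len(kana) and kana[i + 1] in SMALL_KANA else 1
        morae.append((i, kana[i:i + step]))
        i += step
    return morae


def long_kanji_spans(text, kana):
    """Return (start, end) spans of the kana covered by a long-sound kanji reading."""
    spans, pos = [], 0
    for ch in text:
        for reading in LONG_KANJI.get(ch, ()):
            start = kana.find(reading, pos)  # simplified alignment (assumption)
            if start >= 0:
                spans.append((start, start + len(reading)))
                pos = start + len(reading)
                break
    return spans


def to_romaji(text, kana):
    """Romanize the reading kana, dropping only confirmed long sounds."""
    morae = split_morae(kana)
    spans = long_kanji_spans(text, kana)
    out = []
    for idx, (off, mora) in enumerate(morae):
        if idx > 0 and mora in LENGTHENS:
            prev_off, prev = morae[idx - 1]
            is_long_kana = ROMAJI[prev][-1] in LENGTHENS[mora]
            in_long_kanji = any(s <= prev_off and off + 1 <= e for s, e in spans)
            if is_long_kana and in_long_kanji:
                continue  # confirmed long sound: omit the lengthening vowel
        out.append(ROMAJI[mora])
    return "".join(out).capitalize()


for text, kana in [("東京", "とうきょう"), ("大阪", "おおさか"),
                   ("長雨", "ながあめ"), ("由宇", "ゆう")]:
    print(text, to_romaji(text, kana))
```

In this sketch, 「長雨」 keeps its "aa" because the hypothetical dictionary reading of 「長」 ("ちょう") does not occur in the reading kana, and 「由宇」 keeps its "uu" because neither kanji is registered as a long-sound kanji.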

(6 Additional notes)
The configurations of the above-described embodiments may be combined, or some of their components may be interchanged. The configuration of the present invention is not limited to the above-described embodiments, and various modifications may be made without departing from the gist of the present invention.

DESCRIPTION OF SYMBOLS
100 ... Information processing apparatus, 101 ... Input unit, 103 ... Long-sound kana determination unit, 105 ... Long-vowel kana dictionary, 107 ... Long-sound kanji determination unit, 109 ... Long-sound kanji dictionary, 111 ... Long-sound determination unit, 113 ... Translation unit, 601 ... Processor, 603 ... Memory, 605 ... Storage device, 607 ... Input interface, 609 ... Data interface, 611 ... Communication interface, 613 ... Display device

Claims

An information processing apparatus comprising:
input means for receiving input of a text including kanji and of reading kana for the text;
first determination means for determining whether or not a long-sound kanji is included in the text;
second determination means for determining whether or not a long-sound kana is included in the reading kana;
means for identifying, when the determination shows that the text includes a long-sound kanji and the reading kana includes a long-sound kana, and the reading of the long-sound kanji matches the long-sound kana, the long-sound kanji and the long-sound kana as a long sound; and
means for converting the reading kana into a notation in a language without long sounds, depending on whether or not a long sound is included.

The information processing apparatus according to claim 1, wherein the first determination means determines whether or not a long-sound kanji is included in the text by referring, for the characters included in the text, to a long-sound kanji dictionary that associates long-sound kanji with their readings.

The information processing apparatus according to claim 1 or 2, wherein the second determination means determines whether or not a long-sound kana is included in the reading kana by referring, for consecutive characters included in the reading kana, to a long-vowel kana dictionary relating to combinations of a long vowel and another character.

An information processing method in which an information processing apparatus performs:
a step of receiving input of a text including kanji and of reading kana for the text;
a step of determining whether or not a long-sound kanji is included in the text;
a step of determining whether or not a long-sound kana is included in the reading kana;
a step of identifying, when the determination shows that the text includes a long-sound kanji and the reading kana includes a long-sound kana, and the reading of the long-sound kanji matches the long-sound kana, the long-sound kanji as a long sound; and
a step of converting the reading kana into a notation in a language without long sounds, depending on whether or not a long sound is included.

A program for causing a computer to execute:
a step of receiving input of a text including kanji and of reading kana for the text;
a step of determining whether or not a long-sound kanji is included in the text;
a step of determining whether or not a long-sound kana is included in the reading kana;
a step of identifying, when the determination shows that the text includes a long-sound kanji and the reading kana includes a long-sound kana, and the reading of the long-sound kanji matches the long-sound kana, the long-sound kanji as a long sound; and
a step of converting the reading kana into a notation in a language without long sounds, depending on whether or not a long sound is included.