JP7333526B2

JP7333526B2 - Comic machine translation device, comic parallel database generation device, comic machine translation method and program

Info

Publication number: JP7333526B2
Application number: JP2021541830A
Authority: JP
Inventors: 遼太日並; 祥之佑石渡
Original assignee: Individual
Current assignee: Individual
Priority date: 2019-08-27
Filing date: 2019-08-27
Publication date: 2023-08-25
Anticipated expiration: 2039-08-27
Also published as: WO2021038708A1; JPWO2021038708A1

Description

Application of Article 30, Paragraph 2 of the Patent Act: August 28, 2018, Innovative Technologies 2018 adopted technology announcement, https://www.dcaj.or.jp/news/2018/08/innovative-technologies-2018.html

The present invention relates to a machine translation device for comics, a bilingual database generation device for comics, a machine translation method for comics, and a program.

In recent years, as computer processing power has improved, methods for machine-translating text written in one natural language into text in another natural language have attracted attention, and various machine translation methods and devices have been proposed (see, for example, Patent Document 1).

Patent Document 1: JP 2019-96303 A

Currently, comics (manga) are translated by human translators. Because a translator can adapt the translation flexibly to context such as the story, human translation can be highly accurate. On the other hand, human translation is relatively expensive, so even when a work is translated and published in other regions or countries, the copyright holder and the publisher may not earn sufficient profit. Moreover, because of the high translation cost, overseas publishers sometimes decline to acquire translation and publishing rights, and no official translation is published abroad. Human translation also takes a relatively long time. As a result, low-quality pirated versions may be sold on the market before the copyright holder can translate the comic and publish it in another region or country, or while an official translation cannot be published.

Machine translation, in contrast, enables relatively inexpensive and rapid translation and could therefore overcome these drawbacks of human translation. However, the present inventors found that existing machine translation methods cannot translate comics with high accuracy.

The present invention has been made to solve the above problem, and an object thereof is to provide a machine translation device for comics, a machine translation method for comics, and a program that are capable of highly accurate machine translation of comics, as well as a bilingual database generation device for comics used by them.

The gist of the present invention, provided to solve the above problems, is mainly as follows.
(1) A machine translation device for comics, comprising:
a character region detection unit that detects a character region in a first natural language image constituting a comic created in a first natural language;
a character information estimation unit that estimates first natural language character information from the character region; and
a machine translation unit that translates the first natural language character information into second natural language character information by machine translation using a bilingual database,
wherein the bilingual database includes bilingual information automatically generated by detecting a first natural language character region included in a first natural language reference image of a reference comic and a second natural language character region that is included in a second natural language reference image of the reference comic and corresponds to the first natural language character region, and by extracting first natural language character information present in the first natural language character region and second natural language character information present in the second natural language character region.
(2) The machine translation device for comics according to (1), wherein the character information estimation unit estimates the character information from the character region using a trained character recognition model, and
the trained character recognition model is generated by machine learning using training data that includes one or more font images of the first natural language and images obtained by applying deformation, tilt, and/or noise to the font images.
(3) The machine translation device for comics according to (1) or (2), wherein the first natural language character information is extracted from the first natural language character region by estimating it using a trained character recognition model, and
the trained character recognition model is generated by machine learning using training data that includes one or more font images of the first natural language and images obtained by applying deformation, tilt, and/or noise to the font images.
(4) The machine translation device for comics according to any one of (1) to (3), wherein the second natural language character information is extracted from the second natural language character region by estimating it using a trained second natural language character recognition model, and
the trained second natural language character recognition model is generated by machine learning using training data that includes one or more second natural language font images and images obtained by applying deformation, tilt, and/or noise to the second natural language font images.
(5) The comic machine translation device according to any one of (1) to (4), wherein the first natural language is Japanese.
(6) The machine translation device for comics according to any one of (1) to (5), further comprising an image generation unit that generates a second natural language image by adding the second natural language character information translated by the machine translation unit, as an image, to the first natural language image.
(7) A bilingual database generation device for comics, comprising:
a character region detection unit that detects a first natural language character region included in a first natural language reference image of a reference comic and a second natural language character region that is included in a second natural language reference image of the reference comic and corresponds to the first natural language character region;
a bilingual information extraction unit that extracts first natural language character information present in the first natural language character region and second natural language character information present in the second natural language character region; and
a storage unit that stores at least part of the first natural language character information and at least part of the second natural language character information as bilingual information.
(8) The bilingual database generation device for comics according to (7), wherein the first natural language character information is extracted from the first natural language character region by estimating it using a trained character recognition model, and
the trained character recognition model is generated by machine learning using training data that includes one or more font images of the first natural language and images obtained by applying deformation, tilt, and/or noise to the font images.
(9) The bilingual database generation device for comics according to (8), wherein the second natural language character information is extracted from the second natural language character region by estimating it using a trained second natural language character recognition model, and
the trained second natural language character recognition model is generated by machine learning using training data that includes one or more second natural language font images and images obtained by applying deformation, tilt, and/or noise to the second natural language font images.
(10) A machine translation method for comics, comprising executing, by a processor:
detecting a character region in a first natural language image constituting a comic created in a first natural language;
estimating first natural language character information from the character region; and
translating the first natural language character information into second natural language character information by machine translation using a bilingual database,
wherein the bilingual database includes bilingual information automatically generated by detecting a first natural language character region included in a first natural language reference image of a reference comic and a second natural language character region that is included in a second natural language reference image of the reference comic and corresponds to the first natural language character region, and by extracting first natural language character information present in the first natural language character region and second natural language character information present in the second natural language character region.
(11) A program for causing a computer to function as a machine translation device for comics, the machine translation device comprising:
a character region detection unit that detects a character region in a first natural language image constituting a comic created in a first natural language;
a character information estimation unit that estimates first natural language character information from the character region; and
a machine translation unit that translates the first natural language character information into second natural language character information by machine translation using a bilingual database,
wherein the bilingual database includes bilingual information automatically generated by detecting a first natural language character region included in a first natural language reference image of a reference comic and a second natural language character region that is included in a second natural language reference image of the reference comic and corresponds to the first natural language character region, and by extracting first natural language character information present in the first natural language character region and second natural language character information present in the second natural language character region.

As described above, the present invention provides a machine translation device for comics, a machine translation method for comics, and a program that are capable of highly accurate machine translation of comics, as well as a bilingual database generation device for comics used by them.

FIG. 1 is a block diagram illustrating the functional configuration of a machine translation device for comics according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating the functional configuration of a character recognition model generation device that generates the trained character recognition model used in the machine translation device for comics shown in FIG. 1.
FIG. 3 is a block diagram illustrating the functional configuration of a bilingual database generation device for comics according to an embodiment of the present invention.
FIG. 4 is a diagram for explaining the process of generating training data by the character recognition model generation device shown in FIG. 2.
FIG. 5 is a diagram for explaining how the bilingual database generation device shown in FIG. 3 captures and associates images of reference comics.
FIG. 6 is a diagram for explaining an example of a method of detecting a character region by the bilingual database generation device shown in FIG. 3.
FIG. 7 is a diagram for explaining an example of a method of detecting a character region by the bilingual database generation device shown in FIG. 3.
FIG. 8 is a diagram for explaining an example of a method of identifying a character string by the bilingual database generation device shown in FIG. 3.
FIG. 9 is a diagram for explaining an example of a method of identifying a character string by the bilingual database generation device shown in FIG. 3.
FIG. 10 is a diagram for explaining an example of a method of extracting character information by the machine translation device for comics shown in FIG. 1.
FIG. 11 is a diagram for explaining an example of a method of generating a comic image in the second natural language by the machine translation device for comics shown in FIG. 1.
FIG. 12 is a flowchart illustrating a character recognition model generation method according to an embodiment of the present invention.
FIG. 13 is a flowchart illustrating a method of generating a bilingual database for comics according to an embodiment of the present invention.
FIG. 14 is a flowchart illustrating a machine translation method for comics according to an embodiment of the present invention.
FIG. 15 is a block diagram showing an example of the hardware configuration of the machine translation device for comics shown in FIG. 1.

Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
<1. Study by the Present Inventors>
First, before describing the embodiments, the study conducted by the inventors will be described. As noted above, the inventors found that existing machine translation methods could not translate comics with high accuracy.

The inventors considered the following possible reasons. Accurate machine translation requires a bilingual database (parallel corpus), but existing bilingual databases may not be suited to translating comics. In particular, unlike ordinary prose, the text in a comic consists mainly of the characters' spoken lines. Such lines often lack a subject-predicate correspondence or break off mid-sentence. Furthermore, how symbols such as question marks, exclamation marks, long vowel marks, and manga-specific marks are used varies greatly from author to author.

In view of these possibilities, the inventors conducted extensive research and found that highly accurate machine translation becomes possible when a specific bilingual database containing bilingual information from comics is used, which led to the present invention.

<2. System Overview>
First, an overview of the machine translation system for comics will be given. The system includes the machine translation device for comics according to the present embodiment (hereinafter also simply called the "machine translation device") and the bilingual database generation device for comics (hereinafter also simply called the "bilingual database generation device"). FIG. 1 is a block diagram illustrating the functional configuration of the machine translation device according to an embodiment of the present invention; FIG. 2 is a block diagram illustrating the functional configuration of a character recognition model generation device that generates the trained character recognition model used in the machine translation device shown in FIG. 1; and FIG. 3 is a block diagram illustrating the functional configuration of the bilingual database generation device according to an embodiment of the present invention.

The machine translation device 100 shown in FIG. 1 machine-translates Japanese character information in a comic created in Japanese (the first natural language) into English character information (the second natural language).

The character recognition model generation device 200 shown in FIG. 2 generates a trained character recognition model for estimating the natural language character information present in comic images. The bilingual database generation device 300 shown in FIG. 3 extracts Japanese (first natural language) and English (second natural language) bilingual information from already-translated reference comics and automatically generates a bilingual database.

As shown in FIG. 1, the machine translation device 100, the character recognition model generation device 200, and the bilingual database generation device 300 can communicate with one another via a network 400 and together form a machine translation system for comics. The trained character recognition model generated by the character recognition model generation device 200 is used by both the machine translation device 100 and the bilingual database generation device 300, and the bilingual database generated by the bilingual database generation device 300 is used for machine translation in the machine translation device 100. For ease of explanation, the character recognition model generation device 200 is described first, followed by the bilingual database generation device 300 and then the machine translation device 100.
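The dependency order described above can be illustrated with a minimal sketch, assuming each device is treated as a build step whose outputs the others consume. The component names are hypothetical labels for devices 100, 200, and 300; the sketch only uses Python's standard `graphlib` module to derive an order in which every component runs after its inputs exist.

```python
from graphlib import TopologicalSorter

# Hypothetical labels for the three devices of FIG. 1.
# Each key maps to the set of components whose outputs it consumes.
DEPENDENCIES = {
    "character_recognition_model_generator_200": set(),
    # The bilingual database generator reads reference pages with the trained model.
    "bilingual_database_generator_300": {"character_recognition_model_generator_200"},
    # The translator needs both the trained model and the bilingual database.
    "machine_translation_device_100": {
        "character_recognition_model_generator_200",
        "bilingual_database_generator_300",
    },
}


def build_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every component runs after all of its inputs."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for name in build_order(DEPENDENCIES):
        print(name)
```

For this graph the order is unique (model generator, then database generator, then translator), which matches the order in which the devices are described below.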

In this embodiment, the case where the first natural language is Japanese and the second natural language is English is described as an example. However, the first and second natural languages are not limited to these and may be any natural languages written with characters.

(2.1. Character Recognition Model Generation Device 200)
As shown in FIG. 2, the character recognition model generation device 200 includes a training data generation unit 210, a machine learning unit 230, and a storage unit 250.

The training data generation unit 210 generates the training data used by the machine learning unit 230, described later. Specifically, as shown in FIG. 4, the training data generation unit 210 prepares a font image group 410, that is, images including character images 411 in multiple fonts, and generates a processed character image group 420 containing processed character images 421 obtained by applying deformation, tilt, and/or noise to those character images. The font image group 410 may also contain units made up of multiple character images 411, such as words, sentences, character strings, or lines; in that case, a character recognition model can be generated for each such unit. Next, the processed character images 421 in the processed character image group 420 are combined with the character images 411 in the font image group 410 to generate a learning character image group 430 containing multiple learning character images 431. Data stored in the storage unit 250 can be used as the font image group 410, and generated data such as the learning character image group 430 may be sent to and stored in the storage unit 250 as needed.
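The step that turns font glyphs into processed character images can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes glyphs are small binary images stored as nested lists, models "tilt" as a horizontal shear and "noise" as random pixel flips, and omits other deformations. The function names and parameter values (`max_shear`, `noise_prob`) are hypothetical.

```python
import random

# Assumed representation: a glyph is a small binary image, 1 = ink, 0 = background.
Glyph = list[list[int]]


def shear(img: Glyph, factor: float) -> Glyph:
    """Tilt a glyph: row y is shifted right by factor * (y - centre_row) pixels."""
    h, w = len(img), len(img[0])
    cy = (h - 1) / 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        shift = round(factor * (y - cy))
        for x in range(w):
            src_x = x - shift  # inverse mapping with nearest-neighbour sampling
            if 0 <= src_x < w:
                out[y][x] = img[y][src_x]
    return out


def add_noise(img: Glyph, prob: float, rng: random.Random) -> Glyph:
    """Flip each pixel independently with probability prob (salt-and-pepper noise)."""
    return [[1 - p if rng.random() < prob else p for p in row] for row in img]


def make_processed_images(img: Glyph, count: int, seed: int = 0,
                          max_shear: float = 0.3, noise_prob: float = 0.05) -> list[Glyph]:
    """Generate `count` randomly tilted and noised copies of one font glyph."""
    rng = random.Random(seed)  # seeded so the generated set is reproducible
    return [add_noise(shear(img, rng.uniform(-max_shear, max_shear)), noise_prob, rng)
            for _ in range(count)]
```

For example, `shear([[0, 1, 0], [0, 1, 0], [0, 1, 0]], 1.0)` turns a vertical stroke into the diagonal `[[1, 0, 0], [0, 1, 0], [0, 0, 1]]`. A real pipeline would more likely use an image library for affine warps; the pure-Python version only keeps the sketch dependency-free.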

Note that in the present disclosure, the learning character image group 430 is not limited to the form shown in FIG. 4; it may further include other character images, for example character images appearing in comics or character images from other publicly known character recognition datasets.

The training data generation unit 210 further prepares, as correct answer data, the character information of each learning character image 431 in the learning character image group 430. When the learning character image group 430 includes character images other than the processed character images 421 and the character images 411, character information serving as correct answer data may be assigned to those images manually or automatically as needed. The training data generation unit 210 then generates training data containing the learning character image group 430 as example data together with the corresponding correct answer data. The generated training data may be output directly to the machine learning unit 230 or stored in the storage unit 250.
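Pairing example images with correct answer data might look like the sketch below. It assumes a hypothetical helper signature: processed images inherit the label of the font glyph they were generated from, and images from other sources (for example, hand-labelled comic crops) are supplied already labelled. Any augmentation function with the signature `(glyph, rng) -> glyph` can be passed in.

```python
import random
from typing import Callable, Optional

Glyph = list[list[int]]
Sample = tuple[Glyph, str]  # (example image, correct character label)


def build_training_data(font_images: dict[str, list[Glyph]],
                        augment: Callable[[Glyph, random.Random], Glyph],
                        copies_per_image: int,
                        extra_labelled: Optional[list[Sample]] = None,
                        seed: int = 0) -> list[Sample]:
    """Pair every clean and processed glyph with the character it depicts.

    font_images maps a character to its rendered glyphs (one per font).
    extra_labelled holds images from other sources that already carry a label.
    """
    rng = random.Random(seed)
    samples: list[Sample] = []
    for char, glyphs in font_images.items():
        for glyph in glyphs:
            samples.append((glyph, char))  # the original font image
            for _ in range(copies_per_image):
                # a processed image keeps the label of its source glyph
                samples.append((augment(glyph, rng), char))
    samples.extend(extra_labelled or [])
    rng.shuffle(samples)  # mix sources so batches are not ordered by character
    return samples
```

With two fonts for one character and two processed copies per glyph, that character contributes 2 × (1 + 2) = 6 labelled samples.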

The machine learning unit 230 performs machine learning using the training data generated by the training data generation unit 210 to generate a trained character recognition model. The machine learning technique (algorithm) is not particularly limited, and various techniques available to those skilled in the art can be used alone or in combination. Examples include various neural networks, such as convolutional neural networks including a ResNet (residual network) image recognition module, a CRNN (convolutional recurrent neural network), and recurrent neural networks such as LSTM (long short-term memory) networks, including bidirectional LSTMs, as well as combinations thereof. The trained character recognition model may recognize character information character by character, or in units of multiple characters such as words, sentences, character strings, or lines. The machine learning unit 230 stores the generated trained character recognition model in the storage unit 250 and, as needed, transmits it to the machine translation device 100 and the bilingual database generation device 300.
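The text does not specify how a CRNN or Bi-LSTM model's output is turned into a character string. One common choice, assumed here and not stated in the source, is to train such models with CTC (connectionist temporal classification) and read them out with greedy best-path decoding: take the highest-scoring class at each time step, merge consecutive repeats, and drop the blank symbol. A minimal pure-Python sketch:

```python
BLANK = 0  # assumption: index 0 of every score vector is the CTC blank


def ctc_greedy_decode(frame_scores: list[list[float]], alphabet: str) -> str:
    """Best-path CTC decoding of per-frame class scores.

    frame_scores[t][k] is the score of class k at time step t, where
    k = 0 is the blank and k >= 1 stands for alphabet[k - 1].
    """
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    chars = []
    prev = None
    for k in best:
        # collapse consecutive repeats, then drop blanks
        if k != prev and k != BLANK:
            chars.append(alphabet[k - 1])
        prev = k
    return "".join(chars)


def one_hot(indices: list[int], n_classes: int) -> list[list[float]]:
    """Helper for examples: turn class indices into one-hot score vectors."""
    return [[1.0 if k == i else 0.0 for k in range(n_classes)] for i in indices]
```

For example, frames whose best classes are `[1, 1, 0, 2, 2, 0, 2, 3]` with the alphabet `"abc"` decode to `"abbc"`: the blank between the two runs of class 2 is what allows a genuinely doubled character to survive the repeat-merging step.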

The storage unit 250 stores the various information needed to generate the training data and the trained character recognition model, as well as the generated training data and trained character recognition model themselves. Such information includes, for example, font information (character images and character information), information for the processing that applies deformation, tilt, and noise, and various information on the machine learning technique.

(2.2. Bilingual Database Generation Device 300)
The bilingual database generation device 300 extracts Japanese-English bilingual information from translated reference comics and automatically generates a bilingual database. As shown in FIG. 3, the bilingual database generation device 300 includes a reference image acquisition unit 310, a character region detection unit 330, a bilingual information extraction unit 350, and a storage unit 370.

The reference image acquisition unit 310 acquires page images of the translated reference comics and associates Japanese reference images with English reference images. Any comic work can be used as a reference comic, without particular limitation, as long as it exists both as a Japanese reference comic 500J created in Japanese (the first natural language) and as an English reference comic 500E in English (the second natural language). Neither the Japanese reference comic 500J nor the English reference comic 500E needs to be accompanied by electronic text data, because in this embodiment the character region detection unit 330 and the bilingual information extraction unit 350, described later, can accurately extract Japanese and English character information from the images. Preferably, at least one of the Japanese reference comic 500J and the English reference comic 500E has been translated by a human translator, which enables more accurate machine translation.

As shown in FIG. 5, the reference image acquisition unit 310 first captures the image of each page of the Japanese reference comic 500J and the English reference comic 500E as Japanese reference images 510J_n and English reference images 510E_m, respectively (where n and m are natural numbers). A comic is generally expressed through pictures and text, and when it is translated, the body pages of the original and the translation correspond page by page. However, depending on the cover, table of contents, and similar front matter, the position of corresponding pages (the page count from the cover) may differ between the original and the translated comic.

The reference image acquisition unit 310 therefore associates each captured Japanese reference image 510J_n of the Japanese reference comic 500J with an English reference image 510E_m of the English reference comic 500E. The matching method is not particularly limited; for example, it can be performed by computing local features. Specifically, a detector such as the AKAZE detector detects local descriptors in the Japanese reference image 510J_n and the English reference image 510E_m. A homography matrix between the two pages 510J_n and 510E_m is then computed. Finally, the similarity between the Japanese reference image 510J_n and the English reference image 510E_m is determined by counting the pairs of local features judged to be inliers.
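The inlier-counting step can be illustrated with a minimal sketch. It assumes that keypoint matches between the two pages and a 3x3 homography have already been obtained (in practice, for example, with OpenCV's `cv2.AKAZE_create()` for descriptors and `cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold)`, which also returns an inlier mask). The functions `project` and `count_inliers` and the 3-pixel threshold are hypothetical choices for illustration, not the patent's implementation.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Homography = Sequence[Sequence[float]]  # 3x3 matrix, row-major


def project(h: Homography, p: Point) -> Point:
    """Map a 2-D point through a 3x3 homography using homogeneous coordinates."""
    x, y = p
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)


def count_inliers(
    h: Homography,
    matches: Sequence[Tuple[Point, Point]],
    threshold: float = 3.0,  # assumed reprojection tolerance in pixels
) -> int:
    """Count matched keypoint pairs (Japanese page -> English page) whose
    reprojection error under h is below `threshold` pixels."""
    inliers = 0
    for src, dst in matches:
        px, py = project(h, src)
        error = ((px - dst[0]) ** 2 + (py - dst[1]) ** 2) ** 0.5
        if error < threshold:
            inliers += 1
    return inliers
```

For a page pair that differs only by a 5-pixel horizontal shift, two consistent matches count as inliers and a spurious third match does not, so a higher count indicates that the two pages show the same artwork.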

Based on this result, the reference image acquisition unit 310 associates the Japanese reference images 510J_n with the English reference images 510E_m. For example, in FIG. 5, Japanese reference image 510J_1 is associated with English reference image 510E_1, Japanese reference image 510J_2 with English reference image 510E_3, and Japanese reference image 510J_3 with English reference image 510E_4. The reference image acquisition unit 310 outputs each associated pair of a Japanese reference image 510J_n and an English reference image 510E_m to the character area detection unit 330 and/or stores it in the storage unit 370. If necessary, the reference image acquisition unit 310 may also correct the Japanese reference image 510J_n and the English reference image 510E_m to be stored or output so that their shapes match. For example, it may unify the sizes of the two images, or correct distortion or tilt present in them.
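One way to turn per-pair similarity scores into page associations like those in FIG. 5 is sketched below. This is an assumption, not the method stated in the patent: it greedily pairs each Japanese page, in reading order, with the best-scoring English page that comes after the previously paired one, and leaves pages unpaired when the best inlier count falls below a hypothetical `min_inliers` threshold (as might happen for a cover or table of contents).

```python
from typing import Dict, List


def pair_pages(scores: List[List[int]], min_inliers: int = 20) -> Dict[int, int]:
    """Pair Japanese pages with English pages (0-based indices).

    scores[n][m] is the inlier count between Japanese page n and English
    page m.  Pairs preserve reading order and use each English page once.
    """
    pairs: Dict[int, int] = {}
    next_en = 0  # English pages before this index are already consumed
    for ja, row in enumerate(scores):
        candidates = range(next_en, len(row))
        if not candidates:
            break  # no English pages left to pair
        best = max(candidates, key=lambda m: row[m])
        if row[best] >= min_inliers:
            pairs[ja] = best
            next_en = best + 1
    return pairs
```

With a made-up score matrix in which the English edition has an extra page after the first one, the result mirrors FIG. 5 in 0-based form: pages 0, 1, 2 map to 0, 2, 3. A global alignment (for example, dynamic programming) would be more robust against a single misleading score; the greedy version is kept only for brevity.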

As shown in FIGS. 6 and 7, the character area detection unit 330 detects Japanese character regions 520J and 530J contained in the Japanese reference image 510J_n, and English character regions 520E and 530E that are contained in the English reference image 510E_m and correspond to the Japanese character regions.

The Japanese character regions 520J and 530J and the English character regions 520E and 530E may be detected by any method, for example by an object detector. The object detector is not particularly limited; usable examples include R-CNN (Regions with Convolutional Neural Networks) family detectors such as R-CNN, Fast R-CNN, Faster R-CNN, and Mask R-CNN, as well as SSD (Single Shot MultiBox Detector), YOLO (You Only Look Once), and M2Det.

Alternatively, only the character regions in one of the natural languages may be detected, and the character regions in the other natural language may then be identified from the positions of the detected regions. Because the text of a comic is normally placed at the same position in the artwork even after translation, character regions can still be detected accurately, and the time required for object detection is reduced. For example, in FIG. 7, the Japanese character regions 520J and 530J can be detected by object detection, after which the portions of the English reference image 510E_m corresponding to the Japanese character regions 520J and 530J are identified as the English character regions 520E and 530E.
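Identifying the English region from the Japanese region's position can be as simple as rescaling the bounding box when the paired pages were scanned at different resolutions. The sketch below assumes the two pages differ only by scale; if they also differ by shift or rotation, the box corners could instead be mapped through a homography estimated during page matching. The `transfer_box` helper and the `(x0, y0, x1, y1)` box convention are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def transfer_box(box: Box, src_size: Tuple[int, int], dst_size: Tuple[int, int]) -> Box:
    """Map a character region detected on one page to the same relative
    position on the paired page, which may have a different pixel size.

    Sizes are (width, height).  Assumes the pages differ only by scale.
    """
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    x0, y0, x1, y1 = box
    return (x0 * sx, y0 * sy, x1 * sx, y1 * sy)
```

For example, a Japanese region at `(100, 200, 300, 400)` on a 1000x1500 page maps to `(50.0, 100.0, 150.0, 200.0)` on a 500x750 English page.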

The parallel translation information extraction unit 350 extracts Japanese character information 521J and 531J and English character information 521E and 531E from the Japanese character regions 520J and 530J and the English character regions 520E and 530E, respectively, detected by the character area detection unit 330. Specifically, the parallel translation information extraction unit 350 identifies the character strings present in the Japanese character regions 520J and 530J and the English character regions 520E and 530E, and extracts the character information contained in those strings.

A character string can be identified, for example, as follows. Text in comics is usually written in black on a white background. Accordingly, to detect a vertically written character string, as shown in FIG. 8, the parallel translation information extraction unit 350 first divides a character region 540 into pixel columns 541, determines whether each column 541 contains a connected black pixel portion, and identifies a column 543 and a column group 545 that contain black pixel portions. The parallel translation information extraction unit 350 then removes the column 543, which is too small to be a character string, and identifies the column group 545 as a character string 545. If necessary, it then splits the character string 545 to obtain character images 547 and 549.

Similarly, to detect a horizontally written character string, as shown in FIG. 9, the parallel translation information extraction unit 350 first divides a character region 550 into pixel rows 551, determines whether each row 551 contains a connected black pixel portion, and identifies the rows and the row group 553 that contain black pixel portions. It then removes rows that are too small to be a character string and identifies the row group 553 as a character string 553.
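The column and row scans described for FIGS. 8 and 9 amount to a projection profile: mark each column (vertical text) or row (horizontal text) that contains ink, group consecutive marked lines, and drop groups too thin to be text. The sketch below is a simplification under stated assumptions: it treats any black pixel as ink rather than testing for a connected black run, represents the region as a 2-D list with 1 for black, and uses a hypothetical `min_width` cut-off.

```python
from typing import List, Tuple


def find_text_lines(
    binary: List[List[int]], vertical: bool = True, min_width: int = 2
) -> List[Tuple[int, int]]:
    """Group adjacent pixel columns (vertical text) or rows (horizontal text)
    that contain black pixels, dropping groups narrower than min_width.

    binary[y][x] == 1 means a black pixel.  Returns (start, end) index
    ranges with `end` exclusive.
    """
    height = len(binary)
    width = len(binary[0]) if height else 0
    if vertical:
        has_ink = [any(binary[y][x] for y in range(height)) for x in range(width)]
    else:
        has_ink = [any(row) for row in binary]

    groups: List[Tuple[int, int]] = []
    start = None
    for i, ink in enumerate(has_ink + [False]):  # sentinel closes a trailing group
        if ink and start is None:
            start = i
        elif not ink and start is not None:
            if i - start >= min_width:  # too-thin groups are treated as noise
                groups.append((start, i))
            start = None
    return groups
```

Running the same scan on one detected column group with `vertical=False` would split a vertical string into character-sized pieces, analogous to obtaining the character images 547 and 549.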

The present disclosure is not limited to the modes shown in FIGS. 8 and 9; the parallel translation information extraction unit 350 may instead detect character strings directly from the Japanese character regions 520J and 530J and the English character regions 520E and 530E by performing object detection in units of character strings. In this case, the character area detection unit 330 can be omitted. Here too the object detector is not particularly limited; usable examples include R-CNN (Regions with Convolutional Neural Networks) family detectors such as R-CNN, Fast R-CNN, Faster R-CNN, and Mask R-CNN, as well as SSD (Single Shot MultiBox Detector), YOLO (You Only Look Once), and M2Det.

Next, the parallel translation information extraction unit 350 extracts the Japanese character information 521J and 531J and the English character information 521E and 531E from the character strings identified in the Japanese character regions 520J and 530J and the English character regions 520E and 530E. This extraction can be performed using the trained Japanese character recognition model and the trained English character recognition model generated by the character recognition model generation device 200.

Further, the parallel translation information extraction unit 350 identifies the extracted Japanese character information 521J and the corresponding English character information 521E, and likewise the Japanese character information 531J and the corresponding English character information 531E, each as parallel translation information for one pair of sentences, and outputs them to a parallel translation database 371 described later.
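Storing each aligned sentence pair as one record is enough to make the extracted information usable as a training corpus. The sketch below uses Python's built-in `sqlite3` module; the table name `parallel_text`, its columns, and the `store_pairs` helper are hypothetical, and the patent does not specify any storage format for the parallel translation database 371.

```python
import sqlite3
from typing import Iterable, Tuple

# (Japanese page, English page, region id, Japanese text, English text)
PairRecord = Tuple[int, int, str, str, str]


def store_pairs(conn: sqlite3.Connection, pairs: Iterable[PairRecord]) -> None:
    """Append aligned sentence pairs to a (hypothetical) parallel-text table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS parallel_text (
               id INTEGER PRIMARY KEY,
               page_ja INTEGER,
               page_en INTEGER,
               region TEXT,
               ja TEXT NOT NULL,
               en TEXT NOT NULL)"""
    )
    conn.executemany(
        "INSERT INTO parallel_text (page_ja, page_en, region, ja, en) "
        "VALUES (?, ?, ?, ?, ?)",
        pairs,
    )
    conn.commit()
```

Keeping the page and region identifiers alongside the text makes it possible to trace a suspicious pair back to the scanned page it came from.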

The storage unit 370 stores various information required for the processing performed by the parallel translation information extraction unit 350, and also holds the parallel translation database 371. Such information includes, for example, the trained Japanese character recognition model and the trained English character recognition model.

The parallel translation database 371 contains at least the parallel translation information extracted by the parallel translation information extraction unit 350. It may additionally contain parallel translation information obtained without using the parallel translation information extraction unit 350.

(2.3. Machine translation device 100)
As shown in FIG. 1, the machine translation device 100 includes a machine translation learning unit 110, a comic image acquisition unit 120, a character area detection unit 130, a character information estimation unit 140, a machine translation unit 150, an image generation unit 160, and a storage unit 170.

The machine translation learning unit 110 trains a machine translation model using the parallel translation database 371 generated by the parallel translation information extraction unit 350. In this embodiment, a neural machine translator is used as the machine translation model. The neural machine translator is not particularly limited; for example, an attention-based encoder-decoder, Convolutional Sequence to Sequence, or Transformer model can be used. The machine translation learning unit 110 outputs the trained machine translation model to the storage unit 170.

As shown in FIG. 10, the comic image acquisition unit 120 acquires each page of the comic to be translated as a comic image 600J_n (where n is a natural number). If necessary, the comic image acquisition unit 120 may perform image correction on the comic images 600J_n; for example, it may unify the size of the comic images 600J_n across pages, or correct distortion or tilt present in them. The method of acquiring the comic images 600J_n is not particularly limited: images provided as electronic data may be used, or the images may be obtained by scanning a printed comic. Here, it is assumed that the comic to be translated was created in Japanese and is to be translated into English.

The character area detection unit 130 detects character regions 610 from the comic image 600J_n. As with the character area detection unit 330, the character regions 610 may be detected by any method, for example by an object detector. The object detector is not particularly limited; usable examples include R-CNN (Regions with Convolutional Neural Networks) family detectors such as R-CNN, Fast R-CNN, Faster R-CNN, and Mask R-CNN, as well as SSD (Single Shot MultiBox Detector), YOLO (You Only Look Once), and M2Det. The character area detection unit 130 may also detect and identify character strings in the comic image 600J_n directly as the character regions 610.

The character information estimation unit 140 estimates Japanese character information 620J from each detected character region 610. Specifically, it identifies the character strings present in the character region 610 and extracts the character information 620J they contain. Character strings can be identified by the same method as used in the parallel translation information extraction unit 350. If the character area detection unit 130 has already detected character strings directly as the character regions 610, the character information estimation unit 140 can skip identifying character strings.

The character information estimation unit 140 then extracts the character information 620J from the character strings identified in the character region 610. This extraction can be performed using the trained Japanese character recognition model generated by the character recognition model generation device 200.

The machine translation unit 150 translates the Japanese character information 620J estimated by the character information estimation unit 140 into English character information 620E by machine translation. The translation is performed with the machine translation model trained by the machine translation learning unit 110. Because this model was trained on the parallel translation information in the parallel translation database 371, it can translate accurately.

The image generation unit 160 renders the English character information 620E translated by the machine translation unit 150 as an image onto the comic image 600J_n created in Japanese, thereby generating an English comic image 600E_n. Specifically, as shown in FIG. 11, the image generation unit 160 fills the character region 610 of the comic image 600J_n with white and then adds the character information 620E as an image. The region to which the character information 620E is added need only correspond to the region where the character information 620J was located; it does not have to coincide with it exactly.
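The two rendering steps, whiting out the original text and fitting the translation into the freed area, can be sketched with the standard library alone. This is a simplified illustration under stated assumptions: the page is a 2-D list of grayscale values (255 = white), the English text is laid out in a fixed-width font whose character width and line height are given, and line breaking uses Python's `textwrap.wrap`. The helpers `blank_region` and `layout_text` are hypothetical; actual glyph drawing would be done by an imaging library.

```python
import textwrap
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), end-exclusive pixel bounds


def blank_region(page: List[List[int]], box: Box, white: int = 255) -> None:
    """Fill the original character region with white, in place."""
    x0, y0, x1, y1 = box
    for y in range(y0, y1):
        for x in range(x0, x1):
            page[y][x] = white


def layout_text(text: str, box: Box, char_width: int, line_height: int) -> List[str]:
    """Break translated text into lines that fit the region, assuming a
    fixed-width font.  Raises ValueError if the text needs too many lines."""
    x0, y0, x1, y1 = box
    max_chars = max(1, (x1 - x0) // char_width)
    max_lines = max(1, (y1 - y0) // line_height)
    lines = textwrap.wrap(text, width=max_chars)
    if len(lines) > max_lines:
        raise ValueError("text does not fit; a smaller font would be needed")
    return lines
```

For an 80x60-pixel balloon with 8-pixel characters and 20-pixel lines, "I will never give up" wraps to `["I will", "never give", "up"]`. Because English typically needs more horizontal space than vertical Japanese, a real renderer would likely shrink the font rather than fail when the text does not fit.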

The storage unit 170 can exchange data with each unit and stores the information each unit needs for its processing. For example, the storage unit 170 stores the comic images 600J_n to be processed by the comic image acquisition unit 120, the trained character recognition model used by the character information estimation unit 140, and the neural machine translation model generated by the machine translation learning unit 110.

<3. Comic machine translation method, character recognition model generation method, and parallel translation database generation method>
Next, the operation of the machine translation device 100, the character recognition model generation device 200, and the parallel translation database generation device 300 described above will be explained together with the comic machine translation method, the character recognition model generation method, and the parallel translation database generation method. As in the description of the device configuration, the character recognition model generation method and the parallel translation database generation method are explained first, followed by the comic machine translation method.

(3.1 Character recognition model generation method)
The character recognition model generation method according to this embodiment includes a processor executing: generating a trained character recognition model by machine learning using teacher data that includes one or more font images of a natural language and images obtained by adding deformation, tilt, and/or noise to those font images.
FIG. 12 is a flowchart of the character recognition model generation method according to this embodiment.

In this embodiment, teacher data is first created by the teacher data generation unit 210. Specifically, the teacher data generation unit 210 acquires, from the storage unit 250, a font image group 410 containing character images 411 for multiple fonts (S101).
Next, the teacher data generation unit 210 generates a processed character image group 420 containing processed character images 421, obtained by adding deformation, tilt, and/or noise to the character images 411 in the font image group 410 (S103).

Next, the teacher data generation unit 210 combines the processed character images 421 in the processed character image group 420 with the character images 411 in the font image group 410 to generate a training character image group 430 containing a plurality of training character images 431 (S105).
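Steps S103 and S105 can be illustrated with a small augmentation routine. The sketch assumes binary glyph images as 2-D lists (1 = black, 0 = white) and approximates the three kinds of processing with simple stand-ins: a row-wise shear for tilt, random pixel flips for noise, and a nearest-neighbour horizontal stretch for deformation. All function names, the 5% noise rate, and the fixed random seed are hypothetical choices; the patent does not specify the transformations or their parameters.

```python
import random
from typing import List, Sequence, Tuple

Image = List[List[int]]  # 1 = black, 0 = white


def shear(img: Image, shift_per_row: int = 1) -> Image:
    """Tilt approximation: shift row y right by y * shift_per_row pixels."""
    extra = (len(img) - 1) * shift_per_row
    return [[0] * (y * shift_per_row) + row + [0] * (extra - y * shift_per_row)
            for y, row in enumerate(img)]


def add_noise(img: Image, prob: float, rng: random.Random) -> Image:
    """Noise: flip each pixel independently with probability prob."""
    return [[1 - p if rng.random() < prob else p for p in row] for row in img]


def stretch_x(img: Image, factor: int = 2) -> Image:
    """Deformation: nearest-neighbour horizontal stretch by an integer factor."""
    return [[p for p in row for _ in range(factor)] for row in img]


def build_training_set(font_images: Sequence[Tuple[str, Image]], seed: int = 0) -> List[Tuple[Image, str]]:
    """Combine each original font glyph (S101) with processed variants (S103)
    into one labelled training set (S105)."""
    rng = random.Random(seed)  # fixed seed keeps the data set reproducible
    samples: List[Tuple[Image, str]] = []
    for label, img in font_images:
        samples.append((img, label))
        samples.append((shear(img), label))
        samples.append((add_noise(img, 0.05, rng), label))
        samples.append((stretch_x(img), label))
    return samples
```

Every variant keeps the label of its source glyph, so the recognizer (trained in S107) learns to assign the same character to tilted, noisy, and stretched renderings.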

Finally, the machine learning unit 230 performs machine learning using the teacher data generated by the teacher data generation unit 210 to generate a trained character recognition model (S107).

With the trained character recognition model obtained in this way, characters in comic images can be recognized across a wide variety of fonts and deformed variants of them. Unlike ordinary documents, comics present text together with pictures and rely heavily on visual elements, so a wide variety of fonts is used even within a single page. These fonts may also be further deformed to convey the dialogue and situations in the comic more vividly. The present inventors found that text in such comics is difficult to recognize accurately with general optical character recognition (OCR). In contrast, the trained character recognition model obtained above can recognize the character information in comics accurately.

(3.2 Parallel translation database generation method)
The parallel translation database generation method according to this embodiment includes a processor executing: detecting a first natural language character region contained in a first natural language reference image of a reference comic, and a second natural language character region that is contained in a second natural language reference image of the reference comic and corresponds to the first natural language character region; and
extracting first natural language character information present in the first natural language character region and second natural language character information present in the second natural language character region.
FIG. 13 is a flowchart explaining the method for generating a parallel translation database of comics according to this embodiment. The case where the first natural language is Japanese and the second natural language is English is described below.

First, prior to the subsequent steps, the reference image acquisition unit 310 captures the image of each page of the Japanese reference comic 500J and the English reference comic 500E as Japanese reference images 510J_n and English reference images 510E_m, respectively (where n and m are natural numbers) (S201).
Next, the reference image acquisition unit 310 associates each captured Japanese reference image 510J_n of the Japanese reference comic 500J with an English reference image 510E_m of the English reference comic 500E (S203).

Next, the character area detection unit 330 detects the Japanese character regions 520J and 530J contained in the Japanese reference image 510J_n, and the English character regions 520E and 530E that are contained in the English reference image 510E_m and correspond to the Japanese character regions (S205). Here, the character area detection unit 330 may detect the character strings contained in the Japanese reference image 510J_n and the English reference image 510E_m directly as the Japanese character regions 520J and 530J and the English character regions 520E and 530E.

Next, the parallel translation information extraction unit 350 extracts the Japanese character information 521J and 531J and the English character information 521E and 531E from the Japanese character regions 520J and 530J and the English character regions 520E and 530E, respectively, detected by the character area detection unit 330 (S207). Specifically, the parallel translation information extraction unit 350 identifies the character strings present in these regions and extracts the character information they contain. If the character area detection unit 330 has identified character strings directly as the Japanese character regions 520J and 530J and the English character regions 520E and 530E, the parallel translation information extraction unit 350 can skip identifying character strings. The Japanese character information 521J and 531J and the English character information 521E and 531E can be extracted using the trained Japanese character recognition model and the trained English character recognition model generated by the character recognition model generation method.

Finally, the parallel translation information extraction unit 350 identifies the extracted Japanese character information 521J and the corresponding English character information 521E, and likewise the Japanese character information 531J and the corresponding English character information 531E, each as parallel translation information for one pair of sentences (S209), and outputs them to the parallel translation database 371. The parallel translation database 371 is generated through the above steps.

As described above, a parallel translation database of comics can be generated automatically, accurately, and quickly. In particular, this method requires no parallel text data for the comics, only comics that exist in both original and translated editions; parallel translations can therefore be collected from a wide variety of comics, making it easy to build a very large parallel translation database at low cost. Moreover, when the trained character recognition model generated by the character recognition model generation method is used, character information can be recognized accurately even in the wide variety of fonts peculiar to comics and their deformed variants. As a result, an accurate parallel translation database can be generated automatically.

(3.3 Comic machine translation method)
The comic machine translation method according to this embodiment includes a processor executing:
detecting a character region from a first natural language image constituting a comic created in a first natural language;
estimating character information in the first natural language from the character region; and
translating the character information in the first natural language into character information in a second natural language by machine translation using a parallel translation database.
FIG. 14 is a flowchart explaining the comic machine translation method according to this embodiment. The case where the first natural language is Japanese and the second natural language is English is described below.

First, the comic image acquisition unit 120 acquires each page of the comic to be translated as a Japanese comic image 600J_n (where n is a natural number) (S301). Next, the character area detection unit 130 detects the character regions 610 from the comic image 600J_n (S303).

Next, the character information estimation unit 140 estimates the Japanese character information 620J from each detected character region 610 (S305). The character information 620J can be extracted using the trained Japanese character recognition model generated by the character recognition model generation device 200.

Next, the machine translation unit 150 translates the Japanese character information 620J estimated by the character information estimation unit 140 into English character information 620E by machine translation (S307). The machine translation is performed with the machine translation model trained by the machine translation learning unit 110.

Finally, the image generation unit 160 renders the English character information 620E translated by the machine translation unit 150 as an image onto the comic image 600J_n created in Japanese, generating an English comic image 600E_n (S309).
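The order of the steps in FIG. 14 can be summarized as a per-page pipeline. In the sketch below, the detector, recognizer, translator, and renderer are passed in as plain callables standing in for units 130, 140, 150, and 160, since the patent leaves their concrete models open; `translate_page` and `TranslatedRegion` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Box = Tuple[int, int, int, int]


@dataclass
class TranslatedRegion:
    box: Box           # character region 610
    source_text: str   # Japanese character information 620J
    target_text: str   # English character information 620E


def translate_page(
    page: Any,
    detect: Callable[[Any], List[Box]],
    recognize: Callable[[Any, Box], str],
    translate: Callable[[str], str],
    render: Callable[[Any, List[TranslatedRegion]], Any],
) -> Tuple[Any, List[TranslatedRegion]]:
    """Run one comic page through detection, recognition, translation, rendering."""
    regions: List[TranslatedRegion] = []
    for box in detect(page):                 # character area detection (unit 130)
        ja = recognize(page, box)            # character information estimation (unit 140)
        en = translate(ja)                   # machine translation (unit 150)
        regions.append(TranslatedRegion(box, ja, en))
    return render(page, regions), regions    # image generation (unit 160)
```

Injecting the stages as callables keeps the pipeline independent of any particular detector or translation model, so each stage can be swapped or tested in isolation.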

As described above, performing machine translation with the parallel translation database of comics makes it possible to machine-translate comics accurately and quickly. In particular, when the trained character recognition model generated by the character recognition model generation method is used, character information can be recognized accurately even in the wide variety of fonts peculiar to comics and their deformed variants, which further improves the accuracy of machine translation.

<4. Hardware configuration example>
Finally, the hardware configuration of the comic machine translation device 100 according to this embodiment is described with reference to FIG. 15. FIG. 15 is a block diagram showing an example of the hardware configuration of the comic machine translation device according to this embodiment. The information processing device (computer) 900 shown in FIG. 15 can implement, for example, the machine translation device 100 shown in FIG. 1. Information processing by the machine translation device 100 according to this embodiment is realized through cooperation between software and the hardware described below. The same applies to the comic parallel translation database generation device 300 and the character recognition model generation device 200.

As shown in FIG. 15, the information processing apparatus 900 includes a CPU (Central Processing Unit) 901, a ROM (Read Only Memory) 902, a RAM (Random Access Memory) 903, and a host bus 904a. The information processing apparatus 900 also includes a bridge 904, an external bus 904b, an interface 905, an input device 906, an output device 907, a storage device 908, a drive 909, a connection port 911, and a communication device 913. Instead of, or in addition to, the CPU 901, the information processing apparatus 900 may include a processing circuit such as an electrical circuit, a DSP (Digital Signal Processor), or an ASIC (Application Specific Integrated Circuit).

The CPU 901 is an example of a processor. It functions as an arithmetic processing device and a control device, and controls overall operation within the information processing apparatus 900 according to various programs. The CPU 901 may be a microprocessor. The ROM 902 stores programs, calculation parameters, and the like used by the CPU 901. The RAM 903 temporarily stores programs used in execution by the CPU 901, parameters that change as appropriate during that execution, and the like.

The CPU 901, the ROM 902, and the RAM 903 are interconnected by the host bus 904a, which includes a CPU bus and the like. The host bus 904a is connected via the bridge 904 to the external bus 904b, such as a PCI (Peripheral Component Interconnect/Interface) bus. The host bus 904a, the bridge 904, and the external bus 904b need not be configured separately; their functions may be implemented in a single bus.

The input device 906 is realized by a device through which the user inputs information, such as a mouse, keyboard, touch panel, button, microphone, switch, or lever. The input device 906 may also be, for example, a remote control device using infrared rays or other radio waves, or an externally connected device, such as a mobile phone or PDA (Personal Digital Assistant), that supports operation of the information processing apparatus 900. Furthermore, the input device 906 may include, for example, an input control circuit that generates an input signal based on information entered by the user with the above input means and outputs the signal to the CPU 901. By operating the input device 906, the user of the information processing apparatus 900 can input various data to the information processing apparatus 900 and instruct it to perform processing operations.

The output device 907 is formed of a device capable of notifying the user of acquired information visually or audibly. Such devices include display devices such as CRT (Cathode Ray Tube) displays, liquid crystal displays, plasma displays, EL (electroluminescence) displays, laser projectors, LED projectors, and lamps, as well as audio output devices such as speakers and headphones. The output device 907 outputs, for example, results obtained by various processes performed by the information processing apparatus 900. Specifically, the output device 907 visually displays those results in various formats such as text, images, tables, and graphs. When an audio output device is used, it converts an audio signal composed of reproduced audio data, acoustic data, and the like into an analog signal and outputs it audibly.

The storage device 908 is a data storage device formed as an example of a storage unit of the information processing apparatus 900. It is realized by, for example, a magnetic storage device such as an HDD (Hard Disk Drive), a semiconductor storage device, an optical storage device, or a magneto-optical storage device. The storage device 908 may include a storage medium, a recording device that records data on the storage medium, a reading device that reads data from the storage medium, a deletion device that deletes data recorded on the storage medium, and the like. The storage device 908 stores programs executed by the CPU 901, various data, various data acquired from outside, and the like. The storage device 908 may, for example, serve as the storage unit 170 shown in FIG. 1.

The drive 909 is a reader/writer for storage media and is built into, or externally attached to, the information processing apparatus 900. The drive 909 reads information recorded on a mounted removable storage medium, such as a magnetic disk, optical disc, magneto-optical disk, or semiconductor memory, and outputs the information to the RAM 903. The drive 909 can also write information to the removable storage medium.

The connection port 911 is an interface for connecting to external devices, for example a port such as USB (Universal Serial Bus) for connecting an external device capable of data transmission.

The communication device 913 is, for example, a communication interface formed of a communication device or the like for connecting to the network 920. The communication device 913 is, for example, a communication card for wired or wireless LAN (Local Area Network), LTE (Long Term Evolution), Bluetooth (registered trademark), or WUSB (Wireless USB). The communication device 913 may also be a router for optical communication, a router for ADSL (Asymmetric Digital Subscriber Line), a modem for various types of communication, or the like. The communication device 913 can transmit and receive signals and the like to and from the Internet and other communication devices in accordance with a predetermined protocol such as TCP/IP.

The network 920 is a wired or wireless transmission path for information transmitted from devices connected to it. For example, the network 920 may include public networks such as the Internet, a telephone network, or a satellite communication network, as well as various LANs (Local Area Networks) including Ethernet (registered trademark) and WANs (Wide Area Networks). The network 920 may also include a dedicated network such as an IP-VPN (Internet Protocol-Virtual Private Network).

<5. Program and recording medium>
The machine translation device 100 for comics, the bilingual comic database generation device 300, the character recognition model generation device 200, and the various methods using them according to the present embodiment have been described above.
Accordingly, in another aspect, the present invention also relates to a program for causing a computer to function as these devices. A recording medium storing the program is also provided.

<6. Summary>
An embodiment of the present invention has been described above with reference to FIGS. 1 to 15. As described above, the present invention can provide a machine translation device for comics, a machine translation method and program for comics, and a bilingual comic database generation device for these, which together enable highly accurate machine translation of comics. In particular, using the bilingual comic database described above improves the accuracy of machine translation of comics. Furthermore, using the trained character recognition model described above makes it possible to accurately recognize character information in comic images, which has previously been difficult.

The present invention is not limited to the embodiment described above. For example, although the machine translation in the embodiment above was described as neural machine translation, it is not limited to this: any statistical or neural machine translation can be used, as long as the bilingual comic database described above is used.

Also, for example, although the machine translation device for comics and the bilingual comic database generation device according to the embodiment above use the trained character recognition model described above, the invention is not limited to this; the machine translation device for comics and the bilingual comic database generation device according to the present invention need not use that trained character recognition model.

Also, for example, in the embodiment above, the training character image group 430 was described as being obtained by combining the processed character images 421 in the processed font image group 420 with the character images 411 in the font image group 410, but the present disclosure is not limited to this. For example, the training character image group need not include the processed character images 421 in the processed font image group 420 or the character images 411 in the font image group 410. In that case, the training character image group can include other character images, such as character images appearing in comics or character images from other publicly known character recognition datasets.

Also, for example, detection of the first natural language character region and the second natural language character region when generating the bilingual database is not limited to the object detector described above. For example, corresponding pages of the first natural language reference image and the second natural language reference image may be overlaid, and the portions that differ may be detected as the first natural language character region and the second natural language character region.
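The overlay alternative works because the two editions of a page share the same artwork and differ mainly where text was replaced. A minimal sketch, assuming the pages are already aligned grayscale arrays of equal size: threshold the absolute difference, optionally dilate it, and take bounding boxes of connected components. The function names, threshold, dilation radius, and minimum size are hypothetical choices, not values from this document.

```python
from collections import deque

import numpy as np


def _dilate(mask: np.ndarray, radius: int) -> np.ndarray:
    """Grow a boolean mask by `radius` steps of 4-neighbour dilation."""
    out = mask.copy()
    for _ in range(radius):
        p = np.pad(out, 1)  # pads a boolean array with False
        out = (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
               | p[1:-1, :-2] | p[1:-1, 2:])
    return out


def diff_regions(page_a, page_b, threshold=0.25, dilate=2, min_pixels=4):
    """Boxes (top, left, bottom, right; bottom/right exclusive) of connected
    areas where two aligned grayscale pages differ by more than `threshold`."""
    a = np.asarray(page_a, dtype=float)
    b = np.asarray(page_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("pages must be aligned to the same size")
    mask = _dilate(np.abs(a - b) > threshold, dilate)
    seen = np.zeros_like(mask)
    h, w = mask.shape
    boxes = []
    for y0, x0 in zip(*np.nonzero(mask)):
        if seen[y0, x0]:
            continue
        # Breadth-first flood fill over 4-connected differing pixels.
        queue = deque([(y0, x0)])
        seen[y0, x0] = True
        ys, xs = [], []
        while queue:
            y, x = queue.popleft()
            ys.append(y)
            xs.append(x)
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        if len(ys) >= min_pixels:  # skip specks from scan noise or misalignment
            boxes.append((int(min(ys)), int(min(xs)), int(max(ys)) + 1, int(max(xs)) + 1))
    return boxes
```

The dilation step is a design guess meant to merge the separate glyphs of one speech balloon into a single region; boxes from a dilated mask are padded by up to `dilate` pixels, and each box would be applied to both pages to crop the paired first and second natural language regions.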

In the above description, the first natural language is Japanese and the second natural language is English, but the present invention is not limited to this embodiment; the first natural language and the second natural language can be any natural languages that are written using characters.

In the above description, the machine translation device 100 for comics, the bilingual comic database generation device 300, and the character recognition model generation device 200 are each configured as a single information processing apparatus, but the present invention is not limited to this. For example, each of these devices may be composed of a plurality of information processing apparatuses. Alternatively, two or more of the machine translation device for comics, the bilingual comic database generation device, and the character recognition model generation device may be implemented in a single information processing apparatus.

Although preferred embodiments of the present invention have been described in detail above with reference to the accompanying drawings, the present invention is not limited to these examples. A person having ordinary skill in the technical field to which the present invention belongs could clearly conceive of various changes or modifications within the scope of the technical ideas described in the claims, and these are naturally understood to fall within the technical scope of the present invention.

100 Machine translation device
110 Machine translation learning unit
120 Comic image acquisition unit
130 Character region detection unit
140 Character information estimation unit
150 Machine translation unit
160 Image generation unit
170 Storage unit
200 Character recognition model generation device
210 Teacher data generation unit
230 Machine learning unit
250 Storage unit
300 Bilingual database generation device
310 Reference image acquisition unit
330 Character region detection unit
350 Parallel translation information extraction unit
370 Storage unit
371 Bilingual database
400 Network

Claims

1. A machine translation device for comics, comprising:
a character region detection unit that detects a character region from a first natural language image constituting a comic created in a first natural language;
a character information estimation unit that estimates character information in the first natural language from the character region; and
a machine translation unit that translates the character information in the first natural language into character information in a second natural language by neural machine translation using a bilingual database;
wherein the bilingual database includes sentence-level parallel translation information automatically generated by detecting a first natural language character region included in a first natural language reference image of a reference comic and a second natural language character region that is included in a second natural language reference image of the reference comic and corresponds to the first natural language character region, and by extracting first natural language character information present in the first natural language character region and second natural language character information present in the second natural language character region;
the character information estimation unit estimates the character information from the character region using a trained character recognition model; and
the trained character recognition model is generated by machine learning with an image recognition module using teacher data that includes a plurality of font images of the first natural language for a plurality of types of fonts and images obtained by adding deformation, tilt, and/or noise to the font images.

2. The machine translation device for comics according to claim 1, wherein extraction of the first natural language character information from the first natural language character region is performed by estimating the first natural language character information using a trained character recognition model, and
the trained character recognition model is generated by machine learning using teacher data that includes a plurality of font images of the first natural language for a plurality of types of fonts and images obtained by adding deformation, tilt, and/or noise to the font images.

3. The machine translation device for comics according to claim 1, wherein extraction of the second natural language character information from the second natural language character region is performed by estimating the second natural language character information using a trained second natural language character recognition model, and
the trained second natural language character recognition model is generated by machine learning with a Resnet image recognition module using teacher data that includes a plurality of second natural language font images of the second natural language for a plurality of types of fonts and images obtained by adding deformation, tilt, and/or noise to the second natural language font images.

4. The machine translation device for comics according to claim 1, wherein the first natural language is Japanese.

5. The machine translation device for comics according to any one of claims … to 4, further comprising an image generation unit that generates a second natural language image by adding, as an image, the character information in the second natural language translated by the machine translation unit to the first natural language image.

6. A bilingual database generation device for comics, comprising:
a character region detection unit that detects a first natural language character region included in a first natural language reference image of a reference comic and a second natural language character region that is included in a second natural language reference image of the reference comic and corresponds to the first natural language character region;
a parallel translation information extraction unit that extracts first natural language character information present in the first natural language character region and second natural language character information present in the second natural language character region; and
a storage unit that stores at least part of the first natural language character information and at least part of the second natural language character information as sentence-level parallel translation information;
wherein extraction of the first natural language character information from the first natural language character region is performed by estimating the first natural language character information using a trained character recognition model; and
the trained character recognition model is generated by machine learning with a Resnet image recognition module using teacher data that includes a plurality of font images of the first natural language for a plurality of types of fonts and images obtained by adding deformation, tilt, and/or noise to the font images.

7. The bilingual database generation device for comics according to claim 6, wherein extraction of the second natural language character information from the second natural language character region is performed by estimating the second natural language character information using a trained second natural language character recognition model, and
the trained second natural language character recognition model is generated by machine learning with a Resnet image recognition module using teacher data that includes a plurality of second natural language font images of the second natural language for a plurality of types of fonts and images obtained by adding deformation, tilt, and/or noise to the second natural language font images.

8. A machine translation method for comics, comprising executing, by a processor:
detecting a character region from a first natural language image constituting a comic created in a first natural language;
estimating character information in the first natural language from the character region; and translating the character information in the first natural language into character information in a second natural language by neural machine translation using a bilingual database;
wherein the bilingual database includes sentence-level parallel translation information automatically generated by detecting a first natural language character region included in a first natural language reference image of a reference comic and a second natural language character region that is included in a second natural language reference image of the reference comic and corresponds to the first natural language character region, and by extracting first natural language character information present in the first natural language character region and second natural language character information present in the second natural language character region;
extraction of the first natural language character information from the first natural language character region is performed by estimating the first natural language character information using a trained character recognition model; and
the trained character recognition model is generated by machine learning with an image recognition module using teacher data that includes a plurality of font images of the first natural language for a plurality of types of fonts and images obtained by adding deformation, tilt, and/or noise to the font images.

9. A program for causing a computer to function as a machine translation device for comics, the machine translation device comprising:
a character region detection unit that detects a character region from a first natural language image constituting a comic created in a first natural language;
a character information estimation unit that estimates character information in the first natural language from the character region; and
a machine translation unit that translates the character information in the first natural language into character information in a second natural language by neural machine translation using a bilingual database;
wherein the bilingual database includes sentence-level parallel translation information automatically generated by detecting a first natural language character region included in a first natural language reference image of a reference comic and a second natural language character region that is included in a second natural language reference image of the reference comic and corresponds to the first natural language character region, and by extracting first natural language character information present in the first natural language character region and second natural language character information present in the second natural language character region;
extraction of the first natural language character information from the first natural language character region is performed by estimating the first natural language character information using a trained character recognition model; and
the trained character recognition model is generated by machine learning with an image recognition module using teacher data that includes a plurality of font images of the first natural language for a plurality of types of fonts and images obtained by adding deformation, tilt, and/or noise to the font images.