JP2021056796A

JP2021056796A - Structure recognition system, structure recognition device, structure recognition method, and program

Info

Publication number: JP2021056796A
Application number: JP2019179710A
Authority: JP
Inventors: Mie Ogushi; Takahiro Baba; Yota Takaoka; Hideo Terada
Original assignee: Toppan Forms Co Ltd; Open Stream Inc
Current assignee: Open Stream Inc; Toppan Edge Inc
Priority date: 2019-09-30
Filing date: 2019-09-30
Publication date: 2021-04-08
Anticipated expiration: 2039-09-30
Also published as: JP7365835B2

Abstract

To provide a structure recognition system that can extract the information necessary to transform the layout of a document containing rectangles. SOLUTION: A structure recognition system includes: an image data acquisition unit that acquires image data of a target image including characters and rectangles; an area determination unit that determines the respective areas of the characters and the rectangles in the target image; and a structure determination unit that determines, based on area data relating to the areas determined by the area determination unit, the hierarchical structure of the rectangles included in the target image. SELECTED DRAWING: Figure 1

Description

The present invention relates to a structure recognition system, a structure recognition device, a structure recognition method, and a program.

There are techniques for extracting character information from a scanned image created by reading a document such as a form with a scanner or the like (see, for example, Patent Documents 1 and 2). Patent Document 1 discloses a technique that makes errors in character information easier to correct by structuring the characters in an image based on their positions. Patent Document 1 describes structuring as grouping character information into sets of related information and then identifying and expressing the hierarchical relationship among those sets. For example, when character information such as a title, a document creator, and a document creation date is extracted from an image, the structured data shows the title in the top layer and the document creator and the document creation date in the layer below it. The technique of Patent Document 2 extracts the character information in an image together with feature information indicating the features of ruled lines. As a result, when searching for a document, the features of the ruled lines in the document can be specified in addition to the characters written in it, which makes the search efficient.

Meanwhile, with the recent development of computer and communication network technology, information on paper media tends to be replaced with electronic data. Digitizing forms makes work more efficient and saves resources by going paperless. It also lets users fill in the entry fields of a document via an electronic device such as a smartphone, which improves user convenience.

When a form is digitized, its layout is often changed. The paper on which the form is printed and the screen of an electronic device such as a smartphone have different aspect ratios. If the paper form is displayed on the screen without changing its layout, part of the document may not be displayed depending on the display scale, or, if the entire image is displayed, it may be reduced so much that the characters become difficult to read. When the layout is changed, the content of the form before conversion must be reflected in the form after conversion without excess or deficiency. As one countermeasure, the techniques of Patent Documents 1 and 2 could be applied to change the layout of the form. With these techniques, the layout can be changed while maintaining the structure of the characters and the features of the ruled lines in the form.

Patent Document 1: Japanese Unexamined Patent Publication No. 2019-82814
Patent Document 2: Japanese Unexamined Patent Publication No. 2008-40834

However, even if the layout is changed while maintaining the structure of the characters and the features of the ruled lines, the content of the form before conversion cannot be reflected in the form after conversion without excess or deficiency. Many forms contain entry frames for filling in required items. Most such entry frames are shown as simple rectangles that contain no characters, so no character information can be extracted from the entry frame itself. For this reason, with the technique of Patent Document 1 it is difficult to determine the hierarchical structure of all the items in a form that includes rectangles such as entry frames. Moreover, even if the technique of Patent Document 2 preserves the features of the original ruled lines in the form after the layout change, layout conversion cannot be performed properly unless it is known which of the areas delimited by the ruled lines should contain characters and which should be left blank as entry frames. Thus, simply using the conventional techniques as they are makes it difficult to change the layout while maintaining the semantic connections (structure) among the items (including rectangles) in the form.

The present invention has been made in view of this situation, and provides a structure recognition system, a structure recognition device, a structure recognition method, and a program capable of extracting the information necessary for converting the layout of a document that includes rectangles.

To solve the above problem, the present invention is a structure recognition system including: an image data acquisition unit that acquires image data of a target image including characters and rectangles; an area determination unit that determines the respective areas of the characters and the rectangles in the target image; and a structure determination unit that determines the hierarchical structure of the rectangles included in the target image based on area data relating to the areas determined by the area determination unit.

Further, in the above structure recognition system, the present invention further includes a preprocessing unit that uses the area data indicating a character area to generate semantic tag information including a specific second character corresponding to a first character shown in that area, and the structure determination unit determines the hierarchical structure based on the semantic tag information and the area data indicating a rectangular area.

Further, in the above structure recognition system of the present invention, the structure determination unit determines the hierarchical structure using a trained model. The trained model is trained to output the hierarchical structure of the rectangles included in an input image, using a learning data set that associates the semantic tag information and the area data indicating rectangular areas in a learning image including characters and rectangles with the hierarchical structure of the rectangles included in that learning image.

Further, in the above structure recognition system of the present invention, the structure determination unit selects, in the target image, a rectangle of interest whose hierarchical structure is to be determined; acquires a neighborhood semantic tag group, which is the semantic tag information located within a predetermined first range from the position of the selected rectangle of interest; acquires a neighborhood rectangle group, which is the area data of the rectangles located within a predetermined second range from the position of the selected rectangle of interest; and determines the order of the input data to be input to the trained model by sorting the acquired rectangle of interest, neighborhood semantic tag group, and neighborhood rectangle group according to their positions.
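The neighborhood selection and position-based ordering described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Item` record, the use of area centers, the Chebyshev distance as the meaning of "range", and the top-to-bottom, left-to-right sort key are hypothetical choices, not details given in this passage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A semantic tag or a rectangle, located by the center of its area (assumed)."""
    kind: str   # "tag" or "rect"
    name: str
    x: float
    y: float


def distance(a: Item, b: Item) -> float:
    # Chebyshev distance; the actual definition of "range" is an assumption here.
    return max(abs(a.x - b.x), abs(a.y - b.y))


def build_model_input(focus, tags, rects, first_range, second_range):
    """Collect the neighbors of the rectangle of interest and order them by position."""
    near_tags = [t for t in tags if distance(focus, t) <= first_range]
    near_rects = [r for r in rects if r != focus and distance(focus, r) <= second_range]
    # Assumed ordering: top to bottom, then left to right.
    return sorted([focus, *near_tags, *near_rects], key=lambda i: (i.y, i.x))


focus = Item("rect", "K2", 50, 20)
tags = [Item("tag", "NAME", 10, 20), Item("tag", "ADDRESS", 10, 200)]
rects = [focus, Item("rect", "K1", 50, 5), Item("rect", "K9", 300, 300)]
ordered = build_model_input(focus, tags, rects, first_range=50, second_range=30)
print([item.name for item in ordered])
```

Using separate ranges for tags and rectangles mirrors the first and second ranges in the text; a fixed ordering gives the trained model a consistent input layout regardless of the order in which areas were detected.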

Further, the present invention is a structure recognition device including: an area data acquisition unit that acquires area data relating to the respective areas of characters and rectangles in a target image; and a structure determination unit that determines, based on the area data, the hierarchical structure of the rectangles included in the target image.

Further, the present invention is a structure recognition method in which an area data acquisition unit acquires area data relating to the respective areas of characters and rectangles in a target image, and a structure determination unit determines, based on the area data, the hierarchical structure of the rectangles included in the target image.

Further, the present invention is a program for causing a computer to operate as the structure recognition device described above, the program causing the computer to function as each unit included in the structure recognition device.

According to the present invention, the information necessary for converting the layout of a document that includes rectangles can be extracted.

A diagram showing a configuration example of the structure recognition system 1 according to the embodiment.
A diagram illustrating the processing performed by the structure recognition system 1 according to the embodiment.
A block diagram showing a configuration example of the area division device 10 according to the embodiment.
A block diagram showing a configuration example of the structure recognition device 30 according to the embodiment.
A diagram showing a configuration example of the conversion table 360 according to the embodiment.
A diagram illustrating the processing performed by the area division device 10 according to the embodiment.
A diagram illustrating the processing performed by the structure recognition device 30 according to the embodiment.
A diagram showing an example of layout conversion to which the structure recognition system 1 according to the embodiment is applied.
A sequence diagram showing the flow of the processing performed by the structure recognition system 1 according to the embodiment.

Hereinafter, embodiments of the invention will be described with reference to the drawings.

The structure recognition system 1 of the present embodiment is a system that extracts the information necessary for converting the layout of a document that includes rectangles.

The following description takes as an example the case where the document whose layout is to be converted is a form, but the invention is not limited to this. The layout conversion target may be any document that includes at least characters and rectangles, such as a questionnaire, a medical interview sheet, a test paper, a fixed-phrase template, or an idea sheet. A rectangle included in a document refers to an area of the document enclosed in a quadrangular shape such as an oblong or a square. Rectangles include not only areas enclosed by solid lines but also rectangular areas enclosed by dotted lines or by specific symbols or figures, and rectangular areas set apart by the shade of the background color or the like. Characters included in a document include not only single characters but also character strings consisting of multiple characters and groups of characters.

The information necessary for converting the layout is information indicating the hierarchical structure of the characters and rectangles included in the form (hereinafter referred to as structured data). If the hierarchical structure of the characters and rectangles included in the form is known, the layout can be converted while that structure is maintained. Accordingly, the characters, entry fields, and so on shown on the form, and their relative positional relationships, can be maintained before and after the layout conversion. In other words, to change the layout while preserving what the form expresses, the structured data of the characters and rectangles included in the form must be extracted.

An example of structured data will be described. As shown in FIG. 6, consider a case where the form includes rectangular areas K1 to K5. As shown in FIG. 7, the structured data of the areas K1 to K3 is information indicating a hierarchical structure in which the area K1 is in the upper layer and the areas K2 and K3 are subordinate to it. The structured data of the areas K4 and K5 is information indicating a hierarchical structure in which the area K4 is in the upper layer and the area K5 is subordinate to it.
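As a rough illustration of the K1 to K5 example, the hierarchy can be written down as a nested mapping and printed as an indented outline. The nested-dictionary form and the `outline` helper are hypothetical; the text only states which areas are subordinate to which.

```python
# Hypothetical nested form of the example hierarchy: each key is an area,
# and each value holds the areas subordinate to it.
HIERARCHY = {
    "K1": {"K2": {}, "K3": {}},
    "K4": {"K5": {}},
}


def outline(tree, level=0):
    """Return the hierarchy as indented lines, one area per line."""
    lines = []
    for area, children in tree.items():
        lines.append("  " * level + area)
        lines.extend(outline(children, level + 1))
    return lines


print("\n".join(outline(HIERARCHY)))
```

A layout converter that walks this structure can move K2 and K3 together with K1, which is the kind of structure-preserving rearrangement the description is aiming at.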

The overall configuration of the structure recognition system 1 will be described with reference to FIG. 1. FIG. 1 is a block diagram showing a configuration example of the structure recognition system 1 according to the embodiment. As shown in FIG. 1, the structure recognition system 1 includes, for example, an area division device 10, an OCR device 20, and a structure recognition device 30. These components of the structure recognition system 1 (the area division device 10, the OCR device 20, and the structure recognition device 30) are communicably connected.
Although FIG. 1 illustrates the case where the OCR device 20 performs character recognition, the character recognition function may instead be provided in the area division device 10 or the structure recognition device 30, in which case the OCR device 20 can be omitted.

The area division device 10 is a device that divides a form into areas such as the characters and rectangles shown on the form. The OCR device 20 is a device that performs character recognition processing to recognize the characters shown in an input image. The structure recognition device 30 is a device that determines the hierarchical structure of the rectangles shown on the form.

The following describes, as an example, the case where the structure recognition device 30 identifies the hierarchical structure of the "rectangles" shown on the form. The same method can also be applied to identify the hierarchical structure of the "characters" shown on the form.

In addition, the following describes, as an example, the case where the hierarchical structure is determined as the identification information of the rectangle or character to which a rectangle in the form is subordinate (hereinafter referred to as the parent ID). In this case, the structured data is information that associates each rectangle with its parent ID. Determining the parent ID as the hierarchical structure is preferable because the structure of the rectangles can be uniquely identified while an increase in data volume is suppressed. However, the invention is not limited to this. As another way to identify the hierarchical structure of a rectangle, the identification information of the rectangles or characters subordinate to that rectangle (hereinafter referred to as child IDs) could be determined. In that case, because multiple characters or rectangles may be subordinate to a single rectangle, the configuration must allow multiple child IDs to be associated with one rectangle, which can increase the data volume. Any method may be used to identify the hierarchical structure of the rectangles as long as the hierarchical structure can be identified. The method may associate a parent ID with each rectangle, associate child IDs with each rectangle, associate both a parent ID and child IDs with each rectangle, or be some other method.
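To make the parent-ID representation concrete, here is a minimal sketch that stores one parent ID per area and derives the child IDs from it when they are needed. The record layout, the use of `None` for an area with no parent, and the function name are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical (area ID, parent ID) records; None marks an area with no parent.
RECORDS = [
    ("K1", None),
    ("K2", "K1"),
    ("K3", "K1"),
    ("K4", None),
    ("K5", "K4"),
]


def children_by_parent(records):
    """Invert parent-ID records into a mapping from parent ID to child IDs."""
    children = defaultdict(list)
    for area, parent in records:
        if parent is not None:
            children[parent].append(area)
    return dict(children)


print(children_by_parent(RECORDS))
```

Each area carries exactly one fixed-size field in the parent-ID form, whereas a child-ID form needs a variable-length list per area. Because the child lists can be recomputed from the parent IDs, storing only the parent IDs loses no information.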

The processing performed by the structure recognition system 1 will now be described with reference to FIG. 2. FIG. 2 is a diagram illustrating the processing performed by the structure recognition system 1 according to the embodiment. As shown in FIG. 2, a process of reading the form T with a scanner (scan processing SC) creates information (scanned image data SD) indicating an image of the scanned form T (scanned image). The scanned image is the image to be processed by the area division device 10; that is, the scanned image is an example of the "target image". The scanned image data SD is input to the area division device 10, whereby the area division device 10 acquires the scanned image data SD.

Based on the scanned image data SD, the area division device 10 divides the form T into areas for each element, such as characters and rectangles. The area division device 10 outputs information indicating the character areas in the scanned image (character area data MD) and information indicating the rectangular areas (rectangular area data KD). The area division device 10 outputs the character area data MD to the OCR device 20. The character area data MD and the rectangular area data KD are each an example of "area data".

The OCR device 20 performs character recognition processing to recognize the characters shown in the character area data MD acquired from the area division device 10. The OCR device 20 notifies the structure recognition device 30 of information indicating the content of the recognized characters (character recognition data MND). The character recognition data MND is input to the structure recognition device 30, whereby the structure recognition device 30 acquires the character recognition data MND.

Meanwhile, the character area data MD and the rectangular area data KD output by the area division device 10 are input to the structure recognition device 30, whereby the structure recognition device 30 acquires the character area data MD and the rectangular area data KD.

The structure recognition device 30 determines the hierarchical structure of the rectangles shown on the form based on the character area data MD and the rectangular area data KD acquired from the area division device 10 and the character recognition data MND acquired from the OCR device 20. The structure recognition device 30 uses the conversion table 360 to classify the character recognition data MND by the meaning of the characters. Using the structure determination unit 34, the structure recognition device 30 determines the hierarchical structure of the rectangles from the character area data MD, the rectangular area data KD, and the information obtained by classifying the character recognition data MND by meaning. The method by which the structure recognition device 30 determines the hierarchical structure of the rectangles is described in detail later. The structure recognition device 30 outputs information indicating the hierarchical structure of the rectangles (structured data KZD).

The configuration of the area division device 10 will now be described with reference to FIG. 3. FIG. 3 is a block diagram showing a configuration example of the area division device 10 according to the embodiment. As shown in FIG. 3, the area division device 10 includes, for example, an image data acquisition unit 11, a modulated image generation unit 12, an area determination unit 13, an area data output unit 14, and a storage unit 15.

The image data acquisition unit 11 acquires the scanned image data SD. The scanned image data SD is, for example, information in which image information is associated with each pixel, such as information indicating a grayscale value for each pixel or information indicating an RGB value for each pixel. The image data acquisition unit 11 outputs the acquired scanned image data SD to the modulated image generation unit 12 and the area determination unit 13.

The modulated image generation unit 12 generates an emphasized image based on the scanned image data SD acquired from the image data acquisition unit 11. The emphasized image is an image in which the pixel value (grayscale value or RGB value) of each pixel in the scanned image has been changed based on a predetermined modulation condition.

The modulated image generation unit 12 generates, as the emphasized image, for example an image obtained by applying enhancement processing that emphasizes the edges of the scanned image. In this case, the modulated image generation unit 12 detects edges in the scanned image and emphasizes the detected edges. The modulated image generation unit 12 detects the edges in the scanned image by any conventional method. One such method detects edges by taking the difference between the scanned image after median filtering and the scanned image after smoothing with a Gaussian filter or the like. Alternatively, edges may be detected by applying a Laplacian filter or a Sobel filter to the scanned image. The modulated image generation unit 12 generates the emphasized image by setting the pixels detected as edges to one specific pixel value (for example, the grayscale value or RGB value indicating "black") and converting the pixels not detected as edges to another specific pixel value (for example, the grayscale value or RGB value indicating "white"). The modulated image generation unit 12 outputs the image data of the generated emphasized image to the area determination unit 13.
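The Sobel variant of this step can be sketched in plain Python as follows. This is only an illustrative sketch: the list-of-rows grayscale image, the border replication, the |gx| + |gy| magnitude, and the threshold value are assumptions, and the text does not specify any of them.

```python
def sobel_emphasize(gray, threshold=128):
    """Return a black-on-white edge image for a grayscale image given as a list of rows."""
    height, width = len(gray), len(gray[0])

    def px(y, x):
        # Replicate border pixels for neighbors that fall outside the image (assumed).
        return gray[min(max(y, 0), height - 1)][min(max(x, 0), width - 1)]

    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Horizontal and vertical Sobel responses.
            gx = (px(y - 1, x + 1) + 2 * px(y, x + 1) + px(y + 1, x + 1)) \
                - (px(y - 1, x - 1) + 2 * px(y, x - 1) + px(y + 1, x - 1))
            gy = (px(y + 1, x - 1) + 2 * px(y + 1, x) + px(y + 1, x + 1)) \
                - (px(y - 1, x - 1) + 2 * px(y - 1, x) + px(y - 1, x + 1))
            magnitude = abs(gx) + abs(gy)
            # Edge pixels become black (0); all other pixels become white (255).
            row.append(0 if magnitude >= threshold else 255)
        result.append(row)
    return result


# A small image whose left half is white and right half is black.
image = [[255, 255, 255, 0, 0, 0] for _ in range(4)]
emphasized = sobel_emphasize(image)
print(emphasized[0])
```

In the sample image only the two columns on either side of the white-to-black boundary are marked as edges, so faint or thin ruled lines stand out as solid black strokes on a white background.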

The area determination unit 13 determines the areas of characters, rectangles, and background (elements that are neither characters nor rectangles) in each of the scanned image and the emphasized image. The area determination unit 13 determines the character and rectangular areas in an image using, for example, the area determination model 150. The area determination model 150 is information stored in the storage unit 15 and is the learning result obtained by having a learning model learn a learning data set that associates image data with the results of determining the character, rectangle, and background areas in the image. By learning such a data set, the learning model is trained to accurately output (predict) the character and rectangular areas in an input image. The learning model is, for example, a DCNN (Deep Convolutional Neural Network), but is not limited to this. Methods such as a CNN, a decision tree, hierarchical Bayes, or an SVM (Support Vector Machine), or an appropriate combination of these, may also be used as the learning model.

Based on the determination results for the scanned image and the emphasized image, the area determination unit 13 finalizes the character, rectangle, and background areas in the scanned image. For example, for an area where the determination result for the scanned image matches the determination result for the emphasized image, the area determination unit 13 adopts the determination result for the scanned image as it is.

On the other hand, for an area where the determination result for the scanned image does not match the determination result for the emphasized image, the area determination unit 13 finalizes the character, rectangle, and background areas based on a predetermined rule. For example, the area determination unit 13 finalizes as a character area in the scanned image an area determined to be a character in at least one of the scanned image and the emphasized image and background in the other. Likewise, the area determination unit 13 finalizes as a rectangular area in the scanned image an area determined to be a rectangle in at least one of the scanned image and the emphasized image and background in the other. The area determination unit 13 outputs information indicating the character areas in the scanned image (character area data MD) and information indicating the rectangular areas (rectangular area data KD) to the area data output unit 14.
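The rule for reconciling the two determination results can be sketched as a small function. The label strings and the function name are hypothetical. The text gives no rule for a disagreement between "character" and "rectangle", so the fallback to the scanned-image result in that case is purely an assumption.

```python
def finalize_label(scan_label, emphasized_label):
    """Combine the labels that the scanned and emphasized images give to one area."""
    if scan_label == emphasized_label:
        # Matching results: adopt the scanned-image result as it is.
        return scan_label
    labels = {scan_label, emphasized_label}
    if labels == {"character", "background"}:
        return "character"
    if labels == {"rectangle", "background"}:
        return "rectangle"
    # Character versus rectangle is not covered by the text; falling back
    # to the scanned-image result is an assumption.
    return scan_label


print(finalize_label("background", "rectangle"))
```

The effect of the rule is that a character or rectangle found in either image is kept, so faint elements that only the emphasized image reveals are not lost to the background.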

The area data output unit 14 outputs the character area data MD to the OCR device 20. The area data output unit 14 also outputs the character area data MD and the rectangular area data KD to the structure recognition device 30. The storage unit 15 stores the area determination model 150.

The configuration of the structure recognition device 30 will now be described with reference to FIG. 4. FIG. 4 is a block diagram showing a configuration example of the structure recognition device 30 according to the embodiment. As shown in FIG. 4, the structure recognition device 30 includes, for example, an area data acquisition unit 31, a character recognition data acquisition unit 32, a preprocessing unit 33, a structure determination unit 34, a structure data output unit 35, and a storage unit 36.

The area data acquisition unit 31 acquires area data (the character area data MD and the rectangular area data KD) from the area dividing device 10. The character area data MD is, for example, information that associates coordinates indicating the position of a character area with identification information indicating that the area is a character area. The rectangular area data KD is, for example, information that associates coordinates indicating the position of a rectangle area with identification information indicating that the area is a rectangle area. Here, when the area is a quadrangle, the coordinates indicating its position are, for example, the coordinates of two of its four vertices that lie on a diagonal. Alternatively, the coordinates indicating the position may consist of the coordinates of one predetermined vertex of the quadrangle (for example, the lower-left vertex) together with information indicating its width and height. The area data acquisition unit 31 outputs the acquired area data to the structure determination unit 34.
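The two coordinate representations described above carry the same information. The following Python sketch, offered only as an illustration, normalizes both into one record; the class name `Region`, its field names, and the `kind` strings are hypothetical, and the vertex-plus-size form assumes a coordinate system in which the given vertex has the smallest x and y values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    kind: str   # identification information: "character" or "rectangle"
    x0: float   # smallest x
    y0: float   # smallest y
    x1: float   # largest x
    y1: float   # largest y

    @classmethod
    def from_diagonal(cls, kind, p, q):
        """Build a region from two vertices lying on a diagonal."""
        (px, py), (qx, qy) = p, q
        return cls(kind, min(px, qx), min(py, qy), max(px, qx), max(py, qy))

    @classmethod
    def from_vertex_and_size(cls, kind, vertex, width, height):
        """Build a region from one reference vertex plus width and height."""
        x, y = vertex
        return cls(kind, x, y, x + width, y + height)

    @property
    def center(self):
        """Representative coordinate used later for ordering."""
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)
```

Normalizing to minimum and maximum coordinates means downstream processing does not need to know which representation the area dividing device 10 supplied.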

The character recognition data acquisition unit 32 acquires the character recognition data MND from the OCR device 20. The character recognition data MND is, for example, information that associates the character area data with a character recognition result indicating the characters recognized in that area. The character recognition data acquisition unit 32 outputs the acquired character recognition data MND to the preprocessing unit 33.

The preprocessing unit 33 performs preprocessing ahead of the determination processing so that the structure determination unit 34, described later, can determine the hierarchical structure more easily. Specifically, the preprocessing unit 33 generates semantic tag information using the character recognition data MND acquired from the character recognition data acquisition unit 32.

The semantic tag information is information in which a tag corresponding to the meaning of the characters shown in an area (a semantic tag) is attached to the character area data. A semantic tag may be any information indicating that wordings are semantically equivalent. For example, a semantic tag is a single wording chosen to represent a group of semantically equivalent wordings. More specifically, it is information indicating that wordings such as 「お住まい」, 「住所」, 「おところ」, and 「ご住所」 all mean 「住所」 (address). By generating the semantic tag information, the preprocessing unit 33 can unify semantically equivalent wordings into one wording. Compared with leaving the wordings as they are, this simplifies the subsequent processing and makes it easier for the structure determination unit 34 to determine the hierarchical structure.

The preprocessing unit 33 generates the semantic tag information by converting the character recognition result in the character recognition data MND into predetermined characters using the conversion table 360 (see FIG. 5). The conversion table 360 is information stored in the storage unit 36, and is a table that associates pre-conversion character strings with post-conversion character strings. The pre-conversion entries of the conversion table 360 hold, for example, character strings that appear frequently in forms and whose notation may vary, such as 「住所」, 「おところ」, and 「ご住所」. The post-conversion entry holds a single character string set according to the meaning, for example 「住所」 (address) for 「住所」, 「おところ」, and 「ご住所」.

The preprocessing unit 33 looks up the conversion table 360 using the character recognition result in the character recognition data MND. If the character recognition result appears among the pre-conversion character strings of the conversion table 360, the preprocessing unit 33 obtains the post-conversion character string associated with it and converts the character recognition result into that string. The preprocessing unit 33 generates the semantic tag information by associating the converted characters with the character area data, and outputs the generated semantic tag information to the structure determination unit 34. If the character recognition result does not appear among the pre-conversion character strings of the conversion table 360, the preprocessing unit 33 does not convert it, and generates the semantic tag information by associating the characters of the character recognition result, unchanged, with the character area data.
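The lookup performed by the preprocessing unit 33 may be illustrated by the following Python sketch. The dictionary layout, the function names, and the tag ID `E0002` are assumptions made for illustration; only the entry for `E0001` and the address wordings come from the description.

```python
# Conversion table 360 modeled as:
#   semantic tag ID -> (post-conversion string, pre-conversion strings)
CONVERSION_TABLE = {
    "E0001": ("氏名", ["お名前", "名前", "おなまえ"]),   # entry shown in FIG. 5
    "E0002": ("住所", ["住所", "おところ", "ご住所"]),   # hypothetical ID
}


def build_lookup(table):
    """Invert the table into a pre-conversion -> post-conversion mapping."""
    lookup = {}
    for after, before_list in table.values():
        for before in before_list:
            lookup[before] = after
    return lookup


LOOKUP = build_lookup(CONVERSION_TABLE)


def make_semantic_tags(recognition_data, lookup=LOOKUP):
    """Attach a semantic tag to each recognized character area.

    recognition_data: list of {"box": ..., "text": ...} records standing in
    for the character recognition data MND. Text that is not in the table
    is passed through unchanged, as in the description.
    """
    return [
        {"box": item["box"], "tag": lookup.get(item["text"], item["text"])}
        for item in recognition_data
    ]
```

Exact string matching is used here for simplicity; the description does not say how OCR errors or partial matches are handled.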

The structure determination unit 34 determines the hierarchical structure of the rectangles using the rectangular area data and the semantic tag information. It does so using the structure determination model 361. The structure determination model 361 is the result of training a learning model on a training data set that associates rectangular area data and semantic tag information with the parent ID of a rectangle. Training on such a data set teaches the learning model to output (predict) the parent ID of a rectangle accurately from the input rectangular area data and semantic tag information. The learning model is, for example, an RNN (Recurrent Neural Network). Using an RNN makes it possible to learn from ordered sequence information.

When an RNN is used as the learning model, the structure determination unit 34 generates the data to be input to the structure determination model 361 (hereinafter, the input data) so that the order of the data carries information. The structure determination unit 34 selects a rectangle of interest in the scanned image. The rectangle of interest is the rectangle whose place in the hierarchical structure is to be determined. The structure determination unit 34 extracts the rectangular area data located within a predetermined range (hereinafter, the first range) of the rectangle of interest; this data is hereinafter called the neighborhood rectangle group. The structure determination unit 34 also extracts the semantic tag information located within a predetermined range (hereinafter, the second range) of the rectangle of interest; this information is hereinafter called the neighborhood semantic tag group. These ranges may be set arbitrarily. The first range and the second range may differ from each other or may be the same. They may be fixed values set in advance, or they may vary with the size of the scanned image or the size of the rectangle of interest.

The structure determination unit 34 uses, as the input data, the data obtained by sorting the rectangle of interest, the neighborhood rectangle group, and the neighborhood semantic tag group in raster order of their representative coordinates (for example, their center coordinates). Raster order here means the order in which pixels arranged in two dimensions are read (or written) along predetermined directions. For example, raster order runs from left to right in the horizontal direction of the image and from top to bottom in the vertical direction. The directions may, however, be chosen arbitrarily: the order may run from right to left, or from bottom to top.
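The generation of the input data may be illustrated by the following Python sketch. It makes several assumptions that the description leaves open: each item is a dictionary with an `id` and a `box` of the form `(x0, y0, x1, y1)`, the first and second ranges are Euclidean distances between center coordinates, and raster order is a plain sort by center y and then center x in image coordinates (top to bottom, left to right). All function names are hypothetical.

```python
from math import hypot


def center(box):
    """Center coordinate of a box given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def within_range(target, items, limit):
    """Items whose center lies within `limit` of the target's center."""
    tx, ty = center(target["box"])
    selected = []
    for item in items:
        ix, iy = center(item["box"])
        if hypot(ix - tx, iy - ty) <= limit:
            selected.append(item)
    return selected


def raster_sorted(items):
    """Sort top to bottom, then left to right, by center coordinate."""
    return sorted(items, key=lambda it: (center(it["box"])[1], center(it["box"])[0]))


def build_input(target, rectangles, tags, first_range, second_range):
    """Order the rectangle of interest, nearby rectangles, and nearby tags."""
    near_rects = [r for r in within_range(target, rectangles, first_range)
                  if r is not target]
    near_tags = within_range(target, tags, second_range)
    return raster_sorted([target, *near_rects, *near_tags])
```

A strict sort on the y coordinate places items whose centers differ only slightly in height on different rows; a practical implementation might group items into rows with a tolerance before ordering them horizontally.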

The structure determination unit 34 determines the parent ID of the rectangle of interest based on the output obtained by inputting the generated input data to the structure determination model 361. The structure determination unit 34 selects every rectangle in the scanned image, one at a time, as the rectangle of interest and repeats the above procedure, thereby determining the parent IDs of all the rectangles. In this way, the structure determination unit 34 determines the hierarchical structure of the rectangles. The structure determination unit 34 outputs information indicating the determined hierarchical structure, that is, the structured data, to the structure data output unit 35. The structure data output unit 35 outputs the structured data. The storage unit 36 stores the conversion table 360 and the structure determination model 361.

Note that, for the structure determination model 361 to output the hierarchical structure of the rectangles from the input data described above, the input data in the training data set must be generated in the same way at the training stage. That is, a rectangle of interest is selected from a training image, and the neighborhood rectangle group and the neighborhood semantic tag group for that rectangle are extracted. The data obtained by sorting the rectangle of interest, the neighborhood rectangle group, and the neighborhood semantic tag group in raster order of their representative coordinates (for example, their center coordinates) is then used as the input data. The structure determination model 361 is generated by training the learning model so that the output obtained by feeding it this input data is the parent ID of the rectangle of interest.

FIG. 5 is a diagram showing a configuration example of the conversion table 360 according to the embodiment. The conversion table 360 includes, for example, the items semantic tag ID, post-conversion, and pre-conversion. The semantic tag ID holds identification information that uniquely identifies a semantic tag. The post-conversion item holds the character string after conversion, and the pre-conversion item holds the character strings before conversion. In this example, for the semantic tag ID E0001, the post-conversion string is 「氏名」 (full name) and the pre-conversion strings are 「お名前」, 「名前」, and 「おなまえ」 (variant notations of "name").

FIG. 6 is a diagram illustrating the processing performed by the area dividing device 10 according to the embodiment. FIG. 6 shows an example of the character areas and rectangle areas determined by the area dividing device 10. From the scanned image shown in FIG. 6, the area dividing device 10 extracts the character areas M1 to M6 and the rectangle areas K1 to K5. The area M1 contains the characters 「申込書」 (application form). The area M2 contains the characters 「ご住所」 (address). The area M3 contains the characters 「都道府県」 (prefecture). The area M4 contains the characters 「お名前」 (name). The area M5 contains the characters 「記入日」 (entry date). The area M6 contains the characters 「年月日」 (year, month, day). As this example shows, the area dividing device 10 may extract each character area as a rectangular (quadrilateral) area.

The area K1 is an area showing a rectangle that surrounds the area M2. The area K2 is an area showing a rectangle that surrounds the area M3, with the area M3 placed at the right end inside the frame. The area K3 is an area showing a rectangle placed to the right of the area K2. The area K4 is an area showing a rectangle that surrounds the area M4. The area K5 is an area showing a rectangle placed to the right of the area K4.

FIG. 7 is a diagram illustrating the processing performed by the structure recognition device 30 according to the embodiment. FIG. 7 shows an example in which the structured data determined by the structure recognition device 30 is visualized as a tree structure. In FIG. 7, the area M1# denotes the character area M1 after the characters shown in it have been converted by the preprocessing unit 33. Similarly, the areas M2# to M6# denote the character areas M2 to M6 after the characters shown in them have been converted by the preprocessing unit 33.

The structure recognition device 30 determines the hierarchical structure of the rectangles based on, for example, the semantic tag information and the rectangular area data for the scanned image shown in FIG. 6. The structure recognition device 30 determines that the parent of the area K1 (the area on which it depends) is the area K2. It determines that the parent of the area K4 is the area K2, and that the parent of the area K5 is the area K3.
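The parent relationships listed above are sufficient to reconstruct a tree such as the one visualized in FIG. 7. The following Python sketch shows one way to do so. The function names are hypothetical, and treating the areas K2 and K3 as roots (parent `None`) is an assumption, since the description does not state their parents.

```python
from collections import defaultdict


def build_children(parent_of):
    """Invert a child -> parent mapping into parent -> list of children."""
    children = defaultdict(list)
    for node, parent in parent_of.items():
        children[parent].append(node)
    return children


def render(children, node=None, depth=0):
    """Return the tree as indented text lines, starting from the roots."""
    lines = []
    for child in sorted(children.get(node, [])):
        lines.append("  " * depth + child)
        lines.extend(render(children, child, depth + 1))
    return lines


# Parent IDs from the example; roots are assumed for illustration.
PARENT_OF = {"K1": "K2", "K2": None, "K3": None, "K4": "K2", "K5": "K3"}
TREE_LINES = render(build_children(PARENT_OF))
```

Storing only a parent ID per rectangle keeps the model output small, and the full hierarchy can still be recovered whenever a layout converter needs it.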

FIG. 8 is a diagram showing an example of layout conversion according to the embodiment. Consider converting the portrait form shown in FIG. 6 into a landscape layout, as shown in FIG. 8. In this case, the layout is changed while preserving the hierarchical structure of the rectangles determined by the structure recognition device 30. That is, the layout is converted so that the parent of the area K1 remains the area K2 and the parent of the area K4 remains the area K2. This makes it possible to convert the layout so that exactly the items required by the original form can be filled in, with the same feel as the original form. As this example shows, areas K6 and K7 may be added as needed. The area K6 is a rectangle area containing the characters 「日付」 (date). The area K7 is a rectangle area containing the characters 「年月日」 (year, month, day). For example, if the parent of the area K7 is determined to be the area K6, that result can be used to perform an appropriate conversion such as the one shown in FIG. 8.

FIG. 9 is a sequence diagram showing the flow of processing performed by the structure recognition system 1 according to the embodiment. The area dividing device 10 acquires scanned image data (step S10) and determines the character and rectangle areas in the scanned image, thereby generating area data for the characters and for the rectangles (step S11). The structure recognition device 30 generates semantic tag information using the character area data and the character recognition data produced by the OCR device 20 (step S12).

The structure recognition device 30 selects a rectangle of interest from the scanned image (step S13). The structure recognition device 30 acquires the neighborhood semantic tag group for the rectangle of interest (step S14) and acquires the neighborhood rectangle group (step S15). The structure recognition device 30 generates the input data by sorting the rectangle of interest, the neighborhood semantic tag group, and the neighborhood rectangle group in raster order of their representative coordinates (step S16). The structure recognition device 30 determines the parent ID of the rectangle of interest based on the output obtained by inputting the input data to the structure determination model 361 (step S17). The structure recognition device 30 then checks whether parent IDs have been determined for all the rectangles in the scanned image (step S18). If any rectangle remains without a determined parent ID, the processing returns to step S13 and the parent ID determination is repeated.
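The loop formed by steps S13 to S18 may be sketched in Python as follows. The sketch is an illustration only: `build_input` stands in for steps S14 to S16 and `predict_parent` stands in for the structure determination model 361, and both are passed in as hypothetical callables because the description does not define a programming interface for them.

```python
from typing import Callable, Dict, List, Optional

Item = dict  # e.g. {"id": "K1", "box": (x0, y0, x1, y1)}


def determine_all_parents(
    rectangles: List[Item],
    tags: List[Item],
    build_input: Callable[[Item, List[Item], List[Item]], List[Item]],
    predict_parent: Callable[[List[Item], Item], Optional[str]],
) -> Dict[str, Optional[str]]:
    """Determine a parent ID for every rectangle in one scanned image."""
    parents: Dict[str, Optional[str]] = {}
    # Step S13: select each rectangle in turn as the rectangle of interest.
    # The loop ending corresponds to the check in step S18.
    for target in rectangles:
        # Steps S14 to S16: gather neighbors and order them.
        sequence = build_input(target, rectangles, tags)
        # Step S17: ask the model for the parent of the rectangle of interest.
        parents[target["id"]] = predict_parent(sequence, target)
    return parents
```

Because each rectangle is processed independently of the others, the iterations could also be run in parallel or batched, although the description presents them as a sequential loop.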

As described above, the structure recognition system 1 of the embodiment includes the image data acquisition unit 11, the area determination unit 13, and the structure determination unit 34. The image data acquisition unit 11 acquires image data of a scanned image (an example of a "target image") that includes characters and rectangles. The area determination unit 13 determines the character areas and the rectangle areas in the scanned image. The structure determination unit 34 determines, based on the area data, the hierarchical structure of the rectangles included in the target image. The structure recognition system 1 of the embodiment can thus determine the hierarchical structure of the rectangles, and can therefore obtain the information needed to change the layout.

The structure recognition system 1 of the embodiment further includes the preprocessing unit 33. Using the character area data, the preprocessing unit 33 generates semantic tag information that includes characters set according to meaning (an example of a "specific second character") corresponding to the character recognition result shown in the area (an example of a "first character"). The structure recognition system 1 of the embodiment can thus tag the characters indicated by the character area data according to their meaning, which makes the determination processing by the structure determination unit 34 simpler than it would be without tagging.

In the structure recognition system 1 of the embodiment, the structure determination unit 34 determines the hierarchical structure of the rectangles using the structure determination model 361 (an example of a "trained model"). The structure determination model 361 is a model trained to output the structured data of the rectangles included in an input image, using a training data set that associates the semantic tag information and the rectangular area data of a training image including characters and rectangles with the structured data of the rectangles included in that training image. The structure recognition system 1 of the embodiment can thus recognize the hierarchical structure of the rectangles by the simple method of inputting data to a trained model.

In the structure recognition system 1 of the embodiment, the structure determination unit 34 selects a rectangle of interest in the scanned image, acquires the neighborhood semantic tag group for the rectangle of interest, acquires the neighborhood rectangle group for the rectangle of interest, and determines the order of the input data to be input to the structure determination model 361 by sorting the rectangle of interest, the neighborhood semantic tag group, and the neighborhood rectangle group according to their positions. The order of the input data can thus be made to carry meaning (information). With a trained model based on an RNN-type learning model, the parent ID can then be predicted with the order of the input data taken into account, that is, from the relationship with nearby characters and rectangles, so an improvement in prediction accuracy can be expected.

The structure recognition device 30 of the embodiment includes the area data acquisition unit 31 and the structure determination unit 34. The area data acquisition unit 31 acquires area data relating to the character areas and the rectangle areas in the scanned image. The structure determination unit 34 determines, based on the area data, the hierarchical structure of the rectangles included in the scanned image. This provides the same effects as those described above.

All or part of the structure recognition system 1 and the structure recognition device 30 in the embodiment described above may be implemented by a computer. In that case, a program for implementing the functions may be recorded on a computer-readable recording medium, and the functions may be implemented by having a computer system read and execute the program recorded on that medium. The term "computer system" as used here includes an OS and hardware such as peripheral devices. The term "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. The "computer-readable recording medium" may further include media that hold the program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and media that hold the program for a certain period, such as volatile memory inside a computer system serving as the server or client in that case. The program may implement only part of the functions described above, may implement those functions in combination with a program already recorded in the computer system, or may be implemented using a programmable logic device such as an FPGA.

The embodiment of the present invention has been described in detail above with reference to the drawings. The specific configuration is not limited to this embodiment, and designs and the like that do not depart from the gist of the invention are also included.

1 … Structure recognition system
10 … Area dividing device
11 … Image data acquisition unit
12 … Modulated image generation unit
13 … Area determination unit
14 … Area data output unit
15 … Storage unit
150 … Area determination model
20 … OCR device
30 … Structure recognition device
31 … Area data acquisition unit
32 … Character recognition data acquisition unit
33 … Preprocessing unit
34 … Structure determination unit
35 … Structure data output unit
36 … Storage unit
360 … Conversion table
361 … Structure determination model

Claims

1. A structure recognition system comprising:
an image data acquisition unit that acquires image data of a target image including characters and rectangles;
an area determination unit that determines the respective areas of the characters and the rectangles in the target image; and
a structure determination unit that determines a hierarchical structure of the rectangles included in the target image based on area data relating to the areas determined by the area determination unit.

2. The structure recognition system according to claim 1, further comprising a preprocessing unit that uses the area data indicating a character area to generate semantic tag information including a specific second character corresponding to a first character shown in that area,
wherein the structure determination unit determines the hierarchical structure based on the semantic tag information and the area data indicating the rectangle areas.

3. The structure recognition system according to claim 2,
wherein the structure determination unit determines the hierarchical structure using a trained model, and
the trained model is a model trained, using a training data set that associates the semantic tag information and the area data indicating rectangle areas in a training image including characters and rectangles with the hierarchical structure of the rectangles included in the training image, to output the hierarchical structure of the rectangles included in an input image.

4. The structure recognition system according to claim 3,
wherein the structure determination unit selects, in the target image, a rectangle of interest whose hierarchical structure is to be determined, acquires a neighborhood semantic tag group that is the semantic tag information located within a predetermined first range from the position of the selected rectangle of interest, acquires a neighborhood rectangle group that is the area data of the rectangles located within a predetermined second range from the position of the selected rectangle of interest, and determines the order of the input data to be input to the trained model by sorting the rectangle of interest, the neighborhood semantic tag group, and the neighborhood rectangle group according to their positions.

5. A structure recognition device comprising:
an area data acquisition unit that acquires area data relating to the respective areas of characters and rectangles in a target image; and
a structure determination unit that determines a hierarchical structure of the rectangles included in the target image based on the area data.

6. A structure recognition method in which:
an area data acquisition unit acquires area data relating to the respective areas of characters and rectangles in a target image; and
a structure determination unit determines a hierarchical structure of the rectangles included in the target image based on the area data.

7. A program for causing a computer to operate as the structure recognition device according to claim 5, the program causing the computer to function as each unit included in the structure recognition device.