JP6612680B2

JP6612680B2 - Logical relationship recognition apparatus, logical relationship recognition method, and logical relationship recognition program

Info

Publication number: JP6612680B2
Application number: JP2016111296A
Authority: JP
Inventors: 郁子高木; 光一山田; 長年名和; 勉丸山
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2016-06-02
Filing date: 2016-06-02
Publication date: 2019-11-27
Anticipated expiration: 2036-06-02
Also published as: JP2017219883A

Description

The present invention relates to a logical relationship recognition apparatus, a logical relationship recognition method, and a logical relationship recognition program.

Operation support systems (OpS) have been introduced to handle business processes in many companies. Work that an OpS cannot support, however, is handled by each organization in its own way, and forms are used to circulate information in that work. Because a form is semi-structured data, the meaning of each value (its logical relationship) can be understood by a person but is difficult for a machine to understand, so making use of the data on a form imposes a heavy manual workload. To address this, techniques are known that automatically recognize the logical relationships between the item names of a form, and between item names and item values, based on the arrangement of ruled line frames (see, for example, Patent Document 1 and Non-Patent Documents 1 to 3).

Patent Document 1: JP 2016-004417 A

Non-Patent Document 1: Takagi et al., "Examination of Extraction Techniques for Cross-sectional Data Manipulation Techniques for Electronic Forms", IEICE Technical Report, vol. 114, no. 150, pp. 1-6 (2014)
Non-Patent Document 2: Takagi et al., "Investigation of Visual Representation of Forms for Linking Electronic Form Data", Proceedings of the IEICE Communications Society Conference (2015)
Non-Patent Document 3: Takagi et al., "Examination of Data Structure Conversion Method for Electronic Forms Using Visual Expression", IEICE Technical Report, vol. 115, no. 409, pp. 25-30 (2016)

However, the conventional techniques have a problem: when a form contains a vertical enumeration, a horizontal enumeration, a combined type of vertical and horizontal enumeration, or an enumerated type nested structure, they may fail to accurately recognize the logical relationships between the item names of the form and between the item names and the item values.

For example, a vertical enumeration may contain, in the vertical direction, multiple ruled line frames of the same width that correspond to item names. Each item name also has an item value, and the ruled line frame for the item value has the same width as the frame for the item name. In such a case, the conventional techniques cannot identify which ruled line frames are item names, so it is difficult to accurately recognize the logical relationships between item names and between item names and item values.

The logical relationship recognition apparatus of the present invention includes: an extraction unit that, based on a graph in which information on each region representing an item name or an item value of a form is represented as a node and adjacency between nodes is represented as an edge, extracts nodes that satisfy a preset condition as item name nodes, that is, nodes of regions representing item names; a first deletion unit that, when a plurality of nodes are adjacent to one node in a predetermined direction, deletes the edges representing the adjacency between the one node and the plurality of nodes; a second deletion unit that, for each item name node to which an item value node (a node of a region representing an item value) is adjacent in a predetermined direction, deletes the edge representing the adjacency between that item name node and the item value node; a first acquisition unit that acquires the logical relationships between the item name nodes and the item value nodes based on the graph from which edges have been deleted by the first deletion unit and the second deletion unit; a second acquisition unit that acquires the inclusion relationships between the item name nodes based on the graph from which edges have been deleted by the first deletion unit; and a synthesis unit that creates tree-structured data by combining the logical relationships acquired by the first acquisition unit with the inclusion relationships acquired by the second acquisition unit.

The logical relationship recognition method of the present invention is a method executed by a logical relationship recognition apparatus and includes: an extraction step of extracting, based on a graph in which information on each region representing an item name or an item value of a form is represented as a node and adjacency between nodes is represented as an edge, nodes that satisfy a preset condition as item name nodes, that is, nodes of regions representing item names; a first deletion step of deleting, when a plurality of nodes are adjacent to one node in a predetermined direction, the edges representing the adjacency between the one node and the plurality of nodes; a second acquisition step of acquiring the inclusion relationships between the item name nodes based on the graph from which edges have been deleted in the first deletion step; a second deletion step of deleting, for each item name node to which an item value node (a node of a region representing an item value) is adjacent in a predetermined direction, the edge representing the adjacency between that item name node and the item value node; a first acquisition step of acquiring the logical relationships between the item name nodes and the item value nodes based on the graph from which edges have been deleted in the first deletion step and the second deletion step; and a synthesis step of creating tree-structured data by combining the logical relationships acquired in the first acquisition step with the inclusion relationships acquired in the second acquisition step.

The logical relationship recognition program of the present invention causes a computer to execute: an extraction step of extracting, based on a graph in which information on each region representing an item name or an item value of a form is represented as a node and adjacency between nodes is represented as an edge, nodes that satisfy a preset condition as item name nodes, that is, nodes of regions representing item names; a first deletion step of deleting, when a plurality of nodes are adjacent to one node in a predetermined direction, the edges representing the adjacency between the one node and the plurality of nodes; a second acquisition step of acquiring the inclusion relationships between the item name nodes based on the graph from which edges have been deleted in the first deletion step; a second deletion step of deleting, for each item name node to which an item value node (a node of a region representing an item value) is adjacent in a predetermined direction, the edge representing the adjacency between that item name node and the item value node; a first acquisition step of acquiring the logical relationships between the item name nodes and the item value nodes based on the graph from which edges have been deleted in the first deletion step and the second deletion step; and a synthesis step of creating tree-structured data by combining the logical relationships acquired in the first acquisition step with the inclusion relationships acquired in the second acquisition step.

According to the present invention, even when a form contains a vertical enumeration, a horizontal enumeration, a combined type of vertical and horizontal enumeration, or an enumerated type nested structure, the logical relationships between the item names of the form and between the item names and the item values can be recognized accurately.

FIG. 1 is a diagram for explaining the outline of the logical relationship recognition process.
FIG. 2 is a diagram illustrating an example of vertical enumeration.
FIG. 3 is a diagram illustrating an example of horizontal enumeration.
FIG. 4 is a diagram illustrating an example of a combined type of vertical enumeration and horizontal enumeration.
FIG. 5 is a diagram illustrating an example of an enumerated type nested structure.
FIG. 6 is a diagram illustrating an example of an enumerated type nested structure.
FIG. 7 is a diagram illustrating an example of an enumerated type nested structure.
FIG. 8 is a diagram illustrating an example of an enumerated type nested structure.
FIG. 9 is a diagram illustrating an example of the configuration of the logical relationship recognition apparatus according to the first embodiment.
FIG. 10 is an example of an enumeration list.
FIG. 11 is an example of an enumeration list.
FIG. 12 is a diagram for explaining the inclusion graph.
FIG. 13 is a diagram for explaining the inclusion graph.
FIG. 14 is an example of tree-structured data.
FIG. 15 is an example of tree-structured data.
FIG. 16 is a flowchart showing the flow of processing of the extraction unit.
FIG. 17 is a flowchart showing the flow of processing of the analysis unit.
FIG. 18 is a flowchart showing the flow of processing of the first deletion unit.
FIG. 19 is a flowchart showing the flow of processing of the second deletion unit.
FIG. 20 is a flowchart showing the flow of processing of the classification unit.
FIG. 21 is a flowchart showing the flow of processing of the vertical enumeration acquisition unit.
FIG. 22 is a flowchart showing the flow of processing of the horizontal enumeration acquisition unit.
FIG. 23 is a flowchart showing the flow of processing of the inclusion relationship acquisition unit.
FIG. 24 is a flowchart showing the flow of processing for acquiring the right-side inclusion relationship.
FIG. 25 is a flowchart showing the flow of processing for acquiring the lower-side inclusion relationship.
FIG. 26 is a flowchart showing the flow of processing of the inclusion graph generation unit.
FIG. 27 is a flowchart showing the flow of processing of the inter-item-name synthesis unit.
FIG. 28 is a flowchart showing the flow of processing of the enumeration synthesis unit.
FIG. 29 is a flowchart showing the flow of processing of the addition unit.
FIG. 30 is a diagram for explaining other embodiments.
FIG. 31 is a diagram illustrating an example of a computer that realizes the logical relationship recognition apparatus by executing a program.

Embodiments of the logical relationship recognition apparatus, the logical relationship recognition method, and the logical relationship recognition program according to the present application are described in detail below with reference to the drawings. The present invention is not limited by these embodiments.

First, an outline of the logical relationship recognition process performed by a logical relationship recognition system that includes the logical relationship recognition apparatus is described with reference to FIG. 1. FIG. 1 is a diagram for explaining the outline of the logical relationship recognition process. As shown in FIG. 1, the logical relationship recognition system first reads a form in enumeration format from a PC or the like (step S1). The data read at this point is not limited to forms; any semi-structured data will do, such as a Web GUI, a system GUI, or an enumeration structure in an image. Next, the logical relationship recognition system acquires information on the ruled line frames from the form it has read (step S2). The logical relationship recognition system may also use OCR (Optical Character Recognition) to acquire the contents of a paper form from an image read by a scanner or the like (step S3).

The logical relationship recognition system then generates a style graph based on ruled line frame information and item name definition information (step S4). The ruled line frame information includes visual information such as the coordinates of each ruled line frame, the character string inside the frame, the fill color of the frame, and the type, thickness, and color of its ruled lines. The item name definition information contains the conditions under which a ruled line frame is judged to be an item name.

For example, item name definition information that treats yellow ruled line frames as item names is written as "if {node:{color:#FFFF00}} then item". Ruled line frame definition information that treats frames whose character string is not empty as item names is written as "if {node:{!string:null}} then item". Ruled line frame definition information that treats frames whose color is not white and whose character string is not empty as item names is written as "if {node:{!color:white},{!string:null}} then item". Ruled line frame definition information that treats the frame whose character string is "y1" as an item name is written as "if {node:{string:"y1"}} then item".
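The rule strings above can be read mechanically. The following is a minimal sketch, assuming that every condition has the form {attribute:value} with an optional leading "!" for negation, that "null" means an empty value, and that a ruled line frame is held as a dictionary with "color" and "string" keys. The function names parse_rule and is_item_name, and the dictionary layout, are illustrative assumptions and do not come from the patent.

```python
import re

# One condition such as {color:#FFFF00} or {!string:null}.
# The outer {node:...} wrapper is skipped because it contains nested braces.
CONDITION = re.compile(r"\{(!?)(\w+):([^{}]*)\}")


def parse_rule(rule):
    """Return the conditions of a rule as (negated, attribute, value) tuples."""
    conditions = []
    for bang, attribute, raw in CONDITION.findall(rule):
        value = raw.strip().strip('"\u201c\u201d')  # drop straight or curly quotes
        conditions.append((bang == "!", attribute, None if value == "null" else value))
    return conditions


def is_item_name(frame, conditions):
    """True when the frame (a dict, hypothetical layout) satisfies every condition."""
    for negated, attribute, value in conditions:
        actual = frame.get(attribute) or None  # an empty string counts as null
        if (actual == value) == negated:
            return False
    return True


rule = parse_rule('if {node:{!color:white},{!string:null}} then item')
print(is_item_name({"color": "#FFFF00", "string": "y1"}, rule))
print(is_item_name({"color": "white", "string": "y1"}, rule))
```

Keeping the parsed rule as plain tuples separates the text format from the evaluation, so a different rule syntax would only require changing parse_rule.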

A style graph is a graph that, for each of the styles contained in a form, represents ruled line frames and the like as nodes and the adjacency between nodes as edges. In the subsequent processing, the logical relationship recognition system handles the style of the form as graph data. A style can contain enumerations. The following description covers recognition of logical relationships based on the style graph, but a form graph that represents the entire form with nodes and edges may be used instead.
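As an illustration of how ruled line frames could become nodes and edges, the sketch below assumes each frame is an axis-aligned rectangle with integer coordinates and treats two frames as adjacent when they share a border segment of non-zero length. The Frame class, the "right" and "down" edge labels, and the coordinates are assumptions made for this example; the patent does not specify how adjacency is computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """A ruled line frame as a rectangle (hypothetical representation)."""
    name: str
    x0: int
    y0: int
    x1: int
    y1: int


def overlaps(a0, a1, b0, b1):
    """True when the open intervals (a0, a1) and (b0, b1) intersect."""
    return max(a0, b0) < min(a1, b1)


def build_style_graph(frames):
    """Return edges (src, dst, direction); dst lies to the right of or below src."""
    edges = set()
    for a in frames:
        for b in frames:
            if a is b:
                continue
            if a.x1 == b.x0 and overlaps(a.y0, a.y1, b.y0, b.y1):
                edges.add((a.name, b.name, "right"))
            if a.y1 == b.y0 and overlaps(a.x0, a.x1, b.x0, b.x1):
                edges.add((a.name, b.name, "down"))
    return edges


# A wide header frame above two narrower frames (made-up coordinates).
frames = [
    Frame("a", 0, 0, 2, 1),
    Frame("b", 0, 1, 1, 2),
    Frame("c", 1, 1, 2, 2),
]
print(sorted(build_style_graph(frames)))
```

The pairwise comparison is quadratic in the number of frames, which is acceptable for a sketch; a real form with many frames would likely need an index on coordinates.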

Next, the logical relationship recognition system recognizes the logical relationships from the enumeration structure (step S5), converts the result into a data structure in a predetermined format (XML, YAML, JSON, or the like) (step S6), and stores it in a DB. The DB may store the link path of the data structure, or the data may be stored according to a predefined schema.
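Step S6 can be pictured with a small sketch that serializes a recognition result as JSON. It assumes the result is available as pairs of an item name path and an item value; that intermediate form, the function name to_tree, and the sample item names and values are all hypothetical and are not taken from the patent.

```python
import json


def to_tree(records):
    """Build a nested dict from (item name path, item value) pairs."""
    tree = {}
    for path, value in records:
        node = tree
        for name in path[:-1]:
            node = node.setdefault(name, {})  # descend, creating levels as needed
        node[path[-1]] = value
    return tree


# Made-up recognition result: nested item names leading to item values.
records = [
    (("Applicant", "Name"), "Takagi"),
    (("Applicant", "Department"), "R&D"),
    (("Date",), "2016-06-02"),
]
print(json.dumps(to_tree(records), ensure_ascii=False, indent=2))
```

Passing ensure_ascii=False keeps Japanese item names readable in the output instead of escaping them.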

Next, the types of enumeration handled by the logical relationship recognition process of the logical relationship recognition system are described with reference to FIGS. 2 to 8. Each reference sign in FIGS. 2 to 8 denotes a node. In the following description, the node denoted by a sign is sometimes called simply a node and sometimes called an item name node or an item value node.

FIG. 2 is a diagram illustrating an example of vertical enumeration. As shown in FIG. 2, a vertical enumeration has, in the vertical direction, multiple logical relationships, where a logical relationship is a relationship between item names or between an item name and an item value. For example, nodes a1 and a2, and nodes a4 and a5, each have the logical relationship of an item name and its item value. FIG. 3 is a diagram illustrating an example of horizontal enumeration. As shown in FIG. 3, a horizontal enumeration has multiple logical relationships in the horizontal direction. For example, nodes b1 and b2 have the logical relationship of an item name and its item value. In FIGS. 2 to 8, shaded portions represent item names and unshaded portions represent item values.

FIG. 4 is a diagram illustrating an example of a combined type of vertical enumeration and horizontal enumeration. As shown in FIG. 4, in the combined type, vertical enumerations and horizontal enumerations exist within one style. For example, nodes c1 and c2, and nodes c3 and c4, form horizontal enumerations. Nodes c5 and c7, and nodes c6 and c8, form vertical enumerations.

FIGS. 5 to 8 are diagrams illustrating examples of an enumerated type nested structure. As shown in FIG. 5, an enumerated type nested structure has multiple enumerations nested in the vertical or horizontal direction. For example, nodes d1 and d2 have a logical relationship between item names. In addition, nodes d2 and d3 have the logical relationship of an item name and its item value. The logical relationship between nodes d2 and d3 is thus nested inside the logical relationship between nodes d1 and d2.

As shown in FIG. 6, for example, nodes e1 and e2 have a logical relationship between item names. In addition, nodes e2 and e3 have the logical relationship of an item name and its item value. The logical relationship between nodes e2 and e3 is thus nested inside the logical relationship between nodes e1 and e2.

As shown in FIG. 7, for example, nodes f1 and f2 have a logical relationship between item names, and so do nodes f2 and f3. In addition, nodes f3 and f4 have the logical relationship of an item name and its item value. The logical relationship between nodes f3 and f4 is thus nested inside the logical relationship between nodes f2 and f3, which in turn is nested inside the logical relationship between nodes f1 and f2.

As shown in FIG. 8, for example, nodes g2 and g3 have the logical relationship of an item name and its item value. Nodes g6 and g7 have a logical relationship between item names. Nodes g7 and g8 have the logical relationship of an item name and its item value. The logical relationship between nodes g7 and g8 is thus nested inside the logical relationship between nodes g6 and g7. The logical relationship between nodes g2 and g3, however, has no nesting relationship with either the logical relationship between nodes g7 and g8 or the logical relationship between nodes g6 and g7.

[Configuration of First Embodiment]
Next, the configuration of the logical relationship recognition apparatus according to the first embodiment is described with reference to FIG. 9. FIG. 9 is a diagram illustrating an example of the configuration of the logical relationship recognition apparatus according to the first embodiment. As shown in FIG. 9, the logical relationship recognition apparatus 10 includes a control unit 20 and a storage unit 30.

The control unit 20 controls the entire logical relationship recognition apparatus 10. The control unit 20 is, for example, an electronic circuit such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit), or an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). The control unit 20 has an internal memory for storing programs that define various processing procedures and control data, and executes each process using the internal memory. The control unit 20 also functions as various processing units when various programs run. For example, the control unit 20 includes an extraction unit 201, an analysis unit 202, a first deletion unit 203, a second deletion unit 204, a classification unit 205, a vertical enumeration acquisition unit 206, a horizontal enumeration acquisition unit 207, an inclusion relationship acquisition unit 208, an inclusion graph generation unit 209, an inter-item-name synthesis unit 210, an enumeration synthesis unit 211, and an addition unit 212.

The storage unit 30 is a storage device such as an HDD (Hard Disk Drive), an SSD (Solid State Drive), or an optical disc. The storage unit 30 may instead be a rewritable semiconductor memory such as a RAM (Random Access Memory), a flash memory, or an NVSRAM (Non Volatile Static Random Access Memory). The storage unit 30 stores the OS (Operating System) and the various programs executed by the logical relationship recognition apparatus 10, as well as various information used when the programs run. The storage unit 30 also stores, for example, an enumeration list 301 and an inclusion graph 302.

The logical relationship recognition process performed by the logical relationship recognition apparatus 10 is described here together with the details of each unit of the apparatus. The extraction unit 201 works on the style graph, that is, a graph in which information on each region representing an item name or an item value of a form is represented as a node and the adjacency between nodes is represented as an edge. Based on this graph, the extraction unit 201 extracts the nodes that satisfy a preset condition as item name nodes, that is, nodes of regions representing item names.

The rules used to extract item name nodes are created by the analysis unit 202. The analysis unit 202 reads unanalyzed visual expression rules prepared in advance as a text file or the like, analyzes the conditions and actions in the rules it has read, and creates visual expression rules that a computer can interpret. In the following description, the term "visual expression rule" on its own refers to a visual expression rule after analysis.

When a plurality of nodes are adjacent to one node in a predetermined direction, the first deletion unit 203 deletes the edges representing the adjacency between the one node and the plurality of nodes. The first deletion unit 203 may be configured to delete these edges when the plurality of nodes are adjacent to the left side or the upper side of the one node. For example, in the combined type of vertical and horizontal enumeration in FIG. 4, no node has a plurality of nodes adjacent to its left side or upper side, so the first deletion unit 203 deletes no edges. In the enumerated type nested structure of FIG. 8, on the other hand, a plurality of nodes are adjacent to the upper side of node g6, so the first deletion unit 203 deletes the edges representing the adjacency between node g6 and nodes g1, g2, g3, g4, and g5.
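The behavior of the first deletion unit 203 can be sketched as follows. The sketch assumes edges are stored as (src, dst, direction) tuples in which dst lies to the right of src ("right") or below src ("down"), so a node with several left or upper neighbors shows up as a destination with several incoming edges of the same direction. This edge representation and the function name first_deletion are assumptions for illustration, and only the FIG. 8 edges named in the text are included; the edge from g6 to g7 is assumed to point right.

```python
from collections import defaultdict


def first_deletion(edges):
    """Drop every edge into a node that has several neighbors on the same side."""
    incoming = defaultdict(list)
    for edge in edges:
        _, dst, direction = edge
        incoming[(dst, direction)].append(edge)
    doomed = {
        edge
        for group in incoming.values()
        if len(group) > 1  # more than one left (or upper) neighbor
        for edge in group
    }
    return set(edges) - doomed


# Part of FIG. 8 as described in the text: g1..g5 sit above g6.
edges = {(f"g{i}", "g6", "down") for i in range(1, 6)}
edges.add(("g6", "g7", "right"))  # assumed geometry
print(sorted(first_deletion(edges)))
```

Grouping by destination and direction means a node with one upper neighbor and one left neighbor keeps both edges, which matches the "left side or upper side" wording only under the assumption that each side is checked separately.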

The second deletion unit 204 handles item name nodes to which an item value node (a node of a region representing an item value) is adjacent in a predetermined direction, and deletes the edge representing the adjacency between such an item name node and the item value node. The second deletion unit 204 may be configured to delete this edge when the item value node is adjacent to the left side or the upper side of the item name node. For example, in the combined type of vertical and horizontal enumeration in FIG. 4, item value node c4 is adjacent to the upper side of item name node c6, so the second deletion unit 204 deletes the edge representing the adjacency between node c6 and node c4. In the enumerated type nested structure of FIG. 6, item value nodes e3 and e7 are adjacent to the left side of item name node e4, so the second deletion unit 204 deletes the edges representing the adjacency between node e4 and nodes e3 and e7.
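A minimal sketch of the second deletion follows, using the FIG. 4 relations named in the text. It assumes edges are (src, dst, direction) tuples in which src is the left or upper neighbor of dst, and that every node outside the item name set is an item value node. The function name second_deletion and this representation are illustrative assumptions; edges of FIG. 4 that the text does not mention are omitted.

```python
def second_deletion(edges, item_names):
    """Drop edges whose left/upper end is an item value and whose other end is an item name."""
    return {
        (src, dst, direction)
        for src, dst, direction in edges
        if not (dst in item_names and src not in item_names)
    }


# Relations of FIG. 4 mentioned in the text.
item_names = {"c1", "c3", "c5", "c6"}
edges = {
    ("c1", "c2", "right"),
    ("c3", "c4", "right"),
    ("c4", "c6", "down"),  # item value c4 above item name c6
    ("c5", "c7", "down"),
    ("c6", "c8", "down"),
}
print(sorted(second_deletion(edges, item_names)))
```

Removing the edge from c4 to c6 keeps a later traversal from treating c6 and c8 as belonging under the value c4.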

The vertical enumeration acquisition unit 206 and the horizontal enumeration acquisition unit 207 then acquire the logical relationships between item name nodes and item value nodes based on the graph from which edges have been deleted by the first deletion unit 203 and the second deletion unit 204. Specifically, the vertical enumeration acquisition unit 206 and the horizontal enumeration acquisition unit 207 acquire the logical relationships as a list such as the one shown in FIG. 10 or FIG. 11, and store the acquired list in the storage unit 30 as the enumeration list 301. FIGS. 10 and 11 are examples of enumeration lists.

FIG. 10 is an enumeration list representing the logical relationships of the enumeration in FIG. 4. The first row of the list in FIG. 10 indicates that the horizontal child of item name node c1 is node c2. The third row of the list in FIG. 10 indicates that the vertical child of item name node c5 is node c7.

FIG. 11 is an enumeration list representing the logical relationships of the enumeration in FIG. 7. The first row of the list in FIG. 11 indicates that the horizontal child of item name node f3 is node f4. The second row of the list in FIG. 11 indicates that the horizontal child of item name node f9 is node f10.

Further, for example, in the nested enumeration structure of FIG. 8, the first deletion unit 203 has deleted the edges representing the adjacency between node g6 and nodes g1, g2, g3, g4, and g5. Consequently, the vertical enumeration acquisition unit 206 and the horizontal enumeration acquisition unit 207 never acquire a logical relationship in which the vertical child of item name node g2 is g8.

Likewise, in the nested enumeration structure of FIG. 8, the second deletion unit 204 has deleted the edges representing the adjacency between node g14 and nodes g13 and g17. Consequently, the vertical enumeration acquisition unit 206 and the horizontal enumeration acquisition unit 207 never acquire a logical relationship in which the horizontal child of item name node g12 is g15.

The inclusion relationship acquisition unit 208 acquires the inclusion relationships between item name nodes on the basis of the graph from which edges have been deleted by the first deletion unit 203. The inclusion graph generation unit 209 generates a graph such as that shown in FIG. 12 from the inclusion relationships acquired by the inclusion relationship acquisition unit 208, and stores the generated graph in the storage unit 30 as the inclusion graph 302. FIG. 12 is a diagram for explaining the inclusion graph. The broken lines in FIG. 12 represent vertical inclusion relationships, and the solid lines represent horizontal inclusion relationships. Hereinafter, when a target node includes one or more other nodes in the vertical (or horizontal) direction, the target node is referred to as a vertical (or horizontal) inclusion node. Among nodes that are in an inclusion relationship, a node that is not included by any other node is referred to as the main node, and the other, included nodes are referred to as subordinate nodes.

The inclusion graph of FIG. 12 corresponds to the nested enumeration structure of FIG. 7. As shown in FIGS. 7 and 12, node f2 horizontally includes nodes f3, f5, f7, f9, and f11. Node f13 horizontally includes nodes f14, f16, f18, f20, f22, and f24. Node f1 vertically includes nodes f2 and f13. Accordingly, f1 is the main node, and f2, f3, f5, f7, f9, f11, f13, f14, f16, f18, f20, f22, and f24 are subordinate nodes.

Specifically, the inclusion relationship acquisition unit 208 determines that a first item name node horizontally includes a first node group adjacent to its right side when all of the following hold: at least one node in the first node group is an item name node; the height of every node in the first node group is less than or equal to the height of the first item name node; and a vertex of the upper-left node of the first node group coincides with a vertex of the first item name node. Likewise, the inclusion relationship acquisition unit 208 determines that a second item name node vertically includes a second node group adjacent to its lower side when all of the following hold: at least one node in the second node group is an item name node; the width of every node in the second node group is less than or equal to the width of the second item name node; and a vertex of the upper-left node of the second node group coincides with a vertex of the second item name node.

For example, as shown in FIG. 7, at least node f3 among the node group adjacent to the right side of item name node f2 is an item name node. The height of every node in that node group is less than or equal to the height of item name node f2. Furthermore, a vertex of the upper-left node of that node group, namely node f3, coincides with a vertex of item name node f2, and a vertex of the lower-left node of that node group, namely node f4, coincides with a vertex of item name node f2. The inclusion relationship acquisition unit 208 therefore determines that item name node f2 includes the node group adjacent to its right side.
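The three conditions for horizontal inclusion can be sketched as a small geometric check. This is only an illustrative sketch, not the patented implementation: the `Cell` type, the function name `includes_horizontally`, the coordinate convention (y grows downward), and the reading of "vertices coincide" as "the top-left corner of the topmost right-adjacent cell lies on the parent's top-right corner" are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    x: float            # left edge
    y: float            # top edge (assumed to grow downward)
    width: float
    height: float
    is_item_name: bool


def includes_horizontally(parent: Cell, rights: List[Cell]) -> bool:
    """Check the three conditions for `parent` horizontally including `rights`."""
    if not parent.is_item_name or not rights:
        return False
    # Condition 1: at least one right-adjacent node is an item name node.
    has_item_name = any(c.is_item_name for c in rights)
    # Condition 2: no right-adjacent node is taller than the parent.
    heights_ok = all(c.height <= parent.height for c in rights)
    # Condition 3 (assumed reading): the top-left corner of the topmost
    # right-adjacent node coincides with the parent's top-right corner.
    top = min(rights, key=lambda c: c.y)
    corner_ok = (top.x, top.y) == (parent.x + parent.width, parent.y)
    return has_item_name and heights_ok and corner_ok


parent = Cell(0, 0, 10, 20, True)
rights = [Cell(10, 0, 10, 10, True), Cell(10, 10, 10, 10, False)]
print(includes_horizontally(parent, rights))
```

The vertical case would follow the same pattern with widths in place of heights and the lower side in place of the right side.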

Note that the style graph processed by the inclusion relationship acquisition unit 208 has undergone edge deletion by the first deletion unit 203 but not by the second deletion unit 204. Thus, for example, the edges representing the adjacency between node f4 and nodes f5 and f7 have not been deleted. The inclusion relationship acquisition unit 208 therefore determines that item name node f2 horizontally includes nodes f5 and f7.

The item name synthesizing unit 210 represents the inclusion graph as tree-structured data such as that shown in FIG. 13. FIG. 13 is a diagram for explaining the inclusion graph. The item name synthesizing unit 210 and the enumeration synthesizing unit 211 then create tree-structured data that combines the logical relationships acquired by the vertical enumeration acquisition unit 206 and the horizontal enumeration acquisition unit 207 with the inclusion relationships acquired by the inclusion relationship acquisition unit 208. The adding unit 212 adds a root node that defines the tree structure to the tree-structured data created by the item name synthesizing unit 210 and the enumeration synthesizing unit 211.

For example, the enumeration synthesizing unit 211 creates the tree-structured data shown in FIG. 14 from the list in FIG. 10. The adding unit 212 then adds the root node "form1" to that tree-structured data. FIG. 14 is an example of tree-structured data. In this case, the enumeration in FIG. 4 contains no inclusion relationship, so the item name synthesizing unit 210 does not create an inclusion graph.

As another example, the enumeration synthesizing unit 211 creates the tree-structured data shown in FIG. 15 from the list in FIG. 11 and the inclusion graph in FIG. 13. The adding unit 212 then adds the root node "form2" to that tree-structured data. FIG. 15 is an example of tree-structured data. The root node is added to indicate the logical relationships that make up the form, so the adding unit 212 is not essential when information about what makes up the form is not needed.

Each node of the tree-structured data, referred to as a tree node, holds information obtained from the format information of the form, such as the character string of an item name or item value. It also holds information identifying its child and parent nodes, the direction of adjacency to its child and parent nodes, and information indicating that the tree node belongs to an enumeration.
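As a rough illustration of the information a tree node carries, the following sketch models it as a Python dataclass. The class name `TreeNode` and all field names are hypothetical; the description only states which kinds of information are held, not how they are laid out. The example values reuse the relationship from the first row of FIG. 10 (the horizontal child of c1 is c2) and the root node "form1".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TreeNode:
    text: str                            # item name or item value string from the form
    parent: Optional[str] = None         # identifier of the parent node, if any
    children: List[str] = field(default_factory=list)  # identifiers of child nodes
    direction: Optional[str] = None      # adjacency direction to the parent (assumed encoding)
    in_enumeration: bool = False         # True if the node belongs to an enumeration


# Root node added by the adding unit, plus one enumeration relationship.
root = TreeNode(text="form1")
c1 = TreeNode(text="c1", parent="form1", in_enumeration=True)
c2 = TreeNode(text="c2", parent="c1", direction="horizontal", in_enumeration=True)
c1.children.append("c2")
root.children.append("c1")
```

Storing identifiers rather than object references keeps the sketch close to the wording "information identifying child and parent nodes"; an actual implementation could equally hold direct references.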

[Processing of the First Embodiment]
Next, the flow of processing in the logical relationship recognition apparatus 10 is described. First, the processing of the extraction unit 201 is described with reference to FIG. 16. FIG. 16 is a flowchart showing the flow of processing of the extraction unit. As shown in FIG. 16, the extraction unit 201 first reads the style graph and the post-analysis visual expression rules (step S11). Next, the extraction unit 201 selects from the style graph the node group that satisfies the conditions of the post-analysis visual expression rules (step S12). The extraction unit 201 then sets the item name attribute of the selected node group to "item name", that is, sets the flag indicating an item name to true (step S13). Finally, the extraction unit 201 returns the style graph as G (step S14).

Next, the processing of the analysis unit 202 is described with reference to FIG. 17. FIG. 17 is a flowchart showing the flow of processing of the analysis unit. The analysis unit 202 creates post-analysis visual expression rules from the visual expression rules. First, the analysis unit 202 reads the group of visual expression rules (step S21). Next, the analysis unit 202 creates a variable rule_list of Array or Hash type (step S22). The analysis unit 202 then processes the visual expression rules it has read one at a time (steps S23 and S27).

For each rule, the analysis unit 202 first analyzes the condition of the visual expression rule (step S24). Next, the analysis unit 202 analyzes the action of the visual expression rule (step S25). The analysis unit 202 then stores the analyzed condition and action in rule_list as a post-analysis visual expression rule (step S26). After processing all of the visual expression rules, the analysis unit 202 outputs rule_list, which holds the group of post-analysis visual expression rules (step S28).
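The analysis steps (S21 to S28) and the extraction steps (S11 to S14) might be sketched as follows. The description does not specify the syntax of a visual expression rule, so the rule format used here (a condition string of the form `attribute = value` and an action name), the node attribute `background`, and the function names are all assumptions made purely for illustration.

```python
def analyze_rules(raw_rules):
    """Turn raw rules into post-analysis rules (S21 to S28); rule format is hypothetical."""
    rule_list = []                                          # S22
    for raw in raw_rules:                                   # S23 to S27
        attr, expected = raw["condition"].split("=", 1)     # S24: analyze the condition
        action = raw["action"].strip()                      # S25: analyze the action
        rule_list.append({"attr": attr.strip(),
                          "expected": expected.strip(),
                          "action": action})                # S26
    return rule_list                                        # S28


def extract_item_names(style_graph, rule_list):
    """Flag nodes that satisfy a rule as item names (S11 to S14)."""
    for node in style_graph.values():
        node.setdefault("is_item_name", False)
        for rule in rule_list:                              # S12: select matching nodes
            matches = str(node.get(rule["attr"])) == rule["expected"]
            if matches and rule["action"] == "item_name":
                node["is_item_name"] = True                 # S13: set the flag to true
    return style_graph                                      # S14: return the graph as G


rules = [{"condition": "background = gray", "action": "item_name"}]
graph = {"a1": {"background": "gray"}, "a2": {"background": "white"}}
G = extract_item_names(graph, analyze_rules(rules))
```

Separating analysis from extraction mirrors the two flowcharts: rules are parsed once into rule_list and can then be applied to any number of style graphs.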

Next, the processing of the first deletion unit 203 is described with reference to FIG. 18. FIG. 18 is a flowchart showing the flow of processing of the first deletion unit. First, the first deletion unit 203 reads the style graph (step S31).

Next, the first deletion unit 203 performs the following processing for each node of the style graph (steps S32 and S37). If two or more nodes are adjacent to the target node on its left (step S33, true), the first deletion unit 203 deletes the adjacency between the target node and the nodes adjacent on its left (step S34). If fewer than two nodes are adjacent on the left (step S33, false), the first deletion unit 203 does not delete the adjacency.

Next, if two or more nodes are adjacent to the target node on its upper side (step S35, true), the first deletion unit 203 deletes the adjacency between the target node and the nodes adjacent above it (step S36). If fewer than two nodes are adjacent above (step S35, false), the first deletion unit 203 does not delete the adjacency. After processing all target nodes, the first deletion unit 203 outputs the style graph (step S38).

For example, in the example of FIG. 8, the first deletion unit 203 deletes the edges representing the adjacency between node g6 and nodes g1, g2, g3, g4, and g5, and the adjacency between node g14 and nodes g13 and g17.
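The loop of steps S31 to S38 can be sketched as follows. The graph representation (a dictionary holding one neighbour list per direction), the helper functions, and the node names in the example are assumptions for illustration; the example layout is hypothetical and does not reproduce FIG. 8.

```python
def make_graph(names):
    """Hypothetical adjacency store: one neighbour list per direction."""
    return {n: {"left": [], "right": [], "up": [], "down": []} for n in names}


def link_down(graph, upper, lower):
    """Record that `lower` is adjacent to the lower side of `upper`."""
    graph[upper]["down"].append(lower)
    graph[lower]["up"].append(upper)


OPPOSITE = {"left": "right", "up": "down"}


def first_deletion(graph):
    for node, adj in graph.items():                    # S32 to S37: every node
        for side in ("left", "up"):                    # S33, then S35
            if len(adj[side]) >= 2:                    # two or more neighbours
                for other in adj[side]:
                    graph[other][OPPOSITE[side]].remove(node)
                adj[side] = []                         # S34 / S36: drop the edges
    return graph                                       # S38


# Two narrow cells above one wide cell, with a single cell below it.
graph = make_graph(["h1", "h2", "wide", "below"])
link_down(graph, "h1", "wide")
link_down(graph, "h2", "wide")
link_down(graph, "wide", "below")
first_deletion(graph)
```

In this sketch an edge is removed from both endpoints so that the graph stays symmetric; whether the actual apparatus stores edges this way is not stated.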

Next, the processing of the second deletion unit 204 is described with reference to FIG. 19. FIG. 19 is a flowchart showing the flow of processing of the second deletion unit. First, the second deletion unit 204 reads the style graph and the item name node list (step S41). The item name node list is the list of item name nodes extracted by the extraction unit 201.

Next, the second deletion unit 204 performs the following processing for each item name node in the item name node list (steps S42 and S47). If an item value node is adjacent to the target node on its left (step S43, true), the second deletion unit 204 deletes the adjacency between the target node and the node adjacent on its left (step S44). If no item value node is adjacent on the left (step S43, false), the second deletion unit 204 does not delete the adjacency.

Next, if an item value node is adjacent to the target node on its upper side (step S45, true), the second deletion unit 204 deletes the adjacency between the target node and the node adjacent above it (step S46). If no item value node is adjacent above (step S45, false), the second deletion unit 204 does not delete the adjacency. After processing all target nodes, the second deletion unit 204 outputs the style graph (step S48).

For example, in the example of FIG. 8, the second deletion unit 204 deletes the edges representing the adjacency between node g3 and node g4, the adjacency between node g8 and nodes g12 and g13, and the adjacency between node g15 and nodes g14 and g17.
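Steps S41 to S48 can be sketched in the same style. The graph representation and helper names are assumptions, and this sketch follows the reading in which only edges to item value nodes are removed. The example uses the pair from FIG. 4 (item value node c4 above item name node c6); the node `n0` is a hypothetical item name node added to show that edges between item names are kept.

```python
OPPOSITE = {"left": "right", "right": "left", "up": "down", "down": "up"}


def make_graph(names):
    """Hypothetical adjacency store: one neighbour list per direction."""
    return {n: {"left": [], "right": [], "up": [], "down": []} for n in names}


def link(graph, node, side, other):
    """Record that `other` is adjacent to `node` on the given side."""
    graph[node][side].append(other)
    graph[other][OPPOSITE[side]].append(node)


def second_deletion(graph, item_names):
    for node in item_names:                               # S42 to S47
        for side in ("left", "up"):                       # S43, then S45
            # Item value neighbours are those not in the item name list.
            values = [n for n in graph[node][side] if n not in item_names]
            for other in values:                          # S44 / S46
                graph[node][side].remove(other)
                graph[other][OPPOSITE[side]].remove(node)
    return graph                                          # S48


graph = make_graph(["c4", "c6", "n0"])
link(graph, "c6", "up", "c4")      # item value c4 is above item name c6
link(graph, "c6", "left", "n0")    # n0: hypothetical item name to the left of c6
second_deletion(graph, item_names={"c6", "n0"})
```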

Next, the processing of the classification unit 205 is described with reference to FIG. 20. FIG. 20 is a flowchart showing the flow of processing of the classification unit. Hereinafter, the relationship between a target node and a node reached by following right-adjacent nodes from the target node is described as "connected on the right side", and the relationship between a target node and a node reached by following lower-adjacent nodes from the target node is described as "connected on the lower side". As shown in FIG. 20, the classification unit 205 first reads the style graph and the target node (step S51). If the target node is not an item name (step S52, true), the classification unit 205 sets the enumeration flag to "no enumeration" (step S58). The enumeration flag is a variable that indicates the classification of each target node as one of "vertical enumeration", "horizontal enumeration", and "no enumeration". Enumerations also include the combined vertical and horizontal type and nested enumeration structures, but these can be expressed as combinations of vertical and horizontal enumerations.

If the target node is an item name (step S52, false), the classification unit 205 stores the node group connected on the lower side of the target node in bottoms, and stores the node group connected on the right side of the target node in rights (step S53).

If rights contains no item name and the number of elements in rights is 1 (step S54, true), the classification unit 205 sets the enumeration flag to "horizontal enumeration" (step S55). If rights contains an item name, or the number of elements in rights is not 1 (step S54, false), the classification unit 205 performs the following processing.

If bottoms contains no item name and the number of elements in bottoms is 1 (step S56, true), the classification unit 205 sets the enumeration flag to "vertical enumeration" (step S57). If bottoms contains an item name, or the number of elements in bottoms is not 1 (step S56, false), the classification unit 205 sets the enumeration flag to "no enumeration" (step S58). Finally, the classification unit 205 outputs the enumeration flag (step S59).

For example, in the example of FIG. 6, when the target node is node e1, the classification unit 205 sets the enumeration flag to "no enumeration". When the target node is node e4, the classification unit 205 sets the enumeration flag to "horizontal enumeration".
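The decision logic of steps S51 to S59 reduces to a short function. In this sketch the function name `classify`, the argument layout, and the node names in the usage lines are assumptions; `bottoms` and `rights` are taken as already computed (step S53), and the three return strings correspond to the three values of the enumeration flag.

```python
def classify(target, bottoms, rights, item_names):
    """Return the enumeration flag for `target` (S51 to S59)."""
    if target not in item_names:                            # S52: not an item name
        return "no enumeration"                             # S58
    if len(rights) == 1 and rights[0] not in item_names:    # S54
        return "horizontal enumeration"                     # S55
    if len(bottoms) == 1 and bottoms[0] not in item_names:  # S56
        return "vertical enumeration"                       # S57
    return "no enumeration"                                 # S58


# Hypothetical nodes: n1 and n2 are item names, v1 and v2 are item values.
item_names = {"n1", "n2"}
print(classify("n1", bottoms=[], rights=["v1"], item_names=item_names))
print(classify("n1", bottoms=["v1"], rights=["n2"], item_names=item_names))
```

Because the horizontal test comes first, a node that qualifies in both directions is flagged as a horizontal enumeration, which matches the order of steps S54 and S56 in the flowchart.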

Next, the processing of the vertical enumeration acquisition unit 206 is described with reference to FIG. 21. FIG. 21 is a flowchart showing the flow of processing of the vertical enumeration acquisition unit. As shown in FIG. 21, the vertical enumeration acquisition unit 206 first reads the style graph and the target node (step S61). Next, the vertical enumeration acquisition unit 206 stores in bottoms the node group connected on the lower side of the target node (step S62). If bottoms contains an item name (step S63, true), the vertical enumeration acquisition unit 206 ends the processing. If bottoms contains no item name (step S63, false), the vertical enumeration acquisition unit 206 acquires a logical relationship whose parent is the target node, whose children are bottoms, and whose direction is vertical (step S64). The vertical enumeration acquisition unit 206 then adds the acquired logical relationship to the enumeration list 301 (step S65).

For example, in the example of FIG. 2, when the target node is node a1, the vertical enumeration acquisition unit 206 stores nodes a2 and a3 in bottoms and acquires a logical relationship whose parent is node a1, whose children are nodes a2 and a3, and whose direction is vertical. When the target node is node a4, the vertical enumeration acquisition unit 206 stores node a5 in bottoms and acquires a logical relationship whose parent is node a4, whose child is node a5, and whose direction is vertical.

Next, the processing of the horizontal enumeration acquisition unit 207 is described with reference to FIG. 22. FIG. 22 is a flowchart showing the flow of processing of the horizontal enumeration acquisition unit. As shown in FIG. 22, the horizontal enumeration acquisition unit 207 first reads the style graph and the target node (step S71). Next, the horizontal enumeration acquisition unit 207 stores in rights the node group connected on the right side of the target node (step S72). If rights contains an item name (step S73, true), the horizontal enumeration acquisition unit 207 ends the processing. If rights contains no item name (step S73, false), the horizontal enumeration acquisition unit 207 acquires a logical relationship whose parent is the target node, whose children are rights, and whose direction is horizontal (step S74). The horizontal enumeration acquisition unit 207 then adds the acquired logical relationship to the enumeration list 301 (step S75).

For example, in the example of FIG. 3, when the target node is node b1, the horizontal enumeration acquisition unit 207 stores node b2 in rights and acquires a logical relationship whose parent is node b1, whose child is node b2, and whose direction is horizontal. When the target node is node b3, the horizontal enumeration acquisition unit 207 stores node b4 in rights and acquires a logical relationship whose parent is node b3, whose child is node b4, and whose direction is horizontal.
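The vertical flow (S61 to S65) and the horizontal flow (S71 to S75) differ only in the direction and in which connected node group is examined, so both can be sketched with one function. The function name, the dictionary layout of a list entry, and the node `x1` in the last call are assumptions; the other node names follow the examples for FIG. 2 and FIG. 3.

```python
def acquire_enumeration(target, connected, item_names, direction, enumeration_list):
    """Append one logical relationship unless an item name is among `connected`."""
    if any(n in item_names for n in connected):      # S63 / S73: item name found, stop
        return enumeration_list
    enumeration_list.append({                        # S64 / S74
        "parent": target,
        "children": list(connected),
        "direction": direction,
    })
    return enumeration_list                          # S65 / S75


item_names = {"a1", "a4", "b1", "b3"}
enumeration_list = []
# Vertical: bottoms of a1 are a2 and a3 (FIG. 2 example).
acquire_enumeration("a1", ["a2", "a3"], item_names, "vertical", enumeration_list)
# Horizontal: rights of b1 is b2 (FIG. 3 example).
acquire_enumeration("b1", ["b2"], item_names, "horizontal", enumeration_list)
# Hypothetical case: an item name among the connected nodes, so nothing is added.
acquire_enumeration("x1", ["b3"], item_names, "horizontal", enumeration_list)
```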

Next, the processing of the inclusion relationship acquisition unit 208 is described with reference to FIG. 23. FIG. 23 is a flowchart showing the flow of processing of the inclusion relationship acquisition unit. The inclusion relationship acquisition unit 208 first reads the style graph (step S81).

The inclusion relationship acquisition unit 208 then performs the following processing for each node in the style graph (steps S82 and S87). If the node itself is an item name and the nodes adjacent on its right include an item name node (step S83, true), the inclusion relationship acquisition unit 208 acquires the right-side inclusion relationship (step S84). The details of the processing for acquiring the right-side inclusion relationship are described later. If the node itself is not an item name, or the nodes adjacent on its right include no item name node (step S83, false), the inclusion relationship acquisition unit 208 does not acquire the right-side inclusion relationship.

Next, if the node itself is an item name and the nodes adjacent on its lower side include an item name node (step S85, true), the inclusion relationship acquisition unit 208 acquires the lower-side inclusion relationship (step S86). The details of the processing for acquiring the lower-side inclusion relationship are described later. If the node itself is not an item name, or the nodes adjacent on its lower side include no item name node (step S85, false), the inclusion relationship acquisition unit 208 does not acquire the lower-side inclusion relationship. After processing all nodes, the inclusion relationship acquisition unit 208 outputs the acquired inclusion relationships as an inclusion relationship list (step S88).

Next, the processing for acquiring the right-side inclusion relationship is described with reference to FIG. 24. FIG. 24 is a flowchart showing the flow of this processing. As shown in FIG. 24, the inclusion relationship acquisition unit 208 first reads the target node and the style graph (step S101). Next, the inclusion relationship acquisition unit 208 stores in min_y the minimum y coordinate of the node group on the right side of the target node, and stores in max_y the maximum y coordinate of that node group (step S102).

If the y-coordinate range of the target node matches the y-coordinate range of the node group on its right side (step S103, true), the inclusion relationship acquisition unit 208 stores the node group on the right side of the target node in list (step S107). The inclusion relationship acquisition unit 208 calculates the y-coordinate range of the right-side node group from min_y and max_y. The inclusion relationship acquisition unit 208 then performs the following processing for each node in list (steps S108 and S110): taking each node r in turn as the target node, it returns to step S102 and performs the processing recursively (step S109).

If the y-coordinate range of the target node does not match the y-coordinate range of the node group on its right side (step S103, false) and the included nodes contain no item value (step S104, false), the inclusion relationship acquisition unit 208 initializes the set of subordinate nodes included on the right side (list) (step S105). If the included nodes contain an item value (step S104, true), the inclusion relationship acquisition unit 208 does not perform the initialization. Finally, the inclusion relationship acquisition unit 208 outputs list (step S106).
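One possible reading of the recursive flow in FIG. 24 is sketched below. It is a simplification under stated assumptions, not the flowchart itself: each node is reduced to a `(top, bottom)` pair of y coordinates, `right_of` is a hypothetical map from a node to its right-adjacent nodes, the results of the recursive calls are assumed to be merged into the caller's list, and the item-value branch (steps S104 and S105) is collapsed into simply returning an empty list when the ranges do not match.

```python
def right_inclusion(node, right_of, boxes):
    """Collect nodes included on the right of `node` (simplified S101 to S110)."""
    rights = right_of.get(node, [])
    if not rights:
        return []
    min_y = min(boxes[r][0] for r in rights)         # S102: smallest top coordinate
    max_y = max(boxes[r][1] for r in rights)         # S102: largest bottom coordinate
    if boxes[node] != (min_y, max_y):                # S103 false: ranges differ
        return []                                    # simplification of S104 / S105
    included = list(rights)                          # S107
    for r in rights:                                 # S108 to S110
        for n in right_inclusion(r, right_of, boxes):   # S109: recurse with r as target
            if n not in included:
                included.append(n)
    return included                                  # S106


# Hypothetical layout: A spans rows 0..2; B and C are stacked to its right;
# D continues B's row; E is taller than C, so the recursion stops at C.
boxes = {"A": (0, 2), "B": (0, 1), "C": (1, 2), "D": (0, 1), "E": (1, 3)}
right_of = {"A": ["B", "C"], "B": ["D"], "C": ["E"]}
print(right_inclusion("A", right_of, boxes))
```

The lower-side processing of FIG. 25 would be the same sketch with x coordinates and lower-adjacent nodes substituted.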

Next, the processing for acquiring the lower-side inclusion relationship is described with reference to FIG. 25. FIG. 25 is a flowchart showing the flow of this processing. As shown in FIG. 25, the inclusion relationship acquisition unit 208 first reads the target node and the style graph (step S151). Next, the inclusion relationship acquisition unit 208 stores in min_x the minimum x coordinate of the node group on the lower side of the target node, and stores in max_x the maximum x coordinate of that node group (step S152).

If the x-coordinate range of the target node matches the x-coordinate range of the node group on its lower side (step S153, true), the inclusion relationship acquisition unit 208 stores the node group on the lower side of the target node in list (step S157). The inclusion relationship acquisition unit 208 calculates the x-coordinate range of the lower-side node group from min_x and max_x. The inclusion relationship acquisition unit 208 then performs the following processing for each node in list (steps S158 and S160): taking each node r in turn as the target node, it returns to step S152 and performs the processing recursively (step S159).

If the x-coordinate range of the target node does not match the x-coordinate range of the node group on its lower side (step S153, false) and the included nodes contain no item value (step S154, false), the inclusion relationship acquisition unit 208 initializes the set of subordinate nodes included on the lower side (list) (step S155). If the included nodes contain an item value (step S154, true), the inclusion relationship acquisition unit 208 does not perform the initialization. Finally, the inclusion relationship acquisition unit 208 outputs list (step S156).

Next, the processing of the inclusion graph generation unit 209 will be described with reference to FIG. 26. FIG. 26 is a flowchart showing the flow of the processing of the inclusion graph generation unit. As shown in FIG. 26, the inclusion graph generation unit 209 first reads the style graph and the inclusion relation list (step S201). Next, the inclusion graph generation unit 209 stores in Nv the vertical inclusion nodes acquired from the inclusion relation list, stores in Nh the horizontal inclusion nodes acquired from the inclusion relation list, and stores a new inclusion graph set in IG (step S202).

The inclusion graph generation unit 209 then performs the following processing for each inclusion node i included in Nv and Nh (steps S203 and S213). Hereinafter, a target node that has already been assigned to an inclusion graph through the inclusion relationship of another node is called a "divided node". If the inclusion node i has already been divided (step S204, true), the inclusion graph generation unit 209 proceeds to the next inclusion node. If the inclusion node i has not been divided (step S204, false), the inclusion graph generation unit 209 stores the inclusion node i and its set of subordinate item name nodes in inc (step S205).

If inc contains a divided node (step S206, true), the inclusion graph generation unit 209 searches the already generated inclusion graphs for a graph whose subordinate nodes overlap, and stores the inclusion node of that graph in n (step S207). Next, the inclusion graph generation unit 209 adds the node group that has not yet been added to the nodes in the inclusion graph starting from the inclusion node n (step S208). The inclusion graph generation unit 209 then sets the inclusion direction from the direction in which the inclusion node includes other nodes, or from the arrangement of the item names (step S209).

On the other hand, if inc contains no divided node (step S206, false), the inclusion graph generation unit 209 generates a new inclusion graph (step S210). The inclusion graph generation unit 209 then sets the inclusion direction from the direction in which the main node includes other nodes, or from the arrangement of the item names (step S211), and adds the new inclusion graph to the inclusion graph set IG (step S212). After processing all i, the inclusion graph generation unit 209 outputs the inclusion graph set IG (step S214).

In this way, among the nodes having an inclusion relationship, the inclusion graph generation unit 209 gives priority to those that include a larger number of nodes. The inclusion graph generation unit 209 first acquires all inclusion relationships in the style graph and then searches for nodes in the inclusion direction. For example, in the example of FIG. 7, an inclusion graph such as that shown in FIG. 12 is generated. In the example of FIG. 8, g1 is not included in g6, so an inclusion graph starting from g1 is generated separately from the inclusion graph starting from g6.
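The grouping logic of FIG. 26 can be sketched as follows. This is a minimal illustration under stated assumptions: the inclusion relation list is modeled as a hypothetical dict `inclusions` that maps each inclusion node to all of its subordinate item name nodes, and each inclusion graph is modeled as a plain dict with a root and a node set. Setting the inclusion direction (steps S209 and S211) is left out.

```python
def build_inclusion_graphs(inclusions):
    """Group inclusion nodes and their subordinates into inclusion graphs.

    inclusions: dict mapping an inclusion node to the list of item name
    nodes it includes (an assumed representation of the relation list).
    """
    graphs = []      # inclusion graph set IG
    assigned = {}    # node -> index of the graph it was divided into
    # Nodes that include more nodes are processed first.
    order = sorted(inclusions, key=lambda h: len(inclusions[h]), reverse=True)
    for head in order:
        if head in assigned:                      # step S204: already divided
            continue
        inc = [head, *inclusions[head]]           # step S205
        hit = next((assigned[n] for n in inc if n in assigned), None)
        if hit is None:                           # step S210: new graph
            graphs.append({"root": head, "nodes": set(inc)})
            hit = len(graphs) - 1
        else:                                     # steps S207 and S208: merge
            graphs[hit]["nodes"].update(inc)
        for node in inc:
            assigned[node] = hit
    return graphs


graphs = build_inclusion_graphs({
    "B": ["D", "E"],
    "A": ["B", "C", "D", "E"],
    "X": ["Y"],
})
```

In this example `B` is skipped because it was already divided into the graph rooted at `A`, while `X` shares no node with that graph and therefore starts a separate one, much like g1 and g6 in the example of FIG. 8.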

Next, the processing of the inter-item-name synthesis unit 210 will be described with reference to FIG. 27. FIG. 27 is a flowchart showing the flow of the processing of the inter-item-name synthesis unit. As shown in FIG. 27, the inter-item-name synthesis unit 210 first reads the style graph, the inclusion graph 302, the enumeration list 301, and the start node group (step S301). Next, the inter-item-name synthesis unit 210 generates a new tree structure T (step S302). The start node group is the set of nodes from which the inclusion graphs start.

The inter-item-name synthesis unit 210 then performs the following processing for each start node included in the start node group (steps S303 and S319). First, the inter-item-name synthesis unit 210 obtains from the inclusion graph the group of nodes that are item names and stores it in children (step S304). Next, the inter-item-name synthesis unit 210 generates a new tree node t for the start node, with children as its children and no parent (step S305). The inter-item-name synthesis unit 210 then sets the target tree node to t (step S306) and adds the target tree node to the tree structure without setting its type (step S307).

If the target tree node is in the enumeration list (step S308, false), the inter-item-name synthesis unit 210 proceeds to the processing of the next start node. If the target tree node is not in the enumeration list (step S308, true) and there is a node set that the target tree node includes in the horizontal direction (step S309, true), the inter-item-name synthesis unit 210 sets the type of the target tree node to inclusion (step S310). The inter-item-name synthesis unit 210 then takes the next node of the horizontally included node set as the target node, returns to step S307, and performs the processing recursively (steps S311, S312, and S313).

If the target tree node is not in the enumeration list (step S308, true), there is no node set that the target tree node includes in the horizontal direction (step S309, false), and there is a node set that the target tree node includes in the vertical direction (step S314, true), the inter-item-name synthesis unit 210 sets the type of the target tree node to inclusion (step S315). The inter-item-name synthesis unit 210 then takes the next node of the horizontally included node set as the target node, returns to step S307, and performs the processing recursively (steps S316, S317, and S318). After processing all start nodes, the inter-item-name synthesis unit 210 outputs the tree structure data (step S320).

If the target tree node is not in the enumeration list (step S308, true), there is no node set that the target tree node includes in the horizontal direction (step S309, false), and there is no node set that the target tree node includes in the vertical direction (step S314, false), the inter-item-name synthesis unit 210 proceeds to the processing of the next start node.

Next, the processing of the enumeration synthesis unit 211 will be described with reference to FIG. 28. As shown in FIG. 28, the enumeration synthesis unit 211 first reads the style graph, the enumeration list 301, and the tree structure data (step S341).

The enumeration synthesis unit 211 then performs the following processing for each node in the enumeration list 301 (steps S342 and S347). First, the enumeration synthesis unit 211 generates a new tree node (step S343). Next, the enumeration synthesis unit 211 sets the children of the tree node to the children in the enumeration list (step S344). The enumeration synthesis unit 211 then merges the tree node into the tree structure data (step S346). After performing this processing for all nodes, the enumeration synthesis unit 211 outputs the tree structure data (step S348). For example, in the example of FIG. 7, tree structure data such as that shown in FIG. 13 is output.

Next, the processing of the adding unit 212 will be described with reference to FIG. 29. FIG. 29 is a flowchart showing the flow of the processing of the adding unit. As shown in FIG. 29, the adding unit 212 first reads the style graph and the tree structure data, stores the tree structure data in S, and stores the style graph in G (step S351). Next, the adding unit 212 stores the set of parentless nodes in nds and stores an arbitrary character string in str (step S352). If a character string outside the ruled-line frame, to the left of or above nds, can be acquired, the adding unit 212 stores the acquired character string in str (step S353).

If no character string outside the ruled-line frame is found (step S354, false), the adding unit 212 adds to the tree structure data, as the parent of nds, a tree node that has the arbitrary character string stored in str and whose type is style (step S355). If a character string outside the ruled-line frame is found (step S354, true), the adding unit 212 adds to the tree structure data, as the parent of nds, a tree node that has the character string stored in str and whose type is style (step S356). Finally, the adding unit 212 outputs the tree structure data (step S357).

The arbitrary character string may vary with a count, as in form#(i). In this case, the (i) part changes with the count, so the character strings of the root nodes become "form1", "form2", and so on. If a description such as "XX orthogonal table" appears outside the ruled-line frame to the left of or above the table, the character string of the root node may be set to "XX orthogonal table".
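The root-node addition of FIG. 29 and the form#(i) naming can be sketched as below. The `TreeNode` class, the `add_root` function, and the way the counter is passed in are all hypothetical names chosen for illustration; the embodiment does not specify a data layout. The sketch assumes that the text found outside the ruled-line frame, if any, has already been extracted and is passed as `outside_text`.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Iterator, List, Optional


@dataclass(eq=False)
class TreeNode:
    label: str
    kind: Optional[str] = None
    parent: Optional["TreeNode"] = None
    children: List["TreeNode"] = field(default_factory=list)


def add_root(tree_nodes: List[TreeNode], counter: Iterator[int],
             outside_text: Optional[str] = None) -> TreeNode:
    """Attach a root of kind "style" above every parentless node."""
    nds = [n for n in tree_nodes if n.parent is None]        # step S352
    # Steps S353 and S354: prefer text found outside the ruled-line frame,
    # otherwise fall back to a counted label such as "form1".
    label = outside_text if outside_text else f"form{next(counter)}"
    root = TreeNode(label=label, kind="style", children=nds)
    for node in nds:
        node.parent = root
    tree_nodes.append(root)                                  # steps S355/S356
    return root


a, b = TreeNode("A"), TreeNode("B")
c = TreeNode("C", parent=a)
a.children.append(c)
numbering = count(1)
root1 = add_root([a, b, c], numbering)
root2 = add_root([TreeNode("D")], numbering, outside_text="XX orthogonal table")
root3 = add_root([TreeNode("E")], numbering)
```

Passing the counter explicitly keeps the numbering under the caller's control, so a label found outside the frame does not consume a number.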

[Effects of the First Embodiment]
The extraction unit 201 extracts, as item name nodes (nodes of areas representing item names), those nodes that satisfy a preset condition, based on a graph in which information on the areas representing the item names or item values of a form is represented as nodes and the adjacency relationships between the nodes are represented as edges. When a plurality of nodes are adjacent to one node in a predetermined direction, the first deletion unit 203 deletes the edges representing the adjacency relationships between that one node and the plurality of nodes. The second deletion unit 204 deletes the edges representing the adjacency relationships between item value nodes (nodes of areas representing item values) and those item name nodes to which an item value node is adjacent in a predetermined direction.

The vertical enumeration acquisition unit 206 and the horizontal enumeration acquisition unit 207 acquire the logical relationships between the item name nodes and the item value nodes based on the graph from which edges have been deleted by the first deletion unit 203 and the second deletion unit 204. The inclusion relationship acquisition unit 208 acquires the inclusion relationships between the item name nodes based on the graph from which edges have been deleted by the first deletion unit 203. The inter-item-name synthesis unit 210 and the enumeration synthesis unit 211 create tree-structured data by synthesizing the logical relationships acquired by the vertical enumeration acquisition unit 206 and the horizontal enumeration acquisition unit 207 with the inclusion relationships acquired by the inclusion relationship acquisition unit 208.

Therefore, according to the present embodiment, the logical relationships between the item names of a form, and between the item names and the item values, can be recognized accurately even when the form contains a vertical enumeration, a horizontal enumeration, a combination of vertical and horizontal enumerations, or a nested enumeration structure. Furthermore, because the logical relationships can be acquired semi-automatically in the present embodiment, the semi-structured data of forms can be acquired and utilized efficiently.

When a plurality of nodes are adjacent to the left side or the upper side of one node, the first deletion unit 203 may delete the edges representing the adjacency relationships between that one node and the plurality of nodes. The second deletion unit 204 may delete the edges representing the adjacency relationships between item value nodes and those item name nodes that have an item value node adjacent on their left side or upper side. In general, item names, and item names and their item values, are in most cases arranged from left to right or from top to bottom on a form. For this reason, setting the directions of the adjacency relationships to be deleted to the left side and the upper side makes it possible to handle many forms.
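The two deletion rules can be sketched as follows. This is a minimal illustration, assuming the graph is held as a list of (node, neighbor, side) triples meaning that `neighbor` is adjacent on the given `side` of `node`. The function names, the side labels "left" and "up", and the `is_name` lookup are hypothetical and not taken from the embodiment.

```python
from collections import defaultdict


def delete_multi_adjacent_edges(edges, sides=("left", "up")):
    """First deletion rule: drop edges where one node has several
    neighbors on its left side or on its upper side."""
    grouped = defaultdict(list)
    for node, neighbor, side in edges:
        grouped[(node, side)].append(neighbor)
    return [
        (node, neighbor, side)
        for node, neighbor, side in edges
        if not (side in sides and len(grouped[(node, side)]) > 1)
    ]


def delete_value_before_name_edges(edges, is_name, sides=("left", "up")):
    """Second deletion rule: drop edges from an item name node to an
    item value node lying on its left side or on its upper side."""
    return [
        (node, neighbor, side)
        for node, neighbor, side in edges
        if not (side in sides and is_name[node] and not is_name[neighbor])
    ]


edges = [("c", "a", "left"), ("c", "b", "left"),
         ("d", "c", "left"), ("c", "e", "up")]
pruned = delete_multi_adjacent_edges(edges)

is_name = {"name0": True, "name1": True, "val0": False, "val1": False}
edges2 = [("name1", "val0", "left"), ("name1", "name0", "up"),
          ("val1", "name1", "left")]
pruned2 = delete_value_before_name_edges(edges2, is_name)
```

Both functions return a new list rather than mutating the input, which makes it easy to keep the graph pruned by the first rule alone for the inclusion analysis and the graph pruned by both rules for the enumeration analysis.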

The inclusion relationship acquisition unit 208 determines that a first item name node includes a first node group when at least one node of the first node group adjacent to the right side of the first item name node is an item name node, the heights of all nodes included in the first node group are equal to or less than the height of the first item name node, and the vertex of the upper-left node of the first node group overlaps a vertex of the first item name node. Likewise, the inclusion relationship acquisition unit 208 determines that a second item name node includes a second node group when at least one node of the second node group adjacent to the lower side of the second item name node is an item name node, the widths of all nodes included in the second node group are equal to or less than the width of the second item name node, and the vertex of the upper-left node of the second node group overlaps a vertex of the second item name node. By using the adjacency relationships, heights, and widths of the nodes in this way, inclusion relationships can be recognized accurately.
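The two inclusion conditions can be written as simple geometric predicates. The sketch below makes several assumptions not stated in the text: each node is a rectangle with its origin at the top-left corner and y increasing downward, the overlapping vertex is the top-right corner of the item name node for the right-side case and its bottom-left corner for the lower-side case, and the names `Cell`, `includes_right`, and `includes_below` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Cell:
    x: int          # left edge
    y: int          # top edge (y grows downward)
    w: int          # width
    h: int          # height
    is_name: bool   # True for an item name node, False for an item value node


def _top_left(group: Sequence[Cell]) -> Cell:
    # The upper-left node: smallest y first, then smallest x.
    return min(group, key=lambda c: (c.y, c.x))


def includes_right(name: Cell, group: Sequence[Cell]) -> bool:
    """Does `name` include the node group adjacent to its right side?"""
    if not name.is_name or not group:
        return False
    if not any(c.is_name for c in group):
        return False
    if any(c.h > name.h for c in group):
        return False
    first = _top_left(group)
    return (first.x, first.y) == (name.x + name.w, name.y)


def includes_below(name: Cell, group: Sequence[Cell]) -> bool:
    """Does `name` include the node group adjacent to its lower side?"""
    if not name.is_name or not group:
        return False
    if not any(c.is_name for c in group):
        return False
    if any(c.w > name.w for c in group):
        return False
    first = _top_left(group)
    return (first.x, first.y) == (name.x, name.y + name.h)


header = Cell(0, 0, 4, 1, True)
sub_headers = [Cell(0, 1, 2, 1, True), Cell(2, 1, 2, 1, True)]
side_header = Cell(0, 0, 2, 4, True)
side_group = [Cell(2, 0, 3, 2, True), Cell(2, 2, 3, 2, False)]
```

With these sample rectangles, `header` includes `sub_headers` on its lower side and `side_header` includes `side_group` on its right side, while a group whose first cell is shifted away from the shared corner is rejected.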

[Other Embodiments]
The target of logical relationship recognition may be a Web screen or a system GUI, as long as it can be formatted into a form layout. For example, by acquiring item names and item values from a Web screen such as the one shown in FIG. 30, which handles airline tickets on the Web, and formatting them into a form layout, the Web screen can be made a target of the logical relationship recognition processing. FIG. 30 is a diagram for explaining the other embodiment.

[System Configuration, etc.]
The components of each illustrated apparatus are functional and conceptual, and need not be physically configured as illustrated. That is, the specific form of distribution and integration of each apparatus is not limited to the illustrated one, and all or part of each apparatus can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. Furthermore, all or any part of each processing function performed by each apparatus can be realized by a CPU and a program analyzed and executed by the CPU, or can be realized as hardware using wired logic.

Among the processes described in the present embodiment, all or part of the processes described as being performed automatically can be performed manually, and all or part of the processes described as being performed manually can be performed automatically by a known method. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and drawings can be changed arbitrarily unless otherwise specified.

[Program]
As one embodiment, the logical relationship recognition apparatus 10 can be implemented by installing, on a desired computer, a logical relationship recognition program that executes the above logical relationship recognition as packaged software or online software. For example, causing an information processing apparatus to execute the logical relationship recognition program allows the information processing apparatus to function as the logical relationship recognition apparatus 10. The information processing apparatus referred to here includes desktop and notebook personal computers. It also includes mobile communication terminals such as smartphones, mobile phones, and PHS (Personal Handyphone System) terminals, as well as slate terminals such as PDAs (Personal Digital Assistants).

The logical relationship recognition apparatus 10 can also be implemented as a logical relationship recognition server apparatus that treats a terminal device used by a user as a client and provides the client with services related to the above logical relationship recognition. For example, the logical relationship recognition server apparatus is implemented as a server apparatus that provides a logical relationship recognition service taking a form as input and producing tree structure data as output. In this case, the logical relationship recognition server apparatus may be implemented as a Web server, or as a cloud that provides the services related to the above logical relationship recognition through outsourcing.

FIG. 31 is a diagram illustrating an example of a computer that realizes the logical relationship recognition apparatus by executing a program. The computer 1000 includes, for example, a memory 1010 and a CPU 1020. The computer 1000 also includes a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These units are connected by a bus 1080.

The memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1090. The disk drive interface 1040 is connected to a disk drive 1100. A removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100. The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120. The video adapter 1060 is connected to, for example, a display 1130.

The hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, the program that defines each process of the logical relationship recognition apparatus 10 is implemented as the program module 1093, in which computer-executable code is described. The program module 1093 is stored in, for example, the hard disk drive 1090. For example, the program module 1093 for executing the same processing as the functional configuration of the logical relationship recognition apparatus 10 is stored in the hard disk drive 1090. The hard disk drive 1090 may be replaced by an SSD.

The setting data used in the processing of the above-described embodiment is stored as the program data 1094 in, for example, the memory 1010 or the hard disk drive 1090. The CPU 1020 reads the program module 1093 and the program data 1094 stored in the memory 1010 or the hard disk drive 1090 into the RAM 1012 as necessary and executes them.

The program module 1093 and the program data 1094 are not limited to being stored in the hard disk drive 1090. They may be stored in, for example, a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and the program data 1094 may be stored in another computer connected via a network (such as a LAN or a WAN (Wide Area Network)) and read by the CPU 1020 from that computer via the network interface 1070.

DESCRIPTION OF SYMBOLS
10 Logical relationship recognition apparatus
20 Control unit
30 Storage unit
201 Extraction unit
202 Analysis unit
203 First deletion unit
204 Second deletion unit
205 Classification unit
206 Vertical enumeration acquisition unit
207 Horizontal enumeration acquisition unit
208 Inclusion relationship acquisition unit
209 Inclusion graph generation unit
210 Inter-item-name synthesis unit
211 Enumeration synthesis unit
212 Adding unit
301 Enumeration list
302 Inclusion graph

Claims

A logical relationship recognition apparatus comprising:
an extraction unit that, based on a graph in which information on the areas representing the item names or item values of a form is represented as nodes and the adjacency relationships between the nodes are represented as edges, extracts nodes satisfying a preset condition from among the nodes as item name nodes, which are nodes of areas representing item names;
a first deletion unit that, when a plurality of nodes are adjacent to one node in a predetermined direction, deletes the edges representing the adjacency relationships between the one node and the plurality of nodes;
a second deletion unit that deletes the edges representing the adjacency relationships between item value nodes, which are nodes of areas representing item values, and those item name nodes to which an item value node is adjacent in a predetermined direction;
a first acquisition unit that acquires the logical relationship between the item name nodes and the item value nodes based on the graph from which edges have been deleted by the first deletion unit and the second deletion unit;
a second acquisition unit that acquires the inclusion relationship between the item name nodes based on the graph from which edges have been deleted by the first deletion unit; and
a synthesis unit that creates tree-structured data by synthesizing the logical relationship acquired by the first acquisition unit and the inclusion relationship acquired by the second acquisition unit.

The logical relationship recognition apparatus according to claim 1, wherein the first deletion unit, when a plurality of nodes are adjacent to the left side or the upper side of one node, deletes the edges representing the adjacency relationships between the one node and the plurality of nodes, and
the second deletion unit deletes the edges representing the adjacency relationships between item value nodes, which are nodes of areas representing item values, and those item name nodes that have an item value node adjacent on their left side or upper side.

The logical relationship recognition apparatus according to claim 2, wherein the second acquisition unit determines that a first item name node includes a first node group when at least one node of the first node group adjacent to the right side of the first item name node is an item name node, the heights of all nodes included in the first node group are equal to or less than the height of the first item name node, and the vertex of the upper-left node of the first node group overlaps a vertex of the first item name node, and determines that a second item name node includes a second node group when at least one node of the second node group adjacent to the lower side of the second item name node is an item name node, the widths of all nodes included in the second node group are equal to or less than the width of the second item name node, and the vertex of the upper-left node of the second node group overlaps a vertex of the second item name node.

A logical relationship recognition method executed by a logical relationship recognition apparatus, the method comprising:
an extraction step of extracting, based on a graph in which information on the areas representing the item names or item values of a form is represented as nodes and the adjacency relationships between the nodes are represented as edges, nodes satisfying a preset condition from among the nodes as item name nodes, which are nodes of areas representing item names;
a first deletion step of deleting, when a plurality of nodes are adjacent to one node in a predetermined direction, the edges representing the adjacency relationships between the one node and the plurality of nodes;
a second acquisition step of acquiring the inclusion relationship between the item name nodes based on the graph from which edges have been deleted in the first deletion step;
a second deletion step of deleting the edges representing the adjacency relationships between item value nodes, which are nodes of areas representing item values, and those item name nodes to which an item value node is adjacent in a predetermined direction;
a first acquisition step of acquiring the logical relationship between the item name nodes and the item value nodes based on the graph from which edges have been deleted in the first deletion step and the second deletion step; and
a synthesis step of creating tree-structured data by synthesizing the logical relationship acquired in the first acquisition step and the inclusion relationship acquired in the second acquisition step.

A logical relationship recognition program that causes a computer to execute:
an extraction step of extracting, based on a graph in which information on the areas representing the item names or item values of a form is represented as nodes and the adjacency relationships between the nodes are represented as edges, nodes satisfying a preset condition from among the nodes as item name nodes, which are nodes of areas representing item names;
a first deletion step of deleting, when a plurality of nodes are adjacent to one node in a predetermined direction, the edges representing the adjacency relationships between the one node and the plurality of nodes;
a second acquisition step of acquiring the inclusion relationship between the item name nodes based on the graph from which edges have been deleted in the first deletion step;
a second deletion step of deleting the edges representing the adjacency relationships between item value nodes, which are nodes of areas representing item values, and those item name nodes to which an item value node is adjacent in a predetermined direction;
a first acquisition step of acquiring the logical relationship between the item name nodes and the item value nodes based on the graph from which edges have been deleted in the first deletion step and the second deletion step; and
a synthesis step of creating tree-structured data by synthesizing the logical relationship acquired in the first acquisition step and the inclusion relationship acquired in the second acquisition step.