JP6613706B2

JP6613706B2 - Table design support apparatus, table design support method, and control program

Info

Publication number: JP6613706B2
Application number: JP2015161237A
Authority: JP
Inventors: 駿一田中
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2015-08-18
Filing date: 2015-08-18
Publication date: 2019-12-04
Anticipated expiration: 2035-08-18
Also published as: JP2017041017A

Description

The present invention relates to a table design support apparatus, a table design support method, and a control program.

In recent years, with the evolution of computer systems, data provided by others, such as open data published on the Internet, has been growing in volume and diversifying in type. When such data is managed in a database and its volume is extremely large, it is common to use a platform such as Hadoop, which can distribute a large amount of data across multiple machines for processing. On the other hand, if the data volume is on the order of gigabytes (GB), a relational database management system (RDBMS) can handle it. An RDBMS is convenient because it uses the widely adopted SQL language as its database language.

To store data in an RDBMS, table definition data in a corresponding format must be generated. Generating this table definition data requires a considerable amount of specialized knowledge. Patent Document 1 describes a table design support apparatus that, in order to reduce the burden on the table designer, makes use of table definition data prepared in advance when new table definition data is generated.

Japanese Unexamined Patent Application Publication No. 2011-008443 (JP 2011-008443 A)

With the table design support apparatus described in Patent Document 1, the table designer must, during table design work, define the data type of each field (column) and constraints such as uniqueness for the obtained record set. When only a record set is provided by another party, as with open data, the table design work must be carried out while checking the contents of the provided record set. However, as the scale (data volume) of the record set provided by others grows, errors such as data type mismatches and constraint violations become more likely during table design work, and redesign work to correct those errors is needed more often. Here, a data type mismatch means a storage error caused by a mismatch between the designed data type of a field and the field values of the actual records. A constraint violation means a storage error caused by the actual record set violating a constraint on a field or on the record set (such as uniqueness or the presence or absence of NULL values).

The present invention has been made in view of the above background. Its object is to provide a table design support apparatus, a table design support method, and a control program that make errors such as data type mismatches and constraint violations less likely to occur in table design work for a relational database management system, even when the scale of the record set provided by others becomes large.

To solve this problem, the present invention provides a table design support apparatus that supports table design for a relational database management system, the apparatus comprising: a delimiter candidate detection unit that detects a delimiter candidate for dividing a record set provided by another party; a record dividing unit that divides the record set by the delimiter candidate to generate a divided record group; a field analysis unit that analyzes each divided record in the divided record group with respect to the data types and constraints that can be stored in the relational database management system and generates an analysis result file group; and an analysis result combining unit that combines the analysis result file group to generate a table definition candidate that can be stored in the relational database management system.

The present invention also provides a table design support method for supporting table design for a relational database management system by means of a table design support apparatus comprising a delimiter candidate detection unit, a record dividing unit, a field analysis unit, and an analysis result combining unit, the method comprising: a step in which the delimiter candidate detection unit detects a delimiter candidate for dividing a record set provided by another party; a step in which the record dividing unit divides the record set by the delimiter candidate to generate a divided record group; a step in which the field analysis unit analyzes each divided record in the divided record group with respect to the data types and constraints that can be stored in the relational database management system and generates an analysis result file group; and a step in which the analysis result combining unit combines the analysis result file group to generate a table definition candidate that can be stored in the relational database management system.

The present invention further provides a control program for supporting table design for a relational database management system, the program causing a computer to execute: a step of detecting a delimiter candidate for dividing a record set provided by another party; a step of dividing the record set by the delimiter candidate to generate a divided record group; a step of analyzing each divided record in the divided record group with respect to the data types and constraints that can be stored in the relational database management system and generating an analysis result file group; and a step of combining the analysis result file group to generate a table definition candidate that can be stored in the relational database management system.

According to the present invention, errors such as data type mismatches and constraint violations can be made less likely to occur in table design work for a relational database management system, even when the scale of the record set provided by others becomes large.

A diagram explaining the outline of the present invention.
A block diagram showing an example of the schematic configuration of the table design support apparatus according to the present embodiment.
A flowchart showing the processing flow in the table design support apparatus according to the present embodiment.
A diagram showing an example of a record set processed by the table design support apparatus according to the present embodiment.
A diagram showing the example record set of FIG. 4 delimited by the delimiter candidate.
A diagram showing the example record set of FIG. 4 divided by the delimiter candidate.
A flowchart showing the flow of the process in FIG. 3 that determines delimiter candidates in a record set.
A diagram explaining the determination of the delimiter candidate in FIG. 3 by means of a specific example.
A diagram explaining the determination of the delimiter candidate in FIG. 3 by means of a specific example.
A flowchart showing the flow of the process in FIG. 7 that counts the number of occurrences of each existing character in one line of the record set file.
A flowchart showing the flow of the analysis process for the divided record group in FIG. 3.
A flowchart showing the flow of the data type candidate determination process in FIG. 10.
A diagram showing an example of a table of scores corresponding to the RDBMS type and data type of the executed match determination programs.
A diagram showing an example of the analysis result file obtained by combining, in step S5 of FIG. 3, the analysis result file group output in step S46 of FIG. 10 into one file.

[Features of the present invention]
Prior to the description of the embodiments of the present invention, an outline of the features of the present invention is given first.

FIG. 1 is a diagram for explaining the outline of the present invention. It is a block diagram showing an example of the schematic configuration of a table design support apparatus 1 according to the present invention. The table design support apparatus 1 includes a delimiter candidate detection unit 2, a record dividing unit 3, a field analysis unit 4, and an analysis result combining unit 5.

The delimiter candidate detection unit 2 detects a character (delimiter candidate) for appropriately dividing a record set provided by another party. The record dividing unit 3 divides the record set by the delimiter candidate received from the delimiter candidate detection unit 2 and generates a divided record group. The field analysis unit 4 analyzes each divided record in the divided record group with respect to the data types that can be stored and constraints such as uniqueness and the presence or absence of NULL values, and generates an analysis result file group. The analysis result combining unit 5 combines the analysis result file group to generate a table definition candidate that can be stored in the RDBMS.

The table design support apparatus 1 divides a record set provided by another party to generate a divided record group and analyzes the data type and constraints of each divided record. In this way it can automatically generate a table definition candidate that can be stored in a relational database management system (RDBMS). As a result, errors such as data type mismatches and constraint violations can be made less likely to occur in table design work for an RDBMS, even when the scale of the record set provided by others becomes large.

Embodiments of the present invention will be described below with reference to the drawings.
First, the schematic configuration of the table design support apparatus 200 according to the present embodiment is described.
FIG. 2 is a block diagram showing an example of the schematic configuration of the table design support apparatus 200 according to the present embodiment. The table design support apparatus 200 outputs to the client terminal 100 a table definition candidate with which the input record set can be stored in the RDBMS. As shown in FIG. 2, the table design support apparatus 200 includes an input information receiving unit 201, an input information storage unit 202, a delimiter candidate detection unit 203, a delimiter candidate transmission unit 204, a delimiter receiving unit 205, a record dividing unit 206, a field analysis unit 207, an analysis result combining unit 208, an analysis result transmission unit 209, and a storage device 220.

The record set 111 provided by another party and the RDBMS type information are stored in the storage device 110 of the client terminal 100. The RDBMS type information is information on the type of RDBMS, for example SQL99, Oracle, MySQL, or PostgreSQL.

The input information receiving unit 201 receives the data (the record set 111 and the RDBMS type information 211) transmitted from the client terminal 100 (input information transmitting unit 102). The input information storage unit 202 stores the data received by the input information receiving unit 201 in the storage device 210 as the RDBMS type information 211 and the record set 212.

The delimiter candidate detection unit 203 detects a character for appropriately dividing the record set 212. When the detection result of the delimiter candidate detection unit 203 is not uniquely determined, the delimiter candidate transmission unit 204 sends the delimiter candidates to the client terminal 100 (delimiter candidate receiving unit 103) so that the table designer can confirm them. In this case, the table designer selects a delimiter from among the delimiter candidates and inputs it from the client terminal 100 (input device 101).

The delimiter receiving unit 205 receives the delimiter input by the table designer from the client terminal 100 (delimiter transmitting unit 104). The record dividing unit 206 divides the record set 212 by the delimiter received from the delimiter candidate detection unit 203 or the delimiter receiving unit 205 and generates a divided record group 213. The divided record group 213 is stored in the storage device 210 as temporary data.

The field analysis unit 207 analyzes each divided record in the divided record group 213 and generates an analysis result file group 215. The analysis result combining unit 208 combines the analysis result files included in the analysis result file group 215 to generate a single analysis result file (a table definition candidate that can be stored in the RDBMS). The analysis result transmission unit 209 transmits the table definition candidate to the client terminal 100 (analysis result receiving unit 105). The storage device 220 stores the determination programs (a data type determination program 221, a NULL value determination program 222, and a uniqueness determination program 223) that the field analysis unit 207 uses when analyzing the divided record group 213.

The delimiter candidate detection unit 203, the record dividing unit 206, the field analysis unit 207, and the analysis result combining unit 208 may each be implemented, in part or in whole, by software (a program) or by a hardware circuit.

Next, the operation of the table design support apparatus 200 is described. The following description also refers to FIG. 2 as appropriate.
FIG. 3 is a flowchart showing the flow of processing in the table design support apparatus 200. As shown in FIG. 3, in step S1 the input information receiving unit 201 receives the RDBMS type information and the data of the record set 111 from the client terminal 100, and the received data is saved in the storage device 210 via the input information storage unit 202. The record set saved in the storage device 210 is referred to as the record set 212, and the RDBMS type information as the RDBMS type information 211.

Next, in step S2, the delimiter candidate detection unit 203 analyzes the record set 212 and detects delimiter candidates that can divide the record set 212 into multiple columns. When the detected delimiter candidate is not a single one (several were detected, or no suitable delimiter was found), the candidates are sent from the delimiter candidate transmission unit 204 to the delimiter candidate receiving unit 103 on the client terminal side for confirmation by the table designer. When the table designer selects a delimiter on the input device 101, it is sent to the record dividing unit 206 via the delimiter transmitting unit 104 and the delimiter receiving unit 205. When the delimiter candidate detection unit 203 detects a single delimiter candidate, that candidate is sent directly to the record dividing unit 206. Details of the process for determining the delimiter candidate are described later.
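The branching in step S2 can be sketched as follows. This is only an illustrative sketch: the function name `resolve_delimiter` and the `ask_designer` callback are hypothetical stand-ins for the exchange between the apparatus and the client terminal, and are not taken from the patent.

```python
def resolve_delimiter(candidates, ask_designer):
    """Pick the delimiter used for splitting (step S2).

    candidates   -- delimiter candidates found by automatic detection
    ask_designer -- hypothetical callback standing in for the round trip to the
                    client terminal; it receives the candidates and returns the
                    delimiter chosen by the table designer
    """
    if len(candidates) == 1:
        # Exactly one candidate: pass it straight to the record dividing unit.
        return candidates[0]
    # Zero or several candidates: the table designer decides.
    return ask_designer(candidates)
```

For example, `resolve_delimiter([","], ask)` returns `","` without calling `ask`, while an empty or ambiguous candidate list defers to the callback.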

Next, in step S3, the record dividing unit 206 divides the record set 212 into fields using the delimiter determined in step S2 and saves the result in the storage device 210 as the divided record group 213. The division of the record set 212 is described concretely with reference to FIGS. 4 to 7.

FIG. 4 is a diagram showing a record set R100, which is an example of the record set 212. FIG. 5 shows the record set R100 of FIG. 4 delimited by the delimiter candidate S (in this example "," (comma)) detected by the delimiter candidate detection unit 203. The record set R100 is divided at the positions of the delimiter candidate S. After the record set R100 is divided, multiple divided records R101, R102, R103, and R104 are generated as shown in FIG. 6. A divided record group R105 (an example of the divided record group 213) consisting of the divided records R101, R102, R103, and R104 is saved in the storage device 210.
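The division into per-field divided records can be sketched as below. This is a minimal sketch under stated assumptions: the function name `split_records` is hypothetical, lines are split naively with `str.split` (no quoting or escaping rules, which the text does not discuss), and lines whose delimiter count differs from the most common count are dropped as exception lines, following the description of step S26.

```python
from collections import Counter


def split_records(lines, delimiter):
    """Divide a record set into one divided record (column) per field."""
    if not lines:
        return []
    # Number of delimiters on each line.
    counts = [line.count(delimiter) for line in lines]
    # The most common delimiter count defines the expected number of fields.
    expected = Counter(counts).most_common(1)[0][0]
    # Lines with a different count are treated as exception lines and skipped.
    rows = [line.split(delimiter) for line, n in zip(lines, counts) if n == expected]
    # Transpose rows into columns: each column is one divided record.
    return [list(column) for column in zip(*rows)]
```

With the lines `"1,Tokyo,10"`, `"2,Osaka,20"`, and `"3,Nagoya"`, the third line has only one comma and is excluded, so the result is three columns of two values each.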

As shown in FIG. 3, following step S3, in step S4 the field analysis unit 207 analyzes each divided record of the generated divided record group 213 for the presence or absence of NULL values, the presence or absence of uniqueness, and the data type and data length based on the RDBMS type information 211. The group of storable table definition candidates for each field is then saved in the storage device 210 as the analysis result file group 215. Details of the analysis process for the divided record group 213 are described later.

Next, in step S5, the analysis result combining unit 208 combines the per-field analysis result file group 215 into a single file. This file is a table definition candidate that can be stored in the RDBMS. The table definition candidate is transmitted to the client terminal 100 via the analysis result transmission unit 209.

The process for determining the delimiter candidate in step S2 of FIG. 3 is described below.
FIG. 7 is a flowchart showing the flow of the process for determining delimiter candidates in the record set 212. As shown in FIG. 7, first, in step S21, the existing character list is initialized. An existing character is a character that has already been processed in the processing flow for determining delimiter candidates. When the process for determining the delimiter candidate is complete, every character contained in the record set 212 has been registered in the existing character list. Because no existing characters are known when the process starts, an initialization that clears the existing character list is performed.

Next, in step S22, one line of the record set 212 is read. The first line of the record set 212 is read first; after the processing of steps S23 and S24 described below, the flow returns to step S22 and reads the next line. Steps S22 to S24 are repeated until the last line of the record set 212.

Next, in step S23, the number of occurrences of each character is counted for the line read in step S22. Specifically, each character in the record set 212 is checked against and registered in the existing character list, and the number of occurrences of each character on each line is stored in an associative array (hash) keyed by line number and existing character. Details of the process for checking and registering characters in the existing character list are described later.

In step S24, it is determined whether the line that was read is the last line of the record set 212. If it is, then in step S25 the occurrence counts of each character registered in the existing character list are tallied for each line of the record set 212. Then, for each character registered in the existing character list, the appearance count match rate is calculated from the per-line occurrence counts.

Following step S25, the delimiter candidate is determined in step S26. Specifically, the delimiter candidate detection unit 203 tallies, for each line of the record set 212, the number of occurrences of each character contained in the record set 212. The delimiter candidate detection unit 203 then calculates, for each character contained in the record set 212, the appearance count match rate, which is the ratio of the largest number of lines sharing the same occurrence count to the total number of lines, and determines the delimiter candidate based on this rate. A character whose appearance count match rate is 1 occurs the same number of times on every line of the record set 212, so that character is taken as the delimiter candidate. If no character has an appearance count match rate of 1, the character with the highest rate is taken as the delimiter candidate. When a character whose rate is not 1 is selected as the delimiter candidate, lines with a different occurrence count are excluded as exception lines when the record dividing unit 206 generates the divided records, and are not subject to subsequent processing. The process ends after the delimiter candidate is determined.
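A minimal sketch of steps S25 and S26 follows. The names `match_rates` and `delimiter_candidates` are hypothetical. One point the text leaves open is how lines on which a character does not occur are treated; the sketch assumes such lines do not count toward agreement, since a delimiter has to appear on the line. It also returns every character tied for the highest rate, so that a tie can be referred to the table designer.

```python
from collections import Counter


def match_rates(lines):
    """Return {character: appearance count match rate} for the given lines."""
    # Occurrences of each character on each line.
    per_line = [Counter(line) for line in lines]
    # Every character seen anywhere in the record set.
    seen = set().union(*per_line) if per_line else set()
    rates = {}
    for ch in seen:
        # How many lines share each non-zero occurrence count of ch
        # (assumption: lines without ch are not counted as matching).
        tally = Counter(c[ch] for c in per_line if c[ch] > 0)
        rates[ch] = max(tally.values()) / len(lines)
    return rates


def delimiter_candidates(lines):
    """Return the characters with the highest appearance count match rate."""
    rates = match_rates(lines)
    if not rates:
        return []
    best = max(rates.values())
    return sorted(ch for ch, rate in rates.items() if rate == best)
```

For the three lines `"1,apple,red,10"`, `"2,kiwi,green,25"`, and `"3,banana,yellow,7"`, only the comma occurs the same number of times (three) on every line, so it is the sole candidate with a rate of 1.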

If it is determined in step S24 that the line that was read is not the last line of the record set 212, the flow returns to step S22, where the next line is read.

FIG. 8 is a diagram explaining the determination of the delimiter candidate by means of a specific example. FIG. 8A is a diagram showing an example of the associative array (hash) of line numbers and characters registered in the existing character list (existing characters). The numbers in the associative array (hash function values) are the occurrence counts of the existing characters. For convenience of explanation, the record set 212 is assumed to have six lines. As shown in FIG. 8A, the existing character "," (comma) occurs three times on each of lines 1 to 6 of the record set 212, so its appearance count match rate is (largest number of lines sharing the same occurrence count) / (total number of lines) = 6 [lines] / 6 [lines] = 1.00.

For existing characters other than "," (comma), take the existing character "a" as an example. Its occurrence counts on the lines of the record set 212 are: once on line 1, none on line 2, three times on line 3, three times on line 4, once on line 5, and three times on line 6. For "a", the most common occurrence count is three, found on three lines (lines 3, 4, and 6). In other words, the largest number of lines sharing the same occurrence count is three. The appearance count match rate is therefore (largest number of lines sharing the same occurrence count) / (total number of lines) = 3 [lines] / 6 [lines] = 0.50.

As shown in FIG. 8B, among the existing characters only "," (comma) has an appearance count match rate of 1.00, so "," (comma) is taken as the delimiter candidate (determination result: ○).

The process in step S23 of FIG. 7, which counts the number of occurrences of each existing character in one line of the record set 212, is described below.
FIG. 9 is a flowchart showing the flow of the process for counting the number of occurrences of each existing character in one line of the record set 212. As shown in FIG. 9, first, in step S231, one character of the line is read. The first character of the line is read first; after the processing of steps S232 to S235 described below, the flow returns to step S231 and reads the next character. Steps S231 to S235 are repeated until the last character of the line.

Next, in step S232, it is checked whether the character that was read is registered in the existing character list. If it is, the flow proceeds to step S234. If it is not, the character is registered in the existing character list in step S233, and the flow then proceeds to step S234.

Next, in step S234, the value stored in an associative array (hash) keyed by row number and existing character is incremented, which records the number of appearances of each existing character in the row being read. Next, in step S235, it is determined whether the character being processed is the last character in the row. If it is the last character, the process ends. If it is not the last character, the process returns to step S231, where the next character is read.
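Steps S231 to S235 can be sketched as follows. This is a minimal illustration, assuming that the existing character list is an ordinary Python list and that the associative array is a dictionary keyed by (row number, character); the names `count_line`, `seen_chars`, and `counts` are hypothetical and do not appear in the embodiment.

```python
from collections import defaultdict


def count_line(line_no, line, seen_chars, counts):
    """Count the appearances of each character in one row (steps S231 to S235)."""
    for ch in line:                    # S231: read the row one character at a time
        if ch not in seen_chars:       # S232: is it in the existing character list?
            seen_chars.append(ch)      # S233: register a character seen for the first time
        counts[(line_no, ch)] += 1     # S234: increment the entry for (row, character)
    # S235: the loop ends once the last character of the row has been processed


seen_chars = []              # existing character list, in order of first appearance
counts = defaultdict(int)    # associative array keyed by (row number, character)
count_line(1, "a,b,a", seen_chars, counts)
count_line(2, "c,,a", seen_chars, counts)
```

Because the existing character list and the associative array are passed in, the same two objects can accumulate results across every row of a record set.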

The analysis process applied to the divided record group 213 in step S4 of FIG. 3 is described below.
FIG. 10 is a flowchart showing the flow of this analysis process. As shown in FIG. 10, in step S41, one of the divided records in the divided record group 213 is read. After a divided record is read, steps S42 to S45 (described below) are performed, and the process returns to step S41 to read the next divided record. Steps S41 to S45 are repeated until every divided record in the divided record group 213 has been processed.

Next, in step S42, a NULL value determination program determines, for each divided record in the divided record group 213, whether the record contains data whose character string length is 0 (NULL value determination). When the NULL value determination finds a divided record that contains a NULL value, the NULL value is removed from that divided record. The divided record group obtained by applying this NULL value removal to the divided record group 213 is the post-process divided record group 214 (see FIG. 2). If no NULL value is found in any divided record of the divided record group 213, the contents of the divided record group 213 and the post-process divided record group 214 are identical.
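The NULL value determination of step S42 can be sketched as follows. The sketch assumes that one divided record is represented as a list of strings holding the values of a single field; that representation and the function name `remove_null_values` are hypothetical.

```python
def remove_null_values(divided_record):
    """Return (post-process divided record, whether a NULL value was found)."""
    # A NULL value is data whose character string length is 0.
    has_null = any(len(value) == 0 for value in divided_record)
    processed = [value for value in divided_record if len(value) > 0]
    return processed, has_null
```

For example, `remove_null_values(["10", "", "7"])` yields `(["10", "7"], True)`, while a record without empty strings is returned unchanged together with `False`.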

Next, in step S43, it is determined whether the post-process divided record, from which NULL values were removed in step S42, is unique. Specifically, a uniqueness determination program first sorts the post-process divided record row by row and then compares adjacent rows to check whether any rows are duplicated. If duplicate rows are found in the post-process divided record, a row integration process merges them into a single row. If the number of rows in the post-process divided record is the same before and after the row integration process, the post-process divided record is determined to be unique.
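The uniqueness determination of step S43 can be sketched as follows, again assuming that a post-process divided record is a list of strings (one per row). The function name `is_unique` is hypothetical.

```python
def is_unique(processed_record):
    """True if sorting and merging adjacent duplicates leaves the row count unchanged."""
    rows = sorted(processed_record)            # sort row by row
    merged = []
    for row in rows:
        # Compare each row with the previous one; keep only the first of a duplicate run.
        if not merged or merged[-1] != row:
            merged.append(row)
    return len(merged) == len(rows)            # same row count before and after merging
```

Sorting first means that duplicates are always adjacent, so a single pass comparing neighbouring rows is enough to find them.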

Next, in step S44, the data type candidates that can be used to store the post-process divided record (from which NULL values were removed in step S42) in the RDBMS are determined based on the RDBMS type information 211. The details of this data type candidate determination process are described later.

Next, in step S45, it is determined whether all the divided records in the divided record group 213 have been processed. If all of them have been processed, the process proceeds to step S46, in which the analysis results for the divided record group 213 are output as the analysis result file group 215. If not all of them have been processed, the process returns to step S41, where the next divided record is read.

The data type candidate determination process in step S44 of FIG. 10 is described below.
FIG. 11 is a flowchart showing the flow of the data type candidate determination process. As shown in FIG. 11, the RDBMS type information is read in step S441.

Next, in step S442, a match determination program is loaded for one of the data types (for example, the INTEGER type or the CHAR type) registered for the RDBMS type (for example, SQL99, Oracle, MySQL, or PostgreSQL) obtained from the RDBMS type information. A different match determination program is provided for each data type defined for the RDBMS type. After the match determination program for one registered data type is loaded, steps S443 to S445 (described below) are performed, and the process returns to step S442 to load the match determination program for the next registered data type. Steps S442 to S445 are repeated until the match determination programs for all registered data types have been executed.

Next, in step S443, the loaded match determination program is executed on the post-process divided record from which NULL values were removed in step S42, and it is determined whether the data in the post-process divided record matches the data type of the match determination program being executed. The determination method is described concretely below, using as examples the match determination programs for the INTEGER type and the CHAR type among the data types defined in the standard SQL specification SQL99.

The INTEGER type in SQL99 is an integer data type that can store integers in the range of -2147483648 to 2147483647. The match determination program for the SQL99 INTEGER type therefore first checks whether the post-process divided record contains any non-integer value. If it contains none (that is, every value is an integer), the program finds the maximum and minimum values of the data in the post-process divided record, and if both are within the range of the INTEGER type, it determines that the post-process divided record can be stored as the INTEGER type. The CHAR type in SQL99 is a data type made up of characters, so the match determination program for the SQL99 CHAR type first checks whether the data in each row of the post-process divided record is a character string. If every row is a character string, the program calculates the character string length of each row and takes the maximum of the calculated lengths as the character string length of the CHAR type.
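The two match determinations described above can be sketched as follows. The integer range is the one stated in the text; everything else is an assumption made for illustration: the function names are hypothetical, an integer is taken to be an optional sign followed by ASCII digits, the values of a post-process divided record are taken to be strings, and an empty record is treated as not matching the INTEGER type.

```python
import re

INTEGER_MIN, INTEGER_MAX = -2147483648, 2147483647   # range stated for the INTEGER type
_INTEGER_PATTERN = re.compile(r"[+-]?[0-9]+")          # assumed integer notation


def matches_integer(values):
    """True if every value is an integer and min/max fall within the INTEGER range."""
    if not values or not all(_INTEGER_PATTERN.fullmatch(v) for v in values):
        return False                                   # empty record or a non-integer value
    numbers = [int(v) for v in values]
    return INTEGER_MIN <= min(numbers) and max(numbers) <= INTEGER_MAX


def char_length(values):
    """Character string length for the CHAR type: the longest value in the record."""
    return max((len(v) for v in values), default=0)
```

For example, `matches_integer(["1", "2147483648"])` is false because the maximum exceeds the range, and `char_length(["ab", "abcd", "a"])` is 4.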

Following step S443, the result of the match determination is recorded in step S444. If step S443 determines that the data type matches, the score derived from the RDBMS type and the data type of the executed match determination program is recorded as the "valid score" of the data type corresponding to that program. If, on the other hand, the match determination program executed in step S443 determines that the data type does not match, no score is recorded for the corresponding data type. FIG. 12 shows an example of a table of scores corresponding to the RDBMS type and the data type of the executed match determination program.

As shown in FIG. 11, in step S445, which follows step S444, it is determined whether the data type match determination has been completed for all registered data types. If it has been completed for all registered data types, the process proceeds to step S446. If it has not, the process returns to step S442 and the next registered data type is read.

In step S446, the data types for which a "valid score" was recorded in step S444 are taken as the data type candidates capable of storing the divided record group 213. These data type candidates are then sorted in descending order of valid score to determine their priority.
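The loop of steps S442 to S446 can be sketched as follows. The score values below are hypothetical placeholders and are not the values of FIG. 12; the matcher functions, the SMALLINT range of -32768 to 32767, and all names are likewise assumptions made for illustration, and the integer check is deliberately simplified.

```python
def _ints_within(values, low, high):
    """Simplified check: every value parses as an int and lies within [low, high]."""
    try:
        return all(low <= int(v) <= high for v in values)
    except ValueError:
        return False


# Hypothetical match determination programs, one per registered data type.
MATCHERS = {
    "SMALLINT": lambda values: _ints_within(values, -32768, 32767),
    "INTEGER": lambda values: _ints_within(values, -2147483648, 2147483647),
    "CHAR": lambda values: all(isinstance(v, str) for v in values),
}

# Hypothetical scores; the actual values would come from a table such as FIG. 12.
SCORES = {"SMALLINT": 30, "INTEGER": 20, "CHAR": 10}


def rank_data_type_candidates(values, matchers=MATCHERS, scores=SCORES):
    """Return the matching data types, highest valid score first."""
    valid_scores = {}
    for type_name, matcher in matchers.items():       # S442: next registered data type
        if matcher(values):                           # S443: run the match determination
            valid_scores[type_name] = scores[type_name]   # S444: record the valid score
    # S446: sort the candidates in descending order of valid score
    return sorted(valid_scores, key=valid_scores.get, reverse=True)
```

With these placeholder scores, a field holding small integers ranks SMALLINT first, then INTEGER, then CHAR, while a field containing non-numeric text yields CHAR only.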

FIG. 13 shows an example of the analysis result file (table definition candidates that can be stored in the RDBMS) obtained in step S5 of FIG. 3 by combining into a single file the analysis result file group output in step S46 of FIG. 10. As shown in FIG. 13, for the first field of the record set (field 1), for example, the data type priorities are, from highest to lowest, the SMALLINT type (priority 1), the INTEGER type (priority 2), and the CHAR or VARCHAR type (priority 3). Field 1 is also unique and contains no NULL values.

The table design support apparatus 200 described in the above embodiment can automatically generate, for a record set provided by another party, table definition candidates that can be stored in the RDBMS. As a result, even when the record set provided by another party becomes large, errors such as data type mismatches and constraint violations are less likely to occur in table design work for a relational database management system.

The present invention has been described above with reference to the embodiment, but it is not limited to the configuration of that embodiment. It naturally includes the various variations, modifications, and combinations that a person skilled in the art could make within the scope of the invention defined by the claims of the present application.

For example, in the above embodiment the client terminal 100 and the table design support apparatus 200 are configured as separate devices, but they may be integrated into a single device. Likewise, the components of the table design support apparatus 200 are configured as one unit, but they may be configured separately. If the field analysis unit 207 is placed in a plurality of different devices, the divided record group 213 can be analyzed in parallel, which makes it possible to shorten the processing time. When the field analysis unit 207 is placed in a plurality of different devices, the divided record group 213, the post-process divided record group 214, and the analysis result file group 215 are configured so that they can be shared among those devices.

1 Table design support apparatus
2 Delimiter candidate detection unit
3 Record dividing unit
4 Field analysis unit
5 Analysis result combining unit

Claims

A table design support apparatus that supports table design for a relational database management system, comprising:
a delimiter candidate detection unit that detects a delimiter candidate for dividing a record set provided by another party;
a record dividing unit that divides the record set by the delimiter candidate to generate a divided record group;
a field analysis unit that, for each divided record in the divided record group, analyzes data types and constraints with which the divided record can be stored in the relational database management system, and generates an analysis result file group; and
an analysis result combining unit that combines the analysis result file group to generate table definition candidates that can be stored in the relational database management system,
wherein the delimiter candidate detection unit tallies, for each row of the record set, the number of appearances of each character included in the record set, calculates, for each character included in the record set, an appearance count match rate that is the ratio of the maximum number of rows having the same number of appearances to the total number of rows, and determines the delimiter candidate based on the appearance count match rate.

The table design support apparatus according to claim 1, wherein the field analysis unit determines, for each of a plurality of data types defined in the relational database management system, whether the divided record matches the data type, and thereby determines data type candidates for the divided record.

The table design support apparatus according to claim 1 or 2, wherein the field analysis unit analyzes whether each divided record in the divided record group contains a NULL value and, when a divided record containing a NULL value is found, removes the NULL value from that divided record to generate a post-process divided record.

The table design support apparatus according to claim 3, wherein the field analysis unit sorts the post-process divided record row by row, checks whether duplicate rows exist by comparing adjacent rows, performs, when duplicate rows are found in the post-process divided record, a row integration process that merges the duplicate rows into a single row, and determines that the post-process divided record is unique when its number of rows is the same before and after the row integration process.

A table design support method for supporting table design for a relational database management system by means of a table design support apparatus comprising a delimiter candidate detection unit, a record dividing unit, a field analysis unit, and an analysis result combining unit, the method comprising the steps of:
the delimiter candidate detection unit detecting a delimiter candidate for dividing a record set provided by another party;
the record dividing unit dividing the record set by the delimiter candidate to generate a divided record group;
the field analysis unit analyzing, for each divided record in the divided record group, data types and constraints with which the divided record can be stored in the relational database management system, to generate an analysis result file group; and
the analysis result combining unit combining the analysis result file group to generate table definition candidates that can be stored in the relational database management system,
wherein, in the step of detecting the delimiter candidate, the delimiter candidate detection unit tallies, for each row of the record set, the number of appearances of each character included in the record set, calculates, for each character included in the record set, an appearance count match rate that is the ratio of the maximum number of rows having the same number of appearances to the total number of rows, and determines the delimiter candidate based on the appearance count match rate.

A control program that supports table design for a relational database management system, the control program comprising the steps of:
detecting a delimiter candidate for dividing a record set provided by another party;
dividing the record set by the delimiter candidate to generate a divided record group;
analyzing, for each divided record in the divided record group, data types and constraints with which the divided record can be stored in the relational database management system, and generating an analysis result file group; and
combining the analysis result file group to generate table definition candidates that can be stored in the relational database management system,
wherein, in the step of detecting the delimiter candidate, the number of appearances of each character included in the record set is tallied for each row of the record set, an appearance count match rate that is the ratio of the maximum number of rows having the same number of appearances to the total number of rows is calculated for each character included in the record set, and the delimiter candidate is determined based on the appearance count match rate.