JP2015014847A - Design assist system, design assist method, and program

Info

Publication number: JP2015014847A
Application number: JP2013140035A
Authority: JP
Inventors: 英志木村; Hideshi Kimura
Original assignee: Hitachi Systems Ltd
Current assignee: Hitachi Systems Ltd
Priority date: 2013-07-03
Filing date: 2013-07-03
Publication date: 2015-01-22

Abstract

PROBLEM TO BE SOLVED: To facilitate application of a parallel distributed processing technique by assisting, in a system design taking advantage of the parallel distributed processing technique, the design of a case in which an especially large amount of data is handled.

SOLUTION: In a data extraction/processing design assist system 100, a mode evaluation subsystem 300 extracts a feature amount from inputted processing mode data and evaluates a data processing mode from the feature amount and processing mode metadata. A sample execution subsystem 400 extracts a candidate as distributed processing from the processing mode data, and predicts a processing time for data to be processed from a sample data processing time which is a processing time based on the processing mode data for sample data obtained by adding the evaluation result of a mode evaluation unit to the extraction result. A data management subsystem 500 generates a processing mode feature amount on the basis of the feature amount extracted by the mode evaluation unit, and outputs the processing mode feature amount and the processing time for the data to be processed as predicted by the sample execution subsystem 400.

Description

The present invention relates to a design support system, a design support method, and a program, and more particularly to a design support technique for use when applying parallel distributed processing technology.

In recent years, parallel distributed processing technology has made it possible to process large volumes of data that previously could not be processed in a realistic time. Data processing with this technology is very effective at speeding up the processing of large volumes of data, and is applied to statistical processing, mining processing, large-scale batch processing, and the like. Before it is applied, one first determines whether parallel distributed processing is possible and then carries out the implementation.

Known techniques of this kind include, for example, one that efficiently performs parallel distributed processing on series data, which is poorly suited to parallel distributed processing (see, for example, Patent Document 1), and one that speeds up COBOL (COmmon Business Oriented Language) batch processing by making it amenable to parallel distributed processing (see, for example, Non-Patent Document 1).

Patent Document 1: JP 2011-150503 A

Non-Patent Document 1: "Hadoop blind spot, parallelization of COBOL batch processing" (URL: http://www.atmarkit.co.jp/fjava/rensai4/trouble_knowhow08/01.html)

Parallel distributed processing technology is very effective in that it can handle large volumes of data at high speed. However, when implementing data extraction/processing such as data analysis, appropriately designing how to distribute the processing and in what order to execute it is technically difficult.

This is because there are cases in which the conditions that can be specified for data extraction are restricted, or in which the output data becomes so large that I/O (Input/Output) performance degrades.

Special techniques for solving such problems, and services that provide consulting on data processing, exist on the market. However, precisely because large volumes of data are being processed, the trial itself can be difficult.

As described above, Non-Patent Document 1 states that existing batch processing can be parallelized and distributed, and Patent Document 1 describes a technique for performing parallel distribution. Neither, however, makes clear how to judge whether the technique can be applied to the data that will actually be handled, and applying them requires engineers who are well versed in these advanced techniques.

From the viewpoint of performance verification, for example, a prototype or the like must be developed to confirm the design and then tried on a large volume of data. If the design is not appropriate, the processing time naturally becomes longer, the number of trials decreases, and quality deteriorates as a result.

When data processing services such as analysis are provided to customers, this can become a bottleneck in delivering value, so consulting or highly skilled engineers must be brought in for each verification.

An object of the present invention is to provide a technique that facilitates the application of parallel distributed processing technology by supporting, in system design that uses parallel distributed processing technology, the design of cases that handle particularly large volumes of data.

The above and other objects and novel features of the present invention will become apparent from the description of this specification and the accompanying drawings.

Of the inventions disclosed in the present application, an outline of a representative one is briefly described as follows.

That is, the representative invention is applied to a design support system and has the following features.

By supporting, in system design that uses parallel distributed processing technology, the design of cases that handle particularly large volumes of data, the design support system makes it easy to apply parallel distributed processing technology even for those who are not highly skilled engineers proficient in parallel distributed algorithms in addition to data processing techniques.

This design support system includes a method evaluation unit, a sample execution processing unit, and a data management unit. The method evaluation unit extracts feature amounts from the input processing method data, and evaluates the data processing method from the extracted feature amounts and processing method metadata.

The sample execution processing unit extracts, from the processing method data, candidates for distributed processing as sample data. It then predicts the processing time of the processing target data from the sample data processing time, which is the time required to execute processing based on the processing method data on the sample data to which the evaluation result of the method evaluation unit has been added.

The data management unit generates processing method feature amounts based on the feature amounts extracted by the method evaluation unit, and outputs the processing method feature amounts and the processing time of the processing target data predicted by the sample execution processing unit.

The present invention can also be applied to a method performed by a system that takes data in various formats as input and provides suitable candidates for parallel distributed processing together with a way to verify the selected processing method, and to a program that causes a computer system to function as that system.

The effects obtained by a representative one of the inventions disclosed in the present application are briefly described as follows.

Parallel distributed processing design can be made more efficient.

FIG. 1 is an explanatory diagram showing an example of the configuration of a data extraction / processing design support system according to one embodiment.
FIG. 2 is a flowchart showing an example of the processing of the method evaluation subsystem included in the data extraction / processing design support system of FIG. 1.
FIG. 3 is an explanatory diagram showing an example of the data format of the processing method feature amounts stored in the processing target data group included in the data extraction / processing design support system of FIG. 1.
FIG. 4 is an explanatory diagram showing an example of the data format of the data feature amounts stored in the processing target data group.
FIG. 5 is an explanatory diagram showing an example of the data format of the processing method feature amounts extracted by the format feature extraction unit included in the data extraction / processing design support system of FIG. 1.
FIG. 6 is a flowchart showing an example of the processing of the candidate process extraction unit included in the sample execution subsystem of FIG. 1.
FIG. 7 is an explanatory diagram showing an example of the data format produced by the sample data processing of the candidate process extraction unit.
FIG. 8 is a flowchart showing an example of the processing of the sample execution unit included in the sample execution subsystem.
FIG. 9 is an explanatory diagram showing an example of the display on the display unit included in the design control subsystem.

In the following embodiments, the description is divided into a plurality of sections or embodiments when necessary for convenience. Unless otherwise stated, these are not unrelated to one another; one is a modification, a detail, a supplementary explanation, or the like of part or all of another.

In the following embodiments, when the number of elements or the like (including count, numerical value, quantity, range, and so on) is mentioned, it is not limited to the specific number and may be greater or less than that number, unless otherwise stated or unless it is clearly limited to the specific number in principle.

Furthermore, in the following embodiments, it goes without saying that the constituent elements (including element steps and the like) are not necessarily essential, unless otherwise stated or unless they are clearly considered essential in principle.

Similarly, in the following embodiments, references to the shapes, positional relationships, and the like of constituent elements include those that are substantially approximate or similar to those shapes and the like, unless otherwise stated or unless this is clearly not the case in principle. The same applies to the numerical values and ranges above.

In all the drawings for explaining the embodiments, the same members are in principle denoted by the same reference symbols, and repeated description of them is omitted. To make the drawings easier to understand, hatching may be applied even in plan views.

The embodiment is described in detail below on the basis of the outline given above.

<Outline of data extraction / processing design support system>
FIG. 1 is an explanatory diagram illustrating an example of the configuration of a data extraction / processing design support system 100 according to the present embodiment.

The data extraction / processing design support system 100 is a design support system made up of a network and one or more computers that have computing capability, such as a central processing unit, and it processes large volumes of series data in parallel.

As shown in FIG. 1, the parallel distributed processing data extraction / processing design support system 100 includes a design control subsystem 200, a method evaluation subsystem 300, a sample execution subsystem 400, a data management subsystem 500, a processing target data group 601, and a processing method metadata group 602.

The design control subsystem 200 is connected to the method evaluation subsystem 300 and to the sample execution subsystem 400. The method evaluation subsystem 300, which serves as the method evaluation unit, is connected to the sample execution subsystem 400, which serves as the sample execution processing unit.

The data management subsystem 500, which serves as the data management unit, is connected to the method evaluation subsystem 300, the sample execution subsystem 400, the processing target data group 601, and the processing method metadata group 602.

The design control subsystem 200 includes an input unit 201 such as a keyboard and a display unit 202 such as a display. Using the various requests obtained from the user via the input unit 201, which include processing method definitions and processing target data, it sends requests to the method evaluation subsystem 300 and the sample execution subsystem 400 and displays the results on the display unit 202.

The method evaluation subsystem 300 includes a data evaluation unit 301 and a processing method feature amount extraction unit 302. It extracts feature amounts from the processing method data, which consists of the processing method definition and the processing target data input from the input unit 201 of the design control subsystem 200, and evaluates the data processing method using those feature amounts together with the processing method metadata from the data management subsystem 500. A processing method definition may consist only of the input/output definition of the processing, or may also include a progress log of the processing that, for example, makes the processing flow explicit.

The sample execution subsystem 400 includes a sample execution unit 401 and a candidate process extraction unit 402. For the processing method definition and processing target data input from the input unit 201, the sample execution subsystem 400 uses the evaluation performed by the method evaluation subsystem 300 and, in addition, extracts sample data for execution from the processing target data and executes processing based on the processing method definition on that sample data. The results of the executed trials are stored through the data management subsystem 500.

The data management subsystem 500 includes a format feature extraction unit 501 and a data browsing unit 502. In response to data storage requests from the method evaluation subsystem 300 and the sample execution subsystem 400, it extracts features based on the data format and stores the results, metadata, and the like in the processing target data group 601 and the processing method metadata group 602, which are databases. It also sends stored data in response to various requests.

The processing functions of the method evaluation subsystem 300, the sample execution subsystem 400, and the data management subsystem 500 described below are realized, for example, by a CPU (Central Processing Unit, not shown) of the data extraction / processing design support system 100 executing software in program form stored in a program storage memory (not shown) or the like provided in that system.

<Processing example of the method evaluation subsystem>
Next, the processing of the method evaluation subsystem 300 is described with reference to FIG. 2.

FIG. 2 is a flowchart showing an example of the processing of the method evaluation subsystem 300 included in the data extraction / processing design support system 100 of FIG. 1.

Steps S101, S102, S106, and S107 in FIG. 2 are executed by the data evaluation unit 301, and steps S103, S104, and S105 in FIG. 2 are executed by the processing method feature amount extraction unit 302.

First, from the processing method definition input via the input unit 201, the processing log is analyzed and the processing name, processing time, and detailed history of data use are extracted (step S101). Next, the input/output definition of the processing is obtained from the processing method definition input via the input unit 201 and from the detailed data-use history output by step S101 (step S102). From the input/output definition, the types and sets of input/output data, the order of use, the types of operations between input values, and the processing time are extracted.
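Steps S101 and S102 can be illustrated with a minimal sketch. The specification does not define a log format, so the comma-separated layout below (`name,elapsed_sec,inputs,outputs`, with `;` separating multiple files) and the names `LogEntry`, `parse_processing_log`, and `io_definitions` are assumptions made purely for illustration.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class LogEntry:
    process_name: str
    elapsed_sec: float
    inputs: list
    outputs: list


def parse_processing_log(text):
    """Step S101 (sketch): extract name, time and data-use history per process."""
    entries = []
    for row in csv.reader(io.StringIO(text)):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and the comment/header line
        name, elapsed, inputs, outputs = row
        entries.append(
            LogEntry(name, float(elapsed), inputs.split(";"), outputs.split(";"))
        )
    return entries


def io_definitions(entries):
    """Step S102 (sketch): input/output definition per process, in order of use."""
    return [(e.process_name, tuple(e.inputs), tuple(e.outputs)) for e in entries]


# Hypothetical log content, invented for this example.
SAMPLE_LOG = """# name,elapsed_sec,inputs,outputs
join,12.5,orders.csv;customers.csv,joined.csv
aggregate,3.0,joined.csv,summary.csv
"""

entries = parse_processing_log(SAMPLE_LOG)
definitions = io_definitions(entries)
```

Keeping the entries in log order preserves the "order of use" that the input/output definition is said to carry.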

At this time, if a similar processing method exists, the similar processing method metadata already stored in the processing method metadata group 602 is acquired through the data browsing unit 502 of the data management subsystem 500 (step S103). If no similar processing method exists, the process proceeds to step S106.

When similar processing method metadata has been acquired in step S103, it is compared with the analysis results of the processing method definition extracted in steps S101 and S102, and feature amounts are extracted (step S104). The feature amounts may be combined by averaging numerical values or by concatenating character strings.
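The combination rule in step S104 (average numbers, concatenate strings) might look like the following sketch. The function names, the `|` separator, and the dictionary representation of a feature set are assumptions, not part of the specification.

```python
def merge_feature_values(current, stored):
    """Combine two values of one feature: average numbers, join strings."""
    if isinstance(current, (int, float)) and isinstance(stored, (int, float)):
        return (current + stored) / 2
    # Assumed separator; the text only says strings may be concatenated.
    return f"{stored}|{current}"


def merge_features(current, stored):
    """Merge newly analysed features with stored metadata of a similar method."""
    merged = dict(stored)
    for key, value in current.items():
        if key in stored:
            merged[key] = merge_feature_values(value, stored[key])
        else:
            merged[key] = value
    return merged


# Illustrative values only.
merged = merge_features(
    {"time_per_mb": 2.0, "operation": "join"},
    {"time_per_mb": 4.0, "operation": "sort", "trials": 1},
)
```

Features that appear on only one side are carried over unchanged, which is one possible reading of the matching described above.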

After that, the calculated processing method feature amount is stored in the processing target data group 601 through the data management subsystem 500 (step S105), and the management units and operation units of the inputs and outputs are extracted from the input/output definition (step S106). A management unit may be a name obtained from a file name, or a use inferred from a file extension. Finally, the calculated data feature amounts are stored together with the processing target data through the data management subsystem 500 (step S107).

<Data format example of processing method feature amounts>
FIG. 3 is an explanatory diagram showing an example of the data format of the processing method feature amounts stored in the processing target data group 601 of the data extraction / processing design support system 100 of FIG. 1. As illustrated, a processing method feature amount has the fields "method ID", "detail ID", "processing name", "input type", "output type", "trial history", "processing time / data amount", and "set operation".

<Data format example of data feature amounts>
FIG. 4 is an explanatory diagram showing an example of the data format of the data feature amounts stored in the processing target data group 601. As illustrated, a data feature amount has the fields "processing ID", "argument ID", "management unit", "shared unit", "operation unit", and "data".
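The two record formats of FIG. 3 and FIG. 4 can be sketched as plain record types. Only the field names come from the text; the Python identifiers, the field types, the defaults, and the example values are assumptions for illustration.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class ProcessingMethodFeature:
    """One record in the FIG. 3 format (field types are assumed)."""
    method_id: str
    detail_id: str
    process_name: str
    input_type: str
    output_type: str
    trial_history: list = field(default_factory=list)
    time_per_data_amount: float = 0.0
    set_operation: str = ""


@dataclass
class DataFeature:
    """One record in the FIG. 4 format (field types are assumed)."""
    process_id: str
    argument_id: str
    management_unit: str
    shared_unit: str
    operation_unit: str
    data: str


# Illustrative values only; none of them appear in the specification.
method = ProcessingMethodFeature("M1", "D1", "join", "csv", "csv")
feature = DataFeature("P1", "A1", "orders", "customer_id", "row", "orders.csv")
method_row = asdict(method)
```

`asdict` gives a dictionary that could be written to whatever database backs the processing target data group.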

When data feature amounts are stored, feature amounts such as data types and value distributions are automatically calculated as processing method feature amounts by the format feature extraction unit 501 of the data management subsystem 500 and stored in the processing target data group 601. The calculated feature amounts may be data types, such as character string or integer, or aggregate and statistical results, such as a maximum value, a minimum value, or a distribution form.
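As a rough sketch of this kind of format feature extraction, the code below infers a type for each attribute and records simple statistics. The type-inference rule (try integer, then float, else string) and the names `infer_attribute_type` and `format_features` are assumptions; the specification does not say how types or distributions are determined.

```python
def infer_attribute_type(values):
    """Return 'integer', 'float' or 'string' for a column of raw text values."""
    for type_name, caster in (("integer", int), ("float", float)):
        try:
            for value in values:
                caster(value)
            return type_name
        except ValueError:
            continue
    return "string"


def format_features(records):
    """records: non-empty list of dicts with identical keys (one per record)."""
    features = {}
    for attribute in records[0]:
        column = [record[attribute] for record in records]
        attribute_type = infer_attribute_type(column)
        entry = {"attribute_type": attribute_type, "record_count": len(column)}
        if attribute_type != "string":
            numbers = [float(value) for value in column]
            entry["min"], entry["max"] = min(numbers), max(numbers)
        features[attribute] = entry
    return features


# Illustrative input only.
records = [
    {"id": "1", "price": "9.5", "name": "a"},
    {"id": "2", "price": "3", "name": "b"},
]
features = format_features(records)
```

The per-attribute type and record count correspond to the kind of fields listed for the extracted format features; minimum and maximum stand in for the "aggregate or statistical results" mentioned above.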

<Data format example of extracted processing method feature amounts>
FIG. 5 is an explanatory diagram showing an example of the data format of the processing method feature amounts extracted by the format feature extraction unit 501 of the data extraction / processing design support system 100 of FIG. 1. As illustrated, the format has the fields "data", "attribute", "attribute type", and "number of records".

<Processing example of the candidate process extraction unit>
Next, the processing of the candidate process extraction unit 402 is described with reference to FIG. 6.

FIG. 6 is a flowchart showing an example of the processing of the candidate process extraction unit 402 included in the sample execution subsystem 400 of FIG. 1.

The candidate process extraction unit 402 executes processing that calculates candidates for distributed processing on the basis of the processing target data input to the input unit 201, and the candidates are output to the display unit 202 as selectable input items.

First, when processing target data is input from the input unit 201, the candidate process extraction unit 402 acquires from the data management subsystem 500 a group of processing method feature amounts in the format shown in FIG. 3 (step S201).

Next, it issues a data evaluation request to the method evaluation subsystem 300 and obtains data feature amounts in the format shown in FIG. 4 as the result (step S202). It also issues a format evaluation request to the data management subsystem 500 and obtains data feature amounts in the format shown in FIG. 5 as the result.

From the processing method feature amount group obtained in step S201 and the data feature amounts obtained in step S202, it extracts the sets of values on which processing can be performed (step S203), and acquires the minimum number of samples from the data feature amounts of each value included in the extracted sets.
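One possible reading of step S203 is that an input set is feasible when the attribute types match the input types a processing method expects. The sketch below implements only that reading; the matching rule, the function name `feasible_input_sets`, and the example attributes are assumptions rather than the method stated in the specification.

```python
from itertools import permutations


def feasible_input_sets(method_input_types, attribute_types):
    """List ordered attribute tuples whose types match the method's inputs.

    method_input_types: tuple of type names the method expects, in order.
    attribute_types: dict mapping attribute name -> type name.
    """
    arity = len(method_input_types)
    return [
        combo
        for combo in permutations(attribute_types, arity)
        if all(attribute_types[name] == expected
               for name, expected in zip(combo, method_input_types))
    ]


# Illustrative attributes only.
attribute_types = {"id": "integer", "price": "float", "qty": "integer"}
candidates = feasible_input_sets(("integer", "float"), attribute_types)
```

Enumerating permutations is only practical for a small number of attributes; a real system would likely prune by type before combining.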

Using the extracted samples, it requests the sample execution unit 401 to execute processing on the sample data for each processing method (step S204). When the sample execution unit 401 finishes, the log from the execution is analyzed, the processing time, date and time, and so on are extracted, and the results are held as data in the format shown in FIG. 7 (step S205).

It then determines whether the sample execution time has exceeded a threshold, which is a preset time (step S206). Processing ends when the sample execution time exceeds the threshold or when execution has been performed a predetermined number of times. If the sample execution time has not exceeded the threshold, the process returns to step S203, increases the number of samples, and requests sample execution again.
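The loop of steps S203 to S206 can be sketched as follows. The doubling growth factor, the injectable `clock` parameter, and the extra stop condition when the sample already covers all the data are assumptions added for the example; the text specifies only the time threshold and the maximum number of runs.

```python
import time


def run_growing_samples(data, process, threshold_sec, max_runs,
                        start=1, growth=2, clock=time.perf_counter):
    """Run `process` on growing samples; return [(sample_size, elapsed), ...]."""
    history = []
    size = start
    for _ in range(max_runs):          # stop after a predetermined number of runs
        sample = data[:size]
        started = clock()
        process(sample)
        elapsed = clock() - started
        history.append((len(sample), elapsed))
        if elapsed > threshold_sec:    # step S206: threshold exceeded
            break
        if len(sample) == len(data):   # assumption: nothing left to add
            break
        size *= growth                 # assumption: double the sample each time
    return history


# Deterministic demonstration with a fake clock (values are invented).
ticks = iter([0, 1, 1, 3, 3, 7, 7, 15])
history = run_growing_samples(
    list(range(100)), lambda sample: None,
    threshold_sec=5, max_runs=10, clock=lambda: next(ticks),
)
```

The returned history is the kind of (sample size, processing time) series that a later estimation step could extrapolate from.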

<Processing example of the sample execution unit>
Next, the processing of the sample execution unit 401 is described.

FIG. 8 is a flowchart showing an example of the processing of the sample execution unit 401 included in the sample execution subsystem 400.

The sample execution unit 401 starts processing when execution is requested from the input unit 201 or by the candidate process extraction unit 402. It takes as input the sample data to be processed and the processing method definition.

First, the sample execution unit 401 issues a format feature extraction request to the data management subsystem 500 and acquires the data format features shown in FIG. 5 as the result (step S301). It then requests the method evaluation subsystem 300 to extract the feature amounts of the sample data to be processed and acquires the data feature amounts shown in FIG. 4 as the result (step S302).

Once the data feature amounts have been acquired, it executes processing based on the processing method definition with the sample data as input (step S303) and outputs execution history information (step S304).

<Display example of the display unit>
FIG. 9 is an explanatory diagram illustrating an example of the display on the display unit 202 included in the design control subsystem 200.

On the display unit 202, a data feature display window 2022 is displayed at the upper left of FIG. 9, and a processing method display window 2023 is displayed at the lower left, below the data feature display window 2022.

A processing input method display window 2024 is displayed to the right of the processing method display window 2023, and an execution button 2025 is displayed below the processing input method display window 2024.

An output sample display window 2026 is displayed to the right of the data feature display window 2022, and a processing time estimation display window 2027 is displayed below the output sample display window 2026.

The data feature display window 2022 displays, in list form, the data feature amounts and data format feature amounts stored in the formats shown in FIGS. 4 and 5. The processing method display window 2023 and the processing input method display window 2024 display the list of processing methods stored in the format shown in FIG. 3 and the input sets (tuples) for each processing method. Only input sets that can be combined according to the data format feature amounts are shown.

The processing method definition selected in the processing method display window 2023 or the processing input method display window 2024 is executed by the sample execution unit 401 when the execution button 2025 is pressed. The execution results from the sample execution unit 401 are displayed in the output sample display window 2026 and the processing time estimation display window 2027.

The output sample display window 2026 displays the output of the selected processing method, either as text or as a graph. The processing time estimation display window 2027 displays the data obtained as the execution result, in the sample execution monitoring result format shown in FIG. 7, together with an approximate curve. Estimating the processing time for a large volume of data from the processing times measured on small samples supports judgments about feasibility.
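The text does not say what kind of approximate curve is used. As one hedged illustration, the sketch below fits a power law t = a * n^b to (sample size, processing time) pairs by least squares on logarithms and extrapolates to the full data size. The power-law model and the names `fit_power_law` and `predict_time` are assumptions, and the sample measurements are invented.

```python
import math


def fit_power_law(samples):
    """Least-squares fit of log(t) = log(a) + b * log(n); returns (a, b).

    samples: [(n, t), ...] with at least two distinct n and all t > 0.
    """
    xs = [math.log(n) for n, _ in samples]
    ys = [math.log(t) for _, t in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def predict_time(samples, target_size):
    """Extrapolate the fitted curve to the full data size."""
    a, b = fit_power_law(samples)
    return a * target_size ** b


# Invented measurements: time doubles when the sample size doubles.
samples = [(1000, 0.5), (2000, 1.0), (4000, 2.0)]
estimate = predict_time(samples, 1_000_000)
```

An exponent b well above 1 on small samples would be an early warning that the chosen processing method may not scale to the full data volume.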

As described above, if engineers accumulate parallel distributed processing method definitions in advance, the extraction of analysis methods and the evaluation of performance can be carried out easily in the design of parallel distributed processing, even in cases where large amounts of data are handled.

In addition, because the screens make it possible to grasp the effect of distributed parallel processing conceptually, users can concentrate on verifying the effective use of distributed processing and data analysis technologies without direct guidance from an advanced analyst.

The invention made by the present inventor has been described above specifically on the basis of the embodiment. Needless to say, however, the present invention is not limited to that embodiment, and various modifications can be made without departing from its gist.

The present invention is not limited to the embodiment described above and includes various modifications. For example, the embodiment has been described in detail to explain the present invention in an easily understandable manner, and the invention is not necessarily limited to one having all of the configurations described.

A part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. In addition, for a part of the configuration of each embodiment, other configurations can be added, deleted, or substituted.

100 Data extraction/processing design support system
200 Design control subsystem
201 Input unit
202 Display unit
2022 Data feature display window
2023 Processing method display window
2024 Processing input method display window
2025 Execution button
2026 Output sample display window
2027 Processing time estimation display window
300 Method evaluation subsystem
301 Data evaluation unit
302 Processing method feature amount extraction unit
400 Sample execution subsystem
401 Sample execution unit
402 Candidate processing extraction unit
500 Data management subsystem
501 Format feature extraction unit
502 Data browsing unit
601 Processing target data group
602 Processing method metadata group

Claims

A design support system comprising:
a method evaluation unit that extracts a feature amount from input processing method data and evaluates a data processing method from the extracted feature amount and processing method metadata;
a sample execution processing unit that extracts candidates for distributed processing as sample data from the processing method data, and predicts the processing time of processing target data from a sample data processing time, which is the time required when processing based on the processing method data is executed on the sample data to which the evaluation result of the method evaluation unit has been added; and
a data management unit that generates a processing method feature amount based on the feature amount extracted by the method evaluation unit, and outputs the processing method feature amount and the processing time of the processing target data predicted by the sample execution processing unit.

The design support system according to claim 1,
The design support system further includes a database that stores the processing method feature amount output by the data management unit and a processing time of the processing target data predicted by the sample execution processing unit.

The design support system according to claim 1, wherein
the processing method data has a processing method definition that defines the input and output of data processing, and
the method evaluation unit extracts the feature amount by comparing the processing method definition with processing method metadata similar to the processing method definition.

The design support system according to claim 1,
The sample execution processing unit generates an approximate curve based on the sample data processing time, and predicts the processing time of the processing target data from the approximate curve.

A design support method performed by a computer system comprising a method evaluation unit that evaluates a data processing method, a sample execution processing unit that predicts a processing time of processing target data, and a data management unit that outputs a processing method feature amount and the processing time of the processing target data predicted by the sample execution processing unit, the method comprising the steps of:
extracting, in the method evaluation unit, a feature amount from input processing method data;
evaluating a processing method based on the extracted feature amount and processing method metadata;
extracting, in the sample execution processing unit, candidates for distributed processing as sample data from the processing method data, and calculating a sample data processing time, which is the time required when processing based on the processing method data is executed on the sample data to which the evaluation result of the method evaluation unit has been added;
predicting the processing time of the processing target data from the calculated sample data processing time;
generating, in the data management unit, a processing method feature amount based on the feature amount extracted by the method evaluation unit; and
outputting the processing method feature amount and the processing time of the processing target data predicted by the sample execution processing unit.

The design support method according to claim 5,
The design support method further includes a step of storing the processing method feature amount output by the data management unit and a processing time of the processing target data predicted by the sample execution processing unit in a database.

The design support method according to claim 5, wherein
the processing method data has a processing method definition that defines the input and output of data processing, and
in the step of extracting the feature amount, the feature amount is extracted by comparing the processing method definition with processing method metadata similar to the processing method definition.

The design support method according to claim 5, wherein
in the step of predicting the processing time of the processing target data, the sample execution processing unit generates an approximate curve based on the sample data processing time and predicts the processing time of the processing target data from the approximate curve.

A program that causes a computer system comprising a method evaluation unit that evaluates a data processing method, a sample execution processing unit that predicts a processing time of processing target data, and a data management unit that outputs a processing method feature amount and the processing time of the processing target data predicted by the sample execution processing unit to execute the steps of:
extracting, in the method evaluation unit, a feature amount from input processing method data;
evaluating a processing method based on the extracted feature amount and processing method metadata;
extracting, in the sample execution processing unit, candidates for distributed processing as sample data from the processing method data, and calculating a sample data processing time, which is the time required when processing based on the processing method data is executed on the sample data to which the evaluation result of the method evaluation unit has been added;
predicting the processing time of the processing target data from the calculated sample data processing time;
generating, in the data management unit, a processing method feature amount based on the feature amount extracted by the method evaluation unit; and
outputting the processing method feature amount and the processing time of the processing target data predicted by the sample execution processing unit.

The program according to claim 9, wherein
The program further includes a step in which the data management unit stores in the database the processing method feature amount output by the data management unit and the processing time of the processing target data predicted by the sample execution processing unit.

The program according to claim 9, wherein
in the step of extracting the feature amount, the feature amount is extracted by comparing a processing method definition, which is included in the processing method data and defines the input and output of data processing, with processing method metadata similar to the processing method definition.

The program according to claim 9, wherein
in the step of predicting the processing time of the processing target data, the sample execution processing unit generates an approximate curve based on the sample data processing time and predicts the processing time of the processing target data from the approximate curve.