JP2019159608A - Search device and search method

Info

Publication number: JP2019159608A
Application number: JP2018043561A
Authority: JP
Inventors: 亮太赤井; Ryota Akai; 一樹谷本; Kazuki Tanimoto
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2018-03-09
Filing date: 2018-03-09
Publication date: 2019-09-19
Anticipated expiration: 2038-03-09
Also published as: JP6646699B2

Abstract

To make it easier to search for similar analyses even when a user does not understand the content of the intended analysis well enough to enter a correct key for each of various analysis elements. SOLUTION: An analysis usually includes data processing, and a processing feature, which is a feature of that data processing, characterizes the analysis. A search device therefore searches one or more processing features, each corresponding to one of one or more data processing definitions registered for analyses, for a processing feature similar to a specified feature (the processing feature of a specified data processing definition, or a specified processing feature). The search device then displays a search result associated with information on the data processing definition having the similar processing feature. SELECTED DRAWING: Figure 15

Description

The present invention generally relates to search, for example, search performed as a form of analysis support.

In data analysis, most of the time tends to be spent on data preparation, that is, preparing the data to be analyzed. One way to reduce this time is to find a past analysis similar to the current one and use the data from that past analysis as the target of the current analysis. The analysis support server disclosed in Patent Document 1 searches for similar analyses using a plurality of analysis elements as keys, namely the analysis purpose, purpose category, analysis method, and target data (an item key and the data items acquired using the item key), and displays the analysis purpose, purpose category, analysis method, and target data of each similar analysis.

Patent Document 1: JP 2010-205218 A

Unless a user (for example, an analyst) understands the content of the analysis to be performed, it is difficult to enter a correct key for each of the various analysis elements as in Patent Document 1, and therefore difficult to search for similar analyses. For example, the user needs to understand analysis elements such as the purpose and method of the analysis.

An analysis usually includes data processing, and a processing feature, which is a feature of that data processing, characterizes the analysis; in other words, it distinguishes one analysis from another. The search device according to the present invention therefore searches one or more processing features, each corresponding to one of one or more data processing definitions registered for analyses, for a processing feature similar to a specified feature (the processing feature of a specified data processing definition, or a specified processing feature). The search device then displays a search result associated with information on the data processing definition having the similar processing feature. A "similar processing feature" may be a processing feature that satisfies a predetermined condition on its relevance to the specified feature, for example, one whose degree of match with the specified feature is at or above a predetermined level.
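The similarity condition can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a processing feature is a set of named fields and that the degree of match is the fraction of fields whose values are equal. The text does not fix a particular metric, and the names match_degree, find_similar, and the sample field names and file names are hypothetical.

```python
from typing import Dict, List

Feature = Dict[str, str]  # hypothetical representation of a processing feature


def match_degree(specified: Feature, candidate: Feature) -> float:
    """Fraction of the specified feature's fields that the candidate matches.

    This metric is an assumption made for illustration only.
    """
    if not specified:
        return 0.0
    hits = sum(1 for k, v in specified.items() if candidate.get(k) == v)
    return hits / len(specified)


def find_similar(specified: Feature,
                 registered: Dict[str, Feature],
                 threshold: float) -> List[str]:
    """Return the registered definitions whose feature meets the threshold."""
    return [name for name, feature in registered.items()
            if match_degree(specified, feature) >= threshold]


# Hypothetical registered features, keyed by definition file name.
registered = {
    "001.etl": {"key": "customer_id", "target": "amount", "method": "sum"},
    "002.etl": {"key": "customer_id", "target": "amount", "method": "mean"},
    "003.etl": {"key": "store_id", "target": "visits", "method": "count"},
}
query = {"key": "customer_id", "target": "amount", "method": "sum"}
similar = find_similar(query, registered, threshold=0.6)
print(similar)
```

With a threshold of 0.6, a definition that shares two of the three fields still counts as similar, while one that shares none is excluded.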

Even if the user does not understand the analysis well enough to enter a correct key for each of the various analysis elements, the user can be presented with information on analyses similar to the intended analysis simply by specifying a data processing definition for the intended analysis (or the processing feature of that data processing definition).

The configuration of the search device according to Embodiment 1 is shown.
The flow of the data registration process is shown.
An example of the data registration screen is shown.
An example of an analysis is shown schematically.
The configuration of an example of a data processing definition is shown.
The configuration of the feature management table is shown.
The configuration of the trend management table is shown.
The flow of the feature extraction process according to Embodiment 1 is shown.
The flow of the search process is shown.
The flow of the feature search according to Embodiment 1 is shown.
An example of the search screen is shown.
An example of the definition details screen is shown.
The configuration of the feature management table according to Embodiment 2 is shown.
The flow of the feature extraction process according to Embodiment 2 is shown.
The flow of the feature search according to Embodiment 2 is shown.
An overview of Embodiment 1 is shown.

In the following description, an "interface unit" may be one or more interface devices. The one or more interface devices may be any of the following.
• An I/O interface device for at least one of an I/O (Input/Output) device and a remote display computer. The I/O interface device for the display computer may be a communication interface device. The at least one I/O device may be a user interface device, for example, either an input device such as a keyboard or pointing device, or an output device such as a display device.
• One or more communication interface devices. These may be one or more communication interface devices of the same type (for example, one or more NICs (Network Interface Cards)) or two or more communication interface devices of different types (for example, a NIC and an HBA (Host Bus Adapter)).

In the following description, a "memory unit" is one or more memories and may typically be a main storage device. At least one memory in the memory unit may be a volatile memory or a non-volatile memory.

In the following description, a "PDEV unit" is one or more PDEVs and may typically be an auxiliary storage device. "PDEV" means a physical storage device (Physical storage DEVice), typically a non-volatile storage device such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive).

In the following description, a "storage unit" is at least one of the memory unit and at least part of the PDEV unit (typically, at least the memory unit).

In the following description, a "processor unit" is one or more processors. At least one processor is typically a microprocessor such as a CPU (Central Processing Unit), but may be another type of processor such as a GPU (Graphics Processing Unit). At least one processor may be single-core or multi-core. At least one processor may be a processor in the broad sense, such as a hardware circuit (for example, an FPGA (Field-Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit)) that performs part or all of the processing.

In the following description, information that yields an output for an input may be described using an expression such as "xxx table", but such information may be data of any structure, or may be a learning model, such as a neural network, that generates an output for an input. An "xxx table" can therefore be called "xxx information". In the following description, the configuration of each table is an example: one table may be divided into two or more tables, and all or part of two or more tables may be combined into one table.

In the following description, functions may be described using the expression "kkk unit" (excluding the interface unit, storage unit, and processor unit). A function may be realized by the processor unit executing one or more computer programs, or by one or more hardware circuits (for example, an FPGA or an ASIC). When a function is realized by the processor unit executing a program, the prescribed processing is performed using the storage unit and/or the interface unit as appropriate, so the function may be regarded as at least part of the processor unit. Processing described with a function as the subject may be regarded as processing performed by the processor unit or by a device having the processor unit. A program may be installed from a program source. The program source may be, for example, a program distribution computer or a computer-readable recording medium (for example, a non-transitory recording medium). The description of each function is an example: a plurality of functions may be combined into one function, and one function may be divided into a plurality of functions.

In the following description, a "search device" may consist of one or more computers. Specifically, for example, when a computer has a display device and displays information on its own display device, that computer may be the search device. Also, for example, when a first computer (for example, a server) transmits display information to a remote second computer (a display computer, for example, a client) and the display computer displays that information (that is, when the first computer displays information on the second computer), at least the first computer of the two may be the search device. In other words, "displaying information" may mean displaying information on a display device of the computer, or may mean the computer transmitting display information to a display computer (in the latter case, the display computer displays the display information). A software-defined search device may also be realized by a computer executing software (a computer program) that provides the functions of the search device.

Several embodiments of the present invention are described below with reference to the drawings.

FIG. 15 shows an overview of the present embodiment.

The search device 101 receives a search request from an input/output console 160 (an example of a display computer) used by the user, performs a search in response to the request, and displays the resulting information on the input/output console 160.

Specifically, for example, the search device 101 has a processing search unit 111 and a search result display unit 112. The processing search unit 111 receives, from the input/output console 160, a search request specifying a data processing definition (or its processing feature), and searches one or more processing features 122, each corresponding to one of one or more data processing definitions 121 registered for analyses, for a processing feature 122 similar to the processing feature of the specified data processing definition (or to the specified processing feature). The search result display unit 112 displays a search result 1500 associated with information on the data processing definition 121 having the similar processing feature 122. Even if the user does not understand the analysis well enough to enter a correct key for each of the various analysis elements, the user can be presented with information on analyses similar to the intended analysis simply by specifying a data processing definition for the intended analysis (or the processing feature of that data processing definition). That is, the user does not have to categorize the purpose or method of the analysis when searching for analyses. Moreover, a data processing definition 121 is not necessarily created on the basis of the purpose or method of an analysis. The search may therefore find analyses that would not be hit by a search keyed on aspects such as the purpose or method of the analysis. Consequently, at least one of the following can be expected: a reduction in the time spent on data preparation, and a higher probability of obtaining information on analyses similar to the intended analysis. Because a processing feature is a feature that distinguishes analyses, this embodiment assumes that none of the data processing is data processing common to many analyses (that is, general-purpose data processing) such as data cleansing. A data processing definition 121 is a definition (for example, a definition file) that describes data processing. Hereinafter, to avoid redundant wording, a data processing definition 121 having a similar processing feature 122 (that is, one with which a similar processing feature 122 is associated) may be called a "similar data processing definition 121", and the data trend 123 associated with a similar data processing definition 121 may be called a "similar data trend 123" ("data trend" is described later).

The search device 101 holds management information 130. The management information 130 includes the one or more registered data processing definitions 121. Specifically, for example, the management information 130 consists of one or more entry units 120. An entry unit 120 is a group of data sets and consists of a data processing definition 121 and at least one of a processing feature 122 and a data trend 123 associated with that data processing definition 121. Specifically, for example, in an entry unit 120, the data processing definition 121 is associated with the data trend 123 of the data used in the data processing that the definition describes. A "data trend" is a statistic of the data used in data processing (for example, a statistic as described in "1 General statistical terms" of JIS Z 8101-1:2015, Statistics - Terms and symbols - Part 1: Probability and general statistical terms (Japanese Standards Association)). Specific examples include the number of data sets (for example, values) in the data, the amount of missing values, and the distribution. Also, in an entry unit 120, the data processing definition 121 is associated with a processing feature 122 that indicates the features of that data processing definition 121. If a processing feature 122 is associated with a data processing definition 121, the processing feature 122 need not be calculated for that definition at search time, so a fast search can be expected. If a data trend 123 is associated with a data processing definition 121, then for a similar processing feature 122 (typically, a feature of the data processing of a similar past analysis), the trend of the data used in the analysis (data processing) having that processing feature 122 is also known. This makes it easier for the user to find an appropriate analysis among similar past analyses, and as a result an improvement in at least one of the quality and efficiency of analysis can be expected. A "data set" is a single logical chunk of electronic data as seen from a program such as an application program, and may be, for example, any of a record, a file, a key-value pair, and a tuple. In this embodiment, for example, a data processing definition 121 is a file. An entry unit 120 containing a data processing definition 121 from which no processing feature is extracted does not contain a processing feature 122.
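One way to picture the entry unit 120 and the management information 130 is the following Python sketch. The class names, field names, and dictionary layout are assumptions for illustration only; the text requires only that a definition be paired with at least one of a processing feature and a data trend.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class EntryUnit:
    definition_file: str                       # data processing definition 121 (a file)
    feature: Optional[Dict[str, str]] = None   # processing feature 122; None if not extracted
    trend: Optional[Dict[str, float]] = None   # data trend 123

    def __post_init__(self) -> None:
        # An entry unit pairs a definition with at least one of feature and trend.
        if self.feature is None and self.trend is None:
            raise ValueError("entry unit needs a feature, a trend, or both")


@dataclass
class ManagementInfo:
    entries: Dict[str, EntryUnit] = field(default_factory=dict)

    def register(self, entry: EntryUnit) -> None:
        self.entries[entry.definition_file] = entry

    def searchable(self) -> Dict[str, Dict[str, str]]:
        """Stored features only, so nothing has to be computed at search time."""
        return {name: e.feature for name, e in self.entries.items()
                if e.feature is not None}


info = ManagementInfo()
info.register(EntryUnit("001.etl",
                        feature={"key": "customer_id", "method": "sum"},
                        trend={"sample_count": 1000, "missing_count": 12}))
# No feature could be extracted for this definition, so only the trend is stored.
info.register(EntryUnit("002.etl", trend={"sample_count": 50, "missing_count": 0}))
print(sorted(info.searchable()))
```

Storing the feature next to the definition is what lets a search skip feature calculation; entries without a feature simply drop out of the searchable set.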

Information indicating the similar data trend is associated with the search result 1500. On the input/output console 160, the search result screen (the screen that displays the search result 1500) shows, for each of the top n processing features 122 by degree of match (n is a natural number), the file name (for example, "001.etl") of the data processing definition 121 having that processing feature 122 and the data trend 123 associated with that data processing definition 121. By looking at the search result, the user can evaluate whether a similar data trend suits the intended analysis. From the displayed data trend, the user can judge whether a similar data processing definition 121 is the data processing definition 121 of an analysis similar to the intended analysis, and whether the data contains the data sets needed for the analysis (for example, whether there are both data sets for men and data sets for women, because the user wants to analyze by gender).
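The top-n listing can be sketched as follows in Python. The sketch assumes the degrees of match have already been computed and are passed in as a dictionary; the function name top_n_results, the score values, and the trend fields are hypothetical.

```python
from typing import Dict, List, Tuple


def top_n_results(scores: Dict[str, float],
                  trends: Dict[str, Dict[str, float]],
                  n: int) -> List[Tuple[str, float, Dict[str, float]]]:
    """Pick the n best-matching definitions and attach each one's data trend."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, score, trends.get(name, {})) for name, score in ranked[:n]]


# Hypothetical degrees of match and data trends, keyed by definition file name.
scores = {"001.etl": 0.9, "002.etl": 0.4, "003.etl": 0.7}
trends = {
    "001.etl": {"sample_count": 1000, "missing_count": 12},
    "003.etl": {"sample_count": 300, "missing_count": 0},
}
rows = top_n_results(scores, trends, n=2)
for name, score, trend in rows:
    print(name, score, trend)
```

Each row carries the file name and the data trend together, which is the pairing the user needs in order to judge whether a hit suits the intended analysis.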

This embodiment is described in detail below.

FIG. 1 shows the configuration of the search device 101.

The search device 101 has an interface unit 151, a memory unit 152, a PDEV unit 153, and a processor unit 154 connected to them.

The input/output console 160 is connected to the interface unit 151 via a communication network 170 (for example, the Internet). The input/output console 160 is an example of a display computer and is, for example, a desktop, notebook, or tablet personal computer. The input/output console 160 has an input device 161 (for example, a keyboard and a pointing device) and a display device 162 (for example, a liquid crystal display).

The PDEV unit 153 stores the management information 130. The management information 130 includes a feature management table 181 and a trend management table 182. At least part of the management information 130 may be stored in a storage device outside the search device 101.

The memory unit 152 stores one or more computer programs. When the processor unit 154 executes at least one of these programs, functions such as a feature extraction unit 191, a data registration unit 192, the processing search unit 111, and the search result display unit 112 are realized. The feature extraction unit 191 automatically extracts the processing feature of a specified data processing definition from that definition. As a result, the user does not have to specify the processing feature that serves as the search key: specifying a data processing definition is enough for the processing feature serving as the search key to be obtained. The data registration unit 192 registers the data processing definition 121, the processing feature 122, and the data trend 123 in the management information 130. The processing search unit 111 and the search result display unit 112 are as described above.

FIG. 2 shows the flow of the data registration process.

The data registration unit 192 receives, from the input/output console 160, a data processing definition 121 and the data trend 123 of the data used in the data processing described by that definition (S201).

The data registration unit 192 calls the feature extraction unit 191, which performs the feature extraction process (S202).

The data registration unit 192 assigns a data processing ID that links (associates) the data processing definition 121 with the data trend 123, and associates the data processing ID and the data trend 123 with the data processing definition 121 (and, if the extraction in S202 succeeded, the processing feature 122 as well) (S203). In S203, the data processing definition 121, the data processing ID, and the data trend 123 are registered (and, if the extraction in S202 succeeded, the processing feature 122 as well).

The data registration unit 192 displays the processing result (S204). The "processing result" here includes, for example, whether a processing feature 122 was extracted, whether the registration succeeded, and information on the registered data set group (one or more data sets).
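Steps S201 to S204 can be sketched in Python as follows. This is a simplified illustration, not the registration unit's actual implementation: the ID generator, the in-memory store, the shape of the returned result, and the placeholder extractor toy_extract are all assumptions.

```python
import itertools
from typing import Callable, Dict, Optional

Feature = Dict[str, str]
_id_counter = itertools.count(1)  # hypothetical source of data processing IDs


def register(definition_text: str,
             trend: Dict[str, float],
             extract: Callable[[str], Optional[Feature]],
             store: Dict[int, dict]) -> dict:
    """Sketch of S201 to S204: receive input, extract, associate and register, report."""
    # S202: try to extract a processing feature; None means extraction failed.
    feature = extract(definition_text)
    # S203: assign an ID that ties the definition, trend and (if any) feature together.
    processing_id = next(_id_counter)
    record = {"definition": definition_text, "trend": trend}
    if feature is not None:
        record["feature"] = feature
    store[processing_id] = record
    # S204: the processing result reported back to the user.
    return {"id": processing_id,
            "feature_extracted": feature is not None,
            "registered": processing_id in store}


def toy_extract(text: str) -> Optional[Feature]:
    # Placeholder extractor: succeeds only if the text mentions an aggregation.
    return {"method": "sum"} if "Groupby" in text else None


store: Dict[int, dict] = {}
r1 = register("<component_type>Groupby</component_type>",
              {"sample_count": 10}, toy_extract, store)
r2 = register("<component_type>Join</component_type>",
              {"sample_count": 5}, toy_extract, store)
print(r1)
print(r2)
```

The second registration still succeeds even though no feature is extracted; the stored record then holds only the definition and the data trend.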

FIG. 3 shows an example of the data registration screen.

The data registration screen 300 is a user interface screen such as a GUI (Graphical User Interface). The data registration screen 300 may be displayed by, for example, the data registration unit 192. The data registration screen 300 has UIs (User Interfaces) 301, 302, and 303.

The UI 301 is for entering the data processing definition to be registered. Using the UI 301, a data processing definition is specified (for example, by entering a file path that includes the file name of the data processing definition).

The UI 302 is for entering the data trend of the data used in the data processing described by the data processing definition to be registered. In the UI 302, the data trend consists of the data items in the data and, for each data item, the sample size, the number of missing samples, and the sample mean.
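For illustration, the three per-item statistics could be computed from raw rows as in the Python sketch below. In the text the data trend is entered by the user, so computing it this way is an assumption, as are the function name data_trend, the convention that the sample size counts missing values, and the use of None to mark a missing value.

```python
from statistics import mean
from typing import Dict, List, Optional


def data_trend(rows: List[Dict[str, Optional[float]]]) -> Dict[str, Dict[str, float]]:
    """Per data item: sample size, number of missing values, mean of present values."""
    items = sorted({k for row in rows for k in row})
    trend: Dict[str, Dict[str, float]] = {}
    for item in items:
        values = [row.get(item) for row in rows]
        present = [v for v in values if v is not None]
        trend[item] = {
            "sample_count": len(values),                   # assumed to include missing values
            "missing_count": len(values) - len(present),
            "sample_mean": mean(present) if present else float("nan"),
        }
    return trend


# Hypothetical input rows; None marks a missing value.
rows = [
    {"age": 30, "amount": 100.0},
    {"age": None, "amount": 300.0},
    {"age": 50, "amount": None},
    {"age": 40, "amount": 200.0},
]
trend = data_trend(rows)
for item, stats in trend.items():
    print(item, stats)
```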

The UI 303 is for instructing the start of the data registration process. When the UI 303 (for example, a button) is operated, the data registration process (FIG. 2) starts in order to register the data processing definition specified with the UI 301 and the data trend entered with the UI 302.

FIG. 4 schematically shows an example of an analysis.

An analysis includes data processing. A data processing execution engine 400 (an execution engine for C, Java (registered trademark), Python, or the like; a DBMS (DataBase Management System); or an ETL (Extract/Transform/Load) tool) reads the data processing definition 121 corresponding to the analysis and executes the data processing described by that definition 121. The data processing execution engine 400 may run on a device other than the search device 101 or on the search device 101 itself. The data processing includes at least one of a join process on data sets and an aggregation process on data sets.

The illustrated example is as follows. The input data includes tables A to C, and the output data includes table D. The data processing includes a join process that joins tables A to C using certain columns (data items) as keys, and an aggregation process that aggregates the table obtained by the join, by a certain method, using certain columns as the aggregation key and the aggregation target. Table D is output as the result of the aggregation process.
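The join-then-aggregate pattern can be made concrete with a small Python sketch. The table contents, column names, the choice of an inner join, and the use of a sum as the aggregation method are all hypothetical; the text does not specify them.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, object]


def join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Inner join of two row lists on a shared column."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def aggregate_sum(rows: List[Row], key: str, target: str) -> List[Row]:
    """Group rows by the aggregation key and sum the aggregation target."""
    totals: Dict[object, float] = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[target]
    return [{key: k, target: v} for k, v in sorted(totals.items())]


# Hypothetical tables A to C.
table_a = [{"customer_id": 1, "region": "east"},
           {"customer_id": 2, "region": "west"}]
table_b = [{"customer_id": 1, "order_id": 10},
           {"customer_id": 1, "order_id": 11},
           {"customer_id": 2, "order_id": 12}]
table_c = [{"order_id": 10, "amount": 5.0},
           {"order_id": 11, "amount": 7.0},
           {"order_id": 12, "amount": 3.0}]

joined = join(join(table_a, table_b, "customer_id"), table_c, "order_id")
# Table D: total amount per region.
table_d = aggregate_sum(joined, key="region", target="amount")
print(table_d)
```

In this sketch the aggregation key ("region"), aggregation target ("amount"), and aggregation method (sum) are exactly the three settings that characterize the aggregation step.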

At least one data set in the input data, the output data, or both may be, instead of structured data such as a table, semi-structured data (for example, an XML (eXtensible Markup Language) file or a JSON (JavaScript Object Notation) file; JavaScript is a registered trademark) or unstructured data (for example, sensor data, image data, or audio data). The number of data sets in the input data and the output data is not limited. Other processes, such as exclusion of abnormal data or numerical calculation, may come before or after the join and aggregation processes.

FIG. 5 shows the configuration of an example of the data processing definition 121.

The data processing definition 121 is a file that defines the content of data processing (see, for example, FIG. 4) as text. In the illustrated example, the data processing definition 121 is an XML file, but instead of an XML file it may be a data set written in, for example, a programming language such as C, Java (registered trademark), or Python, an SQL statement, or a format such as XML or JSON.

The data processing definition 121 includes an identifier for each process, the settings needed to perform that process, and a description of the order of the processes. As identifiers, for example, a <component_type> of “A” means input A, a <component_type> of “Groupby” means an aggregation process, and a <component_type> of “Join” means a join process. For an aggregation process, the aggregation key, aggregation target, and aggregation method needed to perform it are defined as <key>, <target>, and <method>, respectively.
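To make the structure concrete, the following sketch shows what such a definition might look like and how its component identifiers could be read. Only the tag names <component>, <component_type>, <key>, <target>, <method>, <target_left>, and <target_right> appear in this description; the root element `<definition>`, the `id` attribute, the use of document order as processing order, and all values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data processing definition; layout and values are illustrative only.
DEFINITION_XML = """
<definition>
  <component id="1"><component_type>A</component_type></component>
  <component id="2"><component_type>B</component_type></component>
  <component id="3">
    <component_type>Join</component_type>
    <target_left>A</target_left>
    <target_right>B</target_right>
  </component>
  <component id="4">
    <component_type>Groupby</component_type>
    <key>product</key>
    <target>sales_amount</target>
    <method>sum</method>
  </component>
</definition>
"""

root = ET.fromstring(DEFINITION_XML)

# Document order stands in for the processing order in this sketch.
component_types = [c.findtext("component_type") for c in root.iter("component")]

# Settings of the aggregation process, read from the Groupby component.
groupby = next(c for c in root.iter("component") if c.findtext("component_type") == "Groupby")
groupby_settings = {tag: groupby.findtext(tag) for tag in ("key", "target", "method")}
```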

FIG. 6 shows the configuration of the feature management table 181.

The feature management table 181 has one record per data processing. Each record stores information such as a data processing ID 601, a definition name 602, an aggregation key 603, an aggregation target 604, and an aggregation method 605. The combination of the aggregation key 603, the aggregation target 604, and the aggregation method 605 is an example of the processing feature 122. The following description takes one data processing as an example (the “target processing” in the description of FIG. 6).

The data processing ID 601 is the ID assigned to the target processing in the data registration process. The definition name 602 is the file name of the data processing definition 121 corresponding to the target processing. The aggregation key 603 is the data item (item name, that is, column name) used as the key in the target processing (aggregation process). The aggregation target 604 is the data item to which the values aggregated in the target processing (aggregation process) belong. The aggregation method 605 is the method of aggregation.

In the illustrated example, the first record means that the total sales amount for each product is analyzed (calculated) from the sales detail table. That is, the data processing execution engine 400 uses the data item [product] as the key and calculates, for each product, the total of all sales amounts belonging to that product. The second record means that the average sales amount for each gender is analyzed from the same sales detail table. Because analyses can in some cases be classified by type in this way, the combination of the aggregation key, aggregation target, and aggregation method of the aggregation process can serve as an example of the processing feature 122.
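The two records discussed above can be sketched as data, together with the aggregation each one stands for. The row values, the dictionary field names, the definition file names, and the function `run_aggregation` are assumptions introduced for illustration; only the column roles (ID, definition name, key, target, method) come from FIG. 6.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows of the sales detail table.
sales_detail = [
    {"product": "apple", "gender": "F", "sales_amount": 100},
    {"product": "pear", "gender": "M", "sales_amount": 300},
    {"product": "apple", "gender": "M", "sales_amount": 200},
]

# Records mirroring the columns of the feature management table 181.
feature_table = [
    {"process_id": 1, "definition_name": "def1.xml",
     "key": "product", "target": "sales_amount", "method": "sum"},
    {"process_id": 2, "definition_name": "def2.xml",
     "key": "gender", "target": "sales_amount", "method": "mean"},
]

METHODS = {"sum": sum, "mean": mean}


def run_aggregation(rows, feature):
    """Apply the aggregation described by one feature record."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature["key"]]].append(row[feature["target"]])
    reducer = METHODS[feature["method"]]
    return {group: reducer(values) for group, values in groups.items()}


per_product = run_aggregation(sales_detail, feature_table[0])  # total per product
per_gender = run_aggregation(sales_detail, feature_table[1])   # average per gender
```

The two records share the same aggregation target but differ in key and method, which is what lets the triple distinguish one kind of analysis from another.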

FIG. 7 shows the configuration of the trend management table 182.

The trend management table 182 has records for each data processing. Each record stores information such as a data processing ID 701, a data item 702, a number of samples 703, a number of missing samples 704, and a sample mean 705. The combination of the data item 702, the number of samples 703, the number of missing samples 704, and the sample mean 705 is an example of the data trend 123. The following description takes one data processing as an example (the “target processing” in the description of FIG. 7).

The data processing ID 701 is the ID assigned to the target processing in the data registration process. A combination of the data item 702, the number of samples 703, the number of missing samples 704, and the sample mean 705 exists for each data item belonging to the target processing. The data item 702 identifies the data item. The number of samples 703, the number of missing samples 704, and the sample mean 705 are the number of samples, the number of missing samples, and the sample mean for that data item.
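One way to compute such per-item statistics is sketched below. The description does not say how they are calculated, so this rests on assumptions: the number of samples is taken as the total row count, a value of `None` (or an absent column) counts as missing, and the mean is taken over the non-missing values. The function name `data_trend` and the field names are hypothetical.

```python
def data_trend(rows, items):
    """Return, per data item, the sample count, missing count, and sample mean."""
    trend = []
    for item in items:
        values = [row.get(item) for row in rows]
        present = [v for v in values if v is not None]
        trend.append({
            "data_item": item,
            "sample_count": len(values),
            "missing_count": len(values) - len(present),
            # Mean over non-missing values; None when every value is missing.
            "sample_mean": sum(present) / len(present) if present else None,
        })
    return trend


# Hypothetical input rows with one missing sales amount.
rows = [
    {"sales_amount": 100, "quantity": 1},
    {"sales_amount": None, "quantity": 3},
    {"sales_amount": 300, "quantity": 2},
]
trend = data_trend(rows, ["sales_amount", "quantity"])
```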

According to FIGS. 6 and 7, the entry unit 120 shown in FIG. 15 is a set of data sets associated with the same data processing ID. The components of the entry unit 120 are as follows.
・The data processing definition 121 is the data processing definition identified by the definition name 602.
・The processing feature 122 is the combination of the aggregation key 603, the aggregation target 604, and the aggregation method 605.
・The data trend 123 is the combination, for each data item associated with the data processing ID, of the data item 702, the number of samples 703, the number of missing samples 704, and the sample mean 705.
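The components listed above can be modeled as a small data structure keyed by the data processing ID. This is only a sketch: the class names, field names, and the helper `build_entry_unit` are hypothetical, and the row dictionaries stand in for records of the feature management table and the trend management table.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class ProcessingFeature:
    """Processing feature 122: aggregation key, target, and method."""
    aggregation_key: str
    aggregation_target: str
    aggregation_method: str


@dataclass(frozen=True)
class ItemTrend:
    """One per-data-item row of the data trend 123."""
    data_item: str
    sample_count: int
    missing_count: int
    sample_mean: Optional[float]


@dataclass
class EntryUnit:
    """Entry unit 120: everything tied to one data processing ID."""
    process_id: int
    definition_name: str  # identifies the data processing definition 121
    feature: ProcessingFeature
    trend: List[ItemTrend] = field(default_factory=list)


def build_entry_unit(process_id, feature_rows, trend_rows):
    """Collect the rows of both tables that share `process_id`."""
    f = next(r for r in feature_rows if r["process_id"] == process_id)
    feature = ProcessingFeature(f["key"], f["target"], f["method"])
    trend = [
        ItemTrend(t["data_item"], t["sample_count"], t["missing_count"], t["sample_mean"])
        for t in trend_rows
        if t["process_id"] == process_id
    ]
    return EntryUnit(process_id, f["definition_name"], feature, trend)
```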

FIG. 8 shows the flow of the feature extraction process (S202 in FIG. 2 or S902 in FIG. 9).

The feature extraction unit 191 acquires the specified data processing definition (S801). Here, the “specified data processing definition” may be the data processing definition specified for the data registration process of FIG. 2 or the data processing definition specified for the search process of FIG. 9, described later.

The feature extraction unit 191 determines whether a processing feature exists in the data processing definition acquired in S801, specifically, whether there is a <component> whose <component_type> is “Groupby” (S802).

If the determination result in S802 is false (S802: No), the feature extraction unit 191 returns an extraction failure as the result (S803).

If the determination result in S802 is true (S802: Yes), the feature extraction unit 191 extracts the processing feature from the data processing definition. Specifically, it acquires the values specified by <key>, <target>, and <method> in the <component> whose <component_type> is “Groupby”, and uses the acquired values as the aggregation key 603, the aggregation target 604, and the aggregation method 605, respectively (S804). The feature extraction unit 191 then returns the extracted processing feature (the aggregation key 603, the aggregation target 604, and the aggregation method 605) together with an extraction success as the result (S805).
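Steps S801 to S805 can be sketched as follows, assuming the definition is an XML string whose components follow the tag names used in this description. The function name `extract_groupby_feature`, the `(success, feature)` return shape, the root element `<definition>`, and the choice to use the first “Groupby” component found are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def extract_groupby_feature(definition_xml):
    """Return (success, feature) for one data processing definition."""
    root = ET.fromstring(definition_xml)                  # S801: acquire the definition
    for component in root.iter("component"):             # S802: look for a Groupby component
        if component.findtext("component_type") == "Groupby":
            feature = {                                   # S804: read key, target, method
                "aggregation_key": component.findtext("key"),
                "aggregation_target": component.findtext("target"),
                "aggregation_method": component.findtext("method"),
            }
            return True, feature                          # S805: success with the feature
    return False, None                                    # S803: extraction failure


# Hypothetical definitions: one with an aggregation process, one without.
WITH_GROUPBY = """
<definition>
  <component><component_type>A</component_type></component>
  <component>
    <component_type>Groupby</component_type>
    <key>product</key><target>sales_amount</target><method>sum</method>
  </component>
</definition>
"""
WITHOUT_GROUPBY = "<definition><component><component_type>A</component_type></component></definition>"

ok, feature = extract_groupby_feature(WITH_GROUPBY)
failed = extract_groupby_feature(WITHOUT_GROUPBY)
```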

FIG. 9 shows the flow of the search process.

The processing search unit 111 receives a search request in which a data processing definition is specified (S901).

The processing search unit 111 calls the feature extraction unit 191, which performs the feature extraction process (S902). If the result of S902 is an extraction failure, the search result display unit 112 displays, as the search result, that no feature could be extracted and the search could not be performed (S905).

If the result of S902 is an extraction success, the processing search unit 111 performs a feature search (S903). That is, using the extracted processing feature as the key, the processing search unit 111 searches the feature management table 181 for processing features similar to it (for example, processing features whose degree of match with it is at or above a predetermined level). The processing search unit 111 then creates a search result in which each similar data processing definition is associated with its data trend (S904). The data processing ID of each similar data processing definition is also associated with the search result. The search result display unit 112 displays the search result created in S904 (S905).
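The overall flow of S901 to S905 can be sketched as a single function that takes the extraction and feature-search steps as parameters. All names here (`search`, `extract`, `find_similar_ids`, the lookup dictionaries, and the shape of the returned result) are hypothetical; the sketch only shows how the steps are ordered and where the failure branch leaves the flow.

```python
def search(definition, extract, find_similar_ids, definitions_by_id, trends_by_id):
    """Run the search flow for one specified definition (S901)."""
    ok, feature = extract(definition)                       # S902: feature extraction
    if not ok:
        # Extraction failed: the display step (S905) would report this.
        return {"status": "extraction_failed", "results": []}
    results = []
    for process_id in find_similar_ids(feature):            # S903: feature search
        results.append({                                    # S904: build the search result
            "process_id": process_id,
            "definition_name": definitions_by_id[process_id],
            "data_trend": trends_by_id.get(process_id, []),
        })
    return {"status": "ok", "results": results}             # handed to display (S905)


# Hypothetical stand-ins for the extraction and search steps.
def fake_extract(definition):
    return (definition == "good", {"aggregation_key": "product"})


found = search("good", fake_extract, lambda feature: [2],
               {2: "def2.xml"}, {2: [{"data_item": "sales_amount"}]})
not_found = search("bad", fake_extract, lambda feature: [2], {2: "def2.xml"}, {})
```

Passing the steps in as functions keeps the sketch independent of how extraction and similarity are actually implemented.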

FIG. 10 shows the flow of the feature search (S903 in FIG. 9).

The processing search unit 111 acquires the processing feature extracted in the feature extraction process of S902 (S1001). The processing feature acquired in S1001 is called the “key feature” in the description of FIG. 10.

The processing search unit 111 performs S1002 and S1003 for each record of the feature management table 181. The following description takes one record as an example (the “target record” in the description of FIG. 10).

The processing search unit 111 determines whether the processing feature indicated by the target record is similar to the key feature (for example, whether its degree of match with the key feature is at or above a predetermined level). Specifically, for example, the processing search unit 111 determines whether at least m elements (m is a natural number, for example m = 2) match between the elements constituting the processing feature indicated by the target record (the aggregation key 603, the aggregation target 604, and the aggregation method 605) and the elements constituting the key feature (S1002).

If the determination result in S1002 is false (S1002: No), the processing feature indicated by the target record is not similar to the key feature.

If the determination result in S1002 is true (S1002: Yes), the processing feature indicated by the target record is similar to the key feature. The processing search unit 111 outputs the data processing ID associated with that processing feature (S1003). That data processing ID is then associated with the search result.
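The per-record comparison of S1002 and S1003 can be sketched as counting matching elements against the threshold m. The function name `find_similar_ids`, the dictionary field names, and the sample records are assumptions for illustration; the default m = 2 follows the example value given above.

```python
FEATURE_ELEMENTS = ("aggregation_key", "aggregation_target", "aggregation_method")


def find_similar_ids(key_feature, feature_table, m=2):
    """Return the IDs of records sharing at least m elements with the key feature."""
    similar_ids = []
    for record in feature_table:
        # S1002: count how many of the three elements match exactly.
        matches = sum(record[e] == key_feature[e] for e in FEATURE_ELEMENTS)
        if matches >= m:
            similar_ids.append(record["process_id"])  # S1003: output the ID
    return similar_ids


# Hypothetical contents of the feature management table.
feature_table = [
    {"process_id": 1, "aggregation_key": "product",
     "aggregation_target": "sales_amount", "aggregation_method": "sum"},
    {"process_id": 2, "aggregation_key": "gender",
     "aggregation_target": "sales_amount", "aggregation_method": "mean"},
    {"process_id": 3, "aggregation_key": "product",
     "aggregation_target": "quantity", "aggregation_method": "sum"},
]
key_feature = {"aggregation_key": "product",
               "aggregation_target": "sales_amount", "aggregation_method": "mean"}

similar = find_similar_ids(key_feature, feature_table)       # threshold m = 2
exact_only = find_similar_ids(key_feature, feature_table, m=3)
```

With m = 2, records 1 and 2 each share two elements with the key feature and are returned, while record 3 shares only the key and is not.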

FIG. 11A shows an example of the search screen.

The search screen 1100 is a user interface screen such as a GUI. The search screen 1100 has a first plane 1131 and a second plane 1132. These may instead be provided as separate screens, one containing the first plane 1131 and one containing the second plane 1132.

The display of the first plane 1131 is controlled by, for example, the processing search unit 111. The first plane 1131 accepts the specification of the data processing definition to be used as the search key and the instruction to execute the search. Specifically, for example, the first plane 1131 has UIs 1101 and 1102. The UI 1101 is for inputting the data processing definition corresponding to the current analysis. The UI 1102 is for instructing the start of the search process. When the UI 1102 (for example, a button) is operated, a search request that specifies, as the key, the data processing definition specified via the UI 1101 is issued to the search device 101, and the search process (FIG. 9) starts in response to that request.

The display of the second plane 1132 is controlled by, for example, the search result display unit 112. The second plane 1132 is the plane on which the search result is displayed. Specifically, for example, the second plane 1132 displays UIs 1106 to 1108 for each of the one or more search result modules 1105 associated with the search result. One search result module 1105 corresponds to one data processing ID output in S1003. The UI 1106 displays the definition name of the data processing definition 121 associated with the data processing ID. The UI 1107 displays the data trend associated with the data processing ID (the combination of the data item 702, the number of samples 703, the number of missing samples 704, and the sample mean 705). The UI 1108 is for instructing the display of the details of the data processing definition 121 corresponding to the definition name displayed in the UI 1106. When the UI 1108 (for example, a button) is operated, a definition detail screen (FIG. 11B) showing the details of the data processing definition 121 is displayed by, for example, the search result display unit 112.

Each of the one or more search result modules 1105 associated with the search result includes the data trend of the similar analysis corresponding to that search result module 1105. This makes it easy for the user to judge whether the data processing presented in the search result is similar to the analysis the user wants to perform.

FIG. 11B shows an example of the definition detail screen.

The definition detail screen 1110 schematically shows, as the details of the data processing definition 121, the data processing that the data processing definition 121 describes. The definition detail screen 1110 also displays the details of the processing feature of that data processing. The details include, for example, for each of the elements constituting the processing feature, the name of the element (for example, “aggregation key”) and the value of the element (for example, “[product]”).

Because the details of the similar data processing are displayed in addition to the search result, it is even easier for the user to judge whether the data processing presented in the search result is similar to the analysis the user wants to perform.

A second embodiment will now be described. The description focuses on the differences from the first embodiment; points in common with the first embodiment are omitted or simplified.

FIG. 12 shows the configuration of the feature management table according to the second embodiment.

Each record of the feature management table 1281 according to the second embodiment stores, in place of the aggregation key 603, the aggregation target 604, and the aggregation method 605 described above, information such as a joined-table combination 1203. That is, in this embodiment, a join process is adopted as the data processing instead of, or in addition to, the aggregation process.

A specific example of the joined-table combination 1203 is as follows. When the total sales amount for each product is analyzed from the sales detail table, the data processing is a join of the POS data and the product master (see the first record). When the total sales amount for each store is analyzed from the sales detail table, the data processing is a join of the POS data and the store master (see the second record). Because analyses can in some cases be classified by type in this way, the combination of joined tables is adopted as an example of the processing feature.

FIG. 13 shows the flow of the feature extraction process according to the second embodiment.

As in S801, the feature extraction unit 191 acquires the specified data processing definition (S1301).

The feature extraction unit 191 determines whether a processing feature exists in the data processing definition acquired in S1301, specifically, whether there is a <component> whose <component_type> is “Join” (S1302). If the determination result in S1302 is false (S1302: No), the feature extraction unit 191 returns an extraction failure as the result, as in S803 (S1303).

If the determination result in S1302 is true (S1302: Yes), the feature extraction unit 191 acquires all the values specified by <target_right> and <target_left> in each <component> whose <component_type> is “Join” (S1304). If the acquired values contain duplicates, the feature extraction unit 191 removes the duplicates so that each value is unique, and extracts the result as the processing feature, that is, the joined-table combination (S1305). The feature extraction unit 191 then returns the extracted processing feature (the joined-table combination 1203) together with an extraction success as the result (S1306).
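Steps S1301 to S1306 can be sketched as follows, again assuming an XML definition that uses the tag names from this description. The function name `extract_join_feature`, the root element `<definition>`, the sample table names, and the decision to return the combination as a sorted list (so that the result is deterministic) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def extract_join_feature(definition_xml):
    """Return (success, joined-table combination) for one definition."""
    root = ET.fromstring(definition_xml)                       # S1301
    joins = [c for c in root.iter("component")
             if c.findtext("component_type") == "Join"]        # S1302
    if not joins:
        return False, None                                     # S1303
    names = []
    for component in joins:                                    # S1304: collect both sides
        for tag in ("target_left", "target_right"):
            names.extend(e.text for e in component.findall(tag) if e.text)
    combination = sorted(set(names))                           # S1305: drop duplicates
    return True, combination                                   # S1306


# Hypothetical definition with two join processes that both use the POS data.
TWO_JOINS = """
<definition>
  <component>
    <component_type>Join</component_type>
    <target_left>POS</target_left><target_right>product_master</target_right>
  </component>
  <component>
    <component_type>Join</component_type>
    <target_left>POS</target_left><target_right>store_master</target_right>
  </component>
</definition>
"""

ok, combination = extract_join_feature(TWO_JOINS)
no_join = extract_join_feature("<definition><component><component_type>A</component_type></component></definition>")
```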

FIG. 14 shows the flow of the feature search according to the second embodiment.

The processing search unit 111 acquires the processing feature extracted in the feature extraction process of FIG. 13 (S1401). The processing feature acquired in S1401 is called the “key feature” in the description of FIG. 14.

The processing search unit 111 performs S1402 and S1403 for each record of the feature management table 1281. The following description takes one record as an example (the “target record” in the description of FIG. 14).

The processing search unit 111 determines whether the processing feature indicated by the target record is similar to the key feature. Specifically, for example, the processing search unit 111 determines whether the key feature matches the joined-table combination 1203 indicated by the target record (S1402).

If the determination result in S1402 is true (S1402: Yes), the processing search unit 111 outputs the data processing ID associated with that processing feature (S1403).
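The match test of S1402 and S1403 can be sketched as a comparison of table-name sets. Treating a “match” as equality of unordered sets is an assumption of this sketch, as are the function name `find_matching_ids`, the field name `joined_tables`, and the sample records.

```python
def find_matching_ids(key_combination, feature_table):
    """Return the IDs of records whose joined-table combination equals the key."""
    key = frozenset(key_combination)
    matching_ids = []
    for record in feature_table:
        # S1402: compare the combinations as unordered sets of table names.
        if frozenset(record["joined_tables"]) == key:
            matching_ids.append(record["process_id"])  # S1403: output the ID
    return matching_ids


# Hypothetical contents of the feature management table of the second embodiment.
feature_table = [
    {"process_id": 1, "joined_tables": ["POS", "product_master"]},
    {"process_id": 2, "joined_tables": ["POS", "store_master"]},
]

by_product = find_matching_ids(["product_master", "POS"], feature_table)
no_match = find_matching_ids(["POS"], feature_table)
```

Using sets means the order in which the tables appear in a definition does not affect the result.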

Several embodiments have been described above, but they are examples for explaining the present invention, and the scope of the present invention is not limited to these embodiments. The present invention can also be implemented in various other forms.

101 ... Search device

Claims

A search device comprising:
a processing search unit that searches, from among one or more processing features respectively corresponding to one or more registered data processing definitions for analysis, for a processing feature similar to a processing feature of a specified data processing definition or to a specified processing feature; and
a search result display unit that displays a search result associated with information related to the data processing definition having the similar processing feature.

The search device according to claim 1, wherein
at least one data processing definition among the one or more registered data processing definitions is associated with a data trend of that data processing definition, and
when the data processing definition having the similar processing feature is the at least one data processing definition, the search result is associated with the data trend associated with the data processing definition having the similar processing feature.

The search device according to claim 2, wherein
the at least one data processing definition is further associated with a processing feature of that data processing definition, and
when the data processing definition having the similar processing feature is the at least one data processing definition, the search result is associated with the processing feature associated with the data processing definition having the similar processing feature.

The search device according to claim 1, wherein each of the following (a) and (b) includes a combination of an aggregation key, an aggregation target, and an aggregation method of an aggregation process included in the data processing:
(a) the processing feature of the specified data processing definition, or the specified processing feature; and
(b) at least one of the one or more processing features respectively corresponding to the one or more registered data processing definitions.

The search device according to claim 1, wherein each of the following (a) and (b) includes a combination of data items to be joined in a join process included in the data processing:
(a) the processing feature of the specified data processing definition, or the specified processing feature; and
(b) at least one of the one or more processing features respectively corresponding to the one or more registered data processing definitions.

The search device according to claim 2, wherein the data trend associated with the at least one data processing definition is a statistic of the data used in the data processing indicated by that data processing definition.

The search device according to claim 1, wherein at least one data processing definition among the one or more registered data processing definitions is associated with at least one of the processing feature of that data processing definition and the data trend of that data processing definition.

The search device according to claim 1, further comprising a feature extraction unit that automatically extracts the processing feature of the specified data processing definition from the specified data processing definition.

A search method comprising:
searching, from among one or more processing features respectively corresponding to one or more registered data processing definitions for analysis, for a processing feature similar to a processing feature of a specified data processing definition or to a specified processing feature; and
displaying a search result associated with information related to the data processing definition having the similar processing feature.

A computer program that causes a computer to execute:
searching, from among one or more processing features respectively corresponding to one or more registered data processing definitions for analysis, for a processing feature similar to a processing feature of a specified data processing definition or to a specified processing feature; and
displaying a search result associated with information related to the data processing definition having the similar processing feature.