WO2014081012A1

WO2014081012A1 - Data analysis assistance processing system and method

Info

Publication number: WO2014081012A1
Application number: PCT/JP2013/081513
Authority: WO
Inventors: 正裕本林; 古川　直広; 中野　定樹
Original assignee: 株式会社日立製作所 (Hitachi, Ltd.)
Priority date: 2012-11-26
Filing date: 2013-11-22
Publication date: 2014-05-30
Also published as: JP2014106611A; JP6025520B2

Abstract

Input data and a result output/visualization image are designated, and processing that interpolates between them is constructed automatically. For example, in a system that executes predetermined analysis processing on input data and visualizes the processing result, when identification information of input data selected by a user via an input device and identification information of a visualization method are input, candidate setting items for the visualization, predicted from the identification information of the input data, are displayed.

Description

Data analysis support processing system and method

The present invention relates to a data analysis support processing system and method, and more particularly to a technology for supporting data analysis.

There is a method called business intelligence that stores, analyzes, and processes huge amounts of corporate data accumulated in business systems and the like, and uses the data for corporate decision making. This method can analyze large amounts of data at high speed across a plurality of databases (DB). To achieve this speed, however, highly structured analysis target data must be constructed in advance according to the purpose of the analysis, so the method is well suited to routine analysis in which the analysis content and the necessary data are determined beforehand.

On the other hand, when the analysis target data, the analysis processing, and the visualization processing are not fixed, they must be changed with each round of trial and error. Background art related to such data analysis includes Patent Literature 1 and Patent Literature 2. Japanese Patent Application Laid-Open No. 2004-151561 discloses a technique that makes it easy to select an analysis method by extracting registered analysis setting information using abstracted information about the data as a key and using the extracted information. Patent Literature 2 discloses a technology that supports the execution of a composite analysis combining a plurality of analyses by using a past analysis history.

Patent Literature 1: JP 2010-205218 A
Patent Literature 2: JP 2005-157896 A

As seen in the background art described above, analysis is commonly performed in the following steps: (1) data selection, (2) analysis method selection, (3) analysis execution, and (4) result output/visualization. Techniques that record and reuse such analysis procedures can reproduce past analysis procedures and, when the user creates an analysis procedure, recommend an analysis method that matches the user's purpose. However, for users, especially those who are not analysis specialists, the procedure or method used for the analysis is not very important; they are interested only in the analysis result for the data they specify. There is therefore a need for an analysis support technology that lets the user select only (1) and (4) without being aware of (2) and (3).
An object of the present invention is to solve the above-described problem by providing a data analysis support method and system capable of automatically constructing processing that interpolates between the input data and the result output/visualization image.

In order to solve the above problems, for example, the configuration described in the claims is adopted.
[Application Example 1]
This data analysis support processing method is a data analysis support processing method in a system that executes predetermined analysis processing on input data and visualizes the processing result, wherein,
when identification information of input data selected by a user and identification information of a visualization method are input via an input device, candidate setting items for the visualization, predicted from the identification information of the input data, are displayed.
Thus, by merely specifying the input data and the result output/visualization image, the setting items for visualization that interpolate between them can be automatically predicted and displayed.

[Application Example 2]
In the above data analysis support processing method,
The processing result obtained from the input data is displayed by the visualization method in accordance with the setting item selected from the setting item candidates.
As a result, the processing result can be visualized as specified.

[Application Example 3]
In the above data analysis support processing method,
A past visualization display or summary of the selected input data is further displayed.
Thereby, support information can be provided that allows the user to set the visualization method without making detailed settings.

[Application Example 4]
In the above data analysis support processing method,
Based on the history of analysis processing and the history of visualization of input data, candidate setting items for analysis processing and visualization of the selected input data are predicted.
Thereby, highly appropriate candidate setting items for analysis processing and visualization can be estimated.

[Application Example 5]
In the above data analysis support processing method,
The analysis processing is composed of a combination of predetermined analysis processing units,
An analysis pattern indicating transitions between analysis processing units is extracted from the analysis processing history and the visualization history of the input data, and analysis processing candidates for the selected input data are predicted according to the analysis pattern.
Thereby, highly appropriate candidate setting items for analysis processing and visualization can be estimated.

[Application Example 6]
In the above data analysis support processing method,
Based on the similarity between the data used in the past analysis process and the selected input data, it is determined whether to apply the past analysis process to the selected input data.
Thereby, the applicability of the analysis process to the input data can be determined in more detail.

[Application Example 7]
In the above data analysis support processing method,
The setting items for visualization are data to be displayed in a table column or data to be used as a graph axis.
Thereby, support information can be presented that allows the user to easily select the data to be displayed in table columns and the data to be used as graph axes.

According to a typical embodiment of the present invention, it is possible to provide a data analysis support method and system capable of automatically constructing processing that interpolates between input data and a result output/visualization image.

A block diagram of the analysis support system.
A flowchart showing an example of the procedure of the first embodiment.
An example of the analysis processing procedure creation dialog of the first embodiment.
An example of the visualization content creation dialog of the first embodiment.
The data structure (upper part) and an example of data (lower part) of the analysis processing procedure of the first embodiment.
The data structure (upper part) and an example of data (lower part) of the visualization processing procedure of the first embodiment.
An example of the input data selection dialog of the first embodiment.
An example of the visualization content edit dialog of the first embodiment.
An example of the table parameter setting dialog of the visualization content edit dialog of the first embodiment.
An example of the graph parameter setting dialog of the visualization content edit dialog of the first embodiment.
A flowchart showing an example of the analysis pattern creation procedure of the first embodiment.
A flowchart showing an example of the visualization template creation procedure of the first embodiment.
An example of the procedure for generating visualization content editing support information applicable to the selection data and visualization component information selected by the user in the first embodiment.
The detailed procedure of S712 in FIG. 7-C.
A diagram showing an image of analysis pattern creation in the first embodiment.
A diagram showing an example of the data structure of the analysis pattern of the first embodiment.
A diagram showing an example of the analysis pattern data of the first embodiment.
A diagram showing an image of visualization template creation in the first embodiment.
A diagram showing an example of the data structure of the visualization template of the first embodiment.
A diagram showing an example of the visualization template of the first embodiment.
An image (1) of generating visualization content editing support information when "bar graph" is selected as the visualization component in the first embodiment.
An image (2) of generating visualization content editing support information when "bar graph" is selected as the visualization component in the first embodiment.
An example flowchart showing the procedure for setting the visualization content editing support information of S206 of the first embodiment in the table or graph parameter setting dialog.
An image (1) of setting content editing support information in the graph parameter setting dialog in S2063 of the first embodiment.
An image (2) of setting content editing support information in the graph parameter setting dialog in S2063 of the first embodiment.
An image (1) of setting content editing support information in the table parameter setting dialog in S2063 of the first embodiment.
An image (2) of setting content editing support information in the table parameter setting dialog in S2063 of the first embodiment.
An example of the detailed procedure of S207 of the first embodiment.
A diagram showing an example of analysis processing procedure data generated as a result of S2071 to S2082 of the first embodiment, and an example of the execution result of that data.
An example of the visualization content edit dialog of the second embodiment.
A diagram showing an example of the data structure of the analysis pattern of the third embodiment.
A diagram showing an example of the analysis pattern data of the third embodiment.
A diagram showing an example of similarity calculation in the third embodiment.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.
(First embodiment)
FIG. 1 is a block diagram of the analysis support system of the present embodiment.
The analysis support system (computer system) shown in FIG. 1 includes a server 101, a computer 102, a display 103, an input device 104, networks 105 and 106, and a database (DB) 107.
The server 101 and the computer 102 are connected to each other via the network 105, and the server 101 and the DB 107 are connected to each other via the network 106.
The server 101 and the computer 102 are used when the user 100 performs analysis work. They also provide functions related to the analysis work, collect a history of function execution, and extract and recommend templates (described later) according to the situation, thereby assisting the user 100 in the analysis work.

For example, a general-purpose PC can be used as each of the server 101 and the computer 102. The server 101 and the computer 102 each include a processor, a memory, and interfaces. The processor executes various processes by executing programs stored in the memory. The memory stores the programs and data used for the processing. The interfaces include, for example, those connected to the input device 104, such as a keyboard and a mouse, and to the display 103, those connecting the server 101 and the computer 102 to each other via the network 105, and those connecting the server 101 to the DB 107 via the network 106.
The DB 107 is a database that holds various data such as information related to companies, various statistical data, time-series data such as sensor data, and Web access logs. The DB 107 may be included in the server 101, or may be stored in an external storage device connected to the server 101 via the network 106.
The server 101 includes, for example, an input data creation unit 111, an analysis processing execution unit 112, a data visualization unit 113, a processing procedure recording unit 114, a processing construction unit 115, a processing procedure execution unit 116, a template recommendation unit (processing procedure recommendation unit) 117, a processing procedure analysis unit 118, a template DB 119, and a processing procedure DB 120. The input data creation unit 111, analysis processing execution unit 112, data visualization unit 113, processing procedure recording unit 114, processing construction unit 115, processing procedure execution unit 116, template recommendation unit 117, and processing procedure analysis unit 118 are implemented, for example, as programs stored in the memory, and the function of each unit is realized when the processor executes the corresponding program.
The input data creation unit 111 extracts desired data from the DB 107 in accordance with an instruction from the user 100 or the processing procedure execution unit 116, and performs input data creation processing on the extracted data to create analysis target data.
The analysis processing execution unit 112 performs data processing on the analysis target data created by the input data creation unit 111 according to an instruction from the user 100 or the processing procedure execution unit 116, and creates processing result data.

The data visualization unit 113 performs visualization processing on the processing result data created by the analysis processing execution unit 112 according to an instruction from the user 100 or the processing procedure execution unit 116, and visualizes the processing result data.
The processing procedure recording unit 114 records each processing of the input data creation unit 111, the analysis processing execution unit 112, and the data visualization unit 113 in the processing procedure DB 120.
The process construction unit 115 creates visualized content editing support information and constructs an analysis process.
The processing procedure execution unit 116 retrieves a processing procedure from the processing procedure DB 120 according to an instruction from the user 100 and controls execution of the analysis processing by instructing the input data creation unit 111, the analysis processing execution unit 112, and the data visualization unit 113 according to the contents of the processing procedure.
The processing procedure recommendation unit 117 extracts a processing procedure from the processing procedure DB 120 based on a predetermined criterion, and presents the processing procedure to the user 100 using an appropriate output device such as a display.
The computer 102 includes an input data selection unit 121, a visualization template editing unit 122, and an analysis processing procedure creation unit 123.

The input data selection unit 121 displays a dialog for selecting input data in accordance with an instruction from the user 100 and holds the selection result of the user.
The visualization template editing unit 122 displays a dialog for editing the visualization template in accordance with an instruction from the user 100 and holds the editing result of the user.
The analysis processing procedure creation unit 123 displays a dialog for creating an analysis processing procedure in accordance with an instruction from the user 100, and holds the analysis processing procedure created by the user.
Each dialog is displayed on the display 103, for example, and each instruction from the user, a selection result, an editing result, and the like are input using the input device 104.
Each unit of the server 101 and the computer 102 may be implemented in a single apparatus or distributed as appropriate. In this specification, for example, the processing procedure recording unit 114, processing construction unit 115, processing procedure execution unit 116, template recommendation unit (processing procedure recommendation unit) 117, and processing procedure analysis unit 118 of the server 101, together with the visualization template editing unit 122 and analysis processing procedure creation unit 123 of the computer 102, are collectively referred to as a data analysis support processing unit.
The configuration of the above-described analysis support system is not limited to the first embodiment, and can be applied to other embodiments.

FIG. 2 is a flowchart illustrating an example of a procedure according to the first embodiment.
This flowchart schematically shows the operation of the entire system including the computer 102 and the server 101. More details will be described later.
First, when the analysis support function of this embodiment is not used (S201), the computer 102 (for example, the analysis processing procedure creation unit 123) displays the analysis processing procedure creation dialog shown in FIG. 3-A on the display 103 (S202). The user 100 uses the input device 104 to select analysis processing units such as "data selection", "aggregation", and "filtering" in the analysis processing unit selection unit 301A, and the computer 102 (for example, the analysis processing procedure creation unit 123) sends the analysis processing procedure thus created to the server 101. The server 101 (for example, the analysis processing execution unit 112) executes the analysis processing according to the received analysis processing procedure and sends the execution result to the computer 102 (S203).

Next, the computer 102 displays the visualization content creation dialog shown in FIG. When the user 100 uses the input device 104 to select an item in the visualization component selection unit 301B and adds it to the visualization content display unit 302B, the computer 102 creates visualization content (S204). In the visualization content, the received analysis processing execution result is visualized according to the selected item (for example, a table or a graph). The computer 102 displays the visualization content created in S204 on the display 103 (S209). The server 101 (for example, the processing procedure recording unit 114) records the analysis processing procedure and the visualization processing procedure in the processing procedure DB 120 (S210).

An example of the analysis processing procedure creation dialog is shown in FIG.
When a user creates an analysis processing procedure, the user selects items in the analysis processing unit selection unit 301A and adds them to the analysis processing unit sequence display unit 302A. At the start of analysis processing procedure creation, the processing procedure recommendation unit 117 can extract a processing procedure from the processing procedure DB 120 according to the creation status and recommend it to the user.
Here, an analysis processing procedure is a sequence of analysis processing units (an analysis processing unit is, for example, the minimum unit of analysis processing, but any appropriate processing unit may be used). Specifically, analysis processing units are processes such as "data selection", "aggregation", "filtering", "calculation", and "editing", as displayed in the analysis processing unit selection unit 301A.
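As an illustration only, an analysis processing procedure can be modeled as an ordered list of analysis processing units applied one after another to tabular data, each unit receiving the result of the previous one. The sketch below assumes rows are Python dictionaries and supports only two unit types, "filtering" and "aggregation"; the class AnalysisUnit, the functions apply_unit and run_procedure, and the parameter names column, value, key, and target are hypothetical and do not reproduce the data structure of FIG. 4-A.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

Row = Dict[str, Any]


@dataclass
class AnalysisUnit:
    """One analysis processing unit (hypothetical representation)."""
    name: str                                   # unit type, e.g. "filtering"
    params: Dict[str, Any] = field(default_factory=dict)


def apply_unit(unit: AnalysisUnit, rows: List[Row]) -> List[Row]:
    """Apply a single analysis processing unit to a list of rows."""
    if unit.name == "filtering":
        # Keep only rows whose column equals the given value.
        col, value = unit.params["column"], unit.params["value"]
        return [r for r in rows if r.get(col) == value]
    if unit.name == "aggregation":
        # Sum the target column for each distinct key value.
        key, target = unit.params["key"], unit.params["target"]
        totals: Dict[Any, float] = {}
        for r in rows:
            totals[r[key]] = totals.get(r[key], 0) + r[target]
        return [{key: k, target: v} for k, v in totals.items()]
    raise ValueError(f"unsupported analysis processing unit: {unit.name}")


def run_procedure(procedure: List[AnalysisUnit], rows: List[Row]) -> List[Row]:
    """Execute the units in processing order, feeding each result to the next."""
    for unit in procedure:
        rows = apply_unit(unit, rows)
    return rows


if __name__ == "__main__":
    data = [
        {"store": "S1000", "month": "April 2012", "sales": 10},
        {"store": "S1001", "month": "April 2012", "sales": 5},
        {"store": "S1000", "month": "May 2012", "sales": 7},
    ]
    procedure = [
        AnalysisUnit("filtering", {"column": "store", "value": "S1000"}),
        AnalysisUnit("aggregation", {"key": "store", "target": "sales"}),
    ]
    print(run_procedure(procedure, data))  # [{'store': 'S1000', 'sales': 17}]
```

Keeping each unit a pure function of its input rows is what makes a recorded procedure replayable, which is the property the later pattern and template analysis relies on.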

An example of the visualization content creation dialog is shown in FIG. When creating visualization content, the user selects an item (visualization component) in the visualization component selection unit 301B and adds it to the visualization content display unit 302B. At the start of, or during, visualization content creation, the processing procedure recommendation unit 117 can extract visualization components from the processing procedure DB 120 according to the analysis processing execution result obtained in S203 and recommend them to the user.
Returning to FIG. 2, the description of the flowchart continues. When the analysis support function of this embodiment is used (S201), the computer 102 displays on the display 103, for example, the input data selection dialog shown in FIG. 5 and the visualization content edit dialog illustrated in FIG. 6-A (S205). The computer 102 sends to the server 101 the data selected by the user with the input device 104 in the input data selection dialog and the visualization component information created in the visualization content edit dialog. Based on the selected data (selected input data) and the visualization component information, the processing construction unit 115 of the server 101 creates visualization content editing support information (for example, candidate setting items for the visualization), described later, and sends it to the computer 102 (S206). The user then edits the visualization content by operating the input device 104, using the table parameter setting dialog illustrated in FIG. 6-B or the graph parameter setting dialog illustrated in FIG. 6-C together with the visualization content editing support information; for example, the user selects a desired setting item from the displayed setting item candidates. The computer 102 sends the editing result to the server 101, and the processing construction unit 115 constructs analysis processing based on the selected data and the visualization content editing result (S207). The analysis processing execution unit 112 executes the constructed analysis processing and sends the execution result to the computer 102 (S208). The visualization content resulting from S208 is displayed on the display 103 (S209). This visualization content displays, for example, the processing result according to the specified visualization component information and setting items.
The processing procedure recording unit 114 records the analysis processing procedure and the visualization processing procedure in the processing procedure DB 120 (S210).

FIG. 4-A shows an example of the data structure and data of the analysis processing procedure, and FIG. 4-B shows an example of the data structure and data of the visualization processing procedure. These data are recorded in the processing procedure DB 120.

FIG. 5 shows the input data selection dialog. The user 100 uses the input device 104 to select data displayed in the input data list display area 501 and confirms the selection by pressing the OK button 502.

FIG. 6-A is an example of the visualization content edit dialog. To edit visualization content, the user uses the input device 104 to select an item in the visualization component selection unit 601A and adds it to the visualization content display unit 602A. For example, when a table is selected in the visualization component selection unit 601A, the computer 102 displays the table parameter setting dialog (FIG. 6-B) for setting table parameters; when any of the graphs is selected, the computer 102 displays the graph parameter setting dialog (FIG. 6-C) for setting graph parameters.

The table parameter setting dialog illustrated in FIG. 6-B includes, for example, a front side column candidate list display unit (601B), a front side column editing unit (602B), a front side column pattern list display unit (603B), a candidate list display section (604B), and a selected column display section for the head front column (605B). It also includes an OK button (606B) and a cancel button (608B).

The graph parameter setting dialog illustrated in FIG. 6-C includes, for example, a viewpoint list display unit (601C), a viewpoint editing unit (602C), an X-axis pattern list display unit (603C), an X-axis candidate list display unit (604C), an X-axis pattern editing unit (605C), a Y-axis pattern list display unit (606C), a Y-axis candidate list display unit (607C), and a Y-axis pattern editing unit (608C). It also includes an OK button (609C) and a cancel button (610C).

FIGS. 7-A, 7-B, 7-C, and 7-D are flowcharts showing the procedures for creating the data displayed in each part of the table parameter setting dialog (FIG. 6-B) and the graph parameter setting dialog (FIG. 6-C).
First, the processing of FIG. 7-A for creating analysis patterns from analysis processing procedures is described with reference to FIG. 8-A. An analysis pattern is a set of two or more analysis processing units that appear consecutively in an analysis processing procedure (for example, 801 in FIG. 8-A), as shown at 802 in FIG. 8-A. In the following, an analysis pattern is described as a set of two consecutive analysis processing units. In an analysis processing procedure, the analysis processing units are arranged in processing order. The processing procedure analysis unit 118 acquires from the processing procedure DB 120 an analysis processing procedure M that has not yet been processed for analysis pattern creation (S701). The processing procedure analysis unit 118 takes out each set of the Nth (N is an integer of 1 or more) and (N+1)th analysis processing units of the analysis processing procedure M and creates an analysis pattern (S702). The processing procedure analysis unit 118 repeats S701 and S702 until no unprocessed analysis processing procedure remains in the processing procedure DB 120 (S703). The processing procedure analysis unit 118 then obtains the probability (transition probability) with which each created analysis pattern appears in the analysis processing procedures. The transition probability indicates the probability that the analysis processing unit at the end point of the analysis pattern is executed immediately after the analysis processing unit at its starting point. The processing procedure analysis unit 118 may store the analysis patterns in the template DB 119.
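Steps S701 to S703 and the transition probability can be sketched as follows. Each recorded analysis processing procedure is assumed to be a list of unit names in processing order. As an assumption, the transition probability of a pattern (A, B) is estimated here as the number of times B directly follows A divided by the number of times any unit directly follows A; the function name create_analysis_patterns is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pattern = Tuple[str, str]   # (starting unit, end unit)


def create_analysis_patterns(procedures: List[List[str]]) -> Dict[Pattern, float]:
    """Return each two-unit analysis pattern with its estimated transition probability."""
    pair_counts: Counter = Counter()
    for units in procedures:                      # S701, S703: every recorded procedure
        for n in range(len(units) - 1):           # S702: Nth and (N+1)th units
            pair_counts[(units[n], units[n + 1])] += 1

    # Number of observed transitions leaving each starting unit.
    start_totals: Counter = Counter()
    for (start, _end), count in pair_counts.items():
        start_totals[start] += count

    return {pair: count / start_totals[pair[0]] for pair, count in pair_counts.items()}


if __name__ == "__main__":
    history = [
        ["input data selection", "filtering", "aggregation"],
        ["input data selection", "aggregation"],
        ["input data selection", "filtering", "calculation"],
    ]
    for pattern, prob in sorted(create_analysis_patterns(history).items()):
        print(pattern, round(prob, 3))
```

With this estimate, the probabilities of all patterns sharing the same starting unit sum to 1, so they can be compared directly when choosing the next unit.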

FIG. 8-A shows an image of analysis pattern creation. In FIG. 8-A, the analysis processing procedures A to E (801) are processed, and eight or more analysis patterns (802) are created. 803 in FIG. 8-A is an example of the analysis processing units, and the circled numbers in 801 and 802 correspond to the numbers in that table. FIG. 8-B shows an example of the analysis pattern data structure, and FIG. 8-C shows an example of the analysis pattern data.

Next, the visualization template creation processing of FIG. 7-B is described with reference to the image of visualization template creation in FIG. 9-A. Here, a visualization template is data indicating the relationship between a visualization process (901 in FIG. 9-A; its data structure and an example of its content are shown at 904 in FIG. 9-A) and analysis processing result data (902 in FIG. 9-A). The processing procedure analysis unit 118 acquires from the processing procedure DB 120 the analysis processing procedure group G (905 in FIG. 9-A) and a visualization processing procedure M (904 in FIG. 9-A) that has not yet been processed for visualization template creation (S704). The processing procedure analysis unit 118 acquires from the analysis processing procedure group G the analysis processing procedure Q corresponding to the analysis processing ID of the visualization processing procedure M (S705). It then creates a set consisting of the last analysis processing unit N of the analysis processing procedure Q and the visualization component of the visualization processing procedure M, thereby creating a visualization template T (S706). The visualization template T includes, for example, the information shown in FIG. 9-B. The processing procedure analysis unit 118 generalizes the selected columns of the visualization template T and updates the visualization template T (S707). S704 to S707 are repeated until no unprocessed analysis processing procedure remains in the processing procedure DB 120 (S708). The processing procedure analysis unit 118 may store the visualization template T in the template DB 119.
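Steps S704 to S706 can be sketched as below, assuming analysis processing procedures are stored as a mapping from an analysis processing ID to a list of unit names, and each visualization processing procedure is a dictionary with the hypothetical keys analysis_id, component, x_axis, and y_axis. The class VisualizationTemplate and its field names are illustrative only, not the structure of FIG. 9-B, and the generalization of S707 is left out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VisualizationTemplate:
    """Hypothetical visualization template T (field names are illustrative)."""
    preceding_unit: str      # last analysis processing unit N of procedure Q
    component: str           # visualization component, e.g. "bar graph"
    x_axis: str              # column used for the X axis
    y_axis: List[str]        # columns used for the Y axis (generalized later in S707)


def create_visualization_templates(
        analysis_procedures: Dict[str, List[str]],
        visualization_procedures: List[dict]) -> List[VisualizationTemplate]:
    templates: List[VisualizationTemplate] = []
    for vis in visualization_procedures:                    # S704, S708: each procedure M
        q = analysis_procedures.get(vis["analysis_id"])     # S705: look up procedure Q
        if not q:
            continue                                        # no matching analysis procedure
        templates.append(VisualizationTemplate(             # S706: pair unit N with component
            preceding_unit=q[-1],
            component=vis["component"],
            x_axis=vis["x_axis"],
            y_axis=list(vis["y_axis"]),
        ))
    return templates
```

Only the last unit of Q is kept, which is what later lets a template act as the final transition from an analysis processing unit to a visualization component.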

Here, an example of generalization in S707 is shown.
(1) If the character strings in the list of Y-axis columns follow a regular pattern, they are replaced with that pattern. Example: the Y-axis column list "April 2012, May 2012, ..." is replaced with the Y-axis pattern "yyyy year MM month". Example: the Y-axis column list "S1000, S1001, S1002, ..." is replaced with the Y-axis pattern "S####" (# represents one character).
(2) Values in the X-axis row are replaced with a category according to their contents. Example: numeric data is replaced with "numeric data string", date data with "date data string", character strings with "character string data string", and so on.
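The two generalization rules can be illustrated with the following sketch. It is an assumption that the date rule recognizes English "Month YYYY" strings via datetime.strptime with the format "%B %Y" (month names as in the default C locale), and that the character rule masks every digit with "#"; the function names are hypothetical, and the category labels simply reuse the wording of rule (2).

```python
import re
from datetime import date, datetime
from typing import Optional, Sequence


def generalize_column_names(names: Sequence[str]) -> Optional[str]:
    """Rule (1): replace a regular list of Y-axis column names with one pattern."""
    if not names:
        return None
    try:
        for name in names:
            datetime.strptime(name, "%B %Y")    # e.g. "April 2012"
        return "yyyy year MM month"
    except ValueError:
        pass                                    # not all names are month-year strings
    masked = {re.sub(r"\d", "#", name) for name in names}
    if len(masked) == 1:
        return masked.pop()                     # e.g. "S1000", "S1001" -> "S####"
    return None                                 # no regularity found


def categorize_values(values: Sequence[object]) -> str:
    """Rule (2): replace concrete X-axis values with a data-type category."""
    if values and all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "numeric data string"
    if values and all(isinstance(v, date) for v in values):
        return "date data string"
    return "character string data string"
```

Generalizing in this way is what allows a template recorded for "S1000, S1001" to match new input data with different store codes of the same shape.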

FIG. 9-A shows an image of visualization template creation, FIG. 9-B shows an example of the data structure of the visualization template, and FIG. 9-C shows an example of the visualization template. The visualization template includes at least the immediately preceding processing (902 in FIG. 9-A), selection data information (information such as the column names and column elements of the input data selected in 903 in FIG. 9-A), and processing result data information (information such as the data string names and data elements obtained when 902 in FIG. 9-A is executed).

FIG. 7C shows an example of a procedure for generating visualization content editing support information applicable to selection data and visualization component information selected by the user.
First, the processing construction unit 115 acquires the analysis patterns and visualization templates from the template DB 119 (S711). The processing construction unit 115 generates visualization content editing support information applicable to the selected data and visualization component information (S712); the details are described below. The processing construction unit 115 then sorts the visualization content editing support information according to a predetermined criterion and sends it to the computer 102 (S713).

FIG. 7-D shows the detailed procedure of S712. In FIG. 7-D, the predetermined criterion is "in descending order of transition probability", but another criterion, such as "in order of decreasing number of analysis processing units from 'input data selection' to the visualization process", may be used instead. Other criteria may also be used. These criteria can be switched by a settings file or the like.
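Because the ranking criterion can be switched, for example by a settings file, the candidates can be sorted with an interchangeable key. In the sketch below, the JSON settings format, the key name "criterion", and the criterion names are hypothetical. Scoring a whole candidate by the product of its transition probabilities, and ordering the alternative criterion from the fewest analysis processing units upward, are both assumptions rather than details stated above.

```python
import json
from math import prod
from typing import Callable, Dict, List, Tuple

Candidate = List[str]                      # unit names from input data selection to visualization
Transitions = Dict[Tuple[str, str], float]


def path_probability(path: Candidate, transitions: Transitions) -> float:
    """Product of the transition probabilities along the candidate (assumed score)."""
    return prod(transitions.get((a, b), 0.0) for a, b in zip(path, path[1:]))


# Each criterion maps a candidate to a sort key; smaller keys are ranked first.
CRITERIA: Dict[str, Callable[[Candidate, Transitions], float]] = {
    "transition_probability": lambda path, t: -path_probability(path, t),
    "unit_count": lambda path, t: len(path),
}


def rank_candidates(candidates: List[Candidate], transitions: Transitions,
                    settings_json: str) -> List[Candidate]:
    """Sort candidates by the criterion named in the (hypothetical) settings file."""
    criterion = json.loads(settings_json).get("criterion", "transition_probability")
    key = CRITERIA[criterion]
    return sorted(candidates, key=lambda path: key(path, transitions))
```

Keeping the criteria in a lookup table means a new criterion can be added without touching the ranking function itself.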

In the following description, J and K are parameters, N is an analysis processing unit, L and L0 are lists, and M is an analysis processing unit to be processed or a visualization process. The process construction unit 115 executes the following process.
The process construction unit 115 initializes an empty list L, sets J = 1, K = 1, and the analysis processing unit N = "input data selection", creates an empty list L0, adds N to L0, and adds L0 to L (S7121). The analysis processing unit or visualization process with the Kth largest transition probability from the analysis processing unit N is acquired from the template DB 119 and set as M (S7122). If the pair of N and M has not yet been processed, the process proceeds to S7124; if it has already been processed, K is set to K + 1 and the process returns to S7122 (S7123, S7126). If M is a visualization process, L0 is copied to a list L1, M is added to L0, L0 is registered in L, L1 is substituted into L0, K is set to K + 1, and the process returns to S7122. If M is not a visualization process, the process proceeds to S7127 (S7124 to S7126).
If M is empty, J = J−1, the first to Jth of L0 are copied to the list L1, K = 1, and if J = 0, the process proceeds to S713. When J is not 0, the process proceeds to S7122 (S7127, S7133, S7234).
When M is not empty, the data obtained as a result of applying the analysis process included in L0 is selected as D, and the analysis processing unit M is applied to the data D (S7127 to S7129). If applicable, M is added to L0, N = M, J = J + 1, K = 1, and the process proceeds to S7122 (S7130, S7131). If not applicable, the process proceeds to S7122 with K = K + 1 (S7130, S7132).
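Read as an algorithm, S7121 to S7134 amount to a depth-first enumeration of paths through the transition graph held in the template DB 119. The Python sketch below is one possible interpretation, not the flowchart itself. It assumes the analysis pattern is available as a dict mapping each unit to (successor, transition probability) pairs, and that applicability and execution of a unit are supplied as callbacks. Recursion replaces the explicit J/K counters and list copies, and a cycle check on the current path stands in for the "already processed" test of S7123. The function name `enumerate_procedures` and the sample units are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical layout of an analysis pattern: unit -> [(successor, transition probability), ...]
Transitions = Dict[str, List[Tuple[str, float]]]


def enumerate_procedures(
    transitions: Transitions,
    is_visualization: Callable[[str], bool],
    is_applicable: Callable[[Any, str], bool],
    apply_unit: Callable[[Any, str], Any],
    data: Any,
    start: str = "input data selection",
) -> List[List[str]]:
    """Collect every path from `start` that ends in a visualization process."""
    results: List[List[str]] = []  # plays the role of list L

    def dfs(node: str, path: List[str], current: Any) -> None:
        # Successors in descending order of transition probability (K = 1, 2, ...).
        successors = sorted(transitions.get(node, []), key=lambda t: t[1], reverse=True)
        for succ, _prob in successors:
            if succ in path:  # guard against cycles in the recorded history
                continue
            if is_visualization(succ):
                # Register a completed candidate; the prefix stays reusable for siblings.
                results.append(path + [succ])
            elif is_applicable(current, succ):
                # Extend the procedure and continue from the new unit (N = M).
                dfs(succ, path + [succ], apply_unit(current, succ))
        # Falling out of the loop corresponds to backtracking (J = J - 1).

    dfs(start, [start], data)
    return results


pattern = {
    "input data selection": [("filter", 0.6), ("aggregate", 0.4)],
    "filter": [("aggregate", 0.7), ("bar graph", 0.3)],
    "aggregate": [("bar graph", 0.9), ("table", 0.1)],
}
paths = enumerate_procedures(
    pattern,
    is_visualization=lambda u: u in {"bar graph", "table"},
    is_applicable=lambda d, u: True,
    apply_unit=lambda d, u: d + [u],  # toy "execution": record the applied units
    data=[],
)
```

With these toy inputs every unit is applicable, so all five paths that end in “bar graph” or “table” are returned, highest-probability branches first.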

Here, the analysis processing unit M is judged applicable to the data D when D and the parameter P of M satisfy one of the following conditions:
(1) P specifies column name A, and D includes column A.
(2) P specifies column name A and element name α, D includes column A, and column A contains element α.
M is not applicable, however, when applying M to D with parameter P yields data D1 with zero rows.
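The two conditions and the zero-row exclusion can be sketched as a single predicate. This assumes, purely for illustration, that the data D is a column-oriented dict, that the parameter P is given as a column name plus an optional element, and that applying M is a caller-supplied function; `is_applicable`, `row_count`, and `filter_rows` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional

Table = Dict[str, List[object]]  # column name -> values (all columns have equal length)


def row_count(table: Table) -> int:
    """Number of rows in a column-oriented table (0 for a table with no columns)."""
    return len(next(iter(table.values()), []))


def is_applicable(
    data: Table,
    column: str,
    element: Optional[object],
    apply_unit: Callable[[Table, str, Optional[object]], Table],
) -> bool:
    # Condition (1): P names column A, so D must contain column A.
    if column not in data:
        return False
    # Condition (2): P also names element alpha, so column A must contain it.
    if element is not None and element not in data[column]:
        return False
    # Exclusion: M is not applicable if its result D1 has zero rows.
    return row_count(apply_unit(data, column, element)) > 0


def filter_rows(data: Table, column: str, element: Optional[object]) -> Table:
    """Toy analysis unit: keep rows whose `column` equals `element` (all rows if None)."""
    keep = [i for i, v in enumerate(data[column]) if element is None or v == element]
    return {name: [vals[i] for i in keep] for name, vals in data.items()}


sales = {"region": ["east", "west", "east"], "amount": [10, 20, 30]}
```

For example, `is_applicable(sales, "region", "east", filter_rows)` is true, while asking for element “north” or a missing column “store” is false.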

FIGS. 10-A and 10-B show images of generating the visualization content editing support information when “bar graph” is selected as the visualization component. As shown in FIG. 10-A, the visualization content editing support information is the information used to construct, from the analysis patterns and visualization templates, a processing procedure that fills the gap between “input data selection” and “bar graph” according to the predetermined criterion.
For example, based on the analysis patterns and visualization templates acquired from the template DB 119, the processing construction unit 115 obtains one or more processing procedure candidates (the visualization content editing support information) that lead from the selected input data (here, processing unit identification number “1”) to the selected visualization component (here, a bar graph). A candidate is obtained by following the processing transitions indicated by the analysis pattern and then the transition, indicated by the visualization template, from the final analysis processing unit to the visualization component. Multiple candidates can be ranked according to a predetermined criterion, such as descending order of transition probability. The same applies when a component other than a bar graph is selected as the visualization component.
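The switchable ranking criterion (descending transition probability, or ascending number of analysis processing units) can be sketched as follows. The document does not say how the transition probability of a whole path is computed; this sketch assumes it is the product of the step-wise probabilities. The nested-dict layout and the `criterion` strings standing in for a setting-file value are likewise hypothetical.

```python
import math
from typing import Dict, List

Transitions = Dict[str, Dict[str, float]]  # unit -> {successor: transition probability}


def path_probability(path: List[str], transitions: Transitions) -> float:
    """Assumed path score: product of the step-wise transition probabilities."""
    return math.prod(transitions[a][b] for a, b in zip(path, path[1:]))


def rank_candidates(
    paths: List[List[str]], transitions: Transitions, criterion: str = "probability"
) -> List[List[str]]:
    """Rank candidate procedures by a criterion chosen, e.g., in a setting file."""
    if criterion == "probability":
        # Descending order of (assumed) path transition probability.
        return sorted(paths, key=lambda p: path_probability(p, transitions), reverse=True)
    if criterion == "length":
        # Ascending number of units between "input data selection" and the visualization.
        return sorted(paths, key=len)
    raise ValueError(f"unknown criterion: {criterion}")
```

Because `sorted` is stable, candidates with equal scores keep the order in which the search produced them.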

FIG. 11 is a flowchart of the procedure in S206 for setting the visualization content editing support information in the table or graph parameter setting dialog. First, the computer 102 acquires the selection result of the visualization component selection unit 601A in the visualization content editing dialog (the selected visualization component) and sends it to the server 101 together with the selection data (the selected input data) (S2061). The visualization template editing unit 122 receives the visualization content editing support information that the processing construction unit 115 of the server 101 created in S7121 to S7134 (S2062). According to the selected visualization component, the visualization template editing unit 122 prepares a table or graph parameter setting dialog (see, for example, FIG. 6-B and FIG. 6-C) and sets values in each list display unit (S2063). It then acquires the selection made in each list display unit and proceeds to S207 (S2064).

FIGS. 12-A and 12-B show images of setting the visualization content editing support information in the graph parameter setting dialog in S2063. When the user 100 selects “bar graph”, the server 101 executes S7121 to S7134 and generates, as the visualization content editing support information, analysis processing procedures whose display component is “bar graph” together with a series of visualization templates. The viewpoint candidate list display unit 601C displays the viewpoint selection columns of the visualization template and the columns included in the analysis processing result data immediately before the visualization process (12A). The X-axis pattern list display unit 603C displays the X-axis pattern of the selected column of the visualization template (12B). The X-axis candidate list display unit 604C displays the columns included in the analysis processing result data immediately before the visualization process (12C). The Y-axis pattern list display unit 606C displays the Y-axis pattern of the selected column of the visualization template (12D). The Y-axis candidate list display unit 607C displays the columns included in the analysis processing result data immediately before the visualization process (12E).

FIGS. 13-A and 13-B show images of setting the visualization content editing support information in the table parameter setting dialog in S2063. When the user 100 selects “table”, the server 101 executes S7121 to S7134 and generates, as the visualization content editing support information, analysis processing procedures whose display component is “table” together with a series of visualization templates. The row-header column candidate list display unit 601B displays the row-header columns selected in the visualization template and the columns included in the analysis processing result data immediately before the visualization process (13A). The column-header pattern list display unit 603B displays the column-header pattern of the selected column of the visualization template (13B). The column-header candidate list display unit 604B displays the columns included in the analysis processing result data immediately before the visualization process (13C).
As described above, each list display unit shows the columns selected for past visualizations together with the columns of past processing results, which helps the user make a selection.
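How the graph dialog's list display units might be filled can be sketched with a small helper. The sketch assumes a visualization template is a dict with hypothetical keys `viewpoint_columns`, `x_axis_pattern`, and `y_axis_pattern`. It lists past selections before the remaining result columns; the document says only that both are displayed, so that ordering is an assumption.

```python
from typing import Dict, Iterable, List


def merge_candidates(past_columns: Iterable[str], result_columns: Iterable[str]) -> List[str]:
    """Past selections first, then the remaining result columns, without duplicates."""
    return list(dict.fromkeys([*past_columns, *result_columns]))


def graph_dialog_lists(template: Dict[str, object], result_columns: List[str]) -> Dict[str, object]:
    """Values for the graph parameter setting dialog (display unit numbers in comments)."""
    return {
        "viewpoint_candidates": merge_candidates(
            template.get("viewpoint_columns", []), result_columns
        ),  # 601C: past viewpoint columns + result columns
        "x_axis_pattern": template.get("x_axis_pattern"),  # 603C
        "x_axis_candidates": list(result_columns),          # 604C
        "y_axis_pattern": template.get("y_axis_pattern"),  # 606C
        "y_axis_candidates": list(result_columns),          # 607C
    }
```

`dict.fromkeys` is used because it removes duplicates while keeping first-seen order.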

FIG. 14 shows the detailed procedure of S207. In FIG. 14, the predetermined criterion is descending order of transition probability, but another criterion, such as ascending order of the number of analysis processing units from “input data selection” to the visualization process, may be used instead. The criterion can be switched with a setting file or the like.
In the following description, K is a parameter, N is an analysis processing unit, L is a list, M is the analysis processing unit or visualization process being processed, and G is a visualization process. The processing construction unit 115 executes the following processing.
First, L is set to an empty list, K = 1, N = “input data selection”, G = the visualization content editing result by the user 100, and N is added to L (S2071). The analysis processing unit or visualization process with the K-th largest transition probability from N is acquired from the template DB 119 and set as M (S2072). If the pair (N, M) has not yet been processed, the process proceeds to S2075; if it has already been processed, K is incremented and the process returns to S2072 (S2073, S2074). If M = G, M is appended to L, L is output as the analysis processing, and the process proceeds to S209 (S2075, S2076). If M ≠ G and M is empty, K is incremented and the process returns to S2072 (S2077, S2074). If M is not empty, D is set to the data obtained by applying the analysis processing contained in L to the selected data, and M is applied to D (S2077 to S2079). If M is applicable, M is appended to L, N = M, K = 1, and the process returns to S2072 (S2080, S2081); otherwise, K is incremented and the process returns to S2072 (S2080, S2082). The analysis processing procedure generated in this way combines an analysis processing procedure with a visualization processing procedure, and FIG. 15 shows an image of the result of executing it.
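Unlike S7121 to S7134, which enumerate many candidates, S2071 to S2082 extend a single procedure greedily until the user's visualization G is reached. The Python sketch below is an interpretation under stated assumptions: the analysis pattern is a dict mapping each unit to (successor, transition probability) pairs, applicability and execution are caller-supplied callbacks, and all names are hypothetical. The flowchart does not state what happens when every candidate from N is exhausted, so the sketch returns `None` in that case, and `max_steps` is an added safety bound.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Transitions = Dict[str, List[Tuple[str, float]]]  # unit -> [(successor, probability), ...]


def build_procedure(
    transitions: Transitions,
    goal: str,
    data: Any,
    is_applicable: Callable[[Any, str], bool],
    apply_unit: Callable[[Any, str], Any],
    start: str = "input data selection",
    max_steps: int = 50,
) -> Optional[List[str]]:
    """Greedily extend a procedure from `start` until the visualization `goal` is reached."""
    path, node, current = [start], start, data
    for _ in range(max_steps):
        # Candidates M in descending order of transition probability (K = 1, 2, ...).
        ranked = sorted(transitions.get(node, []), key=lambda t: t[1], reverse=True)
        chosen = None
        for cand, _prob in ranked:
            if cand == goal:
                return path + [goal]  # M = G: output the procedure (S2075, S2076)
            if cand not in path and is_applicable(current, cand):
                chosen = cand  # applicable: N = M, K = 1 (S2080, S2081)
                break
        if chosen is None:
            return None  # assumption: no candidate left, so give up
        path.append(chosen)
        current = apply_unit(current, chosen)
        node = chosen
    return None


pattern = {
    "input data selection": [("filter", 0.6), ("aggregate", 0.4)],
    "filter": [("aggregate", 0.7), ("bar graph", 0.3)],
    "aggregate": [("bar graph", 0.9), ("table", 0.1)],
}
```

Note the greedy behavior: with G = “bar graph”, the sketch prefers the higher-probability step to “aggregate” over the direct but less probable transition from “filter” to “bar graph”.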

Here, the analysis processing unit M is judged applicable to the data D when D and the parameter P of M satisfy one of the following conditions:
(1) P specifies column name A, and D includes column A.
(2) P specifies column name A and element name α, D includes column A, and column A contains element α.
(3) In cases other than (1) and (2), applying M to D with parameter P yields data D1 with one or more rows.

According to the first embodiment, the user can create the desired visualization content only by selecting the input data and editing the visualization content, without being aware of the analysis procedure that fills the gap between them. This reduces the labor of analysis work.

(Second embodiment)
In the first embodiment, the user creates visualization content by selecting and arranging visualization components. In the second embodiment, visualization content is created from existing visualization content.
FIG. 16 shows the visualization content editing dialog of the second embodiment, which additionally includes a visualization content example selection unit 605A that displays examples visualized in the past (for example, visualization images). To edit visualization content, the user selects an example displayed in the visualization content example selection unit 605A, selects items in the visualization component selection unit 601A as necessary, and adds them to the visualization content display unit 602A.
Each visualization component of a visualization content example is linked to the analysis processing procedure and the visualization processing procedure shown in FIGS. 4A and 4B. The visualization content desired by the user can therefore be generated by the same processing as in the first embodiment.
According to the second embodiment, a user who wants to reuse existing visualization content can create the desired visualization content simply by selecting the data and the existing content, without being aware of the analysis procedure that fills the gap between them. This also reduces the labor of analysis work for the user 100.

(Third embodiment)
In the first embodiment, the applicability of the analysis processing unit M to the data D in S7130 and S2080 is often decided by whether D includes the column name or element specified in the parameter P, so the range of data to which M can be applied is narrow. In the third embodiment, the applicable range of M is widened by taking into account a similarity based on the number of elements and the appearance frequencies of the column names and elements of P.
For example, whether a past analysis process is applied to the selected input data is decided based on the similarity between the data used in that past analysis process and the selected input data.
More specifically, M is judged applicable to D when D and the parameter P of M satisfy one of the following conditions:
(1) P specifies column name A, and D includes column A.
(2) P specifies column name A and element name α, D includes column A, and column A contains element α.
(3) P specifies column name A and element name α, and either D does not include column A, or D includes column A but column A does not contain element α. In this case, the information on column A is extracted from the data information of M, and a column B of D similar to column A and an element of column B similar to element α are searched for. M is applicable if a similar column and a similar element can be extracted.
Here, the similarity between two columns is computed, for example, by sorting the elements of each column in descending order of appearance frequency or appearance rate, taking the absolute differences rank by rank, and summing them; the columns are regarded as similar when this sum is smaller than a predetermined threshold. Two elements are regarded as similar when the difference between their appearance frequencies or appearance rates is smaller than a threshold. The definition of similarity can be changed according to the purpose.
M is not applicable, however, when applying M to D with parameter P yields data D1 with zero rows.
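Rules (1) to (3) can be sketched as a function that either confirms the parameter, substitutes a similar column and element, or rejects M. This is a minimal sketch under several assumptions: D is a column-oriented dict, the "data information of M" is available as a dict of appearance rates (%) for column A, the column threshold of 30 follows the FIG. 18 example, and the element threshold of 10 is purely illustrative. The name `resolve_parameter` is hypothetical; it returns the (column, element) pair to use rather than a bare yes/no, and the zero-row exclusion is left to the caller.

```python
from collections import Counter
from itertools import zip_longest
from typing import Dict, List, Optional, Tuple

Table = Dict[str, List[object]]  # column name -> values (one value per row)


def appearance_rates(values: List[object]) -> Dict[object, float]:
    """Appearance rate (%) of each distinct element in a column."""
    counts = Counter(values)
    return {elem: 100.0 * n / len(values) for elem, n in counts.items()}


def column_distance(r1: Dict[object, float], r2: Dict[object, float]) -> float:
    """Rank-aligned sum of absolute rate differences (smaller means more similar)."""
    a = sorted(r1.values(), reverse=True)
    b = sorted(r2.values(), reverse=True)
    return sum(abs(x - y) for x, y in zip_longest(a, b, fillvalue=0.0))


def resolve_parameter(
    data: Table,
    column: str,
    element: Optional[object],
    past_rates: Dict[object, float],
    col_threshold: float = 30.0,
    elem_threshold: float = 10.0,
) -> Optional[Tuple[str, Optional[object]]]:
    """Return the (column, element) of D to use for P, or None if M is not applicable."""
    if element is None:
        # Rule (1): exact column match only; rule (3) covers column+element parameters.
        return (column, None) if column in data else None
    if column in data and element in data[column]:
        return column, element  # rule (2)
    # Rule (3): find the column of D whose rate profile is closest to column A.
    scored = [(column_distance(past_rates, appearance_rates(vals)), name)
              for name, vals in data.items() if vals]
    candidates = [(d, name) for d, name in scored if d < col_threshold]
    if not candidates or element not in past_rates:
        return None
    _, col_b = min(candidates)
    # Then pick the element of column B whose rate is closest to that of alpha.
    b_rates = appearance_rates(data[col_b])
    target = past_rates[element]
    best = min(b_rates, key=lambda e: abs(b_rates[e] - target))
    return (col_b, best) if abs(b_rates[best] - target) < elem_threshold else None
```

In a toy table where column “B” has rates 45/30/15/10 and column “C” has 50/50, a parameter (A, α) with past rates 40/35/15/10 resolves to column “B” and element “B1”.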

FIGS. 17-A and 17-B show an example of the data structure and data of the analysis pattern in the third embodiment.

FIG. 18 illustrates an example of the similarity calculation: the configuration of the parameter P and the data D, and the flow of calculating the similarity of each column and each element when the threshold is 30.
It is assumed that column “A” of the parameter P contains element α at 40%, element β at 35%, element γ at 15%, element δ at 10%, and so on. The data D includes columns “B”, “C”, “D”, and so on. Column “B” contains four elements, B1 to B4, at rates of 45%, 30%, 15%, and 5%, respectively. Column “C” contains two elements, C1 and C2, each at 50%. Column “D” contains elements D1, D2, D3, D4, and so on, at rates of 5%, 4%, 4%, 3%, and so on. In this case, the similarity of each column is calculated as follows. The similarity AB between columns “A” and “B” is |40 - 45| + |35 - 30| + |15 - 15| + |10 - 5| = 15.
The similarity AC between columns “A” and “C” is |40 - 50| + |35 - 50| + |15 - 0| + |10 - 0| = 50. The similarity AD between columns “A” and “D” is |40 - 5| + |35 - 4| + |15 - 4| + |10 - 3| + …, whose first four terms alone sum to 84. With a threshold of 30, the column similar to column “A” is therefore column “B”, and the element of column “B” most similar to element α is B1.

Here, |X - Y| denotes the absolute value of X - Y. Various methods can be used to calculate the similarity; here, the elements of each column are sorted in descending order of rate, and the rates of the largest elements, the second-largest elements, and so on are subtracted pairwise. When the numbers of elements differ, as in the similarity AC where column “A” has four elements and column “C” has two, the missing third- and fourth-largest elements of the shorter column are treated as 0 in the subtraction. With this method, a smaller value means a higher similarity, and if no column falls below the threshold, there is no similar column.
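The FIG. 18 arithmetic can be checked with a few lines of Python implementing the rank-aligned method just described. Column “D” is left out because its element list is elided in the text, and the function names are hypothetical. Note that the A/C sum evaluates to 10 + 15 + 15 + 10 = 50, well above the threshold of 30.

```python
from itertools import zip_longest
from typing import Dict, List


def column_distance(rates_x: List[float], rates_y: List[float]) -> float:
    """Sum of |x_i - y_i| over rank-aligned appearance rates; missing ranks count as 0."""
    x = sorted(rates_x, reverse=True)
    y = sorted(rates_y, reverse=True)
    return sum(abs(a - b) for a, b in zip_longest(x, y, fillvalue=0))


def most_similar_element(target_rate: float, column: Dict[str, float]) -> str:
    """Element of `column` whose appearance rate is closest to `target_rate`."""
    return min(column, key=lambda e: abs(column[e] - target_rate))


# Appearance rates (%) from the FIG. 18 example.
col_a = {"α": 40, "β": 35, "γ": 15, "δ": 10}
col_b = {"B1": 45, "B2": 30, "B3": 15, "B4": 5}
col_c = {"C1": 50, "C2": 50}

THRESHOLD = 30
distances = {
    name: column_distance(list(col_a.values()), list(col.values()))
    for name, col in {"B": col_b, "C": col_c}.items()
}
similar_columns = [name for name, d in distances.items() if d < THRESHOLD]
best_for_alpha = most_similar_element(col_a["α"], col_b)
```

`zip_longest(..., fillvalue=0)` implements the rule that the missing ranks of the shorter column are subtracted as zero.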
According to the third embodiment, the application range of the analysis processing unit M can be expanded by taking into account the similarity based on the number of elements and the appearance frequency of the column names and elements of the parameter P.

(Configuration example)
[Configuration Example 1]
An analysis support method and system that, when a user designates input data and a visualization method, automatically generates the processing procedure needed to realize the designated visualization method for that input data.
[Configuration Example 2]
A visualization content creation support method and system that supports specification of a visualization method through designation of table columns and graph axes.
[Configuration Example 3]
A visualization content creation support method and system that supports specification of an analysis procedure and a visualization method through designation of existing content.
[Configuration Example 4]
An analysis procedure reconstruction method and system that generates data supporting the designation of a visualization method from the history of analysis processing and the history of visualization processing.
[Configuration Example 5]
The analysis procedure reconstruction method and system of Configuration Example 4, which analyzes the analysis processing history and the visualization processing history and decomposes and reconstructs the analysis processing.
[Configuration Example 6]
The analysis procedure reconstruction method and system of Configuration Example 4, in which the applicability of analysis processing is expanded by taking the similarity between data into account.

The present invention is not limited to the embodiments described above and includes various modifications. For example, the above embodiments have been described in detail to make the present invention easy to understand, and the invention is not necessarily limited to configurations that include every element described. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to that of one embodiment. For part of the configuration of each embodiment, other configurations can also be added, deleted, or substituted.

Some or all of the configurations, functions, processing units, processing means, and the like described above may be realized in hardware, for example by designing them as integrated circuits. They may also be realized in software, with a processor interpreting and executing programs that implement the respective functions. Information such as the programs, tables, and files that implement each function can be stored in a memory, a hard disk, a recording device such as an SSD (Solid State Drive), or a recording medium such as an IC card, an SD card, or a DVD.
The control lines and information lines shown are those considered necessary for the explanation; not all control lines and information lines of a product are necessarily shown. In practice, almost all components may be considered to be connected to each other.

101 Server
102 Computer
103 Display
104 Input device
105 Network
106 Network
107 DB
111 Input data creation unit
112 Analysis processing execution unit
113 Data visualization unit
114 Processing procedure recording unit
115 Processing construction unit
116 Processing procedure execution unit
117 Processing procedure recommendation unit
118 Processing procedure analysis unit
119 Template DB
120 Processing procedure DB

Claims

1. A data analysis support processing method in a system that executes predetermined analysis processing on input data and visualizes the processing result, wherein,
when identification information of input data selected by a user and identification information of a visualization method are input via an input device, candidates of setting items for visualization predicted from the identification information of the input data are displayed.
2. The data analysis support processing method according to claim 1, wherein a processing result based on the input data is displayed by the visualization method in accordance with a setting item selected from the setting item candidates.
3. The data analysis support processing method according to claim 1, further displaying a past visualization display or an outline of the selected input data.
4. The data analysis support processing method according to claim 1, wherein candidates of setting items for analysis processing and visualization of the selected input data are predicted based on an analysis processing history and a visualization history of input data.
5. The data analysis support processing method according to claim 1, wherein
the analysis processing is composed of a combination of predetermined analysis processing units, and
an analysis pattern indicating transitions between analysis processing units is analyzed from the analysis processing history and the visualization history of input data, and analysis processing candidates for the selected input data are predicted according to the analysis pattern.
6. The data analysis support processing method according to claim 5, wherein whether to apply past analysis processing to the selected input data is determined based on the similarity between the data used in the past analysis processing and the selected input data.
7. The data analysis support processing method according to claim 1, wherein the setting item for visualization is data displayed in a table column or data used as a graph axis.
8. A system comprising:
a data visualization unit that visualizes the result of executing predetermined analysis processing on input data; and
a data analysis support processing unit that, when identification information of input data selected by a user and identification information of a visualization method are input via an input device, displays candidates of setting items for visualization predicted from the identification information of the input data.
9. The system according to claim 8, wherein the data analysis support processing unit displays a processing result based on the input data by the visualization method in accordance with a setting item selected from the setting item candidates.
10. The system according to claim 8, wherein the data analysis support processing unit further displays a past visualization display or an outline of the selected input data.
11. The system according to claim 8, wherein the data analysis support processing unit predicts candidates of setting items for analysis processing and visualization of the selected input data based on an analysis processing history and a visualization history of input data.
12. The system according to claim 8, wherein
the analysis processing is composed of a combination of predetermined analysis processing units, and
the data analysis support processing unit analyzes an analysis pattern indicating transitions between analysis processing units from the analysis processing history and the visualization history of input data, and predicts analysis processing candidates for the selected input data according to the analysis pattern.
13. The system according to claim 12, wherein the data analysis support processing unit determines whether to apply past analysis processing to the selected input data based on the similarity between the data used in the past analysis processing and the selected input data.
14. The system according to claim 8, wherein, in the data analysis support processing unit, the setting item for visualization is data displayed in a table column or data used as a graph axis.