WO2018011895A1 - Data processing flow management system and method - Google Patents

Data processing flow management system and method

Info

Publication number
WO2018011895A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
processing flow
similarity
processing
metadata
Application number
PCT/JP2016/070576
Other languages
English (en)
Japanese (ja)
Inventor
陽子 平島
角谷 有司
受田 賢知
Original Assignee
株式会社日立製作所 (Hitachi, Ltd.)
Application filed by 株式会社日立製作所 (Hitachi, Ltd.)
Priority to PCT/JP2016/070576
Priority to JP2018527294A (JP6612450B2)
Publication of WO2018011895A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • The present invention relates to a data processing flow management system and method.
  • For example, an index such as the discount rate that leads to a purchase can be examined from consumer behavior data.
  • Similarly, an evaluation index can be derived from a production facility's temperature and humidity data and its defect rate. One example is the optimum target temperature that reduces the air conditioners' power consumption while improving yield.
  • Such analysis services require knowledge of IT (Information Technology) systems and analysis algorithms in addition to business-specific knowledge. In many cases, business staff and consultants therefore collaborate on the analysis for each project.
  • A method that improves design efficiency by reusing past know-how has been proposed (Patent Document 1). A method has also been disclosed for data processing (Patent Document 2). It stores ETL (Extract/Transform/Load) processing, which extracts data from a core database, processes it and records the processed data, so that the processing can be reused.
  • Patent Document 1: JP 2010-20617 A. Patent Document 2: Japanese Patent Laid-Open No. 2005-11109.
  • In that approach, the similarity of ETL processing is determined only from the table attribute names of the job data and the attribute names of the data items.
  • However, the operating rules for item names and attribute names are not necessarily unified across industries or across multiple entities. When collecting and using data obtained from different industries or from multiple entities, it is therefore difficult to judge the relevance and similarity of the data from table attribute names and data item attribute names alone.
  • An object of the present invention is therefore to provide a data processing flow management system and method that make it easy to collect and use data relating to processing flows obtained from different industries or from multiple entities.
  • One aspect of the present invention is a data processing flow management system with three storage and intake components. It has a processing flow information database that stores a processing flow group. It has an integrated database that stores the data input to and output from each processing flow, metadata describing that data, and the schema of the data. It also has a receiving unit that receives a search request containing a newly created processing flow, or part of one, as a search-condition processing flow.
  • The system further includes a similarity calculation unit. For any two processing flows, this unit obtains the similarity of their data by comparing at least one of the metadata and the schema of each data item. It then calculates the similarity of the processing flows from at least one of the input-data similarity and the output-data similarity.
  • A search unit uses the similarity calculation unit to extract, from the processing flow group, one or more processing flows similar to the search-condition processing flow, and creates a processing flow list containing them.
  • A transmission unit returns the processing flow list and the similarities between processing flows to the search requester as the response to the search request.
  • Another aspect of the present invention is a data processing flow management method. It uses a repository that stores processing flows, each of which includes definition information for reading one or more types of input data, executing conversion processing and generating output data.
  • The method uses a processing flow information database that stores a processing flow group. It also uses an integrated database that stores the data input to and output from each processing flow in the group, metadata describing that data, and the schema of the data.
  • The method includes a request reception process, which receives a search request containing a newly created processing flow, or part of one, as a search-condition processing flow.
  • It also includes a similarity calculation process. This process covers the input and output data of the search-condition processing flow and of the processing flows in the processing flow information database. It obtains the similarity of data by comparing at least one of the metadata and the schema of each data item, and calculates the similarity of the processing flows from at least one of the input-data similarity and the output-data similarity.
  • Finally, the method includes a process that extracts one or more processing flows based on the result of the similarity calculation process and creates a processing flow list containing them.
  • Yet another aspect of the present invention is a system with four components. It has a processing flow database that stores data, processing, and processing flows defining the connection relationships between the data and the processing. It has an integrated database that stores the content of the data and broad metadata describing the semantic content of the data. It has an analysis tool computer that accepts processing request inputs from users. It also has a management unit that performs processing based on those inputs and generates processing result outputs.
  • The management unit compares a processing request input with the broad metadata and calculates an evaluation value. It extracts the data whose broad metadata yields an evaluation value satisfying a predetermined condition. It then extracts the processing flows that contain the extracted data and outputs them as the processing result.
  • Still another aspect of the present invention is a data processing flow management system in which a repository management computer includes three units. A storage unit stores processing flows. A search unit receives a search request that uses a processing flow as the search condition. Based on that flow's input/output data and the metadata of that data, the search unit extracts the stored processing flows that are highly similar to it and creates a list of the one or more extracted flows. A similarity calculation unit calculates the similarity of processing flows.
  • Notations such as “first”, “second” and “third” are attached only to identify constituent elements and do not necessarily limit their number or order.
  • Numbers identifying components are used per context, so a number used in one context does not necessarily indicate the same component in another context. This also does not preclude a component identified by one number from also functioning as a component identified by another number.
  • The data processing flow management system in this embodiment manages the input/output data of processing, and the metadata of that data, in association with processing flow information. During a processing flow search, it compares the metadata of the data in the search condition with the metadata of the data used by the processing flows (group) being searched, and uses this to determine data similarity. Based on that data similarity, it extracts the desired processing flows from the processing flows being searched.
  • The processing flows to be searched are processing flows created in the past and accumulated in a database.
  • The user can use a required evaluation index as the query. Determining its similarity to the output data of the processing flows being searched extracts the processing flows that produce that evaluation index.
  • Likewise, the user can use available data as the query. Determining its similarity to the input data of the processing flows being searched extracts the processing flows that can use the data the user currently holds.
  • Using both, the processing flows can be narrowed down further.
  • In the present embodiment, the repository management computer of the data processing flow management system includes three units. A storage unit stores processing flows. A search unit receives a search request that uses a processing flow as the search condition. Based on that flow's input/output data and the metadata of that data, the search unit extracts the stored processing flows that are highly similar to it and creates a list of the one or more extracted flows. A similarity calculation unit calculates the similarity of processing flows.
  • The similarity of data may also be determined using schema (data structure) information instead of, or together with, metadata.
  • FIG. 1 is a diagram showing a system configuration and functional blocks according to an embodiment of the present invention.
  • The system of the present embodiment is an information processing system.
  • It comprises one repository management computer 1, one or more analysis tool computers 4, and an integrated database computer 5.
  • The repository management computer 1, the analysis tool computers 4 and the integrated database computer 5 are connected via the network 2. They may instead be connected directly or implemented as a single computer.
  • Each computer includes an input device (input I/F), an output device (output I/F), a processing device (processor or CPU (Central Processing Unit)) and a storage device (memory or storage resource).
  • The storage device is constituted by a magnetic disk device, a semiconductor storage device or the like, or a combination thereof.
  • Each function of a computer is realized by the processing device executing a program stored in the storage device, in cooperation with other hardware.
  • A program executed by a computer, a function of the program, a means for realizing that function, or a part thereof may be referred to as a “function”, “means”, “part”, “unit”, “module” or the like.
  • In the following description, processing is sometimes described with the program as the subject. Because the program is executed by the processing device, the processing device may be used as the subject instead.
  • The repository management computer 1 includes a CPU 12, a communication I/F 13 and a storage resource 11.
  • The storage resource 11 stores a management program 100 and a processing flow information database (processing flow information) 200.
  • The processing flow information 200 stores processing flows 210 and flow metadata 220.
  • The management program 100 consists of five parts: a processing flow registration unit (hereinafter “registration unit”) 101, a processing flow search unit (hereinafter “search unit”) 102, a similarity calculation unit 103, a correlation analysis unit 104 and a charge management unit 105.
  • Like the repository management computer 1, the integrated database computer 5 includes a CPU, a communication I/F and storage resources. For a large number of management targets, these storage resources hold the following: data 510, data schemas 520 and data metadata 530; a thesaurus 540 defining synonyms of the words used in metadata and schemas; search history (data) 580; and a search history management unit (program) 590.
  • Management targets are facilities and information systems used for business.
  • Depending on the field and purpose, the data 510 can include various kinds of information. Examples are facility operation information, information system access information, and the number, age and sex of customers who use the facility or system.
  • The integrated database computer 5 collects management target data 510 from the management targets, or from field servers that collect management target information. This makes a large amount of diverse management target data accessible from one place.
  • Like the repository management computer 1, the analysis tool computer 4 includes a CPU, a communication I/F and a storage resource. Its storage resource stores the analysis tool program 400.
  • The analysis tool (program) 400 includes a processing flow registration request unit 401, a processing flow search request unit 402, a processing flow design unit 403 and an analysis request unit 404.
  • The analysis tool computer 4 is operated by a user and is used for processing flow search and analysis.
  • The repository management computer 1 stores processing flows, performs searches and analyses in response to requests from the analysis tool computer 4, and returns the results to the analysis tool computer 4.
  • The integrated database 500 manages data centrally.
  • The registration unit 101 of the management program 100 receives, via the communication I/F 13, a processing flow registration request from the analysis tool 400 operated by the registrant. It then registers the processing flow in the processing flows 210 of the storage resource 11.
  • Processing flows created and used by other users may also be registered as appropriate.
  • The search unit 102 receives a search request from the analysis tool 400 via the communication I/F 13.
  • The search request includes a partial processing flow that serves as the search condition.
  • Based on the partial processing flow, the search unit 102 examines the input/output data of the one or more processing flows stored in the processing flow information 200. It searches for processing flows that handle data highly similar to the input/output data defined in the search-condition flow.
  • The search result is returned to the analysis tool 400 via the communication I/F 13 as a processing flow list.
  • The similarity calculation unit 103 calculates the similarity of processing flows using the schema and metadata of the input/output data.
  • The processing flow search request unit 402 of the analysis tool 400 receives the processing flow list from the management program 100 of the repository management computer 1. The user selects from the list a processing flow close to the one to be created, modifies it with the processing flow design unit 403 as necessary, and executes it.
  • FIG. 2 shows a processing flow expressed by the processing flow 210 of the processing flow information 200.
  • The processing flow 210 obtains one or more output data 213 by applying one or more conversion processes 212 to one or more input data 211.
  • The output data 213 is, for example, a quantity correlated with the evaluation index 215. In some cases, the output data itself can be used as the evaluation index.
  • The conversion process 212 may be divided into multiple stages, and processes may branch or merge.
  • A processing flow starts with a process 214 that reads data from a database or a file. When the data itself is designated, a process that reads the designated data is executed.
  • One example of a processing flow 210 examines an index such as the discount rate that leads to a purchase, using consumer behavior data. Input data 211 such as the number of people, age and gender represent consumer behavior. Various experience-based conversion processes 212 are applied to them, producing output data 213 correlated with the discount rate, which is the evaluation index.
  • The processing flow information 200 can store a variety of processing flows 210 from different fields and technologies.
  • FIG. 3 shows the data structure of the processing flow 210 stored in the repository.
  • The processing flow 210 includes processing step information 2100 and connection relation information 2200.
  • The processing step information 2100 has a processing step ID 2101, a type 2102, and an attribute value (type-specific information) 2103 determined by the type.
  • The type 2102 indicates, for example, “input”, “output” or “conversion”.
  • When the type is “input”, the attribute value 2103 includes the access destination of the database that is the data input source, the table name, the column name, connection information and so on.
  • When the type is “output”, it includes the access destination of the database that is the data output destination, the table name, the column name, connection information and so on.
  • When the type is “conversion”, it is the data conversion process or the parameters required for that process.
  • The connection relation 2200 includes three fields. The connection relation ID 2201 identifies it. The start-point step ID 2202 indicates the processing step at the start of the connection, and the end-point step ID 2203 indicates the step at its end.
  • The processing flow shown in FIG. 2 can be expressed with this data structure.
  • The processing step information 2100 defines the contents of the processes 212 to 214, and the connection relation 2200 defines the relations between the processes.
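The step-and-connection structure of FIG. 3 maps naturally onto a small directed graph. The Python sketch below is hypothetical and is not the patent's implementation. The class names, the attribute keys (table, columns, operation) and the use of Kahn's topological sort to derive an execution order are all assumptions for illustration. In the example flow, two input steps merge into one conversion step, mirroring the branching and merging allowed for conversion processes 212.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class ProcessingStep:
    step_id: str                 # processing step ID 2101
    step_type: str               # type 2102: "input", "output" or "conversion"
    attributes: dict = field(default_factory=dict)  # attribute value 2103 (type-specific)


@dataclass
class Connection:
    connection_id: str           # connection relation ID 2201
    start_step_id: str           # start-point step ID 2202
    end_step_id: str             # end-point step ID 2203


@dataclass
class ProcessingFlow:
    steps: list
    connections: list

    def steps_of_type(self, step_type):
        """Return the steps of one type, e.g. all input steps."""
        return [s for s in self.steps if s.step_type == step_type]

    def execution_order(self):
        """Order step IDs so every step follows its predecessors (Kahn's algorithm)."""
        indegree = {s.step_id: 0 for s in self.steps}
        successors = defaultdict(list)
        for c in self.connections:
            successors[c.start_step_id].append(c.end_step_id)
            indegree[c.end_step_id] += 1
        ready = deque(sid for sid, deg in indegree.items() if deg == 0)
        order = []
        while ready:
            sid = ready.popleft()
            order.append(sid)
            for nxt in successors[sid]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    ready.append(nxt)
        if len(order) != len(self.steps):
            raise ValueError("connection relations contain a cycle")
        return order


# Hypothetical consumer-behavior flow: two inputs merge into one conversion.
flow = ProcessingFlow(
    steps=[
        ProcessingStep("s1", "input", {"table": "visitors", "columns": ["count"]}),
        ProcessingStep("s2", "input", {"table": "customers", "columns": ["age", "gender"]}),
        ProcessingStep("s3", "conversion", {"operation": "join_and_aggregate"}),
        ProcessingStep("s4", "output", {"table": "discount_rate_score"}),
    ],
    connections=[
        Connection("c1", "s1", "s3"),
        Connection("c2", "s2", "s3"),
        Connection("c3", "s3", "s4"),
    ],
)
print(flow.execution_order())
```

Running it prints ['s1', 's2', 's3', 's4']. Both inputs are read before the conversion step, and the conversion step runs before the output step.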
  • FIG. 4 shows an integrated database 500 stored in the storage resource of the integrated database computer 5.
  • The integrated database 500 includes the data 510, schema 520, metadata 530, thesaurus 540, search history 580 and search history management unit 590 shown in the figures.
  • The integrated database computer 5 accumulates data obtained from different industries or from multiple entities. The data stored in the integrated database 500 may be collected automatically by a known data collection unit or entered manually.
  • The data 510 includes five fields: an ID 511 identifying the data; a data name 512; a data location 513 indicating where the data body is stored (for example, a table name); a schema ID 514 identifying the schema; and a metadata ID 515 identifying the metadata.
  • Metadata in a broad sense is “data related to the data body”. The data 510 itself, the schema 520 and the metadata 530 can all be regarded as metadata in a broad sense.
  • The metadata 530 is metadata in the narrow sense.
  • Hereinafter, “metadata” alone refers to metadata in the narrow sense.
  • Broad metadata includes information used to manage data, such as the ID 511 and the data location 513. It also includes information on the meaning and content of data, such as the data name 512, the schema 520 and the narrow metadata 530.
  • The present embodiment is characterized by using, for search, the broad metadata that carries the meaning and content of data.
  • The data name 512, the schema 520 and the narrow metadata 530 are collectively referred to as “broad metadata related to the semantic content of data”.
  • Of these, the schema 520 and the narrow metadata 530 carry a large amount of information and are highly useful as representations of the semantic content of data.
  • The schema 520 includes an ID 521 (514) identifying the schema, a data type 522 and a data unit 523.
  • A schema represents the structure of data or of a database.
  • The metadata 530 includes an ID 531 (515) identifying the metadata and a target name 532 indicating what the data relates to.
  • Various parameters 533 to 538 can be added as needed.
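The split between management fields and semantic fields can be illustrated with plain records. This is a minimal sketch under stated assumptions. The field names, the dict holding the optional parameters 533 to 538, and the helper semantic_metadata are hypothetical. The example values (the "site" parameter, the table name) are also invented.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:                    # schema 520
    schema_id: str               # ID 521 (referenced by schema ID 514)
    data_type: str               # data type 522
    unit: str                    # data unit 523


@dataclass
class Metadata:                  # narrow-sense metadata 530
    metadata_id: str             # ID 531 (referenced by metadata ID 515)
    target_name: str             # target name 532
    params: dict = field(default_factory=dict)  # optional parameters 533 to 538


@dataclass
class DataRecord:                # data 510
    data_id: str                 # ID 511 (management information)
    name: str                    # data name 512
    location: str                # data location 513 (management information)
    schema_id: str               # schema ID 514
    metadata_id: str             # metadata ID 515


def semantic_metadata(record, schemas, metadata):
    """Gather the broad metadata related to semantic content: data name 512,
    schema 520 and narrow metadata 530. ID 511 and location 513 are omitted
    because they only serve to manage the data."""
    schema = schemas[record.schema_id]
    meta = metadata[record.metadata_id]
    return {
        "name": record.name,
        "data_type": schema.data_type,
        "unit": schema.unit,
        "target_name": meta.target_name,
        "params": dict(meta.params),
    }


schemas = {"sc1": Schema("sc1", "float", "kWh")}
metadata = {"m1": Metadata("m1", "air conditioner", {"site": "plant A"})}
record = DataRecord("d1", "power consumption", "tbl_power", "sc1", "m1")
print(semantic_metadata(record, schemas, metadata))
```

The resolved dictionary is what a search would compare. The IDs only serve to join the three tables.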
  • The thesaurus 540 defines synonym relations among the terms used in data names and metadata. It consists of one or more semantic groups 550, each of which is a set of synonymous or near-synonymous words. Using the thesaurus 540 makes it easy to extract data whose names, metadata and schemas are related.
  • Each semantic group 550 includes an ID 551 identifying the group, a synonym/near-synonym flag 552 indicating the kind of relation as needed, and a set 553 of identical or similar words.
  • One set of similar words is, for example, “large”, “huge” and several near-synonyms of “huge”. Another is “annealing” (including a variant spelling), “heating” and “heat treatment”.
  • A unit group is a kind of semantic group 550. It includes an ID 561 identifying the unit group, a synonym/similarity flag 562 as needed, and a set of units that are identical or mutually convertible. Examples are “meter”, “inch” and “yard”, or “yen”, “dollar” and “pound”.
  • The metadata 530 and the thesaurus 540 are entered by the integrated data administrator when the integrated data is created.
  • For the thesaurus 540, techniques that build synonym relations automatically by natural language processing have been developed. Such a technique may be used to extract synonym relations among terms from business documents.
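A lookup over semantic groups and unit groups can be sketched as a word-to-group index. This sketch is hypothetical. The class names, the case-insensitive matching, and the representation of convertible units as factors to a shared base unit are assumptions. Currency units are left out because their conversion rates change over time.

```python
from dataclasses import dataclass


@dataclass
class SemanticGroup:             # semantic group 550
    group_id: str                # ID 551
    words: set                   # set 553 of identical or similar words
    flag: str = "synonym"        # synonym / near-synonym flag 552


@dataclass
class UnitGroup:                 # unit group (ID 561, flag 562)
    group_id: str
    to_base: dict                # unit -> factor to a shared base unit (assumed)
    flag: str = "convertible"


class Thesaurus:
    """Hypothetical lookup built from semantic groups and unit groups."""

    def __init__(self, semantic_groups, unit_groups):
        self._group_of = {}
        for group in semantic_groups:
            for word in group.words:
                self._group_of[word.lower()] = group.group_id
        self._unit_groups = unit_groups

    def normalize(self, word):
        """Return the semantic group ID of a word, or the lowercased word itself."""
        return self._group_of.get(word.lower(), word.lower())

    def are_related(self, a, b):
        """True if both words are equal or belong to the same semantic group."""
        return self.normalize(a) == self.normalize(b)

    def convert(self, value, from_unit, to_unit):
        """Convert a value between two units of the same unit group."""
        for group in self._unit_groups:
            if from_unit in group.to_base and to_unit in group.to_base:
                return value * group.to_base[from_unit] / group.to_base[to_unit]
        raise ValueError(f"{from_unit!r} and {to_unit!r} share no unit group")


heat = SemanticGroup("g1", {"annealing", "heating", "heat treatment"})
length = UnitGroup("u1", {"meter": 1.0, "inch": 0.0254, "yard": 0.9144})
thesaurus = Thesaurus([heat], [length])
print(thesaurus.are_related("Annealing", "heat treatment"))  # True
print(thesaurus.convert(1.0, "yard", "inch"))                # about 36
```

Mapping every word to a group ID lets later similarity code compare normalized terms instead of raw strings.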
  • FIG. 5A is a diagram showing a processing flow search screen 1000 according to the embodiment of the present invention.
  • FIG. 5B is a diagram showing a flow of processing flow search processing in the embodiment of the present invention.
  • The processing flow search screen 1000 is an example of a GUI (Graphical User Interface) for processing flow search that the analysis tool 400 provides to the user.
  • The user can create processing flows with the analysis tool 400.
  • Processing flows similar to an incomplete processing flow under construction can be searched for and used to create a new processing flow.
  • The various analysis tools described later may be provided as options.
  • The processing flow search screen 1000 includes an inventory 1100, which lists processing flow components, and a flow editor 1200 for creating processing flows.
  • The inventory 1100 lists the processing step components 1101 that can be used as processing flow components and the data 1102 stored in the integrated database 500 (S5001).
  • A processing step component 1101 is a template for a processing step. Frequently used processes, such as reading data from a CSV (Comma-Separated Values) file, are prepared in advance as processing steps. They can be reused by specifying a few parameters, such as the file path, at the time of use. The user implements a process by dragging a processing step component from the inventory onto the flow editor and supplying the required parameters.
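A processing step component can be pictured as a template that becomes a concrete step once its parameters are filled in. The sketch below is hypothetical. StepTemplate, READ_CSV, run_read_csv and the file name are invented for illustration. The opener argument only exists so that the example can run on an in-memory string instead of a real file.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class StepTemplate:
    """Hypothetical form of a processing step component 1101."""
    name: str
    step_type: str               # "input", "output" or "conversion"
    required_params: tuple       # parameters the user must supply

    def instantiate(self, step_id, **params):
        """Create a concrete processing step once the parameters are given."""
        missing = [p for p in self.required_params if p not in params]
        if missing:
            raise ValueError(f"{self.name}: missing parameters {missing}")
        return {"step_id": step_id, "type": self.step_type,
                "template": self.name, "attributes": dict(params)}


READ_CSV = StepTemplate("read_csv", "input", ("file_path",))


def run_read_csv(step, opener=open):
    """Execute an instantiated read_csv step, returning the rows as dicts."""
    with opener(step["attributes"]["file_path"], newline="") as f:
        return list(csv.DictReader(f))


def fake_open(path, newline=""):
    """In-memory stand-in for the CSV file so the example needs no disk access."""
    return io.StringIO("age,count\n30,12\n41,7\n")


step = READ_CSV.instantiate("s1", file_path="visitors.csv")
print(run_read_csv(step, opener=fake_open))
```

The template holds everything except the parameters. Reusing it therefore means supplying only the file path, as described above.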
  • The data 1102 is recorded as data 510 in the integrated database 500. The user can drag a data item 1102 shown in the inventory 1100 (for example, “power consumption”) onto the flow editor 1200 and use it as input data.
  • The processing step components 1101 and data 1102 may be shown in a narrowed-down list, for example filtered by keyword.
  • In the flow editor 1200, the user defines the input data, the processing steps, and the output data that serves as the evaluation index. The user then defines the processing flow by specifying the processing order with arrows indicating the connection relations (S5002).
  • Data recorded in the integrated database 500 can be used as the output data generated by the processing flow. New data can also be defined.
  • To define new data, the user selects new data 1103 from the output data list of the inventory 1100, drags it onto the flow editor 1200, and clicks it to set a name, for example “power demand forecast value”. The user then opens the metadata input dialog 1201 and enters the metadata or schema of the newly generated output data. Entering the characteristics of the desired data as metadata or schema narrows down the search results for similar processing flows.
  • The items of the metadata 530, such as the target name 532 and the various parameters 533 to 538, can be entered.
  • A numerical value or a numerical range 1203 may be set for a parameter.
  • The processing flow 710 that the user creates first may be a simple one, as shown in the flow editor of FIG. 5A.
  • Similar processing flows are then searched for among the processing flows accumulated in the past.
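Candidates could be narrowed down by the metadata entered in dialog 1201, including a numerical range 1203, along the lines below. The sketch is hypothetical. Constraints are a dict mapping each parameter name to either an exact value or an inclusive (low, high) range. The parameter names target_name and horizon_hours and their values are invented.

```python
def matches(candidate_params, constraints):
    """True if a candidate's metadata satisfies every constraint.
    A tuple constraint is an inclusive (low, high) range; anything else must match exactly."""
    for key, wanted in constraints.items():
        if key not in candidate_params:
            return False
        value = candidate_params[key]
        if isinstance(wanted, tuple):
            low, high = wanted
            if not low <= value <= high:
                return False
        elif value != wanted:
            return False
    return True


def narrow_down(candidates, constraints):
    """Keep only the candidate data IDs whose metadata satisfies the constraints."""
    return [data_id for data_id, params in candidates.items()
            if matches(params, constraints)]


candidates = {
    "d1": {"target_name": "power demand", "horizon_hours": 24},
    "d2": {"target_name": "power demand", "horizon_hours": 168},
    "d3": {"target_name": "defect rate", "horizon_hours": 24},
}
print(narrow_down(candidates, {"target_name": "power demand",
                               "horizon_hours": (1, 48)}))  # ['d1']
```

A candidate without a constrained parameter is rejected here. Treating it as unknown rather than as a mismatch would be an equally valid design choice.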
  • When the user presses the search button 1600 on the processing flow search screen 1000, the analysis tool 400 sends a search request to the management program 100 (S5003). The request uses the entered partial processing flow 710 as the search condition.
  • The management program 100 receives the partial processing flow 710 and uses broad metadata to search the processing flow information 200 for highly similar processing flows. It creates an answer processing flow list and returns it to the analysis tool 400.
  • Specifically, the management program 100 uses broad metadata to find data similar to the input or output data of the received processing flow. It then searches the processing flow information 200 for processing flows that contain that data (S5004). Details of this processing will be described with reference to FIG.
  • The management program 100 transmits the search result to the analysis tool 400 as the answer processing flow list (S5005).
  • The analysis tool 400 displays the names 1301 and similarities 1302 of the returned flows in the search result area 1300 of the processing flow search screen 1000 (S5006).
  • Each entry in the answer processing flow list consists of a processing flow (information) 210 and its processing flow similarity. The method for obtaining the similarity of processing flows will be described with reference to FIG.
  • In the search result area 1300 of the processing flow search screen 1000, the names of similar processing flows and their similarities are listed, for example, in descending order of similarity.
  • When the user selects a processing flow, the selected processing flow 1401 is displayed in the processing flow details area 1400 (S5007).
  • Using the analysis tool, the user can refer to or reuse the data and processing of the displayed processing flow (S5008).
  • Processes drawn with dotted lines in the processing flow 1401 of FIG. 5A are processes not included in the processing flow 710 of the flow editor 1200.
  • The user can click a processing step 1402 of the displayed processing flow to open a dialog and check its processing contents.
  • To reuse a processing step, the user selects it and drags it into the processing step list of the inventory 1100. Alternatively, the step may be dragged directly onto the flow editor 1200.
  • Data can likewise be added to the data list (S5006). If the retrieved processing flow 1401 is itself the desired processing flow, or close to it, it may be reused as is.
  • While processing a search request, the search unit 102 of the repository management computer 1 may record the search history 580 in the integrated database 500. Each entry associates the user ID with the retrieved processing flows.
  • Suppose a user acquires input data from the integrated database computer 5, and that data is identical or similar to the input data of a processing flow the search history 580 shows this same user retrieved. The recorded processing flow is then judged to have been used, and the charge management unit 105 of the management program 100 is notified.
  • The charge management unit 105 charges the user based on the usage notification.
  • In addition, the processing flow design unit 403 may add a selected processing step to the processing step list. In that case the analysis tool 400 may notify the charge management unit 105 of the management program 100 that the processing step was used, and a charge may be made.
  • A usage fee may also be added in association with the data ID of the data used. That is, a user who uses data pays the data's creator, together with the usage fee of the search system.
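The usage-detection and charging rules above can be sketched as follows. This is a hypothetical illustration. The flat per-flow fee, the per-data fees, the similarity threshold of 0.8, the toy similarity function and all IDs are assumptions. The text does not specify how fees are computed.

```python
from collections import defaultdict


def detect_flow_use(user_id, acquired_data_id, search_history, flow_inputs,
                    data_similarity, threshold=0.8):
    """Flows the same user searched for whose input data is identical or
    similar (similarity >= threshold) to the data the user just acquired."""
    used = []
    for hist_user, flow_id in search_history:
        if hist_user != user_id or flow_id in used:
            continue
        inputs = flow_inputs.get(flow_id, ())
        if any(d == acquired_data_id or data_similarity(d, acquired_data_id) >= threshold
               for d in inputs):
            used.append(flow_id)
    return used


class ChargeManager:
    """Hypothetical stand-in for charge management unit 105."""

    def __init__(self, flow_fee, data_fees, data_creators):
        self.flow_fee = flow_fee              # assumed flat fee per flow use
        self.data_fees = data_fees            # data ID -> assumed usage fee
        self.data_creators = data_creators    # data ID -> creator who is paid
        self.user_charges = defaultdict(float)
        self.creator_payouts = defaultdict(float)
        self.usage_log = []

    def notify_flow_use(self, user_id, flow_id):
        self.usage_log.append((user_id, flow_id))
        self.user_charges[user_id] += self.flow_fee

    def notify_data_use(self, user_id, data_id):
        fee = self.data_fees.get(data_id, 0.0)
        self.user_charges[user_id] += fee
        creator = self.data_creators.get(data_id)
        if creator is not None:
            self.creator_payouts[creator] += fee


history = [("u1", "f1"), ("u2", "f2"), ("u1", "f3")]
flow_inputs = {"f1": ["d_power"], "f2": ["d_power"], "f3": ["d_temp"]}


def toy_similarity(a, b):
    """Illustrative stand-in: only these two data sets are considered similar."""
    return 0.9 if {a, b} == {"d_power", "d_power_v2"} else 0.0


manager = ChargeManager(100.0, {"d_power_v2": 20.0}, {"d_power_v2": "creator_a"})
for flow_id in detect_flow_use("u1", "d_power_v2", history, flow_inputs, toy_similarity):
    manager.notify_flow_use("u1", flow_id)
manager.notify_data_use("u1", "d_power_v2")
print(manager.user_charges["u1"], manager.creator_payouts["creator_a"])  # 120.0 20.0
```

Only flow f1 counts as used. It appears in user u1's own history, and its input is similar to the acquired data. Flow f2 belongs to another user's history.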
  • A numerical value measuring the effect of the system of the present embodiment may also be added.
  • The user checks the operation of the processing flow edited in the flow editor 1200 and then registers the newly created processing flow with the management program 100.
  • The registration unit 101 of the management program 100 receives the processing flow registration request from the processing flow registration request unit 401 of the analysis tool 400. It registers the processing flow included in the request in the processing flow information 200.
  • The processing flow registration request unit 401 of the analysis tool 400 stores the IDs of reused processing flows in the flow metadata 220 of the processing flow information 200 as derivation-source information.
  • FIG. 6A is a diagram showing a data correlation discovery screen 6000 according to an embodiment of the present invention.
  • The data correlation discovery screen 6000 is a GUI for data correlation discovery provided by the analysis tool 400.
  • The correlation discovery unit of the analysis tool 400 calculates the correlation between the selected data items.
  • The correlation is then added to the correlation analysis field 6200.
  • The correlation analysis field 6200 displays the correlations between the selected data items as lines and numerical values.
  • The strength of a relationship takes a value from -1.0 to 1.0. The closer it is to 1.0, the stronger the positive correlation; the closer to -1.0, the stronger the negative correlation. A value close to 0 indicates a weak correlation.
  • By referring to the strength of the relationships, the user can judge how the data items are related.
  • Correlations between data considered highly relevant are calculated and displayed first. Details of how the information on this screen is created will be described with reference to FIG.
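A strength on the -1.0 to 1.0 scale matches the Pearson correlation coefficient. The text does not name the measure, so using Pearson here is an assumption. Sorting the pairs by absolute strength is also only one possible display order. The data item names and values are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def correlation_table(series):
    """Pairwise correlations between the selected data items, strongest first."""
    names = list(series)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            pairs.append((a, b, pearson(series[a], series[b])))
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


series = {
    "air_temp": [20, 21, 22, 23],
    "power_use": [40, 42, 44, 46],
    "defect_rate": [4, 2, 3, 1],
}
for a, b, r in correlation_table(series):
    print(f"{a} - {b}: {r:+.2f}")
```

This prints air_temp - power_use: +1.00, then two pairs at -0.80. A value near +1 or -1 corresponds to the strong relationships the screen highlights.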
  • FIG. 6B is a diagram showing a data usage discovery screen 3000 according to an embodiment of the present invention. This tool suggests ways to use the user's data by searching for processing flows that use data similar to the target data. Details of this processing will be described with reference to FIG.
  • The data usage discovery screen 3000 is a GUI provided by the analysis tool 400 for discovering uses of data.
  • The inventory 3100 of the data usage discovery screen 3000 displays some or all of the data 510 stored in the integrated database 500.
  • When the user specifies data, a data usage search request is sent to the management program 100.
  • A processing flow list containing the specified data is returned and received as the search result 3400.
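Data usage discovery amounts to a reverse lookup: given one data item, find the flows whose input data resembles it. The sketch below is hypothetical. The flow-to-input mapping, the threshold of 0.5, the choice of keeping each flow's best-matching input, and the toy similarity table are all assumptions.

```python
def discover_usage(target_data_id, flow_inputs, data_similarity, threshold=0.5):
    """Return (flow_id, best similarity) for flows whose input data is
    similar to the target data, most similar first."""
    results = []
    for flow_id, data_ids in flow_inputs.items():
        best = max((data_similarity(target_data_id, d) for d in data_ids), default=0.0)
        if best >= threshold:
            results.append((flow_id, best))
    results.sort(key=lambda r: r[1], reverse=True)
    return results


# Toy similarity table between the user's data "x" and stored input data.
PAIR_SIMILARITY = {("x", "d1"): 0.9, ("x", "d2"): 0.4, ("x", "d3"): 0.6}


def toy_similarity(a, b):
    return PAIR_SIMILARITY.get((a, b), 0.0)


flow_inputs = {"f1": ["d1", "d2"], "f2": ["d3"], "f3": []}
print(discover_usage("x", flow_inputs, toy_similarity))  # [('f1', 0.9), ('f2', 0.6)]
```

Flow f3 has no input data, so its best similarity defaults to 0.0 and it is dropped.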
  • FIG. 7 is a diagram showing the processing flow search processing performed by the search unit 102 of the management program 100. It corresponds to the GUI described in FIG.
  • The search unit 102 of the management program 100 receives the search request sent from the analysis tool 400. From it, the unit acquires the user ID and the partial processing flow serving as the search condition.
  • The partial processing flow is the processing flow 710 created with the flow editor 1200 of FIG. 5A and has the processing flow format shown in FIG.
  • Besides the partial processing flow, the search request may include keywords such as the processing flow creator, the processing flow creation period or a case name.
  • the search unit 102 acquires an evaluation index (“power demand prediction value” in the example of FIG. 5A) of the partial processing flow (S101), and stores it in the processing flow 201 in the processing flow information 200.
  • the similarity with the evaluation index of the processing flow to be calculated is calculated (S102). This similarity calculation method will be described with reference to FIG.
  • specifically, the output data can be used as the evaluation index.
  • in that case, the similarity of the evaluation indices can be obtained by calculating the similarity between the output data.
  • Information about the output data of a processing flow 210 in the storage resource 11 can be obtained by looking up the integrated database 500 shown in FIG. 4 based on the data of the processing flow 210 shown in FIG. Data 510 whose similarity satisfies a predetermined condition are extracted from the integrated database 500, and the processing flows 210 that have the extracted data as output data are then extracted.
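The two-stage extraction described above (first similar data, then the flows that output them) can be sketched as below. It assumes a generic data similarity function is passed in and that each flow's output data are available as a set of data IDs; the function name, parameters, and the 0.5 default threshold are hypothetical stand-ins for the "predetermined condition".

```python
def flows_with_similar_output(target, data_ids, flow_outputs, similarity, threshold=0.5):
    """Return IDs of flows whose output data include data similar to `target`.

    target: ID of the evaluation-index (output) data of the partial flow
    data_ids: IDs of all data 510 in the integrated database
    flow_outputs: dict flow_id -> set of output data IDs (hypothetical layout)
    similarity: callable (data_a, data_b) -> float
    threshold: hypothetical stand-in for the "predetermined condition"
    """
    # Stage 1: data in the integrated database that are similar enough to the target.
    similar = {d for d in data_ids if similarity(target, d) >= threshold}
    # Stage 2: flows having at least one of those data as output.
    return sorted(fid for fid, outputs in flow_outputs.items() if outputs & similar)
```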
  • the search unit 102 enters into an answer processing flow list (similar flow list) up to the 10 processing flows with the highest similarity, chosen from those whose evaluation index (output data) similarity is equal to or higher than a certain value (for example, 0.5) (S103).
  • the criteria for inclusion in the answer processing flow list are not limited to the above; various other methods may be adopted.
  • the search unit 102 then calculates, for each processing flow in the similar flow list, its similarity to the partial processing flow, this time also taking the similarity of the input data into account (S104).
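Step S103 (threshold, then at most 10 best candidates) can be sketched as follows, using the example values 0.5 and 10 from the text as defaults. The function name and the dictionary input format are assumptions for illustration.

```python
import heapq


def select_similar_flows(scores, threshold=0.5, top_k=10):
    """Pick up to `top_k` flows whose evaluation-index similarity is >= `threshold`.

    scores: dict flow_id -> similarity of its evaluation index (output data)
    Returns a list of (flow_id, similarity), highest similarity first.
    """
    eligible = [(fid, s) for fid, s in scores.items() if s >= threshold]
    # heapq.nlargest(n, it, key) behaves like sorted(it, key=key, reverse=True)[:n].
    return heapq.nlargest(top_k, eligible, key=lambda item: item[1])
```

Applying the threshold before the top-k cut means the list can hold fewer than 10 flows, or none, when few candidates are similar enough.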
  • the search unit 102 returns the answer processing flow list to the analysis tool 400 as the search result (S105). Finally, the user ID and the search result are recorded as a search history 580 via the search history management unit 590 of the integrated database 500 (S106).
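Steps S104 to S106 can be sketched as a re-scoring pass followed by history recording. This is an illustration under assumptions: the whole-flow similarity is passed in as a callable, sorting the answer list by that score is our choice (the text does not say how the list is ordered), and `SearchHistory` is a hypothetical in-memory stand-in for the search history 580. In code the history is written just before the result is returned, whereas the text records it after S105.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SearchHistory:
    """Hypothetical in-memory stand-in for the search history 580."""
    entries: list = field(default_factory=list)

    def record(self, user_id, result):
        self.entries.append(
            {"user_id": user_id, "result": list(result), "at": datetime.now(timezone.utc)}
        )


def finish_search(user_id, answer_list, flow_similarity, history):
    """Re-score the answer list (S104), record the history (S106), return the result (S105).

    answer_list: flow IDs selected in S103
    flow_similarity: callable flow_id -> whole-flow similarity including input data
    """
    scored = [(fid, flow_similarity(fid)) for fid in answer_list]  # S104
    scored.sort(key=lambda item: item[1], reverse=True)            # assumed ordering
    history.record(user_id, scored)                                # S106
    return scored                                                  # S105
```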
  • FIG. 8 is a diagram illustrating the data similarity calculation process performed by the similarity calculation unit 103 of the management program 100. This process is used in, for example, process S102 in FIG. 7, process S304 in FIG. 9, and process S402 in FIG. In the following processing, the metadata and schema of the data are used to calculate the similarity.
  • the similarity calculation unit 103 first acquires data 1 and data 2 which are two data to be compared from the integrated database 500 (S201).
  • the similarity calculation unit 103 compares the name of data 1 with the name of data 2 and checks whether they are identical or, by referring to the thesaurus 540, belong to the same semantic group (S202).
  • the similarity calculation unit 103 compares the unit of data 1 with the unit of data 2 and checks whether they are identical or, by referring to the unit groups, belong to the same unit group (S205).
  • the similarity calculation unit 103 compares the target name of data 1 with the target name of data 2 and checks whether they are identical or, by referring to the semantic groups, belong to the same semantic group (S208).
  • the similarity calculation unit 103 calculates the overall similarity as the sum, over the name, unit, and target name, of each item's similarity multiplied by its weight coefficient (S211).
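Written as a formula, step S211 can be read as below, where $s_{\mathrm{name}}$, $s_{\mathrm{unit}}$ and $s_{\mathrm{target}}$ are the item similarities obtained in S202, S205 and S208 and the $w$ terms are the weight coefficients (the symbol names are ours, not the source's). If the weights are chosen to sum to 1 (an assumption) and each item similarity lies in $[0, 1]$, the result also lies in $[0, 1]$.

```latex
\mathrm{sim}(d_1, d_2) = w_{\mathrm{name}}\, s_{\mathrm{name}}(d_1, d_2)
                       + w_{\mathrm{unit}}\, s_{\mathrm{unit}}(d_1, d_2)
                       + w_{\mathrm{target}}\, s_{\mathrm{target}}(d_1, d_2)
```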
  • although the similarity here is obtained from the data name, unit, and target name, the data similarity may also take into account the similarity of other items included in the schema 520 and the metadata 530.
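The FIG. 8 procedure (S202, S205, S208, then the weighted sum of S211) can be sketched as follows. The text does not give the scores produced by the omitted intermediate steps, so this sketch assumes 1.0 for identical values, a partial score of 0.5 for values in the same group, and 0.0 otherwise; the weights (0.4, 0.2, 0.4) and all names (`DataInfo`, `item_similarity`, `data_similarity`) are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataInfo:
    """Hypothetical subset of schema/metadata used for comparison."""
    name: str
    unit: str
    target: str


def item_similarity(a, b, groups, partial=0.5):
    """1.0 if identical, `partial` if both belong to one group, else 0.0 (assumed scores)."""
    if a == b:
        return 1.0
    for group in groups:
        if a in group and b in group:
            return partial
    return 0.0


def data_similarity(d1, d2, thesaurus, unit_groups, weights=(0.4, 0.2, 0.4)):
    """Weighted sum of name, unit and target-name similarities (weights are assumptions)."""
    w_name, w_unit, w_target = weights
    s_name = item_similarity(d1.name, d2.name, thesaurus)        # S202: thesaurus groups
    s_unit = item_similarity(d1.unit, d2.unit, unit_groups)      # S205: unit groups
    s_target = item_similarity(d1.target, d2.target, thesaurus)  # S208: semantic groups
    return w_name * s_name + w_unit * s_unit + w_target * s_target  # S211
```

For instance, "power demand" in kW for "building A" and "electricity load" in MW for "bldg A" would score 0.5 under these assumptions if each pair is listed in the same group, even though no field matches literally.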
  • FIG. 9 is a diagram illustrating the similarity calculation processing of the processing flow by the similarity calculation unit 103 of the management program 100. This process is used in, for example, the process S104 in FIG.
  • the similarity calculation unit 103 acquires a processing flow 1 as a comparison source and a processing flow 2 as a comparison target (S301).
  • the similarity between the first input data of processing flow 1 and each of the one or more input data of processing flow 2 is calculated (S304). Then, the input data of processing flow 2 with the highest similarity, provided that similarity is equal to or higher than a certain value, is associated with the first input data as its similar data (S305). This process is repeated for all input data in processing flow 1. In this way, similar input data of processing flow 1 and processing flow 2 are associated with each other.
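The association of input data in S304 and S305 can be sketched as a best-match search with a threshold. The 0.5 threshold and the function name are assumptions; the text does not say whether one input of processing flow 2 may be matched by several inputs of processing flow 1, and this sketch allows it.

```python
def associate_inputs(inputs1, inputs2, similarity, threshold=0.5):
    """Map each input of flow 1 to its most similar input of flow 2, if similar enough.

    Returns dict input_of_flow1 -> (matched_input_of_flow2, similarity).
    Inputs of flow 1 without a match at or above `threshold` are left out.
    """
    pairs = {}
    for d1 in inputs1:
        scored = [(d2, similarity(d1, d2)) for d2 in inputs2]  # S304
        if not scored:
            continue
        best, score = max(scored, key=lambda item: item[1])
        if score >= threshold:                                 # S305
            pairs[d1] = (best, score)
    return pairs
```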
  • the similarity calculation unit 103 calculates the sum of the output data similarity and the input data similarity as the similarity of the entire processing flow (S306). Here too, the output data similarity and the input data similarity may be weighted as appropriate.
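Step S306 can be sketched as below. The text only says the two similarities are summed and may be weighted; how the per-pair input similarities from S305 are combined into one "input data similarity" is not stated, so this sketch assumes their average over all inputs of processing flow 1, with unmatched inputs contributing 0. With the default weights of 1.0 the result is the plain sum described in the text.

```python
def flow_similarity(output_sim, matched_scores, n_inputs, w_out=1.0, w_in=1.0):
    """Whole-flow similarity = w_out * output similarity + w_in * input similarity.

    output_sim: similarity of the output data (evaluation index)
    matched_scores: similarities of the input pairs associated in S305
    n_inputs: number of input data in processing flow 1 (assumed normaliser)
    """
    input_sim = sum(matched_scores) / n_inputs if n_inputs else 0.0
    return w_out * output_sim + w_in * input_sim
```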
  • the search unit 102 uses the similarity calculation unit 103 to calculate the similarity between the target data and the input data of each of the one or more processing flows stored in the processing flow 201 (S402).
  • The data similarity calculation used here is the process described with reference to FIG. Processing flows whose input data similarity falls within a predetermined range are then extracted: for example, among the processing flows whose input data similarity is equal to or higher than a certain value, up to 10 with the highest similarity are entered into the similar data usage list (S403). Finally, the search unit 102 returns the similar data usage list to the analysis tool 400 (S404).
  • in this way, uses of data are discovered by searching for processing flows that use data similar to the target data.
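The data usage search (S402 to S404) can be sketched as follows. Scoring each flow by the best similarity between the target data and any of its inputs is an assumption, as are the function name and the input format; the threshold and the limit of 10 follow the example in the text.

```python
import heapq


def similar_data_usage(target, flow_inputs, similarity, threshold=0.5, top_k=10):
    """Build a similar data usage list for `target`.

    flow_inputs: dict flow_id -> list of input data IDs (hypothetical layout)
    similarity: callable (data_a, data_b) -> float
    Returns up to `top_k` (flow_id, similarity) pairs, highest first.
    """
    scored = []
    for fid, inputs in flow_inputs.items():
        if not inputs:
            continue
        best = max(similarity(target, d) for d in inputs)  # S402 (best input, assumed)
        if best >= threshold:
            scored.append((fid, best))
    return heapq.nlargest(top_k, scored, key=lambda item: item[1])  # S403
```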
  • the correlation analysis unit 104 acquires the analysis target data 510 and the metadata 530 of the analysis target data 510 from the integrated database 500 (S502).
  • the processing flow repository manages processing flows together with the metadata and schemas of their input/output data and thesaurus information that defines synonym relationships between terms. Because searches make use of this information, it is possible to determine whether processing flows created by different organizations, and handling data with different terms and schemas, are similar to each other, so both broader and more narrowly targeted searches can be performed. The person in charge of analysis can thus improve the design of a processing flow by drawing on past processing flows. In addition, when looking for correlations between data through correlation analysis, the strength of the relationship between data is first estimated using metadata, schema, and dictionary information, so that correlations between strongly related data are determined preferentially and can be found sooner.
  • the present invention is not limited to the above-described embodiment, and includes various modifications.
  • a part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

With the present invention, processing flows created by different organizations and handling data that differ in terms and schemas can be searched for and reused. A data processing flow management system is configured to manage processing flow information, the input data and output data of processing flows, metadata of the data, schemas, and definitions of synonym relationships between terms. When searching for a processing flow, the input data and output data of the processing flow included in a search condition are compared with the input data and output data of the processing flows managed by the data processing flow management system, using the metadata, the schemas, and the term synonym definitions, which makes it possible to search for processing flows similar to the processing flow specified by the search condition.
PCT/JP2016/070576 2016-07-12 2016-07-12 Data processing flow management system and method WO2018011895A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2016/070576 WO2018011895A1 (fr) 2016-07-12 2016-07-12 Data processing flow management system and method
JP2018527294A JP6612450B2 (ja) 2016-07-12 2016-07-12 Data processing flow management system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2016/070576 WO2018011895A1 (fr) 2016-07-12 2016-07-12 Data processing flow management system and method

Publications (1)

Publication Number Publication Date
WO2018011895A1 true WO2018011895A1 (fr) 2018-01-18

Family

ID=60952453

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2016/070576 WO2018011895A1 (fr) 2016-07-12 2016-07-12 Data processing flow management system and method

Country Status (2)

Country Link
JP (1) JP6612450B2 (fr)
WO (1) WO2018011895A1 (fr)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019149165A (ja) * 2018-02-26 2019-09-05 華凌光電股份有限公司 Panel control system and editing method therefor
JP2019159608A (ja) * 2018-03-09 2019-09-19 株式会社日立製作所 Search device and search method
WO2020049759A1 (fr) * 2018-09-06 2020-03-12 オムロン株式会社 Data processing device, method, and program
JP2020047212A (ja) * 2018-09-21 2020-03-26 株式会社日立製作所 Data registration device and data registration method
WO2021038835A1 (fr) * 2019-08-30 2021-03-04 富士通株式会社 Information processing device and data flow creation program
JP2021140640A (ja) * 2020-03-09 2021-09-16 株式会社日立製作所 Search system and search method
US11886459B2 (en) 2021-06-04 2024-01-30 Hitachi, Ltd. Data management system and data management method
JP7481283B2 (ja) 2021-03-02 2024-05-10 株式会社日立製作所 Metadata management device, data management system, and data reproduction method

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002041167A (ja) * 2000-07-25 2002-02-08 Hitachi Ltd Program execution charging method and apparatus
JP2005352861A (ja) * 2004-06-11 2005-12-22 Nippon Telegr & Teleph Corp <Ntt> Electronic data processing method, electronic data processing device, and electronic data processing program
JP2006260333A (ja) * 2005-03-18 2006-09-28 Fujitsu Ltd Flow search method
JP2008310566A (ja) * 2007-06-14 2008-12-25 Hitachi Ltd Business process creation method, business process creation device, and business process creation program
JP2011065236A (ja) * 2009-09-15 2011-03-31 Nec Corp Service search device, service provision device, service search system, and service search method
JP2012243268A (ja) * 2011-05-24 2012-12-10 Nec Corp Business flow search device, business flow search method, and program
WO2013035134A1 (fr) * 2011-09-08 2013-03-14 株式会社日立製作所 Virtual computer provision system and provision method
JP2013125429A (ja) * 2011-12-15 2013-06-24 Nec Corp Analysis target determination device

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019149165A (ja) * 2018-02-26 2019-09-05 華凌光電股份有限公司 Panel control system and editing method therefor
JP2019159608A (ja) * 2018-03-09 2019-09-19 株式会社日立製作所 Search device and search method
WO2020049759A1 (fr) * 2018-09-06 2020-03-12 オムロン株式会社 Data processing device, method, and program
JP2020042345A (ja) * 2018-09-06 2020-03-19 オムロン株式会社 Data processing device, data processing method, and data processing program
CN112567346A (zh) * 2018-09-06 2021-03-26 欧姆龙株式会社 Data processing device, data processing method, and data processing program
JP7127440B2 (ja) 2018-09-06 2022-08-30 オムロン株式会社 Data processing device, data processing method, and data processing program
US11468082B2 (en) 2018-09-06 2022-10-11 Omron Corporation Data processing apparatus, data processing method, and data processing program stored on computer-readable storage medium
JP2020047212A (ja) * 2018-09-21 2020-03-26 株式会社日立製作所 Data registration device and data registration method
WO2021038835A1 (fr) * 2019-08-30 2021-03-04 富士通株式会社 Information processing device and data flow creation program
JP2021140640A (ja) * 2020-03-09 2021-09-16 株式会社日立製作所 Search system and search method
JP7481283B2 (ja) 2021-03-02 2024-05-10 株式会社日立製作所 Metadata management device, data management system, and data reproduction method
US11886459B2 (en) 2021-06-04 2024-01-30 Hitachi, Ltd. Data management system and data management method

Also Published As

Publication number Publication date
JPWO2018011895A1 (ja) 2018-11-08
JP6612450B2 (ja) 2019-11-27

Similar Documents

Publication Publication Date Title
JP6612450B2 (ja) Data processing flow management system and method
US11403464B2 (en) Method and system for implementing semantic technology
US10146878B2 (en) Method and system for creating filters for social data topic creation
Zakir et al. Big data analytics.
US9965527B2 (en) Method for analyzing time series activity streams and devices thereof
US10002187B2 (en) Method and system for performing topic creation for social data
Koschmider et al. Improving the process of process modelling by the use of domain process patterns
CN112100396B (zh) Data processing method and apparatus
CA2956627A1 (fr) System and engine for targeted clustering of news events
US9996529B2 (en) Method and system for generating dynamic themes for social data
KR102547033B1 (ko) Method for providing information in a user-selected manner using a keyword recognition function
JP7278100B2 (ja) Post evaluation system and method
JP6267398B2 (ja) Service design support system and service design support method
Tambouris et al. Processing linked open data cubes
US20140201193A1 (en) Intellectual property asset information retrieval system
Bodendorf et al. Business analytics in strategic purchasing: Identifying and evaluating similarities in supplier documents
EP4002152A1 (fr) Data tagging and synchronisation system
Modoni et al. The knowledge reuse in an industrial scenario: A case study
Chakravarthy et al. RETRACTED ARTICLE: Mining interesting actionable patterns for web service composition
KR20190052980A (ko) Talent information processing method and apparatus
JP6403864B2 (ja) Service design support system and service design support method
Cheng et al. Smart Home Service Experience Strategic Foresight Using the Social Network Analysis and Future Triangle
JP2010218216A (ja) Similar document search system, method, and program
JP7477791B2 (ja) Processing device, processing method, and processing program
Bellandi et al. A service infrastructure for the Italian digital justice

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase

Ref document number: 2018527294

Country of ref document: JP

121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 16908799

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 16908799

Country of ref document: EP

Kind code of ref document: A1