WO2018066152A1 - Data integration device and data integration method - Google Patents
Data integration device and data integration method
- Publication number
- WO2018066152A1 (PCT/JP2017/011163)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- predetermined
- data format
- information
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06311—Scheduling, planning or task assignment for a person or group
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F5/00—Methods or arrangements for data conversion without changing the order or content of the data handled
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2205/00—Indexing scheme relating to group G06F5/00; Methods or arrangements for data conversion without changing the order or content of the data handled
- G06F2205/003—Reformatting, i.e. changing the format of data representation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
- G06F3/0661—Format or protocol conversion arrangements
Definitions
- The present invention relates to a data integration device and a data integration method, and more specifically to a technology that supports efficient data conversion processing even between data whose conversion definitions are undefined.
- Data integration devices have been developed to promote the cross-use of data across a wide variety of systems. These devices collect and store a wide variety of data from the business systems that serve as data sources, and convert the format and structure of the stored data according to user requirements.
- As related art, an information integration program has been proposed for converting data extracted from an information source and registering it in a storage destination, in which first schema information acquired from the information source is compared with second schema information acquired from the information source before the change, in order to detect a change in the schema of the information source; and attribute values and data included in the schema information in the attribute values of the items related to the schema change
- However, the data format required by a given system or application that needs the above-described conversion processing may differ from the integrated data format.
- The integrated data format is, for example, a data format composed of the data items most commonly used among the given data in the various systems, and the correspondence between those data items across the data of each system is already defined. Accordingly, when the data format required by the above-mentioned system differs from the integrated data format, the definition necessary for the conversion processing is unknown.
- An object of the present invention is to provide a technique that supports efficient data conversion processing even between data whose conversion definitions are undefined.
- The data integration device of the present invention that solves the above-described problems comprises a storage device and an arithmetic unit.
- The storage device stores: information on the data format of each table used in predetermined systems for data of a predetermined event; information on a master data format predetermined for each table as a universal data format across that data; and information on conversion process definitions for data between a given table of the master data format and a given table of the data format of a given system.
- The arithmetic unit calculates a first similarity, which is the similarity between the data format of a table relating to predetermined data whose data format information is not stored in the storage device and the master data format of each table, and specifies the master data format table whose first similarity satisfies a predetermined criterion.
- The arithmetic unit then calculates a second similarity, which is the similarity between the master data format of the specified table and the data format of each system table stored in the storage device, and specifies the system table whose second similarity satisfies a predetermined criterion.
- Finally, the arithmetic unit reads from the storage device the conversion processing definition information relating to the specified master data format table and the specified system table, and outputs that information to a predetermined device as information on reusable conversion processing component candidates.
- In the data integration method of the present invention, an information processing apparatus comprises a storage device that stores information on the data format of each table used in predetermined systems for data of a predetermined event, information on a master data format predetermined for each table as a universal data format across that data, and information on conversion process definitions for data between a given table of the master data format and a given table of the data format of a given system.
- The information processing apparatus executes a process of calculating a first similarity, which is the similarity between the data format of a table relating to predetermined data whose data format information is not stored in the storage device and the master data format of each table, and specifying the master data format table whose first similarity satisfies a predetermined criterion; and a process of calculating a second similarity, which is the similarity between the master data format of the specified table and the data format of each system table stored in the storage device, and specifying the predetermined table of the predetermined system that satis
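The two-stage lookup described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes a table's data format is reduced to a set of column names, uses Jaccard overlap as a stand-in similarity measure, and treats a fixed threshold as the "predetermined criterion". The function names, the threshold, the sample columns, and the file name `prg_example.dat` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: a data format maps table names to sets of column names.
Format = Dict[str, set]


def jaccard(a: set, b: set) -> float:
    """Stand-in similarity measure; the source does not prescribe Jaccard."""
    return len(a & b) / len(a | b) if a | b else 0.0


def find_reusable(new_table: set,
                  master: Format,
                  known: Dict[str, Format],
                  definitions: Dict[Tuple[str, str, str], str],
                  threshold: float = 0.5) -> List[str]:
    """Two-stage lookup: unknown table -> master table -> known system table."""
    candidates = []
    for m_name, m_cols in master.items():
        # Stage 1: first similarity (unknown table vs. master table).
        if jaccard(new_table, m_cols) < threshold:
            continue
        for fmt, tables in known.items():
            for t_name, t_cols in tables.items():
                # Stage 2: second similarity (master table vs. known table).
                if jaccard(m_cols, t_cols) >= threshold:
                    program = definitions.get((m_name, fmt, t_name))
                    if program is not None:
                        candidates.append(program)
    return candidates


# Hypothetical sample data.
master = {"train": {"train number", "train type"}}
known = {"data format X": {
    "train information": {"train number", "train type", "car count"}}}
definitions = {("train", "data format X", "train information"): "prg_example.dat"}
new_table = {"train number", "train type", "station"}

print(find_reusable(new_table, master, known, definitions))
```

The conversion definitions returned are only candidates: they were written for a known system table that resembles the same master table, so a developer would still review them before reuse.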
- FIG. 1 is a diagram showing an example of a network configuration including the data integration device of this embodiment.
- FIG. 2 is a diagram showing an example data format of the data structure definition table of this embodiment.
- FIG. 3 is a diagram showing an example data format of the reusable component extraction result storage table of this embodiment.
- FIG. 4 is a diagram showing an example data format of the similarity calculation parameter table of this embodiment.
- FIG. 5 is a diagram showing an example of the data format that stores the result of calculating the similarity between a master data format table of this embodiment and a table in the data format that a distribution destination system requests
- FIG. (1) is a diagram explaining the process of extracting reusable data conversion processing component candidates that convert data into the data format requested by the distribution destination system of this embodiment.
- FIG. (2) is a diagram explaining the process of extracting reusable data conversion processing component candidates that convert data into the data format requested by the distribution destination system of this embodiment.
- FIG. 1 is a network configuration diagram including the data integration device 100 of the present embodiment. As shown in FIG. 1, the data integration device 100 of this embodiment is connected to an input terminal 120, a distribution source system 130, and a distribution destination system 140 via a dedicated line 150 so that they can communicate with each other.
- The distribution source system 130 is a system, managed and operated by a railway operator for example, that holds train schedule (diagram) data. Data distributed from the distribution source system 130 to the data integration device 100 is converted into the data format of the distribution destination system 140 by a predetermined data conversion program (conversion processing definition) in the data integration device 100, and is then delivered to the distribution destination system 140.
- The distribution destination system 140 is a system managed and operated by a railway operator that executes appropriate operations and services based on the predetermined data derived from the distribution source system 130 described above. Specifically, one can assume a system that manages train operation using observation data of train operation status and the above-described train schedule data.
- The input terminal 120 is a terminal operated by the design developer of a data conversion program that converts data obtained from the distribution source system 130 into the data format desired by the distribution destination system 140.
- The data integration device 100 of this embodiment, which is part of this network configuration, includes a user interface unit 111, a data structure similarity calculation unit 112, a reusable data conversion component extraction unit 113, and a communication unit 114 as functional components implemented by appropriate hardware and software.
- The data integration device 100 also includes a data storage unit 101 as the storage destination of the data handled by these functional units.
- The data structure similarity calculation unit 112 calculates the similarity between the data structure of the table in the data format requested by the distribution destination system 140 and the data structure of each master data format table held in advance by the data integration device 100.
- For the above-described master data format (integrated data format), the correspondence between data items is already defined; that is, it is assumed that data conversion programs that perform data conversion processing between the data items of corresponding tables are already held in the data integration device 100. Details of the processing procedure performed by the data structure similarity calculation unit 112 will be described later with reference to a flowchart.
- The reusable data conversion component extraction unit 113 extracts the data conversion programs that can convert data distributed from the distribution source system 130, via the master data format, into the data format requested by the distribution destination system 140; that is, it extracts "reusable data conversion processing component candidates". Details of the processing procedure performed by the reusable data conversion component extraction unit 113 will be described later with reference to a flowchart.
- The communication unit 114 communicates with the distribution source system 130 via the dedicated line 150, and transmits and receives predetermined distribution data and the data structure definition information 131 related to the distribution data.
- The distribution data (e.g., train schedule data) described above is assumed to be tabular data having a data structure defined by the data structure definition table 107 (FIG. 2).
- The data integration device 100 obtains such tabular data from the distribution source system 130 and stores it in the distribution source data storage unit 110 (FIG. 8).
- The data structure definition information 131 described above consists of the data format, table name, column names in the table, and data types of the distribution data.
- The data integration device 100 stores this data structure definition information 131 in the data structure definition table 107.
- The above-described data structure definition table 107 has the data format shown in FIG. 2 and includes a data format 1101, a table 1072, a column 1103, and a data type 1104 as its data items.
- In the illustrated example, structure definition information relating to a total of three data formats, “master data”, “data format X”, and “data format Y”, is stored.
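As a rough illustration of the data structure definition table, each row can be modelled as a (data format, table, column, data type) record. The sketch below is an assumption-laden example: the class name, the helper `tables_of`, the data type strings, and the sample column "train type" are hypothetical and not taken from the source.

```python
from collections import defaultdict
from typing import NamedTuple


class ColumnDef(NamedTuple):
    """One row of a data structure definition table (field names assumed)."""
    data_format: str
    table: str
    column: str
    data_type: str


# Hypothetical rows; the data types are invented for illustration.
rows = [
    ColumnDef("master data", "train", "train number", "string"),
    ColumnDef("master data", "train", "train type", "string"),
    ColumnDef("data format X", "train information", "train number", "string"),
]


def tables_of(rows, data_format):
    """Group the column definitions of one data format by table name."""
    grouped = defaultdict(list)
    for r in rows:
        if r.data_format == data_format:
            grouped[r.table].append((r.column, r.data_type))
    return dict(grouped)


print(tables_of(rows, "master data"))
```

Grouping by table in this way yields the per-table column lists that a structure comparison would operate on.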
- The user interface unit 111 generates a reuse candidate conversion component presentation screen 1110 (FIG. 16), which presents to the design developer of the data conversion program the candidate data conversion programs (data conversion components) that can be reused for data conversion into the data format of the distribution destination system 140.
- The reuse candidate conversion component presentation screen 1110 includes a distribution destination system data format input area 11101 for inputting the data format of the distribution destination system 140, a reusable component extraction button 11102, and a reuse candidate conversion component list display area 11103.
- Assume that the design developer of the data conversion program views the reuse candidate conversion component presentation screen 1110 on the input terminal 120, inputs the data format required by the distribution destination system 140 in the distribution destination system data format input area 11101, and presses the reusable component extraction button 11102. In this case, the data integration device 100 executes the data structure similarity calculation process and the reusable data conversion component extraction process in accordance with the data format input in the distribution destination system data format input area 11101.
- The data integration device 100 then displays, as a list, the reuse candidate conversion components (known data conversion programs) read from the reusable component extraction result storage table 106 (FIG. 3).
- The reusable component extraction result storage table 106 has the data format shown in FIG. 3. Its data items are a data format 1081, a table 1062, and a column 1083 of the distribution destination system 140; a conversion source column 1084, which indicates the corresponding master data format table and column serving as the base point of data conversion; and a conversion destination column 1085, which indicates the table and column of a given distribution destination system's data format whose values correspond to the values of that master data format column (that is, for which a data conversion program is already known).
- For example, the corresponding information is stored on the assumption that the data conversion program that converts the train number column of the station time table in the master data format into the train number column of the train information table in “data format X” is a reuse candidate.
- The similarity calculation parameter table 102 in the data storage unit 101 has the data format shown in FIG. 4, and defines the weight values used in the data structure similarity calculation processing.
- The data items include an item name 1031 and a similarity calculation weight 1032.
- The item name 1031 indicates a column name in a table; in the example of FIG. 4, values such as “train” and “departure time” are stored.
- The similarity calculation weight 1032 indicates the weight value applied to the result of the matching determination for the corresponding column in the similarity calculation between data structures. In the example of FIG. 4, the value “3” is stored.
- Each entry of the similarity calculation parameter table 102 is registered in advance by an expert.
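A weight table of this kind amounts to a simple lookup from item name to weight. The following sketch assumes a plain dictionary; the constant and function names are hypothetical, the pairing of the weight 3 with these particular item names is assumed for illustration, and the default of 1 for unregistered names follows the matching procedure described later in this document.

```python
# Hypothetical contents of a similarity calculation parameter table:
# item name -> similarity calculation weight. Only the weight value 3
# appears in the text; which item names carry it is assumed here.
SIMILARITY_WEIGHTS = {"train": 3, "departure time": 3}


def name_match_weight(column_name: str, weights=SIMILARITY_WEIGHTS) -> int:
    """Weight applied to a column-name match; unregistered names count as 1."""
    return weights.get(column_name, 1)


print(name_match_weight("departure time"), name_match_weight("platform"))
```

Keeping the weights in a separate table lets a domain expert emphasise the columns that best identify a table without changing the matching code.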
- The similarity calculation result temporary storage unit 103 in the data storage unit 101 is the storage destination that holds, in the tabular format shown in FIG. 5, the result of calculating the similarity between a master data format table and a table in the data format requested by the distribution destination system 140.
- The data items include a table 1041, a column 1042, a table 1043, a column 1044, a data type 1045, and an inter-table similarity 1046.
- The table 1041 indicates the table name in the master data format, and the column 1042 indicates the column names of the table stored in the table 1041.
- The table 1043 indicates the table name in the data format requested by the distribution destination system 140, and the column 1044 indicates the column names of the table stored in the table 1043.
- The data type 1045 indicates the data types of the column 1042 and the column 1044 described above.
- The inter-table similarity 1046 indicates the calculated similarity between the tables stored in the table 1041 and the table 1043 described above. The calculation results for the degree of coincidence between columns are stored in the coincidence degree storage area 1047.
- When the result of calculating the degree of coincidence of the column names is N and the result of calculating the degree of coincidence of the data types is M, the pair of results is stored as (N, M).
- The vertical length of the table illustrated in FIG. 5 equals the number of columns of the table stored in the table 1041, and the horizontal length equals the number of columns of the table stored in the table 1043.
- The similarity calculation result storage unit 105 in the data storage unit 101 stores, in the tabular format shown in FIG. 6, the result of calculating the similarity between a master data format table and the data format tables defined in the data structure definition table.
- The data items include a table 1071, a column 1072, a data format 1073, a table 1074, a column 1075, a data type 1076, and an inter-table similarity 1077.
- The table 1071, the column 1072, the table 1074, the column 1075, the data type 1076, and the inter-table similarity 1077 have the same configuration as in the data format example of the similarity calculation result temporary storage unit 103 illustrated in FIG. 5.
- The data format 1073 has the same configuration as the data format item in the data structure definition table 107.
- The values stored in the coincidence degree storage area 1078 have the same configuration as in the data format example of the similarity calculation result temporary storage unit 103 illustrated in FIG. 5. The example illustrated in FIG. 6 shows the result of calculating the similarity between the “train” table in the master data format and all the tables in “data format X” and “data format Y”.
- The data conversion processing component definition table 104 in the data storage unit 101 is a data table that defines information on the data conversion programs for converting data formats, and has the data format shown in FIG. 7.
- The data items include a conversion source data format 1061, a conversion source table 1042, a conversion source column 1063, a conversion destination data format 1064, a conversion destination table 1065, a conversion destination column 1066, and a program file name 1067.
- The conversion source data format 1061 indicates the data format of the conversion source data, the conversion source table 1042 indicates the table name of the conversion source data, and the conversion source column 1063 indicates the column name in the conversion source table.
- The conversion destination data format 1064 indicates the data format of the conversion destination data, the conversion destination table 1065 indicates the table name of the conversion destination data, and the conversion destination column 1066 indicates the column name in the conversion destination table.
- The program file name 1067 indicates the file name of the program that converts data from the conversion source column 1063 to the conversion destination column 1066.
- In the illustrated example, the program file name “prg00001.dat” is stored for the data conversion program that converts the column “train number” of the table “station time” in the master data format into the column “train number” of the table “train information” in “data format X”.
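Looking up a conversion program in such a definition table is a matter of matching the source and destination (format, table, column) triples. The sketch below is a minimal example under assumed names: `ConversionDef`, its fields, and `find_program` are hypothetical; only the sample row mirrors the example described above.

```python
from typing import NamedTuple, Optional


class ConversionDef(NamedTuple):
    """One row of a conversion component definition table (names assumed)."""
    src_format: str
    src_table: str
    src_column: str
    dst_format: str
    dst_table: str
    dst_column: str
    program_file: str


DEFINITIONS = [
    ConversionDef("master data", "station time", "train number",
                  "data format X", "train information", "train number",
                  "prg00001.dat"),
]


def find_program(defs, src, dst) -> Optional[str]:
    """Return the program file converting column `src` to column `dst`, if any.

    `src` and `dst` are (data format, table, column) tuples.
    """
    for d in defs:
        if ((d.src_format, d.src_table, d.src_column) == src
                and (d.dst_format, d.dst_table, d.dst_column) == dst):
            return d.program_file
    return None


print(find_program(DEFINITIONS,
                   ("master data", "station time", "train number"),
                   ("data format X", "train information", "train number")))
```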
- FIG. 8 is an explanatory diagram showing the principle of data conversion processing in the data integration device 100.
- The data integration device 100 in the present embodiment converts the distribution source data stored in the distribution source data storage unit 110 into the master data format and stores it in the master data storage unit 109. The data integration device 100 then converts the data stored in the master data storage unit 109 into the data format requested by the distribution destination system 140. In this data format conversion processing, the data integration device 100 performs association processing, column conversion, and arithmetic processing between the columns of the conversion source table and the columns of the conversion destination table; this processing is stored in the data conversion component library 108 as data conversion programs.
- In the example shown in FIG. 8, the data conversion component library 108 holds a group of data conversion components (data conversion programs) that convert data in the master data format stored in the master data storage unit 109 into the data format required by the distribution destination system 140.
- For example, conversion to “data format X” required by “distribution destination system X” is realized by using a data conversion program for every column of every table of “data format X”. It is assumed that the data conversion programs for the data format required by the distribution destination system 140 are developed in advance and registered in the data conversion component library 108.
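The two-hop principle (source format to master format, then master format to destination format) can be sketched as function composition. Everything in this example is hypothetical: the source column name "no", the zero-padding rule, and the function names are invented to show the shape of the idea, not the behaviour of any real conversion program.

```python
from typing import Callable, Dict

Row = Dict[str, str]
Converter = Callable[[Row], Row]


def to_master(row: Row) -> Row:
    # Hypothetical source -> master mapping: rename "no" to "train number".
    return {"train number": row["no"]}


def master_to_x(row: Row) -> Row:
    # Hypothetical master -> "data format X" mapping: zero-pad to 5 digits.
    return {"train number": row["train number"].zfill(5)}


def convert_via_master(row: Row, first: Converter, second: Converter) -> Row:
    """Convert source data to a destination format through the master format."""
    return second(first(row))


result = convert_via_master({"no": "123"}, to_master, master_to_x)
print(result)
```

Routing every conversion through the master format means each system needs converters only to and from the master format, rather than one converter per pair of systems.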
- FIG. 9 is a diagram illustrating a hardware configuration example of the data integration device 100.
- The data integration device 100 of this embodiment includes a CPU 201, an HDD 202, a memory 203, an input device 204, a display device 205, and a communication device 206.
- The CPU 201 is an arithmetic device that performs data input and output, reading, storage, and various other processes.
- The HDD 202 is a nonvolatile storage unit that stores data.
- The memory 203 is a volatile storage unit that temporarily stores programs and data.
- The input device 204 is a device such as a keyboard, a mouse, or a microphone that receives operation input from the user.
- The display device 205 is a device such as a display that presents data to the user.
- The communication device 206 is a device such as a network card that communicates with the distribution source system 130 or the distribution destination system 140 via the dedicated line 150 and transmits and receives data.
- The above-described functional units are implemented by the CPU 201 executing the program 207 stored in the HDD 202 or the memory 203.
- FIG. 10 is a diagram showing flow example 1 of the data integration method according to the present embodiment. Specifically, it is a flowchart showing the series of procedures by which the data integration device 100 calculates the data structure similarity and extracts, from the existing data conversion programs, those that can be reused to convert data of the distribution source system 130 into the data format desired by the distribution destination system 140.
- Assume that the design developer of the data conversion program inputs, on the presentation screen 1110 shown in FIG. 16 and displayed on the input terminal 120, the data format and data structure requested by the distribution destination system 140 together with a data structure similarity calculation processing request.
- The data integration device 100 receives from the input terminal 120 the data format and data structure information requested by the distribution destination system 140 and the data structure similarity calculation processing request input by the design developer (301). This step is not necessary when the data integration device 100 has already acquired this information by another means or route.
- FIG. 11 shows a data format example showing a data structure related to the “train / station” table of the data format “data format Z” requested by the delivery destination system 140.
- Data items in the exemplified data structure include a data format 1401, a table 1402, a column 1403, and a data type 1404. The configuration of this data item is the same as that of the data item in the data structure definition table 107 described above.
- The data structure similarity calculation unit 112 of the data integration device 100 calculates the similarity between the data structure of the table in the data format requested by the distribution destination system 140 and the data structure of each table in the master data format (302).
- The reusable data conversion component extraction unit 113 of the data integration device 100 extracts the reusable data conversion processing program candidates for converting data into the data format requested by the distribution destination system 140 (303).
- The user interface unit 111 of the data integration device 100 refers to the reusable component extraction result storage table 106 shown in FIG. 3, generates a screen (FIG. 16) that lists the reusable programs as data conversion programs for converting data into the data format requested by the distribution destination system 140, returns the screen to the display terminal (304), and ends the process.
- FIG. 12a is a flowchart showing the details of the procedure by which the data structure similarity calculation unit 112 calculates the similarity between the data structure of the table in the data format requested by the distribution destination system 140 and the data structure of each table in the master data format.
- The data structure similarity calculation unit 112 of the data integration device 100 acquires from the data structure definition table 107 the data records of each table whose data format is the “master data format” (3021).
- The data structure similarity calculation unit 112 loops over all the master data format tables whose data records were acquired in step 3021 (3022).
- The data structure similarity calculation unit 112 loops over all tables registered in the data structure definition table 107 whose data format is other than the “master data format”, that is, over the tables of each data format of the known distribution destination systems 140 (3023).
- For the columns of the master data format table that is the current loop target from step 3021 and the columns of the distribution destination system 140 data format table that is the current loop target in step 3023, the data structure similarity calculation unit 112 calculates the degree of coincidence between the columns and the similarity between the tables (30231). Details of the processing procedure for calculating the similarity between the tables are described with reference to the following flowchart.
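The nested loops of steps 3022 and 3023 visit every pairing of a master data format table with a known system table. A minimal sketch, assuming each table is a list of column names and that the per-pair calculation is supplied as a callable (both the function name and this interface are hypothetical):

```python
from itertools import product


def all_pair_similarities(master_tables, known_tables, similarity):
    """Apply `similarity` to every (master table, known table) pair.

    master_tables / known_tables: dict of table key -> list of columns.
    similarity: any callable taking two column lists (assumed interface).
    """
    results = {}
    for (m_name, m_cols), (k_name, k_cols) in product(
            master_tables.items(), known_tables.items()):
        results[(m_name, k_name)] = similarity(m_cols, k_cols)
    return results


# Hypothetical stand-in for the per-pair calculation: count shared columns.
def shared(a, b):
    return len(set(a) & set(b))


out = all_pair_similarities(
    {"train": ["train number"]},
    {("data format X", "train information"): ["train number", "car count"]},
    shared,
)
print(out)
```

Passing the similarity function as a parameter keeps the iteration order separate from the scoring rule, so the scoring can be changed without touching the loops.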
- the data structure similarity calculation unit 112 determines the degree of coincidence between the column of the loop target table in the master data format described above and the column of the loop target in the data format of the distribution destination system 140, and the similarity between the tables. Is a flowchart showing details of a procedure for calculating each of.
- the data structure similarity calculation unit 112 of the data integration device 100 performs a loop on all the columns of the master data format table that is the loop target table in the above-described step 3022 (3024).
- the data structure similarity calculation unit 112 of the data integration device 100 performs a loop on all the columns of the data format table of the distribution destination system 140, which is the loop target table in step 3023 described above (3025). ).
- the data structure similarity calculation unit 112 of the data integration device 100 loops the column name of the loop target column in the master data format table that is the loop target and the data format table loop of the distribution destination system 140 that is the loop target. It is determined whether the column name of the target column matches (3026).
- If the column names do not match, the data structure similarity calculation unit 112 of the data integration device 100 stores “0” in the matching degree storage area 1047 of the similarity calculation result temporary storage unit 103 (30211).
- If the column names match, the data structure similarity calculation unit 112 of the data integration device 100 refers to the similarity calculation parameter table 102 and acquires all values of the item names and similarity calculation weights in that table (3027).
- The data structure similarity calculation unit 112 of the data integration device 100 determines whether the target column name whose determination result was “match” in step 3026 is defined among the item names obtained in step 3027 (3028).
- If the column name is not defined among the item names, the data structure similarity calculation unit 112 of the data integration device 100 stores “1” in the matching degree storage area 1047 of the similarity calculation result temporary storage unit 103 (30210).
- If the column name is defined among the item names, the data structure similarity calculation unit 112 of the data integration device 100 stores the calculation result of “1 × similarity calculation weight” in the matching degree storage area 1047 of the similarity calculation result temporary storage unit 103 (3029).
- The data structure similarity calculation unit 112 of the data integration device 100 determines whether the data type of the loop target column in the loop target master data format table matches the data type of the loop target column in the loop target data format table of the distribution destination system 140 (30212).
- If the data types match, the data structure similarity calculation unit 112 of the data integration device 100 stores “1” in the matching degree storage area 1047 of the similarity calculation result temporary storage unit 103 (30213).
- If the data types do not match, the data structure similarity calculation unit 112 of the data integration device 100 stores “0” in the matching degree storage area 1047 of the similarity calculation result temporary storage unit 103 (30214).
- The data structure similarity calculation unit 112 of the data integration device 100 calculates the similarity between the loop target master data format table and the loop target data format table of the distribution destination system 140 as Σ(matching degree) / {2 × (number of columns of master data table × number of columns of table to be compared)}, stores the calculation result in the inter-table similarity 1046 of the similarity calculation result temporary storage unit 103 (30215), and ends the process.
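As a sketch of the step 30215 formula, let $a_{ij}$ and $b_{ij}$ (symbols introduced here only for illustration) be the column-name and data-type matching degrees for master column $i$ and compared column $j$. Assuming the sum runs over every column pair and the denominator is the product of the two column counts, the inter-table similarity can be written as:

$$
S = \frac{\sum_{i=1}^{N_m} \sum_{j=1}^{N_c} \left( a_{ij} + b_{ij} \right)}{2 \, N_m \, N_c}
$$

where $N_m$ is the number of columns of the master data table and $N_c$ is the number of columns of the table to be compared. Under this reading, $S \le 1$ whenever no similarity calculation weight greater than 1 is applied, because each pair then contributes at most 2 to the numerator.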
- FIG. 13 is an explanatory diagram showing a concept of performing similarity calculation processing for the “train” table in the master data format and the “train / station” table in the “data format Z”.
- the data integration apparatus 100 determines that the column names of the “train number” column in the “train” table in the master data format and the “train / station” table in the “data format Z” match. This matching column name “train number” is defined in the item name of the similarity calculation parameter table 102. Therefore, the data integration device 100 acquires the similarity calculation weight “3” corresponding to this “train number”.
- the data integration device 100 stores “3”, which is the column name coincidence calculation result, in an area 10471 corresponding to the “train number” column in the coincidence degree storage area 1047.
- The data integration apparatus 100 also stores “1”, which is the data type coincidence calculation result, in the area 10471 corresponding to the “train number” column in the matching degree storage area 1047.
- the data integration apparatus 100 performs the above-described processing for all combinations of each column of the “train” table in the master data format and each column of the “train / station” table in the “data format Z”.
- the data integration device 100 calculates the inter-table similarity for the “train” table in the master data format and the “train / station” table in the “data format Z”.
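The column-by-column matching and the inter-table similarity described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Column` class, the function names, the data types, and every column other than “train number” (with its weight “3”) are assumptions made for the example, and the denominator is assumed to be 2 × (master column count × compared column count).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    data_type: str


def matching_degrees(master: Column, target: Column,
                     weights: dict[str, float]) -> tuple[float, float]:
    """Return (column-name matching degree, data-type matching degree)."""
    if master.name != target.name:
        name_degree = 0.0                      # step 30211
    else:
        # Step 3029 stores 1 x weight when the name is listed in the
        # parameter table; step 30210 stores a plain 1 otherwise.
        name_degree = 1.0 * weights.get(master.name, 1.0)
    # Steps 30212 to 30214: data type comparison.
    type_degree = 1.0 if master.data_type == target.data_type else 0.0
    return name_degree, type_degree


def table_similarity(master_cols: list[Column], target_cols: list[Column],
                     weights: dict[str, float]) -> float:
    """Step 30215 (assumed form): sum of matching degrees / (2 x pair count)."""
    total = sum(sum(matching_degrees(m, t, weights))
                for m in master_cols for t in target_cols)
    return total / (2 * len(master_cols) * len(target_cols))


# Hypothetical column layouts; only "train number" and its weight 3
# are taken from the description.
weights = {"train number": 3.0}
train = [Column("train number", "string"), Column("train type", "string")]
train_station = [Column("train number", "string"),
                 Column("station name", "string"),
                 Column("arrival time", "time")]

print(matching_degrees(train[0], train_station[0], weights))
print(table_similarity(train, train_station, weights))
```

With these assumed tables, the “train number” pair yields the matching degrees (3, 1) described for area 10471, and the remaining pairs contribute only data-type matches.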
- FIG. 14 is a flowchart showing the details of the procedure (step 303 in the main flow) in which the reusable data conversion component extraction unit 113 of the data integration apparatus 100 extracts candidates for data conversion processing programs that can be reused when converting predetermined data of the distribution source system 130 into the data format required by the distribution destination system 140.
- The “reusable data conversion program” refers to a known data conversion program that is defined, in relation to a predetermined table in the master data format, to convert data in a predetermined table of the distribution source system 130 into the data format of a predetermined distribution destination system 140.
- the data integration apparatus 100 of the present embodiment provides information for reusing a known data conversion program for the data format of the delivery destination system 140 for which the data conversion program is not yet defined.
- The reusable data conversion component extraction unit 113 of the data integration device 100 loops over all the corresponding tables (whose information was obtained in step 301) in the data format requested by the distribution destination system 140 (3031).
- the reusable data conversion component extraction unit 113 of the data integration device 100 performs a loop for all the columns of the table to be looped in the loop (3032).
- The reusable data conversion component extraction unit 113 of the data integration device 100 refers to the similarity calculation result storage unit 105 (FIG. 6), which holds the similarity calculated for the relationship between each table in the master data format and the loop target data format table of the delivery destination system 140, and acquires information on each master data format column whose column name or data type matches the column of the loop target table, together with information on its table (3033).
- The reusable data conversion component extraction unit 113 of the data integration device 100 determines whether the result of step 3033 contains a column whose column name or data type matches, that is, a column whose matching degree (a, b) satisfies a > 0 or b > 0 (3034).
- If no such column exists, the reusable data conversion component extraction unit 113 of the data integration device 100 stores the value “no reusable candidate” in the conversion source column 1084 and the conversion destination column 1085 of the reusable component extraction result storage table 106 (3036).
- If such a column exists, the reusable data conversion component extraction unit 113 of the data integration device 100 adds the matching degrees of the column name and the data type for each corresponding column and identifies the column having the maximum sum among the corresponding columns (3035).
- the reusable data conversion component extraction unit 113 of the data integration device 100 determines whether there are a plurality of columns specified in step 3035 described above (3037).
- If only one column was specified, the reusable data conversion component extraction unit 113 of the data integration device 100 acquires the column name of the corresponding column and the table name of the master data format table having that column (3039).
- If a plurality of columns were specified, the reusable data conversion component extraction unit 113 acquires the inter-table similarity of each table having a corresponding column and specifies the master data format table for which that similarity is the maximum (3038).
- the reusable data conversion component extraction unit 113 of the data integration device 100 acquires the column name of the corresponding column and the table name in the specified master data format table.
- The reusable data conversion component extraction unit 113 of the data integration device 100 loops over the combinations of corresponding column and corresponding table for which the column name and the table name were acquired in either step 3038 or step 3039 (30310).
- The reusable data conversion component extraction unit 113 of the data integration device 100 refers to the similarity calculation result storage unit 105 and acquires the inter-table similarity for the master data format table targeted in the above-described loop, together with the matching degree calculation results for the loop target column (30311).
- The reusable data conversion component extraction unit 113 of the data integration device 100 determines whether, between the master data format table and each table of all data formats in the distribution destination system 140, there is a column whose column name or data type matches, that is, a column whose matching degree (a, b) satisfies a > 0 or b > 0 (30312). If no such column exists as a result of this determination (30312: NO), the reusable data conversion component extraction unit 113 of the data integration device 100 stores the value “no reusable candidate” in the conversion source column 1084 and the conversion destination column 1085 of the reusable component extraction result storage table 106 (30314).
- If such a column exists, the reusable data conversion component extraction unit 113 of the data integration device 100 adds the matching degrees of the column name and the data type for each corresponding column and acquires the data format, the corresponding table, and the column name of the delivery destination system 140 for which the sum is the maximum (30313).
- the reusable data conversion component extraction unit 113 of the data integration device 100 determines whether there are a plurality of columns acquired in step 30313 (30315).
- If a plurality of columns were acquired, the reusable data conversion component extraction unit 113 of the data integration device 100 specifies, among the tables containing the corresponding columns, the table whose similarity to the corresponding master data format table is the maximum (30316).
- If only one column was acquired, the reusable data conversion component extraction unit 113 of the data integration device 100 advances the processing to step 30317.
- The reusable data conversion component extraction unit 113 of the data integration device 100 determines that the data conversion program that converts the column data of the predetermined table in the master data format into the column data of the corresponding table in the data format (of the delivery destination system 140) specified in step 30316 above is a reusable candidate component for conversion to the column of the table looped over in steps 3031 and 3032. It then stores the “column of the master data format table acquired in step 3038 or 3039” in the conversion source column 1084 and the “column of the acquired data format table of the distribution destination system 140” in the conversion destination column 1085 of the reusable component extraction result storage table 106 (30317).
- FIG. 15a and FIG. 15b illustrate a specific processing concept for extracting reusable data conversion processing component candidates as a data conversion program for converting data to the “train number” column of the “train / station” table in the data format “data format Z” requested by the distribution destination system 140.
- The reusable data conversion component extraction unit 113 of the data integration device 100 acquires, as columns whose column name or data type matches between both tables, the information of the “train number” column of the “train” table in the master data format and the “train number” column of the “station time” table in the master data format.
- The reusable data conversion component extraction unit 113 of the data integration device 100 identifies the “station time” table in the master data format, which has the maximum inter-table similarity of “0.47”, and acquires the name of the “station time” table and the name of the “train number” column.
- The unit then acquires the matching degree calculation results for all columns of all tables of “data format X” and “data format Y”.
- The reusable data conversion component extraction unit 113 of the data integration device 100 stores, in the reusable component extraction result storage table 106, the processing component that converts the “train number” column of the “station time” table in the master data format into the “train number” column of the “train information” table of “data format X”, as a reusable component candidate that performs data conversion to the “train number” column of the “train / station” table of “data format Z”.
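The two-stage selection of step 303 (first the best master data format column, then the best column of an already-defined data format) can be sketched as below. This is an illustration under stated assumptions: the `Match` record, the function names, and all numeric values except the “station time” similarity of “0.47” and the “train number” weight of “3” are hypothetical, and the table name “timetable” for “data format Y” is invented for the example.

```python
from __future__ import annotations

from typing import NamedTuple

NO_CANDIDATE = "no reusable candidate"


class Match(NamedTuple):
    data_format: str
    table: str
    column: str
    name_degree: float       # a
    type_degree: float       # b
    table_similarity: float  # inter-table similarity of the owning table


def select_best(matches: list[Match]) -> Match | None:
    # Steps 3034 / 30312: keep columns with a > 0 or b > 0.
    hits = [m for m in matches if m.name_degree > 0 or m.type_degree > 0]
    if not hits:
        return None
    # Steps 3035 / 30313: maximum of a + b.
    top_score = max(m.name_degree + m.type_degree for m in hits)
    top = [m for m in hits if m.name_degree + m.type_degree == top_score]
    # Steps 3037 to 3039 / 30315 to 30316: break ties by table similarity.
    return max(top, key=lambda m: m.table_similarity)


def extract_candidate(master_matches: list[Match],
                      known_matches: dict[tuple[str, str], list[Match]]):
    """Return (conversion source column, conversion destination column)."""
    source = select_best(master_matches)
    if source is None:
        return NO_CANDIDATE, NO_CANDIDATE            # step 3036
    dest = select_best(known_matches.get((source.table, source.column), []))
    if dest is None:
        return NO_CANDIDATE, NO_CANDIDATE            # step 30314
    return (f"{source.table}.{source.column}",       # step 30317
            f"{dest.data_format}/{dest.table}.{dest.column}")


# Candidates for the "train number" column of "train / station" (data format Z).
master_matches = [
    Match("master", "train", "train number", 3, 1, 0.40),         # 0.40 is assumed
    Match("master", "station time", "train number", 3, 1, 0.47),
]
known_matches = {
    ("station time", "train number"): [
        Match("data format X", "train information", "train number", 3, 1, 0.50),
        Match("data format Y", "timetable", "train no", 0, 1, 0.20),
    ],
}
result = extract_candidate(master_matches, known_matches)
print(result)
```

Both master candidates have the same sum of matching degrees, so the tie is broken by inter-table similarity and the “station time” table is chosen, as in the FIG. 15 description.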
- FIG. 16 is an example of a screen generated by the user interface unit 111, and illustrates a reuse candidate conversion component presentation screen 1110 that is presented to a data conversion program design developer via the input terminal 120.
- The reuse candidate conversion component presentation screen 1110 includes a delivery destination system data format input area 11101, a reusable component extraction button 11102, and a reuse candidate conversion component display area 11103.
- The reuse candidate conversion component display area 11103 displays information on the records in the reusable component extraction result storage table 106 whose data item for the distribution destination data format matches the value input in the distribution destination system data format input area 11101, which is used as a key, together with the file name of the data conversion program that converts from the conversion source column 1084 to the conversion destination column 1085.
- The file name of the data conversion program is the value of the program file name 1067 of the record extracted from the data conversion processing component definition table 104 using the values of the conversion source column 1084 and the conversion destination column 1085 of the record described above as keys.
- The data conversion program “prg00001.dat”, which converts from the “train number” column of the “station time” table in the master data format to the “train number” column of the “train information” table of “data format X”, and the data conversion program “prg00005.dat”, which converts from the “station name” column of the “station time” table in the master data format to the “station name” column of the “train information” table of “data format X”, are each displayed as reusable candidates.
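The lookup behind the display area can be sketched as a filter on the extraction result table followed by a keyed lookup in the component definition table. The dictionary field names, the function name, and the “train / station.station name” target column are assumptions for illustration; only the program file names and the source and destination columns come from the description.

```python
def rows_for_display(dest_format, extraction_results, component_definitions):
    """Collect display rows for one distribution destination data format."""
    rows = []
    for rec in extraction_results:
        # Key: the value typed into the data format input area.
        if rec["target_format"] != dest_format:
            continue
        # Keys into the component definition table: (source, destination).
        key = (rec["source"], rec["destination"])
        program = component_definitions.get(key, "(no program defined)")
        rows.append((rec["target_column"], rec["source"],
                     rec["destination"], program))
    return rows


# Stand-in for the reusable component extraction result storage table.
extraction_results = [
    {"target_format": "data format Z",
     "target_column": "train / station.train number",
     "source": "station time.train number",
     "destination": "data format X/train information.train number"},
    {"target_format": "data format Z",
     "target_column": "train / station.station name",   # assumed target column
     "source": "station time.station name",
     "destination": "data format X/train information.station name"},
]

# Stand-in for the data conversion processing component definition table.
component_definitions = {
    ("station time.train number",
     "data format X/train information.train number"): "prg00001.dat",
    ("station time.station name",
     "data format X/train information.station name"): "prg00005.dat",
}

rows = rows_for_display("data format Z", extraction_results, component_definitions)
for row in rows:
    print(row)
```

Keying the definition table on the (source, destination) pair mirrors the description, in which both column values of the extracted record are used as keys.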
- The means for extracting candidates for the reusable data conversion program is not limited to the one described above; a method based on another known machine learning technique, for example a classifier such as a neural network or a support vector machine, may be used.
- The user interface unit 111 may change the display form of the column to a clickable highlight, such as underlined characters.
- FIG. 17 shows a display example in this case.
- Clickable highlighting is applied to a column for which a match was specified in the match determination between columns (steps 3028 to 3029 and step 30210) and to which the similarity calculation weight value in the similarity calculation parameter table 102 was applied.
- The user interface unit 111 of the data integration device 100 displays the characters of the column “train number” in the “station time” table in the master data format in bold with an underline, and likewise displays the characters of the column “train number” in the “train information” table of “data format X” in bold with an underline.
- the user interface unit 111 of the data integration device 100 displays the pull-down menu 111031 below the underlined part, for example, according to the event that the above-mentioned design developer operates the input terminal 120 and clicks on the underlined part.
- This pull-down menu 111031 is an interface that allows the design developer to change the value of the similarity calculation weight of the similarity calculation parameter table 102 used in the above-described matching determination for the corresponding column.
- The menu allows the similarity calculation weight value applied to the “train number” column to be selected from the range “3” to “1”.
- In response to the selection of a similarity calculation weight value received from the design developer in the pull-down menu 111031, the user interface unit 111 of the data integration device 100 instructs the data structure similarity calculation unit 112 to calculate each similarity using the selected similarity calculation weight value.
- The data structure similarity calculation unit 112 re-executes each process necessary for similarity calculation (step 302) in accordance with this instruction. The reusable data conversion component extraction unit 113, having received the result of the re-execution, also re-executes each process necessary for the extraction process (step 303) of the reusable data conversion program based on the result of the similarity calculation and the like.
- the user interface unit 111 acquires the result of such re-execution, updates the screen 1110, and displays it on the input terminal 120. Therefore, the above-described design developer can confirm the result when the weight value for similarity calculation is changed.
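The effect of changing a weight in the pull-down menu can be sketched as a handler that updates the weight and re-runs the similarity calculation. The function names, the tuple representation of columns, and the table layouts are assumptions for illustration; the similarity formula is the assumed form with denominator 2 × (master column count × compared column count).

```python
def similarity(master, target, weights):
    """master / target: lists of (column name, data type) tuples."""
    total = 0.0
    for m_name, m_type in master:
        for t_name, t_type in target:
            if m_name == t_name:
                # Weighted name match; unlisted names count as 1.
                total += weights.get(m_name, 1.0)
            if m_type == t_type:
                total += 1.0
    return total / (2 * len(master) * len(target))


def on_weight_selected(weights, column, new_weight, master, target):
    """Hypothetical menu handler: apply the new weight, then redo step 302."""
    updated = {**weights, column: new_weight}
    return updated, similarity(master, target, updated)


# Hypothetical tables; only the "train number" weights 3 and 1 are from the text.
master = [("train number", "string"), ("station name", "string")]
target = [("train number", "string"), ("arrival time", "time")]
weights = {"train number": 3.0}

before = similarity(master, target, weights)
weights_after, after = on_weight_selected(weights, "train number", 1.0,
                                          master, target)
print(before, after)
```

In a fuller sketch the handler would also re-run the extraction of step 303 and refresh the screen; returning the updated weights instead of mutating them keeps the previous result available for comparison.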
- The pull-down menu 111031 is shown as an example of a user interface that accepts a change in the similarity calculation weight value. However, the present invention is not limited to this, and various existing interfaces that receive a change instruction for a predetermined event (e.g., a slider bar, multiple radio buttons, etc.) may be employed as appropriate.
- According to the present embodiment, work such as establishing the correspondence between the data format required by the delivery destination system or application and the data format of the master data can be saved, and data conversion processing components that have already been designed and developed can be presented to the user of the data integration apparatus as reusable components.
- The calculation device may perform a match determination of each column name and data type between the target tables when calculating the first and second similarities and calculate the similarity by applying the result of the match determination to a predetermined algorithm. When outputting the information of the reusable conversion processing component candidate, it may read out the information of the conversion processing definition related to the columns for which a match was specified in the match determination between the specified predetermined table in the master data format and the predetermined table of the predetermined system, and output that information to the predetermined device as information on a reusable conversion processing component candidate.
- According to this, the above-mentioned similarity is efficiently calculated with suitable accuracy, and information on conversion processing component candidates that can be reused for the corresponding columns between the tables specified based on that similarity can be presented to a predetermined person in charge. As a result, even between data for which the conversion definition is undefined, it is possible to support the realization of a more efficient data conversion process with high accuracy.
- When calculating each similarity, the calculation device may apply to the result of the match determination a weight value determined for each column according to the magnitude of its influence on the similarity, and then calculate the similarity by the predetermined algorithm.
- When outputting the information of the reusable conversion processing component candidate, the calculation device may further output, for the columns for which a match was specified in the match determination between the specified predetermined table in the master data format and the predetermined table of the predetermined system and to which the weight value was applied, a change interface for the weight value applied to those columns. The calculation of each similarity and each process associated with the calculation may be re-executed in response to the weight value change instruction received by the change interface.
- The information processing apparatus may perform a match determination of each column name and data type between the target tables when calculating the first and second similarities and calculate the similarity by applying the result of the match determination to a predetermined algorithm. When outputting the information of the reusable conversion processing component candidate, it may read from the storage device the information of the conversion processing definition related to the columns for which a match was specified in the match determination between the specified predetermined table in the master data format and the predetermined table of the predetermined system, and output that information to a predetermined apparatus as reusable conversion processing component candidate information.
- When calculating each similarity, the information processing apparatus may apply to the result of the match determination a weight value determined for each column according to the magnitude of its influence on the similarity, and then calculate the similarity by the predetermined algorithm.
- When outputting the information of the reusable conversion processing component candidate, the information processing apparatus may further output, for the columns for which a match was specified in the match determination between the specified predetermined table in the master data format and the predetermined table of the predetermined system and to which the weight value was applied, a change interface for the weight value applied to those columns. In accordance with the weight value change instruction received by the change interface, the calculation of each similarity and each process associated with the calculation may be re-executed.
- 100 Data integration device
- 101 Data storage unit
- 102 Similarity calculation parameter table
- 103 Similarity calculation result temporary storage unit
- 104 Data conversion processing component definition table
- 105 Similarity calculation result storage unit
- 106 Reusable component extraction result storage table
- 107 Data structure definition table
- 108 Data conversion component library
- 109 Master data storage unit
- 110 Distribution source data storage unit
- 111 User interface unit
- 112 Data structure similarity calculation unit
- 113 Reusable data conversion component extraction unit
- 114 Communication unit
- 120 Input terminal
- 130 Distribution source system
- 131 Data structure definition information
- 140 Distribution destination system
- 150 Dedicated line
- 201 CPU (arithmetic unit)
- 202 HDD (storage device)
- 203 Memory
- 204 Input device
- 205 Display device
- 206 Communication device
- 207 Program
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR1020197003935A KR102243794B1 (ko) | 2016-10-07 | 2017-03-21 | Data integration apparatus and data integration method |
| US16/330,397 US20200193343A1 (en) | 2016-10-07 | 2017-03-21 | Data integration apparatus and data integration method |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2016-198655 | 2016-10-07 | | |
| JP2016198655A JP6723893B2 (ja) | 2016-10-07 | 2016-10-07 | Data integration apparatus and data integration method |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2018066152A1 (ja) | 2018-04-12 |
Family
ID=61831657
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2017/011163 Ceased WO2018066152A1 (ja) | 2016-10-07 | 2017-03-21 | Data integration apparatus and data integration method |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US20200193343A1 (en) |
| JP (1) | JP6723893B2 (en) |
| KR (1) | KR102243794B1 (en) |
| WO (1) | WO2018066152A1 (en) |
Families Citing this family (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11494688B2 (en) * | 2018-04-16 | 2022-11-08 | Oracle International Corporation | Learning ETL rules by example |
| JP2022059247A (ja) * | 2020-10-01 | 2022-04-13 | 富士フイルムビジネスイノベーション株式会社 | Information processing apparatus and program |
| EP4258173A4 (en) * | 2020-12-31 | 2024-03-06 | Huawei Technologies Co., Ltd. | PROCESSING METHOD AND APPARATUS FOR MODEL |
| US20240296173A1 (en) * | 2021-01-25 | 2024-09-05 | Nec Corporation | Information processing device, control method, and storage medium |
| KR102766064B1 (ko) * | 2022-03-28 | 2025-02-13 | 주식회사 알차다 | Vehicle purchase quotation system and method |
| JP2024157205A (ja) * | 2023-04-25 | 2024-11-07 | 株式会社日立製作所 | Data conversion apparatus and data conversion method |
| KR102685789B1 (ko) * | 2023-11-02 | 2024-07-17 | 예스넷 주식회사 | System, apparatus, and method for performing data code conversion |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2007083371A1 (ja) * | 2006-01-18 | 2007-07-26 | Fujitsu Limited | Data integration apparatus, method, and recording medium storing a program |
| JP2009145972A (ja) * | 2007-12-11 | 2009-07-02 | Hitachi Information Systems Ltd | Database system and method for controlling a database system |
| JP2013225285A (ja) * | 2012-03-19 | 2013-10-31 | Ricoh Co Ltd | Information processing apparatus, information processing method, and program |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP5601066B2 (ja) | 2010-07-23 | 2014-10-08 | 富士通株式会社 | Information integration program, apparatus, and method |
- 2016-10-07: JP application JP2016198655A, patent JP6723893B2 (ja), status Active
- 2017-03-21: KR application KR1020197003935A, patent KR102243794B1 (ko), status Active
- 2017-03-21: US application US16/330,397, publication US20200193343A1 (en), status Abandoned
- 2017-03-21: WO application PCT/JP2017/011163, publication WO2018066152A1 (ja), status Ceased
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2007083371A1 (ja) * | 2006-01-18 | 2007-07-26 | Fujitsu Limited | Data integration apparatus, method, and recording medium storing a program |
| JP2009145972A (ja) * | 2007-12-11 | 2009-07-02 | Hitachi Information Systems Ltd | Database system and method for controlling a database system |
| JP2013225285A (ja) * | 2012-03-19 | 2013-10-31 | Ricoh Co Ltd | Information processing apparatus, information processing method, and program |
Also Published As
| Publication number | Publication date |
|---|---|
| KR102243794B1 (ko) | 2021-04-23 |
| KR20190028485A (ko) | 2019-03-18 |
| JP6723893B2 (ja) | 2020-07-15 |
| JP2018060430A (ja) | 2018-04-12 |
| US20200193343A1 (en) | 2020-06-18 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
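| JP6723893B2 (ja) | | Data integration apparatus and data integration method |
| US9135351B2 (en) | | Data processing method and distributed processing system |
| US20190251471A1 (en) | | Machine learning device |
| CN112527765B (zh) | | Data migration method and apparatus |
| US10120658B2 (en) | | Method and system for realizing software development tasks |
| JP7015319B2 (ja) | | Data analysis support apparatus, data analysis support method, and data analysis support program |
| CN110837356A (zh) | | Data processing method and apparatus |
| CN106503268B (zh) | | Data comparison method, apparatus, and system |
| CN107463356A (zh) | | Method and apparatus for executing a task flow |
| JP7015320B2 (ja) | | Data analysis support apparatus, data analysis support method, and data analysis support program |
| CN112269588A (zh) | | Algorithm upgrade method, apparatus, terminal, and computer-readable storage medium |
| JP2011060062A (ja) | | Support system, support method, and support program for system specification changes |
| JP5600826B1 (ja) | | Unstructured data processing system, unstructured data processing method, and program |
| JP6900265B2 (ja) | | Data analysis system and data analysis method |
| US20180293285A1 (en) | | Information providing method, information providing device, and computer-readable recording medium |
| US20240168859A1 (en) | | Software performance verification system and software performance verification method |
| JP5449438B2 (ja) | | Software asset reuse support apparatus and software asset reuse support program |
| JP5081889B2 (ja) | | Input support apparatus, input support method, and input support program |
| JP6157166B2 (ja) | | Component generation system, method, and program |
| KR20220122562A (ko) | | Subgraph matching method and apparatus |
| WO2017088547A1 (zh) | | Data upgrade method and apparatus |
| JP2017004500A (ja) | | Analysis support method, analysis support program, and analysis support apparatus |
| JP2014096026A (ja) | | Application platform selection system |
| JP6498588B2 (ja) | | Information distribution system and information distribution method |
| JP6664306B2 (ja) | | Similar document extraction apparatus, similar document extraction method, and similar document extraction program |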
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17857992; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 20197003935; Country of ref document: KR; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 17857992; Country of ref document: EP; Kind code of ref document: A1 |