CN111078774A

CN111078774A - Automatic data integration method

Info

Publication number: CN111078774A
Application number: CN201911235771.8A
Authority: CN
Inventors: 承孝敏; 水新莹; 张宇光; 王兰
Original assignee: Anhui Data Anqiao Data Technology Co Ltd; Institute Of Smart City University Of Science And Technology Of China Wuhu
Current assignee: Anhui Data Anqiao Data Technology Co Ltd; Institute Of Smart City University Of Science And Technology Of China Wuhu
Priority date: 2019-12-05
Filing date: 2019-12-05
Publication date: 2020-04-28

Abstract

The invention discloses an automatic data integration method comprising the following steps: S1, screening the obtained initial source data tables based on the target fields in the data model, and selecting the source data tables that contain a target field; S2, establishing a mapping relation between source fields and standard fields; S3, based on that mapping relation, mapping the data in the source fields of the source data tables to the corresponding standard fields of the data summary table; and S4, calculating the weight of each target field in the data summary table and, among the records with the same primary key field and the same target field, inserting the record whose target field has the highest weight into the data model for storage. Data integration is updated automatically at regular intervals by an ETL process driven by the data model, so that automatic integration is realized.

Description

Automatic data integration method

Technical Field

The invention belongs to the technical field of data acquisition, and particularly relates to an automatic data integration method.

Background

Researchers have studied methods for integrating data from different sources extensively and obtained some results, but integrating data from different data sources remains a problem.

The existing scheme for integrating data from different data sources is as follows: first, custom development is carried out on the data of each data source according to a preset format, and then the custom-developed data are integrated. Although this scheme can integrate data, the integrated data come from different data sources, their data formats and data types vary, and the same data may differ between periods, so each kind of data requires its own custom development. The scheme therefore has the following disadvantages: (1) each type of data requires different custom development, so the development cost is high; (2) each department holds many kinds of data, so the number of custom-developed versions and the number of data types to be maintained keep growing, and the cost of maintaining the various versions increases.

Disclosure of Invention

The invention provides an automatic data integration method, aiming to realize automatic integration of data from different data sources.

To achieve this purpose, the invention adopts the following technical scheme: an automatic data integration method, specifically comprising the following steps:

S1, screening the obtained initial source data tables based on the target fields in the data model, and selecting the source data tables that contain a target field;

S2, establishing a mapping relation between source fields and standard fields, wherein a source field is a field in a source data table, a standard field is a field of the data model and of the data summary table, and a target field is defined as the standard field in the data summary table and the data model;

S3, based on the mapping relation between the source fields and the standard fields, mapping the data in the source fields of the source data tables to the corresponding standard fields of the data summary table;

S4, calculating the weight of each target field in the data summary table and, among the records with the same primary key field and the same target field, inserting the record whose target field has the highest weight into the data model for storage.

Further, the field weight calculation formula is specifically as follows:

Further, after step S3, the method further includes:

S5, detecting whether the authority of each standard field in the data summary table is unique; if not, executing step S4; if so, directly inserting the record of the standard field into the data model.

Further, target fields in the data model are defined based on user requirements.

The automatic data integration method provided by the invention has the following beneficial technical effects:

1) Data from different data sources are acquired and split into data of each type according to the data model, so no separate custom development is needed for each type of data and the development cost is reduced; all data types of all data models use the same process, so maintenance is convenient and the later maintenance cost is reduced.

2) Data integration is updated automatically at regular intervals by an ETL process driven by the data model, so automatic integration is realized. The method thus solves the problem in the related art that data from different sources cannot be integrated automatically.

3) The data-model-based automatic integration method improves the efficiency of integrating and processing data from different sources while safeguarding the accuracy and validity of the data.

Drawings

Fig. 1 is a flowchart of an automatic data integration method according to an embodiment of the present invention.

Detailed Description

The following detailed description of the embodiments of the present invention will be given in order to provide those skilled in the art with a more complete, accurate and thorough understanding of the inventive concept and technical solutions of the present invention.

Fig. 1 is a flowchart of an automatic data integration method according to an embodiment of the present invention, where the method specifically includes the following steps:

In the embodiment of the invention, the target fields are fields in the data model. One data model may comprise one or more target fields, which are set based on user requirements; the data model is used to store the integrated data.

In the embodiment of the present invention, the initial source data tables may come from different business departments. Take obtaining a person's mobile phone number as an example: in the traffic police vehicle data table, the mobile phone number may be recorded as "mobile phone number of owner"; in the data table of the human resources and social security bureau, it may be recorded as "TELEPHONE"; and in the data table of the education department, it may be recorded as "contact information". Assuming the mobile phone number is the target field, the source data tables containing mobile phone number information need to be manually screened from the initial source data tables.
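The screening in step S1 can be sketched as follows. This is only a minimal illustration, assuming each source table is described by its name and a list of column names and that the alternative names of the target field are already known; the function name, table names, and data layout are hypothetical and are not taken from the source text.

```python
def screen_source_tables(initial_tables, target_aliases):
    """Keep only the tables that contain at least one alias of the target field.

    initial_tables: dict mapping table name -> list of column names (assumed layout).
    target_aliases: set of source-field names known to denote the target field.
    Returns a dict mapping each retained table name -> its matching columns.
    """
    selected = {}
    for table_name, columns in initial_tables.items():
        matching = [c for c in columns if c in target_aliases]
        if matching:
            selected[table_name] = matching
    return selected


# Hypothetical tables modelled on the mobile phone number example.
initial_tables = {
    "traffic_police_vehicles": ["plate_number", "mobile phone number of owner"],
    "social_security": ["id_number", "TELEPHONE"],
    "education": ["student_name", "contact information"],
    "weather": ["station", "temperature"],
}
aliases = {"mobile phone number of owner", "TELEPHONE", "contact information"}
screened = screen_source_tables(initial_tables, aliases)
```

Tables without any matching column (here the hypothetical "weather" table) are dropped before any mapping is attempted.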

In the embodiment of the invention, in the process of screening the source data tables, the mapping relation between the target field and the corresponding source field in each source data table is established; the source fields are the fields in the source data tables whose names are synonyms or alternative names of the target field. In the above example, the defined target field is "mobile phone number", and the mapping objects of the target field "mobile phone number" are "mobile phone number of owner", "TELEPHONE", and "contact information".
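One way to picture the mapping relation of step S2 is a lookup table keyed by source table and source field. The sketch below assumes such a dictionary representation; the table names and the helper `to_standard_field` are hypothetical and chosen only for illustration.

```python
# Hypothetical mapping: (source table, source field) -> standard field.
FIELD_MAPPING = {
    ("traffic_police_vehicles", "mobile phone number of owner"): "mobile phone number",
    ("social_security", "TELEPHONE"): "mobile phone number",
    ("education", "contact information"): "mobile phone number",
}


def to_standard_field(table_name, source_field):
    """Return the standard field mapped to a source field, or None if unmapped."""
    return FIELD_MAPPING.get((table_name, source_field))


standard = to_standard_field("social_security", "TELEPHONE")
unmapped = to_standard_field("social_security", "id_number")
```

Keying on the table name as well as the field name is a design assumption: it lets the same source field name mean different things in different departments.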

In the embodiment of the invention, in the data summary table, each row of data corresponds to a record and each column corresponds to a target field. The target field records the source field name, the standard field name, the source field value under the source field name, the update time of the source field value, and the credibility of the source field. The credibility of the source data is calculated as: credibility = trusted data amount / total source data amount. The trusted data in the source data can be understood as standard data, whose accuracy rate is 100%. In the above example, the standard field in the data summary table is "mobile phone number"; the field values under the source fields "mobile phone number of owner", "TELEPHONE", and "contact information" in the source data tables are inserted under the standard field "mobile phone number" and serve as the field values of that standard field.
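The loading of the data summary table in step S3, together with the credibility ratio, might look roughly like the sketch below. It assumes source rows are dictionaries carrying an `id_number` primary key and an `updated` date; those key names, the function names, and the sample values are hypothetical. Only the ratio trusted amount / total amount comes from the text.

```python
from datetime import date


def credibility(trusted_count, total_count):
    """Credibility of a source = trusted data amount / total source data amount."""
    if total_count == 0:
        return 0.0  # assumption: an empty source is treated as not credible
    return trusted_count / total_count


def to_summary_rows(table_name, rows, mapping, source_credibility):
    """Map source rows into data summary table rows.

    rows: list of dicts with 'id_number', 'updated', and source fields (assumed layout).
    mapping: dict source field -> standard field for this table.
    """
    summary = []
    for row in rows:
        for source_field, standard_field in mapping.items():
            if source_field in row:
                summary.append({
                    "primary_key": row["id_number"],
                    "source_table": table_name,
                    "source_field": source_field,
                    "standard_field": standard_field,
                    "value": row[source_field],
                    "updated": row["updated"],
                    "credibility": source_credibility,
                })
    return summary


rows = [{"id_number": "ID001", "TELEPHONE": "13800000000", "updated": date(2019, 6, 1)}]
summary = to_summary_rows(
    "social_security", rows, {"TELEPHONE": "mobile phone number"}, credibility(90, 100)
)
```

Each summary row keeps both the source field name and the standard field name, so the origin of a value stays traceable after integration.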

In the embodiment of the invention, the primary key field may be an identification number, a passport number, a military officer certificate number, a unified social credit code, a business license registration number, an organization code, or a taxpayer identification number. Take the mobile phone number as an example, with the identification number as the primary key field: suppose the same identification number has 3 records in the data summary table; the record whose mobile phone number has the highest weight is inserted into the data model. One identification number can have three records in the data summary table because the records come from different data sources; within a single data source, a primary key value has only one record.
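The selection in step S4 can be sketched as a grouped maximum. The example below assumes the weight of each summary row has already been computed and stored under a `weight` key; the row layout, function name, and sample values are hypothetical.

```python
def select_highest_weight(summary_rows):
    """For each (primary key, standard field) pair, keep the row with the highest weight."""
    best = {}
    for row in summary_rows:
        key = (row["primary_key"], row["standard_field"])
        if key not in best or row["weight"] > best[key]["weight"]:
            best[key] = row
    return best


# Three records for the same identification number, one per data source.
summary_rows = [
    {"primary_key": "ID001", "standard_field": "mobile phone number",
     "value": "13800000001", "weight": 0.62},
    {"primary_key": "ID001", "standard_field": "mobile phone number",
     "value": "13800000002", "weight": 0.91},
    {"primary_key": "ID001", "standard_field": "mobile phone number",
     "value": "13800000003", "weight": 0.40},
]
data_model = select_highest_weight(summary_rows)
```

Only one record per primary key and target field reaches the data model, which is the behaviour the three-record example describes.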

In the embodiment of the present invention, the field weight calculation formula is specifically as follows:

Because data come from many departments, some fields in some departments merely support business management and their accuracy cannot be guaranteed; the data stored by some departments may also be long out of date, while the state of the recorded object has since changed. The field weight is therefore calculated by combining field credibility and data timeliness.
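The weight formula itself is not reproduced in this text, so the sketch below is not the disclosed formula. It only illustrates one possible way to combine credibility with timeliness, under the assumption that the weight is the credibility multiplied by an exponential recency factor; the function name and the `half_life_days` parameter are hypothetical.

```python
from datetime import date


def field_weight(credibility, updated, today, half_life_days=365.0):
    """Illustrative weight: credibility scaled by a recency factor.

    ASSUMPTION: this product form and the half-life parameter are made up for
    illustration; they are not the formula of the source document.
    """
    age_days = max((today - updated).days, 0)
    recency = 0.5 ** (age_days / half_life_days)  # 1.0 when fresh, halves every half-life
    return credibility * recency


today = date(2019, 12, 5)
fresh = field_weight(0.8, date(2019, 12, 5), today)
stale = field_weight(0.8, date(2018, 12, 5), today)
```

Under this assumed form, two records with equal credibility are ranked by how recently they were updated, and a stale record from a credible source can lose to a fresh record from a less credible one.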

In the embodiment of the present invention, after step S3, the method further includes:

S5, detecting whether the authority of each standard field in the data summary table is unique; if the detection result is no, executing step S4; if the detection result is yes, directly inserting the record in which the standard field is located into the data model.

From the perspective of data authority, some data have a single source, meaning one source department is responsible for their accuracy, while other data have multiple sources and the responsible party cannot be determined. When the authority of a standard field is not unique, that is, when the data have multiple sources, the reliability of each source is measured by the weight of the standard field, and the record with the highest weight for that standard field is inserted into the data model for storage. When the authority is unique, the data have a single source, so the weight of the target field does not need to be calculated.
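The branch between steps S5 and S4 can be sketched as follows, assuming that "unique authority" can be approximated by a standard field having exactly one source table in the data summary table. That reading, the row layout, and the function name are assumptions made for illustration.

```python
def integrate(summary_rows):
    """Insert records into the data model, using weights only for multi-source fields."""
    # Collect the set of source tables that supply each standard field.
    sources_by_field = {}
    for row in summary_rows:
        sources_by_field.setdefault(row["standard_field"], set()).add(row["source_table"])

    data_model = {}
    for row in summary_rows:
        key = (row["primary_key"], row["standard_field"])
        if len(sources_by_field[row["standard_field"]]) == 1:
            # S5: single source, insert directly; no weight is needed.
            data_model[key] = row
        else:
            # S4: multiple sources, keep the record with the highest weight.
            current = data_model.get(key)
            if current is None or row["weight"] > current["weight"]:
                data_model[key] = row
    return data_model


summary_rows = [
    {"primary_key": "ID001", "standard_field": "birth date",
     "source_table": "public_security", "value": "1990-01-01"},
    {"primary_key": "ID001", "standard_field": "mobile phone number",
     "source_table": "social_security", "value": "13800000001", "weight": 0.55},
    {"primary_key": "ID001", "standard_field": "mobile phone number",
     "source_table": "education", "value": "13800000002", "weight": 0.75},
]
model = integrate(summary_rows)
```

In this sketch the hypothetical "birth date" field has one source and is copied without a weight, while the two competing mobile phone numbers are resolved by weight.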

The invention has been described above by way of example with reference to the accompanying drawing. The invention is clearly not limited to the specific implementation described above; applying the inventive concept and technical solution to other applications without substantial modification falls within the scope of the invention.

Claims

1. An automatic data integration method is characterized by specifically comprising the following steps of:

2. The method for automatically integrating data according to claim 1, wherein the formula for calculating the weight of the field is as follows:

3. The method for automatically integrating data according to claim 1 or 2, further comprising, after step S3:

4. A method for automated integration of data according to claim 1, wherein the target fields in the data model are defined based on user requirements.