CN111078774A - Automatic data integration method - Google Patents

Automatic data integration method

Info

Publication number
CN111078774A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911235771.8A
Other languages
Chinese (zh)
Inventor
承孝敏
水新莹
张宇光
王兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Data Anqiao Data Technology Co Ltd
Institute Of Smart City University Of Science And Technology Of China Wuhu
Original Assignee
Anhui Data Anqiao Data Technology Co Ltd
Institute Of Smart City University Of Science And Technology Of China Wuhu
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-12-05
Filing date
2019-12-05
Publication date
2020-04-28
Application filed by Anhui Data Anqiao Data Technology Co Ltd and Institute Of Smart City University Of Science And Technology Of China Wuhu
Priority to CN201911235771.8A
Publication of CN111078774A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/23 Updating
    • G06F16/24 Querying
    • G06F16/245 Query processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an automatic data integration method comprising the following steps: S1, screening the acquired initial source data tables against the target fields of the data model and keeping the source data tables that contain a target field; S2, establishing a mapping relation between source fields and standard fields; S3, based on that mapping relation, mapping the data in the source fields of the source data tables to the corresponding standard fields of the data summary table; and S4, calculating the weight of each target field in the data summary table and, among records that share the same primary key field and the same target field, inserting the record whose target field has the highest weight into the data model for storage. The integrated data are refreshed automatically on a schedule by an ETL process driven by the data model, so integration is fully automatic.

Description

Automatic data integration method
Technical Field
The invention belongs to the technical field of data acquisition, and particularly relates to an automatic data integration method.
Background
Researchers have studied many methods for integrating data from different sources and have achieved some results, but integrating data from heterogeneous sources remains a problem.
The existing scheme for integrating data from different sources first performs custom development on each source's data according to a preset format and then integrates the customized data. This works, but because the data come from different sources, their formats and data types vary, and the same data may differ between periods, so every kind of data needs its own custom development. The scheme therefore has the following disadvantages: (1) each type of data requires separate custom development, so development cost is high; (2) each department holds many kinds of data, so the number of customized versions and of data types to maintain keeps growing, which increases the cost of maintaining all these versions.
Disclosure of Invention
The invention provides an automatic data integration method whose aim is to integrate data from different data sources automatically.
To achieve this aim, the invention adopts the following technical solution: an automatic data integration method comprising the following steps:
S1, screening the acquired initial source data tables based on the target fields in the data model, and keeping the source data tables that contain a target field;
S2, establishing a mapping relation between source fields and standard fields, where a source field is a field in a source data table, a standard field is a field of the data model and of the data summary table, and a target field is a standard field as defined in the data summary table and the data model;
S3, based on the mapping relation between source fields and standard fields, mapping the data in the source fields of the source data tables to the corresponding standard fields of the data summary table;
S4, calculating the weight of each target field in the data summary table and, among records with the same primary key field and the same target field, inserting the record whose target field has the highest weight into the data model for storage.
Further, the field weight is calculated by the following formula:
[Formula image not available in this text version]
Further, after step S3, the method also includes:
S5, detecting whether each standard field in the data summary table has a unique authoritative source; if not, performing step S4; if so, inserting the record of that standard field directly into the data model.
Further, the target fields in the data model are defined based on user requirements.
The automatic data integration method provided by the invention has the following beneficial technical effects:
1) Data from different sources are acquired and split by type according to the data model, so no type-specific custom development is needed, which lowers development cost. Because every data model uses the same processing flow, maintenance is simple and later maintenance cost is reduced.
2) The integrated data are refreshed automatically on a schedule by an ETL process driven by the data model. This data-model-based method solves the problem in the related art that data from different sources cannot be integrated automatically.
3) The data-model-based method improves the efficiency of integrating and processing data from different sources while ensuring the accuracy and validity of the data.
Drawings
Fig. 1 is a flowchart of an automatic data integration method according to an embodiment of the present invention.
Detailed Description
The embodiments of the invention are described in detail below to give those skilled in the art a more complete, accurate, and thorough understanding of the inventive concept and technical solutions of the invention.
Fig. 1 is a flowchart of an automatic data integration method according to an embodiment of the present invention, where the method specifically includes the following steps:
S1, screening the acquired initial source data tables based on the target fields in the data model, and keeping the source data tables that contain a target field.
In this embodiment, target fields are fields in a data model. A data model may contain one or more target fields, the target fields are set according to user requirements, and the data model stores the integrated data.
In this embodiment, the initial source data tables may come from different business departments. Take a person's mobile phone number as an example: in the traffic police vehicle data table it may be recorded as "mobile phone number of car owner", in the data table of the human resources and social security bureau as "TELEPHONE", and in the education department's data table as "contact information". If the mobile phone number is used as a target field, the source data tables containing mobile phone number information need to be screened manually from the initial source data tables.
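Step S1 can be illustrated with a minimal Python sketch. It assumes each initial source table is described only by its column names and that an alias registry (the hypothetical `TARGET_ALIASES`) lists the names a target field may carry; the table names are made up for illustration. The embodiment screens tables manually, so this automated lookup is only one possible way to support that step.

```python
from typing import Dict, List, Set

# Hypothetical alias registry: target field -> names it may carry in source
# tables. The aliases follow the example above; the structure is an assumption.
TARGET_ALIASES: Dict[str, Set[str]] = {
    "mobile phone number": {
        "mobile phone number of car owner",
        "TELEPHONE",
        "contact information",
    },
}


def screen_source_tables(tables: Dict[str, List[str]], target_field: str) -> List[str]:
    """Step S1: return the source tables that contain the target field,
    either under its own name or under one of its known aliases."""
    names = TARGET_ALIASES.get(target_field, set()) | {target_field}
    return [name for name, columns in tables.items() if names.intersection(columns)]


# Hypothetical initial source tables (table name -> column names).
initial_tables = {
    "traffic_police_vehicles": ["plate_number", "mobile phone number of car owner"],
    "hr_social_security": ["id_number", "TELEPHONE"],
    "education": ["id_number", "contact information"],
    "land_registry": ["parcel_id", "area"],
}
print(screen_source_tables(initial_tables, "mobile phone number"))
# ['traffic_police_vehicles', 'hr_social_security', 'education']
```

The alias registry carries the same information that step S2 turns into a mapping, so in practice it would be maintained once and reused.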
S2, establishing a mapping relation between source fields and standard fields, where a source field is a field in a source data table, a standard field is a field of the data model and of the data summary table, and a target field is a standard field as defined in the data summary table and the data model.
In this embodiment, while the source data tables are screened, a mapping relation is established between the target field and the corresponding source field in each source data table; the source fields are the fields in the source data tables that carry a synonym, near-synonym, or alternative name of the target field. Continuing the example above, the defined target field is "mobile phone number", and it is mapped to "mobile phone number of car owner", "TELEPHONE", and "contact information".
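The mapping relation of step S2 can be held as a plain lookup table. In this hedged sketch, a hypothetical `build_field_mapping` keys each source field by (source table, source field), an assumption made because identical column names in different departments need not mean the same thing. The function rejects a source field claimed by two different standard fields.

```python
from typing import Dict, Tuple

# (source table, source field) identifies a source field unambiguously,
# since the same column name may mean different things in different tables.
FieldKey = Tuple[str, str]


def build_field_mapping(declared: Dict[str, Dict[str, str]]) -> Dict[FieldKey, str]:
    """Step S2: turn {standard field: {source table: source field}} into a
    lookup {(source table, source field): standard field}."""
    mapping: Dict[FieldKey, str] = {}
    for standard_field, per_table in declared.items():
        for table, source_field in per_table.items():
            key = (table, source_field)
            previous = mapping.get(key)
            if previous is not None and previous != standard_field:
                raise ValueError(
                    f"{key} is already mapped to {previous!r}, not {standard_field!r}"
                )
            mapping[key] = standard_field
    return mapping


mapping = build_field_mapping({
    "mobile phone number": {
        "traffic_police_vehicles": "mobile phone number of car owner",
        "hr_social_security": "TELEPHONE",
        "education": "contact information",
    },
})
print(mapping[("hr_social_security", "TELEPHONE")])  # mobile phone number
```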
S3, based on the mapping relation between source fields and standard fields, mapping the data in the source fields of the source data tables to the corresponding standard fields of the data summary table.
In this embodiment, each row of the data summary table corresponds to a record and each column to a target field. For each target field the table records the source field name, the standard field name, the source field value under that source field name, the update time of that value, and the credibility of the source field. The credibility of source data is calculated as: credibility = amount of trusted data in the source / total amount of source data, where trusted data can be understood as standard data with an accuracy rate of 100%. Continuing the example above, the standard field in the data summary table is "mobile phone number", and the values under the source fields "mobile phone number of car owner", "TELEPHONE", and "contact information" in the source data tables are inserted under the standard field "mobile phone number" as its values.
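The following sketch shows step S3 together with the credibility ratio described above. The layout is an assumption: instead of one wide row per record, each target-field cell is stored as a flat `SummaryCell` holding the five items the embodiment lists plus the primary key and source table. The column name `updated`, the table and key names, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SummaryCell:
    """One target-field cell of the data summary table (hypothetical layout)."""
    primary_key: str      # e.g. the ID card number of the record
    source_table: str
    source_field: str     # source field name
    standard_field: str   # standard field name
    value: str            # source field value
    updated: date         # update time of the value
    credibility: float    # credibility of the source field


def credibility(trusted: int, total: int) -> float:
    """Credibility of source data = trusted data amount / total data amount."""
    if total <= 0:
        raise ValueError("total must be positive")
    return trusted / total


def map_to_summary(table: str, rows: List[Dict[str, object]],
                   mapping: Dict[Tuple[str, str], str],
                   key_field: str, cred: float) -> List[SummaryCell]:
    """Step S3: copy values of mapped source fields into standard-field cells.
    Each row must carry its primary key under `key_field` and its update
    date under the hypothetical column "updated"."""
    cells = []
    for row in rows:
        for column, value in row.items():
            standard = mapping.get((table, column))
            if standard is None:
                continue  # column is not mapped to any standard field
            cells.append(SummaryCell(
                primary_key=str(row[key_field]), source_table=table,
                source_field=column, standard_field=standard,
                value=str(value), updated=row["updated"], credibility=cred,
            ))
    return cells


mapping = {("hr_social_security", "TELEPHONE"): "mobile phone number"}
rows = [{"id_number": "ID-001", "TELEPHONE": "13800000000",
         "updated": date(2019, 6, 1)}]
cells = map_to_summary("hr_social_security", rows, mapping, "id_number",
                       credibility(95, 100))
print(cells[0].standard_field, cells[0].value, cells[0].credibility)
# mobile phone number 13800000000 0.95
```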
S4, calculating the weight of each target field in the data summary table and, among records with the same primary key field and the same target field, inserting the record whose target field has the highest weight into the data model for storage.
In this embodiment, the primary key field may be the ID card number, passport number, military officer certificate number, unified social credit code, business license registration number, organization code, or taxpayer identification number. Taking the mobile phone number as an example with the ID card number as the primary key field, suppose the same ID card number has three records in the data summary table. There are three records because the data come from different sources; within a single data source, each primary key value has only one record. The record with the highest mobile phone number weight is inserted into the data model.
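The selection in step S4 reduces to a group-by-maximum over (primary key, target field). This sketch assumes the weights have already been computed, because the patent's weight formula is only available as an image; `WeightedCell`, `select_highest_weight`, and the sample records are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class WeightedCell(NamedTuple):
    primary_key: str     # e.g. ID card number
    standard_field: str  # e.g. "mobile phone number"
    source_table: str
    value: str
    weight: float        # field weight, computed beforehand


def select_highest_weight(
    cells: Iterable[WeightedCell],
) -> Dict[Tuple[str, str], WeightedCell]:
    """Step S4: keep, for each (primary key, standard field), the cell with
    the highest weight. On a tie the first cell seen is kept."""
    best: Dict[Tuple[str, str], WeightedCell] = {}
    for cell in cells:
        key = (cell.primary_key, cell.standard_field)
        current = best.get(key)
        if current is None or cell.weight > current.weight:
            best[key] = cell
    return best


# Three records for one ID number, one per source, as in the example above.
summary = [
    WeightedCell("ID-001", "mobile phone number", "traffic_police_vehicles", "13800000001", 0.62),
    WeightedCell("ID-001", "mobile phone number", "hr_social_security", "13800000002", 0.91),
    WeightedCell("ID-001", "mobile phone number", "education", "13800000003", 0.40),
]
data_model = select_highest_weight(summary)
print(data_model[("ID-001", "mobile phone number")].value)  # 13800000002
```

A single pass with a dictionary keeps the step linear in the size of the summary table.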
In this embodiment, the field weight is calculated by the following formula:
[Formula image not available in this text version]
Because the data come from many departments, some fields in some departments serve only as auxiliary information for business management and their accuracy cannot be guaranteed, and some departments may still store data collected long ago even though the state of the recorded object may have changed since. The field weight is therefore calculated by combining field credibility with data timeliness.
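The weight formula itself is not reproduced in the text, so the sketch below is explicitly a stand-in, not the patent's formula. It only illustrates the stated idea of combining field credibility with data timeliness, assuming a multiplicative form and an exponential freshness factor with a hypothetical one-year half-life (`field_weight` and `half_life_days` are illustrative names).

```python
from datetime import date


def field_weight(credibility: float, updated: date, today: date,
                 half_life_days: float = 365.0) -> float:
    """HYPOTHETICAL field weight, NOT the patent's formula (which is only
    available as an image): the source field's credibility multiplied by a
    freshness factor that halves every `half_life_days` days."""
    if not 0.0 <= credibility <= 1.0:
        raise ValueError("credibility must lie in [0, 1]")
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    age_days = max((today - updated).days, 0)  # future dates count as fresh
    freshness = 0.5 ** (age_days / half_life_days)
    return credibility * freshness


today = date(2019, 12, 5)
print(field_weight(0.9, date(2019, 12, 5), today))  # 0.9
print(field_weight(1.0, date(2018, 12, 5), today))  # 0.5
```

With a multiplicative form, a highly credible but stale value can lose to a fresher, somewhat less credible one; the half-life is a tuning knob and is not specified anywhere in the text.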
In this embodiment, after step S3, the method further includes:
S5, detecting whether each standard field in the data summary table has a unique authoritative source; if not, performing step S4; if so, inserting the standard field directly into the record of the data model in which it is located.
From the perspective of data authority, some data have a single source, that is, one source department is responsible for their accuracy, while other data have multiple sources and no responsible party can be determined. When the authority of a standard field is not unique, i.e. its data come from multiple sources, the reliability of each source is measured by the weight of the standard field, and the record with the highest weight for that standard field is inserted into the data model for storage. When the authority is unique, the data come from a single source, so the target field weight does not need to be calculated.
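Step S5 and its fallback to step S4 can be sketched for a single (primary key, standard field) pair. The sketch assumes a hypothetical registry `authorities` naming, for each standard field, the source tables whose departments are responsible for it; a field with exactly one such source counts as having unique authority. What to do when that sole source supplied no value is not stated in the text, so falling back to the highest weight is an assumption.

```python
from typing import Dict, List, NamedTuple, Set


class Candidate(NamedTuple):
    source_table: str
    value: str
    weight: float  # consulted only when the authority is not unique


def resolve(standard_field: str, candidates: List[Candidate],
            authorities: Dict[str, Set[str]]) -> Candidate:
    """Step S5 for one (primary key, standard field), falling back to S4.
    `authorities` is a hypothetical registry: standard field -> source
    tables whose departments are responsible for that field's accuracy."""
    if not candidates:
        raise ValueError("no candidate values")
    owners = authorities.get(standard_field, set())
    if len(owners) == 1:
        (owner,) = owners
        for cand in candidates:
            if cand.source_table == owner:
                return cand  # unique authority: take its value directly, no weighting
    # Authority not unique (or the sole authority supplied no value, an
    # assumption the text does not address): pick the highest weight (S4).
    return max(candidates, key=lambda cand: cand.weight)


authorities = {
    "mobile phone number": {"traffic_police_vehicles", "hr_social_security", "education"},
    "vehicle plate number": {"traffic_police_vehicles"},
}
phone = [Candidate("traffic_police_vehicles", "13800000001", 0.62),
         Candidate("hr_social_security", "13800000002", 0.91)]
plate = [Candidate("traffic_police_vehicles", "WAN-B12345", 0.30),
         Candidate("education", "WAN-B99999", 0.80)]
print(resolve("mobile phone number", phone, authorities).value)   # 13800000002
print(resolve("vehicle plate number", plate, authorities).value)  # WAN-B12345
```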
The invention has been described above by way of example with reference to the accompanying drawings. The invention is clearly not limited to the specific implementation described above; applying the inventive concept and technical solution of the invention to other applications without substantial modification also falls within the scope of the invention.

Claims (4)

1. An automatic data integration method, characterized by comprising the following steps:
S1, screening the acquired initial source data tables based on the target fields in the data model, and keeping the source data tables that contain a target field;
S2, establishing a mapping relation between source fields and standard fields, where a source field is a field in a source data table, a standard field is a field of the data model and of the data summary table, and a target field is a standard field as defined in the data summary table and the data model;
S3, based on the mapping relation between source fields and standard fields, mapping the data in the source fields of the source data tables to the corresponding standard fields of the data summary table;
S4, calculating the weight of each target field in the data summary table and, among records with the same primary key field and the same target field, inserting the record whose target field has the highest weight into the data model for storage.
2. The automatic data integration method according to claim 1, wherein the field weight is calculated by the following formula:
[Formula image not available in this text version]
3. The automatic data integration method according to claim 1 or 2, further comprising, after step S3:
S5, detecting whether each standard field in the data summary table has a unique authoritative source; if not, performing step S4; if so, inserting the record of that standard field directly into the data model.
4. The automatic data integration method according to claim 1, wherein the target fields in the data model are defined based on user requirements.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911235771.8A CN111078774A (en) 2019-12-05 2019-12-05 Automatic data integration method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911235771.8A CN111078774A (en) 2019-12-05 2019-12-05 Automatic data integration method

Publications (1)

Publication Number Publication Date
CN111078774A 2020-04-28

Family

ID=70313088

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911235771.8A Pending CN111078774A (en) 2019-12-05 2019-12-05 Automatic data integration method

Country Status (1)

Country Link
CN (1) CN111078774A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544323A (en) * 2013-11-08 2014-01-29 中国农业银行股份有限公司 Data updating method and device
CN108509485A (en) * 2018-02-07 2018-09-07 深圳壹账通智能科技有限公司 Preprocess method, device, computer equipment and the storage medium of data
CN109829012A (en) * 2018-12-13 2019-05-31 山东亚华电子股份有限公司 The synchronous method and apparatus of data
CN110471926A (en) * 2019-08-15 2019-11-19 北京明略软件系统有限公司 A kind of archives method for building up and device

Non-Patent Citations (1)

Title
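马峰 (Ma Feng): "Implementation of a Heterogeneous Data Conversion System" (一种异构数据转换系统的实现), Scientific and Technological Innovation (科学技术创新) *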

Cited By (4)

Publication number Priority date Publication date Assignee Title
CN111625520A (en) * 2020-06-08 2020-09-04 成都信息工程大学 Universal mapping method and system for field types of heterogeneous database
CN111625520B (en) * 2020-06-08 2023-06-06 成都信息工程大学 General mapping method and system for field types of heterogeneous database
CN111522842A (en) * 2020-07-04 2020-08-11 杭州城市大数据运营有限公司 ETL data processing method and device, computer equipment and storage medium
CN112597168A (en) * 2020-12-28 2021-04-02 恩亿科(北京)数据科技有限公司 Processing method, device and platform of multi-source customer data and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200428