CN111078774A - Automatic data integration method - Google Patents
Automatic data integration method
- Publication number: CN111078774A
- Application number: CN201911235771.8A
- Authority: CN (China)
- Prior art keywords: data, field, source, target, standard
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses an automatic data integration method comprising the following steps: S1, screening the obtained initial source data tables based on the target fields in the data model, and selecting the source data tables that contain a target field; S2, establishing a mapping relation between source fields and standard fields; S3, based on that mapping relation, mapping the data in the source fields of each source data table to the corresponding standard fields of the data summary table; and S4, calculating the weight of each target field in the data summary table and, among records that share the same primary key field and the same target field, inserting the record whose target field has the highest weight into the data model for storage. The integrated data are updated automatically at regular intervals by an ETL process driven by the data model, so that integration is automatic.
Description
Technical Field
The invention belongs to the technical field of data acquisition, and particularly relates to an automatic data integration method.
Background
Researchers have studied methods for integrating data from different sources extensively and have obtained some results, but integrating data from different data sources remains an open problem.
The existing scheme for integrating data from different data sources is as follows: first, custom development is carried out for the data of each data source according to a preset format, and the resulting data are then integrated. Although this scheme can integrate data, the data come from different sources, their formats and types vary, and the same data may differ between periods, so each kind of data requires its own custom development. The scheme therefore has the following disadvantages: (1) each type of data requires separate custom development, so the development cost is high; (2) each department holds many kinds of data, so the number of custom-developed versions and of data types to maintain keeps growing, and the cost of maintaining the versions rises.
Disclosure of Invention
The invention provides an automatic data integration method that aims to integrate data from different data sources automatically.
To achieve this purpose, the invention adopts the following technical scheme: an automatic data integration method comprising the following steps:
S1, screening the obtained initial source data tables based on the target fields in the data model, and selecting the source data tables that contain a target field;
S2, establishing a mapping relation between source fields and standard fields, wherein a source field is a field in a source data table, a standard field is a field of the data model and of the data summary table, and a target field is defined as a standard field in the data summary table and the data model;
S3, based on the mapping relation between source fields and standard fields, mapping the data in the source fields of each source data table to the corresponding standard fields of the data summary table;
and S4, calculating the weight of each target field in the data summary table and, among records that share the same primary key field and the same target field, inserting the record whose target field has the highest weight into the data model for storage.
Further, the field weight is calculated by a formula that integrates field reliability and data timeliness [formula not preserved in this text].
Further, after step S3, the method further includes:
S5, detecting whether the authority of each standard field in the data summary table is unique; if not, executing step S4; if so, directly inserting the record of the standard field into the data model.
Further, target fields in the data model are defined based on user requirements.
The automatic integration method of the data provided by the invention has the following beneficial technical effects:
1) data from different data sources are acquired and split by type according to the data model, so no separate custom development is needed for each type of data and the development cost falls; all data types in the data model go through the same process, which makes maintenance convenient and lowers later maintenance cost;
2) the integrated data are updated automatically at regular intervals by an ETL process driven by the data model, which solves the problem in the related art that data from different sources cannot be integrated automatically;
3) the data-model-based automatic integration method improves the efficiency of integrating and processing data from different sources while safeguarding the accuracy and validity of the data.
Drawings
Fig. 1 is a flowchart of an automatic data integration method according to an embodiment of the present invention.
Detailed Description
The following detailed description of the embodiments of the present invention will be given in order to provide those skilled in the art with a more complete, accurate and thorough understanding of the inventive concept and technical solutions of the present invention.
Fig. 1 is a flowchart of an automatic data integration method according to an embodiment of the present invention, where the method specifically includes the following steps:
S1, screening the obtained initial source data tables based on the target fields in the data model, and selecting the source data tables that contain a target field;
In the embodiment of the invention, the target fields are fields in a data model. One data model may comprise one or more target fields, the target fields are set based on user requirements, and the data model stores the integrated data information.
In the embodiment of the present invention, the initial source data tables may come from different business departments. Take obtaining a person's mobile phone number as an example: in the traffic police vehicle data table the mobile phone number may be recorded as "mobile phone number of owner", in the data table of the human resources and social security office it may be recorded as "TELEPHONE", and in the data table of the education department it may be recorded as "contact information". Assuming the mobile phone number is the target field, the source data tables containing mobile phone number information need to be manually screened from the initial source data tables.
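The text describes this screening as a manual step. As a rough illustration of the same check expressed in code, the following Python sketch keeps only tables that have a column matching one of the names under which the target field may appear. The table representation (a dict from table name to column names), the table names, the alias set, and the function `screen_source_tables` are all assumptions made for this example and do not come from the patent.

```python
from typing import Dict, List, Set

# Hypothetical aliases for the target field "mobile phone number",
# following the department example in the text.
TARGET_ALIASES: Set[str] = {
    "mobile phone number",
    "mobile phone number of owner",
    "TELEPHONE",
    "contact information",
}


def screen_source_tables(
    initial_tables: Dict[str, List[str]], aliases: Set[str]
) -> Dict[str, List[str]]:
    """Keep only the tables with at least one column matching an alias."""
    normalized = {alias.casefold() for alias in aliases}
    return {
        name: columns
        for name, columns in initial_tables.items()
        if any(col.casefold() in normalized for col in columns)
    }


# Hypothetical initial source tables (table name -> column names).
initial_tables = {
    "traffic_police_vehicles": ["id_number", "plate", "mobile phone number of owner"],
    "social_security": ["id_number", "TELEPHONE"],
    "education": ["id_number", "contact information"],
    "land_registry": ["parcel_id", "area"],  # no phone field, so screened out
}
source_tables = screen_source_tables(initial_tables, TARGET_ALIASES)
```

Matching is case-insensitive exact matching here; a real system would likely need a curated alias list per target field, since column names across departments are not predictable.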
S2, establishing a mapping relation between a source field and a standard field, wherein the source field is a field in a source data table, the standard field is a field of a data model and a data summary table, and a target field is defined as the standard field in the data summary table and the data model;
In the embodiment of the invention, the mapping relation between the target field and the corresponding source field in each source data table is established while the source data tables are being screened. A source field is a field in a source data table whose name is a synonym or an alternative name of the target field. In the example above, the defined target field is "mobile phone number", and its mapping objects are "mobile phone number of owner", "TELEPHONE", and "contact information".
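One simple way to hold such a mapping is a lookup from each source field name to its standard field. The sketch below is illustrative only: the `SYNONYMS` table and the function `build_field_mapping` are hypothetical names, and the patent does not say how the synonym list is stored or maintained.

```python
from typing import Dict, Iterable, List

# Hypothetical synonym table: standard field -> names it may carry in source tables.
SYNONYMS: Dict[str, List[str]] = {
    "mobile phone number": [
        "mobile phone number of owner",
        "TELEPHONE",
        "contact information",
    ],
}


def build_field_mapping(
    source_columns: Iterable[str], synonyms: Dict[str, List[str]]
) -> Dict[str, str]:
    """Return {source field: standard field} for the columns of one source table."""
    lookup: Dict[str, str] = {}
    for standard, aliases in synonyms.items():
        # The standard name itself also counts as a valid source name.
        for name in [standard, *aliases]:
            lookup[name.casefold()] = standard
    return {
        col: lookup[col.casefold()]
        for col in source_columns
        if col.casefold() in lookup
    }


mapping = build_field_mapping(["id_number", "TELEPHONE"], SYNONYMS)
```

Columns with no known synonym, such as `id_number` here, are simply left out of the mapping.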
S3, based on the mapping relation between the source field and the standard field, mapping the data in the source field of the source data table to the corresponding standard field of the data summary table;
In the embodiment of the invention, each row of the data summary table corresponds to one record and each column corresponds to one target field. For each target field the table records the source field name, the standard field name, the source field value under that source field name, the update time of the source field value, and the credibility of the source field. In the embodiment of the invention, the credibility of the source data is calculated as: credibility = trusted data amount / total source data amount. The trusted data in the source data can be understood as standard data, whose accuracy is 100%. Continuing the example above, the standard field in the data summary table is "mobile phone number", and the field values under the source fields "mobile phone number of owner", "TELEPHONE", and "contact information" in the source data tables are inserted under the standard field "mobile phone number" as its field values.
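The mapping step and the credibility ratio can be sketched as follows. The `SummaryRow` layout mirrors the items the text says are recorded per target field, but the class, the function names, the `"updated"` key, and the sample values are assumptions made for illustration, not the patent's data structures.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class SummaryRow:
    primary_key: str
    source_field: str      # field name in the source table
    standard_field: str    # field name in the summary table / data model
    value: str             # source field value
    updated: date          # update time of the source field value
    credibility: float     # credibility of the source field


def credibility(trusted_count: int, total_count: int) -> float:
    """Credibility = trusted data amount / total source data amount."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return trusted_count / total_count


def map_to_summary(
    records: List[dict],
    mapping: Dict[str, str],
    key_field: str,
    source_credibility: float,
) -> List[SummaryRow]:
    """Copy each mapped source field value into a summary row under its standard field."""
    summary = []
    for rec in records:
        for source_field, standard_field in mapping.items():
            if rec.get(source_field) is None:
                continue  # nothing to map for this record
            summary.append(
                SummaryRow(
                    rec[key_field],
                    source_field,
                    standard_field,
                    rec[source_field],
                    rec["updated"],  # assumed key holding the update time
                    source_credibility,
                )
            )
    return summary


# Made-up sample record from a hypothetical social security table.
records = [{"id_number": "ID-001", "TELEPHONE": "555-0100", "updated": date(2019, 6, 1)}]
rows = map_to_summary(
    records, {"TELEPHONE": "mobile phone number"}, "id_number", credibility(90, 100)
)
```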
S4, calculating the weight of each target field in the data summary table and, among records that share the same primary key field and the same target field, inserting the record whose target field has the highest weight into the data model for storage.
In the embodiment of the invention, the primary key field may be an identity card number, passport, military officer certificate, unified social credit code, business license registration number, organization code, or taxpayer identification number. Take the mobile phone number as an example, with the identity card number as the primary key field. Suppose the same identity card number has 3 records in the data summary table; the record whose mobile phone number has the highest weight is inserted into the data model. One identity card number has three records in the data summary table because the records come from different data sources, and within a single data source each primary key value has only one record.
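The selection rule in S4 amounts to taking, for each (primary key, target field) pair, the value with the largest weight. A minimal Python sketch, assuming the weights have already been computed; the tuple layout, the function name `select_highest_weight`, and the sample values and weights are invented for illustration.

```python
from typing import Dict, Iterable, Tuple

Row = Tuple[str, str, str, float]  # (primary_key, target_field, value, weight)


def select_highest_weight(rows: Iterable[Row]) -> Dict[Tuple[str, str], str]:
    """For each (primary key, target field), keep the value with the highest weight."""
    best: Dict[Tuple[str, str], Tuple[str, float]] = {}
    for key, field, value, weight in rows:
        current = best.get((key, field))
        if current is None or weight > current[1]:
            best[(key, field)] = (value, weight)
    return {pair: value for pair, (value, _) in best.items()}


# Three records for the same ID number, one per data source (made-up weights).
summary_rows = [
    ("ID-001", "mobile phone number", "555-0100", 0.62),
    ("ID-001", "mobile phone number", "555-0199", 0.81),
    ("ID-001", "mobile phone number", "555-0142", 0.47),
]
data_model = select_highest_weight(summary_rows)
```

With these sample weights the value kept for `ID-001` is `"555-0199"`. On a tie the first record seen wins in this sketch; the text does not specify a tie-breaking rule.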
In the embodiment of the present invention, the field weight is calculated by a formula [formula not preserved in this text] that integrates field reliability and data timeliness.
The reason is that data come from many departments. In some departments certain fields only assist business management, so their accuracy cannot be guaranteed; some departments may still hold data recorded long ago even though the state of the recorded object has since changed.
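The patent's own weight formula is not available in this text, so the following is explicitly not that formula. It is one hypothetical way to combine the two stated ingredients, field reliability and data timeliness: the credibility is multiplied by an exponential decay on the age of the value. The function name `field_weight`, the decay form, and the `half_life_days` parameter are all assumptions.

```python
from datetime import date


def field_weight(
    credibility: float,
    updated: date,
    today: date,
    half_life_days: float = 365.0,  # assumed: timeliness halves every year
) -> float:
    """Hypothetical weight: credibility scaled by how recent the value is."""
    age_days = max((today - updated).days, 0)
    timeliness = 0.5 ** (age_days / half_life_days)
    return credibility * timeliness


today = date(2019, 12, 5)
# A slightly less credible but current value...
fresh = field_weight(0.8, date(2019, 12, 5), today)
# ...versus a more credible value last updated two years earlier.
stale = field_weight(0.9, date(2017, 12, 5), today)
```

Under these assumed parameters the fresh value outweighs the stale one, which matches the stated motivation that long-outdated records should count for less even when their source is generally reliable.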
In the embodiment of the present invention, after step S3, the method further includes:
S5, detecting whether the authority of each standard field in the data summary table is unique; if the detection result is no, executing step S4; if the detection result is yes, directly inserting the record in which the standard field is located into the data model.
From the perspective of data authority, some data have a single source, meaning that one source department is responsible for their accuracy, while other data have multiple sources and the responsible party cannot be confirmed. When the authority of a standard field is not unique, that is, the data have multiple sources, the reliability of each data source is measured by the weight of the standard field, and the record with the highest weight for that standard field is inserted into the data model for storage. When the authority is unique, the data source is single, so the weight of the target field does not need to be calculated.
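The branch in S5 can be sketched as a check on how many distinct sources supply a standard field. In this illustrative Python sketch, "authority is unique" is interpreted as "exactly one source supplies the field", following the explanation in the text; the tuple layout, the function name `integrate_field`, the source names, and the sample data are assumptions.

```python
from typing import Dict, List, Tuple

# (source, primary_key, standard_field, value, weight)
Row = Tuple[str, str, str, str, float]


def integrate_field(rows: List[Row], standard_field: str) -> Dict[str, str]:
    """Return {primary key: value} for one standard field, following S5 then S4."""
    relevant = [r for r in rows if r[2] == standard_field]
    sources = {r[0] for r in relevant}
    result: Dict[str, str] = {}
    if len(sources) == 1:
        # Unique authority: insert directly, no weight needed.
        for _, key, _, value, _ in relevant:
            result[key] = value
        return result
    # Multiple sources: keep the highest-weight value per primary key (S4).
    best_weight: Dict[str, float] = {}
    for _, key, _, value, weight in relevant:
        if key not in best_weight or weight > best_weight[key]:
            best_weight[key] = weight
            result[key] = value
    return result


# Made-up rows: "name" has one source, "mobile phone number" has two.
rows = [
    ("household_registry", "ID-001", "name", "Name A", 0.5),
    ("traffic_police", "ID-001", "mobile phone number", "555-0100", 0.62),
    ("social_security", "ID-001", "mobile phone number", "555-0199", 0.81),
]
names = integrate_field(rows, "name")
phones = integrate_field(rows, "mobile phone number")
```

Here `"name"` is inserted directly regardless of its weight, while `"mobile phone number"` goes through the weight comparison.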
The invention has been described above by way of example with reference to the accompanying drawings. Evidently, the invention is not limited to the specific implementation described above, and applying the inventive concept and technical solution to other applications without substantial modification falls within the scope of protection of the invention.
Claims (4)
1. An automatic data integration method, characterized by comprising the following steps:
S1, screening the obtained initial source data tables based on the target fields in the data model, and selecting the source data tables that contain a target field;
S2, establishing a mapping relation between source fields and standard fields, wherein a source field is a field in a source data table, a standard field is a field of the data model and of the data summary table, and a target field is defined as a standard field in the data summary table and the data model;
S3, based on the mapping relation between source fields and standard fields, mapping the data in the source fields of each source data table to the corresponding standard fields of the data summary table;
and S4, calculating the weight of each target field in the data summary table and, among records that share the same primary key field and the same target field, inserting the record whose target field has the highest weight into the data model for storage.
3. The automatic data integration method according to claim 1 or 2, further comprising, after step S3:
S5, detecting whether the authority of each standard field in the data summary table is unique; if not, executing step S4; if so, directly inserting the record of the standard field into the data model.
4. The automatic data integration method according to claim 1, wherein the target fields in the data model are defined based on user requirements.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911235771.8A CN111078774A (en) | 2019-12-05 | 2019-12-05 | Automatic data integration method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911235771.8A CN111078774A (en) | 2019-12-05 | 2019-12-05 | Automatic data integration method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111078774A (en) | 2020-04-28 |
Family
ID=70313088
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911235771.8A Pending CN111078774A (en) | 2019-12-05 | 2019-12-05 | Automatic data integration method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111078774A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103544323A (en) * | 2013-11-08 | 2014-01-29 | 中国农业银行股份有限公司 | Data updating method and device |
CN108509485A (en) * | 2018-02-07 | 2018-09-07 | 深圳壹账通智能科技有限公司 | Preprocess method, device, computer equipment and the storage medium of data |
CN109829012A (en) * | 2018-12-13 | 2019-05-31 | 山东亚华电子股份有限公司 | The synchronous method and apparatus of data |
CN110471926A (en) * | 2019-08-15 | 2019-11-19 | 北京明略软件系统有限公司 | A kind of archives method for building up and device |
Non-Patent Citations (1)
Title |
---|
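Ma Feng (马峰): "Implementation of a heterogeneous data conversion system" (一种异构数据转换系统的实现), Science and Technology Innovation (《科学技术创新》) * |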
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111625520A (en) * | 2020-06-08 | 2020-09-04 | 成都信息工程大学 | Universal mapping method and system for field types of heterogeneous database |
CN111625520B (en) * | 2020-06-08 | 2023-06-06 | 成都信息工程大学 | General mapping method and system for field types of heterogeneous database |
CN111522842A (en) * | 2020-07-04 | 2020-08-11 | 杭州城市大数据运营有限公司 | ETL data processing method and device, computer equipment and storage medium |
CN112597168A (en) * | 2020-12-28 | 2021-04-02 | 恩亿科(北京)数据科技有限公司 | Processing method, device and platform of multi-source customer data and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111078774A (en) | Automatic data integration method | |
CN106815530B (en) | Data storage method, data verification method and device | |
CN110489313B (en) | Operation log recording method and device based on block chain and storage medium | |
CN102696028B (en) | For carrying out the method and apparatus of Dynamic Packet to the entry in application | |
US20180157851A1 (en) | Systems and methods for authentication of access based on multi-data source information | |
CN110597816A (en) | Data processing method, data processing device, computer equipment and computer readable storage medium | |
CN107680385B (en) | Method and system for determining fake-licensed vehicle | |
Solow et al. | On the Pleistocene extinctions of Alaskan mammoths and horses | |
CN102495848B (en) | Method for processing massive GPS (global positioning system) data and system | |
CN102591960A (en) | Agricultural economy electronic map data service interface method | |
CN112463986A (en) | Information storage method and device | |
CN109816338A (en) | Enterprise's rewards and punishments processing method, device, computer equipment and storage medium | |
CN108563706A (en) | A kind of collection big data intelligent service system and its operation method | |
US20180315130A1 (en) | Intelligent data gathering | |
CN112597168A (en) | Processing method, device and platform of multi-source customer data and storage medium | |
CN112002087A (en) | Book borrowing and returning system and method based on smart campus | |
CN112035676A (en) | User operation behavior knowledge graph construction method and device | |
CN112073554B (en) | Global unique identifier generation method, device and computer readable storage medium | |
Montoya et al. | Thymeflow, a personal knowledge base with spatio-temporal data | |
CN114020699A (en) | Method for returning two files based on query file, storage medium and terminal | |
CN113190562A (en) | Report generation method and device and electronic equipment | |
CN105550326A (en) | Electricity consumption query method and apparatus | |
CN112508472A (en) | Method and system for viewing order information of same account by multiple persons | |
CN110519469A (en) | Intelligent sound exchange method, system, medium and device applied to telephone service platform | |
CN110659867A (en) | Comprehensive personal management service system for fund flow, travel management and financial management |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20200428 |