CN116136843A

CN116136843A - Method for fusing and sharing multi-source heterogeneous massive data in complex business scenarios

Info

Publication number: CN116136843A
Application number: CN202111363554.4A
Authority: CN
Inventors: 郭会明; 刘佳; 赵凯; 庞路明
Original assignee: China Changfeng Science Technology Industry Group Corp
Current assignee: China Changfeng Science Technology Industry Group Corp
Priority date: 2021-11-18
Filing date: 2021-11-18
Publication date: 2023-05-19

Abstract

The invention relates to a method for fusing and sharing multi-source heterogeneous massive data in complex business scenarios, comprising data access, data processing, data management, data organization, and data service. Supported by data acquisition technology and built mainly on a big data acquisition platform, the invention collects and shares the data of different business member units according to the application scenarios of each kind of data, forms a unified data service product, provides data sharing services externally, and serves upper-layer business applications.

Description

Method for fusing and sharing multi-source heterogeneous massive data in complex business scenarios

Technical Field

The invention belongs to the technical field of big data processing and relates to a method for fusing and sharing multi-source heterogeneous massive data in complex business scenarios. The method is used to fuse and share, across multiple departments, large volumes of data from different businesses, different sources, and different types.

Background

Big data applications often need to call up data from every related industry and field through a unified platform for analysis and processing, using means such as data extraction, file parsing, interface calls, and data pushing. At present, however, data acquisition and processing in the individual business departments suffer from practical problems such as siloed applications, divided tasks, dispersed data, fragmented resources, and inconsistent standards. As a result, data is hard to access and aggregate, data assets are hard to manage, the level of development and utilization is low, and data is hard to fuse and share.

Disclosure of Invention

The invention aims to solve the above problems by providing a method for fusing and sharing multi-source heterogeneous massive data in complex business scenarios. The method supports the convergence, fusion, and sharing of large-scale, multi-source, cross-domain, heterogeneous data, so that data assets can be put to fuller use and business applications can be made more intelligent.

The technical scheme of the invention is as follows:

A method for fusing and sharing multi-source heterogeneous massive data in complex business scenarios, characterized by comprising the following steps:

(1) Data access: a unified data convergence platform is established and data annotation items are unified, which reduces the communication cost between different businesses and completes the preliminary docking of the data;

(2) Data processing: the various data accessed to the data convergence platform are cleaned: erroneous and duplicate data are deleted; missing items are filled in according to a mapping table; codes are converted according to a data dictionary table; characteristic character strings are extracted from fields; and the data are checked;

(3) Data management: the data resources are cataloged to form a data asset catalog and are managed so that they are used in a scientific, orderly, and secure manner;

(4) Data organization: according to the requirements of data applications, data is organized following the standards and process specifications defined for it, so that classified libraries are built from the original library to the standard library, the theme library, and the thematic library, further strengthening the internal associations within the application system's big data;

(5) Data service: a query and retrieval service for data details is provided, supporting exact and fuzzy queries and multiple filter conditions; a model analysis service is provided that cross-matches, analyzes, and predicts over data sets according to business requirements to find hidden patterns in the data; comparison subscription and data pushing are provided; and data access authentication based on data access rules is provided to control access to data resources.

Supported by data acquisition technology and built mainly on a big data acquisition platform, the invention collects and shares the data of different business member units according to the application scenarios of each kind of data, forms a unified data service product, provides data sharing services externally, and serves upper-layer business applications.

Drawings

Fig. 1 is a technical roadmap of data access, convergence, fusion, and sharing according to the invention.

Detailed Description

As shown in Fig. 1, the invention mainly comprises data access, data processing, data management, data organization, and data service.

(1) Data access:

First, the required data is identified according to the unit's own field, functional positioning, and business application systems, and the requirements are made as targeted and fine-grained as possible. Second, based on these data requirements, a data requirement information table is compiled that details the source unit, data items, update frequency, maximum latency, data format, data volume, data type, the network and system where the data resides, security requirements, data security, data classification, business classification, permission approval, sharing requirements, application description, and so on. This reduces the communication cost between different businesses and allows the preliminary docking of the data to be completed efficiently.
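One way to make the data requirement information table concrete is as a typed record. The following is a minimal sketch only; the class name, field names, and the `missing_fields` helper are illustrative assumptions and are not defined in the patent.

```python
from dataclasses import dataclass, asdict


@dataclass
class DataRequirement:
    """One row of the requirement information table (field names are hypothetical)."""
    source_unit: str
    data_items: list
    update_frequency: str
    max_latency: str
    data_format: str
    data_volume: str
    data_type: str
    network: str
    system: str
    security_requirements: str
    classification: str
    business_category: str
    approval_required: bool
    sharing_requirements: str
    application_description: str


def missing_fields(req):
    """Return the names of fields left empty, so gaps surface before docking starts."""
    return [name for name, value in asdict(req).items() if value in ("", [], None)]
```

Checking a row with `missing_fields` before docking begins is one possible way to keep the communication between units short, since both sides see the same list of open items.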

The data access mode is determined by the specific condition of the data. For a data providing unit with a high degree of informatization and a well-built data center, a data pushing mode can be adopted: the providing unit actively pushes business data, through its own data pushing system, to the front-end processor or front-end buffer area of the data management center's resource pool. For a providing unit whose data center is incomplete or absent but which has a certain level of informatization, a data extraction mode can be adopted: data acquisition or extraction software installed on the front-end processor or in the front-end buffer area extracts business data from the providing unit's exchange library or business library, or the data interface published by the providing unit is called to extract the data. For a providing unit with a low degree of informatization or no business application system, data can be collected and extracted manually, by converting the media of file data or by importing files.
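The choice among the three access modes can be sketched as a small decision function. This is an illustration under stated assumptions: the three-level `informatization` scale and the function and enum names are hypothetical, and the patent does not prescribe how the degree of informatization is measured.

```python
from enum import Enum


class AccessMode(Enum):
    PUSH = "push"        # provider pushes to the front-end processor or buffer
    EXTRACT = "extract"  # data is pulled from an exchange/business library or via an interface
    MANUAL = "manual"    # file media conversion or manual import


def choose_access_mode(informatization, has_complete_data_center, has_business_system=True):
    """Pick an access mode; `informatization` is 'high', 'medium', or 'low' (assumed scale)."""
    if informatization == "low" or not has_business_system:
        return AccessMode.MANUAL
    if informatization == "high" and has_complete_data_center:
        return AccessMode.PUSH
    # Some informatization, but an incomplete or absent data center.
    return AccessMode.EXTRACT
```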

(2) Data processing:

Data processing is carried out on the data access and convergence platform, and the techniques involved are relatively general. The various data accessed to the data center's original library are first extracted, and the extracted data are then cleaned according to the relevant cleaning rules: erroneous and duplicate data are deleted; missing items are filled in according to a mapping table; codes are converted according to a data dictionary table; characteristic character strings are extracted from fields; and the data are checked.
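The cleaning steps listed above can be sketched as a single pass over the extracted records. Everything specific here is an assumption made for illustration: the record fields (`id`, `unit`, `region`, `gender`, `remark`), the contents of the mapping table and data dictionary, the 11-digit pattern used as the characteristic string, and the validity rule. The patent names the steps but does not define any of these.

```python
import re

# Hypothetical reference tables; the patent does not specify their contents.
REGION_BY_UNIT = {"U001": "North", "U002": "South"}  # mapping table for missing items
GENDER_CODES = {"M": "1", "F": "2"}                   # data dictionary for code conversion
PHONE_PATTERN = re.compile(r"1\d{10}")                # characteristic string: 11-digit number


def clean(records):
    """Apply the cleaning steps in order and return the surviving records."""
    seen = set()
    cleaned = []
    for rec in records:
        key = rec.get("id")
        # Delete erroneous (missing id) and duplicate records.
        if not key or key in seen:
            continue
        seen.add(key)
        row = dict(rec)
        # Fill in a missing item according to the mapping table.
        if not row.get("region"):
            row["region"] = REGION_BY_UNIT.get(row.get("unit"), "")
        # Convert codes according to the data dictionary table.
        row["gender"] = GENDER_CODES.get(row.get("gender"), row.get("gender"))
        # Extract a characteristic string from a free-text field.
        match = PHONE_PATTERN.search(row.get("remark") or "")
        row["phone"] = match.group(0) if match else None
        # Check the record and flag the result.
        row["valid"] = bool(row["region"]) and row["gender"] in GENDER_CODES.values()
        cleaned.append(row)
    return cleaned
```

Records that fail the final check are flagged rather than dropped in this sketch, so that a later quality-management step can locate and track them.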

During data processing, data can also be associated and compared according to business requirements, so that data conforming to the relevant business rules are identified. Data processed from the original library can be distributed to the appropriate standard library, theme library, or thematic library according to its nature and purpose.

(3) Data management:

Data management means planning and designing, controlling the process of, and supervising the quality of the data resources of the data resource center over their full life cycle. Standardized data management makes data assets transparent and controllable, takes stock of the data assets, improves data standards and data processing flows, raises data quality, safeguards data security, and promotes data circulation and value extraction.

Cataloging the data resources forms a data asset catalog, and managing the data resources ensures that they are used in a scientific, orderly, and secure manner. Data classification divides data by the sensitivity of its content or by its scope of use; a sound classification management system is built, data is labeled with its classification, and the labels work together with data authorization to ensure that data is used securely. Data lineage means recording every processing step over the whole course of the data, from generation through processing and circulation to final retirement, so that a complete inheritance relationship of the data is formed. During data management, quality evaluation standards and management specifications are established so that data quality problems are discovered, located, and tracked in time, forming a closed loop for handling data quality problems and keeping data quality stable and reliable.
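The idea of recording each processing step so that an inheritance relationship can be reconstructed may be sketched as an append-only log with a backward walk. This is a minimal illustration; the class names, the dataset naming scheme, and the `upstream` query are assumptions and not part of the patent.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class LineageStep:
    operation: str
    source: str
    target: str
    at: str  # ISO-8601 timestamp of when the step was recorded


@dataclass
class LineageLog:
    steps: list = field(default_factory=list)

    def record(self, operation, source, target):
        """Append one processing step (source dataset -> target dataset)."""
        timestamp = datetime.now(timezone.utc).isoformat()
        self.steps.append(LineageStep(operation, source, target, timestamp))

    def upstream(self, dataset):
        """Walk recorded steps backwards and list every ancestor of `dataset`."""
        ancestors, frontier = [], [dataset]
        while frontier:
            current = frontier.pop()
            for step in self.steps:
                if step.target == current and step.source not in ancestors:
                    ancestors.append(step.source)
                    frontier.append(step.source)
        return ancestors
```

With such a log, a data quality problem found in a downstream data set can be traced back to the data sets it was derived from, which is one way to support the closed loop for quality problems described above.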

(4) Data organization:

Data organization means organizing data, according to the requirements of data applications and following the standards and process specifications defined for the data, so that classified libraries are built from the original library to the standard library, the theme library, and the thematic library, further strengthening the internal associations within the application system's big data.

The original library retains the raw data as accessed; it is the basis both for data processing and for data tracing. The standard library integrates the various accessed data resources and builds data sets of key elements and of the association relations among those elements, providing effective support for all kinds of business. The theme library builds a theme data warehouse from the various accessed data and accumulates data sets along multiple dimensions, providing a basis for statistical analysis and unified data services. The thematic library sets up, according to actual business conditions, a business application library for each business topic, providing support for business applications. The knowledge base holds the rules, algorithms, and models that constitute shared knowledge in the business domain; these are abstracted, verified, and summarized to distill general thematic knowledge for the domain.
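The tiered construction from the original library onward can be sketched as follows. The tier labels `original`, `standard`, `theme`, and `thematic` are shorthand chosen for this example, the three callables stand in for business rules that the patent does not specify, and the knowledge base is left out because it holds rules and models rather than records.

```python
from collections import defaultdict


def organize(raw_records, standardize, theme_of, topics_of):
    """Build tiered libraries from raw records.

    standardize, theme_of, and topics_of are hypothetical business rules:
    record -> standardized record, record -> theme name, record -> iterable of topics.
    """
    raw = list(raw_records)
    libs = {
        "original": raw,                # raw data kept as accessed, for tracing
        "standard": [],                 # integrated, standardized records
        "theme": defaultdict(list),     # records grouped by theme
        "thematic": defaultdict(list),  # records grouped by business topic
    }
    for record in raw:
        std = standardize(record)
        libs["standard"].append(std)
        libs["theme"][theme_of(std)].append(std)
        for topic in topics_of(std):
            libs["thematic"][topic].append(std)
    return libs
```

Keeping the raw list untouched in the first tier reflects the role of the original library as the basis for tracing.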

(5) Data service:

The data service provides query and retrieval of resource status and data details, supporting exact and fuzzy queries and multiple filter conditions. It provides a model analysis service that cross-matches, analyzes, and predicts over data sets according to business requirements to find hidden patterns in the data. It provides comparison subscription and data pushing: each data-using unit subscribes to the data resources of the data resource center as needed, and the resource center pushes and publishes the data to each resource sub-center or business system on a schedule. It also provides data access authentication based on data access rules, controlling access to data resources.
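Three of these services (exact and fuzzy querying, rule-based access control, and comparison subscription with pushing) can be sketched in a few lines. The sketch rests on assumptions not found in the patent: records are plain dictionaries, the access rule is a numeric clearance compared with a per-record `level` field, and pushing is modeled as an immediate callback rather than a scheduled job.

```python
def query(records, exact=None, fuzzy=None):
    """Filter records: `exact` values must be equal; `fuzzy` values must occur as substrings."""
    exact, fuzzy = exact or {}, fuzzy or {}
    return [
        r for r in records
        if all(r.get(k) == v for k, v in exact.items())
        and all(v.lower() in str(r.get(k, "")).lower() for k, v in fuzzy.items())
    ]


def authorized(records, clearance):
    """Hypothetical access rule: a caller may read records at or below its clearance."""
    return [r for r in records if r.get("level", 0) <= clearance]


class Subscriptions:
    """Comparison subscription: push each new record to subscribers whose condition matches."""

    def __init__(self):
        self._subs = []  # (condition, callback) pairs

    def subscribe(self, condition, callback):
        self._subs.append((condition, callback))

    def publish(self, record):
        for condition, callback in self._subs:
            if condition(record):
                callback(record)
```

In this arrangement the access filter would be applied before the query result is returned, so that filter conditions never reveal records the caller is not cleared to see.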

Claims

1. A method for fusing and sharing multi-source heterogeneous massive data in complex business scenarios, characterized by comprising the following steps: