CN116136843A - Multi-source heterogeneous mass data fusion sharing method under complex service scene - Google Patents

Multi-source heterogeneous mass data fusion sharing method under complex service scene

Info

Publication number
CN116136843A
Authority
CN
China
Prior art keywords
data
service
providing
database
access
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111363554.4A
Other languages
Chinese (zh)
Inventor
郭会明
刘佳
赵凯
庞路明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Changfeng Science Technology Industry Group Corp
Original Assignee
China Changfeng Science Technology Industry Group Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Changfeng Science Technology Industry Group Corp filed Critical China Changfeng Science Technology Industry Group Corp
Priority to CN202111363554.4A priority Critical patent/CN116136843A/en
Publication of CN116136843A publication Critical patent/CN116136843A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2468 Fuzzy queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Automation & Control Theory (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method for fusing and sharing multi-source, heterogeneous, massive data in complex business scenarios. The method comprises data access, data processing, data management, data organization, and data service. Supported by data acquisition technology and built mainly on a big data acquisition platform, the invention collects and shares the data of different business member units according to the application scenarios of each kind of data, forms a unified data service product, provides data sharing services externally, and serves upper-layer business applications.

Description

Multi-source heterogeneous mass data fusion sharing method under complex service scene
Technical Field
The invention belongs to the technical field of big data processing and relates to a method for fusing and sharing multi-source, heterogeneous, massive data in complex business scenarios. It is used to fuse and share large volumes of data of different businesses, sources, and types across multiple departments.
Background
Making use of big data often requires calling the data of every related industry and field through a unified platform for analysis and processing, by means such as data extraction, file parsing, interface calls, and data pushing. At present, however, business departments face practical obstacles in data acquisition and processing, such as siloed applications, divided tasks, scattered data, fragmented resources, and inconsistent standards. As a result, data is difficult to access and aggregate, data assets are difficult to manage, the level of development and utilization is low, and data is difficult to fuse and share.
Disclosure of Invention
The invention aims to solve the above problems by providing a method for fusing and sharing multi-source, heterogeneous, massive data in complex business scenarios. The method supports the convergence, fusion, and sharing of large-scale, multi-source, cross-domain, heterogeneous data, brings the value of data assets further into play, and raises the level of application intelligence.
The technical scheme of the invention is as follows:
The method for fusing and sharing multi-source heterogeneous mass data in a complex business scenario is characterized by comprising the following steps:
(1) Data access: a unified data convergence platform is established and data annotation items are unified, which reduces communication costs across different businesses and completes the preliminary docking of data;
(2) Data processing: the various data accessed to the data convergence platform are cleaned: erroneous and duplicate data are deleted; missing items are completed according to a mapping table; codes are converted according to the data dictionary table; characteristic strings are extracted from fields; and the data are checked;
(3) Data management: the data resources are cataloged to form a data asset catalog and are managed so that they are used in a scientific, orderly, and secure manner;
(4) Data organization: according to the requirements of data applications, and following the standards and process specifications defined for the data, the data are built into classified libraries, from the original library to the standard library, the subject library, and the topic library, further strengthening the internal association of the application systems' big data;
(5) Data service: a query and retrieval service for data details is provided, supporting exact and fuzzy queries and a variety of filter conditions; a model analysis service is provided that performs cross-matching (collision), analysis, and prediction on data sets according to business requirements to find hidden patterns in the data; comparison subscription and data pushing are provided; and data access authentication based on data access rules is provided to implement access control over data resources.
Supported by data acquisition technology and built mainly on a big data acquisition platform, the invention collects and shares the data of different business member units according to the application scenarios of each kind of data, forms a unified data service product, provides data sharing services externally, and serves upper-layer business applications.
Drawings
Fig. 1 is a technical roadmap of data access, convergence, fusion, and sharing according to the invention.
Detailed Description
As shown in Fig. 1, the invention mainly comprises data access, data processing, data management, data organization, and data service.
(1) Data access:
First, the required data is identified according to the unit's own domain, functional positioning, and business application systems, and the requirements are made as targeted and refined as possible. Second, based on these requirements, a data information table is compiled that details the source unit, data items, update frequency, maximum latency, data format, data volume, data type, the network where the data resides, the system where the data resides, security requirements, data security, data classification, business classification, permission approval, sharing requirements, application description, and so on. This reduces communication costs across different businesses and completes the preliminary docking of data efficiently.
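The data information table described above can be pictured as one structured record per required data set. The following is a minimal sketch under stated assumptions, not the format used by the invention: the class name `DataRequirement`, the helper `missing_fields`, the chosen subset of fields, and all sample values are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class DataRequirement:
    """One row of a hypothetical data information table (subset of fields)."""
    source_unit: str
    data_items: list[str]
    update_frequency: str        # e.g. "daily"
    max_latency_minutes: int     # highest tolerated delay
    data_format: str             # e.g. "csv", "json"
    network: str                 # network where the data resides
    sharing_requirement: str


def missing_fields(req: DataRequirement) -> list[str]:
    """Return the names of fields left empty, so gaps surface before docking."""
    return [f.name for f in fields(req) if getattr(req, f.name) in ("", [], None)]


# Illustrative row with one item still to be clarified with the source unit.
req = DataRequirement(
    source_unit="Unit A",
    data_items=["person_id", "address"],
    update_frequency="daily",
    max_latency_minutes=60,
    data_format="csv",
    network="",
    sharing_requirement="internal only",
)
print(missing_fields(req))
```

Checking a row for empty fields before docking begins is one simple way to keep the requirement list "targeted and refined"; a real table would carry all of the items listed in the paragraph above.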
The data access mode is determined according to the specific circumstances of the data. A data providing unit with a high degree of informatization and a well-built data center can use the data push mode: the unit actively pushes business data, through its data push system, to the front-end processor or front-end buffer of the resource pool of the data management center. An access unit whose data center is incomplete or absent, but which has a certain level of informatization, can use the data extraction mode: data acquisition or extraction software installed on the front-end processor or front-end buffer extracts business data from the exchange library or business library of the data providing unit, or the data interface published by the data providing unit is called to extract the data. For a data providing unit with a low degree of informatization or no business application system, the data can be collected and extracted manually by converting or importing files and data media.
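The choice among the three access modes can be sketched as a small decision function. This is an illustration only: reducing "degree of informatization" to two boolean flags is an assumption made here, and the names `AccessMode` and `choose_access_mode` are hypothetical.

```python
from enum import Enum


class AccessMode(Enum):
    PUSH = "push"                    # provider pushes to the front-end processor
    EXTRACT = "extract"              # platform pulls from exchange/business library or API
    MANUAL_IMPORT = "manual_import"  # file or media conversion and import by hand


def choose_access_mode(has_data_center: bool, has_business_system: bool) -> AccessMode:
    """Pick an access mode from two simplified capability flags (an assumption)."""
    if has_data_center and has_business_system:
        return AccessMode.PUSH
    if has_business_system:
        return AccessMode.EXTRACT
    return AccessMode.MANUAL_IMPORT


print(choose_access_mode(has_data_center=False, has_business_system=True))
```

In practice the decision would likely weigh more factors from the data information table, such as network location and security requirements.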
(2) Data processing:
Data processing is carried out on the data access convergence platform, and its techniques are relatively general. The various data accessed to the original library of the data center are extracted, and the extracted data are cleaned according to the relevant cleaning rules: erroneous and duplicate data are deleted; missing items are completed according to a mapping table; codes are converted according to the data dictionary table; characteristic strings are extracted from fields; and the data are checked.
During data processing, data can be associated and compared according to business requirements, and data that conform to the relevant business rules can be identified. Data processed in the original library can be distributed to the relevant standard library, subject library, or topic library according to the nature and purpose of the data.
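The cleaning operations listed above can be sketched as a single pass over the records. This is a minimal illustration, not the cleaning rules of the invention: the lookup tables `REGION_BY_CITY` and `GENDER_CODES`, the pattern `PHONE_RE`, the field names, and the sample records are all hypothetical.

```python
import re

# Hypothetical lookup tables; real ones would be maintained on the platform.
REGION_BY_CITY = {"Beijing": "North", "Shanghai": "East"}  # mapping table
GENDER_CODES = {"M": "1", "F": "2"}                        # data dictionary table
PHONE_RE = re.compile(r"\d{11}")                           # characteristic string


def clean(records):
    """Apply the five cleaning operations to a list of dict records."""
    seen, out = set(), []
    for rec in records:
        rec = dict(rec)  # work on a copy so the raw input is untouched
        # 1. Delete erroneous records (here: missing id) and duplicates.
        if not rec.get("id") or rec["id"] in seen:
            continue
        seen.add(rec["id"])
        # 2. Complete missing items according to the mapping table.
        if not rec.get("region"):
            rec["region"] = REGION_BY_CITY.get(rec.get("city"), "")
        # 3. Convert codes according to the data dictionary table ("0" = unknown).
        rec["gender"] = GENDER_CODES.get(rec.get("gender"), "0")
        # 4. Extract a characteristic string from a free-text field.
        match = PHONE_RE.search(rec.get("remark", ""))
        rec["phone"] = match.group(0) if match else ""
        # 5. Check the record; flag it rather than drop it.
        rec["valid"] = bool(rec["region"])
        out.append(rec)
    return out


raw = [
    {"id": "1", "city": "Beijing", "gender": "M", "remark": "tel 13800000000"},
    {"id": "1", "city": "Beijing", "gender": "M", "remark": "dup"},
    {"id": "", "city": "Shanghai", "gender": "F"},
    {"id": "2", "city": "Unknown", "gender": "X"},
]
cleaned = clean(raw)
```

Flagging a record that fails the final check, instead of deleting it, keeps the problem visible for the quality tracking described under data management; whether to flag or drop is a design choice the source does not specify.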
(3) Data management:
Data management carries out full-life-cycle planning and design, process control, and quality supervision of the data resources of the data resource center. Through standardized data management, it makes data assets transparent and controllable, takes stock of the data assets, improves data standards and data processing flows, raises data quality, safeguards data security, and promotes data circulation and value extraction.
Cataloging the data resources forms a data asset catalog, and managing the data resources ensures that they are used in a scientific, orderly, and secure manner. Data classification divides data by the sensitivity of its content or by its scope of use; a complete data classification management system is built, data are labeled with their classification, and this works together with data authorization to ensure that data are used safely. Data lineage means recording every processing step throughout the whole course of the data, from generation through processing and circulation to final retirement, so that a complete inheritance relationship of the data is formed. During data management, quality evaluation standards and management specifications are established so that data quality problems are found, located, and tracked in time, forming a closed loop for handling data quality problems and ensuring that data quality is stable and reliable.
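Data lineage as described above amounts to recording, for each derived data set, which inputs it came from and by which processing step. The following is a minimal sketch under that reading; the class `LineageLog`, its methods, and the data set names are hypothetical.

```python
from collections import defaultdict


class LineageLog:
    """Records which inputs each data set was derived from, and by what step."""

    def __init__(self):
        self._parents = defaultdict(list)  # data set -> [(parent, step), ...]

    def record(self, output, inputs, step):
        """Note that `output` was produced from `inputs` by processing `step`."""
        for parent in inputs:
            self._parents[output].append((parent, step))

    def ancestors(self, dataset):
        """Return every upstream data set, nearest first, without repeats."""
        found, queue = [], [dataset]
        while queue:
            current = queue.pop(0)
            for parent, _step in self._parents.get(current, []):
                if parent not in found:
                    found.append(parent)
                    queue.append(parent)
        return found


log = LineageLog()
log.record("standard.person", ["raw.unit_a", "raw.unit_b"], step="clean and merge")
log.record("topic.case_person", ["standard.person"], step="join with cases")
print(log.ancestors("topic.case_person"))
```

Walking the recorded edges upstream answers the tracing question "where did this data come from"; walking them downstream, not shown here, would answer "what is affected if this source changes".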
(4) Data organization:
Data organization means that, according to the requirements of data applications and following the standards and process specifications defined for the data, the data are built into classified libraries, from the original library to the standard library, the subject library, and the topic library, further strengthening the internal association of the application systems' big data.
The original library retains the raw data as accessed; it is both the basis for data processing and the basis for data tracing. The standard library integrates the accessed data resources and establishes data sets of key elements and of the association relations among those elements, providing effective support for all kinds of business. The subject library builds a subject data warehouse from the accessed data and accumulates data sets along multiple dimensions, providing a basis for statistical analysis and unified data services. The topic library builds a business application topic library for each business topic according to actual business conditions, providing support for business applications. The knowledge base holds the rules, algorithms, and models that constitute shared knowledge in the business domain; these are abstracted, verified, and summarized to produce the general thematic knowledge of the domain.
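The relationship between the original library and the derived libraries can be sketched as a tiered store in which raw records are never modified and every derived record keeps a pointer back to its raw source. This is an illustration only: the class `TieredStore`, the `_raw_id` field, and the simplification that every derived record is built directly from one raw record are assumptions made here.

```python
class TieredStore:
    """In-memory stand-in for the original, standard, subject, and topic libraries."""

    TIERS = ("original", "standard", "subject", "topic")

    def __init__(self):
        self.tiers = {name: {} for name in self.TIERS}

    def ingest(self, raw_id, record):
        """Keep an untouched copy of the accessed data in the original library."""
        self.tiers["original"][raw_id] = dict(record)

    def promote(self, raw_id, tier, key, transform):
        """Derive a record for a downstream library and remember its raw source."""
        if tier not in self.TIERS[1:]:
            raise ValueError(f"unknown target library: {tier}")
        derived = transform(dict(self.tiers["original"][raw_id]))
        derived["_raw_id"] = raw_id
        self.tiers[tier][key] = derived
        return derived

    def trace(self, tier, key):
        """Return the raw record a derived record was built from."""
        return self.tiers["original"][self.tiers[tier][key]["_raw_id"]]


store = TieredStore()
store.ingest("r1", {"name": " alice ", "unit": "A"})
store.promote("r1", "standard", "person:alice",
              lambda r: {**r, "name": r["name"].strip().title()})
print(store.trace("standard", "person:alice"))
```

Keeping the original library read-only after ingestion is what lets it serve as "the basis for data tracing": a derived record can always be compared against the exact data that was accessed.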
(5) Data service:
The data service provides query and retrieval services for the status of various resources and for data details, supporting exact and fuzzy queries and a variety of filter conditions. It provides a model analysis service that performs cross-matching (collision), analysis, and prediction on data sets according to business requirements to find hidden patterns in the data. It provides comparison subscription and data pushing: each data-using unit subscribes to data resources of the data resource center as needed, and the resource center pushes and publishes the data to each resource sub-center or business system on a schedule. It also provides data access authentication based on data access rules to implement access control over data resources.
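The query and access-control parts of the data service can be sketched together: rows are first filtered by the caller's access level, then by exact and fuzzy conditions. This is a minimal illustration under stated assumptions: the numeric `level` field, the functions `query` and `authorized_query`, the treatment of a fuzzy query as a case-insensitive substring match, and the sample rows are all hypothetical.

```python
def query(rows, exact=None, fuzzy=None):
    """Filter rows by exact-match conditions and case-insensitive substring conditions."""
    exact, fuzzy = exact or {}, fuzzy or {}
    result = []
    for row in rows:
        if any(row.get(k) != v for k, v in exact.items()):
            continue
        if any(v.lower() not in str(row.get(k, "")).lower() for k, v in fuzzy.items()):
            continue
        result.append(row)
    return result


def authorized_query(user_level, rows, exact=None, fuzzy=None):
    """Apply the access rule first, so restricted rows never reach the query step."""
    visible = [r for r in rows if r.get("level", 0) <= user_level]
    return query(visible, exact=exact, fuzzy=fuzzy)


rows = [
    {"name": "Vehicle registrations", "unit": "Unit A", "level": 1},
    {"name": "Vehicle inspections", "unit": "Unit B", "level": 3},
    {"name": "Household records", "unit": "Unit A", "level": 1},
]
hits = authorized_query(2, rows, fuzzy={"name": "vehicle"})
print([r["name"] for r in hits])
```

Applying the access rule before the query conditions means a caller cannot learn of a restricted row's existence through a matching filter; the source states only that authentication is based on data access rules, so the level comparison here is one possible rule among many.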

Claims (1)

1. A method for fusing and sharing multi-source heterogeneous mass data in a complex business scenario, characterized by comprising the following steps:
(1) Data access: a unified data convergence platform is established and data annotation items are unified, which reduces communication costs across different businesses and completes the preliminary docking of data;
(2) Data processing: the various data accessed to the data convergence platform are cleaned: erroneous and duplicate data are deleted; missing items are completed according to a mapping table; codes are converted according to the data dictionary table; characteristic strings are extracted from fields; and the data are checked;
(3) Data management: the data resources are cataloged to form a data asset catalog and are managed so that they are used in a scientific, orderly, and secure manner;
(4) Data organization: according to the requirements of data applications, and following the standards and process specifications defined for the data, the data are built into classified libraries, from the original library to the standard library, the subject library, and the topic library, further strengthening the internal association of the application systems' big data;
(5) Data service: a query and retrieval service for data details is provided, supporting exact and fuzzy queries and a variety of filter conditions; a model analysis service is provided that performs cross-matching (collision), analysis, and prediction on data sets according to business requirements to find hidden patterns in the data; comparison subscription and data pushing are provided; and data access authentication based on data access rules is provided to implement access control over data resources.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111363554.4A CN116136843A (en) 2021-11-18 2021-11-18 Multi-source heterogeneous mass data fusion sharing method under complex service scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111363554.4A CN116136843A (en) 2021-11-18 2021-11-18 Multi-source heterogeneous mass data fusion sharing method under complex service scene

Publications (1)

Publication Number Publication Date
CN116136843A true CN116136843A (en) 2023-05-19

Family

ID=86334113

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111363554.4A Pending CN116136843A (en) 2021-11-18 2021-11-18 Multi-source heterogeneous mass data fusion sharing method under complex service scene

Country Status (1)

Country Link
CN (1) CN116136843A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118035527A (en) * 2024-04-11 2024-05-14 深圳迅策科技股份有限公司 Interactive data processing method, medium and equipment for business and resource
CN118035527B (en) * 2024-04-11 2024-06-11 深圳迅策科技股份有限公司 Interactive data processing method, medium and equipment for business and resource

Similar Documents

Publication Publication Date Title
Chee et al. Algorithms for frequent itemset mining: a literature review
Zhang et al. Multi-database mining
CN107315776B (en) Data management system based on cloud computing
CN105554070B (en) A method of based on police service large data center Service and Construction
WO2021159834A1 (en) Abnormal information processing node analysis method and apparatus, medium and electronic device
Ienco et al. Parameter-less co-clustering for star-structured heterogeneous data
CN112199433A (en) Data management system for city-level data middling station
US9123006B2 (en) Techniques for parallel business intelligence evaluation and management
CN112000773B (en) Search engine technology-based data association relation mining method and application
CN111538794B (en) Data fusion method, device and equipment
Zhang et al. Research on the integration of heterogeneous information resources in university management informatization based on data mining algorithms
US20190050435A1 (en) Object data association index system and methods for the construction and applications thereof
Huang et al. Technology–function matrix based network analysis of cloud computing
CN116244367A (en) Visual big data analysis platform based on multi-model custom algorithm
US11847169B2 (en) Method for data processing and interactive information exchange with feature data extraction and bidirectional value evaluation for technology transfer and computer used therein
CN116136843A (en) Multi-source heterogeneous mass data fusion sharing method under complex service scene
CN112800149A (en) Data blood margin analysis-based data management method and system
Arputhamary et al. A review on big data integration
Niu Optimization of teaching management system based on association rules algorithm
CN108108444B (en) Enterprise business unit self-adaptive system and implementation method thereof
Li et al. Automatic classification algorithm for multisearch data association rules in wireless networks
Vasilyeva et al. Leveraging flexible data management with graph databases
Li [Retracted] Research on the Social Security and Elderly Care System under the Background of Big Data
CN114138913A (en) Database modeling method, device, equipment and computer storage medium
CN111831887A (en) Internet resource integration system and method

Legal Events

Date Code Title Description
PB01 Publication