CN116136843A - Multi-source heterogeneous mass data fusion sharing method under complex service scene - Google Patents
- Publication number
- CN116136843A
- Authority
- CN
- China
- Prior art keywords
- data
- service
- providing
- database
- access
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2468—Fuzzy queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Automation & Control Theory (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a method for fusing and sharing multi-source, heterogeneous, massive data in complex business scenarios. The method comprises five stages: data access, data processing, data management, data organization, and data service. Supported by data acquisition technology and built mainly on a big data acquisition platform, the invention collects and shares the data of different business member units according to the application scenarios of each kind of data, forms a unified data service product, provides data sharing services externally, and serves upper-layer business applications.
Description
Technical Field
The invention belongs to the technical field of big data processing and relates to a method for fusing and sharing multi-source, heterogeneous, massive data in complex service scenarios. It is intended for the fusion and sharing, across multiple departments, of large volumes of data that differ in service, source, and type.
Background
Big data applications often need to pull data from many related industries and fields through a unified platform for analysis and processing, using means such as data extraction, file parsing, interface calls, and data pushing. At present, however, business departments acquire and process data under practical constraints: applications are siloed, tasks are divided, data is scattered, resources are fragmented, and standards are inconsistent. As a result, data is difficult to access and aggregate, data assets are difficult to manage, the level of development and utilization is low, and data is difficult to fuse and share.
Disclosure of Invention
The invention aims to solve these problems by providing a method for fusing and sharing multi-source, heterogeneous, massive data in complex service scenarios. The method supports the convergence, fusion, and sharing of large-scale, multi-source, cross-domain, heterogeneous data, thereby making fuller use of data assets and raising the intelligence level of applications.
The technical scheme of the invention is as follows:
The multi-source heterogeneous mass data fusion sharing method under a complex service scene is characterized by comprising the following steps:
(1) Data access: establish a unified data convergence platform and unify data annotation items, reducing the communication cost between different services and completing the preliminary docking of the data;
(2) Data processing: clean all data accessed into the data convergence platform by deleting erroneous and duplicate records, completing missing items according to a mapping table, converting codes according to a data dictionary table, extracting characteristic strings from fields, and validating the data;
(3) Data management: catalog the data resources to form a data asset catalog, and manage them so that they are used in a scientific, orderly, and secure manner;
(4) Data organization: according to the requirements of data applications, organize the data following the standards and process specifications defined for it, building classified databases from the original database to a standard database, a theme database, and a thematic database, thereby strengthening the internal associations within the application system's big data;
(5) Data service: provide a query and retrieval service for data details that supports exact and fuzzy queries with various filter conditions; provide a model analysis service that performs collision (cross-matching), analysis, and prediction on data sets according to service requirements to uncover hidden patterns in the data; provide comparison subscription and data pushing; and provide data access authentication based on data access rules to control access to data resources.
Supported by data acquisition technology and built mainly on a big data acquisition platform, the invention collects and shares the data of different business member units according to the application scenarios of each kind of data, forms a unified data service product, provides data sharing services externally, and serves upper-layer business applications.
Drawings
Fig. 1 is a technical roadmap of data access, convergence, fusion, and sharing according to the invention.
Detailed Description
As shown in Fig. 1, the invention mainly comprises data access, data processing, data management, data organization, and data service.
(1) Data access:
First, each party identifies the data it needs based on its own domain, functional positioning, and business application systems, making the requirement as targeted and specific as possible. Second, based on these requirements, it compiles a data requirement information table that specifies the source unit, data items, update frequency, maximum latency, data format, data volume, data type, the network and system where the data resides, security requirements, data security, data classification, service classification, permission approval, sharing requirements, application description, and so on. This reduces the communication cost between different services and efficiently completes the preliminary docking of the data.
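The data requirement information table can be pictured as one record per data need. The Python sketch below is illustrative only: the class name `DataRequirement`, its field names, the sample values, and the `missing_fields` helper are hypothetical and simply mirror the items listed above; the patent does not define a schema.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class DataRequirement:
    """One row of a data requirement information table (hypothetical schema)."""
    source_unit: str
    data_items: list = field(default_factory=list)
    update_frequency: str = ""        # e.g. "daily"
    max_latency_minutes: int = 0      # highest acceptable time delay
    data_format: str = ""             # e.g. "CSV", "JSON"
    data_volume: str = ""
    data_type: str = ""               # e.g. "structured"
    network: str = ""                 # network where the data resides
    system: str = ""                  # system where the data resides
    security_requirement: str = ""
    data_classification: str = ""
    service_classification: str = ""
    permission_approval: str = ""
    sharing_requirement: str = ""
    application_description: str = ""

    def missing_fields(self):
        """Names of text or list fields that are still empty and must be settled before docking."""
        return [name for name, value in asdict(self).items() if value in ("", [])]


# Usage: a requirement that is complete except for its application description.
req = DataRequirement(
    source_unit="Unit A", data_items=["id", "name"], update_frequency="daily",
    max_latency_minutes=60, data_format="CSV", data_volume="1 GB/day",
    data_type="structured", network="internal network", system="HR system",
    security_requirement="encrypted transfer", data_classification="level 2",
    service_classification="personnel", permission_approval="approved",
    sharing_requirement="on request",
)
print(req.missing_fields())  # ['application_description']
```

Listing the empty fields gives both sides of the docking a concrete checklist, which is one way to reduce the communication cost the text mentions.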
The data access mode is determined by the specific situation of the data. A data providing unit with a high degree of informatization and a well-built data center can use the push mode: it actively pushes service data through its data push system to the front-end processor or front-end buffer of the resource pool of the data management center. An access unit whose data center is incomplete or absent but which has a certain level of informatization can use the extraction mode: data acquisition or extraction software installed on the front-end processor or in the front-end buffer extracts service data from the data providing unit's exchange database or service database, or the data is extracted by calling a data interface that the providing unit publishes externally. For data providing units with a low degree of informatization or no business application system, data can be collected and extracted manually by converting or importing file data media.
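The choice among the three access modes above amounts to a small decision rule. The following Python sketch assumes a three-level informatization scale ("high", "medium", "low") and boolean flags for the data center and business system; these inputs and the function name `choose_access_mode` are hypothetical simplifications of the qualitative criteria in the text.

```python
from enum import Enum


class AccessMode(Enum):
    PUSH = "push"        # provider pushes data to the front-end processor or buffer
    EXTRACT = "extract"  # acquisition software or a published interface pulls the data
    MANUAL = "manual"    # file data media are converted or imported by hand


def choose_access_mode(informatization, has_data_center, has_business_system):
    """Pick an access mode; informatization is assumed to be "high", "medium" or "low"."""
    if not has_business_system or informatization == "low":
        return AccessMode.MANUAL
    if informatization == "high" and has_data_center:
        return AccessMode.PUSH
    # Some informatization but an incomplete or missing data center.
    return AccessMode.EXTRACT


print(choose_access_mode("high", True, True))     # AccessMode.PUSH
print(choose_access_mode("medium", False, True))  # AccessMode.EXTRACT
print(choose_access_mode("low", False, False))    # AccessMode.MANUAL
```

The manual check comes first so that a provider without a business application system never falls through to an automated mode.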
(2) Data processing:
Data processing is carried out on top of the data access convergence platform, and the techniques involved are fairly general. Data is extracted from all data accessed into the original database of the data center, and the extracted data is cleaned according to the relevant cleaning rules: erroneous and duplicate records are deleted; missing items are completed according to the mapping table; codes are converted according to the data dictionary table; characteristic strings are extracted from fields; and the data is validated.
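The five cleaning rules can be chained in the order the text gives them. This Python sketch works on a list of dict records and is only an illustration: the function `clean`, the shapes of `mapping_table`, `code_dictionary`, and `feature_rules`, and all sample field names and values are assumptions, not structures defined by the patent. A record is treated as erroneous when a key field is empty.

```python
import re


def clean(records, key_fields, mapping_table, code_dictionary, feature_rules, validators):
    """Apply the cleaning rules to dict records (all parameter shapes are assumed).

    mapping_table:   {target_field: (source_field, {source_value: filled_value})}
    code_dictionary: {field: {code: decoded_value}}
    feature_rules:   [(source_field, regex, target_field)]
    validators:      [predicate(record) -> bool]
    """
    seen, cleaned = set(), []
    for rec in records:
        # 1. Delete erroneous (empty key field) and duplicate (repeated key) records.
        if any(not rec.get(f) for f in key_fields):
            continue
        key = tuple(rec[f] for f in key_fields)
        if key in seen:
            continue
        seen.add(key)
        rec = dict(rec)  # copy, so the input records stay untouched
        # 2. Complete missing items from the mapping table.
        for target, (source, lookup) in mapping_table.items():
            if not rec.get(target) and rec.get(source) in lookup:
                rec[target] = lookup[rec[source]]
        # 3. Convert codes using the data dictionary table.
        for fld, dictionary in code_dictionary.items():
            if rec.get(fld) in dictionary:
                rec[fld] = dictionary[rec[fld]]
        # 4. Extract characteristic strings from fields.
        for source, pattern, target in feature_rules:
            match = re.search(pattern, str(rec.get(source, "")))
            rec[target] = match.group(0) if match else None
        # 5. Validate; records failing any check are dropped.
        if all(check(rec) for check in validators):
            cleaned.append(rec)
    return cleaned


raw = [
    {"id": "1", "city": "Hangzhou", "region": "", "gender": "1", "note": "tel 13800000000"},
    {"id": "1", "city": "Hangzhou", "region": "", "gender": "1", "note": "duplicate"},
    {"id": "", "city": "Ningbo", "region": "", "gender": "2", "note": ""},
    {"id": "2", "city": "Ningbo", "region": "", "gender": "2", "note": "no phone"},
]
result = clean(
    raw,
    key_fields=["id"],
    mapping_table={"region": ("city", {"Hangzhou": "Zhejiang", "Ningbo": "Zhejiang"})},
    code_dictionary={"gender": {"1": "male", "2": "female"}},
    feature_rules=[("note", r"1\d{10}", "phone")],
    validators=[lambda r: r["gender"] in ("male", "female")],
)
print([(r["id"], r["region"], r["gender"], r["phone"]) for r in result])
# [('1', 'Zhejiang', 'male', '13800000000'), ('2', 'Zhejiang', 'female', None)]
```

In practice the regular expression, mapping table, and dictionary would come from the data standards; keeping them as parameters lets one routine serve different data sources.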
During processing, data can be associated and compared according to service requirements to identify records that satisfy the relevant business rules. Data processed in the original database can then be distributed to the relevant standard database, theme database, or thematic database according to its nature and purpose.
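Association, rule comparison, and distribution can be expressed as one pass over the processed records. In this hypothetical Python sketch, association is a lookup by a shared `id` key, business rules are named predicates, and routing takes the first matching route with the standard database as the fallback; the function name, the key, the library names, and the first-match policy are all assumptions made for illustration.

```python
from collections import defaultdict


def associate_and_route(records, reference, rules, routes, default="standard"):
    """Associate records with reference data, tag rule hits, and route them to libraries."""
    libraries = defaultdict(list)
    for rec in records:
        rec = dict(rec)
        # Associate: attach the reference entry that shares this record's id, if any.
        rec["matched_reference"] = reference.get(rec.get("id"))
        # Compare: note which business rules the associated record satisfies.
        rec["rule_hits"] = [name for name, check in rules.items() if check(rec)]
        # Distribute by nature and purpose: the first matching route wins.
        target = next((lib for lib, cond in routes if cond(rec)), default)
        libraries[target].append(rec)
    return dict(libraries)


records = [{"id": "1", "type": "person"}, {"id": "2", "type": "vehicle"}, {"id": "3", "type": "other"}]
out = associate_and_route(
    records,
    reference={"1": {"list": "watch"}},
    rules={"on_watchlist": lambda r: r["matched_reference"] is not None},
    routes=[
        ("thematic", lambda r: "on_watchlist" in r["rule_hits"]),
        ("theme", lambda r: r["type"] == "vehicle"),
    ],
)
print({lib: [r["id"] for r in recs] for lib, recs in out.items()})
# {'thematic': ['1'], 'theme': ['2'], 'standard': ['3']}
```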
(3) Data management:
Data management carries out full-life-cycle planning and design, process control, and quality supervision of the data resources in the data resource center. Standardized data management makes data assets transparent and controllable: it takes stock of the data assets, improves data standards and processing flows, raises data quality, ensures data security, and promotes data flow and value extraction.
Cataloging the data resources produces a data asset catalog, and managing them ensures that they are used in a scientific, orderly, and secure manner. Data classification grades data by the sensitivity of its content or its scope of use; a complete data classification management system is built, data is labeled with its classification, and the classification works together with data authorization to ensure safe use. Data lineage records every processing step over the whole life of the data, from generation through processing and circulation to final deletion, forming a complete inheritance relationship for the data. During data management, quality evaluation standards and management specifications are established so that data quality problems are found, located, and tracked in time, forming a closed loop for handling them and keeping data quality stable and reliable.
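Data lineage can be kept as a graph whose edges point from each derived data set to its inputs, labeled with the operation that produced it. The Python sketch below is a minimal illustration under that assumption; the class `LineageLog`, its methods, and the data set names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LineageLog:
    # edges[child] holds (parent, operation) pairs: child was derived from parent by operation.
    edges: dict = field(default_factory=dict)

    def record(self, child, parents, operation):
        """Record that data set `child` was produced from `parents` by `operation`."""
        self.edges.setdefault(child, []).extend((p, operation) for p in parents)

    def upstream(self, dataset):
        """Return every ancestor of `dataset`, each paired with the operation that consumed it."""
        result, stack, seen = [], [dataset], set()
        while stack:
            node = stack.pop()
            for parent, op in self.edges.get(node, []):
                if parent not in seen:
                    seen.add(parent)
                    result.append((parent, op))
                    stack.append(parent)
        return result


log = LineageLog()
log.record("standard.person", ["original.hr", "original.census"], "clean+merge")
log.record("theme.population", ["standard.person"], "aggregate")
print(log.upstream("theme.population"))
# [('standard.person', 'aggregate'), ('original.hr', 'clean+merge'), ('original.census', 'clean+merge')]
```

Tracing upstream from a theme-level data set back to the original database is what makes a quality problem locatable at its source.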
(4) Data organization:
Data organization follows the requirements of data applications and organizes data according to the standards and process specifications defined for it, building classified databases from the original database to the standard database, the theme database, and the thematic database, thereby strengthening the internal associations within the application system's big data.
The original database retains the data exactly as accessed; it is the basis both for data processing and for data tracing. The standard database integrates the accessed data resources and establishes key elements and data sets describing the associations between those elements, providing effective support for various services. The theme database builds a theme data warehouse from the accessed data, accumulating data sets along various dimensions and providing a basis for statistical analysis and unified data services. The thematic database builds a separate business application database for each business topic according to actual business conditions, supporting business applications. The knowledge base holds the rules, algorithms, and models that embody shared knowledge of the business domain; these are abstracted, verified, and summarized into general thematic knowledge for the domain.
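The layering from the original database to the thematic database can be sketched as successive transformations of the same records. In this hypothetical Python illustration, the standard layer is produced by a caller-supplied `standardize` function, the theme layer is reduced to counts along one dimension (a stand-in for a real data warehouse), and each thematic library is a filtered view per business topic; the function name and all sample data are assumptions.

```python
from collections import Counter


def build_layers(raw, standardize, theme_dimension, topics):
    """Build the original, standard, theme, and thematic layers from raw dict records.

    standardize:     maps a record copy to a standard record, or returns None to reject it
    theme_dimension: field whose values define the aggregation dimension
    topics:          {topic_name: predicate(record)} for per-topic thematic libraries
    """
    original = [dict(r) for r in raw]  # kept as accessed, for tracing
    standard = [s for s in (standardize(dict(r)) for r in original) if s is not None]
    theme = Counter(rec.get(theme_dimension) for rec in standard)
    thematic = {name: [rec for rec in standard if pred(rec)] for name, pred in topics.items()}
    return {"original": original, "standard": standard, "theme": theme, "thematic": thematic}


raw = [
    {"name": " Li ", "district": "East"},
    {"name": "Wang", "district": "west"},
    {"name": "", "district": "East"},
]
layers = build_layers(
    raw,
    standardize=lambda r: None if not r["name"].strip()
    else {"name": r["name"].strip(), "district": r["district"].title()},
    theme_dimension="district",
    topics={"east_residents": lambda r: r["district"] == "East"},
)
print(layers["theme"])  # Counter({'East': 1, 'West': 1})
```

Passing a copy into `standardize` keeps the original layer unchanged even if the standardization step mutates its input.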
(5) Data service:
Data service provides query and retrieval for resource conditions and data details, supporting exact and fuzzy queries with various filter conditions. It provides a model analysis service that performs collision (cross-matching), analysis, and prediction on data sets according to service requirements to uncover hidden patterns in the data. It provides comparison subscription and data pushing: each data-using unit subscribes to data resources of the data resource center as needed, and the resource center periodically pushes and publishes the data to each resource sub-center or service system. Finally, it provides data access authentication based on data access rules to control access to data resources.
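Rule-based access authentication and exact or fuzzy querying can be combined so that a caller only ever searches the records it is allowed to see. The Python sketch below assumes that each access rule grants a role a resource up to a maximum classification level and that each record carries a `level` field; "fuzzy" is simplified to a case-insensitive substring match rather than any particular similarity algorithm. `AccessRule`, `secure_query`, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRule:
    role: str
    resource: str
    max_level: int  # highest data classification level this role may read


def authorized(rules, role, resource, level):
    return any(r.role == role and r.resource == resource and level <= r.max_level for r in rules)


def query(records, exact=None, fuzzy=None, filters=()):
    """Exact field matches, case-insensitive substring ("fuzzy") matches, and extra predicates."""
    exact, fuzzy = exact or {}, fuzzy or {}
    hits = []
    for rec in records:
        if any(rec.get(k) != v for k, v in exact.items()):
            continue
        if any(str(v).lower() not in str(rec.get(k, "")).lower() for k, v in fuzzy.items()):
            continue
        if all(f(rec) for f in filters):
            hits.append(rec)
    return hits


def secure_query(rules, role, resource, records, **criteria):
    # Authenticate first: the role needs at least one rule for this resource.
    if not any(r.role == role and r.resource == resource for r in rules):
        raise PermissionError(f"{role} may not access {resource}")
    # Then hide records classified above the role's permitted level.
    visible = [rec for rec in records if authorized(rules, role, resource, rec.get("level", 0))]
    return query(visible, **criteria)


records = [
    {"name": "Zhang Wei", "city": "Hangzhou", "age": 34, "level": 1},
    {"name": "Zhang Min", "city": "Ningbo", "age": 28, "level": 3},
    {"name": "Li Na", "city": "Hangzhou", "age": 45, "level": 1},
]
rules = [AccessRule("analyst", "population", 2)]
print([r["name"] for r in secure_query(rules, "analyst", "population", records, fuzzy={"name": "zhang"})])
# ['Zhang Wei']
print([r["name"] for r in secure_query(rules, "analyst", "population", records,
                                       exact={"city": "Hangzhou"}, filters=[lambda r: r["age"] > 40])])
# ['Li Na']
```

Filtering by classification level before querying means a fuzzy search cannot reveal that a restricted record exists.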
Claims (1)
1. A multi-source heterogeneous mass data fusion sharing method under a complex service scene, characterized by comprising the following steps:
(1) Data access: establish a unified data convergence platform and unify data annotation items, reducing the communication cost between different services and completing the preliminary docking of the data;
(2) Data processing: clean all data accessed into the data convergence platform by deleting erroneous and duplicate records, completing missing items according to a mapping table, converting codes according to a data dictionary table, extracting characteristic strings from fields, and validating the data;
(3) Data management: catalog the data resources to form a data asset catalog, and manage them so that they are used in a scientific, orderly, and secure manner;
(4) Data organization: according to the requirements of data applications, organize the data following the standards and process specifications defined for it, building classified databases from the original database to a standard database, a theme database, and a thematic database, thereby strengthening the internal associations within the application system's big data;
(5) Data service: provide a query and retrieval service for data details that supports exact and fuzzy queries with various filter conditions; provide a model analysis service that performs collision (cross-matching), analysis, and prediction on data sets according to service requirements to uncover hidden patterns in the data; provide comparison subscription and data pushing; and provide data access authentication based on data access rules to control access to data resources.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111363554.4A CN116136843A (en) | 2021-11-18 | 2021-11-18 | Multi-source heterogeneous mass data fusion sharing method under complex service scene |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111363554.4A CN116136843A (en) | 2021-11-18 | 2021-11-18 | Multi-source heterogeneous mass data fusion sharing method under complex service scene |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116136843A true CN116136843A (en) | 2023-05-19 |
Family
ID=86334113
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111363554.4A Pending CN116136843A (en) | 2021-11-18 | 2021-11-18 | Multi-source heterogeneous mass data fusion sharing method under complex service scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116136843A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118035527A (en) * | 2024-04-11 | 2024-05-14 | 深圳迅策科技股份有限公司 | Interactive data processing method, medium and equipment for business and resource |
CN118035527B (en) * | 2024-04-11 | 2024-06-11 | 深圳迅策科技股份有限公司 | Interactive data processing method, medium and equipment for business and resource |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Chee et al. | Algorithms for frequent itemset mining: a literature review | |
Zhang et al. | Multi-database mining | |
CN107315776B (en) | Data management system based on cloud computing | |
CN105554070B (en) | A method of based on police service large data center Service and Construction | |
WO2021159834A1 (en) | Abnormal information processing node analysis method and apparatus, medium and electronic device | |
Ienco et al. | Parameter-less co-clustering for star-structured heterogeneous data | |
CN112199433A (en) | Data management system for city-level data middling station | |
US9123006B2 (en) | Techniques for parallel business intelligence evaluation and management | |
CN112000773B (en) | Search engine technology-based data association relation mining method and application | |
CN111538794B (en) | Data fusion method, device and equipment | |
Zhang et al. | Research on the integration of heterogeneous information resources in university management informatization based on data mining algorithms | |
US20190050435A1 (en) | Object data association index system and methods for the construction and applications thereof | |
Huang et al. | Technology–function matrix based network analysis of cloud computing | |
CN116244367A (en) | Visual big data analysis platform based on multi-model custom algorithm | |
US11847169B2 (en) | Method for data processing and interactive information exchange with feature data extraction and bidirectional value evaluation for technology transfer and computer used therein | |
CN116136843A (en) | Multi-source heterogeneous mass data fusion sharing method under complex service scene | |
CN112800149A (en) | Data blood margin analysis-based data management method and system | |
Arputhamary et al. | A review on big data integration | |
Niu | Optimization of teaching management system based on association rules algorithm | |
CN108108444B (en) | Enterprise business unit self-adaptive system and implementation method thereof | |
Li et al. | Automatic classification algorithm for multisearch data association rules in wireless networks | |
Vasilyeva et al. | Leveraging flexible data management with graph databases | |
Li | [Retracted] Research on the Social Security and Elderly Care System under the Background of Big Data | |
CN114138913A (en) | Database modeling method, device, equipment and computer storage medium | |
CN111831887A (en) | Internet resource integration system and method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication |