CN112988710A - Big data processing method and system - Google Patents
- Publication number: CN112988710A
- Application number: CN202110289206.0A
- Authority: CN (China)
- Prior art keywords: data, service, cleaning, filtering, processing
- Legal status: Pending (as listed by Google Patents; the status is an assumption, not a legal conclusion)
Classifications (section G, Physics; class G06, Computing; calculating or counting; subclass G06F, Electric digital data processing)
- G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
  - G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    - G06F16/21: Design, administration or maintenance of databases
      - G06F16/211: Schema design and management
      - G06F16/215: Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
- G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
  - G06F21/60: Protecting data
    - G06F21/604: Tools and structures for managing or administering access control systems
Abstract
The invention relates to a big data processing method and system. The method comprises: collecting data from a business system by memory exchange and converting it into a standard format, thereby standardizing the data; cleaning and then filtering the data, and then fusing it by edge computing; and storing the fused data in virtualized storage and encapsulating it into corresponding data services for users. In the invention, a response design engine in the interface service encapsulates data directly into the corresponding services, so there is no need to develop a dedicated data integration interface or to build a dedicated database. Through the data loading service, data are encapsulated directly into the user's own data system and delivered to the user proactively, so the user does not need to obtain the corresponding permissions and then access a database or server through a protocol or interface. This improves data utilization.
Description
Technical Field
The invention relates to the technical field of big data, and in particular to a big data processing method and system.
Background
Big data refers to data sets that cannot be captured, managed, and processed by conventional software tools within a given time frame. It is a massive, fast-growing, and diverse information asset that yields stronger decision-making, insight, and process-optimization capabilities only when handled with new processing models. With the rapid advance of science and technology, massive amounts of data are generated in every industry every day, and extracting useful data from them and obtaining the required results through analysis and processing is the most fundamental problem that big data processing must solve.
At present, most big data processing methods classify the collected data, analyze and process the classified data, and store the results in a database or server. A user who needs the data must obtain the corresponding permissions for that database or server and then retrieve the data from it. Data processed in this way can only be obtained passively, and users often do not know where to find the data they need, so data utilization is low. This problem needs to be addressed.
Disclosure of Invention
The object of the invention is to overcome the shortcomings of the prior art by providing a big data processing method and system that remedy the deficiencies of conventional big data processing.
The object of the invention is achieved by the following technical solution: a big data processing method, comprising:
collecting data from a business system by memory exchange and converting the data into a standard format, thereby standardizing the data;
cleaning and then filtering the data, and then fusing the data by edge computing;
storing the fused data in virtualized storage and encapsulating it into corresponding data services for users.
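The three steps above form a linear pipeline. The Python sketch below is illustrative only: the stage functions (`standardize`, `clean`, `filter_positive`) are hypothetical placeholders, not interfaces defined by the patent, and collection by memory exchange is simulated with an in-memory list.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Stage = Callable[[List[Record]], List[Record]]


def run_pipeline(records: List[Record], stages: List[Stage]) -> List[Record]:
    """Apply each stage in order; each stage's output feeds the next stage."""
    for stage in stages:
        records = stage(records)
    return records


# Hypothetical stand-ins for the claimed steps; real stages would be far richer.
def standardize(records: List[Record]) -> List[Record]:
    # "Standard format" reduced here to lowercase field names.
    return [{k.lower(): v for k, v in r.items()} for r in records]


def clean(records: List[Record]) -> List[Record]:
    # Discard records containing any missing (None) value.
    return [r for r in records if all(v is not None for v in r.values())]


def filter_positive(records: List[Record]) -> List[Record]:
    # Keep only records whose "amount" is positive.
    return [r for r in records if r["amount"] > 0]


# Data "collected" from a business system, simulated as an in-memory list.
source = [{"ID": 1, "Amount": 5}, {"ID": 2, "Amount": None}, {"ID": 3, "Amount": -1}]
result = run_pipeline(source, [standardize, clean, filter_positive])
print(result)  # [{'id': 1, 'amount': 5}]
```

Fusion, storage, and encapsulation stages would be appended to the same stage list; keeping every stage behind one signature makes the order easy to change or test in isolation.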
Fusing the data by edge computing comprises:
the data processing master node dividing the data into a plurality of data blocks and transmitting each data block to a distributed data node network, where each distributed data sub-node receives and processes the data of one data block;
each distributed data sub-node analyzing and processing the data it receives and returning the results to the data processing master node through the distributed data node network.
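A single-machine sketch of this master/sub-node split follows. It assumes a thread pool stands in for the distributed data node network (a real deployment would send blocks over the network) and uses a per-block sum as the "analysis"; `split_into_blocks`, `process_block`, and `fuse` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split_into_blocks(data: Sequence[int], block_size: int) -> List[List[int]]:
    """Master node: divide the data into fixed-size blocks."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [list(data[i:i + block_size]) for i in range(0, len(data), block_size)]


def process_block(block: List[int]) -> int:
    """Sub-node: analyze one block (here, an assumed per-block sum)."""
    return sum(block)


def fuse(data: Sequence[int], block_size: int = 3, workers: int = 4) -> int:
    blocks = split_into_blocks(data, block_size)
    # Each worker plays the role of one distributed sub-node handling one block.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_block, blocks))
    # Master node: collect the partial results and combine them.
    return sum(partials)


print(split_into_blocks(range(1, 8), 3))  # [[1, 2, 3], [4, 5, 6], [7]]
print(fuse(range(1, 11)))                 # 55
```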
Cleaning and then filtering the data comprises:
data cleaning, which consists of missing value cleaning, format content cleaning, logic error cleaning, unneeded data cleaning, and relevance verification, performed in that order;
setting data filtering conditions as required to form a data filtering grid, and passing the cleaned data through the data filtering grid.
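The cleaning order and filtering grid can be sketched as follows. Each cleaning step is a function that returns a cleaned record or `None` to discard it. The concrete rules (an `id` key field, an age range of 0-150, the kept-field set, an `end >= start` consistency check) are assumed examples, and the "data filtering grid" is interpreted here as a list of conditions that must all hold.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]
Step = Callable[[Record], Optional[Record]]  # a step returns None to discard a record
Condition = Callable[[Record], bool]


def drop_missing(r: Record) -> Optional[Record]:
    # Missing value cleaning: discard records lacking the key field "id".
    return r if r.get("id") is not None else None


def fix_format(r: Record) -> Optional[Record]:
    # Format content cleaning: trim surrounding whitespace in string fields.
    return {k: v.strip() if isinstance(v, str) else v for k, v in r.items()}


def drop_logic_errors(r: Record) -> Optional[Record]:
    # Logic error cleaning: an age that is not an integer in 0-150 is invalid.
    age = r.get("age")
    return r if isinstance(age, int) and 0 <= age <= 150 else None


def drop_unneeded(r: Record) -> Optional[Record]:
    # Unneeded data cleaning: keep only fields the downstream service uses.
    wanted = {"id", "name", "age", "start", "end"}
    return {k: v for k, v in r.items() if k in wanted}


def verify_relevance(r: Record) -> Optional[Record]:
    # Relevance verification: related fields must agree (end not before start).
    start, end = r.get("start"), r.get("end")
    if start is None or end is None:
        return r
    return r if end >= start else None


CLEANING_STEPS: List[Step] = [drop_missing, fix_format, drop_logic_errors,
                              drop_unneeded, verify_relevance]


def clean(records: List[Record]) -> List[Record]:
    """Run every record through the cleaning steps in the prescribed order."""
    out: List[Record] = []
    for r in records:
        current: Optional[Record] = r
        for step in CLEANING_STEPS:
            current = step(current)
            if current is None:
                break
        if current is not None:
            out.append(current)
    return out


def filter_grid(records: List[Record], conditions: List[Condition]) -> List[Record]:
    """A record passes the grid only if it satisfies every condition."""
    return [r for r in records if all(cond(r) for cond in conditions)]


raw = [
    {"id": 1, "name": " Li ", "age": 30, "start": 1, "end": 5, "note": "x"},
    {"id": None, "name": "Wang", "age": 40, "start": 1, "end": 2},
    {"id": 3, "name": "Zhao", "age": 200, "start": 1, "end": 2},
    {"id": 4, "name": "Sun", "age": 25, "start": 9, "end": 2},
    {"id": 5, "name": "Zhou", "age": 17, "start": 1, "end": 3},
]
cleaned = clean(raw)
result = filter_grid(cleaned, [lambda r: r["age"] >= 18])
print([r["id"] for r in cleaned], [r["id"] for r in result])  # [1, 5] [1]
```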
The data service comprises a data interface service and a data loading service;
the data interface service is formed by having a response design engine generate code from the content, parameters, and response instructions of the required service, as designed in the back end, and then encapsulating that code into the corresponding service;
the data loading service determines the user's data system, encapsulates the data into a form that conforms to that system, and then delivers the data directly to the user's data receiving end or data consuming end.
User data systems include, for example, MySQL, Oracle, Sybase, FoxPro, BigTable, CouchDB, FileMaker, and PostgreSQL.
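A sketch of how a data loading service might package records for different target systems, assuming that packaging amounts to emitting a parameterized `INSERT` in the target's SQL dialect. Only three dialects are covered; `package_for` and `DIALECTS` are hypothetical names, and SQLite (from Python's standard library) serves as a runnable stand-in for the user's system. Non-relational systems such as CouchDB or BigTable would need a different packaging step entirely.

```python
import sqlite3
from typing import Any, Dict, List, Tuple

# Identifier quote character and parameter placeholder per target system.
# MySQL drivers such as PyMySQL and PostgreSQL's psycopg2 accept %s placeholders;
# Python's sqlite3 uses ?. Other systems would need their own entries.
DIALECTS: Dict[str, Tuple[str, str]] = {
    "mysql": ("`", "%s"),
    "postgresql": ('"', "%s"),
    "sqlite": ('"', "?"),
}


def package_for(system: str, table: str,
                records: List[Dict[str, Any]]) -> Tuple[str, List[Tuple[Any, ...]]]:
    """Render a parameterized INSERT in the target system's dialect plus its rows."""
    if not records:
        raise ValueError("no records to package")
    quote, placeholder = DIALECTS[system.lower()]

    def q(name: str) -> str:
        # Quote an identifier, doubling any embedded quote character.
        return quote + name.replace(quote, quote * 2) + quote

    columns = list(records[0].keys())
    sql = (f"INSERT INTO {q(table)} ({', '.join(q(c) for c in columns)}) "
           f"VALUES ({', '.join([placeholder] * len(columns))})")
    rows = [tuple(r[c] for c in columns) for r in records]
    return sql, rows


records = [{"id": 1, "value": 10}, {"id": 2, "value": 7}]
print(package_for("mysql", "metrics", records)[0])
# INSERT INTO `metrics` (`id`, `value`) VALUES (%s, %s)

# Load into an in-memory SQLite database as a runnable stand-in for the user's system.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "metrics" ("id" INTEGER, "value" INTEGER)')
sql, rows = package_for("sqlite", "metrics", records)
conn.executemany(sql, rows)
print(conn.execute('SELECT COUNT(*) FROM "metrics"').fetchone()[0])  # 2
```

Keeping values out of the SQL text and passing them as parameters lets each target driver handle value escaping itself.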
A system based on the big data processing method comprises a data acquisition module, a data processing module, a data fusion module, a storage module, and an encapsulation service module;
the data acquisition module collects data from the business system by memory exchange;
the data processing module cleans and then filters the data, and then converts it into the standard format;
the data fusion module performs edge computing on the data at the data sub-nodes of a distributed data node network and finally collects the computed results;
the storage module is divided into virtualized storage and physical storage: the virtualized storage stores pointers to data paths, and the physical storage stores the data itself;
the encapsulation service module encapsulates data as an interface service and/or a data loading service.
The data processing module comprises a data cleaning unit, a data filtering unit, and a standardized conversion unit;
the data cleaning unit performs, in order, missing value cleaning, format content cleaning, logic error cleaning, unneeded data cleaning, and relevance verification;
the data filtering unit sets data filtering conditions as required, forms a data filtering grid, and passes the cleaned data through it;
the standardized conversion unit converts the data output by the data cleaning unit and the data filtering unit into the standard format.
The encapsulation service module comprises an interface service unit and a data loading service unit;
the interface service unit uses a response design engine to generate code from the content, parameters, and response instructions of the required service, as designed in the back end, and then encapsulates the code into the corresponding service;
the data loading service unit determines the user's data system, encapsulates the data into a form that conforms to that system, and then delivers it directly to the user's data receiving end or data consuming end.
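The patent does not describe how the response design engine works internally. The following is a minimal sketch under the assumption that a service design consists of a name, typed request parameters, and the fields to return; instead of generating source code, it builds a closure that plays the role of the generated, encapsulated service. `build_service` and the spec keys are hypothetical.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def build_service(spec: Dict[str, Any], data: List[Record]) -> Callable[..., Dict[str, Any]]:
    """Hypothetical engine: turn a declarative service spec into a request handler."""
    params: Dict[str, type] = spec["params"]     # request parameter -> expected type
    fields: List[str] = spec["response_fields"]  # fields returned per matching row

    def handler(**kwargs: Any) -> Dict[str, Any]:
        # Validate request parameters against the spec.
        for name, expected in params.items():
            if not isinstance(kwargs.get(name), expected):
                return {"status": 400, "error": f"invalid or missing parameter: {name}"}
        # Select rows matching every parameter, then project the response fields.
        rows = [r for r in data if all(r.get(k) == kwargs[k] for k in params)]
        return {"status": 200, "service": spec["name"],
                "rows": [{f: r.get(f) for f in fields} for r in rows]}

    return handler


dataset = [{"region": "A", "year": 2020, "value": 10},
           {"region": "B", "year": 2020, "value": 7}]
spec = {"name": "value_by_region", "params": {"region": str},
        "response_fields": ["year", "value"]}
service = build_service(spec, dataset)
print(service(region="A"))
# {'status': 200, 'service': 'value_by_region', 'rows': [{'year': 2020, 'value': 10}]}
print(service(region=1)["status"])  # 400
```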
The invention has the following advantages: the response design engine in the interface service encapsulates data directly into the corresponding services, so there is no need to develop a dedicated data integration interface or to build a dedicated database; through the data loading service, data are encapsulated directly into the user's data system and delivered to the user proactively, so the user does not need to obtain the corresponding permissions and then access a database or server through a protocol or interface, which improves data utilization.
Drawings
FIG. 1 is a schematic flow diagram of the method of the present invention;
FIG. 2 is a schematic diagram of the system of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the detailed description of the embodiments of the present application provided below in connection with the appended drawings is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application. The invention is further described below with reference to the accompanying drawings.
Example 1
As shown in FIG. 1, one embodiment of the present invention is a big data processing method comprising the following steps:
S1: collect data from the business system by memory exchange and convert it into a standard format, thereby standardizing the data; the target standard format may be a national, international, or military standard, among others.
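As an illustration of format standardization, the sketch below converts dates written in several source formats to ISO 8601 (`YYYY-MM-DD`), an international standard. The list of source formats is an assumption about what the collected data might contain; unrecognized values return `None` and are left for the cleaning stage. `to_iso_date` is a hypothetical name.

```python
from datetime import datetime
from typing import Optional

# Source formats assumed to occur in the collected data (illustrative only).
SOURCE_FORMATS = ["%Y-%m-%d", "%Y/%m/%d", "%d.%m.%Y", "%Y年%m月%d日", "%Y%m%d"]


def to_iso_date(raw: str) -> Optional[str]:
    """Convert a date string in any known source format to ISO 8601 (YYYY-MM-DD)."""
    text = raw.strip()
    for fmt in SOURCE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # not this format; try the next one
    return None  # unrecognized: leave for the cleaning stage to handle


for s in ["2021-03-18", "2021/03/18", "18.03.2021", "2021年3月18日", "20210318", "March 18"]:
    print(s, "->", to_iso_date(s))
```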
S2: clean and then filter the data, and then fuse it by edge computing.
Specifically, cleaning and then filtering the data comprises:
data cleaning, which consists of missing value cleaning, format content cleaning, logic error cleaning, unneeded data cleaning, and relevance verification, performed in that order;
setting data filtering conditions as required to form a data filtering grid, and passing the cleaned data through the data filtering grid.
Missing value cleaning comprises determining the range of missing values, removing unneeded fields (that is, deleting that data outright), filling missing content, and re-fetching data;
determining the range of missing values means calculating the proportion of missing values in each field and then formulating a strategy according to that proportion and the importance of the field. For fields of high importance and low missing rate, the values are filled by calculation or estimated from experience or business knowledge. For fields of high importance and high missing rate, the data can be completed from other channels or derived by calculation from other fields, or the field can be removed and the removal flagged in the results. For fields of low importance and low missing rate, the values can be left as they are or filled simply. For fields of low importance and high missing rate, the field can simply be dropped.
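The four-way strategy above can be expressed as a lookup on (importance, missing rate). In this sketch the 0.3 threshold separating "low" from "high" missing rates is an assumed example value, field importance is supplied by hand, and the function names are hypothetical.

```python
from typing import Dict, List


def missing_rates(records: List[Dict[str, object]], fields: List[str]) -> Dict[str, float]:
    """Fraction of records in which each field is missing (absent or None)."""
    n = len(records)
    if n == 0:
        return {f: 0.0 for f in fields}
    return {f: sum(1 for r in records if r.get(f) is None) / n for f in fields}


def choose_strategy(important: bool, rate: float, threshold: float = 0.3) -> str:
    """Map (field importance, missing rate) to one of the four strategies."""
    high = rate >= threshold  # the 0.3 cut-off is an assumed example value
    if important and not high:
        return "fill by calculation, or estimate from experience or business knowledge"
    if important and high:
        return "complete from other channels, derive from other fields, or drop and flag"
    if not important and not high:
        return "leave as is, or apply a simple fill"
    return "drop the field"


records = [
    {"id": 1, "phone": None, "fax": None},
    {"id": 2, "phone": "p2"},
    {"id": 3, "phone": "p3", "fax": None},
    {"id": 4, "phone": "p4"},
]
importance = {"phone": True, "fax": False}
rates = missing_rates(records, ["phone", "fax"])
for field, rate in rates.items():
    print(field, rate, "->", choose_strategy(importance[field], rate))
```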
Missing content can be filled with values inferred from business knowledge or experience, with values calculated from the same indicator, or with values calculated from other indicators.
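The three filling options can be sketched as small functions. Assumptions: "calculated from the same indicator" is taken to be the mean of the field's observed values, "calculated from other indicators" is illustrated by the relation amount = price * qty, and the business-knowledge value is a supplied default; all field and function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def fill_from_business_rule(records: List[Record], field: str, default: float) -> None:
    """Fill gaps with a value supplied by business knowledge or experience."""
    for r in records:
        if r.get(field) is None:
            r[field] = default


def fill_from_same_indicator(records: List[Record], field: str) -> None:
    """Fill gaps in `field` with the mean of its own observed values."""
    observed = [v for v in (r.get(field) for r in records) if v is not None]
    if not observed:
        return
    avg = mean(observed)
    for r in records:
        if r.get(field) is None:
            r[field] = avg


def fill_from_other_indicators(records: List[Record]) -> None:
    """Derive a missing amount from other indicators: amount = price * qty."""
    for r in records:
        price, qty = r.get("price"), r.get("qty")
        if r.get("amount") is None and price is not None and qty is not None:
            r["amount"] = price * qty


rows: List[Record] = [
    {"price": 2.0, "qty": 3.0, "amount": None, "rating": 4.0, "discount": 0.1},
    {"price": 4.0, "qty": 2.0, "amount": 8.0, "rating": None, "discount": None},
    {"price": 5.0, "qty": 1.0, "amount": 5.0, "rating": 5.0, "discount": 0.0},
]
fill_from_other_indicators(rows)                # row 0: amount = 2.0 * 3.0
fill_from_same_indicator(rows, "rating")        # row 1: rating = mean(4.0, 5.0)
fill_from_business_rule(rows, "discount", 0.0)  # row 1: no discount by default
print(rows[0]["amount"], rows[1]["rating"], rows[1]["discount"])  # 6.0 4.5 0.0
```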
Re-fetching means that if some indicators are very important and have a high missing rate, the data can be re-acquired or re-collected to supplement them.
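Re-fetching can be sketched as looking up missing values in a secondary source keyed by record ID. Here a dictionary's `get` method stands in for that source; in practice it might be another business system or a repeated collection run. `refetch_missing` and `secondary_source` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]


def refetch_missing(records: List[Record], field: str, key: str,
                    fetch: Callable[[object], Optional[object]]) -> int:
    """Re-acquire missing values of `field` via fetch(key value); return how many were filled."""
    filled = 0
    for r in records:
        if r.get(field) is None:
            value = fetch(r[key])
            if value is not None:
                r[field] = value
                filled += 1
    return filled


records = [{"id": 1, "score": 90}, {"id": 2, "score": None}, {"id": 3, "score": None}]
secondary_source = {2: 75}  # hypothetical backup channel keyed by id
n = refetch_missing(records, "score", "id", secondary_source.get)
print(n, records)
# 1 [{'id': 1, 'score': 90}, {'id': 2, 'score': 75}, {'id': 3, 'score': None}]
```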
Specifically, fusing the data by edge computing comprises:
the data processing master node dividing the data into a plurality of data blocks and transmitting each data block to a distributed data node network, where each distributed data sub-node receives and processes the data of one data block;
each distributed data sub-node analyzing and processing the data it receives and returning the results to the data processing master node through the distributed data node network.
S3: store the fused data in virtualized storage and encapsulate it into corresponding data services for users.
Virtualized storage mainly stores pointers to data paths; the data themselves can also be held in physical storage.
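A minimal sketch of the split between virtualized storage (pointers only) and physical storage (the data itself), assuming in-memory classes in place of real storage back ends; class and method names are hypothetical.

```python
from typing import Dict, Tuple


class PhysicalStore:
    """Holds the data itself, keyed by path (stand-in for physical storage)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, path: str, data: bytes) -> None:
        self._blobs[path] = data

    def get(self, path: str) -> bytes:
        return self._blobs[path]


class VirtualStore:
    """Holds only pointers (logical name -> back end and path) and resolves reads."""

    def __init__(self, backends: Dict[str, PhysicalStore]) -> None:
        self._backends = backends
        self._pointers: Dict[str, Tuple[str, str]] = {}

    def register(self, name: str, backend: str, path: str) -> None:
        # Only the address is recorded; no data is copied.
        self._pointers[name] = (backend, path)

    def read(self, name: str) -> bytes:
        backend, path = self._pointers[name]
        return self._backends[backend].get(path)


site_a, site_b = PhysicalStore(), PhysicalStore()
site_a.put("/fused/part-0", b"fused-bytes")
vstore = VirtualStore({"site_a": site_a, "site_b": site_b})
vstore.register("fused_dataset", "site_a", "/fused/part-0")
print(vstore.read("fused_dataset"))  # b'fused-bytes'

# Moving the data only requires re-pointing; consumers keep using the same name.
site_b.put("/archive/part-0", site_a.get("/fused/part-0"))
vstore.register("fused_dataset", "site_b", "/archive/part-0")
print(vstore.read("fused_dataset"))  # b'fused-bytes'
```

Because consumers read through a logical name, data can be relocated between back ends without changing the services built on top of it.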
Specifically, the data service comprises a data interface service and a data loading service;
the data interface service is formed by having a response design engine generate code from the content, parameters, and response instructions of the required service, as designed in the back end, and then encapsulating that code into the corresponding service;
the data loading service determines the user's data system, encapsulates the data into a form that conforms to that system, and then delivers the data directly to the user's data receiving end or data consuming end.
Example 2
As shown in FIG. 2, another embodiment of the present invention is a system based on the big data processing method, comprising a data acquisition module, a data processing module, a data fusion module, a storage module, and an encapsulation service module;
the data acquisition module collects data from the business system by memory exchange;
the data processing module cleans and then filters the data, and then converts it into the standard format;
the data fusion module performs edge computing on the data at the data sub-nodes of a distributed data node network and finally collects the computed results;
the storage module is divided into virtualized storage and physical storage: the virtualized storage stores pointers to data paths, and the physical storage stores the data itself;
the encapsulation service module encapsulates data as an interface service and/or a data loading service.
The data processing module comprises a data cleaning unit, a data filtering unit, and a standardized conversion unit;
the data cleaning unit performs, in order, missing value cleaning, format content cleaning, logic error cleaning, unneeded data cleaning, and relevance verification;
the data filtering unit sets data filtering conditions as required, forms a data filtering grid, and passes the cleaned data through it;
the standardized conversion unit converts the data output by the data cleaning unit and the data filtering unit into the standard format.
The encapsulation service module comprises an interface service unit and a data loading service unit;
the interface service unit uses a response design engine to generate code from the content, parameters, and response instructions of the required service, as designed in the back end, and then encapsulates the code into the corresponding service;
the data loading service unit determines the user's data system, encapsulates the data into a form that conforms to that system, and then delivers it directly to the user's data receiving end or data consuming end.
In the invention, the response design engine in the interface service encapsulates data directly into the corresponding services, so there is no need to develop a dedicated data integration interface or to build a dedicated database; through the data loading service, data are encapsulated directly into the user's data system and delivered to the user proactively, so the user does not need to obtain the corresponding permissions and then access a database or server through a protocol or interface, which improves data utilization.
The foregoing describes preferred embodiments of the invention. It should be understood that the invention is not limited to the precise forms disclosed herein and may be used in various other combinations, modifications, and environments, and may be altered within the scope of the concept described herein, in light of the above teachings or the skill and knowledge of the relevant art. Modifications and variations made by those skilled in the art without departing from the spirit and scope of the invention shall fall within the protection scope of the appended claims.
Claims (7)
1. A big data processing method, characterized in that the method comprises:
collecting data from a business system by memory exchange and converting the data into a standard format, thereby standardizing the data;
cleaning and then filtering the data, and then fusing the data by edge computing;
storing the fused data in virtualized storage and encapsulating it into corresponding data services for users.
2. The big data processing method according to claim 1, wherein fusing the data by edge computing comprises:
a data processing master node dividing the data into a plurality of data blocks and transmitting each data block to a distributed data node network, where each distributed data sub-node receives and processes the data of one data block;
each distributed data sub-node analyzing and processing the data it receives and returning the results to the data processing master node through the distributed data node network.
3. The big data processing method according to claim 1, wherein cleaning and then filtering the data comprises:
data cleaning, which consists of missing value cleaning, format content cleaning, logic error cleaning, unneeded data cleaning, and relevance verification, performed in that order;
setting data filtering conditions as required to form a data filtering grid, and passing the cleaned data through the data filtering grid.
4. The big data processing method according to claim 1, wherein the data service comprises a data interface service and a data loading service;
the data interface service is formed by having a response design engine generate code from the content, parameters, and response instructions of the required service, as designed in the back end, and then encapsulating that code into the corresponding service;
the data loading service determines the user's data system, encapsulates the data into a form that conforms to that system, and then delivers the data directly to the user's data receiving end or data consuming end.
5. A system based on a big data processing method, characterized in that the system comprises a data acquisition module, a data processing module, a data fusion module, a storage module, and an encapsulation service module;
the data acquisition module collects data from a business system by memory exchange;
the data processing module cleans and then filters the data, and then converts it into the standard format;
the data fusion module performs edge computing on the data at the data sub-nodes of a distributed data node network and finally collects the computed results;
the storage module is divided into virtualized storage and physical storage: the virtualized storage stores pointers to data paths, and the physical storage stores the data itself;
the encapsulation service module encapsulates data as an interface service and/or a data loading service.
6. The system based on a big data processing method according to claim 5, wherein the data processing module comprises a data cleaning unit, a data filtering unit, and a standardized conversion unit;
the data cleaning unit performs, in order, missing value cleaning, format content cleaning, logic error cleaning, unneeded data cleaning, and relevance verification;
the data filtering unit sets data filtering conditions as required, forms a data filtering grid, and passes the cleaned data through it;
the standardized conversion unit converts the data output by the data cleaning unit and the data filtering unit into the standard format.
7. The system based on a big data processing method according to claim 5, wherein the encapsulation service module comprises an interface service unit and a data loading service unit;
the interface service unit uses a response design engine to generate code from the content, parameters, and response instructions of the required service, as designed in the back end, and then encapsulates the code into the corresponding service;
the data loading service unit determines the user's data system, encapsulates the data into a form that conforms to that system, and then delivers it directly to the user's data receiving end or data consuming end.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110289206.0A (CN112988710A) | 2021-03-18 | 2021-03-18 | Big data processing method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110289206.0A (CN112988710A) | 2021-03-18 | 2021-03-18 | Big data processing method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112988710A | 2021-06-18 |
Family ID: 76332986
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110289206.0A (Pending; CN112988710A) | 2021-03-18 | 2021-03-18 | Big data processing method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112988710A |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113709261A (en) * | 2021-10-29 | 2021-11-26 | 深圳市沃易科技有限公司 | System for fusing multi-channel data chain processing |
CN113821503A (en) * | 2021-09-23 | 2021-12-21 | 北京金山云网络技术有限公司 | Medical data processing method and device and edge server |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109361532A (en) * | 2018-09-11 | 2019-02-19 | 上海天旦网络科技发展有限公司 | The high-availability system and method and computer readable storage medium of network data analysis |
CN109862087A (en) * | 2019-01-23 | 2019-06-07 | 深圳市康拓普信息技术有限公司 | Industrial Internet of things system and its data processing method based on edge calculations |
CN109885566A (en) * | 2019-02-25 | 2019-06-14 | 南京世界村云数据产业集团有限公司 | A kind of acquisition of data and edge calculations system |
CN110336703A (en) * | 2019-07-12 | 2019-10-15 | 河海大学常州校区 | Industrial big data based on edge calculations monitors system |
CN111459665A (en) * | 2020-03-27 | 2020-07-28 | 重庆电政信息科技有限公司 | Distributed edge computing system and distributed edge computing method |
CN111478960A (en) * | 2020-04-03 | 2020-07-31 | 河海大学常州校区 | Data acquisition and edge calculation system based on edge calculation |
- 2021-03-18: application CN202110289206.0A filed in CN (publication CN112988710A); status listed as active, pending
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210618 |