CN112988710A

CN112988710A - Big data processing method and system

Info

Publication number: CN112988710A
Application number: CN202110289206.0A
Authority: CN
Inventors: 严涛; 宋怡
Original assignee: Chengdu Qingyunshang Information Technology Co ltd
Current assignee: Chengdu Qingyunshang Information Technology Co ltd
Priority date: 2021-03-18
Filing date: 2021-03-18
Publication date: 2021-06-18

Abstract

The invention relates to a big data processing method and system. The processing method comprises: collecting data from a business system by means of memory exchange and converting the data into a standard format, thereby standardizing the data; performing data cleaning and then data filtering on the data, followed by data fusion by means of edge computing; and storing the fused data in virtualized storage and packaging it into corresponding data services for users. Through a response design engine in the interface service, data are packaged directly into corresponding services, so that no dedicated data docking interface needs to be developed and no dedicated database needs to be built. Through the data loading service, data are packaged directly in the form required by the user's data system and actively delivered to the user, so that the user no longer has to obtain permissions and then access a database or server through a protocol or interface, which improves the utilization rate of the data.

Description

Big data processing method and system

Technical Field

The invention relates to the technical field of big data, in particular to a big data processing method and system.

Background

Big data is a data set that cannot be captured, managed and processed by conventional software tools within a given time frame; it is a massive, fast-growing and diversified information asset that yields stronger decision-making power, insight and process-optimization capability only under new processing modes. With the rapid development of science and technology, massive amounts of data are generated every day in every industry, and how to extract useful data from this mass and obtain the required results through analysis and processing is the most basic problem that existing big data processing has to solve.

At present, most big data processing methods classify the collected data, analyze and process the classified data, and store the results in a database or on a server. A user who needs the data must first obtain the corresponding permissions for that database or server and then retrieve the data from it. Data processed in this way can only be obtained passively by the user, and in many cases the user does not know where the corresponding data can be obtained, so the data are used inefficiently. How to solve this problem therefore needs to be considered at the present stage.

Disclosure of Invention

The invention aims to overcome the shortcomings of the prior art by providing a big data processing method and system that remedy the deficiencies of conventional big data processing.

The purpose of the invention is realized by the following technical scheme: a big data processing method, the processing method comprising:

collecting data from a business system by means of memory exchange and converting the data into a standard format, thereby standardizing the data;

performing data cleaning and then data filtering on the data, and then performing data fusion by means of edge computing;

storing the fused data in virtualized storage and packaging it into corresponding data services for users.

The data fusion by means of edge computing comprises:

the data processing main node divides the data into a plurality of data blocks and transmits each data block to a distributed data node network, in which each distributed data sub-node receives and processes the data of one data block;

each distributed data sub-node analyzes and processes the data it receives and transmits the processed data back to the data processing main node through the distributed data node network.
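The split, process and return flow described above can be sketched as follows. This is only an illustration under stated assumptions: a local thread pool stands in for the distributed data node network, and the function names (`split_into_blocks`, `process_block`, `fuse`) and the count-and-sum analysis are hypothetical, since the specification does not define what each sub-node computes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(records, block_count):
    """Main node: divide the records into at most block_count contiguous blocks."""
    size = max(1, -(-len(records) // block_count))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def process_block(block):
    """Sub-node: analyze one block (hypothetical analysis: count and sum)."""
    return {"count": len(block), "total": sum(block)}


def fuse(records, block_count=4):
    """Main node: hand one block to each worker, then merge the partial results."""
    blocks = split_into_blocks(records, block_count)
    # Assumption: a thread pool stands in for the distributed data node network.
    with ThreadPoolExecutor(max_workers=block_count) as pool:
        partials = list(pool.map(process_block, blocks))
    return {
        "count": sum(p["count"] for p in partials),
        "total": sum(p["total"] for p in partials),
    }
```

Calling `fuse(list(range(10)), block_count=3)` splits the ten values into three blocks, processes each block independently, and merges the partial counts and totals at the main node.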

Performing data cleaning and then data filtering on the data comprises:

the data cleaning comprises, in sequence, missing value cleaning, format and content cleaning, logic error cleaning, cleaning of data that is not required, and relevance verification;

data filtering conditions are set as required to form a data filtering grid, and the cleaned data are filtered through the data filtering grid.
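A minimal sketch of the five cleaning stages applied in order, followed by a filtering grid modeled as a list of conditions that a record must all satisfy. The record fields (`id`, `amount`, `region`), the rules inside each stage and every function name are assumptions made for illustration; the specification names the stages but does not define their rules. The sketch also assumes that `amount` is a number or a numeric string.

```python
REQUIRED = ("id", "amount")        # hypothetical required fields
KEEP = ("id", "amount", "region")  # hypothetical fields the consumer needs


def clean_missing(rows):
    """Missing value cleaning: drop rows that lack a required field."""
    return [r for r in rows if all(r.get(f) is not None for f in REQUIRED)]


def clean_format(rows):
    """Format and content cleaning: normalize types, case and whitespace."""
    return [
        {**r, "amount": float(r["amount"]),
         "region": str(r.get("region") or "").strip().lower()}
        for r in rows
    ]


def clean_logic(rows):
    """Logic error cleaning: drop duplicate ids and negative amounts."""
    seen, kept = set(), []
    for r in rows:
        if r["id"] in seen or r["amount"] < 0:
            continue
        seen.add(r["id"])
        kept.append(r)
    return kept


def clean_unneeded(rows):
    """Cleaning of data that is not required: keep only the needed fields."""
    return [{k: r[k] for k in KEEP} for r in rows]


def verify_relevance(rows, known_regions):
    """Relevance verification: keep rows whose region is in a reference set."""
    return [r for r in rows if r["region"] in known_regions]


def clean(rows, known_regions):
    """Run the five cleaning stages in the order given in the text."""
    rows = clean_missing(rows)
    rows = clean_format(rows)
    rows = clean_logic(rows)
    rows = clean_unneeded(rows)
    return verify_relevance(rows, known_regions)


def filter_rows(rows, conditions):
    """Filtering grid: a row passes only if every condition holds."""
    return [r for r in rows if all(cond(r) for cond in conditions)]
```

A grid such as `[lambda r: r["amount"] >= 5]` is then applied with `filter_rows(clean(rows, {"north", "south"}), grid)`. Keeping each stage as a separate function makes the required order explicit and lets any one rule be replaced without touching the others.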

The data service comprises a data interface service and a data loading service;

the data interface service is a service formed as follows: the content, parameters and response instructions of the required service are designed in a back end, a response design engine generates the corresponding code from them, and the code is then packaged into the service;

the data loading service obtains the user's data system, packages the data into a form that conforms to that system, and then delivers the data directly to the user's data receiving end or data using end.

The user's data system includes MySQL, Oracle, Sybase, FoxPro, BigTable, CouchDB, FileMaker, PostgreSQL and the like.
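The data loading service can be pictured as writing records directly into a table of the user's own data system. The sketch below is an assumption-laden illustration: SQLite from the Python standard library stands in for the user's database, the function name `deliver` and the table layout are hypothetical, and the table and column names are assumed to be trusted identifiers (only the values are passed as bound parameters).

```python
import sqlite3


def deliver(records, connection, table):
    """Package records as rows of the user's table and write them directly.

    Assumption: `connection` is an open sqlite3 connection standing in for
    the user's data system; all records share the keys of the first record.
    """
    if not records:
        return 0
    columns = list(records[0])
    col_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    # Create the target table on first delivery (untyped columns in SQLite).
    connection.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_list})')
    connection.executemany(
        f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})',
        [tuple(r[c] for c in columns) for r in records],
    )
    connection.commit()
    return len(records)
```

With `conn = sqlite3.connect(":memory:")`, a call such as `deliver(rows, conn, "fused")` leaves the rows queryable in the user's own database, so the user reads the data locally instead of requesting it from the provider's server.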

A system based on the big data processing method comprises a data acquisition module, a data processing module, a data fusion module, a storage module and an encapsulation service module;

the data acquisition module is used for collecting data from the business system by means of memory exchange;

the data processing module is used for performing data cleaning and then data filtering on the data, and then converting the data into the standard format;

the data fusion module is used for performing edge computing on the data at the data sub-nodes of a distributed data node network and finally gathering the computed data;

the storage module is divided into virtualized storage and physical storage, the virtualized storage being used for storing the addresses that data paths point to and the physical storage being used for storing the data itself;

the encapsulation service module is used for encapsulating data in the form of an interface service and/or a data loading service.
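The two-part storage module can be illustrated with one store that holds the data itself and a second store that holds only the path pointing to that data. This is a hedged sketch: the class names `PhysicalStore` and `VirtualStore`, the JSON-file layout and the `demo` helper are all hypothetical and are not taken from the specification.

```python
import json
import tempfile
from pathlib import Path


class PhysicalStore:
    """Holds the data itself, one JSON file per data set (assumed layout)."""

    def __init__(self, root):
        self.root = Path(root)

    def write(self, name, payload):
        path = self.root / f"{name}.json"
        path.write_text(json.dumps(payload), encoding="utf-8")
        return path


class VirtualStore:
    """Holds only the address that each data set's path points to."""

    def __init__(self):
        self._pointers = {}

    def register(self, name, path):
        self._pointers[name] = str(path)

    def resolve(self, name):
        return self._pointers[name]

    def read(self, name):
        # Follow the stored pointer to the data held elsewhere.
        return json.loads(Path(self.resolve(name)).read_text(encoding="utf-8"))


def demo():
    """Write one data set, register its pointer, and read it back."""
    with tempfile.TemporaryDirectory() as root:
        physical = PhysicalStore(root)
        virtual = VirtualStore()
        path = physical.write("fused_sales", {"count": 10, "total": 45})
        virtual.register("fused_sales", path)
        return virtual.read("fused_sales")
```

Because consumers go through the pointer table, the underlying data can be relocated by updating a single registered path rather than every consumer.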

The data processing module comprises a data cleaning unit, a data filtering unit and a standardized conversion unit;

the data cleaning unit is used for performing, in sequence, missing value cleaning, format and content cleaning, logic error cleaning, cleaning of data that is not required, and relevance verification;

the data filtering unit is used for setting data filtering conditions as required to form a data filtering grid and for filtering the cleaned data through the data filtering grid;

the standardized conversion unit is used for converting the data processed by the data cleaning unit and the data filtering unit into the standard format.

The encapsulation service module comprises an interface service unit and a data loading service unit;

the interface service unit takes the content, parameters and response instructions of the required service as designed in a back end, generates the corresponding code through a response design engine, and then packages the code into the corresponding service;

the data loading service unit obtains the user's data system, encapsulates the data into a form that conforms to that system, and then delivers the data directly to the user's data receiving end or data using end.
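One way to picture the interface service unit is a builder that turns a declarative service design (content, parameters, response fields) into a callable service. The specification does not disclose how the response design engine generates code, so the sketch below substitutes a closure built from a dictionary for real code generation; `build_service` and the keys `params` and `response` are hypothetical.

```python
def build_service(spec, dataset):
    """Build a query handler from a declarative spec.

    Assumption: spec["params"] lists the query parameters the service
    accepts and spec["response"] lists the fields returned per record.
    """
    allowed = set(spec["params"])
    fields = list(spec["response"])

    def handler(**query):
        unknown = set(query) - allowed
        if unknown:
            raise ValueError(f"unsupported parameters: {sorted(unknown)}")
        # Select the records that match every supplied parameter.
        rows = [r for r in dataset if all(r.get(k) == v for k, v in query.items())]
        # Shape the response according to the designed field list.
        return [{f: r.get(f) for f in fields} for r in rows]

    return handler
```

For example, a spec of `{"params": ["region"], "response": ["id", "amount"]}` yields a handler that can be called as `service(region="north")`; a new service is then a matter of writing a new spec rather than a new docking interface.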

The invention has the following advantages: in the big data processing method and system, data are packaged directly into corresponding services through the response design engine in the interface service, so that no dedicated data docking interface needs to be developed and no dedicated database needs to be built; through the data loading service, data are packaged directly in the form required by the user's data system and actively delivered to the user, so that the user no longer has to obtain permissions and then access a database or server through a protocol or interface, which improves the utilization rate of the data.

Drawings

FIG. 1 is a schematic flow diagram of the method of the present invention;

FIG. 2 is a schematic diagram of the system of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the detailed description of the embodiments of the present application provided below in connection with the appended drawings is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application. The invention is further described below with reference to the accompanying drawings.

Example 1

As shown in fig. 1, an embodiment of the present invention relates to a big data processing method, which specifically includes the following steps:

S1, collecting data from the business system by means of memory exchange and converting the data into a standard format, thereby standardizing the data; the standard formats into which the data may be converted include national standards, international standards, military standards and the like.

S2, performing data cleaning and then data filtering on the data, and then performing data fusion by means of edge computing;

Further, performing data cleaning and then data filtering on the data comprises the following steps:

The missing value cleaning comprises determining the range of missing values, removing unneeded fields (that is, deleting the data directly), filling in missing content, and re-fetching the data.

Determining the range of missing values means calculating the proportion of missing values for each field and then assigning a strategy according to that proportion and the importance of the field: for data of high importance and low missing rate, the missing values are filled in by calculation or estimated from experience or business knowledge; for data of high importance and high missing rate, the data are completed from other channels, or derived from other fields by calculation, or the field is removed and the removal is noted in the result; for data of low importance and low missing rate, the data are left unprocessed or simply filled in; for data of low importance and high missing rate, the field can be dropped directly.

Missing content may be filled in with a value inferred from business knowledge or experience, with a value calculated from the same indicator, or with a value calculated from different indicators.

Re-fetching means that, if an indicator is very important and its missing rate is high, the data can be acquired or collected again to supplement it.
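The choice of strategy from missing rate and field importance can be sketched as a small decision function. The 0.5 cut-off between a "low" and a "high" missing rate, the boolean importance flag and the strategy labels are assumptions made for illustration; the specification gives no numeric threshold.

```python
def missing_ratio(rows, field):
    """Proportion of rows in which the field is absent or None."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(field) is None) / len(rows)


def choose_strategy(important, ratio, high_ratio=0.5):
    """Map (importance, missing rate) to one of the four strategies.

    Assumption: high_ratio is a hypothetical threshold, not from the source.
    """
    high = ratio >= high_ratio
    if important and not high:
        return "fill_by_calculation_or_estimate"
    if important and high:
        return "complete_from_other_sources_or_remove_and_mark"
    if not high:
        return "leave_or_simple_fill"
    return "drop_field"


def plan(rows, importance):
    """Build a per-field cleaning plan from a field -> is_important mapping."""
    return {
        field: choose_strategy(is_important, missing_ratio(rows, field))
        for field, is_important in importance.items()
    }
```

Given rows in which an important `age` field is missing in one row out of four and an unimportant `nick` field is missing in three out of four, `plan` assigns the fill strategy to `age` and the drop strategy to `nick`.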

Further, the data fusion by means of edge computing includes:

S3, storing the fused data in virtualized storage and packaging it into corresponding data services for users.

The virtualized storage mainly stores the addresses that data paths point to; the data itself can also be stored by means of physical storage.

Further, the data service comprises a data interface service and a data loading service;

Example 2

As shown in fig. 2, another embodiment of the present invention relates to a system based on the big data processing method, which includes a data acquisition module, a data processing module, a data fusion module, a storage module and an encapsulation service module;

According to the invention, data are packaged directly into corresponding services through the response design engine in the interface service, so that no dedicated data docking interface needs to be developed and no dedicated database needs to be built; through the data loading service, data are packaged directly in the form required by the user's data system and actively delivered to the user, so that the user no longer has to obtain permissions and then access a database or server through a protocol or interface, which improves the utilization rate of the data.

The foregoing describes preferred embodiments of the invention. It is to be understood that the invention is not limited to the precise form disclosed herein, and that various other combinations, modifications and environments may be resorted to within the scope of the concept disclosed herein, either as described above or as apparent to those skilled in the relevant art. Modifications and variations may be effected by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A big data processing method is characterized in that: the processing method comprises the following steps:

2. The big data processing method according to claim 1, wherein: the data fusion processing by the edge calculation method comprises the following steps:

3. The big data processing method according to claim 1, wherein: the processing treatment for sequentially cleaning and filtering the data comprises the following steps:

4. The big data processing method according to claim 1, wherein: the data service comprises a data interface service and a data loading service;

5. A system based on big data processing method is characterized in that: the system comprises a data acquisition module, a data processing module, a data fusion module, a storage module and an encapsulation service module;

6. The big data processing method-based system according to claim 5, wherein: the data processing module comprises a data cleaning unit, a data filtering unit and a standardized conversion unit;

7. The big data processing method-based system according to claim 5, wherein: the encapsulation service module comprises an interface service unit and a data loading service unit;