CN112988710A - Big data processing method and system - Google Patents

Info

Publication number
CN112988710A
Authority
CN
China
Prior art keywords
data
service
cleaning
filtering
processing
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110289206.0A
Other languages
Chinese (zh)
Inventor
严涛
宋怡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Qingyunshang Information Technology Co ltd
Original Assignee
Chengdu Qingyunshang Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-03-18
Filing date
2021-03-18
Publication date
2021-06-18
Application filed by Chengdu Qingyunshang Information Technology Co ltd
Priority to CN202110289206.0A
Publication of CN112988710A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/604 Tools and structures for managing or administering access control systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Automation & Control Theory (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a big data processing method and system. The method comprises: collecting data from a business system by memory exchange and converting it into a standard format, thereby standardizing the data; cleaning and then filtering the data, and then fusing it by edge computing; and storing the fused data in virtualized storage and encapsulating it into corresponding data services for users. In the interface service, a response design engine encapsulates data directly into the corresponding service, so neither a dedicated data docking interface nor a dedicated database has to be developed. In the data loading service, data is encapsulated directly into the form required by the user's data system and actively delivered to the user, so the user no longer has to obtain permissions and then access a database or server through a protocol or interface, which improves data utilization.

Description

Big data processing method and system
Technical Field
The invention relates to the technical field of big data, in particular to a big data processing method and system.
Background
Big data refers to data sets that conventional software tools cannot capture, manage, and process within an acceptable time. It is a massive, fast-growing, and diverse information asset that yields stronger decision-making power, insight, and process-optimization capability only under new processing modes. With the rapid development of science and technology, every industry generates massive amounts of data each day. Extracting the useful data from that mass and obtaining the required results through analysis and processing is the most basic problem that big data processing has to solve.
At present, most big data processing methods classify the collected data, analyze and process the classified data, and store the results in a database or on a server. A user who needs the data must first obtain the corresponding permissions for that database or server and then retrieve the data from it. Data processed in this way can only be obtained passively, and users often do not know where the data they need can be found, so the data is used inefficiently. This is the problem that needs to be addressed.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a big data processing method and system that remedy the shortcomings of conventional big data processing.
This object is achieved by the following technical solution: a big data processing method comprising:
collecting data from a business system by memory exchange and converting the data into a standard format, thereby standardizing the data;
cleaning and then filtering the data, and then fusing the data by edge computing;
storing the fused data in virtualized storage and encapsulating the data into corresponding data services for users.
The data fusion by edge computing comprises:
the data processing master node divides the data into a plurality of data blocks and transmits each data block to a distributed data node network, in which each distributed data sub-node receives and processes the data of one data block;
each distributed data sub-node analyzes and processes the data it receives and returns the processed data to the data processing master node through the distributed data node network.
Cleaning and then filtering the data comprises:
data cleaning, which performs, in order, missing value cleaning, format and content cleaning, logic error cleaning, unneeded-data cleaning, and relevance verification;
setting data filtering conditions as required to form a data filtering grid, and passing the cleaned data through the data filtering grid.
The data services comprise a data interface service and a data loading service;
the data interface service is formed by designing the content, parameters, and response instructions of the required service in the background, generating the corresponding code through a response design engine, and then encapsulating that code into a service;
the data loading service identifies the user's data system, encapsulates the data into a form that conforms to that system, and then delivers the data directly to the user's data receiving end or data consuming end.
The user's data system may be, for example, MySQL, Oracle, Sybase, FoxPro, BigTable, CouchDB, FileMaker, or PostgreSQL.
A system based on the big data processing method comprises a data acquisition module, a data processing module, a data fusion module, a storage module, and an encapsulation service module;
the data acquisition module collects data from the business system by memory exchange;
the data processing module cleans and then filters the data and then converts it to the standard format;
the data fusion module performs edge computing on the data at the data sub-nodes of a distributed data node network and then aggregates the computed data;
the storage module is divided into virtualized storage and field storage: the virtualized storage stores the addresses that data paths point to, and the field storage stores the data itself;
the encapsulation service module encapsulates data in the form of an interface service and/or a data loading service.
The data processing module comprises a data cleaning unit, a data filtering unit, and a standardized conversion unit;
the data cleaning unit performs, in order, missing value cleaning, format and content cleaning, logic error cleaning, unneeded-data cleaning, and relevance verification;
the data filtering unit sets data filtering conditions as required to form a data filtering grid and passes the cleaned data through the data filtering grid;
the standardized conversion unit converts the data processed by the data cleaning unit and the data filtering unit to the standard format.
The encapsulation service module comprises an interface service unit and a data loading service unit;
the interface service unit takes the content, parameters, and response instructions of the required service as designed in the background, generates the corresponding code through a response design engine, and then encapsulates that code into the corresponding service;
the data loading service unit identifies the user's data system, encapsulates the data into a form that conforms to that system, and then delivers the data directly to the user's data receiving end or data consuming end.
The invention has the following advantages. In the interface service, a response design engine encapsulates data directly into the corresponding service, so neither a dedicated data docking interface nor a dedicated database has to be developed. In the data loading service, data is encapsulated directly into the form required by the user's data system and actively delivered to the user, so the user no longer has to obtain permissions and then access a database or server through a protocol or interface, which improves data utilization.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a schematic diagram of the system of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the detailed description of the embodiments of the present application provided below in connection with the appended drawings is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application. The invention is further described below with reference to the accompanying drawings.
Example 1
As shown in FIG. 1, an embodiment of the present invention relates to a big data processing method, which comprises the following steps:
S1. Collect data from the business system by memory exchange and convert the data into a standard format, thereby standardizing the data. The standard format may follow a national standard, an international standard, a military standard, or the like.
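The standardization in S1 can be illustrated with a minimal sketch. It assumes, purely for illustration, that records arrive as dictionaries, that the chosen standard for dates is ISO 8601 (an international standard), and that field names differ between source systems. `FIELD_MAP`, `DATE_FORMATS`, and the field names are hypothetical and do not come from the patent.

```python
from datetime import datetime

# Hypothetical mapping from source-system field names to standard names.
FIELD_MAP = {
    "cust_name": "customer_name",
    "CustomerName": "customer_name",
    "dt": "order_date",
    "OrderDate": "order_date",
}

# Hypothetical date layouts used by the source systems.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def to_iso_date(value):
    """Return the date as an ISO 8601 string, or None if no layout matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def standardize(record):
    """Rename fields to the standard names and normalize date values."""
    out = {}
    for key, value in record.items():
        name = FIELD_MAP.get(key, key)
        if name == "order_date" and isinstance(value, str):
            value = to_iso_date(value.strip())
        out[name] = value
    return out
```

A date that matches none of the layouts becomes `None`, which leaves it to the missing value cleaning of S2 rather than silently keeping a non-standard value.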
S2. Clean and then filter the data, and then fuse the data by edge computing.
Further, cleaning and then filtering the data comprises:
data cleaning, which performs, in order, missing value cleaning, format and content cleaning, logic error cleaning, unneeded-data cleaning, and relevance verification;
setting data filtering conditions as required to form a data filtering grid, and passing the cleaned data through the data filtering grid.
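The patent does not define the data filtering grid in detail. One plausible reading, sketched below under that assumption, is a set of named conditions that a cleaned record must satisfy in full to pass. The function names, the condition names, and the sample records are hypothetical.

```python
def build_filter_grid(conditions):
    """Combine named predicates into one filter; a record must pass all of them."""
    def passes(record):
        return all(check(record) for check in conditions.values())
    return passes


def apply_filter_grid(records, conditions):
    """Return only the records that satisfy every condition in the grid."""
    passes = build_filter_grid(conditions)
    return [r for r in records if passes(r)]


# Hypothetical filtering conditions, set according to requirements.
conditions = {
    "has_region": lambda r: r.get("region") is not None,
    "positive_amount": lambda r: r.get("amount", 0) > 0,
}

records = [
    {"region": "CN", "amount": 12.5},
    {"region": None, "amount": 3.0},
    {"region": "CN", "amount": -1.0},
]
kept = apply_filter_grid(records, conditions)
```

Keeping the conditions in a dictionary means a requirement can be added or removed by name without touching the filtering logic.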
Missing value cleaning comprises determining the missing value range, removing unneeded fields (that is, deleting the data directly), filling in missing content, and re-collecting data.
Determining the missing value range means calculating the proportion of missing values in each field and then assigning a strategy according to that proportion and the importance of the field: for data with high importance and a low missing rate, fill the values by calculation, or estimate them from experience or business knowledge; for data with high importance and a high missing rate, obtain the data from other channels to complete it, derive it from other fields by calculation, or remove the field and note this in the results; for data with low importance and a low missing rate, leave the data unprocessed or fill it simply; for data with low importance and a high missing rate, remove the field directly.
Missing content may be filled with a value inferred from business knowledge or experience, with a value calculated from the same indicator, or with a value calculated from different indicators.
Re-collecting data means that, if an indicator is very important and its missing rate is high, the data can be acquired or collected again to fill the gap.
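The four strategies described above amount to a two-by-two decision table over field importance and missing rate. The following is a minimal sketch of that table. The patent gives no numeric boundary between a low and a high missing rate, so the `threshold` of 0.5, the boolean importance flag, and the strategy labels are assumptions made for illustration.

```python
def missing_ratio(records, field):
    """Fraction of records in which `field` is absent or None."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)


def choose_strategy(ratio, important, threshold=0.5):
    """Map (importance, missing ratio) to one of the four strategies."""
    high_missing = ratio >= threshold  # assumed cut-off, not from the patent
    if important and not high_missing:
        return "fill_by_calculation_or_experience"
    if important and high_missing:
        return "refetch_or_derive_or_drop_and_flag"
    if not high_missing:
        return "leave_or_simple_fill"
    return "drop_field"


def plan_cleaning(records, importance):
    """Return {field: (missing ratio, strategy)} for each field in `importance`."""
    plan = {}
    for field, important in importance.items():
        ratio = missing_ratio(records, field)
        plan[field] = (ratio, choose_strategy(ratio, important))
    return plan
```

Separating the plan from its execution lets the chosen strategy for each field be reviewed, for example by someone with the relevant business knowledge, before any value is filled or any field is removed.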
Further, the data fusion by edge computing comprises:
the data processing master node divides the data into a plurality of data blocks and transmits each data block to a distributed data node network, in which each distributed data sub-node receives and processes the data of one data block;
each distributed data sub-node analyzes and processes the data it receives and returns the processed data to the data processing master node through the distributed data node network.
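The split, process, and return pattern described above can be sketched as follows. This is only an illustration: worker threads stand in for the distributed data sub-nodes, no real network is involved, and the per-block analysis (a count and a sum) is a placeholder chosen because partial results of that kind can be merged exactly. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(data, block_size):
    """Master node: divide the data into consecutive blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def sub_node_process(block):
    """Stand-in for the analysis on one sub-node: a partial count and sum."""
    return {"count": len(block), "total": sum(block)}


def fuse(partials):
    """Master node: merge the partial results returned by the sub-nodes."""
    count = sum(p["count"] for p in partials)
    total = sum(p["total"] for p in partials)
    return {"count": count, "total": total, "mean": total / count if count else None}


def master_node(data, block_size=3, workers=4):
    """Send one block to each worker and fuse what comes back."""
    blocks = split_into_blocks(data, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sub_node_process, blocks))
    return fuse(partials)
```

The mean is computed from the merged count and sum rather than by averaging per-block means, which would give a wrong answer when blocks differ in size.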
S3. Store the fused data in virtualized storage and encapsulate the fused data into corresponding data services for users.
The virtualized storage mainly stores the addresses that data paths point to; the data itself can also be stored in physical storage (the field storage of Example 2).
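The division between storing path addresses and storing the data itself can be sketched as a pointer layer over a data layer. The classes below are an in-memory illustration under that reading. `PhysicalStore`, `VirtualStore`, and their methods are hypothetical names, not components named in the patent.

```python
class PhysicalStore:
    """Holds the data itself, keyed by a storage address (in-memory stand-in)."""

    def __init__(self):
        self._blobs = {}

    def write(self, address, data):
        self._blobs[address] = data

    def read(self, address):
        return self._blobs[address]


class VirtualStore:
    """Holds only logical name -> (store, address) pointers, never the data."""

    def __init__(self):
        self._pointers = {}

    def register(self, name, store, address):
        self._pointers[name] = (store, address)

    def address_of(self, name):
        return self._pointers[name][1]

    def resolve(self, name):
        """Follow the pointer and read the data from the store that holds it."""
        store, address = self._pointers[name]
        return store.read(address)


# Usage: the fused result lives in the physical store; users look it up by name.
physical = PhysicalStore()
physical.write("/blocks/0001", {"count": 7, "total": 28})
virtual = VirtualStore()
virtual.register("sales_summary", physical, "/blocks/0001")
summary = virtual.resolve("sales_summary")
```

Because consumers only hold the logical name, the data can be moved to another address or store by re-registering the pointer.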
Further, the data services comprise a data interface service and a data loading service;
the data interface service is formed by designing the content, parameters, and response instructions of the required service in the background, generating the corresponding code through a response design engine, and then encapsulating that code into a service;
the data loading service identifies the user's data system, encapsulates the data into a form that conforms to that system, and then delivers the data directly to the user's data receiving end or data consuming end.
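As a rough illustration of the data loading service, the sketch below shapes records into rows and writes them straight into a database on the user's side. SQLite is used only because it ships with Python; it is an assumption standing in for the user's data system, and the function name, table name, and columns are hypothetical.

```python
import sqlite3


def load_into_user_system(conn, table, records):
    """Create the table if needed and insert the records; return rows inserted.

    Table and column names are interpolated into SQL, so they must come from
    trusted configuration, never from untrusted input. Values are bound as
    parameters.
    """
    if not records:
        return 0
    columns = list(records[0].keys())
    col_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_list})')
    conn.executemany(
        f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})',
        [tuple(r[c] for c in columns) for r in records],
    )
    conn.commit()
    return len(records)
```

In this arrangement the provider pushes the data into the user's system, so the user queries a local table instead of requesting access to the provider's database or server.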
Example 2
As shown in FIG. 2, another embodiment of the present invention relates to a system based on the big data processing method, which comprises a data acquisition module, a data processing module, a data fusion module, a storage module, and an encapsulation service module;
the data acquisition module collects data from the business system by memory exchange;
the data processing module cleans and then filters the data and then converts it to the standard format;
the data fusion module performs edge computing on the data at the data sub-nodes of a distributed data node network and then aggregates the computed data;
the storage module is divided into virtualized storage and field storage: the virtualized storage stores the addresses that data paths point to, and the field storage stores the data itself;
the encapsulation service module encapsulates data in the form of an interface service and/or a data loading service.
The data processing module comprises a data cleaning unit, a data filtering unit, and a standardized conversion unit;
the data cleaning unit performs, in order, missing value cleaning, format and content cleaning, logic error cleaning, unneeded-data cleaning, and relevance verification;
the data filtering unit sets data filtering conditions as required to form a data filtering grid and passes the cleaned data through the data filtering grid;
the standardized conversion unit converts the data processed by the data cleaning unit and the data filtering unit to the standard format.
The encapsulation service module comprises an interface service unit and a data loading service unit;
the interface service unit takes the content, parameters, and response instructions of the required service as designed in the background, generates the corresponding code through a response design engine, and then encapsulates that code into the corresponding service;
the data loading service unit identifies the user's data system, encapsulates the data into a form that conforms to that system, and then delivers the data directly to the user's data receiving end or data consuming end.
According to the invention, a response design engine in the interface service encapsulates data directly into the corresponding service, so neither a dedicated data docking interface nor a dedicated database has to be developed. In the data loading service, data is encapsulated directly into the form required by the user's data system and actively delivered to the user, so the user no longer has to obtain permissions and then access a database or server through a protocol or interface, which improves data utilization.
The foregoing describes the preferred embodiments of the invention. It is to be understood that the invention is not limited to the precise form disclosed herein, and that various other combinations, modifications, and environments may be used within the scope of the concept disclosed herein, either as described above or as apparent to those skilled in the relevant art. Modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (7)

1. A big data processing method, characterized in that the processing method comprises:
collecting data from a business system by memory exchange and converting the data into a standard format, thereby standardizing the data;
cleaning and then filtering the data, and then fusing the data by edge computing;
storing the fused data in virtualized storage and encapsulating the data into corresponding data services for users.
2. The big data processing method according to claim 1, characterized in that the data fusion by edge computing comprises:
the data processing master node divides the data into a plurality of data blocks and transmits each data block to a distributed data node network, in which each distributed data sub-node receives and processes the data of one data block;
each distributed data sub-node analyzes and processes the data it receives and returns the processed data to the data processing master node through the distributed data node network.
3. The big data processing method according to claim 1, characterized in that cleaning and then filtering the data comprises:
data cleaning, which performs, in order, missing value cleaning, format and content cleaning, logic error cleaning, unneeded-data cleaning, and relevance verification;
setting data filtering conditions as required to form a data filtering grid, and passing the cleaned data through the data filtering grid.
4. The big data processing method according to claim 1, characterized in that the data services comprise a data interface service and a data loading service;
the data interface service is formed by designing the content, parameters, and response instructions of the required service in the background, generating the corresponding code through a response design engine, and then encapsulating that code into a service;
the data loading service identifies the user's data system, encapsulates the data into a form that conforms to that system, and then delivers the data directly to the user's data receiving end or data consuming end.
5. A system based on a big data processing method, characterized in that the system comprises a data acquisition module, a data processing module, a data fusion module, a storage module, and an encapsulation service module;
the data acquisition module collects data from the business system by memory exchange;
the data processing module cleans and then filters the data and then converts it to the standard format;
the data fusion module performs edge computing on the data at the data sub-nodes of a distributed data node network and then aggregates the computed data;
the storage module is divided into virtualized storage and field storage: the virtualized storage stores the addresses that data paths point to, and the field storage stores the data itself;
the encapsulation service module encapsulates data in the form of an interface service and/or a data loading service.
6. The system based on a big data processing method according to claim 5, characterized in that the data processing module comprises a data cleaning unit, a data filtering unit, and a standardized conversion unit;
the data cleaning unit performs, in order, missing value cleaning, format and content cleaning, logic error cleaning, unneeded-data cleaning, and relevance verification;
the data filtering unit sets data filtering conditions as required to form a data filtering grid and passes the cleaned data through the data filtering grid;
the standardized conversion unit converts the data processed by the data cleaning unit and the data filtering unit to the standard format.
7. The system based on a big data processing method according to claim 5, characterized in that the encapsulation service module comprises an interface service unit and a data loading service unit;
the interface service unit takes the content, parameters, and response instructions of the required service as designed in the background, generates the corresponding code through a response design engine, and then encapsulates that code into the corresponding service;
the data loading service unit identifies the user's data system, encapsulates the data into a form that conforms to that system, and then delivers the data directly to the user's data receiving end or data consuming end.
CN202110289206.0A 2021-03-18 2021-03-18 Big data processing method and system Pending CN112988710A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110289206.0A CN112988710A (en) 2021-03-18 2021-03-18 Big data processing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110289206.0A CN112988710A (en) 2021-03-18 2021-03-18 Big data processing method and system

Publications (1)

Publication Number Publication Date
CN112988710A true CN112988710A (en) 2021-06-18

Family

ID=76332986

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110289206.0A Pending CN112988710A (en) 2021-03-18 2021-03-18 Big data processing method and system

Country Status (1)

Country Link
CN (1) CN112988710A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN113709261A (en) * | 2021-10-29 | 2021-11-26 | 深圳市沃易科技有限公司 | System for fusing multi-channel data chain processing
CN113821503A (en) * | 2021-09-23 | 2021-12-21 | 北京金山云网络技术有限公司 | Medical data processing method and device and edge server

Citations (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN109361532A (en) * | 2018-09-11 | 2019-02-19 | 上海天旦网络科技发展有限公司 | The high-availability system and method and computer readable storage medium of network data analysis
CN109862087A (en) * | 2019-01-23 | 2019-06-07 | 深圳市康拓普信息技术有限公司 | Industrial Internet of things system and its data processing method based on edge calculations
CN109885566A (en) * | 2019-02-25 | 2019-06-14 | 南京世界村云数据产业集团有限公司 | A kind of acquisition of data and edge calculations system
CN110336703A (en) * | 2019-07-12 | 2019-10-15 | 河海大学常州校区 | Industrial big data based on edge calculations monitors system
CN111459665A (en) * | 2020-03-27 | 2020-07-28 | 重庆电政信息科技有限公司 | Distributed edge computing system and distributed edge computing method
CN111478960A (en) * | 2020-04-03 | 2020-07-31 | 河海大学常州校区 | Data acquisition and edge calculation system based on edge calculation

Similar Documents

Publication Publication Date Title
CN110445637B (en) Event monitoring method, system, computer device and storage medium
WO2020211299A1 (en) Data cleansing method
CN112988710A (en) Big data processing method and system
US11188443B2 (en) Method, apparatus and system for processing log data
CN111400288A (en) Data quality inspection method and system
CN111680108B (en) Data storage method and device and data acquisition method and device
CN115269515A (en) Processing method for searching specified target document data
CN112883001A (en) Data processing method, device and medium based on marketing and distribution through data visualization platform
CN115687478A (en) Standardized service data sharing system and method
CN115514784A (en) Multisource data acquisition middle platform based on Internet of things
CN114138877A (en) Method, device and equipment for realizing theme data service based on micro-service architecture
CN110932393B (en) Substation information protection master station system and data initialization method thereof
CN112860412A (en) Service data processing method and device, electronic equipment and storage medium
CN115576998B (en) Power distribution network data integration method and system based on multi-dimensional information fusion
CN114817256A (en) Quick unified storage system of thing networking
CN114357082A (en) Cloud computing-based big data analysis method and system
CN111813873A (en) Method and system for automatically discovering entity relationship
CN117453493B (en) GPU computing power cluster monitoring method and system for large-scale multi-data center
CN112486992B (en) Data storage method and system
CN111695034B (en) Internet asset monitoring management system
CN118095237A (en) Table generation method, electronic device and storage medium
CN117453493A (en) GPU computing power cluster monitoring method and system for large-scale multi-data center
CN106354813A (en) Mass data dimension user positioning method
CN116048810A (en) Mixed calculation force network identification method and equipment based on three-dimensional view
CN117873691A (en) Data processing method, device, equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20210618