CN108920410A - Big data processing device and method - Google Patents

Big data processing device and method

Info

Publication number
CN108920410A
Authority
CN
China
Prior art keywords
data
data processing
processor
preliminary treatment
treatment device
Prior art date
Legal status
Pending
Application number
CN201810648667.0A
Other languages
Chinese (zh)
Inventor
王旭生
梁娜
王健
邱志祺
安逸
Current Assignee
North China University of Science and Technology
Original Assignee
North China University of Science and Technology
Priority date
2018-06-22
Filing date
2018-06-22
Publication date
2018-11-30
Application filed by North China University of Science and Technology
Priority to CN201810648667.0A
Publication of CN108920410A
Pending (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake

Abstract

The invention discloses a big data processing device and method. The device includes a primary processor, preliminary processors and secondary processors. The primary processor is provided with a main data processing module and is connected to multiple preliminary processors; each preliminary processor is provided with a preliminary data processing module and is connected to multiple secondary processors. A preliminary processor obtains the target data to be processed, a secondary processor obtains the pre-established key data-shape indicators, which include the data's own attribute information, and the data shape of the target data is determined according to those indicators. Data exchange is completed promptly, which eases the burden on enterprise systems that process massive amounts of data at the same time; data processing is accurate and timely, which improves work efficiency.

Description

Big data processing device and method
Technical field
The present invention relates to the field of data processing, and specifically to a big data processing device and method.
Background art
With the arrival of the cloud era, big data has attracted more and more attention. Analysts consider that "big data" usually describes the large amounts of unstructured and semi-structured data a company creates, data that would cost too much time and money to load into a relational database for analysis. Big data analysis is often linked with cloud computing, because real-time analysis of large data sets requires frameworks such as MapReduce to distribute the work across tens, hundreds or even thousands of computers.
Big data requires special technologies to process large amounts of data efficiently within a tolerable elapsed time. Technologies suited to big data include massively parallel processing (MPP) databases, data-mining grids, distributed file systems, distributed databases, cloud computing platforms, the Internet and scalable storage systems.
Summary of the invention
The purpose of the present invention is to provide a big data processing device and method that solve the problems raised in the background art above.
To achieve the above object, the present invention provides the following technical solutions:
A big data processing device includes a primary processor, preliminary processors and secondary processors. The primary processor is provided with a main data processing module and is connected to multiple preliminary processors. Each preliminary processor is provided with a preliminary data processing module and is connected to multiple secondary processors, each of which is provided with a secondary data processing module. The primary processor, the preliminary processors and the secondary processors are each provided with a data detection module, which detects anomalies in input data.
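The three-level arrangement above is essentially a tree: one primary processor fans out to several preliminary processors, each of which fans out to several secondary processors, and every node runs a data detection module. The following Java sketch models that tree in memory to make the data flow concrete. It is not the patented implementation: the class and method names (`ProcessorNode`, `feed`, `process`) are hypothetical, "processing" is reduced to summing numbers, and the detection rule (treating null or NaN inputs as exceptions) is an assumption, since the text does not define what counts as an input exception.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of the hierarchy:
// primary processor -> preliminary processors -> secondary processors.
class ProcessorNode {
    private final String name;
    private final List<ProcessorNode> children = new ArrayList<>();
    private final List<Double> localInput = new ArrayList<>(); // fed by an input device (leaves only)
    private int anomalies = 0;

    ProcessorNode(String name) {
        this.name = name;
    }

    // Connects a new lower-level processor and returns it.
    ProcessorNode addChild(String childName) {
        ProcessorNode child = new ProcessorNode(childName);
        children.add(child);
        return child;
    }

    void feed(Double value) {
        localInput.add(value);
    }

    // Assumed detection rule: a null or NaN value is an input exception.
    private static boolean isAnomalous(Double value) {
        return value == null || value.isNaN();
    }

    // Calls go down, results come back up: leaves sum their own input,
    // inner nodes sum what their children return. Anomalies are skipped and counted.
    double process() {
        List<Double> inputs = new ArrayList<>();
        if (children.isEmpty()) {
            inputs.addAll(localInput);
        } else {
            for (ProcessorNode child : children) {
                inputs.add(child.process());
            }
        }
        double total = 0;
        for (Double v : inputs) {
            if (isAnomalous(v)) {
                anomalies++;
            } else {
                total += v;
            }
        }
        return total;
    }

    // Anomalies detected in this node and in every node below it.
    int totalAnomalies() {
        int n = anomalies;
        for (ProcessorNode child : children) {
            n += child.totalAnomalies();
        }
        return n;
    }

    @Override
    public String toString() {
        return name + children;
    }

    public static void main(String[] args) {
        ProcessorNode primary = new ProcessorNode("primary");
        for (int p = 0; p < 2; p++) {
            ProcessorNode preliminary = primary.addChild("preliminary-" + p);
            for (int s = 0; s < 3; s++) {
                ProcessorNode secondary = preliminary.addChild("secondary-" + p + "-" + s);
                secondary.feed(1.0);
                secondary.feed(s == 0 ? null : 2.0);
            }
        }
        System.out.println(primary.process() + " " + primary.totalAnomalies()); // 14.0 2
    }
}
```

Calls travel down the tree and results travel back up, which loosely mirrors the bidirectional transmission between levels; a real deployment would replace the method calls with messages between machines.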
A big data processing method includes the following steps:
Step S1: a preliminary processor obtains the target data to be processed.
Step S2: a secondary processor obtains the pre-established key data-shape indicators, which include the data's own attribute information.
In one embodiment, the pre-established key data-shape indicators may include, but are not limited to, one or more of the following:
the data volume of a predetermined data table; the data volume of one or more partitions of the predetermined data table; the data volume of a specified composite primary key; the number of distinct values of a field; the number of NULL values of a field; the maximum value of a field; the minimum value of a field; the maximum and minimum lengths of the field's values; the calculated result for a specific field; the number of values of a field that are 0 and their percentage of the data volume of the whole table; and the number of values of a field that are NULL and their percentage of the data volume of the whole table.
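The table-level indicators in this list (data volume of the whole table, of a partition, and of a specified composite primary key) can be illustrated with a short Java sketch. Rows are modeled as column-to-value maps, the partition column name `dt` and the key columns `shop_id`/`sku_id` are hypothetical, and "data volume of a composite primary key" is interpreted here as the number of distinct key combinations, which is an assumption.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical table-level indicators over rows modeled as column -> value maps.
class TableLevelMetrics {
    // Builds a row from alternating column names and values.
    static Map<String, Object> row(Object... kv) {
        Map<String, Object> r = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            r.put((String) kv[i], kv[i + 1]);
        }
        return r;
    }

    // Data volume of the whole table.
    static long rowCount(List<Map<String, Object>> rows) {
        return rows.size();
    }

    // Data volume per partition, keyed by the value of the assumed partition column.
    static Map<Object, Long> rowsPerPartition(List<Map<String, Object>> rows, String partitionColumn) {
        Map<Object, Long> counts = new LinkedHashMap<>();
        for (Map<String, Object> r : rows) {
            counts.merge(r.get(partitionColumn), 1L, Long::sum);
        }
        return counts;
    }

    // Number of distinct combinations of the given key columns.
    static long distinctCompositeKeys(List<Map<String, Object>> rows, List<String> keyColumns) {
        Set<List<Object>> seen = new HashSet<>();
        for (Map<String, Object> r : rows) {
            List<Object> key = new ArrayList<>();
            for (String column : keyColumns) {
                key.add(r.get(column));
            }
            seen.add(key);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(
                row("dt", "2018-06-21", "shop_id", "A", "sku_id", 1),
                row("dt", "2018-06-21", "shop_id", "A", "sku_id", 1),
                row("dt", "2018-06-22", "shop_id", "B", "sku_id", 2));
        System.out.println(rowCount(rows));                                            // 3
        System.out.println(rowsPerPartition(rows, "dt"));                              // {2018-06-21=2, 2018-06-22=1}
        System.out.println(distinctCompositeKeys(rows, List.of("shop_id", "sku_id"))); // 2
    }
}
```

In the sample data, 3 rows but only 2 distinct (`shop_id`, `sku_id`) combinations point to a duplicated key, which is the kind of problem a data-shape report is meant to expose.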
Before the data is processed, the key data-shape indicators to be calculated must be defined. At the whole-table level, they mainly include the data volume of the whole table or of a single partition.
The data volume of a specified composite primary key.
Indicators calculated for each field:
Distinct count: the number of values remaining after the field is deduplicated;
NULL count: the number of values of the field that are null;
Maximum value: the maximum value of the field; for non-numeric fields (string types, etc.), it is computed according to the default logic of the max function;
Minimum value: the minimum value of the field; for non-numeric fields (string types, etc.), it is computed according to the default logic of the min function;
Maximum length: the maximum length among the values of the field;
Maximum-length example: one of the field values that has the maximum length;
Minimum length: the minimum length among the values of the field;
Minimum-length example: one of the field values that has the minimum length;
Enumeration value counts: for a specified field, such as gender, the result might be Male: 1000; Female: 20000;
Zero-value count: the number of values of the field that are 0;
Zero-value ratio: zero-value count / data volume of the whole table;
Null-value ratio: null-value count / data volume of the whole table.
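The per-field indicators listed above can all be computed in a single pass over a column. The Java sketch below is one possible reading, with assumptions labeled in the comments: nulls are counted separately and excluded from the other indicators, length is measured on each value's string form, and non-numeric values are ordered as strings, which only approximates a database's default `max`/`min` logic. The class name `FieldProfile` is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical per-field profile computing the listed indicators in one pass.
class FieldProfile {
    long total, nulls, distinct, zeros;
    Object max, min;
    int maxLength = -1, minLength = -1;
    String maxLengthExample, minLengthExample;

    static FieldProfile of(List<?> values) {
        FieldProfile p = new FieldProfile();
        Set<Object> seen = new HashSet<>();
        for (Object v : values) {
            p.total++;
            if (v == null) {               // assumption: nulls only feed the NULL count
                p.nulls++;
                continue;
            }
            seen.add(v);
            if (v instanceof Number && ((Number) v).doubleValue() == 0.0) {
                p.zeros++;
            }
            if (p.max == null || compare(v, p.max) > 0) p.max = v;
            if (p.min == null || compare(v, p.min) < 0) p.min = v;
            String s = String.valueOf(v);  // assumption: length of the string form
            if (p.maxLength < 0 || s.length() > p.maxLength) {
                p.maxLength = s.length();
                p.maxLengthExample = s;
            }
            if (p.minLength < 0 || s.length() < p.minLength) {
                p.minLength = s.length();
                p.minLengthExample = s;
            }
        }
        p.distinct = seen.size();
        return p;
    }

    // Numbers compare numerically; anything else falls back to string order,
    // an approximation of a database's default max/min behavior.
    static int compare(Object a, Object b) {
        if (a instanceof Number && b instanceof Number) {
            return Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());
        }
        return String.valueOf(a).compareTo(String.valueOf(b));
    }

    // Ratios use the whole-table row count as the denominator, as in the text.
    double zeroRatio() { return total == 0 ? 0.0 : (double) zeros / total; }
    double nullRatio() { return total == 0 ? 0.0 : (double) nulls / total; }

    public static void main(String[] args) {
        FieldProfile p = of(Arrays.asList(3, 0, null, 7, 0));
        System.out.println(p.distinct + " " + p.nulls + " " + p.max + " " + p.min); // 3 1 7 0
        System.out.println(p.zeroRatio() + " " + p.nullRatio());                     // 0.4 0.2
    }
}
```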
Step S3: the primary processor 1 determines the data shape of the target data to be processed according to the pre-established key data-shape indicators.
The data shape describes what the data in a table looks like once the table has been retrieved. A data-shape report is essentially an aggregation of basic checkpoints over the table's data; from these data-shape indicators, problems in the table's data can be seen at a glance. Based on this idea, a platform can be built quickly in Java: the user only fills in some configuration and obtains a data-shape report for the target data. The implementation cost is low; once the basic idea is mastered, it can be implemented quickly in Java or SQL.
With the above method, the embodiment of the present invention obtains the target data to be processed and the pre-established key data-shape indicators, which include the data's own attribute information, and determines the data shape of the target data according to those indicators. Because the key indicators reflect the data shape of a given portion of data, this technical solution reflects the data shape simply, quickly and completely.
In one embodiment, the method may further include step S4:
Step S4: write the target data to be processed into a predetermined data table.
In this embodiment, the target data to be processed is handled in units of tables. For example, if the data spans several tables, or only part of the data is needed, the data to be processed is written into a specified table. In short, placing the target data in a single data table makes the data convenient to analyze and process.
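Step S4 can be pictured as staging: rows from one or more source tables, optionally filtered down to the part of interest, are copied into a single predetermined table that later steps analyze. The Java sketch below assumes in-memory tables of column-to-value maps; `StageTargetData`, `writeInto`, and the `amount` filter are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of step S4: gather the target rows into one predetermined table.
class StageTargetData {
    // Builds a row from alternating column names and values.
    static Map<String, Object> row(Object... kv) {
        Map<String, Object> r = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            r.put((String) kv[i], kv[i + 1]);
        }
        return r;
    }

    // Copies every row that passes the filter, from each source table, into the target table.
    static List<Map<String, Object>> writeInto(List<Map<String, Object>> targetTable,
                                               Predicate<Map<String, Object>> filter,
                                               List<List<Map<String, Object>>> sourceTables) {
        for (List<Map<String, Object>> table : sourceTables) {
            for (Map<String, Object> r : table) {
                if (filter.test(r)) {
                    targetTable.add(new LinkedHashMap<>(r)); // copy so the staged table is independent
                }
            }
        }
        return targetTable;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> ordersA = List.of(row("id", 1, "amount", 5), row("id", 2, "amount", 50));
        List<Map<String, Object>> ordersB = List.of(row("id", 3, "amount", 80));
        List<Map<String, Object>> staged = writeInto(new ArrayList<>(),
                r -> (Integer) r.get("amount") > 10, List.of(ordersA, ordersB));
        System.out.println(staged); // [{id=2, amount=50}, {id=3, amount=80}]
    }
}
```

Against a relational database the same step would typically be a single `INSERT INTO ... SELECT ...` statement.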
As a further solution of the present invention, an input device is connected to the secondary processor, and the input device is provided with a data input module.
As a further solution of the present invention, the secondary processor is provided with a secondary data processing module.
As a further solution of the present invention, transmission between the primary processor and the preliminary processors, and between the preliminary processors and the secondary processors, is bidirectional.
As a further solution of the present invention, step S3 may include the following steps S31 to S33:
Step S31: receive the input table name of the predetermined data table;
Step S32: clean the predetermined data table according to the pre-established key data-shape indicators to obtain the data shape of the target data in the predetermined data table;
Step S33: receive the input partition of the predetermined data table to be processed and/or the input one or more fields whose enumeration values are to be calculated.
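Steps S31 and S33 amount to reading a small task configuration: a required table name plus an optional partition and optional enumeration fields. The Java sketch below parses such a configuration with the standard `java.util.Properties` class; the key names `table`, `partition` and `enumFields` and the class name `ProfilingTaskConfig` are assumptions, not part of the described platform.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Hypothetical task configuration covering S31 (table name) and S33 (partition, enumeration fields).
class ProfilingTaskConfig {
    final String tableName;        // S31: required
    final String partition;        // S33: optional; null means the whole table
    final List<String> enumFields; // S33: optional fields whose enumeration values are counted

    private ProfilingTaskConfig(String tableName, String partition, List<String> enumFields) {
        this.tableName = tableName;
        this.partition = partition;
        this.enumFields = Collections.unmodifiableList(enumFields);
    }

    // Parses "key=value" lines using the assumed keys table, partition and enumFields.
    static ProfilingTaskConfig parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String table = props.getProperty("table", "").trim();
        if (table.isEmpty()) {
            throw new IllegalArgumentException("S31: a table name is required");
        }
        String partition = props.getProperty("partition");
        List<String> fields = new ArrayList<>();
        for (String f : props.getProperty("enumFields", "").split(",")) {
            if (!f.trim().isEmpty()) {
                fields.add(f.trim());
            }
        }
        return new ProfilingTaskConfig(table, partition == null ? null : partition.trim(), fields);
    }

    public static void main(String[] args) {
        ProfilingTaskConfig c = parse("table=orders\npartition=dt=2018-06-22\nenumFields=gender, status\n");
        System.out.println(c.tableName + " | " + c.partition + " | " + c.enumFields);
        // orders | dt=2018-06-22 | [gender, status]
    }
}
```

`Properties` splits each line at the first unescaped `=`, so a partition spec such as `dt=2018-06-22` survives intact as the value.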
As a further solution of the present invention, step S3 operates as follows: the user creates a new task and fills in the corresponding information as prompted, for example the table name of the data table to be analyzed, the partition of the predetermined data table to be processed, and one or more fields whose enumeration values are to be calculated. After filling these in, the user clicks the execute button to start analyzing the data in the table. A notification can be sent to the user when the task completes, and the user can also click the log button to view the task's progress in real time.
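The create/execute/notify/view-log workflow above maps naturally onto an asynchronous task. The Java sketch below uses `CompletableFuture` from the standard library: `execute` starts the analysis in the background, a thread-safe list records progress for a log view, and a callback stands in for notifying the user. `ProfilingTaskRunner` and the step names are hypothetical, and the steps only record progress instead of querying real data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical runner: execute() starts the analysis asynchronously, the progress log backs a
// "view log" button, and the notifier callback stands in for notifying the user on completion.
class ProfilingTaskRunner {
    private final List<String> log = new CopyOnWriteArrayList<>(); // safe to read while the task runs

    CompletableFuture<Void> execute(String tableName, List<String> steps, Consumer<String> notifier) {
        return CompletableFuture
                .runAsync(() -> {
                    for (int i = 0; i < steps.size(); i++) {
                        // A real step would query the table here; this sketch only records progress.
                        log.add(String.format("[%s] step %d/%d: %s",
                                tableName, i + 1, steps.size(), steps.get(i)));
                    }
                })
                .thenRun(() -> notifier.accept("Task on " + tableName + " finished"));
    }

    // Snapshot of the progress log at the moment of the call.
    List<String> progressLog() {
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        ProfilingTaskRunner runner = new ProfilingTaskRunner();
        runner.execute("orders", List.of("count rows", "profile fields"), System.out::println).join();
        runner.progressLog().forEach(System.out::println);
    }
}
```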
Compared with the prior art, the beneficial effects of the invention are as follows: a preliminary processor obtains the target data to be processed, a secondary processor obtains the pre-established key data-shape indicators, which include the data's own attribute information, and the data shape of the target data is determined according to those indicators. Because the key indicators reflect the data shape of a given portion of the data, data exchange is completed promptly, easing the burden on enterprise systems that process massive amounts of data at the same time. Data processing is accurate and timely, so an enterprise can understand and handle related issues promptly, which improves work efficiency.
Brief description of the drawings
Fig. 1 is a structural schematic diagram of the big data processing device.
In the figure: 1 - primary processor; 2 - preliminary processor; 3 - secondary processor.
Detailed description of the embodiments
The technical solution of this patent is explained in further detail below with reference to an embodiment.
Referring to Fig. 1, a big data processing device includes a primary processor 1, preliminary processors 2 and secondary processors 3. The primary processor 1 is provided with a main data processing module and is connected to multiple preliminary processors 2. Each preliminary processor 2 is provided with a preliminary data processing module and is connected to multiple secondary processors 3. Each secondary processor 3 is provided with a secondary data processing module and is connected to an input device, which is provided with a data input module. The primary processor 1, the preliminary processors 2 and the secondary processors 3 are each provided with a data detection module for detecting anomalies in input data. Transmission between the primary processor 1 and the preliminary processors 2, and between the preliminary processors 2 and the secondary processors 3, is bidirectional.
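The data detection module is described only as detecting anomalies in input data. One way to picture it is a set of per-field validity rules applied to each incoming record, as in the Java sketch below. The rules shown (`id` must be present, `age` must be an integer in [0, 150)) and the class name `DataDetectionModule` are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical data detection module: each field has a validity rule, and detect()
// reports which fields of an incoming record violate their rule.
class DataDetectionModule {
    private final Map<String, Predicate<Object>> rules = new LinkedHashMap<>();

    DataDetectionModule rule(String field, Predicate<Object> valid) {
        rules.put(field, valid);
        return this;
    }

    // Names of failing fields in rule order; an empty list means no anomaly was found.
    List<String> detect(Map<String, Object> record) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Predicate<Object>> e : rules.entrySet()) {
            if (!e.getValue().test(record.get(e.getKey()))) {
                failed.add(e.getKey());
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        DataDetectionModule module = new DataDetectionModule()
                .rule("id", v -> v != null)
                .rule("age", v -> v instanceof Integer && (Integer) v >= 0 && (Integer) v < 150);
        Map<String, Object> record = new HashMap<>();
        record.put("id", null);
        record.put("age", -1);
        System.out.println(module.detect(record)); // [id, age]
    }
}
```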
A big data processing method includes the following steps:
Step S1: the preliminary processor 2 obtains the target data to be processed.
Step S2: the secondary processor 3 obtains the pre-established key data-shape indicators, which include the data's own attribute information.
In one embodiment, the pre-established key data-shape indicators may include, but are not limited to, one or more of the following:
the data volume of a predetermined data table; the data volume of one or more partitions of the predetermined data table; the data volume of a specified composite primary key; the number of distinct values of a field; the number of NULL values of a field; the maximum value of a field; the minimum value of a field; the maximum and minimum lengths of the field's values; the calculated result for a specific field; the number of values of a field that are 0 and their percentage of the data volume of the whole table; and the number of values of a field that are NULL and their percentage of the data volume of the whole table.
Before the data is processed, the key data-shape indicators to be calculated must be defined. At the whole-table level, they mainly include the data volume of the whole table or of a single partition.
The data volume of a specified composite primary key.
Indicators calculated for each field:
Distinct count: the number of values remaining after the field is deduplicated;
NULL count: the number of values of the field that are null;
Maximum value: the maximum value of the field; for non-numeric fields (string types, etc.), it is computed according to the default logic of the max function;
Minimum value: the minimum value of the field; for non-numeric fields (string types, etc.), it is computed according to the default logic of the min function;
Maximum length: the maximum length among the values of the field;
Maximum-length example: one of the field values that has the maximum length;
Minimum length: the minimum length among the values of the field;
Minimum-length example: one of the field values that has the minimum length;
Enumeration value counts: for a specified field, such as gender, the result might be Male: 1000; Female: 20000;
Zero-value count: the number of values of the field that are 0;
Zero-value ratio: zero-value count / data volume of the whole table;
Null-value ratio: null-value count / data volume of the whole table.
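The enumeration-value indicator (for example, counts of male and female for a gender field) is a frequency count of distinct values. The Java sketch below computes it with the standard `Collectors.groupingBy` and `Collectors.counting` collectors; the class name `EnumValueCounts` is hypothetical, and reporting nulls under a `NULL` label is an assumption made because `groupingBy` does not accept null keys.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch of the enumeration-value indicator: how often each distinct value occurs.
class EnumValueCounts {
    // Nulls are reported under the label "NULL" because groupingBy rejects null keys.
    static Map<String, Long> count(List<String> values) {
        return values.stream()
                .collect(Collectors.groupingBy(v -> v == null ? "NULL" : v,
                        TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("male", "female", "female", null)));
        // {NULL=1, female=2, male=1}
    }
}
```

Dividing each count by the whole-table row count gives the distribution used later to judge whether the enumeration values are reasonable.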
Step S3: the primary processor 1 determines the data shape of the target data to be processed according to the pre-established key data-shape indicators.
The data shape describes what the data in a table looks like once the table has been retrieved. A data-shape report is essentially an aggregation of basic checkpoints over the table's data; from these data-shape indicators, problems in the table's data can be seen at a glance. Based on this idea, a platform can be built quickly in Java: the user only fills in some configuration and obtains a data-shape report for the target data. The implementation cost is low; once the basic idea is mastered, it can be implemented quickly in Java or SQL.
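Because most indicators are simple aggregates, the "implement it in Java or SQL" idea can be as small as a Java method that emits one SQL query per field. The sketch below is hypothetical (`ProfileSqlBuilder`, the column aliases and the sample table `orders` are not from the source). It assumes an engine that provides `LENGTH` (some use `LEN` or `CHAR_LENGTH`) and a numeric field for the `= 0` test; the ratios would then be computed as `zero_count / total_rows` and `null_count / total_rows`. The partition filter is pasted in verbatim, so it must come from trusted input.

```java
// Hypothetical generator of one aggregate profiling query per field.
class ProfileSqlBuilder {
    private static final String IDENT = "[A-Za-z_][A-Za-z0-9_]*";

    static String build(String table, String field, String partitionFilter) {
        // Identifiers are concatenated into SQL, so only plain (optionally schema-qualified) names pass.
        if (!table.matches(IDENT + "(\\." + IDENT + ")?") || !field.matches(IDENT)) {
            throw new IllegalArgumentException("unsupported identifier");
        }
        String where = partitionFilter == null ? "" : " WHERE " + partitionFilter;
        return "SELECT COUNT(*) AS total_rows"
                + ", COUNT(DISTINCT " + field + ") AS distinct_count"
                + ", SUM(CASE WHEN " + field + " IS NULL THEN 1 ELSE 0 END) AS null_count"
                + ", MAX(" + field + ") AS max_value"
                + ", MIN(" + field + ") AS min_value"
                + ", MAX(LENGTH(" + field + ")) AS max_length"
                + ", MIN(LENGTH(" + field + ")) AS min_length"
                + ", SUM(CASE WHEN " + field + " = 0 THEN 1 ELSE 0 END) AS zero_count"
                + " FROM " + table + where;
    }

    public static void main(String[] args) {
        System.out.println(build("orders", "amount", "dt = '2018-06-22'"));
    }
}
```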
With the above method, the embodiment of the present invention obtains the target data to be processed and the pre-established key data-shape indicators, which include the data's own attribute information, and determines the data shape of the target data according to those indicators. Because the key indicators reflect the data shape of a given portion of data, this technical solution reflects the data shape simply, quickly and completely.
In one embodiment, the method may further include the following step:
Step S4: write the target data to be processed into a predetermined data table.
In this embodiment, the target data to be processed is handled in units of tables. For example, if the data spans several tables, or only part of the data is needed, the data to be processed is written into a specified table. In short, placing the target data in a single data table makes the data convenient to analyze and process.
In one embodiment, step S3 may include the following steps S31 to S33:
Step S31: receive the input table name of the predetermined data table;
Step S32: clean the predetermined data table according to the pre-established key data-shape indicators to obtain the data shape of the target data in the predetermined data table.
Step S3 may further include step S33: receive the input partition of the predetermined data table to be processed and/or the input one or more fields whose enumeration values are to be calculated.
Step S3 operates as follows: the user creates a new task and fills in the corresponding information as prompted, for example the table name of the data table to be analyzed, the partition of the predetermined data table to be processed, and one or more fields whose enumeration values are to be calculated. After filling these in, the user clicks the execute button to start analyzing the data in the table. A notification can be sent to the user when the task completes, and the user can also click the log button to view the task's progress in real time.
For example: comparing a field's total count with its distinct count verifies the uniqueness of the key and reveals duplicate data, and, combined with business knowledge, shows whether the total meets expectations. A field's maximum and minimum lengths can reveal dirty data, such as an overly long address or a sellerid consisting of a single digit. The distribution of a field's enumeration values can verify whether those values are reasonable. A field's maximum and minimum values, and its null count, can be checked for reasonableness against business knowledge. In particular, when the cleaned result table is provided to an application for direct display, it must satisfy the application's constraints on data shape; for example, a certain field must not be null, or the length of a certain field must not exceed a limit.
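The checks described in this paragraph can be sketched directly in Java. The class and method names and the thresholds in the example (a `sellerid` length between 4 and 10 characters) are hypothetical; in practice they would come from business rules or from the constraints of the application that displays the cleaned table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Hypothetical checks matching the examples in the paragraph above.
class ShapeChecks {
    // Uniqueness: the total count equals the distinct count, i.e. no duplicate keys.
    static boolean keyIsUnique(List<?> keys) {
        return new HashSet<Object>(keys).size() == keys.size();
    }

    // Dirty data: non-null values whose length falls outside [minLen, maxLen].
    static List<String> outOfLengthRange(List<String> values, int minLen, int maxLen) {
        List<String> bad = new ArrayList<>();
        for (String v : values) {
            if (v != null && (v.length() < minLen || v.length() > maxLen)) {
                bad.add(v);
            }
        }
        return bad;
    }

    // Application constraint: the column must not contain NULL.
    static boolean hasNoNulls(List<?> values) {
        for (Object v : values) {
            if (v == null) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(keyIsUnique(List.of(101, 102, 102)));                        // false
        System.out.println(outOfLengthRange(List.of("7", "100234", "100567"), 4, 10)); // [7]
        System.out.println(hasNoNulls(Arrays.asList("a", null)));                     // false
    }
}
```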
The preferred embodiment of this patent has been described in detail above, but this patent is not limited to that embodiment; within the knowledge of a person of ordinary skill in the art, various changes can be made without departing from the purpose of this patent.

Claims (8)

1. A big data processing device, comprising a primary processor (1), preliminary processors (2) and secondary processors (3), wherein the primary processor (1) is provided with a main data processing module and is connected to multiple preliminary processors (2); each preliminary processor (2) is connected to multiple secondary processors (3); and the primary processor (1), the preliminary processors (2) and the secondary processors (3) are each provided with a data detection module for detecting anomalies in input data.
2. The big data processing device according to claim 1, characterized in that the preliminary processor (2) is provided with a preliminary data processing module.
3. The big data processing device according to claim 1, characterized in that an input device is connected to the secondary processor (3), and the input device is provided with a data input module.
4. The big data processing device according to claim 1, characterized in that the secondary processor (3) is provided with a secondary data processing module.
5. The big data processing device according to claim 1, characterized in that transmission between the primary processor (1) and the preliminary processors (2), and between the preliminary processors (2) and the secondary processors (3), is bidirectional.
6. A big data processing method, characterized by comprising the following steps:
Step S1: obtaining, by the preliminary processor 2, target data to be processed;
Step S2: obtaining, by the secondary processor 3, the pre-established key data-shape indicators, which include the data's own attribute information;
Step S3: determining, by the primary processor 1, the data shape of the target data according to the pre-established key data-shape indicators.
7. The big data processing method according to claim 6, characterized in that step S3 may include the following steps S31 to S33:
Step S31: receiving the input table name of the predetermined data table;
Step S32: cleaning the predetermined data table according to the pre-established key data-shape indicators to obtain the data shape of the target data in the predetermined data table;
Step S33: receiving the input partition of the predetermined data table to be processed and/or the input one or more fields whose enumeration values are to be calculated.
8. The big data processing method according to claim 6, characterized in that step S3 operates as follows: a new task is created and the corresponding information is filled in as prompted, for example the table name of the data table to be analyzed, the partition of the predetermined data table to be processed, and one or more fields whose enumeration values are to be calculated. After filling these in, the user clicks the execute button to start analyzing the data in the table; a notification can be sent to the user when the task completes, and the user can click the log button to view the task's progress in real time.
CN201810648667.0A 2018-06-22 2018-06-22 Big data processing device and method Pending CN108920410A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810648667.0A CN108920410A (en) 2018-06-22 2018-06-22 Big data processing device and method

Publications (1)

Publication Number Publication Date
CN108920410A 2018-11-30

Family

ID=64421011

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810648667.0A Pending CN108920410A (en) 2018-06-22 2018-06-22 Big data processing device and method

Country Status (1)

Country Link
CN (1) CN108920410A (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH11328114A (en) * 1998-05-14 1999-11-30 Toshiba Tec Corp Shop management data processing system
CN101527574A (en) * 2009-03-27 2009-09-09 王方松 Shielding system for seamless grading management
CN101667165A (en) * 2009-09-28 2010-03-10 中国电力科学研究院 Bus sharing method and device for distributed multi-master CPUs
CN201638067U (en) * 2009-11-18 2010-11-17 吉林大元电子科技有限公司 Automobile CAN bus electronic control unit
WO2010129998A1 (en) * 2009-05-13 2010-11-18 The University Of Sydney A method and system for data analysis and synthesis
CN104050230A (en) * 2013-03-15 2014-09-17 英特尔公司 Fast approach to finding minimum and maximum values in a large data set using SIMD instruction set architecture
CN104679813A (en) * 2013-11-28 2015-06-03 三星电子株式会社 Data storage device, data storage method and data storage system
CN105122726A (en) * 2013-03-11 2015-12-02 皇家飞利浦有限公司 Multiple user wireless docking
CN105892412A (en) * 2014-12-15 2016-08-24 广西大学 Multi-axis motion control hardware configuration based on custom bus
CN106204374A (en) * 2016-07-04 2016-12-07 黄可欣 System and method for processing educational big data
CN106294823A (en) * 2016-08-17 2017-01-04 上海云信留客信息科技有限公司 Anomaly detection and elimination method for big data cleansing
CN106294776A (en) * 2016-08-12 2017-01-04 北京东方车云信息技术有限公司 Data processing method and device
CN107679129A (en) * 2017-09-21 2018-02-09 无线生活(杭州)信息科技有限公司 Big data processing method and device

Similar Documents

Publication Publication Date Title
Zanjani et al. Automatically recommending peer reviewers in modern code review
Vulimiri et al. Global analytics in the face of bandwidth and regulatory constraints
CN103620601B (en) Joining tables in a mapreduce procedure
US10824611B2 (en) Automatic determination of table distribution for multinode, distributed database systems
US9436734B2 (en) Relative performance prediction of a replacement database management system (DBMS)
US10223437B2 (en) Adaptive data repartitioning and adaptive data replication
US20150012635A1 (en) Systems and methods for organic knowledge base runbook automation
WO2008140937A4 (en) Query handling in databases with replicated data
US20130067440A1 (en) System and method for sql performance assurance services
US10664477B2 (en) Cardinality estimation in databases
Ali et al. A framework to implement data cleaning in enterprise data warehouse for robust data quality
CN110457371A (en) Data managing method, device, storage medium and system
CA3167981C (en) Offloading statistics collection
CN112997168A (en) System and method for data analysis by analyzing application environment
CN105868956A (en) Data processing method and device
Xie et al. Dynamic interaction graphs with probabilistic edge decay
US11222073B2 (en) System and method of creating different relationships between various entities using a graph database
CN105446824B (en) Table increment acquisition methods and long-distance data backup method
CN105786990B (en) The method and device of database data storage and quick search
CN108920410A (en) A kind of big data processing unit and method
US11023449B2 (en) Method and system to search logs that contain a massive number of entries
CN104572228B (en) A kind of node updating method and device
CN112948469B (en) Data mining method, device, computer equipment and storage medium
CN115455091A (en) Data generation method and device, electronic equipment and storage medium
CN115147183A (en) Chip resource management method, device, equipment and storage medium based on cloud platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181130
