CN109522742A - A kind of batch processing method of computer big data - Google Patents
A kind of batch processing method of computer big data
- Publication number
- CN109522742A (application number CN201811257472.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- big data
- information
- module
- key
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/03—Data mining
Abstract
The invention belongs to the field of big data batch processing systems and discloses a batch processing method of computer big data. Customer data is input through a data input module using a data input device. A main control module dispatches the data resources to be processed through a resource scheduling module using a scheduling algorithm; the resource scheduling module uses the Min-Min scheduling algorithm for load scheduling in a big data environment. A batch processing execution module uses a batch program to dispatch the processor to batch-process the pending process jobs; an encryption module encrypts the big data using an encryption program; an analysis module analyzes the big data using an analysis program; a data storage module stores big data resources using a memory; and a display module shows the big data information using a display. The present invention does not need to fetch big data from the massive big data in a distributed database, so it consumes little time and is easily achieved.
Description
Technical field
The invention belongs to the field of big data batch processing systems, and in particular relates to a batch processing method of computer big data.
Background technique
Big data includes structured, semi-structured and unstructured data, and unstructured data increasingly makes up the majority. According to an IDC survey report, 80% of enterprise data is unstructured, and this data grows exponentially by 60% every year. Big data is simply a manifestation or feature of the current stage of Internet development; there is no need to mythologize it or hold it in awe. Against the backdrop of the technological innovation represented by cloud computing, data that once seemed difficult to collect and use is becoming easy to exploit, and through continuous innovation in all trades and professions, big data will gradually create more value for mankind. Big data analysis arose from IT management: enterprises can combine real-time stream analysis with historical data, analyze the big data, and find the models they need, which in turn helps predict and prevent future outages and performance problems. Furthermore, they can use big data to understand usage models and geographic trends, deepening insight into key customers; they can also track and record network behavior so that big data easily identifies service impact, accelerate profit with a deeper understanding of service growth, and collect data across multiple systems to develop an IT service catalogue. However, traditional big data security protection technology cannot protect the sensitive information and sensitive data inside a big data platform; meanwhile, big data analysis suffers from long processing times and is not easily accomplished.
In conclusion, the problems in the prior art are:
(1) Traditional big data security protection technology cannot protect the sensitive information and sensitive data inside a big data platform, which easily causes data leakage and losses to users.
(2) When big data is analyzed, the analysis time is long, working efficiency is low, the analysis is not easily accomplished, and batch analysis errors easily occur.
(3) Traditional resource scheduling modules schedule massive data resources at a low rate, so the batch processing rate is slow and much time is wasted.
When analyzing big data, the prior art does not use granular computing methods to analyze non-precise solutions of big data problems, that is, converting the input of the problem from the finest-granularity raw data into an information-granule representation, greatly reducing the data volume on the premise of retaining the information and value contained in the data.
Summary of the invention
In view of the problems in the prior art, the present invention provides a batch processing method of computer big data.
The invention is realized as follows: a batch processing method of computer big data comprises analyzing big data with an analysis program through an analysis module; specifically:
The 3V characteristics of big data are processed in the following order: variety → volume → velocity;
Diverse, heterogeneous data in distributed storage is converted, using data filtering, data integration, extraction and granulation, into more standardized data tables, eliminating the uncertainty therein;
Using the concrete models and techniques of granular computing, the original data is granulated into granules of suitable size, reducing the data scale and constructing the corresponding granular layers and the structure on each layer;
With the aid of other machine learning methods, data mining or machine learning is carried out on the information granules;
The data mining or machine learning used is transformed into a distributed, online incremental-learning version to meet the timeliness requirements of big data processing;
When processing big data, granularity is switched freely: granules on multiple granularity levels need to be decomposed and merged, and the corresponding solutions rapidly constructed; for certain particular problems that need information from multiple granularity levels, a "cross-granularity" mechanism is used;
From the view of the entire treatment process, whether the original data has a suitable granularity is analyzed, providing guidance on whether and how to adjust the generation or acquisition of the original data;
Drawing on the ideas of deep learning, the key process flow is arranged into many levels, so that design parameters are optimized during learning and the final learning outcome is optimized.
Further, analyzing the big data specifically includes: data acquisition → extraction/cleaning → integration/representation → analysis/modeling → interpretation;
Wherein:
1) Data source processing and data integration:
Heterogeneous data is processed using dimensionality reduction, data enrichment and data encapsulation of the data sources;
2) Domain-oriented granulation: the input of the problem is converted from the finest-granularity raw data into an information-granule representation, greatly reducing the data volume on the premise of retaining the information and value contained in the data; before specific data analysis requirements are proposed, the original data is first constructed, according to domain knowledge, into a multi-granular information knowledge representation model (Multi-Granular Information/Knowledge Representation model, MGrIKR);
Granulation first analyzes the representation of information granules, granular layers and the whole granular structure, and then constructs them for the representation method;
Wherein, the representation of an information granule: an information granule is formally described by a triple, IG = (KVS, GM, VM). KVS (Key-Value pair Set) denotes the feature sub-vector describing the information granule, called the key-value pair set, i.e. KVS = {<key1, value1>, ..., <keyn, valuen>}, where valuei denotes the value taken by the feature named keyi in the information granule, i = 1, 2, ..., n. GM denotes the granularity measure (Granularity Measure) of the information granule, i.e. its fineness. VM denotes the value measure (Value Measure) of the information granule;
The representation of a granular layer: a granular layer is constituted by all the information granules obtained under a certain granulation criterion and the relationships between the information granules; its formal representation is a binary tuple, Layer = (IGS, Intra-LR). IGS (Information Granule Set) denotes the set of information granules IG in the layer, representable as IGS = {IG1, IG2, ..., IGm}; Intra-LR (Intra-Layer Relationships) denotes the relationships existing between information granules within the layer: if information granules IGp and IGq are related, Intra-LR is representable as Intra-LR = {E | E = (IGp, IGq), IGp, IGq ∈ IGS};
The representation of the granular structure: the granular structure in MGrIKR is the topological structure constituted by the multiple granular layers obtained under different granulation criteria, the correlations between information granules in different layers, and the correlations between information granules in the same layer; its formal representation is similar to those of the information granule IG and the granular layer Layer, denoting the granular structure as a tuple (GranularStructure, GS), GS = (LS, Inter-LR);
Wherein LS = {Layer1, ..., Layerm-1, Layerm} denotes the set of m granular layers (Layer Set, LS), in which granular layer Layerj is one layer of the granular structure. Inter-LR (Inter-Layer Relationships) denotes the set of transformational relations between the information granules of two granular layers Layerj and Layerk; Inter-LR is expressed as
Inter-LR = {r | r(Layerj, Layerk)},
or
Inter-LR = {r | r(IGj, IGk), IGj ∈ IGSj, IGk ∈ IGSk};
r denotes the partial-order relation satisfied between the information granules of Layerj and Layerk, j, k = 1, ..., m, where r is either a relation between information granules in two adjacent layers, or a relation between information granules across layers.
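The three tuples above — IG = (KVS, GM, VM), Layer = (IGS, Intra-LR) and GS = (LS, Inter-LR) — can be modeled directly as data structures. The following is a minimal Python sketch under stated assumptions: the field names, the numeric types of GM and VM, and the layer ordering are choices of the sketch, not specified by the patent.

```python
from dataclasses import dataclass, field

@dataclass
class InformationGranule:
    kvs: dict    # key-value pair set {key_i: value_i}
    gm: float    # granularity measure: the fineness of the granule
    vm: float    # value measure of the granule

@dataclass
class Layer:
    igs: list                                   # information granules on this layer
    intra_lr: set = field(default_factory=set)  # intra-layer relations: pairs (p, q)

@dataclass
class GranularStructure:
    ls: list                                    # the m granular layers (order assumed)
    inter_lr: set = field(default_factory=set)  # inter-layer relations between layers

# A toy two-layer structure: two fine granules summarized by one coarse granule.
fine = Layer(igs=[InformationGranule({"temp": 21.3}, gm=1.0, vm=0.9),
                  InformationGranule({"temp": 21.8}, gm=1.0, vm=0.8)])
coarse = Layer(igs=[InformationGranule({"temp_avg": 21.55}, gm=0.5, vm=0.85)])
gs = GranularStructure(ls=[coarse, fine], inter_lr={(0, 1)})
```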
Further, the batch processing method of the computer big data specifically includes:
Step 1: customer data is input through the data input module using a data input device;
Step 2: the main control module dispatches the data resources to be processed through the resource scheduling module using a scheduling algorithm; the resource scheduling module uses the Min-Min scheduling algorithm for load scheduling in a big data environment, with the following specific steps:
(1) judge whether the task set is empty; if not empty, proceed to (2), otherwise go to (6);
(2) for each task in the set, compute its mapping onto all virtual machines together with the execution times, obtaining a matrix;
(3) from the result of (2), find the virtual machine corresponding to the task with the smallest completion time;
(4) allocate that task to the virtual machine, and delete the task from the set;
(5) update the matrix and return to (1);
Step 3: the batch processing execution module uses a batch program to dispatch the processor to batch-process the pending process jobs; the encryption module encrypts the big data using an encryption program;
Step 4: the analysis module analyzes the big data using an analysis program; the analysis method of the analysis module is:
(1) the big data is stored in a distributed database in time slices, and the data content in the database is encrypted;
(2) an original-data temporary table and an index table that cache the big data are set up in the distributed database; the index table holds the location information of the corresponding big data in the original-data temporary table;
(3) when big data analysis is carried out, according to the location information, stored in the index table on the server, of the corresponding big data in the original-data temporary table, the main control module rapidly decrypts the encrypted data, the big data is called from the original-data temporary table and analyzed, and the analysis results are stored in the distributed database.
Step 5: the data storage module stores big data resources using a memory;
Step 6: the display module shows the big data information using a display;
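The time-sliced, encrypted storage with a secondary index described in step 4 can be illustrated with in-memory dictionaries standing in for the distributed database. This is a sketch under stated assumptions: the XOR "cipher", the table layouts and the stand-in analysis are illustrative, not the patented scheme.

```python
KEY = 0x5A  # toy single-byte key standing in for the real cipher

def encrypt(data: bytes) -> bytes:
    return bytes(b ^ KEY for b in data)

decrypt = encrypt  # XOR with a fixed key is its own inverse

# (1) big data stored in time slices, encrypted
raw_table = {}   # original-data temporary table: location -> encrypted slice
index = {}       # index table: data id -> locations in the temporary table
results = {}     # analysis result table

def store(data_id, slices):
    for i, s in enumerate(slices):
        loc = (data_id, i)               # location info kept in the index table
        raw_table[loc] = encrypt(s)
        index.setdefault(data_id, []).append(loc)

# (3) analysis goes through the index: look up locations, decrypt,
#     analyze, and store the result back in the result table
def analyze(data_id):
    plain = b"".join(decrypt(raw_table[loc]) for loc in index[data_id])
    results[data_id] = len(plain)        # stand-in "analysis": total byte count
    return results[data_id]

store("trace-1", [b"abc", b"defg"])
print(analyze("trace-1"))  # → 7
```

Because the index table maps each data id straight to its slice locations, analysis never scans the full raw table — the point the abstract makes about avoiding retrieval from the massive data set.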
The encryption method of the encrypting module is as follows:
(1) after target big data is received, the target big data is processed according to preset rules, and whether the target big data is to be encrypted is determined;
(2) if so, a key request is formed for the target big data, and the key request is put into a target queue;
(3) key requests are taken out of the target queue in turn, and a request to produce a data encryption key is submitted to the big data key production module;
(4) the encryption key information issued by the key production module is received, and the big data is encrypted according to the encryption key information.
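Steps (1)-(4) amount to a first-in-first-out key-request pipeline. A minimal sketch follows; the `should_encrypt` rule, the XOR encryption and the key-production stub are assumptions of the sketch, not the patent's concrete mechanisms.

```python
from collections import deque
import hashlib

key_requests = deque()   # the target queue, consumed first-in first-out

def should_encrypt(block: bytes) -> bool:
    # stand-in preset rule: encrypt every non-empty block
    return len(block) > 0

def enqueue_blocks(blocks):
    # steps (1)-(2): examine each block, queue a key request if needed
    for i, block in enumerate(blocks):
        if should_encrypt(block):
            key_requests.append({"block_id": i, "block": block})

def produce_key(block_id: int) -> bytes:
    # stub standing in for the big data key production module
    return hashlib.sha256(b"initial-key|%d" % block_id).digest()

def process_queue():
    # steps (3)-(4): FIFO take-out, request a key, encrypt the block
    encrypted = {}
    while key_requests:
        req = key_requests.popleft()
        key = produce_key(req["block_id"])
        encrypted[req["block_id"]] = bytes(
            b ^ key[j % len(key)] for j, b in enumerate(req["block"]))
    return encrypted

enqueue_blocks([b"alpha", b"", b"gamma"])
out = process_queue()
print(sorted(out))  # → [0, 2]  (the empty block is skipped)
```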
Further, after the target big data is received, processing the target big data according to preset rules and determining whether the target big data is to be encrypted comprises:
after the target big data is received, the target big data is divided into blocks according to the block-processing rule of the data, and whether to encrypt is determined separately for each block of the divided target big data;
Said forming a key request for the target big data and putting the key request into the target queue comprises:
if so, a key request is formed for each block of the target big data that is to be encrypted, and the key request is put into the target queue.
Further, said taking key requests out of the target queue in turn and submitting a request to produce a data encryption key to the big data key production module comprises:
according to the first-in-first-out principle, key requests are taken out of the target queue in turn, and a request to produce a data encryption key is submitted to the big data key production module;
The encryption information includes the information of the initial key. When a single block key leaks, a key generated from a new initial key re-encrypts the block whose key leaked, and the initial key and the block encryption key information are updated in the encryption information table. In the one-way function calculation, the information of a key-change counter is added: the block symmetric key generation function is M(F(K, A, f(N))), where the encryption information table additionally contains the key-change information N on the basis of the foregoing;
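One way to read the generation function M(F(K, A, f(N))) is a keyed hash over the initial key K, a block address A and a one-way function of the key-change counter N; bumping N after a leak yields a fresh block key without changing K. The concrete choices of HMAC-SHA-256 for F, SHA-256 for M and f, and the byte encodings are assumptions of this sketch.

```python
import hashlib
import hmac

def f(n: int) -> bytes:
    # one-way function of the key-change counter N (assumed: SHA-256)
    return hashlib.sha256(n.to_bytes(8, "big")).digest()

def block_key(K: bytes, A: int, N: int) -> bytes:
    # M(F(K, A, f(N))): the inner keyed hash F binds K, the block
    # address A and f(N); the outer hash plays the role of M
    inner = hmac.new(K, A.to_bytes(8, "big") + f(N), hashlib.sha256).digest()
    return hashlib.sha256(inner).digest()

K = b"initial-key"
k0 = block_key(K, A=7, N=0)
k1 = block_key(K, A=7, N=1)   # bump the counter after a leak: fresh key
assert k0 != k1 and len(k0) == 32
```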
The distributed database is an HBase database;
Before the big data is stored into the distributed database, the method further includes integrity verification and legitimacy verification of the big data, wherein the integrity verification is completed by the redis in the network system, and after it passes, the big data is sent to the server, which locally completes the legitimacy verification;
The way the original-data temporary table caches the big data is: the row key (rowkey) is set using the remote-procedure-call trace identifier traceID, the entry method name entrace and the time; the column name is set to an arbitrary value; and the key value in the key-value pair is spliced from the spanID and the big data value roleID;
Storing the big data into HBase includes: the rowkey is set using the traceID, the entry method name and the time; the column name is set to an arbitrary value; and the key value in the key-value pair is spliced from the spanID and the big data value roleID.
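The rowkey and value construction described above can be sketched as plain string splicing. The separators, the time format and the field order are assumptions of the sketch; the patent only specifies which identifiers participate.

```python
from datetime import datetime, timezone

def make_rowkey(trace_id: str, entry_method: str, ts: datetime) -> str:
    # rowkey = traceID + entry method name + time (separator assumed)
    return "|".join([trace_id, entry_method, ts.strftime("%Y%m%d%H%M%S")])

def make_value(span_id: str, role_id: str) -> str:
    # key value spliced from the spanID and the big data value roleID
    return span_id + ":" + role_id

ts = datetime(2018, 10, 26, 12, 0, 0, tzinfo=timezone.utc)
rk = make_rowkey("trace-42", "getUser", ts)
val = make_value("span-7", "role-3")
print(rk, val)  # → trace-42|getUser|20181026120000 span-7:role-3
```

Leading the rowkey with the traceID keeps all rows of one remote-procedure call adjacent in HBase's lexicographic row ordering, which is what makes the index-table lookup of step 4 a narrow range read.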
Another object of the present invention is to provide a computer program realizing the batch processing method of the computer big data.
Another object of the present invention is to provide a terminal, the terminal at least carrying a server realizing the batch processing method of the computer big data.
Another object of the present invention is to provide a computer-readable storage medium, including instructions which, when run on a computer, cause the computer to execute the batch processing method of the computer big data.
Another object of the present invention is to provide a batch processing system of computer big data implementing the batch processing method of the computer big data, the batch processing system of the computer big data comprising:
a data input module, connected with the main control module, for inputting customer data through a data input device;
a main control module, connected with the data input module, the resource scheduling module, the batch processing execution module, the encrypting module, the analysis module, the data storage module and the display module, for controlling the normal working of each module through a single-chip microcomputer;
a resource scheduling module, connected with the main control module, for dispatching the data resources to be processed through a scheduling algorithm;
a batch processing execution module, connected with the main control module, for batch-processing the pending process jobs through a batch program dispatching the processor;
an encrypting module, connected with the main control module, for encrypting the big data through an encryption program;
an analysis module, connected with the main control module, for analyzing the big data through an analysis program;
a data storage module, connected with the main control module, for storing big data resources through a memory;
a display module, connected with the main control module, for showing the big data information through a display.
Another object of the present invention is to provide an enterprise IT service equipment at least carrying the batch processing system of the computer big data.
The advantages and positive effects of the present invention are as follows:
(1) With the encrypting module of the present invention, the code integrity of the big data platform can be verified through the invention; even if the big data platform is attacked by hackers and trojans, the invention can detect it and alert automatically. Even if the big data platform of the invention is encroached upon by attacks, viruses or trojans, the system integrity verification technology provided by the invention (hash algorithm technology) can accurately recover a system identical to the original, avoiding the leakage or loss of data.
(2) While the analysis module stores the big data in the distributed database in time slices, an original-data temporary table and an index table caching the big data are set up in the server's local cache, and the index table holds the location information of the corresponding big data in the original-data temporary table. When big data analysis is carried out, the big data is called directly from the original-data temporary table according to the index table on the server. Because a secondary index is used, the analysis result obtained when analyzing the big data is stored in the analysis result table of the distributed database, and the big data does not need to be fetched from the massive big data in the distributed database, so little time is consumed and the method is easily achieved. Further, the location information of the big data in the original-data temporary table is the remote-procedure-call information of the big data, which uniquely identifies and reflects the called process of the big data.
(3) The present invention applies an improved Min-Min scheduling algorithm in the resource scheduling module; through priority filtering and priority processing of multiple tasks, it meets the requirements of diverse computing tasks and large data volumes, improves the load-balancing degree and scheduling efficiency of resources, improves working efficiency, and saves time.
In the big data analysis of the present invention, granular computing methods are used to analyze non-precise solutions of big data problems: the input of the problem is converted from the finest-granularity raw data into an information-granule representation, greatly reducing the data volume on the premise of retaining the information and value contained in the data.
Granular computing, as a computing paradigm, has played an important role in the field of intelligent information processing, and applying it to big data analysis has a guiding function.
Description of the drawings
Fig. 1 is a flow chart of the batch processing method of the computer big data provided by the embodiment of the present invention.
Fig. 2 is a structural block diagram of the batch processing system of the computer big data provided by the embodiment of the present invention.
In the figures: 1, data input module; 2, main control module; 3, resource scheduling module; 4, batch processing execution module; 5, encrypting module; 6, analysis module; 7, data storage module; 8, display module.
Fig. 3 is a flow chart of the dynamic real-time update mechanism of the multi-source heterogeneous granular structure provided by the embodiment of the present invention.
Fig. 4 is a diagram of selecting a suitable granular layer that meets the granularity-measure demand and time constraints, provided by the embodiment of the present invention.
Fig. 5 is a diagram of the man-machine coordination alert response model provided by the embodiment of the present invention.
Specific embodiment
In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention is further elaborated below with reference to the embodiments. It should be appreciated that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
The application principle of the invention is further described below with reference to the accompanying drawings and specific embodiments.
As shown in Fig. 1, the batch processing method of computer big data provided by the invention comprises the following steps:
S101: customer data is input through the data input module using a data input device;
S102: the main control module dispatches the data resources to be processed through the resource scheduling module using a scheduling algorithm;
S103: the batch processing execution module uses a batch program to dispatch the processor to batch-process the pending process jobs; the encryption module encrypts the big data using an encryption program;
S104: the analysis module analyzes the big data using an analysis program;
S105: the data storage module stores big data resources using a memory;
S106: the display module shows the big data information using a display.
As shown in Fig. 2, the batch processing system of computer big data provided by the embodiment of the present invention comprises: a data input module 1, a main control module 2, a resource scheduling module 3, a batch processing execution module 4, an encrypting module 5, an analysis module 6, a data storage module 7 and a display module 8.
The data input module 1 is connected with the main control module 2 and is used for inputting customer data through a data input device;
The main control module 2 is connected with the data input module 1, the resource scheduling module 3, the batch processing execution module 4, the encrypting module 5, the analysis module 6, the data storage module 7 and the display module 8, and is used for controlling the normal working of each module through a single-chip microcomputer;
The resource scheduling module 3 is connected with the main control module 2 and is used for dispatching the data resources to be processed through a scheduling algorithm;
The batch processing execution module 4 is connected with the main control module 2 and is used for batch-processing the pending process jobs through a batch program dispatching the processor;
The encrypting module 5 is connected with the main control module 2 and is used for encrypting the big data through an encryption program;
The analysis module 6 is connected with the main control module 2 and is used for analyzing the big data through an analysis program;
The data storage module 7 is connected with the main control module 2 and is used for storing big data resources through a memory;
The display module 8 is connected with the main control module 2 and is used for showing the big data information through a display.
The encryption method of the encryption module 5 provided by the invention is as follows:
(1) after target big data is received, the target big data is processed according to preset rules, and whether the target big data is to be encrypted is determined;
(2) if so, a key request is formed for the target big data, and the key request is placed in a target queue;
(3) key requests are taken out of the target queue one by one, and a request to generate a data encryption key is submitted to a big data key generation module;
(4) the encryption key information issued by the key generation module is received, and the big data is encrypted according to the encryption key information.
In the invention, after the target big data is received, processing it according to preset rules and determining whether it is to be encrypted comprises:
after the target big data is received, partitioning it into blocks according to a data-blocking rule, and determining separately, for each block of the partitioned target big data, whether that block is to be encrypted.
Forming a key request for the target big data and placing the key request in the target queue comprises:
forming one key request for each block of the target big data that is to be encrypted, and placing the key request in the target queue.
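The per-block key-request flow described above (one request per encrypted block, placed in a target queue and served in arrival order by a key generation module) can be sketched as follows; the class and method names are illustrative assumptions, not part of the invention:

```python
from queue import Queue

class KeyGenerationModule:
    """Stand-in for the big data key generation module (hypothetical API)."""
    def __init__(self):
        self.counter = 0
    def generate_key(self, block_id):
        self.counter += 1
        return f"key-{self.counter}-for-{block_id}"

def enqueue_key_requests(blocks, needs_encryption, q):
    # one key request per block that is marked for encryption
    for block_id in blocks:
        if needs_encryption(block_id):
            q.put(block_id)

def serve_key_requests(q, keygen):
    # requests are served in first-in, first-out order
    keys = {}
    while not q.empty():
        block_id = q.get()
        keys[block_id] = keygen.generate_key(block_id)
    return keys

q = Queue()
enqueue_key_requests(["b0", "b1", "b2"], lambda b: b != "b1", q)
keys = serve_key_requests(q, KeyGenerationModule())
# b1 was not marked for encryption, so no key is generated for it
assert set(keys) == {"b0", "b2"}
```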
Taking key requests out of the target queue one by one and submitting a request to generate a data encryption key to the big data key generation module, as provided by the invention, comprises:
taking the key requests out of the target queue one by one according to the first-in, first-out principle, and submitting a request to generate a data encryption key to the big data key generation module.
The encryption information provided by the invention includes the information of an initial key. When the key of a single block is leaked, a new initial key is used to generate a key and re-encrypt the block whose key was leaked, and the initial key and the block encryption key information in the encryption information table are updated. For the one-way function calculation, the information of a key-change number N is added: the block symmetric key generation function is M(F(K, A, f(N))), and the encryption information table, on the basis of the foregoing, further includes the key-change information N.
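A minimal sketch of the block-key derivation M(F(K, A, f(N))) described above. The source does not specify F, f, or the meaning of A; here HMAC-SHA256 stands in for F, SHA-256 stands in for f and the outer M, and A is taken to be a block identifier, all of which are assumptions:

```python
import hashlib
import hmac

def f(n: int) -> bytes:
    # one-way transform of the key-change number N (SHA-256 is an assumed choice)
    return hashlib.sha256(n.to_bytes(8, "big")).digest()

def derive_block_key(K: bytes, A: bytes, N: int) -> bytes:
    # block symmetric key M(F(K, A, f(N))): F combines the initial key K,
    # the block identifier A, and f(N); here F is HMAC-SHA256 keyed with K
    inner = hmac.new(K, A + f(N), hashlib.sha256).digest()  # F(K, A, f(N))
    return hashlib.sha256(inner).digest()                   # outer M

k0 = b"initial-key"
a = b"block-7"
key_before = derive_block_key(k0, a, 0)
key_after = derive_block_key(k0, a, 1)  # bump N after a leak to get a new block key
assert key_before != key_after
```

Bumping N after a leak yields a fresh block key without distributing new key material for the unaffected blocks, which matches the table-update behaviour described above.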
The analysis method of the analysis module 6 provided by the invention is as follows:
(1) the big data is stored in a distributed database in time slices;
(2) a raw-data staging table that caches the big data, and an index table, are set up in the distributed database, the index table recording the location of the corresponding big data in the raw-data staging table;
(3) when big data analysis is carried out, the big data is called from the raw-data staging table according to the location recorded for it in the index table on the server, is analyzed, and the analysis result is stored in the distributed database.
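The three-step analysis flow above (time-sliced storage plus an index table recording each datum's location in the staging table) can be sketched as follows; the in-memory structures stand in for the distributed database and are assumptions:

```python
from collections import defaultdict

class StagingStore:
    """Minimal sketch: a time-sliced staging table plus an index table
    mapping a data id to its (time_slice, offset) location."""
    def __init__(self):
        self.staging = defaultdict(list)  # time_slice -> list of records
        self.index = {}                   # data_id -> (time_slice, offset)

    def put(self, data_id, time_slice, record):
        offset = len(self.staging[time_slice])
        self.staging[time_slice].append(record)
        self.index[data_id] = (time_slice, offset)

    def analyze(self, data_id, analyze_fn):
        # look up the location in the index table, fetch from staging, analyze
        time_slice, offset = self.index[data_id]
        record = self.staging[time_slice][offset]
        return analyze_fn(record)

store = StagingStore()
store.put("sensor-42", "2018-10-26T10", [3, 1, 2])
result = store.analyze("sensor-42", lambda xs: sum(xs) / len(xs))
assert result == 2.0
```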
The distributed database provided by the invention is an HBase database.
Before the big data is stored in the distributed database, the invention further carries out integrity verification and legality verification on the big data, wherein the integrity verification is completed by Redis in the network system, and, once it passes, the big data is sent to the server locally to complete the legality verification.
The manner in which the raw-data staging table caches the big data, as provided by the invention, is as follows:
the row key rowkey is set using the remote-procedure-call trace identifier traceID, the entry method name entrance, and the time; the column name is set to an arbitrary value; and the value in the key-value pair is spliced from the spanID and the big data value roleID.
Storing the big data in HBase, as provided by the invention, comprises: the rowkey is set using the traceID, the entry method name, and the time; the column name is set to an arbitrary value; and the value in the key-value pair is spliced from the spanID and the big data value roleID.
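A sketch of the rowkey and value splicing described above; the separator character and field order are assumptions, and a real deployment would write the row through an HBase client rather than a plain dict:

```python
def make_rowkey(trace_id: str, entry_method: str, timestamp: str) -> str:
    # rowkey spliced from traceID, the entry method name, and the time
    # (the "|" separator is an assumption)
    return f"{trace_id}|{entry_method}|{timestamp}"

def make_value(span_id: str, role_id: str) -> str:
    # key-value pair value spliced from spanID and the big data value roleID
    return f"{span_id}|{role_id}"

row = {
    "rowkey": make_rowkey("t-001", "getUser", "20181026T1030"),
    "column": "d",  # the column name may be an arbitrary value
    "value": make_value("span-7", "role-42"),
}
assert row["rowkey"] == "t-001|getUser|20181026T1030"
assert row["value"] == "span-7|role-42"
```

Leading with traceID keeps all rows of one call trace adjacent in HBase's lexicographic rowkey order, which is one plausible reason for this splicing order.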
The invention will be further described below with reference to a concrete analysis.
The invention analyzes the big data with the analysis program through the analysis module, which specifically includes:
For the characteristics of big data, a unified granular-computing solution framework for big data problems is proposed. The 3V characteristics of big data can be processed in the following order: variety → volume → velocity (of course, some data do not exhibit all three characteristics at once and should be treated according to the actual situation).
(1) The variety of the distributed, heterogeneous data is eliminated by data filtering, data integration, extraction, and granulation, converting it into more standardized data tables and removing the uncertainty therein.
(2) For the given problem, the concrete "gamp" model and techniques under granular computing are used to granulate the raw data into grains of suitable size, reducing the data scale, and the corresponding granular layers and the structure on each layer are constructed.
(3) With the aid of other machine learning methods, data mining or machine learning is carried out on the information granules.
(4) The methods used are recast into distributed, online, incremental-learning versions to meet the timeliness requirements of big data processing.
(5) When processing big data, freely switching granularity requires considering the decomposition and merging of grains on multiple granularity levels, and the rapid construction of the corresponding solutions; for certain specific problems, the information of multiple granularity levels must be considered simultaneously, and a "cross-granularity" mechanism is used to solve them.
(6) From the entire processing flow, it can be found whether the raw data has a suitable granularity, providing guidance on whether and how to adjust the generation or acquisition of the raw data.
(7) Borrowing the idea of deep learning (Deep Learning), the key processing flow is organized into many levels, allowing design parameters (such as grain size and the number of granular layers) to be optimized through learning, thereby optimizing the final learning outcome.
There is a specific correspondence with the big data processing flow (data acquisition → extraction/cleaning → integration/representation → analysis/modeling → interpretation). The arrow "data-source regulation instruction" in the lower right corner, which adjusts the data granularity (the accuracy and frequency of acquisition or generation) according to the analysis application of the previous stage, corresponds to "data acquisition"; "data source selection and data integration" corresponds to "extraction/cleaning"; "domain-oriented granulation" corresponds to the "integration/representation" of the data; the "granular-computing methodology model & other machine learning models" at the top and the "parallelized/incremental granular-structure update and problem solving" in the rounded rectangle at the upper right correspond to "analysis/modeling"; and since information granules inherently carry specific semantics, the process of analyzing with granulation and mining/learning models has a clear "interpretation".
1) Data source selection and data integration:
The first link of big data processing is to confirm which data may help solve the problem and which are irrelevant to the theme; McKinsey considers this one of the three key challenges of big data analysis.
The primitive form of big data generally has "variety", including syntactic heterogeneity and semantic heterogeneity. Syntactic heterogeneity preserves the atomicity of the data, with only naming differences or type inconsistencies, and such cases are easier to handle. Semantic heterogeneity involves differences in many aspects, such as data granularity and data type, and requires careful analysis, after which metadata is used to describe the raw data; for video data, for example, some applications only need some essential information (such as scene type and duration).
Regarding the processing of heterogeneous data, Pal discusses how to handle data heterogeneity in the data preprocessing phase; the methods mentioned include dimensionality reduction, data condensation, and data wrapping. Pedrycz describes, for heterogeneous data, how to carry out fuzzy clustering as a preparation stage of big data analysis. Data integration is essential, and research on data integration is relatively mature.
2) Domain-oriented granulation:
The non-precise solution of big data problems is analyzed with granular-computing methods, the goal being to convert the input of the problem from raw data at the finest granularity into an information-granule representation, greatly reducing the data volume while retaining the information and value contained in the data.
Domain-oriented granulation means that, before specific data-analysis requirements are put forward, the raw data is first organized, according to domain knowledge, into a Multi-Granular Information/Knowledge Representation model (MGrIKR). The significance of building the MGrIKR is to provide suitable computational input for a family of problems with different granularity requirements.
Granulation first requires analyzing the representation of information granules, granular layers, and the entire granular structure, and then carrying out the construction for that representation.
(1) Representation of information granules:
Borrowing the representation of manifolds in quotient-space theory, an information granule is formally described by a triple, i.e. IG = (KVS, GM, VM). KVS (Key-Value pair Set) denotes the feature sub-vector describing the information granule, called the key-value pair set, i.e. KVS = {⟨key1, value1⟩, ..., ⟨keyn, valuen⟩}, where valuei is the value taken by the feature named keyi in the information granule, i = 1, 2, ..., n. GM denotes the Granularity Measure of the information granule, i.e. its fineness. VM denotes the Value Measure of the information granule.
Data granulation proceeds in two directions: example granulation (examples/instances) and feature granulation (features/attributes). Feature granulation refers mainly to the screening and combination of features, for which kernel-function methods in machine learning can be borrowed. Example granulation can use the clustering techniques of data mining: first determine the similarity measure for the fine-grained data contained in an information-granule layer, then partition the universe so that the similarity between data inside the same information granule is maximal and the similarity between data of different information granules is minimal.
As for the expression of the granularity measure GM of an information granule, it can be further analyzed in combination with existing granularity measures, for example the granularity measure proposed by Yao, i.e.
GM(π) = Σi=1..m (|Xi| / |U|) · log|Xi|,
where π = {X1, X2, ..., Xm} is a partition of the universe U and each Xi is a subset of U. When the granularity is finest, i.e. each grain is a singleton set, GM(π) = 0; when the granularity is coarsest, i.e. the entire universe is one grain, GM(π) = log|U|. The granularity measure of information granules helps to find the suitable granular layer in the problem-solving process, i.e. the optimization of the granular space.
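Yao's granularity measure, with the boundary behaviour stated above (0 for the singleton partition, log|U| for the one-grain partition), can be checked numerically; the base-2 logarithm is an assumption:

```python
import math

def yao_granularity(partition, universe_size):
    """GM(pi) = sum over blocks Xi of (|Xi|/|U|) * log2|Xi|,
    for a partition pi of a universe U of universe_size elements."""
    return sum(len(x) / universe_size * math.log2(len(x)) for x in partition)

U = list(range(8))
finest = [[u] for u in U]   # every grain a singleton -> GM = 0
coarsest = [U]              # the whole universe one grain -> GM = log|U|
mid = [U[:4], U[4:]]        # an intermediate partition

assert yao_granularity(finest, len(U)) == 0.0
assert yao_granularity(coarsest, len(U)) == math.log2(8)
assert 0.0 < yao_granularity(mid, len(U)) < math.log2(8)
```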
As for the value measure VM of an information granule, it is mainly determined from three aspects: granularity, uncertainty, and domain knowledge. 1. The better the granularity of the information granule fits the data-analysis requirements, the greater its value; granules that are too coarse or too fine both lose value. 2. The value measure of an information granule can be determined using information entropy from information theory and analysis-of-variance methods from statistics. 3. The value measure of specific information granules may be specified directly through domain knowledge and expert experience.
(2) Representation of granular layers:
A granular layer (Layer) is composed of all the information granules obtained under a certain granulation criterion and the relationships between those granules. It can be formally represented as a 2-tuple, i.e. Layer = (IGS, Intra-LR), where IGS denotes the set of information granules IG in the layer (Information Granule Set, IGS), representable as IGS = {IG1, IG2, ..., IGm};
Intra-LR (Intra-Layer Relationships) denotes the relationships that may exist between information granules within the layer: if information granules IGp and IGq are related, then Intra-LR is representable as Intra-LR = {E | E = (IGp, IGq), IGp, IGq ∈ IGS}.
(3) Representation of the granular structure:
The granular structure in the MGrIKR is the topological structure formed by the multiple granular layers obtained under different granulation criteria, the correlations between information granules on different layers, and the correlations between information granules within the same layer. The formal representation of the granular structure is therefore similar to that of an information granule IG and a granular layer Layer; the granular structure (GranularStructure, GS) can also be expressed in tuple form, i.e.
GS = (LS, Inter-LR),
where LS = {Layer1, ..., Layerm-1, Layerm} denotes the set of m granular layers (Layer Set, LS), each Layerj being one granular layer in the granular structure. Inter-LR (Inter-Layer Relationships) denotes the set of transformation relations between the information granules of two layers Layerj and Layerk; Inter-LR can be expressed as
Inter-LR = {r | r(Layerj, Layerk)},
or
Inter-LR = {r | r(IGj, IGk), IGj ∈ IGSj, IGk ∈ IGSk}.
Here r denotes the partial-order relation satisfied between the information granules of Layerj and Layerk, j, k = 1, ..., m, where r may be a relation between information granules in two adjacent layers or a relation between information granules across layers.
Granulating the big data is then, referring to the formal representations of the information granules, the granular layers, and the granular structure, computing each element of each tuple.
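The three formal tuples above, IG = (KVS, GM, VM), Layer = (IGS, Intra-LR), and GS = (LS, Inter-LR), can be sketched as plain data structures; the concrete field types and the example granules are assumptions for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class InformationGranule:
    kvs: dict   # key-value pair set describing the granule's features
    gm: float   # granularity measure (fineness)
    vm: float   # value measure

@dataclass
class Layer:
    igs: list                                    # information granules on this layer
    intra_lr: set = field(default_factory=set)   # pairs (p, q) of related granule indices

@dataclass
class GranularStructure:
    ls: list                                     # granular layers, coarse to fine
    inter_lr: set = field(default_factory=set)   # cross-layer relations (j, k)

fine = Layer(
    igs=[InformationGranule({"scene": "lobby"}, gm=0.1, vm=0.9),
         InformationGranule({"scene": "exit"}, gm=0.1, vm=0.8)],
    intra_lr={(0, 1)})
coarse = Layer(igs=[InformationGranule({"scene": "indoor"}, gm=0.8, vm=0.6)])
gs = GranularStructure(ls=[coarse, fine], inter_lr={(0, 1)})
assert len(gs.ls) == 2 and (0, 1) in gs.ls[1].intra_lr
```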
3) Parallelized/incremental granular-structure update and problem solving:
The "velocity" characteristic of big data requires that its analysis be fast and that the response actions taken be timely. The currently available technical solutions are mainly parallelized computation and incremental updating, where parallel computation includes using distributed parallel computing platforms, using the multiple parallel computing units of multi-core CPUs, and cooperative computation using GPUs. When a small part of the data in a large-scale data set changes, the idea of incremental updating is used to maintain the entire MGrIKR and correct the problem-solving results based on the MGrIKR, ensuring the timeliness of big data analysis. The following analyzes two aspects: the timeliness of information-granule updates and the timeliness of problem solving.
(1) Timeliness of information-granule updates: dynamic updating of the multi-source heterogeneous granular structure.
Without loss of generality, the invention here considers the dynamic updating of the granular structure in the complex case (multi-source heterogeneous dynamic data streams); the remaining simpler cases follow similarly. First, an initial granular structure is established for each data source separately, and the initial granular structures are then integrated according to certain relationships, ultimately forming a global granular structure.
First step: formal description of granular-structure integration. To integrate two granular structures GSi = (LSi, Inter-LRi) and GSj = (LSj, Inter-LRj), a logical operation can be defined as a binary map f: GS × GS → GS, where GS is the entire problem domain, i.e. the set of granular structures. This binary map should satisfy the operation rule:
f(GSi, GSj) = (f1(LSi, LSj), f2(Inter-LRi, Inter-LRj)),
where the binary map f1 integrates the layers of the two granular structures, forming a new set of global granular layers, and the binary map f2 reintegrates the relation sets of the two granular structures. In the process of integrating the intra-layer and inter-layer relation sets of the information granules, the transformation-relation sets between different layers and between information granules in the same layer need to be integrated, including the merging, deletion, and updating of relations.
Second step: dynamic updating of each component granular structure. The dynamic update of a granular structure can be formalized as:
UpdateGS(GSi) = (UpdateL(LSi), UpdateR(Inter-LRi)),
where UpdateL is the dynamic update method for the granular layers, and UpdateR is the dynamic update method for the intra-layer and inter-layer relation sets of the information granules.
Third step: incremental updating of the global granular structure. From the dynamic update results of each data source, the update method of the global granular structure is designed, formally represented as
Update(globalGS) = Update(UpdateGS(GS1), UpdateGS(GS2), ..., UpdateGS(GSn)).
The dynamic real-time update mechanism of the multi-source heterogeneous granular structure is shown in Figure 3.
(2) Timeliness of problem solving: analysis of the application types suited to MGrIKR-based solving.
Since granular computing is inherently "non-precise", it cannot satisfy all types of big data processing demands. For suitable problem types, computation based on the granular structure can accelerate the solution process and guarantee timeliness; determining which types of big data problems are suited to granular-computing methods is therefore extremely important. Here the invention tentatively proposes two classes of problems as examples; further problem types can be found in further analysis work.
Example 1. Granular-space optimization problems: the layer-selection problem is described with optimization theory, determining the computational granularity that solves effectively, so as to obtain an effective solution in the shortest time.
Definition 1. The effectiveness of a solution can be defined by a 2-tuple SolutionEffectiveness = (GM(R), Tu), where R is the computed result, GM(R) is the granularity measure of that result, and Tu is the time-limit requirement. If a solution satisfies the granularity requirement GM(Ru) and is obtained in time less than Tu, then the solution is effective, called an effective solution.
In order to select a "suitable" layer from the domain-oriented granular structure for computation, thereby reducing the actual space-time cost of the computation, granular-space optimization must be carried out. Granular-space optimization is to find, among the m layers of the granular structure, the layer Layeri such that:
max GM(Layeri)
s.t.
GM(Ri = Solve(Layeri)) ≤ GM(Ru), Ti ≤ Tu, 1 ≤ i ≤ m,
where Ri and Ti are, respectively, the result solved on the i-th layer and the time spent. Consider the problem shown in Figure 4: the solution granularity on Layer3 meets the demand, but the time cannot satisfy the time constraint; the solution time on Layer1 can satisfy the time constraint, but the granularity of the solution is too coarse, so neither of these two layers yields an effective solution; the solution on Layer2 satisfies both the granularity requirement and the time constraint and is an effective solution.
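The granular-space optimization above (maximize GM(Layeri) subject to the granularity and time constraints) can be sketched as a feasibility filter plus an argmax; the numeric GM and time values are illustrative assumptions mirroring the Layer1/Layer2/Layer3 discussion:

```python
def select_layer(layers, gm_ru, t_u):
    # keep layers whose solution meets the granularity demand GM(Ri) <= GM(Ru)
    # and the time constraint Ti <= Tu, then pick the coarsest (max GM) layer
    feasible = [l for l in layers if l["gm_result"] <= gm_ru and l["time"] <= t_u]
    if not feasible:
        return None  # no effective solution exists in this granular structure
    return max(feasible, key=lambda l: l["gm_layer"])

layers = [
    {"name": "Layer1", "gm_layer": 3.0, "gm_result": 2.5, "time": 2},   # too coarse
    {"name": "Layer2", "gm_layer": 2.0, "gm_result": 1.8, "time": 5},   # effective
    {"name": "Layer3", "gm_layer": 1.0, "gm_result": 0.9, "time": 20},  # too slow
]
best = select_layer(layers, gm_ru=2.0, t_u=10)
assert best["name"] == "Layer2"
```

Maximizing GM picks the coarsest feasible layer, i.e. the cheapest computation that still meets the granularity demand within the time limit.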
Example 2. Human-machine cooperative progressive computation problems: in a decision system composed of people and a computing system, if the computation guiding the decision is decomposable, the decision can be refined step by step, and, starting from the current state, a more refined solution can be used to guide the next action. For problems of this type, a human-machine cooperative emergency-response model that "acts ahead of time" and "acts while computing" can be constructed. In adjacent granular layers, the lower-layer solution is a refinement of the upper-layer solution, denoted Ri-1 < Ri; each solution corresponds to the action step (ActionStep, AS) the user takes next, denoted Ri → ASi; and the action A corresponding to the entire decision is decomposable.
According to the number of action steps, the value of n is determined, that is, the solution stages and the parallel granularity are determined; then n suitable granular layers are screened out of the pre-established domain-oriented granular structure. The human-machine cooperative progressive problem-solving model is shown in Figure 5.
If the human-machine cooperative progressive computation is not used, Action-Step1 can only be executed starting from time point t3, and the final completion time of the entire decision and action is significantly delayed.
The invention will be further described below with reference to its effects.
The possibility of applying granular computing to big data processing, and the model framework, raise the following issues:
(1) Analyzing the granulation emphasis of big data: aiming at the "velocity" and "volume" of big data, theoretical analysis of the basic models and algorithms of granular computing is continued to obtain faster granulation methods. A common method for accelerating knowledge acquisition is incremental updating, and in recent years some good results have been achieved on the incremental updating of rough sets.
(2) Analyzing the use, under the big data environment, of the three granular-computing modes: granular-space optimization, granularity-level switching, and multi-granularity joint computation. For example, the human-machine cooperative emergency-response model of Example 2 can be converted into another problem-solving model, namely a precision-progressive solving model: starting from the coarsest-granularity solution, computation proceeds level by level toward finer granularity levels, and the user can obtain the finest-granularity solution currently available at any moment. The significance of this computation model is to obtain, with timeliness guaranteed as the premise, a non-precise solution with practical value.
(3) Analyzing and verifying the guiding role of the granular-computing processing framework under the big data environment: considering how granular-computing ideas are used in each link of big data processing, combining concrete granular-computing models with applications of data mining/machine learning algorithms, and, in combination with specific domain backgrounds and data-analysis requirements, using the framework to guide big data analysis in both analysis and practice, correcting and improving it according to new problems found during that guidance.
(4) Analyzing parallel implementation methods for processing big data problems with granular computing: closely following fast-developing IT infrastructure and software platforms, and developing the acceleration of parallel computation in granular-computing analysis of big data. For data-parallel, computation-intensive tasks, GPU+CPU high-performance computing-cluster solutions for granular computing are analyzed; for problems with huge data volume but strong overall data relevance and weak parallelism, processing methods on open-source platforms such as Hadoop and Spark/Storm are analyzed.
(5) Combining concrete application backgrounds, applying granular-computing-based big data processing methods in scientific analysis and engineering applications. For example, in a large-scale video surveillance system, after the surveillance video is granulated according to scene classification information, it is organized into a granular structure with scene semantics, so as to realize compressed storage and efficient retrieval of the surveillance video. These specific analysis tasks will continually enrich the theoretical models and technical means of granular-computing-based big data processing in this direction.
The invention analyzes the possibility of processing big data with granular computing, proposes a granular-computing-based big data processing framework, and reviews the relevant analysis foundations of the big data and granular-computing fields. Future work mainly consists of, in combination with specific application fields and analysis demands, analyzing the construction of the MGrIKR for big data and developing intelligent computation methods based on the big data MGrIKR.
Granular computing, as a computing paradigm, has played an important role in the field of intelligent information processing, and applying it to big data analysis has guiding significance.
In the above embodiments, implementation may be wholly or partly by software, hardware, firmware, or any combination thereof. When implemented wholly or partly in the form of a computer program product, the computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the processes or functions according to the embodiments of the invention are wholly or partly generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one web site, computer, server, or data center to another web site, computer, server, or data center by wired (such as coaxial cable, optical fiber, or Digital Subscriber Line (DSL)) or wireless (such as infrared, radio, or microwave) means. The computer-readable storage medium may be any usable medium that the computer can access, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (such as a Solid State Disk (SSD)), etc.
The foregoing is merely a description of preferred embodiments of the invention and is not intended to limit the invention; any modifications, equivalent replacements, and improvements made within the spirit and principles of the invention shall be included in the protection scope of the invention.
Claims (10)
1. A batch processing method of computer big data, characterized in that the batch processing method of the computer big data comprises: analyzing big data with an analysis program through an analysis module; specifically:
processing the 3V characteristics of the big data in the following order: variety → volume → velocity;
eliminating the variety of the distributed, heterogeneous data by data filtering, data integration, extraction, and granulation, obtaining more standardized data tables and removing the uncertainty therein;
granulating the raw data, using the concrete "gamp" model and techniques under granular computing, into grains of suitable size, reducing the data scale, and constructing the corresponding granular layers and the structure on each layer;
carrying out data mining or machine learning on the information granules with the aid of other machine learning methods;
recasting the data mining or machine learning used into distributed, online, incremental-learning versions to meet the timeliness requirements of big data processing;
when processing big data, freely switching granularity, considering the decomposition and merging of grains on multiple granularity levels and the rapid construction of the corresponding solutions; for certain specific problems, using the information of multiple granularity levels and a "cross-granularity" mechanism to solve them;
analyzing, from the entire processing flow, whether the raw data has a suitable granularity, providing guidance on whether and how to adjust the generation or acquisition of the raw data;
borrowing the idea of deep learning, organizing the key processing flow into many levels, allowing design parameters to be optimized through learning, and optimizing the final learning outcome.
2. The batch processing method of computer big data according to claim 1, characterized in that analyzing the big data specifically comprises: data acquisition → extraction/cleaning → integration/representation → analysis/modeling → interpretation;
wherein:
1) data source selection and data integration:
handling heterogeneous data, for data source selection, using dimensionality reduction, data condensation, and data wrapping;
2) domain-oriented granulation: converting the input of the problem from finest-granularity raw data into an information-granule representation, greatly reducing the data volume while retaining the information and value contained in the data; before specific data-analysis requirements are put forward, first constructing the raw data, according to domain knowledge, into a Multi-Granular Information/Knowledge Representation model (MGrIKR); granulation first analyzes the representation of information granules, granular layers, and the entire granular structure, and then carries out the construction for that representation;
wherein, the representation of an information granule: formally describing the information granule with a triple, IG = (KVS, GM, VM), where KVS (Key-Value pair Set) denotes the feature sub-vector describing the information granule, called the key-value pair set, i.e. KVS = {⟨key1, value1⟩, ..., ⟨keyn, valuen⟩}, valuei denoting the value taken by the feature named keyi in the information granule, i = 1, 2, ..., n; GM denotes the Granularity Measure of the information granule, i.e. its fineness; VM denotes the Value Measure of the information granule;
the representation of a granular layer: a granular layer is composed of all the information granules obtained under a certain granulation criterion and the relationships between those granules; it is formally represented as a 2-tuple, Layer = (IGS, Intra-LR); wherein IGS denotes the set of information granules IG in the layer (Information Granule Set, IGS), representable as IGS = {IG1, IG2, ..., IGm};
Intra-LR (Intra-Layer Relationships) denotes the relationships existing between information granules within the layer; if information granules IGp and IGq are related, Intra-LR is representable as Intra-LR = {E | E = (IGp, IGq), IGp, IGq ∈ IGS};
the representation of the granular structure: the granular structure in the MGrIKR is the topological structure formed by the multiple granular layers obtained under different granulation criteria, the correlations between information granules on different layers, and the correlations between information granules within the same layer; the formal representation of the granular structure is similar to that of the information granule IG and the granular layer Layer, the granular structure (GranularStructure, GS) being expressed in tuple form,
GS = (LS, Inter-LR);
wherein LS = {Layer1, ..., Layerm-1, Layerm} denotes the set of m granular layers (Layer Set, LS), each Layerj being one granular layer in the granular structure; Inter-LR (Inter-Layer Relationships) denotes the set of transformation relations between the information granules of two layers Layerj and Layerk, Inter-LR being expressed as
Inter-LR = {r | r(Layerj, Layerk)},
or
Inter-LR = {r | r(IGj, IGk), IGj ∈ IGSj, IGk ∈ IGSk};
r denotes the partial-order relation satisfied between the information granules of Layerj and Layerk, j, k = 1, ..., m, where r is a relation between information granules in two adjacent layers or a relation between information granules across layers.
3. The batch processing method of computer big data according to claim 1, characterized in that the batch processing method of computer big data specifically includes:
Step 1: the data input module inputs customer data through a data input device;
Step 2: the main control module dispatches the data resources to be processed through the resource scheduling module using a scheduling algorithm; for load scheduling under a big data environment, the resource scheduling module uses the Min-Min scheduling algorithm, whose specific steps are:
(1) judge whether the task set is empty; if it is not empty, proceed to (2), otherwise go to (6) and the scheduling ends;
(2) for each task in the task set, compute the execution time of mapping it onto every virtual machine, obtaining a matrix;
(3) from the result of (2), find the virtual machine corresponding to the task with the smallest completion time;
(4) assign that task to the virtual machine, and delete the task from the task set;
(5) update the matrix and return to (1);
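The Min-Min loop above can be sketched as follows; the two-task, two-VM example data are hypothetical:

```python
def min_min_schedule(tasks, vms, exec_time):
    """Min-Min sketch: repeatedly assign the task whose earliest
    completion time is smallest, then update the VM's ready time."""
    tasks = list(tasks)                      # copy of the task set
    ready = {v: 0.0 for v in vms}            # per-VM ready time
    assignment = {}
    while tasks:                             # (1) stop when the set is empty
        # (2) completion-time matrix for the remaining tasks
        matrix = {(t, v): ready[v] + exec_time[(t, v)]
                  for t in tasks for v in vms}
        # (3) task/VM pair with the smallest completion time
        task, vm = min(matrix, key=matrix.get)
        # (4) assign the task and delete it from the task set
        assignment[task] = vm
        tasks.remove(task)
        ready[vm] += exec_time[(task, vm)]   # (5) update, loop back to (1)
    return assignment

# Hypothetical execution times for 2 tasks on 2 virtual machines
times = {("t1", "vm1"): 3, ("t1", "vm2"): 4,
         ("t2", "vm1"): 2, ("t2", "vm2"): 8}
result = min_min_schedule(["t1", "t2"], ["vm1", "vm2"], times)
```

Because t2 finishes earliest on vm1, it is placed first; t1 then goes to vm2, where its completion time (4) beats waiting behind t2 on vm1 (5).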
Step 3: the batch processing execution module dispatches, through a batch processing program, the processes and jobs to be batch-processed; the encryption module performs encryption operations on the big data through an encryption program;
Step 4: the analysis module analyzes the big data through an analysis program; the analysis method of the analysis module includes:
(1) storing the big data into a distributed database in time slices, and encrypting the data content in the database;
(2) setting up in the distributed database a raw-data temporary table and an index table for caching the big data, the index table recording the location of the corresponding big data in the raw-data temporary table;
(3) when performing big data analysis, according to the location, stored in the server's index table, of the corresponding big data in the raw-data temporary table, the main control module rapidly decrypts the encrypted data, retrieves the big data from the raw-data temporary table for analysis, and stores the analysis result into the distributed database;
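The three analysis steps above (time-sliced encrypted storage, an index table over a raw-data temporary table, and decrypt-on-lookup) can be sketched as follows; the XOR "cipher" and the size-counting "analysis" are stand-ins, not the claim's actual encryption program or analytics:

```python
import time

class BigDataAnalyzer:
    """Sketch of analysis steps (1)-(3): encrypted time-sliced storage
    with an index table pointing into a raw-data temporary table."""

    def __init__(self, key: int):
        self.key = key
        self.temp_table = {}   # raw-data temporary table: location -> ciphertext
        self.index_table = {}  # index table: data id -> location in temp table

    def _xor(self, data: bytes) -> bytes:
        # Toy symmetric cipher; encryption and decryption are the same op.
        return bytes(b ^ self.key for b in data)

    def store(self, data_id: str, payload: bytes):
        # step (1): store by time slice, encrypting the content
        location = (data_id, int(time.time()))
        self.temp_table[location] = self._xor(payload)
        # step (2): the index table records the payload's location
        self.index_table[data_id] = location

    def analyze(self, data_id: str) -> int:
        # step (3): look up the location, decrypt, then analyze
        location = self.index_table[data_id]
        plaintext = self._xor(self.temp_table[location])
        return len(plaintext)  # placeholder "analysis": payload size

an = BigDataAnalyzer(key=0x5A)
an.store("sensor-1", b"batch payload")
print(an.analyze("sensor-1"))  # 13
```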
Step 5: the data storage module stores the big data resources through a memory;
Step 6: the display module displays the big data information content through a display;
The encryption method of the encryption module is as follows:
(1) after receiving the target big data, process the target big data according to preset rules, and determine whether the target big data is to be encrypted;
(2) if so, form a key request for the target big data, and place the key request into a target queue;
(3) take the key requests out of the target queue in turn, and submit requests to the big data key production module to create data encryption keys;
(4) receive the encryption key information issued by the key production module, and encrypt the big data according to the encryption key information.
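The four-step encryption workflow above can be sketched with a FIFO queue of key requests; the key production module and its key material here are hypothetical stand-ins:

```python
from collections import deque

class KeyProductionModule:
    """Stand-in for the big data key production module."""
    def __init__(self):
        self._counter = 0
    def create_key(self) -> bytes:
        self._counter += 1
        return self._counter.to_bytes(4, "big")  # hypothetical key material

def encrypt_pipeline(blocks, needs_encryption, kpm):
    """Steps (1)-(4): queue one key request per block that needs
    encryption, drain the queue FIFO, and tag each block with its key
    (the tagging stands in for the actual encryption step)."""
    queue = deque()
    for i, block in enumerate(blocks):          # (1) decide per block
        if needs_encryption(block):
            queue.append(i)                     # (2) enqueue a key request
    encrypted = list(blocks)
    while queue:                                # (3) FIFO take-out
        i = queue.popleft()
        key = kpm.create_key()                  # request key material
        encrypted[i] = (key, blocks[i])         # (4) "encrypt" with the key
    return encrypted

out = encrypt_pipeline([b"public", b"secret"],
                       lambda b: b == b"secret",
                       KeyProductionModule())
```

Only the block flagged for encryption is paired with a key; the other passes through untouched.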
4. The batch processing method of computer big data according to claim 1, characterized in that, after receiving the target big data, processing the target big data according to preset rules and determining whether the target big data is to be encrypted includes:
after receiving the target big data, partitioning the target big data into blocks according to the data partitioning rules, and determining separately, for each block of the partitioned target big data, whether it is to be encrypted;
and that forming a key request for the target big data if so, and placing the key request into the target queue, includes:
if so, forming a key request for each block of the target big data that is to be encrypted, and placing the key requests into the target queue.
5. The batch processing method of computer big data according to claim 1, characterized in that taking the key requests out of the target queue in turn and submitting requests to the big data key production module to create data encryption keys includes:
taking the key requests out of the target queue in turn according to the first-in-first-out principle, and submitting requests to the big data key production module to create data encryption keys;
the encryption information includes the information of the initial key; when a single block key leaks, a new initial key is used to generate keys to re-encrypt the block whose key leaked, and the initial key and the block encryption key information in the encryption information table are updated; when computing the one-way function, the information of a key-change counter N is added, so that the block symmetric key generation function is M(F(K, A, f(N))), and the encryption information table additionally contains the key-change information N on top of its previous contents;
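A minimal sketch of the block symmetric key generation function M(F(K, A, f(N))), assuming HMAC-SHA256 as the one-way function F, an 8-byte big-endian encoding as f, and truncation to the cipher key size as M (these concrete choices are assumptions, not stated in the claim):

```python
import hashlib
import hmac

def block_key(initial_key: bytes, block_addr: bytes, change_counter: int) -> bytes:
    """Derive a per-block symmetric key as M(F(K, A, f(N))):
    f(N) encodes the key-change counter, F is a keyed one-way function
    (HMAC-SHA256), and M truncates to a 128-bit block-cipher key."""
    f_n = change_counter.to_bytes(8, "big")           # f(N)
    digest = hmac.new(initial_key, block_addr + f_n,  # F(K, A, f(N))
                      hashlib.sha256).digest()
    return digest[:16]                                # M: truncate to 128 bits

# If one block's key leaks, bumping N re-keys just that block, and the
# (initial key, N) pair is updated in the encryption information table.
k1 = block_key(b"initial-key", b"block-0007", change_counter=1)
k2 = block_key(b"initial-key", b"block-0007", change_counter=2)
```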
The distributed database is an HBase database;
before storing the big data into the distributed database, the method further includes integrity verification and legitimacy verification of the big data, where the integrity verification is completed by Redis in the network system, and, after it passes, the big data is sent to the server locally to complete the legitimacy verification;
the manner in which the raw-data temporary table caches the big data is as follows: the row key (rowkey) is set using the remote-procedure-call trace identifier traceID, the entry method name entrace, and the time; the column name is set to an arbitrary value; and the value of the key-value pair is spliced from the spanID and the big data value roleID;
storing the big data into HBase includes: setting the rowkey using the traceID, the entry method name, and the time; setting the column name to an arbitrary value; and splicing the value of the key-value pair from the spanID and the big data value roleID.
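The rowkey and value layout described above can be sketched as follows; the separator characters and field values are assumptions, since the claim only names the fields:

```python
def make_rowkey(trace_id: str, entry_method: str, ts: int) -> str:
    """Rowkey = traceID + entry method name + time, as described above."""
    return f"{trace_id}:{entry_method}:{ts}"

def make_value(span_id: str, role_id: str) -> str:
    """Value of the key-value pair = spanID spliced with roleID."""
    return f"{span_id}|{role_id}"

# Hypothetical cached row for the raw-data temporary table
row = {
    "rowkey": make_rowkey("trace-42", "ingest", 1540512000),
    "column": "c",  # the column name may be an arbitrary value
    "value": make_value("span-7", "role-3"),
}
```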
6. A computer program implementing the batch processing method of computer big data according to any one of claims 1 to 5.
7. A terminal, characterized in that the terminal at least carries a server implementing the batch processing method of computer big data according to any one of claims 1 to 5.
8. A computer-readable storage medium, including instructions which, when run on a computer, cause the computer to execute the batch processing method of computer big data according to any one of claims 1 to 5.
9. A batch processing system of computer big data implementing the batch processing method of computer big data according to claim 1, characterized in that the batch processing system of computer big data includes:
a data input module, connected to the main control module, for inputting customer data through a data input device;
a main control module, connected to the data input module, the resource scheduling module, the batch processing execution module, the encryption module, the analysis module, the data storage module, and the display module, for controlling each module to work normally through a single-chip microcomputer;
a resource scheduling module, connected to the main control module, for dispatching the data resources to be processed through a scheduling algorithm;
a batch processing execution module, connected to the main control module, for dispatching, through a batch processing program, the processes and jobs to be batch-processed;
an encryption module, connected to the main control module, for performing encryption operations on the big data through an encryption program;
an analysis module, connected to the main control module, for analyzing the big data through an analysis program;
a data storage module, connected to the main control module, for storing the big data resources through a memory;
a display module, connected to the main control module, for displaying the big data information content through a display.
10. An enterprise IT service device at least carrying the batch processing system of computer big data according to claim 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811257472.XA CN109522742A (en) | 2018-10-26 | 2018-10-26 | A kind of batch processing method of computer big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109522742A true CN109522742A (en) | 2019-03-26 |
Family
ID=65772997
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811257472.XA Pending CN109522742A (en) | 2018-10-26 | 2018-10-26 | A kind of batch processing method of computer big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109522742A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110135433A (en) * | 2019-05-07 | 2019-08-16 | 宏图物流股份有限公司 | A kind of representation data availability judgment method recommended based on vehicle |
CN111556098A (en) * | 2020-04-08 | 2020-08-18 | 深圳供电局有限公司 | Artificial intelligence based analysis system and analysis method for internet of things data |
CN111897828A (en) * | 2020-07-31 | 2020-11-06 | 广州视源电子科技股份有限公司 | Data batch processing implementation method, device, equipment and storage medium |
CN112090097A (en) * | 2020-08-06 | 2020-12-18 | 浙江大学 | Performance analysis method and application of traditional Chinese medicine concentrator |
CN112181965A (en) * | 2020-09-29 | 2021-01-05 | 成都商通数治科技有限公司 | MYSQL-based big data cleaning system and method for writing bottleneck into MYSQL-based big data cleaning system |
CN112307126A (en) * | 2020-11-24 | 2021-02-02 | 上海浦东发展银行股份有限公司 | Batch processing method and system for credit card account management data |
CN117610896A (en) * | 2024-01-24 | 2024-02-27 | 青岛创新奇智科技集团股份有限公司 | Intelligent scheduling system based on industrial large model |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106021484A (en) * | 2016-05-18 | 2016-10-12 | 中国电子科技集团公司第三十二研究所 | Customizable multi-mode big data processing system based on memory calculation |
CN107592295A (en) * | 2017-08-01 | 2018-01-16 | 佛山市深研信息技术有限公司 | A kind of encryption method of big data |
CN108268468A (en) * | 2016-12-30 | 2018-07-10 | 北京京东尚科信息技术有限公司 | The analysis method and system of a kind of big data |
CN108460489A (en) * | 2018-03-15 | 2018-08-28 | 重庆邮电大学 | A kind of user behavior analysis based on big data technology and service recommendation frame |
CN108519914A (en) * | 2018-04-09 | 2018-09-11 | 腾讯科技(深圳)有限公司 | Big data computational methods, system and computer equipment |
Non-Patent Citations (2)
Title |
---|
Zhou Zhou: "Energy Consumption Optimization and Management Technology in Cloud Environments", 31 August 2018, Hunan University Press *
Xu Ji et al.: "Big Data Processing Based on Granular Computing", Chinese Journal of Computers *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109522742A (en) | A kind of batch processing method of computer big data | |
US11526333B2 (en) | Graph outcome determination in domain-specific execution environment | |
US20210073282A1 (en) | Graph-manipulation based domain-specific execution environment | |
CN110084377B (en) | Method and device for constructing decision tree | |
Shi et al. | Concept-cognitive learning model for incremental concept learning | |
CN104580163B (en) | Access control policy builds system under privately owned cloud environment | |
US11403347B2 (en) | Automated master data classification and curation using machine learning | |
CN103336790A (en) | Hadoop-based fast neighborhood rough set attribute reduction method | |
US10909114B1 (en) | Predicting partitions of a database table for processing a database query | |
CN103336791A (en) | Hadoop-based fast rough set attribute reduction method | |
CN114626807A (en) | Nuclear power scene management method, system, device, computer equipment and storage medium | |
Venkatraman et al. | Big data infrastructure, data visualisation and challenges | |
Wang et al. | Flint: A platform for federated learning integration | |
Wang et al. | Comparison of representative heuristic algorithms for multi-objective reservoir optimal operation | |
CN116258309A (en) | Business object life cycle management and tracing method and device based on block chain | |
CN110322153A (en) | Monitor event processing method and system | |
Priyanka et al. | Fundamentals of wireless sensor networks using machine learning approaches: Advancement in big data analysis using Hadoop for oil pipeline system with scheduling algorithm | |
Jia et al. | Development model of enterprise green marketing based on cloud computing | |
US10089475B2 (en) | Detection of security incidents through simulations | |
CN110837657B (en) | Data processing method, client, server and storage medium | |
CN115358728A (en) | ERP data processing method based on cloud computing | |
Zhou et al. | A compliance-based architecture for supporting GDPR accountability in cloud computing | |
Sarathchandra et al. | Resource aware scheduler for distributed stream processing in cloud native environments | |
US20240112067A1 (en) | Managed solver execution using different solver types | |
Ashfaq et al. | Towards a Trustworthy and Efficient ETL Pipeline for ATM Transaction Data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20190326 |