CN109522742A - A kind of batch processing method of computer big data - Google Patents
A kind of batch processing method of computer big data
- Publication number
- CN109522742A (application number CN201811257472.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- big data
- information
- module
- key
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/03—Data mining
Abstract
The invention belongs to the field of big data batch processing systems and discloses a batch processing method of computer big data. Customer data is input through a data input module using a data input device. A main control module dispatches the data resources to be processed through a resource scheduling module using a scheduling algorithm; the resource scheduling module uses the Min-Min scheduling algorithm for load scheduling in a big data environment. A batch processing execution module uses a batch program to dispatch the processor to batch-process the pending process jobs; an encryption module encrypts the big data using an encryption program; an analysis module analyzes the big data using an analysis program; a data storage module stores big data resources using a memory; and a display module shows the big data information using a display. The present invention does not need to fetch big data from the massive big data in a distributed database, so it consumes little time and is easily achieved.
Description
Technical field
The invention belongs to the field of big data batch processing systems, and in particular relates to a batch processing method of computer big data.
Background technique
Big data includes structured, semi-structured and unstructured data, and unstructured data increasingly makes up the majority. According to an IDC survey report, 80% of enterprise data is unstructured, and this data grows exponentially by 60% every year. Big data is simply a manifestation or feature of the current stage of Internet development; there is no need to mythologize it or hold it in awe. Against the backdrop of the technological innovation represented by cloud computing, data that once seemed difficult to collect and use is becoming easy to exploit, and through continuous innovation in all trades and professions, big data will gradually create more value for mankind. Big data analysis arose from IT management: enterprises can combine real-time stream analysis with historical data, analyze the big data, and find the models they need, which in turn helps predict and prevent future outages and performance problems. Furthermore, they can use big data to understand usage models and geographic trends, deepening insight into key customers; they can also track and record network behavior so that big data easily identifies service impact, accelerate profit with a deeper understanding of service growth, and collect data across multiple systems to develop an IT service catalogue. However, traditional big data security protection technology cannot protect the sensitive information and sensitive data inside a big data platform; meanwhile, big data analysis suffers from long processing times and is not easily accomplished.
In conclusion, the problems in the prior art are:
(1) Traditional big data security protection technology cannot protect the sensitive information and sensitive data inside a big data platform, which easily causes data leakage and losses to users.
(2) When big data is analyzed, the analysis time is long, working efficiency is low, the analysis is not easily accomplished, and batch analysis errors easily occur.
(3) Traditional resource scheduling modules schedule massive data resources at a low rate, so the batch processing rate is slow and much time is wasted.
When analyzing big data, the prior art does not use granular computing methods to analyze non-precise solutions of big data problems, that is, converting the input of the problem from the finest-granularity raw data into an information-granule representation, greatly reducing the data volume on the premise of retaining the information and value contained in the data.
Summary of the invention
In view of the problems in the prior art, the present invention provides a batch processing method of computer big data.
The invention is realized as follows: a batch processing method of computer big data comprises analyzing big data with an analysis program through an analysis module; specifically:
The 3V characteristics of big data are processed in the following order: variety → volume → velocity;
Diverse, heterogeneous data in distributed storage is converted, using data filtering, data integration, extraction and granulation, into more standardized data tables, eliminating the uncertainty therein;
Using the concrete models and techniques of granular computing, the original data is granulated into granules of suitable size, reducing the data scale and constructing the corresponding granular layers and the structure on each layer;
With the aid of other machine learning methods, data mining or machine learning is carried out on the information granules;
The data mining or machine learning used is transformed into a distributed, online incremental-learning version to meet the timeliness requirements of big data processing;
When processing big data, granularity is switched freely: granules on multiple granularity levels need to be decomposed and merged, and the corresponding solutions rapidly constructed; for certain particular problems that need information from multiple granularity levels, a "cross-granularity" mechanism is used;
From the view of the entire treatment process, whether the original data has a suitable granularity is analyzed, providing guidance on whether and how to adjust the generation or acquisition of the original data;
Drawing on the ideas of deep learning, the key process flow is arranged into many levels, so that design parameters are optimized during learning and the final learning outcome is optimized.
Further, analyzing the big data specifically includes: data acquisition → extraction/cleaning → integration/representation → analysis/modeling → interpretation;
Wherein:
1) Data source processing and data integration:
Heterogeneous data is processed using dimensionality reduction, data enrichment and data encapsulation of the data sources;
2) Domain-oriented granulation: the input of the problem is converted from the finest-granularity raw data into an information-granule representation, greatly reducing the data volume on the premise of retaining the information and value contained in the data; before specific data analysis requirements are proposed, the original data is first constructed, according to domain knowledge, into a multi-granular information knowledge representation model (Multi-Granular Information/Knowledge Representation model, MGrIKR);
Granulation first analyzes the representation of information granules, granular layers and the whole granular structure, and then constructs them for the representation method;
Wherein, the representation of an information granule: an information granule is formally described by a triple, IG = (KVS, GM, VM). KVS (Key-Value pair Set) denotes the feature sub-vector describing the information granule, called the key-value pair set, i.e. KVS = {<key1, value1>, ..., <keyn, valuen>}, where valuei denotes the value taken by the feature named keyi in the information granule, i = 1, 2, ..., n. GM denotes the granularity measure (Granularity Measure) of the information granule, i.e. its fineness. VM denotes the value measure (Value Measure) of the information granule;
The representation of a granular layer: a granular layer is constituted by all the information granules obtained under a certain granulation criterion and the relationships between the information granules; its formal representation is a binary tuple, Layer = (IGS, Intra-LR). IGS (Information Granule Set) denotes the set of information granules IG in the layer, representable as IGS = {IG1, IG2, ..., IGm}; Intra-LR (Intra-Layer Relationships) denotes the relationships existing between information granules within the layer: if information granules IGp and IGq are related, Intra-LR is representable as Intra-LR = {E | E = (IGp, IGq), IGp, IGq ∈ IGS};
The representation of the granular structure: the granular structure in MGrIKR is the topological structure constituted by the multiple granular layers obtained under different granulation criteria, the correlations between information granules in different layers, and the correlations between information granules in the same layer; its formal representation is similar to those of the information granule IG and the granular layer Layer, denoting the granular structure as a tuple (GranularStructure, GS), GS = (LS, Inter-LR);
Wherein LS = {Layer1, ..., Layerm-1, Layerm} denotes the set of m granular layers (Layer Set, LS), in which granular layer Layerj is one layer of the granular structure. Inter-LR (Inter-Layer Relationships) denotes the set of transformational relations between the information granules of two granular layers Layerj and Layerk; Inter-LR is expressed as
Inter-LR = {r | r(Layerj, Layerk)},
or
Inter-LR = {r | r(IGj, IGk), IGj ∈ IGSj, IGk ∈ IGSk};
r denotes the partial-order relation satisfied between the information granules of Layerj and Layerk, j, k = 1, ..., m, where r is either a relation between information granules in two adjacent layers, or a relation between information granules across layers.
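The three tuples above — IG = (KVS, GM, VM), Layer = (IGS, Intra-LR) and GS = (LS, Inter-LR) — can be modeled directly as data structures. The following is a minimal Python sketch under stated assumptions: the field names, the numeric types of GM and VM, and the layer ordering are choices of the sketch, not specified by the patent.

```python
from dataclasses import dataclass, field

@dataclass
class InformationGranule:
    kvs: dict    # key-value pair set {key_i: value_i}
    gm: float    # granularity measure: the fineness of the granule
    vm: float    # value measure of the granule

@dataclass
class Layer:
    igs: list                                   # information granules on this layer
    intra_lr: set = field(default_factory=set)  # intra-layer relations: pairs (p, q)

@dataclass
class GranularStructure:
    ls: list                                    # the m granular layers (order assumed)
    inter_lr: set = field(default_factory=set)  # inter-layer relations between layers

# A toy two-layer structure: two fine granules summarized by one coarse granule.
fine = Layer(igs=[InformationGranule({"temp": 21.3}, gm=1.0, vm=0.9),
                  InformationGranule({"temp": 21.8}, gm=1.0, vm=0.8)])
coarse = Layer(igs=[InformationGranule({"temp_avg": 21.55}, gm=0.5, vm=0.85)])
gs = GranularStructure(ls=[coarse, fine], inter_lr={(0, 1)})
```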
Further, the batch processing method of the computer big data specifically includes:
Step 1: customer data is input through the data input module using a data input device;
Step 2: the main control module dispatches the data resources to be processed through the resource scheduling module using a scheduling algorithm; the resource scheduling module uses the Min-Min scheduling algorithm for load scheduling in a big data environment, with the following specific steps:
(1) judge whether the task set is empty; if not empty, proceed to (2), otherwise go to (6);
(2) for each task in the set, compute its mapping onto all virtual machines together with the execution times, obtaining a matrix;
(3) from the result of (2), find the virtual machine corresponding to the task with the smallest completion time;
(4) allocate that task to the virtual machine, and delete the task from the set;
(5) update the matrix and return to (1);
Step 3: the batch processing execution module uses a batch program to dispatch the processor to batch-process the pending process jobs; the encryption module encrypts the big data using an encryption program;
Step 4: the analysis module analyzes the big data using an analysis program; the analysis method of the analysis module is:
(1) the big data is stored in a distributed database in time slices, and the data content in the database is encrypted;
(2) an original-data temporary table and an index table that cache the big data are set up in the distributed database; the index table holds the location information of the corresponding big data in the original-data temporary table;
(3) when big data analysis is carried out, according to the location information, stored in the index table on the server, of the corresponding big data in the original-data temporary table, the main control module rapidly decrypts the encrypted data, the big data is called from the original-data temporary table and analyzed, and the analysis results are stored in the distributed database.
Step 5: the data storage module stores big data resources using a memory;
Step 6: the display module shows the big data information using a display;
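The time-sliced, encrypted storage with a secondary index described in step 4 can be illustrated with in-memory dictionaries standing in for the distributed database. This is a sketch under stated assumptions: the XOR "cipher", the table layouts and the stand-in analysis are illustrative, not the patented scheme.

```python
KEY = 0x5A  # toy single-byte key standing in for the real cipher

def encrypt(data: bytes) -> bytes:
    return bytes(b ^ KEY for b in data)

decrypt = encrypt  # XOR with a fixed key is its own inverse

# (1) big data stored in time slices, encrypted
raw_table = {}   # original-data temporary table: location -> encrypted slice
index = {}       # index table: data id -> locations in the temporary table
results = {}     # analysis result table

def store(data_id, slices):
    for i, s in enumerate(slices):
        loc = (data_id, i)               # location info kept in the index table
        raw_table[loc] = encrypt(s)
        index.setdefault(data_id, []).append(loc)

# (3) analysis goes through the index: look up locations, decrypt,
#     analyze, and store the result back in the result table
def analyze(data_id):
    plain = b"".join(decrypt(raw_table[loc]) for loc in index[data_id])
    results[data_id] = len(plain)        # stand-in "analysis": total byte count
    return results[data_id]

store("trace-1", [b"abc", b"defg"])
print(analyze("trace-1"))  # → 7
```

Because the index table maps each data id straight to its slice locations, analysis never scans the full raw table — the point the abstract makes about avoiding retrieval from the massive data set.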
The encryption method of the encrypting module is as follows:
(1) after target big data is received, the target big data is processed according to preset rules, and whether the target big data is to be encrypted is determined;
(2) if so, a key request is formed for the target big data, and the key request is put into a target queue;
(3) key requests are taken out of the target queue in turn, and a request to produce a data encryption key is submitted to the big data key production module;
(4) the encryption key information issued by the key production module is received, and the big data is encrypted according to the encryption key information.
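Steps (1)-(4) amount to a first-in-first-out key-request pipeline. A minimal sketch follows; the `should_encrypt` rule, the XOR encryption and the key-production stub are assumptions of the sketch, not the patent's concrete mechanisms.

```python
from collections import deque
import hashlib

key_requests = deque()   # the target queue, consumed first-in first-out

def should_encrypt(block: bytes) -> bool:
    # stand-in preset rule: encrypt every non-empty block
    return len(block) > 0

def enqueue_blocks(blocks):
    # steps (1)-(2): examine each block, queue a key request if needed
    for i, block in enumerate(blocks):
        if should_encrypt(block):
            key_requests.append({"block_id": i, "block": block})

def produce_key(block_id: int) -> bytes:
    # stub standing in for the big data key production module
    return hashlib.sha256(b"initial-key|%d" % block_id).digest()

def process_queue():
    # steps (3)-(4): FIFO take-out, request a key, encrypt the block
    encrypted = {}
    while key_requests:
        req = key_requests.popleft()
        key = produce_key(req["block_id"])
        encrypted[req["block_id"]] = bytes(
            b ^ key[j % len(key)] for j, b in enumerate(req["block"]))
    return encrypted

enqueue_blocks([b"alpha", b"", b"gamma"])
out = process_queue()
print(sorted(out))  # → [0, 2]  (the empty block is skipped)
```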
Further, after the target big data is received, processing the target big data according to preset rules and determining whether the target big data is to be encrypted comprises:
after the target big data is received, the target big data is divided into blocks according to the block-processing rule of the data, and whether to encrypt is determined separately for each block of the divided target big data;
Said forming a key request for the target big data and putting the key request into the target queue comprises:
if so, a key request is formed for each block of the target big data that is to be encrypted, and the key request is put into the target queue.
Further, said taking key requests out of the target queue in turn and submitting a request to produce a data encryption key to the big data key production module comprises:
according to the first-in-first-out principle, key requests are taken out of the target queue in turn, and a request to produce a data encryption key is submitted to the big data key production module;
The encryption information includes the information of the initial key. When a single block key leaks, a key generated from a new initial key re-encrypts the block whose key leaked, and the initial key and the block encryption key information are updated in the encryption information table. In the one-way function calculation, the information of a key-change counter is added: the block symmetric key generation function is M(F(K, A, f(N))), where the encryption information table additionally contains the key-change information N on the basis of the foregoing;
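One way to read the generation function M(F(K, A, f(N))) is a keyed hash over the initial key K, a block address A and a one-way function of the key-change counter N; bumping N after a leak yields a fresh block key without changing K. The concrete choices of HMAC-SHA-256 for F, SHA-256 for M and f, and the byte encodings are assumptions of this sketch.

```python
import hashlib
import hmac

def f(n: int) -> bytes:
    # one-way function of the key-change counter N (assumed: SHA-256)
    return hashlib.sha256(n.to_bytes(8, "big")).digest()

def block_key(K: bytes, A: int, N: int) -> bytes:
    # M(F(K, A, f(N))): the inner keyed hash F binds K, the block
    # address A and f(N); the outer hash plays the role of M
    inner = hmac.new(K, A.to_bytes(8, "big") + f(N), hashlib.sha256).digest()
    return hashlib.sha256(inner).digest()

K = b"initial-key"
k0 = block_key(K, A=7, N=0)
k1 = block_key(K, A=7, N=1)   # bump the counter after a leak: fresh key
assert k0 != k1 and len(k0) == 32
```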
The distributed database is an HBase database;
Before the big data is stored into the distributed database, the method further includes integrity verification and legitimacy verification of the big data, wherein the integrity verification is completed by the redis in the network system, and after it passes, the big data is sent to the server, which locally completes the legitimacy verification;
The way the original-data temporary table caches the big data is: the row key (rowkey) is set using the remote-procedure-call trace identifier traceID, the entry method name entrace and the time; the column name is set to an arbitrary value; and the key value in the key-value pair is spliced from the spanID and the big data value roleID;
Storing the big data into HBase includes: the rowkey is set using the traceID, the entry method name and the time; the column name is set to an arbitrary value; and the key value in the key-value pair is spliced from the spanID and the big data value roleID.
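The rowkey and value construction described above can be sketched as plain string splicing. The separators, the time format and the field order are assumptions of the sketch; the patent only specifies which identifiers participate.

```python
from datetime import datetime, timezone

def make_rowkey(trace_id: str, entry_method: str, ts: datetime) -> str:
    # rowkey = traceID + entry method name + time (separator assumed)
    return "|".join([trace_id, entry_method, ts.strftime("%Y%m%d%H%M%S")])

def make_value(span_id: str, role_id: str) -> str:
    # key value spliced from the spanID and the big data value roleID
    return span_id + ":" + role_id

ts = datetime(2018, 10, 26, 12, 0, 0, tzinfo=timezone.utc)
rk = make_rowkey("trace-42", "getUser", ts)
val = make_value("span-7", "role-3")
print(rk, val)  # → trace-42|getUser|20181026120000 span-7:role-3
```

Leading the rowkey with the traceID keeps all rows of one remote-procedure call adjacent in HBase's lexicographic row ordering, which is what makes the index-table lookup of step 4 a narrow range read.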
Another object of the present invention is to provide a computer program realizing the batch processing method of the computer big data.
Another object of the present invention is to provide a terminal, the terminal at least carrying a server realizing the batch processing method of the computer big data.
Another object of the present invention is to provide a computer-readable storage medium, including instructions which, when run on a computer, cause the computer to execute the batch processing method of the computer big data.
Another object of the present invention is to provide a batch processing system of computer big data implementing the batch processing method of the computer big data, the batch processing system of the computer big data comprising:
a data input module, connected with the main control module, for inputting customer data through a data input device;
a main control module, connected with the data input module, the resource scheduling module, the batch processing execution module, the encrypting module, the analysis module, the data storage module and the display module, for controlling the normal working of each module through a single-chip microcomputer;
a resource scheduling module, connected with the main control module, for dispatching the data resources to be processed through a scheduling algorithm;
a batch processing execution module, connected with the main control module, for batch-processing the pending process jobs through a batch program dispatching the processor;
an encrypting module, connected with the main control module, for encrypting the big data through an encryption program;
an analysis module, connected with the main control module, for analyzing the big data through an analysis program;
a data storage module, connected with the main control module, for storing big data resources through a memory;
a display module, connected with the main control module, for showing the big data information through a display.
Another object of the present invention is to provide an enterprise IT service equipment at least carrying the batch processing system of the computer big data.
The advantages and positive effects of the present invention are as follows:
(1) With the encrypting module of the present invention, the code integrity of the big data platform can be verified through the invention; even if the big data platform is attacked by hackers and trojans, the invention can detect it and alert automatically. Even if the big data platform of the invention is encroached upon by attacks, viruses or trojans, the system integrity verification technology provided by the invention (hash algorithm technology) can accurately recover a system identical to the original, avoiding the leakage or loss of data.
(2) While the analysis module stores the big data in the distributed database in time slices, an original-data temporary table and an index table caching the big data are set up in the server's local cache, and the index table holds the location information of the corresponding big data in the original-data temporary table. When big data analysis is carried out, the big data is called directly from the original-data temporary table according to the index table on the server. Because a secondary index is used, the analysis result obtained when analyzing the big data is stored in the analysis result table of the distributed database, and the big data does not need to be fetched from the massive big data in the distributed database, so little time is consumed and the method is easily achieved. Further, the location information of the big data in the original-data temporary table is the remote-procedure-call information of the big data, which uniquely identifies and reflects the called process of the big data.
(3) The present invention applies an improved Min-Min scheduling algorithm in the resource scheduling module; through priority filtering and priority processing of multiple tasks, it meets the requirements of diverse computing tasks and large data volumes, improves the load-balancing degree and scheduling efficiency of resources, improves working efficiency, and saves time.
In the big data analysis of the present invention, granular computing methods are used to analyze non-precise solutions of big data problems: the input of the problem is converted from the finest-granularity raw data into an information-granule representation, greatly reducing the data volume on the premise of retaining the information and value contained in the data.
Granular computing, as a computing paradigm, has played an important role in the field of intelligent information processing, and applying it to big data analysis has a guiding function.
Description of the drawings
Fig. 1 is a flow chart of the batch processing method of the computer big data provided by the embodiment of the present invention.
Fig. 2 is a structural block diagram of the batch processing system of the computer big data provided by the embodiment of the present invention.
In the figures: 1, data input module; 2, main control module; 3, resource scheduling module; 4, batch processing execution module; 5, encrypting module; 6, analysis module; 7, data storage module; 8, display module.
Fig. 3 is a flow chart of the dynamic real-time update mechanism of the multi-source heterogeneous granular structure provided by the embodiment of the present invention.
Fig. 4 is a diagram of selecting a suitable granular layer that meets the granularity-measure demand and time constraints, provided by the embodiment of the present invention.
Fig. 5 is a diagram of the man-machine coordination alert response model provided by the embodiment of the present invention.
Specific embodiment
In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention is further elaborated below with reference to the embodiments. It should be appreciated that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
The application principle of the invention is further described below with reference to the accompanying drawings and specific embodiments.
As shown in Fig. 1, the batch processing method of computer big data provided by the invention comprises the following steps:
S101: customer data is input through the data input module using a data input device;
S102: the main control module dispatches the data resources to be processed through the resource scheduling module using a scheduling algorithm;
S103: the batch processing execution module uses a batch program to dispatch the processor to batch-process the pending process jobs; the encryption module encrypts the big data using an encryption program;
S104: the analysis module analyzes the big data using an analysis program;
S105: the data storage module stores big data resources using a memory;
S106: the display module shows the big data information using a display.
As shown in Fig. 2, the batch processing system of computer big data provided by the embodiment of the present invention comprises: a data input module 1, a main control module 2, a resource scheduling module 3, a batch processing execution module 4, an encrypting module 5, an analysis module 6, a data storage module 7 and a display module 8.
The data input module 1 is connected with the main control module 2 and is used for inputting customer data through a data input device;
The main control module 2 is connected with the data input module 1, the resource scheduling module 3, the batch processing execution module 4, the encrypting module 5, the analysis module 6, the data storage module 7 and the display module 8, and is used for controlling the normal working of each module through a single-chip microcomputer;
The resource scheduling module 3 is connected with the main control module 2 and is used for dispatching the data resources to be processed through a scheduling algorithm;
The batch processing execution module 4 is connected with the main control module 2 and is used for batch-processing the pending process jobs through a batch program dispatching the processor;
The encrypting module 5 is connected with the main control module 2 and is used for encrypting the big data through an encryption program;
The analysis module 6 is connected with the main control module 2 and is used for analyzing the big data through an analysis program;
The data storage module 7 is connected with the main control module 2 and is used for storing big data resources through a memory;
The display module 8 is connected with the main control module 2 and is used for showing the big data information through a display.
The encryption method of the encryption module 5 provided by the invention is as follows:
(1) after target big data is received, the target big data is processed according to preset rules, and whether the target big data is to be encrypted is determined;
(2) if so, a key request is formed for the target big data, and the key request is placed in a target queue;
(3) key requests are taken out of the target queue one by one, and a request to generate a data encryption key is submitted to a big data key generation module;
(4) the encryption key information issued by the key generation module is received, and the big data is encrypted according to the encryption key information.
In the invention, after the target big data is received, processing it according to preset rules and determining whether it is to be encrypted comprises:
after the target big data is received, partitioning it into blocks according to a data-blocking rule, and determining separately, for each block of the partitioned target big data, whether that block is to be encrypted.
Forming a key request for the target big data and placing the key request in the target queue comprises:
forming one key request for each block of the target big data that is to be encrypted, and placing the key request in the target queue.
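The per-block key-request flow described above (one request per encrypted block, placed in a target queue and served in arrival order by a key generation module) can be sketched as follows; the class and method names are illustrative assumptions, not part of the invention:

```python
from queue import Queue

class KeyGenerationModule:
    """Stand-in for the big data key generation module (hypothetical API)."""
    def __init__(self):
        self.counter = 0
    def generate_key(self, block_id):
        self.counter += 1
        return f"key-{self.counter}-for-{block_id}"

def enqueue_key_requests(blocks, needs_encryption, q):
    # one key request per block that is marked for encryption
    for block_id in blocks:
        if needs_encryption(block_id):
            q.put(block_id)

def serve_key_requests(q, keygen):
    # requests are served in first-in, first-out order
    keys = {}
    while not q.empty():
        block_id = q.get()
        keys[block_id] = keygen.generate_key(block_id)
    return keys

q = Queue()
enqueue_key_requests(["b0", "b1", "b2"], lambda b: b != "b1", q)
keys = serve_key_requests(q, KeyGenerationModule())
# b1 was not marked for encryption, so no key is generated for it
assert set(keys) == {"b0", "b2"}
```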
Taking key requests out of the target queue one by one and submitting a request to generate a data encryption key to the big data key generation module, as provided by the invention, comprises:
taking the key requests out of the target queue one by one according to the first-in, first-out principle, and submitting a request to generate a data encryption key to the big data key generation module.
The encryption information provided by the invention includes the information of an initial key. When the key of a single block is leaked, a new initial key is used to generate a key and re-encrypt the block whose key was leaked, and the initial key and the block encryption key information in the encryption information table are updated. For the one-way function calculation, the information of a key-change number N is added: the block symmetric key generation function is M(F(K, A, f(N))), and the encryption information table, on the basis of the foregoing, further includes the key-change information N.
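A minimal sketch of the block-key derivation M(F(K, A, f(N))) described above. The source does not specify F, f, or the meaning of A; here HMAC-SHA256 stands in for F, SHA-256 stands in for f and the outer M, and A is taken to be a block identifier, all of which are assumptions:

```python
import hashlib
import hmac

def f(n: int) -> bytes:
    # one-way transform of the key-change number N (SHA-256 is an assumed choice)
    return hashlib.sha256(n.to_bytes(8, "big")).digest()

def derive_block_key(K: bytes, A: bytes, N: int) -> bytes:
    # block symmetric key M(F(K, A, f(N))): F combines the initial key K,
    # the block identifier A, and f(N); here F is HMAC-SHA256 keyed with K
    inner = hmac.new(K, A + f(N), hashlib.sha256).digest()  # F(K, A, f(N))
    return hashlib.sha256(inner).digest()                   # outer M

k0 = b"initial-key"
a = b"block-7"
key_before = derive_block_key(k0, a, 0)
key_after = derive_block_key(k0, a, 1)  # bump N after a leak to get a new block key
assert key_before != key_after
```

Bumping N after a leak yields a fresh block key without distributing new key material for the unaffected blocks, which matches the table-update behaviour described above.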
The analysis method of the analysis module 6 provided by the invention is as follows:
(1) the big data is stored in a distributed database in time slices;
(2) a raw-data staging table that caches the big data, and an index table, are set up in the distributed database, the index table recording the location of the corresponding big data in the raw-data staging table;
(3) when big data analysis is carried out, the big data is called from the raw-data staging table according to the location recorded for it in the index table on the server, is analyzed, and the analysis result is stored in the distributed database.
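The three-step analysis flow above (time-sliced storage plus an index table recording each datum's location in the staging table) can be sketched as follows; the in-memory structures stand in for the distributed database and are assumptions:

```python
from collections import defaultdict

class StagingStore:
    """Minimal sketch: a time-sliced staging table plus an index table
    mapping a data id to its (time_slice, offset) location."""
    def __init__(self):
        self.staging = defaultdict(list)  # time_slice -> list of records
        self.index = {}                   # data_id -> (time_slice, offset)

    def put(self, data_id, time_slice, record):
        offset = len(self.staging[time_slice])
        self.staging[time_slice].append(record)
        self.index[data_id] = (time_slice, offset)

    def analyze(self, data_id, analyze_fn):
        # look up the location in the index table, fetch from staging, analyze
        time_slice, offset = self.index[data_id]
        record = self.staging[time_slice][offset]
        return analyze_fn(record)

store = StagingStore()
store.put("sensor-42", "2018-10-26T10", [3, 1, 2])
result = store.analyze("sensor-42", lambda xs: sum(xs) / len(xs))
assert result == 2.0
```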
The distributed database provided by the invention is an HBase database.
Before the big data is stored in the distributed database, the invention further carries out integrity verification and legality verification on the big data, wherein the integrity verification is completed by Redis in the network system, and, once it passes, the big data is sent to the server locally to complete the legality verification.
The manner in which the raw-data staging table caches the big data, as provided by the invention, is as follows:
the row key rowkey is set using the remote-procedure-call trace identifier traceID, the entry method name entrance, and the time; the column name is set to an arbitrary value; and the value in the key-value pair is spliced from the spanID and the big data value roleID.
Storing the big data in HBase, as provided by the invention, comprises: the rowkey is set using the traceID, the entry method name, and the time; the column name is set to an arbitrary value; and the value in the key-value pair is spliced from the spanID and the big data value roleID.
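A sketch of the rowkey and value splicing described above; the separator character and field order are assumptions, and a real deployment would write the row through an HBase client rather than a plain dict:

```python
def make_rowkey(trace_id: str, entry_method: str, timestamp: str) -> str:
    # rowkey spliced from traceID, the entry method name, and the time
    # (the "|" separator is an assumption)
    return f"{trace_id}|{entry_method}|{timestamp}"

def make_value(span_id: str, role_id: str) -> str:
    # key-value pair value spliced from spanID and the big data value roleID
    return f"{span_id}|{role_id}"

row = {
    "rowkey": make_rowkey("t-001", "getUser", "20181026T1030"),
    "column": "d",  # the column name may be an arbitrary value
    "value": make_value("span-7", "role-42"),
}
assert row["rowkey"] == "t-001|getUser|20181026T1030"
assert row["value"] == "span-7|role-42"
```

Leading with traceID keeps all rows of one call trace adjacent in HBase's lexicographic rowkey order, which is one plausible reason for this splicing order.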
The invention will be further described below with reference to a concrete analysis.
The invention analyzes the big data with the analysis program through the analysis module, which specifically includes:
For the characteristics of big data, a unified granular-computing solution framework for big data problems is proposed. The 3V characteristics of big data can be processed in the following order: variety → volume → velocity (of course, some data do not exhibit all three characteristics at once and should be treated according to the actual situation).
(1) The variety of the distributed, heterogeneous data is eliminated by data filtering, data integration, extraction, and granulation, converting it into more standardized data tables and removing the uncertainty therein.
(2) For the given problem, the concrete "gamp" model and techniques under granular computing are used to granulate the raw data into grains of suitable size, reducing the data scale, and the corresponding granular layers and the structure on each layer are constructed.
(3) With the aid of other machine learning methods, data mining or machine learning is carried out on the information granules.
(4) The methods used are recast into distributed, online, incremental-learning versions to meet the timeliness requirements of big data processing.
(5) When processing big data, freely switching granularity requires considering the decomposition and merging of grains on multiple granularity levels, and the rapid construction of the corresponding solutions; for certain specific problems, the information of multiple granularity levels must be considered simultaneously, and a "cross-granularity" mechanism is used to solve them.
(6) From the entire processing flow, it can be found whether the raw data has a suitable granularity, providing guidance on whether and how to adjust the generation or acquisition of the raw data.
(7) Borrowing the idea of deep learning (Deep Learning), the key processing flow is organized into many levels, allowing design parameters (such as grain size and the number of granular layers) to be optimized through learning, thereby optimizing the final learning outcome.
There is a specific correspondence with the big data processing flow (data acquisition → extraction/cleaning → integration/representation → analysis/modeling → interpretation). The arrow "data-source regulation instruction" in the lower right corner, which adjusts the data granularity (the accuracy and frequency of acquisition or generation) according to the analysis application of the previous stage, corresponds to "data acquisition"; "data source selection and data integration" corresponds to "extraction/cleaning"; "domain-oriented granulation" corresponds to the "integration/representation" of the data; the "granular-computing methodology model & other machine learning models" at the top and the "parallelized/incremental granular-structure update and problem solving" in the rounded rectangle at the upper right correspond to "analysis/modeling"; and since information granules inherently carry specific semantics, the process of analyzing with granulation and mining/learning models has a clear "interpretation".
1) Data source selection and data integration:
The first link of big data processing is to confirm which data may help solve the problem and which are irrelevant to the theme; McKinsey considers this one of the three key challenges of big data analysis.
The primitive form of big data generally has "variety", including syntactic heterogeneity and semantic heterogeneity. Syntactic heterogeneity preserves the atomicity of the data, with only naming differences or type inconsistencies, and such cases are easier to handle. Semantic heterogeneity involves differences in many aspects, such as data granularity and data type, and requires careful analysis, after which metadata is used to describe the raw data; for video data, for example, some applications only need some essential information (such as scene type and duration).
Regarding the processing of heterogeneous data, Pal discusses how to handle data heterogeneity in the data preprocessing phase; the methods mentioned include dimensionality reduction, data condensation, and data wrapping. Pedrycz describes, for heterogeneous data, how to carry out fuzzy clustering as a preparation stage of big data analysis. Data integration is essential, and research on data integration is relatively mature.
2) Domain-oriented granulation:
The non-precise solution of big data problems is analyzed with granular-computing methods, the goal being to convert the input of the problem from raw data at the finest granularity into an information-granule representation, greatly reducing the data volume while retaining the information and value contained in the data.
Domain-oriented granulation means that, before specific data-analysis requirements are put forward, the raw data is first organized, according to domain knowledge, into a Multi-Granular Information/Knowledge Representation model (MGrIKR). The significance of building the MGrIKR is to provide suitable computational input for a family of problems with different granularity requirements.
Granulation first requires analyzing the representation of information granules, granular layers, and the entire granular structure, and then carrying out the construction for that representation.
(1) Representation of information granules:
Borrowing the representation of manifolds in quotient-space theory, an information granule is formally described by a triple, i.e. IG = (KVS, GM, VM). KVS (Key-Value pair Set) denotes the feature sub-vector describing the information granule, called the key-value pair set, i.e. KVS = {⟨key1, value1⟩, ..., ⟨keyn, valuen⟩}, where valuei is the value taken by the feature named keyi in the information granule, i = 1, 2, ..., n. GM denotes the Granularity Measure of the information granule, i.e. its fineness. VM denotes the Value Measure of the information granule.
Data granulation proceeds in two directions: example granulation (examples/instances) and feature granulation (features/attributes). Feature granulation refers mainly to the screening and combination of features, for which kernel-function methods in machine learning can be borrowed. Example granulation can use the clustering techniques of data mining: first determine the similarity measure for the fine-grained data contained in an information-granule layer, then partition the universe so that the similarity between data inside the same information granule is maximal and the similarity between data of different information granules is minimal.
As for the expression of the granularity measure GM of an information granule, it can be further analyzed in combination with existing granularity measures, for example the granularity measure proposed by Yao, i.e.
GM(π) = Σi=1..m (|Xi| / |U|) · log|Xi|,
where π = {X1, X2, ..., Xm} is a partition of the universe U and each Xi is a subset of U. When the granularity is finest, i.e. each grain is a singleton set, GM(π) = 0; when the granularity is coarsest, i.e. the entire universe is one grain, GM(π) = log|U|. The granularity measure of information granules helps to find the suitable granular layer in the problem-solving process, i.e. the optimization of the granular space.
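Yao's granularity measure, with the boundary behaviour stated above (0 for the singleton partition, log|U| for the one-grain partition), can be checked numerically; the base-2 logarithm is an assumption:

```python
import math

def yao_granularity(partition, universe_size):
    """GM(pi) = sum over blocks Xi of (|Xi|/|U|) * log2|Xi|,
    for a partition pi of a universe U of universe_size elements."""
    return sum(len(x) / universe_size * math.log2(len(x)) for x in partition)

U = list(range(8))
finest = [[u] for u in U]   # every grain a singleton -> GM = 0
coarsest = [U]              # the whole universe one grain -> GM = log|U|
mid = [U[:4], U[4:]]        # an intermediate partition

assert yao_granularity(finest, len(U)) == 0.0
assert yao_granularity(coarsest, len(U)) == math.log2(8)
assert 0.0 < yao_granularity(mid, len(U)) < math.log2(8)
```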
As for the value measure VM of an information granule, it is mainly determined from three aspects: granularity, uncertainty, and domain knowledge. 1. The better the granularity of the information granule fits the data-analysis requirements, the greater its value; granules that are too coarse or too fine both lose value. 2. The value measure of an information granule can be determined using information entropy from information theory and analysis-of-variance methods from statistics. 3. The value measure of specific information granules may be specified directly through domain knowledge and expert experience.
(2) Representation of granular layers:
A granular layer (Layer) is composed of all the information granules obtained under a certain granulation criterion and the relationships between those granules. It can be formally represented as a 2-tuple, i.e. Layer = (IGS, Intra-LR), where IGS denotes the set of information granules IG in the layer (Information Granule Set, IGS), representable as IGS = {IG1, IG2, ..., IGm};
Intra-LR (Intra-Layer Relationships) denotes the relationships that may exist between information granules within the layer: if information granules IGp and IGq are related, then Intra-LR is representable as Intra-LR = {E | E = (IGp, IGq), IGp, IGq ∈ IGS}.
(3) Representation of the granular structure:
The granular structure in the MGrIKR is the topological structure formed by the multiple granular layers obtained under different granulation criteria, the correlations between information granules on different layers, and the correlations between information granules within the same layer. The formal representation of the granular structure is therefore similar to that of an information granule IG and a granular layer Layer; the granular structure (GranularStructure, GS) can also be expressed in tuple form, i.e.
GS = (LS, Inter-LR),
where LS = {Layer1, ..., Layerm-1, Layerm} denotes the set of m granular layers (Layer Set, LS), each Layerj being one granular layer in the granular structure. Inter-LR (Inter-Layer Relationships) denotes the set of transformation relations between the information granules of two layers Layerj and Layerk; Inter-LR can be expressed as
Inter-LR = {r | r(Layerj, Layerk)},
or
Inter-LR = {r | r(IGj, IGk), IGj ∈ IGSj, IGk ∈ IGSk}.
Here r denotes the partial-order relation satisfied between the information granules of Layerj and Layerk, j, k = 1, ..., m, where r may be a relation between information granules in two adjacent layers or a relation between information granules across layers.
Granulating the big data is then, referring to the formal representations of the information granules, the granular layers, and the granular structure, computing each element of each tuple.
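The three formal tuples above, IG = (KVS, GM, VM), Layer = (IGS, Intra-LR), and GS = (LS, Inter-LR), can be sketched as plain data structures; the concrete field types and the example granules are assumptions for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class InformationGranule:
    kvs: dict   # key-value pair set describing the granule's features
    gm: float   # granularity measure (fineness)
    vm: float   # value measure

@dataclass
class Layer:
    igs: list                                    # information granules on this layer
    intra_lr: set = field(default_factory=set)   # pairs (p, q) of related granule indices

@dataclass
class GranularStructure:
    ls: list                                     # granular layers, coarse to fine
    inter_lr: set = field(default_factory=set)   # cross-layer relations (j, k)

fine = Layer(
    igs=[InformationGranule({"scene": "lobby"}, gm=0.1, vm=0.9),
         InformationGranule({"scene": "exit"}, gm=0.1, vm=0.8)],
    intra_lr={(0, 1)})
coarse = Layer(igs=[InformationGranule({"scene": "indoor"}, gm=0.8, vm=0.6)])
gs = GranularStructure(ls=[coarse, fine], inter_lr={(0, 1)})
assert len(gs.ls) == 2 and (0, 1) in gs.ls[1].intra_lr
```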
3) Parallelized/incremental granular-structure update and problem solving:
The "velocity" characteristic of big data requires that its analysis be fast and that the response actions taken be timely. The currently available technical solutions are mainly parallelized computation and incremental updating, where parallel computation includes using distributed parallel computing platforms, using the multiple parallel computing units of multi-core CPUs, and cooperative computation using GPUs. When a small part of the data in a large-scale data set changes, the idea of incremental updating is used to maintain the entire MGrIKR and correct the problem-solving results based on the MGrIKR, ensuring the timeliness of big data analysis. The following analyzes two aspects: the timeliness of information-granule updates and the timeliness of problem solving.
(1) Timeliness of information-granule updates: dynamic updating of the multi-source heterogeneous granular structure.
Without loss of generality, the invention here considers the dynamic updating of the granular structure in the complex case (multi-source heterogeneous dynamic data streams); the remaining simpler cases follow similarly. First, an initial granular structure is established for each data source separately, and the initial granular structures are then integrated according to certain relationships, ultimately forming a global granular structure.
First step: formal description of granular-structure integration. To integrate two granular structures GSi = (LSi, Inter-LRi) and GSj = (LSj, Inter-LRj), a logical operation can be defined as a binary map f: GS × GS → GS, where GS is the entire problem domain, i.e. the set of granular structures. This binary map should satisfy the operation rule:
f(GSi, GSj) = (f1(LSi, LSj), f2(Inter-LRi, Inter-LRj)),
where the binary map f1 integrates the layers of the two granular structures, forming a new set of global granular layers, and the binary map f2 reintegrates the relation sets of the two granular structures. In the process of integrating the intra-layer and inter-layer relation sets of the information granules, the transformation-relation sets between different layers and between information granules in the same layer need to be integrated, including the merging, deletion, and updating of relations.
Second step: dynamic updating of each component granular structure. The dynamic update of a granular structure can be formalized as:
UpdateGS(GSi) = (UpdateL(LSi), UpdateR(Inter-LRi)),
where UpdateL is the dynamic update method for the granular layers, and UpdateR is the dynamic update method for the intra-layer and inter-layer relation sets of the information granules.
Third step: incremental updating of the global granular structure. From the dynamic update results of each data source, the update method of the global granular structure is designed, formally represented as
Update(globalGS) = Update(UpdateGS(GS1), UpdateGS(GS2), ..., UpdateGS(GSn)).
The dynamic real-time update mechanism of the multi-source heterogeneous granular structure is shown in Figure 3.
(2) Timeliness of problem solving: analysis of the application types suited to MGrIKR-based solving.
Since granular computing is inherently "non-precise", it cannot satisfy all types of big data processing demands. For suitable problem types, computation based on the granular structure can accelerate the solution process and guarantee timeliness; determining which types of big data problems are suited to granular-computing methods is therefore extremely important. Here the invention tentatively proposes two classes of problems as examples; further problem types can be found in further analysis work.
Example 1. Granular-space optimization problems: the layer-selection problem is described with optimization theory, determining the computational granularity that solves effectively, so as to obtain an effective solution in the shortest time.
Definition 1. The effectiveness of a solution can be defined by a 2-tuple SolutionEffectiveness = (GM(R), Tu), where R is the computed result, GM(R) is the granularity measure of that result, and Tu is the time-limit requirement. If a solution satisfies the granularity requirement GM(Ru) and is obtained in time less than Tu, then the solution is effective, called an effective solution.
In order to select a "suitable" layer from the domain-oriented granular structure for computation, thereby reducing the actual space-time cost of the computation, granular-space optimization must be carried out. Granular-space optimization is to find, among the m layers of the granular structure, the layer Layeri such that:
max GM(Layeri)
s.t.
GM(Ri = Solve(Layeri)) ≤ GM(Ru), Ti ≤ Tu, 1 ≤ i ≤ m,
where Ri and Ti are, respectively, the result solved on the i-th layer and the time spent. Consider the problem shown in Figure 4: the solution granularity on Layer3 meets the demand, but the time cannot satisfy the time constraint; the solution time on Layer1 can satisfy the time constraint, but the granularity of the solution is too coarse, so neither of these two layers yields an effective solution; the solution on Layer2 satisfies both the granularity requirement and the time constraint and is an effective solution.
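The granular-space optimization above (maximize GM(Layeri) subject to the granularity and time constraints) can be sketched as a feasibility filter plus an argmax; the numeric GM and time values are illustrative assumptions mirroring the Layer1/Layer2/Layer3 discussion:

```python
def select_layer(layers, gm_ru, t_u):
    # keep layers whose solution meets the granularity demand GM(Ri) <= GM(Ru)
    # and the time constraint Ti <= Tu, then pick the coarsest (max GM) layer
    feasible = [l for l in layers if l["gm_result"] <= gm_ru and l["time"] <= t_u]
    if not feasible:
        return None  # no effective solution exists in this granular structure
    return max(feasible, key=lambda l: l["gm_layer"])

layers = [
    {"name": "Layer1", "gm_layer": 3.0, "gm_result": 2.5, "time": 2},   # too coarse
    {"name": "Layer2", "gm_layer": 2.0, "gm_result": 1.8, "time": 5},   # effective
    {"name": "Layer3", "gm_layer": 1.0, "gm_result": 0.9, "time": 20},  # too slow
]
best = select_layer(layers, gm_ru=2.0, t_u=10)
assert best["name"] == "Layer2"
```

Maximizing GM picks the coarsest feasible layer, i.e. the cheapest computation that still meets the granularity demand within the time limit.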
Example 2. Human-machine cooperative progressive computation problems: in a decision system composed of people and a computing system, if the computation guiding the decision is decomposable, the decision can be refined step by step, and, starting from the current state, a more refined solution can be used to guide the next action. For problems of this type, a human-machine cooperative emergency-response model that "acts ahead of time" and "acts while computing" can be constructed. In adjacent granular layers, the lower-layer solution is a refinement of the upper-layer solution, denoted Ri-1 < Ri; each solution corresponds to the action step (ActionStep, AS) the user takes next, denoted Ri → ASi; and the action A corresponding to the entire decision is decomposable.
According to the number of action steps, the value of n is determined, that is, the solution stages and the parallel granularity are determined; then n suitable granular layers are screened out of the pre-established domain-oriented granular structure. The human-machine cooperative progressive problem-solving model is shown in Figure 5.
If the human-machine cooperative progressive computation is not used, Action-Step1 can only be executed starting from time point t3, and the final completion time of the entire decision and action is significantly delayed.
The invention will be further described below with reference to its effects.
The possibility of applying granular computing to big data processing, and the model framework, raise the following issues:
(1) Analyzing the granulation emphasis of big data: aiming at the "velocity" and "volume" of big data, theoretical analysis of the basic models and algorithms of granular computing is continued to obtain faster granulation methods. A common method for accelerating knowledge acquisition is incremental updating, and in recent years some good results have been achieved on the incremental updating of rough sets.
(2) Analyzing the use, under the big data environment, of the three granular-computing modes: granular-space optimization, granularity-level switching, and multi-granularity joint computation. For example, the human-machine cooperative emergency-response model of Example 2 can be converted into another problem-solving model, namely a precision-progressive solving model: starting from the coarsest-granularity solution, computation proceeds level by level toward finer granularity levels, and the user can obtain the finest-granularity solution currently available at any moment. The significance of this computation model is to obtain, with timeliness guaranteed as the premise, a non-precise solution with practical value.
(3) Analyzing and verifying the guiding role of the granular-computing processing framework under the big data environment: considering how granular-computing ideas are used in each link of big data processing, combining concrete granular-computing models with applications of data mining/machine learning algorithms, and, in combination with specific domain backgrounds and data-analysis requirements, using the framework to guide big data analysis in both analysis and practice, correcting and improving it according to new problems found during that guidance.
(4) Analyzing parallel implementation methods for processing big data problems with granular computing: closely following fast-developing IT infrastructure and software platforms, and developing the acceleration of parallel computation in granular-computing analysis of big data. For data-parallel, computation-intensive tasks, GPU+CPU high-performance computing-cluster solutions for granular computing are analyzed; for problems with huge data volume but strong overall data relevance and weak parallelism, processing methods on open-source platforms such as Hadoop and Spark/Storm are analyzed.
(5) Combining concrete application backgrounds, applying granular-computing-based big data processing methods in scientific analysis and engineering applications. For example, in a large-scale video surveillance system, after the surveillance video is granulated according to scene classification information, it is organized into a granular structure with scene semantics, so as to realize compressed storage and efficient retrieval of the surveillance video. These specific analysis tasks will continually enrich the theoretical models and technical means of granular-computing-based big data processing in this direction.
The invention analyzes the possibility of processing big data with granular computing, proposes a granular-computing-based big data processing framework, and reviews the relevant analysis foundations of the big data and granular-computing fields. Future work mainly consists of, in combination with specific application fields and analysis demands, analyzing the construction of the MGrIKR for big data and developing intelligent computation methods based on the big data MGrIKR.
Granular computing, as a computing paradigm, has played an important role in the field of intelligent information processing, and applying it to big data analysis has guiding significance.
In the above embodiments, implementation may be wholly or partly by software, hardware, firmware, or any combination thereof. When implemented wholly or partly in the form of a computer program product, the computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the processes or functions according to the embodiments of the invention are wholly or partly generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one web site, computer, server, or data center to another web site, computer, server, or data center by wired (such as coaxial cable, optical fiber, or Digital Subscriber Line (DSL)) or wireless (such as infrared, radio, or microwave) means. The computer-readable storage medium may be any usable medium that the computer can access, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (such as a Solid State Disk (SSD)), etc.
The foregoing is merely a description of preferred embodiments of the invention and is not intended to limit the invention; any modifications, equivalent replacements, and improvements made within the spirit and principles of the invention shall be included in the protection scope of the invention.
Claims (10)
1. A batch processing method of computer big data, characterized in that the batch processing method of the computer big data comprises: analyzing big data with an analysis program through an analysis module; specifically:
processing the 3V characteristics of the big data in the following order: variety → volume → velocity;
eliminating the variety of the distributed, heterogeneous data by data filtering, data integration, extraction, and granulation, obtaining more standardized data tables and removing the uncertainty therein;
granulating the raw data, using the concrete "gamp" model and techniques under granular computing, into grains of suitable size, reducing the data scale, and constructing the corresponding granular layers and the structure on each layer;
carrying out data mining or machine learning on the information granules with the aid of other machine learning methods;
recasting the data mining or machine learning used into distributed, online, incremental-learning versions to meet the timeliness requirements of big data processing;
when processing big data, freely switching granularity, considering the decomposition and merging of grains on multiple granularity levels and the rapid construction of the corresponding solutions; for certain specific problems, using the information of multiple granularity levels and a "cross-granularity" mechanism to solve them;
analyzing, from the entire processing flow, whether the raw data has a suitable granularity, providing guidance on whether and how to adjust the generation or acquisition of the raw data;
borrowing the idea of deep learning, organizing the key processing flow into many levels, allowing design parameters to be optimized through learning, and optimizing the final learning outcome.
2. The batch processing method of computer big data according to claim 1, characterized in that analyzing the big data specifically comprises: data acquisition → extraction/cleaning → integration/representation → analysis/modeling → interpretation;
wherein:
1) data source selection and data integration:
handling heterogeneous data, for data source selection, using dimensionality reduction, data condensation, and data wrapping;
2) domain-oriented granulation: converting the input of the problem from finest-granularity raw data into an information-granule representation, greatly reducing the data volume while retaining the information and value contained in the data; before specific data-analysis requirements are put forward, first constructing the raw data, according to domain knowledge, into a Multi-Granular Information/Knowledge Representation model (MGrIKR); granulation first analyzes the representation of information granules, granular layers, and the entire granular structure, and then carries out the construction for that representation;
wherein, the representation of an information granule: formally describing the information granule with a triple, IG = (KVS, GM, VM), where KVS (Key-Value pair Set) denotes the feature sub-vector describing the information granule, called the key-value pair set, i.e. KVS = {⟨key1, value1⟩, ..., ⟨keyn, valuen⟩}, valuei denoting the value taken by the feature named keyi in the information granule, i = 1, 2, ..., n; GM denotes the Granularity Measure of the information granule, i.e. its fineness; VM denotes the Value Measure of the information granule;
the representation of a granular layer: a granular layer is composed of all the information granules obtained under a certain granulation criterion and the relationships between those granules; it is formally represented as a 2-tuple, Layer = (IGS, Intra-LR); wherein IGS denotes the set of information granules IG in the layer (Information Granule Set, IGS), representable as IGS = {IG1, IG2, ..., IGm};
Intra-LR (Intra-Layer Relationships) denotes the relationships existing between information granules within the layer; if information granules IGp and IGq are related, Intra-LR is representable as Intra-LR = {E | E = (IGp, IGq), IGp, IGq ∈ IGS};
the representation of the granular structure: the granular structure in the MGrIKR is the topological structure formed by the multiple granular layers obtained under different granulation criteria, the correlations between information granules on different layers, and the correlations between information granules within the same layer; the formal representation of the granular structure is similar to that of the information granule IG and the granular layer Layer, the granular structure (GranularStructure, GS) being expressed in tuple form,
GS = (LS, Inter-LR);
wherein LS = {Layer1, ..., Layerm-1, Layerm} denotes the set of m granular layers (Layer Set, LS), each Layerj being one granular layer in the granular structure; Inter-LR (Inter-Layer Relationships) denotes the set of transformation relations between the information granules of two layers Layerj and Layerk, Inter-LR being expressed as
Inter-LR = {r | r(Layerj, Layerk)},
or
Inter-LR = {r | r(IGj, IGk), IGj ∈ IGSj, IGk ∈ IGSk};
r denotes the partial-order relation satisfied between the information granules of Layerj and Layerk, j, k = 1, ..., m, where r is a relation between information granules in two adjacent layers or a relation between information granules across layers.
3. The batch processing method of computer big data according to claim 1, characterized in that the batch processing method of computer big data specifically includes:
Step 1: the data input module inputs customer data through a data input device;
Step 2: the main control module dispatches the data resources to be processed through the resource scheduling module using a scheduling algorithm; for load scheduling under a big data environment, the resource scheduling module uses the Min-Min scheduling algorithm, whose specific steps are:
(1) judge whether the task set is empty; if it is not empty, proceed to (2), otherwise go to (6) and the scheduling ends;
(2) for each task in the task set, compute the execution time of mapping it onto every virtual machine, obtaining a matrix;
(3) from the result of (2), find the virtual machine corresponding to the task with the smallest completion time;
(4) assign that task to the virtual machine, and delete the task from the task set;
(5) update the matrix and return to (1);
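The Min-Min loop above can be sketched as follows; the two-task, two-VM example data are hypothetical:

```python
def min_min_schedule(tasks, vms, exec_time):
    """Min-Min sketch: repeatedly assign the task whose earliest
    completion time is smallest, then update the VM's ready time."""
    tasks = list(tasks)                      # copy of the task set
    ready = {v: 0.0 for v in vms}            # per-VM ready time
    assignment = {}
    while tasks:                             # (1) stop when the set is empty
        # (2) completion-time matrix for the remaining tasks
        matrix = {(t, v): ready[v] + exec_time[(t, v)]
                  for t in tasks for v in vms}
        # (3) task/VM pair with the smallest completion time
        task, vm = min(matrix, key=matrix.get)
        # (4) assign the task and delete it from the task set
        assignment[task] = vm
        tasks.remove(task)
        ready[vm] += exec_time[(task, vm)]   # (5) update, loop back to (1)
    return assignment

# Hypothetical execution times for 2 tasks on 2 virtual machines
times = {("t1", "vm1"): 3, ("t1", "vm2"): 4,
         ("t2", "vm1"): 2, ("t2", "vm2"): 8}
result = min_min_schedule(["t1", "t2"], ["vm1", "vm2"], times)
```

Because t2 finishes earliest on vm1, it is placed first; t1 then goes to vm2, where its completion time (4) beats waiting behind t2 on vm1 (5).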
Step 3: the batch processing execution module dispatches, through a batch processing program, the processes and jobs to be batch-processed; the encryption module performs encryption operations on the big data through an encryption program;
Step 4: the analysis module analyzes the big data through an analysis program; the analysis method of the analysis module includes:
(1) storing the big data into a distributed database in time slices, and encrypting the data content in the database;
(2) setting up in the distributed database a raw-data temporary table and an index table for caching the big data, the index table recording the location of the corresponding big data in the raw-data temporary table;
(3) when performing big data analysis, according to the location, stored in the server's index table, of the corresponding big data in the raw-data temporary table, the main control module rapidly decrypts the encrypted data, retrieves the big data from the raw-data temporary table for analysis, and stores the analysis result into the distributed database;
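The three analysis steps above (time-sliced encrypted storage, an index table over a raw-data temporary table, and decrypt-on-lookup) can be sketched as follows; the XOR "cipher" and the size-counting "analysis" are stand-ins, not the claim's actual encryption program or analytics:

```python
import time

class BigDataAnalyzer:
    """Sketch of analysis steps (1)-(3): encrypted time-sliced storage
    with an index table pointing into a raw-data temporary table."""

    def __init__(self, key: int):
        self.key = key
        self.temp_table = {}   # raw-data temporary table: location -> ciphertext
        self.index_table = {}  # index table: data id -> location in temp table

    def _xor(self, data: bytes) -> bytes:
        # Toy symmetric cipher; encryption and decryption are the same op.
        return bytes(b ^ self.key for b in data)

    def store(self, data_id: str, payload: bytes):
        # step (1): store by time slice, encrypting the content
        location = (data_id, int(time.time()))
        self.temp_table[location] = self._xor(payload)
        # step (2): the index table records the payload's location
        self.index_table[data_id] = location

    def analyze(self, data_id: str) -> int:
        # step (3): look up the location, decrypt, then analyze
        location = self.index_table[data_id]
        plaintext = self._xor(self.temp_table[location])
        return len(plaintext)  # placeholder "analysis": payload size

an = BigDataAnalyzer(key=0x5A)
an.store("sensor-1", b"batch payload")
print(an.analyze("sensor-1"))  # 13
```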
Step 5: the data storage module stores the big data resources through a memory;
Step 6: the display module displays the big data information content through a display;
The encryption method of the encryption module is as follows:
(1) after receiving the target big data, process the target big data according to preset rules, and determine whether the target big data is to be encrypted;
(2) if so, form a key request for the target big data, and place the key request into a target queue;
(3) take the key requests out of the target queue in turn, and submit requests to the big data key production module to create data encryption keys;
(4) receive the encryption key information issued by the key production module, and encrypt the big data according to the encryption key information.
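The four-step encryption workflow above can be sketched with a FIFO queue of key requests; the key production module and its key material here are hypothetical stand-ins:

```python
from collections import deque

class KeyProductionModule:
    """Stand-in for the big data key production module."""
    def __init__(self):
        self._counter = 0
    def create_key(self) -> bytes:
        self._counter += 1
        return self._counter.to_bytes(4, "big")  # hypothetical key material

def encrypt_pipeline(blocks, needs_encryption, kpm):
    """Steps (1)-(4): queue one key request per block that needs
    encryption, drain the queue FIFO, and tag each block with its key
    (the tagging stands in for the actual encryption step)."""
    queue = deque()
    for i, block in enumerate(blocks):          # (1) decide per block
        if needs_encryption(block):
            queue.append(i)                     # (2) enqueue a key request
    encrypted = list(blocks)
    while queue:                                # (3) FIFO take-out
        i = queue.popleft()
        key = kpm.create_key()                  # request key material
        encrypted[i] = (key, blocks[i])         # (4) "encrypt" with the key
    return encrypted

out = encrypt_pipeline([b"public", b"secret"],
                       lambda b: b == b"secret",
                       KeyProductionModule())
```

Only the block flagged for encryption is paired with a key; the other passes through untouched.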
4. The batch processing method of computer big data according to claim 1, characterized in that, after receiving the target big data, processing the target big data according to preset rules and determining whether the target big data is to be encrypted includes:
after receiving the target big data, partitioning the target big data into blocks according to the data partitioning rules, and determining separately, for each block of the partitioned target big data, whether it is to be encrypted;
and that forming a key request for the target big data if so, and placing the key request into the target queue, includes:
if so, forming a key request for each block of the target big data that is to be encrypted, and placing the key requests into the target queue.
5. The batch processing method of computer big data according to claim 1, characterized in that taking the key requests out of the target queue in turn and submitting requests to the big data key production module to create data encryption keys includes:
taking the key requests out of the target queue in turn according to the first-in-first-out principle, and submitting requests to the big data key production module to create data encryption keys;
the encryption information includes the information of the initial key; when a single block key leaks, a new initial key is used to generate keys to re-encrypt the block whose key leaked, and the initial key and the block encryption key information in the encryption information table are updated; when computing the one-way function, the information of a key-change counter N is added, so that the block symmetric key generation function is M(F(K, A, f(N))), and the encryption information table additionally contains the key-change information N on top of its previous contents;
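A minimal sketch of the block symmetric key generation function M(F(K, A, f(N))), assuming HMAC-SHA256 as the one-way function F, an 8-byte big-endian encoding as f, and truncation to the cipher key size as M (these concrete choices are assumptions, not stated in the claim):

```python
import hashlib
import hmac

def block_key(initial_key: bytes, block_addr: bytes, change_counter: int) -> bytes:
    """Derive a per-block symmetric key as M(F(K, A, f(N))):
    f(N) encodes the key-change counter, F is a keyed one-way function
    (HMAC-SHA256), and M truncates to a 128-bit block-cipher key."""
    f_n = change_counter.to_bytes(8, "big")           # f(N)
    digest = hmac.new(initial_key, block_addr + f_n,  # F(K, A, f(N))
                      hashlib.sha256).digest()
    return digest[:16]                                # M: truncate to 128 bits

# If one block's key leaks, bumping N re-keys just that block, and the
# (initial key, N) pair is updated in the encryption information table.
k1 = block_key(b"initial-key", b"block-0007", change_counter=1)
k2 = block_key(b"initial-key", b"block-0007", change_counter=2)
```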
The distributed database is an HBase database;
before storing the big data into the distributed database, the method further includes integrity verification and legitimacy verification of the big data, where the integrity verification is completed by Redis in the network system, and, after it passes, the big data is sent to the server locally to complete the legitimacy verification;
the manner in which the raw-data temporary table caches the big data is as follows: the row key (rowkey) is set using the remote-procedure-call trace identifier traceID, the entry method name entrace, and the time; the column name is set to an arbitrary value; and the value of the key-value pair is spliced from the spanID and the big data value roleID;
storing the big data into HBase includes: setting the rowkey using the traceID, the entry method name, and the time; setting the column name to an arbitrary value; and splicing the value of the key-value pair from the spanID and the big data value roleID.
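The rowkey and value layout described above can be sketched as follows; the separator characters and field values are assumptions, since the claim only names the fields:

```python
def make_rowkey(trace_id: str, entry_method: str, ts: int) -> str:
    """Rowkey = traceID + entry method name + time, as described above."""
    return f"{trace_id}:{entry_method}:{ts}"

def make_value(span_id: str, role_id: str) -> str:
    """Value of the key-value pair = spanID spliced with roleID."""
    return f"{span_id}|{role_id}"

# Hypothetical cached row for the raw-data temporary table
row = {
    "rowkey": make_rowkey("trace-42", "ingest", 1540512000),
    "column": "c",  # the column name may be an arbitrary value
    "value": make_value("span-7", "role-3"),
}
```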
6. A computer program implementing the batch processing method of computer big data according to any one of claims 1 to 5.
7. A terminal, characterized in that the terminal at least carries a server implementing the batch processing method of computer big data according to any one of claims 1 to 5.
8. A computer-readable storage medium, including instructions which, when run on a computer, cause the computer to execute the batch processing method of computer big data according to any one of claims 1 to 5.
9. A batch processing system of computer big data implementing the batch processing method of computer big data according to claim 1, characterized in that the batch processing system of computer big data includes:
a data input module, connected to the main control module, for inputting customer data through a data input device;
a main control module, connected to the data input module, the resource scheduling module, the batch processing execution module, the encryption module, the analysis module, the data storage module, and the display module, for controlling each module to work normally through a single-chip microcomputer;
a resource scheduling module, connected to the main control module, for dispatching the data resources to be processed through a scheduling algorithm;
a batch processing execution module, connected to the main control module, for dispatching, through a batch processing program, the processes and jobs to be batch-processed;
an encryption module, connected to the main control module, for performing encryption operations on the big data through an encryption program;
an analysis module, connected to the main control module, for analyzing the big data through an analysis program;
a data storage module, connected to the main control module, for storing the big data resources through a memory;
a display module, connected to the main control module, for displaying the big data information content through a display.
10. An enterprise IT service device at least carrying the batch processing system of computer big data according to claim 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811257472.XA CN109522742A (en) | 2018-10-26 | 2018-10-26 | A kind of batch processing method of computer big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109522742A true CN109522742A (en) | 2019-03-26 |
Family
ID=65772997
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811257472.XA Pending CN109522742A (en) | 2018-10-26 | 2018-10-26 | A kind of batch processing method of computer big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109522742A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110135433A (en) * | 2019-05-07 | 2019-08-16 | 宏图物流股份有限公司 | A kind of representation data availability judgment method recommended based on vehicle |
CN111556098A (en) * | 2020-04-08 | 2020-08-18 | 深圳供电局有限公司 | Artificial intelligence based analysis system and analysis method for internet of things data |
CN111897828A (en) * | 2020-07-31 | 2020-11-06 | 广州视源电子科技股份有限公司 | Data batch processing implementation method, device, equipment and storage medium |
CN112090097A (en) * | 2020-08-06 | 2020-12-18 | 浙江大学 | Performance analysis method and application of traditional Chinese medicine concentrator |
CN112181965A (en) * | 2020-09-29 | 2021-01-05 | 成都商通数治科技有限公司 | MYSQL-based big data cleaning system and method for writing bottleneck into MYSQL-based big data cleaning system |
CN112307126A (en) * | 2020-11-24 | 2021-02-02 | 上海浦东发展银行股份有限公司 | Batch processing method and system for credit card account management data |
CN117610896A (en) * | 2024-01-24 | 2024-02-27 | 青岛创新奇智科技集团股份有限公司 | Intelligent scheduling system based on industrial large model |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106021484A (en) * | 2016-05-18 | 2016-10-12 | 中国电子科技集团公司第三十二研究所 | Customizable multi-mode big data processing system based on memory calculation |
CN107592295A (en) * | 2017-08-01 | 2018-01-16 | 佛山市深研信息技术有限公司 | A kind of encryption method of big data |
CN108268468A (en) * | 2016-12-30 | 2018-07-10 | 北京京东尚科信息技术有限公司 | The analysis method and system of a kind of big data |
CN108460489A (en) * | 2018-03-15 | 2018-08-28 | 重庆邮电大学 | A kind of user behavior analysis based on big data technology and service recommendation frame |
CN108519914A (en) * | 2018-04-09 | 2018-09-11 | 腾讯科技(深圳)有限公司 | Big data computational methods, system and computer equipment |
Non-Patent Citations (2)
Title |
---|
Zhou Zhou: "Energy Consumption Optimization and Management Technology in Cloud Environments", 31 August 2018, Hunan University Press *
Xu Ji et al.: "Big Data Processing Based on Granular Computing", Chinese Journal of Computers *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109522742A (en) | A kind of batch processing method of computer big data | |
US11526333B2 (en) | Graph outcome determination in domain-specific execution environment | |
US20210073282A1 (en) | Graph-manipulation based domain-specific execution environment | |
CN110084377B (en) | Method and device for constructing decision tree | |
Shi et al. | Concept-cognitive learning model for incremental concept learning | |
CN104580163B (en) | Access control policy builds system under privately owned cloud environment | |
US11403347B2 (en) | Automated master data classification and curation using machine learning | |
CN103336790A (en) | Hadoop-based fast neighborhood rough set attribute reduction method | |
US10909114B1 (en) | Predicting partitions of a database table for processing a database query | |
CN103336791A (en) | Hadoop-based fast rough set attribute reduction method | |
CN114626807A (en) | Nuclear power scene management method, system, device, computer equipment and storage medium | |
Venkatraman et al. | Big data infrastructure, data visualisation and challenges | |
Wang et al. | Flint: A platform for federated learning integration | |
Wang et al. | Comparison of representative heuristic algorithms for multi-objective reservoir optimal operation | |
CN116258309A (en) | Business object life cycle management and tracing method and device based on block chain | |
CN110322153A (en) | Monitor event processing method and system | |
Priyanka et al. | Fundamentals of wireless sensor networks using machine learning approaches: Advancement in big data analysis using Hadoop for oil pipeline system with scheduling algorithm | |
Jia et al. | Development model of enterprise green marketing based on cloud computing | |
US10089475B2 (en) | Detection of security incidents through simulations | |
CN110837657B (en) | Data processing method, client, server and storage medium | |
CN115358728A (en) | ERP data processing method based on cloud computing | |
Zhou et al. | A compliance-based architecture for supporting GDPR accountability in cloud computing | |
Sarathchandra et al. | Resource aware scheduler for distributed stream processing in cloud native environments | |
US20240112067A1 (en) | Managed solver execution using different solver types | |
Ashfaq et al. | Towards a Trustworthy and Efficient ETL Pipeline for ATM Transaction Data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20190326 |