A big data processing method
Technical field
The present invention relates to the field of big data, and in particular to a big data processing method.
Background art
In recent years, with the rapid development and widespread application of computers and information technology, the scale of industry application systems has expanded rapidly, and the data they produce has grown explosively. Industry and enterprise big data, which readily reaches hundreds of TB or even tens to hundreds of PB, far exceeds the processing capability of existing traditional computing technologies and information systems. Seeking effective big data processing technologies, methods, and means has therefore become an urgent real-world need. "Big data" has existed for some time in fields such as physics, biology, and environmental ecology, and in industries such as military affairs, finance, and communications, but it has attracted wide attention only in recent years because of the development of the internet and the information industry.
The purpose of big data processing is to allow users to obtain the big data resources they need in a timely and effective manner. Demand is high in scenarios such as cloud computing and distributed computing on the internet, but the prior art still lacks an effective big data processing method.
Summary of the invention
The object of the present invention is to provide a big data processing method that enables users to obtain the big data resources they need in a timely and effective manner, and that supports effective data analysis and processing.
The object of the present invention is achieved through the following technical solution:
A big data processing method, characterized by comprising the following steps:
Step 1: receiving resource request information input by a user;
Step 2: obtaining, from the cloud and according to the resource request information, the big data resource related to the resource request information;
Step 3: the user downloading the obtained big data resource from the cloud;
Step 4: classifying the downloaded big data resource;
Step 5: storing the classified big data resource.
Optionally, step 2 comprises the following steps:
Step 2.1: a computing management node obtains the resource request information from the cloud;
Step 2.2: the computing management node assigns multiple distributed computing nodes to perform distributed computation according to the resource request information, so that each distributed computing node generates its own local computation result;
Step 2.3: the computing management node integrates the local computation results of the distributed computing nodes into one global computation result and sends the global computation result to the cloud.
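As a non-limiting illustration, steps 2.1 to 2.3 can be sketched as follows. The sketch simulates the distributed computing nodes with a thread pool on one machine. The names `node_compute` and `manager_compute`, the keyword-matching "computation", and the representation of each node's data as a list of strings are all assumptions made for the example; the method itself does not prescribe them.

```python
from concurrent.futures import ThreadPoolExecutor


def node_compute(shard, request):
    """Hypothetical local computation on one distributed computing node:
    keep the records of this node's shard that contain the request keyword."""
    return [record for record in shard if request in record]


def manager_compute(request, shards):
    """Role of the computing management node: dispatch the request to every
    node (step 2.2) and integrate the local results (step 2.3)."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        # pool.map returns results in the same order as the input shards.
        local_results = list(pool.map(lambda s: node_compute(s, request), shards))
    # Simple integration by concatenation; ranking and cleaning are refinements.
    global_result = [record for local in local_results for record in local]
    return local_results, global_result


shards = [["big data", "small"], ["data lake", "cloud"]]
local_results, global_result = manager_compute("data", shards)
```

In a real deployment the nodes would be separate machines and the dispatch would go over a network; the thread pool only stands in for that here.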
Optionally, step 2.3 comprises the following steps:
Step 2.3.1: the computing management node sorts the local computation results of the distributed computing nodes according to the composite score K of each node, merges the sorted results, and then removes duplicate data and noise data to obtain the global computation result;
Wherein, for each distributed computing node, let its composite score be K, its trust score be K1, and its computing capability score be K2; then:
K = (A + (K1)^{1/2}) * (B + (K2)^{1/2});
Wherein A and B are positive integers, and K1 and K2 are greater than zero;
Step 2.3.2: the computing management node sends the global computation result to the cloud as incremental data at a fixed time interval.
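As a non-limiting illustration, the composite score and the ranked merge of step 2.3.1 might be sketched as below. Several points are assumptions of the example rather than part of the method: the results are ordered by descending K, each node is a dictionary with the hypothetical keys `k1`, `k2`, and `result`, and "noise" is decided by a caller-supplied predicate (by default, records equal to `None`).

```python
import math


def composite_score(k1, k2, a=1, b=1):
    """K = (A + K1^(1/2)) * (B + K2^(1/2)), with K1, K2 > 0."""
    if k1 <= 0 or k2 <= 0:
        raise ValueError("K1 and K2 must be greater than zero")
    return (a + math.sqrt(k1)) * (b + math.sqrt(k2))


def merge_local_results(nodes, a=1, b=1, is_noise=lambda record: record is None):
    """Sort node results by composite score, merge them, then drop
    duplicate and noise records to form the global computation result."""
    # Assumption: higher-scoring nodes come first in the merged result.
    ranked = sorted(
        nodes,
        key=lambda n: composite_score(n["k1"], n["k2"], a, b),
        reverse=True,
    )
    merged, seen = [], set()
    for node in ranked:
        for record in node["result"]:
            if is_noise(record) or record in seen:
                continue  # skip noise and records already merged
            seen.add(record)
            merged.append(record)
    return merged


nodes = [
    {"k1": 1, "k2": 1, "result": ["a", "b"]},
    {"k1": 4, "k2": 9, "result": ["b", None, "c"]},
]
global_result = merge_local_results(nodes)
```

For instance, with A = 1, B = 2, K1 = 4, and K2 = 9, the formula gives K = (1 + 2) * (2 + 3) = 15.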
Optionally, step 3 comprises the following steps:
Step 3.1: a data forwarding server divides the global computation result obtained from the cloud into several independent data blocks and records the capacity of each data block, while storing the data blocks sequentially, in chronological order, into a set of data storage nodes; the set comprises M independent data storage nodes, namely: data storage node 1, data storage node 2, ..., data storage node N, ..., data storage node M;
Step 3.2: after the data forwarding server stores the current data block on data storage node N, data storage node N returns its remaining capacity to the data forwarding server; when the remaining capacity of data storage node N is less than the capacity of the next data block, the data forwarding server starts storing data blocks on data storage node N+1; and so on, until the entire global computation result has been stored; wherein N ≤ M, and M and N are positive integers;
Step 3.3: the user downloads the global computation result from the data forwarding server in the cloud; the global computation result is the obtained big data resource.
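As a non-limiting illustration, the block splitting of step 3.1 and the sequential placement rule of step 3.2 might be sketched as below. The function names, the fixed block size, and the in-memory list of node capacities are assumptions of the example. The sketch also skips past any node that is too small for the next block and raises an error when capacity runs out; the method as described states only the move from node N to node N+1.

```python
def split_into_blocks(data, block_size):
    """Divide the global result into independent blocks and record each size."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    sizes = [len(block) for block in blocks]
    return blocks, sizes


def place_blocks(block_sizes, node_capacities):
    """Return, for each block in order, the 0-based index of the storage node
    that receives it. A node is filled until its remaining capacity is
    smaller than the next block; then the next node is used."""
    remaining = list(node_capacities)
    placement = []
    node = 0
    for size in block_sizes:
        # Move on while the current node cannot hold the next block.
        while node < len(remaining) and remaining[node] < size:
            node += 1
        if node == len(remaining):
            raise RuntimeError("not enough storage capacity for all blocks")
        remaining[node] -= size
        placement.append(node)
    return placement


blocks, sizes = split_into_blocks(b"abcdefghij", 4)
placement = place_blocks(sizes, [8, 4])
```

Because nodes are only ever filled in increasing order, the placement list is non-decreasing, which is what keeps the stored blocks in chronological order across the node set.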
Optionally, step 4 comprises the following steps:
Step 4.1: performing random sampling on the attributes of the downloaded big data resource to obtain multiple major-class data sets;
Step 4.2: performing random sampling on the attributes of each major-class data set to obtain multiple subclass data sets;
Step 4.3: performing cluster analysis on each major-class data set to obtain multiple major-class clustering results and corresponding major-class labels;
Step 4.4: performing cluster analysis on each subclass data set to obtain multiple subclass clustering results and corresponding subclass labels;
Step 4.5: outputting the major-class clustering results and major-class labels and the subclass clustering results and subclass labels, thereby completing the classification of the big data resource.
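As a non-limiting illustration, steps 4.1 to 4.5 might be sketched as below. The sketch reads "random sampling of attributes" as drawing random subsets of attribute names, and it uses a very small k-means routine as the cluster analysis; both are assumptions, since the method does not name a sampling scheme or a clustering algorithm. The function names, the dictionary layout of the output, and the default parameter values are likewise hypothetical.

```python
import random


def sample_attributes(attributes, n_sets, n_attrs, rng):
    """Draw n_sets random subsets of n_attrs attribute names each."""
    return [sorted(rng.sample(attributes, n_attrs)) for _ in range(n_sets)]


def project(records, attrs):
    """Restrict every record to the chosen attributes, as a numeric tuple."""
    return [tuple(record[a] for a in attrs) for record in records]


def kmeans(points, k, rng, iters=20):
    """Minimal k-means; returns one cluster label (0..k-1) per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def classify(records, attributes, rng, n_major=2, major_attrs=3, n_sub=2, sub_attrs=2, k=2):
    """Two-level classification: major-class data sets, then subclass data sets."""
    output = []
    for m_attrs in sample_attributes(attributes, n_major, major_attrs, rng):
        major_labels = kmeans(project(records, m_attrs), k, rng)
        subclasses = [
            {"attrs": s_attrs, "labels": kmeans(project(records, s_attrs), k, rng)}
            for s_attrs in sample_attributes(m_attrs, n_sub, sub_attrs, rng)
        ]
        output.append({"attrs": m_attrs, "labels": major_labels, "subclasses": subclasses})
    return output


rng = random.Random(0)
records = [{"a": i, "b": i * 2, "c": i % 3, "d": 10 - i} for i in range(6)]
result = classify(records, ["a", "b", "c", "d"], rng)
```

Each entry of `result` holds one major-class data set (its sampled attributes and cluster labels) together with the subclass data sets sampled from those same attributes.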
The beneficial effects of the present invention are as follows: through distributed storage, computation, and processing of big data resources, the invention improves the computational efficiency of big data processing at low cost, with good data storage continuity and high security.
Detailed description of embodiments
The present invention is described in further detail below with reference to an embodiment, but embodiments of the present invention are not limited thereto.
A big data processing method, characterized by comprising the following steps:
Step 1: receiving resource request information input by a user;
Step 2: obtaining, from the cloud and according to the resource request information, the big data resource related to the resource request information;
Step 3: the user downloading the obtained big data resource from the cloud;
Step 4: classifying the downloaded big data resource;
Step 5: storing the classified big data resource.
Optionally, step 2 comprises the following steps:
Step 2.1: a computing management node obtains the resource request information from the cloud;
Step 2.2: the computing management node assigns multiple distributed computing nodes to perform distributed computation according to the resource request information, so that each distributed computing node generates its own local computation result;
Step 2.3: the computing management node integrates the local computation results of the distributed computing nodes into one global computation result and sends the global computation result to the cloud.
Optionally, step 2.3 comprises the following steps:
Step 2.3.1: the computing management node sorts the local computation results of the distributed computing nodes according to the composite score K of each node, merges the sorted results, and then removes duplicate data and noise data to obtain the global computation result;
Wherein, for each distributed computing node, let its composite score be K, its trust score be K1, and its computing capability score be K2; then:
K = (A + (K1)^{1/2}) * (B + (K2)^{1/2});
Among the above parameters, A and B take positive integer values, and K1 and K2 are positive numbers;
The trust score K1 and the computing capability score K2 of each distributed computing node are known in advance. The trust score K1 is related to the historical data access record of the distributed computing node, and the computing capability score K2 is related to the hardware computing capability of the node;
The parameters A and B are tuning parameters; they may be constants or may be adjusted as needed in practice.
Step 2.3.2: the computing management node sends the global computation result to the cloud as incremental data at a fixed time interval.
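As a non-limiting illustration, the incremental transmission of step 2.3.2 might be sketched as below. The sketch assumes that the computing management node takes one snapshot of the global computation result at each tick of the fixed interval and sends only the records it has not sent before. The function name `incremental_updates` and the list-of-snapshots input are hypothetical, and the actual timing and network transfer are left out.

```python
def incremental_updates(snapshots):
    """For each periodic snapshot of the global result, yield only the
    records that were not part of any earlier transmission."""
    sent = set()
    for snapshot in snapshots:
        delta = [record for record in snapshot if record not in sent]
        sent.update(delta)
        yield delta


# Three snapshots taken at consecutive interval ticks (illustrative data).
snapshots = [["a", "b"], ["a", "b", "c"], ["a", "b", "c"]]
deltas = list(incremental_updates(snapshots))
```

Sending only the difference keeps each transmission proportional to what changed since the previous interval rather than to the size of the whole result.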
Optionally, step 3 comprises the following steps:
Step 3.1: a data forwarding server divides the global computation result obtained from the cloud into several independent data blocks and records the capacity of each data block, while storing the data blocks sequentially, in chronological order, into a set of data storage nodes; the set comprises M independent data storage nodes, namely: data storage node 1, data storage node 2, ..., data storage node N, ..., data storage node M;
Step 3.2: after the data forwarding server stores the current data block on data storage node N, data storage node N returns its remaining capacity to the data forwarding server; when the remaining capacity of data storage node N is less than the capacity of the next data block, the data forwarding server starts storing data blocks on data storage node N+1; and so on, until the entire global computation result has been stored; wherein N ≤ M, and M and N are positive integers;
Step 3.3: the user downloads the global computation result from the data forwarding server in the cloud; the global computation result is the obtained big data resource.
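As a non-limiting illustration of why the in-order storage of steps 3.1 and 3.2 supports the download of step 3.3, the sketch below stores blocks on nodes in sequence and then rebuilds the global computation result by reading the nodes in order. The names `store_in_order` and `download`, the modelling of each storage node as a Python list, and the reassembly by simple concatenation are assumptions of the example; the method does not spell out the reassembly procedure.

```python
def store_in_order(blocks, capacities):
    """Store blocks on nodes 1..M in sequence, moving to the next node when
    the current node's remaining capacity is smaller than the next block."""
    nodes = [[] for _ in capacities]
    remaining = list(capacities)
    current = 0
    for block in blocks:
        while current < len(nodes) and remaining[current] < len(block):
            current += 1
        if current == len(nodes):
            raise RuntimeError("not enough storage capacity for all blocks")
        nodes[current].append(block)
        remaining[current] -= len(block)
    return nodes


def download(nodes):
    """Rebuild the global result by reading nodes, and the blocks on each
    node, in the order in which they were written."""
    return b"".join(block for node in nodes for block in node)


nodes = store_in_order([b"abcd", b"efgh", b"ij"], [8, 4])
restored = download(nodes)
```

Because a later block is never written to an earlier node, no separate index is needed in this sketch to restore the original order.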
Optionally, step 4 comprises the following steps:
Step 4.1: performing random sampling on the attributes of the downloaded big data resource to obtain multiple major-class data sets;
Step 4.2: performing random sampling on the attributes of each major-class data set to obtain multiple subclass data sets;
Step 4.3: performing cluster analysis on each major-class data set to obtain multiple major-class clustering results and corresponding major-class labels;
Step 4.4: performing cluster analysis on each subclass data set to obtain multiple subclass clustering results and corresponding subclass labels;
Step 4.5: outputting the major-class clustering results and major-class labels and the subclass clustering results and subclass labels, thereby completing the classification of the big data resource.
Although the above detailed description has shown, described, and pointed out the basic novel features of the present disclosure as applied to various implementations, it will be understood that those skilled in the art may make various omissions, substitutions, and changes in the form and details of the system without departing from the intent of the present disclosure. In addition, the order in which method steps appear in the claims does not imply an order of the steps.