CN105447141A - Data processing method and node

Info

Publication number: CN105447141A
Application number: CN201510816312.4A
Authority: CN
Inventors: 孙志云; 郭美思
Original assignee: Inspur Group Co Ltd
Current assignee: Inspur Group Co Ltd
Priority date: 2015-11-20
Filing date: 2015-11-20
Publication date: 2016-03-30

Abstract

Embodiments of the invention provide a data processing method and node, which relate to the field of computer technology and are used to search data quickly and improve the efficiency of searching massive data. In the method, a master node determines data information to be searched; the master node generates a search instruction according to the data information to be searched and sends the search instruction to at least one slave node, where the at least one slave node stores resilient distributed data sent by the master node and the search instruction carries identification information of the data information to be searched; the master node obtains the resilient distributed data corresponding to the data information to be searched that is returned by the at least one slave node, and determines from it a response data set corresponding to the data information to be searched; and the master node outputs the response data set.

Description

Data processing method and node

Technical field

The present invention relates to the field of computer technology, and in particular to a data processing method and node.

Background

With the arrival of the electronic information age, people's lives have changed greatly. Everyone is surrounded by massive amounts of data; whether at work, in study, or in daily life, data is everywhere. Massive data from many areas of life, such as medicine, meteorology, and daily living, has been developed into business applications that give people a better quality of life. Today, with informatization prevailing, people have recognized the importance of data: it holds great power and great wealth, and by using massive data to make decisions, enterprises and individuals can achieve more useful results. Faced with massive data, however, how to quickly search the stored data for the data a user needs is a problem that urgently needs to be solved.

Summary of the invention

Embodiments of the invention provide a data processing method and node, in order to search data quickly and improve the efficiency of searching massive data.

To achieve the above object, the embodiments of the invention adopt the following technical solutions:

An embodiment of the invention provides a data processing method applied to a data storage cluster, the cluster comprising a master node and at least one slave node, the method comprising: the master node determines data information to be searched; the master node generates a search instruction according to the data information to be searched and sends the search instruction to the at least one slave node, so that the at least one slave node finds, in the resilient distributed data it stores and according to the search instruction, the resilient distributed data corresponding to the data information to be searched, and returns that resilient distributed data to the master node; the at least one slave node stores resilient distributed data sent by the master node; the search instruction carries identification information of the data information to be searched; the master node obtains the resilient distributed data corresponding to the data information to be searched that is returned by the at least one slave node, and determines, according to that returned data, a response data set corresponding to the data information to be searched; and the master node outputs the response data set.

Further, before the master node determines the data information to be searched, the method also comprises: the master node obtains data to be stored; the master node divides the data to be stored into at least one piece of resilient distributed data according to a preset division rule; and the master node sends the at least one piece of resilient distributed data to the at least one slave node.

Further, the master node dividing the data to be stored into at least one piece of resilient distributed data according to the preset division rule comprises: the master node, according to the preset division rule, uses the spark.textFile function to divide the data to be stored into at least one piece of resilient distributed data.

Further, an embodiment of the invention provides a data processing method applied to a data storage cluster, the cluster comprising a master node and at least one slave node, the method comprising: the slave node receives the search instruction sent by the master node and, according to the identification information of the data information to be searched in the search instruction, searches the memory in which resilient distributed data is stored, obtaining the resilient distributed data corresponding to that identification information; the search instruction carries the identification information of the data information to be searched; and the slave node returns the obtained resilient distributed data corresponding to the identification information of the data information to be searched to the master node.

Further, before the slave node receives the search instruction sent by the master node and searches the memory in which resilient distributed data is stored according to the identification information of the data information to be searched in the search instruction, the method also comprises: the slave node receives the resilient distributed data sent by the master node; and the slave node stores the resilient distributed data in memory.

Further, an embodiment of the invention provides a master node, comprising: a determining unit, configured to determine data information to be searched; a processing unit, configured to generate a search instruction according to the data information to be searched and send the search instruction to at least one slave node, so that the at least one slave node finds, in the resilient distributed data it stores and according to the search instruction, the resilient distributed data corresponding to the data information to be searched, and returns that resilient distributed data to the master node, where the at least one slave node stores resilient distributed data sent by the master node and the search instruction carries identification information of the data information to be searched; the processing unit being further configured to obtain the resilient distributed data corresponding to the data information to be searched that is returned by the at least one slave node, and to determine, according to that returned data, a response data set corresponding to the data information to be searched; and an output unit, configured to output the response data set.

Further, the master node also comprises: an acquiring unit, configured to obtain data to be stored; a division unit, configured to divide the data to be stored obtained by the acquiring unit into at least one piece of resilient distributed data according to a preset division rule; and a transmitting unit, configured to send the at least one piece of resilient distributed data obtained by the division unit to the at least one slave node.

Further, the division unit is specifically configured to use the spark.textFile function, according to the preset division rule, to divide the data to be stored into at least one piece of resilient distributed data.

Further, an embodiment of the invention provides a slave node, comprising: a receiving unit, configured to receive the search instruction sent by the master node; a processing unit, configured to search the memory in which resilient distributed data is stored according to the identification information of the data information to be searched in the search instruction received by the receiving unit, obtaining the resilient distributed data corresponding to that identification information, where the search instruction carries the identification information of the data information to be searched; and a transmitting unit, configured to return the resilient distributed data obtained by the processing unit to the master node.

Further, the receiving unit is also configured to receive the resilient distributed data sent by the master node, and the processing unit is also configured to store the resilient distributed data received by the receiving unit in memory.

Embodiments of the invention provide a data processing method and node applied to a data storage cluster, the cluster comprising a master node and at least one slave node. The method comprises: the master node determines data information to be searched; the master node generates a search instruction according to the data information to be searched and sends the search instruction to the at least one slave node, so that the at least one slave node finds, in the resilient distributed data it stores and according to the search instruction, the resilient distributed data corresponding to the data information to be searched, and returns it to the master node; the master node obtains the resilient distributed data returned by the at least one slave node and determines from it the response data set corresponding to the data to be searched; and the master node outputs the response data set. In this way, when a data search is needed, the master node generates a search instruction according to the data to be searched and sends it to the slave nodes that store resilient distributed data, so that all slave nodes obtain the resilient distributed data corresponding to the data to be searched in parallel. That is, in the cluster of the invention, multiple slave nodes search the resilient distributed data they each store at the same time to obtain the resilient distributed data corresponding to the data to be searched. Data can therefore be searched quickly, and the efficiency of searching massive data is improved.

Brief description of the drawings

To illustrate the technical solutions of the embodiments of the invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of a data processing method provided by an embodiment of the invention;

Fig. 2 is a schematic flowchart of another data processing method provided by an embodiment of the invention;

Fig. 3 is a schematic flowchart of another data processing method provided by an embodiment of the invention;

Fig. 4 is a schematic structural diagram of a master node provided by an embodiment of the invention;

Fig. 5 is a schematic structural diagram of another master node provided by an embodiment of the invention;

Fig. 6 is a schematic structural diagram of a slave node provided by an embodiment of the invention.

Detailed description

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are obviously only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the invention without creative effort fall within the scope of protection of the invention.

An embodiment of the invention provides a data processing method applied to a data storage cluster, the cluster comprising a master node and at least one slave node.

It should be noted that, in all embodiments of the invention, the master node and the at least one slave node of the cluster are servers, each with no less than 96 GB of memory. In the cluster, the master node and the at least one slave node are installed and deployed according to the environment that Spark depends on, and Shark is installed and deployed on the master node and the at least one slave node. Further, Hadoop components are installed on the master node and the at least one slave node, for example the HDFS (Hadoop Distributed File System) component.

As shown in Fig. 1, the method comprises:

Step 101: the master node determines the data information to be searched.

Specifically, when a user needs to obtain data from the data storage cluster in which data has been stored, the user can send the data information to be searched to the master node. That is, the user tells the master node which data the user needs to obtain.

It should be noted that the master node of the cluster is set in advance.

For example, if the text data the user needs to search for is "data", the user can send the information of "data" to the master node of the cluster. The master node then obtains the information of "data" as the data information to be searched.

Step 102: the master node generates a search instruction according to the data information to be searched and sends the search instruction to the at least one slave node, so that the at least one slave node finds, in the resilient distributed data it stores and according to the search instruction, the resilient distributed data corresponding to the data information to be searched, and returns that resilient distributed data to the master node.

The at least one slave node stores resilient distributed data sent by the master node. The search instruction carries identification information of the data information to be searched.

Specifically, after obtaining the data information to be searched, the master node generates a search instruction from it. The search instruction can carry the identification information of the data information to be searched, from which a slave node can tell which data to search for. After generating the search instruction, the master node sends it to the slave nodes in the cluster. Because the slave nodes store resilient distributed data, each slave node can use the identification information in the search instruction to find, in the resilient distributed data it stores, the resilient distributed data that matches the identification information. That is, it finds the resilient distributed data corresponding to the data information to be searched. After finding this data, each of the at least one slave node in the cluster returns the resilient distributed data it found to the master node.

It should be noted that, while the slave nodes in the cluster look up the resilient distributed data corresponding to the data information to be searched, if the master node also stores resilient distributed data, the master node likewise needs to search the resilient distributed data stored in its own memory according to the data information to be searched.

It should be noted that, in the embodiments of the invention, data to be stored in the cluster is always first divided into multiple pieces of resilient distributed data and stored on each slave node and the master node in that form. The resilient distributed data stored on the master node and all slave nodes of the cluster can thus be regarded as a resilient distributed data set. Further, the master node and all slave nodes store resilient distributed data in memory.

Continuing the example above, after obtaining the information of "data", the master node can generate a search instruction for searching for "data" that carries the identification information of "data", and send it to all slave nodes in the cluster. After receiving the search instruction, a slave node can parse it and obtain the identification information of "data" that it carries. Having obtained this identification information, the slave node knows that it needs to search for "data"; it then searches the resilient distributed data stored in its own memory for "data" according to the identification information, and obtains the resilient distributed data of "data". Every slave node in the cluster returns the resilient distributed data of "data" that it obtained to the master node.
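As a rough illustration of step 102, the following Python sketch models the master node building a search instruction and every node scanning the records it holds in memory. It is a single-process stand-in, not Spark code: the SearchInstruction class, the build_instruction and search_partition helpers, the node names, the sample records, and the use of the trimmed search term as the identifier are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SearchInstruction:
    """Hypothetical message format: the source only says the instruction
    carries identification information of the data to be searched."""
    identifier: str


def build_instruction(search_info: str) -> SearchInstruction:
    # Assumption: the identifier is simply the trimmed search term.
    return SearchInstruction(identifier=search_info.strip())


def search_partition(records: List[str], instruction: SearchInstruction) -> List[str]:
    # A node scans the records it holds in memory for the identifier.
    return [record for record in records if instruction.identifier in record]


# In-memory records per node (illustrative data). The master node also
# searches its own memory when it holds resilient distributed data.
node_memory: Dict[str, List[str]] = {
    "slave-1": ["data alpha", "other beta"],
    "slave-2": ["gamma", "more data here"],
    "master": ["delta"],
}

instruction = build_instruction(" data ")
results = {
    node: search_partition(records, instruction)
    for node, records in node_memory.items()
}
```

Each entry of results stands for what one node would return to the master node; a node with no match returns an empty list.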

Step 103: the master node obtains the resilient distributed data corresponding to the data information to be searched that is returned by the at least one slave node, and determines, according to that returned data, the response data set corresponding to the data information to be searched.

Specifically, after receiving the resilient distributed data corresponding to the data information to be searched that is returned by the at least one slave node in the cluster, the master node can take the data set formed by that returned data as the response data set corresponding to the data information to be searched.

Continuing the example above, after receiving the resilient distributed data of "data" returned by the at least one slave node in the cluster, the master node can combine the resilient distributed data of "data" returned by all slave nodes to form the response data set of "data". Further, the master node can obtain the response data set of "data" with the instruction val data = file.filter(line => line.contains("data")).
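The combination in step 103 can be sketched in Python as follows. This is a local approximation under stated assumptions, not the patent's implementation: worker threads stand in for slave nodes, the function names and sample records are hypothetical, and the predicate mirrors the line.contains("data") test in the filter instruction quoted above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def search_partition(records: List[str], identifier: str) -> List[str]:
    # Keep only the lines that contain the identifier.
    return [line for line in records if identifier in line]


def collect_response_set(partitions: List[List[str]], identifier: str) -> List[str]:
    # One worker thread per partition stands in for one slave node.
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        per_node = list(
            pool.map(lambda part: search_partition(part, identifier), partitions)
        )
    # Executor.map yields results in input order, so the merge is deterministic.
    return [line for matches in per_node for line in matches]


response_set = collect_response_set(
    [["data a", "x"], ["y", "b data"], ["z"]], "data"
)
```

Flattening the per-node results in a fixed order is one possible way to form the response data set; the source does not say how the returned data is ordered.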

Step 104: the master node outputs the response data set.

Specifically, after obtaining the response data set corresponding to the data information to be searched, the master node can output it, so that the user obtains the required data and can carry out subsequent processing.

Continuing the example above, after determining the response data set of "data", the master node can output it to the user, so that the user obtains the required "data" and can carry out subsequent processing.

An embodiment of the invention provides a data processing method applied to a data storage cluster, the cluster comprising a master node and at least one slave node. The method comprises: the master node determines data information to be searched; the master node generates a search instruction according to the data information to be searched and sends the search instruction to the at least one slave node, so that the at least one slave node finds, in the resilient distributed data it stores and according to the search instruction, the resilient distributed data corresponding to the data information to be searched, and returns it to the master node; the master node obtains the resilient distributed data returned by the at least one slave node and determines from it the response data set corresponding to the data to be searched; and the master node outputs the response data set. In this way, when a data search is needed, the master node generates a search instruction according to the data to be searched and sends it to the slave nodes that store resilient distributed data, so that all slave nodes obtain the resilient distributed data corresponding to the data to be searched in parallel. That is, in the cluster of the invention, multiple slave nodes search the resilient distributed data they each store at the same time to obtain the resilient distributed data corresponding to the data to be searched. Data can therefore be searched quickly, and the efficiency of searching massive data is improved.

An embodiment of the invention provides a data processing method applied to a data storage cluster, the cluster comprising a master node and at least one slave node. As shown in Fig. 2, the method comprises:

Step 201: the slave node receives the search instruction sent by the master node and, according to the identification information of the data information to be searched in the search instruction, searches the memory in which resilient distributed data is stored, obtaining the resilient distributed data corresponding to that identification information.

The search instruction carries the identification information of the data information to be searched.

Specifically, after receiving the search instruction sent by the master node, the slave node can parse it and obtain the identification information of the data information to be searched that it carries. The slave node can then search its memory, in which resilient distributed data is stored, according to this identification information, and obtain the corresponding resilient distributed data. That is, it obtains the resilient distributed data corresponding to the data information to be searched.

Step 202: the slave node returns the obtained resilient distributed data corresponding to the identification information of the data information to be searched to the master node.

Specifically, after obtaining the resilient distributed data corresponding to the identification information of the data information to be searched, the slave node can return it to the master node. That is, the slave node returns the resilient distributed data corresponding to the data information to be searched to the master node.
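Steps 201 and 202 might look like the following on the slave side. The sketch is illustrative only: the source does not define a wire format, so the JSON encoding, the "identifier" and "matches" field names, and the SlaveNode class with its store and handle methods are assumptions.

```python
import json
from typing import List


class SlaveNode:
    """Hypothetical slave node that keeps its records in memory."""

    def __init__(self) -> None:
        self._memory: List[str] = []

    def store(self, records: List[str]) -> None:
        # Records sent by the master node are kept in memory.
        self._memory.extend(records)

    def handle(self, raw_instruction: str) -> str:
        # Step 201: parse the instruction and read the identifier it carries,
        # then search the in-memory records for that identifier.
        identifier = json.loads(raw_instruction)["identifier"]
        matches = [record for record in self._memory if identifier in record]
        # Step 202: return the matching records to the master node.
        return json.dumps({"matches": matches})


node = SlaveNode()
node.store(["data one", "unrelated", "two data"])
reply = node.handle(json.dumps({"identifier": "data"}))
```

Here reply is the serialized answer the master node would receive from this one slave node.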

An embodiment of the invention provides a data processing method applied to a data storage cluster, the cluster comprising a master node and at least one slave node. The method comprises: the slave node receives the search instruction sent by the master node and, according to the identification information of the data information to be searched in the search instruction, searches the memory in which resilient distributed data is stored, obtaining the resilient distributed data corresponding to that identification information; and the slave node returns the obtained resilient distributed data to the master node. In this way, when a data search is needed, every slave node in the cluster can search according to the search instruction after receiving it and obtain the resilient distributed data corresponding to the data information to be searched. That is, in the cluster of the invention, multiple slave nodes search the resilient distributed data they each store at the same time to obtain the resilient distributed data corresponding to the data information to be searched. Data can therefore be searched quickly, and the efficiency of searching massive data is improved.

An embodiment of the invention provides a data processing method applied to a data storage cluster, the cluster comprising a master node and at least one slave node. As shown in Fig. 3, the method comprises:

Step 301: the master node obtains data to be stored.

Specifically, because the data storage cluster is used to store data, a user who needs to store data can store it in this cluster. The user sends the data to the master node of the cluster, and the master node thereby obtains the data to be stored.

Of course, when another device needs to store data, it can also store the data in the cluster. The other device sends the data to be stored to the master node of the cluster, and the master node thereby obtains the data to be stored.

Step 302: the master node divides the data to be stored into at least one piece of resilient distributed data according to a preset division rule.

Specifically, after obtaining the data to be stored, the master node can divide it into at least one piece of resilient distributed data according to the preset division rule.

It should be noted that the preset division rule is a rule, set in advance, for dividing data into multiple pieces of resilient distributed data. For example, the preset division rule may be to divide data according to a size a. In that case, the master node divides the data to be stored by size a, so that each portion of size a becomes one piece of resilient distributed data, and the data to be stored is thereby divided into at least one piece of resilient distributed data.
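A size-based division rule of this kind can be sketched in a few lines of Python. The sketch assumes that "size a" means a fixed number of records per piece; the source does not say whether size is counted in records or in bytes, and the function name split_by_size is hypothetical.

```python
from typing import List


def split_by_size(records: List[str], chunk_size: int) -> List[List[str]]:
    # Assumption: chunk_size plays the role of "size a", counted in records.
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Consecutive slices of chunk_size records; the last piece may be shorter.
    return [
        records[start:start + chunk_size]
        for start in range(0, len(records), chunk_size)
    ]


pieces = split_by_size(["r1", "r2", "r3", "r4", "r5"], 2)
```

With five records and a size of two, the data is divided into three pieces, the last of which holds the single remaining record.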

Further, the master node dividing the data to be stored into at least one piece of resilient distributed data according to the preset division rule comprises: the master node, according to the preset division rule, uses the spark.textFile function to divide the data to be stored into at least one piece of resilient distributed data.

That is, the master node can divide the data to be stored by means of the spark.textFile function, according to the preset division rule, into at least one piece of resilient distributed data.

Step 303: the master node sends the at least one piece of resilient distributed data to the at least one slave node. The slave node receives the resilient distributed data sent by the master node.

Specifically, after dividing the data to be stored into at least one piece of resilient distributed data, the master node needs to distribute these pieces across the nodes of the cluster. The master node can send the resilient distributed data to each node in the cluster in turn. Each slave node in the cluster thus receives the resilient distributed data sent to it by the master node.

It should be noted that the master node may also store resilient distributed data. In that case, when sending resilient distributed data to the at least one slave node, the master node can follow a certain order so that the resilient distributed data is distributed evenly across the nodes.

It should be noted that the resilient distributed data the master node sends to different nodes is different. That is, the master node sends different pieces of resilient distributed data to different nodes.

For example, the master node divides the data to be stored into resilient distributed data 1, resilient distributed data 2, resilient distributed data 3, and resilient distributed data 4. The cluster comprises the master node, slave node 1, slave node 2, and slave node 3. The master node can send resilient distributed data 1 to slave node 1, resilient distributed data 2 to slave node 2, and resilient distributed data 3 to slave node 3, and store resilient distributed data 4 in its own memory. Slave node 1 thus receives resilient distributed data 1, slave node 2 receives resilient distributed data 2, slave node 3 receives resilient distributed data 3, and the master node stores resilient distributed data 4 in its own memory.
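The assignment in this example is consistent with a simple round-robin distribution, sketched below in Python. Round-robin is one possible reading of "a certain order"; the source does not name a specific policy, and the function name and node labels are hypothetical.

```python
from typing import Dict, List


def assign_round_robin(pieces: List[str], nodes: List[str]) -> Dict[str, List[str]]:
    if not nodes:
        raise ValueError("at least one node is required")
    assignment: Dict[str, List[str]] = {node: [] for node in nodes}
    # Piece i goes to node i modulo the node count, so the load stays even.
    for index, piece in enumerate(pieces):
        assignment[nodes[index % len(nodes)]].append(piece)
    return assignment


assignment = assign_round_robin(
    ["data 1", "data 2", "data 3", "data 4"],
    ["slave node 1", "slave node 2", "slave node 3", "master node"],
)
```

Listing the master node last reproduces the example: the three slave nodes receive pieces 1 to 3 and the master node keeps piece 4. With more pieces than nodes, the assignment wraps around and each node holds several pieces.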

Step 304: the slave node stores the resilient distributed data in memory.

Specifically, after receiving resilient distributed data, the slave node needs to store it. In the data storage cluster, all data is stored in memory, which increases the speed of data processing. Each slave node therefore stores the resilient distributed data it obtains in its own memory.

Further, the slave node can use the cache function to load the resilient distributed data into memory for storage.
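For readers working with PySpark, storage and search might look roughly like the sketch below. It assumes a current PySpark installation rather than the Spark and Shark versions of the patent's era, and the function name, application name, and default partition count are placeholders. In PySpark, textFile is a method of SparkContext, cache() marks an RDD for in-memory storage, and filter and collect run the search and return the matches to the driver. The import sits inside the function so that defining it does not require Spark to be installed.

```python
from typing import List


def search_text_file(path: str, term: str, min_partitions: int = 4) -> List[str]:
    # Imported lazily so that defining this function does not require PySpark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("rdd-search-sketch").getOrCreate()
    try:
        # minPartitions is a hint for the minimum number of partitions
        # the file is split into.
        lines = spark.sparkContext.textFile(path, minPartitions=min_partitions)
        # Mark the RDD for in-memory storage on the worker nodes.
        lines.cache()
        # Each partition is filtered where it lives; collect() brings the
        # matching lines back to the driver.
        matches = lines.filter(lambda line: term in line).collect()
    finally:
        spark.stop()
    return matches
```

Note that cache() is lazy: the data is only loaded into memory when the first action, here collect(), runs. In a long-running service the session would normally be kept open so that later searches reuse the cached data, rather than being stopped after each call as in this sketch.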

Further, each node in cluster comprises when elasticity distribution formula data being stored in internal memory from node and main controlled node, elasticity distribution formula data can be stored in internal memory as configuration data.

Step 305: The master control node determines the to-be-searched data information.

Specifically, refer to step 101; details are not repeated here.

Step 306: The master control node generates a search instruction according to the to-be-searched data information and sends the search instruction to at least one slave node, so that the at least one slave node finds, in its stored elastic distributed data and according to the search instruction, the elastic distributed data corresponding to the to-be-searched data information, and returns that elastic distributed data to the master control node. The slave node receives the search instruction sent by the master control node, searches the memory in which the elastic distributed data is stored according to the identifier information of the to-be-searched data information carried in the search instruction, and obtains the elastic distributed data corresponding to that identifier information.

Specifically, refer to step 102 and step 201; details are not repeated here.

Step 307: The slave node returns the obtained elastic distributed data corresponding to the identifier information of the to-be-searched data information to the master control node. The master control node obtains the elastic distributed data corresponding to the to-be-searched data information returned by the at least one slave node and, according to that returned data, determines the response data set corresponding to the to-be-searched data information.

Specifically, refer to step 103 and step 202; details are not repeated here.

Step 308: The master control node outputs the response data set.

Specifically, refer to step 104; details are not repeated here.
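Steps 305 to 308 can be illustrated with the following minimal Python sketch. It simulates the slave nodes as in-memory lists within one process and uses a thread pool to stand in for parallel search. The function names, the dictionary used as the search instruction, and substring matching on the identifier are all assumptions made for illustration; the description does not specify an instruction format or a matching rule.

```python
from concurrent.futures import ThreadPoolExecutor


def slave_search(memory, instruction):
    """Slave side: look up the identifier carried by the search instruction."""
    identifier = instruction["identifier"]
    # Hypothetical matching rule: a record matches if it contains the identifier.
    return [record for record in memory if identifier in record]


def master_search(identifier, slave_memories):
    """Master side: send the instruction, gather results, build the response set."""
    instruction = {"identifier": identifier}  # hypothetical instruction format
    with ThreadPoolExecutor(max_workers=len(slave_memories)) as pool:
        # Each simulated slave node searches its own in-memory data concurrently.
        partial_results = pool.map(
            lambda memory: slave_search(memory, instruction), slave_memories
        )
        # Merge the per-node results into one response data set.
        response_set = [record for partial in partial_results for record in partial]
    return response_set


slave_memories = [
    ["alice,30", "bob,25"],
    ["carol,41", "alice,52"],
    ["dave,19"],
]
print(master_search("alice", slave_memories))
```

Because every node scans only its own share of the data, the time for one search is bounded by the slowest node rather than by the total data volume, which is the effect the method relies on.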

An embodiment of the present invention provides a master control node which, as shown in Figure 4, comprises:

A determining unit 401, configured to determine to-be-searched data information.

A processing unit 402, configured to generate a search instruction according to the to-be-searched data information and send the search instruction to at least one slave node, so that the at least one slave node finds, in its stored elastic distributed data and according to the search instruction, the elastic distributed data corresponding to the to-be-searched data information, and returns that elastic distributed data to the master control node.

The processing unit 402 is further configured to obtain the elastic distributed data corresponding to the to-be-searched data information returned by the at least one slave node and, according to that returned data, determine the response data set corresponding to the to-be-searched data information.

An output unit 403, configured to output the response data set.

Further, as shown in Figure 5, the above master control node also comprises:

An acquiring unit 404, configured to obtain to-be-stored data.

A division unit 405, configured to divide the to-be-stored data obtained by the acquiring unit 404 into at least one piece of elastic distributed data according to a preset division rule.

Specifically, the division unit 405 is configured to use the spark.textFile function to divide the to-be-stored data into at least one piece of elastic distributed data according to the preset division rule.

A sending unit 406, configured to send the at least one piece of elastic distributed data obtained by the division unit 405 to at least one slave node.
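The work of the division unit 405 can be sketched as follows. The description does not define the preset division rule, so this sketch assumes a hypothetical rule of a fixed number of records per piece, and it uses plain Python rather than Spark. In Apache Spark itself, `SparkContext.textFile` reads a text file into an RDD and accepts an optional minimum number of partitions, which plays a comparable role.

```python
def divide(records, records_per_partition):
    """Split the to-be-stored records into consecutive fixed-size pieces."""
    if records_per_partition < 1:
        raise ValueError("records_per_partition must be at least 1")
    # Hypothetical division rule: cut the record list every N records.
    return [
        records[start:start + records_per_partition]
        for start in range(0, len(records), records_per_partition)
    ]


records = ["r1", "r2", "r3", "r4", "r5"]
partitions = divide(records, 2)
```

The last piece may be shorter than the others when the number of records is not a multiple of the piece size.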

An embodiment of the present invention provides a master control node. The master control node determines to-be-searched data information; generates a search instruction according to the to-be-searched data information and sends the search instruction to at least one slave node, so that the at least one slave node finds, in its stored elastic distributed data and according to the search instruction, the elastic distributed data corresponding to the to-be-searched data information, and returns that elastic distributed data to the master control node; obtains the elastic distributed data corresponding to the to-be-searched data information returned by the at least one slave node and, according to that returned data, determines the response data set corresponding to the to-be-searched data; and outputs the response data set. In this way, when a data search is needed, the master control node can generate a search instruction according to the to-be-searched data and send it to the slave nodes that store elastic distributed data, so that all slave nodes obtain the elastic distributed data corresponding to the to-be-searched data in parallel. That is, in the cluster of the present invention, multiple slave nodes simultaneously search the elastic distributed data that each of them stores to obtain the elastic distributed data corresponding to the to-be-searched data. Data can therefore be found quickly, which improves the efficiency of searching massive data.

An embodiment of the present invention provides a slave node which, as shown in Figure 6, comprises:

A receiving unit 601, configured to receive the search instruction sent by the master control node.

A processing unit 602, configured to search the memory in which elastic distributed data is stored according to the identifier information of the to-be-searched data information in the search instruction received by the receiving unit 601, and to obtain the elastic distributed data corresponding to the identifier information of the to-be-searched data information.

A sending unit 603, configured to return to the master control node the elastic distributed data, obtained by the processing unit 602, that corresponds to the identifier information of the to-be-searched data information.

Further, the receiving unit 601 is also configured to receive the elastic distributed data sent by the master control node.

The processing unit 602 is also configured to store the elastic distributed data received by the receiving unit 601 in memory.
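The division of labor among units 601 to 603 can be pictured with the short Python sketch below. The class name, the message dictionaries with a `type` field, and records keyed by an `id` field are hypothetical choices for illustration only; the description does not specify message or record formats.

```python
class SlaveNode:
    """Hypothetical slave node with receiving, processing, and sending roles."""

    def __init__(self):
        self.memory = []  # elastic distributed data held in memory

    def receive(self, message):
        """Receiving unit: accept data to store or a search instruction."""
        if message["type"] == "store":
            self.memory.extend(message["records"])  # keep the data in memory
            return None
        if message["type"] == "search":
            return self.send(self.process(message["identifier"]))
        raise ValueError("unknown message type: " + str(message["type"]))

    def process(self, identifier):
        """Processing unit: search memory by the identifier in the instruction."""
        return [record for record in self.memory if record["id"] == identifier]

    def send(self, records):
        """Sending unit: wrap the matching data for return to the master."""
        return {"type": "result", "records": records}


node = SlaveNode()
node.receive({"type": "store", "records": [{"id": "a", "value": 1}, {"id": "b", "value": 2}]})
reply = node.receive({"type": "search", "identifier": "b"})
```

Here the same receiving entry point handles both the data sent by the master control node and the later search instruction, mirroring the two roles given to the receiving unit 601.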

An embodiment of the present invention provides a slave node. The slave node receives the search instruction sent by the master control node, searches the memory in which elastic distributed data is stored according to the identifier information of the to-be-searched data information in the search instruction, and obtains the elastic distributed data corresponding to that identifier information; the slave node then returns the obtained elastic distributed data to the master control node. In this way, when a data search is needed, every slave node in the cluster can, after receiving the search instruction, search according to it and obtain the elastic distributed data corresponding to the to-be-searched data information. That is, in the cluster of the present invention, multiple slave nodes simultaneously search the elastic distributed data that each of them stores to obtain the elastic distributed data corresponding to the to-be-searched data information. Data can therefore be found quickly, which improves the efficiency of searching massive data.

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements for some of the technical features therein, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A data processing method, characterized in that the method is applied to a data storage cluster, the cluster comprises a master control node and at least one slave node, and the method comprises:

The master control node determines to-be-searched data information;

The master control node generates a search instruction according to the to-be-searched data information and sends the search instruction to the at least one slave node, so that the at least one slave node finds, in its stored elastic distributed data and according to the search instruction, the elastic distributed data corresponding to the to-be-searched data information, and returns the elastic distributed data corresponding to the to-be-searched data information to the master control node; wherein the at least one slave node stores elastic distributed data sent by the master control node, and the search instruction carries identifier information of the to-be-searched data information;

The master control node obtains the elastic distributed data corresponding to the to-be-searched data information returned by the at least one slave node and, according to the elastic distributed data corresponding to the to-be-searched data information returned by the at least one slave node, determines a response data set corresponding to the to-be-searched data information;

The master control node outputs the response data set.

2. The method according to claim 1, characterized in that, before the master control node determines the to-be-searched data information, the method further comprises:

The master control node obtains to-be-stored data;

The master control node divides the to-be-stored data into at least one piece of elastic distributed data according to a preset division rule;

The master control node sends the at least one piece of elastic distributed data to the at least one slave node.

3. The method according to claim 2, characterized in that the master control node dividing the to-be-stored data into at least one piece of elastic distributed data according to the preset division rule comprises:

The master control node uses the spark.textFile function to divide the to-be-stored data into at least one piece of elastic distributed data according to the preset division rule.

4. A data processing method, characterized in that the method is applied to a data storage cluster, the cluster comprises a master control node and at least one slave node, and the method comprises:

The slave node receives a search instruction sent by the master control node, searches the memory in which elastic distributed data is stored according to identifier information of to-be-searched data information in the search instruction, and obtains the elastic distributed data corresponding to the identifier information of the to-be-searched data information; wherein the search instruction carries the identifier information of the to-be-searched data information;

The slave node returns the obtained elastic distributed data corresponding to the identifier information of the to-be-searched data information to the master control node.

5. The method according to claim 4, characterized in that, before the slave node receives the search instruction sent by the master control node, searches the memory in which elastic distributed data is stored according to the identifier information of the to-be-searched data information in the search instruction, and obtains the elastic distributed data corresponding to the identifier information of the to-be-searched data information, the method further comprises:

The slave node receives the elastic distributed data sent by the master control node;

The slave node stores the elastic distributed data in memory.

6. A master control node, characterized by comprising:

A determining unit, configured to determine to-be-searched data information;

A processing unit, configured to generate a search instruction according to the to-be-searched data information and send the search instruction to the at least one slave node, so that the at least one slave node finds, in its stored elastic distributed data and according to the search instruction, the elastic distributed data corresponding to the to-be-searched data information, and returns the elastic distributed data corresponding to the to-be-searched data information to the master control node; wherein the at least one slave node stores elastic distributed data sent by the master control node, and the search instruction carries identifier information of the to-be-searched data information;

The processing unit is further configured to obtain the elastic distributed data corresponding to the to-be-searched data information returned by the at least one slave node and, according to the elastic distributed data corresponding to the to-be-searched data information returned by the at least one slave node, determine a response data set corresponding to the to-be-searched data information;

An output unit, configured to output the response data set.

7. The master control node according to claim 6, characterized by further comprising:

An acquiring unit, configured to obtain to-be-stored data;

A division unit, configured to divide the to-be-stored data obtained by the acquiring unit into at least one piece of elastic distributed data according to a preset division rule;

A sending unit, configured to send the at least one piece of elastic distributed data obtained by the division unit to the at least one slave node.

8. The master control node according to claim 7, characterized in that:

The division unit is specifically configured to use the spark.textFile function to divide the to-be-stored data into at least one piece of elastic distributed data according to the preset division rule.

9. A slave node, characterized by comprising:

A receiving unit, configured to receive the search instruction sent by the master control node;

A processing unit, configured to search the memory in which elastic distributed data is stored according to identifier information of to-be-searched data information in the search instruction received by the receiving unit, and to obtain the elastic distributed data corresponding to the identifier information of the to-be-searched data information; wherein the search instruction carries the identifier information of the to-be-searched data information;

A sending unit, configured to return to the master control node the elastic distributed data, obtained by the processing unit, that corresponds to the identifier information of the to-be-searched data information.

10. The slave node according to claim 9, characterized in that:

The receiving unit is further configured to receive the elastic distributed data sent by the master control node;

The processing unit is further configured to store the elastic distributed data received by the receiving unit in memory.