CN110046062A

CN110046062A - Distributed data processing method and system

Info

Publication number: CN110046062A
Application number: CN201910173569.0A
Authority: CN
Inventors: 郑轩; 贾志忠; 曲家朋; 段和枫
Original assignee: PCI Suntek Technology Co Ltd
Current assignee: PCI Technology Group Co Ltd
Priority date: 2019-03-07
Filing date: 2019-03-07
Publication date: 2019-07-23
Anticipated expiration: 2039-03-07
Also published as: CN110046062B

Abstract

An embodiment of the invention discloses a distributed data processing method and system. A metadata node, which belongs to a metadata cluster, receives a retrieval instruction sent by a client. According to the retrieval instruction, the metadata node feeds back all of its stored metadata to the client, so that the client selects a first target data node according to the metadata and sends a retrieval request to it. One group of metadata corresponds to one data node, multiple data nodes form a data node group, all data node groups form a feature index server cluster, each data node group contains one first target data node, and data is synchronized within each data node group. The first target data node receives the retrieval request and obtains the feature index it carries. The first target data node then determines retrieval data according to the feature index and feeds it back to the client, so that the client determines the retrieval result according to the retrieval data. This enables fast retrieval without the client storing any metadata.

Description

Distributed data processing method and system

Technical field

Embodiments of the present invention relate to the field of computer technology, and in particular to a distributed data processing method and system.

Background technique

With the development of computer technology, more and more fields depend on it. In intelligent security, for example, various electronic devices are networked to form a security system, and cameras are an indispensable part of such systems. To make later evidence collection easier, a server-class electronic device in the security system usually stores the pictures captured by the cameras, while the corresponding metadata is stored in the client used by the user, so that the user can retrieve pictures by metadata. In this arrangement the stored data is fault-tolerant only at the disk level: if the client malfunctions or is damaged, the user can no longer retrieve the saved data, which can cause irreparable loss to the user.

Summary of the invention

The present invention provides a distributed data processing method and system that enable fast retrieval without storing metadata on the client.

In a first aspect, an embodiment of the invention provides a distributed data processing method, comprising:

A metadata node receives a retrieval instruction sent by a client, the metadata node belonging to a metadata cluster;

The metadata node feeds back all of its stored metadata to the client according to the retrieval instruction, so that the client selects a first target data node according to the metadata and sends a retrieval request to the first target data node, wherein one group of metadata corresponds to one data node, multiple data nodes form a data node group, all data node groups form a feature index server cluster, each data node group has one data node serving as a first target data node, and data is synchronized within each data node group;

The first target data node receives the retrieval request and obtains the feature index in the retrieval request;

The first target data node determines retrieval data according to the feature index and feeds the retrieval data back to the client, so that the client determines a retrieval result according to the retrieval data.

Further, the first target data node determining retrieval data according to the feature index comprises:

The first target data node determines whether retrieval data corresponding to the feature index exists in its own cache manager;

If so, it obtains the retrieval data;

If not, it searches its own storage manager for retrieval data corresponding to the feature index.

Further, the feature index includes feature data, and the feature data is obtained by applying deep learning to the data to be retrieved.

Further, the retrieval data includes a unique data identifier and/or data location information.

Further, the data node group includes a primary data node and multiple slave data nodes, and the primary data node controls data synchronization within the data node group.

Further, the method also comprises:

The metadata node receives a write instruction sent by the client;

The metadata node determines a second target data node among all data nodes according to the write instruction;

The metadata node feeds back the metadata of the second target data node to the client, so that the client sends a write request to the second target data node according to the metadata;

The second target data node receives the write request;

The second target data node responds to the write request.

Further, the second target data node is the primary data node of the data node group to which it belongs,

and the second target data node responding to the write request comprises:

The second target data node performs the write operation in its own storage manager according to the write request and creates a feature index;

The second target data node updates its own cache manager with the data written to the storage manager and the feature index;

The second target data node feeds back a write response message to the client.

Further, the second target data node is a slave data node of the data node group to which it belongs, and the second target data node responding to the write request comprises:

The second target data node sends the write request to a third target data node, the third target data node being the primary data node of that data node group;

The third target data node performs the write operation in its own storage manager according to the write request and creates a feature index;

The third target data node updates its own cache manager with the data written to the storage manager and the feature index;

The third target data node feeds back a write response message to the second target data node;

The second target data node feeds back the write response message to the client.
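The two write paths above (the target is the primary, or the target is a slave that forwards to the primary) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `DataNode` class, its dictionary-backed `storage` and `cache`, and the `make_feature_index` callback are hypothetical names introduced here, and synchronization from the primary to the slaves is left out.

```python
class DataNode:
    """Hypothetical data node; `primary` is None when this node is the primary."""

    def __init__(self, name, primary=None):
        self.name = name
        self.primary = primary
        self.storage = {}  # stands in for the storage manager
        self.cache = {}    # stands in for the cache manager

    def handle_write(self, data_id, payload, make_feature_index):
        if self.primary is not None:
            # Slave: forward the write request to the primary of the group
            # and relay the primary's write response to the caller.
            return self.primary.handle_write(data_id, payload, make_feature_index)
        # Primary: write to the storage manager and create the feature index.
        feature_index = make_feature_index(payload)
        self.storage[data_id] = (payload, feature_index)
        # Then update the cache manager with the written data and its index.
        self.cache[data_id] = (payload, feature_index)
        return {"status": "ok", "data_id": data_id, "written_by": self.name}
```

In both cases the client receives its response from the node it addressed; only the primary of the group performs the write.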

In a second aspect, an embodiment of the invention further provides a distributed data processing system, comprising a metadata cluster and a feature index server cluster, wherein the metadata cluster includes at least one metadata node, the feature index server cluster includes multiple data node groups, each data node group includes multiple data nodes, and data is synchronized within each data node group;

The metadata node is configured to receive a retrieval instruction sent by a client, and is further configured to feed back all of its stored metadata to the client according to the retrieval instruction, so that the client selects a first target data node according to the metadata and sends a retrieval request to the first target data node, wherein one group of metadata corresponds to one data node, and each data node group has one data node serving as a first target data node;

The first target data node is configured to receive the retrieval request and obtain the feature index in the retrieval request, and is further configured to determine retrieval data according to the feature index and feed the retrieval data back to the client, so that the client determines a retrieval result according to the retrieval data.

Further, the hash values of the data nodes in a data node group fall within a set range.

In the above distributed data processing method and system, a metadata node in the metadata cluster receives the retrieval instruction sent by the client and feeds back to the client the metadata of the data nodes in all data node groups of the feature index server cluster. The client then selects a first target data node in each data node group according to the metadata and sends a retrieval request to each first target data node. Each first target data node obtains the feature index from the retrieval request, retrieves the corresponding retrieval data according to the feature index, and feeds the retrieval data back to the client, so that the client can determine the retrieval result. In this way the client can retrieve quickly without storing metadata. In addition, because data is synchronized within each data node group, fault tolerance at the node level is also ensured: even if any data node in a group fails, the other data nodes in the group can still provide the retrieval service, which safeguards data security and stability.

Brief description of the drawings

Fig. 1 is a flow chart of a distributed data processing method provided by an embodiment of the present invention;

Fig. 2 is a data flow diagram of retrieval by a client, provided by an embodiment of the present invention;

Fig. 3 is a flow chart of another distributed data processing method provided by an embodiment of the present invention;

Fig. 4 is a data flow diagram of a first target data node handling a retrieval request, provided by an embodiment of the present invention;

Fig. 5 is a flow chart of yet another distributed data processing method provided by an embodiment of the present invention;

Fig. 6 is a data flow diagram between the clusters when a client writes data, provided by an embodiment of the present invention;

Fig. 7 is a data flow diagram of a primary data node responding to a write, provided by an embodiment of the present invention;

Fig. 8 is a structural schematic diagram of a distributed data processing system provided by an embodiment of the present invention;

Fig. 9 is a structural schematic diagram of a distributed data processing device provided by an embodiment of the present invention;

Fig. 10 is a structural schematic diagram of a metadata node provided by an embodiment of the present invention;

Fig. 11 is a structural schematic diagram of a data node provided by an embodiment of the present invention.

Detailed description of embodiments

The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve to explain the invention, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the invention rather than the entire structure.

Fig. 1 is a flow chart of a distributed data processing method provided by an embodiment of the present invention. The method is suitable for retrieving distributed stored data by feature index, and can be executed by a distributed data processing system.

The distributed data processing system includes a metadata cluster and a feature index server cluster. The metadata cluster includes multiple metadata nodes, each of which can be regarded as a server. A metadata node stores the metadata of the data nodes in the feature index server cluster, and every metadata node stores the metadata of all data nodes. The feature index server cluster includes multiple data node groups, and each data node group includes multiple data nodes. Each data node stores the data written by users and the feature indexes generated from that data. Data is synchronized among the data nodes of a data node group, so a user needs to access only one data node of the group to obtain the required data, and if any data node in the group becomes abnormal the user can still retrieve the required data from the group. The way data nodes are grouped can be set according to actual conditions. For example, data nodes located in the same geographic area may be assigned to the same data node group. As another example, a consistent hashing algorithm may be applied to set data of each data node, such as its device feature code, and the data nodes whose hash values fall within the same set range are assigned to the same data node group.
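The hash-based grouping can be illustrated with a short sketch. It is a simplification under stated assumptions: it partitions a fixed hash space into equal-width ranges rather than implementing a full consistent-hashing ring, and the use of MD5, the 32-bit hash space, and the names `hash_value` and `group_nodes` are choices made here for illustration, not details given in the text.

```python
import hashlib
from collections import defaultdict


def hash_value(device_code: str, space: int = 2**32) -> int:
    """Map a device feature code to an integer in [0, space)."""
    digest = hashlib.md5(device_code.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % space


def group_nodes(device_codes, group_count: int = 4, space: int = 2**32):
    """Put nodes whose hash values fall in the same range into the same group."""
    width = space // group_count
    groups = defaultdict(list)
    for code in device_codes:
        # Clamp so that the top of the hash space lands in the last group.
        group_id = min(hash_value(code, space) // width, group_count - 1)
        groups[group_id].append(code)
    return dict(groups)
```

Because the hash depends only on the device feature code, every metadata node or client that runs the same function derives the same grouping.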

Specifically, with reference to Fig. 1, the distributed data processing method includes:

Step 110: a metadata node receives a retrieval instruction sent by a client, the metadata node belonging to a metadata cluster.

Illustratively, the client is a smart electronic device used by the user that can access the metadata nodes in the metadata cluster and the data nodes in the feature index server cluster. Optionally, the client may be at least one user-side device such as a mobile phone, tablet computer, laptop or desktop computer. When the user needs to retrieve data, the retrieval instruction can be issued through the client. The embodiment does not limit how the client generates the retrieval instruction.

Specifically, after generating the retrieval instruction, the client sends it to a metadata node. The embodiment does not limit the communication mode used between the client and the metadata node. Optionally, the metadata node may be a designated node in the metadata cluster or any node in the metadata cluster. For example, the range of clients served by each metadata node may be preset, in which case a client within that range communicates with its designated metadata node. As another example, the client may send the retrieval instruction to any metadata node, or broadcast it to the metadata nodes.

Further, the retrieval instruction is an instruction that prompts data retrieval. The embodiment does not limit its content. For example, the retrieval instruction may be set to include the identity information of the client and a set retrieval code; the metadata node recognizes from the retrieval code that it has received a retrieval instruction, and identifies the client from the identity information.

Step 120: the metadata node feeds back all of its stored metadata to the client according to the retrieval instruction, so that the client selects a first target data node according to the metadata and sends a retrieval request to the first target data node.

One group of metadata corresponds to one data node, multiple data nodes form a data node group, all data node groups form the feature index server cluster, each data node group has one data node serving as a first target data node, and data is synchronized within each data node group.

Specifically, after receiving the retrieval instruction, the metadata node feeds back all of the metadata it stores to the client. The metadata node stores the metadata of every data node in the feature index server cluster. Metadata describes the attribute information of a data node; in this embodiment it includes at least the location information of the data node and the group to which it belongs, so that the client can locate each data node from the location information, determine its data node group from the group information, and thereby access the data node. In general, a data node reports its own metadata to the metadata nodes after it starts. It may send the metadata to all metadata nodes, so that every metadata node receives it. Alternatively, it may report the metadata to any one metadata node or to a designated metadata node, which then synchronizes the metadata within the metadata cluster so that every metadata node obtains it. The embodiment does not limit how the designated metadata node is determined.
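The second reporting option, in which a data node reports to one metadata node and that node synchronizes the entry within the metadata cluster, can be sketched as below. The `MetadataNode` class, its `peers` list, and the method names are hypothetical, and the synchronization is a plain in-process copy standing in for whatever replication protocol a real cluster would use.

```python
class MetadataNode:
    """Hypothetical metadata node holding the metadata of every data node."""

    def __init__(self):
        self.table = {}  # data node id -> metadata (location, group, ...)
        self.peers = []  # the other metadata nodes in the metadata cluster

    def report(self, node_id, metadata):
        """Called by a data node after start-up to register its metadata."""
        self.table[node_id] = metadata
        for peer in self.peers:
            # Synchronize so that every metadata node holds the same entry.
            peer.table[node_id] = metadata

    def handle_retrieval_instruction(self):
        """Feed back all stored metadata, as in step 120."""
        return list(self.table.values())
```

With this arrangement a client may send its retrieval instruction to any metadata node and still receive the metadata of all data nodes.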

Illustratively, after receiving the metadata, the client determines from it the data node group to which each data node belongs. In general, each data node group includes multiple data nodes whose data is synchronized, while the data held by different data node groups may be the same or different. After determining the grouping, the client selects one data node in each data node group in turn as a first target data node. Because the embodiment requires the data of all data nodes in a group to be synchronized, every data node in a group stores the same data, so the client may pick any data node in each group as the first target data node. It is understood that in practice the client may also select a particular data node in a group as the first target data node because of constraints such as communication conditions. Typically, each data node group corresponds to one first target data node, which ensures that the client searches all of the data stored in the feature index server cluster.
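A minimal sketch of this client-side selection follows. The `NodeMetadata` fields mirror the two attributes the text requires (location and group); the class name, field names, and the choice of a uniformly random pick are assumptions made for illustration.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeMetadata:
    node_id: str
    address: str   # location information of the data node
    group_id: str  # data node group the node belongs to


def select_first_targets(metadata, rng=None):
    """Pick one data node per data node group as the first target data node."""
    rng = rng or random.Random()
    by_group = defaultdict(list)
    for entry in metadata:
        by_group[entry.group_id].append(entry)
    # Any node of a group will do, because data is synchronized within a group.
    return {group_id: rng.choice(nodes) for group_id, nodes in by_group.items()}
```

A client constrained by communication conditions could replace `rng.choice` with its own preference rule without changing the rest of the flow.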

Further, after selecting the first target data nodes, the client sends a retrieval request to each of them according to the corresponding metadata. The embodiment does not limit how the retrieval request is generated. In general, the retrieval request includes at least a feature index. Optionally, the feature index includes feature data, which is obtained by applying deep learning to the data to be retrieved; the feature data can thus be regarded as the key of the feature index. The embodiment does not limit the format of the data to be retrieved, which may be image data, text data and/or audio data, and different data corresponds to different feature data. The feature data may be obtained through deep learning performed by the corresponding data node or by another data system. This embodiment is described for the case in which another data system performs the deep learning; the hardware hosting that data system is a different device from the data nodes, and the data system can communicate with both the data nodes and the client. In general, when a data node or another storage node stores data, the other data system applies deep learning to the stored data to obtain the corresponding feature data. The specific deep learning method can be set according to actual conditions. The feature data is a character string of a certain length, and the data node has no permission to modify it. After obtaining the feature data, the data node stores it.

Optionally, the client also obtains and stores the feature data. In that case, to retrieve some data the client needs only to input the corresponding feature data. For example, if the client wants to retrieve pictures containing object A, it generates a feature index from the feature data of object A and can then perform the retrieval; here the picture of object A is the data to be retrieved. Alternatively, the client does not hold the feature data. In that case, to retrieve some data the client inputs reference data as the data to be retrieved, and the other data system generates the feature data and feeds it back to the client so that the client can generate the feature index. For example, if the client wants to retrieve pictures containing object B, it supplies a picture containing object B as the data to be retrieved, the other data system generates the feature data, and the client then generates a feature index from the feature data of object B and performs the retrieval.

Step 130: the first target data node receives the retrieval request and obtains the feature index in the retrieval request.

Illustratively, every first target data node handles the retrieval request in the same way, so the embodiment describes a single first target data node. Specifically, after receiving the retrieval request, the first target data node parses it to obtain the feature index it contains. Optionally, when the user needs to retrieve multiple pieces of data, multiple feature indexes can be written into the retrieval request.

Step 140: the first target data node determines retrieval data according to the feature index and feeds the retrieval data back to the client, so that the client determines a retrieval result according to the retrieval data.

Specifically, the retrieval data is the data obtained according to the feature index, and its exact content can be set according to actual conditions. In this embodiment the retrieval data includes a unique data identifier and/or data location information, and optionally a data write time and/or a feature index similarity. The data write time is the time at which the data was stored. The unique data identifier is the identity of the data and is unique; the embodiment does not limit the rule by which it is generated. The data location information is the location at which the data is stored. The data write time, data location information and unique data identifier can together be called the characteristic information of the data. The feature index similarity is the similarity between the feature data in the feature index and the feature data found in the data node. In general, the higher the similarity, the more alike the two pieces of feature data and the more accurate the retrieval data.

In general, when a data node or another storage node stores data, the data node records not only the feature data of that data but also the data location information, unique data identifier, data write time and so on, and establishes the association between them. When another storage node stores the data, that storage node reports this information to the data node once storage is complete, and the data node synchronizes it within its group. After receiving the feature index, the first target data node obtains the feature data in it and computes the similarity between each piece of feature data it stores and the feature data in the feature index. It then takes the stored feature data whose similarity is higher than a set similarity, selects from these the one with the highest similarity or a set number of the most similar, obtains the associated data location information, unique data identifier and other information, and generates the corresponding retrieval data. The retrieval data is thus equivalent to the value of the feature index. It is understood that the value of the set similarity and the way similarity is calculated can be set according to actual conditions.
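The threshold-then-top-k selection described above can be sketched as follows. The text leaves the similarity measure open and describes feature data as a character string; this sketch assumes, purely for illustration, that the feature data has been decoded into a numeric vector and that cosine similarity is used. `Record`, `cosine`, `retrieve`, and the default `threshold` and `top_k` values are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Record:
    data_id: str     # unique data identifier
    location: str    # data location information
    write_time: str  # data write time
    feature: list    # feature data, assumed here to be a numeric vector


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(records, query_feature, threshold=0.8, top_k=3):
    """Keep records above the set similarity, then return the top_k most similar."""
    scored = [(cosine(r.feature, query_feature), r) for r in records]
    hits = [(s, r) for s, r in scored if s > threshold]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [
        {"data_id": r.data_id, "location": r.location,
         "write_time": r.write_time, "similarity": s}
        for s, r in hits[:top_k]
    ]
```

An empty list corresponds to a failed retrieval, in which case the node would notify the client instead of returning retrieval data.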

Optionally, during retrieval the first target data node first computes the similarity between each piece of feature data in its own cache manager and the feature data in the feature index, that is, it searches the cache manager. If the cache manager contains feature data whose similarity is higher than the set similarity, the retrieval succeeds and retrieval data is generated. Otherwise, the node searches its own storage manager; the procedure is the same as in the cache manager and is not repeated here. If that search succeeds, retrieval data is generated; otherwise the retrieval is confirmed to have failed. Illustratively, if retrieval data is generated it is fed back to the client; otherwise the client is notified that the retrieval failed, so that it can decide whether to generate a new retrieval request.
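The cache-first, storage-second order can be written as a small two-tier lookup. In this sketch both tiers are plain dictionaries and the matching rule is passed in as a function, so the same procedure runs against each tier; `two_tier_retrieve` and `exact_match` are hypothetical names, and exact-key matching stands in for the similarity comparison only to keep the example short.

```python
from typing import Callable, Optional


def two_tier_retrieve(query, cache: dict, storage: dict,
                      match: Callable[[dict, object], Optional[dict]]):
    """Search the cache manager first, then the storage manager."""
    hit = match(cache, query)
    if hit is not None:
        return {"status": "ok", "source": "cache", "data": hit}
    # Same matching procedure, applied to the larger store.
    hit = match(storage, query)
    if hit is not None:
        return {"status": "ok", "source": "storage", "data": hit}
    # Neither tier matched: the client is told the retrieval failed.
    return {"status": "failed", "source": None, "data": None}


def exact_match(store: dict, query):
    """Placeholder matcher; a real node would compare feature similarity."""
    return store.get(query)
```

Passing the matcher as a parameter reflects the statement that the storage manager search follows the same procedure as the cache manager search.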

Optionally, when obtaining retrieval data, the first target data node performs the retrieval asynchronously, waits for all asynchronous retrievals to complete, and then merges their results to obtain the retrieval data. Asynchronous retrieval ensures that the retrieval data can still be loaded when the amount of content to retrieve is large.
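One way to picture "run the retrievals asynchronously, wait for all of them, then merge" is with Python's `asyncio`. The text does not say which concurrency mechanism is used, so this is only an assumed illustration: `search_one` is a hypothetical stand-in for a single asynchronous retrieval, and the `store` dictionary stands in for the node's data.

```python
import asyncio


async def search_one(feature_index, store):
    """Stand-in for one asynchronous retrieval against the node's data."""
    await asyncio.sleep(0)  # yields control, as real I/O would
    return store.get(feature_index, [])


async def search_all(feature_indexes, store):
    """Start every retrieval, wait for all to finish, then merge the results."""
    partials = await asyncio.gather(
        *(search_one(f, store) for f in feature_indexes)
    )
    merged = []
    for part in partials:
        merged.extend(part)
    return merged


def run_search(feature_indexes, store):
    """Synchronous entry point for callers that are not already async."""
    return asyncio.run(search_all(feature_indexes, store))
```

`asyncio.gather` returns results in the order the retrievals were submitted, so the merged list is deterministic even though the retrievals overlap.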

Further, after receiving the retrieval data, the client determines the retrieval result from it. Once the client confirms that it has received the retrieval data returned by all first target data nodes, it merges the retrieval data to obtain the final retrieval data. The client then displays the retrieval data so that the user can see it, and the retrieval result is obtained according to the retrieval data. For example, the client accesses the data location information in the retrieval data and takes the data obtained as the retrieval result. Alternatively, it obtains the required data according to the unique data identifier and takes that data as the retrieval result. Optionally, after receiving the retrieval data, the client verifies it and obtains the retrieval result only if verification succeeds; if verification fails, the retrieval is confirmed to have failed. The verification method can be set according to actual conditions.
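The client-side step of waiting for every first target data node and merging the replies can be sketched with a thread pool. The transport is not specified in the text, so `send_request` is a hypothetical callable supplied by the caller, and ordering the merged list by a `similarity` field is an assumption about how a client might rank results.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out_and_merge(targets, send_request, feature_index):
    """Query every first target data node, wait for all replies, merge them."""
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        # pool.map returns only after every node has answered.
        replies = list(
            pool.map(lambda node: send_request(node, feature_index), targets)
        )
    # A node that found nothing is assumed to reply with None or an empty list.
    merged = [item for reply in replies if reply for item in reply]
    merged.sort(key=lambda item: item["similarity"], reverse=True)
    return merged
```

Because one node is queried per group and groups may hold different data, the merged list covers the whole feature index server cluster.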

The technical solution of this embodiment is described below by way of example. Fig. 2 is a data flow diagram of retrieval by a client, provided by an embodiment of the present invention. With reference to Fig. 2, the distributed data processing method proceeds as follows. The client sends a retrieval instruction to the metadata cluster, and a metadata node in the metadata cluster recognizes the retrieval instruction and feeds back to the client the metadata of the feature index server cluster, meaning the metadata of all data nodes. The client then selects one first target data node in each data node group according to the metadata and sends a retrieval request to each first target data node. Each first target data node feeds retrieval data back to the client according to the retrieval request. Once the client confirms that all first target data nodes have fed back retrieval data, it regards their responses as complete, merges the retrieval data, and ends this retrieval. The client can then obtain the retrieval result according to the retrieval data. Note that because every first target data node follows the same processing rules, Fig. 2 shows only one first target data node.

In summary, a metadata node in the metadata cluster receives the retrieval instruction sent by the client and feeds back to the client the metadata of the data nodes in all data node groups of the feature index server cluster. The client then selects a first target data node in each data node group according to the metadata and sends a retrieval request to each first target data node. Each first target data node obtains the feature index from the retrieval request, retrieves the corresponding retrieval data according to the feature index, and feeds the retrieval data back to the client, so that the client can determine the retrieval result. In this way the client can retrieve quickly without storing metadata. In addition, because data is synchronized within each data node group, fault tolerance at the node level is also ensured: even if any data node in a group fails, the other data nodes in the group can still provide the retrieval service, which safeguards data security and stability.

Fig. 3 is a flow chart of another distributed data processing method provided by an embodiment of the present invention. This embodiment is a more specific implementation based on the above embodiment. With reference to Fig. 3, the distributed data processing method includes:

Step 210: a metadata node receives a retrieval instruction sent by a client, the metadata node belonging to a metadata cluster.

Step 220: the metadata node feeds back all of its stored metadata to the client according to the retrieval instruction, so that the client selects a first target data node according to the metadata and sends a retrieval request to the first target data node.

One group of metadata corresponds to one data node, multiple data nodes form a data node group, all data node groups form the feature index server cluster, each data node group has one data node serving as a first target data node, and data is synchronized within each data node group.

Step 230: the first target data node receives the retrieval request and obtains the feature index in the retrieval request.

Specifically, the data node includes a service layer and a retrieval manager. The service layer communicates with the client; for example, it receives the client's retrieval request and passes it to the retrieval manager. The retrieval manager implements the retrieval function; for example, after receiving the retrieval request it parses the request and obtains the feature index. In this embodiment the feature index includes feature data.

Step 240: the first target data node determines whether retrieval data corresponding to the feature index exists in its own cache manager. If so, step 250 is executed; if not, step 260 is executed.

Specifically, the data node also includes a cache manager, which provides a caching function. In general, the cache manager stores the data written within a set duration, and the written data includes feature data and characteristic information that are associated with each other. The set duration can be set according to actual conditions, for example 90 days. Optionally, there may be one or more cache managers. Illustratively, after the retrieval manager determines the feature index, it accesses the corresponding one or more cache managers according to the feature data in the feature index to check whether any cache manager holds feature data that meets the set similarity.

Step 250: the retrieval data is obtained and fed back to the client, so that the client determines the retrieval result according to the retrieval data.

Specifically, after obtaining the retrieval data, the cache manager sends it to the retrieval manager. The retrieval manager can receive retrieval data fed back by the cache manager and/or the storage manager; it then performs asynchronous data searches for the retrieval data, and after all asynchronous data searches are complete it merges the results to obtain the final retrieval data. The retrieval data is then passed to the service layer, which feeds it back to the client.

Step 260: retrieval data corresponding to the feature index is looked up in the node's own storage manager and fed back to the client, so that the client determines the retrieval result according to the retrieval data.

Illustratively, the data node also includes a storage manager, which provides a storage function. In this embodiment the storage manager stores the same types of data as the cache manager but in a different amount. In general, the storage manager holds more data than the cache manager: the cache manager stores only the data from the most recent set duration, whereas the storage manager can store data covering a longer period. Optionally, the storage manager manages the data it holds with an LRU (least recently used) algorithm, that is, it uses the LRU algorithm to evict some of the data it holds from memory and free space for loading other data.
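LRU management of the kind mentioned above can be sketched with `collections.OrderedDict`. The `LRUStore` class and its entry-count capacity are assumptions for illustration; the text does not state how the storage manager measures or limits its memory use.

```python
from collections import OrderedDict


class LRUStore:
    """Keeps at most `capacity` entries, evicting the least recently used."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        # Reading an entry makes it the most recently used.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            # Evict the least recently used entry to free space.
            self._items.popitem(last=False)

    def __contains__(self, key):
        return key in self._items
```

For example, with a capacity of two, writing `a` and `b`, reading `a`, and then writing `c` evicts `b`, since `b` is the entry that has gone longest without being used.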

Specifically, after receiving the feature index, the storage manager obtains the feature data in it and determines whether it holds feature data that meets the set similarity. If so, the retrieval data is obtained and fed back to the client, so that the client determines the retrieval result according to the retrieval data. In this case the storage manager first sends the retrieval data to the cache manager, which updates the cache manager's data, and the cache manager then feeds the retrieval data back to the retrieval manager. The retrieval manager proceeds as in step 250, which is not repeated here. If not, it is determined that no retrieval data can be obtained, and a retrieval failure message is fed back to the client: the storage manager sends the retrieval failure message to the retrieval manager, which feeds it back to the client through the service layer.

It should be noted that in steps 240 to 260, the sequence from the retrieval manager sending a feature index to the cache manager through to the retrieval manager performing the asynchronous data search forms one cycle. Repeating this cycle allows all feature indexes to be retrieved.

In general, the storage manager usually holds the retrieval data the user needs.

The process by which the first target data node handles a retrieval request in this embodiment is described below by way of example. Fig. 4 is a data flow diagram of the first target data node handling a retrieval request according to an embodiment of the present invention. With reference to Fig. 4, the first target data node includes a service layer, a retrieval manager, a cache manager and a storage manager. Specifically, the service layer receives the retrieval request sent by the client and sends it to the retrieval manager. The retrieval manager decouples the retrieval request, that is, parses the retrieval request to obtain the feature index. The retrieval manager then sends a data acquisition instruction, that is, the feature index, to the cache manager. The cache manager determines, according to the feature index, whether it holds retrieval data corresponding to the feature index. If so, it obtains the retrieval data and feeds it back to the retrieval manager. If not, it sends the feature index to the storage manager. The storage manager retrieves the retrieval data corresponding to the feature index and feeds it back to the cache manager, which in turn feeds the retrieval data back to the retrieval manager. On receiving retrieval data, the retrieval manager performs an asynchronous data search and, after all asynchronous data searches are complete, merges the results to obtain the final retrieval data. The retrieval manager then sends the retrieval data to the service layer, which feeds it back to the client.
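The cache-first lookup described above can be sketched as follows. This is an illustration under stated assumptions: feature data is modeled as a numeric vector, the "set similarity" is modeled as a cosine similarity threshold, and the asynchronous search is simplified to a sequential loop. The names `Manager`, `retrieve` and `cosine` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity, used here as an assumed similarity measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class Manager:
    """Stands in for either the cache manager or the storage manager."""

    def __init__(self, entries=None):
        self.entries = list(entries or [])  # (feature_vector, retrieval_data)

    def matches(self, feature, threshold):
        return [(vec, data) for vec, data in self.entries
                if cosine(vec, feature) >= threshold]


def retrieve(features, cache, storage, threshold=0.9):
    """Return merged retrieval data, or None to signal a retrieval failure."""
    merged = []
    for feature in features:                 # one cycle per feature index
        hits = cache.matches(feature, threshold)
        if not hits:
            hits = storage.matches(feature, threshold)
            cache.entries.extend(hits)       # storage hits update the cache first
        merged.extend(data for _, data in hits)
    return merged or None


cache = Manager()
storage = Manager([([1.0, 0.0], "img-1"), ([0.0, 1.0], "img-2")])
first = retrieve([[1.0, 0.0]], cache, storage)
missing = retrieve([[1.0, 1.0]], cache, storage, threshold=0.99)
```

After the first call the matching entry sits in the cache, so a repeated request for the same feature no longer reaches the storage manager.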

As described above, providing a cache manager and a storage manager enables fast retrieval of the retrieval data and, at the same time, makes the data easier to manage.

Fig. 5 is a flowchart of another distributed data processing method provided by an embodiment of the present invention. This embodiment is a further refinement of the foregoing embodiments and describes the handling of a user's write request. Specifically, the data node group includes a primary data node and multiple secondary data nodes, and the primary data node is used to control data synchronization within the data node group.

Specifically, each data node group includes one primary data node and multiple secondary data nodes. The primary data node controls data synchronization among the data nodes in the group. When any data node in the group receives a write request, the primary data node responds to the write request and, after the data is written, data synchronization is performed.

Optionally, this embodiment does not limit how the primary data node is selected. For example, the primary data node is selected within the data node group by a distributed consensus protocol (such as the Raft algorithm). In general, a data node group always has a primary data node. If the current primary data node becomes abnormal or unavailable, another data node in the group can be selected as the primary data node, in the same manner as described above. If the previously abnormal or unavailable primary data node becomes available again, it becomes a secondary data node, and the current primary data node controls its data synchronization.

With reference to Fig. 5, the distributed data processing method further includes:

Step 310: the metadata node receives a write instruction sent by the client.

Specifically, when the user needs to write data, the user generates a write instruction through the client and sends it to the metadata node. The manner and rules by which the client sends the write instruction to the metadata node are the same as those by which the client sends the search instruction to the metadata node, and are not repeated here. Further, the specific content of the write instruction can be set according to the actual situation. For example, the write instruction includes the identity information of the data node into which data is expected to be written. The identity information is unique and can be a number or the like. In general, the user can obtain, through the client, the identity information of all data nodes into which data can currently be written.

Step 320: the metadata node determines a second target data node among all data nodes according to the write instruction.

Specifically, the metadata node parses the write instruction and confirms the identity information of the data node. In general, the metadata node holds the identity information of every data node, so that it can determine the corresponding data node among all data nodes by the identity information and determine that data node as the second target data node. There can be one or more second target data nodes.

Step 330: the metadata node feeds the metadata of the second target data node back to the client, so that the client sends a write request to the second target data node according to the metadata.

Illustratively, after determining the second target data node, the metadata node feeds the metadata of the second target data node back to the client, so that the client determines the location information of the second target data node from the metadata. The client then generates a write request and sends it to the second target data node. The write request includes at least the data that the user expects to write to the data node. The written data includes at least the feature data and feature information of the data currently being stored and, optionally, the currently stored data itself. Optionally, after the user obtains the metadata of the second target data node, the storage shard for the written data is calculated from the location information by a consistent hashing algorithm, and the write request is then generated in combination with the sharding result.
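The optional shard calculation by consistent hashing can be sketched as below. The embodiment does not specify the hash function, the shard naming or the request layout, so `HashRing`, `build_write_request`, the SHA-256 based hash and the 64 virtual points per shard are all assumptions made for illustration.

```python
import bisect
import hashlib


def stable_hash(text):
    """64-bit hash that is stable across runs (assumed choice: SHA-256)."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual points per shard."""

    def __init__(self, shards, replicas=64):
        self._ring = sorted(
            (stable_hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def shard_for(self, key):
        # The first ring point clockwise from the key's hash owns the key.
        index = bisect.bisect_right(self._points, stable_hash(key)) % len(self._ring)
        return self._ring[index][1]


def build_write_request(ring, data_id, feature_data, feature_info):
    """Combine the sharding result with the data to be written."""
    return {
        "shard": ring.shard_for(data_id),
        "feature_data": feature_data,
        "feature_info": feature_info,
    }


ring = HashRing(["shard-0", "shard-1", "shard-2"])
request = build_write_request(ring, "frame-42", [0.1, 0.2], "camera-7")
```

A property worth noting is that adding a shard only moves keys onto the new shard, which keeps data movement small when the set of shards changes.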

Step 340: the second target data node receives the write request.

Optionally, the service layer of the second target data node receives the write request and sends it to the storage manager.

Step 350: the second target data node makes a write response according to the write request.

Specifically, the second target data node responds to the write request. In this embodiment, the write response includes at least the following two schemes:

Scheme one: the second target data node is the primary data node in the data node group to which it belongs. This step specifically includes steps 351 to 353:

Step 351: the second target data node performs a write operation in its own storage manager according to the write request and creates a feature index.

Specifically, when the second target data node is the primary data node, it responds to the write request directly. Optionally, after the storage manager of the second target data node receives the write request, it packages the write request into a Raft proposal. It then submits the proposal to the local Raft node, which, after receiving the proposal, sends the Raft proposal to the Raft cluster. After the Raft cluster confirms the Raft proposal, it feeds the approved Raft proposal back to the local Raft node. Here, the Raft cluster refers to the Raft nodes of the secondary data nodes in the data node group. Providing Raft nodes and making Raft proposals keeps the data nodes consistent during subsequent data writing. Further, the storage manager performs the callback for the Raft proposal to confirm that data can be written. The storage manager then writes the data and generates a feature index according to the written data. When writing the data, it records the associated feature data and feature information, uses the feature data as the key of the feature index, and thereby generates the feature index.
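The proposal-then-write sequence can be sketched as follows. This is deliberately not an implementation of Raft: terms, logs, heartbeats and leader election are omitted, and confirmation by the cluster is reduced to a simple majority count, which is an assumption. The names `Peer`, `StorageManager` and the request fields are hypothetical.

```python
class Peer:
    """Stands in for the Raft node of a secondary data node."""

    def __init__(self, available=True):
        self.available = available
        self.log = []

    def accept(self, proposal):
        if self.available:
            self.log.append(proposal)
        return self.available


class StorageManager:
    def __init__(self, peers):
        self.peers = peers
        self.records = []        # (feature_data, feature_info) pairs
        self.feature_index = {}  # feature_data (key) -> position in records

    def write(self, request):
        proposal = {"type": "write", "payload": request}  # package the request
        # The local node counts as one acknowledgement.
        acks = 1 + sum(peer.accept(proposal) for peer in self.peers)
        cluster_size = len(self.peers) + 1
        if acks <= cluster_size // 2:
            return None          # proposal not confirmed, nothing is written
        return self._on_confirmed(proposal)

    def _on_confirmed(self, proposal):
        """Callback after confirmation: write the data, then index it."""
        payload = proposal["payload"]
        key = tuple(payload["feature_data"])  # feature data is the index key
        self.records.append((key, payload["feature_info"]))
        self.feature_index[key] = len(self.records) - 1
        return key


request = {"feature_data": [0.1, 0.2], "feature_info": "camera-7/frame-42"}
manager = StorageManager([Peer(), Peer(available=False)])
key = manager.write(request)  # 2 of 3 nodes acknowledge, so the write proceeds
```

Deferring the write until the proposal is confirmed is what lets every node in the group apply the same writes in the same order.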

Step 352: the second target data node updates the data written to the storage manager and the feature index to its own cache manager.

Specifically, after the storage manager completes the write operation, it updates the written data and the feature index to the node's own cache manager, so that the cache manager stores the written data and the feature index. When a retrieval request is subsequently received, the search is performed in the cache manager first, which improves retrieval speed.

Step 353: the second target data node feeds a write response message back to the client.

Illustratively, after the storage manager of the second target data node sends the written data and the feature index to the cache manager, it feeds a write response message back to the service layer. The write response message can include information such as the key of the feature index and a write-completion flag. The service layer then feeds the write response message back to the client, so that the client confirms that the data was written successfully and confirms the related feature index, which enables subsequent data retrieval.

Scheme two: the second target data node is a secondary data node in the data node group to which it belongs. This step specifically includes steps 354 to 358:

Step 354: the second target data node sends the write request to a third target data node, the third target data node being the primary data node in the data node group to which it belongs.

Specifically, since the primary data node controls data synchronization of the data node group, data can be written directly by the primary data node and, after the write operation is complete, synchronized to each data node in the group. After synchronization, the second target data node also stores the written data and the feature index. Accordingly, when the second target data node is a secondary data node, the second target data node sends the write request to the primary data node, and the primary data node performs the write operation. In this case, the primary data node is denoted as the third target data node. When the third target data node receives the write request, it records the second target data node, so that it can later feed the write response message back to the second target data node.

Step 355: the third target data node performs a write operation in its own storage manager according to the write request and creates a feature index.

Step 356: the third target data node updates the data written to the storage manager and the feature index to its own cache manager.

Step 357: the third target data node feeds a write response message back to the second target data node.

Step 358: the second target data node feeds the write response message back to the client.

The specific implementation of steps 355 to 358 is the same as that of steps 351 to 353 and is not repeated here.
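The two write schemes can be compared in a minimal sketch. The class `DataNode`, its attributes and the response fields are hypothetical, feature data is modeled as a plain string key, and synchronization within the group is reduced to a direct copy performed by the primary data node.

```python
class DataNode:
    """Hypothetical data node holding a storage manager and a cache manager."""

    def __init__(self, name):
        self.name = name
        self.primary = self     # a node acts as primary until told otherwise
        self.secondaries = []
        self.storage = {}
        self.cache = {}

    def handle_write(self, request):
        if self.primary is not self:
            # Scheme two (steps 354, 357, 358): forward to the primary data
            # node and relay its write response message to the caller.
            return self.primary.handle_write(request)
        key = request["feature_data"]
        self.storage[key] = request["feature_info"]  # steps 351 / 355
        self.cache[key] = request["feature_info"]    # steps 352 / 356
        for node in self.secondaries:                # synchronization in the group
            node.storage[key] = request["feature_info"]
        return {"index_key": key, "written": True, "written_by": self.name}


primary = DataNode("node-1")
secondary = DataNode("node-2")
secondary.primary = primary
primary.secondaries.append(secondary)

response = secondary.handle_write(
    {"feature_data": "feat-001", "feature_info": "camera-7/frame-42"}
)
```

Whichever node the client contacts, the write is executed once by the primary data node, and the client receives the same kind of response.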

It should be noted that, after the third target data node completes the write operation, data synchronization is performed to each secondary data node in the group.

The technical solution provided by this embodiment is described below by way of example. Fig. 6 is a data flow diagram between the client and the clusters when data is written, according to an embodiment of the present invention. Fig. 7 is a data flow diagram of the primary data node making a write response, according to an embodiment of the present invention. In this example, the client expects to write data to the primary data node, that is, the second target data node is the primary data node.

With reference to Fig. 6 and Fig. 7, after the client generates a write instruction, it sends the write instruction to the metadata service cluster, so that a metadata node in the metadata service cluster processes the write instruction and feeds the metadata of the corresponding second target data node back to the client. After receiving the metadata, the client calculates the storage shard for the written data from the location information by a consistent hashing algorithm, generates a write request in combination with the sharding result, and then sends the write request to the second target data node.

The service layer of the second target data node receives the write request and sends it to the storage manager. The storage manager packages the write request into a Raft proposal. It then submits the proposal to the local Raft node, which, after receiving the proposal, sends the Raft proposal to the Raft cluster. After the Raft cluster confirms the Raft proposal, it feeds the approved Raft proposal back to the local Raft node. Further, the storage manager performs the callback for the Raft proposal, writes the data, and then generates a feature index according to the written data. After the storage manager completes the write operation, it updates the written data and the feature index to the node's own cache manager. After sending the written data and the feature index to the cache manager, the storage manager feeds a write response message back to the service layer, and the service layer feeds the write response message back to the client, so that the client confirms that the data was written successfully. Meanwhile, the cache manager refreshes its cache.

As described above, distributed storage is achieved through data synchronization within the data node group. In addition, the primary data node and the secondary data nodes not only allow data to be written but also make it easy to synchronize data as it is written. At the same time, fault tolerance is achieved for the whole data node group rather than only at the level of an individual data node. In this case, the client does not need to store the metadata of each data node and can still communicate with the data nodes through the metadata node.

Fig. 8 is a schematic structural diagram of a distributed data processing system provided by an embodiment of the present invention. With reference to Fig. 8, the distributed data processing system includes a metadata cluster 41 and a feature index server cluster 42. The metadata cluster 41 includes at least one metadata node 411, the feature index server cluster 42 includes multiple data node groups 421, each data node group 421 includes multiple data nodes 422, and data is synchronized within each data node group 421.

The metadata node 411 is used to receive the search instruction sent by the client (not shown), and is further used to feed all stored metadata back to the client according to the search instruction, so that the client selects a first target data node according to the metadata and sends a retrieval request to the first target data node, where one group of metadata corresponds to one data node and, within each data node group 421, one data node 422 serves as the first target data node;

The first target data node is used to receive the retrieval request and obtain the feature index in the retrieval request, and is further used to determine retrieval data according to the feature index and feed the retrieval data back to the client, so that the client determines the search result according to the retrieval data.

On the basis of the above embodiments, the hash values of the data nodes 422 in a data node group 421 fall within a set range.

Specifically, a set parameter (such as a feature code) of each data node is computed by consistent hashing, and the data nodes whose hash values fall within the same hash value range are then placed in one group; different data node groups have different hash value ranges. The hash value range can be set according to the actual situation.
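The grouping rule above can be sketched as follows. The hash space of 360 positions, the range width of 120 and the use of SHA-256 are assumptions chosen only to keep the example small; the embodiment leaves the hash function and the ranges open. The names `node_hash` and `group_nodes` are hypothetical.

```python
import hashlib


def node_hash(feature_code, space=360):
    """Map a node's set parameter (here a feature code) onto the hash space."""
    digest = hashlib.sha256(feature_code.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % space


def group_nodes(feature_codes, range_width=120, space=360):
    """Place nodes whose hash values share a range into the same group."""
    groups = {}
    for code in feature_codes:
        bucket = node_hash(code, space) // range_width  # index of the range
        groups.setdefault(bucket, []).append(code)
    return groups


codes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
groups = group_nodes(codes)
```

Each key of `groups` identifies one hash value range, so no two groups share a range.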

Further, when the first target data node is used to determine retrieval data according to the feature index, it is specifically used to:

determine whether its own cache manager holds retrieval data corresponding to the feature index; if so, obtain the retrieval data; if not, search its own storage manager for retrieval data corresponding to the feature index.

Further, the data node group also includes a second target data node, and the metadata node is further used to: receive a write instruction sent by the client; determine the second target data node among all data nodes according to the write instruction; and feed the metadata of the second target data node back to the client, so that the client sends a write request to the second target data node according to the metadata. The second target data node is used to: receive the write request; and make a write response according to the write request.

Further, the second target data node is the primary data node in the data node group to which it belongs. When the second target data node is used to make a write response according to the write request, it is specifically used to: perform a write operation in its own storage manager according to the write request and create a feature index; update the data written to the storage manager and the feature index to its own cache manager; and feed a write response message back to the client.

Further, the second target data node is a secondary data node in the data node group to which it belongs, and the data node group includes a third target data node. When the second target data node is used to make a write response according to the write request, it is specifically used to send the write request to the third target data node, and is further used to receive the write response message fed back by the third target data node and feed the write response message back to the client. The third target data node is the primary data node in the data node group to which it belongs. The third target data node is used to: perform a write operation in its own storage manager according to the write request and create a feature index; update the data written to the storage manager and the feature index to its own cache manager; and feed a write response message back to the second target data node.

The distributed data processing system provided by this embodiment is used to execute any of the distributed data processing methods described above, and has the corresponding functions and beneficial effects.

Fig. 9 is a schematic structural diagram of a distributed data processing device provided by an embodiment of the present invention. With reference to Fig. 9, the distributed data processing device includes:

a search instruction receiving module 501, configured in the metadata node and used to receive the search instruction sent by the client, the metadata node belonging to a metadata cluster; a first metadata feedback module 502, configured in the metadata node and used to feed all stored metadata back to the client according to the search instruction, so that the client selects a first target data node according to the metadata and sends a retrieval request to the first target data node, where one group of metadata corresponds to one data node, multiple data nodes form one data node group, all data node groups form the feature index server cluster, each data node group has one data node serving as the first target data node, and data is synchronized within the data node group; a retrieval request receiving module 503, configured in the first target data node and used to receive the retrieval request and obtain the feature index in the retrieval request; and a retrieval response module 504, configured in the first target data node and used to determine retrieval data according to the feature index and feed the retrieval data back to the client, so that the client determines the search result according to the retrieval data.

On the basis of the above embodiments, the retrieval response module 504 includes: an index comparison unit, used to determine whether the node's own cache manager holds retrieval data corresponding to the feature index; a first data acquisition unit, used to obtain the retrieval data if it is present; a second data acquisition unit, used to search the node's own storage manager for retrieval data corresponding to the feature index if it is not present; and a data feedback unit, used to feed the retrieval data back to the client, so that the client determines the search result according to the retrieval data.

On the basis of the above embodiments, the feature index includes feature data, and the feature data is obtained by applying deep learning to the data to be retrieved.

On the basis of the above embodiments, the retrieval data includes a unique data identifier and/or data location information.

On the basis of the above embodiments, the data node group includes a primary data node and multiple secondary data nodes, and the primary data node is used to control data synchronization of the data node group.

On the basis of the above embodiments, the device further includes: a write instruction receiving module, configured in the metadata node and used to receive the write instruction sent by the client; a node determining module, configured in the metadata node and used to determine a second target data node among all data nodes according to the write instruction; a second metadata feedback module, configured in the metadata node and used to feed the metadata of the second target data node back to the client, so that the client sends a write request to the second target data node according to the metadata; a write request receiving module, configured in the second target data node and used to receive the write request; and a write response module, configured in the second target data node and used to make a write response according to the write request.

On the basis of the above embodiments, the second target data node is the primary data node in the data node group to which it belongs, and the write response module includes: a first writing unit, used to perform a write operation in the node's own storage manager according to the write request and create a feature index; a first updating unit, used to update the data written to the storage manager and the feature index to the node's own cache manager; and a first feedback unit, used to feed a write response message back to the client.

On the basis of the above embodiments, the second target data node is a secondary data node in the data node group to which it belongs, and the write response module specifically includes: a first sending unit, used to send the write request to a third target data node, the third target data node being the primary data node in the data node group to which it belongs; and a second sending unit, used to receive the write response message fed back by the third target data node and send the write response message to the client. The distributed data processing device further includes: a second writing module, configured in the third target data node and used to perform a write operation in the node's own storage manager according to the write request and create a feature index; a second updating module, configured in the third target data node and used to update the data written to the storage manager and the feature index to the node's own cache manager; and a second feedback module, configured in the third target data node and used to feed a write response message back to the second target data node.

The distributed data processing device provided by the embodiment of the present invention can be used to execute any of the distributed data processing methods described above, and has the corresponding functions and beneficial effects.

Fig. 10 is a schematic structural diagram of a metadata node provided by an embodiment of the present invention. As shown in Fig. 10, the metadata node includes a first processor 60, a first memory 61, a first input device 62, a first output device 63 and a first communication device 64. The metadata node can have one or more first processors 60; Fig. 10 takes one first processor 60 as an example. The first processor 60, first memory 61, first input device 62, first output device 63 and first communication device 64 in the metadata node can be connected by a bus or in other ways; Fig. 10 takes connection by a bus as an example.

As a computer-readable storage medium, the first memory 61 can be used to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the part of the distributed data processing method in the embodiment of the present invention that is executed by the metadata node (for example, the search instruction receiving module 501 and the first metadata feedback module 502 configured in the metadata node in the distributed data processing device). By running the software programs, instructions and modules stored in the first memory 61, the first processor 60 executes the various functional applications and data processing of the metadata node, that is, implements the part of the above distributed data processing method corresponding to the metadata node.

The first memory 61 can mainly include a program storage area and a data storage area. The program storage area can store the operating system and the application programs required by at least one function; the data storage area can store data created through use of the metadata node, and the like. In addition, the first memory 61 can include high-speed random access memory and can also include non-volatile memory, for example at least one magnetic disk storage device, flash memory device or other non-volatile solid-state storage device. In some examples, the first memory 61 can further include memory located remotely from the first processor 60, and such remote memory can be connected to the metadata node through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.

The first input device 62 can be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the metadata node. The first output device 63 can include a display device such as a display screen. The first communication device 64 is used for data communication with the client and the data nodes.

The above metadata node can be used to execute the operations performed by the metadata node in any of the distributed data processing methods, and has the corresponding functions and beneficial effects.

Fig. 11 is a schematic structural diagram of a data node provided by an embodiment of the present invention. As shown in Fig. 11, the data node includes a second processor 70, a second memory 71, a second input device 72, a second output device 73 and a second communication device 74. The data node can have one or more second processors 70; Fig. 11 takes one second processor 70 as an example. The second processor 70, second memory 71, second input device 72, second output device 73 and second communication device 74 in the data node can be connected by a bus or in other ways; Fig. 11 takes connection by a bus as an example.

As a computer-readable storage medium, the second memory 71 can be used to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the part of the distributed data processing method in the embodiment of the present invention that is executed by the data node (for example, the retrieval request receiving module 503 and the retrieval response module 504 configured in the data node in the distributed data processing device). By running the software programs, instructions and modules stored in the second memory 71, the second processor 70 executes the various functional applications and data processing of the data node, that is, implements the part of the above distributed data processing method corresponding to the data node.

The second memory 71 can mainly include a program storage area and a data storage area. The program storage area can store the operating system and the application programs required by at least one function; the data storage area can store data created through use of the data node, and the like. In addition, the second memory 71 can include high-speed random access memory and can also include non-volatile memory, for example at least one magnetic disk storage device, flash memory device or other non-volatile solid-state storage device. In some examples, the second memory 71 can further include memory located remotely from the second processor 70, and such remote memory can be connected to the data node through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.

The second input device 72 can be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the data node. The second output device 73 can include a display device such as a display screen. The second communication device 74 is used for data communication with the client and the metadata node.

The above data node can be used to execute the operations performed by the data node in any of the distributed data processing methods, and has the corresponding functions and beneficial effects.

An embodiment of the present invention also provides a storage medium containing computer-executable instructions which, when executed by the first processor and the second processor, are used to perform a distributed data processing method, the method including:

Of course, in the storage medium containing computer-executable instructions provided by the embodiment of the present invention, the computer-executable instructions are not limited to the method operations described above and can also perform related operations in the distributed data processing method provided by any embodiment of the present invention.

From the above description of the embodiments, those skilled in the art can clearly understand that the present invention can be implemented by software together with the necessary general-purpose hardware, and can of course also be implemented by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk or optical disc, and includes a number of instructions that cause a computer device (which can be a personal computer, a server, a network device or the like) to execute the methods described in the embodiments of the present invention.

Note that the above describes only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to those embodiments and can include further equivalent embodiments without departing from the inventive concept; the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A distributed data processing method, characterized by comprising:

The metadata node feeds back all stored metadata to the client according to the search instruction, so that the client selects a first target data node according to the metadata and sends a retrieval request to the first target data node, wherein one group of metadata corresponds to one data node, a plurality of data nodes form one data node group, all the data node groups form a feature index server cluster, one data node in each data node group serves as the first target data node, and data is synchronized within the data node group;

The first target data node determines retrieval data according to the feature index and feeds the retrieval data back to the client, so that the client determines a search result according to the retrieval data.
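
The retrieval flow recited in claim 1 can be sketched roughly as follows. This is a minimal, in-memory illustration only: the class names `MetadataNode` and `DataNode`, the function `client_retrieve`, the record fields `node_id` and `group_id`, and the rule of picking the first listed node of each group as the target are all assumptions made for the sketch and are not taken from the claims.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """A data node holding feature index -> retrieval data (hypothetical layout)."""
    node_id: str
    group_id: str
    index: dict = field(default_factory=dict)

    def handle_retrieval(self, request: dict):
        # Obtain the feature index from the request, then determine retrieval data.
        feature_index = request["feature_index"]
        return self.index.get(feature_index)


@dataclass
class MetadataNode:
    """Stores one metadata record per data node (record fields are assumed)."""
    metadata: list

    def handle_search_instruction(self) -> list:
        # Feed back all stored metadata to the client.
        return list(self.metadata)


def client_retrieve(meta_node, nodes_by_id, feature_index):
    """Client side: fetch metadata, pick one target node per group, query each."""
    metadata = meta_node.handle_search_instruction()
    # Assumed selection rule: the first node listed for each group is the target.
    targets = {}
    for record in metadata:
        targets.setdefault(record["group_id"], record["node_id"])
    results = []
    for node_id in targets.values():
        request = {"feature_index": feature_index}
        data = nodes_by_id[node_id].handle_retrieval(request)
        if data is not None:
            results.append(data)
    return results
```

Because data is synchronized within a group, querying a single node per group is enough in this sketch; the client itself keeps no metadata between calls.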

2. The distributed data processing method according to claim 1, characterized in that the first target data node determining retrieval data according to the feature index comprises:

The first target data node determines whether retrieval data corresponding to the feature index exists in its own cache manager;

If so, the retrieval data is obtained;

3. The distributed data processing method according to claim 1 or 2, characterized in that the feature index includes feature data, the feature data being obtained by applying deep learning to the data to be retrieved.

4. The distributed data processing method according to claim 1 or 2, characterized in that the retrieval data includes a unique data identifier and/or data location information.

5. The distributed data processing method according to claim 1, characterized in that the data node group includes a master data node and a plurality of slave data nodes, the master data node being used to control data synchronization of the data node group.

6. The distributed data processing method according to claim 5, characterized by further comprising:

The metadata node receives a write instruction sent by the client;

The metadata node feeds back the metadata of a second target data node to the client, so that the client sends a write request to the second target data node according to the metadata;

The second target data node receives the write request;

7. The distributed data processing method according to claim 6, characterized in that the second target data node is the master data node in the data node group to which it belongs,

The second target data node performs a write operation in its own storage manager according to the write request and creates a feature index;

The second target data node updates the data written to the storage manager and the feature index to its own cache manager;

8. The distributed data processing method according to claim 6, characterized in that the second target data node is a slave data node in the data node group to which it belongs,

The second target data node sends the write request to a third target data node, the third target data node being the master data node in the data node group to which it belongs;

The third target data node performs a write operation in its own storage manager according to the write request and creates a feature index;

The third target data node updates the data written to the storage manager and the feature index to its own cache manager;
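
The two write paths of claims 7 and 8 (a write received by the master data node, and a write received by a slave data node and forwarded to the master) can be sketched as below. All names here (`GroupNode`, `link_group`, `make_feature_index`, the `data_id` key) are hypothetical. The SHA-1 digest is only a placeholder for feature extraction, which the claims attribute to deep learning, and the loop that copies each write to the slaves is an assumed, simplified form of the synchronization that the master data node controls.

```python
import hashlib


def make_feature_index(data: bytes) -> str:
    # Placeholder only: the source obtains feature data via deep learning.
    return hashlib.sha1(data).hexdigest()


class GroupNode:
    """One data node in a data node group (hypothetical structure)."""

    def __init__(self, node_id: str, is_master: bool = False):
        self.node_id = node_id
        self.is_master = is_master
        self.master = self if is_master else None
        self.slaves = []
        self.storage_manager = {}  # data id -> (feature index, data)
        self.cache_manager = {}    # feature index -> data id

    def _apply(self, data_id, feature_index, data):
        # Write to the storage manager, then update the cache manager.
        self.storage_manager[data_id] = (feature_index, data)
        self.cache_manager[feature_index] = data_id

    def handle_write(self, data_id: str, data: bytes) -> str:
        if not self.is_master:
            # A slave forwards the write request to the master of its group.
            return self.master.handle_write(data_id, data)
        feature_index = make_feature_index(data)
        self._apply(data_id, feature_index, data)
        # Assumed synchronization step: the master pushes the write to slaves.
        for slave in self.slaves:
            slave._apply(data_id, feature_index, data)
        return feature_index


def link_group(master: GroupNode, slaves: list) -> None:
    """Wire a master and its slaves into one data node group."""
    master.slaves = list(slaves)
    for slave in slaves:
        slave.master = master
```

Routing every write through the master keeps a single writer per group, so a later retrieval can be served from the cache manager of any node in that group.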

9. A distributed data processing system, characterized by comprising: a metadata cluster and a feature index server cluster, wherein the metadata cluster includes at least one metadata node, the feature index server cluster includes a plurality of data node groups, each data node group includes a plurality of data nodes, and data is synchronized within each data node group;

The metadata node is configured to receive a search instruction sent by a client, and is further configured to feed back all stored metadata to the client according to the search instruction, so that the client selects a first target data node according to the metadata and sends a retrieval request to the first target data node, wherein one group of metadata corresponds to one data node and one data node in each data node group serves as the first target data node;

The first target data node is configured to receive the retrieval request and obtain the feature index in the retrieval request, and is further configured to determine retrieval data according to the feature index and feed the retrieval data back to the client, so that the client determines a search result according to the retrieval data.

10. The distributed data processing system according to claim 9, characterized in that the hash value of each data node in the data node group is within a set range.
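
One possible reading of claim 10 is that each data node group owns a contiguous range of hash values and a data node belongs to the group whose range contains its hash. The sketch below illustrates that reading only. The claim does not say what is hashed or how ranges are set, so the 16-bit hash space, the use of MD5 over a node identifier, and the names `node_hash`, `group_for_hash` and `HASH_SPACE` are all assumptions.

```python
import hashlib
from bisect import bisect_right

HASH_SPACE = 1 << 16  # assumed size of the hash space (0..65535)


def node_hash(node_id: str) -> int:
    """Map a node identifier to a value in the assumed hash space."""
    digest = hashlib.md5(node_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big")


def group_for_hash(value: int, boundaries: list) -> int:
    """Return the index of the group whose range contains value.

    boundaries holds the sorted, exclusive upper bound of each group's
    range; the last entry is expected to equal HASH_SPACE.
    """
    return bisect_right(boundaries, value)


# Four groups with equal ranges: [0, 16384), [16384, 32768), and so on.
BOUNDARIES = [16384, 32768, 49152, HASH_SPACE]
```

For example, `group_for_hash(node_hash("node-7"), BOUNDARIES)` yields the index of the group that a node with the hypothetical identifier `node-7` would join under these assumptions.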