CN112905699A

CN112905699A - Full data comparison method, device, equipment and storage medium

Info

Publication number: CN112905699A
Application number: CN202110200812.0A
Authority: CN
Inventors: 杨波
Original assignee: Jingdong Shuke Haiyi Information Technology Co Ltd
Current assignee: Jingdong Shuke Haiyi Information Technology Co Ltd
Priority date: 2021-02-23
Filing date: 2021-02-23
Publication date: 2021-06-04
Anticipated expiration: 2041-02-23
Also published as: CN112905699B

Abstract

The application discloses a full data comparison method which is applied to a coordination server; the method comprises the following steps: obtaining a corresponding relation between the service node and the comparison data; issuing the corresponding relation to each service node of the distributed cluster, so that each service node traverses the stored total comparison data according to the corresponding relation between the service node and the comparison data, and loads the respective corresponding comparison data into a memory; and under the condition of receiving the data comparison instruction, issuing the comparison target to each service node of the distributed cluster, so that each service node compares the comparison data in the respective memory with the comparison target. By applying the technical scheme provided by the application, the comparison of the total data is realized, each service node is only compared with the comparison target based on the comparison data cached in the memory of the service node, the total comparison time can be reduced, and the comparison efficiency is improved. The application also discloses a device, equipment and a storage medium for comparing the full data, and the device, the equipment and the storage medium have corresponding technical effects.

Description

Full data comparison method, device, equipment and storage medium

Technical Field

The present application relates to the field of computer application technologies, and in particular, to a method, an apparatus, a device, and a storage medium for comparing full data.

Background

With the rapid development of computer technology and internet technology, the volume of various kinds of service data keeps growing, and more and more scenarios require data comparison. For example, face data comparison is required in scenarios such as identity verification and risk control, fingerprint data comparison is required in scenarios such as attendance checking and authentication, and vehicle data comparison is required in vehicle monitoring scenarios. Taking face data comparison as an example, the face feature vectors of face pictures can be calculated through an AI (Artificial Intelligence) model, then the distance between the face feature vectors corresponding to any two pictures is calculated, and whether the people in the two pictures are the same person is judged from that distance.

When data comparison is performed, the comparison target often needs to be compared with the full comparison data. For example, after a user registers, the application system compares the user's face against the full face library to determine whether the user has registered an account repeatedly. For full data comparison, at present the full comparison data is mostly stored on a single machine, and the comparison target is then compared with each piece of comparison data in the full comparison data one by one.

This approach has a disadvantage: when the volume of the full comparison data is large, the Input/Output (IO) overhead of the machine is large, the data comparison process takes a long time, and the comparison efficiency is low.

Disclosure of Invention

The application aims to provide a full data comparison method, a full data comparison device, full data comparison equipment and a storage medium, so that time consumption of full data comparison is reduced, and comparison efficiency is improved.

In order to solve the technical problem, the application provides the following technical scheme:

a full data comparison method is applied to a coordination server, the coordination server is respectively in communication connection with each service node of a distributed cluster, and full comparison data are stored in each service node; the method comprises the following steps:

obtaining a corresponding relation between the service node and the comparison data;

issuing the corresponding relation between the service node and the comparison data to each service node of the distributed cluster, so that each service node traverses the total comparison data according to the corresponding relation between the service node and the comparison data, and loads the respective corresponding comparison data into a memory;

and under the condition of receiving a data comparison instruction, issuing a comparison target to each service node of the distributed cluster so that each service node compares the comparison data in the respective memory with the comparison target.

In a specific embodiment of the present application, before issuing the comparison target to each service node of the distributed cluster, the method further includes:

determining whether the distributed cluster is in a service available state;

if yes, the step of sending the comparison target to each service node of the distributed cluster is executed.

In a specific embodiment of the present application, the determining whether the distributed cluster is in a service available state includes:

determining whether temporary nodes created for each service node exist in the coordination server;

if all exist, determining that the distributed cluster is in a service available state.

In one embodiment of the present application, the method further includes:

under the condition of receiving cache feedback information of any service node, establishing a corresponding temporary node for the corresponding service node, wherein the cache feedback information is information sent by the corresponding service node after corresponding comparison data are all loaded into a memory;

and under the condition that any one service node is offline, removing the temporary node corresponding to the corresponding service node.

In one embodiment of the present application, the method further includes:

updating the corresponding relation between the service nodes and the comparison data under the condition that the service nodes in the distributed cluster are changed and each service node in the changed distributed cluster stores the full comparison data;

and repeatedly executing the step of issuing the corresponding relation between the service node and the comparison data to each service node of the distributed cluster.

In a specific embodiment of the present application, the updating the service node and the comparison data corresponding relationship includes:

replacing the identifier of the off-line node in the corresponding relation between the service node and the comparison data with the identifier of the new node under the condition that the service node is off-line in the distributed cluster and the off-line node is replaced by the new node;

determining the remaining memory proportion of each online node in the distributed cluster under the condition that a service node is offline and an original online node is used for replacing the offline node; determining a replacement node of the offline node among the online nodes according to the remaining memory proportion; distributing the comparison data corresponding to the offline node in the corresponding relation between the service node and the comparison data to the replacement node;

and under the condition that no service node is offline but a new node is added in the distributed cluster, updating the corresponding relation between the service node and the comparison data according to the memory occupation proportion of each service node in the distributed cluster.

In one embodiment of the present application, the method further includes:

obtaining a comparison result fed back by each service node;

and determining and outputting a matching result of the comparison target according to the comparison result.

A full data comparison device is applied to a coordination server, the coordination server is respectively in communication connection with each service node of a distributed cluster, and full comparison data are stored in each service node; the device comprises:

the corresponding relation obtaining module is used for obtaining the corresponding relation between the service node and the comparison data;

the corresponding relation issuing module is used for issuing the corresponding relation between the service node and the comparison data to each service node of the distributed cluster, so that each service node traverses the total comparison data according to the corresponding relation between the service node and the comparison data and loads the corresponding comparison data into a memory;

and the data comparison module is used for issuing a comparison target to each service node of the distributed cluster under the condition of receiving a data comparison instruction so that each service node compares the comparison data in the respective memory with the comparison target.

A full data comparison device, comprising:

a memory for storing a computer program;

a processor for implementing the steps of any one of the above full data comparison methods when executing the computer program.

A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of any one of the above full data comparison methods.

By applying the technical scheme provided by the embodiment of the application, the coordination server sends the corresponding relation between the service node and the comparison data to each service node of the distributed cluster after obtaining the corresponding relation between the service node and the comparison data, so that each service node can traverse the total amount of comparison data stored by itself according to the corresponding relation between the service node and the comparison data, find out the comparison data corresponding to itself, and load the comparison data into the memory, namely, cache the corresponding comparison data in the memory. And under the condition of receiving the data comparison instruction, the coordination server issues a comparison target to each service node of the distributed cluster, and each service node compares the comparison data in the respective memory with the comparison target. The total sum of the comparison data cached in the memories of all the service nodes is the total comparison data, the aim of comparing the comparison target with the total data is finally achieved, each service node is only based on the comparison data cached in the memory of the service node and is compared with the comparison target, the total comparison time can be shortened, and the comparison efficiency is improved.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a schematic diagram of an overall structure of a system for comparing full data in an embodiment of the present application;

FIG. 2 is a flowchart illustrating an embodiment of a method for comparing full data;

FIG. 3 is a schematic diagram illustrating a service node start-up process in an embodiment of the present application;

FIG. 4 is a diagram illustrating a data persistence storage and caching process in an embodiment of the present application;

FIG. 5 is a diagram illustrating a data comparison process according to an embodiment of the present application;

FIG. 6 is a schematic diagram of a service node expansion process in an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a full data comparison apparatus according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a full data comparison device in an embodiment of the present application.

Detailed Description

The core of the application is to provide a full data comparison method, which can be applied to a coordination server, wherein the coordination server is respectively in communication connection with each service node of a distributed cluster, and the full comparison data is stored in each service node.

As shown in fig. 1, the overall structure of a full data comparison system is schematically illustrated, where the system includes a coordination server 110 and a distributed cluster 120, and the distributed cluster 120 is composed of a plurality of service nodes, such as a service node 121, a service node 122, a service node 123, and the like. Coordination server 110 is communicatively coupled to each service node of distributed cluster 120, respectively. Each service node of distributed cluster 120 has stored therein a full amount of comparison data.

After the coordination server obtains the corresponding relationship between the service node and the comparison data, the corresponding relationship between the service node and the comparison data is issued to each service node of the distributed cluster, so that each service node can traverse the total comparison data stored by itself according to the corresponding relationship between the service node and the comparison data, find out the comparison data corresponding to itself, and load the comparison data into the memory, namely, cache the corresponding comparison data in the memory. And under the condition of receiving the data comparison instruction, the coordination server issues a comparison target to each service node of the distributed cluster, and each service node compares the comparison data in the respective memory with the comparison target. The total sum of the comparison data cached in the memories of all the service nodes is the total comparison data, the aim of comparing the comparison target with the total data is finally achieved, each service node is only based on the comparison data cached in the memory of the service node and is compared with the comparison target, the total comparison time can be shortened, and the comparison efficiency is improved.

In order that those skilled in the art will better understand the disclosure, the following detailed description will be given with reference to the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Referring to fig. 2, a flowchart of an implementation of a full data comparison method in an embodiment of the present application is shown, where the method may include the following steps:

S210: Obtaining the corresponding relation between the service node and the comparison data.

In the embodiment of the present application, the corresponding relation between the service node and the comparison data specifies which comparison data each service node of the distributed cluster is to cache. The relation may be set through the identifier of the service node and the number, identifier, and the like of the comparison data. The service node identifier may be the IP (Internet Protocol) address, the MAC (Media Access Control) address, and the like of the service node, as long as it uniquely represents one service node. The number or identifier of the comparison data can be obtained according to a set rule, for example, by performing a hash operation on the comparison data. In the corresponding relation between the service node and the comparison data, each service node may correspond to one or more pieces of comparison data, and the sum of the comparison data corresponding to all service nodes is the full comparison data.

The comparison data corresponding to different service nodes are different, so that the situation of repeated comparison in the subsequent data comparison process can be avoided, and the comparison efficiency is improved.

The corresponding relation between the service node and the comparison data can be set by a user and transmitted to the coordination server. It may also be configured automatically by the coordination server, for example, according to the memory size of each service node of the distributed cluster.

For example, suppose the full comparison data comprises 4 pieces of comparison data, identified as u0, u1, u2 and u3, and the distributed cluster includes two service nodes, identified as 192.168.0.10 and 192.168.0.11. In the corresponding relation between the service node and the comparison data, the service node identified as 192.168.0.10 corresponds to the comparison data identified as u0 and u1, and the service node identified as 192.168.0.11 corresponds to the comparison data identified as u2 and u3.

The service node identifier may also correspond to comparison data numbers, for example, the service node identifier 192.168.0.10 corresponds to the comparison data numbers 0 and 1, and the service node identifier 192.168.0.11 corresponds to the comparison data numbers 2 and 3. Each comparison data number can be obtained through an operation such as taking the remainder of a hash, so pieces of comparison data with different identifiers may share the same number. That is, any comparison data numbered 0 or 1 corresponds to the service node identified as 192.168.0.10, and any comparison data numbered 2 or 3 corresponds to the service node identified as 192.168.0.11.
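The number-based correspondence above can be illustrated with a minimal sketch. The hash function (SHA-256), the bucket count of 4, and the names `CORRESPONDENCE`, `data_number` and `node_for` are assumptions made for illustration; the application itself only states that the number may be obtained through an operation such as a hash remainder.

```python
import hashlib

# Hypothetical correspondence taken from the example:
# service node identifier -> comparison data numbers it is responsible for.
CORRESPONDENCE = {
    "192.168.0.10": {0, 1},
    "192.168.0.11": {2, 3},
}
NUM_BUCKETS = 4  # assumed: numbers 0..3, as in the example


def data_number(data_id: str, buckets: int = NUM_BUCKETS) -> int:
    """Derive a comparison data number as the remainder of a hash of the identifier."""
    digest = hashlib.sha256(data_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def node_for(data_id: str) -> str:
    """Return the service node whose number set contains this item's number."""
    number = data_number(data_id)
    for node, numbers in CORRESPONDENCE.items():
        if number in numbers:
            return node
    raise KeyError(f"no service node covers number {number}")
```

Because the number depends only on the identifier, every service node that evaluates `data_number` on the same piece of comparison data arrives at the same owner without further coordination.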

After the corresponding relationship between the service node and the comparison data is obtained, the operation of step S220 may be continuously performed.

S220: Issuing the corresponding relation between the service node and the comparison data to each service node of the distributed cluster, so that each service node traverses the full comparison data according to the corresponding relation between the service node and the comparison data, and loads its respective corresponding comparison data into a memory.

The coordination server is in communication connection with each service node of the distributed cluster, and after the corresponding relation between the service node and the comparison data is obtained, the corresponding relation between the service node and the comparison data can be issued to each service node of the distributed cluster. Therefore, each service node of the distributed cluster can obtain the corresponding relation between the service node and the comparison data, and can obtain the comparison data corresponding to the service node in the corresponding relation between the service node and the comparison data. For any service node, the service node may traverse the stored full comparison data according to the corresponding relationship between the service node and the comparison data, search the comparison data corresponding to the service node from the full comparison data, and load the comparison data corresponding to the service node into the memory. After each service node performs such an operation, a part of comparison data is cached in the memory of each service node, and the sum of the comparison data cached in the memories of all the service nodes is the total comparison data.

In the above example, after the above operations are performed, the memory of the service node identified as 192.168.0.10 caches the comparison data with numbers 0 and 1, and the memory of the service node identified as 192.168.0.11 caches the comparison data with numbers 2 and 3.
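The loading step performed by each service node might look like the following sketch, which uses the identifier-based form of the example (u0 to u3). The function name `load_assigned`, the dictionary layout, and the sample feature vectors are hypothetical and not taken from the application.

```python
# Hypothetical full comparison data stored on every service node:
# comparison data identifier -> feature vector.
FULL_DATA = {
    "u0": [0.1, 0.2],
    "u1": [0.3, 0.4],
    "u2": [0.5, 0.6],
    "u3": [0.7, 0.8],
}

# Correspondence issued by the coordination server (identifier-based form).
CORRESPONDENCE = {
    "192.168.0.10": {"u0", "u1"},
    "192.168.0.11": {"u2", "u3"},
}


def load_assigned(full_data, correspondence, node_id):
    """Traverse the full comparison data and cache only the items assigned to node_id."""
    assigned = correspondence[node_id]
    return {
        data_id: vector
        for data_id, vector in full_data.items()
        if data_id in assigned
    }


cache_10 = load_assigned(FULL_DATA, CORRESPONDENCE, "192.168.0.10")
cache_11 = load_assigned(FULL_DATA, CORRESPONDENCE, "192.168.0.11")
```

The two caches are disjoint and together cover the full comparison data, which is the property the later comparison step relies on.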

S230: Under the condition of receiving the data comparison instruction, issuing the comparison target to each service node of the distributed cluster, so that each service node compares the comparison data in its respective memory with the comparison target.

When the user has a data comparison requirement, a corresponding data comparison instruction can be sent to the coordination server. The coordination server can obtain a comparison target under the condition that the coordination server receives the data comparison instruction, and sends the comparison target to each service node of the distributed cluster, and each service node can compare the comparison data in the respective memory with the comparison target. That is, for any service node, after receiving the comparison target issued by the coordination server, the service node may compare each piece of comparison data cached in its memory with the comparison target, respectively.

When data comparison is performed, the comparison may be based on the feature vectors of the data.
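As an illustration of a feature-vector-based comparison on a single service node, the sketch below compares the comparison target with every cached item using Euclidean distance. The choice of Euclidean distance, the `threshold` parameter, and the function name `compare_in_memory` are assumptions; the application does not fix a particular distance measure or decision rule.

```python
import math


def compare_in_memory(cache, target, threshold):
    """Compare the target vector with each cached vector.

    Returns (identifier, distance) pairs whose distance does not exceed
    the threshold, sorted from closest to farthest.
    """
    matches = []
    for data_id, vector in cache.items():
        distance = math.dist(vector, target)  # Euclidean distance (assumed measure)
        if distance <= threshold:
            matches.append((data_id, distance))
    return sorted(matches, key=lambda match: match[1])


# Hypothetical in-memory cache of one service node.
cache = {"u0": [0.0, 0.0], "u1": [1.0, 1.0]}
result = compare_in_memory(cache, target=[0.0, 0.1], threshold=0.5)
```

Since every cached vector is already in memory, the loop involves no disk reads, which is where the reduction in comparison time is expected to come from.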

It should be noted that, in the embodiment of the present application, the comparison data may be face data, fingerprint data, license plate data, and the like, based on a specific application scenario.

By applying the method provided by the embodiment of the application, the coordination server sends the corresponding relation between the service node and the comparison data to each service node of the distributed cluster after obtaining the corresponding relation between the service node and the comparison data, so that each service node can traverse the total amount of comparison data stored by itself according to the corresponding relation between the service node and the comparison data, find out the comparison data corresponding to itself, and load the comparison data into the memory, namely, cache the corresponding comparison data in the memory. And under the condition of receiving the data comparison instruction, the coordination server issues a comparison target to each service node of the distributed cluster, and each service node compares the comparison data in the respective memory with the comparison target. The total sum of the comparison data cached in the memories of all the service nodes is the total comparison data, the aim of comparing the comparison target with the total data is finally achieved, each service node is only based on the comparison data cached in the memory of the service node and is compared with the comparison target, the total comparison time can be shortened, and the comparison efficiency is improved.

In an embodiment of the present application, before issuing the comparison target to each service node of the distributed cluster, the method may further include the following steps:

and determining whether the distributed cluster is in a service available state, and if so, executing the step of issuing the comparison target to each service node of the distributed cluster.

It can be understood that the corresponding relation between the service nodes and the comparison data covers all service nodes and the full comparison data, and the sum of the comparison data corresponding to the service nodes is the full comparison data, so the full data comparison can be performed only when the distributed cluster is in the service available state. If a service node is offline, or has not loaded its comparison data into the memory normally, the comparison result between the comparison data in the memory of that service node and the comparison target will be missing, the full data comparison cannot be performed, and the final comparison result will be inaccurate. In such a case the distributed cluster is in a service unavailable state.

Therefore, in the embodiment of the present application, after the data comparison instruction is received and before the comparison target is issued to each service node of the distributed cluster, it may first be determined whether the distributed cluster is in a service available state.

In a specific embodiment of the present application, determining whether the distributed cluster is in the service available state may include the following steps:

determining whether temporary nodes created for each service node exist in the coordination server; if all exist, determining that the distributed cluster is in a service available state.

Here, under the condition that cache feedback information of any service node is received, a corresponding temporary node is created for that service node, the cache feedback information being information sent by the service node after its corresponding comparison data has all been loaded into the memory; and under the condition that any service node goes offline, the temporary node corresponding to that service node is removed.

In the embodiment of the present application, if the service node traverses the full comparison data according to the correspondence between the service node and the comparison data, and loads the comparison data corresponding to the service node into the memory, the service node may send the cache feedback information to the coordination server. For any service node, after receiving the cache feedback information of the service node, the coordination server may create a corresponding temporary node for the service node in the coordination server. In the operation process, if the service node is offline due to a fault, shutdown and the like, the temporary node corresponding to the service node is automatically removed. That is, the temporary nodes in the coordination server correspond to the service nodes that have loaded the corresponding comparison data into the memory and can normally provide services, and one temporary node corresponds to one service node.

The coordination server determines whether the temporary nodes created for each service node all exist. If so, every service node has loaded its corresponding comparison data into the memory and can normally provide the service, and the distributed cluster can be determined to be in a service available state. If the temporary node corresponding to at least one service node does not exist, the distributed cluster can be considered to be in a service unavailable state.
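The bookkeeping of temporary nodes can be modelled in memory as in the sketch below. The class and method names are hypothetical, and a real coordination server would typically tie the removal of a temporary node to the loss of the service node's connection rather than to an explicit call.

```python
class CoordinationServer:
    """Minimal in-memory model of temporary-node bookkeeping (illustrative only)."""

    def __init__(self, service_nodes):
        # All service nodes named in the correspondence.
        self.service_nodes = set(service_nodes)
        # Service nodes that currently have a temporary node.
        self.temporary_nodes = set()

    def on_cache_feedback(self, node_id):
        """Create a temporary node once a service node reports its cache is loaded."""
        if node_id in self.service_nodes:
            self.temporary_nodes.add(node_id)

    def on_node_offline(self, node_id):
        """Remove the temporary node of a service node that went offline."""
        self.temporary_nodes.discard(node_id)

    def is_service_available(self):
        """The cluster is available only if every service node has a temporary node."""
        return self.service_nodes <= self.temporary_nodes


server = CoordinationServer(["192.168.0.10", "192.168.0.11"])
server.on_cache_feedback("192.168.0.10")
partially_ready = server.is_service_available()
server.on_cache_feedback("192.168.0.11")
fully_ready = server.is_service_available()
server.on_node_offline("192.168.0.10")
after_offline = server.is_service_available()
```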

If the distributed cluster is in the service available state, all the service nodes of the distributed cluster are online, and the corresponding comparison data is loaded into the memory, in this case, the comparison target may be issued to each service node, and each service node may compare the comparison data in the memory with the comparison target. Because the sum of the comparison data in the memories of all the service nodes is the full comparison data, the purpose of comparing the full data can be realized.

If the distributed cluster is in a service unavailable state, the distributed cluster may have service nodes offline or have service nodes not loading corresponding comparison data in the memory, and if the comparison target is issued to each service node, the purpose of comparing the full data cannot be realized due to data loss in the memory. In this case, an error prompt message may be returned to facilitate the user to perform problem troubleshooting in time, or the step of issuing the correspondence between the service node and the comparison data to each service node of the distributed cluster may be repeatedly performed, so that each service node traverses the total comparison data according to the correspondence between the service node and the comparison data, and loads the respective corresponding comparison data into the memory.
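The gating described above, where the comparison target is issued only when the cluster is available and an error prompt is returned otherwise, could be sketched as follows. The exception type, the function name `issue_comparison_target`, and the `send` callable standing in for the network call are all hypothetical.

```python
class ServiceUnavailableError(RuntimeError):
    """Raised when at least one service node has no temporary node."""


def issue_comparison_target(target, service_nodes, temporary_nodes, send):
    """Issue the target to every service node, but only if the cluster is available.

    `send(node_id, target)` is a hypothetical callable that delivers the target
    to one service node and returns that node's comparison result.
    """
    missing = sorted(set(service_nodes) - set(temporary_nodes))
    if missing:
        # Error prompt to help the user troubleshoot in time.
        raise ServiceUnavailableError(f"service nodes not ready: {missing}")
    return {node_id: send(node_id, target) for node_id in service_nodes}
```

A caller that prefers the second option described above could catch `ServiceUnavailableError` and re-issue the corresponding relation instead of reporting the error.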

In one embodiment of the present application, the method may further comprise the steps of:

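updating the corresponding relation between the service nodes and the comparison data under the condition that the service nodes in the distributed cluster are changed and each service node in the changed distributed cluster stores the full comparison data; and repeating the step of issuing the corresponding relation between the service node and the comparison data to each service node of the distributed cluster.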

During operation, the service nodes in the distributed cluster may change: for example, a new node is simply added, a service node goes offline due to a fault, shutdown and the like, or an offline node is replaced by a new node. If the service nodes in the distributed cluster change, the distributed cluster will be in a service unavailable state.

When the service nodes in the distributed cluster are changed, each service node in the changed distributed cluster can be made to store the full comparison data by means such as manual copying or interaction among the service nodes.

When the service nodes in the distributed cluster are changed and each service node in the changed distributed cluster stores the full comparison data, the corresponding relation between the service nodes and the comparison data can be updated. Specifically, the corresponding relation as updated by the user can be obtained directly, or the corresponding relation can be updated automatically.

In a specific embodiment of the present application, when a service node is offline in a distributed cluster and the offline node is replaced by a new node, the identifier of the offline node in the correspondence between the service node and the comparison data is replaced by the identifier of the new node.

Service nodes in the distributed cluster may go offline due to failure, shutdown and the like, and when a service node in the distributed cluster goes offline, the offline node can be replaced by a new node. For example, suppose the distributed cluster includes the service node identified as 192.168.0.10 and the service node identified as 192.168.0.11. If the service node identified as 192.168.0.10 goes offline, a new node identified as 192.168.0.12 can be used in its place, so that the distributed cluster then includes the service node identified as 192.168.0.11 and the service node identified as 192.168.0.12.

When a service node goes offline in the distributed cluster and a new node is used to replace the offline node, the identifier of the offline node in the corresponding relation between the service node and the comparison data can be replaced by the identifier of the new node. That is, the new node directly takes the place of the offline node, and the comparison data corresponding to the new node is the same as the comparison data that corresponded to the offline node. The comparison data corresponding to the other service nodes in the distributed cluster is therefore unchanged, and after the coordination server issues the corresponding relation to each service node, the comparison data in the memories of the service nodes other than the new node does not need to change. In this way, the high availability of the overall service is restored quickly.
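Replacing the identifier of the offline node with that of the new node amounts to renaming one key of the correspondence, as in this minimal sketch. The function name `replace_offline_node` and the dictionary representation are assumptions for illustration.

```python
def replace_offline_node(correspondence, offline_id, new_id):
    """Return a copy of the correspondence in which new_id takes over offline_id's data."""
    updated = {node: set(data) for node, data in correspondence.items()}
    updated[new_id] = updated.pop(offline_id)
    return updated


# Example from the text: 192.168.0.10 goes offline and is replaced by 192.168.0.12.
before = {"192.168.0.10": {0, 1}, "192.168.0.11": {2, 3}}
after = replace_offline_node(before, "192.168.0.10", "192.168.0.12")
```

The entry for 192.168.0.11 is identical before and after the update, so only the new node has to load data into its memory.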

In another specific embodiment of the present application, when a service node is offline in a distributed cluster and an original online node is used to replace the offline node, the remaining memory proportion of each online node in the distributed cluster is determined; determining a substitute node of the off-line node in the on-line node according to the remaining memory proportion; and distributing the comparison data corresponding to the off-line node in the corresponding relation between the service node and the comparison data to the replacement node.

Service nodes in the distributed cluster may go offline due to failure, shutdown and the like, and when a service node in the distributed cluster goes offline, an original online node can be used to replace the offline node without adding a new node. For example, suppose the distributed cluster includes the service nodes identified as 192.168.0.10, 192.168.0.11 and 192.168.0.12. If the service node identified as 192.168.0.10 goes offline, the service node identified as 192.168.0.12 may take its place, so that the distributed cluster then includes the service node identified as 192.168.0.11 and the service node identified as 192.168.0.12.

When a service node in the distributed cluster goes offline and an original online node, rather than a newly added node, is used to replace it, the remaining memory proportion of each online node in the distributed cluster can be determined, and the replacement node of the offline node is selected from the online nodes according to the remaining memory proportion. Specifically, the online node with the largest remaining memory proportion can be determined as the replacement node of the offline node, which ensures that enough memory is available to cache the additional comparison data and quickly restores the high availability of the overall service.

After the replacement node is determined, the comparison data corresponding to the offline node in the correspondence between the service node and the comparison data may be allocated to the replacement node. That is, the replacement node directly takes over from the offline node: the comparison data corresponding to the replacement node comprises the comparison data it previously corresponded to plus the comparison data corresponding to the offline node, and no new node is added. Therefore, the comparison data corresponding to the other service nodes in the distributed cluster is not changed, and after the coordination server issues the corresponding relation between the service node and the comparison data to each service node, the comparison data in the memories of the service nodes other than the replacement node does not change.
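A minimal sketch of this selection and reallocation is given below. The dictionary model of the correspondence, the `remaining_memory_ratio` input (assumed to contain online nodes only) and both function names are hypothetical; how the remaining memory proportion is actually measured is not specified here.

```python
def pick_replacement(remaining_memory_ratio):
    """Return the online node with the largest remaining memory proportion."""
    return max(remaining_memory_ratio, key=remaining_memory_ratio.get)


def reassign_to_online_node(correspondence, offline_id, remaining_memory_ratio):
    """Move the offline node's data numbers onto the chosen replacement node."""
    replacement = pick_replacement(remaining_memory_ratio)
    updated = {node: set(nums) for node, nums in correspondence.items()
               if node != offline_id}
    # The replacement keeps its own data numbers and adds the offline node's.
    updated[replacement] |= correspondence[offline_id]
    return replacement, updated


correspondence = {"192.168.0.10": {0}, "192.168.0.11": {1}, "192.168.0.12": {2}}
remaining = {"192.168.0.11": 0.2, "192.168.0.12": 0.6}  # online nodes only
replacement, updated = reassign_to_online_node(
    correspondence, "192.168.0.10", remaining)
```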

In another specific embodiment of the present application, in a case that no service node is offline in the distributed cluster but a new node is added, the corresponding relationship between the service node and the comparison data is updated according to the memory proportion of each service node in the distributed cluster.

When the distributed cluster needs to be expanded due to the increase of the data volume, a new node is added to the distributed cluster. For example, the distributed cluster includes a service node identified as 192.168.0.10 and a service node identified as 192.168.0.11, and on this basis, a service node identified as 192.168.0.12 is newly added, so that the distributed cluster includes the service node identified as 192.168.0.10, the service node identified as 192.168.0.11 and the service node identified as 192.168.0.12.

Under the condition that no service node is offline but a new node is added in the distributed cluster, the corresponding relation between the service node and the comparison data can be updated according to the memory occupation proportion of each service node in the distributed cluster. For example, part of the comparison data corresponding to a service node whose memory occupation proportion is greater than a set proportion threshold may be allocated to the new node, so as to ensure that each service node has enough memory to cache its corresponding comparison data.
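One possible way to perform such an update is sketched below. The choice to move half of an overloaded node's data numbers, the default threshold of 0.8 and the function name are assumptions made purely for illustration; the embodiment only requires that part of the data of an over-threshold node be moved to the new node.

```python
def rebalance_to_new_node(correspondence, memory_usage_ratio, new_id, threshold=0.8):
    """Move part of the data numbers of over-threshold nodes to new_id."""
    updated = {node: sorted(nums) for node, nums in correspondence.items()}
    moved = []
    for node, nums in updated.items():
        if memory_usage_ratio.get(node, 0.0) > threshold and len(nums) > 1:
            keep = len(nums) - len(nums) // 2  # assumed policy: give away half
            moved.extend(nums[keep:])
            updated[node] = nums[:keep]
    updated[new_id] = moved
    return updated


correspondence = {"192.168.0.10": [0, 1, 2, 3], "192.168.0.11": [4, 5]}
usage = {"192.168.0.10": 0.9, "192.168.0.11": 0.5}
updated = rebalance_to_new_node(correspondence, usage, "192.168.0.12")
```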

It should be noted that the manner of updating the corresponding relation between the service nodes and the comparison data may be chosen according to the specific application scenario. When the service nodes in the distributed cluster are changed and each service node in the changed distributed cluster stores the full amount of comparison data, the corresponding relation between all the service nodes and the comparison data may be updated.

After the corresponding relation between the service node and the comparison data is updated, the step of issuing the corresponding relation between the service node and the comparison data to each service node of the distributed cluster is repeatedly executed, so that each service node traverses the full comparison data according to the corresponding relation between the service node and the comparison data and loads its corresponding comparison data into the memory. This provides a guarantee for the subsequent comparison of the full data.

When the service nodes in the distributed cluster are changed, each service node can be restarted. The comparison data cached in the memory of each service node is lost during the restart; after receiving again the corresponding relation between the service nodes and the comparison data sent by the coordination server, each service node traverses the full comparison data according to that corresponding relation and loads its corresponding comparison data into the memory. Alternatively, the restart operation need not be executed when the service nodes in the distributed cluster are changed, so that the comparison data cached in the memory of each service node is not lost. In that case, after receiving again the corresponding relation sent by the coordination server, each service node traverses the full comparison data according to the corresponding relation and updates the corresponding comparison data in the memory, that is, comparison data already present in the memory is retained and comparison data not yet present is loaded.
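The no-restart variant can be illustrated with the following sketch. The record layout (`id`, `number`, `vector`), the function name `refresh_cache` and the eviction of records that are no longer assigned are assumptions made for the example; the text above only states that existing data is retained and missing data is loaded.

```python
def refresh_cache(cache, assigned_numbers, full_store):
    """Update an in-memory cache in place after a new correspondence arrives.

    cache: {record id: record}; full_store: iterable of records, each a dict
    with the keys 'id', 'number' and 'vector'. Returns the number of records
    newly loaded from the full store.
    """
    # Assumed behaviour: drop records whose data number is no longer assigned.
    stale = [rid for rid, rec in cache.items()
             if rec["number"] not in assigned_numbers]
    for rid in stale:
        del cache[rid]
    # Retain what is already cached; load only what is missing.
    loaded = 0
    for rec in full_store:
        if rec["number"] in assigned_numbers and rec["id"] not in cache:
            cache[rec["id"]] = rec
            loaded += 1
    return loaded


full_store = [
    {"id": "u1", "number": 0, "vector": [1]},
    {"id": "u2", "number": 1, "vector": [2]},
    {"id": "u3", "number": 2, "vector": [3]},
]
cache = {"u1": full_store[0], "u2": full_store[1]}  # previously assigned 0 and 1
loaded = refresh_cache(cache, {1, 2}, full_store)   # now assigned 1 and 2
```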

obtaining a comparison result fed back by each service node;

In the embodiment of the application, the comparison target is issued to each service node of the distributed cluster under the condition that the data comparison instruction is received, each service node compares the comparison data in its memory with the comparison target, and after the comparison is finished, the comparison result can be fed back to the coordination server. For each service node, the comparison result fed back by the service node may be the similarity between each piece of comparison data in the memory of the service node and the comparison target. Alternatively, it may be related information, such as the identifier of the comparison data and the corresponding similarity, for only those pieces of comparison data in the memory of the service node whose similarity to the comparison target is greater than a set first similarity threshold.

After the comparison result fed back by each service node is obtained, the matching result of the comparison target can be determined according to the comparison result. Specifically, the comparison data corresponding to the maximum similarity in the comparison results may be determined as the matching result of the comparison target, or the comparison data having a similarity greater than the set second similarity threshold in the comparison results may be determined as the matching result of the comparison target. And after the matching result of the comparison target is determined, the matching result can be output.
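The two ways of determining the matching result can be sketched as follows. The representation of each node's comparison result as a list of `(data identifier, similarity)` pairs and both function names are assumptions made for illustration.

```python
def best_match(node_results):
    """Return the (identifier, similarity) pair with the maximum similarity."""
    merged = [pair for result in node_results for pair in result]
    return max(merged, key=lambda pair: pair[1]) if merged else None


def matches_above(node_results, second_threshold):
    """Return all pairs whose similarity exceeds the second threshold."""
    merged = [pair for result in node_results for pair in result]
    hits = [pair for pair in merged if pair[1] > second_threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# One list per service node; the third node reported no candidates.
node_results = [[("u1", 0.62), ("u7", 0.91)], [("u3", 0.88)], []]
top = best_match(node_results)
above = matches_above(node_results, 0.85)
```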

For ease of understanding, the embodiments of the present application will be described below by taking a face recognition scene as an example.

In an actual application scenario, the implementation of the embodiment of the present application can be divided into two stages: a system initial phase and a service providing phase.

In the initial stage of the system, each service node of the distributed cluster can be respectively started, each service node stores the total comparison data, the coordination server issues the corresponding relation between the service node and the comparison data to each service node, and each service node loads the corresponding comparison data into the memory. The comparison data cached in the memory of each service node form a data pool to be compared.

In the service providing stage, the coordination server issues the comparison target to each service node, and each service node compares the comparison data in the respective memory with the comparison target.

Some concepts and rules in this example are first explained.

The full comparison data is face data to be compared, and each face data to be compared is assigned with a corresponding data number, such as 0 to 127. The maximum number of data numbers may be set according to the maximum number of service nodes expected in the distributed cluster.

Each service node in the distributed cluster has a unique identity, such as an IP identity.

A correspondence between the service node and the comparison data may be defined. In this example, it is a correspondence between service nodes and data numbers of the comparison data, and is used to describe which face data is cached on which service node. For example, if the service node identified as 192.168.0.10 supports the face data numbered 0, the face data numbered 0 is cached only on the service node identified as 192.168.0.10.

Any face data to be compared is stored on each service node.

That is to say: the service node persistently stores the full face data and caches the face data with the corresponding relation number in the memory.

The service node start-up procedure in the initial stage of the system is explained as follows, as shown in fig. 3:

S310: each service node of the distributed cluster is started, and each service node is allocated a unique identification, such as its IP, as the service node identification.

S320: each service node of the distributed cluster loads the [correspondence between the service node and the comparison data].

The service node and comparison data corresponding relationship may specifically be a service node and comparison data numbering relationship, for example, a service node identified as 192.168.0.10 supports face data with numbers 0 and 1, and a service node identified as 192.168.0.11 supports face data with numbers 2 and 3. The [ corresponding relationship between the service node and the comparison data ] loaded by each service node may be issued by a coordination server deployed with a distributed coordination service zk (zookeeper).

S330: each service node of the distributed cluster traverses the stored full amount of face data.

S340: and each service node of the distributed cluster selectively loads the face data into the memory according to the corresponding relation between the service node and the comparison data.

For example, the service node labeled 192.168.0.10, only loads the data with numbers 0 and 1 into the memory after traversing the stored full amount of face data. When the system is started for the first time, because the full amount of face data is not stored and the corresponding relation between the service node and the comparison data is not received, no data is loaded into the memory.

S350: each service node of the distributed cluster begins to provide services.

After a service node of the distributed cluster has loaded its corresponding face data into the memory, it can provide service. The coordination server creates a temporary node for each such service node, and when a service node goes offline, its corresponding temporary node is automatically removed. Whether the distributed cluster is in the service available state can be judged by whether the temporary nodes exist: if all the service nodes provide services, the distributed cluster is in the service available state, and if even one service node cannot provide services, the distributed cluster is in the service unavailable state.

S360: the service node starting process is ended.
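Steps S320 to S350 can be illustrated with the sketch below. The `Coordinator` class is a plain in-memory stand-in for the coordination server and does not use the ZooKeeper API; the class, its methods, `start_node` and the record layout are all hypothetical names introduced for the example.

```python
class Coordinator:
    """In-memory stand-in for the coordination server (not the ZooKeeper API)."""

    def __init__(self, correspondence):
        self.correspondence = correspondence  # {node identifier: data numbers}
        self.temp_nodes = set()               # one temporary node per ready node

    def register(self, node_id):
        self.temp_nodes.add(node_id)

    def unregister(self, node_id):
        self.temp_nodes.discard(node_id)

    def service_available(self):
        # Available only if every node in the correspondence has a temporary node.
        return set(self.correspondence) <= self.temp_nodes


def start_node(node_id, coordinator, full_store):
    assigned = set(coordinator.correspondence.get(node_id, ()))  # S320
    memory = {rec["id"]: rec for rec in full_store               # S330 and S340
              if rec["number"] in assigned}
    coordinator.register(node_id)                                # S350
    return memory


full_store = [
    {"id": "u1", "number": 1, "vector": [1]},
    {"id": "u2", "number": 2, "vector": [2]},
    {"id": "u3", "number": 0, "vector": [3]},
]
coordinator = Coordinator({"192.168.0.10": [0, 1], "192.168.0.11": [2, 3]})
memory_10 = start_node("192.168.0.10", coordinator, full_store)
available_after_one = coordinator.service_available()
memory_11 = start_node("192.168.0.11", coordinator, full_store)
available_after_all = coordinator.service_available()
```

With an empty `full_store`, as at the very first start-up, `start_node` simply returns an empty memory, which matches the behaviour described for S340.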

For the process of persistent storage and caching of face data at the initial time, reference may be made to the operations of the steps in fig. 4:

S410: the coordination server judges whether the distributed cluster is in a service available state, and if so, the operation of the subsequent steps is continued.

S420: the coordination server issues the face data, and the face data may contain the constructed data number attribute.

For example: the identification of the face data is u1, the feature vector is [1], and the data number constructed for u1 is 1.

S430: after each service node of the distributed cluster receives the face data, the face data is stored in the respective hard disk.

S440: each service node of the distributed cluster selectively caches the face data in the memory according to the corresponding relation between the service node and the comparison data.

For example: the service node identified as 192.168.0.10 receives the face data identified as u1, whose data number is 1. Because the service node identified as 192.168.0.10 supports the face data numbered 0 and 1, it stores the face data of u1 and also caches the face data of u1 in the memory. Because the service node identified as 192.168.0.11 supports the face data numbered 2 and 3, it stores the face data of u1 but does not cache it in the memory.
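Steps S430 and S440 can be sketched as follows, reusing the u1 example above. The `ServiceNode` class, the use of a list as a stand-in for the hard disk and the record layout are assumptions made for illustration.

```python
class ServiceNode:
    def __init__(self, node_id, assigned_numbers):
        self.node_id = node_id
        self.assigned = set(assigned_numbers)
        self.disk = []    # stand-in for persistent storage of the full data
        self.memory = {}  # in-memory pool of data this node compares against

    def receive(self, record):
        self.disk.append(record)              # S430: every node stores the record
        if record["number"] in self.assigned:
            self.memory[record["id"]] = record  # S440: cache only if assigned


u1 = {"id": "u1", "number": 1, "vector": [1]}
node_10 = ServiceNode("192.168.0.10", {0, 1})
node_11 = ServiceNode("192.168.0.11", {2, 3})
for node in (node_10, node_11):
    node.receive(u1)
```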

The following describes the data comparison implementation process in the service providing phase, as shown in fig. 5:

S510: the coordination server judges whether the distributed cluster is in a service available state, and if so, the operation of the subsequent steps is continued.

S520: and the coordination server issues the face target data.

For example: the identification of the face target data is u2 and the feature vector is [2].

S530: after each service node in the distributed cluster receives the face target data, the received face target data is compared with the face data cached in the respective memory one by one.

For example: the service node identified as 192.168.0.10 receives the face target data of u2 and compares it with the face data (u1) cached in its memory. Whether u1 and u2 are the same person can be judged according to the comparison result, and on a hit the service node marks the result or sends out a corresponding notification.

S540: and the coordination server obtains a comparison hit result set of each service node.
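Steps S520 to S540 can be illustrated with the sketch below. The similarity measure is not specified in this example, so cosine similarity is used here purely as an assumption, as are the threshold value, the two-dimensional vectors and all function names. The per-node loop stands in for work that each service node would perform on its own memory in parallel.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def compare_on_node(memory, target_vector, threshold):
    """S530: compare the target with every vector cached on one node."""
    hits = []
    for record_id, vector in memory.items():
        score = cosine(vector, target_vector)
        if score >= threshold:
            hits.append((record_id, score))
    return hits


def compare_all(node_memories, target_vector, threshold=0.9):
    """S520 and S540: issue the target to every node and collect the hits."""
    hit_set = []
    for node_id, memory in node_memories.items():
        for record_id, score in compare_on_node(memory, target_vector, threshold):
            hit_set.append((node_id, record_id, score))
    return hit_set


node_memories = {
    "192.168.0.10": {"u1": [1.0, 0.0]},
    "192.168.0.11": {"u3": [0.0, 1.0]},
}
hits = compare_all(node_memories, [2.0, 0.0])
```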

The implementation of the system initialization phase and the service providing phase is described above. In the system operation process, a case where a service node is changed in a distributed cluster may occur, and an implementation process in the case where a service node is changed in a distributed cluster will be described below.

(1) Dynamically expanding cluster nodes, that is, no service node in the distributed cluster is offline, but a new node is added:

For example: as the amount of data grows, the service nodes identified as 192.168.0.10 and 192.168.0.11 in the distributed cluster no longer have enough memory to support the existing amount of data, and two new nodes, identified as 192.168.0.12 and 192.168.0.13 respectively, are to be added.

The specific implementation process is shown in fig. 6:

S610: the full amount of data stored in any service node is copied to the new node.

For example, the full amount of data stored in the service node identified as 192.168.0.10 or 192.168.0.11 may be copied to the new nodes identified as 192.168.0.12 and 192.168.0.13.

S620: the new node is started.

The service nodes of the distributed cluster comprise the original service nodes and the new nodes.

S630: each service node of the distributed cluster reloads the [correspondence between the service node and the comparison data].

When there is a change of service node, all temporary nodes created in the coordination server may be deleted, and at this time, the distributed cluster is in a service unavailable state. The coordination server may issue the updated [ service node and comparison data correspondence ] to each service node, so that each service node may reload the updated correspondence.

The updated [correspondence between the service node and the comparison data] is, for example: the service node identified as 192.168.0.10 supports face data of number 0, the service node identified as 192.168.0.11 supports face data of number 1, the service node identified as 192.168.0.12 supports face data of number 2, and the service node identified as 192.168.0.13 supports face data of number 3.

S640: each service node traverses the full amount of stored face data.

S650: and each service node selectively caches respective face data in the memory according to the corresponding relation between the service node and the comparison data.

For example: the service node identified as 192.168.0.10 now loads only the face data numbered 0 into the memory, whereas it previously needed to load the face data numbered 0 and 1; the service node identified as 192.168.0.11 loads only the face data numbered 1 into the memory; and the new nodes identified as 192.168.0.12 and 192.168.0.13 load the face data numbered 2 and 3 into the memory, respectively.

S660: each service node begins providing services.

After the service nodes of the distributed cluster load the corresponding face data into the memory, the service can be provided, and a temporary node corresponding to each service node is created in the coordination server and used for judging whether the distributed cluster is in a service available state.

So far, if each service node can provide services, the distributed cluster is in a service available state and can start working normally to complete cluster expansion operation.
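How the updated correspondence is computed is not prescribed above. One simple possibility, shown here only as an assumption, is to split the data numbers into contiguous, near-equal ranges over the current list of service nodes; with four data numbers this reproduces the assignments used in the examples before and after the expansion. The function name `assign_evenly` is hypothetical.

```python
def assign_evenly(node_ids, total_numbers):
    """Split data numbers 0..total_numbers-1 into contiguous ranges per node."""
    base, extra = divmod(total_numbers, len(node_ids))
    mapping, start = {}, 0
    for index, node in enumerate(node_ids):
        size = base + (1 if index < extra else 0)  # first nodes take the remainder
        mapping[node] = list(range(start, start + size))
        start += size
    return mapping


before = assign_evenly(["192.168.0.10", "192.168.0.11"], 4)
after = assign_evenly(
    ["192.168.0.10", "192.168.0.11", "192.168.0.12", "192.168.0.13"], 4)
```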

(2) Failed node replacement, that is, a service node in the distributed cluster goes offline and an original online node is used to replace the offline node:

For example: the service node identified as 192.168.0.10 goes down and, because its machine hardware is also faulty, can no longer provide service. At this moment, the temporary node corresponding to that service node in the coordination server is automatically removed, and the distributed cluster is in a service unavailable state. If the service node identified as 192.168.0.11 has ample memory, it can additionally carry the face data that was held in the memory of the service node identified as 192.168.0.10.

The coordination server may delete all temporary nodes previously created while the distributed cluster is in an unavailable state.

And the coordination server sends the updated corresponding relation between the service nodes and the comparison data to all the service nodes of the distributed cluster.

For example, the updated [correspondence between the service node and the comparison data] is: the service node identified as 192.168.0.11 supports face data of numbers 0 and 1, the service node identified as 192.168.0.12 supports face data of number 2, and the service node identified as 192.168.0.13 supports face data of number 3. That is, the face data numbered 0, previously supported by the failed service node identified as 192.168.0.10, is now supported by the service node identified as 192.168.0.11.

And each service node of the distributed cluster traverses the stored full face data, and loads the respective corresponding face data into the memory according to the received [ corresponding relationship between the service node and the comparison data ].

Such as: the service node marked as 192.168.0.11 loads the face data with the number 0 and the number 1 into the memory (before, the service node only needs to load the face data with the number 1).

And creating a temporary node corresponding to each service node in the coordination server, wherein the temporary node is used for judging whether the distributed cluster is in a service available state or not.

So far, if each service node can provide services, the distributed cluster is in a service available state and can start working normally to complete node replacement operation.

(3) Failed node replacement, that is, a service node in the distributed cluster goes offline and a new node is used to replace the offline node:

For example: the service node identified as 192.168.0.10 goes down and, because its machine hardware is faulty, can no longer provide service. At this moment, the temporary node corresponding to that service node in the coordination server is automatically removed, and the distributed cluster is in a service unavailable state. The failed service node identified as 192.168.0.10 is to be replaced with a new node identified as 192.168.0.15.

Copying the full amount of stored face data in any service node in the distributed cluster to a new node identified as 192.168.0.15, and starting the new node.

The coordination server issues the updated [correspondence between the service node and the comparison data] to all the service nodes of the distributed cluster.

For example, the updated [correspondence between the service node and the comparison data] is: the service node identified as 192.168.0.15 supports face data of number 0, the service node identified as 192.168.0.11 supports face data of number 1, the service node identified as 192.168.0.12 supports face data of number 2, and the service node identified as 192.168.0.13 supports face data of number 3. That is, the face data numbered 0, previously supported by the failed service node identified as 192.168.0.10, is handed over to the new node identified as 192.168.0.15.

So far, if each service node can provide services, the distributed cluster is in a service available state, and can start working normally to complete node replacement operation.

The implementation process of replacing the failed node may refer to the implementation process of dynamically expanding the cluster node shown in fig. 6, and details are not described here.

According to the embodiment of the application, the comparison result can be obtained quickly by comparing the comparison data cached in the memory of each service node in the distributed cluster with the comparison target, and comparison tasks with a large data volume are supported. When the service nodes are changed, the cluster can be quickly expanded or the affected node quickly replaced, which improves the availability of the overall service of the distributed cluster. In addition, because the full comparison data is stored in each service node, the comparison data that needs to be reloaded into the memory can be traversed quickly, which reduces the IO time-consuming bottleneck of loading data in the system.

Corresponding to the above method embodiment, the embodiment of the present application further provides a full data comparison apparatus, which is applied to a coordination server, where the coordination server is respectively in communication connection with each service node of a distributed cluster, and each service node stores full comparison data; the full data comparison device described below and the full data comparison method described above may be referred to with each other.

Referring to fig. 7, the apparatus may include the following modules:

a corresponding relationship obtaining module 710, configured to obtain a corresponding relationship between the service node and the comparison data;

the correspondence issuing module 720 is configured to issue a correspondence between the service node and the comparison data to each service node of the distributed cluster, so that each service node traverses the full amount of comparison data according to the correspondence between the service node and the comparison data, and loads the respective corresponding comparison data into the memory;

the data comparison module 730 is configured to, in a case that the data comparison instruction is received, issue the comparison target to each service node of the distributed cluster, so that each service node compares the comparison data in the respective memory with the comparison target.

By applying the device provided by the embodiment of the application, after the coordination server obtains the corresponding relationship between the service node and the comparison data, the corresponding relationship between the service node and the comparison data is issued to each service node of the distributed cluster, so that each service node can traverse the total amount of comparison data stored by itself according to the corresponding relationship between the service node and the comparison data, find out the comparison data corresponding to itself, and load the comparison data into the memory, namely, cache the corresponding comparison data in the memory. And under the condition of receiving the data comparison instruction, the coordination server issues a comparison target to each service node of the distributed cluster, and each service node compares the comparison data in the respective memory with the comparison target. The total sum of the comparison data cached in the memories of all the service nodes is the total comparison data, the aim of comparing the comparison target with the total data is finally achieved, each service node is only based on the comparison data cached in the memory of the service node and is compared with the comparison target, the total comparison time can be shortened, and the comparison efficiency is improved.

In a specific embodiment of the present application, the apparatus further includes a service availability determining module, configured to:

before the comparison target is issued to each service node of the distributed cluster, whether the distributed cluster is in a service available state is determined;

if yes, the data comparison module 730 is triggered to execute the step of issuing the comparison target to each service node of the distributed cluster.

In one embodiment of the present application, the service availability determining module is configured to:

In a specific embodiment of the present application, the apparatus further includes a temporary node control module, configured to:

under the condition that cache feedback information of any one service node is received, a corresponding temporary node is established for the corresponding service node, and the cache feedback information is information sent by the corresponding service node after corresponding comparison data are all loaded into a memory;

In a specific embodiment of the present application, the apparatus further includes a correspondence update module, configured to:

updating the corresponding relation between the service nodes and the comparison data under the condition that the service nodes in the distributed cluster are changed and each service node in the changed distributed cluster stores the full comparison data; and triggering the correspondence issuing module 720 to repeatedly execute the step of issuing the corresponding relation between the service node and the comparison data to each service node of the distributed cluster.

In a specific embodiment of the present application, the correspondence update module is configured to:

replacing the identifier of the off-line node in the corresponding relation between the service node and the comparison data with the identifier of the new node under the condition that the service node is off-line in the distributed cluster and the new node is used for replacing the off-line node;

determining the remaining memory occupation ratio of each online node in the distributed cluster under the condition that a service node is offline and an original online node is used for replacing an offline node; determining a substitute node of the off-line node in the on-line node according to the remaining memory proportion; distributing comparison data corresponding to the offline node in the corresponding relation between the service node and the comparison data to the replacement node;

In a specific embodiment of the present application, the apparatus further includes a matching result determining module, configured to:

obtaining a comparison result fed back by each service node;

Corresponding to the above method embodiment, an embodiment of the present application further provides a full-volume data comparison device, including:

a memory for storing a computer program;

and the processor is used for realizing the steps of the full data comparison method when executing the computer program.

As shown in fig. 8, which is a schematic diagram of a composition structure of the full data comparison device, the full data comparison device may include: a processor 10, a memory 11, a communication interface 12 and a communication bus 13. The processor 10, the memory 11 and the communication interface 12 all communicate with each other through a communication bus 13.

In the embodiment of the present application, the processor 10 may be a Central Processing Unit (CPU), an application specific integrated circuit, a digital signal processor, a field programmable gate array or other programmable logic device, etc.

The processor 10 may call a program stored in the memory 11, and specifically, the processor 10 may perform the operations in the embodiment of the full data alignment method.

The memory 11 is used for storing one or more programs, the program may include program codes, the program codes include computer operation instructions, in this embodiment, the memory 11 stores at least the program for implementing the following functions:

issuing the corresponding relation between the service node and the comparison data to each service node of the distributed cluster, so that each service node traverses the total comparison data according to the corresponding relation between the service node and the comparison data, and loading the respective corresponding comparison data into a memory;

and under the condition of receiving the data comparison instruction, issuing the comparison target to each service node of the distributed cluster, so that each service node compares the comparison data in the respective memory with the comparison target.

In one possible implementation, the memory 11 may include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as an information interaction function and a requirement determination function), and the like; the storage data area may store data created during use, such as correspondence data, comparison target data, and the like.

Further, the memory 11 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device or other non-volatile solid state storage device.

The communication interface 12 may be an interface of a communication module for connecting with other devices or systems.

Of course, it should be noted that the structure shown in fig. 8 does not constitute a limitation on the full data comparison device in the embodiment of the present application, and in practical applications, the full data comparison device may include more or less components than those shown in fig. 8, or may combine some components.

Corresponding to the above method embodiments, the present application further provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of the above full data comparison method are implemented.

The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other.

Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), internal memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

Specific examples are used herein to explain the principle and the implementation of the present application, and the above description of the embodiments is intended only to help in understanding the technical solution and the core idea of the present application. It should be noted that those skilled in the art may make several improvements and modifications to the present application without departing from its principle, and such improvements and modifications also fall within the scope of the claims of the present application.

Claims

1. A full data comparison method, characterized in that the method is applied to a coordination server, the coordination server is in communication connection with each service node of a distributed cluster, and full comparison data is stored in each service node; the method comprises the following steps:

2. The method of claim 1, wherein before issuing the comparison target to each service node of the distributed cluster, the method further comprises:

determining whether the distributed cluster is in a service available state;

3. The method of claim 2, wherein the determining whether the distributed cluster is in a service available state comprises:

4. The method of claim 3, further comprising:

5. The method of claim 1, further comprising:

6. The method of claim 5, wherein updating the correspondence between the service nodes and the comparison data comprises:

7. The method of any one of claims 1 to 6, further comprising:

obtaining a comparison result fed back by each service node;

8. A full data comparison device, characterized in that the device is applied to a coordination server, the coordination server is in communication connection with each service node of a distributed cluster, and full comparison data is stored in each service node; the device comprises:

9. A full data comparison device, comprising:

a memory for storing a computer program;

a processor for implementing the steps of the full data comparison method according to any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the full data comparison method according to any one of claims 1 to 7.