CN109271545A

CN109271545A - Feature retrieval method and device, storage medium and computer equipment

Info

Publication number: CN109271545A
Application number: CN201810873786.6A
Authority: CN
Inventors: 陈宇恒; 樊俊良
Original assignee: Shenzhen Sensetime Technology Co Ltd
Current assignee: Shenzhen Sensetime Technology Co Ltd
Priority date: 2018-08-02
Filing date: 2018-08-02
Publication date: 2019-01-25
Anticipated expiration: 2038-08-02
Also published as: CN109271545B

Abstract

The embodiment of the present invention provides a feature retrieval method and device, a storage medium and computer equipment, wherein the method includes: performing feature extraction on a feature to be retrieved to obtain a compressed feature to be retrieved; searching a replica set for a target compressed feature set matching the compressed feature to be retrieved, wherein the target compressed feature set includes at least one target compressed feature and the replica set includes different compressed features; determining, from an original feature set, the candidate feature corresponding to each target compressed feature to form a candidate feature set, wherein the original feature set includes at least one original feature; and comparing the candidate features in the candidate feature set with the feature to be retrieved to obtain the target candidate feature corresponding to the feature to be retrieved.

Description

Feature retrieval method and device, storage medium and computer equipment

Technical Field

The invention relates to the field of information service, in particular to a feature retrieval method and device, a storage medium and computer equipment.

Background

The feature retrieval service finds, among a series of known features, the features that match an input feature to be retrieved. The known features are stored in a database. Because feature-based retrieval services are generally applied in fields such as intelligent video analysis and security monitoring, the number of known features stored in the database is massive. For example, a national citizen face information database stores the facial features of 1.4 billion citizens nationwide, that is, up to 1.4 billion known features. Feature retrieval therefore has to search for the input feature among 1.4 billion known features, and each feature itself contains a large amount of information, which results in a very slow processing speed.

In the related technology, the known features are compressed into known compressed features, which are matched against the compressed feature corresponding to the feature to be retrieved, and the known features corresponding to the matched known compressed features are used as the final retrieval result. Retrieving over compressed features improves retrieval efficiency, but greatly reduces retrieval accuracy.

Disclosure of Invention

In view of this, embodiments of the present invention provide a feature retrieval method and apparatus, a storage medium, and a computer device, which improve the retrieval speed of feature retrieval and effectively improve the retrieval accuracy of feature retrieval.

The technical scheme of the embodiment of the invention is realized as follows:

the embodiment of the invention provides a feature retrieval method, which comprises the following steps:

performing feature extraction on the features to be retrieved to obtain compressed features to be retrieved;

searching a target compression feature set matched with the compression features to be retrieved from a duplicate set, wherein the target compression feature set at least comprises one target compression feature, and the duplicate set comprises different compression features;

determining candidate features corresponding to each target compression feature from the original feature set to form a candidate feature set; the original feature set comprises at least one original feature;

and comparing the candidate features in the candidate feature set with the features to be retrieved to obtain target candidate features corresponding to the features to be retrieved.
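
The four steps above can be sketched as a coarse search over compressed features followed by an exact comparison over original features. This is only a minimal sketch under stated assumptions: the sign-bit sampling in `compress`, the bit-difference and squared Euclidean distances, and every function name here are hypothetical choices for illustration and are not specified by the method.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]
Code = Tuple[int, ...]


def compress(feature: Vector, stride: int = 4) -> Code:
    """Hypothetical feature extraction: sample every stride-th value, keep its sign bit."""
    return tuple(1 if value > 0 else 0 for value in feature[::stride])


def compressed_distance(a: Code, b: Code) -> int:
    """Number of differing bits between two compressed features (assumed metric)."""
    return sum(x != y for x, y in zip(a, b))


def original_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two original features (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def retrieve(
    query: Vector,
    original_set: List[Vector],
    replica_set: Dict[int, Code],
    threshold: int,
) -> Optional[Tuple[int, Vector]]:
    # Step 1: feature extraction on the feature to be retrieved.
    compressed_query = compress(query)
    # Step 2: target compressed feature set, found in the replica set.
    target_indexes = [
        index
        for index, code in replica_set.items()
        if compressed_distance(compressed_query, code) < threshold
    ]
    # Step 3: candidate feature set, read from the original feature set.
    candidates = [(index, original_set[index]) for index in target_indexes]
    if not candidates:
        return None
    # Step 4: compare candidates with the uncompressed feature to be retrieved.
    return min(candidates, key=lambda item: original_distance(query, item[1]))
```

Because the final comparison uses the original features, two entries that share the same compressed feature can still be told apart, which is where the accuracy gain over a purely compressed search comes from.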

In this embodiment of the present invention, the searching for the target compression feature set matching the compression feature to be retrieved from the replica set includes:

and searching a target compression feature set matched with the compression features to be retrieved from at least two copy sets, wherein the compression features included in each copy set are different.

In an embodiment of the present invention, the set of replicas includes a first subset stored on a first physical machine and a second subset stored on a second physical machine, the compression characteristics in the first subset and the second subset being the same;

correspondingly, the searching for the target compression feature set matching the compression feature to be retrieved from the copy set includes:

and searching a target compression feature set matched with the compression features to be retrieved from a target subset of each copy set, wherein the target subset is selected from the first subset and the second subset.

In an embodiment of the present invention, the replica set includes at least two clusters; the clusters comprise at least one compression feature, and the compression features in the same cluster belong to the same feature type; correspondingly, the searching for the target compression feature set matched with the compression feature to be retrieved from the copy set comprises:

determining target clusters from the duplicate set according to the features to be retrieved and the typical features of each cluster, wherein the typical features represent feature types to which compression features in the corresponding clusters belong;

and searching a target compression feature set matched with the compression features to be retrieved from the compression features of the target cluster.
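
The cluster-based narrowing described above can be illustrated as follows. This is a sketch only: the squared Euclidean comparison against typical features, the cluster identifiers, and the function names are assumptions made for the example.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (assumed comparison between original features)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_target_cluster(query: Vector, typical_features: Dict[str, Vector]) -> str:
    """Pick the cluster whose typical feature is closest to the feature to be retrieved."""
    return min(
        typical_features,
        key=lambda cluster_id: squared_distance(query, typical_features[cluster_id]),
    )


def indexes_to_search(
    query: Vector,
    typical_features: Dict[str, Vector],
    clusters: Dict[str, List[int]],
) -> List[int]:
    """Return only the compressed-feature indexes that belong to the target cluster."""
    return clusters[select_target_cluster(query, typical_features)]
```

Only the compressed features of the selected cluster are then compared with the compressed feature to be retrieved, so the cost of the coarse search scales with the cluster size rather than with the whole replica set.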

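In this embodiment of the present invention, the searching for the target compression feature set matching the compression feature to be retrieved from the replica set includes:

determining a compression distance between the compression feature to be retrieved and each compression feature in the replica set, wherein the compression distance represents the similarity of the two compression features;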

and taking the compression features of which the compression distances are smaller than the set compression distance threshold as the target compression features to form a target compression feature set.
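
As an illustration of the threshold test, the sketch below treats compressed features as byte strings and uses the Hamming distance (number of differing bits) as the compression distance. The choice of Hamming distance and the names `compression_distance` and `target_compression_set` are assumptions; the text only requires that the distance represent similarity.

```python
from typing import Dict, List


def compression_distance(a: bytes, b: bytes) -> int:
    """Hamming distance in bits between two equally sized compressed features."""
    if len(a) != len(b):
        raise ValueError("compressed features must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def target_compression_set(
    compressed_query: bytes, replica_set: Dict[int, bytes], threshold: int
) -> List[int]:
    """Indexes of compressed features whose distance is smaller than the threshold."""
    return sorted(
        index
        for index, compressed in replica_set.items()
        if compression_distance(compressed_query, compressed) < threshold
    )
```

A larger threshold admits more target compression features and therefore more candidates for the later comparison on original features; a smaller one is faster but risks excluding the true match.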

In an embodiment of the invention, the clusters are determined by applying a clustering algorithm to the replica set;

the typical features corresponding to the clusters are determined by the original features corresponding to the compressed features in the clusters.

In this embodiment of the present invention, the determining a candidate feature corresponding to each target compression feature from an original feature set to form a candidate feature set includes:

determining an index corresponding to each target compression feature; the index is used for characterizing the position of a candidate feature corresponding to the target compression feature in the original feature set;

and acquiring a candidate feature corresponding to each target compression feature from the original feature set according to the index corresponding to each target compression feature to form the candidate feature set.

In this embodiment of the present invention, before performing feature extraction on a feature to be retrieved, the method further includes:

performing feature extraction on the features to be written to obtain compressed features to be written;

writing the compression features to be written into a first subset of a target copy set; the target replica set is one of at least two replica sets;

writing the features to be written into a first log corresponding to the first subset;

and writing the compression features corresponding to the features to be written into a second subset of the target copy set according to the first log, and writing the features to be written into a second log corresponding to the second subset.
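
The write path above can be sketched as follows. This is a minimal, single-process sketch: the `Subset` class, the sign-bit `compress` function, and the assumption that the second log is always a prefix of the first log (so its length marks the replication position) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

Code = Tuple[int, ...]
LogEntry = Tuple[int, Tuple[float, ...]]  # (index, feature to be written)


def compress(feature: Sequence[float], stride: int = 2) -> Code:
    """Hypothetical feature extraction: sampled sign bits."""
    return tuple(1 if value > 0 else 0 for value in feature[::stride])


@dataclass
class Subset:
    compressed: Dict[int, Code] = field(default_factory=dict)
    log: List[LogEntry] = field(default_factory=list)


def replicate(first: Subset, second: Subset) -> None:
    """Bring the second subset up to date by reading the first log."""
    # Assumption: second.log is a prefix of first.log.
    for index, feature in first.log[len(second.log):]:
        second.compressed[index] = compress(feature)
        second.log.append((index, feature))


def write_feature(
    index: int, feature: Sequence[float], first: Subset, second: Subset
) -> None:
    # Write the compressed feature into the first subset.
    first.compressed[index] = compress(feature)
    # Record the uncompressed feature to be written in the first log.
    first.log.append((index, tuple(feature)))
    # Write to the second subset and second log according to the first log.
    replicate(first, second)
```

Logging the uncompressed feature to be written, rather than the compressed one, is what later allows a lost subset to be rebuilt by re-running feature extraction over the log.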

In an embodiment of the present invention, the method further comprises:

if the compressed features in the first subset or the second subset of the replica set are empty, determining recovered compressed features according to the features to be written recorded in the first log or the second log; wherein the first log records the features to be written that were written to the first subset before its compressed features became empty, and the second log records the features to be written that were written to the second subset before its compressed features became empty;

writing the recovered compressed features to the first subset or the second subset.

In an embodiment of the present invention, the method further comprises:

if the compression characteristics in the first subset or the second subset of the copy set are null, searching a metafile corresponding to the copy set; the metafile records the compression characteristics written before the compression characteristics in the first subset or the second subset are empty;

determining the compression characteristics recorded in the metafile as snapshot compression characteristics, and writing the snapshot compression characteristics into the first subset or the second subset;

acquiring the acquisition time of the last compression feature in the metafile, determining a supplementary original feature from a first log or a second log according to the acquisition time, and writing the compression feature corresponding to the supplementary original feature into the first subset or the second subset.
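
A possible shape for this snapshot-plus-log recovery is sketched below. The log entry layout `(recording time, index, feature)`, the function name `recover_subset`, and passing the feature extraction step in as a `compress` callable are assumptions made for the example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Code = Tuple[int, ...]
LogEntry = Tuple[float, int, Sequence[float]]  # (recording time, index, feature)


def recover_subset(
    snapshot: Dict[int, Code],
    snapshot_time: float,
    log: List[LogEntry],
    compress: Callable[[Sequence[float]], Code],
) -> Dict[int, Code]:
    """Rebuild an emptied subset from snapshot compressed features plus the log."""
    # Start from the snapshot compressed features recorded in the metafile.
    restored = dict(snapshot)
    # Supplementary original features: log entries newer than the snapshot.
    for recorded_at, index, feature in log:
        if recorded_at > snapshot_time:
            restored[index] = compress(feature)
    return restored
```

With an empty snapshot and a snapshot time earlier than every log entry, the same function degenerates to the log-only recovery of the preceding embodiment; the snapshot simply avoids recompressing most of the log.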

In an embodiment of the present invention, the method further comprises:

if the compression characteristic in the second subset of the copy set is empty, determining the recording time of the last characteristic to be written recorded in the second log;

determining synchronous original features from the first log according to the recording time, wherein the synchronous original features are to-be-written features written into the first subset after the recording time;

and writing the compressed features corresponding to the synchronous original features into the second subset.
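
The synchronization of the second subset from the first log can be sketched in the same style. The entry layout `(recording time, index, feature)` and the function names are hypothetical; the sketch also appends the synchronized entries to the second log, which is an assumption beyond what the steps above state.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Code = Tuple[int, ...]
LogEntry = Tuple[float, int, Sequence[float]]  # (recording time, index, feature)


def features_to_sync(
    first_log: List[LogEntry], second_log: List[LogEntry]
) -> List[LogEntry]:
    """Entries written to the first subset after the last entry of the second log."""
    last_recorded = second_log[-1][0] if second_log else float("-inf")
    return [entry for entry in first_log if entry[0] > last_recorded]


def resync_second_subset(
    first_log: List[LogEntry],
    second_log: List[LogEntry],
    second_subset: Dict[int, Code],
    compress: Callable[[Sequence[float]], Code],
) -> None:
    """Write the compressed form of each synchronous original feature to the second subset."""
    for recorded_at, index, feature in features_to_sync(first_log, second_log):
        second_subset[index] = compress(feature)
        second_log.append((recorded_at, index, feature))  # assumed bookkeeping
```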

An embodiment of the present invention further provides a feature retrieval device, where the device includes: an extraction module, a search module, a determination module and a comparison module; wherein,

the extraction module is used for extracting the features of the features to be retrieved to obtain the compressed features to be retrieved;

the searching module is used for searching a target compression feature set matched with the compression features to be retrieved from a copy set, wherein the target compression feature set at least comprises one target compression feature, and the copy set comprises different compression features;

the determining module is used for determining candidate features corresponding to each target compression feature from the original feature set to form a candidate feature set; the original feature set comprises at least one original feature;

the comparison module is used for comparing the candidate features in the candidate feature set with the features to be retrieved to obtain target candidate features corresponding to the features to be retrieved.

In an embodiment of the present invention, the search module includes: a first lookup sub-module;

the first searching submodule is used for searching a target compression feature set matched with the compression feature to be retrieved from at least two duplicate sets, and the compression feature included in each duplicate set is different.

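In an embodiment of the present invention, the replica set includes a first subset stored on a first physical machine and a second subset stored on a second physical machine, the compression features in the first subset and the second subset being the same; correspondingly, the search module further comprises: a second lookup sub-module;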

the second searching sub-module is configured to search a target compression feature set matching the compression feature to be retrieved from a target subset of each replica set, where the target subset is selected from the first subset and the second subset.

In an embodiment of the present invention, the replica set includes at least two clusters; the clusters comprise at least one compression feature, and the compression features in the same cluster belong to the same feature type; correspondingly, the search module further comprises: a determining submodule and a third searching submodule;

the determining submodule is used for determining a target cluster from the duplicate set according to the features to be retrieved and the typical features of each cluster, and the typical features represent feature types to which the compression features in the corresponding clusters belong;

and the third searching submodule is used for searching a target compression feature set matched with the compression feature to be searched from the compression features of the target cluster.

In this embodiment of the present invention, the searching module further includes: a calculation submodule and a comparison submodule; wherein,

the calculation submodule is used for determining the compression distance between the compression feature to be retrieved and each compression feature in the replica set, and the compression distance represents the similarity of the two compression features;

and the comparison submodule is used for taking the compression characteristic of which the compression distance is smaller than the set compression distance threshold value as the target compression characteristic to form a target compression characteristic set.

In an embodiment of the present invention, the determining module includes: an indexing submodule and an acquisition submodule;

the index submodule is used for determining an index corresponding to each target compression characteristic; the index is used for characterizing the position of a candidate feature corresponding to the target compression feature in the original feature set;

the obtaining sub-module is configured to obtain a candidate feature corresponding to each target compression feature from the original feature set according to the index corresponding to each target compression feature, and form the candidate feature set.

In an embodiment of the present invention, the apparatus further includes: a write module to:

In an embodiment of the present invention, the apparatus further includes: a first recovery module to:

In an embodiment of the present invention, the apparatus further includes: a second recovery module to:

In an embodiment of the present invention, the apparatus further includes: a third recovery module to:

The embodiment of the invention also provides a computer storage medium, wherein the computer storage medium stores computer-executable instructions, and after the computer-executable instructions are executed, the steps in the feature retrieval method provided by the embodiment of the invention can be realized.

The embodiment of the invention also provides computer equipment, which comprises a memory and an image processor, wherein the memory stores computer-executable instructions, and the image processor, when running the computer-executable instructions in the memory, can implement the steps in the feature retrieval method provided by the embodiment of the invention.

The embodiment of the invention provides a feature retrieval method and device, a storage medium and computer equipment, wherein compressed features corresponding to features to be retrieved are compared with compressed features in a duplicate set to find out a plurality of target compressed features, original features corresponding to the target compressed features are used as candidate features, and target candidate features corresponding to the features to be retrieved are found out from the candidate features; therefore, the retrieval speed of the feature retrieval is improved, and the retrieval precision of the feature retrieval is effectively improved.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.

FIG. 1A is a first schematic diagram of a network architecture according to an embodiment of the present invention;

FIG. 1B is a second schematic diagram of a network architecture according to an embodiment of the present invention;

FIG. 2 is a schematic flowchart of an implementation of the feature retrieval method according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a network architecture according to a second embodiment of the present invention;

FIG. 4 is a schematic flowchart of an implementation of a feature adding method according to the second embodiment of the present invention;

FIG. 5 is a schematic flowchart of an implementation of the feature retrieval method according to the second embodiment of the present invention;

FIG. 6 is a schematic flowchart of an implementation of a failure recovery method according to the second embodiment of the present invention;

FIG. 7A is a first schematic diagram of a feature retrieval method in the related art;

FIG. 7B is a second schematic diagram of a feature retrieval method in the related art;

FIG. 7C is a first schematic diagram of a feature retrieval method according to a third embodiment of the present invention;

FIG. 7D is a second schematic diagram of a feature retrieval method according to the third embodiment of the present invention;

FIG. 8 is a first schematic structural diagram of a feature retrieval device according to a fourth embodiment of the present invention;

FIG. 9A is a schematic structural diagram of the search module in the feature retrieval device according to the fourth embodiment of the present invention;

FIG. 9B is a second schematic structural diagram of the feature retrieval device according to the fourth embodiment of the present invention;

FIG. 9C is a schematic structural diagram of the determining module in the feature retrieval device according to the fourth embodiment of the present invention;

FIG. 10 is a schematic structural diagram of a computer device according to the fourth embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the following describes specific technical solutions of the present invention in further detail with reference to the accompanying drawings in the embodiments of the present invention. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.

Fig. 1A is a schematic diagram of a composition structure of a network architecture according to an embodiment of the present invention, as shown in fig. 1A, the network architecture includes a retrieval device 10 and a service node 11, where the retrieval device 10 and the service node 11 interact with each other through a network 21. The retrieval device 10 is capable of receiving a retrieval request of a user during implementation and sending the retrieval request to the service node 11. The database in the service node 11 stores known original features, and the service node 11 also stores compression features corresponding to the original features. The service node 11 performs feature extraction on the features to be retrieved carried by the retrieval request to obtain compression features to be retrieved, compares the compression features to be retrieved with each compression feature to find out target compression features matched with the compression features to be retrieved, and compares the original features corresponding to the target compression features with the features to be retrieved to find out target candidate features corresponding to the features to be retrieved.

Fig. 1B is a schematic structural diagram of another network architecture according to an embodiment of the present invention, and as shown in fig. 1B, the network architecture includes a retrieval device 10, service nodes 11-1N, and a data service node 20, where the retrieval device 10, the service nodes 11 to 1N, and the data service node 20 interact with each other through a network 21. The retrieval device 10 can receive retrieval requests of users in the implementation process and respectively send the retrieval requests to the service nodes 11-1N. The service nodes 11 to 1N respectively store a part of the compression features corresponding to the known original features in the data service node 20, and the sum of the compression features in the service nodes 11 to 1N is the compression feature corresponding to all the original features in the data service node 20. After receiving the retrieval request, the service nodes 11 to 1N respectively extract features of the features to be retrieved carried by the retrieval request to obtain compression features to be retrieved, compare the compression features to be retrieved with the stored compression features to find out target compression features matched with the compression features to be retrieved, and notify the index of the target compression features to the data service node 20. The database on the data service node 20 stores known original features, the data service node 20 obtains the original features, namely candidate features, corresponding to the target compression features from the database according to the received indexes, and sends the candidate features to the service node 11-1N, and the service node 11-1N compares each candidate feature with the features to be retrieved respectively to find out the target candidate features corresponding to the features to be retrieved.

In the network structures shown in fig. 1A and fig. 1B, when the retrieval device 10 receives the retrieval request, it may also directly perform feature extraction on the features to be retrieved carried in the retrieval request to obtain compressed features to be retrieved, and distribute the compressed features to be retrieved to the service nodes 11 to 1N.

With reference to the application scenario diagrams shown in fig. 1A and fig. 1B, the embodiment provides a feature retrieval method, which can effectively improve the retrieval speed of feature retrieval and simultaneously improve the retrieval accuracy of feature retrieval.

In order to better understand the feature retrieval method provided by the embodiment of the present invention, some terms in the embodiment of the present invention are described below.

Original feature: a long feature used to characterize a retrieved object, for example the face information of each citizen in a national citizen information base. A retrieval image that includes the object to be retrieved is used as the input of a neural network model, and the output of the neural network model is the original feature of the retrieval image, namely the feature to be retrieved.

Original feature set: also called a feature library, a set of original features. Different feature libraries can be set up for different types of information, for example a citizen information base containing national citizen information and a vehicle information base containing national vehicle information. A database may include a plurality of feature libraries of different types.

Compressed feature: a short feature that corresponds to an original feature and is obtained by performing feature extraction on that original feature. Feature extraction can be implemented by sampling compression, compression mapping and other methods, so that the original feature is compressed. For example, an original feature of size 2K is reduced by feature extraction to a compressed feature of 32 bytes. The compressed feature may be the key information of the original feature.
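
To make the size reduction concrete, the sketch below shows one possible form of sampling compression. It assumes that "2K" means 2048 bytes, samples 256 evenly spaced bytes, and keeps one bit per sample (whether the byte is at least 128), giving 32 bytes. The sampling pattern, the bit rule and the name `compress_feature` are illustrative assumptions, not the extraction actually used.

```python
def compress_feature(original: bytes, out_bytes: int = 32) -> bytes:
    """Sample evenly spaced bytes and keep one bit per sample (assumed scheme)."""
    n_bits = out_bytes * 8
    if len(original) < n_bits:
        raise ValueError("original feature is too short to sample from")
    step = len(original) // n_bits  # 2048 // 256 == 8 for a 2048-byte feature
    compressed = bytearray(out_bytes)
    for bit in range(n_bits):
        # Set the bit when the sampled byte is in the upper half of its range.
        if original[bit * step] >= 128:
            compressed[bit // 8] |= 1 << (bit % 8)
    return bytes(compressed)
```

Under these assumptions each compressed feature is 64 times smaller than its original feature, which is what makes it practical to hold the compressed features in memory on the service nodes.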

Feature retrieval: searching the original feature set for original features that are the same as or similar to the feature to be retrieved.

Target candidate feature: the retrieval result of feature retrieval, namely an original feature found in the original feature set that is the same as or similar to the feature to be retrieved.

Replica set (also rendered as copy set or duplicate set): a set of compressed features. One original feature set may correspond to a plurality of replica sets. Each replica set includes at least one compressed feature, and the compressed features in different replica sets do not overlap; that is, the compressed features corresponding to the original features in one original feature set are distributed across the plurality of replica sets. A replica set may include a plurality of subsets, in which case the compressed features in each subset are the same. When a replica set includes two subsets, they are the first subset and the second subset, and each subset is stored on a different service node. The replica set also includes the index corresponding to each compressed feature.

Service nodes of a replica set: the service nodes corresponding to one replica set. When the replica set includes a plurality of subsets, each subset is stored on its own service node, so one replica set corresponds to a plurality of service nodes with one subset stored on each. When the replica set includes a first subset and a second subset, it has two service nodes, a master service node and a slave service node; the first subset is located on the master service node, and the second subset is located on the slave service node.

Cluster: a set consisting of compressed features of the same feature type. A compressed feature has no feature type of its own; the feature type of the original feature corresponding to each compressed feature is used as the feature type of that compressed feature.

Typical feature: an original feature that characterizes the feature type of the compressed features in a cluster.

Snapshot file: a file obtained by copying the compressed features and indexes in a replica set. Each snapshot file records the acquisition time at which the copy was made, the copied compressed features, the indexes corresponding to the compressed features, the storage positions of the compressed features, and the like.

Metafile: a file that stores the snapshot files of replica sets. A metafile may store snapshot files for one or more replica sets.

Example one

The present embodiment provides a feature retrieval method, as shown in fig. 2, the method includes the following steps:

s201, performing feature extraction on the features to be retrieved to obtain compressed features to be retrieved;

When the retrieval device receives a retrieval operation, it responds to the retrieval operation by generating a feature to be retrieved from the retrieval image corresponding to the retrieval operation. The feature to be retrieved is the feature information of a retrieval object included in the retrieval image, and the retrieval object may be, for example, a human face or a vehicle. The retrieval device generates a retrieval request according to the feature to be retrieved and sends the generated retrieval request to the service node.

After receiving the retrieval request, the service node parses it to obtain the feature to be retrieved carried by the retrieval request, and performs feature extraction on the feature to be retrieved to obtain the compressed feature to be retrieved.

Here, when the retrieval device sends the retrieval request to the service node, the retrieval request may be sent to the interface proxy service, which sends the retrieval request to the service node. When the compressed features corresponding to one feature library are stored in a plurality of copy sets, the retrieval request is sent to the service node in each copy set.

For example, if the compressed features corresponding to one feature library are divided into three parts stored in replica set A, replica set B and replica set C respectively, the retrieval request is sent to the service node of replica set A, the service node of replica set B and the service node of replica set C.

When one replica set includes a master service node and a slave service node, the interface proxy service selects a target service node from the master service node and the slave service node of each replica set and sends the retrieval request to that target service node. For example, continuing the example above, the service nodes of replica set A include a master service node a and a slave service node a', the service nodes of replica set B include a master service node b and a slave service node b', and the service nodes of replica set C include a master service node c and a slave service node c'. If service nodes a, b' and c are selected as the target service nodes of the three replica sets, the retrieval request is sent to service nodes a, b' and c. For one replica set, the target service node may be selected randomly, or according to the resource condition or load condition of each service node. For example, when a target service node is selected from the master service node a and the slave service node a' of replica set A, and the master service node a is currently processing 4 retrieval requests while the slave service node a' is currently processing 6, the load of the master service node a is lower than that of the slave service node a', so the master service node a is selected as the target service node. The embodiment of the invention does not limit the manner of selecting the target service node.
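
The two selection modes mentioned above, random and load-based, can be sketched as follows. The function name `select_target_node`, the use of in-flight request counts as the load measure, and the alphabetical tie-break are assumptions for illustration, since the selection manner is explicitly left open.

```python
import random
from typing import Dict, Optional


def select_target_node(
    in_flight: Dict[str, int], rng: Optional[random.Random] = None
) -> str:
    """Choose one service node of a replica set.

    in_flight maps each node name to the number of retrieval requests it is
    currently processing. If rng is given, selection is random; otherwise the
    least loaded node is chosen (ties broken by node name).
    """
    if not in_flight:
        raise ValueError("a replica set needs at least one service node")
    nodes = sorted(in_flight)
    if rng is not None:
        return rng.choice(nodes)
    return min(nodes, key=lambda node: in_flight[node])
```

With the loads from the example above, `select_target_node({"a": 4, "a'": 6})` selects the master service node `a`.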

S202, searching a target compression feature set matched with the compression features to be retrieved from a copy set;

the target compression feature set at least comprises one target compression feature, and the copy set comprises different compression features; and the service node compares the compression features to be retrieved with the compression features in the stored duplicate set, determines target compression features matched with the compression features to be retrieved from the duplicate set and forms a target compression feature set.

In the embodiment of the invention, the copy set can be stored in a video memory of the service node; correspondingly, the searching for the target compression feature set matched with the compression feature to be retrieved from the copy set includes: and searching a target compression feature set matched with the compression feature to be retrieved from the copy set through a Graphic Processing Unit (GPU). When the GPU searches a target compression feature set matched with the compression features to be retrieved from a copy set stored in the video memory, the service node can simultaneously process a plurality of retrieval requests based on the batch operation supported by the GPU, determines the compression features to be retrieved corresponding to the retrieval requests, and determines a target compression feature set corresponding to each compression feature to be retrieved from the copy set.

In an embodiment, the searching, from the replica set, a target compression feature set matching the compression feature to be retrieved includes: and searching a target compression feature set matched with the compression features to be retrieved from at least two copy sets, wherein the compression features included in each copy set are different.

Here, the interface proxy service sends the retrieval request to the service nodes in each copy set, and each service node searches a target compression feature set matched with the compression feature to be retrieved from the copy set stored by the service node.

For example, continuing the example above: the interface proxy service sends the retrieval request to the service nodes of replica sets A, B and C, and each of these service nodes searches its own stored replica set for a target compression feature set matching the compression feature to be retrieved.

In an embodiment, the replica set includes a first subset stored on a first physical machine and a second subset stored on a second physical machine, and the compression features in the first subset and the second subset are the same; correspondingly, the searching for the target compression feature set matching the compression feature to be retrieved from the copy set includes: searching a target subset of each replica set for the target compression feature set matching the compression feature to be retrieved, wherein the target subset is selected from the first subset and the second subset. The first physical machine is the master service node, and the second physical machine is the slave service node.

For example: the service nodes of replica set A include a master service node a and a slave service node a', the service nodes of replica set B include a master service node b and a slave service node b', and the service nodes of replica set C include a master service node c and a slave service node c'. Service nodes a, b' and c are selected as the target service nodes of the three replica sets, and the retrieval request is sent to them. Target compression feature set A is then looked up in the first subset stored on the master service node a, target compression feature set B is looked up in the second subset stored on the slave service node b', and target compression feature set C is looked up in the first subset stored on the master service node c. The compression feature set formed by target compression feature sets A, B and C includes all target compression features matching the compression feature to be retrieved.

In an embodiment, the replica set comprises at least two clusters; each cluster comprises at least one compression feature, and the compression features in the same cluster belong to the same feature type; correspondingly, the searching for the target compression feature set matching the compression feature to be retrieved from the copy set comprises: determining target clusters from the replica set according to the feature to be retrieved and the typical feature of each cluster, wherein a typical feature represents the feature type to which the compression features in the corresponding cluster belong; and searching the compression features of the target clusters for the target compression feature set matching the compression feature to be retrieved.

Here, one replica set may include a plurality of clusters, and each cluster includes a typical feature that characterizes the feature type of the compression features in that cluster. The feature type is the type of the object characterized by the original feature. For example, when the object is a human face, the feature types can be long face, square face, white skin, black skin and the like; when the object is an automobile, the feature types can be brand, color and the like.

After a retrieval request is received, the feature to be retrieved carried by the retrieval request is compared with the typical feature of each cluster, the typical features whose feature type is similar to the feature to be retrieved are found, and the clusters corresponding to the found typical features are determined as target clusters. The target clusters may comprise a plurality of clusters, and a typical feature similar to the feature to be retrieved may be one whose similarity to the feature to be retrieved is greater than a set similarity threshold. The target compression feature set matching the compression feature to be retrieved is then searched for among the compression features of the target clusters.

For example: the replica set comprises cluster 1, cluster 2, cluster 3 and cluster 4, where cluster 1 has typical feature 1, cluster 2 has typical feature 2, cluster 3 has typical feature 3, and cluster 4 has typical feature 4. Typical features 1 to 4 are each compared with the feature to be retrieved; when typical features 1 and 2 are determined to match the feature to be retrieved, the corresponding clusters 1 and 2 are determined as target clusters, and the target compression feature set is searched for among the compression features of cluster 1 and cluster 2.

For another example: replica set 1 comprises cluster 1 and cluster 2, and replica set 2 comprises cluster 3 and cluster 4, where cluster 1 has typical feature 1, cluster 2 has typical feature 2, cluster 3 has typical feature 3, and cluster 4 has typical feature 4. Typical features 1 to 4 are each compared with the feature to be retrieved; when typical features 1 and 2 are determined to match the feature to be retrieved, the corresponding clusters 1 and 2 are determined as target clusters, and the target compression feature set is searched for among the compression features of cluster 1 and cluster 2.

It should be noted that, when a replica set includes multiple subsets, the clusters in the subsets are identical. For example: replica set A comprises a first subset and a second subset, the clusters in the first subset are cluster 1 and cluster 2, and the clusters in the second subset are also cluster 1 and cluster 2.

Determining the target clusters from the replica set according to the feature to be retrieved and the typical feature of each cluster may include: comparing the feature to be retrieved with the typical feature of each cluster in the replica set to determine the similarity between the feature to be retrieved and each typical feature; and determining the clusters whose similarity is greater than the set similarity threshold as target clusters.

For example, continuing the example above: the set similarity threshold is 80%, and typical features 1 to 4 are each compared with the feature to be retrieved. The similarity between typical feature 1 and the feature to be retrieved is 86%, that of typical feature 2 is 82%, that of typical feature 3 is 32%, and that of typical feature 4 is 50%, so cluster 1 and cluster 2, corresponding to typical features 1 and 2, are determined as the target clusters.

Here, when selecting target clusters from a plurality of clusters, the similarities between the typical features and the feature to be retrieved may instead be sorted, and a set number of clusters with the highest similarities taken as the target clusters. For example, continuing the example above: sorted by similarity to the feature to be retrieved, typical features 1 to 4 are ordered as typical feature 1, typical feature 2, typical feature 4 and typical feature 3; correspondingly, the clusters are ordered as cluster 1, cluster 2, cluster 4 and cluster 3. When two clusters are selected as target clusters, the target clusters are cluster 1 and cluster 2.
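Both ways of choosing target clusters (similarity threshold, or a set number of top-ranked clusters) can be sketched as below. The sketch takes the similarities as given, using the figures from the example; how the similarity between a typical feature and the feature to be retrieved is computed is not specified here, and the function names are hypothetical.

```python
def clusters_above_threshold(similarities, threshold):
    """Return clusters whose typical-feature similarity exceeds the threshold."""
    return [cluster for cluster, sim in similarities.items() if sim > threshold]


def top_k_clusters(similarities, k):
    """Return the k clusters with the highest typical-feature similarity."""
    ranked = sorted(similarities, key=similarities.get, reverse=True)
    return ranked[:k]


# Similarities from the example: typical features 1 to 4 vs. the feature to be retrieved.
similarities = {"cluster 1": 0.86, "cluster 2": 0.82, "cluster 3": 0.32, "cluster 4": 0.50}

print(clusters_above_threshold(similarities, 0.80))
print(top_k_clusters(similarities, 2))
```

With these figures both policies select cluster 1 and cluster 2; they can diverge when fewer or more clusters than the set number exceed the threshold.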

In an embodiment, the searching for the target compression feature set matching the compression feature to be retrieved from the replica set includes: determining a compression distance between the compression feature to be retrieved and each compression feature in the replica set, wherein the compression distance represents the similarity of the two compression features; and taking the compression features whose compression distance is smaller than a set compression distance threshold as the target compression features, which form the target compression feature set.

The compression distance measures the similarity between two compression features; the smaller the distance, the more similar the features. For example, if compression feature 1 is 00110011 and compression feature 2 is 00111101, the two differ in 3 bit positions, so the compression distance is 3. The compression distance may also be expressed as a percentage. The compression distance threshold can be set according to actual requirements.
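The example distance of 3 is consistent with counting the bit positions in which the two compression features differ. A minimal sketch under that assumption follows; treating compression features as integers, and the names `compression_distance` and `match_compressed`, are illustrative choices rather than part of the described method.

```python
def compression_distance(a: int, b: int) -> int:
    """Count the bit positions in which two binary compression features differ."""
    return bin(a ^ b).count("1")


def match_compressed(query: int, replica_set: dict, threshold: int) -> list:
    """Return the keys of compression features closer to the query than the threshold."""
    return [
        key for key, code in replica_set.items()
        if compression_distance(query, code) < threshold
    ]


# Values from the example: 00110011 vs. 00111101 differ in 3 positions.
print(compression_distance(0b00110011, 0b00111101))

# Hypothetical replica set keyed by index.
replica_set = {11: 0b00111101, 12: 0b00110010, 13: 0b11001100}
print(match_compressed(0b00110011, replica_set, threshold=4))
```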

S203, determining candidate features corresponding to each target compression feature from the original feature set to form a candidate feature set;

The original feature set comprises at least one original feature. For each target compression feature in the target compression feature set, the service node acquires the corresponding original feature, namely the candidate feature, from the database. A mapping relation exists between compression features and original features, and the candidate features acquired from the database according to this mapping relation form the candidate feature set.

In an embodiment, the determining a candidate feature corresponding to each target compression feature from the original feature set to form a candidate feature set includes: determining an index corresponding to each target compression feature; the index is used for characterizing the position of a candidate feature corresponding to the target compression feature in the original feature set; and acquiring a candidate feature corresponding to each target compression feature from the original feature set according to the index of each target compression feature to form a candidate feature set.

In this case, the replica set includes a first mapping relation between compression features and indexes. After the service node determines the target compression feature set, it acquires from the database the candidate features corresponding to the index of each target compression feature in the target compression feature set.

For example: the target compression features in the target compression feature set are compression feature 1, compression feature 2 and compression feature 3, whose indexes are 11, 12 and 13 respectively. The service node sends indexes 11, 12 and 13 to the database and acquires from it the original features 1, 2 and 3 corresponding to those indexes. Original features 1, 2 and 3 are the candidate features, and together they form the candidate feature set.
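The index-based lookup in this example can be sketched as two dictionary lookups. The dictionaries below stand in for the first mapping relation held with the replica set and for the database; their contents and the function name are hypothetical placeholders.

```python
def build_candidate_set(target_compressed, index_of, database):
    """Map each target compression feature to its index, then fetch the original feature."""
    indexes = [index_of[feature] for feature in target_compressed]
    return [database[index] for index in indexes]


# First mapping relation: compression feature -> index (values from the example).
index_of = {"compression feature 1": 11, "compression feature 2": 12, "compression feature 3": 13}

# Stand-in for the database: index -> original feature.
database = {11: "original feature 1", 12: "original feature 2", 13: "original feature 3"}

candidates = build_candidate_set(list(index_of), index_of, database)
print(candidates)
```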

Here, the database may be located on the service node, and may also be located on the data service node.

S204, comparing the candidate features in the candidate feature set with the features to be retrieved to obtain target candidate features corresponding to the features to be retrieved.

After the service node acquires the candidate feature set, the candidate features in the candidate feature set are respectively compared with the features to be retrieved, and the target candidate features matched with the features to be retrieved are found out from the candidate feature set.
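A minimal sketch of this final comparison step is given below. The comparison metric is not specified in this description, so cosine similarity and the `threshold` parameter are assumptions made purely for illustration; the function returns `None` when no candidate matches.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def best_candidate(query, candidates, threshold):
    """Return the id of the most similar candidate above the threshold, else None."""
    best_id, best_sim = None, threshold
    for candidate_id, vector in candidates.items():
        sim = cosine(query, vector)
        if sim > best_sim:
            best_id, best_sim = candidate_id, sim
    return best_id


# Hypothetical full-precision candidate features keyed by index.
candidates = {11: [1.0, 0.1], 12: [0.0, 1.0]}
print(best_candidate([1.0, 0.0], candidates, threshold=0.8))
```

Because only the small candidate set is compared at full precision, this step stays cheap even when the original feature set is very large.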

In practical application, after the service node in each copy set determines the target candidate feature, the target candidate feature determined by the service node is sent to the retrieval equipment. Here, the service node may send the target candidate feature determined by the service node to the interface proxy service, so that the interface proxy service collects the search results of each copy set, and sends the collected results to the search device.

It should be noted that, during feature retrieval, there is a case that a target candidate feature matching a feature to be retrieved is not found in the duplicate set.

In an embodiment, the clusters are determined by a clustering algorithm and the replica set, and the typical feature corresponding to each cluster is determined from the original features corresponding to the compression features in that cluster. When the service node receives a classification operation, it responds by determining, from the original feature set, the original features corresponding to the compression features in the replica set, which form a classification original feature set; dividing the classification original feature set into at least two original feature groups through a clustering algorithm and selecting a typical feature for each original feature group; determining the compression features corresponding to each original feature group as one cluster; and determining the typical feature of each original feature group as the typical feature of the corresponding cluster.

The classification operation can be triggered periodically and automatically by the system, or by a user operation. When the retrieval device receives the classification operation, the original features in the feature library stored in the database are classified through a clustering algorithm: original features with the same feature type are placed in one original feature group, and a typical feature capable of representing the feature type of the group is found in each original feature group. During classification, the original features in the original feature set corresponding to each replica set can be classified with the replica set as the unit. The clustering algorithm can be the K-MEANS algorithm, K-MEDOIDS algorithm, CLARANS algorithm, BIRCH algorithm, DBSCAN algorithm, STING algorithm, etc. After the typical features are determined, the compression features corresponding to the original features in each original feature group are determined as one cluster, and the typical feature of each original feature group is determined as the typical feature of the corresponding cluster.

For example: the original feature set comprises original features 101 to 110 and is divided by a clustering algorithm into three original feature groups. Original feature group 1 includes original features 101, 103, 105 and 106, and its typical feature 1 is 103; original feature group 2 includes original features 102 and 109, and its typical feature 2 is 109; original feature group 3 includes original features 104, 107, 108 and 110, and its typical feature 3 is 104. In the service node, cluster 1 then includes the compression features corresponding to original features 101, 103, 105 and 106, with original feature 103 as its typical feature; cluster 2 includes the compression features corresponding to original features 102 and 109, with original feature 109 as its typical feature; and cluster 3 includes the compression features corresponding to original features 104, 107, 108 and 110, with original feature 104 as its typical feature.
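The mapping from original feature groups to clusters in this example can be sketched as follows. The sketch assumes the grouping and the typical features have already been produced by whichever clustering algorithm is used; the `compressed_of` mapping and the `c101`-style labels are hypothetical placeholders for the compression features.

```python
def build_clusters(groups, typical, compressed_of):
    """Turn original feature groups into clusters of compression features."""
    clusters = {}
    for group_id, members in groups.items():
        clusters[group_id] = {
            # The typical feature stays an original feature, as in the example.
            "typical_feature": typical[group_id],
            "compressed": [compressed_of[member] for member in members],
        }
    return clusters


# Grouping and typical features from the example (output of a clustering algorithm).
groups = {1: [101, 103, 105, 106], 2: [102, 109], 3: [104, 107, 108, 110]}
typical = {1: 103, 2: 109, 3: 104}

# Placeholder mapping: original feature id -> its compression feature.
compressed_of = {i: f"c{i}" for i in range(101, 111)}

clusters = build_clusters(groups, typical, compressed_of)
print(clusters[2])
```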

In an embodiment, before performing the feature extraction on the feature to be retrieved, the method further includes: performing feature extraction on the features to be written to obtain compressed features to be written; writing the compression features to be written into a first subset of a target copy set; the target replica set is one of at least two replica sets; writing the features to be written into a first log corresponding to the first subset; and writing the compression features corresponding to the features to be written into a second subset of the target copy set according to the first log, and writing the features to be written into a second log corresponding to the second subset.

When the retrieval device receives a write operation, it responds by generating a write request and sending the write request to the master service node of a replica set. When one feature library corresponds to a plurality of replica sets, one of them is selected as the target replica set, and the write request is sent to the target replica set. The write request may be sent by the retrieval device to the interface proxy service, which selects the target replica set from the plurality of replica sets. The target replica set may be selected according to the number of compression features each replica set contains, for example by taking the replica set containing the fewest compression features as the target replica set.
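Selecting the target replica set by the number of compression features it contains can be sketched in a few lines. The counts below are invented for illustration, and the function name is hypothetical.

```python
def select_target_replica_set(feature_counts):
    """Return the replica set that currently holds the fewest compression features."""
    return min(feature_counts, key=feature_counts.get)


# Hypothetical number of compression features held by each replica set.
feature_counts = {"A": 120, "B": 80, "C": 95}
print(select_target_replica_set(feature_counts))
```

Routing each write to the smallest replica set keeps the replica sets roughly balanced in size.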

In practical applications, the state of the retrieval device may include an initial state and a retrieval state. In an initial state, only write operations are received, thereby generating a replica set. In the retrieval state, the retrieval operation and the write-in operation can be received, the compressed features to be retrieved corresponding to the features to be retrieved by the retrieval operation are searched in the copy set, and the copy set is updated through the received write-in operation. Here, in the retrieval state, the execution order of the retrieval operation and the write operation is not limited at all, and the copy set may be updated based on the received write operation between a plurality of retrieval operations.

When the master service node receives the write request, it performs feature extraction on the to-be-written feature carried by the write request to obtain the to-be-written compression feature. The feature extraction algorithm is the same as the one used to obtain the compression feature to be retrieved from the feature to be retrieved.

The master service node writes the to-be-written compression feature into the first subset, records the write operation in the first log of the master service node, and writes the to-be-written feature into the database. The recorded information includes the to-be-written feature, the write time, the write position, the index and other information.

When the first log of the master service node is updated, the corresponding slave service node acquires the newly recorded to-be-written feature from the first log, performs feature extraction on it to obtain the to-be-written compression feature, and writes the to-be-written compression feature into the second subset according to the first log, which keeps the first subset and the second subset synchronized. The slave service node also records the write operation in the second log; the recorded information includes the to-be-written feature, the write time, the write position, the index and other information, which keeps the first log and the second log synchronized.
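The write path on the master service node and the log-driven synchronization of the slave service node can be sketched as below. Everything here is a simplified illustration: `extract` is a stand-in for the unspecified feature extraction algorithm, the `Node` class is hypothetical, and using the length of the second log as the replay position is an assumption.

```python
from dataclasses import dataclass, field


def extract(feature):
    """Stand-in feature extraction: keep one sign bit per dimension (assumption)."""
    return tuple(1 if x > 0 else 0 for x in feature)


@dataclass
class Node:
    subset: dict = field(default_factory=dict)  # index -> compression feature
    log: list = field(default_factory=list)     # (index, to-be-written feature)


def write_master(master, database, index, feature):
    """Master: write the compression feature, record the write, store the original."""
    master.subset[index] = extract(feature)
    master.log.append((index, feature))
    database[index] = feature


def sync_slave(master, slave):
    """Slave: replay first-log entries that are not yet in the second log."""
    for index, feature in master.log[len(slave.log):]:
        slave.subset[index] = extract(feature)
        slave.log.append((index, feature))


master, slave, database = Node(), Node(), {}
write_master(master, database, 11, [0.3, -1.2, 0.8])
write_master(master, database, 12, [-0.5, 0.4, 0.1])
sync_slave(master, slave)
print(slave.subset == master.subset)
```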

In the embodiment of the invention, the first subset on the master service node and the second subset on the slave service node are kept synchronized, so that one replica set can process more retrieval requests in parallel; and when the compression features of one subset are lost, that subset can be recovered from the other subset, which ensures that the replica set can be recovered.

In the embodiment of the present invention, when the failure of the service node results in the loss of the compression features in the first subset or the second subset, the failure recovery can be implemented by the following two ways, that is, the recovery of the compression features of the first subset or the second subset is implemented:

Mode one: if the compression features in the first subset or the second subset of the replica set are empty, the compression features are recovered from the to-be-written features recorded in the first log or the second log, where the first log records the to-be-written features written before the first subset became empty, and the second log records the to-be-written features written before the second subset became empty; the recovered compression features are then written to the first subset or the second subset.

When a failure of the service node causes the stored compression features to be lost, all previously written to-be-written features are obtained from the log kept on that service node (the first log for the master service node, the second log for the slave service node). Feature extraction is performed on the to-be-written features obtained from the log to obtain the recovered compression features, which are written into the corresponding first subset or second subset according to the log, thereby restoring the compression features that the first subset or the second subset held before the failure.

Mode two: if the compression features in the first subset or the second subset of the replica set are empty, the metafile corresponding to the replica set is looked up, where the metafile records compression features written before the first subset or the second subset became empty; the compression features recorded in the metafile are determined as snapshot compression features and written into the first subset or the second subset; the acquisition time of the last compression feature in the metafile is obtained, supplementary original features are determined from the first log or the second log according to the acquisition time, and the compression features corresponding to the supplementary original features are written into the first subset or the second subset.

When a failure of the service node causes the stored compression features to be lost, the first subset or the second subset is recovered from the compression features and indexes copied in the snapshot held in the metafile corresponding to the service node. Here, a snapshot of the replica set is taken at intervals, so the compression features stored in the metafile are those that were in the replica set before the latest acquisition time, namely the snapshot compression features. Only the snapshot compression features from before the acquisition time can be recovered through the metafile; the to-be-written features written into the database between the acquisition time and the failure time are obtained from the log on the service node. These are called supplementary original features. Feature extraction is performed on the supplementary original features to obtain the corresponding compression features, which are written into the corresponding first subset or second subset according to the log, thereby restoring the compression features that the first subset or the second subset held before the failure.

In practical application, when the compression features in a service node are lost, it can be determined whether a metafile exists. When a metafile exists, the compression features can be recovered directly through mode two; when no metafile exists, the compression features are recovered through mode one.
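The choice between the two recovery modes can be sketched as a single function: replay the whole log when no metafile exists, otherwise load the snapshot and replay only the log entries written after the snapshot's acquisition time. The log and metafile layouts and the `extract` stand-in are assumptions made for illustration.

```python
def extract(feature):
    """Stand-in feature extraction: keep one sign bit per dimension (assumption)."""
    return tuple(1 if x > 0 else 0 for x in feature)


def recover_subset(log, metafile=None):
    """Rebuild a lost subset.

    log: list of (write_time, index, to-be-written feature) entries.
    metafile: None, or {"acquired_at": time, "compressed": {index: compression feature}}.
    """
    if metafile is None:
        # Mode one: re-extract every feature recorded in the log.
        return {index: extract(feature) for _, index, feature in log}
    # Mode two: start from the snapshot, then add the supplementary features.
    subset = dict(metafile["compressed"])
    for write_time, index, feature in log:
        if write_time > metafile["acquired_at"]:
            subset[index] = extract(feature)
    return subset


log = [(1, 11, [0.3, -1.2]), (2, 12, [-0.5, 0.4]), (3, 13, [0.9, 0.7])]
metafile = {"acquired_at": 2, "compressed": {11: (1, 0), 12: (0, 1)}}

print(recover_subset(log) == recover_subset(log, metafile))
```

Mode two avoids re-extracting features already covered by the snapshot, which matters when the log is long.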

In one embodiment, when the compression features of the second subset are lost, the method further comprises: if the compression features in the second subset of the replica set are empty, determining the recording time of the last to-be-written feature recorded in the second log; determining synchronous original features from the first log according to the recording time, wherein the synchronous original features are the to-be-written features written into the first subset after the recording time; and writing the compression features corresponding to the synchronous original features into the second subset.

When the compression features of the second subset are lost, compression features may still have been written into the first subset after the second subset failed. The compression features written into the first subset after the failure time are therefore written into the second subset according to the first log, which brings the first subset and the second subset back into synchronization. The compression features written into the first subset after the failure time are those corresponding to the to-be-written features recorded in the first log after the recording time of the last entry in the second log.
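The catch-up step for a recovered slave can be sketched as a filter over the first log. The entry layout `(write_time, index, feature)` and the function name are hypothetical; the sketch only shows how the recording time of the last entry in the second log bounds what must be replayed.

```python
def synchronous_features(first_log, second_log):
    """Return first-log entries written after the last entry of the second log."""
    last_time = second_log[-1][0] if second_log else float("-inf")
    return [entry for entry in first_log if entry[0] > last_time]


# Hypothetical logs: the slave failed after time 2, the master kept accepting writes.
first_log = [(1, 11, "f11"), (2, 12, "f12"), (3, 13, "f13"), (4, 14, "f14")]
second_log = [(1, 11, "f11"), (2, 12, "f12")]

print(synchronous_features(first_log, second_log))
```

The returned entries are the synchronous original features; their compression features are then written into the second subset.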

It should be noted that, in the embodiment of the present invention, when a service node fails and its stored compression features are empty, failure recovery of the compression features is performed on the service node according to the log or metafile corresponding to the service node. The execution order of failure recovery, feature retrieval and feature writing (feature addition) is not limited. For example: if the service node fails during feature retrieval, the feature retrieval is interrupted, failure recovery is executed, and the feature retrieval continues after failure recovery is finished. For another example: before failure recovery, feature retrieval is performed according to retrieval request A, and after failure recovery, feature retrieval is performed according to retrieval request B. For another example: before failure recovery, compression feature a corresponding to to-be-written feature A is written and the replica set is updated; after failure recovery, compression feature b corresponding to to-be-written feature B is written and the replica set is updated again.

In practical application, the compression features in each copy set can be stored in a fragmentation manner according to a fragmentation policy, one copy set is divided into a plurality of fragments, and each fragment is divided into a plurality of clusters. The embodiment of the present invention does not limit the fragmentation strategy at all.

Example two

In the embodiment of the present invention, the feature retrieval method provided in the embodiment of the present invention is further described using the network structure shown in fig. 3. The network structure shown in fig. 3 includes: an interface proxy service (shared-proxy) 301, a service node 302, a database 303, and an object store 304. The service node 302 includes a work process (worker) and a GPU/CPU, and a memory of the service node stores the compression features used when the worker performs retrieval. Each replica set (replicaset) comprises two service nodes: a master service node and a slave service node. The worker of the master service node is the master process (master), and the worker of the slave service node is the slave process (slave); the first subset is stored in the video memory of the master service node, and the second subset is stored in the video memory of the slave service node. The first subset of the master service node and the second subset of the slave service node form a replica set.

The respective components shown in fig. 3 will be described below.

An interface proxy service 301 for feature library management and raw feature management, the feature library management including: the method comprises the following steps of adding a feature library, deleting the feature library, modifying the feature library, searching the feature library, carrying out a fragmentation strategy and maintaining a mapping relation between the feature library and a fragment, wherein the original feature management comprises the following steps: adding original features, deleting the original features and retrieving the original features; when receiving a retrieval request for retrieving the original features, scheduling and distributing the retrieval request and collecting retrieval results.

The interface proxy service 301 may connect a plurality of search devices (not shown) at the same time, and interact with users through the search devices, so that the users manage the feature library and the original features through the interface proxy service. In practical applications, the interface proxy service 301 may also directly act as a retrieval device.

The service node 302 comprises a worker and a GPU/CPU; the worker and the GPU are bound by a feature retrieval service, and the compression features are stored in a memory. When the service node receives a retrieval request distributed by the interface proxy service 301, the GPU/CPU controls the worker to process the retrieval request, obtains a retrieval result, and sends the retrieval result to the interface proxy service. When the service node receives a write request distributed by the interface proxy service, the worker processes the write request, writes the compression feature corresponding to the to-be-written feature into the video memory, and sends the to-be-written feature to the database 303.

A replicaset is shown by a dotted frame in fig. 3: the master service node and the slave service node inside one dotted frame constitute one replicaset. The compression features stored in the master service node and the slave service node are the same, and the compression features of different replicasets do not overlap. The worker of the master service node and the worker of the slave service node are the master and the slave respectively. The workers of different replicasets can process one retrieval request at the same time; within one replicaset, either the master or the slave processes the retrieval request.

In the interaction between the interface proxy service 301 and the service nodes, the master performs read and write operations, and the slave performs read operations. When the interface proxy service receives a retrieval request, it sends the retrieval request to the master or the slave in each replica set; the retrieval request is used to read, from the replica set, the target candidate feature corresponding to the feature to be retrieved carried by the request. When the interface proxy service receives a write request, it sends the write request to the master of the corresponding replica set; the master writes the compression feature corresponding to the to-be-written feature carried by the write request into the first subset, and writes the to-be-written feature into the database. The master synchronizes the compression features in its first subset with the second subset of the slave through the operation log (the first log).

Database 303, for storing the original feature set, may be a Cassandra or other database.

In practical application, fragmentation management is performed on the compression features in the replica set, that is, the compression features in the replica set are divided into a plurality of fragments, accordingly, the original features corresponding to the compression features in the replica set are also subjected to fragmentation management in the database, and the fragments of the compression features in the replica set correspond to the fragments of the original features in the database. Such as: the compression features in the replica set include compression features 101, 102 to 200, and are divided into 3 slices, where slice 1 includes compression features 101, 102 to 130, slice 2 includes compression features 131, 132 to 180, slice 3 includes compression features 181, 182 to 200, and accordingly, in the database, the original features corresponding to compression features 101, 102 to 130 are one slice, referred to as slice 1 ', the original features corresponding to compression features 131, 132 to 180 are one slice, referred to as slice 2 ', and the original features corresponding to compression features 181, 182 to 200 are one slice, referred to as slice 3 '.

For each shard in the replica set, there may be multiple clusters. For example: for the segment 1 including the compression features 101, 102 to 130, cluster a, cluster B and cluster C are included; for the segment 2 including the compression features 131, 132 through 180, the cluster D and the cluster E are included; for slice 3, which includes compression feature 181, compression feature 182 through compression feature 200, cluster F, cluster G, and cluster H are included.

The compression features in the replica set are divided into a plurality of fragments, fragment management is carried out on the compression features in the replica set, clustering is carried out on the compression features included in each fragment, and the compression features are divided into a plurality of clusters, so that clustering is carried out by taking the fragments as units, the clustering speed is increased, and meanwhile, the clustering precision is improved.
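The fragment example above can be sketched as a simple range lookup. This is only an illustration under stated assumptions: the function name `shard_of` and the use of integer feature numbers as keys are hypothetical, and the boundaries 101, 131, 181 and 200 are taken from the example in the text.

```python
import bisect

# First compression feature number of each fragment, from the example:
# fragment 1 = 101..130, fragment 2 = 131..180, fragment 3 = 181..200.
SHARD_STARTS = [101, 131, 181]
LAST_FEATURE = 200


def shard_of(feature_number):
    """Return the fragment number (1-based) holding the given compression feature.

    The same number identifies the corresponding fragment of original
    features in the database (fragment 1', 2', 3' in the text).
    """
    if not SHARD_STARTS[0] <= feature_number <= LAST_FEATURE:
        raise ValueError("feature number outside the replica set")
    return bisect.bisect_right(SHARD_STARTS, feature_number)


fragments = [shard_of(n) for n in (101, 130, 131, 180, 181, 200)]
```

Because the fragments of the replica set and of the database share the same boundaries, one lookup is enough to locate both the compression feature and its original feature.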

And the object storage 304 is used for storing metafiles, wherein snapshot files of the copy sets are stored in the metafiles, and the snapshot files are used for quick fault recovery of the copy sets.

In practical applications, the interface agent 301, the service node 302, the database 303 and the object store 304 shown in fig. 3 may correspond to different physical machines respectively.

The following describes the feature retrieval method provided by the embodiment of the present invention in detail with reference to the network structure shown in fig. 3. The feature retrieval method provided by the embodiment of the invention can comprise the following three scenes: feature addition, feature retrieval, and failure recovery.

Scene 1, feature addition

As shown in fig. 4, the feature adding method of scenario 1 includes:

S401, the retrieval equipment generates a write-in request and sends the write-in request to an interface proxy service;

when the original features need to be added into the database, the user performs a write operation through the retrieval device, the operation content of the write operation is the features to be written, the retrieval device generates a write request carrying the features to be written based on the write operation, and sends the write request to the interface proxy service 301.

S402, the interface proxy service determines a target copy set and sends the write-in request to the main service node of the target copy set;

after receiving the write request, the interface proxy service 301 selects a target copy set from the multiple copy sets, and sends the write request to the master service node of the target copy set. When selecting the target copy set from the plurality of copy sets, the interface proxy service can select a copy set at random, can select the copy set with the lowest capacity according to the capacity of each copy set, or can determine the target copy set according to the load of each copy set. Here, the capacity of each copy set can be determined according to the number of compression features stored in it, and the copy set with the fewest compression features has the lowest capacity.
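The capacity-based choice described in S402 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `ReplicaSet` class, its field names and the sample counts are hypothetical, and only the rule "fewest stored compression features means lowest capacity" comes from the text.

```python
from dataclasses import dataclass


@dataclass
class ReplicaSet:
    name: str            # hypothetical identifier of the copy set
    feature_count: int   # number of compression features currently stored


def pick_target_replica_set(replica_sets):
    """Return the copy set that stores the fewest compression features."""
    if not replica_sets:
        raise ValueError("no copy set available")
    return min(replica_sets, key=lambda rs: rs.feature_count)


sets = [ReplicaSet("rs-a", 120), ReplicaSet("rs-b", 45), ReplicaSet("rs-c", 300)]
target = pick_target_replica_set(sets)
```

A load-based policy would have the same shape, with a load metric in place of `feature_count`.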

S403, the GPU scheduling master of the main service node performs feature extraction on the to-be-written features carried by the received write request to obtain to-be-written compression features;

S404, a GPU scheduling master of the main service node writes the compression features to be written into a first subset in the video memory;

here, the GPU further generates an index for the compression feature to be written, establishes a first mapping relationship between the compression feature to be written and the index, and stores the compression feature to be written and the generated index together in the first subset.

And establishing a corresponding relation between the features to be written and the corresponding compression features to be written according to the first mapping relation and the second mapping relation.

S405, writing the to-be-written features into a first log by a GPU scheduling master of the main service node;

here, when the master service node writes the to-be-written feature into the first log, information related to the compression feature to be written, such as the write time, the write position and the index, is also recorded.

S406, the master service node synchronizes with the slave service node according to the first log.

The master service node transmits the information related to the feature to be written in the first log to the slave service node as a stream. The GPU of the slave service node schedules the slave to perform feature extraction on the to-be-written features recorded in the first log to obtain the to-be-written compression features, write the to-be-written compression features into a second subset stored in the video memory, write the to-be-written features into a second log, and record in the second log information related to the to-be-written compression features, such as the write time, the write position and the index.

It should be noted that, when the slave service node writes the compression feature to be written into its video memory, the write operation is performed according to the information related to the feature to be written in the first log, so that the compression features in the first subset and the second subset, the position of each compression feature, and the index corresponding to each compression feature are completely consistent.

In the process of adding the features, the worker in the main service node records the write operation of each step into the first log. The master synchronizes the first log to the slave as a stream, and each time the slave receives one write record from the master, it writes one record into its own log, namely the second log.
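The log-based synchronization of S405 and S406 can be sketched as below. This is a toy model under stated assumptions: `compress` is a hypothetical stand-in for feature extraction (it only keeps the sign of each component), the `Node` class and the record layout are invented for illustration, and an in-process loop stands in for the stream between the master and the slave.

```python
def compress(feature):
    # Hypothetical stand-in for feature extraction: one bit per component.
    return tuple(1 if x >= 0 else 0 for x in feature)


class Node:
    def __init__(self):
        self.subset = {}  # index -> compression feature (video memory stand-in)
        self.log = []     # ordered write records (first log or second log)

    def apply(self, record):
        # Recompute the compression feature from the logged original feature,
        # store it under the logged index, then append the record to the log.
        self.subset[record["index"]] = compress(record["feature"])
        self.log.append(record)


master, slave = Node(), Node()


def write(index, feature):
    """Write on the master and record the operation in the first log."""
    record = {"seq": len(master.log) + 1, "index": index, "feature": feature}
    master.apply(record)
    return record


def stream_to_slave():
    """Send every record the slave has not seen yet, in log order."""
    for record in master.log[len(slave.log):]:
        slave.apply(record)


write(7, (0.5, -1.0))
write(8, (-0.2, 0.3))
stream_to_slave()
```

Since the slave replays the same records in the same order, the two subsets end up with identical compression features and indexes.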

In practical application, the Worker applies for each written original feature (feature to be written) to the GPU for video memory resources, and when a new original feature is inserted, the GPU calculates a compression feature to be written, and stores the calculated compression feature to be written in the applied video memory.

Here, the mapping relationship between the feature library and the copy set is stored in the database, so that the interface proxy service is stateless and can be expanded in parallel.

Scene 2, feature search

As shown in fig. 5, the feature retrieval method of scene 2 includes:

S501, the retrieval equipment generates a retrieval request and sends the retrieval request to an interface proxy service;

when the original features need to be retrieved from the database, the user performs retrieval operation through the retrieval device, the operation content of the retrieval operation is the features to be retrieved, the retrieval device generates a retrieval request carrying the features to be retrieved based on the retrieval operation, and sends the retrieval request to the interface proxy service 301.

Here, the retrieval device may receive an image of the object to be retrieved based on the retrieval operation, and obtain the feature to be retrieved of the output of the neural network model by using the image of the object to be retrieved as an input of the neural network model.

S502, the interface proxy service distributes the retrieval request to each copy set;

the interface proxy service distributes (map) the retrieval request to the service nodes in each replica set. When the replica sets comprise the master service node and the slave service nodes, the interface proxy service determines target service nodes from the master service node and the slave service nodes according to the resource states of the master service node and the slave service nodes in each replica set, and sends the retrieval request to the target service nodes in each replica set. That is, a target subset is selected from the first subset and the second subset of each replica set, the target subset being a subset on the target serving node.

S503, after receiving the retrieval request, the service node extracts the features to be retrieved carried by the retrieval request to obtain the compression features to be retrieved;

after receiving the retrieval request, the target service node in each copy set calls the worker through the GPU to analyze the retrieval request to obtain the features to be retrieved, and calls the worker through the GPU to perform feature extraction on the features to be retrieved to obtain the compressed features to be retrieved.

S504, the service node searches a target compression feature set matched with the compression feature to be retrieved from a target subset;

here, when the target subset in a copy set includes a plurality of clusters, the corresponding target service node calls the worker through the GPU to compare the feature to be retrieved with the typical feature of each cluster and determine a target cluster, where the typical feature of the target cluster and the feature to be retrieved belong to the same feature type, for example: both are fair-skinned, or both are square-faced. After the target cluster is determined, the compression features in the target cluster are compared with the compression feature to be retrieved: the compression distance between each compression feature in the target cluster and the compression feature to be retrieved is calculated, and the compression features whose compression distance is smaller than a compression distance threshold are taken as the target compression features to form the target compression feature set.
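The two stages of S504 can be sketched as follows. The text does not specify the distance measures, so this sketch makes assumptions that are labeled in the code: Euclidean distance is used to pick the target cluster, compression features are modeled as small integer bit codes, and the compression distance is taken to be the Hamming distance. All names and sample values are hypothetical.

```python
import math


def hamming(a, b):
    # Assumed compression distance: number of differing bits.
    return bin(a ^ b).count("1")


def find_target_compression_features(query, query_code, clusters, threshold):
    """Return the indexes of the target compression features.

    query      -- feature to be retrieved (tuple of floats)
    query_code -- compression feature to be retrieved (int bit code)
    clusters   -- list of {"typical": tuple, "codes": {index: int}}
    """
    # Stage 1: the target cluster is the one whose typical feature is closest
    # to the feature to be retrieved (Euclidean distance is an assumption).
    target = min(clusters, key=lambda c: math.dist(query, c["typical"]))
    # Stage 2: keep compression features below the compression distance threshold.
    return [idx for idx, code in target["codes"].items()
            if hamming(code, query_code) < threshold]


clusters = [
    {"typical": (0.0, 0.0), "codes": {1: 0b0000, 2: 0b0011}},
    {"typical": (5.0, 5.0), "codes": {3: 0b1111, 4: 0b1110, 5: 0b0001}},
]
hits = find_target_compression_features((4.5, 5.2), 0b1111, clusters, threshold=2)
```

Only the compression features of the target cluster are scanned, which is where the saving over a full scan of the subset comes from.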

S505, the service node determines a candidate feature set from the original feature set in the database;

and the target service node in each copy set calls the worker through the GPU to obtain a candidate feature set corresponding to the target compression feature set from the database. Here, the worker sends the index corresponding to each target compression feature in the target compression feature set to the database, the database reads the candidate features corresponding to the index from the original feature set according to the received index, and sends the read candidate features to the target service node. The original feature set stores original features and corresponding indexes, and the index corresponding to one original feature is the same as the index of the compressed feature corresponding to the original feature in the duplicate set.

And the target service node in each duplicate set forms a candidate feature set according to the received candidate features so as to select a part of candidate features from the original feature set to compare with the features to be retrieved, thereby improving the retrieval efficiency.

It should be noted that the target service node may send the indexes corresponding to the plurality of target compression features to the database at the same time.

S506, the service node compares each candidate feature in the candidate feature set with the feature to be retrieved to obtain a target candidate feature.

And the target service node in each duplicate set compares each candidate feature in the candidate feature set with the feature to be retrieved to obtain the target candidate feature corresponding to each duplicate set.
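Steps S505 and S506 can be sketched as a batch lookup by index followed by a comparison on the original features. This is an illustration only: the dictionary standing in for the database, the function name `rerank` and the use of Euclidean distance for the final comparison are assumptions that do not come from the text.

```python
import math


def rerank(query, target_indexes, original_features):
    """Fetch the candidate features by index and return the closest one.

    original_features -- stand-in for the database: index -> original feature
    """
    # S505: the indexes of all target compression features are sent in one
    # batch, and the matching original features form the candidate set.
    candidates = {i: original_features[i] for i in target_indexes}
    # S506: compare each candidate with the feature to be retrieved.
    best = min(candidates, key=lambda i: math.dist(query, candidates[i]))
    return best, candidates[best]


database = {3: (4.0, 5.0), 4: (4.4, 5.1), 9: (0.0, 0.0)}
best_index, best_feature = rerank((4.5, 5.2), [3, 4], database)
```

Only the original features named by the indexes are read, so the exact comparison touches a small part of the original feature set.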

S507, the service node sends the corresponding target candidate characteristics to the interface agent service.

The target service nodes in each replica set send respective target candidate features to the interface proxy service. And collecting and integrating the target candidate characteristics sent by the target service node in each copy set by the interface agent service, and sending the target candidate characteristics to a retrieval device for receiving retrieval operation.

Here, when the target service node sends the target candidate feature to the interface proxy service, it also sends the feature information related to the target candidate feature. For example: the retrieval object is user A, and the feature to be retrieved is a feature of a face image of user A; the target candidate feature in the retrieval result matches the feature to be retrieved and is also a feature of a face image of user A. When the feature of the face image of user A was stored in the database, the feature information of user A was stored with it, for example: identity card information, ethnicity, native place, criminal record information, etc.

In practical application, one interface proxy service can correspond to a plurality of feature libraries at the same time. When the interface proxy service receives a retrieval request, it checks the mapping relation between the feature library and the copy sets in the database, and distributes (map) the retrieval request to one service node in each Replicaset corresponding to the feature library; each service node performs the feature retrieval and returns its retrieval result to the interface proxy service; the interface proxy service then integrates and collects (reduce) the retrieval results into a final result.
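The map and reduce steps can be sketched as below. This is a toy model under stated assumptions: each Replicaset is represented by a dictionary of scalar values, `local_search` stands in for the whole per-node retrieval, and merging by the k smallest distances is an assumed policy, since the text only says that the results are integrated and collected.

```python
import heapq


def local_search(query, replica_set):
    # Stand-in for the retrieval on one service node:
    # returns (distance, index) pairs for the features it holds.
    return [(abs(value - query), index) for index, value in replica_set.items()]


def map_request(query, replica_sets):
    """Map: send the same retrieval request to one node of every Replicaset."""
    return [local_search(query, rs) for rs in replica_sets]


def reduce_results(partial_results, k):
    """Reduce: merge the partial results and keep the k closest hits."""
    merged = (hit for partial in partial_results for hit in partial)
    return heapq.nsmallest(k, merged)


replica_sets = [{"a1": 10.0, "a2": 42.0}, {"b1": 40.5}, {}]
final = reduce_results(map_request(41.0, replica_sets), k=2)
```

Because the compression features of different Replicasets do not overlap, the partial results can be merged without removing duplicates.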

The scene 2 feature retrieval method has the following technical advantages:

On the one hand, when the target service node performs feature retrieval, the distance between compressed features (the similarity between two compressed features) is evaluated in the GPU, and the indexes of the compressed features at a close distance are found. On the other hand, retrieval through the GPU supports batch operation and makes full use of the GPU's parallel computation. In yet another aspect, searching over compressed features in the GPU substantially reduces cost. For example, on an 8 GB graphics card with 500 MB reserved for the retrieval process, and 40 bytes of feature data per entry (a 32-byte compression feature and an 8-byte index), about 190 million compression features can be loaded at most.
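The capacity figure in the example can be checked with a short calculation. The sketch assumes decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes), which the text does not state; under that assumption the result is consistent with the figure of roughly 190 million (1.9 hundred million) quoted above.

```python
def max_loadable_features(card_bytes, reserved_bytes, bytes_per_entry):
    """Number of entries that fit in the video memory left after the reservation."""
    return (card_bytes - reserved_bytes) // bytes_per_entry


# 8 GB card, 500 MB reserved, 32-byte compression feature + 8-byte index.
capacity = max_loadable_features(8 * 10**9, 500 * 10**6, 32 + 8)
```

With binary units (GiB and MiB) the same formula gives a slightly larger count, so the quoted figure is best read as an order-of-magnitude estimate.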

Scenario 3, failure recovery

Fig. 6 shows a failure recovery method of scenario 3, which includes:

S600, detecting that the video memory fails, and judging the type of the service node;

here, when a service node fails, for example when the system goes down, the compression features stored in the video memory of the service node become abnormal, and all compression features stored in the service node are lost. At this time, the failure time is recorded and the type of the service node is judged: if the service node is a master service node, the compression features in the first subset are empty and S6011 is executed; if the service node is a slave service node, the compression features in the second subset are empty and S6021 is executed;

S6011, judging whether a metafile exists or not;

the object storage 304 is accessed to determine whether a metafile corresponding to the copy set where the current service node is located exists. S6012 is executed when the metafile exists, and S6013 is executed when it does not. The metafile stores a snapshot file of the compression features in the copy set to which the service node belongs.

S6012, recovering the compression characteristics of the first subset according to the metafile;

here, the compression characteristics recorded before the acquisition time of the snapshot file are acquired from the metafile and restored to the first subset. Reading the to-be-written features after the acquisition time from the first log according to the acquisition time of the snapshot file to obtain the supplemented original features, namely the to-be-written features written in the period between the acquisition time and the failure time, performing feature extraction on the supplemented original features to obtain the compression features corresponding to the supplemented original features, restoring the compression features written in the period between the acquisition time and the failure time to the first subset, and completing restoration of the first subset.

For example: the failure time is 10:35, and the first log records all write operations of features to be written before 10:35. The most recent snapshot of the first subset was taken at 10:15, so the snapshot file in the metafile records the compression features of the first subset before 10:15. When the metafile exists, the compression features before 10:15 are restored to the first subset from the snapshot file in the metafile; the recording time 10:15 of the snapshot file is then read, and the features to be written recorded in the first log between 10:15 and 10:35 are played back to obtain the compression features written to the first subset from 10:15 to 10:35, so that the compression features in the first subset are recovered.

S6013, recovering the compression characteristics of the first subset according to the first log;

and performing feature extraction on the features to be written recorded in the first log to obtain compression features corresponding to the features to be written in the first log, namely recovering the compression features, writing the recovered compression features into the first subset, and recovering the compression features in the first subset.
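Steps S6011 to S6013 can be sketched as one recovery routine. This is a toy model under stated assumptions: `compress` is a hypothetical stand-in for feature extraction, times are expressed as minutes since midnight, and the snapshot and log record layouts are invented for illustration.

```python
def compress(feature):
    # Hypothetical stand-in for feature extraction: one bit per component.
    return tuple(1 if x >= 0 else 0 for x in feature)


def recover_first_subset(snapshot, first_log):
    """Rebuild the first subset after a failure.

    snapshot  -- None when no metafile exists, otherwise
                 {"time": acquisition time, "features": {index: compression feature}}
    first_log -- list of {"time", "index", "feature"} write records
    """
    if snapshot is None:
        # S6013: no metafile, so every logged write is played back.
        subset, start = {}, float("-inf")
    else:
        # S6012: load the snapshot, then replay only the later writes.
        subset, start = dict(snapshot["features"]), snapshot["time"]
    for record in first_log:
        if record["time"] > start:
            subset[record["index"]] = compress(record["feature"])
    return subset


first_log = [
    {"time": 605, "index": 1, "feature": (0.5, -1.0)},   # 10:05
    {"time": 620, "index": 2, "feature": (-0.2, 0.3)},   # 10:20
    {"time": 630, "index": 3, "feature": (0.9, 0.1)},    # 10:30
]
snapshot = {"time": 615, "features": {1: compress((0.5, -1.0))}}  # taken at 10:15

with_metafile = recover_first_subset(snapshot, first_log)
without_metafile = recover_first_subset(None, first_log)
```

Both paths yield the same subset; the snapshot only shortens the part of the log that has to be replayed.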

S6021, judging whether a metafile exists;

when the metafile exists, S6022 is performed, and when the metafile does not exist, S6023 is performed.

S6022, recovering the compression characteristics of the second subset according to the metafile;

here, the compression characteristics recorded before the acquisition time of the snapshot file are acquired from the metafile and restored to the second subset. And reading the to-be-written features after the acquisition time from the second log according to the acquisition time of the snapshot file to obtain the supplemented original features, namely the to-be-written features written in the period between the acquisition time and the failure time, extracting the features of the supplemented original features to obtain the compression features corresponding to the supplemented original features, and recovering the compression features written in the period between the acquisition time and the failure time to the second subset. At this point, the compression characteristics of the second subset before the time of failure are restored, and step S6024 is further performed to maintain synchronization between the first subset and the second subset.

S6023, recovering the compression characteristics of the second subset according to the second log;

and performing feature extraction on the features to be written recorded in the second log to obtain compression features corresponding to the features to be written in the second log, namely recovering the compression features, writing the recovered compression features into the second subset, and recovering the compression features in the second subset.

For the slave service node, the compression feature is recovered to be written to before the failure time, and at this time, S6024 is further performed to maintain synchronization between the first subset and the second subset.

And S6024, synchronizing the first subset and the second subset according to the first log.

Between the failure time and the current time, compression features may have been written into the first subset. In this case, the features to be written that were recorded in the first log after the failure time, namely the synchronization original features, are read. Here, the synchronization original features can also be located in the first log according to the recording time of the last feature to be written recorded in the second log. Feature extraction is performed on the synchronization original features to obtain the corresponding compression features, which are written into the second subset, completing the synchronization of the second subset with the first subset.

In scenario 3, for the first subset, the set of the snapshot compression feature when restoring through the metafile and the compression feature corresponding to the supplemented original feature is the same as the restoration compression feature when restoring through the first log. And for the second subset, the snapshot compression feature when the recovery is performed through the metafile, the compression feature corresponding to the supplementary original feature and the compression feature corresponding to the synchronous original feature are the same as the recovery compression feature when the recovery is performed through the first log.

In scenario 3, when the master service node performs failure recovery, the following steps are performed: a. if the metafile does not exist, all write operations are played back from the first log; b. if the metafile exists, all snapshot files are loaded according to the metafile; c. the position in the first log is obtained from the metafile, and the write operations following this position are played back. When the slave service node performs failure recovery, the first three steps are the same as for the master service node; the slave service node then sends its last operation sequence number to the master service node, receives the write operations synchronized by the master service node, and plays them back, where the last operation sequence number is the sequence number of the last write operation recorded in the second log.
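The final catch-up step of the slave service node can be sketched as follows. The record layout and the function name are hypothetical; only the rule "send the last operation sequence number, receive and play back the later write operations" comes from the text.

```python
def catch_up(second_log, first_log):
    """Bring the slave's log up to date with the master's log.

    Returns the last operation sequence number sent to the master and the
    write operations the master sends back for playback.
    """
    # Sequence number of the last write operation recorded in the second log.
    last_seq = second_log[-1]["seq"] if second_log else 0
    # The master answers with every write operation after that number.
    missing = [record for record in first_log if record["seq"] > last_seq]
    second_log.extend(missing)  # playback appends them to the second log
    return last_seq, missing


first_log = [{"seq": n, "index": 100 + n} for n in (1, 2, 3, 4)]
second_log = [{"seq": n, "index": 100 + n} for n in (1, 2)]
last_seq, replayed = catch_up(second_log, first_log)
```

Using a sequence number rather than a timestamp makes the request unambiguous even when several writes share the same recording time.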

It should be noted that, at intervals, the service node performs full export on the copy set with a large change amount of the compression characteristics, stores the copy set as a snapshot file, and generates a new metafile according to the latest snapshot file of all the slices, where the metafile includes the location of the log where the snapshot file is located.

Compared with the prior art, the feature retrieval method provided by the embodiment of the invention has the following technical advantages:

1. The GPU is used as the hardware basis of the feature retrieval service, which enhances the parallelism of feature retrieval computation in a single-machine service.

2. The automatic multi-machine horizontal data division is carried out on a large depth feature library (such as national citizen face information), and the bottlenecks of single-machine calculation and storage performance are broken through.

3. The duplicate sets realize multi-machine redundancy, namely, the duplicate sets are stored in a plurality of service nodes at the same time, the retrieval concurrence is linearly improved, a complete operation playback mechanism is provided, a timing snapshot strategy ensures reliable data and rapid error recovery, and the method has the characteristics of reliable data, disaster tolerance and the like. The disaster recovery refers to that when one of the service nodes has a problem, the recovery processing can be performed through other service nodes storing the same compression characteristics.

EXAMPLE III

In the embodiment of the present invention, a feature retrieval method in the related art and a feature retrieval method provided in the embodiment of the present invention are compared by four retrieval methods, where method 1 and method 2 are the feature retrieval methods in the related art, and method 3 and method 4 are the feature retrieval methods provided in the embodiment of the present invention.

Method 1, original characteristic retrieval

Fig. 7A is a schematic diagram of a feature retrieval method 1 in the related art, and as shown in fig. 7A, an original feature array in a database includes a plurality of original features, and when a feature to be retrieved is retrieved, the feature to be retrieved is matched with each original feature in the original feature array, so as to find out a target candidate feature matched with the feature to be retrieved.

Method 2, compression feature retrieval

Fig. 7B is a schematic diagram of the feature retrieval method 2 in the related art, and as shown in fig. 7B, the compressed feature array in the database includes a plurality of compressed features. When the features to be retrieved are retrieved, the compression features to be retrieved corresponding to the features to be retrieved are matched with the compression features in the compression feature array, target compression features matched with the compression features to be retrieved are found out, and the original features corresponding to the target compression features are determined as target candidate features.

Method 3, compressed feature retrieval and original feature retrieval

Fig. 7C is a schematic diagram of a feature retrieval method 3 according to an embodiment of the present invention. The service node stores a compressed feature array (i.e., a copy set), and the database stores original feature data (i.e., an original feature set). When the features to be retrieved are retrieved, the compression features to be retrieved corresponding to the features to be retrieved are matched with the compression features in the compression feature array, target compression features matched with the compression features to be retrieved are found out, original features corresponding to the target compression features are determined as candidate features, and target candidate features matched with the features to be retrieved are determined in the candidate features.

Method 4, clustering, compressed feature retrieval and original feature retrieval

Fig. 7D is a schematic diagram of the feature retrieval method 4 according to the embodiment of the present invention. Compressed feature arrays (i.e., sets of replicas) are stored in the service node, and each compressed feature array includes a plurality of clusters, each cluster including a corresponding representative feature: representative feature 1, representative feature 2 … … representative feature N (corresponding to original feature 1, original feature 2 … … original feature N in fig. 7D), and original feature data (i.e., an original feature set) is stored in a database. When the features to be retrieved are retrieved, the features to be retrieved are compared with the typical features of each cluster to find out target typical features similar to the retrieval features, the clusters corresponding to the target typical features are determined as target clusters, the compression features to be retrieved corresponding to the features to be retrieved are compared with the compression features in the target clusters to determine target compression features, and target candidate features are searched in the candidate features corresponding to the target compression features. Here, each representative feature is provided with an inverted index, so that the target cluster is determined by the inverted index corresponding to the target representative feature.

Here, taking method 1 as the reference, with complexity O(n) and accuracy 1, the retrieval effects of method 1, method 2, method 3 and method 4 are compared in three dimensions: complexity, accuracy and speed. The comparison results are shown in table 1.

Table 1 comparative example of search effect of different search methods

	Method 1	Method 2	Method 3	Method 4
Complexity	O(n)	O(n)	O(n)	probe/nlist*O(n)
Accuracy	1	4	2	2 to 3
Speed	4	1	3	1 to 3

In the complexity probe/nlist*O(n) of method 4, probe represents the number of target clusters, that is, probe clusters are selected as target clusters, and nlist is the total number of clusters. In method 4, probe target clusters are determined from the nlist clusters, and the target compression features matching the compression feature to be retrieved are searched for among the compression features of the probe target clusters. For a copy set that has been clustered once, nlist is fixed, while probe may differ for each compression feature to be retrieved.
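The effect of probe and nlist on the amount of work can be illustrated with a small estimate. The sketch assumes that the compression features are spread evenly over the clusters, which the text does not state, and the sample numbers are hypothetical.

```python
def expected_comparisons(n, nlist, probe):
    """Estimated number of compression features scanned by method 4.

    n     -- total number of compression features in the copy set
    nlist -- total number of clusters
    probe -- number of target clusters actually searched
    """
    if not 1 <= probe <= nlist:
        raise ValueError("probe must be between 1 and nlist")
    return n * probe // nlist


full_scan = expected_comparisons(1_000_000, nlist=100, probe=100)
clustered = expected_comparisons(1_000_000, nlist=100, probe=5)
```

When probe equals nlist the estimate falls back to the O(n) scan of method 3, and a smaller probe trades accuracy for speed, which matches the ranges given for method 4 in table 1.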

As can be seen from table 1, in terms of accuracy: method 1 > method 3 >= method 4 > method 2; in terms of speed, from slowest to fastest: method 1, method 3, method 4, method 2.

Example four

An embodiment of the present invention provides a feature retrieval apparatus, as shown in fig. 8, the apparatus including: an extraction module 801, a search module 802, a determination module 803 and a comparison module 804; wherein,

the extraction module 801 is used for performing feature extraction on the features to be retrieved to obtain compressed features to be retrieved;

a searching module 802, configured to search a target compression feature set matching the compression feature to be retrieved from a replica set, where the target compression feature set includes at least one target compression feature, and the replica set includes different compression features;

a determining module 803, configured to determine, from the original feature set, a candidate feature corresponding to each target compression feature to form a candidate feature set; the original feature set comprises at least one original feature;

a comparing module 804, configured to compare the candidate feature in the candidate feature set with the feature to be retrieved, so as to obtain a target candidate feature corresponding to the feature to be retrieved.

In one embodiment, as shown in FIG. 9A, the lookup module 802 includes: a first lookup sub-module 8021;

the first searching sub-module 8021 is configured to search, from at least two duplicate sets, a target compression feature set matching the compression feature to be retrieved, where compression features included in each duplicate set are different.

In an embodiment, the set of replicas includes a first subset stored on a first physical machine and a second subset stored on a second physical machine, the compression characteristics in the first subset and the second subset being the same;

accordingly, as shown in fig. 9A, the lookup module 802 further includes: a second lookup submodule 8022;

a second searching submodule 8022, configured to search, from a target subset of the replica set, a target compression feature set that matches the compression feature to be retrieved, where the target subset is selected from the first subset and the second subset.

In an embodiment, the replica set includes at least two clusters; each cluster includes at least one compressed feature, and the compressed features in the same cluster belong to the same feature type; accordingly, as shown in FIG. 9A, the search module 802 further includes a determination sub-module 8023 and a third search sub-module 8024;

the determination sub-module 8023 is configured to determine a target cluster from the replica set according to the feature to be retrieved and the typical feature of each cluster, where the typical feature represents the feature type to which the compressed features in the corresponding cluster belong;

the third search sub-module 8024 is configured to search the compressed features of the target cluster for a target compressed feature set matching the compressed feature to be retrieved.
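The two-step search over clusters can be sketched in Python as follows. Treating the typical feature as a representative vector, choosing the cluster whose typical feature is nearest to the query in squared Euclidean distance, and using Hamming distance inside the cluster are all assumptions for illustration, as are the function names.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Code = Tuple[int, ...]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pick_target_cluster(query: Vector, typical: Dict[str, Vector]) -> str:
    """Determination sub-module 8023: choose the cluster whose typical feature
    is closest to the feature to be retrieved (assumed criterion)."""
    return min(typical, key=lambda name: squared_distance(query, typical[name]))


def search_cluster(query_code: Code,
                   members: List[Tuple[int, Code]],
                   max_hamming: int) -> List[int]:
    """Third search sub-module 8024: scan only the target cluster's compressed
    features and return the indices of those that match."""
    return [idx for idx, code in members
            if sum(x != y for x, y in zip(query_code, code)) <= max_hamming]
```

Only the compressed features of one cluster are scanned, so the cost of the coarse search drops roughly in proportion to the number of clusters, at the risk of missing matches that fall in a neighbouring cluster.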

In one embodiment, as shown in FIG. 9A, the search module 802 further includes a calculation sub-module 8025 and a comparison sub-module 8026, wherein:

the calculation sub-module 8025 is configured to determine a compression distance between the compressed feature to be retrieved and each compressed feature in the replica set, where the compression distance represents the similarity between two compressed features;

the comparison sub-module 8026 is configured to take each compressed feature whose compression distance is smaller than a set compression distance threshold as a target compressed feature, so as to form the target compressed feature set.
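As a minimal sketch of the thresholding step, the Python below assumes that each compressed feature is a bit string packed into an integer and that the compression distance is the Hamming distance (number of differing bits). Neither assumption comes from the embodiment, which leaves the distance measure open.

```python
from typing import Dict, List


def compression_distance(a: int, b: int) -> int:
    """Calculation sub-module 8025 (assumed measure): Hamming distance between
    two bit-packed compressed features. A smaller value means more similar."""
    return bin(a ^ b).count("1")


def target_compressed_features(query_code: int,
                               replica_set: Dict[int, int],
                               threshold: int) -> List[int]:
    """Comparison sub-module 8026: keep the indices of compressed features whose
    distance to the query is strictly smaller than the set threshold."""
    return [idx for idx, code in replica_set.items()
            if compression_distance(query_code, code) < threshold]
```

For example, with the replica set `{0: 0b1100, 1: 0b1101, 2: 0b0011}` and the query `0b1100`, the distances are 0, 1 and 4, so a threshold of 2 keeps indices 0 and 1.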

In one embodiment, as shown in FIG. 9C, the determination module 803 includes an index sub-module 8031 and an acquisition sub-module 8032;

the index sub-module 8031 is configured to determine the index corresponding to each target compressed feature, where the index characterizes the position, in the original feature set, of the candidate feature corresponding to the target compressed feature;

the acquisition sub-module 8032 is configured to obtain, according to the index of each target compressed feature, the candidate feature corresponding to each target compressed feature from the original feature set, so as to form the candidate feature set.
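The index-based lookup can be sketched in Python as below. Storing the index alongside each compressed feature, and representing the original feature set as a list addressed by position, are assumptions for illustration; the type and function names are hypothetical.

```python
from typing import List, NamedTuple, Sequence

Vector = Sequence[float]


class CompressedFeature(NamedTuple):
    code: int    # the compressed representation itself
    index: int   # position of the corresponding original feature


def gather_candidates(targets: List[CompressedFeature],
                      original_set: List[Vector]) -> List[Vector]:
    """Index sub-module 8031 reads each target's index; acquisition sub-module
    8032 uses it to fetch the candidate feature from the original feature set."""
    return [original_set[t.index] for t in targets]
```

Keeping only an index in the replica set means the large original features never have to be copied alongside the compressed ones; they are read only for the few targets that survive the coarse search.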

In one embodiment, as shown in fig. 9B, the apparatus further comprises: a write module 805 configured to:

In one embodiment, as shown in fig. 9B, the apparatus further comprises: a first recovery module 806 configured to:

In one embodiment, as shown in FIG. 9B, the apparatus further includes a second recovery module 807 configured to:

acquire the acquisition time of the last compressed feature in the metafile, determine a supplementary original feature from the first log or the second log according to the acquisition time, and write the compressed feature corresponding to the supplementary original feature into the first subset or the second subset.
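The recovery step can be sketched in Python as a log replay. The sketch assumes that the metafile and the log are lists of time-stamped entries, that the supplementary original features are the log entries acquired after the last compressed feature in the metafile, and that compression keeps one sign bit per dimension; these details and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Code = Tuple[int, ...]
LogEntry = Tuple[float, Sequence[float]]   # (acquisition time, original feature)
MetaEntry = Tuple[float, Code]             # (acquisition time, compressed feature)


def compress(feature: Sequence[float]) -> Code:
    """Assumed compression scheme: one sign bit per dimension."""
    return tuple(1 if x >= 0 else 0 for x in feature)


def replay_missing(metafile: List[MetaEntry], log: List[LogEntry]) -> List[MetaEntry]:
    """Return the subset contents after replaying log entries that are newer
    than the last compressed feature recorded in the metafile."""
    last_time = metafile[-1][0] if metafile else float("-inf")
    # Supplementary original features: assumed to be those acquired after last_time.
    supplement = [(t, f) for t, f in log if t > last_time]
    return metafile + [(t, compress(f)) for t, f in supplement]
```

Under these assumptions the subset is brought up to date without recompressing the features that the metafile already covers.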

In one embodiment, as shown in FIG. 9B, the apparatus further includes a third recovery module 808 configured to:

determine synchronous original features from the first log according to the recording time, where the synchronous original features are the to-be-written features written into the first subset of the target replica set after the recording time;

It should be noted that the above description of the apparatus embodiments is similar to the above description of the method embodiments, and the apparatus embodiments have beneficial effects similar to those of the method embodiments. For technical details not disclosed in the apparatus embodiments of the present invention, refer to the description of the method embodiments of the present invention.

It should be noted that, in the embodiments of the present invention, if the feature retrieval method is implemented in the form of a software functional module and is sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for enabling a computer device (which may be a terminal, a server, etc.) to perform all or part of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disk. Thus, the embodiments of the present invention are not limited to any specific combination of hardware and software.

Accordingly, an embodiment of the present invention further provides a computer program product, where the computer program product includes computer-executable instructions, and after the computer-executable instructions are executed, the steps in the feature retrieval method provided in the embodiment of the present invention can be implemented.

Accordingly, an embodiment of the present invention further provides a storage medium (i.e., a computer storage medium) having stored thereon computer-executable instructions, which when executed by a processor, implement the steps of the feature retrieval method provided by the above-mentioned embodiment.

Accordingly, an embodiment of the present invention provides a computer device. FIG. 10 is a schematic diagram of the composition structure of the computer device according to the embodiment of the present invention. As shown in FIG. 10, the computer device 1000 includes a memory 1005 and a GPU 1001, where the memory 1005 stores computer-executable instructions, and when the GPU 1001 runs the computer-executable instructions in the memory 1005, the steps of the feature retrieval method provided in the above embodiments can be implemented. As shown in FIG. 10, the computer device 1000 further includes at least one communication bus 1002, a user interface 1003 and at least one external communication interface 1004. The communication bus 1002 is configured to enable connection and communication between these components. The user interface 1003 may include a display screen, and the external communication interface 1004 may include a standard wired interface and a wireless interface.

The above description of the embodiments of the computer program product, the computer device and the computer storage medium is similar to the description of the above method embodiments, and these embodiments have beneficial effects similar to those of the method embodiments. For technical details not disclosed in the embodiments of the computer program product, the computer device and the computer storage medium of the present invention, refer to the description of the method embodiments of the present invention.

It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in various embodiments of the present invention, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention. The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.

Those of ordinary skill in the art will understand that all or part of the steps for implementing the method embodiments may be completed by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The aforementioned storage medium includes various media that can store program code, such as a removable memory device, a read-only memory (ROM), a magnetic disk, or an optical disk.

Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention, in essence or in the part contributing over the prior art, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. The aforementioned storage medium includes a removable storage device, a ROM, a magnetic disk, an optical disk, or other various media that can store program code.

The above description covers only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims

1. A method for feature retrieval, the method comprising:

2. The method according to claim 1, wherein the searching for the target compression feature set matching the compression feature to be retrieved from the replica set comprises:

3. The method of claim 1, wherein the replica set comprises a first subset stored on a first physical machine and a second subset stored on a second physical machine, the compressed features in the first subset and the second subset being the same;

correspondingly, the searching for the target compressed feature set matching the compressed feature to be retrieved from the replica set comprises:

searching a target subset of the replica set for a target compressed feature set matching the compressed feature to be retrieved, wherein the target subset is selected from the first subset and the second subset.

4. The method of claim 1 or 2, wherein the replica set comprises at least two clusters; each cluster comprises at least one compressed feature, and the compressed features in the same cluster belong to the same feature type; correspondingly, the searching for the target compressed feature set matching the compressed feature to be retrieved from the replica set comprises:

5. The method of claim 3, wherein prior to feature extraction of features to be retrieved, the method further comprises:

6. The method of claim 3, further comprising:

7. The method of claim 3, further comprising:

8. A feature retrieval apparatus, characterized in that the apparatus comprises: an extraction module, a search module, a determination module and a comparison module; wherein,

9. A computer storage medium having computer-executable instructions stored thereon that, when executed, perform the method steps of any of claims 1 to 7.

10. A computer device comprising a memory and an image processor, the memory having stored thereon computer-executable instructions, and the image processor being operable to perform the method steps of any of claims 1 to 7 when running the computer-executable instructions stored in the memory.