CN113255828B - Feature retrieval method, device, equipment and computer storage medium - Google Patents

Info

Publication number
CN113255828B
CN113255828B CN202110669606.4A
Authority
CN
China
Prior art keywords
feature
vectors
vector
feature vector
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110669606.4A
Other languages
Chinese (zh)
Other versions
CN113255828A (en)
Inventor
何群
吴婷
闾凡兵
牟三钢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha Hisense Intelligent System Research Institute Co ltd
Original Assignee
Changsha Hisense Intelligent System Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changsha Hisense Intelligent System Research Institute Co ltd filed Critical Changsha Hisense Intelligent System Research Institute Co ltd
Priority to CN202110669606.4A priority Critical patent/CN113255828B/en
Publication of CN113255828A publication Critical patent/CN113255828A/en
Application granted granted Critical
Publication of CN113255828B publication Critical patent/CN113255828B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition

Abstract

The application discloses a feature retrieval method, apparatus, device, and computer storage medium. The feature retrieval method comprises the following steps: acquiring a first feature vector and P second feature vectors, wherein the dimension of the first feature vector and the dimension of each second feature vector are both W, and P and W are both integers greater than 1; dividing the first feature vector into L first vectors and dividing each second feature vector into L second vectors, wherein the dimension of each first vector and the dimension of each second vector are smaller than W, the L first vectors correspond to the L second vectors one by one, and L is an integer greater than 1; determining a second similarity between the first feature vector and each second feature vector according to the first similarities between the L first vectors and the L second vectors associated with that second feature vector; and obtaining a feature retrieval result of the first feature vector according to the second similarities. The embodiment of the application can effectively improve the feature retrieval efficiency of the first feature vector.

Description

Feature retrieval method, device, equipment and computer storage medium
Technical Field
The present application belongs to the field of machine learning technologies, and in particular, to a feature retrieval method, apparatus, device, and computer storage medium.
Background
It is known in the field of machine learning technology to extract features from multimedia data, such as images, videos, etc., which may be typically embodied in the form of feature vectors. The extracted feature vectors are subjected to feature retrieval in a preset feature vector library, so that objects (such as people, animals and the like) in the multimedia data can be predicted or identified.
However, when the dimension of the feature vector is high, the above feature retrieval process of the feature vector has a problem of low efficiency.
Disclosure of Invention
The embodiments of the present application provide a feature retrieval method, apparatus, device, and computer storage medium, which can solve the problem of low feature retrieval efficiency of feature vectors in the prior art.
In a first aspect, an embodiment of the present application provides a feature retrieval method, including:
acquiring a first feature vector and P second feature vectors, wherein the dimension of the first feature vector and the dimension of each second feature vector are both W, P and W are both integers greater than 1, and the first feature vector and the second feature vectors are all feature vectors of multimedia resources;
dividing the first feature vector into L first vectors and dividing each second feature vector into L second vectors, wherein the dimension of each first vector and the dimension of each second vector are smaller than W, the L first vectors correspond to the L second vectors one by one, and L is an integer greater than 1;
determining a second similarity between the first feature vector and each second feature vector according to a first similarity between the L first vectors and the L second vectors associated with each second feature vector;
and obtaining a feature retrieval result of the first feature vector according to the second similarity.
In a second aspect, an embodiment of the present application provides a feature retrieval apparatus, including:
the acquisition module is used for acquiring a first feature vector and P second feature vectors, wherein the dimension of the first feature vector and the dimension of each second feature vector are both W, P and W are both integers greater than 1, and the first feature vector and the second feature vectors are all feature vectors of multimedia resources;
the dividing module is used for dividing the first feature vector into L first vectors and dividing each second feature vector into L second vectors, wherein the dimension of each first vector and the dimension of each second vector are smaller than W, the L first vectors correspond to the L second vectors one by one, and L is an integer greater than 1;
the determining module is used for determining a second similarity between the first feature vector and each second feature vector according to a first similarity between the L first vectors and the L second vectors associated with each second feature vector;
and the retrieval module is used for obtaining a feature retrieval result of the first feature vector according to the second similarity.
In a third aspect, an embodiment of the present application provides an electronic device, where the device includes: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements the feature retrieval method described above.
In a fourth aspect, embodiments of the present application provide a computer storage medium having computer program instructions stored thereon, where the computer program instructions, when executed by a processor, implement the above-mentioned feature retrieval method.
The feature retrieval method provided by the embodiment of the application acquires a first feature vector of dimension W and P second feature vectors of dimension W; divides the first feature vector into L first vectors whose dimensions are smaller than W, and divides each second feature vector into L second vectors whose dimensions are smaller than W, the L first vectors corresponding to the L second vectors one by one; determines a second similarity between the first feature vector and each second feature vector according to the first similarities between the L first vectors and the L second vectors associated with that second feature vector; and obtains a feature retrieval result of the first feature vector according to the second similarities. In the embodiment of the application, the high-dimensional first and second feature vectors can thus be divided into a plurality of low-dimensional first and second vectors respectively, and the second similarity between the high-dimensional vectors is determined based on the first similarities between the low-dimensional vectors, from which the feature retrieval result of the first feature vector is obtained. This avoids the inefficiency of directly retrieving high-dimensional feature vectors and effectively improves the feature retrieval efficiency of the first feature vector.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the embodiments are briefly described below; those skilled in the art can derive other drawings from these drawings without creative effort.
Fig. 1 is a schematic flowchart of a feature retrieval method provided in an embodiment of the present application;
FIG. 2 is a flow chart of a feature retrieval method in a specific application example;
FIG. 3 is a block diagram of a pedestrian re-identification model;
FIG. 4 is a schematic view of the distribution of vectors in a two-dimensional plane;
FIG. 5 is a graph of the effect of clustering vectors using a clustering algorithm;
FIG. 6 is a diagram of the effect of matching corresponding clusters to target vectors;
FIG. 7 is a schematic diagram of dividing a high-dimensional vector into low-dimensional vectors to determine similarity;
FIG. 8 is a schematic flow chart of outputting feature search results;
fig. 9 is a schematic structural diagram of a feature retrieval device provided in an embodiment of the present application;
fig. 10 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
Features and exemplary embodiments of various aspects of the present application will be described in detail below, and in order to make objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are intended to be illustrative only and are not intended to be limiting. It will be apparent to one skilled in the art that the present application may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present application by illustrating examples thereof.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a/an ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
In order to solve the prior art problems, embodiments of the present application provide a feature retrieval method, apparatus, device, and computer storage medium. First, a feature retrieval method provided in an embodiment of the present application is described below.
Fig. 1 shows a schematic flowchart of a feature retrieval method according to an embodiment of the present application. As shown in fig. 1, the feature retrieval method includes:
Step 101: acquiring a first feature vector and P second feature vectors, wherein the dimension of the first feature vector and the dimension of each second feature vector are both W, P and W are both integers greater than 1, and the first feature vector and the second feature vectors are all feature vectors of multimedia resources;
Step 102: dividing the first feature vector into L first vectors and dividing each second feature vector into L second vectors, wherein the dimension of each first vector and the dimension of each second vector are smaller than W, the L first vectors correspond to the L second vectors one by one, and L is an integer greater than 1;
Step 103: determining a second similarity between the first feature vector and each second feature vector according to a first similarity between the L first vectors and the L second vectors associated with each second feature vector;
Step 104: obtaining a feature retrieval result of the first feature vector according to the second similarity.
In the embodiment of the present application, the first feature vector may be considered as a feature vector of a multimedia resource that needs to perform feature retrieval.
For example, the multimedia resource may be an image, and the first feature vector may be the feature vector of an image to be retrieved. The corresponding feature vector can be extracted from the image to be retrieved by a feature extraction model.
Of course, in practical applications, the first feature vector may also be a feature vector extracted from a multimedia resource of a video type, a text type, or the like. The first feature vector may be referred to as a feature vector of the multimedia resource. Similarly, the second feature vector may also be a feature vector of the multimedia resource.
For example, the first feature vector may be a feature vector extracted from a multimedia resource to be retrieved, and each of the second feature vectors may correspond to a candidate multimedia resource. The second feature vector may be considered as a feature vector obtained by feature extraction of the corresponding multimedia resource, according to the correspondence between the second feature vector and the candidate multimedia resource.
To a certain extent, the above feature retrieval amounts to retrieving, from a plurality of candidate multimedia resources, the candidates with the highest similarity to the multimedia resource to be retrieved.
Taking a pedestrian image as a further example, extracting features from the pedestrian image yields corresponding attributes, such as whether a hat is worn, whether a scarf is worn, the type of jacket, the color of the jacket, whether a cart is pushed, and the like. These features can be embodied uniformly in a feature vector.
The second feature vector may then be a preset feature vector or may be considered a feature vector of the sample. In other words, the second feature vector may be a feature vector obtained by performing feature extraction on a sample such as an image in advance.
In general, the second feature vectors may be present in a predetermined feature vector library, and each second feature vector may correspond to an object, such as a person, an animal or an article.
For simplifying the description, the following description will mainly use the second feature vector as an example of the feature vector obtained by extracting the features of the pedestrian image.
The first feature vector and the second feature vector are multidimensional vectors, that is, the feature vectors have a plurality of dimensions. The dimension of the first feature vector and the dimension of the second feature vector can both be W, in other words, the dimensions of the two types of feature vectors can be equal. The specific value of W may be preset according to actual needs, and is not specifically limited herein.
In general, matching the first feature vector with each second feature vector one by one can realize feature retrieval of the first feature vector. However, when the value of W is too large, matching efficiency may be low.
In step 102, the first feature vector may be divided into L first vectors, and each second feature vector may also be divided into L second vectors. The L first vectors and the L second vectors may be in one-to-one correspondence, and a dimension of each first vector and a dimension of each second vector are both smaller than W.
In other words, the first feature vector and the second feature vector of high dimension may be divided into a plurality of low dimension vectors, respectively. The correspondence relationship between the first vector and the second vector may not only be that the number L is equal, but also be that the corresponding first vector and the second vector have equal dimensions.
For example, the dimensions of the first feature vector and the second feature vector are 2048 dimensions. The first feature vector may be divided into 4 512-dimensional first vectors and the second feature vector may also be divided into 4 512-dimensional second vectors.
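This split can be sketched in a few lines of NumPy (the function name and the use of `np.split` are illustrative assumptions, not part of the patent):

```python
import numpy as np

def split_vector(vec, num_subvectors):
    """Split a W-dimensional feature vector into L equal-length sub-vectors.

    Assumes W is divisible by L, matching the 2048 -> 4 x 512 example above.
    """
    return np.split(np.asarray(vec), num_subvectors)

# Example: a 2048-dimensional first feature vector divided into 4 first vectors.
first_feature_vector = np.random.rand(2048)
first_vectors = split_vector(first_feature_vector, 4)
```

Each second feature vector would be split the same way, so that the i-th first vector and the i-th second vector cover the same dimension range.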
In the case of performing the division processing on the first feature vector and the second feature vector, in step 103, a second similarity between the first feature vector and the second feature vector may be determined based on the first vector and the second vector.
For convenience of description, the following description mainly takes L second vectors obtained by dividing one of the second feature vectors as an example to describe a determination process of the second similarity between the first feature vector and the second feature vector.
As indicated above, there is a one-to-one correspondence between the first vectors and the second vectors, and for each first vector, a similarity with its corresponding second vector can be determined, i.e., the first similarity described above.
The method for determining the similarity between vectors is not particularly limited here; for example, the Euclidean distance may be used to determine the similarity between vectors.
The first feature vector and each second feature vector are divided into L low-dimensional vectors; accordingly, L second vectors are associated with one second feature vector, and L first similarities may exist between one second feature vector and the first feature vector. From the L first similarities, the second similarity between the first feature vector and the second feature vector may be determined.
For example, the L first similarities may be added to obtain a second similarity; alternatively, the second similarity may be obtained by averaging; or, the weight of the corresponding first similarity may be determined according to the dimension of each first vector, and a weighted average calculation may be performed to obtain the second similarity, and so on. The method for determining the second similarity according to the first similarity may not be specifically limited, and may be selected according to actual needs.
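A minimal sketch of these combination options, assuming a Euclidean-distance-based first similarity (the mapping from distance to similarity and the function names are illustrative assumptions, not prescribed by the patent):

```python
import numpy as np

def first_similarity(u, v):
    """First similarity between a first vector and its corresponding second
    vector, derived here from the Euclidean distance (one option mentioned
    above; the exact distance-to-similarity mapping is an assumption)."""
    return 1.0 / (1.0 + np.linalg.norm(np.asarray(u) - np.asarray(v)))

def second_similarity(first_vectors, second_vectors, mode="sum", weights=None):
    """Combine the L first similarities into one second similarity:
    by summing, averaging, or dimension-weighted averaging."""
    sims = np.array([first_similarity(a, b)
                     for a, b in zip(first_vectors, second_vectors)])
    if mode == "sum":
        return float(sims.sum())
    if mode == "mean":
        return float(sims.mean())
    if mode == "weighted":  # weights, e.g. proportional to each sub-vector's dimension
        return float(np.average(sims, weights=np.asarray(weights, dtype=float)))
    raise ValueError(f"unknown mode: {mode}")
```

For identical sub-vectors each first similarity is 1.0, so the "sum" mode yields L and the "mean" mode yields 1.0.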
It is worth noting that the second similarity may be a numerical value measuring the actual degree of similarity, with a value range of 0 to 1; or it may be a score rating the degree of similarity, whose value range can be more flexible.
In step 104, a feature retrieval result of the first feature vector may be obtained according to the second similarity.
It is easy to understand that, when feature retrieval is performed on the first feature vector, the feature vectors with the highest similarity are usually retrieved from a plurality of preset feature vectors.
Specifically, in this embodiment, the P second feature vectors may correspond to the plurality of preset feature vectors. The obtained feature retrieval result may be the second feature vectors having a higher second similarity to the first feature vector.
For example, the feature retrieval result may include one or more second feature vectors with corresponding second similarity ranks at the top; or one or more second feature vectors comprising corresponding second similarities above a similarity threshold.
Of course, in practical applications, each second feature vector may correspond to a preset object, for example, a pedestrian, an article, or the like. Accordingly, the feature retrieval result may indicate which preset object the first feature vector corresponds to.
From another perspective, if the first feature vector is obtained by feature extraction from an image including a target object, the feature retrieval result may indicate which preset object the target object specifically is.
The feature retrieval method provided by the embodiment of the application acquires a first feature vector of dimension W and P second feature vectors of dimension W; divides the first feature vector into L first vectors whose dimensions are smaller than W, and divides each second feature vector into L second vectors whose dimensions are smaller than W, the L first vectors corresponding to the L second vectors one by one; determines a second similarity between the first feature vector and each second feature vector according to the first similarities between the L first vectors and the L second vectors associated with that second feature vector; and obtains a feature retrieval result of the first feature vector according to the second similarities. In the embodiment of the application, the high-dimensional first and second feature vectors can thus be divided into a plurality of low-dimensional first and second vectors respectively, and the second similarity between the high-dimensional vectors is determined based on the first similarities between the low-dimensional vectors, from which the feature retrieval result of the first feature vector is obtained. This avoids the inefficiency of directly retrieving high-dimensional feature vectors and effectively improves the feature retrieval efficiency of the first feature vector.
Optionally, the obtaining the first feature vector and the P second feature vectors includes:
acquiring a first feature vector and a preset feature vector library, wherein the preset feature vector library comprises Q third feature vectors, and Q is an integer greater than or equal to P;
clustering the Q third feature vectors to obtain M feature vector clusters, wherein each feature vector cluster has a corresponding first central vector and comprises at least one third feature vector;
respectively determining a target distance between a second central vector of the first feature vectors and each first central vector;
and determining a third feature vector included in the feature vector cluster corresponding to the third central vector as a second feature vector, wherein the third central vector is a first central vector of which the corresponding target distance meets a preset distance condition.
As indicated above, the second feature vector may be present in a pre-set library of feature vectors. In other words, when performing the feature search for the first feature vector, the feature search may be performed in a preset feature vector library, where the preset feature vector library may include the P second feature vectors, or may include more preset feature vectors.
For convenience of description, the preset feature vector library may be considered to include Q third feature vectors, and the P second feature vectors may be all or part of the Q third feature vectors.
When the value of Q is too large, determining the second similarity between each third feature vector and the first feature vector one by one would likewise lead to low feature retrieval efficiency.
Therefore, in this embodiment, Q third feature vectors may be clustered first to obtain M feature vector clusters, each feature vector cluster has a corresponding first central vector, and each feature vector cluster is associated with at least one third feature vector.
As for the process of clustering the Q third feature vectors, the process may be implemented based on a clustering algorithm, for example, a K-MEANS clustering algorithm, a mean shift clustering algorithm, or a DBSCAN clustering algorithm, and the like, which is not specifically limited herein.
For each cluster of feature vectors, at least one third feature vector may be included. Meanwhile, each feature vector cluster may have a corresponding first center vector.
It is easy to understand that, in some application scenarios, the center vector may be preset to serve as a cluster center, and the feature vector whose distance from the center vector satisfies a corresponding distance condition (e.g., is smaller than a preset distance value, or is closest to the center vector, etc.) may be included in the feature vector cluster corresponding to the center vector. Or, in other application scenarios, the center vector may also be automatically calculated in the clustering process under the limitation of relevant constraints (e.g., the number of feature vector clusters, the loss caused by clustering between feature vectors and center vectors, etc.).
Each feature vector cluster has a corresponding first center vector, and the first feature vector may also specifically correspond to a second center vector.
In this embodiment, a target distance between the second center vector and each first center vector may be determined, and the target distance may be an euclidean distance, and the like, which may not be specifically limited herein.
According to the target distances, a third central vector can be determined from the M first central vectors corresponding to the M feature vector clusters. There may be one or more third central vectors, and the target distance between the third central vector and the second central vector satisfies a preset distance condition.
In practical application, the preset distance condition may be that the target distance is smaller than a distance threshold, or may also be that the target distances corresponding to the first center vectors are sorted from small to large, and a preset number of target distances ranked in the front are taken, and the like, and may be set according to actual needs.
In this embodiment, the third feature vector associated with the feature vector cluster corresponding to the third central vector is determined as a second feature vector.
That is, when the first feature vector is subsequently retrieved, the second feature vector screened from the preset feature vector library may be retrieved. On one hand, the second feature vector with higher similarity to the first feature vector can be screened out based on the target distance of the central vector, so that the accuracy of feature retrieval is ensured, and on the other hand, the number of preset feature vectors used in the candidate retrieval process can be reduced, and the feature retrieval efficiency is improved.
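The clustering-and-filtering idea above can be sketched as follows, assuming a minimal K-MEANS implementation (one of the algorithms mentioned below) and using the query vector itself in place of its second central vector; both are simplifying assumptions, since the patent leaves the clustering algorithm and the exact distance condition open:

```python
import numpy as np

def kmeans(data, m, iters=20, seed=0):
    """Minimal K-MEANS clustering of Q third feature vectors into M clusters.

    Returns (centroids, labels): the first central vectors and each vector's
    cluster assignment. A sketch, not a production implementation.
    """
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), m, replace=False)]
    for _ in range(iters):
        # Euclidean distance from every vector to every centroid.
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(m):
            if np.any(labels == k):
                centroids[k] = data[labels == k].mean(axis=0)
    return centroids, labels

def candidate_vectors(query, data, centroids, labels, n_probe=1):
    """Keep only the third feature vectors from the n_probe clusters whose
    first central vector is closest to the query (here the 'preset distance
    condition' is taken as: the n_probe smallest target distances)."""
    target_dists = np.linalg.norm(centroids - query, axis=1)
    nearest = np.argsort(target_dists)[:n_probe]
    return data[np.isin(labels, nearest)]
```

With two well-separated groups of vectors, a query near one group yields only that group's members as second feature vectors, so the subsequent similarity computation touches far fewer candidates.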
In combination with the above embodiment, when the first feature vector is retrieved using the preset feature vector library, the process can be divided into two main steps: clustering and retrieval. In the retrieval step, the high-dimensional vectors are divided into low-dimensional vectors, the similarities of the low-dimensional vectors are determined, and then the similarity of the high-dimensional vectors is determined; to a certain extent, the retrieval step can therefore be regarded as a cascade retrieval.
In one embodiment, after the clustering is completed, the third feature vector in each feature vector cluster may be stored, so that the repeated clustering process is omitted in the subsequent process of determining the target distance between the first center vector and the second center vector.
In order to reduce the storage space, in this embodiment, after the Q third feature vectors are clustered to obtain M feature vector clusters, the feature retrieval method may further include:
respectively carrying out scalar quantization processing on at least one third feature vector included in each feature vector cluster;
and storing the third feature vector after the scalar quantization processing.
For example, scalar quantization may be performed on each third feature vector in the feature vector cluster. Scalar quantization converts each dimension of the original vector from a 4-byte floating-point number to a 1-byte unsigned integer, which can significantly reduce the required storage space.
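A minimal sketch of such scalar quantization, assuming a per-vector min-max mapping from float32 to uint8 (the patent specifies only the 4-byte-to-1-byte conversion; the exact mapping and function names here are assumptions):

```python
import numpy as np

def scalar_quantize(vec):
    """Map each float32 dimension to a 1-byte unsigned integer by scaling
    the vector's [min, max] range onto [0, 255]."""
    v = np.asarray(vec, dtype=np.float32)
    lo, hi = v.min(), v.max()
    scale = (hi - lo) or 1.0  # guard against a constant vector
    codes = np.round((v - lo) / scale * 255).astype(np.uint8)
    return codes, float(lo), float(scale)

def scalar_dequantize(codes, lo, scale):
    """Approximate reconstruction, e.g. for later distance computation."""
    return codes.astype(np.float32) / 255 * scale + lo
```

A 2048-dimensional float32 vector occupies 8192 bytes; its quantized codes occupy 2048 bytes, a 4x reduction, at the cost of a small reconstruction error bounded by half a quantization step.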
In one embodiment, the step 103 of determining the second similarity between the first feature vector and each second feature vector according to the first similarity between the L first vectors and the L second vectors associated with each second feature vector includes:
respectively determining a first similarity between each second vector and the corresponding first vector in L second vectors associated with a fourth feature vector, wherein the fourth feature vector is any one of the second feature vectors;
and determining the sum of the L first similarities as a second similarity between the first feature vector and the fourth feature vector.
In combination with the above example of dividing the 2048-dimensional first feature vector into 4 512-dimensional first vectors, in the process of determining the second similarity between the first feature vector and a certain second feature vector, the first similarities between the 512-dimensional first vectors and the corresponding second vectors may be determined respectively. Assuming that the 4 first similarities are 50%, 40%, 60% and 50%, respectively, if directly added, the second similarity can be obtained as 200%.
That is, the first similarity is considered as a numerical value for measuring the degree of similarity, and the second similarity may be considered as a score. In general, the greater the value of the second similarity, the higher the similarity between the first feature vector and the second feature vector can be considered.
In this embodiment, for each second feature vector, the similarity between the L second vectors associated with the second feature vector and the corresponding L first vectors may be determined, so as to obtain L first similarities. The sum of the L first similarities is determined as the second similarity between the first feature vector and the second feature vector; the calculation process is relatively simple, and calculation resources can be effectively saved.
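The sum-of-segment-similarities computation above can be sketched as follows (a minimal illustration; the names are hypothetical, and `cosine_similarity` merely stands in for whatever first-similarity measure is used):

```python
import numpy as np

def cosine_similarity(a, b):
    """One possible first-similarity measure between two low-dimensional vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def second_similarity(first_feature_vec, second_feature_vec, parts=4,
                      segment_similarity=cosine_similarity):
    """Split both W-dimensional vectors into L (= parts) sub-vectors, compute the
    L first similarities pairwise, and return their sum as the second similarity."""
    first_segments = np.split(np.asarray(first_feature_vec, dtype=np.float64), parts)
    second_segments = np.split(np.asarray(second_feature_vec, dtype=np.float64), parts)
    return sum(segment_similarity(a, b)
               for a, b in zip(first_segments, second_segments))
```

With `parts=4` this mirrors the 2048-into-4×512 example: four first similarities are computed and directly added, so an identical pair of vectors scores the maximum possible sum.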
In one embodiment, the step 104 of obtaining a feature search result of the first feature vector according to the second similarity includes:
sorting the P second similarities from large to small;
and outputting K second feature vectors corresponding to the K top-ranked second similarities, where the feature retrieval result includes the K second feature vectors, and K is a positive integer less than or equal to P.
It is easily understood that, for each second feature vector, a second similarity with the first feature vector can be correspondingly obtained. Accordingly, P second eigenvectors may correspondingly obtain P second similarities.
In this embodiment, the P second similarities may be sorted from large to small, and the K top-ranked second similarities may be taken. Each second similarity corresponds to a second feature vector, so K second feature vectors can be output correspondingly. The K second feature vectors may be considered as the second feature vectors, among the P second feature vectors, with the highest similarity to the first feature vector.
The feature retrieval result may include K second feature vectors.
Of course, as shown above, each second feature vector may correspond to one preset object, and the feature search result may also include K preset objects corresponding to the K second feature vectors.
In addition, the feature retrieval result may further include predicted probability values of the above-mentioned K second feature vectors (or K preset objects), where the predicted probability values may correspond to the second similarities and may be used to reflect probability values that the target object is the retrieved preset object, and the target object may be matched with the first feature vector (for example, the first feature vector may be obtained by performing feature extraction on an image including the target object).
In this embodiment, the feature retrieval result is obtained according to the ranking of the second similarity, and the accuracy of the feature retrieval result can be effectively ensured.
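A minimal sketch of this ranking step (the names are illustrative, not from the patent):

```python
def top_k_results(second_similarities, second_feature_ids, k):
    """Sort the P second similarities from large to small and return the
    identifiers of the K top-ranked second feature vectors (K <= P)."""
    order = sorted(range(len(second_similarities)),
                   key=lambda i: second_similarities[i], reverse=True)
    return [second_feature_ids[i] for i in order[:k]]
```

Each returned identifier could equally be mapped to its preset object or paired with a predicted probability value, as described above.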
Optionally, the obtaining the first feature vector may include:
acquiring a target image;
inputting a target image into a target feature extraction model obtained through training, and outputting a first feature vector;
the target feature extraction model comprises a target backbone network, a target aggregation network and a target head network, wherein the input end of the target backbone network is used for receiving the target image, the input end of the target aggregation network is used for receiving the output of the target backbone network, the input end of the target head network is used for receiving the output of the target aggregation network, and the output end of the target head network is used for outputting the first feature vector.
In this embodiment, the first feature vector may be extracted for the image. The target image may be considered to be an image that needs to be identified or predicted.
For example, a pedestrian may be included in the target image, and the first feature vector extracted from the target image may indicate some attributes of the pedestrian, such as whether the pedestrian wears a hat, the length of hair, and the like. Of course, these are only a few examples of what the first feature vector may correspond to. In practical application, the target image is input into the target feature extraction model to obtain the first feature vector.
The target feature extraction model may be a trained network model. In this embodiment, the target feature extraction network may include a target backbone network, a target aggregation network, and a target head network.
The target backbone network can receive a target image, and performs feature extraction on the target image to obtain a preliminary feature, which is recorded as a first feature. The target aggregation network may perform aggregation processing on the first feature to obtain a feature after the aggregation processing, and the feature is recorded as a second feature. The target head network may predict the second feature to obtain the first feature vector.
In other words, the target head network may be regarded as a kind of prediction network for predicting the first feature vector. Of course, the target head network may also be configured to perform normalization processing or the like on the output of the target aggregation network to obtain the first feature vector.
In this embodiment, the first feature vector is extracted from the target image based on a target feature extraction model including a target backbone network, a target aggregation network, and a target head network. In other words, in the process of extracting the first feature vector, a plurality of network frameworks are used, so that the first feature vector can better indicate the features of the target image, and the quality of the first feature vector is ensured.
Optionally, before inputting the target image into the trained target feature extraction model and outputting the first feature vector, the feature retrieval method may further include:
establishing an initial characteristic extraction model;
training an initial feature extraction model by using a sample image set carrying an annotation, and obtaining a target feature extraction model under the condition that the loss value of a loss function of the initial feature extraction model is smaller than a preset loss value;
the loss function comprises at least one of a cross entropy loss function, a triple loss function and a circle loss function, and the loss function calculates a loss value based on the output of the annotation and initial feature extraction model.
In this embodiment, the initial feature extraction model may be matched with the target feature extraction model, and the network frameworks of the two may be similar, but the network parameters may be different.
For example, the initial feature extraction model may include an initial backbone network, an initial aggregation network, and an initial head network, and the target backbone network, the target aggregation network, and the target head network may be obtained by training the initial feature extraction model and adjusting network parameters of each network.
In general, the initial feature extraction model can be trained over a sample image set carrying annotations. And inputting the sample image into the initial feature extraction model to obtain corresponding output.
The initial feature extraction model may have a corresponding loss function therein, which may calculate a loss value based on the output of the annotation and the initial feature extraction model.
In this embodiment, the loss function may include at least one of a cross-entropy loss function, a triplet loss function, and a round loss function. The manner in which the loss values are calculated for some of the loss functions is illustrated below.
The sample image set may include two sample images of the same pedestrian, and the labels carried by the two sample images are usually identical. The two sample images are input into the initial feature extraction model, and two feature vectors can be correspondingly output. Since the two feature vectors correspond to the same pedestrian (correspond to the same label), the larger the distance between the two feature vectors (or the smaller the similarity), the larger the loss value of the cross entropy loss function.
The sample image set may also include two sample images of different pedestrians, and the labels carried by the two sample images are usually inconsistent. The two sample images are input into the initial feature extraction model, and two feature vectors can be correspondingly output. The two eigenvectors correspond to different pedestrians (correspond to different labels), and the larger the distance between the two eigenvectors (or the smaller the similarity), the smaller the loss value of the triplet loss function.
In general, the loss value of the loss function can be used to measure whether the initial feature extraction model is sufficiently trained. When the loss value of the loss function is smaller than the preset loss value, it can be considered that the training of the initial feature extraction model is completed, so as to obtain the target feature extraction model.
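As one concrete illustration of the triplet loss behavior described above (a minimal numpy sketch; the margin value is an assumed hyperparameter, not specified in the text):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge-style triplet loss: the loss shrinks as the same-ID (anchor/positive)
    distance decreases and as the different-ID (anchor/negative) distance grows,
    matching the behavior described for the triplet loss function."""
    d_ap = np.linalg.norm(np.asarray(anchor) - np.asarray(positive))
    d_an = np.linalg.norm(np.asarray(anchor) - np.asarray(negative))
    return max(0.0, float(d_ap - d_an + margin))
```

A well-separated negative drives the loss to zero, while a negative that sits close to the anchor keeps the loss positive, which is exactly the signal used to adjust the network parameters during training.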
The feature retrieval method provided in the embodiments of the present application is described below with reference to a specific application example.
As shown in fig. 2, in this specific application example, the feature retrieval may be performed on a pedestrian image, and the feature retrieval method may specifically include the following steps:
step 201, acquiring original video data;
step 202, carrying out pedestrian detection on the original video data to obtain an original pedestrian image;
step 203, preprocessing an original pedestrian image to obtain a sample image for training;
step 204, training a pedestrian re-identification model by using a sample image;
the pedestrian re-recognition model obtained through training can extract a first feature vector from a target image;
step 205, performing feature retrieval based on clustering retrieval and cascade retrieval fusion;
step 206, displaying the search result.
In step 202, pedestrian detection may be performed on the original video, and image frames with high pedestrian image quality are extracted (i.e., optimal frames are extracted).
Because the data set of the processed original pedestrian images is unordered, the original pedestrian images can be preprocessed before being input into the pedestrian re-recognition model for training.
Specifically, in the data set of the original pedestrian image, there may exist images of two different pedestrians in some original pedestrian images, and in order to reduce manual intervention, a DBSCAN clustering algorithm may be added to process the data set to obtain the detected and tracked image data. In the image data after detection tracking, only one pedestrian is generally present in each image.
Labeling (which may be manual labeling or automatic labeling) is performed on the image data after detection and tracking, and a pedestrian ID and a camera ID corresponding to each image in the image data are distinguished. And finally, obtaining images of different pedestrians under different camera lenses, aligning and combining the images to obtain the ID pair (pedestrian ID and camera ID) of the same person under different camera lenses. According to the ID pair, images of the same person under all camera lenses can be merged into the same folder, and image data of the pedestrian ID is relabeled and named to be in a sample format which can be used for training a pedestrian re-identification model.
In order to enrich the training samples of the pedestrian re-identification model and improve the robustness of the pedestrian re-identification model obtained through training, data enhancement processing such as random cropping, mosaic processing, color adjustment, rotation, flipping, and addition of weather special effects may be performed on the image data of each pedestrian ID. Before input into the pedestrian re-recognition model, the image sizes of all training samples can be unified to a preset size so as to train the pedestrian re-identification model.
In step 204, a pedestrian re-identification model is trained using the sample images. As shown in fig. 3, the pedestrian re-identification model may select a combination of ResNet101x + IBN as the backbone network for feature extraction, aggregate features using the Gem Pooling operation, and then apply bneck to obtain the final prediction result.
Cross-entropy loss can be used in combination with triplet loss (Triplet loss) during model training. In addition, in the training process, in order to obtain a class prediction result (which can, to a certain extent, be regarded as a classification result over pedestrian IDs), a classification layer can be added, and the classification layer can introduce CircleSoftmax. Finally, an Adam (Adaptive Moment Estimation) optimizer is used so that the model can learn the data distribution in the data set from multiple angles and thereby fit it better.
In the foregoing, IBN may be understood as Instance-Batch Normalization, which combines Instance Normalization (IN) and Batch Normalization (BN); features extracted with IBN can both eliminate appearance influence and retain feature content information, which helps improve model accuracy.
Gem Pooling is an aggregation network, which can also, to some extent, be regarded as a pooling network ranging between average pooling (mean Pooling) and maximum pooling (max Pooling). Gem Pooling has a regulation parameter p. By adjusting the value of p, Gem Pooling can be made to focus on areas of different granularity. For example, when p = 1, Gem Pooling is average pooling; when p tends to infinity, Gem Pooling is maximum pooling.
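The behavior of the regulation parameter p can be illustrated with a small numpy sketch (the function name is hypothetical; real implementations pool over the spatial dimensions of a convolutional feature map):

```python
import numpy as np

def gem_pool(features, p=3.0, eps=1e-6):
    """Generalized-mean (GeM) pooling over N spatial positions of C channels.
    features: array of shape (N, C), assumed non-negative (clipped at eps).
    p = 1 gives average pooling; p -> infinity approaches max pooling."""
    x = np.clip(features, eps, None)
    return np.power(np.mean(np.power(x, p), axis=0), 1.0 / p)
```

With p = 1 the result equals the per-channel mean, and for very large p it approaches the per-channel maximum, which is exactly the interpolation between average and max pooling described above.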
The bneck can be used as a head network, and to a certain extent, can be regarded as a Batch Normalization network (BN), and by introducing the bneck, the triplet loss can be more easily converged.
CircleSoftmax can be considered a combination of Circle and Softmax. Softmax is an activation function, generally applied in the classification process, while Circle corresponds to circle loss. In other words, the above classification layer adopts the activation function Softmax and uses circle loss to adjust network parameters during training.
In addition, in the process of model training, strategies such as learning rate warm-up (Learning rate warm up) and backbone network freezing (backbone freeze) can be used, so as to improve the training efficiency of model training.
The following describes cluster search and cascade search, respectively.
A typical approach in cluster vector search is to divide a large number of vectors into many clusters by a certain clustering algorithm, each cluster containing hundreds of vectors and each cluster having a central vector. When a user inputs a target vector for searching, the cluster retrieval system firstly calculates the distance between the target vector and the central vector of each cluster, selects a plurality of clusters with closer distances, then calculates the distance between the target vector and each vector in the clusters, and finally obtains k result vectors with the closest distances.
As shown in fig. 4, schematically, it is assumed that there are several vectors on a two-dimensional plane, each vector corresponding to a point in the diagram.
The two-dimensional plane may include an X-axis and a Y-axis, and accordingly, the point corresponding to each vector has corresponding coordinates in the two-dimensional plane. It is easy to understand that, in general, the position difference of different vectors in a two-dimensional plane can be described by coordinates. Therefore, in practical applications, the scales of the X-axis and the Y-axis can be set only according to practical needs, and are not limited specifically here.
As shown in fig. 5, the vectors are divided into a number of clusters by a clustering algorithm; the number of clusters can be specified, and here they are divided into 4 clusters. The vectors marked with black circles represent the center vectors of the clusters.
As shown in fig. 6, the target vector to be retrieved is added; based on the comparison calculation, the target vector is found to be closest to the center of the lower cluster, so the distance between the target vector and each vector in the lower cluster is calculated to obtain the vector closest to the target vector.
it will be readily understood that the target vector may correspond to the first eigenvector described above, all vectors in fig. 4 correspond to the third eigenvector described above, and the vector in the lower cluster in fig. 6 corresponds to the second eigenvector described above.
When the feature vector dimension is very high, computing the similarity between vectors is very complicated and slow. Cascade retrieval divides a high-dimensional vector into a plurality of low-dimensional vectors, and then compares the similarity (Similarity) between the corresponding low-dimensional vectors in turn. Finally, the first K vectors with the largest sum of the segment-by-segment similarities are taken as output. The whole flow is shown in FIG. 7.
The following is an example of a process of feature retrieval based on cluster retrieval and cascade retrieval fusion.
In this example, for hundred-million-level high-dimensional vectors, a method combining cascade retrieval and cluster retrieval may be used to perform the retrieval. First, the target vector to be compared, v_obj (corresponding to the first feature vector), is input, and the vector library V to be retrieved is divided into 16384 units U. Then, the distances from the target vector v_obj to the centers C of all units are compared, and the 64 nearest units are selected as V_cluster.
In addition, Scalar Quantization (Scalar Quantization) can be performed on each vector placed in a cell. Scalar quantization converts each dimension of the original vector from a 4 byte floating point number to a 1 byte unsigned integer, which can significantly reduce memory space.
The clustering method can quickly narrow the range to be searched in massive data. Cascade retrieval is then carried out among the clustered vectors: first, each of the m 2048-dimensional feature vectors is split into 4 feature vectors of 512 dimensions each, and the target vector v_obj is likewise split into 4 512-dimensional feature vectors. Then, the similarity between the split target vector and each split feature vector to be retrieved is calculated segment by segment, and the segment similarities are added to obtain the final similarity.
The similarity in this example is calculated using the Euclidean distance. Finally, the first k feature vectors with the largest sum of the 4 similarities are taken out and returned as topK; the whole flow is shown in FIG. 8. By combining the two methods, the problem of fast and accurate retrieval among high-dimensional massive feature vectors can be well solved.
Here, v_obj is the target vector to be compared; V = {v_1, v_2, v_3, …} represents the hundred-million-level vectors in the vector library; and U = {u_1, u_2, u_3, …, u_16384} represents the division of the hundred-million-level vectors in the vector library into 16384 units.
Each unit comprises a plurality of vectors, and each unit corresponds to a center point in C = {c_1, c_2, c_3, …, c_16384}.
The distances from v_obj to all the center points in C are calculated, and the vectors of the selected units form V_cluster = {v_cluster1, v_cluster2, v_cluster3, …, v_clusterm}, where m is the number of vectors after clustering.
The Euclidean distance calculation formula is as follows:
dist(X, Y) = sqrt( Σ_{i=1}^{n} (x_i − y_i)² )
where x_i and y_i respectively represent the i-th dimension of X and Y, and n is the total dimension of the vector; in this example, n is 512.
The sum of the distances is:
dist_final = Σ_{position=1}^{4} dist_position
where position indexes the 512-dimensional segments within the entire vector dimension and dist represents the Euclidean distance. Here, dist_final may correspond to the second similarity described above, and dist_position may correspond to the first similarity described above.
And selecting K corresponding vectors with the maximum similarity in the final similarity set as topK to return.
As shown in fig. 8, the entire process for returning topK can be summarized as: inputting a target picture, converting the characteristic vector (correspondingly obtaining a first characteristic vector), clustering retrieval, cascading retrieval and outputting a topK result.
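The summarized flow (cluster retrieval to narrow the candidates, then cascade retrieval over the vector segments) can be sketched as follows; all names are illustrative, and the 16384-unit / 64-probe scale of the example is reduced for clarity:

```python
import numpy as np

def cascade_distance(query, vec, parts=4):
    """The final similarity of the example: the sum of the per-segment Euclidean
    distances between corresponding sub-vectors (smaller sum = more similar)."""
    return sum(np.linalg.norm(a - b)
               for a, b in zip(np.split(query, parts), np.split(vec, parts)))

def fused_search(query, library, centroids, assign, n_probe=1, k=1, parts=4):
    """Cluster retrieval narrows the candidate set (the n_probe nearest units),
    then cascade retrieval ranks the candidates and returns the topK indices."""
    probe = np.argsort(np.linalg.norm(centroids - query, axis=1))[:n_probe]
    candidates = np.where(np.isin(assign, probe))[0]
    scores = [cascade_distance(query, library[i], parts) for i in candidates]
    return candidates[np.argsort(scores)[:k]]
```

In the tiny example below, the query falls in the first unit, so only that unit's two vectors are ranked by the summed segment distances; the vector far away in the other unit is never compared at all.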
Based on the above specific application example, when the feature retrieval method provided by the embodiments of the present application is applied to pedestrian re-identification, it solves the problem of retrieval for the pedestrian re-identification task in a massive vector database, saves retrieval time, and meets second-level retrieval speed on a hundred-million-level base library. In addition, the retrieval accuracy rank-1 (the probability that the top-ranked result is the correct result) on the hundred-million-level vector base library can reach 80%, and the mean average precision (mAP) of the query can reach 70%.
As shown in fig. 9, an embodiment of the present application further provides a feature retrieval apparatus, including:
an obtaining module 901, configured to obtain a first feature vector and P second feature vectors, where a dimension of the first feature vector and a dimension of the second feature vectors are both W, P and W are both integers greater than 1, and the first feature vector and the second feature vectors are both feature vectors of a multimedia resource;
a dividing module 902, configured to divide the first feature vector into L first vectors, and divide each second feature vector into L second vectors, where a dimension of each first vector and a dimension of each second vector are both smaller than W, the L first vectors correspond to the L second vectors one to one, and L is an integer greater than 1;
a determining module 903, configured to determine a second similarity between the first feature vector and each second feature vector according to a first similarity between the L first vectors and the L second vectors associated with each second feature vector;
and the retrieval module 904 is configured to obtain a feature retrieval result of the first feature vector according to the second similarity.
Optionally, the obtaining module 901 may include:
a first obtaining unit, configured to obtain the first feature vector and a preset feature vector library, where the preset feature vector library includes Q third feature vectors, and Q is an integer greater than or equal to P;
the clustering unit is used for clustering the Q third feature vectors to obtain M feature vector clusters, each feature vector cluster is provided with a corresponding first central vector and is associated with at least one third feature vector, and M is an integer greater than 1;
a first determining unit, configured to determine a target distance between a second center vector of the first feature vectors and each first center vector;
and the second determining unit is used for determining a third feature vector associated with the feature vector cluster corresponding to the third central vector as the second feature vector, and the third central vector is a first central vector of which the corresponding target distance meets a preset distance condition.
Optionally, the feature retrieving means may further include:
the processing module is used for respectively carrying out scalar quantization processing on at least one third eigenvector included in each eigenvector cluster;
and the storage module is used for storing the third feature vector after scalar quantization processing.
Optionally, the determining module 903 may include:
a third determining unit, configured to determine, respectively, a first similarity between each second vector and a corresponding first vector in L second vectors associated with a fourth feature vector, where the fourth feature vector is any one of the second feature vectors;
and a fourth determining unit, configured to determine a sum of the L first similarities as a second similarity between the first feature vector and the fourth feature vector.
Optionally, the retrieving module 904 may include:
the sorting module is used for sorting the P second similarities from large to small;
and the output module is used for outputting K second feature vectors corresponding to the K top-ranked second similarities, where the feature retrieval result includes the K second feature vectors, and K is a positive integer less than or equal to P.
Optionally, the obtaining module 901 may include:
a second acquisition unit configured to acquire a target image;
the output unit is used for inputting the target image to the trained target feature extraction model and outputting a first feature vector;
the target feature extraction model comprises a target backbone network, a target aggregation network and a target head network, wherein the input end of the target backbone network is used for receiving the target image, the input end of the target aggregation network is used for receiving the output of the target backbone network, the input end of the target head network is used for receiving the output of the target aggregation network, and the output end of the target head network is used for outputting the first feature vector.
Optionally, the feature retrieving means may further include:
the establishing module is used for establishing an initial characteristic extraction model;
the training module is used for training the initial feature extraction model by using the sample image set with the label, and acquiring a target feature extraction model under the condition that the loss value of the loss function of the initial feature extraction model is smaller than a preset loss value;
the loss function comprises at least one of a cross entropy loss function, a triple loss function and a circle loss function, and the loss function calculates a loss value based on the output of the annotation and initial feature extraction model.
It should be noted that the feature retrieval device is a device corresponding to the feature retrieval method, and all implementation manners in the method embodiments are applicable to the embodiment of the device, and the same technical effects can be achieved.
Fig. 10 shows a hardware structure diagram of an electronic device provided in an embodiment of the present application.
The electronic device may include a processor 1001 and a memory 1002 that stores computer program instructions.
Specifically, the processor 1001 may include a Central Processing Unit (CPU), or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
Memory 1002 may include mass storage for data or instructions. By way of example, and not limitation, memory 1002 may include a Hard Disk Drive (HDD), a floppy Disk Drive, flash memory, an optical Disk, a magneto-optical Disk, magnetic tape, or a Universal Serial Bus (USB) Drive or a combination of two or more of these. Memory 1002 may include removable or non-removable (or fixed) media, where appropriate. The memory 1002 may be internal or external to the integrated gateway disaster recovery device, where appropriate. In a particular embodiment, the memory 1002 is non-volatile solid-state memory.
The memory may include Read Only Memory (ROM), Random Access Memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, electrical, optical, or other physical/tangible memory storage devices. Thus, in general, the memory includes one or more tangible (non-transitory) computer-readable storage media (e.g., memory devices) encoded with software comprising computer-executable instructions and when the software is executed (e.g., by one or more processors), it is operable to perform operations described with reference to methods in accordance with the present disclosure.
The processor 1001 realizes any one of the feature retrieval methods in the above embodiments by reading and executing computer program instructions stored in the memory 1002.
In one example, the electronic device may also include a communication interface 1003 and a bus 1004. As shown in fig. 10, a processor 1001, a memory 1002, and a communication interface 1003 are connected to each other via a bus 1004 to complete mutual communication.
The communication interface 1003 is mainly used for implementing communication between modules, apparatuses, units and/or devices in this embodiment.
Bus 1004 includes hardware, software, or both coupling the components of the electronic device to each other. By way of example, and not limitation, a bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCI-X) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local (VLB) bus, another suitable bus, or a combination of two or more of these. Bus 1004 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated by the application.
In addition, in combination with the feature retrieval method in the foregoing embodiments, the embodiments of the present application may provide a computer storage medium to implement. The computer storage medium having computer program instructions stored thereon; the computer program instructions, when executed by a processor, implement any of the feature retrieval methods in the above embodiments.
It is to be understood that the present application is not limited to the particular arrangements and instrumentality described above and shown in the attached drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications, and additions or change the order between the steps after comprehending the spirit of the present application.
The functional blocks shown in the structural block diagrams above may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, they may be, for example, electronic circuits, Application Specific Integrated Circuits (ASICs), suitable firmware, plug-ins, function cards, and the like. When implemented in software, the elements of the present application are the programs or code segments used to perform the required tasks. The programs or code segments may be stored in a machine-readable medium or transmitted over a transmission medium or communication link by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transfer information. Examples of machine-readable media include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the Internet or an intranet.
It should also be noted that the exemplary embodiments in this application describe some methods or systems in terms of a series of steps or devices. However, the present application is not limited to the order of the steps described above; that is, the steps may be performed in the order mentioned in the embodiments, in a different order, or simultaneously.
Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware for performing the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As will be apparent to those skilled in the art, for convenience and brevity of description, the specific working processes of the systems, modules and units described above may refer to corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present application, and these modifications or substitutions should be covered within the scope of the present application.

Claims (10)

1. A feature retrieval method, comprising:
acquiring a first feature vector and P second feature vectors, wherein the dimensionality of the first feature vector and the dimensionality of the second feature vectors are both W, both P and W are integers greater than 1, and both the first feature vector and the second feature vectors are feature vectors of multimedia resources;
dividing the first feature vector into L first vectors, dividing each second feature vector into L second vectors, wherein the dimension of each first vector and the dimension of each second vector are both smaller than W, the L first vectors correspond to the L second vectors one to one, and L is an integer larger than 1;
determining a second similarity between the first feature vector and each second feature vector according to a first similarity between the L first vectors and L second vectors associated with each second feature vector;
obtaining a feature retrieval result of the first feature vector according to the second similarity;
the first feature vector is a feature vector of a multimedia resource needing feature retrieval, and the second feature vector is a candidate feature vector of the multimedia resource;
the feature retrieval result comprises one or more second feature vectors whose corresponding second similarities are ranked highest; or the feature retrieval result comprises one or more second feature vectors whose corresponding second similarities are higher than a similarity threshold.
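The sub-vector scheme of claim 1 can be sketched as follows. This is an illustrative NumPy sketch, not the patented embodiment: the choice of dot product as the first similarity, equal sub-vector lengths (W divisible by L), and all function names are assumptions not fixed by the claim.

```python
import numpy as np

def subvector_similarity(query, candidates, L):
    """Split W-dim vectors into L sub-vectors and score candidates.

    query:      shape (W,)   -- the first feature vector
    candidates: shape (P, W) -- the P second feature vectors
    Uses a dot product as the per-sub-vector (first) similarity; the
    second similarity is the sum of the L first similarities.
    """
    W = query.shape[0]
    assert W % L == 0, "assumes W is divisible by L for equal sub-vectors"
    q_parts = query.reshape(L, W // L)                          # L first vectors
    c_parts = candidates.reshape(len(candidates), L, W // L)    # L second vectors each
    first_sims = np.einsum('ld,pld->pl', q_parts, c_parts)      # (P, L) first similarities
    return first_sims.sum(axis=1)                               # (P,)  second similarities

def top_k(second_sims, k):
    """Indices of candidates ordered by descending second similarity."""
    return np.argsort(-second_sims)[:k]
```

With a dot product the sum over sub-vectors equals the full-vector score, so the decomposition is exact; the practical benefit of per-sub-vector similarities appears when they are precomputed per sub-space (product-quantization-style lookup tables), which the claim's structure permits but does not mandate.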
2. The method of claim 1, wherein obtaining the first feature vector and the P second feature vectors comprises:
acquiring a first feature vector and a preset feature vector library, wherein the preset feature vector library comprises Q third feature vectors, and Q is an integer greater than or equal to P;
clustering the Q third feature vectors to obtain M feature vector clusters, wherein each feature vector cluster has a corresponding first central vector and is associated with at least one third feature vector, and M is an integer greater than 1;
respectively determining a target distance between a second center vector of the first feature vector and each first center vector;
and determining, as the second feature vectors, the third feature vectors associated with the feature vector cluster corresponding to a third center vector, wherein the third center vector is a first center vector whose corresponding target distance meets a preset distance condition.
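The candidate-narrowing step of claim 2 (cluster the library, then probe only the clusters whose center vectors are close to the query) can be sketched as below; this assumes the clustering has already produced center vectors and per-vector cluster assignments, takes "nearest n_probe clusters" as one possible preset distance condition, and all names are illustrative.

```python
import numpy as np

def narrow_candidates(query, centers, assignments, n_probe=2):
    """Select candidate (second) feature vectors by probing clusters.

    centers:     (M, W) first center vectors from clustering the Q
                 third feature vectors
    assignments: (Q,) cluster index of each third feature vector
    Returns the indices of the third feature vectors that become the
    second feature vectors (candidates) for the query.
    """
    dists = np.linalg.norm(centers - query, axis=1)   # target distances
    probe = np.argsort(dists)[:n_probe]               # clusters meeting the distance condition
    mask = np.isin(assignments, probe)
    return np.nonzero(mask)[0]
```

Only the vectors in the probed clusters are then scored with the sub-vector similarity, which is what makes the retrieval sublinear in Q.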
3. The method of claim 2, wherein after clustering the Q third feature vectors to obtain the M feature vector clusters, the method further comprises:
respectively carrying out scalar quantization processing on at least one third feature vector included in each feature vector cluster;
storing the third feature vector after scalar quantization processing.
4. The method of claim 1, wherein determining the second similarity between the first feature vector and each second feature vector according to the first similarities between the L first vectors and the L second vectors associated with each second feature vector comprises:
respectively determining a first similarity between each second vector and the corresponding first vector in L second vectors associated with a fourth feature vector, wherein the fourth feature vector is any one of the second feature vectors;
and determining a sum of the L first similarities as the second similarity between the first feature vector and the fourth feature vector.
5. The method according to claim 1, wherein obtaining the feature retrieval result of the first feature vector according to the second similarity comprises:
sorting the P second similarities in descending order;
outputting the K second feature vectors corresponding to the top K second similarities, wherein the feature retrieval result comprises the K second feature vectors, and K is a positive integer less than or equal to P.
6. The method of claim 1, wherein the multimedia asset is an image;
the obtaining of the first feature vector includes:
acquiring a target image;
inputting the target image into a target feature extraction model obtained through training, and outputting the first feature vector;
the target feature extraction model comprises a target backbone network, a target aggregation network, and a target head network, wherein an input end of the target backbone network is used for receiving the target image, an input end of the target aggregation network is used for receiving an output of the target backbone network, an input end of the target head network is used for receiving an output of the target aggregation network, and an output end of the target head network is used for outputting the first feature vector.
7. The method of claim 6, wherein before inputting the target image into the trained target feature extraction model and outputting the first feature vector, the method further comprises:
establishing an initial feature extraction model;
training the initial feature extraction model by using a sample image set carrying a label, and acquiring the target feature extraction model under the condition that the loss value of the loss function of the initial feature extraction model is smaller than a preset loss value;
wherein the loss function comprises at least one of a cross-entropy loss function, a triplet loss function, and a circle loss function, and the loss function calculates the loss value based on the annotation and the output of the initial feature extraction model.
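Of the loss terms claim 7 allows, the triplet loss is the simplest to sketch. The margin value and function names below are illustrative assumptions; the claim does not fix the distance metric or margin.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Margin-based triplet loss on L2 distances.

    Pushes the anchor's feature vector closer to a same-label
    (positive) sample than to a different-label (negative) sample
    by at least `margin`; training stops per claim 7 once the loss
    value falls below a preset loss value.
    """
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(0.0, d_ap - d_an + margin)
```

In practice this term would be computed over mini-batches of labeled sample images and optionally combined with the cross-entropy and circle loss terms the claim lists.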
8. A feature retrieval apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to acquire a first feature vector and P second feature vectors, wherein the dimension of the first feature vector and the dimension of each second feature vector are both W, both P and W are integers greater than 1, and both the first feature vector and the second feature vectors are feature vectors of multimedia resources;
the dividing module is used for dividing the first feature vectors into L first vectors, dividing each second feature vector into L second vectors, wherein the dimension of each first vector and the dimension of each second vector are both smaller than W, the L first vectors correspond to the L second vectors one by one, and L is an integer larger than 1;
a determining module, configured to determine, according to first similarities between the L first vectors and L second vectors associated with each second feature vector, second similarities between the first feature vectors and each second feature vector;
the retrieval module is used for obtaining a feature retrieval result of the first feature vector according to the second similarity;
the first feature vector is a feature vector of a multimedia resource needing feature retrieval, and the second feature vector is a candidate feature vector of the multimedia resource;
the feature retrieval result comprises one or more second feature vectors whose corresponding second similarities are ranked highest; or the feature retrieval result comprises one or more second feature vectors whose corresponding second similarities are higher than a similarity threshold.
9. An electronic device, characterized in that the device comprises: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements the feature retrieval method of any of claims 1-7.
10. A computer storage medium having computer program instructions stored thereon which, when executed by a processor, implement the feature retrieval method of any one of claims 1-7.
CN202110669606.4A 2021-06-17 2021-06-17 Feature retrieval method, device, equipment and computer storage medium Active CN113255828B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110669606.4A CN113255828B (en) 2021-06-17 2021-06-17 Feature retrieval method, device, equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110669606.4A CN113255828B (en) 2021-06-17 2021-06-17 Feature retrieval method, device, equipment and computer storage medium

Publications (2)

Publication Number Publication Date
CN113255828A CN113255828A (en) 2021-08-13
CN113255828B true CN113255828B (en) 2021-10-15

Family

ID=77188327

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110669606.4A Active CN113255828B (en) 2021-06-17 2021-06-17 Feature retrieval method, device, equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN113255828B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116912925A (en) * 2023-09-14 2023-10-20 齐鲁空天信息研究院 Face recognition method, device, electronic equipment and medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN107169106A (en) * 2017-05-18 2017-09-15 珠海习悦信息技术有限公司 Video retrieval method, device, storage medium and processor
CN110609916A (en) * 2019-09-25 2019-12-24 四川东方网力科技有限公司 Video image data retrieval method, device, equipment and storage medium
CN112307239A (en) * 2020-10-29 2021-02-02 泰康保险集团股份有限公司 Image retrieval method, device, medium and equipment
CN112633297A (en) * 2020-12-28 2021-04-09 浙江大华技术股份有限公司 Target object identification method and device, storage medium and electronic device
CN112818162A (en) * 2021-03-04 2021-05-18 泰康保险集团股份有限公司 Image retrieval method, image retrieval device, storage medium and electronic equipment

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US10607004B2 (en) * 2016-09-29 2020-03-31 Intel Corporation Methods and apparatus to improve feature engineering efficiency with metadata unit operations
CN111125390A (en) * 2018-11-01 2020-05-08 北京市商汤科技开发有限公司 Database updating method and device, electronic equipment and computer storage medium
CN111651624B (en) * 2020-06-11 2023-09-19 浙江大华技术股份有限公司 Image retrieval method and apparatus for processing a web device and method for controlling the same
CN111737292B (en) * 2020-07-16 2021-01-05 腾讯科技(深圳)有限公司 Data retrieval method and related device
CN111930980B (en) * 2020-08-21 2023-07-07 深圳市升幂科技有限公司 Training method of image retrieval model, image retrieval method, device and medium

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
CN107169106A (en) * 2017-05-18 2017-09-15 珠海习悦信息技术有限公司 Video retrieval method, device, storage medium and processor
CN110609916A (en) * 2019-09-25 2019-12-24 四川东方网力科技有限公司 Video image data retrieval method, device, equipment and storage medium
CN112307239A (en) * 2020-10-29 2021-02-02 泰康保险集团股份有限公司 Image retrieval method, device, medium and equipment
CN112633297A (en) * 2020-12-28 2021-04-09 浙江大华技术股份有限公司 Target object identification method and device, storage medium and electronic device
CN112818162A (en) * 2021-03-04 2021-05-18 泰康保险集团股份有限公司 Image retrieval method, image retrieval device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN113255828A (en) 2021-08-13

Similar Documents

Publication Publication Date Title
CN108960140B (en) Pedestrian re-identification method based on multi-region feature extraction and fusion
CN106682233B (en) Hash image retrieval method based on deep learning and local feature fusion
EP2054855B1 (en) Automatic classification of objects within images
CN111506773B (en) Video duplicate removal method based on unsupervised depth twin network
Niu et al. A novel image retrieval method based on multi-features fusion
CN111373393B (en) Image retrieval method and device and image library generation method and device
CN114238329A (en) Vector similarity calculation method, device, equipment and storage medium
CN110751027A (en) Pedestrian re-identification method based on deep multi-instance learning
CN114898266B (en) Training method, image processing device, electronic equipment and storage medium
Wu et al. Variant semiboost for improving human detection in application scenes
CN113255828B (en) Feature retrieval method, device, equipment and computer storage medium
CN111738319A (en) Clustering result evaluation method and device based on large-scale samples
CN111325276A (en) Image classification method and device, electronic equipment and computer-readable storage medium
CN115187924A (en) Target detection method, device, terminal and computer readable storage medium
CN112613474B (en) Pedestrian re-identification method and device
Gao et al. Automatic watermeter digit recognition on mobile devices
CN113723558A (en) Remote sensing image small sample ship detection method based on attention mechanism
CN113743239A (en) Pedestrian re-identification method and device and electronic equipment
JP6042778B2 (en) Retrieval device, system, program and method using binary local feature vector based on image
JP5959446B2 (en) Retrieval device, program, and method for high-speed retrieval by expressing contents as a set of binary feature vectors
Gao et al. Data-driven lightweight interest point selection for large-scale visual search
Nguyen ROC curve analysis for classification of road defects
CN113743251B (en) Target searching method and device based on weak supervision scene
CN112487927B (en) Method and system for realizing indoor scene recognition based on object associated attention
CN114462479A (en) Model training method, model searching method, model, device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant