CN112989093A - Retrieval method and device and electronic equipment

Retrieval method and device and electronic equipment

Info

Publication number
CN112989093A
CN112989093A
Authority
CN
China
Prior art keywords
short
features
long
feature
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110089222.5A
Other languages
Chinese (zh)
Inventor
黄德亮
朱烽
赵瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Sensetime Technology Co Ltd
Original Assignee
Shenzhen Sensetime Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Sensetime Technology Co Ltd filed Critical Shenzhen Sensetime Technology Co Ltd
Priority to CN202110089222.5A priority Critical patent/CN112989093A/en
Publication of CN112989093A publication Critical patent/CN112989093A/en
Priority to PCT/CN2021/125147 priority patent/WO2022156284A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Abstract

The embodiments of the present disclosure provide a retrieval method, a retrieval device and electronic equipment, wherein the method comprises the following steps: performing feature extraction on a first image to obtain long features of the first image; inputting the long features into a pre-trained short feature extraction network to obtain short features output by the short feature extraction network and corresponding to the long features; based on the short features, searching a short feature library to obtain a matched candidate short feature set; obtaining long features corresponding to each short feature in the candidate short feature set from a long feature library to obtain a candidate long feature set; and carrying out similarity calculation on the long features and the candidate long feature set, and acquiring target long features matched with the long features from the candidate long feature set.

Description

Retrieval method and device and electronic equipment
Technical Field
The present disclosure relates to machine learning technologies, and in particular, to a retrieval method, a retrieval device, and an electronic device.
Background
For example, in a face recognition application, the process of retrieving a face image is to extract features of the face image, and then search and match the extracted features with a face feature library stored in a database. The number of features in the face feature library is generally large, and in order to increase the retrieval speed, a short feature obtained by compressing a long feature (e.g., an original face feature) is generally used for retrieval in the retrieval process.
In the related art, long features can be compressed by using a method such as Principal Component Analysis (PCA) or clustering, but the accuracy of a search result obtained by a short feature search using such a compression method is still to be improved.
Disclosure of Invention
In view of the above, the present disclosure provides at least a retrieval method, a retrieval device and an electronic device.
In a first aspect, a retrieval method is provided, and the method includes:
performing feature extraction on a first image to obtain long features of the first image;
inputting the long features into a pre-trained short feature extraction network to obtain short features output by the short feature extraction network and corresponding to the long features;
based on the short features, searching a short feature library to obtain a matched candidate short feature set;
obtaining long features corresponding to each short feature in the candidate short feature set from a long feature library to obtain a candidate long feature set;
and carrying out similarity calculation on the long features and the candidate long feature set, and acquiring target long features matched with the long features from the candidate long feature set.
In a second aspect, a retrieval apparatus is provided, the apparatus comprising:
the long feature extraction module is used for extracting features of a first image to obtain long features of the first image;
the short feature obtaining module is used for inputting the long features into a pre-trained short feature extraction network to obtain short features which are output by the short feature extraction network and correspond to the long features;
the candidate feature obtaining module is used for retrieving a matched candidate short feature set from a short feature library based on the short features; obtaining long features corresponding to each short feature in the candidate short feature set from a long feature library to obtain a candidate long feature set;
and the retrieval processing module is used for carrying out similarity calculation on the long features and the candidate long feature set and acquiring target long features matched with the long features from the candidate long feature set.
In a third aspect, an electronic device is provided, the device comprising a memory for storing computer instructions executable on a processor, the processor being configured to implement the method of any of the embodiments of the present disclosure when executing the computer instructions.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored, which program, when executed by a processor, implements the method of any of the embodiments of the present disclosure.
In a fifth aspect, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method of any of the embodiments of the present disclosure.
According to the retrieval method, the retrieval device and the electronic equipment, the long features are compressed by the pre-trained short feature extraction network to obtain the short features. Short features obtained in this way better retain the information of the long features, so retrieval using the short features is more accurate. In addition, the short feature extraction network needs to be trained only once and can then be applied to compressing new long features, so the training cost is low.
Drawings
To describe the technical solutions in one or more embodiments of the present disclosure or in the related art more clearly, the drawings used in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings in the following description show only some of the embodiments described in one or more embodiments of the present disclosure, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a retrieval method provided by at least one embodiment of the present disclosure;
fig. 2 illustrates a network architecture diagram provided by at least one embodiment of the present disclosure;
fig. 3 illustrates a training process of a short feature extraction network provided by at least one embodiment of the present disclosure;
fig. 4 shows a schematic structural diagram of a retrieval apparatus provided in at least one embodiment of the present disclosure.
Detailed Description
To enable those skilled in the art to better understand the technical solutions in one or more embodiments of the present disclosure, these technical solutions are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by those of ordinary skill in the art based on one or more embodiments of the present disclosure without creative effort shall fall within the scope of the present disclosure.
The embodiments of the present disclosure provide a retrieval method, which can be used for image retrieval. As shown in fig. 1, the method may include the following steps:
in step 100, feature extraction is performed on a first image to obtain long features of the first image.
The first image referred to in this step may be an image to be subjected to retrieval matching.
In practical implementation, for example, the feature of the first image may be extracted through a long feature extraction network trained in advance, so as to obtain the long feature of the first image. Referring to the example of fig. 2 in combination, the first image 21 is input to the long feature extraction network 22, and the structure of the long feature extraction network 22 is not limited in this embodiment, for example, a convolutional neural network may be used. The long feature extraction network 22 outputs extracted features, which may be referred to as long features 23.
In step 102, the long features are input into a pre-trained short feature extraction network, and short features corresponding to the long features and output by the short feature extraction network are obtained.
In this step, please continue to refer to fig. 2, the short feature extraction network 24 is trained in advance, and the long features obtained in step 100 are input into the short feature extraction network 24, so as to output the short features 25. Short features are compressed from long features and are typically smaller in feature dimension than long features.
In one example, the short feature extraction network 24 may be a fully-connected network composed of at least one fully-connected layer. (In practical implementation, the number of fully-connected layers may be chosen by weighing time consumption against accuracy: generally, retrieval accuracy increases with the number of layers, but so do the network training time and the short feature calculation time.) Each fully-connected layer has an output dimension of 2048, and each fully-connected layer may be followed by at least one ReLU layer (activation layer). When the short feature extraction network 24 is a fully-connected network, it performs only simple matrix operations, depends little on hardware, and can run on most hardware.
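As a minimal sketch of such a fully-connected network, the following code stacks matrix multiplications with ReLU activations. The layer sizes, random initialization and class name here are illustrative assumptions for this sketch, not values or identifiers from the patent (which only states a 2048-dimensional output per fully-connected layer).

```python
import numpy as np

class ShortFeatureNet:
    """Sketch of the short feature extraction network: at least one
    fully-connected layer, each followed by a ReLU activation.
    Layer dimensions are illustrative assumptions."""

    def __init__(self, dims, seed=0):
        rng = np.random.default_rng(seed)
        # One (weight, bias) pair per fully-connected layer.
        self.layers = [
            (0.01 * rng.standard_normal((d_in, d_out)), np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])
        ]

    def __call__(self, long_feature):
        x = np.asarray(long_feature, dtype=float)
        for weight, bias in self.layers:
            # Fully-connected layer plus ReLU: only simple matrix
            # operations, as noted in the text.
            x = np.maximum(x @ weight + bias, 0.0)
        return x  # the compressed short feature
```

The forward pass compresses an 8-dimensional "long" vector to a 2-dimensional "short" one in the toy configuration below, illustrating why this compression step has little hardware dependence.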
In step 104, a matching candidate short feature set is retrieved from the short feature library based on the short features.
As shown in fig. 2, the short feature library 26 stores many short features, and the similarity between the short features 25 output by the short feature extraction network 24 and each short feature in the short feature library 26 may be calculated to obtain a similarity value, and at least a part of the short features may be selected from the short feature library according to the similarity value corresponding to each short feature to form the candidate short feature set. The selection method includes but is not limited to:
for example, the similarity values corresponding to the short features are sorted in descending order, and the first N short features are selected to form a candidate short feature set, where N is a natural number. As another example, a plurality of short features with similarity values greater than or equal to a similarity threshold are selected to form a candidate short feature set.
For ease of understanding, the establishment of the "short feature library" and the "long feature library" appearing in this embodiment is explained first. The short feature library and the long feature library may be established based on an image base library, which may be the search library used for retrieval matching. For example, in a face retrieval scene, a batch of face images with known identities may be obtained in advance, and these face images with determined identities may be referred to as the image base library. Then, for each candidate image in the image base library, the long feature YS of the candidate image is extracted, and the long feature is compressed by a pre-trained short feature extraction network (which may be the short feature extraction network 24 in fig. 2) to obtain the short feature DT. The short feature DT may be stored in the short feature library, and the long feature YS in the long feature library. In this way, the short features DT obtained for the candidate images in the image base library constitute the short feature library, and the long features YS constitute the long feature library. In addition, the mapping relationship between the long feature and the short feature of each candidate image is also stored.
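The library-building procedure above can be sketched as a simple loop. The callables `extract_long` and `compress` stand in for the long feature extraction network and the trained short feature extraction network; the function and parameter names are assumptions for illustration.

```python
def build_feature_libraries(image_base, extract_long, compress):
    """Sketch of establishing the long and short feature libraries from
    the image base library, including the mapping between them."""
    long_library, short_library, mapping = {}, {}, {}
    for image_id, image in image_base.items():
        ys = extract_long(image)      # long feature YS of the candidate image
        dt = compress(ys)             # short feature DT via the short network
        long_library[image_id] = ys
        short_library[image_id] = dt
        mapping[image_id] = image_id  # DT <-> YS share the same key
    return long_library, short_library, mapping
```

Keying both libraries by the same image identifier is one simple way to realize the stored mapping relationship between each long feature and its short feature.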
In step 106, the long features corresponding to each short feature in the candidate short feature set are obtained from the long feature library, so as to obtain a candidate long feature set.
As described above, each short feature in the short feature library and the long feature in the long feature library have a mapping relationship, then the long features respectively corresponding to each short feature in the candidate short feature set can be found according to the candidate short feature set obtained in step 104, and these found long features constitute the candidate long feature set.
In step 108, similarity calculation is performed between the long features and the candidate long feature set, and a target long feature matched with the long features is obtained from the candidate long feature set.
As shown in fig. 2, in this step, similarity calculation may be performed between the long features obtained in step 100 and each long feature in the candidate long feature set, so as to obtain a matched target long feature. For example, the target long feature may be the feature with the highest similarity to the long feature obtained in step 100.
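This re-ranking step can be sketched as follows, again using cosine similarity as an illustrative (not patent-specified) metric; the target long feature is taken, as in the text, to be the candidate with the highest similarity.

```python
import numpy as np

def select_target_long_feature(query_long, candidate_longs):
    """Sketch of step 108: compare the query's long feature with each
    long feature in the candidate long feature set and return the index
    and similarity of the best match."""
    q = query_long / np.linalg.norm(query_long)
    cands = candidate_longs / np.linalg.norm(candidate_longs, axis=1, keepdims=True)
    sims = cands @ q                 # similarity to each candidate long feature
    best = int(np.argmax(sims))      # highest-similarity candidate wins
    return best, float(sims[best])
```

Because only the small candidate set is scored with full-length features, this final step stays cheap while recovering the accuracy lost in the short-feature stage.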
In one example, the identity information corresponding to each long feature in the long feature library is stored in advance, so that after the target long feature is obtained, the corresponding identity information can be obtained directly according to the target long feature. In another example, face images corresponding to the respective long features in the long feature library may be stored in advance, and a face image (which may be referred to as a second image) corresponding to the target long feature may be used as a search result image matched with the first image.
According to the above retrieval method, the long features are compressed by the pre-trained short feature extraction network to obtain the short features. Short features obtained in this way better retain the information of the long features, so retrieval using the short features is more accurate. In addition, the short feature extraction network needs to be trained only once and can then be applied to compressing new long features, so the training cost is low.
The training process of the short feature extraction network according to the embodiment of the present disclosure is described as follows, please refer to fig. 3, where fig. 3 is an example of the training process of the short feature extraction network, and may include:
in step 300, sample long features of a training sample image are extracted, and the sample long features are input into the short feature extraction network to be trained, so as to obtain sample short features.
In this step, the sample long features of the training sample image can be extracted through the long feature extraction network, and the sample long features are input into the short feature extraction network to be trained, so as to obtain the sample short features.
The training sample image is provided with a corresponding classification label, and the classification label is used for representing a classification category to which an object in the training sample image belongs.
In step 302, a prediction classification result of the training sample image is obtained based on the sample short features, and a loss value is obtained according to the prediction classification result and a classification label corresponding to the training sample image.
In this step, classification prediction of the training sample image can be performed based on the sample short features to obtain the predicted classification result of the training sample image. A loss function may then be calculated from the predicted classification result and the classification label to obtain a loss value. For example, a softmax cross-entropy loss function may be used to determine the loss value, and the short feature extraction network may be trained accordingly until convergence.
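The softmax cross-entropy loss for a single sample can be written out as below; the logits are assumed to be class scores predicted from the sample short feature, and the function name is an assumption for this sketch.

```python
import math

def softmax_cross_entropy(logits, label):
    """Sketch of the loss in step 302: softmax turns the class logits
    into probabilities, and the loss is the negative log-probability of
    the sample's classification label."""
    m = max(logits)  # subtract the max to keep the exponentials stable
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]  # equals -log softmax(logits)[label]
```

For uniform logits over C classes the loss is ln(C), and it shrinks as the logit of the correct label grows relative to the others, which is what drives the parameter updates in step 304.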
In step 304, network parameters of the short feature extraction network are adjusted based on the loss value.
For example, in this step, the network parameters of the short feature extraction network may be adjusted by back-propagation based on the loss value.
After the training of the short feature extraction network is completed, fine-tuning training of the short feature extraction network may continue. For example, after the loss value has been determined with the cross-entropy function, the short feature extraction network may be fine-tuned (finetuned) with a triplet loss.
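A standard triplet loss, which could serve for the fine-tuning stage described above, is sketched below. The margin value and the use of squared Euclidean distance are illustrative assumptions; the patent does not specify them.

```python
def triplet_loss(anchor, positive, negative, margin=0.2):
    """Sketch of a triplet loss for fine-tuning: the anchor feature
    should be at least `margin` closer (in squared Euclidean distance)
    to a positive sample (same identity) than to a negative sample
    (different identity)."""
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    # Hinge: zero loss once the margin constraint is already satisfied.
    return max(d_pos - d_neg + margin, 0.0)
```

Unlike the classification loss, this objective directly shapes the short feature space so that same-identity features cluster together, which is why it suits a fine-tuning pass after the cross-entropy pre-training.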
The above training mode takes as an example extracting the sample long features of the training sample image through a pre-trained long feature extraction network, so only the network parameters of the short feature extraction network need to be adjusted during training. In other examples, the long feature extraction network and the short feature extraction network may be trained simultaneously: a long feature extraction network to be trained extracts the sample long features of the training sample image, and when the network parameters are adjusted according to the loss value, the network parameters of both the short feature extraction network and the long feature extraction network are adjusted, that is, the two networks are trained together.
Fig. 4 is a structure of a retrieval apparatus provided in an exemplary embodiment of the present disclosure, which may be applied to perform the retrieval method of any embodiment of the present disclosure. As shown in fig. 4, the apparatus may include: a long feature extraction module 41, a short feature obtaining module 42, a candidate feature obtaining module 43, and a retrieval processing module 44.
The long feature extraction module 41 is configured to perform feature extraction on the first image to obtain a long feature of the first image.
A short feature obtaining module 42, configured to input the long features into a pre-trained short feature extraction network, so as to obtain short features output by the short feature extraction network and corresponding to the long features.
A candidate feature obtaining module 43, configured to retrieve a matched candidate short feature set from the short feature library based on the short features; obtaining long features corresponding to each short feature in the candidate short feature set from a long feature library to obtain a candidate long feature set;
and the retrieval processing module 44 is configured to perform similarity calculation on the long features and the candidate long feature set, and obtain target long features matched with the long features from the candidate long feature set.
The present disclosure also provides an electronic device comprising a memory for storing computer instructions executable on a processor, and a processor for implementing the retrieval method of any of the embodiments of the present disclosure when executing the computer instructions.
The present disclosure also provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor implements the retrieval method according to any of the embodiments of the present disclosure.
The present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the retrieval method of any of the embodiments of the present disclosure.
One skilled in the art will appreciate that one or more embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program may be stored; when executed by a processor, the computer program implements the steps of the retrieval method described in any of the embodiments of the present disclosure. Here, "and/or" means having at least one of the two; for example, "A and/or B" includes three cases: A alone, B alone, and both A and B.
The embodiments in the disclosure are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the data processing apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to part of the description of the method embodiment.
The foregoing description of specific embodiments of the present disclosure has been described. Other embodiments are within the scope of the following claims. In some cases, the acts or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Embodiments of the subject matter and functional operations described in this disclosure may be implemented in: digital electronic circuitry, tangibly embodied computer software or firmware, computer hardware including the structures disclosed in this disclosure and their structural equivalents, or a combination of one or more of them. Embodiments of the subject matter described in this disclosure can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode and transmit information to suitable receiver apparatus for execution by the data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
The processes and logic flows described in this disclosure can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Computers suitable for executing computer programs include, for example, general and/or special purpose microprocessors, or any other type of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory and/or a random access memory. The basic components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer does not necessarily have such a device. Further, the computer may be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.
Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., an internal hard disk or a removable disk), magneto-optical disks, and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
Although this disclosure contains many specific implementation details, these should not be construed as limiting the scope of any disclosure or of what may be claimed, but rather as merely describing features of particular embodiments of the disclosure. Certain features that are described in this disclosure in the context of separate embodiments can also be implemented in combination in a single embodiment. In other instances, features described in connection with one embodiment may be implemented as discrete components or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. Further, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.
The above description is only for the purpose of illustrating the preferred embodiments of the present disclosure, and is not intended to limit the scope of the present disclosure, which is to be construed as being limited by the appended claims.

Claims (11)

1. A method of searching, the method comprising:
performing feature extraction on a first image to obtain long features of the first image;
inputting the long features into a pre-trained short feature extraction network to obtain short features output by the short feature extraction network and corresponding to the long features;
based on the short features, searching a short feature library to obtain a matched candidate short feature set;
obtaining long features corresponding to each short feature in the candidate short feature set from a long feature library to obtain a candidate long feature set;
and carrying out similarity calculation on the long features and the candidate long feature set, and acquiring target long features matched with the long features from the candidate long feature set.
2. The method of claim 1, wherein the retrieving a matched candidate short feature set from the short feature library based on the short features comprises:
performing similarity calculation between the short features and each short feature in the short feature library to obtain similarity values;
and selecting at least some of the short features from the short feature library to form the candidate short feature set according to the similarity value corresponding to each short feature.
3. The method according to claim 1 or 2, characterized in that the method further comprises: establishing the short feature library and the long feature library, including:
for each candidate image in an image base library, extracting long features of the candidate image;
inputting the long features into the short feature extraction network to obtain short features;
and storing the long features into the long feature library, storing the short features into the short feature library, and storing the mapping relation between the long features and the short features.
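Claim 3's offline library-building step could look like the sketch below. Using a shared row index as the mapping relation between each long feature and its short feature, and passing in the two extractors as plain callables, are assumptions made for illustration.

```python
import numpy as np

def build_libraries(base_images, extract_long, short_net):
    """Build the long and short feature libraries for an image base
    library, storing the mapping between each long feature and the
    short feature derived from it (here: a shared row index)."""
    long_lib, short_lib, mapping = [], [], {}
    for idx, image in enumerate(base_images):
        long_feat = extract_long(image)      # high-dimensional feature
        short_feat = short_net(long_feat)    # compressed feature
        long_lib.append(long_feat)
        short_lib.append(short_feat)
        mapping[idx] = idx                   # short row -> long row
    return np.stack(long_lib), np.stack(short_lib), mapping
```

In a deployment the mapping could equally be an image ID stored alongside both features; the only requirement is that a short-feature hit can be traced back to its long feature.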
4. The method according to any one of claims 1 to 3, further comprising:
extracting sample long features of a training sample image based on a long feature extraction network, and inputting the sample long features into the short feature extraction network to be trained to obtain sample short features;
obtaining a prediction classification result of the training sample image based on the sample short features, and obtaining a loss value according to the prediction classification result and a classification label corresponding to the training sample image;
adjusting network parameters of the short feature extraction network based on the loss value.
5. The method of claim 4, further comprising: and after the training of the short feature extraction network is finished, carrying out fine tuning training on the short feature extraction network.
6. The method of claim 5, wherein the short feature extraction network comprises fully-connected layers, each of which is followed by at least one activation layer;
correspondingly, the performing fine tuning training on the short feature extraction network after the training of the short feature extraction network is completed comprises:
determining the loss value using a cross-entropy function;
and performing fine tuning training on the short feature extraction network by using a triplet loss function based on the loss value.
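The two training stages in claims 4 to 6 — a classification stage driven by cross-entropy loss, then fine tuning with a triplet loss — rest on two standard loss functions, sketched here in plain NumPy. The Euclidean distance and the margin value are illustrative assumptions; the claims do not fix either.

```python
import numpy as np

def cross_entropy(logits, label):
    """Classification loss for the initial training stage:
    negative log-probability of the correct class."""
    logits = logits - logits.max()                      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[label]

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Fine-tuning loss: pull the anchor's short feature toward a
    same-class sample and push it from a different-class sample,
    until the gap between the two distances exceeds the margin."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)
```

The triplet stage matters because classification training alone optimizes class separability, whereas retrieval ranks by pairwise distance; fine tuning aligns the short feature space with the distance metric actually used at search time.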
7. The method according to any one of claims 1 to 6, further comprising:
and acquiring a second image corresponding to the target long feature based on the target long feature, wherein the second image is a retrieval result corresponding to the first image.
8. A retrieval apparatus, characterized in that the apparatus comprises:
the long feature extraction module is used for extracting features of a first image to obtain long features of the first image;
the short feature obtaining module is used for inputting the long features into a pre-trained short feature extraction network to obtain short features which are output by the short feature extraction network and correspond to the long features;
the candidate feature obtaining module is used for retrieving a matched candidate short feature set from a short feature library based on the short features; obtaining long features corresponding to each short feature in the candidate short feature set from a long feature library to obtain a candidate long feature set;
and the retrieval processing module is used for performing similarity calculation between the long features and the candidate long feature set and acquiring, from the candidate long feature set, target long features matching the long features.
9. An electronic device, comprising a memory for storing computer instructions executable on a processor, the processor being configured to implement the method of any one of claims 1 to 7 when executing the computer instructions.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1 to 7.
11. A computer program product comprising a computer program, characterized in that the computer program realizes the method of any of claims 1 to 7 when executed by a processor.
CN202110089222.5A 2021-01-22 2021-01-22 Retrieval method and device and electronic equipment Pending CN112989093A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110089222.5A CN112989093A (en) 2021-01-22 2021-01-22 Retrieval method and device and electronic equipment
PCT/CN2021/125147 WO2022156284A1 (en) 2021-01-22 2021-10-21 Retrieval method and apparatus, and electronic device


Publications (1)

Publication Number Publication Date
CN112989093A 2021-06-18

Family

ID=76344659

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110089222.5A Pending CN112989093A (en) 2021-01-22 2021-01-22 Retrieval method and device and electronic equipment

Country Status (2)

Country Link
CN (1) CN112989093A (en)
WO (1) WO2022156284A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022156284A1 (en) * 2021-01-22 2022-07-28 深圳市商汤科技有限公司 Retrieval method and apparatus, and electronic device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109271545A (en) * 2018-08-02 2019-01-25 深圳市商汤科技有限公司 A kind of characteristic key method and device, storage medium and computer equipment
CN110765292A (en) * 2019-10-24 2020-02-07 重庆紫光华山智安科技有限公司 Image retrieval method, training method and related device
CN110830807A (en) * 2019-11-04 2020-02-21 腾讯科技(深圳)有限公司 Image compression method, device and storage medium
CN110991542A (en) * 2019-12-09 2020-04-10 苏州科达科技股份有限公司 Image similarity determination method, device and equipment and readable storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9740966B1 * 2016-02-05 2017-08-22 International Business Machines Corporation Tagging similar images using neural network
CN108280187B (en) * 2018-01-24 2021-06-01 湖南省瞬渺通信技术有限公司 Hierarchical image retrieval method based on depth features of convolutional neural network
CN112989093A (en) * 2021-01-22 2021-06-18 深圳市商汤科技有限公司 Retrieval method and device and electronic equipment


Also Published As

Publication number Publication date
WO2022156284A1 (en) 2022-07-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40049965

Country of ref document: HK

RJ01 Rejection of invention patent application after publication

Application publication date: 20210618