WO2021168815A1

WO2021168815A1 - Image retrieval method and image retrieval apparatus

Info

Publication number: WO2021168815A1
Application number: PCT/CN2020/077238
Authority: WO
Inventors: 钟林; 宋昆鹏; 路石
Original assignee: 华为技术有限公司
Priority date: 2020-02-28
Filing date: 2020-02-28
Publication date: 2021-09-02
Also published as: CN115176244A

Abstract

The present application provides an image retrieval method and an image retrieval apparatus, which use artificial intelligence. In the technical solution proposed by the present application, one of a base library feature corresponding to an image in an image library, and a feature to be retrieved corresponding to an image to be retrieved is used as a weight of a neural network, and the other one of the base library feature and said feature to be retrieved is used as an input of the neural network, a matrix computation between the input and the weight of the neural network is achieved by using the neural network, and a target image is obtained by retrieval from the image library according to an output of the neural network. The technical solution of the present application can not only realize image retrieval, but also simplify the implementation complexity of image retrieval.

Description

Image retrieval method and image retrieval device

Technical field

This application relates to the field of retrieval, and more specifically, to an image retrieval method and an image retrieval device.

Background art

Retrieval refers to starting from the user's specific information needs, using certain methods and technical means for a specific information collection, and finding relevant information from it according to certain clues and rules. Retrieval has been applied to all walks of life in today's society. For example, image retrieval technology can be applied to tasks such as face recognition, license plate detection or fingerprint recognition.

Image retrieval mainly includes the following steps: first, collecting and processing image resources, extracting image features, and establishing an image feature database; second, acquiring the image to be retrieved, extracting the features of the image, and forming the feature data to be retrieved; then, calculating, based on a similarity algorithm, the similarity between the feature data to be retrieved and the features recorded in the feature database; finally, extracting the records that meet the similarity threshold from the feature database as the retrieval result, and outputting them in descending order of similarity.

Currently, image retrieval can be achieved through artificial intelligence (AI) technology. Artificial intelligence is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a similar way to human intelligence. Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theories.

How to use artificial intelligence to achieve image retrieval, for example in the face of today's massive image data such as face recognition data, license plate detection data, or fingerprint retrieval data, has become an urgent technical problem to be solved.

Summary of the invention

The image retrieval method and image retrieval device provided in this application can implement image retrieval through artificial intelligence methods.

In a first aspect, the present application provides an image retrieval method. The method includes: acquiring a feature to be retrieved, where the feature to be retrieved is a feature of an image to be retrieved; acquiring a base library feature, where the base library feature is a feature of a base library image, and the base library image is an image in a pre-configured image library; using one of the base library feature and the feature to be retrieved as the weight of a neural network, and using the other of the base library feature and the feature to be retrieved as the input of the neural network, to obtain the output of the neural network, where the neural network is used to implement a matrix operation on the input and the weight; and retrieving a target image from the image library according to the output of the neural network.

In the method of the present application, the matrix product of the feature vector to be retrieved and the feature vector of the base library is calculated through a neural network, so as to calculate the similarity between the feature to be retrieved and the base library feature based on the product, thereby realizing image retrieval. In addition, the natural matrix multiplication characteristics of the neural network can be fully utilized, which helps to obtain the similarity between the image to be retrieved and the base library image while reducing the complexity of implementation.

With reference to the first aspect, in a first possible implementation manner, the obtaining the output of the neural network includes: obtaining the output of the neural network through a neural network processor.

In this implementation, the neural network processor is used to perform neural network calculations, which has faster running speed and better retrieval efficiency.

In combination with the first possible implementation manner, in a second possible implementation manner, the neural network processor includes a three-dimensional arithmetic unit, and the minimum duration for the three-dimensional arithmetic unit to perform a multiplication operation between three-dimensional matrices is one clock cycle.

In this implementation manner, the three-dimensional arithmetic unit can perform more addition or multiplication operations in one clock cycle, that is, it runs faster, so the retrieval efficiency is higher.

With reference to the first aspect or any one of the foregoing possible implementation manners, in a third possible implementation manner, the feature to be retrieved is used as the input, and the base library feature is used as the weight.

With reference to the first aspect or any one of the foregoing possible implementation manners, in a fourth possible implementation manner, the neural network includes a fully connected layer.

In a second aspect, the present application provides an image retrieval device, which includes: an acquisition module, configured to acquire a feature to be retrieved, where the feature to be retrieved is a feature of an image to be retrieved, the acquisition module being further configured to acquire a base library feature, where the base library feature is a feature of a base library image, and the base library image is an image in a pre-configured image library; an arithmetic module, configured to use one of the base library feature and the feature to be retrieved as the weight of a neural network, and use the other of the base library feature and the feature to be retrieved as the input of the neural network, to obtain the output of the neural network, where the neural network is used to implement a matrix operation on the input and the weight; and a retrieval module, configured to retrieve a target image from the image library according to the output of the neural network.

With reference to the second aspect, in the first possible implementation manner, the arithmetic module is specifically configured to obtain the output of the neural network through a neural network processor.

In combination with the first possible implementation manner, in the second possible implementation manner, the neural network processor includes a three-dimensional CUBE operation unit, and the minimum time period for the three-dimensional arithmetic unit to perform multiplication operations between three-dimensional matrices is one clock cycle.

With reference to the second aspect or any of the foregoing possible implementation manners, in a third possible implementation manner, the feature vector to be retrieved is used as the input, and the base library feature vector is used as the weight.

With reference to the second aspect or any of the foregoing possible implementation manners, in a fourth possible implementation manner, the neural network includes a fully connected layer.

In a third aspect, the present application provides an image retrieval device, which includes a processor coupled with a memory. The memory is used to store instructions, and the processor is used to execute the instructions stored in the memory, so that the device performs the following operations: acquiring a feature to be retrieved, where the feature to be retrieved is a feature of an image to be retrieved; acquiring a base library feature, where the base library feature is a feature of a base library image, and the base library image is an image in a pre-configured image library; using one of the base library feature and the feature to be retrieved as the weight of a neural network, and using the other of the base library feature and the feature to be retrieved as the input of the neural network, to obtain the output of the neural network, where the neural network is used to implement a matrix operation on the input and the weight; and retrieving a target image from the image library according to the output of the neural network.

With reference to the third aspect, in a first possible implementation manner, the processor includes a neural network processor. The neural network processor is configured to obtain the output of the neural network, where the weight of the neural network is one of the base library feature and the feature to be retrieved, the input of the neural network is the other of the base library feature and the feature to be retrieved, and the neural network is used to implement the matrix operation on the input and the weight.

With reference to the third aspect or any one of the foregoing possible implementation manners, in a third possible implementation manner, the feature to be retrieved is used as the input, and the base library feature is used as the weight.

With reference to the third aspect or any of the foregoing possible implementation manners, in a fourth possible implementation manner, the neural network includes a fully connected layer.

In a fourth aspect, the present application provides a computer-readable medium that stores instructions for device execution, and the instructions are used to implement the method in the first aspect or any one of the possible implementation manners.

In the fifth aspect, this application provides a computer program product containing instructions, which when the computer program product runs on a computer, causes the computer to execute the method in the first aspect or any one of the possible implementation manners.

In a sixth aspect, the present application provides a chip that includes a processor and a data interface. The processor reads instructions stored in a memory through the data interface, and executes the method in the first aspect or any one of its possible implementation manners.

Optionally, as an implementation manner, the chip may further include a memory in which instructions are stored, and the processor is configured to execute the instructions stored in the memory. When the instructions are executed, the processor is configured to execute the method in the first aspect or any one of its possible implementation manners.

In a seventh aspect, the present application provides a computing device. The computing device includes a processor and a memory. The memory stores computer instructions, and the processor executes the computer instructions to implement the method in the first aspect or any one of its possible implementation manners.

Description of the drawings

Fig. 1 is a schematic structural diagram of an image retrieval system according to an embodiment of the present application;

FIG. 2 is a schematic structural diagram of a neural network processor according to an embodiment of the present application;

FIG. 3 is a schematic flowchart of an image retrieval method according to an embodiment of the present application;

Fig. 4 is a schematic structural diagram of an image retrieval device according to an embodiment of the present application;

Fig. 5 is another schematic structural diagram of an image retrieval device according to an embodiment of the present application.

Detailed description

The technical solution in this application will be described below in conjunction with the accompanying drawings.

Fig. 1 is a schematic structural diagram of an image retrieval system according to an embodiment of the present application. As shown in Figure 1, the image retrieval system may include a feature base library module 110, a model conversion module 120, a deep learning platform (DL platform) 130, a query module 140, a retrieval engine module 150, and a sorting module 160.

The feature base library module 110 includes base library features obtained by feature extraction on images in the base library by a feature extraction network, and each image has a corresponding base library feature.

The feature extraction network can adopt a general neural network for feature extraction, or it can be a redesigned neural network that can extract features.

In some examples, the feature extraction network may include a residual network (resnet) and a fully connected layer (FC); in other examples, the feature extraction network may include a VGG16 and a fully connected layer.

The model conversion module 120 is used to convert the feature vector of the base library from the current format into a format that can be loaded by the retrieval neural network. The current format mentioned here generally refers to the format supported by the deep learning platform that constructs the feature extraction network.

For example, when the feature extraction network for extracting base library features is a neural network constructed on the deep learning platform "Tensorflow", and the retrieval neural network is a neural network constructed on the deep learning platform "caffe", the model conversion module 120 is required to convert the base library features from the format supported by "Tensorflow" to the format supported by "caffe".

For example, the model conversion module can convert the base library features from the current format into a format that can be loaded by the retrieval neural network based on the basic software package. An example of a basic software package is the Numpy software package.

The Numpy software package is a Python package for scientific computing. It includes: a powerful N-dimensional array object, where N is a positive integer; a relatively mature function library; tools for integrating C/C++ and Fortran code; and practical linear algebra, Fourier transform, and random number generation functions.

The deep learning platform 130 is used to construct a retrieval neural network. The deep learning platform 130 includes but is not limited to caffe, Tensorflow, Mxnet, MindSpore, etc.

In some implementations, the deep learning platform 130 may be used to construct a neural network that calculates the similarity between the base library feature vector and the feature vector to be retrieved. The dimensions of the weight matrix of the constructed neural network can be determined by the length and the number of the base library feature vectors. For example, when the base library feature vector length is 256 and there are 300,000 base library feature vectors in total, the weight matrix in the constructed neural network can be a three-dimensional matrix of 256*1*300000.

For example, when the cosine formula is used to calculate the similarity between the base library feature vector and the feature vector to be retrieved, the deep learning platform 130 can be used to construct a neural network that computes the inner product of the normalized base library feature vector and the normalized feature vector to be retrieved.
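As a rough illustration of how the weight-matrix shape follows from the base library, the sketch below models the retrieval network as a single bias-free fully connected layer in NumPy. The function name `fc_forward` and the reduced library size are assumptions made for illustration only, and the 256*1*300000 layout from the text is represented here as a plain two-dimensional 256 x N array.

```python
import numpy as np

FEATURE_DIM = 256   # length of each feature vector (from the text)
NUM_BASE = 3000     # assumed: far smaller than 300,000 to keep the demo light

rng = np.random.default_rng(0)

# Each column holds one base library feature vector,
# so the weight matrix has shape FEATURE_DIM x NUM_BASE.
weights = rng.standard_normal((FEATURE_DIM, NUM_BASE)).astype(np.float32)


def fc_forward(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Bias-free fully connected layer: one output per base library vector."""
    return x @ w


# A stand-in for one feature vector to be retrieved.
query = rng.standard_normal(FEATURE_DIM).astype(np.float32)
scores = fc_forward(query, weights)

print(weights.shape, scores.shape)
```

Each entry of `scores` is the inner product of the query with one base library feature vector, so the output length grows with the size of the library while the input length stays fixed.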

The query module 140 is configured to perform feature extraction on the image to be retrieved through a feature extraction network to obtain the feature of the image to be retrieved. The feature of the image to be retrieved is called the feature to be retrieved. The feature extraction network can adopt a general neural network for feature extraction, or it can be a redesigned neural network that can extract features.

In some examples, the feature extraction network may include a residual network and a fully connected layer; in other examples, the feature extraction network may include a VGG16 and a fully connected layer.

Generally speaking, the feature extraction network used for feature extraction of the base library image is the same as the feature extraction network for feature extraction of the image to be retrieved. The same here means that the network structure and network parameters are the same.

The retrieval engine module 150 is mainly used to: use the base library features as the weight of the retrieval neural network, use the features to be retrieved as the input of the retrieval neural network, and perform addition, subtraction, multiplication and/or division operations based on the model structure of the retrieval neural network, so as to obtain the similarity between the base library features and the features to be retrieved.

The retrieval engine module 150 may also perform operations such as normalizing the base library features and the features to be retrieved.

The sorting module 160 is used to obtain the similarities that meet the requirements from the similarities obtained by the retrieval engine module 150.

In some examples, the sorting module 160 is specifically used to: filter the similarities obtained by the retrieval engine module 150 based on a preset threshold to eliminate the lower similarities; sort the remaining similarities in a certain order (for example, from largest to smallest); select the top X similarities from the sorted similarities; and select, from the base library images, the base library images corresponding to the top X similarities. These X base library images are the retrieval result, where X is a positive integer less than the total number of images in the base library.

In other examples, the sorting module 160 may be specifically used to: sort the similarities obtained by the retrieval engine module 150 in a certain order (for example, from largest to smallest); select the top X similarities from the sorted similarities; and select, from the base library images, the base library images corresponding to the top X similarities. These X base library images are the retrieval result, where X is a positive integer less than the total number of images in the base library.
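The two sorting variants described above can be sketched in a few lines of plain Python. The function name `select_top_x` and its parameters are hypothetical; passing a threshold corresponds to the first variant, and omitting it corresponds to the second.

```python
from typing import List, Optional, Tuple


def select_top_x(similarities: List[float], x: int,
                 threshold: Optional[float] = None) -> List[Tuple[int, float]]:
    """Return up to x (index, similarity) pairs, highest similarity first.

    If threshold is given, similarities below it are dropped before sorting.
    """
    candidates = list(enumerate(similarities))
    if threshold is not None:
        candidates = [(i, s) for i, s in candidates if s >= threshold]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:x]


sims = [0.12, 0.91, 0.45, 0.78, 0.30]
print(select_top_x(sims, x=2, threshold=0.4))  # threshold filtering, then top 2
print(select_top_x(sims, x=3))                 # sorting only, then top 3
```

The returned positions stand in for the index numbers of the base library images, so the caller can look up the corresponding images directly.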

It is understandable that the architecture of the image retrieval system shown in FIG. 1 is only an example, and the image retrieval system to which the image retrieval method proposed in this application can be applied may include more or fewer modules. For example, if the neural network used to extract the features of the base library and the retrieval neural network can run on the same deep learning platform, the image retrieval system to which the image retrieval method of this application can be applied may not have the model conversion module 120. For another example, the image retrieval system to which the image retrieval method of the present application can be applied may include a neural network for extracting features of the base library.

An exemplary structure diagram of an image retrieval device according to an embodiment of the present application is shown in FIG. 2. The image retrieval device 200 shown in FIG. 2 may include a main processor 210, a memory 220, and a neural network processor 230.

The main processor 210 may be a central processing unit (CPU). The main processor 210 may also be referred to as a host CPU (Host CPU).

The functions of the feature base library module 110, the model conversion module 120, the deep learning platform 130, the query module 140, and the sorting module 160 may be implemented by the main processor 210. The function of the retrieval engine module 150 can be implemented by the neural network processor 230.

The memory 220 can store the instructions corresponding to the feature base library module 110, the model conversion module 120, the deep learning platform 130, the query module 140, the retrieval engine module 150, and the sorting module 160, as well as the base library features and the features to be retrieved, and can also store the base library images and the images to be retrieved.

Neural-network processing unit (NPU) 230 is mainly used to complete numerical operations of addition, subtraction, multiplication, and division required for network inference. For example, the NPU 230 completes the multiplication and accumulation operations required for network inference. The neural network processor 230 may be mounted on the main CPU as a co-processor, and the Host CPU can allocate tasks.

The neural network processor 230 may include an input memory 201, a weight memory 202, an arithmetic circuit 203, a controller 204, a storage unit access controller 205, a unified memory 206, and an instruction fetch buffer 209.

The unified memory 206, the input memory 201, the weight memory 202, and the instruction fetch buffer 209 are all on-chip memories.

The storage unit access controller (direct memory access controller, DMAC) 205 is mainly used to transfer the input data and weight data in the memory 220 to the unified memory 206.

Further, the DMAC 205 can also be used to transfer weight data from the unified memory 206 to the weight memory 202, transfer input data from the unified memory 206 to the input memory 201, and transfer instructions to the instruction fetch buffer 209.

The controller 204 executes the instructions stored in the instruction fetch buffer 209, and controls the arithmetic circuit 203 to perform operations on the weights stored in the weight memory 202 and the input data in the input memory 201.

The NPU 230 executes corresponding instructions through the controller 204, and controls the arithmetic circuit 203 to extract the matrix data in the weight memory 202 and the input memory 201 and perform matrix operations.

In some implementations, the arithmetic circuit 203 may include a three-dimensional cube arithmetic unit. The arithmetic circuit 203 may also be a one-dimensional systolic array, a two-dimensional systolic array, or another electronic circuit capable of performing mathematical operations such as multiplication and addition.

For example, suppose there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit 203 fetches the data corresponding to matrix B from the weight memory 202 and caches it on each arithmetic unit in the arithmetic circuit. The arithmetic circuit 203 then fetches the data of matrix A from the input memory 201 and performs a matrix operation with matrix B, and the partial result or final result of the obtained matrix is stored in the unified memory 206.
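The idea of partial results being accumulated into a final output matrix can be illustrated in software. The following is a minimal NumPy sketch, not a model of the actual hardware dataflow: the function name `tiled_matmul` and the tile size are assumptions, and the accumulator array merely plays the role that the text assigns to the unified memory.

```python
import numpy as np


def tiled_matmul(a: np.ndarray, b: np.ndarray, tile: int = 16) -> np.ndarray:
    """Compute C = A @ B slice by slice, accumulating partial results in C."""
    m, k = a.shape
    k2, n = b.shape
    if k != k2:
        raise ValueError("inner dimensions must match")
    c = np.zeros((m, n), dtype=np.float64)
    for k0 in range(0, k, tile):
        # Multiply one slice of A by the matching slice of B and add the
        # partial product to the running result.
        c += a[:, k0:k0 + tile] @ b[k0:k0 + tile, :]
    return c


rng = np.random.default_rng(3)
a_demo = rng.standard_normal((4, 40))   # input matrix A (toy size)
b_demo = rng.standard_normal((40, 6))   # weight matrix B (toy size)
c_demo = tiled_matmul(a_demo, b_demo)

print(np.allclose(c_demo, a_demo @ b_demo))
```

After the last slice has been processed, the accumulator holds the final result; before that, it holds a partial result.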

Fig. 3 is an exemplary flowchart of an image retrieval method according to an embodiment of the application. As shown in FIG. 3, the method may include S310 to S340.

S310: Acquire a base library feature, where the base library feature is a feature of an image in an image library.

The images in the image library can be called base library images. Normally, the base library images are collected in advance.

The images in the image library can be numbered, with each image assigned an index number that identifies it. After the features of the images in the image library are acquired, an index number may also be assigned to each base library feature, and the index number of a base library feature may be the same as the index number of the image corresponding to that base library feature.

The images in the image library may include one or more of images such as a face image, a fingerprint image, a license plate image, a vehicle image, and a human image. The images in the image library can be updated as required, such as adding, deleting or replacing the images in it.

In some implementations, each image in the image library may be input to a neural network for feature extraction, and the extracted features are called base library features. The feature extraction network can adopt a general neural network for feature extraction, or it can be a redesigned neural network that can extract features. For example, when the image library contains 300,000 base images, the neural network used for feature extraction can include Resnet50 or VGG16. In other implementations, the base library features can be received or copied directly from other devices or systems.

One manifestation of the base library feature is a matrix. If the matrix corresponding to the base library feature is a one-dimensional matrix, the manifestation of the base library feature is a vector, which can be called a base library feature vector. In an example, a base library feature vector can be data of 256 dimensions, each dimension being a 32-bit floating point number (fp32).
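To give a sense of scale, the storage needed for such feature vectors can be worked out directly. The sketch below assumes the library size of 300,000 that the text uses as an example elsewhere; the constant names are illustrative.

```python
FEATURE_DIM = 256      # dimensions per base library feature vector (from the text)
BYTES_PER_FP32 = 4     # a 32-bit floating point number occupies 4 bytes
NUM_BASE = 300_000     # assumed library size, taken from the example in the text

bytes_per_vector = FEATURE_DIM * BYTES_PER_FP32
total_bytes = bytes_per_vector * NUM_BASE

print(bytes_per_vector)       # 1024
print(total_bytes)            # 307200000
print(total_bytes / 2**20)    # 292.96875 (MiB)
```

Under these assumptions, each feature vector takes 1 KiB and the whole set of base library features takes roughly 293 MiB.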

An example of all the acquired features of the base library is the feature base library module 110 in the image retrieval system shown in FIG. 1.

If the format of the base library feature obtained above is inconsistent with the format of the weight file of the retrieval neural network in S330, the base library feature needs to be converted from its current format (which can be called the initial format) into a format that can be read by the retrieval neural network (which can be called the target format).

For example, the model conversion module 120 in the image retrieval system shown in FIG. 1 converts the base library features into a format that can be supported by the neural network processor. Specifically, the initial format file of the base library features can be read through the basic software package Numpy, and the weight data of the corresponding field in the weight file of the retrieval neural network can be replaced with the base library features from the initial format file. The file obtained after the replacement is the weight file of the retrieval neural network.
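The replacement step can be sketched with NumPy alone. The file formats here are assumptions made purely for illustration: the base library features are taken to be a `.npy` array of shape (N, D), and the weight file is taken to be a `.npz` archive of named arrays. Real Caffe or Tensorflow weight files use different formats, and the function name `replace_weight_field` and the field name `fc_weight` are hypothetical.

```python
import os
import tempfile

import numpy as np


def replace_weight_field(feature_path, weight_path, out_path, field="fc_weight"):
    """Load base library features and write them into one field of a weight file.

    Assumes (hypothetically) a .npy feature array of shape (N, D) and a .npz
    weight archive of named arrays.
    """
    features = np.load(feature_path).astype(np.float32)
    with np.load(weight_path) as archive:
        weights = {name: archive[name] for name in archive.files}
    if field not in weights:
        raise KeyError(f"weight file has no field named {field!r}")
    weights[field] = features.T   # stored as D x N: one column per feature vector
    np.savez(out_path, **weights)


with tempfile.TemporaryDirectory() as tmp:
    feat_file = os.path.join(tmp, "base_features.npy")
    init_file = os.path.join(tmp, "retrieval_init.npz")
    out_file = os.path.join(tmp, "retrieval_weights.npz")

    # Three toy feature vectors of length 4, and an initialized weight file.
    np.save(feat_file, np.arange(12, dtype=np.float32).reshape(3, 4))
    np.savez(init_file, fc_weight=np.zeros((4, 3), dtype=np.float32))

    replace_weight_field(feat_file, init_file, out_file)
    with np.load(out_file) as result:
        converted = result["fc_weight"]

print(converted.shape)
```

Only the named field is overwritten; any other arrays in the archive are carried over unchanged.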

S320: Acquire a feature to be retrieved, where the feature to be retrieved is a feature of an image to be retrieved.

For example, the query module 140 in the image retrieval system shown in FIG. 1 performs feature extraction on the picture to be retrieved to obtain the feature of the picture to be retrieved. The feature of the picture to be retrieved can be called the feature to be retrieved, and one form of the feature to be retrieved is a vector, which can be called the feature vector to be retrieved.

In an example, a feature vector to be retrieved may be data of 256 dimensions, each dimension being a 32-bit floating point number (fp32).

The image to be retrieved may include a face image, a fingerprint image, a license plate image, a vehicle image, a human image, and so on. There may be a single image to be retrieved or a batch of multiple images to be retrieved.

The images to be retrieved can be numbered, that is, each image to be retrieved is assigned an index number for identifying each image to be retrieved. After acquiring the feature to be retrieved of the image to be retrieved, an index number can also be assigned to each feature to be retrieved, and the index number of the feature to be retrieved can be the same as the index number of the corresponding image to be retrieved.

S330: Use one of the base library feature and the feature to be retrieved as the weight of the neural network, and use the other of the base library feature and the feature to be retrieved as the input of the neural network, to obtain the output of the neural network, where the neural network is used to implement the matrix operation between the input and the weight. This neural network can be called a retrieval neural network.

For example, the retrieval engine module 150 in the image retrieval system shown in FIG. 1 is used to implement neural network inference and obtain the output of the neural network. Network inference means calling the underlying hardware (such as a neural network processor) to complete the addition, subtraction, multiplication, and/or division of the input and the weight.

In some possible implementations, the base library feature can be used as the weight of the neural network, and the feature to be retrieved can be used as the input of the neural network; in other implementations, the feature to be retrieved can be used as the weight of the neural network, and the base library feature can be used as the input of the neural network.
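A small NumPy sketch, with toy sizes chosen only for illustration, shows that the two assignments of roles produce the same similarity values, differing only in how the output is laid out.

```python
import numpy as np

rng = np.random.default_rng(1)
base = rng.standard_normal((5, 8))      # 5 base library feature vectors of length 8
queries = rng.standard_normal((2, 8))   # 2 feature vectors to be retrieved

# Option 1: base library features as the weight, features to be retrieved as the input.
out_base_as_weight = queries @ base.T   # shape (2, 5): one row per query

# Option 2: features to be retrieved as the weight, base library features as the input.
out_query_as_weight = base @ queries.T  # shape (5, 2): one row per base library feature

print(out_base_as_weight.shape, out_query_as_weight.shape)
print(np.allclose(out_base_as_weight, out_query_as_weight.T))
```

One output is simply the transpose of the other, so the choice can be made on practical grounds, such as which set of features changes less often and is therefore cheaper to keep loaded as the weight.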

In some implementations, before S330 is performed, each base library feature and each feature to be retrieved may be normalized. The base library features can be normalized after the feature base library module 110 obtains them, after the model conversion module 120 performs the conversion, or before the retrieval engine module 150 performs inference; the features to be retrieved can be normalized after the query module 140 extracts them.

The model structure of the neural network can be constructed based on the formula for calculating the similarity between the base library features and the features to be retrieved. The following takes cosine similarity as the measure of similarity between the feature vector to be retrieved and the base library feature vector as an example, to introduce the relationship between the model structure of the neural network and the formula for calculating the similarity between the base library feature vector and the feature vector to be retrieved.

The cosine similarity calculation formula is as follows:

$$\cos\theta^k = \frac{A^T \cdot W^k}{\|A\| \, \|W^k\|} = \frac{\sum_i A_i W_i^k}{\sqrt{\sum_i A_i^2} \, \sqrt{\sum_i (W_i^k)^2}}, \quad k = 1, 2, \ldots, n$$

Here, $n$ indicates that there are $n$ base library feature vectors; $A_i$ represents the $i$-th item of the feature vector to be retrieved; $W_i^k$ represents the $i$-th item of the $k$-th base library feature vector; $A^T$ represents the transpose of the feature vector to be retrieved; $\|A\|$ represents the 2-norm of the feature vector $A$ to be retrieved, that is, the square root of the sum of the squares of its elements; $\|W^k\|$ represents the 2-norm of the $k$-th base library feature vector $W^k$, that is, the square root of the sum of the squares of its elements; and $\cos\theta^k$ represents the cosine similarity between the feature vector to be retrieved and the $k$-th base library feature vector. The closer $\cos\theta^k$ is to 1, the more similar the feature vector to be retrieved is to the $k$-th base library feature vector.

Based on the above-mentioned cosine similarity calculation formula, a retrieval neural network for calculating $A^T \cdot W^k$ can be constructed, that is, a retrieval neural network for realizing the matrix multiplication of $A$ and $W^k$ can be constructed.

In an example, a retrieval neural network can be constructed, and the retrieval neural network can include a network structure similar to a fully connected layer in Tensorflow or Caffe, and initialize the weights in the network structure, that is, initialize the parameters in the network structure.

After the retrieval neural network is determined, the base library feature is used as the weight of the retrieval neural network and the feature to be retrieved is used as the input of the retrieval neural network, and the output of the retrieval neural network is obtained. In an example, the main processor 210 can load the base library features obtained in S310 into the weight memory 202 of the neural network processor 230, load the features to be retrieved obtained in S320 into the input memory 201 of the neural network processor 230, and, through the controller 204, control the arithmetic circuit 230 to perform network inference, that is, to calculate the value of $A^{T} \cdot W^{k}$.
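The step above can be sketched in plain Python as a bias-free fully connected layer whose weights are the base library features. This is only an illustrative sketch: the function name `fully_connected`, the variable names, and the 3-dimensional feature values are assumptions made here and are not taken from the application.

```python
def fully_connected(inputs, weights):
    """Bias-free fully connected layer.

    inputs  : list of feature vectors to be retrieved (one row per feature)
    weights : list of base library feature vectors (one row per feature)
    Returns a matrix whose entry [j][k] is the dot product of input j and weight k.
    """
    return [
        [sum(a * w for a, w in zip(x, w_k)) for w_k in weights]
        for x in inputs
    ]


# Hypothetical 3-dimensional features, chosen only for illustration.
base_library_features = [
    [1.0, 0.0, 0.0],  # W^1
    [0.0, 1.0, 0.0],  # W^2
]
features_to_retrieve = [
    [0.6, 0.8, 0.0],  # A
]

# Base library features act as the weights, the feature to be retrieved as the input.
output = fully_connected(features_to_retrieve, base_library_features)
print(output)
```

Each entry of `output` corresponds to one value of $A^{T} \cdot W^{k}$; on a neural network processor the same product would be produced by network inference rather than by Python loops.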

S340: Retrieve a target image from the image library according to the output of the neural network.

In the embodiment of the present application, after the output of the neural network is obtained, the similarity between the base library feature and the feature to be retrieved can be determined based on the output of the neural network, and the target image can be retrieved from the image library according to the similarity.

In some implementations, if each base library feature vector and each feature vector to be retrieved are normalized before S330, then $\|A\|$ is 1 and $\|W^{k}\|$ is also 1. Therefore, the cosine similarity calculation formula between the base library feature vector and the feature vector to be retrieved can be simplified as $\cos\theta^{k} = A^{T} \cdot W^{k}$. In this case, the output of the neural network can be used directly as the similarity between the base library feature and the feature to be retrieved. For example, the value of $A^{T} \cdot W^{k}$ output by the neural network processor 230 can be used as the similarity between the k-th base library feature and the feature to be retrieved.
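A minimal sketch of the normalized case follows, assuming 2-norm normalization is applied to both vectors before the dot product. The helper names `l2_normalize` and `dot` and the 2-dimensional values are hypothetical.

```python
import math


def l2_normalize(vector):
    """Scale a vector so that its 2-norm is 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(a, b):
    """Dot product, i.e. A^T · W^k for two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical 2-dimensional features, normalized before the matrix operation.
a = l2_normalize([3.0, 4.0])    # feature vector to be retrieved
w_k = l2_normalize([4.0, 3.0])  # k-th base library feature vector

# With ||A|| = ||W^k|| = 1, the raw output already equals the cosine similarity.
similarity = dot(a, w_k)
print(round(similarity, 6))
```

Normalizing once, when the features are produced, means that no division is needed after inference.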

In other implementations, if the base library feature vector and the feature vector to be retrieved are not normalized before S330, the output of the neural network can be further processed to obtain the similarity between the base library feature and the feature to be retrieved. For example, the output $A^{T} \cdot W^{k}$ of the neural network processor 230 can be divided by $\|A\| \cdot \|W^{k}\|$, and the obtained quotient can be used as the similarity between the k-th base library feature and the feature to be retrieved.

The value of $\|A\|$ and the value of $\|W^{k}\|$ can also be calculated based on a neural network. For example, A is used as both the input and the weight of the neural network, and a matrix operation is performed on the input and the weight through the neural network; as another example, $W^{k}$ is used as both the input and the weight of the neural network, and a matrix operation is performed on the input and the weight through the neural network.
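The non-normalized case can be sketched as follows. Using a vector as both input and weight yields its squared 2-norm; taking the square root afterwards is an assumption made here to complete the calculation, since the application does not spell out that step. The function name `dot` and the values are hypothetical.

```python
import math


def dot(a, b):
    """The matrix operation of input and weight, reduced here to one dot product."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical, non-normalized 2-dimensional features.
a = [3.0, 4.0]    # feature vector to be retrieved
w_k = [4.0, 3.0]  # k-th base library feature vector

raw_output = dot(a, w_k)           # A^T · W^k
norm_a = math.sqrt(dot(a, a))      # A as both input and weight gives ||A||^2
norm_w = math.sqrt(dot(w_k, w_k))  # W^k as both input and weight gives ||W^k||^2

# Post-processing: divide the raw output by ||A|| * ||W^k||.
similarity = raw_output / (norm_a * norm_w)
print(similarity)
```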

The number of similarities output by the neural network is determined by the number of base library features and the number of features to be retrieved that are supplied. For example, if n base library features are used as weights and m features to be retrieved are used as input, the neural network can output m*n similarities, where m and n are positive integers.

For example, when 300,000 base library feature vectors are used as the weights of the retrieval neural network and 16 feature vectors to be retrieved are used as the input of the retrieval neural network (a batch of 16), the output of the retrieval neural network is 16*300,000 similarity values, where the value range of each similarity value can be [0,1].
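The batch case can be written in matrix form as a sketch. The symbols X, S, $A^{(j)}$ and the feature dimension d are notation assumed here for illustration and do not appear in the application. Stacking the m features to be retrieved as the rows of X and the n base library features as the rows of W gives all m*n outputs in a single product:

```latex
S = X W^{T}, \qquad
X \in \mathbb{R}^{m \times d},\quad
W \in \mathbb{R}^{n \times d},\quad
S \in \mathbb{R}^{m \times n}, \qquad
S_{jk} = \left(A^{(j)}\right)^{T} \cdot W^{k}
```

With m = 16 and n = 300,000, S holds 16 × 300,000 = 4,800,000 values, one for each pair of a feature to be retrieved and a base library feature.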

In some examples, the neural network processor may include a cube computing unit. In this way, the retrieval neural network can rely on the powerful cube matrix computing capability of the neural network processor to calculate the similarity between the base library features and the features to be retrieved more efficiently.

For example, a single cube operation unit can usually complete a 16*16*16-dimensional matrix multiplication, or an even larger one, in one clock cycle. Under the same main frequency, this can increase computational efficiency by a factor of 8192 compared with matrix calculations implemented on a traditional image processor or a traditional CPU. This is because a traditional CPU can complete only one multiplication or one addition in one clock cycle, so completing the multiply-add operations on 16*16*16 data requires 16*16*16*2 = 8192 clock cycles, whereas a cube unit can complete the multiply-add operations on 16*16*16 data in one clock cycle.
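The 8192 figure can be checked by counting scalar operations. The sketch below uses a simplified cost model that follows the text (one multiplication or one addition per clock cycle on the scalar side, one cycle for the whole block on the cube side); the function name `count_scalar_operations` is hypothetical.

```python
def count_scalar_operations(size):
    """Count the multiplications and additions of a size x size x size multiply-accumulate."""
    multiplications = 0
    additions = 0
    for _row in range(size):
        for _col in range(size):
            for _inner in range(size):
                multiplications += 1  # one product of two scalars
                additions += 1        # one accumulation into the running sum
    return multiplications + additions


# Scalar model: one operation per clock cycle.
scalar_cycles = count_scalar_operations(16)
# Cube unit: the text states that the whole 16*16*16 block takes one clock cycle.
cube_cycles = 1

speedup = scalar_cycles // cube_cycles
print(scalar_cycles, speedup)
```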

Even in the case of inputting multiple channels and batches of features to be retrieved, the similarity calculation between the base library features and the features to be retrieved through neural network inference can make the calculation speed reach the millisecond level.

In some implementations, after the similarities between the base library features and the features to be retrieved are obtained, the similarity values can be filtered to keep the similarities higher than a preset threshold, together with the index numbers of the corresponding images to be retrieved and base library images. These similarities are then sorted in descending order, and the top K similarities and the index numbers of their corresponding images to be retrieved and base library images are selected. Finally, these K similarities and the corresponding images to be retrieved and base library images can be output, thereby realizing image retrieval, where K is a positive integer. The K base library images are the retrieved target images, which correspond one-to-one to the K images to be retrieved.

The threshold can be set as required. Generally speaking, the threshold can be set as large as possible, because if the threshold is set lower, more similarities are retained, which has a certain impact on the efficiency of the subsequent sorting and of selecting the top K similarities and the K base library images.
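The filtering, sorting and top-K selection described above can be sketched as follows. The function name `top_k_matches`, the tuple layout, and the similarity values are assumptions made for illustration.

```python
def top_k_matches(similarities, threshold, k):
    """Keep similarities above the threshold, sort them in descending order, return the top k.

    similarities[q][b] is the similarity between image to be retrieved q
    and base library image b. Each result is (similarity, q, b).
    """
    candidates = [
        (score, query_index, base_index)
        for query_index, row in enumerate(similarities)
        for base_index, score in enumerate(row)
        if score > threshold
    ]
    candidates.sort(key=lambda candidate: candidate[0], reverse=True)
    return candidates[:k]


# Hypothetical output of the retrieval network: 2 images to be retrieved, 3 base library images.
similarities = [
    [0.91, 0.20, 0.75],
    [0.10, 0.88, 0.95],
]

matches = top_k_matches(similarities, threshold=0.7, k=3)
for score, query_index, base_index in matches:
    print(score, query_index, base_index)
```

A higher threshold leaves fewer candidates for the sort, which is the efficiency effect the text describes.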

In the image retrieval method of this application, both the base library features and the features to be retrieved can be obtained by neural networks. Using a neural network to also calculate the similarity between the base library features and the features to be retrieved can reduce repeated movement of these feature data, which reduces the time consumed by data movement or transmission. In addition, the image retrieval method of the present application can reduce retrieval time, thereby improving retrieval efficiency and user experience. Furthermore, the features to be retrieved can be input to the retrieval neural network in batches, that is, multiple features to be retrieved are input at once, which can further improve retrieval efficiency.

Fig. 4 is an exemplary structure diagram of the image retrieval device of the present application. The device 400 includes an acquisition module 410, an arithmetic module 420, and a retrieval module 430. The device 400 can implement the method shown in FIG. 3 described above.

For example, the acquisition module 410 is used to perform S310 and S320, the arithmetic module 420 is used to perform S330, and the retrieval module 430 is used to perform S340.
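The division of work among the three modules can be sketched as below. All class and method names are hypothetical. The acquisition step simply returns features supplied in advance, and the retrieval step picks the single best base library index for each feature to be retrieved, which is a simplification of S340.

```python
class AcquisitionModule:
    """Stands in for module 410 (S310, S320): provides both kinds of features."""

    def __init__(self, base_library_features, features_to_retrieve):
        self._base_library_features = base_library_features
        self._features_to_retrieve = features_to_retrieve

    def base_library_features(self):
        return self._base_library_features

    def features_to_retrieve(self):
        return self._features_to_retrieve


class ArithmeticModule:
    """Stands in for module 420 (S330): matrix operation of input and weight."""

    def run(self, inputs, weights):
        return [
            [sum(a * w for a, w in zip(x, w_k)) for w_k in weights]
            for x in inputs
        ]


class RetrievalModule:
    """Stands in for module 430 (S340): index of the most similar base library image."""

    def run(self, output):
        return [max(range(len(row)), key=row.__getitem__) for row in output]


acquisition = AcquisitionModule(
    base_library_features=[[1.0, 0.0], [0.0, 1.0]],
    features_to_retrieve=[[0.1, 0.9]],
)
output = ArithmeticModule().run(
    acquisition.features_to_retrieve(), acquisition.base_library_features()
)
target_indices = RetrievalModule().run(output)
print(target_indices)
```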

In some implementations, the device 400 may be deployed in a cloud environment, which is an entity that uses basic resources to provide cloud services to users in a cloud computing mode. The cloud environment includes a cloud data center and a cloud service platform. The cloud data center includes a large number of basic resources (including computing resources, storage resources, and network resources) owned by a cloud service provider. The computing resources included in the cloud data center can be a large number of computing devices (for example, servers). The device 400 may be a server for image retrieval in a cloud data center. The device 400 may also be a virtual machine for image retrieval created in a cloud data center. The device 400 may also be a software device deployed on a server or a virtual machine in a cloud data center. The software device is used for image retrieval, and may be deployed in a distributed manner on multiple servers, on multiple virtual machines, or on virtual machines and servers. For example, the acquisition module 410, the arithmetic module 420, and the retrieval module 430 in the device 400 may be distributed on multiple servers, on multiple virtual machines, or on virtual machines and servers.

The device 400 can be abstracted by a cloud service provider into an image retrieval cloud service on a cloud service platform and provided to the user. After the user purchases the cloud service on the cloud service platform, the cloud environment uses the cloud service to provide the user with image retrieval. The user can upload the image to be retrieved to the cloud environment through an application program interface (API) or through the web interface provided by the cloud service platform. The device 400 receives the image to be retrieved and performs image retrieval based on it, and the finally obtained target image is returned by the device 400 to the edge device where the user is located.

When the apparatus 400 is a software apparatus, the apparatus 400 may also be separately deployed on a computing device in any environment.

The present application also provides an apparatus 500 as shown in FIG. 5. The apparatus 500 includes a processor 502, a communication interface 503, and a memory 504. An example of the device 500 is a chip. Another example of the apparatus 500 is a computing device.

The processor 502, the memory 504, and the communication interface 503 may communicate through a bus. Executable code is stored in the memory 504, and the processor 502 reads the executable code in the memory 504 to execute the corresponding method. The memory 504 may also include other software modules required for running processes, such as an operating system. The operating system can be LINUX™, UNIX™, WINDOWS™, etc.

For example, the executable code in the memory 504 is used to implement the method shown in FIG. 3, and the processor 502 reads the executable code in the memory 504 to execute the method shown in FIG. 3.

The processor 502 may be a central processing unit (CPU), or an exemplary structure of the processor 502 is shown in FIG. 2. The memory 504 may include a volatile memory, such as a random access memory (RAM). The memory 504 may also include a non-volatile memory (NVM), such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state disk (SSD).

A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraint conditions of the technical solution. Professionals and technicians can use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of this application.

Those skilled in the art can clearly understand that, for the convenience and conciseness of the description, the specific working process of the system, device and unit described above can refer to the corresponding process in the foregoing method embodiment, which is not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed system, device, and method can be implemented in other ways. For example, the device embodiments described above are only illustrative. For example, the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the displayed or discussed mutual coupling, direct coupling, or communication connection may be indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.

If the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions used to make a computer device (which may be a personal computer, a server, a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media that can store program code.

The above are only specific implementations of this application, but the protection scope of this application is not limited to them. Any change or substitution that a person skilled in the art can easily think of within the technical scope disclosed in this application should be covered within the protection scope of this application. Therefore, the protection scope of this application should be subject to the protection scope of the claims.

Claims

1. An image retrieval method, characterized in that it comprises:

Acquiring a feature to be retrieved, where the feature to be retrieved is a feature of the image to be retrieved;

Acquiring a base library feature, the base library feature being a feature of a base library image, and the base library image being an image in a pre-configured image library;

Using one of the base library feature and the feature to be retrieved as the weight of a neural network, and using the other one of the base library feature and the feature to be retrieved as the input of the neural network, to obtain the output of the neural network, where the neural network is used to implement a matrix operation of the input and the weight;

The target image is retrieved from the image library according to the output of the neural network.
2. The image retrieval method according to claim 1, wherein said obtaining the output of said neural network comprises:

The output of the neural network is obtained by the neural network processor.
3. The image retrieval method according to claim 2, wherein the neural network processor comprises a three-dimensional arithmetic unit, and the minimum duration for the three-dimensional arithmetic unit to perform a multiplication operation between three-dimensional matrices is one clock cycle.
4. The image retrieval method according to any one of claims 1 to 3, wherein the feature to be retrieved is used as the input, and the base library feature is used as the weight.
5. The image retrieval method according to any one of claims 1 to 4, wherein the neural network includes a fully connected layer.
6. An image retrieval device, characterized in that it comprises:

An acquisition module for acquiring features to be retrieved, where the features to be retrieved are features of the image to be retrieved;

The acquisition module is also used to acquire base library features, where the base library features are features of base library images, and the base library images are images in a pre-configured image library;

An arithmetic module, configured to use one of the base library feature and the feature to be retrieved as the weight of the neural network, and use the other one of the base library feature and the feature to be retrieved as the input of the neural network, to obtain the output of the neural network, where the neural network is used to implement the matrix operation of the input and the weight;

The retrieval module is used to retrieve the target image from the image library according to the output of the neural network.
7. The image retrieval device according to claim 6, wherein the arithmetic module is specifically configured to obtain the output of the neural network through a neural network processor.
8. The image retrieval device according to claim 7, wherein the neural network processor comprises a three-dimensional arithmetic unit, and the minimum duration for the three-dimensional arithmetic unit to perform multiplication operations between three-dimensional matrices is one clock cycle.
9. The image retrieval device according to any one of claims 6 to 8, wherein the feature to be retrieved is used as the input, and the base library feature is used as the weight.
10. The image retrieval device according to any one of claims 6 to 9, wherein the neural network includes a fully connected layer.
11. An image retrieval device, characterized by comprising: a processor coupled with a memory;

The memory is used to store instructions;

The processor is configured to execute the instructions stored in the memory, so that the device executes the method according to claim 1.
12. The device according to claim 11, wherein the processor comprises a neural network processor;

Wherein, the neural network processor is configured to: obtain the output of the neural network, where the weight of the neural network is one of the base library feature and the feature to be retrieved, the input of the neural network is the other one of the base library feature and the feature to be retrieved, and the neural network is used to implement the matrix operation of the input and the weight.
13. The device according to claim 12, wherein the neural network processor comprises a three-dimensional (CUBE) arithmetic unit, and the minimum duration for the three-dimensional arithmetic unit to perform multiplication operations between three-dimensional matrices is one clock cycle.
14. A computer-readable medium, characterized by comprising instructions, which when run on a processor, cause the processor to execute the method according to any one of claims 1 to 5.