WO2021168815A1 - Image retrieval method and image retrieval apparatus - Google Patents

Image retrieval method and image retrieval apparatus

Info

Publication number
WO2021168815A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
feature
image
retrieved
library
Prior art date
Application number
PCT/CN2020/077238
Other languages
French (fr)
Chinese (zh)
Inventor
钟林
宋昆鹏
路石
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to CN202080097508.1A (CN115176244A)
Priority to PCT/CN2020/077238 (WO2021168815A1)
Publication of WO2021168815A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G06F16/535 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology

Definitions

  • This application relates to the field of retrieval, and more specifically, to an image retrieval method and an image retrieval device.
  • Retrieval refers to starting from the user's specific information needs, using certain methods and technical means for a specific information collection, and finding relevant information from it according to certain clues and rules. Retrieval has been applied to all walks of life in today's society. For example, image retrieval technology can be applied to tasks such as face recognition, license plate detection or fingerprint recognition.
  • Image retrieval mainly includes the following steps: first, collecting and processing image resources, extracting image features, and establishing an image feature database; second, acquiring the image to be retrieved, extracting its features, and forming the feature data to be retrieved; then, using a similarity algorithm to calculate the similarity between the feature data to be retrieved and the features recorded in the feature database; finally, extracting the records that meet the similarity threshold from the feature database as the retrieval result and outputting them in descending order of similarity.
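  • The retrieval steps described above can be sketched as follows. This is a minimal illustration, not the method of this application: feature extraction is assumed to have already happened, cosine similarity is assumed as the similarity algorithm, and the function name `retrieve`, the toy feature values, and the threshold are hypothetical.

```python
import numpy as np

def retrieve(query_feature, feature_db, threshold):
    """Return (record index, similarity) pairs meeting the threshold, most similar first."""
    # Cosine similarity between the query and every record in the feature database.
    q = query_feature / np.linalg.norm(query_feature)
    db = feature_db / np.linalg.norm(feature_db, axis=1, keepdims=True)
    sims = db @ q
    # Keep only records that meet the threshold, then sort in descending order.
    keep = np.flatnonzero(sims >= threshold)
    order = keep[np.argsort(-sims[keep])]
    return [(int(i), float(sims[i])) for i in order]

# Toy feature database: 4 records with 3-dimensional features (hypothetical values).
feature_db = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0],
                       [1.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])
query = np.array([1.0, 0.1, 0.0])
results = retrieve(query, feature_db, threshold=0.5)
```

  • Records 0 and 2 point in roughly the same direction as the query, so they are the only ones returned, with record 0 ranked first.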
  • Artificial intelligence is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a similar way to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theories.
  • the image retrieval method and image retrieval device provided in this application can implement image retrieval through artificial intelligence methods.
  • the present application provides an image retrieval method, the method includes: acquiring features to be retrieved, where the features to be retrieved are features of the image to be retrieved; acquiring base library features, where the base library features are features of a base library image, and the base library image is an image in a pre-configured image library; using one of the base library feature and the feature to be retrieved as the weight of a neural network and the other as the input of the neural network to obtain the output of the neural network, where the neural network is used to implement a matrix operation on the input and the weight; and retrieving the target image from the image library according to the output of the neural network.
  • the matrix product of the feature vector to be retrieved and the feature vector of the base library is calculated through a neural network, so as to calculate the similarity between the feature to be retrieved and the base library feature based on the product, thereby realizing image retrieval.
  • the natural matrix multiplication characteristics of the neural network can be fully utilized, which helps to obtain the similarity between the image to be retrieved and the base library image while reducing the complexity of implementation.
  • the obtaining the output of the neural network includes: obtaining the output of the neural network through a neural network processor.
  • the neural network processor is used to perform neural network calculations, which has faster running speed and better retrieval efficiency.
  • the neural network processor includes a three-dimensional arithmetic unit, and the minimum duration for the three-dimensional arithmetic unit to perform multiplication operations between three-dimensional matrices is one clock cycle.
  • the three-dimensional arithmetic unit can perform more addition operations or multiplication operations in one clock cycle, that is, the running speed is faster, so the retrieval efficiency is higher.
  • the feature to be retrieved is used as the input, and the base library feature is used as the weight.
  • the neural network includes a fully connected layer.
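  • A minimal sketch of how a fully connected layer can compute the matrix product of the features to be retrieved and the base library features. The class `FullyConnected`, the random feature values, and the sizes (5 base library features, 2 queries, length 8) are hypothetical; the layer is assumed to have no bias and no activation.

```python
import numpy as np

class FullyConnected:
    """Bias-free fully connected layer: output = input @ weight."""

    def __init__(self, weight):
        self.weight = weight  # shape: (feature_length, num_base_features)

    def forward(self, x):
        return x @ self.weight  # shape: (num_queries, num_base_features)

rng = np.random.default_rng(0)
base_features = rng.standard_normal((5, 8))   # 5 base library features of length 8
query_features = rng.standard_normal((2, 8))  # 2 features to be retrieved

# Base library features become the layer weight; queries are the layer input.
layer = FullyConnected(base_features.T)
output = layer.forward(query_features)
```

  • Entry `output[j, k]` is the inner product of query `j` with base library feature `k`, so one forward pass yields every query-to-base pairing at once.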
  • the present application provides an image retrieval device, which includes: an acquisition module for acquiring features to be retrieved, where the features to be retrieved are features of the image to be retrieved, the acquisition module also being used for acquiring base library features, where the base library feature is a feature of a base library image, and the base library image is an image in a pre-configured image library; an arithmetic module for using one of the base library feature and the feature to be retrieved as the weight of the neural network and the other as the input of the neural network to obtain the output of the neural network, where the neural network is used to implement the matrix operation of the input and the weight; and a retrieval module for retrieving the target image from the image library according to the output of the neural network.
  • the arithmetic module is specifically configured to obtain the output of the neural network through a neural network processor.
  • the neural network processor includes a three-dimensional CUBE operation unit, and the minimum time period for the three-dimensional arithmetic unit to perform multiplication operations between three-dimensional matrices is one clock cycle.
  • the feature vector to be retrieved is used as the input, and the base library feature vector is used as the weight.
  • the neural network includes a fully connected layer.
  • the present application provides an image retrieval device, which includes: a processor coupled with a memory; the memory is used to store instructions; the processor is used to execute the instructions stored in the memory, so that the device performs the following operations: acquiring features to be retrieved, which are features of the image to be retrieved; acquiring base library features, where the base library features are features of base library images, and the base library images are images in a pre-configured image library; using one of the base library feature and the feature to be retrieved as the weight of the neural network and the other as the input of the neural network to obtain the output of the neural network, where the neural network is used to implement the matrix operation of the input and the weight; and retrieving the target image from the image library according to the output of the neural network.
  • the processor includes a neural network processor.
  • the neural network processor is configured to obtain the output of the neural network, where the weight of the neural network is one of the base library feature and the feature to be retrieved, the input of the neural network is the other of the base library feature and the feature to be retrieved, and the neural network is used to implement the matrix operation of the input and the weight.
  • the neural network processor includes a three-dimensional arithmetic unit, and the minimum duration for the three-dimensional arithmetic unit to perform multiplication operations between three-dimensional matrices is one clock cycle.
  • the feature to be retrieved is used as the input, and the base library feature is used as the weight.
  • the neural network includes a fully connected layer.
  • the present application provides a computer-readable medium that stores instructions for device execution, and the instructions are used to implement the method in the first aspect or any one of the possible implementation manners.
  • this application provides a computer program product containing instructions, which when the computer program product runs on a computer, causes the computer to execute the method in the first aspect or any one of the possible implementation manners.
  • the present application provides a chip that includes a processor and a data interface; the processor reads instructions stored in a memory through the data interface and executes the method in the first aspect or any one of its possible implementation manners.
  • the chip may further include a memory in which instructions are stored, and the processor is configured to execute instructions stored on the memory.
  • the processor is configured to execute the method in the first aspect or any one of the possible implementation manners.
  • the present application provides a computing device.
  • the computing device includes a processor and a memory.
  • the memory stores computer instructions, and the processor executes the computer instructions to implement the method in the first aspect or any one of its possible implementation manners.
  • Fig. 1 is a schematic structural diagram of an image retrieval system according to an embodiment of the present application
  • FIG. 2 is a schematic structural diagram of an image retrieval device that includes a neural network processor according to an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of an image retrieval method according to an embodiment of the present application.
  • Fig. 4 is a schematic structural diagram of an image retrieval device according to an embodiment of the present application.
  • Fig. 5 is another schematic structural diagram of an image retrieval device according to an embodiment of the present application.
  • Fig. 1 is a schematic structural diagram of an image retrieval system according to an embodiment of the present application.
  • the image retrieval system may include a feature base library module 110, a model conversion module 120, a deep learning platform (DL platform) 130, a query module 140, a retrieval engine module 150, and a sorting module 160.
  • the feature base library module 110 includes base library features obtained by feature extraction on images in the base library by a feature extraction network, and each image has a corresponding base library feature.
  • the feature extraction network can adopt a general neural network for feature extraction, or it can be a redesigned neural network that can extract features.
  • the feature extraction network may include a residual network (resnet) and a fully connected layer (FC); in other examples, the feature extraction network may include a VGG16 and a fully connected layer.
  • the model conversion module 120 is used to convert the feature vector of the base library from the current format into a format that can be loaded by the retrieval neural network.
  • the current format mentioned here generally refers to the format supported by the deep learning platform that constructs the feature extraction network.
  • for example, the model conversion module 120 may be required to convert the base library features from the format supported by the deep learning platform "Tensorflow" to the format supported by the deep learning platform "caffe".
  • the model conversion module can convert the base library features from the current format into a format that can be loaded by the retrieval neural network based on the basic software package.
  • a basic software package is the Numpy software package.
  • the Numpy software package is a scientific computing package implemented in Python, which can include: a powerful N-dimensional array object, where N is a positive integer; relatively mature function libraries; toolkits for integrating C/C++ and Fortran code; and practical linear algebra, Fourier transform, and random number generation functions.
  • the deep learning platform 130 is used to construct a retrieval neural network.
  • the deep learning platform 130 includes but is not limited to caffe, Tensorflow, Mxnet, MindSpore, etc.
  • the deep learning platform 130 may be used to construct a neural network that calculates the similarity between the base library feature vector and the feature vector to be retrieved.
  • the dimension of the weight matrix of the constructed neural network can be determined by the number of feature vectors of the base library. For example, when the base library feature vector length is 256 and there are 300,000 base library feature vectors in total, the weight matrix in the constructed neural network can be a three-dimensional matrix of 256*1*300000.
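  • A small sketch of how base library feature vectors could be laid out as such a weight matrix. The layout (feature length, 1, number of vectors) follows the example above; using only 1,000 vectors for the demonstration, the placeholder values, and the assumption of 32-bit floating point storage are illustrative choices.

```python
import numpy as np

feature_length = 256
num_base = 1_000  # the example in the text uses 300,000 base library feature vectors

# Placeholder base library features, one row per base library image.
base = np.arange(num_base * feature_length, dtype=np.float32)
base = base.reshape(num_base, feature_length)

# Column k of the weight matrix holds the k-th base library feature vector.
weight = base.T.reshape(feature_length, 1, num_base)

# Storage needed for the full 256*1*300000 matrix, assuming fp32 elements.
full_size_bytes = 256 * 1 * 300_000 * np.dtype(np.float32).itemsize
```

  • Under the fp32 assumption, the full-size weight matrix occupies 307,200,000 bytes, which is the amount of weight data the retrieval neural network has to load.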
  • the deep learning platform 130 can be used to construct a neural network that calculates the inner product of the normalized base library feature vector and the normalized feature vector to be retrieved.
  • the query module 140 is configured to perform feature extraction on the image to be retrieved through a feature extraction network to obtain the feature of the image to be retrieved.
  • the feature of the image to be retrieved is called the feature to be retrieved.
  • the feature extraction network can adopt a general neural network for feature extraction, or it can be a redesigned neural network that can extract features.
  • the feature extraction network may include a residual network and a fully connected layer; in other examples, the feature extraction network may include a VGG16 and a fully connected layer.
  • the feature extraction network used for feature extraction of the base library image is the same as the feature extraction network for feature extraction of the image to be retrieved.
  • the same here means that the network structure and network parameters are the same.
  • the retrieval engine module 150 is mainly used to: use the base library features as the weight of the retrieval neural network, use the features to be retrieved as the input of the retrieval neural network, and perform addition, subtraction, multiplication and/or division operations based on the model structure of the retrieval neural network, so as to obtain the similarity between the base library features and the features to be retrieved.
  • the search engine module 150 may also include operations such as normalizing the base library features and the features to be retrieved.
  • the ranking module 160 is used to obtain the similarity that meets the requirements from the similarity obtained by the search engine module 150.
  • the sorting module 160 is specifically used to: filter the similarities obtained by the retrieval engine module 150 against a preset threshold to eliminate the lower similarities; sort the remaining similarities in a certain order (for example, from largest to smallest); select the top X similarities from the sorted similarities; and select the base library images corresponding to those top X similarities from the base library images. These X base library images are the retrieval result, where X is a positive integer less than the total number of images in the base library.
  • the sorting module 160 may be specifically used to: sort the similarities obtained by the retrieval engine module 150 in a certain order (for example, from largest to smallest); select the top X similarities from the sorted similarities; and select the base library images corresponding to those top X similarities from the base library images. These X base library images are the retrieval result, where X is a positive integer less than the total number of images in the base library.
  • the architecture of the image retrieval system shown in FIG. 1 is only an example, and the image retrieval system to which the image retrieval method proposed in this application can be applied may include more or fewer modules.
  • the image retrieval system to which the image retrieval method of this application can be applied may not have the model conversion module 120.
  • the image retrieval system to which the image retrieval method of the present application can be applied may include a neural network for extracting features of the base library.
  • FIG. 2 An exemplary structure diagram of an image retrieval device according to an embodiment of the present application is shown in FIG. 2.
  • the image retrieval device 200 shown in FIG. 2 may include a main processor 210, a memory 220, and a neural network processor 230.
  • the main processor 210 may be a central processing unit (CPU).
  • the main processor 210 may also be referred to as a host CPU (Host CPU).
  • the functions of the feature base library module 110, the model conversion module 120, the deep learning platform 130, the query module 140, and the sorting module 160 may be implemented by the main processor 210.
  • the function of the search engine module 150 can be implemented by the neural network processor 230.
  • the memory 220 can store the instructions corresponding to the feature base library module 110, the model conversion module 120, the deep learning platform 130, the query module 140, the retrieval engine module 150, and the sorting module 160, as well as the base library features and the features to be retrieved, and can even store the base library images and the images to be retrieved.
  • Neural-network processing unit (NPU) 230 is mainly used to complete numerical operations of addition, subtraction, multiplication, and division required for network inference.
  • the NPU 230 completes the multiplication and accumulation operations required for network inference.
  • the neural network processor 230 may be mounted on the main CPU as a co-processor, and the Host CPU can allocate tasks.
  • the neural network processor 230 may include an input memory 201, a weight memory 202, an arithmetic circuit 203, a controller 204, a direct memory access controller (DMAC) 205, a unified memory 206, and an instruction fetch memory 209.
  • the unified memory 206, the input memory 201, the weight memory 202, and the instruction fetch memory 209 are all on-chip memories.
  • the DMAC 205 is mainly used to transfer the input data and weight data in the memory 220 to the unified memory 206.
  • the DMAC 205 can also be used to transfer weight data from the unified memory 206 to the weight memory 202, transfer input data from the unified memory 206 to the input memory 201, and transfer instructions to the instruction fetch memory 209.
  • the controller 204 executes the instructions stored in the instruction fetch memory 209, and controls the arithmetic circuit 203 to perform operations on the weights stored in the weight memory 202 and the input data in the input memory 201.
  • the NPU 230 executes corresponding instructions through the controller 204, and controls the arithmetic circuit 203 to extract the matrix data in the weight memory 202 and the input memory 201 and perform matrix operations.
  • the arithmetic circuit 203 may include a three-dimensional cube arithmetic unit.
  • the arithmetic circuit 203 may also be a one-dimensional systolic array, a two-dimensional systolic array, or other electronic circuits capable of performing mathematical operations such as multiplication and addition.
  • the arithmetic circuit 203 fetches the data corresponding to matrix B from the weight memory 202 and caches it on each arithmetic unit in the arithmetic circuit.
  • the arithmetic circuit 203 fetches the data of matrix A from the input memory 201, performs a matrix operation on it with matrix B, and stores the partial result or final result of the obtained matrix in the unified memory 206.
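  • The accumulation of partial results can be illustrated in software with a tiled matrix multiplication. This is only an analogy for the data flow, not a model of the hardware: the function `tiled_matmul`, the tile size of 16, and the matrix sizes are assumptions, and the output array merely stands in for the unified memory.

```python
import numpy as np

TILE = 16  # assumed tile edge, chosen for illustration

def tiled_matmul(a, b, tile=TILE):
    """Multiply a (m x k) by b (k x n) one tile at a time, accumulating partial results."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((m, n), dtype=np.float32)  # stands in for the unified memory
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                # One tile-sized multiply; its partial result is added to the output tile.
                c[i:i + tile, j:j + tile] += a[i:i + tile, p:p + tile] @ b[p:p + tile, j:j + tile]
    return c

rng = np.random.default_rng(0)
a = rng.standard_normal((32, 48)).astype(np.float32)  # plays the role of matrix A (input)
b = rng.standard_normal((48, 64)).astype(np.float32)  # plays the role of matrix B (weight)
c = tiled_matmul(a, b)
```

  • Each output tile only becomes final after the innermost loop has visited every tile along the shared dimension; before that it holds a partial result.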
  • Fig. 3 is an exemplary flowchart of an image retrieval method according to an embodiment of the application. As shown in FIG. 3, the method may include S310 to S340.
  • S310 Acquire a base library feature, where the base library feature is a feature of an image in an image library.
  • the images in the image library can be called bottom library images.
  • the base library images are collected in advance.
  • the images in the image library may include one or more of images such as a face image, a fingerprint image, a license plate image, a vehicle image, and a human image.
  • the images in the image library can be updated as required, such as adding, deleting or replacing the images in it.
  • each image in the image library may be input to a neural network for feature extraction, and the extracted features are called base library features.
  • the feature extraction network can adopt a general neural network for feature extraction, or it can be a redesigned neural network that can extract features.
  • the neural network used for feature extraction can include Resnet50 or VGG16.
  • the base library features can be received or copied directly from other devices or systems.
  • One manifestation of the base library feature is a matrix. If the matrix corresponding to the base library feature is a one-dimensional matrix, the manifestation of the base library feature is a vector, which can be called a base library feature vector.
  • a base library feature vector can be data of 256 dimensions, each dimension being a 32-bit floating point number (fp32).
  • An example of all the acquired features of the base library is the feature base library module 110 in the image retrieval system shown in FIG. 1.
  • the base library feature needs to be converted from its current format (which can be called the initial format) into a format readable by the retrieval neural network (which can be called the target format).
  • the model conversion module 120 in the image retrieval system shown in FIG. 1 converts the features of the base library into a format that can be supported by the neural network processor.
  • the initial-format file of the base library features can be read through the basic software package Numpy, and the weight data of the corresponding field in the weight file of the retrieval neural network can be replaced with the base library features from that file; the resulting file is the weight file of the retrieval neural network.
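  • A rough sketch of this replacement step using Numpy. The file formats are stand-ins: a `.npy` file plays the role of the initial-format feature file and a `.npz` archive plays the role of the weight file, whereas real Tensorflow or caffe weight files have their own formats. The function `replace_weights`, the file names, and the field name `fc_weight` are hypothetical.

```python
import os
import tempfile

import numpy as np

def replace_weights(feature_path, weight_path, field):
    """Overwrite one field of a .npz weight file with the base library features."""
    base_features = np.load(feature_path)  # initial-format file, one row per image
    with np.load(weight_path) as data:
        weights = {name: data[name] for name in data.files}
    # Assumed weight layout: (feature_length, num_base_features), hence the transpose.
    weights[field] = base_features.T.astype(np.float32)
    np.savez(weight_path, **weights)

with tempfile.TemporaryDirectory() as tmp:
    feature_path = os.path.join(tmp, "base_features.npy")
    weight_path = os.path.join(tmp, "retrieval_weights.npz")
    # Toy data: 3 base library features of length 4, and a zero-initialized weight field.
    np.save(feature_path, np.ones((3, 4), dtype=np.float32))
    np.savez(weight_path, fc_weight=np.zeros((4, 3), dtype=np.float32))

    replace_weights(feature_path, weight_path, "fc_weight")
    with np.load(weight_path) as data:
        new_weight = data["fc_weight"]
```

  • After the call, the weight field holds the base library features instead of its initial values, so the retrieval neural network needs no training.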
  • S320 Acquire a feature to be retrieved, where the feature to be retrieved is a feature of an image to be retrieved.
  • the query module 140 in the image retrieval system shown in FIG. 1 performs feature extraction on the picture to be retrieved to obtain the feature of the picture to be retrieved.
  • the feature of the picture to be retrieved can be called the feature to be retrieved, and one form of the feature to be retrieved is a vector, which can be called the feature vector to be retrieved.
  • a feature vector to be retrieved may be data of 256 dimensions, each dimension being a 32-bit floating point number (fp32).
  • the image to be retrieved may include a face image, a fingerprint image, a license plate image, a vehicle image, or a human image, and so on.
  • the image to be retrieved can be one, or one batch, that is, multiple images.
  • the images to be retrieved can be numbered, that is, each image to be retrieved is assigned an index number for identifying each image to be retrieved. After acquiring the feature to be retrieved of the image to be retrieved, an index number can also be assigned to each feature to be retrieved, and the index number of the feature to be retrieved can be the same as the index number of the corresponding image to be retrieved.
  • S330 Use one of the base library feature and the feature to be retrieved as the weight of a neural network, and use the other as the input of the neural network, to obtain the output of the neural network.
  • the retrieval engine module 150 in the image retrieval system shown in FIG. 1 is used to implement neural network inference and obtain the output of the neural network.
  • Network inference calls the underlying hardware (such as a neural network processor) to complete the addition, subtraction, multiplication, and/or division of the input and the weight.
  • the base library feature can be used as the weight of the neural network, and the feature to be retrieved can be used as the input of the neural network; in other implementations, the feature to be retrieved can be used as the weight of the neural network, and the base library feature can be used as the input of the neural network.
  • normalization processing may be performed on each base library feature and each feature to be retrieved.
  • the base library features can be normalized after they are extracted for the feature base library module 110, after the model conversion module 120 performs the conversion, or before the retrieval engine module 150 performs inference; the features to be retrieved can be normalized after the query module 140 extracts them.
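  • A short sketch of why normalization helps: once both sides are scaled to unit length, a plain inner product already equals the cosine similarity. L2 normalization is assumed here, and the helper `l2_normalize`, the small `eps` guard, and the random feature values are illustrative.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit 2-norm; eps guards against division by zero."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)

rng = np.random.default_rng(1)
base = rng.standard_normal((4, 16))  # 4 base library features of length 16
query = rng.standard_normal(16)      # 1 feature to be retrieved

# Inner product of normalized features ...
inner = l2_normalize(base) @ l2_normalize(query)
# ... compared with cosine similarity computed directly from the raw features.
cosine = (base @ query) / (np.linalg.norm(base, axis=1) * np.linalg.norm(query))
```

  • Because the two results agree, the base library features can be normalized once, ahead of time, and only the matrix product remains to be computed at query time.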
  • the model structure of the neural network can be constructed based on a formula for calculating the similarity between the features of the base database and the features to be retrieved.
  • the following uses cosine similarity as the measure of similarity between the feature vector to be retrieved and the base library feature vector, as an example, to introduce the relationship between the model structure of the neural network and the formula for calculating the similarity between the base library feature vector and the feature to be retrieved.
  • the cosine similarity calculation formula is as follows:
    $$\cos\theta_k = \frac{A^T \cdot W_k}{\|A\|_2 \, \|W_k\|_2} = \frac{\sum_i a_i w_{ik}}{\sqrt{\sum_i a_i^2}\,\sqrt{\sum_i w_{ik}^2}}, \quad k = 1, \dots, n$$
  • n indicates that there are n base library feature vectors;
  • $a_i$ indicates the i-th item of the feature vector to be retrieved; $w_{ik}$ represents the i-th item of the k-th base library feature vector; $A^T$ represents the transposition of the feature vector to be retrieved;
  • $\|A\|_2$ represents the 2-norm of the vector A to be retrieved, that is, the square root of the sum of the squares of its elements;
  • $\|W_k\|_2$ represents the 2-norm of the k-th base library feature vector $W_k$;
  • $\cos\theta_k$ represents the cosine similarity value between the feature vector to be retrieved and the k-th base library feature vector. The closer $\cos\theta_k$ is to 1, the more similar the feature vector to be retrieved is to the k-th base library feature vector.
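  • As a worked example of the cosine similarity between a feature vector to be retrieved, written A, and a k-th base library feature vector, written W_k, consider two hypothetical three-dimensional vectors (real feature vectors in this document have 256 dimensions):

```latex
A = (1, 2, 2)^T, \qquad W_k = (2, 1, 2)^T

A^T \cdot W_k = 1 \cdot 2 + 2 \cdot 1 + 2 \cdot 2 = 8

\|A\|_2 = \sqrt{1^2 + 2^2 + 2^2} = 3, \qquad \|W_k\|_2 = \sqrt{2^2 + 1^2 + 2^2} = 3

\cos\theta_k = \frac{A^T \cdot W_k}{\|A\|_2 \, \|W_k\|_2} = \frac{8}{3 \cdot 3} = \frac{8}{9} \approx 0.889
```

  • The only step that involves both vectors is the inner product in the numerator, which is the part a matrix-multiplying neural network can compute for all base library feature vectors at once.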
  • a retrieval neural network for calculating $A^T \cdot W_k$ can be constructed, that is, a retrieval neural network that implements the matrix multiplication of A and $W_k$ can be constructed.
  • a retrieval neural network can be constructed, and the retrieval neural network can include a network structure similar to a fully connected layer in Tensorflow or Caffe, and initialize the weights in the network structure, that is, initialize the parameters in the network structure.
  • the base library feature is used as the weight of the retrieval neural network, the feature to be retrieved is used as the input of the retrieval neural network, and the output of the retrieval neural network is obtained. In an example, the main processor 210 can load the base library features obtained in S310 into the weight memory 202 of the neural network processor 230 and load the features to be retrieved obtained in S320 into the input memory 201 of the neural network processor 230, and the controller 204 controls the arithmetic circuit 203 to perform network inference, that is, to calculate the value of $A^T \cdot W_k$.
  • S340 retrieve a target image from the image library according to the output of the neural network.
  • the similarity between the base library feature and the feature to be retrieved can be determined based on the output of the neural network, and the target image can be retrieved from the image library according to the similarity.
  • the output of the neural network can be used as the similarity between the base library feature and the feature to be retrieved.
  • the value of $A^T \cdot W_k$ output by the neural network processor 230 can be used as the similarity between the k-th base library feature and the feature to be retrieved.
  • the output of the neural network can be further processed to obtain the similarity between the base library feature and the feature to be retrieved.
  • the output $A^T \cdot W_k$ of the neural network processor 230 can be divided by $\|A\|_2 \, \|W_k\|_2$, and the obtained quotient used as the similarity between the k-th base library feature and the feature to be retrieved. The values of $\|A\|_2$ and $\|W_k\|_2$ can also be calculated based on a neural network.
  • for example, A is used as both the input and the weight of the neural network, and the matrix operation of the input and the weight is performed through the neural network; likewise, $W_k$ is used as both the input and the weight of the neural network, and the matrix operation of the input and the weight is performed through the neural network.
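  • A minimal sketch of computing the norms with the same matrix operation, by feeding a vector in as both the input and the weight. The function `matmul_layer` and the two-dimensional vectors are hypothetical, and the final square root is an assumed post-processing step, since multiplying a vector by itself yields the squared 2-norm.

```python
import numpy as np

def matmul_layer(x, weight):
    """Stand-in for a neural network layer that multiplies its input by its weight."""
    return x @ weight

a = np.array([[3.0, 4.0]])    # feature to be retrieved, shape (1, 2)
w_k = np.array([[4.0, 3.0]])  # k-th base library feature, shape (1, 2)

# Numerator: query as input, base library feature as weight.
dot = matmul_layer(a, w_k.T)[0, 0]

# Norms: each vector is used as both input and weight, then a square root is taken.
a_norm = np.sqrt(matmul_layer(a, a.T)[0, 0])
w_norm = np.sqrt(matmul_layer(w_k, w_k.T)[0, 0])

similarity = dot / (a_norm * w_norm)
```

  • With these values the inner product is 24 and both norms are 5, giving a similarity of 0.96.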
  • the number of similarities output by the neural network can be determined by the input base library features and the number of features to be retrieved. For example, if n base library features are used as weights and m features to be retrieved are used as input, the neural network can output m*n similarities, and m and n are positive integers.
  • for example, with 16 features to be retrieved and 300,000 base library features, the output of the retrieval neural network is 16*300,000 similarity values, where the value range of each similarity value can be [0,1].
  • the neural network processor may include a cube computing unit.
  • the retrieval neural network can rely on the powerful cube matrix computing capabilities of the neural network processor to improve the ability to calculate the similarity between the features of the base library and the features to be retrieved. .
  • a single cube operation unit can usually complete a 16*16*16 matrix multiplication, or an even larger one, in one clock cycle. At the same clock frequency, this can make matrix calculations up to 8192 times more efficient than an implementation on a traditional image processor or a traditional CPU. This is because a traditional CPU can complete only one multiplication or one addition in one clock cycle, so completing the multiplication and addition operations of 16*16*16 data takes 16*16*16*2 clock cycles, whereas a cube unit can complete the same 16*16*16 multiply-add operation in one clock cycle; using the cube unit therefore increases efficiency by a factor of 8192.
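  • The 8192 figure follows from simple counting, reproduced below under the text's own assumptions: one scalar operation per CPU cycle, and one multiply plus one add per element of the 16*16*16 operation.

```python
# Worked arithmetic for the speed-up figure quoted in the text.
n = 16
multiplications = n * n * n  # 4096 multiplications in a 16*16*16 matrix multiplication
additions = n * n * n        # one accumulation per multiplication, as the text counts it

cpu_cycles = multiplications + additions  # one operation per clock cycle on a scalar CPU
cube_cycles = 1                           # the cube unit finishes the same work in one cycle

speedup = cpu_cycles // cube_cycles
print(speedup)  # 8192
```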
  • the similarity calculation between the base library features and the features to be retrieved through neural network inference can make the calculation speed reach the millisecond level.
  • the obtained similarity values can be filtered to keep the similarities higher than a preset threshold, together with the index numbers of the corresponding images to be retrieved and base library images; these similarities are then sorted in descending order, and the top K similarities and the index numbers of their corresponding images to be retrieved and base library images are selected; finally, these K similarities and the corresponding images to be retrieved and base library images can be output, thereby realizing image retrieval, where K is a positive integer.
  • the K base library images are the retrieved target images, corresponding one-to-one to the K images to be retrieved.
  • the threshold can be set as required.
  • the threshold can be set as high as is practical, because a lower threshold retains more similarities, which has a certain impact on the efficiency of the subsequent sorting and of selecting the top K similarities and the K base library images.
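  • The filter-then-select procedure above can be sketched in plain Python. The function name `top_k_matches` and the sample scores are hypothetical; the sketch only illustrates thresholding followed by a descending top-K selection that keeps the index numbers alongside each similarity.

```python
import heapq

def top_k_matches(similarities, threshold, k):
    """Return up to k (similarity, query_index, base_index) tuples, best first.

    similarities[i][j] is the similarity between query i and base library image j.
    """
    # Step 1: keep only similarities above the threshold, with their index numbers.
    kept = [
        (score, qi, bi)
        for qi, row in enumerate(similarities)
        for bi, score in enumerate(row)
        if score > threshold
    ]
    # Step 2: descending order, top K. A higher threshold leaves fewer items to rank.
    return heapq.nlargest(k, kept, key=lambda item: item[0])

scores = [
    [0.10, 0.92, 0.40],
    [0.85, 0.30, 0.95],
]
print(top_k_matches(scores, threshold=0.5, k=2))
# [(0.95, 1, 2), (0.92, 0, 1)]
```

  • Raising the threshold shrinks the `kept` list before ranking, which is the efficiency effect the text describes.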
  • both the base library features and the features to be retrieved can be obtained by neural networks.
  • the neural network is used to calculate the similarity between the base library features and the features to be retrieved, which can reduce repeated movement of data such as the base library features and the features to be retrieved, and thus reduces the time spent on data movement or transmission.
  • the image retrieval method of the present application can reduce retrieval time, thereby improving retrieval efficiency and, in turn, user experience.
  • the features to be retrieved can be input to the retrieval neural network in batches, that is, multiple features to be retrieved are input, which can improve retrieval efficiency.
  • Fig. 4 is an exemplary structure diagram of the image retrieval device of the present application.
  • the device 400 includes an acquisition module 410, an arithmetic module 420, and a retrieval module 430.
  • the device 400 can implement the method shown in FIG. 3 described above.
  • the acquisition module 410 is used to perform S310 and S320;
  • the arithmetic module 420 is used to perform S330;
  • the retrieval module 430 is used to perform S340.
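  • The division of work among the three modules can be pictured with a small Python sketch. It assumes that S310 to S340 correspond to acquiring the feature to be retrieved, acquiring the base library features, computing the network output, and retrieving the target image; the class and method names are hypothetical, and a plain inner product stands in for the neural network.

```python
class AcquisitionModule:
    """Supplies the feature to be retrieved and the base library features."""

    def __init__(self, query_feature, base_features):
        self._query = query_feature
        self._base = base_features

    def get_query_feature(self):
        return self._query

    def get_base_features(self):
        return self._base


class ArithmeticModule:
    """Plays the role of the neural network: a matrix operation on input and weights."""

    def compute(self, query_feature, base_features):
        return [sum(q * w for q, w in zip(query_feature, row)) for row in base_features]


class RetrievalModule:
    """Picks the base library image whose output value is highest."""

    def retrieve(self, outputs, image_ids):
        best = max(range(len(outputs)), key=outputs.__getitem__)
        return image_ids[best]


acquisition = AcquisitionModule([1.0, 0.0], [[0.0, 1.0], [0.9, 0.1], [0.5, 0.5]])
outputs = ArithmeticModule().compute(
    acquisition.get_query_feature(), acquisition.get_base_features()
)
target = RetrievalModule().retrieve(outputs, ["img_a", "img_b", "img_c"])
print(target)  # img_b
```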
  • the device 400 may be deployed in a cloud environment, which is an entity that uses basic resources to provide cloud services to users in a cloud computing mode.
  • the cloud environment includes a cloud data center and a cloud service platform.
  • the cloud data center includes a large number of basic resources (including computing resources, storage resources, and network resources) owned by a cloud service provider.
  • the computing resources included in the cloud data center may be a large number of computing devices (for example, servers).
  • the device 400 may be a server for image retrieval in a cloud data center.
  • the apparatus 400 may also be a virtual machine for image retrieval created in a cloud data center.
  • the device 400 may also be a software device deployed on a server or a virtual machine in a cloud data center. The software device is used for image retrieval.
  • the software device may be deployed in a distributed manner on multiple servers, on multiple virtual machines, or on a combination of virtual machines and servers.
  • the acquisition module 410, the arithmetic module 420, and the retrieval module 430 in the apparatus 400 may be distributed across multiple servers, across multiple virtual machines, or across virtual machines and servers.
  • the device 400 can be abstracted by a cloud service provider into an image retrieval cloud service on a cloud service platform and provided to users. After a user purchases the cloud service on the cloud service platform, the cloud environment uses the cloud service to provide the user with image retrieval. The user can upload the image to be retrieved to the cloud environment through an application program interface (API) or through the web interface provided by the cloud service platform; the device 400 receives the image to be retrieved and performs image retrieval based on it, and the finally obtained target image is returned by the device 400 to the edge device where the user is located.
  • the apparatus 400 may also be separately deployed on a computing device in any environment.
  • the present application also provides an apparatus 500 as shown in FIG. 5.
  • the apparatus 500 includes a processor 502, a communication interface 503, and a memory 504.
  • An example of the device 500 is a chip.
  • Another example of the apparatus 500 is a computing device.
  • the processor 502, the memory 504, and the communication interface 503 may communicate through a bus.
  • Executable code is stored in the memory 504, and the processor 502 reads the executable code in the memory 504 to execute the corresponding method.
  • the memory 504 may also include other software modules required for running processes, such as an operating system.
  • the operating system can be LINUX™, UNIX™, WINDOWS™, etc.
  • the executable code in the memory 504 is used to implement the method shown in FIG. 3, and the processor 502 reads the executable code in the memory 504 to execute the method shown in FIG. 3.
  • the processor 502 may be a central processing unit (CPU), or may have the exemplary structure shown in FIG. 2.
  • the memory 504 may include a volatile memory, such as a random access memory (RAM).
  • the memory 504 may also include a non-volatile memory (NVM), such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid state disk (SSD).
  • the disclosed system, device, and method can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • if the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application provides an image retrieval method and an image retrieval apparatus, which use artificial intelligence. In the technical solution proposed by the present application, one of a base library feature corresponding to an image in an image library, and a feature to be retrieved corresponding to an image to be retrieved is used as a weight of a neural network, and the other one of the base library feature and said feature to be retrieved is used as an input of the neural network, a matrix computation between the input and the weight of the neural network is achieved by using the neural network, and a target image is obtained by retrieval from the image library according to an output of the neural network. The technical solution of the present application can not only realize image retrieval, but also simplify the implementation complexity of image retrieval.

Description

Image retrieval method and image retrieval device
Technical field
This application relates to the field of retrieval, and more specifically, to an image retrieval method and an image retrieval device.
Background
Retrieval refers to starting from a user's specific information need, applying certain methods and technical means to a specific collection of information, and finding the relevant information in it according to certain clues and rules. Retrieval is now applied in all walks of life; for example, image retrieval technology can be applied to tasks such as face recognition, license plate detection, and fingerprint recognition.
Image retrieval mainly includes the following steps: first, collecting and processing image resources, extracting image features, and establishing an image feature database; second, acquiring the image to be retrieved, extracting its features, and forming the feature data to be retrieved; then, calculating, according to a similarity algorithm, the similarity between the feature data to be retrieved and the features recorded in the feature database; finally, extracting from the feature database the records that meet a similarity threshold as the retrieval result, and outputting them in descending order of similarity.
Currently, image retrieval can be implemented through artificial intelligence (AI) technology. Artificial intelligence is the theory, methods, technology, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theory.
How to use artificial intelligence to implement image retrieval, for example when facing today's massive volumes of image data such as face recognition data, license plate detection data, or fingerprint retrieval data, has become an urgent technical problem to be solved.
Summary of the invention
The image retrieval method and image retrieval device provided in this application can implement image retrieval through artificial intelligence methods.
In a first aspect, the present application provides an image retrieval method. The method includes: acquiring a feature to be retrieved, where the feature to be retrieved is a feature of an image to be retrieved; acquiring a base library feature, where the base library feature is a feature of a base library image, and the base library image is an image in a pre-configured image library; using one of the base library feature and the feature to be retrieved as a weight of a neural network, using the other as an input of the neural network, and obtaining an output of the neural network, where the neural network is used to implement a matrix operation on the input and the weight; and retrieving a target image from the image library according to the output of the neural network.
In the method of the present application, the matrix product of the feature vector to be retrieved and the base library feature vector is calculated by a neural network, and the similarity between the feature to be retrieved and the base library feature is calculated from that product, thereby realizing image retrieval. In addition, the neural network's inherent matrix multiplication can be fully exploited, which helps reduce implementation complexity while obtaining the similarity between the image to be retrieved and the base library image.
With reference to the first aspect, in a first possible implementation, obtaining the output of the neural network includes: obtaining the output of the neural network through a neural network processor.
In this implementation, the neural network calculation is performed by a neural network processor, which runs faster and improves retrieval efficiency.
With reference to the first possible implementation, in a second possible implementation, the neural network processor includes a cube operation unit, and the minimum duration for the cube operation unit to perform a multiplication between three-dimensional matrices is one clock cycle.
In this implementation, because the cube operation unit can perform more addition or multiplication operations in one clock cycle, that is, it runs faster, the retrieval efficiency is higher.
With reference to the first aspect or any one of the foregoing possible implementations, in a third possible implementation, the feature to be retrieved is used as the input, and the base library feature is used as the weight.
With reference to the first aspect or any one of the foregoing possible implementations, in a fourth possible implementation, the neural network includes a fully connected layer.
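The role assignment in the first aspect can be illustrated with a bias-free fully connected layer written in NumPy. This is only a sketch under stated assumptions: the function `fully_connected` and the sample vectors are hypothetical, and the application does not state that its layer omits the bias term.

```python
import numpy as np

def fully_connected(x, weight):
    """Bias-free fully connected layer: x has shape (batch, d), weight has shape (units, d)."""
    return x @ weight.T

base = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # 3 base library features
query = np.array([[2.0, 3.0]])                         # 1 feature to be retrieved

out_a = fully_connected(query, base)  # query as input, base library as weights: shape (1, 3)
out_b = fully_connected(base, query)  # roles swapped: shape (3, 1)

print(out_a)    # [[2. 3. 5.]]
print(out_b.T)  # [[2. 3. 5.]]
```

Either assignment produces the same products, transposed, which is why the method allows either feature set to serve as the weight.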
In a second aspect, the present application provides an image retrieval device. The device includes: an acquisition module, configured to acquire a feature to be retrieved, where the feature to be retrieved is a feature of an image to be retrieved, and further configured to acquire a base library feature, where the base library feature is a feature of a base library image, and the base library image is an image in a pre-configured image library; an arithmetic module, configured to use one of the base library feature and the feature to be retrieved as a weight of a neural network, use the other as an input of the neural network, and obtain an output of the neural network, where the neural network is used to implement a matrix operation on the input and the weight; and a retrieval module, configured to retrieve a target image from the image library according to the output of the neural network.
With reference to the second aspect, in a first possible implementation, the arithmetic module is specifically configured to obtain the output of the neural network through a neural network processor.
With reference to the first possible implementation, in a second possible implementation, the neural network processor includes a cube (CUBE) operation unit, and the minimum duration for the cube operation unit to perform a multiplication between three-dimensional matrices is one clock cycle.
With reference to the second aspect or any one of the foregoing possible implementations, in a third possible implementation, the feature vector to be retrieved is used as the input, and the base library feature vector is used as the weight.
With reference to the second aspect or any one of the foregoing possible implementations, in a fourth possible implementation, the neural network includes a fully connected layer.
In a third aspect, the present application provides an image retrieval device. The device includes a processor coupled with a memory; the memory is configured to store instructions, and the processor is configured to execute the instructions stored in the memory, so that the device performs the following operations: acquiring a feature to be retrieved, where the feature to be retrieved is a feature of an image to be retrieved; acquiring a base library feature, where the base library feature is a feature of a base library image, and the base library image is an image in a pre-configured image library; using one of the base library feature and the feature to be retrieved as a weight of a neural network, using the other as an input of the neural network, and obtaining an output of the neural network, where the neural network is used to implement a matrix operation on the input and the weight; and retrieving a target image from the image library according to the output of the neural network.
With reference to the third aspect, in a first possible implementation, the processor includes a neural network processor. The neural network processor is configured to obtain the output of the neural network, where the weight of the neural network is one of the base library feature and the feature to be retrieved, the input of the neural network is the other of the base library feature and the feature to be retrieved, and the neural network is used to implement the matrix operation on the input and the weight.
With reference to the first possible implementation, in a second possible implementation, the neural network processor includes a cube operation unit, and the minimum duration for the cube operation unit to perform a multiplication between three-dimensional matrices is one clock cycle.
With reference to the third aspect or any one of the foregoing possible implementations, in a third possible implementation, the feature to be retrieved is used as the input, and the base library feature is used as the weight.
With reference to the third aspect or any one of the foregoing possible implementations, in a fourth possible implementation, the neural network includes a fully connected layer.
In a fourth aspect, the present application provides a computer-readable medium that stores instructions for execution by a device, where the instructions are used to implement the method in the first aspect or any one of its possible implementations.
In a fifth aspect, the present application provides a computer program product containing instructions which, when the computer program product runs on a computer, cause the computer to execute the method in the first aspect or any one of its possible implementations.
In a sixth aspect, the present application provides a chip. The chip includes a processor and a data interface, and the processor reads, through the data interface, instructions stored in a memory and executes the method in the first aspect or any one of its possible implementations.
Optionally, as an implementation, the chip may further include a memory in which instructions are stored; the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor performs the method in the first aspect or any one of its possible implementations.
In a seventh aspect, the present application provides a computing device. The computing device includes a processor and a memory, where the memory stores computer instructions, and the processor executes the computer instructions to implement the method in the first aspect or any one of its possible implementations.
Description of the drawings
Fig. 1 is a schematic structural diagram of an image retrieval system according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of a neural network processor according to an embodiment of the present application;
Fig. 3 is a schematic flowchart of an image retrieval method according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an image retrieval device according to an embodiment of the present application;
Fig. 5 is another schematic structural diagram of an image retrieval device according to an embodiment of the present application.
Detailed description
The technical solutions in this application are described below with reference to the accompanying drawings.
Fig. 1 is a schematic structural diagram of an image retrieval system according to an embodiment of the present application. As shown in Fig. 1, the image retrieval system may include a feature base library module 110, a model conversion (model convert) module 120, a deep learning platform (DL platform) 130, a query module 140, a retrieval engine module 150, and a sorting module 160.
The feature base library module 110 includes the base library features obtained by a feature extraction network performing feature extraction on the images in the base library; each image has a corresponding base library feature.
The feature extraction network can be a general-purpose neural network for feature extraction, or a newly designed neural network capable of extracting features.
In some examples, the feature extraction network may include a residual network (resnet) and a fully connected (FC) layer; in other examples, the feature extraction network may include VGG16 and a fully connected layer.
The model conversion module 120 is used to convert the base library feature vectors from their current format into a format that the retrieval neural network can load. The current format here generally refers to the format supported by the deep learning platform on which the feature extraction network is built.
For example, when the feature extraction network that extracts the base library features is a neural network built on the deep learning platform "Tensorflow" and the retrieval neural network is built on the deep learning platform "caffe", the model conversion module 120 is needed to convert the base library features from the format supported by "Tensorflow" to the format supported by "caffe".
For example, the model conversion module can use a basic software package to convert the base library features from the current format into a format that the retrieval neural network can load. One example of such a basic software package is the Numpy software package.
The Numpy software package is a scientific computing package implemented in python, which can include: a powerful N-dimensional array object, where N is a positive integer; a relatively mature function library; a toolkit for integrating C/C++ and Fortran code; and practical linear algebra, Fourier transform, and random number generation functions.
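As a rough illustration of a Numpy-based conversion, the sketch below turns exported features into a float32 array and round-trips it through the `.npy` format. The source layout (a list of vectors) and the target layout (a C-contiguous array of shape number of features by feature length) are assumptions made for the example, not formats specified by the application or by any particular deep learning platform.

```python
import io
import numpy as np

# Hypothetical features as exported by the feature extraction side.
exported = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]

# Assumed target layout for the retrieval network's weights:
# a C-contiguous float32 array of shape (number_of_features, feature_length).
weights = np.ascontiguousarray(np.asarray(exported), dtype=np.float32)

# Round-trip through the .npy format to mimic handing the data to another framework.
buffer = io.BytesIO()
np.save(buffer, weights)
buffer.seek(0)
loaded = np.load(buffer)

print(loaded.shape, loaded.dtype)  # (2, 4) float32
```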
The deep learning platform 130 is used to construct the retrieval neural network. The deep learning platform 130 includes but is not limited to caffe, Tensorflow, Mxnet, MindSpore, and the like.
In some implementations, the deep learning platform 130 can be used to construct a neural network that calculates the similarity between the base library feature vectors and the feature vector to be retrieved. The dimensions of the weight matrix of the constructed neural network can be determined by the number of base library feature vectors. For example, when the base library feature vectors have length 256 and there are 300,000 of them in total, the weight matrix in the constructed neural network can be a 256*1*300000 three-dimensional matrix.
For example, when the cosine formula is used to calculate the similarity between a base library feature vector and the feature vector to be retrieved, the deep learning platform 130 can be used to construct a neural network that calculates the inner product of the normalized base library feature vector and the normalized feature vector to be retrieved.
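The equivalence between cosine similarity and the inner product of normalized vectors can be sketched in NumPy as follows. The helper `l2_normalize`, the small epsilon guard, and the sample vectors are illustrative assumptions.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit Euclidean length (eps guards against zero vectors)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)

base = np.array([[3.0, 4.0], [0.0, 2.0]])  # base library feature vectors
query = np.array([[6.0, 8.0]])             # feature vector to be retrieved

# After normalization, a plain inner product is the cosine similarity.
cosine = l2_normalize(query) @ l2_normalize(base).T
print(cosine)
```

Normalizing the base library features once, ahead of time, means each query then needs only the inner product at retrieval time.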
查询模块140,用于通过特征提取网络对待检索图像进行特征提取,得到待检索图像的特征。待检索图像的特征称为待检索特征。该特征提取网络可以采用通用的、用于提取特征的神经网络,也可以是重新设计的、能够提取特征的神经网络。The query module 140 is configured to perform feature extraction on the image to be retrieved through a feature extraction network to obtain the feature of the image to be retrieved. The feature of the image to be retrieved is called the feature to be retrieved. The feature extraction network can adopt a general neural network for feature extraction, or it can be a redesigned neural network that can extract features.
在一些示例中,特征提取网络可以包含残差网络和全连接层;在另一些示例中,特征提取网络可以包括VGC16和全连接层。In some examples, the feature extraction network may include a residual network and a fully connected layer; in other examples, the feature extraction network may include a VGC16 and a fully connected layer.
通常来说,对底库图像进行特征提取所使用的特征提取网络与对待检索图像进行特征提取的特征提取网络相同。此处所说的相同是指网络结构以及网络参数均相同。Generally speaking, the feature extraction network used for feature extraction of the base library image is the same as the feature extraction network for feature extraction of the image to be retrieved. The same here means that the network structure and network parameters are the same.
检索引擎模块150主要是用于:将底库特征作为检索神经网络的权重,将待检索特征作为检索神经网络的输入,基于检索神经网络的模型结构进行加、减、乘和/或除运算,以得到底库特征与待检索特征的相似度。The retrieval engine module 150 is mainly used to: use the features of the base database as the weight of the retrieval neural network, use the features to be retrieved as the input of the retrieval neural network, and perform addition, subtraction, multiplication and/or division operations based on the model structure of the retrieval neural network. In order to obtain the similarity between the base library feature and the feature to be retrieved.
The retrieval engine module 150 may further perform operations such as normalizing the base library features and the features to be retrieved.
The sorting module 160 is configured to select, from the similarities obtained by the retrieval engine module 150, the similarities that meet the requirements.
In some examples, the sorting module 160 is specifically configured to: filter the similarities obtained by the retrieval engine module 150 based on a preset threshold, to remove the lower similarities; sort the remaining similarities in a certain order (for example, in descending order); and select the top X similarities from the sorted similarities, together with the X base library images corresponding to them. These X base library images are the retrieval result, where X is a positive integer less than the total number of base library images.
In other examples, the sorting module 160 may be specifically configured to: sort the similarities obtained by the retrieval engine module 150 in a certain order (for example, in descending order); and select the top X similarities from the sorted similarities, together with the X base library images corresponding to them. These X base library images are the retrieval result, where X is a positive integer less than the total number of base library images.
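The two variants of the sorting module described above (with and without threshold filtering) can be sketched as follows. This is a minimal illustration, not the implementation of the application: the function name `rank_similarities`, its parameters, and the sample scores are assumptions made for the example, and the list position of each score is assumed to be the index number of the corresponding base library image.

```python
def rank_similarities(similarities, top_x, threshold=None):
    """Return up to top_x (base_index, similarity) pairs, highest similarity first.

    similarities: list in which position i holds the similarity of base library image i
                  (hypothetical layout chosen for this sketch).
    threshold:    optional preset threshold; similarities not above it are discarded
                  before sorting. None skips the filtering step.
    """
    candidates = list(enumerate(similarities))
    if threshold is not None:
        # Filtering variant: drop the lower similarities first.
        candidates = [(i, s) for i, s in candidates if s > threshold]
    # Sort in descending order of similarity.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    # Keep the top X entries; the indices identify the retrieved base library images.
    return candidates[:top_x]


scores = [0.12, 0.91, 0.47, 0.88, 0.05]
top = rank_similarities(scores, top_x=2, threshold=0.4)
print(top)
```

Filtering before sorting shrinks the list that has to be sorted, which is why a higher threshold tends to make the ranking step cheaper.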
It can be understood that the architecture of the image retrieval system shown in FIG. 1 is only an example, and an image retrieval system to which the image retrieval method of this application is applied may include more or fewer modules. For example, if the neural network used to extract the base library features and the retrieval neural network can run on the same deep learning platform, the image retrieval system may omit the model conversion module 120. As another example, the image retrieval system may include the neural network used to extract the base library features.
An exemplary structural diagram of an image retrieval device according to an embodiment of this application is shown in FIG. 2. The image retrieval device 200 shown in FIG. 2 may include a main processor 210, a memory 220, and a neural network processor 230.
The main processor 210 may be a central processing unit (CPU). The main processor 210 may also be referred to as the host CPU.
The functions of the feature base library module 110, the model conversion module 120, the deep learning platform 130, the query module 140, and the sorting module 160 may be implemented by the main processor 210. The function of the retrieval engine module 150 may be implemented by the neural network processor 230.
The memory 220 may store the instructions corresponding to the feature base library module 110, the model conversion module 120, the deep learning platform 130, the query module 140, the retrieval engine module 150, and the sorting module 160, as well as the base library features and the features to be retrieved. It may also store the base library images and the images to be retrieved.
The neural-network processing unit (NPU) 230 is mainly used to perform the addition, subtraction, multiplication, and division operations required for network inference. For example, the NPU 230 performs the multiply-accumulate operations required for network inference. The neural network processor 230 may be mounted on the host CPU as a coprocessor, with tasks assigned by the host CPU.
The neural network processor 230 may include an input memory 201, a weight memory 202, an arithmetic circuit 203, a controller 204, a storage unit access controller 205, a unified memory 206, and an instruction fetch buffer 209.
The unified memory 206, the input memory 201, the weight memory 202, and the instruction fetch buffer 209 are all on-chip memories.
The storage unit access controller (direct memory access controller, DMAC) 205 is mainly used to transfer the input data and the weight data from the memory 220 to the unified memory 206.
Further, the DMAC 205 may also be used to transfer the weight data from the unified memory 206 to the weight memory 202, transfer the input data from the unified memory 206 to the input memory 201, and transfer instructions to the instruction fetch buffer 209.
The controller 204 executes the instructions stored in the instruction fetch buffer 209 and controls the arithmetic circuit to operate on the weights stored in the weight memory 202 and the input data in the input memory 201.
The NPU 230 executes the corresponding instructions through the controller 204, which controls the arithmetic circuit 203 to fetch the matrix data from the weight memory 202 and the input memory 201 and perform matrix operations.
In some implementations, the arithmetic circuit 203 may include a three-dimensional cube arithmetic unit. The arithmetic circuit 203 may also be a one-dimensional systolic array, a two-dimensional systolic array, or another electronic circuit capable of performing mathematical operations such as multiplication and addition.
For example, suppose there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit 203 fetches the data of matrix B from the weight memory 202 and caches it on each arithmetic unit in the circuit. The arithmetic circuit 203 then fetches the data of matrix A from the input memory 201, performs the matrix operation with matrix B, and stores the partial or final results of the resulting matrix in the unified memory 206.
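As a software analogy of the A, B, C example above, the following sketch computes C = A x B by accumulating partial results one slice of the shared dimension at a time. It is only an illustration under stated assumptions: the function name `tiled_matmul`, the tile size, and the sample matrices are hypothetical, and the text does not specify how the hardware actually partitions the work.

```python
def tiled_matmul(a, b, tile=2):
    """Compute C = A x B by accumulating partial results slice by slice.

    a is the input matrix (rows x inner), b is the weight matrix (inner x cols).
    Each pass over a slice of the shared dimension yields a partial result that
    is added into c, loosely mirroring "partial or final results" being stored.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    c = [[0.0] * cols for _ in range(rows)]
    for k0 in range(0, inner, tile):
        k1 = min(k0 + tile, inner)
        for i in range(rows):
            for j in range(cols):
                # Partial dot product over the current slice only.
                partial = sum(a[i][k] * b[k][j] for k in range(k0, k1))
                c[i][j] += partial
    return c


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # input matrix A (2 x 3)
b = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]  # weight matrix B (3 x 2)
c = tiled_matmul(a, b)
print(c)
```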
FIG. 3 is an exemplary flowchart of an image retrieval method according to an embodiment of this application. As shown in FIG. 3, the method may include S310 to S340.
S310: Acquire base library features, where the base library features are features of the images in an image library.
The images in the image library may be referred to as base library images. Typically, the base library images are collected in advance.
The images in the image library may be numbered, that is, each image is assigned an index number that identifies it. After the features of the images in the image library are acquired, an index number may also be assigned to each base library feature, and the index number of a base library feature may be the same as the index number of the image it corresponds to.
The images in the image library may include one or more of face images, fingerprint images, license plate images, vehicle images, human figure images, and the like. The images in the image library may be updated as required, for example by adding, deleting, or replacing images.
In some implementations, each image in the image library may be input into a neural network for feature extraction, and the extracted features are referred to as base library features. The feature extraction network may be a general-purpose neural network for feature extraction, or a redesigned neural network capable of extracting features. For example, when the image library contains 300,000 base library images, the neural network used for feature extraction may include ResNet50 or VGG16. In other implementations, the base library features may be received or copied directly from other devices or systems.
One representation of a base library feature is a matrix. If the matrix is one-dimensional, the base library feature is represented as a vector, which may be referred to as a base library feature vector. In one example, a base library feature vector may have 256 dimensions, each of which is a 32-bit floating-point number (fp32).
One example of the collection of all acquired base library features is the feature base library module 110 in the image retrieval system shown in FIG. 1.
If the format of the acquired base library features is inconsistent with the format of the weight file of the retrieval neural network in S330, the base library features also need to be converted from their current format (which may be referred to as the initial format) to a format that the retrieval neural network can read (which may be referred to as the target format).
For example, the model conversion module 120 in the image retrieval system shown in FIG. 1 converts the base library features into a format supported by the neural network processor. Specifically, the original format file of the base library features may be read with the basic software package NumPy, and the weight data of the corresponding field in the weight file of the retrieval neural network may be replaced with the base library features from that file. Saving the file after the replacement yields the weight file of the retrieval neural network.
S320: Acquire features to be retrieved, where the features to be retrieved are features of the images to be retrieved.
For example, the query module 140 in the image retrieval system shown in FIG. 1 performs feature extraction on the image to be retrieved, to obtain its feature. The feature of the image to be retrieved may be referred to as the feature to be retrieved. One representation of the feature to be retrieved is a vector, which may be referred to as the feature vector to be retrieved.
In one example, a feature vector to be retrieved may have 256 dimensions, each of which is a 32-bit floating-point number (fp32).
The image to be retrieved may include a face image, a fingerprint image, a license plate image, a vehicle image, a human figure image, or the like. There may be a single image to be retrieved, or a batch of multiple images.
The images to be retrieved may be numbered, that is, each image to be retrieved is assigned an index number that identifies it. After the features to be retrieved are acquired, an index number may also be assigned to each feature to be retrieved, and the index number of a feature to be retrieved may be the same as the index number of the corresponding image to be retrieved.
S330: Use one of the base library features and the features to be retrieved as the weights of a neural network, use the other as the input of the neural network, and obtain the output of the neural network, where the neural network is used to implement a matrix operation between the input and the weights. This neural network may be referred to as the retrieval neural network.
For example, the retrieval engine module 150 in the image retrieval system shown in FIG. 1 performs the neural network inference and obtains the output of the neural network. Network inference means calling the underlying hardware (for example, a neural network processor) to perform the addition, subtraction, multiplication, and/or division operations between the input and the weights.
In some possible implementations, the base library features may be used as the weights of the neural network and the features to be retrieved as its input; in other implementations, the features to be retrieved may be used as the weights of the neural network and the base library features as its input.
In some implementations, each base library feature and each feature to be retrieved may be normalized before S330 is performed. The base library features may be normalized after the feature base library module 110 extracts them, after the model conversion module 120 performs the conversion, or before the retrieval engine module 150 performs inference. The features to be retrieved may be normalized after the query module 140 extracts them.
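The normalization step mentioned above can be sketched as L2 normalization, which is the form that makes the norms in the cosine formula equal to 1. The function name `l2_normalize` and the handling of an all-zero vector are assumptions made for this example; the text does not say how such a vector would be treated.

```python
import math


def l2_normalize(vector):
    """Scale a feature vector so that its 2-norm equals 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        # Assumption for this sketch: an all-zero vector is returned unchanged,
        # since it has no direction to preserve.
        return list(vector)
    return [x / norm for x in vector]


unit = l2_normalize([3.0, 4.0])
print(unit)
```

Because the base library changes rarely, normalizing its features once (for example right after extraction) avoids repeating the work for every query.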
The model structure of the neural network may be constructed based on the formula used to calculate the similarity between the base library features and the features to be retrieved. The following takes cosine similarity as the measure of similarity between the feature vector to be retrieved and the base library feature vectors, and describes how the model structure of the neural network relates to the similarity formula.
The cosine similarity is calculated as follows:

$$\cos\theta_k = \frac{A^T \cdot W_k}{\|A\|\,\|W_k\|} = \frac{\sum_i A_i W_{k,i}}{\sqrt{\sum_i A_i^2}\,\sqrt{\sum_i W_{k,i}^2}}, \qquad k = 1, \dots, n$$

Here, n indicates that there are n base library feature vectors; $A_i$ denotes the i-th element of the feature vector to be retrieved; $W_{k,i}$ denotes the i-th element of the k-th base library feature vector; $A^T$ denotes the transpose of the feature vector to be retrieved; $\|A\|$ denotes the 2-norm of the feature vector to be retrieved A, that is, the square root of the sum of the squares of its elements; $\|W_k\|$ denotes the 2-norm of the k-th base library feature vector $W_k$, that is, the square root of the sum of the squares of its elements; and $\cos\theta_k$ denotes the cosine similarity between the feature vector to be retrieved and the k-th base library feature vector. The closer $\cos\theta_k$ is to 1, the more similar the feature vector to be retrieved is to the k-th base library feature vector.
Based on the above cosine similarity formula, a retrieval neural network for computing $A^T \cdot W_k$ can be constructed, that is, a retrieval neural network that implements the matrix multiplication of A and $W_k$.
In one example, a retrieval neural network may be constructed that contains a network structure similar to the fully connected layer in TensorFlow or Caffe, and the weights of this network structure, that is, its parameters, are initialized.
After the retrieval neural network is determined, the base library features are used as its weights, the features to be retrieved are used as its input, and its output is obtained. In one example, the main processor 210 may load the base library features acquired in S310 into the weight memory 202 of the neural network processor 230, load the features to be retrieved acquired in S320 into the input memory 201 of the neural network processor 230, and have the controller 204 control the arithmetic circuit 203 to perform network inference, that is, to compute the value of $A^T \cdot W_k$.
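The idea of treating the base library features as the weights of a fully connected layer can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the retrieval neural network of the application: `retrieval_layer` is a hypothetical name, the layer is assumed to have no bias term and no activation, and the tiny two-dimensional vectors stand in for the 256-dimensional features mentioned in the text.

```python
def retrieval_layer(queries, base_features):
    """Bias-free fully connected layer whose weights are the base library features.

    queries:       m feature vectors to be retrieved (the layer input).
    base_features: n base library feature vectors (the layer weights).
    Returns an m x n matrix in which output[j][k] is the inner product of
    query j with base library feature k.
    """
    return [
        [sum(q * w for q, w in zip(query, base)) for base in base_features]
        for query in queries
    ]


base_features = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]  # n = 3, already unit length
queries = [[0.6, 0.8], [1.0, 0.0]]                     # m = 2, already unit length
output = retrieval_layer(queries, base_features)
print(len(output), len(output[0]))
```

With unit-length vectors, as in this example, each entry of the output is already a cosine similarity, and one forward pass scores every query against every base library feature.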
S340: Retrieve a target image from the image library according to the output of the neural network.
In the embodiments of this application, after the output of the neural network is obtained, the similarity between the base library features and the features to be retrieved can be determined based on that output, and the target image can be retrieved from the image library according to the similarity.
In some implementations, if each base library feature vector and each feature vector to be retrieved has been normalized before S330, then $\|A\| = 1$ and $\|W_k\| = 1$, so the cosine similarity formula between the base library feature vector and the feature vector to be retrieved simplifies to $\cos\theta_k = A^T \cdot W_k$. In this case, the output of the neural network can be used directly as the similarity between the base library feature and the feature to be retrieved. For example, the value of $A^T \cdot W_k$ output by the neural network processor 230 can be used as the similarity between the k-th base library feature and the feature to be retrieved.
In other implementations, if the base library feature vectors and the feature vectors to be retrieved have not been normalized before S330, the output of the neural network may be further processed to obtain the similarity between the base library features and the features to be retrieved. For example, the value of $A^T \cdot W_k$ output by the neural network processor 230 may be divided by $\|A\| \cdot \|W_k\|$, and the quotient used as the similarity between the k-th base library feature and the feature to be retrieved. The values of $\|A\|$ and $\|W_k\|$ may also be computed based on a neural network. For example, A is used as both the input and the weight of a neural network, and the neural network performs the matrix operation on that input and weight; likewise, $W_k$ is used as both the input and the weight of a neural network, and the neural network performs the matrix operation on that input and weight. The product of a vector with itself is its squared 2-norm, so the norm follows by taking the square root.
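For the case without prior normalization, the post-processing described above can be sketched as follows. The names `dot` and `cosine_from_raw_output` are hypothetical, and taking the square root of the self-product is a step added here to turn the squared norm into the norm; the text itself only states that the norms may be computed by using a vector as both input and weight.

```python
import math


def dot(u, v):
    """Inner product, standing in for the matrix operation of the retrieval network."""
    return sum(x * y for x, y in zip(u, v))


def cosine_from_raw_output(query, base):
    """Turn the raw inner product of unnormalized vectors into a cosine similarity."""
    raw = dot(query, base)                      # network output for this pair
    query_norm = math.sqrt(dot(query, query))   # query used as both input and weight
    base_norm = math.sqrt(dot(base, base))      # base feature used as both input and weight
    return raw / (query_norm * base_norm)


similarity = cosine_from_raw_output([3.0, 4.0], [6.0, 8.0])
print(similarity)
```

The two sample vectors point in the same direction, so the quotient comes out as a cosine of 1 even though neither vector has unit length.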
The number of similarities output by the neural network is determined by the number of base library features and the number of features to be retrieved. For example, if n base library features are used as weights and m features to be retrieved are used as input, the neural network outputs m*n similarities, where m and n are positive integers.
For example, when 300,000 base library feature vectors are used as the weights of the retrieval neural network and 16 feature vectors to be retrieved (a batch of 16) are used as its input, the output of the retrieval neural network consists of 16*300,000 similarity values, where the value range of each similarity value may be [0,1].
In some examples, the neural network processor may include a cube arithmetic unit. The retrieval neural network can then rely on the strong cube matrix computing capability of the neural network processor to speed up the calculation of the similarity between the base library features and the features to be retrieved.
For example, a single cube arithmetic unit can usually complete one 16*16*16 matrix multiplication, or an even larger one, in one cycle. At the same clock frequency, this can make the computation 8192 times more efficient than matrix operations implemented on a conventional image processor or a conventional CPU. The reason is that a conventional CPU can complete only one multiplication or one addition per clock cycle, so performing the multiplications and additions for 16*16*16 data items takes 16*16*16*2 clock cycles, whereas a cube unit can complete the multiply-add operation for 16*16*16 data items in one clock cycle. Processing with the cube unit can therefore be 8192 times more efficient.
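The factor of 8192 follows directly from the cycle counts given above. As a worked check, assuming (as the text does) one multiplication or one addition per CPU clock cycle and one full 16*16*16 multiply-add per cube cycle:

$$\frac{\text{CPU cycles}}{\text{cube cycles}} = \frac{16 \times 16 \times 16 \times 2}{1} = \frac{4096 \times 2}{1} = 8192$$

The factor of 2 accounts for each of the 4096 element pairs needing one multiplication and one addition.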
Even when multiple channels and multiple batches of features to be retrieved are input, computing the similarity between the base library features and the features to be retrieved through neural network inference can bring the computation time down to the millisecond level.
In some implementations, after the similarities between the base library features and the features to be retrieved are obtained, the similarity values may be filtered to keep those above a preset threshold, together with the index numbers of the corresponding images to be retrieved and base library images. These similarities are then sorted in descending order, and the top K similarities are selected together with the index numbers of their corresponding images to be retrieved and base library images. Finally, the K similarities and the corresponding images to be retrieved and base library images can be output, which completes the image retrieval, where K is a positive integer. The K base library images are the retrieved target images, in one-to-one correspondence with the K images to be retrieved.
The threshold can be set as needed. In general, it should be set relatively high: if the threshold is low, more similarities are retained, which reduces the efficiency of the subsequent sorting and of selecting the top K similarities and the corresponding K base library images.
In the image retrieval method of this application, both the base library features and the features to be retrieved may be obtained by neural networks, and a neural network is also used to calculate the similarity between them. This reduces repeated movement of the base library features and the features to be retrieved, and lowers the time spent on data movement or transmission. In addition, the image retrieval method of this application can reduce retrieval time, which improves retrieval efficiency and, in turn, user experience. Furthermore, the features to be retrieved can be input to the retrieval neural network in batches, that is, multiple features at a time, which can further improve retrieval efficiency.
FIG. 4 is an exemplary structural diagram of the image retrieval apparatus of this application. The apparatus 400 includes an acquisition module 410, an operation module 420, and a retrieval module 430. The apparatus 400 can implement the method shown in FIG. 3.
For example, the acquisition module 410 is configured to perform S310 and S320, the operation module 420 is configured to perform S330, and the retrieval module 430 is configured to perform S340.
In some implementations, the apparatus 400 may be deployed in a cloud environment, which is an entity that uses basic resources to provide cloud services to users in a cloud computing mode. The cloud environment includes a cloud data center and a cloud service platform. The cloud data center includes a large number of basic resources (including computing resources, storage resources, and network resources) owned by a cloud service provider, and its computing resources may be a large number of computing devices (for example, servers). The apparatus 400 may be a server for image retrieval in the cloud data center, or a virtual machine for image retrieval created in the cloud data center. The apparatus 400 may also be a software apparatus for image retrieval deployed on servers or virtual machines in the cloud data center; this software apparatus may be deployed in a distributed manner on multiple servers, on multiple virtual machines, or on a combination of virtual machines and servers. For example, the acquisition module 410, the operation module 420, and the retrieval module 430 of the apparatus 400 may be deployed in a distributed manner on multiple servers, on multiple virtual machines, or on virtual machines and servers.
The apparatus 400 may be abstracted by the cloud service provider into an image retrieval cloud service on the cloud service platform and offered to users. After a user purchases this cloud service on the cloud service platform, the cloud environment uses it to provide image retrieval to the user. The user may upload the image to be retrieved to the cloud environment through an application program interface (API) or through a web interface provided by the cloud service platform. The apparatus 400 receives the image to be retrieved, performs image retrieval based on it, and returns the target image found to the edge device where the user is located.
When the apparatus 400 is a software apparatus, it may also be deployed on its own on a single computing device in any environment.
This application further provides an apparatus 500 as shown in FIG. 5. The apparatus 500 includes a processor 502, a communication interface 503, and a memory 504. One example of the apparatus 500 is a chip. Another example of the apparatus 500 is a computing device.
The processor 502, the memory 504, and the communication interface 503 may communicate with one another through a bus. The memory 504 stores executable code, and the processor 502 reads the executable code in the memory 504 to execute the corresponding method. The memory 504 may also include other software modules required for running processes, such as an operating system. The operating system may be LINUX™, UNIX™, WINDOWS™, or the like.
For example, the executable code in the memory 504 is used to implement the method shown in FIG. 3, and the processor 502 reads this executable code from the memory 504 to execute the method shown in FIG. 3.
The processor 502 may be a central processing unit (CPU); alternatively, an exemplary structure of the processor 502 is shown in FIG. 2. The memory 504 may include volatile memory, such as random access memory (RAM). The memory 504 may also include non-volatile memory (NVM), such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state disk (SSD).
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such an implementation should not be considered to go beyond the scope of this application.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may be found in the corresponding processes of the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of an embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing describes only specific implementations of this application, but the protection scope of this application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (14)

  1. An image retrieval method, comprising:
    acquiring a feature to be retrieved, wherein the feature to be retrieved is a feature of an image to be retrieved;
    acquiring a base library feature, wherein the base library feature is a feature of a base library image, and the base library image is an image in a pre-configured image library;
    using one of the base library feature and the feature to be retrieved as a weight of a neural network, using the other of the base library feature and the feature to be retrieved as an input of the neural network, and obtaining an output of the neural network, wherein the neural network is used to implement a matrix operation on the input and the weight; and
    retrieving a target image from the image library according to the output of the neural network.
  2. The image retrieval method according to claim 1, wherein obtaining the output of the neural network comprises:
    obtaining the output of the neural network by means of a neural network processor.
  3. The image retrieval method according to claim 2, wherein the neural network processor comprises a three-dimensional operation unit, and the minimum duration for the three-dimensional operation unit to perform a multiplication between three-dimensional matrices is one clock cycle.
  4. The image retrieval method according to any one of claims 1 to 3, wherein the feature to be retrieved is used as the input, and the base library feature is used as the weight.
  5. The image retrieval method according to any one of claims 1 to 4, wherein the neural network comprises a fully connected layer.
  6. An image retrieval apparatus, comprising:
    an acquisition module, configured to acquire a feature to be retrieved, wherein the feature to be retrieved is a feature of an image to be retrieved;
    wherein the acquisition module is further configured to acquire a base library feature, the base library feature is a feature of a base library image, and the base library image is an image in a pre-configured image library;
    an operation module, configured to use one of the base library feature and the feature to be retrieved as a weight of a neural network, use the other of the base library feature and the feature to be retrieved as an input of the neural network, and obtain an output of the neural network, wherein the neural network is used to implement a matrix operation on the input and the weight; and
    a retrieval module, configured to retrieve a target image from the image library according to the output of the neural network.
  7. The image retrieval apparatus according to claim 6, wherein the operation module is specifically configured to obtain the output of the neural network by means of a neural network processor.
  8. The image retrieval apparatus according to claim 7, wherein the neural network processor comprises a three-dimensional operation unit, and the minimum duration for the three-dimensional operation unit to perform a multiplication between three-dimensional matrices is one clock cycle.
  9. The image retrieval apparatus according to any one of claims 6 to 8, wherein the feature to be retrieved is used as the input, and the base library feature is used as the weight.
  10. The image retrieval apparatus according to any one of claims 6 to 9, wherein the neural network comprises a fully connected layer.
  11. An image retrieval apparatus, comprising a processor coupled to a memory, wherein:
    the memory is configured to store instructions; and
    the processor is configured to execute the instructions stored in the memory, so that the apparatus performs the method according to claim 1.
  12. The apparatus according to claim 11, wherein the processor comprises a neural network processor;
    wherein the neural network processor is configured to obtain the output of the neural network, the weight of the neural network is one of the base library feature and the feature to be retrieved, the input of the neural network is the other of the base library feature and the feature to be retrieved, and the neural network is used to implement a matrix operation on the input and the weight.
  13. The apparatus according to claim 12, wherein the neural network processor comprises a three-dimensional CUBE operation unit, and the minimum duration for the three-dimensional operation unit to perform a multiplication between three-dimensional matrices is one clock cycle.
  14. A computer-readable medium, comprising instructions which, when run on a processor, cause the processor to perform the method according to any one of claims 1 to 5.
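
The method of claims 1, 4, and 5 can be illustrated with a minimal sketch. It rests on several assumptions that the claims do not state: the matrix operation is taken to be an inner product between feature vectors, a higher score is taken to mean a closer match, and the names `fully_connected`, `retrieve`, `gallery`, and `queries` are hypothetical. It is written in plain Python for readability and does not model a neural network processor or its three-dimensional operation unit.

```python
from typing import List, Sequence


def fully_connected(inputs: Sequence[Sequence[float]],
                    weights: Sequence[Sequence[float]]) -> List[List[float]]:
    """Bias-free fully connected layer: one output per (input row, weight row).

    Each weight row is one base library feature, so output[i][j] is the
    inner product of query i with base library feature j (an assumed
    similarity measure).
    """
    return [
        [sum(q * g for q, g in zip(query, base)) for base in weights]
        for query in inputs
    ]


def retrieve(query_features: Sequence[Sequence[float]],
             base_features: Sequence[Sequence[float]],
             top_k: int = 1) -> List[List[int]]:
    """Return, for each query, the indices of the top_k highest-scoring images."""
    # Claim 4 arrangement: features to be retrieved are the input,
    # base library features are the weight.
    scores = fully_connected(query_features, base_features)
    results = []
    for row in scores:
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        results.append(ranked[:top_k])
    return results


# Hypothetical three-image base library with 3-dimensional features.
gallery = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.6, 0.8, 0.0],
]
queries = [[0.0, 0.9, 0.1]]

# Indices into the image library of the two best-scoring base library images.
top_matches = retrieve(queries, gallery, top_k=2)
print(top_matches)
```

Loading the base library features as layer weights means that a batch of queries is scored against the whole library in a single matrix multiplication, which is the kind of operation a matrix-oriented accelerator is built to run.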
PCT/CN2020/077238 2020-02-28 2020-02-28 Image retrieval method and image retrieval apparatus WO2021168815A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080097508.1A CN115176244A (en) 2020-02-28 2020-02-28 Image search method and image search device
PCT/CN2020/077238 WO2021168815A1 (en) 2020-02-28 2020-02-28 Image retrieval method and image retrieval apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/077238 WO2021168815A1 (en) 2020-02-28 2020-02-28 Image retrieval method and image retrieval apparatus

Publications (1)

Publication Number Publication Date
WO2021168815A1 true WO2021168815A1 (en) 2021-09-02

Family

ID=77489774

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/077238 WO2021168815A1 (en) 2020-02-28 2020-02-28 Image retrieval method and image retrieval apparatus

Country Status (2)

Country Link
CN (1) CN115176244A (en)
WO (1) WO2021168815A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023124135A1 (en) * 2021-12-29 2023-07-06 上海商汤智能科技有限公司 Feature retrieval method and apparatus, electronic device, computer storage medium and program

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116595233A (en) * 2023-06-02 2023-08-15 上海爱可生信息技术股份有限公司 Vector database retrieval processing acceleration method and system based on NPU

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108510058A (en) * 2018-02-28 2018-09-07 中国科学院计算技术研究所 Weight storage method in neural network and the processor based on this method
CN108805280A (en) * 2017-04-26 2018-11-13 上海荆虹电子科技有限公司 A kind of method and apparatus of image retrieval
CN109063824A (en) * 2018-07-25 2018-12-21 深圳市中悦科技有限公司 Creation method, device, storage medium and the processor of deep layer Three dimensional convolution neural network
CN110287350A (en) * 2019-06-29 2019-09-27 北京字节跳动网络技术有限公司 Image search method, device and electronic equipment
US20190325276A1 (en) * 2018-04-23 2019-10-24 International Business Machines Corporation Stacked neural network framework in the internet of things
CN110674294A (en) * 2019-08-29 2020-01-10 维沃移动通信有限公司 Similarity determination method and electronic equipment

Also Published As

Publication number Publication date
CN115176244A (en) 2022-10-11

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20921779

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20921779

Country of ref document: EP

Kind code of ref document: A1