CN114625903A

CN114625903A - Image retrieval method and device and image retrieval equipment

Info

Publication number: CN114625903A
Application number: CN202011460941.5A
Authority: CN
Inventors: 屠震元; 俞忠伟; 叶挺群
Original assignee: Hangzhou Hikvision Digital Technology Co Ltd
Current assignee: Hangzhou Hikvision Digital Technology Co Ltd
Priority date: 2020-12-11
Filing date: 2020-12-11
Publication date: 2022-06-14

Abstract

The application discloses an image retrieval method. On the image retrieval device side, the method comprises: obtaining feature vector data of at least one image to be retrieved and of the compared images in the base database; selecting operators for calculating the similarity between the image to be retrieved and the compared images according to the hardware platform type of the image retrieval device and the feature dimension and/or dimension division strategy of the feature vector data; calling and executing the operator binary files of the selected operators to obtain a similarity result; and comparing the similarity result to obtain, from the base database, the compared image matched with the image to be retrieved as the retrieval result. The method facilitates building a retrieval acceleration method that exploits the hardware characteristics of the platform for the best performance. Because the operators are selected based on the input feature dimension, the method can also support retrieval at dynamic resolution, so that retrieval results at different resolutions are obtained.

Description

Image retrieval method and device and image retrieval equipment

Technical Field

The present application relates to the field of image analysis, and in particular, to an image retrieval method and apparatus.

Background

In image analysis, it is usually necessary to compare an image to be analyzed with an image to be compared, for example, in video analysis, input images or video data are compared with image data in a database one by one to obtain a retrieval result.

The image retrieval mainly compares the features extracted based on the image to be retrieved with the features extracted based on the compared image to obtain the best matched compared image, thereby obtaining the retrieval result.

Because the features extracted from an image are large, memory use grows quickly. For example, if a single feature vector has 512 dimensions, each dimension is represented by a 16-bit floating-point number, and there are 1,000,000 feature vectors, the total memory occupied is:

512 (dimensions) × 2 (bytes) × 1,000,000 (vectors) = 1,024,000,000 bytes ≈ 0.95 GB

When such a large amount of data is loaded on hardware, the hardware memory is severely limited, and meanwhile, the large amount of data also means that the calculation amount is large and the acceleration performance is not high.
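The estimate above can be checked with a few lines of Python. This is only a worked version of the arithmetic in the text; the variable names are illustrative, and the result is expressed in binary gigabytes (2^30 bytes), which is what yields the 0.95 figure.

```python
# Worked version of the memory estimate in the Background section.
FEATURE_DIM = 512          # dimensions per feature vector (from the text)
BYTES_PER_VALUE = 2        # one 16-bit floating-point number per dimension
NUM_VECTORS = 1_000_000    # one million feature vectors

total_bytes = FEATURE_DIM * BYTES_PER_VALUE * NUM_VECTORS
total_gib = total_bytes / 2**30   # binary gigabytes (1 GiB = 2**30 bytes)

print(total_bytes)           # 1024000000
print(round(total_gib, 2))   # 0.95
```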

Disclosure of Invention

The embodiment of the application provides an image retrieval method, which is used for improving the acceleration performance under a hardware platform.

The image retrieval method provided by the embodiment of the application is realized as follows: on the side of the image retrieval device,

acquiring at least one image to be retrieved and the feature vector data of the compared image in the base database,

selecting operators for calculating the similarity between the image to be retrieved and the compared image according to the hardware platform type of the image retrieval equipment and the characteristic dimensions and/or the dimension division strategy of the characteristic vector data,

calling and executing the operator binary file of the selected operator according to the selected operator to obtain a similarity result,

and comparing the similarity result, and obtaining a compared image matched with the image to be retrieved from the base database as a retrieval result.

Preferably, before selecting, according to the hardware platform type of the image retrieval device and the feature dimension and/or dimension division strategy of the feature vector data, each operator for calculating the similarity between the image to be retrieved and the compared image, the method further comprises,

and according to the type of the hardware platform, carrying out alignment operation on the feature dimensions of the feature vector data to obtain a data arrangement format matched with the type of the hardware platform.

Preferably, the acquiring of the feature vector data of at least one image to be retrieved and of the compared image in the base database further comprises,

filtering out feature vector data which do not accord with the set dimension according to the set dimension;

and/or,

and limiting the quantity of the feature vectors contained in the feature vector data of the image to be retrieved according to the resources of the hardware platform.

Preferably, the selecting, according to the hardware platform type of the image retrieval device and according to the feature dimension of the feature vector data and/or the dimension division strategy, each operator for calculating the similarity between the image to be retrieved and the compared image includes,

selecting an operator according to the relation between the on-chip resources of the hardware platform and the data quantity of the feature vector data,

wherein,

the on-chip resources of the hardware platform are determined by the hardware platform type,

the data amount of the feature vector data is determined by the feature dimension and the number of feature vectors,

the dimension division strategy is determined by the memory size of the hardware platform.

Preferably, the aligning the feature dimensions of the feature vector data according to the hardware platform type to obtain the data arrangement format matched with the hardware platform type includes,

converting a feature vector matrix which is formed by taking the feature dimension of the feature vector as the number of rows and taking the number of the feature vectors as the number of columns into more than one fragmentation matrix;

the fragmentation matrix comprises feature vector data taking the number of feature vectors as a line number and the fragmentation feature dimensions of the feature vectors as a column number, the size of the fragmentation feature dimensions is determined by the number of memory alignments, and the total number of fragments of the fragmentation matrix is rounding-up of the quotient of the feature dimensions of the feature vectors and the fragmentation feature dimensions;

selecting an operator according to the relation between the on-chip resources of the hardware platform and the data volume of the feature vector data, wherein the operator comprises comparing the total number of the fragments with the data volume in the fragment matrix;

if the total number of fragments is greater than or equal to the data amount in the fragmentation matrix, and the on-chip resources can hold the feature vector data consisting of all the fragmentation matrices, selecting a first operator for processing data in a multi-channel parallel and fragmentation-matrix parallel manner, so that the more than one fragmentation matrix is directly expanded into a two-dimensional form, with the total number of fragments as the number of rows and the data amount in the fragmentation matrix as the number of columns, for data operation, wherein the number of channels is the same as the total number of fragments,

otherwise, segmentation is carried out according to the data size which can be contained by the on-chip resources and based on the total number of the fragments, and an operator is selected according to the mode that the on-chip resources can contain the segmented data size.

Preferably, slicing the total number of fragments according to the data size that the on-chip resources can accommodate, and selecting an operator in such a way that the on-chip resources can accommodate the sliced data size, includes,

rounding up the quotient of the set threshold value and the data volume in the fragmentation matrix to obtain the number of single channels, and taking the fragmentation matrix of at least one single channel as the data volume after segmentation; wherein the threshold is determined by the memory size of the hardware platform,

comparing the amount of data in the fragmentation matrix to the threshold,

if the data quantity of the fragmentation matrix is larger than or equal to the threshold value, selecting a second operator, wherein the operator is used for processing data in a single-channel fragmentation matrix parallel mode, so that more than one fragmentation matrix is directly unfolded to perform data operation in a two-dimensional mode that a single channel is used as a row number and the data quantity in the fragmentation matrix is used as a column number,

and otherwise, selecting a third operator for processing data in parallel within each single-channel fragmentation matrix and serially across at least two single-channel fragmentation matrices, so that the more than one fragmentation matrix is directly expanded into a two-dimensional form, with a single channel as the number of rows and the data amount in the fragmentation matrix as the number of columns, and data operations are performed on all single-channel fragmentation matrices in sequence.

Preferably, said calling and executing the operator binary file of the selected operator according to the selected operator further comprises,

for each selected operator, packing and describing the parameters required to implement the operator,

calling an operator binary file of the selected operator according to the operator name in the parameter;

the operator binary file is obtained from an operator file pre-stored outside the image retrieval device,

wherein the operator files comprise operator binary files of each hardware platform type and/or operator implementation files of each hardware platform type for direct binary compilation to generate the operator binary files,

each operator corresponds to operator implementation files of different file types according to the optimization degree and/or the feature dimension,

the same operator implementation file corresponds to different operator binary files depending on the optimization degree and/or the compiler.

An image retrieval apparatus provided by an embodiment of the present application, the apparatus includes,

a feature vector data acquisition module for acquiring the feature vector data of the image to be retrieved and of the compared image in the base database,

an operator selection module used for selecting each operator for calculating the similarity between the image to be retrieved and the compared image according to the hardware platform type of the image retrieval equipment and the characteristic dimension and/or the dimension division strategy of the characteristic vector data,

the operator realization callback module is used for calling and executing the operator binary file of the selected operator according to the selected operator to obtain a similarity result,

and the similarity comparison module is used for comparing the similarity result and obtaining the compared image which is most matched with the image to be retrieved from the base database data as the retrieval result.

An embodiment of the present application further provides an image retrieval device, which includes a hardware platform, where the hardware platform includes a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to implement any of the steps of the image retrieval method described above.

According to the image retrieval method, each operator for calculating the similarity between the image to be retrieved and the compared image is selected according to the hardware platform type of the image retrieval device and the feature dimension and/or dimension division strategy of the feature vector data. Therefore, for different hardware platforms, the operator best matched to the hardware platform type can be selected, so that the best computing performance on a specific hardware platform is achieved, which facilitates building a retrieval acceleration method that exploits the hardware characteristics of the platform for the best performance. In addition, because the operators are selected based on the input feature dimension, the method can support retrieval at dynamic resolution, so that retrieval results at different resolutions are obtained.

Drawings

Fig. 1 is a schematic flowchart of an image retrieval method according to an embodiment of the present application.

Fig. 2 is a schematic diagram of face image retrieval.

Fig. 3 is a schematic diagram of feature extraction performed by a deep learning network during establishment of a face database, and feature extraction performed by a to-be-retrieved face image by a deep learning network.

Fig. 4 is a schematic diagram of similarity calculation based on GEMM.

Fig. 5 is a flowchart illustrating an image retrieval method according to an embodiment of the present application.

Fig. 6 is a schematic diagram of the NCHW and NHWC32-aligned data arrangements.

Fig. 7 is a diagram illustrating a data arrangement format conversion.

Fig. 8 is a schematic diagram of different operators selected according to different resolutions.

FIG. 9 is a schematic flow chart of operator selection.

FIG. 10 is a schematic diagram of a first operator implementation.

FIG. 11 is a schematic diagram of a second operator implementation.

FIG. 12 is a schematic diagram of a third operator implementation.

Fig. 13 is a schematic diagram of an image retrieval apparatus according to an embodiment of the present application.

Fig. 14 is a schematic diagram of an image retrieval device according to an embodiment of the present application.

Detailed Description

For the purpose of making the objects, technical means and advantages of the present application more apparent, the present application will be described in further detail with reference to the accompanying drawings.

The applicant has found that image retrieval based on a General Matrix Multiplication (GEMM) algorithm computes distances over the full sample set, so the precision is lossless but the amount of calculation is very large. The applicant has further found that, because hardware resources are limited, GEMM image retrieval methods implemented on some platforms do not support dynamically changing resolution: whatever resolution is set at initialization is the fixed resolution that must be used when feature retrieval is executed. For example, when GEMM image retrieval runs on an ARM-based hardware platform, a specific optimized dimension is specified when the operator implementation code is compiled, so that dimension information does not need to be specified again at run time, which is how the performance acceleration is achieved.

Referring to fig. 1, fig. 1 is a schematic flow chart of an image retrieval method according to an embodiment of the present application. The method is applied to the image retrieval device side, and comprises the following steps,

step 101, obtaining the current image to be retrieved and the feature vector data of the compared image in the base database,

step 102, according to the hardware platform type of the image retrieval device and the feature dimension and/or dimension division strategy of the feature vector data, selecting each operator for calculating the similarity between the image to be retrieved and the compared image,

it should be understood that the hardware platform type includes, but is not limited to, the kind, the model, and different generations of hardware platforms of the same kind and/or model; for example, by kind, hardware platform types include a reduced instruction set microprocessor (ARM), a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), and the like, and by model, the CPU may further include the Skylake architecture, the Cascade Lake architecture, and the like;

when the images to be retrieved differ, the feature dimensions of their feature vector data may differ, and so may the selected operators.

Step 103, according to the selected operator, calling and executing the operator binary file of the selected operator to obtain a similarity result,

and step 104, comparing the similarity results, and obtaining a compared image matched with the image to be retrieved from the base database as a retrieval result. Alternatively, the compared image that meets the set matching threshold is taken as the retrieval result.

Step 105, judging whether all the images to be retrieved have been retrieved; if so, ending; otherwise, taking the next image to be retrieved as the current image to be retrieved and returning to step 101.

According to the embodiment of the application, through characteristic dimension analysis of the characteristic vector of the current image to be retrieved, the optimal operator is selected according to the type of the hardware platform, and therefore the optimal calculation performance is achieved under the specific hardware platform. The method and the device realize acceleration from the aspect of input characteristic dimension information, so that the dynamic transformation resolution can be supported, the platform universality is realized, and the optimal performance is obtained on different platforms.
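The flow of steps 101 to 105 can be sketched as a simple loop. This is a minimal illustration and not the implementation of the embodiment: `retrieve_all`, `select_operator`, and `dot_operator` are hypothetical names, the operator is chosen from the feature dimension only (the hardware platform type is left out for brevity), and a plain inner product stands in for the compiled operator binary.

```python
def retrieve_all(queries, base_features, select_operator, match_threshold=None):
    """Run steps 101-105 for every query feature vector.

    select_operator is a hypothetical callable standing in for step 102; it
    returns a function mapping (query, base_features) to a list of scores.
    """
    results = []
    for query in queries:                          # step 105: next image
        operator = select_operator(len(query))     # step 102: pick by dimension
        scores = operator(query, base_features)    # step 103: similarity result
        if match_threshold is None:                # step 104: best match
            matched = [max(range(len(scores)), key=scores.__getitem__)]
        else:                                      # step 104, threshold variant
            matched = [i for i, s in enumerate(scores) if s >= match_threshold]
        results.append(matched)
    return results


def dot_operator(query, base_features):
    """Stand-in operator: inner product of the query with each base vector."""
    return [sum(q * b for q, b in zip(query, row)) for row in base_features]


base = [[1, 0], [0, 1], [1, 1]]
queries = [[2, 3], [5, 0]]
best = retrieve_all(queries, base, lambda dim: dot_operator)
above = retrieve_all(queries, base, lambda dim: dot_operator, match_threshold=3)
print(best)   # [[2], [0]]
print(above)  # [[1, 2], [0, 2]]
```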

For the sake of understanding the present application, a face matching search will be described as an example, and it should be understood that the present application is not limited to the search of a face image, and the search of an image including any content, for example, a vehicle image, a topographic image, a text image, and the like, may also be applied.

Referring to fig. 2, fig. 2 is a schematic diagram of face image retrieval. The face database is established offline, as opposed to the real-time face retrieval, and serves as the base database: a set of face image features obtained by extracting features from the face images that act as compared images. The base database may carry a wide range of attribute features, such as gender features, wearing features (for example, wearing a hat or wearing glasses), and library-type features, such as category features for various application scenarios and/or uses. It may be composed of several smaller base databases, each with different usage characteristics, such as a traffic base database or a bank base database, which together form the face database.

The face image to be retrieved can be a living body image obtained by various camera devices, and feature extraction is carried out on the basis of the image, so that a feature vector of the face image to be retrieved is obtained. The various types of camera devices include, but are not limited to, RGB (red green blue) camera devices, RGB combined with infrared camera devices, RGB combined with depth information camera devices.

Referring to fig. 3, fig. 3 is a schematic diagram of feature extraction performed by a deep learning network when the face database is established, and of feature extraction performed by a deep learning network on the face image to be retrieved. There are many methods for extracting face features, and they are implemented differently on different hardware platforms. For example, the detection network may be YOLOv3 or YOLOv2, and the subsequent classification network may be ResNet-50.

When the image retrieval is carried out, similarity calculation is carried out on the feature vector of the face image to be retrieved and the feature vector in the base database data, and face data which is most matched with the face image to be retrieved is obtained from the base database data through similarity comparison, so that a retrieval result is obtained.

Referring to fig. 4, fig. 4 is a schematic diagram of similarity calculation based on GEMM. The base database data is a first matrix formed by M base database feature vectors with feature dimension K, where each row of the first matrix is a single base database feature vector of dimension K and the number of rows is M, so the first matrix holds M rows × K columns of data. The face image data to be retrieved is a second matrix formed by N feature vectors with feature dimension K, where each column of the second matrix is a single feature vector of an image to be retrieved and the number of columns is N, so the second matrix holds K rows × N columns of data. The first matrix is multiplied by the second matrix to obtain a third matrix with M rows and N columns, which is the similarity data.
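The matrix shapes described above can be illustrated with a small pure-Python sketch. It is a reference illustration only, assuming that the inner product serves as the similarity score; `gemm_similarity` is a hypothetical name, and a real operator would run an optimized GEMM kernel on the target hardware instead.

```python
def gemm_similarity(base, queries):
    """Multiply an M x K base matrix by a K x N query matrix.

    Returns an M x N matrix whose entry [i][j] is the inner product of
    base vector i with query vector j.
    """
    m, k = len(base), len(base[0])
    if len(queries) != k:
        raise ValueError("inner dimensions must match (K)")
    n = len(queries[0])
    return [
        [sum(base[i][p] * queries[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


base = [[1, 0], [0, 1], [1, 1]]   # M = 3 base vectors, K = 2, one per row
queries = [[2], [3]]              # K = 2, N = 1 query vector, one per column
sim = gemm_similarity(base, queries)
best_row = max(range(len(sim)), key=lambda i: sim[i][0])
print(sim)       # [[2], [3], [5]]
print(best_row)  # 2
```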

The above similarity calculation needs support from a hardware platform. Hardware resources differ between hardware platforms, for example, the on-chip resources differ; hardware acceleration performance differs, for example, different types of hardware platforms have different hardware acceleration instruction sets; and hardware computing power differs. As a result, the operators used for similarity calculation differ between hardware platforms. An operator may be understood as the processing for one link in the similarity calculation process; it is the smallest processing unit that implements a given function, including but not limited to reading and writing data, computing on data, and the like.

To improve the computing power of the hardware platform, refer to fig. 5, which is a schematic flow chart of the image retrieval method according to the embodiment of the present application. Taking the current image to be retrieved as an example, the method comprises,

step 501, filtering feature dimension information of the feature vector to obtain a filtered feature vector, wherein the dimension of the filtered feature vector conforms to a set dimension.

In this step, the feature dimension information of the feature vector of the current face image to be retrieved is filtered. To reduce meaningless calculation, irregular feature dimensions are filtered out; optionally, feature vectors whose K is not equal to a set dimension threshold are filtered out, for example, feature vectors whose K is not equal to 258, 512, or 1024. In view of the generality of the hardware platform, a threshold for N may also be set according to the hardware platform to limit N, for example, N does not exceed 64.

Since the similarity calculation is a matrix product of the first matrix and the second matrix, the feature vector dimension in the base database data needs to be filtered in the same way, so that the feature vector dimension of the base database data is the same as that of the face image to be retrieved.
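The filtering in step 501 can be sketched as follows. This is an assumption-laden illustration: `filter_features` is a hypothetical helper, the allowed dimensions and the limit on N are passed in as parameters, and truncating the list is only one possible way of limiting N, since the text does not say how excess vectors are handled.

```python
def filter_features(vectors, allowed_dims, max_vectors):
    """Keep vectors whose dimension K is allowed, then cap their number N."""
    kept = [v for v in vectors if len(v) in allowed_dims]
    # Assumption: excess vectors are simply dropped to respect the N limit.
    return kept[:max_vectors]


vectors = [[0.1] * 512, [0.2] * 300, [0.3] * 512, [0.4] * 1024]
filtered = filter_features(vectors, allowed_dims={512, 1024}, max_vectors=64)
print([len(v) for v in filtered])   # [512, 512, 1024]
```

The same function can be applied to the base database vectors so that both matrices share the same K before they are multiplied.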

Step 502, according to the type of the hardware platform, performing alignment operation on each feature dimension information of the feature vector, and converting the feature dimension information into a data arrangement format matched with the hardware platform.

Since different types of hardware platforms have different hardware acceleration instruction sets, the condition that a hardware acceleration instruction in the hardware acceleration instruction set can be called is that a main dimension of data operation can be aligned with a memory alignment number. Therefore, in order to better utilize the hardware acceleration performance, the memory alignment number may be determined according to the acquired hardware information, so as to call the instruction in the hardware acceleration instruction set, thereby improving the operation efficiency.

Such as:

For GPU hardware, a specific optimized instruction set may be selected according to the compute capability of the GPU. Different compute capabilities have different optimized instruction sets, and the compute capability value can be obtained by querying the parameters of the GPU graphics card; such values include, but are not limited to, compute_60, compute_72, compute_75, and the like. 80% of the instruction set is the same across GPUs of different generations, and the remaining 20% consists of instructions specific to the current generation. These generation-specific instructions often offer an advantage, and platform-specific optimization can be carried out based on them.

For the hardware of the CPU, the architecture type of the CPU can be determined according to the obtained specific hardware model, because instruction sets supported by different hardware core architectures are different, and thus the optimization manner and the optimization force are different.

The memory alignment numbers of different computing hardware are different, for example, the memory alignment numbers are K-dimensional 32-byte alignment on ARM, K-dimensional 8-byte alignment on GPU, and K-dimensional 16-byte alignment on CPU. The memory alignment number may be determined according to the type identifier of the computing hardware in the acquired hardware information.
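As a rough sketch, the alignment lookup and the padding of K could look like this. The byte values come from the example above; everything else is an assumption: the table keys, the helper name `aligned_dim`, and the interpretation that the alignment is given in bytes and each feature value occupies 2 bytes (16-bit floating point).

```python
# Example alignment numbers from the text, in bytes, keyed by platform kind.
ALIGNMENT_BYTES = {"ARM": 32, "GPU": 8, "CPU": 16}


def aligned_dim(k, platform, bytes_per_value=2):
    """Round the feature dimension K up to the platform's alignment.

    Assumes the alignment is expressed in bytes and converts it to a
    number of feature values before rounding up.
    """
    align_values = max(1, ALIGNMENT_BYTES[platform] // bytes_per_value)
    return -(-k // align_values) * align_values   # ceiling division


print(aligned_dim(500, "ARM"))  # 512
print(aligned_dim(500, "GPU"))  # 500
print(aligned_dim(500, "CPU"))  # 504
```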

Data can be stored in different arrangement formats, such as NCHW and NHWC32-aligned, as shown in fig. 6, which is a schematic diagram of the NCHW and NHWC32-aligned data arrangements. To improve data access efficiency, the first matrix of K rows × M columns is arranged as K1 fragmentation matrices of M rows × K0 columns, denoted as the [K1, M, K0] data arrangement format; the conversion from the K rows × M columns arrangement of the first matrix to the [K1, M, K0] data arrangement format is shown in fig. 7. Here, K0 is the fragment feature dimension of a feature vector in a fragmentation matrix; it is strongly related to the micro-architecture and is taken as 32 here, and this part of the data needs to be stored contiguously. K1 can be understood as the number of fragmentation matrices, K1 = K/K0, rounded up if K is not exactly divisible by K0.
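The layout conversion can be illustrated with nested lists. This is a sketch under assumptions: `to_k1_m_k0` is a hypothetical name, and the last fragment is zero-padded when K is not a multiple of K0, which the text does not specify.

```python
def to_k1_m_k0(matrix, k0):
    """Convert a K x M matrix into the [K1, M, K0] arrangement.

    out[s][v][c] holds dimension s * k0 + c of feature vector v, so the K0
    values of one fragment of one vector sit next to each other.
    """
    k, m = len(matrix), len(matrix[0])
    k1 = -(-k // k0)   # K1 = ceil(K / K0)
    return [
        [
            # Assumption: pad with zeros past the last real dimension.
            [matrix[s * k0 + c][v] if s * k0 + c < k else 0 for c in range(k0)]
            for v in range(m)
        ]
        for s in range(k1)
    ]


# K = 5 dimensions, M = 2 vectors; entry [d][v] is encoded as 10 * d + v.
matrix = [[10 * d + v for v in range(2)] for d in range(5)]
layout = to_k1_m_k0(matrix, k0=2)
print(len(layout))   # 3
print(layout[0])     # [[0, 10], [1, 11]]
print(layout[2])     # [[40, 0], [41, 0]]
```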

Step 503, according to the type of the hardware platform, selecting each operator for similarity calculation according to the dimension of the input feature vector and/or the dimension division strategy.

Memory resources differ between hardware platforms. To deal with limited memory resources during similarity calculation, different operators are selected according to the filtered feature vector dimension (the input dimension) and/or the dimension division strategy. Preferably, operators are selected according to the division range that K falls in: for example, if K is greater than 0 and less than 128, a first operator is selected; if K is greater than 128 and less than 256, a second operator is selected; if K is greater than 256, a third operator is selected; and so on. The division ranges need to be derived from the hardware resources, mainly the on-chip memory resources of the hardware.

Since the on-chip resources of different hardware are different, the dimension division strategies are also completely different. When operator acceleration strategies are selected for different input dimensions, different input dimensions that fall under the same operator acceleration strategy can share the same binary file. The same operator acceleration strategy means that the dimensions fall in the same division range. For example, K greater than 0 and less than 128 belongs to one strategy, and the range of K between 0 and 128 actually covers a plurality of image resolutions; for all resolutions whose K is between 0 and 128, the same operator is used, for example, operator 0. Therefore, different resolutions can be supported by selecting acceleration operators of different strategies according to different input dimensions, so that dynamically changing resolution is supported during image retrieval. This solves the problem in conventional image retrieval methods, where the resolution is specified directly when the operator code is compiled and dynamically changing resolution is therefore not supported. Referring to fig. 8, fig. 8 is a schematic diagram of acceleration operators of different strategies selected according to different resolutions. After different resolution parameters are passed in, acceleration operators of different strategies are selected according to the resolution.
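The range-based selection can be sketched with a sorted list of bounds. The bounds 128 and 256 are the example values from the text; the operator names and the handling of the exact boundary values (assigned to the lower range here) are assumptions, since the text leaves them open.

```python
import bisect

# Example division bounds from the text; real bounds depend on on-chip memory.
K_BOUNDS = [128, 256]
OPERATORS = ["operator_0", "operator_1", "operator_2"]   # hypothetical names


def operator_for_dim(k):
    """Map the input feature dimension K to the operator for its range."""
    if k <= 0:
        raise ValueError("feature dimension must be positive")
    # Assumption: K equal to a bound falls into the lower range.
    return OPERATORS[bisect.bisect_left(K_BOUNDS, k)]


print(operator_for_dim(64))    # operator_0
print(operator_for_dim(100))   # operator_0
print(operator_for_dim(200))   # operator_1
print(operator_for_dim(512))   # operator_2
```

Dimensions 64 and 100 resolve to the same operator, which is the case in which one binary file can be shared across several input resolutions.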

For ease of understanding, the selection of operators is described below using the processing of the first matrix as an example.

Referring to fig. 9, fig. 9 is a schematic flow chart of operator selection.

Judging the relation between the total number of shards K1 and the data size contained in the shard matrix, wherein the data size contained in the shard matrix is M multiplied by K0, which can be understood as the size of the shard matrix,

if K1 is greater than or equal to the amount of data contained in the tiled matrix, then since the on-chip resources are sufficient to accommodate all the data, a first operator can be selected for processing the data in K1 channel parallel and tiled matrix parallel, so that [ K1, M, K0] data is directly expanded into a two-dimensional manner with K1 as the number of rows and M × K0 as the number of columns for data manipulation, as shown in fig. 10, which is a schematic diagram of the first operator implementation.

Otherwise, K1 is sliced according to the data size that the on-chip resources can accommodate, so that the sliced data fits in the on-chip resources. Specifically:

rounding up the quotient of the set threshold value and the data amount in the fragmentation matrix to obtain the single channel number S0, wherein the threshold value is determined by the size of the on-chip memory,

it is judged whether M × K0 is greater than or equal to the set threshold,

if so, the number of single channels S0 equals 1, so a second operator is selected, which processes data in parallel within a single-channel fragmentation matrix. In this way, K1 is sliced into single channels S0, and the [K1, M, K0] data is directly expanded into a two-dimensional form with S0 as the number of rows and M × K0 as the number of columns for data operation, where the size of S0 × M × K0 is less than or equal to the size of the on-chip memory. Fig. 11 is a schematic diagram of the second operator implementation.

Otherwise, the number of single channels S0 is greater than 1, so a third operator is selected, which processes data in parallel within each single-channel fragmentation matrix and serially across the single-channel fragmentation matrices. In this way, K1 is sliced into multiple single channels S0, and the [K1, M, K0] data is directly expanded into a two-dimensional form (a single-channel fragmentation matrix) with S0 as the number of rows and M × K0 as the number of columns, on which data operations are performed in sequence, where the size of S0 × M × K0 is less than or equal to the size of the on-chip memory, and S0 = threshold/(M × K0), rounded up if the division is not exact. Fig. 12 is a schematic diagram of the third operator implementation.
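The selection flow of fig. 9 can be summarized in a short function. This is a sketch under stated assumptions: `select_operator` and its return format are hypothetical, all sizes are counted in feature values, and the on-chip capacity and the threshold are passed as separate parameters because the text only says that both derive from the on-chip memory size.

```python
def select_operator(k1, m, k0, threshold, on_chip_capacity):
    """Return (operator label, number of channels) for a [K1, M, K0] layout."""
    shard_size = m * k0   # data amount in one fragmentation matrix
    if k1 >= shard_size and k1 * shard_size <= on_chip_capacity:
        # Everything fits on chip: K1 channels processed in parallel.
        return ("first", k1)
    s0 = -(-threshold // shard_size)   # S0 = ceil(threshold / (M * K0))
    if shard_size >= threshold:
        return ("second", s0)          # S0 is 1 in this branch
    return ("third", s0)               # S0 > 1, groups processed serially


print(select_operator(16, 1, 8, threshold=64, on_chip_capacity=1024))
# ('first', 16)
print(select_operator(16, 64, 32, threshold=1024, on_chip_capacity=4096))
# ('second', 1)
print(select_operator(16, 64, 32, threshold=8192, on_chip_capacity=8192))
# ('third', 4)
```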

As can be seen from the above operator selection strategy, if the amount of data to be calculated is greater than the on-chip memory, the data cannot all be stored in the on-chip memory at one time and the operation can only be completed over multiple passes, so acceleration is only possible through single-channel parallelism. If, however, the amount of data to be calculated is smaller than the on-chip memory, that is, all the data can be stored in the on-chip memory at one time, the data can be processed in parallel over a plurality of channels. The number of single channels is therefore directly related to the dimension division strategy, and the amount of data that the on-chip memory can accommodate is indirectly related to the dimension division. It also follows that the selected operator differs when the amount of data to be calculated differs, so that in the similarity calculation process even the same image to be retrieved may be processed by different operators.

Similarly, the same method is used for operator selection for the feature vector matrix (second matrix) of the image to be retrieved.

In step 504, because different hardware platforms involve different operators, different dimension division parameters and different hardware acceleration strategies, the parameters required by each selected operator are described together for resource integration, which shields the differences among heterogeneous hardware and makes the operators easier to invoke.

For each operator, some operator parameters can be assigned directly and others must be obtained through calculation. The operator parameters are described collectively in a form such as a list or a structure, that is, the description is packed, so that the parameters can be passed conveniently in subsequent processing.

Taking the first operator as an example, the required parameters include the number of rows and columns of the tile matrix, the number of channels, the number of repetitions, the name of the operator, and the like, and these parameters can be described in a packing manner.
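A packed parameter description of this kind might look like the sketch below. The class name `OperatorParams`, the field names, the operator name string and the repetition count of 1 for the first operator are all hypothetical choices made for illustration; the application only lists which kinds of parameters are packed.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class OperatorParams:
    """Packed description of the parameters one operator call needs."""
    name: str       # operator name, later used to locate the binary file
    rows: int       # rows of the tile matrix (number of feature vectors M)
    cols: int       # columns of the tile matrix (tile feature dimension K0)
    channels: int   # number of channels processed in parallel
    repeats: int    # number of repetitions (passes)


def pack_first_operator(k1: int, m: int, k0: int) -> OperatorParams:
    # Assumption: the first operator handles all K1 channels in one pass.
    return OperatorParams(name="first_operator", rows=m, cols=k0,
                          channels=k1, repeats=1)
```

Passing one immutable object instead of loose arguments keeps the call site identical across hardware platforms, which is the purpose of the packing step.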

Step 505, according to the selected operator, the binary file corresponding to the selected operator is invoked from the operator binary files to complete the invocation of a single operator.

The operator binary file can be obtained by compiling the operator implementation file directly into binary form. In this step, different operators correspond to different input dimension information and have different binary files, so the corresponding operator is selected according to the input dimension information when an operator is called.

In addition, because different hardware platforms use different compilers and the optimization strength can be controlled at compile time, different optimization strengths can be selected to generate different binary files for the same operator. For example, some algorithm scenarios need to be optimized for occupied space, so the -Os compilation option is added; other scenarios require extreme operator performance, so the -O3 compilation option is added, and so on.
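As an illustration of generating several binaries from one implementation file, the sketch below builds a compile command for a GCC-style compiler. The profile names, the output naming scheme, the source file name `operator0.cpp` and the choice of `g++` are assumptions for this example; the application does not specify a toolchain.

```python
# Hypothetical mapping from an optimization goal to a compiler option.
OPT_FLAGS = {"size": "-Os", "speed": "-O3"}


def compile_command(compiler: str, source: str, profile: str) -> list[str]:
    """Build the argument list that compiles one operator source to a .o file."""
    flag = OPT_FLAGS[profile]
    stem = source.rsplit(".", 1)[0]
    output = f"{stem}{flag}.o"          # one object file per optimization level
    return [compiler, flag, "-c", source, "-o", output]
```

Calling `compile_command("g++", "operator0.cpp", "size")` and `compile_command("g++", "operator0.cpp", "speed")` yields two commands that produce distinct object files from the same source.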

In order to improve operator performance, operator implementation files suitable for multiple hardware platforms and multiple operators, such as operator 0.cpp, operator 1.cpp and operator 2.cpp in the figure, can be established in advance for different input dimension information, different hardware platforms and different dimension division strategies.

Preferably, the implementation of each operator may be provided in multiple file types to support operator implementations with different optimization strengths. Even at the lowest implementation level, the achievable optimization depends on the input dimensions, so for certain specific input dimensions different implementation approaches perform differently: in some dimensions an assembly implementation is fastest, in some an implementation using a third-party library is fastest, in some a plain C implementation is fastest, and so on.

In practical applications, the operator file, which includes the operator implementation files and/or the operator binary files corresponding to them, may be loaded from outside the image retrieval device, for example from a storage device and/or a server external to the image retrieval device. This reduces the storage space occupied on the image retrieval device and makes the operator acceleration strategy general, so that optimal performance can be obtained on different platforms.

When the binary file of an operator is called, the operator file is searched, according to the operator name included in the operator parameters, for the operator binary file with the same operator name.
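The name-based lookup can be sketched as a simple table search. The table layout keyed by (platform, operator name), the function name `find_operator_binary` and the example paths are hypothetical; the application only states that the binary is found by matching the operator name carried in the operator parameters.

```python
def find_operator_binary(operator_files: dict, platform: str, params: dict) -> str:
    """Return the binary registered under the operator name in params.

    operator_files maps (platform, operator name) to a binary file path.
    """
    key = (platform, params["name"])
    try:
        return operator_files[key]
    except KeyError:
        raise LookupError(
            f"no binary for operator {params['name']!r} on {platform!r}"
        ) from None
```

A missing entry raises an error rather than silently falling back, so that a mismatch between the selected operator and the loaded operator file is noticed early.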

Finally, according to the obtained similarity result, a compared image matching the image to be retrieved is obtained from the base database as the retrieval result.
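One way to turn similarity results into a retrieval result is to rank the compared images by score. In the sketch below, the function name `best_matches` and the `top_k` and `min_score` parameters are assumptions; the application does not state how many matches are returned or whether a score threshold is applied.

```python
def best_matches(similarities, image_ids, top_k=1, min_score=0.0):
    """Return up to top_k (image id, score) pairs, highest score first."""
    ranked = sorted(zip(similarities, image_ids),
                    key=lambda pair: pair[0], reverse=True)
    return [(image_id, score) for score, image_id in ranked[:top_k]
            if score >= min_score]
```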

The image retrieval method provided by the embodiment of the application selects different operators for different hardware resource sizes, different data stream transmission modes and different pipelines, for example an ARM operator suitable for ARM, a CPU operator suitable for CPU and a GPU operator suitable for GPU. For the different input dimensions of different images to be retrieved, the optimal operator for the current dimension can be selected on the corresponding hardware platform, for example a first dimension operator matched with a first input dimension, a second dimension operator matched with a second input dimension, and so on. According to the way the similarity is calculated, the operator for calculating the similarity between the image to be retrieved and the compared image can be a cosine similarity operator, a GEMM operator for calculating the cosine similarity, a Euclidean distance operator, and the like. To ensure extensibility, operator implementation files for the operators of different platforms are collected, compiled into .o files by the corresponding compilers on the different hardware platforms, and then provided to the image retrieval device, so that implementations of the various operators are available.
Unlike optimization methods that, when implementing the GEMM retrieval function, are limited by hardware resources and directly specify one particular optimization dimension when the operator implementation code is compiled, this method accelerates retrieval starting from the input feature dimension information and selects different strategy operators, such as the first operator, the second operator and the third operator, according to the input dimensions, thereby supporting operator implementations for different resolutions. The method can support a variety of hardware platforms, and once the hardware platform is determined, a retrieval method that exploits the hardware characteristics and achieves the best performance can be constructed.
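For reference, the similarity measures named above can be written in a few lines. The sketch below is a plain, unoptimized illustration with hypothetical function names; it shows why cosine similarity reduces to a GEMM (a matrix product of L2-normalized feature matrices) and is not the accelerated operator implementation of the application.

```python
import math


def l2_normalize(rows):
    """Scale every feature vector to unit length (zero vectors are left as is)."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        out.append([x / norm for x in row])
    return out


def gemm_nt(a, b):
    """Return A multiplied by the transpose of B for lists of row vectors."""
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in b] for ra in a]


def cosine_similarity(queries, gallery):
    """Cosine similarity of every query vector against every gallery vector."""
    return gemm_nt(l2_normalize(queries), l2_normalize(gallery))


def euclidean_distance(queries, gallery):
    """Euclidean distance of every query vector against every gallery vector."""
    return [[math.dist(q, g) for g in gallery] for q in queries]
```

Because the normalization is applied once per vector, the bulk of the work is the matrix product, which is the part that the operator selection strategy accelerates.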

Referring to fig. 13, fig. 13 is a schematic diagram of an image retrieval apparatus according to an embodiment of the present application. The apparatus comprises,

an operator selection module, which is used for selecting each operator for calculating the similarity between the image to be retrieved and the compared image according to the hardware platform type of the image retrieval equipment and each characteristic dimension and/or dimension division strategy of the characteristic vector data,

the callback module is used for calling and executing the operator binary file of the selected operator according to the selected operator to obtain a similarity result,

Preferably, the apparatus further comprises,

the dimension information filtering module is used for filtering the feature vector data which do not accord with the set dimension according to the set dimension; and/or limiting the quantity of the feature vectors contained in the feature vector data of the image to be retrieved according to the resources of the hardware platform;

the data format conversion module is used for aligning the feature dimensions of the feature vector data according to the hardware platform type to obtain a data arrangement format matched with the hardware platform type;

and the parameter packing module is used for centrally describing the parameters required by each selected operator.

The data format conversion module, the operator selection module and the parameter packing module can be executed based on a heterogeneous hardware platform, that is, corresponding operations are executed according to different hardware characteristics.
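The alignment performed by the data format conversion module can be illustrated with the [K1, M, K0] layout used in the operator selection strategy. In this sketch the function names are hypothetical and zero padding of the last tile is an assumption; K0 stands for the tile feature dimension determined by the memory alignment of the hardware platform.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def to_tiles(vectors, k0):
    """Rearrange M feature vectors of dimension D into a [K1, M, K0] nested list."""
    dim = len(vectors[0])
    k1 = ceil_div(dim, k0)              # total number of tiles
    # Assumption: pad each vector with zeros up to a multiple of K0.
    padded = [list(v) + [0.0] * (k1 * k0 - dim) for v in vectors]
    return [[row[c * k0:(c + 1) * k0] for row in padded] for c in range(k1)]


def flatten_tiles(tiles):
    """Expand [K1, M, K0] data into K1 rows of M * K0 values each."""
    return [[x for row in tile for x in row] for tile in tiles]
```

Two feature vectors of dimension 3 with K0 = 2, for instance, become K1 = 2 tiles of shape 2 × 2, which flatten to 2 rows of 4 values.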

Referring to fig. 14, fig. 14 is a schematic view of an image retrieval device according to an embodiment of the present application. The apparatus comprises a hardware platform comprising a memory storing a computer program and a processor configured to execute the steps of the computer program to implement the image retrieval method.

The memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the processor.

The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

The embodiment of the application also provides a computer readable storage medium, wherein a computer program is stored in the storage medium, and the computer program realizes the steps of the image retrieval method when being executed by a processor.

For the device/network side device/storage medium embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and for the relevant points, refer to the partial description of the method embodiment.

In this document, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.

Claims

1. An image retrieval method characterized by comprising, on an image retrieval apparatus side,

according to the selected operator, calling and executing the operator binary file of the selected operator to obtain a similarity result,

and comparing the similarity result, and obtaining a compared image matched with the image to be retrieved from the base database data as a retrieval result.

2. The method of claim 1, wherein before selecting operators for calculating similarity between the image to be retrieved and the compared image according to the hardware platform type of the image retrieval device and the feature dimensions of the feature vector data and/or the dimension division strategy, further comprising,

3. The method of claim 1, wherein obtaining feature vector data of at least one image to be retrieved and of a compared image in the base database further comprises,

filtering feature vector data which do not accord with the set dimension according to the set dimension;

and/or,

4. The method according to any one of claims 1 to 3, wherein the selecting, according to the hardware platform type of the image retrieval device, each operator for calculating the similarity between the image to be retrieved and the compared image according to each feature dimension of the feature vector data and/or a dimension division strategy comprises,

acquiring on-chip resources of a hardware platform according to the type of the hardware platform of the image retrieval device, selecting an operator according to the relation between the on-chip resources of the hardware platform and the data quantity of the feature vector data, wherein,

the data volume of the feature vector data is determined by the feature dimensions and the number of feature vectors;

and/or,

acquiring on-chip resources of a hardware platform according to the type of the hardware platform of the image retrieval device,

determining a dimension partitioning policy based on the on-chip resources, the dimension partitioning policy including at least a dimension partitioning range,

and selecting operators corresponding to different dimension division ranges based on the feature dimensions of the feature vector data.

5. The method of claim 4, wherein aligning the feature dimensions of the feature vector data according to the hardware platform type to obtain a data arrangement format matching the hardware platform type comprises,

the fragmentation matrix comprises feature vector data with the number of feature vectors as the number of rows and the fragmentation feature dimension of the feature vectors as the number of columns, the size of the fragmentation feature dimension is determined by the memory alignment number, and the total number of fragments of the fragmentation matrix is the quotient of the feature dimension of the feature vectors and the fragmentation feature dimension, rounded up.

6. The method of claim 5, wherein the selecting an operator based on a relationship of on-chip resources of the hardware platform and a data volume of the feature vector data comprises,

comparing the total number of the fragments with the data size in the fragment matrix;

if the total number of fragments is greater than or equal to the data quantity in the fragmentation matrix, and the on-chip resources can accommodate the feature vector data consisting of that total number of fragmentation matrices, selecting a first operator for processing the data in a multi-channel parallel and fragmentation-matrix parallel mode, so that the fragmentation matrices are directly expanded into a two-dimensional form, with the total number of fragments as the number of rows and the data quantity in the fragmentation matrix as the number of columns, for data operation, wherein the number of channels is the same as the total number of fragments,

7. The method of claim 6, wherein the total number of slices is sliced according to the amount of data that the on-chip resource can accommodate, and wherein the operator is selected in such a way that the on-chip resource can accommodate the sliced amount of data, including,

rounding up the quotient of the set threshold value and the data quantity in the fragmentation matrix to obtain the number of single channels, and taking the fragmentation matrix of at least one single channel as the data quantity after the fragmentation; wherein the threshold is determined by the memory size of the hardware platform,

comparing the amount of data in the fragmentation matrix to the threshold,

if the data quantity in the fragmentation matrix is greater than or equal to the threshold, selecting a second operator for processing data in a single-channel fragmentation-matrix parallel mode, so that the fragmentation matrices are directly expanded into a two-dimensional form, with the single channel as the number of rows and the data quantity in the fragmentation matrix as the number of columns, for data operation,

8. The method of claim 1, wherein said calling and executing an operator binary of a selected operator according to the selected operator, further comprises,

for each selected operator, packing and describing parameters required by the operator implementation,

wherein,

the operator file comprises an operator binary file of each hardware platform type and/or an operator implementation file of each hardware platform type for direct binary compilation to generate the operator binary file,

the same operator implementation file is provided with different operator binary files according to different optimization degrees and/or different compilers.

9. An image retrieval apparatus, characterized by comprising,

a characteristic vector data acquisition module for acquiring at least one image to be retrieved and characteristic vector data of an image to be compared in the base database data,

and the similarity comparison module is used for comparing the similarity result and obtaining the compared image matched with the image to be retrieved from the base database as the retrieval result.

10. An image retrieval device comprising a hardware platform, characterized in that the hardware platform comprises a memory and a processor, the memory storing a computer program, the processor being configured to execute the steps of the computer program to implement the image retrieval method according to any one of claims 1 to 8.