CN112765382A - Image searching method, image searching device, image searching medium and electronic equipment - Google Patents

Image searching method, image searching device, image searching medium and electronic equipment

Info

Publication number
CN112765382A
Authority
CN
China
Prior art keywords
image
library
target image
label
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110077625.8A
Other languages
Chinese (zh)
Inventor
黄程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yitu Network Science and Technology Co Ltd
Original Assignee
Shanghai Yitu Network Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yitu Network Science and Technology Co Ltd filed Critical Shanghai Yitu Network Science and Technology Co Ltd
Priority to CN202110077625.8A
Publication of CN112765382A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/53 Querying
    • G06F 16/532 Query formulation, e.g. graphical querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/5866 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to the field of computer vision, and in particular to an image searching method, an apparatus, a medium, and an electronic device. The image searching method is used for searching an image library for a library image corresponding to a target image, and specifically comprises the following steps: respectively obtaining the labels and feature vectors of the target image and of each library image in the image library; determining the matching degree between the labels of each library image and the target image, and obtaining the similarity between each library image and the target image according to that matching degree; and determining the library image corresponding to the target image from the image library based on the similarity between each library image and the target image. The similarity of library images with unmatched labels is thereby lowered, so that library images with unmatched labels are not in the search results, and the accuracy of the search results is improved.

Description

Image searching method, image searching device, image searching medium and electronic equipment
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image search method, an apparatus, a medium, and an electronic device.
Background
With the growth of image data on the Internet, the number of images in established image libraries has become enormous, and how to search library images quickly and accurately has become a problem of increasing concern in the field of massive-image search.
In the prior art, the similarity of a target image to the library images in an image library is determined by calculating the distance between the feature vector sequence of the target image and the feature vector sequence of each library image. The greater the distance, the lower the similarity between the target image and a library image; the smaller the distance, the higher the similarity. In one image search process, the distances between the library images in the image library and the target image can be sorted, and the top-K library images with the highest similarity selected as the search result. In practical applications, however, because the number of library images in the image library is very large, a desired library image may fall outside the top-K search results, or an undesired library image may fall within them, ultimately making the search result inaccurate.
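For concreteness, the baseline scheme described above can be sketched as follows. This is a minimal illustration, not part of the patent text; the function name, the array shapes, and the use of NumPy are assumptions.

```python
# Minimal sketch of the prior-art search: rank library images by Euclidean
# distance between feature vectors and return the K closest.
import numpy as np

def top_k_search(query_vec: np.ndarray, library_vecs: np.ndarray, k: int):
    """query_vec: (P,) target feature vector; library_vecs: (N, P) library vectors."""
    dists = np.linalg.norm(library_vecs - query_vec, axis=1)  # distance per library image
    order = np.argsort(dists)  # smaller distance = higher similarity
    return order[:k], dists[order[:k]]  # indices and distances of the top-K results
```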
Disclosure of Invention
The embodiment of the application provides an image searching method and device, a medium and electronic equipment thereof.
In a first aspect, an embodiment of the present application provides an image search method, including: respectively obtaining a target image and a label and a feature vector of each library image in an image library, wherein the feature vectors of the library image and the target image comprise one or more feature vectors, and the labels of the library image and the target image comprise one or more labels; determining the matching degree of the label of each library image and the target image, and obtaining the similarity between each library image and the target image according to the matching degree of the label of each library image and the target image; and determining a library image corresponding to the target image from the image library based on the similarity of each library image and the target image. Therefore, the similarity of the library images with unmatched labels is lowered, the library images with unmatched labels are not in the search results, and the accuracy of the search results is improved.
In a possible implementation of the first aspect, the method further includes: determining the matching degree of the label of each library image and the target image, and obtaining the similarity between each library image and the target image according to the matching degree of the label of each library image and the target image comprises the following steps: determining the matching degree of the labels of each library image and the target image, and calculating the original similarity of each library image and the target image through a distance algorithm; and taking the matching degree of the label of each library image and the label of the target image as an adjusting coefficient, and adjusting each original similarity to obtain the similarity between each library image and the target image.
In a possible implementation of the first aspect, the method further includes: determining the matching degree of the label of each library image and the target image, and obtaining the similarity between each library image and the target image according to the matching degree of the label of each library image and the target image comprises the following steps: and determining the matching degree of the label of each library image and the target image, using a distance algorithm, taking the matching degree of the label of each library image and the target image as an adjusting coefficient for the distance algorithm, and calculating to obtain the similarity between the feature vectors of each library image and the target image.
In a possible implementation of the first aspect, the method further includes: the distance algorithm includes at least one of algorithms for calculating a euclidean distance, a cosine distance, a manhattan distance, a mahalanobis distance, and an EMD distance.
In a possible implementation of the first aspect, the method further includes: determining the matching degree of the label of each library image and the target image, wherein the matching degree comprises the following steps: determining the matching degree of the labels between the library image and the target image through a label matching algorithm, wherein the label matching algorithm comprises at least one of a bidirectional encoder algorithm, a long short-term memory neural network algorithm and a bidirectional long short-term memory neural network algorithm.
In a possible implementation of the first aspect, the method further includes: determining the matching degree of the label of each library image and the target image, and obtaining the similarity between each library image and the target image according to the matching degree of the label of each library image and the target image comprises the following steps: for one library image, in the case that at least one label in the target image does not match a label in the library image, the similarity of the target image and the library image is reduced.
In a possible implementation of the first aspect, the method further includes: determining the matching degree of the label of each library image and the target image, and obtaining the similarity between each library image and the target image according to the matching degree of the label of each library image and the target image comprises the following steps: for one library image, in the case where all the labels of the target image match the labels in the library image, the similarity between the target image and the library image is unchanged.
In a possible implementation of the first aspect, the method further includes: determining a library image corresponding to the target image from the image library based on the similarity between each library image and the target image, comprising: and sequencing the similarity of each library image in the image library and the target image in sequence, and determining the library image corresponding to the target image from the image library according to the sequencing sequence.
In a second aspect, an embodiment of the present application provides an image search apparatus, including: the label and feature vector acquisition module is used for respectively acquiring labels and feature vectors of the target image and each library image in the image library, wherein the feature vectors of the library image and the target image comprise one or more feature vectors, and the labels of the library image and the target image comprise one or more labels;
the similarity adjusting module is used for determining the matching degree of the label of each library image and the target image and acquiring the similarity between each library image and the target image according to the matching degree of the label of each library image and the target image;
and the search result determining module is used for determining the library image corresponding to the target image from the image library based on the similarity of each library image and the target image.
In a possible implementation of the second aspect, the apparatus further includes: determining the matching degree of the label of each library image and the target image, and obtaining the similarity between each library image and the target image according to the matching degree of the label of each library image and the target image comprises the following steps: determining the matching degree of the labels of each library image and the target image, and calculating the original similarity of each library image and the target image through a distance algorithm; and taking the matching degree of the label of each library image and the label of the target image as an adjusting coefficient, and adjusting each original similarity to obtain the similarity between each library image and the target image.
In a possible implementation of the second aspect, the apparatus further includes: determining the matching degree of the label of each library image and the target image, and obtaining the similarity between each library image and the target image according to the matching degree of the label of each library image and the target image comprises the following steps: and determining the matching degree of the label of each library image and the target image, using a distance algorithm, taking the matching degree of the label of each library image and the target image as an adjusting coefficient for the distance algorithm, and calculating to obtain the similarity between the feature vectors of each library image and the target image.
In a possible implementation of the second aspect, the apparatus further includes: the distance algorithm includes at least one of algorithms for calculating a euclidean distance, a cosine distance, a manhattan distance, a mahalanobis distance, and an EMD distance.
In a possible implementation of the second aspect, the apparatus further includes: determining the matching degree of the label of each library image and the target image, wherein the matching degree comprises the following steps: determining the matching degree of the labels between the library image and the target image through a label matching algorithm, wherein the label matching algorithm comprises at least one of a bidirectional encoder algorithm, a long short-term memory neural network algorithm and a bidirectional long short-term memory neural network algorithm.
In a possible implementation of the second aspect, the apparatus further includes: determining the matching degree of the label of each library image and the target image, and obtaining the similarity between each library image and the target image according to the matching degree of the label of each library image and the target image comprises the following steps: for one library image, in the case that at least one label in the target image does not match a label in the library image, the similarity of the target image and the library image is reduced.
In a possible implementation of the second aspect, the apparatus further includes: determining the matching degree of the label of each library image and the target image, and obtaining the similarity between each library image and the target image according to the matching degree of the label of each library image and the target image comprises the following steps: for one library image, in the case where all the labels of the target image match the labels in the library image, the similarity between the target image and the library image is unchanged.
In a possible implementation of the second aspect, the apparatus further includes: determining a library image corresponding to the target image from the image library based on the similarity between each library image and the target image, comprising: and sequencing the similarity of each library image in the image library and the target image in sequence, and determining the library image corresponding to the target image from the image library according to the sequencing sequence.
In a third aspect, the present application provides a machine-readable medium, on which instructions are stored, and when executed on a machine, the instructions cause the machine to perform the image search method in the first aspect and possible implementations of the first aspect.
In a fourth aspect, an embodiment of the present application provides an electronic device, including:
a memory for storing instructions for execution by one or more processors of the system; and
a processor, which is one of the processors of the electronic device, configured to execute the image search method in the first aspect and the possible implementations of the first aspect.
Drawings
FIG. 1 illustrates an image search scene graph, according to some embodiments of the present application;
FIG. 2 illustrates a flow diagram of an image search method, according to some embodiments of the present application;
FIG. 3 illustrates a schematic diagram showing an array of feature vectors for a library image in an image library, according to some embodiments of the present application;
FIG. 4 illustrates a diagram of enumerated values of image feature semantics, according to some embodiments of the present application;
FIG. 5 illustrates a block diagram of an image search apparatus, according to some embodiments of the present application;
FIG. 6 illustrates a block diagram of an electronic device, in accordance with some embodiments of the present application;
FIG. 7 illustrates a block diagram of a system on a chip (SoC), according to some embodiments of the present application.
Detailed Description
Illustrative embodiments of the present application include, but are not limited to, an image search method and apparatus, medium, and electronic device thereof.
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions in the embodiments of the present application will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application. In the present application, the embodiments and features of the embodiments may be arbitrarily combined with each other without conflict. Also, while a logical order is shown in the flow diagrams, in some cases, the steps shown or described may be performed in an order different than here.
According to the image searching method provided by the embodiments of the present application, the distance between each library image in the image library and the target image is calculated, and the similarity between them is measured by this distance; the similarity between each library image and the target image is then adjusted according to the matching degree between the labels of that library image and the labels of the target image. The similarities of the library images to the target image are then ranked from high to low to obtain a preset number of library images. The similarity of library images with unmatched labels is lowered, so that library images with unmatched labels are not in the search results, and the accuracy of the search results is improved.
Embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
For convenience of describing the technical solution of the present application, an image library generated from a surveillance video and the search of library images in that image library are taken as an example for explanation.
Fig. 1 is a view illustrating a scene of image search in a surveillance video according to an embodiment of the present application, as shown in fig. 1, the scene includes a surveillance video 100, a camera 101, an electronic device 103, an image library 105 generated from the surveillance video, and a target image 104. The camera 101 includes 1 or more monitoring cameras.
As shown in fig. 1, the camera 101 is configured to capture videos or images 100 and transmits them to the electronic device 103. Specifically, the image library 105 may be generated by performing image processing on the videos or images 100, and the library images in the image library 105 may include human body images, object images, or animal images from the videos or images 100. Each library image is stored in units of a single human body, a single animal, a single object, or the like.
As shown in FIG. 1, the image library 105 has N library images, specifically library image 105-1, library image 105-2, …, library image 105-N, each of which is provided with one or more labels. The target image 104 may be a single human image, a single animal image, or a single object image in the video or image 100, or may be a single human image, a single animal image, or a single object image (e.g., a single commodity image) provided by a user. The present application takes as an example that the library images in the image library 105 and the target image 104 are single human body images.
As examples of the image library, it may be a portrait library of people who travel in a railway station, a portrait library of a membership card containing personal identification information, an animal image library, a commodity image library, and the like, and the content and source of the image library are not particularly limited.
In some embodiments, the electronic device 103 is an electronic device with certain image or video processing capabilities, such as a Personal Computer (PC), a notebook computer, or a server. The server may be an independent physical server, or a server cluster formed by a plurality of physical servers, or a server providing basic cloud computing services such as a cloud database, a cloud storage, and a CDN, and the scale of the server may be planned according to the number of video streams to be processed, which is not limited in the embodiment of the present application.
As an example, in the scene of image search in the surveillance video, the camera 101 may be a surveillance camera provided in a place such as a mall, a road, or a subway entrance, for capturing a video or an image 100 of a pedestrian in the place. In practical applications, the image search scene in the video surveillance may include one or more cameras, for example, a mall monitor includes a camera set at each position of each floor in a mall.
In addition, the camera 101 and the electronic device 103 may be communicatively connected via one or more networks. The network may be a wired network or a wireless network; for example, the wireless network may be a mobile cellular network or a Wireless Fidelity (Wi-Fi) network, and certainly may also be another possible network, which is not limited in this embodiment of the present application.
It can be understood that the application scenario of the present application is not limited to image search in surveillance videos, such as pedestrian search in shopping malls, pedestrian search in subway stations, and the like. The application scene of the application is also suitable for image searching in an identity card image library, and identity information of a target object in the target image is further obtained through the searched target image. The application scene of the application is also suitable for a face comparison scene of a railway station, a commodity search scene of a shopping website and the like.
The following describes a specific embodiment of the present application with reference to fig. 1 to 2. Fig. 2 is a flowchart illustrating an image searching method according to an embodiment of the present application, and specifically, the method includes:
step 201: labels and feature vectors are obtained for each library image in the image library.
In the embodiment of the present application, features are extracted from each library image in the portrait image library 105 by a feature extraction algorithm, obtaining feature vectors that represent the feature semantic categories of the library image. The semantic features of the library images include human body features, clothing features, accessory features and carried-object features. Human body features include hair, face, age, gender and the like. Clothing features include: coat, trousers, dress, shoes, etc. Accessory features include: hat, sunglasses, glasses, scarf, belt, etc. Carried-object features include: single-shoulder bags, backpacks, handbags, trolley cases, umbrellas, etc.
FIG. 3 shows a schematic diagram of the feature vector arrays of library images in an image library according to an embodiment of the present application; as shown in FIG. 3, the portrait image library 105 comprises N library images, library image 105-1 through library image 105-N. The feature semantics of library image 105-1 are represented by the feature vector array {db1[0], db1[1], … db1[i], … db1[t-1]}, the feature semantics of library image 105-2 by the feature vector array {db2[0], db2[1], … db2[i], … db2[t-1]}, … …, and the feature semantics of library image 105-N by the feature vector array {dbn[0], dbn[1], … dbn[i], … dbn[t-1]}.
In the embodiment of the present application, each feature vector is a multi-dimensional feature vector, that is, it includes features in multiple dimensions. For example, a feature vector may be a P-dimensional feature vector, where P is a positive integer greater than 1. In some embodiments, feature vectors may be extracted from the large number of library images in the image library by a feature extraction algorithm such as the Local Binary Patterns (LBP) algorithm, the Histogram of Oriented Gradients (HOG) algorithm, or a Haar-like algorithm, obtaining the feature vectors corresponding to each library image in the image library. From each library image, t feature vectors can be extracted, and these t feature vectors form the feature vector array of the library image.
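As a concrete illustration of this step, the sketch below extracts a feature vector array using HOG. It assumes scikit-image is available; the HOG parameters, the grayscale conversion, and the strip-splitting scheme that yields t vectors per image are illustrative assumptions, not specified by the patent.

```python
# A minimal sketch of extracting t feature vectors per image, as in Fig. 3.
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import hog

def extract_feature_array(image: np.ndarray, t: int = 4) -> list[np.ndarray]:
    gray = rgb2gray(image)
    # One HOG descriptor per horizontal strip of the image, yielding the
    # feature vector array {db[0], ..., db[t-1]} for this library image.
    strips = np.array_split(gray, t, axis=0)
    return [hog(s, orientations=9, pixels_per_cell=(8, 8),
                cells_per_block=(2, 2)) for s in strips]
```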
In embodiments of the present application, the feature vectors do not correspond one-to-one to the feature semantic categories of an image: one feature semantic category may be represented by one or more feature vectors. For example, the feature semantic category "gender" of library image 105-1 may be represented by feature vector db1[0] and feature vector db1[1]. The feature semantic category "age" of library image 105-1 may be represented by feature vector db1[1] and feature vector db1[t-1]. It is understood that the specific number of feature vectors used to represent a certain feature semantic category of a library image is determined by the image content or the feature extraction algorithm, and is not limited herein.
In an embodiment of the present application, one or more labels are obtained for each library image in the image library 105, wherein the labels for the library images may include identification number, gender, age, criminal record, hair length, jacket, glasses, and the like. For example, one or more tags of library image 105-1 may include: no criminal record, male, young, short-haired, no glasses, identification number, etc.
Step 202: acquiring the labels and feature vectors of the target image.
In some embodiments, the feature vectors of the target image 104 may be obtained by performing image feature extraction on the target image 104 through any one of the Local Binary Patterns (LBP) algorithm, the Histogram of Oriented Gradients (HOG) algorithm, a Haar-like algorithm, and the like. One feature semantic category of the target image 104 may be represented by one or more feature vectors of the feature vector array {query[0], query[1], … query[i], … query[t-1]}. For example, the feature semantic category "gender" of the target image 104 can be represented by feature vector query[0] and feature vector query[1]. The feature semantic category "age" of the target image 104 can be represented by feature vector query[1] and feature vector query[t-1].
In some embodiments, one or more labels of the target image 104 are obtained, wherein the labels of the target image 104 may include identification number, gender, age, criminal record, hair length, jacket, glasses, and the like. For example, the one or more labels of the target image 104 may be male, short hair, child, identification number, no glasses, etc.
Step 203: determining the matching degree between the labels of each library image and the target image, and obtaining the adjusted similarity between each library image and the target image according to that matching degree.
In some embodiments, the image library 105 includes N library images, and all library images in the image library are compared with the target image one by one to obtain the similarity between each library image and the target image. For example, library image 105-1 is compared with the target image 104 to determine a similarity, library image 105-2 is compared with the target image 104 to determine a similarity, and so on through library image 105-N, yielding N similarity results in total.
As can be seen from step 201 and step 202, the image feature semantic categories of the library images and the target image are represented by feature vector arrays, where each feature vector array includes one or more feature vectors. The distance between the feature vectors of each library image and the feature vectors of the target image is calculated by a distance algorithm, and the similarity between each library image and the target image is measured by that distance.
The process of comparing a library image with the target image to generate a similarity is described below, taking library image 105-1 and the target image 104 as an example. Based on the feature vector array of library image 105-1 and the feature vector array of the target image 104, the sum of Euclidean distances Dist(query, db1) between the feature vectors of library image 105-1 and those of the target image 104 can be expressed as formula (1):
Dist(query, db1) = (query[0] - db1[0])² + (query[1] - db1[1])² + … + (query[t-1] - db1[t-1])²  (1)
wherein t represents the number of feature vectors in the feature vector array of library image 105-1 or of the target image 104; query[i] represents the ith feature vector in the feature vector array of the target image 104, and db1[i] represents the ith feature vector in the feature vector array of library image 105-1. t is an integer greater than or equal to 1, and i is an integer between 0 and t-1.
It is understood that the sum of distances Dist(query, db1) between the feature vectors of library image 105-1 and those of the target image 104 serves as the distance between library image 105-1 and the target image 104. As can be seen from formula (1), the smaller the value of Dist(query, db1), the higher the similarity between library image 105-1 and the target image 104; the greater the value of Dist(query, db1), the lower the similarity. The similarity between each of the other library images in the image library 105 and the target image 104 is calculated in the same way as for library image 105-1, and is not repeated here.
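A minimal sketch of formula (1), assuming each query[i] and db1[i] is a NumPy array and that (query[i] - db1[i])² denotes the squared norm of the vector difference; the names follow the text.

```python
# Formula (1): distance between a library image and the target image as the
# sum, over the t feature vectors, of squared differences.
import numpy as np

def dist(query: list[np.ndarray], db: list[np.ndarray]) -> float:
    # query[i] and db[i] are the i-th feature vectors of the target image
    # and the library image; t = len(query) = len(db).
    return float(sum(np.sum((q - d) ** 2) for q, d in zip(query, db)))
```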
It is to be understood that the above measures the similarity between library image 105-1 in the image library 105 and the target image 104 by calculating the Euclidean distance between their feature vectors; in actual implementations, the similarity may be determined in other ways, for example using the cosine distance, the Manhattan distance, the Mahalanobis distance, the EMD distance, and the like.
In addition, in some cases the distance value between vectors may not be numerical; for example, it may be a symbolic representation of a degree or trend. In that case, the symbolic representation can be quantized to a specific value by a preset rule, and the quantized value subsequently used to determine the similarity between library image 105-1 and the target image 104. For example, if the distance between two feature vectors is the character "middle", the character may be quantized to the binary or hexadecimal value of its ASCII code. The distance between two vectors in the embodiments of the present application is not limited to the above.
In the embodiment of the present application, on the basis of determining the similarity between the feature vector of the library image and the feature vector of the target image, the influence of the matching degree between the labels of the library image and the target image needs to be considered.
Library image 105-1 and the target image 104 are again taken as an example. The matching degree between the labels of library image 105-1 and the target image 104 is determined; this matching degree is then used as an adjustment coefficient either to adjust the distance result obtained by formula (1), or to adjust the distance values of individual feature vectors of library image 105-1 and the target image 104 within formula (1), thereby adjusting the similarity between library image 105-1 and the target image 104.
Specifically, regarding the effect of the label matching degree, used as an adjustment coefficient, on the similarity between the feature vectors of the library image and those of the target image: if at least one of the one or more labels of the target image 104 does not match the labels of library image 105-1, the adjustment coefficient is set so that the similarity between the target image 104 and library image 105-1 is reduced. If all labels of the target image 104 match the labels of library image 105-1, the adjustment coefficient is set so that the similarity between the target image 104 and library image 105-1 is unchanged. Specific implementations are described in detail below with reference to specific embodiments.
Step 204: determining the library image corresponding to the target image from the image library based on the similarity between each library image and the target image.
Through the above steps, each library image in the image library is compared with the target image one by one to obtain the distance between each library image and the target image; the similarity between each library image and the target image is determined from these distances; the similarities are then ranked in order, and the library image corresponding to the target image is determined from the image library according to the ranking.
Specifically, taking the Euclidean distance algorithm as an example, the Euclidean distance between a library image and the target image is adjusted according to the matching degree between the labels of that library image and the labels of the target image, and the similarity between each library image and the target image is determined by the adjusted distance. Distance and similarity map to each other one to one: the smaller the distance between a library image and the target image, the higher the similarity. After ranking the similarities from high to low, the library image(s) corresponding to the largest similarity or the several largest similarities may be selected as the search result; or the library images whose similarity exceeds a preset threshold may be selected as the search result; or the library images corresponding to the top k similarities in descending order may be selected as the search result, obtaining a preset number of library images.
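As an illustration of this selection step, the sketch below ranks adjusted distances and keeps the top-k results, optionally filtered by a distance threshold. It is a sketch under the assumptions above; the function name and parameters are not from the patent.

```python
# Sketch of step 204: sort adjusted distances ascending (smaller distance =
# higher similarity) and return the indices of the selected library images.
def select_results(adjusted_dists: list[float], k: int = 10,
                   max_dist: float | None = None) -> list[int]:
    order = sorted(range(len(adjusted_dists)), key=lambda i: adjusted_dists[i])
    if max_dist is not None:
        # Keep only library images whose similarity exceeds the preset
        # threshold, i.e. whose adjusted distance is small enough.
        order = [i for i in order if adjusted_dists[i] <= max_dist]
    return order[:k]
```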
It is understood that the image search method in fig. 2 is executed according to the sequence of step 201 and step 202, and the execution sequence of step 201 and step 202 may also be other sequences in other embodiments of the present application, for example, step 202 is executed first and then step 201 is executed, and the specific execution sequence is not limited in the present application.
As can be seen from the description of fig. 2, on the basis of determining the similarity between the feature vector of the library image and the feature vector of the target image, the influence of the matching degree between the labels of the library image and the target image needs to be considered, and the following specifically describes the technical solution of the present application with reference to formula (2) according to some embodiments of the present application.
Formula (2) is based on formula (1): the matching degree between the labels of library image 105-1 and the target image 104 is used as an adjustment coefficient to adjust the result calculated by formula (1), finally obtaining the adjusted distance Dist(query, db1)′ between the feature vectors of library image 105-1 and the target image 104. Formula (2) can be expressed as follows:
Dist(query, db1)′ = Dist(query, db1) + σ * ((¬query′[0] & db1′[0]) + (¬query′[1] & db1′[1]) + … + (¬query′[m-1] & db1′[m-1]))  (2)
where Dist(query, db1) represents the sum of Euclidean distances between the feature vectors of library image 105-1 and those of the target image 104; m represents the total number of bits occupied by all the enumerated feature semantics, and x indexes those bits; query′[x] represents the enumerated value of the xth bit of the target image 104's labels, and db1′[x] represents the enumerated value of the xth bit of library image 105-1's labels; σ represents a maximum number, whose value is chosen so that, on any mismatch, Dist(query, db1)′ is much larger than Dist(query, db1). m is an integer greater than or equal to 1, and x is an integer less than or equal to m-1.
How formula (2) adjusts the similarity between library image 105-1 and the target image 104 by determining the matching degree between their labels is described in detail below with reference to fig. 4.
As shown in fig. 4, the feature semantic categories are the categories to which the labels of library image 105-1 belong; they include gender, age, presence or absence of a criminal record, and so on. The feature semantics are all the enumerated feature semantics under each category, and the bit number is the meaning of x in formula (2), i.e., the bit position occupied by each feature semantic. As shown in fig. 4, gender occupies 2 bits, representing male and female respectively; age occupies 4 bits, whose markers represent different age stages, such as child, young, middle-aged, and old; presence or absence of a criminal record occupies further bits, and so on.
query′[x] represents the enumerated value of the xth bit of the target image 104. When the target image 104 carries a label under a certain feature semantic category, setting query′[x] to 1 indicates that the labels of the target image 104 include the feature semantic of the xth bit, and setting query′[x] to 0 indicates that they do not. When the target image 104 carries no label under a certain feature semantic category, the enumerated values of all bits under that category are set to 1. It is understood that when query′[x] is set to 1, ¬query′[x] is 0; when query′[x] is set to 0, ¬query′[x] is 1.
db1′[x] represents the enumerated value of the xth bit of library image 105-1: when it is set to 1, the labels of library image 105-1 include the feature semantic of the xth bit; when it is set to 0, they do not.
As shown in fig. 4, assume the labels of the target image 104 are male and child. Then query′[0], query′[1], query′[2], query′[3], … …, query′[m-1] take the enumerated values 1, 0, 1, 0, …, 1, and ¬query′[0], ¬query′[1], ¬query′[2], ¬query′[3], … …, ¬query′[m-1] take the values 0, 1, 0, 1, …, 0. Assume the labels of library image 105-1 include male, young, and no criminal record, so that db1′[0], db1′[1], db1′[2], db1′[3], … …, db1′[m-1] take the enumerated values 1, 0, 0, 1, …, 0, 1. Then ¬query′[0] & db1′[0], ¬query′[1] & db1′[1], ¬query′[2] & db1′[2], ¬query′[3] & db1′[3], … …, ¬query′[m-1] & db1′[m-1] take the values 0, 0, 0, 1, … …, 0: the 1 at bit 3 records that the library image's "young" label conflicts with the target image's "child" label.
As shown in fig. 4, when the labels of the target image 104 match the labels of library image 105-1, the sum (¬query′[0] & db1′[0]) + … + (¬query′[m-1] & db1′[m-1]) in formula (2) is 0, Dist(query, db1)′ is the same as Dist(query, db1), and the similarity between library image 105-1 and the target image 104 is unchanged. When at least one label of the target image 104 does not match a label of library image 105-1, the sum is greater than or equal to 1; Dist(query, db1)′ is then much greater than Dist(query, db1), and library image 105-1 becomes less similar to the target image 104.
Formula (2) above determines the matching degree between the labels of library image 105-1 and the target image 104 by computing the bitwise products of their enumerated values, and uses the result as an adjustment to the Dist(query, db1) calculated by formula (1). Applying formula (2) yields the adjusted similarity between each library image and the target image. When the similarities of the library images are ranked, because the label matching degree acts as an adjustment coefficient, the similarity of library images with unmatched labels is lowered, those images no longer appear in the search result, and the accuracy of the search result is improved.
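A minimal sketch of formula (2) under the bit encoding of fig. 4 follows. The numeric value of σ and the bit layout are assumptions; the patent only requires σ to be large enough that any mismatch dominates the feature distance.

```python
# Formula (2): add a large penalty sigma for every enumerated bit that is set
# in the library image but excluded by the target image's labels.
SIGMA = 1e9  # the "maximum number" sigma; its exact value is not specified

def tag_adjusted_dist(base_dist: float,
                      query_bits: list[int], db_bits: list[int]) -> float:
    # (1 - q) & d is 1 exactly when the library image carries a feature
    # semantic that the target image's labels rule out, i.e. a mismatch.
    mismatches = sum((1 - q) & d for q, d in zip(query_bits, db_bits))
    return base_dist + SIGMA * mismatches
```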
In some other embodiments of the present application, rather than computing the enumerated values of the target image 104 and library image 105-1, the matching degree between their labels may be determined through a label matching algorithm and used to adjust the distance value Dist(query, db1) calculated by formula (1). This technical solution is described below with reference to formula (3).
Formula (3) is based on formula (1): the matching degree between the labels of library image 105-1 and the target image 104 is determined by a label matching algorithm, the adjustment factor corresponding to that matching degree is used as the adjustment coefficient, and the result calculated by formula (1) is adjusted, finally obtaining the adjusted distance Dist(query, db1)″ between the feature vectors of library image 105-1 and the target image 104. Formula (3) can be expressed as follows:
Dist(query, db1)″ = α * Dist(query, db1)  (3)
where Dist(query, db1) represents the sum of Euclidean distances between the feature vectors of library image 105-1 and those of the target image 104, and α represents the adjustment factor.
As can be seen from formula (3), the matching degree between all labels of the target image 104 and the labels of library image 105-1 is determined. If at least one label of the target image 104 does not match the labels of library image 105-1, the adjustment factor α is set to a maximum value, so that the adjusted distance Dist(query, db1)″ is much greater than the unadjusted distance Dist(query, db1), i.e., library image 105-1 becomes less similar to the target image 104. If all labels of the target image 104 match the labels of library image 105-1, the adjustment factor α is set to 1, so that the adjusted distance Dist(query, db1)″ equals the unadjusted distance Dist(query, db1), i.e., the similarity between library image 105-1 and the target image 104 is unchanged. For example, suppose the target image 104 carries two labels, male and child, while the labels of library image 105-1 include male and young; since at least one label of the target image 104 does not match, the adjustment factor α is set to a maximum value.
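A minimal sketch of formula (3) follows. Exact string matching over label sets stands in here for the learned label matching algorithm (BERT/LSTM-style) named below; the value used for the maximum α is an assumption.

```python
# Formula (3): scale the feature distance by a single adjustment factor alpha,
# 1 when every target label matches and a very large value otherwise.
def alpha_adjusted_dist(base_dist: float,
                        query_labels: set[str], db_labels: set[str],
                        big_alpha: float = 1e9) -> float:
    all_match = query_labels <= db_labels  # every target label found in library labels
    alpha = 1.0 if all_match else big_alpha
    return alpha * base_dist
```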
In the embodiment of the present application, the matching degree between all labels of the target image 104 and the labels of library image 105-1 may be determined by any one of a Bidirectional Encoder Representations from Transformers (BERT) algorithm, a Long Short-Term Memory (LSTM) neural network algorithm, a Bidirectional Long Short-Term Memory (Bi-LSTM) neural network algorithm, and the like.
Both formula (2) and formula (3) above determine the matching degree between the labels of library image 105-1 and the target image 104, and use that matching degree as an adjustment coefficient to adjust the result Dist(query, db1) calculated by formula (1), obtaining the adjusted distance between the feature vectors of library image 105-1 and the target image 104 and hence the adjusted similarity between them.
In some other embodiments of the present application, according to the relationship between the feature semantic categories and the labels represented by the feature vectors of the library image 105-1 and the target image 104, the matching degree of the labels of the library image 105-1 and the target image 104 is used as an adjustment coefficient, and the distance value of the corresponding feature vector in the above formula (1) is adjusted, so as to obtain the similarity between the library image 105-1 and the target image 104 after the adjustment. The technical solution of the present application is specifically described below with reference to formula (4).
Formula (4), on the basis of formula (1), obtains the adjusted distance Dist(query, db1)‴ between the feature vectors of library image 105-1 and the target image 104; formula (4) can be expressed as follows:
Dist(query, db1)‴ = q*(query[0] - db1[0])² + y*y′*(query[1] - db1[1])² + … + z*(query[t-1] - db1[t-1])²  (4)
where Dist(query, db1) represents the sum of Euclidean distances between the feature vectors of library image 105-1 and those of the target image 104, and q, y, y′ and z represent the adjustment factors of the corresponding feature vectors (feature vector 1 carries two factors, y and y′, because it represents both the "gender" and the "age" feature semantic categories).
As can be seen from formula (4), the image feature semantic categories referred to in step 201 and step 202 of fig. 2, whether of library image 105-1 or of the target image 104, may each be represented by one or more feature vectors, and each label of library image 105-1 or the target image 104 belongs to a feature semantic category represented by one or more feature vectors. An adjustment factor is set for each feature vector in formula (4). The matching degree between each label of the target image 104 and the labels of library image 105-1 is determined one by one through a label matching algorithm; the adjustment factors of the feature vectors corresponding to labels of the target image 104 that match library image 105-1 are set to 1, and the adjustment factors of the feature vectors corresponding to labels that do not match are set to a maximum value, the maximum value being used so that Dist(query, db1)‴ is much greater than Dist(query, db1).
For example, suppose the labels of library image 105-1 are male, young, and shirt jacket, and the labels of the target image 104 are male and child. The label matching algorithm finds that the "gender" label of library image 105-1 matches the "gender" label of the target image 104, while the "age" labels do not match. Since the 0th and 1st feature vectors of library image 105-1 and the target image 104 represent the feature semantic category "gender", the adjustment factors q and y are set to 1; since the 1st and (t-1)th feature vectors represent the feature semantic category "age", the adjustment factors y′ and z are set to maximum values, which respectively adjust the distance between the 1st feature vectors and the distance between the (t-1)th feature vectors of library image 105-1 and the target image 104, thereby adjusting the similarity between them. It can be understood that, by judging that the "age" labels of library image 105-1 and the target image 104 do not match and then adjusting the adjustment factors of the corresponding feature vectors, the similarity between library image 105-1 and the target image 104 becomes lower; this not only makes the adjusted similarity more accurate, but also improves the accuracy of the image search, so that the search result meets expectations.
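A minimal sketch of formula (4) follows, assuming a mapping from each feature vector index to the labels whose feature semantics it represents (e.g. vector 1 representing both "gender" and "age"); the data structures and the maximum value are assumptions.

```python
# Formula (4): each of the t feature vectors gets its own adjustment factor,
# 1 when every label that vector represents matches, very large otherwise.
import numpy as np

def per_vector_adjusted_dist(query: list[np.ndarray], db: list[np.ndarray],
                             label_match: dict[str, bool],
                             vec_labels: dict[int, list[str]],
                             big: float = 1e9) -> float:
    total = 0.0
    for i, (q, d) in enumerate(zip(query, db)):
        factor = 1.0
        for label in vec_labels.get(i, []):  # labels represented by vector i
            if not label_match[label]:
                factor *= big  # one factor per mismatched label, e.g. y*y' on vector 1
        total += factor * float(np.sum((q - d) ** 2))
    return total
```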
Formulas (2) to (4) above all take the Euclidean distance algorithm as the basis and adjust the similarity between library image 105-1 and the target image 104 in combination with the matching degree of their labels, further explaining the technical scheme of the present application. In other embodiments of the present application, the similarity between library image 105-1 and the target image 104 may be adjusted, in combination with the label matching degree, on the basis of distance algorithms other than the Euclidean distance algorithm, where the distance algorithm includes at least one of the algorithms for calculating the Euclidean distance, the cosine distance, the Manhattan distance, the Mahalanobis distance, and the EMD distance. The idea of adjusting the image similarity based on these other distance algorithms is the same as that based on the Euclidean distance algorithm, and the details are not repeated here.
Fig. 5 illustrates a block diagram of an image search apparatus 500, according to some embodiments of the present application. As shown in fig. 5, the apparatus specifically includes:
a label and feature vector obtaining module 502, configured to obtain a label and a feature vector of each of the target image and the library image in the image library, where the feature vectors of the library image and the target image include one or more feature vectors, and the labels of the library image and the target image include one or more labels;
the similarity adjusting module 504 determines the matching degree of the label of each library image and the target image, and obtains the similarity between each library image and the target image according to the matching degree of the label of each library image and the target image;
the search result determination module 506 determines a library image corresponding to the target image from the image library based on the similarity between each library image and the target image.
In an embodiment of the present application, the image search apparatus 500 further includes: determining the matching degree of the label of each library image and the target image, and obtaining the similarity between each library image and the target image according to the matching degree of the label of each library image and the target image comprises the following steps: determining the matching degree of the labels of each library image and the target image, and calculating the original similarity of each library image and the target image through a distance algorithm; and taking the matching degree of the label of each library image and the label of the target image as an adjusting coefficient, and adjusting each original similarity to obtain the similarity between each library image and the target image.
In an embodiment of the present application, the image search apparatus 500 further includes: determining the matching degree of the label of each library image and the target image, and obtaining the similarity between each library image and the target image according to the matching degree of the label of each library image and the target image comprises the following steps: and determining the matching degree of the label of each library image and the target image, using a distance algorithm, taking the matching degree of the label of each library image and the target image as an adjusting coefficient for the distance algorithm, and calculating to obtain the similarity between the feature vectors of each library image and the target image.
In an embodiment of the present application, the image search apparatus 500 further includes: the distance algorithm includes at least one of algorithms for calculating a Euclidean distance, a cosine distance, a Manhattan distance, a Mahalanobis distance, and an EMD distance. Determining the matching degree of the label of each library image and the target image comprises the following steps: determining the matching degree of the labels between the library image and the target image through a label matching algorithm, wherein the label matching algorithm comprises at least one of a bidirectional encoder algorithm, a long short-term memory neural network algorithm and a bidirectional long short-term memory neural network algorithm.
In an embodiment of the present application, in the image search apparatus 500, determining the matching degree of the label of each library image with the target image and obtaining the similarity between each library image and the target image according to that matching degree includes: for one library image, in the case that at least one label of the target image does not match any label of the library image, reducing the similarity between the target image and the library image.

In an embodiment of the present application, in the image search apparatus 500, determining the matching degree of the label of each library image with the target image and obtaining the similarity between each library image and the target image according to that matching degree includes: for one library image, in the case that all the labels of the target image match labels of the library image, leaving the similarity between the target image and the library image unchanged.
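The two embodiments above can be sketched together as a single rule. The exact subset test and the 0.5 penalty factor below are assumptions, since the patent neither mandates exact label matching nor fixes the amount of the reduction:

    def apply_label_rule(similarity: float, target_labels: set,
                         lib_labels: set, penalty: float = 0.5) -> float:
        # If every label of the target image finds a match among the library
        # image's labels, the similarity passes through unchanged; if at
        # least one does not, the similarity is reduced.
        return similarity if target_labels <= lib_labels else similarity * penalty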
In an embodiment of the present application, in the image search apparatus 500, determining a library image corresponding to the target image from the image library based on the similarity between each library image and the target image includes: sorting the similarities between the library images in the image library and the target image, and determining the library image corresponding to the target image from the image library according to the sorting order.
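A sketch of this final ranking step, where top_k is an assumed cutoff (the patent only requires selection according to the sorting order):

    def rank_library(similarities: dict, top_k: int = 10) -> list:
        # Sort library image ids by similarity to the target, descending,
        # and return the best top_k as the search result.
        ordered = sorted(similarities, key=similarities.get, reverse=True)
        return ordered[:top_k]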
It can be understood that the image search apparatus 500 shown in fig. 5 corresponds to the image search method provided in the present application. The technical details given in the above description of the image search method still apply to the image search apparatus 500 shown in fig. 5, and are not repeated here.
Fig. 6 illustrates a block diagram of an example system 600 according to some embodiments of the present application. In some embodiments, the system 600 may include one or more processors 604, system control logic 608 coupled to at least one of the processors 604, system memory 612 coupled to the system control logic 608, non-volatile memory (NVM) 616 coupled to the system control logic 608, and a network interface 620 coupled to the system control logic 608.
In some embodiments, processor 604 may include one or more single-core or multi-core processors. In some embodiments, processor 604 may include any combination of general-purpose processors and special-purpose processors (e.g., graphics processors, application processors, baseband processors, etc.).
In some embodiments, system control logic 608 may include any suitable interface controllers to provide any suitable interface to at least one of processors 604 and/or any suitable device or component in communication with system control logic 608.
In some embodiments, system control logic 608 may include one or more memory controllers to provide an interface to system memory 612. System memory 612 may be used to load and store data and/or instructions. In some embodiments, the system memory 612 may comprise any suitable volatile memory, such as a suitable dynamic random access memory (DRAM).
NVM/storage 616 may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions. In some embodiments, NVM/storage 616 may include any suitable non-volatile memory, such as flash memory, and/or any suitable non-volatile storage device, such as at least one of an HDD (hard disk drive), a CD (compact disc) drive, and a DVD (digital versatile disc) drive.
NVM/storage 616 may comprise a portion of the storage resources of the apparatus on which the electronic device 600 is installed, or it may be accessible by, without necessarily being a part of, the device. For example, NVM/storage 616 may be accessed over a network via the network interface 620.
In particular, system memory 612 and NVM/storage 616 may each include a temporary copy and a permanent copy of instructions 624. The instructions 624 may include instructions that, when executed by at least one of the processors 604, cause the electronic device 600 to implement the method illustrated in fig. 2. In some embodiments, the instructions 624, or hardware, firmware, and/or software components thereof, may additionally or alternatively be located in system control logic 608, network interface 620, and/or processor 604.
Network interface 620 may include a transceiver to provide a radio interface for electronic device 600 to communicate with any other suitable device (e.g., front end module, antenna, etc.) over one or more networks. In some embodiments, network interface 620 may be integrated with other components of electronic device 600. For example, network interface 620 may be integrated with at least one of processor 604, system memory 612, NVM/storage 616, and a firmware device (not shown) having instructions that, when executed by at least one of the processors 604, cause the electronic device 600 to implement the image search method as shown in fig. 2.
The network interface 620 may further include any suitable hardware and/or firmware to provide a multiple-input multiple-output radio interface. For example, network interface 620 may be a network adapter, a wireless network adapter, a telephone modem, and/or a wireless modem.
In one embodiment, at least one of the processors 604 may be packaged together with logic for one or more controllers of system control logic 608 to form a System In Package (SiP). In one embodiment, at least one of processors 604 may be integrated on the same die with logic for one or more controllers of system control logic 608 to form a system on a chip (SoC).
The electronic device 600 may further include input/output (I/O) devices 632. The I/O devices 632 may include a user interface to enable a user to interact with the electronic device 600, and a peripheral component interface designed so that peripheral components can also interact with the electronic device 600. In some embodiments, the electronic device 600 further comprises a sensor for determining at least one of environmental conditions and geographic information related to the electronic device 600.
Fig. 7 shows a block diagram of a SoC (System on Chip) 700, according to an embodiment of the present application. In fig. 7, similar components have the same reference numerals. In addition, the dashed boxes are optional features of more advanced SoCs. In fig. 7, the SoC 700 includes: an interconnect unit 750 coupled to the application processor 710; a system agent unit 770; a bus controller unit 780; an integrated memory controller unit 740; a set of one or more coprocessors 720, which may include integrated graphics logic, an image processor, an audio processor, and a video processor; a static random access memory (SRAM) unit 730; and a direct memory access (DMA) unit 760. In one embodiment, coprocessor 720 includes a special-purpose processor, such as, for example, a network or communication processor, a compression engine, a GPU, a high-throughput MIC processor, or an embedded processor.
Embodiments of the mechanisms disclosed herein may be implemented in hardware, software, firmware, or a combination of these implementations. Embodiments of the application may be implemented as computer programs or program code executing on programmable systems comprising at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
Program code may be applied to input instructions to perform the functions described herein and generate output information. The output information may be applied to one or more output devices in a known manner. For purposes of this application, a processing system includes any system having a processor such as, for example, a Digital Signal Processor (DSP), a microcontroller, an Application Specific Integrated Circuit (ASIC), or a microprocessor.
The program code may be implemented in a high level procedural or object oriented programming language to communicate with a processing system. The program code can also be implemented in assembly or machine language, if desired. Indeed, the mechanisms described in this application are not limited in scope to any particular programming language. In any case, the language may be a compiled or interpreted language.
In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed via a network or via other computer-readable media. Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy diskettes, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, flash memory, or a tangible machine-readable memory used to transmit information over the Internet in the form of electrical, optical, acoustical or other propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.). Thus, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
In the drawings, some features of the structures or methods may be shown in a particular arrangement and/or order. However, it is to be understood that such specific arrangement and/or ordering may not be required. Rather, in some embodiments, the features may be arranged in a manner and/or order different from that shown in the illustrative figures. In addition, the inclusion of a structural or methodical feature in a particular figure is not meant to imply that such feature is required in all embodiments, and in some embodiments, may not be included or may be combined with other features.
It should be noted that, in the apparatus embodiments of the present application, each unit/module is a logical unit/module. Physically, one logical unit/module may be one physical unit/module, may be a part of one physical unit/module, or may be implemented by a combination of multiple physical units/modules; the physical implementation of the logical units/modules themselves is not what matters most, as the combination of functions they implement is the key to solving the technical problem addressed by the present application. Furthermore, in order to highlight the innovative part of the present application, the above apparatus embodiments do not introduce units/modules that are less closely related to solving that technical problem; this does not mean that no other units/modules exist in the above apparatus embodiments.
It is noted that, in the examples and descriptions of this patent, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
While the present application has been shown and described with reference to certain preferred embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present application.

Claims (10)

1. An image search method for searching an image library for a library image corresponding to a target image based on the target image, the method comprising:
respectively obtaining a label and a feature vector of the target image and of each library image in the image library, wherein the feature vectors of the library image and the target image comprise one or more feature vectors, and the labels of the library image and the target image comprise one or more labels;
determining the matching degree of the label of each library image and the target image, and obtaining the similarity between each library image and the target image according to the matching degree of the label of each library image and the target image;
and determining a library image corresponding to the target image from the image library based on the similarity of each library image and the target image.
2. The method of claim 1, wherein determining a degree of matching of the label of each of the library images with the target image, and obtaining a similarity between each of the library images and the target image according to the degree of matching of the label of each of the library images with the target image comprises:
determining a degree of matching of each of the library images to the label of the target image,
calculating the original similarity of each library image and the target image through a distance algorithm;
and adjusting each original similarity by taking the matching degree of the label of each library image and the label of the target image as an adjustment coefficient to obtain the similarity between each library image and the target image.
3. The method of claim 1, wherein determining a degree of matching of the label of each of the library images with the target image, and obtaining a similarity between each of the library images and the target image according to the degree of matching of the label of each of the library images with the target image comprises:
determining a degree of matching of each of the library images to the label of the target image,
and calculating the similarity between the feature vectors of each library image and the target image by using a distance algorithm, with the matching degree of the label of each library image and the target image as an adjustment coefficient for the distance algorithm.
4. The method of claim 2 or claim 3, wherein the distance algorithm comprises at least one of algorithms for calculating Euclidean, cosine, Manhattan, Mahalanobis, and EMD distances.
5. The method of claim 2 or claim 3, wherein determining a degree of match of each of the library images to the label of the target image comprises:
and determining the matching degree of the label between the library image and the target image through a label matching algorithm, wherein the label matching algorithm comprises at least one of a bidirectional encoder algorithm, a long short-term memory (LSTM) neural network algorithm, and a bidirectional long short-term memory (BiLSTM) neural network algorithm.
6. The method of claim 1, wherein determining a degree of matching of the label of each of the library images with the target image, and obtaining a similarity between each of the library images and the target image according to the degree of matching of the label of each of the library images with the target image comprises:
for one of the library images, in the case that at least one label in the target image does not match a label in the library image, reducing the similarity between the target image and the library image.
7. The method of claim 1, wherein determining a degree of matching of the label of each of the library images with the target image, and obtaining a similarity between each of the library images and the target image according to the degree of matching of the label of each of the library images with the target image comprises:
for one of the library images, in the case that all the labels of the target image match the labels in the library image, keeping the similarity between the target image and the library image unchanged.
8. An image search apparatus, characterized in that the apparatus comprises:
a label and feature vector acquisition module, configured to respectively acquire a label and a feature vector of a target image and of each library image in an image library, wherein the feature vectors of the library image and the target image comprise one or more feature vectors, and the labels of the library image and the target image comprise one or more labels;
the similarity adjusting module is used for determining the matching degree of the label of each library image and the label of the target image and obtaining the similarity between each library image and the target image according to the matching degree of the label of each library image and the label of the target image;
and the search result determining module is used for determining a library image corresponding to the target image from the image library based on the similarity of each library image and the target image.
9. A readable medium having stored thereon instructions that, when executed on an electronic device, cause the electronic device to perform the image search method of any one of claims 1 to 7.
10. An electronic device, comprising:
a memory for storing instructions for execution by one or more processors of the electronic device, an
A processor, being one of the processors of the electronic device, for performing the image search method of any one of claims 1 to 7.
CN202110077625.8A 2021-01-20 2021-01-20 Image searching method, image searching device, image searching medium and electronic equipment Pending CN112765382A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110077625.8A CN112765382A (en) 2021-01-20 2021-01-20 Image searching method, image searching device, image searching medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110077625.8A CN112765382A (en) 2021-01-20 2021-01-20 Image searching method, image searching device, image searching medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN112765382A (en) 2021-05-07

Family

ID=75701910

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110077625.8A Pending CN112765382A (en) 2021-01-20 2021-01-20 Image searching method, image searching device, image searching medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN112765382A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104143088A (en) * 2014-07-25 2014-11-12 电子科技大学 Face identification method based on image retrieval and feature weight learning
CN104834748A (en) * 2015-05-25 2015-08-12 中国科学院自动化研究所 Image retrieval method utilizing deep semantic to rank hash codes
CN109033308A (en) * 2018-07-16 2018-12-18 安徽江淮汽车集团股份有限公司 A kind of image search method and device
US20200104632A1 (en) * 2018-09-29 2020-04-02 Boe Technology Group Co., Ltd. Image search method and apparatus
CN109977250A (en) * 2019-03-20 2019-07-05 重庆大学 Merge the depth hashing image search method of semantic information and multistage similitude
CN110413848A (en) * 2019-07-19 2019-11-05 上海赜睿信息科技有限公司 A kind of data retrieval method, electronic equipment and computer readable storage medium
CN112084904A (en) * 2020-08-26 2020-12-15 武汉普利商用机器有限公司 Face searching method, device and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination