CN113704532B

CN113704532B - Method and system for improving picture retrieval recall rate

Info

Publication number: CN113704532B
Application number: CN202011341433.5A
Authority: CN
Inventors: 史国杰; 曹靖诚; 张继东; 刘硙
Original assignee: Tianyi Digital Life Technology Co Ltd
Current assignee: Tianyi Digital Life Technology Co Ltd
Priority date: 2020-11-25
Filing date: 2020-11-25
Publication date: 2024-04-26
Anticipated expiration: 2040-11-25
Also published as: CN113704532A

Abstract

The invention provides a method and a system for improving the recall rate of picture retrieval. The method comprises the following steps: converting a first picture and a second picture into pixel values, respectively, wherein the first picture is used as a reference picture; calculating a hash value string of the first picture based on differences in pixel values of adjacent pixels; calculating, in the same way, hash value strings of the second picture and of a plurality of rotated and mirrored versions of the second picture; determining, based on the hash value strings, the similarity of the second picture and of each rotated and mirrored version to the first picture; and taking whichever of these pictures has the highest similarity to the first picture as the main direction of the second picture, and inputting that picture together with the first picture into a picture recognition model.

Description

Method and system for improving picture retrieval recall rate

Technical Field

The invention relates to the fields of artificial intelligence and image recognition and processing, and in particular to a method and a system that improve the recall rate of picture retrieval by using a hash algorithm to locate the main direction (dominant orientation) of a picture.

Background

With the development of science and technology, artificial intelligence is increasingly widely used, for example in license plate recognition and face recognition, and most of the information involved is acquired through images. In particular, the currently popular field of artificial intelligence requires a large number of pictures for training and learning, which makes picture recognition indispensable.

Fig. 1 is an exemplary architecture diagram of a prior art model for identifying similar pictures. As shown in fig. 1, the prior art already offers a variety of models trained to recognize similar pictures, such as the picture recognition model 102 in fig. 1. Such a model outputs a recognition result indicating whether two pictures are similar. For example, in fig. 1, a picture A and a picture B are provided as inputs to the picture recognition model 102, where picture A is typically the reference picture, and the picture recognition model 102 outputs a recognition result, i.e., whether picture B is similar or dissimilar to reference picture A.

Existing image similarity recognition methods mainly consist of similar-picture retrieval schemes based on global and local features, such as SIFT and SURF, which are gradually being replaced by deep learning feature extraction. Deep learning feature extraction generally uses classical CNN (convolutional neural network) models such as VGG, ResNet and Xception, pre-trained on the large-scale picture classification dataset ImageNet; the output of a certain convolutional or fully connected layer is extracted as the feature representation of a picture, and pictures are then compared by Euclidean distance or cosine similarity. This approach captures stronger semantics and achieves higher precision, but its recall is not good enough. The main reason is that the pre-trained model is built for classification: inconspicuous features may be discarded, and the extracted features can differ substantially under rotation, mirroring, different lens focal lengths and similar conditions. The most obvious symptom in picture similarity recognition is that when the input is a rotated picture, even with a small rotation angle (about 30 degrees), the picture cannot be recognized effectively, so the recall of similar pictures is extremely low. The conventional remedy is to improve the network, which can bring a series of problems such as increased algorithm complexity and excessively long development time.

A Chinese patent application (201410848431.3) entitled "a similar picture detection method and apparatus" proposes a HASH-based picture similarity detection method: the picture is converted to grayscale and divided into 8×8 regions, the average value of each region is computed, and the averages are quantized and marked to form a HASH string. However, that method does not address the low recall and limited precision that arise when similar images are identified from extracted feature values, so it cannot fully solve the similar-image identification problem.

A Chinese patent application (201811449694.1) entitled "image similarity determination method based on pre-screening method and PHash" proposes computing the variance of each image with a color variance algorithm and computing the color-variance difference between two images to complete a pre-screening step. If the color-variance difference is larger than a variance threshold, the images are judged dissimilar and the procedure ends; if it is smaller than or equal to the variance threshold, the images are further hashed with the PHash algorithm and the PHash-based Hamming distance between them is calculated. If that Hamming distance is less than a Hamming distance threshold, the images are determined to be similar; otherwise they are judged dissimilar. This method can improve the efficiency of picture similarity retrieval, but the improvement is limited given its series of complex algorithms and calculations; it has no resistance to rotation when identifying rotated pictures and contributes little to improving picture recall.

A Chinese patent application (201710029935.6) entitled "HASH image retrieval method based on deep learning and local feature fusion" combines a deep learning network with HASH image retrieval: it extracts two kinds of features, performs image retrieval with an approximate nearest neighbor search strategy, and also performs the retrieval computation on mirror images. However, the method still runs its multiple processing strategies on every picture; although operation speed and precision improve to a certain extent, recall for rotated pictures is still not guaranteed.

Disclosure of Invention

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

To solve these problems, the invention introduces a hash-based method for identifying the main direction of a picture. More specifically, the method may include converting the picture to grayscale to obtain pixel values, comparing pairs of adjacent pixel values according to a certain rule to generate a HASH, obtaining the HASH of the picture to be compared in the same manner, and obtaining HASHes for different angles through rotation. The main advantages of searching for similar pictures by HASH comparison are speed, accuracy and high recall. The HASH needs to be computed from the picture pixels only once; the HASHes for the other orientations can be derived directly from it by rotation. When the comparison shows extremely high similarity and the aspect ratios are the same, the pictures can be judged similar directly, which further increases processing speed. In particular, the method is more accurate when identifying rotated and mirrored pictures as similar, which improves the recall of similar pictures.

According to one aspect of the present invention, there is provided a method for enhancing picture retrieval recall, wherein the method comprises:

Converting a first picture and a second picture into pixel values, respectively, wherein the first picture is used as a reference picture;

calculating a hash value string of the first picture based on differences in pixel values of adjacent pixels;

Calculating hash value strings of the second picture and a plurality of rotated and mirrored pictures of the second picture based on differences of pixel values of adjacent pixels respectively;

determining, based on the hash value strings, the similarity of the second picture and of each of the rotated and mirrored pictures to the first picture; and

taking whichever of the second picture and the plurality of rotated and mirrored pictures has the highest similarity to the first picture as the main direction of the second picture, and inputting that picture together with the first picture into a picture recognition model.

According to a further embodiment of the invention, the method further comprises: if none of the second picture and the rotated and mirrored pictures has a similarity to the first picture higher than a preset threshold, skipping the steps of taking the main direction and inputting it into the picture recognition model, and directly determining that the second picture is dissimilar to the first picture.

According to a further embodiment of the present invention, converting the first picture and the second picture into pixel values, respectively, further comprises: converting the first picture and the second picture to grayscale to obtain pixel values.

According to a further embodiment of the present invention, calculating a hash value string based on differences in pixel values of neighboring pixels further includes: comparing each pixel value of the picture with its adjacent pixel value, pixel by pixel, and obtaining a plurality of hash values based on the degree of difference between the pixel values; and concatenating the plurality of hash values into a hash value string.

According to a further embodiment of the present invention, determining, based on the hash value strings, the similarity of the second picture and of each of the plurality of rotated and mirrored pictures to the first picture further comprises:

calculating an error sum for each picture, wherein the error sum is the sum of absolute differences between adjacent hash values in the hash value string; and

calculating the difference between the error sum of the first picture and the error sum of each of the second picture and the plurality of rotated and mirrored pictures, wherein a smaller difference indicates a higher similarity between that picture and the first picture.

According to a further embodiment of the present invention, determining the similarity may alternatively comprise: calculating, for each of the second picture and the plurality of rotated and mirrored pictures, the ratio of the number of positions in its hash value string that are equal to the corresponding positions in the hash value string of the first picture to the total length of the hash value string of the first picture, wherein a greater ratio indicates a higher similarity between that picture and the first picture.

According to a further embodiment of the present invention, at least one of the hash value strings of the rotated and mirrored pictures of the second picture is obtained by correspondingly rotating and/or mirroring the hash value string of the second picture.

According to another aspect of the present invention, there is provided a system for identifying similar pictures, wherein the system comprises:

A picture primary direction determination module configured to:

converting a first picture and a second picture into pixel values, respectively, wherein the first picture is taken as a reference picture;

calculating, based on differences in pixel values of adjacent pixels, hash value strings of the second picture and of a plurality of rotated and mirrored pictures of the second picture, respectively;

determining, based on the hash value strings, the similarity of the second picture and of each of the plurality of rotated and mirrored pictures to the first picture; and

taking whichever of the second picture and the plurality of rotated and mirrored pictures has the highest similarity to the first picture as the main direction of the second picture, and providing that picture together with the first picture to a picture recognition model; and

A picture recognition model configured to extract feature values of a set of pictures provided by the picture main direction determination module and to recognize whether the set of pictures are similar.

According to a further embodiment of the invention, the picture main direction determination module is further configured to: if none of the second picture and the rotated and mirrored pictures has a similarity to the first picture higher than a preset threshold, skip the steps of taking the main direction and inputting it into the picture recognition model, and directly determine that the second picture is dissimilar to the first picture.

According to a further embodiment of the present invention, converting the first picture and the second picture into pixel values, respectively, further comprises: converting the first picture and the second picture to grayscale to obtain pixel values, and

calculating a hash value string based on differences in pixel values of adjacent pixels further includes: comparing each pixel value of the picture with its adjacent pixel value, pixel by pixel, and obtaining a plurality of hash values based on the degree of difference between the pixel values; and concatenating the plurality of hash values into a hash value string.

Compared with the scheme in the prior art, the picture retrieval method provided by the invention has at least the following advantages:

(1) High accuracy: the method solves the problem of the poor recognition rate and recall of rotated and mirrored pictures when a neural network performs similar-picture recognition.

(2) High efficiency: compared with similar-picture recognition algorithms based mainly on neural networks, the processing speed for similar pictures is improved by more than 90%.

(3) Stability: HASH strings are formed from pixel differences, a mature technique that ensures a stable similarity-recognition function.

(4) Flexible adaptation: the hash-based main direction positioning algorithm can be placed in front of the convolutional layers of any CNN model.

(5) Lower cost: the accuracy, efficiency and recall of neural-network recognition of rotated and mirrored images are improved, saving resource and time costs while achieving faster processing and higher recall.

These and other features and advantages will become apparent upon reading the following detailed description and upon reference to the associated drawings. It is to be understood that both the foregoing general description and the following detailed description are explanatory only and are not restrictive of aspects as claimed.

Drawings

So that the manner in which the above recited features of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only certain typical aspects of this invention and are therefore not to be considered limiting of its scope, for the description may admit to other equally effective aspects.

Fig. 1 is an exemplary architecture diagram of a prior art model for identifying similar pictures.

Fig. 2 is an example flow diagram of a method for enhancing picture retrieval recall in accordance with one embodiment of the present invention.

Fig. 3 shows a schematic diagram of an example HASH function.

Fig. 4 is a schematic diagram of mirroring a HASH string up and down.

Fig. 5 is an exemplary architecture diagram of a system for identifying similar pictures according to one embodiment of the invention.

Detailed Description

The features of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings.

To solve the prior-art problem that rotated and mirrored images cannot be identified effectively, the invention provides a method that improves the accuracy and efficiency of similar-image identification and thereby improves the recall rate of image retrieval. Fig. 2 is an example flow diagram of a method for enhancing picture retrieval recall in accordance with one embodiment of the present invention.

The method starts at step 202, where a first picture and a second picture are converted into pixel values, respectively. The first picture serves as the reference picture, and the second picture is the picture to be identified, i.e., the picture whose similarity to the reference picture is to be determined. For convenience of description, the first picture is hereinafter referred to as picture A and the picture to be identified as picture B.

As is well known, a picture is typically represented by a color value for each of its pixels, in a color format such as RGB or CMYK. Through experimentation, the inventors found that using color values such as RGB does not significantly improve the results of the method of the present invention but does increase its complexity. Therefore, converting a picture into pixel values may preferably comprise removing its color, i.e., converting it to grayscale, and then acquiring its pixel values. The range of the grayscale pixel values depends on the configured number of gray levels, which may be set to 8, 16, 32, 64, 128, 256 or the like. The method then proceeds to step 204.
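The grayscale conversion and gray-level quantization of step 202 can be sketched as follows. This is a minimal Python sketch, not the inventors' implementation: it assumes the picture arrives as rows of (R, G, B) tuples with channels in 0..255, it uses the ITU-R BT.601 luma weights (0.299, 0.587, 0.114) as an assumed conversion formula because the text does not specify one, and to_gray_levels is a hypothetical helper name.

```python
def to_gray_levels(rgb_rows, levels=256):
    """Convert rows of (R, G, B) tuples (0-255) into grayscale values in 0..levels-1.

    Assumption: BT.601 luma weights; the patent does not fix the formula.
    `levels` is the configurable number of gray levels (e.g. 8, 16, ..., 256).
    """
    if not 2 <= levels <= 256:
        raise ValueError("levels must be between 2 and 256")
    gray_rows = []
    for row in rgb_rows:
        gray_row = []
        for r, g, b in row:
            # Integer luma in 0..255 (weights scaled by 1000 to avoid float rounding).
            y = (299 * r + 587 * g + 114 * b) // 1000
            # Quantize 0..255 onto 0..levels-1.
            gray_row.append(y * levels // 256)
        gray_rows.append(gray_row)
    return gray_rows
```

For example, a white pixel and a black pixel quantized to 8 gray levels become [[7, 0]]. Integer weights are used so that pure white maps exactly to the top gray level.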

In step 204, a hash value string of the first picture is calculated based on the differences in pixel values of neighboring pixels. As an example, the pixel values of picture A may be processed by comparing each pair of adjacent pixels in each row, obtaining a HASH value that depends on the difference between their pixel values, and finally obtaining a HASH string. Fig. 3 shows a schematic diagram of an example HASH function; its upper right corner shows a close-up view of the grayscale pixels of a picture. Starting from the leftmost pixel of the first row, each pixel is compared with its right neighbor in turn. As an example, assume the following HASH function rule: the HASH value is "2" when the left pixel value is greater than or equal to the right pixel value plus a threshold (e.g., 4); otherwise it is "1" when the left pixel value is greater than or equal to the right pixel value minus the threshold; and it is "0" in all other cases. In this example, the first pixel of the first row and its right neighbor satisfy left >= right - 4 but not left >= right + 4, so the HASH value is 1. The second pixel is then compared with the third, and so on, finally yielding a HASH value string consisting of 0s, 1s and 2s, denoted AHASH. Those skilled in the art will understand that more intervals than the three shown here (0, 1 and 2) may be defined as required, and the difference threshold of each interval may be set according to requirements and the value range of the pixel values. The method then proceeds to step 206.
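The example HASH rule of step 204 can be written out as a short sketch. It assumes grayscale pixel rows given as lists of integers and the example threshold of 4; the function names hash_pixel_pair and hash_string are hypothetical, and the three-interval rule is only the example given in the text, not the only possible HASH function.

```python
def hash_pixel_pair(left, right, threshold=4):
    """Example three-interval HASH rule: returns '2', '1' or '0'."""
    if left >= right + threshold:
        return "2"  # left is clearly brighter than right
    if left >= right - threshold:
        return "1"  # left lies in the band [right - threshold, right + threshold)
    return "0"  # left is clearly darker than right


def hash_string(gray_rows, threshold=4):
    """Compare each pixel with its right neighbour, row by row, and concatenate the results."""
    values = []
    for row in gray_rows:
        for left, right in zip(row, row[1:]):
            values.append(hash_pixel_pair(left, right, threshold))
    return "".join(values)
```

With this rule the row [10, 3, 5, 20] hashes to "210": 10 >= 3 + 4 gives 2, 3 lies within 4 of 5 and gives 1, and 5 < 20 - 4 gives 0.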

In step 206, hash value strings of the second picture and of a plurality of rotated and mirrored pictures of the second picture are calculated based on the differences in pixel values of adjacent pixels. First, the HASH value string of the second picture is calculated in the same manner as for the first picture and denoted BHASH. Then, the HASH value strings of the second picture rotated by 90, 180 and 270 degrees, and of its horizontally and vertically mirrored versions, are denoted BHASH_90, BHASH_180, BHASH_270, BHASH_IMA_0 and BHASH_IMA_180, respectively.
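Step 206 can be sketched by generating the five rotated and mirrored versions of picture B and hashing each one. This sketch makes several assumptions not fixed by the text: rotations are clockwise, "horizontal mirror" (BHASH_IMA_0) is taken to mean a left-right flip and "vertical mirror" (BHASH_IMA_180) an up-down flip, and all helper names are hypothetical. For simplicity, every version is hashed directly from its pixels.

```python
def rotate90(img):
    """Rotate a list-of-rows image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def rotate180(img):
    return [row[::-1] for row in img[::-1]]


def rotate270(img):
    """Rotate 90 degrees counter-clockwise (equivalently 270 degrees clockwise)."""
    return [list(col) for col in zip(*img)][::-1]


def mirror_left_right(img):
    return [row[::-1] for row in img]


def mirror_up_down(img):
    return [list(row) for row in img[::-1]]


def hash_string(img, threshold=4):
    """Example three-interval rule applied to each pixel and its right neighbour."""
    out = []
    for row in img:
        for left, right in zip(row, row[1:]):
            out.append("2" if left >= right + threshold else "1" if left >= right - threshold else "0")
    return "".join(out)


def orientation_hashes(img, threshold=4):
    """HASH strings of picture B and its rotated/mirrored versions, keyed as in the text."""
    versions = {
        "BHASH": img,
        "BHASH_90": rotate90(img),
        "BHASH_180": rotate180(img),
        "BHASH_270": rotate270(img),
        "BHASH_IMA_0": mirror_left_right(img),  # assumed: horizontal mirror = left-right flip
        "BHASH_IMA_180": mirror_up_down(img),  # assumed: vertical mirror = up-down flip
    }
    return {name: hash_string(v, threshold) for name, v in versions.items()}
```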

In addition, because BHASH is a set of pixel comparison results stored in a fixed order, depending on the HASH function used, the HASH results of some rotated or mirrored pictures can be obtained directly by rearranging the HASH result of the original picture. Fig. 4 is a schematic diagram of mirroring a HASH string up and down. As shown in fig. 4, assume that the HASH string obtained from the original picture is arranged by the rows and columns of pixels as shown in (a) of fig. 4. If the HASH function compares left and right adjacent pixel values, the HASH result of the up-down mirrored picture is as shown in (b) of fig. 4 and can therefore be obtained by directly mirroring the HASH values. The method then proceeds to step 208.
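The up-down case of Fig. 4 can be checked with a small sketch. It assumes the example left-right comparison rule and keeps the HASH values as a grid with one row per pixel row (hash_grid and mirror_hash_up_down are hypothetical names). Because each HASH row depends only on its own pixel row, flipping the picture up and down simply reverses the order of the HASH rows.

```python
def hash_pair(left, right, threshold=4):
    """Example three-interval rule, returning 2, 1 or 0 as integers."""
    if left >= right + threshold:
        return 2
    if left >= right - threshold:
        return 1
    return 0


def hash_grid(gray_rows, threshold=4):
    """One row of HASH values per pixel row (width - 1 values each), as in Fig. 4(a)."""
    return [[hash_pair(l, r, threshold) for l, r in zip(row, row[1:])] for row in gray_rows]


def mirror_hash_up_down(grid):
    """HASH of the up-down mirrored picture, obtained without touching any pixels."""
    return grid[::-1]


img = [[0, 10, 5], [20, 3, 30]]
# Hashing the flipped picture equals flipping the original HASH grid.
assert hash_grid(img[::-1]) == mirror_hash_up_down(hash_grid(img))
```

This sketch covers only the up-down mirror shown in Fig. 4. Under a left-right mirror, the example rule would compare each pixel pair in the opposite direction, so the HASH values themselves would change and not just their order.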

In step 208, the similarity of the second picture and of each of its rotated and mirrored pictures to the first picture is determined based on the hash value strings. According to one embodiment of the invention, the similarity between two pictures may be determined from the error sums of their hash value strings. The error sum of a HASH value string is calculated by taking the absolute difference between each character and the next, from the first character to the last, and summing these differences. For example, if AHASH is 012012012, its error sum is SE_AHASH = |0-1| + |1-2| + |2-0| + |0-1| + |1-2| + |2-0| + |0-1| + |1-2| = 10. Similarly, the error sum of BHASH, SE_BHASH, and the error sums of the HASH value strings of the rotated and mirrored pictures, SE_BHASH_90, SE_BHASH_180, SE_BHASH_270, SE_BHASH_IMA_0 and SE_BHASH_IMA_180, may be calculated. Each of SE_BHASH, SE_BHASH_90, SE_BHASH_180, SE_BHASH_270, SE_BHASH_IMA_0 and SE_BHASH_IMA_180 is then compared with SE_AHASH; the closer an error sum is to SE_AHASH, the more similar the corresponding picture is to picture A.
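The error-sum comparison of step 208 can be sketched as follows, assuming HASH strings made of decimal digits as in the example. error_sum and error_sum_gaps are hypothetical names, and the candidate keys simply follow the BHASH naming used in the text.

```python
def error_sum(hash_str):
    """Sum of |h[i] - h[i+1]| over adjacent characters of a digit HASH string."""
    digits = [int(c) for c in hash_str]
    return sum(abs(a - b) for a, b in zip(digits, digits[1:]))


def error_sum_gaps(reference_hash, candidate_hashes):
    """|SE_candidate - SE_AHASH| for each candidate; a smaller gap means more similar."""
    se_ref = error_sum(reference_hash)
    return {name: abs(error_sum(h) - se_ref) for name, h in candidate_hashes.items()}
```

Here error_sum("012012012") returns 10, matching the worked example. Because different strings can share the same error sum (for example "012" and "210" both give 2), this measure is only a coarse indicator of similarity.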

According to a further embodiment of the invention, the similarity between two pictures may be determined from the ratio of the number of equal corresponding positions in the two hash value strings to the total length. For example, if AHASH and BHASH each have a length of 64 and 48 corresponding positions are equal, the ratio is 48/64 = 75%. The higher the ratio, the more similar the two pictures are.
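The match ratio can be sketched in a few lines. The sketch assumes both HASH strings have the same length, for example because both pictures were brought to the same square size before hashing; the text does not say how unequal lengths are handled, so this version simply rejects them. match_ratio is a hypothetical name.

```python
def match_ratio(reference_hash, candidate_hash):
    """Fraction of positions at which two equal-length HASH strings agree."""
    if not reference_hash or len(reference_hash) != len(candidate_hash):
        raise ValueError("HASH strings must be non-empty and of equal length")
    same = sum(a == b for a, b in zip(reference_hash, candidate_hash))
    return same / len(reference_hash)
```

With 48 of 64 positions equal, it returns 0.75, as in the example.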

In step 210, whichever of the second picture and its rotated and mirrored pictures has the highest similarity to the first picture is taken as the main direction of the second picture, and it is input together with the first picture into the picture recognition model. As described for step 208, the similarity of the second picture and of each of its rotated and mirrored pictures to the first picture is quantified from the hash value strings, and the most similar one is taken as the main direction of the second picture, i.e., its direction is considered the same as the main direction of the first picture. For example, if the horizontal mirror image of the second picture is found to be most similar to the first picture, the horizontal mirror image replaces the second picture and is input together with the first picture into the neural network, e.g., the picture recognition model 102 in fig. 1, for subsequent operations such as feature value calculation. Because the main direction is determined before the pictures enter the picture recognition model, the inability of the neural network to identify rotated and mirrored pictures effectively is overcome. With rotated and mirrored pictures identified effectively, both the precision and the recall of picture retrieval improve.
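The selection in step 210 reduces to picking the version with the best score. In this sketch, pick_main_direction is a hypothetical name, and the similarity measure is passed in as a parameter so that either the match ratio or a negated error-sum difference (so that higher still means more similar) could be plugged in; the text does not prescribe this interface.

```python
def pick_main_direction(reference_hash, version_hashes, similarity):
    """Pick the version of picture B whose HASH is most similar to picture A's.

    similarity(ref, cand) -> float, higher meaning more similar.
    Returns (name, score); (None, -inf) if there are no versions.
    """
    best_name, best_score = None, float("-inf")
    for name, h in version_hashes.items():
        score = similarity(reference_hash, h)
        if score > best_score:  # strict '>' keeps the first version on ties
            best_name, best_score = name, score
    return best_name, best_score
```

For instance, with a simple match-ratio function, the reference "0011" and the candidates BHASH = "1100" and BHASH_180 = "0011" select BHASH_180 with a score of 1.0.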

Optionally, the method may further include a dissimilarity determination step 212. More specifically, if the second picture and all of its rotated and mirrored pictures are found to have very low similarity to the first picture, the second picture may be directly determined to be dissimilar to the first picture, and step 210 (determining the main direction and inputting it into the picture recognition model) is skipped. For example, when error sums are used, a threshold may be set for the difference between the error sums of the second picture and the first picture, and step 210 is entered only if, for at least one of the second picture and its rotated and mirrored pictures, the difference between its error sum and that of the first picture does not exceed the threshold. When the match ratio of corresponding positions is used, step 210 may be entered only when at least one comparison result is 75% or more. Otherwise, the second picture is directly judged dissimilar to the first picture. This saves a large amount of computation: obviously dissimilar pictures are screened out before entering the recognition model, which can greatly improve the overall efficiency of picture recognition, e.g., by more than 90%.
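Step 212 can be sketched as a threshold check on the best match ratio, using the 75% figure from the example. This is a sketch under assumptions: main_direction_or_dissimilar is a hypothetical name, only equal-length HASH strings are compared, and a return value of None stands for "directly judged dissimilar", in which case step 210 and the recognition model are skipped.

```python
def main_direction_or_dissimilar(reference_hash, version_hashes, min_ratio=0.75):
    """Return the best-matching version name, or None if picture B is judged dissimilar."""
    best_name, best_ratio = None, -1.0
    for name, h in version_hashes.items():
        if not h or len(h) != len(reference_hash):
            continue  # sketch assumption: only equal-length strings are compared
        ratio = sum(a == b for a, b in zip(reference_hash, h)) / len(h)
        if ratio > best_ratio:
            best_name, best_ratio = name, ratio
    if best_ratio < min_ratio:
        return None  # step 212: skip step 210 and the recognition model
    return best_name
```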

It should be noted that the similarity determined in step 208 does not fully represent the actual degree of similarity between the two pictures; it is only a rough estimate. Nevertheless, step 208 can at least determine which orientation of the picture is most likely to be similar to the reference picture (i.e., the main direction of the picture). It also addresses the recognition of small-angle rotations, because despite a small rotation, the hash value string at the closest orientation is still nearer to the hash value string of the reference picture. Furthermore, clearly dissimilar pictures can be excluded quickly and easily by comparing hash value strings.

Fig. 5 is an exemplary architecture diagram of a system for identifying similar pictures according to one embodiment of the invention. As shown in fig. 5, system 500 includes a picture main direction determination module 502 and a picture recognition model 504. Picture A, the reference picture, and picture B, the picture to be identified, are first provided as inputs to the picture main direction determination module 502. The module 502 may determine the main direction of picture B using, for example, the method described above in connection with fig. 2, and directly determines that picture B is dissimilar to picture A when none of its orientations reaches the similarity threshold. Otherwise, the picture main direction determination module 502 provides picture B, in its main direction, together with picture A to the picture recognition model 504, which further determines whether the two are similar and outputs the recognition result accordingly.
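The division of work in system 500 can be sketched as a small wrapper. This is illustrative only: MainDirectionModule is a hypothetical stand-in for module 502 that reuses the example HASH rule, six orientations and a 75% match-ratio threshold, and recognition_model is any callable standing in for the picture recognition model 504 (for instance a CNN-based comparator). None of these names come from the patent.

```python
from typing import Callable, Dict, List, Optional

Image = List[List[int]]  # grayscale pixel rows


def hash_string(img: Image, threshold: int = 4) -> str:
    """Adjacent-pixel HASH using the example three-interval rule."""
    out = []
    for row in img:
        for left, right in zip(row, row[1:]):
            out.append("2" if left >= right + threshold else "1" if left >= right - threshold else "0")
    return "".join(out)


def orientations(img: Image) -> Dict[str, Image]:
    """Picture B plus its clockwise 90/180/270-degree rotations and two mirrors."""
    return {
        "0": img,
        "90": [list(c) for c in zip(*img[::-1])],
        "180": [row[::-1] for row in img[::-1]],
        "270": [list(c) for c in zip(*img)][::-1],
        "IMA_0": [row[::-1] for row in img],
        "IMA_180": [list(row) for row in img[::-1]],
    }


class MainDirectionModule:
    """Hypothetical stand-in for module 502; recognition_model stands in for model 504."""

    def __init__(self, recognition_model: Callable[[Image, Image], bool], min_ratio: float = 0.75):
        self.recognition_model = recognition_model
        self.min_ratio = min_ratio

    def __call__(self, picture_a: Image, picture_b: Image) -> bool:
        ref = hash_string(picture_a)
        best: Optional[Image] = None
        best_ratio = -1.0
        for img in orientations(picture_b).values():
            h = hash_string(img)
            if not ref or len(h) != len(ref):
                continue  # this sketch only compares equal-length HASH strings
            ratio = sum(x == y for x, y in zip(ref, h)) / len(ref)
            if ratio > best_ratio:
                best, best_ratio = img, ratio
        if best is None or best_ratio < self.min_ratio:
            return False  # judged dissimilar; the recognition model is never called
        return self.recognition_model(picture_a, best)
```

Passing the recognition model in as a callable keeps the main-direction step independent of the network, in line with the statement that the algorithm can be placed in front of any CNN model.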

The method for improving the picture retrieval recall rate of CNN networks has been described above. It addresses the low recall of neural networks in picture similarity recognition, and in particular their very low recall for rotated and mirrored pictures. Based on the principle and characteristics of picture rotation, the method computes HASH values of the picture at different orientations, compares each with the HASH of the reference picture to determine the main direction of the picture, and then inputs the correspondingly rotated picture into the neural network to obtain the similarity result. This overcomes the low recognition rate, low recognition accuracy and low recall that neural networks encounter when searching for similar pictures, completes the function of a similar picture recognition system and ensures the recognition of similar pictures. With the method of the invention, the operation efficiency of a neural-network-based image similarity recognition algorithm can be improved by more than 90%, and picture recall can be improved by more than 20%.

What has been described above includes examples of aspects of the claimed subject matter. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the claimed subject matter, but one of ordinary skill in the art may recognize that many further combinations and permutations of the claimed subject matter are possible. Accordingly, the disclosed subject matter is intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of the appended claims.

Claims

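1. A method for enhancing picture retrieval recall, the method comprising:

converting a first picture and a second picture into pixel values, respectively, wherein the first picture is used as a reference picture;

calculating a hash value string of the first picture based on differences in pixel values of adjacent pixels;

calculating hash value strings of the second picture and of a plurality of rotated and mirrored pictures of the second picture, respectively, based on differences in pixel values of adjacent pixels, wherein at least one of the hash value strings of the plurality of rotated and mirrored pictures of the second picture is obtained by correspondingly rotating and/or mirroring the hash value string of the second picture;

determining, based on the hash value strings, the similarity of the second picture and of each of the plurality of rotated and mirrored pictures to the first picture; and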

taking whichever of the second picture and the plurality of rotated and mirrored pictures has the highest similarity to the first picture as a main direction of the second picture, and inputting that picture together with the first picture into a picture recognition model, and, if none of the second picture and the plurality of rotated and mirrored pictures has a similarity to the first picture higher than a preset threshold, skipping the steps of taking the main direction and inputting it into the picture recognition model, and directly determining that the second picture is dissimilar to the first picture,

wherein calculating the hash value string based on the differences in pixel values of the neighboring pixels further comprises:

comparing each pixel value of the picture with its adjacent pixel value, pixel by pixel, and obtaining a plurality of hash values based on the degree of difference between the pixel values; and

Concatenating the plurality of hash values into a hash value string.

2. The method of claim 1, wherein converting the first picture and the second picture, respectively, to pixel values further comprises:

converting the first picture and the second picture to grayscale to obtain pixel values.

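3. The method of claim 1, wherein determining the similarity of the second picture and of each of the plurality of rotated and mirrored pictures to the first picture based on the hash value strings further comprises:

calculating an error sum for each picture, wherein the error sum is the sum of absolute differences between adjacent hash values in the hash value string; and

calculating the difference between the error sum of the first picture and the error sum of each of the second picture and the plurality of rotated and mirrored pictures, wherein a smaller difference indicates a higher similarity between that picture and the first picture.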

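4. The method of claim 1, wherein determining the similarity of the second picture and of each of the plurality of rotated and mirrored pictures to the first picture based on the hash value strings further comprises:

calculating, for each of the second picture and the plurality of rotated and mirrored pictures, the ratio of the number of positions in its hash value string that are equal to the corresponding positions in the hash value string of the first picture to the total length of the hash value string of the first picture, wherein a greater ratio indicates a higher similarity between that picture and the first picture.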

5. A system for identifying similar pictures, the system comprising:

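A picture main direction determination module configured to:

convert a first picture and a second picture into pixel values, respectively, wherein the first picture is taken as a reference picture;

calculate a hash value string of the first picture and hash value strings of the second picture and of a plurality of rotated and mirrored pictures of the second picture, respectively, based on differences in pixel values of adjacent pixels;

determine, based on the hash value strings, the similarity of the second picture and of each of the plurality of rotated and mirrored pictures to the first picture; and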

take whichever of the second picture and the plurality of rotated and mirrored pictures has the highest similarity to the first picture as a main direction of the second picture, provide that picture together with the first picture to a picture recognition model, and, if none of the second picture and the plurality of rotated and mirrored pictures has a similarity to the first picture higher than a preset threshold, skip taking the main direction and inputting it into the picture recognition model and directly judge that the second picture is dissimilar to the first picture; and

A picture recognition model configured to extract feature values of a set of pictures provided by the picture main direction determination module and to recognize whether the set of pictures are similar,

wherein calculating the hash value string based on the differences in pixel values of the adjacent pixels further comprises: comparing each pixel value of the picture with its adjacent pixel value, pixel by pixel, and obtaining a plurality of hash values based on the degree of difference between the pixel values; and concatenating the plurality of hash values into a hash value string.

6. The system of claim 5, wherein converting the first picture and the second picture to pixel values, respectively, further comprises: converting the first picture and the second picture to grayscale to obtain pixel values.