CN117648452A

CN117648452A - Picture retrieval method, device, equipment and storage medium

Info

Publication number: CN117648452A
Application number: CN202311467967.6A
Authority: CN
Inventors: 孙责荃; 刘明; 万雷鸣; 梁振宝; 陈勇
Original assignee: Zhejiang Geely Holding Group Co Ltd; Geely Automobile Research Institute Ningbo Co Ltd
Current assignee: Zhejiang Geely Holding Group Co Ltd; Geely Automobile Research Institute Ningbo Co Ltd
Priority date: 2023-11-06
Filing date: 2023-11-06
Publication date: 2024-03-05

Abstract

The invention relates to the technical field of image processing and discloses a picture retrieval method, apparatus, device, and storage medium. The method comprises: performing confidence value screening on a hard-to-recognize picture to obtain hard positive and negative samples corresponding to that picture; performing feature extraction on the hard positive and negative samples to obtain few-shot new features; and retrieving ingested pictures according to the few-shot new features to obtain a retrieved picture set corresponding to the ingested pictures. Because few-shot new features are extracted from the hard positive and negative samples and then used as the retrieval input, the invention avoids the poor retrieval performance that a cloud-based large model shows on some pictures when its training data contain too few samples of them. Feature-based retrieval identifies new categories more accurately and thereby improves picture retrieval accuracy.

Description

Picture retrieval method, device, equipment and storage medium

Technical Field

The present invention relates to the field of image processing technologies, and in particular to a picture retrieval method, apparatus, device, and storage medium.

Background

With the wide application of artificial intelligence in many fields, data are presented in increasingly diverse forms. Multimodal data such as text, images, and video are growing rapidly. Single-modality data carry limited information, whereas interacting multimodal data convey richer information, and the same object can be described by data of several different modalities. Multimodal retrieval, for example, aims to enable information interaction between two modalities; its fundamental purpose is to mine the relationships between samples of different modalities, that is, to retrieve samples of one modality using samples of another modality with similar semantics.

In the automotive field, upgrading the recognition capability of an intelligent-driving perception model requires mining data that the model has not yet seen. The prior art relies on a cloud-based large model for this mining, commonly through label mining and search by picture. However, the training data of the cloud-based large model may lack sufficient samples of some categories, so the model retrieves some pictures poorly; the desired pictures are then hard to find through multimodal retrieval, and accuracy is low.

The foregoing is provided merely for the purpose of facilitating understanding of the technical solutions of the present invention and is not intended to represent an admission that the foregoing is prior art.

Disclosure of Invention

The main object of the invention is to provide a picture retrieval method, apparatus, device, and storage medium that solve the technical problem that existing multimodal retrieval has difficulty finding desired pictures and its accuracy is too low.

To achieve the above object, the present invention provides a picture retrieval method comprising the following steps:

performing confidence value screening on a hard-to-recognize picture to obtain hard positive and negative samples corresponding to the hard-to-recognize picture;

performing feature extraction on the hard positive and negative samples to obtain few-shot new features of the hard positive and negative samples; and

retrieving ingested pictures according to the few-shot new features to obtain a retrieved picture set corresponding to the ingested pictures.

Optionally, performing feature extraction on the hard positive and negative samples to obtain the few-shot new features includes:

performing feature extraction on the hard positive and negative samples to obtain a plurality of sample features corresponding to them;

obtaining feature distances between the plurality of sample features; and

performing similarity processing on the plurality of sample features based on the feature distances to obtain the few-shot new features corresponding to the hard-to-recognize picture.

Optionally, performing similarity processing on the plurality of sample features based on the feature distances to obtain the few-shot new features corresponding to the hard-to-recognize picture includes:

performing similarity detection on the plurality of sample features based on the feature distances to obtain a plurality of sample similarities corresponding to the sample features;

determining whether each sample similarity reaches a preset similarity value; and

taking the sample features whose similarity reaches the preset similarity value as the few-shot new features.

Optionally, taking the sample features whose similarity reaches the preset similarity value as the few-shot new features includes:

using the hard negative samples to push away the sample features that do not reach the preset similarity value, and using the hard positive samples to pull together the sample features that reach it, thereby obtaining a similar-feature cluster, where the hard positive and negative samples consist of hard positive samples and hard negative samples; and

labeling the similar-feature cluster to obtain the few-shot new features corresponding to the hard-to-recognize picture.

Optionally, performing confidence value screening on the hard-to-recognize picture to obtain the hard positive and negative samples corresponding to it includes:

performing region frame selection on the hard-to-recognize picture to obtain the hard-to-recognize region of the picture;

performing a preliminary retrieval based on the hard-to-recognize region to obtain retrieval samples; and

screening the retrieval samples according to their confidence values to obtain the hard positive and negative samples corresponding to the hard-to-recognize region.
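The three sub-steps above (region frame selection, preliminary retrieval, confidence screening) can be sketched as follows. This is a minimal illustration, not the patented implementation: the toy image is a list of pixel rows, the `(x0, y0, x1, y1)` box format and the names `crop_region` and `pre_retrieve` are hypothetical, and the region feature is assumed to come from some external embedding model that is not shown. Cosine similarity is assumed as the retrieval score.

```python
import math
from typing import Dict, List, Sequence, Tuple

Image = List[List[int]]          # toy grayscale image: a list of pixel rows
Box = Tuple[int, int, int, int]  # hypothetical (x0, y0, x1, y1) box, x1/y1 exclusive


def crop_region(image: Image, box: Box) -> Image:
    """Frame-select the hard-to-recognize region from the picture."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pre_retrieve(
    region_feature: Sequence[float],
    gallery: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Preliminary retrieval: top-k gallery pictures with their scores.

    The region feature is assumed to come from an external embedding model.
    """
    scored = [(pid, cosine(region_feature, feat)) for pid, feat in gallery.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    picture = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
    print(crop_region(picture, (1, 0, 3, 2)))  # [[1, 2], [4, 5]]
    gallery = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(pre_retrieve([1.0, 0.0], gallery, k=2))
```

The scores returned by `pre_retrieve` play the role of the confidence values that the screening sub-step then operates on.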

Optionally, screening the retrieval samples according to their confidence values to obtain the hard positive and negative samples corresponding to the hard-to-recognize region includes:

obtaining an audit result fed back on the retrieval samples;

dividing the retrieval samples into hard negative samples and hard positive samples according to their confidence values and the audit result; and

obtaining the hard positive and negative samples corresponding to the hard-to-recognize region from the hard negative samples and the hard positive samples.

Optionally, the few-shot new features are connected to a preset multimodal retrieval large model, and after feature extraction is performed on the hard positive and negative samples to obtain the few-shot new features, the method further includes:

extracting features of an ingested picture through the preset multimodal retrieval large model to obtain current features of the ingested picture;

determining whether the current features are similar to the few-shot new features; and

when the current features are similar to the few-shot new features, classifying the current features as few-shot new features.

In addition, to achieve the above object, the present invention further provides a picture retrieval apparatus, comprising:

a confidence screening module, configured to perform confidence value screening on a hard-to-recognize picture to obtain hard positive and negative samples corresponding to the picture;

a feature extraction module, configured to perform feature extraction on the hard positive and negative samples to obtain few-shot new features of the samples; and

a picture retrieval module, configured to retrieve ingested pictures according to the few-shot new features to obtain a retrieved picture set corresponding to the ingested pictures.

In addition, to achieve the above object, the present invention further provides a picture retrieval device, comprising a memory, a processor, and a picture retrieval program stored in the memory and executable on the processor, the program being configured to implement the steps of the picture retrieval method described above.

In addition, to achieve the above object, the present invention further provides a storage medium storing a picture retrieval program which, when executed by a processor, implements the steps of the picture retrieval method described above.

According to the invention, confidence value screening is first performed on a hard-to-recognize picture to obtain the corresponding hard positive and negative samples; feature extraction is then performed on these samples to obtain few-shot new features; finally, ingested pictures are retrieved according to the few-shot new features to obtain a retrieved picture set corresponding to the ingested pictures. Using the few-shot new features extracted from the hard samples as the retrieval input avoids the poor retrieval performance that a cloud-based large model shows on some pictures when its training data contain too few samples; feature-based retrieval identifies new categories more accurately and improves picture retrieval accuracy.

Drawings

Fig. 1 is a schematic structural diagram of a picture retrieval device in the hardware operating environment according to an embodiment of the present invention;

Fig. 2 is a flowchart of a first embodiment of the picture retrieval method according to the present invention;

Fig. 3 is a flowchart of a second embodiment of the picture retrieval method according to the present invention;

Fig. 4 is a schematic diagram of unsupervised feature learning on hard positive and negative samples in the second embodiment of the picture retrieval method according to the present invention;

Fig. 5 is a flowchart of a third embodiment of the picture retrieval method according to the present invention;

Fig. 6 is a schematic diagram of the process of learning new features from hard positive and negative samples in the third embodiment of the picture retrieval method according to the present invention;

Fig. 7 is a block diagram of a first embodiment of the picture retrieval apparatus according to the present invention.

The realization of the objects, functional features, and advantages of the present invention will be further described with reference to the embodiments and the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Referring to Fig. 1, Fig. 1 is a schematic structural diagram of a picture retrieval device in the hardware operating environment according to an embodiment of the present invention.

As shown in Fig. 1, the picture retrieval device may include a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 enables communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard; optionally, it may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed random access memory (RAM) or a stable non-volatile memory (NVM), such as disk storage. The memory 1005 may optionally also be a storage device separate from the processor 1001.

Those skilled in the art will appreciate that the structure shown in Fig. 1 does not limit the picture retrieval device, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.

As shown in Fig. 1, the memory 1005, as a type of storage medium, may include an operating system, a network communication module, a user interface module, and a picture retrieval program.

In the picture retrieval device shown in Fig. 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with a user. The processor 1001 and the memory 1005 of the present invention may be provided in the picture retrieval device, which invokes, through the processor 1001, the picture retrieval program stored in the memory 1005 and executes the picture retrieval method provided by the embodiments of the present invention.

An embodiment of the present invention provides a picture retrieval method. Referring to Fig. 2, Fig. 2 is a flowchart of a first embodiment of the picture retrieval method of the present invention.

Generally, to upgrade the recognition capability of an intelligent-driving perception model, data the model has not seen must be mined, that is, corner cases (edge scenarios) must be extracted from massive unlabeled data. The prior art relies on a cloud-based large model for this mining; common approaches include label mining, search by picture, semantic mining, and custom region selection on pictures. In short, upgrading the vehicle-side perception model requires retrieving specific pictures for the upgrade.

Because its training data contain too few samples of some categories, the cloud-based large model mines some data poorly. In the traditional solution, sample data are first screened manually without a clear target, which is very time-consuming and cannot guarantee that the selected samples are hard samples (data with large loss values); the data must then be labeled manually, which adds further engineering time. Moreover, the model must be retrained from scratch, consuming substantial training resources and time and reducing computational efficiency. The embodiments of the invention therefore provide a picture retrieval method that broadens the categories the model can mine, solves the problem of selecting hard samples, and improves picture retrieval efficiency.

In this embodiment, the picture retrieval method includes the following steps:

Step S10: performing confidence value screening on the hard-to-recognize picture to obtain hard positive and negative samples corresponding to the hard-to-recognize picture.

It should be noted that the method of this embodiment may be executed by a computing device with sample screening, feature extraction, and picture retrieval functions, such as a tablet computer or a personal computer, or by another electronic device capable of the same or similar functions, such as the picture retrieval device described above; this embodiment imposes no limitation. This and the following embodiments are described using the picture retrieval device as the executing device.

It can be understood that a hard-to-recognize picture is a picture that the model fails to find, or recognizes poorly, in multimodal retrieval. For example, pictures containing ground cracks are hard to find; or, when searching for pictures of traffic cones (triangular cones), the results contain many other triangular structures. Three causes produce such poorly recognized categories during large-model data mining: the category has few training samples; the features in the training samples are not representative, so the feature differences between categories are small; or fine-tuning is lacking.

It should be understood that confidence value screening is the process of filtering results according to the model's confidence when a machine learning model is applied or makes predictions. During screening, each sample feature of a hard-to-recognize picture may carry a confidence or probability value that indicates how certain the model is about it: the higher the value, the more certain the model is of the result; the lower the value, the less certain it is. With a set threshold or requirement, screening can keep only predictions whose confidence exceeds the threshold and filter out less confident ones, which improves the accuracy and reliability of classifying hard positive and negative samples and reduces the risk of misjudgment.
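Threshold-based confidence screening, as described above, reduces to a simple partition of scored results. In this hypothetical sketch each result is an `(id, confidence)` pair and `screen_by_confidence` is an illustrative name; which side of the threshold is kept depends on the use (the hard-sample selection in this embodiment draws on the low-score side).

```python
from typing import List, Tuple

Result = Tuple[str, float]  # (picture id, confidence)


def screen_by_confidence(
    results: List[Result], threshold: float
) -> Tuple[List[Result], List[Result]]:
    """Partition results into (at or above threshold, below threshold)."""
    confident = [r for r in results if r[1] >= threshold]
    uncertain = [r for r in results if r[1] < threshold]
    return confident, uncertain


if __name__ == "__main__":
    results = [("a", 0.9), ("b", 0.4), ("c", 0.75)]
    kept, filtered = screen_by_confidence(results, threshold=0.7)
    print(kept)      # [('a', 0.9), ('c', 0.75)]
    print(filtered)  # [('b', 0.4)]
```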

It can be understood that the hard positive and negative samples are positive and negative samples of the hard-to-recognize picture obtained through confidence value screening. For example, the retrieval results of a hard-to-recognize picture can be screened by score (i.e., confidence): results with a low score that the manual audit marks as wrong are selected as hard negative samples, and results with a low score that the manual audit marks as correct are selected as hard positive samples. A low confidence means the model is unsure how to classify the feature (the probability it assigns to a given category is low), so the feature lies near the model's decision boundary, and the hard positive samples help the model judge such cases more accurately.
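The selection rule in the example above can be written as a short function. This is a sketch under stated assumptions: the `RetrievalResult` record, the `audit_correct` flag standing in for the manual audit verdict, and the `low_score_threshold` parameter are hypothetical names, and the rule follows the text literally (low score plus audit verdict).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RetrievalResult:
    picture_id: str
    score: float         # confidence returned by the retrieval model
    audit_correct: bool  # manual audit verdict: True if the hit is what was wanted


def split_hard_samples(
    results: List[RetrievalResult], low_score_threshold: float
) -> Tuple[List[RetrievalResult], List[RetrievalResult]]:
    """Return (hard_positives, hard_negatives) among low-score results."""
    hard_pos: List[RetrievalResult] = []
    hard_neg: List[RetrievalResult] = []
    for r in results:
        if r.score >= low_score_threshold:
            continue  # confident results are not treated as hard samples here
        if r.audit_correct:
            hard_pos.append(r)  # low score but correct: hard positive
        else:
            hard_neg.append(r)  # low score and wrong: hard negative
    return hard_pos, hard_neg


if __name__ == "__main__":
    results = [
        RetrievalResult("p1", 0.35, True),
        RetrievalResult("p2", 0.30, False),
        RetrievalResult("p3", 0.92, True),
        RetrievalResult("p4", 0.40, False),
    ]
    pos, neg = split_hard_samples(results, low_score_threshold=0.5)
    print([r.picture_id for r in pos])  # ['p1']
    print([r.picture_id for r in neg])  # ['p2', 'p4']
```

A common variant in hard-sample mining also treats confidently wrong (high-score) results as hard negatives; the sketch deliberately does not, to stay with the rule stated here.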

In a specific implementation, the picture retrieval device may first perform multimodal retrieval through the model and determine which pictures it cannot find or recognizes poorly. It then screens the retrieval results of each hard-to-recognize picture by score (i.e., confidence) to obtain the corresponding hard positive and negative samples; these hard samples reduce the risk of retrieval misjudgment and make the model's judgments more accurate.

Step S20: performing feature extraction on the hard positive and negative samples to obtain few-shot new features of the hard positive and negative samples.

It should be noted that the few-shot new features are representative and discriminative features extracted from the hard positive and negative samples. Selecting appropriate features and extracting them effectively reduces data dimensionality, eliminates redundant information, and highlights key information, thereby improving the model's performance and generalization ability.

In a specific implementation, the picture retrieval device may apply unsupervised or semi-supervised learning to the hard positive and negative samples and extract features from the sample pictures to obtain the few-shot new features. These features reduce data dimensionality, eliminate redundant information, and highlight key information, enabling the model to identify and distinguish different objects and thus improving its performance and generalization ability.
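The text does not name the feature extractor. As one hedged illustration of the "reduce dimensionality, eliminate redundancy" idea only, the sketch below swaps in principal component analysis (PCA), computed with power iteration and deflation in pure Python. PCA is an assumption made for illustration, not the unsupervised or semi-supervised learning the embodiment refers to, and `pca_reduce` and the helper names are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def center(samples: Matrix) -> Matrix:
    """Subtract the per-dimension mean from every sample."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    return [[s[j] - mean[j] for j in range(d)] for s in samples]


def covariance(centered: Matrix) -> Matrix:
    """Sample covariance matrix (d x d) of mean-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenvector(cov: Matrix, iters: int = 200, seed: int = 0):
    """Dominant eigenvector and eigenvalue of a symmetric PSD matrix (power iteration)."""
    d = len(cov)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]  # random start avoids an orthogonal start
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance left in the remaining directions
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, eigval


def pca_reduce(samples: Matrix, k: int) -> Matrix:
    """Project samples onto their top-k principal components.

    Assumes k does not exceed the number of directions with nonzero variance.
    """
    centered = center(samples)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(k):
        v, lam = top_eigenvector(cov)
        components.append(v)
        # Deflation: remove the found component so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
            for c in centered]


if __name__ == "__main__":
    # Toy 3-D features that actually vary along a single direction (1, 2, 0).
    feats = [[t, 2.0 * t, 5.0] for t in range(5)]
    print(pca_reduce(feats, k=1))  # one coordinate per sample; sign is arbitrary
```

In the toy data the third dimension is constant and the first two are redundant, so a single coordinate keeps all of the variance.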

Step S30: retrieving ingested pictures according to the few-shot new features to obtain a retrieved picture set corresponding to the ingested pictures.

In a specific implementation, the picture retrieval device finally uses the few-shot new features as the retrieval input. Because this is matching at the feature level, it yields a more accurate retrieved picture set than retrieval by text features or picture features.

Further, in this embodiment, step S30 includes: performing feature matching between the picture features of the ingested pictures and the few-shot new features to obtain a matching result; determining whether the matching result reaches a preset matching value; and, when it does, retrieving the ingested pictures according to the few-shot new features to obtain the retrieved picture set corresponding to the ingested pictures.
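The matching sub-steps of step S30 can be sketched as below. The sketch assumes cosine similarity as the matching measure, takes an ingested picture's matching result to be its best similarity to any few-shot new feature, assumes the picture features are precomputed, and uses hypothetical names (`match_score`, `retrieve_ingested`, `match_threshold`).

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_score(picture_feature: Vector, few_shot_features: List[Vector]) -> float:
    """Matching result: best cosine similarity to any few-shot new feature."""
    return max(cosine(picture_feature, f) for f in few_shot_features)


def retrieve_ingested(
    ingested: Dict[str, Vector],
    few_shot_features: List[Vector],
    match_threshold: float,
) -> List[str]:
    """Ids of ingested pictures whose matching result reaches the preset matching value."""
    return [
        pid for pid, feat in ingested.items()
        if match_score(feat, few_shot_features) >= match_threshold
    ]


if __name__ == "__main__":
    few_shot = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
    ingested = {
        "img1": [1.0, 0.05, 0.0],
        "img2": [0.0, 1.0, 0.0],
        "img3": [0.7, 0.7, 0.0],
    }
    print(retrieve_ingested(ingested, few_shot, match_threshold=0.9))  # ['img1']
```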

It should be noted that an ingested (warehouse-in) picture is a picture entering the database that is to be retrieved. Picture features are numerical representations extracted from an ingested picture to describe its content; extracting and using them makes image data easier to understand, compare, and process.

It can be understood that the preset matching value is a threshold, set in advance, for judging the degree of similarity during feature matching. It balances accuracy and recall in the picture retrieval process.
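To make the trade-off concrete, the following sketch computes precision and recall of the retrieved set at two matching thresholds on toy, made-up scores. It interprets "accuracy" as retrieval precision, which is an assumption, and the function name and data are illustrative only.

```python
from typing import List, Tuple


def precision_recall(
    scores: List[float], relevant: List[bool], threshold: float
) -> Tuple[float, float]:
    """Precision and recall of the set retrieved at a given matching threshold."""
    retrieved = [rel for s, rel in zip(scores, relevant) if s >= threshold]
    true_hits = sum(retrieved)
    total_relevant = sum(relevant)
    # Convention: precision is 1.0 when nothing is retrieved.
    precision = true_hits / len(retrieved) if retrieved else 1.0
    recall = true_hits / total_relevant if total_relevant else 1.0
    return precision, recall


if __name__ == "__main__":
    # Toy matching scores and ground truth (made-up illustration data).
    scores = [0.95, 0.90, 0.80, 0.60, 0.40]
    relevant = [True, True, False, True, False]
    for t in (0.5, 0.85):
        print(t, precision_recall(scores, relevant, t))
```

On this toy data, raising the threshold from 0.5 to 0.85 raises precision from 0.75 to 1.0 while recall drops from 1.0 to about 0.67.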

In a specific implementation, the picture retrieval device may use the few-shot new features as the retrieval input and perform feature matching between the picture features of the ingested pictures and the few-shot new features to obtain a matching result. Because the matching is at the feature level, accuracy and recall are higher than with retrieval by text features or new picture features. When the matching result reaches the preset matching value, the ingested pictures are retrieved according to the few-shot new features to obtain the corresponding retrieved picture set. In experiments, feature-based retrieval improved recall by about 12% and accuracy by about 17%.

The picture retrieval device of this embodiment may first perform multimodal retrieval through the model and determine which pictures it cannot find or recognizes poorly. It then screens the retrieval results of each hard-to-recognize picture by score (i.e., confidence) to obtain the corresponding hard positive and negative samples, which reduce the risk of retrieval misjudgment and make the model's judgments more accurate. Next, it applies unsupervised or semi-supervised learning to the hard positive and negative samples and extracts features from the sample pictures to obtain few-shot new features; these reduce data dimensionality, eliminate redundant information, and highlight key information, enabling the model to identify and distinguish different objects and improving its performance and generalization ability. Finally, it uses the few-shot new features as the retrieval input; because this is matching at the feature level, it yields a more accurate retrieved picture set than retrieval by text features or picture features. By extracting few-shot new features from the hard positive and negative samples and using them as the retrieval input, the invention avoids the poor retrieval performance that a cloud-based large model shows on some pictures when its training data contain too few samples; feature-based retrieval identifies new categories more accurately and improves picture retrieval accuracy.

Referring to Fig. 3, Fig. 3 is a flowchart of a second embodiment of the picture retrieval method of the present invention.

Based on the first embodiment, in this embodiment step S20 includes:

Step S21: performing feature extraction on the hard positive and negative samples to obtain a plurality of sample features corresponding to them.

The sample features are numerical representations extracted from the hard positive and negative samples to describe them; extracting and using them makes image data easier to understand, compare, and process.

Step S22: obtaining feature distances between the plurality of sample features.

Step S23: performing similarity processing on the plurality of sample features based on the feature distances to obtain the few-shot new features corresponding to the hard-to-recognize picture.

It should be noted that a feature distance is a metric of the difference or similarity between two sample features; common metrics include Euclidean distance, Manhattan distance, and cosine distance.
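The three metrics named above can be written directly. This plain-Python sketch uses illustrative function names; note that cosine distance (one minus cosine similarity) ignores vector length, whereas Euclidean and Manhattan distances do not.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 for same direction, 1 for orthogonal, 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norms == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norms


if __name__ == "__main__":
    a, b = [1.0, 0.0], [0.0, 1.0]
    print(euclidean(a, b), manhattan(a, b), cosine_distance(a, b))
    # Parallel vectors of different length: apart in Euclidean terms,
    # but their cosine distance is (numerically) zero.
    print(euclidean([1.0, 2.0], [2.0, 4.0]), cosine_distance([1.0, 2.0], [2.0, 4.0]))
```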

It can be understood that in deep learning in the image domain, an image can be represented as a high-dimensional feature vector. Computing the distances between such vectors determines the similarity between images or between objects in images: if the feature vectors of two images (or objects) are close, they probably belong to the same category or share similar characteristics; if they are far apart, they probably belong to different categories or have different characteristics.

In a specific implementation, the picture retrieval device may extract features from the hard positive and negative samples to obtain a plurality of sample features that numerically describe them. These sample features are represented as high-dimensional feature vectors, and the distances between the vectors measure the difference or similarity between each pair of sample features. The sample features are then grouped by similarity based on these distances: closely spaced vectors probably belong to the same category, and widely spaced vectors to different categories. This yields the few-shot new features corresponding to the hard-to-recognize picture, so that the multimodal retrieval model can identify the new category more accurately while keeping its original retrieval mode.

Further, in this embodiment, step S23 includes: performing similarity detection on the plurality of sample features based on the feature distances to obtain a plurality of sample similarities; determining whether each sample similarity reaches a preset similarity value; and taking the sample features whose similarity reaches the preset similarity value as the few-shot new features.

It should be noted that sample similarity is the degree of similarity between sample features and can be obtained in different ways, such as with a similarity matrix or a clustering algorithm. Common clustering algorithms such as K-means clustering and hierarchical clustering group the sample features and thereby reflect the similarity relationships between samples.
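As an illustration of the clustering route, a bare-bones K-means (Lloyd's algorithm) in pure Python is shown below. It is a sketch, not a production implementation: it initializes centroids with the first `k` points (k-means++ would be more robust), and the `kmeans` name and toy 2-D points are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iters: int = 50) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm; returns (cluster label per point, centroids)."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each feature joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    feats = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [0.1, 0.3]]
    labels, centroids = kmeans(feats, k=2)
    print(labels)  # [0, 0, 1, 1, 0]
```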

It can be understood that the preset similarity value is a threshold or threshold range, set in advance, for deciding during similarity comparison whether two sample features are similar: if the similarity between two samples reaches or exceeds the preset value, they are regarded as similar; otherwise, they are regarded as dissimilar.

In a specific implementation, the picture retrieval device may perform similarity detection on the sample features based on the feature distances, for example with a similarity matrix or a clustering algorithm, to obtain a plurality of sample similarities. It then determines whether each similarity reaches the preset similarity value, regarding two samples as similar if it does and dissimilar otherwise, and takes the sample features whose similarity reaches the preset value as the few-shot new features. This yields accurate few-shot new features and improves picture retrieval accuracy.
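One way to realize similarity detection against a preset similarity value is a pairwise cosine similarity matrix thresholded at that value. The text does not say which features are compared with which, so the sketch makes an assumption: a sample feature is kept as a candidate few-shot new feature if it is similar to at least one other sample feature. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(features: List[Sequence[float]]) -> List[List[float]]:
    """Pairwise cosine similarities between all sample features."""
    return [[cosine(a, b) for b in features] for a in features]


def similar_pairs(features: List[Sequence[float]], threshold: float) -> List[Tuple[int, int]]:
    """Index pairs (i < j) whose similarity reaches the preset similarity value."""
    sim = similarity_matrix(features)
    n = len(features)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if sim[i][j] >= threshold]


def candidate_new_features(features: List[Sequence[float]], threshold: float) -> List[int]:
    """Assumption: keep a feature if it is similar to at least one other feature."""
    return sorted({i for pair in similar_pairs(features, threshold) for i in pair})


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.98, 0.2], [0.0, 1.0], [-1.0, 0.0]]
    print(similar_pairs(feats, threshold=0.9))           # [(0, 1)]
    print(candidate_new_features(feats, threshold=0.9))  # [0, 1]
```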

Further, in this embodiment, taking the sample features whose similarity reaches the preset similarity value as the few-shot new features includes: using the hard negative samples to push away the sample features that do not reach the preset similarity value, and using the hard positive samples to pull together the sample features that reach it, thereby obtaining a similar-feature cluster, where the hard positive and negative samples consist of hard positive samples and hard negative samples; and labeling the similar-feature cluster to obtain the few-shot new features corresponding to the hard-to-recognize picture.

It should be noted that a similar feature cluster is a group formed by aggregating, or classifying into the same class, those sample features among the plurality of sample features that are similar to one another. Gathering similar features into the same cluster facilitates picture classification and retrieval and improves retrieval efficiency and accuracy.
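One simple way to form similar feature clusters and then mark them is sketched below, assuming cosine similarity, single-link grouping via union-find (two features join the same cluster whenever their similarity reaches the preset value), and hypothetical labels of the form few_shot_k. The description itself does not prescribe a clustering algorithm.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_feature_clusters(features: List[Vector], preset_similarity: float) -> List[List[int]]:
    """Group feature indices so that any two features whose similarity reaches the
    preset value end up in the same cluster (single-link, via union-find)."""
    parent = list(range(len(features)))

    def find(i: int) -> int:
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if cosine_similarity(features[i], features[j]) >= preset_similarity:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(features)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def label_clusters(clusters: List[List[int]]) -> Dict[str, List[int]]:
    """Mark each cluster with a hypothetical few-sample new-feature label."""
    return {f"few_shot_{k}": members for k, members in enumerate(clusters)}


features = [[1.0, 0.0], [0.95, 0.05], [0.0, 1.0], [0.05, 0.95]]
print(label_clusters(similar_feature_clusters(features, 0.9)))
# {'few_shot_0': [0, 1], 'few_shot_1': [2, 3]}
```

Single-link grouping is transitive, so a chain of pairwise-similar features can merge into one cluster; a centroid-based method such as k-means would be an alternative design choice.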

In a specific implementation, the picture retrieval device may use the difficult-to-separate negative samples to push away the sample features that do not reach the preset similarity value, and use the difficult-to-separate positive samples to pull together the sample features that reach it, so as to obtain a similar feature cluster. Finally, the similar feature cluster is labeled to obtain the few-sample new features corresponding to the difficult-to-identify picture. In this way, by comparing the distances between instances, the model can better judge whether pictures belong to the same category, so that the picture retrieval device can identify and distinguish different objects, and picture retrieval efficiency is improved.

For ease of understanding, refer to fig. 4, which is a schematic diagram of unsupervised feature learning on the difficult-to-separate positive and negative samples in the second embodiment of the picture retrieval method according to the present invention; the scheme is not limited to this example. Actual features are typically high-dimensional (e.g., 128 or 256 dimensions); for ease of understanding, fig. 4 assumes 2-dimensional features, with each point representing the feature of one object. The points inside the circle are the few-sample new features to be extracted. The left diagram shows the features before positive and negative sample feature learning: there, the model cannot tell which points belong inside the circle, so the judgment of the multi-modal retrieval large model is inaccurate. The positive samples pull similar points closer together, i.e., the points inside the circle move toward the center point; the negative samples push irrelevant points toward the periphery, i.e., the points outside the circle move outward. After this step, the model accurately learns that the points inside the circle are the new features, which are labeled as few-sample new features.
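The effect illustrated in fig. 4 can be reproduced with a toy 2-dimensional example. In this sketch, which is only an illustration and not the training procedure of the invention, the points themselves are moved by gradient descent on the margin-based contrastive loss around an assumed known centre point, instead of training a feature extractor; the centre, margin, learning rate, circle radius and point coordinates are all hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def pull_push_step(center: Point, positives: List[Point], negatives: List[Point],
                   margin: float = 1.0, lr: float = 0.1) -> Tuple[List[Point], List[Point]]:
    """One gradient-descent step of the contrastive loss, applied directly to 2-D points."""
    cx, cy = center
    # Positives: the gradient of d**2 is 2 * (p - c), so each step moves p towards c.
    new_pos = [(x - lr * 2 * (x - cx), y - lr * 2 * (y - cy)) for x, y in positives]
    new_neg = []
    for x, y in negatives:
        d = math.hypot(x - cx, y - cy)
        if 0.0 < d < margin:
            # Negatives inside the margin: descending max(0, m - d)**2 moves q away from c.
            scale = lr * 2 * (margin - d) / d
            new_neg.append((x + scale * (x - cx), y + scale * (y - cy)))
        else:
            new_neg.append((x, y))  # outside the margin (or exactly at c): left unchanged
    return new_pos, new_neg


def inside_circle(points: List[Point], center: Point, radius: float) -> List[bool]:
    return [math.hypot(x - center[0], y - center[1]) <= radius for x, y in points]


center = (0.0, 0.0)
positives = [(0.6, 0.0), (0.0, -0.7)]   # should end up inside the circle
negatives = [(0.5, 0.0), (0.0, 0.4)]    # should end up outside it
print(inside_circle(positives, center, 0.5), inside_circle(negatives, center, 0.5))
# [False, False] [True, True]   before learning: the circle does not separate them
for _ in range(20):
    positives, negatives = pull_push_step(center, positives, negatives)
print(inside_circle(positives, center, 0.5), inside_circle(negatives, center, 0.5))
# [True, True] [False, False]   after learning: positives clustered, negatives pushed out
```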

With the picture retrieval device of this embodiment, feature extraction is performed on the difficult-to-separate positive and negative samples to obtain a plurality of sample features, i.e., numerical representations of those samples. The sample features are represented as high-dimensional feature vectors, and the distance between two feature vectors measures how similar or different the corresponding pictures, or objects in the pictures, are: feature vectors that lie close together are likely to belong to the same class or share characteristics, whereas feature vectors that lie far apart are likely to belong to different classes. Similarity processing of the sample features based on the feature distance therefore yields the few-sample new features corresponding to the difficult-to-identify picture, so that the multi-modal retrieval model can identify the new category more accurately while still using its original retrieval mode. The device may further perform similarity detection against the preset similarity value and take the sample features that reach it as the few-sample new features, and may use the difficult-to-separate negative and positive samples to push apart dissimilar features and pull together similar ones, labeling the resulting similar feature cluster. By comparing distances between instances, the model can better judge whether pictures belong to the same category, which improves both the accuracy and the efficiency of picture retrieval.

Referring to fig. 5, fig. 5 is a flowchart of a third embodiment of a picture retrieval method according to the present invention.

Based on the above embodiments, in this embodiment, the step S10 includes:

Step S11: performing region frame selection on the difficult-to-identify picture to obtain a difficult-to-identify region corresponding to the difficult-to-identify picture.

Step S12: performing a preliminary search according to the difficult-to-identify region to obtain retrieval samples.

It should be noted that a difficult-to-identify region is a region of the difficult-to-identify picture containing samples or data that the above model has difficulty recognizing. Such regions may arise for various reasons, such as data noise, uneven sample distribution, or feature redundancy.

It can be understood that the retrieval samples are the result data returned when the model performs retrieval based on the difficult-to-identify picture.
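Steps S11 and S12 can be sketched as cropping the framed region and running a top-k similarity search against stored features. Everything here is a hypothetical stand-in: the picture is a small grayscale array, toy_feature is a placeholder for the retrieval model's feature extractor, the gallery contents are invented, and cosine similarity is assumed as the score that later serves as the confidence value.

```python
import math
from typing import Dict, List, Sequence, Tuple

Image = List[List[float]]          # grayscale pixels in [0, 1]
Box = Tuple[int, int, int, int]    # (x0, y0, x1, y1), end-exclusive
Vector = Sequence[float]


def crop(image: Image, box: Box) -> Image:
    """Step S11 (sketch): cut the user-framed region out of the picture."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def toy_feature(region: Image, bins: int = 4) -> List[float]:
    """Hypothetical stand-in for the model's feature extractor:
    an L1-normalised intensity histogram."""
    counts = [0] * bins
    for row in region:
        for value in row:
            counts[min(int(value * bins), bins - 1)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def preliminary_search(image: Image, box: Box, gallery: Dict[str, Vector],
                       top_k: int = 2) -> List[Tuple[str, float]]:
    """Step S12 (sketch): return the top-k gallery items for the framed region.
    The score plays the role of the confidence value used in step S13."""
    query = toy_feature(crop(image, box))
    scored = [(name, cosine_similarity(query, feat)) for name, feat in gallery.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


image = [[1.0, 1.0, 0.0, 0.0],
         [1.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0]]
gallery = {"bright": [0.0, 0.0, 0.0, 1.0], "dark": [1.0, 0.0, 0.0, 0.0],
           "mixed": [0.5, 0.0, 0.0, 0.5]}
print(preliminary_search(image, (0, 0, 2, 2), gallery))
# bright (score 1.0) first, then mixed (about 0.707)
```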

Step S13: screening the retrieval samples according to their confidence values to obtain the difficult-to-separate positive and negative samples corresponding to the difficult-to-identify region.

The confidence value is a measure of how confident or certain the model is about a retrieval sample. It is typically expressed as a number between 0 and 1 and can be interpreted as the confidence or likelihood of the predicted result: a value close to 1 indicates that the model is very confident about the retrieval sample, and a value close to 0 indicates that the model is highly uncertain about it.

In a specific implementation, if, during data mining, semantic search or picture search performs poorly on some category, the picture retrieval device can perform search mining on the unrecognized targets by custom frame selection in the picture, thereby obtaining the difficult-to-identify region corresponding to the difficult-to-identify picture. The retrieved samples are then screened according to their score values (i.e., confidence values) to obtain the difficult-to-separate positive and negative samples corresponding to the difficult-to-identify region. New features are learned from these positive and negative samples by unsupervised or semi-supervised learning and embedded after the multi-modal retrieval model, so that the multi-modal retrieval model can accurately identify the new category while using its original retrieval mode.

Further, in this embodiment, step S13 includes: obtaining audit results fed back for the retrieval samples; dividing the retrieval samples into difficult-to-separate negative samples and difficult-to-separate positive samples according to their confidence values and the audit results; and obtaining the difficult-to-separate positive and negative samples corresponding to the difficult-to-identify region from the difficult-to-separate negative samples and the difficult-to-separate positive samples.

In a specific implementation, optimizing the multi-modal retrieval large model would normally require new labeled sample data; the data volume is large, and labeling is time-consuming and costly. Samples would also have to be searched for manually, and because negative samples far outnumber positive samples, most of them are easily separable negatives that contribute relatively little to training. Traditional manual screening can hardly guarantee that the selected samples are difficult-to-separate samples (i.e., data with larger loss values). The picture retrieval device therefore screens the retrieved results according to their score values (i.e., confidence values): results with a low score value that the manual audit judges incorrect are selected as difficult-to-separate negative samples, and results with a low score value that the manual audit judges correct are selected as difficult-to-separate positive samples. Typically 50 positive and 50 negative samples are selected; adding more samples afterwards can still bring some improvement, but the gain is not significant. Screening a small range of data from custom frame-selection search by score value works because a low confidence means the model has little certainty in classifying the feature in the picture (i.e., the probability it assigns to any one class is low), so the feature lies near the model's decision boundary; pulling apart points of different classes near the decision boundary makes the model's judgment more accurate. At the same time, the retrieval capability of the multi-modal large model is improved without labeling data, because the positive and negative samples are selected from results on which the original multi-modal large model performs poorly, which saves time, labor, and cost.
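The screening in step S13 can be sketched as below, assuming each retrieval sample carries a score in [0, 1] and a boolean audit result. The cut-off of 0.5 for a "low" score is a hypothetical value (the description only says "lower score value"), the class and function names are illustrative, and the default cap of 50 per class follows the typical figure given above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RetrievalSample:
    sample_id: str
    score: float          # confidence returned by the retrieval model, in [0, 1]
    audit_correct: bool   # manual audit feedback: was the retrieved result correct?


def split_hard_samples(samples: List[RetrievalSample], low_score: float = 0.5,
                       per_class: int = 50) -> Tuple[List[RetrievalSample], List[RetrievalSample]]:
    """Return (hard positives, hard negatives).

    Only low-confidence results are kept, since those lie near the decision
    boundary; the audit result decides which side a sample goes to. The lowest
    scores are taken first, capped at `per_class` samples per side."""
    low = sorted((s for s in samples if s.score < low_score), key=lambda s: s.score)
    hard_pos = [s for s in low if s.audit_correct][:per_class]
    hard_neg = [s for s in low if not s.audit_correct][:per_class]
    return hard_pos, hard_neg


samples = [RetrievalSample("a", 0.20, True), RetrievalSample("b", 0.30, False),
           RetrievalSample("c", 0.90, True), RetrievalSample("d", 0.40, False)]
pos, neg = split_hard_samples(samples)
print([s.sample_id for s in pos], [s.sample_id for s in neg])  # ['a'] ['b', 'd']
```

Sample "c" is dropped even though it is correct, because its high score means the model already handles it well.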

Further, to improve the efficiency and accuracy of picture retrieval, the few-sample new features may be connected into a preset multi-modal retrieval large model. After step S20, the method further includes: performing feature extraction on the warehouse-in picture through the preset multi-modal retrieval large model to obtain current features corresponding to the warehouse-in picture; judging whether the current features are similar to the few-sample new features; and, when a current feature is similar to a few-sample new feature, judging that current feature to be a few-sample new feature.
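The judgment of whether a current feature is similar to the few-sample new features can be sketched as a nearest-prototype check with a threshold. The cosine measure, the threshold of 0.8, the function name and the few_shot_k labels are assumptions for illustration.

```python
import math
from typing import Dict, Optional, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_new_feature(current: Vector, new_features: Dict[str, Vector],
                      preset_similarity: float = 0.8) -> Optional[str]:
    """Return the label of the most similar few-sample new feature if its similarity
    reaches the preset value, else None (the current feature is not a new feature)."""
    best_label: Optional[str] = None
    best_sim = -1.0
    for label, prototype in new_features.items():
        sim = cosine_similarity(current, prototype)
        if sim >= preset_similarity and sim > best_sim:
            best_label, best_sim = label, sim
    return best_label


new_features = {"few_shot_0": [1.0, 0.0], "few_shot_1": [0.0, 1.0]}
print(match_new_feature([0.9, 0.1], new_features))  # few_shot_0
print(match_new_feature([0.7, 0.7], new_features))  # None
```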

It should be noted that fig. 6 is a schematic flowchart of new feature learning on the difficult-to-separate positive and negative samples in the third embodiment of the picture retrieval method according to the present invention. Traditional picture search tools are implemented with a cloud large model, also called a multi-modal retrieval large model: a deep learning model that can process and retrieve multiple types of data (such as images, text, and audio) and perform cross-modal retrieval based on the correlations between data types. Once the few-sample new features are connected to the preset multi-modal retrieval large model (at its fully connected layer), they help the model quickly retrieve the warehouse-in picture together with related cloud images, so that information related to the warehouse-in picture can be retrieved from various data sources more efficiently and the retrieval results are more accurate and comprehensive.

It should be understood that a current feature is a feature obtained by feature extraction on the warehouse-in picture. If the current features of the warehouse-in picture contain a target close to a new feature, that current feature can be judged to be a few-sample new feature; this process is called Embedding. Embedding generally converts different data types or features into a shared low-dimensional vector space, i.e., maps them into a common vector space in which they can be compared, matched, or searched by measuring their similarity or correlation, thereby achieving multi-modal retrieval. Embedding the few-sample new features into the preset multi-modal retrieval large model makes cross-modal information interaction and analysis easier.
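Embedding into a shared vector space can be illustrated with one linear projection per modality followed by a similarity comparison. The projection matrices below are hypothetical values chosen so the arithmetic is easy to follow; in a real multi-modal model such projections are learned, and their form is not specified in this description.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]
Vector = Sequence[float]


def project(matrix: Matrix, vector: Vector) -> List[float]:
    """Map a d_in-dimensional feature into the shared space with a (d_out x d_in) matrix."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical projection weights; a real system would learn them.
W_IMAGE: Matrix = [[1.0, 0.0, 0.0],
                   [0.0, 1.0, 1.0]]   # 3-d image feature -> 2-d shared space
W_TEXT: Matrix = [[0.5, 0.0],
                  [0.0, 2.0]]         # 2-d text feature -> 2-d shared space

image_feature = [2.0, 0.5, 0.5]
text_feature = [4.0, 0.5]
shared_img = project(W_IMAGE, image_feature)   # [2.0, 1.0]
shared_txt = project(W_TEXT, text_feature)     # [2.0, 1.0]
print(cosine_similarity(shared_img, shared_txt))  # about 1.0: same point in the shared space
```

Features of different dimensionality and origin become directly comparable once they share one space, which is what lets a learned new feature be matched against features extracted from warehouse-in pictures.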

In a specific implementation, the few-sample new features can be embedded at the rear (the fully connected layer) of the preset multi-modal retrieval large model. When feature extraction is performed on the warehouse-in picture, if its current features contain a target close to a few-sample new feature, that current feature is judged to be a few-sample new feature. Through this Embedding process, the accuracy and recall of subsequent retrieval with the original text or picture queries can be improved.

For ease of understanding, as shown in fig. 6, the conventional picture retrieval process comprises three stages: importing data into the platform, multi-modal retrieval with the large model, and screening and output. Feature vectors of the warehouse-in pictures are extracted and mined with the cloud large model. Common mining modes include label search, semantic search, and picture search, where picture search further includes full-image search, single-target search, and custom frame-selection search. Finally, the score values are checked, and the selected data is output to the labeling platform to complete training of the vehicle-end model. However, the cloud large model may perform poorly when mining some data (accuracy or recall below a set threshold), one reason being that the training data does not contain enough samples.
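The poor-mining condition (accuracy or recall below a set threshold) can be checked with standard precision and recall, where precision is used here as the accuracy measure. The thresholds of 0.8, the function names and the picture identifiers are hypothetical.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def is_poorly_mined(retrieved: Set[str], relevant: Set[str],
                    min_precision: float = 0.8, min_recall: float = 0.8) -> bool:
    """Flag a category whose mining result falls below either set threshold."""
    precision, recall = precision_recall(retrieved, relevant)
    return precision < min_precision or recall < min_recall


retrieved = {"img1", "img2", "img3", "img4"}
relevant = {"img1", "img2", "img5"}
print(precision_recall(retrieved, relevant))  # (0.5, 0.6666666666666666)
print(is_poorly_mined(retrieved, relevant))   # True
```

A category flagged this way would be the trigger for custom frame-selection search and the hard-sample mining described in this embodiment.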

Therefore, based on the embodiments of the present invention, new features can be learned through unsupervised or semi-supervised learning on positive and negative samples and used as the query (the original inputs being text, pictures, and labels), so that retrieval accuracy is improved by feature retrieval. During data mining, if semantic or picture search performs poorly on some category, search mining can be performed on the unrecognized targets by custom frame-selection search in the picture, as shown by the black-frame flow in fig. 6. The retrieved results are screened according to score value (i.e., confidence), and the difficult-to-separate positive and negative samples are distinguished manually within this small range. Unsupervised or semi-supervised learning is then performed on the difficult-to-separate positive and negative samples: features are extracted from their pictures, and the distances between similar features are pulled closer to obtain the new features. In this way, the model can better judge whether instances belong to the same category by comparing the distances between them. Finally, the new features are used as the query and matched against the features of the warehouse-in pictures; such feature matching achieves higher accuracy and recall than searching by text features or by new picture features. Meanwhile, the new features can be connected to the rear (the fully connected layer) of the cloud retrieval large model, so that when features are extracted from a warehouse-in picture, any target close to a new feature is judged to be that new feature; this is called Embedding. It likewise improves the accuracy and recall of the original text and picture retrieval.
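Using the new features as the query can be sketched as scoring each warehouse-in picture by its best cosine match against any of the few-sample new features and keeping those that reach a matching value. The max-over-queries scoring rule, the matching value of 0.8, the function name and the feature values are assumptions, not details given in the description.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_with_new_features(queries: List[Vector], gallery: Dict[str, Vector],
                               match_value: float = 0.8) -> List[Tuple[str, float]]:
    """Score every warehouse-in picture by its best match against any new-feature
    query and keep those reaching the matching value, best first."""
    results = []
    for picture_id, feature in gallery.items():
        score = max(cosine_similarity(q, feature) for q in queries)
        if score >= match_value:
            results.append((picture_id, score))
    return sorted(results, key=lambda item: item[1], reverse=True)


queries = [[1.0, 0.0], [0.8, 0.6]]   # few-sample new features (toy values)
gallery = {"p1": [1.0, 0.05], "p2": [0.0, 1.0], "p3": [0.6, 0.8]}
print([pid for pid, _ in retrieve_with_new_features(queries, gallery)])  # ['p1', 'p3']
```

Taking the maximum over several new features lets one cluster of few-sample features, rather than a single averaged vector, act as the query.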

In summary, in this embodiment, when semantic or picture search performs poorly on some category during data mining, the picture retrieval device obtains the difficult-to-identify region by custom frame-selection search on the unrecognized targets, screens the retrieval samples by score value (i.e., confidence) to obtain the corresponding difficult-to-separate positive and negative samples, learns new features from them by unsupervised or semi-supervised learning, and embeds the new features after the multi-modal retrieval model, so that the model can accurately identify new categories with its original retrieval mode. The difficult-to-separate samples are chosen as low-score results that the manual audit judges incorrect (negatives) or correct (positives), typically about 50 of each; because of their low confidence they lie near the model's decision boundary, where pulling apart points of different classes makes the model's judgment more accurate. This avoids the large, time-consuming, and costly labeling effort otherwise needed to optimize the multi-modal retrieval large model, as well as manual screening that mostly yields easily separable negative samples, and it improves the retrieval capability of the multi-modal large model without labeling data, saving time, labor, and cost.

In addition, an embodiment of the present invention further provides a storage medium storing a picture retrieval program which, when executed by a processor, implements the steps of the picture retrieval method described above.

Referring to fig. 7, fig. 7 is a block diagram showing the structure of a first embodiment of the picture retrieval apparatus according to the present invention.

As shown in fig. 7, a picture retrieval apparatus according to an embodiment of the present invention includes:

the confidence screening module 701 is configured to perform confidence value screening on the difficult-to-identify picture to obtain the difficult-to-separate positive and negative samples corresponding to the difficult-to-identify picture;

the feature extraction module 702 is configured to perform feature extraction based on the difficult-to-separate positive and negative samples to obtain the few-sample new features of the difficult-to-separate positive and negative samples;

and the picture retrieval module 703 is configured to retrieve the warehouse-in picture according to the few-sample new features to obtain a retrieved picture set corresponding to the warehouse-in picture.
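The module structure of fig. 7 can be sketched as a small class that chains the three modules. Each module is injected as a callable so that any implementation can be plugged in; the class, method and parameter names are hypothetical, and the toy modules in the usage part (a centroid of the positives and a dot-product match) are placeholders only, not the modules of the invention.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Samples = Tuple[List[Vector], List[Vector]]   # (hard positives, hard negatives)


@dataclass
class PictureRetrievalApparatus:
    """Chains the three modules of fig. 7; each module is an injected callable."""
    confidence_screening: Callable[[str], Samples]                             # module 701
    feature_extraction: Callable[[List[Vector], List[Vector]], List[Vector]]   # module 702
    picture_retrieval: Callable[[List[Vector], Dict[str, Vector]], List[str]]  # module 703

    def run(self, hard_picture: str, warehouse_in: Dict[str, Vector]) -> List[str]:
        positives, negatives = self.confidence_screening(hard_picture)
        new_features = self.feature_extraction(positives, negatives)
        return self.picture_retrieval(new_features, warehouse_in)


# Toy placeholder modules, for illustration only.
def centroid_of_positives(positives: List[Vector], negatives: List[Vector]) -> List[Vector]:
    # Placeholder: ignores the negatives and returns the mean positive feature.
    dim = len(positives[0])
    return [[sum(p[i] for p in positives) / len(positives) for i in range(dim)]]


def dot_product_retrieval(new_features: List[Vector], gallery: Dict[str, Vector],
                          match_value: float = 0.5) -> List[str]:
    return sorted(pid for pid, feat in gallery.items()
                  if max(sum(a * b for a, b in zip(q, feat)) for q in new_features) >= match_value)


apparatus = PictureRetrievalApparatus(
    confidence_screening=lambda picture: ([[1.0, 0.0], [0.8, 0.2]], [[0.0, 1.0]]),
    feature_extraction=centroid_of_positives,
    picture_retrieval=dot_product_retrieval,
)
print(apparatus.run("hard_picture.jpg", {"p1": [1.0, 0.0], "p2": [0.0, 1.0]}))  # ['p1']
```

Injecting the modules keeps the pipeline order of fig. 7 fixed while allowing, for example, the contrastive feature learning or threshold-based matching sketched elsewhere to be substituted.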

Based on the above-described first embodiment of the picture retrieval apparatus of the present invention, a second embodiment of the picture retrieval apparatus of the present invention is proposed.

In this embodiment, the feature extraction module 702 is further configured to perform feature extraction on the difficult-to-separate positive and negative samples to obtain a plurality of sample features corresponding to them; obtain feature distances between the plurality of sample features; and perform similarity processing on the plurality of sample features based on the feature distances to obtain the few-sample new features corresponding to the difficult-to-identify picture.

Further, the feature extraction module 702 is further configured to perform similarity detection on the plurality of sample features based on the feature distances to obtain a plurality of sample similarities corresponding to the plurality of sample features; judge whether each sample similarity reaches a preset similarity value; and take the sample features corresponding to the preset similarity value as the few-sample new features.

Further, the feature extraction module 702 is further configured to push apart, by means of the difficult-to-separate negative samples, the sample features that do not reach the preset similarity value, and pull together, by means of the difficult-to-separate positive samples, the sample features that reach the preset similarity value, so as to obtain a similar feature cluster, where the difficult-to-separate positive and negative samples comprise difficult-to-separate positive samples and difficult-to-separate negative samples; and label the similar feature cluster to obtain the few-sample new features corresponding to the difficult-to-identify picture.

Further, the confidence screening module 701 is further configured to perform region frame selection on the difficult-to-identify picture to obtain a difficult-to-identify region corresponding to it; perform a preliminary search according to the difficult-to-identify region to obtain retrieval samples; and screen the retrieval samples according to their confidence values to obtain the difficult-to-separate positive and negative samples corresponding to the difficult-to-identify region.

Further, the confidence screening module 701 is further configured to obtain audit results fed back for the retrieval samples; divide the retrieval samples into difficult-to-separate negative samples and difficult-to-separate positive samples according to their confidence values and the audit results; and obtain the difficult-to-separate positive and negative samples corresponding to the difficult-to-identify region from the difficult-to-separate negative samples and the difficult-to-separate positive samples.

Further, the picture retrieval module 703 is further configured to perform feature matching between the picture features of the warehouse-in picture and the few-sample new features to obtain a matching result; judge whether the matching result reaches a preset matching value; and, when the matching result reaches the preset matching value, retrieve the warehouse-in picture according to the few-sample new features to obtain a retrieved picture set corresponding to the warehouse-in picture.

For other embodiments or specific implementations of the picture retrieval apparatus of the present invention, reference may be made to the method embodiments above; details are not repeated here.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.

The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.

From the above description of the embodiments, it will be clear to those skilled in the art that the method of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., read-only memory/random-access memory, magnetic disk, or optical disk), comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to perform the methods according to the embodiments of the present invention.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit its patent scope. Any equivalent structural or process transformation made using the contents of this description and the accompanying drawings, whether applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present invention.

Claims

1. A picture retrieval method, characterized in that the picture retrieval method comprises:

2. The picture retrieval method according to claim 1, wherein the performing feature extraction based on the difficult-to-separate positive and negative samples to obtain few-sample new features of the difficult-to-separate positive and negative samples includes:

obtaining feature distances between the plurality of sample features;

3. The picture retrieval method according to claim 2, wherein the performing similarity processing on the plurality of sample features based on the feature distance to obtain few-sample new features corresponding to the difficult-to-identify picture includes:

judging whether the sample similarity reaches a preset similarity value or not;

4. The picture retrieval method according to claim 3, wherein the taking the sample features corresponding to the preset similarity value as the few-sample new features comprises:

5. The picture retrieval method according to claim 1, wherein the performing confidence value screening on the difficult-to-identify picture to obtain the difficult-to-separate positive and negative samples corresponding to the difficult-to-identify picture comprises:

6. The picture retrieval method according to claim 5, wherein the screening the retrieval samples according to confidence values to obtain the difficult-to-separate positive and negative samples corresponding to the difficult-to-identify region comprises:

obtaining an audit result based on the retrieval sample feedback;

7. The picture retrieval method according to claim 1, wherein the few-sample new features are connected into a preset multi-modal retrieval large model, and after the performing feature extraction based on the difficult-to-separate positive and negative samples to obtain few-sample new features of the difficult-to-separate positive and negative samples, the method further comprises:

8. A picture retrieval apparatus, the apparatus comprising:

9. A picture retrieval device, the device comprising: a memory, a processor, and a picture retrieval program stored on the memory and executable on the processor, the picture retrieval program being configured to implement the steps of the picture retrieval method according to any one of claims 1 to 7.

10. A storage medium having stored thereon a picture retrieval program which, when executed by a processor, implements the steps of the picture retrieval method according to any one of claims 1 to 7.