WO2023231355A1 - Image recognition method and apparatus - Google Patents

Image recognition method and apparatus

Info

Publication number
WO2023231355A1
WO2023231355A1 · PCT/CN2022/137039
Authority
WO
WIPO (PCT)
Prior art keywords
image, sample, feature vector
Application number
PCT/CN2022/137039
Other languages
French (fr)
Chinese (zh)
Inventor
杨振宇 (Yang Zhenyu)
李剑平 (Li Jianping)
Original Assignee
深圳先进技术研究院 (Shenzhen Institute of Advanced Technology)
Application filed by 深圳先进技术研究院 (Shenzhen Institute of Advanced Technology)
Publication of WO2023231355A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks

Definitions

  • The present application relates to the field of image processing technology, and specifically to an image recognition method and device.
  • Image recognition is an important research topic in the field of computer vision and has been widely applied in many fields, for example, image recognition of plankton in the marine environment to achieve long-term, continuous in-situ observation of plankton.
  • Image recognition usually uses a training set to train a convolutional neural network model, and then predicts the category of the image to be recognized based on the convolutional neural network model to obtain the target category of the image to be recognized.
  • The training set needs to be continuously updated, which in turn causes the convolutional neural network model to be retrained more frequently in order to maintain the recognition performance of image recognition based on the convolutional neural network model.
  • Each embodiment of the present application provides an image recognition method, device, electronic device, and storage medium, which can solve the problems of low recognition accuracy, instability, and poor generalization performance in related technologies.
  • the technical solutions are as follows:
  • An image recognition method includes: obtaining an image to be recognized; performing image feature extraction on the image to be recognized to obtain a first feature vector; in a retrieval library that stores sample images and their corresponding sample categories, searching for sample images whose second feature vector's similarity to the first feature vector satisfies a similarity condition, where the second feature vector is used to represent the image features of the sample image; and determining the target category of the image to be recognized according to the sample category corresponding to the found sample image.
  • An image recognition device includes: an image acquisition module, configured to acquire an image to be recognized; a feature extraction module, configured to extract image features from the image to be recognized to obtain a first feature vector; an image search module, configured to search, in a retrieval library that stores sample images and their corresponding sample categories, for sample images whose second feature vector's similarity to the first feature vector satisfies a similarity condition, where the second feature vector is used to represent the image features of the sample image; and an image recognition module, configured to determine the target category of the image to be recognized based on the sample category corresponding to the found sample image.
  • the feature extraction module includes: an extractor unit, configured to convert the image to be recognized into the first feature vector using a feature extractor that has completed model training.
  • the device further includes: a model training module, configured to perform model training on a basic model according to the image pairs in the training set to obtain the feature extractor, where the basic model includes a first training branch and a second training branch, the first training branch and the second training branch respectively include a feature extraction layer and a dimensionality reduction layer;
  • The model training module includes: an image traversal unit, configured to traverse the image pairs in the training set, where the image pairs include positive sample pairs and negative sample pairs, the two sample images in a positive sample pair belong to the same sample category, and the two sample images in a negative sample pair belong to different sample categories;
  • the traversal includes: inputting the two sample images in the image pair into the first training branch and the second training branch respectively for processing, and calculating the model loss value according to the processing results obtained by the first training branch and the second training branch;
  • a convergence unit, configured to obtain the feature extractor from the converged feature extraction layer in the basic model if the model loss value satisfies the convergence condition.
  • The device further includes an image pair building module, which includes: an amplification unit, configured to perform at least two different image data enhancement processes on each sample image in the training set, so that at least a first enhanced image and a second enhanced image are obtained by amplifying the sample image; and a pairing unit, configured to perform image pairing processing on the first enhanced images and second enhanced images obtained by amplifying the sample images in the training set, to obtain the image pairs.
  • The image search module includes: a similarity calculation unit, configured to calculate, for each second feature vector in a feature vector set, the similarity between that second feature vector and the first feature vector, where the feature vector set is constructed from the second feature vectors of the sample images in the retrieval library; and an image search unit, configured to select the sample image whose second feature vector has the highest similarity to the first feature vector as the sample image found from the retrieval library.
  • The device further includes a set building module, configured to build the feature vector set from the second feature vectors of the sample images in the retrieval library. The set building module includes: a vector adding unit, configured to extract image features from each sample image in the retrieval library, obtain the second feature vector of each sample image, and add it to the feature vector set; a vector traversal unit, configured to traverse the second feature vectors in the feature vector set, take the traversed second feature vector as a first vector, and calculate the similarity between the first vector and the remaining second feature vectors in the feature vector set to obtain first similarities; and a vector deletion unit, configured to delete second feature vectors with high redundancy from the feature vector set based on the first similarities, where the redundancy indicates the number of similar second feature vectors in the feature vector set.
  • The vector deletion unit includes: a vector determination subunit, configured to take a second feature vector whose first similarity to the first vector is greater than a first set threshold as a second vector; a similarity calculation subunit, configured to calculate the similarity between the second vector and the remaining second feature vectors in the feature vector set to obtain second similarities; and a redundancy calculation subunit, configured to determine the redundancy of the first vector according to the number of second feature vectors whose first similarity is greater than the first set threshold and the number of second feature vectors whose second similarity to the second vector is greater than a second set threshold.
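As a concrete illustration of the vector traversal and vector deletion units above, the following sketch prunes near-duplicate second feature vectors from a feature vector set. The similarity function, the threshold values, and the simplified single-threshold redundancy rule are illustrative assumptions; the claims describe a more elaborate two-threshold rule.

```python
import math

def cosine(x, y):
    # Plain cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

def prune_redundant(vectors, sim, sim_threshold, max_redundancy):
    """One reading of the vector deletion unit: a vector's redundancy is the
    number of other vectors in the set whose similarity to it exceeds the
    (first) set threshold; the most redundant vectors are deleted until no
    vector's redundancy exceeds max_redundancy, keeping the set compact."""
    kept = list(vectors)
    changed = True
    while changed:
        changed = False
        # redundancy of each kept vector with respect to the current set
        red = [sum(1 for j, v in enumerate(kept) if j != i and sim(u, v) > sim_threshold)
               for i, u in enumerate(kept)]
        if kept and max(red) > max_redundancy:
            kept.pop(red.index(max(red)))  # delete the most redundant vector
            changed = True
    return kept
```

With three nearly identical vectors and one distinct vector, one near-duplicate is removed while the distinct vector survives.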
  • The image recognition module includes an image recognition unit, configured to use the sample category corresponding to the found sample image as the target category of the image to be recognized.
  • The device further includes: a new category correction module, configured to correct the target category of the image to be recognized in response to a category correction instruction; and a new category adding module, configured to add the image to be recognized and its corrected target category to the retrieval library.
  • An electronic device includes at least one processor, at least one memory, and at least one communication bus, where a computer program is stored in the memory and the processor reads the computer program from the memory through the communication bus;
  • when the computer program is executed by the processor, the image recognition method described above is implemented.
  • A storage medium has a computer program stored thereon; when the computer program is executed by a processor, the image recognition method described above is implemented.
  • A computer program product includes a computer program stored in a storage medium; a processor of an electronic device reads the computer program from the storage medium and executes it, so that the electronic device implements the image recognition method described above.
  • Figure 1 is a schematic diagram of an implementation environment involved in this application.
  • Figure 2 is a flow chart of an image recognition method according to an exemplary embodiment
  • Figure 3 is a schematic diagram showing that the image to be recognized is an ROI image according to an exemplary embodiment
  • Figure 4 is a schematic structural diagram of a basic model according to an exemplary embodiment
  • Figure 5 is a schematic structural diagram of a feature extraction layer according to an exemplary embodiment
  • Figure 6 is a method flow chart of the model training process of the feature extraction layer according to an exemplary embodiment
  • Figure 7 is a schematic diagram of an image pairing process according to an exemplary embodiment
  • Figure 8a is a method flow chart of a construction process of a feature vector set according to an exemplary embodiment
  • Figure 8b is a method flow chart in one embodiment of step 550 involved in the corresponding embodiment of Figure 8a;
  • Figure 9 is a flow chart of another image recognition method according to an exemplary embodiment.
  • Figure 10 is a schematic diagram of an image recognition framework based on image retrieval according to an exemplary embodiment
  • Figure 11 is a structural block diagram of an image recognition device according to an exemplary embodiment
  • Figure 12 is a hardware structure diagram of an electronic device according to an exemplary embodiment
  • Figure 13 is a structural block diagram of an electronic device according to an exemplary embodiment.
  • The training set needs to be continuously updated, which in turn causes the convolutional neural network model to be retrained more frequently in order to maintain the recognition performance of image recognition based on the convolutional neural network model.
  • On the one hand, updating the training set relies on a large amount of manual annotation and manual correction; on the other hand, a training set constructed by sampling images at limited spatial and temporal scales and resolutions can hardly fully and faithfully reflect the plankton in the real marine environment. These limitations inevitably affect the recognition accuracy of image recognition and cannot meet the needs of real-time observation of plankton in the marine environment.
  • The image recognition method provided by this application can effectively improve recognition accuracy and robustness and fully ensure generalization performance. Accordingly, the image recognition method is suitable for an image recognition device, and the image recognition device can be deployed in electronic equipment configured with the von Neumann architecture; for example, the electronic equipment can be a desktop computer, a laptop computer, a server, etc.
  • Figure 1 is a schematic diagram of an implementation environment involved in an image recognition method. It should be noted that this implementation environment is only an example adapted to the present invention and cannot be considered to provide any limitation on the scope of the present invention.
  • the implementation environment includes a collection terminal 110 and a server terminal 130.
  • the collection terminal 110 can also be considered as an image collection device, including but not limited to a camera, a still camera, a camcorder and other electronic devices with a shooting function.
  • the collection terminal 110 is an underwater camera.
  • the server 130 can be an electronic device such as a desktop computer, a laptop computer, a server, etc., or it can be a computer cluster composed of multiple servers, or even a cloud computing center composed of multiple servers.
  • the server 130 is used to provide background services.
  • the background services include but are not limited to image recognition services and so on.
  • a network communication connection is established in advance between the server 130 and the collection terminal 110 through wired or wireless means, and data transmission between the server 130 and the collection terminal 110 is implemented through the network communication connection.
  • the transmitted data includes but is not limited to: images to be recognized, etc.
  • the collection terminal 110 captures and collects the image to be recognized, and uploads the image to be recognized to the server 130 to request the server 130 to provide image recognition services.
  • The image recognition service is called to search the retrieval database, which stores sample images and their corresponding sample categories, for sample images similar to the image to be recognized, and then determine the target category of the image to be recognized based on the sample category corresponding to the found sample image. This realizes an image recognition solution that replaces image classification with image retrieval, thereby solving the problems of low recognition accuracy, poor robustness, and poor generalization performance existing in related technologies.
  • the electronic equipment can be the server 130 in the implementation environment shown in Figure 1.
  • the method may include the following steps:
  • Step 310 Obtain the image to be recognized.
  • the image to be recognized is generated by photographing and collecting the environment containing the target object by the image acquisition device in the implementation environment shown in Figure 1 .
  • the target object refers to an object in the shooting environment.
  • the target object may be an underwater creature, and specifically the underwater creature may be a plankton in a marine environment.
  • the shooting can be a single shooting or a continuous shooting.
  • a video can be obtained, and the image to be recognized can be any number of frames in the video.
  • multiple photos can be obtained, and the image to be recognized can be any number of photos among the multiple photos.
  • The image to be recognized in this embodiment may refer to a dynamic image, such as multiple frames in a video or multiple photos, or a static image, such as any single frame in a video or any single photo.
  • the image recognition in this embodiment can be performed on dynamic images or on static images, which is not limited here.
  • The image to be recognized can be captured and collected in real time by the image acquisition device, or it can be an image captured by the image acquisition device in a historical time period and pre-stored in the electronic device. For the electronic device, after the image acquisition device captures and collects the image to be recognized, the image can be processed in real time or stored for later processing; for example, it can be processed when the CPU usage of the electronic device is low, or according to the instructions of the staff. Therefore, the image recognition in this embodiment can be based on an image to be recognized obtained in real time or obtained in a historical time period, which is not specifically limited here.
  • The image to be recognized is an ROI (region of interest) image; that is, in the image to be recognized, the target object is located in the region of interest, and the region of interest is significantly different from the background region.
  • the target object is plankton, located in the area of interest (gray-white area), which is significantly different from the background area (black area).
  • Step 330 Extract image features from the image to be recognized to obtain a first feature vector.
  • The first feature vector is used to represent the image features of the image to be recognized; it can also be considered an accurate description of those image features. It should be understood that different images to be recognized yield different extracted image features and, correspondingly, different first feature vectors.
  • Image feature extraction can be implemented through feature extraction algorithms such as histogram of oriented gradients (HOG) features, local binary pattern (LBP) features, and Haar-like features.
  • image feature extraction is achieved through a convolution kernel. It should be noted that based on different numbers and different sizes of convolution kernels, first feature vectors of different lengths will be obtained to reflect the image to be recognized from different scales.
  • the image features are extracted through a feature extractor.
  • the feature extractor that has completed model training is used to convert the image to be recognized into a first feature vector.
  • Step 350 In the retrieval database used to store sample images and their corresponding sample categories, search for sample images whose similarity between the second feature vector and the first feature vector satisfies the similarity condition.
  • the retrieval database essentially establishes a correspondence between sample images and their corresponding sample categories.
  • the sample category corresponding to the sample object can be quickly determined, which is then used as the basis for image retrieval.
  • the sample image refers to an image labeled with a sample category.
  • the sample image refers to an image carrying a label indicating the sample category.
  • The essence of image retrieval is to measure the similarity between the image to be recognized and the sample images in the retrieval database.
  • Image recognition based on image retrieval does not directly obtain the target category of the image to be recognized; instead, it indirectly obtains the target category by comparing the similarity between the image to be recognized and the sample images in the retrieval database: it first obtains the sample category corresponding to the sample image whose similarity to the image to be recognized satisfies the similarity condition, and from that obtains the target category of the image to be recognized.
  • the comparison of the similarity between the image to be recognized and the sample image in the retrieval database is achieved by calculating the similarity between the first feature vector and the second feature vector.
  • the first feature vector is used to represent the image features of the image to be recognized
  • the second feature vector is used to represent the image features of the sample images in the retrieval database.
  • the similarity calculation scheme includes but is not limited to: cosine similarity, Euclidean distance, Manhattan distance, Jaccard similarity coefficient, Pearson correlation coefficient, etc.
  • Similarity(x, y) represents the similarity between x and y, and the value range of this similarity is [0, 1]; x represents the first feature vector of the image to be recognized, and y represents the second feature vector of the sample image. It should be understood that the closer the similarity is to 1, the closer the first feature vector and the second feature vector are, that is, the more similar the image to be recognized is to the sample image.
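The text references a similarity calculation formula whose exact form is not reproduced in this excerpt. A minimal sketch, assuming cosine similarity rescaled from [-1, 1] to the stated [0, 1] range (one common choice, not necessarily the patent's formula (1)):

```python
import math

def cosine_similarity(x, y):
    """Similarity between two feature vectors, mapped to [0, 1].

    Assumption: cosine similarity rescaled by (cos + 1) / 2 so that
    identical directions give 1.0 and opposite directions give 0.0.
    """
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    cos = dot / (nx * ny)
    return (cos + 1.0) / 2.0
```

Identical vectors score 1.0, orthogonal vectors 0.5, and opposite vectors 0.0.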
  • The image to be recognized is not limited to a static image, such as a photo or a single frame, but can also be a dynamic image. If the image to be recognized refers to a dynamic image, such as multiple photos or multiple frames, calculation formula (1) and calculation formula (2) can be combined to calculate multiple similarities at the same time.
  • V represents the similarity result matrix;
  • Q represents the first feature vector matrix of the image to be recognized;
  • G represents the second feature vector matrix of the sample images in the retrieval database.
  • The value in the j-th column of the i-th row of V represents the similarity between the first feature vector of the i-th photo or i-th frame in the image to be recognized and the second feature vector of the j-th sample image in the retrieval database.
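The batched similarity described above (the matrix V computed from Q and G) can be sketched as follows. The rescaled-cosine similarity is an assumption, since formulas (1) and (2) are not reproduced in this excerpt:

```python
import math

def cos01(x, y):
    # Assumed per-pair similarity in [0, 1] (rescaled cosine).
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return (dot / (nx * ny) + 1.0) / 2.0

def similarity_matrix(Q, G):
    """V[i][j]: similarity between the i-th query feature vector (row of Q,
    one per photo/frame of the image to be recognized) and the j-th gallery
    feature vector (row of G, one per sample image in the retrieval database)."""
    return [[cos01(q, g) for g in G] for q in Q]
```

Each row of V can then be scanned for the most similar sample image of the corresponding photo or frame.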
  • the similarity condition refers to the highest degree of similarity. Therefore, the sample image with the highest degree of similarity between the second feature vector and the first feature vector is used as the sample image found from the retrieval database.
  • The second feature vector is pre-calculated and stored in the storage area of the electronic device. In this way, when performing image recognition on different images to be recognized, the pre-calculated second feature vector can be read directly from the storage area, avoiding repeated extraction of the second feature vector in each image recognition process and thereby further improving the recognition efficiency of image recognition.
  • The second feature vector is stored in the storage area of the electronic device in the form of an LUT (look-up table). During the image recognition process, the LUT can be loaded directly into the memory of the electronic device, which avoids repeated extraction of the second feature vector in each image recognition process.
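A minimal sketch of pre-computing the second feature vectors once and persisting them as a look-up table. The JSON file format and the `extract` callable are illustrative assumptions; the patent only states that the vectors are stored as an LUT:

```python
import json
import os
import tempfile

def build_lut(samples, extract):
    """Precompute second feature vectors once.

    samples: dict mapping sample image id -> image data;
    extract: hypothetical feature extractor returning a feature vector.
    """
    return {img_id: extract(img) for img_id, img in samples.items()}

def save_lut(lut, path):
    # Persist the LUT so later recognitions skip feature re-extraction.
    with open(path, "w") as f:
        json.dump(lut, f)

def load_lut(path):
    # Load the precomputed LUT directly into memory at recognition time.
    with open(path) as f:
        return json.load(f)
```

At recognition time only `load_lut` is needed, so each query image costs one feature extraction instead of one per sample image.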
  • the above process is especially suitable for image recognition of out-of-distribution samples.
  • For out-of-distribution samples, an image recognition scheme based on image classification will not only affect the accuracy of classification, but also lead to inaccuracies in abundance quantification.
  • the image recognition solution based on image retrieval can more accurately exclude out-of-distribution samples through similarity calculation, thereby effectively ensuring the recognition accuracy of image recognition.
  • Step 370 Determine the target category of the image to be recognized based on the sample category corresponding to the found sample image.
  • the sample category corresponding to the found sample image is the recognition result obtained by image recognition of the image to be recognized, that is, the target category of the image to be recognized.
  • The target category of the image to be recognized may be a new category, that is, one that does not belong to any sample category corresponding to the sample images in the retrieval database; it can also be understood as an unknown category. In this case, the target category of the image to be recognized cannot actually be obtained correctly from the sample category corresponding to the found sample image.
  • a decision condition is proposed to reject the recognition of unknown categories, thereby avoiding recognition errors.
  • the decision condition refers to that the similarity between the image to be recognized and the found sample image is greater than the similarity threshold.
  • The category decision-making process based on this decision condition specifically refers to: if the similarity between the second feature vector of the found sample image and the first feature vector of the image to be recognized is greater than the similarity threshold, the sample category corresponding to the found sample image is used as the target category of the image to be recognized; otherwise, the target category of the image to be recognized is determined to be a new category.
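The category decision process above can be sketched as follows. The threshold value 0.8 and the pluggable `sim` function are illustrative assumptions; the patent does not specify a numeric threshold:

```python
def recognize(first_vec, retrieval_db, sim, threshold=0.8):
    """retrieval_db: list of (second_feature_vector, sample_category).

    Returns the sample category of the most similar sample image, or
    "new category" if the best similarity does not exceed the threshold —
    the decision condition that rejects unknown categories.
    """
    best_sim, best_cat = -1.0, None
    for vec, cat in retrieval_db:
        s = sim(first_vec, vec)
        if s > best_sim:
            best_sim, best_cat = s, cat
    return best_cat if best_sim > threshold else "new category"
```

A query close to a stored sample returns that sample's category; a query far from everything is rejected as a new category rather than mislabeled.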
  • the decision-making condition may also be related to the weight configured for the found sample image, which is not specifically limited here.
  • In this way, an image recognition solution in which image retrieval replaces image classification is implemented. Since the recognition accuracy of image retrieval depends on the sample images in the retrieval database and their corresponding sample categories, rather than on frequent changes to the training set and retraining of the convolutional neural network model as image classification does, recognition accuracy can be fully ensured while minimizing manual participation, effectively solving the problems of low recognition accuracy, instability, and poor generalization performance in related technologies.
  • Figure 4 shows a schematic structural diagram of the basic model in one embodiment.
  • the basic model includes a first training branch and a second training branch.
  • The first training branch and the second training branch each include a feature extraction layer and a dimensionality reduction layer.
  • the feature extraction layer can be considered as a feature extractor that has not completed model training and is used to extract image features;
  • The dimensionality reduction layer consists of two fully connected layers and is used to further reduce the dimensionality of the feature vector obtained by the feature extraction layer, for example, converting the feature vector of length 2048 obtained by the feature extraction layer into a feature vector of length 128.
  • Figure 5 shows a schematic structural diagram of the feature extraction layer in one embodiment.
  • the feature extraction layer is a convolutional neural network model with a structural depth of 50 layers and does not include a fully connected layer.
  • (Figure 5 labels: convolution layers Conv, pooling layers Pool, activation function layers ReLU.)
  • it is also based on the ResNeXt module and introduces the SE (Squeeze-and-Excitation) attention module.
  • The feature vector obtained by this feature extraction layer has strong abstract expression ability, and with the assistance of the attention mechanism it can focus on the parts of the image that play a major role in recognition, such as the region of interest in the ROI image, thereby ensuring that image features can be extracted more effectively.
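A minimal, pure-Python sketch of the SE (Squeeze-and-Excitation) mechanism mentioned above: squeeze via channel-wise global average pooling, excitation via two small fully connected layers with a sigmoid gate, then channel-wise rescaling. The weights `w1` and `w2` are hypothetical placeholders; a real implementation would be a module in a deep learning framework:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def se_block(feature_maps, w1, w2):
    """feature_maps: list of 2-D channel maps (lists of rows).

    w1, w2: hypothetical weight matrices of the two fully connected
    excitation layers (rows of w1 have one weight per channel).
    """
    # squeeze: one scalar per channel via global average pooling
    s = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_maps]
    # excitation: FC -> ReLU -> FC -> sigmoid, one gate per channel
    h = [max(0.0, sum(wi * si for wi, si in zip(row, s))) for row in w1]
    g = [sigmoid(sum(wi * hi for wi, hi in zip(row, h))) for row in w2]
    # scale: reweight each channel by its gate value
    return [[[v * g[c] for v in row] for row in ch] for c, ch in enumerate(feature_maps)]
```

The gate values lie in (0, 1), so channels the excitation deems informative are kept near full strength while others are suppressed.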
  • model training process may include the following steps:
  • Step 410 Traverse the image pairs in the training set.
  • the image pairs include positive sample pairs and negative sample pairs.
  • the two sample images in the positive sample pair belong to the same sample category, and the two sample images in the negative sample pair belong to different sample categories.
  • image data enhancement processing includes but is not limited to: random cropping, rotation, flipping, grayscale, brightness adjustment, contrast adjustment, saturation adjustment, etc., which are not limited here.
  • the first enhanced image and the second enhanced image obtained by amplifying each sample image in the training set are subjected to image pairing processing to obtain an image pair.
  • the sample images in the training set include 701 and 702.
  • the first enhanced image and the second enhanced image obtained by augmenting sample image 701 are 7011 and 7012, respectively.
  • the first enhanced image and the second enhanced image obtained by augmenting sample image 702 are 7021 and 7022, respectively.
  • the constructed image pairs include {7011, 7012}, {7011, 7021}, {7011, 7022}, {7012, 7021}, {7012, 7022}, and {7021, 7022}.
  • {7011, 7012} and {7021, 7022} are positive sample pairs;
  • {7011, 7021}, {7011, 7022}, {7012, 7021}, and {7012, 7022} are negative sample pairs.
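The pairing in this example can be sketched as follows: views augmented from the same source sample form positive pairs, and all other combinations form negative pairs (as in the example, where 701 and 702 belong to different categories):

```python
from itertools import combinations

# Each augmented view is mapped to the sample image it was derived from.
augmented = {"7011": "701", "7012": "701", "7021": "702", "7022": "702"}

# Enumerate all unordered pairs; a pair is positive when both views share a source.
pairs = [(a, b, augmented[a] == augmented[b])
         for a, b in combinations(augmented, 2)]

positives = [(a, b) for a, b, pos in pairs if pos]
negatives = [(a, b) for a, b, pos in pairs if not pos]
print(positives)       # [('7011', '7012'), ('7021', '7022')]
print(len(negatives))  # 4
```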
  • the traversal process for image pairs in the training set can include the following steps:
  • Step 411 Input the two sample images in the image pair to the first training branch and the second training branch respectively for processing.
  • the processing at least includes: extracting image features through the feature extraction layer, reducing the dimensionality of the feature vector through the dimensionality reduction layer, and so on.
  • the sample images are pre-processed before being input to the first training branch or the second training branch.
  • preprocessing includes but is not limited to: padding, scaling, normalization, etc. Since distortion is thereby avoided, this is conducive to further improving recognition accuracy.
  • the purpose of preprocessing such as padding and scaling is to ensure a unified input size for the first training branch and the second training branch.
  • the unified input size is 224 ⁇ 224.
  • normalization preprocessing means that, after the aforementioned preprocessing, the sample image is normalized pixel by pixel according to the following calculation formula (3): I_Norm = (I − mean) / std.
  • I Norm represents the pixels in the sample image that have completed normalization processing, and I represents the pixels to be processed in the sample image;
  • mean and std respectively represent the pixel mean and pixel standard deviation of all pixels in all sample images in the training set.
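Formula (3) and the definitions above can be illustrated with a minimal sketch; the tiny 2×2 "images" are toy data for illustration only:

```python
import numpy as np

# Toy training set: two tiny grayscale "images".
train_images = [np.array([[0.0, 50.0], [100.0, 150.0]]),
                np.array([[200.0, 250.0], [100.0, 50.0]])]

# mean and std over ALL pixels of ALL training images, as formula (3) requires.
all_pixels = np.concatenate([img.ravel() for img in train_images])
mean, std = all_pixels.mean(), all_pixels.std()

# I_Norm = (I - mean) / std, applied pixel by pixel.
normalized = [(img - mean) / std for img in train_images]
pooled = np.concatenate([n.ravel() for n in normalized])
print(pooled.mean(), pooled.std())  # mean ~ 0, std ~ 1 after normalization
```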
  • Step 413 Calculate the model loss value based on the processing results obtained by the first training branch and the second training branch.
  • the calculation formula (4) of the model loss value is as follows: L_sup = Σ_{i∈I} (−1/|P(i)|) Σ_{p∈P(i)} log( exp(z_i·z_p/τ) / Σ_{a∈A(i)} exp(z_i·z_a/τ) ), where z_i denotes the feature vector of the i-th sample image and τ is a temperature parameter.
  • L sup represents the model loss value
  • I represents the set of all sample images in the training set
  • P(i) represents the set of positive sample pairs to which the i-th sample image belongs in the training set
  • A(i) represents the set of all sample images in the training set excluding the i-th sample image.
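The loss defined by formula (4) matches the standard supervised contrastive (SupCon) form; a minimal NumPy sketch over a toy batch is below. The temperature tau = 0.07 and the random embeddings are assumptions for illustration:

```python
import numpy as np

def sup_con_loss(z, labels, tau=0.07):
    """Supervised contrastive loss over a batch of embeddings z.

    P(i): indices with the same label as i (positives);
    A(i): all indices except i. tau is an assumed temperature.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalize embeddings
    sim = z @ z.T / tau
    n, total = len(labels), 0.0
    for i in range(n):
        a = [k for k in range(n) if k != i]            # A(i)
        p = [k for k in a if labels[k] == labels[i]]   # P(i)
        if not p:
            continue
        log_denom = np.log(np.exp(sim[i, a]).sum())
        total += -sum(sim[i, k] - log_denom for k in p) / len(p)
    return total / n

z = np.random.default_rng(0).standard_normal((4, 8))
labels = [0, 0, 1, 1]  # two positive pairs in the batch
print(sup_con_loss(z, labels) > 0.0)  # True: the loss is a positive scalar
```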
  • if the model loss value satisfies the convergence condition, step 430 is executed;
  • otherwise, step 415 is executed.
  • the convergence condition can mean that the model loss value is minimal or falls below a loss threshold, or that the number of iterations reaches an iteration threshold; this is not limited here and can be set flexibly according to the actual needs of the application scenario.
  • Step 415 Update the parameters of the basic model and return to step 410.
  • Step 430 The converged feature extraction layer in the basic model is taken as the feature extractor.
  • at this point, supervised contrastive training of the feature extraction layer is complete, so that the feature extractor pulls the two sample images of a positive pair closer together in the feature space and pushes the two sample images of a negative pair farther apart.
  • the dual training branches and the dimensionality reduction layers are then discarded, and only one of the two feature extraction layers is retained as the feature extractor for subsequent image recognition.
  • compared with convolutional neural network models used for image classification, the model structure is greatly simplified, which further avoids relying on frequent changes in the training set to maintain recognition performance and is more conducive to improving recognition accuracy.
  • the above method may further include the following steps: constructing a feature vector set from the second feature vector of the sample image in the retrieval library.
  • the feature vector set is a look-up table (LUT).
  • the second feature vectors can be stored in the storage area of the electronic device in the form of a LUT, which avoids repeatedly extracting the second feature vectors in each image recognition process and thus improves the recognition efficiency of image recognition.
  • the inventor also realized that as the number of sample images in the retrieval library increases, the number of pre-calculated second feature vectors in the LUT also increases; since the similarity between the first feature vector and each second feature vector in the LUT needs to be calculated, the number of second feature vectors in the LUT affects the similarity calculation speed and thereby the recognition efficiency of image recognition.
  • a construction process for the feature vector set is therefore proposed to realize LUT pruning, which not only reduces the size of the LUT, that is, the number of second feature vectors in the LUT, but also preserves as far as possible the diversity of the second feature vectors in the LUT.
  • the construction process of the feature vector set may include the following steps:
  • Step 510 Perform image feature extraction on each sample image in the retrieval library, obtain the second feature vector of each sample image in the retrieval library, and add it to the feature vector set.
  • Step 530 Traverse the second feature vectors in the feature vector set; using the traversed second feature vector as the first vector, calculate the similarity between the first vector and each of the remaining second feature vectors in the feature vector set to obtain the first similarities.
  • Step 550 Based on the first similarity, delete the second feature vector with high redundancy from the feature vector set.
  • the redundancy of a second feature vector indicates the number of second feature vectors in the feature vector set that are similar to it. It should be understood that the higher the redundancy, the more second feature vectors in the set are similar to that vector; the sample image corresponding to that second feature vector can then be considered redundant, and the second feature vector can be deleted from the feature vector set.
  • the LUT pruning process can include the following steps:
  • Step 551 Use the second feature vector whose first similarity to the first vector is greater than the first set threshold as the second vector.
  • Step 553 Calculate the similarity between the second vector and the remaining second feature vectors in the feature vector set to obtain the second similarity.
  • Step 555 Determine the redundancy of the first vector from the number of second feature vectors whose first similarity with the first vector is greater than the first set threshold, and determine the redundancy of the second vector from the number of second feature vectors whose second similarity with the second vector is greater than the second set threshold.
  • the redundancy of the first vector is used to indicate the number of second feature vectors in the feature vector set that are similar to the first vector; here, similar means that the first similarity is greater than the first set threshold.
  • the redundancy of the second vector is used to indicate the number of second feature vectors in the feature vector set that are similar to the second vector; here, similar means that the second similarity is greater than the second set threshold.
  • Step 557 Based on the redundancy of the second feature vector, delete the corresponding second feature vector from the feature vector set.
  • if the redundancy of the first vector is greater than the redundancy of the second vector, the first vector is deleted from the feature vector set;
  • if the redundancy of the second vector is greater than the redundancy of the first vector, the second vector is deleted from the feature vector set.
  • suppose the second feature vectors in the feature vector set are A, B, C, and D, respectively.
  • suppose the first similarities between the first vector A and the second feature vectors B, C, and D all exceed the first set threshold, so B, C, and D are taken as second vectors.
  • the similarities between the second vector B and the remaining second feature vectors A, C, and D are then calculated to obtain the second similarities 0.91, 0.7, and 0.97; the similarities between the second vector C and the remaining second feature vectors A, B, and D yield the second similarities 0.95, 0.97, and 0.75; and the similarities between the second vector D and the remaining second feature vectors A, B, and C yield the second similarities 0.97, 0.75, and 0.77.
  • assuming the first set threshold is 0.8 and the second set threshold is also 0.8:
  • the number of second feature vectors (B, C, D) whose first similarity with the first vector A is greater than 0.8 is 3; the number of second feature vectors (A, D) whose second similarity with the second vector B is greater than 0.8 is 2; the number of second feature vectors (A, B) whose second similarity with the second vector C is greater than 0.8 is 2;
  • and the number of second feature vectors (A) whose second similarity with the second vector D is greater than 0.8 is 1.
  • therefore, the redundancy of the first vector A is 3, the redundancy of the second vector B is 2, the redundancy of the second vector C is 2, and the redundancy of the second vector D is 1.
  • the first vector A with a redundancy of 3 is deleted from the feature vector set.
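A generic sketch of the pruning idea in steps 551-557: count, for each feature vector, how many others exceed the similarity threshold (its "redundancy"), then delete the most redundant vector. The toy vectors, the 0.8 threshold, and the use of cosine similarity are illustrative assumptions:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy feature vector set; A, B, and C point in similar directions, D does not.
vectors = {"A": np.array([1.0, 0.0]),
           "B": np.array([0.9, 0.1]),
           "C": np.array([0.8, 0.2]),
           "D": np.array([0.0, 1.0])}

def redundancy(name, threshold=0.8):
    # Number of OTHER vectors whose similarity with this one exceeds the threshold.
    return sum(cosine(vectors[name], v) > threshold
               for k, v in vectors.items() if k != name)

scores = {name: redundancy(name) for name in vectors}
most_redundant = max(scores, key=scores.get)  # ties broken by insertion order
del vectors[most_redundant]                    # prune the most redundant vector
print(most_redundant, sorted(vectors))         # A ['B', 'C', 'D']
```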
  • the redundancy can also be expressed in other forms, such as a count-based normalization, which is not specifically limited here.
  • the first set threshold and the second set threshold may be the same or different, and they may be flexibly adjusted according to the actual needs of the application scenario to balance recognition efficiency and recognition accuracy. For example, in application scenarios with high recognition efficiency requirements, set a smaller first set threshold.
  • in a plankton application scenario, assuming there are 200 plankton categories and each category contains 1,000 sample images, the retrieval library contains 200,000 sample images, and the LUT contains up to 200,000 second feature vectors.
  • taking a LUT on an NVIDIA RTX 3090 GPU as an example, image recognition of an image to be recognized takes at most 5.8 ms, which can fully meet the needs of real-time observation of plankton in the marine environment.
  • the above method may further include the following steps:
  • Step 610 In response to the category correction instruction, correct the target category of the image to be recognized.
  • Step 630 If the corrected target category of the image to be recognized is a new category, add the image to be recognized and its corrected target category to the retrieval library in response to the category adding instruction.
  • the new category means that the corrected target category of the image to be recognized is different from the sample category in the retrieval database.
  • a human-computer interaction interface is provided, which helps to promptly discover and correct recognition deviations, fully ensuring the recognition performance of image recognition.
  • FIG. 10 shows a schematic diagram of an image recognition framework based on image retrieval in one embodiment.
  • the image recognition framework includes: a query image module (query) 801 for obtaining images to be recognized, a retrieval library (gallery) 802 for storing sample images and their corresponding sample categories, a retrieval module for performing image feature analysis, and a human-computer interaction interface.
  • the human-computer interaction interface includes a correction interface 807 and an adding interface 808.
  • the correction interface 807 is used to generate a category correction instruction to correct the target category of the image to be recognized;
  • the adding interface 808 is used to generate a category addition instruction to add the image to be recognized and its corrected target category to the retrieval library.
  • the electronic device is a smartphone that provides browsing of recognition results.
  • the smartphone displays a browsing page for browsing the recognition results, and the browsing page displays a correction interface and an adding interface.
  • the correction interface and the addition interface are essentially controls that can realize human-computer interaction.
  • the controls can be input boxes, selection boxes, buttons, switches, progress bars, etc.
  • the user finds that the target category of the image to be recognized is a new category, he can trigger the corresponding operation on the correction interface.
  • for the correction interface, if the corresponding operation triggered by the user is detected, a category correction instruction is generated to instruct the electronic device to correct the target category of the image to be recognized in response to the category correction instruction.
  • for example, the correction interface is an input box where the user enters the name of the new category, and the user's input is regarded as the corresponding operation triggered on the correction interface. Similarly, when the corrected target category of the image to be recognized is a new category, the user can trigger the corresponding operation on the adding interface.
  • for the adding interface, if the corresponding operation triggered by the user is detected, a category adding instruction is generated to instruct the electronic device to add the image to be recognized and its corrected target category to the retrieval library in response to the category adding instruction.
  • for example, the adding interface is a "confirm/cancel" button that the user clicks, and the user's click is regarded as the corresponding operation triggered on the adding interface.
  • depending on the input method of the electronic device, the specific behavior of the corresponding operation triggered by the user will differ.
  • for example, for an electronic device with a touch screen, the triggered operations may be gesture operations such as tapping, touching, and sliding; or, if the electronic device is a laptop equipped with a mouse, the triggered operations may be mechanical operations such as clicking, double-clicking, and dragging, which are not specifically limited in this embodiment.
  • the image recognition framework based on image retrieval has the characteristic that, by adding a new category to the retrieval library, the target category of an image to be recognized can immediately be recognized as the new category; retraining is therefore not always necessary for this framework, which helps delay the need for retraining and reduce its frequency, providing more convenience and greater flexibility for image recognition.
  • an embodiment of the present application provides an image recognition device 900, including but not limited to: an image acquisition module 910, a feature extraction module 930, an image search module 950, and an image recognition module 970.
  • the image acquisition module 910 is used to acquire the image to be recognized.
  • the feature extraction module 930 is used to extract image features from the image to be recognized to obtain the first feature vector.
  • the image search module 950 is used to search for sample images whose similarity between the second feature vector and the first feature vector satisfies the similarity condition in the retrieval database used to store sample images and their corresponding sample categories.
  • the second feature vector is used to represent the image features of the sample image.
  • the image recognition module 970 is used to determine the target category of the image to be recognized based on the sample category corresponding to the found sample image.
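The cooperation of these modules can be sketched end to end: compare the query's first feature vector against the precomputed second feature vectors in the retrieval library and return the category of the most similar sample. The library entries, the 3-d feature vectors, and the use of cosine similarity are hypothetical stand-ins for real extractor output:

```python
import numpy as np

# Hypothetical retrieval library: sample id -> (second feature vector, category).
library = {
    "copepod_001": (np.array([0.9, 0.1, 0.0]), "copepod"),
    "diatom_042":  (np.array([0.1, 0.9, 0.1]), "diatom"),
    "larva_007":   (np.array([0.0, 0.2, 0.9]), "fish larva"),
}

def recognize(first_vec):
    """Return the sample category of the most similar library entry."""
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    best = max(library.values(), key=lambda entry: cos(first_vec, entry[0]))
    return best[1]

query = np.array([0.85, 0.2, 0.05])  # hypothetical first feature vector
print(recognize(query))              # copepod
```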
  • Figure 12 shows a schematic structural diagram of an electronic device according to an exemplary embodiment.
  • the electronic device is suitable for the server 130 in the implementation environment shown in FIG. 1 .
  • this electronic device is only an example adapted to the present application and cannot be considered to provide any limitation on the scope of use of the present application.
  • the electronic device is also not to be construed as being dependent on or required to have one or more components of the exemplary electronic device 2000 shown in FIG. 12 .
  • the hardware structure of the electronic device 2000 may vary greatly due to different configurations or performance.
  • the electronic device 2000 includes: a power supply 210, an interface 230, at least one memory 250, and at least one central processing unit (CPU) 270.
  • the power supply 210 is used to provide operating voltage for each hardware device on the electronic device 2000 .
  • the interface 230 includes at least one wired or wireless network interface for interacting with external devices. For example, the interaction between the collection terminal 110 and the server terminal 130 in the implementation environment shown in Figure 1 is performed.
  • the interface 230 may further include at least one serial-to-parallel conversion interface 233, at least one input-output interface 235, at least one USB interface 237, etc., as shown in Figure 12; this is not intended to constitute a specific limitation here.
  • the memory 250 can be a read-only memory, a random access memory, a magnetic disk or an optical disk, etc.
  • the resources stored thereon include an operating system 251, application programs 253, data 255, etc., and the storage may be transient or permanent.
  • the operating system 251 is used to manage and control each hardware device and the application programs 253 on the electronic device 2000, so that the central processing unit 270 can operate on and process the massive data 255 in the memory 250; it can be Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
  • the application program 253 is a computer program that performs at least one specific job based on the operating system 251; it may include at least one module (not shown in Figure 12), and each module may include a series of computer programs for the electronic device 2000.
  • the image recognition device can be regarded as an application program 253 deployed on the electronic device 2000.
  • the data 255 may be photos, pictures, etc. stored in a disk, or may be an image to be recognized, etc., stored in the memory 250 .
  • the central processing unit 270 may include one or more processors and is configured to communicate with the memory 250 through at least one communication bus, so as to read the computer programs stored in the memory 250 and thereby operate on and process the massive data 255 in the memory 250. For example, the image recognition method is completed by the central processing unit 270 reading a series of computer programs stored in the memory 250.
  • the present application can also be implemented through hardware circuits or hardware circuits combined with software; therefore, implementation of the present application is not limited to any specific hardware circuit, software, or combination of the two.
  • the electronic device 4000 may be a desktop computer, a notebook computer, or another electronic device, etc.
  • the electronic device 4000 includes at least one processor 4001 , at least one communication bus 4002 and at least one memory 4003 .
  • the processor 4001 and the memory 4003 are connected, such as through a communication bus 4002.
  • the electronic device 4000 may also include a transceiver 4004, which may be used for data interaction between this electronic device and other electronic devices, such as data transmission and/or data reception. It should be noted that in practical applications the number of transceivers 4004 is not limited to one, and the structure of the electronic device 4000 does not constitute a limitation on the embodiments of the present application.
  • the processor 4001 can be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application-Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with this disclosure.
  • the processor 4001 may also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of a DSP and a microprocessor, etc.
  • Communication bus 4002 may include a path that carries information between the above-mentioned components.
  • the communication bus 4002 may be a PCI (Peripheral Component Interconnect) bus or an EISA (Extended Industry Standard Architecture) bus, etc.
  • the communication bus 4002 can be divided into an address bus, a data bus, a control bus, etc. For ease of presentation, only one thick line is used in Figure 13, but it does not mean that there is only one bus or one type of bus.
  • the memory 4003 can be a ROM (Read-Only Memory) or other type of static storage device capable of storing static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device capable of storing information and instructions, an EEPROM (Electrically Erasable Programmable Read-Only Memory), a CD-ROM (Compact Disc Read-Only Memory) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store the desired program code in the form of instructions or data structures and that can be accessed by a computer, without limitation.
  • the computer program is stored in the memory 4003, and the processor 4001 reads the computer program stored in the memory 4003 through the communication bus 4002.
  • embodiments of the present application provide a storage medium, and a computer program is stored on the storage medium.
  • the computer program is executed by a processor, the image recognition method in the above embodiments is implemented.
  • An embodiment of the present application provides a computer program product.
  • the computer program product includes a computer program, and the computer program is stored in a storage medium.
  • the processor of the electronic device reads the computer program from the storage medium, and the processor executes the computer program, so that the electronic device performs the image recognition method in the above embodiments.
  • the image recognition framework based on image retrieval uses the powerful image feature representation brought by supervised contrastive learning, so that in the feature space, positive examples belonging to the same category cluster together and negative examples belonging to different categories are pushed apart.
  • this not only avoids relying on model retraining, but also effectively improves the recognition efficiency of image recognition and fully guarantees its recognition accuracy.
  • the retrieval library in the image recognition framework is suitable not only for re-training but also for user adjustment, which is conducive to providing flexible, customized services for recognition tasks with different attributes and scopes.
  • for example, to handle diverse recognition tasks, the number of sample categories in the retrieval library should be expanded as much as possible so that the image recognition ability can take diversity into account;
  • conversely, for a specific recognition task, the sample categories in the retrieval library can be limited, that is, impossible sample categories can be excluded, which not only reduces the amount of similarity computation but also prevents the image to be recognized from being misidentified as an impossible category, indirectly ensuring the recognition performance of image recognition;
  • the size of the retrieval library can thus be further reduced to include only the sample categories of interest.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Library & Information Science (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application relate to the technical field of image processing. Provided are an image recognition method and apparatus. The method comprises: acquiring an image to be recognized; performing image feature extraction on the image to be recognized, so as to obtain a first feature vector; searching a retrieval library, which is used for storing sample images and sample categories corresponding to the sample images, for a sample image in which the similarity between a second feature vector and the first feature vector meets a similarity condition, wherein the second feature vector is used for representing an image feature of the sample image; and determining, according to a sample category corresponding to the found sample image, a target category of the image to be recognized. By means of the embodiments of the present application, the problems of low recognition accuracy, instability, and poor generalization performance in the related art can be solved.

Description

图像识别方法及装置Image recognition method and device 技术领域Technical field
本申请涉及图像处理技术领域,具体而言,本申请涉及一种图像识别方法及装置。The present application relates to the field of image processing technology. Specifically, the present application relates to an image recognition method and device.
背景技术Background technique
图像识别是计算机视觉领域中的一个重要的研究课题,已被广泛应用在众多领域。例如,面向海洋环境中的浮游生物的图像识别,以实现对该浮游生物长期、连续的原位观测。Image recognition is an important research topic in the field of computer vision and has been widely used in many fields. For example, it is aimed at image recognition of plankton in the marine environment to achieve long-term, continuous in-situ observation of the plankton.
目前,图像识别通常是利用训练集训练卷积神经网络模型,再根据该卷积神经网络模型对待识别图像进行类别预测,以得到待识别图像的目标类别。在基于图像分类的上述图像识别方案中,需要不断地更新训练集,进而使得卷积神经网络模型也随之开展较为频繁地重训练,方能够维持基于该卷积神经网络模型所进行的图像识别的识别性能。At present, image recognition usually uses a training set to train a convolutional neural network model, and then predicts the category of the image to be recognized based on the convolutional neural network model to obtain the target category of the image to be recognized. In the above-mentioned image recognition scheme based on image classification, the training set needs to be continuously updated, which in turn causes the convolutional neural network model to be retrained more frequently in order to maintain the image recognition based on the convolutional neural network model. recognition performance.
然而,训练集的更新依赖于大量的人工标注和人工校正,由此,如何在减少人工参与的前提下,提高识别准确率、稳健性,进而保证泛化性能是尚待解决的问题。However, the update of the training set relies on a large amount of manual annotation and manual correction. Therefore, how to improve the recognition accuracy and robustness while reducing manual participation, and thus ensure the generalization performance, is an issue that remains to be solved.
发明内容Contents of the invention
本申请各实施例提供了一种图像识别方法、装置、电子设备及存储介质,可以解决相关技术中存在的识别准确率不高、不稳健、泛化性能不佳的问题。所述技术方案如下:Each embodiment of the present application provides an image recognition method, device, electronic device, and storage medium, which can solve the problems of low recognition accuracy, instability, and poor generalization performance in related technologies. The technical solutions are as follows:
根据本申请实施例的一个方面,一种图像识别方法,包括:获取待识别图像;对所述待识别图像进行图像特征提取,得到第一特征向量;在用于存储样本图像及其对应的样本类别的检索库中,查找第二特征向量和所述第一特征向量的相似度满足相似条件的样本图像,所述第二特征向量用于表示所述样本图像的图像特征;根据查找到的样本图像所对应的样本类别,确定所 述待识别图像的目标类别。According to one aspect of the embodiment of the present application, an image recognition method includes: obtaining an image to be recognized; performing image feature extraction on the image to be recognized to obtain a first feature vector; and storing sample images and their corresponding samples. In the retrieval library of the category, search for sample images whose similarity between the second feature vector and the first feature vector satisfies the similarity condition, and the second feature vector is used to represent the image features of the sample image; according to the found sample The sample category corresponding to the image determines the target category of the image to be recognized.
根据本申请实施例的一个方面,一种图像识别装置,包括:图像获取模块,用于获取待识别图像;特征提取模块,用于对所述待识别图像进行图像特征提取,得到第一特征向量;图像查找模块,用于在用于存储样本图像及其对应的样本类别的检索库中,查找第二特征向量和所述第一特征向量的相似度满足相似条件的样本图像,所述第二特征向量用于表示所述样本图像的图像特征;图像识别模块,用于根据查找到的样本图像所对应的样本类别,确定所述待识别图像的目标类别。According to one aspect of the embodiment of the present application, an image recognition device includes: an image acquisition module, used to acquire an image to be recognized; a feature extraction module, used to extract image features from the image to be recognized, to obtain a first feature vector ; Image search module, used to search for sample images whose similarity between the second feature vector and the first feature vector satisfies the similarity condition in the retrieval library used to store sample images and their corresponding sample categories, the second The feature vector is used to represent the image features of the sample image; the image recognition module is used to determine the target category of the image to be recognized based on the sample category corresponding to the found sample image.
在一示例性实施例中,所述特征提取模块包括:提取器单元,用于利用完成模型训练的特征提取器,将所述待识别图像转换为所述第一特征向量。In an exemplary embodiment, the feature extraction module includes: an extractor unit, configured to convert the image to be recognized into the first feature vector using a feature extractor that has completed model training.
在一示例性实施例中,所述装置还包括:模型训练模块,用于根据训练集中的图像对,对基础模型进行模型训练,得到所述特征提取器,所述基础模型包括第一训练分支和第二训练分支,所述第一训练分支和所述第二训练分支分别包括特征提取层和降维层;所述模型训练模块包括:图像遍历单元,用于对所述训练集中的图像对进行遍历,所述图像对包括正样本对和负样本对,所述正样本对中的两个样本图像属于相同的样本类别,所述负样本对中的两个样本图像属于不同的样本类别;所述遍历包括:将所述图像对中的两个样本图像,分别输入所述第一训练分支和所述第二训练分支进行处理;根据所述第一训练分支和所述第二训练分支得到的处理结果,计算模型损失值;收敛单元,用于若所述模型损失值使得收敛条件被满足,则由所述基础模型中的特征提取层收敛得到所述特征提取器。In an exemplary embodiment, the device further includes: a model training module, configured to perform model training on a basic model according to the image pairs in the training set to obtain the feature extractor, where the basic model includes a first training branch and a second training branch, the first training branch and the second training branch respectively include a feature extraction layer and a dimensionality reduction layer; the model training module includes: an image traversal unit, used to perform image pairs in the training set Traverse, the image pair includes a positive sample pair and a negative sample pair, the two sample images in the positive sample pair belong to the same sample category, and the two sample images in the negative sample pair belong to different sample categories; The traversal includes: inputting two sample images in the image pair into the first training branch and the second training branch respectively for processing; obtaining according to the first training branch and the second training branch. The processing result is to calculate the model loss value; the convergence unit is used to obtain the feature extractor by converging the feature extraction layer in the basic model if the convergence condition is satisfied by the model loss value.
In an exemplary embodiment, the apparatus further includes an image pair construction module, which includes: an augmentation unit configured to apply at least two different image data augmentation operations to each sample image in the training set, so that at least a first augmented image and a second augmented image are derived from the sample image; and a pairing unit configured to pair the first and second augmented images derived from the sample images in the training set to obtain the image pairs.
In an exemplary embodiment, the image search module includes: a similarity calculation unit configured to calculate, for each second feature vector in a feature vector set, the similarity between that second feature vector and the first feature vector, the feature vector set being constructed from the second feature vectors of the sample images in the retrieval library; and an image search unit configured to take the sample image whose second feature vector has the highest similarity to the first feature vector as the sample image found in the retrieval library.
In an exemplary embodiment, the apparatus further includes a set construction module configured to construct the feature vector set from the second feature vectors of the sample images in the retrieval library. The set construction module includes: a vector adding unit configured to perform image feature extraction on each sample image in the retrieval library, obtain the second feature vector of each sample image, and add it to the feature vector set; a vector traversal unit configured to traverse the second feature vectors in the feature vector set, take each traversed second feature vector as a first vector, and calculate the similarities between the first vector and the remaining second feature vectors in the set to obtain first similarities; and a vector deletion unit configured to delete, based on the first similarities, second feature vectors with high redundancy from the feature vector set, the redundancy indicating the number of similar second feature vectors present in the set.
In an exemplary embodiment, the vector deletion unit includes: a vector determination subunit configured to take a second feature vector whose first similarity to the first vector is greater than a first set threshold as a second vector; a similarity calculation subunit configured to calculate the similarities between the second vector and the remaining second feature vectors in the feature vector set to obtain second similarities; a redundancy calculation subunit configured to determine the redundancy of the first vector from the number of second feature vectors whose first similarity to the first vector is greater than the first set threshold, and to determine the redundancy of the second vector from the number of second feature vectors whose second similarity to the second vector is greater than a second set threshold; and a deletion subunit configured to delete the first vector from the feature vector set if the redundancy of the first vector is greater than the redundancy of the second vector.
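The redundancy-based deletion described above can be sketched as follows. The cosine-similarity measure, the threshold value, and the single greedy pass are assumptions made for illustration; the patent leaves these details open.

```python
import numpy as np

def prune_redundant(features, threshold=0.9):
    """Redundancy-based pruning of a gallery of feature vectors.

    A vector's redundancy is the number of other vectors whose similarity
    to it exceeds the threshold. For each first vector i, any kept vector j
    more similar than the threshold is a second vector; if i is more
    redundant than j, i is deleted from the set.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = f @ f.T                         # pairwise cosine similarities
    np.fill_diagonal(sims, 0.0)            # ignore self-similarity
    red = (sims > threshold).sum(axis=1)   # redundancy of every vector
    keep = np.ones(len(f), dtype=bool)
    for i in range(len(f)):
        if not keep[i]:
            continue
        for j in np.where((sims[i] > threshold) & keep)[0]:
            if red[i] > red[j]:
                keep[i] = False            # delete the more redundant vector
                break
    return features[keep]

gallery = np.array([
    [1.00, 0.00],                          # hub: similar to both vectors below
    [0.95,  np.sqrt(0.0975)],
    [0.95, -np.sqrt(0.0975)],
    [0.00, 1.00],                          # distinct vector
])
pruned = prune_redundant(gallery, threshold=0.9)  # the hub row is deleted
```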
In an exemplary embodiment, the image recognition module includes an image recognition unit configured to take the sample category corresponding to the found sample image as the target category of the image to be recognized if the second feature vector of the found sample image satisfies a decision condition.
In an exemplary embodiment, the apparatus further includes: a new category correction module configured to correct the target category of the image to be recognized in response to a category correction instruction; and a new category adding module configured to, when the corrected target category of the image to be recognized is a new category, add the image to be recognized and its corrected target category to the retrieval library in response to a category adding instruction, a new category being a corrected target category that differs from the sample categories in the retrieval library.
According to one aspect of the embodiments of the present application, an electronic device includes at least one processor, at least one memory, and at least one communication bus, wherein a computer program is stored in the memory and the processor reads the computer program from the memory through the communication bus; when the computer program is executed by the processor, the image recognition method described above is implemented.
According to one aspect of the embodiments of the present application, a storage medium stores a computer program which, when executed by a processor, implements the image recognition method described above.
According to one aspect of the embodiments of the present application, a computer program product includes a computer program stored in a storage medium; a processor of an electronic device reads the computer program from the storage medium and executes it, so that the electronic device implements the image recognition method described above.
The technical solution provided by this application yields the following beneficial effects:
In the above technical solution, based on the first feature vector of the image to be recognized, a sample image whose second feature vector has a similarity to the first feature vector satisfying the similarity condition is searched for in the retrieval library storing sample images and their corresponding sample categories, and the target category of the image to be recognized is then determined from the sample category corresponding to the found sample image. This realizes an image recognition scheme in which image retrieval replaces image classification. Because the recognition accuracy of image retrieval depends on the sample images and their corresponding sample categories in the retrieval library, rather than, as with image classification, on frequent changes to the training set and retraining of a convolutional neural network model, recognition accuracy and robustness can be substantially improved and generalization performance adequately guaranteed while minimizing manual involvement, thereby effectively solving the problems of low recognition accuracy, poor robustness, and poor generalization performance in the related art.
Description of the Drawings
In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below.
Figure 1 is a schematic diagram of an implementation environment involved in this application;
Figure 2 is a flow chart of an image recognition method according to an exemplary embodiment;
Figure 3 is a schematic diagram showing an image to be recognized that is an ROI image, according to an exemplary embodiment;
Figure 4 is a schematic structural diagram of a basic model according to an exemplary embodiment;
Figure 5 is a schematic structural diagram of a feature extraction layer according to an exemplary embodiment;
Figure 6 is a flow chart of a model training process of the feature extraction layer according to an exemplary embodiment;
Figure 7 is a schematic diagram of an image pairing process according to an exemplary embodiment;
Figure 8a is a flow chart of a process of constructing a feature vector set according to an exemplary embodiment;
Figure 8b is a flow chart of one embodiment of step 550 in the embodiment corresponding to Figure 8a;
Figure 9 is a flow chart of another image recognition method according to an exemplary embodiment;
Figure 10 is a schematic diagram of an image recognition framework based on image retrieval according to an exemplary embodiment;
Figure 11 is a structural block diagram of an image recognition apparatus according to an exemplary embodiment;
Figure 12 is a hardware structure diagram of an electronic device according to an exemplary embodiment;
Figure 13 is a structural block diagram of an electronic device according to an exemplary embodiment.
Detailed Description
The embodiments of the present application are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are only used to explain the present application and cannot be construed as limiting it.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said", and "the" used herein may also include the plural forms. It should be further understood that the word "comprising" used in the description of this application refers to the presence of the stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It should be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. In addition, "connected" or "coupled" as used herein may include a wireless connection or wireless coupling. As used herein, the term "and/or" includes all or any unit of, and all combinations of, one or more of the associated listed items.
As mentioned above, in an image recognition scheme based on image classification, the training set needs to be updated continuously, and the convolutional neural network model must accordingly be retrained rather frequently, in order to maintain the recognition performance of image recognition based on that model.
Take image recognition of plankton in the marine environment as an example. The constant change of the natural seawater environment inevitably causes constant change in the categories and abundance of the plankton in it, and fixed-point sampling of plankton in the same sea area at different times undergoes drastic change with the diurnal vertical migration of the plankton, giving rise to data drift. Consequently, in the process of image recognition of plankton, frequent changes to the training set and retraining of the convolutional neural network model are needed to keep the data distribution balanced and stable, so as to maintain recognition performance against the background of continuously changing plankton categories and abundance. However, updating the training set relies, on the one hand, on a large amount of manual annotation and manual correction; on the other hand, a training set built from images sampled at limited spatio-temporal scales and resolutions can hardly reflect the plankton in the real marine environment fully and faithfully. All of this inevitably affects the recognition accuracy of image recognition and cannot meet the needs of real-time observation of plankton in the marine environment.
It can be seen from the above that the related art still has the limitations of low recognition accuracy, poor robustness, and poor generalization performance.
To this end, the image recognition method provided by this application can effectively improve recognition accuracy and robustness while adequately guaranteeing generalization performance. Accordingly, the method is suitable for an image recognition apparatus, which can be deployed on an electronic device with a von Neumann architecture, for example a desktop computer, a laptop computer, or a server.
In order to make the purpose, technical solutions, and advantages of the present application clearer, the embodiments of the present application are further described in detail below with reference to the accompanying drawings.
Figure 1 is a schematic diagram of an implementation environment involved in an image recognition method. It should be noted that this implementation environment is only an example adapted to the present invention and should not be considered as limiting the scope of use of the present invention in any way.
The implementation environment includes a collection terminal 110 and a server 130.
Specifically, the collection terminal 110, which can also be regarded as an image acquisition device, includes but is not limited to electronic devices with a shooting function, such as webcams, cameras, and camcorders. For example, the collection terminal 110 is an underwater camera.
The server 130 can be an electronic device such as a desktop computer, a laptop computer, or a server; it can also be a computer cluster composed of multiple servers, or even a cloud computing center composed of multiple servers. The server 130 is used to provide background services; for example, the background services include but are not limited to an image recognition service.
A network communication connection is established in advance between the server 130 and the collection terminal 110 through wired or wireless means, and data transmission between the server 130 and the collection terminal 110 is implemented through this connection. The transmitted data includes but is not limited to the image to be recognized, etc.
In one application scenario, through interaction between the collection terminal 110 and the server 130, the collection terminal 110 captures and collects an image to be recognized and uploads it to the server 130 to request the image recognition service provided by the server 130.
For the server 130, after receiving the image to be recognized uploaded by the collection terminal 110, the image recognition service is invoked so as to search the retrieval library storing sample images and their corresponding sample categories for a sample image similar to the image to be recognized, and then determine the target category of the image to be recognized from the sample category corresponding to the found sample image. This realizes an image recognition scheme in which image retrieval replaces image classification, thereby solving the problems of low recognition accuracy, poor robustness, and poor generalization performance in the related art.
Referring to Figure 2, an embodiment of the present application provides an image recognition method suitable for an electronic device, which may specifically be the server 130 in the implementation environment shown in Figure 1.
The following description takes the electronic device as the execution subject of each step of the method as an example, but this does not constitute a specific limitation.
As shown in Figure 2, the method may include the following steps:
Step 310: obtain the image to be recognized.
The image to be recognized is generated by the image acquisition device in the implementation environment shown in Figure 1 photographing and collecting an environment containing a target object. The target object refers to an object in the shooting environment; for example, the target object may be an underwater creature, specifically plankton in a marine environment.
It can be understood that shooting can be a single shot or continuous shooting. For the same target object, continuous shooting yields a video, and the image to be recognized can be any number of frames of that video; multiple shots yield multiple photos, and the image to be recognized can be any number of those photos. That is to say, the image to be recognized in this embodiment may be a dynamic image, such as multiple frames of a video or multiple photos, or a static image, such as any single frame of a video or any single one of multiple photos. Accordingly, the image recognition in this embodiment can be performed on dynamic images or on static images, which is not limited here.
Regarding acquisition, the image to be recognized may come from images captured and collected in real time by the image acquisition device, or may be images captured and collected by the image acquisition device during a historical period and stored in advance on the electronic device. For the electronic device, after the image acquisition device captures and collects the image to be recognized, the image can be processed in real time, or stored first and processed later, for example when the CPU load of the electronic device is low, or according to instructions from staff. Therefore, the image recognition in this embodiment can be performed on images to be recognized acquired in real time or on those acquired during a historical period, which is not specifically limited here.
In one possible implementation, the image to be recognized is an ROI (region of interest) image; that is, in the image to be recognized, the target object is located in the region of interest, or in other words, the target object is marked by the region of interest and clearly distinguished from the background region. As shown in Figure 3, in the ROI image the target object is plankton, located in the region of interest (the gray-white area) and clearly distinguished from the background region (the black area).
Step 330: perform image feature extraction on the image to be recognized to obtain a first feature vector.
The first feature vector represents the image features of the image to be recognized; in other words, the first feature vector is an accurate description of those image features. It should be understood that different images to be recognized yield different extracted image features and, correspondingly, different first feature vectors.
In one possible implementation, image feature extraction can be implemented through feature extraction algorithms such as histogram of oriented gradients (HOG) features, local binary pattern (LBP) features, and Haar-like features.
In one possible implementation, image feature extraction is implemented through convolution kernels. It should be noted that different numbers and sizes of convolution kernels yield first feature vectors of different lengths, reflecting the image to be recognized at different scales.
In one possible implementation, image feature extraction is implemented through a feature extractor; specifically, a feature extractor that has completed model training is used to convert the image to be recognized into the first feature vector.
Step 350: in the retrieval library used to store sample images and their corresponding sample categories, search for a sample image whose second feature vector has a similarity to the first feature vector satisfying the similarity condition.
First, the retrieval library essentially establishes a correspondence between sample images and their corresponding sample categories; through this correspondence, the sample category corresponding to a sample image can be determined quickly, which then serves as the basis for image retrieval. In one possible implementation, a sample image is an image annotated with a sample category; in other words, a sample image is an image carrying a label indicating its sample category.
Image retrieval essentially measures the similarity between the image to be recognized and the sample images in the retrieval library. Image recognition based on image retrieval does not obtain the target category of the image to be recognized directly; rather, it obtains it indirectly by comparing the similarity between the image to be recognized and the sample images in the retrieval library: first, the sample category corresponding to a sample image whose similarity to the image to be recognized satisfies the similarity condition is obtained, and from it the target category of the image to be recognized is derived.
Secondly, in this embodiment, the comparison of similarity between the image to be recognized and the sample images in the retrieval library is implemented by calculating the similarity between the first feature vector and the second feature vectors, where the first feature vector represents the image features of the image to be recognized and a second feature vector represents the image features of a sample image in the retrieval library.
In one possible implementation, the similarity calculation scheme includes but is not limited to: cosine similarity, Euclidean distance, Manhattan distance, Jaccard similarity coefficient, Pearson correlation coefficient, and so on.
Taking cosine similarity as an example, the similarity calculation process is as follows:
Similarity(x, y) = (x · y) / (‖x‖ · ‖y‖)   (1)
In formula (1), Similarity(x, y) denotes the similarity between x and y, with a value range of [0, 1]; x denotes the first feature vector of the image to be recognized, and y denotes the second feature vector of a sample image. It should be understood that the closer the similarity is to 1, the closer the first feature vector and the second feature vector are, i.e., the more similar the image to be recognized and the sample image are.
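Formula (1) can be illustrated in a few lines; the example vectors below are arbitrary and only serve to show the computation:

```python
import numpy as np

def cosine_similarity(x, y):
    """Formula (1): Similarity(x, y) = (x . y) / (||x|| * ||y||)."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

x = np.array([0.6, 0.8, 0.0])   # first feature vector (image to be recognized)
y = np.array([0.6, 0.8, 0.0])   # identical second feature vector
z = np.array([0.8, 0.6, 0.0])   # a different gallery vector

print(cosine_similarity(x, y))  # identical vectors -> ~1.0
print(cosine_similarity(x, z))  # similar but not identical -> ~0.96
```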
It is worth mentioning that, as described above, the image to be recognized is not limited to a static image such as one photo or one frame, but may also be a dynamic image. If the image to be recognized is a dynamic image, such as multiple photos or multiple frames, formula (1) can be combined with formula (2) to calculate multiple similarities at the same time.
V = Q × G^T   (2)
Here, V denotes the similarity result matrix, Q denotes the matrix of first feature vectors of the image to be recognized, and G denotes the matrix of second feature vectors of the sample images in the retrieval library.
Based on this, in the similarity result matrix V, the value in each column of row i represents the similarity between the first feature vector of the i-th photo or i-th frame of the image to be recognized and the second feature vector of the corresponding sample image in the retrieval library. In this way, not only is similarity calculation efficiency greatly improved, but simultaneous recognition of multiple photos/frames is also realized, which facilitates batch processing of images to be recognized and can effectively improve recognition efficiency.
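Formula (2) reduces each batch of similarity computations to a single matrix product. The sketch below assumes the rows of Q and G are L2-normalized so that every entry of V equals the cosine similarity of formula (1); the shapes and names are illustrative.

```python
import numpy as np

def batch_similarity(Q, G):
    """Formula (2): V = Q x G^T over L2-normalized feature rows.

    Q: (num_queries, dim)  first feature vectors of the images to be recognized
    G: (num_gallery, dim)  second feature vectors of the retrieval library
    Returns V of shape (num_queries, num_gallery); V[i, j] is the cosine
    similarity between query i and gallery image j.
    """
    Qn = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    Gn = G / np.linalg.norm(G, axis=1, keepdims=True)
    return Qn @ Gn.T

rng = np.random.default_rng(42)
Q = rng.normal(size=(5, 128))     # 5 frames to recognize
G = rng.normal(size=(1000, 128))  # 1000 gallery images
V = batch_similarity(Q, G)
best = V.argmax(axis=1)           # most similar gallery image per frame
```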
In one possible implementation, the similarity condition means the highest similarity, so the sample image whose second feature vector has the highest similarity to the first feature vector is taken as the sample image found in the retrieval library. Of course, in other embodiments, the similarity condition may also mean that the similarity exceeds a similarity threshold (for example, a threshold of 0.8), or that the similarity rank is within a set rank (for example, the top 10); in that case, the sample images whose similarity to the first feature vector is greater than 0.8, or the top-10 most similar sample images, are taken as the sample images found in the retrieval library.
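The three variants of the similarity condition above (highest similarity, fixed threshold, top-k rank) can each be read directly off one row of the similarity results; the scores below are made up for illustration:

```python
import numpy as np

sims = np.array([0.95, 0.40, 0.85, 0.10, 0.82])  # similarities of one query
                                                 # to five gallery images

top1 = int(sims.argmax())                        # highest-similarity condition
above = np.where(sims > 0.8)[0]                  # threshold condition (0.8)
k = 3
topk = np.argsort(sims)[::-1][:k]                # top-k rank condition

print(top1)            # 0
print(above.tolist())  # [0, 2, 4]
print(topk.tolist())   # [0, 2, 4]
```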
In one possible implementation, the second feature vectors are pre-computed and stored in the storage area of the electronic device. In this way, when performing image recognition on different images to be recognized, the pre-computed second feature vectors can be read directly from the storage area, avoiding repeated extraction of the second feature vectors in every image recognition pass and thereby further improving the recognition efficiency of image recognition.
In one possible implementation, the second feature vectors are stored in the storage area of the electronic device as a LUT (look-up table); during image recognition, the LUT can then be loaded directly into the memory of the electronic device, thereby avoiding repeated extraction of the second feature vectors in every image recognition pass.
The above process is especially suitable for image recognition of out-of-distribution samples. For example, for new categories that did not participate in training, an image recognition scheme based on image classification not only affects classification accuracy but also leads to inaccurate abundance quantification, whereas an image recognition scheme based on image retrieval can exclude out-of-distribution samples fairly precisely through similarity calculation, thereby effectively guaranteeing the recognition accuracy of image recognition.
Step 370: determine the target category of the image to be recognized according to the sample category corresponding to the found sample image.

That is to say, the sample category corresponding to the found sample image is the recognition result of the image to be recognized, i.e., its target category.
The inventors realized that the target category of the image to be recognized may be a new category, i.e., one that does not belong to any sample category corresponding to the sample images in the retrieval library; in other words, the target category is unknown. In that case, the target category cannot actually be obtained correctly from the sample category corresponding to the found sample image.

On this basis, in order to avoid recognition errors, this embodiment proposes a decision condition that rejects the recognition of unknown categories.
In one possible implementation, the decision condition is that the similarity between the image to be recognized and the found sample image is greater than a similarity threshold. The category decision process based on this condition is then as follows: if the similarity between the second feature vector of the found sample image and the first feature vector of the image to be recognized is greater than the similarity threshold, the sample category corresponding to the found sample image is taken as the target category of the image to be recognized; otherwise, the target category of the image to be recognized is determined to be a new category.
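This decision condition amounts to a few lines of code; a sketch, assuming the best match and its similarity have already been obtained from the retrieval step, with a hypothetical sentinel value for the rejected case:

```python
NEW_CATEGORY = "new_category"  # hypothetical sentinel for an unknown class

def decide_category(best_label, best_similarity, threshold=0.8):
    """Accept the retrieved sample category only when the similarity between
    the found sample's second feature vector and the query's first feature
    vector exceeds the threshold; otherwise reject and report a new category."""
    if best_similarity > threshold:
        return best_label
    return NEW_CATEGORY
```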
Of course, in other embodiments, the decision condition may also relate to weights configured for the found sample images, which is not specifically limited here.

Through the above process, an image recognition scheme in which image retrieval replaces image classification is realized. Since the recognition accuracy of image retrieval depends on the sample images in the retrieval library and their corresponding sample categories, rather than on frequent changes to the training set and retraining of a convolutional neural network model as image classification does, recognition accuracy can be fully guaranteed while minimizing manual involvement, effectively solving the problems of low and unstable recognition accuracy and poor generalization in the related art.
Figure 4 is a schematic structural diagram of the basic model in one embodiment. In Figure 4, the basic model includes a first training branch and a second training branch, each of which includes a feature extraction layer and a dimensionality reduction layer. The feature extraction layer can be regarded as a feature extractor whose training is not yet complete and is used to extract image features; the dimensionality reduction layer consists of two fully connected layers and further reduces the dimensionality of the feature vector produced by the feature extraction layer, for example converting a feature vector of length 2048 into one of length 128.
Figure 5 is a schematic structural diagram of the feature extraction layer in one embodiment. In Figure 5, the feature extraction layer is a convolutional neural network model with a depth of 50 layers and no fully connected layers. As shown in Figure 5, in addition to convolution layers (Conv), pooling layers (Pool) and activation layers (ReLU), the 50-layer structure is built on ResNeXt modules and additionally introduces SE (Squeeze-and-Excitation) attention modules. This not only gives the feature vectors obtained from this layer strong abstract expressive power, but also, with the aid of the attention mechanism, lets the network focus on the parts of the image that matter most for recognition, such as the region of interest in an ROI image, fully ensuring that image features are extracted more effectively.
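The SE attention module mentioned above can be summarized as squeeze (global average pooling), excitation (two fully connected layers) and channel reweighting. A NumPy sketch under the usual SE formulation; the weight shapes and names are illustrative, not taken from the patent:

```python
import numpy as np

def se_block(feature_map, w1, b1, w2, b2):
    """Squeeze-and-Excitation on one feature map of shape (C, H, W).
    w1/b1: channel-reducing FC layer; w2/b2: channel-restoring FC layer."""
    squeeze = feature_map.mean(axis=(1, 2))            # squeeze: global average pool -> (C,)
    hidden = np.maximum(0.0, w1 @ squeeze + b1)        # excitation: FC + ReLU
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))  # excitation: FC + sigmoid -> (C,)
    return feature_map * scale[:, None, None]          # reweight each channel
```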
The model training process of the feature extraction layer, i.e., training the basic model on the image pairs in the training set to obtain the feature extractor, is now described in detail with reference to Figures 4 to 7.

Referring to Figure 6, in an exemplary embodiment, the model training process may include the following steps:

Step 410: traverse the image pairs in the training set.

The image pairs include positive sample pairs and negative sample pairs: the two sample images in a positive pair belong to the same sample category, while the two sample images in a negative pair belong to different sample categories.

The construction of the image pairs is described here:
As shown in Figure 7, at least two different image data augmentation operations are applied to a sample image 701 in the training set, so that at least a first augmented image 7011 and a second augmented image 7012 are generated from sample image 701. Image data augmentation includes, but is not limited to, random cropping, rotation, flipping, grayscale conversion, and brightness, contrast and saturation adjustment, which are not limited here.

The first and second augmented images generated from each sample image in the training set are then paired to obtain the image pairs.

For example, suppose the training set contains sample images 701 and 702. Correspondingly, the first and second augmented images generated from sample image 701 are 7011 and 7012, and those generated from sample image 702 are 7021 and 7022.

After pairing, the constructed image pairs are {7011, 7012}, {7011, 7021}, {7011, 7022}, {7012, 7021}, {7012, 7022} and {7021, 7022}. Among them, {7011, 7012} and {7021, 7022} are positive sample pairs, while {7011, 7021}, {7011, 7022}, {7012, 7021} and {7012, 7022} are negative sample pairs.
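The pairing rule in this example can be sketched as follows; the dict keys and numeric view ids mirror the figure and are otherwise arbitrary:

```python
from itertools import combinations

def build_pairs(augmented_views):
    """augmented_views maps a sample id to its augmented views, e.g.
    {701: [7011, 7012], 702: [7021, 7022]}.  Views of the same sample form
    positive pairs; views of different samples form negative pairs."""
    flat = [(sid, view) for sid, views in augmented_views.items() for view in views]
    positives, negatives = [], []
    for (sid_a, a), (sid_b, b) in combinations(flat, 2):
        (positives if sid_a == sid_b else negatives).append((a, b))
    return positives, negatives
```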
The traversal of the image pairs in the training set may then include the following steps:

Step 411: input the two sample images of an image pair into the first training branch and the second training branch, respectively, for processing.

As shown in Figure 4, in one possible implementation, the processing in the first or second training branch at least includes extracting image features through the feature extraction layer and reducing the dimensionality of the feature vector through the dimensionality reduction layer.
It is worth mentioning that, to avoid distortion, the sample images are preprocessed before being input into the first or second training branch. In one possible implementation, the preprocessing includes, but is not limited to, padding, scaling and normalization. Since distortion is avoided in this way, recognition accuracy is further improved.

Preprocessing such as padding and scaling ensures a uniform input size for the first and second training branches, for example 224×224.

Normalization preprocessing means that, after encoding preprocessing, each pixel of the sample image is normalized according to the following formula (3):
I_Norm = (I - mean) / std    (3)
where I_Norm denotes a normalized pixel of the sample image and I denotes the pixel to be processed;

mean and std denote the pixel mean and pixel standard deviation, respectively, computed over all pixels of all sample images in the training set.
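Formula (3) reads directly as code; a sketch assuming the training images are stacked into one NumPy array:

```python
import numpy as np

def normalize(images, mean=None, std=None):
    """Pixel-wise normalization of formula (3).  mean and std default to the
    statistics of all pixels of all images passed in (the training set)."""
    if mean is None:
        mean = images.mean()
    if std is None:
        std = images.std()
    return (images - mean) / std
```

At inference time the same training-set mean and std would be passed in explicitly rather than recomputed.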
Step 413: compute the model loss value from the processing results of the first and second training branches.

In one possible implementation, the model loss value is computed by the following formula (4):
L_sup = Σ_{i∈I} ( -1 / |P(i)| ) Σ_{p∈P(i)} log [ exp(z_i · z_p / τ) / Σ_{α∈A(i)} exp(z_i · z_α / τ) ]    (4)
where L_sup denotes the model loss value;

I denotes the set of all sample images in the training set; P(i) denotes the set of sample images that form positive pairs with the i-th sample image, excluding the i-th sample image itself; A(i) denotes the set of all sample images in the training set except the i-th sample image;

|P(i)| denotes the number of sample images in the set P(i);

z_i is the feature vector of the i-th sample image in the set I; z_p is the feature vector of the p-th sample image in the set P(i); z_α is the feature vector of the α-th sample image in the set A(i); τ is a temperature hyperparameter that balances how much the loss function attends to positive versus negative sample pairs.
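A direct NumPy transcription of formula (4), assuming L2-normalized feature vectors and that every sample has at least one positive partner in the batch, as the positive-pair construction guarantees:

```python
import numpy as np

def sup_con_loss(z, labels, tau=0.1):
    """Supervised contrastive loss of formula (4).
    z: (N, D) array of L2-normalised feature vectors; labels: length-N list."""
    n = len(z)
    sim = (z @ z.T) / tau  # pairwise dot products scaled by the temperature
    loss = 0.0
    for i in range(n):
        a_i = [a for a in range(n) if a != i]             # A(i)
        p_i = [p for p in a_i if labels[p] == labels[i]]  # P(i)
        denom = np.sum(np.exp(sim[i, a_i]))
        loss += (-1.0 / len(p_i)) * sum(
            np.log(np.exp(sim[i, p]) / denom) for p in p_i)
    return loss
```

Well-separated same-class clusters drive the loss toward zero, which is exactly the pull-together/push-apart behavior described below.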
If the model loss value satisfies the convergence condition, step 430 is executed.

Otherwise, if the model loss value does not satisfy the convergence condition, step 415 is executed.

It should be noted that the convergence condition may mean that the model loss value is minimal or below a loss threshold, or that the number of iterations reaches an iteration threshold; it is not limited here and can be set flexibly according to the actual needs of the application scenario.

Step 415: update the parameters of the basic model and return to step 410.

Step 430: obtain the feature extractor from the converged feature extraction layer of the basic model.
At this point, the supervised contrastive training of the feature extraction layer is complete, so that the feature extractor pulls the two sample images of a positive pair closer together in feature space and pushes the two sample images of a negative pair farther apart.

It is worth mentioning that after training is complete, the dual training branches and the dimensionality reduction layers are discarded; only one of the two feature extraction layers is retained as the feature extractor for subsequent image recognition. Compared with the convolutional neural network model used in image classification, the model structure is greatly simplified, which further avoids relying on frequent changes to the training set to maintain recognition performance and is more conducive to improving recognition accuracy.
In an exemplary embodiment, the above method may further include the following step: constructing a feature vector set from the second feature vectors of the sample images in the retrieval library.

In one possible implementation, the feature vector set is a LUT.

As mentioned above, the second feature vectors may be stored in a storage area of the electronic device as a LUT, so that they need not be re-extracted in every recognition pass, which improves recognition efficiency. However, the inventors also realized that as the number of sample images in the retrieval library grows, the number of pre-computed second feature vectors in the LUT grows with it. Since the similarity between the first feature vector and every second feature vector in the LUT must be computed, the number of second feature vectors in the LUT affects the speed of the similarity computation and hence the efficiency of image recognition.

For this reason, this embodiment proposes a construction process for the feature vector set that realizes LUT pruning: it reduces the size of the LUT, i.e., the number of second feature vectors it contains, while preserving the diversity of the second feature vectors in the LUT as far as possible.
Specifically, as shown in Figure 8a, the construction of the feature vector set may include the following steps:

Step 510: extract image features from every sample image in the retrieval library to obtain the second feature vector of each sample image, and add it to the feature vector set.

Step 530: traverse the second feature vectors in the feature vector set; taking each traversed second feature vector as a first vector, compute the similarity between the first vector and each of the remaining second feature vectors in the set to obtain first similarities.

Step 550: based on the first similarities, delete second feature vectors with high redundancy from the feature vector set.

Here, the redundancy of a second feature vector indicates how many similar second feature vectors exist for it in the feature vector set. It should be understood that the higher the redundancy, the more similar second feature vectors exist for it in the set; the corresponding sample image can then be regarded as redundant, and the second feature vector can be deleted from the set.
In one possible implementation, as shown in Figure 8b, the LUT pruning process may include the following steps:

Step 551: take each second feature vector whose first similarity to the first vector is greater than a first set threshold as a second vector.

Step 553: compute the similarity between each second vector and each of the remaining second feature vectors in the feature vector set to obtain second similarities.

Step 555: determine the redundancy of the first vector from the number of second feature vectors whose first similarity to the first vector is greater than the first set threshold, and determine the redundancy of each second vector from the number of second feature vectors whose second similarity to that second vector is greater than a second set threshold.

Here, the redundancy of the first vector indicates how many similar second feature vectors exist for it in the feature vector set, "similar" meaning that the first similarity is greater than the first set threshold.

The redundancy of a second vector indicates how many similar second feature vectors exist for it in the feature vector set, "similar" meaning that the second similarity is greater than the second set threshold.

Step 557: based on these redundancies, delete the corresponding second feature vector from the feature vector set.

Specifically, if the redundancy of the first vector is greater than that of a second vector, the first vector is deleted from the feature vector set; conversely, if the redundancy of a second vector is greater than that of the first vector, that second vector is deleted.
For example, suppose the second feature vectors in the feature vector set are A, B, C and D.

Suppose the currently traversed second feature vector is A, taken as the first vector. The similarities between the first vector A and the remaining second feature vectors B, C and D are computed, giving first similarities of 0.91, 0.95 and 0.97.
Suppose the first similarities 0.91, 0.95 and 0.97 are all greater than the first set threshold (0.8), so the second feature vectors B, C and D are taken as second vectors. The similarities between the second vector B and the remaining second feature vectors A, C and D are then computed, giving second similarities of 0.91, 0.7 and 0.97; the similarities between the second vector C and the remaining second feature vectors A, B and D give second similarities of 0.95, 0.7 and 0.75; and the similarities between the second vector D and the remaining second feature vectors A, B and C give second similarities of 0.97, 0.97 and 0.75 (the similarity measure is symmetric, so for instance the similarity between B and D is 0.97 from either direction).

Suppose the second set threshold is also 0.8. From the above, the number of second feature vectors whose first similarity to the first vector A is greater than 0.8 is 3 (B, C, D); the number whose second similarity to the second vector B is greater than 0.8 is 2 (A, D); the number whose second similarity to the second vector C is greater than 0.8 is 1 (A); and the number whose second similarity to the second vector D is greater than 0.8 is 2 (A, B).

If redundancy is expressed directly as this count, the redundancy of the first vector A is 3, that of the second vector B is 2, that of the second vector C is 1, and that of the second vector D is 2. On this basis, the first vector A, whose redundancy of 3 is the largest, is deleted from the feature vector set. Of course, in other embodiments redundancy may also be expressed in other forms, for example normalized by the count, which is not specifically limited here.
The first set threshold and the second set threshold may be the same or different; both can be adjusted flexibly according to the actual needs of the application scenario to balance recognition efficiency and recognition accuracy. For example, in application scenarios demanding high recognition efficiency, a smaller first set threshold can be set.
With the above embodiments, different second feature vectors in the LUT are retained only when they are sufficiently far apart in feature space, i.e., when their mutual similarity does not exceed the first set threshold. This realizes LUT pruning, which reduces the size of the LUT while preserving the diversity of its second feature vectors as far as possible.
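The pruning idea can be sketched greedily as follows. This is an illustrative variant: it recomputes every remaining vector's redundancy after each deletion instead of comparing only the first vector with its second vectors, and it assumes cosine similarity with a single threshold.

```python
import numpy as np

def prune_lut(vectors, threshold=0.8):
    """Delete the most redundant feature vector (the one with the most
    neighbours above the similarity threshold) until no similar pair remains.
    Returns the indices of the retained vectors."""
    unit = [v / np.linalg.norm(v) for v in vectors]
    keep = list(range(len(unit)))
    while True:
        redundancy = {i: sum(1 for j in keep
                             if j != i and float(unit[i] @ unit[j]) > threshold)
                      for i in keep}
        worst = max(keep, key=lambda i: redundancy[i])
        if redundancy[worst] == 0:
            return keep
        keep.remove(worst)
```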
In a plankton application scenario, suppose there are 200 plankton categories with 1000 sample images each; the retrieval library then contains 200,000 sample images, and the LUT contains at most 200,000 second feature vectors. Taking a LUT on an NVIDIA RTX 3090 GPU as an example, recognizing one image takes at most 5.8 ms, which fully satisfies the needs of real-time plankton observation in the marine environment.
Referring to Figure 9, in an exemplary embodiment, the above method may further include the following steps:

Step 610: correct the target category of the image to be recognized in response to a category correction instruction.

Step 630: if the corrected target category of the image to be recognized is a new category, add the image to be recognized and its corrected target category to the retrieval library in response to a category addition instruction.

Here, a new category means that the corrected target category of the image to be recognized differs from the sample categories in the retrieval library.
As mentioned above, in image recognition schemes based on image classification, recognition performance partly depends on extensive manual annotation and manual correction. Because the manual workload is large and the manual involvement cycle is short, such schemes are not conducive to reducing the cost of image recognition and cannot realize a relatively robust, flexible and inexpensive automatic image recognition scheme.

For this reason, in this embodiment the image-retrieval-based image recognition scheme provides a human-computer interaction interface, which helps to discover and correct recognition deviations in time and thus fully guarantees recognition performance.
Specifically, Figure 10 is a schematic diagram of an image-retrieval-based image recognition framework in one embodiment. In Figure 10, the framework includes: a query image module (query) 801 for obtaining the image to be recognized; a retrieval library (gallery) 802 for storing sample images and their corresponding sample categories; a feature extractor 803 for extracting image features; a LUT 804 for storing the second feature vectors; a metric module 805 for computing the similarity between the first feature vector and the second feature vectors; a decision module 806 for determining the target category of the image to be recognized; and a human-computer interaction interface.

As shown in Figure 10, the human-computer interaction interface includes a correction interface 807 and an addition interface 808. The correction interface 807 generates category correction instructions for correcting the target category of the image to be recognized; the addition interface 808 generates category addition instructions for adding the image to be recognized and its corrected target category to the retrieval library.

Suppose the electronic device is a smartphone that provides browsing of recognition results; the smartphone displays a browsing page for the recognition results, and the browsing page displays the correction interface and the addition interface. It should be noted that the correction interface and the addition interface are in essence interactive controls, for example input boxes, selection boxes, buttons, switches or progress bars.
Then, if the user finds that the target category of the image to be recognized should be a new category, the user can trigger a corresponding operation on the correction interface; when the correction interface detects the operation, it generates a category correction instruction that instructs the electronic device to correct the target category of the image to be recognized. For example, the correction interface may be an input box in which the user enters the name of the new category, the user's input being regarded as the corresponding operation triggered on the correction interface. Similarly, when the corrected target category of the image to be recognized is a new category, the user can trigger a corresponding operation on the addition interface; when the addition interface detects the operation, it generates a category addition instruction that instructs the electronic device to add the image to be recognized and its corrected target category to the retrieval library. For example, the addition interface may be a "confirm/cancel" button for the user to click, the click being regarded as the corresponding operation triggered on the addition interface.

It should be added that, depending on the input components with which the electronic device is configured, the specific behavior of the user-triggered operation differs. For example, if the electronic device is a smartphone with a touch screen, the triggered operation may be a gesture such as a tap, touch or swipe; if it is a laptop with a mouse, the triggered operation may be a mechanical action such as a single click, double click or drag. This embodiment does not specifically limit this.
Under the above embodiments, effective supervision is introduced into image recognition. On the one hand, this helps to discover and correct recognition deviations in time, fully guaranteeing recognition performance. On the other hand, it effectively enhances the diversity of the sample images in the retrieval library, which not only prevents the degradation of recognition caused by data drift, thereby effectively improving recognition results, but also means that a diverse retrieval library lets the point clusters formed by the sample images cover more regions of the feature space, so that an image to be recognized falling into such a region can have its target category identified more accurately, improving recognition precision.

In addition, the image-retrieval-based framework has the property that simply adding a new category to the retrieval library immediately enables the target category of an image to be recognized as that new category, so retraining is not always necessary for this framework. This helps to postpone the need for retraining and reduce its frequency, providing more convenience and greater flexibility for image recognition.
The following are device embodiments of the present application, which can be used to execute the image recognition method involved in the present application. For details not disclosed in the device embodiments of the present application, please refer to the method embodiments of the image recognition method involved in the present application.

Referring to Figure 11, an embodiment of the present application provides an image recognition apparatus 900 that includes, but is not limited to, an image acquisition module 910, a feature extraction module 930, an image search module 950 and an image recognition module 970.
其中,图像获取模块910,用于获取待识别图像。Among them, the image acquisition module 910 is used to acquire the image to be recognized.
特征提取模块930,用于对待识别图像进行图像特征提取,得到第一特征向量。The feature extraction module 930 is used to extract image features from the image to be recognized to obtain the first feature vector.
图像查找模块950,用于在用于存储样本图像及其对应的样本类别的检索库中,查找第二特征向量和第一特征向量的相似度满足相似条件的样本图像,第二特征向量用于表示样本图像的图像特征。The image search module 950 is used to search for sample images whose similarity between the second feature vector and the first feature vector satisfies the similarity condition in the retrieval database used to store sample images and their corresponding sample categories. The second feature vector is used to Represents the image features of the sample image.
图像识别模块970,用于根据查找到的样本图像所对应的样本类别,确定待识别图像的目标类别。The image recognition module 970 is used to determine the target category of the image to be recognized based on the sample category corresponding to the found sample image.
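The module pipeline above (extract a first feature vector, compare it against the second feature vectors in the retrieval library, return the category of the most similar sample) can be sketched as a nearest-neighbour lookup. The flattened-pixel feature extractor, the cosine similarity measure, and the similarity threshold below are illustrative assumptions rather than the patented implementation:

```python
import math

def extract_features(image):
    # Placeholder feature extractor: in the described framework this would
    # be a trained neural network; here we just take pixel intensities.
    return [float(p) for p in image]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def recognize(image, retrieval_library, threshold=0.5):
    """Return the sample category whose stored second feature vector is
    most similar to the query's first feature vector, or None if no entry
    satisfies the similarity condition (here: similarity > threshold)."""
    query = extract_features(image)          # first feature vector
    best_category, best_sim = None, threshold
    for second_vector, category in retrieval_library:
        sim = cosine_similarity(query, second_vector)
        if sim > best_sim:
            best_category, best_sim = category, sim
    return best_category

# Toy retrieval library: (second feature vector, sample category) pairs.
library = [([1.0, 0.0, 0.0], "copepod"), ([0.0, 1.0, 0.0], "diatom")]
print(recognize([0.9, 0.1, 0.0], library))  # most similar to "copepod"
```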
It should be noted that when the image recognition apparatus provided in the above embodiment performs image processing, the division into the above functional modules is merely illustrative. In practical applications, the above functions may be assigned to different functional modules as needed; that is, the internal structure of the image recognition apparatus may be divided into different functional modules to complete all or part of the functions described above.
In addition, the image recognition apparatus provided in the above embodiments and the embodiments of the image recognition method belong to the same concept. The specific manner in which each module performs its operations has been described in detail in the method embodiments and will not be repeated here.
Figure 12 is a schematic structural diagram of an electronic device according to an exemplary embodiment. The electronic device is suitable for use as the server 130 in the implementation environment shown in Figure 1.
It should be noted that this electronic device is merely an example adapted to the present application and should not be regarded as limiting the scope of use of the present application in any way. Nor should the electronic device be construed as depending on, or required to have, one or more components of the exemplary electronic device 2000 shown in Figure 12.
The hardware structure of the electronic device 2000 may vary greatly with configuration or performance. As shown in Figure 12, the electronic device 2000 includes: a power supply 210, an interface 230, at least one memory 250, and at least one central processing unit (CPU) 270.
Specifically, the power supply 210 provides the operating voltage for the hardware devices on the electronic device 2000.
The interface 230 includes at least one wired or wireless network interface for interacting with external devices, for example, for the interaction between the collection terminal 110 and the server 130 in the implementation environment shown in Figure 1.
Of course, in other examples adapted to the present application, the interface 230 may further include at least one serial-to-parallel conversion interface 233, at least one input/output interface 235, at least one USB interface 237, and the like, as shown in Figure 12; this does not constitute a specific limitation.
As a carrier for resource storage, the memory 250 may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like. The resources stored thereon include an operating system 251, application programs 253, data 255, and the like, and the storage may be transient or persistent.
The operating system 251 manages and controls the hardware devices and the application programs 253 on the electronic device 2000, enabling the central processing unit 270 to operate on and process the massive data 255 in the memory 250; it may be Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
The application program 253 is a computer program that performs at least one specific task on top of the operating system 251. It may include at least one module (not shown in Figure 12), and each module may contain a computer program for the electronic device 2000. For example, the image recognition apparatus may be regarded as an application program 253 deployed on the electronic device 2000.
The data 255 may be photos, pictures, and the like stored on a disk, or may be images to be recognized, and is stored in the memory 250.
The central processing unit 270 may include one or more processors and is configured to communicate with the memory 250 through at least one communication bus, so as to read the computer programs stored in the memory 250 and thereby operate on and process the massive data 255 in the memory 250. For example, the image recognition method may be performed by the central processing unit 270 reading a series of computer programs stored in the memory 250.
In addition, the present application may equally be implemented by hardware circuits or by hardware circuits in combination with software; therefore, implementing the present application is not limited to any specific hardware circuit, software, or combination of the two.
Referring to Figure 13, an embodiment of the present application provides an electronic device 4000, which may include a desktop computer, a notebook computer, an electronic device, and so on.
In Figure 13, the electronic device 4000 includes at least one processor 4001, at least one communication bus 4002, and at least one memory 4003.
The processor 4001 and the memory 4003 are connected, for example, through the communication bus 4002. Optionally, the electronic device 4000 may further include a transceiver 4004, which may be used for data interaction between this electronic device and other electronic devices, such as sending and/or receiving data. It should be noted that, in practical applications, the number of transceivers 4004 is not limited to one, and the structure of the electronic device 4000 does not constitute a limitation on the embodiments of the present application.
The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application-Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or another programmable logic device, transistor logic device, hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. The processor 4001 may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The communication bus 4002 may include a path that carries information between the above components. The communication bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The communication bus 4002 may be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one thick line is used in Figure 13, but this does not mean that there is only one bus or only one type of bus.
The memory 4003 may be a ROM (Read-Only Memory) or another type of static storage device capable of storing static information and instructions, a RAM (Random Access Memory) or another type of dynamic storage device capable of storing information and instructions, an EEPROM (Electrically Erasable Programmable Read-Only Memory), a CD-ROM (Compact Disc Read-Only Memory) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
A computer program is stored in the memory 4003, and the processor 4001 reads the computer program stored in the memory 4003 through the communication bus 4002.
When the computer program is executed by the processor 4001, the image recognition method in each of the above embodiments is implemented.
In addition, an embodiment of the present application provides a storage medium on which a computer program is stored. When the computer program is executed by a processor, the image recognition method in each of the above embodiments is implemented.
An embodiment of the present application provides a computer program product, which includes a computer program stored in a storage medium. A processor of an electronic device reads the computer program from the storage medium and executes it, causing the electronic device to perform the image recognition method in each of the above embodiments.
Compared with the related art, the image-retrieval-based recognition framework, given sample images of good quality in the retrieval library, benefits from the strong image feature representation brought by supervised contrastive learning: in the feature space, positive examples belonging to the same category cluster together, while negative examples belonging to different categories are pushed apart. This not only avoids reliance on model retraining, effectively improving recognition efficiency, but also fully guarantees recognition accuracy.
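The clustering behaviour described here can be illustrated with a toy pairwise contrastive loss: positive pairs are pulled together, and negative pairs are pushed at least a margin apart. This margin-based pair loss is a stand-in sketch; the actual supervised contrastive objective of the embodiments is not reproduced here:

```python
import math

def pair_loss(za, zb, same_class, margin=1.0):
    """Toy contrastive pair loss over two feature vectors: a positive pair
    (same class) is penalized by its squared distance, a negative pair
    (different classes) is penalized only while closer than `margin`."""
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(za, zb)))
    if same_class:
        return dist ** 2                      # positives: pull together
    return max(0.0, margin - dist) ** 2       # negatives: push apart

# A nearby pair is cheap when labelled positive and costly when labelled
# negative, which is what drives same-class clustering in feature space.
print(pair_loss([0.0, 0.0], [0.1, 0.0], same_class=True) <
      pair_loss([0.0, 0.0], [0.1, 0.0], same_class=False))
```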
In addition, the retrieval library in the image recognition framework is suitable not only for retraining but also for user adjustment, facilitating flexible, customized service for recognition tasks of different attributes and scopes. For example, for the recognition of diverse organisms, the number of sample categories covered by the sample images in the retrieval library can be expanded as much as possible so that the recognition capability accommodates that diversity. For plankton recognition in a specific sea area, the sample images in the retrieval library can be limited to certain sample categories, excluding categories that cannot occur; this both reduces the computation required for similarity calculation and prevents an image to be recognized from being misidentified as an impossible category, indirectly ensuring recognition performance. For recognition tasks involving a limited number of organisms of interest, the retrieval library can be further reduced to include only the sample categories of interest.
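Restricting the retrieval library to the sample categories that can actually occur, as suggested for a specific sea area, amounts to filtering its entries before any similarity computation. A minimal sketch, with hypothetical plankton category names:

```python
def restrict_library(retrieval_library, allowed_categories):
    """Keep only entries whose sample category is allowed, e.g. plankton
    taxa known to occur in a given sea area. Fewer entries means fewer
    similarity computations, and impossible categories can never be
    returned as a match."""
    allowed = set(allowed_categories)
    return [(vec, cat) for vec, cat in retrieval_library if cat in allowed]

# Toy library: (second feature vector, sample category) pairs.
library = [([1.0, 0.0], "copepod"),
           ([0.0, 1.0], "diatom"),
           ([0.5, 0.5], "jellyfish")]
coastal = restrict_library(library, {"copepod", "diatom"})
print([cat for _, cat in coastal])  # ['copepod', 'diatom']
```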
It should be understood that, although the steps in the flowcharts of the accompanying drawings are shown sequentially as indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, there is no strict ordering constraint on their execution, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; nor is their execution order necessarily sequential, and they may be executed in turn or alternately with other steps, or with at least part of the sub-steps or stages of other steps.
The above are only some embodiments of the present application. It should be noted that those of ordinary skill in the art may make further improvements and refinements without departing from the principles of the present application, and such improvements and refinements shall also fall within the scope of protection of the present application.

Claims (10)

  1. An image recognition method, characterized in that the method comprises:
    acquiring an image to be recognized;
    performing image feature extraction on the image to be recognized to obtain a first feature vector;
    searching, in a retrieval library for storing sample images and their corresponding sample categories, for a sample image whose second feature vector has a similarity to the first feature vector that satisfies a similarity condition, the second feature vector being used to represent image features of the sample image;
    determining a target category of the image to be recognized according to the sample category corresponding to the found sample image.
  2. The method of claim 1, wherein the performing image feature extraction on the image to be recognized to obtain a first feature vector comprises:
    converting the image to be recognized into the first feature vector by using a feature extractor that has completed model training.
  3. The method of claim 2, wherein the method further comprises: performing model training on a base model according to image pairs in a training set to obtain the feature extractor, the base model comprising a first training branch and a second training branch, the first training branch and the second training branch each comprising a feature extraction layer and a dimensionality reduction layer;
    the performing model training on the base model according to the image pairs in the training set to obtain the feature extractor comprises:
    traversing the image pairs in the training set, the image pairs comprising positive sample pairs and negative sample pairs, the two sample images in a positive sample pair belonging to the same sample category, and the two sample images in a negative sample pair belonging to different sample categories; the traversal comprising:
    inputting the two sample images of an image pair into the first training branch and the second training branch, respectively, for processing;
    calculating a model loss value according to the processing results obtained by the first training branch and the second training branch;
    if the model loss value causes a convergence condition to be satisfied, obtaining the feature extractor by convergence of the feature extraction layer in the base model.
  4. The method of claim 3, wherein the method further comprises: constructing the image pairs in the training set;
    the constructing the image pairs in the training set comprises:
    performing at least two different image data augmentation processes on one of the sample images in the training set, so that at least a first augmented image and a second augmented image are obtained by augmenting the sample image;
    performing image pairing on the first augmented images and the second augmented images obtained by augmenting the sample images in the training set, to obtain the image pairs.
  5. The method of claim 1, wherein the searching, in the retrieval library for storing sample images and their corresponding sample categories, for a sample image whose second feature vector has a similarity to the first feature vector that satisfies the similarity condition comprises:
    for each second feature vector in a feature vector set, calculating the similarity between the second feature vector and the first feature vector, the feature vector set being constructed from the second feature vectors of the sample images in the retrieval library;
    taking the sample image whose second feature vector has the highest similarity to the first feature vector as the sample image found from the retrieval library.
  6. The method of claim 5, wherein the method further comprises: constructing the feature vector set from the second feature vectors of the sample images in the retrieval library;
    the constructing the feature vector set from the second feature vectors of the sample images in the retrieval library comprises:
    performing image feature extraction on each sample image in the retrieval library to obtain the second feature vector of each sample image in the retrieval library, and adding it to the feature vector set;
    traversing the second feature vectors in the feature vector set, taking a traversed second feature vector as a first vector, and calculating the similarity between the first vector and each of the remaining second feature vectors in the feature vector set to obtain first similarities;
    based on the first similarities, deleting second feature vectors with high redundancy from the feature vector set, the redundancy indicating the number of similar second feature vectors present in the feature vector set.
  7. The method of claim 6, wherein the deleting second feature vectors with high redundancy from the feature vector set based on the first similarities comprises:
    taking a second feature vector whose first similarity to the first vector is greater than a first set threshold as a second vector;
    calculating the similarity between the second vector and each of the remaining second feature vectors in the feature vector set to obtain second similarities;
    determining the redundancy of the first vector according to the number of second feature vectors whose first similarity to the first vector is greater than the first set threshold, and determining the redundancy of the second vector according to the number of second feature vectors whose second similarity to the second vector is greater than a second set threshold;
    if the redundancy of the first vector is greater than the redundancy of the second vector, deleting the first vector from the feature vector set.
  8. The method of any one of claims 1 to 7, wherein the determining the target category of the image to be recognized according to the sample category corresponding to the found sample image comprises:
    if the second feature vector of the found sample image satisfies a decision condition, taking the sample category corresponding to the found sample image as the target category of the image to be recognized.
  9. The method of any one of claims 1 to 7, wherein after the determining the target category of the image to be recognized according to the sample category corresponding to the found sample image, the method further comprises:
    correcting the target category of the image to be recognized in response to a category correction instruction;
    in a case where the corrected target category of the image to be recognized is a new category, adding the image to be recognized and its corrected target category to the retrieval library in response to a category addition instruction, the new category meaning that the corrected target category of the image to be recognized is different from the sample categories in the retrieval library.
  10. An image recognition apparatus, characterized in that the apparatus comprises:
    an image acquisition module, configured to acquire an image to be recognized;
    a feature extraction module, configured to perform image feature extraction on the image to be recognized to obtain a first feature vector;
    an image search module, configured to search, in a retrieval library for storing sample images and their corresponding sample categories, for a sample image whose second feature vector has a similarity to the first feature vector that satisfies a similarity condition, the second feature vector being used to represent image features of the sample image;
    an image recognition module, configured to determine a target category of the image to be recognized according to the sample category corresponding to the found sample image.
PCT/CN2022/137039 2022-06-01 2022-12-06 Image recognition method and apparatus WO2023231355A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210617217.1A CN117218356A (en) 2022-06-01 2022-06-01 Image recognition method and device
CN202210617217.1 2022-06-01

Publications (1)

Publication Number Publication Date
WO2023231355A1 true WO2023231355A1 (en) 2023-12-07

Family

ID=89026872

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/137039 WO2023231355A1 (en) 2022-06-01 2022-12-06 Image recognition method and apparatus

Country Status (2)

Country Link
CN (1) CN117218356A (en)
WO (1) WO2023231355A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118035496A (en) * 2024-04-15 2024-05-14 腾讯科技(深圳)有限公司 Video recommendation method, device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107690659A (en) * 2016-12-27 2018-02-13 深圳前海达闼云端智能科技有限公司 A kind of image identification system and image-recognizing method
CN111898416A (en) * 2020-06-17 2020-11-06 绍兴埃瓦科技有限公司 Video stream processing method and device, computer equipment and storage medium
CN112633297A (en) * 2020-12-28 2021-04-09 浙江大华技术股份有限公司 Target object identification method and device, storage medium and electronic device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS: "One article to understand Ranking Loss/Margin Loss/Triplet Loss", 10 August 2020 (2020-08-10), XP093115639, Retrieved from the Internet <URL:https://www.cvmart.net/community/detail/3108> *


Also Published As

Publication number Publication date
CN117218356A (en) 2023-12-12

Similar Documents

Publication Publication Date Title
US11551333B2 (en) Image reconstruction method and device
WO2020228446A1 (en) Model training method and apparatus, and terminal and storage medium
WO2022068196A1 (en) Cross-modal data processing method and device, storage medium, and electronic device
US11604822B2 (en) Multi-modal differential search with real-time focus adaptation
US11605019B2 (en) Visually guided machine-learning language model
WO2019100724A1 (en) Method and device for training multi-label classification model
US20220375213A1 (en) Processing Apparatus and Method and Storage Medium
WO2017096753A1 (en) Facial key point tracking method, terminal, and nonvolatile computer readable storage medium
US20220222918A1 (en) Image retrieval method and apparatus, storage medium, and device
CN113065636B (en) Pruning processing method, data processing method and equipment for convolutional neural network
WO2020186887A1 (en) Target detection method, device and apparatus for continuous small sample images
WO2021218470A1 (en) Neural network optimization method and device
CN112052868A (en) Model training method, image similarity measuring method, terminal and storage medium
WO2023221790A1 (en) Image encoder training method and apparatus, device, and medium
CN113205142A (en) Target detection method and device based on incremental learning
US20210081677A1 (en) Unsupervised Video Object Segmentation and Image Object Co-Segmentation Using Attentive Graph Neural Network Architectures
CN113987119A (en) Data retrieval method, cross-modal data matching model processing method and device
WO2023231355A1 (en) Image recognition method and apparatus
CN115115855A (en) Training method, device, equipment and medium for image encoder
CN112529149A (en) Data processing method and related device
WO2021051562A1 (en) Facial feature point positioning method and apparatus, computing device, and storage medium
US20200151518A1 (en) Regularized multi-metric active learning system for image classification
WO2022127333A1 (en) Training method and apparatus for image segmentation model, image segmentation method and apparatus, and device
US20230072445A1 (en) Self-supervised video representation learning by exploring spatiotemporal continuity
CN114821140A (en) Image clustering method based on Manhattan distance, terminal device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22944656

Country of ref document: EP

Kind code of ref document: A1