WO2023231355A1 - Image recognition method and apparatus - Google Patents

Image recognition method and apparatus

Info

Publication number
WO2023231355A1
WO2023231355A1 · PCT/CN2022/137039
Authority
WO
WIPO (PCT)
Prior art keywords
image, sample, feature vector
Application number
PCT/CN2022/137039
Other languages
French (fr)
Chinese (zh)
Inventor
杨振宇 (Yang Zhenyu)
李剑平 (Li Jianping)
Original Assignee
深圳先进技术研究院 (Shenzhen Institute of Advanced Technology)
Application filed by 深圳先进技术研究院 (Shenzhen Institute of Advanced Technology)
Publication of WO2023231355A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks

Definitions

  • The present application relates to the field of image processing technology, and specifically to an image recognition method and device.
  • Image recognition is an important research topic in the field of computer vision and has been widely applied in many fields, for example, image recognition of plankton in the marine environment to achieve long-term, continuous in-situ observation of plankton.
  • Image recognition usually uses a training set to train a convolutional neural network model, and then predicts the category of the image to be recognized based on the convolutional neural network model to obtain the target category of the image to be recognized.
  • The training set needs to be continuously updated, which in turn causes the convolutional neural network model to be retrained more frequently in order to maintain the recognition performance of image recognition based on the convolutional neural network model.
  • Each embodiment of the present application provides an image recognition method, device, electronic device, and storage medium, which can solve the problems of low recognition accuracy, instability, and poor generalization performance in related technologies.
  • the technical solutions are as follows:
  • An image recognition method includes: obtaining an image to be recognized; performing image feature extraction on the image to be recognized to obtain a first feature vector; in a retrieval library that stores sample images and their corresponding sample categories, searching for sample images whose second feature vector's similarity to the first feature vector satisfies a similarity condition, where the second feature vector is used to represent the image features of the sample image; and determining the target category of the image to be recognized according to the sample category corresponding to the found sample image.
  • An image recognition device includes: an image acquisition module, configured to acquire an image to be recognized; a feature extraction module, configured to extract image features from the image to be recognized to obtain a first feature vector; an image search module, configured to search, in a retrieval library that stores sample images and their corresponding sample categories, for sample images whose second feature vector's similarity to the first feature vector satisfies a similarity condition, where the second feature vector is used to represent the image features of the sample image; and an image recognition module, configured to determine the target category of the image to be recognized based on the sample category corresponding to the found sample image.
  • the feature extraction module includes: an extractor unit, configured to convert the image to be recognized into the first feature vector using a feature extractor that has completed model training.
  • the device further includes: a model training module, configured to perform model training on a basic model according to the image pairs in the training set to obtain the feature extractor, where the basic model includes a first training branch and a second training branch, the first training branch and the second training branch respectively include a feature extraction layer and a dimensionality reduction layer;
  • The model training module includes: an image traversal unit, configured to traverse the image pairs in the training set, where the image pairs include positive sample pairs and negative sample pairs, the two sample images in a positive sample pair belong to the same sample category, and the two sample images in a negative sample pair belong to different sample categories;
  • the traversal includes: inputting the two sample images in the image pair into the first training branch and the second training branch respectively for processing, and calculating the model loss value according to the processing results obtained by the first training branch and the second training branch;
  • a convergence unit, configured to obtain the feature extractor from the converged feature extraction layer in the basic model if the model loss value satisfies the convergence condition.
  • The device further includes an image pair building module, which includes: an amplification unit, configured to perform at least two different image data enhancement processes on each sample image in the training set, so that at least a first enhanced image and a second enhanced image are obtained by amplifying the sample image; and a pairing unit, configured to perform image pairing processing on the first enhanced images and second enhanced images obtained by amplifying the sample images in the training set, to obtain the image pairs.
  • The image search module includes: a similarity calculation unit, configured to calculate, for each second feature vector in a feature vector set, the similarity between that second feature vector and the first feature vector, where the feature vector set is constructed from the second feature vectors of the sample images in the retrieval library; and an image search unit, configured to select the sample image whose second feature vector has the highest similarity to the first feature vector as the sample image found from the retrieval library.
  • The device further includes a set building module, configured to build the feature vector set from the second feature vectors of the sample images in the retrieval library. The set building module includes: a vector adding unit, configured to extract image features from each sample image in the retrieval library, obtain the second feature vector of each sample image, and add it to the feature vector set; a vector traversal unit, configured to traverse the second feature vectors in the feature vector set, take the traversed second feature vector as a first vector, and calculate the similarity between the first vector and the remaining second feature vectors in the feature vector set to obtain first similarities; and a vector deletion unit, configured to delete second feature vectors with high redundancy from the feature vector set based on the first similarities, where the redundancy indicates the number of similar second feature vectors in the feature vector set.
  • The vector deletion unit includes: a vector determination subunit, configured to take a second feature vector whose first similarity to the first vector is greater than a first set threshold as a second vector; a similarity calculation subunit, configured to calculate the similarity between the second vector and the remaining second feature vectors in the feature vector set to obtain second similarities; and a redundancy calculation subunit, configured to determine the redundancy of the first vector according to the number of second feature vectors whose first similarity is greater than the first set threshold and the number of second feature vectors whose second similarity to the second vector is greater than a second set threshold.
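As a concrete illustration of the vector traversal and vector deletion units above, the following sketch prunes near-duplicate second feature vectors from a feature vector set. The similarity function, the threshold values, and the simplified single-threshold redundancy rule are illustrative assumptions; the claims describe a more elaborate two-threshold rule.

```python
import math

def cosine(x, y):
    # Plain cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

def prune_redundant(vectors, sim, sim_threshold, max_redundancy):
    """One reading of the vector deletion unit: a vector's redundancy is the
    number of other vectors in the set whose similarity to it exceeds the
    (first) set threshold; the most redundant vectors are deleted until no
    vector's redundancy exceeds max_redundancy, keeping the set compact."""
    kept = list(vectors)
    changed = True
    while changed:
        changed = False
        # redundancy of each kept vector with respect to the current set
        red = [sum(1 for j, v in enumerate(kept) if j != i and sim(u, v) > sim_threshold)
               for i, u in enumerate(kept)]
        if kept and max(red) > max_redundancy:
            kept.pop(red.index(max(red)))  # delete the most redundant vector
            changed = True
    return kept
```

With three nearly identical vectors and one distinct vector, one near-duplicate is removed while the distinct vector survives.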
  • The image recognition module includes an image recognition unit, configured to use the sample category corresponding to the found sample image as the target category of the image to be recognized.
  • The device further includes: a new category correction module, configured to correct the target category of the image to be recognized in response to a category correction instruction; and a new category adding module, configured to add the image to be recognized and its corrected target category to the retrieval library.
  • An electronic device includes at least one processor, at least one memory, and at least one communication bus, where a computer program is stored in the memory and the processor reads the computer program from the memory through the communication bus;
  • when the computer program is executed by the processor, the image recognition method described above is implemented.
  • A storage medium has a computer program stored thereon; when the computer program is executed by a processor, the image recognition method described above is implemented.
  • A computer program product includes a computer program stored in a storage medium; a processor of an electronic device reads the computer program from the storage medium and executes it, so that the electronic device implements the image recognition method described above.
  • Figure 1 is a schematic diagram of an implementation environment involved in this application.
  • Figure 2 is a flow chart of an image recognition method according to an exemplary embodiment
  • Figure 3 is a schematic diagram showing that the image to be recognized is an ROI image according to an exemplary embodiment
  • Figure 4 is a schematic structural diagram of a basic model according to an exemplary embodiment
  • Figure 5 is a schematic structural diagram of a feature extraction layer according to an exemplary embodiment
  • Figure 6 is a method flow chart of the model training process of the feature extraction layer according to an exemplary embodiment
  • Figure 7 is a schematic diagram of an image pairing process according to an exemplary embodiment
  • Figure 8a is a method flow chart of a construction process of a feature vector set according to an exemplary embodiment
  • Figure 8b is a method flow chart in one embodiment of step 550 involved in the corresponding embodiment of Figure 8a;
  • Figure 9 is a flow chart of another image recognition method according to an exemplary embodiment.
  • Figure 10 is a schematic diagram of an image recognition framework based on image retrieval according to an exemplary embodiment
  • Figure 11 is a structural block diagram of an image recognition device according to an exemplary embodiment
  • Figure 12 is a hardware structure diagram of an electronic device according to an exemplary embodiment
  • Figure 13 is a structural block diagram of an electronic device according to an exemplary embodiment.
  • The training set needs to be continuously updated, which in turn causes the convolutional neural network model to be retrained more frequently in order to maintain the recognition performance of image recognition based on the convolutional neural network model.
  • On the one hand, updating the training set relies on a large amount of manual annotation and manual correction; on the other hand, a training set constructed by sampling images at limited spatial and temporal scales and resolutions can hardly fully and faithfully reflect the plankton in the real marine environment. These limitations inevitably affect the recognition accuracy of image recognition and cannot meet the needs of real-time observation of plankton in the marine environment.
  • The image recognition method provided by this application can effectively improve recognition accuracy and robustness and fully ensure generalization performance. Accordingly, the image recognition method is suitable for an image recognition device, and the image recognition device can be deployed in electronic equipment configured with the von Neumann architecture; for example, the electronic equipment can be a desktop computer, a laptop computer, a server, etc.
  • Figure 1 is a schematic diagram of an implementation environment involved in an image recognition method. It should be noted that this implementation environment is only an example adapted to the present invention and cannot be considered to provide any limitation on the scope of the present invention.
  • the implementation environment includes a collection terminal 110 and a server terminal 130.
  • the collection terminal 110 can also be considered as an image collection device, including but not limited to a camera, a still camera, a camcorder and other electronic devices with a shooting function.
  • the collection terminal 110 is an underwater camera.
  • the server 130 can be an electronic device such as a desktop computer, a laptop computer, a server, etc., or it can be a computer cluster composed of multiple servers, or even a cloud computing center composed of multiple servers.
  • the server 130 is used to provide background services.
  • the background services include but are not limited to image recognition services and so on.
  • a network communication connection is established in advance between the server 130 and the collection terminal 110 through wired or wireless means, and data transmission between the server 130 and the collection terminal 110 is implemented through the network communication connection.
  • the transmitted data includes but is not limited to: images to be recognized, etc.
  • the collection terminal 110 captures and collects the image to be recognized, and uploads the image to be recognized to the server 130 to request the server 130 to provide image recognition services.
  • The image recognition service is called to search the retrieval database, which stores sample images and their corresponding sample categories, for sample images similar to the image to be recognized, and then determine the target category of the image to be recognized based on the sample category corresponding to the found sample image. This realizes an image recognition solution that replaces image classification with image retrieval, thereby solving the problems of low recognition accuracy, poor robustness, and poor generalization performance existing in related technologies.
  • the electronic equipment can be the server 130 in the implementation environment shown in Figure 1.
  • the method may include the following steps:
  • Step 310 Obtain the image to be recognized.
  • the image to be recognized is generated by photographing and collecting the environment containing the target object by the image acquisition device in the implementation environment shown in Figure 1 .
  • the target object refers to an object in the shooting environment.
  • the target object may be an underwater creature, and specifically the underwater creature may be a plankton in a marine environment.
  • the shooting can be a single shooting or a continuous shooting.
  • a video can be obtained, and the image to be recognized can be any number of frames in the video.
  • multiple photos can be obtained, and the image to be recognized can be any number of photos among the multiple photos.
  • The image to be recognized in this embodiment may refer to a dynamic image, such as multiple frames in a video or multiple photos, or a static image, such as any single frame in a video or any single photo.
  • the image recognition in this embodiment can be performed on dynamic images or on static images, which is not limited here.
  • The image to be recognized can be captured and collected in real time by the image acquisition device, or it can be an image captured by the image acquisition device in a historical time period and pre-stored in the electronic device. For the electronic device, after the image acquisition device captures and collects the image to be recognized, the image can be processed in real time or stored for later processing; for example, it can be processed when the CPU usage of the electronic device is low, or according to the instructions of the staff. Therefore, the image recognition in this embodiment can be based on an image to be recognized obtained in real time or obtained in a historical time period, which is not specifically limited here.
  • The image to be recognized is an ROI (region of interest) image; that is, in the image to be recognized, the target object is located in the region of interest, and the region of interest is significantly different from the background region.
  • the target object is plankton, located in the area of interest (gray-white area), which is significantly different from the background area (black area).
  • Step 330 Extract image features from the image to be recognized to obtain a first feature vector.
  • The first feature vector is used to represent the image features of the image to be recognized; it can also be considered an accurate description of those image features. It should be understood that different images to be recognized yield different extracted image features and, correspondingly, different first feature vectors.
  • Image feature extraction can be implemented through feature extraction algorithms such as histogram of oriented gradients (HOG) features, local binary pattern (LBP) features, and Haar-like features.
  • image feature extraction is achieved through a convolution kernel. It should be noted that based on different numbers and different sizes of convolution kernels, first feature vectors of different lengths will be obtained to reflect the image to be recognized from different scales.
  • the image features are extracted through a feature extractor.
  • the feature extractor that has completed model training is used to convert the image to be recognized into a first feature vector.
  • Step 350 In the retrieval database used to store sample images and their corresponding sample categories, search for sample images whose similarity between the second feature vector and the first feature vector satisfies the similarity condition.
  • the retrieval database essentially establishes a correspondence between sample images and their corresponding sample categories.
  • the sample category corresponding to the sample object can be quickly determined, which is then used as the basis for image retrieval.
  • the sample image refers to an image labeled with a sample category.
  • the sample image refers to an image carrying a label indicating the sample category.
  • The essence of image retrieval is to measure the similarity between the image to be recognized and the sample images in the retrieval database.
  • Image recognition based on image retrieval does not directly obtain the target category of the image to be recognized; instead, it indirectly obtains the target category by comparing the similarity between the image to be recognized and the sample images in the retrieval database: it first obtains the sample category corresponding to the sample image whose similarity to the image to be recognized satisfies the similarity condition, and from that obtains the target category of the image to be recognized.
  • the comparison of the similarity between the image to be recognized and the sample image in the retrieval database is achieved by calculating the similarity between the first feature vector and the second feature vector.
  • the first feature vector is used to represent the image features of the image to be recognized
  • the second feature vector is used to represent the image features of the sample images in the retrieval database.
  • the similarity calculation scheme includes but is not limited to: cosine similarity, Euclidean distance, Manhattan distance, Jaccard similarity coefficient, Pearson correlation coefficient, etc.
  • Similarity(x, y) represents the similarity between x and y, and the value range of this similarity is [0, 1]; x represents the first feature vector of the image to be recognized, and y represents the second feature vector of the sample image. It should be understood that the closer the similarity is to 1, the closer the first feature vector and the second feature vector are, that is, the more similar the image to be recognized is to the sample image.
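The text references a similarity calculation formula whose exact form is not reproduced in this excerpt. A minimal sketch, assuming cosine similarity rescaled from [-1, 1] to the stated [0, 1] range (one common choice, not necessarily the patent's formula (1)):

```python
import math

def cosine_similarity(x, y):
    """Similarity between two feature vectors, mapped to [0, 1].

    Assumption: cosine similarity rescaled by (cos + 1) / 2 so that
    identical directions give 1.0 and opposite directions give 0.0.
    """
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    cos = dot / (nx * ny)
    return (cos + 1.0) / 2.0
```

Identical vectors score 1.0, orthogonal vectors 0.5, and opposite vectors 0.0.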
  • The image to be recognized is not limited to a static image, such as a photo or a single frame, but can also be a dynamic image. If the image to be recognized refers to a dynamic image, such as multiple photos or multiple frames, calculation formula (1) and calculation formula (2) can be combined to calculate multiple similarities at the same time.
  • V represents the similarity result matrix;
  • Q represents the first feature vector matrix of the image to be recognized;
  • G represents the second feature vector matrix of the sample images in the retrieval database.
  • The value in the j-th column of the i-th row of V represents the similarity between the first feature vector of the i-th photo or i-th frame in the image to be recognized and the second feature vector of the j-th sample image in the retrieval database.
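The batched similarity described above (the matrix V computed from Q and G) can be sketched as follows. The rescaled-cosine similarity is an assumption, since formulas (1) and (2) are not reproduced in this excerpt:

```python
import math

def cos01(x, y):
    # Assumed per-pair similarity in [0, 1] (rescaled cosine).
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return (dot / (nx * ny) + 1.0) / 2.0

def similarity_matrix(Q, G):
    """V[i][j]: similarity between the i-th query feature vector (row of Q,
    one per photo/frame of the image to be recognized) and the j-th gallery
    feature vector (row of G, one per sample image in the retrieval database)."""
    return [[cos01(q, g) for g in G] for q in Q]
```

Each row of V can then be scanned for the most similar sample image of the corresponding photo or frame.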
  • the similarity condition refers to the highest degree of similarity. Therefore, the sample image with the highest degree of similarity between the second feature vector and the first feature vector is used as the sample image found from the retrieval database.
  • The second feature vector is pre-calculated and stored in the storage area of the electronic device. In this way, when performing image recognition on different images to be recognized, the pre-calculated second feature vector can be read directly from the storage area, avoiding repeated extraction of the second feature vector in each image recognition process and thereby further improving the recognition efficiency of image recognition.
  • The second feature vector is stored in the storage area of the electronic device in the form of an LUT (look-up table). During the image recognition process, the LUT can be loaded directly into the memory of the electronic device, which avoids repeated extraction of the second feature vector in each image recognition process.
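A minimal sketch of pre-computing the second feature vectors once and persisting them as a look-up table. The JSON file format and the `extract` callable are illustrative assumptions; the patent only states that the vectors are stored as an LUT:

```python
import json
import os
import tempfile

def build_lut(samples, extract):
    """Precompute second feature vectors once.

    samples: dict mapping sample image id -> image data;
    extract: hypothetical feature extractor returning a feature vector.
    """
    return {img_id: extract(img) for img_id, img in samples.items()}

def save_lut(lut, path):
    # Persist the LUT so later recognitions skip feature re-extraction.
    with open(path, "w") as f:
        json.dump(lut, f)

def load_lut(path):
    # Load the precomputed LUT directly into memory at recognition time.
    with open(path) as f:
        return json.load(f)
```

At recognition time only `load_lut` is needed, so each query image costs one feature extraction instead of one per sample image.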
  • the above process is especially suitable for image recognition of out-of-distribution samples.
  • For out-of-distribution samples, an image recognition scheme based on image classification will not only affect the accuracy of classification, but also lead to inaccuracies in abundance quantification.
  • the image recognition solution based on image retrieval can more accurately exclude out-of-distribution samples through similarity calculation, thereby effectively ensuring the recognition accuracy of image recognition.
  • Step 370 Determine the target category of the image to be recognized based on the sample category corresponding to the found sample image.
  • the sample category corresponding to the found sample image is the recognition result obtained by image recognition of the image to be recognized, that is, the target category of the image to be recognized.
  • The target category of the image to be recognized may be a new category, that is, one that does not belong to any sample category corresponding to the sample images in the retrieval database; it can also be understood as an unknown category. In this case, the target category of the image to be recognized cannot actually be obtained correctly from the sample category corresponding to the found sample image.
  • a decision condition is proposed to reject the recognition of unknown categories, thereby avoiding recognition errors.
  • the decision condition refers to that the similarity between the image to be recognized and the found sample image is greater than the similarity threshold.
  • The category decision-making process based on this decision condition specifically refers to: if the similarity between the second feature vector of the found sample image and the first feature vector of the image to be recognized is greater than the similarity threshold, the sample category corresponding to the found sample image is used as the target category of the image to be recognized; otherwise, the target category of the image to be recognized is determined to be a new category.
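The category decision process above can be sketched as follows. The threshold value 0.8 and the pluggable `sim` function are illustrative assumptions; the patent does not specify a numeric threshold:

```python
def recognize(first_vec, retrieval_db, sim, threshold=0.8):
    """retrieval_db: list of (second_feature_vector, sample_category).

    Returns the sample category of the most similar sample image, or
    "new category" if the best similarity does not exceed the threshold —
    the decision condition that rejects unknown categories.
    """
    best_sim, best_cat = -1.0, None
    for vec, cat in retrieval_db:
        s = sim(first_vec, vec)
        if s > best_sim:
            best_sim, best_cat = s, cat
    return best_cat if best_sim > threshold else "new category"
```

A query close to a stored sample returns that sample's category; a query far from everything is rejected as a new category rather than mislabeled.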
  • the decision-making condition may also be related to the weight configured for the found sample image, which is not specifically limited here.
  • In this way, an image recognition solution in which image retrieval replaces image classification is implemented. Since the recognition accuracy of image retrieval depends on the sample images in the retrieval database and their corresponding sample categories, rather than on frequent changes to the training set and retraining of the convolutional neural network model as image classification does, recognition accuracy can be fully ensured while minimizing manual participation, effectively solving the problems of low recognition accuracy, instability, and poor generalization performance in related technologies.
  • Figure 4 shows a schematic structural diagram of the basic model in one embodiment.
  • the basic model includes a first training branch and a second training branch.
  • The first training branch and the second training branch each include a feature extraction layer and a dimensionality reduction layer.
  • the feature extraction layer can be considered as a feature extractor that has not completed model training and is used to extract image features;
  • The dimensionality reduction layer consists of two fully connected layers and is used to further reduce the dimensionality of the feature vector obtained by the feature extraction layer, for example, converting the feature vector of length 2048 obtained by the feature extraction layer into a feature vector of length 128.
  • Figure 5 shows a schematic structural diagram of the feature extraction layer in one embodiment.
  • the feature extraction layer is a convolutional neural network model with a structural depth of 50 layers and does not include a fully connected layer.
  • (Figure 5 labels: convolution layers Conv, pooling layers Pool, activation function layers ReLU.)
  • it is also based on the ResNeXt module and introduces the SE (Squeeze-and-Excitation) attention module.
  • The feature vector obtained by this feature extraction layer has strong abstract expression ability, and with the assistance of the attention mechanism it can focus on the parts of the image that play a major role in recognition, such as the region of interest in the ROI image, thereby ensuring that image features can be extracted more effectively.
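A minimal, pure-Python sketch of the SE (Squeeze-and-Excitation) mechanism mentioned above: squeeze via channel-wise global average pooling, excitation via two small fully connected layers with a sigmoid gate, then channel-wise rescaling. The weights `w1` and `w2` are hypothetical placeholders; a real implementation would be a module in a deep learning framework:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def se_block(feature_maps, w1, w2):
    """feature_maps: list of 2-D channel maps (lists of rows).

    w1, w2: hypothetical weight matrices of the two fully connected
    excitation layers (rows of w1 have one weight per channel).
    """
    # squeeze: one scalar per channel via global average pooling
    s = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_maps]
    # excitation: FC -> ReLU -> FC -> sigmoid, one gate per channel
    h = [max(0.0, sum(wi * si for wi, si in zip(row, s))) for row in w1]
    g = [sigmoid(sum(wi * hi for wi, hi in zip(row, h))) for row in w2]
    # scale: reweight each channel by its gate value
    return [[[v * g[c] for v in row] for row in ch] for c, ch in enumerate(feature_maps)]
```

The gate values lie in (0, 1), so channels the excitation deems informative are kept near full strength while others are suppressed.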
  • model training process may include the following steps:
  • Step 410 Traverse the image pairs in the training set.
  • the image pairs include positive sample pairs and negative sample pairs.
  • the two sample images in the positive sample pair belong to the same sample category, and the two sample images in the negative sample pair belong to different sample categories.
  • image data enhancement processing includes but is not limited to: random cropping, rotation, flipping, grayscale, brightness adjustment, contrast adjustment, saturation adjustment, etc., which are not limited here.
  • the first enhanced image and the second enhanced image obtained by amplifying each sample image in the training set are subjected to image pairing processing to obtain an image pair.
  • the sample images in the training set include 701 and 702.
  • the first enhanced image and the second enhanced image obtained by augmenting sample image 701 are 7011 and 7012, respectively.
  • the first enhanced image and the second enhanced image obtained by augmenting sample image 702 are 7021 and 7022, respectively.
  • the constructed image pairs include {7011, 7012}, {7011, 7021}, {7011, 7022}, {7012, 7021}, {7012, 7022}, and {7021, 7022}.
  • {7011, 7012} and {7021, 7022} are positive sample pairs;
  • {7011, 7021}, {7011, 7022}, {7012, 7021}, and {7012, 7022} are negative sample pairs.
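The pairing in this example can be sketched as follows: views augmented from the same source sample form positive pairs, and all other combinations form negative pairs (as in the example, where 701 and 702 belong to different categories):

```python
from itertools import combinations

# Each augmented view is mapped to the sample image it was derived from.
augmented = {"7011": "701", "7012": "701", "7021": "702", "7022": "702"}

# Enumerate all unordered pairs; a pair is positive when both views share a source.
pairs = [(a, b, augmented[a] == augmented[b])
         for a, b in combinations(augmented, 2)]

positives = [(a, b) for a, b, pos in pairs if pos]
negatives = [(a, b) for a, b, pos in pairs if not pos]
print(positives)       # [('7011', '7012'), ('7021', '7022')]
print(len(negatives))  # 4
```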
  • the traversal process for image pairs in the training set can include the following steps:
  • Step 411 Input the two sample images in the image pair to the first training branch and the second training branch respectively for processing.
  • the processing at least includes: extracting image features through the feature extraction layer, reducing the dimensionality of the feature vector through the dimensionality reduction layer, and so on.
  • the sample images are pre-processed before being input to the first training branch or the second training branch.
  • preprocessing includes but is not limited to: padding, scaling, normalization, etc. Since distortion is thereby avoided, this is conducive to further improving recognition accuracy.
  • the purpose of preprocessing such as padding and scaling is to ensure a unified input size for the first training branch and the second training branch.
  • the unified input size is 224 ⁇ 224.
  • normalization preprocessing means that, after the aforementioned preprocessing, the sample image is normalized pixel by pixel according to the following calculation formula (3): I_Norm = (I − mean) / std.
  • I Norm represents the pixels in the sample image that have completed normalization processing, and I represents the pixels to be processed in the sample image;
  • mean and std respectively represent the pixel mean and pixel standard deviation of all pixels in all sample images in the training set.
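Formula (3) and the definitions above can be illustrated with a minimal sketch; the tiny 2×2 "images" are toy data for illustration only:

```python
import numpy as np

# Toy training set: two tiny grayscale "images".
train_images = [np.array([[0.0, 50.0], [100.0, 150.0]]),
                np.array([[200.0, 250.0], [100.0, 50.0]])]

# mean and std over ALL pixels of ALL training images, as formula (3) requires.
all_pixels = np.concatenate([img.ravel() for img in train_images])
mean, std = all_pixels.mean(), all_pixels.std()

# I_Norm = (I - mean) / std, applied pixel by pixel.
normalized = [(img - mean) / std for img in train_images]
pooled = np.concatenate([n.ravel() for n in normalized])
print(pooled.mean(), pooled.std())  # mean ~ 0, std ~ 1 after normalization
```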
  • Step 413 Calculate the model loss value based on the processing results obtained by the first training branch and the second training branch.
  • the calculation formula (4) of the model loss value is as follows: L_sup = Σ_{i∈I} (−1/|P(i)|) Σ_{p∈P(i)} log( exp(z_i·z_p/τ) / Σ_{a∈A(i)} exp(z_i·z_a/τ) ), where z_i denotes the feature vector of the i-th sample image and τ is a temperature parameter.
  • L sup represents the model loss value
  • I represents the set of all sample images in the training set
  • P(i) represents the set of positive sample pairs to which the i-th sample image belongs in the training set
  • A(i) represents the set of all sample images in the training set excluding the i-th sample image.
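The loss defined by formula (4) matches the standard supervised contrastive (SupCon) form; a minimal NumPy sketch over a toy batch is below. The temperature tau = 0.07 and the random embeddings are assumptions for illustration:

```python
import numpy as np

def sup_con_loss(z, labels, tau=0.07):
    """Supervised contrastive loss over a batch of embeddings z.

    P(i): indices with the same label as i (positives);
    A(i): all indices except i. tau is an assumed temperature.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalize embeddings
    sim = z @ z.T / tau
    n, total = len(labels), 0.0
    for i in range(n):
        a = [k for k in range(n) if k != i]            # A(i)
        p = [k for k in a if labels[k] == labels[i]]   # P(i)
        if not p:
            continue
        log_denom = np.log(np.exp(sim[i, a]).sum())
        total += -sum(sim[i, k] - log_denom for k in p) / len(p)
    return total / n

z = np.random.default_rng(0).standard_normal((4, 8))
labels = [0, 0, 1, 1]  # two positive pairs in the batch
print(sup_con_loss(z, labels) > 0.0)  # True: the loss is a positive scalar
```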
  • if the model loss value satisfies the convergence condition, step 430 is executed;
  • otherwise, step 415 is executed.
  • the convergence condition can mean that the model loss value is minimal or falls below a loss threshold, or that the number of iterations reaches an iteration threshold; this is not limited here and can be set flexibly according to the actual needs of the application scenario.
  • Step 415 Update the parameters of the basic model and return to step 410.
  • Step 430 The converged feature extraction layer in the basic model is taken as the feature extractor.
  • at this point, supervised contrastive training of the feature extraction layer is complete, so that the feature extractor pulls the two sample images of a positive pair closer together in the feature space and pushes the two sample images of a negative pair farther apart.
  • the dual training branches and the dimensionality reduction layers are then discarded, and only one of the two feature extraction layers is retained as the feature extractor for subsequent image recognition.
  • compared with convolutional neural network models used for image classification, the model structure is greatly simplified, which further avoids relying on frequent changes in the training set to maintain recognition performance and is more conducive to improving recognition accuracy.
  • the above method may further include the following steps: constructing a feature vector set from the second feature vector of the sample image in the retrieval library.
  • the feature vector set is a look-up table (LUT).
  • the second feature vectors can be stored in the storage area of the electronic device in the form of a LUT, which avoids repeatedly extracting the second feature vectors in each image recognition process and thus improves the recognition efficiency of image recognition.
  • the inventor also realized that as the number of sample images in the retrieval library increases, the number of pre-calculated second feature vectors in the LUT also increases; since the similarity between the first feature vector and each second feature vector in the LUT needs to be calculated, the number of second feature vectors in the LUT affects the similarity calculation speed and thereby the recognition efficiency of image recognition.
  • a construction process for the feature vector set is therefore proposed to realize LUT pruning, which not only reduces the size of the LUT, that is, the number of second feature vectors in the LUT, but also preserves as far as possible the diversity of the second feature vectors in the LUT.
  • the construction process of the feature vector set may include the following steps:
  • Step 510 Perform image feature extraction on each sample image in the retrieval library, obtain the second feature vector of each sample image in the retrieval library, and add it to the feature vector set.
  • Step 530 Traverse the second feature vectors in the feature vector set; using the traversed second feature vector as the first vector, calculate the similarity between the first vector and each of the remaining second feature vectors in the feature vector set to obtain the first similarities.
  • Step 550 Based on the first similarity, delete the second feature vector with high redundancy from the feature vector set.
  • the redundancy of a second feature vector indicates the number of second feature vectors in the feature vector set that are similar to it. It should be understood that the higher the redundancy, the more second feature vectors in the set are similar to that vector; the sample image corresponding to that second feature vector can then be considered redundant, and the second feature vector can be deleted from the feature vector set.
  • the LUT pruning process can include the following steps:
  • Step 551 Use the second feature vector whose first similarity to the first vector is greater than the first set threshold as the second vector.
  • Step 553 Calculate the similarity between the second vector and the remaining second feature vectors in the feature vector set to obtain the second similarity.
  • Step 555 Determine the redundancy of the first vector from the number of second feature vectors whose first similarity with the first vector is greater than the first set threshold, and determine the redundancy of the second vector from the number of second feature vectors whose second similarity with the second vector is greater than the second set threshold.
  • the redundancy of the first vector is used to indicate the number of second feature vectors in the feature vector set that are similar to the first vector; here, similar means that the first similarity is greater than the first set threshold.
  • the redundancy of the second vector is used to indicate the number of second feature vectors in the feature vector set that are similar to the second vector; here, similar means that the second similarity is greater than the second set threshold.
  • Step 557 Based on the redundancy of the second feature vector, delete the corresponding second feature vector from the feature vector set.
  • if the redundancy of the first vector is greater than the redundancy of the second vector, the first vector is deleted from the feature vector set;
  • if the redundancy of the second vector is greater than the redundancy of the first vector, the second vector is deleted from the feature vector set.
  • suppose the second feature vectors in the feature vector set are A, B, C, and D, respectively.
  • suppose the first similarities between the first vector A and the second feature vectors B, C, and D all exceed the first set threshold, so B, C, and D are taken as second vectors.
  • the similarities between the second vector B and the remaining second feature vectors A, C, and D are then calculated to obtain the second similarities 0.91, 0.7, and 0.97; the similarities between the second vector C and the remaining second feature vectors A, B, and D yield the second similarities 0.95, 0.97, and 0.75; and the similarities between the second vector D and the remaining second feature vectors A, B, and C yield the second similarities 0.97, 0.75, and 0.77.
  • assuming the first set threshold is 0.8 and the second set threshold is also 0.8:
  • the number of second feature vectors (B, C, D) whose first similarity with the first vector A is greater than 0.8 is 3; the number of second feature vectors (A, D) whose second similarity with the second vector B is greater than 0.8 is 2; the number of second feature vectors (A, B) whose second similarity with the second vector C is greater than 0.8 is 2;
  • and the number of second feature vectors (A) whose second similarity with the second vector D is greater than 0.8 is 1.
  • therefore, the redundancy of the first vector A is 3, the redundancy of the second vector B is 2, the redundancy of the second vector C is 2, and the redundancy of the second vector D is 1.
  • the first vector A with a redundancy of 3 is deleted from the feature vector set.
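A generic sketch of the pruning idea in steps 551-557: count, for each feature vector, how many others exceed the similarity threshold (its "redundancy"), then delete the most redundant vector. The toy vectors, the 0.8 threshold, and the use of cosine similarity are illustrative assumptions:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy feature vector set; A, B, and C point in similar directions, D does not.
vectors = {"A": np.array([1.0, 0.0]),
           "B": np.array([0.9, 0.1]),
           "C": np.array([0.8, 0.2]),
           "D": np.array([0.0, 1.0])}

def redundancy(name, threshold=0.8):
    # Number of OTHER vectors whose similarity with this one exceeds the threshold.
    return sum(cosine(vectors[name], v) > threshold
               for k, v in vectors.items() if k != name)

scores = {name: redundancy(name) for name in vectors}
most_redundant = max(scores, key=scores.get)  # ties broken by insertion order
del vectors[most_redundant]                    # prune the most redundant vector
print(most_redundant, sorted(vectors))         # A ['B', 'C', 'D']
```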
  • the redundancy can also be expressed in other forms, such as a count-based normalization, which is not specifically limited here.
  • the first set threshold and the second set threshold may be the same or different, and they may be flexibly adjusted according to the actual needs of the application scenario to balance recognition efficiency and recognition accuracy. For example, in application scenarios with high recognition efficiency requirements, set a smaller first set threshold.
  • in a plankton application scenario, assuming there are 200 plankton categories and each category contains 1,000 sample images, the retrieval library contains 200,000 sample images, and the LUT contains up to 200,000 second feature vectors.
  • taking a LUT on an NVIDIA RTX 3090 GPU as an example, image recognition of an image to be recognized takes at most 5.8 ms, which can fully meet the needs of real-time observation of plankton in the marine environment.
  • the above method may further include the following steps:
  • Step 610 In response to the category correction instruction, correct the target category of the image to be recognized.
  • Step 630 If the corrected target category of the image to be recognized is a new category, add the image to be recognized and its corrected target category to the retrieval library in response to the category adding instruction.
  • the new category means that the corrected target category of the image to be recognized is different from the sample category in the retrieval database.
  • a human-computer interaction interface is provided, which helps to promptly discover and correct recognition deviations, fully ensuring the recognition performance of image recognition.
  • FIG. 10 shows a schematic diagram of an image recognition framework based on image retrieval in one embodiment.
  • the image recognition framework includes: a query image module (query) 801 for obtaining images to be recognized, a retrieval library (gallery) 802 for storing sample images and their corresponding sample categories, a retrieval module for performing image feature analysis, and a human-computer interaction interface.
  • the human-computer interaction interface includes a correction interface 807 and an adding interface 808.
  • the correction interface 807 is used to generate a category correction instruction to correct the target category of the image to be recognized;
  • the adding interface 808 is used to generate a category addition instruction to add the image to be recognized and its corrected target category to the retrieval library.
  • the electronic device is a smartphone that provides browsing of recognition results.
  • the smartphone displays a browsing page for browsing the recognition results, and the browsing page displays a correction interface and an adding interface.
  • the correction interface and the addition interface are essentially controls that can realize human-computer interaction.
  • the controls can be input boxes, selection boxes, buttons, switches, progress bars, etc.
  • the user finds that the target category of the image to be recognized is a new category, he can trigger the corresponding operation on the correction interface.
  • for the correction interface, if the corresponding operation triggered by the user is detected, a category correction instruction is generated to instruct the electronic device to correct the target category of the image to be recognized in response to the category correction instruction.
  • for example, the correction interface is an input box where the user enters the name of the new category, and the user's input is regarded as the corresponding operation triggered on the correction interface. Similarly, when the corrected target category of the image to be recognized is a new category, the user can trigger the corresponding operation on the adding interface.
  • for the adding interface, if the corresponding operation triggered by the user is detected, a category adding instruction is generated to instruct the electronic device to add the image to be recognized and its corrected target category to the retrieval library in response to the category adding instruction.
  • for example, the adding interface is a "confirm/cancel" button that the user clicks, and the user's click is regarded as the corresponding operation triggered on the adding interface.
  • depending on the input method of the electronic device, the specific behavior of the corresponding operation triggered by the user will differ.
  • for example, for an electronic device with a touch screen, the triggered operations may be gesture operations such as tapping, touching, and sliding; or, if the electronic device is a laptop equipped with a mouse, the triggered operations may be mechanical operations such as clicking, double-clicking, and dragging, which are not specifically limited in this embodiment.
  • the image recognition framework based on image retrieval has the characteristic that, by adding a new category to the retrieval library, the target category of an image to be recognized can immediately be recognized as the new category; retraining is therefore not always necessary for this framework, which helps delay the need for retraining and reduce its frequency, providing more convenience and greater flexibility for image recognition.
  • an embodiment of the present application provides an image recognition device 900, including but not limited to: an image acquisition module 910, a feature extraction module 930, an image search module 950, and an image recognition module 970.
  • the image acquisition module 910 is used to acquire the image to be recognized.
  • the feature extraction module 930 is used to extract image features from the image to be recognized to obtain the first feature vector.
  • the image search module 950 is used to search for sample images whose similarity between the second feature vector and the first feature vector satisfies the similarity condition in the retrieval database used to store sample images and their corresponding sample categories.
  • the second feature vector is used to represent the image features of the sample image.
  • the image recognition module 970 is used to determine the target category of the image to be recognized based on the sample category corresponding to the found sample image.
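The cooperation of these modules can be sketched end to end: compare the query's first feature vector against the precomputed second feature vectors in the retrieval library and return the category of the most similar sample. The library entries, the 3-d feature vectors, and the use of cosine similarity are hypothetical stand-ins for real extractor output:

```python
import numpy as np

# Hypothetical retrieval library: sample id -> (second feature vector, category).
library = {
    "copepod_001": (np.array([0.9, 0.1, 0.0]), "copepod"),
    "diatom_042":  (np.array([0.1, 0.9, 0.1]), "diatom"),
    "larva_007":   (np.array([0.0, 0.2, 0.9]), "fish larva"),
}

def recognize(first_vec):
    """Return the sample category of the most similar library entry."""
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    best = max(library.values(), key=lambda entry: cos(first_vec, entry[0]))
    return best[1]

query = np.array([0.85, 0.2, 0.05])  # hypothetical first feature vector
print(recognize(query))              # copepod
```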
  • Figure 12 shows a schematic structural diagram of an electronic device according to an exemplary embodiment.
  • the electronic device is suitable for the server 130 in the implementation environment shown in FIG. 1 .
  • this electronic device is only an example adapted to the present application and cannot be considered to provide any limitation on the scope of use of the present application.
  • the electronic device is also not to be construed as being dependent on or required to have one or more components of the exemplary electronic device 2000 shown in FIG. 12 .
  • the hardware structure of the electronic device 2000 may vary greatly due to different configurations or performance.
  • the electronic device 2000 includes: a power supply 210, an interface 230, at least one memory 250, and at least one central processing unit (CPU) 270.
  • the power supply 210 is used to provide operating voltage for each hardware device on the electronic device 2000 .
  • the interface 230 includes at least one wired or wireless network interface for interacting with external devices. For example, the interaction between the collection terminal 110 and the server terminal 130 in the implementation environment shown in Figure 1 is performed.
  • the interface 230 may further include at least one serial-to-parallel conversion interface 233, at least one input-output interface 235, at least one USB interface 237, etc., as shown in Figure 12; this is not intended to constitute a specific limitation here.
  • the memory 250 can be a read-only memory, a random access memory, a magnetic disk or an optical disk, etc.
  • the resources stored thereon include an operating system 251, application programs 253, data 255, etc., and the storage may be transient or permanent.
  • the operating system 251 is used to manage and control each hardware device and the application programs 253 on the electronic device 2000, so that the central processing unit 270 can operate on and process the massive data 255 in the memory 250; it can be Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
  • the application program 253 is a computer program that performs at least one specific job based on the operating system 251; it may include at least one module (not shown in Figure 12), and each module may include a series of computer programs for the electronic device 2000.
  • the image recognition device can be regarded as an application program 253 deployed on the electronic device 2000.
  • the data 255 may be photos, pictures, etc. stored in a disk, or may be an image to be recognized, etc., stored in the memory 250 .
  • the central processing unit 270 may include one or more processors and is configured to communicate with the memory 250 through at least one communication bus, so as to read the computer programs stored in the memory 250 and thereby operate on and process the massive data 255 in the memory 250. For example, the image recognition method is completed by the central processing unit 270 reading a series of computer programs stored in the memory 250.
  • the present application can also be implemented through hardware circuits or hardware circuits combined with software; therefore, implementation of the present application is not limited to any specific hardware circuit, software, or combination of the two.
  • the electronic device 4000 may be a desktop computer, a notebook computer, or another electronic device, etc.
  • the electronic device 4000 includes at least one processor 4001 , at least one communication bus 4002 and at least one memory 4003 .
  • the processor 4001 and the memory 4003 are connected, such as through a communication bus 4002.
  • the electronic device 4000 may also include a transceiver 4004, which may be used for data interaction between this electronic device and other electronic devices, such as data transmission and/or data reception. It should be noted that in practical applications the number of transceivers 4004 is not limited to one, and the structure of the electronic device 4000 does not constitute a limitation on the embodiments of the present application.
  • the processor 4001 can be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application-Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with this disclosure.
  • the processor 4001 may also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of a DSP and a microprocessor, etc.
  • Communication bus 4002 may include a path that carries information between the above-mentioned components.
  • the communication bus 4002 may be a PCI (Peripheral Component Interconnect) bus or an EISA (Extended Industry Standard Architecture) bus, etc.
  • the communication bus 4002 can be divided into an address bus, a data bus, a control bus, etc. For ease of presentation, only one thick line is used in Figure 13, but it does not mean that there is only one bus or one type of bus.
  • the memory 4003 can be a ROM (Read-Only Memory) or other type of static storage device capable of storing static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device capable of storing information and instructions, an EEPROM (Electrically Erasable Programmable Read-Only Memory), a CD-ROM (Compact Disc Read-Only Memory) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store the desired program code in the form of instructions or data structures and that can be accessed by a computer, without limitation.
  • the computer program is stored in the memory 4003, and the processor 4001 reads the computer program stored in the memory 4003 through the communication bus 4002.
  • embodiments of the present application provide a storage medium, and a computer program is stored on the storage medium.
  • the computer program is executed by a processor, the image recognition method in the above embodiments is implemented.
  • An embodiment of the present application provides a computer program product.
  • the computer program product includes a computer program, and the computer program is stored in a storage medium.
  • the processor of the electronic device reads the computer program from the storage medium, and the processor executes the computer program, so that the electronic device performs the image recognition method in the above embodiments.
  • the image recognition framework based on image retrieval uses the powerful image feature representation brought by supervised contrastive learning, so that in the feature space, positive examples belonging to the same category cluster together and negative examples belonging to different categories are pushed apart.
  • this not only avoids relying on model retraining, but also effectively improves the recognition efficiency of image recognition and fully guarantees its recognition accuracy.
  • the retrieval library in the image recognition framework is suitable not only for re-training but also for user adjustment, which is conducive to providing flexible, customized services for recognition tasks with different attributes and scopes.
  • for example, to handle diverse recognition tasks, the number of sample categories in the retrieval library should be expanded as much as possible so that the image recognition ability can take diversity into account;
  • conversely, for a specific recognition task, the sample categories in the retrieval library can be limited, that is, impossible sample categories can be excluded, which not only reduces the amount of similarity computation but also prevents the image to be recognized from being misidentified as an impossible category, indirectly ensuring the recognition performance of image recognition;
  • the size of the retrieval library can thus be further reduced to include only the sample categories of interest.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Library & Information Science (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application relate to the technical field of image processing. Provided are an image recognition method and apparatus. The method comprises: acquiring an image to be recognized; performing image feature extraction on the image to be recognized, so as to obtain a first feature vector; searching a retrieval library, which is used for storing sample images and sample categories corresponding to the sample images, for a sample image in which the similarity between a second feature vector and the first feature vector meets a similarity condition, wherein the second feature vector is used for representing an image feature of the sample image; and determining, according to a sample category corresponding to the found sample image, a target category of the image to be recognized. By means of the embodiments of the present application, the problems of low recognition accuracy, instability, and poor generalization performance in the related art can be solved.

Description

图像识别方法及装置Image recognition method and device 技术领域Technical field
本申请涉及图像处理技术领域,具体而言,本申请涉及一种图像识别方法及装置。The present application relates to the field of image processing technology. Specifically, the present application relates to an image recognition method and device.
背景技术Background technique
图像识别是计算机视觉领域中的一个重要的研究课题,已被广泛应用在众多领域。例如,面向海洋环境中的浮游生物的图像识别,以实现对该浮游生物长期、连续的原位观测。Image recognition is an important research topic in the field of computer vision and has been widely used in many fields. For example, it is aimed at image recognition of plankton in the marine environment to achieve long-term, continuous in-situ observation of the plankton.
目前,图像识别通常是利用训练集训练卷积神经网络模型,再根据该卷积神经网络模型对待识别图像进行类别预测,以得到待识别图像的目标类别。在基于图像分类的上述图像识别方案中,需要不断地更新训练集,进而使得卷积神经网络模型也随之开展较为频繁地重训练,方能够维持基于该卷积神经网络模型所进行的图像识别的识别性能。At present, image recognition usually uses a training set to train a convolutional neural network model, and then predicts the category of the image to be recognized based on the convolutional neural network model to obtain the target category of the image to be recognized. In the above-mentioned image recognition scheme based on image classification, the training set needs to be continuously updated, which in turn causes the convolutional neural network model to be retrained more frequently in order to maintain the image recognition based on the convolutional neural network model. recognition performance.
然而,训练集的更新依赖于大量的人工标注和人工校正,由此,如何在减少人工参与的前提下,提高识别准确率、稳健性,进而保证泛化性能是尚待解决的问题。However, the update of the training set relies on a large amount of manual annotation and manual correction. Therefore, how to improve the recognition accuracy and robustness while reducing manual participation, and thus ensure the generalization performance, is an issue that remains to be solved.
发明内容Contents of the invention
本申请各实施例提供了一种图像识别方法、装置、电子设备及存储介质,可以解决相关技术中存在的识别准确率不高、不稳健、泛化性能不佳的问题。所述技术方案如下:Each embodiment of the present application provides an image recognition method, device, electronic device, and storage medium, which can solve the problems of low recognition accuracy, instability, and poor generalization performance in related technologies. The technical solutions are as follows:
根据本申请实施例的一个方面,一种图像识别方法,包括:获取待识别图像;对所述待识别图像进行图像特征提取,得到第一特征向量;在用于存储样本图像及其对应的样本类别的检索库中,查找第二特征向量和所述第一特征向量的相似度满足相似条件的样本图像,所述第二特征向量用于表示所述样本图像的图像特征;根据查找到的样本图像所对应的样本类别,确定所 述待识别图像的目标类别。According to one aspect of the embodiment of the present application, an image recognition method includes: obtaining an image to be recognized; performing image feature extraction on the image to be recognized to obtain a first feature vector; and storing sample images and their corresponding samples. In the retrieval library of the category, search for sample images whose similarity between the second feature vector and the first feature vector satisfies the similarity condition, and the second feature vector is used to represent the image features of the sample image; according to the found sample The sample category corresponding to the image determines the target category of the image to be recognized.
根据本申请实施例的一个方面,一种图像识别装置,包括:图像获取模块,用于获取待识别图像;特征提取模块,用于对所述待识别图像进行图像特征提取,得到第一特征向量;图像查找模块,用于在用于存储样本图像及其对应的样本类别的检索库中,查找第二特征向量和所述第一特征向量的相似度满足相似条件的样本图像,所述第二特征向量用于表示所述样本图像的图像特征;图像识别模块,用于根据查找到的样本图像所对应的样本类别,确定所述待识别图像的目标类别。According to one aspect of the embodiment of the present application, an image recognition device includes: an image acquisition module, used to acquire an image to be recognized; a feature extraction module, used to extract image features from the image to be recognized, to obtain a first feature vector ; Image search module, used to search for sample images whose similarity between the second feature vector and the first feature vector satisfies the similarity condition in the retrieval library used to store sample images and their corresponding sample categories, the second The feature vector is used to represent the image features of the sample image; the image recognition module is used to determine the target category of the image to be recognized based on the sample category corresponding to the found sample image.
在一示例性实施例中,所述特征提取模块包括:提取器单元,用于利用完成模型训练的特征提取器,将所述待识别图像转换为所述第一特征向量。In an exemplary embodiment, the feature extraction module includes: an extractor unit, configured to convert the image to be recognized into the first feature vector using a feature extractor that has completed model training.
在一示例性实施例中,所述装置还包括:模型训练模块,用于根据训练集中的图像对,对基础模型进行模型训练,得到所述特征提取器,所述基础模型包括第一训练分支和第二训练分支,所述第一训练分支和所述第二训练分支分别包括特征提取层和降维层;所述模型训练模块包括:图像遍历单元,用于对所述训练集中的图像对进行遍历,所述图像对包括正样本对和负样本对,所述正样本对中的两个样本图像属于相同的样本类别,所述负样本对中的两个样本图像属于不同的样本类别;所述遍历包括:将所述图像对中的两个样本图像,分别输入所述第一训练分支和所述第二训练分支进行处理;根据所述第一训练分支和所述第二训练分支得到的处理结果,计算模型损失值;收敛单元,用于若所述模型损失值使得收敛条件被满足,则由所述基础模型中的特征提取层收敛得到所述特征提取器。In an exemplary embodiment, the device further includes: a model training module, configured to perform model training on a basic model according to the image pairs in the training set to obtain the feature extractor, where the basic model includes a first training branch and a second training branch, the first training branch and the second training branch respectively include a feature extraction layer and a dimensionality reduction layer; the model training module includes: an image traversal unit, used to perform image pairs in the training set Traverse, the image pair includes a positive sample pair and a negative sample pair, the two sample images in the positive sample pair belong to the same sample category, and the two sample images in the negative sample pair belong to different sample categories; The traversal includes: inputting two sample images in the image pair into the first training branch and the second training branch respectively for processing; obtaining according to the first training branch and the second training branch. The processing result is to calculate the model loss value; the convergence unit is used to obtain the feature extractor by converging the feature extraction layer in the basic model if the convergence condition is satisfied by the model loss value.
In an exemplary embodiment, the apparatus further includes an image pair construction module, which includes: an augmentation unit configured to apply at least two different image data augmentation operations to each sample image in the training set, so that at least a first augmented image and a second augmented image are derived from the sample image; and a pairing unit configured to pair the first and second augmented images derived from the sample images in the training set to obtain the image pairs.
In an exemplary embodiment, the image search module includes: a similarity calculation unit configured to calculate, for each second feature vector in a feature vector set, the similarity between that second feature vector and the first feature vector, the feature vector set being constructed from the second feature vectors of the sample images in the retrieval library; and an image search unit configured to take the sample image whose second feature vector has the highest similarity to the first feature vector as the sample image found in the retrieval library.
In an exemplary embodiment, the apparatus further includes a set construction module configured to construct the feature vector set from the second feature vectors of the sample images in the retrieval library. The set construction module includes: a vector adding unit configured to perform image feature extraction on each sample image in the retrieval library, obtain the second feature vector of each sample image, and add it to the feature vector set; a vector traversal unit configured to traverse the second feature vectors in the feature vector set, take each traversed second feature vector as a first vector, and calculate the similarities between the first vector and the remaining second feature vectors in the set to obtain first similarities; and a vector deletion unit configured to delete, based on the first similarities, second feature vectors with high redundancy from the feature vector set, the redundancy indicating the number of similar second feature vectors present in the set.
In an exemplary embodiment, the vector deletion unit includes: a vector determination subunit configured to take a second feature vector whose first similarity to the first vector is greater than a first set threshold as a second vector; a similarity calculation subunit configured to calculate the similarities between the second vector and the remaining second feature vectors in the feature vector set to obtain second similarities; a redundancy calculation subunit configured to determine the redundancy of the first vector from the number of second feature vectors whose first similarity to the first vector is greater than the first set threshold, and to determine the redundancy of the second vector from the number of second feature vectors whose second similarity to the second vector is greater than a second set threshold; and a deletion subunit configured to delete the first vector from the feature vector set if the redundancy of the first vector is greater than the redundancy of the second vector.
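The redundancy-based deletion described above can be sketched as follows. The cosine-similarity measure, the threshold value, and the single greedy pass are assumptions made for illustration; the patent leaves these details open.

```python
import numpy as np

def prune_redundant(features, threshold=0.9):
    """Redundancy-based pruning of a gallery of feature vectors.

    A vector's redundancy is the number of other vectors whose similarity
    to it exceeds the threshold. For each first vector i, any kept vector j
    more similar than the threshold is a second vector; if i is more
    redundant than j, i is deleted from the set.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = f @ f.T                         # pairwise cosine similarities
    np.fill_diagonal(sims, 0.0)            # ignore self-similarity
    red = (sims > threshold).sum(axis=1)   # redundancy of every vector
    keep = np.ones(len(f), dtype=bool)
    for i in range(len(f)):
        if not keep[i]:
            continue
        for j in np.where((sims[i] > threshold) & keep)[0]:
            if red[i] > red[j]:
                keep[i] = False            # delete the more redundant vector
                break
    return features[keep]

gallery = np.array([
    [1.00, 0.00],                          # hub: similar to both vectors below
    [0.95,  np.sqrt(0.0975)],
    [0.95, -np.sqrt(0.0975)],
    [0.00, 1.00],                          # distinct vector
])
pruned = prune_redundant(gallery, threshold=0.9)  # the hub row is deleted
```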
In an exemplary embodiment, the image recognition module includes an image recognition unit configured to take the sample category corresponding to the found sample image as the target category of the image to be recognized if the second feature vector of the found sample image satisfies a decision condition.
In an exemplary embodiment, the apparatus further includes: a new category correction module configured to correct the target category of the image to be recognized in response to a category correction instruction; and a new category adding module configured to, when the corrected target category of the image to be recognized is a new category, add the image to be recognized and its corrected target category to the retrieval library in response to a category adding instruction, a new category being a corrected target category that differs from the sample categories in the retrieval library.
According to one aspect of the embodiments of the present application, an electronic device includes at least one processor, at least one memory, and at least one communication bus, wherein a computer program is stored in the memory and the processor reads the computer program from the memory through the communication bus; when the computer program is executed by the processor, the image recognition method described above is implemented.
According to one aspect of the embodiments of the present application, a storage medium stores a computer program which, when executed by a processor, implements the image recognition method described above.
According to one aspect of the embodiments of the present application, a computer program product includes a computer program stored in a storage medium; a processor of an electronic device reads the computer program from the storage medium and executes it, so that the electronic device implements the image recognition method described above.
The technical solution provided by this application yields the following beneficial effects:
In the above technical solution, based on the first feature vector of the image to be recognized, a sample image whose second feature vector has a similarity to the first feature vector satisfying the similarity condition is searched for in the retrieval library storing sample images and their corresponding sample categories, and the target category of the image to be recognized is then determined from the sample category corresponding to the found sample image. This realizes an image recognition scheme in which image retrieval replaces image classification. Because the recognition accuracy of image retrieval depends on the sample images and their corresponding sample categories in the retrieval library, rather than, as with image classification, on frequent changes to the training set and retraining of a convolutional neural network model, recognition accuracy and robustness can be substantially improved and generalization performance adequately guaranteed while minimizing manual involvement, thereby effectively solving the problems of low recognition accuracy, poor robustness, and poor generalization performance in the related art.
Description of the Drawings
In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below.
Figure 1 is a schematic diagram of an implementation environment involved in this application;
Figure 2 is a flow chart of an image recognition method according to an exemplary embodiment;
Figure 3 is a schematic diagram showing an image to be recognized that is an ROI image, according to an exemplary embodiment;
Figure 4 is a schematic structural diagram of a basic model according to an exemplary embodiment;
Figure 5 is a schematic structural diagram of a feature extraction layer according to an exemplary embodiment;
Figure 6 is a flow chart of a model training process of the feature extraction layer according to an exemplary embodiment;
Figure 7 is a schematic diagram of an image pairing process according to an exemplary embodiment;
Figure 8a is a flow chart of a process of constructing a feature vector set according to an exemplary embodiment;
Figure 8b is a flow chart of one embodiment of step 550 in the embodiment corresponding to Figure 8a;
Figure 9 is a flow chart of another image recognition method according to an exemplary embodiment;
Figure 10 is a schematic diagram of an image recognition framework based on image retrieval according to an exemplary embodiment;
Figure 11 is a structural block diagram of an image recognition apparatus according to an exemplary embodiment;
Figure 12 is a hardware structure diagram of an electronic device according to an exemplary embodiment;
Figure 13 is a structural block diagram of an electronic device according to an exemplary embodiment.
Detailed Description
The embodiments of the present application are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are only used to explain the present application and cannot be construed as limiting it.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said", and "the" used herein may also include the plural forms. It should be further understood that the word "comprising" used in the description of this application refers to the presence of the stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It should be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. In addition, "connected" or "coupled" as used herein may include a wireless connection or wireless coupling. As used herein, the term "and/or" includes all or any unit of, and all combinations of, one or more of the associated listed items.
As mentioned above, in an image recognition scheme based on image classification, the training set needs to be updated continuously, and the convolutional neural network model must accordingly be retrained rather frequently, in order to maintain the recognition performance of image recognition based on that model.
Take image recognition of plankton in the marine environment as an example. The constant change of the natural seawater environment inevitably causes constant change in the categories and abundance of the plankton in it, and fixed-point sampling of plankton in the same sea area at different times undergoes drastic change with the diurnal vertical migration of the plankton, giving rise to data drift. Consequently, in the process of image recognition of plankton, frequent changes to the training set and retraining of the convolutional neural network model are needed to keep the data distribution balanced and stable, so as to maintain recognition performance against the background of continuously changing plankton categories and abundance. However, updating the training set relies, on the one hand, on a large amount of manual annotation and manual correction; on the other hand, a training set built from images sampled at limited spatio-temporal scales and resolutions can hardly reflect the plankton in the real marine environment fully and faithfully. All of this inevitably affects the recognition accuracy of image recognition and cannot meet the needs of real-time observation of plankton in the marine environment.
It can be seen from the above that the related art still has the limitations of low recognition accuracy, poor robustness, and poor generalization performance.
To this end, the image recognition method provided by this application can effectively improve recognition accuracy and robustness while adequately guaranteeing generalization performance. Accordingly, the method is suitable for an image recognition apparatus, which can be deployed on an electronic device with a von Neumann architecture, for example a desktop computer, a laptop computer, or a server.
In order to make the purpose, technical solutions, and advantages of the present application clearer, the embodiments of the present application are further described in detail below with reference to the accompanying drawings.
Figure 1 is a schematic diagram of an implementation environment involved in an image recognition method. It should be noted that this implementation environment is only an example adapted to the present invention and should not be considered as limiting the scope of use of the present invention in any way.
The implementation environment includes a collection terminal 110 and a server 130.
Specifically, the collection terminal 110, which can also be regarded as an image acquisition device, includes but is not limited to electronic devices with a shooting function, such as webcams, cameras, and camcorders. For example, the collection terminal 110 is an underwater camera.
The server 130 can be an electronic device such as a desktop computer, a laptop computer, or a server; it can also be a computer cluster composed of multiple servers, or even a cloud computing center composed of multiple servers. The server 130 is used to provide background services; for example, the background services include but are not limited to an image recognition service.
A network communication connection is established in advance between the server 130 and the collection terminal 110 through wired or wireless means, and data transmission between the server 130 and the collection terminal 110 is implemented through this connection. The transmitted data includes but is not limited to the image to be recognized, etc.
In one application scenario, through interaction between the collection terminal 110 and the server 130, the collection terminal 110 captures and collects an image to be recognized and uploads it to the server 130 to request the image recognition service provided by the server 130.
For the server 130, after receiving the image to be recognized uploaded by the collection terminal 110, the image recognition service is invoked so as to search the retrieval library storing sample images and their corresponding sample categories for a sample image similar to the image to be recognized, and then determine the target category of the image to be recognized from the sample category corresponding to the found sample image. This realizes an image recognition scheme in which image retrieval replaces image classification, thereby solving the problems of low recognition accuracy, poor robustness, and poor generalization performance in the related art.
Referring to Figure 2, an embodiment of the present application provides an image recognition method suitable for an electronic device, which may specifically be the server 130 in the implementation environment shown in Figure 1.
The following description takes the electronic device as the execution subject of each step of the method as an example, but this does not constitute a specific limitation.
As shown in Figure 2, the method may include the following steps:
Step 310: obtain the image to be recognized.
The image to be recognized is generated by the image acquisition device in the implementation environment shown in Figure 1 photographing and collecting an environment containing a target object. The target object refers to an object in the shooting environment; for example, the target object may be an underwater creature, specifically plankton in a marine environment.
It can be understood that shooting can be a single shot or continuous shooting. For the same target object, continuous shooting yields a video, and the image to be recognized can be any number of frames of that video; multiple shots yield multiple photos, and the image to be recognized can be any number of those photos. That is to say, the image to be recognized in this embodiment may be a dynamic image, such as multiple frames of a video or multiple photos, or a static image, such as any single frame of a video or any single one of multiple photos. Accordingly, the image recognition in this embodiment can be performed on dynamic images or on static images, which is not limited here.
Regarding acquisition, the image to be recognized may come from images captured and collected in real time by the image acquisition device, or may be images captured and collected by the image acquisition device during a historical period and stored in advance on the electronic device. For the electronic device, after the image acquisition device captures and collects the image to be recognized, the image can be processed in real time, or stored first and processed later, for example when the CPU load of the electronic device is low, or according to instructions from staff. Therefore, the image recognition in this embodiment can be performed on images to be recognized acquired in real time or on those acquired during a historical period, which is not specifically limited here.
In one possible implementation, the image to be recognized is an ROI (region of interest) image; that is, in the image to be recognized, the target object is located in the region of interest, or in other words, the target object is marked by the region of interest and clearly distinguished from the background region. As shown in Figure 3, in the ROI image the target object is plankton, located in the region of interest (the gray-white area) and clearly distinguished from the background region (the black area).
Step 330: perform image feature extraction on the image to be recognized to obtain a first feature vector.
The first feature vector represents the image features of the image to be recognized; in other words, the first feature vector is an accurate description of those image features. It should be understood that different images to be recognized yield different extracted image features and, correspondingly, different first feature vectors.
In one possible implementation, image feature extraction can be implemented through feature extraction algorithms such as histogram of oriented gradients (HOG) features, local binary pattern (LBP) features, and Haar-like features.
In one possible implementation, image feature extraction is implemented through convolution kernels. It should be noted that different numbers and sizes of convolution kernels yield first feature vectors of different lengths, reflecting the image to be recognized at different scales.
In one possible implementation, image feature extraction is implemented through a feature extractor; specifically, a feature extractor that has completed model training is used to convert the image to be recognized into the first feature vector.
Step 350: in the retrieval library used to store sample images and their corresponding sample categories, search for a sample image whose second feature vector has a similarity to the first feature vector satisfying the similarity condition.
First, the retrieval library essentially establishes a correspondence between sample images and their corresponding sample categories; through this correspondence, the sample category corresponding to a sample image can be determined quickly, which then serves as the basis for image retrieval. In one possible implementation, a sample image is an image annotated with a sample category; in other words, a sample image is an image carrying a label indicating its sample category.
Image retrieval essentially measures the similarity between the image to be recognized and the sample images in the retrieval library. Image recognition based on image retrieval does not obtain the target category of the image to be recognized directly; rather, it obtains it indirectly by comparing the similarity between the image to be recognized and the sample images in the retrieval library: first, the sample category corresponding to a sample image whose similarity to the image to be recognized satisfies the similarity condition is obtained, and from it the target category of the image to be recognized is derived.
Secondly, in this embodiment, the comparison of similarity between the image to be recognized and the sample images in the retrieval library is implemented by calculating the similarity between the first feature vector and the second feature vectors, where the first feature vector represents the image features of the image to be recognized and a second feature vector represents the image features of a sample image in the retrieval library.
In one possible implementation, the similarity calculation scheme includes but is not limited to: cosine similarity, Euclidean distance, Manhattan distance, Jaccard similarity coefficient, Pearson correlation coefficient, and so on.
Taking cosine similarity as an example, the similarity calculation process is as follows:
Similarity(x, y) = (x · y) / (‖x‖ · ‖y‖)   (1)
In formula (1), Similarity(x, y) denotes the similarity between x and y, with a value range of [0, 1]; x denotes the first feature vector of the image to be recognized, and y denotes the second feature vector of a sample image. It should be understood that the closer the similarity is to 1, the closer the first feature vector and the second feature vector are, i.e., the more similar the image to be recognized and the sample image are.
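Formula (1) can be illustrated in a few lines; the example vectors below are arbitrary and only serve to show the computation:

```python
import numpy as np

def cosine_similarity(x, y):
    """Formula (1): Similarity(x, y) = (x . y) / (||x|| * ||y||)."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

x = np.array([0.6, 0.8, 0.0])   # first feature vector (image to be recognized)
y = np.array([0.6, 0.8, 0.0])   # identical second feature vector
z = np.array([0.8, 0.6, 0.0])   # a different gallery vector

print(cosine_similarity(x, y))  # identical vectors -> ~1.0
print(cosine_similarity(x, z))  # similar but not identical -> ~0.96
```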
It is worth mentioning that, as described above, the image to be recognized is not limited to a static image such as one photo or one frame, but may also be a dynamic image. If the image to be recognized is a dynamic image, such as multiple photos or multiple frames, formula (1) can be combined with formula (2) to calculate multiple similarities at the same time.
V = Q × G^T   (2)
Here, V denotes the similarity result matrix, Q denotes the matrix of first feature vectors of the image to be recognized, and G denotes the matrix of second feature vectors of the sample images in the retrieval library.
Based on this, in the similarity result matrix V, the value in each column of row i represents the similarity between the first feature vector of the i-th photo or i-th frame of the image to be recognized and the second feature vector of the corresponding sample image in the retrieval library. In this way, not only is similarity calculation efficiency greatly improved, but simultaneous recognition of multiple photos/frames is also realized, which facilitates batch processing of images to be recognized and can effectively improve recognition efficiency.
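Formula (2) reduces each batch of similarity computations to a single matrix product. The sketch below assumes the rows of Q and G are L2-normalized so that every entry of V equals the cosine similarity of formula (1); the shapes and names are illustrative.

```python
import numpy as np

def batch_similarity(Q, G):
    """Formula (2): V = Q x G^T over L2-normalized feature rows.

    Q: (num_queries, dim)  first feature vectors of the images to be recognized
    G: (num_gallery, dim)  second feature vectors of the retrieval library
    Returns V of shape (num_queries, num_gallery); V[i, j] is the cosine
    similarity between query i and gallery image j.
    """
    Qn = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    Gn = G / np.linalg.norm(G, axis=1, keepdims=True)
    return Qn @ Gn.T

rng = np.random.default_rng(42)
Q = rng.normal(size=(5, 128))     # 5 frames to recognize
G = rng.normal(size=(1000, 128))  # 1000 gallery images
V = batch_similarity(Q, G)
best = V.argmax(axis=1)           # most similar gallery image per frame
```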
In one possible implementation, the similarity condition means the highest similarity, so the sample image whose second feature vector has the highest similarity to the first feature vector is taken as the sample image found in the retrieval library. Of course, in other embodiments, the similarity condition may also mean that the similarity exceeds a similarity threshold (for example, a threshold of 0.8), or that the similarity rank is within a set rank (for example, the top 10); in that case, the sample images whose similarity to the first feature vector is greater than 0.8, or the top-10 most similar sample images, are taken as the sample images found in the retrieval library.
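The three variants of the similarity condition above (highest similarity, fixed threshold, top-k rank) can each be read directly off one row of the similarity results; the scores below are made up for illustration:

```python
import numpy as np

sims = np.array([0.95, 0.40, 0.85, 0.10, 0.82])  # similarities of one query
                                                 # to five gallery images

top1 = int(sims.argmax())                        # highest-similarity condition
above = np.where(sims > 0.8)[0]                  # threshold condition (0.8)
k = 3
topk = np.argsort(sims)[::-1][:k]                # top-k rank condition

print(top1)            # 0
print(above.tolist())  # [0, 2, 4]
print(topk.tolist())   # [0, 2, 4]
```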
In one possible implementation, the second feature vectors are pre-computed and stored in the storage area of the electronic device. In this way, when performing image recognition on different images to be recognized, the pre-computed second feature vectors can be read directly from the storage area, avoiding repeated extraction of the second feature vectors in every image recognition pass and thereby further improving the recognition efficiency of image recognition.
In one possible implementation, the second feature vectors are stored in the storage area of the electronic device as a LUT (look-up table); during image recognition, the LUT can then be loaded directly into the memory of the electronic device, thereby avoiding repeated extraction of the second feature vectors in every image recognition pass.
The above process is especially suitable for image recognition of out-of-distribution samples. For example, for new categories that did not participate in training, an image recognition scheme based on image classification not only affects classification accuracy but also leads to inaccurate abundance quantification, whereas an image recognition scheme based on image retrieval can exclude out-of-distribution samples fairly precisely through similarity calculation, thereby effectively guaranteeing the recognition accuracy of image recognition.
Step 370: determine the target category of the image to be recognized according to the sample category corresponding to the found sample image.

That is to say, the sample category corresponding to the found sample image is the recognition result of the image to be recognized, i.e., its target category.
The inventors realized that the target category of the image to be recognized may be a new category, i.e., one that does not belong to any sample category corresponding to the sample images in the retrieval library; in other words, the target category is unknown. In that case, the target category cannot actually be obtained correctly from the sample category corresponding to the found sample image.

On this basis, in order to avoid recognition errors, this embodiment proposes a decision condition that rejects the recognition of unknown categories.
In one possible implementation, the decision condition is that the similarity between the image to be recognized and the found sample image is greater than a similarity threshold. The category decision process based on this condition is then as follows: if the similarity between the second feature vector of the found sample image and the first feature vector of the image to be recognized is greater than the similarity threshold, the sample category corresponding to the found sample image is taken as the target category of the image to be recognized; otherwise, the target category of the image to be recognized is determined to be a new category.
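This decision condition amounts to a few lines of code; a sketch, assuming the best match and its similarity have already been obtained from the retrieval step, with a hypothetical sentinel value for the rejected case:

```python
NEW_CATEGORY = "new_category"  # hypothetical sentinel for an unknown class

def decide_category(best_label, best_similarity, threshold=0.8):
    """Accept the retrieved sample category only when the similarity between
    the found sample's second feature vector and the query's first feature
    vector exceeds the threshold; otherwise reject and report a new category."""
    if best_similarity > threshold:
        return best_label
    return NEW_CATEGORY
```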
Of course, in other embodiments, the decision condition may also relate to weights configured for the found sample images, which is not specifically limited here.

Through the above process, an image recognition scheme in which image retrieval replaces image classification is realized. Since the recognition accuracy of image retrieval depends on the sample images in the retrieval library and their corresponding sample categories, rather than on frequent changes to the training set and retraining of a convolutional neural network model as image classification does, recognition accuracy can be fully guaranteed while minimizing manual involvement, effectively solving the problems of low and unstable recognition accuracy and poor generalization in the related art.
Figure 4 is a schematic structural diagram of the basic model in one embodiment. In Figure 4, the basic model includes a first training branch and a second training branch, each of which includes a feature extraction layer and a dimensionality reduction layer. The feature extraction layer can be regarded as a feature extractor whose training is not yet complete and is used to extract image features; the dimensionality reduction layer consists of two fully connected layers and further reduces the dimensionality of the feature vector produced by the feature extraction layer, for example converting a feature vector of length 2048 into one of length 128.
Figure 5 is a schematic structural diagram of the feature extraction layer in one embodiment. In Figure 5, the feature extraction layer is a convolutional neural network model with a depth of 50 layers and no fully connected layers. As shown in Figure 5, in addition to convolution layers (Conv), pooling layers (Pool) and activation layers (ReLU), the 50-layer structure is built on ResNeXt modules and additionally introduces SE (Squeeze-and-Excitation) attention modules. This not only gives the feature vectors obtained from this layer strong abstract expressive power, but also, with the aid of the attention mechanism, lets the network focus on the parts of the image that matter most for recognition, such as the region of interest in an ROI image, fully ensuring that image features are extracted more effectively.
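The SE attention module mentioned above can be summarized as squeeze (global average pooling), excitation (two fully connected layers) and channel reweighting. A NumPy sketch under the usual SE formulation; the weight shapes and names are illustrative, not taken from the patent:

```python
import numpy as np

def se_block(feature_map, w1, b1, w2, b2):
    """Squeeze-and-Excitation on one feature map of shape (C, H, W).
    w1/b1: channel-reducing FC layer; w2/b2: channel-restoring FC layer."""
    squeeze = feature_map.mean(axis=(1, 2))            # squeeze: global average pool -> (C,)
    hidden = np.maximum(0.0, w1 @ squeeze + b1)        # excitation: FC + ReLU
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))  # excitation: FC + sigmoid -> (C,)
    return feature_map * scale[:, None, None]          # reweight each channel
```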
The model training process of the feature extraction layer, i.e., training the basic model on the image pairs in the training set to obtain the feature extractor, is now described in detail with reference to Figures 4 to 7.

Referring to Figure 6, in an exemplary embodiment, the model training process may include the following steps:

Step 410: traverse the image pairs in the training set.

The image pairs include positive sample pairs and negative sample pairs: the two sample images in a positive pair belong to the same sample category, while the two sample images in a negative pair belong to different sample categories.

The construction of the image pairs is described here:
As shown in Figure 7, at least two different image data augmentation operations are applied to a sample image 701 in the training set, so that at least a first augmented image 7011 and a second augmented image 7012 are generated from sample image 701. Image data augmentation includes, but is not limited to, random cropping, rotation, flipping, grayscale conversion, and brightness, contrast and saturation adjustment, which are not limited here.

The first and second augmented images generated from each sample image in the training set are then paired to obtain the image pairs.

For example, suppose the training set contains sample images 701 and 702. Correspondingly, the first and second augmented images generated from sample image 701 are 7011 and 7012, and those generated from sample image 702 are 7021 and 7022.

After pairing, the constructed image pairs are {7011, 7012}, {7011, 7021}, {7011, 7022}, {7012, 7021}, {7012, 7022} and {7021, 7022}. Among them, {7011, 7012} and {7021, 7022} are positive sample pairs, while {7011, 7021}, {7011, 7022}, {7012, 7021} and {7012, 7022} are negative sample pairs.
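The pairing rule in this example can be sketched as follows; the dict keys and numeric view ids mirror the figure and are otherwise arbitrary:

```python
from itertools import combinations

def build_pairs(augmented_views):
    """augmented_views maps a sample id to its augmented views, e.g.
    {701: [7011, 7012], 702: [7021, 7022]}.  Views of the same sample form
    positive pairs; views of different samples form negative pairs."""
    flat = [(sid, view) for sid, views in augmented_views.items() for view in views]
    positives, negatives = [], []
    for (sid_a, a), (sid_b, b) in combinations(flat, 2):
        (positives if sid_a == sid_b else negatives).append((a, b))
    return positives, negatives
```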
The traversal of the image pairs in the training set may then include the following steps:

Step 411: input the two sample images of an image pair into the first training branch and the second training branch, respectively, for processing.

As shown in Figure 4, in one possible implementation, the processing in the first or second training branch at least includes extracting image features through the feature extraction layer and reducing the dimensionality of the feature vector through the dimensionality reduction layer.
It is worth mentioning that, to avoid distortion, the sample images are preprocessed before being input into the first or second training branch. In one possible implementation, the preprocessing includes, but is not limited to, padding, scaling and normalization. Since distortion is avoided in this way, recognition accuracy is further improved.

Preprocessing such as padding and scaling ensures a uniform input size for the first and second training branches, for example 224×224.

Normalization preprocessing means that, after encoding preprocessing, each pixel of the sample image is normalized according to the following formula (3):
I_Norm = (I - mean) / std    (3)
where I_Norm denotes a normalized pixel of the sample image and I denotes the pixel to be processed;

mean and std denote the pixel mean and pixel standard deviation, respectively, computed over all pixels of all sample images in the training set.
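Formula (3) reads directly as code; a sketch assuming the training images are stacked into one NumPy array:

```python
import numpy as np

def normalize(images, mean=None, std=None):
    """Pixel-wise normalization of formula (3).  mean and std default to the
    statistics of all pixels of all images passed in (the training set)."""
    if mean is None:
        mean = images.mean()
    if std is None:
        std = images.std()
    return (images - mean) / std
```

At inference time the same training-set mean and std would be passed in explicitly rather than recomputed.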
Step 413: compute the model loss value from the processing results of the first and second training branches.

In one possible implementation, the model loss value is computed by the following formula (4):
L_sup = Σ_{i∈I} ( -1 / |P(i)| ) Σ_{p∈P(i)} log [ exp(z_i · z_p / τ) / Σ_{α∈A(i)} exp(z_i · z_α / τ) ]    (4)
where L_sup denotes the model loss value;

I denotes the set of all sample images in the training set; P(i) denotes the set of sample images that form positive pairs with the i-th sample image, excluding the i-th sample image itself; A(i) denotes the set of all sample images in the training set except the i-th sample image;

|P(i)| denotes the number of sample images in the set P(i);

z_i is the feature vector of the i-th sample image in the set I; z_p is the feature vector of the p-th sample image in the set P(i); z_α is the feature vector of the α-th sample image in the set A(i); τ is a temperature hyperparameter that balances how much the loss function attends to positive versus negative sample pairs.
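A direct NumPy transcription of formula (4), assuming L2-normalized feature vectors and that every sample has at least one positive partner in the batch, as the positive-pair construction guarantees:

```python
import numpy as np

def sup_con_loss(z, labels, tau=0.1):
    """Supervised contrastive loss of formula (4).
    z: (N, D) array of L2-normalised feature vectors; labels: length-N list."""
    n = len(z)
    sim = (z @ z.T) / tau  # pairwise dot products scaled by the temperature
    loss = 0.0
    for i in range(n):
        a_i = [a for a in range(n) if a != i]             # A(i)
        p_i = [p for p in a_i if labels[p] == labels[i]]  # P(i)
        denom = np.sum(np.exp(sim[i, a_i]))
        loss += (-1.0 / len(p_i)) * sum(
            np.log(np.exp(sim[i, p]) / denom) for p in p_i)
    return loss
```

Well-separated same-class clusters drive the loss toward zero, which is exactly the pull-together/push-apart behavior described below.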
If the model loss value satisfies the convergence condition, step 430 is executed.

Otherwise, if the model loss value does not satisfy the convergence condition, step 415 is executed.

It should be noted that the convergence condition may mean that the model loss value is minimal or below a loss threshold, or that the number of iterations reaches an iteration threshold; it is not limited here and can be set flexibly according to the actual needs of the application scenario.

Step 415: update the parameters of the basic model and return to step 410.

Step 430: obtain the feature extractor from the converged feature extraction layer of the basic model.
At this point, the supervised contrastive training of the feature extraction layer is complete, so that the feature extractor pulls the two sample images of a positive pair closer together in feature space and pushes the two sample images of a negative pair farther apart.

It is worth mentioning that after training is complete, the dual training branches and the dimensionality reduction layers are discarded; only one of the two feature extraction layers is retained as the feature extractor for subsequent image recognition. Compared with the convolutional neural network model used in image classification, the model structure is greatly simplified, which further avoids relying on frequent changes to the training set to maintain recognition performance and is more conducive to improving recognition accuracy.
In an exemplary embodiment, the above method may further include the following step: constructing a feature vector set from the second feature vectors of the sample images in the retrieval library.

In one possible implementation, the feature vector set is a LUT.

As mentioned above, the second feature vectors may be stored in a storage area of the electronic device as a LUT, so that they need not be re-extracted in every recognition pass, which improves recognition efficiency. However, the inventors also realized that as the number of sample images in the retrieval library grows, the number of pre-computed second feature vectors in the LUT grows with it. Since the similarity between the first feature vector and every second feature vector in the LUT must be computed, the number of second feature vectors in the LUT affects the speed of the similarity computation and hence the efficiency of image recognition.

For this reason, this embodiment proposes a construction process for the feature vector set that realizes LUT pruning: it reduces the size of the LUT, i.e., the number of second feature vectors it contains, while preserving the diversity of the second feature vectors in the LUT as far as possible.
Specifically, as shown in Figure 8a, the construction of the feature vector set may include the following steps:

Step 510: extract image features from every sample image in the retrieval library to obtain the second feature vector of each sample image, and add it to the feature vector set.

Step 530: traverse the second feature vectors in the feature vector set; taking each traversed second feature vector as a first vector, compute the similarity between the first vector and each of the remaining second feature vectors in the set to obtain first similarities.

Step 550: based on the first similarities, delete second feature vectors with high redundancy from the feature vector set.

Here, the redundancy of a second feature vector indicates how many similar second feature vectors exist for it in the feature vector set. It should be understood that the higher the redundancy, the more similar second feature vectors exist for it in the set; the corresponding sample image can then be regarded as redundant, and the second feature vector can be deleted from the set.
In one possible implementation, as shown in Figure 8b, the LUT pruning process may include the following steps:

Step 551: take each second feature vector whose first similarity to the first vector is greater than a first set threshold as a second vector.

Step 553: compute the similarity between each second vector and each of the remaining second feature vectors in the feature vector set to obtain second similarities.

Step 555: determine the redundancy of the first vector from the number of second feature vectors whose first similarity to the first vector is greater than the first set threshold, and determine the redundancy of each second vector from the number of second feature vectors whose second similarity to that second vector is greater than a second set threshold.

Here, the redundancy of the first vector indicates how many similar second feature vectors exist for it in the feature vector set, "similar" meaning that the first similarity is greater than the first set threshold.

The redundancy of a second vector indicates how many similar second feature vectors exist for it in the feature vector set, "similar" meaning that the second similarity is greater than the second set threshold.

Step 557: based on these redundancies, delete the corresponding second feature vector from the feature vector set.

Specifically, if the redundancy of the first vector is greater than that of a second vector, the first vector is deleted from the feature vector set; conversely, if the redundancy of a second vector is greater than that of the first vector, that second vector is deleted.
For example, suppose the second feature vectors in the feature vector set are A, B, C and D.

Suppose the currently traversed second feature vector is A, taken as the first vector. The similarities between the first vector A and the remaining second feature vectors B, C and D are computed, giving first similarities of 0.91, 0.95 and 0.97.
Suppose the first similarities 0.91, 0.95 and 0.97 are all greater than the first set threshold (0.8), so the second feature vectors B, C and D are taken as second vectors. The similarities between the second vector B and the remaining second feature vectors A, C and D are then computed, giving second similarities of 0.91, 0.7 and 0.97; the similarities between the second vector C and the remaining second feature vectors A, B and D give second similarities of 0.95, 0.7 and 0.75; and the similarities between the second vector D and the remaining second feature vectors A, B and C give second similarities of 0.97, 0.97 and 0.75 (the similarity measure is symmetric, so for instance the similarity between B and D is 0.97 from either direction).

Suppose the second set threshold is also 0.8. From the above, the number of second feature vectors whose first similarity to the first vector A is greater than 0.8 is 3 (B, C, D); the number whose second similarity to the second vector B is greater than 0.8 is 2 (A, D); the number whose second similarity to the second vector C is greater than 0.8 is 1 (A); and the number whose second similarity to the second vector D is greater than 0.8 is 2 (A, B).

If redundancy is expressed directly as this count, the redundancy of the first vector A is 3, that of the second vector B is 2, that of the second vector C is 1, and that of the second vector D is 2. On this basis, the first vector A, whose redundancy of 3 is the largest, is deleted from the feature vector set. Of course, in other embodiments redundancy may also be expressed in other forms, for example normalized by the count, which is not specifically limited here.
The first set threshold and the second set threshold may be the same or different; both can be adjusted flexibly according to the actual needs of the application scenario to balance recognition efficiency and recognition accuracy. For example, in application scenarios demanding high recognition efficiency, a smaller first set threshold can be set.
With the above embodiments, different second feature vectors in the LUT are retained only when they are sufficiently far apart in feature space, i.e., when their mutual similarity does not exceed the first set threshold. This realizes LUT pruning, which reduces the size of the LUT while preserving the diversity of its second feature vectors as far as possible.
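The pruning idea can be sketched greedily as follows. This is an illustrative variant: it recomputes every remaining vector's redundancy after each deletion instead of comparing only the first vector with its second vectors, and it assumes cosine similarity with a single threshold.

```python
import numpy as np

def prune_lut(vectors, threshold=0.8):
    """Delete the most redundant feature vector (the one with the most
    neighbours above the similarity threshold) until no similar pair remains.
    Returns the indices of the retained vectors."""
    unit = [v / np.linalg.norm(v) for v in vectors]
    keep = list(range(len(unit)))
    while True:
        redundancy = {i: sum(1 for j in keep
                             if j != i and float(unit[i] @ unit[j]) > threshold)
                      for i in keep}
        worst = max(keep, key=lambda i: redundancy[i])
        if redundancy[worst] == 0:
            return keep
        keep.remove(worst)
```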
In a plankton application scenario, suppose there are 200 plankton categories with 1000 sample images each; the retrieval library then contains 200,000 sample images, and the LUT contains at most 200,000 second feature vectors. Taking a LUT on an NVIDIA RTX 3090 GPU as an example, recognizing one image takes at most 5.8 ms, which fully satisfies the needs of real-time plankton observation in the marine environment.
Referring to Figure 9, in an exemplary embodiment, the above method may further include the following steps:

Step 610: correct the target category of the image to be recognized in response to a category correction instruction.

Step 630: if the corrected target category of the image to be recognized is a new category, add the image to be recognized and its corrected target category to the retrieval library in response to a category addition instruction.

Here, a new category means that the corrected target category of the image to be recognized differs from the sample categories in the retrieval library.
As mentioned above, in image recognition schemes based on image classification, recognition performance partly depends on extensive manual annotation and manual correction. Because the manual workload is large and the manual involvement cycle is short, such schemes are not conducive to reducing the cost of image recognition and cannot realize a relatively robust, flexible and inexpensive automatic image recognition scheme.

For this reason, in this embodiment the image-retrieval-based image recognition scheme provides a human-computer interaction interface, which helps to discover and correct recognition deviations in time and thus fully guarantees recognition performance.
Specifically, Figure 10 is a schematic diagram of an image-retrieval-based image recognition framework in one embodiment. In Figure 10, the framework includes: a query image module (query) 801 for obtaining the image to be recognized; a retrieval library (gallery) 802 for storing sample images and their corresponding sample categories; a feature extractor 803 for extracting image features; a LUT 804 for storing the second feature vectors; a metric module 805 for computing the similarity between the first feature vector and the second feature vectors; a decision module 806 for determining the target category of the image to be recognized; and a human-computer interaction interface.

As shown in Figure 10, the human-computer interaction interface includes a correction interface 807 and an addition interface 808. The correction interface 807 generates category correction instructions for correcting the target category of the image to be recognized; the addition interface 808 generates category addition instructions for adding the image to be recognized and its corrected target category to the retrieval library.

Suppose the electronic device is a smartphone that provides browsing of recognition results; the smartphone displays a browsing page for the recognition results, and the browsing page displays the correction interface and the addition interface. It should be noted that the correction interface and the addition interface are in essence interactive controls, for example input boxes, selection boxes, buttons, switches or progress bars.
Then, if the user finds that the target category of the image to be recognized should be a new category, the user can trigger a corresponding operation on the correction interface; when the correction interface detects the operation, it generates a category correction instruction that instructs the electronic device to correct the target category of the image to be recognized. For example, the correction interface may be an input box in which the user enters the name of the new category, the user's input being regarded as the corresponding operation triggered on the correction interface. Similarly, when the corrected target category of the image to be recognized is a new category, the user can trigger a corresponding operation on the addition interface; when the addition interface detects the operation, it generates a category addition instruction that instructs the electronic device to add the image to be recognized and its corrected target category to the retrieval library. For example, the addition interface may be a "confirm/cancel" button for the user to click, the click being regarded as the corresponding operation triggered on the addition interface.

It should be added that, depending on the input components with which the electronic device is configured, the specific behavior of the user-triggered operation differs. For example, if the electronic device is a smartphone with a touch screen, the triggered operation may be a gesture such as a tap, touch or swipe; if it is a laptop with a mouse, the triggered operation may be a mechanical action such as a single click, double click or drag. This embodiment does not specifically limit this.
Under the above embodiments, effective supervision is introduced into image recognition. On the one hand, this helps to discover and correct recognition deviations in time, fully guaranteeing recognition performance. On the other hand, it effectively enhances the diversity of the sample images in the retrieval library, which not only prevents the degradation of recognition caused by data drift, thereby effectively improving recognition results, but also means that a diverse retrieval library lets the point clusters formed by the sample images cover more regions of the feature space, so that an image to be recognized falling into such a region can have its target category identified more accurately, improving recognition precision.

In addition, the image-retrieval-based framework has the property that simply adding a new category to the retrieval library immediately enables the target category of an image to be recognized as that new category, so retraining is not always necessary for this framework. This helps to postpone the need for retraining and reduce its frequency, providing more convenience and greater flexibility for image recognition.
The following are device embodiments of the present application, which can be used to execute the image recognition method involved in the present application. For details not disclosed in the device embodiments of the present application, please refer to the method embodiments of the image recognition method involved in the present application.

Referring to Figure 11, an embodiment of the present application provides an image recognition apparatus 900 that includes, but is not limited to, an image acquisition module 910, a feature extraction module 930, an image search module 950 and an image recognition module 970.
其中,图像获取模块910,用于获取待识别图像。Among them, the image acquisition module 910 is used to acquire the image to be recognized.
特征提取模块930,用于对待识别图像进行图像特征提取,得到第一特征向量。The feature extraction module 930 is used to extract image features from the image to be recognized to obtain the first feature vector.
图像查找模块950,用于在用于存储样本图像及其对应的样本类别的检索库中,查找第二特征向量和第一特征向量的相似度满足相似条件的样本图像,第二特征向量用于表示样本图像的图像特征。The image search module 950 is used to search for sample images whose similarity between the second feature vector and the first feature vector satisfies the similarity condition in the retrieval database used to store sample images and their corresponding sample categories. The second feature vector is used to Represents the image features of the sample image.
图像识别模块970,用于根据查找到的样本图像所对应的样本类别,确定待识别图像的目标类别。The image recognition module 970 is used to determine the target category of the image to be recognized based on the sample category corresponding to the found sample image.
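The module pipeline above (extract a first feature vector, compare it against the second feature vectors in the retrieval library, return the category of the most similar sample) can be sketched as a nearest-neighbour lookup. The flattened-pixel feature extractor, the cosine similarity measure, and the similarity threshold below are illustrative assumptions rather than the patented implementation:

```python
import math

def extract_features(image):
    # Placeholder feature extractor: in the described framework this would
    # be a trained neural network; here we just take pixel intensities.
    return [float(p) for p in image]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def recognize(image, retrieval_library, threshold=0.5):
    """Return the sample category whose stored second feature vector is
    most similar to the query's first feature vector, or None if no entry
    satisfies the similarity condition (here: similarity > threshold)."""
    query = extract_features(image)          # first feature vector
    best_category, best_sim = None, threshold
    for second_vector, category in retrieval_library:
        sim = cosine_similarity(query, second_vector)
        if sim > best_sim:
            best_category, best_sim = category, sim
    return best_category

# Toy retrieval library: (second feature vector, sample category) pairs.
library = [([1.0, 0.0, 0.0], "copepod"), ([0.0, 1.0, 0.0], "diatom")]
print(recognize([0.9, 0.1, 0.0], library))  # most similar to "copepod"
```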
It should be noted that when the image recognition apparatus provided in the above embodiment performs image processing, the division into the above functional modules is merely illustrative. In practical applications, the above functions may be assigned to different functional modules as needed; that is, the internal structure of the image recognition apparatus may be divided into different functional modules to complete all or part of the functions described above.
In addition, the image recognition apparatus provided in the above embodiments and the embodiments of the image recognition method belong to the same concept. The specific manner in which each module performs its operations has been described in detail in the method embodiments and will not be repeated here.
Figure 12 is a schematic structural diagram of an electronic device according to an exemplary embodiment. The electronic device is suitable for use as the server 130 in the implementation environment shown in Figure 1.
It should be noted that this electronic device is merely an example adapted to the present application and should not be regarded as limiting the scope of use of the present application in any way. Nor should the electronic device be construed as depending on, or required to have, one or more components of the exemplary electronic device 2000 shown in Figure 12.
The hardware structure of the electronic device 2000 may vary greatly with configuration or performance. As shown in Figure 12, the electronic device 2000 includes: a power supply 210, an interface 230, at least one memory 250, and at least one central processing unit (CPU) 270.
Specifically, the power supply 210 provides the operating voltage for the hardware devices on the electronic device 2000.
The interface 230 includes at least one wired or wireless network interface for interacting with external devices, for example, for the interaction between the collection terminal 110 and the server 130 in the implementation environment shown in Figure 1.
Of course, in other examples adapted to the present application, the interface 230 may further include at least one serial-to-parallel conversion interface 233, at least one input/output interface 235, at least one USB interface 237, and the like, as shown in Figure 12; this does not constitute a specific limitation.
As a carrier for resource storage, the memory 250 may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like. The resources stored thereon include an operating system 251, application programs 253, data 255, and the like, and the storage may be transient or persistent.
The operating system 251 manages and controls the hardware devices and the application programs 253 on the electronic device 2000, enabling the central processing unit 270 to operate on and process the massive data 255 in the memory 250; it may be Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
The application program 253 is a computer program that performs at least one specific task on top of the operating system 251. It may include at least one module (not shown in Figure 12), and each module may contain a computer program for the electronic device 2000. For example, the image recognition apparatus may be regarded as an application program 253 deployed on the electronic device 2000.
The data 255 may be photos, pictures, and the like stored on a disk, or may be images to be recognized, and is stored in the memory 250.
The central processing unit 270 may include one or more processors and is configured to communicate with the memory 250 through at least one communication bus, so as to read the computer programs stored in the memory 250 and thereby operate on and process the massive data 255 in the memory 250. For example, the image recognition method may be performed by the central processing unit 270 reading a series of computer programs stored in the memory 250.
In addition, the present application may equally be implemented by hardware circuits or by hardware circuits in combination with software; therefore, implementing the present application is not limited to any specific hardware circuit, software, or combination of the two.
Referring to Figure 13, an embodiment of the present application provides an electronic device 4000, which may include a desktop computer, a notebook computer, an electronic device, and so on.
In Figure 13, the electronic device 4000 includes at least one processor 4001, at least one communication bus 4002, and at least one memory 4003.
The processor 4001 and the memory 4003 are connected, for example, through the communication bus 4002. Optionally, the electronic device 4000 may further include a transceiver 4004, which may be used for data interaction between this electronic device and other electronic devices, such as sending and/or receiving data. It should be noted that, in practical applications, the number of transceivers 4004 is not limited to one, and the structure of the electronic device 4000 does not constitute a limitation on the embodiments of the present application.
The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application-Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or another programmable logic device, transistor logic device, hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. The processor 4001 may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The communication bus 4002 may include a path that carries information between the above components. The communication bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The communication bus 4002 may be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one thick line is used in Figure 13, but this does not mean that there is only one bus or only one type of bus.
The memory 4003 may be a ROM (Read-Only Memory) or another type of static storage device capable of storing static information and instructions, a RAM (Random Access Memory) or another type of dynamic storage device capable of storing information and instructions, an EEPROM (Electrically Erasable Programmable Read-Only Memory), a CD-ROM (Compact Disc Read-Only Memory) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
A computer program is stored in the memory 4003, and the processor 4001 reads the computer program stored in the memory 4003 through the communication bus 4002.
When the computer program is executed by the processor 4001, the image recognition method in each of the above embodiments is implemented.
In addition, an embodiment of the present application provides a storage medium on which a computer program is stored. When the computer program is executed by a processor, the image recognition method in each of the above embodiments is implemented.
An embodiment of the present application provides a computer program product, which includes a computer program stored in a storage medium. A processor of an electronic device reads the computer program from the storage medium and executes it, causing the electronic device to perform the image recognition method in each of the above embodiments.
Compared with the related art, the image-retrieval-based recognition framework, given sample images of good quality in the retrieval library, benefits from the strong image feature representation brought by supervised contrastive learning: in the feature space, positive examples belonging to the same category cluster together, while negative examples belonging to different categories are pushed apart. This not only avoids reliance on model retraining, effectively improving recognition efficiency, but also fully guarantees recognition accuracy.
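The clustering behaviour described here can be illustrated with a toy pairwise contrastive loss: positive pairs are pulled together, and negative pairs are pushed at least a margin apart. This margin-based pair loss is a stand-in sketch; the actual supervised contrastive objective of the embodiments is not reproduced here:

```python
import math

def pair_loss(za, zb, same_class, margin=1.0):
    """Toy contrastive pair loss over two feature vectors: a positive pair
    (same class) is penalized by its squared distance, a negative pair
    (different classes) is penalized only while closer than `margin`."""
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(za, zb)))
    if same_class:
        return dist ** 2                      # positives: pull together
    return max(0.0, margin - dist) ** 2       # negatives: push apart

# A nearby pair is cheap when labelled positive and costly when labelled
# negative, which is what drives same-class clustering in feature space.
print(pair_loss([0.0, 0.0], [0.1, 0.0], same_class=True) <
      pair_loss([0.0, 0.0], [0.1, 0.0], same_class=False))
```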
In addition, the retrieval library in the image recognition framework is suitable not only for retraining but also for user adjustment, facilitating flexible, customized service for recognition tasks of different attributes and scopes. For example, for the recognition of diverse organisms, the number of sample categories covered by the sample images in the retrieval library can be expanded as much as possible so that the recognition capability accommodates that diversity. For plankton recognition in a specific sea area, the sample images in the retrieval library can be limited to certain sample categories, excluding categories that cannot occur; this both reduces the computation required for similarity calculation and prevents an image to be recognized from being misidentified as an impossible category, indirectly ensuring recognition performance. For recognition tasks involving a limited number of organisms of interest, the retrieval library can be further reduced to include only the sample categories of interest.
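Restricting the retrieval library to the sample categories that can actually occur, as suggested for a specific sea area, amounts to filtering its entries before any similarity computation. A minimal sketch, with hypothetical plankton category names:

```python
def restrict_library(retrieval_library, allowed_categories):
    """Keep only entries whose sample category is allowed, e.g. plankton
    taxa known to occur in a given sea area. Fewer entries means fewer
    similarity computations, and impossible categories can never be
    returned as a match."""
    allowed = set(allowed_categories)
    return [(vec, cat) for vec, cat in retrieval_library if cat in allowed]

# Toy library: (second feature vector, sample category) pairs.
library = [([1.0, 0.0], "copepod"),
           ([0.0, 1.0], "diatom"),
           ([0.5, 0.5], "jellyfish")]
coastal = restrict_library(library, {"copepod", "diatom"})
print([cat for _, cat in coastal])  # ['copepod', 'diatom']
```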
It should be understood that, although the steps in the flowcharts of the accompanying drawings are shown sequentially as indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, there is no strict ordering constraint on their execution, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; nor is their execution order necessarily sequential, and they may be executed in turn or alternately with other steps, or with at least part of the sub-steps or stages of other steps.
The above are only some embodiments of the present application. It should be noted that those of ordinary skill in the art may make further improvements and refinements without departing from the principles of the present application, and such improvements and refinements shall also fall within the scope of protection of the present application.

Claims (10)

  1. An image recognition method, characterized in that the method comprises:
    acquiring an image to be recognized;
    performing image feature extraction on the image to be recognized to obtain a first feature vector;
    searching, in a retrieval library for storing sample images and their corresponding sample categories, for a sample image whose second feature vector has a similarity to the first feature vector that satisfies a similarity condition, the second feature vector being used to represent image features of the sample image;
    determining a target category of the image to be recognized according to the sample category corresponding to the found sample image.
  2. The method of claim 1, wherein the performing image feature extraction on the image to be recognized to obtain a first feature vector comprises:
    converting the image to be recognized into the first feature vector by using a feature extractor that has completed model training.
  3. The method of claim 2, wherein the method further comprises: performing model training on a base model according to image pairs in a training set to obtain the feature extractor, the base model comprising a first training branch and a second training branch, the first training branch and the second training branch each comprising a feature extraction layer and a dimensionality reduction layer;
    the performing model training on the base model according to the image pairs in the training set to obtain the feature extractor comprises:
    traversing the image pairs in the training set, the image pairs comprising positive sample pairs and negative sample pairs, the two sample images in a positive sample pair belonging to the same sample category, and the two sample images in a negative sample pair belonging to different sample categories; the traversal comprising:
    inputting the two sample images of an image pair into the first training branch and the second training branch, respectively, for processing;
    calculating a model loss value according to the processing results obtained by the first training branch and the second training branch;
    if the model loss value causes a convergence condition to be satisfied, obtaining the feature extractor by convergence of the feature extraction layer in the base model.
  4. The method of claim 3, wherein the method further comprises: constructing the image pairs in the training set;
    the constructing the image pairs in the training set comprises:
    performing at least two different image data augmentation processes on one of the sample images in the training set, so that at least a first augmented image and a second augmented image are obtained by augmenting the sample image;
    performing image pairing on the first augmented images and the second augmented images obtained by augmenting the sample images in the training set, to obtain the image pairs.
  5. The method of claim 1, wherein the searching, in the retrieval library for storing sample images and their corresponding sample categories, for a sample image whose second feature vector has a similarity to the first feature vector that satisfies the similarity condition comprises:
    for each second feature vector in a feature vector set, calculating the similarity between the second feature vector and the first feature vector, the feature vector set being constructed from the second feature vectors of the sample images in the retrieval library;
    taking the sample image whose second feature vector has the highest similarity to the first feature vector as the sample image found from the retrieval library.
  6. The method of claim 5, wherein the method further comprises: constructing the feature vector set from the second feature vectors of the sample images in the retrieval library;
    the constructing the feature vector set from the second feature vectors of the sample images in the retrieval library comprises:
    performing image feature extraction on each sample image in the retrieval library to obtain the second feature vector of each sample image in the retrieval library, and adding it to the feature vector set;
    traversing the second feature vectors in the feature vector set, taking a traversed second feature vector as a first vector, and calculating the similarity between the first vector and each of the remaining second feature vectors in the feature vector set to obtain first similarities;
    based on the first similarities, deleting second feature vectors with high redundancy from the feature vector set, the redundancy indicating the number of similar second feature vectors present in the feature vector set.
  7. The method of claim 6, wherein the deleting second feature vectors with high redundancy from the feature vector set based on the first similarities comprises:
    taking a second feature vector whose first similarity to the first vector is greater than a first set threshold as a second vector;
    calculating the similarity between the second vector and each of the remaining second feature vectors in the feature vector set to obtain second similarities;
    determining the redundancy of the first vector according to the number of second feature vectors whose first similarity to the first vector is greater than the first set threshold, and determining the redundancy of the second vector according to the number of second feature vectors whose second similarity to the second vector is greater than a second set threshold;
    if the redundancy of the first vector is greater than the redundancy of the second vector, deleting the first vector from the feature vector set.
  8. The method of any one of claims 1 to 7, wherein the determining the target category of the image to be recognized according to the sample category corresponding to the found sample image comprises:
    if the second feature vector of the found sample image satisfies a decision condition, taking the sample category corresponding to the found sample image as the target category of the image to be recognized.
  9. The method of any one of claims 1 to 7, wherein after the determining the target category of the image to be recognized according to the sample category corresponding to the found sample image, the method further comprises:
    correcting the target category of the image to be recognized in response to a category correction instruction;
    in a case where the corrected target category of the image to be recognized is a new category, adding the image to be recognized and its corrected target category to the retrieval library in response to a category addition instruction, the new category meaning that the corrected target category of the image to be recognized is different from the sample categories in the retrieval library.
  10. An image recognition apparatus, characterized in that the apparatus comprises:
    an image acquisition module, configured to acquire an image to be recognized;
    a feature extraction module, configured to perform image feature extraction on the image to be recognized to obtain a first feature vector;
    an image search module, configured to search, in a retrieval library for storing sample images and their corresponding sample categories, for a sample image whose second feature vector has a similarity to the first feature vector that satisfies a similarity condition, the second feature vector being used to represent image features of the sample image;
    an image recognition module, configured to determine a target category of the image to be recognized according to the sample category corresponding to the found sample image.
PCT/CN2022/137039 2022-06-01 2022-12-06 Image recognition method and apparatus WO2023231355A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210617217.1A CN117218356A (en) 2022-06-01 2022-06-01 Image recognition method and device
CN202210617217.1 2022-06-01

Publications (1)

Publication Number Publication Date
WO2023231355A1 true WO2023231355A1 (en) 2023-12-07

Family

ID=89026872

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/137039 WO2023231355A1 (en) 2022-06-01 2022-12-06 Image recognition method and apparatus

Country Status (2)

Country Link
CN (1) CN117218356A (en)
WO (1) WO2023231355A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118035496A (en) * 2024-04-15 2024-05-14 腾讯科技(深圳)有限公司 Video recommendation method, device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107690659A (en) * 2016-12-27 2018-02-13 深圳前海达闼云端智能科技有限公司 A kind of image identification system and image-recognizing method
CN111898416A (en) * 2020-06-17 2020-11-06 绍兴埃瓦科技有限公司 Video stream processing method and device, computer equipment and storage medium
CN112633297A (en) * 2020-12-28 2021-04-09 浙江大华技术股份有限公司 Target object identification method and device, storage medium and electronic device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS: "One article to understand Ranking Loss/Margin Loss/Triplet Loss", 10 August 2020 (2020-08-10), XP093115639, Retrieved from the Internet <URL:https://www.cvmart.net/community/detail/3108> *


Also Published As

Publication number Publication date
CN117218356A (en) 2023-12-12

Similar Documents

Publication Publication Date Title
US11551333B2 (en) Image reconstruction method and device
WO2020228446A1 (en) Model training method and apparatus, and terminal and storage medium
WO2022068196A1 (en) Cross-modal data processing method and device, storage medium, and electronic device
US11604822B2 (en) Multi-modal differential search with real-time focus adaptation
US11605019B2 (en) Visually guided machine-learning language model
WO2019100724A1 (en) Method and device for training multi-label classification model
US20220375213A1 (en) Processing Apparatus and Method and Storage Medium
WO2017096753A1 (en) Facial key point tracking method, terminal, and nonvolatile computer readable storage medium
US20220222918A1 (en) Image retrieval method and apparatus, storage medium, and device
CN113065636B (en) Pruning processing method, data processing method and equipment for convolutional neural network
WO2020186887A1 (en) Target detection method, device and apparatus for continuous small sample images
WO2021218470A1 (en) Neural network optimization method and device
CN112052868A (en) Model training method, image similarity measuring method, terminal and storage medium
WO2023221790A1 (en) Image encoder training method and apparatus, device, and medium
CN113205142A (en) Target detection method and device based on incremental learning
US20210081677A1 (en) Unsupervised Video Object Segmentation and Image Object Co-Segmentation Using Attentive Graph Neural Network Architectures
CN113987119A (en) Data retrieval method, cross-modal data matching model processing method and device
WO2023231355A1 (en) Image recognition method and apparatus
CN115115855A (en) Training method, device, equipment and medium for image encoder
CN112529149A (en) Data processing method and related device
WO2021051562A1 (en) Facial feature point positioning method and apparatus, computing device, and storage medium
US20200151518A1 (en) Regularized multi-metric active learning system for image classification
WO2022127333A1 (en) Training method and apparatus for image segmentation model, image segmentation method and apparatus, and device
US20230072445A1 (en) Self-supervised video representation learning by exploring spatiotemporal continuity
CN114821140A (en) Image clustering method based on Manhattan distance, terminal device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22944656

Country of ref document: EP

Kind code of ref document: A1