CN115690514A - Image recognition method and related equipment

Image recognition method and related equipment

Info

Publication number
CN115690514A
Authority
CN
China
Prior art keywords
bounding boxes
unknown
class
Prior art date
Legal status
Pending
Application number
CN202211422232.7A
Other languages
Chinese (zh)
Inventor
禹世杰
施欣欣
吴伟华
范艳
叶桔
Current Assignee
SHENZHEN HARZONE TECHNOLOGY CO LTD
Original Assignee
SHENZHEN HARZONE TECHNOLOGY CO LTD
Priority date
Filing date
Publication date
Application filed by SHENZHEN HARZONE TECHNOLOGY CO LTD filed Critical SHENZHEN HARZONE TECHNOLOGY CO LTD
Priority to CN202211422232.7A
Publication of CN115690514A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses an image recognition method and related equipment, wherein the method comprises the following steps: acquiring a target input image; generating a group of bounding boxes for foreground and background instances of the target input image by adopting an automatic label model based on a perceptually unknown RPN, wherein the bounding boxes comprise P bounding boxes and P corresponding scores, each bounding box corresponds to one score, the P bounding boxes comprise m bounding boxes with class labels and P-m bounding boxes without class labels, and each of the m bounding boxes corresponds to one class label; selecting the scores among the P scores that are larger than a preset threshold value to obtain Q scores; determining n bounding boxes among the P-m bounding boxes that have no overlapping area with any of the m bounding boxes with class labels; and determining k target bounding boxes according to the Q scores and the n bounding boxes, and marking the k target bounding boxes as unknown objects, wherein k is a positive integer. By adopting the embodiment of the application, image recognition accuracy can be improved.

Description

Image recognition method and related equipment
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image recognition method and a related device.
Background
In the prior art, image recognition has great application value in the field of intelligent video surveillance and security; in daily life, a user can, for example, unlock a device through face recognition. However, the application of image recognition is frequently limited to face recognition or to simple application scenes such as target tracking, which reduces the application value of image recognition, and its accuracy in broader scenes is not demonstrated. Therefore, the problem of how to improve the accuracy of image recognition urgently needs to be solved.
Disclosure of Invention
The embodiment of the application provides an image recognition method and related equipment, and the image recognition accuracy can be improved.
In a first aspect, an embodiment of the present application provides an image recognition method, where the method includes:
acquiring a target input image;
generating a group of bounding boxes for foreground and background instances of the target input image by using an automatic label model based on a perceptually unknown RPN, wherein the bounding boxes comprise P bounding boxes and P corresponding scores, each bounding box corresponds to one score, the P bounding boxes comprise m bounding boxes with class labels and P-m bounding boxes without class labels, and each bounding box in the m bounding boxes corresponds to one class label; p is an integer greater than 1, m is an integer less than or equal to P;
selecting scores of the P scores which are larger than a preset threshold value to obtain Q scores, wherein Q is a positive integer smaller than or equal to P;
determining n bounding boxes of the P-m bounding boxes, wherein the n bounding boxes do not have an overlapping region with any one of the m bounding boxes with the class labels, and n is a positive integer less than or equal to P-m;
and determining k target bounding boxes according to the Q scores and the n bounding boxes, marking the k target bounding boxes as unknown objects, and taking k as a positive integer.
In a second aspect, an embodiment of the present application provides an image recognition apparatus, including: an acquisition unit, a recognition unit, a selection unit and a determination unit, wherein,
the acquisition unit is used for acquiring a target input image;
the identification unit is used for generating a group of bounding boxes for foreground and background examples of the target input image by adopting an automatic label model based on perceptually unknown RPN, wherein the bounding boxes comprise P bounding boxes and P corresponding scores, each bounding box corresponds to one score, the P bounding boxes comprise m bounding boxes with class labels and P-m bounding boxes without class labels, and each bounding box in the m bounding boxes corresponds to one class label; p is an integer greater than 1, m is an integer less than or equal to P;
the selecting unit is used for selecting scores which are larger than a preset threshold value from the P scores to obtain Q scores, and Q is a positive integer which is smaller than or equal to P;
the determining unit is used for determining n bounding boxes which do not have an overlapping area with any bounding box of m bounding boxes with class labels in the P-m bounding boxes, and n is a positive integer less than or equal to P-m; and determining k target boundary boxes according to the Q scores and the n boundary boxes, marking the k target boundary boxes as unknown objects, wherein k is a positive integer.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the program includes instructions for executing the steps in the first aspect of the embodiment of the present application.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, where the computer program makes a computer perform some or all of the steps described in the first aspect of the embodiment of the present application.
In a fifth aspect, embodiments of the present application provide a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, where the computer program is operable to cause a computer to perform some or all of the steps as described in the first aspect of the embodiments of the present application. The computer program product may be a software installation package.
The embodiment of the application has the following beneficial effects:
It can be seen that, in the image recognition method and the related device described in the embodiments of the present application, a target input image is obtained, and a set of bounding boxes is generated for foreground and background instances of the target input image by using an automatic label model based on a perceptually unknown RPN, where the bounding boxes include P bounding boxes and P scores corresponding to the P bounding boxes, each bounding box corresponds to one score, the P bounding boxes include m bounding boxes with category labels and P-m bounding boxes without category labels, and each bounding box in the m bounding boxes corresponds to one category label, P being an integer larger than 1 and m an integer smaller than or equal to P. The scores larger than a preset threshold value among the P scores are selected to obtain Q scores, Q being a positive integer smaller than or equal to P; n bounding boxes among the P-m bounding boxes that have no overlapping region with any of the m bounding boxes with category labels are determined, n being a positive integer smaller than or equal to P-m; and k target bounding boxes are determined according to the Q scores and the n bounding boxes and marked as unknown objects, k being a positive integer. In this way, not only can targets of known categories be recognized, but unknown objects can also be accurately marked while excluding the interference of known-category targets, thereby improving image recognition accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1A is a schematic flowchart of an image recognition method according to an embodiment of the present application;
fig. 1B is a schematic diagram illustrating a marking effect of an automatic label model based on perceptual unknown RPN according to an embodiment of the present application;
fig. 1C is a schematic diagram illustrating an effect of class clustering according to an embodiment of the present application;
fig. 1D is a schematic flowchart of another image recognition method provided in the embodiment of the present application;
FIG. 2 is a schematic flowchart of another image recognition method provided in an embodiment of the present application;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 4 is a block diagram of functional units of an image recognition apparatus according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
The terms "first," "second," and the like in the description and claims of the present application and in the foregoing drawings are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein may be combined with other embodiments.
The electronic devices described in the embodiments of the present application may include a smart phone (e.g., an Android phone, an iOS phone, a Windows phone, etc.), a tablet computer, a palmtop computer, a driving recorder, a server, a notebook computer, a mobile Internet device (MID), or a wearable device (e.g., a smart watch or a Bluetooth headset). These are merely examples, not an exhaustive list; the electronic devices include but are not limited to those listed above.
The following describes embodiments of the present application in detail.
In the embodiment of the application, during target detection, samples without labels can also be detected. The task of the semi-supervised object detection framework is to identify objects introduced as "unknown" without explicit supervision. The semi-supervised target detection framework proceeds as follows: the training sample set is divided into marked samples and unmarked samples, where the marked samples cover C classes; the framework is then trained on the marked samples to obtain an optimal model. The model performs inference on the unmarked samples: it can recognize the previously marked classes, identify other unmarked targets as unknown classes, acquire corresponding labels for the identified unknown classes, and then learn them without forgetting the previously learned classes.
In the embodiment of the application, the target detection model identifies potential unknown objects by using contrastive feature clustering, an energy-based classification head and the perceptually unknown RPN; the following description focuses on the energy-based classification head. Furthermore, contrastive learning is performed in the feature space to learn discriminative clusters, and new classes can be added flexibly in a continuous manner without forgetting the previous classes.
By way of example, at any time t, the set of known object classes is denoted

Kt = {1, 2, ..., C} ⊂ N+,

where N+ represents the set of positive integers. To realistically model real-world dynamics, it is also assumed that there exists a set of unknown classes U = {C+1, ...} that may be encountered during inference. The known object classes Kt are assumed to be marked in the dataset Dt = {Xt, Yt}, where X and Y represent the input images and labels, respectively. The input image set consists of M training images, Xt = {I1, ..., IM}, and the associated object labels for each image constitute the label set Yt = {Y1, ..., YM}. Each Yi = {y1, y2, ..., yK} encodes a set of K object instances together with their class labels and positions, i.e., yk = [lk, xk, yk, wk, hk], where lk ∈ Kt is the class label and xk, yk, wk and hk respectively represent the center coordinates, the width and the height of the bounding box.
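For illustration only, the label encoding above may be represented as in the following minimal Python sketch; the field and type names are hypothetical, since the embodiment does not prescribe a concrete data structure:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ObjectLabel:
    """One instance label yk = [lk, xk, yk, wk, hk]."""
    class_id: int  # lk in Kt; label 0 is reserved for "unknown" (see below)
    cx: float      # xk: bounding-box center x coordinate
    cy: float      # yk: bounding-box center y coordinate
    w: float       # wk: bounding-box width
    h: float       # hk: bounding-box height

# Yi for one image: a set of K instance labels
ImageLabels = List[ObjectLabel]
```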
In the embodiment of the present application, a target detection model MC (any model for implementing target detection) is trained to detect all the C classes encountered previously. Importantly, the model MC is able to recognize test instances belonging to any of the C known classes, and can also identify instances that are unknown (represented by label 0) as new or unseen class instances. From the set of unknown instances Ut, n new classes of interest (among a potentially large number of unknowns) can then be identified and their training examples provided. The n new classes are added incrementally, and the model updates itself to generate an updated model MC+n without retraining the entire dataset from scratch. The known class set is also updated: Kt+1 = Kt ∪ {C+1, ..., C+n}. This loop runs throughout the entire life cycle of the object detector, during which it adaptively updates itself with new knowledge.
In an embodiment of the present application, given an input image, a set of bounding-box predictions and corresponding scores may be generated for foreground and background instances by the perceptually unknown RPN. Those boxes that have a high score but do not overlap with any labeled class object box may then be labeled as potential unknown objects. In brief, the top-k background region proposals, sorted by their scores, are selected as unknown objects.
Referring to fig. 1A, fig. 1A is a schematic flowchart of an image recognition method according to an embodiment of the present application, where as shown in the figure, the image recognition method includes:
101. a target input image is acquired.
In the embodiment of the present application, the target input image may be any image.
102. Generating a group of bounding boxes for foreground and background instances of the target input image by using an automatic label model based on a perceptually unknown RPN, wherein the bounding boxes comprise P bounding boxes and P corresponding scores, each bounding box corresponds to one score, the P bounding boxes comprise m bounding boxes with class labels and P-m bounding boxes without class labels, and each bounding box in the m bounding boxes corresponds to one class label; p is an integer greater than 1, and m is an integer less than or equal to P.
The P bounding boxes may include a foreground bounding box and a background bounding box, and each bounding box corresponds to a score, regardless of whether the bounding box is a foreground bounding box or a background bounding box.
In the embodiment of the present application, the automatic label model based on the perceptually unknown RPN (Region Proposal Network) may be preconfigured or set by default by the system. As shown in fig. 1B, the dashed bounding boxes represent marked boxes, the thin solid bounding boxes represent detection boxes, and the bold solid bounding boxes represent unknown boxes, that is, boxes in which unknown objects are located.
In a specific implementation, an automatic label model based on a perceptually unknown RPN may be used to generate a set of bounding boxes for foreground and background instances of the target input image, where the bounding boxes may include P bounding boxes and P scores corresponding to the bounding boxes, each bounding box corresponds to a score, and the known class label detection model is used to label the P bounding boxes, so as to obtain P bounding boxes including m bounding boxes with class labels and P-m bounding boxes without class labels, where each bounding box in the m bounding boxes corresponds to a class label, P is an integer greater than 1, and m is an integer less than or equal to P.
103. And selecting the scores which are larger than a preset threshold value from the P scores to obtain Q scores, wherein Q is a positive integer which is smaller than or equal to P.
The preset threshold may be preconfigured or set by default by the system; a score larger than the preset threshold can be understood as indicating a recognition result of high accuracy.
In this embodiment of the application, a score larger than a preset threshold may be selected from scores of a corresponding background bounding box in the P scores to obtain Q scores, where Q is a positive integer smaller than or equal to P.
104. And determining n bounding boxes of the P-m bounding boxes, wherein the n bounding boxes do not have an overlapping area with any bounding box of the m bounding boxes with the class label, and n is a positive integer less than or equal to P-m.
In the embodiment of the application, n bounding boxes of the P-m bounding boxes, which do not have an overlapping region with any bounding box of the m bounding boxes with the class label, can be determined, where n is a positive integer less than or equal to P-m, that is, an independent bounding box without the class label. Alternatively, n bounding boxes of the P-m bounding boxes that do not have an overlap region with any of the m bounding boxes having a category label and that do not have an overlap region with any foreground bounding box may also be determined.
105. And determining k target boundary boxes according to the Q scores and the n boundary boxes, marking the k target boundary boxes as unknown objects, wherein k is a positive integer.
In the embodiment of the present application, the task of the semi-supervised object detection framework is to identify objects introduced as "unknown" without explicit supervision.
In specific implementation, the k target bounding boxes may all be background bounding boxes, and then the k background bounding boxes are marked as unknown objects, where k is a positive integer. Of course, the k target bounding boxes may be all foreground bounding boxes, or the k target bounding boxes may include a partial background bounding box and a partial foreground bounding box.
In an embodiment of the present application, given an input image, a set of bounding-box predictions and corresponding scores may be generated for foreground and background instances by the perceptually unknown RPN. Those boxes that have a high score but do not overlap with any labeled class object box can be labeled as potential unknown objects. In brief, the top-k background region proposals, sorted by their scores, are selected as unknown objects.
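As a minimal sketch of the selection procedure in steps 103 to 105 (this is not the patent's reference implementation; the function names, the use of IoU as the overlap measure, and the [x1, y1, x2, y2] box format are assumptions):

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all in [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def select_unknowns(boxes, scores, labeled_mask, threshold, k):
    """Mark as unknown the top-k high-score boxes that overlap no labeled box.

    boxes: (P, 4) proposals; scores: (P,) objectness scores;
    labeled_mask: (P,) bool, True for the m boxes carrying a class label.
    """
    labeled = boxes[labeled_mask]
    candidates = []
    for i in np.flatnonzero(~labeled_mask):  # the P-m boxes without labels
        if scores[i] <= threshold:           # keep only the Q scores above threshold
            continue
        if labeled.size and iou(boxes[i], labeled).max() > 0:  # no overlap allowed
            continue
        candidates.append(i)                 # one of the n independent boxes
    candidates.sort(key=lambda i: scores[i], reverse=True)
    return candidates[:k]                    # the k target bounding boxes
```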
Optionally, the method may further include the following steps:
a1, inputting the k target bounding boxes and foreground bounding boxes in the P bounding boxes into an ROI head layer to obtain feature vectors;
and A2, inputting the feature vector into a comparison clustering function to obtain a known label category and an unknown label category.
Wherein, the contrast clustering function can be preset or default by the system.
In the embodiment of the application, the identification of the unknown class is the most important step, and the method is to use a contrast clustering algorithm to model a contrast clustering problem, so that the instances of the same class are forced to be kept nearby, and the instances of different classes are forced to be far away.
In specific implementation, a feature vector can be generated in an ROI head layer, the feature vector is used for representing a label, a contrast clustering function is adopted to force classification separation, and a contrast clustering loss is superposed on a conventional loss function to achieve the effects of forcibly reducing intra-class differences and increasing inter-class differences.
In the embodiment of the application, k target bounding boxes and foreground bounding boxes in P bounding boxes can be input into an ROI head layer to obtain feature vectors, and the feature vectors are input into a contrast clustering function to obtain known label categories and unknown label categories. As shown in fig. 1C, where diamonds represent marker class 1, circles represent marker class 2, and ellipses represent unknown class 3.
In a specific implementation, for each class i, the perceptually unknown RPN generates boxes of different classes, and the boxes are input to the ROI head layer to generate feature-vector representations. The contrastive clustering loss function is then:

Lcont(fc) = Σ_{i=0..C} l(fc, pi)

l(fc, pi) = D(fc, pi) if i = c, and l(fc, pi) = max{0, Δ − D(fc, pi)} otherwise,

where D is any distance function, fc is the feature vector obtained from the ROI head layer, pi is the target (prototype) vector of class i, and Lcont is the contrastive clustering loss function. Δ defines a margin between similarity and dissimilarity, and minimizing the loss ensures class separation. The target vectors P = {p0, ..., pC} are created using the mean of the feature vectors corresponding to each class; a fixed-length queue qi may be defined to hold the features of each class's responses, and the stored features Fstore = {q0, ..., qC} are kept in the corresponding queues. The contrastive clustering loss function can be used to optimize or train the contrastive clustering function, and the contrastive clustering function may include the contrastive clustering loss function.
In a specific implementation, the loss computation proceeds as follows. The inputs are the feature vector fc whose loss is to be computed, the stored features Fstore, the current iteration i, the class target features P = {p0, ..., pC}, a momentum parameter a, and a start iteration Ib. The steps may include:
1. Initialize P.
2. Set Lcont = 0.
3. If i == Ib, compute the update of P as the mean of Fstore, and compute Lcont from P and fc.
4. If i > Ib and i % Ib == 0, compute the update Pnew as the mean of Fstore, and set P = a·P + (1 − a)·Pnew.
5. Compute Lcont from P and fc.
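The following Python sketch illustrates the loss of the two formulas above and the prototype update in steps 1 to 5; the Euclidean distance is assumed for D (the text allows any distance function), and all names are hypothetical:

```python
import torch
import torch.nn.functional as F

def contrastive_cluster_loss(f_c, c, prototypes, delta=1.0):
    """Lcont for one ROI feature f_c of class c against prototypes P = {p0..pC}.

    Pulls f_c toward its own class prototype and pushes it at least
    delta away from every other prototype (hinge on the distance D).
    """
    d = torch.norm(prototypes - f_c, dim=1)  # D(f_c, p_i), Euclidean by assumption
    mask = torch.ones_like(d, dtype=torch.bool)
    mask[c] = False                          # exclude the i == c term from the hinge
    return d[c] + F.relu(delta - d)[mask].sum()

def update_prototypes(prototypes, feature_store, iteration, ib, a=0.9):
    """Momentum update of class prototypes from the per-class queues Fstore."""
    p_new = torch.stack([q.mean(dim=0) for q in feature_store])
    if iteration == ib:                          # step 3: first direct assignment
        return p_new
    if iteration > ib and iteration % ib == 0:   # step 4: P = a*P + (1-a)*Pnew
        return a * prototypes + (1 - a) * p_new
    return prototypes
```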
Optionally, feature vectors may be generated in the ROI head layer for the k target bounding boxes and a foreground bounding box of the P bounding boxes, and the feature vectors are used to represent foreground frame labels and unknown frame labels, that is, a contrast clustering function is used to force a known label category and an unknown label category to be separated, and different categories of the unknown frame to be separated.
Optionally, the method may further include the following steps:
b1, constructing an energy basic model based on a feature vector in a preset feature space, wherein the energy basic model is a corresponding relation between an evaluation observation variable and an output variable set;
and B2, updating at least one classification model in the automatic label model based on the perception unknown RPN according to the energy basic model.
The preset feature space may be preconfigured or set by default by the system. The energy-based model may likewise be preconfigured or set by default by the system.
In a specific implementation, an energy-based model can be constructed based on the feature vectors in a preset feature space, the energy-based model evaluating the correspondence between an observed variable and a set of output variables; at least one classification model in the automatic label model based on the perceptually unknown RPN is updated according to the energy-based model. That is, an energy head is used to construct the automatic label model based on the perceptually unknown RPN, an energy function is computed, and known data points and unknown data points can be clearly separated through the energy function computation.
In the embodiment of the present application, given a feature vector f ∈ F in the feature space F and its corresponding label l ∈ L, an energy function E(f, l) is sought to be learned. The model uses a single output scalar E(f) to estimate the compatibility between the observed variable f and the set of possible output variables l. The inherent ability of EBMs is to assign low energy values to in-distribution data and vice versa, which facilitates using the energy value as a metric to characterize whether a sample comes from an unknown class. The corresponding free energy is:

E(f; g) = −T · log ∫ exp(−E(f, l′)/T) dl′

where T is a temperature parameter.
Correspondingly, in the embodiment of the present application, the energy of the classification model may also be defined as follows:

E(f; g) = −T · log Σ_{i=1..C} exp(gi(f)/T)

where gi(f) is the i-th class output (logit) of the network model's classification head.
In the embodiment of the application, the above formulas provide a method for converting a standard classification head into an energy function, and the energy function computation can clearly separate known data points from unknown data points. The energy distributions of the known and unknown data points are modeled, and the learned distributions can be used to predict unknown labels.
In a specific implementation, an energy head is used for replacing a classification head, an energy function is calculated, and a known data point and an unknown data point can be separated. Energy can be applied to the feature vector to separate out known data points and unknown data points. Specifically, the features corresponding to the foreground frame tag and the unknown frame tag may be input to the energy base model to separate the known data point and the unknown data point.
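As a hedged illustration of the energy head (the threshold-based split below is a simplification of the distribution modeling described above, and the parameter names and values are assumptions):

```python
import torch

def classification_energy(logits, temperature=1.0):
    """E(f; g) = -T * log sum_i exp(g_i(f) / T) over the known-class logits."""
    return -temperature * torch.logsumexp(logits / temperature, dim=-1)

def flag_unknowns(logits, energy_threshold, temperature=1.0):
    """Treat high-energy samples as unknown.

    In-distribution (known) samples are assigned low energy values,
    so samples whose energy exceeds the threshold are flagged as unknown.
    """
    energy = classification_energy(logits, temperature)
    return energy > energy_threshold  # True -> unknown data point
```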
Further, in a specific implementation, the trained model is used to reason about the unlabeled samples (such as unknown data points), and the learned label classes and the unknown label classes are labeled.
Optionally, the method may further include the following steps:
c1, obtaining a plurality of samples of the unlabeled class labels, and classifying the samples into x equal parts to obtain x equal part samples, wherein x is a positive integer;
c2, inputting the x equal sample into the automatic label model based on the perception unknown RPN for reasoning to obtain a class b learned class label and an x-b unknown class label;
c3, determining real learned class labels and real unknown class labels in a man-machine interaction mode;
c4, determining a position label frame difference value between the learned class label and the unknown class label according to the class b learned class label, the x-b unknown class label, the real learned class label and the real unknown class label;
and C5, training a preset regression model by using the difference value, wherein the preset regression model is used for correcting the subsequent unknown class labels.
In the embodiment of the application, a plurality of samples with unlabeled class labels can be obtained and divided into x equal parts to obtain x equal sample parts, x being a positive integer. The x sample parts are input into the automatic label model based on the perceptually unknown RPN for inference, obtaining b classes of learned class labels and x-b unknown class labels. The real learned class labels and the real unknown class labels are determined by means of human-machine interaction, that is, by manual inspection. The difference between the predicted result and the real result is determined according to the b classes of learned class labels, the x-b unknown class labels, the real learned class labels and the real unknown class labels; based on this difference, the position-label-box difference values between the learned class labels and the unknown class labels can be determined. Finally, a preset regression model can be trained with these difference values, the preset regression model being used to correct subsequent unknown class labels.
For example, in the embodiment of the present application, the samples without labeled unknown classes may be divided into N equal parts, and the trained model is used to perform inference on them, labeling the learned label classes and the unknown label classes. A portion of the unknown-class label boxes is manually checked; the difference values between the positions of the manually checked unknown-class label boxes and the position label boxes inferred by the model are then counted, and a regression model is trained with this group of difference values. The unknown-class label boxes are then rectified with the regression model.
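The embodiment does not specify the form of the preset regression model; the sketch below assumes a simple linear (ridge) regressor over box offsets, with hypothetical names, purely to illustrate the training-on-differences step:

```python
import numpy as np
from sklearn.linear_model import Ridge

def train_box_corrector(pred_boxes, checked_boxes):
    """Fit a regressor on (model-inferred box -> manually checked box) offsets.

    pred_boxes, checked_boxes: (K, 4) arrays of [cx, cy, w, h] for the
    manually inspected unknown-class label boxes and the model predictions.
    """
    deltas = checked_boxes - pred_boxes  # the counted position difference values
    model = Ridge(alpha=1.0)
    model.fit(pred_boxes, deltas)        # predict the offset from the raw box
    return model

def rectify(model, boxes):
    """Correct subsequent unknown-class label boxes with the learned offsets."""
    return boxes + model.predict(boxes)
```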
Further, optionally, the method may further include the following steps:
c6, correcting the k target boundary frames through the preset regression model to obtain a correction result;
c7, determining the number of positive sample frames, the number of labels and the overlapping area of a marking frame and a prediction frame in the target input image according to the correction result;
c8, updating a preset loss function according to the number of the positive sample frames, the number of the labels and the overlapping area of the marking frame and the prediction frame;
and C9, performing consistency voting operation on the correction result according to the updated preset loss function.
In a specific implementation, the preset loss function may be preconfigured or set by default by the system. In the preset loss function, σ_j denotes the regression consistency of the boxes assigned to label j, u denotes the overlap area between the marker box and the prediction box, N denotes the number of positive sample boxes, M denotes the number of negative sample boxes, and j indexes the labels (including known labels and unknown labels).
In the embodiment of the present application, the k target bounding boxes in step 105 may be corrected through the preset regression model to obtain the correction result; that is, a correct category label may be assigned to a bounding box corresponding to an unknown category that was originally missed during marking. Finally, a consistency voting operation is performed on the correction result according to the updated preset loss function.
In a specific implementation, the unknown-class label boxes are corrected, and in the embodiment of the present application positive-sample consistency voting may be adopted. In a detection algorithm, the consistency of the regression results of the boxes assigned to each ground-truth box can reflect the localization quality of the corresponding box. The regression consistency of the boxes is denoted σ_j, where u denotes the overlap area between the marker box and the prediction box, N denotes the number of positive sample boxes, and j indexes the labels (both known and unknown). A regression loss function Lu_reg is used to implement positive-sample consistency voting, where reg denotes the regression output and the ground-truth output. Under positive-sample consistency voting, a marker box that is localized accurately is assigned a larger regression loss weight, whereas an inaccurately localized one is assigned a smaller regression loss weight.
In the embodiment of the application, the trained model is used to perform inference on the unmarked samples, and the learned label classes and the unknown label classes are marked. A portion of the unknown labels is manually checked; the position differences between the manually checked unknown labels and the label positions inferred by the model are then counted, and a regression model is trained with this group of difference values. The unknown label boxes are then corrected with the regression model and further corrected with positive-sample consistency voting, implemented with the regression loss function Lu_reg.
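The exact formulas for σ_j and Lu_reg are not reproduced here; one possible reading consistent with the surrounding description (a per-box regression loss scaled by consistency-derived weights) is sketched below, with all names hypothetical:

```python
import torch
import torch.nn.functional as F

def consistency_weighted_reg_loss(reg_pred, reg_true, weights):
    """Positive-sample consistency voting as a weighted regression loss.

    reg_pred, reg_true: (N, 4) regression outputs and ground truth for the
    N positive sample boxes; weights: (N,) per-box weights derived from the
    regression consistency sigma_j -- accurately localized marker boxes get
    larger regression loss weights, inaccurate ones smaller.
    """
    per_box = F.smooth_l1_loss(reg_pred, reg_true, reduction="none").sum(dim=1)
    return (weights * per_box).sum() / weights.sum().clamp(min=1e-6)
```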
In a specific implementation, the trained model is used to perform inference on unlabeled samples, and the learned label classes and the unknown label classes are marked; a part of the labels is manually checked, the difference values between the label positions checked by the worker and the label positions inferred by the model are counted, and a regression model is trained with this group of difference values. The unknown label boxes are then rectified with the regression model, and further rectified with positive-sample consistency voting. The rectified unknown label boxes then become known boxes, and model training with fine-tuning is carried out to obtain a new detection model; the above steps are then repeated to expand to a plurality of unknown-class targets.
In a specific implementation, as shown in fig. 1D, the target input image may be sequentially input to a backbone network, a Feature Pyramid Network (FPN) and the perceptually unknown RPN network, and the obtained result is then input to an optimized ROI Head network; the ROI Head network is optimized specifically through an energy classification head trained with its loss function and a regression head trained with its loss function.
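A skeleton of this fig. 1D pipeline is sketched below; the module composition and names are assumptions, with the actual sub-networks left as placeholders:

```python
import torch.nn as nn

class OpenWorldDetector(nn.Module):
    """backbone -> FPN -> perceptually unknown RPN -> optimized ROI head."""

    def __init__(self, backbone, fpn, rpn, roi_head, energy_head, reg_head):
        super().__init__()
        self.backbone, self.fpn, self.rpn = backbone, fpn, rpn
        self.roi_head = roi_head        # produces per-proposal feature vectors
        self.energy_head = energy_head  # energy classification head (known/unknown)
        self.reg_head = reg_head        # bounding-box regression head

    def forward(self, image):
        features = self.fpn(self.backbone(image))
        proposals, scores = self.rpn(features)  # the P boxes and P scores
        roi_feats = self.roi_head(features, proposals)
        return self.energy_head(roi_feats), self.reg_head(roi_feats), proposals, scores
```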
For example, in this embodiment of the application, a known-class label detection model (the automatic label model based on the perceptually unknown RPN) is trained, and the input image is detected with it. The automatic label model based on the perceptually unknown RPN is used to generate foreground and background instance boxes for the input image, yielding a group of bounding boxes and corresponding scores: a part are foreground bounding boxes with corresponding scores, and a part are background bounding boxes with corresponding scores. Among the background boxes, the bounding boxes whose scores are larger than the threshold are selected; from these, k target bounding boxes that do not overlap with any foreground box among the P bounding boxes, that is, k background bounding boxes, are selected and marked as unknown objects, k being a positive integer. Feature vectors are generated in the ROI head layer for the k background bounding boxes and the foreground bounding boxes among the P bounding boxes, the feature vectors are used to represent the foreground-box labels and the unknown-box labels, and the contrastive clustering function is used to force the known label classes and the unknown label classes apart, separating the different classes of the unknown boxes. An energy head is used in place of the classification head, the energy function is computed, and known data points and unknown data points can be separated. The trained model is used to perform inference on the unmarked samples, the learned label classes and the unknown label classes are marked, a part of the labels is manually checked, the differences between the label positions checked by the worker and the label positions inferred by the model are counted, and a regression model is trained with this group of difference values. The unknown label boxes are then rectified with the regression model and further rectified with positive-sample consistency voting. The rectified unknown label boxes become known boxes, and model training with fine-tuning is carried out to obtain a new detection model; the above steps can be repeated to improve recognition accuracy and to expand to a plurality of unknown-class targets.
It can be seen that, in the image recognition method described in the embodiment of the present application, a target input image is obtained, and a set of bounding boxes is generated for foreground and background instances of the target input image by using an automatic label model based on a perceptually unknown RPN, where the bounding boxes include P bounding boxes and P scores corresponding to the P bounding boxes, each bounding box corresponds to one score, the P bounding boxes include m bounding boxes with category labels and P-m bounding boxes without category labels, and each bounding box in the m bounding boxes corresponds to one category label, P being an integer larger than 1 and m an integer smaller than or equal to P. The scores larger than a preset threshold value among the P scores are selected to obtain Q scores, Q being a positive integer smaller than or equal to P; n bounding boxes among the P-m bounding boxes that have no overlapping region with any of the m bounding boxes with category labels are determined, n being a positive integer smaller than or equal to P-m; and k target bounding boxes are determined according to the Q scores and the n bounding boxes and marked as unknown objects, k being a positive integer. In this way, an unknown object can be accurately marked under the condition that the interference of known-category targets is eliminated, so that image recognition accuracy is improved.
Referring to fig. 2, fig. 2 is a schematic flow chart of another image recognition method provided in the embodiment of the present application, applied to an electronic device, consistent with the embodiment shown in fig. 1A, as shown in the figure, the image recognition method includes:
201. and acquiring a target input image, and preprocessing the target input image.
202. Generating a group of bounding boxes for foreground and background examples of the preprocessed target input image by adopting an automatic label model based on perception unknown RPN, wherein the bounding boxes comprise P bounding boxes and P scores, each bounding box corresponds to one score, the P bounding boxes comprise m bounding boxes with class labels and P-m bounding boxes without the class labels, and each bounding box in the m bounding boxes corresponds to one class label; p is an integer greater than 1, and m is an integer less than or equal to P.
203. And selecting the scores which are larger than a preset threshold value from the P scores to obtain Q scores, wherein Q is a positive integer which is smaller than or equal to P.
204. And determining n bounding boxes of the P-m bounding boxes, wherein the n bounding boxes do not have an overlapping area with any bounding box of the m bounding boxes with the class label, and n is a positive integer less than or equal to P-m.
205. And determining k target boundary boxes according to the Q scores and the n boundary boxes, marking the k target boundary boxes as unknown objects, wherein k is a positive integer.
Wherein the pre-treatment may comprise at least one of: image enhancement, image noise reduction, image magnification, image reduction, and the like, without limitation.
For the detailed description of steps 201 to 205, reference may be made to the corresponding steps of the image recognition method described in fig. 1A, which are not described herein again.
It can be seen that, according to the image recognition method described in the embodiment of the application, on one hand, an input image can be preprocessed to improve the detection accuracy of the input image, and on the other hand, not only can a target of a known type be recognized, but also an unknown object can be accurately marked under the condition that the interference of the target of the known type is eliminated, so that the image recognition accuracy is improved.
In accordance with the foregoing embodiments, please refer to fig. 3, where fig. 3 is a schematic structural diagram of an electronic device provided in an embodiment of the present application, and as shown in the drawing, the electronic device includes a processor, a memory, a communication interface, and one or more programs, which are applied to the electronic device, the one or more programs are stored in the memory and configured to be executed by the processor, and in an embodiment of the present application, the programs include instructions for performing the following steps:
acquiring a target input image;
generating a group of bounding boxes for foreground and background examples of the target input image by using an automatic label model based on a perception unknown RPN, wherein the bounding boxes comprise P bounding boxes and P corresponding scores, each bounding box corresponds to one score, the P bounding boxes comprise m bounding boxes with class labels and P-m bounding boxes without the class labels, and each bounding box in the m bounding boxes corresponds to one class label; p is an integer greater than 1, m is an integer less than or equal to P;
selecting scores of the P scores which are larger than a preset threshold value to obtain Q scores, wherein Q is a positive integer smaller than or equal to P;
determining n bounding boxes of the P-m bounding boxes, wherein the n bounding boxes do not have an overlapping region with any one of the m bounding boxes with the class labels, and n is a positive integer less than or equal to P-m;
and determining k target boundary boxes according to the Q scores and the n boundary boxes, marking the k target boundary boxes as unknown objects, wherein k is a positive integer.
Optionally, the program further includes instructions for performing the following steps:
inputting the k target bounding boxes and foreground bounding boxes in the P bounding boxes into an ROI head layer to obtain feature vectors;
and inputting the characteristic vector into a comparison clustering function to obtain a known label category and an unknown label category.
Optionally, the program further includes instructions for performing the following steps:
constructing an energy basic model based on the feature vectors in the preset feature space, wherein the energy basic model is the corresponding relation between the evaluation observation variable and the output variable set;
and updating at least one classification model in the automatic label model based on the perception unknown RPN according to the energy basic model.
Optionally, the program further includes instructions for performing the following steps:
obtaining a plurality of samples of unlabeled class labels, and classifying the samples into x equal parts to obtain x equal part samples, wherein x is a positive integer;
inputting the x equal sample into the automatic label model based on the perception unknown RPN for reasoning to obtain a class b learned class label and an x-b unknown class label;
determining real learned category labels and real unknown category labels in a man-machine interaction mode;
determining a position tag frame difference value between the learned class tag and the unknown class tag according to the class b learned class tag, the class x-b unknown class tag, the real learned class tag and the real unknown class tag;
and training a preset regression model by using the difference, wherein the preset regression model is used for correcting the subsequent unknown class labels.
Optionally, the program further includes instructions for performing the following steps:
correcting the k target boundary frames through the preset regression model to obtain a correction result;
determining the number of positive sample frames, the number of labels and the overlapping area of a marking frame and a prediction frame in the target input image according to the correction result;
updating a preset loss function according to the number of the positive sample frames, the number of the labels and the overlapping area of the marking frame and the prediction frame;
and performing consistency voting operation on the correction result according to the updated preset loss function.
It can be seen that, in the electronic device described in this embodiment of the present application, a target input image is obtained, and a set of bounding boxes is generated for foreground and background instances of the target input image by using an automatic label model based on a perceptually unknown RPN, where a bounding box includes P bounding boxes and P scores corresponding to the P bounding boxes, each bounding box corresponds to one score, the P bounding boxes include m bounding boxes with category labels and P-m bounding boxes without category labels, and each bounding box in the m bounding boxes corresponds to one category label. The scores larger than a preset threshold value among the P scores are selected to obtain Q scores, Q being a positive integer smaller than or equal to P; n bounding boxes among the P-m bounding boxes that have no overlapping area with any of the m bounding boxes with category labels are determined, n being a positive integer smaller than or equal to P-m; and k target bounding boxes are determined according to the Q scores and the n bounding boxes and marked as unknown objects, k being a positive integer. In this way, unknown objects can be accurately marked while excluding the interference of known-category targets, thereby improving image recognition accuracy.
Fig. 4 is a block diagram showing functional units of an image recognition apparatus 400 according to an embodiment of the present application. The image recognition apparatus 400 is applied to an electronic device, and the image recognition apparatus 400 may include: an acquisition unit 401, a recognition unit 402, a selection unit 403 and a determination unit 404, wherein,
the acquiring unit 401 is configured to acquire a target input image;
the identifying unit 402 is configured to generate a set of bounding boxes for foreground and background instances of the target input image by using an automatic label model based on a perceptually unknown RPN, where the bounding boxes include P bounding boxes and P scores, each bounding box corresponds to one score, the P bounding boxes include m bounding boxes with class labels and P-m bounding boxes without class labels, and each bounding box in the m bounding boxes corresponds to one class label; p is an integer greater than 1, m is an integer less than or equal to P;
the selecting unit 403 is configured to select a score greater than a preset threshold from the P scores to obtain Q scores, where Q is a positive integer less than or equal to P;
the determining unit 404 is configured to determine n bounding boxes, which do not have an overlapping area with any bounding box of the m bounding boxes with the category label, in the P-m bounding boxes, where n is a positive integer smaller than or equal to P-m; and determining k target boundary boxes according to the Q scores and the n boundary boxes, marking the k target boundary boxes as unknown objects, wherein k is a positive integer.
Optionally, the apparatus 400 is further specifically configured to:
inputting the k target bounding boxes and foreground bounding boxes in the P bounding boxes into an ROI head layer to obtain feature vectors;
and inputting the characteristic vector into a comparison clustering function to obtain a known label category and an unknown label category.
Optionally, the apparatus 400 is further specifically configured to:
constructing an energy basic model based on the feature vectors in the preset feature space, wherein the energy basic model is the corresponding relation between the evaluation observation variable and the output variable set;
and updating at least one classification model in the automatic label model based on the perception unknown RPN according to the energy basic model.
Optionally, the apparatus 400 is further specifically configured to:
obtaining a plurality of samples of unlabeled class labels, and classifying the samples into x equal parts to obtain x equal part samples, wherein x is a positive integer;
inputting the x equal sample into the automatic label model based on the perception unknown RPN for reasoning to obtain a class b learned class label and an x-b unknown class label;
determining real learned class labels and real unknown class labels in a man-machine interaction mode;
determining a position tag frame difference value between the learned class tag and the unknown class tag according to the class b learned class tag, the x-b unknown class tag, the real learned class tag and the real unknown class tag;
and training a preset regression model by using the difference, wherein the preset regression model is used for correcting the subsequent unknown class labels.
Optionally, the apparatus 400 is further specifically configured to:
correcting the k target boundary frames through the preset regression model to obtain a correction result;
determining the number of positive sample frames, the number of labels and the overlapping area of a marking frame and a prediction frame in the target input image according to the correction result;
updating a preset loss function according to the number of the positive sample frames, the number of the labels and the overlapping area of the marking frame and the prediction frame;
and performing consistency voting operation on the correction result according to the updated preset loss function.
It can be seen that, in the image recognition apparatus described in the embodiment of the present application, a target input image is obtained, and a set of bounding boxes is generated for foreground and background instances of the target input image by using an automatic label model based on a perceptually unknown RPN, where a bounding box includes P bounding boxes and P scores, each bounding box corresponds to one score, the P bounding boxes include m bounding boxes with category labels and P-m bounding boxes without category labels, and each bounding box in the m bounding boxes corresponds to one category label. The scores larger than a preset threshold value among the P scores are selected to obtain Q scores, Q being a positive integer smaller than or equal to P; n bounding boxes among the P-m bounding boxes that have no overlapping area with any of the m bounding boxes with category labels are determined, n being a positive integer smaller than or equal to P-m; and k target bounding boxes are determined according to the Q scores and the n bounding boxes and marked as unknown objects, k being a positive integer. In this way, unknown objects can be accurately marked while excluding the interference of known-category targets, thereby improving image recognition accuracy.
It can be understood that the functions of each program module of the image recognition apparatus of this embodiment may be specifically implemented according to the method in the foregoing method embodiment, and the specific implementation process may refer to the related description of the foregoing method embodiment, which is not described herein again.
Embodiments of the present application further provide a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, the computer program enables a computer to execute part or all of the steps of any one of the methods as described in the above method embodiments, and the computer includes an electronic device.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any of the methods as described in the above method embodiments. The computer program product may be a software installation package, the computer comprising an electronic device.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art will recognize that the embodiments described in this specification are preferred embodiments and that acts or modules referred to are not necessarily required for this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the above-described division of the units is only one type of division of logical functions, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of some interfaces, devices or units, and may be an electric or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those skilled in the art will appreciate that all or part of the steps of the methods of the above embodiments may be implemented by a program instructing relevant hardware. The program may be stored in a computer-readable memory, which may include a flash memory disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
The embodiments of the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application, and the above description of the embodiments is only intended to help understand the method and core idea of the present application. Meanwhile, a person skilled in the art may, based on the idea of the present application, make changes to the specific implementations and the application scope. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (10)

1. An image recognition method, characterized in that the method comprises:
acquiring a target input image;
generating a group of bounding boxes for foreground and background instances of the target input image by using an auto-labeling model based on an unknown-aware region proposal network (RPN), wherein the group comprises P bounding boxes and P corresponding scores, each bounding box corresponding to one score; the P bounding boxes comprise m bounding boxes with class labels and P-m bounding boxes without class labels, and each of the m bounding boxes corresponds to one class label; P is an integer greater than 1, and m is an integer less than or equal to P;
selecting, from the P scores, the scores greater than a preset threshold to obtain Q scores, wherein Q is a positive integer less than or equal to P;
determining n bounding boxes among the P-m bounding boxes, wherein the n bounding boxes have no overlapping area with any of the m bounding boxes with class labels, and n is a positive integer less than or equal to P-m; and
determining k target bounding boxes according to the Q scores and the n bounding boxes, and marking the k target bounding boxes as unknown objects, wherein k is a positive integer.
2. The method of claim 1, further comprising:
inputting the k target bounding boxes and the foreground bounding boxes among the P bounding boxes into an ROI head layer to obtain feature vectors; and
inputting the feature vectors into a contrastive clustering function to obtain known label classes and unknown label classes.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
constructing an energy-based model from the feature vectors in a preset feature space, wherein the energy-based model evaluates a correspondence between observed variables and a set of output variables; and
updating at least one classification model in the auto-labeling model based on the unknown-aware RPN according to the energy-based model.
4. The method according to any one of claims 1-3, further comprising:
obtaining a plurality of samples without class labels, and dividing the samples into x equal parts to obtain x sample subsets, wherein x is a positive integer;
inputting the x sample subsets into the auto-labeling model based on the unknown-aware RPN for inference to obtain b learned class labels and x-b unknown class labels;
determining true learned class labels and true unknown class labels through human-computer interaction;
determining a position label-box difference between the learned class labels and the unknown class labels according to the b learned class labels, the x-b unknown class labels, the true learned class labels, and the true unknown class labels; and
training a preset regression model with the difference, wherein the preset regression model is used for correcting subsequent unknown class labels.
5. The method of claim 4, further comprising:
correcting the k target bounding boxes through the preset regression model to obtain a correction result;
determining, according to the correction result, the number of positive sample boxes, the number of labels, and the overlapping area between annotation boxes and prediction boxes in the target input image;
updating a preset loss function according to the number of positive sample boxes, the number of labels, and the overlapping area between the annotation boxes and the prediction boxes; and
performing a consistency voting operation on the correction result according to the updated preset loss function.
6. An image recognition apparatus, characterized in that the apparatus comprises an acquisition unit, an identification unit, a selection unit, and a determination unit, wherein:
the acquisition unit is configured to acquire a target input image;
the identification unit is configured to generate a group of bounding boxes for foreground and background instances of the target input image by using an auto-labeling model based on an unknown-aware RPN, wherein the group comprises P bounding boxes and P corresponding scores, each bounding box corresponding to one score; the P bounding boxes comprise m bounding boxes with class labels and P-m bounding boxes without class labels, and each of the m bounding boxes corresponds to one class label; P is an integer greater than 1, and m is an integer less than or equal to P;
the selection unit is configured to select, from the P scores, the scores greater than a preset threshold to obtain Q scores, wherein Q is a positive integer less than or equal to P; and
the determination unit is configured to determine n bounding boxes among the P-m bounding boxes, wherein the n bounding boxes have no overlapping area with any of the m bounding boxes with class labels, and n is a positive integer less than or equal to P-m; and to determine k target bounding boxes according to the Q scores and the n bounding boxes and mark the k target bounding boxes as unknown objects, wherein k is a positive integer.
7. The apparatus of claim 6, wherein the apparatus is further configured to:
input the k target bounding boxes and the foreground bounding boxes among the P bounding boxes into an ROI head layer to obtain feature vectors; and
input the feature vectors into a contrastive clustering function to obtain known label classes and unknown label classes.
8. The apparatus according to claim 6 or 7, wherein the apparatus is further configured to:
construct an energy-based model from the feature vectors in a preset feature space, wherein the energy-based model evaluates a correspondence between observed variables and a set of output variables; and
update at least one classification model in the auto-labeling model based on the unknown-aware RPN according to the energy-based model.
9. An electronic device, comprising a processor and a memory, wherein the memory is configured to store one or more programs to be executed by the processor, the one or more programs including instructions for performing the steps in the method of any one of claims 1-5.
10. A computer-readable storage medium, characterized in that it stores a computer program for electronic data exchange, wherein the computer program causes a computer to perform the method according to any one of claims 1-5.
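
For readers approaching the claims from an implementation angle, the selection procedure recited in claim 1 can be sketched in a few lines of Python. This is an illustrative, non-authoritative sketch only: the helper names (iou_matrix, select_unknown_boxes) and the default score threshold and k are assumptions, since the claims do not fix concrete values or APIs.

```python
# Illustrative sketch (not the patent's reference implementation) of the
# unknown-object selection recited in claim 1.
import numpy as np

def iou_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise intersection-over-union between box sets a (N,4) and b (M,4),
    boxes given as (x1, y1, x2, y2)."""
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-9)

def select_unknown_boxes(boxes, scores, has_label, score_thr=0.5, k=5):
    """boxes: (P,4); scores: (P,); has_label: (P,) bool mask of the m labeled
    boxes. Returns indices of up to k high-scoring unlabeled boxes that do not
    overlap any labeled box."""
    labeled = boxes[has_label]               # the m class-labeled boxes
    unlabeled_idx = np.where(~has_label)[0]  # the P-m unlabeled boxes
    # n candidates: unlabeled boxes with zero overlap against every labeled box
    if len(labeled):
        overlap = iou_matrix(boxes[unlabeled_idx], labeled).max(axis=1)
        unlabeled_idx = unlabeled_idx[overlap == 0.0]
    # keep the Q candidates whose score exceeds the preset threshold
    cand = unlabeled_idx[scores[unlabeled_idx] > score_thr]
    # k target boxes: the top-scoring survivors
    order = np.argsort(-scores[cand])
    return cand[order[:k]]
```

The indices returned would then be assigned the pseudo-label "unknown", matching the final step of claim 1.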
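Claim 2 routes the selected boxes through an ROI head and a "contrastive clustering function". The claims do not give its formula; the sketch below is a hedged guess modeled on the class-prototype contrastive losses common in open-world detectors, with the margin, the prototype tensor, and the function name all assumptions rather than disclosures of the patent.

```python
# Hedged sketch of a contrastive clustering loss over ROI-head features:
# features are pulled toward their own class prototype and pushed away from
# the nearest wrong prototype (the unknown class gets its own prototype).
import torch
import torch.nn.functional as F

def contrastive_clustering_loss(feats, labels, prototypes, margin=10.0):
    """feats: (N,D) ROI-head feature vectors; labels: (N,) class ids,
    including an id for the unknown class; prototypes: (C,D)."""
    d = torch.cdist(feats, prototypes)                # (N, C) distances
    pos = d[torch.arange(len(feats)), labels]         # distance to own prototype
    neg = d.clone()
    neg[torch.arange(len(feats)), labels] = float("inf")
    hardest_neg = neg.min(dim=1).values               # closest wrong prototype
    return F.relu(pos - hardest_neg + margin).mean()  # hinge: pull in, push away
```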
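Claim 3's energy-based model is likewise left abstract. One standard way to realize an energy score over the ROI-head outputs, shown below purely as an assumption, is the free energy E(x) = -T * logsumexp(f(x)/T) over the classification logits, with low energy indicating known classes and high energy indicating unknowns; the temperature T and the decision threshold are not values from the patent.

```python
# Hedged sketch of one common realization of claim 3's energy-based model:
# a free-energy score computed from per-box classification logits.
import torch

def free_energy(logits: torch.Tensor, T: float = 1.0) -> torch.Tensor:
    """E(x) = -T * logsumexp(logits / T); shape (N, C) -> (N,)."""
    return -T * torch.logsumexp(logits / T, dim=1)

def split_known_unknown(logits: torch.Tensor, energy_thr: float):
    """Returns (known_mask, unknown_mask); the threshold is an assumption."""
    e = free_energy(logits)
    return e <= energy_thr, e > energy_thr
```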
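Claim 5 feeds three quantities into the loss update: the number of positive sample boxes, the number of labels, and the overlap between annotation boxes and prediction boxes. The sketch below shows one plausible way to compute them; the 0.5 positive-matching rule and the overlap-based weighting are assumptions, as the patent does not disclose the form of the "preset loss function". It reuses iou_matrix() from the first sketch above.

```python
# Sketch of the bookkeeping recited in claim 5, producing per-box loss weights
# from the annotation/prediction overlap, normalized by the positive count.
import numpy as np

def loss_update_terms(pred_boxes, anno_boxes, labels):
    """pred_boxes: (N,4) corrected predictions; anno_boxes: (G,4) annotation
    boxes; labels: per-image label list. Returns (weights, num_pos, num_labels)."""
    best_overlap = iou_matrix(pred_boxes, anno_boxes).max(axis=1)  # (N,)
    positives = best_overlap > 0.5           # hypothetical positive-sample rule
    num_pos = max(int(positives.sum()), 1)   # avoid division by zero
    num_labels = len(labels)
    # e.g. weight each box's loss term by its overlap, normalized by positives
    weights = (best_overlap * positives) / num_pos
    return weights, num_pos, num_labels
```

The consistency voting operation is also unspecified; one natural reading is to retain a corrected box only when a majority of the reweighted predictions agree with it.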
CN202211422232.7A 2022-11-14 2022-11-14 Image recognition method and related equipment Pending CN115690514A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211422232.7A CN115690514A (en) 2022-11-14 2022-11-14 Image recognition method and related equipment

Publications (1)

Publication Number Publication Date
CN115690514A true CN115690514A (en) 2023-02-03

Family

ID=85052035

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211422232.7A Pending CN115690514A (en) 2022-11-14 2022-11-14 Image recognition method and related equipment

Country Status (1)

Country Link
CN (1) CN115690514A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116665018A (en) * 2023-07-28 2023-08-29 South China University of Technology Target detection method for open world unknown class identification


Similar Documents

Publication Publication Date Title
CN105825524B (en) Method for tracking target and device
CN113435546B (en) Migratable image recognition method and system based on differentiation confidence level
CN110472675A (en) Image classification method, image classification device, storage medium and electronic equipment
CN111241992B (en) Face recognition model construction method, recognition method, device, equipment and storage medium
CN111368682A (en) Method and system for detecting and identifying station caption based on faster RCNN
CN116453438B (en) Display screen parameter detection method, device, equipment and storage medium
CN110781970A (en) Method, device and equipment for generating classifier and storage medium
CN115690514A (en) Image recognition method and related equipment
KR102230559B1 (en) Method and Apparatus for Creating Labeling Model with Data Programming
CN116994049A (en) Full-automatic flat knitting machine and method thereof
CN116188445A (en) Product surface defect detection and positioning method and device and terminal equipment
CN114445716B (en) Key point detection method, key point detection device, computer device, medium, and program product
CN116246161A (en) Method and device for identifying target fine type of remote sensing image under guidance of domain knowledge
CN116258937A (en) Small sample segmentation method, device, terminal and medium based on attention mechanism
CN113269125B (en) Face recognition method, device, equipment and storage medium
CN114067401A (en) Target detection model training and identity verification method and device
CN113920475A (en) Security protection equipment identification method based on autonomous learning strategy and storage medium
CN114417965A (en) Training method of image processing model, target detection method and related device
CN118097793B (en) Self-adaptive interface gesture operation control system and control method thereof
CN113033807B (en) Online data collection method, neural network training method, related device and storage medium
KR102599020B1 (en) Method, program, and apparatus for monitoring behaviors based on artificial intelligence
CN117574098B (en) Learning concentration analysis method and related device
CN117809140B (en) Image preprocessing system and method based on image recognition
Foo Fighting video analysis employing computer vision technique
CN118097793A (en) Self-adaptive interface gesture operation control system and control method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination