CN116152576B - Image processing method, device, equipment and storage medium - Google Patents

Image processing method, device, equipment and storage medium

Info

Publication number
CN116152576B
CN116152576B CN202310416282.2A
Authority
CN
China
Prior art keywords
bounding box
preset
regressor
image
generalized
Prior art date
Legal status
Active
Application number
CN202310416282.2A
Other languages
Chinese (zh)
Other versions
CN116152576A (en)
Inventor
明安龙
梁文腾
薛峰
康学净
马华东
Current Assignee
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202310416282.2A
Publication of CN116152576A
Application granted
Publication of CN116152576B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/766: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715: Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/776: Validation; Performance evaluation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07: Target detection
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides an image processing method, apparatus, device and storage medium. The method acquires an image to be processed; extracts features from the image through a preset target detection model to obtain bounding boxes and bounding box features; performs detection with a preset generalized object confidence regressor on the bounding boxes and their features to obtain bounding boxes of unknown objects; and performs detection with a preset classifier and a preset bounding box displacement regressor on the bounding boxes and their features to obtain known objects, where the displacement regressor is trained on the bounding box features and bounding box displacement vectors of image samples. On the premise that the detection capability for known objects remains essentially unchanged, the method achieves effective detection of unknown objects and improves their detection precision; furthermore, false detections of non-objects are reduced through negative energy suppression, and unknown objects are accurately localized by an adaptive candidate box screening algorithm.

Description

Image processing method, device, equipment and storage medium
Technical Field
The present invention relates to the field of computer vision, and in particular, to an image processing method, apparatus, device, and storage medium.
Background
Object detection is one of the most fundamental tasks in computer vision; it aims to predict the class and bounding box of each object in an input image. However, because the real world contains a vast number of object categories and labeling is expensive, object detection has traditionally been formulated under a closed-world assumption: the detector only needs to detect a limited number of objects from classes it has learned. In recent years, the rapid development of autonomous driving and robotics has placed higher demands on object detection. A detector must find not only objects of predefined classes, i.e. known objects, but also objects it never saw during training, i.e. unknown objects, so that an autonomous vehicle or robot can cope with more challenging environments. Models designed under the closed-world assumption cannot meet these requirements: during training, even if an unknown object appears in a training image, the model learns it as background, so the detector cannot recognize the unknown object at test time.
Currently, there are two main approaches to identifying unknown objects: open-set classification and detection, and open-world object detection. Open-set classification and detection designs uncertainty measures over the feature differences between unknown and known objects in order to find unknown objects that the detector has misclassified as known. Open-world object detection aims to let the model detect both known and unknown objects, improving unknown-object detection by automatically labeling pseudo-unknown objects during training; the model can also incrementally learn annotations for new classes.
However, the detection precision of unknown objects in the prior art is low.
Disclosure of Invention
The application provides an image processing method, apparatus, device and storage medium to solve the technical problem of low detection precision for unknown objects in the prior art.
In a first aspect, the present application provides an image processing method, including:
acquiring an image to be processed;
extracting features of the image to be processed through a preset target detection model to obtain bounding boxes and bounding box features;
performing detection through a preset generalized object confidence regressor according to the bounding boxes and bounding box features to obtain bounding boxes of unknown objects, wherein the preset generalized object confidence regressor is trained on the bounding box features and generalized object confidence values of image samples;
and performing detection through a preset classifier and a preset bounding box displacement regressor according to the bounding boxes and bounding box features to obtain known objects, wherein the preset classifier is trained on the bounding box features and class probabilities of image samples, and the preset bounding box displacement regressor is trained on the bounding box features and bounding box displacement vectors of the image samples.
The application provides an image processing method that can simultaneously detect known and unknown objects in an image. For an image requiring object recognition, bounding boxes and bounding box features are first extracted. Bounding boxes of unknown objects are detected through a pre-constructed generalized object confidence regressor, which learns generalized object features from known-class objects so that unknown objects are fully captured. Known objects are detected through a pre-constructed classifier and bounding box displacement regressor. This achieves accurate detection of both unknown and known objects: effective detection of unknown objects is realized on the premise that the detection capability for known objects remains essentially unchanged, improving the detection precision of unknown objects.
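As a rough illustration only (the patent discloses no code), the two-branch inference described above can be sketched as follows. The head interfaces (`conf_regressor`, `classifier`, `bbox_regressor`), the thresholds, and the box encoding are assumptions made for this sketch:

```python
import numpy as np

def detect_open_world(boxes, feats, conf_regressor, classifier, bbox_regressor,
                      unknown_thresh=0.5, known_thresh=0.5):
    """Hypothetical inference flow: one shared set of proposals feeds two branches.

    boxes:  (N, 4) array of proposal boxes (x1, y1, x2, y2)
    feats:  (N, D) array of per-box features
    conf_regressor: feats -> (N,) generalized object confidence in [0, 1]
    classifier:     feats -> (N, C) class probabilities over known classes
    bbox_regressor: feats -> (N, 4) predicted displacement (dx1, dy1, dx2, dy2)
    """
    # Branch 1: unknown objects via the generalized object confidence regressor.
    obj_conf = conf_regressor(feats)
    unknown = boxes[obj_conf >= unknown_thresh]

    # Branch 2: known objects via the classifier + bounding box displacement regressor.
    probs = classifier(feats)
    labels = probs.argmax(axis=1)
    scores = probs.max(axis=1)
    refined = boxes + bbox_regressor(feats)   # apply predicted displacements
    keep = scores >= known_thresh
    known = list(zip(refined[keep], labels[keep], scores[keep]))
    return unknown, known
```

In a full system the two result sets would then be fused into one object set, as the method's final step describes.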
Optionally, the detection through the preset generalized object confidence regressor according to the bounding boxes and bounding box features to obtain bounding boxes of unknown objects includes: calculating a generalized object confidence for each bounding box through the preset generalized object confidence regressor according to the bounding boxes and bounding box features, and performing a first screening on the bounding boxes according to the generalized object confidence to obtain bounding boxes to be processed; and performing a second screening on the bounding boxes to be processed through an adaptive bounding box screening mechanism according to the generalized object confidence of each bounding box, to obtain the bounding boxes of unknown objects.
Optionally, the second screening of the bounding boxes to be processed through the adaptive bounding box screening mechanism according to the generalized object confidence includes: constructing the bounding boxes to be processed into a weighted undirected graph, wherein each node represents one bounding box to be processed and each edge is weighted by the degree of overlap between the corresponding nodes; iteratively decomposing the whole image to be processed into N subgraphs through a recursive normalized cut algorithm until the normalized cut cost of each subgraph is below a preset segmentation threshold, where N is any positive integer; and, within each subgraph, determining the bounding box to be processed with the highest generalized object confidence score as a bounding box of an unknown object.
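The three screening steps above (overlap-weighted graph, recursive normalized cut, per-subgraph argmax) can be sketched roughly as follows. The IoU edge weights, the spectral (Fiedler-vector) two-way split, the stopping threshold, and the tiny connectivity epsilon are illustrative assumptions, not details taken from the patent:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-12)

def ncut_cost(W, mask):
    """Normalized cut cost of splitting the nodes into mask / ~mask."""
    cut = W[mask][:, ~mask].sum()
    return cut / (W[mask].sum() + 1e-12) + cut / (W[~mask].sum() + 1e-12)

def fiedler_split(W):
    """Two-way spectral split via the 2nd eigenvector of the normalized Laplacian."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1) + 1e-12)
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)           # eigenvalues in ascending order
    v = vecs[:, 1]
    mask = v > np.median(v)
    if mask.all() or not mask.any():      # degenerate split: fall back
        mask = np.zeros(len(W), bool); mask[0] = True
    return mask

def screen_unknown_boxes(boxes, conf, split_thresh=1.0):
    """Recursively cut the box graph; keep the highest-confidence box per subgraph."""
    n = len(boxes)
    W = np.array([[iou(boxes[i], boxes[j]) for j in range(n)] for i in range(n)])
    np.fill_diagonal(W, 0.0)
    W += 1e-6 * (1.0 - np.eye(n))         # tiny coupling keeps the graph connected

    def recurse(idx):
        if len(idx) < 2:
            return [idx]
        sub = W[np.ix_(idx, idx)]
        mask = fiedler_split(sub)
        if ncut_cost(sub, mask) >= split_thresh:   # cutting is too costly: stop
            return [idx]
        return recurse(idx[mask]) + recurse(idx[~mask])

    groups = recurse(np.arange(n))
    return [int(g[np.argmax(conf[g])]) for g in groups]
```

With two well-separated clusters of overlapping boxes, this keeps exactly one box per cluster, which matches the per-subgraph selection rule described in the claim.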
Optionally, before the detection through the preset generalized object confidence regressor according to the bounding boxes and bounding box features, the method further includes: acquiring image samples; inputting the bounding box features and generalized object confidence values of the image samples to a two-stage target detector and training to obtain the preset generalized object confidence regressor; inputting the bounding box features and class probabilities of the image samples to the two-stage target detector and training to obtain the preset classifier; and inputting the bounding box features and bounding box displacement vectors of the image samples to the two-stage target detector and training to obtain the preset bounding box displacement regressor.
Optionally, after inputting the bounding box features and bounding box displacement vectors of the image samples to the two-stage target detector and training to obtain the preset bounding box displacement regressor, the method further includes: optimizing the preset classifier through negative energy suppression to obtain an optimized classifier; and/or optimizing the preset generalized object confidence regressor to obtain an optimized generalized object confidence regressor; and/or optimizing the preset bounding box displacement regressor to obtain an optimized bounding box displacement regressor. Correspondingly, the detection through the preset generalized object confidence regressor according to the bounding boxes and bounding box features to obtain bounding boxes of unknown objects includes: detecting with the optimized generalized object confidence regressor according to the bounding boxes and bounding box features to obtain the bounding boxes of unknown objects; and the detection through the preset classifier and preset bounding box displacement regressor according to the bounding boxes and bounding box features to obtain known objects includes: detecting with the optimized classifier and optimized bounding box displacement regressor according to the bounding boxes and bounding box features to obtain the known objects.
Optionally, the optimization of the preset classifier through negative energy suppression to obtain the optimized classifier includes: training the preset classifier with negative energy suppression, combining a cross-entropy loss function with an uncertainty-metric loss function based on virtual sample synthesis, to obtain the optimized classifier;
the optimization of the preset bounding box displacement regressor to obtain the optimized bounding box displacement regressor includes: training the preset bounding box displacement regressor with a preset regression loss function to obtain the optimized bounding box displacement regressor;
the optimization of the preset generalized object confidence regressor to obtain the optimized generalized object confidence regressor includes: setting K instances in the image sample, where K is any positive integer; defining two indices for the image sample, a cross-prediction ratio and a cross-value ratio, both computed from the K instances and the bounding box samples in the image sample; classifying the bounding box samples according to the cross-prediction ratio and the cross-value ratio, assigning bounding box samples containing the same object instance to the same group to obtain K groups of bounding box samples, and dividing the K groups into a complete-object set, a local-object set, an out-of-boundary-object set and a non-object set; obtaining a first loss parameter according to a first preset generalized object confidence score and the complete-object set; obtaining a second loss parameter according to a second preset generalized object confidence score and the local-object set and/or the out-of-boundary-object set; obtaining a third loss parameter through contrastive learning according to the complete-object set; and training the preset generalized object confidence regressor according to the first, second and third loss parameters to obtain the optimized generalized object confidence regressor.
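A toy sketch of how the three loss parameters might combine. The squared-error, hinge, and feature-spread forms, the preset scores, and the weights are all invented for illustration; the patent does not specify the actual loss functions:

```python
import numpy as np

def goc_training_loss(scores, feats, complete, negatives,
                      s_complete=1.0, s_margin=0.3, w=(1.0, 1.0, 0.1)):
    """Illustrative composite loss for the generalized object confidence regressor.

    scores:    (N,) predicted generalized object confidence per box sample
    feats:     (N, D) box features
    complete:  indices of complete-object box samples
    negatives: indices of local / out-of-boundary box samples
    """
    # Term 1: complete-object boxes should score near the first preset value.
    l_complete = np.mean((scores[complete] - s_complete) ** 2)
    # Term 2: local / out-of-boundary boxes should stay below the second preset value.
    l_negative = np.mean(np.maximum(0.0, scores[negatives] - s_margin))
    # Term 3: a crude contrastive proxy, pulling complete-box features together.
    f = feats[complete]
    l_contrast = np.mean((f - f.mean(axis=0, keepdims=True)) ** 2)
    return w[0] * l_complete + w[1] * l_negative + w[2] * l_contrast
```

Predictions matching the grouping (high scores on complete boxes, low scores on partial or out-of-boundary boxes) yield a lower loss than the reverse, which is the behavior the three loss parameters are meant to enforce.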
Optionally, after the detection through the preset classifier and preset bounding box displacement regressor according to the bounding boxes and bounding box features, the method further includes: fusing the bounding boxes of the unknown objects with the known objects to obtain an object set.
In a second aspect, the present application provides an image processing apparatus including:
an acquisition module, configured to acquire an image to be processed;
a feature extraction module, configured to extract features from the image to be processed through a preset target detection model to obtain bounding boxes and bounding box features;
a first object identification module, configured to perform detection through a preset generalized object confidence regressor according to the bounding boxes and bounding box features to obtain bounding boxes of unknown objects, wherein the preset generalized object confidence regressor is trained on the bounding box features and generalized object confidence values of image samples;
a second object identification module, configured to perform detection through a preset classifier and a preset bounding box displacement regressor according to the bounding boxes and bounding box features to obtain known objects, wherein the preset classifier is trained on the bounding box features and class probabilities of the image samples, and the preset bounding box displacement regressor is trained on the bounding box features and bounding box displacement vectors of the image samples.
Optionally, the first object identification module is specifically configured to: calculate a generalized object confidence for each bounding box through the preset generalized object confidence regressor according to the bounding boxes and bounding box features, and perform a first screening on the bounding boxes according to the generalized object confidence to obtain bounding boxes to be processed; and perform a second screening on the bounding boxes to be processed through an adaptive bounding box screening mechanism according to the generalized object confidence of each bounding box, to obtain the bounding boxes of unknown objects.
Optionally, the first object identification module is further specifically configured to: construct the bounding boxes to be processed into a weighted undirected graph, wherein each node represents one bounding box to be processed and each edge is weighted by the degree of overlap between the corresponding nodes; iteratively decompose the whole image to be processed into N subgraphs through a recursive normalized cut algorithm until the normalized cut cost of each subgraph is below a preset segmentation threshold, where N is any positive integer; and, within each subgraph, determine the bounding box to be processed with the highest generalized object confidence score as a bounding box of an unknown object.
Optionally, before the first object identification module performs detection through the preset generalized object confidence regressor according to the bounding boxes and bounding box features to obtain the bounding boxes of unknown objects, the apparatus further includes: a sample acquisition module, configured to acquire image samples; a first training module, configured to input the bounding box features and generalized object confidence values of the image samples to a two-stage target detector and train to obtain the preset generalized object confidence regressor; a second training module, configured to input the bounding box features and class probabilities of the image samples to the two-stage target detector and train to obtain the preset classifier; and a third training module, configured to input the bounding box features and bounding box displacement vectors of the image samples to the two-stage target detector and train to obtain the preset bounding box displacement regressor.
Optionally, after the third training module inputs the bounding box features and bounding box displacement vectors of the image samples to the two-stage target detector and trains to obtain the preset bounding box displacement regressor, the apparatus further includes an optimization module configured to: optimize the preset classifier through negative energy suppression to obtain an optimized classifier; and/or optimize the preset generalized object confidence regressor to obtain an optimized generalized object confidence regressor; and/or optimize the preset bounding box displacement regressor to obtain an optimized bounding box displacement regressor.
Correspondingly, the first object identification module is specifically configured to detect with the optimized generalized object confidence regressor according to the bounding boxes and bounding box features to obtain the bounding boxes of unknown objects; and the second object identification module is specifically configured to detect with the optimized classifier and optimized bounding box displacement regressor according to the bounding boxes and bounding box features to obtain the known objects.
Optionally, the optimization module is specifically configured to: train the preset classifier with negative energy suppression, combining a cross-entropy loss function with an uncertainty-metric loss function based on virtual sample synthesis, to obtain the optimized classifier; and/or train the preset bounding box displacement regressor with a preset regression loss function to obtain the optimized bounding box displacement regressor; and/or set K instances in the image sample, where K is any positive integer; define two indices for the image sample, a cross-prediction ratio and a cross-value ratio, both computed from the K instances and the bounding box samples in the image sample; classify the bounding box samples according to the cross-prediction ratio and the cross-value ratio, assigning bounding box samples containing the same object instance to the same group to obtain K groups of bounding box samples, and divide the K groups into a complete-object set, a local-object set, an out-of-boundary-object set and a non-object set; obtain a first loss parameter according to a first preset generalized object confidence score and the complete-object set; obtain a second loss parameter according to a second preset generalized object confidence score and the local-object set and/or the out-of-boundary-object set; obtain a third loss parameter through contrastive learning according to the complete-object set; and train the preset generalized object confidence regressor according to the first, second and third loss parameters to obtain the optimized generalized object confidence regressor.
Optionally, after the second object identification module performs detection through the preset classifier and preset bounding box displacement regressor according to the bounding boxes and bounding box features, the apparatus further includes: a fusion module, configured to fuse the bounding boxes of the unknown objects with the known objects to obtain an object set.
In a third aspect, the present application provides an image processing apparatus comprising: at least one processor and memory; the memory stores computer-executable instructions; the at least one processor executes computer-executable instructions stored in the memory, causing the at least one processor to perform the image processing method as described above in the first aspect and the various possible designs of the first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, implement the image processing method according to the first aspect and the various possible designs of the first aspect.
In a fifth aspect, the present invention provides a computer program product comprising a computer program which, when executed by a processor, implements the image processing method according to the first aspect and the various possible designs of the first aspect.
According to the image processing method, apparatus, device and storage medium provided by the application, for an image requiring object recognition, bounding boxes and bounding box features are first extracted from the image. Bounding boxes of unknown objects are detected through a pre-constructed generalized object confidence regressor, which learns generalized object features from known-class objects so that unknown objects are fully captured; known objects are detected through a pre-constructed classifier and bounding box displacement regressor. This achieves accurate detection of both unknown and known objects on the premise that the detection capability for known objects remains essentially unchanged. Furthermore, false detections of non-objects are reduced through negative energy suppression, and unknown objects are accurately localized by an adaptive candidate box screening algorithm, improving the detection precision of unknown objects.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required in the embodiments or in the description of the prior art are briefly described below. It is obvious that the drawings in the following description show only some embodiments of the present application, and that a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of an image processing system according to an embodiment of the present disclosure;
fig. 2 is a schematic flow chart of an image processing method according to an embodiment of the present application;
fig. 3 is a flowchart of another image processing method according to an embodiment of the present application;
fig. 4 is a flowchart of another image processing method according to an embodiment of the present application;
FIG. 5 is a schematic image of a complete object, a local object, an out-of-boundary object, and a non-object according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application.
Specific embodiments of the present disclosure have been shown by way of the above drawings and will be described in more detail below. These drawings and the written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the disclosed concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
The terms "first," "second," "third," and "fourth" and the like in the description and in the claims of this application and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, terms in the embodiments of the present application will be explained:
regional advice network (Region Proposal Network, RPN): and taking the characteristic diagram of the picture as input, outputting a series of target bounding boxes, wherein each bounding box has a target score. The method has the advantages of capability of detecting a certain class of irrelevant objects, low detection precision and quick and rough object finding.
Graph segmentation: for an undirected graph G = (V, E), the node set V is divided into a set A and a set B satisfying A ∪ B = V and A ∩ B = ∅, i.e. the graph is split into two subgraphs by partitioning its node set into mutually exclusive groups.
Normalized Cut: a graph segmentation criterion that accounts both for the dissimilarity between different subgraphs and for the similarity within each subgraph, Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V), where cut(A, B) is the total edge weight crossing the partition and assoc(A, V) is the total edge weight from A to all nodes; minimizing it segments the graph more accurately than a plain minimum cut.
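The normalized cut cost can be computed directly from a weight matrix; a minimal illustration (the example graph, two tight pairs joined by one weak edge, is hypothetical):

```python
import numpy as np

def normalized_cut(W, in_a):
    """Normalized cut cost of partitioning a weighted graph into A and its complement.

    W: (N, N) symmetric weight matrix with zero diagonal; in_a: boolean mask for A.
    """
    a, b = in_a, ~in_a
    cut = W[a][:, b].sum()      # total weight crossing the partition
    assoc_a = W[a].sum()        # total weight from A to all nodes
    assoc_b = W[b].sum()
    return cut / assoc_a + cut / assoc_b

# Two tight pairs {0,1} and {2,3} joined by one weak edge (0-2).
W = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.1, 0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
balanced = np.array([True, True, False, False])
```

Cutting the weak edge between the pairs has a much lower normalized cut cost than splitting off a single node, which is exactly the balance property that makes normalized cut preferable to a plain minimum cut.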
Uncertainty metric loss based on virtual sample synthesis: the mean and covariance of the features of known-class samples are computed to construct a multivariate Gaussian model, and virtual samples are generated at the boundary of that distribution to serve as negative-sample features. A binary classification model using the energy value as the uncertainty measure then distinguishes virtual samples from known samples, thereby constraining the known-class feature distribution and enabling prediction of unknown samples.
For object recognition in images, existing detection methods fall into two categories: open-set classification and detection, and open-world object detection. Open-set classification and detection finds unknown objects misclassified as known ones by designing an uncertainty measure of the feature difference between unknown and known objects. But to preserve the detection accuracy of known objects, these methods suppress both unknown objects and non-objects during training, resulting in a low recall of unknown objects. Open-world object detection aims to let the model detect both known and unknown objects, improving unknown-object detection by automatically labeling pseudo-unknown objects during training so that the model can incrementally learn annotations of new classes. However, many pseudo-unknown samples generated by the automatic labeling step do not actually represent true unknown objects, which limits knowledge transfer from known to unknown classes. As a result, during inference many non-objects are wrongly detected as unknown objects, lowering the detection precision of unknown objects. Unknown-object detection tasks therefore need a method that achieves both high recall and high precision; the detection precision of unknown objects in the prior art is low.
To solve the above problems, embodiments of the present application provide an image processing method, apparatus, device, and storage medium. For an image requiring object recognition, the method first extracts bounding boxes and bounding box features from the image; it detects bounding boxes of unknown objects through a pre-constructed generalized object confidence regressor, which learns generalized object features from known-class objects so as to fully capture unknown objects; and it detects known objects through a pre-established classifier and bounding box displacement regressor, thereby realizing accurate detection of both unknown and known objects.
In the embodiments of the present application, on the basis of a closed-set object detection model, generalized object features are learned from known-class objects so that the generalized object confidence regressor can fully capture unknown objects; false detection of non-objects is reduced through negative energy suppression; and unknown objects are accurately localized with an adaptive candidate-box screening algorithm.
Optionally, the application constructs a generalized object confidence regressor to extract all box regions in an image where objects may exist, and designs a quantity-adaptive object candidate screening mechanism to accurately locate unknown objects among those regions. Next, a training method is devised for the object detector to learn generalized object representations, allowing the model to learn generalized object-class knowledge on a dataset annotated only with known-class objects. Finally, the objects found by the known-class object detector are removed from the detected generalized objects, and what remains are the unknown objects. Through the training method for generalized object representation, the generalized object confidence module can score the generalized object regions in the image, giving high scores to both known and unknown objects, while the quantity-adaptive object candidate screening mechanism screens out the regions most likely to be objects.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region, and provide corresponding operation entries for the user to select authorization or rejection.
Optionally, fig. 1 is a schematic diagram of an image processing system architecture according to an embodiment of the present application. In fig. 1, the above architecture includes at least one of a data acquisition device 101, a processing device 102, and a display device 103.
It should be understood that the architecture illustrated in the embodiments of the present application does not constitute a specific limitation on the architecture of the image processing system. In other possible embodiments of the present application, the architecture may include more or fewer components than those illustrated, or some components may be combined, some components may be separated, or different component arrangements may be specifically determined according to the actual application scenario, and the present application is not limited herein. The components shown in fig. 1 may be implemented in hardware, software, or a combination of software and hardware.
In a specific implementation, the data acquisition device 101 may include an input/output interface, or may include a communication interface, where the data acquisition device 101 may be connected to the processing device through the input/output interface or the communication interface.
The processing device 102 may extract bounding boxes and bounding box features in the image, detect bounding boxes of unknown objects through pre-built generalized object confidence regressors, learn generalized object features from known classes of objects, use the generalized object confidence regressors to adequately capture unknown objects, use pre-built classifier and bounding box displacement regressors to detect known objects, and further extract accurate bounding boxes of unknown objects using an adaptive object candidate screening mechanism.
The display device 103 may also be a touch display screen or a screen of a terminal device for receiving a user instruction while displaying the above content to enable interaction with a user.
It will be appreciated that the processing device described above may be implemented by a processor reading instructions in a memory and executing the instructions, or by a chip circuit.
In addition, the network architecture and the service scenario described in the embodiments of the present application are for more clearly describing the technical solution of the embodiments of the present application, and do not constitute a limitation on the technical solution provided in the embodiments of the present application, and as a person of ordinary skill in the art can know, with evolution of the network architecture and appearance of a new service scenario, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems.
The following describes the technical scheme of the present application in detail with reference to specific embodiments:
optionally, fig. 2 is a schematic flow chart of an image processing method according to an embodiment of the present application. The execution body of the embodiment of the present application may be the processing device 102 in fig. 1, and the specific execution body may be determined according to an actual application scenario. As shown in fig. 2, the method comprises the steps of:
s201: and acquiring an image to be processed.
S202: and extracting features of the image to be processed through a preset target detection model to obtain bounding boxes and bounding box features.
Alternatively, the preset target detection model may be any detection model that extracts bounding boxes through images. For example, the trained two-stage object detector Faster-RCNN, the input and output of the model may be the image sample and bounding box features corresponding to the image sample, respectively.
Alternatively, a trained two-stage object detector Faster-RCNN serves as the feature extractor and bounding box extractor, extracting bounding boxes in which objects may exist together with the feature vectors of those bounding boxes.
S203: and detecting by a preset generalized object confidence coefficient regressor according to the bounding box and the feature of the bounding box to obtain the bounding box of the unknown object.
The preset generalized object confidence coefficient regressor is obtained through training of bounding box features and generalized object confidence coefficients of the image samples.
Optionally, according to the bounding box and the feature of the bounding box, detecting by a preset generalized object confidence regressor to obtain a bounding box of the unknown object, including:
calculating generalized object confidence coefficient of each bounding box through a preset generalized object confidence coefficient regressor according to the bounding boxes and the feature of the bounding box, and performing first screening treatment on the bounding boxes according to the generalized object confidence coefficient to obtain bounding boxes to be treated; and carrying out second screening treatment on the bounding box to be treated through a self-adaptive bounding box screening mechanism according to the generalized object confidence of the bounding box, so as to obtain the bounding box of the unknown object.
Optionally, performing a first screening process on the bounding box according to the confidence coefficient of the generalized object to obtain a bounding box to be processed, which specifically includes: and reserving the bounding box with the generalized object confidence coefficient larger than the first threshold value to obtain a bounding box to be processed.
In one possible implementation, the trained generalized object confidence regressor predicts the object confidence of these bounding boxes, and those whose confidence scores are greater than a first threshold are kept, giving M retained bounding boxes together with their generalized object confidence scores. The first threshold may be determined according to the practical situation and is not specifically limited in the embodiments of the present application; M is any positive integer.
Optionally, performing the second screening process on the bounding boxes to be processed through the adaptive bounding box screening mechanism, according to their generalized object confidences, to obtain the bounding boxes of unknown objects, includes: constructing the bounding boxes to be processed as a weighted undirected graph, where each node represents a bounding box to be processed and each edge is weighted by the overlap between its two nodes; iteratively decomposing the whole graph into N subgraphs through a recursive normalized cut algorithm until the normalized cut cost value of each subgraph is lower than a preset segmentation threshold, where N is any positive integer; and, in each subgraph, determining the bounding box to be processed with the highest generalized object confidence score as a bounding box of an unknown object.
In one possible implementation, bounding boxes where objects may be present are screened out by the adaptive bounding box screening mechanism. Specifically, the bounding boxes to be processed are constructed as a weighted undirected graph G = (V, E): each node in V represents one bounding box, and each edge in E is weighted by the overlap (Intersection over Union, IoU) between its two nodes. Next, a recursive normalized cut algorithm iteratively decomposes the whole graph G into several subgraphs, terminating when the normalized cut cost value of a subgraph falls below the preset segmentation threshold. Finally, in each subgraph, the bounding box with the highest generalized object confidence score is taken as a predicted unknown-object bounding box. The preset segmentation threshold may be determined according to the practical situation and is not specifically limited in the embodiments of the present application.
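The two-stage screening above can be sketched in Python as follows. The IoU-weighted graph and the pick-the-highest-score-per-subgraph rule follow the description; the spectral (Fiedler-vector) bipartition, the exact termination rule, and all helper names and default thresholds are illustrative assumptions rather than the patent's own formulas:

```python
import numpy as np

def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union > 0 else 0.0

def connected_components(W):
    """Components of the graph whose adjacency matrix is W (weights > 0)."""
    n, seen, comps = len(W), set(), []
    for s in range(n):
        if s in seen:
            continue
        stack, comp = [s], []
        seen.add(s)
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in range(n):
                if W[u, v] > 0 and v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(np.array(sorted(comp)))
    return comps

def ncut_cost(W, mask):
    """Normalized-cut value of a bipartition (mask vs. its complement)."""
    cut = W[mask][:, ~mask].sum()
    assoc_a, assoc_b = W[mask].sum(), W[~mask].sum()
    if assoc_a == 0 or assoc_b == 0:
        return np.inf
    return cut / assoc_a + cut / assoc_b

def fiedler_mask(W):
    """Bipartition from the second-smallest eigenvector of the
    symmetrically normalized graph Laplacian."""
    d = W.sum(1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L_sym = np.eye(len(W)) - inv_sqrt[:, None] * W * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)
    f = vecs[:, 1]
    return f >= np.median(f)

def recursive_ncut(W, idx, tau, groups):
    if len(idx) <= 1:
        groups.append(idx)
        return
    sub = W[np.ix_(idx, idx)]
    comps = connected_components(sub)
    if len(comps) > 1:                      # disconnected: recurse per component
        for c in comps:
            recursive_ncut(W, idx[c], tau, groups)
        return
    mask = fiedler_mask(sub)
    if mask.all() or (~mask).all() or ncut_cost(sub, mask) >= tau:
        groups.append(idx)                  # splitting is no longer cheap: stop
        return
    recursive_ncut(W, idx[mask], tau, groups)
    recursive_ncut(W, idx[~mask], tau, groups)

def screen_unknown_boxes(boxes, goc_scores, conf_thr=0.5, split_thr=1.0):
    """First screening: keep boxes whose GOC score exceeds conf_thr.
    Second screening: build an IoU-weighted graph over the kept boxes,
    decompose it with recursive normalized cuts, and keep the
    highest-scoring box of each resulting subgraph."""
    boxes = np.asarray(boxes, float)
    scores = np.asarray(goc_scores, float)
    kept = np.where(scores > conf_thr)[0]
    if kept.size == 0:
        return []
    W = np.array([[box_iou(boxes[i], boxes[j]) for j in kept] for i in kept])
    np.fill_diagonal(W, 0.0)
    groups = []
    recursive_ncut(W, np.arange(len(kept)), split_thr, groups)
    return sorted(int(kept[g[np.argmax(scores[kept][g])]]) for g in groups)
```

In this sketch a subgraph is split further only while the normalized-cut cost of the bipartition stays below the segmentation threshold, matching the intuition that weakly connected groups of boxes cover different objects.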
In the embodiments of the present application, the generalized object confidence regressor fully captures unknown objects, bounding boxes are screened according to generalized object confidence, the bounding boxes to be processed are filtered by the adaptive bounding box screening mechanism, and unknown objects are accurately localized with the adaptive candidate-box screening algorithm, further improving the accuracy of unknown object recognition.
S204: and detecting through a preset classifier and a preset bounding box displacement regressor according to the bounding box and the feature of the bounding box to obtain the known object.
The preset classifier is obtained through training of bounding box features and class probabilities of the image samples, and the preset bounding box displacement regressor is obtained through training of bounding box features and bounding box displacement vectors of the image samples.
In one possible implementation, the extraction of known objects follows Faster-RCNN: the extracted bounding boxes are predicted with the trained bounding box displacement regressor and classifier, respectively. Then, bounding boxes with extremely low scores are removed with non-maximum suppression, and the several highest-scoring bounding boxes are retained. From the results output by the classifier, bounding boxes whose energy value is smaller than the preset energy threshold, which are possibly unknown objects, are deleted. The remaining bounding boxes are the predicted known objects. The preset energy threshold may be determined according to the practical situation and is not specifically limited in the embodiments of the present application.
Optionally, after performing detection processing by a preset classifier and a preset bounding box displacement regressor according to the bounding box and the bounding box characteristics, obtaining the known object, the method further includes: and carrying out fusion processing on the bounding box of the unknown object and the known object to obtain an object set.
Optionally, the known objects are fused with the set of unknown-object bounding boxes. Specifically, if the overlap (Intersection over Union, IoU) between an unknown-object bounding box and any one of the known-object bounding boxes exceeds the overlap threshold, that unknown-object bounding box is deleted; otherwise it is retained.
It will be appreciated that the overlapping degree threshold may be determined according to practical situations, which is not particularly limited in the embodiments of the present application. For example, the overlap threshold may be 95%.
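The fusion rule above amounts to a simple IoU filter over the unknown-object boxes; a sketch follows (the helper names are assumptions):

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union > 0 else 0.0

def fuse_detections(known_boxes, unknown_boxes, overlap_thr=0.95):
    """Delete any unknown-object box whose IoU with some known-object
    box exceeds the overlap threshold; keep the rest alongside the
    known boxes to form the final object set."""
    kept_unknown = [u for u in unknown_boxes
                    if all(box_iou(u, k) <= overlap_thr for k in known_boxes)]
    return list(known_boxes) + kept_unknown
```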
Here, the embodiment of the application fuses the known object and the unknown object bounding box set, can effectively remove redundant data or misjudgment data, and realizes effective detection of the unknown object on the premise of ensuring that the detection capability of the known object is basically unchanged.
The embodiment of the application provides an image processing method capable of simultaneously detecting known and unknown objects in an image. For an image requiring object recognition, bounding boxes and bounding box features are first extracted from the image. Bounding boxes of unknown objects are detected through a pre-constructed generalized object confidence regressor, which learns generalized object features from known-class objects so as to fully capture unknown objects, while known objects are detected through a pre-established classifier and bounding box displacement regressor. Accurate detection of both unknown and known objects is thus realized: unknown objects are effectively detected while the detection capability for known objects remains essentially unchanged, improving the detection precision of unknown objects.
Optionally, a model for detecting an object may be pre-established in the embodiment of the present application to achieve the purpose of accurate identification, and accordingly, fig. 3 is a schematic flow chart of another image processing method provided in the embodiment of the present application, as shown in fig. 3, where the method includes:
s301: and acquiring an image to be processed.
S302: and extracting features of the image to be processed through a preset target detection model to obtain bounding boxes and bounding box features.
The implementation of step S301 to step S302 is similar to that of step S201 to step S202, and the embodiment of the present application is not particularly limited herein.
S303: an image sample is acquired.
The image sample may include a history image together with one or more of the bounding boxes, bounding box features, and known objects corresponding to that history image; it may also be a simulated image together with one or more of the bounding boxes, bounding box features, and known objects corresponding to that simulated image.
S304: inputting the bounding box features and the generalized object confidence of the image sample to a two-stage target detector, and training to obtain a preset generalized object confidence regressor.
S305: inputting the bounding box features and the class probability of the image sample to a two-stage target detector, and training to obtain a preset classifier.
S306: inputting bounding box features and bounding box displacement vectors of the image samples to a two-stage target detector, and training to obtain the preset bounding box displacement regressor.
Optionally, after inputting the bounding box features and the bounding box displacement vectors of the image samples to the two-stage target detector and training to obtain the preset bounding box displacement regressor, the method further includes:
optimizing the preset classifier through negative energy inhibition to obtain an optimized classifier; and/or, optimizing the preset generalized object confidence coefficient regressor to obtain an optimized generalized object confidence coefficient regressor; and/or, carrying out optimization treatment on the preset bounding box displacement regressor to obtain the optimized bounding box displacement regressor.
Correspondingly, according to the bounding box and the feature of the bounding box, detecting by a preset generalized object confidence coefficient regressor to obtain the bounding box of the unknown object, including: and detecting by using the optimized generalized object confidence coefficient regressor according to the bounding box and the feature of the bounding box to obtain the bounding box of the unknown object.
According to the bounding box and the feature of the bounding box, detecting through a preset classifier and a preset bounding box displacement regressor to obtain a known object, wherein the method comprises the following steps: and detecting by using the optimized classifier and the optimized bounding box displacement regressor according to the bounding box and the feature of the bounding box to obtain the known object.
After each model is established, it can be optimized. By applying different losses to the collected samples, high confidence is assigned to bounding boxes that enclose unknown objects, and the characteristic-response gap between non-objects and objects is widened, reducing false detection of non-objects and further improving the accuracy of object recognition and of image processing.
Optionally, the optimizing of the preset classifier through negative energy suppression to obtain the optimized classifier includes the following step: training the preset classifier with negative energy suppression in combination with a cross entropy loss function and the uncertainty metric loss function based on virtual sample synthesis, to obtain the optimized classifier.
In one possible implementation, the classifier is supervised, in addition to the cross entropy loss function supervision commonly used in the field of object detection, by designing an additional loss function, comprising the sub-steps of:
First, the bounding box set of the current image (the image sample) is sorted according to the negative energy value.
Next, the bounding boxes with the lowest negative energy values are selected from the set and used to suppress the characteristic responses of non-objects.
Then, a suppression loss is applied to constrain the negative energy scores of the selected lowest-scoring bounding boxes; the suppression loss is formulated as follows:
The total energy loss comprises the above suppression loss and the uncertainty metric loss based on virtual sample synthesis; the total energy loss is formulated as follows:
The uncertainty metric loss based on virtual sample synthesis follows the open-set detection (Open-Set Detection) algorithm VOS (Virtual Outlier Synthesis). After training with this loss, the negative energy distribution of non-objects differs significantly from that of unknown objects, i.e., non-objects are suppressed. The method simultaneously reduces the characteristic response of non-object bounding boxes, further widening the generalized object confidence (Generalized Object Confidence, GOC) gap between non-objects and objects.
Optionally, performing optimization processing on the preset bounding box displacement regressor to obtain an optimized bounding box displacement regressor, including: and training the preset bounding box displacement regressor through a preset regression loss function to obtain the optimized bounding box displacement regressor.
The regression loss function commonly used in the field of object detection is adopted for supervision on the bounding box displacement regressor.
Optimizing the preset generalized object confidence coefficient regressor to obtain an optimized generalized object confidence coefficient regressor, comprising:
The image sample is set to include K instances, where K is any positive integer. Two indices, the intersection-over-predicted-box ratio (IoP) and the intersection-over-correct-box ratio (IoC), are defined for the image sample; both are calculated between the K instances and the bounding box samples in the image sample. The bounding box samples are then classified according to IoP and IoC: samples containing the same object instance are assigned to the same group, giving K groups of bounding box samples, which are divided into a complete object set, a local object set, an out-of-bounds object set, and a non-object set. A first loss parameter is obtained from a first preset generalized object confidence score and the complete object set; a second loss parameter is obtained from a second preset generalized object confidence score and the local object set and/or the out-of-bounds object set; a third loss parameter is obtained from the complete object set through contrastive learning; and the preset generalized object confidence regressor is trained according to the first, second, and third loss parameters to obtain the optimized generalized object confidence regressor.
In one possible implementation, the generalized object confidence regressor is supervised with multiple designed groups of loss functions, comprising the following sub-steps:
The Faster-RCNN network that has been trained in advance is trained in two stages according to the following loss function settings:
Let the current image include K instances. Two indices are defined, the intersection-over-predicted-box ratio (Intersection Over The Predicted Bounding Box, IoP) and the intersection-over-correct-box ratio (Intersection Over The Correct Bounding Box, IoC): for a predicted bounding box b and an instance box g, IoP(b, g) = |b ∩ g| / |b| and IoC(b, g) = |b ∩ g| / |g|, where |·| denotes box area.
Next, for each bounding box in the bounding box set, the instance with the largest IoU (Intersection over Union) with that bounding box is found from the instance set. Bounding boxes containing the same object instance are assigned to the same group, giving K groups of bounding boxes. Then, according to the IoP, IoC, and IoU values, each group is divided into a complete object set, a local object set, an out-of-bounds object set, and a non-object set.
The threshold involved is a constant that may be determined according to actual conditions.
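The IoP/IoC indices and the four-way partition can be sketched as follows; the single shared threshold t and the exact decision rules per set are assumptions chosen to match the four cases of fig. 5 (instance box vs. prediction box):

```python
def iop_ioc(pred, gt):
    """IoP = intersection / predicted-box area; IoC = intersection / correct-box area."""
    ix = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    iy = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = ix * iy
    a_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    a_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / a_pred, inter / a_gt

def categorize(pred, gt, t=0.5):
    """Assign a predicted box to one of the four sets."""
    iop, ioc = iop_ioc(pred, gt)
    if iop >= t and ioc >= t:
        return "complete"       # box tightly matches the object
    if iop >= t:
        return "local"          # box covers only part of the object
    if ioc >= t:
        return "out_of_bounds"  # box spills well past the object
    return "non_object"
```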
Designing a first loss parameter, and tending the generalized object confidence score of the bounding box of the complete object to be 1:
/>
Then, a second loss parameter is designed to press the generalized object confidence scores of local objects and out-of-bounds objects down to a constant:
Here the constant is a small fixed value. Then, contrastive learning is employed to improve the model's ability to capture bounding boxes containing more complete objects; the third loss parameter is formulated as follows:
In the third loss, an indicator takes one value for bounding boxes in the same group and another otherwise, and a small constant in the formula is set to 0.01.
Finally, the total generalized object confidence loss is calculated as the sum of the above three parts:
through the above sampling and training, the generalized object confidence scores of both unknown and known objects are pushed to very high scores.
Here, the embodiment of the application supervises the bounding box displacement regressor with the regression loss function commonly used in the field of object detection, supervises the generalized object confidence regressor with multiple groups of designed loss functions, and supervises the classifier with the commonly used cross entropy loss function together with additionally designed loss functions, so that an appropriate loss is applied to each detection model, further improving detection precision and the accuracy of image processing.
S307: and detecting by a preset generalized object confidence coefficient regressor according to the bounding box and the feature of the bounding box to obtain the bounding box of the unknown object.
The preset generalized object confidence coefficient regressor is obtained through training of bounding box features and generalized object confidence coefficients of the image samples.
S308: and detecting through a preset classifier and a preset bounding box displacement regressor according to the bounding box and the feature of the bounding box to obtain the known object.
The preset classifier is obtained through training of bounding box features and class probabilities of the image samples, and the preset bounding box displacement regressor is obtained through training of bounding box features and bounding box displacement vectors of the image samples.
The implementation of step S307 to step S308 is similar to the implementation of step S203 to step S204, and the embodiment of the present application is not particularly limited herein.
Here, the embodiment of the application learns generalized object features from known-class objects, so that the model learns generalized object-class knowledge on a dataset annotated with known-class objects and the preset generalized object confidence regressor is established; the regressor fully captures unknown objects. A preset classifier and a preset bounding box displacement regressor are established from the known-class objects to extract known objects. Each recognition model is thus built by fully combining the characteristics of the objects, further improving the accuracy of image recognition.
Exemplarily, fig. 4 is a flow chart of still another image processing method according to an embodiment of the present application. As shown in fig. 4, a solid line represents the training and testing process, a dashed line represents a training loss function, a rounded rectangle represents an operation process, and a right-angled rectangle represents a specific obtained result.
As shown in fig. 4, the method for detecting the object of the known and unknown class according to the embodiment of the application comprises the following steps:
step one: taking a trained two-stage target detector Faster-RCNN as a feature extractor andthe bounding box extractor extracts bounding boxes in which objects may exist and feature vectors of the bounding boxes. Let Faster-RCNN extract bounding box from current image asAnd bounding box->Is expressed as +.>Wherein->For the dimension of bounding box feature +.>Is an index of the bounding box.
Step two: and designing a pre-measuring head structure on the trained Faster-RCNN, and using the characteristics of each bounding box as a sample to train three pre-measuring heads of the object detector, namely a generalized object confidence coefficient regressor, a bounding box displacement regressor and a classifier.
This step comprises the following substeps:
step 2.1: the structure of the bounding box displacement regressor and the classifier are consistent with that of the Faster-RCNN, and are all of a layer of linear transformation structure. Their inputs are all features of bounding boxesThe outputs are the displacement vector of each bounding box and the class probabilities of the bounding boxes, respectively.
Step 2.2: designing a generalized object confidence regressor, expressed as . The module structure is a linear transformation, and the input of the module structure is the characteristic of a bounding box +.>And outputs a constant +.>The constant is the generalized object confidence of the bounding box.
Step 2.3: let the input of the classifier be the feature of a bounding boxOutput is +.>Probability of individual category, referring to open object detection algorithm VOS, calculating bounding box +.>Is defined as the negative of the weighted sum of the outputs of the bounding box in exponential space:
wherein the method comprises the steps ofFor classifying head class->Is a logic output of>,/>To mitigate class imbalance.
Step three: by applying different losses to the collected samples, the purpose of assigning high confidence to the bounding box bounding the unknown object and the purpose of expanding the characteristic response gap between the non-object and the object to reduce false detection of the non-object are respectively achieved.
This step comprises the following substeps:
step 3.1: and monitoring the regression loss function of the bounding box displacement regressor commonly used in the field of object detection.
Step 3.2: monitoring a plurality of groups of loss functions designed by a generalized object confidence regressor, comprising the following substeps:
step 3.2.1: the fast-RCNN network which has been trained in advance is trained in two stages according to the following loss function setting.
Step 3.2.2: let the current image includeAn example is->Two indices are defined, ioP and IoC:
/>
next, for the followingEach bounding box->From->Find and +.>Is->(Intersection over Union, cross-over ratio) the largest example. Assigning bounding boxes containing the same object instance to the same group, get +.>Group bounding box:>. Then, according to the following formula +.>、/>And->Will->Is divided into a complete object set +.>Local object set->Out-of-bounds object set->And non-object set->
Wherein the method comprises the steps ofIs a constant threshold.
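The grouping-and-partition procedure of step 3.2.2 can be sketched as below. The IoP/IoC formulas and the threshold values `t_iou`, `t_iop`, `t_ioc` are assumptions reconstructed from the four-way split (the patent's exact constants are not given in this text), chosen so the partition behaves as described:

```python
import numpy as np

def area(b):  # box as [x1, y1, x2, y2]
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def inter(a, b):
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, iw) * max(0.0, ih)

def iou(a, b):
    i = inter(a, b)
    return i / (area(a) + area(b) - i + 1e-12)

def partition(boxes, instances, t_iou=0.5, t_iop=0.9, t_ioc=0.9):
    """Group each box with its max-IoU instance, then split into the
    complete / local / out-of-bounds / non-object sets via IoU, IoP, IoC."""
    sets = {"complete": [], "local": [], "out_of_bounds": [], "non_object": []}
    for i, b in enumerate(boxes):
        k = int(np.argmax([iou(b, o) for o in instances]))  # group index
        o = instances[k]
        iop = inter(b, o) / (area(b) + 1e-12)  # share of the box covered
        ioc = inter(b, o) / (area(o) + 1e-12)  # share of the instance covered
        if iou(b, o) >= t_iou and iop >= t_iop and ioc >= t_ioc:
            sets["complete"].append(i)
        elif iop >= t_iop:                     # box lies inside the object
            sets["local"].append(i)
        elif ioc >= t_ioc:                     # box spills past the object
            sets["out_of_bounds"].append(i)
        else:
            sets["non_object"].append(i)
    return sets

instances = [[0, 0, 10, 10]]
boxes = [[0, 0, 10, 10],      # complete
         [2, 2, 8, 8],        # local
         [-5, -5, 15, 15],    # out-of-bounds
         [20, 20, 30, 30]]    # non-object
sets = partition(boxes, instances)
```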
Step 3.2.3: design a first loss that pushes the generalized object confidence score of each bounding box in the complete object set toward 1. Then, design a second loss that presses the generalized object confidence score of each local object or out-of-bounds object below a fixed constant. Finally, contrastive learning is employed to enhance the ability of the model to capture the bounding boxes containing more complete objects; the margin in the contrastive term is a small constant, set to 0.01.
Finally, the total generalized object confidence loss is computed as the sum of the above three parts.
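The three GOC loss terms can be sketched as follows; only the 0.01 margin is stated in the text, so the functional forms (a squared pull toward 1, a hinge below a constant `c`, and a pairwise contrastive margin) are assumptions:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def goc_loss(s_complete, s_partial, c=0.5, margin=0.01):
    """Three-part GOC loss sketch: pull complete-object scores to 1,
    press local / out-of-bounds scores below c, and require every
    complete score to beat every partial score by at least the margin."""
    l_complete = float(np.mean((1.0 - s_complete) ** 2))
    l_press = float(np.mean(relu(s_partial - c)))
    l_contrast = float(np.mean(relu(s_partial[None, :] - s_complete[:, None] + margin)))
    return l_complete + l_press + l_contrast
```

When complete-object scores sit at 1 and partial-object scores sit well below `c`, all three terms vanish, which is the intended optimum of the sampling strategy above.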
through the above sampling and training, the generalized object confidence scores of both unknown and known objects are pushed to very high scores.
Exemplarily, fig. 5 is a schematic image of a complete object, a local object, an out-of-bounds object and a non-object according to an embodiment of the present application, where instance boxes are drawn with solid lines and prediction boxes with light dotted lines.
Step 3.3: apply negative energy suppression, comprising the following substeps:
Step 3.3.1: first, sort the bounding box set of the current image by negative energy score.
Step 3.3.2: next, select from the set the bounding boxes with the lowest negative energy scores; these are used to suppress the feature responses of non-objects.
Step 3.3.3: then, apply a suppression loss that constrains the negative energy scores of these lowest-scoring bounding boxes.
total energy loss includes the aboveAnd uncertainty metric loss based on virtual sample synthesis:
Wherein uncertainty metric loss based on virtual sample synthesisReference is made to the Open-Set Detection (Open-Set Detection) algorithm VOS. Through loss->After training of (a), the negative energy distribution of the non-object is significantly different from the unknown negative energy distribution, i.e. the non-object is suppressed. The method simultaneously reduces the characteristic response of the non-object bounding box and further expands the GOC difference between non-objects.
Step four: detect objects in natural-scene images with the trained model.
Comprises the following substeps:
Step 4.1: extract bounding boxes and their features from the image through the backbone network of the trained Faster-RCNN structure.
Step 4.2: detecting an unknown object by a generalized object confidence regressor, comprising the sub-steps of:
Step 4.2.1: predict the object confidence of the bounding boxes with the trained generalized object confidence regressor; the bounding boxes whose confidence scores exceed a preset threshold are retained, together with their generalized object confidence scores.
Step 4.2.2: screen out the bounding boxes in which objects may exist through an adaptive bounding box screening mechanism. Specifically, the retained bounding boxes are constructed into a weighted undirected graph: each node represents a bounding box, and each edge weight is given by the degree of overlap between the two bounding boxes it connects. Next, a recursive normalized-cut algorithm iteratively decomposes the whole graph into several subgraphs, terminating once the normalized-cut cost value of a subgraph falls below a threshold. Finally, in each subgraph, the bounding box with the highest generalized object confidence score is taken as a bounding box of a predicted unknown object.
Step 4.3: detecting a known object by a classifier and a bounding box displacement regressor, comprising the sub-steps of:
Step 4.3.1: following the known-object extraction steps of Faster-RCNN, predict the extracted bounding boxes with the bounding box displacement regressor and the classifier, respectively. Then, remove the bounding boxes with extremely low scores with non-maximum suppression, and retain the several bounding boxes with the highest scores.
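The non-maximum suppression in step 4.3.1 is the standard greedy procedure; the threshold values here are illustrative, not the patent's:

```python
import numpy as np

def iou(a, b):
    # boxes as [x1, y1, x2, y2]
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + 1e-12)

def nms(boxes, scores, iou_thr=0.5, top_k=100):
    """Greedy NMS: keep the best-scoring box, drop boxes that overlap it
    by more than iou_thr, repeat; at most top_k boxes are kept."""
    order = np.argsort(scores)[::-1][:top_k]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        overlaps = np.array([iou(boxes[i], boxes[j]) for j in rest])
        order = rest[overlaps <= iou_thr]
    return keep

kept = nms([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]],
           np.array([0.9, 0.8, 0.7]))
```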
Step 4.3.2: from the result output by the classifier, the energy value is smaller thanPossibly a bounding box deletion of an unknown object. The resulting bounding box is the predicted known object.
Step 4.3.3: the known object is fused with the set of unknown object bounding boxes. Specifically, if IoU between an unknown object bounding box and any one of the known object bounding boxes exceeds 95%, then it is deleted, otherwise it is retained.
Through the generalized object confidence module and the corresponding sampling and training strategies, the present application learns a generalized object representation from a limited set of object categories, further reduces false detections of non-objects through negative energy suppression, and captures unknown objects at inference time with the adaptive object-candidate screening mechanism. Compared with the prior art, the invention achieves accurate and thorough detection of unknown objects while keeping the detection capability for known objects essentially unchanged.
Fig. 6 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application. As shown in fig. 6, the apparatus according to the embodiment of the present application includes: an acquisition module 601, a feature extraction module 602, a first object identification module 603, and a second object identification module 604. The image processing apparatus here may be the processing apparatus described above, the processor itself, or a chip or an integrated circuit that implements the functions of the processor. The division into the acquisition module 601, the feature extraction module 602, the first object identification module 603, and the second object identification module 604 is only a logical division; physically, they may be integrated or kept separate.
The acquisition module is used for acquiring the image to be processed;
the feature extraction module is used for extracting features of the image to be processed through a preset target detection model to obtain bounding boxes and bounding box features;
the first object recognition module is used for carrying out detection processing through a preset generalized object confidence coefficient regressor according to the bounding box and the feature of the bounding box to obtain a bounding box of an unknown object, wherein the preset generalized object confidence coefficient regressor is obtained through training of the feature of the bounding box and the generalized object confidence coefficient of an image sample;
The second object identification module is used for carrying out detection processing through a preset classifier and a preset bounding box displacement regressor according to the bounding box and the bounding box characteristics to obtain a known object, wherein the preset classifier is obtained through training of the bounding box characteristics and the category probability of the image sample, and the preset bounding box displacement regressor is obtained through training of the bounding box characteristics and the bounding box displacement vector of the image sample.
Optionally, the first object identification module is specifically configured to: calculating generalized object confidence coefficient of each bounding box through a preset generalized object confidence coefficient regressor according to the bounding boxes and the feature of the bounding box, and performing first screening treatment on the bounding boxes according to the generalized object confidence coefficient to obtain bounding boxes to be treated; and carrying out second screening treatment on the bounding box to be treated through a self-adaptive bounding box screening mechanism according to the generalized object confidence of the bounding box, so as to obtain the bounding box of the unknown object.
Optionally, the first object identification module is further specifically configured to: construct the bounding boxes to be processed into a weighted undirected graph, wherein each node of the weighted undirected graph represents a bounding box to be processed, and the weight of each edge is given by the degree of overlap between the corresponding nodes; iteratively decompose the whole weighted undirected graph into N subgraphs through a recursive normalized-cut algorithm until the normalized-cut cost value of each subgraph is lower than a preset segmentation threshold, where N is any positive integer; and, in each subgraph, determine the bounding box to be processed with the highest generalized object confidence score as a bounding box of an unknown object.
Optionally, before the first object identification module is configured to perform detection processing by using a preset generalized object confidence coefficient regressor according to the bounding box and the bounding box characteristics, to obtain a bounding box of the unknown object, the apparatus further includes: the sample acquisition module is used for acquiring an image sample; the first training module is used for inputting the bounding box features of the image samples and the generalized object confidence to the two-stage target detector, and training to obtain a preset generalized object confidence regressor; the second training module is used for inputting the bounding box features and the class probability of the image sample to the two-stage target detector, and training to obtain a preset classifier; and the third training module is used for inputting the bounding box features and the bounding box displacement vectors of the image samples to the two-stage target detector, and training to obtain the preset bounding box displacement regressor.
Optionally, after the third training module inputs the bounding box feature and the bounding box displacement vector of the image sample to the two-stage target detector and trains to obtain the preset bounding box displacement regressor, the apparatus further includes: an optimization module for: optimizing the preset classifier through negative energy suppression to obtain an optimized classifier; and/or, optimizing the preset generalized object confidence coefficient regressor to obtain an optimized generalized object confidence coefficient regressor; and/or, carrying out optimization treatment on the preset bounding box displacement regressor to obtain an optimized bounding box displacement regressor; accordingly, the first object identification module is specifically configured to: detecting by using an optimized generalized object confidence coefficient regressor according to the bounding box and the feature of the bounding box to obtain a bounding box of an unknown object;
The second object identification module is specifically configured to: and detecting by using the optimized classifier and the optimized bounding box displacement regressor according to the bounding box and the feature of the bounding box to obtain the known object.
Optionally, the optimization module is specifically configured to: train the preset classifier by negative energy suppression, combining a cross-entropy loss function and an uncertainty-measure loss function based on virtual-sample synthesis, to obtain the optimized classifier; and/or train the preset bounding box displacement regressor through a preset regression loss function to obtain the optimized bounding box displacement regressor; and/or set the image sample to comprise K instances, where K is any positive integer; define two indices, IoP and IoC, for the image sample, where IoP and IoC are calculated from the K instances and the bounding box samples in the image sample; classify the bounding box samples in the image sample according to IoP and IoC, assign the bounding box samples containing the same object instance to the same group to obtain K groups of bounding box samples, and divide the K groups of bounding box samples into a complete object set, a local object set, an out-of-bounds object set and a non-object set; obtain a first loss parameter according to a first preset generalized object confidence score and the complete object set; obtain a second loss parameter according to a second preset generalized object confidence score and the local object set and/or according to the second preset generalized object confidence score and the out-of-bounds object set; obtain a third loss parameter through contrastive learning according to the complete object set; and train the preset generalized object confidence regressor according to the first loss parameter, the second loss parameter and the third loss parameter to obtain the optimized generalized object confidence regressor.
Optionally, after the second object identification module performs detection processing through a preset classifier and a preset bounding box displacement regressor according to the bounding box and the bounding box characteristics, the apparatus further includes: and the fusion module is used for carrying out fusion processing on the bounding box of the unknown object and the known object to obtain an object set.
Referring to fig. 7, there is shown a schematic diagram of the structure of an image processing apparatus 700 suitable for implementing an embodiment of the present disclosure; the image processing apparatus 700 may be a terminal device or a server. The terminal device may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a personal digital assistant (Personal Digital Assistant, PDA for short), a tablet computer (Portable Android Device, PAD for short), a portable multimedia player (Portable Media Player, PMP for short), an in-vehicle terminal (e.g., an in-vehicle navigation terminal), and the like, and a fixed terminal such as a digital TV, a desktop computer, and the like. The image processing apparatus shown in fig. 7 is only one example, and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in fig. 7, the image processing apparatus 700 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 701 that may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage device 708 into a random access Memory (Random Access Memory, RAM) 703. In the RAM 703, various programs and data required for the operation of the image processing apparatus 700 are also stored. The processing device 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
In general, the following devices may be connected to the I/O interface 705: input devices 706 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 707 including, for example, a liquid crystal display (Liquid Crystal Display, LCD for short), a speaker, a vibrator, and the like; storage 708 including, for example, magnetic tape, hard disk, etc.; and a communication device 709. The communication means 709 may allow the image processing apparatus 700 to perform wireless or wired communication with other apparatuses to exchange data. While fig. 7 shows an image processing apparatus 700 having various devices, it is to be understood that not all illustrated devices are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via communication device 709, or installed from storage 708, or installed from ROM 702. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing device 701.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
The computer readable medium may be contained in the image processing apparatus; or may exist alone without being incorporated into the image processing apparatus.
The computer-readable medium carries one or more programs which, when executed by the image processing apparatus, cause the image processing apparatus to execute the method shown in the above embodiment.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (Local Area Network, LAN for short) or a wide area network (Wide Area Network, WAN for short), or it may be connected to an external computer (e.g., connected via the internet using an internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. The name of the unit does not in any way constitute a limitation of the unit itself, for example the first acquisition unit may also be described as "unit acquiring at least two internet protocol addresses".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The image processing device of the embodiment of the present application may be used to execute the technical solutions of the embodiments of the methods of the present application, and its implementation principle and technical effects are similar, and are not repeated here.
The embodiment of the application also provides a computer readable storage medium, wherein computer executable instructions are stored in the computer readable storage medium, and the computer executable instructions are used for realizing the image processing method of any one of the above when being executed by a processor.
Embodiments of the present application also provide a computer program product, including a computer program, which when executed by a processor is configured to implement the image processing method of any one of the above.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of elements is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow the general principles thereof and include such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (8)

1. An image processing method, comprising:
Acquiring an image to be processed;
extracting features of the image to be processed through a preset target detection model to obtain bounding boxes and bounding box features;
detecting according to the bounding box and the feature of the bounding box through a preset generalized object confidence coefficient regressor to obtain a bounding box of an unknown object, wherein the preset generalized object confidence coefficient regressor is obtained through training of the feature of the bounding box and the generalized object confidence coefficient of an image sample;
detecting according to the bounding box and the feature of the bounding box through a preset classifier and a preset bounding box displacement regressor to obtain a known object, wherein the preset classifier is obtained through training of the feature of the bounding box and the class probability of an image sample, and the preset bounding box displacement regressor is obtained through training of the feature of the bounding box and the displacement vector of the bounding box of the image sample;
detecting by a preset generalized object confidence coefficient regressor according to the bounding box and the feature of the bounding box to obtain a bounding box of an unknown object, wherein the method comprises the following steps:
calculating generalized object confidence coefficient of each bounding box through a preset generalized object confidence coefficient regressor according to the bounding boxes and the feature of the bounding box, and performing first screening treatment on the bounding boxes according to the generalized object confidence coefficient to obtain bounding boxes to be treated;
Constructing the bounding box to be processed into a weighted undirected graph, wherein each node of the weighted undirected graph represents one bounding box to be processed, and the weight of each edge of the weighted undirected graph is given by the degree of overlap between the corresponding nodes;
iteratively decomposing the whole weighted undirected graph into N subgraphs through a recursive normalized-cut algorithm until the normalized-cut cost value of each subgraph is lower than a preset segmentation threshold value, wherein N is any positive integer;
and in each subgraph, determining the bounding box to be processed with the highest confidence score of the generalized object as the bounding box of the unknown object.
2. The method according to claim 1, further comprising, before the detecting, by a preset generalized object confidence regressor, according to the bounding box and the bounding box features, obtaining a bounding box of an unknown object:
acquiring an image sample;
inputting the bounding box features and the generalized object confidence coefficient of the image sample to a two-stage target detector, and training to obtain a preset generalized object confidence coefficient regressor;
inputting the bounding box features and the class probabilities of the image samples to a two-stage target detector, and training to obtain a preset classifier;
Inputting the bounding box features and the bounding box displacement vectors of the image samples to a two-stage target detector, and training to obtain the preset bounding box displacement regressor.
3. The method of claim 2, further comprising, after the inputting bounding box features and bounding box displacement vectors of the image samples to a two-stage object detector, training to obtain the preset bounding box displacement regressor:
optimizing the preset classifier through negative energy suppression to obtain an optimized classifier;
and/or the number of the groups of groups,
performing optimization treatment on the preset generalized object confidence coefficient regressor to obtain an optimized generalized object confidence coefficient regressor;
and/or the number of the groups of groups,
optimizing the preset bounding box displacement regressor to obtain an optimized bounding box displacement regressor;
correspondingly, the detecting processing is performed through a preset generalized object confidence coefficient regressor according to the bounding box and the bounding box characteristics to obtain a bounding box of an unknown object, which comprises the following steps:
detecting by using an optimized generalized object confidence coefficient regressor according to the bounding box and the feature of the bounding box to obtain a bounding box of an unknown object;
And detecting the object through a preset classifier and a preset bounding box displacement regressor according to the bounding box and the bounding box characteristics to obtain a known object, wherein the method comprises the following steps of:
and detecting through an optimized classifier and an optimized bounding box displacement regressor according to the bounding box and the feature of the bounding box to obtain the known object.
4. A method according to claim 3, wherein the optimizing the preset classifier by negative energy suppression to obtain an optimized classifier comprises:
training the preset classifier by negative energy suppression and combining a cross entropy loss function and an uncertainty measurement loss function synthesized based on a virtual sample to obtain an optimized classifier; the uncertainty measurement loss function based on virtual sample synthesis is used for calculating the mean value and covariance of the characteristics of various known samples, constructing a multi-element Gaussian distribution model, generating a virtual sample at the boundary of the model to serve as the characteristic of a negative sample, taking an energy value as an uncertainty measurement, and setting a two-class model to judge the uncertainty of the virtual sample and the known sample;
the optimizing the preset bounding box displacement regressor to obtain an optimized bounding box displacement regressor comprises:
training the preset bounding box displacement regressor through a preset regression loss function to obtain the optimized bounding box displacement regressor;
the optimizing the preset generalized object confidence coefficient regressor to obtain an optimized generalized object confidence coefficient regressor comprises:
setting the image sample to comprise K instances, wherein K is any positive integer;
defining two metrics for the image sample, an intersection-over-prediction ratio IoP and an intersection-over-ground-truth ratio IoC, wherein the two metrics are calculated through the K instances and the bounding box samples in the image sample;
the intersection-over-prediction ratio and the intersection-over-ground-truth ratio are calculated as follows: IoP(b_n, g_k) = |b_n ∩ g_k| / |b_n| and IoC(b_n, g_k) = |b_n ∩ g_k| / |g_k|; wherein g_k (k = 1, …, K) denotes the K instances and b_n (n = 1, …, N) denotes the N bounding box samples;
classifying the bounding box samples in the image sample according to the intersection-over-prediction ratio and the intersection-over-ground-truth ratio, assigning bounding box samples containing the same object instance to the same group to obtain K groups of bounding box samples, and dividing the K groups of bounding box samples into a complete object set, a local object set, an out-of-range object set and a non-object set;
obtaining a first loss parameter according to a first preset generalized object confidence score and the complete object set;
obtaining a second loss parameter according to a second preset generalized object confidence score and the local object set and/or according to the second preset generalized object confidence score and the out-of-range object set;
obtaining a third loss parameter through contrastive learning according to the complete object set;
and training the preset generalized object confidence coefficient regressor according to the first loss parameter, the second loss parameter and the third loss parameter to obtain the optimized generalized object confidence coefficient regressor.
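The IoP/IoC grouping in this claim can be sketched as follows. This is a minimal illustration, not the patented implementation: the threshold value and the helper names `iop`, `ioc`, and `classify_box` are assumptions introduced for this sketch.

```python
def intersection_area(b, g):
    # boxes as (x1, y1, x2, y2)
    w = min(b[2], g[2]) - max(b[0], g[0])
    h = min(b[3], g[3]) - max(b[1], g[1])
    return max(w, 0) * max(h, 0)

def area(b):
    return max(b[2] - b[0], 0) * max(b[3] - b[1], 0)

def iop(b, g):
    # intersection over the predicted box area
    return intersection_area(b, g) / area(b)

def ioc(b, g):
    # intersection over the ground-truth instance area
    return intersection_area(b, g) / area(g)

def classify_box(b, instances, t=0.5):
    # Assign the sample to the instance it overlaps most, then
    # bucket it by IoP/IoC (the threshold t is illustrative).
    best = max(instances, key=lambda g: intersection_area(b, g))
    p, c = iop(b, best), ioc(b, best)
    if p >= t and c >= t:
        return "complete"      # tightly covers the whole instance
    if p >= t:
        return "local"         # mostly inside, covers only part of it
    if c >= t:
        return "out-of-range"  # covers the instance but spills outside
    return "non-object"
```

Samples labelled with the same best-matching instance would form the K groups, and the four labels correspond to the complete, local, out-of-range and non-object sets.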
5. The method according to any one of claims 1 to 4, further comprising, after the performing detection processing through a preset classifier and a preset bounding box displacement regressor according to the bounding box and the bounding box features:
and carrying out fusion processing on the bounding box of the unknown object and the known object to obtain an object set.
6. An image processing apparatus, comprising:
the acquisition module is used for acquiring the image to be processed;
the feature extraction module is used for extracting features of the image to be processed through a preset target detection model to obtain a bounding box and bounding box features;
the first object identification module is used for performing detection processing through a preset generalized object confidence coefficient regressor according to the bounding box and the bounding box features to obtain a bounding box of an unknown object, wherein the preset generalized object confidence coefficient regressor is obtained through bounding box feature and generalized object confidence coefficient training of an image sample;
the second object identification module is used for performing detection processing through a preset classifier and a preset bounding box displacement regressor according to the bounding box and the bounding box features to obtain a known object, wherein the preset classifier is obtained through bounding box feature and class probability training of an image sample, and the preset bounding box displacement regressor is obtained through bounding box feature and bounding box displacement vector training of an image sample;
the first object identification module is specifically configured to:
calculating a generalized object confidence coefficient of each bounding box through the preset generalized object confidence coefficient regressor according to the bounding boxes and the bounding box features, and performing first screening processing on the bounding boxes according to the generalized object confidence coefficients to obtain bounding boxes to be processed;
constructing the bounding boxes to be processed into a weighted undirected graph, wherein each node in the weighted undirected graph represents one bounding box to be processed, and each edge in the weighted undirected graph is weighted by the degree of overlap between the nodes it connects;
iteratively decomposing the whole graph into N subgraphs through a recursive normalized cut algorithm until the normalized cut cost value of each subgraph is lower than a preset segmentation threshold value, wherein N is any positive integer;
and in each subgraph, determining the bounding box to be processed with the highest generalized object confidence score as the bounding box of the unknown object.
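The graph-based selection performed by the first object identification module can be sketched as follows. This is a sketch under stated assumptions, not the patented implementation: IoU is used as the edge weight, the bipartition comes from a median split of the Fiedler vector of the normalized Laplacian, and the threshold 0.5 and the function names are illustrative.

```python
import numpy as np

def iou(a, b):
    # boxes as (x1, y1, x2, y2)
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def ncut_cost(W, mask):
    # normalized cut value of the bipartition given by the boolean mask
    cut = W[mask][:, ~mask].sum()
    assoc_a, assoc_b = W[mask].sum(), W[~mask].sum()
    if assoc_a == 0 or assoc_b == 0:
        return np.inf
    return cut / assoc_a + cut / assoc_b

def recursive_ncut(W, idx, thresh, out):
    if len(idx) <= 1:
        out.append(idx)
        return
    sub = W[np.ix_(idx, idx)]
    d = sub.sum(axis=1)
    d[d == 0] = 1e-9
    D_is = np.diag(1.0 / np.sqrt(d))
    # symmetric normalized Laplacian; split along the Fiedler vector
    _, vecs = np.linalg.eigh(D_is @ (np.diag(d) - sub) @ D_is)
    f = vecs[:, 1]
    mask = f >= np.median(f)
    if mask.all() or (~mask).all():
        out.append(idx)
        return
    if ncut_cost(sub, mask) < thresh:  # cheap cut -> genuinely two clusters
        recursive_ncut(W, [idx[i] for i in np.flatnonzero(mask)], thresh, out)
        recursive_ncut(W, [idx[i] for i in np.flatnonzero(~mask)], thresh, out)
    else:
        out.append(idx)

def select_unknown_boxes(boxes, scores, thresh=0.5):
    n = len(boxes)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            W[i, j] = W[j, i] = iou(boxes[i], boxes[j])
    groups = []
    recursive_ncut(W, list(range(n)), thresh, groups)
    # keep the highest-scoring box in each subgraph
    return groups, [max(g, key=lambda i: scores[i]) for g in groups]
```

For example, two pairs of heavily overlapping boxes with only slight overlap between the pairs decompose into two subgraphs, and one box survives per subgraph.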
7. An image processing apparatus, characterized by comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the image processing method of any one of claims 1 to 5.
8. A computer-readable storage medium, in which computer-executable instructions are stored, which when executed by a processor are adapted to carry out the image processing method according to any one of claims 1 to 5.
CN202310416282.2A 2023-04-19 2023-04-19 Image processing method, device, equipment and storage medium Active CN116152576B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310416282.2A CN116152576B (en) 2023-04-19 2023-04-19 Image processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310416282.2A CN116152576B (en) 2023-04-19 2023-04-19 Image processing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116152576A CN116152576A (en) 2023-05-23
CN116152576B true CN116152576B (en) 2023-08-01

Family

ID=86362129

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310416282.2A Active CN116152576B (en) 2023-04-19 2023-04-19 Image processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116152576B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117315375B (en) * 2023-11-20 2024-03-01 腾讯科技(深圳)有限公司 Virtual part classification method, device, electronic equipment and readable storage medium

Citations (2)

Publication number Priority date Publication date Assignee Title
CN108257120A (en) * 2018-01-09 2018-07-06 东北大学 A kind of extraction method of the three-dimensional liver bounding box based on CT images
CN109101897A (en) * 2018-07-20 2018-12-28 中国科学院自动化研究所 Object detection method, system and the relevant device of underwater robot

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
FR3067495B1 (en) * 2017-06-08 2019-07-05 Renault S.A.S METHOD AND SYSTEM FOR IDENTIFYING AT LEAST ONE MOVING OBJECT
CN111144366A (en) * 2019-12-31 2020-05-12 中国电子科技集团公司信息科学研究院 Strange face clustering method based on joint face quality assessment
CN112446296A (en) * 2020-10-30 2021-03-05 杭州易现先进科技有限公司 Gesture recognition method and device, electronic device and storage medium
CN112906502B (en) * 2021-01-29 2023-08-01 北京百度网讯科技有限公司 Training method, device, equipment and storage medium of target detection model
KR102301635B1 (en) * 2021-02-04 2021-09-13 주식회사 에이모 Method of inferring bounding box using artificial intelligence model and computer apparatus of inferring bounding box
CN114241260B (en) * 2021-12-14 2023-04-07 四川大学 Open set target detection and identification method based on deep neural network
CN115376101A (en) * 2022-08-25 2022-11-22 天津大学 Incremental learning method and system for automatic driving environment perception

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN108257120A (en) * 2018-01-09 2018-07-06 东北大学 A kind of extraction method of the three-dimensional liver bounding box based on CT images
CN109101897A (en) * 2018-07-20 2018-12-28 中国科学院自动化研究所 Object detection method, system and the relevant device of underwater robot

Also Published As

Publication number Publication date
CN116152576A (en) 2023-05-23

Similar Documents

Publication Publication Date Title
CN111476284B (en) Image recognition model training and image recognition method and device and electronic equipment
CN113095124B (en) Face living body detection method and device and electronic equipment
KR102513089B1 (en) A method and an apparatus for deep learning networks training using soft-labelling
CN109189767B (en) Data processing method and device, electronic equipment and storage medium
CN112052186B (en) Target detection method, device, equipment and storage medium
CA3066029A1 (en) Image feature acquisition
CN113642431B (en) Training method and device of target detection model, electronic equipment and storage medium
CN116152576B (en) Image processing method, device, equipment and storage medium
CN111738263A (en) Target detection method and device, electronic equipment and storage medium
Arya et al. Object detection using deep learning: a review
CN115713715A (en) Human behavior recognition method and system based on deep learning
CN110472673B (en) Parameter adjustment method, fundus image processing device, fundus image processing medium and fundus image processing apparatus
CN111126112B (en) Candidate region determination method and device
CN110879821A (en) Method, device, equipment and storage medium for generating rating card model derivative label
CN110728229A (en) Image processing method, device, equipment and storage medium
CN116246161A (en) Method and device for identifying target fine type of remote sensing image under guidance of domain knowledge
CN115393755A (en) Visual target tracking method, device, equipment and storage medium
CN115700790A (en) Method, apparatus and storage medium for object attribute classification model training
CN114898454A (en) Neural network model training method and device, electronic equipment and medium
CN111414895A (en) Face recognition method and device and storage equipment
CN112084889A (en) Image behavior recognition method and device, computing equipment and storage medium
CN112990145B (en) Group-sparse-based age estimation method and electronic equipment
CN110472728B (en) Target information determining method, target information determining device, medium and electronic equipment
CN113837255B (en) Method, apparatus and medium for predicting cell-based antibody karyotype class
CN114445711B (en) Image detection method, image detection device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant