CN117218515B - Target detection method, device, computing equipment and storage medium - Google Patents


Info

Publication number
CN117218515B
CN117218515B (application CN202311211931.1A)
Authority
CN
China
Prior art keywords
image
target
comparison
result
images
Prior art date
Legal status (assumption, not a legal conclusion)
Active
Application number
CN202311211931.1A
Other languages
Chinese (zh)
Other versions
CN117218515A (en)
Inventor
顾晓光
Current Assignee (the listed assignee may be inaccurate)
Konami Sports Club Co Ltd
Original Assignee
People Co Ltd
Priority date (assumption, not a legal conclusion)
Filing date
Publication date
Application filed by People Co Ltd filed Critical People Co Ltd
Priority to CN202311211931.1A priority Critical patent/CN117218515B/en
Publication of CN117218515A publication Critical patent/CN117218515A/en
Application granted granted Critical
Publication of CN117218515B publication Critical patent/CN117218515B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a target detection method, a device, a computing device and a storage medium. The method comprises: obtaining a target image to be detected, calculating the similarity between the target image and a plurality of sample images in a pre-constructed comparison sample library, and screening the sample images according to the similarity to obtain a first preset number of comparison images; for each comparison image, constructing a prediction task from the target image, the random initialization image of the target image, the comparison image and the result image of the comparison image; performing batch recognition on the first preset number of prediction tasks to obtain a plurality of target results corresponding to the target image; and adding corresponding target labels to the target image according to the target results to obtain a final detection result. By screening the first preset number of effective comparison images through feature search, the method solves the problem of computational efficiency in scenarios with a large number of detection targets.

Description

Target detection method, device, computing equipment and storage medium
Technical Field
The present invention relates to the field of target detection technology, and in particular to a target detection method, device, computing equipment, and storage medium.
Background
Detection and recognition of specific targets is an important research direction in computer vision with wide application value: in content safety auditing systems, automatic driving systems and security systems, specific targets need to be located and recognized.
Existing target detection methods were mainly based on traditional computer vision techniques, such as sliding windows and image feature extraction. These methods perform well in simple scenarios but face accuracy and efficiency limitations in complex ones. With the advent of deep learning, particularly the development of convolutional neural networks (CNNs), significant progress has been made in target detection, and deep-learning-based methods have become the mainstream.
However, two problems remain in practical application scenarios. First, after training is finished, adding a new class of detection target requires collecting a certain number of additional training samples and retraining. Second, with a large number of detection targets, on the order of tens of thousands of classes, training is difficult, the classification head is weak, and the recognition effect is hard to guarantee. Together, these greatly limit the scalability and flexibility of target detection.
Disclosure of Invention
The present invention has been made in view of the above problems, and provides a target detection method, apparatus, computing device and storage medium that overcome, or at least partially solve, them.
According to an aspect of the present invention, there is provided a target detection method including:
Calculating the similarity between a target image to be detected and a plurality of sample images in a pre-constructed comparison sample library, and screening the plurality of sample images according to the similarity to obtain a first preset number of comparison images; the comparison sample library comprises a plurality of sample images and a result image of each sample image;
For each comparison image, constructing a prediction task according to the target image, the random initialization image of the target image, the comparison image and the result image of the comparison image;
performing batch identification processing on the first preset number of prediction tasks to obtain a plurality of target results corresponding to the target images;
and adding corresponding target labels to the target images according to the target results to obtain a final detection result.
According to another aspect of the present invention, there is provided an object detection apparatus including:
the task screening module is used for calculating the similarity between the target image to be detected and a plurality of sample images in a pre-constructed comparison sample library, and screening the plurality of sample images according to the similarity to obtain a first preset number of comparison images; the comparison sample library comprises a plurality of sample images and a result image of each sample image;
The recognition module based on comparison is used for constructing a prediction task according to the target image, the random initialization image of the target image, the comparison image and the result image of the comparison image aiming at each comparison image; performing batch identification processing on the first preset number of prediction tasks to obtain a plurality of target results corresponding to the target images;
and the post-processing module is used for adding corresponding target labels to the target image according to the multiple target results to obtain a final detection result.
According to yet another aspect of the present invention, there is provided a computing device comprising: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface are communicated with each other through the communication bus;
The memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the target detection method.
According to still another aspect of the present invention, there is provided a computer storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the above-described object detection method.
According to the target detection method, device, computing equipment and storage medium above, the target image to be detected is obtained, its similarity to a plurality of sample images in a pre-constructed comparison sample library is calculated, and the sample images are screened according to the similarity to obtain a first preset number of comparison images; the comparison sample library comprises the sample images and a result image for each. A random initialization image of the target image is constructed, and for each comparison image a prediction task is built from the target image, the random initialization image, the comparison image and the result image of the comparison image. The first preset number of prediction tasks are recognized in a batch to obtain a plurality of target results corresponding to the target image, and corresponding target labels are added to the target image according to the target results to obtain the final detection result. The invention converts the general target detection problem into the problem of segmenting similar targets given a reference sample. Screening an effective first preset number of comparison images through feature search solves the computational-efficiency problem when the number of detection targets is large; exploiting the visual-semantic generalization capability of a general image segmentation model, constructing one prediction task per detection target and recognizing multiple prediction tasks in batches reduces the response time of target detection and addresses the scalability and flexibility problems of prior-art target detection.
The foregoing is only an overview of the technical solution of the present invention. In order that it may be more clearly understood and implemented, and to make the above and other objects, features and advantages of the invention more apparent, specific embodiments of the invention are set forth below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 shows a flowchart of a target detection method according to an embodiment of the present invention;
FIG. 2a is a schematic diagram showing the object detection method of the present embodiment;
FIG. 2b shows a schematic representation of the target result of a target image;
fig. 3 is a schematic structural diagram of an object detection device according to an embodiment of the present invention;
FIG. 4 illustrates a schematic diagram of a computing device provided by an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Fig. 1 shows a flowchart of an embodiment of a target detection method according to the present invention, as shown in fig. 1, the method includes the following steps:
Step S110: and calculating the similarity between the target image to be detected and a plurality of sample images in a pre-constructed comparison sample library, and screening the plurality of sample images according to the similarity to obtain a first preset number of comparison images.
In practice, a target image to be detected may contain multiple detection targets, for example money, flags or identity cards. When the number of possible detection targets is large, for example thousands or tens of thousands, training a detection model is difficult, the classification head is weak, the recognition effect is hard to guarantee, and the scalability and flexibility of the model are greatly limited. For instance, if 1000 candidate detection targets must be considered, the target image might have to be compared against 1000 comparison images with different labels, and 1000 comparisons cannot be processed in a single batch. Therefore, this embodiment first screens out a first preset number of comparison images similar to the target image, which solves the computational-efficiency problem when the number of detection targets is large.
In an alternative manner, step S110 further includes: vectorizing the target image to obtain a first feature vector; obtaining second feature vectors of a plurality of sample images from a comparison sample feature library corresponding to the comparison sample library; calculating the similarity between the first feature vector and each second feature vector; and sequencing the similarity according to the sequence from high to low to obtain a first sequencing result, and extracting a first preset number of sample images which are arranged at the front of the first sequencing result from the first sequencing result to serve as comparison images.
Fig. 2a shows a schematic diagram of the target detection method of this embodiment. As shown in fig. 2a, the task screening module relies on a pre-constructed comparison sample library containing a plurality of sample images and a result image for each. In this step, feature extraction is performed on the target image to obtain the first feature vector, and semantic similarity between images is used to find the closest sample images in the comparison sample library. A prediction-data batch is constructed only for the first preset number (N) of closest sample images found by the search, and the target detection result, i.e. the target result, is obtained in one pass of computation; N can be adjusted to the actual situation, for example to the batch size the hardware allows. Various feature extraction methods can be used, and the stronger the representational capability of the adopted features, the better the effect. For example, a ViT model (vit_base_patch8_224) trained on a large-scale dataset can extract the first feature vector, with which suitable sample images are searched in the comparison sample library as comparison images for the target image. Note that a corresponding comparison sample feature library is also maintained: its second feature vectors are extracted in advance from the sample images in the comparison sample library, ready for comparison with the target image.
Taking a target image whose detection target is a coin as an example: the comparison sample library is searched for sample images carrying the coin label, the similarity between the target image and the sample images is calculated, the similarities are sorted from high to low to obtain the first sorting result, and the first preset number of top-ranked sample images are extracted from it as comparison images, thereby defining the expected prediction tasks.
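The screening in step S110 can be sketched as a cosine-similarity search over feature vectors. The function name and the toy vectors below are illustrative assumptions, not part of the patent; in practice the first and second feature vectors would come from a model such as the ViT mentioned above.

```python
import numpy as np

def screen_comparison_images(target_vec, sample_vecs, n):
    """Rank sample images by cosine similarity to the target image's
    first feature vector and keep the top-n as comparison images."""
    t = target_vec / np.linalg.norm(target_vec)
    s = sample_vecs / np.linalg.norm(sample_vecs, axis=1, keepdims=True)
    sims = s @ t                      # similarity to each sample image
    order = np.argsort(-sims)         # first sorting result: high -> low
    return order[:n].tolist(), sims[order[:n]].tolist()

# Toy example: four second feature vectors, keep the top 2.
target = np.array([1.0, 0.0])
samples = np.array([[1.0, 0.1], [0.0, 1.0], [0.9, 0.0], [-1.0, 0.0]])
top_idx, top_sims = screen_comparison_images(target, samples, n=2)
```

In a real deployment the search over a large comparison sample feature library would typically use an approximate-nearest-neighbor index rather than a dense matrix product.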
Step S120: for each comparison image, a prediction task is constructed according to the target image, the randomly initialized image of the target image, the comparison image and the result image of the comparison image.
Specifically, to obtain the target result of the target image, the first preset number of comparison images, together with their result images, the target image and the random initialization image of the target image, are fed into a prediction model for prediction. The random initialization image is the starting point for the result image of the target image: a random region of the target image is masked, and the masked copy serves as the random initialization image of the target image.
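A minimal sketch of the masking step just described, assuming the random region is a rectangle covering a fixed fraction of the image; the fraction and the zero fill value are illustrative choices, not specified by the patent.

```python
import numpy as np

def random_init_image(image, frac=0.5, seed=0):
    """Mask a random rectangular region of the target image; the masked
    copy serves as the random initialization image for the result slot."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    mh, mw = int(h * frac), int(w * frac)
    y = int(rng.integers(0, h - mh + 1))
    x = int(rng.integers(0, w - mw + 1))
    out = image.copy()
    out[y:y + mh, x:x + mw] = 0   # mask the chosen region
    return out

img = np.full((8, 8), 255, dtype=np.uint8)
masked = random_init_image(img)
```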
Specifically, each comparison image yields one prediction task, so the first preset number of comparison images yield the same number of prediction tasks. Each time a comparison image is added, it is stacked together with its result image into the batch, and the batch is computed synchronously so that multiple prediction tasks complete simultaneously.
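The batch assembly can be sketched as stacking one four-image task per comparison image; the array layout below (task axis first, then the four image slots) is an assumption for illustration, not a format mandated by the patent.

```python
import numpy as np

def build_batch(target, init, comparisons, results):
    """One prediction task per comparison image: (comparison image,
    its result image, target image, random init image), stacked so a
    single forward pass can recognize all tasks at once."""
    tasks = [np.stack([c, r, target, init])
             for c, r in zip(comparisons, results)]
    return np.stack(tasks)  # shape: (N, 4, H, W)

t = np.zeros((4, 4)); init = np.zeros((4, 4))
comps = [np.ones((4, 4)) for _ in range(3)]
res = [np.ones((4, 4)) for _ in range(3)]
batch = build_batch(t, init, comps, res)
```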
Step S130: and carrying out batch identification processing on the first preset number of prediction tasks to obtain a plurality of target results corresponding to the target images.
In an alternative manner, step S130 further includes: and carrying out batch identification processing on a first preset number of prediction tasks by using a prediction model, updating a mask region in each random initialization image, and outputting a target result corresponding to a target image of each prediction task.
The prediction model is a general image segmentation model based on image prompts (SegGPT), obtained by pre-training a convolutional neural network on a segmentation sample dataset. SegGPT unifies different segmentation tasks into a general in-context learning framework by converting different types of segmentation data into the same image format; it can be used to segment everything in context. The training of SegGPT is formulated as an in-context coloring problem: a color mapping is randomly assigned to each data sample, and the goal is to accomplish different segmentation tasks according to context rather than relying on specific colors, so SegGPT can perform any segmentation task in an image by in-context reasoning.
As shown in fig. 2a, the input of the prediction model is the first preset number of prediction tasks. Each prediction task comprises four images: the target image, the random initialization image of the target image, a comparison image and the result image of that comparison image; and each prediction task outputs a result image of the target image. The main job of the prediction model is thus to refer to the comparison image and its result image and automatically apply a reasonable prediction to the target image to obtain the target result. When there are multiple prediction tasks whose detection targets are clear and few in kind, the method of this embodiment processes them with high efficiency.
Step S140: and adding corresponding target labels to the target images according to the target results to obtain a final detection result.
In an alternative manner, step S140 further includes: performing binarization processing on the target result aiming at each target result, and calculating the area occupation ratio of the mask area in the target result; and sequencing the area proportion of the mask areas of the plurality of target results according to the sequence from large to small to obtain a second sequencing result, and adding corresponding target labels to the target images according to the second sequencing result to obtain a final detection result.
Specifically, this embodiment exploits the visual-semantic generalization capability exhibited by SegGPT and applies it to few-sample target detection and recognition. After a target result is obtained, it is binarized to reveal and locate the mask region, and the area ratio of the mask region within the target result is calculated. The target results are then sorted by mask-area ratio from large to small to obtain the second sorting result, and corresponding target labels are added to the target image according to it to obtain the final detection result.
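The binarization and ranking described above can be sketched as follows; the threshold value 128 is an illustrative assumption for 8-bit result images, not a value fixed by the patent.

```python
import numpy as np

def mask_area_ratio(result, thresh=128):
    """Binarize a predicted result image and return the fraction of
    pixels in the mask (foreground) region."""
    return float((result >= thresh).mean())

def rank_results(results, thresh=128):
    """Second sorting result: indices of the target results ordered by
    mask area ratio, largest first."""
    ratios = [mask_area_ratio(r, thresh) for r in results]
    order = sorted(range(len(results)), key=lambda i: -ratios[i])
    return order, ratios

a = np.zeros((10, 10)); a[:2] = 255   # 20% mask
b = np.zeros((10, 10)); b[:5] = 255   # 50% mask
order, ratios = rank_results([a, b])
```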
In an alternative manner, step S140 further includes: selecting a second preset number of target results with the front sorting from the second sorting results, and acquiring target labels of the comparison images corresponding to the second preset number of target results; and adding a target label of the comparison image for the target image.
Specifically, the second preset number may also be determined by the actual scene. It may be set to 1: the target label of the comparison image corresponding to the target result with the largest mask-area ratio is obtained and added to the target image; for example, if that comparison image carries a money label, the money label is added to the target image. The second preset number may also be larger than 1, in which case the target labels of several comparison images are added; for example, the target labels of the top three ranked comparison images can all be added to the target image, completing detection for the target image.
In an alternative manner, an area-ratio threshold may be set, and the target label of any comparison image whose corresponding mask-area ratio exceeds the threshold is added to the target image.
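Combining the second preset number with the optional area-ratio threshold, label selection might look like the following sketch; the helper function and the toy ranking are hypothetical, not named in the patent.

```python
def select_labels(order, ratios, labels, top_k=1, min_ratio=None):
    """Pick the target labels of the comparison images behind the top-k
    ranked target results, optionally dropping any whose mask area
    ratio does not exceed the threshold."""
    picked = []
    for i in order[:top_k]:
        if min_ratio is None or ratios[i] > min_ratio:
            picked.append(labels[i])
    return picked

# Toy ranking: result 1 has the largest mask-area ratio.
order = [1, 0, 2]
ratios = [0.05, 0.40, 0.01]
labels = ["money", "flag", "coin"]
```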
Fig. 2b shows a schematic diagram of a target result of a target image. As shown in fig. 2b, after the target result is binarized, the area ratio of the mask (the white region) is calculated to be 17.32%. With the second preset number set to 1, this target result ranks first in the second sorting result, so the target label of its comparison image is added to the target image, and the circumscribed rectangle of the mask region is drawn to locate the target.
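Drawing the circumscribed rectangle of the mask region amounts to computing a bounding box over the binarized foreground; a minimal numpy sketch (an OpenCV `cv2.boundingRect` call would serve the same purpose):

```python
import numpy as np

def mask_bbox(binary):
    """Circumscribed rectangle (x, y, w, h) of the mask region, used to
    locate the detected target; returns None for an empty mask."""
    ys, xs = np.nonzero(binary)
    if ys.size == 0:
        return None
    return (int(xs.min()), int(ys.min()),
            int(xs.max() - xs.min() + 1), int(ys.max() - ys.min() + 1))

mask = np.zeros((10, 10), dtype=bool)
mask[2:5, 3:7] = True
box = mask_bbox(mask)
```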
In particular, in this embodiment a newly added detection target requires no collection of training data and no retraining: placing a single sample image, or a small number of sample images, carrying the new target label into the comparison sample library is enough for the new target to be recognized effectively, greatly reducing the training investment.
With the method of this embodiment, the general target detection problem is converted into the problem of segmenting similar targets given a reference sample. Screening an effective first preset number of comparison images through feature search solves the computational-efficiency problem when the number of detection targets is large; exploiting the visual-semantic generalization capability of a general image segmentation model, constructing one prediction task per detection target and recognizing multiple prediction tasks in batches reduces the response time of target detection and addresses the scalability and flexibility problems of prior-art target detection. For a newly added detection target, no training data need be collected and no retraining performed: placing a single sample image, or a small number of sample images, carrying the new target label into the comparison sample library suffices for effective recognition, greatly reducing the training investment.
Fig. 3 is a schematic diagram showing the structure of an embodiment of the object detection device of the present invention. As shown in fig. 3, the apparatus includes: a task screening module 310, an alignment-based identification module 320, and a post-processing module 330.
The task screening module 310 is configured to calculate the similarity between the target image to be detected and a plurality of sample images in a pre-constructed comparison sample library, and to screen the sample images according to the similarity to obtain a first preset number of comparison images; the comparison sample library comprises the sample images and a result image for each.
In an alternative approach, the task screening module 310 is further configured to: vectorize the target image to obtain a first feature vector; obtain second feature vectors of the sample images from a comparison sample feature library corresponding to the comparison sample library; calculate the similarity between the first feature vector and each second feature vector; and sort the similarities from high to low to obtain a first sorting result, from which the first preset number of top-ranked sample images are extracted as comparison images.
The recognition module 320 based on comparison is configured to construct a prediction task according to the target image, the randomly initialized image of the target image, the comparison image, and the result image of the comparison image for each comparison image; and carrying out batch identification processing on the first preset number of prediction tasks to obtain a plurality of target results corresponding to the target images.
In an alternative approach, the alignment-based identification module 320 is further configured to: and carrying out mask processing on the random area of the target image, and taking the processed target image as a random initialization image of the target image.
In an alternative approach, the alignment-based identification module 320 is further configured to: and carrying out batch identification processing on a first preset number of prediction tasks by using a prediction model, updating a mask region in each random initialization image, and outputting a target result corresponding to a target image of each prediction task.
In an alternative way, the predictive model is a generic image segmentation model based on image cues pre-trained by applying convolutional neural networks on the segmented sample dataset.
The post-processing module 330 is configured to add corresponding target labels to the target image according to the multiple target results, so as to obtain a final detection result.
In an alternative approach, the post-processing module 330 is further configured to: performing binarization processing on the target result aiming at each target result, and calculating the area occupation ratio of the mask area in the target result; and sequencing the area proportion of the mask areas of the plurality of target results according to the sequence from large to small to obtain a second sequencing result, and adding corresponding target labels to the target images according to the second sequencing result to obtain a final detection result.
In an alternative approach, the post-processing module 330 is further configured to: selecting a second preset number of target results with the front sorting from the second sorting results, and acquiring target labels of the comparison images corresponding to the second preset number of target results; and adding a target label of the comparison image for the target image.
With the device of this embodiment, the general target detection problem is converted into the problem of segmenting similar targets given a reference sample. Screening an effective first preset number of comparison images through feature search solves the computational-efficiency problem when the number of detection targets is large; exploiting the visual-semantic generalization capability of a general image segmentation model, constructing one prediction task per detection target and recognizing multiple prediction tasks in batches reduces the response time of target detection and addresses the scalability and flexibility problems of prior-art target detection.
The embodiment of the invention provides a non-volatile computer storage medium, which stores at least one executable instruction, and the computer executable instruction can execute the target detection method in any of the above method embodiments.
The executable instructions may specifically cause the processor to:
Calculating the similarity between a target image to be detected and a plurality of sample images in a pre-constructed comparison sample library, and screening the plurality of sample images according to the similarity to obtain a first preset number of comparison images; the comparison sample library comprises a plurality of sample images and a result image of each sample image;
For each comparison image, constructing a prediction task according to the target image, the random initialization image of the target image, the comparison image and the result image of the comparison image;
performing batch identification processing on the first preset number of prediction tasks to obtain a plurality of target results corresponding to the target images;
and adding corresponding target labels to the target images according to the target results to obtain a final detection result.
FIG. 4 illustrates a schematic diagram of an embodiment of a computing device of the present invention; the embodiments of the present invention do not limit the specific implementation of the computing device.
As shown in fig. 4, the computing device may include:
a processor, a communication interface, a memory, and a communication bus.
The processor, communication interface, and memory communicate with each other via the communication bus. The communication interface is used for communicating with network elements of other devices, such as clients or other servers. The processor is used for executing the program, and can specifically execute the relevant steps in the embodiment of the target detection method above.
In particular, the program may include program code including computer-operating instructions.
The processor may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The one or more processors included in the server may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory is used for storing the program. The memory may comprise high-speed RAM and may further comprise non-volatile memory, such as at least one disk memory.
The program may be specifically operative to cause the processor to:
Obtaining a target image to be detected, calculating the similarity between the target image to be detected and a plurality of sample images in a pre-constructed comparison sample library, and screening the plurality of sample images according to the similarity to obtain a first preset number of comparison images; the comparison sample library comprises a plurality of sample images and a result image of each sample image;
For each comparison image, constructing a prediction task according to the target image, the random initialization image of the target image, the comparison image and the result image of the comparison image;
performing batch identification processing on the first preset number of prediction tasks to obtain a plurality of target results corresponding to the target images;
and adding corresponding target labels to the target images according to the target results to obtain a final detection result.
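As a non-limiting illustration only, the flow the program causes the processor to perform may be sketched as follows. All function and field names (`detect`, `predict`, `vec`, `label`) are hypothetical, cosine similarity is assumed as the similarity measure, and the random initialization here zeroes a single pixel for brevity; the embodiments fix none of these choices:

```python
import numpy as np

def cosine(a, b):
    # cosine similarity between two feature vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def detect(target_image, target_vec, library, predict, k=2, top_m=1):
    """Sketch of the claimed flow. `library` is a list of dicts with keys
    'image', 'result', 'vec', 'label'; `predict` stands in for the
    image-prompted segmentation model."""
    # 1. screening: keep the k sample images most similar to the target
    ranked = sorted(library, key=lambda s: cosine(target_vec, s["vec"]),
                    reverse=True)
    comparisons = ranked[:k]
    # 2. one prediction task per comparison image; the random
    #    initialization image is the target with one random pixel zeroed
    rng = np.random.default_rng(0)
    init = target_image.copy()
    init[tuple(int(rng.integers(0, s)) for s in init.shape)] = 0
    results = [predict(c["image"], c["result"], target_image, init)
               for c in comparisons]
    # 3. post-processing: rank results by mask-area ratio, keep top labels
    scored = sorted(zip(comparisons, results),
                    key=lambda cr: float(np.mean(cr[1] > 0.5)),
                    reverse=True)
    return [c["label"] for c, _ in scored[:top_m]]
```

In use, `predict` would be the pre-trained general segmentation model described below; here any callable with the same four-image signature suffices.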
The algorithms and displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. The required structure for constructing such a system is apparent from the description above. In addition, embodiments of the present invention are not directed to any particular programming language. It will be appreciated that the teachings of the present invention described herein may be implemented in a variety of programming languages; the above description of specific languages is provided to disclose the enablement and best mode of the present invention.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the above description of exemplary embodiments of the invention, various features of the embodiments are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, this method of disclosure should not be construed as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component and, furthermore, they may be divided into a plurality of sub-modules or sub-units or sub-components. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
Various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functionality of some or all of the components according to embodiments of the present invention may be implemented in practice using a microprocessor or Digital Signal Processor (DSP). The present invention can also be implemented as an apparatus or device program (e.g., a computer program and a computer program product) for performing a portion or all of the methods described herein. Such a program embodying the present invention may be stored on a computer readable medium, or may have the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not denote any order; these words may be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless specifically stated.

Claims (9)

1. A method of detecting an object, comprising:
Calculating the similarity between a target image to be detected and a plurality of sample images in a pre-constructed comparison sample library, and screening the plurality of sample images according to the similarity to obtain a first preset number of comparison images; the comparison sample library comprises a plurality of sample images and a result image of each sample image;
For each comparison image, constructing a prediction task according to the target image, a random initialization image of the target image, the comparison image, and a result image of the comparison image;
Performing batch identification processing on the first preset number of prediction tasks by using a prediction model to obtain a plurality of target results corresponding to the target image; wherein the prediction model is a general image-prompted segmentation model obtained by pre-training a convolutional neural network on a segmentation sample data set, and unifies different segmentation tasks into a general in-context learning framework, used for segmenting everything in context, by converting different types of segmentation data into the same image format; each prediction task outputs one result image of the target image, and the prediction model predicts the target image with reference to the comparison image and the result image of the comparison image to obtain a target result;
and adding corresponding target labels to the target image according to the target results to obtain a final detection result.
2. The method of claim 1, wherein calculating the similarity between the target image and a plurality of sample images in a pre-constructed comparison sample library, and screening a first preset number of comparison images from the plurality of sample images according to the similarity further comprises:
vectorizing the target image to obtain a first feature vector;
obtaining second feature vectors of a plurality of sample images from a comparison sample feature library corresponding to the comparison sample library;
Calculating the similarity between the first feature vector and each second feature vector;
And sorting the similarities in descending order to obtain a first ranking result, and extracting the first preset number of top-ranked sample images from the first ranking result to serve as comparison images.
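The screening of claim 2 may be sketched as follows. This is illustrative only: the `screen_comparison_images` name is hypothetical, and cosine similarity between the first feature vector and each second feature vector is an assumption, since the claim does not fix the similarity measure:

```python
import numpy as np

def screen_comparison_images(target_vec, sample_vecs, sample_ids, k):
    """Rank sample images by similarity between the target image's first
    feature vector and each second feature vector, then keep the top-k
    (the first preset number) as comparison images."""
    t = target_vec / np.linalg.norm(target_vec)          # normalize target
    s = sample_vecs / np.linalg.norm(sample_vecs, axis=1, keepdims=True)
    sims = s @ t                                         # one similarity per sample
    order = np.argsort(-sims)                            # high to low: first ranking result
    top = order[:k]
    return [sample_ids[i] for i in top], sims[top]
```

The second feature vectors would come precomputed from the comparison sample feature library, so only the target image is embedded at detection time.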
3. The method according to claim 1, characterized in that the randomly initialized image of the target image is in particular:
And carrying out mask processing on the random area of the target image, and taking the processed target image as a random initialization image of the target image.
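A minimal sketch of this random initialization, assuming a rectangular mask region and a zero fill value (neither is fixed by the claim; the function name and `mask_frac` parameter are hypothetical):

```python
import numpy as np

def random_init_image(target, mask_frac=0.3, rng=None):
    """Mask a random rectangular region of the target image and return the
    processed copy as the random initialization image."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = target.shape[:2]
    mh, mw = int(h * mask_frac), int(w * mask_frac)   # mask height/width
    y = int(rng.integers(0, h - mh + 1))              # random top-left corner
    x = int(rng.integers(0, w - mw + 1))
    out = target.copy()                               # leave the original intact
    out[y:y + mh, x:x + mw] = 0                       # masked region
    return out
```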
4. The method of claim 3, wherein the applying the predictive model to perform batch identification processing on the first preset number of predictive tasks to obtain a plurality of target results corresponding to the target image further includes:
Performing batch identification processing on the first preset number of prediction tasks by using the prediction model, updating the mask region in each random initialization image, and outputting a target result corresponding to the target image of each prediction task.
5. The method according to any one of claims 1-4, wherein adding corresponding target labels to the target image according to the plurality of target results to obtain a final detection result further comprises:
performing binarization processing on each target result, and calculating the area ratio of the mask region in the target result;
and sorting the mask-region area ratios of the plurality of target results in descending order to obtain a second ranking result, and adding corresponding target labels to the target image according to the second ranking result to obtain a final detection result.
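The binarization and ranking step can be sketched as follows (the 0.5 threshold and the `rank_results` name are assumptions; the claim only requires binarizing each target result and sorting by mask-area ratio):

```python
import numpy as np

def rank_results(result_images, labels, threshold=0.5):
    """Binarize each target result, compute the ratio of the mask region's
    area to the whole image, and sort in descending order (the second
    ranking result)."""
    scored = []
    for img, label in zip(result_images, labels):
        binary = np.asarray(img, dtype=float) > threshold  # binarization
        ratio = float(binary.mean())                       # mask area / total area
        scored.append((label, ratio))
    scored.sort(key=lambda p: p[1], reverse=True)          # large to small
    return scored
```

Taking the first entries of the returned list then yields the target labels to attach, as in the selection step of claim 6.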
6. The method of claim 5, wherein adding a corresponding target label to the target image according to the second ranking result to obtain a final detection result further comprises:
selecting a second preset number of top-ranked target results from the second ranking result, and acquiring target labels of the comparison images corresponding to the second preset number of target results;
and adding the target labels of the comparison images to the target image.
7. An object detection apparatus, comprising:
The task screening module is used for calculating the similarity between the target image to be detected and a plurality of sample images in a pre-constructed comparison sample library, and screening the plurality of sample images according to the similarity to obtain a first preset number of comparison images; the comparison sample library comprises a plurality of sample images and a result image of each sample image;
The comparison-based recognition module is used for: for each comparison image, constructing a prediction task according to the target image, a random initialization image of the target image, the comparison image, and a result image of the comparison image; and performing batch identification processing on the first preset number of prediction tasks by using a prediction model to obtain a plurality of target results corresponding to the target image; wherein the prediction model is a general image-prompted segmentation model obtained by pre-training a convolutional neural network on a segmentation sample data set, and unifies different segmentation tasks into a general in-context learning framework, used for segmenting everything in context, by converting different types of segmentation data into the same image format; each prediction task outputs one result image of the target image, and the prediction model predicts the target image with reference to the comparison image and the result image of the comparison image to obtain a target result;
And the post-processing module is used for adding corresponding target labels to the target images according to the target results to obtain a final detection result.
8. A computing device, comprising: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete communication with each other through the communication bus;
the memory is configured to store at least one executable instruction that causes the processor to perform operations corresponding to an object detection method according to any one of claims 1 to 6.
9. A computer storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to an object detection method according to any one of claims 1 to 6.
CN202311211931.1A 2023-09-19 2023-09-19 Target detection method, device, computing equipment and storage medium Active CN117218515B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311211931.1A CN117218515B (en) 2023-09-19 2023-09-19 Target detection method, device, computing equipment and storage medium

Publications (2)

Publication Number Publication Date
CN117218515A CN117218515A (en) 2023-12-12
CN117218515B true CN117218515B (en) 2024-05-03

Family

ID=89042181

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311211931.1A Active CN117218515B (en) 2023-09-19 2023-09-19 Target detection method, device, computing equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117218515B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111242066A (en) * 2020-01-17 2020-06-05 中国人民解放军国防科技大学 Large-size image target detection method and device and computer readable storage medium
CN113706558A (en) * 2021-09-06 2021-11-26 联想(北京)有限公司 Image segmentation method and device and computer equipment
WO2021254205A1 (en) * 2020-06-17 2021-12-23 苏宁易购集团股份有限公司 Target detection method and apparatus
WO2022088581A1 (en) * 2020-10-30 2022-05-05 上海商汤智能科技有限公司 Training method for image detection model, related apparatus, device, and storage medium
CN114782412A (en) * 2022-05-26 2022-07-22 马上消费金融股份有限公司 Image detection method, and training method and device of target detection model
WO2022213879A1 (en) * 2021-04-07 2022-10-13 腾讯科技(深圳)有限公司 Target object detection method and apparatus, and computer device and storage medium
WO2022252089A1 (en) * 2021-05-31 2022-12-08 京东方科技集团股份有限公司 Training method for object detection model, and object detection method and device
CN115510260A (en) * 2022-09-26 2022-12-23 武汉虹信技术服务有限责任公司 Target image retrieval method and system
WO2023284465A1 (en) * 2021-07-16 2023-01-19 腾讯科技(深圳)有限公司 Image detection method and apparatus, computer-readable storage medium, and computer device
WO2023125654A1 (en) * 2021-12-29 2023-07-06 杭州海康威视数字技术股份有限公司 Training method and apparatus for face recognition model, electronic device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Image multi-object segmentation algorithm based on fast region proposal network; Huang Jinchao; Journal of Shandong University (Engineering Science); 2018-05-25 (04); full text *

Similar Documents

Publication Publication Date Title
US10482337B2 (en) Accelerating convolutional neural network computation throughput
CN107169049B (en) Application tag information generation method and device
CN110543892A (en) part identification method based on multilayer random forest
CN105224951B (en) A kind of vehicle type classification method and sorter
CN110008899B (en) Method for extracting and classifying candidate targets of visible light remote sensing image
CN111860309A (en) Face recognition method and system
CN109492640A (en) Licence plate recognition method, device and computer readable storage medium
CN114419363A (en) Target classification model training method and device based on label-free sample data
CN113628245A (en) Multi-target tracking method, device, electronic equipment and storage medium
CN111241964A (en) Training method and device of target detection model, electronic equipment and storage medium
CN111401309B (en) CNN training and remote sensing image target identification method based on wavelet transformation
CN116343287A (en) Facial expression recognition and model training method, device, equipment and storage medium
Benhamida et al. Traffic Signs Recognition in a mobile-based application using TensorFlow and Transfer Learning technics
Sadanandan et al. Feature augmented deep neural networks for segmentation of cells
CN115546831A (en) Cross-modal pedestrian searching method and system based on multi-granularity attention mechanism
CN115100469A (en) Target attribute identification method, training method and device based on segmentation algorithm
CN110659631A (en) License plate recognition method and terminal equipment
CN117218515B (en) Target detection method, device, computing equipment and storage medium
Chaturvedi et al. Automatic license plate recognition system using surf features and rbf neural network
CN110321883A (en) Method for recognizing verification code and device, readable storage medium storing program for executing
CN115713669A (en) Image classification method and device based on inter-class relation, storage medium and terminal
da Silva et al. A transfer learning approach for the tattoo detection problem
CN111931680A (en) Vehicle weight recognition method and system based on multiple scales
CN113095185A (en) Facial expression recognition method, device, equipment and storage medium
CN112733670A (en) Fingerprint feature extraction method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant