CN114445711A - Image detection method, image detection device, electronic equipment and storage medium - Google Patents
- Publication number
- CN114445711A (application CN202210112167.1A)
- Authority
- CN
- China
- Prior art keywords
- image
- probability
- detection
- detection frame
- detected
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The disclosure provides an image detection method and apparatus, an electronic device, and a storage medium, belonging to the field of artificial intelligence and in particular to computer vision, image recognition, and deep learning technologies. The implementation scheme is as follows: an image to be detected is acquired; scene prediction is performed on the image to obtain a first probability corresponding to each scene; object detection is performed on the image to determine the detection frame information it contains and a second probability of the class to which the object in each detection frame belongs; and a detection result corresponding to the image is generated according to the first probabilities and the second probabilities. Because the final detection result combines the probability corresponding to each scene with the probability of the class to which the object in each detection frame belongs, scene differences are taken into account; false detections caused by scene differences are therefore reduced, the accuracy of image detection is improved, and the method generalizes better across different scenes.
Description
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular to computer vision, image recognition, and deep learning technologies, which can be used in smart-city and smart-traffic scenarios. More particularly, it relates to an image detection method, an image detection apparatus, an electronic device, and a storage medium.
Background
In general, the object to be detected and the background differ between images captured by different cameras even in the same geographic area, and differ still more markedly across geographic areas because of differences in facility construction; these differences cause false detections.
Improving the accuracy of image detection is therefore an urgent problem to be solved.
Disclosure of Invention
The disclosure provides an image detection method, an image detection device, an electronic device and a storage medium.
According to an aspect of the present disclosure, there is provided an image detection method including:
acquiring an image to be detected;
performing scene prediction on the image to be detected to obtain a first probability corresponding to each scene;
performing object detection on the image to be detected to determine detection frame information contained in the image to be detected and a second probability of a category to which an object in the detection frame belongs;
and generating a detection result corresponding to the image to be detected according to the first probabilities and the second probabilities.
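The four claimed steps can be sketched as follows. This is a minimal, hypothetical sketch: the `scene_predictor` and `object_detector` callables, the tuple format, and the 0.7 scene threshold are illustrative assumptions drawn from the worked examples, not part of the claims.

```python
# Hypothetical sketch of the claimed four-step flow; all names, the callable
# interfaces, and the scene_threshold value are illustrative assumptions.
def detect(image, scene_predictor, object_detector, scene_threshold=0.7):
    scene_probs = scene_predictor(image)   # first probability per scene
    detections = object_detector(image)    # [(frame_info, second_prob), ...]
    max_scene = max(scene_probs)
    if max_scene >= scene_threshold:
        # Scene considered seen during training: keep the raw object scores.
        return detections
    # Scene considered unseen: down-weight every frame score by max_scene.
    return [(frame, max_scene * p) for frame, p in detections]
```

A usage sketch: `detect(img, lambda im: [0.56, 0.5, 0.4, 0.48], lambda im: [("b1", 0.8)])` would return the frame with its score reduced to 0.56 × 0.8 = 0.448.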
According to another aspect of the present disclosure, there is provided an image detection apparatus including:
the acquisition module is used for acquiring an image to be detected;
the detection module is used for carrying out scene prediction on the image to be detected so as to obtain a first probability corresponding to each scene;
the detection module is further configured to perform object detection on the image to be detected to determine detection frame information included in the image to be detected and a second probability of a category to which an object in the detection frame belongs;
and the generating module is used for generating a detection result corresponding to the image to be detected according to the first probabilities and the second probabilities.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the above embodiments.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method according to the above-described embodiments.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method of the above embodiment.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present disclosure, nor are they intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is a schematic flowchart of an image detection method according to an embodiment of the disclosure;
fig. 2 is a schematic flowchart of an image detection method according to another embodiment of the disclosure;
fig. 3 is a schematic flowchart of an image detection method according to another embodiment of the disclosure;
fig. 4 is a schematic flowchart of an image detection method according to another embodiment of the disclosure;
fig. 5 is a schematic structural diagram of an image detection apparatus according to an embodiment of the present disclosure;
fig. 6 is a block diagram of an electronic device for implementing an image detection method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
An image detection method, an apparatus, an electronic device, and a storage medium of the embodiments of the present disclosure are described below with reference to the drawings.
Artificial intelligence is the discipline that studies how to make computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), spanning both hardware and software. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing; artificial intelligence software technologies include computer vision, speech recognition, natural language processing, deep learning, big data processing, and knowledge graph technologies.
Computer vision is the science of making machines "see": cameras and computers replace human eyes to recognize, track, and measure targets, and further image processing produces images better suited for human observation or for transmission to an instrument for detection.
Image recognition refers to techniques by which a computer processes, analyzes, and understands images in order to recognize targets and objects in various patterns; it is a practical application of deep learning algorithms.
Deep learning is a new research direction in the field of machine learning. It learns the intrinsic regularities and representation levels of sample data, and the information obtained in the process greatly helps the interpretation of data such as text, images, and sound. Its ultimate goal is to give machines an analytical and learning ability like that of humans, so that they can recognize data such as text, images, and sound.
Fig. 1 is a schematic flow chart of an image detection method according to an embodiment of the present disclosure.
The image detection method of the embodiments of the present disclosure may be executed by the image detection apparatus provided by the embodiments of the present disclosure. A final detection result is generated from the first probability corresponding to each scene, obtained by scene prediction, together with the second probability of the category to which the object in each detection frame belongs; this improves the accuracy of image detection, is applicable to images from a variety of scenes, and improves the generalization of image detection.
The electronic device may be any device with computing capability, for example, a personal computer, a mobile terminal, a server, and the like, and the mobile terminal may be a hardware device with various operating systems, touch screens, and/or display screens, such as an in-vehicle device, a mobile phone, a tablet computer, a personal digital assistant, a wearable device, and the like.
As shown in fig. 1, the image detection method includes:
Step 101, acquiring an image to be detected.
In this disclosure, the image to be detected is an image on which target detection needs to be performed. It may be acquired online, for example through web-crawler technology; acquired offline; captured in real time; synthesized manually; and so on. The target may be one or more of a vehicle, a human, an animal, etc., and the disclosure is not limited thereto.
It should be understood that the image to be detected may also be a certain frame image in a video, and the image to be detected may be extracted from the video, where the video may be referred to as the video to be detected, and an obtaining method of the video to be detected is similar to that of the image to be detected, and is not described herein again.
Step 102, performing scene prediction on the image to be detected to obtain a first probability corresponding to each scene.
In practical applications, when a model is trained on images captured by one camera and then used for target detection on images captured by another camera, false detections may occur because the object to be detected and the background differ between the two cameras even in the same geographic area; that is, the scenes corresponding to the two cameras differ. In addition, facility construction differs across geographic areas, and these scene differences can also cause false detections.
Based on this, in the present disclosure, a scene prediction branch may be added to a training network, and a detection network may be obtained through training, where the detection network includes the scene prediction branch.
In the present disclosure, the scenes may be divided according to different levels, such as a camera level, a street level, a city level, and the like. During training, training can be performed according to images of scenes at different levels respectively, so as to obtain a detection network capable of performing scene prediction at a corresponding level. Thus, different image detection requirements can be met.
During detection, a pre-trained backbone network in the detection network may be used to extract features from the image to be detected and obtain a feature vector, and scene prediction may then be performed with the feature vector and the scene prediction branch to obtain the first probability corresponding to each scene. The number of scenes is the same as the number of scenes in the training set used for model training; for example, if there are 4 scenes, a first probability corresponding to each of the 4 scenes is obtained.
To keep the feature extraction result accurate while saving resources, a suitable backbone network may be selected according to the application scenario of the service. For example, backbones can be divided into lightweight structures (such as ResNet18, ResNet34, and DarkNet19), medium structures (such as ResNet50, ResNeXt50 — ResNeXt combines ResNet with Inception — and DarkNet53), and heavy structures (such as ResNet101 and ResNeXt152); the specific network structure can be chosen to match the application scenario.
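The backbone grouping above can be written as a small lookup table. The dictionary, profile names, and selection helper are illustrative assumptions for this sketch, not part of the patent.

```python
# Hypothetical profile-to-backbone table following the description's grouping;
# the profile names and the pick_backbone helper are illustrative assumptions.
BACKBONES = {
    "light":  ("ResNet18", "ResNet34", "DarkNet19"),
    "medium": ("ResNet50", "ResNeXt50", "DarkNet53"),
    "heavy":  ("ResNet101", "ResNeXt152"),
}

def pick_backbone(profile: str) -> str:
    # Return the first candidate backbone for the requested resource profile.
    return BACKBONES[profile][0]
```

For instance, a resource-constrained edge deployment would map to `pick_backbone("light")`, i.e. ResNet18.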
In practical application, the detection networks of different scene prediction levels can be trained according to requirements, for example, the detection networks can predict the first probability that the image to be detected belongs to each scene, and for example, the detection networks can also predict the first probability that each object in the image to be detected belongs to each scene.
Step 103, performing object detection on the image to be detected to determine the detection frame information contained in the image and the second probability of the class to which the object in each detection frame belongs.
In the present disclosure, the detection network includes object detection branches, wherein the object detection branches can perform classification and regression prediction. After the feature vector corresponding to the image to be detected is obtained, classification and regression prediction can be performed by using the feature vector and the object detection branch, and the detection frame information contained in the image to be detected and the second probability of the class to which the object in the detection frame belongs are determined.
The number of the detection frames contained in the image to be detected may be one, multiple or none; each detection frame information may include information such as a position and a size of the detection frame, and the category to which the object belongs may be a vehicle, a person, an animal, or the like, or may also be a type of the vehicle, such as a truck, a car, or the like, which is not limited in this disclosure.
In the present disclosure, the second probability of the class to which the object in a detection frame belongs may be the maximum among the probabilities that the object belongs to each class. For example, if a vehicle is detected in the image, the second probability is the probability that the object in the detection frame is a vehicle. As another example, suppose the type of vehicle is being detected, where the types are truck and car, and the probability that the object in a certain detection frame is a truck is 0.8 while the probability that it is a car is 0.5; the class to which the object belongs is then truck, and the second probability is 0.8.
For example, suppose the task is to detect vehicles in the image to be detected. Object detection is performed on the image to obtain the detection frame information it contains and the probability that the object in each detection frame is a vehicle; for instance, the image contains three detection frames, and the probabilities that the objects in them are vehicles are 0.6, 0.8, and 0.9, respectively.
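The truck/car example above amounts to taking the maximum over per-class scores. A minimal sketch, assuming the per-class scores arrive as a plain dictionary (an illustrative representation, not the patent's data format):

```python
# Illustrative: the second probability is the maximum of the per-class scores
# predicted for one detection frame (truck 0.8 vs. car 0.5 in the example).
def second_probability(class_scores: dict) -> tuple:
    label = max(class_scores, key=class_scores.get)
    return label, class_scores[label]
```

With the example's scores, `second_probability({"truck": 0.8, "car": 0.5})` yields the class "truck" with second probability 0.8.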
Step 104, generating a detection result corresponding to the image to be detected according to the first probabilities and the second probabilities.
In the present disclosure, the first probability corresponding to each scene can be regarded as the probability that the scene to which the image (or the object in a detection frame) belongs appeared in the training set. For example, if the largest of the first probabilities, 0.3 for scene A, is below the threshold 0.6, the scene to which the image belongs can be considered not to have appeared in the training set; the probability of the class to which the object in each detection frame belongs is then weighted by the first probability corresponding to scene A, which clearly lowers the scores of false detections. Determining the final detection result of the image in combination with the scene probability therefore improves the accuracy of image detection and reduces the probability of false detection, compared with directly outputting the object detection result.
The image detection method disclosed by the embodiment of the disclosure can be used for smart cities and intelligent traffic scenes, for example, a certain type of vehicle in a certain road monitoring video can be detected, or the type of the vehicle appearing in each time period in a certain road video can be detected, and the like.
In the embodiment of the disclosure, an image to be detected is acquired; scene prediction is performed on it to obtain a first probability corresponding to each scene; object detection is performed to determine the detection frame information it contains and a second probability of the class to which the object in each detection frame belongs; and a detection result is generated according to the first probabilities and the second probabilities. Because the final detection result combines the probability corresponding to each scene with the probability of the class to which the object in each detection frame belongs, scene differences are taken into account, which reduces false detections caused by scene differences, improves the accuracy of image detection, and improves the generalization of the method across scenes.
In an embodiment of the disclosure, scene prediction may predict the probability that the image to be detected belongs to each scene; when that probability is smaller than a first threshold, the second probability corresponding to each detection frame may be updated according to it, and the detection result of the image generated from the updated probabilities. Fig. 2 is a schematic flowchart of an image detection method according to another embodiment of the present disclosure.
As shown in fig. 2, the image detection method includes:
Step 201, acquiring an image to be detected.
Step 202, performing scene prediction on the image to be detected to obtain a first probability corresponding to each scene.
Step 203, performing object detection on the image to be detected to determine the detection frame information contained in the image and the second probability of the class to which the object in each detection frame belongs.
In the present disclosure, steps 201 to 203 are similar to those described in the above embodiments, and therefore are not described herein again.
Step 204, in the case where the first probability is the probability that the image to be detected belongs to each scene, determining the maximum probability from the first probabilities.
In the present disclosure, if the detection network can predict the scene at the picture level, the detection network may be used to perform scene prediction on the image to be detected, so as to obtain a first probability that the image to be detected belongs to each scene.
Under the condition that the first probability is the probability that the image to be detected belongs to each scene, the maximum probability corresponding to the image to be detected can be determined from the first probability that the image to be detected belongs to each scene. The scene corresponding to the maximum probability can be regarded as the scene to which the image to be detected belongs.
For example, there are 4 scenes A1, A2, A3, and A4, and the first probabilities that a certain image a to be detected belongs to each scene are 0.56, 0.5, 0.4, and 0.48, respectively. The probability 0.56 that image a belongs to scene A1 is the largest, so the scene to which the image belongs can be considered to be A1.
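The scene assignment in the example above is an argmax over the first probabilities. A minimal sketch, assuming the first probabilities arrive keyed by scene name (an illustrative representation):

```python
# Illustrative argmax over the first probabilities from the worked example:
# the scene with the largest first probability is taken as the image's scene.
scene_probs = {"A1": 0.56, "A2": 0.5, "A3": 0.4, "A4": 0.48}
best_scene = max(scene_probs, key=scene_probs.get)
best_prob = scene_probs[best_scene]
```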
Step 205, in the case where the maximum probability is smaller than the first threshold, updating the second probability corresponding to each detection frame according to the maximum probability to obtain a third probability corresponding to each detection frame.
The second probability corresponding to each detection frame refers to the second probability of the category to which the object in each detection frame belongs.
In this disclosure, the maximum probability may be compared with a first threshold, and if the maximum probability is smaller than the first threshold, it may be considered that a scene corresponding to the maximum probability does not appear in the training set, and then the second probability corresponding to each detection frame in the image to be detected may be updated according to the maximum probability to obtain a third probability corresponding to each detection frame.
The third probability corresponding to each detection frame refers to the probability of the class to which the object in the detection frame belongs after the second probability of the class to which the object in the detection frame belongs is updated by using the maximum probability of the scene to which the image to be detected belongs.
During updating, the maximum probability may be multiplied by the second probability corresponding to each detection frame to obtain a third probability, so as to weight the second probability of the category to which the object in each detection frame belongs, so as to reduce the probability of high-score false detection.
For example, continuing the example above, the first probability 0.56 corresponding to scene A1 is the maximum probability and is smaller than the first threshold 0.7. The image to be detected contains 3 detection frames whose objects are B1, B2, and B3, with second probabilities of belonging to the vehicle class of 0.8, 0.9, and 0.9, respectively; in fact, B1 is not a vehicle. Multiplying 0.8 by 0.56 gives 0.448, which lowers the probability that B1 is classified as a vehicle and thus reduces the false detection probability.
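The re-weighting in the example above is a per-frame multiplication. A minimal sketch using the example's numbers (the list layout is an illustrative assumption):

```python
# Illustrative re-weighting: when the maximum scene probability (0.56) is
# below the first threshold, each second probability is multiplied by it
# to obtain the third probability for that detection frame.
max_scene_prob = 0.56
second_probs = [0.8, 0.9, 0.9]   # frames B1, B2, B3, class "vehicle"
third_probs = [round(max_scene_prob * p, 3) for p in second_probs]
```

B1's score drops from 0.8 to 0.448, pulling the likely false detection below typical score thresholds.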
It should be noted that the first threshold in the present disclosure may be determined according to actual needs, and the present disclosure does not limit this.
Step 206, generating a detection result corresponding to the image to be detected according to the third probability corresponding to each detection frame.
According to the third probability corresponding to each detection frame, the target detection frame can be determined from each detection frame, and the detection result corresponding to the image to be detected is generated according to the information of the target detection frame and the category to which the object in the target detection frame belongs.
When determining the target detection frame, the third probability corresponding to each detection frame may be compared with the probability threshold, and the detection frame with the third probability greater than the probability threshold may be used as the target detection frame. Alternatively, the third probabilities may be sorted in descending order, and the detection frames corresponding to the probabilities of the first preset number may be used as the target detection frames.
For example, continuing the example above, the second probabilities that B1, B2, and B3 belong to the vehicle class are 0.8, 0.9, and 0.9; multiplied by the maximum probability 0.56, they become 0.448, 0.504, and 0.504. Detection frames with a probability greater than 0.5 can then be taken as target detection frames, i.e., the frames containing B2 and B3; the generated detection result of the image to be detected includes the information of those two frames and the categories to which B2 and B3 belong.
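The two selection strategies described above (threshold, or descending sort with top-k) can be sketched in one helper; the function and its parameters are illustrative assumptions:

```python
# Illustrative selection of target detection frames: keep frames whose third
# probability exceeds a threshold, or keep the top_k highest-scoring frames.
def select_targets(third_probs, threshold=None, top_k=None):
    if threshold is not None:
        return [i for i, p in enumerate(third_probs) if p > threshold]
    ranked = sorted(range(len(third_probs)),
                    key=lambda i: third_probs[i], reverse=True)
    return ranked[:top_k]
```

With the example's scores and a 0.5 threshold, the frames of B2 and B3 (indices 1 and 2) are selected and B1 is screened out.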
Therefore, the target detection frame is determined according to the updated third probability, and the final detection result is generated based on the target frame information and the category to which the object in the target detection frame belongs, so that the object which is detected by mistake can be effectively screened out, and the detection accuracy is improved.
Alternatively, when generating the detection result, the target detection frames may be determined according to a weighted sum of the third probability and the second probability corresponding to each detection frame, and the detection result then generated from the target detection frame information and the categories of the objects in the target frames. The weights assigned to the third and second probabilities may be determined according to actual needs, which the present disclosure does not limit.
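A sketch of the weighted-sum variant; since the disclosure leaves the weights to actual needs, the 0.7/0.3 split here is purely an assumed example:

```python
# Illustrative weighted combination of the third and second probabilities;
# the 0.7/0.3 weights are assumed for this sketch, not fixed by the patent.
def combined_score(third, second, w_third=0.7, w_second=0.3):
    return w_third * third + w_second * second
```

For B2 in the running example this gives 0.7 × 0.504 + 0.3 × 0.9 ≈ 0.623, letting the raw detector confidence partly offset the scene down-weighting.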
In the embodiment of the disclosure, when the detection result is generated from the first and second probabilities, and the first probability is the probability that the image belongs to each scene, the maximum of those probabilities is determined, the second probability of the class to which the object in each detection frame belongs is updated based on that maximum, and the detection result is generated from the updated probabilities.
In an embodiment of the present disclosure, the method shown in fig. 3 may also be used to generate a detection result corresponding to the image to be detected. Fig. 3 is a schematic flowchart of an image detection method according to another embodiment of the disclosure.
As shown in fig. 3, the image detection method includes:
Step 301, acquiring an image to be detected.
Step 302, performing scene prediction on the image to be detected to obtain a first probability corresponding to each scene.
Step 303, performing object detection on the image to be detected to determine the detection frame information contained in the image and the second probability of the class to which the object in each detection frame belongs.
Step 304, in the case where the first probability is the probability that the image to be detected belongs to each scene, determining the maximum probability from the first probabilities.
Step 305, in the case where the maximum probability is smaller than the first threshold, updating the second probability corresponding to each detection frame according to the maximum probability to obtain a third probability corresponding to each detection frame.
Step 306, generating a detection result corresponding to the image to be detected according to the third probability corresponding to each detection frame.
In the present disclosure, steps 301 to 306 are similar to those described in the above embodiments, and therefore are not described herein again.
Step 307, under the condition that the maximum probability is greater than or equal to the first threshold, generating a detection result corresponding to the image to be detected according to the detection frame information and the category to which the object in the detection frame belongs.
In the disclosure, the scene corresponding to the maximum probability among the first probabilities may be regarded as the scene to which the image to be detected belongs. When the maximum probability is greater than or equal to the first threshold, the scene to which the image to be detected belongs may be considered to have appeared in the training set; at this time, the detection result of the image to be detected may be generated directly from the detection frame information obtained by performing object detection on the image to be detected and the category to which the object in each detection frame belongs.
Therefore, when the scene to which the image to be detected belongs is a scene appearing in the training set, the detection result is generated directly according to the detected detection frame information and the class to which the object in the detection frame belongs, and the detection accuracy is improved.
For example, suppose there are 4 scenes a1, a2, a3, and a4, the first probabilities that a certain image b to be detected belongs to the scenes are 0.8, 0.6, 0.5, and 0.5, respectively, and image b contains 3 detection frames. The first probability 0.8 that image b belongs to scene a1 is the maximum probability and is greater than the first threshold 0.7, so the detection result of image b can be generated directly from the 3 detected detection frames and the category to which the object in each frame belongs.
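The image-level gating of steps 304 to 307 can be sketched as follows. The multiplicative down-weighting in the unfamiliar-scene branch is an assumption borrowed from the object-level update described later in the disclosure (the image-level update rule is not spelled out here), and the threshold value follows the worked example:

```python
def gate_by_scene(scene_probs, class_probs, first_threshold=0.7):
    """If the most likely scene clears the first threshold, the scene is assumed
    to have appeared in the training set and the class probabilities are used
    as-is; otherwise each second probability is down-weighted by the maximum
    scene probability to give the third probabilities (an assumed update rule)."""
    max_prob = max(scene_probs)
    if max_prob >= first_threshold:
        return class_probs
    return [max_prob * p for p in class_probs]

# Scene a1 has probability 0.8 >= 0.7, so the detections are used directly.
probs = gate_by_scene([0.8, 0.6, 0.5, 0.5], [0.9, 0.7, 0.6])
```

When no scene probability reaches the threshold, every frame's score shrinks, which is what suppresses high-score false detections in unfamiliar scenes.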
In the embodiment of the disclosure, under the condition that the first probability is the probability that the image to be detected belongs to each scene, if the maximum probability in the first probabilities is smaller than the first threshold, the second probability of the category to which the object in each detection frame belongs may be updated based on the maximum probability, and the detection result is generated based on the updated probability; if the maximum probability in the first probabilities is greater than or equal to the first threshold, the final detection result can be generated directly according to the detection frame information contained in the image to be detected and the class to which the object in the detection frame belongs. Therefore, the final detection result of the image to be detected is generated by combining the probability that the image to be detected belongs to each scene, and the detection accuracy is improved.
In an embodiment of the present disclosure, when the first probability is a probability that the image to be detected belongs to each scene, the second probability corresponding to each detection frame may also be directly updated according to a maximum probability of the first probabilities corresponding to the image to be detected to generate a third probability, and a final detection result may be generated according to the third probability corresponding to each detection frame.
In an embodiment of the present disclosure, when scene prediction is performed, the probability that each object in the image to be detected belongs to each scene may be predicted, and a detection result of the image to be detected is generated according to the probability that each object belongs to each scene and the corresponding second probability. Fig. 4 is a schematic flowchart of an image detection method according to another embodiment of the present disclosure.
As shown in fig. 4, the image detection method includes:
Step 403, performing object detection on the image to be detected to determine the detection frame information contained in the image to be detected and the second probability of the category to which the object in the detection frame belongs.
In the present disclosure, steps 401 to 403 are similar to those described in the above embodiments, and therefore are not described herein again.
In the present disclosure, if the detection network predicts scenes at the object level, the detection network may be used to perform scene prediction on the image to be detected to obtain a first probability that each object in the image to be detected belongs to each scene.
Under the condition that the first probability is the probability that an object in each detection frame in the image to be detected belongs to each scene, the maximum first probability corresponding to each detection frame can be determined from each first probability corresponding to each detection frame. The scene corresponding to the maximum first probability corresponding to each detection frame may be considered as the scene to which the object in each detection frame belongs.
For example, suppose there are 3 scenes C1, C2, and C3, and a certain image c to be detected contains two detection frames M and N, where the first probabilities that the object in frame M belongs to the scenes are 0.7, 0.4, and 0.5, respectively, and the first probabilities that the object in frame N belongs to the scenes are 0.3, 0.2, and 0.6, respectively. It can then be determined that the maximum first probability corresponding to frame M is 0.7 and the scene to which the object in frame M belongs is C1, while the maximum first probability corresponding to frame N is 0.6 and the scene to which the object in frame N belongs is C3.
In this disclosure, the maximum first probability corresponding to each detection frame may be compared with the second threshold, and if the maximum first probability corresponding to any detection frame is smaller than the second threshold, it may be considered that a scene corresponding to the maximum first probability corresponding to the detection frame does not appear in the training set, and then the second probability corresponding to any detection frame may be updated according to the maximum first probability corresponding to any detection frame, so as to obtain the fourth probability corresponding to any detection frame.
When updating, the maximum first probability corresponding to any detection frame may be multiplied by the second probability corresponding to that detection frame to obtain the fourth probability corresponding to the frame, so that the second probability of the category to which the object in the frame belongs is down-weighted and the chance of a high-score false detection is reduced.
For example, continuing the above example, the maximum first probability 0.6 corresponding to detection frame N is smaller than the second threshold 0.65; assuming the second probability of the category to which the object in frame N belongs is 0.8, the updated probability is 0.6 × 0.8 = 0.48. In contrast, the maximum first probability 0.7 corresponding to detection frame M is greater than the second threshold 0.65, so the second probability of the category to which the object in frame M belongs is not updated.
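The per-frame update just described can be sketched as follows; the threshold value 0.65 follows the worked example:

```python
def update_per_frame(scene_probs_per_frame, class_probs, second_threshold=0.65):
    """For each detection frame, compare its maximum scene probability with the
    second threshold; below the threshold, multiply the class probability by
    that maximum to obtain the fourth probability, otherwise leave the second
    probability unchanged."""
    updated = []
    for scene_probs, p in zip(scene_probs_per_frame, class_probs):
        max_first = max(scene_probs)
        updated.append(max_first * p if max_first < second_threshold else p)
    return updated

# Frames M and N from the example: N (max 0.6 < 0.65) is down-weighted to 0.48.
scores = update_per_frame([[0.7, 0.4, 0.5], [0.3, 0.2, 0.6]], [0.8, 0.8])
```

Only frames whose strongest scene looks unfamiliar are penalized; frames in well-represented scenes keep their original scores.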
It should be noted that the second threshold in the present disclosure may be determined according to actual needs, and the present disclosure does not limit this.
Step 406, generating a detection result corresponding to the image to be detected according to the fourth probability corresponding to any detection frame.
In the disclosure, the target detection frame may be determined according to the fourth probability corresponding to any detection frame and the second probability corresponding to the detection frame whose maximum first probability in the to-be-detected image is greater than or equal to the second threshold, and the detection result corresponding to the to-be-detected image may be generated according to the target detection frame and the category to which the object in the target detection frame belongs.
That is to say, if the second probabilities corresponding to some detection frames in the image to be detected are updated to obtain the fourth probabilities, the final detection result of the image to be detected can be generated according to the fourth probabilities corresponding to the detection frames and the second probabilities corresponding to the detection frames which are not updated in the image to be detected.
In the present disclosure, the method for determining the target detection frame is similar to the method for determining the target detection frame in the above embodiment, and therefore is not described herein again.
For example, continuing the above example, the fourth probability corresponding to detection frame N is 0.48 and the probability of the category to which the object in detection frame M belongs is 0.8. With the threshold set to 0.6, detection frame M, whose probability is greater than 0.6, is taken as the target detection frame, and the detection result of the image c to be detected is generated based on detection frame M and the category to which the object in detection frame M belongs.
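The final selection over the mixed updated and non-updated scores can be sketched as follows; the frame labels and the threshold value 0.6 follow the worked example:

```python
def select_target_frames(frames, scores, threshold=0.6):
    """Keep the frames whose score (a fourth probability if updated, otherwise
    the original second probability) exceeds the threshold; those frames and
    their categories form the detection result."""
    return [f for f, s in zip(frames, scores) if s > threshold]

# N carries its fourth probability 0.48; M keeps its second probability 0.8.
targets = select_target_frames(["N", "M"], [0.48, 0.8])
```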
In the embodiment of the disclosure, when the first probability is the probability that the object in each detection frame belongs to each scene, the maximum first probability among the first probabilities corresponding to each detection frame is determined. When the maximum first probability corresponding to any detection frame is smaller than the second threshold, the second probability corresponding to that frame is updated according to its maximum first probability, and the detection result corresponding to the image to be detected is then generated according to the probability of the category to which the object in each frame belongs. The detection result is thus generated in combination with object-level scene prediction, which improves the detection accuracy.
In order to implement the above embodiments, the embodiments of the present disclosure further provide an image detection apparatus. Fig. 5 is a schematic structural diagram of an image detection apparatus according to an embodiment of the present disclosure.
As shown in fig. 5, the image detection apparatus 500 includes:
an obtaining module 510, configured to obtain an image to be detected;
the detection module 520 is configured to perform scene prediction on the image to be detected to obtain a first probability corresponding to each scene;
the detection module 520 is further configured to perform object detection on the image to be detected, so as to determine detection frame information included in the image to be detected and a second probability of a category to which an object in the detection frame belongs;
a generating module 530, configured to generate a detection result corresponding to the image to be detected according to each of the first probabilities and each of the second probabilities.
In a possible implementation manner of the embodiment of the present disclosure, the generating module 530 is configured to:
determining the maximum probability from the first probabilities under the condition that the first probabilities are the probabilities that the images to be detected belong to each scene;
under the condition that the maximum probability is smaller than a first threshold value, updating a second probability corresponding to each detection frame according to the maximum probability to obtain a third probability corresponding to each detection frame;
and generating a detection result corresponding to the image to be detected according to the third probability corresponding to each detection frame.
In a possible implementation manner of the embodiment of the present disclosure, the generating module 530 is configured to:
determining a target detection frame according to the third probability corresponding to each detection frame;
and generating a detection result corresponding to the image to be detected according to the target detection frame information and the category of the object in the target detection frame.
In a possible implementation manner of the embodiment of the present disclosure, the generating module 530 is configured to:
and under the condition that the maximum probability is greater than or equal to the first threshold, generating a detection result corresponding to the image to be detected according to the detection frame information and the class to which the object in the detection frame belongs.
In a possible implementation manner of the embodiment of the present disclosure, the generating module 530 is configured to:
determining the maximum first probability corresponding to each detection frame according to each first probability corresponding to each detection frame under the condition that the first probability is the probability that the object in each detection frame belongs to each scene;
under the condition that the maximum first probability corresponding to any detection frame is smaller than a second threshold value, updating the second probability according to the maximum first probability corresponding to any detection frame to obtain a fourth probability corresponding to any detection frame;
and generating a detection result corresponding to the image to be detected according to the fourth probability corresponding to any detection frame.
It should be noted that the explanation of the foregoing embodiment of the image detection method is also applicable to the image detection apparatus of this embodiment, and therefore, the description thereof is omitted here.
In the embodiment of the disclosure, an image to be detected is obtained; scene prediction is carried out on an image to be detected so as to obtain a first probability corresponding to each scene; carrying out object detection on the image to be detected so as to determine detection frame information contained in the image to be detected and a second probability of the class to which the object in the detection frame belongs; and generating a detection result corresponding to the image to be detected according to the first probabilities and the second probabilities. Therefore, the final detection result of the image to be detected is generated by combining the probability corresponding to each scene and the probability of the class to which the object in the detection frame belongs, and the scene difference is considered, so that the problem of false detection caused by the scene difference can be reduced, the accuracy of image detection is improved, and the generalization of the image detection method in different scenes is improved.
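Putting the modules together, a minimal end-to-end sketch of the disclosed pipeline might look as follows. The two model callables are placeholders standing in for the scene-prediction and object-detection networks, and the threshold values are illustrative assumptions:

```python
def detect(image, scene_model, det_model, first_threshold=0.7, score_threshold=0.6):
    """Acquire -> scene prediction -> object detection -> fused result.
    scene_model returns the first probabilities; det_model returns detection
    frames and the second probabilities (both are stand-in callables)."""
    scene_probs = scene_model(image)
    frames, class_probs = det_model(image)
    max_scene = max(scene_probs)
    if max_scene < first_threshold:  # unfamiliar scene: down-weight the scores
        class_probs = [max_scene * p for p in class_probs]
    return [(f, p) for f, p in zip(frames, class_probs) if p > score_threshold]

# An unfamiliar scene (max 0.5) suppresses a would-be high-score false detection.
result = detect("img", lambda im: [0.5, 0.3], lambda im: ([(0, 0, 10, 10)], [0.9]))
```

With a familiar scene (maximum scene probability at or above the first threshold), the same call returns the raw detections unchanged.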
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a ROM (Read-Only Memory) 602 or a computer program loaded from a storage unit 608 into a RAM (Random Access Memory) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An I/O (Input/Output) interface 605 is also connected to the bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, FPGAs (Field Programmable Gate Arrays), ASICs (Application-Specific Integrated Circuits), ASSPs (Application Specific Standard Products), SOCs (System On Chip), CPLDs (Complex Programmable Logic Devices), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a RAM, a ROM, an EPROM (Electrically Programmable Read-Only-Memory) or flash Memory, an optical fiber, a CD-ROM (Compact Disc Read-Only-Memory), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a Display device (e.g., a CRT (Cathode Ray Tube) or LCD (Liquid Crystal Display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: LAN (Local Area Network), WAN (Wide Area Network), internet, and blockchain Network.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The Server may be a cloud Server, which is also called a cloud computing Server or a cloud host, and is a host product in a cloud computing service system, so as to solve the defects of high management difficulty and weak service expansibility in a conventional physical host and a VPS (Virtual Private Server). The server may also be a server of a distributed system, or a server incorporating a blockchain.
According to an embodiment of the present disclosure, the present disclosure further provides a computer program product; when the instructions in the computer program product are executed by a processor, the image detection method proposed by the above embodiments of the present disclosure is performed.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.
Claims (13)
1. An image detection method, comprising:
acquiring an image to be detected;
scene prediction is carried out on the image to be detected so as to obtain a first probability corresponding to each scene;
performing object detection on the image to be detected to determine detection frame information contained in the image to be detected and a second probability of a category to which an object in the detection frame belongs;
and generating a detection result corresponding to the image to be detected according to the first probabilities and the second probabilities.
2. The method of claim 1, wherein the generating a detection result corresponding to the image to be detected according to each of the first probabilities and each of the second probabilities comprises:
determining the maximum probability from the first probabilities under the condition that the first probabilities are the probabilities that the images to be detected belong to each scene;
under the condition that the maximum probability is smaller than a first threshold value, updating a second probability corresponding to each detection frame according to the maximum probability to obtain a third probability corresponding to each detection frame;
and generating a detection result corresponding to the image to be detected according to the third probability corresponding to each detection frame.
3. The method as claimed in claim 2, wherein the generating a detection result corresponding to the image to be detected according to the third probability corresponding to each of the detection frames comprises:
determining a target detection frame according to the third probability corresponding to each detection frame;
and generating a detection result corresponding to the image to be detected according to the target detection frame information and the category of the object in the target detection frame.
4. The method of claim 2, wherein after said determining a maximum probability from said first probabilities, further comprising:
and under the condition that the maximum probability is greater than or equal to the first threshold, generating a detection result corresponding to the image to be detected according to the detection frame information and the class to which the object in the detection frame belongs.
5. The method as claimed in claim 1, wherein said generating a detection result corresponding to the image to be detected according to each of the first probabilities and each of the second probabilities comprises:
determining the maximum first probability corresponding to each detection frame according to each first probability corresponding to each detection frame under the condition that the first probability is the probability that the object in each detection frame belongs to each scene;
under the condition that the maximum first probability corresponding to any detection frame is smaller than a second threshold value, updating the second probability according to the maximum first probability corresponding to any detection frame to obtain a fourth probability corresponding to any detection frame;
and generating a detection result corresponding to the image to be detected according to the fourth probability corresponding to any detection frame.
6. An image detection apparatus comprising:
the acquisition module is used for acquiring an image to be detected;
the detection module is used for carrying out scene prediction on the image to be detected so as to obtain a first probability corresponding to each scene;
the detection module is further configured to perform object detection on the image to be detected to determine detection frame information included in the image to be detected and a second probability of a category to which an object in the detection frame belongs;
and the generating module is used for generating a detection result corresponding to the image to be detected according to the first probabilities and the second probabilities.
7. The apparatus of claim 6, wherein the means for generating is configured to:
determining the maximum probability from the first probabilities under the condition that the first probabilities are the probabilities that the images to be detected belong to each scene;
under the condition that the maximum probability is smaller than a first threshold value, updating a second probability corresponding to each detection frame according to the maximum probability to obtain a third probability corresponding to each detection frame;
and generating a detection result corresponding to the image to be detected according to the third probability corresponding to each detection frame.
8. The apparatus of claim 7, wherein the generating means is configured to:
determining a target detection frame according to the third probability corresponding to each detection frame;
and generating a detection result corresponding to the image to be detected according to the target detection frame information and the category of the object in the target detection frame.
9. The apparatus of claim 7, wherein the generating means is configured to:
and under the condition that the maximum probability is greater than or equal to the first threshold, generating a detection result corresponding to the image to be detected according to the detection frame information and the class to which the object in the detection frame belongs.
10. The apparatus of claim 6, wherein the means for generating is configured to:
determining the maximum first probability corresponding to each detection frame according to each first probability corresponding to each detection frame under the condition that the first probability is the probability that the object in each detection frame belongs to each scene;
under the condition that the maximum first probability corresponding to any detection frame is smaller than a second threshold value, updating a second probability of the detection frame according to the maximum first probability corresponding to the detection frame to obtain a fourth probability corresponding to the detection frame;
and generating a detection result corresponding to the image to be detected according to the fourth probability corresponding to any detection frame.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-5.
13. A computer program product comprising a computer program which, when executed by a processor, carries out the steps of the method of any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210112167.1A CN114445711B (en) | 2022-01-29 | 2022-01-29 | Image detection method, image detection device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114445711A true CN114445711A (en) | 2022-05-06 |
CN114445711B CN114445711B (en) | 2023-04-07 |
Family
ID=81371708
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210112167.1A Active CN114445711B (en) | 2022-01-29 | 2022-01-29 | Image detection method, image detection device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114445711B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114937185A (en) * | 2022-06-07 | 2022-08-23 | 阿波罗智联(北京)科技有限公司 | Image sample acquisition method and device, electronic equipment and storage medium |
Patent Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103049763A (en) * | 2012-12-07 | 2013-04-17 | 华中科技大学 | Context-constraint-based target identification method |
CN106845352A (en) * | 2016-12-23 | 2017-06-13 | 北京旷视科技有限公司 | Pedestrian detection method and device |
CN108846351A (en) * | 2018-06-08 | 2018-11-20 | Oppo广东移动通信有限公司 | Image processing method, device, electronic equipment and computer readable storage medium |
WO2019233392A1 (en) * | 2018-06-08 | 2019-12-12 | Oppo广东移动通信有限公司 | Image processing method and apparatus, electronic device, and computer-readable storage medium |
US20200202128A1 (en) * | 2018-12-21 | 2020-06-25 | Samsung Electronics Co., Ltd. | System and method for providing dominant scene classification by semantic segmentation |
CN109727268A (en) * | 2018-12-29 | 2019-05-07 | 西安天和防务技术股份有限公司 | Method for tracking target, device, computer equipment and storage medium |
CN109784290A (en) * | 2019-01-23 | 2019-05-21 | 科大讯飞股份有限公司 | Object detection method, apparatus, device, and readable storage medium |
CN111291785A (en) * | 2020-01-16 | 2020-06-16 | 中国平安人寿保险股份有限公司 | Target detection method, device, equipment and storage medium |
CN112651292A (en) * | 2020-10-01 | 2021-04-13 | 新加坡依图有限责任公司(私有) | Video-based human body action recognition method, device, medium and electronic equipment |
CN112966697A (en) * | 2021-03-17 | 2021-06-15 | 西安电子科技大学广州研究院 | Target detection method, device and equipment based on scene semantics and storage medium |
CN113205037A (en) * | 2021-04-28 | 2021-08-03 | 北京百度网讯科技有限公司 | Event detection method and device, electronic equipment and readable storage medium |
CN113569708A (en) * | 2021-07-23 | 2021-10-29 | 北京百度网讯科技有限公司 | Living body recognition method, living body recognition device, electronic apparatus, and storage medium |
CN113705381A (en) * | 2021-08-11 | 2021-11-26 | 北京百度网讯科技有限公司 | Target detection method and device in foggy days, electronic equipment and storage medium |
CN113723305A (en) * | 2021-08-31 | 2021-11-30 | 北京百度网讯科技有限公司 | Image and video detection method, device, electronic equipment and medium |
CN113837065A (en) * | 2021-09-22 | 2021-12-24 | 上海商汤智能科技有限公司 | Image processing method and device |
Non-Patent Citations (3)
Title |
---|
Yating Gu et al.: "A Survey on Deep Learning-Driven Remote Sensing Image Scene Understanding: Scene Classification, Scene Retrieval and Scene-Guided Object Detection", Applied Sciences * |
Zhengzhou Li et al.: "Remote Sensing Image Scene Classification Based on Object Relationship Reasoning CNN", IEEE Geoscience and Remote Sensing Letters * |
Fu Xiaoya et al.: "Fast Detection Method for Nearshore SAR Ship Targets Combined with Scene Classification", Journal of Signal Processing * |
Also Published As
Publication number | Publication date |
---|---|
CN114445711B (en) | 2023-04-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113191256B (en) | Training method and device of lane line detection model, electronic equipment and storage medium | |
CN113657465A (en) | Pre-training model generation method and device, electronic equipment and storage medium | |
CN113012176B (en) | Sample image processing method and device, electronic equipment and storage medium | |
CN113177968A (en) | Target tracking method and device, electronic equipment and storage medium | |
CN113642431A (en) | Training method and device of target detection model, electronic equipment and storage medium | |
CN113947188A (en) | Training method of target detection network and vehicle detection method | |
CN112613569A (en) | Image recognition method, and training method and device of image classification model | |
CN114332977A (en) | Key point detection method and device, electronic equipment and storage medium | |
CN113869205A (en) | Object detection method and device, electronic equipment and storage medium | |
CN114220163B (en) | Human body posture estimation method and device, electronic equipment and storage medium | |
CN113344121B (en) | Method for training a sign classification model and sign classification | |
CN114445711B (en) | Image detection method, image detection device, electronic equipment and storage medium | |
CN113869317A (en) | License plate recognition method and device, electronic equipment and storage medium | |
CN114120454A (en) | Training method and device of living body detection model, electronic equipment and storage medium | |
CN113989720A (en) | Target detection method, training method, device, electronic equipment and storage medium | |
CN113657596A (en) | Method and device for training model and image recognition | |
CN115937993A (en) | Living body detection model training method, living body detection device and electronic equipment | |
CN115147814A (en) | Recognition method of traffic indication object and training method of target detection model | |
CN114445668A (en) | Image recognition method and device, electronic equipment and storage medium | |
CN114120180A (en) | Method, device, equipment and medium for generating time sequence nomination | |
CN113887414A (en) | Target detection method, target detection device, electronic equipment and storage medium | |
CN113806361B (en) | Method, device and storage medium for associating electronic monitoring equipment with road | |
CN114580631B (en) | Model training method, smoke and fire detection method, device, electronic equipment and medium | |
CN115294536B (en) | Violation detection method, device, equipment and storage medium based on artificial intelligence | |
CN113887423A (en) | Target detection method, target detection device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||