CN111402185B - Image detection method and device - Google Patents

Image detection method and device

Info

Publication number
CN111402185B
CN111402185B (application CN201811528544.XA)
Authority
CN
China
Prior art keywords
image
target object
candidate image
determining
object detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811528544.XA
Other languages
Chinese (zh)
Other versions
CN111402185A (en)
Inventor
张修宝
沈海峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology and Development Co Ltd
Priority to CN201811528544.XA
Publication of CN111402185A
Application granted
Publication of CN111402185B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4038 Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence

Abstract

The application relates to the technical field of image processing, in particular to an image detection method and device. The method comprises the following steps: acquiring a monitoring video and determining the pixel information difference degree between any two adjacent frames of images in the monitoring video; screening, from the monitoring video, at least one candidate image whose pixel information difference degree meets a preset condition, and intercepting a target area image from the at least one candidate image; and determining a target object detection model matched with each candidate image based on the characteristic information of the target area image corresponding to each candidate image, and detecting the target object information included in each candidate image by using the determined target object detection models. In this way, the amount of calculation in the target object detection process can be reduced, and both the accuracy and the efficiency of target object detection can be taken into account.

Description

Image detection method and device
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image detection method and apparatus.
Background
At present, application fields such as video monitoring, security protection and unmanned driving involve detecting a target object in a monitoring video, the target object being, for example, a pedestrian or a vehicle appearing in the monitoring video. When detecting a target object in a monitoring video, the existing approach generally inputs each frame of image in the monitoring video into a preset target object detection model, so as to detect whether the characteristic information of the target object is included in each frame of image.
However, whether the target object appears in a given frame of the monitoring video, as well as the number and positions of target objects, is not fixed, so with this detection method the detection efficiency may be low and the detection accuracy may be affected.
Disclosure of Invention
In view of this, the embodiment of the application provides an image detection method and device, so as to improve the efficiency and accuracy of image detection.
In a first aspect, an embodiment of the present application provides an image detection method, including:
acquiring a monitoring video and determining the pixel information difference degree between any two adjacent frames of images in the monitoring video;
screening at least one candidate image with the pixel information difference degree meeting a preset condition from the monitoring video, and intercepting a target area image from the at least one candidate image;
and determining a target object detection model matched with each candidate image based on the characteristic information of the target area image corresponding to each candidate image, and respectively detecting target object information included in each candidate image by utilizing the determined target object detection models.
In a possible implementation manner, the characteristic information of the target area image includes at least one of the following information:
The number of target area images, the area ratio between the total area of the target area images and the total area of the corresponding candidate images.
In a possible implementation manner, the determining the target object detection model matched with each candidate image based on the feature information of the target area image corresponding to each candidate image includes:
for a kth candidate image in the at least one candidate image, k is a positive integer, performing the following processing:
when the number of the target area images corresponding to the kth candidate image is smaller than or equal to a preset number, determining a target object detection model matched with the kth candidate image as a first target object detection model;
when the number of the target area images corresponding to the kth candidate image is larger than the preset number, determining a target object detection model matched with the kth candidate image as a second target object detection model;
wherein the complexity of the second target object detection model is higher than the complexity of the first target object detection model.
In a possible implementation manner, the determining the target object detection model matched with each candidate image based on the feature information of the target area image corresponding to each candidate image includes:
For a kth candidate image in the at least one candidate image, k is a positive integer, performing the following processing:
when the number of the target area images corresponding to the kth candidate image is smaller than or equal to a preset number, determining a target object detection model matched with the kth candidate image as a first target object detection model;
when the number of the target area images corresponding to the kth candidate image is larger than the preset number and the area ratio is larger than a preset area threshold, determining a target object detection model matched with the kth candidate image as a second target object detection model;
when the number of the target area images corresponding to the kth candidate image is greater than the preset number and the area ratio is smaller than or equal to the preset area threshold, determining a target object detection model matched with the kth candidate image as a third target object detection model;
wherein the complexity of the first target object detection model is lower than the complexity of the second target object detection model, and the complexity of the second target object detection model is lower than the complexity of the third target object detection model.
In a possible implementation manner, the detecting the target object information included in each candidate image by using the determined target object detection models respectively includes:
For a kth candidate image in the at least one candidate image, k is a positive integer, performing the following processing:
inputting each target area image intercepted from the kth candidate image into a target object detection model matched with the kth candidate image, and detecting target object information included in the kth candidate image; or,
and splicing all target area images intercepted from the kth candidate image, inputting the spliced target area images into a target object detection model matched with the kth candidate image, and detecting target object information included in the kth candidate image.
In a possible implementation manner, the stitching the target area images intercepted from the kth candidate image includes:
calculating the area of each target area image intercepted from the kth candidate image;
the areas of the target area images are arranged in sequence from large to small;
and based on the obtained sequencing result, splicing all the target area images intercepted in the kth candidate image to obtain a spliced target area image.
In a possible implementation manner, the target object information included in the kth candidate image includes at least one of the following information:
marking information of each region image in which the target object appears in the kth candidate image;
and coordinate position information obtained by mapping each region image in which the target object appears onto the corresponding image in the monitoring video.
In a possible implementation manner, the coordinate position information obtained by mapping each area image in which the target object appears onto the corresponding image in the monitoring video is detected in the following manner:
taking the coordinate position of a first selected pixel point in the corresponding image in the monitoring video as a reference coordinate position, and determining the relative coordinate distance between the coordinate position of a second selected pixel point in each area image in which the target object appears and the first selected pixel point;
adjusting, based on the relative coordinate distance, the coordinate position of each pixel point in each area image in which the target object appears;
and determining the adjusted coordinate position of each pixel point in each area image in which the target object appears as the coordinate position of that area image mapped onto the corresponding image in the monitoring video.
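The offset-based mapping above can be illustrated with a minimal sketch. This is not code from the application; it assumes the first selected pixel point is the top-left corner of the full frame and the second selected pixel point is the top-left corner of the area image, located at (x0, y0) in that frame.

```python
# Hypothetical sketch of the offset-based coordinate mapping described above.
# Assumption: the "first selected pixel point" is the top-left corner (0, 0) of the
# full frame, and the "second selected pixel point" is the top-left corner of the
# area image, located at (x0, y0) in that frame.

def map_region_to_frame(region_box, x0, y0):
    """Map a box (x1, y1, x2, y2) given in area-image coordinates back to
    coordinates on the corresponding frame of the monitoring video."""
    x1, y1, x2, y2 = region_box
    # The relative coordinate distance is simply the offset (x0, y0), so every
    # pixel coordinate in the area image is shifted by that offset.
    return (x0 + x1, y0 + y1, x0 + x2, y0 + y2)

# Example: a detection at (10, 20)-(50, 80) inside an area image cropped at
# (300, 120) maps to (310, 140)-(350, 200) on the original frame.
```

Under these assumptions the relative coordinate distance reduces to the crop offset, so mapping a detection back onto the frame is a simple translation.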
In a possible implementation manner, the determining the pixel information difference degree between any two adjacent frames of images in the monitoring video includes:
Aiming at an ith frame image and an (i+1) th frame image in the monitoring video, wherein i is a positive integer, executing a first processing process; wherein the first process includes:
converting the i-th frame image into a first gray scale image, and converting the i+1th frame image into a second gray scale image;
respectively subtracting the gray values of the pixel points of the second gray image and the first gray image to obtain a third gray image;
and determining a pixel information difference degree between the (i+1) -th frame image and the (i) -th frame image based on the third gray scale image.
In a possible implementation manner, the determining, based on the third gray scale image, a pixel information difference degree between the i+1st frame image and the i frame image includes:
determining a first type pixel point with a gray value larger than a first set threshold value and a second type pixel point with a gray value not larger than the first set threshold value in the third gray image;
the gray value of the first type pixel point is adjusted to be a first numerical value, and the gray value of the second type pixel point is adjusted to be a second numerical value, so that a fourth gray image is obtained;
and determining the pixel information difference degree between the (i+1) th frame image and the (i) th frame image according to the number of pixel points with gray values of the first numerical value in the fourth gray image.
In a possible implementation manner, the determining the candidate image in the monitoring video, where the difference degree of the pixel information meets a preset condition, includes:
when the pixel information difference degree between the (i+1) -th frame image and the (i) -th frame image is determined to be larger than a set difference degree threshold, determining a candidate image corresponding to the (i+1) -th frame image according to the (i+1) -th frame image and the fourth gray level image.
In a possible implementation manner, the determining the candidate image corresponding to the i+1th frame image according to the i+1th frame image and the fourth gray scale image includes:
determining a gray area image formed by pixel points with gray values of the first numerical value in the fourth gray image;
determining candidate area images matched with the gray area images in the (i+1) th frame image;
adjusting pixel values of pixel points of other region images except the candidate region image in the i+1th frame image to the second numerical value;
and determining the adjusted i+1st frame image as a candidate image corresponding to the i+1st frame image.
In a possible implementation manner, the capturing the target area image from the at least one candidate image includes:
Executing a second processing procedure for a j-th candidate image in the at least one candidate image, j being a positive integer; wherein the second process includes:
determining a pixel point of which the pixel value in the j candidate image is not a second numerical value;
and intercepting at least one target area image containing pixel points with pixel values not being the second numerical value from the j candidate image.
In a second aspect, an embodiment of the present application provides an image detection apparatus, including:
the determining module is used for acquiring the monitoring video and determining the pixel information difference degree between any two adjacent frames of images in the monitoring video;
the screening module is used for screening at least one candidate image with the pixel information difference degree meeting a preset condition from the monitoring video and intercepting a target area image from the at least one candidate image;
and the detection module is used for determining a target object detection model matched with each candidate image based on the characteristic information of the target area image corresponding to each candidate image, and respectively detecting the target object information included in each candidate image by utilizing the determined target object detection models.
In one possible design, the feature information of the target area image includes at least one of the following information:
The number of target area images, the area ratio between the total area of the target area images and the total area of the corresponding candidate images.
In one possible design, the detection module is specifically configured to, when determining a target object detection model that matches each candidate image based on feature information of a target area image corresponding to each candidate image:
for a kth candidate image in the at least one candidate image, k is a positive integer, performing the following processing:
when the number of the target area images corresponding to the kth candidate image is smaller than or equal to a preset number, determining a target object detection model matched with the kth candidate image as a first target object detection model;
when the number of the target area images corresponding to the kth candidate image is larger than the preset number, determining a target object detection model matched with the kth candidate image as a second target object detection model;
wherein the complexity of the second target object detection model is higher than the complexity of the first target object detection model.
In one possible design, the detection module is specifically configured to, when determining a target object detection model that matches each candidate image based on feature information of a target area image corresponding to each candidate image:
For a kth candidate image in the at least one candidate image, k is a positive integer, performing the following processing:
when the number of the target area images corresponding to the kth candidate image is smaller than or equal to a preset number, determining a target object detection model matched with the kth candidate image as a first target object detection model;
when the number of the target area images corresponding to the kth candidate image is larger than the preset number and the area ratio is larger than a preset area threshold, determining a target object detection model matched with the kth candidate image as a second target object detection model;
when the number of the target area images corresponding to the kth candidate image is greater than the preset number and the area ratio is smaller than or equal to the preset area threshold, determining a target object detection model matched with the kth candidate image as a third target object detection model;
wherein the complexity of the first target object detection model is lower than the complexity of the second target object detection model, and the complexity of the second target object detection model is lower than the complexity of the third target object detection model.
In one possible design, the detection module is specifically configured to, when detecting target object information included in each candidate image using the determined target object detection models, respectively:
For a kth candidate image in the at least one candidate image, k is a positive integer, performing the following processing:
inputting each target area image intercepted from the kth candidate image into a target object detection model matched with the kth candidate image, and detecting target object information included in the kth candidate image; or,
and splicing all target area images intercepted from the kth candidate image, inputting the spliced target area images into a target object detection model matched with the kth candidate image, and detecting target object information included in the kth candidate image.
In one possible design, the detection module is specifically configured to, when stitching each target area image taken from the kth candidate image:
calculating the area of each target area image intercepted from the kth candidate image;
the areas of the target area images are arranged in sequence from large to small;
and based on the obtained sequencing result, splicing all the target area images intercepted in the kth candidate image to obtain a spliced target area image.
In one possible design, the target object information included in the kth candidate image includes at least one of the following information:
marking information of each region image in which the target object appears in the kth candidate image;
and coordinate position information obtained by mapping each region image in which the target object appears onto the corresponding image in the monitoring video.
In one possible design, the detection module detects the coordinate position information obtained by mapping each area image in which the target object appears onto the corresponding image in the monitoring video in the following manner:
taking the coordinate position of a first selected pixel point in the corresponding image in the monitoring video as a reference coordinate position, and determining the relative coordinate distance between the coordinate position of a second selected pixel point in each area image in which the target object appears and the first selected pixel point;
adjusting, based on the relative coordinate distance, the coordinate position of each pixel point in each area image in which the target object appears;
and determining the adjusted coordinate position of each pixel point in each area image in which the target object appears as the coordinate position of that area image mapped onto the corresponding image in the monitoring video.
In one possible design, the determining module is specifically configured to, when determining a difference degree of pixel information between any two adjacent frames of images in the monitoring video:
aiming at an ith frame image and an (i+1) th frame image in the monitoring video, wherein i is a positive integer, executing a first processing process; wherein the first process includes:
converting the i-th frame image into a first gray scale image, and converting the i+1th frame image into a second gray scale image;
respectively subtracting the gray values of the pixel points of the second gray image and the first gray image to obtain a third gray image;
and determining a pixel information difference degree between the (i+1) -th frame image and the (i) -th frame image based on the third gray scale image.
In one possible design, the determining module is specifically configured to, when determining the pixel information difference degree between the i+1st frame image and the i frame image based on the third grayscale image:
determining a first type pixel point with a gray value larger than a first set threshold value and a second type pixel point with a gray value not larger than the first set threshold value in the third gray image;
the gray value of the first type pixel point is adjusted to be a first numerical value, and the gray value of the second type pixel point is adjusted to be a second numerical value, so that a fourth gray image is obtained;
And determining the pixel information difference degree between the (i+1) th frame image and the (i) th frame image according to the number of pixel points with gray values of the first numerical value in the fourth gray image.
In one possible design, the filtering module is specifically configured to, when determining the candidate image in the surveillance video, where the difference degree of the pixel information meets a preset condition:
when the pixel information difference degree between the (i+1) -th frame image and the (i) -th frame image is determined to be larger than a set difference degree threshold, determining a candidate image corresponding to the (i+1) -th frame image according to the (i+1) -th frame image and the fourth gray level image.
In a possible design, the filtering module is specifically configured to, when determining, according to the i+1st frame image and the fourth grayscale image, a candidate image corresponding to the i+1st frame image:
determining a gray area image formed by pixel points with gray values of the first numerical value in the fourth gray image;
determining candidate area images matched with the gray area images in the (i+1) th frame image;
adjusting pixel values of pixel points of other region images except the candidate region image in the i+1th frame image to the second numerical value;
And determining the adjusted i+1st frame image as a candidate image corresponding to the i+1st frame image.
In a possible design, the filtering module is specifically configured to, when capturing the target area image from the at least one candidate image:
executing a second processing procedure for a j-th candidate image in the at least one candidate image, j being a positive integer; wherein the second process includes:
determining a pixel point of which the pixel value in the j candidate image is not a second numerical value;
and intercepting at least one target area image containing pixel points with pixel values not being the second numerical value from the j candidate image.
In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor, a memory and a bus, the memory storing machine readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is running, the machine readable instructions when executed by the processor performing the steps of the image detection method of the first aspect, or any of the possible implementations of the first aspect.
In a fourth aspect, embodiments of the present application further provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the image detection method described in the first aspect, or any possible implementation manner of the first aspect.
According to the image detection method and device provided by the embodiment of the application, candidate images with pixel information difference degree meeting preset conditions are screened out from the monitoring video based on the pixel information difference degree between any two adjacent frames of images in the monitoring video, then the target object detection model matched with each candidate image is determined based on the target area image intercepted from the candidate images and the characteristic information of the target area image corresponding to each candidate image, and the target object information included in each candidate image is detected by utilizing the determined target object detection models. By the method, the target object information can be determined without detecting each frame of image of the monitoring video, so that the calculated amount in the target object detection process is reduced; and selecting a proper target object detection model to detect target object information by combining the characteristic information of the target area image of the possibly-occurring target object, so that the detection efficiency and the detection accuracy can be improved.
The foregoing objects, features and advantages of embodiments of the application will be more readily apparent from the following detailed description of the embodiments taken in conjunction with the accompanying drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 shows a flow chart of an image detection method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a method for determining a target object detection model according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of a method for determining a target object detection model according to an embodiment of the present application;
fig. 4 shows a schematic view of a spliced target area image according to an embodiment of the present application;
fig. 5 shows a flowchart of a target area image stitching method according to an embodiment of the present application;
FIG. 6 is a flowchart of a first process performed by an embodiment of the present application;
fig. 7 is a schematic diagram illustrating an example of a calculation process of a third grayscale image according to an embodiment of the present application;
fig. 8 is a schematic diagram illustrating an example of a fourth gray image determination method according to an embodiment of the present application;
Fig. 9 is a schematic flow chart of a candidate image determining method according to an embodiment of the present application;
FIG. 10 illustrates an exemplary diagram of candidate image determination provided by an embodiment of the present application;
FIG. 11 illustrates an exemplary diagram of candidate image determination provided by an embodiment of the present application;
FIG. 12 is a flowchart of a second process performed by an embodiment of the present application;
FIG. 13 is a schematic diagram of coordinate transformation provided by an embodiment of the present application;
fig. 14 is a schematic flow chart of an image detection method according to an embodiment of the present application;
FIG. 15 is a flowchart of a training method of a target object detection model according to an embodiment of the present application;
fig. 16 shows a schematic architecture diagram of an image detection apparatus 1600 according to an embodiment of the present application;
fig. 17 shows a schematic structural diagram of an electronic device 170 according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. The following detailed description of embodiments of the application is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.
First, an application scenario to which the present application is applicable will be described. The application can be applied to application scenes such as monitoring suspected persons possibly appearing in a certain area, counting pedestrians or vehicles appearing in a certain area in a specified time period and the like. In the prior art, whether characteristic information of a target object is included in each frame of image is detected mainly by inputting each frame of image in a monitoring video into a preset target object detection model.
In an example, when the selected target object detection model is a simple neural network model, if the image features in the input image are complex, deep features may not be accurately extracted by using the simple neural network model, which results in lower accuracy of detecting the target object. For example, when some images in the monitoring video include a large number of target objects, if a simple network model is adopted, deep features of the large number of existing target objects cannot be extracted, and therefore the accuracy rate is low when the images with the large number of target objects are detected.
In another example, when the selected target object detection model is a complex neural network model, if the image features in the input image are relatively simple, the detection result can be accurately determined without extracting deep features; in this case, using the complex neural network model makes detection of the target object inefficient. For example, when only a few target objects, or only one, appear in the images of a monitoring video, the detection result can be accurately determined without extracting deep features, so detection efficiency is low when the complex neural network model is used to detect such images.
It is worth noting that if some images in the monitoring video are complex, deep features in the images cannot be extracted through a simple neural network model, so that the accuracy rate is low when the target object is detected; if some images in the monitoring video are simpler, the detection result can be accurately determined without extracting deep features in the images, and the detection efficiency is lower when the detection is performed through a complex neural network model. Therefore, the prior art cannot give consideration to both detection efficiency and detection accuracy when detecting a target object appearing in a monitoring video.
In view of the above problems, an embodiment of the present application provides an image detection method and apparatus, based on pixel information difference between any two adjacent frames of images in a surveillance video, candidate images with pixel information difference meeting a preset condition are screened from the surveillance video, then, based on target area images intercepted from the candidate images, a target object detection model matched with each candidate image is determined according to feature information of the target area image corresponding to each candidate image, and target object information included in each candidate image is detected by using the determined target object detection models. By the method, the target object information can be determined without detecting each frame of image of the monitoring video, so that the calculated amount in the target object detection process is reduced; and selecting a proper target object detection model to detect target object information by combining the characteristic information of the target area image of the possibly-occurring target object, so that the detection efficiency and the detection accuracy can be improved.
The image detection method and device provided by the application are described in detail below with reference to specific embodiments.
Example 1
Referring to fig. 1, a flowchart of an image detection method according to an embodiment of the present application includes the following steps:
step 101, acquiring a monitoring video, and determining the pixel information difference degree between any two adjacent frames of images in the monitoring video.
The pixel information difference degree can be understood as the difference of pixel information between two adjacent frames of images, and when the pixel information in the two adjacent frames of images changes in the monitoring video, the pixel information difference degree between the two adjacent frames of images is not zero.
For example, if no moving object appears in the area monitored by the monitoring video, every frame of the monitoring video shows the same picture, as in a video of a residential community gate recorded late at night; in this case, the pixel information of two adjacent frames of images in the monitoring video does not change, so the corresponding pixel information difference degree is zero. On the contrary, if a moving object appears in the monitored area, the monitoring video contains images with different pictures; in this case, the pixel information between two adjacent frames of images changes, so the corresponding pixel information difference degree is not zero.
And 102, screening at least one candidate image with pixel information difference degree meeting a preset condition from the monitoring video, and intercepting a target area image from the at least one candidate image.
In specific implementation, the preset condition may be that the pixel information difference degree between the current frame image and the previous frame image is greater than a set difference degree threshold; when the current frame image meets this condition, a candidate image of the current frame image may be determined based on the current frame image. When the pixel information difference degree between the current frame image and the previous frame image is not greater than the set difference degree threshold, no moving object appears in the current frame image, and whether a moving object appears in the next frame image is analyzed instead.
Considering that the pixel information difference degree is obtained from the difference in pixel information between two consecutive frame images, when the pixel information difference degree between the current frame image and the previous frame image meets the preset condition, the pixel information difference between the two frames is large, so the current frame image is an image in which a target object may appear.
However, some environmental factors cause interference. For example, when leaves are included in the monitoring video and are blown by wind, the pixel information of the leaves in adjacent frame images may change even though the current frame image does not include the target object; the current frame image may nevertheless be determined to be a candidate image whose pixel information difference degree meets the preset condition. The candidate image can therefore be further processed, to accurately identify the images in which the target object actually appears in the monitoring video.
In the embodiment of the application, after the candidate image is determined according to the pixel information difference, in order to accurately detect the local image with the changed pixel information, the target area image can be intercepted from the candidate image.
For example, suppose image A is the current frame image, image B is the previous frame image, and both image A and image B are composed of area images 1, 2, 3 and 4. If the pixel difference degree between image A and image B meets the preset condition, the candidate image corresponding to image A can be determined; however, if only the pixel information difference between area 1 of image A and area 1 of image B is large, this indicates that a target object may appear in area 1 of image A, so when detecting the candidate image corresponding to image A, area 1 of image A can be cut out as the target area image. A sketch of such a cropping step is given below.
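The following sketch illustrates one way such a cropping step could work, assuming OpenCV and numpy; the use of connected components and the constant 255 for blanked regions are illustrative assumptions, not requirements of the application.

```python
# A minimal sketch, assuming OpenCV/numpy, of cropping target area images from a
# candidate image: regions whose pixels were not blanked to the "second value"
# (assumed 255 here) are grouped into connected components and cut out by
# bounding box. The connected-component grouping is an illustrative choice.
import cv2
import numpy as np

SECOND_VALUE = 255  # assumed value used to blank unchanged regions

def crop_target_area_images(candidate_image):
    # Pixels that still carry information from the original frame (color image assumed).
    changed = np.any(candidate_image != SECOND_VALUE, axis=2).astype(np.uint8)
    num, labels, stats, _ = cv2.connectedComponentsWithStats(changed, connectivity=8)
    crops = []
    for i in range(1, num):  # label 0 is the background
        x, y, w, h, _area = stats[i]
        crops.append(candidate_image[y:y + h, x:x + w].copy())
    return crops
```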
Specifically, a method of capturing an image of a target area from a candidate image will be described in the second embodiment, and will not be described here.
And 103, determining a target object detection model matched with each candidate image based on the characteristic information of the target area image corresponding to each candidate image, and respectively detecting the target object information included in each candidate image by utilizing the determined target object detection models.
Wherein the characteristic information of the target area image includes at least one of the following information:
(1) The number of target area images may be regarded as the feature information of the target area images, considering that the number of target area images taken from any one of the candidate images is at least one;
(2) The area ratio between the total area of the target area image and the total area of the corresponding candidate image.
In one possible embodiment, when the feature information of the target area image includes the number of target area images, and when determining the target object detection model matched with each candidate image based on the feature information of the target area image corresponding to each candidate image, taking the kth candidate image in at least one candidate image as an example, k is a positive integer, the target object detection model determining method as shown in fig. 2 may be performed, including the steps of:
step 201, obtaining the number of target area images corresponding to the kth candidate image.
Step 202, judging whether the number of the target area images corresponding to the kth candidate image is larger than a preset number.
If yes, go to step 203;
If the determination result is negative, step 204 is performed.
Step 203, determining the target object detection model matched with the kth candidate image as a second target object detection model.
Step 204, determining the target object detection model matched with the kth candidate image as the first target object detection model.
Wherein the complexity of the second target object detection model is higher than the complexity of the first target object detection model. In a specific implementation, the more target area images there are, the more complex the features contained in the candidate image are and the more deep features need to be extracted. Based on this, when the number of target area images corresponding to the kth candidate image is greater than the preset number, the target object detection model matched with the kth candidate image is determined to be the second target object detection model; and when the number of target area images corresponding to the kth candidate image is smaller than or equal to the preset number, the target object detection model matched with the kth candidate image is determined to be the first target object detection model.
In another possible embodiment, if the feature information of the target area image includes the number of target area images and the area ratio between the total area of the target area images and the total area of the corresponding candidate images, when determining the target object detection model matched with each candidate image based on the feature information of the target area image corresponding to each candidate image, taking the kth candidate image in at least one candidate image as an example, k is a positive integer, the target object detection model determining method as shown in fig. 3 may further be performed, including the steps of:
Step 301, acquiring the number of target area images corresponding to the kth candidate image and the area ratio between the total area of the target area images corresponding to the kth candidate image and the total area of the kth candidate image.
Step 302, judging whether the number of the target area images corresponding to the kth candidate image is larger than a preset number.
If the determination result is no, executing step 303;
if yes, go to step 304.
Step 303, determining the target object detection model matched with the kth candidate image as the first target object detection model.
And 304, judging whether the area ratio between the total area of the target area image corresponding to the kth candidate image and the total area of the kth candidate image is larger than a preset area threshold value.
If yes, go to step 305;
if the determination result is negative, step 306 is performed.
Step 305, determining the target object detection model matched with the kth candidate image as the second target object detection model.
Step 306, determining the target object detection model matched with the kth candidate image as a third target object detection model.
Wherein the complexity of the first target object detection model is lower than the complexity of the second target object detection model and the complexity of the third target object detection model. In a specific implementation, considering that the more the number of the target area images is, the more complex the features contained in the candidate images are, the more deep features need to be extracted, based on which, when the number of the target area images is less than or equal to a preset number, the target object detection model matched with the kth candidate image is determined to be the first target object detection model, and when the number of the target area images is greater than the preset number, the target object detection model matched with the kth candidate image is determined to be the second target object detection model or the third target object detection model.
When the number of target area images is greater than the preset number, in order to select a better-matched target object detection model, the embodiment of the application can make a further judgment according to the area ratio between the total area of the target area images corresponding to the candidate image and the total area of the candidate image. Specifically, a smaller area ratio means that the target area images occupy a smaller portion of the candidate image, so a more complex target object detection model is required to extract deep features and detect the target object.
In one possible implementation, the complexity of the third target object detection model is higher than the complexity of the second target object detection model. When the number of the target area images corresponding to the kth candidate image is larger than the preset number and the area ratio is larger than the preset area threshold, determining a target object detection model matched with the kth candidate image as a second target object detection model; and when the number of the target area images corresponding to the kth candidate image is greater than the preset number and the area ratio is smaller than or equal to the preset area threshold, determining the target object detection model matched with the kth candidate image as a third target object detection model.
Therefore, based on the foregoing implementation manner, the target object detection models according to the embodiment of the present application are ordered from high to low complexity as follows: the third target object detection model, the second target object detection model, and the first target object detection model.
In an example of the present application, the first target object detection model may be, for example, a MobileNet model, a ShuffleNet model, or the like; the second target object detection model may be, for example, a ResNet18 model, a ResNet34 model, or the like; and the third target object detection model may be, for example, a ResNet50 model, a ResNet101 model, or the like. In practical applications other network models may also be used, which is not limited in the present application.
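As an illustration only, the selection rule of figs. 2 and 3 could be sketched as below, with torchvision classification backbones standing in for the first, second and third target object detection models and with assumed values for the preset number and the preset area threshold.

```python
# A hedged sketch of the model-selection rule described above. The thresholds and
# the torchvision backbones are illustrative assumptions; the patent only names
# MobileNet/ShuffleNet, ResNet18/34 and ResNet50/101 as example model families.
from torchvision import models

PRESET_NUMBER = 3        # assumed preset number of target area images
PRESET_AREA_RATIO = 0.3  # assumed preset area threshold

first_model = models.mobilenet_v2()   # lowest complexity
second_model = models.resnet18()      # medium complexity
third_model = models.resnet50()       # highest complexity

def select_detection_model(num_regions, area_ratio):
    """Pick a detector for one candidate image from its target-area statistics."""
    if num_regions <= PRESET_NUMBER:
        return first_model
    if area_ratio > PRESET_AREA_RATIO:
        return second_model
    return third_model
```

The two-model variant of fig. 2 is the same rule without the area-ratio branch.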
Specifically, the training process of the target object detection model will be described in detail in the fourth embodiment, and will not be described here.
After determining the target object detection model matched with each candidate image based on the feature information of the target area image corresponding to each candidate image, the determined target object detection models may be used to detect the target object information included in each candidate image, and taking the kth candidate image in at least one candidate image as an example, k is a positive integer, and the following two ways may be referred to:
Mode one: inputting each target area image intercepted from the kth candidate image into a target object detection model matched with the kth candidate image respectively, and detecting target object information included in the kth candidate image;
before each target area image is input to the target object detection model matched with the kth candidate image, the images of the target area images can be adjusted to be the images with the same size, then the target area images with the adjusted sizes are sequentially input to the target object recognition model, and after each input target area image is recognized by the target object recognition model, if target object information is detected, the input target area image can be marked.
Mode two: and splicing all target area images intercepted from the kth candidate image, inputting the spliced target area images into a target object detection model matched with the kth candidate image, and detecting target object information included in the kth candidate image.
After the target area images are spliced, the spliced target area images are input into a target object detection model, so that detection of a plurality of target area images can be realized at the same time.
For example, in stitching each target area image taken from the kth candidate image, reference may be made to a target area image stitching method as shown in fig. 5, including the steps of:
step 501, calculating the area of each target area image cut from the kth candidate image.
Step 502, arranging the areas of the target area images in order from large to small.
And step 503, based on the obtained sequencing result, stitching all the target area images intercepted in the kth candidate image to obtain a stitched target area image.
In specific stitching, the at least one target area image can be stitched according to the principle of minimizing the total area, so as to obtain a stitched target area image. For example, after the target area images are sorted from largest to smallest, they may be stitched using a binary-tree bin-packing scheme, obtaining a stitched target area image as shown in fig. 4.
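A simplified sketch of this stitching step is shown below; it sorts the target area images by area from largest to smallest and, instead of the binary-tree packing of fig. 4, simply pads them to a common height and concatenates them in a row, purely for illustration.

```python
# A simplified sketch of steps 501-503: sort the cropped target area images by
# area (largest first) and stitch them. The row layout padded with white is an
# illustrative stand-in for the binary-tree, minimum-area packing of fig. 4.
import cv2
import numpy as np

def stitch_regions(region_images):
    # Steps 501-502: compute areas and sort from large to small.
    ordered = sorted(region_images, key=lambda r: r.shape[0] * r.shape[1], reverse=True)
    # Step 503 (simplified): pad every region to the tallest height and concatenate.
    max_h = max(r.shape[0] for r in ordered)
    padded = [cv2.copyMakeBorder(r, 0, max_h - r.shape[0], 0, 0,
                                 cv2.BORDER_CONSTANT, value=(255, 255, 255))
              for r in ordered]
    return np.hstack(padded)
```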
According to the image detection method provided by the embodiment of the application, candidate images with pixel information difference degree meeting preset conditions are screened out from a monitoring video based on pixel information difference degree between any two adjacent frames of images in the monitoring video, then target object detection models matched with each candidate image are determined based on target area images intercepted from the candidate images and according to characteristic information of the target area images corresponding to each candidate image, and target object information included in each candidate image is detected by utilizing the determined target object detection models. By the method, the target object information can be determined without detecting each frame of image of the monitoring video, so that the calculated amount in the target object detection process is reduced; and selecting a proper target object detection model to detect target object information by combining the characteristic information of the target area image of the possibly-occurring target object, so that the detection efficiency and the detection accuracy of the target object can be improved.
Example two
The image detection method provided in the first embodiment is specifically described below in connection with the image detection process described in the first embodiment.
In specific implementation, when determining the pixel information difference degree between any two adjacent frames of images in the monitoring video, the ith frame image and the (i+1)th frame image in the monitoring video are taken as an example, where i is a positive integer, and a first processing procedure is executed; the steps of the first processing procedure may refer to the execution flowchart shown in fig. 6, and include the following steps:
step 601, converting the i-th frame image into a first gray scale image, and converting the i+1th frame image into a second gray scale image.
In a possible embodiment, in the case where each frame image in the monitor video is a color image, the i-th frame image may be converted into a first gray-scale image, and the i+1th frame image may be converted into a second gray-scale image, so as to determine the pixel information difference degree by comparing the difference in gray-scale value between the i-th frame image and the i+1th frame image.
Step 602, subtracting the gray values of the second gray image and the first gray image from each other, so as to obtain a third gray image.
In a specific implementation, the first gray scale image and the second gray scale image are derived from the same monitoring video, so that the image sizes and resolutions of the first gray scale image and the second gray scale image are the same, and if the first gray scale image and the second gray scale image both contain m×n pixels, gray values of two pixels at the same position in the two images can be subtracted, and a third gray scale image can be obtained after the subtraction.
For example, an exemplary schematic diagram of the calculation process of the third gray-scale image shown in fig. 7 is shown in fig. 7, where image 1 is a first gray-scale image and includes 9 pixels, the number in the image represents the gray value of each pixel in the first gray-scale image, image 2 is a second gray-scale image and includes 9 pixels, the number in the image represents the gray value of each pixel in the second gray-scale image, and image 3 is a third gray-scale image obtained by subtracting the first gray-scale image from the second gray-scale image, the third gray-scale image includes 9 pixels, and the number in the image represents the gray value of each pixel in the third gray-scale image.
Step 603, determining a pixel information difference degree between the (i+1) th frame image and the (i) th frame image based on the third gray level image.
In one example, the third gray-scale image may be first binarized to obtain a binarized fourth gray-scale image.
Specifically, a pixel point with a gray value greater than a first set threshold in the third gray image may be determined as a first type pixel point, and a pixel point with a gray value not greater than the first set threshold in the third gray image may be determined as a second type pixel point; then, the gray value of the first type pixel point is adjusted to be a first numerical value, and the gray value of the second type pixel point is adjusted to be a second numerical value, so that a fourth gray image is obtained; and finally, determining the pixel information difference degree between the (i+1) th frame image and the (i) th frame image according to the number of pixel points with the gray value of the first numerical value in the fourth gray image. Wherein the first value is not equal to the second value.
For example, fig. 8 is a schematic diagram illustrating an example of the fourth gray-scale image determination method, where image 4 is the third gray-scale image, image 4 includes 9 pixels, and each numerical value in image 4 represents the gray value of the corresponding pixel. If the first set threshold is set to 5, image 5 shows the pixel type determined for each pixel of the third gray-scale image: a "1" in image 5 indicates that the pixel is a first type pixel, and a "2" in image 5 indicates that the pixel is a second type pixel.
For example, if the first value is 0 and the second value is 255, image 4 in fig. 8 can be converted into image 6, and the converted image 6 is then a binary image. In practical applications, the converted image may also be adjusted to a non-binary image, which is not limited in the present application.
In a specific implementation, after the gray values are adjusted to obtain the fourth gray-scale image, the pixel information of some pixel points may have changed without the fourth gray-scale image showing the correct pixel values after the calculation and conversion. In this case the fourth gray-scale image needs to be refined, for example by dilation, erosion, opening and closing operations, so that it is converted into a gray-scale image with clear edges and without holes in the middle of the changed regions. The processing procedures of the dilation, erosion, opening and closing operations are not explained here.
In a possible embodiment, when determining the pixel information difference between the i+1st frame image and the i frame image according to the number of pixels having the first gray value in the fourth gray image, for example, a ratio between the number of pixels having the first gray value and the number of pixels in the whole image may be used as the pixel information difference, or the number of pixels having the first gray value may be directly used as the pixel information difference.
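Steps 601 to 603, together with the binarization and refinement described above, can be sketched as follows, assuming OpenCV and numpy. The absolute difference, the threshold of 5, the closing operation and the ratio-based difference degree are illustrative assumptions.

```python
# A minimal sketch of the first processing procedure (steps 601-603) plus the
# binarization and refinement of the fourth gray-scale image. Values match the
# examples of figs. 8 and 10; the exact operations are assumptions, not the
# application's mandated implementation.
import cv2
import numpy as np

FIRST_SET_THRESHOLD = 5
FIRST_VALUE, SECOND_VALUE = 0, 255

def pixel_difference_degree(frame_i, frame_i1):
    gray1 = cv2.cvtColor(frame_i, cv2.COLOR_BGR2GRAY)    # first gray-scale image
    gray2 = cv2.cvtColor(frame_i1, cv2.COLOR_BGR2GRAY)   # second gray-scale image
    diff = cv2.absdiff(gray2, gray1)                      # third gray-scale image
    # Fourth gray-scale image: changed pixels -> FIRST_VALUE, others -> SECOND_VALUE.
    fourth = np.where(diff > FIRST_SET_THRESHOLD, FIRST_VALUE, SECOND_VALUE).astype(np.uint8)
    # Refinement so that changed regions have clear edges and no interior holes.
    kernel = np.ones((5, 5), np.uint8)
    changed_mask = (fourth == FIRST_VALUE).astype(np.uint8)
    changed_mask = cv2.morphologyEx(changed_mask, cv2.MORPH_CLOSE, kernel)
    fourth = np.where(changed_mask == 1, FIRST_VALUE, SECOND_VALUE).astype(np.uint8)
    # Difference degree: ratio of changed pixels to all pixels in the frame.
    degree = float(np.count_nonzero(changed_mask)) / changed_mask.size
    return degree, fourth
```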
After the pixel information difference degree between every two adjacent frame images in the monitoring video has been calculated, the candidate images whose difference degree meets the preset condition can be determined. Specifically, when the pixel information difference degree between the (i+1)-th frame image and the i-th frame image is determined to be greater than a set difference degree threshold, the candidate image corresponding to the (i+1)-th frame image can be determined from the (i+1)-th frame image and the fourth gray-scale image.
In an example, the candidate image corresponding to the (i+1)-th frame image may be determined according to the flowchart of the candidate image determining method shown in fig. 9, which includes the following steps:
Step 901, determining a gray area image formed by pixel points with gray values of the first value in the fourth gray image.
Because the gray values of the pixel points in the fourth gray-scale image take only two values, the first value and the second value, a pixel point with the first value is one whose gray-value difference from the corresponding pixel point of the previous frame image is greater than the first set threshold, while a pixel point with the second value is one whose difference is not greater than that threshold. The gray area image is therefore formed by the pixel points whose gray value is the first value, i.e. the region in which the current frame image differs noticeably from the previous frame image.
Step 902, determining a candidate region image matched with the gray region image in the (i+1) th frame image.
In a possible embodiment, the coordinate position of the gray area image in the fourth gray-scale image may be determined first; then the area image at the same coordinate position in the (i+1)-th frame image is determined and taken as the candidate area image.
Step 903, the pixel values of the pixels of the region images except the candidate region image in the i+1st frame image are adjusted to the second value.
In one example, the second value is, for example, 255. Through the above-described processing, it is possible to adjust the image areas other than the candidate area image in the i+1th frame image to a white area while retaining only the pixel information of the candidate area image in the i+1th frame image.
Step 904, determining the adjusted i+1st frame image as a candidate image corresponding to the i+1st frame image.
In an example, as shown in fig. 10, image A represents the fourth gray-scale image, in which the first value is 0 and the second value is 255; image B represents the (i+1)-th frame image; and image C represents the candidate image. The black region in image A is the gray area image with gray value 0, and the portion framed by the white line in image B is the matching region in the (i+1)-th frame image. After the pixel values of the region images other than the candidate area image in the (i+1)-th frame image are adjusted to the second value (i.e., 255), the candidate image shown as image C is obtained.
Considering that the gray area image in the fourth gray-scale image may consist of more than one part, as shown in fig. 11, the two black areas in image A represent the gray area image of the fourth gray-scale image, and the two corresponding areas in the (i+1)-th frame image represented by image B match them, so that a candidate image containing the two target area images shown in image C is obtained.
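A minimal sketch of steps 901 to 904, under the same assumption that the first value is 0 and the second value is 255, is given below; it simply paints every pixel of the (i+1)-th frame that lies outside the gray area image with the second value.

import numpy as np

def build_candidate_image(frame_i1: np.ndarray, fourth_gray: np.ndarray,
                          first_value: int = 0, second_value: int = 255) -> np.ndarray:
    # Keep only the pixels of the (i+1)-th frame whose position belongs to the
    # gray area image (gray value == first value in the fourth gray-scale image);
    # all remaining pixels are set to the second value (white).
    candidate = frame_i1.copy()
    unchanged = fourth_gray != first_value
    candidate[unchanged] = second_value   # works for single-channel or BGR frames
    return candidate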
After the target area image in the candidate image has been determined in this way, it can be cut out from the candidate image and then recognized on its own, so that recognizing the whole candidate image is avoided and the amount of computation is reduced.
In a specific implementation, for the j-th candidate image among the determined candidate images, where j is a positive integer, a second processing procedure is executed to intercept the target area image from that candidate image.
The second process may be as shown in fig. 12, and includes the following steps:
Step 1201, determining the pixel points in the j-th candidate image whose pixel values are not the second value;
Step 1202, intercepting, from the j-th candidate image, at least one target area image containing the pixel points whose pixel values are not the second value.
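One possible way to realize the second processing procedure is to take the bounding box of each connected region of pixels whose value is not the second value; the use of OpenCV connected-component analysis here is an assumption made for illustration only.

import cv2
import numpy as np

def crop_target_areas(candidate: np.ndarray, second_value: int = 255):
    # Returns one (crop, top-left offset) pair per connected region of pixels
    # whose value is not the second value; the offset is kept so that detection
    # results can later be mapped back onto the video frame.
    gray = candidate if candidate.ndim == 2 else cv2.cvtColor(candidate, cv2.COLOR_BGR2GRAY)
    mask = (gray != second_value).astype(np.uint8)
    num, _, stats, _ = cv2.connectedComponentsWithStats(mask, 8, cv2.CV_32S)
    crops = []
    for label in range(1, num):            # label 0 is the background
        x, y, w, h, _ = stats[label]
        crops.append((candidate[y:y + h, x:x + w], (x, y)))
    return crops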
Further, when detecting whether a determined candidate image includes target object information based on the intercepted target area images, the target object information included in the j-th candidate image may be determined based on the at least one target area image intercepted from the j-th candidate image and a pre-trained target object detection model. For the process of selecting the target object detection model and recognizing the target area images with it, reference may be made to the method described in Embodiment One, and the details are not repeated here.
Wherein, the target object information may include at least one of the following information:
(1) marking information of the region image in which the target object appears in the j-th candidate image; for example, the region image in which the target object appears may be marked in the target area image using a rectangular frame;
(2) coordinate position information obtained by mapping each region image where the target object appears onto the corresponding image in the monitoring video.
In the embodiment of the application, after the labeled target area image is obtained, the coordinate position of the target object on the (i+1)-th frame image of the monitoring video can be determined based on the labeled target area image.
In one possible implementation, the coordinate position of a first selected pixel point in the corresponding image of the monitoring video is taken as the reference coordinate position, and the relative coordinate distance between the coordinate position of a second selected pixel point in each region image where a target object appears and the first selected pixel point is determined. Then, based on this relative coordinate distance, the coordinate position of each pixel point in each such region image is adjusted, and the adjusted coordinate positions are determined as the coordinate positions to which each region image where a target object appears is mapped on the corresponding image in the monitoring video.
As shown in the coordinate transformation diagram of fig. 13, the right gray area represents the target area image, and the (i+1) th frame image on the left is the corresponding image in the monitoring video to which the target area image is mapped.
Taking the first selected pixel point to be the point O at the upper-left corner of the (i+1)-th frame image as an example, a first coordinate system is established on the (i+1)-th frame image with O as the origin, so the position of O in the first coordinate system is (0, 0); the coordinate position of the point A at the upper-left corner of the target area image in the first coordinate system is (x0, y0).
Taking the second selected pixel point to be the point A' at the upper-left corner of the target area image as an example, a second coordinate system is established on the target area image with A' as the origin, so the position of A' in the second coordinate system is (0, 0).
Since the point A' corresponds to the same pixel point as the point A, the relative coordinate distance between O and A is also the relative coordinate distance between O and A', and it can therefore be determined to be (x0, y0).
Further, assuming that, after the candidate image is detected, the point B'(x, y) in the target area image shown in fig. 13 is determined to be a pixel point in a region image where the target object appears, then after B' is mapped onto the (i+1)-th frame image, the coordinate position of the resulting point B on the (i+1)-th frame image is (x+x0, y+y0).
In one possible implementation, after the target object information is detected in the target area image, the target object contained in the target area image may be labeled with a rectangular frame. The coordinates of the four vertices of the frame in the target area image are then determined, and, according to the correspondence between the coordinate positions of the target area image and of the video frame image, the coordinates of those four vertices in the video frame image are determined, so that the target object can be labeled in the video frame image and its position determined there. Here, the video frame image is the corresponding image in the monitoring video to which the target area image is mapped.
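The coordinate mapping described above can be illustrated by a short sketch; the box format (x, y, w, h) and the helper name are assumptions introduced here, and (x0, y0) is the position of the crop's upper-left corner A' in the (i+1)-th frame image as in fig. 13.

def map_box_to_frame(box, crop_offset):
    # box: (x, y, w, h) in target-area-image coordinates;
    # crop_offset: (x0, y0), the crop's upper-left corner in the frame image.
    x, y, w, h = box
    x0, y0 = crop_offset
    return (x + x0, y + y0, w, h)

# A point B'(x, y) in the target area image thus maps to B(x + x0, y + y0)
# in the (i+1)-th frame image.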
According to the method described above, the current video frame image and the previous video frame image are converted into gray-scale images, and a third gray-scale image is determined from the gray-value differences between corresponding pixel points. The gray values of pixel points in the third gray-scale image that are greater than the first set threshold are adjusted to the first value, and those not greater than the threshold are adjusted to the second value, thereby determining the fourth gray-scale image. A candidate image is then determined based on the fourth gray-scale image and the current video frame image; at least one target area image is intercepted from the candidate image, and the intercepted images are stitched to obtain a stitched target area image. Finally, a matching target object detection model is determined based on the stitched target area image, and the target object information contained in the candidate image is determined using the stitched target area image and that model. In this way, not all images in the monitoring video are detected, but only partial area images of the frames whose pixel information has changed substantially, which reduces the amount of computation in detecting target object information; moreover, selecting a suitable target object detection model according to the stitched target area image improves both the accuracy and the efficiency of target object detection.
Example III
In this embodiment of the present application, the image detection method provided in Embodiment One is described, in combination with Embodiment Two, by taking the i-th frame and (i+1)-th frame images of the monitoring video as an example. As shown in fig. 14, the method includes the following steps:
step 1401, determining a third gray scale image according to the i-th frame image and the i+1th frame image.
Specifically, the i-th frame image may be converted into a first gray-scale image and the (i+1)-th frame image into a second gray-scale image; then, for each pixel point, the gray values of the second gray-scale image and the first gray-scale image are subtracted to obtain a third gray-scale image.
Step 1402, adjusting a gray value of each pixel point in the third gray image, and determining the adjusted image as a fourth gray image.
In one possible embodiment, the gray value of a point in the third gray image having a gray value greater than the first set threshold may be adjusted to a first value, and the gray value of a point in the third gray image having a gray value not greater than the first set threshold may be adjusted to a second value. Therefore, there are two possible values for the gray value of the pixel point included in the fourth gray image: a first value and a second value.
Step 1403, determining the pixel information difference degree between the i+1st frame image and the i frame image according to the fourth gray scale image.
In an example of the present application, the ratio of the number of pixel points whose gray value is the first value in the fourth gray-scale image to the total number of pixels of the (i+1)-th frame image may be used as the pixel information difference degree.
Step 1404, determining candidate images of the (i+1) th frame of image according to the pixel information difference degree.
In a specific implementation, when the pixel information difference degree is greater than the set difference degree threshold, the gray area image formed by the pixel points whose gray value is the first value in the fourth gray-scale image is determined; the candidate area image matching the gray area image is then determined in the (i+1)-th frame image; the pixel values of the pixel points of the region images other than the candidate area image in the (i+1)-th frame image are adjusted to the second value; and the adjusted (i+1)-th frame image is determined as the candidate image corresponding to the (i+1)-th frame image.
Step 1405, intercepting a target area image from the candidate image.
In one possible implementation, the pixel points whose pixel values are not the second value are cut out of the candidate image of the (i+1)-th frame image, and the image formed by those pixel points is determined as a target area image; at least one target area image may be determined in this way.
Step 1406, determining a target object detection model matched with the candidate image based on the characteristic information of the target area image cut from the candidate image.
The method for determining the target object detection model matching the candidate image is described in Embodiment One and is not repeated here.
Step 1407, detecting target object information included in the candidate image using the determined target object detection model.
In an example, the at least one intercepted target area image may be stitched first, the stitched image is then input into a pre-trained target object detection model, and the region containing the target object is marked.
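One possible stitching scheme is sketched below; padding every crop to a common height with the background value and placing the remaining crops alternately to the right and left of the largest one are assumptions of this sketch.

import numpy as np

def stitch_target_areas(crops, fill_value: int = 255):
    # crops: list of single-channel target area images (numpy arrays).
    # Sort by area, largest first; the largest crop is the reference image.
    crops = sorted(crops, key=lambda c: c.shape[0] * c.shape[1], reverse=True)
    height = max(c.shape[0] for c in crops)

    def pad(c):
        # Pad each crop to the common height with the background value.
        return np.pad(c, ((0, height - c.shape[0]), (0, 0)), constant_values=fill_value)

    stitched = pad(crops[0])
    for idx, crop in enumerate(crops[1:]):
        if idx % 2 == 0:
            stitched = np.hstack([stitched, pad(crop)])   # stitch to the right
        else:
            stitched = np.hstack([pad(crop), stitched])   # stitch to the left
    return stitched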
After the labeled target area image is obtained, the coordinate position of the first selected pixel point in the candidate image is first determined as the reference coordinate position, and the relative coordinate distance between the coordinate position of the second selected pixel point of the target area image and the first selected pixel point is determined. The position coordinates of the pixel points in the region image where the target object appears are then adjusted based on this relative coordinate distance, and finally the adjusted coordinate positions are determined as the coordinate positions to which the region image where the target object appears is mapped on the (i+1)-th frame image of the monitoring video.
With this method, not all images in the monitoring video are detected; only partial area images of the frames whose pixel information has changed substantially are detected, which reduces the amount of computation in detecting target object information. In addition, a suitable target object detection model is selected according to the stitched target area image, which improves both the accuracy and the efficiency of target object detection.
Example IV
In the fourth embodiment of the present application, the training process of the target object detection model is described in detail. The flowchart of the training method of the target object detection model shown in fig. 15 includes the following steps:
step 1501, a training sample image set and a verification sample image set of a target object detection model are acquired.
Specifically, the training sample image set may consist of images containing a target object, and there may be more than one kind of target object. For example, the training sample image set may be a set of images containing target object A, images containing target object B, images containing target object C and images containing target object D. The verification sample image set is the set of sample images obtained by annotating the target object information of each sample image in the training sample image set.
Step 1502, sequentially inputting each sample image in the training sample image set into the target object detection model to obtain the training result of the training sample image set.
Step 1503, determining the accuracy of the target object detection model based on the training result of the training sample image set and the verification sample image set.
Step 1504, judging whether the accuracy is larger than a preset accuracy.
If the determination result is yes, step 1505 is executed;
if the determination result is no, step 1506 is executed.
Step 1505, determining that the training of the target object detection model is completed.
Step 1506, adjusting the model parameters of the target object detection model, returning to step 1501, and continuing to train the target object detection model until the accuracy of its training result is determined to be greater than the preset accuracy.
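The loop of steps 1501 to 1506 could be sketched as follows; the model interface (predict and update_parameters) and the accuracy computation are placeholders assumed here, since the present application does not fix a concrete network or optimizer.

def train_detector(model, train_images, train_labels, target_accuracy):
    # train_images: training sample image set; train_labels: the corresponding
    # verification annotations (target object information for each sample).
    while True:
        predictions = [model.predict(img) for img in train_images]        # step 1502
        correct = sum(p == y for p, y in zip(predictions, train_labels))  # step 1503
        accuracy = correct / len(train_labels)
        if accuracy > target_accuracy:                                    # steps 1504-1505
            return model
        model.update_parameters(predictions, train_labels)                # step 1506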
By adopting the embodiment, the target object included in the target area image can be identified through the target area image and the pre-trained target object detection model, so that the identification of each pixel point of the image is avoided, and the identification efficiency of the target object is improved.
Example V
Referring to fig. 16, which is an architecture diagram of an image detection apparatus 1600 according to an embodiment of the present application, the apparatus includes a determining module 1601, a screening module 1602 and a detecting module 1603. Specifically:
A determining module 1601, configured to obtain a monitoring video, and determine a pixel information difference degree between any two adjacent frames of images in the monitoring video;
a screening module 1602, configured to screen at least one candidate image with the pixel information difference degree meeting a preset condition from the monitoring video, and intercept a target area image from the at least one candidate image;
a detection module 1603, configured to determine a target object detection model that matches each candidate image based on the feature information of the target area image corresponding to each candidate image, and detect the target object information included in each candidate image using the determined target object detection models, respectively.
In one possible design, the feature information of the target area image includes at least one of the following information:
the number of target area images, the area ratio between the total area of the target area images and the total area of the corresponding candidate images.
In one possible design, the detection module 1603 is specifically configured to, when determining a target object detection model matched with each candidate image based on the feature information of the target area image corresponding to each candidate image:
for a kth candidate image in the at least one candidate image, k is a positive integer, performing the following processing:
When the number of the target area images corresponding to the kth candidate image is smaller than or equal to a preset number, determining a target object detection model matched with the kth candidate image as a first target object detection model;
when the number of the target area images corresponding to the kth candidate image is larger than the preset number, determining a target object detection model matched with the kth candidate image as a second target object detection model;
wherein the complexity of the second target object detection model is higher than the complexity of the first target object detection model.
In one possible design, the detection module 1603 is specifically configured to, when determining a target object detection model matched with each candidate image based on the feature information of the target area image corresponding to each candidate image:
for a kth candidate image in the at least one candidate image, k is a positive integer, performing the following processing:
when the number of the target area images corresponding to the kth candidate image is smaller than or equal to a preset number, determining a target object detection model matched with the kth candidate image as a first target object detection model;
when the number of the target area images corresponding to the kth candidate image is larger than the preset number and the area ratio is larger than a preset area threshold, determining a target object detection model matched with the kth candidate image as a second target object detection model;
When the number of the target area images corresponding to the kth candidate image is greater than the preset number and the area ratio is smaller than or equal to the preset area threshold, determining a target object detection model matched with the kth candidate image as a third target object detection model;
wherein the complexity of the first target object detection model is lower than the complexity of the second target object detection model, and the complexity of the second target object detection model is lower than the complexity of the third target object detection model.
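As a sketch only, the selection rule described above for the detection module could be written as follows; the preset number and the preset area threshold are placeholder values chosen for illustration, not values fixed by the present application.

def select_detection_model(num_areas: int, area_ratio: float,
                           preset_number: int = 3,
                           preset_area_threshold: float = 0.5) -> str:
    # Choose a detection model whose complexity matches the candidate image.
    if num_areas <= preset_number:
        return "first target object detection model"    # lowest complexity
    if area_ratio > preset_area_threshold:
        return "second target object detection model"   # medium complexity
    return "third target object detection model"        # highest complexity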
In one possible design, the detection module 1603 is specifically configured to, when detecting target object information included in each candidate image using the determined target object detection models, respectively:
for a kth candidate image in the at least one candidate image, k is a positive integer, performing the following processing:
inputting each target area image intercepted from the kth candidate image into a target object detection model matched with the kth candidate image, and detecting target object information included in the kth candidate image; or,
and splicing all target area images intercepted from the kth candidate image, inputting the spliced target area images into a target object detection model matched with the kth candidate image, and detecting target object information included in the kth candidate image.
In one possible design, the detection module 1603 is specifically configured to, when stitching each target area image taken from the kth candidate image:
calculating the area of each target area image intercepted from the kth candidate image;
the areas of the target area images are arranged in sequence from large to small;
and taking the target area image ranked in the first position as the reference area image, and sequentially stitching the target area images ranked after the first position to the left and right sides of the reference area image, so as to obtain the stitched target area image of the k-th candidate image.
In one possible design, the target object information included in the kth candidate image includes at least one of the following information:
marking information of a region image of the target object appears in the kth candidate image;
and mapping each regional image with the target object to coordinate position information on a corresponding image in the monitoring video.
In one possible design, the detection module 1603 detects coordinate position information of each region image where the target object appears mapped onto a corresponding image in the monitoring video according to the following manner:
determining the relative coordinate distance between the coordinate position of a second selected pixel point in each area image with the target object and the first selected pixel point by taking the coordinate position of the first selected pixel point in the corresponding image in the monitoring video as a reference coordinate position;
Based on the relative coordinate distance, adjusting the coordinate position of each pixel point in each region image with the target object;
and determining the coordinate position of each pixel point after adjustment in the area image of each existing target object as the coordinate position of the area image of each existing target object mapped to the corresponding image in the monitoring video.
In one possible design, the determining module 1601 is specifically configured to, when determining a difference degree of pixel information between any two adjacent frames of images in the surveillance video:
aiming at an ith frame image and an (i+1) th frame image in the monitoring video, wherein i is a positive integer, executing a first processing process; wherein the first process includes:
converting the i-th frame image into a first gray scale image, and converting the i+1th frame image into a second gray scale image;
respectively subtracting the gray values of the pixel points of the second gray image and the first gray image to obtain a third gray image;
and determining a pixel information difference degree between the (i+1) -th frame image and the (i) -th frame image based on the third gray scale image.
In one possible design, the determining module 1601 is specifically configured to, when determining the pixel information difference degree between the i+1st frame image and the i frame image based on the third grayscale image:
Determining a first type pixel point with a gray value larger than a first set threshold value and a second type pixel point with a gray value not larger than the first set threshold value in the third gray image;
the gray value of the first type pixel point is adjusted to be a first numerical value, and the gray value of the second type pixel point is adjusted to be a second numerical value, so that a fourth gray image is obtained;
and determining the pixel information difference degree between the (i+1) th frame image and the (i) th frame image according to the number of pixel points with gray values of the first numerical value in the fourth gray image.
In one possible design, the filtering module 1602 is specifically configured to, when determining a candidate image in the surveillance video in which the pixel information difference degree meets a preset condition:
when the pixel information difference degree between the (i+1) -th frame image and the (i) -th frame image is determined to be larger than a set difference degree threshold, determining a candidate image corresponding to the (i+1) -th frame image according to the (i+1) -th frame image and the fourth gray level image.
In a possible design, the filtering module 1602 is specifically configured to, when determining, according to the i+1st frame image and the fourth grayscale image, a candidate image corresponding to the i+1st frame image:
Determining a gray area image formed by pixel points with gray values of the first numerical value in the fourth gray image;
determining candidate area images matched with the gray area images in the (i+1) th frame image;
adjusting pixel values of pixel points of other region images except the candidate region image in the i+1th frame image to the second numerical value;
and determining the adjusted i+1st frame image as a candidate image corresponding to the i+1st frame image.
In one possible design, the filtering module 1602 is specifically configured to, when capturing the target area image from the at least one candidate image:
executing a second processing procedure for a j-th candidate image in the at least one candidate image, j being a positive integer; wherein the second process includes:
determining a pixel point of which the pixel value in the j candidate image is not a second numerical value;
and intercepting at least one target area image containing pixel points with pixel values not being the second numerical value from the j candidate image.
The image detection apparatus provided by the embodiment of the application screens out, from the monitoring video, candidate images whose pixel information difference degree meets the preset condition based on the pixel information difference degree between any two adjacent frame images; it then determines the target object detection model matching each candidate image based on the target area images intercepted from the candidate images and the characteristic information of the target area images corresponding to each candidate image, and detects the target object information included in each candidate image using the determined target object detection models. In this way, target object information can be determined without detecting every frame of the monitoring video, which reduces the amount of computation in the target object detection process; and selecting a suitable target object detection model according to the characteristic information of the target area images in which a target object may appear improves both the detection efficiency and the detection accuracy.
Example VI
Based on the same technical conception, an embodiment of the application further provides an electronic device. Referring to fig. 17, a schematic structural diagram of an electronic device 170 according to an embodiment of the present application includes a processor 171, a memory 172 and a bus 173. The memory 172 is used for storing execution instructions and includes an internal memory 1721 and an external memory 1722; the internal memory 1721 temporarily stores operation data of the processor 171 and data exchanged with the external memory 1722 such as a hard disk, and the processor 171 exchanges data with the external memory 1722 through the internal memory 1721. When the electronic device 170 runs, the processor 171 and the memory 172 communicate with each other through the bus 173, so that the processor 171 executes the following instructions:
acquiring a monitoring video and determining the pixel information difference degree between any two adjacent frames of images in the monitoring video;
screening at least one candidate image with the pixel information difference degree meeting a preset condition from the monitoring video, and intercepting a target area image from the at least one candidate image;
and determining a target object detection model matched with each candidate image based on the characteristic information of the target area image corresponding to each candidate image, and respectively detecting target object information included in each candidate image by utilizing the determined target object detection models.
The specific process flow of the processor 171 may refer to the descriptions of the above method embodiments, and will not be repeated here.
Based on the same technical idea, the embodiment of the present application further provides a computer readable storage medium, on which a computer program is stored, which when being executed by a processor, performs the steps of the above-described image detection method.
Specifically, the storage medium may be a general-purpose storage medium such as a removable disk or a hard disk; when the computer program on the storage medium is executed, the above image detection method can be performed, so that the amount of computation in the target object detection process is reduced and the efficiency and accuracy of target object detection are improved.
Based on the same technical concept, the embodiment of the present application further provides a computer program product, which includes a computer readable storage medium storing a program code, where instructions included in the program code may be used to execute the steps of the image detection method, and specific implementation may refer to the method embodiment and will not be described herein.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system and apparatus may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method of the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing is merely illustrative of specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any variation or substitution that can readily occur to a person skilled in the art within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the application is subject to the protection scope of the claims.

Claims (26)

1. An image detection method, comprising:
acquiring a monitoring video and determining the pixel information difference degree between any two adjacent frames of images in the monitoring video;
screening at least one candidate image with the pixel information difference degree meeting a preset condition from the monitoring video, and intercepting a target area image from the at least one candidate image; determining a target object detection model with complexity corresponding to each candidate image based on the characteristic information of the target area image corresponding to each candidate image, and respectively detecting target object information included in each candidate image by utilizing the determined target object detection models;
the detecting, using the determined target object detection models, target object information included in each candidate image, respectively, includes:
for a kth candidate image in the at least one candidate image, k is a positive integer, performing the following processing:
inputting each target area image intercepted from the kth candidate image into a target object detection model matched with the kth candidate image, and detecting target object information included in the kth candidate image; or,
And splicing all target area images intercepted from the kth candidate image, inputting the spliced target area images into a target object detection model matched with the kth candidate image, and detecting target object information included in the kth candidate image.
2. The method of claim 1, wherein the characteristic information of the target area image includes at least one of:
the number of target area images, the area ratio between the total area of the target area images and the total area of the corresponding candidate images.
3. The method of claim 2, wherein the determining a target object detection model that matches each candidate image based on the feature information of the target region image corresponding to each candidate image comprises:
for a kth candidate image in the at least one candidate image, k is a positive integer, performing the following processing:
when the number of the target area images corresponding to the kth candidate image is smaller than or equal to a preset number, determining a target object detection model matched with the kth candidate image as a first target object detection model;
when the number of the target area images corresponding to the kth candidate image is larger than the preset number, determining a target object detection model matched with the kth candidate image as a second target object detection model;
Wherein the complexity of the second target object detection model is higher than the complexity of the first target object detection model.
4. The method of claim 2, wherein the determining a target object detection model that matches each candidate image based on the feature information of the target region image corresponding to each candidate image comprises:
for a kth candidate image in the at least one candidate image, k is a positive integer, performing the following processing:
when the number of the target area images corresponding to the kth candidate image is smaller than or equal to a preset number, determining a target object detection model matched with the kth candidate image as a first target object detection model;
when the number of the target area images corresponding to the kth candidate image is larger than the preset number and the area ratio is larger than a preset area threshold, determining a target object detection model matched with the kth candidate image as a second target object detection model;
when the number of the target area images corresponding to the kth candidate image is greater than the preset number and the area ratio is smaller than or equal to the preset area threshold, determining a target object detection model matched with the kth candidate image as a third target object detection model;
Wherein the complexity of the first target object detection model is lower than the complexity of the second target object detection model, and the complexity of the second target object detection model is lower than the complexity of the third target object detection model.
5. The method of claim 1, wherein stitching each target area image taken from the kth candidate image comprises:
calculating the area of each target area image intercepted from the kth candidate image;
the areas of the target area images are arranged in sequence from large to small;
and based on the obtained sequencing result, splicing all the target area images intercepted in the kth candidate image to obtain a spliced target area image.
6. The method of claim 1, wherein the target object information included in the kth candidate image includes at least one of:
marking information of a region image of the target object appears in the kth candidate image;
and mapping each regional image with the target object to coordinate position information on a corresponding image in the monitoring video.
7. The method of claim 6, wherein the coordinate position information of each region image in which the target object appears mapped onto the corresponding image in the monitoring video is detected according to the following manner:
Determining the relative coordinate distance between the coordinate position of a second selected pixel point in each area image with the target object and the first selected pixel point by taking the coordinate position of the first selected pixel point in the corresponding image in the monitoring video as a reference coordinate position;
based on the relative coordinate distance, adjusting the coordinate position of each pixel point in each region image with the target object;
and determining the coordinate position of each pixel point after adjustment in the area image of each existing target object as the coordinate position of the area image of each existing target object mapped to the corresponding image in the monitoring video.
8. The method of claim 1, wherein determining the pixel information difference between any two adjacent frames of images in the surveillance video comprises:
aiming at an ith frame image and an (i+1) th frame image in the monitoring video, wherein i is a positive integer, executing a first processing process; wherein the first process includes:
converting the i-th frame image into a first gray scale image, and converting the i+1th frame image into a second gray scale image;
respectively subtracting the gray values of the pixel points of the second gray image and the first gray image to obtain a third gray image;
And determining a pixel information difference degree between the (i+1) -th frame image and the (i) -th frame image based on the third gray scale image.
9. The method of claim 8, wherein the determining a pixel information difference degree between the i+1th frame image and the i frame image based on the third grayscale image comprises:
determining a first type pixel point with a gray value larger than a first set threshold value and a second type pixel point with a gray value not larger than the first set threshold value in the third gray image;
the gray value of the first type pixel point is adjusted to be a first numerical value, and the gray value of the second type pixel point is adjusted to be a second numerical value, so that a fourth gray image is obtained;
and determining the pixel information difference degree between the (i+1) th frame image and the (i) th frame image according to the number of pixel points with gray values of the first numerical value in the fourth gray image.
10. The method of claim 9, wherein determining candidate images in the surveillance video for which the pixel information variance meets a preset condition comprises:
when the pixel information difference degree between the (i+1) -th frame image and the (i) -th frame image is determined to be larger than a set difference degree threshold, determining a candidate image corresponding to the (i+1) -th frame image according to the (i+1) -th frame image and the fourth gray level image.
11. The method of claim 10, wherein the determining the candidate image corresponding to the i+1th frame image from the i+1th frame image and the fourth grayscale image comprises:
determining a gray area image formed by pixel points with gray values of the first numerical value in the fourth gray image;
determining candidate area images matched with the gray area images in the (i+1) th frame image;
adjusting pixel values of pixel points of other region images except the candidate region image in the i+1th frame image to the second numerical value;
and determining the adjusted i+1st frame image as a candidate image corresponding to the i+1st frame image.
12. The method according to any one of claims 8 to 11, wherein said capturing the target area image from the at least one candidate image comprises:
executing a second processing procedure for a j-th candidate image in the at least one candidate image, j being a positive integer; wherein the second process includes:
determining a pixel point of which the pixel value in the j candidate image is not a second numerical value;
and intercepting at least one target area image containing pixel points with pixel values not being the second numerical value from the j candidate image.
13. An image detection apparatus, comprising: the determining module is used for acquiring the monitoring video and determining the pixel information difference degree between any two adjacent frames of images in the monitoring video;
the screening module is used for screening at least one candidate image with the pixel information difference degree meeting a preset condition from the monitoring video and intercepting a target area image from the at least one candidate image;
the detection module is used for determining a target object detection model with complexity corresponding to each candidate image based on the characteristic information of the target area image corresponding to each candidate image, and respectively detecting target object information included in each candidate image by utilizing the determined target object detection models;
the detection module is specifically configured to, when detecting target object information included in each candidate image using the determined target object detection models, respectively:
for a kth candidate image in the at least one candidate image, k is a positive integer, performing the following processing:
inputting each target area image intercepted from the kth candidate image into a target object detection model matched with the kth candidate image, and detecting target object information included in the kth candidate image; or,
And splicing all target area images intercepted from the kth candidate image, inputting the spliced target area images into a target object detection model matched with the kth candidate image, and detecting target object information included in the kth candidate image.
14. The apparatus of claim 13, wherein the characteristic information of the target area image includes at least one of:
the number of target area images, the area ratio between the total area of the target area images and the total area of the corresponding candidate images.
15. The apparatus of claim 14, wherein the detection module, when determining the target object detection model matching each candidate image based on the feature information of the target area image corresponding to each candidate image, is specifically configured to:
for a kth candidate image in the at least one candidate image, k is a positive integer, performing the following processing:
when the number of the target area images corresponding to the kth candidate image is smaller than or equal to a preset number, determining a target object detection model matched with the kth candidate image as a first target object detection model;
When the number of the target area images corresponding to the kth candidate image is larger than the preset number, determining a target object detection model matched with the kth candidate image as a second target object detection model;
wherein the complexity of the second target object detection model is higher than the complexity of the first target object detection model.
16. The apparatus of claim 14, wherein the detection module, when determining the target object detection model matching each candidate image based on the feature information of the target area image corresponding to each candidate image, is specifically configured to:
for a kth candidate image in the at least one candidate image, k is a positive integer, performing the following processing:
when the number of the target area images corresponding to the kth candidate image is smaller than or equal to a preset number, determining a target object detection model matched with the kth candidate image as a first target object detection model;
when the number of the target area images corresponding to the kth candidate image is larger than the preset number and the area ratio is larger than a preset area threshold, determining a target object detection model matched with the kth candidate image as a second target object detection model;
When the number of the target area images corresponding to the kth candidate image is greater than the preset number and the area ratio is smaller than or equal to the preset area threshold, determining a target object detection model matched with the kth candidate image as a third target object detection model;
wherein the complexity of the first target object detection model is lower than the complexity of the second target object detection model, and the complexity of the second target object detection model is lower than the complexity of the third target object detection model.
17. The apparatus of claim 13, wherein the detection module, when stitching each target area image truncated from the kth candidate image, is specifically configured to:
calculating the area of each target area image intercepted from the kth candidate image;
the areas of the target area images are arranged in sequence from large to small;
and based on the obtained sequencing result, splicing all the target area images intercepted in the kth candidate image to obtain a spliced target area image.
18. The apparatus of claim 13, wherein the target object information included in the kth candidate image comprises at least one of:
Marking information of a region image of the target object appears in the kth candidate image;
and mapping each regional image with the target object to coordinate position information on a corresponding image in the monitoring video.
19. The apparatus of claim 18, wherein the detection module detects coordinate position information of each region image in which the target object appears mapped onto a corresponding image in the surveillance video according to:
determining the relative coordinate distance between the coordinate position of a second selected pixel point in each area image with the target object and the first selected pixel point by taking the coordinate position of the first selected pixel point in the corresponding image in the monitoring video as a reference coordinate position;
based on the relative coordinate distance, adjusting the coordinate position of each pixel point in each region image with the target object;
and determining the coordinate position of each pixel point after adjustment in the area image of each existing target object as the coordinate position of the area image of each existing target object mapped to the corresponding image in the monitoring video.
20. The apparatus of claim 13, wherein the determining module, when determining a degree of difference of pixel information between any two adjacent frames of images in the surveillance video, is specifically configured to:
Aiming at an ith frame image and an (i+1) th frame image in the monitoring video, wherein i is a positive integer, executing a first processing process; wherein the first process includes:
converting the i-th frame image into a first gray scale image, and converting the i+1th frame image into a second gray scale image;
respectively subtracting the gray values of the pixel points of the second gray image and the first gray image to obtain a third gray image;
and determining a pixel information difference degree between the (i+1) -th frame image and the (i) -th frame image based on the third gray scale image.
21. The apparatus of claim 20, wherein the determining module, when determining the pixel information difference between the i+1st frame image and the i frame image based on the third grayscale image, is specifically configured to:
determining a first type pixel point with a gray value larger than a first set threshold value and a second type pixel point with a gray value not larger than the first set threshold value in the third gray image;
the gray value of the first type pixel point is adjusted to be a first numerical value, and the gray value of the second type pixel point is adjusted to be a second numerical value, so that a fourth gray image is obtained;
And determining the pixel information difference degree between the (i+1) th frame image and the (i) th frame image according to the number of pixel points with gray values of the first numerical value in the fourth gray image.
22. The apparatus of claim 21, wherein the filtering module is specifically configured to, when determining the candidate image in the surveillance video in which the pixel information difference meets a preset condition:
when the pixel information difference degree between the (i+1) -th frame image and the (i) -th frame image is determined to be larger than a set difference degree threshold, determining a candidate image corresponding to the (i+1) -th frame image according to the (i+1) -th frame image and the fourth gray level image.
23. The apparatus of claim 22, wherein the filtering module, when determining the candidate image corresponding to the i+1st frame image according to the i+1st frame image and the fourth grayscale image, is specifically configured to:
determining a gray area image formed by pixel points with gray values of the first numerical value in the fourth gray image;
determining candidate area images matched with the gray area images in the (i+1) th frame image;
adjusting pixel values of pixel points of other region images except the candidate region image in the i+1th frame image to the second numerical value;
And determining the adjusted i+1st frame image as a candidate image corresponding to the i+1st frame image.
24. The apparatus according to any one of claims 20 to 23, wherein the screening module, when capturing the target area image from the at least one candidate image, is specifically configured to:
executing a second processing procedure for a j-th candidate image in the at least one candidate image, j being a positive integer; wherein the second process includes:
determining a pixel point of which the pixel value in the j candidate image is not a second numerical value;
and intercepting at least one target area image containing pixel points with pixel values not being the second numerical value from the j candidate image.
25. An electronic device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating over the bus when the electronic device is running, the processor executing the machine-readable instructions to perform the steps of the image detection method according to any one of claims 1 to 12.
26. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the image detection method according to any of claims 1 to 12.
CN201811528544.XA 2018-12-13 2018-12-13 Image detection method and device Active CN111402185B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811528544.XA CN111402185B (en) 2018-12-13 2018-12-13 Image detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811528544.XA CN111402185B (en) 2018-12-13 2018-12-13 Image detection method and device

Publications (2)

Publication Number Publication Date
CN111402185A CN111402185A (en) 2020-07-10
CN111402185B true CN111402185B (en) 2023-12-08

Family

ID=71413024

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811528544.XA Active CN111402185B (en) 2018-12-13 2018-12-13 Image detection method and device

Country Status (1)

Country Link
CN (1) CN111402185B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113627534A (en) * 2021-08-11 2021-11-09 百度在线网络技术(北京)有限公司 Method and device for identifying type of dynamic image and electronic equipment
CN113505760B (en) * 2021-09-08 2021-12-21 科大讯飞(苏州)科技有限公司 Target detection method, device, related equipment and computer readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103810461A (en) * 2012-11-09 2014-05-21 浙江大华技术股份有限公司 Interference object detection method and equipment
JP2016177718A (en) * 2015-03-23 2016-10-06 富士通株式会社 Object detection apparatus, object detection method, and information processing program
CN107992873A (en) * 2017-10-12 2018-05-04 西安天和防务技术股份有限公司 Object detection method and device, storage medium, electronic equipment


Also Published As

Publication number Publication date
CN111402185A (en) 2020-07-10

Similar Documents

Publication Publication Date Title
CN111325769B (en) Target object detection method and device
US8406470B2 (en) Object detection in depth images
CN110298297B (en) Flame identification method and device
CN108268867B (en) License plate positioning method and device
Bedruz et al. Real-time vehicle detection and tracking using a mean-shift based blob analysis and tracking approach
CN109740609B (en) Track gauge detection method and device
CN109919002B (en) Yellow stop line identification method and device, computer equipment and storage medium
CN110555464A (en) Vehicle color identification method based on deep learning model
CN110490171B (en) Dangerous posture recognition method and device, computer equipment and storage medium
CN110502977B (en) Building change classification detection method, system, device and storage medium
Xiang et al. Moving object detection and shadow removing under changing illumination condition
CN111126393A (en) Vehicle appearance refitting judgment method and device, computer equipment and storage medium
CN114387591A (en) License plate recognition method, system, equipment and storage medium
CN111460917B (en) Airport abnormal behavior detection system and method based on multi-mode information fusion
CN111402185B (en) Image detection method and device
EP3376438A1 (en) A system and method for detecting change using ontology based saliency
EP2447912B1 (en) Method and device for the detection of change in illumination for vision systems
CN111274964A (en) Detection method for analyzing water surface pollutants based on visual saliency of unmanned aerial vehicle
Devadethan et al. Face detection and facial feature extraction based on a fusion of knowledge based method and morphological image processing
CN113065454B (en) High-altitude parabolic target identification and comparison method and device
Moseva et al. Development of a System for Fixing Road Markings in Real Time
CN108985216B (en) Pedestrian head detection method based on multivariate logistic regression feature fusion
CN116612272A (en) Intelligent digital detection system for image processing and detection method thereof
CN110765940A (en) Target object statistical method and device
CN110334703B (en) Ship detection and identification method in day and night image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant