CN112837326A - Remnant detection method, device and equipment - Google Patents

Remnant detection method, device and equipment

Info

Publication number
CN112837326A
CN112837326A
Authority
CN
China
Prior art keywords
image
pixel point
area
information corresponding
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110111585.4A
Other languages
Chinese (zh)
Other versions
CN112837326B (en)
Inventor
邵新庆
吴肖
张磊
覃晓元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen ZNV Technology Co Ltd
Nanjing ZNV Software Co Ltd
Original Assignee
Shenzhen ZNV Technology Co Ltd
Nanjing ZNV Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen ZNV Technology Co Ltd, Nanjing ZNV Software Co Ltd filed Critical Shenzhen ZNV Technology Co Ltd
Priority to CN202110111585.4A priority Critical patent/CN112837326B/en
Publication of CN112837326A publication Critical patent/CN112837326A/en
Application granted granted Critical
Publication of CN112837326B publication Critical patent/CN112837326B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2431Multiple classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30204Marker

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

A method, a device and equipment for detecting a remnant are provided. A first image and a second image of a target detection area are acquired, the acquisition interval between them being greater than or equal to a preset duration; the two images are each input into a pre-trained target semantic segmentation model to obtain the category information of every pixel point in each image; a first area suspected of containing a remnant is determined in the first image from its per-pixel category information, and a second such area is determined in the second image in the same way; whether a remnant is present in the target detection area is then determined from the first area and the second area. Because a remnant generally occupies only a small proportion of the monitored area, and the remnant is divided out at the level of pixel categories, even very small objects in the image can be detected accurately, reducing the missed-detection rate of remnant detection.

Description

Remnant detection method, device and equipment
Technical Field
The invention relates to the technical field of deep learning, in particular to a method, a device and equipment for detecting a remnant.
Background
Remnant detection means automatically locating, by processing video images, an object that has been left or abandoned in a monitored area for a certain time, and triggering an alarm to prevent accidents. It is a basic function of intelligent video surveillance, is of great significance for promptly recovering lost property or eliminating the potential danger posed by unknown objects in the monitored area, and is an important measure for averting danger and ensuring safety.
With the development of deep learning technology, target detection methods have gradually been adopted to locate and identify remnants. However, because a remnant usually occupies only a small proportion of the monitored area, remnant detection based on target detection suffers a high missed-detection rate.
Disclosure of Invention
The embodiments of the invention provide a method, a device and equipment for detecting a remnant, which reduce the missed-detection rate of remnant detection.
According to a first aspect, there is provided in an embodiment a carryover detection method comprising:
acquiring a first image and a second image of a target detection area, wherein the acquisition time interval of the first image and the second image is more than or equal to a preset time length;
inputting the first image and the second image into a pre-trained target semantic segmentation model respectively to obtain category information corresponding to each pixel point in the first image and the second image respectively, wherein the target semantic segmentation model is obtained by training based on a training sample labeled with the category information corresponding to each pixel point;
determining a first area suspected of having a remnant in the first image according to the category information corresponding to each pixel point in the first image;
determining a second area suspected of having a remnant in the second image according to the category information corresponding to each pixel point in the second image;
determining whether carryover is present in the target detection zone based on the first zone and the second zone.
Optionally, the determining whether a remnant exists in the target detection area according to the first area and the second area comprises:
calculating the degree of overlap of the first area and the second area;
if the degree of overlap is greater than a preset threshold, determining that a remnant exists in the target detection area;
and if the degree of overlap is less than or equal to the preset threshold, determining that no remnant exists in the target detection area.
Optionally, the calculating the degree of overlap of the first area and the second area comprises:
obtaining the degree of overlap of the first area and the second area through a preset overlap formula, wherein the preset overlap formula is:

IoU = |S1 ∩ S2| / |S1 ∪ S2|

wherein IoU is the degree of overlap of the first area and the second area, S1 ∩ S2 is the overlapping region of the suspected remnant in the first image and the suspected remnant in the second image, and S1 ∪ S2 is the merged region of the suspected remnant in the first image and the suspected remnant in the second image.
Optionally, after the determining that carryover is present in the target detection zone, the method further comprises:
and outputting prompt information to prompt that the object detection area has the carry-over.
Optionally, the category information output by the target semantic segmentation model includes a carry-over, and further includes one or more of a person, a motor vehicle, a non-motor vehicle, an animal, and a background.
Optionally, the target semantic segmentation model includes a feature extraction module, a pyramid pooling module and a prediction module which are connected in sequence;
the feature extraction module is used for acquiring a first feature map of an input image through a convolutional neural network;
the pyramid pooling module is used for acquiring feature maps of the input image at multiple scales from the first feature map through dilated convolution, up-sampling the multi-scale feature maps, and then merging them with the first feature map to obtain a merged feature map;
and the prediction module is used for predicting, by performing convolution operations on the merged feature map, the probability that each pixel point in the input image belongs to each category, and determining the category corresponding to the maximum probability value at each pixel point as the category information of that pixel point.
Optionally, when the target semantic segmentation model is trained, whether the target semantic segmentation model converges is determined according to a cross entropy loss function.
According to a second aspect, there is provided in an embodiment a carryover detection apparatus comprising:
the device comprises a first acquisition module, a second acquisition module and a processing module, wherein the first acquisition module is used for acquiring a first image and a second image of a target detection area, and the acquisition time interval of the first image and the second image is more than or equal to a preset time length;
the second acquisition module is used for respectively inputting the first image and the second image into a pre-trained target semantic segmentation model so as to respectively obtain category information corresponding to each pixel point in the first image and the second image, and the target semantic segmentation model is obtained by training based on a training sample marked with the category information corresponding to each pixel point;
the first determining module is used for determining a first area suspected of having a remnant in the first image according to the category information corresponding to each pixel point in the first image;
a second determining module, configured to determine, according to category information corresponding to each pixel point in the second image, a second area in the second image where a carry-over is suspected to exist;
a third determination module for determining whether carryover is present in the target detection zone based on the first zone and the second zone.
According to a third aspect, there is provided in one embodiment an electronic device comprising:
a memory for storing a program;
a processor configured to execute the program stored in the memory to implement the carryover detection method according to any one of the first aspects.
According to a fourth aspect, an embodiment provides a computer-readable storage medium having a program stored thereon, the program being executable by a processor to implement the carryover detection method of any one of the first aspects described above.
The embodiments of the invention provide a method, a device and equipment for detecting a remnant. The method comprises: acquiring a first image and a second image of a target detection area, the acquisition interval between them being greater than or equal to a preset duration; inputting the two images into a pre-trained target semantic segmentation model to obtain the category information of every pixel point in each image, the model having been trained on training samples labeled with per-pixel category information; determining, from the per-pixel category information of the first image, a first area of the first image suspected of containing a remnant; determining, in the same way, a second such area of the second image; and determining whether a remnant is present in the target detection area from the first area and the second area. In the embodiments of the invention, the pre-trained target semantic segmentation model outputs the per-pixel categories of two images taken a preset duration apart, and the suspected-remnant areas are divided out of each image from those categories, so that the presence of a remnant is decided from the two areas. Because a remnant generally occupies only a small proportion of the monitored area, and the division is performed at the level of pixel categories, even very small objects in the image can be detected accurately, reducing the missed-detection rate of remnant detection.
Drawings
Fig. 1 is a schematic flowchart of a first embodiment of a method for detecting a carry-over according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of a second embodiment of a method for detecting a carry-over according to an embodiment of the present invention;
fig. 3 is a schematic flowchart of a third embodiment of a method for detecting a carry-over according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a target semantic segmentation model according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a carryover detection apparatus according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following detailed description and accompanying drawings. Wherein like elements in different embodiments are numbered with like associated elements. In the following description, numerous details are set forth in order to provide a better understanding of the present application. However, those skilled in the art will readily recognize that some of the features may be omitted or replaced with other elements, materials, methods in different instances. In some instances, certain operations related to the present application have not been shown or described in detail in order to avoid obscuring the core of the present application from excessive description, and it is not necessary for those skilled in the art to describe these operations in detail, so that they may be fully understood from the description in the specification and the general knowledge in the art.
Furthermore, the features, operations, or characteristics described in the specification may be combined in any suitable manner to form various embodiments. Likewise, the steps or actions in the method descriptions may be swapped or reordered in ways apparent to those of ordinary skill in the art. Thus, the various sequences in the specification and drawings are for describing certain embodiments only and do not imply a required order unless it is otherwise stated that a particular sequence must be followed.
Numbering of components as such, e.g., "first" and "second", is used herein only to distinguish the described objects and carries no sequential or technical meaning. The terms "connected" and "coupled", when used in this application, include both direct and indirect connection (coupling) unless otherwise indicated.
In the prior art, when a target detection method is adopted to locate and identify a remnant, the remnant usually occupies only a small proportion of the monitored area, so remnant detection based on target detection has a high missed-detection rate. To reduce this missed-detection rate, embodiments of the present invention provide a remnant detection method, apparatus and device, described in detail below.
Fig. 1 is a schematic flow chart of a first embodiment of a method for detecting a carry-over according to an embodiment of the present invention, as shown in fig. 1, the method for detecting a carry-over according to the present embodiment may include:
s101, acquiring a first image and a second image of a target detection area.
The execution subject of the embodiment of the invention may be any device with processing capability; for example, the remnant detection method provided by this embodiment may be executed by a local host connected to a monitoring camera.
In a specific implementation, the first image and the second image of the target detection area are acquired from surveillance video shot by the monitoring camera, with the acquisition interval between them greater than or equal to a preset duration. For example, if the preset duration is 5 minutes, the acquisition interval may be 5 minutes, 7 minutes, and so on. It should be noted that an object is judged to be a remnant only when it stays in the target detection area for at least the preset duration; therefore two images acquired at least that long apart are needed for the subsequent determination, as sketched below.
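For illustration only, the frame-pair acquisition could be sketched with OpenCV as follows; the stream URL is a placeholder, and the 5-minute interval follows the example above:

```python
# Illustrative sketch: grab two frames of the target detection area
# separated by at least the preset duration. The RTSP URL is a placeholder.
import time
import cv2

PRESET_SECONDS = 5 * 60  # preset duration from the example: 5 minutes

def grab_frame(url: str):
    cap = cv2.VideoCapture(url)
    ok, frame = cap.read()   # one BGR frame of the target detection area
    cap.release()
    if not ok:
        raise RuntimeError("failed to read frame")
    return frame

first_image = grab_frame("rtsp://camera/stream")
time.sleep(PRESET_SECONDS)               # wait >= preset duration
second_image = grab_frame("rtsp://camera/stream")
```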
S102, the first image and the second image are respectively input into a pre-trained target semantic segmentation model to respectively obtain category information corresponding to each pixel point in the first image and the second image.
The target semantic segmentation model can be obtained by training based on a training sample labeled with category information corresponding to each pixel point. In a specific implementation, the training process of the target semantic segmentation model may include the following steps:
step a: and obtaining a plurality of sample images and label information corresponding to each sample image, wherein the label information is used for identifying the category of each pixel point in the sample images.
Step b: inputting any sample image into the initial semantic segmentation model, and outputting the category prediction probability aiming at each pixel point in any sample image by the initial semantic segmentation model.
Step c: and calculating the loss value of the preset loss function according to the category prediction probability of each pixel point in any sample image and the label information corresponding to any sample image.
Step d: and adjusting parameters of the initial semantic segmentation model according to the loss value of the preset loss function to obtain an updated semantic segmentation model.
Step e: iterating the above training process with the updated semantic segmentation model until the preset loss function is determined, based on its loss value, to have converged, or the number of iterations exceeds the preset number of training iterations.
Step f: determining the semantic segmentation model obtained when the preset loss function converges, or when the number of iterations exceeds the preset number of training iterations, as the target semantic segmentation model.
In a specific implementation, the preset loss function may be a cross-entropy loss function. Because the target semantic segmentation model provided by the embodiment of the invention is trained on training samples labeled with per-pixel category information, a large amount of training data is not needed and the model generalizes well.
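As an illustration of steps a to f, a minimal PyTorch training-iteration sketch is given below; the Adam optimizer, learning rate and data-loader interface are assumptions made for the sketch and are not fixed by the invention:

```python
# Minimal sketch of training steps a-f (PyTorch). Hyper-parameters are
# illustrative assumptions, not values prescribed by the patent.
import torch
import torch.nn as nn

def train(model, loader, max_iters=10000, lr=1e-3, device="cuda"):
    model = model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()          # the preset loss function
    it = 0
    while it < max_iters:                      # step e: iterate until done
        for images, labels in loader:          # step a: samples + per-pixel labels
            logits = model(images.to(device))  # step b: B x C x H x W predictions
            loss = criterion(logits, labels.to(device).long())  # step c
            optimizer.zero_grad()
            loss.backward()                    # step d: adjust parameters
            optimizer.step()
            it += 1
            if it >= max_iters:
                break
    return model                               # step f: the target model
```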
S103, determining a first area suspected of having the remnant in the first image according to the category information corresponding to each pixel point in the first image.
Specifically, all pixel points of the first image whose category information is the remnant category may be collected, and together they constitute the first area of the first image in which a remnant is suspected to exist. In addition, to improve the accuracy of remnant detection, data unrelated to the suspected remnant may be further filtered out of the first area by means of a mask.
And S104, determining a second area suspected of having the remnant in the second image according to the category information corresponding to each pixel point in the second image.
Likewise, all pixel points of the second image whose category information is the remnant category may be collected, and together they constitute the second area of the second image in which a remnant is suspected to exist. Again, to improve the accuracy of remnant detection, data unrelated to the suspected remnant may be further filtered out of the second area by means of a mask.
And S105, determining whether the object detection area has the remnants according to the first area and the second area.
By acquiring the suspected-remnant areas of two images taken at least the preset duration apart, it can be judged whether the position of the suspected remnant changed during the acquisition interval, and thus whether a remnant is present in the target detection area.
In the remnant detection method provided by this embodiment, a first image and a second image of the target detection area are acquired, their acquisition interval being greater than or equal to a preset duration; the two images are each input into the pre-trained target semantic segmentation model to obtain the category information of every pixel point in each image, the model having been trained on training samples labeled with per-pixel category information; a first area suspected of containing a remnant is determined in the first image from its per-pixel category information, and a second such area is determined in the second image in the same way; and whether a remnant is present in the target detection area is determined from the two areas. With this method, the pre-trained target semantic segmentation model outputs the category of every pixel in two images taken a preset duration apart, and the suspected-remnant areas are divided out of the two images from those per-pixel categories. Because a remnant generally occupies only a small proportion of the monitored area, and the division is performed at the level of pixel categories, even very small objects in the image can be detected accurately, reducing the missed-detection rate of remnant detection.
Optionally, the category information output by the target semantic segmentation model includes the remnant category and may further include one or more of a person, a motor vehicle, a non-motor vehicle, an animal, and a background. Explicitly segmenting the object classes that most interfere with remnant detection in a surveillance scene removes their interference and improves detection accuracy.
Fig. 2 is a schematic flow chart of a second embodiment of a method for detecting a carry-over according to an embodiment of the present invention, as shown in fig. 2, the method for detecting a carry-over according to the present embodiment may include:
s201, a first image and a second image of a target detection area are obtained.
And the acquisition time interval of the first image and the second image is more than or equal to the preset time length.
S202, the first image and the second image are respectively input into a pre-trained target semantic segmentation model to respectively obtain category information corresponding to each pixel point in the first image and the second image.
The target semantic segmentation model is obtained by training based on the training sample labeled with the category information corresponding to each pixel point.
S203, determining a first area suspected of having the remnant in the first image according to the category information corresponding to each pixel point in the first image.
And S204, determining a second area suspected of having the remnant in the second image according to the category information corresponding to each pixel point in the second image.
The specific implementation of S201-S204 can refer to the related descriptions of S101-S104 in the first embodiment.
S205, calculating the degree of overlap of the first region and the second region.
S206, judging whether the degree of overlap of the first region and the second region is greater than a preset threshold.
If yes, executing S207; if not, go to step S208.
And S207, determining that the carry-over exists in the target detection area.
A relatively high overlap between the first region and the second region indicates that the position of the suspected remnant changed little during the acquisition interval, so it is determined that a remnant is present in the target detection area.
And S208, determining that no carry-over exists in the target detection area.
A relatively low overlap between the first region and the second region indicates that the position of the suspected remnant changed significantly during the acquisition interval, so it is determined that no remnant is present in the target detection area.
In the remnant detection method provided by this embodiment, whether the degree of overlap of the first region and the second region exceeds a preset threshold is judged; if it does, a remnant is determined to be present in the target detection area, and if not, no remnant is determined to be present. Because the change in position of the suspected remnant over the acquisition interval is thereby quantified, the presence of a remnant in the target detection area can be determined more accurately.
As a possible implementation, S205 of the second embodiment may be implemented as follows: the degree of overlap of the first region and the second region is obtained through a preset Intersection over Union (IoU) formula:

IoU = |S1 ∩ S2| / |S1 ∪ S2|

wherein IoU is the degree of overlap of the first region and the second region, S1 ∩ S2 is the overlapping region of the suspected remnant in the first image and the suspected remnant in the second image, and S1 ∪ S2 is the merged region of the two suspected remnants.
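As a sketch only, this overlap can be computed on two binary masks with NumPy; the array names are illustrative:

```python
# Minimal sketch of the overlap computation on binary masks (NumPy).
# mask1 / mask2: H x W boolean arrays marking the suspected-remnant
# pixels of the first and second image.
import numpy as np

def mask_iou(mask1: np.ndarray, mask2: np.ndarray) -> float:
    intersection = np.logical_and(mask1, mask2).sum()
    union = np.logical_or(mask1, mask2).sum()
    if union == 0:            # no suspected remnant in either image
        return 0.0
    return float(intersection) / float(union)
```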
As a possible implementation, after S207 of the second embodiment, the remnant detection method provided by this embodiment may further include outputting prompt information to indicate that a remnant is present in the target detection area, so that staff can locate the object more quickly.
The method for detecting a carry-over according to an embodiment of the present invention is described below by taking a specific implementation manner as an example. Fig. 3 is a schematic flow chart of a third embodiment of a method for detecting a carry-over according to an embodiment of the present invention, as shown in fig. 3, the method for detecting a carry-over according to the present embodiment may include:
s301, defining the final output category of the target semantic segmentation model.
When performing remnant detection, the final output categories must be defined. For example, according to the usage scenario, the embodiment of the invention may define six output classes for the target semantic segmentation model: person, motor vehicle, non-motor vehicle, animal, background, and remnant. The background may include surfaces on which objects rest, such as road surfaces and grass; a remnant may be a small item carried by a person or vehicle.
Optionally, other objects that strongly interfere with remnant detection may also be added as one or more further output categories to remove such interference and improve detection accuracy. The embodiment of the invention does not specifically limit the final output categories of the target semantic segmentation model.
S302, designing a target semantic segmentation model.
In a specific implementation, the target semantic segmentation network can be divided into an encoder and a decoder: the encoder mainly performs high-level abstraction on the original image data to extract the semantic information it contains, and the decoder maps that high-level semantic information back to per-pixel classification information. For example, fig. 4 is a schematic structural diagram of a target semantic segmentation model according to an embodiment of the present invention; as shown in fig. 4, the model may include a feature extraction module 410, a pyramid pooling module 420, and a prediction module 430 connected in sequence. In a specific implementation, the encoder may comprise the feature extraction module 410 and the pyramid pooling module 420 of fig. 4, and the decoder may comprise the prediction module 430 of fig. 4.
The feature extraction module 410 may be configured to obtain a first feature map of an input image through a convolutional neural network.
The pyramid pooling module 420 may be configured to obtain feature maps of the input image on multiple scales from the first feature map through expansion convolution, perform upsampling on the feature maps on the multiple scales, and then merge the upsampled feature maps with the first feature map to obtain a merged feature map.
The prediction module 430 may be configured to predict the probability that each pixel point in the input image belongs to each category by performing convolution operation on the merged feature map, and determine the category corresponding to the maximum probability value in each pixel point as the category information corresponding to the corresponding pixel point. For example, if the probability that the pixel point m belongs to the category of "remains" is 0.9, the probability that the pixel point m belongs to the category of "people" is 0.02, the probability that the pixel point m belongs to the category of "motor vehicles" is 0.01, the probability that the pixel point m belongs to the category of "non-motor vehicles" is 0.01, the probability that the pixel point m belongs to the category of "animals" is 0.01, and the probability that the pixel point m belongs to the category of "backgrounds" is 0.05, then the category of "remains" corresponding to the maximum probability value of 0.9 in the pixel point m is determined as the category information corresponding to the pixel point m.
For example, the input of the target semantic segmentation model may be the image to be analyzed; its resolution may be 1920 × 1080 and its data format a 3-dimensional matrix of 1080 × 1920 × 3 (H × W × C), where "3" denotes the three channels of an RGB color image (RGB being an industry color standard). In the encoding network, the image feature map (Feature Map) is first obtained through the feature extraction network (feature extraction module 410); for example, the feature extraction network may be a convolutional neural network (CNN) whose output scale is 135 × 240 × 256. Here, considering computation speed, a ResNet18 residual network with 8-fold down-sampling may be chosen as the feature extraction network. The pyramid pooling module 420 then encodes the acquired feature map so that global information can be exploited. Specifically, the pyramid pooling module acts as the encoder (Encoder): it extracts holistic information mainly through the large receptive field of dilated convolution, and reduces the number of channels of the feature map with 1 × 1 convolution operations, thereby reducing the data dimension. After the features at each scale are obtained, they are up-sampled to the resolution of the first feature map and merged with it. Finally, in the recognition stage, the prediction module 430 performs class prediction (Final Prediction) with convolution operations and outputs a prediction probability for each pixel in the image. The prediction probabilities have size 1080 × 1920 × 6, where "1920" and "1080" correspond to the width and height of the original image and "6" denotes the prediction probabilities of the six categories; in a specific implementation, the category with the highest probability value is taken as the category of the pixel, so the final output is 1080 × 1920 × 1.
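As a sketch only, the encoder-decoder just described could be assembled in PyTorch roughly as follows. The ResNet18 backbone, the 8-fold down-sampling and the 1 × 1 channel reductions follow the text above; the dilation rates, the projection to 256 channels and the per-branch channel counts are assumptions. Because the dilated branches preserve the 135 × 240 resolution, no extra up-sampling is needed before merging:

```python
# Minimal PSPNet-style sketch of the described model. Dilation rates and
# channel counts are illustrative assumptions, not the patented values.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18

class SegModel(nn.Module):
    def __init__(self, num_classes=6, dilations=(1, 2, 4, 8)):
        super().__init__()
        backbone = resnet18(weights=None)
        # Layers up to layer2 give 8-fold down-sampling, as described above
        self.features = nn.Sequential(
            backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool,
            backbone.layer1, backbone.layer2)   # B x 128 x H/8 x W/8
        self.project = nn.Conv2d(128, 256, 1)   # match the 256-channel map
        # One dilated 3x3 conv per scale, then a 1x1 conv to cut channels
        self.pyramid = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(256, 256, 3, padding=d, dilation=d),
                nn.ReLU(inplace=True),
                nn.Conv2d(256, 64, 1))
            for d in dilations])
        self.predict = nn.Conv2d(256 + 64 * len(dilations), num_classes, 1)

    def forward(self, x):
        h, w = x.shape[2:]
        feat = self.project(self.features(x))       # first feature map
        scales = [p(feat) for p in self.pyramid]    # multi-scale features
        merged = torch.cat([feat] + scales, dim=1)  # merge with first map
        logits = self.predict(merged)               # per-class scores
        return F.interpolate(logits, size=(h, w),   # restore input size
                             mode="bilinear", align_corners=False)

# Per-pixel category: arg-max over the 6 class probabilities, e.g.
# pred = SegModel()(torch.rand(1, 3, 1080, 1920)).argmax(dim=1)  # B x H x W
```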
S303, a training set of the target semantic segmentation model is manufactured.
Producing a training set for the target semantic segmentation model mainly involves collecting sample data and annotating semantic labels. Specifically, region annotation is performed on the original picture data to obtain labeled pictures. The original pictures and labels are then preprocessed: both are scaled to 1920 × 1080 × 3 (W × H × C) resolution; where the scaled picture does not fill the network input size, the original picture is padded with zeros and the label is padded with the background-class value. In particular, the labels are scaled using nearest-neighbor interpolation.
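A sketch of this preprocessing under the stated rules follows; the background class id (0 here) is an assumption:

```python
# Preprocessing sketch (OpenCV + NumPy): scale to the network input size,
# pad the image with 0 and the label with the background class, and use
# nearest-neighbor interpolation for labels. BACKGROUND_CLASS is assumed.
import cv2
import numpy as np

TARGET_W, TARGET_H = 1920, 1080
BACKGROUND_CLASS = 0   # assumption: id of the background category

def preprocess(image: np.ndarray, label: np.ndarray):
    scale = min(TARGET_W / image.shape[1], TARGET_H / image.shape[0])
    new_w, new_h = int(image.shape[1] * scale), int(image.shape[0] * scale)
    img = cv2.resize(image, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
    lbl = cv2.resize(label, (new_w, new_h), interpolation=cv2.INTER_NEAREST)
    img_out = np.zeros((TARGET_H, TARGET_W, 3), dtype=img.dtype)  # pad with 0
    lbl_out = np.full((TARGET_H, TARGET_W), BACKGROUND_CLASS,
                      dtype=lbl.dtype)          # pad with background class
    img_out[:new_h, :new_w] = img
    lbl_out[:new_h, :new_w] = lbl
    return img_out, lbl_out
```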
S304, training a target semantic segmentation model.
The training set of the target semantic segmentation model is the original image processed in S303 and the corresponding label image { X, Y }, where the dimension of X is 1080 × 1920 × 3, and the dimension of Y is 1080 × 1920 × 1.
When the target semantic segmentation model is trained, a cross-entropy loss function (Cross Entropy Loss) is adopted:

H(X, Y) = -(1/N) Σ_{i=1}^{N} p(x_i) log q(x_i)

where H(X, Y) is the loss value of an image, N is the number of pixels in the image, p(x_i) is the label of the i-th pixel, and q(x_i) is the predicted value for the i-th pixel.
After the loss function and the training set have been prepared, model training can be performed with Facebook's PyTorch training framework; when training reaches the set number of training iterations, or the final convergence meets expectations, the trained target semantic segmentation model is output.
S305, image semantic recognition is carried out.
The image frame a obtained from the surveillance video is input into the target semantic segmentation model trained in S304, the category of each pixel point in frame a is predicted, and the category information of every pixel point in frame a is obtained. The area of frame a suspected of containing a remnant is determined from this per-pixel category information and recorded as the prediction image I_p, in which the value of every pixel is a category value (0 to 5). In addition, for the remnant detection region the user cares about, irrelevant data in the prediction image is filtered out by a mask; the mask image is denoted I_M. The final suspected-remnant result is: I_loss = I_p * I_M.
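A minimal sketch of this masking step follows; the array names and the remnant class id are illustrative assumptions (note that multiplying by the binary mask sets out-of-region pixels to the assumed background id 0):

```python
# Sketch: keep predictions only inside the user's region of interest.
# `pred` holds per-pixel category values (0-5); `roi_mask` is a binary
# H x W mask of the remnant detection region. REMNANT_CLASS is assumed.
import numpy as np

REMNANT_CLASS = 5   # assumption: id of the "remnant" category

def apply_roi_mask(pred: np.ndarray, roi_mask: np.ndarray) -> np.ndarray:
    return pred * roi_mask                    # I_loss = I_p * I_M

def suspected_remnant_mask(pred: np.ndarray, roi_mask: np.ndarray) -> np.ndarray:
    return apply_roi_mask(pred, roi_mask) == REMNANT_CLASS
```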
S306, remnant analysis.
After the category attribute of the remnant has been defined, the dwell time that makes an object a remnant in the given environment must also be defined: an object staying in the region of interest (the target detection area) for time t is treated as a remnant, and in the embodiment of the invention t may be taken as 5 minutes. Assuming that S305 detects a suspected remnant in each of two images 5 minutes apart, the presence of a remnant can be judged from the positions of the two suspected remnants, i.e. by the intersection-over-union of the suspected-remnant areas of the two images, the ratio of the intersection of the two areas to their union:

IoU = |S1 ∩ S2| / |S1 ∪ S2|

When the value of IoU is greater than the threshold τ, it is determined that a remnant is present in the region of interest; when IoU is less than or equal to τ, it is determined that no remnant is present. In the embodiment of the invention τ may be set to 0.7. If no suspected remnant is detected in either image, IoU = 0 and it is determined that no remnant is present in the region of interest.
Fig. 5 is a schematic structural diagram of a carryover detection apparatus according to an embodiment of the present invention, and as shown in fig. 5, the carryover detection apparatus 50 may include a first obtaining module 510, a second obtaining module 520, a first determining module 530, a second determining module 540, and a third determining module 550.
The first obtaining module 510 may be configured to obtain a first image and a second image of the target detection area, where an interval between the acquisition times of the first image and the second image is greater than or equal to a preset time length.
The second obtaining module 520 may be configured to input the first image and the second image into a pre-trained target semantic segmentation model respectively, so as to obtain category information corresponding to each pixel point in the first image and the second image, where the target semantic segmentation model is obtained by training based on a training sample labeled with the category information corresponding to each pixel point.
The first determining module 530 may be configured to determine a first area suspected of having a carry-over in the first image according to the category information corresponding to each pixel point in the first image.
The second determining module 540 may be configured to determine, according to the category information corresponding to each pixel point in the second image, a second area suspected of having the carry-over in the second image.
A third determination module 550 may be configured to determine whether carryover is present in the target detection zone based on the first zone and the second zone.
In the remnant detection device provided by the embodiment of the invention, the first obtaining module acquires a first image and a second image of the target detection area whose acquisition interval is greater than or equal to a preset duration; the second obtaining module inputs the two images into the pre-trained target semantic segmentation model to obtain the category information of every pixel point in each image, the model having been trained on training samples labeled with per-pixel category information; the first determining module determines, from the per-pixel category information of the first image, a first area suspected of containing a remnant; the second determining module determines the corresponding second area of the second image; and the third determining module determines whether a remnant is present in the target detection area from the first area and the second area. The pre-trained target semantic segmentation model outputs the per-pixel categories of two images taken a preset duration apart, and the suspected-remnant areas are divided out of the two images from those categories, so that the presence of a remnant is decided from the two areas. Because a remnant generally occupies only a small proportion of the monitored area, and the division is performed at the level of pixel categories, even very small objects in the image can be detected accurately, reducing the missed-detection rate of remnant detection.
Optionally, when determining whether a remnant exists in the target detection area according to the first area and the second area, the third determining module 550 may specifically be configured to: calculate the degree of overlap of the first area and the second area; if the degree of overlap is greater than a preset threshold, determine that a remnant exists in the target detection area; and if the degree of overlap is less than or equal to the preset threshold, determine that no remnant exists in the target detection area.
Optionally, when calculating the degree of overlap of the first area and the second area, the third determining module 550 may specifically be configured to obtain the degree of overlap through the preset overlap formula:

IoU = |S1 ∩ S2| / |S1 ∪ S2|

wherein IoU is the degree of overlap of the first area and the second area, S1 ∩ S2 is the overlapping region of the suspected remnant in the first image and the suspected remnant in the second image, and S1 ∪ S2 is the merged region of the two suspected remnants.
Optionally, the remnant detection apparatus 50 may further include an output module (not shown in the figure) for outputting prompt information to indicate that a remnant is present in the target detection area.
Optionally, the category information output by the target semantic segmentation model includes a carry-over, and further includes one or more of a person, a motor vehicle, a non-motor vehicle, an animal, and a background.
Optionally, the target semantic segmentation model may include a feature extraction module, a pyramid pooling module, and a prediction module connected in sequence; the feature extraction module is used for acquiring a first feature map of an input image through a convolutional neural network; the pyramid pooling module is used for acquiring feature maps of the input image at multiple scales from the first feature map through dilated convolution, up-sampling the multi-scale feature maps, and then merging them with the first feature map to obtain a merged feature map; and the prediction module is used for predicting, by performing convolution operations on the merged feature map, the probability that each pixel point in the input image belongs to each category, and determining the category corresponding to the maximum probability value at each pixel point as the category information of that pixel point.
Optionally, when the target semantic segmentation model is trained, whether the target semantic segmentation model is converged is determined according to the cross entropy loss function.
In addition, corresponding to the method for detecting a carry-over provided by the above embodiment, an embodiment of the present invention further provides an electronic device, which may include: a memory for storing a program; a processor for implementing all the steps of the carryover detection method provided by the embodiment of the present invention by executing the program stored in the memory.
In addition, corresponding to the method for detecting a carry-over provided by the above embodiment, an embodiment of the present invention further provides a computer-readable storage medium, in which a program is stored, and the program, when executed by a processor, implements all the steps of the method for detecting a carry-over provided by the embodiment of the present invention.
Those skilled in the art will appreciate that all or part of the functions of the various methods in the above embodiments may be implemented by hardware, or may be implemented by computer programs. When all or part of the functions of the above embodiments are implemented by a computer program, the program may be stored in a computer-readable storage medium, and the storage medium may include: a read only memory, a random access memory, a magnetic disk, an optical disk, a hard disk, etc., and the program is executed by a computer to realize the above functions. For example, the program may be stored in a memory of the device, and when the program in the memory is executed by the processor, all or part of the functions described above may be implemented. In addition, when all or part of the functions in the above embodiments are implemented by a computer program, the program may be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and may be downloaded or copied to a memory of a local device, or may be version-updated in a system of the local device, and when the program in the memory is executed by a processor, all or part of the functions in the above embodiments may be implemented.
The present invention has been described in terms of specific examples, which are provided to aid understanding of the invention and are not intended to be limiting. For a person skilled in the art to which the invention pertains, several simple deductions, modifications or substitutions may be made according to the idea of the invention.

Claims (10)

1. A carryover detection method, comprising:
acquiring a first image and a second image of a target detection area, wherein the acquisition time interval of the first image and the second image is more than or equal to a preset time length;
inputting the first image and the second image into a pre-trained target semantic segmentation model respectively to obtain category information corresponding to each pixel point in the first image and the second image respectively, wherein the target semantic segmentation model is obtained by training based on a training sample labeled with the category information corresponding to each pixel point;
determining a first area suspected of having a remnant in the first image according to the category information corresponding to each pixel point in the first image;
determining a second area suspected of having a remnant in the second image according to the category information corresponding to each pixel point in the second image;
determining whether carryover is present in the target detection zone based on the first zone and the second zone.
2. The method of claim 1, wherein said determining whether carryover is present in the target detection zone based on the first zone and the second zone comprises:
calculating the degree of overlap of the first region and the second region;
if the degree of overlap is greater than a preset threshold, determining that a remnant exists in the target detection area;
and if the degree of overlap is less than or equal to the preset threshold, determining that no remnant exists in the target detection area.
3. The method of claim 2, wherein the calculating a degree of overlap of the first region and the second region comprises:
obtaining the degree of overlap of the first region and the second region through a preset overlap formula, wherein the preset overlap formula is:

IoU = |S1 ∩ S2| / |S1 ∪ S2|

wherein IoU is the degree of overlap of the first region and the second region, S1 ∩ S2 is the overlapping region of the suspected remnant in the first image and the suspected remnant in the second image, and S1 ∪ S2 is the merged region of the suspected remnant in the first image and the suspected remnant in the second image.
4. The method of claim 2, wherein after the determining that carryover is present in the target detection zone, the method further comprises:
and outputting prompt information to prompt that the object detection area has the carry-over.
5. The method of claim 1, wherein the class information output by the target semantic segmentation model includes carry-over and further includes one or more of a person, a motor vehicle, a non-motor vehicle, an animal, and a background.
6. The method of any one of claims 1-5, wherein the target semantic segmentation model comprises a feature extraction module, a pyramid pooling module, and a prediction module connected in sequence;
the feature extraction module is used for acquiring a first feature map of an input image through a convolutional neural network;
the pyramid pooling module is used for acquiring feature maps of the input image at multiple scales from the first feature map through dilated convolution, up-sampling the multi-scale feature maps, and then merging them with the first feature map to obtain a merged feature map;
and the prediction module is used for predicting, by performing convolution operations on the merged feature map, the probability that each pixel point in the input image belongs to each category, and determining the category corresponding to the maximum probability value at each pixel point as the category information of that pixel point.
7. The method of claim 6, wherein in training the target semantic segmentation model, determining whether the target semantic segmentation model converges according to a cross entropy loss function.
8. A carry-over detection device, comprising:
the device comprises a first acquisition module, a second acquisition module and a processing module, wherein the first acquisition module is used for acquiring a first image and a second image of a target detection area, and the acquisition time interval of the first image and the second image is more than or equal to a preset time length;
the second acquisition module is used for respectively inputting the first image and the second image into a pre-trained target semantic segmentation model so as to respectively obtain category information corresponding to each pixel point in the first image and the second image, and the target semantic segmentation model is obtained by training based on a training sample marked with the category information corresponding to each pixel point;
the first determining module is used for determining a first area suspected of having a remnant in the first image according to the category information corresponding to each pixel point in the first image;
a second determining module, configured to determine, according to category information corresponding to each pixel point in the second image, a second area in the second image where a carry-over is suspected to exist;
a third determination module for determining whether carryover is present in the target detection zone based on the first zone and the second zone.
9. An electronic device, comprising:
a memory for storing a program;
a processor for implementing the method of any one of claims 1-7 by executing a program stored by the memory.
10. A computer-readable storage medium, characterized in that the medium has stored thereon a program which is executable by a processor to implement the method according to any one of claims 1-7.
CN202110111585.4A 2021-01-27 2021-01-27 Method, device and equipment for detecting carryover Active CN112837326B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110111585.4A CN112837326B (en) 2021-01-27 2021-01-27 Method, device and equipment for detecting carryover

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110111585.4A CN112837326B (en) 2021-01-27 2021-01-27 Method, device and equipment for detecting carryover

Publications (2)

Publication Number Publication Date
CN112837326A true CN112837326A (en) 2021-05-25
CN112837326B CN112837326B (en) 2024-04-09

Family

ID=75931947

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110111585.4A Active CN112837326B (en) 2021-01-27 2021-01-27 Method, device and equipment for detecting carryover

Country Status (1)

Country Link
CN (1) CN112837326B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107527009A (en) * 2017-07-11 2017-12-29 浙江汉凡软件科技有限公司 A kind of remnant object detection method based on YOLO target detections
CN109409315A (en) * 2018-11-07 2019-03-01 浩云科技股份有限公司 A kind of ATM machine panel zone remnant object detection method and system
CN109711413A (en) * 2018-12-30 2019-05-03 陕西师范大学 Image, semantic dividing method based on deep learning
US20200151497A1 (en) * 2018-11-12 2020-05-14 Sony Corporation Semantic segmentation with soft cross-entropy loss
WO2020244108A1 (en) * 2019-06-05 2020-12-10 Boe Technology Group Co., Ltd. Methods and apparatuses for semantically segmenting input image, and computer-program product


Also Published As

Publication number Publication date
CN112837326B (en) 2024-04-09

Similar Documents

Publication Publication Date Title
JP7254823B2 (en) Neural networks for object detection and characterization
CN113468967B (en) Attention mechanism-based lane line detection method, attention mechanism-based lane line detection device, attention mechanism-based lane line detection equipment and attention mechanism-based lane line detection medium
CN112015847B (en) Obstacle trajectory prediction method and device, storage medium and electronic equipment
CN113033604A (en) Vehicle detection method, system and storage medium based on SF-YOLOv4 network model
CN114049356B (en) Method, device and system for detecting structure apparent crack
CN115375999B (en) Target detection model, method and device applied to hazardous chemical vehicle detection
CN112906816B (en) Target detection method and device based on optical differential and two-channel neural network
CN113920538B (en) Object detection method, device, equipment, storage medium and computer program product
CN110705338A (en) Vehicle detection method and device and monitoring equipment
CN115830399A (en) Classification model training method, apparatus, device, storage medium, and program product
CN116433903A (en) Instance segmentation model construction method, system, electronic equipment and storage medium
CN116805387B (en) Model training method, quality inspection method and related equipment based on knowledge distillation
CN116413740B (en) Laser radar point cloud ground detection method and device
CN116311004B (en) Video moving target detection method based on sparse optical flow extraction
CN112329616A (en) Target detection method, device, equipment and storage medium
CN115205855B (en) Vehicle target identification method, device and equipment integrating multi-scale semantic information
CN116071557A (en) Long tail target detection method, computer readable storage medium and driving device
CN113591543B (en) Traffic sign recognition method, device, electronic equipment and computer storage medium
CN112837326B (en) Method, device and equipment for detecting carryover
CN112348011B (en) Vehicle damage assessment method and device and storage medium
Kee et al. Cracks identification using mask region-based denoised deformable convolutional network
CN116469013B (en) Road ponding prediction method, device, computer equipment and storage medium
Li et al. A fast detection method for polynomial fitting lane with self-attention module added
JP7345680B2 (en) Inference device, inference method, and inference program
CN116563769B (en) Video target identification tracking method, system, computer and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant