CN114419531A - Object detection method, object detection system, and computer-readable storage medium


Info

Publication number
CN114419531A
CN114419531A (application CN202111481397.7A)
Authority
CN
China
Prior art keywords
target
detected
lane area
foreground
video frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111481397.7A
Other languages
Chinese (zh)
Inventor
周永哲
魏东东
陆晓栋
吴忠人
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202111481397.7A priority Critical patent/CN114419531A/en
Publication of CN114419531A publication Critical patent/CN114419531A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/13 Edge detection
    • G06T7/155 Segmentation; Edge detection involving morphological operators
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present application discloses a target detection method, a target detection system, and a computer-readable storage medium. The method includes: acquiring a plurality of video frames, each including at least one target to be detected; segmenting the foreground and the background in each video frame to obtain corresponding segmentation results; performing pixel superposition on the segmentation results of the plurality of video frames to obtain a superposition result, and obtaining from it a candidate lane area corresponding to the foreground; correcting the candidate lane area according to the target track of the target to be detected in the plurality of video frames to obtain a first lane area; and verifying the target to be detected by a hierarchical weighting method using the background in the superposition result and the first lane area. This design ensures accurate filtering of abnormal detection targets, thereby reducing the probability of false detection.

Description

Object detection method, object detection system, and computer-readable storage medium
Technical Field
The present application relates to the field of computer vision technologies, and in particular, to a target detection method, a target detection system, and a computer-readable storage medium.
Background
With the rapid development of computer vision technology based on deep learning, target detection has become one of its most important research directions; researchers have achieved good results in this field, demonstrating the advancement of deep learning. As the technology matures, the effect and robustness of target detection algorithms continue to improve. However, real scenes are complex and diverse, and deep-learning training samples cannot cover them well. In application, scenes with few samples (such as insufficient illumination, strong illumination, or congestion) therefore yield poor detection results with many falsely detected and missed targets, which seriously hinders product deployment. For example, in intelligent traffic early-warning services, large numbers of false and missed detections cause many false alarms, making this the most urgent problem to solve when such services are deployed in practice. A new target detection method is therefore needed to solve the above problems.
Disclosure of Invention
The technical problem mainly solved by the present application is to provide a target detection method, a target detection system, and a computer-readable storage medium that ensure accurate filtering of abnormal detection targets.
In order to solve the above technical problem, the present application adopts a technical solution of providing a target detection method, including: acquiring a plurality of video frames, each comprising at least one target to be detected; segmenting the foreground and the background in each video frame to obtain a corresponding segmentation result; performing pixel superposition on the segmentation results of the video frames to obtain a superposition result, and obtaining a candidate lane area corresponding to the foreground from the superposition result; correcting the candidate lane area according to the target track of the target to be detected in the plurality of video frames to obtain a first lane area; and verifying the target to be detected by a hierarchical weighting method using the background in the superposition result and the first lane area.
The step of verifying the target to be detected by a hierarchical weighting method using the background in the superposition result and the first lane area includes: acquiring a target detection frame of the target to be detected and its first information from a target detector, wherein the first information includes size information and a spatial position of the target detection frame, and the size information includes a width of the target detection frame; judging the target to be detected to be an abnormal detection target in response to at least one of: the spatial position of the target detection frame being outside the first lane area, the width of the target detection frame and the width of the first lane area meeting a preset condition, or the position corresponding to the target detection frame being the background; and verifying the type of the target to be detected by a hierarchical weighting method according to the size information and the spatial position.
The step of verifying the type of the target to be detected by a hierarchical weighting method according to the size and the spatial position comprises the following steps: obtaining a first overlapping proportion between the target detection frame and the first lane area, a ratio between the width of the target detection frame and the width of the first lane area, and a second overlapping proportion between the target detection frame and the background; setting a first condition value to a preset constant in response to the first overlap ratio being greater than a first threshold; setting a second condition value to a preset constant in response to the ratio being greater than a second threshold value and less than a third threshold value; setting a third condition value to a preset constant in response to the second overlap ratio being less than a fourth threshold; obtaining a sum of a first product of the first condition value and a first weighting ratio, a second product of the second condition value and a second weighting ratio, and a third product of the third condition value and a third weighting ratio; and responding to the fact that the sum value is larger than or equal to a fifth threshold value, and judging the target to be detected to be a normal detection target.
The step of performing pixel superposition on the segmentation results of the plurality of video frames to obtain a superposition result, and obtaining the candidate lane region corresponding to the foreground from the superposition result includes: pixel superposition is carried out on the segmentation results of the video frames to obtain a superposition result, and edge detection is carried out on the superposed foreground to obtain a candidate lane area corresponding to the foreground; each candidate lane area is configured with a lane number; traversing all the targets to be detected in the video frames; each target to be detected is configured with a target number; and responding to the condition that the target to be detected is created, acquiring a candidate lane area where the target to be detected is located, and binding the target number of the target to be detected with the lane number of the corresponding candidate lane area.
Wherein the step of correcting the candidate lane area according to the target track of the target to be detected in the plurality of video frames to obtain a first lane area comprises: in response to the state of the target to be detected not being "created" and being "deleted", judging whether the target number of the target to be detected is bound to the lane number of the corresponding candidate lane area; if so, correcting the candidate lane area through the target track set from creation to deletion of the target to be detected to obtain the first lane area, and determining the direction of the first lane area from the displacement direction of the target track points of the target to be detected.
The target track set comprises a plurality of target tracks from creation to deletion of the target to be detected; the step of correcting the candidate lane area through the set of target tracks from creation to deletion of the target to be detected to obtain the first lane area includes: fitting a plurality of target tracks of the target to be detected to obtain a fitting line; correcting the lane line slope of the candidate lane region using the fitted line to obtain the first lane region.
Wherein the step of segmenting the foreground and the background in each of the video frames to obtain the corresponding segmentation result comprises: establishing a Gaussian model for each first pixel point in a first video frame, and taking the pixel value of the first pixel point as a model mean value; obtaining a pixel value of a second pixel point at the same position as the first pixel point in the current video frame, and judging whether a difference value between the pixel value of the second pixel point and the model mean value is greater than or equal to a sixth threshold value; if so, judging that the target to be detected corresponding to the second pixel point is a foreground; and otherwise, judging that the target to be detected corresponding to the second pixel point is a background.
Wherein the step of segmenting the foreground and the background in each of the video frames to obtain the corresponding segmentation result comprises: the background noise is removed by a morphological noise filter.
In order to solve the above technical problem, another technical solution adopted by the present application is: there is provided an object detection system, comprising a memory and a processor coupled to each other, wherein the memory stores program instructions, and the processor is configured to execute the program instructions to implement the object detection method according to any of the above embodiments.
In order to solve the above technical problem, the present application adopts another technical solution: there is provided a computer-readable storage medium storing a computer program for implementing the object detection method mentioned in any one of the above embodiments.
Different from the prior art, the beneficial effects of the present application are as follows. The target detection method provided by the application includes: acquiring a plurality of video frames, each comprising at least one target to be detected; segmenting the foreground and the background in each video frame to obtain a corresponding segmentation result; performing pixel superposition on the segmentation results of the plurality of video frames to obtain a superposition result and obtaining from it a candidate lane area corresponding to the foreground; correcting the candidate lane area according to the target track of the target to be detected in the plurality of video frames to obtain a first lane area; and finally verifying the target to be detected by a hierarchical weighting method using the background in the superposition result and the first lane area. This design ensures accurate filtering of abnormal detection targets, thereby reducing the probability of false detection.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed for describing the embodiments are briefly introduced below. The drawings in the following description are only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort. Wherein:
FIG. 1 is a schematic flow chart diagram of one embodiment of a target detection method of the present application;
FIG. 2 is a schematic flow chart illustrating an embodiment of step S2 in FIG. 1;
FIG. 3 is an original image of an example highway scene;
FIG. 4 is the GMMS segmentation result for the example highway scene;
FIG. 5 is a diagram of a parking false alarm result;
FIG. 6 is a diagram of GMMS processing results of parking false alarm frames;
FIG. 7 is a schematic diagram of a pedestrian false alarm result;
FIG. 8 is a diagram illustrating the GMMS processing results of a pedestrian false alarm frame;
FIG. 9 is a schematic flow chart illustrating an embodiment of step S3 in FIG. 1;
FIG. 10 is a schematic flow chart diagram illustrating an embodiment of step S4 in FIG. 1;
FIG. 11 is a schematic flow chart illustrating an embodiment of step S5 in FIG. 1;
FIG. 12 is a schematic flow chart diagram illustrating one embodiment of step S42 of FIG. 11;
FIG. 13 is a schematic diagram of an embodiment of an object detection system according to the present application;
FIG. 14 is a block diagram of an embodiment of an object detection system of the present application;
FIG. 15 is a block diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a schematic flow chart of an embodiment of a target detection method according to the present application.
Specifically, the target detection method includes:
S1: A plurality of video frames is acquired.
Specifically, the video frame includes at least one object to be detected.
S2: the foreground and background in each video frame are segmented to obtain corresponding segmentation results.
Specifically, background modeling is performed with a Gaussian Mixture Model (GMMS) to complete the segmentation of foreground (moving objects) and background (stationary objects), obtaining the corresponding segmentation result.
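As an illustration of this kind of GMM-based segmentation only (not the patented GMMS implementation; the parameter values below are assumptions), OpenCV's mixture-of-Gaussians background subtractor could be used as follows:

```python
import cv2

# GMM-based background subtraction as provided by OpenCV; this is only
# an illustration of foreground/background segmentation, not the
# patented GMMS implementation (parameter values are assumptions).
subtractor = cv2.createBackgroundSubtractorMOG2(
    history=500, varThreshold=16, detectShadows=False)

def segment_frame(frame):
    # Non-zero pixels in the returned mask are moving foreground.
    return subtractor.apply(frame)
```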
Specifically, in this embodiment, please refer to fig. 2-4: fig. 2 is a flowchart illustrating an embodiment of step S2 in fig. 1, fig. 3 is an original image of an example highway scene, and fig. 4 is the GMMS segmentation result for the example highway scene. Step S2 specifically includes:
s10: and establishing a Gaussian model for each first pixel point in the first video frame, and taking the pixel value of the first pixel point as a model mean value.
Specifically, the Gaussian model is a common distribution model defined by the Gaussian probability density function:

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

where \mu denotes the mean and \sigma^2 the variance.

The Gaussian Mixture Model (GMMS) quantifies a quantity precisely with Gaussian probability density functions by decomposing it into several Gaussian components; its joint probability density function is:

p(x) = \sum_{k=1}^{K} p(k)\, p(x \mid k) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k)

where p(x \mid k) = \mathcal{N}(x \mid \mu_k, \Sigma_k) is the Gaussian probability density function of the k-th Gaussian model, and p(k) = \pi_k is the weight of the k-th Gaussian model.
S11: and obtaining the pixel value of a second pixel point at the same position as the first pixel point in the current video frame, and judging whether the difference value between the pixel value of the second pixel point and the model mean value is greater than or equal to a sixth threshold value.
S12: and if so, judging that the target to be detected corresponding to the second pixel point is the foreground.
S13: otherwise, the target to be detected corresponding to the second pixel point is judged as the background.
Specifically, the Gaussian mixture model is parameter-initialized with the first frame of the traffic-scene image: a Gaussian model is established for each pixel point, the pixel value of the first pixel point is taken as the model mean, and a variance σ² is assigned. Then, for each incoming frame, the distance between the pixel value of the second pixel point (at the same position as the first pixel point in the current video frame) and the model mean is computed; if the difference is large, the target is considered to be moving and is taken as foreground, otherwise it is background. As traffic-scene images keep arriving frame by frame, the mixture model is updated, realizing the segmentation of the static traffic-scene background and the vehicle-target foreground.
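Purely as an illustration of steps S10-S13 (not the patented implementation; the threshold T, standing in for the sixth threshold, and the learning rate alpha are assumptions), a per-pixel single-Gaussian model might look like this:

```python
import numpy as np

class PixelGaussianModel:
    """Per-pixel Gaussian background model sketch for steps S10-S13.

    Assumes single-channel (grayscale) frames. T stands in for the
    'sixth threshold' of the disclosure and alpha is an assumed
    learning rate; neither value comes from the patent.
    """

    def __init__(self, first_frame, T=30.0, alpha=0.05):
        # S10: one Gaussian per pixel, mean = first-frame pixel value
        self.mean = first_frame.astype(np.float32)
        self.T = T
        self.alpha = alpha

    def segment(self, frame):
        frame = frame.astype(np.float32)
        # S11: difference between current pixel value and model mean
        diff = np.abs(frame - self.mean)
        # S12/S13: large deviation -> foreground (moving), else background
        foreground = diff >= self.T
        # Update the model with background pixels only
        bg = ~foreground
        self.mean[bg] = (1.0 - self.alpha) * self.mean[bg] \
            + self.alpha * frame[bg]
        return foreground.astype(np.uint8) * 255
```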
Taking vehicles in a traffic scene as an example and comparing fig. 3 with fig. 4, the GMMS is better at capturing small targets at the far end, which improves the efficiency of extracting moving vehicles. This design realizes the segmentation of the static background and the vehicle-target foreground in the traffic scene.
Preferably, in this embodiment, after step S2, that is, after the step of segmenting the foreground and the background in each video frame to obtain the corresponding segmentation result, the method includes: removing background noise with a morphological noise filter. This makes the foreground/background segmentation more complete and improves its accuracy.
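Such a morphological noise filter could be sketched with OpenCV as follows (using an opening operation; the kernel shape and size are assumptions, not values from the disclosure):

```python
import cv2

def remove_background_noise(mask, kernel_size=3):
    # Opening (erosion followed by dilation) removes small, isolated
    # noise blobs from the binary segmentation mask.
    kernel = cv2.getStructuringElement(
        cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
```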
Further, in this embodiment, while the foreground and background are segmented with the Gaussian mixture model, detection and tracking of vehicle targets in the video frames is accomplished with a target detector (e.g., YOLOv3), and a target number is configured for each target. However, because the YOLOv3 detector is unstable on small distant targets (i.e., it misses detections), the target detection frame may be lost for a long time while remaining at a fixed position, which in turn triggers a false parking alarm, as shown in fig. 5 (a diagram of a parking false alarm result). The false alarm arises because a small moving target at the far end is erroneously judged to be stationary; at this point the moving state of the target needs to be acquired accurately. As shown in fig. 6 (the GMMS processing result of the parking false alarm frame), introducing the GMMS allows the current motion state of the parking-alarm target to be verified; if verification shows the target is in motion, the earlier parking alarm can be cancelled, greatly reducing the probability of false alarms. The sensitivity of the Gaussian Mixture Model (GMMS) to moving targets effectively compensates for the deep-learning detector's missed detections of small distant targets in complex environments.
In addition, as shown in fig. 7 (a pedestrian false alarm result diagram), the YOLOv3 detector may also detect a roadside sign as a pedestrian, producing a pedestrian false alarm. Comparing with fig. 8 (the GMMS processing result of the pedestrian false alarm frame), the GMMS detects no target at the position where the YOLOv3 detector reported a pedestrian, so the GMMS can correct the YOLOv3 detector to a certain extent; the effect is even more pronounced for small distant targets in the video frames.
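A hedged sketch of this verification idea, assuming a binary GMMS foreground mask and a detector bounding box are available (the function name and the min_fg_ratio threshold are illustrative, not from the disclosure):

```python
import numpy as np

def is_target_moving(fg_mask, box, min_fg_ratio=0.2):
    """Check a parking-alarm target against the GMMS foreground mask.

    box is (x, y, w, h); min_fg_ratio is an assumed threshold on the
    fraction of foreground pixels inside the detection frame.
    """
    x, y, w, h = box
    roi = fg_mask[y:y + h, x:x + w]
    if roi.size == 0:
        return False
    ratio = np.count_nonzero(roi) / roi.size
    # If the box is largely foreground, the target is still moving,
    # so a pending parking alarm can be cancelled.
    return ratio >= min_fg_ratio
```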
S3: and performing pixel superposition on the segmentation results of the plurality of video frames to obtain a superposition result, and obtaining a candidate lane area corresponding to the foreground from the superposition result.
Specifically, in the present embodiment, please refer to fig. 9, and fig. 9 is a flowchart illustrating an implementation manner of step S3 in fig. 1. Step S3 specifically includes:
s20: and performing pixel superposition on the segmentation results of the plurality of video frames to obtain a superposition result, and performing edge detection on the superposed foreground to obtain a candidate lane area corresponding to the foreground.
Specifically, one lane number (ID) is configured for each candidate lane area. In step S2, the Gaussian Mixture Model (GMMS) completes the segmentation of the static background and the moving foreground. The segmentation results of the multi-frame images are pixel-superposed to obtain a superposition result, and edge detection is then performed on the superposed foreground region: the superposed foreground first undergoes shape processing of each lane region, mainly by morphological methods such as dilation, erosion, and the opening operation, after which edge detection is completed with a gradient operator to obtain the candidate lane area corresponding to each foreground, and a corresponding lane number (ID) is configured for each candidate lane area.
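A minimal sketch of such a pipeline with OpenCV (the superposition threshold, kernel size, and minimum area are assumptions, not values from the disclosure):

```python
import cv2
import numpy as np

def candidate_lane_regions(fg_masks, min_area=5000):
    # Pixel superposition of the per-frame foreground masks
    acc = np.zeros(fg_masks[0].shape, dtype=np.float32)
    for m in fg_masks:
        acc += (m > 0)
    # Keep pixels that were foreground often enough (assumed 30%)
    lane_mask = (acc >= 0.3 * len(fg_masks)).astype(np.uint8) * 255
    # Morphological shape processing: opening, dilation, erosion
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (7, 7))
    lane_mask = cv2.morphologyEx(lane_mask, cv2.MORPH_OPEN, kernel)
    lane_mask = cv2.dilate(lane_mask, kernel)
    lane_mask = cv2.erode(lane_mask, kernel)
    # Edge detection of the superposed foreground via a gradient operator
    edges = cv2.morphologyEx(lane_mask, cv2.MORPH_GRADIENT, kernel)
    # Each sufficiently large outer contour becomes a candidate lane
    # area and is configured with a lane number (its index here)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    big = [c for c in contours if cv2.contourArea(c) >= min_area]
    return {lane_id: c for lane_id, c in enumerate(big)}
```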
S21: and traversing the target to be detected in all the video frames.
Specifically, each object to be detected is provided with an object number (ID).
S22: and judging whether the state of the target to be detected is established.
Specifically, if the state of a target to be detected is creation, it is indicated that the target to be detected appears for the first time; and if the state of one target to be detected is not established, indicating that the target to be detected does not appear for the first time.
S23: if so, acquiring a candidate lane area where the target to be detected is located, and binding the target number of the target to be detected with the lane number of the corresponding candidate lane area.
Specifically, if the state of the target to be detected is creation, the candidate lane area where the target to be detected is located is further judged, and the target number of the target to be detected and the lane number of the corresponding candidate lane area are bound.
S24: otherwise, the process proceeds to step S4.
Specifically, if the state of the target to be detected is not created, it is indicated that the target to be detected does not appear for the first time, and the method proceeds to the step of correcting the candidate lane area according to the target track of the target to be detected in the plurality of video frames to obtain the first lane area.
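For illustration, the binding of steps S21-S24 could be kept in a simple mapping like the following sketch (the state names and the use of cv2.pointPolygonTest are assumptions made here, not part of the disclosure):

```python
import cv2

def bind_targets_to_lanes(targets, lane_regions, bindings):
    """targets: {target_id: {'state': str, 'center': (x, y)}};
    lane_regions: {lane_id: contour}; bindings: {target_id: lane_id}."""
    for target_id, info in targets.items():
        # S22/S23: only a newly created target is bound to a lane
        if info['state'] != 'created':
            continue  # S24: handled by the lane-correction step S4
        cx, cy = info['center']
        for lane_id, contour in lane_regions.items():
            # >= 0 means the target center lies inside or on the contour
            if cv2.pointPolygonTest(contour, (float(cx), float(cy)),
                                    False) >= 0:
                bindings[target_id] = lane_id
                break
    return bindings
```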
S4: and correcting the candidate lane area according to the target track of the target to be detected in the plurality of video frames to obtain a first lane area.
Specifically, in the present embodiment, please refer to fig. 10, and fig. 10 is a schematic flowchart illustrating an implementation manner of step S4 in fig. 1. Step S4 specifically includes:
s30: and if the state of the target to be detected is not established, judging whether the state of the target to be detected is deleted.
Specifically, if it is determined in step S24 that the status of the object to be detected is not created, it is determined whether the status of the object to be detected is deleted, that is, it is determined whether the object to be detected disappears in the video frame.
S31: and if so, judging whether the target number of the target to be detected is bound with the lane number of the corresponding candidate lane area.
Specifically, if the state of the target to be detected is deletion, it is determined whether the target number of the target to be detected is bound to the lane number of the corresponding candidate lane area.
S32: if so, correcting the candidate lane area through the target track set from creation to deletion of the target to be detected to obtain a first lane area, and determining the direction of the first lane area through the target track point displacement direction of the target to be detected.
Specifically, if the target number of the deleted target to be detected is bound to the lane number of a candidate lane area, the candidate lane area is corrected through the set of target tracks from creation to deletion of the target to be detected to obtain the first lane area, and the direction of the first lane area is determined from the displacement direction of the target track points. This increases the recognition of scene information: superposing frame by frame the moving targets obtained by the Gaussian Mixture Model (GMMS) yields the region of target motion, which, for a traffic road scene, realizes recognition of the lane area.
In this embodiment, the target track set includes a plurality of target tracks from creation to deletion of the target to be detected. Specifically, the step in S32 of correcting the candidate lane area through the target track set from creation to deletion of the target to be detected to obtain the first lane area includes: A. fitting a plurality of target tracks of the target to be detected to obtain a fitted line; B. correcting the lane line slope of the candidate lane area with the fitted line to obtain the first lane area, as sketched below. Through this design, the state of the target is predicted from a clustering perspective according to the motion changes of the target across consecutive frames, which noticeably improves the deep-learning detector's performance on small targets; statistics over the directions of multiple target tracks also avoid the risk of error in determining the direction from a single track, improving the accuracy of correcting the candidate lane area.
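A sketch of steps A and B using a least-squares fit (np.polyfit); representing the lane line by an anchor point and a slope is an assumption made here for illustration:

```python
import numpy as np

def fit_trajectories(tracks):
    """Step A: fit one line to all target tracks.

    tracks: list of tracks, each a list of (x, y) points recorded from
    creation to deletion of the target. Returns (slope, intercept).
    """
    xs = np.concatenate([[p[0] for p in t] for t in tracks])
    ys = np.concatenate([[p[1] for p in t] for t in tracks])
    slope, intercept = np.polyfit(xs, ys, deg=1)
    return slope, intercept

def correct_lane_slope(lane_line, fitted_slope):
    """Step B: replace the candidate lane line's slope with the slope
    fitted from the target tracks, keeping its anchor point."""
    (x0, y0), _old_slope = lane_line
    return (x0, y0), fitted_slope
```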
Completing traffic-scene information recognition through the Gaussian Mixture Model (GMMS) and target track fitting reduces the degree of manual involvement and, especially for edge computing products, greatly improves environmental adaptivity.
S33: otherwise, ending.
Specifically, if it is determined in step S30 that the state of the target to be detected is neither "created" nor "deleted", the lane-information recognition step ends. Likewise, if it is determined in step S31 that the target number of the target to be detected is not bound to the lane number of the corresponding candidate lane area, the lane-information recognition step ends. This saves resources and processing time.
S5: and verifying the target to be detected by using the background and the first lane area in the superposition result through a hierarchical weighting method.
Considering the missed and false detections of the YOLOv3 detector in fig. 5 and fig. 7, and to reduce their probability, the traffic-scene information acquired by the YOLOv3 detector needs to be verified against the Gaussian Mixture Model (GMMS). Specifically, in this embodiment, please refer to fig. 11, which is a flowchart illustrating an implementation of step S5 in fig. 1. Step S5 specifically includes:
s40: and acquiring a target detection frame of the target to be detected and first information thereof from the target detector.
Specifically, the first information includes size information and a spatial position of the target detection frame, and the size information includes a width of the target detection frame.
S41: and determining that the target to be detected is an abnormal detection target in response to at least one of that the space position of the target detection frame is outside the first lane area, the width of the target detection frame and the width of the first lane area meet a preset condition, and the position corresponding to the target detection frame is a background.
Specifically: (1) if the spatial position of the target detection frame is outside the first lane area, the corresponding target to be detected may be judged a suspected abnormal detection target; (2) if the width of the target detection frame and the width of the first lane area meet the preset condition, the likelihood that it is an abnormal detection target increases; (3) if the position corresponding to the target detection frame lies in the background extracted by the Gaussian Mixture Model (GMMS), that likelihood increases further. Preferably, in this embodiment, the preset condition in (2) may be that the width (size) of the target to be detected is too large or too small compared with the width (size) of the first lane area, in which case the target may be judged an abnormal detection target.
This design combines traffic-scene information with the target's size and spatial position, avoiding the one-sidedness of detection results based solely on a deep-learning model; abnormal detection targets are judged and filtered, which ensures filtering accuracy and reduces the probability of false detection.
S42: and verifying the type of the target to be detected by a hierarchical weighting method according to the size information and the spatial position.
Further, in order to make the verification result more accurate, the three situations can be comprehensively considered by a hierarchical weighting method. In the present embodiment, please refer to fig. 12, where fig. 12 is a flowchart illustrating an implementation manner of step S42 in fig. 11. Step S42 specifically includes:
s420: and obtaining a first overlapping proportion between the target detection frame and the first lane area, a ratio between the width of the target detection frame and the width of the first lane area, and a second overlapping proportion between the target detection frame and the background.
Specifically, the three cases correspond to the following three calculation methods: (1) a first overlap ratio between the target detection frame and the first lane area; (2) a ratio between a width of the target detection frame and a width of the first lane area; (3) a second overlap ratio between the target detection box and the background.
S421: in response to the first overlap ratio being greater than the first threshold, the first condition value is set to a preset constant.
Specifically, condition a: if the first overlap ratio between the target detection frame and the first lane area is greater than 60%, indicating that the spatial position of the target detection frame lies within the first lane area and the corresponding target to be detected is a normal detection target, the first condition value a is set to 1; otherwise, a is set to 0. Of course, in other embodiments the first threshold may take other values, which is not limited here.
S422: in response to the ratio being greater than the second threshold and less than the third threshold, the second condition value is set to a preset constant.
Specifically, condition b: if the ratio of the width of the target detection frame to the width of the first lane area lies between the second and third thresholds (for example, below 120%), indicating that the width of the target detection frame and the width of the first lane area do not satisfy the preset condition in step S41, the second condition value b is set to 1; otherwise, b is set to 0. Of course, in other embodiments the second and third thresholds may take other values, which is not limited here.
S423: in response to the second overlap ratio being less than the fourth threshold, the third condition value is set to a preset constant.
Specifically, condition c: if the second overlap ratio between the target detection frame and the background is less than 50%, indicating that the position corresponding to the target detection frame is not dominated by the background extracted by the Gaussian Mixture Model (GMMS), the third condition value c is set to 1; otherwise, c is set to 0. Of course, in other embodiments the fourth threshold may take other values, which is not limited here.
S424: a sum of a first product of the first condition value and the first weighting ratio, a second product of the second condition value and the second weighting ratio, and a third product of the third condition value and the third weighting ratio is obtained.
Specifically, the final calculation result res is calculated by the following formula:
res = λa + βb + ηc
where λ, β, and η are the weighting ratios of condition a, condition b, and condition c, respectively. Preferably, in this embodiment, the weighting ratio is λ:β:η = 2:2:6. Of course, in other embodiments the weighting ratio may be set according to actual conditions, which is not limited here.
S425: it is determined whether the sum is greater than or equal to a fifth threshold.
Specifically, when the weighting ratio λ:β:η is 2:2:6, it is judged whether the final calculation result res is greater than or equal to 8; the fifth threshold may also take other values, which is not limited here.
S426: and if so, judging that the target to be detected is a normal detection target.
Specifically, if the final calculation result res is greater than or equal to 8, it is determined that the target to be detected is a normal detection target, and subsequent detection and tracking can be performed on the target.
S427: otherwise, discarding the target to be detected.
Specifically, if the final calculation result res is less than 8, the target to be detected is judged to be an abnormal detection target and is discarded without further detection, which saves resources and overall processing time.
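Putting steps S420-S427 together, a minimal sketch of the hierarchical weighting check; the thresholds 60%, 120%, and 50%, the weighting ratio 2:2:6, and the acceptance value 8 follow the example values above, while the function interface itself is an assumption:

```python
def verify_target(lane_overlap, width_ratio, bg_overlap,
                  weights=(2, 2, 6), accept=8):
    # Condition a: detection frame lies mostly inside the first lane area
    a = 1 if lane_overlap > 0.60 else 0
    # Condition b: frame width is plausible relative to the lane width
    b = 1 if width_ratio < 1.20 else 0
    # Condition c: frame is not dominated by the GMMS background
    c = 1 if bg_overlap < 0.50 else 0
    # S424: weighted sum res = lambda*a + beta*b + eta*c
    res = weights[0] * a + weights[1] * b + weights[2] * c
    # S425-S427: accept as a normal detection target, else discard
    return res >= accept

# e.g. verify_target(0.8, 1.0, 0.1) -> True (2 + 2 + 6 = 10 >= 8)
```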
Through this design, the final judgment is made by comprehensively considering multiple kinds of information, which ensures the accuracy of filtering abnormal detection targets and reduces the detector's false detection probability.
Referring to fig. 13, fig. 13 is a schematic structural diagram of an embodiment of the object detection system of the present application. The target detection system specifically includes:
an obtaining module 10, configured to obtain a plurality of video frames; the video frame comprises at least one target to be detected.
A segmentation module 12, coupled to the obtaining module 10, for segmenting the foreground and the background in each video frame to obtain a corresponding segmentation result.
And an overlapping module 14, coupled to the segmentation module 12, for performing pixel overlapping on the segmentation results of the plurality of video frames to obtain an overlapping result, and obtaining a candidate lane region corresponding to the foreground from the overlapping result.
And the correcting module 16 is coupled to the overlaying module 14 and configured to correct the candidate lane area according to the target track of the target to be detected in the plurality of video frames to obtain a first lane area.
And the checking module 18 is coupled with the correcting module 16 and used for checking the target to be detected by a hierarchical weighting method by utilizing the background and the first lane area in the superposition result.
Referring to fig. 14, fig. 14 is a block diagram of an embodiment of an object detection system according to the present application. The object detection system includes a memory 20 and a processor 22 coupled to each other. Specifically, in the present embodiment, the memory 20 stores program instructions, and the processor 22 is configured to execute the program instructions to implement the target detection method mentioned in any of the above embodiments.
Specifically, the processor 22 may also be referred to as a CPU (Central Processing Unit). The processor 22 may be an integrated circuit chip having signal processing capabilities. The processor 22 may also be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. In addition, the processor 22 may be jointly implemented by a plurality of integrated circuit chips.
Referring to fig. 15, fig. 15 is a block diagram illustrating an embodiment of a computer-readable storage medium of the present application. The computer-readable storage medium 30 stores a computer program 300 that can be executed by a processor to implement the target detection method mentioned in any of the above embodiments. The computer program 300 may be stored in the computer-readable storage medium 30 in the form of a software product and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The computer-readable storage medium 30 may be any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, or a terminal device such as a computer, server, mobile phone, or tablet.
In summary, unlike the prior art, the target detection method provided by the present application includes: acquiring a plurality of video frames, each comprising at least one target to be detected; segmenting the foreground and the background in each video frame to obtain a corresponding segmentation result; performing pixel superposition on the segmentation results of the plurality of video frames to obtain a superposition result and obtaining from it a candidate lane area corresponding to the foreground; correcting the candidate lane area according to the target track of the target to be detected in the plurality of video frames to obtain a first lane area; and finally verifying the target to be detected by a hierarchical weighting method using the background in the superposition result and the first lane area. This design ensures accurate filtering of abnormal detection targets, thereby reducing the probability of false detection.
The above description is only for the purpose of illustrating embodiments of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application or are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (10)

1. A method of object detection, comprising:
acquiring a plurality of video frames; the video frame comprises at least one target to be detected;
segmenting the foreground and the background in each video frame to obtain a corresponding segmentation result;
pixel superposition is carried out on the segmentation results of the video frames to obtain a superposition result, and a candidate lane area corresponding to the foreground is obtained from the superposition result;
correcting the candidate lane area according to the target track of the target to be detected in the plurality of video frames to obtain a first lane area;
and verifying the target to be detected by using the background in the superposition result and the first lane area through a hierarchical weighting method.
2. The object detection method according to claim 1, wherein the step of verifying the object to be detected by a hierarchical weighting method using the background in the superimposed result and the first lane area comprises:
acquiring a target detection frame of the target to be detected and first information thereof from a target detector; wherein the first information includes size information and a spatial position of the target detection frame, and the size information includes a width of the target detection frame;
judging the target to be detected to be an abnormal detection target in response to at least one of: the spatial position of the target detection frame being outside the first lane area, the width of the target detection frame and the width of the first lane area meeting a preset condition, or the position corresponding to the target detection frame being the background;
and verifying the type of the target to be detected by a hierarchical weighting method according to the size information and the spatial position.
3. The object detection method according to claim 2, wherein the step of checking the type of the object to be detected by a hierarchical weighting method according to the size and the spatial position comprises:
obtaining a first overlapping proportion between the target detection frame and the first lane area, a ratio between the width of the target detection frame and the width of the first lane area, and a second overlapping proportion between the target detection frame and the background;
setting a first condition value to a preset constant in response to the first overlap ratio being greater than a first threshold; setting a second condition value to a preset constant in response to the ratio being greater than a second threshold value and less than a third threshold value; setting a third condition value to a preset constant in response to the second overlap ratio being less than a fourth threshold;
obtaining a sum of a first product of the first condition value and a first weighting ratio, a second product of the second condition value and a second weighting ratio, and a third product of the third condition value and a third weighting ratio;
and responding to the fact that the sum value is larger than or equal to a fifth threshold value, and judging the target to be detected to be a normal detection target.
4. The object detection method according to claim 1, wherein the step of pixel-overlapping the segmentation results of the plurality of video frames to obtain an overlapping result, and obtaining the candidate lane region corresponding to the foreground from the overlapping result comprises:
pixel superposition is carried out on the segmentation results of the video frames to obtain a superposition result, and edge detection is carried out on the superposed foreground to obtain a candidate lane area corresponding to the foreground; each candidate lane area is configured with a lane number;
traversing all the targets to be detected in the video frames; each target to be detected is configured with a target number;
and responding to the condition that the target to be detected is created, acquiring a candidate lane area where the target to be detected is located, and binding the target number of the target to be detected with the lane number of the corresponding candidate lane area.
5. The object detection method according to claim 4, wherein the step of correcting the candidate lane area according to the target trajectory of the object to be detected in the plurality of video frames to obtain a first lane area comprises:
in response to the state of the target to be detected not being "created" and being "deleted", judging whether the target number of the target to be detected is bound to the lane number of the corresponding candidate lane area;
if so, correcting the candidate lane area through the target track set from creation to deletion of the target to be detected to obtain the first lane area, and determining the direction of the first lane area through the target track point displacement direction of the target to be detected.
6. The object detection method according to claim 5, wherein the set of object tracks includes a plurality of object tracks from creation to deletion of the object to be detected; the step of correcting the candidate lane area through the set of target tracks from creation to deletion of the target to be detected to obtain the first lane area includes:
fitting a plurality of target tracks of the target to be detected to obtain a fitting line;
correcting the lane line slope of the candidate lane region using the fitted line to obtain the first lane region.
7. The object detection method of claim 1, wherein the step of segmenting the foreground and the background in each of the video frames to obtain the corresponding segmentation result comprises:
establishing a Gaussian model for each first pixel point in a first video frame, and taking the pixel value of the first pixel point as a model mean value;
obtaining a pixel value of a second pixel point at the same position as the first pixel point in the current video frame, and judging whether a difference value between the pixel value of the second pixel point and the model mean value is greater than or equal to a sixth threshold value;
if so, judging that the target to be detected corresponding to the second pixel point is a foreground;
and otherwise, judging that the target to be detected corresponding to the second pixel point is a background.
8. The object detection method of claim 1, wherein the step of segmenting the foreground and the background in each of the video frames to obtain the corresponding segmentation result is followed by:
the background noise is removed by a morphological noise filter.
9. An object detection system, comprising a memory and a processor coupled to each other, the memory having stored therein program instructions, the processor being configured to execute the program instructions to implement the object detection method of any one of claims 1 to 8.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for implementing the object detection method of any one of claims 1 to 8.
CN202111481397.7A 2021-12-06 2021-12-06 Object detection method, object detection system, and computer-readable storage medium Pending CN114419531A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111481397.7A CN114419531A (en) 2021-12-06 2021-12-06 Object detection method, object detection system, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111481397.7A CN114419531A (en) 2021-12-06 2021-12-06 Object detection method, object detection system, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN114419531A true CN114419531A (en) 2022-04-29

Family

ID=81265545

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111481397.7A Pending CN114419531A (en) 2021-12-06 2021-12-06 Object detection method, object detection system, and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN114419531A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113838110A (en) * 2021-09-08 2021-12-24 重庆紫光华山智安科技有限公司 Target detection result verification method and device, storage medium and electronic equipment
CN113838110B (en) * 2021-09-08 2023-09-05 重庆紫光华山智安科技有限公司 Verification method and device for target detection result, storage medium and electronic equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination