CN114821451B - Offline target detection method and system for traffic signal lamp video - Google Patents

Offline target detection method and system for traffic signal lamp video

Info

Publication number
CN114821451B
Authority
CN
China
Prior art keywords
signal lamp
detection
video
lamp group
category
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210737440.XA
Other languages
Chinese (zh)
Other versions
CN114821451A (en)
Inventor
陈海华
于乔烽
何明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nankai University
Original Assignee
Nankai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nankai University filed Critical Nankai University
Priority to CN202210737440.XA
Publication of CN114821451A
Application granted
Publication of CN114821451B


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/584Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of vehicle lights or traffic lights
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/70Labelling scene content, e.g. deriving syntactic or semantic representations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an offline target detection method and system for traffic signal lamp videos, belonging to the field of image and video processing, and comprising the following steps: acquiring a first video containing traffic signal lamps as a training set; training a YOLOv5 neural network with the training set; acquiring a second video of the traffic signal lamps to be detected; acquiring, with the trained neural network, a first position coordinate and a first category of the signal lamp group and a second position coordinate and a second category of the signal lamp in the second video; judging, according to the first position coordinate and the first category, whether false detection or missed detection of the signal lamp group exists, so as to obtain the detection result of the signal lamp group; then judging whether false detection or missed detection of the signal lamp exists; and if so, deleting the false detections of the signal lamp and supplementing the missed detections to obtain the detection result of the signal lamp. By exploiting the target detection results of the whole video in a post-processing manner, the method and the system improve the accuracy of signal lamp detection responses in the video.

Description

Offline target detection method and system for traffic signal lamp video
Technical Field
The invention relates to the field of image and video processing, in particular to an off-line target detection method and system for a traffic signal lamp video.
Background
With the development of the economy, the number of motor vehicles in China has grown rapidly, the demand for motorized travel is increasing day by day, and motor vehicle exhaust has become one of the main sources of urban air pollution. Exhaust emission is especially severe in the signal lamp areas of intersections, where motor vehicles are in the starting stage. The placement and identification of traffic signal lamps are therefore very important.
From the current state of research, there are various methods for signal lamp target detection, such as methods based on image processing, methods combining image processing and machine learning, and methods based on deep learning. Image-processing-based methods extract a region of interest using one or more image features, and classify within that region to judge the signal lamp category. In such methods, however, small errors at critical stages of detection, such as thresholding and filtering, may lead to erroneous results. Methods combining image processing and machine learning still segment the region of interest using image features, but judge the signal lamp category by training machine learning classifiers such as a support vector machine (SVM) on hand-crafted image features such as HOG and SURF. Deep-learning-based methods train a convolutional neural network target detector on a picture data set annotated with target positions and categories, and use the trained weights to detect signal lamps. Common target detectors include R-CNN, SSD and YOLO; because they can learn more robust signal lamp features from a large amount of training data, deep-learning-based methods generally outperform non-deep-learning methods. Most of these methods perform online detection, using only the image features of the current frame to detect the position and category of the signal lamp; at present there is little research on detecting signal lamps in offline videos using global information.
Disclosure of Invention
The invention aims to provide an off-line target detection method and system for a traffic signal lamp video, which can improve the accuracy of signal lamp detection response in the video in a post-processing mode by utilizing the target detection result of the whole video.
In order to achieve the purpose, the invention provides the following scheme:
an off-line target detection method for traffic signal lamp videos comprises the following steps:
acquiring a YOLOv5 convolutional neural network;
acquiring a video containing a traffic signal lamp, and recording the video as a first video;
carrying out picture segmentation on the first video and carrying out signal lamp group and signal lamp labeling to obtain training set data;
training the YOLOv5 convolutional neural network with the training set data;
acquiring a traffic signal lamp video to be detected, and recording the traffic signal lamp video as a second video;
acquiring a first position coordinate and a first category of a signal lamp group and a second position coordinate and a second category of a signal lamp in the second video by using a trained YOLOv5 convolutional neural network, wherein the first category and the second category both comprise multiple categories;
judging whether false detection or missed detection of the signal lamp group exists according to the first position coordinate and the first category;
if so, deleting the false detection of the signal lamp group and supplementing the missed detection of the signal lamp group to obtain the detection result of the signal lamp group;
judging whether the signal lamp is mistakenly detected or missed according to the detection result of the signal lamp group, the second position coordinate and the second category;
and if so, deleting the false detection of the signal lamp, and supplementing the missed detection of the signal lamp to obtain the detection result of the signal lamp.
Optionally, judging whether there is an error detection of the signal lamp group according to the first position coordinate and the first category specifically includes:
sequentially traversing the first position coordinates and the first category of the signal lamp group of the second video;
judging, according to the first category, whether a signal lamp group of one of the first categories is detected in 3 consecutive frames;
if yes, calculating the complete intersection ratio of the signal lamp group target frame according to the first position coordinate;
judging whether the complete intersection ratio is smaller than a threshold value;
if yes, judging the detection of the signal lamp group to be a false detection and deleting it.
Optionally, a kernel correlation filtering algorithm is adopted to supplement the missed detection of the signal lamp group.
Optionally, when the missed detection spans 20 frames or fewer, the missed detection of the signal lamp group is supplemented using a kernel correlation filtering algorithm; when it spans more than 20 frames, the missed detection of the signal lamp group is supplemented using a kernel correlation filtering algorithm together with linear interpolation.
Optionally, the second category includes 8 signal light categories: red circle light, green circle light, red left-turn arrow, green left-turn arrow, red straight arrow, green straight arrow, red right-turn arrow, and green right-turn arrow.
Optionally, if there is a missed detection of the signal lamp, before the step of supplementing the missed detection of the signal lamp, the method further includes: identifying the color of the signal lamp.
Optionally, identifying the color of the signal lamp specifically includes:
acquiring a signal lamp area in a first category area in a second video, carrying out Gaussian filtering, and recording a filtered image as a first image;
cutting the first image into two parts with equal areas, and respectively recording the two parts as a first part and a second part;
acquiring color channel difference values of the first part and the second part;
and judging the color of the signal lamp according to the color channel difference value.
Optionally, deleting the false detection of the signal lamp specifically includes: deleting the signal lamp detection results outside the first position coordinates of the signal lamp group.
Optionally, the threshold is 0.75.
An offline object detection system for traffic signal light video, comprising:
the neural network acquisition module is used for acquiring a YOLOv5 convolutional neural network;
the first video acquisition module is used for acquiring a video containing a traffic signal lamp and recording the video as a first video;
the training set acquisition module is used for carrying out picture segmentation on the first video and carrying out signal lamp group and signal lamp labeling to obtain training set data;
a training module for training the YOLOv5 convolutional neural network with the training set data;
the second video acquisition module is used for acquiring a traffic signal lamp video to be detected and recording the traffic signal lamp video as a second video;
the identification module is used for acquiring a first position coordinate and a first category of a signal lamp group and a second position coordinate and a second category of a signal lamp in the second video by using a trained YOLOv5 convolutional neural network, wherein the first category and the second category both comprise multiple categories;
the signal lamp group judging module is used for judging whether false detection or missed detection of the signal lamp group exists according to the first position coordinate and the first category;
the signal lamp group detection module is used for deleting the false detection of the signal lamp group and supplementing the missed detection of the signal lamp group to obtain the detection result of the signal lamp group when false detection or missed detection of the signal lamp group exists;
the signal lamp judging module is used for judging whether false detection or missed detection of the signal lamp exists according to the detection result of the signal lamp group, the second position coordinate and the second category;
and the signal lamp detection module is used for deleting the false detection of the signal lamp and supplementing the missed detection of the signal lamp to obtain the detection result of the signal lamp when false detection or missed detection of the signal lamp exists.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention provides a post-processing scheme for off-line target detection of a video containing signal lamps, which is used for acquiring the shape and color information of the signal lamps in the video to be detected. The method comprises the steps of dividing detection responses into signal lamp group types and signal lamp types, carrying out post-processing restoration on the detection responses of the target of the signal lamp group types, and carrying out supplementary classification and error detection response deletion on the detection responses of the target of the signal lamp types by utilizing the position distribution characteristics and the image color characteristics of the signal lamps in the signal lamp group. Therefore, the problem that the signal lamp is rarely detected by utilizing global information for the offline video at present is solved, and the detection accuracy of the video overall signal lamp is improved.
Based on target detection, inter-frame information processing and image color feature classification, this patent greatly improves the accuracy of detecting the position, color and category of signal lamps in signal lamp videos. The detected information can be used to guide the driving speed of motor vehicles, which can effectively reduce the number of times vehicles start and stop and reduce exhaust emissions. Correct guidance of vehicle speed can also increase the passing speed of vehicles at traffic intersections and relieve traffic congestion.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
FIG. 1 is a flow chart of an off-line target detection method for traffic signal light video according to the present invention;
fig. 2 is a flowchart of post-processing of a signal lamp group class target detection response according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating training categories of a neural network-based target detection network according to an embodiment of the present invention;
FIG. 4 is a flowchart of the kernel correlation filtering tracking algorithm according to an embodiment of the present invention;
fig. 5 is a schematic diagram of prediction of a target frame of a signal light group class according to an embodiment of the present invention;
fig. 6 is a flowchart of post-processing of signal lamp type target detection response according to an embodiment of the present invention;
fig. 7 is a schematic diagram of signal lamp position distribution based on signal lamp groups according to an embodiment of the present invention;
FIG. 8 is a block diagram of an off-line object detection system for traffic signal video according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide an off-line target detection method and system for a traffic signal lamp video, which can improve the accuracy of signal lamp detection response in the video in a post-processing mode by utilizing the target detection result of the whole video.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
The technical scheme of the invention is realized by the following processes:
1. using a video containing a traffic signal lamp to perform picture segmentation, performing signal lamp group and signal lamp labeling, and training a convolutional neural network;
2. acquiring target types and target detection frame coordinates of 3 types of signal lamp groups and 8 types of signal lamps through a trained YOLOv5 convolutional neural network, and calculating the size through the coordinates;
3. for the signal lamp group type target detection response, the post-processing operation comprises the following steps:
3.1 if the signal lamp group detection response exists, namely the network detects a signal lamp group target, extracting the signal lamp group category detection responses of 3 consecutive frames, judging through the complete intersection over union (CIoU) of the detection frames whether the 2nd frame has a size and position detection error, and deleting the erroneous detection response, i.e. setting the signal lamp group detection response of that frame to absent;
3.2 if the signal lamp group detection response does not exist and the number of consecutive missing frames is less than or equal to 20, using a kernel correlation filtering tracking algorithm to predict the position of the signal lamp group area;
3.3 if the signal lamp group detection response does not exist and the number of consecutive missing frames is greater than 20, in addition to tracking with the kernel correlation filtering tracking algorithm, connecting the upper-left and lower-right corner coordinates of the preceding and following signal lamp group detection boxes by linear interpolation, the interpolated values serving as the prediction box coordinates of the signal lamp group in the vacant frames;
4. for signal lamp type target detection response, the post-processing operation comprises the following steps:
4.1 cutting out signal lamp areas according to the position distribution characteristics of the signal lamps in the 3 categories of signal lamp groups;
4.2 counting, over the whole video, the signal lamp category target detection responses in each signal lamp area, the response type with the highest frequency of occurrence being taken as the form type of that signal lamp;
4.3 judging whether a signal lamp category target detection response obtained by the convolutional neural network exists in the signal lamp area, and if not, performing supplementary color classification;
4.4 if a signal lamp category target detection response exists in the signal lamp area but does not accord with the form type determined by the global statistics, deleting it and then performing color classification;
4.5 directly deleting signal lamp category target detection responses outside the signal lamp group target frame.
Based on the above scheme, the present invention provides a specific method flow. As shown in fig. 1, the offline target detection method for traffic signal lamp video comprises:
step 101: a YOLOv5 convolutional neural network is obtained.
Step 102: and acquiring a video containing the traffic signal lamp and recording the video as a first video.
Step 103: and carrying out picture segmentation on the first video and carrying out signal lamp group and signal lamp labeling to obtain training set data.
Step 104: training the YOLOv5 convolutional neural network with the training set data.
Step 105: and acquiring a traffic signal lamp video to be detected, and recording the traffic signal lamp video as a second video.
Step 106: and acquiring a first position coordinate and a first category of the signal lamp group in the second video and a second position coordinate and a second category of the signal lamp group in the second video by using a trained YOLOv5 convolutional neural network, wherein the first category and the second category comprise multiple categories.
The target detection result of the YOLOv5 neural network on the traffic signal lamp video to be detected (i.e. the second video) is obtained, namely the target category response and the coordinate response of the rectangular frame where each target is located. The training categories of the target detection neural network are shown in fig. 3 and comprise 3 signal lamp group categories, Group1, Group2 and Group3 (containing 1, 2 and 3 signal lamps, respectively), and 8 signal lamp categories: red circle lamp R, green circle lamp G, red left-turn arrow RL, green left-turn arrow GL, red straight arrow RF, green straight arrow GF, red right-turn arrow RR and green right-turn arrow GR. Only 1 of the 3 signal lamp group categories can appear in any one signal lamp video.
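For reference, this category taxonomy can be written down directly; the following Python lines are an illustrative transcription, and the list ordering and identifiers are ours, not fixed by the patent:

```python
# The 3 signal lamp group categories and 8 signal lamp categories of fig. 3.
# Only one of the three group categories appears in any one signal lamp video.
GROUP_CLASSES = ["Group1", "Group2", "Group3"]  # groups with 1, 2 and 3 lamps
LAMP_CLASSES = ["R", "G", "RL", "GL", "RF", "GF", "RR", "GR"]
```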
Step 107: and judging whether the signal lamp group is subjected to false detection or missing detection according to the first position coordinate and the first category.
Step 108: and if so, deleting the false detection of the signal lamp group and supplementing the missed detection of the signal lamp group to obtain the detection result of the signal lamp group.
Since the post-processing of signal lamp category targets needs the position and category information of signal lamp group category target detection as a basis, steps 107-108 perform false detection deletion and missed detection supplementation on the signal lamp group category target detection responses. The specific flowchart is shown in fig. 2 and specifically includes:
step 102 a: and deleting the error detection result of the neural network on the signal lamp group type target.
The signal lamp group category target detection position responses of the whole video are traversed in sequence, and the size of the target object is calculated from the detection results. If a signal lamp group category target of a certain category is detected in 3 consecutive frames, these are recorded as the 1st, 2nd and 3rd frames. The CIoU values between the signal lamp group target frame of the 2nd frame and those of the 1st and 3rd frames are calculated; if both CIoU values are less than 0.75, it is judged that the signal lamp group target detection position response in the 2nd frame has a size and position detection error, that target detection position response is deleted, and the signal lamp group target detection response of the frame is supplemented with a correct position response in the subsequent steps.
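As a concrete sketch of this check, the following Python computes CIoU for boxes given as (x1, y1, x2, y2) and flags the 2nd frame. It is a minimal illustration assuming the standard CIoU definition (IoU minus a normalized center-distance term and an aspect-ratio term); the function names and epsilon guards are ours, and the patent fixes only the 0.75 threshold.

```python
import math

def ciou(a, b, eps=1e-9):
    """Complete intersection over union (CIoU) of two boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Plain IoU.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / (union + eps)
    # Squared distance between box centers, normalized by the squared
    # diagonal of the smallest enclosing box.
    rho2 = ((ax1 + ax2 - bx1 - bx2) ** 2 + (ay1 + ay2 - by1 - by2) ** 2) / 4.0
    c2 = ((max(ax2, bx2) - min(ax1, bx1)) ** 2 +
          (max(ay2, by2) - min(ay1, by1)) ** 2 + eps)
    # Aspect-ratio consistency term.
    v = (4.0 / math.pi ** 2) * (math.atan((bx2 - bx1) / (by2 - by1 + eps)) -
                                math.atan((ax2 - ax1) / (ay2 - ay1 + eps))) ** 2
    alpha = v / (1.0 - iou + v + eps)
    return iou - rho2 / c2 - alpha * v

def middle_frame_is_false(b1, b2, b3, thr=0.75):
    """Step 102a test: flag frame 2 when its CIoU with frames 1 and 3 is low."""
    return ciou(b2, b1) < thr and ciou(b2, b3) < thr
```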
Step 102 b: and supplementing the frame of the detection result of the non-signal lamp group type target.
Such frames include frames in which the neural network does not detect the signal lamp group target and frames whose result was deleted in step 102a. As shown in fig. 4, the signal lamp group category target detection results of the whole video are traversed in sequence. If the signal lamp group category target in a frame is detected, i.e. a signal lamp group category target detection response exists, the kernel correlation filtering tracking algorithm is initialized with the image within the target frame range, and K = 0 is initialized, where K is the count variable of each tracking run. If no signal lamp group category target detection response exists in a frame, the kernel correlation filtering algorithm is started to predict the coordinates of the signal lamp group category target frame of the frame; these coordinates are taken as the signal lamp group category target detection position response of the frame, the kernel correlation filtering tracking algorithm is re-initialized with the predicted region image, and K = K + 1. These steps are repeated during the traversal, with the upper limit of each tracking run set to K_0 = 20; if K > K_0, the kernel correlation filtering tracking algorithm is terminated. Through this step, the signal lamp group target detection responses in the frames whose detection results were deleted in step 102a and in the frames where the signal lamp group detection response is continuously missing for 20 frames or fewer can be preliminarily supplemented as correct position responses.
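A minimal sketch of this gap-filling step, assuming OpenCV's stock KCF tracker (exposed as cv2.TrackerKCF_create or cv2.legacy.TrackerKCF_create depending on the build) stands in for the patent's kernel correlation filtering tracking algorithm; the function name and box convention are ours:

```python
import cv2

def fill_gaps_with_kcf(frames, boxes, k_max=20):
    """Step 102b sketch: fill short detection gaps by KCF tracking.

    frames: BGR images; boxes: per-frame (x1, y1, x2, y2) or None.
    Gaps longer than k_max frames are left for linear interpolation
    (step 102c).
    """
    make_kcf = (cv2.legacy.TrackerKCF_create if hasattr(cv2, "legacy")
                else cv2.TrackerKCF_create)
    out, tracker, k = list(boxes), None, 0
    for i, frame in enumerate(frames):
        if out[i] is not None:
            # A real detection: (re)initialize the tracker and reset K.
            x1, y1, x2, y2 = out[i]
            tracker, k = make_kcf(), 0
            tracker.init(frame, (int(x1), int(y1), int(x2 - x1), int(y2 - y1)))
        elif tracker is not None:
            k += 1
            if k > k_max:          # tracking upper limit K_0 = 20 exceeded
                tracker = None
                continue
            ok, roi = tracker.update(frame)
            if ok:
                x, y, w, h = roi
                out[i] = (x, y, x + w, y + h)
                # Re-initialize on the predicted region, as the step describes.
                tracker = make_kcf()
                tracker.init(frame, (int(x), int(y), int(w), int(h)))
            else:
                tracker = None     # tracker lost the target; leave the gap
    return out
```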
Step 102c: when frames without a signal lamp group target detection result appear continuously and their total number exceeds K_0 (a constant representing the upper limit of the number of frames tracked each time), the coordinates of the signal lamp group target frame are predicted by linear interpolation, specifically as follows:

As shown in fig. 5, let the upper-left and lower-right corner coordinates of the signal lamp group target frame in frame M + K_0 be

$$P_{tl}^{M+K_0}, \qquad P_{br}^{M+K_0},$$

and the upper-left and lower-right corner coordinates of the signal lamp group target frame in frame M + N + 1 be

$$P_{tl}^{M+N+1}, \qquad P_{br}^{M+N+1}.$$

Then the upper-left and lower-right corner coordinates of the signal lamp group target frame in frame M + K_0 + n can be expressed as

$$P_{tl}^{M+K_0+n} = P_{tl}^{M+K_0} + \frac{n}{N + 1 - K_0}\left(P_{tl}^{M+N+1} - P_{tl}^{M+K_0}\right),$$

$$P_{br}^{M+K_0+n} = P_{br}^{M+K_0} + \frac{n}{N + 1 - K_0}\left(P_{br}^{M+N+1} - P_{br}^{M+K_0}\right),$$

where $P_{tl}$ and $P_{br}$ are coordinate vectors, each containing the two coordinates x and y, and n and N are counting scalars. The linear interpolation described above assumes that the signal lamp group is correctly detected in both the Mth frame and the (M + N + 1)th frame.
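The interpolation reduces to a few lines of Python; the sketch below assumes the fraction n / (N + 1 - K_0) reconstructed above, with gap_len = N - K_0 vacant frames between the two known boxes:

```python
def interpolate_gap(box_a, box_b, gap_len):
    """Step 102c sketch: linear interpolation of corner coordinates.

    box_a is the last available box (frame M + K_0), box_b the next detected
    box (frame M + N + 1), gap_len = N - K_0 the number of vacant frames.
    Returns one predicted (x1, y1, x2, y2) box per vacant frame, in order.
    """
    preds = []
    for n in range(1, gap_len + 1):
        t = n / (gap_len + 1)  # fraction of the way from box_a to box_b
        preds.append(tuple(a + t * (b - a) for a, b in zip(box_a, box_b)))
    return preds
```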
If a plurality of signal lamp groups are detected in the (M + N + 1)th frame, as in the (M + N + 1)th frame of fig. 7, where the right signal lamp group target frame is a false detection response, one of them is selected as the target area. Let Q denote the number of signal lamp groups detected in the (M + N + 1)th frame; then the distance between the qth signal lamp group and the signal lamp group in frame M + K_0 can be written as

$$D_q = d_{tl}^{q} + d_{br}^{q},$$

where $d_{tl}^{q}$ denotes the Euclidean distance between the upper-left corner of the qth signal lamp group and the upper-left corner of the signal lamp group in frame M + K_0, and $d_{br}^{q}$ denotes the Euclidean distance between the lower-right corner of the qth signal lamp group and the lower-right corner of the signal lamp group in frame M + K_0. The method selects the signal lamp group with the minimum D_q as the target area, i.e.

$$q^{*} = \operatorname*{arg\,min}_{q \in \{1, \dots, Q\}} D_q.$$
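A sketch of this selection rule (the helper name is ours):

```python
import math

def nearest_group(candidates, ref_box):
    """Select the candidate group minimizing D_q against the reference box.

    D_q is the sum of the Euclidean distances between the upper-left corners
    and between the lower-right corners of the candidate and reference boxes.
    """
    def corner_dist(b):
        return (math.hypot(b[0] - ref_box[0], b[1] - ref_box[1]) +
                math.hypot(b[2] - ref_box[2], b[3] - ref_box[3]))
    return min(candidates, key=corner_dist)
```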
Step 109: and judging whether the signal lamp is mistakenly detected or missed according to the detection result of the signal lamp group, the second position coordinate and the second category.
Step 110: and if so, deleting the false detection of the signal lamp, and supplementing the missed detection of the signal lamp to obtain the detection result of the signal lamp.
Steps 109-110 count the signal lamp detection data of the entire video on the basis of the signal lamp group processing result of step 108, and correct missed and erroneous signal lamp detections in combination with color judgment so as to further improve the detection accuracy. The specific flowchart is shown in fig. 6 and specifically includes:
step 103 a: in the step, signal lamp area extraction is carried out on the signal lamp group target detection frame. In the Group1 category, the left half part of the intercepting is a signal lamp area, in the Group2 category, the left half part and the right half part are respectively intercepted to be two signal lamp areas, and in the Group3 category, the left, the middle and the right three rectangles with the same size are respectively intercepted to be three signal lamp areas. The specific implementation steps are as follows:
Let the upper-left corner coordinate of the signal lamp group target frame be $(x_1, y_1)$, the lower-right corner coordinate be $(x_2, y_2)$, and the height and width be $h$ and $w$. According to the size characteristics of a signal lamp, the signal lamp width $w_l$ is set. As shown in fig. 7, if the category of the signal lamp group target frame is Group1, the upper-left corner coordinate of the 1 signal lamp in the signal lamp group is determined as $(x_1, y_1)$ and the lower-right corner coordinate as $(x_1 + w_l, y_2)$. If the category of the signal lamp group target frame is Group2, the upper-left corner coordinates of the 2 signal lamps in the signal lamp group are determined as $(x_1, y_1)$ and $(x_2 - w_l, y_1)$, and the lower-right corner coordinates as $(x_1 + w_l, y_2)$ and $(x_2, y_2)$. If the category of the signal lamp group target frame is Group3, the upper-left corner coordinates of the 3 signal lamps in the signal lamp group are determined as $(x_1, y_1)$, $\left(\tfrac{x_1 + x_2 - w_l}{2}, y_1\right)$ and $(x_2 - w_l, y_1)$, and the lower-right corner coordinates as $(x_1 + w_l, y_2)$, $\left(\tfrac{x_1 + x_2 + w_l}{2}, y_2\right)$ and $(x_2, y_2)$.
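A sketch of this cropping rule follows. Note that the patent's formula for the lamp width is not recoverable from the text, so lamp_w is left as an input; choosing lamp_w = y2 - y1 (a roughly square lamp slot) is only an assumption.

```python
def lamp_regions(group_box, group_cls, lamp_w):
    """Step 103a sketch: cut per-lamp regions out of a group target frame.

    group_box = (x1, y1, x2, y2); group_cls is "Group1", "Group2" or
    "Group3"; lamp_w is the signal lamp width set from the lamp's size
    characteristics (the exact formula is not given here).
    """
    x1, y1, x2, y2 = group_box
    if group_cls == "Group1":      # one lamp, anchored at the left edge
        lefts = [x1]
    elif group_cls == "Group2":    # two lamps, at the left and right edges
        lefts = [x1, x2 - lamp_w]
    else:                          # Group3: left, middle and right rectangles
        lefts = [x1, (x1 + x2 - lamp_w) / 2, x2 - lamp_w]
    return [(lx, y1, lx + lamp_w, y2) for lx in lefts]
```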
Step 103 b: the type (second type) of the signal lamp type target detection in the signal lamp area in step 103a is counted. Firstly, 8 signal lamps are classified into 4 forms, the first form is a round lamp type form, and the round lamp type form comprises a red round lamp and a green round lamp; the second type is a left-turn arrow type form, which comprises a red left-turn arrow and a green left-turn arrow; the third type is a straight arrow type form, which comprises a red straight arrow and a green straight arrow; the fourth is a right-turn arrow type configuration, including a red right-turn arrow and a green right-turn arrow. Secondly, counting the number of four types of types in the signal lamp area, and judging the type with the largest number to be the signal lamp type in the area. For example: if the signal lamp Group type in a video segment is Group2, counting the number of form types in a left signal lamp area and a right signal lamp area in the video segment, and if the number of left-turning arrows in the left signal lamp area is the largest, judging the form type in the area as a left-turning arrow; if the right signal lamp area contains a plurality of round lamps, the shape type in the area is judged as the round lamp.
Step 103c: let L denote a signal lamp area image cut out in step 103a. If no signal lamp category target detection response exists in L, the detection result is supplemented by the following steps:
1. First, Gaussian filtering is carried out on L to remove image noise, and L is then cut into an upper image $L_u$ and a lower image $L_d$ of equal area. Let the red and green channel matrices of $L_u$ be $R_u$ and $G_u$, and the red and green channel matrices of $L_d$ be $R_d$ and $G_d$. The color channel difference values of the two regions can then be expressed as

$$d_{GR}^{u} = \operatorname{mean}(G_u - R_u), \qquad d_{GR}^{d} = \operatorname{mean}(G_d - R_d),$$

$$d_{RG}^{u} = \operatorname{mean}(R_u - G_u), \qquad d_{RG}^{d} = \operatorname{mean}(R_d - G_d),$$

where $d_{GR}^{u}$ and $d_{GR}^{d}$ respectively represent the difference values of the G channel and the R channel of the upper and lower images, and $d_{RG}^{u}$ and $d_{RG}^{d}$ respectively represent the difference values of the R channel and the G channel of the upper and lower images.

2. The color of the signal lamp is judged from the color channel difference values according to

$$\text{color}(L) = \begin{cases} \text{Red}, & d_{RG}^{u} > T \\ \text{Green}, & d_{GR}^{d} > T \\ \text{None}, & \text{otherwise}, \end{cases}$$

where T is a preset threshold, and Red, Green and None correspond to red, green and no assigned color, respectively.
3. Combining the form judgment result of step 103b with the color judgment result, detection responses, including the positions and categories of the target detection frames, are added for the frames in which no signal lamp target was detected in the signal lamp area.
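A sketch of this color supplement, assuming the thresholded channel-difference rule reconstructed above (the threshold value and function name are illustrative, not the patent's):

```python
import cv2
import numpy as np

def classify_color(region_bgr, thr=10.0):
    """Step 103c sketch: channel-difference color test on a lamp area L."""
    img = cv2.GaussianBlur(region_bgr, (5, 5), 0)          # remove image noise
    half = img.shape[0] // 2
    upper = img[:half].astype(np.float32)                  # L_u
    lower = img[half:].astype(np.float32)                  # L_d
    # OpenCV channel order is B, G, R.
    d_u = float(np.mean(upper[:, :, 2] - upper[:, :, 1]))  # R - G, upper image
    d_d = float(np.mean(lower[:, :, 1] - lower[:, :, 2]))  # G - R, lower image
    if d_u > thr and d_u >= d_d:
        return "Red"
    if d_d > thr:
        return "Green"
    return "None"
```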
Step 103 d: if the traffic light shape type in the traffic light region L does not match the statistical result of step 103b, the detection result is deleted, and step 103c is executed to supplement the region detection result.
Step 103 e: deleting the detection results of the signal lamps outside the areas of the signal lamp groups Group1, Group2 and Group 3. Because no target signal lamp exists outside the signal lamp group, all signal lamp type target detection responses outside the signal lamp group target frame are deleted on the basis of determining the signal lamp group type target detection frame.
Based on the above method, the present invention also discloses an offline target detection system for traffic signal lamp video, as shown in fig. 8, including:
a neural network obtaining module 201, configured to obtain a YOLOv5 convolutional neural network.
The first video acquiring module 202 is configured to acquire a video including a traffic light, and record the video as a first video.
And the training set acquisition module 203 is configured to perform picture segmentation on the first video and perform signal lamp group and signal lamp labeling to obtain training set data.
A training module 204, configured to train the YOLOv5 convolutional neural network with the training set data.
The second video acquiring module 205 is configured to acquire a traffic signal lamp video to be detected, and record the traffic signal lamp video as a second video.
The identifying module 206 is configured to acquire the first position coordinate and the first category of the signal light group in the second video, and the second position coordinate and the second category of the signal light by using the trained YOLOv5 convolutional neural network, where the first category and the second category each include multiple categories.
And the signal lamp group judging module 207 is configured to judge whether false detection or missed detection of the signal lamp group exists according to the first position coordinate and the first category.
And the signal lamp group detection module 208 is configured to delete the false detection of the signal lamp group and supplement the missed detection of the signal lamp group to obtain a detection result of the signal lamp group when there is false detection or missed detection of the signal lamp group.
And the signal lamp judging module 209 is configured to judge whether there is a false detection or a missed detection of the signal lamp according to the detection result of the signal lamp group, the second position coordinate, and the second category.
And the signal lamp detection module 210 is configured to delete the false detection of the signal lamp and supplement the missed detection of the signal lamp when the false detection or the missed detection of the signal lamp exists, so as to obtain a detection result of the signal lamp.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (4)

1. An off-line target detection method for a traffic signal lamp video is characterized by comprising the following steps:
acquiring a YOLOv5 convolutional neural network;
acquiring a video containing a traffic signal lamp, and recording the video as a first video;
carrying out picture segmentation on the first video and carrying out signal lamp group and signal lamp labeling to obtain training set data;
training the YOLOv5 convolutional neural network with the training set data;
acquiring a traffic signal lamp video to be detected, and recording the traffic signal lamp video as a second video;
acquiring a first position coordinate and a first category of a signal lamp group and a second position coordinate and a second category of a signal lamp in the second video by using a trained YOLOv5 convolutional neural network, wherein the first category and the second category both comprise multiple categories;
judging whether false detection or missed detection of the signal lamp group exists according to the first position coordinate and the first category;
wherein judging whether false detection of the signal lamp group exists according to the first position coordinate and the first category specifically comprises:
sequentially traversing the first position coordinates and the first category of the signal lamp group of the second video;
judging, according to the first category, whether a signal lamp group of one of the first categories is detected in 3 consecutive frames;
if yes, calculating, according to the first position coordinates, the complete intersection ratios between the signal lamp group target frame of the 2nd frame and those of the 1st frame and the 3rd frame;
judging whether the complete intersection ratios are both smaller than a threshold value;
if yes, judging that a size and position detection error occurs in the signal lamp group target detection position response of the 2nd frame, and deleting that target detection position response;
supplementing the missed detection of the signal lamp group to obtain the detection result of the signal lamp group, and specifically comprising the following steps of:
sequentially traversing the signal lamp group category target detection results of the whole video; if the signal lamp group category target in a frame is detected, initializing a kernel correlation filtering tracking algorithm with the image within the target frame range, and initializing K = 0, wherein K is the count variable of each tracking run; if no signal lamp group category target detection response exists in a frame, starting the kernel correlation filtering algorithm to predict the coordinates of the signal lamp group category target frame of the frame, taking the coordinates as the signal lamp group category target detection position response of the frame, initializing the kernel correlation filtering tracking algorithm with the predicted region image, and setting K = K + 1; repeating the above steps during the traversal, with the upper limit of each tracking run set to K_0 = 20; if K > K_0, terminating the kernel correlation filtering tracking algorithm;
when the missed detection exceeds 20 frames, supplementing the missed detection of the signal lamp group by a kernel correlation filtering algorithm together with linear interpolation;
judging whether the signal lamp is mistakenly detected or missed according to the detection result of the signal lamp group, the second position coordinate and the second category;
if yes, deleting the false detection of the signal lamp and supplementing the missed detection of the signal lamp to obtain the detection result of the signal lamp, which specifically comprises:
carrying out signal lamp area extraction on the signal lamp group target detection frame; counting the categories of signal lamp target detections in each signal lamp area; if no signal lamp target detection response exists in a signal lamp area, performing supplementary color classification; if the form category of the signal lamp target detection response in a signal lamp area does not accord with the statistical result, deleting the detection result and then performing supplementary color classification; and deleting target detection responses outside the signal lamp group;
acquiring a signal lamp area image in a first category area in a second video, carrying out Gaussian filtering, and recording the filtered image as a first image;
cutting the first image into two parts with equal areas, and respectively recording the two parts as a first part and a second part;
acquiring color channel difference values of the first part and the second part;
and judging the color of the signal lamp according to the color channel difference value.
2. The method of claim 1, wherein the second category comprises 8 signal light categories: red circle light, green circle light, red left-turn arrow, green left-turn arrow, red straight arrow, green straight arrow, red right-turn arrow, and green right-turn arrow.
3. The method of claim 1, wherein the threshold is 0.75.
4. An offline target detection system for traffic signal light video, comprising:
the neural network acquisition module is used for acquiring a YOLOv5 convolutional neural network;
the first video acquisition module is used for acquiring a video containing a traffic signal lamp and recording the video as a first video;
the training set acquisition module is used for carrying out picture segmentation on the first video and carrying out signal lamp group and signal lamp labeling to obtain training set data;
a training module for training the YOLOv5 convolutional neural network with the training set data;
the second video acquisition module is used for acquiring a traffic signal lamp video to be detected and recording the traffic signal lamp video as a second video;
the identification module is used for acquiring a first position coordinate and a first category of a signal lamp group and a second position coordinate and a second category of a signal lamp in the second video by using a trained YOLOv5 convolutional neural network, wherein the first category and the second category both comprise multiple categories;
the signal lamp group judging module is used for judging whether false detection or missed detection of the signal lamp group exists according to the first position coordinate and the first category; wherein judging whether false detection of the signal lamp group exists according to the first position coordinate and the first category specifically comprises:
sequentially traversing the first position coordinates and the first category of the signal lamp group of the second video;
judging, according to the first category, whether a signal lamp group of one of the first categories is detected in 3 consecutive frames;
if yes, calculating, according to the first position coordinates, the complete intersection ratios between the signal lamp group target frame of the 2nd frame and those of the 1st frame and the 3rd frame;
judging whether the complete intersection ratios are both smaller than a threshold value;
if yes, judging that a size and position detection error occurs in the signal lamp group target detection position response of the 2nd frame, and deleting that target detection position response;
the signal lamp group detection module is used for deleting the false detection of the signal lamp group and supplementing the missed detection of the signal lamp group when false detection or missed detection of the signal lamp group exists, so as to obtain the detection result of the signal lamp group, which specifically comprises:
sequentially traversing the signal lamp group category target detection results of the whole video; if the signal lamp group category target in a frame is detected, initializing a kernel correlation filtering tracking algorithm with the image within the target frame range, and initializing K = 0, wherein K is the count variable of each tracking run; if no signal lamp group category target detection response exists in a frame, starting the kernel correlation filtering algorithm to predict the coordinates of the signal lamp group category target frame of the frame, taking the coordinates as the signal lamp group category target detection position response of the frame, initializing the kernel correlation filtering tracking algorithm with the predicted region image, and setting K = K + 1; repeating the above steps during the traversal, with the upper limit of each tracking run set to K_0 = 20; if K > K_0, terminating the kernel correlation filtering tracking algorithm;
when the missed detection exceeds 20 frames, supplementing the missed detection of the signal lamp group by a kernel correlation filtering algorithm together with linear interpolation;
the signal lamp judging module is used for judging whether false detection or missed detection of the signal lamp exists according to the detection result of the signal lamp group, the second position coordinate and the second category;
the signal lamp detection module is used for deleting the false detection of the signal lamp and supplementing the missed detection of the signal lamp when false detection or missed detection of the signal lamp exists, so as to obtain the detection result of the signal lamp, which specifically comprises:
carrying out signal lamp area extraction on the signal lamp group target detection frame; counting the categories of signal lamp target detections in each signal lamp area; if no signal lamp target detection response exists in a signal lamp area, performing supplementary color classification; if the form category of the signal lamp target detection response in a signal lamp area does not accord with the statistical result, deleting the detection result and then performing supplementary color classification; and deleting target detection responses outside the signal lamp group;
acquiring a signal lamp area image in a first category area in a second video, carrying out Gaussian filtering, and recording the filtered image as a first image;
cutting the first image into two parts with equal areas, and respectively marking the two parts as a first part and a second part;
acquiring color channel difference values of the first part and the second part;
and judging the color of the signal lamp according to the color channel difference value.
CN202210737440.XA 2022-06-28 2022-06-28 Offline target detection method and system for traffic signal lamp video Active CN114821451B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210737440.XA CN114821451B (en) 2022-06-28 2022-06-28 Offline target detection method and system for traffic signal lamp video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210737440.XA CN114821451B (en) 2022-06-28 2022-06-28 Offline target detection method and system for traffic signal lamp video

Publications (2)

Publication Number Publication Date
CN114821451A CN114821451A (en) 2022-07-29
CN114821451B (en) 2022-09-20

Family

ID=82522457

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210737440.XA Active CN114821451B (en) 2022-06-28 2022-06-28 Offline target detection method and system for traffic signal lamp video

Country Status (1)

Country Link
CN (1) CN114821451B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108761A (en) * 2017-12-21 2018-06-01 西北工业大学 A kind of rapid transit signal lamp detection method based on depth characteristic study

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110069986B (en) * 2019-03-13 2021-11-02 北京联合大学 Traffic signal lamp identification method and system based on hybrid model
CN110543814B (en) * 2019-07-22 2022-05-10 华为技术有限公司 Traffic light identification method and device
CN110532903B (en) * 2019-08-12 2022-02-22 浙江大华技术股份有限公司 Traffic light image processing method and equipment
US11527156B2 (en) * 2020-08-03 2022-12-13 Toyota Research Institute, Inc. Light emitting component-wise traffic light state, signal, and transition estimator
CN112149509B (en) * 2020-08-25 2023-05-09 浙江中控信息产业股份有限公司 Traffic signal lamp fault detection method integrating deep learning and image processing

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108761A (en) * 2017-12-21 2018-06-01 西北工业大学 A kind of rapid transit signal lamp detection method based on depth characteristic study

Also Published As

Publication number Publication date
CN114821451A (en) 2022-07-29

Similar Documents

Publication Publication Date Title
CN109636771B (en) Flight target detection method and system based on image processing
CN109118498B (en) Camera stain detection method, device, equipment and storage medium
CN112528878A (en) Method and device for detecting lane line, terminal device and readable storage medium
CN105512660A (en) License number identification method and device
TWI640964B (en) Image-based vehicle counting and classification system
CN112766136B (en) Space parking space detection method based on deep learning
CN113221861B (en) Multi-lane line detection method, device and detection equipment
CN111340855A (en) Road moving target detection method based on track prediction
CN112818905B (en) Finite pixel vehicle target detection method based on attention and spatio-temporal information
CN110555464A (en) Vehicle color identification method based on deep learning model
CN111027475A (en) Real-time traffic signal lamp identification method based on vision
Teutsch et al. Robust detection of moving vehicles in wide area motion imagery
CN114511568B (en) Expressway bridge overhauling method based on unmanned aerial vehicle
Asgarian Dehkordi et al. Vehicle type recognition based on dimension estimation and bag of word classification
CN111985314B (en) Smoke detection method based on ViBe and improved LBP
CN116704490A (en) License plate recognition method, license plate recognition device and computer equipment
CN109978916B (en) Vibe moving target detection method based on gray level image feature matching
CN107862341A (en) A kind of vehicle checking method
CN114821451B (en) Offline target detection method and system for traffic signal lamp video
Gui et al. A fast caption detection method for low quality video images
CN114937248A (en) Vehicle tracking method and device for cross-camera, electronic equipment and storage medium
CN111881914B (en) License plate character segmentation method and system based on self-learning threshold
CN110826564A (en) Small target semantic segmentation method and system in complex scene image
JP4784932B2 (en) Vehicle discrimination device and program thereof
CN115690162A (en) Method and device for detecting moving large target in fixed video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant