CN114821451B - Offline target detection method and system for traffic signal lamp video - Google Patents

Offline target detection method and system for traffic signal lamp video

Info

Publication number
CN114821451B
Authority
CN
China
Prior art keywords
signal lamp
detection
video
lamp group
category
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210737440.XA
Other languages
Chinese (zh)
Other versions
CN114821451A (en)
Inventor
陈海华
于乔烽
何明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nankai University
Original Assignee
Nankai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nankai University filed Critical Nankai University
Priority to CN202210737440.XA
Publication of CN114821451A
Application granted
Publication of CN114821451B


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/584Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of vehicle lights or traffic lights
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/70Labelling scene content, e.g. deriving syntactic or semantic representations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an offline target detection method and system for traffic signal lamp videos, belonging to the field of image and video processing, and comprising the following steps: acquiring a first video containing traffic signal lamps as a training set; training a YOLOv5 neural network with the training set; acquiring a second video of the traffic signal lamps to be detected; acquiring, with the trained neural network, a first position coordinate and a first category of the signal lamp group and a second position coordinate and a second category of the signal lamp in the second video; judging, according to the first position coordinate and the first category, whether false detection or missed detection of the signal lamp group exists, so as to obtain the detection result of the signal lamp group; then judging whether false detection or missed detection of the signal lamp exists; and if so, deleting the false detections of the signal lamp and supplementing the missed detections to obtain the detection result of the signal lamp. By exploiting the target detection results of the whole video in a post-processing manner, the method and the system improve the accuracy of signal lamp detection responses in the video.

Description

Offline target detection method and system for traffic signal lamp video
Technical Field
The invention relates to the field of image and video processing, in particular to an off-line target detection method and system for a traffic signal lamp video.
Background
With the development of the economy, the number of motor vehicles in China has grown rapidly, the demand for motorized travel is increasing day by day, and motor vehicle exhaust has become one of the main sources of urban air pollution. Exhaust emission is especially severe in the signal lamp areas of intersections, where motor vehicles are in the starting stage. The placement and identification of traffic signal lamps are therefore very important.
From the current state of research, there are various methods for signal lamp target detection, such as methods based on image processing, methods combining image processing and machine learning, and methods based on deep learning. Image-processing-based methods extract a region of interest using one or more image features, and classify within that region to judge the signal lamp category. In such methods, however, small errors at critical stages of detection, such as thresholding and filtering, may lead to erroneous results. Methods combining image processing and machine learning still segment the region of interest using image features, but judge the signal lamp category by training machine learning classifiers such as a support vector machine (SVM) on hand-crafted image features such as HOG and SURF. Deep-learning-based methods train a convolutional neural network target detector on a picture data set annotated with target positions and categories, and use the trained weights to detect signal lamps. Common target detectors include R-CNN, SSD and YOLO; because they can learn more robust signal lamp features from a large amount of training data, deep-learning-based methods generally outperform non-deep-learning methods. Most of these methods perform online detection, using only the image features of the current frame to detect the position and category of the signal lamp; at present there is little research on detecting signal lamps in offline videos using global information.
Disclosure of Invention
The invention aims to provide an off-line target detection method and system for a traffic signal lamp video, which can improve the accuracy of signal lamp detection response in the video in a post-processing mode by utilizing the target detection result of the whole video.
In order to achieve the purpose, the invention provides the following scheme:
an off-line target detection method for traffic signal lamp videos comprises the following steps:
acquiring a YOLOv5 convolutional neural network;
acquiring a video containing a traffic signal lamp, and recording the video as a first video;
carrying out picture segmentation on the first video and carrying out signal lamp group and signal lamp labeling to obtain training set data;
training the YOLOv5 convolutional neural network with the training set data;
acquiring a traffic signal lamp video to be detected, and recording the traffic signal lamp video as a second video;
acquiring a first position coordinate and a first category of a signal lamp group and a second position coordinate and a second category of a signal lamp in the second video by using a trained YOLOv5 convolutional neural network, wherein the first category and the second category both comprise multiple categories;
judging whether false detection or missed detection of the signal lamp group exists according to the first position coordinate and the first category;
if so, deleting the false detection of the signal lamp group and supplementing the missed detection of the signal lamp group to obtain the detection result of the signal lamp group;
judging whether the signal lamp is mistakenly detected or missed according to the detection result of the signal lamp group, the second position coordinate and the second category;
and if so, deleting the false detection of the signal lamp, and supplementing the missed detection of the signal lamp to obtain the detection result of the signal lamp.
Optionally, judging whether there is an error detection of the signal lamp group according to the first position coordinate and the first category specifically includes:
sequentially traversing the first position coordinates and the first category of the signal lamp group of the second video;
judging, according to the first category, whether a signal lamp group of one of the first categories is detected in 3 consecutive frames;
if yes, calculating the complete intersection ratio of the signal lamp group target frame according to the first position coordinate;
judging whether the complete intersection ratio is smaller than a threshold value;
if yes, judging the detection of the signal lamp group to be a false detection and deleting it.
Optionally, a kernel correlation filtering algorithm is adopted to supplement the missed detection of the signal lamp group.
Optionally, when the missed detection spans 20 frames or fewer, the missed detection of the signal lamp group is supplemented using a kernel correlation filtering algorithm; when it spans more than 20 frames, the missed detection of the signal lamp group is supplemented using a kernel correlation filtering algorithm together with linear interpolation.
Optionally, the second category includes 8 signal light categories: red circle light, green circle light, red left-turn arrow, green left-turn arrow, red straight arrow, green straight arrow, red right-turn arrow, and green right-turn arrow.
Optionally, if there is a missed detection of the signal lamp, before the step of supplementing the missed detection of the signal lamp, the method further includes: identifying the color of the signal lamp.
Optionally, identifying the color of the signal lamp specifically includes:
acquiring a signal lamp area in a first category area in a second video, carrying out Gaussian filtering, and recording a filtered image as a first image;
cutting the first image into two parts with equal areas, and respectively recording the two parts as a first part and a second part;
acquiring color channel difference values of the first part and the second part;
and judging the color of the signal lamp according to the color channel difference value.
Optionally, deleting the false detection of the signal lamp specifically includes: deleting the signal lamp detection results outside the first position coordinates of the signal lamp group.
Optionally, the threshold is 0.75.
An offline object detection system for traffic signal light video, comprising:
the neural network acquisition module is used for acquiring a YOLOv5 convolutional neural network;
the first video acquisition module is used for acquiring a video containing a traffic signal lamp and recording the video as a first video;
the training set acquisition module is used for carrying out picture segmentation on the first video and carrying out signal lamp group and signal lamp labeling to obtain training set data;
a training module for training the YOLOv5 convolutional neural network with the training set data;
the second video acquisition module is used for acquiring a traffic signal lamp video to be detected and recording the traffic signal lamp video as a second video;
the identification module is used for acquiring a first position coordinate and a first category of a signal lamp group and a second position coordinate and a second category of a signal lamp in the second video by using a trained YOLOv5 convolutional neural network, wherein the first category and the second category both comprise multiple categories;
the signal lamp group judging module is used for judging whether false detection or missed detection of the signal lamp group exists according to the first position coordinate and the first category;
the signal lamp group detection module is used for deleting the false detection of the signal lamp group and supplementing the missed detection of the signal lamp group to obtain the detection result of the signal lamp group when false detection or missed detection of the signal lamp group exists;
the signal lamp judging module is used for judging whether false detection or missed detection of the signal lamp exists according to the detection result of the signal lamp group, the second position coordinate and the second category;
and the signal lamp detection module is used for deleting the false detection of the signal lamp and supplementing the missed detection of the signal lamp to obtain the detection result of the signal lamp when false detection or missed detection of the signal lamp exists.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention provides a post-processing scheme for off-line target detection of a video containing signal lamps, which is used for acquiring the shape and color information of the signal lamps in the video to be detected. The method comprises the steps of dividing detection responses into signal lamp group types and signal lamp types, carrying out post-processing restoration on the detection responses of the target of the signal lamp group types, and carrying out supplementary classification and error detection response deletion on the detection responses of the target of the signal lamp types by utilizing the position distribution characteristics and the image color characteristics of the signal lamps in the signal lamp group. Therefore, the problem that the signal lamp is rarely detected by utilizing global information for the offline video at present is solved, and the detection accuracy of the video overall signal lamp is improved.
Based on target detection, inter-frame information processing and image color feature classification, this patent greatly improves the accuracy of detecting the position, color and category of signal lamps in signal lamp videos. The detected information can be used to guide the driving speed of motor vehicles, which can effectively reduce the number of times vehicles start and stop and reduce exhaust emissions. Correct guidance of vehicle speed can also increase the passing speed of vehicles at traffic intersections and relieve traffic congestion.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
FIG. 1 is a flow chart of an off-line target detection method for traffic signal light video according to the present invention;
fig. 2 is a flowchart of post-processing of a signal lamp group class target detection response according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating training categories of a neural network-based target detection network according to an embodiment of the present invention;
FIG. 4 is a flowchart of the kernel correlation filtering tracking algorithm according to an embodiment of the present invention;
fig. 5 is a schematic diagram of prediction of a target frame of a signal light group class according to an embodiment of the present invention;
fig. 6 is a flowchart of post-processing of signal lamp type target detection response according to an embodiment of the present invention;
fig. 7 is a schematic diagram of signal lamp position distribution based on signal lamp groups according to an embodiment of the present invention;
FIG. 8 is a block diagram of an off-line object detection system for traffic signal video according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide an off-line target detection method and system for a traffic signal lamp video, which can improve the accuracy of signal lamp detection response in the video in a post-processing mode by utilizing the target detection result of the whole video.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
The technical scheme of the invention is realized by the following processes:
1. using a video containing a traffic signal lamp to perform picture segmentation, performing signal lamp group and signal lamp labeling, and training a convolutional neural network;
2. acquiring target types and target detection frame coordinates of 3 types of signal lamp groups and 8 types of signal lamps through a trained YOLOv5 convolutional neural network, and calculating the size through the coordinates;
3. for the signal lamp group type target detection response, the post-processing operation comprises the following steps:
3.1 if the signal lamp group detection response exists, namely the network detects a signal lamp group target, extracting the signal lamp group category detection responses of 3 consecutive frames, judging through the complete intersection over union (CIoU) of the detection frames whether the 2nd frame has a size and position detection error, and deleting the erroneous detection response, i.e. setting the signal lamp group detection response of that frame to absent;
3.2 if the signal lamp group detection response does not exist and the number of consecutive missing frames is less than or equal to 20, using a kernel correlation filtering tracking algorithm to predict the position of the signal lamp group area;
3.3 if the signal lamp group detection response does not exist and the number of consecutive missing frames is greater than 20, in addition to tracking with the kernel correlation filtering tracking algorithm, connecting the upper-left and lower-right corner coordinates of the preceding and following signal lamp group detection boxes by linear interpolation, the interpolated values serving as the prediction box coordinates of the signal lamp group in the vacant frames;
4. for signal lamp type target detection response, the post-processing operation comprises the following steps:
4.1 cutting out signal lamp areas according to the position distribution characteristics of the signal lamps in the 3 categories of signal lamp groups;
4.2 counting, over the whole video, the signal lamp category target detection responses in each signal lamp area, the response type with the highest frequency of occurrence being taken as the form type of that signal lamp;
4.3 judging whether a signal lamp category target detection response obtained by the convolutional neural network exists in the signal lamp area, and if not, performing supplementary color classification;
4.4 if a signal lamp category target detection response exists in the signal lamp area but does not accord with the form type determined by the global statistics, deleting it and then performing color classification;
4.5 directly deleting signal lamp category target detection responses outside the signal lamp group target frame.
Based on the above scheme, the present invention provides a specific method flow. As shown in fig. 1, the offline target detection method for traffic signal lamp video comprises:
step 101: a YOLOv5 convolutional neural network is obtained.
Step 102: and acquiring a video containing the traffic signal lamp and recording the video as a first video.
Step 103: and carrying out picture segmentation on the first video and carrying out signal lamp group and signal lamp labeling to obtain training set data.
Step 104: training the YOLOv5 convolutional neural network with the training set data.
Step 105: and acquiring a traffic signal lamp video to be detected, and recording the traffic signal lamp video as a second video.
Step 106: and acquiring a first position coordinate and a first category of the signal lamp group in the second video and a second position coordinate and a second category of the signal lamp group in the second video by using a trained YOLOv5 convolutional neural network, wherein the first category and the second category comprise multiple categories.
The target detection result of the YOLOv5 neural network on the traffic signal lamp video to be detected (i.e. the second video) is obtained, namely the target category response and the coordinate response of the rectangular frame where each target is located. The training categories of the target detection neural network are shown in fig. 3 and comprise 3 signal lamp group categories, Group1, Group2 and Group3 (containing 1, 2 and 3 signal lamps, respectively), and 8 signal lamp categories: red circle lamp R, green circle lamp G, red left-turn arrow RL, green left-turn arrow GL, red straight arrow RF, green straight arrow GF, red right-turn arrow RR and green right-turn arrow GR. Only 1 of the 3 signal lamp group categories can appear in any one signal lamp video.
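For reference, this category taxonomy can be written down directly; the following Python lines are an illustrative transcription, and the list ordering and identifiers are ours, not fixed by the patent:

```python
# The 3 signal lamp group categories and 8 signal lamp categories of fig. 3.
# Only one of the three group categories appears in any one signal lamp video.
GROUP_CLASSES = ["Group1", "Group2", "Group3"]  # groups with 1, 2 and 3 lamps
LAMP_CLASSES = ["R", "G", "RL", "GL", "RF", "GF", "RR", "GR"]
```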
Step 107: and judging whether the signal lamp group is subjected to false detection or missing detection according to the first position coordinate and the first category.
Step 108: and if so, deleting the false detection of the signal lamp group and supplementing the missed detection of the signal lamp group to obtain the detection result of the signal lamp group.
Since the post-processing of signal lamp category targets needs the position and category information of signal lamp group category target detection as a basis, steps 107-108 perform false detection deletion and missed detection supplementation on the signal lamp group category target detection responses. The specific flowchart is shown in fig. 2 and specifically includes:
step 102 a: and deleting the error detection result of the neural network on the signal lamp group type target.
The signal lamp group category target detection position responses of the whole video are traversed in sequence, and the size of the target object is calculated from the detection results. If a signal lamp group category target of a certain category is detected in 3 consecutive frames, these are recorded as the 1st, 2nd and 3rd frames. The CIoU values between the signal lamp group target frame of the 2nd frame and those of the 1st and 3rd frames are calculated; if both CIoU values are less than 0.75, it is judged that the signal lamp group target detection position response in the 2nd frame has a size and position detection error, that target detection position response is deleted, and the signal lamp group target detection response of the frame is supplemented with a correct position response in the subsequent steps.
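As a concrete sketch of this check, the following Python computes CIoU for boxes given as (x1, y1, x2, y2) and flags the 2nd frame. It is a minimal illustration assuming the standard CIoU definition (IoU minus a normalized center-distance term and an aspect-ratio term); the function names and epsilon guards are ours, and the patent fixes only the 0.75 threshold.

```python
import math

def ciou(a, b, eps=1e-9):
    """Complete intersection over union (CIoU) of two boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Plain IoU.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / (union + eps)
    # Squared distance between box centers, normalized by the squared
    # diagonal of the smallest enclosing box.
    rho2 = ((ax1 + ax2 - bx1 - bx2) ** 2 + (ay1 + ay2 - by1 - by2) ** 2) / 4.0
    c2 = ((max(ax2, bx2) - min(ax1, bx1)) ** 2 +
          (max(ay2, by2) - min(ay1, by1)) ** 2 + eps)
    # Aspect-ratio consistency term.
    v = (4.0 / math.pi ** 2) * (math.atan((bx2 - bx1) / (by2 - by1 + eps)) -
                                math.atan((ax2 - ax1) / (ay2 - ay1 + eps))) ** 2
    alpha = v / (1.0 - iou + v + eps)
    return iou - rho2 / c2 - alpha * v

def middle_frame_is_false(b1, b2, b3, thr=0.75):
    """Step 102a test: flag frame 2 when its CIoU with frames 1 and 3 is low."""
    return ciou(b2, b1) < thr and ciou(b2, b3) < thr
```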
Step 102 b: and supplementing the frame of the detection result of the non-signal lamp group type target.
Such frames include frames in which the neural network does not detect the signal lamp group target and frames whose result was deleted in step 102a. As shown in fig. 4, the signal lamp group category target detection results of the whole video are traversed in sequence. If the signal lamp group category target in a frame is detected, i.e. a signal lamp group category target detection response exists, the kernel correlation filtering tracking algorithm is initialized with the image within the target frame range, and K = 0 is initialized, where K is the count variable of each tracking run. If no signal lamp group category target detection response exists in a frame, the kernel correlation filtering algorithm is started to predict the coordinates of the signal lamp group category target frame of the frame; these coordinates are taken as the signal lamp group category target detection position response of the frame, the kernel correlation filtering tracking algorithm is re-initialized with the predicted region image, and K = K + 1. These steps are repeated during the traversal, with the upper limit of each tracking run set to K_0 = 20; if K > K_0, the kernel correlation filtering tracking algorithm is terminated. Through this step, the signal lamp group target detection responses in the frames whose detection results were deleted in step 102a and in the frames where the signal lamp group detection response is continuously missing for 20 frames or fewer can be preliminarily supplemented as correct position responses.
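A minimal sketch of this gap-filling step, assuming OpenCV's stock KCF tracker (exposed as cv2.TrackerKCF_create or cv2.legacy.TrackerKCF_create depending on the build) stands in for the patent's kernel correlation filtering tracking algorithm; the function name and box convention are ours:

```python
import cv2

def fill_gaps_with_kcf(frames, boxes, k_max=20):
    """Step 102b sketch: fill short detection gaps by KCF tracking.

    frames: BGR images; boxes: per-frame (x1, y1, x2, y2) or None.
    Gaps longer than k_max frames are left for linear interpolation
    (step 102c).
    """
    make_kcf = (cv2.legacy.TrackerKCF_create if hasattr(cv2, "legacy")
                else cv2.TrackerKCF_create)
    out, tracker, k = list(boxes), None, 0
    for i, frame in enumerate(frames):
        if out[i] is not None:
            # A real detection: (re)initialize the tracker and reset K.
            x1, y1, x2, y2 = out[i]
            tracker, k = make_kcf(), 0
            tracker.init(frame, (int(x1), int(y1), int(x2 - x1), int(y2 - y1)))
        elif tracker is not None:
            k += 1
            if k > k_max:          # tracking upper limit K_0 = 20 exceeded
                tracker = None
                continue
            ok, roi = tracker.update(frame)
            if ok:
                x, y, w, h = roi
                out[i] = (x, y, x + w, y + h)
                # Re-initialize on the predicted region, as the step describes.
                tracker = make_kcf()
                tracker.init(frame, (int(x), int(y), int(w), int(h)))
            else:
                tracker = None     # tracker lost the target; leave the gap
    return out
```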
Step 102c: when frames without a signal lamp group target detection result appear continuously and their total number exceeds K_0 (a constant representing the upper limit of the number of frames tracked each time), the coordinates of the signal lamp group target frame are predicted by linear interpolation, specifically as follows:

As shown in fig. 5, let the upper-left and lower-right corner coordinates of the signal lamp group target frame in frame M + K_0 be

$$P_{tl}^{M+K_0}, \qquad P_{br}^{M+K_0},$$

and the upper-left and lower-right corner coordinates of the signal lamp group target frame in frame M + N + 1 be

$$P_{tl}^{M+N+1}, \qquad P_{br}^{M+N+1}.$$

Then the upper-left and lower-right corner coordinates of the signal lamp group target frame in frame M + K_0 + n can be expressed as

$$P_{tl}^{M+K_0+n} = P_{tl}^{M+K_0} + \frac{n}{N + 1 - K_0}\left(P_{tl}^{M+N+1} - P_{tl}^{M+K_0}\right),$$

$$P_{br}^{M+K_0+n} = P_{br}^{M+K_0} + \frac{n}{N + 1 - K_0}\left(P_{br}^{M+N+1} - P_{br}^{M+K_0}\right),$$

where $P_{tl}$ and $P_{br}$ are coordinate vectors, each containing the two coordinates x and y, and n and N are counting scalars. The linear interpolation described above assumes that the signal lamp group is correctly detected in both the Mth frame and the (M + N + 1)th frame.
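The interpolation reduces to a few lines of Python; the sketch below assumes the fraction n / (N + 1 - K_0) reconstructed above, with gap_len = N - K_0 vacant frames between the two known boxes:

```python
def interpolate_gap(box_a, box_b, gap_len):
    """Step 102c sketch: linear interpolation of corner coordinates.

    box_a is the last available box (frame M + K_0), box_b the next detected
    box (frame M + N + 1), gap_len = N - K_0 the number of vacant frames.
    Returns one predicted (x1, y1, x2, y2) box per vacant frame, in order.
    """
    preds = []
    for n in range(1, gap_len + 1):
        t = n / (gap_len + 1)  # fraction of the way from box_a to box_b
        preds.append(tuple(a + t * (b - a) for a, b in zip(box_a, box_b)))
    return preds
```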
If a plurality of signal lamp groups are detected in the (M + N + 1)th frame, as in the (M + N + 1)th frame of fig. 7, where the right signal lamp group target frame is a false detection response, one of them is selected as the target area. Let Q denote the number of signal lamp groups detected in the (M + N + 1)th frame; then the distance between the qth signal lamp group and the signal lamp group in frame M + K_0 can be written as

$$D_q = d_{tl}^{q} + d_{br}^{q},$$

where $d_{tl}^{q}$ denotes the Euclidean distance between the upper-left corner of the qth signal lamp group and the upper-left corner of the signal lamp group in frame M + K_0, and $d_{br}^{q}$ denotes the Euclidean distance between the lower-right corner of the qth signal lamp group and the lower-right corner of the signal lamp group in frame M + K_0. The method selects the signal lamp group with the minimum D_q as the target area, i.e.

$$q^{*} = \operatorname*{arg\,min}_{q \in \{1, \dots, Q\}} D_q.$$
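A sketch of this selection rule (the helper name is ours):

```python
import math

def nearest_group(candidates, ref_box):
    """Select the candidate group minimizing D_q against the reference box.

    D_q is the sum of the Euclidean distances between the upper-left corners
    and between the lower-right corners of the candidate and reference boxes.
    """
    def corner_dist(b):
        return (math.hypot(b[0] - ref_box[0], b[1] - ref_box[1]) +
                math.hypot(b[2] - ref_box[2], b[3] - ref_box[3]))
    return min(candidates, key=corner_dist)
```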
Step 109: and judging whether the signal lamp is mistakenly detected or missed according to the detection result of the signal lamp group, the second position coordinate and the second category.
Step 110: and if so, deleting the false detection of the signal lamp, and supplementing the missed detection of the signal lamp to obtain the detection result of the signal lamp.
Steps 109-110 count the signal lamp detection data of the entire video on the basis of the signal lamp group processing result of step 108, and correct missed and erroneous signal lamp detections in combination with color judgment so as to further improve the detection accuracy. The specific flowchart is shown in fig. 6 and specifically includes:
step 103 a: in the step, signal lamp area extraction is carried out on the signal lamp group target detection frame. In the Group1 category, the left half part of the intercepting is a signal lamp area, in the Group2 category, the left half part and the right half part are respectively intercepted to be two signal lamp areas, and in the Group3 category, the left, the middle and the right three rectangles with the same size are respectively intercepted to be three signal lamp areas. The specific implementation steps are as follows:
Let the upper-left corner coordinate of the signal lamp group target frame be $(x_1, y_1)$, the lower-right corner coordinate be $(x_2, y_2)$, and the height and width be $h$ and $w$. According to the size characteristics of a signal lamp, the signal lamp width $w_l$ is set. As shown in fig. 7, if the category of the signal lamp group target frame is Group1, the upper-left corner coordinate of the 1 signal lamp in the signal lamp group is determined as $(x_1, y_1)$ and the lower-right corner coordinate as $(x_1 + w_l, y_2)$. If the category of the signal lamp group target frame is Group2, the upper-left corner coordinates of the 2 signal lamps in the signal lamp group are determined as $(x_1, y_1)$ and $(x_2 - w_l, y_1)$, and the lower-right corner coordinates as $(x_1 + w_l, y_2)$ and $(x_2, y_2)$. If the category of the signal lamp group target frame is Group3, the upper-left corner coordinates of the 3 signal lamps in the signal lamp group are determined as $(x_1, y_1)$, $\left(\tfrac{x_1 + x_2 - w_l}{2}, y_1\right)$ and $(x_2 - w_l, y_1)$, and the lower-right corner coordinates as $(x_1 + w_l, y_2)$, $\left(\tfrac{x_1 + x_2 + w_l}{2}, y_2\right)$ and $(x_2, y_2)$.
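A sketch of this cropping rule follows. Note that the patent's formula for the lamp width is not recoverable from the text, so lamp_w is left as an input; choosing lamp_w = y2 - y1 (a roughly square lamp slot) is only an assumption.

```python
def lamp_regions(group_box, group_cls, lamp_w):
    """Step 103a sketch: cut per-lamp regions out of a group target frame.

    group_box = (x1, y1, x2, y2); group_cls is "Group1", "Group2" or
    "Group3"; lamp_w is the signal lamp width set from the lamp's size
    characteristics (the exact formula is not given here).
    """
    x1, y1, x2, y2 = group_box
    if group_cls == "Group1":      # one lamp, anchored at the left edge
        lefts = [x1]
    elif group_cls == "Group2":    # two lamps, at the left and right edges
        lefts = [x1, x2 - lamp_w]
    else:                          # Group3: left, middle and right rectangles
        lefts = [x1, (x1 + x2 - lamp_w) / 2, x2 - lamp_w]
    return [(lx, y1, lx + lamp_w, y2) for lx in lefts]
```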
Step 103 b: the type (second type) of the signal lamp type target detection in the signal lamp area in step 103a is counted. Firstly, 8 signal lamps are classified into 4 forms, the first form is a round lamp type form, and the round lamp type form comprises a red round lamp and a green round lamp; the second type is a left-turn arrow type form, which comprises a red left-turn arrow and a green left-turn arrow; the third type is a straight arrow type form, which comprises a red straight arrow and a green straight arrow; the fourth is a right-turn arrow type configuration, including a red right-turn arrow and a green right-turn arrow. Secondly, counting the number of four types of types in the signal lamp area, and judging the type with the largest number to be the signal lamp type in the area. For example: if the signal lamp Group type in a video segment is Group2, counting the number of form types in a left signal lamp area and a right signal lamp area in the video segment, and if the number of left-turning arrows in the left signal lamp area is the largest, judging the form type in the area as a left-turning arrow; if the right signal lamp area contains a plurality of round lamps, the shape type in the area is judged as the round lamp.
Step 103c: let L denote a signal lamp area image cut out in step 103a. If no signal lamp category target detection response exists in L, the detection result is supplemented by the following steps:
1. First, Gaussian filtering is carried out on L to remove image noise, and L is then cut into an upper image $L_u$ and a lower image $L_d$ of equal area. Let the red and green channel matrices of $L_u$ be $R_u$ and $G_u$, and the red and green channel matrices of $L_d$ be $R_d$ and $G_d$. The color channel difference values of the two regions can then be expressed as

$$d_{GR}^{u} = \operatorname{mean}(G_u - R_u), \qquad d_{GR}^{d} = \operatorname{mean}(G_d - R_d),$$

$$d_{RG}^{u} = \operatorname{mean}(R_u - G_u), \qquad d_{RG}^{d} = \operatorname{mean}(R_d - G_d),$$

where $d_{GR}^{u}$ and $d_{GR}^{d}$ respectively represent the difference values of the G channel and the R channel of the upper and lower images, and $d_{RG}^{u}$ and $d_{RG}^{d}$ respectively represent the difference values of the R channel and the G channel of the upper and lower images.

2. The color of the signal lamp is judged from the color channel difference values according to

$$\text{color}(L) = \begin{cases} \text{Red}, & d_{RG}^{u} > T \\ \text{Green}, & d_{GR}^{d} > T \\ \text{None}, & \text{otherwise}, \end{cases}$$

where T is a preset threshold, and Red, Green and None correspond to red, green and no assigned color, respectively.
3. Combining the form judgment result of step 103b with the color judgment result, detection responses, including the positions and categories of the target detection frames, are added for the frames in which no signal lamp target was detected in the signal lamp area.
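A sketch of this color supplement, assuming the thresholded channel-difference rule reconstructed above (the threshold value and function name are illustrative, not the patent's):

```python
import cv2
import numpy as np

def classify_color(region_bgr, thr=10.0):
    """Step 103c sketch: channel-difference color test on a lamp area L."""
    img = cv2.GaussianBlur(region_bgr, (5, 5), 0)          # remove image noise
    half = img.shape[0] // 2
    upper = img[:half].astype(np.float32)                  # L_u
    lower = img[half:].astype(np.float32)                  # L_d
    # OpenCV channel order is B, G, R.
    d_u = float(np.mean(upper[:, :, 2] - upper[:, :, 1]))  # R - G, upper image
    d_d = float(np.mean(lower[:, :, 1] - lower[:, :, 2]))  # G - R, lower image
    if d_u > thr and d_u >= d_d:
        return "Red"
    if d_d > thr:
        return "Green"
    return "None"
```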
Step 103 d: if the traffic light shape type in the traffic light region L does not match the statistical result of step 103b, the detection result is deleted, and step 103c is executed to supplement the region detection result.
Step 103 e: deleting the detection results of the signal lamps outside the areas of the signal lamp groups Group1, Group2 and Group 3. Because no target signal lamp exists outside the signal lamp group, all signal lamp type target detection responses outside the signal lamp group target frame are deleted on the basis of determining the signal lamp group type target detection frame.
Based on the above method, the present invention also discloses an offline target detection system for traffic signal lamp video, as shown in fig. 8, including:
a neural network obtaining module 201, configured to obtain a YOLOv5 convolutional neural network.
The first video acquiring module 202 is configured to acquire a video including a traffic light, and record the video as a first video.
And the training set acquisition module 203 is configured to perform picture segmentation on the first video and perform signal lamp group and signal lamp labeling to obtain training set data.
A training module 204, configured to train the YOLOv5 convolutional neural network with the training set data.
The second video acquiring module 205 is configured to acquire a traffic signal lamp video to be detected, and record the traffic signal lamp video as a second video.
The identifying module 206 is configured to acquire the first position coordinate and the first category of the signal light group in the second video, and the second position coordinate and the second category of the signal light by using the trained YOLOv5 convolutional neural network, where the first category and the second category each include multiple categories.
And the signal lamp group judging module 207 is configured to judge whether false detection or missed detection of the signal lamp group exists according to the first position coordinate and the first category.
And the signal lamp group detection module 208 is configured to delete the false detection of the signal lamp group and supplement the missed detection of the signal lamp group to obtain a detection result of the signal lamp group when there is false detection or missed detection of the signal lamp group.
And the signal lamp judging module 209 is configured to judge whether there is a false detection or a missed detection of the signal lamp according to the detection result of the signal lamp group, the second position coordinate, and the second category.
And the signal lamp detection module 210 is configured to delete the false detection of the signal lamp and supplement the missed detection of the signal lamp when the false detection or the missed detection of the signal lamp exists, so as to obtain a detection result of the signal lamp.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (4)

1. An off-line target detection method for a traffic signal lamp video is characterized by comprising the following steps:
acquiring a YOLOv5 convolutional neural network;
acquiring a video containing a traffic signal lamp, and recording the video as a first video;
carrying out picture segmentation on the first video and carrying out signal lamp group and signal lamp labeling to obtain training set data;
training the YOLOv5 convolutional neural network with the training set data;
acquiring a traffic signal lamp video to be detected, and recording the traffic signal lamp video as a second video;
acquiring a first position coordinate and a first category of a signal lamp group and a second position coordinate and a second category of a signal lamp in the second video by using a trained YOLOv5 convolutional neural network, wherein the first category and the second category both comprise multiple categories;
judging whether false detection or missed detection of the signal lamp group exists according to the first position coordinate and the first category;
wherein judging whether false detection of the signal lamp group exists according to the first position coordinate and the first category specifically comprises:
sequentially traversing the first position coordinates and the first category of the signal lamp group of the second video;
judging, according to the first category, whether a signal lamp group of one of the first categories is detected in 3 consecutive frames;
if yes, calculating, according to the first position coordinates, the complete intersection ratios between the signal lamp group target frame of the 2nd frame and those of the 1st frame and the 3rd frame;
judging whether the complete intersection ratios are both smaller than a threshold value;
if yes, judging that a size and position detection error occurs in the signal lamp group target detection position response of the 2nd frame, and deleting that target detection position response;
supplementing the missed detection of the signal lamp group to obtain the detection result of the signal lamp group, and specifically comprising the following steps of:
sequentially traversing the signal lamp group category target detection results of the whole video; if the signal lamp group category target in a frame is detected, initializing a kernel correlation filtering tracking algorithm with the image within the target frame range, and initializing K = 0, wherein K is the count variable of each tracking run; if no signal lamp group category target detection response exists in a frame, starting the kernel correlation filtering algorithm to predict the coordinates of the signal lamp group category target frame of the frame, taking the coordinates as the signal lamp group category target detection position response of the frame, initializing the kernel correlation filtering tracking algorithm with the predicted region image, and setting K = K + 1; repeating the above steps during the traversal, with the upper limit of each tracking run set to K_0 = 20; if K > K_0, terminating the kernel correlation filtering tracking algorithm;
when the missed detection exceeds 20 frames, supplementing the missed detection of the signal lamp group by a kernel correlation filtering algorithm together with linear interpolation;
judging whether the signal lamp is mistakenly detected or missed according to the detection result of the signal lamp group, the second position coordinate and the second category;
if yes, deleting the false detection of the signal lamp and supplementing the missed detection of the signal lamp to obtain the detection result of the signal lamp, which specifically comprises:
carrying out signal lamp area extraction on the signal lamp group target detection frame; counting the categories of signal lamp target detections in each signal lamp area; if no signal lamp target detection response exists in a signal lamp area, performing supplementary color classification; if the form category of the signal lamp target detection response in a signal lamp area does not accord with the statistical result, deleting the detection result and then performing supplementary color classification; and deleting target detection responses outside the signal lamp group;
acquiring a signal lamp area image in a first category area in a second video, carrying out Gaussian filtering, and recording the filtered image as a first image;
cutting the first image into two parts with equal areas, and respectively recording the two parts as a first part and a second part;
acquiring color channel difference values of the first part and the second part;
and judging the color of the signal lamp according to the color channel difference value.
2. The method of claim 1, wherein the second category comprises 8 signal light categories: red circle light, green circle light, red left-turn arrow, green left-turn arrow, red straight arrow, green straight arrow, red right-turn arrow, and green right-turn arrow.
3. The method of claim 1, wherein the threshold is 0.75.
4. An offline target detection system for traffic signal light video, comprising:
the neural network acquisition module is used for acquiring a YOLOv5 convolutional neural network;
the first video acquisition module is used for acquiring a video containing a traffic signal lamp and recording the video as a first video;
the training set acquisition module is used for carrying out picture segmentation on the first video and carrying out signal lamp group and signal lamp labeling to obtain training set data;
a training module for training the YOLOv5 convolutional neural network with the training set data;
the second video acquisition module is used for acquiring a traffic signal lamp video to be detected and recording the traffic signal lamp video as a second video;
the identification module is used for acquiring a first position coordinate and a first category of a signal lamp group and a second position coordinate and a second category of a signal lamp in the second video by using a trained YOLOv5 convolutional neural network, wherein the first category and the second category both comprise multiple categories;
the signal lamp group judging module is used for judging whether false detection or missed detection of the signal lamp group exists according to the first position coordinate and the first category; wherein judging whether false detection of the signal lamp group exists according to the first position coordinate and the first category specifically comprises:
sequentially traversing the first position coordinates and the first category of the signal lamp group of the second video;
judging, according to the first category, whether a signal lamp group of one of the first categories is detected in 3 consecutive frames;
if yes, calculating, according to the first position coordinates, the complete intersection ratios between the signal lamp group target frame of the 2nd frame and those of the 1st frame and the 3rd frame;
judging whether the complete intersection ratios are both smaller than a threshold value;
if yes, judging that a size and position detection error occurs in the signal lamp group target detection position response of the 2nd frame, and deleting that target detection position response;
the signal lamp group detection module is used for deleting the false detection of the signal lamp group and supplementing the missed detection of the signal lamp group when false detection or missed detection of the signal lamp group exists, so as to obtain the detection result of the signal lamp group, which specifically comprises:
sequentially traversing the signal lamp group category target detection results of the whole video; if the signal lamp group category target in a frame is detected, initializing a kernel correlation filtering tracking algorithm with the image within the target frame range, and initializing K = 0, wherein K is the count variable of each tracking run; if no signal lamp group category target detection response exists in a frame, starting the kernel correlation filtering algorithm to predict the coordinates of the signal lamp group category target frame of the frame, taking the coordinates as the signal lamp group category target detection position response of the frame, initializing the kernel correlation filtering tracking algorithm with the predicted region image, and setting K = K + 1; repeating the above steps during the traversal, with the upper limit of each tracking run set to K_0 = 20; if K > K_0, terminating the kernel correlation filtering tracking algorithm;
when the missed detection exceeds 20 frames, supplementing the missed detection of the signal lamp group by a kernel correlation filtering algorithm together with linear interpolation;
the signal lamp judging module is used for judging whether false detection or missed detection of the signal lamp exists according to the detection result of the signal lamp group, the second position coordinate and the second category;
the signal lamp detection module is used for deleting the false detection of the signal lamp and supplementing the missed detection of the signal lamp when false detection or missed detection of the signal lamp exists, so as to obtain the detection result of the signal lamp, which specifically comprises:
carrying out signal lamp area extraction on the signal lamp group target detection frame; counting the categories of signal lamp target detections in each signal lamp area; if no signal lamp target detection response exists in a signal lamp area, performing supplementary color classification; if the form category of the signal lamp target detection response in a signal lamp area does not accord with the statistical result, deleting the detection result and then performing supplementary color classification; and deleting target detection responses outside the signal lamp group;
acquiring a signal lamp area image in a first category area in a second video, carrying out Gaussian filtering, and recording the filtered image as a first image;
cutting the first image into two parts with equal areas, and respectively marking the two parts as a first part and a second part;
acquiring color channel difference values of the first part and the second part;
and judging the color of the signal lamp according to the color channel difference value.
CN202210737440.XA 2022-06-28 2022-06-28 Offline target detection method and system for traffic signal lamp video Active CN114821451B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210737440.XA CN114821451B (en) 2022-06-28 2022-06-28 Offline target detection method and system for traffic signal lamp video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210737440.XA CN114821451B (en) 2022-06-28 2022-06-28 Offline target detection method and system for traffic signal lamp video

Publications (2)

Publication Number Publication Date
CN114821451A CN114821451A (en) 2022-07-29
CN114821451B (en) 2022-09-20

Family

ID=82522457

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210737440.XA Active CN114821451B (en) 2022-06-28 2022-06-28 Offline target detection method and system for traffic signal lamp video

Country Status (1)

Country Link
CN (1) CN114821451B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108761A (en) * 2017-12-21 2018-06-01 西北工业大学 A kind of rapid transit signal lamp detection method based on depth characteristic study

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110069986B (en) * 2019-03-13 2021-11-02 北京联合大学 Traffic signal lamp identification method and system based on hybrid model
CN110543814B (en) * 2019-07-22 2022-05-10 华为技术有限公司 Traffic light identification method and device
CN110532903B (en) * 2019-08-12 2022-02-22 浙江大华技术股份有限公司 Traffic light image processing method and equipment
US11527156B2 (en) * 2020-08-03 2022-12-13 Toyota Research Institute, Inc. Light emitting component-wise traffic light state, signal, and transition estimator
CN112149509B (en) * 2020-08-25 2023-05-09 浙江中控信息产业股份有限公司 Traffic signal lamp fault detection method integrating deep learning and image processing

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108761A (en) * 2017-12-21 2018-06-01 西北工业大学 A kind of rapid transit signal lamp detection method based on depth characteristic study

Also Published As

Publication number Publication date
CN114821451A (en) 2022-07-29

Similar Documents

Publication Publication Date Title
CN109636771B (en) Flight target detection method and system based on image processing
CN109118498B (en) Camera stain detection method, device, equipment and storage medium
CN112528878A (en) Method and device for detecting lane line, terminal device and readable storage medium
CN105512660A (en) License number identification method and device
TWI640964B (en) Image-based vehicle counting and classification system
CN112766136B (en) Space parking space detection method based on deep learning
CN113221861B (en) Multi-lane line detection method, device and detection equipment
CN111340855A (en) Road moving target detection method based on track prediction
CN112818905B (en) Finite pixel vehicle target detection method based on attention and spatio-temporal information
CN110555464A (en) Vehicle color identification method based on deep learning model
CN111027475A (en) Real-time traffic signal lamp identification method based on vision
Teutsch et al. Robust detection of moving vehicles in wide area motion imagery
CN114511568B (en) Expressway bridge overhauling method based on unmanned aerial vehicle
Asgarian Dehkordi et al. Vehicle type recognition based on dimension estimation and bag of word classification
CN111985314B (en) Smoke detection method based on ViBe and improved LBP
CN116704490A (en) License plate recognition method, license plate recognition device and computer equipment
CN109978916B (en) Vibe moving target detection method based on gray level image feature matching
CN107862341A (en) A kind of vehicle checking method
CN114821451B (en) Offline target detection method and system for traffic signal lamp video
Gui et al. A fast caption detection method for low quality video images
CN114937248A (en) Vehicle tracking method and device for cross-camera, electronic equipment and storage medium
CN111881914B (en) License plate character segmentation method and system based on self-learning threshold
CN110826564A (en) Small target semantic segmentation method and system in complex scene image
JP4784932B2 (en) Vehicle discrimination device and program thereof
CN115690162A (en) Method and device for detecting moving large target in fixed video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant