CN113111862B - Vehicle tail lamp state identification method based on action-state joint learning - Google Patents

Vehicle tail lamp state identification method based on action-state joint learning

Info

Publication number
CN113111862B
CN113111862B (application CN202110519911.5A)
Authority
CN
China
Prior art keywords
tail lamp
state
act
vehicle
brake
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110519911.5A
Other languages
Chinese (zh)
Other versions
CN113111862A (en)
Inventor
宋文杰
刘室先
张婷
杨毅
付梦印
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202110519911.5A priority Critical patent/CN113111862B/en
Publication of CN113111862A publication Critical patent/CN113111862A/en
Application granted granted Critical
Publication of CN113111862B publication Critical patent/CN113111862B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/584Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of vehicle lights or traffic lights
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20092Interactive image processing based on input by user
    • G06T2207/20104Interactive definition of region of interest [ROI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30248Vehicle exterior or interior
    • G06T2207/30252Vehicle exterior; Vicinity of vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a vehicle tail lamp state identification method based on action-state joint learning. First, continuous tracking segments are obtained from a vehicle tracking sequence acquired in a real traffic scene, and an attention-based CNN-LSTM network identifies the five types of tail lamp action features implicit in each tracking segment: no change, brake pressed, brake released, left turn, and right turn. Then, the average brightness feature of the high-mounted stop lamp corresponding to each tracking segment is obtained from tail lamp semantic segmentation and combined with the tail lamp action features to form high-order features. Finally, a linear-chain conditional random field model is constructed, long-term dependence between continuous segments is established by analyzing the high-order features, and the continuous tail lamp state at every moment is inferred: no action, braking, left turn, or right turn. The method can therefore accurately extract the hidden semantic features of the tail lamps of vehicles of different types and standards in each frame of image under different complex real traffic scenes, yielding the continuous stable state of the tail lamp at every moment.

Description

Vehicle tail lamp state identification method based on action-state joint learning
Technical Field
The invention belongs to the technical field of automatic control, and particularly relates to a vehicle tail lamp state identification method based on action-state joint learning.
Background
Autonomous driving and related research have progressed greatly over the past few decades. In highly dynamic, high-density urban traffic scenes, an autonomous vehicle needs to understand the behavioral intent of surrounding vehicles in order to make more intelligent and reliable decisions and plans. Most existing methods predict intent from the historical trajectories of vehicles. As an important factor, the visual signals of a vehicle, especially its tail lights, are a direct indication of the vehicle's intention. Vehicle tail light identification therefore plays a crucial role in vehicle behavior understanding and trajectory prediction. However, in practical applications covering different real traffic scenarios, including daytime, nighttime, crowded roads, and highways, estimating the tail light state presents many challenges, which is why today's common automated driving systems include few mature solutions. The main challenges are: 1) variable lighting conditions: imaging noise such as halos, shadows, and strong reflections tends to overwhelm the features of the tail lights, as shown in fig. 1 (a); 2) non-uniform tail lamp standards: the tail light designs of trucks, buses, cars, SUVs, etc., including shape, color, and brightness, vary widely, as shown in fig. 1 (b); 3) random relative pose: tail lights are rarely observed exactly from behind; they are usually seen at an angle from the left or right, which can leave them partially occluded and distorted, as shown in fig. 1 (c).
Furthermore, tail light identification is a time-series problem: it is related not only to the current feature state but also to the transition relationship between successive states. For example, the outline marker lights are similar in color and shape to the brake lights and behave differently by day and night: during the day the marker lights stay off, while at night they stay lit. As a result, a tail light that is not braking at night may be brighter than one that is braking during the day (as shown in fig. 2). It is therefore necessary to analyze the continuous change of the tail light state over a period of time to determine the current behavior, rather than relying only on simple features such as shape and brightness.
Disclosure of Invention
In order to solve the above problems, the present invention provides a method for identifying a state of a tail lamp of a vehicle based on joint learning of motion and state, which can obtain a continuous stable state of the tail lamp even in a complicated actual scene.
A vehicle tail lamp state identification method based on action-state joint learning comprises the following steps:
s1: acquiring continuous tracking fragment data based on a vehicle tracking sequence obtained in a real traffic scene;
s2: adopting a pre-trained attention-based CNN-LSTM network, respectively extracting the tail lamp action feature f_{t,act} of each tracking segment, where the tail lamp action feature f_{t,act} of each tracking segment is one of: no change, brake pressed, brake released, left turn, right turn;
s3: extracting the brightness features {I_l, I_r, I_b, I_h, I_thr} of each tracking segment and combining them with the tail lamp action feature f_{t,act} to form the high-order feature f_t = {f_{t,act}, I_l, I_r, I_b, I_h, I_thr}, where I_l, I_r, I_b, I_h, and I_thr denote the average brightness of the left tail lamp, the right tail lamp, the background, and the high-mounted stop lamp, and the adaptive threshold used to segment the high-mounted stop lamp from the vehicle tail, respectively;
s4: based on a linear-chain conditional random field (Linear-CRF) model and the high-order features f_t of the tracking segments, obtaining the tail lamp state at each moment, where the tail lamp states are no action (no braking and no turning), braking, left turn, and right turn.
Further, the vehicle tracking sequence obtaining method comprises the following steps:
and detecting the position of each vehicle in each frame of image by using a YOLO network, and then matching and updating the bounding box of the target vehicle through a Deepsort architecture to obtain a vehicle tracking sequence of each vehicle, thereby realizing multi-vehicle tracking.
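The following Python sketch illustrates, under stated assumptions, the data flow of this detection-and-tracking step; it is not the patent's code. `YoloDetector` and `DeepSortTracker` are hypothetical wrappers standing in for any YOLO and DeepSORT implementation, with assumed method names; only the structure matters: each frame yields detections, the tracker assigns persistent vehicle IDs, and the per-ID sequences of bounding boxes and crops form the vehicle tracking sequences.

```python
from collections import defaultdict

def build_tracking_sequences(frames, detector, tracker):
    """frames: iterable of HxWx3 images; returns {vehicle_id: [(frame_idx, crop, box), ...]}."""
    sequences = defaultdict(list)
    for idx, frame in enumerate(frames):
        boxes = detector.detect(frame)          # [(x, y, w, h, score), ...]   (hypothetical API)
        tracks = tracker.update(frame, boxes)   # [(vehicle_id, (x, y, w, h)), ...] (hypothetical API)
        for vid, (x, y, w, h) in tracks:
            crop = frame[y:y + h, x:x + w]      # image patch of the tracked vehicle
            sequences[vid].append((idx, crop, (x, y, w, h)))
    return sequences
```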
Further, the acquisition method of the trace fragment data is as follows:
adopting a sliding window of set size, sliding over the vehicle tracking sequence with a set step, and dividing the vehicle tracking sequence into continuous tracking segments S_{I,t,t0}, where t0 is the start time of each tracking segment.
Further, the brightness features of the last frame of each tracking segment are taken as the brightness features {I_l, I_r, I_b, I_h, I_thr} of that segment. When the areas of the left tail lamp, the right tail lamp, and the high-mounted stop lamp in the image are all larger than the set value, and the distance between the left and right tail lamps in the image is larger than the set value, the brightness features {I_l, I_r, I_b, I_h, I_thr} are extracted as follows:
segmenting left tail lamp regions and right tail lamp regions of all vehicles in each frame image of the monocular image sequence by using a symmetric coding-decoding convolutional network Segnet based on a VGG16 model;
determining an interested area of the high-mount stop lamp according to the geometric center of the left tail lamp area and the geometric center of the right tail lamp area, and then performing self-adaptive segmentation on the interested area subjected to Gaussian filtering by using an OTSU algorithm to obtain a high-mount stop lamp area H;
taking the pixel mean of all pixels in the left tail lamp region as the average left tail lamp brightness I_l; the pixel mean of all pixels in the right tail lamp region as the average right tail lamp brightness I_r; the pixel mean of all pixels in the background region excluding all tail lamps as the average background brightness I_b; the pixel mean of all pixels in the high-mounted stop lamp region H as the average high-mounted stop lamp brightness I_h; and the threshold used by the OTSU algorithm when segmenting the region of interest as the adaptive threshold I_thr.
Further, a four-dimensional vector {x_{c,h}, y_{c,h}, w_h, h_h} describes the region of interest of the high-mounted stop lamp, where (x_{c,h}, y_{c,h}) is the center of the region of interest and w_h, h_h are its width and height. Each element of the vector is computed as in the equation image of the original publication, where x_{c,tmp} is an intermediate variable, (x_{c,l}, y_{c,l}) and (x_{c,r}, y_{c,r}) are the geometric centers of the left and right tail lamps, y_b is the vertical coordinate of the top-left vertex of the vehicle bounding box, and w and h are the width and height of the vehicle bounding box.
Further, when the attention-based CNN-LSTM network is pre-trained, the CNN model is the first 14 layers of the VGG16 model and is initialized with the weights used by the symmetric encoder-decoder convolutional network SegNet when segmenting the monocular image sequence.
Further, the brightness features of the last frame of each tracking segment are taken as the brightness features {I_l, I_r, I_b, I_h, I_thr} of that segment. When the area of the left tail lamp, the right tail lamp, or the high-mounted stop lamp in the image is not larger than the set value, or the distance between the left and right tail lamps in the image is not larger than the set value, the brightness features {I_l, I_r, I_b, I_h, I_thr} are extracted as follows:
segmenting left tail lamp regions and right tail lamp regions of all vehicles in each frame image of the monocular image sequence by using a symmetric coding-decoding convolutional network Segnet based on a VGG16 model;
taking the pixel mean of all pixels in the left tail lamp region as the average left tail lamp brightness I_l; the pixel mean of all pixels in the right tail lamp region as the average right tail lamp brightness I_r; the pixel mean of all pixels in the background region as the average background brightness I_b; and the average background brightness I_b as both the average high-mounted stop lamp brightness I_h and the adaptive threshold I_thr.
Further, the specific steps for obtaining the tail lamp state at each moment from the high-order features f_t of the tracking segments, based on the linear-chain conditional random field model, are as follows:
for each single-point time t, select the tracking segment whose last frame is the frame at time t and perform the state probability computation, obtaining the conditional probabilities that the tail lamp state at time t is no action, braking, left turn, or right turn; the state with the largest conditional probability is taken as the final tail lamp state at time t. The state probability computation is:
the state probability P(Y_t = y_t | f) is defined as follows and is used to compute the conditional probability that the tail lamp state at the current single-point time t is no action, braking, left turn, or right turn:

P(Y_t = y_t | f) = α_t^T(y_t | f) β_t(y_t | f) / Z(f)

where f is the high-order feature {f_{t,act}, I_l, I_r, I_b, I_h, I_thr} of the tracking segment currently selected for the state probability computation; y_t is the tail lamp state at the current single-point time t, with y_t ∈ {off, brake, left, right}, off denoting no action, brake denoting braking, left denoting left turn, and right denoting right turn; Z(f) is a normalization factor depending on f; α_t(y_t | f) is the forward vector at the current single-point time t, β_t(y_t | f) is the backward vector at the current single-point time t, and T denotes transposition;
wherein the forward vector α_t(y_t | f) and the backward vector β_t(y_t | f) at the current single-point time t satisfy the recursions

α_t^T(y_t | f) = α_{t-1}^T(y_{t-1} | f) M_t(y_{t-1}, y_t | f)
β_t(y_t | f) = [M_t(y_{t-1}, y_t | f)] β_{t+1}(y_{t+1} | f)

where α_{t-1}(y_{t-1} | f) is the forward vector at the previous single-point time, T denotes transposition, β_{t+1}(y_{t+1} | f) is the backward vector at the next single-point time, and M_t(y_{t-1}, y_t | f) is a 4 × 4 state transition matrix; each row and each column of the state transition matrix corresponds to a tail lamp state, and each element is the transition probability from a possible tail lamp state y_{t-1} at the previous single-point time to a possible tail lamp state y_t at the current single-point time t;
the transition probability corresponding to each element M_t(y_{t-1}, y_t | f) of the state transition matrix is computed as

M_t(y_{t-1}, y_t | f) = exp( Σ_{k1=1}^{K1} λ_{k1} g_{k1}(y_{t-1}, y_t, f, t) + Σ_{k2=1}^{K2} μ_{k2} s_{k2}(y_t, f, t) )

where K1 is the number of transfer feature classes, g_{k1}(y_{t-1}, y_t, f, t) are the set transfer features with set weights λ_{k1}, K2 is the number of node feature classes, and s_{k2}(y_t, f, t) are the set node features with set weights μ_{k2};
wherein the transfer features and node features each take the value 0 or 1. For each transfer feature, the value is 1 only when the tail lamp state y_{t-1} at the previous single-point time, the possible tail lamp state y_t at the current single-point time t, and the tail lamp action feature in the currently selected high-order feature f satisfy the set condition, and 0 otherwise; for each node feature, the value is 1 only when the possible tail lamp state y_t at the current single-point time t and the tail lamp action feature in the currently selected high-order feature f satisfy the set condition, and 0 otherwise;
the set condition is: the change from the tail lamp state y_{t-1} at the previous time to the possible tail lamp state y_t at the current single-point time t, together with the tail lamp action feature in the currently selected high-order feature f, is consistent with the operating logic of a vehicle in actual driving, and the brightness features implied by the inferred tail lamp state agree with the actual brightness features.
Further, K1 = 15 and K2 = 10, where the value conditions and weights of the transfer features are as follows:

g1 = g1(y_{t-1} = brake, y_t = off, f_{t,act} = 2, t), λ1 = 1
g2 = g2(y_{t-1} = off, y_t = brake, f_{t,act} = 1, t), λ2 = 1
g3 = g3(y_{t-1} = left, y_t = left, f_{t,act} = 3, t), λ3 = 1.5
g4 = g4(y_{t-1} = right, y_t = right, f_{t,act} = 4, t), λ4 = 1.5
g5 = g5(y_{t-1} = left, y_t = off, f_{t,act} = 0, t), λ5 = 1.5
g6 = g6(y_{t-1} = right, y_t = off, f_{t,act} = 0, t), λ6 = 1.5
g7 = g7(y_{t-1} = off, y_t = left, f_{t,act} = 3, t), λ7 = 1.5
g8 = g8(y_{t-1} = off, y_t = right, f_{t,act} = 4, t), λ8 = 1.5
g9 = g9(y_{t-1} = brake, y_t = brake, f_{t,act} = 0, t), λ9 = 1
g10 = g10(y_{t-1} = off, y_t = off, f_{t,act} = 0, t), λ10 = 1
g11 = g11(y_{t-1} = left, y_t = left, f_{t,act} = 0, t), λ11 = 1
g12 = g12(y_{t-1} = right, y_t = right, f_{t,act} = 0, t), λ12 = 1
g13 = g13(y_{t-1} = off, y_t = brake, f_{t,act} = 0, I_l, I_r > 110, t), λ13 = 1
g14 = g14(y_{t-1} = brake, y_t = brake, f_{t,act} = 2, I_l, I_r > 110, t), λ14 = 1
g15 = g15(y_{t-1} = off, y_t = brake, f_{t,act} = 1, I_h > 120, t), λ15 = 1

where f_{t,act} ∈ {0, 1, 2, 3, 4} represents the five tail lamp action features within one tracking segment: no change = 0, brake pressed = 1, brake released = 2, left turn = 3, right turn = 4;
the value conditions and weights of the node features are as follows:

s1 = s1(y_t = brake, f_{t,act} = 0, t), μ1 = 0.5
s2 = s2(y_t = off, f_{t,act} = 0, t), μ2 = 0.5
s3 = s3(y_t = left, f_{t,act} = 3, t), μ3 = 0.5
s4 = s4(y_t = right, f_{t,act} = 4, t), μ4 = 0.5
s5 = s5(y_t = brake, f_{t,act} = 1, t), μ5 = 0.5
s6 = s6(y_t = off, f_{t,act} = 2, t), μ6 = 0.5
s7 = s7(y_t = off, f_{t,act} = 1, I_thr < 75, t), μ7 = 0.5
s8 = s8(y_t = off, f_{t,act} = 0, I_thr < 75, t), μ8 = 0.5
s9 = s9(y_t = brake, f_{t,act} = 1, I_h > 120, t), μ9 = 0.5
s10 = s10(y_t = brake, f_{t,act} = 0, I_h > 120, t), μ10 = 0.5
for each expression of the transfer characteristics and the node characteristics, the values of the transfer characteristics and the node characteristics are 1 only when the conditions in brackets are met.
At the same time, the initial value α_0(y_0 | f) of the forward vector α_t(y_t | f) and the initial value β_{n+1}(y_{n+1} | f) of the backward vector β_t(y_t | f) are determined by the node features alone; for each of the four possible states, the value is computed from the weighted sum of the node features (the exact expressions are given as equation images in the original publication), where n denotes the end time of the vehicle tracking sequence.
The invention has the following beneficial effects:
1. The invention provides a vehicle tail lamp state identification method based on action-state joint learning. First, continuous tracking segments are obtained from a vehicle tracking sequence acquired in a real traffic scene, and an attention-based CNN-LSTM network identifies the five types of tail lamp action features implicit in each tracking segment: no change, brake pressed, brake released, left turn, and right turn. Then, the average brightness feature of the high-mounted stop lamp corresponding to each tracking segment is obtained from tail lamp semantic segmentation and combined with the tail lamp action features to form high-order features used to analyze the tail lamp state sequence. Finally, a linear-chain conditional random field model (Linear-CRF) is constructed, and long-term dependence between continuous segments is established by analyzing the high-order features so as to infer the continuous tail lamp state at every moment. The method can therefore accurately extract the hidden semantic features of the tail lamps of vehicles of different types and standards in each frame of image under different complex real traffic scenes, yielding the continuous stable state of the tail lamp at every moment.
2. The invention provides a vehicle tail lamp state identification method based on action-state joint learning in which the target vehicle is tracked over a long period to generate a tracking sequence, and the relative changes of the high-dimensional features between frames, including the tail lamp semantics produced by the VGG16-based symmetric encoder-decoder convolutional network SegNet, are analyzed, effectively addressing the time-series problem; in other words, the continuous stable state of the tail lamp, i.e., no action, braking, left turn, or right turn, is determined by jointly analyzing the continuous tail lamp action changes and the current semantic appearance of the tail lamp.
3. The invention provides a vehicle tail lamp state identification method based on action-state joint learning that uses the weights obtained when the symmetric encoder-decoder convolutional network SegNet segments the monocular image sequence as the weights of the CNN in the attention-based CNN-LSTM, so that the network extracts tail-lamp-related features, focuses on the tail lamp region more quickly, improves recognition accuracy, and accelerates model convergence.
4. The invention provides a vehicle tail lamp state identification method based on action-state joint learning.
5. The invention provides a vehicle tail lamp state identification method based on action-state joint learning that extracts the vehicle tracking sequence with a YOLO network combined with the DeepSORT architecture, which is convenient and fast.
Drawings
Fig. 1 (a) illustrates the challenge of identifying tail lights of a vehicle in a complex real traffic scene with variable illumination conditions;
fig. 1 (b) illustrates the challenge of vehicle tail lamp identification in a complex real traffic scene with non-uniform tail lamp standards;
FIG. 1 (c) is a challenge of vehicle tail lamp identification under a complex real traffic scene with random relative observation poses;
FIG. 2 is a timing problem faced by vehicle tail light identification;
FIG. 3 is an overall system framework for the proposed method of the present invention;
FIG. 4 is a pre-trained CNN and attention mechanism of an attention-based CNN-LSTM model;
FIG. 5 is an LSTM structure of the attention-based CNN-LSTM model;
FIG. 6 is a process of high mounted stop lamp segmentation and luminance feature extraction;
FIG. 7 (a) is a typical scenario, i.e., estimation result of continuous state of tail light in daytime and high-speed environment;
FIG. 7 (b) is a typical scenario, namely the estimation result of continuous states of tail lamps in daytime and urban congested road environment;
FIG. 7 (c) is a typical scenario-estimation result of continuous state of tail light in daytime, various vehicle types, and congested environment;
FIG. 7 (d) is a typical scenario, a tail lamp continuous state estimation result in a night and high speed environment;
FIG. 7 (e) is a typical scenario-estimation result of continuous state of tail light in nighttime, various vehicle types, and congested environment;
fig. 7 (f) shows the estimation result of the continuous state of the tail light in a typical scene, i.e., in a nighttime environment, a poor illumination condition environment, and a congestion environment.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
For an image sequence of a real traffic scene, the method first captures the image stream with a monocular camera, then detects surrounding vehicles with a YOLO network and tracks each detected vehicle with the DeepSORT network. The vehicle tracking sequence is then taken as input and divided into continuous image segments by a sliding window. Through action classification and brightness feature extraction, the tail lamp action features and brightness features of each segment are extracted separately and then combined into higher-level features. Finally, the continuous tail lamp state is inferred by analyzing the high-order features with a linear-chain CRF model to establish long-term dependencies between continuous segments. The overall system framework is shown in fig. 3, and the specific process includes:
s1: acquiring continuous tracking fragment data based on a vehicle tracking sequence obtained in a real traffic scene, specifically:
s11: segmenting tail lamp regions of all vehicles in each frame image of the monocular image sequence by using a symmetric coding-decoding convolutional network Segnet based on a VGG16 model, wherein the tail lamp region segmentation result is not used for vehicle tracking, but provides a basis for subsequent feature extraction (brightness features and motion features);
s12: detecting the position of each vehicle in each frame of image by using a YOLO network, and then matching and updating the bounding box of the target vehicle through a Deepsort architecture to obtain a vehicle tracking sequence of each vehicle so as to realize multi-vehicle tracking;
s13: adopting a sliding window of set size, sliding over the vehicle tracking sequence with a set step, and dividing the vehicle tracking sequence into continuous tracking segments S_{I,t,t0}, where t0 is the start time of each tracking segment.
That is, the present invention first preprocesses the raw data. Data preprocessing comprises tail lamp semantic segmentation, detection and tracking, and sliding-window segmentation. The raw data is a monocular image sequence acquired in a real traffic scene. First, a VGG16-based symmetric encoder-decoder convolutional network, SegNet, segments the tail lamp regions of all vehicles in each frame; the segmentation result is not used for vehicle tracking but provides the basis for subsequent feature extraction. Then a detection-and-tracking module produces a vehicle tracking sequence V_{I,t}, where I is the vehicle ID, representing the I-th vehicle. The basic idea is to detect the position of each vehicle in every frame with YOLO and then match and update the bounding box of the target vehicle through the DeepSORT architecture, realizing multi-vehicle tracking. The vehicle tracking sequence is then taken as the model input and divided by a sliding window of size = 16 moving with step = 1 into continuous segments S_{I,t,t0}, where t0 is the start time of each segment; the sliding-window size depends on the sampling frequency.
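The sliding-window step can be summarized in a minimal sketch, assuming the tracking sequence is given as a Python list of per-frame crops; the window size of 16 and step of 1 follow the text, and this is only an illustration of the segmentation, not the patent's code.

```python
def sliding_segments(track, size=16, step=1):
    """track: list of per-frame crops for one vehicle; yields (t0, segment) pairs."""
    for t0 in range(0, len(track) - size + 1, step):
        yield t0, track[t0:t0 + size]

# Each segment S_{I,t,t0} ends at frame t0 + size - 1, which is the single-point
# time t whose tail lamp state the segment is later used to infer.
```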
S2: adopting the attention-based CNN-LSTM network, respectively extracting the tail lamp action feature f_{t,act} of each tracking segment, where the tail lamp action feature f_{t,act} of each tracking segment is one of: no change, brake pressed, brake released, left turn, right turn.
That is, step S2 feeds the segment data S_{I,t,t0} from step S1 into the attention-based CNN-LSTM network to extract the tail lamp action feature f_{t,act}. Only one action is extracted per segment, and turning has the highest priority: if turning and braking occur simultaneously within a tracking segment, the feature extracted from that segment is determined to be turning (see the small sketch after this paragraph). The tail lamp actions within a segment are divided into five classes: no change, brake pressed, brake released, left turn, right turn. The brake lamp actions are "no change", "brake pressed", and "brake released". "No change" means the brake lamp does not change within the 16-frame segment, covering both "always braking" and "never braking"; "brake pressed" means the vehicle is first not braking and then braking; "brake released" means the vehicle is first braking and then not braking. Left turn and right turn are turn-signal actions, representing flashing of the left and right turn signals, respectively.
It should be noted that the overall flow of the CNN-LSTM network based on the attention model is as follows: at each moment, the CNN processes the image and extracts the characteristics, the visual attention model integrates the characteristics to obtain an input vector, and the LSTM network processes the input vector and predicts the current action and the attention vector at the next moment. The details of the CNN-LSTM network based on the attention model are shown in fig. 4 and 5.
At each time t, a video frame is input into the pre-trained CNN model to obtain a K × K × D feature tensor X_t. The pre-trained CNN model shown in fig. 4 is the first 14 layers of the VGG16 model, and the weights of the encoder part of the tail lamp semantic segmentation network (SegNet) trained in S11 are used so that it extracts relevant features.
Then the feature tensor X_t is compressed by the soft attention vector l_t, reducing its dimension and focusing on the tail lamp region, to obtain the LSTM input vector x_t. The soft attention model proposed by Bahdanau et al. is adopted, which is a softmax over the K × K positions, as shown in fig. 4. The input vector x_t is therefore defined as

x_t = Σ_{i=1}^{K×K} l_{t,i} X_{t,i}

where X_{t,i} is the D-dimensional feature at position i and l_{t,i} is the attention weight of that position.
the processed information per frame is then input to the LSTM for establishing the dependency relationship between frames, as shown in fig. 5. The model of the LSTM network is:
(i_t, f_t, o_t, g_t) = (σ, σ, σ, tanh) · T_{a,b}(h_{t-1}, x_t)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t
h_t = o_t ⊙ tanh(c_t)

where T_{a,b} is an affine transformation consisting of trainable parameters with input dimension a = D + D and output dimension b = 4D, D being the common dimension of i_t, f_t, o_t, g_t, c_t, and h_t as well as the dimension of x_t. Also, d is the number of tail lamp action classes (d = 5 in the present model).
The hidden-layer vector h_t is used to predict the attention vector l_{t+1} at the next time and the output vector y_t. After a fully connected layer, h_t is mapped to a K × K-dimensional vector, and l_{t+1} is the softmax over that vector:

l_{t+1,i} = exp(w_i^T h_t) / Σ_{j=1}^{K×K} exp(w_j^T h_t), i = 1, …, K×K

where w_i are the weights of the fully connected layer.
The output vector y_t is a d-dimensional vector; it is the class softmax of the LSTM hidden-layer vector h_t after a tanh activation:

y_t = softmax(W_y tanh(h_t) + b_y)

where W_y and b_y are trainable parameters.
The class with the highest probability in the last-frame output y_15 is taken as the tail lamp action feature f_{t,act}.
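A hedged PyTorch sketch of the attention-based CNN-LSTM described above: a VGG16-based backbone produces a K × K × D feature map, a soft-attention softmax over the K × K positions yields the LSTM input x_t, and the hidden state predicts both the next attention map l_{t+1} and the 5-class action output y_t. The layer sizes (K = 7, D = 512, hidden = 512), the use of the full VGG16 convolutional stack in place of "the first 14 layers", and the uniform initial attention are assumptions for illustration; this is not the patent's implementation.

```python
import torch
import torch.nn as nn
import torchvision

class AttentionCNNLSTM(nn.Module):
    def __init__(self, feat_dim=512, grid=7, hidden=512, n_actions=5):
        super().__init__()
        vgg = torchvision.models.vgg16(weights=None)
        self.cnn = vgg.features              # conv backbone; in practice initialized from the SegNet encoder
        self.grid = grid
        self.lstm = nn.LSTMCell(feat_dim, hidden)
        self.attn_fc = nn.Linear(hidden, grid * grid)   # predicts attention l_{t+1}
        self.cls_fc = nn.Linear(hidden, n_actions)      # predicts class scores y_t

    def forward(self, clip):                 # clip: (T, 3, 224, 224), one 16-frame segment
        T = clip.shape[0]
        h = clip.new_zeros(1, self.lstm.hidden_size)
        c = clip.new_zeros(1, self.lstm.hidden_size)
        l = clip.new_full((1, self.grid * self.grid), 1.0 / (self.grid * self.grid))
        logits = None
        for t in range(T):
            X = self.cnn(clip[t:t + 1])                  # (1, D, K, K) feature tensor X_t
            X = X.flatten(2).transpose(1, 2)             # (1, K*K, D)
            x = (l.unsqueeze(-1) * X).sum(dim=1)         # soft-attention input x_t
            h, c = self.lstm(x, (h, c))
            l = torch.softmax(self.attn_fc(h), dim=-1)   # attention for the next step
            logits = self.cls_fc(torch.tanh(h))          # class scores at time t
        return logits                                    # last-frame output; argmax gives f_{t,act}
```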
S3: extracting the brightness features {I_l, I_r, I_b, I_h, I_thr} of each tracking segment and combining them with the tail lamp action feature f_{t,act} to form the high-order feature f_t = {f_{t,act}, I_l, I_r, I_b, I_h, I_thr}, where I_l, I_r, I_b, I_h, and I_thr denote the average brightness of the left tail lamp, the right tail lamp, the background, and the high-mounted stop lamp, and the adaptive threshold used to segment the high-mounted stop lamp from the vehicle tail, respectively.
It should be noted that the action features extracted in step S2 alone are not sufficient to determine the continuous state of the tail lamp. For example, when nothing happens, i.e., there is no tail lamp action, it is difficult to distinguish "braking" from "not braking". Therefore, step S3 extracts the high-mounted stop lamp brightness features {I_l, I_r, I_b, I_h, I_thr} from the semantic segmentation result and combines them with the tail lamp action feature f_{t,act} to form a higher-level feature for tail lamp state sequence analysis.
The input consists of the last frame image A of a segment, the vehicle bounding box {x_b, y_b, w, h}, and the tail lamp pixel sets (L, R) of that vehicle. The desired output is the brightness features {I_l, I_r, I_b, I_h, I_thr}, i.e., the average brightness of the left tail lamp, the right tail lamp, the background, and the high-mounted stop lamp, and the adaptive threshold used to segment the high-mounted stop lamp from the vehicle tail. The algorithm flow is shown in fig. 6.
First, it is necessary to determine whether the areas of the left tail lamp, right tail lamp, and high-mounted stop lamp in the image are larger than the set value and whether the distance between the left and right tail lamps in the image is larger than the set value. When all areas exceed the set value and the distance exceeds the set value, i.e., both tail lamps are segmented accurately, the brightness features {I_l, I_r, I_b, I_h, I_thr} are extracted as follows.
The brightness features of the last frame of each tracking segment are taken as the brightness features {I_l, I_r, I_b, I_h, I_thr} of that segment. Using the left and right tail lamp regions of all vehicles segmented in S11 from each frame of the monocular image sequence, the region of interest of the high-mounted stop lamp is determined from the geometric centers of the left and right tail lamp regions; the region of interest is then Gaussian filtered and adaptively segmented with the OTSU algorithm to obtain the high-mounted stop lamp region H.
Further, the pixel mean of the left tail lamp region is taken as the average left tail lamp brightness I_l; the pixel mean of the right tail lamp region as the average right tail lamp brightness I_r; the pixel mean of the background region excluding all tail lamps as the average background brightness I_b; the pixel mean of the high-mounted stop lamp region H as the average high-mounted stop lamp brightness I_h; and the threshold used by the OTSU algorithm when segmenting the region of interest as the adaptive threshold I_thr.
That is, the average brightness of the tail lamp pixel sets (L, R) and of the background B of the last frame image A of the segment is computed first: I_l, I_r, I_b, where the background B is the part of image A outside the left and right tail lamps (L, R). The geometric centers of L and R are then computed to locate the high-mounted stop lamp. Taking the left tail lamp as an example, its average brightness I_l and geometric center x_{c,l} are computed as

I_l = (1/N_L) Σ_{p∈L} I(p),  x_{c,l} = (1/N_L) Σ_{p∈L} x_p

where N_L is the number of pixels belonging to the left tail lamp, I(p) is the intensity of pixel p, and x_p is its horizontal coordinate.
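A minimal NumPy sketch of the brightness and centroid computation above: given a grayscale image and a boolean mask of the left tail lamp pixels L, it returns the average brightness I_l and the geometric center (x_{c,l}, y_{c,l}); the same routine applies to the right tail lamp, the background, and the high-mounted stop lamp region. The function name and interface are illustrative assumptions, not the patent's code.

```python
import numpy as np

def region_brightness_and_center(gray, mask):
    """gray: HxW intensity image; mask: HxW boolean array of region pixels."""
    n = int(mask.sum())
    if n == 0:
        return 0.0, (0.0, 0.0)
    intensity = float(gray[mask].mean())            # average brightness of the region
    ys, xs = np.nonzero(mask)
    center = (float(xs.mean()), float(ys.mean()))   # geometric center (x_c, y_c)
    return intensity, center
```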
Then the region of interest (ROI) of the high-mounted stop lamp is determined. The ROI is described by the four-dimensional vector {x_{c,h}, y_{c,h}, w_h, h_h}. In particular, the lateral center x_{c,h} should lie midway between L and R; however, considering the perspective effect caused by the random relative observation pose, a second-order bias term is introduced to mitigate this effect. The ROI is computed as in the equation image of the original publication, where x_{c,tmp} is an intermediate variable, (x_{c,l}, y_{c,l}) and (x_{c,r}, y_{c,r}) are the geometric centers of the left and right tail lamps, y_b is the vertical coordinate of the top-left vertex of the vehicle bounding box, and w and h are the width and height of the vehicle bounding box.
Next, a Gaussian mask is applied over the ROI; it is a Gaussian function of position (x, y) centered on the ROI,

G(x, y) = exp( -((x - x_{c,h})^2 + (y - y_{c,h})^2) / (2σ^2) )

where σ is a set parameter.
The R channel of the ROI is multiplied by the Gaussian mask to produce R_h; R_h is then adaptively segmented with the OTSU adaptive thresholding algorithm to obtain the high-mounted stop lamp region H. The adaptive threshold used for this segmentation is recorded as the threshold feature I_thr.
Finally, the average brightness I_h of the high-mounted stop lamp region H is computed as

I_h = (1/N_H) Σ_{p∈H} I(p)

where N_H is the number of pixels belonging to the high-mounted stop lamp.
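A hedged OpenCV sketch of the high-mounted stop lamp step above: the R channel of the ROI is weighted by a Gaussian mask centered on the ROI, OTSU thresholding yields the region H and the threshold I_thr, and the mean brightness over H gives I_h. The exact Gaussian parameterization and the sigma value are assumptions for illustration; this is not the patent's code.

```python
import cv2
import numpy as np

def high_mount_stop_lamp_features(roi_bgr, sigma=0.3):
    """roi_bgr: HxWx3 BGR crop of the high-mounted stop lamp ROI."""
    h, w = roi_bgr.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    mask = np.exp(-(((xs - cx) / (sigma * w)) ** 2 + ((ys - cy) / (sigma * h)) ** 2) / 2.0)
    r_h = (roi_bgr[:, :, 2].astype(np.float32) * mask).astype(np.uint8)   # Gaussian-weighted R channel
    i_thr, region = cv2.threshold(r_h, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    pixels = r_h[region > 0]
    i_h = float(pixels.mean()) if pixels.size else 0.0                    # average brightness I_h
    return i_h, float(i_thr), region                                      # I_h, I_thr, region H
```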
Thus the brightness features {I_l, I_r, I_b, I_h, I_thr} are obtained and combined with the action feature f_{t,act} to form the high-order feature used for tail lamp state inference: f_t = {f_{t,act}, I_l, I_r, I_b, I_h, I_thr}.
further, when the area of the left tail lamp, the right tail lamp or the high mount stop lamp on the image is not more than the set value, or the distance between the left tail lamp and the right tail lamp on the image is not more than the set value, as in practical application, the area of the tail lamp is too small (less than 20 pixels) or the distance between the left tail lamp and the right tail lamp is too close (less than 50 pixels), the tail lamp may have a blur phenomenon, and the tail lamp semantic segmentation result may have a deviation, then I l ,I r ,I b The calculation method of (1) is as above, the average brightness I of the high-mount stop lamp h And an adaptive threshold I thr The average brightness of the background is used instead: i is h =I thr =I b
The brightness features {I_l, I_r, I_b, I_h, I_thr} obtained in this case are likewise combined with the action feature f_{t,act} to form the high-order feature for tail lamp state inference: f_t = {f_{t,act}, I_l, I_r, I_b, I_h, I_thr}.
s4: based on the linear-chain conditional random field model and the high-order features f_t of the tracking segments, obtaining the tail lamp state at each moment, where the tail lamp states are no action (no braking and no turning), braking, left turn, and right turn.
It should be noted that, based on the high-order features extracted in steps S2 and S3, a linear-chain conditional random field (Linear-CRF) model is constructed to establish long-term dependence between successive segments and thereby infer the continuous tail lamp states O_{I,t}. The action features f_{t,act} extracted in step S2 describe the change of the tail lamp state over a period of time, while the brightness features extracted in step S3 give partial information about the tail lamp state at a single point in time; this step infers the tail lamp state at each moment from both.
The variables in the brightness features {I_l, I_r, I_b, I_h, I_thr} range from 0 to 255, representing different brightness values. The action feature f_{t,act} ∈ {0, 1, 2, 3, 4} represents the five tail lamp actions within a segment: no change = 0, brake pressed = 1, brake released = 2, left turn = 3, right turn = 4.
It should be noted that, for each vehicle tracking sequence, the invention computes from the high-order features the conditional probabilities of the four possible tail lamp states at the moment corresponding to the last frame of each segment, and then selects the state with the largest probability as the output. Since the sliding-window size is 16, the first 15 frames have no corresponding segment or high-order feature, so tail lamp state recognition starts from the 16th frame. The tail lamp states are of four types: y_t ∈ {off, brake, left, right}, where off means "no braking and no turning" at the current single-point time, brake means braking, left means left turn, and right means right turn.
The specific acquisition method of the tail lamp state at each moment is as follows:
for each single-point time t, select the tracking segment whose last frame is the frame at time t and perform the state probability computation, obtaining the conditional probabilities that the tail lamp state at time t is no action, braking, left turn, or right turn; the state with the largest conditional probability is taken as the final tail lamp state at time t. The state probability computation is:
the state probability P(Y_t = y_t | f) is defined as follows and is used to compute the conditional probability that the tail lamp state at the current single-point time t is no action, braking, left turn, or right turn:

P(Y_t = y_t | f) = α_t^T(y_t | f) β_t(y_t | f) / Z(f)

where f is the high-order feature {f_{t,act}, I_l, I_r, I_b, I_h, I_thr} of the tracking segment currently selected for the state probability computation; y_t is the tail lamp state at the current single-point time t, with y_t ∈ {off, brake, left, right}, off denoting no action, brake denoting braking, left denoting left turn, and right denoting right turn; Z(f) is a normalization factor depending on f; α_t(y_t | f) is the forward vector at the current single-point time t, β_t(y_t | f) is the backward vector at the current single-point time t, and T denotes transposition.
the conditional probability P (Y) t =y t If) is a 4-dimensional vector, and the conditional probability of each possible state at a single-point time t is calculated by adopting a forward-backward algorithm, and alpha (y) is t If and beta (y) t And f) are 4-dimensional vectors and can be calculated through the recursion relation between the previous frame and the next frame.
Further, the forward vector α_t(y_t | f) and backward vector β_t(y_t | f) at the current single-point time t satisfy the recursions

α_t^T(y_t | f) = α_{t-1}^T(y_{t-1} | f) M_t(y_{t-1}, y_t | f)
β_t(y_t | f) = [M_t(y_{t-1}, y_t | f)] β_{t+1}(y_{t+1} | f)

where α_{t-1}(y_{t-1} | f) is the forward vector at the previous single-point time, T denotes transposition, β_{t+1}(y_{t+1} | f) is the backward vector at the next single-point time, t = 1, 2, …, n, n is the end time of the vehicle tracking sequence, and M_t(y_{t-1}, y_t | f) is the 4 × 4 state transition matrix from the previous state vector (forward or backward) to the current one. Each row and each column of the state transition matrix corresponds to a tail lamp state, and each element is the transition probability from a possible tail lamp state y_{t-1} at the previous single-point time to a possible tail lamp state y_t at the current single-point time t. Note that a "possible" tail lamp state here is not the actual state but a hypothesized one; the purpose is to compute the probability of every possible case.
Further, the transition probability corresponding to each element M_t(y_{t-1}, y_t | f) of the state transition matrix is defined by the general formula

M_t(y_{t-1}, y_t | f) = exp( Σ_{k1=1}^{K1} λ_{k1} g_{k1}(y_{t-1}, y_t, f, t) + Σ_{k2=1}^{K2} μ_{k2} s_{k2}(y_t, f, t) )

where K1 is the number of transfer feature classes, g_{k1}(y_{t-1}, y_t, f, t) are the set transfer features with set weights λ_{k1}, K2 is the number of node feature classes, and s_{k2}(y_t, f, t) are the set node features with set weights μ_{k2}. In addition, y_{t-1} and y_t each take 4 values corresponding to the four tail lamp states, so the state transition matrix has 4 rows and 4 columns; at each position (u, v) of the matrix, the transition probability from state u to state v is computed with the above formula.
Furthermore, the transfer features and node features each take the value 0 or 1. For each transfer feature, the value is 1 only when the tail lamp state y_{t-1} at the previous single-point time, the possible tail lamp state y_t at the current single-point time t, and the tail lamp action feature in the currently selected high-order feature f satisfy the set condition, and 0 otherwise; for each node feature, the value is 1 only when the possible tail lamp state y_t at the current single-point time t and the tail lamp action feature in the currently selected high-order feature f satisfy the set condition, and 0 otherwise.
The set condition is: the change from the tail lamp state y_{t-1} at the previous time to the possible tail lamp state y_t at the current single-point time t, together with the tail lamp action feature in the currently selected high-order feature f, is consistent with the operating logic of a vehicle in actual driving, and the brightness features implied by the inferred tail lamp state agree with the actual brightness features.
That is, each transfer feature and node feature takes the value 0 or 1; it is 1 only when its specific condition is met and 0 otherwise. The conditions are designed from the relationship between the action feature f_{t,act} extracted in step S2, the brightness features {I_l, I_r, I_b, I_h, I_thr} extracted in step S3, and the tail lamp state, with the action feature as the main cue and the brightness features as auxiliary cues. Part of the features are designed purely from the logical relationship between tail lamp actions and states in practice: for example, when the previous state is brake, the action feature is "brake released", and the current state is off, the feature value is 1 (this is g_1). The other part of the features use the brightness features as an auxiliary judgment: for example, when the previous state is brake and the action is "brake released", the current state should normally be off and that feature would be 1; but if the brightness of both the left and right tail lamps exceeds a certain threshold, the lamps are evidently on, the probability that the current state is off is very small, and brake is more likely, so that feature takes the value 0, while the feature with previous state brake, current state brake, action feature "brake released", and left and right tail lamp brightness I_l, I_r both greater than 110 takes the value 1 (this is g_14).
The value and weight setting conditions of each transfer feature are as follows:
g1 = g1(y_{t-1} = brake, y_t = off, f_{t,act} = 2, t), λ1 = 1
g2 = g2(y_{t-1} = off, y_t = brake, f_{t,act} = 1, t), λ2 = 1
g3 = g3(y_{t-1} = left, y_t = left, f_{t,act} = 3, t), λ3 = 1.5
g4 = g4(y_{t-1} = right, y_t = right, f_{t,act} = 4, t), λ4 = 1.5
g5 = g5(y_{t-1} = left, y_t = off, f_{t,act} = 0, t), λ5 = 1.5
g6 = g6(y_{t-1} = right, y_t = off, f_{t,act} = 0, t), λ6 = 1.5
g7 = g7(y_{t-1} = off, y_t = left, f_{t,act} = 3, t), λ7 = 1.5
g8 = g8(y_{t-1} = off, y_t = right, f_{t,act} = 4, t), λ8 = 1.5
g9 = g9(y_{t-1} = brake, y_t = brake, f_{t,act} = 0, t), λ9 = 1
g10 = g10(y_{t-1} = off, y_t = off, f_{t,act} = 0, t), λ10 = 1
g11 = g11(y_{t-1} = left, y_t = left, f_{t,act} = 0, t), λ11 = 1
g12 = g12(y_{t-1} = right, y_t = right, f_{t,act} = 0, t), λ12 = 1
g13 = g13(y_{t-1} = off, y_t = brake, f_{t,act} = 0, I_l, I_r > 110, t), λ13 = 1
g14 = g14(y_{t-1} = brake, y_t = brake, f_{t,act} = 2, I_l, I_r > 110, t), λ14 = 1
g15 = g15(y_{t-1} = off, y_t = brake, f_{t,act} = 1, I_h > 120, t), λ15 = 1
where f_{t,act} ∈ {0, 1, 2, 3, 4} represents the five tail lamp action features within one tracking segment: no change = 0, brake pressed = 1, brake released = 2, left turn = 3, right turn = 4;
the value conditions and weights of the node features are as follows:

s1 = s1(y_t = brake, f_{t,act} = 0, t), μ1 = 0.5
s2 = s2(y_t = off, f_{t,act} = 0, t), μ2 = 0.5
s3 = s3(y_t = left, f_{t,act} = 3, t), μ3 = 0.5
s4 = s4(y_t = right, f_{t,act} = 4, t), μ4 = 0.5
s5 = s5(y_t = brake, f_{t,act} = 1, t), μ5 = 0.5
s6 = s6(y_t = off, f_{t,act} = 2, t), μ6 = 0.5
s7 = s7(y_t = off, f_{t,act} = 1, I_thr < 75, t), μ7 = 0.5
s8 = s8(y_t = off, f_{t,act} = 0, I_thr < 75, t), μ8 = 0.5
s9 = s9(y_t = brake, f_{t,act} = 1, I_h > 120, t), μ9 = 0.5
s10 = s10(y_t = brake, f_{t,act} = 0, I_h > 120, t), μ10 = 0.5
For each transfer feature and node feature expression, the feature takes the value 1 only when the condition in parentheses is met; to simplify the notation, only the condition under which the value is 1 is given above. Taking g_1 as an example, the specific form of the feature function at time t is

g_1(y_{t-1}, y_t, f, t) = 1 if y_{t-1} = brake, y_t = off, and f_{t,act} = 2; otherwise g_1 = 0.
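A sketch of how the 4 × 4 transition matrix M_t could be assembled from the binary transfer features g_k (weights λ_k) and node features s_k (weights μ_k) listed above; only g_1, g_2 and s_1, s_2 are written out, with the remaining features (including those conditioned on the brightness values) following the same pattern. The state ordering and function names are illustrative assumptions, not the patent's code.

```python
import numpy as np

STATES = ["off", "brake", "left", "right"]

def transfer_feats(y_prev, y_cur, f_act):
    total = 0.0
    total += 1.0 * (y_prev == "brake" and y_cur == "off" and f_act == 2)   # g_1, lambda_1 = 1
    total += 1.0 * (y_prev == "off" and y_cur == "brake" and f_act == 1)   # g_2, lambda_2 = 1
    # ... g_3 .. g_15 analogously, with their weights (g_13-g_15 also test I_l, I_r, I_h)
    return total

def node_feats(y_cur, f_act):
    total = 0.0
    total += 0.5 * (y_cur == "brake" and f_act == 0)                       # s_1, mu_1 = 0.5
    total += 0.5 * (y_cur == "off" and f_act == 0)                         # s_2, mu_2 = 0.5
    # ... s_3 .. s_10 analogously (s_7-s_10 also test I_thr or I_h)
    return total

def transition_matrix(f_act):
    M = np.zeros((4, 4))
    for u, y_prev in enumerate(STATES):
        for v, y_cur in enumerate(STATES):
            M[u, v] = np.exp(transfer_feats(y_prev, y_cur, f_act) + node_feats(y_cur, f_act))
    return M
```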
At the same time, the initial value α_0(y_0 | f) of the forward vector α_t(y_t | f) and the initial value β_{n+1}(y_{n+1} | f) of the backward vector β_t(y_t | f) are determined by the node features s_k alone. Taking α_0(y_0 | f) as an example, it contains the probabilities of the 4 possible tail lamp states, i.e., y_0 has 4 possible values; for each possible value, the probability is computed from the weighted sum of the node features, producing a 4-dimensional vector (the exact expressions are given as equation images in the original publication). β_{n+1}(y_{n+1} | f) is the initial value of the backward vector β_t(y_t | f); since the backward vector is computed recursively forward from the end of the sequence, the computation of β_t(y_t | f) starts from t = n + 1.
Thus, with the above method, the conditional probability of each state at each time can be calculated; the category with the highest probability is then selected as the output at the current time, and these outputs form the continuous tail lamp state sequence.
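A minimal sketch of this forward–backward inference is given below, assuming the 4 × 4 matrices M_t have already been built from the weighted transfer and node features and that the initial forward and backward vectors come from the node-feature initialisation described above; the indexing convention and all names are illustrative.

```python
import numpy as np

# Minimal forward-backward sketch for the linear-chain CRF described above.
# Assumptions (illustrative, not fixed by the text): M[t] is the 4x4 matrix linking
# positions t and t+1 and already folds in the weighted transfer and node features;
# alpha0 / beta_end are the node-feature-based initial vectors.

STATES = ["off", "brake", "left", "right"]

def infer_states(M, alpha0, beta_end):
    """M: list of n 4x4 numpy arrays; alpha0, beta_end: length-4 vectors."""
    n = len(M)
    alpha = [np.asarray(alpha0, dtype=float)]        # alpha[0] = initial forward vector
    for t in range(n):
        alpha.append(alpha[-1] @ M[t])               # alpha_t^T = alpha_{t-1}^T M_t
    beta = [None] * (n + 1)
    beta[n] = np.asarray(beta_end, dtype=float)      # backward recursion starts at the end
    for t in range(n - 1, -1, -1):
        beta[t] = M[t] @ beta[t + 1]                 # beta_t = M_{t+1} beta_{t+1}
    Z = float(alpha[n] @ beta[n])                    # normalisation factor Z(f)
    sequence = []
    for t in range(1, n + 1):
        p = alpha[t] * beta[t] / Z                   # P(Y_t = y | f) for the four states
        sequence.append(STATES[int(np.argmax(p))])   # pick the most probable state
    return sequence
```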
Following the method, data sets covering different real traffic scenes, including daytime, nighttime, congested roads and expressways, were collected on an autonomous driving platform. Six typical scenes (3 daytime and 3 nighttime) were selected to test the continuous tail lamp state estimation method provided by the invention; the test results, shown in fig. 7 (a) to 7 (f), indicate that the method obtains feasible results in complex real-world scenes.
In summary, the invention builds on YOLO and Deepsort to realize multi-vehicle target detection and tracking and obtain a vehicle tracking sequence for each vehicle. The continuous tail lamp state sequence is then estimated using the vehicle tracking sequence as input. The tracking image sequence of a given vehicle is divided into several consecutive segments with a sliding window. For each segment, an attention-model-based CNN-LSTM network identifies the 5 types of tail lamp actions implicit in the segment: constant, brake pressed, brake released, left turn and right turn. Next, the brightness feature of the high-mounted stop lamp in the last frame of each segment is extracted based on tail lamp semantic segmentation and combined with the tail lamp action feature to form a high-order feature for analysing the tail lamp state sequence. Finally, a linear-chain conditional random field model (Linear-CRF) is constructed, which establishes long-term dependence among consecutive segments by analysing the high-order features and thereby infers the continuous tail lamp state: off, brake, left, right.
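A compact sketch of this pipeline is shown below; every helper passed in (the detector/tracker, the action classifier, the brightness extractor and the CRF inference) is a hypothetical placeholder for the corresponding module described above, and the window size and step length are assumed values.

```python
# Compact sketch of the pipeline summarised above. detect_and_track, classify_action,
# extract_brightness and crf_infer are hypothetical placeholders for the YOLO+Deepsort
# tracker, the attention CNN-LSTM, the Segnet/OTSU brightness extraction and the
# Linear-CRF; WINDOW and STEP are assumed values.

WINDOW, STEP = 16, 8

def taillight_state_sequences(image_sequence, detect_and_track, classify_action,
                              extract_brightness, crf_infer):
    tracks = detect_and_track(image_sequence)          # vehicle_id -> list of frame crops
    results = {}
    for vehicle_id, frames in tracks.items():
        segments = [frames[i:i + WINDOW]                # sliding-window tracking fragments
                    for i in range(0, len(frames) - WINDOW + 1, STEP)]
        high_order = []
        for seg in segments:
            f_act = classify_action(seg)                # action code in {0,1,2,3,4}
            I_l, I_r, I_b, I_h, I_thr = extract_brightness(seg[-1])
            high_order.append((f_act, I_l, I_r, I_b, I_h, I_thr))
        results[vehicle_id] = crf_infer(high_order)     # off / brake / left / right per segment
    return results
```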
The present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof, and it will be understood by those skilled in the art that various changes and modifications may be made herein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (9)

1. A vehicle tail lamp state identification method based on action-state joint learning, characterized by comprising the following steps:
s1: acquiring continuous tracking fragment data based on a vehicle tracking sequence obtained in a real traffic scene;
s2: respectively extracting, using a pre-trained attention-model-based CNN-LSTM network, the tail lamp action feature f_{t,act} corresponding to each piece of tracking fragment data, wherein the tail lamp action feature f_{t,act} corresponding to each piece of tracking fragment data is one of: constant, brake pressed, brake released, left turn, right turn;
s3: extracting the brightness feature {I_l, I_r, I_b, I_h, I_thr} corresponding to each piece of tracking fragment data, and combining the tail lamp action feature f_{t,act} with the brightness feature {I_l, I_r, I_b, I_h, I_thr} to form the high-order feature f = {f_{t,act}, I_l, I_r, I_b, I_h, I_thr}, wherein I_l, I_r, I_b, I_h, I_thr respectively represent the average brightness of the left tail lamp, the average brightness of the right tail lamp, the average brightness of the background, the average brightness of the high-mounted stop lamp, and the adaptive threshold used when segmenting the high-mounted stop lamp from the vehicle tail;
s4: acquiring the tail lamp state at each moment from the high-order feature f corresponding to each piece of tracking fragment data, based on a linear-chain conditional random field model, wherein the tail lamp state is one of no action, braking, left turn and right turn, no action meaning neither braking nor turning.
2. The vehicle tail lamp state identification method based on action-state joint learning as claimed in claim 1, wherein the vehicle tracking sequence is obtained by:
detecting the position of each vehicle in each frame of image using a YOLO network, and then matching and updating the bounding box of each target vehicle through the Deepsort architecture to obtain the vehicle tracking sequence of each vehicle, thereby realizing multi-vehicle tracking.
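A minimal sketch of this tracking step follows; yolo_detect and deepsort_update are hypothetical stand-ins for the YOLO detector and the Deepsort matching/updating step, not actual library APIs.

```python
# Minimal sketch of the multi-vehicle tracking step in this claim. yolo_detect and
# deepsort_update are hypothetical stand-ins for the YOLO detector and the Deepsort
# bounding-box matching/updating, not actual library calls.

def build_tracking_sequences(frames, yolo_detect, deepsort_update):
    tracks = {}                                       # track_id -> list of vehicle crops
    for frame in frames:
        boxes = yolo_detect(frame)                    # vehicle bounding boxes in this frame
        for track_id, (x, y, w, h) in deepsort_update(boxes, frame):
            tracks.setdefault(track_id, []).append(frame[y:y + h, x:x + w])
    return tracks
```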
3. The vehicle tail lamp state identification method based on action-state joint learning as claimed in claim 1, wherein the tracking fragment data is obtained by:
using a sliding window with a set size that slides over the vehicle tracking sequence with a set step length, the vehicle tracking sequence is divided into continuous tracking fragment data, where t_0 is the start time of each piece of tracking fragment data.
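A minimal sketch of this sliding-window split follows; the window size and step length are illustrative values, since the claim only states that they are set in advance.

```python
# Minimal sketch of the sliding-window split in this claim; window size and step
# length are illustrative, since the claim only says they are set in advance.

def split_into_fragments(track, window=16, step=8):
    """track: list of frames for one vehicle; returns overlapping tracking fragments."""
    return [track[t0:t0 + window] for t0 in range(0, len(track) - window + 1, step)]
```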
4. The vehicle tail lamp state identification method based on action-state joint learning as claimed in claim 1, wherein the brightness feature corresponding to the last frame image of each piece of tracking fragment data is taken as the brightness feature {I_l, I_r, I_b, I_h, I_thr} corresponding to that piece of tracking fragment data, and when the areas of the left tail lamp, the right tail lamp and the high-mounted stop lamp in the image are all larger than a set value and the distance between the left tail lamp and the right tail lamp in the image is larger than a set value, the extraction method of the brightness feature {I_l, I_r, I_b, I_h, I_thr} comprises the following steps:
segmenting left tail lamp regions and right tail lamp regions of all vehicles in each frame image of the monocular image sequence by using a symmetric coding-decoding convolutional network Segnet based on a VGG16 model;
determining a region of interest for the high-mounted stop lamp according to the geometric centers of the left and right tail lamp regions, and then adaptively segmenting the Gaussian-filtered region of interest with the OTSU algorithm to obtain a high-mounted stop lamp region H;
taking the pixel mean of all pixels in the left tail lamp region as the average left tail lamp brightness I_l; taking the pixel mean of all pixels in the right tail lamp region as the average right tail lamp brightness I_r; taking the pixel mean of all pixels in the background region excluding all tail lamps as the average background brightness I_b; taking the pixel mean of all pixels in the high-mounted stop lamp region H as the average high-mounted stop lamp brightness I_h; and taking the threshold used by the OTSU algorithm when segmenting the region of interest as the adaptive threshold I_thr.
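A minimal sketch of these steps follows, assuming the left/right tail lamp masks come from the Segnet segmentation and the region of interest of the high-mounted stop lamp is already known; it shows only the Gaussian filtering, the OTSU split and the mean-brightness computations, with illustrative names.

```python
import cv2

# Minimal sketch of the brightness-feature extraction in this claim, assuming `gray`
# is a uint8 grayscale crop of the vehicle tail, `left_mask`/`right_mask` come from the
# Segnet segmentation, and `roi` = (x, y, w, h) is the high-mounted stop lamp region
# of interest. All names are illustrative.

def brightness_features(gray, left_mask, right_mask, roi):
    x, y, w, h = roi
    patch = cv2.GaussianBlur(gray[y:y + h, x:x + w], (5, 5), 0)
    I_thr, high_mask = cv2.threshold(patch, 0, 255,
                                     cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # adaptive OTSU split
    background = (left_mask == 0) & (right_mask == 0)
    I_l = float(gray[left_mask > 0].mean())        # average left tail lamp brightness
    I_r = float(gray[right_mask > 0].mean())       # average right tail lamp brightness
    I_b = float(gray[background].mean())           # average background brightness
    I_h = float(patch[high_mask > 0].mean())       # average high-mounted stop lamp brightness
    return I_l, I_r, I_b, I_h, float(I_thr)
```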
5. The vehicle tail lamp state identification method based on action-state joint learning as claimed in claim 4, wherein a four-dimensional vector {x_{c,h}, y_{c,h}, w_h, h_h} is used to describe the region of interest of the high-mounted stop lamp, where (x_{c,h}, y_{c,h}) is the center point coordinate of the region of interest and w_h and h_h are respectively the width and height of the region of interest, and each element of the four-dimensional vector is calculated as follows:
Figure FDA0003063505090000031
wherein x_{c,tmp} is an intermediate variable, (x_{c,l}, y_{c,l}) is the geometric center coordinate of the left tail lamp, (x_{c,r}, y_{c,r}) is the geometric center coordinate of the right tail lamp, y_b is the vertical coordinate of the top-left vertex of the vehicle bounding box, and w and h are respectively the width and height of the vehicle bounding box.
6. The vehicle tail lamp state identification method based on action-state joint learning as claimed in claim 4, wherein, when pre-training the attention-model-based CNN-LSTM network, the first 14 layers of its VGG16 model adopt the weights used by the symmetric encoding-decoding convolutional network Segnet when segmenting the monocular image sequence.
7. The vehicle tail lamp state identification method based on action-state joint learning as claimed in claim 1, wherein the brightness feature corresponding to the last frame image of each piece of tracking fragment data is taken as the brightness feature {I_l, I_r, I_b, I_h, I_thr} corresponding to that piece of tracking fragment data, and when the area of the left tail lamp, the right tail lamp or the high-mounted stop lamp in the image is not larger than the set value, or the distance between the left tail lamp and the right tail lamp in the image is not larger than the set value, the extraction method of the brightness feature {I_l, I_r, I_b, I_h, I_thr} comprises the following steps:
segmenting left tail lamp regions and right tail lamp regions of all vehicles in each frame image of the monocular image sequence by using a symmetric coding-decoding convolutional network Segnet based on a VGG16 model;
taking the pixel mean of all pixels in the left tail lamp region as the average left tail lamp brightness I_l; taking the pixel mean of all pixels in the right tail lamp region as the average right tail lamp brightness I_r; taking the pixel mean of all pixels in the background region as the average background brightness I_b; and taking the average background brightness I_b as both the average high-mounted stop lamp brightness I_h and the adaptive threshold I_thr.
8. The vehicle tail lamp state identification method based on action-state joint learning as claimed in claim 1, wherein the specific steps by which the linear-chain conditional random field model acquires the tail lamp state at each moment from the high-order feature f corresponding to each piece of tracking fragment data are as follows:
for each single-point time t, extracting the tracking segment data whose last frame is the frame at the single-point time t, and executing a state probability acquisition operation on it, obtaining the conditional probabilities that the tail lamp state at the single-point time t is no action, braking, left turn and right turn, and taking the state with the maximum conditional probability as the final tail lamp state at the single-point time t; wherein the state probability acquisition operation is:
using the state probability P(Y_t = y_t | f) defined below, respectively calculating the conditional probabilities that the tail lamp state at the current single-point time t is no action, braking, left turn and right turn:

P(Y_t = y_t | f) = α_t^T(y_t | f) · β_t(y_t | f) / Z(f)
wherein f is the high-order feature corresponding to the tracking fragment data currently selected for the state probability acquisition operation, y_t is the tail lamp state at the current single-point time t with y_t ∈ {off, brake, left, right} (off for no action, brake for braking, left for left turn, right for right turn), Z(f) is a normalization factor related to f, α_t(y_t | f) is the forward vector corresponding to the current single-point time t, β_t(y_t | f) is the backward vector corresponding to the current single-point time t, and T denotes transposition;
wherein the recurrence formulas of the forward vector α_t(y_t | f) and the backward vector β_t(y_t | f) corresponding to the current single-point time t are as follows:

α_t^T(y_t | f) = α_{t-1}^T(y_{t-1} | f) · M_t(y_{t-1}, y_t | f)

β_t(y_t | f) = M_{t+1}(y_t, y_{t+1} | f) · β_{t+1}(y_{t+1} | f)
wherein α_{t-1}(y_{t-1} | f) is the forward vector corresponding to the single-point time preceding the current single-point time t, T denotes transposition, β_{t+1}(y_{t+1} | f) is the backward vector corresponding to the single-point time following the current single-point time t, and M_t(y_{t-1}, y_t | f) is a 4 × 4 state transition matrix; each row and each column of the state transition matrix represents a tail lamp state, and each element of the state transition matrix represents the transition probability from a possible tail lamp state y_{t-1} at the previous single-point time to a possible tail lamp state y_t at the current single-point time t;
the transition probability corresponding to each element M_t(y_{t-1}, y_t | f) of the state transition matrix is calculated as follows:

M_t(y_{t-1}, y_t | f) = exp( Σ_{k=1..K_1} λ_k · g_k(y_{t-1}, y_t, f, t) + Σ_{k=1..K_2} μ_k · s_k(y_t, f, t) )
wherein K_1 is the number of classes of transfer features, g_k are the set transfer features, λ_k is the weight set for each transfer feature, K_2 is the number of classes of node features, s_k are the set node features, and μ_k is the weight set for each node feature;
wherein the values of the transfer features and the node features are 0 or 1; for each transfer feature, the value is 1 only when the tail lamp state y_{t-1} at the previous single-point time, the possible tail lamp state y_t at the current single-point time t, and the tail lamp action feature in the currently selected high-order feature f meet the set conditions, and 0 otherwise; for each node feature, the value is 1 only when the possible tail lamp state y_t at the current single-point time t and the tail lamp action feature in the currently selected high-order feature f meet the set conditions, and 0 otherwise;
the set conditions are that the change from the tail lamp state y_{t-1} at the previous time to the possible tail lamp state y_t at the current single-point time t conforms to the operating logic of the vehicle during actual driving, and that the brightness feature corresponding to the tail lamp state deduced from the tail lamp action feature in the currently selected high-order feature f conforms to the actual brightness feature.
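A minimal sketch of how one state transition matrix could be evaluated from such 0/1 feature functions and their weights follows; the function and variable names are illustrative.

```python
import numpy as np

# Minimal sketch of evaluating the 4x4 state transition matrix from 0/1 feature
# functions and their weights, as described in this claim. transfer_feats[k] and
# node_feats[k] are assumed to be callables returning 0 or 1; all names are illustrative.

STATES = ["off", "brake", "left", "right"]

def transition_matrix(f, t, transfer_feats, transfer_w, node_feats, node_w):
    M = np.zeros((len(STATES), len(STATES)))
    for i, y_prev in enumerate(STATES):
        for j, y_curr in enumerate(STATES):
            score = sum(w * g(y_prev, y_curr, f, t)           # weighted transfer features
                        for g, w in zip(transfer_feats, transfer_w))
            score += sum(w * s(y_curr, f, t)                  # weighted node features
                         for s, w in zip(node_feats, node_w))
            M[i, j] = np.exp(score)                           # unnormalised transition score
    return M
```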
9. The vehicle tail lamp state identification method based on action-state joint learning as claimed in claim 8, wherein K_1 = 15 and K_2 = 10; the value and weight settings of each transfer feature are as follows:
g_1 = g_1(y_{t-1} = brake, y_t = off, f_{t,act} = 2, t), λ_1 = 1
g_2 = g_2(y_{t-1} = off, y_t = brake, f_{t,act} = 1, t), λ_2 = 1
g_3 = g_3(y_{t-1} = left, y_t = left, f_{t,act} = 3, t), λ_3 = 1.5
g_4 = g_4(y_{t-1} = right, y_t = right, f_{t,act} = 4, t), λ_4 = 1.5
g_5 = g_5(y_{t-1} = left, y_t = off, f_{t,act} = 0, t), λ_5 = 1.5
g_6 = g_6(y_{t-1} = right, y_t = off, f_{t,act} = 0, t), λ_6 = 1.5
g_7 = g_7(y_{t-1} = off, y_t = left, f_{t,act} = 3, t), λ_7 = 1.5
g_8 = g_8(y_{t-1} = off, y_t = right, f_{t,act} = 4, t), λ_8 = 1.5
g_9 = g_9(y_{t-1} = brake, y_t = brake, f_{t,act} = 0, t), λ_9 = 1
g_10 = g_10(y_{t-1} = off, y_t = off, f_{t,act} = 0, t), λ_10 = 1
g_11 = g_11(y_{t-1} = left, y_t = left, f_{t,act} = 0, t), λ_11 = 1
g_12 = g_12(y_{t-1} = right, y_t = right, f_{t,act} = 0, t), λ_12 = 1
g_13 = g_13(y_{t-1} = off, y_t = brake, f_{t,act} = 0, I_l, I_r > 110, t), λ_13 = 1
g_14 = g_14(y_{t-1} = brake, y_t = brake, f_{t,act} = 2, I_l, I_r > 110, t), λ_14 = 1
g_15 = g_15(y_{t-1} = off, y_t = brake, f_{t,act} = 1, I_h > 120, t), λ_15 = 1
where f_{t,act} ∈ {0, 1, 2, 3, 4} represents the five tail lamp action features within one piece of tracking segment data: constant = 0, brake pressed = 1, brake released = 2, left turn = 3, right turn = 4;
the value and weight settings of each node feature are as follows:
s_1 = s_1(y_t = brake, f_{t,act} = 0, t), μ_1 = 0.5
s_2 = s_2(y_t = off, f_{t,act} = 0, t), μ_2 = 0.5
s_3 = s_3(y_t = left, f_{t,act} = 3, t), μ_3 = 0.5
s_4 = s_4(y_t = right, f_{t,act} = 4, t), μ_4 = 0.5
s_5 = s_5(y_t = brake, f_{t,act} = 1, t), μ_5 = 0.5
s_6 = s_6(y_t = off, f_{t,act} = 2, t), μ_6 = 0.5
s_7 = s_7(y_t = off, f_{t,act} = 1, I_thr < 75, t), μ_7 = 0.5
s_8 = s_8(y_t = off, f_{t,act} = 0, I_thr < 75, t), μ_8 = 0.5
s_9 = s_9(y_t = brake, f_{t,act} = 1, I_h > 120, t), μ_9 = 0.5
s_10 = s_10(y_t = brake, f_{t,act} = 0, I_h > 120, t), μ_10 = 0.5
for each expression of the transfer characteristics and the node characteristics, the values of the transfer characteristics and the node characteristics are 1 only when the parenthesized conditions are met;
meanwhile, the initial value α_0(y_0 | f) of the forward vector α_t(y_t | f) and the initial value β_{n+1}(y_{n+1} | f) of the backward vector β_t(y_t | f) are defined as follows:

α_0(y_0 | f) = exp( Σ_{k=1..K_2} μ_k · s_k(y_0, f, t = 0) ), y_0 ∈ {off, brake, left, right}

β_{n+1}(y_{n+1} | f) = exp( Σ_{k=1..K_2} μ_k · s_k(y_{n+1}, f, t = n+1) ), y_{n+1} ∈ {off, brake, left, right}
where n represents the end time of the vehicle tracking sequence.
CN202110519911.5A 2021-05-13 2021-05-13 Vehicle tail lamp state identification method based on action-state joint learning Active CN113111862B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110519911.5A CN113111862B (en) 2021-05-13 2021-05-13 Vehicle tail lamp state identification method based on action-state joint learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110519911.5A CN113111862B (en) 2021-05-13 2021-05-13 Vehicle tail lamp state identification method based on action-state joint learning

Publications (2)

Publication Number Publication Date
CN113111862A CN113111862A (en) 2021-07-13
CN113111862B true CN113111862B (en) 2022-12-13

Family

ID=76722079

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110519911.5A Active CN113111862B (en) 2021-05-13 2021-05-13 Vehicle tail lamp state identification method based on action-state joint learning

Country Status (1)

Country Link
CN (1) CN113111862B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115018076B (en) * 2022-08-09 2022-11-08 聚时科技(深圳)有限公司 AI chip reasoning quantification method for intelligent servo driver

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111881739A (en) * 2020-06-19 2020-11-03 安徽清新互联信息科技有限公司 Automobile tail lamp state identification method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10733465B2 (en) * 2017-09-20 2020-08-04 Tusimple, Inc. System and method for vehicle taillight state recognition
US11361557B2 (en) * 2019-01-18 2022-06-14 Toyota Research Institute, Inc. Attention-based recurrent convolutional network for vehicle taillight recognition

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111881739A (en) * 2020-06-19 2020-11-03 安徽清新互联信息科技有限公司 Automobile tail lamp state identification method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Learning to tell brake and turn signals in videos using CNN-LSTM structure; Han-Kai Hsu et al.; IEEE Xplore; 2018-03-15; full text *
Research on video-based night-time vehicle detection and tracking algorithms; Dong Tianyang et al.; Computer Science; 2017-11-15; full text *

Also Published As

Publication number Publication date
CN113111862A (en) 2021-07-13

Similar Documents

Publication Publication Date Title
Sen-Ching et al. Robust techniques for background subtraction in urban traffic video
Chiu et al. A robust object segmentation system using a probability-based background extraction algorithm
Robert Video-based traffic monitoring at day and night vehicle features detection tracking
US8019157B2 (en) Method of vehicle segmentation and counting for nighttime video frames
US9378556B2 (en) Method for reducing false object detection in stop-and-go scenarios
Chiu et al. Automatic Traffic Surveillance System for Vision-Based Vehicle Recognition and Tracking.
KR102043089B1 (en) Method for extracting driving lane, device and computer readable medium for performing the method
CN109063667B (en) Scene-based video identification mode optimization and pushing method
WO2018058854A1 (en) Video background removal method
Babaei Vehicles tracking and classification using traffic zones in a hybrid scheme for intersection traffic management by smart cameras
CN113111862B (en) Vehicle tail lamp state identification method based on action-state joint learning
Tavakkoli et al. A novelty detection approach for foreground region detection in videos with quasi-stationary backgrounds
CN113392725A (en) Pedestrian street crossing intention identification method based on video data
Gad et al. Real-time lane instance segmentation using segnet and image processing
Ren et al. Automatic measurement of traffic state parameters based on computer vision for intelligent transportation surveillance
Arthi et al. Object detection of autonomous vehicles under adverse weather conditions
Muniruzzaman et al. Deterministic algorithm for traffic detection in free-flow and congestion using video sensor
Rajagopal et al. Vision-based system for counting of moving vehicles in different weather conditions
Song et al. Action-state joint learning-based vehicle taillight recognition in diverse actual traffic scenes
Diamantas et al. Modeling pixel intensities with log-normal distributions for background subtraction
abd el Azeem Marzouk Modified background subtraction algorithm for motion detection in surveillance systems
Nicolas et al. Video traffic analysis using scene and vehicle models
Roy et al. Real-time record sensitive background classifier (RSBC)
CN113158747A (en) Night snapshot identification method for black smoke vehicle
Mo et al. Research on expressway traffic event detection at night based on Mask-SpyNet

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant