CN113111862B - Vehicle tail lamp state identification method based on action-state joint learning - Google Patents
Vehicle tail lamp state identification method based on action-state joint learning
- Publication number
- CN113111862B CN202110519911.5A
- Authority
- CN
- China
- Prior art keywords
- tail lamp
- state
- act
- vehicle
- brake
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
- G06V20/584—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of vehicle lights or traffic lights
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/136—Segmentation; Edge detection involving thresholding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20092—Interactive image processing based on input by user
- G06T2207/20104—Interactive definition of region of interest [ROI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30248—Vehicle exterior or interior
- G06T2207/30252—Vehicle exterior; Vicinity of vehicle
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a vehicle tail-light state identification method based on action-state joint learning. The method first obtains continuous tracking-segment data from vehicle tracking sequences captured in real traffic scenes, and uses an attention-based CNN-LSTM network to identify the five tail-light action classes implicit in each tracking segment: no change, brake pressed, brake released, left turn, and right turn. It then obtains the average-brightness feature of the high-mounted stop lamp for each tracking segment via semantic segmentation of the tail lights, and combines it with the tail-light action features into high-order features. Finally, it constructs a linear-chain conditional random field model that establishes long-term dependencies between consecutive segments by analyzing the high-order features, and infers the continuous tail-light state at every moment: no action, braking, left turn, or right turn. The method can therefore accurately extract the hidden semantic features of the tail lights of vehicles of different types and standards from every frame in varied, complex real traffic scenes, yielding a continuous, stable tail-light state at each moment.
Description
Technical Field
The invention belongs to the technical field of automatic control, and particularly relates to a vehicle tail lamp state identification method based on action-state joint learning.
Background
Autonomous driving and related research have progressed greatly over the past few decades. In highly dynamic, high-density urban traffic, an autonomous vehicle must understand the behavioral intent of surrounding vehicles in order to make intelligent, reliable decisions and plans. Most existing methods predict intent from the historical trajectories of vehicles. Yet the visual signals of a vehicle, especially its tail lights, are a direct indication of its intention, so vehicle tail-light identification plays a crucial role in vehicle behavior understanding and trajectory prediction. In practice, however, across different real traffic scenarios (day, night, crowded roads, and highways), estimating the tail-light state presents many challenges, which is why few mature solutions exist in current automatic driving systems. The main challenges are: 1) Variable lighting conditions: imaging noise such as halos, shadows, and strong reflections tends to overwhelm the features of the tail lights, as shown in fig. 1 (a); 2) Non-uniform tail-light standards: the tail lights of trucks, buses, cars, SUVs, etc. vary widely in shape, color, and brightness, as shown in fig. 1 (b); 3) Random relative pose: tail lights are generally observed from different angles on the left and right; apart from the rare case of observation from directly behind, this can leave the tail lights occluded and distorted, as shown in fig. 1 (c).
Furthermore, tail-light identification is a time-series problem: it depends not only on the current feature state but also on the transition relationships between successive states. For example, marker lights are similar in color and shape to brake lights yet show different states by day and night: during the day the marker lights stay off, while at night they stay lit. Consequently, an unbraked tail light at night may be brighter than a braked one during the day (as shown in fig. 2). It is therefore necessary to analyze the continuous change of the tail-light state over a period of time to determine the current behavior, rather than relying on simple features such as shape and brightness alone.
Disclosure of Invention
In order to solve the above problems, the present invention provides a vehicle tail-light state identification method based on action-state joint learning, which can obtain the continuous stable state of the tail lights even in complex real-world scenes.
A vehicle tail lamp state identification method based on action-state joint learning comprises the following steps:
s1: acquiring continuous tracking fragment data based on a vehicle tracking sequence obtained in a real traffic scene;
s2: adopting a pre-trained attention-based CNN-LSTM network to extract the tail-light action feature f_{t,act} of each tracking segment, where f_{t,act} is one of: no change, brake pressed, brake released, left turn, right turn;
s3: extracting the brightness features {I_l, I_r, I_b, I_h, I_thr} of each tracking segment and combining them with the action feature f_{t,act} to form the high-order feature f = {f_{t,act}, I_l, I_r, I_b, I_h, I_thr}, where I_l, I_r, I_b, I_h, and I_thr respectively denote the average brightness of the left tail light, the average brightness of the right tail light, the average background brightness, the average brightness of the high-mounted stop lamp, and the adaptive threshold used to segment the high-mounted stop lamp from the rear of the vehicle;
s4: based on a linear-chain conditional random field model over the high-order feature of each tracking segment, obtaining the tail-light state at each moment, where the state is one of no action, braking, left turn, and right turn, with no action meaning neither braking nor turning.
Further, the vehicle tracking sequence obtaining method comprises the following steps:
detecting the position of each vehicle in each frame of image with a YOLO network, then matching and updating the bounding box of the target vehicle through the Deepsort architecture to obtain the vehicle tracking sequence of each vehicle, thereby realizing multi-vehicle tracking.
Further, the acquisition method of the trace fragment data is as follows:
adopting a sliding window of set size, sliding over the vehicle tracking sequence with a set step length, and dividing the sequence into continuous tracking-segment data S_{I,t,t_0}, where t_0 is the start time of each tracking segment.
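As an illustrative sketch (function and variable names are hypothetical, not part of the invention), the sliding-window division of a tracking sequence can be written as:

```python
def split_into_segments(track, window=16, step=1):
    """Divide a vehicle tracking sequence into overlapping tracking
    segments; window=16 and step=1 are the values given later in the
    detailed description."""
    return [(t0, track[t0:t0 + window])
            for t0 in range(0, len(track) - window + 1, step)]

frames = list(range(20))             # stand-in for 20 tracked frames
segments = split_into_segments(frames)
print(len(segments))                 # 5 overlapping 16-frame segments
```

With step = 1, consecutive segments share 15 of their 16 frames, which is what later lets the CRF reason about the state at every single frame.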
Further, the brightness feature of the last frame image of each tracking segment is taken as the brightness feature {I_l, I_r, I_b, I_h, I_thr} of that segment. When the image areas of the left tail light, the right tail light, and the high-mounted stop lamp are all larger than a set value, and the image distance between the left and right tail lights is larger than a set value, the brightness features {I_l, I_r, I_b, I_h, I_thr} are extracted as follows:
segmenting left tail lamp regions and right tail lamp regions of all vehicles in each frame image of the monocular image sequence by using a symmetric coding-decoding convolutional network Segnet based on a VGG16 model;
determining an interested area of the high-mount stop lamp according to the geometric center of the left tail lamp area and the geometric center of the right tail lamp area, and then performing self-adaptive segmentation on the interested area subjected to Gaussian filtering by using an OTSU algorithm to obtain a high-mount stop lamp area H;
taking the mean pixel value over the left tail-light region as the average brightness I_l of the left tail light; the mean pixel value over the right tail-light region as the average brightness I_r of the right tail light; the mean pixel value over the background region (excluding all tail lights) as the average background brightness I_b; the mean pixel value over the high-mounted stop-lamp region H as the average brightness I_h of the high-mounted stop lamp; and the threshold used by the OTSU algorithm when segmenting the region of interest as the adaptive threshold I_thr.
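The brightness-feature extraction can be sketched as follows, under stated assumptions: the Segnet masks and the high-mounted stop-lamp region of interest are taken as given 8-bit grayscale inputs, the Gaussian filtering step is omitted, and the OTSU threshold is computed directly from the ROI histogram; all names are illustrative.

```python
import numpy as np

def otsu_threshold(gray):
    """Minimal OTSU: pick the 8-bit threshold that maximizes the
    between-class variance of the grayscale histogram."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    cum_n = np.cumsum(hist)                     # pixels with value <= t
    cum_sum = np.cumsum(hist * np.arange(256))  # intensity sum up to t
    best_t, best_var = 0, -1.0
    for t in range(1, 255):
        w0 = cum_n[t] / total
        w1 = 1.0 - w0
        if w0 == 0.0 or w1 == 0.0:
            continue
        m0 = cum_sum[t] / cum_n[t]
        m1 = (cum_sum[-1] - cum_sum[t]) / (total - cum_n[t])
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def brightness_features(gray, left_mask, right_mask, high_mask, roi):
    """Assemble {I_l, I_r, I_b, I_h, I_thr} from one grayscale frame,
    the three segmentation masks, and the high-mounted stop-lamp ROI."""
    background = ~(left_mask | right_mask | high_mask)
    return {
        "I_l": gray[left_mask].mean(),
        "I_r": gray[right_mask].mean(),
        "I_b": gray[background].mean(),
        "I_h": gray[high_mask].mean(),
        "I_thr": otsu_threshold(roi),
    }
```

In the full method the ROI is first Gaussian-filtered and the high-mounted stop-lamp mask comes from the OTSU segmentation itself; here both are taken as inputs to keep the sketch short.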
Further, a four-dimensional vector {x_{c,h}, y_{c,h}, w_h, h_h} describes the region of interest of the high-mounted stop lamp, where (x_{c,h}, y_{c,h}) are the coordinates of the center of the region of interest, and w_h and h_h are its width and height. In the calculation formulas for the elements of this vector, x_{c,tmp} is an intermediate variable, (x_{c,l}, y_{c,l}) are the geometric center coordinates of the left tail light, (x_{c,r}, y_{c,r}) are the geometric center coordinates of the right tail light, y_b is the vertical coordinate of the top-left vertex of the vehicle bounding box, and w and h are the width and height of the vehicle bounding box.
Further, when pre-training the attention-based CNN-LSTM network, the CNN model consists of the first 14 layers of the VGG16 model and is initialized with the weights used by the symmetric encoding-decoding convolutional network Segnet when segmenting the monocular image sequence.
Further, the brightness feature of the last frame image of each tracking segment is taken as the brightness feature {I_l, I_r, I_b, I_h, I_thr} of that segment. When the image area of the left tail light, the right tail light, or the high-mounted stop lamp is not larger than the set value, or the image distance between the left and right tail lights is not larger than the set value, the brightness features {I_l, I_r, I_b, I_h, I_thr} are extracted as follows:
segmenting left tail lamp regions and right tail lamp regions of all vehicles in each frame image of the monocular image sequence by using a symmetric coding-decoding convolutional network Segnet based on a VGG16 model;
taking the mean pixel value over the left tail-light region as the average brightness I_l of the left tail light; the mean pixel value over the right tail-light region as the average brightness I_r of the right tail light; the mean pixel value over the background region as the average background brightness I_b; and the average background brightness I_b as both the average brightness I_h of the high-mounted stop lamp and the adaptive threshold I_thr.
Further, the specific steps of obtaining the tail-light state at each moment from the high-order feature f of each tracking segment with the linear-chain conditional random field model are as follows:
for each single-point moment t, select the tracking segment whose last frame is the frame at moment t and execute the state-probability operation, obtaining the conditional probabilities that the tail-light state at moment t is no action, braking, left turn, or right turn, and take the state with the largest conditional probability as the final tail-light state at moment t; the state-probability operation is:
the conditional probabilities that the tail-light state at the current single-point moment t is no action, braking, left turn, or right turn are computed from the state probability defined as

P(Y_t = y_t | f) = α_t^T(y_t | f) β_t(y_t | f) / Z(f)

where f is the high-order feature of the tracking segment currently selected for the state-probability operation, y_t is the tail-light state at the current moment t with y_t ∈ {off, brake, left, right} (off denoting no action, brake braking, left a left turn, and right a right turn), Z(f) is a normalization factor depending on f, α_t(y_t | f) is the forward vector at the current moment t, β_t(y_t | f) is the backward vector at the current moment t, and T denotes transposition;
where the forward vector α_t(y_t | f) and the backward vector β_t(y_t | f) at the current single-point moment t obey the recurrences

α_t^T(y_t | f) = α_{t-1}^T(y_{t-1} | f) M_t(y_{t-1}, y_t | f)

β_t(y_t | f) = M_{t+1}(y_t, y_{t+1} | f) β_{t+1}(y_{t+1} | f)

in which α_{t-1}(y_{t-1} | f) is the forward vector at the previous moment, T denotes transposition, β_{t+1}(y_{t+1} | f) is the backward vector at the next moment, and M_t(y_{t-1}, y_t | f) is a 4 × 4 state-transition matrix whose rows and columns each index a tail-light state; each element gives the transition probability from a possible tail-light state y_{t-1} at the previous moment to a possible tail-light state y_t at the current moment;
the elements of the state-transition matrix M_t(y_{t-1}, y_t | f) are computed as

M_t(y_{t-1}, y_t | f) = exp( Σ_{k1=1..K_1} λ_{k1} g_{k1}(y_{t-1}, y_t, f, t) + Σ_{k2=1..K_2} μ_{k2} s_{k2}(y_t, f, t) )

where K_1 is the number of transition-feature classes, g_{k1}(y_{t-1}, y_t, f, t) are the set transition features with set weights λ_{k1}, K_2 is the number of node-feature classes, and s_{k2}(y_t, f, t) are the set node features with set weights μ_{k2};
the transition features and node features take values 0 or 1. A transition feature equals 1 only when the tail-light state y_{t-1} at the previous moment, the possible tail-light state y_t at the current moment, and the tail-light action feature within the currently selected high-order feature f satisfy its set condition, and 0 otherwise; a node feature equals 1 only when the possible tail-light state y_t at the current moment and the tail-light action feature within the currently selected high-order feature f satisfy its set condition, and 0 otherwise;
the set conditions are that the change from the tail-light state y_{t-1} at the previous moment to the possible tail-light state y_t at the current moment conforms to the operating logic of a vehicle in actual driving, and that the brightness features implied by that state, given the tail-light action feature in the currently selected high-order feature f, conform to the actual brightness features.
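The forward-backward inference over a tracking run can be sketched as below. This is a simplification: the per-step 4 × 4 matrices M_t are assumed to be already assembled from the weighted features, and the start and stop boundary vectors are replaced by all-ones vectors.

```python
import numpy as np

STATES = ["off", "brake", "left", "right"]

def crf_marginals(Ms):
    """Compute P(Y_t = y | f) for every t in a linear-chain CRF via the
    recurrences alpha_t = alpha_{t-1} M_t and beta_t = M_{t+1} beta_{t+1};
    Ms[k] plays the role of M_{k+1}, entries already exponentiated."""
    n = len(Ms)
    alphas = [np.ones(4)]                # simplified boundary alpha_0
    for M in Ms:
        alphas.append(alphas[-1] @ M)
    betas = [np.ones(4)]                 # simplified boundary beta_n
    for M in reversed(Ms):
        betas.append(M @ betas[-1])
    betas.reverse()                      # betas[t] is now beta_t
    Z = alphas[-1] @ betas[-1]           # normalization factor Z(f)
    return [alphas[t] * betas[t] / Z for t in range(n + 1)]
```

Because α_t · β_t equals Z(f) at every t, each returned marginal vector sums to 1; taking its argmax gives the decoded state at that moment.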
Further, K_1 = 15 and K_2 = 10, and the values and set weights of the transition features are as follows:
g_1 = g_1(y_{t-1}=brake, y_t=off, f_{t,act}=2, t), λ_1 = 1
g_2 = g_2(y_{t-1}=off, y_t=brake, f_{t,act}=1, t), λ_2 = 1
g_3 = g_3(y_{t-1}=left, y_t=left, f_{t,act}=3, t), λ_3 = 1.5
g_4 = g_4(y_{t-1}=right, y_t=right, f_{t,act}=4, t), λ_4 = 1.5
g_5 = g_5(y_{t-1}=left, y_t=off, f_{t,act}=0, t), λ_5 = 1.5
g_6 = g_6(y_{t-1}=right, y_t=off, f_{t,act}=0, t), λ_6 = 1.5
g_7 = g_7(y_{t-1}=off, y_t=left, f_{t,act}=3, t), λ_7 = 1.5
g_8 = g_8(y_{t-1}=off, y_t=right, f_{t,act}=4, t), λ_8 = 1.5
g_9 = g_9(y_{t-1}=brake, y_t=brake, f_{t,act}=0, t), λ_9 = 1
g_10 = g_10(y_{t-1}=off, y_t=off, f_{t,act}=0, t), λ_10 = 1
g_11 = g_11(y_{t-1}=left, y_t=left, f_{t,act}=0, t), λ_11 = 1
g_12 = g_12(y_{t-1}=right, y_t=right, f_{t,act}=0, t), λ_12 = 1
g_13 = g_13(y_{t-1}=off, y_t=brake, f_{t,act}=0, I_l, I_r > 110, t), λ_13 = 1
g_14 = g_14(y_{t-1}=brake, y_t=brake, f_{t,act}=2, I_l, I_r > 110, t), λ_14 = 1
g_15 = g_15(y_{t-1}=off, y_t=brake, f_{t,act}=1, I_h > 120, t), λ_15 = 1
where f_{t,act} denotes the five tail-light action features within one tracking segment: no change = 0, brake pressed = 1, brake released = 2, left turn = 3, right turn = 4;
the values and set weights of the node features are as follows:
s_1 = s_1(y_t=brake, f_{t,act}=0, t), μ_1 = 0.5
s_2 = s_2(y_t=off, f_{t,act}=0, t), μ_2 = 0.5
s_3 = s_3(y_t=left, f_{t,act}=3, t), μ_3 = 0.5
s_4 = s_4(y_t=right, f_{t,act}=4, t), μ_4 = 0.5
s_5 = s_5(y_t=brake, f_{t,act}=1, t), μ_5 = 0.5
s_6 = s_6(y_t=off, f_{t,act}=2, t), μ_6 = 0.5
s_7 = s_7(y_t=off, f_{t,act}=1, I_thr < 75, t), μ_7 = 0.5
s_8 = s_8(y_t=off, f_{t,act}=0, I_thr < 75, t), μ_8 = 0.5
s_9 = s_9(y_t=brake, f_{t,act}=1, I_h > 120, t), μ_9 = 0.5
s_10 = s_10(y_t=brake, f_{t,act}=0, I_h > 120, t), μ_10 = 0.5
For each transition-feature and node-feature expression above, the feature takes the value 1 only when all the conditions in parentheses are met.
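To make the indicator-feature mechanism concrete, the following toy construction builds M_t from a small subset of the features above (four transition features, two node features, brightness conditions omitted; the names and the reduced feature set are illustrative only):

```python
import numpy as np

STATES = ["off", "brake", "left", "right"]

# A small subset of the transition features g_k and node features s_k
# listed above, as (y_prev, y_cur, required f_t,act, weight) and
# (y_cur, required f_t,act, weight) tuples; action codes follow
# f_t,act: 0 = no change, 1 = brake pressed, 2 = brake released,
# 3 = left turn, 4 = right turn.
TRANSITION_FEATURES = [
    ("brake", "off",   2, 1.0),   # g_1
    ("off",   "brake", 1, 1.0),   # g_2
    ("left",  "left",  3, 1.5),   # g_3
    ("right", "right", 4, 1.5),   # g_4
]
NODE_FEATURES = [
    ("brake", 0, 0.5),            # s_1
    ("off",   0, 0.5),            # s_2
]

def transition_matrix(f_act):
    """M_t(y_prev, y_cur | f) = exp(sum of the weighted features whose
    bracketed conditions match the state pair and the action f_act)."""
    M = np.zeros((4, 4))
    for i, yp in enumerate(STATES):
        for j, yc in enumerate(STATES):
            score = sum(w for a, b, act, w in TRANSITION_FEATURES
                        if (a, b, act) == (yp, yc, f_act))
            score += sum(w for b, act, w in NODE_FEATURES
                         if (b, act) == (yc, f_act))
            M[i, j] = np.exp(score)
    return M

M = transition_matrix(1)          # segment action: brake pressed
# only g_2 (off -> brake) fires, so M[0, 1] = e and every other entry is 1
```

The weighting means a transition consistent with the observed action (here off to brake when the brake is pressed) gets an exponentially larger score than inconsistent ones, which is exactly how the CRF prefers physically plausible state sequences.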
At the same time, the initial values α_0(y_0 | f) and β_{n+1}(y_{n+1} | f) of the forward vector α_t(y_t | f) and the backward vector β_t(y_t | f) are defined as follows:

α_0(y_0 | f) = 1 if y_0 = start, and 0 otherwise
β_{n+1}(y_{n+1} | f) = 1 if y_{n+1} = stop, and 0 otherwise

where n denotes the end time of the vehicle tracking sequence.
Advantageous effects:
1. The invention provides a vehicle tail-light state identification method based on action-state joint learning. It first obtains continuous tracking-segment data from vehicle tracking sequences captured in real traffic scenes, and uses an attention-based CNN-LSTM network to identify the five tail-light action classes implicit in each tracking segment: no change, brake pressed, brake released, left turn, and right turn. It then obtains the average-brightness feature of the high-mounted stop lamp for each tracking segment via semantic segmentation of the tail lights, and combines it with the tail-light action feature into a high-order feature used to analyze the tail-light state sequence. Finally, it constructs a linear-chain conditional random field model (Linear-CRF) that establishes long-term dependencies between consecutive segments by analyzing the high-order features, thereby inferring the continuous tail-light state at each moment. The hidden semantic features of the tail lights of vehicles of different types and standards can thus be accurately extracted from each frame in varied, complex real traffic scenes, yielding the continuous stable state of the tail lights at each moment.
2. The invention provides a vehicle tail-light state identification method based on action-state joint learning that segments the tail lights with the VGG16-based symmetric encoding-decoding convolutional network Segnet while tracking the target vehicle over a long period, thereby generating a tracking sequence and analyzing the relative changes of high-dimensional features between frames, which effectively addresses the time-series nature of the problem; in other words, the invention determines the continuous stable state of the tail lights, i.e. no action, braking, left turn, or right turn, by jointly analyzing the continuous tail-light action changes and the current semantic expression of the tail lights.
3. The invention provides a vehicle tail-light state identification method based on action-state joint learning that uses the weights employed by the symmetric encoding-decoding convolutional network Segnet when segmenting the monocular image sequence as the weights of the CNN model in the attention-based CNN-LSTM, so that the network extracts tail-light-relevant features, attends to the tail-light region more quickly, improves recognition accuracy, and accelerates model convergence.
4. The invention provides a vehicle tail lamp state identification method based on action-state joint learning.
5. The invention provides a vehicle tail-light state identification method based on action-state joint learning in which the vehicle tracking sequence is extracted with a YOLO network plus the Deepsort architecture, which is convenient and fast.
Drawings
Fig. 1 (a) illustrates the challenge of identifying tail lights of a vehicle in a complex real traffic scene with variable illumination conditions;
fig. 1 (b) illustrates the challenge of vehicle tail lamp identification in a complex real traffic scene with non-uniform tail lamp standards;
FIG. 1 (c) illustrates the challenge of vehicle tail-light identification in a complex real traffic scene with random relative observation poses;
FIG. 2 illustrates the time-series problem faced by vehicle tail-light identification;
FIG. 3 is an overall system framework for the proposed method of the present invention;
FIG. 4 is a pre-trained CNN and attention mechanism of an attention-based CNN-LSTM model;
FIG. 5 is an LSTM structure of the attention-based CNN-LSTM model;
FIG. 6 is a process of high mounted stop lamp segmentation and luminance feature extraction;
FIG. 7 (a) shows the continuous tail-light state estimation results in a typical scenario: daytime, highway environment;
FIG. 7 (b) shows the continuous tail-light state estimation results in a typical scenario: daytime, congested urban road environment;
FIG. 7 (c) shows the continuous tail-light state estimation results in a typical scenario: daytime, various vehicle types, congested environment;
FIG. 7 (d) shows the continuous tail-light state estimation results in a typical scenario: nighttime, highway environment;
FIG. 7 (e) shows the continuous tail-light state estimation results in a typical scenario: nighttime, various vehicle types, congested environment;
FIG. 7 (f) shows the continuous tail-light state estimation results in a typical scenario: nighttime, poor illumination, congested environment.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
For an image sequence of a real traffic scene, the method first captures an image stream with a monocular camera, detects surrounding vehicles with a YOLO network, and tracks each detected vehicle with a Deepsort network. The vehicle tracking sequence is then taken as input and divided into successive image segments by a sliding window. Through action classification and brightness-feature extraction, the tail-light action feature and brightness feature of each segment are extracted and combined into a high-order feature. Finally, the continuous tail-light state is inferred by analyzing the high-order features with a Linear-CRF model that establishes long-term dependencies between consecutive segments. The overall system framework is shown in fig. 3, and the specific process is as follows:
s1: acquiring continuous tracking fragment data based on a vehicle tracking sequence obtained in a real traffic scene, specifically:
s11: segmenting tail lamp regions of all vehicles in each frame image of the monocular image sequence by using a symmetric coding-decoding convolutional network Segnet based on a VGG16 model, wherein the tail lamp region segmentation result is not used for vehicle tracking, but provides a basis for subsequent feature extraction (brightness features and motion features);
s12: detecting the position of each vehicle in each frame of image by using a YOLO network, and then matching and updating the bounding box of the target vehicle through a Deepsort architecture to obtain a vehicle tracking sequence of each vehicle so as to realize multi-vehicle tracking;
s13: adopting a sliding window of set size, sliding over the vehicle tracking sequence with a set step length, and dividing the sequence into continuous tracking-segment data S_{I,t,t_0}, where t_0 is the start time of each tracking segment.
That is, the present invention first preprocesses the raw data. Preprocessing comprises tail-light semantic segmentation, detection and tracking, and sliding-window division. The raw data are monocular image sequences acquired in real traffic scenes. First, the VGG16-based symmetric encoding-decoding convolutional network Segnet segments the tail-light regions of all vehicles in each frame of image; the segmentation result is not used for vehicle tracking but provides the basis for subsequent feature extraction. Then a detection-and-tracking module yields a vehicle tracking sequence V_{I,t}, where I is the vehicle ID, denoting the I-th vehicle. The basic idea is to detect the position of each vehicle in every frame with YOLO and then match and update the bounding box of the target vehicle through the Deepsort architecture, realizing multi-vehicle tracking. The vehicle tracking sequence is then taken as model input and divided by a sliding window of size = 16 moving at step = 1 into continuous segments S_{I,t,t_0}, where t_0 is the start time of the segment; the size of the sliding window depends on the sampling frequency.
S2: adopting the attention-based CNN-LSTM network to extract the tail-light action feature f_{t,act} of each tracking segment, where f_{t,act} is one of: no change, brake pressed, brake released, left turn, right turn.
That is, step S2 feeds the segment data S_{I,t,t_0} from step S1 into the attention-based CNN-LSTM network to extract the tail-light action feature f_{t,act}. Only one action is extracted per segment, and turning has the highest priority: if turning and braking occur simultaneously within a tracking segment, the feature extracted for that segment is determined to be the turn. The tail-light actions within a segment fall into 5 classes: no change, brake pressed, brake released, left turn, right turn, of which the brake-light actions are no change, brake pressed, and brake released. No change means the brake light does not change across the 16 frames, covering both always braking and never braking; brake pressed means not braking at first and then braking; brake released means braking first and then not braking. Left turn and right turn are turn-signal actions, denoting the flashing of the left and right turn signals respectively.
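The turn-over-brake priority can be illustrated with a small helper (purely illustrative; in the actual method the CNN-LSTM itself outputs a single action per segment):

```python
ACTIONS = {0: "no change", 1: "brake pressed", 2: "brake released",
           3: "left turn", 4: "right turn"}

def window_action(evidence):
    """Collapse the set of action codes observed in one 16-frame window
    to a single label, giving turn signals priority over brake events."""
    for code in (3, 4):                  # turns take highest priority
        if code in evidence:
            return code
    for code in (1, 2):                  # then brake transitions
        if code in evidence:
            return code
    return 0                             # otherwise: no change

print(ACTIONS[window_action({1, 3})])    # prints "left turn"
```

So a window containing both a brake press and a left-turn flash is labeled as a left turn, matching the priority rule described above.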
It should be noted that the overall flow of the attention-based CNN-LSTM network is as follows: at each moment, the CNN processes the image and extracts features, the visual attention model integrates the features into an input vector, and the LSTM network processes the input vector and predicts the current action and the attention vector at the next moment. Details of the attention-based CNN-LSTM network are shown in fig. 4 and fig. 5.
At each time t, a video frame is input into a pre-trained CNN model to obtain a feature tensor X_t with dimensions K × K × D. The pre-trained CNN model shown in fig. 4 consists of the first 14 layers of the VGG16 model and uses the weights of the tail lamp semantic segmentation network (Segnet) encoder trained in S11 to extract relevant features.
Then, the soft attention vector l_t compresses X_t to reduce its dimension and focus on the tail lamp region, yielding the LSTM input vector x_t. The present invention adopts the soft attention model proposed by Bahdanau et al., which is a softmax over the K × K positions, as shown in fig. 4. The input vector x_t is thus defined as:
the processed information per frame is then input to the LSTM for establishing the dependency relationship between frames, as shown in fig. 5. The model of the LSTM network is:
wherein T_{a,b} is an affine transformation consisting of trainable parameters, with input dimension a = n + D and output dimension b = 4n, where n is the common dimension of the elements i_t, f_t, o_t, g_t, c_t and h_t, and D is the dimension of x_t. Separately, d denotes the number of tail lamp action types (d = 5 in the present model).
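As an illustration of the two operations just described, the following sketch (plain numpy, not the patent's trained network) pools the K × K × D feature tensor with the attention weights to form x_t, then runs one standard LSTM cell step; all names and shapes here are assumptions for the sketch.

```python
import numpy as np

def attention_pool(X_t, l_t):
    """Soft-attention pooling: x_t = sum_i l_t[i] * X_t[i].

    X_t: (K*K, D) feature slices; l_t: (K*K,) softmax attention weights.
    """
    return l_t @ X_t  # (D,)

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One standard LSTM step; W maps [h_prev; x_t] (n + D) to 4 gates (4n)."""
    n = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b            # (4n,)
    i, f, o = (1.0 / (1.0 + np.exp(-z[:3 * n]))).reshape(3, n)
    g = np.tanh(z[3 * n:])                               # candidate cell state
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t
```

With uniform attention weights, `attention_pool` simply averages the K × K feature slices, which makes the pooling easy to check.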
The hidden layer vector h_t is used to predict the attention vector l_{t+1} at the next moment and the output vector y_t. After passing through a fully connected layer, h_t is converted into a K × K-dimensional vector, and l_{t+1} is the softmax over the positions of this vector, defined as follows:
The output vector y_t is a d-dimensional vector, defined as the softmax over the action classes applied to the LSTM hidden layer vector h_t after tanh activation:
last frame output y 15 The category with the highest probability is used as the action characteristic of the tail lamp
S3: extracting the brightness feature {I_l, I_r, I_b, I_h, I_thr} corresponding to each piece of tracking segment data, and combining the tail lamp action feature f_{t,act} with the brightness feature to form the high-order feature, wherein I_l, I_r, I_b, I_h and I_thr respectively denote the average brightness of the left tail lamp, the average brightness of the right tail lamp, the average brightness of the background, the average brightness of the high-mount stop lamp, and the adaptive threshold for segmenting the high-mount stop lamp from the vehicle tail.
It should be noted that the action feature extracted in step S2 is insufficient to determine the continuous state of the tail lamp. For example, when no tail lamp action occurs, it is difficult to distinguish "braking" from "not braking". Therefore, step S3 extracts the high-mount stop lamp brightness feature from the semantic segmentation result and combines it with the tail lamp action feature f_{t,act} to form a higher-level feature for tail lamp state sequence analysis.
The inputs are the last frame image A of a segment, the vehicle bounding box {x_b, y_b, w, h}, and the tail lamp pixel sets (L, R) of that vehicle. The desired outputs are the brightness features I_l, I_r, I_b, I_h, I_thr, respectively denoting the average brightness of the left tail lamp, the average brightness of the right tail lamp, the average brightness of the background, the average brightness of the high-mount stop lamp, and the adaptive threshold for segmenting the high-mount stop lamp from the vehicle tail. The algorithm flow is shown in fig. 6.
First, it is necessary to judge whether the areas of the left tail lamp, the right tail lamp and the high-mount stop lamp on the image are all larger than a set value, and whether the distance between the left and right tail lamps on the image is larger than a set value. When both conditions hold, i.e., the two tail lamps are accurately segmented, the brightness features {I_l, I_r, I_b, I_h, I_thr} are extracted as follows:
the brightness characteristic corresponding to the last frame image of each tracking fragment data is taken as the brightness characteristic { I } corresponding to each tracking fragment data l ,I r ,I b ,I h ,I thr And determining the region of interest of the high-mount stop lamp according to the geometric center of the left tail lamp region and the geometric center of the right tail lamp region by using the left tail lamp region and the right tail lamp region of all vehicles in each frame image of the monocular image sequence segmented in S11, and then performing self-adaptive segmentation on the region of interest subjected to Gaussian filtering by using an OTSU algorithm to obtain the high-mount stop lampA lamp region H;
further, taking the pixel mean value of all pixel points in the left tail lamp area as the average brightness I of the left tail lamp l (ii) a Taking the pixel mean value of all pixel points in the right tail lamp area as the average brightness I of the right tail lamp r (ii) a Taking the pixel mean value of all pixel points in the background area except all tail lights as the average background brightness I b (ii) a Taking the pixel mean value of all pixel points in the high-mount stop lamp area H as the average brightness I of the high-mount stop lamp h (ii) a Taking the threshold value adopted when the OTSU algorithm cuts the region of interest as the self-adaptive threshold value I thr 。
That is, the present invention first calculates the average brightness of the tail lamp pixel sets (L, R) and the background B of the last frame image A of the segment: I_l, I_r, I_b, where the background B is the portion of the last frame image A outside the left and right tail lamps (L, R). The geometric centers of L and R are then calculated to determine the position of the high-mount stop lamp. Taking the left tail lamp as an example, the left tail lamp average brightness I_l and the left tail lamp geometric center x_{c,l} are calculated as:
wherein N_L is the number of pixels belonging to the left tail lamp.
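The two statistics just defined, the mean intensity over the lamp pixel set and its geometric center, can be sketched as follows (illustrative names; in the invention the pixel set would come from the Segnet segmentation):

```python
import numpy as np

def lamp_stats(gray, pixel_coords):
    """Average brightness and geometric center of one tail lamp.

    gray: (H, W) grayscale image; pixel_coords: list of (row, col)
    pixel coordinates belonging to the lamp.
    """
    coords = np.asarray(pixel_coords)
    mean_brightness = gray[coords[:, 0], coords[:, 1]].mean()  # I_l
    center = coords.mean(axis=0)                               # (y_c,l, x_c,l)
    return mean_brightness, center
```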
Then, the region of interest (ROI) of the high-mount stop lamp is determined. The ROI is described by a four-dimensional vector {x_{c,h}, y_{c,h}, w_h, h_h}. In particular, the lateral center x_{c,h} of the ROI should lie midway between L and R; however, considering the perspective effect caused by the random relative observation pose, a second-order bias term is introduced to mitigate this effect. The ROI is calculated as follows:
wherein x_{c,tmp} is an intermediate variable, (x_{c,l}, y_{c,l}) is the geometric center coordinate of the left tail lamp, (x_{c,r}, y_{c,r}) is the geometric center of the right tail lamp, y_b is the vertical coordinate of the top-left vertex of the vehicle bounding box, and w and h are the width and height of the vehicle bounding box, respectively.
Next, a Gaussian mask, a Gaussian function of position (x, y), is applied over the ROI:
where σ is a set parameter.
The R channel of the ROI is multiplied by the Gaussian mask to produce R_h, and R_h is adaptively segmented by the OTSU adaptive threshold algorithm to obtain the high-mount stop lamp region H. The adaptive threshold used for this segmentation is recorded as the threshold feature I_thr.
Finally, the average brightness I_h of the high-mount stop lamp region H is calculated, defined as follows:
wherein N_H is the number of pixels belonging to the high-mount stop lamp.
Thus, the brightness feature is obtained and combined with the action feature f_{t,act} to form the high-order feature for tail lamp state inference:
further, when the area of the left tail lamp, the right tail lamp or the high mount stop lamp on the image is not more than the set value, or the distance between the left tail lamp and the right tail lamp on the image is not more than the set value, as in practical application, the area of the tail lamp is too small (less than 20 pixels) or the distance between the left tail lamp and the right tail lamp is too close (less than 50 pixels), the tail lamp may have a blur phenomenon, and the tail lamp semantic segmentation result may have a deviation, then I l ,I r ,I b The calculation method of (1) is as above, the average brightness I of the high-mount stop lamp h And an adaptive threshold I thr The average brightness of the background is used instead: i is h =I thr =I b 。
Thus, the brightness feature of this degraded case is likewise combined with the action feature f_{t,act} to form the high-order feature for tail lamp state inference:
S4: based on a linear-chain conditional random field model and the high-order feature corresponding to each piece of tracking segment data, acquiring the tail lamp state at each moment, wherein the tail lamp state is one of no action, braking, left turn and right turn, and no action means no braking and no turning.
It should be noted that, based on the high-order features extracted in steps S2 and S3, a linear-chain conditional random field model (Linear-CRF) is constructed to establish long-term dependence between successive segments and thereby infer the continuous tail lamp state O_{I,t}. The feature extracted in step S2 gives the change of the tail lamp state over a period of time, and the feature extracted in step S3 gives partial information on the tail lamp state at a single point in time; this step infers the tail lamp state at each moment from the above information.
The variables in the brightness feature range from 0 to 255, representing different brightness values. The action feature f_{t,act} takes values in {0, 1, 2, 3, 4}, representing the five tail lamp actions within a segment: unchanged = 0, brake on = 1, brake off = 2, left turn = 3, right turn = 4.
It should be noted that, for each vehicle tracking sequence, the invention calculates from the high-order features the conditional probabilities of the 4 possible tail lamp states at the moment corresponding to the last frame of each segment, and then selects the state with the maximum probability as the output. Since the sliding window size is 16, the first 15 frames have no corresponding segment or high-order feature, so the tail lamp state is recognized from the 16th frame onward. The tail lamp states are divided into four classes: y_t ∈ {off, brake, left, right}, where at the current single-point moment off represents "no braking and no turning", brake represents "braking", left represents "left turn", and right represents "right turn".
The specific acquisition method of the tail lamp state at each moment is as follows:
For each single-point moment t, the tracking segment whose last frame is the image of the frame at moment t is extracted and the state probability acquisition operation is executed, yielding the conditional probabilities that the tail lamp state at moment t is no action, braking, left turn and right turn; the state with the maximum conditional probability is taken as the final tail lamp state at moment t. The state probability acquisition operation is:
The state probability P(Y_t = y_t | f) defined as follows is used to respectively calculate the conditional probabilities that the tail lamp state at the current single-point moment t is no action, braking, left turn and right turn:

wherein f is the high-order feature corresponding to the tracking segment currently selected for the state probability acquisition operation, y_t is the tail lamp state at the current single-point moment t with y_t ∈ {off, brake, left, right}, off denoting no action, brake denoting braking, left denoting left turn and right denoting right turn, Z(f) is a normalization factor related to f, α_t^T(y_t | f) is the forward vector corresponding to the current single-point moment t, β_t(y_t | f) is the backward vector corresponding to the current single-point moment t, and T denotes transposition;
The conditional probability P(Y_t = y_t | f) is a 4-dimensional vector; the conditional probability of each possible state at the single-point moment t is calculated by the forward-backward algorithm, and α_t(y_t | f) and β_t(y_t | f) are 4-dimensional vectors that can be computed through the recursion relation between adjacent frames.
Further, the recurrence formulas of the forward vector α_t^T(y_t | f) and the backward vector β_t(y_t | f) corresponding to the current single-point moment t are as follows:

α_t^T(y_t | f) = α_{t-1}^T(y_{t-1} | f) M_t(y_{t-1}, y_t | f)

β_t(y_t | f) = M_{t+1}(y_t, y_{t+1} | f) β_{t+1}(y_{t+1} | f)
wherein α_{t-1}^T(y_{t-1} | f) is the forward vector corresponding to the previous single-point moment, T denotes transposition, β_{t+1}(y_{t+1} | f) is the backward vector corresponding to the next single-point moment, t = 1, 2, …, n, where n denotes the end moment of the vehicle tracking sequence, and M_t(y_{t-1}, y_t | f) is the 4 × 4 state transition matrix from the previous state vector (covering the forward and backward vectors) to the current state vector. Each row and each column of the state transition matrix represents a tail lamp state, and each element represents the transition probability from a possible tail lamp state y_{t-1} at the previous single-point moment to a possible tail lamp state y_t at the current single-point moment. It should be noted that the possible tail lamp states here are not actual tail lamp states but assumed states; the purpose is to calculate the probability of occurrence of each possible case.
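Given per-step 4 × 4 transition matrices, the forward-backward recursion and the normalized conditional probabilities can be sketched generically as follows (an illustrative implementation of the recursion, not the patent's trained model; M[t] is assumed to map step t to step t + 1):

```python
import numpy as np

def crf_marginals(M, alpha0, beta_end):
    """Forward-backward marginals P(Y_t = y | f) for a linear-chain CRF.

    M: (n, 4, 4) transition matrices over {off, brake, left, right};
    alpha0, beta_end: (4,) initial forward / backward vectors.
    """
    n = M.shape[0]
    alphas = [alpha0]
    for t in range(n):
        alphas.append(alphas[-1] @ M[t])      # forward recursion
    betas = [beta_end]
    for t in range(n - 1, -1, -1):
        betas.append(M[t] @ betas[-1])        # backward recursion
    betas.reverse()
    marginals = []
    for t in range(n + 1):
        p = alphas[t] * betas[t]
        marginals.append(p / p.sum())         # normalisation plays the role of Z(f)
    return np.array(marginals)
```

With uniform transition matrices and uniform initial vectors, every state keeps probability 0.25 at every step, which makes the recursion easy to check.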
Further, the general formula of the transition probability corresponding to each element M_t(y_{t-1}, y_t | f) of the state transition matrix M_t is defined as follows:
wherein K_1 is the number of classes of transfer features, g_k is a set transfer feature, λ_k is the weight set for each transfer feature, K_2 is the number of classes of node features, s_l is a set node feature, and μ_l is the weight set for each node feature. In addition, y_{t-1} and y_t each take 4 values, corresponding to the four tail lamp states, so the state transition matrix has 4 rows and 4 columns in total. At each position (u, v) of the state transition matrix, the transition probability from state u to state v is calculated according to the above formula.
Furthermore, each transfer feature and node feature takes the value 0 or 1. For a transfer feature, the value is 1 only when the tail lamp state y_{t-1} corresponding to the previous single-point moment, the possible tail lamp state y_t at the current single-point moment, and the tail lamp action feature in the currently selected high-order feature f satisfy the set condition, and 0 otherwise; for a node feature, the value is 1 only when the possible tail lamp state y_t at the current single-point moment and the tail lamp action feature in the currently selected high-order feature f satisfy the set condition, and 0 otherwise;
The set condition is that the change from the tail lamp state y_{t-1} at the previous moment to the possible tail lamp state y_t at the current single-point moment, together with the tail lamp action feature in the currently selected high-order feature f, conforms to the operation logic of an actual driving vehicle, and that the brightness feature corresponding to the inferred tail lamp state conforms to the actual brightness feature.
That is, each transfer feature and node feature takes the value 0 or 1; it takes 1 only when a certain condition is satisfied and 0 in all other cases. These conditions are designed specifically from the relations between the action feature f_{t,act} extracted in step S2, the brightness feature extracted in step S3, and the tail lamp state, with the action feature as the main basis and the brightness feature as an auxiliary. Some features are designed from the logical relations between tail lamp action states in practice; for example, when the state at the previous moment is brake, the action feature is "brake off" and the state at the current moment is off, the feature value is 1 (i.e. g_1). Other features use the brightness feature as an auxiliary correction. For example, when the previous state is brake and the tail lamp action is "brake off", the current state should be off and the feature value should be 1; but if the brightness of both left and right tail lamps exceeds a certain threshold, the lamps are evidently on, so the probability that the current state is off is very small and brake is more likely, and that feature value is 0. Accordingly, when the previous state is brake, the current state is brake, the action feature is "brake off", and the left and right tail lamp brightnesses I_l, I_r are both greater than 110, the feature value is 1 (i.e. g_14).
The value and weight setting conditions of each transfer feature are as follows:
g_1 = g_1(y_{t-1} = brake, y_t = off, f_{t,act} = 2, t)    λ_1 = 1
g_2 = g_2(y_{t-1} = off, y_t = brake, f_{t,act} = 1, t)    λ_2 = 1
g_3 = g_3(y_{t-1} = left, y_t = left, f_{t,act} = 3, t)    λ_3 = 1.5
g_4 = g_4(y_{t-1} = right, y_t = right, f_{t,act} = 4, t)    λ_4 = 1.5
g_5 = g_5(y_{t-1} = left, y_t = off, f_{t,act} = 0, t)    λ_5 = 1.5
g_6 = g_6(y_{t-1} = right, y_t = off, f_{t,act} = 0, t)    λ_6 = 1.5
g_7 = g_7(y_{t-1} = off, y_t = left, f_{t,act} = 3, t)    λ_7 = 1.5
g_8 = g_8(y_{t-1} = off, y_t = right, f_{t,act} = 4, t)    λ_8 = 1.5
g_9 = g_9(y_{t-1} = brake, y_t = brake, f_{t,act} = 0, t)    λ_9 = 1
g_10 = g_10(y_{t-1} = off, y_t = off, f_{t,act} = 0, t)    λ_10 = 1
g_11 = g_11(y_{t-1} = left, y_t = left, f_{t,act} = 0, t)    λ_11 = 1
g_12 = g_12(y_{t-1} = right, y_t = right, f_{t,act} = 0, t)    λ_12 = 1
g_13 = g_13(y_{t-1} = off, y_t = brake, f_{t,act} = 0, I_l, I_r > 110, t)    λ_13 = 1
g_14 = g_14(y_{t-1} = brake, y_t = brake, f_{t,act} = 2, I_l, I_r > 110, t)    λ_14 = 1
g_15 = g_15(y_{t-1} = off, y_t = brake, f_{t,act} = 1, I_h > 120, t)    λ_15 = 1
wherein f_{t,act} ∈ {0, 1, 2, 3, 4} represents the five tail lamp action features within one piece of tracking segment data: unchanged = 0, brake on = 1, brake off = 2, left turn = 3, right turn = 4;
the value taking condition and the weight setting condition of each node feature are as follows:
s_1 = s_1(y_t = brake, f_{t,act} = 0, t)    μ_1 = 0.5
s_2 = s_2(y_t = off, f_{t,act} = 0, t)    μ_2 = 0.5
s_3 = s_3(y_t = left, f_{t,act} = 3, t)    μ_3 = 0.5
s_4 = s_4(y_t = right, f_{t,act} = 4, t)    μ_4 = 0.5
s_5 = s_5(y_t = brake, f_{t,act} = 1, t)    μ_5 = 0.5
s_6 = s_6(y_t = off, f_{t,act} = 2, t)    μ_6 = 0.5
s_7 = s_7(y_t = off, f_{t,act} = 1, I_thr < 75, t)    μ_7 = 0.5
s_8 = s_8(y_t = off, f_{t,act} = 0, I_thr < 75, t)    μ_8 = 0.5
s_9 = s_9(y_t = brake, f_{t,act} = 1, I_h > 120, t)    μ_9 = 0.5
s_10 = s_10(y_t = brake, f_{t,act} = 0, I_h > 120, t)    μ_10 = 0.5
For each expression of the transfer features and node features, the feature takes the value 1 only when the conditions in parentheses are satisfied. That is, to simplify the expressions, the formulas above give only the conditions under which the value is 1. Taking g_1 as an example, the specific form of the feature function at moment t is given as follows:
Meanwhile, the initial values of the forward vector α_t^T(y_t | f) and the backward vector β_t(y_t | f), namely α_0(y_0 | f) and β_{n+1}(y_{n+1} | f), are determined only by the node features s_k. Taking α_0(y_0 | f) as an example: α_0(y_0 | f) contains the probabilities of the 4 possible tail lamp states, i.e., y_0 has 4 possible values; for each possible value, the probability is calculated from the weighted sum of the node features, producing a 4-dimensional vector. The specific form is as follows:

wherein α_0(y_0 | f) is the initial value of the forward vector and β_{n+1}(y_{n+1} | f) is the initial value of the backward vector. Since the backward vector is computed recursively from the end of the sequence forward, the initial value of β_t(y_t | f) starts from t = n + 1.
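For illustration, the indicator-feature construction of a transition-matrix element can be sketched with just two of the features defined earlier, g_1 (λ_1 = 1) and s_6 (μ_6 = 0.5); the exp-of-weighted-sum form follows the general formula for M_t, and everything else (omitted features, helper names) is a simplification for the sketch.

```python
import numpy as np

STATES = ["off", "brake", "left", "right"]  # row/column order is assumed

def g1(y_prev, y_cur, f_act):
    """Transfer feature g_1: brake -> off under action 'brake off' (2)."""
    return 1.0 if (y_prev == "brake" and y_cur == "off" and f_act == 2) else 0.0

def s6(y_cur, f_act):
    """Node feature s_6: current state off under action 'brake off' (2)."""
    return 1.0 if (y_cur == "off" and f_act == 2) else 0.0

def transition_element(y_prev, y_cur, f_act):
    # exp of the weighted sum of the indicator features that fire.
    score = 1.0 * g1(y_prev, y_cur, f_act) + 0.5 * s6(y_cur, f_act)
    return np.exp(score)

def transition_matrix(f_act):
    """4 x 4 matrix M_t for a given action feature f_act."""
    return np.array([[transition_element(u, v, f_act) for v in STATES]
                     for u in STATES])
```

For f_act = 2 (brake off), the brake-to-off entry gets both features (exp(1.5)), other transitions into off get only s_6 (exp(0.5)), and all remaining entries stay at exp(0) = 1 under this reduced feature set.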
Thus, with the above method, the conditional probability of each state at each moment can be calculated, and the class y_t with the highest probability is selected as the output at the current moment, constituting the continuous tail lamp state sequence.
According to the method, data sets of different actual traffic scenes, including daytime, nighttime, congested roads and expressways, were collected on an automatic driving platform, and 6 typical scenes (3 daytime and 3 nighttime) were selected to test the continuous tail lamp state estimation method provided by the invention. The test results are shown in figs. 7(a) to 7(f); the method obtains feasible results in complex actual scenes.
In summary, the invention realizes multi-vehicle target detection and tracking based on YOLO and Deepsort, obtaining vehicle tracking sequences. The continuous tail lamp state sequence is estimated with the vehicle tracking sequence as input. The tracking image sequence of a given vehicle is divided into several continuous segments using a sliding window. For each segment, the attention-based CNN-LSTM network identifies the implicit tail lamp action feature among 5 classes: unchanged, brake on, brake off, left turn, right turn. Then, the high-mount stop lamp brightness feature of the last frame of each segment is extracted based on tail lamp semantic segmentation and combined with the tail lamp action feature to form high-order features for tail lamp state sequence analysis. Finally, a linear-chain conditional random field model (Linear-CRF) is constructed, and long-term dependence between continuous segments is established by analyzing the high-order features, thereby inferring the continuous tail lamp state: off, brake, left, right.
The present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof, and it will be understood by those skilled in the art that various changes and modifications may be made herein without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (9)
1. A vehicle tail lamp state identification method based on action-state combined learning is characterized by comprising the following steps:
S1: acquiring continuous tracking segment data based on a vehicle tracking sequence obtained in a real traffic scene;
S2: using a pre-trained CNN-LSTM network based on an attention model to respectively extract the tail lamp action feature f_{t,act} corresponding to each piece of tracking segment data, wherein the tail lamp action feature corresponding to each piece of tracking segment data is one of unchanged, brake on, brake off, left turn and right turn;
S3: extracting the brightness feature {I_l, I_r, I_b, I_h, I_thr} corresponding to each piece of tracking segment data, and combining the tail lamp action feature f_{t,act} with the brightness feature to form the high-order feature, wherein I_l, I_r, I_b, I_h and I_thr respectively represent the average brightness of the left tail lamp, the average brightness of the right tail lamp, the average brightness of the background, the average brightness of the high-mount stop lamp, and the adaptive threshold for segmenting the high-mount stop lamp from the vehicle tail;
S4: based on a linear-chain conditional random field model and the high-order feature corresponding to each piece of tracking segment data, acquiring the tail lamp state at each moment, wherein the tail lamp state comprises no action, braking, left turn and right turn, and no action means no braking and no turning.
2. The vehicle tail lamp state identification method based on action-state joint learning as claimed in claim 1, wherein the vehicle tracking sequence is obtained by:
and detecting the position of each vehicle in each frame of image by using a YOLO network, and then matching and updating the bounding box of the target vehicle through a Deepsort architecture to obtain a vehicle tracking sequence of each vehicle, thereby realizing multi-vehicle tracking.
3. The vehicle tail lamp state identification method based on action-state joint learning as claimed in claim 1, wherein the tracking fragment data is obtained by:
4. The vehicle tail lamp state identification method based on action-state joint learning according to claim 1, wherein the brightness feature corresponding to the last frame image of each piece of tracking segment data is taken as the brightness feature {I_l, I_r, I_b, I_h, I_thr} corresponding to that piece of tracking segment data, and when the areas of the left tail lamp, the right tail lamp and the high-mount stop lamp on the image are all larger than a set value, and the distance between the left and right tail lamps on the image is larger than the set value, the brightness feature {I_l, I_r, I_b, I_h, I_thr} is extracted by:
segmenting left tail lamp regions and right tail lamp regions of all vehicles in each frame image of the monocular image sequence by using a symmetric coding-decoding convolutional network Segnet based on a VGG16 model;
determining an interested area of the high-mount stop lamp according to the geometric center of the left tail lamp area and the geometric center of the right tail lamp area, and then performing self-adaptive segmentation on the interested area subjected to Gaussian filtering by using an OTSU algorithm to obtain a high-mount stop lamp area H;
taking the pixel mean of all pixels in the left tail lamp region as the left tail lamp average brightness I_l; taking the pixel mean of all pixels in the right tail lamp region as the right tail lamp average brightness I_r; taking the pixel mean of all pixels in the background region outside all tail lamps as the background average brightness I_b; taking the pixel mean of all pixels in the high-mount stop lamp region H as the high-mount stop lamp average brightness I_h; and taking the threshold adopted when the OTSU algorithm segments the region of interest as the adaptive threshold I_thr.
5. The vehicle tail lamp state identification method based on action-state joint learning according to claim 4, wherein a four-dimensional vector {x_{c,h}, y_{c,h}, w_h, h_h} is used to describe the region of interest of the high-mount stop lamp, wherein (x_{c,h}, y_{c,h}) is the center point coordinate of the region of interest, w_h and h_h are the width and height of the region of interest respectively, and each element of the four-dimensional vector is calculated as:
wherein x_{c,tmp} is an intermediate variable, (x_{c,l}, y_{c,l}) is the geometric center coordinate of the left tail lamp, (x_{c,r}, y_{c,r}) is the geometric center of the right tail lamp, y_b is the vertical coordinate of the top-left vertex of the vehicle bounding box, and w and h are the width and height of the vehicle bounding box, respectively.
6. The vehicle tail lamp state identification method based on action-state joint learning according to claim 4, wherein during pre-training, the CNN in the attention-model-based CNN-LSTM network is the first 14 layers of the VGG16 model and uses the weights adopted when segmenting the monocular image sequence with the symmetric encoding-decoding convolutional network Segnet.
7. The vehicle tail lamp state identification method based on action-state joint learning according to claim 1, wherein the brightness feature corresponding to the last frame image of each piece of tracking segment data is taken as the brightness feature {I_l, I_r, I_b, I_h, I_thr} corresponding to that piece of tracking segment data, and when the area of the left tail lamp, the right tail lamp or the high-mount stop lamp on the image is not larger than the set value, or the distance between the left and right tail lamps on the image is not larger than the set value, the brightness feature {I_l, I_r, I_b, I_h, I_thr} is extracted by:
segmenting left tail lamp regions and right tail lamp regions of all vehicles in each frame image of the monocular image sequence by using a symmetric coding-decoding convolutional network Segnet based on a VGG16 model;
taking the pixel mean of all pixels in the left tail lamp region as the left tail lamp average brightness I_l; taking the pixel mean of all pixels in the right tail lamp region as the right tail lamp average brightness I_r; taking the pixel mean of all pixels in the background region as the background average brightness I_b; and taking the background average brightness I_b as the high-mount stop lamp average brightness I_h and the adaptive threshold I_thr: I_h = I_thr = I_b.
8. The vehicle tail lamp state identification method based on action-state joint learning according to claim 1, wherein the specific steps of acquiring the tail lamp state at each moment by the linear-chain conditional random field model based on the high-order feature corresponding to each piece of tracking segment data are as follows:
for each single-point moment t, extracting the tracking segment whose last frame is the image of the frame at moment t and executing the state probability acquisition operation, obtaining the conditional probabilities that the tail lamp state at moment t is no action, braking, left turn and right turn, and taking the state with the maximum conditional probability as the final tail lamp state at moment t; wherein the state probability acquisition operation is:
the state probability P(Y_t = y_t | f) defined as follows is used to respectively calculate the conditional probabilities that the tail lamp state at the current single-point moment t is no action, braking, left turn and right turn:

wherein f is the high-order feature corresponding to the tracking segment currently selected for the state probability acquisition operation, y_t is the tail lamp state at the current single-point moment t with y_t ∈ {off, brake, left, right}, off denoting no action, brake denoting braking, left denoting left turn and right denoting right turn, Z(f) is a normalization factor related to f, α_t^T(y_t | f) is the forward vector corresponding to the current single-point moment t, β_t(y_t | f) is the backward vector corresponding to the current single-point moment t, and T denotes transposition;
wherein the recurrence formulas of the forward vector α_t^T(y_t | f) and the backward vector β_t(y_t | f) corresponding to the current single-point moment t are as follows:

α_t^T(y_t | f) = α_{t-1}^T(y_{t-1} | f) M_t(y_{t-1}, y_t | f)

β_t(y_t | f) = M_{t+1}(y_t, y_{t+1} | f) β_{t+1}(y_{t+1} | f)
wherein α_{t-1}(y_{t-1} | f) is the forward vector corresponding to the previous single-point time of the current single-point time t, T denotes transposition, β_{t+1}(y_{t+1} | f) is the backward vector corresponding to the next single-point time of the current single-point time t, and M_t(y_{t-1}, y_t | f) is a 4 × 4 state transition matrix; each row and each column of the state transition matrix represents one tail lamp state, and each element of the state transition matrix represents the transition probability of changing from a possible tail lamp state y_{t-1} at the previous single-point time to a possible tail lamp state y_t at the current single-point time t;
the transition probability corresponding to each element of the state transition matrix M_t(y_{t-1}, y_t | f) is calculated as follows:

M_t(y_{t-1}, y_t | f) = exp( Σ_{k=1}^{K_1} λ_k g_k(y_{t-1}, y_t, f, t) + Σ_{k=1}^{K_2} μ_k s_k(y_t, f, t) )

wherein K_1 is the number of classes of transfer features, g_k(y_{t-1}, y_t, f, t) are the set transfer features, λ_k is the weight set for each transfer feature, K_2 is the number of classes of node features, s_k(y_t, f, t) are the set node features, and μ_k is the weight set for each node feature;
wherein the value of each transfer feature and each node feature is 0 or 1; for each transfer feature, the value is 1 only when the tail lamp state y_{t-1} corresponding to the previous single-point time, the possible tail lamp state y_t at the current single-point time t, and the tail lamp action feature in the currently selected high-order feature f meet the set conditions, and is 0 otherwise; for each node feature, the value is 1 only when the possible tail lamp state y_t at the current single-point time t and the tail lamp action feature in the currently selected high-order feature f meet the set conditions, and is 0 otherwise;
the set conditions are: the change from the tail lamp state y_{t-1} corresponding to the previous time to the possible tail lamp state y_t at the current single-point time t conforms to the operation logic in the actual running process of the vehicle, and the brightness feature corresponding to the deduced tail lamp state, together with the tail lamp action feature in the currently selected high-order feature f, conforms to the actual brightness feature.
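A minimal sketch of how one entry of the 4 × 4 state transition matrix could be assembled from binary transfer and node features. The two example feature functions and their weights are illustrative stand-ins, not the claim's full feature set:

```python
import math

STATES = ["off", "brake", "left", "right"]

# Illustrative binary feature functions: each returns 1 when its parenthesized
# condition holds and 0 otherwise, mirroring the g_k and s_k of the claim.
def g_brake_on(y_prev, y, f_act):
    # Transfer feature: off -> brake while the action feature says "brake pressed" (1).
    return 1 if (y_prev == "off" and y == "brake" and f_act == 1) else 0

def s_brake(y, f_act):
    # Node feature: brake state together with the "pressed" action feature.
    return 1 if (y == "brake" and f_act == 1) else 0

def transition_entry(y_prev, y, f_act, lam=1.0, mu=0.5):
    # exp of the weighted feature sum, as in the transition probability formula.
    score = lam * g_brake_on(y_prev, y, f_act) + mu * s_brake(y, f_act)
    return math.exp(score)

# Build the full matrix for an observation whose action feature is "pressed".
M = [[transition_entry(yp, y, f_act=1) for y in STATES] for yp in STATES]
# off -> brake collects weight 1 + 0.5; brake -> brake collects only the node 0.5.
```

With only these two features, every entry whose conditions do not fire stays at exp(0) = 1, so the weighted features act as multiplicative boosts on the transitions they describe.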
9. The method of claim 8, wherein K_1 = 15 and K_2 = 10; wherein the value conditions and the weight settings of the transfer features are as follows:
g_1 = g_1(y_{t-1} = brake, y_t = off, f_{t,act} = 2, t)　λ_1 = 1
g_2 = g_2(y_{t-1} = off, y_t = brake, f_{t,act} = 1, t)　λ_2 = 1
g_3 = g_3(y_{t-1} = left, y_t = left, f_{t,act} = 3, t)　λ_3 = 1.5
g_4 = g_4(y_{t-1} = right, y_t = right, f_{t,act} = 4, t)　λ_4 = 1.5
g_5 = g_5(y_{t-1} = left, y_t = off, f_{t,act} = 0, t)　λ_5 = 1.5
g_6 = g_6(y_{t-1} = right, y_t = off, f_{t,act} = 0, t)　λ_6 = 1.5
g_7 = g_7(y_{t-1} = off, y_t = left, f_{t,act} = 3, t)　λ_7 = 1.5
g_8 = g_8(y_{t-1} = off, y_t = right, f_{t,act} = 4, t)　λ_8 = 1.5
g_9 = g_9(y_{t-1} = brake, y_t = brake, f_{t,act} = 0, t)　λ_9 = 1
g_10 = g_10(y_{t-1} = off, y_t = off, f_{t,act} = 0, t)　λ_10 = 1
g_11 = g_11(y_{t-1} = left, y_t = left, f_{t,act} = 0, t)　λ_11 = 1
g_12 = g_12(y_{t-1} = right, y_t = right, f_{t,act} = 0, t)　λ_12 = 1
g_13 = g_13(y_{t-1} = off, y_t = brake, f_{t,act} = 0, I_l, I_r > 110, t)　λ_13 = 1
g_14 = g_14(y_{t-1} = brake, y_t = brake, f_{t,act} = 2, I_l, I_r > 110, t)　λ_14 = 1
g_15 = g_15(y_{t-1} = off, y_t = brake, f_{t,act} = 1, I_h > 120, t)　λ_15 = 1
wherein f_{t,act} denotes the five tail lamp action features within one tracking segment data: constant (no change) = 0, pressed = 1, released = 2, left turn = 3, right turn = 4;
the value conditions and the weight settings of the node features are as follows:
s_1 = s_1(y_t = brake, f_{t,act} = 0, t)　μ_1 = 0.5
s_2 = s_2(y_t = off, f_{t,act} = 0, t)　μ_2 = 0.5
s_3 = s_3(y_t = left, f_{t,act} = 3, t)　μ_3 = 0.5
s_4 = s_4(y_t = right, f_{t,act} = 4, t)　μ_4 = 0.5
s_5 = s_5(y_t = brake, f_{t,act} = 1, t)　μ_5 = 0.5
s_6 = s_6(y_t = off, f_{t,act} = 2, t)　μ_6 = 0.5
s_7 = s_7(y_t = off, f_{t,act} = 1, I_thr < 75, t)　μ_7 = 0.5
s_8 = s_8(y_t = off, f_{t,act} = 0, I_thr < 75, t)　μ_8 = 0.5
s_9 = s_9(y_t = brake, f_{t,act} = 1, I_h > 120, t)　μ_9 = 0.5
s_10 = s_10(y_t = brake, f_{t,act} = 0, I_h > 120, t)　μ_10 = 0.5
for each expression of the transfer features and the node features, the value of the feature is 1 only when the conditions in parentheses are all met;
meanwhile, the initial value α_0(y_0 | f) of the forward vector α_t(y_t | f) and the initial value β_{n+1}(y_{n+1} | f) of the backward vector β_t(y_t | f) are defined as follows:

α_0(y_0 | f) = 1 if y_0 = start, and 0 otherwise

β_{n+1}(y_{n+1} | f) = 1 if y_{n+1} = stop, and 0 otherwise

where n represents the end time of the vehicle tracking sequence, and start and stop are the boundary states added at the two ends of the linear-chain conditional random field.
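The forward-backward recursion and the marginal P(Y_t = y_t | f) described above can be sketched as follows. The transition matrices here are arbitrary toy values over two states rather than the claim's four, and a uniform α_0 stands in for the boundary-state initialization; the real matrices come from the exponentiated feature sums of claim 8:

```python
def marginals(Ms):
    """Per-time-step state marginals of a linear-chain CRF.

    Ms[t] is the transition matrix from time t to t+1 over S states.
    Returns one probability vector per time step 0..len(Ms).
    """
    S = len(Ms[0])
    # Forward pass: alpha_t^T = alpha_{t-1}^T M_t (uniform start vector).
    alpha = [[1.0] * S]
    for M in Ms:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * M[i][j] for i in range(S)) for j in range(S)])
    # Backward pass: beta_t = M_{t+1} beta_{t+1}, with beta_{n+1} = all-ones.
    beta = [[1.0] * S]
    for M in reversed(Ms):
        nxt = beta[0]
        beta.insert(0, [sum(M[i][j] * nxt[j] for j in range(S)) for i in range(S)])
    out = []
    for a, b in zip(alpha, beta):
        z = sum(x * y for x, y in zip(a, b))   # Z(f); identical at every t
        out.append([x * y / z for x, y in zip(a, b)])
    return out

# Two toy 2-state transition matrices (the claim uses 4 states; 2 keeps this short).
Ms = [[[2.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 3.0]]]
probs = marginals(Ms)
```

Picking the argmax of each vector in `probs` corresponds to taking the state with the maximum conditional probability at each single-point time, as in the claim.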
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110519911.5A CN113111862B (en) | 2021-05-13 | 2021-05-13 | Vehicle tail lamp state identification method based on action-state joint learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113111862A CN113111862A (en) | 2021-07-13 |
CN113111862B true CN113111862B (en) | 2022-12-13 |
Family
ID=76722079
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110519911.5A Active CN113111862B (en) | 2021-05-13 | 2021-05-13 | Vehicle tail lamp state identification method based on action-state joint learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113111862B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115018076B (en) * | 2022-08-09 | 2022-11-08 | 聚时科技(深圳)有限公司 | AI chip reasoning quantification method for intelligent servo driver |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111881739A (en) * | 2020-06-19 | 2020-11-03 | 安徽清新互联信息科技有限公司 | Automobile tail lamp state identification method |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10733465B2 (en) * | 2017-09-20 | 2020-08-04 | Tusimple, Inc. | System and method for vehicle taillight state recognition |
US11361557B2 (en) * | 2019-01-18 | 2022-06-14 | Toyota Research Institute, Inc. | Attention-based recurrent convolutional network for vehicle taillight recognition |
Non-Patent Citations (2)
Title |
---|
Learning to tell brake and turn signals in videos using CNN-LSTM structure; Han-Kai Hsu et al.; IEEE Xplore; 2018-03-15; full text *
Research on video-based nighttime vehicle detection and tracking algorithms; Dong Tianyang et al.; Computer Science; 2017-11-15; full text *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||