CN112016499A - Traffic scene risk assessment method and system based on multi-branch convolutional neural network - Google Patents
- Publication number: CN112016499A
- Application number: CN202010921517.XA
- Authority
- CN
- China
- Prior art keywords
- convolutional neural
- neural network
- branch
- traffic scene
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Abstract
The scheme first extracts optical-flow information from the video frames, then feeds the frame images into a spatio-temporal multi-branch convolutional neural network as spatial information and the optical-flow images as temporal information for learning and training. In addition, the present disclosure adds a time-shift module and an attention module on top of the convolutional neural network: the time-shift module lets the spatio-temporal features of adjacent frames exchange information without increasing the number of network parameters, and the attention module learns the regions of the scene where notable changes occur; both improve the accuracy of the risk estimate. A sparse temporal sampling strategy is adopted during training and testing: the video is divided into multiple segments from which frames are extracted and fed into the network, which avoids information redundancy between adjacent frames while accelerating computation in practical applications.
Description
Technical Field
The present disclosure relates to the technical field of traffic scene risk assessment, and in particular, to a traffic scene risk assessment method and system based on a multi-branch convolutional neural network.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
With the continuous development of the global economy, the numbers of automobiles and drivers increase year by year. According to data published by the Ministry of Public Security, the number of automobiles in China has exceeded 200 million, and road traffic safety has become a prominent and widely discussed problem. The latest statistics of the World Health Organization show that about 1.25 million people die in road traffic accidents worldwide every year. China has the most road traffic accidents in the world, and its huge automobile fleet exposes it to a correspondingly large road safety threat.
In recent years, video capture and storage technology has developed rapidly. Dashcams, which serve as liability-determination aids after road traffic accidents, have entered the market with a year-on-year increase in installation rates, producing a large volume of vehicle-mounted video. In the field of automatic driving, a vehicle senses its surroundings through sensors such as ultrasonic radar, millimeter-wave radar, lidar, and cameras, which likewise generate large amounts of video data. If these videos could be used to assess the risk of the current road environment and warn the driver or the automatic driving system in time, driving safety would improve greatly, accidents would be reduced, and the lives and property of drivers would be better protected.
Most existing traffic scene risk assessment methods use video captured by roadside surveillance cameras. One class of schemes collects road traffic video sequences, extracts vehicle acceleration, direction changes, and geometric position information, and judges whether abnormal events occur; another class detects and tracks moving vehicles in the surveillance video and analyzes vehicle behavior from features such as target speed, heading, and trajectory. The inventors observed that although roadside surveillance cameras have fixed backgrounds and favorable angles, so that dangerous driving behaviors can be detected or predicted with traditional image-processing methods or a simple neural network, their detection results only support coarse, wide-area road risk assessment: they cannot effectively assess road risk at a local position, and the results cannot be delivered to a driver in real time. To address these problems, some researchers have proposed traffic scene risk estimation methods for videos shot by vehicle-mounted cameras, but because such videos suffer from background motion, jitter, and shooting-angle issues, these methods require a complex feature-extraction model design, fail to reach good evaluation accuracy, and take relatively long to compute.
Disclosure of Invention
In order to solve the above problems, the present disclosure provides a traffic scene risk assessment method and system based on a multi-branch convolutional neural network, and the scheme of the present disclosure performs traffic scene risk assessment by using the multi-branch convolutional neural network, and designs a time shift module, an attention module and a sparse time sampling mechanism, thereby effectively improving accuracy and efficiency of risk assessment.
According to a first aspect of the embodiments of the present disclosure, there is provided a traffic scene risk assessment method based on a multi-branch convolutional neural network, including:
the method comprises the steps of obtaining a traffic scene video, intercepting a video frame sequence according to a preset frequency, calculating an optical flow graph between adjacent frames, and dividing the video frame sequence and the optical flow graph data into a training set and a testing set respectively;
constructing a multi-branch convolutional neural network model, and embedding a time shifting module and an attention module as a space branch network and a time branch network respectively;
training the multi-branch convolutional neural network model by using the training set;
and inputting the test set into a trained multi-branch convolutional neural network model to generate an evaluation score of risk occurrence in a traffic scene.
Further, before the data set is constructed, the obtained video frame sequences and optical-flow data need to be classified by the likelihood of a traffic accident. The risk rating is divided into four categories: low risk, where few pedestrians and vehicles are present in the scene, vehicles drive normally, and an accident is unlikely; moderate risk, where pedestrian and vehicle flow in the scene is moderate, there is no dangerous driving behavior, and traffic is orderly; high risk, where there are too many pedestrians and vehicles in the scene or driving is disordered, and the accident probability is high; and accident level, where a visible traffic accident occurs in the scene.
Furthermore, the time-shift module increases information exchange and fusion between adjacent frames. It shifts features in the time dimension: the first quarter of the channels is shifted toward the next time step, the second quarter toward the previous time step, and the remaining half of the channels is left unchanged. This process can be expressed as:

$Y = \omega_1 c_1 T_{+1} + \omega_2 c_2 T_{-1} + \omega_3 c_3 T_{0}$

where the channel dimension $c$ is divided into three parts $(c_1, c_2, c_3)$, $(T_{+1}, T_{-1}, T_{0})$ denotes the shift operation in the time dimension, and $(\omega_1, \omega_2, \omega_3)$ are the weights.
Further, the attention module is inserted into the residual convolutional layers and comprises a channel attention module and a spatial attention module. The channel attention module compresses the input features through two pooling layers, concatenates the pooled features, feeds them into a one-dimensional convolutional layer for learning, and obtains the channel attention weights through an activation function. The calculation can be expressed as:

$M_c(F) = \sigma(\mathrm{Conv1d}([\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)]))$

where AvgPool and MaxPool denote the average-pooling and max-pooling operations, Conv1d denotes a one-dimensional convolution, and $\sigma$ denotes the activation function.
further, the spatial attention module performs feature compression on the input through two pooling layers, then splices the input together, sends the input into the convolutional layer for learning, obtains the channel attention weight through an activation function, and the calculation process can be expressed as:
Ms(F)=σ(Conv2d([AvgPool(F);MaxPool(F)]))
wherein AvgPool and MaxPool denote average pooling and maximum pooling operations; conv2d represents a two-dimensional convolution operation; σ denotes the activation function.
Further, training the multi-branch convolutional neural network model comprises segmenting the video frame data of the training set at equal intervals, randomly selecting one frame from each segment, stacking the selected frames, and inputting them into the spatial branch network; the optical-flow data of the training set are input into the temporal branch network to optimize the parameters of the network model.

Further, when risk assessment is performed with the trained multi-branch convolutional neural network model, the video frame data of the test set are segmented at equal intervals, the frame with the largest change is selected from each segment, and the stacked selected frames are input into the trained spatial branch network; the corresponding optical-flow maps are input into the trained temporal branch network, and the features of the two branches are fused to obtain an evaluation score for the risk in the traffic scene.
According to a second aspect of the embodiments of the present disclosure, there is provided a traffic scene risk assessment system based on a multi-branch convolutional neural network, including:
a data set construction module: the method comprises the steps of obtaining a traffic scene video, intercepting a video frame sequence according to a preset frequency, calculating an optical flow graph between adjacent frames, and dividing the video frame sequence and the optical flow graph data into a training set and a testing set respectively;
a model construction module: constructing a multi-branch convolutional neural network model, and embedding a time shifting module and an attention module as a space branch network and a time branch network respectively;
a model training module: training the multi-branch convolutional neural network model by using the training set;
a risk assessment module: and inputting the test set into a trained multi-branch convolutional neural network model to generate an evaluation score of risk occurrence in a traffic scene.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic device, including an image acquisition apparatus, a memory, a processor, and a computer program stored in the memory for execution, where the processor implements the above-mentioned traffic scene risk assessment method based on a multi-branch convolutional neural network when executing the program.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements a traffic scene risk assessment system based on a multi-branch convolutional neural network as described above.
Compared with the prior art, the beneficial effects of the present disclosure are:
(1) the scheme automatically learns spatio-temporal video features with the multi-branch convolutional neural network, avoiding the complex task of hand-crafting a feature extractor for the video;
(2) the scheme designs a time-shift module and an attention module: the time-shift module strengthens information exchange between video frames, and the attention module learns the importance of both the spatial-position and feature dimensions; both modules improve the accuracy of the risk assessment;
(3) the scheme adds a sparse temporal sampling mechanism to conventional video analysis: the complete video is divided into multiple segments from which frames are extracted and fed into the network for training and testing, which resolves the redundancy between adjacent frames while greatly improving detection efficiency;
(4) to address the large amount of redundant information in temporally adjacent video frames, the scheme segments the video at equal intervals, randomly selects a frame from each segment during training, and selects the frame with the largest change from each segment during testing; this effectively mitigates overfitting caused by information redundancy while greatly improving detection efficiency.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
Fig. 1 is a flow chart of a traffic scene risk assessment according to a first embodiment of the disclosure;
FIG. 2 is a diagram of a multi-branch convolutional neural network according to a first embodiment of the present disclosure;
fig. 3 is a schematic diagram of a time shifting module process according to a first embodiment of the disclosure;
FIG. 4 is a schematic view of an attention module according to a first embodiment of the disclosure;
FIG. 5 is a schematic view of a channel attention module according to a first embodiment of the disclosure;
fig. 6 is a schematic diagram of a spatial attention module according to a first embodiment of the disclosure.
Detailed Description
The present disclosure is further described with reference to the following drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
The first embodiment is as follows:
the embodiment aims to provide a traffic scene risk assessment method based on a multi-branch convolutional neural network.
A traffic scene risk assessment method based on a multi-branch convolutional neural network comprises the following steps:
the method comprises the steps of obtaining a traffic scene video, intercepting a video frame sequence according to a preset frequency, calculating an optical flow graph between adjacent frames, and dividing the video frame sequence and the optical flow graph data into a training set and a testing set respectively;
constructing a multi-branch convolutional neural network model, and embedding a time shifting module and an attention module as a space branch network and a time branch network respectively;
training the multi-branch convolutional neural network model by using the training set;
and inputting the test set into a trained multi-branch convolutional neural network model to generate an evaluation score of risk occurrence in a traffic scene.
Further, the method comprises five parts of training video data acquisition and processing, multi-branch convolutional neural network construction, space branch network training, time branch network training and online detection and evaluation;
the acquisition and processing of the training video data comprises the following steps:
step 1, shooting a road traffic scene video;
step 2, editing the video into video segments;
step 3, classifying the video segments according to risk levels;
step 4, decomposing the video segment into video frames;
and 5, calculating the optical flow between frames.
The construction of the multi-branch convolutional neural network comprises the following steps:
step 1, constructing a basic structure;
step 2, constructing a time shifting module;
and 3, constructing an attention module.
The training spatial branching network comprises the following steps:
step 1, randomly sampling and scaling video frame images at equal intervals, and inputting the video frame images into a spatial branch network;
step 2, calculating the output of the backbone network;
step 3, calculating the output of the full connection layer;
and 4, calculating a loss function, performing back propagation and training the spatial branch network parameters.
The training time-branching network comprises the following steps:
step 1, sampling and scaling optical flow images at equal intervals randomly, and inputting the optical flow images into a time branch network;
step 2, calculating the output of the backbone network;
step 3, calculating the output of the full connection layer;
and 4, calculating a loss function, performing back propagation and training time branch network parameters.
The online detection and evaluation comprises the following steps:
step 1, shooting a road traffic scene video;
step 2, calculating an inter-frame optical flow;
step 3, segmenting the video at equal intervals, selecting from each segment the image with the largest change, scaling it, and inputting it into the trained spatial branch network; here the "largest change" refers to the motion amount of a video frame computed from its optical flow (the sum of the optical-flow magnitudes in all directions; the optical-flow calculation is given in formula (1)), and the frame with the largest motion amount in each segment is taken as the frame with the largest change;
step 4, zooming the optical flow image corresponding to the selected video frame, and inputting the optical flow image into the trained time branch network;
and 5, fusing the results of the spatial branch and the time branch, and evaluating the traffic scene risk level.
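The fusion in step 5 can be sketched in Python. The equal weighting of the two branches and the class names are illustrative assumptions; the patent only states that the branch results are fused to give a risk level.

```python
# Illustrative late fusion of the two branch outputs (weights are assumptions,
# not values given by the patent).
RISK_LEVELS = ["low risk", "moderate risk", "high risk", "accident"]

def fuse_and_evaluate(spatial_scores, temporal_scores, w_spatial=0.5):
    """Weighted-average the per-class scores of both branches and pick a level."""
    fused = [w_spatial * s + (1.0 - w_spatial) * t
             for s, t in zip(spatial_scores, temporal_scores)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return RISK_LEVELS[best], fused
```

With, say, spatial scores [0.1, 0.2, 0.6, 0.1] and temporal scores [0.2, 0.1, 0.5, 0.2], the fused vector peaks at the third class, so "high risk" is reported.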
By integrating the traffic scene risk assessment structure provided by the method and the multi-branch convolutional neural network, the overall method comprises the following steps:
step 1: acquiring a training video and calculating an optical flow, specifically:
the method includes the steps of shooting a road traffic scene in front of a vehicle by using a vehicle data recorder or a vehicle-mounted camera.
And secondly, using video clipping software to clip the video into video segments in units of 60 frames.
Thirdly, according to the possibility of accidents in the traffic scene in the video, the risk rating is divided into four grades: low risk: the number of pedestrians and vehicles in the scene is small, the vehicles are driven normally, and the possibility of accidents is low; moderate risk: the pedestrian flow and the vehicle flow in the scene are moderate, no dangerous driving behaviors exist, and the running is more orderly; high risk: too many pedestrians and vehicles exist in the scene, or the vehicle runs disorderly, so that the probability of accidents is high; accident level: a visible traffic accident occurs in the scene. The video segments are classified according to the above criteria.
And fourthly, decomposing the video segments into video frames by using video processing software or opencv software library to construct an image training data set.
⑤ Calculate the inter-frame optical flow with the Lucas-Kanade method and construct the optical-flow training data set. Within a local window the flow is the least-squares solution

$$\begin{bmatrix} V_x \\ V_y \end{bmatrix} = \left( \sum_i \begin{bmatrix} I_{xi}^2 & I_{xi} I_{yi} \\ I_{xi} I_{yi} & I_{yi}^2 \end{bmatrix} \right)^{-1} \begin{bmatrix} -\sum_i I_{xi} I_{ti} \\ -\sum_i I_{yi} I_{ti} \end{bmatrix} \qquad (1)$$

where $V_x$ and $V_y$ are the optical flow in the x and y directions, and $I_{xi}$, $I_{yi}$, $I_{ti}$ are the derivatives of pixel point $i$ in the three directions $(x, y, t)$.
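The per-window Lucas-Kanade solve can be sketched in pure Python. This is a minimal illustration of the normal-equation solution for one window, not the full pyramidal implementation a production system would use.

```python
# Minimal sketch of the Lucas-Kanade least-squares solve for one window:
# given per-pixel derivatives Ix_i, Iy_i, It_i, solve the 2x2 normal
# equations (A^T A) v = -A^T It for the flow v = (Vx, Vy).

def lucas_kanade_window(ix, iy, it):
    sxx = sum(x * x for x in ix)
    sxy = sum(x * y for x, y in zip(ix, iy))
    syy = sum(y * y for y in iy)
    sxt = sum(x * t for x, t in zip(ix, it))
    syt = sum(y * t for y, t in zip(iy, it))
    det = sxx * syy - sxy * sxy          # determinant of A^T A
    if abs(det) < 1e-12:                 # aperture problem: singular system
        return 0.0, 0.0
    vx = (-syy * sxt + sxy * syt) / det  # rows of (A^T A)^{-1} applied to -A^T It
    vy = (sxy * sxt - sxx * syt) / det
    return vx, vy
```

If the derivatives exactly satisfy brightness constancy for a true flow (u, v), i.e. It_i = -(Ix_i u + Iy_i v), the solver recovers (u, v).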
Step 2: constructing a multi-branch convolutional neural network
The convolutional neural network comprises a convolutional layer, a downsampling layer, a residual convolutional layer with an attention module, a time shifting module and a full connection layer, and the structure is shown in FIG. 2.
② Time shift module: after features are extracted through a series of convolution and pooling operations, the time-shift module shifts them in the time dimension: the first quarter of the channels is shifted toward the next time step, the second quarter toward the previous time step, and the remaining half is left unchanged, as shown in fig. 3.

This process can be expressed as:

$Y = \omega_1 c_1 T_{+1} + \omega_2 c_2 T_{-1} + \omega_3 c_3 T_{0} \qquad (2)$

where the channel dimension $c$ is divided into three parts $(c_1, c_2, c_3)$, $(T_{+1}, T_{-1}, T_{0})$ denotes the shift operation in the time dimension, and $(\omega_1, \omega_2, \omega_3)$ are the weights.
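Under the quarter/quarter/half split described above, the shift can be sketched in pure Python. The (T, C) layout with one scalar feature per channel is an illustrative simplification of the real (T, C, H, W) tensors, and out-of-range time steps are zero-padded by assumption.

```python
# Sketch of the time-shift operation: first quarter of channels takes its
# value from the previous time step (shift toward the future), second quarter
# from the next time step (shift toward the past), remaining half unchanged.

def time_shift(x):
    t_len, c_len = len(x), len(x[0])
    q = c_len // 4
    out = [[0.0] * c_len for _ in range(t_len)]
    for t in range(t_len):
        for c in range(c_len):
            if c < q:                               # shifted toward next moment
                out[t][c] = x[t - 1][c] if t - 1 >= 0 else 0.0
            elif c < 2 * q:                         # shifted toward previous moment
                out[t][c] = x[t + 1][c] if t + 1 < t_len else 0.0
            else:                                   # remaining half kept unchanged
                out[t][c] = x[t][c]
    return out
```

No new parameters are introduced: the operation only moves existing features along the time axis, which is the property the disclosure highlights.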
③ Attention module: imitating human observation habits, the present disclosure adds an attention mechanism by inserting channel and spatial attention modules into the residual convolutional layers; the structure is shown in fig. 4.
The channel attention model compresses the input features through two pooling layers, concatenates the pooled vectors, sends them into a one-dimensional convolutional layer for learning, and obtains the channel attention weights through an activation function; the structure is shown in fig. 5.

The channel attention calculation can be expressed as:

$M_c(F) = \sigma(\mathrm{Conv1d}([\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)])) \qquad (3)$

where AvgPool and MaxPool denote the average-pooling and max-pooling operations, Conv1d denotes a one-dimensional convolution, and $\sigma$ denotes the activation function.
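Equation (3) can be sketched in pure Python. The kernel size (3) and the fixed convolution weights below are illustrative assumptions; in the network they are learned parameters.

```python
import math

# Sketch of channel attention: average- and max-pool each channel over its
# spatial positions, run an illustrative 2-input 1-D convolution (kernel 3,
# zero padding) along the channel axis, then apply a sigmoid.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def channel_attention(feat, w_avg=(0.2, 0.6, 0.2), w_max=(0.1, 0.8, 0.1), bias=0.0):
    # feat: nested lists [C][H][W]
    c_len = len(feat)
    avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]
    mx = [max(max(row) for row in ch) for ch in feat]
    weights = []
    for c in range(c_len):
        acc = bias
        for k in (-1, 0, 1):                 # kernel size 3 with zero padding
            if 0 <= c + k < c_len:
                acc += w_avg[k + 1] * avg[c + k] + w_max[k + 1] * mx[c + k]
        weights.append(sigmoid(acc))
    return weights                            # one weight in (0, 1) per channel
```

The output is one scalar weight per channel, which would multiply (re-weight) the corresponding feature channel.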
The spatial attention model compresses the input features through two pooling layers, concatenates the pooled maps, sends them into a convolutional layer for learning, and obtains the spatial attention weights through an activation function; the structure is shown in fig. 6.

The spatial attention calculation can be expressed as:

$M_s(F) = \sigma(\mathrm{Conv2d}([\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)])) \qquad (4)$

where AvgPool and MaxPool denote the average-pooling and max-pooling operations, Conv2d denotes a two-dimensional convolution, and $\sigma$ denotes the activation function.
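Equation (4) can be sketched analogously. The 1x1 convolution with equal weights used here is an illustrative assumption standing in for the learned 2-D convolution of the patent.

```python
import math

# Sketch of spatial attention: average- and max-pool over the channel axis at
# each pixel, combine the two pooled maps with an illustrative 1x1 convolution,
# then apply a sigmoid to get one attention weight per spatial position.

def spatial_attention(feat, w_avg=0.5, w_max=0.5, bias=0.0):
    # feat: nested lists [C][H][W]
    c_len = len(feat)
    h, w = len(feat[0]), len(feat[0][0])
    att = []
    for i in range(h):
        row = []
        for j in range(w):
            vals = [feat[c][i][j] for c in range(c_len)]
            avg = sum(vals) / c_len
            mx = max(vals)
            z = w_avg * avg + w_max * mx + bias   # 1x1 conv over the 2 pooled maps
            row.append(1.0 / (1.0 + math.exp(-z)))
        att.append(row)
    return att                                     # [H][W] map of weights in (0, 1)
```

The resulting map would be broadcast over the channels to emphasize the scene regions where notable changes occur.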
Step 3: Train the spatial branch convolutional neural network on the image training data set to obtain the network parameters $P_s$.
① Divide the video frames into segments of 5 frames each, randomly extract one frame from each segment, scale it to 224 × 224, and stack the selected frames together as input to the network.
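The sparse sampling of step ① can be sketched as follows; frame decoding and the 224 × 224 scaling are omitted, and only the index selection is shown.

```python
import random

# Sketch of training-time sparse temporal sampling: cut the frame sequence
# into segments of 5 frames and draw one random frame index per segment.

def sample_frame_indices(num_frames, seg_len=5, rng=None):
    rng = rng or random.Random(0)     # fixed seed here for reproducibility
    indices = []
    for start in range(0, num_frames - seg_len + 1, seg_len):
        indices.append(rng.randrange(start, start + seg_len))
    return indices
```

For a 60-frame clip this yields 12 indices, one per 5-frame segment, so the network sees the whole clip while adjacent-frame redundancy is avoided.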
② Randomly initialize the network parameters $P_s$ and compute the output of the backbone network before the fully connected layer according to the following formula:

$f(x_s) = \sigma(\alpha \otimes x_s + \beta) \qquad (5)$

where $\otimes$ denotes convolution over the three dimensions $(c, h, w)$ of channel, height, and width, and $(\alpha, \beta)$ denote the weights and biases of the convolutional layers.
③ Compute the output of the fully connected layer according to the following formula:

$p(x_s) = \sigma(\theta_{fc}^{T} f(x_s)) \qquad (6)$

where $\theta_{fc}$ are the network parameters of the fully connected layer.
④ Perform iterative training according to the difference between the label of the input data and the network output.

The loss function at the output of the spatial convolutional neural network is the cross-entropy

$L = -\sum_{c=1}^{M} y_{ic} \log(p_{ic}) \qquad (7)$

where $M$ is the number of risk categories; $y_{ic}$ is an indicator variable (0 or 1) that is 1 if class $c$ is the class of sample $i$ and 0 otherwise; and $p_{ic}$ is the predicted probability that observed sample $i$ belongs to class $c$. After the loss is computed, the network updates the parameters $P_s$ by back-propagation in the direction that decreases the loss.
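The cross-entropy loss for one sample can be sketched directly from the definitions above (one-hot label y over the M risk categories, predicted probabilities p):

```python
import math

# Sketch of the per-sample cross-entropy loss: only the term of the true
# class contributes, since y is one-hot.

def cross_entropy(y, p):
    return -sum(yc * math.log(pc) for yc, pc in zip(y, p) if yc > 0)
```

For example, with label [0, 1, 0, 0] and prediction [0.1, 0.7, 0.1, 0.1] the loss is -log(0.7).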
The number of iterations is set to 200, and in each iteration the parameters are updated in the following order. First the classification fully-connected-layer parameters $\theta_{fc}$ are updated:

$\theta_{fc} \leftarrow \theta_{fc} - \eta \frac{\partial L}{\partial \theta_{fc}} \qquad (8)$

then the backbone network parameters $\theta$ are updated:

$\theta \leftarrow \theta - \eta \frac{\partial L}{\partial \theta} \qquad (9)$

where the learning rate $\eta$ is set to 0.001 and halved after 100 iterations.
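The stated schedule (base rate 0.001, halved once after 100 of the 200 iterations) can be sketched as a step function; treating the halving as a single step at iteration 100 is an assumption about the exact boundary.

```python
# Sketch of the step learning-rate schedule described above.

def learning_rate(iteration, base_lr=0.001, step=100):
    """Return eta for a given 1-based iteration: halved after `step` iterations."""
    return base_lr if iteration <= step else base_lr / 2.0
```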
Step 4: Train the temporal branch convolutional neural network on the optical-flow training data set to obtain the network parameters $P_t$.
① Divide the optical-flow frames into segments of 5 frames each, randomly extract one frame from each segment, scale it to 224 × 224, and stack the selected frames together as input to the network.
② Randomly initialize the network parameters P_t, and compute the backbone network output f(x_t) before the fully connected layer according to formula (10).
wherein (c, h, w) denote the channel, height, and width dimensions respectively, and (α, β) denote the weights and biases of the convolutional layers.
③ Compute the output of the fully connected layer according to the following formula:

p(x_t) = σ(θ_fc^T f(x_t))  (11)

wherein θ_fc denotes the network parameters of the fully connected layer.
④ Perform iterative training according to the difference between the input data labels and the network output.
The output loss function of the time-branch convolutional neural network channel is the cross-entropy loss:

L_i = -∑_{c=1}^{M} y_ic·log(p_ic)  (12)

wherein M represents the number of risk categories; y_ic is an indicator variable (0 or 1), equal to 1 if class c matches the class of sample i and 0 otherwise; and p_ic represents the predicted probability that observed sample i belongs to class c. After computing the loss function, the network updates the parameters P_t by back propagation in the direction that decreases the loss function.
The number of iterations is set to 200, and the parameters are updated in each iteration in the following order:
1) First update the classification fully connected layer parameters θ_fc:

θ_fc ← θ_fc - η·∂L/∂θ_fc  (13)

2) Then update the backbone network parameters θ:

θ ← θ - η·∂L/∂θ  (14)

where the learning rate η is set to 0.001 and halved after 100 iterations.
Step 5: in practical application, use the trained multi-branch convolutional neural network to detect and evaluate the traffic scene risk in vehicle-mounted video online.
① Capture the vehicle-mounted video with a driving recorder or on-board camera, and buffer 60 frames of video data in a computer storage medium.
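The 60-frame cache can be kept in a fixed-size ring buffer; a minimal sketch (the integer "frames" here are placeholders for real image arrays):

```python
from collections import deque

# Rolling buffer for the dashcam stream: once 60 frames are held,
# append() silently evicts the oldest frame.
frame_buffer = deque(maxlen=60)

for frame in range(100):      # stand-in for a live capture loop
    frame_buffer.append(frame)
```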
② Compute the inter-frame optical flow using the Lucas-Kanade optical flow method.
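For orientation, the core of the Lucas-Kanade method named above is a least-squares solve on image gradients; this single-window sketch estimates one displacement, whereas a production pipeline would use a pyramidal, per-feature implementation (e.g., in OpenCV):

```python
import numpy as np

def lucas_kanade_window(prev, curr):
    """Classic Lucas-Kanade normal equations for one window: solve
    [[sum fx^2, sum fx*fy], [sum fx*fy, sum fy^2]] @ (u, v) =
    -(sum fx*ft, sum fy*ft) for the displacement (u, v)."""
    fx = np.gradient(prev, axis=1)   # spatial gradient in x
    fy = np.gradient(prev, axis=0)   # spatial gradient in y
    ft = curr - prev                 # temporal difference
    A = np.array([[np.sum(fx * fx), np.sum(fx * fy)],
                  [np.sum(fx * fy), np.sum(fy * fy)]])
    b = -np.array([np.sum(fx * ft), np.sum(fy * ft)])
    return np.linalg.solve(A, b)     # (u, v)
```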
③ Segment the video at intervals of 5 frames, select the frame with the largest change from each segment, scale it to 224 × 224, stack the selected frames, and input them into the trained spatial branch network; compute the spatial branch risk probability p(x_s) according to formulas (5) and (6).
④ Input the optical flow maps corresponding to the frames fed to the spatial branch into the trained time branch network, and compute the time branch risk probability p(x_t) according to formulas (10) and (11).
⑤ Add the risk probabilities of the spatial branch and the time branch in proportion, and take the risk grade corresponding to the maximum fused risk probability:

r_fusion = max(ω_s·p(x_s) + ω_t·p(x_t))  (15)
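The late fusion of Eq. (15) can be sketched as a weighted sum followed by an argmax; the weight values below are illustrative, since the patent leaves (ω_s, ω_t) unspecified:

```python
import numpy as np

def fuse_branch_scores(p_spatial, p_temporal, w_s=0.6, w_t=0.4):
    """Fuse the spatial and temporal branch probability vectors with
    weights (w_s, w_t) and return (risk grade, fused probability).
    The weights are placeholders, not values from the patent."""
    fused = w_s * np.asarray(p_spatial) + w_t * np.asarray(p_temporal)
    return int(np.argmax(fused)), float(fused.max())
```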
To demonstrate the feasibility of the scheme described in the present disclosure, the following verification experiments were performed:
the hardware conditions for a set of validation experiments of the present disclosure were: 64bits windows 10, CPU intel core i7, GPU GTX 1070Ti, RAM 24GB, experimental programming language python3.6, and deep learning framework pytorch.
The present disclosure performed experiments on the Dashcam Video dataset, which comprises 620 videos captured by driving recorders; four categories of clearly distinguishable traffic risk videos were cut from these videos for training and testing. Accuracy is adopted as the evaluation criterion: the higher the accuracy, the better the algorithm's performance.
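The accuracy criterion used above is simply the fraction of correctly classified clips; a minimal sketch:

```python
def accuracy(pred_labels, true_labels):
    """Fraction of predictions that match the ground-truth risk grades."""
    assert len(pred_labels) == len(true_labels) and len(true_labels) > 0
    correct = sum(p == t for p, t in zip(pred_labels, true_labels))
    return correct / len(true_labels)
```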
The experimental results were compared to conventional video understanding and analysis methods as follows:
the confusion matrix for the experimental results is as follows:
in summary, compared with the methods in the prior art, the method disclosed by the present disclosure has significantly improved evaluation accuracy.
Example two:
the embodiment aims to provide a traffic scene risk assessment system based on a multi-branch convolutional neural network.
A traffic scene risk assessment system based on a multi-branch convolutional neural network comprises:
a data set construction module: the method comprises the steps of obtaining a traffic scene video, intercepting a video frame sequence according to a preset frequency, calculating an optical flow graph between adjacent frames, and dividing the video frame sequence and the optical flow graph data into a training set and a testing set respectively;
a model construction module: constructing a multi-branch convolutional neural network model, and embedding a time shifting module and an attention module as a space branch network and a time branch network respectively;
a model training module: training the multi-branch convolutional neural network model by using the training set;
a risk assessment module: and inputting the test set into a trained multi-branch convolutional neural network model to generate an evaluation score of risk occurrence in a traffic scene.
Example three:
the embodiment aims at providing an electronic device.
An electronic device, comprising an image acquisition apparatus, a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the above traffic scene risk assessment method based on a multi-branch convolutional neural network, the method comprising:
the method comprises the steps of obtaining a traffic scene video, intercepting a video frame sequence according to a preset frequency, calculating an optical flow graph between adjacent frames, and dividing the video frame sequence and the optical flow graph data into a training set and a testing set respectively;
constructing a multi-branch convolutional neural network model, and embedding a time shifting module and an attention module as a space branch network and a time branch network respectively;
training the multi-branch convolutional neural network model by using the training set;
and inputting the test set into a trained multi-branch convolutional neural network model to generate an evaluation score of risk occurrence in a traffic scene.
Example four:
an object of the present embodiment is to provide a computer-readable storage medium.
A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the traffic scene risk assessment method based on a multi-branch convolutional neural network, the method comprising:
the method comprises the steps of obtaining a traffic scene video, intercepting a video frame sequence according to a preset frequency, calculating an optical flow graph between adjacent frames, and dividing the video frame sequence and the optical flow graph data into a training set and a testing set respectively;
constructing a multi-branch convolutional neural network model, and embedding a time shifting module and an attention module as a space branch network and a time branch network respectively;
training the multi-branch convolutional neural network model by using the training set;
and inputting the test set into a trained multi-branch convolutional neural network model to generate an evaluation score of risk occurrence in a traffic scene.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.
Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.
Claims (10)
1. The traffic scene risk assessment method based on the multi-branch convolutional neural network is characterized by comprising the following steps:
the method comprises the steps of obtaining a traffic scene video, intercepting a video frame sequence according to a preset frequency, calculating an optical flow graph between adjacent frames, and dividing the video frame sequence and the optical flow graph data into a training set and a testing set respectively;
constructing a multi-branch convolutional neural network model, and embedding a time shifting module and an attention module as a space branch network and a time branch network respectively;
training the multi-branch convolutional neural network model by using the training set;
and inputting the test set into a trained multi-branch convolutional neural network model to generate an evaluation score of risk occurrence in a traffic scene.
2. The traffic scene risk assessment method based on the multi-branch convolutional neural network as claimed in claim 1, wherein before the data set is constructed, the obtained video frame sequences and optical flow map data are classified according to the probability of a traffic accident, the risk rating being divided into four categories: low risk, in which there are few pedestrians and vehicles in the scene, the vehicles travel normally, and the possibility of an accident is low; moderate risk, in which the pedestrian and traffic flows in the scene are moderate, there is no dangerous driving behavior, and driving is orderly; high risk, in which there are too many pedestrians and vehicles in the scene, or driving is disordered, and the probability of an accident is high; and accident level, in which a visible traffic accident occurs in the scene.
3. The multi-branch convolutional neural network-based traffic scene risk assessment method of claim 1, wherein the time shifting module is used to increase information exchange and fusion between adjacent frames; the time shifting module performs displacement along the time dimension: the first quarter of the channels is shifted toward the next moment, the following quarter is shifted toward the previous moment, and the remaining half of the channels is kept unchanged; this process can be expressed as:

Y = ω_1·c_1·T_{+1} + ω_2·c_2·T_{-1} + ω_3·c_3·T_0

wherein the channels c are divided into three parts (c_1, c_2, c_3); (T_{+1}, T_{-1}, T_0) represent the displacement operations along the time dimension; and (ω_1, ω_2, ω_3) represent the weights.
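Outside the claim language, the quarter-channel shift can be sketched on a (T, C, H, W) tensor; zero-padding at the sequence ends is an assumption, since the claim does not specify the boundary handling:

```python
import numpy as np

def temporal_shift(x):
    """x: (T, C, H, W). Shift the first C//4 channels one step toward the
    next moment, the following C//4 one step toward the previous moment,
    and keep the remaining half of the channels in place, zero-padding
    at the ends of the sequence."""
    T, C, H, W = x.shape
    q = C // 4
    out = np.zeros_like(x)
    out[1:, :q] = x[:-1, :q]          # shift toward the next moment (T+1)
    out[:-1, q:2 * q] = x[1:, q:2 * q]  # shift toward the previous moment (T-1)
    out[:, 2 * q:] = x[:, 2 * q:]     # remaining half unchanged (T0)
    return out
```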
4. The traffic scene risk assessment method based on the multi-branch convolutional neural network of claim 1, wherein the attention module is inserted into the residual convolutional layer and comprises a channel attention module and a spatial attention module; the channel attention module compresses the input features through two pooling layers, concatenates the results, and feeds them into a one-dimensional convolutional layer for learning, and the channel attention weights are obtained through an activation function; the calculation process can be expressed as:
Mc(F)=σ(Conv1d([AvgPool(F);MaxPool(F)]))
wherein AvgPool and MaxPool denote average pooling and maximum pooling operations; conv1d represents a one-dimensional convolution operation; σ denotes the activation function.
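A numpy sketch of the channel attention computation above; summing the two convolved descriptors stands in for the concatenation in the claim (an illustrative simplification), and the 1-D kernel is a free parameter that would be learned in practice:

```python
import numpy as np

def channel_attention(F, kernel):
    """F: (C, H, W). Average- and max-pool over the spatial dims to get two
    C-length descriptors, run a shared 1-D convolution across the channel
    axis on each, sum, and apply a sigmoid to obtain per-channel weights."""
    avg = F.mean(axis=(1, 2))                       # AvgPool -> (C,)
    mx = F.max(axis=(1, 2))                         # MaxPool -> (C,)
    k, pad = len(kernel), len(kernel) // 2
    def conv1d(v):
        vp = np.pad(v, pad)                         # same-length output
        return np.array([np.dot(vp[i:i + k], kernel) for i in range(len(v))])
    return 1.0 / (1.0 + np.exp(-(conv1d(avg) + conv1d(mx))))
```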
5. The traffic scene risk assessment method based on the multi-branch convolutional neural network as claimed in claim 4, wherein the spatial attention module compresses the input features through two pooling layers, concatenates the results, and feeds them into a convolutional layer for learning, and the spatial attention weights are obtained through an activation function; the calculation process can be expressed as:
Ms(F)=σ(Conv2d([AvgPool(F);MaxPool(F)]))
wherein AvgPool and MaxPool denote average pooling and maximum pooling operations; conv2d represents a two-dimensional convolution operation; σ denotes the activation function.
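A matching numpy sketch of the spatial attention computation; the (2, k, k) kernel is a free parameter that would be learned in practice:

```python
import numpy as np

def spatial_attention(F, kernel):
    """F: (C, H, W). Average- and max-pool across the channel axis into two
    (H, W) maps, stack them, convolve the stack with one (2, k, k) kernel
    (same-size output via zero padding), and apply a sigmoid to obtain a
    per-location weight map."""
    maps = np.stack([F.mean(axis=0), F.max(axis=0)])  # (2, H, W)
    _, k, _ = kernel.shape
    pad = k // 2
    mp = np.pad(maps, ((0, 0), (pad, pad), (pad, pad)))
    H, W = maps.shape[1:]
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(mp[:, i:i + k, j:j + k] * kernel)
    return 1.0 / (1.0 + np.exp(-out))
```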
6. The traffic scene risk assessment method based on the multi-branch convolutional neural network as claimed in claim 1, wherein the training of the multi-branch convolutional neural network model comprises: segmenting the video frame data of the training set at equal intervals, randomly selecting one frame from each segment, stacking the selected frames, and inputting them into the spatial branch network; and inputting the optical flow map data of the training set into the time branch network to optimize the parameters of the network model.
7. The traffic scene risk assessment method based on the multi-branch convolutional neural network as claimed in claim 1, wherein when the trained multi-branch convolutional neural network model is used for risk assessment, video frame data in a test set are segmented at equal intervals, and one frame with the largest change is selected from each segment and stacked, and input into the trained spatial branch network; and inputting the corresponding optical flow diagram into the trained time branch network, and fusing the characteristics of the two branches to obtain possible evaluation scores of the risks in the traffic scene.
8. Traffic scene risk assessment system based on multi-branch convolutional neural network, characterized by comprising:
a data set construction module: the method comprises the steps of obtaining a traffic scene video, intercepting a video frame sequence according to a preset frequency, calculating an optical flow graph between adjacent frames, and dividing the video frame sequence and the optical flow graph data into a training set and a testing set respectively;
a model construction module: constructing a multi-branch convolutional neural network model, and embedding a time shifting module and an attention module as a space branch network and a time branch network respectively;
a model training module: training the multi-branch convolutional neural network model by using the training set;
a risk assessment module: and inputting the test set into a trained multi-branch convolutional neural network model to generate an evaluation score of risk occurrence in a traffic scene.
9. An electronic device comprising an image acquisition apparatus, a memory, a processor and a computer program stored in the memory for execution, wherein the processor when executing the program implements the method for traffic scene risk assessment based on a multi-branch convolutional neural network as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the method for traffic scenario risk assessment based on a multi-branch convolutional neural network as set forth in any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010921517.XA CN112016499A (en) | 2020-09-04 | 2020-09-04 | Traffic scene risk assessment method and system based on multi-branch convolutional neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112016499A true CN112016499A (en) | 2020-12-01 |
Family
ID=73516926
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010921517.XA Pending CN112016499A (en) | 2020-09-04 | 2020-09-04 | Traffic scene risk assessment method and system based on multi-branch convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112016499A (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110084228A (en) * | 2019-06-25 | 2019-08-02 | 江苏德劭信息科技有限公司 | A kind of hazardous act automatic identifying method based on double-current convolutional neural networks |
Non-Patent Citations (5)
Title |
---|
CORCORAN, GARY-PATRICK, et al.: "Traffic Risk Assessment: A Two-Stream Approach Using Dynamic-Attention", 2019 16th Conference on Computer and Robot Vision (CRV) * |
JI LIN, et al.: "TSM: Temporal Shift Module for Efficient Video Understanding", arXiv:1811.08383v3 * |
SANGHYUN WOO, et al.: "CBAM: Convolutional Block Attention Module" * |
ZHANG Zaiteng, et al.: "A Visual Odometry Algorithm Based on Deep Learning", Laser & Optoelectronics Progress * |
WU Yuwei: "Fundamentals and Applications of Deep Learning", 30 April 2020 * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112464546A (en) * | 2020-12-14 | 2021-03-09 | 上海交通大学设计研究总院有限公司 | Public space pedestrian flow motion risk discrimination method based on dynamic data analysis |
CN112464546B (en) * | 2020-12-14 | 2024-03-19 | 上海交通大学设计研究总院有限公司 | Public space pedestrian flow movement risk judging method based on dynamic data analysis |
CN112508625A (en) * | 2020-12-18 | 2021-03-16 | 国网河南省电力公司经济技术研究院 | Intelligent inspection modeling method based on multi-branch residual attention network |
CN112849156A (en) * | 2021-04-25 | 2021-05-28 | 北京三快在线科技有限公司 | Driving risk identification method and device |
CN112849156B (en) * | 2021-04-25 | 2021-07-30 | 北京三快在线科技有限公司 | Driving risk identification method and device |
CN113283338A (en) * | 2021-05-25 | 2021-08-20 | 湖南大学 | Method, device and equipment for identifying driving behavior of driver and readable storage medium |
CN115953239A (en) * | 2023-03-15 | 2023-04-11 | 无锡锡商银行股份有限公司 | Surface examination video scene evaluation method based on multi-frequency flow network model |
CN116629465A (en) * | 2023-07-26 | 2023-08-22 | 成都源轮讯恒科技有限公司 | Smart power grids video monitoring and risk prediction response system |
CN116629465B (en) * | 2023-07-26 | 2024-01-12 | 李波 | Smart power grids video monitoring and risk prediction response system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112016499A (en) | Traffic scene risk assessment method and system based on multi-branch convolutional neural network | |
WO2022083784A1 (en) | Road detection method based on internet of vehicles | |
CN110084151B (en) | Video abnormal behavior discrimination method based on non-local network deep learning | |
CN110188807B (en) | Tunnel pedestrian target detection method based on cascading super-resolution network and improved Faster R-CNN | |
CN112101221B (en) | Method for real-time detection and identification of traffic signal lamp | |
CN110766098A (en) | Traffic scene small target detection method based on improved YOLOv3 | |
WO2023207742A1 (en) | Method and system for detecting anomalous traffic behavior | |
CN115223130B (en) | Multi-task panoramic driving perception method and system based on improved YOLOv5 | |
CN111428558A (en) | Vehicle detection method based on improved YOLOv3 method | |
CN112990065A (en) | Optimized YOLOv5 model-based vehicle classification detection method | |
CN112084928A (en) | Road traffic accident detection method based on visual attention mechanism and ConvLSTM network | |
CN113011322A (en) | Detection model training method and detection method for specific abnormal behaviors of monitoring video | |
CN113221716A (en) | Unsupervised traffic abnormal behavior detection method based on foreground object detection | |
CN115761599A (en) | Video anomaly detection method and system | |
CN112785610B (en) | Lane line semantic segmentation method integrating low-level features | |
CN112597996B (en) | Method for detecting traffic sign significance in natural scene based on task driving | |
CN113487889A (en) | Traffic state anti-disturbance generation method based on single intersection signal control of rapid gradient descent | |
CN111612803B (en) | Vehicle image semantic segmentation method based on image definition | |
Hao et al. | Aggressive lane-change analysis closing to intersection based on UAV video and deep learning | |
CN112288702A (en) | Road image detection method based on Internet of vehicles | |
CN116468994A (en) | Village and town shrinkage simulation method, system and device based on street view data | |
CN115661786A (en) | Small rail obstacle target detection method for area pre-search | |
Ye et al. | M2f2-net: Multi-modal feature fusion for unstructured off-road freespace detection | |
CN114255450A (en) | Near-field vehicle jamming behavior prediction method based on forward panoramic image | |
CN113673332A (en) | Object recognition method, device and computer-readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | |
Application publication date: 20201201 |