CN112016499A - Traffic scene risk assessment method and system based on multi-branch convolutional neural network


Info

Publication number: CN112016499A
Application number: CN202010921517.XA (application filed by Shandong University)
Authority: CN (China)
Legal status: Pending
Prior art keywords: convolutional neural network, branch, traffic scene, module
Other languages: Chinese (zh)
Inventors: 常发亮 (Chang Faliang), 李子健 (Li Zijian), 刘春生 (Liu Chunsheng), 李爽 (Li Shuang), 路彦沙 (Lu Yansha)
Original and current assignee: Shandong University

Classifications

    • G06V 20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (G: Physics; G06: Computing; G06V: Image or video recognition or understanding; G06V 20/40: Scene-specific elements in video content)
    • G06N 3/045: Combinations of networks (G06N: Computing arrangements based on specific computational models; G06N 3/00: Computing arrangements based on biological models; G06N 3/04: Neural network architecture)
    • G06N 3/08: Learning methods for neural networks
    • G06Q 10/0635: Risk analysis of enterprise or organisation activities (G06Q: ICT specially adapted for administrative, commercial, financial, managerial or supervisory purposes; G06Q 10/06: Resources, workflows, human or project management)
    • G06Q 50/26: Government or public services (G06Q 50/00: ICT specially adapted for specific business sectors; G06Q 50/10: Services)


Abstract

The scheme first extracts optical flow information from the video frames, then feeds the video frame images as spatial information and the optical flow images as temporal information into a spatio-temporal multi-branch convolutional neural network for learning and training. In addition, the present disclosure adds a time shift module and an attention module to the convolutional neural network: the time shift module enables information exchange of spatio-temporal features between adjacent frames without increasing the number of network parameters, and the attention module learns the regions of the scene where salient changes occur; both improve the accuracy of risk estimation. A sparse temporal sampling strategy is adopted during training and testing: the video is divided into multiple segments, and frames extracted from each segment are input into the network, which avoids information redundancy between adjacent frames and speeds up computation in practical applications.

Description

Traffic scene risk assessment method and system based on multi-branch convolutional neural network
Technical Field
The present disclosure relates to the technical field of traffic scene risk assessment, and in particular, to a traffic scene risk assessment method and system based on a multi-branch convolutional neural network.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
With the continuous development of the global economy, the numbers of automobiles and drivers increase year by year. According to data published by the Ministry of Public Security, the number of automobiles in China has exceeded 200 million, and road traffic safety has become an increasingly prominent problem of wide public concern. The latest statistics of the World Health Organization show that about 1.25 million people die in road traffic accidents worldwide every year; China suffers the most road traffic accidents of any country, and its huge automobile fleet exposes it to an enormous road traffic safety threat.
In recent years, video capture and storage technology has developed rapidly. The vehicle event data recorder, which serves as an auxiliary tool for determining responsibility after a road traffic accident, has entered the market and its installation rate grows year by year, producing a large number of videos shot on vehicles. In the field of automatic driving, a vehicle senses its surroundings through sensors such as ultrasonic radar, millimeter-wave radar, lidar and cameras, which also generates a large amount of video data. If these videos could be used to assess the risk of the current road environment and promptly alert the driver or the automatic driving system, driving safety would be greatly improved, accidents reduced, and the lives and property of drivers protected.
Most existing traffic scene risk assessment methods use videos shot by road surveillance cameras. One class of schemes collects road traffic video sequences, extracts vehicle acceleration, direction changes and geometric position information, and judges whether abnormal events occur; another class detects and tracks moving vehicles in the surveillance video and analyzes vehicle behavior from features such as speed, driving direction and trajectory. The inventors found that although road surveillance cameras offer a fixed video background and a suitable angle, so that dangerous driving behavior can be detected or predicted with traditional image processing methods or a simple neural network, their detections only provide coarse, large-area road risk assessment: they cannot effectively assess road risk at a local position, and the results cannot be delivered to the driver in real time. To address these problems, some researchers have proposed traffic scene risk estimation methods for videos shot by vehicle-mounted cameras; however, because of background motion, video jitter and varying shooting angles in such videos, these methods require a complex, hand-designed feature extraction model, do not achieve good evaluation accuracy, and are computationally slow.
Disclosure of Invention
In order to solve the above problems, the present disclosure provides a traffic scene risk assessment method and system based on a multi-branch convolutional neural network. The scheme performs traffic scene risk assessment with a multi-branch convolutional neural network and designs a time shift module, an attention module and a sparse temporal sampling mechanism, thereby effectively improving the accuracy and efficiency of risk assessment.
According to a first aspect of the embodiments of the present disclosure, there is provided a traffic scene risk assessment method based on a multi-branch convolutional neural network, including:
the method comprises the steps of obtaining a traffic scene video, intercepting a video frame sequence according to a preset frequency, calculating an optical flow graph between adjacent frames, and dividing the video frame sequence and the optical flow graph data into a training set and a testing set respectively;
constructing a multi-branch convolutional neural network model with a spatial branch network and a temporal branch network, and embedding a time shift module and an attention module;
training the multi-branch convolutional neural network model by using the training set;
and inputting the test set into a trained multi-branch convolutional neural network model to generate an evaluation score of risk occurrence in a traffic scene.
Further, before the data set is constructed, the obtained video frame sequences and optical flow data need to be classified according to the likelihood of a traffic accident, with four risk levels: low risk (few pedestrians and vehicles in the scene, normal driving, accidents unlikely); moderate risk (moderate pedestrian and vehicle flow in the scene, no dangerous driving behavior, orderly driving); high risk (too many pedestrians and vehicles in the scene, or disordered driving, with a high probability of an accident); accident level (a visible traffic accident occurs in the scene).
Furthermore, the time shift module is used to increase information exchange and fusion between adjacent frames. It performs displacement in the time dimension: the first quarter of the channels is shifted towards the next time step, the second quarter towards the previous time step, and the remaining half is kept unchanged. This process can be expressed as:
Y = ω_1 c_1 T_{+1} + ω_2 c_2 T_{-1} + ω_3 c_3 T_0
where the channels c are divided into three parts (c_1, c_2, c_3), (T_{+1}, T_{-1}, T_0) denote shift operations in the time dimension, and (ω_1, ω_2, ω_3) are the weights.
Further, the attention module is inserted into the residual convolutional layers and comprises a channel attention module and a spatial attention module. The channel attention module compresses the input features through two pooling layers, concatenates the results, and feeds them into a one-dimensional convolutional layer for learning; the channel attention weight is obtained through an activation function. The calculation can be expressed as:
M_c(F) = σ(Conv1d([AvgPool(F); MaxPool(F)]))
where AvgPool and MaxPool denote average pooling and maximum pooling operations; Conv1d denotes a one-dimensional convolution operation; and σ denotes the activation function.
Further, the spatial attention module compresses the input features through two pooling layers, concatenates the results, and feeds them into a convolutional layer for learning; the spatial attention weight is obtained through an activation function. The calculation can be expressed as:
M_s(F) = σ(Conv2d([AvgPool(F); MaxPool(F)]))
where AvgPool and MaxPool denote average pooling and maximum pooling operations; Conv2d denotes a two-dimensional convolution operation; and σ denotes the activation function.
Further, training the multi-branch convolutional neural network model comprises segmenting the video frame data of the training set at equal intervals, randomly selecting one frame from each segment, stacking the selected frames and inputting them into the spatial branch network; and inputting the optical flow data of the training set into the temporal branch network to optimize the parameters of the network model.
Further, when risk assessment is performed with the trained multi-branch convolutional neural network model, the video frame data of the test set are segmented at equal intervals, the frame with the largest change in each segment is selected and stacked, and the stacked frames are input into the trained spatial branch network; the corresponding optical flow maps are input into the trained temporal branch network, and the features of the two branches are fused to obtain an evaluation score of the risk in the traffic scene.
According to a second aspect of the embodiments of the present disclosure, there is provided a traffic scene risk assessment system based on a multi-branch convolutional neural network, including:
a data set construction module: the method comprises the steps of obtaining a traffic scene video, intercepting a video frame sequence according to a preset frequency, calculating an optical flow graph between adjacent frames, and dividing the video frame sequence and the optical flow graph data into a training set and a testing set respectively;
a model construction module: constructing a multi-branch convolutional neural network model with a spatial branch network and a temporal branch network, and embedding a time shift module and an attention module;
a model training module: training the multi-branch convolutional neural network model by using the training set;
a risk assessment module: and inputting the test set into a trained multi-branch convolutional neural network model to generate an evaluation score of risk occurrence in a traffic scene.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic device, comprising an image acquisition apparatus, a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above traffic scene risk assessment method based on a multi-branch convolutional neural network when executing the program.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the traffic scene risk assessment method based on a multi-branch convolutional neural network as described above.
Compared with the prior art, the beneficial effects of the present disclosure are:
(1) the scheme automatically learns spatio-temporal video features by means of the multi-branch convolutional neural network, avoiding the complex task of designing a feature extractor for manual feature extraction from video;
(2) the scheme designs a time shift module and an attention module: the time shift module strengthens information exchange between video frames, and the attention module learns the importance of both spatial positions and feature channels; both modules improve the accuracy of risk assessment;
(3) the scheme designs a sparse temporal sampling mechanism on top of traditional video analysis methods: the complete video is divided into multiple segments from which frames are extracted and input into the network for training and testing, which resolves the information redundancy of adjacent frames while greatly improving detection efficiency;
(4) to address the large amount of redundant information in temporally adjacent video frames, the video is segmented at equal intervals; a frame is randomly selected from each segment during training, and the frame with the largest change is selected from each segment during testing. This effectively mitigates overfitting caused by information redundancy while greatly improving detection efficiency.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
Fig. 1 is a flow chart of a traffic scene risk assessment according to a first embodiment of the disclosure;
FIG. 2 is a diagram of a multi-branch convolutional neural network according to a first embodiment of the present disclosure;
fig. 3 is a schematic diagram of a time shifting module process according to a first embodiment of the disclosure;
FIG. 4 is a schematic view of an attention module according to a first embodiment of the disclosure;
FIG. 5 is a schematic view of a channel attention module according to a first embodiment of the disclosure;
fig. 6 is a schematic diagram of a spatial attention module according to a first embodiment of the disclosure.
Detailed Description
The present disclosure is further described with reference to the following drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well unless the context clearly indicates otherwise; it should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
Example one:
the embodiment aims to provide a traffic scene risk assessment method based on a multi-branch convolutional neural network.
A traffic scene risk assessment method based on a multi-branch convolutional neural network comprises the following steps:
the method comprises the steps of obtaining a traffic scene video, intercepting a video frame sequence according to a preset frequency, calculating an optical flow graph between adjacent frames, and dividing the video frame sequence and the optical flow graph data into a training set and a testing set respectively;
constructing a multi-branch convolutional neural network model with a spatial branch network and a temporal branch network, and embedding a time shift module and an attention module;
training the multi-branch convolutional neural network model by using the training set;
and inputting the test set into a trained multi-branch convolutional neural network model to generate an evaluation score of risk occurrence in a traffic scene.
Further, the method comprises five parts: acquisition and processing of training video data, construction of the multi-branch convolutional neural network, training of the spatial branch network, training of the temporal branch network, and online detection and evaluation.
the acquisition and processing of the training video data comprises the following steps:
step 1, shooting a road traffic scene video;
step 2, editing the video into video segments;
step 3, classifying the video segments according to risk levels;
step 4, decomposing the video segment into video frames;
step 5, calculating the optical flow between frames.
The construction of the multi-branch convolutional neural network comprises the following steps:
step 1, constructing a basic structure;
step 2, constructing a time shifting module;
step 3, constructing an attention module.
Training the spatial branch network comprises the following steps:
step 1, randomly sampling and scaling video frame images at equal intervals, and inputting the video frame images into a spatial branch network;
step 2, calculating the output of the backbone network;
step 3, calculating the output of the full connection layer;
step 4, calculating the loss function, performing back propagation, and training the spatial branch network parameters.
Training the temporal branch network comprises the following steps:
step 1, randomly sampling and scaling optical flow images at equal intervals, and inputting them into the temporal branch network;
step 2, calculating the output of the backbone network;
step 3, calculating the output of the full connection layer;
step 4, calculating the loss function, performing back propagation, and training the temporal branch network parameters.
The online detection and evaluation comprises the following steps:
step 1, shooting a road traffic scene video;
step 2, calculating an inter-frame optical flow;
step 3, segmenting the video at equal intervals, selecting and scaling the image with the largest change in each segment, and inputting it into the trained spatial branch network; the largest change is determined from the motion amount of each video frame computed from the optical flow (the sum of the optical flow in all directions; the optical flow calculation is given in equation (1)), and the frame with the largest motion amount in each segment is taken as the frame with the largest change (a sketch of this selection rule follows this list);
step 4, scaling the optical flow images corresponding to the selected video frames and inputting them into the trained temporal branch network;
step 5, fusing the results of the spatial branch and the temporal branch, and evaluating the traffic scene risk level.
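The frame-selection rule referenced in step 3 can be sketched as follows: each frame of a segment is scored by its motion amount (the summed magnitude of its optical flow), and the highest-scoring frame is kept. This is a minimal illustration; the (H, W, 2) flow-map layout and the function name are assumptions, not specified by the disclosure.

```python
import numpy as np

def max_motion_frame(segment_flows):
    """segment_flows: list of (H, W, 2) optical-flow maps, one per frame of
    a segment. Returns the index of the frame with the largest motion amount."""
    motion = [np.abs(flow).sum() for flow in segment_flows]  # sum of flow in all directions
    return int(np.argmax(motion))
```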
Integrating the traffic scene risk assessment structure and the multi-branch convolutional neural network described above, the overall method comprises the following steps:
step 1: acquiring a training video and calculating an optical flow, specifically:
the method includes the steps of shooting a road traffic scene in front of a vehicle by using a vehicle data recorder or a vehicle-mounted camera.
Secondly, video editing software is used to cut the video into segments of 60 frames each.
Thirdly, according to the likelihood of an accident in the traffic scene shown in the video, the risk rating is divided into four grades: low risk (few pedestrians and vehicles in the scene, normal driving, accidents unlikely); moderate risk (moderate pedestrian and vehicle flow in the scene, no dangerous driving behavior, orderly driving); high risk (too many pedestrians and vehicles in the scene, or disordered driving, with a high probability of an accident); accident level (a visible traffic accident occurs in the scene). The video segments are classified according to these criteria, as sketched below.
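As an illustration, the four-level rating might be encoded as follows when the classification data set is assembled (a minimal sketch; the numeric codes and the name RISK_LABELS are illustrative, not specified by the disclosure):

```python
# Hypothetical label encoding for the four risk grades described above.
RISK_LABELS = {
    0: "low risk",       # few pedestrians/vehicles, normal driving, accident unlikely
    1: "moderate risk",  # moderate pedestrian and vehicle flow, orderly driving
    2: "high risk",      # crowded scene or disordered driving, accident likely
    3: "accident",       # a visible traffic accident occurs in the scene
}
```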
Fourthly, the video segments are decomposed into video frames with video processing software or the OpenCV library to construct the image training data set; a minimal example follows.
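The following sketch shows this decomposition step with OpenCV (the function name, file naming scheme and JPEG output are illustrative assumptions):

```python
import cv2

def extract_frames(video_path, out_dir):
    """Decode a 60-frame clip and save each frame as a numbered JPEG."""
    cap = cv2.VideoCapture(video_path)
    count = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of the clip
            break
        cv2.imwrite(f"{out_dir}/frame_{count:04d}.jpg", frame)
        count += 1
    cap.release()
    return count
```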
Fifthly, the inter-frame optical flow is calculated with the Lucas-Kanade optical flow method to construct the optical flow training data set:

$$\begin{bmatrix} V_x \\ V_y \end{bmatrix} = \begin{bmatrix} \sum_i I_{x_i}^2 & \sum_i I_{x_i} I_{y_i} \\ \sum_i I_{x_i} I_{y_i} & \sum_i I_{y_i}^2 \end{bmatrix}^{-1} \begin{bmatrix} -\sum_i I_{x_i} I_{t_i} \\ -\sum_i I_{y_i} I_{t_i} \end{bmatrix} \qquad (1)$$

where V_x and V_y are the optical flow in the x and y directions, and I_{x_i}, I_{y_i} and I_{t_i} are the differences of pixel i in the three directions (x, y, t).
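The following sketch solves equation (1) directly at a single pixel with NumPy, using the spatial and temporal differences defined above. The window size and gradient scheme are illustrative choices, and a practical system would use an optimized dense-flow implementation instead:

```python
import numpy as np

def lk_flow_at(prev, curr, y, x, r=2):
    """Estimate (Vx, Vy) at pixel (y, x) from two grayscale float images
    by the Lucas-Kanade least-squares solution of equation (1)."""
    win = (slice(y - r, y + r + 1), slice(x - r, x + r + 1))
    Ix = np.gradient(prev, axis=1)[win].ravel()   # difference in x
    Iy = np.gradient(prev, axis=0)[win].ravel()   # difference in y
    It = (curr - prev)[win].ravel()               # difference in t
    A = np.stack([Ix, Iy], axis=1)                # one row (Ixi, Iyi) per pixel i
    v, *_ = np.linalg.lstsq(A, -It, rcond=None)   # (A^T A)^-1 A^T (-It)
    return v                                      # [Vx, Vy]
```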
Step 2: constructing a multi-branch convolutional neural network
① Basic structure: the convolutional neural network comprises convolutional layers, downsampling layers, residual convolutional layers with attention modules, time shift modules and a fully connected layer; the structure is shown in FIG. 2.
② Time shift module: after features are extracted through a series of operations such as convolution and pooling, the time shift module shifts them along the time dimension: the first quarter of the channels is shifted to the next time step, the second quarter to the previous time step, and the remaining half is kept unchanged, as shown in FIG. 3.
This process can be expressed as:
Y = ω_1 c_1 T_{+1} + ω_2 c_2 T_{-1} + ω_3 c_3 T_0  (2)
where the channels c are divided into three parts (c_1, c_2, c_3), (T_{+1}, T_{-1}, T_0) denote shift operations in the time dimension, and (ω_1, ω_2, ω_3) are the weights.
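A minimal PyTorch sketch of this shift operation follows (PyTorch is the framework used in the experiments below). The quarter/quarter/half split follows the text; the learnable weights (ω_1, ω_2, ω_3) of equation (2) are assumed to be absorbed by the surrounding convolutions, and the (batch, time, channel, H, W) tensor layout is an assumption:

```python
import torch

def temporal_shift(x):
    """x: (batch, time, channels, H, W) -> same shape, with the first
    quarter of channels shifted to the next time step (T+1), the second
    quarter to the previous step (T-1), and the rest left in place (T0)."""
    n, t, c, h, w = x.size()
    fold = c // 4
    out = torch.zeros_like(x)
    out[:, 1:, :fold] = x[:, :-1, :fold]                   # shift to next time step
    out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]   # shift to previous time step
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # remaining half unchanged
    return out
```

Zero-padding at the sequence borders, as written here, is one common convention for shift modules and introduces no extra parameters, which matches the stated benefit of the module.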
③ Attention module: simulating human observation habits, the present disclosure adds an attention mechanism and inserts channel and spatial attention modules into the residual convolutional layers; the structure is shown in FIG. 4.
The channel attention module compresses the input features through two pooling layers, concatenates the results and feeds them into a one-dimensional convolutional layer for learning; the channel attention weight is obtained through an activation function. The structure is shown in FIG. 5.
The channel attention calculation process may be expressed as:
M_c(F) = σ(Conv1d([AvgPool(F); MaxPool(F)]))  (3)
wherein AvgPool and MaxPool denote average pooling and maximum pooling operations; conv1d represents a one-dimensional convolution operation; σ denotes the activation function.
The spatial attention module compresses the input features through two pooling layers, concatenates the results and feeds them into a convolutional layer for learning; the spatial attention weight is obtained through an activation function. The structure is shown in FIG. 6.
The spatial attention calculation process may be expressed as:
M_s(F) = σ(Conv2d([AvgPool(F); MaxPool(F)]))  (4)
wherein AvgPool and MaxPool denote average pooling and maximum pooling operations; conv2d represents a two-dimensional convolution operation; σ denotes the activation function.
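Equations (3) and (4) can be realized as two small PyTorch modules along the following lines. This is a sketch under assumptions: the kernel sizes, the use of sigmoid as the activation σ, and applying the learned weights multiplicatively to F are illustrative choices consistent with CBAM-style designs, not values stated in the text:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):                 # equation (3)
    def __init__(self, kernel_size=3):
        super().__init__()
        # one 1-D convolution over the concatenated [avg; max] descriptors
        self.conv = nn.Conv1d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, f):                          # f: (N, C, H, W)
        avg = f.mean(dim=(2, 3))                   # AvgPool -> (N, C)
        mx = f.amax(dim=(2, 3))                    # MaxPool -> (N, C)
        desc = torch.stack([avg, mx], dim=1)       # concatenate -> (N, 2, C)
        w = torch.sigmoid(self.conv(desc))         # Mc(F) -> (N, 1, C)
        return f * w.transpose(1, 2).unsqueeze(-1) # reweight the channels

class SpatialAttention(nn.Module):                 # equation (4)
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, f):
        avg = f.mean(dim=1, keepdim=True)          # AvgPool over channels
        mx = f.amax(dim=1, keepdim=True)           # MaxPool over channels
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # Ms(F)
        return f * w                               # reweight spatial positions
```

Inside a residual convolutional layer, the two modules would be applied in sequence to the block's features before the skip connection is added, matching the placement described above.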
Step 3: train the spatial branch convolutional neural network on the image training data set to obtain the network parameters P_s.
① Divide the video frames into segments of 5 frames each, randomly extract one frame from each segment, scale it to 224 × 224, and stack the selected frames together as input to the network.
② Randomly initialize the network parameters P_s, and calculate the output of the backbone network before the fully connected layer according to the following formula:
[Formula (5), the backbone network output, is rendered as an image in the original document and is not reproduced here.]
where (c, h, w) denote the channel, height and width dimensions, and (α, β) denote the weights and biases of the convolutional layers.
③ Calculate the output of the fully connected layer according to the following formula:
p(x_s) = σ(θ_fc^T f(x_s))  (6)
where θ_fc denotes the network parameters of the fully connected layer.
④ Perform iterative training according to the difference between the input data labels and the network output.
The output loss function of the spatial convolutional neural network is the cross-entropy:
L = −Σ_{c=1}^{M} y_ic log(p_ic)  (7)
where M denotes the number of risk categories; y_ic is an indicator variable (0 or 1) that equals 1 if class c is the true class of sample i and 0 otherwise; and p_ic is the predicted probability that observed sample i belongs to class c. After the loss is computed, the network updates the parameters P in the direction that decreases the loss by back propagation.
The number of iterations is set to 200, and in each iteration the parameters are updated in the following order:
1. First update the parameters θ_fc of the classification fully connected layer:
θ_fc ← θ_fc − η ∂L/∂θ_fc  (8)
2. Then update the parameters θ of the backbone network:
θ ← θ − η ∂L/∂θ  (9)
where the learning rate η is set to 0.001 and is halved after 100 iterations.
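Steps ①-④ can be condensed into the following PyTorch training sketch, with the cross-entropy loss of equation (7) and the gradient updates of equations (8)-(9) under the stated schedule (200 iterations, learning rate 0.001 halved after 100). The model, the data loader and the use of plain SGD are placeholders; the sparse 5-frame-segment sampling of step ① is assumed to happen inside the loader:

```python
import torch
import torch.nn as nn

def train_spatial_branch(model, loader, iterations=200):
    """model: the spatial branch network; loader yields (frames, labels)
    batches where frames are stacked 224x224 crops, one per 5-frame segment."""
    criterion = nn.CrossEntropyLoss()                  # equation (7)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
    # halve the learning rate after 100 iterations, as stated above
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.5)
    for _ in range(iterations):
        for frames, labels in loader:
            logits = model(frames)                     # equations (5)-(6)
            loss = criterion(logits, labels)
            optimizer.zero_grad()
            loss.backward()                            # back propagation
            optimizer.step()                           # updates of equations (8)-(9)
        scheduler.step()
    return model
```

The temporal branch of step 4 below would be trained identically, with stacked optical-flow maps as input.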
Step 4: train the temporal branch convolutional neural network on the optical flow training data set to obtain the network parameters P_t.
① Divide the optical flow frames into segments of 5 frames each, randomly extract one frame from each segment, scale it to 224 × 224, and stack the selected frames together as input to the network.
② Randomly initialize the network parameters P_t, and calculate the output of the backbone network before the fully connected layer according to the following formula:
[Formula (10), the backbone network output, is rendered as an image in the original document and is not reproduced here.]
where (c, h, w) denote the channel, height and width dimensions, and (α, β) denote the weights and biases of the convolutional layers.
③ Calculate the output of the fully connected layer according to the following formula:
p(x_t) = σ(θ_fc^T f(x_t))  (11)
where θ_fc denotes the network parameters of the fully connected layer.
④ Perform iterative training according to the difference between the input data labels and the network output.
The output loss function of the temporal convolutional neural network is the cross-entropy:
L = −Σ_{c=1}^{M} y_ic log(p_ic)  (12)
where M denotes the number of risk categories; y_ic is an indicator variable (0 or 1) that equals 1 if class c is the true class of sample i and 0 otherwise; and p_ic is the predicted probability that observed sample i belongs to class c. After the loss is computed, the network updates the parameters P in the direction that decreases the loss by back propagation.
The number of iterations is set to 200, and in each iteration the parameters are updated in the following order:
1. First update the parameters θ_fc of the classification fully connected layer:
θ_fc ← θ_fc − η ∂L/∂θ_fc  (13)
2. Then update the parameters θ of the backbone network:
θ ← θ − η ∂L/∂θ  (14)
where the learning rate η is set to 0.001 and is halved after 100 iterations.
Step 5: in practical application, the trained multi-branch convolutional neural network is used for online detection and evaluation of the traffic scene risk in vehicle-mounted video.
① A vehicle event data recorder or a vehicle-mounted camera shoots the vehicle-mounted video, and 60 frames of video data are cached in a computer storage medium.
② The inter-frame optical flow is calculated with the Lucas-Kanade optical flow method.
③ The video is segmented at intervals of 5 frames; the frame with the largest change in each segment is selected, scaled to 224 × 224, stacked with the others and input into the trained spatial branch network, and the spatial branch risk probability p(x_s) is calculated according to equations (5) and (6).
④ The optical flow maps corresponding to the frames input to the spatial branch are input into the trained temporal branch network, and the temporal branch risk probability p(x_t) is calculated according to equations (10) and (11).
⑤ The risk probabilities of the spatial branch and the temporal branch are added with proportional weights, and the risk level corresponding to the maximum fused probability is obtained:
r_fusion = max(ω_s·p(x_s) + ω_t·p(x_t))  (15)
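A sketch of the fusion of equation (15); the branch weights ω_s and ω_t are not specified in the text, so equal weights are assumed here, and the risk-level names mirror the four grades defined in step 1:

```python
import torch

RISK_LEVELS = ("low risk", "moderate risk", "high risk", "accident")

def fuse_branches(p_spatial, p_temporal, w_s=0.5, w_t=0.5):
    """p_spatial, p_temporal: length-4 risk-probability vectors from the
    spatial and temporal branches. Returns the fused risk level."""
    r = w_s * p_spatial + w_t * p_temporal   # weighted sum, equation (15)
    return RISK_LEVELS[int(torch.argmax(r))]
```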
To demonstrate the feasibility of the scheme described in the present disclosure, verification experiments were performed.
The hardware and software environment of the verification experiments was: 64-bit Windows 10, an Intel Core i7 CPU, a GTX 1070Ti GPU and 24 GB RAM; the programming language was Python 3.6 and the deep learning framework was PyTorch.
The experiments were performed on the Dashcam Video dataset, which comprises 620 videos shot by vehicle event data recorders; four clearly distinguishable categories of traffic risk videos were cut from them for training and testing. Accuracy is adopted as the evaluation criterion: the higher the accuracy, the better the performance of the algorithm.
The experimental results were compared to conventional video understanding and analysis methods as follows:
[Table comparing the accuracy of the proposed method with conventional video understanding and analysis methods; rendered as an image in the original document and not reproduced here.]
the confusion matrix for the experimental results is as follows:
[Confusion matrix of the experimental results; rendered as an image in the original document and not reproduced here.]
In summary, compared with the methods of the prior art, the method of the present disclosure achieves significantly improved evaluation accuracy.
Example two:
the embodiment aims to provide a traffic scene risk assessment system based on a multi-branch convolutional neural network.
A traffic scene risk assessment system based on a multi-branch convolutional neural network comprises:
a data set construction module: the method comprises the steps of obtaining a traffic scene video, intercepting a video frame sequence according to a preset frequency, calculating an optical flow graph between adjacent frames, and dividing the video frame sequence and the optical flow graph data into a training set and a testing set respectively;
a model construction module: constructing a multi-branch convolutional neural network model with a spatial branch network and a temporal branch network, and embedding a time shift module and an attention module;
a model training module: training the multi-branch convolutional neural network model by using the training set;
a risk assessment module: and inputting the test set into a trained multi-branch convolutional neural network model to generate an evaluation score of risk occurrence in a traffic scene.
Example three:
the embodiment aims at providing an electronic device.
An electronic device comprising an image acquisition apparatus, a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the above traffic scene risk assessment method based on a multi-branch convolutional neural network, the method comprising:
the method comprises the steps of obtaining a traffic scene video, intercepting a video frame sequence according to a preset frequency, calculating an optical flow graph between adjacent frames, and dividing the video frame sequence and the optical flow graph data into a training set and a testing set respectively;
constructing a multi-branch convolutional neural network model with a spatial branch network and a temporal branch network, and embedding a time shift module and an attention module;
training the multi-branch convolutional neural network model by using the training set;
and inputting the test set into a trained multi-branch convolutional neural network model to generate an evaluation score of risk occurrence in a traffic scene.
Example four:
an object of the present embodiment is to provide a computer-readable storage medium.
A computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the method for evaluating the risk of a traffic scene based on a multi-branch convolutional neural network includes:
the method comprises the steps of obtaining a traffic scene video, intercepting a video frame sequence according to a preset frequency, calculating an optical flow graph between adjacent frames, and dividing the video frame sequence and the optical flow graph data into a training set and a testing set respectively;
constructing a multi-branch convolutional neural network model with a spatial branch network and a temporal branch network, and embedding a time shift module and an attention module;
training the multi-branch convolutional neural network model by using the training set;
and inputting the test set into a trained multi-branch convolutional neural network model to generate an evaluation score of risk occurrence in a traffic scene.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.
Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.

Claims (10)

1. The traffic scene risk assessment method based on the multi-branch convolutional neural network is characterized by comprising the following steps:
the method comprises the steps of obtaining a traffic scene video, intercepting a video frame sequence according to a preset frequency, calculating an optical flow graph between adjacent frames, and dividing the video frame sequence and the optical flow graph data into a training set and a testing set respectively;
constructing a multi-branch convolutional neural network model with a spatial branch network and a temporal branch network, and embedding a time shift module and an attention module;
training the multi-branch convolutional neural network model by using the training set;
and inputting the test set into a trained multi-branch convolutional neural network model to generate an evaluation score of risk occurrence in a traffic scene.
2. The traffic scene risk assessment method based on the multi-branch convolutional neural network of claim 1, wherein before the data set is constructed, the obtained video frame sequences and optical flow data are classified according to the likelihood of a traffic accident, with four risk levels: low risk (few pedestrians and vehicles in the scene, normal driving, accidents unlikely); moderate risk (moderate pedestrian and vehicle flow in the scene, no dangerous driving behavior, orderly driving); high risk (too many pedestrians and vehicles in the scene, or disordered driving, with a high probability of an accident); accident level (a visible traffic accident occurs in the scene).
3. The traffic scene risk assessment method based on the multi-branch convolutional neural network of claim 1, wherein the time shift module is used to increase information exchange and fusion between adjacent frames; the time shift module performs displacement in the time dimension, the first quarter of the channels being displaced towards the next time step, the second quarter towards the previous time step, and the remaining half kept unchanged; this process can be expressed as:
Y = ω_1 c_1 T_{+1} + ω_2 c_2 T_{-1} + ω_3 c_3 T_0
where the channels c are divided into three parts (c_1, c_2, c_3), (T_{+1}, T_{-1}, T_0) denote shift operations in the time dimension, and (ω_1, ω_2, ω_3) are the weights.
4. The traffic scene risk assessment method based on the multi-branch convolutional neural network of claim 1, wherein the attention module is inserted into the residual convolutional layer, the attention module comprises a channel attention module and a spatial attention module, the channel attention module performs feature compression on the input through two pooling layers, then splices the input together and sends the spliced input into a one-dimensional convolutional layer for learning, a channel attention weight is obtained through an activation function, and the calculation process can be represented as:
M_c(F) = σ(Conv1d([AvgPool(F); MaxPool(F)]))
wherein AvgPool and MaxPool denote average pooling and maximum pooling operations; conv1d represents a one-dimensional convolution operation; σ denotes the activation function.
5. The traffic scene risk assessment method based on the multi-branch convolutional neural network of claim 4, wherein the spatial attention module performs feature compression on the input through two pooling layers, then concatenates the results and feeds them into a convolutional layer for learning, the spatial attention weight being obtained through an activation function; the calculation can be expressed as:
M_s(F) = σ(Conv2d([AvgPool(F); MaxPool(F)]))
wherein AvgPool and MaxPool denote average pooling and maximum pooling operations; conv2d represents a two-dimensional convolution operation; σ denotes the activation function.
6. The traffic scene risk assessment method based on the multi-branch convolutional neural network as claimed in claim 1, wherein the training of the multi-branch convolutional neural network model is to segment the video frame data of the training set at equal intervals, randomly select and stack the video frame data from each segment, and input the video frame data into the spatial branch network; and inputting the optical flow diagram data in the training set into a time branch network to optimize the parameters of the network model.
7. The traffic scene risk assessment method based on the multi-branch convolutional neural network of claim 1, wherein when the trained multi-branch convolutional neural network model is used for risk assessment, the video frame data of the test set are segmented at equal intervals, the frame with the largest change in each segment is selected and stacked, and the stacked frames are input into the trained spatial branch network; the corresponding optical flow maps are input into the trained temporal branch network, and the features of the two branches are fused to obtain an evaluation score of the risk in the traffic scene.
8. Traffic scene risk assessment system based on multi-branch convolutional neural network, characterized by comprising:
a data set construction module: the method comprises the steps of obtaining a traffic scene video, intercepting a video frame sequence according to a preset frequency, calculating an optical flow graph between adjacent frames, and dividing the video frame sequence and the optical flow graph data into a training set and a testing set respectively;
a model construction module: constructing a multi-branch convolutional neural network model with a spatial branch network and a temporal branch network, and embedding a time shift module and an attention module;
a model training module: training the multi-branch convolutional neural network model by using the training set;
a risk assessment module: and inputting the test set into a trained multi-branch convolutional neural network model to generate an evaluation score of risk occurrence in a traffic scene.
9. An electronic device comprising an image acquisition apparatus, a memory, a processor and a computer program stored in the memory for execution, wherein the processor when executing the program implements the method for traffic scene risk assessment based on a multi-branch convolutional neural network as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the method for traffic scenario risk assessment based on a multi-branch convolutional neural network as set forth in any one of claims 1-7.
CN202010921517.XA 2020-09-04 2020-09-04 Traffic scene risk assessment method and system based on multi-branch convolutional neural network Pending CN112016499A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010921517.XA CN112016499A (en) 2020-09-04 2020-09-04 Traffic scene risk assessment method and system based on multi-branch convolutional neural network


Publications (1)

Publication Number Publication Date
CN112016499A 2020-12-01

Family

ID=73516926

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010921517.XA Pending CN112016499A (en) 2020-09-04 2020-09-04 Traffic scene risk assessment method and system based on multi-branch convolutional neural network

Country Status (1)

Country Link
CN (1) CN112016499A (en)



Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084228A (en) * 2019-06-25 2019-08-02 江苏德劭信息科技有限公司 A kind of hazardous act automatic identifying method based on double-current convolutional neural networks

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Corcoran, Gary-Patrick, et al.: "Traffic Risk Assessment: A Two-Stream Approach Using Dynamic-Attention", 2019 16th Conference on Computer and Robot Vision (CRV) *
Ji Lin, et al.: "TSM: Temporal Shift Module for Efficient Video Understanding", arXiv:1811.08383v3 *
Sanghyun Woo, et al.: "CBAM: Convolutional Block Attention Module"
张再腾等 (Zhang Zaiteng, et al.): "一种基于深度学习的视觉里程计算法" [A Deep Learning-Based Visual Odometry Algorithm], 《激光与光电子学进展》 [Laser & Optoelectronics Progress] *
武玉伟 (Wu Yuwei): 《深度学习基础与应用》 [Fundamentals and Applications of Deep Learning], 30 April 2020

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112464546A (en) * 2020-12-14 2021-03-09 上海交通大学设计研究总院有限公司 Public space pedestrian flow motion risk discrimination method based on dynamic data analysis
CN112464546B (en) * 2020-12-14 2024-03-19 上海交通大学设计研究总院有限公司 Public space pedestrian flow movement risk judging method based on dynamic data analysis
CN112508625A (en) * 2020-12-18 2021-03-16 国网河南省电力公司经济技术研究院 Intelligent inspection modeling method based on multi-branch residual attention network
CN112849156A (en) * 2021-04-25 2021-05-28 北京三快在线科技有限公司 Driving risk identification method and device
CN112849156B (en) * 2021-04-25 2021-07-30 北京三快在线科技有限公司 Driving risk identification method and device
CN113283338A (en) * 2021-05-25 2021-08-20 湖南大学 Method, device and equipment for identifying driving behavior of driver and readable storage medium
CN115953239A (en) * 2023-03-15 2023-04-11 无锡锡商银行股份有限公司 Surface examination video scene evaluation method based on multi-frequency flow network model
CN116629465A (en) * 2023-07-26 2023-08-22 成都源轮讯恒科技有限公司 Smart power grids video monitoring and risk prediction response system
CN116629465B (en) * 2023-07-26 2024-01-12 李波 Smart power grids video monitoring and risk prediction response system

Similar Documents

Publication Publication Date Title
CN112016499A (en) Traffic scene risk assessment method and system based on multi-branch convolutional neural network
WO2022083784A1 (en) Road detection method based on internet of vehicles
CN110084151B (en) Video abnormal behavior discrimination method based on non-local network deep learning
CN110188807B (en) Tunnel pedestrian target detection method based on cascading super-resolution network and improved Faster R-CNN
CN112101221B (en) Method for real-time detection and identification of traffic signal lamp
CN110766098A (en) Traffic scene small target detection method based on improved YOLOv3
WO2023207742A1 (en) Method and system for detecting anomalous traffic behavior
CN115223130B (en) Multi-task panoramic driving perception method and system based on improved YOLOv5
CN111428558A Vehicle detection method based on improved YOLOv3 method
CN112990065A (en) Optimized YOLOv5 model-based vehicle classification detection method
CN112084928A (en) Road traffic accident detection method based on visual attention mechanism and ConvLSTM network
CN113011322A (en) Detection model training method and detection method for specific abnormal behaviors of monitoring video
CN113221716A (en) Unsupervised traffic abnormal behavior detection method based on foreground object detection
CN115761599A (en) Video anomaly detection method and system
CN112785610B (en) Lane line semantic segmentation method integrating low-level features
CN112597996B (en) Method for detecting traffic sign significance in natural scene based on task driving
CN113487889A (en) Traffic state anti-disturbance generation method based on single intersection signal control of rapid gradient descent
CN111612803B (en) Vehicle image semantic segmentation method based on image definition
Hao et al. Aggressive lane-change analysis closing to intersection based on UAV video and deep learning
CN112288702A (en) Road image detection method based on Internet of vehicles
CN116468994A (en) Village and town shrinkage simulation method, system and device based on street view data
CN115661786A (en) Small rail obstacle target detection method for area pre-search
Ye et al. M2f2-net: Multi-modal feature fusion for unstructured off-road freespace detection
CN114255450A (en) Near-field vehicle jamming behavior prediction method based on forward panoramic image
CN113673332A (en) Object recognition method, device and computer-readable storage medium

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 2020-12-01)