CN112016499A - Traffic scene risk assessment method and system based on multi-branch convolutional neural network - Google Patents
- Publication number: CN112016499A
- Application number: CN202010921517.XA
- Authority
- CN
- China
- Prior art keywords
- convolutional neural
- neural network
- branch
- traffic scene
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Abstract
The scheme first extracts optical-flow information from the video frames, then feeds the frame images into a spatio-temporal multi-branch convolutional neural network as spatial information and the optical-flow images as temporal information for learning and training. In addition, the present disclosure adds a time-shift module and an attention module on top of the convolutional neural network: the time-shift module lets the spatio-temporal features of adjacent frames exchange information without increasing the number of network parameters, and the attention module learns the regions of the scene where notable changes occur; both improve the accuracy of the risk estimate. A sparse temporal sampling strategy is adopted during training and testing: the video is divided into multiple segments from which frames are extracted and fed into the network, which avoids information redundancy between adjacent frames while accelerating computation in practical applications.
Description
Technical Field
The present disclosure relates to the technical field of traffic scene risk assessment, and in particular, to a traffic scene risk assessment method and system based on a multi-branch convolutional neural network.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
With the continuous development of the global economy, the numbers of automobiles and drivers increase year by year. According to data published by the Ministry of Public Security, the number of automobiles in China has exceeded 200 million, and road traffic safety has become a prominent and widely discussed problem. The latest statistics of the World Health Organization show that about 1.25 million people die in road traffic accidents worldwide every year. China has the most road traffic accidents in the world, and its huge automobile fleet exposes it to a correspondingly large road safety threat.
In recent years, video capture and storage technology has developed rapidly. Dashcams, which serve as liability-determination aids after road traffic accidents, have entered the market with a year-on-year increase in installation rates, producing a large volume of vehicle-mounted video. In the field of automatic driving, a vehicle senses its surroundings through sensors such as ultrasonic radar, millimeter-wave radar, lidar, and cameras, which likewise generate large amounts of video data. If these videos could be used to assess the risk of the current road environment and warn the driver or the automatic driving system in time, driving safety would improve greatly, accidents would be reduced, and the lives and property of drivers would be better protected.
Most existing traffic scene risk assessment methods use video captured by roadside surveillance cameras. One class of schemes collects road traffic video sequences, extracts vehicle acceleration, direction changes, and geometric position information, and judges whether abnormal events occur; another class detects and tracks moving vehicles in the surveillance video and analyzes vehicle behavior from features such as target speed, heading, and trajectory. The inventors observed that although roadside surveillance cameras have fixed backgrounds and favorable angles, so that dangerous driving behaviors can be detected or predicted with traditional image-processing methods or a simple neural network, their detection results only support coarse, wide-area road risk assessment: they cannot effectively assess road risk at a local position, and the results cannot be delivered to a driver in real time. To address these problems, some researchers have proposed traffic scene risk estimation methods for videos shot by vehicle-mounted cameras, but because such videos suffer from background motion, jitter, and shooting-angle issues, these methods require a complex feature-extraction model design, fail to reach good evaluation accuracy, and take relatively long to compute.
Disclosure of Invention
In order to solve the above problems, the present disclosure provides a traffic scene risk assessment method and system based on a multi-branch convolutional neural network, and the scheme of the present disclosure performs traffic scene risk assessment by using the multi-branch convolutional neural network, and designs a time shift module, an attention module and a sparse time sampling mechanism, thereby effectively improving accuracy and efficiency of risk assessment.
According to a first aspect of the embodiments of the present disclosure, there is provided a traffic scene risk assessment method based on a multi-branch convolutional neural network, including:
the method comprises the steps of obtaining a traffic scene video, intercepting a video frame sequence according to a preset frequency, calculating an optical flow graph between adjacent frames, and dividing the video frame sequence and the optical flow graph data into a training set and a testing set respectively;
constructing a multi-branch convolutional neural network model, and embedding a time shifting module and an attention module as a space branch network and a time branch network respectively;
training the multi-branch convolutional neural network model by using the training set;
and inputting the test set into a trained multi-branch convolutional neural network model to generate an evaluation score of risk occurrence in a traffic scene.
Further, before the data set is constructed, the obtained video frame sequences and optical-flow data need to be classified by the likelihood of a traffic accident. The risk rating is divided into four categories: low risk, where few pedestrians and vehicles are present in the scene, vehicles drive normally, and an accident is unlikely; moderate risk, where pedestrian and vehicle flow in the scene is moderate, there is no dangerous driving behavior, and traffic is orderly; high risk, where there are too many pedestrians and vehicles in the scene or driving is disordered, and the accident probability is high; and accident level, where a visible traffic accident occurs in the scene.
Furthermore, the time-shift module increases information exchange and fusion between adjacent frames. It shifts features in the time dimension: the first quarter of the channels is shifted toward the next time step, the second quarter toward the previous time step, and the remaining half of the channels is left unchanged. This process can be expressed as:

$Y = \omega_1 c_1 T_{+1} + \omega_2 c_2 T_{-1} + \omega_3 c_3 T_{0}$

where the channel dimension $c$ is divided into three parts $(c_1, c_2, c_3)$, $(T_{+1}, T_{-1}, T_{0})$ denotes the shift operation in the time dimension, and $(\omega_1, \omega_2, \omega_3)$ are the weights.
Further, the attention module is inserted into the residual convolutional layers and comprises a channel attention module and a spatial attention module. The channel attention module compresses the input features through two pooling layers, concatenates the pooled features, feeds them into a one-dimensional convolutional layer for learning, and obtains the channel attention weights through an activation function. The calculation can be expressed as:

$M_c(F) = \sigma(\mathrm{Conv1d}([\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)]))$

where AvgPool and MaxPool denote the average-pooling and max-pooling operations, Conv1d denotes a one-dimensional convolution, and $\sigma$ denotes the activation function.
further, the spatial attention module performs feature compression on the input through two pooling layers, then splices the input together, sends the input into the convolutional layer for learning, obtains the channel attention weight through an activation function, and the calculation process can be expressed as:
Ms(F)=σ(Conv2d([AvgPool(F);MaxPool(F)]))
wherein AvgPool and MaxPool denote average pooling and maximum pooling operations; conv2d represents a two-dimensional convolution operation; σ denotes the activation function.
Further, training the multi-branch convolutional neural network model comprises segmenting the video frame data of the training set at equal intervals, randomly selecting one frame from each segment, stacking the selected frames, and inputting them into the spatial branch network; the optical-flow data of the training set are input into the temporal branch network to optimize the parameters of the network model.

Further, when risk assessment is performed with the trained multi-branch convolutional neural network model, the video frame data of the test set are segmented at equal intervals, the frame with the largest change is selected from each segment, and the stacked selected frames are input into the trained spatial branch network; the corresponding optical-flow maps are input into the trained temporal branch network, and the features of the two branches are fused to obtain an evaluation score for the risk in the traffic scene.
According to a second aspect of the embodiments of the present disclosure, there is provided a traffic scene risk assessment system based on a multi-branch convolutional neural network, including:
a data set construction module: the method comprises the steps of obtaining a traffic scene video, intercepting a video frame sequence according to a preset frequency, calculating an optical flow graph between adjacent frames, and dividing the video frame sequence and the optical flow graph data into a training set and a testing set respectively;
a model construction module: constructing a multi-branch convolutional neural network model, and embedding a time shifting module and an attention module as a space branch network and a time branch network respectively;
a model training module: training the multi-branch convolutional neural network model by using the training set;
a risk assessment module: and inputting the test set into a trained multi-branch convolutional neural network model to generate an evaluation score of risk occurrence in a traffic scene.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic device, including an image acquisition apparatus, a memory, a processor, and a computer program stored in the memory for execution, where the processor implements the above-mentioned traffic scene risk assessment method based on a multi-branch convolutional neural network when executing the program.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements a traffic scene risk assessment system based on a multi-branch convolutional neural network as described above.
Compared with the prior art, the beneficial effects of the present disclosure are:
(1) the scheme automatically learns spatio-temporal video features with the multi-branch convolutional neural network, avoiding the complex task of hand-crafting a feature extractor for the video;
(2) the scheme designs a time-shift module and an attention module: the time-shift module strengthens information exchange between video frames, and the attention module learns the importance of both the spatial-position and feature dimensions; both modules improve the accuracy of the risk assessment;
(3) the scheme adds a sparse temporal sampling mechanism to conventional video analysis: the complete video is divided into multiple segments from which frames are extracted and fed into the network for training and testing, which resolves the redundancy between adjacent frames while greatly improving detection efficiency;
(4) to address the large amount of redundant information in temporally adjacent video frames, the scheme segments the video at equal intervals, randomly selects a frame from each segment during training, and selects the frame with the largest change from each segment during testing; this effectively mitigates overfitting caused by information redundancy while greatly improving detection efficiency.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
Fig. 1 is a flow chart of a traffic scene risk assessment according to a first embodiment of the disclosure;
FIG. 2 is a diagram of a multi-branch convolutional neural network according to a first embodiment of the present disclosure;
fig. 3 is a schematic diagram of a time shifting module process according to a first embodiment of the disclosure;
FIG. 4 is a schematic view of an attention module according to a first embodiment of the disclosure;
FIG. 5 is a schematic view of a channel attention module according to a first embodiment of the disclosure;
fig. 6 is a schematic diagram of a spatial attention module according to a first embodiment of the disclosure.
Detailed Description
The present disclosure is further described with reference to the following drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
The first embodiment is as follows:
the embodiment aims to provide a traffic scene risk assessment method based on a multi-branch convolutional neural network.
A traffic scene risk assessment method based on a multi-branch convolutional neural network comprises the following steps:
the method comprises the steps of obtaining a traffic scene video, intercepting a video frame sequence according to a preset frequency, calculating an optical flow graph between adjacent frames, and dividing the video frame sequence and the optical flow graph data into a training set and a testing set respectively;
constructing a multi-branch convolutional neural network model, and embedding a time shifting module and an attention module as a space branch network and a time branch network respectively;
training the multi-branch convolutional neural network model by using the training set;
and inputting the test set into a trained multi-branch convolutional neural network model to generate an evaluation score of risk occurrence in a traffic scene.
Further, the method comprises five parts of training video data acquisition and processing, multi-branch convolutional neural network construction, space branch network training, time branch network training and online detection and evaluation;
the acquisition and processing of the training video data comprises the following steps:
step 1, shooting a road traffic scene video;
step 2, editing the video into video segments;
step 3, classifying the video segments according to risk levels;
step 4, decomposing the video segment into video frames;
and 5, calculating the optical flow between frames.
The construction of the multi-branch convolutional neural network comprises the following steps:
step 1, constructing a basic structure;
step 2, constructing a time shifting module;
and 3, constructing an attention module.
The training spatial branching network comprises the following steps:
step 1, randomly sampling and scaling video frame images at equal intervals, and inputting the video frame images into a spatial branch network;
step 2, calculating the output of the backbone network;
step 3, calculating the output of the full connection layer;
and 4, calculating a loss function, performing back propagation and training the spatial branch network parameters.
The training time-branching network comprises the following steps:
step 1, sampling and scaling optical flow images at equal intervals randomly, and inputting the optical flow images into a time branch network;
step 2, calculating the output of the backbone network;
step 3, calculating the output of the full connection layer;
and 4, calculating a loss function, performing back propagation and training time branch network parameters.
The online detection and evaluation comprises the following steps:
step 1, shooting a road traffic scene video;
step 2, calculating an inter-frame optical flow;
step 3, segmenting the video at equal intervals, selecting from each segment the image with the largest change, scaling it, and inputting it into the trained spatial branch network; here the "largest change" refers to the motion amount of a video frame computed from its optical flow (the sum of the optical-flow magnitudes in all directions; the optical-flow calculation is given in formula (1)), and the frame with the largest motion amount in each segment is taken as the frame with the largest change;
step 4, zooming the optical flow image corresponding to the selected video frame, and inputting the optical flow image into the trained time branch network;
and 5, fusing the results of the spatial branch and the time branch, and evaluating the traffic scene risk level.
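The fusion in step 5 can be sketched in Python. The equal weighting of the two branches and the class names are illustrative assumptions; the patent only states that the branch results are fused to give a risk level.

```python
# Illustrative late fusion of the two branch outputs (weights are assumptions,
# not values given by the patent).
RISK_LEVELS = ["low risk", "moderate risk", "high risk", "accident"]

def fuse_and_evaluate(spatial_scores, temporal_scores, w_spatial=0.5):
    """Weighted-average the per-class scores of both branches and pick a level."""
    fused = [w_spatial * s + (1.0 - w_spatial) * t
             for s, t in zip(spatial_scores, temporal_scores)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return RISK_LEVELS[best], fused
```

With, say, spatial scores [0.1, 0.2, 0.6, 0.1] and temporal scores [0.2, 0.1, 0.5, 0.2], the fused vector peaks at the third class, so "high risk" is reported.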
By integrating the traffic scene risk assessment structure provided by the method and the multi-branch convolutional neural network, the overall method comprises the following steps:
step 1: acquiring a training video and calculating an optical flow, specifically:
the method includes the steps of shooting a road traffic scene in front of a vehicle by using a vehicle data recorder or a vehicle-mounted camera.
And secondly, using video clipping software to clip the video into video segments in units of 60 frames.
Thirdly, according to the possibility of accidents in the traffic scene in the video, the risk rating is divided into four grades: low risk: the number of pedestrians and vehicles in the scene is small, the vehicles are driven normally, and the possibility of accidents is low; moderate risk: the pedestrian flow and the vehicle flow in the scene are moderate, no dangerous driving behaviors exist, and the running is more orderly; high risk: too many pedestrians and vehicles exist in the scene, or the vehicle runs disorderly, so that the probability of accidents is high; accident level: a visible traffic accident occurs in the scene. The video segments are classified according to the above criteria.
And fourthly, decomposing the video segments into video frames by using video processing software or opencv software library to construct an image training data set.
⑤ Calculate the inter-frame optical flow with the Lucas-Kanade method and construct the optical-flow training data set. Within a local window the flow is the least-squares solution

$$\begin{bmatrix} V_x \\ V_y \end{bmatrix} = \left( \sum_i \begin{bmatrix} I_{xi}^2 & I_{xi} I_{yi} \\ I_{xi} I_{yi} & I_{yi}^2 \end{bmatrix} \right)^{-1} \begin{bmatrix} -\sum_i I_{xi} I_{ti} \\ -\sum_i I_{yi} I_{ti} \end{bmatrix} \qquad (1)$$

where $V_x$ and $V_y$ are the optical flow in the x and y directions, and $I_{xi}$, $I_{yi}$, $I_{ti}$ are the derivatives of pixel point $i$ in the three directions $(x, y, t)$.
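The per-window Lucas-Kanade solve can be sketched in pure Python. This is a minimal illustration of the normal-equation solution for one window, not the full pyramidal implementation a production system would use.

```python
# Minimal sketch of the Lucas-Kanade least-squares solve for one window:
# given per-pixel derivatives Ix_i, Iy_i, It_i, solve the 2x2 normal
# equations (A^T A) v = -A^T It for the flow v = (Vx, Vy).

def lucas_kanade_window(ix, iy, it):
    sxx = sum(x * x for x in ix)
    sxy = sum(x * y for x, y in zip(ix, iy))
    syy = sum(y * y for y in iy)
    sxt = sum(x * t for x, t in zip(ix, it))
    syt = sum(y * t for y, t in zip(iy, it))
    det = sxx * syy - sxy * sxy          # determinant of A^T A
    if abs(det) < 1e-12:                 # aperture problem: singular system
        return 0.0, 0.0
    vx = (-syy * sxt + sxy * syt) / det  # rows of (A^T A)^{-1} applied to -A^T It
    vy = (sxy * sxt - sxx * syt) / det
    return vx, vy
```

If the derivatives exactly satisfy brightness constancy for a true flow (u, v), i.e. It_i = -(Ix_i u + Iy_i v), the solver recovers (u, v).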
Step 2: constructing a multi-branch convolutional neural network
The convolutional neural network comprises a convolutional layer, a downsampling layer, a residual convolutional layer with an attention module, a time shifting module and a full connection layer, and the structure is shown in FIG. 2.
② Time shift module: after features are extracted through a series of convolution and pooling operations, the time-shift module shifts them in the time dimension: the first quarter of the channels is shifted toward the next time step, the second quarter toward the previous time step, and the remaining half is left unchanged, as shown in fig. 3.

This process can be expressed as:

$Y = \omega_1 c_1 T_{+1} + \omega_2 c_2 T_{-1} + \omega_3 c_3 T_{0} \qquad (2)$

where the channel dimension $c$ is divided into three parts $(c_1, c_2, c_3)$, $(T_{+1}, T_{-1}, T_{0})$ denotes the shift operation in the time dimension, and $(\omega_1, \omega_2, \omega_3)$ are the weights.
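Under the quarter/quarter/half split described above, the shift can be sketched in pure Python. The (T, C) layout with one scalar feature per channel is an illustrative simplification of the real (T, C, H, W) tensors, and out-of-range time steps are zero-padded by assumption.

```python
# Sketch of the time-shift operation: first quarter of channels takes its
# value from the previous time step (shift toward the future), second quarter
# from the next time step (shift toward the past), remaining half unchanged.

def time_shift(x):
    t_len, c_len = len(x), len(x[0])
    q = c_len // 4
    out = [[0.0] * c_len for _ in range(t_len)]
    for t in range(t_len):
        for c in range(c_len):
            if c < q:                               # shifted toward next moment
                out[t][c] = x[t - 1][c] if t - 1 >= 0 else 0.0
            elif c < 2 * q:                         # shifted toward previous moment
                out[t][c] = x[t + 1][c] if t + 1 < t_len else 0.0
            else:                                   # remaining half kept unchanged
                out[t][c] = x[t][c]
    return out
```

No new parameters are introduced: the operation only moves existing features along the time axis, which is the property the disclosure highlights.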
③ Attention module: imitating human observation habits, the present disclosure adds an attention mechanism by inserting channel and spatial attention modules into the residual convolutional layers; the structure is shown in fig. 4.
The channel attention model compresses the input features through two pooling layers, concatenates the pooled vectors, sends them into a one-dimensional convolutional layer for learning, and obtains the channel attention weights through an activation function; the structure is shown in fig. 5.

The channel attention calculation can be expressed as:

$M_c(F) = \sigma(\mathrm{Conv1d}([\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)])) \qquad (3)$

where AvgPool and MaxPool denote the average-pooling and max-pooling operations, Conv1d denotes a one-dimensional convolution, and $\sigma$ denotes the activation function.
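Equation (3) can be sketched in pure Python. The kernel size (3) and the fixed convolution weights below are illustrative assumptions; in the network they are learned parameters.

```python
import math

# Sketch of channel attention: average- and max-pool each channel over its
# spatial positions, run an illustrative 2-input 1-D convolution (kernel 3,
# zero padding) along the channel axis, then apply a sigmoid.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def channel_attention(feat, w_avg=(0.2, 0.6, 0.2), w_max=(0.1, 0.8, 0.1), bias=0.0):
    # feat: nested lists [C][H][W]
    c_len = len(feat)
    avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]
    mx = [max(max(row) for row in ch) for ch in feat]
    weights = []
    for c in range(c_len):
        acc = bias
        for k in (-1, 0, 1):                 # kernel size 3 with zero padding
            if 0 <= c + k < c_len:
                acc += w_avg[k + 1] * avg[c + k] + w_max[k + 1] * mx[c + k]
        weights.append(sigmoid(acc))
    return weights                            # one weight in (0, 1) per channel
```

The output is one scalar weight per channel, which would multiply (re-weight) the corresponding feature channel.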
The spatial attention model compresses the input features through two pooling layers, concatenates the pooled maps, sends them into a convolutional layer for learning, and obtains the spatial attention weights through an activation function; the structure is shown in fig. 6.

The spatial attention calculation can be expressed as:

$M_s(F) = \sigma(\mathrm{Conv2d}([\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)])) \qquad (4)$

where AvgPool and MaxPool denote the average-pooling and max-pooling operations, Conv2d denotes a two-dimensional convolution, and $\sigma$ denotes the activation function.
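Equation (4) can be sketched analogously. The 1x1 convolution with equal weights used here is an illustrative assumption standing in for the learned 2-D convolution of the patent.

```python
import math

# Sketch of spatial attention: average- and max-pool over the channel axis at
# each pixel, combine the two pooled maps with an illustrative 1x1 convolution,
# then apply a sigmoid to get one attention weight per spatial position.

def spatial_attention(feat, w_avg=0.5, w_max=0.5, bias=0.0):
    # feat: nested lists [C][H][W]
    c_len = len(feat)
    h, w = len(feat[0]), len(feat[0][0])
    att = []
    for i in range(h):
        row = []
        for j in range(w):
            vals = [feat[c][i][j] for c in range(c_len)]
            avg = sum(vals) / c_len
            mx = max(vals)
            z = w_avg * avg + w_max * mx + bias   # 1x1 conv over the 2 pooled maps
            row.append(1.0 / (1.0 + math.exp(-z)))
        att.append(row)
    return att                                     # [H][W] map of weights in (0, 1)
```

The resulting map would be broadcast over the channels to emphasize the scene regions where notable changes occur.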
Step 3: Train the spatial branch convolutional neural network on the image training data set to obtain the network parameters $P_s$.
① Divide the video frames into segments of 5 frames each, randomly extract one frame from each segment, scale it to 224 × 224, and stack the selected frames together as input to the network.
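The sparse sampling of step ① can be sketched as follows; frame decoding and the 224 × 224 scaling are omitted, and only the index selection is shown.

```python
import random

# Sketch of training-time sparse temporal sampling: cut the frame sequence
# into segments of 5 frames and draw one random frame index per segment.

def sample_frame_indices(num_frames, seg_len=5, rng=None):
    rng = rng or random.Random(0)     # fixed seed here for reproducibility
    indices = []
    for start in range(0, num_frames - seg_len + 1, seg_len):
        indices.append(rng.randrange(start, start + seg_len))
    return indices
```

For a 60-frame clip this yields 12 indices, one per 5-frame segment, so the network sees the whole clip while adjacent-frame redundancy is avoided.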
② Randomly initialize the network parameters $P_s$ and compute the output of the backbone network before the fully connected layer according to the following formula:

$f(x_s) = \sigma(\alpha \otimes x_s + \beta) \qquad (5)$

where $\otimes$ denotes convolution over the three dimensions $(c, h, w)$ of channel, height, and width, and $(\alpha, \beta)$ denote the weights and biases of the convolutional layers.
③ Compute the output of the fully connected layer according to the following formula:

$p(x_s) = \sigma(\theta_{fc}^{T} f(x_s)) \qquad (6)$

where $\theta_{fc}$ are the network parameters of the fully connected layer.
④ Perform iterative training according to the difference between the label of the input data and the network output.

The loss function at the output of the spatial convolutional neural network is the cross-entropy

$L = -\sum_{c=1}^{M} y_{ic} \log(p_{ic}) \qquad (7)$

where $M$ is the number of risk categories; $y_{ic}$ is an indicator variable (0 or 1) that is 1 if class $c$ is the class of sample $i$ and 0 otherwise; and $p_{ic}$ is the predicted probability that observed sample $i$ belongs to class $c$. After the loss is computed, the network updates the parameters $P_s$ by back-propagation in the direction that decreases the loss.
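The cross-entropy loss for one sample can be sketched directly from the definitions above (one-hot label y over the M risk categories, predicted probabilities p):

```python
import math

# Sketch of the per-sample cross-entropy loss: only the term of the true
# class contributes, since y is one-hot.

def cross_entropy(y, p):
    return -sum(yc * math.log(pc) for yc, pc in zip(y, p) if yc > 0)
```

For example, with label [0, 1, 0, 0] and prediction [0.1, 0.7, 0.1, 0.1] the loss is -log(0.7).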
The number of iterations is set to 200, and in each iteration the parameters are updated in the following order. First the classification fully-connected-layer parameters $\theta_{fc}$ are updated:

$\theta_{fc} \leftarrow \theta_{fc} - \eta \frac{\partial L}{\partial \theta_{fc}} \qquad (8)$

then the backbone network parameters $\theta$ are updated:

$\theta \leftarrow \theta - \eta \frac{\partial L}{\partial \theta} \qquad (9)$

where the learning rate $\eta$ is set to 0.001 and halved after 100 iterations.
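The stated schedule (base rate 0.001, halved once after 100 of the 200 iterations) can be sketched as a step function; treating the halving as a single step at iteration 100 is an assumption about the exact boundary.

```python
# Sketch of the step learning-rate schedule described above.

def learning_rate(iteration, base_lr=0.001, step=100):
    """Return eta for a given 1-based iteration: halved after `step` iterations."""
    return base_lr if iteration <= step else base_lr / 2.0
```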
Step 4: Train the temporal branch convolutional neural network on the optical-flow training data set to obtain the network parameters $P_t$.
① Divide the optical-flow frames into segments of 5 frames each, randomly extract one frame from each segment, scale it to 224 × 224, and stack the selected frames together as input to the network.
② Randomly initialize the network parameters P_t, and compute the backbone network output f(x_t) before the fully connected layer according to formula (10).
wherein (c, h, w) denote the channel, height, and width dimensions respectively, and (α, β) denote the weights and biases of the convolutional layers.
③ Compute the output of the fully connected layer according to the following formula:

p(x_t) = σ(θ_fc^T f(x_t))  (11)

wherein θ_fc denotes the network parameters of the fully connected layer.
④ Perform iterative training according to the difference between the input data labels and the network output.
The output loss function of the time-branch convolutional neural network channel is the cross-entropy loss:

L_i = -∑_{c=1}^{M} y_ic·log(p_ic)  (12)

wherein M represents the number of risk categories; y_ic is an indicator variable (0 or 1), equal to 1 if class c matches the class of sample i and 0 otherwise; and p_ic represents the predicted probability that observed sample i belongs to class c. After computing the loss function, the network updates the parameters P_t by back propagation in the direction that decreases the loss function.
The number of iterations is set to 200, and the parameters are updated in each iteration in the following order:
1) First update the classification fully connected layer parameters θ_fc:

θ_fc ← θ_fc - η·∂L/∂θ_fc  (13)

2) Then update the backbone network parameters θ:

θ ← θ - η·∂L/∂θ  (14)

where the learning rate η is set to 0.001 and halved after 100 iterations.
Step 5: in practical application, use the trained multi-branch convolutional neural network to detect and evaluate the traffic scene risk in vehicle-mounted video online.
① Capture the vehicle-mounted video with a driving recorder or on-board camera, and buffer 60 frames of video data in a computer storage medium.
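The 60-frame cache can be kept in a fixed-size ring buffer; a minimal sketch (the integer "frames" here are placeholders for real image arrays):

```python
from collections import deque

# Rolling buffer for the dashcam stream: once 60 frames are held,
# append() silently evicts the oldest frame.
frame_buffer = deque(maxlen=60)

for frame in range(100):      # stand-in for a live capture loop
    frame_buffer.append(frame)
```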
② Compute the inter-frame optical flow using the Lucas-Kanade optical flow method.
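For orientation, the core of the Lucas-Kanade method named above is a least-squares solve on image gradients; this single-window sketch estimates one displacement, whereas a production pipeline would use a pyramidal, per-feature implementation (e.g., in OpenCV):

```python
import numpy as np

def lucas_kanade_window(prev, curr):
    """Classic Lucas-Kanade normal equations for one window: solve
    [[sum fx^2, sum fx*fy], [sum fx*fy, sum fy^2]] @ (u, v) =
    -(sum fx*ft, sum fy*ft) for the displacement (u, v)."""
    fx = np.gradient(prev, axis=1)   # spatial gradient in x
    fy = np.gradient(prev, axis=0)   # spatial gradient in y
    ft = curr - prev                 # temporal difference
    A = np.array([[np.sum(fx * fx), np.sum(fx * fy)],
                  [np.sum(fx * fy), np.sum(fy * fy)]])
    b = -np.array([np.sum(fx * ft), np.sum(fy * ft)])
    return np.linalg.solve(A, b)     # (u, v)
```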
③ Segment the video at intervals of 5 frames, select the frame with the largest change from each segment, scale it to 224 × 224, stack the selected frames, and input them into the trained spatial branch network; compute the spatial branch risk probability p(x_s) according to formulas (5) and (6).
④ Input the optical flow maps corresponding to the frames fed to the spatial branch into the trained time branch network, and compute the time branch risk probability p(x_t) according to formulas (10) and (11).
⑤ Add the risk probabilities of the spatial branch and the time branch in proportion, and take the risk grade corresponding to the maximum fused risk probability:

r_fusion = max(ω_s·p(x_s) + ω_t·p(x_t))  (15)
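The late fusion of Eq. (15) can be sketched as a weighted sum followed by an argmax; the weight values below are illustrative, since the patent leaves (ω_s, ω_t) unspecified:

```python
import numpy as np

def fuse_branch_scores(p_spatial, p_temporal, w_s=0.6, w_t=0.4):
    """Fuse the spatial and temporal branch probability vectors with
    weights (w_s, w_t) and return (risk grade, fused probability).
    The weights are placeholders, not values from the patent."""
    fused = w_s * np.asarray(p_spatial) + w_t * np.asarray(p_temporal)
    return int(np.argmax(fused)), float(fused.max())
```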
To demonstrate the feasibility of the scheme described in the present disclosure, the following verification experiments were performed:
the hardware conditions for a set of validation experiments of the present disclosure were: 64bits windows 10, CPU intel core i7, GPU GTX 1070Ti, RAM 24GB, experimental programming language python3.6, and deep learning framework pytorch.
The present disclosure performed experiments on the Dashcam Video dataset, which comprises 620 videos captured by driving recorders; four categories of clearly distinguishable traffic risk videos were cut from these videos for training and testing. Accuracy is adopted as the evaluation criterion: the higher the accuracy, the better the algorithm's performance.
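The accuracy criterion used above is simply the fraction of correctly classified clips; a minimal sketch:

```python
def accuracy(pred_labels, true_labels):
    """Fraction of predictions that match the ground-truth risk grades."""
    assert len(pred_labels) == len(true_labels) and len(true_labels) > 0
    correct = sum(p == t for p, t in zip(pred_labels, true_labels))
    return correct / len(true_labels)
```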
The experimental results were compared to conventional video understanding and analysis methods as follows:
the confusion matrix for the experimental results is as follows:
in summary, compared with the methods in the prior art, the method disclosed by the present disclosure has significantly improved evaluation accuracy.
Example two:
the embodiment aims to provide a traffic scene risk assessment system based on a multi-branch convolutional neural network.
A traffic scene risk assessment system based on a multi-branch convolutional neural network comprises:
a data set construction module: the method comprises the steps of obtaining a traffic scene video, intercepting a video frame sequence according to a preset frequency, calculating an optical flow graph between adjacent frames, and dividing the video frame sequence and the optical flow graph data into a training set and a testing set respectively;
a model construction module: constructing a multi-branch convolutional neural network model, and embedding a time shifting module and an attention module as a space branch network and a time branch network respectively;
a model training module: training the multi-branch convolutional neural network model by using the training set;
a risk assessment module: and inputting the test set into a trained multi-branch convolutional neural network model to generate an evaluation score of risk occurrence in a traffic scene.
Example three:
the embodiment aims at providing an electronic device.
An electronic device, comprising an image acquisition apparatus, a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the above traffic scene risk assessment method based on a multi-branch convolutional neural network, the method comprising:
the method comprises the steps of obtaining a traffic scene video, intercepting a video frame sequence according to a preset frequency, calculating an optical flow graph between adjacent frames, and dividing the video frame sequence and the optical flow graph data into a training set and a testing set respectively;
constructing a multi-branch convolutional neural network model, and embedding a time shifting module and an attention module as a space branch network and a time branch network respectively;
training the multi-branch convolutional neural network model by using the training set;
and inputting the test set into a trained multi-branch convolutional neural network model to generate an evaluation score of risk occurrence in a traffic scene.
Example four:
an object of the present embodiment is to provide a computer-readable storage medium.
A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the traffic scene risk assessment method based on a multi-branch convolutional neural network, the method comprising:
the method comprises the steps of obtaining a traffic scene video, intercepting a video frame sequence according to a preset frequency, calculating an optical flow graph between adjacent frames, and dividing the video frame sequence and the optical flow graph data into a training set and a testing set respectively;
constructing a multi-branch convolutional neural network model, and embedding a time shifting module and an attention module as a space branch network and a time branch network respectively;
training the multi-branch convolutional neural network model by using the training set;
and inputting the test set into a trained multi-branch convolutional neural network model to generate an evaluation score of risk occurrence in a traffic scene.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.
Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.
Claims (10)
1. The traffic scene risk assessment method based on the multi-branch convolutional neural network is characterized by comprising the following steps:
the method comprises the steps of obtaining a traffic scene video, intercepting a video frame sequence according to a preset frequency, calculating an optical flow graph between adjacent frames, and dividing the video frame sequence and the optical flow graph data into a training set and a testing set respectively;
constructing a multi-branch convolutional neural network model, and embedding a time shifting module and an attention module as a space branch network and a time branch network respectively;
training the multi-branch convolutional neural network model by using the training set;
and inputting the test set into a trained multi-branch convolutional neural network model to generate an evaluation score of risk occurrence in a traffic scene.
2. The traffic scene risk assessment method based on the multi-branch convolutional neural network as claimed in claim 1, wherein before the data set is constructed, the obtained video frame sequences and optical flow map data are classified according to the probability of a traffic accident, the risk rating being divided into four categories: low risk, in which there are few pedestrians and vehicles in the scene, the vehicles travel normally, and the possibility of an accident is low; moderate risk, in which the pedestrian and traffic flows in the scene are moderate, there is no dangerous driving behavior, and driving is orderly; high risk, in which there are too many pedestrians and vehicles in the scene, or driving is disordered, and the probability of an accident is high; and accident level, in which a visible traffic accident occurs in the scene.
3. The multi-branch convolutional neural network-based traffic scene risk assessment method of claim 1, wherein the time shifting module is used to increase information exchange and fusion between adjacent frames; the time shifting module performs displacement along the time dimension: the first quarter of the channels is shifted toward the next moment, the following quarter is shifted toward the previous moment, and the remaining half of the channels is kept unchanged; this process can be expressed as:

Y = ω_1·c_1·T_{+1} + ω_2·c_2·T_{-1} + ω_3·c_3·T_0

wherein the channels c are divided into three parts (c_1, c_2, c_3); (T_{+1}, T_{-1}, T_0) represent the displacement operations along the time dimension; and (ω_1, ω_2, ω_3) represent the weights.
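Outside the claim language, the quarter-channel shift can be sketched on a (T, C, H, W) tensor; zero-padding at the sequence ends is an assumption, since the claim does not specify the boundary handling:

```python
import numpy as np

def temporal_shift(x):
    """x: (T, C, H, W). Shift the first C//4 channels one step toward the
    next moment, the following C//4 one step toward the previous moment,
    and keep the remaining half of the channels in place, zero-padding
    at the ends of the sequence."""
    T, C, H, W = x.shape
    q = C // 4
    out = np.zeros_like(x)
    out[1:, :q] = x[:-1, :q]          # shift toward the next moment (T+1)
    out[:-1, q:2 * q] = x[1:, q:2 * q]  # shift toward the previous moment (T-1)
    out[:, 2 * q:] = x[:, 2 * q:]     # remaining half unchanged (T0)
    return out
```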
4. The traffic scene risk assessment method based on the multi-branch convolutional neural network of claim 1, wherein the attention module is inserted into the residual convolutional layer and comprises a channel attention module and a spatial attention module; the channel attention module compresses the input features through two pooling layers, concatenates the results, and feeds them into a one-dimensional convolutional layer for learning, and the channel attention weights are obtained through an activation function; the calculation process can be expressed as:
Mc(F)=σ(Conv1d([AvgPool(F);MaxPool(F)]))
wherein AvgPool and MaxPool denote average pooling and maximum pooling operations; conv1d represents a one-dimensional convolution operation; σ denotes the activation function.
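A numpy sketch of the channel attention computation above; summing the two convolved descriptors stands in for the concatenation in the claim (an illustrative simplification), and the 1-D kernel is a free parameter that would be learned in practice:

```python
import numpy as np

def channel_attention(F, kernel):
    """F: (C, H, W). Average- and max-pool over the spatial dims to get two
    C-length descriptors, run a shared 1-D convolution across the channel
    axis on each, sum, and apply a sigmoid to obtain per-channel weights."""
    avg = F.mean(axis=(1, 2))                       # AvgPool -> (C,)
    mx = F.max(axis=(1, 2))                         # MaxPool -> (C,)
    k, pad = len(kernel), len(kernel) // 2
    def conv1d(v):
        vp = np.pad(v, pad)                         # same-length output
        return np.array([np.dot(vp[i:i + k], kernel) for i in range(len(v))])
    return 1.0 / (1.0 + np.exp(-(conv1d(avg) + conv1d(mx))))
```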
5. The traffic scene risk assessment method based on the multi-branch convolutional neural network as claimed in claim 4, wherein the spatial attention module compresses the input features through two pooling layers, concatenates the results, and feeds them into a convolutional layer for learning, and the spatial attention weights are obtained through an activation function; the calculation process can be expressed as:
Ms(F)=σ(Conv2d([AvgPool(F);MaxPool(F)]))
wherein AvgPool and MaxPool denote average pooling and maximum pooling operations; conv2d represents a two-dimensional convolution operation; σ denotes the activation function.
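A matching numpy sketch of the spatial attention computation; the (2, k, k) kernel is a free parameter that would be learned in practice:

```python
import numpy as np

def spatial_attention(F, kernel):
    """F: (C, H, W). Average- and max-pool across the channel axis into two
    (H, W) maps, stack them, convolve the stack with one (2, k, k) kernel
    (same-size output via zero padding), and apply a sigmoid to obtain a
    per-location weight map."""
    maps = np.stack([F.mean(axis=0), F.max(axis=0)])  # (2, H, W)
    _, k, _ = kernel.shape
    pad = k // 2
    mp = np.pad(maps, ((0, 0), (pad, pad), (pad, pad)))
    H, W = maps.shape[1:]
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(mp[:, i:i + k, j:j + k] * kernel)
    return 1.0 / (1.0 + np.exp(-out))
```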
6. The traffic scene risk assessment method based on the multi-branch convolutional neural network as claimed in claim 1, wherein the training of the multi-branch convolutional neural network model comprises: segmenting the video frame data of the training set at equal intervals, randomly selecting one frame from each segment, stacking the selected frames, and inputting them into the spatial branch network; and inputting the optical flow map data of the training set into the time branch network to optimize the parameters of the network model.
7. The traffic scene risk assessment method based on the multi-branch convolutional neural network as claimed in claim 1, wherein when the trained multi-branch convolutional neural network model is used for risk assessment, video frame data in a test set are segmented at equal intervals, and one frame with the largest change is selected from each segment and stacked, and input into the trained spatial branch network; and inputting the corresponding optical flow diagram into the trained time branch network, and fusing the characteristics of the two branches to obtain possible evaluation scores of the risks in the traffic scene.
8. Traffic scene risk assessment system based on multi-branch convolutional neural network, characterized by comprising:
a data set construction module: the method comprises the steps of obtaining a traffic scene video, intercepting a video frame sequence according to a preset frequency, calculating an optical flow graph between adjacent frames, and dividing the video frame sequence and the optical flow graph data into a training set and a testing set respectively;
a model construction module: constructing a multi-branch convolutional neural network model, and embedding a time shifting module and an attention module as a space branch network and a time branch network respectively;
a model training module: training the multi-branch convolutional neural network model by using the training set;
a risk assessment module: and inputting the test set into a trained multi-branch convolutional neural network model to generate an evaluation score of risk occurrence in a traffic scene.
9. An electronic device comprising an image acquisition apparatus, a memory, a processor and a computer program stored in the memory for execution, wherein the processor when executing the program implements the method for traffic scene risk assessment based on a multi-branch convolutional neural network as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the method for traffic scenario risk assessment based on a multi-branch convolutional neural network as set forth in any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010921517.XA CN112016499A (en) | 2020-09-04 | 2020-09-04 | Traffic scene risk assessment method and system based on multi-branch convolutional neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112016499A true CN112016499A (en) | 2020-12-01 |
Family
ID=73516926
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010921517.XA Pending CN112016499A (en) | 2020-09-04 | 2020-09-04 | Traffic scene risk assessment method and system based on multi-branch convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112016499A (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110084228A (en) * | 2019-06-25 | 2019-08-02 | 江苏德劭信息科技有限公司 | A kind of hazardous act automatic identifying method based on double-current convolutional neural networks |
Non-Patent Citations (5)
Title |
---|
CORCORAN, GARY-PATRICK, et al.: "Traffic Risk Assessment: A Two-Stream Approach Using Dynamic-Attention", 2019 16th Conference on Computer and Robot Vision (CRV) * |
JI LIN, et al.: "TSM: Temporal Shift Module for Efficient Video Understanding", arXiv:1811.08383v3 * |
SANGHYUN WOO, et al.: "CBAM: Convolutional Block Attention Module" * |
ZHANG Zaiteng, et al.: "A Visual Odometry Algorithm Based on Deep Learning", Laser & Optoelectronics Progress * |
WU Yuwei: "Fundamentals and Applications of Deep Learning", 30 April 2020 * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112464546A (en) * | 2020-12-14 | 2021-03-09 | 上海交通大学设计研究总院有限公司 | Public space pedestrian flow motion risk discrimination method based on dynamic data analysis |
CN112464546B (en) * | 2020-12-14 | 2024-03-19 | 上海交通大学设计研究总院有限公司 | Public space pedestrian flow movement risk judging method based on dynamic data analysis |
CN112508625A (en) * | 2020-12-18 | 2021-03-16 | 国网河南省电力公司经济技术研究院 | Intelligent inspection modeling method based on multi-branch residual attention network |
CN112849156A (en) * | 2021-04-25 | 2021-05-28 | 北京三快在线科技有限公司 | Driving risk identification method and device |
CN112849156B (en) * | 2021-04-25 | 2021-07-30 | 北京三快在线科技有限公司 | Driving risk identification method and device |
CN113283338A (en) * | 2021-05-25 | 2021-08-20 | 湖南大学 | Method, device and equipment for identifying driving behavior of driver and readable storage medium |
CN115953239A (en) * | 2023-03-15 | 2023-04-11 | 无锡锡商银行股份有限公司 | Surface examination video scene evaluation method based on multi-frequency flow network model |
CN116629465A (en) * | 2023-07-26 | 2023-08-22 | 成都源轮讯恒科技有限公司 | Smart power grids video monitoring and risk prediction response system |
CN116629465B (en) * | 2023-07-26 | 2024-01-12 | 李波 | Smart power grids video monitoring and risk prediction response system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112016499A (en) | Traffic scene risk assessment method and system based on multi-branch convolutional neural network | |
WO2022083784A1 (en) | Road detection method based on internet of vehicles | |
CN110084151B (en) | Video abnormal behavior discrimination method based on non-local network deep learning | |
CN110188807B (en) | Tunnel pedestrian target detection method based on cascading super-resolution network and improved Faster R-CNN | |
CN112101221B (en) | Method for real-time detection and identification of traffic signal lamp | |
CN110766098A (en) | Traffic scene small target detection method based on improved YOLOv3 | |
WO2023207742A1 (en) | Method and system for detecting anomalous traffic behavior | |
CN115223130B (en) | Multi-task panoramic driving perception method and system based on improved YOLOv5 | |
CN111428558A (en) | Vehicle detection method based on improved YOLOv3 method | |
CN112990065A (en) | Optimized YOLOv5 model-based vehicle classification detection method | |
CN112084928A (en) | Road traffic accident detection method based on visual attention mechanism and ConvLSTM network | |
CN113011322A (en) | Detection model training method and detection method for specific abnormal behaviors of monitoring video | |
CN113221716A (en) | Unsupervised traffic abnormal behavior detection method based on foreground object detection | |
CN115761599A (en) | Video anomaly detection method and system | |
CN112785610B (en) | Lane line semantic segmentation method integrating low-level features | |
CN112597996B (en) | Method for detecting traffic sign significance in natural scene based on task driving | |
CN113487889A (en) | Traffic state anti-disturbance generation method based on single intersection signal control of rapid gradient descent | |
CN111612803B (en) | Vehicle image semantic segmentation method based on image definition | |
Hao et al. | Aggressive lane-change analysis closing to intersection based on UAV video and deep learning | |
CN112288702A (en) | Road image detection method based on Internet of vehicles | |
CN116468994A (en) | Village and town shrinkage simulation method, system and device based on street view data | |
CN115661786A (en) | Small rail obstacle target detection method for area pre-search | |
Ye et al. | M2f2-net: Multi-modal feature fusion for unstructured off-road freespace detection | |
CN114255450A (en) | Near-field vehicle jamming behavior prediction method based on forward panoramic image | |
CN113673332A (en) | Object recognition method, device and computer-readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | |
Application publication date: 20201201 |