CN108921013B - Visual scene recognition system and method based on deep neural network


Info

Publication number
CN108921013B
CN108921013B
Authority
CN
China
Prior art keywords
scene
network
time
neural network
weather
Prior art date
Legal status
Active
Application number
CN201810472012.2A
Other languages
Chinese (zh)
Other versions
CN108921013A (en)
Inventor
缪其恒
王江明
许炜
Current Assignee
Zhejiang Zero Run Technology Co Ltd
Original Assignee
Zhejiang Leapmotor Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Leapmotor Technology Co Ltd filed Critical Zhejiang Leapmotor Technology Co Ltd
Priority to CN201810472012.2A priority Critical patent/CN108921013B/en
Publication of CN108921013A publication Critical patent/CN108921013A/en
Application granted granted Critical
Publication of CN108921013B publication Critical patent/CN108921013B/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/10: Terrestrial scenes
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods


Abstract

A visual scene recognition system based on a deep neural network, comprising: a vehicle-mounted vision system for acquiring forward-looking images of the vehicle; an offline training module for collecting samples from the forward-looking images acquired by the vehicle-mounted vision system, labeling the samples to generate sample labels, and training the parameters of a deep convolutional neural network step by step, the network consisting of three branch classification networks that share two shallow convolutional layers, with the network parameters trained on the samples and training tasks; and an online analysis module for performing real-time scene analysis with the network trained by the offline training module, using network compression and a time-sharing parallel analysis strategy, and outputting the time of day, the weather, and the scene anomaly state of the road scene in which the vehicle-mounted vision system is located.

Description

Visual scene recognition system and method based on deep neural network
Technical Field
The invention relates to the field of vehicle safety, in particular to a visual scene recognition system and method based on a deep neural network.
Background
Intelligence has become an important development direction for the automotive industry, and sensing technologies based on vision sensors are maturing and increasingly widely applied in the field of vehicle active safety. For a vision system, everything from image acquisition to the application-layer algorithms can be adjusted at the parameter and strategy level for different road environments, so accurately identifying the scene information in the visual input has strong application value and significance; in addition, anomaly diagnosis of the visual scene can further enhance the robustness and fault tolerance of the system.
Most existing vision systems lack sound basic algorithms for this. Time-of-day judgment is mostly based on the system clock, and weather judgment relies on auxiliary sensors such as rainfall sensors; neither method is strongly correlated with the vision system's input, so both perform poorly under some boundary conditions. Visual scene anomaly diagnosis is based on block brightness statistics of the visual input, for which thresholds are difficult to set and robustness is poor.
Disclosure of Invention
The invention aims to solve these problems of the prior art, in which time-of-day judgment is mostly based on a system clock and weather judgment on auxiliary sensors such as rainfall sensors, methods that are not strongly correlated with the vision system's input and therefore perform poorly under some boundary conditions, and provides a visual scene recognition system and method based on a deep neural network.
The technical solution adopted by the invention to solve this technical problem is as follows. A visual scene recognition system based on a deep neural network comprises: a vehicle-mounted vision system for acquiring forward-looking images of the vehicle; an offline training module for collecting samples from the forward-looking images acquired by the vehicle-mounted vision system, labeling the samples to generate sample labels, and training the neural network parameters step by step, wherein the deep convolutional neural network consists of three branch classification networks sharing two shallow convolutional layers and the network parameters are trained on the samples and training tasks; and an online analysis module for performing real-time scene analysis with the deep convolutional neural network trained by the offline training module, using network compression and a time-sharing parallel analysis strategy, and outputting the time of day, the weather, and the scene anomaly state of the road scene in which the vehicle-mounted vision system is located.
The invention uses a multi-task deep convolutional neural network to identify the time of day, the weather, and anomalous conditions of a visual scene. Because the input is strongly correlated with the vision system, the configuration parameters required by the system's low-level acquisition and upper-level applications can be set more accurately and robustly, further improving the perception capability and robustness of the vision system. The output can, on the one hand, provide prior information such as exposure parameters, detection classifier selection, and image processing thresholds to the image acquisition and application algorithms; on the other hand, it can provide the vision system with diagnosis of common input anomalies such as image blurring and lens occlusion.
Further, the deep convolutional neural network consists of three branch classification networks sharing two shallow convolutional layers: a time-of-day identification network, which takes the shared shallow features as input and outputs three time-of-day classes (day, dusk, and night); a weather identification network, which takes the shared shallow features as input and outputs five weather classes (sunny, cloudy, rainy, snowy, and foggy); and a scene anomaly identification network, which takes the shared shallow features as input and outputs five scene classes (normal, occluded, blurred, too dark, and overexposed).
Further, the offline training module comprises a sample collection and labeling unit and a step-by-step neural network parameter training unit. The sample collection and labeling unit acquires vehicle forward-looking images offline, extracts discrete time-series training samples, and augments the samples with spatial transformations; the sample distribution of each class of each task is balanced, and the samples are labeled to generate sample labels. The tasks comprise time-of-day, weather, and scene anomaly identification; the categories comprise the time-of-day category, the weather category, and the scene anomaly category. Sample labels for the time-of-day category: 0 = day, 1 = dusk, 2 = night; for the weather category: 0 = sunny, 1 = cloudy, 2 = rainy, 3 = snowy, 4 = foggy; and for the scene anomaly category: 0 = normal, 1 = occluded, 2 = blurred, 3 = too bright, 4 = too dark.
The step-by-step neural network parameter training unit performs classification-task training with cross entropy as the loss function. The shared feature layer parameters are trained first, with every task contributing equally to the weight update, namely:
Loss = 1/3*L_time + 1/3*L_weather + 1/3*L_abnormal.
The shared feature layer convolution parameters are then frozen, and each task updates the weight coefficients of its own branch network according to its own loss function.
Further, the online analysis module comprises a network compression unit and a time-sharing parallel analysis unit. The network compression unit quantizes and sparsifies the neural network parameters obtained by offline training, evaluates the output accuracy loss of the compressed network on the test set, and decides whether to retrain the quantized model according to the quantized accuracy. The time-sharing parallel analysis unit executes scene anomaly detection on every frame with priority, while time-of-day and weather detection run on alternate frames or with frame skipping; in a front-end application, a dual-thread or multi-thread parallel mode is adopted, in which one thread performs scene anomaly identification and another alternates between time-of-day and weather identification in a time-sharing manner, and requests from the scene anomaly identification network take priority on the convolutional neural network hardware acceleration unit.
The invention also provides a visual scene recognition method based on a deep neural network, in which a vehicle forward-looking image is collected by a vehicle-mounted vision system, a deep convolutional neural network is trained offline on the images, online scene analysis is then performed, and the time of day, the weather, and the scene anomaly state of the road scene in which the vehicle-mounted vision system is located are output. The deep convolutional neural network consists of three branch classification networks sharing two shallow convolutional layers, and the network parameters are trained on the samples and training tasks.
Further, the deep convolutional neural network consists of three branch classification networks sharing two shallow convolutional layers: a time-of-day identification network, which takes the shared shallow features as input and outputs three time-of-day classes (day, dusk, and night); a weather identification network, which takes the shared shallow features as input and outputs five weather classes (sunny, cloudy, rainy, snowy, and foggy); and a scene anomaly identification network, which takes the shared shallow features as input and outputs five scene classes (normal, occluded, blurred, too dark, and overexposed).
Further, the offline network training comprises sample collection and labeling, and step-by-step training of the neural network parameters. Sample collection and labeling means: vehicle forward-looking images are collected offline, discrete time-series training samples are extracted, and the samples are augmented with spatial transformations; the sample distribution of each class of each task is balanced, and the samples are labeled to generate sample labels. The tasks comprise time-of-day, weather, and scene anomaly identification; the categories comprise the time-of-day category, the weather category, and the scene anomaly category. Sample labels for the time-of-day category: 0 = day, 1 = dusk, 2 = night; for the weather category: 0 = sunny, 1 = cloudy, 2 = rainy, 3 = snowy, 4 = foggy; and for the scene anomaly category: 0 = normal, 1 = occluded, 2 = blurred, 3 = too bright, 4 = too dark. Step-by-step training of the neural network parameters means: classification-task training with cross entropy as the loss function, in which the shared feature layer parameters are trained first, with every task contributing equally to the weight update, namely:
Loss = 1/3*L_time + 1/3*L_weather + 1/3*L_abnormal;
the shared feature layer convolution parameters are then frozen, and each task updates the weight coefficients of its own branch network according to its own loss function.
Further, the online scene analysis comprises network compression and a time-sharing parallel analysis strategy. Network compression: the neural network parameters obtained by offline training are quantized and sparsified, with the quantized bit width and the degree of sparsification as configurable parameters; the output accuracy loss of the compressed network is evaluated on the test set, and whether to retrain the quantized model is decided according to the quantized accuracy. Time-sharing parallel analysis strategy: scene anomaly detection is executed on every frame with priority, while time-of-day and weather detection run on alternate frames or with frame skipping; in a front-end application, a dual-thread or multi-thread parallel mode is adopted, in which one thread performs scene anomaly identification and another alternates between time-of-day and weather identification in a time-sharing manner, and requests from the scene anomaly identification network take priority on the convolutional neural network hardware acceleration unit.
The substantial effects of the invention are as follows. A deep convolutional neural network identifies the scene information of the input image and can effectively recognize the current environment of the vehicle, including the time of day and the weather; by optimizing the related configuration parameters, the image acquisition quality of the vision system and the efficiency and effectiveness of the application algorithms can be improved. The method can also identify lens occlusion, blurring, and brightness anomalies in the visual input, effectively improving the fault diagnosis capability and fault tolerance of the vision system. The network structure is extensible: the scene classification results can be enriched by adding further classification task branches.
Drawings
FIG. 1 is a general diagram of a system architecture of the present invention;
FIG. 2 is a diagram of a multitasking scene recognition deep neural network architecture according to the present invention.
Detailed Description
The technical solution of the present invention is further described below through specific embodiments in conjunction with the accompanying drawings.
The invention provides a visual scene recognition system based on a deep neural network. Based on the visual input of a vehicle-mounted camera, it judges the driving environment of the vehicle and the scene state of the vision system, and provides basic configuration information for related algorithms; the system input is the forward-looking vehicle-mounted camera stream, and the output covers weather conditions, time-of-day conditions, camera occlusion, and the like. As shown in FIG. 1, the system comprises: a vehicle-mounted vision system for acquiring forward-looking images of the vehicle; an offline training module for collecting samples from the forward-looking images acquired by the vehicle-mounted vision system, labeling the samples to generate sample labels, and training the neural network parameters step by step to obtain a deep convolutional neural network; an online analysis module for performing real-time scene analysis with the deep convolutional neural network trained by the offline training module, based on the vehicle-mounted vision system input; and an output module for outputting the time of day, the weather, and the scene anomaly state of the road scene in which the vehicle-mounted vision system is located.
The offline training module comprises a sample collection and labeling unit and a step-by-step neural network parameter training unit. The sample collection and labeling unit acquires vehicle forward-looking images offline, extracts discrete time-series training samples, and augments the samples with spatial transformations; the sample distribution of each class of each task is balanced, and the samples are labeled to generate sample labels. The tasks comprise time-of-day, weather, and scene anomaly identification; the categories comprise the time-of-day category, the weather category, and the scene anomaly category. Sample labels for the time-of-day category: 0 = day, 1 = dusk, 2 = night; for the weather category: 0 = sunny, 1 = cloudy, 2 = rainy, 3 = snowy, 4 = foggy; and for the scene anomaly category: 0 = normal, 1 = occluded, 2 = blurred, 3 = too bright, 4 = too dark. The step-by-step neural network parameter training unit performs classification-task training with cross entropy as the loss function. The shared feature layer parameters are trained first, with every task contributing equally to the weight update, namely:
Loss = 1/3*L_time + 1/3*L_weather + 1/3*L_abnormal.
The shared feature layer convolution parameters are then frozen, and each task updates the weight coefficients of its own branch network according to its own loss function.
The online analysis module comprises a network compression unit and a time-sharing parallel analysis unit. The network compression unit quantizes and sparsifies the neural network parameters obtained by offline training, evaluates the output accuracy loss of the compressed network on the test set, and decides whether to retrain the quantized model according to the quantized accuracy. The time-sharing parallel analysis unit executes scene anomaly detection on every frame with priority, while time-of-day and weather detection run on alternate frames or with frame skipping; in a front-end application, a dual-thread or multi-thread parallel mode is adopted, in which one thread performs scene anomaly identification and another alternates between time-of-day and weather identification in a time-sharing manner, and requests from the scene anomaly identification network take priority on the convolutional neural network hardware acceleration unit.
The visual scene recognition method based on a deep neural network works as follows: a vehicle forward-looking image is collected by the vehicle-mounted vision system, and offline network training is performed on the collected images to obtain the deep convolutional neural network; based on the vehicle-mounted vision system input, online scene analysis is performed with the trained network, and the time of day, the weather, and the scene anomaly state of the road scene in which the vehicle-mounted vision system is located are output.
1. Multi-task deep neural network architecture: the deep convolutional neural network adopted by the invention consists of three branch classification networks sharing two shallow convolutional layers; the network architecture is shown in FIG. 2. The branch network structure is reused across tasks, and different network parameters are trained with different samples and training tasks. The branch networks are as follows (a code sketch follows the three descriptions below):
1.1 Time-of-day identification network: takes the shared shallow features as input and outputs three time-of-day (or time-of-day-like) classes: day, dusk, and night (including tunnels).
1.2 Weather identification network: takes the shared shallow features as input and outputs five weather classes: sunny, cloudy, rainy, snowy, and foggy.
1.3 Scene anomaly identification network: takes the shared shallow features as input and outputs five scene classes: normal, occluded, blurred, too dark, and overexposed.
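The patent does not publish layer dimensions, so the following PyTorch sketch of the shared-backbone, three-head structure is only a minimal illustration; the channel counts, kernel sizes, and branch depth are assumptions, not values from the patent.

import torch
import torch.nn as nn

class MultiTaskSceneNet(nn.Module):
    """Three branch classifiers sharing two shallow convolutional layers."""
    def __init__(self):
        super().__init__()
        # Two shared shallow convolutional layers (section 1).
        self.shared = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        def branch(num_classes):
            # The branch structure is reused; only the class count differs.
            return nn.Sequential(
                nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, num_classes),
            )
        self.time_head = branch(3)      # day / dusk / night
        self.weather_head = branch(5)   # sunny / cloudy / rainy / snowy / foggy
        self.abnormal_head = branch(5)  # normal / occluded / blurred / too dark / overexposed

    def forward(self, x):
        f = self.shared(x)              # shared shallow features
        return self.time_head(f), self.weather_head(f), self.abnormal_head(f)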
2. Offline network training: forward-looking driving scene data samples are collected and labeled accordingly, and the parameters of each branch network are trained separately for the three recognition tasks:
2.1 Sample collection and labeling: vehicle-mounted forward-looking driving scene data is collected offline and 100,000 discrete time-series training samples are extracted; the sample distribution of each class of each task is balanced, and the samples are labeled manually to generate sample labels. The label content comprises: the time-of-day category (0 = day, 1 = dusk, 2 = night), the weather category (0 = sunny, 1 = cloudy, 2 = rainy, 3 = snowy, 4 = foggy), and the scene anomaly category (0 = normal, 1 = occluded, 2 = blurred, 3 = too bright, 4 = too dark). The samples are expanded with spatial transformations of the image color gamut, geometry, and so on (if enough samples are collected, this step can be omitted). A sketch of the label schema and augmentation follows.
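As one illustration of this step, the label dictionaries below restate the schema above; the augmentation pipeline is a hedged assumption, since the patent only names color-gamut and geometric transformations, so the specific torchvision transforms and their parameters are placeholders.

from torchvision import transforms

TIME_LABELS = {0: "day", 1: "dusk", 2: "night"}
WEATHER_LABELS = {0: "sunny", 1: "cloudy", 2: "rainy", 3: "snowy", 4: "foggy"}
ABNORMAL_LABELS = {0: "normal", 1: "occluded", 2: "blurred", 3: "too bright", 4: "too dark"}

# Illustrative sample-expansion pipeline (exact transforms are not specified).
augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),  # color gamut
    transforms.RandomAffine(degrees=3, translate=(0.05, 0.05)),            # geometry
    transforms.ToTensor(),
])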
2.2 Step-by-step training of the network parameters: since these are classification tasks, cross entropy is used as the loss function.
The shared feature layer parameters are trained first, with every task contributing equally to the weight update, namely:
Loss = 1/3*L_time + 1/3*L_weather + 1/3*L_abnormal
The shared feature layer convolution parameters are then frozen, and each task updates the weight coefficients of its own branch network according to its own loss function, as sketched below.
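Continuing the MultiTaskSceneNet sketch above, the following hedged sketch shows the two training stages; the optimizer choice, learning rates, and the `loader` yielding (image, time, weather, abnormal) batches are assumptions. In stage two the branches are disjoint, so summing the three per-branch losses yields the same gradients as updating each branch with its own loss.

import torch.nn.functional as F

model = MultiTaskSceneNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stage 1: train shared layers and branches jointly,
# each task contributing 1/3 to the weight update.
for img, t, w, a in loader:
    p_t, p_w, p_a = model(img)
    loss = (F.cross_entropy(p_t, t) + F.cross_entropy(p_w, w)
            + F.cross_entropy(p_a, a)) / 3.0
    opt.zero_grad()
    loss.backward()
    opt.step()

# Stage 2: freeze ("solidify") the shared convolution parameters,
# then update only the branch weights with their own losses.
for p in model.shared.parameters():
    p.requires_grad = False
branch_opt = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
for img, t, w, a in loader:
    p_t, p_w, p_a = model(img)
    loss = F.cross_entropy(p_t, t) + F.cross_entropy(p_w, w) + F.cross_entropy(p_a, a)
    branch_opt.zero_grad()
    loss.backward()
    branch_opt.step()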
3. Online scene analysis: based on the vehicle-mounted visual input, real-time scene analysis is performed with the deep convolutional neural network trained in step 2, using network compression and a time-sharing parallel analysis strategy.
3.1 Network compression: the neural network parameters trained in step 2 are quantized (to 8 or 16 bits) and sparsified (by 20-50%); the quantized bit width and the degree of sparsification are configurable parameters. The output accuracy loss of the compressed network is evaluated on the test set, and whether to retrain the quantized model is decided according to the quantized accuracy. A sketch of one possible compression pass follows.
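The patent names no compression toolkit, so the sketch below (continuing the imports above) is an assumption: global magnitude pruning via torch.nn.utils.prune for the sparsification, and PyTorch dynamic int8 quantization of the linear layers for the quantization step.

import torch.nn.utils.prune as prune

def compress(model, sparsity=0.3):
    # Sparsify: zero the smallest-magnitude weights globally (20-50% in 3.1).
    targets = [(m, "weight") for m in model.modules()
               if isinstance(m, (nn.Conv2d, nn.Linear))]
    prune.global_unstructured(targets,
                              pruning_method=prune.L1Unstructured,
                              amount=sparsity)
    for m, name in targets:
        prune.remove(m, name)  # make the pruning permanent
    # Quantize: 8-bit dynamic quantization of the fully connected layers.
    return torch.quantization.quantize_dynamic(model, {nn.Linear},
                                               dtype=torch.qint8)

The compressed model would then be scored on the test set against the uncompressed baseline to decide whether quantization-aware retraining is needed.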
3.2 Time-sharing parallel analysis strategy: scene anomaly detection has high priority and is executed on every frame, while the time-of-day and weather results are updated less frequently and can run on alternate frames or with frame skipping. In a front-end application, a dual-thread (or multi-thread) parallel mode can be adopted: one thread performs scene anomaly identification while the other alternates between time-of-day and weather identification in a time-sharing manner, and requests from the scene anomaly identification network take priority on the convolutional neural network hardware acceleration unit, as sketched below.
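A minimal dual-thread sketch of this strategy, continuing the sketches above and under stated assumptions: `frame_source` is a hypothetical iterator of camera-frame tensors, the skip interval is illustrative, and hardware-acceleration priority is handled outside this snippet.

import itertools
import queue
import threading

anomaly_q = queue.Queue(maxsize=1)   # high-priority task: every frame
slow_q = queue.Queue(maxsize=1)      # time-shared tasks: skipped frames

def put_latest(q, item):
    # Drop any stale frame so consumers always see the newest one.
    try:
        q.get_nowait()
    except queue.Empty:
        pass
    q.put(item)

def produce(frame_source, skip=5):
    for i, img in enumerate(frame_source):
        put_latest(anomaly_q, img)           # anomaly detection on every frame
        if i % skip == 0:
            put_latest(slow_q, img)          # frame-skipping for time/weather

def anomaly_worker(model):
    while True:
        img = anomaly_q.get()
        _, _, p_abn = model(img.unsqueeze(0))
        print("abnormal:", ABNORMAL_LABELS[p_abn.argmax().item()])

def slow_worker(model):
    # Alternate the two low-frequency tasks in a time-sharing manner.
    for task in itertools.cycle(("time", "weather")):
        img = slow_q.get()
        p_time, p_weather, _ = model(img.unsqueeze(0))
        if task == "time":
            print("time:", TIME_LABELS[p_time.argmax().item()])
        else:
            print("weather:", WEATHER_LABELS[p_weather.argmax().item()])

threading.Thread(target=anomaly_worker, args=(model,), daemon=True).start()
threading.Thread(target=slow_worker, args=(model,), daemon=True).start()

The above-described embodiment is only a preferred embodiment of the present invention and is not intended to limit the invention in any way; other variations and modifications may be made without departing from the scope of the invention as set forth in the claims.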

Claims (6)

1. A visual scene recognition system based on a deep neural network, comprising:
the vehicle-mounted vision system is used for acquiring a vehicle forward-looking vision image;
an offline training module for collecting samples, using a deep convolutional neural network, from the vehicle forward-looking images acquired by the vehicle-mounted vision system, labeling the samples to generate sample labels, and training the neural network parameters step by step, wherein the deep convolutional neural network consists of three branch classification networks sharing two shallow convolutional layers, and the network parameters are trained on the samples and training tasks; and
an online analysis module for performing real-time scene analysis with the deep convolutional neural network trained by the offline training module, using network compression and a time-sharing parallel analysis strategy, and outputting the time of day, the weather, and the scene anomaly state of the road scene in which the vehicle-mounted vision system is located;
wherein the offline training module comprises a sample collection and labeling unit and a step-by-step neural network parameter training unit;
the sample collection and labeling unit acquires vehicle forward-looking images offline, extracts discrete time-series training samples, and augments the samples with spatial transformations; the sample distribution of each class of each task is balanced, and the samples are labeled to generate sample labels;
the tasks comprise: time-of-day, weather, and scene anomaly identification;
the categories comprise: a time-of-day category, a weather category, and a scene anomaly category;
sample labels for the time-of-day category: 0 = day, 1 = dusk, 2 = night;
sample labels for the weather category: 0 = sunny, 1 = cloudy, 2 = rainy, 3 = snowy, 4 = foggy; and
sample labels for the scene anomaly category: 0 = normal, 1 = occluded, 2 = blurred, 3 = too bright, 4 = too dark;
the step-by-step neural network parameter training unit performs classification-task training with cross entropy as the loss function, in which the shared feature layer parameters are trained first, with every task contributing equally to the weight update, namely:
Loss = 1/3*L_time + 1/3*L_weather + 1/3*L_abnormal;
the shared feature layer convolution parameters are then frozen, and each task updates the weight coefficients of its own branch network according to its own loss function.
2. The visual scene recognition system based on a deep neural network of claim 1, wherein the deep convolutional neural network consists of three branch classification networks sharing two shallow convolutional layers, comprising:
a time-of-day identification network, which takes the shared shallow features as input and outputs three time-of-day classes: day, dusk, and night;
a weather identification network, which takes the shared shallow features as input and outputs five weather classes: sunny, cloudy, rainy, snowy, and foggy; and
a scene anomaly identification network, which takes the shared shallow features as input and outputs five scene classes: normal, occluded, blurred, too dark, and overexposed.
3. The visual scene recognition system based on a deep neural network of claim 1, wherein the online analysis module comprises a network compression unit and a time-sharing parallel analysis unit;
the network compression unit quantizes and sparsifies the neural network parameters obtained by offline training, evaluates the output accuracy loss of the compressed network on the test set, and decides whether to retrain the quantized model according to the quantized accuracy;
the time-sharing parallel analysis unit executes scene anomaly detection on every frame with priority, while time-of-day and weather detection run on alternate frames or with frame skipping; in a front-end application, a dual-thread or multi-thread parallel mode is adopted, in which one thread performs scene anomaly identification and another alternates between time-of-day and weather identification in a time-sharing manner, and requests from the scene anomaly identification network take priority on the convolutional neural network hardware acceleration unit.
4. A visual scene recognition method based on a deep neural network, wherein a vehicle forward-looking image is collected by a vehicle-mounted vision system, a deep convolutional neural network is trained offline on the images, online scene analysis is then performed, and the time of day, the weather, and the scene anomaly state of the road scene in which the vehicle-mounted vision system is located are output;
the deep convolutional neural network consists of three branch classification networks sharing two shallow convolutional layers, and the network parameters are trained on the samples and training tasks;
the offline network training comprises: sample collection and labeling, and step-by-step training of the neural network parameters;
sample collection and labeling means: vehicle forward-looking images are collected offline, discrete time-series training samples are extracted, and the samples are augmented with spatial transformations; the sample distribution of each class of each task is balanced, and the samples are labeled to generate sample labels;
the tasks comprise: time-of-day, weather, and scene anomaly identification; the categories comprise: a time-of-day category, a weather category, and a scene anomaly category;
sample labels for the time-of-day category: 0 = day, 1 = dusk, 2 = night;
sample labels for the weather category: 0 = sunny, 1 = cloudy, 2 = rainy, 3 = snowy, 4 = foggy; and
sample labels for the scene anomaly category: 0 = normal, 1 = occluded, 2 = blurred, 3 = too bright, 4 = too dark;
step-by-step training of the neural network parameters means: classification-task training with cross entropy as the loss function, in which the shared feature layer parameters are trained first, with every task contributing equally to the weight update, namely:
Loss = 1/3*L_time + 1/3*L_weather + 1/3*L_abnormal;
the shared feature layer convolution parameters are then frozen, and each task updates the weight coefficients of its own branch network according to its own loss function.
5. The visual scene recognition method based on a deep neural network of claim 4, wherein the deep convolutional neural network consists of three branch classification networks sharing two shallow convolutional layers, comprising:
a time-of-day identification network, which takes the shared shallow features as input and outputs three time-of-day classes: day, dusk, and night;
a weather identification network, which takes the shared shallow features as input and outputs five weather classes: sunny, cloudy, rainy, snowy, and foggy; and
a scene anomaly identification network, which takes the shared shallow features as input and outputs five scene classes: normal, occluded, blurred, too dark, and overexposed.
6. The visual scene recognition method based on a deep neural network, wherein the online scene analysis comprises network compression and a time-sharing parallel analysis strategy;
the network compression is: the neural network parameters obtained by offline training are quantized and sparsified, with the quantized bit width and the degree of sparsification as configurable parameters; the output accuracy loss of the compressed network is evaluated on the test set, and whether to retrain the quantized model is decided according to the quantized accuracy;
the time-sharing parallel analysis strategy is: scene anomaly detection is executed on every frame with priority, while time-of-day and weather detection run on alternate frames or with frame skipping; in a front-end application, a dual-thread or multi-thread parallel mode is adopted, in which one thread performs scene anomaly identification and another alternates between time-of-day and weather identification in a time-sharing manner, and requests from the scene anomaly identification network take priority on the convolutional neural network hardware acceleration unit.
CN201810472012.2A 2018-05-16 2018-05-16 Visual scene recognition system and method based on deep neural network Active CN108921013B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810472012.2A CN108921013B (en) 2018-05-16 2018-05-16 Visual scene recognition system and method based on deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810472012.2A CN108921013B (en) 2018-05-16 2018-05-16 Visual scene recognition system and method based on deep neural network

Publications (2)

Publication Number Publication Date
CN108921013A CN108921013A (en) 2018-11-30
CN108921013B (en) 2020-08-18

Family

ID=64402563

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810472012.2A Active CN108921013B (en) 2018-05-16 2018-05-16 Visual scene recognition system and method based on deep neural network

Country Status (1)

Country Link
CN (1) CN108921013B (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858372B (en) * 2018-12-29 2021-04-27 浙江零跑科技有限公司 Lane-level precision automatic driving structured data analysis method
CN109740551A (en) * 2019-01-09 2019-05-10 贵州宽凳智云科技有限公司 A kind of night Lane detection method and system based on computer vision
CN109934096B (en) * 2019-01-22 2020-12-11 浙江零跑科技有限公司 Automatic driving visual perception optimization method based on characteristic time sequence correlation
CN109784298A (en) * 2019-01-28 2019-05-21 南京航空航天大学 A kind of outdoor on-fixed scene weather recognition methods based on deep learning
CN109766873B (en) * 2019-02-01 2021-04-06 中国人民解放军陆军工程大学 Pedestrian re-identification method based on hybrid deformable convolution
CN110135261A (en) * 2019-04-15 2019-08-16 北京易华录信息技术股份有限公司 A kind of method and system of trained road anomalous identification model, road anomalous identification
CN110119768B (en) * 2019-04-24 2023-10-31 苏州感测通信息科技有限公司 Visual information fusion system and method for vehicle positioning
CN110186471A (en) * 2019-05-06 2019-08-30 平安科技(深圳)有限公司 Air navigation aid, device, computer equipment and storage medium based on history video
CN111950572A (en) * 2019-05-14 2020-11-17 北京字节跳动网络技术有限公司 Method, apparatus, electronic device and computer-readable storage medium for training classifier
CN110543600A (en) * 2019-09-11 2019-12-06 上海携程国际旅行社有限公司 Search ranking method, system, device and storage medium based on neural network
CN111259719B (en) * 2019-10-28 2023-08-25 浙江零跑科技股份有限公司 Cab scene analysis method based on multi-view infrared vision system
CN113011216B (en) * 2019-12-19 2024-04-02 合肥君正科技有限公司 Multi-classification threshold self-adaptive shielding detection method
CN111178253B (en) * 2019-12-27 2024-02-27 佑驾创新(北京)技术有限公司 Visual perception method and device for automatic driving, computer equipment and storage medium
CN111797763A (en) * 2020-07-02 2020-10-20 北京灵汐科技有限公司 Scene recognition method and system
CN111967577B (en) * 2020-07-29 2024-04-05 华北电力大学 Energy Internet scene generation method based on variation self-encoder
CN111950656B (en) * 2020-08-25 2021-06-25 深圳思谋信息科技有限公司 Image recognition model generation method and device, computer equipment and storage medium
CN112669316B (en) * 2021-01-29 2023-05-30 南方电网调峰调频发电有限公司 Power production abnormality monitoring method, device, computer equipment and storage medium
CN113252058B (en) * 2021-05-24 2024-06-28 北京航迹科技有限公司 IMU data processing method, system, device and storage medium
CN115107832A (en) * 2022-08-09 2022-09-27 中车唐山机车车辆有限公司 Train headlamp control method, vehicle-mounted terminal, control system and storage medium
CN115631482B (en) * 2022-11-30 2023-04-04 广汽埃安新能源汽车股份有限公司 Driving perception information acquisition method and device, electronic equipment and readable medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9734425B2 (en) * 2015-02-11 2017-08-15 Qualcomm Incorporated Environmental scene condition detection
CN105069472B (en) * 2015-08-03 2018-07-27 电子科技大学 A kind of vehicle checking method adaptive based on convolutional neural networks
CN105184271A (en) * 2015-09-18 2015-12-23 苏州派瑞雷尔智能科技有限公司 Automatic vehicle detection method based on deep learning
CN105539326A (en) * 2015-12-24 2016-05-04 百利得汽车主动安全系统(苏州)有限公司 Weather identification system and method based on vehicle-mounted camera
CN105575119B (en) * 2015-12-29 2018-06-19 大连楼兰科技股份有限公司 Road conditions weather deep learning and recognition methods and device
US20180040248A1 (en) * 2016-08-04 2018-02-08 Herman L. Bigham Anti-Collision, Accident Prevention Car Safety Systems to Provide Visibility when Passing, Pulling or Backing out of Parking Spaces/Driveways and Entering or Crossing Intersections

Also Published As

Publication number Publication date
CN108921013A (en) 2018-11-30

Similar Documents

Publication Publication Date Title
CN108921013B (en) Visual scene recognition system and method based on deep neural network
CN108304813B (en) Method for intelligently identifying state of round traffic signal lamp
CN110310241B (en) Method for defogging traffic image with large air-light value by fusing depth region segmentation
CN103258332B (en) A kind of detection method of the moving target of resisting illumination variation
EP2549759B1 (en) Method and system for facilitating color balance synchronization between a plurality of video cameras as well as method and system for obtaining object tracking between two or more video cameras
Zhang et al. Bridge damage detection using a single-stage detector and field inspection images
CN110866879A (en) Image rain removing method based on multi-density rain print perception
CN114757925A (en) Non-contact type high-voltage circuit breaker defect detection method and system
CN111161160A (en) Method and device for detecting obstacle in foggy weather, electronic equipment and storage medium
CN212009589U (en) Video identification driving vehicle track acquisition device based on deep learning
Kim et al. Deep learning based effective surveillance system for low-illumination environments
CN114359196A (en) Fog detection method and system
CN114663352A (en) High-precision detection method and system for defects of power transmission line and storage medium
CN113177528A (en) License plate recognition method and system based on multi-task learning strategy training network model
CN112232226A (en) Method and system for detecting target object through discriminant model
CN113689399B (en) Remote sensing image processing method and system for power grid identification
Li et al. Multiple linear regression haze-removal model based on dark channel prior
CN116309407A (en) Method for detecting abnormal state of railway contact net bolt
CN112766174B (en) Railway train carriage group bottom plate loss fault detection method
Wang et al. Low-light traffic objects detection for automated vehicles
CN114140698A (en) Water system information extraction algorithm based on FasterR-CNN
CN111275027A (en) Method for realizing detection and early warning processing of expressway in foggy days
CN111626339A (en) Method for detecting abnormal die cavity of injection molding machine with light shadow and jitter influence resistance
CN117115097B (en) TEDS detection method and system based on anomaly detection
Sakaino et al. DeepSnow/Rain: Light Weather Recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 310051 1st and 6th floors, no.451 Internet of things street, Binjiang District, Hangzhou City, Zhejiang Province

Patentee after: Zhejiang Zero run Technology Co.,Ltd.

Address before: 310051 1st and 6th floors, no.451 Internet of things street, Binjiang District, Hangzhou City, Zhejiang Province

Patentee before: ZHEJIANG LEAPMOTOR TECHNOLOGY Co.,Ltd.