CN114418895A - Driving assistance method and device, vehicle-mounted device and storage medium - Google Patents

Driving assistance method and device, vehicle-mounted device and storage medium

Info

Publication number
CN114418895A
CN114418895A (application CN202210083093.3A)
Authority
CN
China
Prior art keywords
image
model
visible light
training
infrared
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210083093.3A
Other languages
Chinese (zh)
Inventor
陈洋
王汝卓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Infiray Technologies Co Ltd
Original Assignee
Infiray Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Infiray Technologies Co Ltd
Priority to CN202210083093.3A
Publication of CN114418895A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/90 Dynamic range modification of images or parts thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10048 Infrared image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The embodiment of the application provides a driving assistance method and device based on infrared images, vehicle-mounted equipment and a storage medium, wherein the driving assistance method comprises the following steps: acquiring an original visible light image of a driving scene, performing image enhancement on the original visible light image to obtain an enhanced visible light image, and forming a sample set from the enhanced visible light image and the original visible light image; labeling the targets contained in the images of the sample set to obtain a training set containing target labels; training an initial image model on the training set to obtain a trained image recognition model; and acquiring driving scene infrared images collected in real time by an infrared shooting device, performing target detection on the driving scene infrared images through the image recognition model, and outputting the target detection results of the driving scene infrared images.

Description

Driving assistance method and device, vehicle-mounted device and storage medium
Technical Field
The present disclosure relates to the field of intelligent driving technologies, and in particular, to a driving assistance method and apparatus based on infrared images, a vehicle-mounted device, and a computer-readable storage medium.
Background
An autonomous vehicle is a motor vehicle that relies mainly on artificial intelligence, visual computing, radar, GPS positioning and road-vehicle cooperation technologies, so that the vehicle has environment perception, path planning and autonomous control capabilities and can be operated automatically by an embedded edge computing terminal. Since 2020, the concepts of L2 and L3 level automated driving have become increasingly popular within the industry, and 2021 is regarded by many experts as an important node at which automated driving technology entered the L3 level. What distinguishes an autonomous vehicle from a traditionally human-driven vehicle is the core role of AI technology: during driving, a computer continuously collects various kinds of information, analyzes it and learns from it through machine learning or deep learning, and then controls the vehicle, forming a systems-engineering solution in which the computer drives the vehicle automatically. The goal of an autonomous vehicle is to drive safely under any road condition and in any environment and to react to extreme situations in time, so as to guarantee the life safety of the driver, the safety of the vehicle and the safety of other road participants. Thermal imaging can observe in complete darkness and under various weather conditions, which directly addresses the requirements of 24-hour all-weather monitoring and of monitoring under zero illumination and through smoke, rain and fog for autonomous vehicles, and therefore has broad application prospects.
However, currently known neural network models usually perform well only in visible light scenes and perform poorly on infrared data sets, while re-collecting a large amount of infrared image data to retrain a neural network model requires considerable labor and time costs, which limits practical application and popularization.
Disclosure of Invention
In order to solve the existing technical problems, the application provides a driving assistance method and device based on infrared images, vehicle-mounted equipment and a computer readable storage medium, wherein the driving assistance method and device based on infrared images can reduce labor and time cost, and improve driving safety without depending on illumination conditions.
In order to achieve the above purpose, the technical solution of the embodiment of the present application is implemented as follows:
in a first aspect, an embodiment of the present application provides a driving assistance method based on an infrared image, which is applied to a vehicle-mounted device, and includes:
acquiring an original visible light image of a driving scene, carrying out image enhancement on the original visible light image to obtain an enhanced visible light image, and forming a sample set according to the enhanced visible light image and the original visible light image;
labeling targets contained in the images in the sample set to obtain a training set containing target labels;
training an initial image model based on the training set to obtain a trained image recognition model;
and acquiring a driving scene infrared image acquired by an infrared shooting device in real time, carrying out target detection on the driving scene infrared image through the image recognition model, and outputting a target detection result of the driving scene infrared image.
In a second aspect, an embodiment of the present application provides a driving assistance apparatus based on an infrared image, including:
the system comprises a sample acquisition module, a sample acquisition module and a sample processing module, wherein the sample acquisition module is used for acquiring an original visible light image of a driving scene, carrying out image enhancement on the original visible light image to obtain an enhanced visible light image, and forming a sample set according to the enhanced visible light image and the original visible light image;
the marking module is used for marking targets contained in the images in the sample set to obtain a training set containing target marks;
the training module is used for training an initial image model based on the training set to obtain a trained image recognition model;
and the target identification module is used for acquiring the driving scene infrared image acquired by the infrared shooting device in real time, carrying out target detection on the driving scene infrared image through the image recognition model, and outputting a target detection result of the driving scene infrared image.
In a third aspect, an embodiment of the present application provides an in-vehicle device, which includes a processor, an infrared shooting device connected to the processor, a memory, and a computer program stored on the memory and executable by the processor, where the computer program, when executed by the processor, implements the driving assistance method based on infrared images according to any embodiment of the present application.
In a fourth aspect, this application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the driving assistance method based on infrared images according to any embodiment of this application.
In the driving assistance method based on infrared images provided in the above embodiment, an original visible light image of a driving scene is obtained and image-enhanced to obtain an enhanced visible light image; a sample set is formed from the enhanced visible light image and the original visible light image; the targets contained in the images of the sample set are labeled to obtain a training set containing target labels; an initial image model is trained on this training set to obtain a trained image recognition model; infrared images of the driving scene acquired in real time by the infrared shooting device are then subjected to target detection by the trained image recognition model, and the target detection results are output. Amplifying the sample images by image enhancement of the original visible light images, on the one hand, alleviates the over-fitting problem of training a neural network model with small samples; on the other hand, it reduces the difference between visible light image data and infrared image data, so that a neural network model that performs well on visible light scenes can be transferred to the recognition of infrared scene images and still perform well.
In the above embodiments, the driving assistance device based on infrared images, the vehicle-mounted device, the computer-readable storage medium and the corresponding driving assistance method embodiment based on infrared images belong to the same concept, so that the driving assistance device and the vehicle-mounted device respectively have the same technical effects as the corresponding driving assistance method embodiment based on infrared images, and are not described herein again.
Drawings
FIG. 1 is an architecture diagram of an alternative application scenario of an infrared image-based driving assistance method in an embodiment;
FIG. 2 is a flow chart of a driving assistance method according to an embodiment;
FIG. 3 is a schematic diagram of a network structure of an initial neural network model in one embodiment;
FIG. 4 is a diagram of an initial image model of a teacher-student architecture in one embodiment;
FIG. 5 is a diagram illustrating training of an initial image model using an ablation training strategy in one embodiment;
FIG. 6 is a comparison graph of the effect of using a YOLOP pre-training model and using the image recognition model trained in the present application to perform target detection on real-time acquired infrared images of a driving scene;
FIG. 7 is another comparison graph of the results of target detection on real-time acquired infrared images of a driving scene using a YOLOP pre-training model and using the image recognition model trained in the present application, respectively;
FIG. 8 is a comparison graph of another effect of using a YOLOP pre-training model and using the image recognition model trained in the present application to perform target detection on real-time acquired infrared images of a driving scene;
FIG. 9 is a flowchart of a driving assistance method in an alternative embodiment;
FIG. 10 is a schematic structural diagram of a driving assistance apparatus according to an embodiment;
FIG. 11 is a schematic structural diagram of an in-vehicle device according to an embodiment.
Detailed Description
The technical solution of the present application is further described in detail with reference to the drawings and specific embodiments of the specification.
In order to make the objectives, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the attached drawings, the described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present application.
In the following description, reference is made to the expression "some embodiments" which describe a subset of all possible embodiments, it being noted that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the following description, the terms "first", "second" and "third" are used only to distinguish similar items and do not denote a particular order; it should be understood that such designations may be interchanged where appropriate, so that the embodiments of the application described herein can be implemented in an order other than that illustrated or described herein.
Referring to fig. 1, which shows an optional application scenario of the driving assistance method based on infrared images according to the embodiment of the present application, the vehicle-mounted device includes a main control system 13 and an infrared shooting device 11 communicatively connected to the main control system 13. The main control system 13 mainly includes a memory, a processor, and a display and an input device communicatively connected to the processor, and is loaded with a computer program implementing the driving assistance method based on infrared images provided in the embodiment of the present application. Optionally, the vehicle-mounted device further includes a display alarm module 12 communicatively connected to the main control system 13. The vehicle-mounted device is arranged on a vehicle. When the vehicle runs on a lane, the infrared shooting device 11 collects infrared images of the surroundings of the running vehicle in real time to form driving scene infrared images and sends them to the main control system 13. The main control system 13 performs target detection on the driving scene infrared images through an image recognition model, forming a target detection result by identifying the drivable area, the lane lines, and the types and positions of target objects in the infrared images. According to the target detection result, a suspected collision target that is close to the vehicle and may collide with it can be determined; if the risk of collision with the suspected collision target is determined to exceed a certain value, a control instruction can be sent to the display alarm module 12 to make it issue collision warning information, so that the driver becomes aware of the risk in advance and can act in time to avoid the collision or mitigate its severity.
Referring to fig. 2, a driving assistance method based on infrared images according to an embodiment of the present application is applied to the vehicle-mounted device shown in fig. 1, and the driving assistance method includes the following steps:
s101, obtaining an original visible light image of a driving scene, carrying out image enhancement on the original visible light image to obtain an enhanced visible light image, and forming a sample set according to the enhanced visible light image and the original visible light image.
The original visible light image of the driving scene may be visible light picture data and/or visible light video data captured in real time by a visible light camera during daytime driving. Because visible light images of driving scenes are collected while the vehicle drives in the daytime, there are more sources and more ways to obtain them. Image enhancement is performed on the original visible light images to obtain enhanced visible light images, and a sample set is formed from the enhanced and original visible light images, so that the sample image data set is expanded.
The enhanced visible light image is obtained by enhancing the original visible light image in a preset image enhancement mode. Image enhancement is an image processing method that purposefully emphasizes global or local characteristics of an image: it strengthens useful information, turns an originally unclear image into a clear one or emphasizes certain features of interest, enlarges the differences between the features of different objects, suppresses features of no interest, improves image quality and enriches the information content, thereby strengthening image interpretation and recognition and meeting the needs of particular analyses. The original visible light image is enhanced in the preset image enhancement mode to obtain the enhanced visible light image, and the sample set is formed from the enhanced visible light image and the original visible light image.
And S103, labeling the targets contained in the images in the sample set to obtain a training set containing target labels.
The targets contained in an image may include the various objects that the image recognition model is expected to detect, analyze and identify, such as lanes, drivable areas, people, vehicles and other objects within the driving scene. Alternatively, the targets may refer only to objects in the driving scene that could pose a collision risk to the vehicle during driving, such as people, vehicles and other objects on the road such as road blocks. The targets contained in the images of the sample set may be labeled manually or automatically with a preset labeling tool. The training set containing target labels is obtained by collecting original visible light images of various driving scenes captured by image acquisition equipment on a vehicle during daytime driving, enhancing them to obtain enhanced visible light images, and labeling the various targets in both the original and the enhanced visible light images.
In an optional specific example, the image capturing device is installed at the front grille, and visible light image data of expressway, national road, urban and suburban scenes are captured in different regions, different seasons and different weather conditions, yielding tens of thousands of hours of video files. Frames are extracted from the video files in proportion to obtain pictures, and the pictures are strictly cleaned to remove data without targets and highly similar repeated scenes, giving the original visible light images of driving scenes. Image enhancement is then applied to the original visible light images in a preset image enhancement mode to obtain the corresponding enhanced visible light images, and the targets in both the original and enhanced visible light images are labeled manually. In this embodiment, target labeling includes segmentation labeling of the drivable area in the driving scene image, segmentation labeling of the lane lines, and labeling of the type and position of target objects that could pose a collision risk to the vehicle in the lane. The target objects fall into two broad categories, pedestrians and vehicles: pedestrians are divided into person, cyclist and rider, corresponding respectively to ordinary pedestrians, bicycle riders, and riders of electric vehicles and motorcycles; vehicles are divided into car, bus, truck and vehicle, corresponding respectively to cars, buses, trucks and other types of vehicles.
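As a concrete illustration of the label schema just described, the following minimal Python sketch collects the class names named in this embodiment into one container; the annotation data structure itself (field names, box format) is an assumption for illustration and is not specified by the patent.

```python
from dataclasses import dataclass, field
from typing import List

# Object classes named in this embodiment: pedestrians split into person / cyclist / rider,
# vehicles split into car / bus / truck / vehicle (other).
OBJECT_CLASSES = ["person", "cyclist", "rider", "car", "bus", "truck", "vehicle"]

@dataclass
class FrameAnnotation:
    image_path: str
    boxes: List[List[float]] = field(default_factory=list)   # one [x1, y1, x2, y2] per object
    class_ids: List[int] = field(default_factory=list)       # index into OBJECT_CLASSES
    drivable_mask_path: str = ""                              # segmentation label of the drivable area
    lane_mask_path: str = ""                                  # segmentation label of the lane lines
```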
And S105, training the initial image model based on the training set to obtain a trained image recognition model.
The initial image model may employ various known convolutional neural network models, deep convolutional neural network models, and the like. In an alternative example, referring to fig. 3, the initial image model adopts the YOLOP (panoptic driving perception) convolutional neural network model, whose architecture may include a Backbone network, a Neck layer and detection heads (Detect head). The Backbone extracts feature information of the targets in the image and feeds the feature maps into the Neck; the Neck fuses feature maps from different backbone layers, combining shallow and deep networks to strengthen the feature fusion capability; the Detect heads generate bounding boxes and classes of the predicted targets. The Detect heads serve as the output layer of the image recognition model, with one head per type of detection output. Since the target labels include segmentation labels of the drivable area in the driving scene image, segmentation labels of the lane lines, and labels of the type and position of target objects that could pose a collision risk to the vehicle, the Detect heads correspondingly comprise a drivable area segmentation head, which outputs a Mask matrix close in size to the input picture; a lane line segmentation head, which also outputs a Mask matrix close in size to the input picture; and a target detection head, which outputs a matrix of preset size.
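To make the backbone-neck-three-heads layout concrete, here is a minimal PyTorch sketch; it is not the YOLOP authors' code, the module contents are placeholders, and all channel numbers are assumptions.

```python
import torch
import torch.nn as nn

class PanopticDrivingNet(nn.Module):
    """Backbone -> Neck -> three detection heads, mirroring the YOLOP layout described above."""
    def __init__(self, num_classes: int = 7):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.BatchNorm2d(32), nn.LeakyReLU(0.1))
        self.neck = nn.Sequential(nn.Conv2d(32, 64, 3, 2, 1), nn.BatchNorm2d(64), nn.LeakyReLU(0.1))
        self.det_head = nn.Conv2d(64, 3 * (5 + num_classes), 1)   # box + objectness + class scores
        self.drivable_head = nn.Conv2d(64, 2, 1)                  # drivable-area mask logits
        self.lane_head = nn.Conv2d(64, 2, 1)                      # lane-line mask logits

    def forward(self, x):
        feats = self.neck(self.backbone(x))
        return self.det_head(feats), self.drivable_head(feats), self.lane_head(feats)
```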
S107, acquiring a driving scene infrared image acquired by an infrared shooting device in real time, carrying out target detection on the driving scene infrared image through the image recognition model, and outputting a target detection result of the driving scene infrared image.
The driving scene infrared image is an infrared image of the environment in which the vehicle is currently driving; when the vehicle is driving on a lane, it is an infrared image of a certain range of the area around the vehicle on that lane. Infrared images of the driving scene are acquired while the vehicle is moving, and roads, buildings, other vehicles, people and objects close to the vehicle and in front of it can be captured in them. By detecting and analyzing targets in the infrared images acquired in real time, the risk of a collision during driving can be predicted from the target detection result and a warning issued before the collision occurs; alternatively, the target detection result can be used to optimize and prompt the driving operation while the vehicle runs on the road, or to realize intelligent driving control of the vehicle. The target detection result includes a drivable area segmentation result, a lane line segmentation result, and a recognition result of whether the driving scene infrared image contains target object types such as people, vehicles or other objects and their positions. A target object type refers to the type of a target object contained in the driving scene infrared image that may pose a collision risk to the vehicle; the number, names and so on of the target object types can be preset, for example people and vehicles, or, at finer granularity, pedestrians, bicycle riders and electric-vehicle riders for people, and cars, buses, trucks and other categories for vehicles.
The vehicle-mounted device performs target detection on the driving scene infrared image through the image recognition model. Outputting the target detection result may mean outputting the segmentation result of the drivable area in the driving scene infrared image, the segmentation result of the lane lines, and the detection result of the types and positions of the target objects contained in the drivable area. The vehicle-mounted device performs target detection on the driving scene infrared image, determines the imaging area of the drivable region, the imaging area of the lane lines, and the types of target objects such as people and vehicles in the image together with their positions, and outputs a target detection result covering the imaging range of the drivable area, the lane line positions, and the types and positions of the target objects in the driving scene infrared image.
Optionally, the target detection result further includes the size of the target object. The vehicle-mounted device performs target detection on the driving scene infrared image and, when it determines that the image contains a target object such as a person or a vehicle, determines the type of the target object, its position in the image and its size in the image, and then forms and outputs a target detection result covering the imaging range of the drivable area, the lane line positions, and the types, positions and sizes of the target objects in the driving scene infrared image.
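A minimal inference sketch for this step follows; the function and variable names are illustrative assumptions, and replicating the thermal frame to three channels is one possible way to match the input layout used during visible light training, not something the patent prescribes.

```python
import torch

@torch.no_grad()
def detect_on_ir_frame(model, ir_frame: torch.Tensor):
    """ir_frame: (1, 3, H, W) tensor, e.g. a single-channel thermal image repeated over 3 channels."""
    model.eval()
    det, drivable_logits, lane_logits = model(ir_frame)
    drivable_mask = drivable_logits.argmax(dim=1)   # (1, h, w) drivable-area mask
    lane_mask = lane_logits.argmax(dim=1)           # (1, h, w) lane-line mask
    # Decoding det into boxes/classes and applying NMS is model-specific and omitted here.
    return det, drivable_mask, lane_mask
```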
In the above embodiment, the original visible light image of the driving scene is acquired and image-enhanced to obtain an enhanced visible light image; a sample set is formed from the enhanced and original visible light images; the targets contained in the images of the sample set are labeled to obtain a training set containing target labels; an initial image model is trained on this training set to obtain a trained image recognition model; the infrared images of the driving scene acquired in real time by the infrared shooting device are then subjected to target detection by the trained image recognition model, and the target detection results are output. Amplifying the sample images by image enhancement of the original visible light images, on the one hand, alleviates the over-fitting problem of training a neural network model with small samples; on the other hand, it reduces the difference between visible light image data and infrared image data, so that a neural network model that performs well on visible light scenes can be transferred to the recognition of infrared scene images and still perform well.
Optionally, the training an initial image model based on the training set to obtain a trained image recognition model includes:
constructing an initial neural network model, training the initial neural network model based on an open source image data set to obtain a pre-trained neural network model, and taking the pre-trained neural network model as an initial image model;
and training the initial image model based on the training set to obtain a trained image recognition model.
Training the initial neural network model on the open source image data set gives the pre-trained neural network model, which is then used as the initial image model and trained on the training set to obtain the trained image recognition model. In an optional example, the pre-trained neural network model obtained by training the initial neural network model on an open source image data set is specifically a pre-trained neural network model obtained by training a YOLOP convolutional neural network model on the ImageNet open source image data set.
In the above embodiment, the pre-trained neural network model is obtained by training the initial neural network model on the open source image data set, and this pre-trained model is then trained on the training set. Because the pre-trained model already has a certain feature extraction capability, it can be trained with a smaller initial learning rate; a smaller initial learning rate helps the pre-trained neural network model find local optima more easily and converge more quickly, which improves training efficiency and strengthens the robustness of the image recognition model.
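A brief sketch of this fine-tuning setup, under stated assumptions: the checkpoint file name is hypothetical, and the concrete learning rate 1e-4 is only an example of a "smaller" initial learning rate, which the patent does not quantify.

```python
import torch
import torch.nn as nn

def load_pretrained_and_build_optimizer(model: nn.Module,
                                        checkpoint_path: str = "yolop_pretrained.pth",
                                        base_lr: float = 1e-4):
    """Load weights pre-trained on an open source image data set, then fine-tune on the
    enhanced visible light training set with a small initial learning rate."""
    state = torch.load(checkpoint_path, map_location="cpu")
    model.load_state_dict(state, strict=False)   # keep the pre-trained feature extractor
    return torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9, weight_decay=5e-4)
```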
Optionally, the training an initial image model based on the training set to obtain a trained image recognition model includes:
constructing an initial neural network model, and taking the initial neural network model as a teacher model;
compressing the number of convolution kernels in the teacher model to obtain a student model;
constructing an initial image model according to the teacher model and the student model, training the initial image model based on the training set, keeping a first model parameter of the teacher model unchanged in the training process, and iterating a second model parameter of the student model by adopting a preset gradient descent algorithm according to the gradient of a consistency loss function between prediction results of the teacher model and the student model;
and obtaining the trained image recognition model until a preset iteration condition is met.
Here, the initial neural network model is used as the Teacher model, and the initial neural network model after compressing its convolution kernels is used as the Student model, forming an initial image model of Teacher-Student architecture. Referring to fig. 4, the teacher model may take an online data set as its sample set while the student model uses the training set (an offline data set). During training, the model parameters of the teacher model are kept unchanged; for ease of distinction, they are called first model parameters. The model parameters of the student model, called second model parameters for ease of distinction, are iterated with a preset gradient descent algorithm according to the gradient of the Consistency Loss function between the prediction results of the teacher model and the student model, until a preset iteration condition is met and the trained image recognition model is obtained. The preset gradient descent algorithm may be a known gradient descent algorithm such as SGD (stochastic gradient descent) or Adam (adaptive moment estimation).
Optionally, constructing the initial neural network model and using it as the teacher model includes: constructing an initial neural network model, training it on an open source image data set to obtain a pre-trained neural network model, and using the pre-trained neural network model as the teacher model. Using the pre-trained model as the teacher exploits the feature extraction capability it acquired before training on the training set, which helps it find local optima more easily and converge more quickly with a smaller initial learning rate, improving training efficiency.
In an optional specific example, the initial neural network model is the YOLOP convolutional neural network model. The original YOLOP model is used as the Teacher model; the numbers of convolution kernels of the Focus module and the CBL (Conv-BN-LeakyReLU) modules in the Backbone of the original YOLOP model are compressed to 1/2 of the original, and the compressed model is used as the Student model. The constructed initial image model of Teacher-Student architecture can refer to the Pi model.
In the above embodiment, the Teacher-Student model built on YOLOP is used as the initial image model and trained on the training set to obtain the trained image recognition model. This improves training efficiency: with no obvious loss of recognition accuracy, both the training efficiency and the inference efficiency of the trained model are greatly improved, the effect of migrating a neural network model that performs well on visible light scenes to the recognition of infrared scene images from small sample data is further improved, and the robustness of the image recognition model is strengthened.
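The following sketch illustrates one way to obtain a student by halving the convolution kernel counts of the early blocks via a width multiplier; the base channel numbers (32/64/128) and the block layout are assumptions, not the actual YOLOP Focus/CBL configuration.

```python
import torch.nn as nn

def cbl(c_in: int, c_out: int, k: int = 3, s: int = 1) -> nn.Sequential:
    """CBL block: Conv-BN-LeakyReLU, as used in the YOLOP backbone."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1, inplace=True),
    )

def build_backbone(width_mult: float = 1.0) -> nn.Sequential:
    """width_mult=1.0 keeps the teacher's channel widths; 0.5 halves the convolution
    kernels of the early blocks to obtain the student, as described above."""
    def c(n: int) -> int:
        return max(1, int(n * width_mult))
    return nn.Sequential(
        cbl(3, c(32), 3, 2),       # stand-in for the Focus stem
        cbl(c(32), c(64), 3, 2),
        cbl(c(64), c(128), 3, 2),
    )

teacher_backbone = build_backbone(1.0)
student_backbone = build_backbone(0.5)
```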
Optionally, the obtaining a trained image recognition model until a preset iteration condition is met includes:
setting a weighting coefficient of the consistency loss function by adopting a Gaussian slope function, wherein the expression of the weighting coefficient is as follows:
Figure BDA0003486704820000081
setting the initial value of the weighting coefficient to be 0, and obtaining a trained image recognition model until t approaches to 1; or the like, or, alternatively,
obtaining a trained image recognition model until the iteration times reach preset times; or the like, or, alternatively,
and obtaining the trained image recognition model until the consistency loss function is converged.
The weighting coefficient of the Consistency Loss function between the teacher model and the student model adopts a Gaussian ramp-up function (its expression is given as a formula image in the original). When training begins, the initial value of the weighting coefficient w(t) is set to 0, which removes the excessive influence that a student model with poor feature extraction would otherwise have on the loss value at the start of training; as the training period increases, t approaches 1. In an alternative example, the iterative training of the Teacher-Student model of YOLOP on the training set may be described with the following symbols: x_i, the i-th training sample (input stimulus) from the labeled training set; y_i, the label of the i-th training sample; w(t), the weighting coefficient of the unsupervised ramp-up function; T_μ(x), the Teacher model with trainable first model parameters μ; S_θ(x), the Student model with trainable second model parameters θ; g(x), the stochastic input augmentation (perturbation) function; K(·), the YOLOP loss function; C, the number of target object categories used by the target detection head; z_i, the prediction result of the Student model; z̃_i, the prediction result of the Teacher model; i ∈ minibatch B, meaning that one minibatch of training samples is fed to the Teacher-Student model in each iteration cycle. One training iteration can be represented as follows:
for t in [1, num_epochs] do
  for each minibatch B do
    z_{i∈B} ← S_θ(g(x_{i∈B}))
    z̃_{i∈B} ← T_μ(g(x_{i∈B}))
    loss ← (1/|B|)·Σ_{i∈B} K(z_i, y_i) + w(t)·(1/(C·|B|))·Σ_{i∈B} ||z_i − z̃_i||²
During this iterative training, the first model parameters μ are kept unchanged, and the value of the second model parameters θ is updated with a gradient descent strategy such as SGD or Adam.
As the training period increases, t continuously approaches 1, so t approaching 1 can be set as the termination condition of the model training iteration. Optionally, the number of iterations reaching a preset number may be set as the termination condition to improve training efficiency; or convergence of the consistency loss function may be set as the termination condition so that training accuracy and training efficiency are better balanced.
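The following Python sketch puts the above training procedure together. It assumes the common Pi-model Gaussian ramp-up w(t) = exp(-5·(1-t)²) (the patent gives its exact expression only as a formula image), assumes the models return a single prediction tensor, uses a plain MSE consistency term, and leaves the YOLOP multi-task loss K abstract; the stochastic augmentation g(x) of each branch is omitted.

```python
import math
import torch
import torch.nn.functional as F

def ramp_up_weight(epoch: int, num_epochs: int, w_max: float = 1.0) -> float:
    """Gaussian ramp-up: near 0 when training begins, approaching w_max as t -> 1.
    The constant -5 follows the Pi-model convention and is an assumption here."""
    t = epoch / max(1, num_epochs)
    return w_max * math.exp(-5.0 * (1.0 - t) ** 2)

def train_student(student, teacher, loader, supervised_loss_fn, num_epochs=50, lr=1e-4):
    """Teacher parameters (mu) stay fixed; only student parameters (theta) are updated."""
    teacher.eval()
    for p in teacher.parameters():
        p.requires_grad_(False)
    optimizer = torch.optim.Adam(student.parameters(), lr=lr)   # SGD would also fit the description
    for epoch in range(num_epochs):
        w = ramp_up_weight(epoch, num_epochs)
        for images, targets in loader:                 # one minibatch B per iteration
            z_student = student(images)
            with torch.no_grad():
                z_teacher = teacher(images)
            loss = supervised_loss_fn(z_student, targets)          # YOLOP multi-task loss K
            loss = loss + w * F.mse_loss(z_student, z_teacher)     # consistency term
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```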
Optionally, the labeling the target included in each image in the sample set to obtain a training set including a target label includes:
and labeling the category and the position of a target object contained in each image in the sample set, segmenting and labeling the drivable area, and segmenting and labeling the lane line to obtain a training set containing target labels.
The target labeling of the images of the sample set comprises three types: first, segmentation labeling of the drivable area; second, segmentation labeling of the lane lines; and third, labeling of the category and position of target objects. The training set is composed of sample images carrying these three types of target labels. Training the initial image model on such a training set enables the trained image recognition model to detect and analyze the drivable area, the lane lines and the types and positions of target objects in infrared images acquired in real time in the driving scene of the vehicle, and to output target detection results containing the drivable area, the lane lines and the types and positions of the target objects in the corresponding driving scene infrared image.
Optionally, the training an initial image model based on the training set to obtain a trained image recognition model includes:
unfreezing model parameters of an initial image model, and training the initial image model based on the training set until a first iteration termination condition is met to obtain an intermediate image model;
freezing a backbone network layer in the intermediate image model, a first segmentation detection head for detecting the travelable area and model parameters of a second segmentation detection head for detecting the lane line;
and training the intermediate image model based on the training set until a second iteration termination condition is met to obtain a trained image recognition model.
Here, the training of the initial image model is divided into two rounds of iterative training. In the first round, the model parameters of the initial image model are all unfrozen, that is, no parameters are frozen, and the initial image model is trained on the training set until the first iteration termination condition is met, giving an intermediate image model. In the second round, the backbone network layer of the intermediate image model obtained in the first round, the first segmentation detection head for detecting the drivable area and the model parameters of the second segmentation detection head for detecting the lane lines are frozen, and the intermediate image model is trained on the training set until the second iteration termination condition is met, finally giving the trained image recognition model. In an alternative example, the first iteration termination condition is training for 150 iteration cycles, and the second iteration termination condition is training for 50 iteration cycles.
In the above embodiment, the training of the initial image model is divided into two rounds of iterative training: no model parameters are frozen in the first round, and in the second round, on the basis of the intermediate image model obtained in the first round, the backbone network layer and the model parameters of the two semantic segmentation detection heads for the drivable area and the lane lines are frozen. This ablation training strategy helps the training process find local optima more quickly and improves both training accuracy and training speed. It should be noted that this ablation training strategy can be applied to any type of initial image model in the different embodiments of the present application; referring to fig. 5, it applies to the initial neural network model (5a) in the scheme that trains the initial neural network model directly, to the pre-trained neural network model (5b) in the scheme based on a model pre-trained on an open source image data set, and to the teacher-student architecture model (5c) in the scheme that builds and trains teacher and student models.
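A small sketch of the second round's freezing step follows; the attribute names (backbone, drivable_head, lane_head) match the illustrative network sketched earlier in this description and are assumptions, not the patent's identifiers.

```python
import torch.nn as nn

def freeze_for_second_round(model: nn.Module):
    """Second training round: freeze the backbone and the two segmentation heads,
    leaving only the remaining parameters (e.g. the object-detection head) trainable."""
    for module in (model.backbone, model.drivable_head, model.lane_head):
        for p in module.parameters():
            p.requires_grad_(False)
    return [p for p in model.parameters() if p.requires_grad]

# Usage: round 1 trains all parameters (e.g. 150 epochs); then
# trainable = freeze_for_second_round(model) and train those parameters for another 50 epochs.
```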
Optionally, the image enhancing the original visible light image to obtain an enhanced visible light image includes:
and carrying out image enhancement on the original visible light image by adopting an image enhancement mode based on chromaticity space change to obtain an enhanced visible light image.
The original visible light image is enhanced in a preset image enhancement mode, for example an image enhancement mode based on chromaticity space change, so that after the enhanced visible light images are used to expand the training set, the image recognition model obtained by training the initial image model on that training set can be migrated to target detection on infrared images while keeping a recognition effect as good as on visible light images.
Optionally, the image enhancing the original visible light image to obtain an enhanced visible light image includes:
performing image enhancement on the original visible light image by adopting at least one of the following image enhancement modes to obtain an enhanced visible light image:
randomly changing a color dithering enhancement mode of at least one of brightness, saturation and contrast of the original visible light image;
an inversion enhancement mode in which each pixel value of the original visible light image is inverted;
a randomization enhancement mode for randomly inverting pixels of which pixel values exceed a set threshold value in the original visible light image;
a color level separation enhancement mode for randomly reducing preset bit values for each color channel in the original visible light image;
and performing a Gaussian blur enhancement mode of Gaussian convolution on the original visible light image.
The image enhancement modes used to enhance the original visible light image may specifically be one or more of: a color dithering enhancement mode (ColorJitter), an inversion enhancement mode (Invert), a randomization enhancement mode (RandomSolarize), a color level separation enhancement mode (Posterize) and a Gaussian blur enhancement mode (GaussianBlur). The color dithering enhancement mode obtains the corresponding enhanced visible light image by randomly changing the brightness, saturation, contrast and hue of the original visible light image. The inversion enhancement mode obtains the corresponding enhanced visible light image by inverting the pixel value of each pixel of the original visible light image. The randomization enhancement mode randomly inverts the pixels whose pixel values exceed a set threshold in the original visible light image. The color level separation enhancement mode randomly reduces a preset number of bits for each color channel in the original visible light image. The Gaussian blur enhancement mode obtains the corresponding enhanced visible light image by performing Gaussian convolution on the original visible light image. Transfer learning is performed on the initial image model with the mixed data set formed by the enhanced visible light images and the original visible light images, so that the model is trained on the data-enhanced data set with a smaller learning rate and a recognition effect as good as on visible light images can be kept when target detection is performed on infrared images.
In the above embodiment, the color dithering, inversion, randomization, color level separation and Gaussian blur enhancement modes are used to enhance the original visible light images offline, and the enhanced visible light images are used to extend the original visible light data set used for training the model; the extended image data set is six times the size of the original visible light data set. In this way, when the training samples are few and the visible light image data set differs greatly from the infrared image data set, the over-fitting problem of small samples is alleviated and the difference from the infrared data set is reduced, so that an image recognition model that performs well on visible light scenes can be transferred to the infrared image data set and still perform well.
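The five enhancement modes map naturally onto torchvision transforms, as the short sketch below shows; all magnitudes (jitter strengths, threshold, bits, kernel size) are assumed values chosen for illustration, not parameters disclosed by the patent.

```python
from torchvision import transforms

# The five offline enhancement modes named above, expressed with torchvision transforms.
ENHANCEMENTS = [
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),  # color dithering
    transforms.RandomInvert(p=1.0),                                        # pixel-value inversion
    transforms.RandomSolarize(threshold=128, p=1.0),                       # invert pixels above threshold
    transforms.RandomPosterize(bits=4, p=1.0),                             # reduce bits per color channel
    transforms.GaussianBlur(kernel_size=5, sigma=(0.5, 2.0)),              # Gaussian convolution
]

def expand_sample(pil_image):
    """Return the original image plus its five enhanced copies (a six-fold expansion)."""
    return [pil_image] + [t(pil_image) for t in ENHANCEMENTS]
```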
Taking as the initial image model a YOLOP pre-training model obtained by training the initial YOLOP model on an open source image data set, and training this initial image model on the training set to obtain the image recognition model: fig. 6 compares the effect of target detection on infrared images of a driving scene acquired in real time using the YOLOP pre-training model and using the image recognition model trained in the present application. Through transfer learning, on the one hand the false detection rate is reduced and target objects such as pedestrians and bicycles are recognized accurately; on the other hand, recognition of lane lines and drivable areas is added and the recognition effect is improved. Fig. 7 shows another comparison of the results of target detection on real-time infrared images of a driving scene using the YOLOP pre-training model and the trained image recognition model: through transfer learning, pedestrians, traffic lights and the like are brought into the range of target detection as target objects, so that pedestrians and traffic lights that were not perceived before can now be recognized, improving the application value of the YOLOP model in the field of vehicle-mounted assisted driving. Fig. 8 shows a further comparison: through transfer learning, riders, traffic lights and the like are brought into the range of target detection as target objects, so that riders and traffic lights that were not perceived before can now be recognized, again improving the application value of the YOLOP model in the field of vehicle-mounted assisted driving.
In order to provide a more comprehensive understanding of the driving assistance method based on infrared images provided in the embodiment of the present application, please refer to fig. 9, and a flow of implementing the driving assistance method will be described with an alternative specific example. The vehicle-mounted equipment comprises an infrared shooting device and a master control system, wherein the master control system comprises a memory and a processor.
S11, acquiring original visible light images of a driving scene, performing offline enhancement on the original visible light images in preset image enhancement modes to obtain enhanced visible light images, and expanding the training sample set with the enhanced visible light images; the image enhancement modes include ColorJitter, Invert, RandomSolarize, Posterize, GaussianBlur, and the like.
S12, constructing an initial image model. The initial image model may be one of: an initial neural network model; a pre-trained neural network model obtained by training the initial neural network model on an open source image data set; or a teacher-student architecture model constructed by taking the initial neural network model as the teacher model and the initial neural network model with compressed convolution kernels as the student model. In this embodiment, the initial neural network model is the YOLOP model, whose network structure includes a Backbone network, a Neck layer and detection heads (Detect head). The Backbone extracts feature information of targets in the image and feeds the feature maps into the Neck; the Neck fuses feature maps from different backbone layers, combining shallow and deep networks to strengthen the feature fusion capability; the Detect heads generate bounding boxes and predict target classes, and comprise a drivable area segmentation head for detecting the drivable area in the driving scene image, a lane line segmentation head for detecting lane lines, and a target detection head for detecting the types and positions of target objects such as pedestrians and vehicles. The drivable area segmentation head and the lane line segmentation head each output a Mask matrix close in size to the driving scene image. The target detection head outputs a matrix of size 1×K×(5+nc); its network structure adopts a Path Aggregation Network (PAN) structure composed of several Feature Pyramid Networks (FPN) and is used to extract feature information of target objects at different scales, for example two FPNs forming a PAN structure that extracts feature map information corresponding to target objects at three different scales. The K dimension of the matrix output by the target detection head contains all the feature map information at the three scales and is positively correlated with the size of the input image to be recognized; nc (number of classes) is the number of classification task categories of the target detection head and can represent the number of target object categories contained in the image to be recognized.
S13, training the initial image model on the training sample set: the initial image model is trained with the mixed data set expanded by image enhancement, and the trained image recognition model is obtained by transfer learning of the initial image model. Through this transfer learning strategy of training the initial image model with the enhanced visible light images, an initial image model with good target detection performance on visible light scene images is migrated to target detection and recognition on infrared scene images.
S14, acquiring infrared images of a driving scene in real time through an infrared shooting device and sending the images to a main control system;
S15, performing target detection on the infrared images of the driving scene through the image recognition model, and outputting the target detection result of the infrared images of the driving scene; the target detection result includes detection and recognition of the drivable area, the lane lines, and the types and positions of target objects such as pedestrians and vehicles in the infrared image of the driving scene.
In the above embodiment, the driving assistance method has at least the following features:
First, the image data set is expanded with enhanced visible light images obtained by image enhancement, which alleviates the over-fitting problem of small samples and eliminates, to the greatest extent, the difference between the visible light image data set and the infrared image data set, so that an image recognition model that performs well on visible light scenes still performs well after being transferred to the infrared image data set;
secondly, through transfer learning training, the target detection range is expanded: pedestrians, riders, traffic lights and the like can all be taken as target objects and brought into the detection range, and drivable-area segmentation and lane-line recognition are also included, so that the trained image recognition model can recognize pedestrians, riders and traffic lights that were previously not perceived, as well as the drivable area and lane lines in the image, thereby improving the application value of the initial image model, such as the YOLOP model, in the field of vehicle-mounted driving assistance;
in one example, taking an example that the initial neural network model is a YOLOP model, the image recognition model obtained after training is used for realizing the comparison of target detection on the infrared image of the driving scene, as shown in the following table:
[Table: comparison of Precision, Recall, F1, mAP, mIOU and Accuracy for the four training schemes numbered 1 to 4 described below]
wherein Precision represents the precision of the model; Recall represents the recall rate of the model; F1 is the harmonic mean of Precision and Recall, i.e., 1/F1 = (1/Precision + 1/Recall)/2, so F1 = 2 × Precision × Recall/(Precision + Recall); mAP (mean Average Precision) is the average of the APs of all categories and is a common evaluation index for target detection, where AP (Average Precision) is generally expressed as the area under the Precision-Recall curve; mIOU (mean Intersection over Union) is the average of the IOU of each category; Accuracy refers to pixel accuracy, i.e., the proportion of correctly labeled pixels among all pixels. In the above table, the expanded mixed data set refers to a training set obtained by enhancing the original visible light images with the preset image enhancement modes to obtain enhanced visible light images and expanding the original sample set with these enhanced visible light images. Sequence number 1 refers to an image recognition model obtained by training a neural network model with random initial parameter values on the expanded mixed data set (17178 training images); sequence number 2 refers to an image recognition model obtained by training a neural network model with random initial parameter values on an open-source data set of a specified size (BDD100K: A Diverse Driving Video Database with Scalable Annotation Tooling, a public driving data set); sequence number 3 refers to an image recognition model obtained by training a pre-trained neural network model on the expanded mixed data set (17178 training images); sequence number 4 refers to an image recognition model obtained by training a pre-trained neural network model on the open-source data set (BDD100K) of the specified size.
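For reference, the metrics in the table are computed in the standard way; a small sketch (plain NumPy, independent of the patent's code) is:

import numpy as np

def precision_recall_f1(tp: int, fp: int, fn: int):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def miou_and_pixel_accuracy(pred, gt, num_classes: int):
    # pred, gt: integer label maps of identical shape (e.g. drivable-area / lane-line masks)
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union:
            ious.append(inter / union)
    miou = float(np.mean(ious)) if ious else 0.0
    accuracy = float((pred == gt).mean())        # proportion of correctly labeled pixels
    return miou, accuracy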
In one example, the target detection result obtained by performing target detection on the infrared image of the driving scene with the image recognition model may be used in the following ways: predicting a possible collision risk according to the relative position of a target object and the vehicle, and giving an early warning of the collision risk through a display and warning module, so as to assist the driving operation and improve driving safety; generating an intelligent driving control instruction according to the detection results of the drivable area, the lane lines and the target objects, so as to realize intelligent driving assistance; or pre-judging emergencies during driving according to the detection results of the drivable area, the lane lines and the target objects, and generating, when necessary, a control instruction for handling the emergency, so as to assist the driving operation and improve driving safety.
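As one hypothetical illustration of the collision-risk case (a simple rule for this sketch only, not a rule disclosed in the patent), a time-to-collision check could look like this:

def collision_warning(distance_m: float, closing_speed_mps: float, ttc_threshold_s: float = 2.0) -> bool:
    # Warn when the estimated time to collision with a detected target drops below a threshold.
    if closing_speed_mps <= 0:         # target is not approaching the vehicle
        return False
    return distance_m / closing_speed_mps < ttc_threshold_s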
Thirdly, for the initial image model, the training scheme of obtaining a pre-trained neural network model by training the initial neural network model on an open-source image data set, or of constructing a teacher-student architecture model, can effectively improve training efficiency while ensuring the recognition accuracy of the image recognition model.
Referring to fig. 10, another aspect of the present embodiment provides a driving assistance device, including a sample obtaining module 21, configured to obtain an original visible light image of a driving scene, perform image enhancement on the original visible light image to obtain an enhanced visible light image, and form a sample set according to the enhanced visible light image and the original visible light image; the labeling module 22 is configured to label targets included in each image in the sample set to obtain a training set including target labels; the training module 23 is configured to train an initial image model based on the training set to obtain a trained image recognition model; and the target identification module 24 is configured to acquire a driving scene infrared image acquired by an infrared shooting device in real time, perform target detection on the driving scene infrared image through the image identification model, and output a target detection result of the driving scene infrared image.
Optionally, the training module 23 is specifically configured to construct an initial neural network model, train the initial neural network model based on an open-source image data set to obtain a pre-trained neural network model, and use the pre-trained neural network model as the initial image model; and training the initial image model based on the training set to obtain a trained image recognition model.
Optionally, the training module 23 is further configured to construct an initial neural network model, and use the initial neural network model as a teacher model; compressing the number of convolution kernels in the teacher model to obtain a student model; constructing an initial image model according to the teacher model and the student model, training the initial image model based on the training set, keeping a first model parameter of the teacher model unchanged in the training process, and iterating a second model parameter of the student model by adopting a preset gradient descent algorithm according to the gradient of a consistency loss function between prediction results of the teacher model and the student model; and obtaining the trained image recognition model until a preset iteration condition is met.
Optionally, the training module 23 is further configured to set a weighting coefficient of the consistency loss function by using a Gaussian slope function, where the expression of the weighting coefficient is:
[Formula: the weighting coefficient, a Gaussian ramp-up function of t defined over t ∈ [0, 1], rising from approximately 0 toward 1]
setting the initial value of the weighting coefficient to 0, and obtaining the trained image recognition model when t approaches 1; or obtaining the trained image recognition model when the number of iterations reaches a preset number; or obtaining the trained image recognition model when the consistency loss function converges.
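A minimal sketch of this teacher-student training loop, reusing the skeleton from the earlier example, is shown next. The ramp-up form exp(-5·(1-t)²) is an assumed, commonly used Gaussian ramp and not necessarily the patent's exact expression (which is given in its figure); 'train_loader' and 'compute_loss' are placeholders:

import math
import torch
import torch.nn.functional as F

def consistency_weight(t: float) -> float:
    # Assumed Gaussian ramp-up: close to 0 at t = 0, equal to 1 at t = 1
    return math.exp(-5.0 * (1.0 - t) ** 2)

teacher = MultiHeadDetector()                  # first model parameters kept unchanged during training
student = MultiHeadDetector()                  # in the patent the student has a compressed number of convolution kernels
for p in teacher.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.SGD(student.parameters(), lr=1e-3, momentum=0.9)
total_steps = 1000
for step, (images, targets) in enumerate(train_loader):       # placeholder training set
    t = min(step / total_steps, 1.0)
    with torch.no_grad():
        teacher_det, _, _ = teacher(images)
    student_det, _, _ = student(images)
    supervised_loss = compute_loss(student_det, targets)       # placeholder supervised loss
    consistency_loss = F.mse_loss(student_det, teacher_det)    # consistency between the two predictions
    loss = supervised_loss + consistency_weight(t) * consistency_loss
    optimizer.zero_grad()
    loss.backward()                                            # gradient descent on the student only
    optimizer.step()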
Optionally, the labeling module 22 is specifically configured to label the type and position of the target object included in each image in the sample set, segment and label the travelable region, and segment and label the lane line, so as to obtain a training set including the target label.
Optionally, the training module 23 is further configured to unfreeze the model parameters of the initial image model and train the initial image model based on the training set until a first iteration termination condition is met, to obtain an intermediate image model; freeze the model parameters of the backbone network layer, of the first segmentation detection head for detecting the travelable area and of the second segmentation detection head for detecting the lane lines in the intermediate image model; and train the intermediate image model based on the training set until a second iteration termination condition is met, to obtain the trained image recognition model.
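A brief sketch of this two-stage freeze/unfreeze schedule, again using the earlier skeleton and with the actual training calls elided as comments, could be:

def set_requires_grad(module, flag: bool):
    for p in module.parameters():
        p.requires_grad_(flag)

model = MultiHeadDetector()

# Stage 1: all model parameters unfrozen; train until the first iteration termination condition
set_requires_grad(model, True)
# ... train(model, train_set) -> intermediate image model

# Stage 2: freeze the backbone and both segmentation heads; continue training the remaining parameters
set_requires_grad(model.backbone, False)
set_requires_grad(model.da_seg_head, False)    # travelable-area segmentation head
set_requires_grad(model.ll_seg_head, False)    # lane-line segmentation head
# ... train(model, train_set) until the second iteration termination condition -> final image recognition model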
Optionally, the sample obtaining module 21 is specifically configured to perform image enhancement on the original visible light image by using an image enhancement mode based on chromaticity space change to obtain an enhanced visible light image.
Optionally, the sample obtaining module 21 is further configured to perform image enhancement on the original visible light image by using at least one of the following image enhancement modes to obtain an enhanced visible light image: randomly changing a color dithering enhancement mode of at least one of brightness, saturation and contrast of the original visible light image; an inversion enhancement mode in which each pixel value of the original visible light image is inverted; a randomization enhancement mode for randomly inverting pixels of which pixel values exceed a set threshold value in the original visible light image; a color level separation enhancement mode for randomly reducing preset bit values for each color channel in the original visible light image; and performing a Gaussian blur enhancement mode of Gaussian convolution on the original visible light image.
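These enhancement modes correspond roughly to standard torchvision transforms; the probabilities, thresholds and magnitudes below are illustrative assumptions only:

from torchvision import transforms

augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),   # color dithering
    transforms.RandomInvert(p=0.3),                   # invert every pixel value
    transforms.RandomSolarize(threshold=128, p=0.3),  # invert only pixels above a set threshold
    transforms.RandomPosterize(bits=4, p=0.3),        # drop low-order bits of each color channel
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),               # Gaussian convolution blur
])
# enhanced = augment(original_visible_light_image)    # e.g. a PIL.Image of the driving scene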
It should be noted that: in the driving assistance device provided in the above embodiment, in the process of implementing the driving operation assistance during the driving of the vehicle, only the division of the above program modules is exemplified, and in practical applications, the above processing may be distributed to different program modules as needed, that is, the internal structure of the device may be divided into different program modules, so as to complete all or part of the above described method steps. In addition, the driving assistance device provided by the above embodiment and the driving assistance method embodiment belong to the same concept, and the specific implementation process thereof is described in the method embodiment, and is not described herein again.
Referring to fig. 11, another aspect of the embodiment of the present application further provides an on-board device, which includes a processor 21, an infrared shooting device 11 connected to the processor 21, a memory 22, and a computer program stored in the memory 22 and executable by the processor 21; when the computer program is executed by the processor 21, the driving assistance method according to any embodiment of the present application is implemented. The infrared shooting device 11 may be an infrared camera that collects image data of the driving scene in real time while the vehicle is moving. The number of processors 21 may be one or more. The infrared camera is in communication connection with the processor 21; the vehicle-mounted 12V power system supplies power to the processor 21, and the processor 21 supplies power to the infrared camera. The infrared camera is mounted at the front grille of the vehicle and detects infrared radiation from the road ahead to form the infrared image of the driving scene. The processor 21 is installed in the vehicle cab, receives the infrared image data acquired by the infrared camera, and executes the driving assistance method according to the embodiments of the present application.
Optionally, the vehicle-mounted device further includes a display and warning module connected to the processor 21. The display and warning module can present the user with a video rendering of the target detection result, and give sound, light and/or on-screen warnings of possible collision dangers.
An embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process of the driving assistance method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here. The computer-readable storage medium may be a Read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (11)

1. A driving assistance method based on infrared images is applied to vehicle-mounted equipment, and is characterized by comprising the following steps:
acquiring an original visible light image of a driving scene, carrying out image enhancement on the original visible light image to obtain an enhanced visible light image, and forming a sample set according to the enhanced visible light image and the original visible light image;
labeling targets contained in the images in the sample set to obtain a training set containing target labels;
training an initial image model based on the training set to obtain a trained image recognition model;
and acquiring a driving scene infrared image acquired by an infrared shooting device in real time, carrying out target detection on the driving scene infrared image through the image recognition model, and outputting a target detection result of the driving scene infrared image.
2. The driving assistance method according to claim 1, wherein the training an initial image model based on the training set to obtain a trained image recognition model comprises:
constructing an initial neural network model, training the initial neural network model based on an open source image data set to obtain a pre-trained neural network model, and taking the pre-trained neural network model as an initial image model;
and training the initial image model based on the training set to obtain a trained image recognition model.
3. The driving assistance method according to claim 1, wherein the training an initial image model based on the training set to obtain a trained image recognition model comprises:
constructing an initial neural network model, and taking the initial neural network model as a teacher model;
compressing the number of convolution kernels in the teacher model to obtain a student model;
constructing an initial image model according to the teacher model and the student model, training the initial image model based on the training set, keeping a first model parameter of the teacher model unchanged in the training process, and iterating a second model parameter of the student model by adopting a preset gradient descent algorithm according to the gradient of a consistency loss function between prediction results of the teacher model and the student model;
and obtaining the trained image recognition model until a preset iteration condition is met.
4. The driving assistance method according to claim 3, wherein obtaining the trained image recognition model until a preset iteration condition is satisfied includes:
setting a weighting coefficient of the consistency loss function by adopting a Gaussian slope function, wherein the expression of the weighting coefficient is as follows:
[Formula: the weighting coefficient, a Gaussian ramp-up function of t defined over t ∈ [0, 1]]
setting the initial value of the weighting coefficient to 0, and obtaining the trained image recognition model when t approaches 1; or,
obtaining the trained image recognition model when the number of iterations reaches a preset number; or,
obtaining the trained image recognition model when the consistency loss function converges.
5. The driving assistance method according to any one of claims 1 to 3, wherein labeling the target included in each image in the sample set to obtain a training set including a target label comprises:
and labeling the category and the position of a target object contained in each image in the sample set, segmenting and labeling the drivable area, and segmenting and labeling the lane line to obtain a training set containing target labels.
6. The driving assistance method according to claim 5, wherein the training an initial image model based on the training set to obtain a trained image recognition model comprises:
unfreezing model parameters of an initial image model, and training the initial image model based on the training set until a first iteration termination condition is met to obtain an intermediate image model;
freezing a backbone network layer in the intermediate image model, a first segmentation detection head for detecting the travelable area and model parameters of a second segmentation detection head for detecting the lane line;
and training the intermediate image model based on the training set until a second iteration termination condition is met to obtain a trained image recognition model.
7. The driving assistance method according to any one of claims 1 to 3, wherein performing image enhancement on the original visible light image to obtain an enhanced visible light image comprises:
and carrying out image enhancement on the original visible light image by adopting an image enhancement mode based on chromaticity space change to obtain an enhanced visible light image.
8. The driving assistance method according to any one of claims 1 to 3, wherein performing image enhancement on the original visible light image to obtain an enhanced visible light image comprises:
performing image enhancement on the original visible light image by adopting at least one of the following image enhancement modes to obtain an enhanced visible light image:
randomly changing a color dithering enhancement mode of at least one of brightness, saturation and contrast of the original visible light image;
an inversion enhancement mode in which each pixel value of the original visible light image is inverted;
a randomization enhancement mode for randomly inverting pixels of which pixel values exceed a set threshold value in the original visible light image;
a color level separation enhancement mode for randomly reducing preset bit values for each color channel in the original visible light image;
and performing a Gaussian blur enhancement mode of Gaussian convolution on the original visible light image.
9. A driving assistance apparatus based on an infrared image, characterized by comprising:
the system comprises a sample acquisition module, a sample acquisition module and a sample processing module, wherein the sample acquisition module is used for acquiring an original visible light image of a driving scene, carrying out image enhancement on the original visible light image to obtain an enhanced visible light image, and forming a sample set according to the enhanced visible light image and the original visible light image;
the labeling module is used for labeling the targets contained in the images in the sample set to obtain a training set containing target labels;
the training module is used for training an initial image model based on the training set to obtain a trained image recognition model;
and the target identification module is used for acquiring the driving scene infrared image acquired by the infrared shooting device in real time, carrying out target detection on the driving scene infrared image through the image recognition model, and outputting a target detection result of the driving scene infrared image.
10. An in-vehicle apparatus characterized by comprising a processor, an infrared photographing device connected to the processor, a memory, and a computer program stored on the memory and executable by the processor, the computer program, when executed by the processor, implementing the infrared image-based driving assistance method according to any one of claims 1 to 8.
11. A computer-readable storage medium, characterized in that a computer program is stored thereon, which, when being executed by a processor, implements the infrared image-based driving assistance method according to any one of claims 1 to 8.
CN202210083093.3A 2022-01-25 2022-01-25 Driving assistance method and device, vehicle-mounted device and storage medium Pending CN114418895A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210083093.3A CN114418895A (en) 2022-01-25 2022-01-25 Driving assistance method and device, vehicle-mounted device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210083093.3A CN114418895A (en) 2022-01-25 2022-01-25 Driving assistance method and device, vehicle-mounted device and storage medium

Publications (1)

Publication Number Publication Date
CN114418895A true CN114418895A (en) 2022-04-29

Family

ID=81278219

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210083093.3A Pending CN114418895A (en) 2022-01-25 2022-01-25 Driving assistance method and device, vehicle-mounted device and storage medium

Country Status (1)

Country Link
CN (1) CN114418895A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115063639A (en) * 2022-08-11 2022-09-16 小米汽车科技有限公司 Method for generating model, image semantic segmentation method, device, vehicle and medium
CN115527189A (en) * 2022-11-01 2022-12-27 杭州枕石智能科技有限公司 Parking space state detection method, terminal device and computer readable storage medium
CN116168508A (en) * 2022-05-20 2023-05-26 海南大学 Driving fatigue detection and early warning control method and device for man-machine co-driving
CN116228756A (en) * 2023-05-08 2023-06-06 常州星宇车灯股份有限公司 Method and system for detecting bad points of camera in automatic driving
CN116310713A (en) * 2023-02-16 2023-06-23 嘉洋智慧安全科技(北京)股份有限公司 Infrared image recognition method and device, electronic equipment and storage medium
WO2024012234A1 (en) * 2022-07-14 2024-01-18 安徽蔚来智驾科技有限公司 Target detection method, computer device, computer-readable storage medium and vehicle
TWI832270B (en) * 2022-05-31 2024-02-11 鴻海精密工業股份有限公司 Method for detecting road condition, electronic device and storage medium

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116168508A (en) * 2022-05-20 2023-05-26 海南大学 Driving fatigue detection and early warning control method and device for man-machine co-driving
TWI832270B (en) * 2022-05-31 2024-02-11 鴻海精密工業股份有限公司 Method for detecting road condition, electronic device and storage medium
WO2024012234A1 (en) * 2022-07-14 2024-01-18 安徽蔚来智驾科技有限公司 Target detection method, computer device, computer-readable storage medium and vehicle
CN115063639A (en) * 2022-08-11 2022-09-16 小米汽车科技有限公司 Method for generating model, image semantic segmentation method, device, vehicle and medium
CN115063639B (en) * 2022-08-11 2022-12-09 小米汽车科技有限公司 Model generation method, image semantic segmentation device, vehicle and medium
CN115527189A (en) * 2022-11-01 2022-12-27 杭州枕石智能科技有限公司 Parking space state detection method, terminal device and computer readable storage medium
CN116310713A (en) * 2023-02-16 2023-06-23 嘉洋智慧安全科技(北京)股份有限公司 Infrared image recognition method and device, electronic equipment and storage medium
CN116310713B (en) * 2023-02-16 2024-03-19 嘉洋智慧安全科技(北京)股份有限公司 Infrared image recognition method and device, electronic equipment and storage medium
CN116228756A (en) * 2023-05-08 2023-06-06 常州星宇车灯股份有限公司 Method and system for detecting bad points of camera in automatic driving
CN116228756B (en) * 2023-05-08 2023-07-25 常州星宇车灯股份有限公司 Method and system for detecting bad points of camera in automatic driving

Similar Documents

Publication Publication Date Title
CN114418895A (en) Driving assistance method and device, vehicle-mounted device and storage medium
CN111368687B (en) Sidewalk vehicle illegal parking detection method based on target detection and semantic segmentation
CN109460699B (en) Driver safety belt wearing identification method based on deep learning
CN111837156A (en) Vehicle weight recognition techniques utilizing neural networks for image analysis, viewpoint-aware pattern recognition, and generation of multi-view vehicle representations
US20180211121A1 (en) Detecting Vehicles In Low Light Conditions
CN108082037A (en) Brake lamp detects
Janahiraman et al. Traffic light detection using tensorflow object detection framework
CN106599832A (en) Method for detecting and recognizing various types of obstacles based on convolution neural network
CN112289037B (en) Motor vehicle illegal parking detection method and system based on high visual angle under complex environment
CN112990065B (en) Vehicle classification detection method based on optimized YOLOv5 model
US11250279B2 (en) Generative adversarial network models for small roadway object detection
CN111967396A (en) Processing method, device and equipment for obstacle detection and storage medium
CN114419603A (en) Automatic driving vehicle control method and system and automatic driving vehicle
CN112613434A (en) Road target detection method, device and storage medium
CN115273032A (en) Traffic sign recognition method, apparatus, device and medium
CN113903012A (en) Collision early warning method and device, vehicle-mounted equipment and storage medium
CN115909245A (en) Visual multi-task processing method based on deep learning
CN106529391A (en) Robust speed-limit traffic sign detection and recognition method
CN115588188A (en) Locomotive, vehicle-mounted terminal and driver behavior identification method
CN116630920A (en) Improved lane line type identification method of YOLOv5s network model
CN117392638A (en) Open object class sensing method and device for serving robot scene
CN116630702A (en) Pavement adhesion coefficient prediction method based on semantic segmentation network
CN115147450B (en) Moving target detection method and detection device based on motion frame difference image
Forczmański et al. Deep learning approach to detection of preceding vehicle in advanced driver assistance
CN115129886A (en) Driving scene recognition method and device and vehicle

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination