CN114419603A - Automatic driving vehicle control method and system and automatic driving vehicle - Google Patents

Automatic driving vehicle control method and system and automatic driving vehicle

Info

Publication number
CN114419603A
Authority
CN
China
Prior art keywords: target, information, images, image, model
Prior art date
Legal status
Pending
Application number
CN202210133889.5A
Other languages
Chinese (zh)
Inventor
黄好
朱勇建
黄篷迟
钟声峙
刘浩
王剑鑫
Current Assignee
Liuzhou Wuling Automobile Industry Co Ltd
Guangxi Automobile Group Co Ltd
Original Assignee
Liuzhou Wuling Automobile Industry Co Ltd
Guangxi Automobile Group Co Ltd
Priority date
Filing date
Publication date
Application filed by Liuzhou Wuling Automobile Industry Co Ltd, Guangxi Automobile Group Co Ltd filed Critical Liuzhou Wuling Automobile Industry Co Ltd
Priority to CN202210133889.5A priority Critical patent/CN114419603A/en
Publication of CN114419603A publication Critical patent/CN114419603A/en
Pending legal-status Critical Current

Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B60: VEHICLES IN GENERAL
    • B60W: CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W 60/00: Drive control systems specially adapted for autonomous road vehicles
    • B60W 60/001: Planning or execution of driving tasks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B60: VEHICLES IN GENERAL
    • B60W: CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W 2420/00: Indexing codes relating to the type of sensors based on the principle of their operation
    • B60W 2420/40: Photo, light or radio wave sensitive means, e.g. infrared sensors
    • B60W 2420/403: Image sensing, e.g. optical camera

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Automation & Control Theory (AREA)
  • Human Computer Interaction (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses an automatic driving vehicle control method and system and an automatic driving vehicle. The automatic driving vehicle control method includes: obtaining one or more images according to data collected by a camera on the autonomous vehicle; inputting the one or more images into a target detection model and a semantic segmentation model to respectively obtain labeling information of a first target of the one or more images and labeling information of a second target of the one or more images, wherein the first target comprises an obstacle and the second target comprises at least one of a lane line, a zebra crossing, and a drivable area; fusing the labeling information of the first target and the labeling information of the second target to obtain labeling information of the image; and determining vehicle control information according to the labeling information of the image, and controlling the automatic driving vehicle to run according to the vehicle control information. The method improves the safety of automatic driving of the vehicle.

Description

Automatic driving vehicle control method and system and automatic driving vehicle
Technical Field
The invention relates to the field of vehicles, in particular to an automatic driving vehicle control method and system and an automatic driving vehicle.
Background
With the development of science and technology, vehicles have gradually become a part of people's lives, and automatic driving in particular has attracted wide attention.
The autonomous vehicle is usually equipped with sensors, through which the vehicle collects data to obtain information about its surroundings. For example, data is collected by a camera mounted on the autonomous vehicle; the vehicle extracts surrounding environment information from the data and uses this information to make decisions and exert control, thereby implementing autonomous driving. At present, however, only limited environment information is obtained from the data collected by the camera, which lowers the accuracy of decision-making and control and thus the safety of automatic driving.
Disclosure of Invention
In view of the above, the present application provides an autonomous vehicle control method, system and autonomous vehicle, so as to improve safety of automatic driving of the vehicle.
In a first aspect, the present application provides a method of controlling an autonomous vehicle, the method comprising:
obtaining one or more images according to data collected by a camera on the autonomous vehicle;
inputting the one or more images into a target detection model and a semantic segmentation model to respectively obtain labeling information of a first target of the one or more images and labeling information of a second target of the one or more images; wherein the first target comprises an obstacle, and the second target comprises at least one of a lane line, a zebra crossing, and a drivable area;
fusing the labeling information of the first target and the labeling information of the second target to obtain the labeling information of the image;
and determining vehicle control information according to the labeling information of the image, and controlling the automatic driving vehicle to run according to the vehicle control information.
By using the technical scheme of the embodiment of the application, labeling information of obstacles around the vehicle and labeling information of at least one of lane lines, zebra crossings and drivable areas can be obtained, providing more information for the decision-making and control of the vehicle and improving the safety of automatic driving.
The automatic driving vehicle makes decisions and exerts control according to the information of the obstacles and of at least one of the lane lines, zebra crossings and drivable areas, improving safety during automatic driving.
In addition, with the technical scheme of the embodiment of the application, recognition is performed by two models and the two recognition results are fused before being output, which improves the accuracy of the recognition result and therefore the safety of automatic driving.
In some possible implementations, the target detection model includes the YOLOv5 target detection model.
In some possible implementations, the semantic segmentation model includes the DeepLabv3 semantic segmentation model.
In some possible implementations, the first target further includes a traffic light.
In some possible implementations, before the inputting of the one or more images into the target detection model and the semantic segmentation model to obtain the labeling information of the first target of the one or more images and the labeling information of the second target of the one or more images, the method further includes:
reducing the precision of the semantic segmentation model.
In some possible implementations, before the inputting of the one or more images into the target detection model and the semantic segmentation model to obtain the labeling information of the first target of the one or more images and the labeling information of the second target of the one or more images, the method further includes:
reducing the precision of the target detection model.
In a second aspect, the present application provides an autonomous driving vehicle control system, the system comprising a camera, an environment sensing unit, a vehicle decision unit, and a vehicle control unit, wherein:
the camera is assembled on the automatic driving vehicle and used for acquiring data;
the environment sensing unit is used for: obtaining one or more images according to data collected by a camera on the autonomous vehicle; inputting one or more images into a target detection model and a semantic segmentation model to obtain the labeling information of a first target of the one or more images and the labeling information of a second target of the one or more images; wherein the first target comprises an obstacle, the second target comprises at least one of a lane line, a zebra crossing, and a travelable area;
the vehicle decision unit is used for determining vehicle control information according to the label information of the image, wherein the label information of the image is obtained by fusing the label information of the first target and the label information of the second target;
and the vehicle control unit is used for controlling the automatic driving vehicle to run according to the vehicle control information.
In some possible implementations, the target detection model includes the YOLOv5 target detection model, and/or the semantic segmentation model includes the DeepLabv3 semantic segmentation model.
In a third aspect, the present application provides an autonomous vehicle comprising an autonomous vehicle control system as in any above.
In a fourth aspect, the present application provides a computer readable storage medium for storing a computer program for executing the autonomous vehicle control method as in any one of the above.
Drawings
FIG. 1A is a flow chart of a method for controlling an autonomous vehicle provided by an embodiment of the present application;
fig. 1B is a schematic diagram of obtaining a private data set according to an embodiment of the present application;
fig. 1C is a schematic diagram of obtaining annotation information of an image according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of an autonomous vehicle control system according to an embodiment of the present application.
Detailed Description
The automatic driving vehicle collects data through the camera to obtain information about its surroundings. At present, however, only limited environment information is obtained from the data collected by the camera, which lowers the accuracy of decision-making and control and thus the safety of automatic driving.
Based on this, in the embodiment of the present application provided by the applicant, first, one or more images are obtained according to data collected by a camera on an autonomous vehicle; and then inputting the obtained one or more images into the target detection model and the semantic segmentation model to obtain the labeling information of a first target of the one or more images and the labeling information of a second target of the one or more images. The first target includes an obstacle, and the second target includes at least one of a lane line, a zebra crossing, and a travelable area. And fusing the labeling information of the first target and the labeling information of the second target to obtain the labeling information of the image, determining vehicle control information according to the labeling information of the image, and controlling the automatic driving vehicle to run according to the vehicle control information.
Therefore, the embodiment of the application has the following beneficial effects:
by using the technical scheme of the embodiment of the application, the marking information of the obstacles around the vehicle, the marking information of at least one of the lane line, the zebra crossing and the drivable area can be obtained, so that more information is provided for the decision and control of the vehicle, and the safety of automatic driving is improved.
For example, obstacles around the vehicle are identified to control the vehicle to avoid the obstacles when the vehicle is running, so that the vehicle can avoid the obstacles. By marking the information lane line, the zebra crossing or the drivable area, the information of the lane line, the zebra crossing or the drivable area can be obtained. Controlling the vehicle to run on the lane according to the information of the lane line, and reducing the frequent lane change or line pressing running condition of the vehicle in the running process; according to the information of the lane lines, the information of the current and/or adjacent lane can be further determined, for example, the type of the lane (one-way lane, straight lane, left-turn lane, etc.) is determined, so that the vehicle is controlled to run on the lane according with the traffic rule; according to the information of the zebra crossing, the speed of the vehicle can be controlled to be reduced when the vehicle runs to the vicinity of the zebra crossing, and the safety of automatic driving is improved.
The automatic driving vehicle makes a decision and controls according to the information of at least one of the obstacle, the lane line, the zebra crossing and the drivable area, so that the safety is improved in the automatic driving process.
In addition, by adopting the technical scheme of the embodiment of the application, the two recognition models are recognized, and the two recognition results are fused and output, so that the accuracy of the recognition result is improved, and the safety of automatic driving is improved.
Specifically, for obstacles: because obstacles appear at different distances in a driving scene, their display sizes in the vehicle-mounted camera image differ; there are also many types of traffic obstacles, and what the control system needs is a concrete coordinate position that can be mapped back to actual physical coordinates. The target detection model covers large targets, small targets and a wide range of detection classes well, and its result returns both the class information of an obstacle and its coordinate information; the coordinates can be provided to the decision control system to be mapped back to actual physical coordinates, assisting the decision control system in controlling the vehicle.
For zebra crossings, lane lines and drivable areas: these occupy a large proportion of the video image data collected by the vehicle-mounted camera, i.e., in a typical automatic driving scene image they account for a high share of the pixels. The semantic segmentation model classifies the pixels of an image one by one and has strong classification capability, so it can distinguish well the zebra crossings, lane lines, drivable areas and similar information in images collected by the vehicle-mounted camera.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, embodiments accompanying the drawings are described in detail below.
To facilitate understanding of technical solutions provided in the embodiments of the present application, terms in the embodiments of the present application are first described.
Target detection: the technology for detecting the key object required in the image can acquire the position information and the object type information of the object in the image.
Semantic segmentation: the pixel points in the image are classified into a specific category by category division, for example, some pixel points in the image belong to vehicles and some pixel points belong to pedestrians, and the pixel points corresponding to the objects are identified and classified by a semantic segmentation method.
Data enhancement: by utilizing various image processing methods, the original data set is enlarged, and the problems of insufficient picture data volume, unbalanced data and the like are solved.
Machine Learning (ML): there are generally three common definitions of machine learning. First, machine learning is the science of artificial intelligence; the main research object of the field is artificial intelligence, in particular how to improve the performance of specific algorithms through experience. Second, machine learning is the study of computer algorithms that improve automatically through experience. Third, machine learning is the use of data or past experience to optimize the performance criteria of a computer program.
Deep Learning (DL): deep learning is a research direction in the field of machine learning. In general, deep learning is to learn the intrinsic regularity and expression hierarchy of sample data, and information obtained through the learning process is helpful to the interpretation of data such as text, images, and sounds. The purpose of deep learning is to make a machine capable of analyzing and learning like a human, and to recognize data such as character images and sounds. Deep learning algorithms are typically more complex machine learning algorithms.
Deep learning model: the system is composed of basic deep learning modules, and each module is responsible for different functions and finally forms a unified model. The models are built in different modes, so that the functions of the models are different, and corresponding results can be obtained after data are sent into the models.
Target detection algorithm: for an image, target detection is typically used to determine what objects are in the picture and where each of them is located. A target detection algorithm implements target detection; for example, it frames each detected target object in the image with a rectangular box.
Semantic segmentation algorithm: for an image, semantic segmentation generally needs to classify each pixel point in the image, and different pixel points may belong to different targets.
Single precision (FP32 precision): in the 32-bit single-precision format, 1 bit indicates whether the number is positive or negative, 8 bits store the exponent as a biased binary value, and the remaining 23 bits represent the digits that make up the number, referred to as the significand.
Half precision (FP16 precision): the half-precision format is laid out like the single-precision format. The leftmost bit is still the sign bit, the exponent is 5 bits wide and stored in biased (excess) form, and the mantissa is 10 bits wide with an implicit leading 1. The mantissa can be understood as the digits after the binary point of a floating-point number: for binary 1.11, for example, the stored mantissa is 1100000000 with the leading (1) implied. The implicit 1 mainly participates in calculation, where it can produce a carry.
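For intuition, the following minimal sketch shows how the two formats differ in precision and range (the use of NumPy here is an assumption of the illustration, not anything required by the scheme):

    import numpy as np

    x32 = np.float32(0.1)   # 1 sign bit + 8 exponent bits + 23 fraction bits
    x16 = np.float16(0.1)   # 1 sign bit + 5 exponent bits + 10 fraction bits

    print(f"FP32: {np.float64(x32):.10f}")  # 0.1000000015 -- roughly 7 decimal digits
    print(f"FP16: {np.float64(x16):.10f}")  # 0.0999755859 -- roughly 3 decimal digits

    # Half precision also has a far smaller representable range:
    print(np.finfo(np.float16).max)  # 65504.0
    print(np.finfo(np.float32).max)  # about 3.4e38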
In order to facilitate understanding of the technical solutions provided in the embodiments of the present application, a description is first given of common application scenarios in the embodiments of the present application.
The modules for realizing the automatic driving of the vehicle are divided according to functions and can comprise an environment perception module, a control decision algorithm module and a motion control module.
Various types of sensors, such as lidar, millimeter-wave radar, onboard cameras and fisheye cameras, are mounted on an autonomous vehicle, and the vehicle uses them to collect data about its surroundings. To complete an automatic driving task, the vehicle generally senses the surrounding environment through an environment sensing module using the sensor data. For example, obstacles around the vehicle are detected and the resulting information is recorded. The environment sensing module provides this information to the control decision algorithm module, giving the control algorithm more accurate surrounding environment information; for example, it provides information about objects captured by the vehicle-mounted camera, so that the autonomous vehicle obtains information about surrounding vehicles, pedestrians and the road. The control decision algorithm module determines control and decision results from this information so that the motion control module can control the vehicle accordingly.
In order to facilitate understanding of the technical solutions provided by the embodiments of the present application, an environment sensing method, an environment sensing system and an autonomous vehicle provided by the embodiments of the present application are described below with reference to the accompanying drawings.
While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Other embodiments, which can be derived by those skilled in the art from the embodiments given herein without any inventive contribution, are also within the scope of the present application.
In the claims and specification of the present application and in the drawings accompanying the description, the terms "comprise" and "have" and any variations thereof, are intended to cover non-exclusive inclusions.
The application provides a method for controlling an autonomous vehicle.
Referring to fig. 1A, fig. 1A is a flowchart of an autonomous vehicle control method provided in an embodiment of the present application.
As shown in fig. 1A, the present embodiment provides an autonomous vehicle control method including S101-S104.
S101, one or more images are obtained according to data collected by a camera on the automatic driving vehicle.
An autonomous vehicle refers to a vehicle that can drive itself automatically.
The camera is mounted on the autonomous vehicle for collecting data.
The one or more images refer to images acquired by the camera or images obtained by processing data acquired by the camera.
The one or more images are used for subsequent processing to obtain the surrounding environment information of the autonomous vehicle.
S102, inputting the image to a target detection model and a semantic segmentation model, and respectively obtaining the labeling information of a first target in the image and the labeling information of a second target in the image.
The image in S102 refers to one or more images obtained in S101.
The target detection model is used for performing target detection on the image to obtain labeling information (target detection result) of a first target in the image.
The semantic segmentation model is used for performing semantic segmentation on the image to obtain the labeling information (semantic segmentation result) of a second target in the image.
For the target detection model and the semantic segmentation model, the inputs are the same: the one or more images obtained in S101.
The first target includes an obstacle. The obstacle here refers to an obstacle in the surrounding environment when the vehicle is running.
The first target in an image may be a single obstacle or a plurality of obstacles.
The second target includes at least one of a lane line, a zebra crossing, and a travelable area.
The travelable region refers to a region in which the vehicle can normally travel.
The second targets in an image may be one or more lane lines, one or more zebra crossings, and one or more drivable areas.
The present embodiment does not limit the types and numbers of the objects specifically included in the first object and the second object.
The input of the target detection model is an image, and the output of the target detection model is the labeling information of the first target.
The input of the semantic segmentation model is an image, and the output of the semantic segmentation model is the labeling information of the second target.
S103, fusing the labeling information of the first target and the labeling information of the second target to obtain the labeling information of the image.
The labeling information of the first target is used for representing the information of the first target in the image; the labeling information of the second target is used for representing the information of the second target in the image.
The image is input to the target detection model and the semantic segmentation model, and the outputs of the two models, i.e., the labeling information from each model, are fused to obtain the labeling information of the image.
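As an illustration of S102-S103, the following sketch runs one image through an off-the-shelf detector and segmenter and fuses the two outputs into a single labeling structure. The specific models (a YOLOv5 detector loaded via torch.hub and a torchvision DeepLabv3, matching the model choices described later), the file name frame.jpg and the normalization constants are assumptions of the sketch, not requirements of the scheme:

    import cv2
    import torch
    import torchvision

    # Stand-ins for the target detection model and the semantic segmentation model.
    detector = torch.hub.load('ultralytics/yolov5', 'yolov5s')
    segmenter = torchvision.models.segmentation.deeplabv3_resnet50(weights='DEFAULT')
    segmenter.eval()

    image = cv2.imread('frame.jpg')              # BGR image from the camera pipeline
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    # S102a: labeling information of the first target -- boxes, confidences, classes.
    boxes = detector(rgb).xyxy[0]                # columns: x1, y1, x2, y2, conf, class

    # S102b: labeling information of the second target -- a per-pixel class map.
    normalize = torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                                 std=[0.229, 0.224, 0.225])
    tensor = normalize(torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0)
    with torch.no_grad():
        seg_map = segmenter(tensor.unsqueeze(0))['out'].argmax(dim=1)[0]

    # S103: fuse the two outputs into the labeling information of the image.
    labeling_info = {'first_target': boxes.tolist(),
                     'second_target': seg_map.cpu().numpy()}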
And S104, determining vehicle control information according to the labeling information of the image, and controlling the automatic driving vehicle to run according to the vehicle control information.
By using the technical scheme of the embodiment of the application, labeling information of obstacles around the vehicle and labeling information of at least one of lane lines, zebra crossings and drivable areas can be obtained, providing more information for the decision-making and control of the vehicle and improving the safety of automatic driving.
The automatic driving vehicle makes decisions and exerts control according to the information of the obstacles and of at least one of the lane lines, zebra crossings and drivable areas, improving safety during automatic driving.
In addition, with the technical scheme of the embodiment of the application, recognition is performed by two models and the two recognition results are fused before being output, which improves the accuracy of the recognition result and therefore the safety of automatic driving.
The present application further provides another autonomous vehicle control method. The automatic driving vehicle control method of the embodiment of the application comprises S201-S208, and the method is used for realizing automatic driving of the vehicle.
S201, acquiring a first target detection model and a first semantic segmentation model.
The first object detection model is used to identify and locate a first object in the image. The first target may include an obstacle (a vehicle, a pedestrian, and the like), a traffic signal, and the like.
And detecting the traffic signal lamp to control the vehicle to run according to the indication of the signal lamp.
The first semantic segmentation model is used for performing semantic segmentation on the image and determining a second target in the image. The second target may include a lane line, a zebra crossing, a vehicle drivable area, and the like.
The first target detection model and the first semantic segmentation model may be initial models.
Specifically, for obstacles and traffic signal lights: because they appear at different distances in a driving scene, their display sizes in the vehicle-mounted camera image differ, i.e., they occupy different numbers of pixels in the image; there are also many types of traffic obstacles and traffic signal lights, and what the control system needs is a concrete coordinate position that can be mapped back to actual physical coordinates. The target detection model covers large targets, small targets and a wide range of detection classes well, and its result returns both the class information of obstacles and traffic lights and their coordinate information; the coordinates can be provided to the decision control system to be mapped back to actual physical coordinates, assisting the decision control system in controlling the vehicle.
For zebra crossings, lane lines and drivable areas: these occupy a large proportion of the video image data collected by the vehicle-mounted camera, i.e., in a typical automatic driving scene image they account for a high share of the pixels. The semantic segmentation model classifies the pixels of an image one by one and has strong classification capability, so it can distinguish well the zebra crossings, lane lines, drivable areas and similar information in images collected by the vehicle-mounted camera.
The first target detection model is trained using a first target detection data set, and the first semantic segmentation model is trained using a first semantic segmentation data set.
The images in the first target detection dataset and the first semantically segmented dataset may be the same or different.
In some possible cases, the images in the first target detection dataset and the first semantic segmentation dataset are the same, at which point the annotation information for the images utilized by the two models is different.
In some possible cases, the first target detection dataset and the first semantically segmented dataset may be public autopilot datasets.
S202, acquiring data by using a camera on the vehicle to obtain an image set.
The image set comprises at least one image.
Generally, the data collected by a camera on a vehicle is video data. After the video data is collected by the camera, frame extraction can be performed on it to obtain one or more images, and the obtained images serve as the images of the image set.
Further, the one or more images may be filtered to obtain the image set.
Specifically, the one or more images may be filtered according to a preset condition.
For example, images containing a target object may be selected from the one or more images, or images whose parameters meet preset requirements may be selected, so as to obtain the image set and reduce the amount of data to be processed.
An image containing a target object is one that includes, for example, a vehicle or a pedestrian (i.e., images without relevant content are screened out).
An image whose parameters meet preset requirements is, for example, one whose brightness exceeds a preset brightness threshold, or whose sharpness exceeds a preset sharpness threshold (i.e., blurred images are filtered out).
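A minimal OpenCV sketch of the frame extraction and screening just described; the sampling interval and the brightness and sharpness thresholds are illustrative assumptions, not values prescribed by the scheme:

    import cv2

    def extract_and_screen(video_path, every_n=10,
                           min_brightness=40.0, min_sharpness=100.0):
        """Extract every n-th frame, then screen out dark or blurred images."""
        kept, index = [], 0
        capture = cv2.VideoCapture(video_path)
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            if index % every_n == 0:
                gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
                brightness = gray.mean()                           # low -> too dark
                sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()  # low -> blurred
                if brightness >= min_brightness and sharpness >= min_sharpness:
                    kept.append(frame)
            index += 1
        capture.release()
        return kept  # the screened image set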
In some possible cases, the camera for collecting data may be an onboard monocular camera.
The vehicle in S202 may be a vehicle dedicated to data collection, or may be a vehicle having an automatic driving function, which is not limited in this embodiment.
In some possible cases, the images may be captured in different driving scenes.
S203, respectively inputting the images in the image set to the first target detection model and the first semantic segmentation model to obtain a second target detection data set and a second semantic segmentation data set.
The image data in the image set is taken as the input of the first target detection model to obtain labeling information of the first target; the second target detection data set comprises the images in the image set and the labeling information of the first target in those images.
The image data in the image set is taken as the input of the first semantic segmentation model to obtain labeling information of the second target; the second semantic segmentation data set comprises the images in the image set and the labeling information of the second target in those images.
In practical applications, the labeling information produced by the first target detection model and the first semantic segmentation model may be accurate in some cases and inaccurate in others.
Therefore, the annotation information of the first target detection model can be screened to obtain a third target detection data set containing accurate annotation information, and the annotation information of the first semantic segmentation model can be screened to obtain a third semantic segmentation data set containing accurate annotation information.
In some possible cases, whether the annotation information of the model is accurate or not can be determined according to a preset accuracy condition.
Specifically, the third target detection data set includes (part or all of) the images in the image set and the label information of the first target detection model corresponding to the images, and the label information meets a preset accuracy condition; the third semantic segmentation data set comprises images (part or all) in the image set and annotation information of the first semantic segmentation model corresponding to the images, and the annotation information meets a preset accuracy condition.
In some possible cases, an image in which the annotation information of the model does not meet the accuracy condition in the image set can be obtained.
Furthermore, for images whose labeling information from the first target detection model or the first semantic segmentation model does not meet the preset accuracy condition, more accurate labeling information can be obtained in other ways, for example by manual labeling.
At this point, a fourth target detection dataset and a fourth semantically segmented dataset may be determined.
In the fourth target detection dataset and the fourth semantic segmentation dataset, the annotation information of the image is obtained through other methods (such as manual annotation) described above.
In some possible implementations, in the second target detection data set, data with inaccurate labeling information may be replaced with data from the fourth target detection data set; in the second semantic segmentation data set, data with inaccurate labeling information may be replaced with data from the fourth semantic segmentation data set, thereby improving the accuracy of the data in both data sets.
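One way to realize the "preset accuracy condition" is a confidence threshold on the model's outputs: confident predictions become auto-labeled data, and everything else is routed to manual labeling. A sketch under that assumption (the threshold value and the YOLOv5-style detector interface are illustrative, not the embodiment's mandated criterion):

    CONF_THRESHOLD = 0.8  # illustrative stand-in for the preset accuracy condition

    def screen_labels(images, detector):
        """Split model output into accurately labeled data and images for manual labeling."""
        auto_labeled, needs_manual = [], []
        for image in images:
            boxes = detector(image).xyxy[0]      # columns: x1, y1, x2, y2, conf, class
            if len(boxes) > 0 and float(boxes[:, 4].min()) >= CONF_THRESHOLD:
                auto_labeled.append((image, boxes))   # goes into the third dataset
            else:
                needs_manual.append(image)            # manual labeling -> fourth dataset
        return auto_labeled, needs_manual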
In some possible cases, the first target detection dataset and the first semantic segmentation dataset may be public autopilot datasets, i.e., publicly available data such as data published on a network. In this embodiment, since the data in the image set is captured by the vehicle's own camera, the second target detection data set and the second semantic segmentation data set can be understood as private data sets.
For the above process of obtaining the private data set, refer to fig. 1B, where fig. 1B is a schematic diagram of obtaining the private data set according to an embodiment of the present application.
The acquisition of the private data set is briefly described below in conjunction with fig. 1B; the details have already been described above.
As shown in fig. 1B, model training is performed by using the public data set to obtain a perception model, where the perception model includes a model for object detection and a model for semantic segmentation, that is, the first object detection model and the first semantic segmentation model in the above description.
And acquiring video data by using a camera, and performing frame extraction and screening processing on the video to obtain an image.
And identifying the obtained image by using the perception model to obtain an identification result.
The recognition results can be divided into accurate recognition results and fuzzy recognition results, a fuzzy result being an inaccurate one.
And storing the accurate recognition result, and manually labeling the picture corresponding to the fuzzy recognition result to obtain a manually labeled result.
The accurate recognition results and the manually labeled results together form the private data set.
S204, training the first target detection model by using the second target detection data set to obtain a second target detection model; and training the first semantic segmentation model by using the second semantic segmentation data set to obtain a second semantic segmentation model.
The second target detection model and the second semantic segmentation model are the trained versions of the first target detection model and the first semantic segmentation model, trained on the second target detection dataset and the second semantic segmentation dataset, respectively.
In some possible implementations, the first precision is used in training the model.
For example, a first target detection model and a first semantic segmentation model are trained with a single precision (FP32 precision). Namely, after training, a second target detection model with FP32 precision and a second semantic segmentation model with FP32 precision are obtained.
In some possible implementation manners, the first target detection data set and the second target detection data set are fused, and the second target detection model is trained again by using the fused data sets; the first semantic segmentation data set and the second semantic segmentation data set can be fused, and the second semantic segmentation model is trained again by utilizing the fused data set, so that the accuracy of model prediction is improved.
For autonomous driving techniques, exploiting the modeling capability of deep learning generally requires a good data set, and some public autopilot data sets already exist. With the technical scheme of this embodiment, the performance of the model can be improved by using a private data set, or by using a public data set together with vehicle-collected data. For example, when collecting data to build the image set, vehicle driving scenes can be chosen in a targeted manner, so that after the private data set is used the performance of the model is optimized and its accuracy improved.
In this embodiment, the effective information in the vehicle-mounted image can be fully utilized by using the labeling information of the two models, so that the fast and accurate key information is provided for the control decision algorithm.
The following is an implementation manner of the structures of the second object detection model and the second semantic segmentation model provided in this embodiment.
The second target detection model may be the YOLOv5 target detection model.
The YOLOv5 target detection model can be divided into three parts: the first part is the backbone network, responsible for extracting image features; the second part is the neck network, responsible for fusing features; the third part is the head network, responsible for the specific task processing.
In this embodiment, the backbone network may be composed of Focus, Darknet53, BottleneckCSP and SPP deep learning modules; the neck network may employ a PANet module, and the head network may employ a CNN module.
Furthermore, in the process of training the target detection model with the data set, mosaic data enhancement can be used to improve the training effect. Mosaic data enhancement refers to splicing four images into one image, as sketched below.
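A simplified fixed-grid sketch of mosaic enhancement (YOLOv5's actual implementation picks a random mosaic center and remaps the bounding-box labels accordingly; both details are omitted here as assumptions of the simplification):

    import cv2
    import numpy as np

    def mosaic(images, size=640):
        """Splice four images into one training image on a fixed 2x2 grid."""
        half = size // 2
        tiles = [cv2.resize(img, (half, half)) for img in images[:4]]
        top = np.hstack(tiles[:2])
        bottom = np.hstack(tiles[2:])
        return np.vstack([top, bottom])  # box labels must be scaled/shifted the same way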
The second semantic segmentation model may be a DeepLabv3 semantic segmentation model.
The DeepLabv3 semantic segmentation model can be divided into two parts. The first part is the encoder, for which atrous (hole) convolutions and a convolutional neural network can be used; the second part is the decoder, for which an upsampling module and a convolutional neural network can be used. Training the encoder and the decoder with the data set yields the trained model.
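The encoder's characteristic building block is the atrous (hole) convolution, which enlarges the receptive field without adding parameters. The sketch below shows a dilated convolution and a ready-made DeepLabv3; using torchvision's pre-trained deeplabv3_resnet50 is an assumption for illustration rather than the embodiment's exact network:

    import torch
    import torchvision

    # A 3x3 convolution with dilation 2 covers a 5x5 area with only 9 weights.
    atrous = torch.nn.Conv2d(3, 16, kernel_size=3, padding=2, dilation=2)

    model = torchvision.models.segmentation.deeplabv3_resnet50(weights='DEFAULT')
    model.eval()
    with torch.no_grad():
        out = model(torch.rand(1, 3, 480, 640))['out']
    print(out.shape)  # (1, num_classes, 480, 640): one class score per pixel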
S205, deploying a second target detection model and a second semantic segmentation model.
And deploying the second target detection model and the second semantic segmentation model so that the second target detection model and the second semantic segmentation model realize functions in the automatic driving process of the vehicle.
In training the model, in order to optimize the performance of the model, training can be performed with higher precision, such as a second target detection model with FP32 precision and a second semantic segmentation model with FP32 precision.
Typically, the training of the model is done on the server. Due to the limited computing power of the vehicle, the accuracy of the model can be optimized to adapt to the hardware condition of the vehicle when the model is deployed.
That is, the second object detection model and the second semantic segmentation model are deployed on the vehicle and the accuracy of the second object detection model and the second semantic segmentation model is reduced at the time of deployment.
Specifically, a model of a first accuracy is converted into a model of a second accuracy, the first accuracy being greater than the second accuracy.
For example, the first precision is single precision (FP32 precision), and the second precision is half precision (FP16 precision).
At this time, the precision of the model is reduced, yielding a second target detection model with FP16 precision and a second semantic segmentation model with FP16 precision.
Weighing real-time performance against model accuracy, the technical scheme of this embodiment trains the model with FP32 precision in the training stage and deploys it with FP16 precision in the deployment stage. The deployed model loses a little accuracy, but its real-time performance improves and its size is reduced.
That is, converting the model precision reduces the model size and speeds up obtaining recognition results from the model.
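In a PyTorch-style deployment (an assumption of this sketch; a TensorRT or ONNX export would follow the same idea), the conversion is a one-line cast of the trained FP32 weights:

    import torch

    model_fp32 = torch.nn.Sequential(              # stand-in for a trained FP32 model
        torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),
        torch.nn.ReLU())

    model_fp16 = model_fp32.half().eval()          # weights become torch.float16
    assert next(model_fp16.parameters()).dtype == torch.float16  # ~half the size

    if torch.cuda.is_available():                  # FP16 inference targets the GPU
        x = torch.rand(1, 3, 480, 640, device='cuda', dtype=torch.float16)
        with torch.no_grad():
            y = model_fp16.cuda()(x)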
The following is an implementation manner of deploying the second target detection model and the second semantic segmentation model provided in this embodiment.
In some possible implementations, a deep learning perceptual model may be deployed to the in-vehicle image recognition module, where the deep learning perceptual model includes: the system comprises a second target detection model, a second semantic segmentation model, an image preprocessing module and a result fusion module.
The deployment of the second target detection model and the second semantic segmentation model is realized by deploying the deep learning perception model to the vehicle-mounted image recognition module.
During automatic driving, the vehicle-mounted camera collects data and sends it to the vehicle-mounted image recognition module; the vehicle-mounted image recognition module sends the data to the deep learning perception model to obtain the final recognition result, which is then sent to the control decision algorithm. The recognition result includes the labeling information of the first target and the second target.
Specifically, the recognition result of the deep learning perception model is obtained by fusing the labeling information of the second target detection model and the labeling information of the second semantic segmentation model.
The following describes an image preprocessing module and a result fusion module in the vehicle-mounted image recognition module.
Since the vehicle-mounted camera usually collects video data while the models process image data, the image preprocessing module can be used to extract frames from the video data to obtain multiple frames of pictures, and to input those frames into the models for processing.
In some possible cases, the resolution of an image obtained by frame extraction from the video data collected by the vehicle-mounted camera does not necessarily meet the requirements of model processing. The image preprocessing module can therefore adjust the resolution of the extracted frames; for example, the extracted frames are usually large, and the preprocessing module reduces their resolution to meet the models' requirements.
The result fusion module can be used for fusing the results obtained by the second target detection model and the second semantic segmentation model. Specifically, the result fusion module may be configured to label the target detection result and the semantic segmentation result on the same image, so as to obtain fused label information.
In some possible implementation manners, the first target detection data set and the second target detection data set may be fused, and the deep learning perception model may be trained by using the fused data sets to optimize performance of the model.
It is to be understood that the above-described correlation process (S201-S205) for model training and model deployment may not be performed during the autonomous driving process.
S206, inputting the image into the second target detection model to obtain the labeling information of the first target of the image, and inputting the image into the second semantic segmentation model to obtain the labeling information of the second target of the image.
The image input to the model is obtained from data collected by the autonomous vehicle, for example, an image collected by a camera on the autonomous vehicle, or an image obtained by processing data collected by a camera on the autonomous vehicle (for example, an image obtained by performing frame extraction on video).
The first target includes an obstacle and a traffic light; the second targets include lane lines, zebra crossings, and travelable areas.
And S207, fusing the labeling information of the first target and the labeling information of the second target to obtain the labeling information of the image.
After the labeling information of the first target and the labeling information of the second target are fused, both can be obtained from the resulting labeling information of the image.
For example, the labeling information of the first target is represented by displaying the first target with a rectangular frame on the image, and the labeling information of the second target by displaying different second targets in different colors. After fusion, when the image contains both a first target and a second target, the labeling information of the image displays the first target with a rectangular frame and the different second targets in different colors, as in the sketch below.
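A sketch of the result fusion just described, drawing first targets as rectangles and second targets as translucent colored regions; the class ids and colors in the palette are illustrative assumptions:

    import cv2
    import numpy as np

    # Hypothetical class ids: 1 = lane line, 2 = zebra crossing, 3 = drivable area.
    PALETTE = {1: (0, 0, 255), 2: (0, 255, 255), 3: (0, 255, 0)}  # BGR colors

    def fuse_annotations(image, boxes, seg_map, alpha=0.4):
        """Overlay segmentation colors, then draw detection rectangles on top."""
        overlay = image.copy()
        for class_id, color in PALETTE.items():
            overlay[seg_map == class_id] = color
        fused = cv2.addWeighted(overlay, alpha, image, 1 - alpha, 0)
        for x1, y1, x2, y2, conf, cls in boxes:    # first targets: rectangular frames
            cv2.rectangle(fused, (int(x1), int(y1)), (int(x2), int(y2)), (255, 0, 0), 2)
        return fused  # the image carrying the fused labeling information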
Referring to the above description, in one possible implementation, the deep learning perception model includes a second target detection model, a second semantic segmentation model, an image preprocessing module, and a result fusion module.
At this time, S206 and S207 are: inputting an image to a deep learning perception model to obtain the labeling information of the image; and the annotation information of the image is obtained by fusing the annotation information of the first target and the annotation information of the second target.
The following describes the labeling information of the obtained image with reference to the drawings.
Referring to fig. 1C, fig. 1C is a schematic diagram of obtaining annotation information of an image according to an embodiment of the present application.
As shown in fig. 1C, after the image is preprocessed, the image enters a target detection model and a semantic segmentation model for processing, that is, enters the second target detection model and the second semantic segmentation model for processing, and the obtained results are fused (which can be completed by the result fusion module) to obtain the labeling information of the image.
It should be understood that the models used to process the images corresponding to the data collected by the autonomous vehicle may be the precision-converted models. That is, after the image is preprocessed, it is processed by the converted-precision models to obtain the labeling information of the targets, and the labeling information of the targets in the image is then fused to obtain the labeling information of the image.
And S208, determining vehicle control information according to the labeling information of the image, and controlling the automatic driving vehicle to run according to the vehicle control information.
After the label information of the image is obtained, the label information can be used for obtaining the surrounding environment information of the automatic driving vehicle, so that decision and control can be carried out according to the surrounding environment information, and automatic driving of the vehicle is realized.
By using the technical scheme of the embodiment of the application, labeling information of obstacles around the vehicle and labeling information of at least one of lane lines, zebra crossings and drivable areas can be obtained, providing more information for the decision-making and control of the vehicle and improving the safety of automatic driving.
The automatic driving vehicle makes decisions and exerts control according to the information of the obstacles and of at least one of the lane lines, zebra crossings and drivable areas, improving safety during automatic driving.
In addition, with the technical scheme of the embodiment of the application, recognition is performed by two models and the two recognition results are fused before being output, which improves the accuracy of the recognition result and therefore the safety of automatic driving.
The application also provides an autonomous vehicle control system.
Referring to fig. 2, fig. 2 is a schematic structural diagram of an automatic driving vehicle control system according to an embodiment of the present disclosure.
As shown in fig. 2, the autonomous vehicle control system 200 of the present embodiment includes a camera 201, an environment sensing unit 202, a vehicle decision unit 203, and a vehicle control unit 204 to improve safety of vehicle autonomous driving.
The camera 201 is mounted on the autonomous vehicle for collecting data.
The environment sensing unit 202 is used for obtaining one or more images according to data collected by the camera 201 on the automatic driving vehicle; inputting one or more images into a target detection model and a semantic segmentation model to obtain the labeling information of a first target of the one or more images and the labeling information of a second target of the one or more images; wherein the first target comprises an obstacle and the second target comprises at least one of a lane line, a zebra crossing, and a travelable area.
The vehicle decision unit 203 is configured to determine vehicle control information according to the annotation information of the image, where the annotation information of the image is obtained by fusing the annotation information of the first target and the annotation information of the second target.
The vehicle control unit 204 is configured to control the autonomous vehicle to travel according to the vehicle control information.
Further, the target detection model in this embodiment may be a YOLOv5 target detection model, and the semantic segmentation model may be a DeepLabv3 semantic segmentation model.
Technical effects that can be achieved by the devices and units included in the above-described autonomous vehicle control system have been described in the above embodiments and, to avoid repetition, are not described here again.
The present application further provides an autonomous vehicle.
The autonomous vehicle in this embodiment includes the autonomous vehicle control system of any of the above embodiments to improve safety of autonomous driving of the vehicle.
The structure of the autonomous vehicle control system included in the autonomous vehicle and the technical effects that can be achieved have been described in the above embodiments, and are not described herein again to avoid repetition.
In an embodiment of the present application, a computer-readable storage medium is further provided, where the computer-readable storage medium is used for storing a computer program, and the computer program is used for executing the above automatic driving vehicle control method, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An autonomous vehicle control method, characterized in that the method comprises:
obtaining one or more images according to data collected by a camera on the autonomous vehicle;
inputting the one or more images to a target detection model and a semantic segmentation model, and respectively obtaining the labeling information of a first target of the one or more images and the labeling information of a second target of the one or more images; wherein the first target comprises an obstacle, the second target comprises at least one of a lane line, a zebra crossing, and a drivable area;
fusing the labeling information of the first target and the labeling information of the second target to obtain labeling information of the image;
and determining vehicle control information according to the labeling information of the image, and controlling the automatic driving vehicle to run according to the vehicle control information.
2. The method of claim 1, wherein the target detection model comprises a YOLOv5 target detection model.
3. The method of claim 1, wherein the semantic segmentation model comprises a DeepLabv3 semantic segmentation model.
4. The method of claim 1, wherein the first target further comprises a traffic light.
5. The method of claim 1, wherein prior to said inputting the one or more images into a target detection model and a semantic segmentation model, obtaining annotation information for a first target of the one or more images, and annotation information for a second target of the one or more images, the method further comprises:
reducing the precision of the semantic segmentation model.
6. The method of claim 1, wherein prior to said inputting the one or more images into a target detection model and a semantic segmentation model, obtaining annotation information for a first target of the one or more images, and annotation information for a second target of the one or more images, the method further comprises:
reducing the precision of the target detection model.
7. An autonomous vehicle control system, characterized by comprising a camera, an environment sensing unit, a vehicle decision unit, and a vehicle control unit, wherein:
the camera is assembled on the automatic driving vehicle and used for acquiring data;
the environment sensing unit is used for: obtaining one or more images according to data collected by a camera on the autonomous vehicle; inputting the one or more images into a target detection model and a semantic segmentation model to obtain the labeling information of a first target of the one or more images and the labeling information of a second target of the one or more images; wherein the first target comprises an obstacle, the second target comprises at least one of a lane line, a zebra crossing, and a drivable area;
the vehicle decision unit is used for determining vehicle control information according to the label information of the image, wherein the label information of the image is obtained by fusing the label information of the first target and the label information of the second target;
and the vehicle control unit is used for controlling the automatic driving vehicle to run according to the vehicle control information.
8. The system of claim 7, wherein the target detection model comprises a YOLOv5 target detection model, and/or the semantic segmentation model comprises a DeepLabv3 semantic segmentation model.
9. An autonomous vehicle, characterized in that it comprises an autonomous vehicle control system according to any of claims 7 to 8.
10. A computer-readable storage medium for storing a computer program for executing the autonomous vehicle control method of any of claims 1 to 6.
CN202210133889.5A 2022-02-14 2022-02-14 Automatic driving vehicle control method and system and automatic driving vehicle Pending CN114419603A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202210133889.5A | 2022-02-14 | 2022-02-14 | Automatic driving vehicle control method and system and automatic driving vehicle

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202210133889.5A | 2022-02-14 | 2022-02-14 | Automatic driving vehicle control method and system and automatic driving vehicle

Publications (1)

Publication Number | Publication Date
CN114419603A (en) | 2022-04-29

Family

ID=81261910

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202210133889.5A | Automatic driving vehicle control method and system and automatic driving vehicle (Pending) | 2022-02-14 | 2022-02-14

Country Status (1)

Country Link
CN (1) CN114419603A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN115297230A (en) * | 2022-07-01 | 2022-11-04 | 智己汽车科技有限公司 | System and method for reusing electronic outer rearview mirror and intelligent driving side rearview camera
CN115297230B (en) * | 2022-07-01 | 2024-05-14 | 智己汽车科技有限公司 | System and method for multiplexing electronic exterior rearview mirror and intelligent driving side rearview camera
CN115179920A (en) * | 2022-09-07 | 2022-10-14 | 北京中科慧眼科技有限公司 | Vehicle running system adjusting method and system based on cross-country scene


Legal Events

Code: Title
PB01: Publication
SE01: Entry into force of request for substantive examination