WO2020062433A1 - Neural network model training and universal ground line detection method - Google Patents


Info

Publication number
WO2020062433A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
ground
image
network model
road
Application number
PCT/CN2018/113661
Inventor
年素磊
梁继
Original Assignee
初速度(苏州)科技有限公司
Priority date
2018-09-29
Filing date
2018-11-02
Publication date
2020-04-02
Application filed by 初速度(苏州)科技有限公司
Publication of WO2020062433A1 (zh)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion

Definitions

  • The invention belongs to the field of intelligent driving and, more particularly, relates to a universal ground line detection method.
  • One aspect of the present invention provides a method for training a neural network model, characterized in that:
  • the training method includes the following training steps. Step 11: obtain road samples, in which each road sample image is labeled with the drivable area and with a pair of ground points for each dynamic object at the boundary;
  • Step 12: input the road sample image into an initialized neural network model;
  • Step 13: train the initialized neural network model with the labeled road sample images; the loss function of the neural network is:
  • $L = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}}\sum_i L_{reg}(t_i, t_i^*) + \gamma \frac{1}{N_{seg}}\sum_i L_{seg}(s_i, s_i^*)$
  • $L$ represents the loss function;
  • $p_i$, $t_i$, $s_i$ are the predicted values of the pixel at the same position in the ground point classification map, the ground point distance map, and the drivable area segmentation map, respectively;
  • $p_i^*$, $t_i^*$, $s_i^*$ are the corresponding label values;
  • $L_{cls}$ is the classification loss, preferably cross-entropy; $1/N_{cls}$ normalizes over all pixels involved in the computation;
  • $L_{reg}$ is the regression loss, preferably mean squared error;
  • $L_{seg}$ is a cross-entropy loss; $1/N_{reg}$ and $1/N_{seg}$ normalize over the pixels involved in the regression and segmentation computations, respectively;
  • $\lambda$ and $\gamma$ are weighting coefficients.
  • Preferably, in step 11, the road sample image and the labeled drivable area map are scaled to a preset size.
  • For objects in the image that overlap the road surface, the boundary line where each object meets the actual road surface is the ground line, and the two endpoints of the ground line are the ground points.
  • Preferably, step 12 includes steps 121 and 122:
  • Step 121: input the road sample image into the encoder part of the initialized neural network model;
  • Step 122: input the image features obtained by the encoder into the decoder of the initialized neural network model to obtain a drivable area segmentation result and a ground line detection result.
  • Preferably, the decoder includes a drivable area segmentation branch and a ground line detection branch.
  • According to another aspect, the present invention provides a neural network model obtained by any of the training methods described above.
  • According to another aspect, a method for detecting universal ground lines using the neural network model training method of any one of claims 1-5 is provided, wherein the detection method includes the following steps:
  • Step 1: perform monocular calibration of the camera device that acquires the images, and record and store its intrinsic parameters and distortion parameters;
  • Step 2: input the image obtained in step 1 into the trained neural network to obtain a drivable area segmentation map, ground points, and ground lines.
  • Preferably, step 1 further includes scaling the image to a preset size using bilinear interpolation.
  • The inventive points of the present invention include, but are not limited to, the following aspects:
  • The method uses a pre-trained drivable area detection model to recognize the current road image captured by the vehicle.
  • The model extracts and learns features of the current road image. While segmenting the image into drivable and obstacle areas, it also detects ground lines and their corresponding object categories. By further distinguishing instances of dynamic objects, different categories of ground lines can be produced, and physical quantities that matter to planning algorithms, such as object speed, can then be estimated from how object boundaries change over consecutive frames.
  • FIG. 1 is an example diagram of the structured boundary of a drivable area provided by an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of the encoder and of the drivable area segmentation network in the decoder of the network according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of the structure used for feature fusion in the encoder of the network according to an embodiment of the present invention.
  • The present invention processes the current road image to obtain a structured representation of the boundary of the drivable area in that image.
  • The vehicle can plan a driving strategy based on this structured representation.
  • FIG. 1 shows an example of the structured boundary of a drivable area provided by an embodiment of the present invention, including a universal ground line detection result.
  • The shaded area is the recognized drivable area; the other areas are non-drivable.
  • Embodiments of the present invention also give a structured representation of the objects at the drivable area boundary.
  • These boundary objects can be divided into static objects and dynamic objects.
  • Static objects include static obstacles such as road shoulders, fences, and traffic cones.
  • Dynamic objects include freely moving objects such as cars, electric vehicles, bicycles, and pedestrians.
  • Embodiments of the present invention represent dynamic objects by ground lines.
  • Ground lines are divided into categories according to the category of the object, or according to the side of an object of the same category.
  • For example, the line segments on the ground at the wheels of the electric vehicle and the car represent, respectively, the electric vehicle ground line, the front/rear ground line of the motor vehicle, and the left/right ground line of the motor vehicle.
  • The two endpoints of a ground line are ground points. There are two types of ground points: visible ground points and inferred ground points.
  • Step 1: obtain road sample images.
  • The drivable area and a pair of ground points for each dynamic object at the boundary should be labeled in each road sample image.
  • A road sample image can be regarded as a sample image for training the detection model.
  • The model is trained in a supervised manner, so the sample images need corresponding labels.
  • The drivable area must be labeled for every pixel in the image.
  • The ground points of dynamic objects at the borders of the drivable area in the sample image must also be labeled.
  • The two ground points at the ends of a ground line are labeled as a pair, and the type of each ground point is labeled as well.
  • The sample images come from a video stream captured by a camera.
  • The camera requires monocular calibration, and its intrinsic parameters and distortion parameters are recorded.
  • Each single frame is corrected according to the camera parameters so that the image becomes undistorted or nearly undistorted.
  • After the corrected images are labeled, a sample library can be built for model training.
  • Step 2: input the road sample images into the initialized neural network model.
  • To train the model, the road sample images must be input into the neural network.
  • Before input, the road sample image is scaled to a preset size.
  • Step 3: train the initialized neural network with the labeled road sample images.
  • A neural network is a network system formed by a large number of simple, widely interconnected processing units; its large number of adjustable parameters gives it strong learning ability.
  • A neural network model is a mathematical model built on a neural network; thanks to this learning ability, neural network models are widely used in many fields.
  • Convolutional neural network models are often used for pattern recognition. Because convolutional layers are locally connected and share weights, the number of parameters to be trained is greatly reduced, which simplifies the model and improves training efficiency.
  • A convolutional neural network can be used as the initialized neural network model.
  • Some of the convolutional layers extract features from the road sample images.
  • Subsequent convolutional layers map the relevant features to obtain the drivable area recognition result.
  • The neural network can also use these features to obtain ground point pair information.
  • The drivable area recognition result and the detected ground point pairs of dynamic objects at the drivable area boundary output by the network are compared with the drivable area and ground point pairs pre-labeled in the road sample image, and the parameters of the initial neural network model are optimized accordingly.
  • A trained drivable area and ground point detection model is thereby obtained.
  • The present application provides a training method for a model that detects the drivable area and the ground points of dynamic objects at its boundary.
  • The road sample images are labeled with the drivable area and with the ground points and categories of dynamic objects at its boundary.
  • The road sample images are input into the pre-established initial neural network model.
  • The road sample images are used to train the initial neural network model in a supervised manner.
  • Step 1: the road sample image is input into the convolutional neural network encoder.
  • The input road sample image is an RGB image.
  • The encoder is composed of convolutional layers, batch normalization, ReLU activation functions, and pooling layers.
  • A convolutional layer applies the same convolution kernel to different regions of an image to extract one type of feature, such as edges along a certain direction; sharing weights across regions greatly reduces the number of training parameters. Using several convolution kernels on different regions of the image yields several types of features.
  • The ReLU activation function is commonly used in convolutional neural networks and gives the whole network the ability to model nonlinearity.
  • The pooling layer reduces the feature size and the amount of computation.
  • Existing convolutional neural network models include VGG Net (Visual Geometry Group), AlexNet, Network in Network, the ResNet deep residual network, and others. These networks differ in depth, computational cost, and the accuracy of the extracted features.
  • The convolutional neural network model can be chosen according to the computing power of the onboard device and the required accuracy of drivable area and ground line detection.
  • The encoder in embodiments of the present application also includes a structure for feature fusion. Unlike classification networks, which extract abstract features, the subsequent drivable area segmentation and ground point detection require precise localization, so the encoder's features must include not only abstract semantic features but also concrete details. The encoder therefore uses a structure that fuses the features of each layer: as shown in FIG. 3, it fuses features of different depths by adding the convolutional features from those depths.
  • Step 2: the image features obtained by the encoder are input into the drivable area segmentation branch of the decoder to obtain the drivable area segmentation result.
  • The decoder includes convolutional layers, batch normalization, ReLU activation functions, and an upsampling part, which enlarges the feature map so that the final drivable area segmentation result is the same size as the original road sample image.
  • The upsampling uses deconvolution (transposed convolution). Deconvolution corresponds to the backward pass of convolution, and the output feature map size can be controlled through the deconvolution stride, so deconvolution can be used to implement upsampling.
  • Each pixel of the resulting feature map has 2 channels.
  • After a softmax, each pixel has a probability of being drivable and of being non-drivable; the class with the larger probability is the pixel's classification, which divides the road image into drivable and non-drivable areas.
  • Step 3: the image features obtained by the encoder are input into the ground line detection branch of the decoder.
  • This branch has the same input as the drivable area segmentation branch and a similar structure; only the number of output channels differs.
  • In this branch, the image features pass through convolution, batch normalization, ReLU activation, and upsampling, finally producing a feature map of the same size as the original image.
  • This feature map feeds two convolutional branches.
  • One branch produces a score map of the same size as the original image with C + 1 channels.
  • C is the number of ground point types, including visible and invisible ground points on the left/right sides of a motor vehicle, visible and invisible ground points on the front/rear sides of a motor vehicle, visible and invisible electric vehicle ground points, and so on; the extra channel is the background, i.e., not a ground point.
  • The scores of the K×K pixels around each pixel are summed as a vote and then passed through a softmax function to obtain the classification scores of that point.
  • The other branch produces a distance map of the same size as the original image.
  • The distance map has 4 channels: the horizontal and vertical distances Δx1 and Δy1 from the pixel to the center of its ground point, and the horizontal and vertical distances Δx2 and Δy2 to the center of the other ground point of the same ground line.
  • For pixels classified as ground points, the ground point center is obtained by voting with the distance-map values of the pixels in the surrounding K×K neighborhood. This gives candidate ground points, and non-maximum suppression is then applied to obtain the final ground points.
  • Specifically, for a given ground point class, the candidate with the highest score is selected and every other candidate of that class within distance d of it is removed; the selected candidate is then treated as processed, and the same is done for the remaining unprocessed candidates. When all classes have been processed this way, non-maximum suppression is complete and all ground points are determined.
  • The detected ground points must then be connected to determine ground lines. For a given ground point, another pixel is located using the distance obtained by voting over the K×K pixels around it; when that pixel is within distance c of another ground point and the two ground point types correspond, the two ground points are connected. After all ground points have been traversed, any that remain unconnected are discarded.
  • For a road sample image, the convolutional neural network outputs the drivable area segmentation map, the ground point classification map, and the ground point distance map.
  • From these outputs, the drivable area of the road sample image and the ground lines of obstacles at the drivable area boundary can be obtained.
  • During training, the drivable area segmentation map, ground point classification map, and ground point distance map are compared with the labels of the road sample image to train the neural network. The comparison and training method is explained in detail next.
  • Step 1: scale the road sample image and the labeled drivable area map to a preset size.
  • The preset size is 448×448.
  • The ground point classification map and ground point distance map are computed from the ground point coordinates and types in the labels.
  • Step 2: input the road sample image into the above convolutional neural network to obtain a drivable area segmentation map, a ground point classification map, and a ground point distance map. For each pixel of the drivable area segmentation map, compute the cross-entropy loss against the labeled drivable area map. Compute the cross-entropy loss between the ground point classification map output by the network and the one derived from the labels. For pixels labeled as ground points, compute the mean squared error between the pixel's value in the ground point distance map and the distance value derived from the labels. These three losses are then added, with weighting coefficients, to form the training loss:
  • $L = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}}\sum_i L_{reg}(t_i, t_i^*) + \gamma \frac{1}{N_{seg}}\sum_i L_{seg}(s_i, s_i^*)$
  • $L$ represents the loss function;
  • $p_i$, $t_i$, $s_i$ are the predicted values of the pixel at the same position in the ground point classification map, the ground point distance map, and the drivable area segmentation map, respectively;
  • $p_i^*$, $t_i^*$, $s_i^*$ are the corresponding label values;
  • $L_{cls}$ is the classification loss, here cross-entropy; $1/N_{cls}$ normalizes over all pixels involved in the computation; similarly, $L_{reg}$ is the regression loss, for which mean squared error or similar can be used;
  • $L_{seg}$ is likewise a cross-entropy loss; $1/N_{reg}$ and $1/N_{seg}$ normalize over the pixels involved in the regression and segmentation computations, respectively;
  • the three loss terms are combined with the coefficients $\lambda$ and $\gamma$.
  • Step 3: in one implementation of the embodiments of the present application, the initialized convolutional neural network is trained on the labeled road sample images for 16 epochs with the above loss function, a learning rate of 0.00001, and the Adam optimization algorithm. In other implementations, the number of epochs and the learning rate may be adjusted to the amount of data, and other gradient-descent-based optimizers may be used.
  • Step 1: perform monocular calibration of the camera, and record and store its intrinsic parameters and distortion parameters. After the video stream is obtained from the camera, a single video frame is taken, its distortion is corrected using these calibration parameters, and the image is then scaled to the preset size using bilinear interpolation.
  • Step 2: input the image obtained in step 1 into the convolutional neural network. After the network and the post-processing described above, the drivable area segmentation map and the ground points of obstacles at the drivable area boundary are obtained, as shown in FIG. 1.
  • Step 3: based on the ground lines of a dynamic object in different video frames, the object's speed can easily be estimated.
  • Ground lines can also easily be projected into 3D space to estimate the pose and distance of the object.
  • The intelligent system can plan the driving route more accurately based on the structured information provided by the ground lines and the drivable area information.
  • Each module or step of the embodiments of the present invention may be implemented by a general-purpose computing device. They may be concentrated on a single computing device or distributed over a network of multiple computing devices. Alternatively, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; in some cases, the steps shown or described can be performed in an order different from that given here, or they can be made into individual integrated circuit modules, or several of the modules or steps can be made into a single integrated circuit module. Thus, the embodiments of the present invention are not limited to any specific combination of hardware and software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

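A neural network training method, and a method for universal ground line detection using that neural network, in the field of intelligent driving. The prior art has the technical problem that the speed of the boundary is difficult to estimate, which hampers its use by planning algorithms. The invention provides a neural network and uses it for universal ground line detection, comprising step 1: performing monocular calibration of the camera device that acquires the images, and recording and storing the intrinsic parameters and distortion parameters of the camera device; and step 2: inputting the image obtained in step 1 into the trained neural network to obtain a drivable area segmentation map, ground points, and ground lines. While segmenting the current road image into drivable areas and obstacle areas, the method also detects ground lines and their corresponding object categories, and is more accurate and faster than conventional methods.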

Description

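A Neural Network Model Training Method and Universal Ground Line Detection Method
Technical Field
The present invention belongs to the field of intelligent driving and, more specifically, relates to a universal ground line detection method.
Background Art
With the development of science and technology, the concept of autonomous driving has been proposed. In autonomous driving, a vehicle can be equipped with an intelligent system that detects the current drivable area and drives within it.
Existing drivable area detection methods can process an image to obtain obstacle areas and drivable areas. However, the prior art cannot identify the types or the number of obstacles. This makes it difficult to estimate the speed of the boundary, which hampers its use by planning algorithms.
Summary of the Invention
In view of the above defects of, or improvement needs in, the prior art, one aspect of the present invention provides a method for training a neural network model, characterized in that:
the training method includes the following training steps. Step 11: obtain road samples, in which each road sample image is labeled with the drivable area and with a pair of ground points for each dynamic object at the boundary;
Step 12: input the road sample image into an initialized neural network model;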
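Step 13: train the initialized neural network model with the labeled road sample images, where the loss function of the neural network is:
$$L = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}}\sum_i L_{reg}(t_i, t_i^*) + \gamma \frac{1}{N_{seg}}\sum_i L_{seg}(s_i, s_i^*)$$
Here $L$ is the loss function; $p_i$, $t_i$, $s_i$ are the predicted values of the pixel at the same position in the ground point classification map, the ground point distance map, and the drivable area segmentation map, respectively, and $p_i^*$, $t_i^*$, $s_i^*$ are the corresponding label values. $L_{cls}$ is the classification loss, preferably cross-entropy; $1/N_{cls}$ normalizes over all pixels involved in the computation. $L_{reg}$ is the regression loss, preferably mean squared error, and $L_{seg}$ is a cross-entropy loss; $1/N_{reg}$ and $1/N_{seg}$ normalize over the pixels involved in the regression and segmentation computations, respectively. $\lambda$ and $\gamma$ are weighting coefficients. Preferably, in step 11, the road sample image and the labeled drivable area map are scaled to a preset size.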
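For objects in the image that overlap the road surface, the boundary line where each object meets the actual road surface is the ground line, and the two endpoints of the ground line are the ground points.
Preferably, step 12 includes step 121 and step 122:
Step 121: input the road sample image into the encoder part of the initialized neural network model;
Step 122: input the image features obtained by the encoder into the decoder of the initialized neural network model to obtain a drivable area segmentation result and a ground line detection result.
Preferably, the decoder includes a drivable area segmentation branch and a ground line detection branch.
According to another aspect of the present invention, a neural network model is provided, obtained by any of the training methods described above.
According to another aspect of the present invention, a method for detecting universal ground lines using the neural network model training method of any one of claims 1-5 is provided, characterized in that the detection method includes the following steps:
Step 1: perform monocular calibration of the camera device that acquires the images, and record and store the intrinsic parameters and distortion parameters of the camera device;
Step 2: input the image obtained in step 1 into the trained neural network to obtain a drivable area segmentation map, ground points, and ground lines.
Preferably, step 1 further includes scaling the image to a preset size using bilinear interpolation.
The inventive points of the present invention lie in, but are not limited to, the following aspects:
(1) The method uses a pre-trained drivable area detection model to recognize the current road image captured by the vehicle. The model extracts and learns features of the current road image and, while segmenting the image into drivable areas and obstacle areas, detects ground lines and their corresponding object categories. By further distinguishing instances of dynamic objects, different categories of ground lines can be produced, and from the changes of object boundaries over consecutive frames, physical quantities that are critical to planning algorithms, such as object speed, can be estimated.
(2) The loss function is built by adding three loss terms, namely a classification loss, a regression loss, and a segmentation loss, and it normalizes by the numbers of pixels involved in the regression and segmentation computations, which improves the training of the neural network.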
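Brief Description of the Drawings
FIG. 1 is an example diagram of the structured boundary of a drivable area provided by an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of the encoder and of the drivable area segmentation network in the decoder of the network provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of the structure used for feature fusion in the encoder of the network provided by an embodiment of the present invention.
Throughout the drawings, the same reference numerals denote the same elements or structures, wherein:
Detailed Description of Embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to embodiments and the accompanying drawings. The exemplary embodiments of the present invention and their descriptions serve to explain the invention and do not limit it.
By processing the current road image, the present invention obtains a structured representation of the boundary of the drivable area in that image; while driving, the vehicle can plan its driving strategy based on this structured representation.
FIG. 1 shows an example of the structured drivable area boundary provided by an embodiment of the present invention, including a universal ground line detection result. As shown in FIG. 1, the shaded part is the recognized drivable area and the rest is the non-drivable area. Embodiments of the present invention also give a structured representation of the objects at the boundary of the drivable area. These boundary objects can be divided into static objects and dynamic objects. Static objects include static obstacles such as road shoulders, fences, and traffic cones; dynamic objects include freely moving objects such as cars, electric vehicles, bicycles, and pedestrians. Since the speed of a static object does not need to be estimated, its boundary is represented by the drivable area boundary, i.e., the shaded part in FIG. 1. Dynamic objects are represented by ground lines in embodiments of the present invention. Ground lines are divided into categories according to the category of the object they belong to, or according to the side of an object of the same category. For example, in FIG. 1 the line segments on the ground at the wheels of the electric vehicle and the car represent, respectively, the electric vehicle ground line, the front/rear ground line of the motor vehicle, and the left/right ground line of the motor vehicle. The two endpoints of a ground line are ground points, which fall into two categories: visible ground points and inferred ground points.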
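Next, a specific implementation of the training method for the drivable area and ground line detection model provided by embodiments of the present application is described.
Step 1: obtain road sample images, in which the drivable area and a pair of ground points for each dynamic object at the boundary should be labeled.
A road sample image can be regarded as a sample image used to train the detection model. In embodiments of the present application the model is trained in a supervised manner, so the sample images need corresponding labels. The drivable area must be labeled for every pixel in the image. In addition, the ground points of dynamic objects at the boundary of the drivable area must be labeled: the two ground points at the ends of a ground line are labeled as a pair, and the type of each ground point is labeled as well.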
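The labeling requirements above (a per-pixel drivable mask, ground points labeled in pairs, and a type for every ground point) can be captured in a small record type. The following Python sketch is illustrative only: the class names, the type enumeration, and the convention that channel 0 of the score map is the background are assumptions, not part of the patent.

```python
from dataclasses import dataclass, field
from enum import Enum


class GroundPointType(Enum):
    # Hypothetical enumeration of the ground point types named in the text;
    # the patent ends its list with "and so on", so the full set is an assumption.
    CAR_SIDE_VISIBLE = 1
    CAR_SIDE_INVISIBLE = 2
    CAR_FRONT_REAR_VISIBLE = 3
    CAR_FRONT_REAR_INVISIBLE = 4
    EV_VISIBLE = 5
    EV_INVISIBLE = 6


# C ground point types plus one background channel (index 0, assumed).
NUM_CHANNELS = len(GroundPointType) + 1


@dataclass(frozen=True)
class GroundPoint:
    x: float
    y: float
    kind: GroundPointType


@dataclass
class GroundLine:
    # The two endpoints of one ground line, labeled as a pair.
    a: GroundPoint
    b: GroundPoint


@dataclass
class RoadSampleLabel:
    drivable_mask: list  # H x W nested list of 0/1, one label per pixel
    ground_lines: list = field(default_factory=list)

    def ground_points(self):
        """Flatten the labeled pairs into a list of individual ground points."""
        return [p for line in self.ground_lines for p in (line.a, line.b)]
```

Keeping the pair as a single `GroundLine` object makes it impossible to label one endpoint without its partner, which matches the requirement that both ends are labeled together.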
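A large number of sample images is needed to improve the accuracy of the detection model. In embodiments of the present application, the sample images come from a video stream captured by a camera. The camera requires monocular calibration, and its intrinsic parameters and distortion parameters are recorded. Each single frame is corrected according to the camera parameters so that the image becomes undistorted or nearly undistorted. After the corrected road sample images are labeled, a sample library can be built for model training.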
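The per-frame correction step can be sketched as an inverse mapping through the standard radial/tangential (Brown-Conrady) distortion model, which is the model used by OpenCV's calibration. This NumPy sketch assumes the coefficients are given in OpenCV order (k1, k2, p1, p2, k3), reuses the intrinsic matrix as the output camera matrix, and samples with nearest neighbour for brevity; the patent itself does not specify the distortion model.

```python
import numpy as np


def undistort(img, K, dist):
    """Remove lens distortion by inverse mapping (nearest-neighbour sampling).

    img:  H x W (x C) array
    K:    3x3 intrinsic matrix [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]
    dist: (k1, k2, p1, p2, k3) radial/tangential coefficients (OpenCV order, assumed)
    The undistorted image reuses K as its camera matrix (an assumption).
    """
    h, w = img.shape[:2]
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    k1, k2, p1, p2, k3 = dist
    # Normalised coordinates of every pixel in the undistorted output image.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx
    y = (v - cy) / fy
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    # Apply the forward distortion model to find where each output pixel came from.
    xd = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    yd = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    src_u = np.rint(fx * xd + cx).astype(int)
    src_v = np.rint(fy * yd + cy).astype(int)
    # Pixels whose source falls outside the frame stay zero.
    valid = (src_u >= 0) & (src_u < w) & (src_v >= 0) & (src_v < h)
    out = np.zeros_like(img)
    out[valid] = img[src_v[valid], src_u[valid]]
    return out
```

In practice, OpenCV's `cv2.undistort(img, K, dist)` performs this kind of mapping with interpolation and would normally be preferred; the sketch only makes the geometry explicit.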
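Step 2: input the road sample images into the initialized neural network model.
To train the neural network model, the road sample images must be input into the neural network. In some possible implementations of embodiments of the present application, the road sample images are scaled to a preset size before being input into the pre-established initial neural network model.
Step 3: train the initialized neural network with the labeled road sample images.
For ease of understanding, the concept of a neural network model is first briefly introduced. A neural network is a network system formed by a large number of simple, widely interconnected processing units; because it has a large number of adjustable parameters, it has strong learning ability. A neural network model is a mathematical model built on a neural network, and thanks to this strong learning ability, neural network models are widely used in many fields.
In image processing and pattern recognition, convolutional neural network models are often used. Because the convolutional layers in such models are locally connected and share weights, the number of parameters to be trained is greatly reduced, which simplifies the network model and improves training efficiency.
In this embodiment, a convolutional neural network can be used as the initialized neural network model. Some convolutional layers extract features from the road sample image; based on the extracted image features, subsequent convolutional layers map the relevant features to obtain the drivable area recognition result. The neural network can likewise use these features to obtain ground point pair information. The drivable area recognition result and the detected ground point pairs of dynamic objects at the drivable area boundary output by the network are compared with the drivable area and ground point pairs pre-labeled in the road sample image, and the parameters of the initial neural network model are optimized accordingly. After the initialized neural network has been trained on enough samples, a trained drivable area and ground point detection model is obtained.
In summary, the present application provides a training method for a model that detects the drivable area and the ground points of dynamic objects at its boundary. Road sample images are obtained in which the drivable area and the ground points and categories of dynamic objects at its boundary are labeled; the images are input into a pre-established initial neural network model, and that model is trained on them by supervised learning.
To make the technical solution of the present application clearer, the process by which a road sample image passes through the neural network to yield the drivable area and the obstacle ground lines is described in detail below with reference to a specific embodiment.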
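Step 1: the road sample image is input into the encoder part of the convolutional neural network, as shown in the left half of FIG. 2; the input road sample image is an RGB image. The encoder is composed of convolutional layers, batch normalization, ReLU activation functions, and pooling layers. A convolutional layer applies the same convolution kernel to different regions of an image to extract one type of feature, such as edges along a certain direction; the weights are shared across regions, which greatly reduces the number of training parameters. Further, using multiple convolution kernels on different regions of the image yields multiple types of features. Batch normalization normalizes the features of each layer, which makes the subsequent training easier to converge and reduces overfitting. The ReLU activation function is commonly used in convolutional neural networks and gives the whole network the ability to model nonlinearity. The pooling layers reduce the feature size and the amount of computation, and also make the network somewhat robust to translation. Existing convolutional neural network models include VGG Net (Visual Geometry Group), AlexNet, Network in Network, the ResNet deep residual network, and others; these networks differ in depth, computational cost, and the accuracy of the extracted features. In embodiments of the present application, the model can be chosen according to the computing power of the onboard device and the required accuracy of drivable area and ground line detection. The encoder in embodiments of the present application also includes a structure for feature fusion. Unlike classification networks, which extract abstract features, the subsequent drivable area segmentation and ground point detection both require fairly precise localization, so the encoder's features must contain not only abstract semantic features but also concrete detail features of the road sample image. The encoder therefore uses a structure that fuses the features of each layer: as shown in FIG. 3, it fuses features of different depths by adding together the convolutional features taken at those depths.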
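The fusion described for FIG. 3 (adding features taken from different depths) can be illustrated as follows. Because deeper features are spatially smaller, they must be resized before they can be added; nearest-neighbour upsampling and equal channel counts are assumptions made here, since the text only says that the features are added.

```python
import numpy as np


def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a C x H x W feature map by an integer factor."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)


def fuse_by_addition(shallow, deep):
    """Fuse a shallow (high-res) and a deep (low-res) feature map by element-wise addition.

    Assumes both maps already have the same channel count (a real network would
    usually project them with a 1x1 convolution first) and that the deep map's
    spatial size divides the shallow one exactly.
    """
    factor = shallow.shape[1] // deep.shape[1]
    up = upsample_nearest(deep, factor)
    if up.shape != shallow.shape:
        raise ValueError("spatial sizes must match after upsampling")
    return shallow + up
```

Addition (rather than concatenation) keeps the channel count fixed, which is one reason it is a cheap way to mix semantic and detail features.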
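Step 2: the image features obtained by the encoder are input into the drivable area segmentation branch of the decoder to obtain the drivable area segmentation result. As shown in FIG. 2, the decoder contains convolutional layers, batch normalization, ReLU activation functions, and an upsampling part; the upsampling part enlarges the feature map so that the final drivable area segmentation result has the same size as the original road sample image. The upsampling here uses deconvolution (transposed convolution). Deconvolution corresponds to the backward pass of convolution, and the size of the output feature map can be controlled through the deconvolution stride, so deconvolution can be used to implement upsampling. After a series of deconvolution operations, a feature map of the same size as the original road image is obtained, in which each pixel has 2 channels. Passing this feature map through a softmax function gives, for each pixel, the probabilities of being drivable and non-drivable; the class with the larger probability is the classification of that pixel, which divides the road image into drivable and non-drivable areas.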
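Two pieces of arithmetic from this step are easy to check in code: how the deconvolution stride sets the output size, and how the 2-channel output becomes a drivable/non-drivable decision via softmax. The output-size formula below is the standard one for transposed convolution with dilation 1 (it agrees with PyTorch's `ConvTranspose2d` documentation); the kernel 4, stride 2, padding 1 setting, the 14-pixel bottleneck for a 448×448 input, and the choice of channel 1 as "drivable" are illustrative assumptions.

```python
import numpy as np


def deconv_out_size(size_in, kernel, stride, padding=0, output_padding=0):
    """Output length of a transposed convolution along one axis (dilation 1)."""
    return (size_in - 1) * stride - 2 * padding + kernel + output_padding


def softmax(logits, axis=0):
    z = logits - logits.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def drivable_mask(logits):
    """logits: 2 x H x W map from the segmentation branch.

    Channel 1 is assumed to mean "drivable" and channel 0 "non-drivable".
    Returns a boolean H x W mask that is True where "drivable" is more probable.
    """
    probs = softmax(logits, axis=0)
    return probs.argmax(axis=0) == 1


# With kernel 4, stride 2, padding 1 each deconvolution doubles the size:
# 14 -> 28 -> 56 -> 112 -> 224 -> 448.
size = 14
for _ in range(5):
    size = deconv_out_size(size, kernel=4, stride=2, padding=1)
```

Because softmax is monotonic, taking the argmax of the raw logits gives the same mask; the softmax is kept here because the text describes thresholding probabilities.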
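Step 3: the image features obtained by the encoder are input into the ground line detection branch of the decoder. This branch has the same input as the drivable area segmentation branch and a similar structure; only the number of output channels differs. In this branch, the image features pass through convolution, batch normalization, ReLU activation, and upsampling, finally yielding a feature map of the same size as the original image. This feature map then feeds two convolutional branches.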
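One branch produces a score map of the same size as the original image with C+1 channels. C is the number of ground point types, including visible and invisible ground points on the left/right sides of a motor vehicle, visible and invisible ground points on the front/rear sides of a motor vehicle, visible and invisible electric vehicle ground points, and so on; the extra channel represents the background, i.e., not a ground point. The scores of the K×K pixels around each pixel are summed as a vote and then passed through a softmax function to obtain the classification scores of that point.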
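The K×K voting can be written as a box sum over each channel of the (C+1)-channel score map followed by a softmax across channels. A NumPy sketch, assuming an odd K, zero padding at the borders, and channel 0 as background; the patent does not give a value for K.

```python
import numpy as np


def box_sum(x, k):
    """Sum each channel of a C x H x W array over a k x k window (zero padding, k odd)."""
    r = k // 2
    padded = np.pad(x, ((0, 0), (r, r), (r, r)))  # constant (zero) padding by default
    out = np.zeros_like(x, dtype=float)
    h, w = x.shape[1:]
    # Add the k*k shifted copies of the map; each output pixel ends up with
    # the sum of its k x k neighbourhood.
    for dy in range(k):
        for dx in range(k):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out


def vote_scores(score_map, k=3):
    """K x K vote followed by a softmax over the C+1 channels."""
    voted = box_sum(score_map, k)
    voted -= voted.max(axis=0, keepdims=True)  # numerical stability
    e = np.exp(voted)
    return e / e.sum(axis=0, keepdims=True)
```

Summing neighbours before the softmax smooths out isolated noisy pixels, so a ground point must be supported by its surroundings to win.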
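The other branch produces a distance map of the same size as the original image with 4 channels: the horizontal distance Δx1 and vertical distance Δy1 from the pixel to the center of its ground point, and the horizontal distance Δx2 and vertical distance Δy2 from the pixel to the center of the other ground point of the same ground line.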
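For pixels classified as ground points, the ground point center is obtained by voting with the distance-map values of the pixels in the surrounding K×K neighborhood. This yields candidate ground points, and non-maximum suppression is then applied to obtain the final ground points. Specifically, for a given ground point class, the candidate with the highest score is selected, every other candidate of that class within distance d of it is removed, and the selected candidate is marked as processed. The same operation is then repeated on the remaining unprocessed candidates. Once the candidates of all classes have been processed in this way, non-maximum suppression is complete and all ground points are determined.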
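Center voting and the per-class non-maximum suppression can be sketched in plain Python. Two details are assumptions: that (Δx1, Δy1) is stored as the offset from the pixel to the center (so center = pixel + offset), and that "voting" means averaging the centers predicted by the neighbours.

```python
import math


def vote_center(px, py, dist_map, k=3):
    """Average the ground point centers predicted by the k x k neighbours of (px, py).

    dist_map[y][x] = (dx1, dy1, dx2, dy2); (dx1, dy1) is assumed to be the offset
    from pixel (x, y) to its own ground point center.
    """
    h, w = len(dist_map), len(dist_map[0])
    r = k // 2
    xs, ys = [], []
    for y in range(max(0, py - r), min(h, py + r + 1)):
        for x in range(max(0, px - r), min(w, px + r + 1)):
            dx1, dy1 = dist_map[y][x][0], dist_map[y][x][1]
            xs.append(x + dx1)
            ys.append(y + dy1)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def nms_ground_points(candidates, d):
    """Per-class greedy NMS.

    candidates: list of (x, y, cls, score). Within each class, repeatedly keep the
    highest-scoring unprocessed candidate and drop same-class candidates closer than d.
    """
    kept = []
    for cls in sorted({c[2] for c in candidates}):
        pool = sorted((c for c in candidates if c[2] == cls), key=lambda c: c[3], reverse=True)
        while pool:
            best = pool.pop(0)
            kept.append(best)
            pool = [c for c in pool if math.hypot(c[0] - best[0], c[1] - best[1]) >= d]
    return kept


# Two class-1 candidates 1 px apart collapse into one; different classes never suppress each other.
cands = [(10, 10, 1, 0.9), (11, 10, 1, 0.8), (30, 30, 1, 0.7), (10, 10, 2, 0.5)]
final_points = nms_ground_points(cands, d=5)
```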
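The detected ground points must then be connected to determine the ground lines. For a given ground point, another pixel is located using the distance obtained by voting over the K×K pixels around it; when that pixel is within distance c of another ground point and the two ground point types match the required correspondence, the two ground points are connected. After all ground points have been traversed, any ground points that remain unconnected are discarded.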
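The pairing rule can be sketched as below. The patent only requires the predicted partner location to lie within c of another ground point whose type satisfies the correspondence; choosing the nearest such point, and passing the type rule in as a function, are assumptions of this sketch.

```python
import math


def pair_ground_points(points, partner_offsets, compatible, c):
    """Connect ground points into ground lines.

    points:          list of (x, y, cls) after NMS
    partner_offsets: list of voted (dx2, dy2) per point, i.e. the predicted offset
                     from that point to the other endpoint of its ground line
    compatible:      function(cls_a, cls_b) -> bool, the type correspondence rule
    c:               maximum distance between the predicted endpoint and a real point
    Returns a list of (i, j) index pairs; points left unpaired are discarded.
    """
    paired = set()
    lines = []
    for i, (x, y, cls) in enumerate(points):
        if i in paired:
            continue
        tx, ty = x + partner_offsets[i][0], y + partner_offsets[i][1]
        best, best_dist = None, c
        for j, (xj, yj, cls_j) in enumerate(points):
            if j == i or j in paired or not compatible(cls, cls_j):
                continue
            dist = math.hypot(xj - tx, yj - ty)
            if dist < best_dist:
                best, best_dist = j, dist
        if best is not None:
            paired.update((i, best))
            lines.append((i, best))
    return lines
```

For example, two "side" points 10 px apart whose offsets point at each other become one line, while an isolated point with no compatible partner is dropped.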
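In summary, for a road sample image the convolutional neural network outputs a drivable area segmentation map, a ground point classification map, and a ground point distance map, and from this output the final drivable area of the road sample image and the ground lines of obstacles at the drivable area boundary can be obtained. During training, the drivable area segmentation map, the ground point classification map, and the ground point distance map are compared with the labels of the road sample image to train the neural network. The comparison and training method is described in detail next.
Step 1: scale the road sample image and the labeled drivable area map to a preset size; in one implementation of embodiments of the present application, the preset size is 448×448. The ground point classification map and ground point distance map are computed from the ground point coordinates and types in the labels.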
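Building training targets from the labels involves two steps: rescaling labeled coordinates to the 448×448 input, and rasterizing each ground point pair into the classification map and the 4-channel distance map. In this NumPy sketch, labeling every pixel within a small radius of a ground point as that point (and the radius value itself) is an assumption; the patent does not say which pixels around a ground point count as positives.

```python
import numpy as np


def scale_point(x, y, src_wh, dst_wh=(448, 448)):
    """Map a labeled coordinate into the resized image (448x448 in the embodiment)."""
    return x * dst_wh[0] / src_wh[0], y * dst_wh[1] / src_wh[1]


def make_targets(ground_lines, h, w, radius=2):
    """Build ground point classification and distance targets.

    ground_lines: list of ((xa, ya, cls_a), (xb, yb, cls_b)) in resized-image pixels,
                  with cls >= 1 (0 is reserved for background, an assumption).
    radius:       hypothetical size of the region around each ground point whose
                  pixels are labeled as that ground point.
    Returns cls_map (H x W int) and dist_map (4 x H x W float).
    """
    cls_map = np.zeros((h, w), dtype=int)
    dist_map = np.zeros((4, h, w), dtype=float)
    ys, xs = np.mgrid[0:h, 0:w]
    for (xa, ya, ca), (xb, yb, cb) in ground_lines:
        # Each endpoint gets its own class; its partner is the other endpoint.
        for (x0, y0, c0), (x1, y1) in (((xa, ya, ca), (xb, yb)), ((xb, yb, cb), (xa, ya))):
            near = (xs - x0) ** 2 + (ys - y0) ** 2 <= radius ** 2
            cls_map[near] = c0
            dist_map[0][near] = (x0 - xs)[near]  # dx1: to own ground point
            dist_map[1][near] = (y0 - ys)[near]  # dy1
            dist_map[2][near] = (x1 - xs)[near]  # dx2: to the other endpoint
            dist_map[3][near] = (y1 - ys)[near]  # dy2
    return cls_map, dist_map
```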
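Step 2: input the road sample image into the above convolutional neural network to obtain the drivable area segmentation map, the ground point classification map, and the ground point distance map. For each pixel of the drivable area segmentation map, compute the cross-entropy loss against the labeled drivable area segmentation map. Compute the cross-entropy loss between the ground point classification map output by the network and the ground point classification map derived from the labels. For pixels labeled as ground points, compute the mean squared error between the pixel's value in the ground point distance map and the distance value derived from the labels. These three losses are then added, with weighting coefficients, to form the loss function used to train the neural network.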
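$$L = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}}\sum_i L_{reg}(t_i, t_i^*) + \gamma \frac{1}{N_{seg}}\sum_i L_{seg}(s_i, s_i^*)$$
In the above formula, $L$ is the loss function; $p_i$, $t_i$, $s_i$ are the predicted values of the pixel at the same position in the ground point classification map, the ground point distance map, and the drivable area segmentation map, respectively, and $p_i^*$, $t_i^*$, $s_i^*$ are the corresponding label values. $L_{cls}$ is the classification loss, here cross-entropy; $1/N_{cls}$ normalizes over all pixels involved in the computation. Similarly, $L_{reg}$ is the regression loss, for which mean squared error or similar can be used, and $L_{seg}$ is likewise a cross-entropy loss; $1/N_{reg}$ and $1/N_{seg}$ normalize over the pixels involved in the regression and segmentation computations, respectively. The three loss terms are combined using the coefficients $\lambda$ and $\gamma$.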
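The three-term loss can be sketched in NumPy as below, mirroring the description: cross-entropy over all pixels for classification and segmentation, and mean squared error only on pixels labeled as ground points. Treating class 0 as background, averaging the squared error over the four distance channels, and the default λ = γ = 1 are assumptions; the patent does not state the coefficient values.

```python
import numpy as np


def cross_entropy(logits, labels):
    """Per-pixel cross-entropy. logits: K x H x W, labels: H x W integer class ids."""
    z = logits - logits.max(axis=0, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=0, keepdims=True))
    h, w = labels.shape
    # Pick the log-probability of the labeled class at every pixel.
    return -log_probs[labels, np.arange(h)[:, None], np.arange(w)[None, :]]


def total_loss(cls_logits, cls_labels, dist_pred, dist_labels, seg_logits, seg_labels,
               lam=1.0, gamma=1.0):
    """L = CE(cls) / N_cls + lam * MSE(reg) / N_reg + gamma * CE(seg) / N_seg."""
    l_cls = cross_entropy(cls_logits, cls_labels).mean()        # 1/N_cls over all pixels
    fg = cls_labels > 0                                          # pixels labeled as ground points
    n_reg = max(int(fg.sum()), 1)                                # avoid division by zero
    sq_err = ((dist_pred - dist_labels) ** 2).mean(axis=0)       # mean over the 4 channels
    l_reg = sq_err[fg].sum() / n_reg                             # 1/N_reg over ground point pixels
    l_seg = cross_entropy(seg_logits, seg_labels).mean()        # 1/N_seg over all pixels
    return l_cls + lam * l_reg + gamma * l_seg
```

With uniform logits the classification and segmentation terms reduce to log(C+1) and log 2, which is a handy sanity check when wiring up a real implementation.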
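Step 3: in one implementation of embodiments of the present application, the initialized convolutional neural network is trained on the labeled road sample images for 16 epochs according to the above loss function, with the learning rate set to 0.00001 and the Adam optimization algorithm. In other implementations, the number of training epochs and the learning rate can be adjusted to the amount of data, and other gradient-descent-based optimization methods can also be used.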
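The embodiment's optimizer settings (Adam, learning rate 0.00001, 16 epochs) can be made concrete with a minimal Adam update for one parameter array. The β1, β2, and ε values below are the commonly used Adam defaults and are assumptions, since the patent gives only the learning rate and epoch count.

```python
import numpy as np

# Settings stated in the embodiment.
EPOCHS = 16
LEARNING_RATE = 1e-5


class Adam:
    """Minimal Adam optimizer for a single parameter array (betas/eps assumed)."""

    def __init__(self, lr=LEARNING_RATE, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = self.v = None
        self.t = 0

    def step(self, param, grad):
        if self.m is None:
            self.m = np.zeros_like(param)
            self.v = np.zeros_like(param)
        self.t += 1
        # Exponential moving averages of the gradient and squared gradient.
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        # Bias correction for the zero initialisation.
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return param - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```

On the first step each parameter moves by roughly the learning rate in the direction opposite its gradient, independent of the gradient's magnitude. In a PyTorch implementation the equivalent would typically be `torch.optim.Adam(model.parameters(), lr=1e-5)`, run for 16 epochs.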
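After the drivable area detection and ground line detection model has been trained, it is applied in embodiments of the present application as follows:
Step 1: perform monocular calibration of the camera, and record and store its intrinsic parameters and distortion parameters. After the video stream is obtained from the camera, take a single video frame, correct its distortion using the above calibration parameters, and then scale the image to the preset size using bilinear interpolation.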
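The bilinear scaling step can be sketched in NumPy using half-pixel-center sampling, broadly the convention of OpenCV's `cv2.resize(..., interpolation=cv2.INTER_LINEAR)`; border handling here (clamping) is a simplification and may differ slightly from any particular library. The 448×448 default comes from the embodiment.

```python
import numpy as np


def resize_bilinear(img, out_h=448, out_w=448):
    """Bilinear resize of an H x W (x C) image using half-pixel centers."""
    in_h, in_w = img.shape[:2]
    # Source coordinate for each output pixel center, clamped to the image.
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # vertical interpolation weights, one per output row
    wx = (xs - x0)[None, :]  # horizontal interpolation weights, one per output column
    if img.ndim == 3:
        wy, wx = wy[..., None], wx[..., None]  # broadcast over colour channels
    img = img.astype(float)
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy
```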
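Step 2: input the image obtained in step 1 into the convolutional neural network. After the network and the post-processing described above, the drivable area segmentation map and the ground points of obstacles at the drivable area boundary are obtained, as shown in FIG. 1.
Step 3: based on the ground lines of a dynamic object in different video frames, the object's speed can be estimated conveniently. Ground lines can also easily be projected into 3D space to estimate the pose and distance of the object. With the structured information provided by the ground lines and the drivable area information, the intelligent system can thus plan the driving route more accurately.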
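The text does not say how the ground line is projected into 3D. One common approach, shown here only as an assumed sketch, back-projects each ground point onto a flat ground plane using the intrinsics and the camera's height above the road, with the optical axis assumed parallel to the road (no pitch or roll), and then differences the projected positions over time to get a speed relative to the camera.

```python
import math


def pixel_to_ground(u, v, fx, fy, cx, cy, cam_height):
    """Back-project an undistorted pixel onto a flat ground plane.

    Assumes the camera's optical axis is parallel to the ground (no pitch/roll)
    and cam_height is the camera height above the ground in metres.
    Returns (X lateral, Z forward) in camera-centred ground coordinates.
    """
    ny = (v - cy) / fy
    if ny <= 0:
        raise ValueError("pixel is at or above the horizon; it cannot touch the ground")
    z = cam_height / ny
    x = (u - cx) / fx * z
    return x, z


def ground_line_speed(line_t0, line_t1, dt, intrinsics, cam_height):
    """Estimate speed (m/s, relative to the camera) from one object's ground line in two frames.

    line_t0 / line_t1: ((u_a, v_a), (u_b, v_b)) endpoints of the same ground line.
    intrinsics:        (fx, fy, cx, cy)
    The midpoint of the projected ground line is used as the object's position.
    """
    def midpoint(line):
        pts = [pixel_to_ground(u, v, *intrinsics, cam_height) for u, v in line]
        return (pts[0][0] + pts[1][0]) / 2, (pts[0][1] + pts[1][1]) / 2

    (x0, z0), (x1, z1) = midpoint(line_t0), midpoint(line_t1)
    return math.hypot(x1 - x0, z1 - z0) / dt
```

Because the ground line lies on the road by definition, a single camera plus a known mounting height is enough to place it in metric coordinates; a real system would also account for camera pitch and ego-motion.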
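The above describes in detail the steps of model training and use for the universal-ground-line-based drivable area representation in embodiments of the present application; embodiments of the present invention are now described from the perspective of hardware or software implementation. The modules or steps of embodiments of the present invention can be implemented with general-purpose computing devices; they can be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they can be implemented as program code executable by a computing device and stored in a storage device for execution by the computing device; in some cases, the steps shown or described can be performed in an order different from that given here, or they can be made into individual integrated circuit modules, or several of the modules or steps can be made into a single integrated circuit module. Thus, embodiments of the present invention are not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, embodiments of the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.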

Claims (7)

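  1. A method for training a neural network model, characterized in that the training method includes the following training steps:
    Step 11: obtain road samples, in which each road sample image is labeled with the drivable area and with a pair of ground points for each dynamic object at the boundary;
    Step 12: input the road sample image into an initialized neural network model;
    Step 13: train the initialized neural network model with the labeled road sample images, wherein the loss function of the neural network is:
    $$L = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}}\sum_i L_{reg}(t_i, t_i^*) + \gamma \frac{1}{N_{seg}}\sum_i L_{seg}(s_i, s_i^*)$$
    where $L$ is the loss function; $p_i$, $t_i$, $s_i$ are the predicted values of the pixel at the same position in the ground point classification map, the ground point distance map, and the drivable area segmentation map, respectively;
    $p_i^*$, $t_i^*$, $s_i^*$ are the corresponding label values; $L_{cls}$ is the classification loss, preferably cross-entropy;
    $1/N_{cls}$ normalizes over all pixels involved in the computation; $L_{reg}$ is the regression loss, preferably mean squared error; $L_{seg}$ is a cross-entropy loss;
    $1/N_{reg}$ and $1/N_{seg}$ normalize over the pixels involved in the regression and segmentation computations, respectively; and $\lambda$ and $\gamma$ are different coefficients;
    for objects in the image that overlap the road surface, the boundary line where each object meets the actual road surface is the ground line, and the two endpoints of the ground line are the ground points.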
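  2. The training method according to claim 1, characterized in that step 12 includes step 121 and step 122:
    Step 121: input the road sample image into the encoder part of the initialized neural network model;
    Step 122: input the image features obtained by the encoder into the decoder of the initialized neural network model to obtain a drivable area segmentation result and a ground line detection result.
  3. The training method according to claim 2, characterized in that the decoder includes a drivable area segmentation branch and a ground line detection branch.
  4. The training method according to claim 1, characterized in that, in step 11, the road sample image and the labeled drivable area map are scaled to a preset size.
  5. A neural network model obtained by the training method according to any one of claims 1-4.
  6. A method for detecting universal ground lines using the neural network model training method according to any one of claims 1-4, characterized in that the detection method includes the following steps:
    Step 1: perform monocular calibration of the camera device that acquires the images, and record and store the intrinsic parameters and distortion parameters of the camera device;
    Step 2: input the image obtained in step 1 into the trained neural network to obtain a drivable area segmentation map, ground points, and ground lines.
  7. The method for detecting universal ground lines according to claim 6, characterized in that step 1 further includes scaling the image to a preset size using bilinear interpolation.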

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811143635.1A CN109726627B (zh) 2018-09-29 2018-09-29 Neural network model training and universal ground line detection method
CN201811143635.1 2018-09-29

Publications (1)

Publication Number Publication Date
WO2020062433A1 true WO2020062433A1 (zh) 2020-04-02

Family

ID=66295410

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/113661 WO2020062433A1 (zh) 2018-09-29 2018-11-02 Method for neural network model training and universal ground line detection

Country Status (2)

Country Link
CN (1) CN109726627B (zh)
WO (1) WO2020062433A1 (zh)

Also Published As

Publication number Publication date
CN109726627B (zh) 2021-03-23
CN109726627A (zh) 2019-05-07

Similar Documents

Publication Publication Date Title
WO2020062433A1 (zh) 一种神经网络模型训练及通用接地线的检测方法
WO2020244653A1 (zh) 物体识别方法及装置
CN110084850B (zh) 一种基于图像语义分割的动态场景视觉定位方法
WO2020151109A1 (zh) 基于点云带权通道特征的三维目标检测方法及系统
CN111311666B (zh) 一种融合边缘特征和深度学习的单目视觉里程计方法
Kong et al. Vanishing point detection for road detection
CN111612008B (zh) 基于卷积网络的图像分割方法
CN111784747B (zh) 一种基于关键点检测和校正的车辆多目标跟踪系统及方法
WO2015010451A1 (zh) 一种从单幅图像检测道路的方法
CN111291714A (zh) 一种基于单目视觉和激光雷达融合的车辆检测方法
CN110728209A (zh) 一种姿态识别方法、装置、电子设备及存储介质
CN107545263B (zh) 一种物体检测方法及装置
CN109886200B (zh) 一种基于生成式对抗网络的无人驾驶车道线检测方法
CN111696110B (zh) 场景分割方法及系统
CN110363160B (zh) 一种多车道线识别方法及装置
CN106778668A (zh) 一种联合ransac和cnn的鲁棒的车道线检测方法
CN112949493A (zh) 一种结合语义分割和注意力机制的车道线检测方法及系统
CN114429457A (zh) 一种基于双模态融合的风机叶片缺陷智能检测方法
Saleem et al. Steering angle prediction techniques for autonomous ground vehicles: a review
Muthalagu et al. Vehicle lane markings segmentation and keypoint determination using deep convolutional neural networks
CN115018999A (zh) 一种多机器人协作的稠密点云地图构建方法及装置
CN113420648B (zh) 一种具有旋转适应性的目标检测方法及系统
Nguyen et al. UnfairGAN: An enhanced generative adversarial network for raindrop removal from a single image
CN112053385B (zh) 基于深度强化学习的遥感视频遮挡目标跟踪方法
Jin et al. Road curvature estimation using a new lane detection method

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18934566

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 18934566

Country of ref document: EP

Kind code of ref document: A1

32PN EP: Public notification in the EP bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 08.11.2021)
