CN110210350B - Rapid parking space detection method based on deep learning - Google Patents
Info
- Publication number
- CN110210350B (application CN201910429977.8A)
- Authority
- CN
- China
- Prior art keywords
- parking space
- image
- convolution
- dimension
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
- G06V20/586—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of parking space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/588—Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
Abstract
The invention relates to a rapid parking space detection method based on deep learning, belonging to the technical field of driving and intended to solve the problems of poor environmental adaptability and heavy model computation in parking space detection. The method comprises an offline step: acquiring image data containing parking spaces offline, establishing training and verification data sets, and training, evaluating and optimizing a neural network model, the neural network model being used for semantic segmentation of parking space sidelines in the image data; and an online step: acquiring image data containing parking spaces online, performing parking space sideline semantic segmentation with the trained neural network model to obtain sideline masks, fitting, clustering and combining the obtained sideline masks into geometric shapes composed of sidelines, and screening the geometric shapes by set shape discrimination conditions to determine the parking spaces. The invention has strong environmental adaptability; the adopted model is small, computationally light and undemanding of computing resources; the system is low-cost and has large-scale application potential.
Description
Technical Field
The invention relates to the technical field of driving, in particular to a rapid parking space detection method based on deep learning.
Background
Parking space detection and positioning are the basis of automatic parking and parking assistance systems. Among existing methods, non-deep-learning approaches detect parking spaces by manually extracting parking space sideline features, and such detection systems fail when the sideline markings are unclear, when building shadows or reflections from standing water are present, when the camera is blurred, and so on. Deep-learning-based methods, on the other hand, generally have large model sizes, heavy computation, high requirements on computing devices and high system cost, which is unfavorable for large-scale application and popularization on vehicles. A method with a robust detection rate and low system cost therefore has significant application value.
Disclosure of Invention
In view of the above analysis, the present invention aims to provide a rapid parking space detection method based on deep learning that solves the problems of poor environmental adaptability and heavy model computation in parking space detection.
The purpose of the invention is mainly realized by the following technical scheme:
a rapid parking space detection method based on deep learning comprises the following steps:
an off-line step: acquiring image data including parking spaces offline, and establishing a training and verification data set; training, evaluating and optimizing a neural network model; the neural network model is used for performing semantic segmentation on a parking space sideline in the image data;
an online step: acquiring image data containing parking spaces online, performing parking space sideline semantic segmentation by using the trained neural network model to obtain parking space sideline masks, and fitting, clustering and combining the obtained sideline masks to obtain geometric shapes composed of sidelines; and screening the geometric shapes according to set shape discrimination conditions to determine the parking spaces.
Further, the offline step specifically includes:
1) acquiring a plurality of groups of picture data containing parking spaces in an off-line manner, marking the side line areas of the parking spaces in the pictures, and constructing a training and verifying data set;
2) constructing a lightweight deep learning semantic segmentation model based on a channel compression convolution mode, and performing model parameter training by using a training data set;
3) establishing an evaluation standard, evaluating the trained model by using a verification data set, and adjusting model parameters;
4) and optimizing and accelerating the evaluated model.
Further, the lightweight deep learning semantic segmentation model based on the channel compression convolution mode comprises:
a preprocessing unit for performing convolution and maximum pooling on an input image of size W_1×H_1×3, reducing the image in the width and height dimensions, concatenating the convolution result with the maximum-pooling result, and outputting a preprocessed image of size W_2×H_2×N_2;
a down-sampling feature extraction unit for sequentially performing two stages of down-sampling processing on the preprocessed image, reducing the width and height dimensions of the image, extracting sideline semantic features, and outputting a down-sampled image of size W_3×H_3×N_3;
an up-sampling feature extraction unit for sequentially performing two stages of up-sampling processing on the down-sampled image, increasing the width and height dimensions of the image, recovering sideline semantic features, and outputting an up-sampled binary image of size W_2×H_2×2;
a model output unit for performing interpolation processing on the up-sampled binary image and outputting a binary image of size W_1×H_1×2;
the two-stage up-sampling process corresponds to the two-stage down-sampling process; wherein the first stage upsampling process corresponds to the second stage downsampling process, and the second stage upsampling process corresponds to the first stage downsampling process.
Furthermore, in each stage of down-sampling processing, the trunk structure first reduces the channel dimension of the input features through a 1×1 convolution kernel, then reduces the width and height dimensions through a 3×3 convolution kernel, and finally expands the channel dimension through a 1×1 convolution kernel to obtain the down-sampling trunk output; the lateral structure of each stage of down-sampling processing first reduces the width and height dimensions through a pooling operation and then expands the channel dimension through a 1×1 convolution kernel to obtain the down-sampling lateral output; finally, the trunk output and the lateral output are added element by element to obtain the down-sampling result.
Furthermore, in each stage of up-sampling processing, the trunk structure first reduces the channel dimension of the input features with a 1×1 convolution kernel, then extracts features and increases the width and height dimensions with a 3×3 deconvolution, and then expands the channel dimension with a 1×1 convolution to obtain the up-sampling trunk output; the input features of the lateral connection of each stage of up-sampling processing are the features of matching width and height dimensions output by the corresponding down-sampling processing, and these features are added element by element to the trunk output to obtain fused feature information from different levels.
Furthermore, each convolution layer is followed in sequence by a batch normalization layer, a linear mapping layer and a linear rectification layer, which perform batch normalization, linear mapping and linear rectification on the convolution result to realize normalization and nonlinear transformation of the output features.
Further, optimizing and accelerating the neural network model comprises:
a. extracting and fusing the parameters of all convolution layers, batch normalization layers and linear mapping layers in the model; the fused parameters are:
w_new = γ·w_old / √(var + ε)
b_new = γ·(b_old − mean) / √(var + ε) + β
wherein w_old is the convolution layer weight before fusion and b_old is the convolution layer bias before fusion;
γ and β are the parameters of the linear mapping layer;
mean and var are the mean and variance of all features in the normalization layer;
ε is a minimum value greater than 0;
b. quantizing the model to low precision using FP16.
Further, the online step specifically includes:
1) during the driving of the vehicle, a camera mounted on the vehicle is used to collect image information containing parking spaces online;
2) performing parking space sideline semantic segmentation on the picture information by using the neural network model;
3) scanning the parking space sideline semantic segmentation result line by line, extracting the center points of continuous regions in the segmentation result, and on this basis performing straight-line fitting with the Hough transform to obtain each sideline of the parking spaces contained in the picture;
4) and clustering and combining the sidelines to form a geometric area, and judging the geometric area meeting the parking space judging condition as the parking space according to the set parking space judging condition.
Further, clustering the sidelines includes:
1) judging whether the included angle Δθ between two straight line segments of the sidelines is smaller than a clustering angle threshold θ_T;
2) judging whether the distance between the two straight line segments is smaller than the pixel width of a parking space sideline in the image, the distance between two straight line segments being the distance from the center point of either segment to the line containing the other segment;
3) judging whether the distance between the nearest points of the two straight line segments is smaller than a threshold d_T, which is set according to L_s, the pixel length of the short side of the parking space in the image;
4) clustering straight line segments satisfying 1) to 3).
Further, the image data including the parking space is acquired by a calibrated camera installed on the vehicle, and the calibrating of the camera includes:
1) off-line calibration is carried out on the internal and external parameters of the camera, and the parameters are used for eliminating image distortion caused by imaging of a camera lens;
2) and off-line calibration is carried out on the inverse perspective transformation matrix of the camera, so that the forward-looking image is converted into a top view, and the shape distortion of the parking space caused by the perspective transformation imaging of the camera is eliminated.
The scheme of the invention can realize at least one of the following beneficial effects:
1. The method detects parking spaces with deep learning, giving strong environmental adaptability and good detection results under shadow, occlusion, ground reflection, worn parking space marking lines and similar conditions.
2. The model structure adopts a lightweight convolution module design based on channel compression; compared with a network built from standard convolution operations, the model is small, computationally light and undemanding of computing resources, so detection is efficient and the hardware performance requirements are low.
3. Merging network weights further improves network speed, reaching real-time detection on an embedded platform, so the system is low-cost and has large-scale application potential.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, wherein like reference numerals are used to designate like parts throughout.
Fig. 1 is a flowchart of a fast parking space detection method according to an embodiment of the present invention;
FIG. 2 is a diagram of a parking space segmentation network model architecture according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating an overall calculation process of a standard convolution according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a single step standard convolution calculation process according to an embodiment of the present invention;
FIG. 5 is a diagram of a standard convolution wide high dimension information flow structure in an embodiment of the present invention;
fig. 6 is a diagram of a standard convolution channel information flow structure in an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will now be described in detail with reference to the accompanying drawings, which form a part hereof, and which together with the embodiments of the invention serve to explain the principles of the invention.
The embodiment discloses a rapid parking space detection method based on deep learning, as shown in fig. 1, comprising the following steps:
step S1, offline step: acquiring image data including parking spaces offline, and establishing a training and verification data set; training, evaluating and optimizing a neural network model; the neural network model is used for performing semantic segmentation on a parking space sideline in the image data;
the establishing process comprises the following steps:
1) acquiring a plurality of groups of image data containing parking spaces in an off-line manner, marking the side line areas of the parking spaces in the images, and constructing a training and verifying data set;
2) constructing a lightweight deep learning semantic segmentation model based on a channel compression convolution mode, and performing model parameter training by using a training data set;
3) establishing an evaluation standard, evaluating the trained model by using a verification data set, and adjusting model parameters;
4) and optimizing and accelerating the evaluated model.
Specifically, to make the trained neural network model more accurate, image data covering as many types of parking spaces and surroundings as possible are collected when constructing the training and verification data sets; moreover, the sample ratio of the training data set to the validation data set is approximately 5:1;
for example, 10000-30000 pictures are collected as training samples for the deep learning model training data set; meanwhile, 2000-6000 pictures are collected as verification samples for the model verification data set;
since the top view is output after the calibration of the camera in step S1, the samples in the training and verification data sets need to be converted into the top view, and the conversion method adopts the same inverse perspective transformation method as that in step S1.
Specifically, the parking space sideline regions of the samples in the training and verification data sets are marked manually in the top view;
in the marking process, open-source software such as the labelme tool is used for image pixel-level annotation: the parking space sideline regions are labeled 1 and the background regions are labeled 0.
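As an illustration of this annotation step, the sketch below rasterizes a labelme JSON file into the 0/1 sideline mask described above. It assumes labelme's standard JSON layout (a "shapes" list of polygons with "points"); the label name "parking_line" is a hypothetical convention, not something fixed by the patent.

```python
import json
import numpy as np
import cv2

def labelme_to_mask(json_path, height, width, line_label="parking_line"):
    """Rasterize labelme polygon annotations into a binary mask:
    parking space sideline pixels = 1, background pixels = 0."""
    with open(json_path, "r") as f:
        ann = json.load(f)
    mask = np.zeros((height, width), dtype=np.uint8)
    for shape in ann["shapes"]:           # standard labelme JSON layout
        if shape["label"] != line_label:  # hypothetical label name
            continue
        pts = np.array(shape["points"], dtype=np.int32)
        cv2.fillPoly(mask, [pts], 1)      # sideline region -> 1
    return mask
```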
Specifically, the lightweight deep learning semantic segmentation model based on the channel compression convolution mode is constructed on the open-source deep learning framework Caffe, and specifically includes:
a preprocessing unit for performing convolution and maximum pooling on an input image of size W_1×H_1×3, reducing the image in the width and height dimensions, concatenating the convolution result with the maximum-pooling result, and outputting a preprocessed image of size W_2×H_2×N_2; reducing the width and height dimensions in preprocessing greatly reduces the computation of subsequent processing.
A down-sampling feature extraction unit for sequentially performing two stages of down-sampling processing on the preprocessed image, reducing the width and height dimensions of the image, extracting sideline semantic features, and outputting a down-sampled image of size W_3×H_3×N_3;
an up-sampling feature extraction unit for sequentially performing two stages of up-sampling processing on the down-sampled image, increasing the width and height dimensions of the image, recovering sideline semantic features, and outputting an up-sampled binary image of size W_2×H_2×2;
a model output unit for performing interpolation processing on the up-sampled binary image and outputting a binary image of size W_1×H_1×2;
the two-stage up-sampling process corresponds to the two-stage down-sampling process; wherein the first stage upsampling process corresponds to the second stage downsampling process, and the second stage upsampling process corresponds to the first stage downsampling process.
For each stage of down-sampling processing, the trunk structure first reduces the channel dimension of the input features through a 1×1 convolution kernel, then reduces the width and height dimensions through a 3×3 convolution kernel, and finally expands the channel dimension through a 1×1 convolution kernel to obtain the down-sampling trunk output; the lateral structure first reduces the width and height dimensions through a pooling operation, then expands the channel dimension through a 1×1 convolution kernel to obtain the down-sampling lateral output; finally, the trunk output and the lateral output are added element by element to obtain the down-sampling result.
Preferably, after the first-stage downsampling processing, the image feature data is subjected to feature extraction through a certain number of serial same-dimensional feature extraction modules, and after output, the second-stage downsampling processing is performed; after the second-stage down-sampling processing, feature extraction is carried out on the image feature data through a certain number of serial same-dimensional feature extraction modules;
for each stage of up-sampling processing, the trunk structure first reduces the channel dimension of the input features with a 1×1 convolution kernel, then extracts features and raises the width and height dimensions with a 3×3 deconvolution, and then expands the channel dimension with a 1×1 convolution to obtain the up-sampling trunk output; the input features of the lateral connection of each stage of up-sampling processing are the features of matching width and height dimensions output by the corresponding down-sampling processing, and these features are added element by element to the trunk output to obtain fused feature information from different levels.
Preferably, after the first-stage up-sampling processing, feature extraction is performed on the image feature data through a certain number of series same-dimensional feature extraction modules, and after output, second-stage up-sampling processing is performed; and after the second-stage up-sampling processing, performing feature extraction on the image feature data through a same-dimension feature extraction module.
Preferably, after each convolution layer which is subjected to convolution operation, a batch normalization layer, a linear mapping layer and a linear rectification layer are sequentially connected, and batch normalization operation, linear mapping operation and linear rectification operation are carried out on convolution operation results to realize normalization of output characteristics, so that convergence speed of network training is accelerated.
The model will be described below by taking as an example an RGB color image having an input image width and height of 448 × 448 (i.e., an image size of 448 × 448 × 3). The network structure using the model is shown in fig. 2;
1) the preprocessing unit extracts features and reduces dimensions of the input image with a 3×3 standard convolution of stride 2, outputting features of size 224×224×13; in parallel, maximum pooling yields features of size 224×224×3; concatenating the two gives output features of size 224×224×16, which, after the batch normalization operation (BatchNorm), linear mapping operation and linear rectification operation, form the preprocessing output features.
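Purely as an illustration (the patent builds its network in Caffe), a PyTorch sketch of this preprocessing unit with the dimensions quoted above; the pooling kernel and the convolution padding are assumptions. Note that PyTorch's BatchNorm2d already contains affine scale/shift terms, which stand in for the separate batch normalization and linear mapping (Scale) layers.

```python
import torch
import torch.nn as nn

class PreprocessStem(nn.Module):
    """Preprocessing unit of the 448x448 example: a 3x3/stride-2 convolution
    producing 13 channels in parallel with a 2x max pooling of the 3 input
    channels; concatenation gives 13 + 3 = 16 channels at 224x224."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 13, kernel_size=3, stride=2, padding=1,
                              bias=False)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.bn = nn.BatchNorm2d(16)       # batch norm + linear mapping
        self.relu = nn.ReLU(inplace=True)  # linear rectification

    def forward(self, x):                  # x: (B, 3, 448, 448)
        y = torch.cat([self.conv(x), self.pool(x)], dim=1)
        return self.relu(self.bn(y))       # (B, 16, 224, 224)
```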
2) The down-sampling feature extraction unit extracts features and reduces the feature width and height dimensions. It contains two stages of down-sampling modules: the first-stage module takes features of scale 224×224×16 and outputs features of scale 112×112×64; it is followed by 3 serial conventional feature extraction modules of the same-dimension type, which extract features of scale 112×112×64 and feed them to the second-stage down-sampling module; the second-stage module outputs features of scale 56×56×128 and is followed by 16 serial same-dimension conventional feature extraction modules, whose 56×56×128 output is passed to the up-sampling feature extraction unit.
The down-sampling module is composed as follows: the trunk structure first uses 1×1 convolution to reduce the channel dimension to 1/4 of the input feature channel dimension, then performs a standard 3×3 convolution with stride 2, so the width and height of the output features fall to half those of the input while the channel dimension stays equal; 1×1 convolution then expands the feature channel dimension without changing the width and height. In the lateral connection structure, a max-pooling operation with stride 2 first reduces the feature width and height dimensions, and a 1×1 convolution then expands the feature channel dimension. The features output by the trunk structure and the lateral connection structure are then added element by element to realize feature fusion. After each convolution operation, batch normalization, linear mapping and linear rectification normalize the output features, accelerating the convergence of network training. In addition, the input features of this module can be connected to the features of corresponding scale in the later up-sampling module, fusing features of different levels and improving segmentation precision. (A code sketch of this module follows the conventional feature extraction module below.)
The conventional feature extraction module is constituted as follows: in the trunk structure, 1 × 1 convolution is firstly adopted for channel dimension reduction, then 3 × 3 standard convolution is adopted for feature extraction, and then a 1 × 1 convolution mode is adopted for channel dimension expansion; in the lateral connection structure, the pixel-by-pixel addition operation is directly carried out on the input features and the convolution output features of the main network; moreover, batch normalization operation is used after each convolution operation, and normalization of output characteristics is realized through linear mapping operation and linear rectification operation, so that convergence speed of network training is accelerated.
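Continuing the illustrative PyTorch sketch, the down-sampling module and the same-dimension conventional feature extraction module just described might look as follows; the 1/4 compression ratio in the conventional module is assumed to match the down-sampling module, and the kernel/padding choices are assumptions.

```python
import torch.nn as nn

def conv_bn_relu(cin, cout, k, stride=1):
    """Convolution -> batch normalization -> linear rectification, the
    per-convolution ordering used throughout the network (BatchNorm2d's
    affine terms supply the linear mapping / Scale step)."""
    return nn.Sequential(
        nn.Conv2d(cin, cout, k, stride=stride, padding=k // 2, bias=False),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class DownsampleBlock(nn.Module):
    """Channel-compression down-sampling: trunk = 1x1 reduce to cin//4,
    3x3 convolution with stride 2, 1x1 expand; lateral = stride-2 max
    pooling plus 1x1 expansion; the two outputs are summed element-wise."""
    def __init__(self, cin, cout):
        super().__init__()
        mid = cin // 4
        self.trunk = nn.Sequential(
            conv_bn_relu(cin, mid, 1),
            conv_bn_relu(mid, mid, 3, stride=2),  # halves width/height
            conv_bn_relu(mid, cout, 1),
        )
        self.lateral = nn.Sequential(
            nn.MaxPool2d(2, stride=2),
            conv_bn_relu(cin, cout, 1),
        )

    def forward(self, x):
        return self.trunk(x) + self.lateral(x)

class SameDimBlock(nn.Module):
    """Same-dimension conventional feature extraction: a channel-compressed
    trunk (1x1 reduce -> 3x3 -> 1x1 expand) added element-wise to the
    unmodified input, which acts as the lateral connection."""
    def __init__(self, channels):
        super().__init__()
        mid = channels // 4                       # assumed compression ratio
        self.trunk = nn.Sequential(
            conv_bn_relu(channels, mid, 1),
            conv_bn_relu(mid, mid, 3),
            conv_bn_relu(mid, channels, 1),
        )

    def forward(self, x):
        return x + self.trunk(x)
```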
3) The up-sampling feature extraction unit expands the feature width and height dimensions and extracts features. It contains two stages of up-sampling modules: the first-stage module takes features of scale 56×56×128 and outputs features of scale 112×112×64; it is followed by 3 serial same-dimension conventional feature extraction modules, which extract features of scale 112×112×64 and feed them to the second-stage up-sampling module, whose input scale is 112×112×64 and output scale 224×224×16; the second-stage up-sampling module is followed by one serial conventional feature extraction module of the channel-dimension-reducing type, which extracts features of scale 224×224×2 and outputs them to the model output unit.
The up-sampling module is composed as follows: the trunk structure first uses 1×1 convolution to reduce the channel dimension to 1/4 of the input channel dimension, then uses 3×3 deconvolution to extract features and raise the width and height dimensions, and then expands the channel dimension with 1×1 convolution; the input features of the lateral connection structure are the features of corresponding width and height dimensions in the down-sampling module, which are added element by element to the features produced by the trunk convolutions, obtaining fused feature information from different levels and improving the segmentation accuracy of the network model. After each convolution operation, batch normalization, linear mapping and linear rectification realize normalization and nonlinear transformation of the output features, accelerating the convergence of network training.
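The up-sampling module in the same illustrative style, reusing conv_bn_relu from the previous sketch; the deconvolution padding/output-padding values are assumptions chosen so the width and height exactly double.

```python
import torch.nn as nn

class UpsampleBlock(nn.Module):
    """Channel-compression up-sampling: trunk = 1x1 reduce to cin//4,
    3x3 stride-2 deconvolution doubling width/height, 1x1 expand; the
    lateral input is the matching-resolution feature map from the
    down-sampling path, fused by element-wise addition."""
    def __init__(self, cin, cout):
        super().__init__()
        mid = cin // 4
        self.trunk = nn.Sequential(
            conv_bn_relu(cin, mid, 1),   # conv_bn_relu from sketch above
            nn.ConvTranspose2d(mid, mid, 3, stride=2, padding=1,
                               output_padding=1, bias=False),
            nn.BatchNorm2d(mid),
            nn.ReLU(inplace=True),
            conv_bn_relu(mid, cout, 1),
        )

    def forward(self, x, skip):
        return self.trunk(x) + skip      # fuse down-path features
```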
4) The model output unit performs interpolation processing on the input features of scale 224×224×2 and outputs a binary image of size 448×448×2;
through the network model, a 448×448 RGB color image is thus converted into a 448×448×2 binary image, in which parking space sideline pixels are 1 and non-sideline pixels are 0.
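Tying the pieces together, a sketch of the full 448×448 example assembled from the PreprocessStem, DownsampleBlock, SameDimBlock and UpsampleBlock classes above, with the module counts from the description; the output unit's interpolation is rendered here as bilinear up-sampling, and the plain 1×1 head is a stand-in for the channel-reducing conventional module.

```python
import torch.nn as nn
import torch.nn.functional as F

class ParkingLineSegNet(nn.Module):
    """Illustrative assembly of the segmentation network of fig. 2."""
    def __init__(self):
        super().__init__()
        self.stem = PreprocessStem()                  # 448 -> 224, 16 ch
        self.down1 = DownsampleBlock(16, 64)          # 224 -> 112
        self.stage1 = nn.Sequential(*[SameDimBlock(64) for _ in range(3)])
        self.down2 = DownsampleBlock(64, 128)         # 112 -> 56
        self.stage2 = nn.Sequential(*[SameDimBlock(128) for _ in range(16)])
        self.up1 = UpsampleBlock(128, 64)             # 56 -> 112
        self.stage3 = nn.Sequential(*[SameDimBlock(64) for _ in range(3)])
        self.up2 = UpsampleBlock(64, 16)              # 112 -> 224
        self.head = nn.Conv2d(16, 2, kernel_size=1)   # channel-reducing head

    def forward(self, x):                             # (B, 3, 448, 448)
        s = self.stem(x)                              # (B, 16, 224, 224)
        d1 = self.stage1(self.down1(s))               # (B, 64, 112, 112)
        d2 = self.stage2(self.down2(d1))              # (B, 128, 56, 56)
        u1 = self.stage3(self.up1(d2, d1))            # lateral: 112x112x64
        u2 = self.up2(u1, s)                          # lateral: 224x224x16
        out = self.head(u2)                           # (B, 2, 224, 224)
        # model output unit: interpolate back to the input resolution;
        # per-pixel argmax over the 2 channels yields the 0/1 sideline mask
        return F.interpolate(out, scale_factor=2,
                             mode="bilinear", align_corners=False)
```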
The feature channel dimensionality reduction is carried out by a large amount of 1 multiplied by 1 convolution in the network model, so that the calculated amount of a feature extraction unit in the neural network model is greatly reduced, and meanwhile, a good detection effect can be guaranteed. The specific analysis is as follows:
In deep convolutional neural networks, a large part of the computation comes from convolution layers and fully connected layers; in a semantic segmentation network, which uses few fully connected layers, convolution operations occupy most of the computation. The standard convolution calculation process is shown in fig. 3 and fig. 4. In a standard convolution operation, the width, height and channel number of the input features are denoted W, H and N respectively, the convolution layer has M kernels, and each kernel has dimension K×K×N, where K is the kernel scale and N matches the channels of the input vector. The kernel slides along the width and height directions of the image with the set stride, multiplying pixel by pixel with the corresponding image patch and summing; the result at each position represents the kernel's response over that local region of the input, and traversing all positions of the input image gives that kernel's output feature. With stride 1, the convolution result has the same width and height as the input features, i.e. H×W×1 per kernel, and the output of all kernels in the convolution layer has dimension H×W×M.
For standard convolution with a step size of 1, the multiplications of one convolution kernel over the input amount to H×W×N×K^2;
therefore, in the convolution layer, the total multiplication computation performed on the input features is H×W×N×K^2×M.
Taking a 3×3 convolution operation as an example, the information flows in the image width-height spatial dimensions and the channel dimension are shown in fig. 5 and fig. 6.
The connection in the image width and height dimensions is local (within a 3×3 pixel neighborhood), while the connection in the channel dimension is dense, i.e. all channels are connected to each other, so the computation in the channel dimension is the same as in a fully connected layer. For this reason, the convolution mode adopted in this embodiment is a convolution method based on a channel-compression module, which greatly reduces the computation, as follows:
with an input feature dimension of H×W×N, an output feature dimension of H×W×M and a compressed channel dimension of H×W×C, the computation of the compression convolution, the intermediate conventional convolution and the expansion convolution is H×W×N×C, H×W×K^2×C^2 and H×W×C×M respectively;
compared with standard convolution, the ratio of computation is therefore (H×W×N×C + H×W×K^2×C^2 + H×W×C×M) / (H×W×N×K^2×M) = (N×C + K^2×C^2 + C×M) / (N×K^2×M).
In the case of 64 input and output channels and a compressed channel dimension of 16, the method based on channel-compressed convolution amounts to 9.7% of the standard convolution. The channel-compression convolution method thus has a far more economical computational overhead than the standard convolution method, and compared with mainstream segmentation networks, the network structure designed in this patent has a smaller model size and higher computing speed.
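As a worked check of these counts (the H×W factor cancels in the ratio), the small script below evaluates both formulas for the quoted configuration. Under this straightforward counting the ratio comes to about 11.8%, the same order as the 9.7% reported above, whose exact counting convention is not spelled out; treat the script as illustrative.

```python
def standard_conv_mults(H, W, N, M, K):
    """Multiplications of a stride-1 standard KxK convolution layer:
    H*W positions x M kernels x N*K*K products each."""
    return H * W * N * K * K * M

def compressed_conv_mults(H, W, N, M, C, K):
    """Channel-compression module: 1x1 compress (N -> C), KxK convolution
    on C channels, 1x1 expand (C -> M)."""
    return H * W * (N * C + K * K * C * C + C * M)

N = M = 64      # input/output channels from the example above
C, K = 16, 3    # compressed channel dimension, kernel scale
H = W = 112     # cancels in the ratio
ratio = (compressed_conv_mults(H, W, N, M, C, K)
         / standard_conv_mults(H, W, N, M, K))
print(f"{ratio:.1%}")   # ~11.8% under this counting
```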
Through tests, without network acceleration optimization the lightweight neural network model reaches 20 ms/frame on an NVIDIA GTX1060 graphics card, runs at 103 ms/frame on the embedded artificial intelligence platform NVIDIA TX2, and has a network model size of 2.7 MB. After the subsequent neural network optimization, the running speed reaches about 30 ms/frame on the embedded NVIDIA TX2 platform.
Preferably, during the training process the neural network model is evaluated on the verification data set using pixel segmentation accuracy, i.e. the ratio of the number of pixels correctly segmented by the network to the number of all pixels, and the network is trained with this as the optimization target. The training platform is NVIDIA TITAN X, the training solver is the Adam solver, the number of training steps is 70000-100000, and 6 images are input to the neural network per step, the number depending on the performance of the graphics card used for training.
Preferably, the optimization and acceleration of the neural network model includes:
a. the parameters of all convolution layers, batch normalization layers (BatchNorm layers) and linear mapping layers in the model are extracted and fused, which greatly reduces the network computation. The fusion principle is as follows:
the batch normalization layer (BatchNorm layer) and the linear mapping layer (Scale layer) accelerate the training and convergence of the neural network through data normalization. At deployment, however, the network only performs forward inference and no longer back-propagates parameter updates; the batch normalization and linear mapping layers then amount to fixed linear transformations of the data, producing redundant computation that slows the network. Since every batch normalization layer and linear mapping layer in the network follows a convolution layer, their parameters can be fused directly into the convolution layer parameters.
The calculation formula in the batch normalization layer is x_norm = (x − mean) / √(var + ε), in which mean and var are the mean and variance of all the features in the normalization layer, and ε is a minimum value larger than 0 that prevents the denominator from being 0;
the data is linearly transformed in the linear mapping layer (Scale layer) with the formula y = γ·x_norm + β, in which γ and β are the linear mapping layer parameters;
the fused parameters are preferably:
w_new = γ·w_old / √(var + ε)
b_new = γ·(b_old − mean) / √(var + ε) + β
in which w_old is the convolution layer weight before fusion and b_old is the convolution layer bias before fusion.
Meanwhile, once model training is finished the convolution layer parameters are fixed, so they can be multiplied directly into the combined result of the two layers, merging the three network layers into one. After parameter merging the model is effectively compressed: the network model size drops from 2.7 MB to 1.8 MB, a compression ratio of 33%; inference time falls from 103 ms to about 78 ms, a speed-up of nearly 24.3%, an obvious improvement.
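A sketch of the fusion computation for one convolution layer, directly implementing the w_new/b_new formulas above (numpy; the array shapes in the docstring are the usual convolution-layer layout). In a Caffe deployment, the fused weights simply replace the original convolution parameters and the BatchNorm/Scale layers are dropped from the network definition.

```python
import numpy as np

def fuse_conv_bn_scale(w_old, b_old, mean, var, gamma, beta, eps=1e-5):
    """Fold batch-norm statistics (mean, var) and Scale parameters
    (gamma, beta) into the preceding convolution:
        w_new = gamma * w_old / sqrt(var + eps)
        b_new = gamma * (b_old - mean) / sqrt(var + eps) + beta
    Shapes: w_old (M, N, K, K); b_old, mean, var, gamma, beta (M,)."""
    scale = gamma / np.sqrt(var + eps)
    w_new = w_old * scale[:, None, None, None]  # scale each output channel
    b_new = scale * (b_old - mean) + beta
    return w_new, b_new
```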
b. The model is quantized to low precision using FP16, further reducing the model computation and memory demand so the model can run detection in real time on an embedded platform. Network training uses 32-bit floating point operations; because the many local connections of the neural network give it strong adaptive capacity, replacing the parameters with low-precision 16-bit half-precision floating point roughly doubles the computation speed without an obvious drop in the segmentation result. After the TensorRT-based half-precision quantization acceleration, the network's running time on the NVIDIA TX2 platform is compressed from 78 ms to 31 ms, a segmentation frame rate above 30 frames per second.
Step S2, online step: acquiring image data containing parking spaces online, performing parking space sideline semantic segmentation by using the trained neural network model to obtain parking space sideline masks, and fitting, clustering and combining the obtained sideline masks to obtain geometric shapes composed of sidelines; and screening the geometric shapes according to set shape discrimination conditions to determine the parking spaces.
The method specifically comprises the following steps:
1) during the driving of the vehicle, a camera mounted on the vehicle is used to collect image information containing parking spaces online;
2) performing parking space sideline semantic segmentation on the picture information with the neural network model established in the offline step S1 to obtain parking space sideline masks;
3) scanning the parking space sideline semantic segmentation result line by line, extracting the center points of continuous regions in the segmentation result, and on this basis performing straight-line fitting with the Hough transform to obtain each sideline of the parking spaces contained in the picture.
4) Clustering and combining sidelines and judging the geometrical shape of the parking space:
In the image, the same sideline may be split into several straight line segments by occlusion, shadow or sideline wear; the straight line segments fitted by the Hough transform are therefore clustered, and segments satisfying all of the following positional relations are merged into the same straight line segment, reducing the computation and the mismatching rate when combining parking space sidelines (a code sketch of these tests follows the criteria below):
a. the two straight line segments are approximately parallel to each other (|Δθ| < θ_T, θ_T = 5°);
b. the distance between the straight line segments is smaller than the pixel width of a parking space sideline in the image, the distance between straight line segments being defined as the distance from the center point of either segment to the line containing the other segment;
c. the distance between the nearest points of the two straight line segments is smaller than a threshold d_T, which is set according to L_s, the pixel length of the short side of the parking space in the image.
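An illustrative sketch of the fitting and clustering tests above: row-wise center-point extraction, OpenCV Hough line fitting, and the pairwise criteria a-c. All numeric thresholds are placeholders, and d_gap stands in for the L_s-based threshold d_T.

```python
import numpy as np
import cv2

def fit_sideline_segments(mask):
    """Row-by-row center points of continuous mask runs, then Hough
    straight-line fitting. mask: uint8 HxW, sideline pixels nonzero."""
    centres = np.zeros_like(mask)
    for r, row in enumerate(mask):
        cols = np.flatnonzero(row)
        if cols.size == 0:
            continue
        # split the row into continuous runs and keep each run's center
        runs = np.split(cols, np.where(np.diff(cols) > 1)[0] + 1)
        for run in runs:
            centres[r, int(run.mean())] = 255
    segs = cv2.HoughLinesP(centres, 1, np.pi / 180, threshold=30,
                           minLineLength=40, maxLineGap=10)
    return [] if segs is None else [s[0] for s in segs]  # (x1, y1, x2, y2)

def same_sideline(seg_a, seg_b, theta_t_deg=5.0, d_line=8.0, d_gap=60.0):
    """Clustering test per criteria a-c: near-parallel, small
    center-to-line distance, small nearest-endpoint gap."""
    ax1, ay1, ax2, ay2 = map(float, seg_a)
    bx1, by1, bx2, by2 = map(float, seg_b)
    ang_a = np.arctan2(ay2 - ay1, ax2 - ax1)
    ang_b = np.arctan2(by2 - by1, bx2 - bx1)
    d_theta = abs((ang_a - ang_b + np.pi / 2) % np.pi - np.pi / 2)
    if np.degrees(d_theta) >= theta_t_deg:                 # criterion a
        return False
    cx, cy = (bx1 + bx2) / 2, (by1 + by2) / 2              # center of b
    dist = (abs((ax2 - ax1) * (ay1 - cy) - (ax1 - cx) * (ay2 - ay1))
            / np.hypot(ax2 - ax1, ay2 - ay1))
    if dist >= d_line:                                     # criterion b
        return False
    gap = min(np.hypot(pa[0] - pb[0], pa[1] - pb[1])       # criterion c
              for pa in ((ax1, ay1), (ax2, ay2))
              for pb in ((bx1, by1), (bx2, by2)))
    return gap < d_gap
```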
In an actual scene, a parking space is a rectangle or parallelogram enclosed by 4 or 3 sidelines (specially shaped parking markings are outside the detection scope of the technology in this patent), and the enclosed area, the parallelism and the distance between opposite sides all conform to certain standards. Therefore, the scale of a standard parking space is first calibrated according to the size of actual parking spaces in the image; combinations of 4 sidelines and of 3 sidelines are then selected by full enumeration from the extracted straight line segments, and the parking space geometric shapes are screened by the following rules to obtain the parking spaces conforming to the geometric relations (a code sketch follows these rules):
a. the area enclosed by the sidelines is near the calibrated area, i.e. |S − S_C| < S_d, in which S_C is the calibrated standard area, equal to the arithmetic mean of the areas of 10-30 randomly selected standard parking spaces in the top-view image, and S_d is the threshold for the difference between the detected parking space area and the calibrated area, equal to half the range of the areas of those 10-30 standard parking spaces;
b. opposite sidelines are parallel: with θ_d denoting the included angle between the sidelines, |θ_d| < 5°; in the case of three sidelines, only the single pair of opposite sides is judged by this condition;
c. the distance between opposite sidelines is within a certain range: for the pair of long sides, |D_l − D_Cl| < D_dl, in which D_Cl is the calibrated standard distance between the long sides, equal to the arithmetic mean, over 10-30 randomly selected standard parking spaces in the top-view image, of the distance from the center point of either long side to the line containing the other long side, and D_dl is the threshold for the difference between the detected and calibrated long-side distances, equal to half the range of the long-side distances of those 10-30 standard parking spaces; the same constraint applies to the distance between the short sides, i.e. |D_s − D_Cs| < D_ds.
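A sketch of rules a-c for the four-sideline case (the three-sideline case, which checks a single pair of opposite sides, is omitted); the calibrated values s_c, s_d, d_cl, d_dl, d_cs, d_ds would come from the 10-30 calibration parking spaces described above.

```python
import numpy as np

def is_parking_space(corners, s_c, s_d, d_cl, d_dl, d_cs, d_ds):
    """Screen a 4-corner candidate: area near the calibrated standard
    (rule a), opposite sides parallel within 5 degrees (rule b), and
    opposite-side spacings near the calibrated long/short-side distances
    (rule c). corners: four (x, y) vertices in order around the shape."""
    p = np.asarray(corners, dtype=float)
    x, y = p[:, 0], p[:, 1]
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    if abs(area - s_c) >= s_d:                              # rule a
        return False
    sides = [p[(i + 1) % 4] - p[i] for i in range(4)]
    for a, b in ((0, 2), (1, 3)):                           # rule b
        d = (np.arctan2(sides[a][1], sides[a][0])
             - np.arctan2(sides[b][1], sides[b][0]))
        d = abs((d + np.pi / 2) % np.pi - np.pi / 2)        # fold to [0, pi/2]
        if np.degrees(d) >= 5.0:
            return False

    def spacing(i, j):
        """Distance from the center of side i to the line through side j."""
        c = (p[i] + p[(i + 1) % 4]) / 2
        q, v = p[j], sides[j]
        return (abs(v[0] * (c[1] - q[1]) - v[1] * (c[0] - q[0]))
                / np.hypot(v[0], v[1]))

    s02, s13 = spacing(0, 2), spacing(1, 3)                 # rule c
    return ((abs(s02 - d_cl) < d_dl and abs(s13 - d_cs) < d_ds) or
            (abs(s13 - d_cl) < d_dl and abs(s02 - d_cs) < d_ds))
```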
Preferably, the image data including the parking space is acquired by a calibrated camera mounted on the vehicle; by calibrating the camera, the image distortion caused by the camera and the shooting angle is removed, and the problem of parking space deformation caused by the perspective effect is solved;
the method specifically comprises the following steps:
1) carrying out offline calibration on internal and external parameters of the camera:
the camera introduces image distortion when capturing pictures; the internal and external parameters of the camera are obtained by calibration using images captured by the camera, the images are corrected, and the image distortion caused by camera lens imaging is removed.
2) Calibrating an inverse perspective transformation matrix of the camera:
because the camera is mounted on the vehicle, the imaged shape of a parking space is distorted by the perspective effect; calibrating the inverse perspective transformation matrix of the camera converts the forward-looking image into an overlooking top view, solving the parking space shape distortion caused by the perspective imaging of the camera.
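An illustrative OpenCV sketch of both calibration uses: undistortion with offline-calibrated intrinsics, then the inverse perspective mapping to a top view. The intrinsic values and the four point correspondences are placeholders; in practice they come from the offline calibration (e.g., cv2.calibrateCamera on checkerboard images and measured ground points).

```python
import numpy as np
import cv2

# Placeholder intrinsics and distortion coefficients from offline calibration
camera_matrix = np.array([[700.0, 0.0, 320.0],
                          [0.0, 700.0, 240.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.array([-0.3, 0.1, 0.0, 0.0, 0.0])

def to_top_view(frame, src_pts, dst_pts, out_size=(448, 448)):
    """Undistort a forward-looking frame, then warp it to a top view.
    src_pts: four ground points in the undistorted image; dst_pts: their
    top-view positions (both 4x2, float32)."""
    undist = cv2.undistort(frame, camera_matrix, dist_coeffs)
    ipm = cv2.getPerspectiveTransform(np.float32(src_pts),
                                      np.float32(dst_pts))
    return cv2.warpPerspective(undist, ipm, out_size)
```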
More preferably, the online step may employ two cameras mounted on the left and right sides of the vehicle, detecting the parking spaces on each side and projecting them into the vehicle body coordinate system, thereby enlarging the detection range. This scheme detects both perpendicular and parallel parking spaces well.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.
Claims (7)
1. A rapid parking space detection method based on deep learning is characterized by comprising the following steps:
an off-line step: acquiring image data including parking spaces offline, and establishing a training and verification data set; training, evaluating and optimizing a neural network model; the neural network model is used for performing semantic segmentation on a parking space sideline in the image data; the neural network model is a lightweight deep learning semantic segmentation model;
an online step: acquiring image data containing parking spaces online, performing parking space sideline semantic segmentation by using the trained neural network model to obtain parking space sideline masks, and fitting, clustering and combining the obtained sideline masks to obtain geometric shapes composed of sidelines; screening the geometric shapes according to set shape discrimination conditions to determine the parking spaces;
clustering the sidelines includes:
1) judging whether the included angle Δθ between two straight line segments of the sidelines is smaller than a clustering angle threshold θ_T;
2) judging whether the distance between the two straight line segments is smaller than the pixel width of a parking space sideline in the image, the distance between two straight line segments being the distance from the center point of either segment to the line containing the other segment;
3) judging whether the distance between the nearest points of the two straight line segments is smaller than a threshold d_T, which is set according to L_s, the pixel length of the short side of the parking space in the image;
4) clustering straight line segments satisfying 1) to 3);
the lightweight deep learning semantic segmentation model comprises a preprocessing unit, a downsampling feature extraction unit, an upsampling feature extraction unit and a model output unit;
the preprocessing unit is used for reducing the dimensions of the width dimension and the height dimension of the input image;
the down-sampling feature extraction unit is used for extracting features and reducing the feature width and height dimensions; the down-sampling feature extraction unit has two stages of down-sampling modules;
each stage of the down-sampling module is composed as follows: a trunk structure first uses 1×1 convolution to reduce the channel dimension to 1/4 of the input feature channel dimension, then performs a standard 3×3 convolution with stride 2, so the width and height of the output features fall to half those of the input while the channel dimension stays equal to the input; 1×1 convolution then expands the feature channel dimension without changing the width and height; a lateral connection structure first reduces the feature width and height dimensions with a max-pooling operation of stride 2 and then expands the feature channel dimension with a 1×1 convolution; the features output by the trunk structure and the lateral connection structure are then added element by element to realize feature fusion;
an up-sampling feature extraction unit for expanding the width and height dimensions of the features and extracting features, obtaining an up-sampled binary image with the same width and height as the image output by the preprocessing unit; the up-sampling feature extraction unit has two stages of up-sampling modules;
each stage of the up-sampling module is composed as follows: a trunk structure first uses 1×1 convolution to reduce the channel dimension to 1/4 of the input channel dimension, then uses 3×3 deconvolution to extract features and raise the width and height dimensions, and then expands the channel dimension with 1×1 convolution; the input features of the lateral connection are the features of matching width and height dimensions from the down-sampling module, which are added element by element to the features produced by the trunk convolutions, obtaining fused feature information from different levels;
and the model output unit is used for performing interpolation processing on the up-sampled binary image to output a binary image with the same width and height as the input image, wherein parking space sideline pixels in the binary image are 1 and non-sideline pixels are 0.
2. The rapid parking space detection method according to claim 1,
the offline step specifically comprises:
1) acquiring a plurality of groups of picture data containing parking spaces in an off-line manner, marking the side line areas of the parking spaces in the pictures, and constructing a training and verifying data set;
2) constructing a lightweight deep learning semantic segmentation model based on a channel compression convolution mode, and performing model parameter training by using a training data set;
3) establishing an evaluation standard, evaluating the trained model by using a verification data set, and adjusting model parameters;
4) and optimizing and accelerating the evaluated model.
3. The rapid parking space detection method according to claim 2, wherein the lightweight deep learning semantic segmentation model based on the channel compression convolution mode comprises:
a preprocessing unit for performing convolution and maximum pooling on an input image of size W_1×H_1×3, reducing the image in the width and height dimensions, concatenating the convolution result with the maximum-pooling result, and outputting a preprocessed image of size W_2×H_2×N_2;
a down-sampling feature extraction unit for sequentially performing two stages of down-sampling processing on the preprocessed image, reducing the width and height dimensions of the image, extracting sideline semantic features, and outputting a down-sampled image of size W_3×H_3×N_3;
an up-sampling feature extraction unit for sequentially performing two stages of up-sampling processing on the down-sampled image, increasing the width and height dimensions of the image, recovering sideline semantic features, and outputting an up-sampled binary image of size W_2×H_2×2;
a model output unit for performing interpolation processing on the up-sampled binary image and outputting a binary image of size W_1×H_1×2;
the two-stage up-sampling process corresponds to the two-stage down-sampling process; wherein the first stage upsampling process corresponds to the second stage downsampling process, and the second stage upsampling process corresponds to the first stage downsampling process.
4. The rapid parking space detection method according to any one of claims 1 to 3, wherein each convolution layer that performs convolution operation is sequentially connected with a batch normalization layer, a linear mapping layer and a linear rectification layer, and the batch normalization operation, the linear mapping operation and the linear rectification operation are performed on the convolution operation result to realize normalization and nonlinear transformation of output characteristics.
5. The rapid parking space detection method according to claim 4,
the optimization and acceleration of the neural network model includes,
a. extracting and fusing the parameters of all convolution layers, batch normalization layers and linear mapping layers in the model; the fused parameters are:
w_new = γ·w_old / √(var + ε)
b_new = γ·(b_old − mean) / √(var + ε) + β
wherein w_old is the convolution layer weight before fusion and b_old is the convolution layer bias before fusion;
γ and β are the parameters of the linear mapping layer;
mean and var are the mean and variance of all features in the normalization layer;
ε is a minimum value greater than 0;
b. quantizing the model to low precision using FP16.
6. The rapid parking space detection method according to claim 1,
the online steps specifically include:
1) during the driving of the vehicle, a camera mounted on the vehicle is used to collect image information containing parking spaces online;
2) performing parking space sideline semantic segmentation on the picture information by using the neural network model;
3) scanning the parking space sideline semantic segmentation result line by line, extracting the center points of continuous regions in the segmentation result, and on this basis performing straight-line fitting with the Hough transform to obtain each sideline of the parking spaces contained in the picture;
4) and clustering and combining the sidelines to form a geometric area, and judging the geometric area meeting the parking space judging condition as the parking space according to the set parking space judging condition.
7. The rapid parking space detection method according to claim 1, wherein the image data including the parking space is acquired by a calibrated camera mounted on the vehicle, and the calibration of the camera comprises:
1) off-line calibration is carried out on the internal and external parameters of the camera, and the parameters are used for eliminating image distortion caused by imaging of a camera lens;
2) and off-line calibration is carried out on the inverse perspective transformation matrix of the camera, so that the forward-looking image is converted into a top view, and the shape distortion of the parking space caused by the perspective transformation imaging of the camera is eliminated.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910429977.8A CN110210350B (en) | 2019-05-22 | 2019-05-22 | Rapid parking space detection method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910429977.8A CN110210350B (en) | 2019-05-22 | 2019-05-22 | Rapid parking space detection method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110210350A CN110210350A (en) | 2019-09-06 |
CN110210350B true CN110210350B (en) | 2021-12-21 |
Family ID: 67788167
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910429977.8A Active CN110210350B (en) | 2019-05-22 | 2019-05-22 | Rapid parking space detection method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110210350B (en) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110991452B (en) * | 2019-12-03 | 2023-09-19 | 深圳市捷顺科技实业股份有限公司 | Parking space frame detection method, device, equipment and readable storage medium |
JP7346267B2 (en) * | 2019-12-04 | 2023-09-19 | キヤノン株式会社 | Information processing device, relay device, system, information processing device control method, relay device control method and program |
CN110992267A (en) * | 2019-12-05 | 2020-04-10 | 北京科技大学 | Abrasive particle identification method based on DPSR and Lightweight CNN |
CN111179272B (en) * | 2019-12-10 | 2024-01-05 | 中国科学院深圳先进技术研究院 | Rapid semantic segmentation method for road scene |
CN111178236B (en) * | 2019-12-27 | 2023-06-06 | 清华大学苏州汽车研究院(吴江) | Parking space detection method based on deep learning |
CN111368846B (en) * | 2020-03-19 | 2022-09-09 | 中国人民解放军国防科技大学 | Road ponding identification method based on boundary semantic segmentation |
CN112365434B (en) * | 2020-11-10 | 2022-10-21 | 大连理工大学 | Unmanned aerial vehicle narrow passage detection method based on double-mask image segmentation |
CN112600221B (en) * | 2020-12-08 | 2023-03-03 | 深圳供电局有限公司 | Reactive compensation device configuration method, device, equipment and storage medium |
CN112560945B (en) * | 2020-12-14 | 2024-08-09 | 珠海格力电器股份有限公司 | Equipment control method and system based on emotion recognition |
CN112991171B (en) * | 2021-03-08 | 2023-07-28 | Oppo广东移动通信有限公司 | Image processing method, device, electronic equipment and storage medium |
CN113283429B (en) * | 2021-07-21 | 2021-09-21 | 四川泓宝润业工程技术有限公司 | Liquid level meter reading method based on deep convolutional neural network |
CN113658268B (en) * | 2021-08-04 | 2024-07-12 | 智道网联科技(北京)有限公司 | Verification method and device for camera calibration result, electronic equipment and storage medium |
CN113762272B (en) * | 2021-09-10 | 2024-06-14 | 北京精英路通科技有限公司 | Road information determining method and device and electronic equipment |
CN114882727B (en) * | 2022-03-15 | 2023-09-05 | 深圳市德驰微视技术有限公司 | Parking space detection method based on domain controller, electronic equipment and storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106373426B (en) * | 2016-09-29 | 2019-02-12 | 成都通甲优博科技有限责任公司 | Parking stall based on computer vision and violation road occupation for parking monitoring method |
CN107516110B (en) * | 2017-08-22 | 2020-02-18 | 华南理工大学 | Medical question-answer semantic clustering method based on integrated convolutional coding |
- 2019-05-22: CN application CN201910429977.8A, granted as CN110210350B, status Active
Also Published As
Publication number | Publication date |
---|---|
CN110210350A (en) | 2019-09-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||