CN115116038A - Obstacle identification method and system based on binocular vision - Google Patents

Obstacle identification method and system based on binocular vision

Info

Publication number
CN115116038A
CN115116038A
Authority
CN
China
Prior art keywords
original image
obstacle
sample
obstacle recognition
stixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211044451.6A
Other languages
Chinese (zh)
Other versions
CN115116038B (en)
Inventor
谢启伟
周珍
梅雨涵
裴姗姗
孙钊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Smarter Eye Technology Co Ltd
Original Assignee
Beijing Smarter Eye Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Smarter Eye Technology Co Ltd filed Critical Beijing Smarter Eye Technology Co Ltd
Priority to CN202211044451.6A priority Critical patent/CN115116038B/en
Publication of CN115116038A publication Critical patent/CN115116038A/en
Application granted granted Critical
Publication of CN115116038B publication Critical patent/CN115116038B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20228Disparity calculation for image-based rendering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30248Vehicle exterior or interior
    • G06T2207/30252Vehicle exterior; Vicinity of vehicle
    • G06T2207/30261Obstacle

Abstract

The invention discloses an obstacle recognition method and system based on binocular vision. The method comprises: acquiring an original image and computing a grayscale map of the original image; and inputting the grayscale map into a pre-trained obstacle recognition model to obtain an obstacle recognition result for the original image, wherein the obstacle recognition model is trained on original image samples, each being an image in which obstacles have been segmented by stixels (rod-shaped pixel elements). The method offers high detection precision and speed, is suitable for complex urban road environments, and solves the technical problems of low obstacle recognition efficiency and accuracy in the prior art.

Description

Obstacle identification method and system based on binocular vision
Technical Field
The invention relates to the technical field of driver assistance, and in particular to an obstacle recognition method and system based on binocular vision.
Background
In recent years, automatic driving and driver assistance have become widely used. As research on vehicle intelligence and networking deepens, increasing attention is being paid to how a vehicle can perceive the road environment ahead accurately and in time. Detecting and recognizing obstacles ahead in real time is of great significance for advanced driver assistance in intelligent vehicles.
However, in the prior art, vision-based obstacle detection and recognition algorithms must sequentially complete three steps: obstacle candidate region selection, obstacle recognition, and result integration. They therefore suffer from long candidate region selection times and low obstacle recognition accuracy.
Disclosure of Invention
Therefore, embodiments of the present invention provide an obstacle recognition method and system based on binocular vision, so as to at least partially solve the technical problems of low obstacle recognition efficiency and accuracy in the prior art.
In order to achieve the above object, the embodiments of the present invention provide the following technical solutions:
a binocular vision based obstacle recognition method, the method comprising:
acquiring an original image, and calculating a gray scale map of the original image;
inputting the gray scale image into a pre-trained obstacle recognition model to obtain an obstacle recognition result in the original image;
the obstacle identification model is obtained by training an original image sample, wherein the original image sample is an image obtained by dividing obstacles through rod-shaped pixels.
Further, the training process of the obstacle recognition model comprises the following steps:
acquiring original image samples, and performing data processing on them to generate a sample training set;
performing stixel-based obstacle segmentation on the target region of each original image in the sample training set;
and training on the sample data set until the result accuracy reaches a preset value, so as to obtain the obstacle recognition model.
Further, acquiring original image samples and performing data processing on them to generate a sample training set specifically comprises:
for the same target scene, applying two Stixel generation schemes simultaneously to segment the original image sample in the height direction;
and using a random forest to select between the segmentation schemes, thereby generating the sample training set.
Further, performing stixel-based obstacle segmentation on the target region of each original image in the sample training set specifically comprises:
in Stixel-based obstacle segmentation, performing deep learning with a deep learning network, representing the original image by Stixels, and separating the Stixels of different obstacles.
Further, in Stixel-based obstacle segmentation, the sample selection specifically comprises:
for a left-image color picture, resizing each of two adjacent Stixels to 35 × 70 and stitching them left to right into a 70 × 70 patch, where a Stixel combination from the same object serves as a positive sample and a combination from different objects serves as a negative sample;
doubling, by data augmentation, the negative samples formed by combining adjacent Stixels that do not belong to the same object at all;
and randomly selecting two clearly different obstacles and, for each Stixel of object 1, independently and randomly selecting m Stixels from object 2 to stitch with it as negative samples.
Further, the deep learning network is an improved DenseNet network.
The present invention also provides a binocular vision based obstacle recognition system, the system comprising:
an image acquisition unit, configured to acquire an original image of a target region and compute a grayscale map of the original image;
an obstacle recognition unit, configured to input the grayscale map into a pre-trained obstacle recognition model to obtain an obstacle recognition result for the original image;
wherein the obstacle recognition model is trained on original image samples, each being an image in which obstacles have been segmented by stixels.
The present invention also provides an intelligent terminal, comprising a data acquisition device, a processor, and a memory;
wherein the data acquisition device is configured to acquire data; the memory is configured to store one or more program instructions; and the processor is configured to execute the one or more program instructions to perform the method described above.
The present invention also provides a computer-readable storage medium containing one or more program instructions for executing the method described above.
The invention also provides a computer program product comprising a computer program which, when executed by a processor, implements the method described above.
According to the binocular vision based obstacle recognition method provided by the invention, an original image of a target region is acquired and its grayscale map is computed; the grayscale map is input into a pre-trained obstacle recognition model to obtain an obstacle recognition result for the original image; and the obstacle recognition model is trained on original image samples, each being an image in which obstacles have been segmented by stixels.
In the method, the left-right stitched data of pairs of adjacent Stixels are first used as input to the deep learning network DenseNet, and Stixels of clearly different obstacles are deliberately selected for data augmentation, enriching the scarce negative samples. The trained network then separates the different obstacles in an image. Next, the stitched data of all Stixels of the same obstacle are used as input to the deep learning network AlexNet, with data augmentation by flipping, and the trained network identifies the category of each object. During localization, interference from irrelevant objects is effectively avoided, improving obstacle recognition precision. The method offers high detection precision and speed, is suitable for complex urban road environments, and solves the technical problems of low obstacle recognition efficiency and accuracy in the prior art.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description are merely exemplary, and those of ordinary skill in the art can derive other embodiments from them without inventive effort.
The structures, proportions, and sizes shown in this specification are only intended to match the content disclosed herein for the understanding of those skilled in the art; they do not limit the conditions under which the invention can be implemented and thus carry no technical significance. Any structural modification, change of proportion, or adjustment of size that does not affect the effects and objectives achievable by the invention still falls within the scope covered by the technical content disclosed herein.
Fig. 1 is a flowchart of a binocular vision-based obstacle recognition method according to an embodiment of the present invention;
FIG. 2 is a Stixel effect diagram of scheme one in the random forest algorithm;
FIG. 3 is a diagram of the bottom-point positions of scheme two in the random forest algorithm;
FIG. 4 is a diagram of the height segmentation result of scheme two in the random forest algorithm;
FIG. 5 is a diagram of the final Stixel effect in the random forest algorithm;
FIG. 6 is a flow chart of a training process of an obstacle identification model in the method of FIG. 1;
FIG. 7 is a graph of training loss variation during training of the model shown in FIG. 6;
FIG. 8 is a diagram illustrating the pixel filling effect in the method of FIG. 1;
fig. 9 is a block diagram illustrating an embodiment of a binocular vision-based obstacle recognition system according to the present invention.
Detailed Description
The present invention is described below through particular embodiments, and other advantages and effects of the invention will be readily apparent to those skilled in the art from this disclosure. The described embodiments are merely some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without creative effort fall within the protection scope of the present invention.
First, the related art terms to which the present invention relates will be explained.
Binocular stereo vision: an important form of machine vision. Based on the parallax principle, imaging devices acquire left and right images of the measured object from different positions, and three-dimensional geometric information of the object is obtained by computing the positional deviation between corresponding points of the images.
Disparity map: binocular stereo vision fuses the images obtained by the two eyes and observes the differences between them, which produces a distinct sense of depth. Correspondences between features are established so that the mapping points of the same physical point in space are matched across the different images; the image of these positional differences (disparities) is called a disparity map.
Stixel: a rod-shaped pixel element. On the basis of the free space, the bottom point of an obstacle is determined, and height segmentation then determines the obstacle's top point, so that the selected region of interest spans from the drivable area up to the first obstacle. A rectangle of fixed width (for example, 7 pixels) and of the same height as the obstacle is chosen to represent the obstacle.
Free space: also called the drivable area. To realize intelligent and unmanned vehicle functions, the road must be divided into drivable and non-drivable areas, providing basic road information for vehicle behavior decision, path planning, and control. This requires finding a boundary line composed of individual pixels: below the line lies the drivable area, and above it lies the first obstacle the vehicle may encounter.
Height segmentation: after the free space is constructed, the height of the Stixel at each abscissa must be determined in order to decide whether the pixels on that Stixel belong to the same object. Based on the disparity map, the disparity value of each pixel is obtained and judged: if the disparity value is greater than a certain threshold, the pixel is judged to be background; otherwise, it belongs to the object.
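To make the height determination concrete, the following minimal sketch (Python with NumPy; the per-column scan, the function name, and the interpretation of the threshold as a tolerance around the bottom point's disparity are illustrative assumptions, not the patent's implementation) walks one disparity-map column upward from the obstacle bottom point:

```python
import numpy as np

def stixel_top(disparity: np.ndarray, col: int, bottom_row: int,
               delta: float = 3.0) -> int:
    """Walk one disparity-map column upward from the obstacle bottom point.

    Sketch only: pixels whose disparity stays within `delta` of the
    disparity at the bottom point are treated as part of the object; the
    first pixel deviating by more than `delta` is taken as background and
    terminates the stixel.
    """
    base_disp = disparity[bottom_row, col]
    top = bottom_row
    for row in range(bottom_row, -1, -1):   # move up the image
        if abs(disparity[row, col] - base_disp) > delta:
            break                           # background reached
        top = row
    return top                              # row index of the top point
```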
Referring to fig. 1, fig. 1 is a flowchart illustrating a binocular vision-based obstacle recognition method according to an embodiment of the present invention.
In one embodiment, the present invention provides a binocular vision-based obstacle recognition method, including:
s101: acquiring an original image, and calculating a gray scale map of the original image; during data processing, two Stixel generation schemes are provided, the advantages and the shortages of the Stixel generation schemes are respectively arranged on the height segmentation, random forests are used, various characteristics of data are input, and finally a model is trained to finish scheme selection.
In a specific use scenario, during data processing, the same scenario generates pixels under a scheme one, obtains free space under a scheme two, obtains height segmentation under the scheme two, and generates Stixels after a random forest is adopted at a vertex and two schemes are selected one by adopting the scheme two. Fig. 2-5 show effect diagrams, wherein, based on the same scene, fig. 2 is a Stixel effect diagram of a first scheme, fig. 3 is a plot of the positions of the nadirs of a second scheme, fig. 4 is a plot of the height division results of the second scheme, and fig. 5 is a final Stixel effect diagram. As can be seen from fig. 2-5, the bottom point of the first solution is higher, and the second solution is more accurate, so the second solution is adopted. For the vertex, one part of the scheme is inaccurate in height segmentation, the second scheme has the problem of unstable performance, and random forests are used for training the characteristics of the two vertexes as input data so as to decide which scheme is used.
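As an illustration of how a random forest could perform this scheme selection, the sketch below uses scikit-learn; the feature files, their contents, and the labeling convention (0 = scheme one, 1 = scheme two) are assumptions made for the example, not details from the patent:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical training data: each row describes one stixel's two candidate
# top points (e.g. their heights and local disparity statistics); the label
# records which scheme produced the better top point.
X_train = np.load("top_point_features.npy")   # hypothetical feature file
y_train = np.load("scheme_labels.npy")        # 0 = scheme one, 1 = scheme two

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

def pick_top(features: np.ndarray, top1: int, top2: int) -> int:
    """Choose the top point for one stixel with the trained forest."""
    return top1 if clf.predict(features.reshape(1, -1))[0] == 0 else top2
```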
S102: inputting the grayscale map into a pre-trained obstacle recognition model to obtain an obstacle recognition result for the original image;
wherein the obstacle recognition model is trained on original image samples, each being an image in which obstacles have been segmented by stixels.
In Stixel-based obstacle segmentation, deep learning is performed with an improved DenseNet, and the region of interest of an image is divided into different objects. Then, based on the segmented obstacles, deep learning is performed with AlexNet to identify the type of each obstacle.
In some embodiments, as shown in fig. 6, the training process of the obstacle recognition model includes the following steps:
s601: acquiring an original image sample, and performing data processing on the original image sample to generate a sample training set. Under the same target scene, two Stixel generation schemes are adopted simultaneously to realize segmentation of the original image sample in the height direction, and a random forest selection segmentation scheme is used to generate a sample training set.
S602: and carrying out barrier division based on rod-shaped pixels on the target area in each original image in the sample training set. Stixel-based obstacle partition, performing deep learning by using a deep learning network, representing the original image by Stixel, and then partitioning Stixels of different obstacles, wherein in the Stixel obstacle partition, the sample selection specifically comprises the following steps:
for a left image color picture, adjacent stilxels are respectively changed into 35 × 70 sizes from left to right to be spliced into 70 × 70 sizes from left to right, a stilxel combination of the same object is used as a positive sample, and a stilxel combination of different objects is used as a negative sample;
doubling the data enhancement of the negative samples after the combination of the adjacent Stixels which do not belong to the same object at all;
two obstacles with large differences are randomly selected, and for each Stixel of the object 1, m stixels in the object 2 are independently and randomly selected to be spliced and combined into a negative sample.
Wherein the deep learning network is an improved DenseNet network.
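The cross-object negative-sample generation just described might look like the following sketch (Python with OpenCV; the function name and the stixel-crop representation are assumptions, and m is assumed not to exceed the number of stixels in object 2):

```python
import random
import cv2  # OpenCV

def cross_object_negatives(obj1_stixels, obj2_stixels, m):
    """Stitch each stixel crop of object 1 with m randomly chosen stixel
    crops of object 2 into 70 x 70 negative samples (sketch)."""
    negatives = []
    for s1 in obj1_stixels:
        for s2 in random.sample(obj2_stixels, m):
            left = cv2.resize(s1, (35, 70))               # (width, height)
            right = cv2.resize(s2, (35, 70))
            negatives.append(cv2.hconcat([left, right]))  # 70 x 70 patch
    return negatives
```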
S603: training on the sample data set until the result accuracy reaches a preset value, so as to obtain the obstacle recognition model.
In a specific use scenario, when performing Stixel-based obstacle segmentation, for the left color image each of two adjacent Stixels is resized to 35 × 70 and the pair is stitched left to right into a 70 × 70 patch; a Stixel combination from the same object serves as a positive sample and a combination from different objects serves as a negative sample. Considering that at the transition between two obstacles a 7-pixel-wide Stixel may simultaneously contain features of both obstacles, such stitched Stixels serve as negative samples. Because the positive and negative samples are imbalanced, some negative samples must be added artificially: the negative samples formed by combining adjacent Stixels that do not belong to the same object at all are doubled by data augmentation. In addition, two clearly different obstacles, object 1 and object 2, are randomly selected. M Stixels of object 1 are randomly chosen, and for each of them, m Stixels from object 2 are independently and randomly selected and stitched with it as negative samples. After the left color image is converted to grayscale, the vertical Sobel operator is applied; the operator is computed as follows:
$$
G_x =
\begin{bmatrix}
-1 & 0 & +1 \\
-2 & 0 & +2 \\
-1 & 0 & +1
\end{bmatrix}
$$

(the original formula image is unavailable; shown is the standard 3 × 3 vertical-edge Sobel kernel implied by the surrounding text)
Meanwhile, the n Stixels of object 1 whose mean vertical Sobel response is largest are selected as the left element and stitched with their adjacent neighbours within the same object to form additional positive samples. Vertical edge information of a Stixel is particularly effective for obstacle segmentation, and the negative-sample augmentation described above also increases the probability that a positive sample is misidentified as a negative one; therefore, positive samples with stronger vertical edge information are selected for data augmentation.
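A possible sketch of the vertical-edge scoring used to pick these positive samples follows (OpenCV; the (x, y, w, h) rectangle representation of a stixel is an assumption):

```python
import cv2
import numpy as np

def vertical_edge_strength(gray: np.ndarray, stixel_box) -> float:
    """Mean absolute response of the vertical-edge (d/dx) Sobel operator
    over one stixel region of the left grayscale image (sketch)."""
    x, y, w, h = stixel_box
    patch = gray[y:y + h, x:x + w]
    gx = cv2.Sobel(patch, cv2.CV_64F, 1, 0, ksize=3)  # vertical edges
    return float(np.abs(gx).mean())

# The n stixels of object 1 with the largest scores would then be stitched
# with their in-object right neighbours as additional positive samples.
```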
The changes in data amount before and after augmentation for the data set of 131 images are shown in the following table.
[Table 1: data amounts of the 131-image data set before and after augmentation; the original table image is not reproduced]
Considering the effect of resizing on the data, before a Stixel pair is stitched, the shorter Stixel is padded on top with black pixels (0, 0, 0) so that the two crops are equal in height; both are then resized to 35 × 70, and left-right stitching yields the 70 × 70 input.
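The padding-and-stitching step can be sketched as follows (OpenCV; assumes 3-channel crops and top-only padding, as described above):

```python
import cv2

def pad_and_stitch(s1, s2):
    """Pad the shorter stixel crop on top with black (0, 0, 0) pixels so the
    two crops match in height, resize each to 35 x 70, and stitch them
    left to right into a 70 x 70 patch (sketch)."""
    h = max(s1.shape[0], s2.shape[0])

    def pad_top(img):
        return cv2.copyMakeBorder(img, h - img.shape[0], 0, 0, 0,
                                  cv2.BORDER_CONSTANT, value=(0, 0, 0))

    left = cv2.resize(pad_top(s1), (35, 70))    # (width, height)
    right = cv2.resize(pad_top(s2), (35, 70))
    return cv2.hconcat([left, right])
```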
For the deep learning network, a modified DenseNet121 network model is adopted. The number of dense layers in each dense block of DenseNet121 is retained, namely 6, 12, 24, and 16 layers respectively, but the 1 × 1 convolution in each layer is removed, leaving only the 3 × 3 convolution. The ratio of the training set to the validation set of the input data is 2:1.
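A sketch of such a modified dense layer in PyTorch, under the assumption that removing the 1 × 1 convolution leaves a plain BN-ReLU-3×3 layer with standard dense connectivity (an illustration, not the patent's exact code):

```python
import torch
import torch.nn as nn

class DenseLayer3x3(nn.Module):
    """Dense layer with the 1x1 bottleneck removed: BN -> ReLU -> 3x3 conv,
    with the output concatenated onto the input (dense connectivity)."""
    def __init__(self, in_channels: int, growth_rate: int = 32):
        super().__init__()
        self.norm = nn.BatchNorm2d(in_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv = nn.Conv2d(in_channels, growth_rate,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        new_features = self.conv(self.relu(self.norm(x)))
        return torch.cat([x, new_features], dim=1)

# Dense blocks of 6, 12, 24 and 16 such layers reproduce the DenseNet121
# block depths retained in the patent.
```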
The loss is computed with a cross-entropy loss function, and the model is evaluated and selected by its accuracy:
$$
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
$$
where TP (True Positive) is the number of samples that are actually positive and predicted positive; FP (False Positive) is the number of samples that are actually negative but predicted positive; FN (False Negative) is the number of samples that are actually positive but predicted negative; and TN (True Negative) is the number of samples that are actually negative and predicted negative. Accuracy reflects the proportion of correctly judged data (TP + TN) in the total data. Over 50 training epochs, the model with the highest accuracy is kept as the best model of the run. The training loss variation is shown in FIG. 7.
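The keep-the-best-model-over-50-epochs procedure might be sketched as follows (PyTorch; the Adam optimizer and learning rate are assumptions, since the patent does not specify them):

```python
import copy
import torch
import torch.nn as nn

def train(model, train_loader, val_loader, epochs=50, lr=1e-3):
    """Cross-entropy training that keeps the weights with the highest
    validation accuracy seen over the epochs (sketch)."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # assumed choice
    best_acc, best_state = 0.0, None
    for _ in range(epochs):
        model.train()
        for x, y in train_loader:
            optimizer.zero_grad()
            criterion(model(x), y).backward()
            optimizer.step()
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for x, y in val_loader:
                correct += (model(x).argmax(dim=1) == y).sum().item()
                total += y.numel()
        acc = correct / total        # Accuracy = (TP + TN) / all samples
        if acc > best_acc:
            best_acc = acc
            best_state = copy.deepcopy(model.state_dict())
    return best_acc, best_state
```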
After the obstacles are segmented, they are recognized one by one. The input stitched Stixel combination retains the original relative position information: the highest top point and the lowest bottom point are determined first, and Stixels of insufficient height are padded with black pixels above and below; an example is shown in FIG. 8. Compared with a traditional candidate box, this effectively eliminates the influence of irrelevant objects inside the rectangular box on obstacle recognition. The obstacle segmentation step is based on the left color image, because distinguishing different obstacles relies on color changes; the recognition step is based on the left grayscale image, because including color information would increase the probability that different objects of the same color are misclassified. The training network for obstacle recognition is the classic AlexNet.
Among the obstacles segmented from the 131 pictures, objects comprising at least two Stixels are selected as input data, and all data are augmented by flipping. All obstacles are classified into five classes: Boundary, Car, Van, People, and Bicycle. The classified data amounts are shown in Table 2. The input data are resized to 48 × 96, and the ratio of the training set to the validation set is 3:1.
[Table 2: data amounts per obstacle class; the original table image is not reproduced]
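An AlexNet-style classifier for the 48 × 96 grayscale inputs could be sketched as below (PyTorch; the layer sizes are assumptions chosen to fit the stated input size, since the patent does not disclose the exact configuration and the classical AlexNet layout expects larger inputs):

```python
import torch.nn as nn

CLASSES = ["Boundary", "Car", "Van", "People", "Bicycle"]

class AlexNetStyle(nn.Module):
    """AlexNet-style classifier for 1 x 48 x 96 stixel stacks (sketch)."""
    def __init__(self, num_classes: int = len(CLASSES)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=5, stride=2, padding=2),  # -> 24 x 48
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                                       # -> 12 x 24
            nn.Conv2d(64, 192, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                                       # -> 6 x 12
            nn.Conv2d(192, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d((3, 6)),                          # -> 3 x 6
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.5),
            nn.Linear(256 * 3 * 6, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Flip augmentation as described could be applied with
# torchvision.transforms.RandomHorizontalFlip during data loading.
```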
The loss is again computed with a cross-entropy loss function, and the model is evaluated and selected by accuracy; over 50 training epochs, the model with the highest accuracy is kept as the best model of the run.
In the above specific embodiment, the binocular vision based obstacle recognition method provided by the invention acquires an original image of a target region and computes its grayscale map; the grayscale map is input into a pre-trained obstacle recognition model to obtain an obstacle recognition result for the original image; and the obstacle recognition model is trained on original image samples, each being an image in which obstacles have been segmented by stixels.
In the method, the left-right stitched data of pairs of adjacent Stixels are first used as input to the deep learning network DenseNet, and Stixels of clearly different obstacles are deliberately selected for data augmentation, enriching the scarce negative samples. The trained network then separates the different obstacles in an image. Next, the stitched data of all Stixels of the same obstacle are used as input to the deep learning network AlexNet, with data augmentation by flipping, and the trained network identifies the category of each object. During localization, interference from irrelevant objects is effectively avoided, improving obstacle recognition precision. The method offers high detection precision and speed, is suitable for complex urban road environments, and solves the technical problems of low obstacle recognition efficiency and accuracy in the prior art.
In addition to the above method, the present invention also provides a binocular vision based obstacle recognition system, as shown in FIG. 9, the system comprising:
an image acquisition unit 100, configured to acquire an original image and compute a grayscale map of the original image;
an obstacle recognition unit 200, configured to input the grayscale map into a pre-trained obstacle recognition model to obtain an obstacle recognition result for the original image;
wherein the obstacle recognition model is trained on original image samples, each being an image in which obstacles have been segmented by stixels.
In the above embodiment, the binocular vision based obstacle recognition system provided by the invention acquires an original image and computes its grayscale map; the grayscale map is input into a pre-trained obstacle recognition model to obtain an obstacle recognition result for the original image; and the obstacle recognition model is trained on original image samples, each being an image in which obstacles have been segmented by stixels.
In the system, the left-right stitched data of pairs of adjacent Stixels are first used as input to the deep learning network DenseNet, and Stixels of clearly different obstacles are deliberately selected for data augmentation, enriching the scarce negative samples. The trained network then separates the different obstacles in an image. Next, the stitched data of all Stixels of the same obstacle are used as input to the deep learning network AlexNet, with data augmentation by flipping, and the trained network identifies the category of each object. During localization, interference from irrelevant objects is effectively avoided, improving obstacle recognition precision. Testing shows that the system offers high detection precision and speed, is suitable for complex urban road environments, and solves the technical problems of low obstacle recognition efficiency and accuracy in the prior art.
The present invention also provides an intelligent terminal, comprising a data acquisition device, a processor, and a memory;
wherein the data acquisition device is configured to acquire data; the memory is configured to store one or more program instructions; and the processor is configured to execute the one or more program instructions to perform the method described above.
In correspondence with the above embodiments, the present invention also provides a computer-readable storage medium containing one or more program instructions, where the one or more program instructions are used by a binocular camera depth calibration system to execute the method described above.
Corresponding to the above embodiments, the present invention also provides a computer program product, including a computer program, which when executed by a processor implements the method as described above.
In an embodiment of the invention, the processor may be an integrated circuit chip with signal processing capability. The processor may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
The processor may implement or execute the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general purpose processor may be a microprocessor, or any conventional processor. The steps of the method disclosed in the embodiments of the present invention may be embodied directly in a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The processor reads the information in the storage medium and completes the steps of the method in combination with its hardware.
The storage medium may be a memory, for example, which may be volatile memory or nonvolatile memory, or which may include both volatile and nonvolatile memory.
The nonvolatile Memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash Memory.
The volatile memory may be a Random Access Memory (RAM), which serves as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM).
The storage media described in connection with the embodiments of the invention are intended to comprise, without being limited to, these and any other suitable types of memory.
Those skilled in the art will recognize that, in one or more of the examples described above, the functionality described in this disclosure may be implemented with a combination of hardware and software. When implemented in software, the corresponding functions may be stored on, or transmitted over, a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium accessible by a general purpose or special purpose computer.
The above embodiments are only for illustrating the embodiments of the present invention and are not to be construed as limiting the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made on the basis of the embodiments of the present invention shall be included in the scope of the present invention.

Claims (10)

1. A binocular vision based obstacle recognition method, the method comprising:
acquiring an original image, and computing a grayscale map of the original image;
inputting the grayscale map into a pre-trained obstacle recognition model to obtain an obstacle recognition result for the original image;
wherein the obstacle recognition model is trained on original image samples, each being an image in which obstacles have been segmented by stixels.
2. The obstacle recognition method according to claim 1, wherein the training process of the obstacle recognition model comprises:
acquiring original image samples, and performing data processing on them to generate a sample training set;
performing stixel-based obstacle segmentation on the target region of each original image in the sample training set;
and training on the sample data set until the result accuracy reaches a preset value, so as to obtain the obstacle recognition model.
3. The obstacle recognition method according to claim 2, wherein acquiring original image samples and performing data processing on them to generate a sample training set specifically comprises:
for the same target scene, applying two Stixel generation schemes simultaneously to segment the original image sample in the height direction;
and using a random forest to select between the segmentation schemes, thereby generating the sample training set.
4. The obstacle recognition method according to claim 2, wherein performing stixel-based obstacle segmentation on the target region of each original image in the sample training set specifically comprises:
in Stixel-based obstacle segmentation, performing deep learning with a deep learning network, representing the original image by Stixels, and separating the Stixels of different obstacles.
5. The obstacle recognition method according to claim 4, wherein in Stixel-based obstacle segmentation the sample selection specifically comprises:
for a left-image color picture, resizing each of two adjacent Stixels to 35 × 70 and stitching them left to right into a 70 × 70 patch, where a Stixel combination from the same object serves as a positive sample and a combination from different objects serves as a negative sample;
doubling, by data augmentation, the negative samples formed by combining adjacent Stixels that do not belong to the same object at all;
and randomly selecting two clearly different obstacles and, for each Stixel of object 1, independently and randomly selecting m Stixels from object 2 to stitch with it as negative samples.
6. The obstacle recognition method according to claim 5, wherein the deep learning network is an improved DenseNet network.
7. A binocular vision based obstacle recognition system, the system comprising:
an image acquisition unit, configured to acquire an original image and compute a grayscale map of the original image;
an obstacle recognition unit, configured to input the grayscale map into a pre-trained obstacle recognition model to obtain an obstacle recognition result for the original image;
wherein the obstacle recognition model is trained on original image samples, each being an image in which obstacles have been segmented by stixels.
8. An intelligent terminal, characterized in that the intelligent terminal comprises a data acquisition device, a processor, and a memory;
wherein the data acquisition device is configured to acquire data; the memory is configured to store one or more program instructions; and the processor is configured to execute the one or more program instructions to perform the method of any one of claims 1-6.
9. A computer-readable storage medium having one or more program instructions embodied therein for performing the method of any of claims 1-6.
10. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-6.
CN202211044451.6A 2022-08-30 2022-08-30 Obstacle identification method and system based on binocular vision Active CN115116038B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211044451.6A CN115116038B (en) 2022-08-30 2022-08-30 Obstacle identification method and system based on binocular vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211044451.6A CN115116038B (en) 2022-08-30 2022-08-30 Obstacle identification method and system based on binocular vision

Publications (2)

Publication Number Publication Date
CN115116038A 2022-09-27
CN115116038B 2023-03-24

Family

ID=83336029

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211044451.6A Active CN115116038B (en) 2022-08-30 2022-08-30 Obstacle identification method and system based on binocular vision

Country Status (1)

Country Link
CN (1) CN115116038B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018058356A1 (en) * 2016-09-28 2018-04-05 驭势科技(北京)有限公司 Method and system for vehicle anti-collision pre-warning based on binocular stereo vision
CN109508673A (en) * 2018-11-13 2019-03-22 大连理工大学 It is a kind of based on the traffic scene obstacle detection of rodlike pixel and recognition methods
US10303980B1 (en) * 2018-09-05 2019-05-28 StradVision, Inc. Learning method, learning device for detecting obstacles and testing method, testing device using the same
CN110097109A (en) * 2019-04-25 2019-08-06 湖北工业大学 A kind of road environment obstacle detection system and method based on deep learning
CN110667474A (en) * 2018-07-02 2020-01-10 北京四维图新科技股份有限公司 General obstacle detection method and device and automatic driving system
CN112184700A (en) * 2020-10-21 2021-01-05 西北民族大学 Monocular camera-based agricultural unmanned vehicle obstacle sensing method and device

Also Published As

Publication number Publication date
CN115116038B (en) 2023-03-24

Similar Documents

Publication Publication Date Title
WO2022083402A1 (en) Obstacle detection method and apparatus, computer device, and storage medium
CN112418278A (en) Multi-class object detection method, terminal device and storage medium
CN108645375B (en) Rapid vehicle distance measurement optimization method for vehicle-mounted binocular system
CN114359181B (en) Intelligent traffic target fusion detection method and system based on image and point cloud
US20230144678A1 (en) Topographic environment detection method and system based on binocular stereo camera, and intelligent terminal
Pantilie et al. Real-time obstacle detection using dense stereo vision and dense optical flow
US20220277470A1 (en) Method and system for detecting long-distance target through binocular camera, and intelligent terminal
CN113128347B (en) Obstacle target classification method and system based on RGB-D fusion information and intelligent terminal
EP3703008A1 (en) Object detection and 3d box fitting
CN109115232B (en) Navigation method and device
CN115082450A (en) Pavement crack detection method and system based on deep learning network
CN114495043A (en) Method and system for detecting up-and-down slope road conditions based on binocular vision system and intelligent terminal
CN112257668A (en) Main and auxiliary road judging method and device, electronic equipment and storage medium
CN111950428A (en) Target obstacle identification method and device and carrier
CN113140002B (en) Road condition detection method and system based on binocular stereo camera and intelligent terminal
CN114463303A (en) Road target detection method based on fusion of binocular camera and laser radar
CN115116038B (en) Obstacle identification method and system based on binocular vision
CN113965742B (en) Dense disparity map extraction method and system based on multi-sensor fusion and intelligent terminal
CN114972470A (en) Road surface environment obtaining method and system based on binocular vision
CN110796103A (en) Target based on fast-RCNN and distance detection method thereof
CN113674275B (en) Dense disparity map-based road surface unevenness detection method and system and intelligent terminal
CN116310368A (en) Laser radar 3D target detection method
CN115100621A (en) Ground scene detection method and system based on deep learning network
CN116229448A (en) Three-dimensional target detection method, device, equipment and readable storage medium
CN115497061A (en) Method and device for identifying road travelable area based on binocular vision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant