CN110992304B - Two-dimensional image depth measurement method and application thereof in vehicle safety monitoring

Two-dimensional image depth measurement method and application thereof in vehicle safety monitoring

Info

Publication number
CN110992304B
CN110992304B (application CN201911044348.XA)
Authority
CN
China
Prior art keywords
image
blocks
dimensional image
frequency domain
texture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911044348.XA
Other languages
Chinese (zh)
Other versions
CN110992304A (en)
Inventor
郭宇翔
郭中阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Libang Hexin Automotive Brake System Co ltd
Original Assignee
Zhejiang Libang Hexin Automotive Brake System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Libang Hexin Automotive Brake System Co ltd filed Critical Zhejiang Libang Hexin Automotive Brake System Co ltd
Priority to CN201911044348.XA
Publication of CN110992304A
Application granted
Publication of CN110992304B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/40 Analysis of texture
    • G06T7/41 Analysis of texture based on statistical description of texture
    • G06T7/44 Analysis of texture based on statistical description of texture using image operators, e.g. filters, edge density metrics or local histograms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20021 Dividing image into blocks, subimages or windows
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Quality & Reliability (AREA)
  • Probability & Statistics with Applications (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a two-dimensional image depth measurement method and its application in vehicle safety monitoring. Near/far depth information is obtained by comparing the correlation between image texture and blur. When the method is applied to vehicle safety monitoring, the distance between the host vehicle and other road traffic participants can be estimated from the depth information, enabling blind spot detection, automatic emergency braking, adaptive cruise control, and the like. With this measurement method, three-dimensional depth information can be obtained from a single-frame planar two-dimensional image; the computational load of the whole depth measurement process is small; there are no requirements on the pixel color, brightness, static background, application scene, or the like of the acquired two-dimensional image, so applicability and real-time performance are strong; and the method does not depend on accumulated subjective user experience, giving it high reliability.

Description

Two-dimensional image depth measurement method and application thereof in vehicle safety monitoring
Technical Field
The invention relates to a two-dimensional image depth measurement method and its application in vehicle safety monitoring, and belongs to the field of vehicle safety monitoring.
Background
The widespread application of Advanced Driver Assistance System (ADAS) technology has effectively improved automobile safety. Within the range of ADAS functions, camera applications are gaining more attention than radar and ultrasonic sensors. High-cost stereoscopic binocular cameras are not preferred, because mass-market automotive products impose extremely demanding cost-control requirements.
The monocular camera, by contrast, is a natural fit for vehicle-mounted use because of its already wide range of applications, including in everyday automotive life. An automobile is a moving platform, and locating its position requires environment-sensing technology. The images acquired with a monocular camera are planar and two-dimensional, so a two-dimensional image algorithm is needed to derive the relevant depth information, i.e. the distances between the car itself and the static and dynamic objects around it.
At present, methods for extracting image depth information from a monocular camera include the following:
(1) Foreground/background image segmentation: the color and spatial position of every pixel are required, the calibration of classification and distance information is computationally heavy, and the demands on pixel quality are high, so computing capacity and cost are problems;
(2) Multiple spatial scales: several dimensions must be defined so that the image contains the information needed to build a multidimensional space map; the coordinate axes of the space are constructed and the whole spatial structure interpreted according to subjective experience, after which distance information is calculated by analysis and classification in a low-dimensional space. Because the result is tightly coupled to subjective experience, its reliability is questionable;
(3) Semantic segmentation labeling: every pixel in the image is assigned to a category, achieving pixel-level classification, and different scenes and objects are then labeled on this basis to estimate distance information; scene segmentation of this kind is computationally heavy, and the loss of detail leaves the result coarse;
(4) Motion and geometry fusion: each frame of a two-dimensional video is scene-segmented to separate the static background from the dynamic foreground, and distance information is calculated from the geometric information to generate a geometric depth map of the static background. In real driving, the identified static background keeps changing, so the real-time response lags;
(5) Line-segment feature extraction: a structured environment of point features must be built up to construct line-segment features; this depends heavily on the environment, and point features perform poorly in scenes with, for example, texture loss. In addition, such applications often need the point and line features extracted from binocular camera images.
In view of the above, the present inventors studied the problem and developed the two-dimensional image depth measurement method and its application in vehicle safety monitoring described below.
Disclosure of Invention
The first aim of the invention is to provide a two-dimensional image depth measurement method that obtains near/far depth information from a correlation comparison of texture and blur, and from that information an estimate of distance.
In order to achieve the above object, the solution of the present invention is:
a two-dimensional image depth measurement method comprising the steps of:
1) Uniformly dividing a two-dimensional image into N blocks, setting one of them as a reference position block and the remaining N-1 blocks as blocks to be measured;
2) Performing coarse-to-fine analysis on the N blocks of the two-dimensional image to enhance the saliency of the image texture features and highlight blurred-edge sharpness information;
3) Performing principal component analysis on the extracted image texture features to reduce the dimensionality of the image data and obtain the edge-line feature quantity of a pixel set;
4) Analyzing the edge-line feature quantity of the pixel set in the spatial frequency domain to obtain the texture density of the image's spatial frequency domain, and calculating from the texture density the distance information of each block to be measured relative to the reference position block.
Preferably, the coarse-to-fine analysis uses downsampling: 3-6 images including the original are obtained by downsampling at a set scaling factor, and texture features are then extracted from each of the resulting images. The aim of the coarse-to-fine analysis is to strengthen, across the different pixel maps, the prominence of the texture features and blurred-edge sharpness information to be extracted, and thus to improve the confidence of feature extraction.
Preferably, the spatial frequency domain analysis uses the discrete cosine transform (DCT). The DCT involves only real-valued computation and loses no information relative to the principal component analysis result.
The two-dimensional image depth measurement method of the invention rests on the following: the monocular camera that captures the two-dimensional image is set to focus far away, so texture near the focus is sharp while the image near the camera itself appears blurred. Near/far depth information is obtained from a correlation comparison of texture and blur, and a distance value can be estimated by quantitatively calibrating that depth information.
With this two-dimensional image depth measurement method, three-dimensional depth information can be obtained from a single-frame planar two-dimensional image; the computational load of the whole depth measurement process is small; there are no requirements on the pixel color, brightness, static background, application scene, or the like of the acquired two-dimensional image, so applicability is strong and real-time performance good; the method does not depend on accumulated subjective user experience, giving it high reliability; furthermore, block division accelerates the feature-extraction computation.
The second aim of the invention is to provide a vehicle safety monitoring method that collects images with a low-cost monocular camera and obtains three-dimensional depth information from single-frame planar two-dimensional image processing, thereby estimating the positions of static or dynamic objects around the vehicle and sensing safe distances.
In order to achieve the above object, the solution of the present invention is:
a vehicle safety monitoring method specifically comprises the following steps:
1) Capturing the vehicle environment in real time with a monocular camera mounted on the vehicle, uniformly dividing each captured two-dimensional image into N blocks, setting one of them as a reference position block and the remaining N-1 blocks as blocks to be measured;
2) Performing coarse-to-fine analysis on the N blocks of the two-dimensional image to enhance the saliency of the image texture features and highlight blurred-edge sharpness information;
3) Performing principal component analysis on the extracted image texture features to reduce the dimensionality of the image data and obtain the edge-line feature quantity of a pixel set;
4) Performing spatial frequency domain analysis on the edge-line feature quantity of the pixel set to obtain spatial frequency domain result data for each block;
5) Connecting the spatial frequency domain result data of the blocks, serially and in parallel, to form the input interface of the iterative neural network;
6) Feeding the spatial frequency domain result data into a deep-learning neural network model and, within the deep-learning iterative neural network, applying an interactive iterative optimization algorithm over the convolution and pooling layers to obtain the distance information between stationary or moving objects within the vehicle monitoring range and the vehicle's own reference point.
Preferably, the coarse-to-fine analysis uses downsampling: 3-6 images including the original are obtained by downsampling at a set scaling factor, and texture features are then extracted from each of the resulting images. The aim of the coarse-to-fine analysis is to strengthen, across the different pixel maps, the prominence of the texture features and blurred-edge sharpness information to be extracted, and thus to improve the confidence of feature extraction.
Preferably, the spatial frequency domain analysis uses the discrete cosine transform (DCT). The DCT involves only real-valued computation and loses no information relative to the principal component analysis result.
In this vehicle safety monitoring method, when the monocular camera focuses far away, texture near the focus is sharp and the image near the camera appears blurred. Near/far depth information is obtained from a correlation comparison of texture and blur, and a distance value can be estimated by quantitatively calibrating that depth information. The distance between the host vehicle and other road traffic participants (cars, electric bicycles, pedestrians, and the like) can thus be estimated, realizing Blind Spot Detection (BSD); if fused with radar sensor data, other ADAS functions such as Automatic Emergency Braking (AEB) and Adaptive Cruise Control (ACC) can be reliably implemented.
The invention is described in further detail below with reference to the accompanying drawings and specific examples.
Drawings
FIG. 1 is a flow chart of a two-dimensional image depth measurement method according to the present embodiment;
FIG. 2 is a two-dimensional image taken by the monocular camera of the present embodiment;
FIG. 3 is the coarse-to-fine analysis layer structure of block A in the present embodiment;
FIG. 4 (a) is a schematic diagram of the eigenvalues and eigenvectors in the principal component analysis (texture feature extraction) of the present embodiment;
FIG. 4 (b) is a schematic diagram of the image edge lines extracted by the principal component analysis (texture feature extraction) of the present embodiment;
FIG. 5 shows the principal component analysis representation of an image of the present embodiment together with its texture density representation in the spatial frequency domain;
FIG. 6 maps the direct-view blind spots of the vehicle driver in the present embodiment;
FIG. 7 shows the field-of-view coverage of the vehicle driver's CMS in the present embodiment;
FIG. 8 is a flow chart of the vehicle safety monitoring method of the present embodiment;
FIG. 9 is a schematic diagram of the data connection layer and the deep-learning neural network of the present embodiment.
Detailed Description
The working principle of the two-dimensional image depth measurement method disclosed in this embodiment is as follows: the monocular camera acquiring the two-dimensional image is set to focus far away, so the acquired image has sharp texture near the focus while the region near the camera itself appears blurred. Near/far depth information is obtained from a correlation comparison of texture and blur, and quantitative calibration of this depth information yields an estimate of the distance value. The specific measurement method, shown in fig. 1, comprises the following steps:
s101, uniformly dividing the two-dimensional image into N blocks, setting one of the N blocks as a reference position block, and setting the rest N-1 blocks as blocks to be tested.
Specifically: the two-dimensional image captured by the monocular camera is first uniformly divided into N blocks; in this embodiment it is divided into 8 x 8 blocks, one block (block B in fig. 2) is set as the reference position block, and the remaining 63 blocks are blocks to be measured. By processing the relative distance information of the other 63 blocks with the reference position block B as reference, the distance information of those 63 blocks relative to block B is obtained. Dividing the two-dimensional image into a number of blocks (typically a power of 2) accelerates the feature-extraction computation.
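As a minimal illustration only, the block division might look like the following Python sketch; the 8 x 8 grid follows the embodiment, while the function name, the NumPy representation, and the reference-block index are assumptions.

```python
import numpy as np

def divide_into_blocks(image: np.ndarray, grid: int = 8):
    """Uniformly divide a two-dimensional (grayscale) image into grid x grid blocks."""
    h, w = image.shape[:2]
    bh, bw = h // grid, w // grid              # block height and width
    return [image[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            for r in range(grid) for c in range(grid)]

# blocks = divide_into_blocks(gray_frame)      # 64 blocks for the 8 x 8 grid
# ref = blocks[REF_INDEX]                      # block B, the reference position block
# to_measure = [b for i, b in enumerate(blocks) if i != REF_INDEX]
```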
S102, performing coarse-to-fine analysis on the N blocks of the two-dimensional image to enhance the saliency of the image texture features and highlight blurred-edge sharpness information.
The coarse-to-fine analysis is as follows: each block is downsampled, as shown in fig. 3. Assuming the scaling factor of the image pyramid is 2, the original block (layer 0) is successively downsampled at 1/2 scale to obtain 4 pictures (including the original), and texture features are then extracted from each of the resulting images. The aim is to strengthen, across the different pixel maps, the prominence of the texture features and blurred-edge sharpness information to be extracted and to improve the confidence of feature extraction. The number of downsampling layers per block depends on the application scene and is generally 3-6.
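The pyramid construction could be sketched as below; the patent specifies only the 1/2 scaling factor, so the Gaussian pre-filtering performed by cv2.pyrDown, like the names, is an assumption.

```python
import cv2

def build_pyramid(block, layers: int = 4):
    """Layer 0 is the original block; each further layer is downsampled at 1/2 scale."""
    pyramid = [block]
    for _ in range(layers - 1):
        pyramid.append(cv2.pyrDown(pyramid[-1]))   # Gaussian smoothing + halving
    return pyramid

# pyramid = build_pyramid(block_a)               # 4 pictures, as in fig. 3;
# texture features would then be extracted from every pyramid layer
```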
S103, performing principal component analysis on the extracted image texture features to reduce the dimensionality of the image data and obtain the edge-line feature quantity of the pixel set.
Specifically: after the coarse-to-fine analysis, principal component analysis (PCA) is performed on the texture features extracted from each block. In fig. 4 (a), the gray part is the blue-pixel distribution of block A; PCA yields the eigenvectors (the long-arrow segment orthogonal to the short-arrow segment at 90 degrees). The long-arrow vector is the principal component; viewed against the blue pixel lattice it is the main element, also called the edge line. Edge lines such as the straight segments in fig. 4 (b) are thus obtained. Because distance in the method of this embodiment is estimated with the monocular camera focused far away, distant texture is sharp and dense while near texture is sparse. After the PCA, the edge-line density of the texture is computed by spatial frequency domain analysis, on which basis the depth distance information can be calibrated and estimated. A sketch of the eigenvector computation follows.
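This hedged sketch treats the coordinates of a block's texture pixels as two-dimensional samples and takes the dominant eigenvector of their covariance as the edge-line direction of fig. 4 (a); the intensity threshold used to select texture pixels is an assumption not given in the patent.

```python
import numpy as np

def edge_line_direction(block: np.ndarray, thresh: float = 128.0):
    """Dominant eigenvector of the texture-pixel coordinate cloud (the 'long arrow')."""
    ys, xs = np.nonzero(block > thresh)        # coordinates of texture pixels
    pts = np.column_stack([xs, ys]).astype(float)
    pts -= pts.mean(axis=0)                    # center the pixel cloud
    cov = np.cov(pts, rowvar=False)            # 2 x 2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    return eigvals, eigvecs[:, -1]             # principal (edge-line) direction
```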
S104, analyzing the edge-line feature quantity of the pixel set in the spatial frequency domain to obtain the texture density of the image's spatial frequency domain, and calculating from the texture density the distance information of each block to be measured relative to the reference position block.
The edge-line feature quantity of the pixel set is processed by spatial frequency domain analysis, e.g. the discrete cosine transform (DCT), to obtain the texture density of the image's spatial frequency domain, from which the distance information of each block to be measured relative to the reference position block is calculated. Because the monocular camera focuses on the far image, far regions have a higher spatial-frequency texture density and near regions a lower one; comparing the two yields the distance information of each block to be measured relative to the reference position block. In fig. 5 the left side is the image's principal component analysis representation and the right side its texture density representation in the spatial frequency domain. The DCT involves only real-valued computation and loses no information relative to the principal component analysis result.
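A hedged sketch of the DCT-based texture density follows; the patent does not define a concrete density statistic, so the high-frequency energy fraction used here is illustrative.

```python
import numpy as np
from scipy.fftpack import dct

def texture_density(block: np.ndarray, cutoff: int = 8) -> float:
    """Fraction of DCT energy outside the low-frequency corner."""
    coeffs = dct(dct(block.astype(float), axis=0, norm='ortho'),
                 axis=1, norm='ortho')         # two-dimensional DCT-II, real-valued
    energy = coeffs ** 2
    low = energy[:cutoff, :cutoff].sum()       # DC and low-frequency coefficients
    return 1.0 - low / energy.sum()

# A focused (far) block scores higher than a blurred (near) one, so comparing
# texture_density(test_block) with texture_density(reference_block) orders the
# blocks by relative distance.
```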
With this two-dimensional image depth measurement method, three-dimensional depth information can be obtained from a single-frame planar two-dimensional image; the computational load of the whole depth measurement process is small; there are no requirements on the pixel color, brightness, static background, application scene, or the like of the acquired two-dimensional image, so applicability is strong and real-time performance good; the method does not depend on accumulated subjective user experience, giving it high reliability; furthermore, block division accelerates the feature-extraction computation.
The two-dimensional image depth measurement method can be applied to vehicle safety monitoring: several monocular cameras are arranged around the vehicle, and depth measurement on the two-dimensional images they acquire yields the distances between the vehicle and the moving or static objects around it, realizing functions such as blind spot detection, lane change assist, and overtaking assist. If fused with vehicle radar sensor data, other ADAS functions such as Automatic Emergency Braking (AEB) and Adaptive Cruise Control (ACC) can be reliably implemented.
Detection of obstacles in a vehicle's blind areas is described in detail below as an example.
Because of the vehicle's A, B, and C pillars, there are blind areas ➀ ➁ ➃ ➅ in the driver's rear direct-view region, as shown in fig. 6. The indirect view through the side mirror mostly covers areas ➂ ➄. When an electronic rear-view-mirror Camera Monitoring System (CMS) is employed, however, the camera covers the mirror's field of view shown in fig. 7.
As shown in fig. 8, the vehicle safety monitoring method disclosed in this embodiment specifically comprises the following steps:
s201, shooting in real time and dividing blocks.
First, a frame is captured in real time by a monocular camera mounted on a rear-view mirror of the vehicle, and the captured two-dimensional image is uniformly divided into N blocks; in this embodiment it is divided into 8 x 8 blocks, with one block (block B in fig. 7) set as the reference position block and the remaining 63 as blocks to be measured (block A in fig. 8 is one such block). Processing the relative distance information of the other 63 blocks with block B as reference yields the distance information of those 63 blocks relative to it. Dividing into a number of blocks (typically a power of 2, depending on the image's pixel size) speeds up the feature-extraction computation. The reference position block is determined by the application scene: for blind-area detection behind the side of the car with a monocular camera at the door mirror, the region near the wheel close to the tail of the car can serve as the reference position block within the mirror's coverage, as in block B of fig. 8.
S202, coarse-to-fine analysis.
After the two-dimensional image is divided, coarse-to-fine analysis is performed on its 64 blocks to enhance the saliency of the image texture features and highlight blurred-edge sharpness information. Taking block A as an example: as shown in fig. 3, block A is downsampled; assuming the image pyramid's scaling factor is 2, the original block A (layer 0) is successively downsampled at 1/2 scale to obtain 4 pictures (including the original), and texture features are then extracted from each. The aim of the coarse-to-fine analysis is to strengthen the prominence of the texture features and blurred-edge sharpness information to be extracted in the different pixel maps and to improve the confidence of feature extraction. Fig. 3 shows block A's coarse-to-fine pyramid layer structure; the number of downsampling layers per block depends on the application scene, and automotive scenes can use 3 or 4 layers, enough to extract the required image texture features while avoiding long analysis times and high cost.
S203, principal component analysis.
After the coarse-to-fine analysis of a block, principal component analysis (PCA) is performed on the extracted image texture features to reduce the dimensionality of the image data and obtain the edge-line feature quantity of the pixel set.
In fig. 4 (a), the gray part is the blue-pixel distribution of block A; principal component analysis yields the eigenvectors (the long-arrow segment orthogonal to the short-arrow segment at 90 degrees). The long-arrow vector is the principal component; viewed against the blue pixel lattice it is the main element, also called the edge line. Edge lines such as the straight segments in fig. 4 (b) are thus obtained. Because distance in the method of this embodiment is estimated with the monocular camera focused far away, distant texture is sharp and dense while near texture is sparse. After the principal component analysis, the edge-line density of the texture is computed by spatial frequency domain analysis, on which basis the depth distance information can be calibrated and estimated.
S204, spatial frequency domain analysis.
Spatial frequency domain analysis of the edge-line feature quantity obtained from the principal component analysis yields each block's spatial frequency domain result data, i.e. the texture density of the image's spatial frequency domain. The left side of fig. 5 is the image's principal component analysis representation and the right side its texture density representation in the spatial frequency domain. Since the monocular camera focuses on the far image, far regions have a higher spatial-frequency texture density and near regions a lower one, as shown on the right of fig. 5; comparing them allows the near/far distance of an obstacle in the car's blind area to be estimated.
The spatial frequency domain analysis employs the discrete cosine transform (DCT), which involves only real-valued computation and loses no information relative to the principal component analysis result.
S205, data connection.
To enter the iterative deep-learning neural network, after each block has been processed through the preceding steps, the spatial frequency domain result data of the blocks must be connected in parallel to form the input interface of the iterative neural network, as shown in fig. 9-(4).
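As a small sketch under assumed shapes, the parallel connection might simply stack the per-block result arrays into one input tensor; the 16 x 16 map size is hypothetical.

```python
import numpy as np

dct_maps = [np.zeros((16, 16)) for _ in range(64)]   # placeholder per-block results
network_input = np.stack(dct_maps, axis=0)[None]     # shape (1, 64, 16, 16)
```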
S206, deep learning neural network algorithm.
Based on the deep-learning neural network model, the spatial frequency domain result data are fed in, and in the deep-learning iterative neural network an interactive iterative optimization algorithm over the convolution and pooling layers yields the distance information between stationary or moving objects within the vehicle monitoring range and the vehicle's own reference point. The deep-learning neural network model is obtained by offline training on big data; its convolution and pooling layers are combined, with model parameters as shown in fig. 9. Thanks to the machine-learning capability of the convolutional neural network, the distance information between stationary or moving objects in the monitoring range and the reference point is obtained rapidly during online processing.
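The online model might resemble the following PyTorch sketch of alternating convolution and pooling layers; fig. 9 gives no concrete layer widths or input shapes, so all dimensions here are assumptions, and the regression head producing one distance per tested block is an illustrative choice.

```python
import torch
import torch.nn as nn

class BlockDepthNet(nn.Module):
    """Alternating convolution/pooling layers over stacked per-block DCT maps."""
    def __init__(self, n_blocks: int = 64, dct_size: int = 16):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(n_blocks, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # convolution/pooling pair 1
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # convolution/pooling pair 2
        )
        self.head = nn.Linear(64 * (dct_size // 4) ** 2, n_blocks - 1)

    def forward(self, x):                         # x: (batch, 64, 16, 16)
        return self.head(self.features(x).flatten(1))   # one distance per tested block

# distances = BlockDepthNet()(torch.randn(1, 64, 16, 16))   # shape (1, 63)
```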
The vehicle safety monitoring method suits driver-assistance systems built around the widely deployed vehicle-mounted monocular camera; unlike a radar reflection signal, the image signal is insensitive to the material composition of the obstacle to be detected.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

Claims (6)

1. A two-dimensional image depth measurement method, characterized by comprising the following steps:
1) Uniformly dividing a two-dimensional image into N blocks, setting one of them as a reference position block and the remaining N-1 blocks as blocks to be measured;
2) Performing coarse-to-fine analysis on the N blocks of the two-dimensional image to enhance the saliency of the image texture features and highlight blurred-edge sharpness information, wherein the coarse-to-fine analysis comprises successively downsampling the original block at a set scaling factor to obtain a plurality of images including the original image, and then extracting texture features from each of the obtained images;
3) Performing principal component analysis on the extracted image texture features to reduce the dimensionality of the image data and obtain the edge-line feature quantity of a pixel set;
4) Analyzing the edge-line feature quantity of the pixel set in the spatial frequency domain to obtain the texture density of the image's spatial frequency domain, and calculating the distance information of each block to be measured relative to the reference position block based on a deep-learning neural network model and the texture density.
2. The two-dimensional image depth measurement method according to claim 1, wherein the coarse-to-fine analysis specifically comprises successively downsampling the original block at a set scaling factor to obtain 3-6 images including the original image, and then extracting texture features from each of the obtained images.
3. The two-dimensional image depth measurement method according to claim 1, wherein the spatial frequency domain analysis employs a discrete cosine transform, DCT.
4. A vehicle safety monitoring method, characterized by comprising the following steps:
1) Capturing the vehicle environment in real time with a monocular camera mounted on the vehicle, uniformly dividing the captured two-dimensional image into N blocks, setting one of them as a reference position block and the remaining N-1 blocks as blocks to be measured;
2) Performing coarse-to-fine analysis on the N blocks of the two-dimensional image to enhance the saliency of the image texture features and highlight blurred-edge sharpness information, wherein the coarse-to-fine analysis comprises successively downsampling the original block at a set scaling factor to obtain a plurality of images including the original image, and then extracting texture features from each of the obtained images;
3) Performing principal component analysis on the extracted image texture features to reduce the dimensionality of the image data and obtain the edge-line feature quantity of a pixel set;
4) Performing spatial frequency domain analysis on the edge-line feature quantity of the pixel set to obtain spatial frequency domain result data for each block;
5) Connecting the spatial frequency domain result data of the blocks, serially and in parallel, to form the input interface of the iterative neural network;
6) Feeding the spatial frequency domain result data into a deep-learning neural network model and, within the deep-learning iterative neural network, applying an interactive iterative optimization algorithm over the convolution and pooling layers to obtain the distance information between stationary or moving objects within the vehicle monitoring range and the vehicle's own reference point.
5. The vehicle safety monitoring method according to claim 4, wherein the coarse-to-fine analysis specifically comprises successively downsampling the original block at a set scaling factor to obtain 3-6 images including the original image, and then extracting texture features from each of the obtained images.
6. The vehicle safety monitoring method according to claim 4, wherein the spatial frequency domain analysis employs a discrete cosine transform, DCT.
CN201911044348.XA 2019-10-30 2019-10-30 Two-dimensional image depth measurement method and application thereof in vehicle safety monitoring Active CN110992304B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911044348.XA CN110992304B (en) 2019-10-30 2019-10-30 Two-dimensional image depth measurement method and application thereof in vehicle safety monitoring

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911044348.XA CN110992304B (en) 2019-10-30 2019-10-30 Two-dimensional image depth measurement method and application thereof in vehicle safety monitoring

Publications (2)

Publication Number Publication Date
CN110992304A CN110992304A (en) 2020-04-10
CN110992304B true CN110992304B (en) 2023-07-07

Family

ID=70082660

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911044348.XA Active CN110992304B (en) 2019-10-30 2019-10-30 Two-dimensional image depth measurement method and application thereof in vehicle safety monitoring

Country Status (1)

Country Link
CN (1) CN110992304B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112668428A (en) * 2020-12-21 2021-04-16 北京百度网讯科技有限公司 Vehicle lane change detection method, roadside device, cloud control platform and program product

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008118305A1 (en) * 2007-03-26 2008-10-02 Trw Automotive U.S. Llc Forward looking sensor system
WO2018000752A1 (en) * 2016-06-27 2018-01-04 浙江工商大学 Monocular image depth estimation method based on multi-scale cnn and continuous crf
CN108759667A (en) * 2018-05-29 2018-11-06 福州大学 Front truck distance measuring method based on monocular vision and image segmentation under vehicle-mounted camera
CN110008848A (en) * 2019-03-13 2019-07-12 华南理工大学 A kind of travelable area recognizing method of the road based on binocular stereo vision
CN110189294A (en) * 2019-04-15 2019-08-30 杭州电子科技大学 RGB-D image significance detection method based on depth Analysis on confidence

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020134151A1 (en) * 2001-02-05 2002-09-26 Matsushita Electric Industrial Co., Ltd. Apparatus and method for measuring distances
KR101699920B1 (en) * 2009-10-07 2017-01-25 삼성전자주식회사 Apparatus and method for controling depth
US8284998B2 (en) * 2010-07-01 2012-10-09 Arcsoft Hangzhou Co., Ltd. Method of estimating depths from a single image displayed on display
CN102324104B (en) * 2011-06-03 2013-07-24 清华大学 Space structure modeling method and system based on single-image defocusing information
TWI493505B (en) * 2011-06-20 2015-07-21 Mstar Semiconductor Inc Image processing method and image processing apparatus thereof
JP2013185905A (en) * 2012-03-07 2013-09-19 Sony Corp Information processing apparatus, method, and program
CN105791803B (en) * 2016-03-16 2018-05-18 深圳创维-Rgb电子有限公司 A kind of display methods and system that two dimensional image is converted into multi-view image
CN108932734B (en) * 2018-05-23 2021-03-09 浙江商汤科技开发有限公司 Monocular image depth recovery method and device and computer equipment

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008118305A1 (en) * 2007-03-26 2008-10-02 Trw Automotive U.S. Llc Forward looking sensor system
WO2018000752A1 (en) * 2016-06-27 2018-01-04 浙江工商大学 Monocular image depth estimation method based on multi-scale cnn and continuous crf
CN108759667A (en) * 2018-05-29 2018-11-06 福州大学 Front truck distance measuring method based on monocular vision and image segmentation under vehicle-mounted camera
CN110008848A (en) * 2019-03-13 2019-07-12 华南理工大学 A kind of travelable area recognizing method of the road based on binocular stereo vision
CN110189294A (en) * 2019-04-15 2019-08-30 杭州电子科技大学 RGB-D image significance detection method based on depth Analysis on confidence

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Manifold learning of depth labels for a single image; Ye Hua; Tan Guanzheng; Infrared and Laser Engineering (06); full text *

Also Published As

Publication number Publication date
CN110992304A (en) 2020-04-10

Similar Documents

Publication Publication Date Title
CN109145798B (en) Driving scene target identification and travelable region segmentation integration method
CN111209825B (en) Method and device for dynamic target 3D detection
CN110879994A (en) Three-dimensional visual inspection detection method, system and device based on shape attention mechanism
US10970824B2 (en) Method and apparatus for removing turbid objects in an image
CN112731436B (en) Multi-mode data fusion travelable region detection method based on point cloud up-sampling
CN112215074A (en) Real-time target identification and detection tracking system and method based on unmanned aerial vehicle vision
CN108645375B (en) Rapid vehicle distance measurement optimization method for vehicle-mounted binocular system
CN111738071B (en) Inverse perspective transformation method based on motion change of monocular camera
Wu et al. Raindrop detection and removal using salient visual features
CN108830131B (en) Deep learning-based traffic target detection and ranging method
EP3293700A1 (en) 3d reconstruction for vehicle
CN111738033B (en) Vehicle driving information determination method and device based on plane segmentation and vehicle-mounted terminal
CN113408324A (en) Target detection method, device and system and advanced driving assistance system
CN112465735A (en) Pedestrian detection method, device and computer-readable storage medium
CN107220632B (en) Road surface image segmentation method based on normal characteristic
CN116978009A (en) Dynamic object filtering method based on 4D millimeter wave radar
Choi et al. Fog detection for de-fogging of road driving images
CN115909268A (en) Dynamic obstacle detection method and device
CN110992304B (en) Two-dimensional image depth measurement method and application thereof in vehicle safety monitoring
Wang et al. An improved hough transform method for detecting forward vehicle and lane in road
CN117036895B (en) Multi-task environment sensing method based on point cloud fusion of camera and laser radar
CN114048536A (en) Road structure prediction and target detection method based on multitask neural network
CN108268866B (en) Vehicle detection method and system
CN116968758A (en) Vehicle control method and device based on three-dimensional scene representation
CN114743001B (en) Semantic segmentation method, semantic segmentation device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant