CN114298965A - Binocular vision system-based interframe matching detection method and system and intelligent terminal - Google Patents


Info

Publication number
CN114298965A
Authority
CN
China
Prior art keywords
image
matching
coordinate system
world coordinate
area
Prior art date
Legal status
Pending
Application number
CN202111224174.2A
Other languages
Chinese (zh)
Inventor
裴姗姗
孙钊
肖志鹏
王欣亮
Current Assignee
Beijing Smarter Eye Technology Co Ltd
Original Assignee
Beijing Smarter Eye Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Smarter Eye Technology Co Ltd filed Critical Beijing Smarter Eye Technology Co Ltd
Priority to CN202111224174.2A
Publication of CN114298965A

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an interframe matching detection method based on a binocular vision system, together with a corresponding system and an intelligent terminal. The method comprises the following steps: respectively acquiring a top-view gray-scale image and a top-view segmentation image of two adjacent frames in the same road scene; detecting a candidate region to be matched in the previous frame through the top-view segmentation image; calculating the initial moving distance between the two frames and the initial estimated position in the next frame from the vehicle speed and the timestamps, and obtaining a search area in the next frame based on that initial estimated position; performing template matching between the candidate region to be matched and the search area, and calculating the matching position deviation; and correcting the initial moving distance with the matching position deviation to obtain the interframe matching result. The method improves the registration precision of data between adjacent frames in driver-assistance image processing, thereby providing more accurate image processing data for the driver-assistance system.

Description

Binocular vision system-based interframe matching detection method and system and intelligent terminal
Technical Field
The invention relates to the technical field of automatic driving assistance, in particular to a binocular vision system-based interframe matching detection method and system and an intelligent terminal.
Background
With the development of automatic driving technology, the requirements on the safety and comfort of driver-assistance vehicles are increasingly high. In driver assistance, the quality of image processing, especially the registration precision of data between adjacent frames, has a great influence on the control effect and is directly related to the safety and comfort of the vehicle.
Therefore, providing an interframe matching detection method based on a binocular vision system, so as to improve the registration precision of data between adjacent frames in driver-assistance image processing and provide more accurate image processing data for the driver-assistance system, is an urgent problem for those skilled in the art to solve.
Disclosure of Invention
Therefore, the embodiment of the invention provides a binocular vision system-based interframe matching detection method, a binocular vision system-based interframe matching detection system and an intelligent terminal, so that the registration accuracy of data between adjacent frames in image processing of auxiliary driving can be improved, and more accurate image processing data can be provided for an auxiliary driving system.
In order to achieve the above object, the embodiments of the present invention provide the following technical solutions:
the invention provides a binocular vision system-based interframe matching detection method, which is characterized by comprising the following steps:
respectively acquiring a top gray-scale image and a top segmentation image of two adjacent frames in the same road scene;
detecting a candidate region to be matched of a previous frame through the top view segmentation graph;
calculating an initial moving distance between two frames and an initial estimation position of a next frame through the vehicle speed and the timestamp, and acquiring a search area of the next frame based on the initial estimation position of the next frame;
performing template matching on the candidate area to be matched and the search area, and calculating the deviation of a matching position;
and correcting the initial moving distance by using the matching position deviation to obtain an interframe matching result.
Further, acquiring the top-view gray-scale images and top-view segmentation images of the two temporally adjacent frames of the same road scene specifically includes:
acquiring left and right views of the same road scene, and processing the left and right views to obtain a dense disparity map of the road scene;
converting image information of a target area into three-dimensional point cloud information under a world coordinate system based on the dense parallax map, and fitting a road surface model based on the three-dimensional point cloud information;
a target area is defined in the dense disparity map, an image of the target area is input into a trained semantic segmentation model, and two-dimensional image information after segmentation is obtained;
and converting the gray-scale image of the detection area to an XOZ projection plane to generate the top view gray-scale image and converting the segmentation image of the detection area to the XOZ projection plane to generate the top view segmentation image based on the homography transformation of the two-dimensional image information and the three-dimensional point cloud information.
Further, the converting the image information of the target area into three-dimensional point cloud information under a world coordinate system based on the dense disparity map specifically includes:
converting the image coordinate system of the dense parallax image into a world coordinate system based on a binocular stereo vision system imaging model and a pinhole imaging model;
taking a target area under a real world coordinate system as a reference, and intercepting the target area from the dense parallax image;
converting the image information in the target area into three-dimensional point cloud information according to the following formula:
Z = b·f / disp
X = (u − cx)·Z / f
Y = (v − cy)·Z / f

wherein:
b is the distance from the optical center of the left camera to the optical center of the right camera in the binocular stereoscopic vision imaging system;
f is the focal length of the cameras in the binocular stereoscopic vision imaging system;
cx and cy are the image coordinates of the camera principal point in the binocular stereoscopic vision imaging system;
(u, v) is an image coordinate point within the target region;
disp is the disparity value at the image point (u, v);
X is the lateral distance of the three-dimensional point from the camera in the world coordinate system;
Y is the longitudinal distance of the three-dimensional point from the camera in the world coordinate system;
Z is the depth distance of the three-dimensional point from the camera in the world coordinate system.
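As a concrete sketch, the disparity-to-point-cloud conversion described above (depth from disparity, lateral and longitudinal coordinates from the pinhole model) can be written in Python with NumPy. The calibration values used below (b, f, cx, cy) are hypothetical placeholders, not parameters from the invention:

```python
import numpy as np

def disparity_to_pointcloud(disp, b, f, cx, cy):
    """Convert a dense disparity map to 3D points (X, Y, Z) in the
    camera-centred world coordinate system, following Z = b*f/disp,
    X = (u - cx)*Z/f, Y = (v - cy)*Z/f. Illustrative sketch only;
    b, f, cx, cy would come from stereo calibration."""
    h, w = disp.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = disp > 0                       # zero disparity carries no depth
    Z = np.zeros_like(disp, dtype=float)
    Z[valid] = b * f / disp[valid]         # depth from disparity
    X = (u - cx) * Z / f                   # lateral distance
    Y = (v - cy) * Z / f                   # longitudinal (vertical) distance
    return np.stack([X, Y, Z], axis=-1)

# Toy example: 2x2 disparity map, baseline 0.12 m, focal length 1000 px,
# principal point (1, 1) — all hypothetical values.
pts = disparity_to_pointcloud(np.array([[10.0, 20.0], [0.0, 40.0]]),
                              b=0.12, f=1000.0, cx=1.0, cy=1.0)
print(pts[0, 0, 2])  # Z = 0.12 * 1000 / 10 = 12.0
```

Pixels with zero disparity are left at depth 0 here; a production pipeline would mask them out instead.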
Further, a road surface model equation fitted based on the three-dimensional point cloud information is as follows:
cosα·X + cosβ·Y + cosγ·Z = D

wherein:
cosα is the direction cosine of the angle between the road surface normal vector and the x coordinate axis of the world coordinate system;
cosβ is the direction cosine of the angle between the road surface normal vector and the y coordinate axis of the world coordinate system;
cosγ is the direction cosine of the angle between the road surface normal vector and the z coordinate axis of the world coordinate system;
D is the distance from the origin of the world coordinate system to the road surface plane.
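The road surface model above is a plane in normal (direction-cosine) form. A minimal least-squares fit of such a plane to the point cloud can be sketched as follows; the SVD-based fit and the upward orientation of the normal are illustrative choices, not the invention's prescribed fitting method (a robust variant such as RANSAC would typically be used to reject non-road points):

```python
import numpy as np

def fit_road_plane(pts):
    """Least-squares plane fit: returns the unit normal
    (cos_alpha, cos_beta, cos_gamma) and the distance D from the origin,
    so that n . p = D for points p on the plane. Sketch only."""
    centroid = pts.mean(axis=0)
    # The right singular vector of the centred points with the smallest
    # singular value is the plane normal.
    _, _, vt = np.linalg.svd(pts - centroid)
    n = vt[-1]
    if n[1] < 0:               # orient the normal toward +y (assumption)
        n = -n
    D = float(n @ centroid)
    return n, D

# Points lying on the plane Y = 1.5 (an idealised flat road)
pts = np.array([[0.0, 1.5, 5.0], [1.0, 1.5, 7.0],
                [-1.0, 1.5, 9.0], [0.5, 1.5, 12.0]])
n, D = fit_road_plane(pts)
print(np.round(n, 6), round(D, 6))  # normal ≈ (0, 1, 0), D ≈ 1.5
```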
Further, homography transformation based on the two-dimensional image information and the three-dimensional point cloud information is completed by using the following formula:
s · [x′, y′, 1]^T = H · [u, v, 1]^T

wherein:
(X, Z) are respectively the lateral distance and the depth of the point from the camera;
(u, v) is an image coordinate point within the detection area;
(x′, y′) is the corresponding projection plane coordinate point within the detection area, i.e. the image of (X, Z) on the XOZ projection plane;
s is the homogeneous scale factor;
H is the homography transformation matrix.
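Applying a homography of this form to image points can be sketched as follows; the matrix H used here is a hypothetical example for illustration, not a calibrated image-to-top-view transformation:

```python
import numpy as np

def warp_points(H, uv):
    """Apply homography H to image points (u, v), returning projection-plane
    points (x', y') after dividing out the homogeneous scale s."""
    uv1 = np.hstack([uv, np.ones((len(uv), 1))])  # to homogeneous coords
    xyw = uv1 @ H.T                               # s*[x', y', 1]^T = H*[u, v, 1]^T
    return xyw[:, :2] / xyw[:, 2:3]               # divide out the scale

# Hypothetical H: halve the image coordinates and shift x' by 10
H = np.array([[0.5, 0.0, 10.0],
              [0.0, 0.5,  0.0],
              [0.0, 0.0,  1.0]])
print(warp_points(H, np.array([[100.0, 40.0]])))  # [[60. 20.]]
```

Warping the whole gray-scale or segmentation image to the XOZ plane would apply the same transform per pixel (e.g. via an image-warping routine) rather than per point.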
Further, the detecting the candidate region to be matched of the previous frame through the top view segmentation map specifically includes:
setting the pixel value marked as a preset mark on the overlook segmentation image of the previous frame as 1, and setting other positions as 0, and generating a binary image according to the pixel value;
and detecting a connected domain on the binary image, and expanding a region with a preset size to the periphery as a candidate region by taking the central position of the lower boundary of the connected domain as a reference.
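The two steps above (binarisation on preset labels, then connected-domain detection and expansion around the centre of the lower boundary) can be sketched as follows. The label values (2, 3, 4) and the window half-size are illustrative assumptions, and a simple BFS stands in for whatever connected-component routine the implementation actually uses:

```python
import numpy as np
from collections import deque

def candidate_regions(seg, labels=(2, 3, 4), half=8):
    """Binarise the top-view segmentation map on the preset labels, find
    4-connected components, and return a fixed-size window (y0, y1, x0, x1)
    around the centre of each component's lower boundary. Sketch only."""
    binary = np.isin(seg, labels)
    seen = np.zeros_like(binary, dtype=bool)
    h, w = binary.shape
    regions = []
    for r in range(h):
        for c in range(w):
            if binary[r, c] and not seen[r, c]:
                comp, q = [], deque([(r, c)])
                seen[r, c] = True
                while q:                          # BFS over one component
                    y, x = q.popleft()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w \
                                and binary[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                ys = max(y for y, _ in comp)      # lower boundary row
                xs = [x for y, x in comp if y == ys]
                cx = sum(xs) // len(xs)           # centre of lower boundary
                regions.append((max(0, ys - half), min(h, ys + half + 1),
                                max(0, cx - half), min(w, cx + half + 1)))
    return regions

seg = np.zeros((20, 20), dtype=int)
seg[5:8, 5:8] = 3                                 # one labelled blob
print(candidate_regions(seg))                     # [(0, 16, 0, 15)]
```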
Further, the acquiring a search area of the next frame based on the initial estimated position of the next frame specifically includes:
obtaining an initial estimation position of a candidate region in a next frame based on the initial moving distance and the candidate region;
and taking the initial estimation position as a reference, and expanding a region with a preset size to the periphery as a search region.
The invention also provides a binocular vision system-based interframe matching detection system, which comprises:
the image acquisition unit is used for respectively acquiring a top gray-scale image and a top segmentation image of two adjacent frames in the same road scene;
a candidate region acquisition unit, configured to detect a candidate region to be matched of a previous frame through the top view segmentation map;
a search area acquisition unit for calculating an initial moving distance between two frames and an initial estimated position of a subsequent frame by a vehicle speed and a time stamp, and acquiring a search area of the subsequent frame based on the initial estimated position of the subsequent frame;
the position deviation acquiring unit is used for performing template matching on the candidate area to be matched and the search area and calculating the matching position deviation;
and the matching result output unit is used for correcting the initial moving distance by using the matching position deviation so as to obtain an interframe matching result.
The present invention also provides an intelligent terminal, including: the device comprises a data acquisition device, a processor and a memory;
the data acquisition device is used for acquiring data; the memory is to store one or more program instructions; the processor is configured to execute one or more program instructions to perform the method as described above.
The present invention also provides a computer readable storage medium having embodied therein one or more program instructions for executing the method as described above.
The invention provides a binocular vision system-based interframe matching detection method, which comprises the steps of respectively obtaining a top view gray scale image and a top view segmentation image of two adjacent frames in the same road scene, detecting a candidate area to be matched of the previous frame through the top view segmentation image, calculating an initial moving distance between the two frames and an initial estimation position of the next frame through a vehicle speed and a time stamp, and obtaining a search area of the next frame based on the initial estimation position of the next frame; and performing template matching on the candidate area to be matched and the search area, calculating a matching position deviation, and finally correcting the initial moving distance by using the matching position deviation to obtain an inter-frame matching result. Therefore, the method obtains the position deviation by matching the candidate area to be matched with the search area, so that the initial moving distance is corrected and compensated by using the position deviation, the registration precision between two adjacent frames is higher, the registration precision of data between the adjacent frames in the image processing of the auxiliary driving is improved, and more accurate image processing data is provided for the auxiliary driving system.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It should be apparent that the drawings in the following description are merely exemplary, and that other embodiments can be derived from the drawings provided by those of ordinary skill in the art without inventive effort.
The structures, proportions, and sizes shown in this specification are only intended to match the content disclosed herein, for understanding and reading by those skilled in the art, and are not intended to limit the conditions under which the invention can be implemented. Any structural modification, change in proportion, or adjustment of size that does not affect the effects achievable by the invention shall still fall within the scope covered by the technical content disclosed herein.
FIG. 1 is a flowchart of a binocular vision system based interframe matching detection method according to an embodiment of the present invention;
fig. 2 is a block diagram of a specific embodiment of the binocular vision system-based interframe matching detection system provided by the invention.
Detailed Description
The present invention is described below in terms of particular embodiments, and other advantages and effects of the invention will be readily apparent to those skilled in the art from this disclosure. It should be understood that the described embodiments are merely some, not all, of the embodiments of the invention, and are not intended to limit it to the particular forms disclosed. All other embodiments obtained by a person skilled in the art from the given embodiments without creative effort shall fall within the protection scope of the present invention.
The binocular vision system-based interframe matching detection method provided by the invention can improve the registration precision of data between adjacent frames in the image processing of auxiliary driving.
In a specific embodiment, as shown in fig. 1, the interframe matching detection method provided by the present invention includes the following steps:
s1: and respectively acquiring top gray-scale views and top segmentation views of two adjacent frames in the same road scene.
In step S1, the obtaining of the top gray-scale image and the top segmentation image of two adjacent frames in the same road scene includes:
s11: and acquiring left and right views of the same road scene, and processing the left and right views to obtain a dense disparity map of the road scene.
That is to say, the left and right views of the same road scene are acquired through the binocular stereo vision sensor, and the left and right views are processed to obtain the dense disparity map of the road scene.
In this embodiment, the coordinate system of the binocular stereo camera is taken as a reference system, the optical axis direction of the left eye camera is a Z-axis distance direction, the baseline direction of the binocular stereo camera is an X-axis transverse direction, and the vertical direction is a Y-axis longitudinal direction.
S12: and converting the image information of the target area into three-dimensional point cloud information under a world coordinate system based on the dense parallax map, and fitting a road surface model based on the three-dimensional point cloud information.
Specifically, a target area in an image is intercepted by taking the target area in a real world coordinate system as a reference, and the image area of the target area is converted into three-dimensional point cloud information pts in the world coordinate system; and the image area information completes the conversion from an image coordinate system to a world coordinate system according to the imaging model of the binocular stereoscopic vision system and the pinhole imaging model.
In order to improve the accuracy of the three-dimensional point cloud information and further ensure the accuracy of the subsequent calculation result, in step S12, the converting the image information of the target area into the three-dimensional point cloud information under the world coordinate system based on the dense disparity map specifically includes:
converting the image coordinate system of the dense parallax image into a world coordinate system based on a binocular stereo vision system imaging model and a pinhole imaging model;
taking a target area under a real world coordinate system as a reference, and intercepting the target area from the dense parallax image;
converting the image information in the target area into three-dimensional point cloud information according to the following formula:
Z = b·f / disp
X = (u − cx)·Z / f
Y = (v − cy)·Z / f

wherein:
b is the distance from the optical center of the left camera to the optical center of the right camera in the binocular stereoscopic vision imaging system;
f is the focal length of the cameras in the binocular stereoscopic vision imaging system;
cx and cy are the image coordinates of the camera principal point in the binocular stereoscopic vision imaging system;
(u, v) is an image coordinate point within the target region;
disp is the disparity value at the image point (u, v);
X is the lateral distance of the three-dimensional point from the camera in the world coordinate system;
Y is the longitudinal distance of the three-dimensional point from the camera in the world coordinate system;
Z is the depth distance of the three-dimensional point from the camera in the world coordinate system.
In step S12, the road surface model equation fitted based on the three-dimensional point cloud information is:
cosα·X + cosβ·Y + cosγ·Z = D

wherein:
cosα is the direction cosine of the angle between the road surface normal vector and the x coordinate axis of the world coordinate system;
cosβ is the direction cosine of the angle between the road surface normal vector and the y coordinate axis of the world coordinate system;
cosγ is the direction cosine of the angle between the road surface normal vector and the z coordinate axis of the world coordinate system;
D is the distance from the origin of the world coordinate system to the road surface plane.
S13: and defining a target area in the dense disparity map, inputting the image of the target area into a trained semantic segmentation model, and obtaining segmented two-dimensional image information.
In order to obtain an accurate semantic segmentation model, the terrain conditions possibly occurring in the road can be analyzed, the terrain common scene categories are classified, and then various scenes are shot to obtain a plurality of training images. And then, labeling the interested region for each training image to obtain a mask image. For example, the pixel value of the bridge joint is marked as 0, the pixel value of the common road surface is marked as 1, the pixel value of the road surface mark is marked as 2, the pixel value of the deceleration strip is marked as 3, the pixel value of the manhole cover is marked as 4, and the pixel value of the accumulated water is marked as 5, so that the mask image uniquely corresponding to each training image can be obtained.
S14: and converting the gray-scale image of the detection area to an XOZ projection plane to generate the top view gray-scale image and converting the segmentation image of the detection area to the XOZ projection plane to generate the top view segmentation image based on the homography transformation of the two-dimensional image information and the three-dimensional point cloud information.
In step S14, homography transformation based on the two-dimensional image information and the three-dimensional point cloud information is completed using the following formula:
s · [x′, y′, 1]^T = H · [u, v, 1]^T

wherein:
(X, Z) are respectively the lateral distance and the depth of the point from the camera;
(u, v) is an image coordinate point within the detection area;
(x′, y′) is the corresponding projection plane coordinate point within the detection area, i.e. the image of (X, Z) on the XOZ projection plane;
s is the homogeneous scale factor;
H is the homography transformation matrix.
The grayscale image of the detection area can be converted to the XOZ projection plane through homography to generate a top view grayscale image, and meanwhile, the segmentation image of the detection area is converted to the XOZ projection plane to generate a top view segmentation image.
S2: and detecting a candidate region to be matched of a previous frame through the top view segmentation graph.
It should be understood that, in this embodiment, the previous frame refers to the previous frame in two adjacent frames, and for convenience of description, the previous frame is set as the t-1 frame; the next frame refers to the next frame of the two adjacent frames, and for convenience of description, the next frame is set as the t frame.
In step S2, in order to improve the accuracy of the delimiting of the candidate region, the detecting the candidate region to be matched of the previous frame through the top view segmentation map specifically includes the following steps:
s21: setting the pixel value marked as a preset mark on the overlook segmentation image of the previous frame as 1, and setting other positions as 0, and generating a binary image according to the pixel value;
s22: and detecting a connected domain on the binary image, and expanding a region with a preset size to the periphery as a candidate region by taking the central position of the lower boundary of the connected domain as a reference.
For example, in a specific usage scenario, candidate regions with pixel values 2, 3, and 4 are detected on the top view of frame t−1. Specifically, on the top-view segmentation image of frame t−1, the pixels whose values are 2, 3, or 4 are set to 1 and all other positions are set to 0, generating a binary image; a connected domain is then detected on the binary image, and a fixed-size region is expanded around the centre position of the lower boundary of the connected domain as the candidate region.
S3: calculating an initial moving distance between two frames and an initial estimation position of a next frame through the vehicle speed and the timestamp, and acquiring a search area of the next frame based on the initial estimation position of the next frame;
the method for acquiring the search area of the next frame based on the initial estimation position of the next frame specifically comprises the following steps:
s31: obtaining an initial estimation position of a candidate region in a next frame based on the initial moving distance and the candidate region;
s32: and taking the initial estimation position as a reference, and expanding a region with a preset size to the periphery as a search region.
In the specific use scenario described above, the initial movement distance between two frames is calculated from the vehicle speed information, the t-frame timestamp information, and the t-1-frame timestamp information.
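That calculation is simple arithmetic: distance = speed × Δt, converted into top-view pixels. A sketch, assuming the speed is in metres per second and the top view has a fixed metric resolution (the 0.05 m/pixel value is a hypothetical choice):

```python
def initial_shift_px(speed_mps, t_prev, t_curr, metres_per_px):
    """Initial inter-frame movement from vehicle speed and the two frame
    timestamps, expressed in top-view pixels. Units are illustrative
    assumptions: speed in m/s, timestamps in seconds."""
    return speed_mps * (t_curr - t_prev) / metres_per_px

# 10 m/s, 50 ms between frames, top-view resolution 0.05 m/px
shift = initial_shift_px(10.0, 1.000, 1.050, 0.05)
print(shift)  # ≈ 10 px
```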
S4: and performing template matching on the candidate area to be matched and the search area, and calculating the matching position deviation.
Specifically, by the initial moving distance calculated in step S3 and the determined candidate region, the initial estimated position of the candidate region in the t-th frame may be obtained. That is, a fixed-size region is expanded around as a search region with the initial estimated position as a reference, and a matching position deviation is calculated by performing template matching using the candidate region and the search region.
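Template matching of the candidate region against the search area can be sketched with a brute-force sum-of-squared-differences search; this is illustrative only, since the patent does not specify the similarity measure (a normalised correlation score would be a common alternative). The offset returned relative to the initial estimate is the matching position deviation used in step S5:

```python
import numpy as np

def match_offset(template, search):
    """Slide the candidate-region template over the search region and
    return the (row, col) offset of the best match, scored by sum of
    squared differences. Minimal sketch."""
    th, tw = template.shape
    sh, sw = search.shape
    best, best_pos = None, (0, 0)
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            ssd = np.sum((search[y:y + th, x:x + tw] - template) ** 2)
            if best is None or ssd < best:
                best, best_pos = ssd, (y, x)
    return best_pos

rng = np.random.default_rng(0)
search = rng.random((16, 16))
template = search[5:9, 3:7].copy()        # known ground-truth offset (5, 3)
print(match_offset(template, search))     # (5, 3)
```

The deviation between this matched position and the initial estimated position is then used to correct and compensate the initial moving distance.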
S5: and correcting and compensating the initial moving distance by using the matching position deviation to obtain an interframe matching result.
In the above specific embodiment, the inter-frame matching detection method based on the binocular vision system provided by the present invention respectively obtains the top view grayscale image and the top view segmentation image of two adjacent frames in the same road scene, detects the candidate area to be matched of the previous frame through the top view segmentation image, calculates the initial moving distance between the two frames and the initial estimated position of the next frame through the vehicle speed and the timestamp, and obtains the search area of the next frame based on the initial estimated position of the next frame; and performing template matching on the candidate area to be matched and the search area, calculating a matching position deviation, and finally correcting the initial moving distance by using the matching position deviation to obtain an inter-frame matching result. Therefore, the method obtains the position deviation by matching the candidate area to be matched with the search area, so that the initial moving distance is corrected and compensated by using the position deviation, the registration precision between two adjacent frames is higher, the registration precision of data between the adjacent frames in the image processing of the auxiliary driving is improved, and more accurate image processing data is provided for the auxiliary driving system.
In addition to the above method, the present invention also provides a binocular vision system-based interframe matching detection system, which, in one embodiment, as shown in fig. 2, includes:
the image acquisition processing unit 100 is specifically configured to:
acquiring left and right views of the same road scene, and processing the left and right views to obtain a dense disparity map of the road scene;
converting image information of a target area into three-dimensional point cloud information under a world coordinate system based on the dense parallax map, and fitting a road surface model based on the three-dimensional point cloud information;
a target area is defined in the dense disparity map, an image of the target area is input into a trained semantic segmentation model, and two-dimensional image information after segmentation is obtained;
and converting the gray-scale image of the detection area to an XOZ projection plane to generate the top view gray-scale image and converting the segmentation image of the detection area to the XOZ projection plane to generate the top view segmentation image based on the homography transformation of the two-dimensional image information and the three-dimensional point cloud information.
The converting the image information of the target area into three-dimensional point cloud information under a world coordinate system based on the dense disparity map specifically comprises:
converting the image coordinate system of the dense parallax image into a world coordinate system based on a binocular stereo vision system imaging model and a pinhole imaging model;
taking a target area under a real world coordinate system as a reference, and intercepting the target area from the dense parallax image;
converting the image information in the target area into three-dimensional point cloud information according to the following formula:
Z = b·f / disp
X = (u − cx)·Z / f
Y = (v − cy)·Z / f

wherein:
b is the distance from the optical center of the left camera to the optical center of the right camera in the binocular stereoscopic vision imaging system;
f is the focal length of the cameras in the binocular stereoscopic vision imaging system;
cx and cy are the image coordinates of the camera principal point in the binocular stereoscopic vision imaging system;
(u, v) is an image coordinate point within the target region;
disp is the disparity value at the image point (u, v);
X is the lateral distance of the three-dimensional point from the camera in the world coordinate system;
Y is the longitudinal distance of the three-dimensional point from the camera in the world coordinate system;
Z is the depth distance of the three-dimensional point from the camera in the world coordinate system.
The road surface model equation based on the three-dimensional point cloud information fitting is as follows:
cosα·X + cosβ·Y + cosγ·Z = D

wherein:
cosα is the direction cosine of the angle between the road surface normal vector and the x coordinate axis of the world coordinate system;
cosβ is the direction cosine of the angle between the road surface normal vector and the y coordinate axis of the world coordinate system;
cosγ is the direction cosine of the angle between the road surface normal vector and the z coordinate axis of the world coordinate system;
D is the distance from the origin of the world coordinate system to the road surface plane.
Wherein homography transformation based on the two-dimensional image information and the three-dimensional point cloud information is completed by using the following formula:
(x′, z′, 1)ᵀ = H · (u, v, 1)ᵀ (up to a homogeneous scale factor)
wherein:
X and Z are, respectively, the lateral distance and the depth of a point from the camera;
u and v are the image coordinates of a point within the detection area;
x′ and z′ are the coordinates of the corresponding point on the XOZ projection plane within the detection area;
H is the homography transformation matrix.
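Applying a 3×3 homography of this kind to a batch of image points can be sketched as follows (illustrative Python; the matrix H itself would come from the calibration and road-model fitting described above):

```python
import numpy as np

def warp_to_top_view(points_uv, H):
    """Apply a 3x3 homography H to N x 2 image points (u, v) and return the
    corresponding XOZ projection-plane points after homogeneous normalization."""
    pts = np.hstack([points_uv, np.ones((len(points_uv), 1))])  # to homogeneous
    proj = pts @ H.T
    return proj[:, :2] / proj[:, 2:3]                           # divide out w
```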
The candidate area obtaining unit 200 is specifically configured to:
setting to 1 each pixel marked with the preset label on the top-view segmentation image of the previous frame, setting all other positions to 0, and generating a binary image from these pixel values;
and detecting connected components on the binary image and, taking the center position of each component's lower boundary as a reference, expanding outward by a preset size to form a candidate region.
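The binarization and connected-component step of this unit can be sketched as follows (illustrative Python; `target_label`, `half_w`, and `half_h` stand in for the unspecified preset mark and preset region size):

```python
import numpy as np

def candidate_regions(seg, target_label, half_w=8, half_h=8):
    """Binarize the top-view segmentation (1 where pixel == target_label),
    find 4-connected components, and expand a fixed-size window around the
    center of each component's lower boundary.  Boxes are (top, bottom, left, right)."""
    binary = (seg == target_label)
    h, w = binary.shape
    seen = np.zeros_like(binary, dtype=bool)
    regions = []
    for r in range(h):
        for c in range(w):
            if binary[r, c] and not seen[r, c]:
                stack, comp = [(r, c)], []
                seen[r, c] = True
                while stack:                      # flood fill one component
                    y, x = stack.pop()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                bottom = max(y for y, _ in comp)          # lower boundary row
                xs = [x for y, x in comp if y == bottom]
                cx = sum(xs) // len(xs)                   # center of lower boundary
                regions.append((max(0, bottom - half_h), min(h, bottom + half_h + 1),
                                max(0, cx - half_w), min(w, cx + half_w + 1)))
    return regions
```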
The search area obtaining unit 300 is specifically configured to:
obtaining an initial estimated position of the candidate region in the next frame based on the initial moving distance and the candidate region;
and taking the initial estimated position as a reference, expanding outward by a preset size to form the search region.
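A possible sketch of the search-region construction follows (the box layout, the pixel scale of the top view, and the padding value are assumptions; the patent leaves the preset size unspecified):

```python
def search_region(candidate, speed_mps, dt_s, metres_per_px, pad=4):
    """Shift the previous-frame candidate box by the initial ego-motion estimate
    (vehicle speed x frame interval, converted to top-view pixels) and pad it to
    form the next frame's search window.  Boxes are (top, bottom, left, right)."""
    shift_px = round(speed_mps * dt_s / metres_per_px)   # initial moving distance
    top, bottom, left, right = candidate
    # Forward motion shifts ground features along the depth axis of the top view;
    # padding on every side gives template matching room to find the true offset.
    return (top + shift_px - pad, bottom + shift_px + pad, left - pad, right + pad)
```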
In the above embodiment, the binocular-vision-based inter-frame matching detection system provided by the invention obtains the top-view grayscale image and the top-view segmentation image of two adjacent frames in the same road scene, detects the candidate area to be matched in the previous frame from the top-view segmentation image, calculates the initial moving distance between the two frames and the initial estimated position in the next frame from the vehicle speed and the timestamps, and obtains the search area of the next frame from that initial estimated position. It then performs template matching between the candidate area and the search area, calculates the matching position deviation, and finally corrects the initial moving distance with that deviation to obtain the inter-frame matching result. Because the position deviation is obtained by matching the candidate area against the search area, the initial moving distance can be corrected and compensated, so the registration accuracy between adjacent frames is higher; this improves the registration accuracy of inter-frame data in driver-assistance image processing and thereby provides more accurate image data to the driver-assistance system.
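The template-matching step at the heart of the method can be illustrated with an exhaustive sum-of-squared-differences search (an assumed similarity measure; the patent does not name one). The returned best position, minus the expected position, gives the matching position deviation used to correct the initial moving distance:

```python
import numpy as np

def match_offset(template, search):
    """Exhaustive SSD template matching: slide `template` over every position
    of `search` and return the (dy, dx) of the best match relative to the
    top-left corner of the search window."""
    th, tw = template.shape
    sh, sw = search.shape
    best, best_pos = None, (0, 0)
    for dy in range(sh - th + 1):
        for dx in range(sw - tw + 1):
            ssd = float(np.sum((search[dy:dy + th, dx:dx + tw] - template) ** 2))
            if best is None or ssd < best:
                best, best_pos = ssd, (dy, dx)
    return best_pos
```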
The present invention also provides an intelligent terminal, including: the device comprises a data acquisition device, a processor and a memory;
the data acquisition device is configured to acquire data; the memory is configured to store one or more program instructions; and the processor is configured to execute the one or more program instructions to perform the method described above.
In correspondence with the above embodiments, embodiments of the present invention also provide a computer storage medium containing one or more program instructions, wherein the one or more program instructions are used by the inter-frame matching detection system to execute the method described above.
In an embodiment of the invention, the processor may be an integrated circuit chip having signal processing capability. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The various methods, steps, and logic blocks disclosed in the embodiments of the present invention may be implemented or performed by such a processor. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules within a decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The processor reads the information in the storage medium and completes the steps of the method in combination with its hardware.
The storage medium may be a memory, for example, which may be volatile memory or nonvolatile memory, or which may include both volatile and nonvolatile memory.
The nonvolatile Memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash Memory.
The volatile memory may be a random access memory (RAM), which serves as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM).
The storage media described in connection with the embodiments of the invention are intended to comprise, without being limited to, these and any other suitable types of memory.
Those skilled in the art will appreciate that the functionality described in the present invention may be implemented in a combination of hardware and software in one or more of the examples described above. When software is applied, the corresponding functionality may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
The above embodiments are only for illustrating the embodiments of the present invention and are not to be construed as limiting the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made on the basis of the embodiments of the present invention shall be included in the scope of the present invention.

Claims (10)

1. A binocular vision system-based interframe matching detection method is characterized by comprising the following steps:
respectively acquiring a top-view grayscale image and a top-view segmentation image of two adjacent frames in the same road scene;
detecting a candidate region to be matched of a previous frame through the top view segmentation graph;
calculating an initial moving distance between two frames and an initial estimation position of a next frame through the vehicle speed and the timestamp, and acquiring a search area of the next frame based on the initial estimation position of the next frame;
performing template matching on the candidate area to be matched and the search area, and calculating the deviation of a matching position;
and correcting the initial moving distance by using the matching position deviation to obtain an interframe matching result.
2. The inter-frame matching detection method according to claim 1, wherein the obtaining of the top-view grayscale image and the top-view segmentation image of two adjacent frames in the same road scene respectively specifically includes:
acquiring left and right views of the same road scene, and processing the left and right views to obtain a dense disparity map of the road scene;
converting image information of a target area into three-dimensional point cloud information under a world coordinate system based on the dense parallax map, and fitting a road surface model based on the three-dimensional point cloud information;
a target area is defined in the dense disparity map, an image of the target area is input into a trained semantic segmentation model, and two-dimensional image information after segmentation is obtained;
and converting the gray-scale image of the detection area to an XOZ projection plane to generate the top view gray-scale image and converting the segmentation image of the detection area to the XOZ projection plane to generate the top view segmentation image based on the homography transformation of the two-dimensional image information and the three-dimensional point cloud information.
3. The method according to claim 2, wherein the converting image information of the target area into three-dimensional point cloud information in a world coordinate system based on the dense disparity map specifically comprises:
converting the image coordinate system of the dense parallax image into a world coordinate system based on a binocular stereo vision system imaging model and a pinhole imaging model;
taking a target area under a real world coordinate system as a reference, and intercepting the target area from the dense parallax image;
converting the image information in the target area into three-dimensional point cloud information according to the following formula:
Z = b · f / disp
X = (u − cx) · Z / f
Y = (v − cy) · Z / f
b is the distance from the optical center of a left camera to the optical center of a right camera in the binocular stereoscopic vision imaging system;
f is the focal length of a camera in the binocular stereoscopic vision imaging system;
cx and cy are image coordinates of a camera principal point in the binocular stereoscopic vision imaging system;
u and v are the image coordinates of a point within the target region;
disp is the disparity value at the image point (u, v);
X is the lateral distance of a three-dimensional point from the camera in the world coordinate system;
Y is the longitudinal distance of the three-dimensional point from the camera in the world coordinate system;
and Z is the depth distance of the three-dimensional point from the camera in the world coordinate system.
4. The interframe matching detection method of claim 2, wherein a road surface model equation fitted based on the three-dimensional point cloud information is:
cos α · X + cos β · Y + cos γ · Z = D
wherein:
cos α is the direction cosine of the angle between the road surface normal vector and the x axis of the world coordinate system;
cos β is the direction cosine of the angle between the road surface normal vector and the y axis of the world coordinate system;
cos γ is the direction cosine of the angle between the road surface normal vector and the z axis of the world coordinate system;
and D is the distance from the origin of the world coordinate system to the plane of the road surface.
5. The interframe matching detection method of claim 2, wherein homography transformation based on the two-dimensional image information and three-dimensional point cloud information is accomplished using the following formula:
(x′, z′, 1)ᵀ = H · (u, v, 1)ᵀ (up to a homogeneous scale factor)
wherein:
X and Z are, respectively, the lateral distance and the depth of a point from the camera;
u and v are the image coordinates of a point within the detection area;
x′ and z′ are the coordinates of the corresponding point on the XOZ projection plane within the detection area;
H is the homography transformation matrix.
6. The method according to claim 1, wherein the detecting the candidate region to be matched of the previous frame through the top view segmentation map specifically includes:
setting to 1 each pixel marked with the preset label on the top-view segmentation image of the previous frame, setting all other positions to 0, and generating a binary image from these pixel values;
and detecting connected components on the binary image and, taking the center position of each component's lower boundary as a reference, expanding outward by a preset size to form a candidate region.
7. The method according to claim 6, wherein the obtaining a search area of a subsequent frame based on the initial estimated position of the subsequent frame specifically includes:
obtaining an initial estimated position of the candidate region in the next frame based on the initial moving distance and the candidate region;
and taking the initial estimated position as a reference, expanding outward by a preset size to form the search region.
8. An interframe matching detection system based on a binocular vision system, the system comprising:
the image acquisition unit is used for respectively acquiring a top-view grayscale image and a top-view segmentation image of two adjacent frames in the same road scene;
a candidate region acquisition unit, configured to detect a candidate region to be matched of a previous frame through the top view segmentation map;
a search area acquisition unit for calculating an initial moving distance between two frames and an initial estimated position of a subsequent frame by a vehicle speed and a time stamp, and acquiring a search area of the subsequent frame based on the initial estimated position of the subsequent frame;
the position deviation acquiring unit is used for performing template matching on the candidate area to be matched and the search area and calculating the matching position deviation;
and the matching result output unit is used for correcting the initial moving distance by using the matching position deviation so as to obtain an interframe matching result.
9. An intelligent terminal, characterized in that, intelligent terminal includes: the device comprises a data acquisition device, a processor and a memory;
the data acquisition device is used for acquiring data; the memory is to store one or more program instructions; the processor, configured to execute one or more program instructions to perform the method of any of claims 1-7.
10. A computer-readable storage medium having one or more program instructions embodied therein for performing the method of any of claims 1-7.
CN202111224174.2A 2021-10-21 2021-10-21 Binocular vision system-based interframe matching detection method and system and intelligent terminal Pending CN114298965A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111224174.2A CN114298965A (en) 2021-10-21 2021-10-21 Binocular vision system-based interframe matching detection method and system and intelligent terminal


Publications (1)

Publication Number Publication Date
CN114298965A true CN114298965A (en) 2022-04-08

Family

ID=80964570


Country Status (1)

Country Link
CN (1) CN114298965A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination