WO2016045425A1 - Two-viewpoint stereoscopic image synthesizing method and system - Google Patents

Two-viewpoint stereoscopic image synthesizing method and system

Info

Publication number
WO2016045425A1
WO2016045425A1 · PCT/CN2015/082557
Authority
WO
WIPO (PCT)
Prior art keywords
view
stereoscopic image
image
data
matching
Prior art date
Application number
PCT/CN2015/082557
Other languages
French (fr)
Chinese (zh)
Inventor
侯春萍
刘佳杰
李飞
胡文迪
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2016045425A1 publication Critical patent/WO2016045425A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof

Definitions

  • This document relates to stereoscopic imaging technology, and in particular to a two-view stereoscopic image synthesis method and system.
  • The most effective channel through which humans acquire information is vision. Because the human eyes see real three-dimensional scenes in nature, reproducing a true three-dimensional scene on a screen has long been a human goal.
  • Stereoscopic imaging technology developed gradually from this need and can be used in scientific research, military, education, industry, medicine, and many other fields. With stereoscopic imaging technology, stereoscopic color images can be recorded, transmitted, and displayed, giving viewers an immersive experience.
  • Spatially multiplexed stereoscopic imaging displays a stereoscopic image pair on the screen simultaneously and, by special means, lets each eye view a different image at the same time, producing a stereoscopic impression.
  • Optically, a stereoscopic image can be viewed without glasses by means of various optical elements; this is called auto-stereoscopic display.
  • Common optical elements include the lenticular plate, the parallax barrier, and the integral-photography (IP) lens array.
  • For dual-view auto-stereoscopic display, a lenticular sheet or a parallax barrier is generally added in front of a CRT (Cathode Ray Tube) or flat-panel display to control the emission direction of each pixel's light, so that the left-view image enters only the left eye and the right-view image enters only the right eye; binocular parallax then produces stereoscopic vision.
  • The lenticular sheet is composed of a row of vertically arranged semi-cylindrical lenses; each lens refracts light so as to direct two different planar images into the viewing zones of the two eyes, focusing the left-eye image on the viewer's left eye and the right-eye image on the right eye.
  • The parallax barrier is a vertical plate mounted in front of the display. For each eye it blocks part of the screen, so that light from all left-view pixels enters the left-eye viewing zone and light from all right-view pixels enters the right-eye viewing zone.
  • The parallax barrier acts much like a lenticular sheet, except that it blocks part of the pixels with an opaque mask instead of redirecting light by refraction.
  • FIG. 1 is a schematic diagram of a related-art auto-stereoscopic display with a slit grating mounted in front of an LCD (Liquid Crystal Display). As shown in FIG. 1, the slit grating is placed at a suitable position in front of the liquid crystal screen and blocks part of the line of sight of each eye.
  • Viewing the screen through the slit grating, each eye can see only one pixel column through each slit: the right eye sees only the Rn pixel columns, and the left eye only the Ln pixel columns. If the Rn and Ln columns display the right-eye and left-eye images respectively, the viewer's brain fuses them into a stereoscopic image.
  • The stereoscopic video is ultimately shown on the mobile screen; the viewer's left and right eyes each see one of two views with a certain parallax, from which the brain recovers the three-dimensional information. Given the small display and battery constraints of a mobile terminal, two viewpoints are sufficient for viewing.
  • This document provides a two-view stereoscopic image synthesis method and system that overcome abrupt image changes, enhance matching stability, and improve noise resistance, thereby displaying high-quality stereoscopic images.
  • A two-view stereoscopic image synthesis method includes: acquiring a dual-viewpoint image through dual Mobile Industry Processor Interface (MIPI) interfaces; matching and synthesizing the acquired dual-viewpoint image to generate two-view stereoscopic image data; and realizing naked-eye display of the two-view stereoscopic image through a slit-grating front-mounted LED stereoscopic display.
  • Optionally, after the dual-viewpoint image is acquired through the dual MIPI interfaces, the method further includes performing video-driver processing on the acquired dual-viewpoint image.
  • Optionally, matching and synthesizing the acquired dual-viewpoint image includes: using two preview threads executed in parallel to extract the data of each frame of the left and right views from the acquired dual-viewpoint image, and performing the matching and synthesis.
  • Optionally, the matching and synthesis includes: registering a dedicated buffer for the camera preview display in memory; acquiring single-frame data of the left and right video streams from the acquired dual-viewpoint image and storing each in a dedicated buffer; synchronizing the obtained left and right single-frame data using the frame-data timestamps; converting the acquired YUV-format data to RGB format; applying image smoothing and size conversion to the format-converted left and right views; extracting and matching the features of the processed left and right views with the Scale-Invariant Feature Transform (SIFT) feature matching algorithm; and arranging the left- and right-view pixel points in a preset pixel arrangement to generate one frame of stereoscopic image data displayable under the grating.
  • Optionally, the image smoothing is implemented with a Gaussian low-pass filter.
  • Optionally, after the SIFT-based feature extraction and matching, the method further includes removing mismatched feature points from the matched left- and right-view features with the Random Sample Consensus (RANSAC) algorithm.
  • Optionally, the pixel arrangement is as follows, in units of vertical columns: the first column of the composite image takes the first pixel column of the left view, and the second column takes the first pixel column of the right view; the third column takes the second pixel column of the left view, and the fourth column takes the second pixel column of the right view; and so on, until all left- and right-view pixel columns have been placed into the composite image.
  • Optionally, before the stereoscopic image data displayable under the grating is generated, the method further includes verifying whether the left and right views are occluded and, if so, repairing the occluded region.
  • Optionally, repairing the occluded region includes correcting the region occluded by a foreign object in one view using the gray values of the corresponding pixels in the other view.
  • Optionally, the method further includes detecting noise in the occlusion-repaired left and right views with a median filter, marking the noise points, and repairing the points determined to be noise.
  • Optionally, the noise repair includes correcting the gray value of each pixel within a confirmed noise point using the gray values of the corresponding pixels of the other viewpoint.
  • A two-view stereoscopic image synthesis system includes an acquisition unit, a processing unit, and a display unit, wherein:
  • the acquisition unit is configured to acquire a dual-viewpoint image through dual MIPI interfaces;
  • the processing unit is configured to match and synthesize the acquired dual-viewpoint image, generate two-view stereoscopic image data, and output it to the display unit;
  • the display unit is configured to realize naked-eye display of the two-view stereoscopic image through a slit-grating front-mounted LED stereoscopic display.
  • Optionally, the acquisition unit includes two rear cameras with MIPI interfaces, configured to capture the left and right views respectively; the two cameras are mounted on different I²C buses, interact with the memory and the central processor over independent data lines, and use the timestamp of each image frame for frame synchronization.
  • Optionally, the camera uses the OmniVision OV5640 chip.
  • Optionally, the acquisition unit further includes a camera driver module configured to drive the two acquired views and output them to the processing unit.
  • Optionally, the camera driver module is implemented with the V4L2 video driver framework.
  • Optionally, the processing unit includes a preprocessing module, an extraction module, a matching module, and a synthesis module, wherein:
  • the preprocessing module is configured to register a dedicated buffer for the camera preview display in memory; acquire single-frame data of the left and right video streams and store each in a dedicated buffer; synchronize them in software using the frame-data timestamp function; convert the acquired YUV-format data to RGB format; and apply image smoothing and size conversion to the format-converted left and right views before outputting them to the extraction module;
  • the extraction module is configured to extract features of the preprocessed left and right views with the SIFT feature matching algorithm and generate 32-dimensional SIFT feature descriptors;
  • the matching module is configured to match the extracted left- and right-view features with the SIFT feature matching algorithm, take the point with the smallest Euclidean distance in the right view as the match of the current left-view SIFT keypoint, and record the coordinates of each matched point pair;
  • the synthesis module is configured to arrange the left- and right-view pixels in the preset pixel arrangement and generate one frame of stereoscopic image data displayable under the grating.
  • Optionally, the processing unit further includes a culling module configured to remove mismatched feature points with the RANSAC algorithm and estimate the left-right view pixel-coordinate mapping model.
  • Optionally, the processing unit further includes an occlusion repair module configured to, when the estimated left-right view pixel-coordinate mapping model indicates an occluded region, correct the region occluded by a foreign object in one view using the gray values of the corresponding pixels in the other view, thereby repairing the occluded region.
  • Optionally, the processing unit further includes a noise repair module configured to detect noise in the left and right views with a median filter, mark the noise points, and output them to the synthesis module.
  • A computer-readable storage medium stores computer-executable instructions for performing any of the methods above.
  • Compared with the related art, embodiments of the present invention acquire a dual-viewpoint image through dual MIPI interfaces, match and synthesize the acquired views to generate two-view stereoscopic image data, and realize naked-eye display of the two-view stereoscopic image on a slit-grating front-mounted LED stereoscopic display.
  • By matching SIFT features extracted from the left and right views, the embodiments cope effectively with abrupt image changes, such as sudden focus shifts, that are frequently encountered when shooting on mobile terminals, enhancing matching stability and noise resistance.
  • This solves the related-art problems that stereoscopic source material is of generally low quality and synthesis is inefficient: even when a camera of the mobile terminal is occluded or the captured stereoscopic material is noisy, high-quality stereoscopic images can be acquired, synthesized, and displayed quickly.
  • FIG. 1 is a schematic structural view of a slit front-mounted LCD autostereoscopic display according to related art
  • FIG. 2 is a flowchart of a method for synthesizing two-view stereoscopic images according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of arrangement of pixels in left and right views according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a two-view stereoscopic image synthesizing system according to an embodiment of the present invention.
  • Stereoscopic video exploits the binocular parallax of the human eye: the two eyes independently receive left and right images of the same scene captured from specific imaging points, yielding a stereoscopic effect. The amount of data to be processed is therefore a multiple of that of conventional single-channel video.
  • To overcome the high power consumption and electromagnetic interference (EMI) noise caused by parallel data buses at high transmission rates, portable devices need a bus designed for greater bandwidth and transmission rate while presenting the same level of image and multimedia quality. Portable multimedia devices such as mobile phones, portable media players (PMPs), portable DVD players, and digital still cameras (DSCs) all face this problem.
  • Abbreviations: MVI, a video format for camera output; MPL (Mobile Pixel Link), from National Semiconductor; Whisperbus, a physical-layer interface; CDMA, Code Division Multiple Access; MIPI, Mobile Industry Processor Interface; CSI, Camera Serial Interface; DSI, Display Serial Interface; D-PHY, the digital physical layer.
  • The MIPI specification defines the physical connection, protocol processing, and upper-layer application in detail. D-PHY adopts 1.2 V source-synchronous scalable low-voltage signaling with 200 mV differential signal pairs; it supports up to four lanes, each at a rate of up to 1 Gbit/s.
  • For portable-device applications, the DSI specification defines support for resolutions up to Extended Graphics Array (XGA). The CSI interface specification, in addition to raw image data such as the RGB primary color space, the Bayer image format, and the YUV color space, supports user-defined data types and compressed data formats.
  • The DSI interface is applied in essentially the same way as the CSI. MIPI has split the application interfaces into sub-specifications for different application requirements, so in mobile handheld electronic products the MIPI interface specification can support essentially any required data-transmission speed and resolution. The MIPI standard itself, however, does not address stereoscopic acquisition.
  • In some terminal CPU chips, two MIPI interfaces are reserved for image acquisition, for example Texas Instruments' OMAP 4 and OMAP 5 series chips. In the related art, however, the dual MIPI interfaces serve front/rear dual-camera acquisition, which cannot meet the high real-time and synchronization requirements of stereoscopic acquisition.
  • FIG. 2 is a flowchart of a method for synthesizing a two-view stereoscopic image according to an embodiment of the present invention. As shown in FIG. 2, the method includes:
  • Step 200: Acquire a dual-viewpoint image through the dual MIPI interfaces.
  • This step is implemented by synchronized shooting from two capture devices (such as cameras) at different positions and angles, obtaining a dual-viewpoint image containing the image data of the left and right views. In other words, the dual MIPI rear cameras satisfy the high-bandwidth, high-synchronization, low-power, and low-noise requirements of stereoscopic acquisition.
  • The OMAP4-series processor chip provides two ARM Cortex-A9 cores clocked at 1.2 GHz, one TMS320C64x+ DSP, two ARM Cortex-M3 cores, 1 GB of DRAM, two MIPI serial camera data interfaces (CSI-2_1 and CSI-2_2), and a direct memory access (DMA) controller, among other modules. It meets the processing requirements for acquiring, synthesizing, and displaying stereoscopic images quickly and with high quality.
  • the application of the OMAP4 series processor chip belongs to the conventional technical means of those skilled in the art, and details are not described herein again.
  • In this embodiment, the two cameras have nearly identical optical, geometric, and imaging characteristics and are aligned in the horizontal direction; the distance between the cameras may be 35 mm.
  • The two cameras are mounted on different I²C buses, interact with the memory and the central processor over independent data lines, and use the timestamp of each image frame for frame synchronization, as sketched below.
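The patent does not spell out the pairing logic; as a rough illustration, timestamp-based frame synchronization can be sketched as follows, where the `Frame` struct, the queues, and the skew tolerance are hypothetical, not taken from the patent:

```cpp
// Rough sketch of timestamp-based frame pairing (all names are illustrative).
#include <cstdint>
#include <cstdlib>
#include <deque>
#include <optional>
#include <utility>
#include <vector>

struct Frame {
    int64_t timestamp_us = 0;     // driver-provided capture timestamp
    std::vector<uint8_t> data;    // raw frame payload
};

constexpr int64_t kMaxSkewUs = 8000;   // assumed tolerance (~ half a 60 fps frame)

// Return the oldest left/right pair whose timestamps agree within tolerance.
std::optional<std::pair<Frame, Frame>> PairFrames(std::deque<Frame>& left,
                                                  std::deque<Frame>& right) {
    while (!left.empty() && !right.empty()) {
        int64_t dt = left.front().timestamp_us - right.front().timestamp_us;
        if (std::llabs(dt) <= kMaxSkewUs) {
            auto pair = std::make_pair(std::move(left.front()),
                                       std::move(right.front()));
            left.pop_front();
            right.pop_front();
            return pair;
        }
        // Drop whichever frame is older: it can no longer find a partner.
        if (dt > 0) right.pop_front(); else left.pop_front();
    }
    return std::nullopt;
}
```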
  • The DMA controller and the central processing unit (CPU) can access the memory alternately. The DMA controller transfers image data directly from the camera to memory, with no CPU involvement during the transfer: the hardware opens a path for moving data directly between memory and the input/output device, which greatly improves CPU efficiency.
  • The camera chip in this embodiment may be OmniVision's OV5640, whose interface is an extension of the MIPI interface.
  • The camera control interface (CCI) defined in the MIPI specification is compatible with the I²C standard and has two ports, the serial clock line (SCL) and the serial data line (SDA); SCL is a unidirectional control clock signal line, and SDA is a bidirectional control line.
  • The RESET, SHUTTER, and STROBE lines, the clock bus, and the power and ground lines of the camera expansion interface can be shared by the two cameras, so the two cameras can be wired directly in parallel without extra processing. Connecting the OV5640 camera chip to the expansion interface of the OMAP4460 central processor is a conventional technique for those skilled in the art and is not described further here.
  • V4L2 is a two-layer driver framework. The upper layer is the Video Device module, a character device registered with the device function table; the lower layer is the V4L2 driver. The V4L2 driver and its device nodes are registered through the registration function; once a device node is opened, operations on the device file are dispatched to the V4L2-compliant interfaces defined in the v4l2_ioctl_ops structure, and the V4L2 driver module in turn calls the device driver.
  • The driver framework and flow used by the two cameras are identical.
  • The main functions of the V4L2 driver framework are timing management of the video data and memory management of the data buffers; it controls the hardware and acquires image data via the I²C bus, the Peripheral Component Interconnect (PCI) interface, and the like.
  • The initialization function called when the camera driver module starts completes a series of tasks, including powering the hardware, setting the MIPI clock of the bus controller, initializing the I²C bus port and the MIPI data port, and detecting and binding the video device.
  • Two video frame buffer queues located in memory are managed inside the camera driver module, one as the input queue and the other as the output queue. Once a buffer in the input queue has been filled with image data, it is automatically moved to the output queue, where it waits for the video driver to issue the dequeue command (VIDIOC_DQBUF) and pass the data to the upper layer; after processing, the enqueue command (VIDIOC_QBUF) puts the buffer back into the input queue.
  • The video-stream acquisition procedure generally includes: opening the video device file; querying the device's capability list and supported video formats; setting the capture format and frame size; requesting several frame buffers in memory as the input and output queues (queueing multiple buffers improves capture efficiency); obtaining each buffer's information and mapping it into the upper-layer system space; starting video-stream capture; dequeuing a sampled frame from the buffer at the head of the output queue and passing its data to the upper layer for processing; putting the just-processed frame buffer back at the tail of the input queue for cyclic acquisition; stopping the video stream; and closing the video device. A minimal sketch follows below.
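As a concrete illustration of this acquisition flow, here is a minimal single-camera V4L2 capture sketch in C++; the device path, resolution, buffer count, and frame count are assumptions, and error handling is mostly omitted:

```cpp
// Minimal V4L2 capture loop matching the steps above (illustrative values).
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/videodev2.h>
#include <cstdio>

int main() {
    int fd = open("/dev/video0", O_RDWR);                 // open the device file
    if (fd < 0) { perror("open"); return 1; }

    v4l2_format fmt{};                                    // set capture format
    fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    fmt.fmt.pix.width = 640;
    fmt.fmt.pix.height = 480;
    fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;
    ioctl(fd, VIDIOC_S_FMT, &fmt);

    v4l2_requestbuffers req{};                            // request frame buffers
    req.count = 4;
    req.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    req.memory = V4L2_MEMORY_MMAP;
    ioctl(fd, VIDIOC_REQBUFS, &req);

    void* bufs[4];
    for (unsigned i = 0; i < req.count; ++i) {            // map and enqueue buffers
        v4l2_buffer buf{};
        buf.type = req.type; buf.memory = req.memory; buf.index = i;
        ioctl(fd, VIDIOC_QUERYBUF, &buf);
        bufs[i] = mmap(nullptr, buf.length, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, buf.m.offset);
        ioctl(fd, VIDIOC_QBUF, &buf);                     // put into input queue
    }

    int type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    ioctl(fd, VIDIOC_STREAMON, &type);                    // start the stream

    for (int frame = 0; frame < 100; ++frame) {
        v4l2_buffer buf{};
        buf.type = req.type; buf.memory = req.memory;
        ioctl(fd, VIDIOC_DQBUF, &buf);                    // take a filled buffer
        // ... hand bufs[buf.index] (buf.bytesused bytes) to the upper layer ...
        ioctl(fd, VIDIOC_QBUF, &buf);                     // return it for reuse
    }

    ioctl(fd, VIDIOC_STREAMOFF, &type);                   // stop and clean up
    close(fd);
    return 0;
}
```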
  • Step 200 is based on full customization of the Android camera subsystem, with a Texas Instruments OMAP4-series central processor as the hardware platform core; real-time synchronized image acquisition from the two cameras is realized over dual MIPI serial data links. This approach has the advantages of little wiring, high speed, large data throughput, low power consumption, strong interference resistance, and strong adaptability.
  • Step 201: Match and synthesize the acquired dual-viewpoint image to generate two-view stereoscopic image data.
  • In step 201, two preview threads (PreviewThread) executed in parallel use the function interfaces provided by the V4L2 framework (if video-driver processing is performed) to extract, from the system kernel driver layer, the data of each frame of the left and right views of the dual-viewpoint image acquired in step 200; the subsequent matching and synthesis generally includes:
  • registering a dedicated buffer for the camera preview display in memory; acquiring single-frame data of the left and right video streams from the acquired dual-viewpoint image and storing each in a dedicated buffer; synchronizing the streams, converting the data format, and preprocessing, extracting, and matching features as detailed below; and
  • arranging the left- and right-view pixel points in a preset pixel arrangement to generate one frame of stereoscopic image data displayable under the grating.
  • The processing may further include occlusion repair, denoising, and the like; finally, the synthesized image data is passed to the Android display subsystem (Surface) system library and shown on the application-layer interface.
  • The Android Camera architecture follows the layered structure of the Android system itself; the corresponding layers are the application layer (Camera App), the application framework layer (Camera Service), the hardware abstraction layer (Camera HAL), and the kernel driver layer (Camera Driver). The whole camera subsystem is actually split into two processes, a client and a server.
  • Matching and synthesizing the acquired dual-viewpoint image includes the following.
  • Registering the preview display buffer: a dedicated buffer for the camera preview display is registered in the Android Display Surface system library, and the image data type is specified.
  • Acquiring the single-frame raw image data, i.e. the data of each frame of the left and right views of the dual-viewpoint image acquired in step 200: the V4L2 interface functions of the Linux kernel are called from the hardware abstraction layer library, and the left and right single-frame video data are stored in the dedicated buffers; meanwhile, the two streams are synchronized in software using the frame-data timestamp function provided by the Android camera subsystem.
  • Converting the acquired YUV-format data to RGB format: the YUV color space is the color coding used in European television systems, where Y carries luminance (gray level) and U and V carry the color differences (R−Y) and (B−Y). The data collected by the camera is a pixel-information matrix in YUV format, while stereoscopic synthesis and display require the RGB format, so the raw data type must be converted to match the type registered for the preview display buffer.
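The patent does not give the conversion formula; a common BT.601 full-range YUV-to-RGB conversion, sketched per pixel in fixed point, looks like this (the coefficients are an assumption about which variant is used):

```cpp
// Minimal per-pixel YUV-to-RGB sketch using BT.601 full-range coefficients.
#include <algorithm>
#include <cstdint>

static inline uint8_t clamp8(int v) {
    return static_cast<uint8_t>(std::min(255, std::max(0, v)));
}

// y in [0,255]; u and v centered at 128.
void YuvToRgb(uint8_t y, uint8_t u, uint8_t v,
              uint8_t& r, uint8_t& g, uint8_t& b) {
    const int c = y, d = u - 128, e = v - 128;
    r = clamp8(c + ((91881 * e) >> 16));              // ~ y + 1.402 (v-128)
    g = clamp8(c - ((22554 * d + 46802 * e) >> 16));  // ~ y - 0.344 d - 0.714 e
    b = clamp8(c + ((116130 * d) >> 16));             // ~ y + 1.772 (u-128)
}
```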
  • Preprocessing the images: image smoothing and size conversion are applied to the obtained left and right views. The smoothing can be implemented with a Gaussian low-pass filter, which effectively avoids ringing and removes noise well; the size conversion adjusts the image size as actually needed while balancing matching quality against processing speed. Both are conventional techniques for those skilled in the art and are not described further here.
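A minimal sketch of this preprocessing using OpenCV, assuming a 5×5 Gaussian kernel and a 640×480 target size (neither value is specified in the patent):

```cpp
// Hedged preprocessing sketch: Gaussian low-pass filter, then resize.
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

cv::Mat Preprocess(const cv::Mat& rgb) {
    cv::Mat smoothed, resized;
    // 5x5 Gaussian low-pass filter; sigma derived from the kernel size.
    cv::GaussianBlur(rgb, smoothed, cv::Size(5, 5), 0.0);
    // Downscale to trade matching quality against processing speed.
    cv::resize(smoothed, resized, cv::Size(640, 480), 0, 0, cv::INTER_AREA);
    return resized;
}
```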
  • The features of the preprocessed left and right views are then extracted and matched, for example with a SIFT feature matching algorithm.
  • The Scale-Invariant Feature Transform (SIFT) is a local feature descriptor proposed by David Lowe in 1999 and further developed and improved in 2004. The SIFT feature matching algorithm can handle matching between two images under viewpoint change, occlusion, brightness change, rotation, noise, and scale change, and has strong matching ability. The SIFT algorithm consists of two parts, interest-point detection and feature-descriptor generation; the generated SIFT operator is a local feature descriptor that describes the gray-gradient distribution of the image's region of interest. The SIFT algorithm is widely applied in image matching and target detection, and its target-positioning accuracy is high. The extraction and matching include the following.
  • Extracting the features of the preprocessed left and right views: first, the difference-of-Gaussians (DoG) scale space is constructed; second, each pixel is compared with its neighborhood in image space and in DoG scale space to find extrema, giving the initial feature-point positions; then a three-dimensional quadratic function is fitted to determine the position and scale of each keypoint to sub-pixel precision, while low-contrast keypoints and unstable edge responses are removed to enhance matching stability and noise resistance; finally, a direction parameter is assigned to each keypoint from the gradient-direction distribution of its neighboring pixels, making the operator rotation-invariant, and a 32-dimensional SIFT feature descriptor is generated.
  • Matching the extracted left- and right-view features: first, a transformation model is assumed for the mapping between the left and right viewpoints; then, using the position, scale, and rotation information of each feature point, the Euclidean distance between keypoint feature vectors is used as the similarity measure between keypoints of the two images, and the transformation parameters of each matched pair are computed under the assumed model; finally, the right-view point with the smallest Euclidean distance is taken as the match of the current left-view SIFT keypoint, and the coordinates of the matched point pair are recorded.
  • The scale- and rotation-invariant SIFT feature matching algorithm is thus used to match the left- and right-view material; the feature-point positions are used to estimate the transformation model by the least-squares criterion, matches that do not conform to the model are discarded, and the model is then re-estimated from the remaining matches by the least-squares criterion. Matching the extracted features of the left and right views in this way copes effectively with abrupt image changes, such as sudden focus shifts, that are frequently encountered when shooting on a mobile terminal.
  • This embodiment may further include rejecting mismatched feature points. For this, the Random Sample Consensus (RANSAC) algorithm, a robust estimation method proposed by Fischler and Bolles, may be used, taking the positions of the matched feature points as parameters to estimate the mapping relationship between the two images. Once this mapping is estimated accurately, the SIFT matching points are filtered against it, removing the mismatched points.
  • Estimating the coordinate-mapping relationship includes: randomly selecting a subset of data points from the matched-pair set S and initializing the model from that subset; finding the support set Si of points within a threshold Td of the current model (Si is the sample's consensus set, and its points are the inliers); if the size of Si exceeds a threshold T, re-estimating the model from Si and terminating; and if the size of Si is below a threshold Ts, selecting a new sample and repeating the above steps. After N trials, the largest consensus set Si is selected and used to re-estimate the model for the final result. Only the correct matched pairs that conform to the coordinate-mapping model, i.e. the points in Si, are retained, and the resulting left-right view pixel-coordinate mapping model is saved as the frame's image reference information.
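A sketch of the RANSAC-based culling using OpenCV's homography estimator; the patent does not name the fitted mapping model, so a homography and a 3-pixel inlier threshold are assumptions for illustration:

```cpp
// Remove mismatches by fitting a left->right mapping with RANSAC.
#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <vector>

cv::Mat FilterWithRansac(std::vector<cv::Point2f>& ptsL,
                         std::vector<cv::Point2f>& ptsR) {
    std::vector<uchar> inlierMask;
    // Fit a pixel-coordinate mapping model, flagging outliers.
    cv::Mat H = cv::findHomography(ptsL, ptsR, cv::RANSAC, 3.0, inlierMask);

    std::vector<cv::Point2f> goodL, goodR;
    for (size_t i = 0; i < inlierMask.size(); ++i) {
        if (inlierMask[i]) {          // keep only matches consistent with the model
            goodL.push_back(ptsL[i]);
            goodR.push_back(ptsR[i]);
        }
    }
    ptsL.swap(goodL);
    ptsR.swap(goodR);
    return H;                         // saved as the frame's reference information
}
```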
  • The matching and synthesis of the acquired dual-viewpoint image runs cyclically, and only the image reference information of the four most recently acquired frames is kept, organized as a queue: each time a frame of stereoscopic material is acquired, the head entry is deleted and the newest frame's reference information is appended at the tail.
  • The left- and right-view pixels are then arranged in a specific pixel arrangement to generate stereoscopic image data displayable under the grating, completing the image synthesis. The arrangement, in units of vertical columns, is: the first column of the composite image takes the first pixel column of the left view, and the second column takes the first pixel column of the right view; the third column takes the second pixel column of the left view, and the fourth column takes the second pixel column of the right view; and so on, until all left- and right-view pixel columns have been placed into the composite image. A particular feature of this arrangement is that the composite image has twice as many horizontal pixels as each original image.
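The column interleaving itself is straightforward; a minimal sketch over row-major RGB buffers, where the `Image` struct and equal view sizes are illustrative assumptions:

```cpp
// Column-interleave two equal-sized RGB views into one composite image.
#include <algorithm>
#include <cstdint>
#include <vector>

struct Image {
    int width = 0, height = 0;
    std::vector<uint8_t> rgb;   // 3 bytes per pixel, row-major
};

Image Interleave(const Image& left, const Image& right) {
    Image out;
    out.width = left.width * 2;          // twice the horizontal pixel count
    out.height = left.height;
    out.rgb.resize(size_t(out.width) * out.height * 3);
    for (int y = 0; y < out.height; ++y) {
        for (int k = 0; k < left.width; ++k) {
            const uint8_t* l = &left.rgb[(size_t(y) * left.width + k) * 3];
            const uint8_t* r = &right.rgb[(size_t(y) * right.width + k) * 3];
            // Composite column 2k takes left column k; column 2k+1, right column k.
            std::copy(l, l + 3, &out.rgb[(size_t(y) * out.width + 2 * k) * 3]);
            std::copy(r, r + 3, &out.rgb[(size_t(y) * out.width + 2 * k + 1) * 3]);
        }
    }
    return out;
}
```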
  • The average pixel-coordinate mapping model, solved from the first three frames of the current model queue, is used to verify whether the left and right views are occluded: each of the left and right views is divided into 8 blocks, the average gray value of each block is computed, and the average gray values of corresponding blocks in the two views are compared. If the relative difference of a block's average gray values exceeds 10%, the block with the lower gray value is treated as a damaged block. If the number of damaged blocks is 0, there is no occlusion, and processing proceeds to noise handling.
  • Otherwise, this step further includes repairing the occluded area, i.e. correcting the region occluded by a foreign object in one view using the gray values of the corresponding pixels in the other view. This ensures that the mobile terminal can still obtain relatively high-quality left- and right-view material even when a camera is occluded, addressing the related-art problems of generally low-quality stereoscopic material and inefficient synthesis.
  • The damage repair includes: locating the damaged area precisely, using the Sobel operator to detect gray-level step edges within the damaged block; using the current scene's left-right coordinate mapping model to determine the coordinates of the damaged area in the other view; replacing the image content of the current damaged area with the image content of that area; and repairing the edges: for the step edges detected by the Sobel operator, the gray value of each pixel within a 3×3 neighborhood is corrected with the average of the gray values of the corresponding left- and right-view pixels.
  • Because the left and right views have already been matched, the view block with the higher gray value (i.e. the other view mentioned above) can be substituted for the block with the lower gray value (i.e. the area occluded by a foreign object) to complete the repair.
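A sketch of the 8-block occlusion check described above; the patent does not say how the 8 blocks are laid out, so a 4×2 grid is assumed:

```cpp
// Block-wise occlusion check: compare average gray values of corresponding
// blocks in the left and right views; >10% relative difference = damaged.
#include <opencv2/core.hpp>
#include <algorithm>
#include <cmath>
#include <vector>

std::vector<int> FindDamagedBlocks(const cv::Mat& grayL, const cv::Mat& grayR) {
    const int cols = 4, rows = 2;               // 8 blocks total (assumed layout)
    const int bw = grayL.cols / cols, bh = grayL.rows / rows;
    std::vector<int> damaged;                   // indices of occluded blocks
    for (int i = 0; i < rows * cols; ++i) {
        cv::Rect roi((i % cols) * bw, (i / cols) * bh, bw, bh);
        double mL = cv::mean(grayL(roi))[0];
        double mR = cv::mean(grayR(roi))[0];
        double rel = std::abs(mL - mR) / std::max(std::max(mL, mR), 1e-9);
        if (rel > 0.10)                         // lower-gray side is damaged
            damaged.push_back(i);
    }
    return damaged;
}
```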
  • This step further includes detecting noise in the left and right views with a median-filter test and marking the noise points: within the N×N neighborhood of the current point (N odd), the maximum, minimum, and mean gray values are taken; if the gray value of the current point is the neighborhood maximum or minimum and its deviation exceeds a preset threshold, it may be noise and is marked as suspicious. The threshold is an empirical value; the average gray value of the whole image can generally be used.
  • The coordinate mapping model is then used to locate the suspicious point's position in the other view, the current point is compared against that position's gray level once more, and whether the current point is a noise point is thereby determined. For a confirmed noise point, the gray value of each pixel in its 3×3 neighborhood may be corrected using the gray values of the corresponding pixels of the other viewpoint.
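A sketch of the neighbourhood extremum test for marking suspicious points, with N = 3 and the global mean gray value as the threshold, as suggested above; the exact comparison is one reading of the text:

```cpp
// Mark pixels that are neighbourhood extrema and deviate strongly from the
// local mean; assumes a single-channel 8-bit image (CV_8U).
#include <opencv2/core.hpp>
#include <cmath>

cv::Mat MarkSuspiciousPoints(const cv::Mat& gray) {
    const double thr = cv::mean(gray)[0];       // empirical threshold: global mean
    cv::Mat mask = cv::Mat::zeros(gray.size(), CV_8U);
    for (int y = 1; y < gray.rows - 1; ++y) {
        for (int x = 1; x < gray.cols - 1; ++x) {
            cv::Rect roi(x - 1, y - 1, 3, 3);   // 3x3 neighbourhood (N = 3)
            double mn, mx;
            cv::minMaxLoc(gray(roi), &mn, &mx);
            double m = cv::mean(gray(roi))[0];
            uchar v = gray.at<uchar>(y, x);
            // Neighbourhood extremum that deviates strongly from the mean:
            if ((v == mx || v == mn) && std::abs(v - m) > thr)
                mask.at<uchar>(y, x) = 255;     // mark as suspicious
        }
    }
    return mask;
}
```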
  • Compared with related approaches, this noise repair is simpler and less computationally intensive, its denoising effect is more significant, and the synthesized stereoscopic image is less affected.
  • Step 202: Realize naked-eye display of the two-view stereoscopic image through a slit-grating front-mounted LED (Light-Emitting Diode) stereoscopic display.
  • Taking the Android operating system as an example: in the application layer, two Camera object instances and their associated Surface preview controls are used, provided to the upper-layer application by the Android.Hardware.Camera class in the application framework layer. The hardware abstraction layer system library sends a preview message to the camera subsystem's server through a callback function, and the server calls the ISurface system library to fill the preview display buffer with data.
  • the technical solution proposed by the embodiments of the present invention has a small amount of data processing, low hardware cost, and is easy to manufacture.
  • The display conditions are: the number of parallax images (viewpoints) is K, and the 2D display sub-pixel width is Wp. The viewing conditions are: the optimal viewing distance is L, the pupil spacing of the human eyes is E, and the viewpoint spacing of adjacent parallax images is Q, which may be equal to or smaller than E. The slit-grating parameters are: the grating pitch is Ws, of which the widths of the light-transmitting strip and the light-blocking strip are Ww and Wb respectively, and the distance between the 2D display screen and the slit grating is D.
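The patent lists these parameters without stating the design equations. For orientation only, standard parallax-barrier geometry from the literature (not formulas given in the patent), with L measured from viewer to grating and D from grating to screen, relates them by similar triangles as:

$$D = \frac{W_p\,L}{Q}, \qquad W_s = \frac{K\,W_p\,L}{L + D}, \qquad W_s = W_w + W_b, \qquad W_w \approx \frac{W_s}{K}.$$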
  • Following the thread-loop mechanism, the hardware abstraction layer system library continuously repeats the matching and synthesis of the acquired dual-viewpoint images: the image data collected by the two camera hardware devices is matched and synthesized frame by frame, sent to the preview display buffer, and finally shown on the application interface.
  • The naked-eye display of the two-view stereoscopic image can be realized by the above steps. For the video recording function, after the stereoscopic image synthesis of step 201, the data format is converted back to YUV for storage and the image data is passed to the Android video recorder subsystem (VideoRecorder) for encoding; step 202 is then performed.
  • FIG. 4 is a schematic structural diagram of a two-view stereoscopic image synthesizing system according to an embodiment of the present invention. As shown in FIG. 4, the system includes at least an acquisition unit 410, a processing unit 420, and a display unit 430.
  • The acquisition unit 410 is configured to acquire the dual-viewpoint image through the dual MIPI interfaces.
  • The acquisition unit 410 includes two rear cameras with MIPI interfaces, namely the first camera 411 and the second camera 412 in FIG. 4, configured to capture the left and right views respectively. The two cameras have nearly identical optical, geometric, and imaging characteristics and are aligned in the horizontal direction; the distance between them may be 35 mm. They are mounted on different I²C buses, interact with the memory and the central processor over independent data lines, and use the timestamp of each frame for frame synchronization. The camera chip may be OmniVision's OV5640.
  • The acquisition unit 410 further includes a camera driver module 413 configured to drive the two acquired views and output them to the processing unit 420. The camera driver module 413 can be implemented with the V4L2 video driver framework.
  • The processing unit 420 is configured to match and synthesize the acquired dual-viewpoint image, generate two-view stereoscopic image data, and output it to the display unit 430. The processing unit 420 can be implemented with Texas Instruments' OMAP4-series processor chips.
  • The processing unit includes a preprocessing module 421, an extraction module 422, a matching module 423, and a synthesis module 424.
  • The preprocessing module 421 is configured to register a dedicated buffer for the camera preview display in memory; acquire single-frame data of the left and right video streams and store each in a dedicated buffer; synchronize them in software using the frame-data timestamp function; convert the acquired YUV-format data to RGB format; and apply image smoothing and size conversion to the format-converted left and right views before outputting them to the extraction module 422.
  • The extraction module 422 is configured to extract features of the preprocessed left and right views with the SIFT feature matching algorithm and generate 32-dimensional SIFT feature descriptors.
  • The matching module 423 is configured to match the extracted left- and right-view features with the SIFT feature matching algorithm, take the point with the smallest Euclidean distance in the right view as the match of the current left-view SIFT keypoint, and record the coordinates of each matched point pair.
  • The synthesis module 424 is configured to arrange the left- and right-view pixel points in the preset pixel arrangement and generate one frame of stereoscopic image data displayable under the grating.
  • Optionally, the processing unit further includes a culling module 425 configured to remove mismatched feature points with the RANSAC algorithm and estimate the left-right view pixel-coordinate mapping model.
  • Optionally, the processing unit further includes an occlusion repair module 426 configured to, when the estimated left-right view pixel-coordinate mapping model indicates an occluded region, correct the occluded region of one view using the gray values of the corresponding pixels in the other view, thereby repairing it.
  • Optionally, the processing unit further includes a noise repair module 427 configured to detect noise in the left and right views with a median filter, mark the noise points, and output them to the synthesis module 424.
  • the display unit 430 is configured to realize naked-eye display of the two-view stereoscopic image through the slit grating front LED stereoscopic display.
  • All or part of the steps of the above embodiments may also be implemented with integrated circuits: the steps may be fabricated as individual integrated-circuit modules, or several of the modules or steps may be fabricated as a single integrated-circuit module.
  • the devices/function modules/functional units in the above embodiments may be implemented by a general-purpose computing device, which may be centralized on a single computing device or distributed over a network of multiple computing devices.
  • When the devices/function modules/functional units in the above embodiments are implemented as software function modules and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. The computer-readable storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.
  • By adopting the SIFT feature matching algorithm and matching the extracted features of the left and right views, the embodiments cope effectively with abrupt image changes, such as sudden focus shifts, frequently encountered when shooting on mobile terminals, enhancing matching stability and noise resistance, and solving the related-art problems that stereoscopic material is of generally low quality and synthesis is inefficient.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

A two-viewpoint stereoscopic image synthesizing method and system, comprising: acquiring a dual-viewpoint image through dual MIPI interfaces; matching and synthesizing the acquired dual-viewpoint image to generate two-viewpoint stereoscopic image data; and displaying the two-viewpoint stereoscopic image to the naked eye via a slit-grating front-mounted LED stereoscopic display.

Description

Method and system for synthesizing a two-viewpoint stereoscopic image

Technical Field

This document relates to stereoscopic imaging technology, and in particular to a two-view stereoscopic image synthesis method and system.
Background

The most effective channel through which humans acquire information is vision. Because the human eyes see real three-dimensional scenes in nature, reproducing a true three-dimensional scene on a screen has long been a human goal. Stereoscopic imaging technology developed gradually from this need and can be used in scientific research, military, education, industry, medicine, and many other fields. With stereoscopic imaging technology, stereoscopic color images can be recorded, transmitted, and displayed, giving viewers an immersive experience.

Spatially multiplexed stereoscopic imaging displays a stereoscopic image pair on the screen simultaneously and, by special means, lets each eye view a different image at the same time, producing a stereoscopic impression. Optically, a stereoscopic image can be viewed without glasses by means of various optical elements; this is called dual-view auto-stereoscopic display. Common optical elements include the lenticular plate, the parallax barrier, and the integral-photography (IP) lens array. For dual-view auto-stereoscopic display, a lenticular sheet or a parallax barrier is generally added in front of a CRT (Cathode Ray Tube) display or a flat-panel display to control the emission direction of each pixel's light, so that the left-view image enters only the left eye and the right-view image enters only the right eye; binocular parallax then produces stereoscopic vision.

The lenticular sheet is composed of a row of vertically arranged semi-cylindrical lenses. Each lens refracts light so as to direct two different planar images into the viewing zones of the two eyes, focusing the left-eye image on the viewer's left eye and the right-eye image on the right eye, thereby producing stereoscopic vision.

The parallax barrier is a vertical plate mounted in front of the display. For each eye it blocks part of the screen, so that light from all left-view pixels enters the left-eye viewing zone and light from all right-view pixels enters the right-eye viewing zone. The parallax barrier acts much like a lenticular sheet, except that it blocks part of the pixels with an opaque mask instead of redirecting light by refraction.

FIG. 1 is a schematic diagram of a related-art auto-stereoscopic display with a slit grating mounted in front of an LCD (Liquid Crystal Display). As shown in FIG. 1, the slit grating is placed at a suitable position in front of the liquid crystal screen and blocks part of the line of sight of each eye. Viewing the screen through the slit grating, each eye can see only one pixel column through each slit: the right eye sees only the Rn pixel columns, and the left eye only the Ln pixel columns. If the Rn and Ln columns display the right-eye and left-eye images respectively, the viewer's brain fuses them into a stereoscopic image.

The stereoscopic video is ultimately shown on the mobile screen; the viewer's left and right eyes each see one of two views with a certain parallax, from which the brain recovers the three-dimensional information. Given the small display and battery constraints of a mobile terminal, two viewpoints are sufficient for viewing.

In the related art there are schemes that accomplish stereoscopic acquisition and display with a single camera by artificially applying a parallax shift to a single frame to produce a second image with different parallax, then using the two images for stereoscopic synthesis. Because the artificially simulated parallax merely shifts the whole image left or right without changing the relative depth of each object in the scene, the stereoscopic effect is poor; moreover, since only one camera is used, occlusion of the camera or heavy image noise strongly degrades the quality of the stereoscopic image. In addition, when a camera of the mobile terminal is occluded or the captured stereoscopic material is noisy, high-quality stereoscopic images cannot be acquired, synthesized, and displayed quickly.
Summary

This document provides a two-view stereoscopic image synthesis method and system that overcome abrupt image changes, enhance matching stability, and improve noise resistance, thereby displaying high-quality stereoscopic images.

A two-view stereoscopic image synthesis method includes: acquiring a dual-viewpoint image through dual Mobile Industry Processor Interface (MIPI) interfaces; matching and synthesizing the acquired dual-viewpoint image to generate two-view stereoscopic image data; and realizing naked-eye display of the two-view stereoscopic image through a slit-grating front-mounted LED stereoscopic display.

Optionally, after the dual-viewpoint image is acquired through the dual MIPI interfaces, the method further includes performing video-driver processing on the acquired dual-viewpoint image.

Optionally, matching and synthesizing the acquired dual-viewpoint image includes: using two preview threads executed in parallel to extract the data of each frame of the left and right views from the acquired dual-viewpoint image, and performing the matching and synthesis.

Optionally, the matching and synthesis includes: registering a dedicated buffer for the camera preview display in memory; acquiring single-frame data of the left and right video streams from the acquired dual-viewpoint image and storing each in a dedicated buffer; synchronizing the obtained left and right single-frame data using the frame-data timestamps; converting the acquired YUV-format data to RGB format; applying image smoothing and size conversion to the format-converted left and right views; extracting and matching the features of the processed left and right views with the Scale-Invariant Feature Transform (SIFT) feature matching algorithm; and arranging the left- and right-view pixel points in a preset pixel arrangement to generate one frame of stereoscopic image data displayable under the grating.

Optionally, the image smoothing is implemented with a Gaussian low-pass filter.

Optionally, after the SIFT-based feature extraction and matching, the method further includes removing mismatched feature points from the matched left- and right-view features with the Random Sample Consensus (RANSAC) algorithm.

Optionally, in the step of arranging the left- and right-view pixel points in the preset arrangement to generate one frame of stereoscopic image data displayable under the grating, the arrangement is, in units of vertical columns: the first column of the composite image takes the first pixel column of the left view, and the second column takes the first pixel column of the right view; the third column takes the second pixel column of the left view, and the fourth column takes the second pixel column of the right view; and so on, until all left- and right-view pixel columns have been placed into the composite image.

Optionally, before the stereoscopic image data displayable under the grating is generated, the method further includes verifying whether the left and right views are occluded and, if so, repairing the occluded region.

Optionally, repairing the occluded region includes correcting the region occluded by a foreign object in one view using the gray values of the corresponding pixels in the other view.

Optionally, the method further includes detecting noise in the occlusion-repaired left and right views with a median filter, marking the noise points, and repairing the points determined to be noise.

Optionally, the noise repair includes correcting the gray value of each pixel within a confirmed noise point using the gray values of the corresponding pixels of the other viewpoint.
一种两视点立体图像合成系统,包括采集单元、处理单元,以及显示单元;其中,A two-view stereoscopic image synthesis system, comprising an acquisition unit, a processing unit, and a display unit; wherein
采集单元,设置为:利用双MIPI接口采集双视点图像;The acquisition unit is configured to: acquire a dual viewpoint image by using a dual MIPI interface;
处理单元,设置为:对采集到的双视点图像进行匹配、合成处理,生成两视点立体图像数据并输出给显示单元;The processing unit is configured to: perform matching and synthesizing processing on the collected two-viewpoint image, generate two-viewpoint stereoscopic image data, and output the data to the display unit;
显示单元,设置为:通过狭缝光栅前置LED立体显示器实现对两视点立体图像的裸眼显示。The display unit is configured to realize naked-eye display of the two-view stereoscopic image through the slit grating front LED stereoscopic display.
Optionally, the acquisition unit includes two rear cameras with MIPI interfaces, configured to capture the left and right views respectively.
The two cameras are mounted on different I2C buses, interact with the memory and the central processor over independent data lines, and use the timestamp of each image frame for frame synchronization.
Optionally, the cameras use the OV5640 chip from Omnivision.
Optionally, the acquisition unit further includes a camera driver module configured to perform driver processing on the two acquired views and output them to the processing unit.
Optionally, the camera driver module is implemented with the V4L2 video driver framework.
Optionally, the processing unit includes a preprocessing module, an extraction module, a matching module, and a synthesis module, wherein:
the preprocessing module is configured to register a dedicated buffer in memory for the camera preview display; acquire single frames of the left and right video streams and store them in the dedicated buffers; perform software synchronization using the frame data timestamps; convert the acquired YUV data to RGB; and apply image smoothing and size transformation to the format-converted left and right views before outputting them to the extraction module;
the extraction module is configured to extract features from the preprocessed left and right views using the SIFT feature matching algorithm, generating 32-dimensional SIFT feature descriptors;
the matching module is configured to match the extracted features of the left and right views using the SIFT feature matching algorithm, take the point in the right view with the smallest Euclidean distance as the matching point of the current left-view SIFT keypoint, and record the coordinates of the matched point pairs; and
the synthesis module is configured to arrange the pixels of the left and right views according to a preset pixel arrangement, generating one frame of stereoscopic image data displayable under the grating.
Optionally, the processing unit further includes a rejection module configured to remove mismatched feature points using the RANSAC algorithm and estimate the left/right-view pixel coordinate mapping model.
Optionally, the processing unit further includes an occlusion repair module configured to, when the estimated left/right-view pixel coordinate mapping model indicates that an occluded region exists, correct the region of one view occluded by a foreign object using the gray values of the corresponding pixels in the other view, thereby repairing the occluded region.
Optionally, the processing unit further includes a noise repair module configured to detect noise in the left and right views using a median filter, mark the noise points, and output the result to the synthesis module.
A computer-readable storage medium stores computer-executable instructions for performing any one of the methods above.
Compared with the related art, embodiments of the present invention acquire two-viewpoint images through dual MIPI interfaces; match and synthesize the acquired two-viewpoint images to generate two-view stereoscopic image data; and present the two-view stereoscopic image to the naked eye through a stereoscopic LED display with a front-mounted slit grating. By adopting the SIFT feature matching algorithm to match the extracted features of the left and right views, the embodiments cope more effectively with the abrupt image changes, such as sudden focal-length changes, frequently encountered when shooting with mobile terminals, enhancing matching stability and noise resistance.
By determining the pixel coordinate mapping model between the left and right views of the current scene, performing occlusion repair and denoising on low-quality single-viewpoint stereoscopic material according to that model, and then generating the two-view stereoscopic image, the embodiments solve the related-art problems of mediocre stereoscopic material quality and inefficient synthesis.
Through occlusion repair and noise repair, a mobile terminal can acquire, synthesize, and display high-quality stereoscopic images at a relatively fast speed even when a camera is partially blocked or the captured stereoscopic material is noisy.
Brief Description of the Drawings
FIG. 1 is a schematic structural view of a related-art autostereoscopic LCD with a front-mounted slit grating;
FIG. 2 is a flowchart of a two-view stereoscopic image synthesis method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the left/right-view pixel arrangement according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the structure of a two-view stereoscopic image synthesis system according to an embodiment of the present invention.
Embodiments of the Invention
Embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be noted that, where no conflict arises, the embodiments herein and the features within them may be combined with one another arbitrarily.
Stereoscopic video exploits the binocular parallax of the human eye: each eye independently receives the left or right image of the same scene captured from a specific viewpoint, producing a sense of depth. Compared with conventional single-channel video, the amount of data to be processed is doubled. In portable media, to overcome the high power consumption and electromagnetic interference (EMI) noise that parallel data buses cause under high-speed transmission, bus designs supporting greater bandwidth and transfer rates are needed to deliver images and multimedia of equal quality. Besides mobile phones, portable multimedia devices such as portable media players (PMP), portable DVD players, and digital still cameras (DSC) face the same problem. Several standardization efforts have emerged in response. The MVI standard (a camera-output video format), developed mainly by Renesas Technology and Seiko Epson, is based on low-voltage differential signaling (LVDS) and effectively reduces EMI. The Mobile Pixel Link (MPL) bus standard proposed by National Semiconductor uses the proprietary WhisperBus physical-layer interface and has attracted handset giants such as Sony Ericsson. Qualcomm, the drafter of code division multiple access (CDMA) technology, launched the MDDI high-speed serial interface standard for mobile terminals, likewise based on the LVDS physical transmission specification. The largest and most prominent, however, is the Mobile Industry Processor Interface (MIPI) standard, jointly organized and published by terminal and solution providers including Nokia, ARM, STMicroelectronics, Texas Instruments, Intel, and Freescale. Among these many technical standards, MIPI has, after years of refinement, become highly influential in the industry.
Compared with other standards, modules with MIPI interfaces offer less wiring, higher speed, larger data throughput, lower power consumption, stronger interference immunity, and better adaptability. The Camera Serial Interface (CSI) and Display Serial Interface (DSI) introduced by the high-speed multipoint connection working group of the MIPI organization are camera and display module interface specifications built on the D-PHY digital physical layer (part of the MIPI protocol family) as the physical transport. The specifications define the physical connection, protocol handling, and upper-layer application in detail. D-PHY uses 1.2 V source-synchronous, scalable low-voltage signaling with 200 mV-swing differential pairs; it supports up to four lanes, each running at up to 1 Gbps, for a theoretical aggregate of 4 Gbps, with zero static power consumption. The DSI specification for portable devices supports resolutions up to Extended Graphics Array (XGA). The CSI specification supports not only raw image data such as RGB, Bayer (an image format), and YUV, but also user-defined data types and compressed data formats. The DSI interface is applied in essentially the same way as CSI. MIPI has refined its application interfaces into dedicated sub-specifications for different needs, so in handheld electronics, above all mobile phones, the MIPI interface specifications can support data transmission at essentially any speed and resolution.
The MIPI standard, however, does not address stereoscopic acquisition. Terminal CPU chips increasingly reserve two MIPI interfaces for image acquisition, for example the OMAP 4 and OMAP 5 series chips from Texas Instruments (TI). On mobile terminals, however, the dual MIPI interfaces are applied to front/rear dual-camera capture, which cannot satisfy a stereoscopic acquisition function with demanding real-time and synchronization requirements.
FIG. 2 is a flowchart of a two-view stereoscopic image synthesis method according to an embodiment of the present invention. As shown in FIG. 2, the method includes:
Step 200: acquiring two-viewpoint images through dual MIPI interfaces.
This step is implemented by shooting synchronously from two capture devices (such as cameras) at different positions and angles to obtain a two-viewpoint image comprising the image data of the left and right views. Using dual MIPI rear cameras satisfies the high-bandwidth, high-synchronization, low-power, and low-noise requirements of stereoscopic acquisition.
For example, on the basis of the dual MIPI interfaces provided by the development platform of the Texas Instruments OMAP4-series processor chipset, real-time dual-camera image acquisition can be achieved by mounting two cameras on the I2C bus. Compared with conventional acquisition platforms, this approach offers less wiring, higher speed, larger data throughput, lower power consumption, stronger interference immunity, and better adaptability.
As those skilled in the art know, the OMAP4-series processor chip provides two 1.2 GHz ARM Cortex-A9 processors, one TMS320C64+ processor, two ARM Cortex-M3 processors, 1 GB of DRAM, two MIPI serial camera data interfaces (CSI-2_1 and CSI-2_2), and a direct memory access (DMA) controller, among other modules. It can meet the processing requirements of fast, high-quality acquisition, synthesis, and display of stereoscopic images. The application of OMAP4-series processor chips is a customary technique for those skilled in the art and is not described further here.
The two cameras in this embodiment are two cameras with nearly identical optical, geometric, and imaging characteristics arranged in the horizontal direction; the distance between them may be 35 mm. The two cameras are mounted on different I2C buses, interact with the memory and the central processor over independent data lines, and use the timestamp of each image frame for frame synchronization. In the "transparent" operating mode of the DMA controller, the DMA controller and the CPU can access the memory alternately. The DMA controller moves image data directly from the camera to memory without CPU involvement during the transfer; the hardware opens a direct data path between memory and the input/output device, greatly improving CPU efficiency.
The camera chip in this embodiment may be the OV5640 from Omnivision. The OV5640's interface is an extension of the MIPI interface; the Camera Control Interface (CCI) defined in MIPI is consistent with the I2C standard and has two ports, the serial clock line (SCL) and the serial data line (SDA), where SCL is a unidirectional control clock signal line and SDA is a bidirectional control line. The RESET, SHUTTER, and STROBE lines, the clock bus, and the power ground in the camera expansion interface can be shared by the two cameras, so no special handling is needed; the two cameras can simply be connected to them in parallel. How to connect the OV5640 camera chip to the expansion interface of the OMAP4460 central processor is a customary technique for those skilled in the art and is not described further here.
For video driving, embodiments of the present invention may adopt the Video for Linux 2 (V4L2) video driver framework. V4L2 is a two-layer driver system: the upper layer is the video device (Video Device) module, a character device with registered device functions; the lower layer is the V4L2 driver. The V4L2 driver and its device nodes are registered through registration functions; once a device node is opened, operations on the device file are dispatched to the various V4L2-compliant interfaces defined by the v4l2_ioctl_ops structure. When accessing video hardware, the V4L2 driver module in the Android kernel is called first, and the V4L2 module then calls the device driver. The two cameras use exactly the same driver framework and flow. The main roles of the V4L2 driver framework are timing control of the video data and memory management of the data buffers, controlling the hardware and obtaining image data through drivers such as the I2C bus and the Peripheral Component Interconnect (PCI) interface.
When the camera driver module starts, its initialization function completes a series of tasks, including hardware power-up, MIPI clock setup in the bus controller, I2C bus port initialization, MIPI data port initialization, and video device detection and binding. Inside the camera driver module, two video frame buffer queues (Buffer Queues) located in memory are managed: one serves as the input queue and the other as the output queue. For a camera device, once acquisition starts, a buffer in the input queue automatically moves to the output queue after being filled with image data; after the video driver issues the dequeue command (VIDIOC_DQBUF) to pass the data to the upper layer for processing, the enqueue command (VIDIOC_QBUF) is issued again to put the buffer back into the input queue.
Using the V4L2 video driver framework, video stream acquisition roughly comprises: opening the video device file; obtaining the device's capability list and detecting the supported video standards; setting the video frame capture format and frame size; requesting several frame buffer regions from memory to serve as the input and output queues (multiple buffers can be queued to improve acquisition efficiency); obtaining the information of each buffer and mapping it into the upper-layer system's address space; starting video stream acquisition; taking a sampled frame of data from the buffer at the head of the output queue and passing it to the upper layer for processing; putting the just-processed frame buffer back at the tail of the input queue for cyclic acquisition; stopping the video stream; and closing the video device.
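As a concrete illustration of these steps, a minimal V4L2 capture sketch in C++ follows. It is not the patented implementation; the device path /dev/video0, the 640x480 YUYV format, the four-buffer queue, and the omission of error handling are all simplifying assumptions.

#include <cstring>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/videodev2.h>

int main() {
    int fd = open("/dev/video0", O_RDWR);           // open the video device file

    v4l2_capability cap{};
    ioctl(fd, VIDIOC_QUERYCAP, &cap);               // obtain the device capability list

    v4l2_format fmt{};                              // set the capture format and frame size
    fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    fmt.fmt.pix.width  = 640;
    fmt.fmt.pix.height = 480;
    fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;
    ioctl(fd, VIDIOC_S_FMT, &fmt);

    v4l2_requestbuffers req{};                      // request frame buffers for the queues
    req.count  = 4;
    req.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    req.memory = V4L2_MEMORY_MMAP;
    ioctl(fd, VIDIOC_REQBUFS, &req);

    void* planes[4] = {};
    for (unsigned i = 0; i < req.count; ++i) {      // map each buffer into user space
        v4l2_buffer buf{};
        buf.type = req.type; buf.memory = req.memory; buf.index = i;
        ioctl(fd, VIDIOC_QUERYBUF, &buf);
        planes[i] = mmap(nullptr, buf.length, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, buf.m.offset);
        ioctl(fd, VIDIOC_QBUF, &buf);               // place the buffer on the input queue
    }

    int type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    ioctl(fd, VIDIOC_STREAMON, &type);              // start streaming

    for (int frame = 0; frame < 100; ++frame) {
        v4l2_buffer buf{};
        buf.type = req.type; buf.memory = req.memory;
        ioctl(fd, VIDIOC_DQBUF, &buf);              // take a filled buffer off the output queue
        // ... hand planes[buf.index] (plus buf.timestamp) to the upper layer ...
        ioctl(fd, VIDIOC_QBUF, &buf);               // recycle it onto the input queue
    }

    ioctl(fd, VIDIOC_STREAMOFF, &type);             // stop streaming
    close(fd);                                      // close the video device
}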
In step 200, on the basis of a fully customized camera subsystem of the Android operating system, with a Texas Instruments OMAP4-series central processor as the core of the hardware platform, real-time synchronized dual-camera image acquisition is achieved over dual MIPI serial data links. Compared with conventional acquisition platforms, it offers less wiring, higher speed, larger data throughput, lower power consumption, stronger interference immunity, and better adaptability.
Step 201: matching and synthesizing the acquired two-viewpoint images to generate two-view stereoscopic image data.
This step uses two preview threads (PreviewThread) executing in parallel to extract, through the function interfaces provided by the V4L2 framework (where video driving is used), the data of each frame of the left and right views of the two-viewpoint image acquired in step 200 from the system kernel driver layer, and then matches and synthesizes them. This roughly comprises:
registering a dedicated buffer in memory for the camera preview display; acquiring single frames of the left and right video streams from the acquired two-viewpoint images and storing them in the dedicated buffers;
synchronizing the obtained left and right single-frame video data using the frame data timestamps;
converting the acquired YUV data to RGB;
applying image smoothing and size transformation to the format-converted left and right views;
extracting and matching the features of the smoothed and resized left and right views using the SIFT feature matching algorithm; and
arranging the pixels of the left and right views according to a preset pixel arrangement to generate one frame of stereoscopic image data displayable under the grating.
Optionally, the process further includes repair, denoising, and similar processing; finally, the synthesized image data is passed to the Android display subsystem (Surface) system library and shown on the application-layer interface.
Taking a customized hardware abstraction layer library of the Android camera subsystem as an example, the Android Camera architecture follows the layered structure of the Android system itself, consisting mainly of the application layer (Camera App), the application framework layer (Camera Service), the hardware abstraction layer (Camera Hal), and the kernel driver layer (Camera Driver). The whole camera subsystem is actually split into two processes, a client (Client) and a server (Service). In this step, matching and synthesizing the acquired two-viewpoint images includes:
Registering the preview display buffer: registering a dedicated buffer in memory for the camera preview display in the Android Surface system library and specifying the image data type.
Acquiring single-frame raw image data, i.e., the data of each frame of the left and right views of the two-viewpoint image acquired in step 200: the V4L2 interface functions in the Linux kernel are called in the hardware abstraction layer library to obtain single frames of the left and right video streams, which are stored in the dedicated buffers; meanwhile, software synchronization is performed using the frame data timestamp facility provided by the Android camera subsystem.
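The timestamp-based software synchronization can be pictured with the following hypothetical helper, which pairs a left frame with the right frame closest in time. The structure names and the 16 ms tolerance (roughly half a 30 fps frame period) are illustrative assumptions, not part of the original disclosure.

#include <cstdint>
#include <cstdlib>
#include <deque>
#include <optional>

struct Frame { int64_t ts_us = 0; /* ... pixel data ... */ };

// Returns the right-view frame closest in time to `left`, or nothing if no
// candidate lies within the tolerance.
std::optional<Frame> pair_right_frame(const Frame& left,
                                      const std::deque<Frame>& right_queue,
                                      int64_t tol_us = 16000) {
    std::optional<Frame> best;
    int64_t best_diff = tol_us;
    for (const Frame& r : right_queue) {
        const int64_t diff = std::llabs(r.ts_us - left.ts_us);
        if (diff <= best_diff) { best_diff = diff; best = r; }
    }
    return best;
}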
Converting the acquired YUV data to RGB: the YUV color space is a color encoding method used in European television systems, where Y represents luminance (gray level) and U and V represent the color differences (R-Y) and (B-Y). The data captured by a camera is usually a pixel information matrix in YUV format, whereas stereoscopic images must be in RGB format for both synthesis and display, so the acquired raw data type must be converted to match the type registered for the preview display buffer.
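One common way to perform this conversion is the fixed-point BT.601 mapping sketched below; the exact YUV variant delivered by the sensor (packed YUYV, planar NV21, and so on) and the full-range coefficients are assumptions for illustration.

#include <algorithm>
#include <cstdint>

static inline uint8_t clamp8(int v) { return (uint8_t)std::min(255, std::max(0, v)); }

void yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v,
                uint8_t& r, uint8_t& g, uint8_t& b) {
    const int c = y, d = u - 128, e = v - 128;
    r = clamp8(c + ((359 * e) >> 8));             // R = Y + 1.403 (V-128)
    g = clamp8(c - ((88 * d + 183 * e) >> 8));    // G = Y - 0.344 (U-128) - 0.714 (V-128)
    b = clamp8(c + ((454 * d) >> 8));             // B = Y + 1.770 (U-128)
}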
Preprocessing the images: applying image smoothing and size transformation to the obtained left and right views. Image smoothing may be implemented with a Gaussian low-pass filter, which effectively avoids ringing artifacts and visibly suppresses noise; its implementation is a customary technique for those skilled in the art and is not described here. The size transformation adjusts the image dimensions to actual needs while preserving search quality and processing speed; its implementation is likewise customary and not described here.
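A minimal preprocessing sketch with OpenCV might look as follows; the 5x5 kernel, the sigma of 1.2, and the 640x480 target size are illustrative choices, not values from the original text.

#include <opencv2/imgproc.hpp>

cv::Mat preprocess(const cv::Mat& rgb) {
    cv::Mat smoothed, resized;
    cv::GaussianBlur(rgb, smoothed, cv::Size(5, 5), 1.2);  // Gaussian low-pass smoothing
    cv::resize(smoothed, resized, cv::Size(640, 480), 0, 0, cv::INTER_AREA);
    return resized;
}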
Extracting and matching the features of the preprocessed left and right views, for example with the SIFT feature matching algorithm. The Scale-Invariant Feature Transform (SIFT) is a local feature descriptor proposed by David Lowe in 1999 and further developed and refined in 2004. The SIFT feature matching algorithm can handle matching between two images under viewpoint changes, occlusion, brightness changes, rotation, noise, and scale changes, and has strong matching capability. The SIFT algorithm consists of two parts, interest point detection and feature description generation; the resulting SIFT operator is a local feature descriptor that characterizes the gray-gradient distribution in the region of interest of the image. SIFT is very widely used in image matching and object detection, and its localization accuracy is also very high. The procedure includes:
Extracting the features of the preprocessed left and right views: first, a difference-of-Gaussians (DoG) scale space is constructed; second, extrema are searched for each pixel within its neighborhood in image space and DoG scale space, giving preliminary feature point locations; then the positions and scales of the keypoints are determined precisely (to sub-pixel accuracy) by fitting a three-dimensional quadratic function, while low-contrast keypoints and unstable edge response points are removed to enhance matching stability and noise resistance; finally, using the gradient orientation distribution of the pixels in each keypoint's neighborhood, an orientation parameter is assigned to each keypoint to make the operator rotation-invariant, generating 32-dimensional SIFT feature descriptors.
Matching the extracted features of the left and right views: first, a transformation model is hypothesized for the transformation between the left and right viewpoints; then, based on the position, scale, and rotation information of each matched feature point, the Euclidean distance between keypoint feature vectors is used as the similarity measure between the keypoints of the two images, and the transformation parameters of each matched pair are computed under the hypothesized model; finally, the point in the right view with the smallest Euclidean distance is taken as the matching point of the current left-view SIFT keypoint, and the coordinates of the matched point pairs are recorded.
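A sketch of the extraction and nearest-neighbour matching with OpenCV is given below. Note that OpenCV's stock SIFT produces 128-dimensional descriptors, so the 32-dimensional descriptor described above would require a custom variant; everything else here is standard library usage, not the patented implementation.

#include <vector>
#include <opencv2/features2d.hpp>

void match_views(const cv::Mat& left, const cv::Mat& right,
                 std::vector<cv::Point2f>& left_pts,
                 std::vector<cv::Point2f>& right_pts) {
    cv::Ptr<cv::SIFT> sift = cv::SIFT::create();
    std::vector<cv::KeyPoint> kl, kr;
    cv::Mat dl, dr;
    sift->detectAndCompute(left, cv::noArray(), kl, dl);   // keypoints + descriptors, left view
    sift->detectAndCompute(right, cv::noArray(), kr, dr);  // keypoints + descriptors, right view

    cv::BFMatcher matcher(cv::NORM_L2);    // Euclidean distance between descriptors
    std::vector<cv::DMatch> matches;
    matcher.match(dl, dr, matches);        // nearest right-view point per left keypoint

    for (const cv::DMatch& m : matches) {  // record the coordinates of each matched pair
        left_pts.push_back(kl[m.queryIdx].pt);
        right_pts.push_back(kr[m.trainIdx].pt);
    }
}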
Because the "scale-invariant" and "rotation-invariant" SIFT feature matching algorithm is used to match the left- and right-view material, the transformation model is estimated from the feature point positions and related information according to the least-squares criterion, and after discarding matched pairs that do not fit the model, the model parameters are recomputed from the remaining pairs, again by least squares. Compared with conventional matching algorithms, matching the extracted left- and right-view features in this embodiment copes more effectively with the abrupt image changes, such as sudden focal-length changes, frequently encountered when shooting with mobile terminals.
Embodiments of the present invention may further include rejecting mismatched feature points, for example with the Random Sample Consensus (RANSAC) algorithm, which takes the positions of the matching points within the images' feature point sets as parameters and estimates the mapping between the two images. By adjusting the threshold in RANSAC, the mapping between the two views can be estimated accurately and the SIFT matches filtered, thereby removing erroneous matches. RANSAC is a robust estimation method proposed by Fischler and Bolles. Its basic idea is that, instead of treating all possible input data indiscriminately during parameter estimation, a search engine is first designed for the specific problem and used iteratively to reject the input data (outliers) inconsistent with the estimated parameters, after which the parameters are estimated from the correct input data.
The estimation of the coordinate mapping relation includes: randomly selecting a subset of data points from the set S of matched pairs and initializing the model from this subset; finding the support point set Si of the current model according to the threshold Td, the set Si being the consensus set of the sample, defined as the valid points; if the size of Si exceeds a specified threshold T, re-estimating the model with Si and terminating; if the size of Si is smaller than the threshold Ts, selecting a new sample and repeating the above steps. After N attempts, the largest consensus set Si is selected and used to re-estimate the model, giving the final result. Only the correct matched pairs that fit the coordinate mapping model, i.e., the points in Si, are retained and saved together with the resulting left/right-view pixel coordinate mapping model as the reference information of the current frame.
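With OpenCV, the RANSAC rejection and model estimation can be sketched as a homography fit. Treating the left/right pixel coordinate mapping model as a homography, and letting the 3-pixel reprojection threshold stand in for Td, are assumptions made for illustration.

#include <vector>
#include <opencv2/calib3d.hpp>

cv::Mat reject_mismatches(std::vector<cv::Point2f>& left_pts,
                          std::vector<cv::Point2f>& right_pts) {
    std::vector<uchar> inlier_mask;
    cv::Mat H = cv::findHomography(left_pts, right_pts, cv::RANSAC,
                                   3.0 /* reprojection threshold, cf. Td */, inlier_mask);

    std::vector<cv::Point2f> l, r;         // keep only the consensus set Si
    for (size_t i = 0; i < inlier_mask.size(); ++i)
        if (inlier_mask[i]) { l.push_back(left_pts[i]); r.push_back(right_pts[i]); }
    left_pts.swap(l);
    right_pts.swap(r);
    return H;                              // saved as the frame's reference mapping model
}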
The above process of matching and synthesizing the acquired two-viewpoint images is executed in a loop, and only the image reference information of the four most recently acquired frames is kept in a queue. Each time a frame of stereoscopic material is acquired, the information of the frame at the head of the queue is deleted and the reference information of the newest frame is stored at the tail.
After the left and right views have been preprocessed and matched, the pixels of the left and right views are arranged in a specific pattern, as shown in FIG. 3, to generate one frame of stereoscopic image data displayable under the grating, completing the stereoscopic image synthesis. The specific pixel arrangement is: taking vertical columns as the unit, the first column of the composite image carries the first column of pixels of the left view and the second column carries the first column of pixels of the right view; the third column carries the second column of pixels of the left view and the fourth column carries the second column of pixels of the right view; and so on, until the pixels of the left and right views are completely arranged into the composite image. A peculiarity of this arrangement is that the composite image has twice as many horizontal pixels as the original image.
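The column interleaving of FIG. 3 reduces to a simple loop; the sketch below assumes equally sized, already matched views.

#include <opencv2/core.hpp>

cv::Mat interleave_columns(const cv::Mat& left, const cv::Mat& right) {
    CV_Assert(left.size() == right.size() && left.type() == right.type());
    cv::Mat out(left.rows, left.cols * 2, left.type());  // twice the horizontal pixel count
    for (int c = 0; c < left.cols; ++c) {
        left.col(c).copyTo(out.col(2 * c));              // left-view column c -> composite column 2c
        right.col(c).copyTo(out.col(2 * c + 1));         // right-view column c -> composite column 2c+1
    }
    return out;
}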
Optionally, after the loop completes, the average pixel coordinate mapping model is computed from the first three frames in the current model queue to verify whether the left and right views are occluded. Each of the left and right views is evenly divided into 8 blocks; the average gray value of each block is taken, and the average gray values of corresponding blocks in the left and right views are compared. If the relative difference of the block averages exceeds 10%, the block with the lower gray value is regarded as a damaged block. If the number of damaged blocks is 0, indicating no occlusion, noise processing is performed.
If the number of damaged blocks is not 0, indicating occlusion, this step further includes repairing the occluded region, i.e., correcting the region of one view occluded by a foreign object using the gray values of the corresponding pixels in the other view. This ensures that the mobile terminal can still obtain left- and right-view material of relatively high quality when a camera is partially blocked, solving the related-art problems of mediocre stereoscopic material quality and inefficient synthesis. The damage repair includes: locating the damaged region precisely, using the Sobel operator within a damaged block to detect gray-level discontinuity edges; determining the coordinates of the damaged region in the other view via the coordinate mapping model of the left and right views of the current scene; replacing the image content of the damaged region with the content of the corresponding region of the normal view; and performing edge repair, correcting, for each gray-level discontinuity edge detected by the Sobel operator, the gray value of every pixel in the 3x3 neighborhood of each edge pixel using the average of the gray values of the corresponding pixels of the left and right views. Here, for left and right views that have already been matched, if the relative difference of the average gray values of a block pair exceeds 10%, the view block with the lower gray value is marked as damaged. Since the left and right views have already been matched, the block with the higher gray value (i.e., the other view mentioned above) can be used in place of the block with the lower gray value (i.e., the region of the view occluded by a foreign object), accomplishing the repair.
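The block comparison and replacement can be sketched as follows on grayscale views; the 2x4 block grid (one way to obtain 8 blocks) is an assumption, and the warp through the coordinate mapping model and the Sobel edge repair are omitted for brevity.

#include <algorithm>
#include <cmath>
#include <opencv2/core.hpp>

void repair_occlusion(cv::Mat& gray_l, cv::Mat& gray_r) {
    const int grid_rows = 2, grid_cols = 4;            // 8 blocks per view (assumed 2x4 grid)
    const int bh = gray_l.rows / grid_rows, bw = gray_l.cols / grid_cols;
    for (int i = 0; i < grid_rows; ++i)
        for (int j = 0; j < grid_cols; ++j) {
            cv::Rect blk(j * bw, i * bh, bw, bh);
            const double ml = cv::mean(gray_l(blk))[0];   // block mean gray, left view
            const double mr = cv::mean(gray_r(blk))[0];   // block mean gray, right view
            if (std::abs(ml - mr) > 0.10 * std::max(ml, mr)) {
                // the darker block is treated as damaged; refill it from the brighter view
                if (ml < mr) gray_r(blk).copyTo(gray_l(blk));
                else         gray_l(blk).copyTo(gray_r(blk));
            }
        }
}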
After the occluded region is repaired, a small amount of salt-and-pepper noise accompanies the result because of the limitations of the coordinate mapping model. Therefore, this step further includes detecting noise in the left and right views with a median filter and marking the noise points: within the NxN neighborhood of the current point (N odd), the maximum, minimum, and mean gray values are taken; if the gray value of the current point is the maximum or minimum in the neighborhood and exceeds a preset threshold, it may be noise and is marked as suspicious. The threshold is an empirical value from experiments, typically the average gray value of the whole image. The position of the suspicious point in the other view is then determined through the coordinate mapping model, the current point is placed at that position, and the gray comparison is performed again to decide whether the current point is a noise point.
For points determined to be noise, noise point repair may correct the gray value of every pixel in the 3x3 neighborhood of the confirmed noise point using the gray values of the corresponding pixels of the other viewpoint. Compared with other denoising methods (most related-art methods filter the whole noisy image, which easily degrades image quality), the noise repair method in this embodiment is simpler, computationally lighter, removes noise more noticeably, and has less impact on the appearance of the synthesized stereoscopic image.
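One reading of the detection rule is sketched below. Whether the threshold applies to the pixel value itself or to its deviation from the neighbourhood mean is ambiguous in the text, so the deviation form used here is an assumption, as is N = 3.

#include <opencv2/imgproc.hpp>

cv::Mat mark_noise(const cv::Mat& gray, int N = 3) {
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(N, N));
    cv::Mat mn, mx, mean_f, dev;
    cv::erode(gray, mn, kernel);                 // NxN neighbourhood minimum
    cv::dilate(gray, mx, kernel);                // NxN neighbourhood maximum
    cv::blur(gray, mean_f, cv::Size(N, N));      // NxN neighbourhood mean
    cv::absdiff(gray, mean_f, dev);              // deviation from the local mean

    const double thr = cv::mean(gray)[0];        // empirical threshold: whole-image mean gray
    cv::Mat is_extremum = (gray == mn) | (gray == mx);
    cv::Mat exceeds = dev > thr;
    return is_extremum & exceeds;                // suspicious points; confirmed ones are then
                                                 // repaired from the other view's 3x3 region
}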
Step 202: presenting the two-view stereoscopic image to the naked eye through a stereoscopic LED (Light Emitting Diode) display with a front-mounted slit grating.
In this step, the hardware abstraction layer system library sends a preview message to the server side of the camera subsystem through a callback function. On receiving the message, the server calls the ISurface system library to fill the preview display buffer with data. Taking the Android operating system as an example: in the application layer, two Camera object instances and their associated Surface preview controls implement the preview interfaces that the Android.Hardware.Camera class of the application framework layer provides to upper-layer applications; the stereoscopic image is brought up from the Android hardware abstraction layer, handed to the Android ISurface system library for data logic processing, and finally shown on the preview interface.
Because this method uses an LED stereoscopic display with a front-fitted grating, the stereoscopic image data delivered to the application layer is projected onto the 2D screen at the original image's pixel scale. Compared with similar products, the technical solution proposed by the embodiments of the present invention requires little data processing, costs little in hardware, and is easy to manufacture.
For a given 2D screen pixel size and given viewing conditions, so that the viewer's left and right eyes can see the corresponding stereoscopic parallax images through the grating, structural parameters such as the widths of the transparent and opaque strips of the slit grating and the distance between the 2D screen and the slit grating must be designed precisely. For a given 2D screen, the display conditions are: the number of parallax images (viewpoints) is K, and the sub-pixel width of the 2D screen is Wp. The viewing conditions are: the optimal viewing distance is L, and the viewpoint spacing of adjacent parallax images is Q, whose value may be equal to or smaller than the interpupillary distance. In general, one may set Q = E/N, where E is the human interpupillary distance and N is a natural number; then, at the optimal viewing distance, if the left eye sees the i-th parallax image, the right eye sees the (i+N)-th parallax image. When N = 1, the viewpoint spacing of adjacent parallax images equals the interpupillary distance. The slit grating parameters are: grating pitch Ws, where the widths of the transparent strip and the opaque strip are Ww and Wb respectively, and the distance between the 2D screen and the slit grating is D.
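Although the text does not state the design equations, the standard similar-triangle geometry of a front slit grating gives relations of roughly the following form; this is a sketch that assumes L is measured from the grating plane to the viewer, which the text does not specify.

% Similar-triangle relations for a front slit grating (sketch under the stated assumption):
\[
  D \;=\; \frac{L\,W_p}{Q}, \qquad
  W_s \;=\; \frac{K\,W_p\,L}{L + D}, \qquad
  W_s \;=\; W_w + W_b .
\]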
In the method of this embodiment, through a thread loop mechanism, the hardware abstraction layer system library continuously repeats the process of matching and synthesizing the acquired two-viewpoint images described in step 201: the image data frames captured by the two camera hardware devices are matched, synthesized, sent to the preview display buffer, and finally shown on the application interface.
It should be noted that, for single-frame photography, the naked-eye display of the two-view stereoscopic image is achieved by the steps above; for video recording, after the stereoscopic image synthesis of step 201, the data is converted back to YUV format for easy storage and the image data is passed to the Android video recorder subsystem (VideoRecorder) for encoding, after which step 202 is performed.
FIG. 4 is a schematic diagram of the structure of a two-view stereoscopic image synthesis system according to an embodiment of the present invention. As shown in FIG. 4, it includes at least an acquisition unit 410, a processing unit 420, and a display unit 430, wherein:
the acquisition unit 410 is configured to acquire two-viewpoint images through dual MIPI interfaces.
The acquisition unit 410 includes two rear cameras with MIPI interfaces, namely the first camera 411 and the second camera 412 in FIG. 4, configured to capture the left and right views respectively. The two cameras have nearly identical optical, geometric, and imaging characteristics and are arranged in the horizontal direction; the distance between them may be 35 mm. They are mounted on different I2C buses, interact with the memory and the central processor over independent data lines, and use the timestamp of each image frame for frame synchronization. The camera chip may be the OV5640 from Omnivision.
Optionally, the acquisition unit 410 further includes a camera driver module 413 configured to perform driver processing on the two acquired views and output them to the processing unit 420. The camera driver module 413 may be implemented with the V4L2 video driver framework.
The processing unit 420 is configured to match and synthesize the acquired two-viewpoint images, generate two-view stereoscopic image data, and output it to the display unit 430. The processing unit 420 may be implemented with a Texas Instruments OMAP4-series processor chip.
The processing unit includes a preprocessing module 421, an extraction module 422, a matching module 423, and a synthesis module 424, wherein:
the preprocessing module 421 is configured to register a dedicated buffer in memory for the camera preview display; acquire single frames of the left and right video streams and store them in the dedicated buffers; perform software synchronization using the frame data timestamps; convert the acquired YUV data to RGB; and apply image smoothing and size transformation to the format-converted left and right views before outputting them to the extraction module 422;
the extraction module 422 is configured to extract features from the preprocessed left and right views using the SIFT feature matching algorithm, generating 32-dimensional SIFT feature descriptors;
the matching module 423 is configured to match the extracted features of the left and right views using the SIFT feature matching algorithm, take the point in the right view with the smallest Euclidean distance as the matching point of the current left-view SIFT keypoint, and record the coordinates of the matched point pairs; and
the synthesis module 424 is configured to arrange the pixels of the left and right views according to a preset pixel arrangement, generating one frame of stereoscopic image data displayable under the grating.
Optionally, a rejection module 425 is further included, configured to remove mismatched feature points using the RANSAC algorithm and estimate the left/right-view pixel coordinate mapping model.
Optionally, an occlusion repair module 426 is further included, configured to, when the estimated left/right-view pixel coordinate mapping model indicates that an occluded region exists, correct the region of one view occluded by a foreign object using the gray values of the corresponding pixels in the other view, thereby repairing the occluded region.
Optionally, a noise repair module 427 is further included, configured to detect noise in the left and right views using a median filter, mark the noise points, and output the result to the synthesis module 424.
The display unit 430 is configured to present the two-view stereoscopic image to the naked eye through a stereoscopic LED display with a front-mounted slit grating.
Those of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented as computer program flows; the computer program may be stored in a computer-readable storage medium and executed on a corresponding hardware platform (such as a system, apparatus, or device), and when executed it includes one of, or a combination of, the steps of the method embodiments.
Optionally, all or some of the steps of the above embodiments may also be implemented with integrated circuits; these steps may each be fabricated as an individual integrated circuit module, or several of the modules or steps may be fabricated as a single integrated circuit module.
The devices/function modules/functional units in the above embodiments may be implemented with general-purpose computing devices; they may be concentrated on a single computing device or distributed across a network of multiple computing devices.
When the devices/function modules/functional units in the above embodiments are implemented as software function modules and sold or used as independent products, they may be stored in a computer-readable storage medium. The computer-readable storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
Industrial Applicability
In the embodiments of the present invention, the SIFT feature matching algorithm is adopted; matching the extracted features of the left and right views copes more effectively with the abrupt image changes, such as sudden focal-length changes, frequently encountered when shooting with mobile terminals, enhancing matching stability and noise resistance. In addition, by determining the pixel coordinate mapping model between the left and right views of the current scene, performing occlusion repair and denoising on low-quality single-viewpoint stereoscopic material according to that model, and then generating the two-view stereoscopic image, the related-art problems of mediocre stereoscopic material quality and inefficient synthesis are solved. Through occlusion repair and noise repair, a mobile terminal can acquire, synthesize, and display high-quality stereoscopic images at a relatively fast speed even when a camera is partially blocked or the captured stereoscopic material is noisy.

Claims (18)

  1. A two-view stereoscopic image synthesis method, comprising:
    acquiring two-viewpoint images through dual Mobile Industry Processor Interface (MIPI) interfaces;
    matching and synthesizing the acquired two-viewpoint images to generate two-view stereoscopic image data; and
    presenting the two-view stereoscopic image to the naked eye through a stereoscopic light-emitting diode (LED) display with a front-mounted slit grating.
  2. The two-view stereoscopic image synthesis method according to claim 1, wherein, after the step of acquiring the two-viewpoint images through the dual MIPI interfaces, the method further comprises: performing video driver processing on the acquired two-viewpoint images.
  3. The two-view stereoscopic image synthesis method according to claim 1 or 2, wherein matching and synthesizing the acquired two-viewpoint images comprises:
    extracting the data of each frame of the left and right views of the acquired two-viewpoint images using two preview threads executing in parallel, and performing matching and synthesis.
  4. The two-view stereoscopic image synthesis method according to claim 3, wherein performing the matching and synthesis comprises:
    registering a dedicated buffer in memory for the camera preview display; acquiring single frames of the left and right video streams from the acquired two-viewpoint images and storing them in the dedicated buffers;
    synchronizing the obtained left and right single-frame video data using the frame data timestamps;
    converting the acquired YUV data to RGB;
    applying image smoothing and size transformation to the format-converted left and right views;
    extracting and matching the features of the smoothed and resized left and right views using the Scale-Invariant Feature Transform (SIFT) feature matching algorithm; and
    arranging the pixels of the left and right views according to a preset pixel arrangement to generate one frame of stereoscopic image data displayable under the grating.
  5. The two-view stereoscopic image synthesis method according to claim 4, wherein, after the step of extracting and matching, using the SIFT feature matching algorithm, the features of the left and right views that have undergone image smoothing and size transformation, the method further comprises: removing mismatched feature points from the matched features of the left and right views using the Random Sample Consensus (RANSAC) algorithm.
  6. The two-view stereoscopic image synthesis method according to claim 4, wherein, in the step of arranging the pixels of the left and right views according to the preset pixel arrangement to generate one frame of stereoscopic image data displayable under the grating, the pixel arrangement is: taking vertical columns as the unit, the first column of the composite image carries the first column of pixels of the left view, and the second column of the composite image carries the first column of pixels of the right view; the third column of the composite image carries the second column of pixels of the left view, and the fourth column of the composite image carries the second column of pixels of the right view; and so on, until the pixels of the left and right views are completely arranged into the composite image.
  7. The two-view stereoscopic image synthesis method according to claim 4 or 5, wherein, before generating the frame of stereoscopic image data displayable under the grating, the method further comprises: verifying whether the left and right views are occluded, and if occlusion exists, repairing the occluded region.
  8. The two-viewpoint stereoscopic image synthesis method according to claim 7, wherein repairing the occluded region comprises: correcting the region of a view that is occluded by a foreign object using the gray values of the corresponding pixels in the other view.
  9. The two-viewpoint stereoscopic image synthesis method according to claim 7, further comprising: detecting noise in the occlusion-repaired left and right views using a median filter and labeling the noise points;
    performing noise-point repair on the points determined to be noise.
  10. The two-viewpoint stereoscopic image synthesis method according to claim 9, wherein performing the noise repair comprises: correcting the gray value of each pixel within a confirmed noise point using the gray value of the corresponding pixel in the other view.
  11. A two-viewpoint stereoscopic image synthesis system, comprising an acquisition unit, a processing unit, and a display unit, wherein:
    the acquisition unit is configured to capture dual-viewpoint images through dual MIPI interfaces;
    the processing unit is configured to match and synthesize the captured dual-viewpoint images, generate two-viewpoint stereoscopic image data, and output the data to the display unit; and
    the display unit is configured to provide naked-eye display of the two-viewpoint stereoscopic image through a slit-grating front-mounted LED stereoscopic display.
  12. The two-viewpoint stereoscopic image synthesis system according to claim 11, wherein the acquisition unit comprises two rear cameras with MIPI interfaces, configured to capture the left and right views respectively;
    the two cameras are mounted on separate I2C buses, exchange data with the memory and the central processor over independent data lines, and use the timestamp of each frame for frame synchronization.
  13. The two-viewpoint stereoscopic image synthesis system according to claim 12, wherein the acquisition unit further comprises a camera driver module configured to perform driver processing on the two captured views and output them to the processing unit.
  14. The two-viewpoint stereoscopic image synthesis system according to claim 11, wherein the processing unit comprises a preprocessing module, an extraction module, a matching module, and a synthesis module, wherein:
    the preprocessing module is configured to register dedicated buffers in memory for the camera preview display; acquire single-frame data of the left and right video streams and store each stream in its dedicated buffer; perform software synchronization using the frame-data timestamps; convert the captured YUV-format data into RGB format; and perform image smoothing and size transformation on the format-converted left and right views before outputting them to the extraction module;
    the extraction module is configured to extract the features of the preprocessed left and right views using the SIFT feature matching algorithm and generate 32-dimensional SIFT feature descriptors;
    the matching module is configured to match the extracted features of the left and right views using the SIFT feature matching algorithm, take the point in the right view with the smallest Euclidean distance as the matching point of the current left-view SIFT keypoint, and record the coordinate information of each matched point pair; and
    the synthesis module is configured to arrange the pixels of the left and right views according to a preset pixel arrangement and generate one frame of stereoscopic image data displayable under the grating.
  15. The two-viewpoint stereoscopic image synthesis system according to claim 14, wherein the processing unit further comprises a culling module configured to remove mismatched feature points using the RANSAC algorithm and estimate a left-right view pixel-coordinate mapping model.
  16. The two-viewpoint stereoscopic image synthesis system according to claim 15, wherein the processing unit further comprises an occlusion repair module configured to, when the estimated left-right view pixel-coordinate mapping model indicates that an occluded region exists, correct the region of the view occluded by a foreign object using the gray values of the corresponding pixels in the other view, so as to repair the occluded region.
  17. The two-viewpoint stereoscopic image synthesis system according to claim 16, wherein the processing unit further comprises a noise repair module configured to detect noise in the left and right views using a median filter, label the noise points, and output the result to the synthesis module.
  18. A computer-readable storage medium storing computer-executable instructions for performing the method of any one of claims 1 to 10.
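
The claims above compress several image-processing steps into patent language; the short Python sketches that follow unpack them one at a time. Each sketch is an illustrative reading of the claim it names, not the patented implementation, and every concrete value it introduces (thresholds, window sizes, pixel formats) is an assumption. First, the timestamp synchronization recited in claims 4 and 12: a minimal pairer that treats two frames as a stereo pair when their capture timestamps agree within a tolerance; the 16 ms figure is assumed, roughly one 60 Hz frame period.

```python
from collections import deque

class FramePairer:
    """Pair left/right frames whose capture timestamps nearly coincide."""

    def __init__(self, tolerance_s=0.016):
        self.tolerance_s = tolerance_s
        self.left = deque()    # queued (timestamp, frame) tuples, oldest first
        self.right = deque()

    def push(self, side, timestamp, frame):
        """Queue one frame; return a (left, right) pair if one is ready."""
        (self.left if side == "left" else self.right).append((timestamp, frame))
        return self._try_pair()

    def _try_pair(self):
        while self.left and self.right:
            t_l, f_l = self.left[0]
            t_r, f_r = self.right[0]
            if abs(t_l - t_r) <= self.tolerance_s:  # close enough: emit a pair
                self.left.popleft()
                self.right.popleft()
                return f_l, f_r
            # otherwise drop the older, unmatchable frame and retry
            (self.left if t_l < t_r else self.right).popleft()
        return None
```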
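The preprocessing chain of claim 4 (YUV-to-RGB conversion, image smoothing, size transformation) might look as follows with OpenCV. The NV21 layout is an assumption, since the claim only says YUV (NV21 is the common Android camera preview format), as are the kernel size and output size.

```python
import cv2
import numpy as np

def preprocess(nv21_bytes, src_w, src_h, out_size=(640, 360)):
    """Convert, smooth and resize one preview frame (claim 4)."""
    yuv = np.frombuffer(nv21_bytes, dtype=np.uint8).reshape(src_h * 3 // 2, src_w)
    rgb = cv2.cvtColor(yuv, cv2.COLOR_YUV2RGB_NV21)  # YUV -> RGB conversion
    rgb = cv2.GaussianBlur(rgb, (5, 5), 0)           # smoothing (assumed kernel)
    return cv2.resize(rgb, out_size)                 # size transformation
```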
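Claims 4, 5, 14 and 15 together describe SIFT feature extraction, nearest-Euclidean-distance matching, and RANSAC culling that also yields a left-right pixel-coordinate mapping model. A sketch with stock OpenCV (4.4 or later) follows; note that OpenCV's SIFT produces 128-dimensional descriptors whereas claim 14 recites 32 dimensions, the 3-pixel RANSAC reprojection threshold is an assumed value, and a homography is used as the mapping model even though it is exact only for planar or distant scenes.

```python
import cv2
import numpy as np

def match_views(left_rgb, right_rgb):
    """SIFT matching with RANSAC culling (claims 4, 5, 14, 15)."""
    sift = cv2.SIFT_create()
    gray_l = cv2.cvtColor(left_rgb, cv2.COLOR_RGB2GRAY)
    gray_r = cv2.cvtColor(right_rgb, cv2.COLOR_RGB2GRAY)
    kp_l, des_l = sift.detectAndCompute(gray_l, None)
    kp_r, des_r = sift.detectAndCompute(gray_r, None)

    # for each left-view keypoint, take the right-view descriptor at the
    # smallest Euclidean (L2) distance as its match, as claim 14 recites
    matches = cv2.BFMatcher(cv2.NORM_L2).match(des_l, des_r)

    src = np.float32([kp_l[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_r[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # RANSAC culls mismatched pairs and estimates a homography that stands
    # in for the "pixel coordinate mapping model" of claim 15
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    if H is None:
        return None, []
    inliers = [m for m, keep in zip(matches, mask.ravel()) if keep]
    return H, inliers
```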
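The column arrangement of claim 6 maps directly onto array slicing: composite columns 1, 3, 5, ... (0, 2, 4, ... zero-based) take the left view's columns in order, and composite columns 2, 4, 6, ... take the right view's.

```python
import numpy as np

def interleave_columns(left, right):
    """Column-wise arrangement of claim 6 for a slit-grating display."""
    assert left.shape == right.shape, "views must match in size"
    h, w = left.shape[:2]
    out = np.empty((h, 2 * w) + left.shape[2:], dtype=left.dtype)
    out[:, 0::2] = left    # composite columns 1, 3, 5, ... from the left view
    out[:, 1::2] = right   # composite columns 2, 4, 6, ... from the right view
    return out
```

On an actual slit-grating panel, the grating pitch fixes which set of columns each eye sees, so swapping the two assignments flips the left and right views.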
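Claims 7, 8 and 16 repair occluded regions with gray values from the other view but leave the occlusion test itself open. The sketch below assumes a simple photometric test: a left-view pixel is flagged when it disagrees strongly with the right-view pixel that the mapping model projects onto it. Both the test and the 40-gray-level threshold are assumptions, not claim language.

```python
import cv2
import numpy as np

def repair_occlusion(left_gray, right_gray, H, diff_thresh=40):
    """Fill occluded left-view pixels from the right view (claim 8)."""
    h, w = left_gray.shape
    # bring the right view into left-view coordinates via the mapping model;
    # H maps left points to right points, so warp with its inverse
    warped = cv2.warpPerspective(right_gray, np.linalg.inv(H), (w, h))
    occluded = cv2.absdiff(left_gray, warped) > diff_thresh  # assumed test
    repaired = left_gray.copy()
    repaired[occluded] = warped[occluded]  # gray value from the other view
    return repaired
```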
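Finally, the median-filter noise handling of claims 9, 10 and 17: pixels that deviate sharply from their neighborhood median are labeled as noise, then overwritten with the gray value of the corresponding pixel from the other view, mirroring the occlusion repair. The 3x3 window and the 30-gray-level threshold are assumed values.

```python
import cv2

def repair_noise(view_gray, other_view_warped, noise_thresh=30):
    """Median-filter noise labeling and cross-view repair (claims 9, 10)."""
    median = cv2.medianBlur(view_gray, 3)  # 3x3 window (assumed size)
    noise_mask = cv2.absdiff(view_gray, median) > noise_thresh
    repaired = view_gray.copy()
    # overwrite each labeled noise pixel with the gray value of the
    # corresponding pixel from the other (already warped) view
    repaired[noise_mask] = other_view_warped[noise_mask]
    return repaired, noise_mask
```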
PCT/CN2015/082557 2014-09-22 2015-06-26 Two-viewpoint stereoscopic image synthesizing method and system WO2016045425A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410489839.6 2014-09-22
CN201410489839.6A CN105430368A (en) 2014-09-22 2014-09-22 Two-viewpoint stereo image synthesizing method and system

Publications (1)

Publication Number Publication Date
WO2016045425A1

Family

ID=55508266

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/082557 WO2016045425A1 (en) 2014-09-22 2015-06-26 Two-viewpoint stereoscopic image synthesizing method and system

Country Status (2)

Country Link
CN (1) CN105430368A (en)
WO (1) WO2016045425A1 (en)


Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105913474A (en) * 2016-04-05 2016-08-31 清华大学深圳研究生院 Binocular three-dimensional reconstruction device and three-dimensional reconstruction method thereof, and Android application
CN106097289B (en) * 2016-05-30 2018-11-27 天津大学 A kind of stereo-picture synthetic method based on MapReduce model
JP6816769B2 (en) * 2016-07-22 2021-01-20 ソニー株式会社 Image processing equipment and image processing method
CN106643671B (en) * 2016-12-01 2019-04-09 江苏省测绘工程院 A kind of underwater cloud denoising method based on airborne LiDAR sounding system
CN107404644A (en) * 2017-07-27 2017-11-28 深圳依偎控股有限公司 It is a kind of based on the live display methods of double 3D for taking the photograph collection and system
CN107786866B (en) * 2017-09-30 2020-05-19 深圳睛灵科技有限公司 Binocular vision image synthesis system and method
CN113115087B (en) * 2021-03-22 2022-07-12 西安交通大学 Wireless updated content U disk and implementation method thereof
CN113673648A (en) * 2021-08-31 2021-11-19 云南昆钢电子信息科技有限公司 Unmanned electric locomotive two-dimensional code positioner

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102157112A (en) * 2011-04-07 2011-08-17 黑龙江省四维影像数码科技有限公司 Seamless splicing separate LED free stereo display screen
CN102385816A (en) * 2011-11-22 2012-03-21 吉林大学 Manufacture method of slit grating for LED (Light Emitting Display) screen naked-eye stereo display
CN102572482A (en) * 2012-01-06 2012-07-11 浙江大学 3D (three-dimensional) reconstruction method for stereo/multi-view videos based on FPGA (field programmable gata array)
US20120194512A1 (en) * 2011-01-31 2012-08-02 Samsung Electronics Co., Ltd. Three-dimensional image data display controller and three-dimensional image data display system
CN103108195A (en) * 2011-11-10 2013-05-15 鸿富锦精密工业(深圳)有限公司 Device capable of shooting in stereoscopic mode
CN203444715U (en) * 2013-08-13 2014-02-19 北京乐成光视科技发展有限公司 LED display screen used for naked eye 3D display
CN103995361A (en) * 2014-06-17 2014-08-20 上海新视觉立体显示科技有限公司 Naked eye 3D display pixel unit and multi-view naked eye 3D image display device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103345736B (en) * 2013-05-28 2016-08-31 天津大学 A kind of virtual viewpoint rendering method

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11526820B2 (en) 2014-12-09 2022-12-13 Connectwise, Llc Systems and methods for interfacing between a sales management system and a project planning system
CN108534091A (en) * 2018-05-10 2018-09-14 华域视觉科技(上海)有限公司 Car light and automobile with three-dimensional lighting effect
CN110889814A (en) * 2019-11-21 2020-03-17 上海无线电设备研究所 Visible light image histogram enhancement method and device based on Sysgen
CN110889814B (en) * 2019-11-21 2024-04-26 上海无线电设备研究所 Visible light image histogram enhancement method and device based on Sysgen
CN117834844A (en) * 2024-01-09 2024-04-05 国网湖北省电力有限公司荆门供电公司 Binocular stereo matching method based on feature correspondence

Also Published As

Publication number Publication date
CN105430368A (en) 2016-03-23

Similar Documents

Publication Publication Date Title
WO2016045425A1 (en) Two-viewpoint stereoscopic image synthesizing method and system
US9948919B2 (en) Stereoscopic 3D camera for virtual reality experience
US10511787B2 (en) Light-field camera
CN102917235B (en) Image processing apparatus and image processing method
JP5814692B2 (en) Imaging apparatus, control method therefor, and program
CN203233507U (en) Video signal processing equipment
CN101883215A (en) Imaging device
Schmeing et al. Depth image based rendering: A faithful approach for the disocclusion problem
CN105635720A (en) Stereo vision camera with double-lens single sensor
WO2012068724A1 (en) Three-dimensional image acquisition system and method
TWI584050B (en) Panoramic stereoscopic image synthesis method, apparatus and mobile terminal
US20120105593A1 (en) Multi-view video and still 3d capture system
JP2009175866A (en) Stereoscopic image generation device, its method, and its program
JP2013115668A (en) Image processing apparatus, image processing method, and program
TW201445977A (en) Image processing method and image processing system
CN103488039A (en) 3D camera module and electronic equipment with 3D camera module
JP2004200973A (en) Apparatus and method of inputting simple stereoscopic image, program, and recording medium
CN201957179U (en) Stereoscopic display system based on digital micro-mirror device (DMD)
CN107071391B (en) A method of enhancing display 3D naked eye figure
CN101854559A (en) Multimode stereoscopic three-dimensional camera system
JP2012134885A (en) Image processing system and image processing method
CN207603821U (en) A kind of bore hole 3D systems based on cluster and rendering
CN212364739U (en) Holographic image display system
US20170048511A1 (en) Method for Stereoscopic Reconstruction of Three Dimensional Images
CN201733405U (en) Twin-lens 3D camera image processing system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15844638

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15844638

Country of ref document: EP

Kind code of ref document: A1