CN113436130A - Intelligent sensing system and device for unstructured light field - Google Patents

Intelligent sensing system and device for unstructured light field

Info

Publication number
CN113436130A
Authority
CN
China
Prior art keywords
module
light field
image
unstructured
parallax
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110978131.7A
Other languages
Chinese (zh)
Other versions
CN113436130B (en)
Inventor
方璐
戴琼海
张嘉凝
袁肖赟
毛适
赵严
温建伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202110978131.7A priority Critical patent/CN113436130B/en
Publication of CN113436130A publication Critical patent/CN113436130A/en
Application granted granted Critical
Publication of CN113436130B publication Critical patent/CN113436130B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/50 - Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 - 3D [Three Dimensional] image rendering
    • G06T15/005 - General purpose rendering architectures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 - Manipulating 3D models or images for computer graphics
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/70 - Determining position or orientation of objects or cameras
    • G06T7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/80 - Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20212 - Image combination
    • G06T2207/20221 - Image fusion; Image merging
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30244 - Camera pose

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Hardware Design (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Studio Devices (AREA)

Abstract

The invention provides an intelligent sensing system and device for an unstructured light field. The system comprises: a non-structural heterogeneous high-resolution imaging unit, composed of at least one global camera and a plurality of long-focus cameras, for fusing the data captured by the different heterogeneous image sensors in the unit to obtain a first fused image; a variable baseline light field module, composed of a plurality of non-structural heterogeneous high-resolution imaging units, for fusing the first fused images according to the variable baselines between different cameras to obtain a second fused image; and a camera array composed of four groups of ultra-wide-angle image sensors for fusing the second fused image obtained by the variable baseline light field module to obtain an annular panoramic image. The system has very good robustness and extensibility, can extract accurate depth information over a wide range in large scenes, and can seamlessly embed the high-resolution pictures of the flexibly placed unstructured local cameras into the panoramic image.

Description

Intelligent sensing system and device for unstructured light field
Technical Field
The invention relates to the technical field of virtual reality video acquisition, in particular to an intelligent sensing system and device for an unstructured light field.
Background
Three-dimensional vision is an important perception pathway for human beings: among the five human senses, vision accounts for 70%-80% of the information we receive, and roughly 50% of the brain's capacity is devoted to visual information and perception. Existing image acquisition schemes are two-dimensional and cannot recover the whole three-dimensional scene, whereas the light field completely represents the light rays in the three-dimensional world, and its complete three-dimensional perception capability can even exceed the human visual system. Researchers have proposed a series of systems and theories for light field acquisition, with classical design approaches including: 1) light field acquisition based on a Microlens Array; 2) light field acquisition based on a Camera Array; 3) light field acquisition based on Coded Masks.
The main drawbacks of the microlens-array-based light field acquisition scheme are a severe loss of image resolution at each light field viewpoint and small parallax between viewpoints; the main drawbacks of the camera-array-based scheme are a bulky and complex system, high hardware cost, strict requirements on camera synchronization accuracy, heavy data transmission pressure, and complex calibration between cameras in the array; the coded-mask-based method loses light signal intensity, has a very low imaging signal-to-noise ratio, and requires the captured light field to be restored by algorithms such as compressed sensing, which greatly reduces fidelity.
Therefore, it is difficult for existing light field systems to achieve ultra-wide-angle, ultra-high-definition, wide-range light field acquisition and perception that exceeds human visual perception capability.
Owing to the inherent contradiction between wide field of view and high resolution when recording video of anything from daily behaviors to complex working states in large scenes, the prior art lacks a light field acquisition method that generates virtual reality content with high robustness and high quality.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, an object of the invention is to provide an intelligent sensing system for an unstructured light field, which obtains ultra-wide-field, ultra-high-resolution images and videos through fusion imaging with multiple heterogeneous image sensors, and realizes wide-range three-dimensional depth sensing with a variable baseline accurate reconstruction technique. The system also breaks through the limitation that the image sensors of existing light field systems must follow a uniform distribution, and does not depend on a structured system architecture that requires fine calibration.
Furthermore, the invention breaks through the bottleneck that has long constrained the space-time bandwidth product of optical image sensor imaging, raises the data throughput of light field sensing from the internationally leading megapixel level to the hundred-million-pixel level, and also realizes real-time three-dimensional reconstruction of the wide-field, high-resolution dynamic light field.
The second object of the invention is to provide an unstructured light field intelligent sensing device.
In order to achieve the above object, an embodiment of a first aspect of the present invention provides an unstructured light field intelligent sensing system, configured to acquire light field video data, where the system includes:
the non-structural heterogeneous high-resolution imaging unit consists of at least one global camera and a plurality of long-focus cameras and is used for fusing data captured by different heterogeneous image sensors in the non-structural heterogeneous high-resolution imaging unit to obtain a first fused image;
the variable baseline light field module consists of a plurality of non-structural heterogeneous high-resolution imaging units and is used for fusing the first fused image according to variable baselines among different cameras to obtain a second fused image;
the camera array of the four-girdle three-dimensional panoramic acquisition module consists of four groups of super-wide angle image sensors and is used for fusing the second fused image obtained by the variable baseline light field module to obtain an girdle panoramic image.
In the unstructured light field intelligent sensing system of the embodiment of the invention, the non-structural heterogeneous high-resolution imaging unit consists of at least one global camera and a plurality of long-focus cameras and is used for fusing data captured by the different heterogeneous image sensors in the unit to obtain a first fused image; the variable baseline light field module consists of a plurality of non-structural heterogeneous high-resolution imaging units and is used for fusing the first fused image according to the variable baselines between different cameras to obtain a second fused image; the camera array of the four-ring-band stereo panoramic acquisition module consists of four groups of ultra-wide-angle image sensors and is used for fusing the second fused image obtained by the variable baseline light field module to obtain a ring-band panoramic image. Ultra-wide-field, ultra-high-resolution images and videos are obtained through fusion imaging with multiple heterogeneous image sensors, and wide-range three-dimensional depth perception is achieved through a variable baseline accurate reconstruction technique. The system also breaks through the limitation that the image sensors of existing light field systems must follow a uniform distribution, and does not depend on a structured system architecture that requires fine calibration. At the same time, the invention breaks through the bottleneck that has long constrained the space-time bandwidth product of optical image sensor imaging, raises the data throughput of light field sensing from the internationally leading megapixel level to the hundred-million-pixel level, and realizes real-time three-dimensional reconstruction of the wide-field, high-resolution dynamic light field.
In addition, the unstructured light field intelligent sensing system according to the above embodiment of the present invention may also have the following additional technical features:
further, in one embodiment of the present invention, the system further comprises:
a rendering module for implementing annulus panoramic free viewpoint rendering according to a layered rendering strategy, the rendering module comprising:
an original layer module to render the 3D video above a threshold resolution using an original layer;
the fuzzy layer module is used for processing the dragging problem in the picture by adopting a fuzzy layer;
and the dynamic layer module is used for adopting the dynamic layer to perform dynamic foreground rendering.
Further, in an embodiment of the present invention, the system further includes:
the device comprises a support piece, a camera cloud platform and a frame;
the support is used for supporting the four-ring-band stereoscopic panoramic acquisition module and the variable baseline light field module;
the camera gimbal is used for supporting the different heterogeneous image sensors in the non-structural heterogeneous high-resolution imaging unit;
and the frame is used for supporting the unstructured light field intelligent sensing system through a connecting piece.
Further, in an embodiment of the invention, for the images in the video data of the global camera in the non-structural heterogeneous high-resolution imaging unit, a feature-based stitching algorithm estimates the internal and external parameters of the global camera to obtain a preset position of the global camera, and the image of each local camera in the non-structural heterogeneous high-resolution imaging unit is embedded into the preset position of the global camera using unstructured embedding.
Further, in an embodiment of the present invention, the variable baseline light field module is further configured to:
carrying out girdle stereo matching on the girdle images, and extracting a feature map from the girdle images through a neural network;
constructing a matching cost value according to the feature map to obtain a 4D parallax cost value;
obtaining the matching cost under each candidate parallax according to the 4D parallax cost amount;
performing cost aggregation on the cost matching result to obtain an optimized cost matching result;
and determining the parallax of each position from the optimized cost matching result to obtain an annular zone parallax map.
In order to achieve the above object, a second embodiment of the present invention provides an unstructured light field intelligent sensing apparatus, which includes:
the first fusion module is used for fusing data captured by different heterogeneous image sensors to obtain a first fusion image;
the second fusion module is used for fusing the first fusion image according to variable baselines among different cameras to obtain a second fusion image;
and the panoramic image fusion module is used for fusing the second fusion image to obtain an annular panoramic image.
In the unstructured light field intelligent sensing device provided by the embodiment of the invention, the first fusion module fuses the data captured by different heterogeneous image sensors to obtain a first fused image; the second fusion module fuses the first fused image according to the variable baselines between different cameras to obtain a second fused image; and the annular panoramic image fusion module fuses the second fused image to obtain an annular panoramic image. The invention obtains ultra-wide-field, ultra-high-resolution images and videos through fusion imaging with multiple heterogeneous image sensors, and realizes wide-range three-dimensional depth perception with a variable baseline accurate reconstruction technique. The invention also breaks through the limitation that the image sensors of existing light field systems must follow a uniform distribution, and does not depend on a structured system architecture that requires fine calibration. At the same time, the invention breaks through the bottleneck that has long constrained the space-time bandwidth product of optical image sensor imaging, raises the data throughput of light field sensing from the internationally leading megapixel level to the hundred-million-pixel level, and realizes real-time three-dimensional reconstruction of the wide-field, high-resolution dynamic light field.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic structural distribution diagram of an unstructured light field intelligent sensing system according to one embodiment of the invention;
FIG. 2 is a block diagram of an unstructured light field intelligent sensing system according to one embodiment of the invention;
FIG. 3 is a schematic structural diagram of an unstructured light field intelligent sensing system according to one embodiment of the invention;
FIG. 4 is a schematic structural diagram of a non-structural heterogeneous high resolution imaging unit according to one embodiment of the present invention;
FIG. 5 is a schematic diagram of a variable baseline light field module, according to one embodiment of the present invention;
fig. 6 is a schematic structural diagram of a four-girdle stereo panoramic acquisition module according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of intelligent sensing of unstructured light fields, according to one embodiment of the invention;
fig. 8 is a schematic structural diagram of an unstructured light field intelligent sensing device according to an embodiment of the invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The following describes an unstructured light field intelligent sensing system and device according to an embodiment of the invention with reference to the drawings.
The system of the invention relies on a novel, mixed spherically distributed unstructured heterogeneous camera array for billion-pixel-level, wide-range, long-distance 3D panoramic VR photography, achieves wide-range depth perception, and has imaging capability with both high resolution and a wide field of view. In addition, a dense camera arrangement is not needed: under a sparse camera arrangement, flexible placement of high-resolution regions is realized through unstructured local camera compensation, and the cost is greatly reduced compared with that of a traditional camera array. See fig. 1 and 2.
Fig. 3 is a schematic structural diagram of an unstructured light field intelligent sensing system according to an embodiment of the present invention.
As shown in fig. 3, the system 10 includes: the system comprises a non-structural heterogeneous high-resolution imaging unit 100, a variable baseline light field module 200 and a four-ring band stereo panoramic acquisition module 300.
The non-structural heterogeneous high-resolution imaging unit 100 is composed of at least one global camera and a plurality of telephoto cameras, and is configured to fuse data captured by different heterogeneous image sensors in the non-structural heterogeneous high-resolution imaging unit to obtain a first fused image.
Specifically, as shown in fig. 4, the non-structural heterogeneous high-resolution imaging unit 100 is composed of at least one wide-field global camera and a plurality of high-resolution telephoto cameras, so that the system has imaging capability with both high resolution and wide field.
The variable baseline light field module 200 is composed of a plurality of non-structural heterogeneous high-resolution imaging units 100, and is configured to fuse the first fused image according to the variable baseline between different cameras to obtain a second fused image.
Specifically, as shown in fig. 5, the variable baseline light field module 200 is composed of a plurality of non-structural heterogeneous high-resolution imaging units 100, and can be freely combined into a light field imaging system with different baselines, so as to realize wide-range depth perception.
It can be understood that, for images in the video data of the global camera in the non-structural heterogeneous high-resolution imaging unit 100, the feature-based stitching algorithm estimates the internal and external parameters of the global camera to obtain the corresponding position of the global camera, and embeds the images of the local cameras in the non-structural heterogeneous high-resolution imaging unit 100 into the corresponding position of the global camera using the unstructured embedding.
Further, the invention relies on a novel hybrid spherically distributed unstructured heterogeneous camera array for billion-pixel-level, wide-range, long-distance 3D panoramic VR photography. To enable multi-scale, unstructured and extensible VR content capture, the parameters of all cameras in the system need to be designed accordingly. To capture a large-scene VR scene, each ring-band camera employs a custom lens with a FoV of 360 degrees horizontally and 60 degrees vertically. For each non-structural heterogeneous imaging unit 100, the wide-field global camera in the unit employs a 5 mm lens with a 1/1.7" CMOS sensor to provide a sufficient FoV, while the high-definition local cameras in the unit employ 40 mm lenses with 1/1.8" CMOS sensors to capture high-resolution local details. Notably, the focal length of the local camera lens is adjustable to accommodate various VR scenes.
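As a rough illustration of these optics choices, the horizontal field of view implied by a focal length and sensor width can be estimated with the pinhole model. The sketch below is not part of the invention, and the sensor widths (about 7.6 mm for a 1/1.7" sensor and 7.2 mm for a 1/1.8" sensor) are approximate assumptions:

import math

def horizontal_fov_deg(sensor_width_mm: float, focal_length_mm: float) -> float:
    """Pinhole-model horizontal field of view in degrees."""
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_length_mm)))

print(f'global camera  (5 mm, 1/1.7"): {horizontal_fov_deg(7.6, 5.0):.1f} deg')   # wide FoV
print(f'local  camera (40 mm, 1/1.8"): {horizontal_fov_deg(7.2, 40.0):.1f} deg')  # narrow, high detail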
In terms of mechanical layout, a lightweight aluminum alloy stereo camera frame is adopted, and the frame members are joined by connectors made of thermally stable polylactic acid (PLA). A spherical support for the variable baseline light field module 200, also made of thermally stable PLA, supports each non-structural heterogeneous imaging unit 100. The support of each non-structural heterogeneous imaging unit 100 is a hexagonal lightweight aluminum alloy frame that provides several additional mounting anchor points for camera gimbals, and every camera other than those of the four-ring-band camera system is connected to the system 10 through a gimbal.
Compared with the traditional camera array for light field acquisition, the invention does not need a dense camera arrangement. Under a sparse camera arrangement, flexible placement of high-resolution regions is realized through unstructured local camera compensation, and the cost is greatly reduced compared with that of a traditional camera array.
The camera array of the four-zonal stereo panoramic acquisition module 300 is composed of four groups of super wide-angle sensors, and is used for fusing the second fused image obtained by the variable baseline light field module 200 to obtain an zonal panoramic image.
Further, in order to improve the resolution and detail of the panorama fused from sensors at different levels, an unstructured embedding scheme is used to warp the pictures of the local cameras in all the non-structural heterogeneous imaging units 100 to the positions of the global cameras of their corresponding imaging units. The warping field is obtained by first finding matching points between the global and local pictures with a cross-resolution matching algorithm and then estimating a grid-based multi-homography model. In addition, a linear Monge-Kantorovitch (MKL) solution is applied to map the color patterns of the local cameras to the global panorama to achieve local-global color consistency. By applying the same technical scheme, the fused images of all the non-structural heterogeneous imaging units 100 can be embedded into the panoramic image of the four-ring-band imaging module 300.
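The linear Monge-Kantorovitch mapping mentioned above has a closed-form solution between the colour statistics of two images. The following is a minimal illustrative sketch (not the exact implementation of the invention), assuming local_rgb and global_rgb are float RGB arrays covering roughly the same scene content:

import numpy as np

def _sqrtm_psd(mat):
    """Matrix square root of a symmetric positive semi-definite 3x3 matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)
    return (vecs * np.sqrt(vals)) @ vecs.T

def mkl_color_transfer(local_rgb, global_rgb):
    src = local_rgb.reshape(-1, 3)
    ref = global_rgb.reshape(-1, 3)
    mu_s, mu_r = src.mean(axis=0), ref.mean(axis=0)
    cov_s = np.cov(src, rowvar=False) + 1e-8 * np.eye(3)
    cov_r = np.cov(ref, rowvar=False) + 1e-8 * np.eye(3)

    cov_s_half = _sqrtm_psd(cov_s)
    cov_s_half_inv = np.linalg.inv(cov_s_half)
    # Closed-form linear MKL transport matrix between the two Gaussian colour models.
    T = cov_s_half_inv @ _sqrtm_psd(cov_s_half @ cov_r @ cov_s_half) @ cov_s_half_inv

    out = (src - mu_s) @ T.T + mu_r
    return out.reshape(local_rgb.shape)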
The invention provides a variable baseline accurate reconstruction algorithm and its application in the system: the algorithm can adaptively adjust the light field baseline length according to the scene and realize accurate wide-range three-dimensional depth reconstruction; in addition, the details of the three-dimensional depth reconstruction can be further optimized using the images formed by the high-resolution local cameras in each non-structural heterogeneous imaging unit 100.
As shown in fig. 6, the four-ring-band stereo panorama acquisition module 300 finally generates upper and lower panoramic stereo images with a large parallax. From the two stereo ring-band images, feature maps are extracted with a shared-weight feature pyramid through a deformable neural network. The calculation of the ring-band parallax is characterized by using a deformable neural network for feature extraction: unlike a conventional convolutional neural network, which extracts features with ordinary convolutions whose 3x3 kernels sample points in a fixed, regular square pattern, the deformable convolution kernel is not a regular 3x3 square; instead, each sampling point is shifted by an offset learned by an additional convolutional layer, and features are extracted from the image with these irregularly arranged kernels. Because the ring-band image is more strongly distorted than a conventional image, the deformable convolutional neural network can extract its features well.
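For illustration only, a deformable-convolution feature block of the kind described above can be sketched with torchvision's DeformConv2d, where a plain convolution predicts the learned sampling offsets; the channel counts are assumptions, not those of the network actually used:

import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformFeatureBlock(nn.Module):
    def __init__(self, in_ch=3, out_ch=32, k=3):
        super().__init__()
        # A plain conv predicts a per-pixel 2D offset for every kernel sample point.
        self.offset_pred = nn.Conv2d(in_ch, 2 * k * k, kernel_size=k, padding=k // 2)
        self.deform_conv = DeformConv2d(in_ch, out_ch, kernel_size=k, padding=k // 2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        offsets = self.offset_pred(x)          # learned, irregular sampling offsets
        return self.relu(self.deform_conv(x, offsets))

features = DeformFeatureBlock()(torch.randn(1, 3, 64, 256))  # e.g. a ring-band image crop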
Specifically, a matching cost volume is constructed from the feature maps to obtain a 4D parallax cost volume; the matching cost under each candidate parallax is obtained from the 4D parallax cost volume; cost aggregation is performed on the cost matching result to obtain an optimized cost matching result; and the parallax of each position is determined from the optimized cost matching result using a differentiable soft-argmin operation, thereby obtaining a panoramic ring-band parallax map.
In one embodiment of the invention, to reduce the complexity of processing large feature maps, a coarse-to-fine strategy is used to extract four feature maps of decreasing spatial resolution. A skip-connected encoder-decoder structure is then adopted to fuse the feature maps of different levels, and an SPP (spatial pyramid pooling) structure is adopted to enlarge the receptive field and the search range.
After feature extraction is completed, the extracted feature maps are used to construct the matching cost volume. The selected candidate disparity range is 0-384 pixels, so a matching cost map must be constructed for each candidate disparity. Specifically, to construct the cost volume under candidate disparity x, all pixels of the feature maps extracted from the right image are shifted by x pixels in the disparity matching direction, and the matching cost under that candidate disparity is then built from a distance metric between the left and right feature maps at that disparity level, forming a 4D (channel C, height H, width W, disparity D) disparity cost volume. There are cost volumes at four different scales, 1/8, 1/16, 1/32 and 1/64, corresponding to the four levels of the coarse-to-fine feature pyramid. The cost volume reflects the matching cost under each candidate disparity.
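A minimal sketch of this cost-volume construction is given below, assuming left/right feature maps from the shared-weight extractor and using a simple feature-difference metric (the actual distance metric of the invention is not specified here):

import torch

def build_cost_volume(feat_left, feat_right, max_disp=384, scale=8):
    B, C, H, W = feat_left.shape
    D = max_disp // scale                      # candidate disparities at this pyramid level
    cost = feat_left.new_zeros(B, C, D, H, W)  # 4D volume: channels x disparity x H x W
    for d in range(D):
        if d == 0:
            cost[:, :, d] = feat_left - feat_right
        else:
            # Shift the right features by d pixels along the disparity direction and
            # compare only where both views overlap.
            cost[:, :, d, :, d:] = feat_left[:, :, :, d:] - feat_right[:, :, :, :-d]
    return cost

vol = build_cost_volume(torch.randn(1, 32, 40, 80), torch.randn(1, 32, 40, 80))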
After the initial cost matching result is obtained, it is very sensitive to noise because only local correlation has been considered, and it cannot be used directly to compute the optimal parallax; it therefore needs further optimization, namely cost aggregation. Traditionally this problem is solved by optimization; in the neural network, the system performs cost aggregation on the preliminarily computed cost matching result with 3D convolutional layers, which can extract semantic information and aggregate matching costs to improve parallax quality. A stacked hourglass structure is used here to learn more semantic information so that the final result has the correct semantic structure.
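As an illustrative sketch only, a single 3D-convolution hourglass pass over the cost volume might look as follows; the real network stacks several such hourglasses, and the channel sizes here are assumptions:

import torch
import torch.nn as nn

class Hourglass3D(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.down1 = nn.Sequential(nn.Conv3d(ch, ch * 2, 3, stride=2, padding=1),
                                   nn.BatchNorm3d(ch * 2), nn.ReLU(inplace=True))
        self.down2 = nn.Sequential(nn.Conv3d(ch * 2, ch * 2, 3, stride=2, padding=1),
                                   nn.BatchNorm3d(ch * 2), nn.ReLU(inplace=True))
        self.up1 = nn.ConvTranspose3d(ch * 2, ch * 2, 3, stride=2, padding=1, output_padding=1)
        self.up2 = nn.ConvTranspose3d(ch * 2, ch, 3, stride=2, padding=1, output_padding=1)

    def forward(self, cost):
        d1 = self.down1(cost)
        d2 = self.down2(d1)
        u1 = self.up1(d2) + d1          # skip connection keeps fine matching detail
        return self.up2(u1) + cost      # residual over the raw cost volume

aggregated = Hourglass3D()(torch.randn(1, 32, 48, 40, 80))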
Further, as shown in fig. 5 and 6, the system 10 further includes: a support, a camera gimbal and a frame. The support is used for supporting the four-ring-band stereoscopic panorama acquisition module 300 and the variable baseline light field module 200; the camera gimbal is used in the non-structural heterogeneous high-resolution imaging unit 100 to support the different heterogeneous image sensors; and the frame supports the entire system 10 through connectors.
Further, the system 10 further includes: a rendering module for implementing annulus panoramic free viewpoint rendering according to a layered rendering strategy, the rendering module comprising:
an original layer module to render the 3D video above a threshold resolution using an original layer;
the fuzzy layer module is used for processing the dragging problem in the picture by adopting a fuzzy layer;
and the dynamic layer module is used for adopting the dynamic layer to perform dynamic foreground rendering.
Meanwhile, the process of intelligently fusing the data is a process of fusing a sharp image into a blurrier one, so that the fused image becomes sharper, and at the same time the system has wide-range depth perception.
According to the intelligent sensing system for the unstructured light field, which is provided by the embodiment of the invention, the unstructured heterogeneous high-resolution imaging unit is composed of at least one global camera and a plurality of long-focus cameras and is used for fusing data captured by different heterogeneous image sensors in the unstructured heterogeneous high-resolution imaging unit to obtain a first fused image; the variable baseline light field module consists of a plurality of non-structural heterogeneous high-resolution imaging units and is used for fusing the first fused image according to variable baselines among different cameras to obtain a second fused image; the camera array of the four-girdle stereo panoramic acquisition module consists of four groups of super wide-angle image sensors and is used for fusing the second fused image obtained by the variable baseline light field module to obtain an girdle panoramic image. Super wide view field ultrahigh resolution images and videos are obtained through multiple heterogeneous image sensor fusion imaging, and large-range three-dimensional depth perception is achieved through a variable baseline accurate reconstruction technology. Meanwhile, the system breaks through the limitation that the image sensor of the existing light field system follows uniform distribution, and does not depend on a structured system architecture needing fine calibration. Meanwhile, the invention breaks through the bottleneck of restricting the space-time bandwidth product of the imaging of the optical image sensor for a long time, improves the data flux of the light field sensing from the international highest million-level pixel to a hundred million-level pixel, and also realizes the real-time three-dimension of the wide-field high-resolution dynamic light field.
Next, an unstructured light field intelligent sensing apparatus according to one embodiment of the present invention is described with reference to the accompanying drawings.
Fig. 8 is a schematic structural diagram of an unstructured light field intelligent sensing device according to an embodiment of the invention.
As shown in fig. 8, the apparatus 20 includes: a first fusion module 400, a second fusion module 500, and an annular panoramic image fusion module 600.
The first fusion module 400 is configured to fuse data captured by different heterogeneous image sensors to obtain a first fusion image;
a second fusion module 500, configured to fuse the first fusion image according to variable baselines between different cameras to obtain a second fusion image;
and an annulus panoramic image fusion module 600, configured to fuse the second fusion image to obtain an annulus panoramic image.
Further, the apparatus 20 further comprises: and the layered rendering strategy module is used for realizing annular panoramic free viewpoint rendering according to the layered rendering strategy.
As shown in fig. 7, the device 20 collects, acquires and receives video data by controlling an unstructured camera array. Specifically, multiple heterogeneous image sensors perform fusion imaging: the RGB data captured by image sensors at different levels are intelligently fused to generate ultra-wide-field, ultra-high-definition images and videos; a variable baseline accurate reconstruction technique uses the variable baselines between different cameras to realize wide-range depth perception; and a layered rendering strategy realizes panoramic ring-band free viewpoint rendering. To present a high-resolution panoramic VR scene, a feature-based stitching algorithm is used to estimate the intrinsic and extrinsic parameters of each group of global cameras. In addition, to reduce the obvious artifacts caused by camera positioning errors and color inconsistency near the stitching seam boundary, a graph-cut is applied when computing the camera poses to estimate a seamless mask and eliminate the non-mask regions in the images. Finally, a linear Monge-Kantorovitch solution is used to achieve color consistency between cameras.
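A minimal sketch of the feature-based step, using standard OpenCV primitives rather than the invention's cross-resolution matcher and mesh-based multi-homography model, could match keypoints between two overlapping views and estimate a single homography for embedding:

import cv2
import numpy as np

def estimate_warp(img_global, img_local):
    sift = cv2.SIFT_create()
    kp_g, des_g = sift.detectAndCompute(cv2.cvtColor(img_global, cv2.COLOR_BGR2GRAY), None)
    kp_l, des_l = sift.detectAndCompute(cv2.cvtColor(img_local, cv2.COLOR_BGR2GRAY), None)

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des_l, des_g, k=2)
    good = [m for m, n in matches if m.distance < 0.7 * n.distance]  # Lowe ratio test

    src = np.float32([kp_l[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_g[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)

    h, w = img_global.shape[:2]
    return cv2.warpPerspective(img_local, H, (w, h))  # local view warped into the global frame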
It can be understood that, in conventional processing, one only needs to find the candidate disparity with the minimum matching cost at each position and take it as the disparity value at that position; however, such a non-differentiable operation cannot be realized in a neural network, so the invention obtains the disparity map from the cost volume with a differentiable soft-argmin operation. The probability of the disparity at each candidate value is computed from the prediction cost with softmax, and the predicted disparity is the sum of the candidate disparity values weighted by their probabilities, giving the disparity value at every point.
Using softmax on the predicted cost c_d, the probability of the disparity at each position taking each candidate disparity value is calculated, and the predicted disparity is the sum of the candidate disparity values weighted by their probabilities:

\hat{d} = \sum_{d=0}^{D_{max}} d \times \sigma(-c_d)

where \hat{d} is the predicted disparity, d is the true disparity, D_{max} is the maximum candidate disparity value, \sigma represents the softmax operation, and c_d is the cost of the disparity candidate value d. The loss function L_{s1} of the neural network is:

L_{s1} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{smooth}_{L_1}\left(d_i - \hat{d}_i\right)

where N is the number of ground-truth disparity values, \hat{d}_i is the predicted disparity, and d_i is the true disparity.
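A minimal sketch of this differentiable disparity regression, assuming a cost volume of shape (B, D, H, W) and a smooth-L1 supervision as in common stereo networks, is:

import torch
import torch.nn.functional as F

def soft_argmin(cost, max_disp=384):
    # cost: (B, D, H, W) matching cost per candidate disparity
    prob = F.softmax(-cost, dim=1)                            # lower cost -> higher probability
    disparities = torch.arange(max_disp, dtype=cost.dtype,
                               device=cost.device).view(1, -1, 1, 1)
    return (prob * disparities).sum(dim=1)                    # expected disparity per pixel

pred = soft_argmin(torch.randn(1, 384, 60, 120))
loss = F.smooth_l1_loss(pred, torch.rand(1, 60, 120) * 384)   # supervised against ground truth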
After the preliminary disparity is obtained, since the baseline length of the ring-band system and the focal length of the camera are known, disparity is converted to depth by the relation:

Z = \frac{b \cdot f}{d}

where b is the camera system baseline, f is the focal length of the camera system, d is the pixel disparity, and Z is the actual depth of the pixel. In this way a depth prior for each region in the scene is obtained, and this prior can better guide further depth estimation. Because the variable baseline light field consists of multiple groups of non-structural heterogeneous imaging units, light field imaging combinations with different baselines can be formed between different units, and wide-range depth perception is then achieved on the basis of the ring-band depth prior. As the disparity-depth relation above shows, for a fixed focal length and baseline the disparity is inversely proportional to the depth, so it is difficult to recover good three-dimensional depth information where the disparity is very small or very large and the matching is hard; for different disparity regions, a suitable combination of baseline and focal length must therefore be selected to recover depth accurately. In general, a disparity range of 10-60 pixels is most suitable for two-view disparity computation, so the ring-band depth prior can be used to determine the optimal baseline combination at each location and acquire the disparity there more accurately.
After the optimal baseline is determined, the exact disparity is obtained by the same steps as the zone depth calculation.
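For illustration, the disparity-depth relation and the baseline selection driven by the 10-60 pixel range can be sketched as follows; the candidate baselines and focal length are made-up example values:

def disparity_to_depth(d_pixels, baseline_m, focal_px):
    return baseline_m * focal_px / d_pixels          # Z = b * f / d

def pick_baseline(depth_prior_m, focal_px, candidate_baselines=(0.1, 0.2, 0.4, 0.8)):
    best, best_d = None, None
    for b in candidate_baselines:
        d = b * focal_px / depth_prior_m             # disparity this baseline would produce
        if 10.0 <= d <= 60.0 and (best is None or abs(d - 35.0) < abs(best_d - 35.0)):
            best, best_d = b, d                      # prefer disparities near mid-range
    return best

print(pick_baseline(depth_prior_m=8.0, focal_px=1200.0))     # baseline keeping disparity in 10-60 px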
Further, cross-scale mapping fusion optimization can be performed on the disparity using the non-structural heterogeneous high-resolution imaging unit. The depth acquired in variable baseline wide-range depth perception is a depth map at the viewpoint of the wide-field global camera of each non-structural heterogeneous high-resolution imaging unit, and this depth map can be further refined using the embedded high-definition local camera.
For the embedded local RGB image, a bilateral solver is adopted and the local disparity map is refined based on the structure of the high-resolution local RGB image. Assuming that the target disparity map is t and the per-pixel confidence map is c, an improved disparity map x is obtained by minimizing the following function:
\min_x \; \frac{\lambda}{2} \sum_{i,j} \hat{W}_{i,j} (x_i - x_j)^2 + \sum_i c_i (x_i - t_i)^2

where \hat{W} is the affinity (correlation) matrix obtained from the reference image in the YUV color space.
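A heavily simplified sketch of this refinement objective is shown below: it keeps the confidence-weighted data term and a colour-guided smoothness term, but replaces the bilateral-space affinity with a simple 4-neighbour affinity so that the problem reduces to one sparse linear solve. It illustrates the objective, not the solver actually used:

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def refine_disparity(target, confidence, guide, lam=4.0, sigma_c=8.0):
    H, W = target.shape
    n = H * W
    idx = np.arange(n).reshape(H, W)
    rows, cols, vals = [], [], []
    for (di, dj) in ((0, 1), (1, 0)):                        # right and down neighbours
        a = idx[:H - di, :W - dj].ravel()
        b = idx[di:, dj:].ravel()
        diff = guide[:H - di, :W - dj] - guide[di:, dj:]     # colour difference in the guide image
        w = np.exp(-(diff.reshape(len(a), -1) ** 2).sum(1) / (2 * sigma_c ** 2))
        rows += [a, b]; cols += [b, a]; vals += [w, w]
    Wm = sp.coo_matrix((np.concatenate(vals), (np.concatenate(rows), np.concatenate(cols))),
                       shape=(n, n))
    L = sp.diags(np.asarray(Wm.sum(1)).ravel()) - Wm         # graph Laplacian from affinities
    C = sp.diags(confidence.ravel())
    x = spsolve((C + lam * L).tocsr(), C @ target.ravel())   # normal equations (C + lambda*L) x = C t
    return x.reshape(H, W)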
After the RGB pictures and the corresponding disparity maps are obtained, an efficient three-layer rendering scheme is proposed for rendering the billion-pixel 3D video in real time.
A layered rendering policy module for implementing annulus panoramic free viewpoint rendering according to a layered rendering policy, wherein the layered rendering policy module comprises:
an original layer module to render the 3D video above a threshold resolution using an original layer;
the fuzzy layer module is used for processing the dragging problem in the picture by adopting a fuzzy layer;
and the dynamic layer module is used for adopting the dynamic layer to perform dynamic foreground rendering.
Wherein rendering the high-resolution 3D video using the original layer comprises: projecting the stitched disparity map onto three-dimensional coordinates to generate a background grid, and drawing the stitched panorama on the background grid:
[X_p, Y_p, Z_p]^T = R^{-1} K^{-1} z_p [u_p, v_p, 1]^T

where K and R represent the internal and external parameters of the camera, (u_p, v_p) is the pixel position of point p in the image plane, z_p is the pixel depth value, and (X_p, Y_p, Z_p) is the rendering position of pixel p.
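A minimal sketch of this back-projection, assuming a pinhole model with intrinsics K and rotation R (any translation folded into the pose) and a per-pixel depth map, is:

import numpy as np

def backproject(depth, K, R):
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T    # homogeneous pixels
    rays = np.linalg.inv(K) @ pix                                        # camera-space rays
    pts_cam = rays * depth.reshape(1, -1)                                # scale rays by depth
    pts_world = np.linalg.inv(R) @ pts_cam                               # rotate into the world frame
    return pts_world.T.reshape(H, W, 3)                                  # (X_p, Y_p, Z_p) per pixel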
For the area covered by the local camera, the mesh vertex density is increased at magnification to obtain better depth quality.
When rendering using a single layer mesh, stretched triangle artifacts are easily created at the depth edges when moving the viewpoint. To optimize these artifacts, we first tear the grid apart by removing the grid whose normal direction makes a large angle with the view direction.
The use of the blur layer to handle the dragging caused by abrupt depth changes at occlusions includes:
removing the dragging area: the dragging area that degrades the visual effect is removed by discarding mesh faces whose normal direction makes a large angle with the viewing direction:

\theta = \arccos\left( \frac{\vec{n} \cdot \vec{v}}{\|\vec{n}\| \, \|\vec{v}\|} \right)

where \vec{n} is the normal vector of the mesh face, \vec{v} is the viewing direction from the face center to the optical center, and \theta is the included angle between them; a face is removed when \theta exceeds a threshold (an illustrative sketch of this test is given after the blur-layer description below);
and adding a fuzzy layer behind the original layer to repair holes generated when the viewpoint is moved.
After the dragging area is removed, a hole appears in the rendering effect, and the hole appearing when the viewpoint is moved is repaired by adding a fuzzy layer behind the original layer, so that the dragging area with sudden change of the shielding area becomes smooth, and the whole visual effect is not influenced.
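For illustration, the face-culling test referenced above can be sketched as follows, where the angle threshold is an assumed value:

import numpy as np

def cull_stretched_faces(vertices, faces, optical_center, max_angle_deg=75.0):
    tri = vertices[faces]                                   # (F, 3, 3) triangle vertices
    normals = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    centers = tri.mean(axis=1)
    view_dirs = optical_center[None, :] - centers           # face centre -> optical centre
    cosang = np.einsum('ij,ij->i', normals, view_dirs) / (
        np.linalg.norm(normals, axis=1) * np.linalg.norm(view_dirs, axis=1) + 1e-12)
    keep = np.degrees(np.arccos(np.clip(np.abs(cosang), -1.0, 1.0))) <= max_angle_deg
    return faces[keep]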
To realize efficient rendering, dynamic foreground rendering with the dynamic layer includes updating the mesh of the dynamic foreground.
The updating of the grid of the dynamic foreground specifically includes:
the extraction module is used for initially extracting the dynamic foreground grid through Gaussian mixture model background subtraction;
the optimization module is used for optimizing a dynamic mask with clear dynamic foreground grids by adopting a fully connected conditional random field model;
a rendering module to recalculate 3D vertices belonging to the dynamic mask based on the dynamic mask to render the dynamic foreground mesh.
The foreground may be initially extracted by a Gaussian Mixture Model (GMM) background subtraction method. Since the dynamic mask generated by GMM is coarser in the object boundary, an efficient dense conditional random field (denseCRF) inference model is employed to obtain a clear boundary mask. For each new frame, the 3D vertices belonging to the dynamic mask are recalculated based on the high quality dynamic mask to render the dynamic object.
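A minimal sketch of the GMM foreground extraction with OpenCV's MOG2 subtractor is shown below; the denseCRF boundary refinement described above is replaced by simple morphological cleanup for brevity:

import cv2
import numpy as np

mog2 = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16, detectShadows=True)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

def dynamic_mask(frame):
    fg = mog2.apply(frame)
    fg = np.where(fg == 255, 255, 0).astype(np.uint8)       # drop shadow pixels (value 127)
    fg = cv2.morphologyEx(fg, cv2.MORPH_OPEN, kernel)       # remove speckle noise
    return cv2.morphologyEx(fg, cv2.MORPH_CLOSE, kernel)    # fill small holes in the mask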
A rendering procedure based on all the layers can generate high-quality panoramic rendering results, especially in local regions, which improves the visual effect and provides a zoom-in function. Furthermore, the blur layer eliminates artifacts caused by occlusion, and the dynamic layer allows the dynamic regions to be updated efficiently.
In multiple on-site shooting experiments, the invention has shown excellent robustness and extensibility. It can extract accurate depth information over a wide range in large scenes, and the high-resolution pictures of the flexibly placed unstructured local cameras can be seamlessly embedded into the panoramic image.
It should be noted that the foregoing explanation of the embodiment of the unstructured light field intelligent sensing system is also applicable to the unstructured light field intelligent sensing apparatus of this embodiment, and details are not repeated here.
According to the intelligent sensing device for the unstructured light field, which is provided by the embodiment of the invention, the first fusion module is used for fusing data captured by different heterogeneous image sensors to obtain a first fusion image; the second fusion module is used for fusing the first fusion image according to the variable baseline among different cameras to obtain a second fusion image; and the annular panoramic image fusion module is used for fusing the second fusion image to obtain an annular panoramic image. According to the invention, ultra-wide view field ultrahigh resolution images and videos are obtained through multiple heterogeneous image sensor fusion imaging, and large-range three-dimensional depth perception is realized by a variable baseline accurate reconstruction technology. Meanwhile, the invention also breaks through the limitation that the image sensor of the existing light field system follows uniform distribution, and does not depend on a structured system architecture which needs fine calibration. Meanwhile, the invention breaks through the bottleneck of restricting the space-time bandwidth product of the imaging of the optical image sensor for a long time, improves the data flux of the light field sensing from the international highest million-level pixel to a hundred million-level pixel, and also realizes the real-time three-dimension of the wide-field high-resolution dynamic light field.
Furthermore, the terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. An unstructured light field intelligent sensing system for acquiring light field video data, the system comprising:
the non-structural heterogeneous high-resolution imaging unit consists of at least one global camera and a plurality of long-focus cameras and is used for fusing data captured by different heterogeneous image sensors in the non-structural heterogeneous high-resolution imaging unit to obtain a first fused image;
the variable baseline light field module consists of a plurality of non-structural heterogeneous high-resolution imaging units and is used for fusing the first fused image according to variable baselines among different cameras to obtain a second fused image;
the camera array of the four-girdle three-dimensional panoramic acquisition module consists of four groups of super-wide angle image sensors and is used for fusing the second fused image obtained by the variable baseline light field module to obtain an girdle panoramic image.
2. The unstructured light field intelligent perception system of claim 1, further comprising:
a rendering module for implementing annulus panoramic free viewpoint rendering according to a layered rendering strategy, the rendering module comprising:
an original layer module to render the 3D video above a threshold resolution using an original layer;
the fuzzy layer module is used for processing the dragging problem in the picture by adopting a fuzzy layer;
and the dynamic layer module is used for adopting the dynamic layer to perform dynamic foreground rendering.
3. The unstructured light field intelligent perception system of claim 1, further comprising:
a support, a camera gimbal and a frame;
the support is used for supporting the four-ring-band stereoscopic panoramic acquisition module and the variable baseline light field module;
the camera gimbal is used for supporting the different heterogeneous image sensors in the non-structural heterogeneous high-resolution imaging unit;
and the frame is used for supporting the unstructured light field intelligent sensing system through a connecting piece.
4. The unstructured light field intelligent perception system according to claim 1, wherein images in video data of a global camera in the unstructured heterogeneous high resolution imaging unit are processed by a feature-based stitching algorithm to estimate internal and external parameters of the global camera to obtain a preset position of the global camera, and images of local cameras in the unstructured heterogeneous high resolution imaging unit are embedded into the preset position of the global camera by using unstructured embedding.
5. The unstructured light field intelligent perception system of claim 1, wherein the variable baseline light field module is further configured to:
carrying out girdle stereo matching on the girdle images, and extracting a feature map from the girdle images through a neural network;
constructing a matching cost value according to the feature map to obtain a 4D parallax cost value;
obtaining the matching cost under each candidate parallax according to the 4D parallax cost amount;
performing cost aggregation on the cost matching result to obtain an optimized cost matching result;
and determining the parallax of each position from the optimized cost matching result to obtain an annular zone parallax map.
6. An intelligent sensing device for an unstructured light field, comprising:
the first fusion module is used for fusing data captured by different heterogeneous image sensors to obtain a first fusion image;
the second fusion module is used for fusing the first fusion image according to variable baselines among different cameras to obtain a second fusion image;
and the panoramic image fusion module is used for fusing the second fusion image to obtain an annular panoramic image.
7. The unstructured light field intelligent perception device of claim 6, further comprising:
a layered rendering policy module for implementing annulus panoramic free viewpoint rendering according to a layered rendering policy, wherein the layered rendering policy module comprises:
an original layer module to render the 3D video above a threshold resolution using an original layer;
the fuzzy layer module is used for processing the dragging problem in the picture by adopting a fuzzy layer;
and the dynamic layer module is used for adopting the dynamic layer to perform dynamic foreground rendering.
8. The unstructured light field intelligent sensing apparatus of claim 7, further comprising:
the extraction module is used for initially extracting the dynamic foreground grid through Gaussian mixture model background subtraction;
the optimization module is used for optimizing the dynamic mask with clear dynamic foreground grids by adopting a fully connected conditional random field model;
a rendering module to recalculate 3D vertices belonging to the dynamic mask based on the dynamic mask to render the dynamic foreground mesh.
9. The unstructured light field intelligent sensing apparatus according to claim 6, wherein features are extracted from the ring-band panoramic image using irregularly arranged convolution kernels of a neural network, whose sampling offsets are learned by an additional convolutional layer; softmax is applied to the predicted cost c_d to calculate the probability of the disparity at each position taking each candidate disparity value, and the predicted disparity is the sum of the candidate disparity values weighted by their probabilities:

\hat{d} = \sum_{d=0}^{D_{max}} d \times \sigma(-c_d)

the loss function L_{s1} of the neural network is:

L_{s1} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{smooth}_{L_1}\left(d_i - \hat{d}_i\right)

where \hat{d} is the predicted disparity, d is the true disparity, D_{max} is the maximum value of the candidate disparity values, \sigma represents the softmax operation, c_d is the cost of the disparity candidate value d, and N is the number of true disparity values.
10. The intelligent sensing apparatus for unstructured light fields according to claim 6, further comprising a disparity map module for:
adopting a bilateral solver for the embedded local image, and obtaining an improved disparity map x based on the structure of the local image by minimizing the following function:

\min_x \; \frac{\lambda}{2} \sum_{i,j} \hat{W}_{i,j} (x_i - x_j)^2 + \sum_i c_i (x_i - t_i)^2

where t is the target disparity map, c is the per-pixel confidence map, and \hat{W} is the affinity (correlation) matrix obtained from the reference image.
CN202110978131.7A 2021-08-25 2021-08-25 Intelligent sensing system and device for unstructured light field Active CN113436130B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110978131.7A CN113436130B (en) 2021-08-25 2021-08-25 Intelligent sensing system and device for unstructured light field

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110978131.7A CN113436130B (en) 2021-08-25 2021-08-25 Intelligent sensing system and device for unstructured light field

Publications (2)

Publication Number Publication Date
CN113436130A true CN113436130A (en) 2021-09-24
CN113436130B CN113436130B (en) 2021-12-21

Family

ID=77797796

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110978131.7A Active CN113436130B (en) 2021-08-25 2021-08-25 Intelligent sensing system and device for unstructured light field

Country Status (1)

Country Link
CN (1) CN113436130B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114187724A (en) * 2021-12-01 2022-03-15 北京拙河科技有限公司 Target area security and monitoring system based on hundred million-level pixel camera
CN114842359A (en) * 2022-04-29 2022-08-02 西北工业大学 Vision-based method for detecting autonomous landing runway of fixed-wing unmanned aerial vehicle
CN116449642A (en) * 2023-06-19 2023-07-18 清华大学 Immersion type light field intelligent sensing calculation system, method and device
CN118301489A (en) * 2024-04-15 2024-07-05 四川新视创伟超高清科技有限公司 Parallax elimination method and system for multi-viewpoint image

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130141524A1 (en) * 2012-06-08 2013-06-06 Apple Inc. Methods and apparatus for capturing a panoramic image
US20160286137A1 (en) * 2013-09-16 2016-09-29 Duke University Method for combining multiple image fields
CN107734268A (en) * 2017-09-18 2018-02-23 北京航空航天大学 A kind of structure-preserved wide baseline video joining method
CN110581959A (en) * 2018-06-07 2019-12-17 株式会社理光 Multiple imaging apparatus and multiple imaging method
CN111343367A (en) * 2020-02-17 2020-06-26 清华大学深圳国际研究生院 Billion-pixel virtual reality video acquisition device, system and method
CN113016173A (en) * 2018-12-07 2021-06-22 三星电子株式会社 Apparatus and method for operating a plurality of cameras for digital photographing

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130141524A1 (en) * 2012-06-08 2013-06-06 Apple Inc. Methods and apparatus for capturing a panoramic image
US20160286137A1 (en) * 2013-09-16 2016-09-29 Duke University Method for combining multiple image fields
CN107734268A (en) * 2017-09-18 2018-02-23 北京航空航天大学 A kind of structure-preserved wide baseline video joining method
CN110581959A (en) * 2018-06-07 2019-12-17 株式会社理光 Multiple imaging apparatus and multiple imaging method
CN113016173A (en) * 2018-12-07 2021-06-22 三星电子株式会社 Apparatus and method for operating a plurality of cameras for digital photographing
CN111343367A (en) * 2020-02-17 2020-06-26 清华大学深圳国际研究生院 Billion-pixel virtual reality video acquisition device, system and method

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114187724A (en) * 2021-12-01 2022-03-15 北京拙河科技有限公司 Target area security and monitoring system based on hundred million-level pixel camera
CN114842359A (en) * 2022-04-29 2022-08-02 西北工业大学 Vision-based method for detecting autonomous landing runway of fixed-wing unmanned aerial vehicle
CN116449642A (en) * 2023-06-19 2023-07-18 清华大学 Immersion type light field intelligent sensing calculation system, method and device
CN116449642B (en) * 2023-06-19 2023-08-29 清华大学 Immersion type light field intelligent sensing calculation system, method and device
CN118301489A (en) * 2024-04-15 2024-07-05 四川新视创伟超高清科技有限公司 Parallax elimination method and system for multi-viewpoint image
CN118301489B (en) * 2024-04-15 2024-09-27 四川国创新视超高清视频科技有限公司 Parallax elimination method and system for multi-viewpoint image

Also Published As

Publication number Publication date
CN113436130B (en) 2021-12-21

Similar Documents

Publication Publication Date Title
CN113436130B (en) Intelligent sensing system and device for unstructured light field
CN107659774B (en) Video imaging system and video processing method based on multi-scale camera array
CN108986136B (en) Binocular scene flow determination method and system based on semantic segmentation
KR102003015B1 (en) Creating an intermediate view using an optical flow
CN111343367B (en) Billion-pixel virtual reality video acquisition device, system and method
CN109360235B (en) Hybrid depth estimation method based on light field data
US20220222776A1 (en) Multi-Stage Multi-Reference Bootstrapping for Video Super-Resolution
WO2019214568A1 (en) Depth-based light field splicing method
US8928737B2 (en) System and method for three dimensional imaging
CN108074218A (en) Image super-resolution method and device based on optical field acquisition device
CN106570938A (en) OPENGL based panoramic monitoring method and system
CN112396562A (en) Disparity map enhancement method based on RGB and DVS image fusion in high-dynamic-range scene
JP2008016918A (en) Image processor, image processing system, and image processing method
CN113221665A (en) Video fusion algorithm based on dynamic optimal suture line and improved gradual-in and gradual-out method
CN109949354B (en) Light field depth information estimation method based on full convolution neural network
CN114419568A (en) Multi-view pedestrian detection method based on feature fusion
CN108156383B (en) High-dynamic billion pixel video acquisition method and device based on camera array
CN115086550A (en) Meta-imaging method and system
CN108564654B (en) Picture entering mode of three-dimensional large scene
CN110290373B (en) Integrated imaging calculation reconstruction method for increasing visual angle
JP2013200840A (en) Video processing device, video processing method, video processing program, and video display device
KR100321904B1 (en) An apparatus and method for extracting of camera motion in virtual studio
JP2005260753A (en) Device and method for selecting camera
CN114862934B (en) Scene depth estimation method and device for billion pixel imaging
CN115988338B (en) Far-field signal inversion reconstruction method based on compound-eye camera array

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant