System and method for realizing real-time multi-scale imaging by utilizing camera tiling
Technical Field
The invention relates to a system and a method for realizing real-time multi-scale imaging by camera tiling, belonging to the technical field of image processing.
Background
The resolution of a digital image depends on its pixel count. Solid-state imaging resolution has grown from roughly one megapixel in early sensors to 10 to 100 megapixels today. Current sensor pixel sizes have shrunk to the order of a few wavelengths of light, and at this scale further reduction of the pixel size becomes difficult, so increasing the image size requires increasing the sensor area. However, current sensors with an area larger than 1 square centimeter are expensive and have low yield. In addition, the resolvable spot size and spatial-frequency response of the lens degrade as the sensor size increases. This is why image size should be increased by using a camera array rather than by raising the resolution of a single camera sensor.
For many years, high-definition panoramic images have been formed by rotating a single camera on a tripod, continuously shooting a series of pictures, and splicing them into one large image. Before the digital era, panoramic images were created by physically cropping and pasting prints. More recent stitching techniques align images by common points in their overlap regions. Errors often occur in this process, producing perceptible artifacts; the main reason is that different pictures are taken at different times, so a common point may shift between two exposures.
The panoramic image may also be captured by a camera array, as shown in fig. 1. A camera array can acquire data synchronously, eliminating the time difference between images. Such an ultra-high-resolution image acquisition system is described in U.S. Patent 8,259,212 (multiscale optical system). In such an architecture, it is difficult to ensure that the optical axes of many cameras pass through the same point.
As shown in fig. 2, the angular range over which a camera views the scene is called its field angle. Each camera pixel collects light from the scene, and the angular range corresponding to one pixel is called the instantaneous field of view (IFOV). A camera with N horizontal pixels and M vertical pixels has a horizontal field angle of N × IFOV and a vertical field angle of M × IFOV; the total number of pixels is N × M. The field angle can be increased by using multiple cameras or sensors as described in the figures. Each camera observes a part of the scene, adjacent images partially overlap, and no part of the scene may be left uncovered by a camera.
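The arithmetic relating pixel count, IFOV, and field angle can be illustrated with a short sketch. The numeric values below are hypothetical examples, not parameters from this invention.

```python
# Illustrative sketch: relating pixel count, instantaneous field of
# view (IFOV), and total field angle. All numbers are hypothetical.

def field_angles(n_horizontal, m_vertical, ifov_deg):
    """Return (horizontal, vertical) field angles in degrees.

    Each pixel subtends ifov_deg, so a row of N pixels spans N * ifov_deg.
    """
    return n_horizontal * ifov_deg, m_vertical * ifov_deg

# A hypothetical 1920 x 1080 sensor whose pixels each subtend 0.01 degrees:
h_fov, v_fov = field_angles(1920, 1080, 0.01)
print(h_fov, v_fov)   # approximately 19.2 x 10.8 degree field of view
print(1920 * 1080)    # total pixel count N x M = 2073600
```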
As depicted in fig. 2, the cameras view different portions of the scene from different perspectives. These differing poses make it difficult to form a single-viewpoint overview. Ideally, we would arrange the cameras' entrance pupils around a central point so that all cameras share a common viewing center. In practice, this arrangement is difficult, if not impossible, to achieve. As an alternative, we can simply ensure that the cameras' optical axes pass through the same point, as shown in fig. 3.
In this case, we can translate the effective viewpoint from the aperture entrance to the point where all optical axes intersect by scaling the image. However, the amount of scaling needed to translate the effective viewpoint for each object depends on that object's distance. In a scene with objects at many different distances, this translation cannot be achieved by a single accurate zoom.
In previous practice, the images of multiple cameras were stitched together to form a panoramic composite. The total number of pixels in the composite is less than the pixel count of the camera matrix, since there are overlap regions between cameras and some pixels are captured by more than one camera. The composite's pixel count still far exceeds that of any single camera, and also far exceeds the number of pixels that can be effectively shown on a display. These large images are typically stored in memory in a "tile" format, as shown in fig. 4. The lowest-resolution image sits at the top of the tile pyramid and shows the whole scene; in this figure it is 128 × 128 pixels. The second layer of the pyramid has 4 images, each 128 × 128 pixels, dividing the scene into four quadrants at double the resolution. The next layer consists of 16 tiles, also 128 × 128 pixels each, at double the resolution of the second layer. An image display system uses only the tiles that need to be displayed; for example, a 128 × 128 display uses only the top tile to show the panorama. When a user zooms into a detail of the scene, only the tiles associated with that detail are sent to the display. In previous practice, this tile pyramid structure was created starting from a full-resolution composite of the scene. This structure allows the user to view the image at full resolution without transmitting the full-resolution image over the network, and is the approach adopted by map services such as Baidu and Google Maps, allowing users to browse large image data sets without transmitting all the data.
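The tile-pyramid layout described above can be sketched in a few lines. This is a generic illustration of the structure in fig. 4, not the patent's implementation; the 128-pixel tile size follows the example in the text.

```python
# Minimal sketch of a tile pyramid: each level doubles the resolution
# and quadruples the tile count, with 128 x 128-pixel tiles throughout.

TILE = 128  # pixels per tile edge at every level

def tiles_at_level(level):
    """Level 0 is the single lowest-resolution overview tile."""
    return (2 ** level) ** 2

def tile_for_point(x, y, level):
    """Map a point in the unit square [0,1)^2 to its (col, row) tile index,
    so a viewer can fetch only the tiles covering the zoomed-in detail."""
    n = 2 ** level  # tiles per edge at this level
    return int(x * n), int(y * n)

print(tiles_at_level(0), tiles_at_level(1), tiles_at_level(2))  # 1 4 16
print(tile_for_point(0.7, 0.2, 2))  # the point falls in tile (2, 0)
```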
A key observation of this invention is that creating a global composite image in order to use the high-definition imagery of a camera matrix is not actually necessary, and may not even be desirable. The user simply wants images from multiple cameras integrated on the display with a visually pleasing effect. Stitching errors in the regions where adjacent cameras meet cause user frustration, whereas moving from one camera's perspective to the next need not. From this point of view, we seek a way to translate the viewing perspective between adjacent cameras without stitching errors.
In certain cases it is impossible to eliminate stitching errors computationally, since one camera may see parts of the scene that are occluded from another. In other cases, stitching errors can be eliminated or reduced by matching adjacent camera images, but the computation required is significant. In this invention we seek a method that not only avoids stitching errors, but also ensures real-time browsing of the scene by eliminating the image-stitching step altogether.
It is critical to this invention to recognize that the resolution of a single-sensor camera, let alone a camera array, is much greater than the display resolution. Existing high-definition television images are 1920 × 1080 pixels, or 4K (3840 × 2160 pixels); 8K cameras (twice the linear resolution of existing 4K cameras) are under development. Since high-definition resolution is sufficient for most human display applications, the excess camera resolution can be used to achieve a digital pan-tilt-zoom (PTZ) effect. As described below, using cameras in this manner enables real-time panoramic imaging without stitching errors.
Disclosure of Invention
The purpose of the invention is as follows: to provide a method that reduces or avoids the use of image stitching, avoids stitching errors, improves the display of panoramic images, and ensures real-time browsing of scenes within the panoramic image by eliminating the image-stitching step.
The system for realizing real-time multi-scale imaging by camera tiling comprises a camera array, an image processing and storage system, and a display; the image processing and storage system processes the images acquired by the camera array and sends the processed images to the display. The system is characterized in that the camera array is composed of a plurality of micro-cameras arranged in a staggered (interleaved) pattern, and the resolution of the micro-field of view captured by each micro-camera is higher than that of the display field of view of the display.
The camera array also includes a wide field of view camera.
The four quadrants of each micro-field of view respectively overlap with one quadrant of the micro-fields captured by the four adjacent micro-cameras.
Further, the size of the display field of view must be smaller than 1/4 of the micro-field of view.
The method for realizing real-time multi-scale imaging by camera tiling comprises the following steps:
a wide-field-of-view camera acquires a coarse image of the entire system field of view;
the method comprises the steps that a micro-camera obtains a high-precision micro-view field image of a local system view field, and four quadrants of each micro-view field are respectively overlapped with one quadrant of the micro-view field obtained by shooting of four adjacent micro-cameras.
The display field of view of the display can be moved freely up, down, left, and right within the micro-field of view; when a given quadrant is displayed, either of the two micro-cameras covering that quadrant can be used to generate the display data.
When the display field of view is translated into the next quadrant of the current micro-field, the micro-camera covering the current micro-field and the micro-camera covering the new quadrant can jointly provide display data for the display field of view.
When the user zooms the display field of view out beyond the micro-field of view of a single micro-camera, the wide-field-of-view camera is employed to provide the display data.
Furthermore, a dense micro-camera array can be used in the central zone of the system field of view to achieve quadrant-overlapped acquisition of the micro-fields, while a non-overlapping array, using a conventional image-stitching algorithm, is used in the edge zone of the system field of view.
By exploiting the resolution difference between the micro-cameras and the display (each micro-camera's resolution being higher than the display's), and by staggering the micro-cameras in the array so that every quadrant of a micro-field is covered by two micro-cameras able to provide display data, a real-time high-resolution panoramic image can be obtained as the display field of view moves across the micro-fields of adjacently arranged micro-cameras. In addition, because the step of stitching images from adjacent cameras is eliminated, stitching errors are avoided, image quality is improved, display speed is increased, and real-time browsing within the system field of view is ensured.
Those skilled in the art will appreciate that this strategy can be applied to a multi-level, coarse-to-fine architecture. It will also be appreciated that brightness, contrast, gamma curves, and other image parameters must be matched between cameras in order to pan smoothly from one camera to another. It is also advantageous to smooth the motion of the viewpoint by matching feature points between cameras, although this requirement is much more relaxed than the corresponding requirement in conventional image stitching.
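One common way to match brightness between two cameras, sketched here as a hedged illustration (the patent does not prescribe a particular method), is to fit a linear gain and offset over their shared overlap region:

```python
# Hypothetical sketch: match camera B's brightness to camera A's by
# least-squares fitting overlap_a ~= gain * overlap_b + offset over
# the pixels both cameras see. Not the patent's prescribed algorithm.

import numpy as np

def match_brightness(overlap_a, overlap_b):
    """Return (gain, offset) mapping B's pixel values onto A's."""
    A = np.column_stack([overlap_b.ravel(), np.ones(overlap_b.size)])
    gain, offset = np.linalg.lstsq(A, overlap_a.ravel(), rcond=None)[0]
    return gain, offset

# Synthetic overlap: camera B is dimmer by a gain of 1.2 and offset of 5.
rng = np.random.default_rng(0)
a = rng.uniform(50, 200, size=(32, 32))
b = (a - 5.0) / 1.2
gain, offset = match_brightness(a, b)
print(round(gain, 3), round(offset, 3))  # recovers 1.2 and 5.0
```

After such a fit, applying `gain * pixel + offset` to camera B's window makes a handoff between cameras far less visible than an unmatched cut.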
Drawings
FIG. 1 is a schematic view of a single camera field of view;
FIG. 2 is a schematic view of a multi-camera field of view;
FIG. 3 is a schematic view of a view center point arrangement;
FIG. 4 is a schematic view of a tile pyramid display structure;
FIG. 5 is a schematic view of the multi-camera field of view configuration of the present invention;
FIG. 6 is a schematic view of a micro-field of view of the present invention;
FIG. 7 is a schematic view of the field of view structure of the system of the present invention.
Detailed Description
The system and method for real-time multi-scale imaging with camera tiling according to the present invention will be described in further detail with reference to the accompanying drawings and specific examples.
The system for realizing real-time multi-scale imaging by camera tiling comprises a camera array, an image processing and storage system, and a display; the image processing and storage system processes the images acquired by the camera array and sends the processed images to the display. The system is characterized in that the camera array is composed of a plurality of micro-cameras arranged in a staggered (interleaved) pattern, and the resolution of the micro-field of view captured by each micro-camera is higher than that of the display field of view of the display.
The camera array also includes a wide field of view camera.
The four quadrants of each micro-field of view respectively overlap with one quadrant of the micro-fields captured by the four adjacent micro-cameras.
Further, the size of the display field of view must be smaller than 1/4 of the micro-field of view.
The method for realizing real-time multi-scale imaging by camera tiling comprises the following steps:
a wide-field-of-view camera acquires a coarse image of the entire system field of view;
the method comprises the steps that a micro-camera obtains a high-precision micro-view field image of a local system view field, and four quadrants of each micro-view field are respectively overlapped with one quadrant of the micro-view field obtained by shooting of four adjacent micro-cameras.
The display field of view of the display can be moved freely up, down, left, and right within the micro-field of view; when a given quadrant is displayed, either of the two micro-cameras covering that quadrant can be used to generate the display data.
When the display field of view is translated into the next quadrant of the current micro-field, the micro-camera covering the current micro-field and the micro-camera covering the new quadrant can jointly provide display data for the display field of view.
When the user zooms the display field of view out beyond the micro-field of view of a single micro-camera, the wide-field-of-view camera is employed to provide the display data.
Furthermore, a dense micro-camera array can be used in the central zone of the system field of view to achieve quadrant-overlapped acquisition of the micro-fields, while a non-overlapping array, using a conventional image-stitching algorithm, is used in the edge zone of the system field of view.
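The source-selection rule in the steps above can be sketched as follows. The function name and the degree values are illustrative assumptions; the 1/4 threshold comes from the requirement that the display field fit within a quadrant of a micro-field for seamless panning.

```python
# Hedged sketch of the source-selection rule: when the requested display
# field of view exceeds a single micro-field, fall back to the coarse
# wide-field-of-view camera. Names and thresholds are illustrative.

def select_source(display_fov_deg, micro_fov_deg):
    """Return which camera class should feed the display."""
    if display_fov_deg > micro_fov_deg:
        # Zoomed out beyond one micro-camera's coverage.
        return "wide-field camera"
    if display_fov_deg <= micro_fov_deg / 4:
        # Small enough to pan seamlessly between interleaved micro-fields.
        return "micro-camera"
    # Fits in one micro-field but not within a single quadrant.
    return "micro-camera (pan limited)"

print(select_source(2.0, 10.0))   # fits in a quadrant -> micro-camera
print(select_source(12.0, 10.0))  # wider than a micro-field -> wide-field camera
```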
We consider an imaging system consisting of an array of cameras, whose members we call micro-cameras. Each micro-camera observes a portion of the scene. The space observed by all micro-cameras together is called the system field of view, and the space observed by each micro-camera is called the micro-camera field of view. We also consider display images generated from the camera array; each display image likewise has a field of view. Unlike tiled panoramic imaging, we assume that each display field of view corresponds to a sub-region of one micro-camera field of view. The basic structure of the system is illustrated in the accompanying drawings, which show a micro-camera array with overlapping fields of view and the resulting display image.
"High definition" here refers particularly to images whose size significantly exceeds the display size of the display device; for example, a 4K image shown on a 1920 × 1080 pixel display. Even on a display of higher resolution, a satisfactory result can be presented by up-sampling. Current camera systems can acquire images of hundreds of millions of pixels or more; they usually employ an array camera, or scan the scene with a camera mounted on a pan-tilt head, and then stitch a large panoramic image. The methods described herein use a camera array to enhance the display effect, but reduce or avoid the use of image stitching.
The core concept of the present invention is that the resolution of each micro-camera must be higher than the resolution required of the display image. We may either require the micro-camera to have more pixels than the display, or interpolate to enlarge the size gap between the micro-camera image and the display. In both cases the display image area can be moved up, down, left, and right within the image area of the micro-camera. For example, to obtain a larger effective field, a 4K (3840 × 2160) camera can generate a standard HD (1920 × 1080) display image stream; the same effect can be achieved by up-sampling an HD camera to an equivalent 4K image and generating the display stream from that. In both cases, the display window can be moved digitally by selecting the corresponding pixel range within the camera window, and digital zoom can be achieved by up- or down-sampling the camera window to match the display window.
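The digital pan described above can be sketched as a simple window crop. The 4K/HD dimensions follow the example in the text; the function name and the use of NumPy are illustrative assumptions.

```python
# Illustrative sketch: a 1920 x 1080 display window cropped from a
# 3840 x 2160 camera frame, moved digitally by choosing a pixel offset.

import numpy as np

CAM_W, CAM_H = 3840, 2160    # 4K micro-camera frame
DISP_W, DISP_H = 1920, 1080  # HD display window

def display_window(frame, x0, y0):
    """Crop the display window at offset (x0, y0), clamped to the frame."""
    x0 = max(0, min(x0, CAM_W - DISP_W))
    y0 = max(0, min(y0, CAM_H - DISP_H))
    return frame[y0:y0 + DISP_H, x0:x0 + DISP_W]

frame = np.zeros((CAM_H, CAM_W), dtype=np.uint8)  # stand-in camera image
win = display_window(frame, 500, 300)
print(win.shape)  # (1080, 1920)
```

Digital zoom would additionally resample the cropped window (up- or down-sampling) to the display size before output.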
The invention provides a method for obtaining image tiles directly from a camera array. Unlike previous methods that build a tile-structure tree, we require sufficient overlap between tiles and display images so that a display image stream can be obtained directly from a single camera's image stream, without stitching. To achieve this, we organize the camera array's fields of view as shown in the figure. A wide-field-of-view camera captures a coarse image of the entire scene, while an array of fine-resolution cameras acquires detailed images of its parts. As shown in the figure, the fine-resolution cameras are interleaved with one another, with each of the four quadrants of one camera's field overlapping one quadrant of each of four adjacent cameras. The display window is required to be no larger than 1/4 of a camera's acquisition window, so that it can be translated continuously from one camera window to the next without stitching. When the display window lies within a particular quadrant, either of the two cameras covering that quadrant may generate the display data. When the display window moves into the next quadrant, the camera covering the current quadrant and the camera covering the new quadrant can together smoothly supply data to the display window. In the new quadrant, the display window can then be handed off to the new camera that also covers it, releasing the previous camera.
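The interleaved layout can be modeled as two camera lattices offset by half a field in both axes, so that every quadrant of one camera's field is also a quadrant of a neighboring camera's field. This is a hedged sketch under that assumed geometry (unit-square fields, lattice names "A" and "B" are hypothetical), not the patent's exact construction.

```python
# Hypothetical model of the interleaved micro-camera layout: lattice "A"
# cameras cover unit squares [i, i+1) x [j, j+1); lattice "B" is offset by
# half a field in both axes. Every point is then covered by exactly two
# cameras, so the display window can be handed off without stitching.

import math

def cameras_covering(x, y):
    """Return the two micro-cameras, as (lattice, i, j), whose fields
    contain the point (x, y)."""
    a = ("A", math.floor(x), math.floor(y))
    b = ("B", math.floor(x - 0.5), math.floor(y - 0.5))
    return [a, b]

# The point (1.25, 1.25) lies in one quadrant of camera A(1,1) and in the
# coinciding quadrant of camera B(0,0); either can feed the display:
print(cameras_covering(1.25, 1.25))  # [('A', 1, 1), ('B', 0, 0)]
```

As the display window crosses into an adjacent quadrant, the pair returned by `cameras_covering` changes by one member, which is exactly the handoff described above.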
We can build a composite architecture using dense, overlapping camera acquisition windows in the central zone of the system field of view and non-overlapping arrays in the marginal zones. In the marginal zones, translation between cameras is performed with conventional image-stitching algorithms and tile mosaic methods; in most cases, users are more tolerant of stitching errors near the edges. The coarse-resolution camera is used when the user zooms the image window out beyond the field of view of a single micro-camera.