CN116402878A - Light field image processing method and device

Light field image processing method and device

Info

Publication number
CN116402878A
Authority
CN
China
Prior art keywords
image
light field
foreground
viewpoint
background
Prior art date
Legal status
Pending
Application number
CN202310410075.6A
Other languages
Chinese (zh)
Inventor
Li Yapeng (李亚鹏)
Ma Yuanyuan (马媛媛)
Hu Feitao (胡飞涛)
Li Yangbing (李扬冰)
Wang Lei (王雷)
Tan Chengming (谭丞鸣)
Current Assignee
BOE Technology Group Co Ltd
Beijing BOE Technology Development Co Ltd
Original Assignee
BOE Technology Group Co Ltd
Beijing BOE Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by BOE Technology Group Co Ltd and Beijing BOE Technology Development Co Ltd
Priority to CN202310410075.6A
Publication of CN116402878A
Legal status: Pending


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/557 — Depth or shape recovery from multiple images, from light fields, e.g. from plenoptic cameras
    • G06T 5/50 — Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T 7/194 — Segmentation; edge detection involving foreground-background segmentation
    • G06T 7/73 — Determining position or orientation of objects or cameras using feature-based methods
    • G06T 2207/10052 — Image acquisition modality: images from light field camera
    • G06T 2207/20221 — Image combination: image fusion; image merging
    • G06T 2207/20228 — Disparity calculation for image-based rendering

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Studio Devices (AREA)

Abstract

The disclosure relates to the technical field of image processing, and in particular provides a light field image processing method and device. The light field image processing method comprises: acquiring target viewpoint information and a light field image group sent by an acquisition device; performing viewpoint fusion on the foreground images in the light field images based on the target viewpoint information to obtain a foreground viewpoint image; performing viewpoint fusion on the background images in the light field images based on the target viewpoint information and a pre-generated background parallax map for each viewpoint position to obtain a background viewpoint image; and generating a target light field image from the foreground viewpoint image and the background viewpoint image. In the embodiments of the disclosure, the parallax maps of the background images are generated in advance by means of foreground-background segmentation, so that parallax maps for complex backgrounds need not be computed in real time. This greatly reduces the data volume of viewpoint image synthesis, improves image processing speed and accuracy, and makes real-time light field video communication achievable.

Description

Light field image processing method and device
Technical Field
The disclosure relates to the technical field of image processing, in particular to a light field image processing method and device.
Background
A Light Field records light ray data of higher dimensionality, yielding three-dimensional information of higher precision than conventional two-dimensional imaging and conventional three-dimensional imaging represented by binocular stereoscopic vision. Light field video can accurately perceive a dynamic environment, giving the user an immersive viewing experience.
However, the data volume of light field video is large and its processing speed is slow, so real-time light field video presentation is difficult to achieve, which limits the application scenarios of light field video.
Disclosure of Invention
In order to improve data processing efficiency of light field images and thereby achieve real-time light field video presentation, embodiments of the present disclosure provide a light field image processing method, apparatus, electronic device, video communication system, and storage medium.
In a first aspect, embodiments of the present disclosure provide a light field image processing method, applied to a display device, the method including:
acquiring target viewpoint information and a light field image group transmitted by an acquisition device; the target viewpoint information represents position information of the eyes of an observer of the display device, and the light field image group comprises light field images acquired by each target light field camera in the acquisition device;
performing viewpoint fusion on foreground images in each light field image based on the target viewpoint information to obtain a foreground viewpoint image corresponding to the target viewpoint information;
performing viewpoint fusion on background images in each light field image based on the target viewpoint information and a pre-generated background parallax image of each viewpoint position to obtain a background viewpoint image corresponding to the target viewpoint information; the viewpoint position represents a position corresponding to each target light field camera;
and generating a target light field image according to the foreground viewpoint image and the background viewpoint image.
In some embodiments, the performing viewpoint fusion on the foreground images in each light field image based on the target viewpoint information to obtain a foreground viewpoint image corresponding to the target viewpoint information includes:
performing image segmentation on each light field image in the light field image group to obtain a foreground image corresponding to each light field image;
for a first light field image and a second light field image corresponding to any two adjacent target light field cameras, performing parallax estimation on a first foreground image corresponding to the first light field image and a second foreground image corresponding to the second light field image to obtain a first foreground parallax image corresponding to the first foreground image and a second foreground parallax image corresponding to the second foreground image;
performing parallax mapping on the first foreground image based on the first foreground parallax image to obtain a first foreground map, and performing parallax mapping on the second foreground image based on the second foreground parallax image to obtain a second foreground map;
and carrying out image fusion processing on the first foreground mapping image and the second foreground mapping image based on the target viewpoint information to obtain the foreground viewpoint image.
In some embodiments, the performing image segmentation on each light field image in the light field image group to obtain a foreground image corresponding to each light field image includes:
and for each light field image in the light field image group, carrying out image difference on the light field image based on a pre-generated preset background image with the same viewpoint position as the light field image, so as to obtain a foreground image and a background image corresponding to the light field image.
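For illustration only (the disclosure does not prescribe a specific implementation), the image-difference segmentation above might look like the following Python/OpenCV sketch; the function name and the threshold value are hypothetical:

```python
import cv2
import numpy as np

def segment_foreground(light_field_img, preset_background, diff_threshold=30):
    """Split a light field image into foreground and background by
    differencing against a pre-captured background image of the same
    viewpoint position. `diff_threshold` is an illustrative value."""
    # Absolute per-pixel difference against the stored background image
    diff = cv2.absdiff(light_field_img, preset_background)
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    # Pixels that changed relative to the preset background are foreground
    mask = (gray > diff_threshold).astype(np.uint8)
    foreground = cv2.bitwise_and(light_field_img, light_field_img, mask=mask)
    background = cv2.bitwise_and(light_field_img, light_field_img, mask=1 - mask)
    return foreground, background, mask
```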
In some embodiments, the performing disparity estimation on a first foreground image corresponding to the first light field image and a second foreground image corresponding to the second light field image to obtain a first foreground disparity map corresponding to the first foreground image and a second foreground disparity map corresponding to the second foreground image includes:
downsampling the first foreground image based on a preset downsampling coefficient to obtain a first downsampled image, and downsampling the second foreground image to obtain a second downsampled image;
matching the positions of the same pixels on the first downsampling diagram and the second downsampling diagram to obtain a first parallax diagram corresponding to the first downsampling diagram and a second parallax diagram corresponding to the second downsampling diagram;
determining a first parallax searching range according to the first parallax map and the preset downsampling coefficient, and determining a second parallax searching range according to the second parallax map and the preset downsampling coefficient;
and performing parallax estimation on the first foreground image based on the first parallax search range to obtain the first foreground parallax image, and performing parallax estimation on the second foreground image based on the second parallax search range to obtain the second foreground parallax image.
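A minimal sketch of this coarse-to-fine disparity estimation, assuming OpenCV block matching as the underlying matcher (the embodiment does not mandate a particular matcher) and a hypothetical downsampling factor; for brevity it estimates a single left-to-right disparity, whereas the embodiment computes a map for each of the two images symmetrically:

```python
import cv2
import numpy as np

def coarse_to_fine_disparity(left, right, scale=4, max_disp_coarse=16):
    """Coarse disparity on downsampled images, then full-resolution
    disparity restricted to a search range derived from the coarse result
    and the downsampling coefficient."""
    h, w = left.shape[:2]
    # Step 1: downsample both foreground images by the preset factor
    left_small = cv2.resize(left, (w // scale, h // scale))
    right_small = cv2.resize(right, (w // scale, h // scale))
    # Step 2: coarse disparity by matching pixels on the downsampled pair
    matcher = cv2.StereoBM_create(numDisparities=max_disp_coarse, blockSize=15)
    coarse = matcher.compute(cv2.cvtColor(left_small, cv2.COLOR_BGR2GRAY),
                             cv2.cvtColor(right_small, cv2.COLOR_BGR2GRAY))
    coarse = coarse.astype(np.float32) / 16.0   # StereoBM returns fixed-point
    # Step 3: derive the full-resolution search range from coarse disparity x scale
    d_min = int(max(coarse.min(), 0) * scale)
    d_max = int(coarse.max() * scale) + scale   # small safety margin
    num_disp = max(16, ((d_max - d_min + 15) // 16) * 16)  # multiple of 16
    # Step 4: full-resolution disparity estimation within the derived range
    fine = cv2.StereoSGBM_create(minDisparity=d_min, numDisparities=num_disp,
                                 blockSize=5)
    disparity = fine.compute(cv2.cvtColor(left, cv2.COLOR_BGR2GRAY),
                             cv2.cvtColor(right, cv2.COLOR_BGR2GRAY))
    return disparity.astype(np.float32) / 16.0
```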
In some embodiments, the performing parallax mapping on the first foreground image based on the first foreground parallax map to obtain a first foreground map, and performing parallax mapping on the second foreground image based on the second foreground parallax map to obtain a second foreground map, includes:
matching each pixel on the first foreground image to a corresponding pixel position on a mapping graph according to the first foreground parallax map to obtain the first foreground mapping graph; and matching each pixel on the second foreground image to a corresponding pixel position on a mapping graph according to the second foreground parallax map to obtain the second foreground mapping graph;
and the performing image fusion processing on the first foreground mapping image and the second foreground mapping image based on the target viewpoint information to obtain the foreground viewpoint image includes:
determining a first weight and a second weight according to the target viewpoint information and the position information of any two adjacent target light field cameras;
and carrying out image fusion processing on the first foreground mapping image and the second foreground mapping image based on the first weight and the second weight to obtain the foreground viewpoint image.
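The following sketch illustrates parallax mapping and weighted fusion under simplifying assumptions: horizontal-only disparity, a virtual viewpoint parameterized by a fraction `alpha` between the two adjacent cameras, and no hole filling; all names are hypothetical:

```python
import numpy as np

def warp_by_disparity(image, disparity, alpha):
    """Forward-warp pixels toward a virtual viewpoint located a fraction
    `alpha` of the way from this camera to its neighbour. Horizontal-only
    disparity is assumed; forward warping leaves holes at disocclusions,
    and hole filling is omitted in this sketch."""
    h, w = disparity.shape
    warped = np.zeros_like(image)
    ys, xs = np.mgrid[0:h, 0:w]
    # Shift each source pixel by alpha times its disparity value
    xt = np.clip((xs + alpha * disparity).astype(int), 0, w - 1)
    warped[ys, xt] = image[ys, xs]
    return warped

def fuse_views(map_left, map_right, alpha):
    """Blend the two warped maps; the weights correspond to the target
    viewpoint's relative distance to the two adjacent cameras (the 'first
    weight' and 'second weight' of the embodiment)."""
    w_left, w_right = 1.0 - alpha, alpha
    return (w_left * map_left + w_right * map_right).astype(map_left.dtype)
```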
In some embodiments, the performing viewpoint fusion on background images in each light field image based on the target viewpoint information and the pre-generated background parallax map of each viewpoint position to obtain a background viewpoint image corresponding to the target viewpoint information, where the viewpoint position represents a position corresponding to each target light field camera, includes:
performing image segmentation on each light field image in the light field image group to obtain a background image corresponding to each light field image;
for a first light field image and a second light field image corresponding to any two adjacent target light field cameras, performing parallax mapping on the first background image of the first light field image based on a pre-generated first background parallax image with the same viewpoint position as the first light field image to obtain a first background mapping image, and performing parallax mapping on the second background image of the second light field image based on a pre-generated second background parallax image with the same viewpoint position as the second light field image to obtain a second background mapping image;
and carrying out image fusion processing on the first background mapping image and the second background mapping image based on the target viewpoint information to obtain the background viewpoint image.
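Reusing the hypothetical `warp_by_disparity` and `fuse_views` helpers from the foreground sketch above, the background branch differs only in looking up the pre-generated parallax maps instead of estimating them in real time:

```python
# `bg_disparity_cache` is an assumed dict mapping a viewpoint position to the
# pre-generated background parallax map received from the acquisition device
# (see the claim "receiving and storing the background disparity map of each
# viewpoint position").

def fuse_background(bg_left, bg_right, view_left, view_right, alpha,
                    bg_disparity_cache):
    """Background branch: no real-time disparity estimation; the cached
    per-viewpoint parallax maps are simply looked up and applied."""
    map_left = warp_by_disparity(bg_left, bg_disparity_cache[view_left], alpha)
    # The right camera's pixels shift the opposite way toward the virtual
    # viewpoint; the exact sign convention depends on how the cached maps
    # were generated and is an assumption in this sketch.
    map_right = warp_by_disparity(bg_right, bg_disparity_cache[view_right],
                                  -(1.0 - alpha))
    return fuse_views(map_left, map_right, alpha)
```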
In some embodiments, the acquiring the target viewpoint information includes:
acquiring a scene image by an image acquisition device arranged on the display equipment;
performing image detection according to the scene image to obtain the position information of eyes of an observer in the scene image;
and generating the target viewpoint information based on the position information.
In some embodiments, the acquiring the light field image group sent by the acquisition device includes:
transmitting the target viewpoint information to the acquisition equipment so that the acquisition equipment can determine one or more target light field cameras from a plurality of light field cameras according to the target viewpoint information;
and receiving the light field image group sent by the acquisition equipment.
In some embodiments, the light field image processing method described in the present disclosure further comprises:
and receiving and storing the background disparity map of each viewpoint position sent by the acquisition equipment.
In a second aspect, embodiments of the present disclosure provide a light field image processing method applied to an acquisition device, the method including:
respectively acquiring current scene images through a plurality of light field cameras arranged on the acquisition equipment to obtain scene images of viewpoint positions corresponding to each light field camera;
for any two adjacent light field cameras, generating a background parallax image of the viewpoint position of each light field camera according to scene images acquired by the two light field cameras respectively;
and transmitting the background disparity map of each viewpoint position to a display device, so that the display device stores the background disparity map of each viewpoint position.
In some embodiments, the light field image processing method described in the present disclosure further comprises:
receiving target viewpoint information sent by the display equipment;
determining one or more target light field cameras from a plurality of light field cameras included in the acquisition device according to the target viewpoint information;
acquiring light field images through the target light field camera to obtain a light field image group, and sending the light field image group to the display equipment.
In a third aspect, embodiments of the present disclosure provide a light field image processing apparatus applied to a display device, the apparatus including:
the acquisition module is configured to acquire target viewpoint information and a light field image group transmitted by the acquisition equipment; the target viewpoint information represents position information of eyes of an observer of the display device, and the light field image group comprises light field images acquired by each target light field camera in the acquisition device;
the foreground fusion module is configured to perform viewpoint fusion on foreground images in each light field image based on the target viewpoint information to obtain a foreground viewpoint image corresponding to the target viewpoint information;
the background fusion module is configured to perform viewpoint fusion on background images in each light field image based on the target viewpoint information and a pre-generated background parallax image of each viewpoint position to obtain a background viewpoint image corresponding to the target viewpoint information; the viewpoint position represents a position corresponding to each target light field camera;
an image synthesis module configured to generate a target light field image from the foreground viewpoint image and the background viewpoint image.
In some embodiments, the foreground fusion module is configured to:
performing image segmentation on each light field image in the light field image group to obtain a foreground image corresponding to each light field image;
for a first light field image and a second light field image corresponding to any two adjacent target light field cameras, performing parallax estimation on a first foreground image corresponding to the first light field image and a second foreground image corresponding to the second light field image to obtain a first foreground parallax image corresponding to the first foreground image and a second foreground parallax image corresponding to the second foreground image;
performing parallax mapping on the first foreground image based on the first foreground parallax image to obtain a first foreground map, and performing parallax mapping on the second foreground image based on the second foreground parallax image to obtain a second foreground map;
and carrying out image fusion processing on the first foreground mapping image and the second foreground mapping image based on the target viewpoint information to obtain the foreground viewpoint image.
In some embodiments, the foreground fusion module is configured to:
for each light field image in the light field image group, carrying out image difference on the light field image based on a pre-generated preset background image with the same viewpoint position as the light field image, so as to obtain a foreground image and a background image corresponding to the light field image.
In some embodiments, the foreground fusion module is configured to:
downsampling the first foreground image based on a preset downsampling coefficient to obtain a first downsampled image, and downsampling the second foreground image to obtain a second downsampled image;
matching the positions of the same pixels on the first downsampling diagram and the second downsampling diagram to obtain a first parallax diagram corresponding to the first downsampling diagram and a second parallax diagram corresponding to the second downsampling diagram;
determining a first parallax searching range according to the first parallax map and the preset downsampling coefficient, and determining a second parallax searching range according to the second parallax map and the preset downsampling coefficient;
and performing parallax estimation on the first foreground image based on the first parallax search range to obtain the first foreground parallax image, and performing parallax estimation on the second foreground image based on the second parallax search range to obtain the second foreground parallax image.
In some embodiments, the foreground fusion module is configured to:
matching each pixel on the first foreground image to a corresponding pixel position on a mapping graph according to the first foreground parallax graph to obtain the first foreground mapping graph; matching each pixel on the second foreground image to a corresponding pixel position on a mapping graph according to the second foreground parallax graph to obtain the second foreground mapping graph;
determining a first weight and a second weight according to the target viewpoint information and the position information of any two adjacent target light field cameras;
and carrying out image fusion processing on the first foreground mapping image and the second foreground mapping image based on the first weight and the second weight to obtain the foreground viewpoint image.
In some embodiments, the background fusion module is configured to:
performing image segmentation on each light field image in the light field image group to obtain a background image corresponding to each light field image;
for a first light field image and a second light field image corresponding to any two adjacent target light field cameras, performing parallax mapping on the first background image of the first light field image based on a pre-generated first background parallax image with the same viewpoint position as the first light field image to obtain a first background mapping image, and performing parallax mapping on the second background image of the second light field image based on a pre-generated second background parallax image with the same viewpoint position as the second light field image to obtain a second background mapping image;
and carrying out image fusion processing on the first background mapping image and the second background mapping image based on the target viewpoint information to obtain the background viewpoint image.
In some embodiments, the acquisition module is configured to:
acquiring a scene image by an image acquisition device arranged on the display equipment;
performing image detection according to the scene image to obtain the position information of eyes of an observer in the scene image;
and generating the target viewpoint information based on the position information.
In some embodiments, the acquisition module is configured to:
transmitting the target viewpoint information to the acquisition equipment so that the acquisition equipment can determine one or more target light field cameras from a plurality of light field cameras according to the target viewpoint information;
and receiving the light field image group sent by the acquisition equipment.
In some embodiments, the acquisition module is configured to:
and receiving and storing the background disparity map of each viewpoint position sent by the acquisition equipment.
In a fourth aspect, embodiments of the present disclosure provide a light field image processing apparatus applied to an acquisition device, the apparatus comprising:
the image acquisition module is configured to acquire current scene images through a plurality of light field cameras arranged on the acquisition equipment respectively to obtain scene images of view point positions corresponding to each light field camera;
the parallax determining module is configured to, for any two adjacent light field cameras, generate a background parallax map of the viewpoint position of each light field camera according to the scene images respectively acquired by the two light field cameras;
and the sending module is configured to send the background disparity map of each viewpoint position to a display device so that the display device stores the background disparity map of each viewpoint position.
In some embodiments, the transmitting module is configured to:
receiving target viewpoint information sent by the display equipment;
determining one or more target light field cameras from a plurality of light field cameras included in the acquisition device according to the target viewpoint information;
acquiring light field images through the target light field camera to obtain a light field image group, and sending the light field image group to the display equipment.
In a fifth aspect, embodiments of the present disclosure provide an electronic device, including:
a processor; and
a memory storing computer instructions for causing the processor to perform the method according to any of the embodiments of the first aspect or to perform the method according to any of the embodiments of the second aspect.
In a sixth aspect, embodiments of the present disclosure provide a video communication system, including:
a display device comprising an image acquisition apparatus and a first controller for performing the method according to any embodiment of the first aspect;
an acquisition device comprising a light field camera array comprising a plurality of light field cameras and a second controller for performing the method according to any embodiment of the second aspect.
In a seventh aspect, an embodiment of the disclosure provides a storage medium storing computer instructions for causing a computer to perform the method according to any embodiment of the first aspect, or to perform the method according to any embodiment of the second aspect.
The light field image processing method of the disclosure comprises: acquiring target viewpoint information and a light field image group sent by the acquisition device; performing viewpoint fusion on the foreground images in the light field images based on the target viewpoint information to obtain a foreground viewpoint image; performing viewpoint fusion on the background images in the light field images based on the target viewpoint information and a pre-generated background parallax map for each viewpoint position to obtain a background viewpoint image; and generating the target light field image from the foreground viewpoint image and the background viewpoint image. In the embodiments of the disclosure, the parallax maps of the background images are generated in advance by means of foreground-background segmentation, so that parallax maps for complex backgrounds need not be computed in real time, greatly reducing the data volume of viewpoint image synthesis, improving image processing speed and accuracy, and enabling real-time light field video communication.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the prior art, the drawings required in the detailed description are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present disclosure, and that a person of ordinary skill in the art may derive other drawings from them without inventive effort.
Fig. 1 is an architecture diagram of a video communication system in accordance with some embodiments of the present disclosure.
Fig. 2 is a schematic structural diagram of an electronic device according to some embodiments of the present disclosure.
Fig. 3 is a flow chart of a light field image processing method in accordance with some embodiments of the present disclosure.
Fig. 4 is a flow chart of a light field image processing method in accordance with some embodiments of the present disclosure.
Fig. 5 is a flow chart of a light field image processing method in accordance with some embodiments of the present disclosure.
Fig. 6 is a schematic diagram of a light field image processing method in accordance with some embodiments of the present disclosure.
Fig. 7 is a schematic circuit diagram of a light field camera array in accordance with some embodiments of the present disclosure.
Fig. 8 is a flow chart of a light field image processing method in accordance with some embodiments of the present disclosure.
Fig. 9 is a flow chart of a light field image processing method in accordance with some embodiments of the present disclosure.
Fig. 10 is a schematic diagram of a light field image processing method in accordance with some embodiments of the present disclosure.
Fig. 11 is a flow chart of a light field image processing method in accordance with some embodiments of the present disclosure.
Fig. 12 is a flow chart of a light field image processing method in accordance with some embodiments of the present disclosure.
Fig. 13 is a block diagram of a light field image processing apparatus in accordance with some embodiments of the present disclosure.
Fig. 14 is a block diagram of a light field image processing apparatus in accordance with some embodiments of the present disclosure.
Fig. 15 is a block diagram of an electronic device in accordance with some embodiments of the present disclosure.
Detailed Description
The following description of the embodiments of the present disclosure is made clearly and completely with reference to the accompanying drawings; it is evident that the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this disclosure without inventive effort fall within the scope of this disclosure. In addition, the technical features involved in the different embodiments of the present disclosure described below may be combined with each other as long as they do not conflict with each other.
A Light Field is defined as the amount of light passing through every point in every direction. A light field image can record light ray data of a higher dimension than a conventional two-dimensional image, thereby presenting three-dimensional information of higher accuracy than conventional two-dimensional imaging and conventional three-dimensional imaging represented by binocular stereoscopic vision.
Light field video can accurately perceive a dynamic environment. Combined with eye-tracking technology, when the user's viewpoint changes, the video picture follows the change in real time, presenting the user with an immersive, naked-eye 3D viewing experience.
Acquiring light field video requires a light field camera array, which contains many light field cameras, often dozens, at different positions; each light field camera captures the image at one viewpoint position. The data volume of light field video is therefore enormous, the later viewpoint synthesis of the image data captured by different cameras is computationally heavy, and data processing is slow.
Therefore, light field video in the related art is mainly used in offline video scenarios and is difficult to apply to real-time video. Taking a remote video chat scene as an example, when user A watches a light field video of the scene where user B is located through a first device, the light field video data of user B's scene must be processed and transmitted in real time, while the light field picture must be synthesized and rendered in combination with user A's viewpoint information. The computational load of the whole pipeline is large, and real-time presentation is difficult to achieve.
Based on the defects of the related art, the embodiment of the disclosure provides a light field image processing method, a device, an electronic device, a video communication system and a storage medium, which aim to improve the light field data processing speed and realize real-time light field video presentation.
Fig. 1 shows an architecture diagram of a video communication system in some embodiments of the present disclosure, and an application scenario of the embodiments of the present disclosure is described below with reference to fig. 1.
As shown in fig. 1, in some embodiments, the video communication system includes a capture device 100 and a display device 200, the capture device 100 and the display device 200 establishing a communicable connection through a wired or wireless network.
In one exemplary unidirectional video communication scenario, acquisition device 100 may acquire light field image data of the scene in which user a is located and transmit the light field image data to display device 200. The display device 200 obtains the current viewing point position of the user B by tracking the eye position of the user B, performs viewpoint image synthesis by combining the viewpoint position and the light field image data sent by the acquisition device 100, and renders and displays the synthesized light field image on the display device 200.
It will be understood, of course, that the above example only illustrates unidirectional video communication, but the present disclosure is not limited to unidirectional video communication scenes. For a bidirectional video communication scene, the display device 200 may also collect light field image data of the scene where user B is located and send it to the collection device 100; the acquisition device 100 may likewise track the eye position of user A to obtain user A's current viewing viewpoint position, combine that viewpoint position with the light field image data sent by the display device 200 to perform viewpoint image synthesis, and render and display the synthesized light field image on the acquisition device 100. The principle is the same and is not repeated here.
Taking a two-way video communication scenario as an example, fig. 2 shows a schematic structural diagram of an electronic device in some embodiments of the present disclosure, where the electronic device may be either the acquisition device 100 or the display device 200, and the present disclosure is not limited thereto.
As shown in fig. 2, the electronic device includes a display screen 110, a light field camera array 120, and an image capture device 130.
The display screen 110 is used to display the Light field image, and the display screen 110 may be any suitable screen component, such as an LCD (Liquid Crystal Display) screen, an OLED (Organic Light-Emitting Diode) screen, or the like, which is not limited in this disclosure.
The light field camera array 120 includes a plurality of light field cameras 121, and the light field cameras 121 are disposed in an array form on the electronic device, and since each light field camera 121 is disposed at a different position on the electronic device, the light field camera array 120 can acquire scene images at different viewpoint positions.
For example, in the example of fig. 2, the light field camera array 120 includes 9 light field cameras 121 in total, and the 9 light field cameras 121 are uniformly spaced apart at the upper edge of the display screen 110. Of course, those skilled in the art will appreciate that the number of cameras and the manner of deployment of the light field camera array 120 are not limited to that shown in fig. 2, but may be in any other form suitable for implementation, and the disclosure is not limited in this regard.
The image capturing device 130 is a camera used for eye tracking of the user, for example a high-precision RGB camera. The image capturing device 130 determines the user's current viewpoint information, i.e., the position information of the user's eyes, by capturing a current scene image and performing image detection on it. In the example of fig. 2, the image capture device 130 is disposed below the display screen 110 of the electronic device, but it is understood that the present disclosure does not limit the location of the image capture device 130.
On the basis of the video communication systems shown in fig. 1 and 2 described above, a light field image processing method according to an embodiment of the present disclosure will be described below.
It should be noted that, for ease of understanding, in the following embodiments of the present disclosure, an unidirectional video communication scene will be taken as an example, that is, the acquisition device 100 is used as a light field data acquisition end, the display device 200 is used as a light field video display end, and the principle of the bidirectional video communication scene is exactly the same as that, which is not repeated in the present disclosure.
In some implementations, the disclosed examples provide a light field image processing method that is applicable in display device 200, the processing being performed by a processor of display device 200, as described below in connection with fig. 3.
As shown in fig. 3, in some embodiments, a light field image processing method of an example of the present disclosure includes:
S310, acquiring target viewpoint information and a light field image group transmitted by the acquisition device.
It will be appreciated that the light field image comprises a number of viewpoint images at a plurality of viewpoint positions, but the viewpoint positions that the user's eyes can receive are limited, i.e. not all the viewpoint images will enter the user's eyes, so that the target light field image rendered and displayed on the display device 200 is a viewpoint composite image generated in combination with the current viewpoint position of the user. Therefore, in the embodiment of the disclosure, the viewpoint image synthesis needs to be performed in combination with the current target viewpoint information of the user, so as to obtain the target light field image.
In combination with the scenario shown in fig. 1, on the display device 200 side, eye tracking of user B may be implemented by the image acquisition device 130 on the display device 200 together with an eye tracking algorithm, so as to obtain the position information of user B's eyes; this position information is the target viewpoint information in the disclosure. The process of obtaining the target viewpoint information by eye tracking is described in the following embodiments of the present disclosure and is not detailed here.
Meanwhile, the display device 200 also needs to receive a light field image group from the acquisition device 100, which refers to an image group acquired by the light field camera array 120 on the acquisition device 100. Referring to fig. 2, in this example, the light field camera array includes 9 light field cameras, each of which can collect a light field image, and the light field image group is a set of the light field images.
It should be noted that, in some embodiments, the light field image group may include a light field image acquired by each light field camera, for example, in the example of fig. 2, 9 light field cameras acquire one light field image respectively, so that the light field image group includes 9 light field images in total, that is, each light field camera 121 in the light field camera array 120 is a target light field camera in the disclosure.
In other embodiments, as can be seen from the foregoing principle, not all the viewpoint images can enter the eyes of the observer due to the limited number of viewpoints that the human eyes can receive, i.e., there are a large number of redundant images in the light field image group acquired by the light field camera array 120, and this part of redundant image data also results in a slow data processing speed.
Therefore, one or more target light field cameras can be selected from the plurality of light field cameras based on the target viewpoint information to collect light field images, and the target light field cameras are light field cameras at positions corresponding to the target viewpoint information. It can be understood that only the target light field camera at the viewpoint position of the user is used for acquiring the light field image group, redundant image data can be screened out, the operation amount is reduced under the condition of ensuring the image precision, and the data processing efficiency is improved.
S320, performing viewpoint fusion on the foreground images in each light field image based on the target viewpoint information to obtain the foreground viewpoint image corresponding to the target viewpoint information.
S330, performing viewpoint fusion on the background images in the light field images based on the target viewpoint information and the pre-generated background parallax images of each viewpoint position to obtain background viewpoint images corresponding to the target viewpoint information.
It can be understood that the light field image displayed on the display device 200 needs to change along with the viewpoint position of the user, so that when the target light field image is generated, the light field image in the light field image group needs to be synthesized by combining the target viewpoint information of the user, so as to obtain the target light field image under the new viewpoint of the user, and realize the picture follow-up effect.
In the embodiment of the disclosure, in order to accelerate viewpoint image synthesis and meet the real-time requirement of light field video, viewpoint image synthesis is split into foreground viewpoint image synthesis and background viewpoint image synthesis.
It should be noted that, in conjunction with a video communication scenario, taking a video conference as an example, a large-screen electronic device for implementing the video conference tends to be relatively fixed in position, so that during the video communication, a background portion in a scene image acquired by the acquisition device 100 will hardly change, and generally only a foreground person or object will move.
Therefore, in the embodiment of the present disclosure, segmentation of the foreground and the background in the light field image may be considered, and for the background image with small variation, the background parallax map of each viewpoint position may be generated in advance by means of preprocessing.
A disparity map (parallax map) is an image reflecting pixel offsets: it is a two-dimensional image of the same size as the original image, and each pixel on the disparity map represents the disparity value of the pixel at that position in the original image. For example, in fig. 2, taking any two adjacent light field cameras 121 in the light field camera array 120 as an example: because the positions of the two cameras differ, the pixels in the light field images they capture are also offset from each other, and the disparity map records the pixel offsets between the two images.
For a real-time light field video communication scene, since the light field camera array 120 includes a large number of cameras and each camera position represents a viewpoint position, when the viewpoint images are synthesized, a parallax map of each light field image and an adjacent image needs to be calculated, and the calculated data volume is large, which makes it difficult to realize real-time performance.
In the embodiment of the disclosure, the background portion of the scene where user A is located is considered to hardly change, so a parallax map can be generated in advance for the background portion of the light field images. For example, when the acquisition device 100 is turned on, the light field camera array 120 on the acquisition device 100 acquires a current scene image, and the background parallax map corresponding to each viewpoint position is built in advance. When the acquisition device 100 subsequently establishes a video communication connection with the display device 200, the background parallax maps are sent to the display device 200. The display device 200 can then synthesize the background viewpoint image for the background portion of the current light field images directly from the pre-obtained background parallax maps, without recalculating the parallax map of the background portion, which reduces the data volume and accelerates viewpoint image synthesis.
Meanwhile, as for the foreground images in the light field images, combining with video communication scenes, the foreground people generally occupy only a small range of the images, and after the complex background images are segmented out, the speed of view image synthesis for the foreground images is also greatly improved, so that the real-time performance and low-delay requirements of video communication are met.
In some embodiments of the present disclosure, after receiving the light field image group sent by the acquisition device 100, the display device 200 may first perform image segmentation on each light field image in the light field image group to obtain a foreground image and a background image corresponding to each light field image.
For the foreground images, since a foreground parallax map cannot be established in advance, parallax map calculation is performed on them in real time to obtain the foreground parallax map of each viewpoint position; viewpoint fusion is then performed on each foreground image based on the foreground parallax maps and the target viewpoint information to obtain the foreground viewpoint image under the corresponding target viewpoint information.
For the background images, since the background parallax images of the viewpoint positions are established in advance, the background viewpoint images corresponding to the target viewpoint information can be obtained by directly carrying out viewpoint fusion on the background images based on the pre-established background parallax images and the target viewpoint information without parallax image calculation.
For the viewpoint fusion process of the foreground viewpoint image and the background viewpoint image, this is described in the following embodiments of the present disclosure, which will not be described in detail herein.
S340, generating a target light field image according to the foreground viewpoint image and the background viewpoint image.
It will be appreciated that the foreground viewpoint image represents a foreground image at a new viewpoint position (i.e., a viewpoint position corresponding to the target viewpoint information), and the background viewpoint image represents a background image at a new viewpoint position (i.e., a viewpoint position corresponding to the target viewpoint information).
Therefore, after the foreground viewpoint image and the background viewpoint image are obtained, the foreground viewpoint image and the background viewpoint image are fused, and then the target light field image can be obtained. For example, in one example, the foreground viewpoint image is attached to the background viewpoint image, so that the target light field image can be obtained.
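A sketch of this "attach foreground to background" composition, assuming a foreground mask is available from the segmentation step (the mask handling is an assumption, not stated in the embodiment):

```python
import numpy as np

def compose_target_image(foreground_view, background_view, fg_mask):
    """Paste the synthesized foreground viewpoint image over the background
    viewpoint image. `fg_mask` (nonzero where the warped foreground has
    pixels) is a hypothetical helper input; the embodiment only states that
    the two viewpoint images are fused."""
    target = background_view.copy()
    covered = fg_mask > 0
    target[covered] = foreground_view[covered]
    return target
```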
After the target light field image is obtained, the target light field image can be rendered and displayed on a display screen of the display device 200, and the user B can view the target light field image through the display screen.
It will be appreciated that, in the foregoing embodiment, the generation process of one frame of the target light field image is described, and the foregoing process is repeatedly performed for each frame of the light field video, so that the playing of the light field video can be implemented on the display device 200.
Meanwhile, it can be understood that the viewpoint position of the observer (i.e., the viewpoint position corresponding to the target viewpoint information) is incorporated when generating the target light field image. Thus, when the observer's eye position moves, the target viewpoint information is updated accordingly, the resulting target light field image changes to the light field image corresponding to the new viewpoint position, and a naked-eye 3D effect in which the video picture follows the observer's viewpoint changes is presented. In a video conference scene, for example, this gives the user an immersive experience, eliminates the sense of distance produced by conventional two-dimensional video communication, and improves the user's communication experience.
As can be seen from the foregoing, in the embodiment of the present disclosure, by dividing the foreground and the background of the light field image, and implementing the synthesis of the viewpoint image based on the pre-generated background parallax map, the data volume of the synthesis of the viewpoint image is reduced, the operation efficiency and the accuracy are improved, and further, the real-time low-delay communication of the light field video is implemented, and the user video communication experience is improved.
As can be seen from the foregoing, in the embodiment of the present disclosure, a background disparity map for each viewpoint position needs to be generated in advance at the acquisition device 100, and a process of generating the background disparity map will be described below with reference to fig. 4.
As shown in fig. 4, in some embodiments, the light field image processing method of the examples of the present disclosure is applied to the acquisition device 100, and the process of generating the background disparity map by the acquisition device 100 includes:
S410, respectively acquiring current scene images through a plurality of light field cameras arranged on the acquisition device, to obtain a scene image at the viewpoint position corresponding to each light field camera.
S420, for any two adjacent light field cameras, generating a background parallax image of the viewpoint position of each light field camera according to scene images acquired by the two light field cameras respectively.
S430, transmitting the background disparity map of each viewpoint position to the display device, so that the display device stores the background disparity map of each viewpoint position.
In some embodiments, the processes of S410-S430 may be performed to generate a background disparity map for the current scene each time the acquisition device 100 is powered on.
For example, in the example of fig. 1, when the capturing device 100 is turned on, at this time, the capturing device 100 does not establish a video communication connection with the display device 200, so that no user a is in the scene where the capturing device 100 is located, and only the background is included, and at this time, the capturing device 100 may be used to generate a background disparity map for each viewpoint position.
As shown in connection with fig. 2, the acquisition device 100 may acquire one current scene image by each light field camera in the light field camera array 120 disposed above, that is, in the example of fig. 2, 9 light field cameras 121 in the light field camera array 120 acquire one current scene image.
It can be understood that, since the deployment position of each light field camera 121 in the light field camera array 120 is different, the acquired scene image corresponds to a different viewpoint position, and the parallax map of a certain viewpoint position can be understood as the pixel deviation of the light field image of that viewpoint position relative to the light field images of other viewpoint positions. In the embodiment of the disclosure, the background disparity map of each viewpoint position is a disparity map of a light field image of the viewpoint position and a light field image of an adjacent viewpoint position.
In some embodiments of the present disclosure, the background disparity map of each viewpoint position relative to its neighboring viewpoint positions may be determined based on a disparity estimation algorithm; for the specific process of the disparity estimation algorithm, reference may be made to the method for calculating the foreground disparity map described below, which is not detailed here.
In other embodiments, considering that the background disparity map is constructed in advance, there is no constraint on computation speed, and a higher-precision disparity estimation method can be adopted. For example, a disparity estimation network based on a deep neural network (DNN) may be used: the network is trained in advance on manually labeled data, and the scene images of any two adjacent viewpoint positions are then input into the network to obtain the background disparity map corresponding to each viewpoint position.
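Because this stage runs offline, any high-accuracy matcher may be plugged in. The sketch below builds the per-viewpoint cache around a generic `estimate_disparity` stand-in, which could be the disparity estimation network mentioned above or the coarse-to-fine matcher sketched earlier; the pairing of each viewpoint with its right-hand neighbour is an illustrative choice:

```python
def build_background_disparity_cache(scene_images, estimate_disparity):
    """Offline precompute: for each viewpoint, estimate the background
    parallax map against an adjacent viewpoint and store it per position.
    `scene_images` is an ordered list, one image per light field camera;
    `estimate_disparity` is whatever matcher the deployment chooses."""
    cache = {}
    for i in range(len(scene_images) - 1):
        # Disparity of viewpoint i relative to its right-hand neighbour
        cache[i] = estimate_disparity(scene_images[i], scene_images[i + 1])
    # The last viewpoint only has a left-hand neighbour
    cache[len(scene_images) - 1] = estimate_disparity(scene_images[-1],
                                                      scene_images[-2])
    return cache
```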
After obtaining the background disparity map for each viewpoint position through the above-described procedure, the acquisition apparatus 100 may store the background disparity map in a buffer.
After completing the preparation work of the offline stage, when the display device 200 establishes a video communication connection with the capture device 100, it indicates that the user B of the display device 200 needs to perform video communication with the user a of the capture device 100, at which time the capture device 100 may send the background disparity map stored in the buffer to the display device 200, and after receiving the background disparity map of each viewpoint position, the display device 200 may store the background disparity map to wait for a subsequent viewpoint image synthesis stage call. The following will describe the interaction process of the capturing apparatus 100 and the display apparatus 200 in the video communication process.
In connection with the foregoing, in the video communication scene, the viewpoint that can be observed by the user B of the display device 200 is limited, and thus, there are a large number of redundant images in the light field image group acquired by the light field camera array 120 of the acquisition device 100. In some embodiments of the present disclosure, the target light field camera may be determined from the plurality of light field cameras of the acquisition device 100 according to the current eye position information of the user B, as described below in connection with fig. 5.
As shown in fig. 5, in some embodiments, a light field image processing method of an example of the present disclosure includes:
S510, the display device acquires a scene image through the image acquisition device.
As shown in connection with fig. 1 and 2, the display device 200 may acquire an image of a scene including the user B through an image acquisition means 130 provided on the device.
S520, the display device performs image detection on the scene image to obtain the position information of the observer's eyes in the scene image.
The display device 200 may then perform image detection on the captured scene image based on the eye tracking algorithm to determine positional information of the eyes of the observer (i.e., user B), which positional information represents the image position of the eyes of the observer on the scene image.
S530, the display device generates target viewpoint information based on the position information.
Thereafter, according to the position information of the observer's eyes, the display device 200 may map the position information onto the viewpoint positions of the light field camera array 120, obtaining the target viewpoint information; that is, the target viewpoint information represents the position of the observer's eyes in the coordinate system of the light field camera array 120.
For example, in one example, the position information of the eyes of user B is mapped to target viewpoint information in the light field camera array 120 of the display device 200 as shown in FIG. 6, i.e., the target viewpoint information corresponds to the viewpoint positions of the light field cameras 121-a, 121-B, and 121-c.
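As a sketch of S510-S530, the following code detects the observer's eyes with an off-the-shelf OpenCV Haar cascade and linearly maps the mean eye position onto the camera-array axis; both the detector and the linear mapping are illustrative assumptions, not requirements of the disclosure:

```python
import cv2

def get_target_viewpoint(scene_image, num_cameras=9):
    """Detect the observer's eyes in the captured scene image and map their
    mean horizontal position onto the camera-array axis, returning a
    fractional viewpoint index (or None if no eyes are found)."""
    gray = cv2.cvtColor(scene_image, cv2.COLOR_BGR2GRAY)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_eye.xml")
    eyes = detector.detectMultiScale(gray)
    if len(eyes) == 0:
        return None
    # Mean horizontal eye position, normalized to [0, 1] across the image
    cx = sum(x + w / 2 for (x, y, w, h) in eyes) / len(eyes)
    u = cx / scene_image.shape[1]
    # Map to a fractional viewpoint index along the camera array
    return u * (num_cameras - 1)
```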
S540, the display device sends the target viewpoint information to the acquisition device.
The display device 200 transmits the target viewpoint information to the acquisition device 100 to cause the acquisition device 100 to determine one or more target light field cameras from the plurality of light field cameras according to the target viewpoint information.
S550, the acquisition device 100 determines one or more target light field cameras from the plurality of light field cameras according to the target viewpoint information.
In the embodiment of the present disclosure, as shown in connection with fig. 6, the target viewpoint information represents position information of the observer's eyes in the light field camera array 120, so that the acquisition device 100 can determine a preset number of target light field cameras from the light field camera array 120 according to the target viewpoint information.
It is understood that the target light field cameras are the light field cameras located near the viewpoint position indicated by the target viewpoint information. Their number can be selected according to the specific scene, provided that the viewpoint positions of the target light field cameras cover the viewpoint position indicated by the target viewpoint information. For example, in the example of fig. 6 the number of target light field cameras may be set to 3, so that the 3 target light field cameras cover the viewpoints of both of the user's eyes. Of course, those skilled in the art will understand that the number of target light field cameras is not limited to 3 and may be other values, which this disclosure does not repeat.
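As a rough sketch of this selection rule, the snippet below picks the preset number of cameras whose viewpoint positions lie nearest to the mapped eye viewpoints. The 3x3 grid layout, the coordinates, and the nearest-neighbor criterion are illustrative assumptions; the patent only requires that the selected cameras cover the viewpoint position of the target viewpoint information.

    import numpy as np

    # Hypothetical 3x3 camera grid on a 1 m x 1 m plane (Cam0..Cam8).
    grid_positions = np.array([(x * 0.5, y * 0.5) for y in range(3) for x in range(3)])

    def select_target_cameras(camera_positions, eye_viewpoints, count=3):
        cams = np.asarray(camera_positions, dtype=float)   # (N, 2) camera positions
        vps = np.asarray(eye_viewpoints, dtype=float)      # (M, 2) eye viewpoints
        # For each camera, the distance to its closest requested viewpoint.
        dist = np.linalg.norm(cams[:, None, :] - vps[None, :, :], axis=2).min(axis=1)
        # Indices of the `count` nearest cameras become the target cameras.
        return np.argsort(dist)[:count]

    target_idx = select_target_cameras(grid_positions, [(0.35, 0.5), (0.6, 0.5)])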
S560, the acquisition device acquires light field images through the target light field cameras to obtain a light field image group.
After the target light field cameras are determined, the acquisition device 100 can capture the current scene with them, obtaining a light field image from each target light field camera; the light field image group is the set of light field images acquired by the target light field cameras.
In some embodiments, after the target light field cameras acquire the light field images, preprocessing may further be performed on each light field image. The purpose of preprocessing is to improve the accuracy of the light field images, and it may include, for example, image calibration and distortion correction.
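A minimal sketch of such a preprocessing step, assuming OpenCV and per-camera calibration parameters obtained offline; the intrinsic values below are placeholders, not calibration data from the patent.

    import cv2
    import numpy as np

    def preprocess(raw_frame, camera_matrix, dist_coeffs):
        # Distortion correction is one example of the preprocessing mentioned above.
        return cv2.undistort(raw_frame, camera_matrix, dist_coeffs)

    # Hypothetical intrinsics for one light field camera.
    K = np.array([[1200.0, 0.0, 960.0],
                  [0.0, 1200.0, 540.0],
                  [0.0, 0.0, 1.0]])
    dist = np.array([-0.12, 0.05, 0.0, 0.0, 0.0])  # k1, k2, p1, p2, k3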
In the embodiment of the disclosure, in order to increase the preprocessing speed of the light field images and help meet real-time requirements, a multi-processing-chip circuit topology can be adopted. For example, in the example of fig. 7, Cam0 to Cam8 are the light field cameras in the light field camera array 120. In this circuit topology, the cameras are assigned to processing chips in order of their serial numbers Cam0 to Cam8, so that any 3 adjacent cameras fall on different chips: the circuit includes 3 processing chips, where the inputs of processing chip TX_0 are connected to Cam0, Cam3, and Cam6; the inputs of processing chip TX_1 to Cam1, Cam4, and Cam7; and the inputs of processing chip TX_2 to Cam2, Cam5, and Cam8.
Therefore, with the circuit structure shown in fig. 7, for any 3 adjacent target light field cameras it is guaranteed that each processing chip independently handles one image path, realizing parallel processing of the light field image data and increasing the image preprocessing speed.
S570, the acquisition device sends the light field image group to the display device.
In the embodiment of the present disclosure, the acquisition device 100 obtains the light field image group after preprocessing the light field images acquired by the target light field cameras, and then transmits the light field image group to the display device 200.
As can be seen from the foregoing, in the embodiment of the present disclosure, the target light field cameras are determined from the light field camera array based on the user's current target viewpoint information before light field image acquisition, so that light field images at redundant viewpoint positions are filtered out, the data volume for viewpoint image synthesis is reduced, and the light field image processing efficiency is improved.
For the display device 200, after receiving the light field image group sent by the acquisition device 100, it can perform viewpoint image synthesis according to the target viewpoint information and the light field image group. As can be seen from the foregoing, in the embodiment of the present disclosure, viewpoint image synthesis is divided into foreground viewpoint image synthesis and background viewpoint image synthesis, so after the display device 200 obtains the light field image group, each light field image must first be segmented into foreground and background. In some implementations, a light field image processing method of examples of the present disclosure includes:
performing image segmentation on each light field image in the light field image group to obtain a foreground image and a background image corresponding to each light field image.
It will be appreciated that the set of light field images includes light field images captured by each target light field camera, such as 3 light field images in the light field image set received by display device 200 in the example of fig. 7.
The display device 200 may then perform image segmentation on each light field image, the purpose of the image segmentation being to segment the foreground and background from the light field image. Taking a video communication scene as an example, the foreground is a person in the light field image, and the background is other image parts except the person.
In the embodiment of the disclosure, a specific algorithm for image segmentation is not limited, and a person skilled in the art may use any image segmentation algorithm in the related art to realize segmentation of the foreground and the background. However, in order to increase the image segmentation speed, in some embodiments of the present disclosure, image segmentation may be performed based on a background image obtained in advance.
It can be understood that, as can be seen in connection with the embodiment of fig. 4, in the process of generating the background disparity map by the capturing device 100 in advance, the current scene image captured by the capturing device 100 is the background image that does not include the person. In other words, in the foregoing S410, the capturing device 100 has captured, through each light field camera, a background image of each viewpoint position, which is a preset background image according to the present disclosure.
Thus, in some embodiments, in S430, when the capturing device 100 transmits the background disparity map to the display device 200, the scene images of the respective viewpoint positions captured by each light field camera are simultaneously transmitted to the display device 200, and the display device 200 receives and stores the scene images, that is, the preset background images as described in the present disclosure.
Therefore, in the image segmentation process, for the light field image at a certain viewpoint position, an image difference can be computed between the light field image and the preset background image corresponding to that viewpoint position: image positions where the two are the same belong to the background area, while image positions where they differ belong to the foreground area. Based on this principle, foreground-background segmentation of each light field image can be performed quickly, increasing the image segmentation speed.
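A minimal sketch of this difference-based segmentation, assuming OpenCV, 8-bit BGR frames, and an illustrative threshold value; a production system would likely add more robust thresholding and mask refinement.

    import cv2
    import numpy as np

    def split_foreground_background(light_field_img, preset_background, thresh=25):
        # Difference against the stored preset background of the same viewpoint.
        diff = cv2.absdiff(light_field_img, preset_background)
        gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
        # Pixels that changed belong to the foreground (person); the rest are background.
        _, mask = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY)
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
        foreground = cv2.bitwise_and(light_field_img, light_field_img, mask=mask)
        background = cv2.bitwise_and(light_field_img, light_field_img,
                                     mask=cv2.bitwise_not(mask))
        return foreground, background, mask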
After the above process is performed on each light field image, a foreground image and a background image corresponding to each light field image can be obtained, and then, the foreground viewpoint image synthesis and the background viewpoint image synthesis are realized based on the foreground image and the background image, which are described below respectively.
As shown in fig. 8, in some embodiments, the process of synthesizing a foreground viewpoint image in the light field image processing method of an example of the present disclosure includes:
S810, for the first light field image and the second light field image corresponding to any two adjacent target light field cameras, performing parallax estimation on a first foreground image corresponding to the first light field image and a second foreground image corresponding to the second light field image to obtain a first foreground parallax image corresponding to the first foreground image and a second foreground parallax image corresponding to the second foreground image.
S820, performing parallax mapping on the first foreground image based on the first foreground parallax image to obtain a first foreground map, and performing parallax mapping on the second foreground image based on the second foreground parallax image to obtain a second foreground map.
S830, performing image fusion processing on the first foreground mapping image and the second foreground mapping image based on the target viewpoint information to obtain a foreground viewpoint image.
It should be noted that, to realize viewpoint image synthesis, the disparity maps of the images at two adjacent viewpoint positions must first be calculated. In the embodiment of the present disclosure, the synthesis of the foreground and background viewpoint images will be described by taking two light field images at any adjacent viewpoint positions as an example.
For example, in the example of fig. 7, the light field image group includes 3 light field images, which are respectively a first light field image I1, a second light field image I2 and a third light field image I3, wherein the first light field image I1 and the second light field image I2 are light field images at adjacent viewpoint positions, and the second light field image I2 and the third light field image I3 are light field images at adjacent viewpoint positions. In the following embodiments of the present disclosure, the first light field image I1 and the second light field image I2 will be taken as examples, and the second light field image I2 and the third light field image I3 may refer to the same method steps, which are not described herein.
In combination with the foregoing, image segmentation of the first light field image I1 yields the corresponding first foreground image I_fore1 and first background image I_back1; likewise, image segmentation of the second light field image I2 yields the corresponding second foreground image I_fore2 and second background image I_back2.
For foreground viewpoint image synthesis, a disparity map must first be determined based on the first foreground image I_fore1 and the second foreground image I_fore2, as described below in connection with fig. 9.
As shown in fig. 9, in some embodiments, a process of obtaining a first foreground disparity map and a second foreground disparity map according to a light field image processing method of an example of the present disclosure includes:
S811, downsampling the first foreground image based on a preset downsampling coefficient to obtain a first downsampled image, and downsampling the second foreground image to obtain a second downsampled image.
It should be noted that, in the embodiment of the present disclosure, the original scale of the light field image is large, so performing disparity estimation at the original scale involves a large data volume; the foreground images at the original scale can therefore be downsampled, reducing the image size and data volume and improving the computation speed.
Referring to fig. 10, the first foreground image I_fore1 at the original scale may first be downsampled to obtain the first downsampled map I_fore1'; likewise, the second foreground image I_fore2 at the original scale is downsampled to obtain the second downsampled map I_fore2'.
It should be noted that the preset downsampling coefficient is the factor by which the image is reduced and can be set according to the specific scene: the greater the reduction, the faster the computation but the lower the accuracy; the smaller the reduction, the slower the computation but the higher the accuracy. Based on this rule, in one example the preset downsampling coefficient may be 1/4, 1/8, 1/16, or the like.
S812, matching the positions of the same pixels on the first downsampled image and the second downsampled image to obtain a first disparity map corresponding to the first downsampled image and a second disparity map corresponding to the second downsampled image.
In the embodiment of the present disclosure, disparity estimation is performed on the downsampled versions of the first foreground image I_fore1 and the second foreground image I_fore2, i.e., on the first downsampled map I_fore1' and the second downsampled map I_fore2'. The basic principle is to match the positions of the same pixels across the two images to determine the pixel offset distance; that is, each pixel value on a disparity map represents the offset distance of the corresponding pixel.

Taking the first foreground image I_fore1 as an example, the first disparity map I_forepara1' corresponding to I_fore1 represents the offset distance of the same pixels in the first foreground image I_fore1 relative to the second foreground image I_fore2.

Thus, disparity estimation can be realized through this pixel matching process, yielding the first disparity map I_forepara1' corresponding to the first foreground image I_fore1 and the second disparity map I_forepara2' corresponding to the second foreground image I_fore2. The first disparity map I_forepara1' and the second disparity map I_forepara2' may be as shown in fig. 10.
S813, determining a first parallax searching range according to the first parallax map and a preset downsampling coefficient, and determining a second parallax searching range according to the second parallax map and the preset downsampling coefficient.
As can be seen from fig. 10, the image scale of the first disparity map I_forepara1' and the second disparity map I_forepara2' is the same as the downsampled image scale, so the first disparity map I_forepara1' and the second disparity map I_forepara2' need to be restored to the original scale.

Taking the first disparity map I_forepara1' as an example: because the scale of I_forepara1' is smaller, each of its pixels corresponds to a range on the original map. In the embodiment of the present disclosure, the first disparity map I_forepara1' may therefore be divided into image blocks, and a disparity search range (min', max') determined for the pixels corresponding to each image block; the search range (min', max') represents the pixel range of the image block mapped to the original image size, and its specific values can be determined from the maximum and minimum pixel values within the image block.

The search range (min', max') is then restored to the original image scale: its minimum and maximum are multiplied by the reciprocal of the preset downsampling coefficient to obtain the first disparity search range (min, max), which represents the disparity search range at the original image scale. The second disparity map I_forepara2' is processed in the same way to obtain the second disparity search range.
S814, performing parallax estimation on the first foreground image based on the first parallax search range to obtain a first foreground parallax image, and performing parallax estimation on the second foreground image based on the second parallax search range to obtain a second foreground parallax image.
In the embodiment of the disclosure, taking the first foreground image I_fore1 as an example: after the first disparity search range (min, max) is obtained as described above, it represents the search range for matching the same pixels between the first foreground image I_fore1 and the second foreground image I_fore2. Disparity estimation is therefore performed on the first foreground image I_fore1 based on the first disparity search range, yielding the first foreground disparity map I_forepara1 corresponding to I_fore1. Similarly, disparity estimation is performed on the second foreground image I_fore2 based on the second disparity search range, yielding the second foreground disparity map I_forepara2 corresponding to I_fore2.
As can be seen from the foregoing, in the embodiment of the present disclosure, by performing parallax estimation after downsampling a foreground image, the data size can be effectively reduced, the image processing efficiency is improved, and the real-time requirement of a video communication scene is satisfied.
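The sketch below mirrors this coarse-to-fine flow using OpenCV's StereoSGBM as a stand-in matcher, since the patent does not prescribe a specific matching algorithm. For brevity it derives a single global search range from the coarse disparity map instead of the per-image-block ranges described above, and it assumes rectified 8-bit grayscale foreground images.

    import cv2
    import numpy as np

    def coarse_to_fine_disparity(left, right, scale=0.25):
        # 1. Downsample both foreground images by the preset coefficient.
        small_l = cv2.resize(left, None, fx=scale, fy=scale, interpolation=cv2.INTER_AREA)
        small_r = cv2.resize(right, None, fx=scale, fy=scale, interpolation=cv2.INTER_AREA)

        # 2. Cheap full-range matching at the small scale (SGBM returns fixed-point x16).
        coarse = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64,
                                       blockSize=5).compute(small_l, small_r) / 16.0

        # 3. Search range from the coarse map, multiplied by the reciprocal of `scale`.
        valid = coarse[coarse > 0]
        d_min = int(valid.min() / scale)
        d_max = int(valid.max() / scale)

        # 4. Full-resolution matching restricted to (min, max); SGBM needs a
        #    disparity count that is a multiple of 16.
        num = max(16, int(np.ceil((d_max - d_min) / 16.0)) * 16)
        fine = cv2.StereoSGBM_create(minDisparity=d_min, numDisparities=num,
                                     blockSize=7).compute(left, right) / 16.0
        return fine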
In the embodiment of the disclosure, after the first foreground disparity map I_forepara1 corresponding to the first foreground image I_fore1 and the second foreground disparity map I_forepara2 corresponding to the second foreground image I_fore2 have been obtained, viewpoint synthesis is performed on the foreground images according to the foreground disparity maps to obtain the foreground viewpoint image at the new viewpoint position, as described below with reference to fig. 11.
As shown in fig. 11, in some embodiments, a light field image processing method of an example of the present disclosure, a process of obtaining a foreground viewpoint image includes:
S1110, matching each pixel on the first foreground image to a corresponding pixel position on the mapping graph according to the first foreground parallax graph to obtain a first foreground mapping graph; and matching each pixel on the second foreground image to the corresponding pixel position on the mapping graph according to the second foreground parallax graph to obtain a second foreground mapping graph.
In combination with the foregoing, a disparity map reflects the shift in pixel position caused by the difference in viewpoint positions, and each pixel on the disparity map represents the disparity value of the pixel at that position on the original map. Taking the first foreground disparity map I_forepara1 and the second foreground disparity map I_forepara2 as an example, and neglecting calculation errors, a pixel coordinate on the first foreground image I_fore1 plus the corresponding disparity value on I_forepara1 gives the corresponding pixel coordinate on the second foreground image I_fore2; likewise, a pixel coordinate on the second foreground image I_fore2 plus the corresponding disparity value on I_forepara2 gives the corresponding pixel coordinate on the first foreground image I_fore1.
Thus, after the first foreground disparity map I_forepara1 and the second foreground disparity map I_forepara2 are obtained, pixel offset processing can be performed on the first foreground image I_fore1 and the second foreground image I_fore2 according to them; this pixel offset processing is a pixel coordinate mapping process. Mapping the first foreground image I_fore1 yields the first foreground map I_foremap1, and mapping the second foreground image I_fore2 yields the second foreground map I_foremap2.
Taking the first foreground map I_foremap1 as an example: each pixel on the first foreground image I_fore1 is mapped, based on the first foreground disparity map I_forepara1, to its corresponding pixel position, thereby obtaining the first foreground map I_foremap1. Similarly, the second foreground image I_fore2 is mapped according to the second foreground disparity map I_forepara2 to obtain the corresponding second foreground map I_foremap2.
It is worth noting that before the mapping of the first foreground image I_fore1 and the second foreground image I_fore2, a consistency check may also be performed on the first foreground disparity map I_forepara1 and the second foreground disparity map I_forepara2; the purpose of the consistency check is to correct the two disparity maps and thereby improve their accuracy. For the consistency check procedure, those skilled in the art may refer to the related art, which this disclosure does not repeat.
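A minimal sketch of this pixel-coordinate mapping (forward warping). The fractional factor alpha and the sign convention are illustrative assumptions: here the target viewpoint is parameterized by how far it lies from this camera toward its neighbor, a common convention rather than the patent's exact formulation.

    import numpy as np

    def warp_to_novel_view(image, disparity, alpha):
        # Shift each pixel horizontally by a fraction `alpha` of its disparity;
        # alpha in [0, 1] places the result at an intermediate viewpoint.
        h, w = disparity.shape
        warped = np.zeros_like(image)
        ys, xs = np.mgrid[0:h, 0:w]
        xt = np.round(xs + alpha * disparity).astype(int)
        ok = (xt >= 0) & (xt < w) & (disparity > 0)  # treat disparity 0 as invalid
        warped[ys[ok], xt[ok]] = image[ys[ok], xs[ok]]
        return warped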
S1120, determining a first weight and a second weight according to the target viewpoint information and the position information of any two adjacent target light field cameras.
S1130, performing image fusion processing on the first foreground mapping image and the second foreground mapping image based on the first weight and the second weight to obtain a foreground viewpoint image.
In view of the foregoing, the target viewpoint information refers to the user's current new viewpoint position, so viewpoint image synthesis must be performed on the first foreground map I_foremap1 and the second foreground map I_foremap2 in combination with the target viewpoint information, thereby obtaining the foreground viewpoint image at the user's new viewpoint position.

When viewpoint image synthesis is performed on the first foreground map I_foremap1 and the second foreground map I_foremap2, the weights of the two maps must first be determined; each weight represents the proportion of the corresponding map in the fusion of I_foremap1 and I_foremap2.

It will be appreciated that the first foreground map I_foremap1 corresponds to the viewpoint position of one target light field camera and the second foreground map I_foremap2 to that of another, while the viewpoint position corresponding to the target viewpoint information may be any position between the two. Since the new viewpoint lies at different distances from the viewpoint positions of the two target light field cameras, the corresponding weight values should also differ.
Based on this principle, in the embodiment of the present disclosure, the fusion coefficients for the left and right foreground maps, i.e., the first weight and the second weight, may be determined according to the target viewpoint information and the position information of the left and right target light field cameras. For example, if the viewpoint position indicated by the target viewpoint information is closer to the viewpoint of the first foreground map I_foremap1, the first weight w1 corresponding to I_foremap1 is larger; conversely, if it is closer to the viewpoint of the second foreground map I_foremap2, the second weight w2 corresponding to I_foremap2 is larger. Accordingly, a person skilled in the art may determine the specific values of the first weight w1 and the second weight w2 in combination with the target viewpoint information, which this disclosure does not limit.
After the first weight w1 and the second weight w2 are determined according to the target viewpoint information, fusion processing can be performed on the first foreground map I_foremap1 and the second foreground map I_foremap2 according to w1 and w2. The fusion process can be expressed as:

P = w1 * I_foremap1 + w2 * I_foremap2    (1)

In formula (1), P represents the foreground viewpoint image obtained after the fusion process. Through the above process, viewpoint image synthesis is performed on the first foreground image I_fore1 and the second foreground image I_fore2 in combination with the target viewpoint information, yielding the foreground viewpoint image P at the corresponding new viewpoint position.
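A small sketch of the weight selection and the fusion of formula (1). Linear weights derived from the normalized position of the target viewpoint on the baseline are an assumption; the patent leaves the exact weight values to the implementer, requiring only that the nearer map receives the larger weight.

    import numpy as np

    def viewpoint_to_t(vp, cam_left, cam_right):
        # Normalized position of the target viewpoint between the two cameras.
        vp, a, b = map(np.asarray, (vp, cam_left, cam_right))
        t = np.linalg.norm(vp - a) / np.linalg.norm(b - a)
        return float(np.clip(t, 0.0, 1.0))

    def fuse_foreground_maps(map1, map2, t):
        # Formula (1): P = w1 * I_foremap1 + w2 * I_foremap2, with w1 + w2 = 1.
        w1, w2 = 1.0 - t, t  # viewpoint closer to camera 1 -> larger w1
        blended = w1 * map1.astype(np.float32) + w2 * map2.astype(np.float32)
        return blended.astype(map1.dtype)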
As shown in fig. 12, in some embodiments, the process of synthesizing a background viewpoint image in the light field image processing method of an example of the present disclosure includes:
S1210, for a first light field image and a second light field image corresponding to any two adjacent target light field cameras, performing parallax mapping on the first background image of the first light field image based on a pre-generated first background parallax image with the same viewpoint position as the first light field image to obtain a first background mapping image, and performing parallax mapping on the second background image of the second light field image based on a pre-generated second background parallax image with the same viewpoint position as the second light field image to obtain a second background mapping image.
S1220, performing image fusion processing on the first background mapping image and the second background mapping image based on the target viewpoint information to obtain a background viewpoint image.
For background viewpoint image synthesis, taking the first light field image I1 and the second light field image I2 at adjacent viewpoint positions in the light field image group as an example, image segmentation yields the first background image I_back1 of the first light field image I1 and the second background image I_back2 of the second light field image I2.
When synthesizing viewpoint images from the first background image I_back1 and the second background image I_back2, there is no need to compute a disparity map online as in foreground viewpoint image synthesis, because the background disparity map for each viewpoint position has already been generated in the offline stage through the aforementioned embodiment of fig. 4.
Thus, for background viewpoint image synthesis, the first background disparity map I_backpara1 with the same viewpoint position as the first background image I_back1 is first determined from the stored background disparity maps; in the same way, the second background disparity map I_backpara2 with the same viewpoint position as the second background image I_back2 is determined.

After the background disparity maps are determined, background viewpoint image synthesis can be performed on the first background image I_back1 and the second background image I_back2 in combination with the new viewpoint position indicated by the target viewpoint information. The procedure follows the same principle as the foreground viewpoint image synthesis described above, so only a brief description is given below.

First, each pixel on the first background image I_back1 can be offset based on the first background disparity map I_backpara1; this offset is a mapping process on pixel coordinates and yields the first background map I_backmap1. Similarly, the second background image I_back2 is mapped according to the second background disparity map I_backpara2 to obtain the corresponding second background map I_backmap2.
It is worth noting that before the mapping of the first background image I_back1 and the second background image I_back2, a consistency check may also be performed on the first background disparity map I_backpara1 and the second background disparity map I_backpara2; the purpose of the consistency check is to correct the two disparity maps and thereby improve their accuracy. For the consistency check procedure, those skilled in the art may refer to the related art, which this disclosure does not repeat.
Then, the fusion coefficients for the left and right background maps, i.e., the first weight w1 and the second weight w2, are determined according to the target viewpoint information and the position information of the left and right target light field cameras, as described above. Fusion processing can then be performed on the first background map I_backmap1 and the second background map I_backmap2 according to the first weight w1 and the second weight w2; the fusion process can be expressed as:

Q = w1 * I_backmap1 + w2 * I_backmap2    (2)

In formula (2), Q represents the background viewpoint image obtained after the fusion process. Through the above process, viewpoint image synthesis is performed on the first background image I_back1 and the second background image I_back2 in combination with the target viewpoint information, yielding the background viewpoint image Q at the corresponding new viewpoint position.
The foreground viewpoint image P and the background viewpoint image Q are obtained through the above process; fusing them, for example by overlaying the foreground viewpoint image P onto the background viewpoint image Q, finally yields the target light field image corresponding to the new viewpoint position. The above description covers only the viewpoint image synthesis between the first light field image I1 and the second light field image I2; the viewpoint image synthesis between any other two adjacent light field images proceeds in the same way, and performing it in turn yields the target light field image at each viewpoint position included in the user's target viewpoint information.
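A minimal sketch of this final composition, overlaying the foreground viewpoint image P onto the background viewpoint image Q. Reusing the segmentation mask to decide where P is valid is an assumption of this sketch; the patent states only that P and Q are fused.

    import numpy as np

    def composite_target_image(P, Q, fg_mask):
        # Start from the background viewpoint image and paste the foreground on top.
        out = Q.copy()
        out[fg_mask > 0] = P[fg_mask > 0]
        return out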
After one or more target light field images are obtained, they can be rendered and displayed on the display screen 110 of the display device 200; for the image rendering and display process, reference may be made to the related art, which this disclosure does not repeat.
As can be seen from the foregoing, in the embodiment of the present disclosure, the disparity maps of the background images are generated in advance by means of foreground-background segmentation, so that disparity maps need not be computed in real time for the complex background; this greatly reduces the data volume of viewpoint image synthesis, improves the image processing speed and accuracy, and enables real-time light field video communication. In addition, in the calculation of the foreground disparity maps, disparity estimation is performed on the downsampled small maps, further reducing the amount of computation and increasing the data processing speed. Furthermore, the target light field cameras are selected from the light field camera array according to the target viewpoint information, eliminating the acquisition, transmission, and processing of a large amount of redundant data, further increasing the image processing speed and achieving low-delay video communication.
In some embodiments, the present disclosure provides a light field image processing apparatus applicable to a display device 200, as shown in fig. 13, the light field image processing apparatus of the example of the present disclosure includes:
An acquisition module 10 configured to acquire target viewpoint information and a set of light field images transmitted by the acquisition device; the target viewpoint information represents the position information of eyes of an observer of the display device, and the light field image group comprises light field images acquired by each target light field camera in the acquisition device;
the foreground fusion module 20 is configured to perform viewpoint fusion on foreground images in each light field image based on target viewpoint information to obtain a foreground viewpoint image corresponding to the target viewpoint information;
the background fusion module 30 is configured to perform viewpoint fusion on background images in each light field image based on target viewpoint information and a pre-generated background parallax image of each viewpoint position, so as to obtain a background viewpoint image corresponding to the target viewpoint information; the viewpoint position represents a position corresponding to each target light field camera;
the image synthesis module 40 is configured to generate a target light field image from the foreground viewpoint image and the background viewpoint image.
In some implementations, the foreground fusion module 20 is configured to:
carrying out image segmentation on each light field image in the light field image group to obtain a foreground image corresponding to each light field image;
for a first light field image and a second light field image corresponding to any two adjacent target light field cameras, performing parallax estimation on a first foreground image corresponding to the first light field image and a second foreground image corresponding to the second light field image to obtain a first foreground parallax image corresponding to the first foreground image and a second foreground parallax image corresponding to the second foreground image;
Performing parallax mapping on the first foreground image based on the first foreground parallax image to obtain a first foreground map, and performing parallax mapping on the second foreground image based on the second foreground parallax image to obtain a second foreground map;
and carrying out image fusion processing on the first foreground mapping image and the second foreground mapping image based on the target viewpoint information to obtain a foreground viewpoint image.
In some implementations, the foreground fusion module 20 is configured to:
and for each light field image in the light field image group, carrying out image difference on the light field image based on a pre-generated preset background image with the same viewpoint position as the light field image, so as to obtain a foreground image and a background image corresponding to the light field image.
In some implementations, the foreground fusion module 20 is configured to:
downsampling the first foreground image based on a preset downsampling coefficient to obtain a first downsampled image, and downsampling the second foreground image to obtain a second downsampled image;
matching the positions of the same pixels on the first downsampling diagram and the second downsampling diagram to obtain a first parallax diagram corresponding to the first downsampling diagram and a second parallax diagram corresponding to the second downsampling diagram;
determining a first parallax searching range according to the first parallax map and a preset downsampling coefficient, and determining a second parallax searching range according to the second parallax map and the preset downsampling coefficient;
And performing parallax estimation on the first foreground image based on the first parallax search range to obtain a first foreground parallax image, and performing parallax estimation on the second foreground image based on the second parallax search range to obtain a second foreground parallax image.
In some implementations, the foreground fusion module 20 is configured to:
matching each pixel on the first foreground image to a corresponding pixel position on the mapping graph according to the first foreground parallax graph to obtain a first foreground mapping graph; matching each pixel on the second foreground image to a corresponding pixel position on the mapping graph according to the second foreground parallax graph to obtain a second foreground mapping graph;
determining a first weight and a second weight according to the target viewpoint information and the position information of any two adjacent target light field cameras;
and carrying out image fusion processing on the first foreground mapping image and the second foreground mapping image based on the first weight and the second weight to obtain a foreground viewpoint image.
In some implementations, the context fusion module 30 is configured to:
carrying out image segmentation on each light field image in the light field image group to obtain a background image corresponding to each light field image;
for a first light field image and a second light field image corresponding to any two adjacent target light field cameras, performing parallax mapping on the first background image of the first light field image based on a pre-generated first background parallax image at the same viewpoint position as the first light field image to obtain a first background mapping image, and performing parallax mapping on the second background image of the second light field image based on a pre-generated second background parallax image at the same viewpoint position as the second light field image to obtain a second background mapping image;
And carrying out image fusion processing on the first background mapping image and the second background mapping image based on the target viewpoint information to obtain a background viewpoint image.
In some embodiments, the acquisition module 10 is configured to:
acquiring a scene image by an image acquisition device arranged on a display device;
performing image detection according to the scene image to obtain the position information of eyes of an observer in the scene image;
target viewpoint information is generated based on the position information.
In some embodiments, the acquisition module 10 is configured to:
transmitting the target viewpoint information to the acquisition device, so that the acquisition device determines one or more target light field cameras from the plurality of light field cameras according to the target viewpoint information;
and receiving the light field image group sent by the acquisition equipment.
In some embodiments, the acquisition module 10 is configured to:
and receiving and storing the background disparity map of each viewpoint position sent by the acquisition equipment.
As can be seen from the foregoing, in the embodiment of the present disclosure, the disparity maps of the background images are generated in advance by means of foreground-background segmentation, so that disparity maps need not be computed in real time for the complex background; this greatly reduces the data volume of viewpoint image synthesis, improves the image processing speed and accuracy, and enables real-time light field video communication. In addition, in the calculation of the foreground disparity maps, disparity estimation is performed on the downsampled small maps, further reducing the amount of computation and increasing the data processing speed. Furthermore, the target light field cameras are selected from the light field camera array according to the target viewpoint information, eliminating the acquisition, transmission, and processing of a large amount of redundant data, further increasing the image processing speed and achieving low-delay video communication.
In some embodiments, the present disclosure provides a light field image processing apparatus applicable to an acquisition device 100, as shown in fig. 14, the light field image processing apparatus of the examples of the present disclosure comprising:
the image acquisition module 50 is configured to acquire current scene images through a plurality of light field cameras arranged on the acquisition device respectively, so as to obtain scene images of viewpoint positions corresponding to each light field camera;
the parallax determining module 60 is configured to generate a background parallax map of the viewpoint position of each light field camera according to the scene images acquired by the two light field cameras respectively for any two adjacent light field cameras;
the transmitting module 70 is configured to transmit the background disparity map of each viewpoint position to the display device, so that the display device stores the background disparity map of each viewpoint position.
In some implementations, the transmission module 70 is configured to:
receiving target viewpoint information sent by display equipment;
determining one or more target light field cameras from a plurality of light field cameras included in the acquisition device according to the target viewpoint information;
acquiring light field images through a target light field camera to obtain a light field image group, and sending the light field image group to display equipment.
As can be seen from the foregoing, in the embodiment of the present disclosure, the disparity maps of the background images are generated in advance by means of foreground-background segmentation, so that disparity maps need not be computed in real time for the complex background; this greatly reduces the data volume of viewpoint image synthesis, improves the image processing speed and accuracy, and enables real-time light field video communication. In addition, in the calculation of the foreground disparity maps, disparity estimation is performed on the downsampled small maps, further reducing the amount of computation and increasing the data processing speed. Furthermore, the target light field cameras are selected from the light field camera array according to the target viewpoint information, eliminating the acquisition, transmission, and processing of a large amount of redundant data, further increasing the image processing speed and achieving low-delay video communication.
In some embodiments, the present disclosure provides a video communication system, which may be as shown in fig. 1, comprising:
a display device 200 comprising an image acquisition means and a first controller for performing the method of any of the above embodiments;
an acquisition device 100 comprising a light field camera array comprising a plurality of light field cameras, and a second controller for performing the method of any of the above embodiments.
In some embodiments, the present disclosure provides a storage medium storing computer instructions for causing a computer to perform the method of any of the above embodiments.
In some embodiments, the present disclosure provides an electronic device comprising:
a processor; and
and a memory storing computer instructions for causing the processor to perform the method of any of the embodiments described above.
In the embodiment of the present disclosure, the electronic device may be the acquisition device 100 or the display device 200, which is not limited in this disclosure. Specifically, fig. 15 shows a schematic structural diagram of an electronic device 600 suitable for implementing the method of the present disclosure, and by using the electronic device shown in fig. 15, the corresponding functions of the processor, the controller, and the storage medium described above may be implemented.
As shown in fig. 15, the electronic device 600 includes a processor 601 that can perform various appropriate actions and processes according to a program stored in a memory 602 or a program loaded into the memory 602 from a storage portion 608. In the memory 602, various programs and data required for the operation of the electronic device 600 are also stored. The processor 601 and the memory 602 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, mouse, etc.; an output portion 607 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. The drive 610 is also connected to the I/O interface 605 as needed. Removable media 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed as needed on drive 610 so that a computer program read therefrom is installed as needed into storage section 608.
In particular, according to embodiments of the present disclosure, the above method processes may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the method described above. In such an embodiment, the computer program can be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It should be apparent that the above embodiments are merely examples given for clarity of illustration and are not intended to limit the embodiments. It is neither necessary nor possible to exhaustively list all embodiments here; obvious variations or modifications derived from the above by those of ordinary skill in the art remain within the protection scope of the present disclosure.

Claims (16)

1. A light field image processing method, characterized by being applied to a display device, the method comprising:
acquiring target viewpoint information and a light field image group transmitted by acquisition equipment; the target viewpoint information represents position information of eyes of an observer of the display device, and the light field image group comprises light field images acquired by each target light field camera in the acquisition device;
performing viewpoint fusion on foreground images in each light field image based on the target viewpoint information to obtain a foreground viewpoint image corresponding to the target viewpoint information;
performing viewpoint fusion on background images in each light field image based on the target viewpoint information and a pre-generated background parallax image of each viewpoint position to obtain a background viewpoint image corresponding to the target viewpoint information; the viewpoint position represents a position corresponding to each target light field camera;
and generating a target light field image according to the foreground viewpoint image and the background viewpoint image.
2. The method according to claim 1, wherein performing viewpoint fusion on foreground images in each light field image based on the target viewpoint information to obtain a foreground viewpoint image corresponding to the target viewpoint information, includes:
Performing image segmentation on each light field image in the light field image group to obtain a foreground image corresponding to each light field image;
for a first light field image and a second light field image corresponding to any two adjacent target light field cameras, performing parallax estimation on a first foreground image corresponding to the first light field image and a second foreground image corresponding to the second light field image to obtain a first foreground parallax image corresponding to the first foreground image and a second foreground parallax image corresponding to the second foreground image;
performing parallax mapping on the first foreground image based on the first foreground parallax image to obtain a first foreground map, and performing parallax mapping on the second foreground image based on the second foreground parallax image to obtain a second foreground map;
and carrying out image fusion processing on the first foreground mapping image and the second foreground mapping image based on the target viewpoint information to obtain the foreground viewpoint image.
3. The method of claim 2, wherein the performing image segmentation on each light field image in the light field image group to obtain a foreground image corresponding to each light field image comprises:
and for each light field image in the light field image group, carrying out image difference on the light field image based on a pre-generated preset background image with the same viewpoint position as the light field image, so as to obtain a foreground image and a background image corresponding to the light field image.
4. The method of claim 2, wherein performing disparity estimation on the first foreground image corresponding to the first light field image and the second foreground image corresponding to the second light field image to obtain a first foreground disparity map corresponding to the first foreground image and the second foreground disparity map corresponding to the second foreground image, comprises:
downsampling the first foreground image based on a preset downsampling coefficient to obtain a first downsampled image, and downsampling the second foreground image to obtain a second downsampled image;
matching the positions of the same pixels on the first downsampling diagram and the second downsampling diagram to obtain a first parallax diagram corresponding to the first downsampling diagram and a second parallax diagram corresponding to the second downsampling diagram;
determining a first parallax searching range according to the first parallax map and the preset downsampling coefficient, and determining a second parallax searching range according to the second parallax map and the preset downsampling coefficient;
and performing parallax estimation on the first foreground image based on the first parallax search range to obtain the first foreground parallax image, and performing parallax estimation on the second foreground image based on the second parallax search range to obtain the second foreground parallax image.
5. The method according to claim 2, wherein the parallax mapping the first foreground image based on the first foreground parallax map to obtain a first foreground map, and the parallax mapping the second foreground image based on the second foreground parallax map to obtain a second foreground map, comprises:
matching each pixel on the first foreground image to a corresponding pixel position on a mapping graph according to the first foreground parallax graph to obtain the first foreground mapping graph; matching each pixel on the second foreground image to a corresponding pixel position on a mapping graph according to the second foreground parallax graph to obtain the second foreground mapping graph;
the image fusion processing is performed on the first foreground mapping image and the second foreground mapping image based on the target viewpoint information to obtain the foreground viewpoint image, including:
determining a first weight and a second weight according to the target viewpoint information and the position information of any two adjacent target light field cameras;
and carrying out image fusion processing on the first foreground mapping image and the second foreground mapping image based on the first weight and the second weight to obtain the foreground viewpoint image.
6. The method according to claim 1, wherein performing viewpoint fusion on the background images in each light field image based on the target viewpoint information and a pre-generated background disparity map of each viewpoint position to obtain a background viewpoint image corresponding to the target viewpoint information, the viewpoint position representing a position corresponding to each target light field camera, includes:
performing image segmentation on each light field image in the light field image group to obtain a background image corresponding to each light field image;
for a first light field image and a second light field image corresponding to any two adjacent target light field cameras, performing parallax mapping on the first background image of the first light field image based on a pre-generated first background parallax image with the same viewpoint position as the first light field image to obtain a first background mapping image, and performing parallax mapping on the second background image of the second light field image based on a pre-generated second background parallax image with the same viewpoint position as the second light field image to obtain a second background mapping image;
and carrying out image fusion processing on the first background mapping image and the second background mapping image based on the target viewpoint information to obtain the background viewpoint image.
7. The method of claim 1, wherein the obtaining target viewpoint information comprises:
acquiring a scene image by an image acquisition device arranged on the display equipment;
performing image detection according to the scene image to obtain the position information of eyes of an observer in the scene image;
and generating the target viewpoint information based on the position information.
8. The method of claim 1, wherein the process of acquiring the set of light field images transmitted by the acquisition device comprises:
transmitting the target viewpoint information to the acquisition equipment so that the acquisition equipment can determine one or more target light field cameras from a plurality of light field cameras according to the target viewpoint information;
and receiving the light field image group sent by the acquisition equipment.
9. The method as recited in claim 1, further comprising:
and receiving and storing the background disparity map of each viewpoint position sent by the acquisition equipment.
10. A light field image processing method, applied to an acquisition device, the method comprising:
respectively acquiring current scene images through a plurality of light field cameras arranged on the acquisition equipment to obtain scene images of viewpoint positions corresponding to each light field camera;
For any two adjacent light field cameras, generating a background parallax image of the viewpoint position of each light field camera according to scene images acquired by the two light field cameras respectively;
transmitting the background disparity map of each viewpoint position to a display device, so that the display device stores the background disparity map of each viewpoint position.
11. The method as recited in claim 10, further comprising:
receiving target viewpoint information sent by the display equipment;
determining one or more target light field cameras from a plurality of light field cameras included in the acquisition device according to the target viewpoint information;
acquiring light field images through the target light field camera to obtain a light field image group, and sending the light field image group to the display equipment.
12. A light field image processing apparatus, applied to a display device, the apparatus comprising:
an acquisition module configured to acquire target viewpoint information and a light field image group transmitted by an acquisition device, wherein the target viewpoint information represents position information of the eyes of an observer of the display device, and the light field image group comprises a light field image acquired by each target light field camera in the acquisition device;
a foreground fusion module configured to perform viewpoint fusion on the foreground image in each light field image based on the target viewpoint information, to obtain a foreground viewpoint image corresponding to the target viewpoint information;
a background fusion module configured to perform viewpoint fusion on the background image in each light field image based on the target viewpoint information and a pre-generated background disparity map of each viewpoint position, to obtain a background viewpoint image corresponding to the target viewpoint information, wherein each viewpoint position represents the position corresponding to one target light field camera; and
an image synthesis module configured to generate a target light field image from the foreground viewpoint image and the background viewpoint image.
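As an illustration of the image synthesis module, a straight alpha composite of the foreground viewpoint image over the background viewpoint image; the compositing operator and the presence of an alpha matte are assumptions, since the claim only requires generating the target light field image from the two:

    import numpy as np

    def synthesize(foreground, alpha, background):
        """foreground/background: HxWx3 float arrays; alpha: HxW matte in [0, 1]."""
        a = alpha[..., None]  # broadcast the matte over the color channels
        return a * foreground + (1.0 - a) * background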
13. A light field image processing apparatus, applied to an acquisition device, the apparatus comprising:
an image acquisition module configured to acquire current scene images respectively through a plurality of light field cameras arranged on the acquisition device, to obtain a scene image at the viewpoint position corresponding to each light field camera;
a parallax determination module configured to, for any two adjacent light field cameras, generate a background disparity map at the viewpoint position of each of the two light field cameras according to the scene images respectively acquired by the two cameras; and
a sending module configured to send the background disparity map of each viewpoint position to a display device, so that the display device stores the background disparity map of each viewpoint position.
14. An electronic device, comprising:
a processor; and
a memory storing computer instructions for causing the processor to perform the method of any one of claims 1 to 9, or to perform the method of any one of claims 10 to 11.
15. A video communication system, comprising:
a display device, comprising an image acquisition means and a first controller configured to perform the method of any one of claims 1 to 9; and
an acquisition device, comprising a light field camera array and a second controller, the light field camera array comprising a plurality of light field cameras, and the second controller being configured to perform the method of any one of claims 10 to 11.
16. A storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1 to 9, or the method of any one of claims 10 to 11.
CN202310410075.6A 2023-04-17 2023-04-17 Light field image processing method and device Pending CN116402878A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310410075.6A CN116402878A (en) 2023-04-17 2023-04-17 Light field image processing method and device

Publications (1)

Publication Number Publication Date
CN116402878A 2023-07-07

Family

ID=87015775

Country Status (1)

Country Link
CN (1) CN116402878A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination