CN107623823B

CN107623823B - Video communication background display method and device

Info

Publication number: CN107623823B
Application number: CN201710812050.3A
Authority: CN
Inventors: 张学勇
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date: 2017-09-11
Filing date: 2017-09-11
Publication date: 2020-12-18
Anticipated expiration: 2037-09-11
Also published as: CN107623823A

Abstract

The invention provides a video communication background display method and device. The method comprises: acquiring a scene image of a current user; acquiring a depth image of the current user; processing the scene image and the depth image to extract the person region of the current user in the scene image, obtaining a person region image; fusing the person region image with a preset solid-color background image to obtain a merged image, and displaying the merged image to a target user who is in video communication with the current user; determining the familiarity between the current user and the target user; and, according to the familiarity, acquiring corresponding component elements from the scene in which the current user is located and displaying those elements to the target user within the solid-color background image. In this way, during video communication the user's scene information is revealed to the target user progressively, according to their familiarity, which protects the user's privacy and makes communication more secure.

Description

Video communication background display method and device

Technical Field

The invention relates to the technical field of image processing, in particular to a video communication background display method and device.

Background

With the development of internet technology, more and more communication functions have been developed and put into use. Among them, video communication is widely used because it lets users in different places communicate visually.

However, in the related art, when a user makes a video call, the environment shown to the other party depends only on the capture range of hardware such as the camera. The user's surroundings are therefore usually presented to the other party directly, and the current user's private information cannot be effectively protected.

Disclosure of Invention

The invention provides a video communication background display method and device, and aims to solve the technical problem in the prior art that the scene in which a user is located cannot be hidden during a video chat.

An embodiment of the invention provides a video communication background display method for an electronic device, comprising the following steps: acquiring a scene image of a current user; acquiring a depth image of the current user; processing the scene image and the depth image to extract the person region of the current user in the scene image to obtain a person region image; fusing the person region image with a preset solid-color background image to obtain a merged image, and displaying the merged image to a target user who is in video communication with the current user; determining the familiarity between the current user and the target user; and acquiring corresponding component elements from the scene in which the current user is located according to the familiarity, and displaying the component elements to the target user within the solid-color background image.

Another embodiment of the present invention provides a video communication background display device for an electronic device, including: a visible light camera, configured to acquire a scene image of a current user; a depth image acquisition component, configured to acquire a depth image of the current user; and a processor, configured to: process the scene image and the depth image to extract the person region of the current user in the scene image to obtain a person region image; fuse the person region image with a preset solid-color background image to obtain a merged image, and display the merged image to a target user who is in video communication with the current user; determine the familiarity between the current user and the target user; and acquire corresponding component elements from the scene in which the current user is located according to the familiarity, and display the component elements to the target user within the solid-color background image.

Another embodiment of the present invention provides an electronic device, including: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the video communication background display method of the above-described embodiments.

Yet another embodiment of the present invention provides a computer-readable storage medium including a computer program for use in conjunction with an electronic device capable of image capture, the computer program being executable by a processor to perform the video communication background display method of the above-described embodiments.

The technical scheme provided by the embodiment of the invention has the following beneficial effects:

the method acquires a scene image and a depth image of the current user, processes them to extract the current user's person region from the scene image as a person region image, fuses the person region image with a preset solid-color background image into a merged image, and displays the merged image to the target user in video communication with the current user. It then determines the familiarity between the current user and the target user, acquires corresponding component elements from the current user's scene according to that familiarity, and displays them to the target user within the solid-color background image. In this way, during video communication the user's scene information is revealed to the target user progressively, according to familiarity, which protects the user's privacy and makes communication more secure.

Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a flow chart diagram of a video communication background display method in accordance with certain embodiments of the present invention;

FIG. 2 is a block diagram of a video communication background display apparatus in accordance with certain embodiments of the present invention;

FIG. 3 is a schematic structural diagram of an electronic device according to some embodiments of the invention;

FIG. 4 is a flow chart diagram of a method of video communication background display in accordance with certain embodiments of the invention;

FIG. 5 is a flow chart diagram of a method of video communication background display in accordance with certain embodiments of the invention;

FIGS. 6(a)-6(e) are schematic views of a scene for structured light measurement according to one embodiment of the present invention;

FIGS. 7(a) and 7(b) are schematic diagrams of a scene for structured light measurements according to one embodiment of the present invention;

FIG. 8 is a flow chart diagram of a video communication background display method in accordance with certain embodiments of the present invention;

FIG. 9 is a flow chart diagram of a video communication background display method in accordance with certain embodiments of the present invention;

FIG. 10 is a flow chart diagram of a video communication background display method in accordance with certain embodiments of the invention;

FIG. 11 is a flow chart diagram of a video communication background display method in accordance with certain embodiments of the present invention;

FIG. 12 is a flow chart diagram of a video communication background display method in accordance with certain embodiments of the invention;

FIG. 13 is a block diagram of an electronic device according to some embodiments of the invention; and

FIG. 14 is a block diagram of an electronic device according to some embodiments of the invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative, are intended to explain the invention, and are not to be construed as limiting it.

A video communication background display method and apparatus according to an embodiment of the present invention will be described with reference to the accompanying drawings.

Fig. 1 is a flowchart of a video communication background display method according to an embodiment of the present invention. As shown in fig. 1, the method includes:

Step 101, acquiring a scene image of the current user.

Step 102, acquiring a depth image of the current user.

Step 103, processing the scene image and the depth image to extract the person region of the current user in the scene image to obtain a person region image.

Step 104, fusing the person region image with a preset solid-color background image to obtain a merged image, and displaying the merged image to a target user in video communication with the current user.

Referring to fig. 2 and 3, the video communication background display method according to the embodiment of the present invention may be implemented by the video communication background display apparatus 100 according to the embodiment of the present invention. The video communication background display apparatus 100 of this embodiment is used in an electronic device 1000. As shown in fig. 3, the video communication background display apparatus 100 includes a visible light camera 11, a depth image acquisition assembly 12, and a processor 20. Step 101 may be implemented by the visible light camera 11, step 102 by the depth image acquisition assembly 12, and steps 103 and 104 by the processor 20.

That is, the visible light camera 11 may be used to acquire a scene image of the current user; the depth image acquisition component 12 may be used to acquire a depth image of a current user; the processor 20 is operable to process the scene image and the depth image to extract a person region of the current user in the scene image to obtain a person region image, and fuse the person region image with a preset solid background image to obtain a merged image.

The scene image may be a grayscale image or a color image, and the depth image contains depth information for each person or object in the current user's scene. The scene range of the scene image matches that of the depth image, so every pixel in the scene image has corresponding depth information in the depth image.

The video communication background display apparatus 100 according to the embodiment of the present invention can be applied to the electronic apparatus 1000 according to the embodiment of the present invention. That is, the electronic device 1000 according to the embodiment of the present invention includes the video communication background display device 100 according to the embodiment of the present invention.

In some embodiments, the electronic device 1000 may be a mobile phone, a tablet computer, a notebook computer, a smart band, a smart watch, a smart helmet, smart glasses, or the like.

Existing methods for segmenting a person from the background mainly rely on the similarity and discontinuity of pixel values between adjacent pixels, but such segmentation is easily affected by environmental factors such as ambient lighting. The video communication background display method, the video communication background display apparatus 100, and the electronic apparatus 1000 of the embodiments of the present invention instead extract the person region in the scene image by acquiring a depth image of the current user. Because depth acquisition is not easily affected by factors such as lighting and the color distribution of the scene, the person region extracted through the depth image is more accurate; in particular, its boundary can be located precisely. Consequently, the merged image formed by fusing this accurate person region image with the preset solid-color background looks better.

Referring to fig. 4, as a possible implementation manner, the step of acquiring the depth image of the current user in step 102 includes:

Step 201, projecting structured light onto the current user.

Step 202, capturing a structured light image modulated by the current user.

Step 203, demodulating phase information corresponding to each pixel of the structured light image to obtain a depth image.

In this example, with continued reference to fig. 3, the depth image acquisition assembly 12 includes a structured light projector 121 and a structured light camera 122. Step 201 may be implemented by the structured light projector 121, and steps 202 and 203 may be implemented by the structured light camera 122.

That is, the structured light projector 121 may be used to transmit structured light to a current user; the structured light camera 122 may be configured to capture a structured light image modulated by a current user, and demodulate phase information corresponding to each pixel of the structured light image to obtain a depth image.

Specifically, after the structured light projector 121 projects a certain pattern of structured light onto the face and the body of the current user, a structured light image modulated by the current user is formed on the surface of the face and the body of the current user. The structured light camera 122 captures a modulated structured light image, and demodulates the structured light image to obtain a depth image. The pattern of the structured light may be laser stripes, gray codes, sinusoidal stripes, non-uniform speckles, etc.

Referring to fig. 5, in some embodiments, the step 203 of demodulating phase information corresponding to each pixel of the structured-light image to obtain the depth image includes:

Step 301, demodulating phase information corresponding to each pixel in the structured light image.

Step 302, converting the phase information into depth information.

Step 303, generating a depth image according to the depth information.

With continued reference to fig. 2, in some embodiments, steps 301, 302, and 303 may be implemented by the structured light camera 122.

That is, the structured light camera 122 may be further configured to demodulate phase information corresponding to each pixel in the structured light image, convert the phase information into depth information, and generate a depth image according to the depth information.

Specifically, the phase of the modulated structured light differs from that of the unmodulated structured light, so the structured light appearing in the structured light image is distorted, and the change in phase reflects the depth of the object. Therefore, the structured light camera 122 first demodulates the phase information corresponding to each pixel in the structured light image, and then calculates depth information from the phase information, thereby obtaining the final depth image.

To make the process of acquiring depth images of the current user's face and body with structured light clearer to those skilled in the art, the specific principle is illustrated using the widely applied grating projection technique (fringe projection technique) as an example. Broadly speaking, grating projection belongs to area (surface) structured light.

As shown in fig. 6(a), when area structured light is used, sinusoidal fringes are first generated by computer programming and projected onto the measured object by the structured light projector 121; the structured light camera 122 then captures the fringes as bent by the object's modulation; the bent fringes are demodulated to obtain the phase; and the phase is converted into depth information to obtain a depth image. To avoid measurement errors and error coupling, the depth image acquisition assembly 12 must be calibrated before structured light is used to acquire depth information. Calibration covers geometric parameters (e.g., the relative position between the structured light camera 122 and the structured light projector 121), the internal parameters of the structured light camera 122, the internal parameters of the structured light projector 121, and so on.

Specifically, in the first step, sinusoidal fringes are generated by computer programming. Because the phase is later recovered from the distorted fringes, for example with the four-step phase-shifting method, four fringe patterns are generated here, each shifted in phase by π/2 from the previous one.

Then the structured light projector 121 projects the four fringe patterns onto the object to be measured (the mask shown in fig. 6(a)) one after another in time, and the structured light camera 122 captures the images shown on the left of fig. 6(b) while also reading the fringes on the reference plane shown on the right of fig. 6(b).

In the second step, phase recovery is performed. The structured light camera 122 computes the modulated phase from the four captured modulated fringe patterns (i.e., structured light images); the resulting phase map is a wrapped (truncated) phase map. Because the four-step phase-shifting algorithm computes the phase with the arctangent function, the phase of the modulated structured light is confined to [-π, π]: whenever the modulated phase leaves this interval, it wraps around and starts again. The resulting principal phase values are shown in fig. 6(c).

During phase recovery, phase unwrapping is required, that is, the wrapped phase must be restored to a continuous phase by removing the 2π jumps. As shown in fig. 6(d), the modulated continuous phase map is on the left and the reference continuous phase map is on the right.

In the third step, the modulated continuous phase is subtracted from the reference continuous phase to obtain the phase difference (i.e., the phase information). This phase difference represents the depth of the measured object relative to the reference plane; substituting it into the phase-to-depth conversion formula (whose parameters have been calibrated) yields the three-dimensional model of the measured object shown in fig. 6(e).

It should be understood that, in practical applications and depending on the scenario, the structured light used in the embodiments of the present invention may use patterns other than gratings.
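Under the usual four-step model, the n-th captured fringe image is I_n = A + B cos(φ + (n-1)π/2) for n = 1..4, so I_4 - I_2 = 2B sin φ and I_1 - I_3 = 2B cos φ, and the wrapped phase follows from a two-argument arctangent. A minimal NumPy sketch, assuming π/2 phase steps and images already aligned pixel for pixel (the function name is illustrative, not from the source):

```python
import numpy as np


def wrapped_phase(i1, i2, i3, i4):
    """Wrapped phase from four fringe images shifted by pi/2 each.

    Assumes I_n = A + B*cos(phi + (n-1)*pi/2), n = 1..4, so that
    I4 - I2 = 2B*sin(phi) and I1 - I3 = 2B*cos(phi).
    """
    i1, i2, i3, i4 = (np.asarray(x, dtype=float) for x in (i1, i2, i3, i4))
    # arctan2 resolves the quadrant; the result lies in [-pi, pi]
    return np.arctan2(i4 - i2, i1 - i3)
```

The background level A and modulation depth B cancel out of the ratio, which is why the four-step method tolerates uneven illumination; any true phase outside [-π, π] comes back shifted by a multiple of 2π, which is exactly the truncation that unwrapping must undo.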
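Phase unwrapping and the phase difference can be sketched as follows. This assumes the wrapped maps are smooth enough along each row that simple 1-D unwrapping with numpy.unwrap suffices; practical systems often need more robust 2-D unwrapping. The difference is taken as reference minus modulated, following the wording above (the sign is a convention tied to the geometry). The linear phase-to-depth factor is a hypothetical placeholder, because the calibrated conversion formula is not given in the text.

```python
import numpy as np


def unwrap_rows(wrapped):
    """Remove 2*pi jumps along each image row (simple 1-D unwrapping)."""
    return np.unwrap(np.asarray(wrapped, dtype=float), axis=1)


def phase_difference(modulated_wrapped, reference_wrapped):
    """Continuous reference phase minus continuous modulated phase."""
    return unwrap_rows(reference_wrapped) - unwrap_rows(modulated_wrapped)


def depth_from_phase(delta_phi, k_per_rad):
    # HYPOTHETICAL linear model depth = k * delta_phi; the real formula and
    # its parameters come from calibrating the camera/projector geometry.
    return k_per_rad * np.asarray(delta_phi, dtype=float)
```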

As another possible implementation, the present invention may also use speckle structured light to acquire the depth information of the current user.

Specifically, acquiring depth information with speckle structured light uses a substantially flat diffraction element that carries a relief diffraction structure with a specific phase distribution; its cross section is a stepped relief structure with two or more levels of recesses and protrusions. The substrate of the diffraction element is approximately 1 micron thick, and the step heights are non-uniform, ranging from 0.7 to 0.9 micron. Fig. 7(a) shows part of the diffraction structure of the collimating beam splitting element of this embodiment, and fig. 7(b) is a cross-sectional side view along section A-A, with both axes in microns. The speckle pattern generated by speckle structured light is highly random, and the pattern changes with distance. Therefore, before depth information can be obtained with speckle structured light, the speckle patterns in space must first be calibrated. For example, a reference plane is taken every 1 cm within 0 to 4 m of the structured light camera 122, so that 400 speckle images are saved once calibration is complete; the smaller the calibration interval, the more accurate the resulting depth information. Then the structured light projector 121 projects speckle structured light onto the measured object (i.e., the current user), and the height variations of the object's surface change the projected speckle pattern. After the structured light camera 122 captures the speckle pattern projected onto the object (i.e., the structured light image), this pattern is cross-correlated one by one with the 400 speckle images saved during calibration, yielding 400 correlation images.
Wherever the measured object lies in space, the corresponding correlation image shows a peak; these peaks are superimposed and interpolated to obtain the depth information of the measured object.
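The speckle matching step can be illustrated with a simplified per-window version: a window of the captured speckle image is compared with the same window of each calibrated reference image using zero-mean normalized cross-correlation, and the best-matching plane, refined by parabolic interpolation around the correlation peak, gives the depth. The window-based layout, the function names, and the mapping from plane index i to distance i × spacing are assumptions for illustration only.

```python
import numpy as np


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-size patches."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def speckle_depth(patch, reference_patches, plane_spacing_cm=1.0):
    """Estimate depth by matching a speckle patch against calibrated planes.

    reference_patches[i] is ASSUMED to be the same image window recorded on
    the calibration plane at distance i * plane_spacing_cm.
    """
    scores = np.array([ncc(patch, ref) for ref in reference_patches])
    i = int(np.argmax(scores))
    # Parabolic interpolation through the peak and its two neighbours
    offset = 0.0
    if 0 < i < len(scores) - 1:
        left, mid, right = scores[i - 1], scores[i], scores[i + 1]
        curvature = left - 2.0 * mid + right
        if curvature != 0:
            offset = 0.5 * (left - right) / curvature
    return (i + offset) * plane_spacing_cm
```

With a 1 cm spacing over 0 to 4 m there are 400 reference planes, so each window costs 400 correlations; this is why a finer calibration interval improves accuracy but also increases storage and computation.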

An ordinary diffraction element splits a light beam into multiple diffracted beams whose intensities differ greatly, which also poses a considerable risk of eye injury. Even if the light is diffracted a second time, the resulting beams have low uniformity, so projecting onto the measured object with beams from an ordinary diffraction element works poorly. This embodiment therefore uses a collimating beam splitting element, which both collimates non-collimated light and splits it: after the non-collimated light reflected by the mirror passes through the element, multiple collimated beams exit at different angles with approximately equal cross-sectional areas and approximately equal energy flux, so projection with the resulting diffracted light works better. At the same time, the emitted laser energy is spread across the individual beams, further reducing the risk of eye damage. Moreover, compared with other, uniformly arranged structured light, speckle structured light consumes less power to achieve the same acquisition quality.

Referring to fig. 8, as a possible implementation, step 103 of processing the scene image and the depth image to extract the person region of the current user in the scene image to obtain a person region image includes:

Step 401, identifying a face region in the scene image.

Step 402, obtaining depth information corresponding to the face region from the depth image.

Step 403, determining the depth range of the person region according to the depth information of the face region.

Step 404, determining, according to the depth range of the person region, the person region that is connected to the face region and falls within the depth range, to obtain a person region image.

Referring back to fig. 2, in some embodiments, steps 401, 402, 403, and 404 may be implemented by the processor 20.

That is, the processor 20 may be further configured to identify a face region in the scene image, obtain depth information corresponding to the face region from the depth image, determine a depth range of the person region according to the depth information of the face region, and determine a person region connected to the face region and falling within the depth range according to the depth range of the person region to obtain a person region image.

Specifically, a trained deep learning model can be used to identify the face region in the scene image, and the depth information of the face region can then be determined from the correspondence between the scene image and the depth image. Because the face region includes features such as the nose, eyes, ears, and lips, the depth data for each feature in the depth image differ; for example, when the face directly faces the depth image acquisition assembly 12, the depth data for the nose may be smaller and the depth data for the ears larger in the captured depth image. Therefore, the depth information of the face region may be a single value or a range of values. When it is a single value, it can be obtained by averaging the depth data of the face region or by taking their median.

Since the person region contains the face region, that is, the person region and the face region lie within a certain common depth range, after determining the depth information of the face region the processor 20 can set the depth range of the person region accordingly, and then extract the person region that falls within this depth range and is connected to the face region, obtaining the person region image.

In this way, the person region image can be extracted from the scene image based on depth information. Because depth information is not affected by environmental factors such as lighting and color temperature, the extracted person region image is more accurate.
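Steps 402 to 404 can be sketched as region growing on the depth map: take the median depth of the face box, keep pixels within a window around it, and grow a 4-connected region outward from the face so that objects at a similar depth but not touching the person are excluded. The face box stands in for the output of a face detector, and the ±300 depth-unit margin is a hypothetical parameter; neither is specified in the text.

```python
from collections import deque

import numpy as np


def person_mask(depth, face_box, margin=300.0):
    """Grow a person region from the face using a depth window.

    depth: HxW depth map aligned pixel-for-pixel with the scene image.
    face_box: (top, left, bottom, right), HYPOTHETICAL face-detector output.
    margin: HYPOTHETICAL half-width of the person depth range.
    """
    depth = np.asarray(depth, dtype=float)
    top, left, bottom, right = face_box
    face_depth = float(np.median(depth[top:bottom, left:right]))  # or the mean
    in_range = (depth >= face_depth - margin) & (depth <= face_depth + margin)

    mask = np.zeros(depth.shape, dtype=bool)
    queue = deque()
    # Seed the region with in-range pixels of the face box
    for r in range(top, bottom):
        for c in range(left, right):
            if in_range[r, c]:
                mask[r, c] = True
                queue.append((r, c))
    h, w = depth.shape
    # Breadth-first growth through 4-connected in-range neighbours
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and in_range[nr, nc] and not mask[nr, nc]:
                mask[nr, nc] = True
                queue.append((nr, nc))
    return mask
```

The connectivity requirement is what separates this from a plain depth threshold: a chair at the same distance as the user but not touching them stays out of the mask.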

Referring to fig. 9, in some embodiments, the video communication background display method further includes the following steps:

Step 501, processing the scene image to obtain a full-field edge image of the scene image.

Step 502, correcting the person region image according to the full-field edge image.

Referring back to fig. 2, in some embodiments, both step 501 and step 502 may be implemented by the processor 20.

That is, the processor 20 may be further configured to process the scene image to obtain a full-field edge image of the scene image, and modify the person region image based on the full-field edge image.

The processor 20 first performs edge extraction on the scene image to obtain a full-field edge image, whose edge lines include those of the current user and of background objects in the current user's scene. Specifically, edge extraction can be performed on the scene image with the Canny operator. The core of Canny edge extraction comprises the following steps: first, the scene image is convolved with a 2D Gaussian filter template to remove noise; next, a differential operator is used to obtain the gray-level gradient magnitude of each pixel, the gradient direction of each pixel is computed, and the neighboring pixels of each pixel along its gradient direction are located; then, every pixel is traversed, and a pixel whose gradient magnitude is not the maximum compared with its two neighbors before and after it along the gradient direction is not considered an edge point. In this way, the pixels at edge positions in the scene image are determined, and the full-field edge image is obtained.

After obtaining the full-field edge image, the processor 20 corrects the person region image according to it. The person region image is obtained by merging all pixels in the scene image that are connected to the face region and fall within the set depth range, so in some scenes objects that are connected to the face region and fall within the depth range may also be included. Therefore, to make the extracted person region image more accurate, it can be corrected using the full-field edge image.

Further, the processor 20 may perform a second correction on the corrected person region image, for example applying dilation to expand the person region image so that its edge details are retained.

After obtaining the person region image, the processor 20 can fuse it with the preset solid-color background image to obtain a merged image. In some embodiments, the color of the preset solid-color background image may be selected randomly by the processor 20 or chosen by the current user. The merged image may be displayed on the display screen of the electronic apparatus 1000 or printed by a printer connected to the electronic apparatus 1000.

In an embodiment of the present invention, when the current user wants to hide the current background during a video call with another person, the method can be used to fuse the person region image of the current user with a preset solid-color background image and display the resulting merged image to the target user. Since the current user is in a video call with the other party, the visible light camera 11 must capture scene images of the current user in real time, the depth image acquisition assembly 12 must likewise acquire the corresponding depth images in real time, and the processor 20 must process them promptly, so that the other party sees a smooth video composed of successive merged frames.
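The three stages listed above (Gaussian smoothing, gradient computation, non-maximum suppression along the gradient direction) can be sketched with NumPy alone. In practice a library routine such as OpenCV's cv2.Canny would be used; it also applies two-threshold hysteresis, which the text does not describe, so this sketch substitutes a single hypothetical magnitude threshold. The sigma and threshold defaults are assumptions.

```python
import numpy as np


def _filter2d(img, kernel):
    """Same-size 2-D correlation with edge padding."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out


def edge_map(gray, sigma=1.0, threshold=20.0):
    gray = np.asarray(gray, dtype=float)
    # 1) 2-D Gaussian smoothing to suppress noise
    r = int(3 * sigma)
    ax = np.arange(-r, r + 1)
    g = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2 * sigma ** 2))
    smooth = _filter2d(gray, g / g.sum())
    # 2) gradient magnitude and direction from Sobel differences
    sx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    gx = _filter2d(smooth, sx)
    gy = _filter2d(smooth, sx.T)
    mag = np.hypot(gx, gy)
    angle = (np.rad2deg(np.arctan2(gy, gx)) + 180.0) % 180.0
    # 3) non-maximum suppression along the quantized gradient direction
    h, w = gray.shape
    edges = np.zeros((h, w), dtype=bool)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            a = angle[y, x]
            if a < 22.5 or a >= 157.5:
                n1, n2 = mag[y, x - 1], mag[y, x + 1]
            elif a < 67.5:
                n1, n2 = mag[y - 1, x - 1], mag[y + 1, x + 1]
            elif a < 112.5:
                n1, n2 = mag[y - 1, x], mag[y + 1, x]
            else:
                n1, n2 = mag[y - 1, x + 1], mag[y + 1, x - 1]
            # HYPOTHETICAL single threshold in place of Canny's hysteresis
            edges[y, x] = mag[y, x] >= max(n1, n2) and mag[y, x] > threshold
    return edges
```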
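The dilation used for the second correction can be sketched as a binary dilation with a square structuring element: every pixel within the given radius of a mask pixel is added to the mask. The square shape and the radius parameter are illustrative choices; the text does not specify the structuring element.

```python
import numpy as np


def dilate(mask, radius=1):
    """Binary dilation with a (2r+1)x(2r+1) square structuring element."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    # OR together every shifted copy of the mask within the window
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out
```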
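The fusion step can be sketched as a per-pixel selection: pixels inside the person mask keep their scene values, and all others are replaced by the background color. This assumes a hard (boolean) mask and an HxWx3 image; the default color is an arbitrary placeholder, since the text says it is chosen randomly or by the user.

```python
import numpy as np


def fuse_with_solid_background(scene_rgb, person_mask, color=(0, 128, 255)):
    """Keep scene pixels inside the person mask; paint everything else one color.

    scene_rgb: HxWx3 array; person_mask: HxW boolean array;
    color: placeholder background color (random or user-chosen per the text).
    """
    scene = np.asarray(scene_rgb)
    mask = np.asarray(person_mask, dtype=bool)
    background = np.empty_like(scene)
    background[...] = np.asarray(color, dtype=scene.dtype)
    # Broadcast the 2-D mask over the color channels
    return np.where(mask[..., None], scene, background)
```

For a random color, something like np.random.default_rng().integers(0, 256, size=3) could supply the color argument; a soft alpha mask would blend the boundary more smoothly but is not described in the text.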

Step 105, determining familiarity between the current user and the target user.

Step 106, acquiring corresponding component elements from the scene in which the current user is located according to the familiarity, and displaying the component elements to the target user within the solid-color background image.

It can be understood that in some application scenarios, the more familiar the parties to a video call are, the more willing the user is to show the other party the real scene of his or her environment. Therefore, the familiarity between the current user and the target user is determined, corresponding component elements are obtained from the current user's scene according to that familiarity, and these elements are displayed to the target user within the solid-color background image.
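The text does not specify how familiarity selects component elements. Purely as a hypothetical illustration, assume the scene has already been segmented into element masks ordered from least to most private, and that familiarity is expressed as the number of elements the target user may see; revealed elements are then copied from the scene image onto the merged image.

```python
import numpy as np


def reveal_elements(merged, scene, element_masks, familiarity):
    """Paste scene elements back onto the solid-color background.

    element_masks: HYPOTHETICAL list of HxW boolean masks, ordered from
    least to most private. familiarity: HYPOTHETICAL count of elements the
    target user may see (0 = solid background only).
    """
    out = np.array(merged, copy=True)
    scene = np.asarray(scene)
    for mask in element_masks[:max(0, familiarity)]:
        m = np.asarray(mask, dtype=bool)
        out[m] = scene[m]
    return out
```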

It should be noted that, depending on the specific application scenario, the familiarity between the current user and the target user may be determined in a variety of different ways:

As a possible implementation, as shown in fig. 10, step 105 includes:

Step 601, detecting video interaction information of the current user and the target user according to a preset matching index.

Step 602, if it is detected that the video interaction information satisfies preset matching information, querying a preset correspondence between the matching information and the familiarity to determine the familiarity between the current user and the target user.

In this example, it should be understood that the more familiar users are with each other, the more casual their topics and wording tend to be and the more information they exchange. Therefore, content keywords in the voice information and text information can be detected according to the preset matching index, and/or the amount of voice and text information can be measured. The keywords are calibrated from a large amount of experimental data and can be words that indicate the familiarity between users, such as forms of address ("mom", "partner", "dad") or casual, teasing expressions such as "drop dead" or "as if anyone would believe you".

Furthermore, the correspondence between the matching information and the familiarity is preset. For example, "mom" corresponds to high familiarity and "Professor Li" to low familiarity; a volume of 10 voice messages corresponds to low familiarity and a volume of 1000 voice messages to high familiarity. Thus, if the video interaction information is detected to satisfy the preset matching information, the preset correspondence is queried to determine the familiarity between the current user and the target user.
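A minimal sketch of steps 601 and 602, assuming the interaction information is available as a list of transcribed voice and text messages. The keyword table, the volume thresholds, the integer familiarity scale (0 = unknown, 3 = high), and the rule of taking the highest matched level are illustrative assumptions, not values from the source; keyword matching here is naive substring search.

```python
from typing import Dict, List, Tuple

# Hypothetical preset correspondence: keyword -> familiarity level.
KEYWORD_FAMILIARITY: Dict[str, int] = {"mom": 3, "dad": 3, "professor li": 1}

# Hypothetical correspondence: minimum message count -> familiarity level,
# ordered from the largest threshold down.
VOLUME_FAMILIARITY: List[Tuple[int, int]] = [(1000, 3), (100, 2), (10, 1)]


def familiarity_from_interaction(messages: List[str], default: int = 0) -> int:
    """Return the highest familiarity level matched by keywords or message volume."""
    text = " ".join(messages).lower()
    levels = [default]
    # Keyword matching: naive substring search over all messages.
    levels += [lvl for kw, lvl in KEYWORD_FAMILIARITY.items() if kw in text]
    # Volume matching: the first (largest) threshold the message count reaches.
    for threshold, lvl in VOLUME_FAMILIARITY:
        if len(messages) >= threshold:
            levels.append(lvl)
            break
    return max(levels)
```

Taking the maximum means any single strong signal raises familiarity; a stricter design might require several signals to agree. A real system would also tokenize rather than substring-match, so that "mom" does not match inside "moment".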

As another possible implementation, as shown in fig. 11, step 105 includes:

Step 701, sending verification requests corresponding to different familiarity levels to the target user.

Step 702, verifying the request response fed back by the target user against preset standard information, and determining the familiarity between the current user and the target user according to the verification result.

In this example, the user sets in advance verification requests corresponding to different familiarity levels and sets corresponding standard information for each request. For example, the verification request for high familiarity may be "How many people are in my family?" with the standard information "five", and the verification request for low familiarity may be "Am I a man or a woman?" with the standard information "woman". These verification requests are then sent to the target user during the video chat.

The target user then feeds back a response to each verification request. Only a familiar target user can return the standard information for the high-familiarity requests, while an unfamiliar target user can only return the standard information for the low-familiarity requests. Therefore, the responses fed back by the target user are verified against the preset standard information, and the familiarity between the current user and the target user is determined according to the verification result.
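Steps 701 and 702 can be sketched as follows, assuming each verification request carries the familiarity level it unlocks and that answers are compared case-insensitively after trimming whitespace. The `VerificationRequest` type and the "highest correctly answered level wins" rule are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VerificationRequest:
    level: int            # familiarity level this question unlocks
    question: str
    expected_answer: str  # preset "standard information"


def familiarity_from_responses(requests: List[VerificationRequest],
                               responses: Dict[str, str],
                               default: int = 0) -> int:
    """Return the highest level whose question the target user answered correctly."""
    best = default
    for req in requests:
        answer = responses.get(req.question, "")
        if answer.strip().lower() == req.expected_answer.strip().lower():
            best = max(best, req.level)
    return best


requests = [
    VerificationRequest(2, "How many people are in my family?", "five"),
    VerificationRequest(1, "Am I a man or a woman?", "woman"),
]
```

A stranger who can only answer the low-familiarity question ends up at level 1, while someone who also knows the family size reaches level 2.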

As another possible implementation, as shown in fig. 12, step 105 includes:

Step 801, acquiring a user image of the target user, and extracting facial feature information from the user image.

Step 802, querying a preset image information base according to the facial feature information to acquire the identity information of the target user.

Step 803, querying a preset corresponding relationship between the identity information and the familiarity, and determining the familiarity between the current user and the target user.

In this example, a correspondence between the identity information of other users and the familiarity is established in advance: for example, family members correspond to high familiarity, friends to medium familiarity, strangers to low familiarity, and so on.

A user image of the target user is then acquired, for example by taking a screenshot of the target user's face during the video call, and facial feature information is extracted from it. The preset image information base is queried according to the facial feature information to acquire the identity information of the target user, and the preset correspondence between identity information and familiarity is queried to determine the familiarity between the current user and the target user.
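A minimal sketch of steps 801 to 803, assuming facial feature information has already been extracted as a fixed-length vector (for example by a face-embedding model, not shown) and that identification is done by nearest-neighbor search with cosine similarity and an acceptance threshold. The gallery contents, the 0.8 threshold, and the relation-to-level table are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical relation -> familiarity level table (step 803).
FAMILIARITY_BY_RELATION: Dict[str, int] = {"family": 3, "friend": 2, "stranger": 1}

# Hypothetical gallery entry: identity -> (relation to current user, stored feature vector).
Gallery = Dict[str, Tuple[str, Sequence[float]]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(features: Sequence[float], gallery: Gallery,
             threshold: float = 0.8) -> Optional[str]:
    """Step 802: return the best-matching identity, or None if nothing passes the threshold."""
    best_name, best_sim = None, threshold
    for name, (_, stored) in gallery.items():
        sim = cosine_similarity(features, stored)
        if sim >= best_sim:
            best_name, best_sim = name, sim
    return best_name


def familiarity_from_face(features: Sequence[float], gallery: Gallery) -> int:
    """Step 803: map the identified relation to a familiarity level; unknown faces are strangers."""
    name = identify(features, gallery)
    relation = gallery[name][0] if name is not None else "stranger"
    return FAMILIARITY_BY_RELATION[relation]
```

Treating an unmatched face as a stranger is the conservative default: it reveals the least scene information.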

Further, corresponding component elements are acquired from the scene where the current user is located according to the familiarity and displayed to the target user in the solid-color background image. The component elements include item information, ambient light, and the like in the user's scene. For example, a number of items may be associated with each familiarity level, and that number of component elements is acquired from the current user's scene, so that the higher the familiarity, the more component elements of the current scene are shown to the target user. As another example, item types (such as household goods or office supplies) may be associated with each familiarity level, and component elements of those types are acquired from the current user's scene, so that the higher the familiarity, the more privacy-sensitive the types of component elements shown to the target user.
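The count-based and type-based selection can be combined in one small sketch. It assumes the scene's items have already been detected and labeled with a category and a salience score (for example a detector confidence); the `SceneItem` type and the `POLICY` table mapping each familiarity level to a maximum count and a set of allowed categories are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class SceneItem:
    name: str
    category: str     # e.g. "ambient_light", "office", "living"
    salience: float   # hypothetical ranking score, e.g. detector confidence


# Hypothetical policy: familiarity level -> (max number of items, allowed categories).
POLICY: Dict[int, Tuple[int, Set[str]]] = {
    1: (1, {"ambient_light"}),
    2: (3, {"ambient_light", "office"}),
    3: (10, {"ambient_light", "office", "living"}),
}


def select_elements(items: List[SceneItem], familiarity: int) -> List[SceneItem]:
    """Keep allowed categories for this level, most salient first, up to the level's count."""
    max_count, allowed = POLICY.get(familiarity, (0, set()))
    candidates = [it for it in items if it.category in allowed]
    candidates.sort(key=lambda it: it.salience, reverse=True)
    return candidates[:max_count]
```

Unknown levels fall back to showing nothing, which keeps the background fully solid-color by default.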

In summary, the video communication background display method of the embodiment of the present invention acquires a scene image and a depth image of the current user, processes them to extract the person region of the current user in the scene image as a person region image, fuses the person region image with a preset solid-color background image to obtain a merged image, and displays the merged image to the target user in video communication with the current user. It then determines the familiarity between the current user and the target user, acquires corresponding component elements from the scene where the current user is located according to the familiarity, and displays them to the target user in the solid-color background image. Thus, during video communication, the user's scene information is revealed to the target user gradually, in step with their familiarity, which protects the user's privacy and makes the communication safer.

Referring to fig. 3 and fig. 13, an electronic device 1000 is further provided in the present embodiment. The electronic device 1000 includes a video communication background display apparatus 100, which may be implemented in hardware and/or software and includes an imaging device 10 and a processor 20.

The imaging device 10 includes a visible light camera 11 and a depth image acquisition assembly 12.

Specifically, the visible light camera 11 includes an image sensor 111 and one or more lenses 112, and can be used to capture color information of the current user to obtain a scene image; the image sensor 111 includes a color filter array (e.g., a Bayer filter array). While the visible light camera 11 captures a scene image, each imaging pixel in the image sensor 111 senses the light intensity and wavelength information from the scene to generate a set of raw image data. The image sensor 111 sends the raw image data to the processor 20, which performs operations such as denoising and interpolation on it to obtain a color scene image. The processor 20 may process each image pixel of the raw image data one by one in a variety of formats; for example, each image pixel may have a bit depth of 8, 10, 12, or 14 bits, and the processor 20 may process each image pixel at the same or a different bit depth.
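The bit-depth handling and a very simple form of demosaicing can be sketched as below. The sketch assumes an RGGB Bayer layout and replaces true interpolation with 2x2 binning (one RGB pixel per Bayer cell, at half resolution), a simplification chosen for brevity; denoising is omitted. The function names are hypothetical.

```python
from typing import List, Tuple


def normalize(raw: List[List[int]], bit_depth: int) -> List[List[float]]:
    """Map raw sensor values of any bit depth (8, 10, 12, 14, ...) to [0.0, 1.0]."""
    max_val = (1 << bit_depth) - 1
    return [[v / max_val for v in row] for row in raw]


def demosaic_rggb_half(raw: List[List[float]]) -> List[List[Tuple[float, float, float]]]:
    """Turn each 2x2 RGGB cell into one RGB pixel, averaging the two green samples."""
    h, w = len(raw), len(raw[0])
    out = []
    for y in range(0, h - 1, 2):
        row = []
        for x in range(0, w - 1, 2):
            r = raw[y][x]
            g = (raw[y][x + 1] + raw[y + 1][x]) / 2
            b = raw[y + 1][x + 1]
            row.append((r, g, b))
        out.append(row)
    return out
```

Normalizing first lets the same demosaic code handle every bit depth the sensor might deliver.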

The depth image acquisition assembly 12 includes a structured light projector 121 and a structured light camera 122, and can be used to capture depth information of the current user to obtain a depth image. The structured light projector 121 projects structured light onto the current user; the structured light pattern may be a laser stripe, a Gray code, a sinusoidal fringe, a randomly arranged speckle pattern, or the like. The structured light camera 122 includes an image sensor 1221 and one or more lenses 1222. The image sensor 1221 captures the structured light image projected onto the current user by the structured light projector 121. The depth image acquisition assembly 12 may send the structured light image to the processor 20 for demodulation, phase recovery, phase information calculation, and the like, to obtain the depth information of the current user.
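The speckle variant recited in the claims (calibrating reference speckle patterns at known depths, then cross-correlating the captured pattern against each) can be illustrated for a single pixel neighborhood. This is a sketch under assumptions: patches are flattened 1-D lists of equal length, zero-mean normalized cross-correlation is used as the similarity measure, and depth is taken directly from the best-matching reference plane without interpolating between planes.

```python
import math
from typing import Dict, Sequence


def ncc(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-mean normalized cross-correlation of two equal-length patches."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den else 0.0  # flat patches carry no speckle information


def depth_from_speckle(patch: Sequence[float],
                       references: Dict[float, Sequence[float]]) -> float:
    """Return the calibrated depth whose reference speckle patch correlates best."""
    scores = {depth: ncc(patch, ref) for depth, ref in references.items()}
    return max(scores, key=scores.get)
```

Because the correlation is normalized, a captured patch that is simply brighter than its reference (for example a closer, more strongly lit surface) still matches it.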

In some embodiments, the functions of the visible light camera 11 and the structured light camera 122 can be implemented by one camera, that is, the imaging device 10 includes only one camera and one structured light projector 121, and the camera can capture not only the scene image but also the structured light image.

In addition to structured light, the depth image of the current user can also be acquired by binocular vision, time-of-flight (TOF) measurement, and other methods.
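For comparison, the two alternative depth methods reduce to short formulas: continuous-wave TOF converts the measured phase shift of modulated light into distance as d = c·Δφ / (4π·f_mod), and rectified binocular stereo converts disparity into depth as Z = f·B / disparity. The sketch below assumes an ideal, calibrated system (phase shift within one modulation period, rectified images, focal length expressed in pixels); the function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def tof_depth_from_phase(phase_shift_rad: float, modulation_hz: float) -> float:
    """Continuous-wave TOF: light travels to the target and back, hence 4*pi, not 2*pi."""
    return SPEED_OF_LIGHT * phase_shift_rad / (4 * math.pi * modulation_hz)


def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Rectified stereo: depth is inversely proportional to disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px
```

At a 20 MHz modulation frequency, a phase shift of π corresponds to about 3.75 m, and the unambiguous range is c / (2·f_mod), about 7.5 m.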

The processor 20 is further configured to fuse the person region image, extracted from the scene image and the depth image, with a preset solid-color background image, display the merged image to the target user in video communication with the current user, determine the familiarity between the current user and the target user, acquire corresponding component elements from the scene where the current user is located according to the familiarity, and display the component elements to the target user in the solid-color background image. When extracting the person region image, the processor 20 may extract a two-dimensional person region image from the scene image using the depth information in the depth image, or may build a three-dimensional model of the person region from the depth information and color-fill it with the color information in the scene image to obtain a three-dimensional color person region image. Accordingly, the fusion may combine either the two-dimensional person region image or the three-dimensional color person region image with the preset solid-color background image to obtain the merged image.
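The two-dimensional extraction path recited in claim 1 (face region, face depth, depth range, connected region) can be sketched with a breadth-first region grow. Assumptions: a face detector has already produced a bounding box `(top, left, bottom, right)` with exclusive bottom/right; a depth of 0 marks invalid pixels; the depth range is the median face depth plus or minus a hypothetical `margin`; and connectivity is 4-neighborhood.

```python
from collections import deque
from statistics import median
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def extract_person_mask(depth: List[List[float]], face_box: Box,
                        margin: float) -> List[List[bool]]:
    """Grow a region from the face, keeping 4-connected pixels inside the depth range."""
    top, left, bottom, right = face_box
    face_depths = [depth[y][x] for y in range(top, bottom)
                   for x in range(left, right) if depth[y][x] > 0]
    if not face_depths:
        raise ValueError("no valid depth inside the face region")
    center = median(face_depths)
    lo, hi = center - margin, center + margin

    def in_range(d: float) -> bool:
        return d > 0 and lo <= d <= hi  # skip invalid (zero) depth

    h, w = len(depth), len(depth[0])
    mask = [[False] * w for _ in range(h)]
    seeds = [(y, x) for y in range(top, bottom) for x in range(left, right)
             if in_range(depth[y][x])]
    for y, x in seeds:
        mask[y][x] = True
    queue = deque(seeds)
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w and not mask[ny][nx]
                    and in_range(depth[ny][nx])):
                mask[ny][nx] = True
                queue.append((ny, nx))
    return mask
```

Requiring connectivity to the face is what excludes objects at the same distance as the user but elsewhere in the frame.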

Further, the video communication background display apparatus 100 also includes an image memory 30. The image memory 30 may be embedded in the electronic device 1000 or may be a memory independent of it, and may include a direct memory access (DMA) feature. The raw image data collected by the visible light camera 11 and the structured light image data collected by the depth image acquisition assembly 12 can be transferred to the image memory 30 for storage or buffering. The processor 20 may read the raw image data from the image memory 30 and process it to obtain a scene image, and may read the structured light image data and process it to obtain a depth image. The scene image and the depth image may also be stored in the image memory 30 so that the processor 20 can retrieve them for processing at any time; for example, the processor 20 retrieves the scene image and the depth image to extract the person region, and fuses the extracted person region image with the preset solid-color background image to obtain the merged image. The preset solid-color background image and the merged image may also be stored in the image memory 30.

The video communication background display apparatus 100 may also include a display 50. The display 50 may obtain the merged image directly from the processor 20 or from the image memory 30, and displays it for viewing by the target user or for further processing by a graphics processing unit (GPU). The apparatus 100 further includes an encoder/decoder 60, which may encode and decode image data such as the scene image, the depth image, and the merged image; the encoded image data may be stored in the image memory 30 and decompressed by the decoder before being shown on the display 50. The encoder/decoder 60 may be implemented by any one or more of a central processing unit (CPU), a GPU, and a coprocessor.

The video communication background display apparatus 100 also includes control logic 40. While the imaging device 10 is capturing images, the processor 20 may analyze the data acquired by the imaging device to determine image statistics used to set one or more control parameters (e.g., exposure time) of the imaging device 10. The processor 20 sends the image statistics to the control logic 40, which determines the control parameters and controls the imaging device 10 to capture images accordingly. The control logic 40 may include a processor and/or microcontroller that executes one or more routines (e.g., firmware), and these routines may determine the control parameters of the imaging device 10 based on the received image statistics.
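The feedback loop between the processor 20 and the control logic 40 can be illustrated with a simple auto-exposure rule: compute the mean luminance as the image statistic, then scale the exposure time toward a target brightness. The target value of 118, the exposure limits, and the per-frame step limit are hypothetical parameters; real control logic typically also adjusts gain and uses metering weights.

```python
from typing import List


def mean_luminance(gray: List[List[int]]) -> float:
    """Image statistic: average 8-bit luminance over all pixels."""
    total = sum(sum(row) for row in gray)
    count = sum(len(row) for row in gray)
    return total / count


def next_exposure(current_us: float, mean_luma: float, target_luma: float = 118.0,
                  min_us: float = 100.0, max_us: float = 33_000.0,
                  max_step: float = 2.0) -> float:
    """Scale exposure toward the target brightness, limiting the change per frame."""
    ratio = target_luma / mean_luma if mean_luma > 0 else max_step
    ratio = min(max(ratio, 1.0 / max_step), max_step)
    return min(max(current_us * ratio, min_us), max_us)
```

Limiting the per-frame step avoids visible brightness oscillation in the live video, and the 33 ms ceiling keeps exposure within one frame period at roughly 30 fps.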

Referring to fig. 14, an electronic device 1000 according to an embodiment of the invention includes one or more processors 200, a memory 300, and one or more programs 310. The one or more programs 310 are stored in the memory 300 and configured to be executed by the one or more processors 200, and include instructions for performing the video communication background display method of any of the above embodiments.

For example, program 310 includes instructions for performing a video communication background display method as described in the following steps:

Step 01, acquiring a scene image of the current user.

Step 02, acquiring a depth image of the current user.

Step 03, processing the scene image and the depth image to extract the person region of the current user in the scene image to obtain a person region image.

Step 04, fusing the person region image with a preset pure-color background image to obtain a merged image, and displaying the merged image to a target user performing video communication with the current user.

Step 05, determining the familiarity between the current user and the target user.

Step 06, acquiring corresponding component elements from the scene where the current user is located according to the familiarity, and displaying the component elements to the target user in the pure-color background image.

As another example, the program 310 further includes instructions for performing a video communication background display method as described in the following steps:

0331: demodulating phase information corresponding to each pixel in the structured light image;

0332: converting the phase information into depth information; and

0333: generating a depth image according to the depth information.
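Steps 0331 to 0333 can be illustrated with one common demodulation scheme, four-step phase shifting of sinusoidal fringes. This is an assumption, since the text does not fix the pattern (and the claims use speckle correlation instead). Phase unwrapping across pixels is omitted, and the phase-to-depth conversion uses a hypothetical linear calibration about a reference plane (`ref_phase`, `ref_depth_mm`, `mm_per_rad`) rather than a full triangulation model.

```python
import math
from typing import List, Sequence


def demodulate_phase(i0: float, i1: float, i2: float, i3: float) -> float:
    """Step 0331: wrapped phase from four fringe images shifted by 0, pi/2, pi, 3pi/2.

    With I_k = A + B*cos(phi + k*pi/2): I3 - I1 = 2B*sin(phi) and I0 - I2 = 2B*cos(phi).
    """
    return math.atan2(i3 - i1, i0 - i2)


def wrap(angle: float) -> float:
    """Wrap an angle into [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def phase_to_depth(phase: float, ref_phase: float, ref_depth_mm: float,
                   mm_per_rad: float) -> float:
    """Step 0332: hypothetical linear calibration around a reference plane."""
    return ref_depth_mm + mm_per_rad * wrap(phase - ref_phase)


def depth_image(frames: Sequence[List[List[float]]], ref_phase: float,
                ref_depth_mm: float, mm_per_rad: float) -> List[List[float]]:
    """Step 0333: apply the per-pixel conversion to build a depth image."""
    f0, f1, f2, f3 = frames
    return [
        [phase_to_depth(demodulate_phase(a, b, c, d), ref_phase, ref_depth_mm, mm_per_rad)
         for a, b, c, d in zip(r0, r1, r2, r3)]
        for r0, r1, r2, r3 in zip(f0, f1, f2, f3)
    ]
```

The background intensity A and modulation B cancel out of the ratio inside `atan2`, which is why four-step phase shifting tolerates uneven ambient light and surface reflectance.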

The computer-readable storage medium of an embodiment of the present invention includes a computer program for use in conjunction with the electronic device 1000 capable of capturing images. The computer program can be executed by the processor 200 to perform the video communication background display method of any of the above embodiments.

For example, the computer program may be executable by the processor 200 to perform a video communication background display method as described in the following steps:

Step 01, acquiring a scene image of the current user.

Step 02, acquiring a depth image of the current user.

As another example, the computer program may also be executable by the processor 200 to perform a video communication background display method as described in the following steps:

0331: demodulating phase information corresponding to each pixel in the structured light image;

0332: converting the phase information into depth information; and

0333: generating a depth image according to the depth information.

In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic uses of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples, and the features of different embodiments or examples, described in this specification, provided they do not contradict each other.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.

The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.

In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims

1. A video communication background display method for an electronic device, comprising:

acquiring a scene image of a current user;

acquiring a depth image of the current user, wherein the depth information of the current user is acquired by using speckle structured light, the depth image of the current user is generated according to the depth information of the current user, wherein speckle patterns in a space are calibrated, the speckle patterns projected onto the current user are shot, the shot speckle patterns and the calibrated speckle patterns are subjected to cross-correlation operation one by one to obtain a correlation image, and the depth information of the current user is obtained according to the correlation image;

processing the scene image and the depth image to extract a person region of the current user in the scene image to obtain a person region image;

fusing the person region image with a preset pure-color background image to obtain a merged image, and displaying the merged image to a target user performing video communication with the current user;

determining familiarity between the current user and the target user;

acquiring corresponding component elements from a scene where the current user is located according to the familiarity, and displaying the component elements to the target user in the pure-color background image;

wherein the processing the scene image and the depth image to extract a person region of the current user in the scene image to obtain a person region image comprises:

identifying a face region in the scene image;

acquiring depth information corresponding to the face region from the depth image;

determining a depth range of the person region according to the depth information of the face region; and

determining a person region that is connected with the face region and falls within the depth range, to obtain the person region image.

2. The method of claim 1, further comprising:

processing the scene image to obtain a full-field edge image of the scene image; and

correcting the person region image according to the full-field edge image.

3. The method of claim 1, wherein the determining familiarity between the current user and the target user comprises:

detecting video interaction information of the current user and the target user according to a preset matching index;

if the video interaction information is detected to satisfy preset matching information, querying a preset correspondence between the matching information and the familiarity, and determining the familiarity between the current user and the target user.

4. The method according to claim 3, wherein the detecting video interaction information of the current user and the target user according to a preset matching index comprises:

detecting content keywords of voice information and text information according to the preset matching index, and/or detecting the information volume of the voice information and the text information.

5. The method of claim 1, wherein the determining familiarity between the current user and the target user comprises:

sending verification requests corresponding to different familiarity degrees to the target user;

verifying the request response fed back by the target user against preset standard information, and determining the familiarity between the current user and the target user according to a verification result.

6. The method of claim 1, wherein the determining familiarity between the current user and the target user comprises:

acquiring a user image of the target user, and extracting facial feature information of the user image;

querying a preset image information base according to the facial feature information to acquire identity information of the target user;

querying a preset correspondence between the identity information and the familiarity, and determining the familiarity between the current user and the target user.

7. The method of claim 1, wherein the acquiring corresponding component elements from the scene where the current user is located according to the familiarity comprises:

acquiring a corresponding number of component elements from the scene where the current user is located according to a number of items corresponding to the familiarity; and/or

acquiring component elements of corresponding types from the scene where the current user is located according to item types corresponding to the familiarity.

8. A video communication background display apparatus for an electronic apparatus, comprising:

a visible light camera, configured to acquire a scene image of a current user;

a depth image acquisition assembly, configured to acquire a depth image of the current user, wherein the depth information of the current user is acquired by using speckle structured light and the depth image of the current user is generated according to the depth information of the current user; speckle patterns in a space are calibrated, the speckle patterns projected onto the current user are captured, the captured speckle patterns are cross-correlated one by one with the calibrated speckle patterns to obtain a correlation image, and the depth information of the current user is obtained according to the correlation image;

a processor, configured to process the scene image and the depth image to extract a person region of the current user in the scene image to obtain a person region image;

determining familiarity between the current user and the target user;

identifying a face region in the scene image;

9. The apparatus of claim 8, wherein the depth image acquisition assembly comprises a structured light projector and a structured light camera, the structured light projector for projecting speckle structured light to the current user.

10. An electronic device, comprising:

one or more processors;

a memory; and

one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the video communication background display method of any of claims 1-7.

11. A computer-readable storage medium comprising a computer program for use in conjunction with an electronic device capable of capturing images, the computer program being executable by a processor to perform the video communication background display method of any of claims 1-7.