CN107734267B

CN107734267B - Image processing method and device

Info

Publication number: CN107734267B
Application number: CN201710813585.2A
Authority: CN
Inventors: 张学勇
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date: 2017-09-11
Filing date: 2017-09-11
Publication date: 2020-06-26
Anticipated expiration: 2037-09-11
Also published as: CN107734267A

Abstract

The invention relates to an image processing method and device, wherein the method comprises the following steps: acquiring component elements of a scene where a current user is located, and processing the component elements according to preset image parameters to generate a virtual background image; if the brightness of the virtual background image is detected to be lower than the scene brightness, simulating a light-on sound, and simultaneously adding a virtual light source into the virtual background image according to the brightness difference value of the virtual background image and the scene brightness to enable the brightness of the virtual background image to be matched with the scene brightness; acquiring a scene image and a depth image of a current user; and obtaining a character area image and fusing the character area image with the virtual background image to obtain a combined image. Therefore, the virtual background image is supplemented with light according to the difference value between the virtual background and the scene brightness, the situation that the scene brightness is too large in brightness difference compared with the virtual background image is avoided, the figure region image and the virtual background image are fused naturally, the visual effect of image processing is improved, the light-on sound is simulated during light supplementation, the sense of reality of light supplementation is increased, and interaction with a user is achieved.

Description

Image processing method and device

Technical Field

The present invention relates to the field of image processing technologies, and in particular, to an image processing method and apparatus.

Background

With the development of internet technology, more and more communication functions are developed and applied, wherein the video communication function is widely applied due to realization and visual communication of users in different places.

However, in the related art, when a user performs video chat, in order to protect the privacy of the user, the scene information where the user is located is shielded by using a virtual background, however, in practical applications, the brightness difference between the brightness in the real scene information and the brightness of the virtual background is usually large, which causes the image of the character area and the image of the virtual background to be fused together more abruptly, and the visual effect is not good.

Disclosure of Invention

The invention provides an image processing method and device, and aims to solve the technical problem that in the prior art, the scene brightness is too large compared with the brightness difference of a virtual background image, so that the fusion of a character region image and the virtual background image is more abrupt.

The embodiment of the invention provides an image processing method, which is used for an electronic device and comprises the following steps: acquiring a component element of a scene where a current user is located, and processing the component element according to a preset image processing mode to generate a virtual background image; detecting the current scene brightness, if the brightness of the virtual background image is lower than the scene brightness, simulating a light-on sound, and simultaneously adding a virtual light source into the virtual background image according to the brightness difference value of the two images to enable the brightness of the virtual background image to be matched with the scene brightness; acquiring a scene image of a current user; acquiring a depth image of the current user; processing the scene image and the depth image to extract a person region of the current user in the scene image to obtain a person region image; and fusing the character area image and the virtual background image to obtain a combined image.

Another embodiment of the present invention provides an image processing apparatus for an electronic apparatus, including: the device comprises a visible light camera, a virtual background image generation module and a virtual background image generation module, wherein the visible light camera is used for acquiring component elements of a scene where a current user is located and processing the component elements according to a preset image processing mode to generate the virtual background image; detecting the current scene brightness, if the brightness of the virtual background image is lower than the scene brightness, simulating a light-on sound, and simultaneously adding a virtual light source into the virtual background image according to the brightness difference value of the two images to enable the brightness of the virtual background image to be matched with the scene brightness; acquiring a scene image of a current user; the depth image acquisition component is used for acquiring a depth image of the current user; a processor for processing the scene image and the depth image to extract a person region of the current user in the scene image to obtain a person region image; and fusing the character area image and the virtual background image to obtain a combined image.

Another embodiment of the present invention provides an electronic device, including: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the image processing methods of the above embodiments.

Yet another embodiment of the present invention provides a computer-readable storage medium including a computer program for use in conjunction with an electronic device capable of image capture, the computer program being executable by a processor to perform the image processing method of the above-described embodiment.

The technical scheme provided by the embodiment of the invention has the following beneficial effects:

the method comprises the steps of obtaining component elements of a scene where a current user is located, processing the component elements according to a preset image processing mode to generate a virtual background image, detecting the brightness of the current scene, simulating a light-on sound if the brightness of the virtual background image is lower than the brightness of the scene, adding a virtual light source into the virtual background image according to the brightness difference value of the virtual background image and the brightness of the scene to enable the brightness of the virtual background image to be matched with the brightness of the scene, obtaining a scene image of the current user, obtaining a depth image of the current user, processing the scene image and the depth image to extract a character area of the current user in the scene image to obtain a character area image, and fusing the character area image and the virtual background image to obtain a combined image. Therefore, the virtual background image is supplemented with light according to the difference value between the virtual background and the scene brightness, the situation that the scene brightness is too large in brightness difference compared with the virtual background image is avoided, the figure region image and the virtual background image are fused naturally, the visual effect of image processing is improved, the light-on sound is simulated during light supplementation, the sense of reality of light supplementation is increased, and interaction with a user is achieved.

Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a schematic flow diagram of an image processing method according to some embodiments of the invention;

FIG. 2 is a schematic flow chart diagram of an image processing method according to some embodiments of the invention;

FIG. 3 is a schematic flow chart diagram of an image processing method according to some embodiments of the invention;

FIG. 4 is a block schematic diagram of an image processing apparatus according to some embodiments of the invention;

FIG. 5 is a schematic structural diagram of an electronic device according to some embodiments of the invention;

FIG. 6 is a schematic flow chart diagram of an image processing method according to some embodiments of the invention;

FIG. 7 is a schematic flow chart diagram of an image processing method according to some embodiments of the invention;

8(a) -8 (e) are scene diagrams of structured light measurements according to one embodiment of the present invention;

FIGS. 9(a) and 9(b) are schematic diagrams of a scene for structured light measurement according to one embodiment of the present invention;

FIG. 10 is a schematic flow chart diagram of an image processing method according to some embodiments of the invention;

FIG. 11 is a schematic flow chart diagram of an image processing method according to some embodiments of the invention;

FIG. 12 is a block diagram of an electronic device according to some embodiments of the invention; and

FIG. 13 is a block diagram of an electronic device according to some embodiments of the invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.

An image processing method and apparatus of an embodiment of the present invention are described below with reference to the drawings.

FIG. 1 is a flow diagram illustrating an image processing method according to some embodiments of the invention. As shown in fig. 1, the method includes:

step 101, acquiring a component element of a scene where a current user is located, and processing the component element according to a preset image processing mode to generate a virtual background image.

The component elements of the scene include items, environment information, and the like of the real scene where the user is located, for example, for a scene of a conference room, the corresponding component elements include tables and chairs of the conference room, office supplies, windows, landscapes outside the windows, and the like.

In addition, the preset image processing mode is a preset conversion mode for converting a scene into a virtual background image according to component elements in the scene where the user is located, the conversion mode is different according to different specific application scenes, and as a possible implementation mode, a virtual image corresponding to the component elements in each scene is preset and stored, and the virtual image can be in a cartoon form or a 3D model form, so that after the component elements of the scene where the current user is located are obtained, the corresponding relationship is inquired, the corresponding virtual image is obtained, and the virtual background image is generated according to the virtual image.

Specifically, in the embodiment of the present invention, a component element of a scene where a current user is located is obtained, and the component element is processed according to a preset image parameter to generate a virtual background image.

Based on the above analysis, the virtual background image may be a two-dimensional virtual background image, or a three-dimensional virtual background image, etc., where if the three-dimensional background image may be obtained by modeling according to information of a real scene where the user is located, the method is not limited herein, and in order to further improve the video experience of the user, the virtual background image may be determined randomly according to a preset mode or according to a preference characteristic of the current user, for example, an animation virtual background image is set for a user who prefers animation, a landscape painting virtual background image is set for a user who prefers landscape, etc., where the virtual background image may be a two-dimensional virtual background image, or a three-dimensional virtual background image, which is not limited herein.

Step 102, detecting the current scene brightness, if the brightness of the virtual background image is lower than the scene brightness, simulating a light-on sound, and adding a virtual light source in the virtual background image according to the brightness difference value of the two images to enable the brightness of the virtual background image to be matched with the scene brightness.

It should be noted that, according to different application scenes, different manners may be adopted to detect the current scene brightness, for example, the current scene brightness may be detected by a brightness sensor, and similarly, according to different application scenes, different manners may be adopted to obtain the brightness of the preset virtual background image, for example, an image brightness parameter of the virtual background may be extracted by an image processing technique, the brightness of the virtual background image may be calculated by the image brightness parameter, for example, the preset virtual background image may be input into a relevant measurement model, and the brightness of the virtual background image may be determined according to the output of the model.

It can be understood that, in practical applications, a preset image processing manner may be converted only based on component elements in a scene to obtain a virtual background image, when a difference between brightness of a preset virtual background image and brightness of the scene is large, a brightness representation of a face region and the like of a current user is large in difference with the preset virtual background image, at this time, in order to facilitate subsequent fusion of the face region and the like of the user and the preset virtual background image to be natural, a virtual light source may be added to the virtual background image to perform brightness enhancement processing.

The types of the virtual light source comprise: one or more of a surface light source, a spot light, a ball light and sunlight.

Specifically, as a possible implementation manner, the current scene brightness is detected, and if the brightness of the virtual background image is detected to be lower than the scene brightness, a virtual light source is added to the virtual background image according to the brightness difference between the two brightness values, so that the brightness of the virtual background image is matched with the scene brightness, and the brightness of the virtual background image is matched with the scene brightness.

In the embodiment of the invention, when the virtual light source is added into the virtual background image according to the brightness difference of the virtual light source and the virtual background image, in order to improve the reality of light supplement and realize interaction with a user, the sound of turning on the light is simulated, for example, a 'click' sound effect is added during light supplement.

It should be noted that, according to different application scenarios, different manners may be adopted to add a virtual light source in the virtual background image according to the luminance difference between the two, which is exemplified as follows:

as a possible implementation manner, referring to fig. 2, the step 102 includes:

step 201, querying fill-in light information corresponding to a preset virtual light source, and obtaining light source compensation intensity and projection direction matched with the brightness difference.

And step 202, adding a corresponding virtual light source in the virtual background image according to the light source compensation intensity and the projection direction.

It is to be understood that, in this example, fill-in information including light source compensation intensity and projection direction matching each brightness difference value is set in advance for each virtual light source, for example, when the virtual light sources are a surface light source and a spotlight, the correspondence among the fill-in information is as shown in table 1 below:

TABLE 1

And then after the brightness difference value is obtained, searching light supplement information corresponding to a preset virtual light source, obtaining light source compensation intensity and a projection direction matched with the brightness difference value, and further adding a corresponding virtual light source in the virtual background image according to the light source compensation intensity and the projection direction, so that the brightness of the virtual background image is matched with the brightness in the actual scene information.

As another possible implementation manner, referring to fig. 3, the step 102 includes:

in step 301, one or more types of virtual light sources are set in the virtual background image.

Step 302, querying preset supplementary lighting adjustment information according to the positions of the various types of virtual light sources, and acquiring target working state data corresponding to the brightness difference.

And 303, adjusting the working parameters of the virtual light sources at the corresponding positions according to the target working state data.

The working parameters of the virtual light source comprise one or more combinations of pitch angle, height, brightness, color and intensity.

It can be understood that, in this example, one or more types of virtual light sources have been set in the virtual background image in advance, at this time, preset supplementary lighting adjustment information may be queried according to the position of each type of virtual light source, and target operating state data corresponding to the brightness difference value is obtained, where the target operating state data corresponds to the total brightness effect and the like represented when each type of virtual light source operates.

Furthermore, in order to realize the brightness effect corresponding to the brightness difference, the operating parameters of the virtual light source at the corresponding position, such as the pitch angle, the height, the brightness, the color, the intensity and the like of the virtual light source, are adjusted according to the target operating state data, so that the brightness of the virtual background image is matched with the brightness in the actual scene information.

And 103, acquiring a scene image of the current user.

And 104, acquiring a depth image of the current user.

Step 105, processing the scene image and the depth image to extract a person region of the current user in the scene image to obtain a person region image.

And step 106, fusing the character area image and the virtual background image to obtain a combined image.

Among them, referring to fig. 4 and 5, the image processing method according to the embodiment of the present invention can be realized by the image processing apparatus 100 according to the embodiment of the present invention. The image processing apparatus 100 according to the embodiment of the present invention is used for the electronic apparatus 1000. As shown in fig. 5, the image processing apparatus 100 includes a visible light camera 11, a depth image capturing assembly 12, and a processor 20.

Steps

101, 102 and 103 may be implemented by the visible light camera 11, step 104 may be implemented by the depth image acquisition assembly 12, and steps 105 and 106 are implemented by the processor 20.

That is to say, the visible light camera 11 may be configured to detect the current scene brightness, and if it is detected that the preset brightness of the virtual background image is lower than the scene brightness, simulate a light-on sound through a related sound device, and add a virtual light source in the virtual background image according to a brightness difference between the two, so that the brightness of the virtual background image matches the scene brightness, and further, obtain the scene image of the current user; the depth image acquisition component 12 may be used to acquire a depth image of a current user; the processor 20 is operable to process the scene image and the depth image to extract a person region of the current user in the scene image to obtain a person region image, and fuse the person region image with a preset virtual background image to obtain a merged image.

The scene image can be a gray level image or a color image, and the depth image representation includes depth information of each person or object in the scene of the current user. The scene range of the scene image is consistent with the scene range of the depth image, and each pixel in the scene image can find the depth information corresponding to the pixel in the depth image.

The image processing apparatus 100 according to the embodiment of the present invention can be applied to the electronic apparatus 1000 according to the embodiment of the present invention. That is, the electronic apparatus 1000 according to the embodiment of the present invention includes the image processing apparatus 100 according to the embodiment of the present invention.

In some embodiments, the electronic device 1000 includes a mobile phone, a tablet computer, a notebook computer, a smart band, a smart watch, a smart helmet, smart glasses, and the like.

The existing method for segmenting the human and the background mainly performs segmentation of the human and the background according to similarity and discontinuity of adjacent pixels in terms of pixel values, but the segmentation method is easily influenced by environmental factors such as external illumination and the like. The image processing device 100 and the electronic device 1000 of the embodiment of the invention extract the human figure region in the scene image by acquiring the depth image of the current user. Because the acquisition of the depth image is not easily influenced by factors such as illumination, color distribution in a scene and the like, the character region extracted through the depth image is more accurate, and particularly, the boundary of the character region can be accurately calibrated. Furthermore, the effect of the merged image formed by fusing the accurate character region image and the preset virtual background is better.

Referring to fig. 6, as a possible implementation manner, the step of acquiring the depth image of the current user in step 103 includes:

step 401, projecting structured light to a current user.

Step 402, a structured light image modulated by a current user is captured.

In step 403, phase information corresponding to each pixel of the structured light image is demodulated to obtain a depth image.

In this example, with continued reference to fig. 5, the depth image acquisition assembly 12 includes a structured light projector 121 and a structured light camera 122. Step 401 may be implemented by the structured light projector 121 and

steps

402 and 403 may be implemented by the structured light camera 122.

That is, the structured light projector 121 may be used to transmit structured light to a current user; the structured light camera 122 may be configured to capture a structured light image modulated by a current user, and demodulate phase information corresponding to each pixel of the structured light image to obtain a depth image.

Specifically, after the structured light projector 121 projects a certain pattern of structured light onto the face and the body of the current user, a structured light image modulated by the current user is formed on the surface of the face and the body of the current user. The structured light camera 122 captures a modulated structured light image, and demodulates the structured light image to obtain a depth image. The pattern of the structured light may be laser stripes, gray codes, sinusoidal stripes, non-uniform speckles, etc.

Referring to fig. 7, in some embodiments, the step 403 of demodulating phase information corresponding to each pixel of the structured-light image to obtain the depth image includes:

step 501, phase information corresponding to each pixel in the structured light image is demodulated.

Step 502, converting the phase information into depth information.

Step 503, generating a depth image according to the depth information.

Continuing to refer to fig. 4, in some embodiments, step 501, step 502, and step 503 can all be implemented by the structured light camera 122.

That is, the structured light camera 122 may be further configured to demodulate phase information corresponding to each pixel in the structured light image, convert the phase information into depth information, and generate a depth image according to the depth information.

Specifically, the phase information of the modulated structured light is changed compared with the unmodulated structured light, and the structured light displayed in the structured light image is the distorted structured light, wherein the changed phase information can represent the depth information of the object. Therefore, the structured light camera 122 first demodulates the phase information corresponding to each pixel in the structured light image, and then calculates the depth information according to the phase information, thereby obtaining the final depth image.

In order to make the process of acquiring depth images of the face and body of the current user according to the structure more obvious to those skilled in the art, a widely-applied raster projection technique (fringe projection technique) is taken as an example to illustrate the specific principle. The grating projection technology belongs to the field of surface structured light in a broad sense.

As shown in fig. 8(a), when the surface structured light is used for projection, firstly, a sinusoidal stripe is generated by computer programming, and is projected to a measured object through the structured light projector 121, then the structured light camera 122 is used to shoot the bending degree of the stripe after being modulated by an object, and then the bending stripe is demodulated to obtain a phase, and then the phase is converted into depth information, so as to obtain a depth image. To avoid the problem of error or error coupling, the depth image capturing assembly 12 needs to be calibrated before using the structured light to capture the depth information, and the calibration includes calibration of geometric parameters (e.g., relative position parameters between the structured light camera 122 and the structured light projector 121, etc.), calibration of internal parameters of the structured light camera 122 and internal parameters of the structured light projector 121, and so on.

Specifically, in a first step, the computer is programmed to generate sinusoidal stripes. Since the phase is acquired by using the distorted stripe, for example, the phase is acquired by using a four-step phase shifting method, four phase differences are generated here

Then the structured light projector 121 projects the four stripes onto the object to be measured (mask shown in fig. 8 (a)) in a time-sharing manner, and the structured light camera 122 acquires the image on the left side of fig. 8(b) and simultaneously reads the stripes on the reference plane shown on the right side of fig. 8 (b).

And secondly, phase recovery is carried out. The structured light camera 122 calculates a modulated phase according to the four acquired modulated fringe patterns (i.e., structured light images), and the obtained phase pattern is a truncated phase pattern. Since the result of the four-step phase-shifting algorithm is calculated by the arctan function, the phase after the light modulation of the structure is limited to between-pi, i.e. it starts again each time the modulated phase exceeds-pi, pi. The resulting phase principal value is shown in fig. 8 (c).

In the phase recovery process, the jump-canceling process is required, that is, the truncated phase is recovered to the continuous phase. As shown in fig. 8(d), the modulated continuous phase diagram is on the left and the reference continuous phase diagram is on the right.

And thirdly, subtracting the modulated continuous phase from the reference continuous phase to obtain a phase difference (namely phase information), wherein the phase difference represents the depth information of the measured object relative to the reference surface, and substituting the phase difference into a phase and depth conversion formula (parameters related in the formula are calibrated), so that the three-dimensional model of the object to be measured shown in the figure 8(e) can be obtained.

It should be understood that, in practical applications, the structured light used in the embodiments of the present invention may be any pattern other than the grating, according to different application scenarios.

As a possible implementation mode, the invention can also use speckle structure light to collect the depth information of the current user.

Specifically, the method for acquiring depth information by using speckle structure light is to use a substantially flat diffraction element, wherein the diffraction element is provided with a relief diffraction structure with a specific phase distribution, and the cross section of the diffraction element is provided with a step relief structure with two or more concave-convex parts. The thickness of the substrate in the diffraction element is approximately 1 micron, the height of each step is not uniform, and the height can be in the range of 0.7-0.9 micron. The structure shown in fig. 9(a) is a partial diffraction structure of the collimating beam splitting element of the present embodiment. Fig. 9(b) is a cross-sectional side view taken along section a-a, with the abscissa and ordinate both in units of microns. Speckle patterns generated by speckle structured light are highly random and can shift pattern with distance. Therefore, before obtaining depth information using speckle structured light, firstly, a speckle pattern in a space needs to be calibrated, for example, a reference plane is taken every 1 cm within a range of 0-4 m from the structured light camera 122, 400 speckle images are saved after calibration is completed, and the smaller the calibrated interval is, the higher the accuracy of the obtained depth information is. Then, the structured light projector 121 projects the speckle structured light onto a measured object (i.e., a current user), and the speckle pattern of the speckle structured light projected onto the measured object is changed by the height difference of the surface of the measured object. After the structured light camera 122 shoots the speckle pattern (i.e., structured light image) projected onto the measured object, the speckle pattern and 400 speckle images stored after previous calibration are subjected to cross-correlation operation one by one, and then 400 correlation images are obtained. The position of the measured object in the space can display a peak value on the correlation image, and the peak values are superposed together and subjected to interpolation operation to obtain the depth information of the measured object.

Since the common diffraction element diffracts the light beam to obtain a plurality of diffracted lights, the difference of the light intensity of each diffracted light beam is large, and the risk of injury to human eyes is also large. Even if the diffracted light is diffracted twice, the uniformity of the obtained light beam is low. Therefore, the effect of projecting the object to be measured by using the light beam diffracted by the ordinary diffraction element is poor. In this embodiment, the collimating beam splitting element is adopted, and the collimating beam splitting element not only has the function of collimating the non-collimated light beam, but also has the function of splitting light, that is, the non-collimated light reflected by the reflector exits a plurality of collimated light beams at different angles after passing through the collimating beam splitting element, the cross-sectional areas of the emitted collimated light beams are approximately equal, the energy fluxes are approximately equal, and further, the effect of projecting by using the scattered light diffracted by the light beams is better. Meanwhile, the laser emergent light is dispersed to each beam of light, the risk of damaging human eyes is further reduced, and compared with other uniformly-arranged structured light, the speckle structured light has the advantage that the electric quantity consumed by the speckle structured light is lower when the same collecting effect is achieved.

Referring to fig. 10, as one possible implementation manner, the step 104 of processing the scene image and the depth image to extract the person region of the current user in the scene image to obtain the person region image includes:

step 601, recognizing a face region in a scene image.

Step 602, obtaining depth information corresponding to the face region from the depth image.

Step 603, determining the depth range of the person region according to the depth information of the face region.

Step 604, determining a human figure region which is connected with the human face region and falls within the depth range according to the depth range of the human figure region to obtain a human figure region image.

Referring back to fig. 4, in some embodiments, step 601, step 602, step 603, and step 604 may be implemented by the processor 20.

That is, the processor 20 may be further configured to identify a face region in the scene image, obtain depth information corresponding to the face region from the depth image, determine a depth range of the person region according to the depth information of the face region, and determine a person region connected to the face region and falling within the depth range according to the depth range of the person region to obtain a person region image.

Specifically, a trained depth learning model can be used to identify a face region in a scene image, and then depth information of the face region can be determined according to a corresponding relationship between the scene image and a depth image. Because the face region includes features such as a nose, eyes, ears, lips, and the like, the depth data corresponding to each feature in the face region in the depth image is different, for example, when the face is directly facing the depth image capturing component 12, the depth data corresponding to the nose may be smaller, and the depth data corresponding to the ears may be larger in the depth image captured by the depth image capturing component 12. Therefore, the depth information of the face region may be a value or a range of values. When the depth information of the face area is a numerical value, the numerical value can be obtained by averaging the depth data of the face area; alternatively, it may be obtained by taking the median of the depth data of the face region.

Since the human figure region includes the human face region, that is, the human figure region and the human face region are located in a certain depth range, after the processor 20 determines the depth information of the human face region, the depth range of the human figure region may be set according to the depth information of the human face region, and then the human figure region falling within the depth range and connected to the human face region is extracted according to the depth range of the human figure region to obtain the human figure region image.

In this way, the person region image can be extracted from the scene image based on the depth information. Because the depth information is not affected by the image of factors such as illumination, color temperature and the like in the environment, the extracted figure region image is more accurate.

Referring to fig. 11, in some embodiments, the image processing method further includes the following steps:

step 701, processing a scene image to obtain a full-field edge image of the scene image.

Step 702, the image of the person region is modified according to the full-field edge image.

Referring back to fig. 4, in some embodiments, step 701 and step 702 may be implemented by processor 20.

That is, the processor 20 may be further configured to process the scene image to obtain a full-field edge image of the scene image, and modify the person region image based on the full-field edge image.

The processor 20 first performs edge extraction on the scene image to obtain a full-field edge image, where edge lines in the full-field edge image include edge lines of the current user and a background object in the scene where the current user is located. Specifically, the edge extraction can be performed on the scene image through a Canny operator. The core of the algorithm for edge extraction by the Canny operator mainly comprises the following steps: firstly, a 2D Gaussian filtering template is used for carrying out convolution on a scene image so as to eliminate noise; then, obtaining the gradient value of the gray scale of each pixel by using a differential operator, calculating the gradient direction of the gray scale of each pixel according to the gradient value, and finding out adjacent pixels of the corresponding pixels along the gradient direction through the gradient direction; then, each pixel is traversed, and if the gray value of a certain pixel is not the maximum compared with the gray values of two adjacent pixels in front and back in the gradient direction, the pixel is not considered as the edge point. Therefore, pixel points at the edge position in the scene image can be determined, and the full-field edge image after edge extraction is obtained.

After the processor 20 obtains the full-field edge image, the human area image is corrected according to the full-field edge image. It is understood that the person region image is obtained by merging all pixels in the scene image, which are connected to the face region and fall within the set depth range, and in some scenes, there may be some objects connected to the face region and fall within the depth range. Therefore, in order to make the extracted human figure region image more accurate, the human figure region image can be corrected using the full-field edge map.

Further, the processor 20 may perform a secondary correction on the corrected image of the person region, for example, perform an expansion process on the corrected image of the person region to expand the image of the person region to retain edge details of the image of the person region.

After the processor 20 obtains the person region image, the person region image and the predetermined virtual background may be fused to obtain a merged image. In some embodiments, the preset virtual background may be randomly selected by the processor 20 or selected by the current user. The merged combined image may be displayed on a display screen of the electronic apparatus 1000 or may be printed by a printer connected to the electronic apparatus 1000.

In an embodiment of the present invention, a current user wants to hide a current background during a video process with another person, at this time, the image processing method according to the embodiment of the present invention may be used to fuse a person region image corresponding to the current user with a preset virtual background, add a virtual light source in the virtual background image in order to fuse the fused person region image with the preset virtual background, and simultaneously, simulate a switching sound in order to enhance an interaction with the user so that a brightness of the virtual background image matches a brightness of a scene, and then display the fused combined image to a target user. Since the current user is in a video call with the other party, the visible light camera 11 needs to capture a scene image of the current user in real time, the depth image collecting component 12 also needs to collect a depth image corresponding to the current user in real time, and the processor 20 timely processes the scene image and the depth image collected in real time so that the other party can see a smooth video picture formed by combining multiple frames of combined images.

In summary, in the image processing method according to the embodiment of the present invention, component elements of a scene where a current user is located are obtained, the component elements are processed according to a preset image processing manner to generate a virtual background image, the brightness of the current scene is detected, if it is detected that the brightness of the virtual background image is lower than the scene brightness, a light-on sound is simulated, meanwhile, a virtual light source is added to the virtual background image according to a brightness difference between the two brightness values, so that the brightness of the virtual background image matches the scene brightness, the scene image of the current user is obtained, a depth image of the current user is obtained, the scene image and the depth image are processed to extract a character region of the current user in the scene image to obtain a character region image, and the character region image and the virtual background image are fused to obtain a combined image. Therefore, the virtual background image is supplemented with light according to the difference value between the virtual background and the scene brightness, the situation that the scene brightness is too large in brightness difference compared with the virtual background image is avoided, the figure region image and the virtual background image are fused naturally, the visual effect of image processing is improved, the light-on sound is simulated during light supplementation, the sense of reality of light supplementation is increased, and interaction with a user is achieved.

Referring to fig. 5 and fig. 12, an electronic device 1000 according to an embodiment of the invention is further provided. The electronic device 1000 includes the image processing device 100. The image processing apparatus 100 may be implemented using hardware and/or software. The image processing apparatus 100 includes an imaging device 10 and a processor 20.

The imaging device 10 includes a visible light camera 11 and a depth image acquisition assembly 12.

Specifically, the visible light camera 11 includes an image sensor 111 and lenses 112, and the visible light camera 11 may be configured to detect current scene brightness, and if it is detected that the brightness of a preset virtual background image is lower than the scene brightness, add a virtual light source in the virtual background image according to a brightness difference between the two, and meanwhile, to improve reality, simulate a light-on sound while supplementing light, so that the brightness of the virtual background image matches the scene brightness, and further capture color information of a current user to obtain a scene image, where the image sensor 111 includes a color filter array (e.g., a Bayer filter array), and the number of the lenses 112 may be one or more. In the process of acquiring a scene image by the visible light camera 11, each imaging pixel in the image sensor 111 senses light intensity and wavelength information from a shooting scene to generate a group of original image data; the image sensor 111 sends the group of raw image data to the processor 20, and the processor 20 performs operations such as denoising and interpolation on the raw image data to obtain a colorful scene image. Processor 20 may process each image pixel in the raw image data one-by-one in a variety of formats, for example, each image pixel may have a bit depth of 8, 10, 12, or 14 bits, and processor 20 may process each image pixel at the same or different bit depth.

The depth image acquisition assembly 12 includes a structured light projector 121 and a structured light camera 122, and the depth image acquisition assembly 12 is operable to capture depth information of a current user to obtain a depth image. The structured light projector 121 is used to project structured light to the current user, wherein the structured light pattern may be a laser stripe, a gray code, a sinusoidal stripe, or a randomly arranged speckle pattern, etc. The structured light camera 122 includes an image sensor 1221 and lenses 1222, and the number of the lenses 1222 may be one or more. The image sensor 1221 is used to capture a structured light image projected onto a current user by the structured light projector 121. The structured light image may be sent by the depth acquisition component 12 to the processor 20 for demodulation, phase recovery, phase information calculation, and the like to obtain the depth information of the current user.

In some embodiments, the functions of the visible light camera 11 and the structured light camera 122 can be implemented by one camera, that is, the imaging device 10 includes only one camera and one structured light projector 121, and the camera can capture not only the scene image but also the structured light image.

In addition to acquiring a depth image by using structured light, a depth image of a current user may be acquired by a binocular vision method, a time of flight (TOF) based depth image acquisition method, or the like.

The processor 20 is further configured to fuse the character region image extracted from the scene image and the depth image with a preset virtual background image, and display the merged image to a target user in video communication with the current user. When extracting the person region image, the processor 20 may extract a two-dimensional person region image from the scene image in combination with the depth information in the depth image, or may create a three-dimensional map of the person region according to the depth information in the depth image, and color-fill the three-dimensional person region in combination with the color information in the scene image to obtain a three-dimensional color person region image. Therefore, when the character region image and the preset virtual background image are subjected to the fusion processing, the two-dimensional character region image and the preset virtual background image may be fused to obtain a merged image, or the three-dimensional color character region image and the preset virtual background image may be fused to obtain the merged image.

Further, the image processing apparatus 100 includes an image memory 30. The image memory 30 may be embedded in the electronic device 1000, or may be a memory independent from the electronic device 1000, and may include a Direct Memory Access (DMA) feature. The raw image data collected by the visible light camera 11 or the structured light image related data collected by the depth image collecting assembly 12 can be transmitted to the image memory 30 for storage or buffering. Processor 20 may read raw image data from image memory 30 for processing to obtain an image of a scene and may read structured light image-related data from image memory 30 for processing to obtain a depth image. In addition, the scene image and the depth image may also be stored in the image memory 30, so that the processor 20 may call the processing at any time, for example, the processor 20 calls the scene image and the depth image to perform the character region extraction, and performs the fusion processing on the obtained character region image after the calling and the depth image and the preset virtual background image to obtain the merged image. Wherein the preset virtual background image and the merged image may also be stored in the image memory 30.

The image processing apparatus 100 may also include a display 50. The display 50 may retrieve the merged image directly from the processor 20 and may also retrieve the merged image from the image memory 30. The display 50 displays the merged image for viewing by the target user or for further Processing by a Graphics Processing Unit (GPU). The image processing apparatus 100 further includes an encoder/decoder 60, and the encoder/decoder 60 may encode and decode image data of a scene image, a depth image, a merged image, and the like, and the encoded image data may be stored in the image memory 30 and may be decompressed by the decoder for display before the image is displayed on the display 50. The encoder/decoder 60 may be implemented by a Central Processing Unit (CPU), a GPU, or a coprocessor. In other words, the encoder/decoder 60 may be any one or more of a Central Processing Unit (CPU), a GPU, and a coprocessor.

The image processing apparatus 100 further comprises a control logic 40. When imaging device 10 is imaging, processor 20 may perform an analysis based on data acquired by the imaging device to determine image statistics for one or more control parameters (e.g., exposure time, etc.) of imaging device 10. Processor 20 sends the image statistics to control logic 40 and control logic 40 controls imaging device 10 to determine the control parameters for imaging. Control logic 40 may include a processor and/or microcontroller that executes one or more routines (e.g., firmware). One or more routines may determine control parameters of imaging device 10 based on the received image statistics.

Referring to fig. 13, an electronic device 1000 according to an embodiment of the invention includes one or more processors 200, a memory 300, and one or more programs 310. Where one or more programs 310 are stored in memory 300 and configured to be executed by one or more processors 200. The program 310 includes instructions for performing the image processing method of any of the above embodiments.

For example, program 310 includes instructions for performing the image processing method described in the following steps:

step 01: acquiring a component element of a scene where a current user is located, and processing the component element according to a preset image processing mode to generate a virtual background image.

Step 02: detecting the current scene brightness, if the brightness of the virtual background image is lower than the scene brightness, simulating a light-on sound, and simultaneously adding a virtual light source in the virtual background image according to the brightness difference value of the two images to enable the brightness of the virtual background image to be matched with the scene brightness.

Step 03: and acquiring a scene image of the current user.

Step 04: and acquiring a depth image of the current user.

Step 05: the scene image and the depth image are processed to extract a person region of the current user in the scene image to obtain a person region image.

Step 06: and fusing the character area image and the virtual background image to obtain a combined image.

For another example, the program 310 further includes instructions for performing an image processing method as described in the following steps:

step 0331: demodulating phase information corresponding to each pixel in the structured light image;

step 0332: converting the phase information into depth information; and

step 0333: and generating a depth image according to the depth information.

The computer-readable storage medium of an embodiment of the present invention includes a computer program for use in conjunction with the image-enabled electronic device 1000. The computer program may be executed by the processor 200 to perform the image processing method of any of the above embodiments.

For example, the computer program may be executed by the processor 200 to perform the image processing method described in the following steps:

Step 03: and acquiring a scene image of the current user.

Step 04: and acquiring a depth image of the current user.

As another example, the computer program may also be executable by the processor 200 to perform an image processing method as described in the following steps:

0331: demodulating phase information corresponding to each pixel in the structured light image;

0332: converting the phase information into depth information; and

0333: and generating a depth image according to the depth information.

In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.

Furthermore, the terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.

The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.

In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims

1. An image processing method for an electronic device, the image processing method comprising:

acquiring component elements of a scene where a current user is located, wherein the component elements comprise articles and environment information of a real scene where the user is located; processing the component elements according to a preset image processing mode to generate a virtual background image; the virtual background image comprises one of a two-dimensional virtual background image or a three-dimensional virtual background image;

detecting the current scene brightness, if the brightness of the virtual background image is lower than the scene brightness, simulating a light-on sound, and simultaneously adding a virtual light source corresponding to the brightness difference value in the virtual background image according to the brightness difference value of the virtual background image and the scene brightness to enable the brightness of the virtual background image to be matched with the scene brightness;

acquiring a scene image of a current user;

acquiring a depth image of the current user;

processing the scene image and the depth image to extract a person region of the current user in the scene image to obtain a person region image;

and fusing the character area image and the virtual background image to obtain a combined image.

2. The method according to claim 1, wherein adding a virtual light source in the virtual background image according to the brightness difference between the two comprises:

inquiring light supplement information corresponding to a preset virtual light source, and acquiring light source compensation intensity and a projection direction matched with the brightness difference;

and adding a corresponding virtual light source in the virtual background image according to the light source compensation intensity and the projection direction.

3. The method according to claim 1, wherein adding a virtual light source in the virtual background image according to the brightness difference between the two comprises:

setting one or more types of virtual light sources in the virtual background image;

inquiring preset supplementary lighting adjustment information according to the positions of the various types of virtual light sources, and acquiring target working state data corresponding to the brightness difference;

and adjusting the working parameters of the virtual light sources at the corresponding positions according to the target working state data.

4. The method of claim 3,

the type of the virtual light source comprises: one or more of a surface light source, a spot light, a ball light and sunlight;

the working parameters of the virtual light source comprise: pitch, altitude, brightness, color, intensity, or a combination thereof.

5. The method of claim 1, further comprising:

and determining the virtual background image randomly according to a preset mode or according to the preference characteristics of the current user.

6. The method of claim 1, wherein the obtaining the depth image of the current user comprises:

projecting structured light towards the current user;

shooting a structured light image modulated by the current user; and

and demodulating phase information corresponding to each pixel of the structured light image to obtain the depth image.

7. The method of claim 6, wherein demodulating phase information corresponding to each pixel of the structured-light image to obtain the depth image comprises:

demodulating phase information corresponding to each pixel in the structured light image;

converting the phase information into depth information; and

and generating the depth image according to the depth information.

8. The method of claim 1, wherein the processing the scene image and the depth image to extract a human figure region of the current user in the scene image to obtain a human figure region image comprises:

identifying a face region in the scene image;

acquiring depth information corresponding to the face area from the depth image;

determining the depth range of the character region according to the depth information of the face region; and

and determining a person region which is connected with the face region and falls into the depth range according to the depth range of the person region to obtain the person region image.

9. The method of claim 8, further comprising:

processing the scene image to obtain a full-field edge image of the scene image; and

and correcting the image of the person region according to the full-field edge image.

10. An image processing apparatus for an electronic apparatus, comprising:

the system comprises a visible light camera and a control module, wherein the visible light camera is used for acquiring component elements of a scene where a current user is located, and the component elements comprise articles and environment information of a real scene where the user is located; processing the component elements according to a preset image processing mode to generate a virtual background image; the virtual background image comprises one of a two-dimensional virtual background image or a three-dimensional virtual background image;

acquiring a scene image of a current user;

the depth image acquisition component is used for acquiring a depth image of the current user;

a processor for processing the scene image and the depth image to extract a person region of the current user in the scene image to obtain a person region image;

11. The apparatus of claim 10, wherein the depth image acquisition assembly comprises a structured light projector and a structured light camera, the structured light projector for projecting structured light to the current user;

the structured light camera is configured to:

shooting a structured light image modulated by the current user; and

12. The apparatus of claim 11, wherein the structured light camera is further to:

converting the phase information into depth information; and

and generating the depth image according to the depth information.

13. An electronic device, comprising:

one or more processors;

a memory; and

one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the image processing method of any of claims 1-9.

14. A computer-readable storage medium comprising a computer program for use in conjunction with an electronic device capable of capturing images, the computer program being executable by a processor to perform the image processing method of any of claims 1-9.