CN107734266B - Image processing method and apparatus, electronic apparatus, and computer-readable storage medium

Info

Publication number: CN107734266B
Application number: CN201710813312.8A
Authority: CN
Inventors: 张学勇
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date: 2017-09-11
Filing date: 2017-09-11
Publication date: 2021-01-19
Anticipated expiration: 2037-09-11
Also published as: CN107734266A

Abstract

The invention discloses an image processing method used by an electronic device to process a merged image. The merged image is formed by fusing a predetermined three-dimensional background image with a person region image taken from a scene image of a current user in a real scene. The image processing method comprises the following steps: detecting position change information of the person region image according to the motion sensor output of the electronic device; and adjusting the predetermined three-dimensional background image according to the position change information so that the predetermined three-dimensional background image matches the position change information. The invention also discloses an image processing device, an electronic device and a computer-readable storage medium. With the image processing method, the image processing device, the electronic device and the computer-readable storage medium, the position change of the person region image is detected according to the motion information of the electronic device and the background image is adjusted accordingly, so that the motion of the background image matches the motion of the person region image, the merged image looks more realistic, and the user experience is improved.

Description

Image processing method and apparatus, electronic apparatus, and computer-readable storage medium

Technical Field

The present invention relates to the field of image processing technologies, and in particular, to an image processing method and apparatus, an electronic apparatus, and a computer-readable storage medium.

Background

In existing approaches, when a person image from a real scene is fused with a virtual background image, movement of the camera changes the position of the person image in the real scene while the virtual background image remains unchanged. As a result, the person image and the virtual background image match poorly, and the user experience is poor.

Disclosure of Invention

Embodiments of the present invention provide an image processing method, an image processing apparatus, an electronic apparatus, and a computer-readable storage medium.

An image processing method according to an embodiment of the present invention is used by an electronic device to process a merged image. The merged image is formed by fusing a predetermined three-dimensional background image with a person region image taken from a scene image of a current user in a real scene. The image processing method includes:

detecting position change information of the person region image according to the motion sensor output of the electronic device; and

adjusting the predetermined three-dimensional background image according to the position change information so that the predetermined three-dimensional background image matches the position change information.

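An image processing apparatus according to an embodiment of the present invention is used by an electronic device to process a merged image. The merged image is formed by fusing a predetermined three-dimensional background image with a person region image taken from a scene image of a current user in a real scene. The image processing apparatus includes:

a processor configured to detect position change information of the person region image according to the motion sensor output of the electronic device, and to adjust the predetermined three-dimensional background image according to the position change information so that the predetermined three-dimensional background image matches the position change information.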

The electronic device of an embodiment of the present invention includes one or more processors, a memory, and one or more programs. The one or more programs are stored in the memory and configured to be executed by the one or more processors, and the programs include instructions for performing the image processing method described above.

The computer-readable storage medium of an embodiment of the present invention includes a computer program for use in conjunction with an electronic device capable of image capture, the computer program being executable by a processor to perform the image processing method described above.

According to the image processing method, the image processing device, the electronic device and the computer readable storage medium, when the combined image of the real person and the virtual background is processed, the position change of the person image is detected according to the motion information of the electronic device, so that the background image is adjusted, the motion information of the background image is matched with the motion information of the person image, the visual effect of the combined image is more real, and the user experience is improved.

Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a flow diagram illustrating an image processing method according to some embodiments of the invention.

FIG. 2 is a schematic structural diagram of an electronic device according to some embodiments of the invention.

FIG. 3 is a flow diagram illustrating an image processing method according to some embodiments of the invention.

FIG. 4 is a block diagram of an image processing apparatus according to some embodiments of the invention.

FIG. 5 is a flow chart illustrating an image processing method according to some embodiments of the present invention.

FIG. 6 is a flow chart illustrating an image processing method according to some embodiments of the invention.

FIGS. 7(a) to 7(e) are schematic views of a scene of structured light measurement according to an embodiment of the present invention.

FIGS. 8(a) and 8(b) are schematic views of a scene for structured light measurement according to one embodiment of the present invention.

FIG. 9 is a flow chart illustrating an image processing method according to some embodiments of the invention.

FIG. 10 is a flow chart illustrating an image processing method according to some embodiments of the invention.

FIG. 11 is a flow chart illustrating an image processing method according to some embodiments of the invention.

FIG. 12 is a flow chart illustrating an image processing method according to some embodiments of the invention.

FIG. 13 is a block diagram of an electronic device according to some embodiments of the invention.

FIG. 14 is a block diagram of an electronic device according to some embodiments of the invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer throughout to the same or similar elements or to elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended to explain the invention, and are not to be construed as limiting the invention.

Referring to fig. 1 and fig. 2, an image processing method according to an embodiment of the present invention is used for an electronic device to process a merged image, and the image processing method includes the steps of:

S10: detecting position change information of the person region image according to the motion sensor output of the electronic device; and

S20: adjusting the predetermined three-dimensional background image according to the position change information so that the predetermined three-dimensional background image matches the position change information.

Referring to fig. 2, an image processing apparatus 100 according to an embodiment of the present invention is used for processing a merged image. The merged image is formed by fusing a predetermined three-dimensional background image with a person region image taken from a scene image of a current user in a real scene. The image processing method according to the embodiment of the present invention can be implemented by the image processing apparatus 100 according to the embodiment of the present invention, and is used in the electronic apparatus 1000. The image processing apparatus 100 includes a processor 20. Steps S10 and S20 may be implemented by the processor 20.

That is, the processor 20 is configured to detect position change information of the person region image according to the motion sensor output of the electronic device 1000 and adjust the predetermined three-dimensional background image according to the position change information to match the predetermined three-dimensional background with the position change information.

In some application scenarios, such as a video conference or a video chat, the participants replace the real scene with a predetermined three-dimensional background image for reasons of security, privacy, or added interest. The predetermined three-dimensional background image is fused with the person region image taken from the scene image of the current user in the real scene to form a merged image, which is output and presented to the other party. The scene image is captured by a camera. During capture, a change in the position of the electronic device, for example when it is dropped or moved, can change the shooting angle or even make it abnormal, so the angle and position of the scene image change and the user experience suffers.

In the image processing method according to the embodiment of the present invention, during a video conference or a video chat, the motion of the camera, that is, the position change information of the person region image, is detected from the output of a motion sensor such as an acceleration sensor or a gyroscope. Specifically, a three-dimensional coordinate system can be established with the current position of the person region image as the origin, and the x axis, y axis, z axis and their positive directions are specified. It can be understood that the adjustment of the three-dimensional background image lags behind the change of the person region image. Therefore, when the person region image is detected to start changing, the last stable merged frame before the change can continue to be output. When the person region image becomes stable again, the position change information is detected and the predetermined three-dimensional background image is adjusted according to it. After the adjustment is completed, the adjusted image is output directly and the intermediate change process is skipped, so that the picture remains stable to some extent.
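The hold-then-adjust behaviour described above can be sketched as a small state machine. This is only an illustration under stated assumptions, not the implementation of the embodiment: the class name `BackgroundStabilizer`, the `motion_threshold` value, and the premise that the motion sensor output has already been converted into a per-sample displacement vector (x, y, z) in the coordinate system above are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class BackgroundStabilizer:
    """Hold the last stable frame while the device moves, then apply the
    accumulated displacement to the background in a single step."""

    motion_threshold: float = 0.05  # hypothetical: per-sample displacement treated as "moving"
    background_offset: Vec3 = (0.0, 0.0, 0.0)
    _pending: Vec3 = (0.0, 0.0, 0.0)
    _moving: bool = False

    def update(self, displacement: Vec3) -> str:
        """Consume one displacement sample and return the action to take."""
        magnitude = sum(c * c for c in displacement) ** 0.5
        if magnitude > self.motion_threshold:
            # Still moving: accumulate the change and keep showing the old frame.
            self._pending = tuple(p + d for p, d in zip(self._pending, displacement))
            self._moving = True
            return "hold_last_stable_frame"
        if self._moving:
            # Stable again: apply the whole accumulated change at once.
            self.background_offset = tuple(
                b + p for b, p in zip(self.background_offset, self._pending)
            )
            self._pending = (0.0, 0.0, 0.0)
            self._moving = False
            return "output_adjusted_frame"
        return "output_current_frame"
```

A caller would feed one displacement sample per frame and render according to the returned action; the intermediate samples never reach the renderer, which matches the idea of ignoring the intermediate change process.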

The image processing apparatus 100 according to the embodiment of the present invention can be applied to the electronic apparatus 1000 according to the embodiment of the present invention. That is, the electronic apparatus 1000 according to the embodiment of the present invention includes the image processing apparatus 100 according to the embodiment of the present invention.

In some embodiments, the electronic device 1000 includes a mobile phone, a tablet computer, a notebook computer, a smart band, a smart watch, a smart helmet, smart glasses, and the like.

Referring to fig. 3 and 4, in some embodiments, an image processing method includes:

S01: acquiring a scene image of a current user;

S02: acquiring a depth image of the current user;

S03: processing the scene image and the depth image to extract a person region of the current user in the scene image to obtain a person region image; and

S04: fusing the person region image with a predetermined three-dimensional background image to obtain a merged image.

In some embodiments, the image processing apparatus 100 further includes a visible light camera 11 and a depth image acquisition assembly 12. Step S01 may be implemented by the visible light camera 11, step S02 may be implemented by the depth image acquisition assembly 12, and steps S03 and S04 may be implemented by the processor 20.

That is, the visible light camera 11 may be used to acquire a scene image of the current user; the depth image acquisition component 12 may be used to acquire a depth image of a current user; the processor 20 may be configured to process the scene image and the depth image to extract a person region of a current user in the scene image to obtain a person region image, and to fuse the person region image with a predetermined three-dimensional background image to obtain a merged image.

The scene image can be a grayscale image or a color image, and the depth image represents the depth information of each person or object in the scene of the current user. The scene range of the scene image is consistent with the scene range of the depth image, and for each pixel in the scene image the corresponding depth information can be found in the depth image.

Existing methods for separating a person from the background mainly rely on the similarity and discontinuity of the pixel values of adjacent pixels, but such segmentation is easily affected by environmental factors such as external illumination. The image processing method, the image processing apparatus 100 and the electronic apparatus 1000 of the embodiment of the present invention extract the person region in the scene image by acquiring the depth image of the current user. Because the acquisition of the depth image is not easily affected by factors such as illumination and the color distribution in the scene, the person region extracted using the depth image is more accurate, and in particular the boundary of the person region can be accurately delineated. Furthermore, the merged image obtained by fusing the more accurate person region image with the predetermined three-dimensional background image looks better.

In some embodiments, the predetermined three-dimensional background image may be a predetermined three-dimensional background image obtained by modeling an actual scene, or may be an animated predetermined three-dimensional background image. The predetermined three-dimensional background image may be randomly selected by the processor 20 or may be selected by the current user.

Referring to fig. 5, in some embodiments, step S02 includes the steps of:

S021: projecting structured light onto the current user;

S022: capturing a structured light image modulated by the current user; and

S023: demodulating the phase information corresponding to each pixel of the structured light image to obtain the depth image.

Referring back to fig. 2, in some embodiments, the depth image capture assembly 12 includes a structured light projector 121 and a structured light camera 122. Step S021 may be implemented by the structured light projector 121, and steps S022 and S023 may be implemented by the structured light camera 122.

That is, the structured light projector 121 may be used to transmit structured light to a current user; the structured light camera 122 may be configured to capture a structured light image modulated by a current user, and demodulate phase information corresponding to each pixel of the structured light image to obtain a depth image.

Specifically, after the structured light projector 121 projects a certain pattern of structured light onto the face and the body of the current user, a structured light image modulated by the current user is formed on the surface of the face and the body of the current user. The structured light camera 122 captures a modulated structured light image, and demodulates the structured light image to obtain a depth image. The pattern of the structured light may be laser stripes, gray codes, sinusoidal stripes, non-uniform speckles, etc.

Referring to fig. 6, in some embodiments, the step S023 demodulating the phase information corresponding to each pixel of the structured-light image to obtain the depth image includes:

S0231: demodulating the phase information corresponding to each pixel in the structured light image;

S0232: converting the phase information into depth information; and

S0233: generating the depth image according to the depth information.

In certain embodiments, step S0231, step S0232, and step S0233 may all be implemented by structured light camera 122.

That is, the structured light camera 122 may be further configured to demodulate phase information corresponding to each pixel in the structured light image, convert the phase information into depth information, and generate a depth image according to the depth information.

Specifically, the phase information of the modulated structured light is changed compared with the unmodulated structured light, and the structured light displayed in the structured light image is the distorted structured light, wherein the changed phase information can represent the depth information of the object. Therefore, the structured light camera 122 first demodulates the phase information corresponding to each pixel in the structured light image, and then calculates the depth information according to the phase information, thereby obtaining the final depth image.

To make the process of acquiring depth images of the face and body of the current user with structured light clearer to those skilled in the art, the widely used grating projection technique (fringe projection technique) is taken as an example to illustrate the specific principle. The grating projection technique belongs, in a broad sense, to surface structured light.

As shown in fig. 7(a), when surface structured light is used for projection, sinusoidal fringes are first generated by computer programming and projected onto the measured object by the structured light projector 121. The structured light camera 122 then captures how the fringes are bent after being modulated by the object, the bent fringes are demodulated to obtain the phase, and the phase is converted into depth information to obtain the depth image. To avoid errors or error coupling, the depth image acquisition assembly 12 needs to be calibrated before structured light is used to acquire depth information. The calibration includes calibration of geometric parameters (for example, the relative position parameters between the structured light camera 122 and the structured light projector 121), calibration of the internal parameters of the structured light camera 122, and calibration of the internal parameters of the structured light projector 121.

Specifically, in the first step, sinusoidal fringes are generated by computer programming. Because the phase must later be obtained from the distorted fringes, for example by a four-step phase-shifting method, four fringe patterns whose phases differ successively by π/2 are generated here. The structured light projector 121 then projects the four fringe patterns onto the measured object (the mask shown in fig. 7(a)) in a time-sharing manner, and the structured light camera 122 captures the image on the left of fig. 7(b) and also reads the fringes on the reference plane shown on the right of fig. 7(b).

In the second step, phase recovery is carried out. The structured light camera 122 calculates the modulated phase from the four captured modulated fringe patterns (i.e., structured light images), and the phase map obtained at this point is a truncated (wrapped) phase map. Because the result of the four-step phase-shifting algorithm is calculated with the arctangent function, the phase after structured light modulation is limited to [-π, π]; that is, whenever the modulated phase goes beyond [-π, π], it wraps around and starts again. The resulting phase principal value is shown in fig. 7(c).

During phase recovery, the phase jumps must be removed, that is, the truncated phase is restored to a continuous phase (phase unwrapping). As shown in fig. 7(d), the modulated continuous phase map is on the left and the reference continuous phase map is on the right.

In the third step, the difference between the modulated continuous phase and the reference continuous phase is taken to obtain a phase difference (i.e., the phase information). The phase difference represents the depth information of the measured object relative to the reference plane. Substituting the phase difference into the phase-to-depth conversion formula (whose parameters have been calibrated) yields the three-dimensional model of the measured object shown in fig. 7(e).
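The phase recovery in the second and third steps can be sketched with NumPy as follows. This is a minimal illustration assuming fringe intensities of the form I_k = A + B·cos(φ + k·π/2) for k = 0..3; the function names are hypothetical, the unwrapping is the simple row-wise `np.unwrap` rather than a full two-dimensional method, and the sign convention (object phase minus reference phase) is an assumption. The calibrated phase-to-depth conversion formula is not given in the text and is therefore left out.

```python
import numpy as np


def wrapped_phase(i0, i1, i2, i3):
    """Four-step phase shifting with shifts 0, pi/2, pi, 3*pi/2.

    With I_k = A + B*cos(phi + k*pi/2):
        I3 - I1 = 2*B*sin(phi),  I0 - I2 = 2*B*cos(phi)
    so phi = atan2(I3 - I1, I0 - I2), limited to [-pi, pi].
    """
    return np.arctan2(i3 - i1, i0 - i2)


def phase_difference(object_fringes, reference_fringes):
    """Unwrap both truncated phase maps and return object minus reference."""
    obj = np.unwrap(wrapped_phase(*object_fringes), axis=-1)
    ref = np.unwrap(wrapped_phase(*reference_fringes), axis=-1)
    return obj - ref
```

For example, passing four synthetic fringe rows whose carrier phase is perturbed by a smooth bump, together with the four unperturbed reference rows, returns that bump as the phase difference.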

It should be understood that, in practical applications, the structured light used in the embodiments of the present invention may be any pattern other than the grating, according to different application scenarios.

As a possible implementation, the present invention can also use speckle structured light to acquire the depth information of the current user.

Specifically, the method for acquiring depth information with speckle structured light uses a substantially flat diffraction element. The diffraction element has a relief diffraction structure with a specific phase distribution, and its cross section has a stepped relief structure with two or more levels. The thickness of the substrate in the diffraction element is approximately 1 micron, the heights of the steps are not uniform, and each height can be in the range of 0.7 to 0.9 micron. The structure shown in fig. 8(a) is a partial diffraction structure of the collimating beam splitting element of the present embodiment. Fig. 8(b) is a cross-sectional side view taken along section A-A, with the abscissa and ordinate both in microns. The speckle pattern generated by speckle structured light is highly random, and the pattern changes with distance. Therefore, before depth information is obtained using speckle structured light, the speckle patterns in space must first be calibrated. For example, a reference plane is taken every 1 cm within a range of 0 to 4 m from the structured light camera 122, so that 400 speckle images are saved once calibration is completed; the smaller the calibration interval, the higher the accuracy of the obtained depth information. Then, the structured light projector 121 projects the speckle structured light onto the measured object (i.e., the current user), and the height differences of the surface of the measured object change the speckle pattern projected onto it. After the structured light camera 122 captures the speckle pattern (i.e., the structured light image) projected onto the measured object, the speckle pattern is cross-correlated one by one with the 400 speckle images saved during calibration, yielding 400 correlation images. The position of the measured object in space shows up as a peak in the correlation images, and the depth information of the measured object is obtained by superposing these peaks and performing an interpolation operation.
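The calibration-and-correlation idea can be sketched as follows. This is a simplified illustration, not the procedure of the embodiment: it scores the captured pattern against each calibrated reference image with one global normalized correlation value instead of producing a full correlation image per reference plane, and it refines the best match with a parabolic fit as a stand-in for the interpolation step. The function names and the synthetic reference data used to exercise it are hypothetical.

```python
import numpy as np


def normalized_correlation(a, b):
    """Zero-mean normalized cross-correlation of two equally sized images."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def estimate_depth(captured, reference_images, reference_depths):
    """Return the depth whose calibrated speckle image best matches `captured`."""
    scores = np.array(
        [normalized_correlation(captured, ref) for ref in reference_images]
    )
    k = int(np.argmax(scores))
    depth = float(reference_depths[k])
    # Parabolic interpolation around the peak (assumed refinement step).
    if 0 < k < len(scores) - 1:
        s0, s1, s2 = scores[k - 1], scores[k], scores[k + 1]
        curvature = s0 - 2.0 * s1 + s2
        if curvature < 0:
            offset = 0.5 * (s0 - s2) / curvature
            depth += offset * (reference_depths[k + 1] - reference_depths[k])
    return depth
```

In a real system the correlation would be evaluated per image window so that each pixel receives its own depth; the single global score here only shows how the calibrated reference planes index depth.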

An ordinary diffraction element diffracts a light beam into a plurality of diffracted beams whose intensities differ greatly, so the risk of injury to human eyes is also large. Even if the diffracted light is diffracted a second time, the uniformity of the resulting beams remains low. Therefore, projecting onto the measured object with beams diffracted by an ordinary diffraction element gives poor results. In this embodiment, a collimating beam splitting element is used instead. The collimating beam splitting element not only collimates the non-collimated light beam but also splits it: the non-collimated light reflected by the reflector exits the collimating beam splitting element as a plurality of collimated beams at different angles, and the cross-sectional areas and energy fluxes of the emitted collimated beams are approximately equal, so projection with the speckle light diffracted from these beams works better. At the same time, the laser output is distributed over the individual beams, which further reduces the risk of injury to human eyes, and compared with other uniformly arranged structured light, speckle structured light consumes less power to achieve the same acquisition effect.

Referring to fig. 9, in some embodiments, step S03 further includes:

S031: identifying a face region in the scene image;

S032: acquiring depth information corresponding to the face region from the depth image;

S033: determining the depth range of the person region according to the depth information of the face region; and

S034: determining, according to the depth range of the person region, the person region that is connected to the face region and falls within the depth range, to obtain the person region image.

In certain embodiments, step S031, step S032, step S033, and step S034 may all be implemented by the processor 20.

That is, the processor 20 may be further configured to identify a face region in the scene image, obtain depth information corresponding to the face region from the depth image, determine a depth range of the person region according to the depth information of the face region, and determine a person region connected to the face region and falling within the depth range according to the depth range of the person region to obtain a person region image.

Specifically, a trained deep learning model can be used to identify the face region in the scene image, and the depth information of the face region can then be determined from the correspondence between the scene image and the depth image. Because the face region includes features such as the nose, eyes, ears and lips, the depth data corresponding to each feature of the face region differ in the depth image. For example, when the face directly faces the depth image acquisition assembly 12, the depth data corresponding to the nose may be smaller and the depth data corresponding to the ears may be larger in the depth image captured by the depth image acquisition assembly 12. Therefore, the depth information of the face region may be a single value or a range of values. When the depth information of the face region is a single value, the value can be obtained by averaging the depth data of the face region, or alternatively by taking the median of the depth data of the face region.

Since the person region includes the face region, that is, the person region and the face region lie within a common depth range, after the processor 20 determines the depth information of the face region, it can set the depth range of the person region according to the depth information of the face region, and then extract the person region that falls within the depth range and is connected to the face region to obtain the person region image.

In this way, the person region image can be extracted from the scene image based on the depth information. Because the depth information is not affected by environmental factors such as illumination and color temperature, the extracted person region image is more accurate.
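Steps S031 to S034 can be sketched as a region-growing pass over the depth image. This is a minimal illustration under stated assumptions: the face region is supplied as a boolean mask (the face detector itself is not shown), the face depth is taken as the median of the face pixels, and the half-width `margin` of the depth range is a hypothetical parameter that the text does not specify.

```python
from collections import deque

import numpy as np


def extract_person_mask(depth, face_mask, margin=0.5):
    """Grow a person mask from the face: keep pixels that are 4-connected to
    the face region and whose depth lies within face_depth +/- margin."""
    face_depth = float(np.median(depth[face_mask]))
    in_range = (depth >= face_depth - margin) & (depth <= face_depth + margin)

    person = np.zeros(depth.shape, dtype=bool)
    seeds = [tuple(p) for p in np.argwhere(face_mask & in_range)]
    for r, c in seeds:
        person[r, c] = True

    height, width = depth.shape
    queue = deque(seeds)
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (
                0 <= nr < height
                and 0 <= nc < width
                and in_range[nr, nc]
                and not person[nr, nc]
            ):
                person[nr, nc] = True
                queue.append((nr, nc))
    return person
```

An object at the same depth as the user but not connected to the face is left out of the mask, which is the reason for requiring connectivity in addition to the depth range.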

In some embodiments, the image processing method further comprises the steps of:

processing the scene image to obtain a full-field edge image of the scene image; and

correcting the person region image according to the full-field edge image.

In some embodiments, the step of processing the scene image to obtain a full-field edge image of the scene image and the step of modifying the image of the person region based on the full-field edge image may be performed by the processor 20.

That is, the processor 20 may be further configured to process the scene image to obtain a full-field edge image of the scene image, and modify the person region image based on the full-field edge image.

The processor 20 first performs edge extraction on the scene image to obtain a full-field edge image, in which the edge lines include the edge lines of the current user and of the background objects in the scene where the current user is located. Specifically, edge extraction can be performed on the scene image with the Canny operator. The core of the Canny edge extraction algorithm mainly comprises the following steps: first, the scene image is convolved with a 2D Gaussian filtering template to eliminate noise; then, a differential operator is used to obtain the gradient value of the gray level of each pixel, the gradient direction of each pixel is calculated from the gradient values, and the pixels adjacent to each pixel along its gradient direction are found; finally, each pixel is traversed, and if the gradient magnitude of a pixel is not the maximum compared with those of the two adjacent pixels before and after it along the gradient direction, the pixel is not considered an edge point. In this way, the pixels at edge positions in the scene image can be determined, and the full-field edge image after edge extraction is obtained.
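The three steps listed above (Gaussian smoothing, gradient computation, suppression of non-maximum pixels along the gradient direction) can be sketched in NumPy as follows. This is a simplified illustration rather than a complete Canny detector: a single absolute threshold replaces hysteresis thresholding, and the Sobel kernel, kernel size, sigma and `threshold` value are assumptions chosen for the example. Input intensities are assumed to lie in [0, 1].

```python
import numpy as np


def gaussian_kernel(size=5, sigma=1.0):
    """Normalized 2D Gaussian filtering template."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()


def convolve2d(img, kernel):
    """Same-size 2D convolution with edge replication at the borders."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    flipped = kernel[::-1, ::-1]
    out = np.zeros(img.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += flipped[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out


def edge_map(gray, threshold=0.1):
    """Boolean edge image: smooth, take gradients, keep local maxima."""
    smooth = convolve2d(np.asarray(gray, dtype=float), gaussian_kernel())
    sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    gx = convolve2d(smooth, sobel_x)
    gy = convolve2d(smooth, sobel_x.T)
    mag = np.hypot(gx, gy)

    # Quantize the gradient direction to 0, 45, 90 or 135 degrees.
    angle = (np.rad2deg(np.arctan2(gy, gx)) + 180.0) % 180.0
    sector = (np.round(angle / 45.0).astype(int) % 4) * 45
    neighbours = {0: (0, 1), 45: (1, 1), 90: (1, 0), 135: (1, -1)}

    h, w = mag.shape
    padded = np.pad(mag, 1, mode="constant")
    keep = np.zeros(mag.shape, dtype=bool)
    for deg, (dr, dc) in neighbours.items():
        ahead = padded[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
        behind = padded[1 - dr:1 - dr + h, 1 - dc:1 - dc + w]
        keep |= (sector == deg) & (mag >= ahead) & (mag >= behind)
    return keep & (mag > threshold)
```

On an image whose left half is 0 and right half is 1, the returned edge pixels lie only in the columns adjacent to the step.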

After the processor 20 obtains the full-field edge image, it corrects the person region image according to the full-field edge image. It can be understood that the person region image is obtained by merging all pixels in the scene image that are connected to the face region and fall within the set depth range, and in some scenes there may be objects other than the person that are connected to the face region and fall within the depth range. Therefore, to make the extracted person region image more accurate, the person region image can be corrected using the full-field edge image.

Further, the processor 20 may perform a secondary correction on the corrected person region image, for example by applying dilation to the corrected person region image to enlarge it so that the edge details of the person region image are retained.
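The dilation mentioned above can be sketched on a boolean person mask as follows. This is an illustration only; the square structuring element and the `radius` parameter are assumptions, since the text does not specify how far the region is enlarged.

```python
import numpy as np


def dilate(mask, radius=1):
    """Binary dilation with a (2*radius+1) x (2*radius+1) square element."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    h, w = mask.shape
    out = np.zeros((h, w), dtype=bool)
    # A pixel is set if any pixel in its square neighbourhood is set.
    for dr in range(2 * radius + 1):
        for dc in range(2 * radius + 1):
            out |= padded[dr:dr + h, dc:dc + w]
    return out
```

Growing the mask by a pixel or two keeps thin boundary details such as hair inside the person region, at the cost of admitting a narrow band of background.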

Referring to fig. 10, in some embodiments, step S04 includes the steps of:

S041: acquiring a predetermined fusion region in the predetermined three-dimensional background image;

S042: determining a pixel region to be replaced in the predetermined fusion region according to the person region image; and

S043: replacing the pixel region to be replaced in the predetermined fusion region with the person region image to obtain the merged image.

In some embodiments, steps S041 to S043 may be implemented by the processor 20. In other words, the processor 20 is configured to acquire the predetermined fusion region in the predetermined three-dimensional background image, determine the pixel region to be replaced in the predetermined fusion region according to the person region image, and replace the pixel region to be replaced in the predetermined fusion region with the person region image to obtain the merged image.

It can be understood that when the predetermined three-dimensional background image is obtained by modeling an actual scene, the depth data corresponding to each pixel in the predetermined three-dimensional background image can be obtained directly during modeling; when the predetermined three-dimensional background image is obtained by animation production, the depth data corresponding to each pixel can be set by the producer. In addition, since each object in the predetermined three-dimensional background image is also known, the fusion position of the person region image, that is, the predetermined fusion region, can be calibrated from the depth data and the objects in the predetermined three-dimensional background image before the image is used for fusion. The size of the person region image acquired by the visible light camera 11 is affected by the acquisition distance: the person region image is larger when the acquisition distance is short and smaller when the acquisition distance is long. The processor 20 therefore needs to determine the pixel region to be replaced in the predetermined fusion region according to the size of the person region image actually acquired by the visible light camera 11. The pixel region to be replaced in the predetermined fusion region is then replaced with the person region image to obtain the merged image. In this way, the person region image is fused with the predetermined three-dimensional background image.
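
A minimal sketch of the size-dependent replacement described above is given below. The names `region_to_replace` and `fuse`, the `(top, left, height, width)` rectangle layout, and the placement rule (bottom-aligned and horizontally centred, with oversized person images simply cropped) are assumptions for illustration; the text does not state how the pixel region is positioned inside the fusion region.

```python
def region_to_replace(fusion_region, person_size):
    """Return (top, left, height, width) of the area the person image will cover.

    fusion_region: (top, left, height, width) in background coordinates.
    person_size: (height, width) of the person region image.
    """
    top, left, rh, rw = fusion_region
    ph, pw = person_size
    h, w = min(ph, rh), min(pw, rw)        # never exceed the fusion region
    return (top + rh - h, left + (rw - w) // 2, h, w)


def fuse(background, person, mask, fusion_region):
    """Copy person pixels (where mask is set) into a copy of the background."""
    top, left, h, w = region_to_replace(fusion_region, (len(person), len(person[0])))
    merged = [row[:] for row in background]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                merged[top + y][left + x] = person[y][x]
    return merged
```

Because the replaced area is derived from the person image size, a person captured from farther away occupies fewer background pixels than one captured close up.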

Referring to fig. 11, in some embodiments, step S04 includes the steps of:

S044: processing the predetermined three-dimensional background image to obtain a full-field edge image of the predetermined three-dimensional background image;

S045: acquiring depth data of the predetermined three-dimensional background image;

S046: determining a calculated fusion region of the predetermined three-dimensional background image according to the full-field edge image and the depth data of the predetermined three-dimensional background image;

S047: determining a pixel region to be replaced in the calculated fusion region according to the person region image; and

S048: replacing the pixel region to be replaced in the calculated fusion region with the person region image to obtain a merged image.

In some embodiments, steps S044 to S048 may be implemented by the processor 20, or in other words, the processor 20 is configured to process the predetermined three-dimensional background image to obtain a full-field edge image of the predetermined three-dimensional background image, acquire depth data of the predetermined three-dimensional background image, determine a calculation fusion region of the predetermined three-dimensional background image according to the full-field edge image and the depth data of the predetermined three-dimensional background image, determine a pixel region to be replaced of the calculation fusion region according to the person region image, and replace the pixel region to be replaced of the calculation fusion region with the person region image to obtain a merged image.

It is understood that, if the fusion position of the person region image is not calibrated in advance when the predetermined three-dimensional background image is fused with the person region image, the processor 20 first needs to determine the fusion position of the person region image in the predetermined three-dimensional background image. Specifically, the processor 20 first performs edge extraction on the predetermined three-dimensional background image to obtain a full-field edge image, and acquires depth data of the predetermined three-dimensional background image, where the depth data is obtained during the modeling or animation production of the predetermined three-dimensional background image. The processor 20 then determines a calculated fusion region in the predetermined three-dimensional background image according to the full-field edge image and the depth data of the predetermined three-dimensional background image. Since the size of the person region image is affected by the acquisition distance of the visible light camera 11, the size of the person region image needs to be calculated, and the pixel region to be replaced in the calculated fusion region is determined according to that size. Finally, the pixel region to be replaced in the calculated fusion region is replaced with the person region image to obtain a merged image. In this way, the person region image is fused with the predetermined three-dimensional background image.

The merged image may be displayed on a display screen of the electronic apparatus 1000 or printed by a printer connected to the electronic apparatus 1000.

In some embodiments, the person region image may be a two-dimensional person region image or a three-dimensional person region image. The processor 20 may extract a two-dimensional character region image from the scene image according to the depth information in the depth image, and the processor 20 may further create a three-dimensional image of the character region according to the depth information in the depth image, and perform color filling on the three-dimensional character region according to the color information in the scene image to obtain a three-dimensional color character region image.

In some embodiments, there may be one or more predetermined fusion regions or calculated fusion regions in the predetermined three-dimensional background image. When there is one predetermined fusion region, the fusion position of the two-dimensional or three-dimensional person region image in the predetermined three-dimensional background image is that single predetermined fusion region; when there is one calculated fusion region, the fusion position is that single calculated fusion region. When there are a plurality of predetermined fusion regions, the fusion position of the two-dimensional or three-dimensional person region image in the predetermined three-dimensional background image can be any one of them; further, because the three-dimensional person region image has depth information, a predetermined fusion region matching the depth information of the three-dimensional person region image can be searched for among the plurality of predetermined fusion regions and used as the fusion position, so as to obtain a better fusion effect. Likewise, when there are a plurality of calculated fusion regions, the fusion position of the two-dimensional or three-dimensional person region image in the predetermined three-dimensional background image can be any one of them; further, because the three-dimensional person region image has depth information, a calculated fusion region matching the depth information of the three-dimensional person region image can be searched for among the plurality of calculated fusion regions and used as the fusion position, so as to obtain a better fusion effect.
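
The selection among several fusion regions can be sketched as follows. The dictionary layout with `name` and `depth` keys, the function name `pick_fusion_region`, and the use of the smallest absolute depth difference as the matching criterion are assumptions for illustration; the text only states that a region matching the depth information is searched for.

```python
def pick_fusion_region(regions, person_depth=None):
    """Choose a fusion region for the person region image.

    regions: list of dicts such as {"name": ..., "depth": ...} (hypothetical layout).
    person_depth: representative depth of a three-dimensional person region image,
    or None for a two-dimensional image, which carries no depth to match against.
    """
    if not regions:
        raise ValueError("no fusion region available")
    if person_depth is None:
        return regions[0]                  # any region is acceptable
    # Pick the region whose depth is closest to that of the person image.
    return min(regions, key=lambda r: abs(r["depth"] - person_depth))
```

For a two-dimensional person region image the sketch simply returns the first region, reflecting that any of the regions may serve as the fusion position.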

In some embodiments, the position change information includes a change direction and a change angle of the person region image.

It is understood that, in some examples, since there is some lag in image processing and the images captured during the change are usually ignored for the sake of image stabilization, the change direction and the change angle only need to be determined from the start and the end of the change, and the change rate does not need to be acquired to synchronize the whole change process, which reduces the amount of calculation to some extent. As described above, once the origin and the positive direction of each axis are defined, the change direction and the change angle can be calibrated. The predetermined three-dimensional background image contains all the information in the whole virtual space, and after the direction and the angle are changed, the three-dimensional background image can present the content within the new field of view. For example, with the ceiling of the space taken as the positive z-axis direction, when the electronic device moves toward the zenith, the three-dimensional background image is adjusted in the positive z-axis direction and the corresponding content is displayed.
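
The idea of using only the start and the end of a change can be sketched with a small tracker. The class name `ChangeTracker`, the `(yaw, pitch)` orientation representation in degrees, and the `still_threshold` value are hypothetical; the text does not specify how the motion sensor output is represented, and angle wrap-around is not handled here.

```python
class ChangeTracker:
    """Reports the net orientation change once a movement has settled."""

    def __init__(self, still_threshold=0.5):
        self.still_threshold = still_threshold   # max per-sample change treated as "still"
        self.start = None                        # orientation when the current change began
        self.previous = None                     # last sample seen

    def update(self, orientation):
        """orientation: (yaw, pitch) in degrees.

        Returns (d_yaw, d_pitch) when a change has just ended, otherwise None.
        """
        if self.previous is None:
            self.previous = orientation
            return None
        step = max(abs(orientation[0] - self.previous[0]),
                   abs(orientation[1] - self.previous[1]))
        moving = step > self.still_threshold
        result = None
        if moving and self.start is None:
            self.start = self.previous           # the change begins here
        elif not moving and self.start is not None:
            # The change has ended: compare only its start and its end.
            result = (orientation[0] - self.start[0], orientation[1] - self.start[1])
            self.start = None
        self.previous = orientation
        return result
```

While `update` returns `None` during a movement, a caller could keep outputting the last stable merged image and re-fuse only when a net change is reported, which avoids tracking the rate of change.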

Referring to fig. 12, in some embodiments, step S20 includes:

S21: adjusting the predetermined three-dimensional background in the same direction and by the same angle as the change direction and the change angle, to match the predetermined three-dimensional background with the position change information; and

S22: fusing the changed person region image with the adjusted predetermined three-dimensional background to obtain an adjusted merged image.

In some embodiments, steps S21 and S22 may be implemented by processor 20, or processor 20 may be configured to adjust the predetermined three-dimensional background in the same direction and at the same angle as the changing direction and the changing angle to match the predetermined three-dimensional background with the position change information and to fuse the changed image of the person region with the adjusted predetermined three-dimensional background to obtain an adjusted combined image.

It can be understood that adjusting the predetermined three-dimensional background in the same direction and by the same angle as the change direction and the change angle, so that the predetermined three-dimensional background matches the position change information, simulates the new scene content that a camera would capture in a real situation when its angle of view changes after its position changes, which makes the merged image more realistic. The adjusted background image and the changed person region image are then fused again to obtain and output an adjusted merged image.
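
One way to picture the adjustment is as a virtual camera inside the three-dimensional background whose viewing direction is turned by the same amount as the device. The sketch below assumes a `(yaw, pitch)` representation in degrees with the z axis pointing up; the function names `adjust_view` and `view_direction` are hypothetical, and rendering the background for the new view is outside the scope of the example.

```python
import math


def adjust_view(view, d_yaw, d_pitch):
    """Turn the virtual camera by the same change as the device.

    view: current (yaw, pitch) of the virtual camera in degrees.
    Returns the new (yaw, pitch); pitch is clamped to [-90, 90].
    """
    yaw = (view[0] + d_yaw) % 360.0
    pitch = max(-90.0, min(90.0, view[1] + d_pitch))
    return (yaw, pitch)


def view_direction(yaw, pitch):
    """Unit viewing vector for the given angles, with z pointing up."""
    y, p = math.radians(yaw), math.radians(pitch)
    return (math.cos(p) * math.cos(y), math.cos(p) * math.sin(y), math.sin(p))
```

With this convention, tilting the device toward the ceiling increases the pitch, so the viewing vector gains a positive z component and the upper part of the virtual space comes into view.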

In some application scenarios, for example, when the current user wants to hide the current background during a video call with another person, the image processing method of the embodiment of the present invention may be used to fuse the person region image corresponding to the current user with a predetermined three-dimensional background, and then display the merged image to the other party. Since the current user is in a video call with the other party, the visible light camera 11 needs to capture the scene image of the current user in real time, the depth image acquisition assembly 12 also needs to acquire the depth image corresponding to the current user in real time, and the processor 20 needs to process the scene image and the depth image promptly so that the other party can see a smooth video picture composed of multiple frames of merged images.

Referring to fig. 13, an electronic device 1000 is further provided in the present embodiment. The electronic device 1000 includes the image processing device 100. The image processing apparatus 100 may be implemented using hardware and/or software. The image processing apparatus 100 includes an imaging device 10 and a processor 20.

The imaging device 10 includes a visible light camera 11 and a depth image acquisition assembly 12.

Specifically, the visible light camera 11 includes an image sensor 111 and a lens 112, and the visible light camera 11 can be used to capture color information of a current user to obtain an image of a scene, wherein the image sensor 111 includes a color filter array (e.g., a Bayer filter array), and the number of the lens 112 can be one or more. In the process of acquiring a scene image by the visible light camera 11, each imaging pixel in the image sensor 111 senses light intensity and wavelength information from a shooting scene to generate a group of original image data; the image sensor 111 sends the group of raw image data to the processor 20, and the processor 20 performs operations such as denoising and interpolation on the raw image data to obtain a colorful scene image. Processor 20 may process each image pixel in the raw image data one-by-one in a variety of formats, for example, each image pixel may have a bit depth of 8, 10, 12, or 14 bits, and processor 20 may process each image pixel at the same or different bit depth.

The depth image acquisition assembly 12 includes a structured light projector 121 and a structured light camera 122, and the depth image acquisition assembly 12 is operable to capture depth information of a current user to obtain a depth image. The structured light projector 121 is used to project structured light to the current user, wherein the structured light pattern may be a laser stripe, a gray code, a sinusoidal stripe, or a randomly arranged speckle pattern, etc. The structured light camera 122 includes an image sensor 1221 and lenses 1222, and the number of the lenses 1222 may be one or more. The image sensor 1221 is used to capture a structured light image projected onto a current user by the structured light projector 121. The structured light image may be sent by the depth acquisition component 12 to the processor 20 for demodulation, phase recovery, phase information calculation, and the like to obtain the depth information of the current user.
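
As a rough illustration of phase demodulation and phase-to-depth conversion, the sketch below assumes four sinusoidal fringe images shifted by 0, π/2, π and 3π/2, which is one common scheme and is not confirmed by this passage. The function names, the `reference_phase` map and the linear `scale` factor are hypothetical; a real system would use calibrated parameters and would also need phase unwrapping, which is omitted here.

```python
import math


def demodulate_phase(frames):
    """Wrapped phase per pixel from four fringe images shifted by 0, pi/2, pi, 3*pi/2.

    With I_k = A + B*cos(phi + k*pi/2): I3 - I1 = 2B*sin(phi), I0 - I2 = 2B*cos(phi).
    """
    i0, i1, i2, i3 = frames
    h, w = len(i0), len(i0[0])
    return [[math.atan2(i3[y][x] - i1[y][x], i0[y][x] - i2[y][x])
             for x in range(w)] for y in range(h)]


def phase_to_depth(phase, reference_phase, scale):
    """Convert the phase offset from a reference plane to depth (assumed linear model)."""
    return [[scale * (p - r) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(phase, reference_phase)]
```

The resulting two-dimensional list of depth values plays the role of the depth image in this sketch.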

In some embodiments, the functions of the visible light camera 11 and the structured light camera 122 can be implemented by one camera, that is, the imaging device 10 includes only one camera and one structured light projector 121, and the camera can capture not only the scene image but also the structured light image.

In addition to acquiring a depth image by using structured light, a depth image of a current user can be acquired by a binocular vision method, a Time of Flight (TOF) based depth image acquisition method, and the like.
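
For reference, the two alternative approaches rest on standard relations, sketched below under simplifying assumptions: a rectified stereo pair for binocular vision, and a directly measured round-trip time for TOF. The function and parameter names are illustrative only.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def stereo_depth(focal_px, baseline_m, disparity_px):
    """Depth from binocular disparity: Z = f * B / d (rectified cameras assumed)."""
    if disparity_px <= 0:
        return float("inf")                # zero disparity corresponds to a point at infinity
    return focal_px * baseline_m / disparity_px


def tof_depth(round_trip_s):
    """Depth from time of flight: the light travels to the object and back."""
    return SPEED_OF_LIGHT * round_trip_s / 2.0
```

Applying either relation per pixel yields a depth image that could be used in place of the structured-light result.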

The processor 20 is further configured to fuse the image of the person region extracted from the image of the scene and the depth image with a predetermined two-dimensional background image. When extracting the person region image, the processor 20 may extract a two-dimensional person region image from the scene image in combination with the depth information in the depth image, or may create a three-dimensional map of the person region according to the depth information in the depth image, and color-fill the three-dimensional person region in combination with the color information in the scene image to obtain a three-dimensional color person region image. Therefore, when the human figure region image and the predetermined two-dimensional background image are subjected to the fusion processing, the two-dimensional human figure region image and the predetermined two-dimensional background image may be fused to obtain a combined image, or a three-dimensional color human figure region image and the predetermined two-dimensional background image may be fused to obtain a combined image.

Further, the image processing apparatus 100 includes an image memory 30. The image Memory 30 may be embedded in the electronic device 1000, or may be a Memory independent from the electronic device 1000, and may include a Direct Memory Access (DMA) feature. The raw image data collected by the visible light camera 11 or the structured light image related data collected by the depth image collecting assembly 12 can be transmitted to the image memory 30 for storage or buffering. Processor 20 may read raw image data from image memory 30 for processing to obtain an image of a scene and may read structured light image-related data from image memory 30 for processing to obtain a depth image. In addition, the scene image and the depth image may be stored in the image memory 30 for the processor 20 to call up processing at any time, for example, the processor 20 calls up the scene image and the depth image to perform person region extraction, and performs fusion processing on the obtained person region image after the calling up and the depth image and a predetermined two-dimensional background image to obtain a merged image. Wherein the predetermined two-dimensional background image and the merged image may also be stored in the image memory 30.

The image processing apparatus 100 may also include a display 50. The display 50 may retrieve the merged image directly from the processor 20 and may also retrieve the merged image from the image memory 30. The display 50 displays the merged image for viewing by a user or for further Processing by a Graphics Processing Unit (GPU). The image processing apparatus 100 further includes an encoder/decoder 60, and the encoder/decoder 60 may encode and decode image data of a scene image, a depth image, a merged image, and the like, and the encoded image data may be stored in the image memory 30 and may be decompressed by the decoder for display before the image is displayed on the display 50. The encoder/decoder 60 may be implemented by a Central Processing Unit (CPU), a GPU, or a coprocessor. In other words, the encoder/decoder 60 may be any one or more of a Central Processing Unit (CPU), a GPU, and a coprocessor.

The image processing apparatus 100 further comprises a control logic 40. When imaging device 10 is imaging, processor 20 may perform an analysis based on data acquired by the imaging device to determine image statistics for one or more control parameters (e.g., exposure time, etc.) of imaging device 10. Processor 20 sends the image statistics to control logic 40 and control logic 40 controls imaging device 10 to determine the control parameters for imaging. Control logic 40 may include a processor and/or microcontroller that executes one or more routines (e.g., firmware). One or more routines may determine control parameters of imaging device 10 based on the received image statistics.
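
The feedback loop between image statistics and control parameters can be illustrated with a simple exposure adjustment. The use of mean luminance as the statistic, the target value of 118, the exposure limits, and the function names are all assumptions made for this sketch; the text does not say which statistics or routines the control logic 40 uses.

```python
def mean_luminance(gray):
    """Image statistic: average gray level of a 2D list of pixel values."""
    total = sum(sum(row) for row in gray)
    return total / (len(gray) * len(gray[0]))


def next_exposure(current_ms, mean_luma, target_luma=118.0, min_ms=0.1, max_ms=33.0):
    """Scale the exposure time so the mean luminance moves toward the target."""
    if mean_luma <= 0:
        return max_ms                      # completely dark frame: use the longest exposure
    proposed = current_ms * target_luma / mean_luma
    return max(min_ms, min(max_ms, proposed))
```

In this sketch the first function stands in for the statistics that the processor 20 sends, and the second for a routine run by the control logic 40.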

Referring to fig. 14, an electronic device 1000 according to an embodiment of the invention includes one or more processors 200, a memory 300, and one or more programs 310. Where one or more programs 310 are stored in memory 300 and configured to be executed by one or more processors 200. The program 310 includes instructions for performing the image processing method of any of the above embodiments.

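For example, program 310 includes instructions for performing an image processing method comprising the following steps:

detecting position change information of the person region image according to the output of a motion sensor of the electronic device; and

adjusting the predetermined three-dimensional background image according to the position change information to match the predetermined three-dimensional background with the position change information.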

As another example, program 310 includes instructions for an image processing method that performs the steps of:

acquiring a scene image of a current user;

acquiring a depth image of a current user;

processing a scene image and the depth image to extract a person region of a current user in the scene image to obtain a person region image;

and fusing the human figure region image with a preset three-dimensional background image to obtain the combined image.

For another example, program 310 may further include instructions for performing the image processing method described in the following steps:

demodulating phase information corresponding to each pixel in the structured light image;

converting the phase information into depth information; and

and generating a depth image according to the depth information.

The computer-readable storage medium of an embodiment of the present invention includes a computer program for use in conjunction with an electronic device 1000 capable of capturing images. The computer program may be executed by the processor 200 to perform the image processing method of any of the above embodiments.

For example, the computer program may be executed by the processor 200 to perform the image processing method described in the following steps:

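detecting position change information of the person region image according to the output of a motion sensor of the electronic device; and

adjusting the predetermined three-dimensional background image according to the position change information to match the predetermined three-dimensional background with the position change information.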

As another example, the computer program may be executable by the processor 200 to perform an image processing method as described in the following steps:

acquiring a scene image of a current user;

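acquiring a depth image of a current user;

processing the scene image and the depth image to extract a person region of the current user in the scene image to obtain a person region image; and

fusing the person region image with a predetermined three-dimensional background image to obtain the merged image.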

As another example, the computer program may also be executable by the processor 200 to perform an image processing method as described in the following steps:

demodulating phase information corresponding to each pixel in the structured light image;

converting the phase information into depth information; and

and generating a depth image according to the depth information.

In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined by one skilled in the art without contradiction.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.

The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.

In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims

1. An image processing method for an electronic device to process a merged image, wherein the merged image is formed by fusing a predetermined three-dimensional background image and a human figure region image in a scene image of a current user in a real scene, the image processing method comprising:

acquiring a scene image of a current user;

acquiring a depth image of the current user;

processing the scene image and the depth image to extract a person region of the current user in the scene image to obtain a person region image;

according to the depth data corresponding to each pixel in the preset three-dimensional background image and an object existing in the preset three-dimensional background image, a preset fusion area in the preset three-dimensional background image is calibrated;

determining a pixel area to be replaced of the preset fusion area according to the size of the human figure area image;

replacing the pixel area to be replaced of the preset fusion area with the figure area image to obtain the combined image;

when the person region image is detected to start to change, continuing to output the stable merged image of the last frame before the change;

when the person region image is stable again, detecting position change information of the person region image according to the output of a motion sensor of the electronic device; and

adjusting the predetermined three-dimensional background image according to the position change information to match the predetermined three-dimensional background with the position change information.

2. The image processing method according to claim 1, wherein the step of obtaining the depth image of the current user comprises:

projecting structured light towards the current user;

shooting a structured light image modulated by the current user; and

and demodulating phase information corresponding to each pixel of the structured light image to obtain the depth image.

3. The method according to claim 2, wherein the step of demodulating phase information corresponding to each pixel of the structured-light image to obtain the depth image comprises:

converting the phase information into depth information; and

and generating the depth image according to the depth information.

4. The image processing method according to claim 1, wherein the step of processing the scene image and the depth image to extract a human figure region of the current user in the scene image to obtain a human figure region image comprises:

identifying a face region in the scene image;

acquiring depth information corresponding to the face area from the depth image;

determining the depth range of the character region according to the depth information of the face region; and

and determining a person region which is connected with the face region and falls into the depth range according to the depth range of the person region to obtain the person region image.

5. The image processing method according to claim 4, characterized in that the image processing method further comprises the steps of:

6. The image processing method according to claim 1, characterized in that the image processing method comprises:

processing the predetermined three-dimensional background image to obtain a full-field edge image of the predetermined three-dimensional background image;

acquiring depth data of the preset three-dimensional background image;

determining a calculation fusion area of the preset three-dimensional background image according to the full-field edge image of the preset three-dimensional background image and the depth data;

determining a pixel area to be replaced of the calculation fusion area according to the figure area image; and

and replacing the pixel area to be replaced of the calculated fusion area with the human figure area image to obtain the combined image.

7. The image processing method according to claim 1, wherein the position change information includes a change direction and a change angle of the person region image.

8. The image processing method according to claim 7, wherein the step of adjusting the predetermined three-dimensional background image according to the position change information to match the predetermined three-dimensional background with the position change information comprises:

adjusting the predetermined three-dimensional background in the same direction and angle as the change direction and the change angle to match the predetermined three-dimensional background with the position change information;

and fusing the changed human figure region image with the adjusted preset three-dimensional background to obtain an adjusted combined image.

9. An image processing apparatus for an electronic apparatus to process a merged image in which a predetermined three-dimensional background image is merged with a person region image in a scene image of a current user in a real scene, the image processing apparatus comprising:

the visible light camera is used for acquiring a scene image of a current user; and

the depth image acquisition component is used for acquiring a depth image of the current user;

a processor to:

acquiring a scene image of a current user;

acquiring a depth image of the current user;

10. The image processing apparatus of claim 9, wherein the depth image acquisition assembly comprises a structured light projector and a structured light camera, the structured light projector for projecting structured light to the current user;

the structured light camera is configured to:

shooting a structured light image modulated by the current user; and

11. The image processing apparatus of claim 10, wherein the processor is further configured to:

converting the phase information into depth information; and

and generating the depth image according to the depth information.

12. The image processing apparatus of claim 9, wherein the processor is further configured to:

identifying a face region in the scene image;

13. The image processing apparatus of claim 12, wherein the processor is further configured to:

and correcting the character area image according to the full-field edge image of the scene image.

14. The image processing apparatus of claim 9, wherein the processor is further configured to:

acquiring depth data of the preset three-dimensional background image;

15. The apparatus according to claim 9, wherein the position change information includes a change direction and a change angle of the person region image.

16. The image processing apparatus of claim 15, wherein the processor is configured to:

17. An electronic device, comprising:

one or more processors;

a memory; and

one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the image processing method of any of claims 1 to 8.

18. A computer-readable storage medium comprising a computer program for use in conjunction with an electronic device capable of capturing images, the computer program being executable by a processor to perform the image processing method of any one of claims 1 to 8.