CN107622495A - Image processing method and device, electronic device, and computer-readable storage medium - Google Patents
Image processing method and device, electronic device, and computer-readable storage medium
- Publication number
- CN107622495A (application CN201710811779.9A)
- Authority
- CN
- China
- Prior art keywords
- image
- multiframe
- dimensional
- frame
- predetermined
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Processing Or Creating Images (AREA)
Abstract
The invention discloses an image processing method for an electronic device. The image processing method includes: collecting, at a preset frequency, multiple frames of three-dimensional scene images and depth images of a current user; processing the multiple frames of scene images and depth images to segment, in each frame of the scene image, the person region from the background region beyond the person region so as to obtain multiple frames of background-region images, the multiple frames of background-region images corresponding to multiple frames of predetermined three-dimensional images; and fusing each frame of the predetermined three-dimensional image with the corresponding background-region image to obtain multiple merged frames and output a video image. The invention also discloses an image processing apparatus, an electronic device, and a computer-readable storage medium. In the image processing method and apparatus, electronic device, and computer-readable storage medium of the embodiments of the present invention, the predetermined three-dimensional image replaces the person region in the three-dimensional scene image to obtain multiple merged frames, and the merged frames are assembled into an output video image, which adds interest to image fusion.
Description
Technical field
The present invention relates to the technical field of image processing, and more particularly to an image processing method and apparatus, an electronic device, and a computer-readable storage medium.
Background technology
Existing image fusion typically merges the user's portrait with a background image, but this manner of fusion is of relatively low interest.
Summary of the invention
Embodiments of the present invention provide an image processing method, an image processing apparatus, an electronic device, and a computer-readable storage medium.
The image processing method of the embodiments of the present invention is used in an electronic device and includes:

collecting, at a preset frequency, multiple frames of three-dimensional scene images and depth images of a current user;

processing the multiple frames of scene images and depth images to segment, in each frame of the scene image, the person region from the background region beyond the person region so as to obtain multiple frames of background-region images, the multiple frames of background-region images corresponding to multiple frames of predetermined three-dimensional images; and

fusing each frame of the predetermined three-dimensional image with the corresponding background-region image to obtain multiple merged frames and output a video image.
The image processing apparatus of the embodiments of the present invention is used in an electronic device and includes an imaging device and a processor. The imaging device collects, at a preset frequency, multiple frames of three-dimensional scene images and depth images of a current user. The processor processes the multiple frames of scene images and depth images to segment, in each frame of the scene image, the person region from the background region beyond the person region so as to obtain multiple frames of background-region images, the multiple frames of background-region images corresponding to multiple frames of predetermined three-dimensional images, and fuses each frame of the predetermined three-dimensional image with the corresponding background-region image to obtain multiple merged frames and output a video image.
The electronic device of the embodiments of the present invention includes one or more processors, a memory, and one or more programs. The one or more programs are stored in the memory and configured to be executed by the one or more processors, and the programs include instructions for performing the image processing method described above.
The computer-readable storage medium of the embodiments of the present invention includes a computer program used in combination with an electronic device capable of imaging, the computer program being executable by a processor to carry out the image processing method described above.
In the image processing method, image processing apparatus, electronic device, and computer-readable storage medium of the embodiments of the present invention, after the three-dimensional scene images and depth images are obtained, the person and background in each frame of the scene image are segmented using the depth information, so that the segmented three-dimensional person region and three-dimensional background region are more accurate. The three-dimensional background-region image segmented from each frame is then fused with the corresponding predetermined three-dimensional image, i.e., the predetermined three-dimensional image replaces the person region of the current user in the scene image, so as to obtain multiple frames of three-dimensional merged images, and the merged frames can further be assembled into an output video image. In this way, the interest of image fusion is increased and the user experience is improved.

Additional aspects and advantages of the present invention will be set forth in part in the description that follows, will in part become apparent from the description, or may be learned by practice of the present invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic flowchart of an image processing method of some embodiments of the present invention.
Fig. 2 is a schematic structural diagram of an electronic device of some embodiments of the present invention.
Fig. 3 is a schematic diagram of an image processing apparatus of some embodiments of the present invention.
Fig. 4 is a schematic flowchart of an image processing method of some embodiments of the present invention.
Fig. 5 is a schematic flowchart of an image processing method of some embodiments of the present invention.
Fig. 6(a) to Fig. 6(e) are schematic scene diagrams of structured-light measurement according to an embodiment of the present invention.
Fig. 7(a) and Fig. 7(b) are schematic scene diagrams of structured-light measurement according to an embodiment of the present invention.
Fig. 8 is a schematic flowchart of an image processing method of some embodiments of the present invention.
Fig. 9 is a schematic flowchart of an image processing method of some embodiments of the present invention.
Fig. 10 is a schematic flowchart of an image processing method of some embodiments of the present invention.
Fig. 11 is a schematic flowchart of an image processing method of some embodiments of the present invention.
Fig. 12 is a schematic flowchart of an image processing method of some embodiments of the present invention.
Fig. 13 is a schematic diagram of an image processing apparatus of some embodiments of the present invention.
Fig. 14 is a schematic diagram of an electronic device of some embodiments of the present invention.
Embodiment
Embodiments of the present invention are described in detail below, examples of which are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended to explain the present invention, and are not to be construed as limiting the present invention.
Referring to Figs. 1 and 2, the image processing method of the embodiments of the present invention is used in an electronic device 1000. The image processing method includes:

03: collecting, at a preset frequency, multiple frames of three-dimensional scene images and depth images of a current user;

05: processing the multiple frames of scene images and depth images to segment, in each frame of the scene image, the person region from the background region beyond the person region so as to obtain multiple frames of background-region images, the multiple frames of background-region images corresponding to multiple frames of predetermined three-dimensional images; and

07: fusing each frame of the predetermined three-dimensional image with the corresponding background-region image to obtain multiple merged frames and output a video image.
Referring to Fig. 3, the image processing method of the embodiments of the present invention can be implemented by the image processing apparatus 100 of the embodiments of the present invention. The image processing apparatus 100 is used in the electronic device 1000 and includes an imaging device 10 and a processor 20. Step 03 can be implemented by the imaging device 10, and steps 05 and 07 can be implemented by the processor 20.

In other words, the imaging device 10 can be used to collect, at a preset frequency, multiple frames of three-dimensional scene images and depth images of the current user; the processor 20 can be used to process the multiple frames of scene images and depth images to segment, in each frame of the scene image, the person region from the background region beyond the person region so as to obtain multiple frames of background-region images, the multiple frames of background-region images corresponding to multiple frames of predetermined three-dimensional images, and to fuse each frame of the predetermined three-dimensional image with the corresponding background-region image to obtain multiple merged frames and output a video image.
Here, the preset frequency refers to the frame rate at which the imaging device 10 collects images each second; the frame rate may be 30 frames per second, 60 frames per second, 120 frames per second, and so on. The higher the frame rate, the smoother the video image.
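The relation between the preset frequency and the per-frame time budget can be sketched as follows; this is a trivial illustration only, and `frame_interval_ms` is a hypothetical helper rather than part of any described apparatus:

```python
def frame_interval_ms(fps: float) -> float:
    """Capture interval in milliseconds implied by a given frame rate."""
    return 1000.0 / fps

# A 30 fps stream leaves ~33.3 ms per frame, 60 fps ~16.7 ms,
# and 120 fps ~8.3 ms for capture plus processing.
intervals = {fps: round(frame_interval_ms(fps), 1) for fps in (30, 60, 120)}
```

At higher frame rates the segmentation and fusion of each frame must fit into a smaller time budget, which is the practical trade-off behind the smoother video.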
The background-region image is obtained from the three-dimensional scene image after the person region and the background region are segmented; therefore, the background-region image is also a three-dimensional image.
In some embodiments, the predetermined three-dimensional image includes at least one of a three-dimensional virtual character, a three-dimensional real character, and three-dimensional animals or plants, where the three-dimensional real character excludes the current user. The three-dimensional virtual character may be a three-dimensional animated character such as Mario, Conan, Big Head Son, RNB, and so on; the three-dimensional real character may be a well-known figure in a three-dimensional image, such as Audrey Hepburn, Mr. Bean, or Harry Potter; the three-dimensional animals or plants may be three-dimensional animated animals or plants such as Mickey Mouse, Donald Duck, or a pea shooter.
The image processing apparatus 100 of the embodiments of the present invention can be applied to the electronic device 1000 of the embodiments of the present invention. In other words, the electronic device 1000 of the embodiments of the present invention includes the image processing apparatus 100 of the embodiments of the present invention.

In some embodiments, the electronic device 1000 includes a mobile phone, a tablet computer, a notebook computer, a smart bracelet, a smart watch, a smart helmet, smart glasses, and the like.
In the image processing method, image processing apparatus 100, and electronic device 1000 of the embodiments of the present invention, after the three-dimensional scene images and depth images are obtained, the person and background in each frame of the scene image are segmented using the depth information, so that the segmented three-dimensional person region and three-dimensional background region are more accurate. The three-dimensional background-region image segmented from each frame is fused with the corresponding predetermined three-dimensional image, i.e., the predetermined three-dimensional image replaces the person region of the current user in the scene image, yielding multiple frames of three-dimensional merged images of the predetermined three-dimensional image fused with the three-dimensional background-region image, and the merged frames can further be assembled into an output video image. In this way, the interest of image fusion is increased and the user experience is improved. Furthermore, since the merged image does not contain the actual portrait of the user, the privacy of the user can be protected to a certain extent.
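The per-frame replacement described above, with the predetermined image shown where the person was and the background-region image kept elsewhere, can be sketched with a boolean mask. This is a minimal illustration with hypothetical names, not the apparatus's actual fusion routine:

```python
import numpy as np

def merge_frame(background, person_mask, predetermined):
    """Composite one frame: show the predetermined image where the person
    was, keep the background-region image elsewhere. All image arrays
    share shape (H, W, 3); person_mask is an (H, W) boolean array."""
    merged = background.copy()
    merged[person_mask] = predetermined[person_mask]
    return merged

def make_video(backgrounds, masks, predetermined_frames):
    """Fuse every frame and return the sequence (the output video)."""
    return [merge_frame(b, m, p)
            for b, m, p in zip(backgrounds, masks, predetermined_frames)]
```

In the described method the data would be three-dimensional scene content rather than flat colour frames, but the masked replacement is the same idea.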
Referring to Fig. 4, in some embodiments, step 03 of collecting, at a preset frequency, multiple frames of three-dimensional scene images and depth images of the current user includes:

031: shooting the current user at the preset frequency to obtain multiple frames of two-dimensional images;

032: projecting structured light onto the current user;

033: shooting, at the preset frequency, multiple frames of structured-light images modulated by the current user;

034: demodulating the phase information corresponding to each pixel of each frame of the structured-light image to obtain multiple frames of depth images; and

035: processing the multiple frames of two-dimensional images and depth images to obtain multiple frames of three-dimensional scene images.
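Assuming the hardware operations are available as callables, steps 031 to 035 can be sketched as one capture loop; every name here is a hypothetical stand-in, not an API of the described imaging device:

```python
def capture_3d_frames(n_frames, grab_2d, project_pattern, grab_modulated,
                      demodulate, fuse):
    """Sketch of steps 031-035 with the hardware operations injected as
    callables (all names are assumptions for illustration)."""
    project_pattern()                    # 032: project structured light
    frames_2d, depth_maps = [], []
    for _ in range(n_frames):
        frames_2d.append(grab_2d())      # 031: two-dimensional image
        modulated = grab_modulated()     # 033: modulated structured-light image
        depth_maps.append(demodulate(modulated))  # 034: depth image
    # 035: fuse each 2D frame with its depth map into a 3D scene image
    return [fuse(img, d) for img, d in zip(frames_2d, depth_maps)]
```

Keeping the 2D and depth captures in the same loop reflects the requirement, noted below, that both streams use the same preset frequency so that frames correspond one-to-one.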
Referring again to Fig. 3, in some embodiments, the imaging device 10 of the image processing apparatus 100 includes a visible-light camera 11 and a depth-image collection assembly 12, and the depth-image collection assembly 12 includes a structured-light projector 121 and a structured-light camera 122. Step 031 can be implemented by the visible-light camera 11, step 032 can be implemented by the structured-light projector 121, and steps 033, 034, and 035 can be implemented by the structured-light camera 122.

In other words, the visible-light camera 11 can be used to shoot the current user at the preset frequency to obtain multiple frames of two-dimensional images; the structured-light projector 121 can be used to project structured light onto the current user; and the structured-light camera 122 can be used to shoot, at the preset frequency, multiple frames of structured-light images modulated by the current user, demodulate the phase information corresponding to each pixel of each frame of the structured-light image to obtain multiple frames of depth images, and process the multiple frames of two-dimensional images and depth images to obtain multiple frames of three-dimensional scene images.
Specifically, the visible-light camera 11 shoots two-dimensional images of the current user, which may be grayscale or colour images. After the structured-light projector 121 projects structured light of a certain pattern onto the face and body of the current user, the surfaces of the face and body form structured-light images modulated by the current user. The structured-light camera 122 shoots, at the preset frame rate, multiple frames of the modulated structured-light images, and each frame of the structured-light image is demodulated to obtain the depth image corresponding to that frame; in this way, multiple frames of depth images are obtained after the multiple frames of structured-light images are demodulated. The pattern of the structured light may be laser stripes, Gray codes, sinusoidal fringes, non-uniform speckle, and so on. The depth image characterizes the depth information of each person or object in the scene containing the current user. The scene range of a two-dimensional image is consistent with that of its depth image, and each pixel in the two-dimensional image can find, in the depth image, the depth information corresponding to that pixel. In this way, the processor 20 can build a three-dimensional model of the scene shot by the structured-light camera 122 from the depth information collected in the depth image, and then fill in colour on the three-dimensional model with the colour information of the two-dimensional image to obtain the three-dimensional coloured scene image.
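The attachment of colour to depth data described above can be loosely illustrated as a pinhole back-projection that maps each pixel to a 3D point and carries its colour along. The intrinsics `fx, fy, cx, cy` and the function itself are assumptions for illustration, not the patent's actual modelling procedure:

```python
import numpy as np

def depth_to_colored_points(depth, color, fx, fy, cx, cy):
    """Back-project a depth map through a pinhole camera model and attach
    each pixel's colour, yielding a coloured 3D scene as an (N, 6) array
    of x, y, z, r, g, b. Intrinsics would come from camera calibration."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx           # horizontal offset scaled by depth
    y = (v - cy) * z / fy           # vertical offset scaled by depth
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    cols = color.reshape(-1, 3)
    return np.hstack([pts, cols])
```

A real system would additionally mesh or voxelize the points; the sketch only shows how depth plus colour yields coloured 3D scene data.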
It should be noted that, in a specific embodiment of the present invention, the visible-light camera 11 and the depth-image collection assembly 12 should collect the two-dimensional images and the depth images at the same preset frequency, so that the multiple frames of three-dimensional scene images correspond one-to-one with the multiple frames of depth images, which facilitates the fusion of the predetermined three-dimensional image and the background-region image in step 07.
Referring to Fig. 5, in some embodiments, step 034 of demodulating the phase information corresponding to each pixel of each frame of the structured-light image to obtain multiple frames of depth images includes:

0341: demodulating the phase information corresponding to each pixel in each frame of the structured-light image;

0342: converting the phase information into depth information; and

0343: generating the depth image according to the depth information.

Referring again to Fig. 2, in some embodiments, steps 0341, 0342, and 0343 can be implemented by the structured-light camera 122.

In other words, the structured-light camera 122 can further be used to demodulate the phase information corresponding to each pixel in each frame of the structured-light image, convert the phase information into depth information, and generate the depth image according to the depth information.
Specifically, compared with unmodulated structured light, the phase information of the modulated structured light is changed, and the structured light shown in the structured-light image is distorted; the changed phase information can characterize the depth information of the object. Therefore, the structured-light camera 122 first demodulates the phase information corresponding to each pixel in each frame of the structured-light image and then calculates the depth information from the phase information, thereby obtaining the depth image corresponding to that frame of the structured-light image.
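In the simplest possible model, steps 0342 and 0343 reduce to a per-pixel mapping from phase to depth; the linear factor `k` below is an assumed stand-in for the calibrated conversion formula mentioned later, not the actual relation:

```python
import numpy as np

def phase_to_depth(phase_diff, k):
    """Simplified linear conversion depth = k * phase_diff; in a real
    system the conversion formula and its parameters come from
    calibration (k here is an assumed constant)."""
    return k * phase_diff

def depth_image_from_phase(phase_map, k):
    """Steps 0342-0343: map every pixel's phase information to depth and
    assemble the result as the depth image."""
    return phase_to_depth(np.asarray(phase_map, dtype=np.float64), k)
```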
To make the process of collecting the depth images of the face and body of the current user by structured light clearer to those skilled in the art, its concrete principle is illustrated below taking the widely used grating projection technique (fringe projection technique) as an example. The grating projection technique belongs to surface structured light in a broad sense.
As shown in Fig. 6(a), when surface structured light is used for projection, sinusoidal fringes are first generated by computer programming and projected onto the measured object through the structured-light projector 121; the structured-light camera 122 then shoots the degree of bending of the fringes after modulation by the object, demodulates the bent fringes to obtain the phase, and converts the phase into depth information to obtain the depth image. To avoid errors or error coupling, the depth-image collection assembly 12 must be calibrated with respect to its parameters before depth information is collected with structured light; the calibration includes calibration of geometric parameters (for example, the relative position between the structured-light camera 122 and the structured-light projector 121), calibration of the internal parameters of the structured-light camera 122, calibration of the internal parameters of the structured-light projector 121, and so on.
Specifically, in the first step, sinusoidal fringes are generated by computer programming. Since the phase must subsequently be obtained from the distorted fringes, for example by the four-step phase-shifting method, four fringe patterns with phase differences of π/2 are generated here, and the structured-light projector 121 projects the four patterns onto the measured object (the mask shown in Fig. 6(a)) in a time-shared manner; the structured-light camera 122 collects the image on the left of Fig. 6(b) and reads the fringes of the reference plane shown on the right of Fig. 6(b).
In the second step, phase recovery is carried out. The structured-light camera 122 calculates the modulated phase map from the four collected modulated fringe patterns (i.e., the structured-light images); the phase obtained at this point is the wrapped phase. Since the result of the four-step phase-shifting algorithm is computed with the arctangent function, the phase of the modulated structured light is limited to [-π, π]; that is, whenever the modulated phase exceeds [-π, π] it wraps around again. The resulting principal phase value is shown in Fig. 6(c).

During phase recovery, a de-jumping (unwrapping) step is required to restore the wrapped phase to a continuous phase. As shown in Fig. 6(d), the left side is the modulated continuous phase map and the right side is the reference continuous phase map.
In the third step, the phase difference (i.e., the phase information), which characterizes the depth information of the measured object relative to the reference plane, is obtained by subtracting the reference continuous phase from the modulated continuous phase; the phase difference is then substituted into the phase-to-depth conversion formula (whose parameters are obtained by calibration) to obtain the three-dimensional model of the measured object as shown in Fig. 6(e).
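For the four-step phase-shifting method mentioned above, with fringe intensities I_n = A + B*cos(phi + (n-1)*pi/2), the wrapped phase follows from an arctangent of intensity differences. A minimal sketch, not the patent's implementation:

```python
import numpy as np

def four_step_phase(i1, i2, i3, i4):
    """Wrapped phase from four fringe images shifted by pi/2 each,
    I_n = A + B*cos(phi + (n-1)*pi/2), so that
        i1 - i3 = 2B*cos(phi)  and  i4 - i2 = 2B*sin(phi),
    giving phi = arctan2(i4 - i2, i1 - i3). The result lies in
    [-pi, pi] and still needs the unwrapping (de-jumping) step."""
    return np.arctan2(i4 - i2, i1 - i3)
```

After unwrapping (e.g. with a routine such as `np.unwrap` along each row for the 1D case), subtracting the reference phase yields the phase difference used in the conversion formula.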
It should be understood that, in practical applications, depending on the concrete application scenario, the structured light employed in the embodiments of the present invention may be any other pattern besides the grating described above.

As a possible implementation, the present invention can also use speckle structured light to collect the depth information of the current user.
Specifically, the method of obtaining depth information with speckle structured light uses a diffraction element that is essentially a flat plate with a relief diffraction structure having a specific phase distribution, whose cross section is a stepped relief structure of two or more levels. The thickness of the substrate in the diffraction element is substantially 1 micron, and the heights of the steps are non-uniform, ranging from 0.7 micron to 0.9 micron. The structure shown in Fig. 7(a) is a local diffraction structure of the collimating beam-splitting element of this embodiment, and Fig. 7(b) is a cross-sectional side view along section A-A, with both abscissa and ordinate in microns. The speckle pattern generated by the speckle structured light is highly random, and the pattern changes with distance. Therefore, before depth information is obtained with speckle structured light, the speckle patterns in space must first be calibrated: for example, within a range of 0 to 4 metres from the structured-light camera 122, a reference plane is taken every 1 centimetre, so that 400 speckle images are saved after calibration; the smaller the calibration spacing, the higher the precision of the obtained depth information. Then, the structured-light projector 121 projects the speckle structured light onto the measured object (i.e., the current user), and the height differences of the surface of the measured object change the speckle pattern of the speckle structured light projected onto it. After the structured-light camera 122 shoots the speckle pattern projected onto the measured object (i.e., the structured-light image), the speckle pattern is cross-correlated one by one with the 400 speckle images saved during calibration, yielding 400 correlation images. The position of the measured object in space shows peaks in the correlation images; superimposing these peaks and performing an interpolation operation yields the depth information of the measured object.
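The correlation search against the calibrated reference planes can be sketched with normalized cross-correlation. This is a toy version with hypothetical names; a real system, as described above, correlates patch-wise across the image and interpolates between reference planes:

```python
import numpy as np

def speckle_depth(captured, references, plane_depths):
    """Pick, for a captured speckle patch, the calibrated reference plane
    it correlates with best, and return that plane's known distance.
    references: patches saved during the per-centimetre calibration;
    plane_depths: their distances from the camera."""
    def ncc(a, b):
        # normalized cross-correlation of two equally sized patches
        a = a - a.mean()
        b = b - b.mean()
        denom = np.sqrt((a * a).sum() * (b * b).sum())
        return (a * b).sum() / denom if denom else 0.0
    scores = [ncc(captured, r) for r in references]
    return plane_depths[int(np.argmax(scores))]
```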
An ordinary diffraction element yields multiple diffracted beams whose intensities differ greatly, so the risk of injury to the human eye is also large, and even if the diffracted light is diffracted again, the uniformity of the resulting beams is low; the effect of projecting onto the measured object with beams diffracted by an ordinary diffraction element is therefore poor. In this embodiment, a collimating beam-splitting element is used; this element not only collimates non-collimated light but also splits it, i.e., the non-collimated light reflected by the mirror emerges from the collimating beam-splitting element as multiple collimated beams at different angles, and the emitted collimated beams have approximately equal cross-sectional areas and approximately equal energy fluxes, so that the effect of projecting with the scattered beams after diffraction is better. At the same time, the outgoing laser light is dispersed over the beams, further reducing the risk of injury to the human eye, and compared with other uniformly arranged structured light, the speckle structured light consumes less power for the same collection effect.
Referring to Fig. 8, in some embodiments, step 05 of processing the multiple frames of scene images and depth images to segment, in each frame of the scene image, the person region from the background region beyond the person region so as to obtain the background-region images includes:

051: identifying the face region in each frame of the scene image;

052: obtaining the depth information corresponding to the face region from that frame of the scene image or from the depth image corresponding to that frame of the scene image;

053: determining the depth range of the person region according to the depth information of the face region;

054: determining, according to the depth range of the person region, the person region that is connected with the face region and falls within the depth range; and

055: determining the background-region image according to the person region and that frame of the scene image.
Referring again to Fig. 2, in some embodiments, steps 051, 052, 053, 054, and 055 can be implemented by the processor 20.

In other words, the processor 20 can further be used to identify the face region in each frame of the scene image, obtain the depth information corresponding to the face region from that frame of the scene image or from the depth image corresponding to that frame of the scene image, determine the depth range of the person region according to the depth information of the face region, determine, according to the depth range of the person region, the person region that is connected with the face region and falls within the depth range, and determine the background-region image according to the person region and that frame of the scene image.
Specifically, a trained deep learning model can first be used to identify the face region in each frame of the scene image. Since each frame of the three-dimensional scene image contains depth information in addition to colour information, the depth information of the face region can be obtained directly from that frame of the three-dimensional scene image; alternatively, the depth information of the face region in each frame of the scene image can be determined from the correspondence between the depth image and the two-dimensional image. Because the face region contains features such as the nose, eyes, ears, and lips, the depth data corresponding to each feature of the face region differs in the three-dimensional scene image or depth image; for example, when the face squarely faces the depth-image collection assembly 12, in the depth image shot by the depth-image collection assembly 12 the depth data corresponding to the nose may be small while the depth data corresponding to the ears may be large. Therefore, the above depth information of the face region may be a single value or a range; when it is a single value, the value may be obtained by averaging the depth data of the face region, or by taking the median of the depth data of the face region.
Since the person region contains the face region, i.e., the person region and the face region lie within some common depth range, after the processor 20 determines the depth information of the face region, it can set the depth range of the person region according to the depth information of the face region and then, according to the depth range of the person region, extract the person region that falls within the depth range and is connected with the face region. After the person region is determined, the part of the scene image other than the person region is the background region; by extracting from the scene image the part other than the person region, the processor 20 obtains the background-region image.

In this way, the person region and the background region can be segmented from each frame of the scene image according to the depth information, yielding multiple frames of background-region images. Since the acquisition of depth information is not affected by environmental factors such as illumination and colour temperature, the extracted background-region image is more accurate.
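The segmentation of steps 051 to 055 can be illustrated as a region growth from a seed pixel inside the face region, constrained to the depth range. This is a simplified sketch; the seed pixel and tolerance are assumptions standing in for the face detection and the depth range derived from it:

```python
import numpy as np
from collections import deque

def segment_person(depth, face_seed, tol):
    """Region-grow from a pixel inside the detected face: keep pixels
    connected to the seed whose depth falls within [seed - tol, seed + tol]
    (a stand-in for the depth range set from the face region's depth).
    Returns a boolean person mask; the background-region image is its
    complement applied to the scene image."""
    h, w = depth.shape
    lo, hi = depth[face_seed] - tol, depth[face_seed] + tol
    mask = np.zeros((h, w), dtype=bool)
    queue = deque([face_seed])
    mask[face_seed] = True
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and not mask[nr, nc] \
                    and lo <= depth[nr, nc] <= hi:
                mask[nr, nc] = True
                queue.append((nr, nc))
    return mask
```

Growing only through connected pixels enforces the "connected with the face region" condition; the depth test enforces the "falls within the depth range" condition.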
Referring to Fig. 9, in some embodiments, the image processing method further includes the following steps:

061: processing each frame of the scene image to obtain a whole-field edge image of that frame of the scene image; and

062: correcting, according to each frame of the whole-field edge image, the background-region image corresponding to that frame of the whole-field edge image.

Referring again to Fig. 2, in some embodiments, steps 061 and 062 can be implemented by the processor 20.

In other words, the processor 20 can also be used to process each frame of the scene image to obtain the whole-field edge image of that frame, and to correct, according to each frame of the whole-field edge image, the background-region image corresponding to that frame of the whole-field edge image.
The processor 20 first performs edge extraction on each frame of scene image to obtain a plurality of full-field edge images, where the edge lines in a full-field edge image include the edge lines of the current user and of the background objects in the scene where the current user is located. Specifically, the edge extraction may be performed on each frame of scene image by a Canny operator. The core of the Canny edge-extraction algorithm mainly includes the following steps: first, the scene image is convolved with a 2D Gaussian filter template to eliminate noise; then, a differential operator is used to obtain the gradient magnitude of the gray value of each pixel, and the gradient direction of the gray value of each pixel is calculated from the gradient magnitudes, so that the adjacent pixels of each pixel along its gradient direction can be found; then, each pixel is traversed, and if the gray value of a pixel is not the maximum compared with the gray values of the two adjacent pixels before and after it along its gradient direction, the pixel is considered not to be an edge point. In this way, the pixels at edge positions in the scene image can be determined, so that the full-field edge image after edge extraction is obtained.
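The three Canny stages just listed (Gaussian smoothing, gradient computation, non-maximum suppression) can be sketched with NumPy alone. This toy version omits the hysteresis-thresholding stage of the full Canny operator, and the kernel size and sigma are assumptions:

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def convolve2d(img, k):
    """Naive 2D convolution with edge-replicating padding."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = (padded[i:i + kh, j:j + kw] * k).sum()
    return out

def simple_canny(gray):
    """Gaussian smoothing -> Sobel gradients -> non-maximum suppression."""
    smoothed = convolve2d(gray.astype(float), gaussian_kernel())
    sx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    gx = convolve2d(smoothed, sx)
    gy = convolve2d(smoothed, sx.T)
    mag = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180
    edges = np.zeros_like(mag)
    for i in range(1, mag.shape[0] - 1):
        for j in range(1, mag.shape[1] - 1):
            a = angle[i, j]
            if a < 22.5 or a >= 157.5:      # horizontal gradient
                n1, n2 = mag[i, j - 1], mag[i, j + 1]
            elif a < 67.5:
                n1, n2 = mag[i - 1, j + 1], mag[i + 1, j - 1]
            elif a < 112.5:                  # vertical gradient
                n1, n2 = mag[i - 1, j], mag[i + 1, j]
            else:
                n1, n2 = mag[i - 1, j - 1], mag[i + 1, j + 1]
            if mag[i, j] >= n1 and mag[i, j] >= n2:  # keep local maxima
                edges[i, j] = mag[i, j]
    return edges

# A vertical brightness step produces edge responses at the step only.
gray = np.zeros((16, 16))
gray[:, 8:] = 255.0
edges = simple_canny(gray)
assert edges[8, 1] == 0.0
assert max(edges[8, 7], edges[8, 8]) > 0.0
```

In practice a library call such as OpenCV's Canny implementation would replace this loop-based sketch.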
Each frame of scene image corresponds to one frame of full-field edge image, and likewise each frame of scene image corresponds to one frame of background region image; the full-field edge images and the background region images are therefore in one-to-one correspondence. After obtaining a full-field edge image, the processor 20 corrects the background region image corresponding to that full-field edge image. Specifically, the processor 20 first corrects the person region in the scene image using the full-field edge image, and then determines the final background region image from the corrected person region. It can be understood that the person region is obtained by merging all pixels in the scene image that are connected with the face region and fall within the set depth range; in some scenes, there may be objects that are connected with the face region and also fall within the depth range. Therefore, the full-field edge image can be used to correct the person region so as to obtain a more accurate person region, and the background region is then determined from the more accurate person region. In this way, the background region image finally obtained is also more accurate.
Further, the processor 20 may also perform a second correction on the corrected person region; for example, the processor 20 may perform dilation on the corrected person region, enlarging it so as to retain the edge details of the person region. The processor 20 then determines the background region from this more accurate person region, and the background region image finally obtained has higher precision.
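Dilation is standard binary morphology; a minimal sketch with an assumed 3x3 cross structuring element (the disclosure specifies neither the element nor the number of iterations):

```python
import numpy as np

def dilate(mask, iterations=1):
    """Binary dilation with a 3x3 cross structuring element."""
    out = mask.astype(bool).copy()
    for _ in range(iterations):
        padded = np.pad(out, 1)          # zero (False) border
        out = (padded[1:-1, 1:-1] | padded[:-2, 1:-1] | padded[2:, 1:-1]
               | padded[1:-1, :-2] | padded[1:-1, 2:])
    return out

# A 1-pixel person mask grows into a 5-pixel cross after one dilation,
# keeping thin edge details (hair, fingers) inside the person region.
mask = np.zeros((5, 5), dtype=bool)
mask[2, 2] = True
grown = dilate(mask)
assert grown.sum() == 5
assert grown[1, 2] and grown[2, 1] and grown[2, 3] and grown[3, 2]
```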
Referring to Fig. 10, in some embodiments, the image processing method of an embodiment of the present invention further includes:
063: processing each frame of scene image and depth image to extract action information of the current user; and
064: rendering the predetermined three-dimensional images according to the action information, so that each frame of predetermined three-dimensional image follows the actions of the current user.
The step 07 of fusing each frame of predetermined three-dimensional image with the corresponding background region image to obtain merged images for outputting the video image includes:
071: fusing each rendered frame of predetermined three-dimensional image with the corresponding background region image to obtain the merged images for outputting the video image.
Referring again to Fig. 2, in some embodiments, step 063, step 064 and step 071 may be implemented by the processor 20. In other words, the processor 20 may be configured to process each frame of scene image and depth image to extract the action information of the current user, to render the predetermined three-dimensional images according to the action information so that each frame of predetermined three-dimensional image follows the actions of the current user, and to fuse each rendered frame of predetermined three-dimensional image with the corresponding background region image to obtain the merged images for outputting the video image.
Here, the action information includes at least one of an expression and a limb action of the current user. In other words, the action information may be the expression of the current user, or the limb action of the current user, or both the expression and the limb action of the current user.
Specifically, in step 05 the processor 20 has already identified the face region of each frame of scene image and segmented out the person region and the background region. Therefore, when performing step 063, the processor 20 processes each frame of face region to recognize the expression of the current user, and processes the person region in each frame of scene image to obtain the limb-action information of the current user. The limb-action information of the current user may be obtained by template matching: the processor 20 matches the person region against a plurality of person templates. The head of the person region is matched first; after the head matching is completed, the next limb matching, i.e. upper-body trunk matching, is performed on the remaining person templates whose heads match; after the upper-body trunk matching is completed, the next limb matching, i.e. matching of the upper and lower limbs, is performed on the remaining person templates whose head and upper-body trunk both match. The limb-action information of the current user is thus determined by the template-matching method. Then, the processor 20 renders the predetermined three-dimensional images according to the recognized expression and limb action of the current user, so that the person, animal or plant in the predetermined three-dimensional images follows and imitates the expression and limb action of the current user. Finally, the processor 20 fuses the rendered predetermined three-dimensional images with the background region images to obtain the merged images.
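The disclosure does not specify the matching metric for this coarse-to-fine template matching; the sketch below assumes an intersection-over-union score and a three-stage head/torso/limbs decomposition purely for illustration:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two binary part masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def match_pose(person_parts, templates, threshold=0.5):
    """Coarse-to-fine matching: head first, then torso, then limbs;
    templates failing one stage are dropped before the next."""
    candidates = list(templates)
    for part in ("head", "torso", "limbs"):
        candidates = [t for t in candidates
                      if iou(person_parts[part], templates[t][part]) >= threshold]
        if len(candidates) <= 1:
            break
    return candidates[0] if candidates else None

# Two toy 6x4 pose templates differing only in the limb position.
def pose(limb_row):
    m = {"head": np.zeros((6, 4), bool),
         "torso": np.zeros((6, 4), bool),
         "limbs": np.zeros((6, 4), bool)}
    m["head"][0, 1:3] = True
    m["torso"][1:4, 1:3] = True
    m["limbs"][limb_row, :] = True
    return m

templates = {"arms_up": pose(4), "arms_down": pose(5)}
assert match_pose(pose(5), templates) == "arms_down"
```

Real systems would match grayscale patches or skeleton models rather than binary masks, but the staged pruning of candidate templates follows the same pattern.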
Since the person region where the current user is located in each frame of scene image can be replaced by a three-dimensional person, animal or plant, and the person, animal or plant in each frame of predetermined three-dimensional image can follow the action of the current user in the scene image corresponding to that frame, the video image formed by the plurality of merged images, when played, shows a three-dimensional person, animal or plant that follows and imitates the actions of the current user. In this way, the interest of the image fusion is greatly improved, bringing a better visual experience to the user.
Referring to Fig. 11, in some embodiments, the step 07 of fusing each frame of predetermined three-dimensional image with the corresponding background region image to obtain merged images for outputting the video image includes:
072: comparing the size of each frame of predetermined three-dimensional image with the size of the person region in the corresponding scene image;
073: when the size of the predetermined three-dimensional image is larger than the size of the person region, reducing the predetermined three-dimensional image and filling it into the person region of the scene image so as to obtain a merged image by fusion;
074: when the size of the predetermined three-dimensional image is smaller than the size of the person region, enlarging the predetermined three-dimensional image and filling it into the person region of the scene image so as to obtain a merged image by fusion, or filling the predetermined three-dimensional image into the person region of the scene image and filling the space between the predetermined three-dimensional image and the person region with pixels adjacent to the person region so as to obtain a merged image; and
075: processing the plurality of merged images to output the video image.
Referring again to Fig. 2, in some embodiments, step 072, step 073, step 074 and step 075 may be implemented by the processor 20. In other words, the processor 20 may further be configured to compare the size of each frame of predetermined three-dimensional image with the size of the person region in the corresponding scene image; to reduce the predetermined three-dimensional image and fill it into the person region of the scene image for fusion into a merged image when the size of the predetermined three-dimensional image is larger than the size of the person region; to enlarge the predetermined three-dimensional image and fill it into the person region of the scene image for fusion into a merged image when the size of the predetermined three-dimensional image is smaller than the size of the person region, or to fill the predetermined three-dimensional image into the person region of the scene image and fill the space between the predetermined three-dimensional image and the person region with pixels adjacent to the person region to obtain a merged image; and to process the plurality of merged images to output the video image.
Specifically, since the acquisition distance between the current user and the visible-light camera 11 is not fixed, the size of the person region in each frame of scene image is not fixed either. Therefore, before a background region image is fused with a predetermined three-dimensional image, the size of the predetermined three-dimensional image must first be compared with the size of the person region corresponding to the background region image to be fused with it, where the size includes the height and the width of the person region and of the predetermined three-dimensional image. When both the width and the height of the predetermined three-dimensional image are larger than those of the person region, suitable reduction factors for the height and the width can be determined from the size of the person region, and the predetermined three-dimensional image is reduced by these factors so that it fills the part of the scene image occupied by the person region. When both the height and the width of the predetermined three-dimensional image are smaller than those of the person region, suitable magnification factors for the height and the width can be determined from the size of the person region, and the predetermined three-dimensional image is enlarged by these factors and filled into the part of the scene image occupied by the person region; alternatively, the predetermined three-dimensional image may be filled at its original size into the part of the scene image occupied by the person region, and the space between the predetermined three-dimensional image and the person region is then filled up with the pixels around the person region. When the width of the predetermined three-dimensional image is larger than that of the person region while its height is smaller, the width of the predetermined three-dimensional image can be reduced appropriately according to the width of the person region and its height enlarged appropriately according to the height of the person region, and the resized predetermined three-dimensional image is filled into the part of the scene image occupied by the person region. When the height of the predetermined three-dimensional image is larger than that of the person region while its width is smaller, the height of the predetermined three-dimensional image can be reduced appropriately according to the height of the person region and its width enlarged appropriately according to the width of the person region, and the resized predetermined three-dimensional image is filled into the part of the scene image occupied by the person region.
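The four size cases above reduce to computing one scale factor per axis, shrinking an axis that is too large and enlarging one that is too small. A toy sketch, with all names assumed:

```python
def fit_to_region(img_w, img_h, region_w, region_h):
    """Per-axis scale factors that resize a predetermined 3-D image so
    it fills the person region exactly."""
    return region_w / img_w, region_h / img_h

def resize(img_w, img_h, sx, sy):
    """Apply the scale factors, rounding to whole pixels."""
    return round(img_w * sx), round(img_h * sy)

# 3-D image larger than the person region on both axes -> shrink both.
sx, sy = fit_to_region(200, 400, 100, 300)
assert (sx, sy) == (0.5, 0.75)
assert resize(200, 400, sx, sy) == (100, 300)

# Wider but shorter than the region -> shrink width, stretch height.
sx, sy = fit_to_region(150, 100, 100, 300)
assert sx < 1 < sy
```

Scaling each axis independently distorts the aspect ratio of the predetermined three-dimensional image; the alternative path in step 074 (keep the original size and inpaint the gap with neighboring pixels) avoids that distortion.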
In some embodiments, the predetermined three-dimensional image may be selected at random by the processor 20, or selected by the current user himself.
After the processor 20 obtains the plurality of merged images, the merged images are arranged in sequence and stored; the plurality of merged images may be stored by the processor 20 in a video format to form the video image. When the video image is displayed at a certain frame rate on the display 50 of the electronic device 1000 (shown in Fig. 13), the user can watch a smooth video picture.
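A minimal sketch of sequencing merged frames into a clip played at the preset frequency (the buffer class and its API are assumptions; actual video encoding is left to a codec):

```python
class VideoBuffer:
    """Accumulates merged frames in order and reports the clip length
    implied by the playback frame rate."""

    def __init__(self, fps):
        self.fps = fps
        self.frames = []

    def append(self, frame):
        self.frames.append(frame)

    def duration(self):
        """Clip length in seconds at the configured frame rate."""
        return len(self.frames) / self.fps

buf = VideoBuffer(fps=30)
for i in range(90):                      # 90 merged frames
    buf.append(f"merged_{i}")
assert buf.duration() == 3.0             # 3 seconds at 30 fps
assert buf.frames[0] == "merged_0"
```

For smooth playback the acquisition frequency of step 03 and the playback frame rate should match, otherwise frames must be dropped or duplicated.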
Referring to Fig. 12, in some embodiments, the image processing method of an embodiment of the present invention further includes:
081: collecting sound information of the current user; and
082: merging the video image with the sound information to output a sound video.
Referring again to Fig. 3, in some embodiments, the image processing apparatus 100 further includes an acoustic-electric element 70. Step 081 may be implemented by the acoustic-electric element 70, and step 082 may be implemented by the processor 20. In other words, the acoustic-electric element 70 may be used to collect the sound information of the current user, and the processor 20 may be used to merge the sound information into the video image to output the sound video.
Specifically, when the imaging apparatus 10 is turned on to acquire the three-dimensional scene images and the depth images, the acoustic-electric element 70 is turned on at the same time to collect the sound information of the current user. In this way, the sound information collected by the acoustic-electric element 70 remains synchronized with the video image formed by the plurality of merged images. The processor 20 then merges the sound information with the video image into the sound video to be output. When the sound video is played on the display 50 of the electronic device 1000 (shown in Fig. 13), the picture and the sound in the sound video can be kept playing synchronously.
Referring to Fig. 2 and Fig. 13 together, an embodiment of the present invention further provides an electronic device 1000. The electronic device 1000 includes the image processing apparatus 100. The image processing apparatus 100 may be implemented using hardware and/or software. The image processing apparatus 100 includes an imaging apparatus 10 and a processor 20.
The imaging apparatus 10 includes a visible-light camera 11 and a depth image acquisition component 12.
Specifically, the visible-light camera 11 includes an image sensor 111 and a lens 112 and may be used to capture color information of the current user to obtain a plurality of two-dimensional images, where the image sensor 111 includes a color filter array (such as a Bayer filter array) and the number of lenses 112 may be one or more. In the process of acquiring each frame of two-dimensional image, each imaging pixel in the image sensor 111 senses the light intensity and wavelength information in the photographed scene to generate a set of raw image data; the image sensor 111 sends this set of raw image data to the processor 20, and the processor 20 obtains a color two-dimensional image after performing operations such as denoising and interpolation on the raw image data. The processor 20 may process each image pixel in the raw image data one by one in various formats; for example, each image pixel may have a bit depth of 8, 10, 12 or 14 bits, and the processor 20 may process each image pixel at the same or a different bit depth.
The depth image acquisition component 12 includes a structured-light projector 121 and a structured-light camera 122 and may be used to capture depth information of the current user to obtain the depth images. The structured-light projector 121 is used to project structured light onto the current user, where the structured-light pattern may be laser stripes, a Gray code, sinusoidal stripes, a randomly arranged speckle pattern, or the like. The structured-light camera 122 includes an image sensor 1221 and a lens 1222, and the number of lenses 1222 may be one or more. The image sensor 1221 is used to capture the plurality of structured-light images projected by the structured-light projector 121 onto the current user. Each frame of structured-light image may be sent by the depth image acquisition component 12 to the processor 20 for processing such as demodulation, phase recovery and phase-information calculation to obtain the depth information of the current user.
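The disclosure leaves the demodulation scheme open. One common scheme, used here purely as an assumed illustration, is four-step phase shifting: the camera captures four fringe images offset by 90 degrees and the wrapped phase falls out of an arctangent; the linear phase-to-depth mapping below is a toy stand-in for the calibration a real system performs:

```python
import math

def recover_phase(i0, i1, i2, i3):
    """Four-step phase-shifting demodulation for one pixel.

    With captures I_k = A + B*cos(phi + k*pi/2), k = 0..3:
    i3 - i1 = 2B*sin(phi) and i0 - i2 = 2B*cos(phi).
    """
    return math.atan2(i3 - i1, i0 - i2)

def phase_to_depth(phase, scale=0.1, offset=1.0):
    """Toy linear phase-to-depth mapping (real systems calibrate this)."""
    return offset + scale * phase

# Synthesize the four captures for a known phase of 0.5 rad.
A, B, phi = 100.0, 50.0, 0.5
caps = [A + B * math.cos(phi + k * math.pi / 2) for k in range(4)]
assert abs(recover_phase(*caps) - phi) < 1e-9
assert abs(phase_to_depth(recover_phase(*caps)) - 1.05) < 1e-9
```

Applying `recover_phase` per pixel and `phase_to_depth` to the result corresponds to the demodulation, phase-recovery and phase-information-calculation processing named above.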
In some embodiments, the functions of the visible-light camera 11 and the structured-light camera 122 may be implemented by a single camera; in other words, the imaging apparatus 10 includes only one camera and one structured-light projector 121, and the camera can capture both two-dimensional images and structured-light images.
Besides structured light, the depth image of the current user may also be obtained by other depth acquisition methods such as binocular vision or time-of-flight (TOF) ranging.
In addition, the image processing apparatus 100 further includes a memory 30. The memory 30 may be embedded in the electronic device 1000 or may be a memory independent of the electronic device 1000, and may include a direct memory access (DMA) feature. The raw image data collected by the visible-light camera 11 or the structured-light image data collected by the depth image acquisition component 12 may be transferred to the memory 30 for storage or caching. The processor 20 may read the raw image data from the memory 30 for processing to obtain the two-dimensional images, may read the structured-light image data from the memory 30 for processing to obtain the depth images, and may also read the raw image data and the structured-light image data from the memory 30 and process them to obtain the three-dimensional color scene images. Moreover, the two-dimensional images, scene images and depth images may also be stored in the memory 30 for the processor 20 to call for processing at any time; for example, the processor 20 calls the plurality of scene images and the plurality of depth images to extract the background regions, fuses the extracted background region images with the corresponding predetermined three-dimensional images to obtain the plurality of merged images, and arranges or stores the merged images in sequence to form the video image. The predetermined three-dimensional images, the merged images and the video image may also be stored in the memory 30.
The image processing apparatus 100 may further include a display 50. The display 50 may obtain the video image directly from the processor 20 or from the memory 30. The display 50 displays the video image for the user to watch, or passes it to a graphics engine or a graphics processing unit (GPU) for further processing. The image processing apparatus 100 further includes an encoder/decoder 60, which can encode and decode the image data of the two-dimensional images, scene images, depth images, merged images, video image and so on; the encoded image data can be saved in the memory 30 and decompressed by the decoder before the image is displayed on the display 50. The encoder/decoder 60 may be implemented by a central processing unit (CPU), a GPU or a coprocessor; in other words, the encoder/decoder 60 may be any one or more of a central processing unit (CPU), a GPU and a coprocessor.
The image processing apparatus 100 further includes a control logic device 40. While the imaging apparatus 10 is imaging, the processor 20 analyzes the data obtained by the imaging apparatus to determine image statistics for one or more control parameters (for example, exposure time) of the imaging apparatus 10. The processor 20 sends the image statistics to the control logic device 40, and the control logic device 40 controls the imaging apparatus 10 to image with the determined control parameters. The control logic device 40 may include a processor and/or a microcontroller that executes one or more routines (such as firmware), and the one or more routines may determine the control parameters of the imaging apparatus 10 according to the received image statistics.
The image processing apparatus 100 further includes an acoustic-electric element 70. The acoustic-electric element 70 converts sound into an electric current for output using the principle of electromagnetic induction. When the current user speaks, the air inside the acoustic-electric element 70 is driven to vibrate, so that a slight displacement occurs between the coil and the magnetic core inside the acoustic-electric element 70, cutting the magnetic induction lines and generating a current. The acoustic-electric element 70 sends the current to the processor 20, and the processor 20 processes the current to generate the sound information. The sound information may be sent to the memory 30 for storage. When the processor 20 merges the video image with the sound information to obtain the sound video, the processor 20 may send the sound video to the display 50 and an electro-acoustic component (not shown); the display 50 displays the video picture of the sound video, and the electro-acoustic component synchronously plays back the sound information.
Referring to Fig. 14, the electronic device 1000 of an embodiment of the present invention includes one or more processors 20, a memory 30 and one or more programs 31. The one or more programs 31 are stored in the memory 30 and configured to be executed by the one or more processors 20. The programs 31 include instructions for performing the image processing method of any one of the above embodiments.
For example, the programs 31 include instructions for performing the image processing method described in the following steps:
03: acquiring a plurality of frames of three-dimensional scene images and depth images of a current user at a preset frequency;
05: processing the plurality of scene images and the plurality of depth images to segment, in each frame of scene image, a person region and a background region other than the person region so as to obtain a plurality of background region images, the plurality of background region images corresponding to a plurality of predetermined three-dimensional images; and
07: fusing each frame of predetermined three-dimensional image with the corresponding background region image to obtain a plurality of merged images for outputting a video image.
For another example, the programs 31 further include instructions for performing the image processing method described in the following steps:
051: identifying the face region in each frame of scene image;
052: obtaining depth information corresponding to the face region from that frame of scene image or from the depth image corresponding to that frame of scene image;
053: determining a depth range of the person region according to the depth information of the face region;
054: determining, according to the depth range of the person region, the person region that is connected with the face region and falls within the depth range; and
055: determining the background region image according to the person region and that frame of scene image.
The computer-readable storage medium of an embodiment of the present invention includes a computer program used in combination with the electronic device 1000 capable of imaging. The computer program can be executed by the processor 20 to perform the image processing method of any one of the above embodiments.
For example, the computer program can be executed by the processor 20 to perform the image processing method described in the following steps:
03: acquiring a plurality of frames of three-dimensional scene images and depth images of a current user at a preset frequency;
05: processing the plurality of scene images and the plurality of depth images to segment, in each frame of scene image, a person region and a background region other than the person region so as to obtain a plurality of background region images, the plurality of background region images corresponding to a plurality of predetermined three-dimensional images; and
07: fusing each frame of predetermined three-dimensional image with the corresponding background region image to obtain a plurality of merged images for outputting a video image.
For another example, the computer program can also be executed by the processor 20 to perform the image processing method described in the following steps:
051: identifying the face region in each frame of scene image;
052: obtaining depth information corresponding to the face region from that frame of scene image or from the depth image corresponding to that frame of scene image;
053: determining a depth range of the person region according to the depth information of the face region;
054: determining, according to the depth range of the person region, the person region that is connected with the face region and falls within the depth range; and
055: determining the background region image according to the person region and that frame of scene image.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example", "some examples" and the like means that a specific feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the described specific features, structures, materials or characteristics may be combined in any suitable manner in any one or more embodiments or examples. In addition, where there is no contradiction, those skilled in the art may combine and unite the different embodiments or examples, and the features of the different embodiments or examples, described in this specification.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing the steps of a specific logical function or process, and the scope of the preferred embodiments of the present invention includes additional implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order according to the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in a flowchart or otherwise described herein, for example, may be considered an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in combination with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch and execute instructions from an instruction execution system, apparatus or device). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate or transmit a program for use by, or in combination with, an instruction execution system, apparatus or device. More specific examples (a non-exhaustive list) of the computer-readable medium include: an electrical connection (an electronic device) having one or more wirings, a portable computer diskette (a magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and a portable compact disc read-only memory (CDROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or, where necessary, processing it in another suitable manner, and then stored in a computer memory.
It should be understood that each part of the present invention may be implemented by hardware, software, firmware or a combination thereof. In the above embodiments, a plurality of steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented by hardware, as in another embodiment, they may be implemented by any one of, or a combination of, the following techniques known in the art: a discrete logic circuit having logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gate circuits, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art can understand that all or part of the steps carried by the method of the above embodiments can be completed by instructing relevant hardware through a program, the program may be stored in a computer-readable storage medium, and the program, when executed, includes one of, or a combination of, the steps of the method embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module. The above integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc or the like. Although the embodiments of the present invention have been shown and described above, it can be understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and those of ordinary skill in the art may make changes, modifications, substitutions and variations to the above embodiments within the scope of the present invention.
Claims (18)
1. An image processing method for an electronic device, characterized in that the image processing method comprises:
acquiring a plurality of frames of three-dimensional scene images and depth images of a current user at a preset frequency;
processing the plurality of scene images and the plurality of depth images to segment, in each frame of the scene images, a person region and a background region other than the person region so as to obtain a plurality of background region images, the plurality of background region images corresponding to a plurality of predetermined three-dimensional images; and
fusing each frame of the predetermined three-dimensional images with the corresponding background region image to obtain a plurality of merged images for outputting a video image.
2. The image processing method according to claim 1, characterized in that the image processing method further comprises:
collecting sound information of the current user; and
merging the video image with the sound information to output a sound video.
3. The image processing method according to claim 1, characterized in that the step of acquiring a plurality of frames of three-dimensional scene images and depth images of a current user at a preset frequency comprises:
photographing the current user at the preset frequency to obtain a plurality of two-dimensional images;
projecting structured light onto the current user;
capturing, at the preset frequency, a plurality of structured-light images modulated by the current user;
demodulating phase information corresponding to each pixel of each frame of the structured-light images to obtain the plurality of depth images; and
processing the plurality of two-dimensional images and the plurality of depth images to obtain the plurality of three-dimensional scene images.
4. The image processing method according to claim 3, characterized in that the step of demodulating the phase information corresponding to each pixel of each frame of the structured-light image to obtain the multiple frames of the depth image comprises:
demodulating the phase information corresponding to each pixel in each frame of the structured-light image;
converting the phase information into depth information; and
generating the depth image according to the depth information.
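Claims 3 and 4 describe demodulating a per-pixel phase from the structured-light image and converting it into depth. One common model for fringe projection (assumed here for illustration; the patent does not specify it) relates depth to the difference between the measured phase and a reference-plane phase via triangulation. The calibration constants below are made up:

```python
import math

# Sketch of the phase -> depth conversion of claim 4, under an assumed
# fringe-projection triangulation model:
#     z = L * dphi / (dphi + 2*pi*f*d)
# where L is the camera-to-reference-plane distance, d the projector-camera
# baseline, and f the fringe frequency. All constants are illustrative.

L_REF = 1.0      # camera to reference plane, metres (hypothetical)
BASELINE = 0.1   # projector-camera baseline, metres (hypothetical)
FREQ = 20.0      # fringes per metre on the reference plane (hypothetical)

def phase_to_depth(phase_map, reference_phase_map):
    """Convert a demodulated per-pixel phase map into a depth map."""
    depth = []
    for row, ref_row in zip(phase_map, reference_phase_map):
        depth_row = []
        for phi, phi_ref in zip(row, ref_row):
            dphi = phi - phi_ref  # phase shift caused by the user's surface
            z = L_REF * dphi / (dphi + 2 * math.pi * FREQ * BASELINE)
            depth_row.append(z)
        depth.append(depth_row)
    return depth

# A pixel with zero phase shift lies on the reference plane (depth 0);
# a positive shift yields a small positive height.
depth = phase_to_depth([[0.5, 0.0]], [[0.0, 0.0]])
```

Repeating this conversion for each frame of the structured-light image yields the multiple frames of the depth image of claim 4.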
5. The image processing method according to claim 1, characterized in that the predetermined three-dimensional image comprises at least one of a three-dimensional virtual character, a three-dimensional real person, and a three-dimensional animal or plant, the three-dimensional real person excluding the current user himself or herself.
6. The image processing method according to claim 1, characterized in that the image processing method further comprises:
processing each frame of the scene image and the depth image to extract motion information of the current user; and
rendering the predetermined three-dimensional image according to the motion information so that each frame of the predetermined three-dimensional image follows the motion of the current user;
wherein the step of merging each frame of the predetermined three-dimensional image with the corresponding background region image to obtain the merged image so as to output the video image comprises:
merging each rendered frame of the predetermined three-dimensional image with the corresponding background region image to obtain the multiple frames of the merged image so as to output the video image.
7. The image processing method according to claim 6, characterized in that the motion information comprises a facial expression and/or a limb motion of the current user.
8. The image processing method according to claim 1, characterized in that the multiple frames of the predetermined three-dimensional image correspond to the multiple frames of the scene image, and the step of merging each frame of the predetermined three-dimensional image with the corresponding background region image to obtain the merged image so as to output the video image comprises:
comparing the size of each frame of the predetermined three-dimensional image with the size of the person region in the corresponding scene image;
when the size of the frame of the predetermined three-dimensional image is larger than the size of the person region, shrinking the frame of the predetermined three-dimensional image and filling it into the person region in the scene image to obtain the merged image;
when the size of the frame of the predetermined three-dimensional image is smaller than the size of the person region, enlarging the frame of the predetermined three-dimensional image and filling it into the person region in the scene image to obtain the merged image, or filling the frame of the predetermined three-dimensional image into the person region in the scene image and filling the gap between the frame of the predetermined three-dimensional image and the person region with pixels adjoining the person region to obtain the merged image; and
processing the multiple frames of the merged image to output the video image.
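The gap-filling branch of claim 8 fills any hole left between the pasted three-dimensional image and the person region using pixels adjoining that region. A minimal sketch, assuming holes are marked `None` and filling each hole from the nearest known pixel in the same row (real inpainting would search in two dimensions; all data here is illustrative):

```python
# Sketch of claim 8's gap filling: after pasting the (smaller) predetermined
# 3-D image into the person region, remaining hole pixels (None) are filled
# from the nearest non-hole pixel in the same row, i.e. a pixel adjoining
# the person region. A 1-D nearest-neighbour scan keeps the sketch short.

def fill_gaps(image):
    """Replace None pixels with the nearest known pixel in the same row."""
    filled = []
    for row in image:
        new_row = list(row)
        for i, pixel in enumerate(row):
            if pixel is None:
                # search outward for the nearest known neighbour in this row
                for offset in range(1, len(row)):
                    left, right = i - offset, i + offset
                    if left >= 0 and row[left] is not None:
                        new_row[i] = row[left]
                        break
                    if right < len(row) and row[right] is not None:
                        new_row[i] = row[right]
                        break
        filled.append(new_row)
    return filled

# A 2x4 merged image with holes between the pasted 3-D image and the
# boundary of the person region.
merged = [[10, None, None, 40],
          [50, 60, None, 80]]
print(fill_gaps(merged))  # [[10, 10, 40, 40], [50, 60, 60, 80]]
```

Each hole is resolved by copying from the original row, so already-filled holes do not cascade into their neighbours.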
9. An image processing apparatus for an electronic device, characterized in that the image processing apparatus comprises:
an imaging device configured to capture multiple frames of a three-dimensional scene image and a depth image of a current user at a preset frequency; and
a processor configured to:
process the multiple frames of the scene image and the multiple frames of the depth image to segment, in each frame of the scene image, a person region and a background region other than the person region, so as to obtain multiple frames of a background region image, the multiple frames of the background region image corresponding to multiple frames of a predetermined three-dimensional image; and
merge each frame of the predetermined three-dimensional image with the corresponding background region image to obtain a merged image so as to output a video image.
10. The image processing apparatus according to claim 9, characterized in that the image processing apparatus further comprises an acoustic-electric element configured to collect sound information of the current user;
the processor is further configured to merge the video image with the sound information to output a sound-and-video stream.
11. The image processing apparatus according to claim 9, characterized in that the imaging device comprises a visible-light camera and a depth image acquisition assembly, the depth image acquisition assembly comprising a structured-light projector and a structured-light camera;
the visible-light camera is configured to photograph the current user at the preset frequency to obtain multiple frames of a two-dimensional image;
the structured-light projector is configured to project structured light onto the current user; and
the structured-light camera is configured to:
capture, at the preset frequency, multiple frames of a structured-light image modulated by the current user;
demodulate phase information corresponding to each pixel of each frame of the structured-light image to obtain the multiple frames of the depth image; and
process the multiple frames of the two-dimensional image and the multiple frames of the depth image to obtain the multiple frames of the three-dimensional scene image.
12. The image processing apparatus according to claim 11, characterized in that the structured-light camera is further configured to:
demodulate the phase information corresponding to each pixel in each frame of the structured-light image;
convert the phase information into depth information; and
generate the depth image according to the depth information.
13. The image processing apparatus according to claim 9, characterized in that the predetermined three-dimensional image comprises at least one of a three-dimensional virtual character, a three-dimensional real person, and a three-dimensional animal or plant, the three-dimensional real person excluding the current user himself or herself.
14. The image processing apparatus according to claim 9, characterized in that the processor is further configured to:
process each frame of the scene image and the depth image to extract motion information of the current user;
render the predetermined three-dimensional image according to the motion information so that each frame of the predetermined three-dimensional image follows the motion of the current user; and
merge each rendered frame of the predetermined three-dimensional image with the corresponding background region image to obtain the multiple frames of the merged image so as to output the video image.
15. The image processing apparatus according to claim 14, characterized in that the motion information comprises a facial expression and/or a limb motion of the current user.
16. The image processing apparatus according to claim 9, characterized in that the multiple frames of the predetermined three-dimensional image correspond to the multiple frames of the scene image, and the processor is further configured to:
compare the size of each frame of the predetermined three-dimensional image with the size of the person region in the corresponding scene image;
when the size of the frame of the predetermined three-dimensional image is larger than the size of the person region, shrink the frame of the predetermined three-dimensional image and fill it into the person region in the scene image to obtain the merged image;
when the size of the frame of the predetermined three-dimensional image is smaller than the size of the person region, enlarge the frame of the predetermined three-dimensional image and fill it into the person region in the scene image to obtain the merged image, or fill the frame of the predetermined three-dimensional image into the person region in the scene image and fill the gap between the frame of the predetermined three-dimensional image and the person region with pixels adjoining the person region to obtain the merged image; and
process the multiple frames of the merged image to output the video image.
17. An electronic device, characterized in that the electronic device comprises:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the image processing method according to any one of claims 1 to 8.
18. A computer-readable storage medium comprising a computer program for use in combination with an electronic device capable of imaging, the computer program being executable by a processor to perform the image processing method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710811779.9A CN107622495A (en) | 2017-09-11 | 2017-09-11 | Image processing method and device, electronic installation and computer-readable recording medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107622495A true CN107622495A (en) | 2018-01-23 |
Family
ID=61089510
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710811779.9A Pending CN107622495A (en) | 2017-09-11 | 2017-09-11 | Image processing method and device, electronic installation and computer-readable recording medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107622495A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108805883A (en) * | 2018-06-08 | 2018-11-13 | Oppo广东移动通信有限公司 | A kind of image partition method, image segmentation device and electronic equipment |
CN109389675A (en) * | 2018-09-30 | 2019-02-26 | Oppo广东移动通信有限公司 | Data processing method and device, terminal and storage medium |
CN114449323A (en) * | 2022-01-28 | 2022-05-06 | 维沃移动通信有限公司 | Video generation method and electronic equipment |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101332362A (en) * | 2008-08-05 | 2008-12-31 | 北京中星微电子有限公司 | Interactive delight system based on human posture recognition and implement method thereof |
CN101452582A (en) * | 2008-12-18 | 2009-06-10 | 北京中星微电子有限公司 | Method and device for implementing three-dimensional video specific action |
CN104992417A (en) * | 2015-06-30 | 2015-10-21 | 上海交通大学 | Kinect-based face video eye sight correction method and system |
CN105069746A (en) * | 2015-08-23 | 2015-11-18 | 杭州欣禾圣世科技有限公司 | Video real-time human face substitution method and system based on partial affine and color transfer technology |
CN106502388A (en) * | 2016-09-26 | 2017-03-15 | 惠州Tcl移动通信有限公司 | A kind of interactive movement technique and head-wearing type intelligent equipment |
CN106791347A (en) * | 2015-11-20 | 2017-05-31 | 比亚迪股份有限公司 | A kind of image processing method, device and the mobile terminal using the method |
CN106997457A (en) * | 2017-03-09 | 2017-08-01 | 广东欧珀移动通信有限公司 | Human limbs recognition methods, human limbs identifying device and electronic installation |
CN107025635A (en) * | 2017-03-09 | 2017-08-08 | 广东欧珀移动通信有限公司 | Processing method, processing unit and the electronic installation of image saturation based on the depth of field |
CN107590793A (en) * | 2017-09-11 | 2018-01-16 | 广东欧珀移动通信有限公司 | Image processing method and device, electronic installation and computer-readable recording medium |
CN107730509A (en) * | 2017-09-11 | 2018-02-23 | 广东欧珀移动通信有限公司 | Image processing method and device, electronic installation and computer-readable recording medium |
Non-Patent Citations (1)
Title |
---|
Wang Yongxin: "Reverse Engineering and Inspection Technology and Applications", 31 May 2014, Xi'an Jiaotong University Press |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107610077A (en) | Image processing method and device, electronic installation and computer-readable recording medium | |
CN107707835A (en) | Image processing method and device, electronic installation and computer-readable recording medium | |
CN107590793A (en) | Image processing method and device, electronic installation and computer-readable recording medium | |
CN107734267A (en) | Image processing method and device | |
CN107707839A (en) | Image processing method and device | |
CN107509045A (en) | Image processing method and device, electronic installation and computer-readable recording medium | |
CN107707831A (en) | Image processing method and device, electronic installation and computer-readable recording medium | |
CN107707838A (en) | Image processing method and device | |
CN107452034A (en) | Image processing method and its device | |
CN107705278A (en) | The adding method and terminal device of dynamic effect | |
CN107644440A (en) | Image processing method and device, electronic installation and computer-readable recording medium | |
CN107610078A (en) | Image processing method and device | |
CN107509043A (en) | Image processing method and device | |
CN107734264A (en) | Image processing method and device | |
CN107705243A (en) | Image processing method and device, electronic installation and computer-readable recording medium | |
CN107613223A (en) | Image processing method and device, electronic installation and computer-readable recording medium | |
CN107454336A (en) | Image processing method and device, electronic installation and computer-readable recording medium | |
CN107622495A (en) | Image processing method and device, electronic installation and computer-readable recording medium | |
CN107610076A (en) | Image processing method and device, electronic installation and computer-readable recording medium | |
CN107734265A (en) | Image processing method and device, electronic installation and computer-readable recording medium | |
CN107613228A (en) | The adding method and terminal device of virtual dress ornament | |
CN107527335A (en) | Image processing method and device, electronic installation and computer-readable recording medium | |
CN107730509A (en) | Image processing method and device, electronic installation and computer-readable recording medium | |
CN107592491A (en) | Video communication background display methods and device | |
CN107682656A (en) | Background image processing method, electronic equipment and computer-readable recording medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
CB02 | Change of applicant information ||
Address after: 523860 No. 18, Wu Sha Beach Road, Changan Town, Dongguan, Guangdong
Applicant after: OPPO Guangdong Mobile Communications Co., Ltd.
Address before: 523860 No. 18, Wu Sha Beach Road, Changan Town, Dongguan, Guangdong
Applicant before: Guangdong OPPO Mobile Communications Co., Ltd.
RJ01 | Rejection of invention patent application after publication ||
Application publication date: 20180123 |