CN107707837A

CN107707837A - Image processing method and device, electronic device and computer-readable storage medium

Info

Publication number: CN107707837A
Application number: CN201710812799.8A
Authority: CN
Inventors: 张学勇
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date: 2017-09-11
Filing date: 2017-09-11
Publication date: 2018-02-16
Anticipated expiration: 2037-09-11
Also published as: CN107707837B

Abstract

The invention discloses an image processing method for processing a merged image. The merged image is formed by fusing a predetermined three-dimensional background image with a person region image of a current user in a scene image of a real scene. The image processing method includes: obtaining sound information of the current user; processing the sound information to obtain a sound characteristic; and switching the predetermined three-dimensional background image according to the sound characteristic. The invention also discloses an image processing device, an electronic device and a computer-readable storage medium. When processing a merged image of a real person and a predetermined three-dimensional background image, the image processing method, image processing device, electronic device and computer-readable storage medium of the embodiments of the present invention switch the predetermined three-dimensional background image in the merged image according to the identified sound characteristic of the current user, which makes the merged image richer, improves the user experience and makes it more engaging.

Description

Image processing method and device, electronic device and computer-readable storage medium

Technical field

The present invention relates to the technical field of image processing, and more particularly to an image processing method and device, an electronic device and a computer-readable storage medium.

Background art

In existing approaches, when a person image from a real scene is fused with a virtual background image, the background image is usually fixed, which makes for a poor user experience.

Summary of the invention

Embodiments of the present invention provide an image processing method, an image processing device, an electronic device and a computer-readable storage medium.

The image processing method of the embodiments of the present invention is used to process a merged image. The merged image is formed by fusing a predetermined three-dimensional background image with a person region image of a current user in a scene image of a real scene. The image processing method includes:

obtaining sound information of the current user;

processing the sound information to obtain a sound characteristic; and

switching the predetermined three-dimensional background image according to the sound characteristic.

The image processing device of the embodiments of the present invention is used to process a merged image. The merged image is formed by fusing a predetermined three-dimensional background image with a person region image of a current user in a scene image of a real scene. The image processing device includes an acoustoelectric element and a processor. The acoustoelectric element is configured to obtain sound information of the current user. The processor is configured to process the sound information to obtain a sound characteristic, and to switch the predetermined three-dimensional background image according to the sound characteristic.

The electronic device of the embodiments of the present invention includes one or more processors, a memory and one or more programs. The one or more programs are stored in the memory and configured to be executed by the one or more processors, and the programs include instructions for performing the above image processing method.

The computer-readable storage medium of the embodiments of the present invention includes a computer program used in combination with an electronic device capable of capturing images. The computer program is executable by a processor to carry out the above image processing method.

When processing a merged image of a real person and a predetermined three-dimensional background image, the image processing method, image processing device, electronic device and computer-readable storage medium of the embodiments of the present invention switch the predetermined three-dimensional background image in the merged image according to the identified sound characteristic of the current user, which makes the merged image richer, makes it more engaging and improves the user experience.

Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from the following description or be learned through practice of the present invention.

Brief description of the drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:

Fig. 1 is a schematic flowchart of the image processing method of some embodiments of the present invention.

Fig. 2 is a schematic diagram of the image processing device of some embodiments of the present invention.

Fig. 3 is a schematic structural diagram of the electronic device of some embodiments of the present invention.

Fig. 4 is a schematic flowchart of the image processing method of some embodiments of the present invention.

Fig. 5 is a schematic flowchart of the image processing method of some embodiments of the present invention.

Fig. 6 is a schematic flowchart of the image processing method of some embodiments of the present invention.

Fig. 7 is a schematic flowchart of the image processing method of some embodiments of the present invention.

Fig. 8 is a schematic flowchart of the image processing method of some embodiments of the present invention.

Fig. 9(a) to Fig. 9(e) are schematic diagrams of a structured light measurement scenario according to an embodiment of the present invention.

Fig. 10(a) and Fig. 10(b) are schematic diagrams of a structured light measurement scenario according to an embodiment of the present invention.

Fig. 11 is a schematic flowchart of the image processing method of some embodiments of the present invention.

Fig. 12 is a schematic flowchart of the image processing method of some embodiments of the present invention.

Fig. 13 is a schematic flowchart of the image processing method of some embodiments of the present invention.

Fig. 14 is a schematic diagram of the electronic device of some embodiments of the present invention.

Fig. 15 is a schematic diagram of the image processing device of some embodiments of the present invention.

Detailed description

Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the accompanying drawings are exemplary; they are intended to explain the present invention and should not be construed as limiting it.

Referring to Fig. 1, the image processing method of the embodiments of the present invention is used to process a merged image. The merged image is formed by fusing a predetermined three-dimensional background image with a person region image of a current user in a scene image of a real scene. The image processing method includes the following steps:

02: obtaining sound information of the current user;

04: processing the sound information to obtain a sound characteristic; and

06: switching the predetermined three-dimensional background image according to the sound characteristic.

Referring also to Fig. 2 and Fig. 3, the image processing device 100 of the embodiments of the present invention is used to process a merged image. The merged image is formed by fusing a predetermined three-dimensional background image with a person region image of the current user in a scene image of a real scene. The image processing method of the embodiments of the present invention can be implemented by the image processing device 100 of the embodiments of the present invention and applied to the electronic device 1000. The image processing device 100 includes a processor 20 and an acoustoelectric element 70. Step 02 can be implemented by the acoustoelectric element 70. Steps 04 and 06 can be implemented by the processor 20.

In other words, the acoustoelectric element 70 is configured to obtain the sound information of the current user. The processor 20 is configured to process the sound information to obtain a sound characteristic, and to switch the predetermined three-dimensional background image according to the sound characteristic.

The image processing device 100 of the embodiments of the present invention can be applied to the electronic device 1000 of the embodiments of the present invention. In other words, the electronic device 1000 of the embodiments of the present invention includes the image processing device 100 of the embodiments of the present invention.

In some embodiments, the electronic device 1000 includes a mobile phone, a tablet computer, a notebook computer, a smart bracelet, a smart watch, a smart helmet, smart glasses, and the like.

In some embodiments, the initial predetermined three-dimensional background image may be a predetermined three-dimensional background image obtained by modeling an actual scene, or one obtained through animation production. The predetermined three-dimensional background image may be selected at random by the processor 20 or chosen by the current user.

In some application scenarios, such as video conferencing or video chat, the participants may, for reasons such as security, privacy or added interest, replace the real scene with a virtual image as the background, fuse it with the person region image of the current user from the scene image of the real scene to form a merged image, and output the merged image to the other party. The virtual image is usually fixed and lacks variety; using the same virtual image for a long time is dull, and the user experience is poor.

The image processing method of the embodiments of the present invention collects the sound information of the current user, has the processor 20 process the collected sound information to extract a sound characteristic, and switches the predetermined three-dimensional background image according to a certain relationship between the sound characteristic and the predetermined three-dimensional background images. This presents richer merged images, makes the result more engaging and improves the user experience.

In some embodiments, the sound characteristic includes at least one of loudness, pitch and timbre. In other words, the sound characteristic may include only loudness, only pitch or only timbre; it may include loudness and pitch, loudness and timbre, or pitch and timbre; or it may include all three.

Specifically, loudness refers to the volume of a sound. In some cases, the volume is related to the real environment the current user is in. For example, when the current user speaks quietly, this suggests that the surrounding environment is relatively quiet; the predetermined three-dimensional background image in the merged image can then be switched to one containing elements such as night, the moon, stars, a library, or a small bridge over a flowing stream. When the current user speaks loudly, this suggests that the surrounding environment is relatively noisy; the predetermined three-dimensional background image in the merged image can then be switched to one containing elements such as crowds, roads or rain. Because the sound information collected by the acoustoelectric element 70 takes the form of a set of waveforms, the processor 20 can determine the loudness of the sound information from the amplitude of those waveforms.
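The amplitude-based loudness judgment described above can be sketched as follows. This is a minimal illustration, assuming the waveform is available as a list of normalized samples; the function names `rms_loudness_db` and `peak_amplitude` and the reference level `ref` are hypothetical and do not come from the patent.

```python
import math


def peak_amplitude(samples):
    """Largest absolute sample value of the waveform."""
    return max(abs(s) for s in samples)


def rms_loudness_db(samples, ref=1.0):
    """Root-mean-square level of the waveform in dB relative to `ref`.

    `ref` is an assumed full-scale amplitude, not a value from the patent.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return float("-inf")  # digital silence
    return 20.0 * math.log10(rms / ref)
```

A larger amplitude gives a higher level, so a quiet and a loud utterance can be told apart by comparing the returned values against thresholds chosen for the device.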

Pitch is essentially the vibration frequency of the vocal cords. Perceptually, the higher the vibration frequency of the vocal cords, the clearer and sharper the current user's voice sounds; the lower the frequency, the deeper and rougher it sounds. Male voices are usually deeper, and female voices are usually clearer. Therefore, after the processor 20 processes the sound information collected by the acoustoelectric element 70, if the voice is judged to be relatively clear, the predetermined three-dimensional background image in the merged image can be switched to one containing elements such as Hello Kitty or flowers; if the voice is judged to be relatively rough, the predetermined three-dimensional background image in the merged image can be switched to one containing elements such as a sports court, cars or games. The processor 20 can extract the lowest frequency value of the sound information as the pitch feature and distinguish pitch characteristics according to that value.
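One way to read "the lowest frequency value" is the lowest spectral component whose magnitude is significant. The sketch below follows that reading with a naive discrete Fourier transform. The relative threshold `rel_threshold`, the 165 Hz split and the background labels are illustrative assumptions only; the patent does not specify them.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitudes of DFT bins 0..n/2 (naive O(n^2) transform)."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def lowest_significant_frequency(samples, sample_rate, rel_threshold=0.5):
    """Lowest non-DC frequency whose magnitude reaches a share of the peak.

    Returns None for silence or for inputs too short to analyze.
    """
    mags = dft_magnitudes(samples)
    if len(mags) < 2:
        return None
    peak = max(mags[1:])  # ignore the DC bin
    if peak == 0.0:
        return None
    for k in range(1, len(mags)):
        if mags[k] >= rel_threshold * peak:
            return k * sample_rate / len(samples)
    return None


def choose_background_by_pitch(freq_hz, split_hz=165.0):
    """Hypothetical mapping: deeper voices get one scene, clearer ones another."""
    return "court_and_cars" if freq_hz < split_hz else "flowers"
```

For real audio a fast Fourier transform and a dedicated pitch tracker would normally replace the naive transform; it is used here only to keep the sketch self-contained.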

The sound produced by each user is different, so each user has a different timbre. The predetermined three-dimensional background image can also be switched according to the timbre of the sound. In this case, the sound may include ambient sound in addition to the current user's own voice. For example, the sound information collected by the acoustoelectric element 70 may contain the sound of a violin, in which case the predetermined three-dimensional background image of the merged image can be switched to one containing a violin element; or the sound information collected by the acoustoelectric element 70 may contain the meowing of a cat, in which case the predetermined three-dimensional background image in the merged image can be switched to one containing a cat element. The processor 20 can analyze and detect the spectral features of the sound information to distinguish timbre.

Referring to Fig. 4, in some embodiments, there are multiple sound characteristics and multiple predetermined three-dimensional background images, and each sound characteristic corresponds to one predetermined three-dimensional background image. Step 06 of switching the predetermined three-dimensional background image according to the sound characteristic includes:

061: switching to the predetermined three-dimensional background image corresponding to the sound characteristic.

Referring again to Fig. 2, in some embodiments, step 061 can be implemented by the processor 20. In other words, the processor 20 is further configured to switch to the predetermined three-dimensional background image corresponding to the sound characteristic.

Specifically, for example, the loudness in the sound characteristics can be divided into multiple levels according to volume, with each volume level corresponding to one predetermined three-dimensional background image. After calculating the volume of the sound information, the processor 20 switches to the predetermined three-dimensional background image corresponding to that volume. As another example, for the timbre in the sound characteristics, different timbres correspond to different predetermined three-dimensional background images (for example, a violin corresponds to a predetermined three-dimensional background image containing a violin element, and a piano corresponds to one containing a piano element). After processing the sound information to determine the timbre type, the processor 20 switches to the predetermined three-dimensional background image corresponding to the determined timbre type.
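The one-to-one correspondence between a sound characteristic and a background image amounts to a lookup table. A minimal sketch, in which the level boundaries, timbre labels and background names are all hypothetical placeholders:

```python
from bisect import bisect_right

# Hypothetical volume level boundaries in dB; they split loudness into 3 levels.
LEVEL_BOUNDS_DB = [-40.0, -20.0]
LOUDNESS_BACKGROUNDS = ["night_sky", "library", "busy_street"]

# Hypothetical timbre labels produced by some upstream classifier.
TIMBRE_BACKGROUNDS = {"violin": "violin_scene", "piano": "piano_scene"}


def background_for_loudness(level_db):
    """Pick the background assigned to the volume level containing level_db."""
    return LOUDNESS_BACKGROUNDS[bisect_right(LEVEL_BOUNDS_DB, level_db)]


def background_for_timbre(timbre, current):
    """Pick the background for a recognized timbre; keep the current one otherwise."""
    return TIMBRE_BACKGROUNDS.get(timbre, current)
```

Keeping the current background for an unrecognized timbre is a design choice made here for illustration; the patent does not say what happens in that case.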

Referring to Fig. 5, in some embodiments, there are multiple predetermined three-dimensional background images, stored in a predetermined order. Step 06 of switching the predetermined three-dimensional background image according to the sound characteristic includes:

062: switching among the multiple predetermined three-dimensional background images in a predetermined manner according to the sound characteristic.

Referring again to Fig. 2, in some embodiments, step 062 can be implemented by the processor 20. In other words, the processor 20 is further configured to switch among the multiple predetermined three-dimensional background images in a predetermined manner according to the sound characteristic.

Specifically, one sound characteristic may correspond to multiple predetermined three-dimensional background images. The multiple predetermined three-dimensional background images may be stored in the electronic device 1000 (shown in Fig. 3) in a predetermined order. The predetermined order may be the order in which the predetermined three-dimensional background images were stored, or a custom order arranged by the user according to personal preference or image style. Once the images are stored, the sound characteristic can serve as the trigger for switching among them: after the processor 20 obtains the sound characteristic, it triggers the switching of the multiple predetermined three-dimensional background images.

The predetermined manner may include, but is not limited to, cycling through all images, random switching, and so on.
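The two switching manners named above, cycling and random switching, can be sketched as a small state holder that advances each time a sound characteristic triggers a switch. The class name `BackgroundSwitcher` and its method are hypothetical.

```python
import random


class BackgroundSwitcher:
    """Holds backgrounds in their stored order and advances on each trigger."""

    def __init__(self, backgrounds, mode="cycle", rng=None):
        if not backgrounds:
            raise ValueError("at least one background is required")
        self.backgrounds = list(backgrounds)
        self.mode = mode
        self.index = 0
        self.rng = rng or random.Random()

    def on_sound_trigger(self):
        """Switch to the next background and return it."""
        if self.mode == "cycle":
            # Walk through the stored order and wrap around at the end.
            self.index = (self.index + 1) % len(self.backgrounds)
        elif self.mode == "random":
            # Pick any background other than the one currently shown.
            others = [i for i in range(len(self.backgrounds)) if i != self.index]
            if others:
                self.index = self.rng.choice(others)
        else:
            raise ValueError(f"unknown mode: {self.mode}")
        return self.backgrounds[self.index]
```

Excluding the current image in random mode guarantees a visible change on every trigger; that behavior is an assumption of this sketch.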

Referring to Fig. 6, in some embodiments, the image processing method of the embodiments of the present invention further includes:

011: obtaining a scene image of the current user;

012: obtaining a depth image of the current user;

013: processing the scene image and the depth image to extract the person region of the current user in the scene image and obtain the person region image; and

014: fusing the person region image with the predetermined three-dimensional background image to obtain the merged image.

Referring again to Fig. 2, in some embodiments, the image processing device 100 further includes a visible light camera 11 and a depth image acquisition component 12. Step 011 can be implemented by the visible light camera 11, step 012 can be implemented by the depth image acquisition component 12, and steps 013 and 014 can be implemented by the processor 20.

In other words, the visible light camera 11 can be used to obtain the scene image of the current user; the depth image acquisition component 12 can be used to obtain the depth image of the current user; and the processor 20 can be used to process the scene image and the depth image to extract the person region of the current user in the scene image and obtain the person region image, and to fuse the person region image with the predetermined three-dimensional background image to obtain the merged image.

The scene image may be a grayscale image or a color image, and the depth image represents the depth information of each person or object in the scene containing the current user. The scene range of the scene image is consistent with that of the depth image, and for each pixel in the scene image, the corresponding depth information can be found in the depth image.

Existing methods of segmenting a person from the background mainly rely on the similarity and discontinuity of adjacent pixels in terms of pixel values, but such segmentation is easily affected by environmental factors such as ambient lighting. The image processing method of the embodiments of the present invention extracts the person region from the scene image by obtaining a depth image of the current user. Because the acquisition of the depth image is not easily affected by factors such as lighting and the color distribution in the scene, the person region extracted using the depth image is more accurate; in particular, the boundary of the person region can be delineated accurately. Furthermore, a more accurate person region image yields a better merged image after fusion with the predetermined three-dimensional background image.
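Once a person mask is available, the fusion of step 014 reduces to choosing, per pixel, between the scene image and the background. The sketch below assumes the three-dimensional background has already been rendered to a two-dimensional view of the same size as the scene image; that rendering step and the function name `fuse` are assumptions, not part of the patent text.

```python
def fuse(scene, background, person_mask):
    """Keep scene pixels where the mask is True, background pixels elsewhere.

    All three inputs are row-major lists of rows with identical dimensions.
    """
    height, width = len(scene), len(scene[0])
    return [
        [scene[y][x] if person_mask[y][x] else background[y][x]
         for x in range(width)]
        for y in range(height)
    ]
```

A production implementation would usually soften the mask edge before blending, which this hard-mask sketch omits.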

Referring to Fig. 7, in some embodiments, step 012 of obtaining the depth image of the current user includes:

0121: projecting structured light onto the current user;

0122: capturing a structured light image modulated by the current user; and

0123: demodulating the phase information corresponding to each pixel of the structured light image to obtain the depth image.

Referring again to Fig. 2, in some embodiments, the depth image acquisition component 12 includes a structured light projector 121 and a structured light camera 122. Step 0121 can be implemented by the structured light projector 121, and steps 0122 and 0123 can be implemented by the structured light camera 122.

In other words, the structured light projector 121 can be used to project structured light onto the current user; the structured light camera 122 can be used to capture the structured light image modulated by the current user, and to demodulate the phase information corresponding to each pixel of the structured light image to obtain the depth image.

Specifically, after the structured light projector 121 projects structured light of a certain pattern onto the face and body of the current user, a structured light image modulated by the current user is formed on the surface of the current user's face and body. The structured light camera 122 captures the modulated structured light image and then demodulates it to obtain the depth image. The pattern of the structured light may be laser stripes, Gray code, sinusoidal fringes, non-uniform speckle, or the like.

Referring to Fig. 8, in some embodiments, step 0123 of demodulating the phase information corresponding to each pixel of the structured light image to obtain the depth image includes:

01231: demodulating the phase information corresponding to each pixel in the structured light image;

01232: converting the phase information into depth information; and

01233: generating the depth image according to the depth information.

Referring again to Fig. 2, in some embodiments, steps 01231, 01232 and 01233 can be implemented by the structured light camera 122.

In other words, the structured light camera 122 can be further used to demodulate the phase information corresponding to each pixel in the structured light image, convert the phase information into depth information, and generate the depth image according to the depth information.

Specifically, compared with unmodulated structured light, the phase information of the modulated structured light has changed, and the structured light appearing in the structured light image is distorted. The change in phase information can represent the depth information of the object. Therefore, the structured light camera 122 first demodulates the phase information corresponding to each pixel in the structured light image and then calculates depth information from the phase information, thereby obtaining the final depth image.

To help those skilled in the art understand more clearly the process of acquiring the depth image of the current user's face and body using structured light, its principle is described below using a widely used grating projection technique (fringe projection technique) as an example. Grating projection belongs to surface structured light in the broad sense.

As shown in Fig. 9(a), when surface structured light is used for projection, sinusoidal fringes are first generated by computer programming and projected onto the measured object by the structured light projector 121. The structured light camera 122 then captures the degree to which the fringes are bent after modulation by the object; the bent fringes are demodulated to obtain the phase, and the phase is converted into depth information to obtain the depth image. To avoid errors or error coupling, the depth image acquisition component 12 must be calibrated before structured light is used to collect depth information. Calibration covers the geometric parameters (for example, the relative position between the structured light camera 122 and the structured light projector 121), the intrinsic parameters of the structured light camera 122, and the intrinsic parameters of the structured light projector 121.

Specifically, in the first step, sinusoidal fringes are generated by computer programming. Because the phase must later be recovered from the distorted fringes, for example using the four-step phase-shifting method, four fringe patterns whose phases differ by π/2 are generated here. The structured light projector 121 then projects the four fringe patterns onto the measured object (the mask shown in Fig. 9(a)) in a time-shared manner. The structured light camera 122 captures the image shown on the left of Fig. 9(b), and the fringes on the reference plane, shown on the right of Fig. 9(b), are read at the same time.

In the second step, phase recovery is performed. The structured light camera 122 calculates the modulated phase map from the four captured modulated fringe patterns (i.e., the structured light images); the result at this point is a wrapped phase map. Because the result of the four-step phase-shifting algorithm is computed with the arctangent function, the phase of the modulated structured light is limited to [-π, π]; that is, whenever the modulated phase exceeds [-π, π], it wraps around and starts again. The resulting principal phase value is shown in Fig. 9(c).
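The arctangent computation behind the four-step phase-shifting method can be sketched per pixel as follows. The sketch assumes the common intensity model I_k = A + B·cos(φ + k·π/2) for k = 0, 1, 2, 3; the patent does not state its exact fringe model, so both the model and the function name `wrapped_phase` are assumptions.

```python
import math


def wrapped_phase(i0, i1, i2, i3):
    """Wrapped phase in [-pi, pi] from four intensities shifted by pi/2 each.

    Under the assumed model, i3 - i1 = 2B*sin(phi) and i0 - i2 = 2B*cos(phi),
    so the background level A and the modulation depth B cancel out.
    """
    return math.atan2(i3 - i1, i0 - i2)
```

Because the result comes from an arctangent, any true phase outside [-π, π] is folded back into that interval, which is the wrapping behavior described above.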

During phase recovery, jump removal (phase unwrapping) is required to restore the wrapped phase to a continuous phase. As shown in Fig. 9(d), the left side is the modulated continuous phase map, and the right side is the reference continuous phase map.
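Jump removal can be illustrated on a single row of wrapped phase values. The sketch below is a one-dimensional simplification that assumes the true phase changes by less than π between neighboring samples; two-dimensional unwrapping of a full phase map is more involved, and the function name `unwrap` is hypothetical.

```python
import math


def unwrap(phases):
    """Remove 2*pi jumps from a sequence of wrapped phase values."""
    if not phases:
        return []
    result = [phases[0]]
    offset = 0.0
    for prev, cur in zip(phases, phases[1:]):
        delta = cur - prev
        if delta > math.pi:
            offset -= 2 * math.pi  # wrapped value jumped up: true phase went down
        elif delta < -math.pi:
            offset += 2 * math.pi  # wrapped value jumped down: true phase went up
        result.append(cur + offset)
    return result
```

Applying the same routine to the modulated and the reference phase rows yields the two continuous phases that are subtracted in the next step.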

In the third step, the reference continuous phase is subtracted from the modulated continuous phase to obtain the phase difference (i.e., the phase information), which represents the depth information of the measured object relative to the reference plane. Substituting the phase difference into the phase-to-depth conversion formula (whose parameters are obtained by calibration) yields the three-dimensional model of the measured object shown in Fig. 9(e).

It should be understood that, in practical applications, depending on the specific application scenario, the structured light used in the embodiments of the present invention may be any other pattern in addition to the grating described above.

As a possible implementation, the present invention may also use speckle structured light to collect the depth information of the current user.

Specifically, speckle structured light obtains depth information using a diffraction element that is essentially a flat plate. The diffraction element has a relief diffraction structure with a specific phase distribution, and its cross section is a stepped relief structure with two or more levels. The substrate of the diffraction element is approximately 1 micron thick, and the step heights are non-uniform, ranging from 0.7 micron to 0.9 micron. Fig. 10(a) shows a partial diffraction structure of the collimating beam-splitting element of this embodiment. Fig. 10(b) is a cross-sectional side view along section A-A, with both the abscissa and the ordinate in microns. The speckle pattern generated by speckle structured light is highly random, and the pattern changes with distance. Therefore, before depth information is obtained using speckle structured light, the speckle patterns in space must first be calibrated. For example, within a range of 0 to 4 meters from the structured light camera 122, a reference plane is taken every 1 centimeter, so 400 speckle images are saved after calibration. The smaller the calibration spacing, the higher the accuracy of the acquired depth information. Next, the structured light projector 121 projects the speckle structured light onto the measured object (i.e., the current user), and the height differences on the surface of the measured object change the speckle pattern of the speckle structured light projected onto it. After the structured light camera 122 captures the speckle pattern (i.e., the structured light image) projected onto the measured object, the speckle pattern is cross-correlated one by one with the 400 speckle images saved during calibration, yielding 400 correlation images. The position of the measured object in space shows up as a peak in the correlation images, and the depth information of the measured object can be obtained by superimposing these peaks and performing interpolation.
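The matching of a captured speckle patch against calibrated reference patterns can be sketched with a normalized cross-correlation score per reference plane, followed by a parabolic interpolation around the best score. This is a simplified stand-in for the correlation images and peak superposition described above, not the patent's exact procedure. It assumes that reference `k` was recorded at distance `k * spacing_m`; the function names and that indexing convention are hypothetical.

```python
import math


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-length patches."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0


def estimate_depth(captured, references, spacing_m):
    """Depth of the best-matching reference plane, refined by interpolation."""
    scores = [ncc(captured, ref) for ref in references]
    best = max(range(len(scores)), key=scores.__getitem__)
    shift = 0.0
    if 0 < best < len(scores) - 1:
        left, mid, right = scores[best - 1], scores[best], scores[best + 1]
        curvature = left - 2 * mid + right
        if curvature < 0:
            # Vertex of the parabola through the three neighboring scores.
            shift = 0.5 * (left - right) / curvature
    return (best + shift) * spacing_m
```

With 400 reference planes at 1 centimeter spacing, `references` would hold 400 patches and `spacing_m` would be 0.01.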

Because an ordinary diffraction element produces multiple diffracted beams after diffracting a beam, but the intensity differs greatly from beam to beam, the risk of injury to the human eye is also high. Even if the diffracted light is diffracted a second time, the uniformity of the resulting beams is low. The projection of beams diffracted by an ordinary diffraction element onto the measured object is therefore poor. This embodiment uses a collimating beam-splitting element, which not only collimates non-collimated light but also splits it: non-collimated light reflected by a mirror passes through the collimating beam-splitting element and exits as multiple collimated beams at different angles. These collimated beams have approximately equal cross-sectional areas and approximately equal energy flux, so projection using the scattered light spots obtained from diffracting these beams works better. At the same time, the laser output is distributed across the individual beams, which further reduces the risk of eye injury. Compared with other, uniformly arranged structured light, speckle structured light also consumes less power to achieve the same acquisition result.

Referring to Figure 11, in some embodiments, step 013 of processing the scene image and the depth image to extract the person region of the current user in the scene image and obtain the person region image further includes:

0131: identifying the face region in the scene image;

0132: obtaining depth information corresponding to the face region from the depth image;

0133: determining the depth range of the person region according to the depth information of the face region; and

0134: determining, according to the depth range of the person region, the person region that is connected to the face region and falls within the depth range, so as to obtain the person region image.

Referring again to Fig. 2, in some embodiments, steps 0131, 0132, 0133 and 0134 can be implemented by the processor 20.

In other words, the processor 20 can be further used to identify the face region in the scene image, obtain depth information corresponding to the face region from the depth image, determine the depth range of the person region according to the depth information of the face region, and determine, according to that depth range, the person region that is connected to the face region and falls within the depth range, so as to obtain the person region image.

Specifically, a trained deep learning model can first be used to identify the face region in the scene image, and the depth information of the face region can then be determined from the correspondence between the scene image and the depth image. Because the face region includes features such as the nose, eyes, ears and lips, the depth data corresponding to each of these features in the depth image differ. For example, when the face is facing the depth image acquisition component 12, in the depth image captured by the depth image acquisition component 12 the depth data corresponding to the nose may be smaller and the depth data corresponding to the ears may be larger. The depth information of the face region may therefore be a single value or a range of values. When the depth information of the face region is a single value, that value can be obtained by averaging the depth data of the face region, or by taking the median of the depth data of the face region.
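The reduction of the face-region depth data to a single value can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `face_depth_value`, the bounding-box layout `(top, left, bottom, right)` and the convention that a depth of 0 marks an invalid measurement are all assumptions made for the example.

```python
from statistics import mean, median


def face_depth_value(depth_image, face_box, method="mean"):
    """Reduce the depth data inside a face bounding box to one value.

    depth_image: list of rows of depth values; 0 is treated as "no valid
                 measurement" (an assumption of this sketch).
    face_box:    (top, left, bottom, right), with bottom/right exclusive
                 (hypothetical layout).
    method:      "mean" for averaging, anything else for the median.
    """
    top, left, bottom, right = face_box
    samples = [
        depth_image[r][c]
        for r in range(top, bottom)
        for c in range(left, right)
        if depth_image[r][c] > 0
    ]
    if not samples:
        raise ValueError("face region contains no valid depth data")
    return mean(samples) if method == "mean" else median(samples)
```

The median is less sensitive than the mean to a few outlying depth samples, for example background pixels that fall inside the face bounding box.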

Because the person region includes the face region, that is, the person region and the face region lie within a common depth range, once the processor 20 has determined the depth information of the face region it can set the depth range of the person region according to that depth information, and then extract the person region that falls within the depth range and is connected to the face region, so as to obtain the person region image.

In this way, the person region image can be extracted from the scene image according to depth information. Because the acquisition of depth information is not affected by factors such as illumination and color temperature in the environment, the extracted person region image is more accurate.
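One way to picture the extraction of pixels that are both connected to the face region and inside the depth range is a region-growing pass over the depth image. The sketch below is only illustrative: the function name `extract_person_mask`, the symmetric `margin` around the face depth and the use of 4-connectivity are assumptions, since the text does not state how the depth range is set or how connectivity is tested.

```python
from collections import deque


def extract_person_mask(depth_image, face_pixels, face_depth, margin):
    """Grow a person mask outward from the face region.

    A pixel joins the mask when it is 4-connected to a pixel already in the
    mask and its depth lies in [face_depth - margin, face_depth + margin].
    face_pixels is a list of (row, col) seeds belonging to the face region.
    """
    rows, cols = len(depth_image), len(depth_image[0])
    near, far = face_depth - margin, face_depth + margin
    mask = [[False] * cols for _ in range(rows)]
    queue = deque()
    for r, c in face_pixels:
        mask[r][c] = True
        queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and not mask[nr][nc]:
                if near <= depth_image[nr][nc] <= far:
                    mask[nr][nc] = True
                    queue.append((nr, nc))
    return mask
```

A pixel whose depth is in range but that is not reachable from the face region stays outside the mask, which is what keeps unrelated objects at a similar distance out of the person region image.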

Referring again to Figure 11, in some embodiments, step 013 of processing the scene image and the depth image to extract the person region of the current user in the scene image and obtain the person region image further includes:

0135: processing the scene image to obtain the full-field edge image of the scene image; and

0136: correcting the person region image according to the full-field edge image of the scene image.

Referring again to Fig. 2, in some embodiments, step 0135 and step 0136 can be implemented by the processor 20. In other words, the processor 20 can also be used to process the scene image to obtain the full-field edge image of the scene image, and to correct the person region image according to the full-field edge image of the scene image.

The processor 20 first performs edge extraction on the scene image to obtain the full-field edge image of the scene image, in which the edge lines include the edge lines of the current user and of the background objects in the scene where the current user is located. Specifically, edge extraction can be performed on the scene image with the Canny operator. The core of the Canny edge extraction algorithm mainly includes the following steps: first, the scene image is convolved with a 2D Gaussian filter template to eliminate noise; next, a differential operator is used to obtain the gradient value of the gray level of each pixel, the gradient direction of the gray level of each pixel is calculated from the gradient values, and the pixels adjacent to a given pixel along its gradient direction can be found from the gradient direction; then each pixel is traversed, and if the gray value of a pixel is not the maximum compared with the gray values of the two adjacent pixels before and after it along its gradient direction, the pixel is considered not to be an edge point. In this way, the pixels at edge positions in the scene image can be determined, giving the full-field edge image of the scene image after edge extraction.
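The three steps described above (Gaussian smoothing, gradient computation, and suppression of pixels that are not maximal along the gradient direction) can be sketched in plain Python. Several choices here are assumptions rather than details from the text: a 3x3 Gaussian template, Sobel kernels as the differential operator, clamped borders, and a comparison of gradient magnitudes (as in the standard Canny formulation) rather than raw gray values. The hysteresis thresholding of the full Canny algorithm is not part of the description and is omitted.

```python
import math

GAUSS = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]        # 3x3 Gaussian template, sum 16
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal derivative
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical derivative


def convolve3(img, kernel, scale=1.0):
    """Apply a 3x3 kernel to a 2D list, clamping coordinates at the borders."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    acc += img[rr][cc] * kernel[dr + 1][dc + 1]
            out[r][c] = acc * scale
    return out


def edge_candidates(gray):
    """Return a boolean map of pixels that survive non-maximum suppression."""
    smooth = convolve3(gray, GAUSS, 1 / 16)           # step 1: remove noise
    gx = convolve3(smooth, SOBEL_X)                   # step 2: gradients
    gy = convolve3(smooth, SOBEL_Y)
    rows, cols = len(gray), len(gray[0])
    mag = [[math.hypot(gx[r][c], gy[r][c]) for c in range(cols)]
           for r in range(rows)]
    keep = [[False] * cols for _ in range(rows)]
    for r in range(1, rows - 1):                      # step 3: traverse pixels
        for c in range(1, cols - 1):
            if mag[r][c] == 0:
                continue
            angle = math.degrees(math.atan2(gy[r][c], gx[r][c])) % 180
            if angle < 22.5 or angle >= 157.5:
                dr, dc = 0, 1                         # gradient roughly horizontal
            elif angle < 67.5:
                dr, dc = 1, 1
            elif angle < 112.5:
                dr, dc = 1, 0                         # gradient roughly vertical
            else:
                dr, dc = 1, -1
            # Keep the pixel only if it is not weaker than both neighbours
            # along the gradient direction.
            if mag[r][c] >= mag[r + dr][c + dc] and mag[r][c] >= mag[r - dr][c - dc]:
                keep[r][c] = True
    return keep
```

Border rows and columns are never marked in this sketch, because their neighbours along the gradient direction would fall outside the image.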

After obtaining the full-field edge image of the scene image, the processor 20 corrects the person region image according to the full-field edge image of the scene image. It can be understood that the person region image is obtained by merging all pixels in the scene image that are connected to the face region and fall within the set depth range; in some scenes, there may be objects that are also connected to the face region and fall within the depth range. Therefore, to make the extracted person region image more accurate, the full-field edge image of the scene image can be used to correct the person region image.

Further, the processor 20 can also apply a secondary correction to the corrected person region image, for example by dilating the corrected person region image, which enlarges it so as to retain the edge details of the person region image.
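The dilation used for the secondary correction can be illustrated with a binary mask of the person region. This is a sketch under assumptions: the mask representation, the 3x3 structuring element and the `iterations` parameter are choices made for the example and are not specified in the text.

```python
def dilate(mask, iterations=1):
    """Binary dilation of a 2D boolean mask with a 3x3 structuring element.

    Each pass turns on every pixel that touches (including diagonally) a
    pixel that is already on, which grows the region by one pixel per pass.
    """
    rows, cols = len(mask), len(mask[0])
    for _ in range(iterations):
        grown = [row[:] for row in mask]
        for r in range(rows):
            for c in range(cols):
                if mask[r][c]:
                    for dr in (-1, 0, 1):
                        for dc in (-1, 0, 1):
                            rr, cc = r + dr, c + dc
                            if 0 <= rr < rows and 0 <= cc < cols:
                                grown[rr][cc] = True
        mask = grown
    return mask
```

Growing the mask slightly trades a thin halo of background pixels for a lower chance of clipping hair or clothing edges.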

Referring to Figure 12, in some embodiments, step 014 of fusing the person region image with the predetermined three-dimensional background image to obtain the merged image includes:

01411: obtaining the predetermined fusion region in the predetermined three-dimensional background image;

01412: determining the pixel region to be replaced in the predetermined fusion region according to the person region image; and

01413: replacing the pixel region to be replaced in the predetermined fusion region with the person region image to obtain the merged image.

Referring again to Fig. 2, in some embodiments, steps 01411, 01412 and 01413 can be implemented by the processor 20.

In other words, the processor 20 can be further used to obtain the predetermined fusion region in the predetermined three-dimensional background image, determine the pixel region to be replaced in the predetermined fusion region according to the person region image, and replace the pixel region to be replaced in the predetermined fusion region with the person region image to obtain the merged image.

It can be understood that when the predetermined three-dimensional background image is obtained by modeling an actual scene, the depth data corresponding to each pixel in the predetermined three-dimensional background image can be obtained directly during modeling; when the predetermined three-dimensional background image is produced by animation, the depth data corresponding to each pixel can be set by the producer. In addition, each object present in the predetermined three-dimensional background image is known. Therefore, before the predetermined three-dimensional background image is used for image fusion, the fusion position of the person region image, namely the predetermined fusion region, can first be calibrated according to the depth data and the objects present in the predetermined three-dimensional background image. Because the size of the person region image captured by the visible light camera 11 is affected by the acquisition distance (the person region image is larger when the acquisition distance is short and smaller when it is long), the processor 20 needs to determine the pixel region to be replaced in the predetermined fusion region according to the size of the person region image actually captured by the visible light camera 11. The pixel region to be replaced in the predetermined fusion region is then replaced with the person region image to obtain the merged image. In this way, the person region image is fused with the predetermined three-dimensional background image.
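The replacement step can be sketched as copying the masked pixels of the person region image into a rectangle of the background. The sketch makes several assumptions that the text does not specify: the fusion region is an axis-aligned rectangle `(top, left, height, width)` lying inside the background, the person image is aligned to the bottom centre of that rectangle, and images are plain 2D lists of pixel values.

```python
def fuse(background, person, person_mask, fusion_region):
    """Replace background pixels with person pixels inside a fusion region.

    background:    2D list of pixels (left unmodified; a copy is returned).
    person:        2D list of pixels of the person region image.
    person_mask:   2D list of booleans, True where `person` holds the person.
    fusion_region: (top, left, height, width) calibrated in the background.
    The size of `person` decides which pixels of the region are replaced.
    """
    top, left, height, width = fusion_region
    p_rows, p_cols = len(person), len(person[0])
    row0 = top + height - p_rows           # align person with region bottom
    col0 = left + (width - p_cols) // 2    # centre person horizontally
    merged = [row[:] for row in background]
    for r in range(p_rows):
        for c in range(p_cols):
            br, bc = row0 + r, col0 + c
            inside = top <= br < top + height and left <= bc < left + width
            if inside and person_mask[r][c]:
                merged[br][bc] = person[r][c]
    return merged
```

A larger person image (short acquisition distance) replaces more of the fusion region, and a smaller one (long acquisition distance) replaces less, which mirrors the size dependence described above.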

Referring to Figure 13, in some embodiments, step 014 of fusing the person region image with the predetermined three-dimensional background image to obtain the merged image includes:

01421: processing the predetermined three-dimensional background image to obtain the full-field edge image of the predetermined three-dimensional background image;

01422: obtaining the depth data of the predetermined three-dimensional background image;

01423: determining the calculated fusion region of the predetermined three-dimensional background image according to the full-field edge image and the depth data of the predetermined three-dimensional background image;

01424: determining the pixel region to be replaced in the calculated fusion region according to the person region image; and

01425: replacing the pixel region to be replaced in the calculated fusion region with the person region image to obtain the merged image.

Referring again to Fig. 2, in some embodiments, steps 01421, 01422, 01423, 01424 and 01425 can be implemented by the processor 20.

In other words, the processor 20 can be further used to process the predetermined three-dimensional background image to obtain its full-field edge image, obtain the depth data of the predetermined three-dimensional background image, determine the calculated fusion region of the predetermined three-dimensional background image according to its full-field edge image and depth data, determine the pixel region to be replaced in the calculated fusion region according to the person region image, and replace the pixel region to be replaced in the calculated fusion region with the person region image to obtain the merged image.

It can be understood that if the fusion position of the person region image has not been calibrated in advance when the predetermined three-dimensional background image is fused with the person region image, the processor 20 first needs to determine the fusion position of the person region image in the predetermined three-dimensional background image. Specifically, the processor 20 first performs edge extraction on the predetermined three-dimensional background image to obtain its full-field edge image, and obtains the depth data of the predetermined three-dimensional background image, where the depth data are obtained during the modeling or animation production of the predetermined three-dimensional background image. The processor 20 then determines the calculated fusion region in the predetermined three-dimensional background image according to the full-field edge image and the depth data of the predetermined three-dimensional background image. Because the size of the person region image is affected by the acquisition distance of the visible light camera 11, the size of the person region image needs to be calculated, and the pixel region to be replaced in the calculated fusion region is determined according to that size. Finally, the pixel region to be replaced in the calculated fusion region is replaced with the person region image to obtain the merged image. In this way, the person region image is fused with the predetermined three-dimensional background image.

The merged image obtained by fusion can be displayed on the display 50 (shown in Figure 14) of the electronic device 1000.

In some embodiments, the person region image can be a two-dimensional person region image or a three-dimensional person region image. The processor 20 can use the depth information in the depth image to extract the two-dimensional person region image from the scene image. The processor 20 can also build a three-dimensional image of the person region according to the depth information in the depth image, and then fill in the color of the three-dimensional person region using the color information in the scene image to obtain a three-dimensional color person region image.

In some embodiments, the predetermined three-dimensional background image can contain one or more predetermined fusion regions or calculated fusion regions. When there is one predetermined fusion region, the fusion position of the two-dimensional or three-dimensional person region image in the predetermined three-dimensional background image is that unique predetermined fusion region; when there is one calculated fusion region, the fusion position of the two-dimensional or three-dimensional person region image in the predetermined three-dimensional background image is that unique calculated fusion region. When there are multiple predetermined fusion regions, the fusion position of the two-dimensional or three-dimensional person region image in the predetermined three-dimensional background image can be any one of them; further, because the three-dimensional person region image carries depth information, the predetermined fusion region whose depth information matches that of the three-dimensional person region image can be selected from among them as the fusion position, to obtain a better fusion effect. When there are multiple calculated fusion regions, the fusion position of the two-dimensional or three-dimensional person region image in the predetermined three-dimensional background image can be any one of them; further, because the three-dimensional person region image carries depth information, the calculated fusion region whose depth information matches that of the three-dimensional person region image can be selected from among them as the fusion position, to obtain a better fusion effect.
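The selection among several fusion regions can be sketched as a nearest-depth lookup. The dictionary keys `"name"` and `"depth"`, the single representative depth per region and the "closest depth wins" rule are assumptions of this example; the text only says that the depth information should match.

```python
def pick_fusion_region(regions, person_depth=None):
    """Choose a fusion region from a list of candidates.

    regions:      list of dicts with hypothetical keys "name" and "depth".
    person_depth: representative depth of a three-dimensional person region
                  image, or None for a two-dimensional one.
    """
    if not regions:
        raise ValueError("no fusion region available")
    if person_depth is None:
        # A two-dimensional person image carries no depth, so any region is
        # acceptable; this sketch simply takes the first one.
        return regions[0]
    # A three-dimensional person image is matched to the closest depth.
    return min(regions, key=lambda reg: abs(reg["depth"] - person_depth))
```

With a single candidate the function returns that region in both cases, which corresponds to the "unique fusion region" situation.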

In some application scenarios, for example when the current user wishes to hide the current background during a video call with another person, the image processing method of the embodiments of the present invention can be used to fuse the person region image corresponding to the current user with the predetermined three-dimensional background image, and the merged image is then shown to the other party. Because the current user is in a video call with the other party, the visible light camera 11 needs to capture the scene image of the current user in real time, the depth image acquisition component 12 also needs to acquire the depth image corresponding to the current user in real time, and the processor 20 needs to process the scene images and depth images acquired in real time promptly, so that the other party sees a smooth video picture composed of multiple frames of merged images. After the processor 20 obtains the sound characteristic, it fuses each of the multiple frames of person region images, obtained by processing multiple frames of scene images and depth images, with the switched predetermined three-dimensional background image, obtaining multiple frames of merged images. The multiple frames of merged images are combined into the video picture.

Referring to Fig. 3 and Figure 14 together, the embodiments of the present invention also propose an electronic device 1000. The electronic device 1000 includes an image processing apparatus 100. The image processing apparatus 100 can be implemented in hardware and/or software. The image processing apparatus 100 includes an imaging device 10 and a processor 20.

The imaging device 10 includes a visible light camera 11 and a depth image acquisition component 12.

Specifically, the visible light camera 11 includes an image sensor 111 and a lens 112, and can be used to capture the color information of the current user to obtain the scene image. The image sensor 111 includes a color filter array (such as a Bayer filter array), and there can be one or more lenses 112. While the visible light camera 11 is acquiring the scene image, each imaging pixel in the image sensor 111 senses the light intensity and wavelength information of the photographed scene and generates a set of raw image data. The image sensor 111 sends this set of raw image data to the processor 20, and the processor 20 obtains the color scene image after performing operations such as denoising and interpolation on the raw image data. The processor 20 can process each image pixel in the raw image data one by one in various formats; for example, each image pixel can have a bit depth of 8, 10, 12 or 14 bits, and the processor 20 can process each image pixel at the same or a different bit depth.

The depth image acquisition component 12 includes a structured light projector 121 and a structured light camera 122, and can be used to capture the depth information of the current user to obtain the depth image. The structured light projector 121 is used to project structured light onto the current user, where the structured light pattern can be laser stripes, Gray code, sinusoidal fringes, a randomly arranged speckle pattern, or the like. The structured light camera 122 includes an image sensor 1221 and a lens 1222, and there can be one or more lenses 1222. The image sensor 1221 is used to capture the structured light image projected onto the current user by the structured light projector 121. The structured light image can be sent by the depth image acquisition component 12 to the processor 20 for processing such as demodulation, phase recovery and phase information calculation to obtain the depth information of the current user.

In some embodiments, the functions of the visible light camera 11 and the structured light camera 122 can be implemented by a single camera. In other words, the imaging device 10 includes only one camera and one structured light projector 121, and that camera can capture both the scene image and the structured light image.

Besides using structured light, the depth image of the current user can also be obtained by other depth acquisition methods such as binocular vision or time of flight (Time Of Flight, TOF).

The processor 20 is further used to fuse the person region image extracted from the scene image and the depth image with the predetermined three-dimensional background image. When fusing the person region image with the predetermined three-dimensional background image, either the two-dimensional person region image or the three-dimensional color person region image can be fused with the predetermined three-dimensional background image to obtain the merged image.

In addition, the image processing apparatus 100 also includes a memory 30. The memory 30 can be embedded in the electronic device 1000 or can be a memory independent of the electronic device 1000, and may include a direct memory access (Direct Memory Access, DMA) feature. The raw image data captured by the visible light camera 11, or the structured-light-image-related data captured by the depth image acquisition component 12, can be transferred to the memory 30 for storage or caching. The processor 20 can read the raw image data from the memory 30 and process it to obtain the scene image, and can also read the structured-light-image-related data from the memory 30 and process it to obtain the depth image. The scene image and the depth image can also be stored in the memory 30 so that the processor 20 can call them for processing at any time. For example, the processor 20 calls the scene image and the depth image to extract the person region, and fuses the extracted person region image with the initial predetermined three-dimensional background image or with the switched predetermined three-dimensional background image to obtain the merged image. The predetermined three-dimensional background image and the merged image may also be stored in the memory 30.

The image processing apparatus 100 may also include a display 50. The display 50 can obtain the merged image directly from the processor 20, or can obtain it from the memory 30. The display 50 displays the merged image for the user to view, or the merged image is further processed by a graphics engine or a graphics processing unit (Graphics Processing Unit, GPU). The image processing apparatus 100 also includes an encoder/decoder 60, which can encode and decode image data such as the scene image, the depth image, the predetermined three-dimensional background image and the merged image. The encoded image data can be saved in the memory 30 and decompressed by the decoder before the image is shown on the display 50. The encoder/decoder 60 can be implemented by a central processing unit (Central Processing Unit, CPU), a GPU or a coprocessor. In other words, the encoder/decoder 60 can be any one or more of a CPU, a GPU and a coprocessor.

The image processing apparatus 100 also includes a control logic device 40. When the imaging device 10 is imaging, the processor 20 can analyze the data obtained by the imaging device 10 to determine image statistics for one or more control parameters of the imaging device 10 (for example, exposure time). The processor 20 sends the image statistics to the control logic device 40, and the control logic device 40 controls the imaging device 10 to image with the determined control parameters. The control logic device 40 may include a processor and/or a microcontroller that executes one or more routines (such as firmware). The one or more routines can determine the control parameters of the imaging device 10 according to the received image statistics.

The image processing apparatus 100 also includes an acoustoelectric element 70. The acoustoelectric element 70 converts sound into an electric current output using the principle of electromagnetic induction. When the current user speaks, the air inside the acoustoelectric element 70 is driven to vibrate, causing a slight displacement between the coil and the magnetic core inside the acoustoelectric element 70, so that the coil cuts magnetic induction lines and a current is produced. The acoustoelectric element 70 sends the current to the processor 20, and the processor 20 processes the current to generate sound information. The sound information can be further processed by the processor 20 to obtain the sound characteristic, and can also be sent to the memory 30 for storage. The acoustoelectric element 70 can be a microphone.
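As an illustration of how sound information might be reduced to sound characteristics such as loudness and pitch, the sketch below uses two simple estimators. Both are assumptions chosen for brevity and are not taken from the text: loudness is measured as the root-mean-square amplitude of the samples, and pitch is estimated by counting upward zero crossings, which is only reasonable for clean, roughly periodic signals.

```python
import math


def loudness_rms(samples):
    """Root-mean-square amplitude of a block of audio samples."""
    if not samples:
        raise ValueError("no samples")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def pitch_zero_crossing(samples, sample_rate):
    """Rough pitch estimate in Hz from the number of upward zero crossings.

    One upward crossing is counted each time the signal goes from a negative
    sample to a non-negative one; a periodic tone has one per cycle.
    """
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    duration = len(samples) / sample_rate
    return crossings / duration
```

For example, one second of a 100 Hz sine tone sampled at 8000 Hz could be generated with `[0.5 * math.sin(2 * math.pi * 100 * n / 8000 + 0.1) for n in range(8000)]` and passed to both functions. Real speech would need a more robust pitch estimator, such as one based on autocorrelation.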

Referring to Figure 15, the electronic device 1000 of the embodiments of the present invention includes one or more processors 20, a memory 30, and one or more programs 31. The one or more programs 31 are stored in the memory 30 and are configured to be executed by the one or more processors 20. The programs 31 include instructions for performing the image processing method of any one of the above embodiments.

For example, the programs 31 include instructions for performing an image processing method with the following steps:

02: obtaining the sound information of the current user;

04: processing the sound information to obtain the sound characteristic; and

As another example, the programs 31 include instructions for performing an image processing method with the following steps:

0131: identifying the face region in the scene image;

The computer-readable storage medium of the embodiments of the present invention includes a computer program used in combination with the electronic device 1000 capable of imaging. The computer program can be executed by the processor 20 to complete the image processing method of any one of the above embodiments.

For example, the computer program can be executed by the processor 20 to complete an image processing method with the following steps:

02: obtaining the sound information of the current user;

04: processing the sound information to obtain the sound characteristic; and

As another example, the computer program can be executed by the processor 20 to complete an image processing method with the following steps:

0131: identifying the face region in the scene image;

In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described can be combined in a suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict, those skilled in the art can combine the different embodiments or examples described in this specification and the features of those different embodiments or examples.

In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first" or "second" can explicitly or implicitly include at least one such feature. In the description of the present invention, "multiple" means at least two, for example two or three, unless otherwise specifically defined.

Any process or method description in a flowchart or otherwise described herein can be understood as representing a module, fragment or portion of code that includes one or more executable instructions for implementing the steps of a specific logical function or process, and the scope of the preferred embodiments of the present invention includes other implementations in which functions may be performed out of the order shown or discussed, including substantially simultaneously or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.

The logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered list of executable instructions for implementing logical functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus or device and execute them). For the purposes of this specification, a "computer-readable medium" can be any means that can contain, store, communicate, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or, if necessary, otherwise suitably processing it, and can then be stored in a computer memory.

It should be understood that each part of the present invention can be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods can be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they can be implemented by any one or a combination of the following techniques well known in the art: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.

Those skilled in the art can understand that all or part of the steps carried by the methods of the above embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, includes one of the steps of the method embodiments or a combination thereof.

In addition, the functional units in the embodiments of the present invention can be integrated in one processing module, or each unit can exist physically on its own, or two or more units can be integrated in one module. The integrated module can be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and is sold or used as an independent product, it can also be stored in a computer-readable storage medium.

The storage medium mentioned above can be a read-only memory, a magnetic disk, an optical disc, or the like. Although embodiments of the present invention have been shown and described above, it can be understood that the above embodiments are exemplary and are not to be construed as limiting the present invention; those of ordinary skill in the art can change, modify, replace and vary the above embodiments within the scope of the present invention.

Claims

1. An image processing method for processing a merged image, the merged image being formed by fusing a predetermined three-dimensional background image with a person region image of a current user in a scene image of a real scene, characterized in that the image processing method includes:

obtaining sound information of the current user;

processing the sound information to obtain a sound characteristic; and

2. The image processing method according to claim 1, characterized in that the image processing method further includes:

obtaining the scene image of the current user;

obtaining a depth image of the current user;

processing the scene image and the depth image to extract a person region of the current user in the scene image and obtain the person region image; and

fusing the person region image with the predetermined three-dimensional background image to obtain the merged image.

3. image processing method according to claim 1, it is characterised in that loudness of the sound property including sound, Any one in tone, tone color.

4. image processing method according to claim 1, it is characterised in that the sound property is described pre- including multiple Determining three-dimensional background image includes multiple, each corresponding predetermined three-dimensional background image of the sound property, the basis The step of sound property switching predetermined three-dimensional background image, includes：

Predetermined three-dimensional background image corresponding with the sound property is switched according to the sound property.

5. The image processing method according to claim 1, characterized in that there are a plurality of predetermined three-dimensional background images stored in a predetermined order, and the step of switching the predetermined three-dimensional background image according to the sound property comprises:

switching among the plurality of predetermined three-dimensional background images in a predetermined manner according to the sound property.
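One possible reading of the ordered switching of claim 5 is sketched below, without limitation: the backgrounds are kept in a list, and each sufficiently loud sound advances to the next one, wrapping around at the end. The class name `BackgroundCycler`, the loudness trigger, and the threshold value are assumptions; the claim leaves the predetermined manner open.

```python
class BackgroundCycler:
    """Steps through backgrounds in their stored order when a sound is loud."""

    def __init__(self, backgrounds):
        self.backgrounds = list(backgrounds)
        self.index = 0

    def on_sound(self, loudness, threshold=0.2):
        """Advance to the next background if loudness reaches the threshold."""
        if loudness >= threshold:
            self.index = (self.index + 1) % len(self.backgrounds)
        return self.backgrounds[self.index]


cycler = BackgroundCycler(["beach", "forest", "city"])
history = [cycler.on_sound(level) for level in (0.5, 0.1, 0.3, 0.9)]
print(history)
```

Other predetermined manners, such as stepping backwards on a low pitch or jumping by more than one position, would fit the same structure.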

6. The image processing method according to claim 2, characterized in that the step of obtaining the depth image of the active user comprises:

projecting structured light onto the active user;

capturing a structured light image modulated by the active user; and

demodulating phase information corresponding to each pixel of the structured light image to obtain the depth image.

7. The image processing method according to claim 6, characterized in that the step of demodulating the phase information corresponding to each pixel of the structured light image to obtain the depth image comprises:

demodulating the phase information corresponding to each pixel in the structured light image;

converting the phase information into depth information; and

generating the depth image according to the depth information.
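The three steps of claim 7 can be illustrated, without limitation, by the sketch below. It swaps in two techniques that the claims do not specify: four-step phase shifting, which assumes four captured frames with intensity I_k = A + B cos(phase + k·π/2), and a linear phase-to-depth calibration with a hypothetical `scale` and `reference_phase`. Phase unwrapping is omitted, and all function names are illustrative.

```python
import math


def demodulate_phase(i0, i1, i2, i3):
    """Wrapped phase of one pixel from four frames shifted by 90 degrees each.

    Assumes the intensity model I_k = A + B * cos(phase + k * pi / 2).
    """
    return math.atan2(i3 - i1, i0 - i2)


def phase_to_depth(phase, reference_phase, scale):
    """Hypothetical linear calibration: depth proportional to phase offset."""
    return scale * (phase - reference_phase)


def build_depth_image(frames, reference_phase, scale):
    """frames holds four same-sized 2D intensity grids, one per phase shift."""
    f0, f1, f2, f3 = frames
    rows, cols = len(f0), len(f0[0])
    return [
        [
            phase_to_depth(
                demodulate_phase(f0[r][c], f1[r][c], f2[r][c], f3[r][c]),
                reference_phase,
                scale,
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Synthesize a 1x2 scene with known phases to check the round trip.
true_phase = [[0.7, 1.2]]
frames = [
    [[2.0 + math.cos(p + k * math.pi / 2) for p in row] for row in true_phase]
    for k in range(4)
]
depth = build_depth_image(frames, reference_phase=0.2, scale=10.0)
```

With this intensity model, the differences i3 - i1 and i0 - i2 equal 2B·sin(phase) and 2B·cos(phase) respectively, so the constant offset A cancels and `atan2` recovers the wrapped phase.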

8. The image processing method according to claim 2, characterized in that the step of fusing the person region image with the predetermined three-dimensional background image to obtain the merged image comprises:

obtaining a predetermined fusion region in the predetermined three-dimensional background image;

determining a pixel region to be replaced in the predetermined fusion region according to the person region image; and

replacing the pixel region to be replaced in the predetermined fusion region with the person region image to obtain the merged image.
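The replacement step of claim 8 can be sketched, without limitation, as follows. The sketch assumes that the predetermined fusion region is identified by the row and column of its top-left corner in the background, and that pixels outside the person are marked `None` in the person region image; both conventions, and the name `merge_into_region`, are hypothetical.

```python
def merge_into_region(background, person, region_top, region_left):
    """Replace background pixels inside the fusion region with person pixels.

    Only positions where the person image holds a real pixel are replaced;
    None entries leave the background untouched.
    """
    merged = [row[:] for row in background]  # copy so the input is preserved
    for r, person_row in enumerate(person):
        for c, pixel in enumerate(person_row):
            if pixel is not None:
                merged[region_top + r][region_left + c] = pixel
    return merged


background = [["B"] * 4 for _ in range(3)]
person = [[None, "p"], ["p", "p"]]
merged = merge_into_region(background, person, region_top=1, region_left=2)
for row in merged:
    print("".join(row))
```

Here the set of non-`None` positions plays the role of the pixel region to be replaced, so its shape follows the outline of the person rather than the rectangular bounds of the fusion region.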

9. The image processing method according to claim 2, characterized in that the step of fusing the person region image with the predetermined three-dimensional background image to obtain the merged image comprises:

processing the predetermined three-dimensional background image to obtain a full-field edge image of the predetermined three-dimensional background image;

obtaining depth data of the predetermined three-dimensional background image;

determining a calculated fusion region of the predetermined three-dimensional background image according to the full-field edge image and the depth data of the predetermined three-dimensional background image;

determining a pixel region to be replaced in the calculated fusion region according to the person region image; and

replacing the pixel region to be replaced in the calculated fusion region with the person region image to obtain the merged image.

10. An image processing apparatus for processing a merged image, the merged image being formed by fusing a predetermined three-dimensional background image with a person region image of an active user in a scene image of a real scene, characterized in that the image processing apparatus comprises:

an acoustoelectric element configured to obtain acoustic information of the active user; and

a processor configured to:

process the acoustic information to obtain a sound property; and

switch the predetermined three-dimensional background image according to the sound property.

11. The image processing apparatus according to claim 10, characterized in that the image processing apparatus further comprises:

a visible light camera configured to obtain a scene image of the active user;

a depth image acquisition component configured to obtain a depth image of the active user;

the processor being further configured to:

12. The image processing apparatus according to claim 10, characterized in that the sound property comprises any one of the loudness, the pitch, and the timbre of a sound.

13. The image processing apparatus according to claim 10, characterized in that there are a plurality of sound properties and a plurality of predetermined three-dimensional background images, each sound property corresponds to one predetermined three-dimensional background image, and the processor is further configured to:

14. The image processing apparatus according to claim 10, characterized in that there are a plurality of predetermined three-dimensional background images stored in a predetermined order, and the processor is further configured to:

15. The image processing apparatus according to claim 11, characterized in that the depth image acquisition component comprises a structured light projector and a structured light camera, the structured light projector being configured to project structured light onto the active user;

the structured light camera being configured to:

16. The image processing apparatus according to claim 15, characterized in that the structured light camera is further configured to:

convert the phase information into depth information; and

generate the depth image according to the depth information.

17. The image processing apparatus according to claim 11, characterized in that the processor is further configured to:

18. The image processing apparatus according to claim 11, characterized in that the processor is further configured to:

19. An electronic device, characterized in that the electronic device comprises:

one or more processors;

a memory; and

one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the image processing method according to any one of claims 1 to 9.

20. A computer-readable storage medium, characterized in that it comprises a computer program for use in combination with an electronic device capable of capturing images, the computer program being executable by a processor to carry out the image processing method according to any one of claims 1 to 9.