CN1466737A - Image conversion and encoding techniques - Google Patents

Image conversion and encoding techniques

Info

Publication number
CN1466737A
CN1466737A CNA018162142A CN01816214A
Authority
CN
China
Prior art keywords
depth
pixel
algorithm
image
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA018162142A
Other languages
Chinese (zh)
Inventor
P. V. Harman
S. R. Fox
M. R. Dorey
J. C. Flack
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DYNAMIC DIGITAL RESEARCH Pty Ltd
Original Assignee
DYNAMIC DIGITAL RESEARCH Pty Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AUPQ9292 (AUPQ929200A0)
Application filed by DYNAMIC DIGITAL RESEARCH Pty Ltd filed Critical DYNAMIC DIGITAL RESEARCH Pty Ltd
Publication of CN1466737A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

A method of creating a depth map including the steps of: assigning a depth to at least one pixel or portion of an image; determining relative location and image characteristics for each at least one pixel or portion of the image; utilising the depth(s), image characteristics and respective locations to determine an algorithm to ascertain depth characteristics as a function of relative location and image characteristics; and utilising said algorithm to calculate a depth characteristic for each pixel or portion of the image, wherein the depth characteristics form a depth map for the image. In a second phase of processing, said depth maps form key frames for the generation of depth maps for non-key frames, using relative location, image characteristics and distance to the key frame(s).

Description

Image conversion and encoding techniques
Field of the invention
The present invention relates to an improved technique for deriving depth maps from one or more 2D images.
Background of the invention
A number of image processing tasks require that the depth of objects within an image be known. Such tasks include the application of special effects to film and video sequences, and the conversion of 2D images into stereoscopic 3D. The process of determining the depth of objects within an image is referred to as creating a depth map. In a depth map each object is coloured a shade of grey, the shade representing the depth of the object from a fixed point. Typically, a distant object will be coloured a dark shade, while a nearby object will be coloured more brightly. Other conventions for creating depth maps also exist: the shading may be reversed, or different colours may be used to represent different depths. For the purposes of explanation in this document, distant objects are coloured darker than nearer objects, and the colouring is a greyscale.
Traditionally, depth maps have been created manually from existing 2D images. It will be appreciated that while an operator can resolve objects and their associated depths, to a computer an image is merely a series of pixels.
The creation of a depth map has involved a system whereby an operator manually outlines each object to be converted, and the system assigns a depth to that object. It will be appreciated that this process is slow, time consuming and expensive. The outlining step is typically performed using a software program in combination with a mouse. An example of a software program capable of performing this task is Adobe "After Effects". An operator using After Effects will typically trace an outline around each object requiring a depth assignment, and then fill or "colour" the object with a shade of grey determined by its depth, or distance, from the observer. This process is then repeated for every object in the image. Further, where a number of images are involved, as in a film, these steps must be performed on every image, or frame, of the film.
In traditional systems the outline of an object is usually described in the form of a curve, such as a Bezier curve. The use of such curves enables the operator to alter the shape of the outline so that it can be accurately aligned with the object.
If depths are to be assigned to a series of images, such as a film or video, the process must be repeated for each successive frame.
The size, position and/or depth of an object may change from frame to frame. In this case the operator must manually track the object in each frame, adjust the curves accordingly, and update the object's depth by changing the shade of grey as required. It will be appreciated that this is a slow, tedious, time consuming and expensive process.
Attempts have been made in the past to improve this process. The prior art describes techniques intended to automatically track the outline of an object as it moves from frame to frame. One example of such a technique is the application of Active Contours (reference: Active Contours - Andrew Blake and Michael Isard - ISBN 3-540-76217-5). The main limitation of this approach is that the software applying the technique must be provided with the anticipated motion of the object being tracked. This becomes a significant restriction when the anticipated motion is unknown or involves complex deformation, or when multiple objects with different motion characteristics need to be tracked simultaneously.
Point-based tracking has also been used to determine the motion of outlines. Such facilities are common in editing environments such as Commotion and After Effects. Their usefulness is very limited, however, because it is generally not possible to identify suitable tracking points whose motion reflects the motion of the object as a whole. Point tracking is sometimes acceptable when an object undergoes simple translation, but it cannot handle warping, occlusion or a variety of other common problems.
An Israeli company, AutoMedia, has produced a software product called AutoMasker. It enables an operator to outline an object and tracks the outline from frame to frame. The product relies on tracking the colour of objects, and therefore fails when similarly coloured objects intersect. It also has difficulty tracking an object whose size changes over successive frames, for example when the object approaches the observer, i.e. moves forward on the screen.
None of these approaches can assign or track depth maps satisfactorily, and manual systems are therefore still used for the creation of depth maps.
Other techniques described in the prior art rely on reconstructing the motion of the camera originally used to record the 2D sequence. These techniques are limited in that they require camera motion in the original image sequence, together with clearly defined features in each frame that can serve as tracking points.
Objects of the invention
At present, an operator must manually create a depth map for every image frame in order to obtain acceptable results. It is an object of the present invention to reduce the number of frames for which depths must be created manually, and thereby reduce the time required by an operator to create the depth maps.
A number of frames will still require manually created depth maps. It is a further object of the present invention to assist the manual process of creating depth maps for these frames.
Summary of the invention
With the above objects in mind, the present invention provides a method of creating a depth map, including the steps of:
assigning a depth to at least one pixel or portion of an image;
determining the relative location and image characteristics of each said at least one pixel or portion of the image;
utilising the depth(s), image characteristics and respective relative locations to determine a configuration of a first algorithm that ascertains depth characteristics as a function of relative location and image characteristics;
utilising said first algorithm to calculate a depth characteristic for each pixel or portion of the image;
wherein said depth characteristics form a depth map for the image.
In a further aspect the present invention provides a method of creating a depth map, including the steps of:
assigning a depth to at least one pixel or portion of an image;
determining the x,y coordinates and image characteristics of each said at least one pixel or portion of the image;
utilising the depth(s), image characteristics and respective x,y coordinates to determine a first algorithm that ascertains depth characteristics as a function of x,y coordinates and image characteristics;
utilising said first algorithm to calculate a depth characteristic for each pixel or portion of the image;
wherein said depth characteristics form a depth map for the image.
In yet a further aspect the present invention provides a method of creating a series of depth maps for an image sequence, including the steps of:
receiving a depth map for at least one frame of said image sequence;
utilising said depth map to determine a configuration of an algorithm that ascertains depth characteristics as a function of relative location and image characteristics;
utilising said algorithm to create a depth map for each frame of said image sequence.
In still a further aspect the present invention provides a method of creating a series of depth maps for an image sequence, including the steps of:
selecting at least one key frame from said image sequence;
for each said at least one key frame, assigning a depth to at least one pixel or portion of that frame;
determining the relative location (for example, the x,y coordinates) and image characteristics of each said at least one pixel or portion of each said frame;
utilising the depths, image characteristics and relative locations of each said at least one frame to determine a configuration of an algorithm for each said at least one frame that ascertains depth characteristics as a function of relative location and image characteristics;
utilising each said configuration of said algorithm to calculate a depth characteristic for each pixel or portion of each said at least one frame;
wherein said depth characteristics form a depth map for each said at least one frame;
utilising each said depth map to determine a second configuration of a second algorithm that ascertains the depth characteristics of each frame as a function of relative location and image characteristics;
utilising said second algorithm to create a depth map for every frame of said image sequence.
It will be appreciated that the system embodying the algorithm may in fact create a number of different functions to ascertain the depth map as a function of relative location and image characteristics. In the preferred system the relative location will be a measure of the x,y coordinates.
A system implementing the present invention may predetermine which frames of a sequence are to be treated as key frames, for example every fifth frame. Ideally, the algorithm may also consider time as an input, to further refine the processing.
Overview of the invention
The present invention is intended to improve the process of producing depth maps associated with 2D images. The preferred embodiment involves two phases: the production of depth maps for key frames, and the production of the remaining depth maps.
The first phase obtains a small amount of data from the user. This data represents the basic structure of the scene. From the 2D image and its associated data, an algorithm is generated that captures the relationship between the depths z assigned by the user to various image pixels, their x and y positions, and their image characteristics. The image characteristics include, but are not limited to, the RGB value of each pixel. In general terms, the algorithm solves the equation z = f(x, y, R, G, B) for each pixel the user has defined within the frame.
The relationships captured by the algorithm are then applied to the remaining pixels of the image to produce a depth map. If required, the user may refine the data to improve the accuracy of the depth map. It should be noted that the initial depth data need not be specified by the user - it may be determined by some other process, including but not limited to automatic structure-from-motion algorithms or depth estimates obtained from stereo images.
The second phase requires that 2D images and associated depth maps be provided at selected key frames. The depth maps at these key frames may be produced as previously disclosed by the applicant, or generated automatically using depth capture techniques including, but not limited to, laser range finders, i.e. LIDAR (Light Detection And Ranging) devices, and depth-from-focus techniques.
From the 2D image and the associated depth map of each key frame, an algorithm may be derived that captures the relationship between the depth z assigned to each pixel, its x and y position, and its image characteristics. The image characteristics include, but are not limited to, the RGB value of each pixel. In general terms, the algorithm solves the equation z = f(x, y, R, G, B) for each pixel in the key frame.
The algorithm is then presented with each successive frame lying between adjacent key frames, and the value of z is calculated for every pixel using the algorithm.
Description of the drawings
Fig. 1 shows an embodiment of the phase one training process.
Fig. 2 shows an embodiment of the phase one conversion process.
Fig. 3 shows an embodiment of the phase two training process.
Fig. 4 shows an embodiment of the phase two conversion process.
Fig. 5 illustrates how the learning process may partition the feature space.
Fig. 6 shows an alternative process for phase two depth map generation.
Fig. 7 shows an alternative method of determining the depth of each pixel in phase two.
Fig. 8 shows the process of searching for candidate training samples.
Fig. 9 shows the process of calculating a depth from several candidate training samples.
Detailed description of the invention
The present invention provides an improved technique for deriving depth maps from one or more 2D images. The invention preferably involves two phases, each of which conceptually incorporates an automated learning process.
Phase one
Phase one operates on a single image. The image is presented to the user, who uses a simple graphical interface to define the approximate depths of various regions of the image. The graphical interface may provide tools that assist the user in assigning depths to pixels, including but not limited to pen and paintbrush tools, area fill tools, and tools that assign depth on the basis of pixel colour. The result of this process is the determination of a depth for a subset of the pixels in the image.
This process is illustrated in Figure 1, in which a 2D image 1 is presented to the user. The user may then assign depths to various pixels of the image 2. In the example of Figure 1, the pixels marked "X" are those to which the user has not yet assigned a depth. The system then correlates the 2D image 1 with the depth data 2 provided by the user, and uses a training algorithm 3 to assist in the creation of a mapping function 4 capable of resolving the depth of every pixel in the image.
The information provided by the user constitutes the training data used by the learning process, described later, which relates a depth to every pixel of the single image. The process may be interactive, in that the user may define approximate depths for only some regions. Based on the results of the learning process for those regions, the user may then provide further depth estimates for regions that the learning process resolved relatively poorly. This interaction between the user and the learning process may be repeated a number of times; in effect, the user supervises the learning process at this stage. It should be noted that the initial depths need not be specified by the user - they may be determined by some other process, as described above.
Creating the mapping function
Once the system has been provided with the image and the depths of some of its pixels, it analyses the pixels of determined depth to create a mapping function. The mapping function may be any process or function that takes as input the measurements of any pixel or group of pixels from the image, and provides as output a depth value for that pixel or group of pixels.
The measurements of each pixel may comprise its red, green and blue values, or other measurements such as luminance, chrominance and contrast, together with spatial measurements such as the horizontal and vertical position within the image. Alternatively, the mapping function may operate on higher-level image features, such as large groups of pixels, on measurements over groups of pixels such as mean and variance, or on edge and corner responses (that is, the responses of feature detectors). A large group of pixels may, for example, be represented by a segment of the image, a segment being a connected group of pixels forming a homogeneous region.
Purely for the purposes of description, a pixel may be represented in the form x, y, R, G, B, z, where x and y represent the relative position of the pixel (its x and y coordinates), R, G and B represent the red, green and blue values of the pixel, and z represents the depth of the pixel. A value of z is defined only for those pixels to which the user has assigned one.
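Purely as an illustration of this representation, a sketch in Python follows; holding the image as a NumPy array and normalising each dimension to the range [0, 1] are assumptions made here for the example (normalisation of the feature axes is discussed further below).

```python
import numpy as np

def pixel_features(image):
    """Build an (H*W, 5) array of normalised (x, y, R, G, B) samples
    from an H x W x 3 RGB image with 8-bit channels."""
    h, w, _ = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    return np.column_stack([
        xs.ravel() / (w - 1),           # x position, normalised to [0, 1]
        ys.ravel() / (h - 1),           # y position, normalised to [0, 1]
        image[..., 0].ravel() / 255.0,  # R
        image[..., 1].ravel() / 255.0,  # G
        image[..., 2].ravel() / 255.0,  # B
    ])
```

A user-assigned depth z would then be attached to the rows corresponding to the pixels the user has coloured, giving the (x, y, R, G, B, z) training samples described above.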
The mapping function is derived by capturing the relationship between the image data and the depth data of the user-defined pixels. The mapping function may take the form of any generic processing unit, which receives input data, processes it, and provides an output. Preferably, this processing unit is subjected to a learning process, in which its nature is determined by examination of the user data and the associated image data.
Those working in the fields of artificial intelligence or machine learning will understand the process of learning the relationship between input data and a desired output, and that this process may take many forms. It should be noted that such workers would not generally work in the field of stereoscopics or 2D to 3D conversion. In machine learning such mapping functions are well known, and include, but are not limited to, neural networks, decision trees, decision graphs, model trees and nearest-neighbour classifiers. The preferred embodiment of the learning algorithm seeks to devise a mapping function that minimises some measure of the mapping error while generalising satisfactorily to values outside the original data set.
The learning algorithm may attempt to determine the relationship between the 2D image information and depth over the entire image, or over smaller local spatial regions.
This relationship may subsequently be used to complete the depth maps for the whole sequence.
This is illustrated in Figure 2, in which data from the 2D image 1 is input to the previously created mapping function 4 in order to create a depth map 5 for the 2D image 1.
Examples of successful learning algorithms are the back-propagation algorithm for learning neural networks, the C4.5 algorithm for learning decision trees, locally weighted linear regression, and the K-means algorithm for learning clustered classifiers.
Purely for the purposes of description, the learning algorithm may be considered to calculate the following relationship for each pixel in a frame of the 2D image sequence:
z_n = k_a·x_n + k_b·y_n + k_c·R_n + k_d·G_n + k_e·B_n
where
n is the nth pixel in the key frame image,
z_n is the depth value assigned to the pixel at x_n, y_n,
k_a to k_e are constants determined by the algorithm,
R_n is the red component value of the pixel at x_n, y_n,
G_n is the green component value of the pixel at x_n, y_n,
B_n is the blue component value of the pixel at x_n, y_n.
This process is illustrated in Figure 1.
The skilled reader will appreciate that the above equation is a simplification for ease of explanation and would not operate ideally in practice. In a practical application using, say, a neural network and the large number of pixels in an image, the network would learn a large equation containing many k values, multiplications and additions. Furthermore, the k values may vary at different x,y positions in the image to suit local image characteristics.
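As an illustration of the simplified linear form above only - not of the larger learned functions the preceding paragraph describes - a least-squares fit of the k constants might be sketched as follows; the added bias term, and the names train_feats and train_z for the user-defined samples, are assumptions of this example.

```python
import numpy as np

def fit_linear_depth(train_feats, train_z):
    """Fit z = ka*x + kb*y + kc*R + kd*G + ke*B (+ bias) by least squares.
    train_feats: (N, 5) array of (x, y, R, G, B) rows; train_z: (N,) depths."""
    A = np.column_stack([train_feats, np.ones(len(train_feats))])  # append bias column
    coeffs, *_ = np.linalg.lstsq(A, train_z, rcond=None)
    return coeffs  # [ka, kb, kc, kd, ke, bias]

def predict_depth(feats, coeffs):
    """Apply the learned mapping function to every pixel of an image."""
    A = np.column_stack([feats, np.ones(len(feats))])
    return A @ coeffs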
Applying the mapping function to the 2D image
The invention then takes this mapping function and applies it to the entire frame of the 2D image sequence. For a given pixel, the inputs to the mapping function are determined in the same manner as during the learning process. For example, if the mapping function was learned using the measurements of a single pixel as input, the mapping function will require those same measurements as input. From these inputs, the mapping function performs the task it has learned and outputs a depth measurement. Again, in this example, for a single pixel the depth measurement may be a simple depth value. In this example the mapping function is applied over the entire image, thereby completing the full set of depth data for the image. Alternatively, if larger groups of pixels were used to train the mapping function, such groups must be produced for the image. The higher-level measurements over these pixel groups, such as mean and variance, are completed in the same manner as in the learning process. With these inputs now established, the mapping function generates the desired depth measurement for the pixel group.
This process is illustrated in Figure 2, and results in a complete depth map for the 2D image. If the resulting depth map contains regions of error, the user data may be amended to correct those regions and the process repeated. The mapping function may also be applied to other frames to produce depth maps for them.
Those familiar with the field of machine learning will appreciate that the training phase may be implicit in the general configuration of the algorithm. Such approaches are referred to as instance-based learning, and include, but are not limited to, techniques such as locally weighted linear regression. In an alternative embodiment, the user may define a set of objects and assign pixels to those objects. In this embodiment, the process of generalising from the user data to the remaining pixels of the image may segment the entire image into the object groups originally defined by the user. The mapping function may define the objects, or the objects themselves may be the desired output of this embodiment. Alternatively, functions may be applied to the identified objects to define the depths of those objects and thereby compose the depth map of the image. Such functions may take the form of depth ramps and other means of defining an object's depth, as described in the earlier application PCT/AU00/00700.
In another alternative embodiment, the training algorithm may introduce a random component into the user information. This may assist any learning algorithm in overcoming the problem of overtraining. Overtraining refers to the situation in which the learning algorithm merely memorises the training information - much as a child might learn multiplication tables by rote without gaining any understanding of multiplication itself. The problem is well known in the field of machine learning, and one means of alleviating it is to introduce random noise into the training data. A good learning algorithm will distinguish between the noise and the quality information in the training data, and in doing so is encouraged to learn the nature of the data rather than simply memorise it. One embodiment of this approach refers to the earlier example, in which the training algorithm learns the function:
z_n = k_a·x_n + k_b·y_n + k_c·R_n + k_d·G_n + k_e·B_n
When the z, x, y, R, G and B values are presented to the training algorithm, a small noise component may be added to each of these values. The noise component may be a small positive or negative random number. In the preferred embodiment no noise is added to the z component.
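A minimal sketch of this noise injection, assuming the normalised feature rows of the earlier examples; the 1% noise scale is an assumed value, not one taken from the description, and in keeping with the preferred embodiment the z targets are left untouched.

```python
import numpy as np

def add_training_noise(train_feats, noise_scale=0.01, rng=None):
    """Add a small positive or negative random number to the (x, y, R, G, B)
    inputs; the depth targets z are deliberately left unchanged."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.uniform(-noise_scale, noise_scale, size=train_feats.shape)
    return train_feats + noise
```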
The learning process
In the preferred embodiment, the inputs to the learning process are:
1. A number of training samples, which have a set of characteristics that includes a depth.
2. A number of "classification" samples, which have characteristics matching those of the training samples, and whose depth is to be determined by the learning process.
The training samples consist of pixels whose characteristics include the pixel's position (x, y), colour (R, G, B) and depth (z). The purpose of the learning process is to calculate the depth (z) of each classification pixel, whose characteristics include its position (x, y) and colour (R, G, B).
For each classification sample, the first stage of the learning algorithm involves identifying a group of training samples that share image characteristics "similar" to those of the classification pixel under consideration.
Searching for candidate training samples
To identify training samples with characteristics similar to those of the current classification sample, we imagine an n-dimensional feature space in which the samples lie. In the preferred embodiment this is a 5-dimensional space, with each dimension representing one image characteristic: x, y, R, G or B. The axes of this space are normalised to account for differences in the range of each dimension, so that differences between samples can be expressed as relative percentages. For example, the R component of a given sample may differ by 10% (relative to the absolute range of the R component) from that of a second sample.
The distance between two samples in this space is a measure of their similarity. To detect training samples similar to the current classification sample, a search radius is defined. Any training sample whose distance from the classification sample is less than the search radius is considered similar to the classification sample and is used in the calculation of the depth. A simple Euclidean metric may be used to measure distances in the n-dimensional search space. For data that does not occupy the major part of the n-dimensional feature space, the Mahalanobis distance metric may be used to provide better results. Alternative means of spreading the range of the data, such as histogram equalisation or principal component analysis of the RGB, YUV or HSV components, provide similar advantages.
The search radius is a critical parameter in the accurate estimation of depth, and is set with regard to the characteristics of the data. For data exhibiting high spatial or temporal autocorrelation, the radius is set to a smaller value than for images with low spatial or temporal autocorrelation.
The search radius may differ for each dimension of the feature space. For example, the search radius along the x axis may differ from the search radius along the axis representing red intensity. In addition, the learning process may adapt these parameters to the data within certain user-defined bounds. For example, if no suitable training samples are identified within a spatial radius of 5% and a colour radius of 10%, the spatial radius may be increased to 10%.
Figure 8 shows an example of the candidate search process. The figure depicts a 2-dimensional search space, plotting the variation in the spatial x coordinate of the samples against the variation in red intensity. Several training samples 20 lie in this space. Within the first radius 21 around the target pixel 11 there are no training samples. The learning process therefore extends its search to a second search radius 22 around the target pixel 11 and identifies three candidate training samples.
Alternative search strategies may be used to identify suitable candidate training samples. Under such strategies, the training data may be stored in structures such as hash trees, k-d trees or n-dimensional Voronoi diagrams. Although these strategies may improve the speed with which candidate training samples are identified, they do not affect the nature of the invention.
Similarly, search strategies that exploit the proximity in feature space of successive classification samples by caching training samples may improve the speed of identifying candidate training samples, but do not add materially to the invention.
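By way of illustration, the candidate search using one of the structures named above (a k-d tree) might be sketched as follows, with SciPy's cKDTree as an assumed concrete choice; the radius-doubling fallback mirrors the widening search of Figure 8.

```python
from scipy.spatial import cKDTree

def find_candidates(tree, query, radius, max_radius=0.5):
    """Return indices of training samples within `radius` of `query` in the
    normalised feature space, widening the radius (as in Figure 8) until
    candidates are found or `max_radius` is exceeded."""
    while radius <= max_radius:
        idx = tree.query_ball_point(query, r=radius)
        if idx:
            return idx
        radius *= 2.0  # no candidates found: widen the search
    return []

# The tree is built once over the normalised training features and reused:
# tree = cKDTree(train_feats)
```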
Distance weighted learning
To calculate the depth of any given classification sample, we require one or more training samples considered to be similar to the classification sample, as described above. We refer to these as "candidate" training samples.
The depth of the classification sample is calculated as a weighted average of the depths of the candidate training samples. The weight given to any candidate training sample is related to its distance from the classification sample in the n-dimensional space. As described above, this distance is normalised, and may be biased with respect to the data using the Mahalanobis metric or principal-component-style analysis.
Figure 9 shows a simplified example of the depth calculation process. As in Figure 8, Figure 9 depicts a 2-dimensional search space, plotting the variation in the spatial x coordinate of the samples against the variation in red intensity. The candidate training samples 19 shown are separated from the target pixel 11 by different distances (marked w1, w2 and w3). The depth may be calculated as a weighted average of the candidate training samples, using the formula:
z = (D1/w1² + D2/w2² + D3/w3²) / (1/w1² + 1/w2² + 1/w3²)
where D1 is the depth of the training sample separated from the target pixel 11 by distance w1, D2 is the depth of the training sample separated from the target pixel by distance w2, and D3 is the depth of the training sample separated from the target pixel 11 by distance w3.
In the preferred embodiment, the weights are inversely proportional to the square of the distance in the n-dimensional space.
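A sketch of this distance-weighted calculation, assuming candidate_feats and candidate_z hold the features and depths of the candidate training samples already found; the small epsilon guarding against a zero distance is an implementation assumption.

```python
import numpy as np

def weighted_depth(candidate_feats, candidate_z, query, eps=1e-12):
    """Depth of the classification sample as the inverse-square
    distance-weighted mean of the candidate training sample depths."""
    dists = np.linalg.norm(candidate_feats - query, axis=1)
    weights = 1.0 / (dists ** 2 + eps)  # preferred embodiment: w ~ 1/d^2
    return float(np.sum(weights * candidate_z) / np.sum(weights))
```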
Alternative embodiments
In an alternative embodiment, the learning process analyses the entire supplied training data set and infers general rules governing the relationship between the image characteristics and the depths of the samples.
In this process, the n-dimensional feature space is segmented, or partitioned, into a set of regions. Figure 5 shows a simplified representation of this principle. In this example the 2-dimensional space is partitioned by decision boundaries 23 into a number of rectangular regions. A depth value is assigned to the target pixel 11 according to the region it occupies.
In operation, an M5 model tree algorithm is used to perform the partitioning of the feature space. The M5 algorithm improves on the basic case described above in two ways: the decision boundaries need not be perpendicular to the axes of the feature space, and within each region the depth may vary as a linear function of the image characteristics.
Those skilled in machine learning techniques will appreciate that many learning schemes may be substituted for the M5 model tree algorithm, including neural networks, decision trees, decision graphs and nearest-neighbour classifiers. The precise nature of the learning algorithm does not affect the novelty of the invention.
In the preferred embodiment, the learning process operates on the image characteristics x, y, R, G and B. Alternative embodiments may operate on higher-level image characteristics such as large groups of pixels, on measurements over groups of pixels such as mean and variance, or on edge and corner values (that is, the responses of feature detectors). A large group of pixels may, for example, represent a segment of the image, segments being connected groups of pixels forming homogeneous regions.
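The M5 model tree itself is not available in the common Python libraries, so the following sketch substitutes scikit-learn's DecisionTreeRegressor as a stated stand-in: it partitions the feature space into axis-aligned regions with a constant depth per region, i.e. the basic case of Figure 5 rather than the full M5 behaviour described above.

```python
from sklearn.tree import DecisionTreeRegressor

def fit_region_depth(train_feats, train_z, max_depth=8):
    """Partition the (x, y, R, G, B) feature space into regions and assign
    each region a depth value, in the manner of Figure 5."""
    tree = DecisionTreeRegressor(max_depth=max_depth)
    tree.fit(train_feats, train_z)
    return tree

# Example use: depth for every pixel of the image
# depth_map = fit_region_depth(train_feats, train_z).predict(all_feats)
```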
Phase two
Phase two operates on an image sequence in which at least one frame is defined as a key frame. It receives 3D stereo data for each key frame, typically in the form of a depth map. The depth map may originate from any process, including but not limited to manual specification, the output of the phase one process described above, depth determined from stereo images, or depth obtained directly from a range-finding system. Alternatively, the 3D stereo information may take forms other than a depth map, such as disparity information derived from a key frame comprising a stereo pair.
For all other frames in the 2D image sequence, the present invention provides depth maps derived from the key frame information originally obtained. The number of key frames is expected to be a small fraction of the total number of frames. The invention therefore provides a means of significantly reducing the number of depth maps that must be generated from scratch.
Creating the mapping function
Once the system has been provided with the key frames and their corresponding depth maps, it analyses the key frames and the corresponding depth maps originally obtained, thereby creating a mapping function. The mapping function may be any process or function that takes as input given measurements of any 2D image and provides as output a depth map for that image. The mapping is learned from the relationship between the key frame image data and the depth map data of those images.
The mapping function may take the form of any generic processing unit, which receives input data, processes it, and provides an output. Preferably, this processing unit is subjected to a learning process, in which its nature is determined by examination of the key frame data and its corresponding depth maps. In the field of machine learning such mapping functions are well known, and include, but are not limited to, neural networks, decision trees, decision graphs, model trees and nearest-neighbour classifiers.
The system attempts to learn the relationship between the input data and the required output data. In the learning process, information from the 2D key frame images is presented to a training algorithm. This information may be presented on a pixel-by-pixel basis, with measurements such as the red, green and blue values of each pixel, or other measurements such as luminance, chrominance and contrast, together with spatial measurements such as the horizontal and vertical position within the image. Alternatively, the information may be presented in the form of higher-level image features, such as large groups of pixels with measurements over those groups such as mean and variance, or edge and corner values (that is, the responses of feature detectors). A large group of pixels may, for example, be a segment of the image, segments being connected groups of pixels forming homogeneous regions.
Purely for the purposes of description, the 2D image may be represented in the form x, y, R, G, B, where x and y represent the x and y coordinates of each pixel, and R, G and B represent the red, green and blue values of that pixel.
The corresponding depth maps are then presented to the training algorithm, so that the required mapping may be learned. Typically, each pixel is presented to the training algorithm. However, if higher-level image characteristics are being used, such as large pixel groups or segments, the depth map data may instead be depth measurements over those pixel groups, such as mean and variance.
Again purely for the purposes of description, the depth map may be represented in the form z, x, y, where x and y represent the x and y coordinates of each pixel, and z represents the depth value assigned to the corresponding pixel.
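As an illustration, the phase two training set might be assembled from a key frame and its depth map as follows, reusing the pixel_features helper assumed in the earlier sketch; each pixel contributes an (x, y, R, G, B) input row paired with a z target.

```python
import numpy as np

def keyframe_training_set(keyframe_rgb, depth_map):
    """Pair every key frame pixel's (x, y, R, G, B) features with the
    corresponding z value from the key frame's depth map."""
    feats = pixel_features(keyframe_rgb)       # (H*W, 5), as sketched earlier
    targets = depth_map.ravel().astype(float)  # z for each pixel, row-major
    return feats, targets
```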
Those working in the field of artificial intelligence will understand the process of learning the relationship between input data and a required output, and that it may take many forms. The preferred embodiment of the learning algorithm is designed to devise a mapping function that minimises some measure of the mapping error.
The learning algorithm attempts to generalise the relationship between the 2D image information and the depth maps exhibited in the key frame examples. The generalised form of this relationship is subsequently used to complete the depth maps for the whole sequence. Examples of successful learning algorithms known in the field are back-propagation for learning neural networks, the C4.5 algorithm for learning decision trees, and the K-means algorithm for learning clustered classifiers.
Purely for the purposes of description, the learning algorithm may be assumed to calculate the following relationship for each pixel in the 2D image:
z_n = k_a·x_n + k_b·y_n + k_c·R_n + k_d·G_n + k_e·B_n
where
n is the nth pixel in the key frame image,
z_n is the depth value assigned to the pixel at x_n, y_n,
k_a to k_e are constants determined by the algorithm,
R_n is the red component value of the pixel at x_n, y_n,
G_n is the green component value of the pixel at x_n, y_n,
B_n is the blue component value of the pixel at x_n, y_n.
The skilled reader will appreciate that the above equation is a simplification for ease of explanation and would not work in practice. In practical applications, using, say, a neural network and the large number of pixels in a given image, the network would learn a large equation containing many k values, multiplications and additions.
This process is illustrated in Figure 3, which also shows that a similar process may be applied using a different number of key frames.
Applying the mapping function
The invention then takes this mapping function and applies it to the set of 2D images for which no depth maps have yet been obtained. For a given 2D image in this set, the inputs to the mapping function are determined in a manner similar to that used to provide inputs to the mapping function during the learning process. For example, if the mapping function was learned using the measurements of a single pixel as input, those same measurements will be required for the pixels to which the mapping function is applied in the new image. From these inputs, the mapping function performs the task it has learned and outputs a depth measurement. Again, in the example of a single pixel, the depth measurement may be a simple depth value. In this example the mapping function is applied over the entire image sequence, thereby completing the full set of depth data for the image sequence. Alternatively, if larger groups of pixels were used to train the mapping function, such larger pixel groups must be produced for the new images. The higher-level measurements over these pixel groups, such as mean and variance, are completed in the same manner as in the learning process. With these inputs established, the mapping function produces the required depth measurement for the pixel group.
For a given 2D image sequence, the key frames with depth maps may be spaced across the sequence in any arbitrary manner. In the preferred embodiment, the mapping function is provided with a set of key frames, and their corresponding depth maps, that span a set of 2D images having some content in common. In the simplest case, two key frames are used to train a mapping function, which is subsequently used to determine the depth maps of the 2D images lying between the two key frames. However, there is no restriction on the number of key frames that may be used to train a mapping function, nor on the number of mapping functions used to complete a full set of 2D images. In the preferred embodiment, two key frames separated by one or more intervening frames are identified as the input to this second phase of the processing. The purpose of this phase is to assign a depth map to each of the intervening frames. The preferred order in which depth maps are assigned to the intervening frames is to process first the frame closest in time to a key frame. The processed frame then becomes a key frame for the assignment of a depth map to the next frame.
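The processing order described in this paragraph might be sketched as follows; assign_depth_map is a hypothetical stand-in for whichever mapping-function step is in use, taking a frame and the two current key frames (held here as (frame, depth_map) pairs) and returning the new depth map.

```python
def propagate_between(key_a, key_b, frames, assign_depth_map):
    """Assign depth maps to the frames lying between two key frames,
    processing the frame closest in time to a key frame first; each
    processed frame then serves as a key frame for its neighbour."""
    depths = [None] * len(frames)
    lo, hi = 0, len(frames) - 1
    while lo <= hi:
        depths[lo] = assign_depth_map(frames[lo], key_a, key_b)
        key_a = (frames[lo], depths[lo])  # processed frame becomes a key frame
        lo += 1
        if lo > hi:
            break
        depths[hi] = assign_depth_map(frames[hi], key_a, key_b)
        key_b = (frames[hi], depths[hi])
        hi -= 1
    return depths
```

Working inward alternately from both key frames matches the numbering of Figure 6, discussed below, in which depth frames 1 and 2 (and likewise 3 and 4) may be produced in either order.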
The inclusion of a time variable assists the training function in generalising the information provided in the key frames. In the absence of a time variable, the depth information in two key frames may be contradictory. This situation can arise when pixels of similar colour appear in the same spatial region of both key frames but belong to different objects. For example, in the first key frame a green car may be observed at the centre of the image, with the depth characteristics of the foreground of the image. By the next key frame the car may have moved on, revealing a green pasture behind it, the depth characteristics of the pasture defining a middle-ground region. The training algorithm is thus presented with two key frames, both having green pixels at the centre of the image but with different depth characteristics. This conflict cannot be resolved, and the mapping function cannot be expected to perform well in such regions. By introducing a time variable, the algorithm can resolve the conflict by recognising that green pixels at the centre of the image are foreground pixels at times close to the first key frame in the image sequence. As time advances towards the second key frame, the training algorithm will increasingly recognise green pixels at the centre of the image as belonging to the middle-ground depth of the green pasture.
This process is illustrated in the example of Figure 6. Each box represents a frame of the image sequence. The top row represents the source frames, numbered according to their relative positions in the image sequence. The bottom row represents the depth maps produced by this phase, the numbering indicating the order in which the depth maps are produced. It should be appreciated that depth frames 1 and 2 could be processed in reverse order, and similarly depth frames 3 and 4 could be reversed, and so on. The key frames 7 are set as the input to the process, as described above. The first depth map produced is associated with source frame 1, as shown in the figure. Any subsequent depth map is produced using the two previously produced depth maps.
Preferred embodiment
For each pixel in a frame whose depth is to be determined, the image characteristics of the target pixel are used to determine the depth to be associated with that pixel. In the preferred embodiment, two depth estimates are obtained, one from each key frame. This process is illustrated in Figure 7, which shows how the target pixel 11 is compared with the closest source key frames 6 before and after the current frame in the image sequence (steps 12 and 13). As described previously, the learning process uses a search radius 14 to identify pixels with similar image characteristics, and uses the depths associated with those pixels (steps 15 and 16) to calculate the depth of the target pixel (steps 17 and 18). Each key frame yields an estimate of the depth of the target pixel; we denote these estimates D1 and D2.
To determine the final depth associated with the target pixel, D1 and D2 must be combined. In the preferred embodiment, a weighted average of these values is calculated, using the positions of the key frames as the weighting parameters. If the distance from the current frame to the first key frame is T1, and the distance from the current frame to the second key frame is T2, the depth of the target pixel is given by:
depth = (w1·D1 + w2·D2) / (w1 + w2), where w1 = 1/T1², w2 = 1/T2²
and D1 and D2 are the depths calculated from key frame 1 and key frame 2 respectively.
In some cases, the learning process may be unable to determine a depth value for a given pixel. If one of the two key frame depth estimates in the above calculation cannot be determined, the target pixel is assigned the depth estimate from the other key frame, without weighting. If neither estimate D1 nor D2 is defined, the search radius is enlarged and the process repeated.
It should be noted that only a single key frame is needed to produce the depth map of any other frame. However, in situations where the depth of objects changes within the image sequence, two or more key frames weighted as described above will provide improved results.
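Gathering the rules of the preceding paragraphs into one place, a sketch of the combination step might read as follows; using None to signal an estimate the learning process could not determine is an assumption of this example.

```python
def combine_estimates(d1, d2, t1, t2):
    """Combine the two key frame depth estimates D1 and D2 for a target
    pixel, weighting each by the inverse square of the temporal distance
    (t1, t2) from the current frame to the corresponding key frame."""
    if d1 is None and d2 is None:
        return None  # caller should enlarge the search radius and retry
    if d1 is None:
        return d2    # only one estimate available: use it unweighted
    if d2 is None:
        return d1
    w1 = 1.0 / t1 ** 2
    w2 = 1.0 / t2 ** 2
    return (w1 * d1 + w2 * d2) / (w1 + w2)
```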
It should be understood that the order in which the frames are processed, and the manner in which the results from multiple key frames are combined, may be varied without substantially affecting the nature of the invention.
As in the case of a single 2D image, it will be appreciated that the training phase may be implicit in instance-based learning, whereby a depth estimate is determined for any pixel of any image in the sequence.
This process is illustrated in Figure 4.
It will be understood that a learning process similar to that used in phase one may be implemented in phase two. For example, two such mapping functions may be written to a file using, say, 6000 bytes, and for that cost the depth information for 20 frames is obtained. This effectively represents a file size of 6000/20 = 300 bytes per frame. In practical applications this effective compression will be significant.
In another application, the compression described above may enable the efficient transmission of 3D information embedded within a 2D image source, which may form part of a 2D-compatible 3D image. Because the file size required by the mapping function is typically a very small fraction of the 2D image data, 3D information can be added to a 2D image sequence with very little overhead.
In this case, the 3D information is only generated at the viewing end, by applying the mapping function to each 2D image of the sequence before, or as, it is viewed. This is feasible because the types of mapping function found in machine learning are computationally efficient to apply once they have been trained. The training process itself is typically slow and resource-intensive, and is normally performed off-line during the construction of the 3D image content. Once trained, however, the mapping function can be transmitted to the viewing end, and the 2D to 3D conversion completed in real time with a modest amount of processing.
The applicant's own earlier disclosures relate to techniques for the conversion of 2D images into stereoscopic 3D images. The conversion processes disclosed involve the generalisation of depth maps associated with the 2D images. In one embodiment, the depth maps are created manually, frame by frame. The improvements described in this application allow depth maps to be created for a smaller number of key frames, with the intermediate depth maps calculated automatically. Since the key frames represent a very small proportion of the total number of frames, this new technique represents a significant improvement in conversion time and cost.
It is specifically disclosed that the present invention may be applied to the creation of depth maps, rather than to the generation of stereoscopic images.
The skilled reader will appreciate that depth maps are widely used within the special effects industry in a process known as rotoscoping. To composite live action or computer-generated images into a 2D image, it is often necessary to manually produce a depth map, or matte, for each 2D image frame. These mattes are used when compositing the additional images, so that they appear to move with the approximate geometry of the original 2D image. The invention described above can produce such mattes rapidly.
It is also known that cameras have been developed that capture depth maps of live scenes, typically employing laser ranging techniques in so-called LIDAR devices. Capturing depth maps at video frame rates requires an expensive and complex system. An application of the present invention would allow a relatively simple and inexpensive LIDAR device to be designed that need only capture depth maps at a fraction of the video field rate, or at other infrequent intervals, with the missing depth maps produced by interpolation using the techniques described in this invention.

Claims (38)

1. A method of creating a depth map, including the steps of:
assigning a depth to at least one pixel or portion of an image;
determining the relative location and image characteristics of each said at least one pixel or portion of the image;
utilising the depth(s), image characteristics and respective relative locations to determine a configuration of a first algorithm that ascertains depth characteristics as a function of relative location and image characteristics;
utilising said first algorithm to calculate a depth characteristic for each pixel or portion of the image;
wherein said depth characteristics form a depth map for the image.
2. A method of creating a depth map, including the steps of:
assigning a depth to at least one pixel or portion of an image;
determining the x,y coordinates and image characteristics of each said at least one pixel or portion of the image;
utilising the depth(s), image characteristics and respective x,y coordinates to determine a first algorithm that ascertains depth characteristics as a function of x,y coordinates and image characteristics;
utilising said first algorithm to calculate a depth characteristic for each pixel or portion of the image;
wherein said depth characteristics form a depth map for the image.
3. A method as claimed in claim 1, characterised in that said image characteristics include RGB values.
4. A method as claimed in any preceding claim, characterised in that it further includes the step of reassigning a depth to any pixel or portion of said image so as to correct any inconsistencies.
5. A method as claimed in any preceding claim, characterised in that said image characteristics include any one of luminance, chrominance, contrast or spatial measurements.
6. A method as claimed in any preceding claim, characterised in that said first algorithm can be represented by the equation:
z = f(x, y, R, G, B), where x and y define the relative location of a sample.
7. A method as claimed in any preceding claim, characterised in that a learning algorithm is utilised to determine the configuration of said first algorithm.
8. A method as claimed in claim 7, characterised in that for each pixel in the image the learning algorithm calculates:
z_n = k_a·x_n + k_b·y_n + k_c·R_n + k_d·G_n + k_e·B_n
where
n is the nth pixel in the key frame image,
z_n is the depth value assigned to the pixel at x_n, y_n,
k_a to k_e are constants determined by the algorithm,
R_n is the red component value of the pixel at x_n, y_n,
G_n is the green component value of the pixel at x_n, y_n,
B_n is the blue component value of the pixel at x_n, y_n.
9. A method as claimed in claim 7 or 8, characterised in that a random component is introduced to reduce overtraining of the learning algorithm.
10. A method as claimed in claim 9, characterised in that said random component is a small positive or negative random number.
11. A method as claimed in any one of claims 7 to 10, characterised in that said learning algorithm initially identifies pixels with characteristics similar to those of the known pixels.
12. A method as claimed in claim 11, characterised in that similar pixels are searched for within a search radius.
13. A method as claimed in claim 12, characterised in that said search radius may vary for each characteristic.
14. A method as claimed in any one of claims 11 to 13, characterised in that the depth of a pixel is determined as an average of the depths of similar pixels, weighted by distance.
15. A method as claimed in claim 14, characterised in that the weights are inversely proportional to distance.
16. A method as claimed in claim 7, characterised in that the characteristic space is segmented, or divided, into a set of regions, and a depth value is assigned according to the region occupied.
17. A method of creating a series of depth maps for an image sequence, including the steps of:
receiving a depth map for at least one frame of said image sequence;
utilising said at least one depth map to determine a second configuration of a second algorithm that ascertains depth characteristics as a function of relative location and image characteristics;
utilising said algorithm to create a depth map for each frame of said image sequence.
18. A method of creating a series of depth maps for an image sequence, including the steps of:
receiving a depth map for at least one frame of said image sequence;
utilising said at least one depth map to determine a second algorithm that ascertains depth characteristics as a function of x,y coordinates and image characteristics;
utilising said algorithm to create a depth map for each frame of said image sequence.
19. A method as claimed in claim 17 or claim 18, characterised in that at least two depth maps corresponding to at least two frames of said image sequence are received.
20. A method as claimed in any one of claims 17 to 19, characterised in that said image characteristics include RGB values.
21. A method as claimed in any one of claims 17 to 20, characterised in that said image characteristics include at least one of luminance, chrominance, contrast or spatial measurements.
22. A method as claimed in any one of claims 17 to 21, characterised in that a learning algorithm is utilised to determine the configuration of said second algorithm.
23. A method as claimed in claim 22, characterised in that said learning algorithm is one of a back-propagation algorithm, a C4.5 algorithm or a K-means algorithm.
24. The method according to claim 22 or 23, wherein the second algorithm calculates:
z_n = k_a.x_n + k_b.y_n + k_c.R_n + k_d.G_n + k_e.B_n
where
n is the nth pixel in the key frame image,
z_n is the depth value assigned to the pixel at x_n, y_n,
k_a to k_e are constants determined by the algorithm,
R_n is the red component value of the pixel at x_n, y_n,
G_n is the green component value of the pixel at x_n, y_n, and
B_n is the blue component value of the pixel at x_n, y_n.
25. The method according to any one of claims 17 to 24, wherein an additional algorithm configuration is created for each pair of frames for which depth maps are received.
26. A method of creating a series of depth maps for an image sequence, including the steps of:
receiving depth maps for at least two key frames of the image sequence;
utilising the depth maps to determine a second algorithm to ascertain depth characteristics as a function of x,y coordinates and image characteristics; and
utilising the algorithm to create a depth map for each frame of the image sequence, wherein frames adjacent to the key frames are processed before non-adjacent frames.
27. The method according to claim 26, wherein, once a frame adjacent to a key frame has been processed, that frame is treated as a key frame for the creation of further depth maps.
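The processing order of claims 26 and 27 amounts to a breadth-first walk outwards from the key frames. A minimal, self-contained sketch of that ordering logic only (frame indices are illustrative):

    from collections import deque

    def processing_order(num_frames, key_frames):
        # Frames adjacent to key frames are processed before
        # non-adjacent frames; each processed frame then acts as a
        # key frame for its own neighbours.
        order, seen, queue = [], set(key_frames), deque(key_frames)
        while queue:
            f = queue.popleft()
            for n in (f - 1, f + 1):
                if 0 <= n < num_frames and n not in seen:
                    seen.add(n)
                    order.append(n)
                    queue.append(n)
        return order

For example, processing_order(7, [0, 6]) yields [1, 5, 2, 4, 3]: the neighbours of both key frames first, then their neighbours in turn.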
28. The method according to claim 22, 23, 26 or 27, wherein the second algorithm calculates:
z_n = k_a.x_n + k_b.y_n + k_c.R_n + k_d.G_n + k_e.B_n + k_f.T
where
n is the nth pixel in the image,
z_n is the depth value assigned to the pixel at x_n, y_n,
k_a to k_f are constants previously determined by the algorithm,
R_n is the red component value of the pixel at x_n, y_n,
G_n is the green component value of the pixel at x_n, y_n,
B_n is the blue component value of the pixel at x_n, y_n, and
T is a time measure of the particular frame within the sequence.
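A sketch of the time-extended model of claim 28, assuming the constants have already been fitted as in the earlier sketch with the frame time T appended as a sixth input; the name is illustrative.

    def predict_depth_t(k, x, y, r, g, b, t):
        # z_n = k_a.x_n + k_b.y_n + k_c.R_n + k_d.G_n + k_e.B_n + k_f.T
        # t is the time measure of the frame within the sequence, so one
        # configuration can interpolate depth between key frames.
        return k[0]*x + k[1]*y + k[2]*r + k[3]*g + k[4]*b + k[5]*t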
29. A method of creating a series of depth maps for an image sequence, including the steps of:
selecting at least one key frame from the image sequence;
for each said at least one key frame, assigning a depth to at least one pixel or portion of the frame;
determining the relative position and image characteristics of each said at least one pixel or portion of each said key frame;
utilising the depth(s), image characteristics and respective relative positions of each said at least one key frame to determine a first configuration of a first algorithm for each said at least one frame, to ascertain depth characteristics as a function of relative position and image characteristics;
utilising the first algorithm to calculate depth characteristics for each pixel or portion of each said at least one key frame,
wherein the depth characteristics form a depth map for each said at least one key frame;
utilising each depth map to determine a second configuration of a second algorithm to ascertain depth characteristics for each frame as a function of relative position and image characteristics; and
utilising the second algorithm to create a depth map for each frame of the image sequence.
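A sketch composing the two phases of claim 29, assuming the helpers from the earlier sketches (fit_depth_model, predict_depth, depth_maps_from_key); `sparse` is an illustrative list of (x, y, R, G, B, z) seed pixels assigned by an operator.

    import numpy as np

    def two_phase(sparse, key_rgb, frames):
        # Phase one: turn sparse assigned depths into a full key frame
        # depth map via the first algorithm configuration.
        xs, ys, rs, gs, bs, zs = map(np.asarray, zip(*sparse))
        k1 = fit_depth_model(xs, ys, rs, gs, bs, zs)
        h, w, _ = key_rgb.shape
        gy, gx = np.mgrid[0:h, 0:w]
        key_depth = predict_depth(k1, gx, gy, key_rgb[..., 0],
                                  key_rgb[..., 1], key_rgb[..., 2])
        # Phase two: propagate the key frame depth map to all frames
        # via the second algorithm configuration.
        return depth_maps_from_key(key_rgb, key_depth, frames)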
30. The method according to claim 29, wherein frames adjacent to the key frames are processed before non-adjacent frames.
31. The method according to claim 30, wherein an adjacent frame, once processed, is used as a key frame for further processing.
32. A method of encoding a series of frames, including transmitting at least one mapping function with the frames, wherein the mapping function comprises an algorithm for ascertaining depth characteristics as a function of relative position and image characteristics.
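A sketch of the encoding idea in claim 32: rather than a per-pixel depth map, each frame need only carry the handful of constants of its mapping function, from which a decoder regenerates depth. The serialisation format below is an assumption for illustration.

    import json

    def encode_mapping(k):
        # Transmit the mapping function as its constants k_a..k_e.
        return json.dumps({"mapping": [float(v) for v in k]})

    def decode_depth(payload, xs, ys, r, g, b):
        # Regenerate the depth map from the transmitted constants
        # (predict_depth as sketched earlier).
        k = json.loads(payload)["mapping"]
        return predict_depth(k, xs, ys, r, g, b)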
33. The method according to claim 32, wherein the image characteristics include RGB values.
34. The method according to claim 32 or 33, wherein the image characteristics include at least one of luminance, chrominance, contrast or a spatial measurement.
35. The method according to any one of claims 32 to 34, wherein a learning algorithm is utilised to determine the mapping function.
36. The method according to claim 35, wherein the learning algorithm is one of a back-propagation algorithm, a C4.5 algorithm or a K-means algorithm.
37. The method according to claim 35 or 36, wherein the mapping function calculates:
z_n = k_a.x_n + k_b.y_n + k_c.R_n + k_d.G_n + k_e.B_n
where
n is the nth pixel in the key frame image,
z_n is the depth value assigned to the pixel at x_n, y_n,
k_a to k_e are constants determined by the algorithm,
R_n is the red component value of the pixel at x_n, y_n,
G_n is the green component value of the pixel at x_n, y_n, and
B_n is the blue component value of the pixel at x_n, y_n.
38. The method according to any one of claims 32 to 37, wherein an additional algorithm is created for each frame for which a depth map is received.
CNA018162142A 2000-08-09 2001-08-09 Image conversion and encoding techniques Pending CN1466737A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
AUPQ0455 1999-05-20
AUPQ9292A AUPQ929200A0 (en) 2000-08-09 2000-08-09 Image conversion and encoding techniques
AUPQ9292 2000-08-09
AUPQ045500 2000-09-29

Publications (1)

Publication Number Publication Date
CN1466737A true CN1466737A (en) 2004-01-07

Family

ID=34195164

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA018162142A Pending CN1466737A (en) 2000-08-09 2001-08-09 Image conversion and encoding techniques

Country Status (1)

Country Link
CN (1) CN1466737A (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100507944C (en) * 2006-11-27 2009-07-01 北京中星微电子有限公司 Image format conversion method, image processing method and system
CN101657839B (en) * 2007-03-23 2013-02-06 汤姆森许可贸易公司 System and method for region classification of 2D images for 2D-to-3D conversion
CN101822067B (en) * 2007-07-26 2013-04-24 皇家飞利浦电子股份有限公司 Method and apparatus for depth-related information propagation
CN101465004B (en) * 2007-12-21 2013-05-15 三星电子株式会社 Method and apparatus representing adaptive information of 3D depth image
CN102239504A (en) * 2008-12-02 2011-11-09 皇家飞利浦电子股份有限公司 Generation of a depth map
CN102439971B (en) * 2009-04-15 2015-08-05 三星电子株式会社 For the method and system of the progression rate adapted of the uncompressed video communication in wireless system
CN101873506B (en) * 2009-04-21 2012-01-25 财团法人工业技术研究院 Image processing method for providing depth information and image processing system thereof
WO2011060579A1 (en) * 2009-11-18 2011-05-26 Industrial Technology Research Institute Method for generating depth maps from monocular images and systems using the same
US9030469B2 (en) 2009-11-18 2015-05-12 Industrial Technology Research Institute Method for generating depth maps from monocular images and systems using the same
CN111598932A (en) * 2011-11-02 2020-08-28 谷歌有限责任公司 Generating a depth map for an input image using an example approximate depth map associated with an example similar image
US10194137B1 (en) 2011-11-02 2019-01-29 Google Llc Depth-map generation for an input image using an example approximate depth-map associated with an example similar image
CN105144234A (en) * 2011-11-02 2015-12-09 谷歌公司 Depth-map generation for an input image using an example approximate depth-map associated with an example similar image
US9769460B1 (en) 2012-02-10 2017-09-19 Google Inc. Conversion of monoscopic visual content to stereoscopic 3D
CN106447718A (en) * 2016-08-31 2017-02-22 天津大学 2D-to-3D depth estimation method
CN106447718B (en) * 2016-08-31 2019-06-04 天津大学 A kind of 2D turns 3D depth estimation method
CN108833879A (en) * 2018-06-29 2018-11-16 东南大学 With time and space continuity virtual visual point synthesizing method
CN113661381A (en) * 2019-01-31 2021-11-16 南加州大学 Hyperspectral imaging system
CN113661381B (en) * 2019-01-31 2024-04-26 南加州大学 Hyperspectral imaging system
CN112750157A (en) * 2020-08-11 2021-05-04 腾讯科技(深圳)有限公司 Depth image generation method and device
CN112750157B (en) * 2020-08-11 2023-09-12 腾讯科技(深圳)有限公司 Depth image generation method and device

Similar Documents

Publication Publication Date Title
US7035451B2 (en) Image conversion and encoding techniques
CN1475969B (en) Method and system for intensify human image pattern
CN1144157C (en) System and method for creating 3D models from 2D sequential image data
CN109118445B (en) Underwater image enhancement method based on multi-branch generation countermeasure network
CN1738426A (en) Video motion goal division and track method
JP5647878B2 (en) Steel pipe internal corrosion analysis apparatus and steel pipe internal corrosion analysis method
CN109891880B (en) Method for improving the quality of 2D to 3D automatic conversion by machine learning techniques
CN1466737A (en) Image conversion and encoding techniques
CN103914699A (en) Automatic lip gloss image enhancement method based on color space
WO2010024265A1 (en) Image processing device and method, and learning device, method, and program
CN1714372A (en) Image signal processing
US8249342B1 (en) Color analytics for a digital image
JP2012015744A (en) Depth signal generation device and method
JP2019139247A (en) Method for generating contrast image in which reflection is reduced and related device
Sihotang Implementation of Gray Level Transformation Method for Sharping 2D Images
CN113370217A (en) Method for recognizing and grabbing object posture based on deep learning for intelligent robot
CN113343976A (en) Anti-highlight interference engineering measurement mark extraction method based on color-edge fusion feature growth
Flachot et al. Deep neural models for color classification and color constancy
KR20110112143A (en) A method for transforming 2d video to 3d video by using ldi method
CN101873506B (en) Image processing method for providing depth information and image processing system thereof
CN114327038B (en) Virtual reality man-machine interaction system based on artificial intelligence technology
JP2017157014A (en) Image processing device, image processing method, image processing system and program
Graham et al. Efficient visual system processing of spatial and luminance statistics in representational and non-representational art
Huang et al. Generation of stereo oil paintings from RGBD images
CN1898680A (en) Systems and methods relating to afis recognition, extraction, and 3-d analysis strategies

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication