MXPA99004772A - Motion tracking using image-texture templates - Google Patents

Motion tracking using image-texture templates

Info

Publication number
MXPA99004772A
MXPA99004772A (application MXPA/A/1999/004772A; also published as MX 9904772 A)
Authority
MX
Mexico
Prior art keywords
image
model
models
search
block
Application number
MXPA/A/1999/004772A
Other languages
Spanish (es)
Inventor
Astle Brian
Original Assignee
Astle Brian
Princeton Video Image Inc
Application filed by Astle Brian and Princeton Video Image Inc
Publication of MXPA99004772A

Abstract

Image templates are extracted from video images in real-time and stored in memory. Templates are selected on the basis of their ability to provide useful positional data, via position locator (2), and compared with regions of subsequent images to find the position giving the best match. From the position data, a transform model calculator (3) calculates a transform model. The tracking controller (1) tracks the background motion in the current image to accurately determine the motion and attitude of the camera recording the current image. A transform model confirmer (4) confirms the transform model by examining predefined image templates. Transform model data and camera sensor data are then used to insert images into the live video broadcast at the desired location in the correct perspective. Block updater (5) periodically updates stored templates to purge those that no longer give valid or significant positional data. New templates extracted from recent images are used to replace the discarded templates.

Description

MOTION TRACKING USING IMAGE-TEXTURE TEMPLATES
CROSS-REFERENCE TO RELATED APPLICATIONS The present application is related to and claims the benefit of United States provisional application No. 60/031,883, filed on November 27, 1996, entitled "Camera Tracking Using Persistent, Selected, Image-Texture Templates". The present application is also related to the following co-pending, commonly owned applications: United States provisional application Serial No. 60/038,143, filed on November 27, 1996, entitled "IMAGE IN VIDEO STREAMS USING A COMBINATION OF PHYSICAL SENSORS AND PATTERN RECOGNITION"; Serial No. 08/563,598, filed on November 28, 1995, entitled "SYSTEM AND METHOD FOR INSERTING STATIC AND DYNAMIC IMAGES INTO A LIVE BROADCAST"; Serial No. 08/580,892, filed on December 29, 1995, entitled "METHOD OF TRACKING SCENE MOTION FOR LIVE VIDEO INSERTION SYSTEMS"; and Serial No. 08/662,089, filed on June 12, 1996, entitled "SYSTEM AND METHOD OF REAL-TIME INSERTIONS INTO VIDEO USING ADAPTIVE OCCLUSION WITH A SYNTHETIC COMMON REFERENCE IMAGE".
FIELD OF THE INVENTION The present invention relates to improvements in systems that insert selected images into live video broadcasts.
DESCRIPTION OF THE RELATED ART Electronic devices for inserting images into live video broadcasts have been developed and used, for example, to insert advertisements at sporting events. The utility of such devices depends directly on their ability to insert images seamlessly, so that the insertions appear as real as possible, as if part of the actual scene. The insert must also be robust enough to handle typical camera manipulations such as panning, tilting and zooming without compromising the integrity of the broadcast.
A key requirement of such an insertion system is the ability to track scene motion and background motion from one image to the next in the broadcast. Reliable tracking data are needed in order to calculate the transformation models that scale an insertion to the appropriate size and perspective before the image is inserted into each new frame.
United States Patent No. 5,264,933 to Rosser observes that standard pattern recognition and image processing methods can be used to track background and scene motion. The standard methods referred to are feature tracking using normalized correlation of previously stored image models. These methods work well, but not under all conditions.
Later methods, which have been called "adaptive geographic hierarchical tracking", use an elastic model to extend the domain of image frames that can be tracked properly. The extended domain includes noisy scenes that contain a large amount of occlusion. Occlusion refers to action in the current image that obscures some or most of the preselected reference points used by an insertion system to calculate the position and perspective of an insert in the live stream. The extended domain also includes images that contain rapid variations in overall illumination conditions. Adaptive geographic hierarchical tracking requires at least three separate reference points that are always visible in the image being tracked. Since precise image conditions cannot be predicted in advance, a block matching technique called "non-normalized correlation" is usually employed.
The present invention further extends the domain of image frames that can be tracked to include frames in which no preselected reference points are visible. Unlike adaptive geographic hierarchical tracking, which preferably uses predefined synthetic models, the present invention uses models taken from the stream of images being broadcast.
Motion estimation schemes also exist in the prior art. Digital video encoders that use motion estimation for data compression extract image models from the video images and calculate motion vectors.
A current image is tiled with a set of models, and motion vectors are calculated for each model using a previously transmitted image. The object is to reduce the number of bits needed to encode an image block by transmitting only a motion vector plus an optional correction factor, as opposed to transmitting a complete image block. After the image is coded, the models are discarded.
The typical block matching criteria for this scheme include the L1 norm, the L2 norm and normalized correlation. The L1 norm is defined as D = Σ|d| and the L2 norm as D = Σd², where d is the difference in pixel values between the image and the model, and the sum is carried out over all the pixels in each model. The normalized correlation is defined as NC = ΣTI / √(ΣT² · ΣI²), where T represents the pixel values in the model and I represents the pixel values in the image.
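For concreteness, these three criteria can be computed as follows (a minimal Python sketch assuming grayscale pixel arrays; the function names are illustrative):

```python
import numpy as np

def l1_norm(model, image_block):
    """L1 norm: D = sum of |d|, where d is the pixel difference. Lower is better."""
    return np.abs(image_block.astype(float) - model.astype(float)).sum()

def l2_norm(model, image_block):
    """L2 norm: D = sum of d squared. Lower is better."""
    d = image_block.astype(float) - model.astype(float)
    return (d * d).sum()

def normalized_correlation(model, image_block):
    """NC = sum(T*I) / sqrt(sum(T^2) * sum(I^2)). Higher is better."""
    t = model.astype(float)
    i = image_block.astype(float)
    return (t * i).sum() / np.sqrt((t * t).sum() * (i * i).sum())
```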
In this description, block matching criteria will be defined so that the best match corresponds to the lowest value of the selected matching criterion. Therefore, if normalized correlation is used as the block matching criterion, the mismatch would be defined as 1 - NC. As the model moves over the current image, the resulting array of values calculated using the selected block matching criterion is called the error surface, and the best match occurs where the error surface has its minimum value.
Since the illumination levels in the current image are likely to be similar to those of the matching blocks in the previously transmitted image, block matching criteria that include the average illumination information are reliable in this setting.
The present invention differs from the motion estimation used in video coding in several significant ways. In the present invention, the models are a carefully selected subset of the total blocks available, rather than all possible positions. Careful selection of region and model is necessary because, unlike motion estimation in compression algorithms, the result of the present calculation is not a set of motion vectors for the blocks but a single transformation model. In a "least squared error" sense, the single transformation model is the best descriptor of the motion of the ensemble of models. In addition, the models are placed at selected positions in the image instead of tiling the image, and the models are stored in memory rather than discarded after each image is processed.
In the present invention, the current position of a model is determined relative to its previous position, whereas in motion estimation the previous position is determined relative to the current tiled position. Motion estimation in video coding is directed toward finding the best displacement match, that is, the one with the least coding error, to the current image from a previously transmitted image. In contrast, the position location of the present invention is directed toward visual correctness (the observer's perception of the image) of the motion of the image. In ambiguous cases it is not important how motion estimation in video coding resolves the ambiguity, but it is critical how the position location method of the present invention resolves it. Resolving the ambiguity can involve examining the model positions determined from other nearby blocks. Motion estimation has limited precision, often half a pixel, because of the computational and coding requirements associated with increased accuracy. In position location, however, there are no such limits on precision.
SUMMARY OF THE INVENTION The present invention uses image models taken directly from a live video stream. Depending on the intended application, i.e., baseball, football, soccer, etc., specific capture criteria are used to select models from the current image. For long-term spatial stability, the models are stored in memory and remain in use as long as they continue to meet certain retention criteria. The retention criteria include a satisfactory match to the current image of the broadcast as well as spatial consistency with other models. Spatial consistency means that the models to be retained are compatible with other models with respect to position, as opposed to curvature. The models are updated periodically to discard those that are no longer able to give satisfactory position data. New models selected from the current image are then used to replace those discarded. The position of each model is determined by comparing the model against the current image. The preferred comparison method uses an integer-position search followed by a two-dimensional interpolation process to obtain position information precise to fractions of a pixel. A transformation model is then calculated from the derived position data, using additional data related to the shape of the error surface near the match position. The transformation model provides a description of the current image so that images can be inserted into the current image at the desired location and in the correct perspective. There are several forms for this transformation model; as an example, the simplest model defines the pan, tilt and zoom of the camera recording the event. More complex models can include camera parameters such as rotation, mounting offsets and other camera motion. The transformation model can be confirmed by examining predefined synthetic models, and the model can be adjusted if necessary. Changes in mismatch values over time allow video transitions such as scene cuts, wipes and fades to be detected. Finally, the system and method of the present invention are viable wherever texture exists in the current image. The texture need not remain stationary for longer than a few video fields, so that, for example, crowd scenes can be tracked.
BRIEF DESCRIPTION OF THE DRAWINGS The present invention can be better understood with reference to the following figures, in which like reference numbers represent like elements throughout.
Figure 1 illustrates a block diagram of the texture tracking system of the present invention, in which block 1 represents a tracking controller; block 2 represents a position locator; block 3 represents a transformation model calculator; block 4 represents a transformation model confirmer; block 5 represents a block updater; and block 6 represents the block data.
Figures 2(a) and 2(b) illustrate two different image models.
Figure 3 is a block diagram showing the functions of the position locator, in which box (a) predicts where a block is likely to be; box (b) conducts a search to find the approximate integer position; box (c) examines the mismatch at locations around the best match area; box (d) carries out fractional interpolation to obtain a more exact block position; and box (e) writes the information to the block data.
Figure 4 illustrates a two-dimensional interpolation method used in the present invention.
Figure 5 illustrates three block mismatch functions plotted against position.
Figures 6(a)-(c) illustrate a two-dimensional optimization method used to locate the minimum of the error surface.
Figure 7 illustrates the spatial selection of blocks in an image, specifically for a soccer game.
Figure 8 illustrates a typical image and camera path A-B-C-D-E-F.
Figures 9 (a) - (c) illustrate a vertical model, a horizontal model and an arrangement of those models used to locate American football goal posts.
Figures 10(a)-(c) illustrate the pan, tilt and rotation angles, the focal length, the image size and the optical axis offset of a camera configuration.
DETAILED DESCRIPTION Detection of a target insertion area is only one aspect of a complete insertion system. By way of background, an LVIS, or live video insertion system, is described in commonly owned application Serial No. 08/563,598, filed on November 28, 1995, entitled "SYSTEM AND METHOD FOR INSERTING STATIC AND DYNAMIC IMAGES INTO A LIVE VIDEO BROADCAST". An LVIS is a system and method for inserting static or dynamic images into a live video broadcast in a realistic manner on a real-time basis. Initially, natural reference points in a scene suitable for subsequent detection and tracking are selected. The reference points preferably comprise sharp, bold and clear vertical, horizontal, diagonal or corner features within the scene visible to the video camera as it pans and zooms. Typically, at least three or more natural landmarks are selected. It is understood that the landmarks are distributed throughout the entire scene, such as a baseball park or football stadium, and that the field of view of the camera at any instant is normally significantly smaller than the whole scene that the camera can pan across. The reference points are often located outside the destination point or area where the insertion will be placed, because the insertion area is typically too small to include numerous identifiable reference points, and the insertable image may be dynamic and therefore have no single stationary target destination.
The system models the recognizable natural reference points on a deformable two-dimensional grid. An arbitrary reference point is selected within the scene. The reference point is mathematically associated with the natural reference points and subsequently used to locate the insertion area.
Before the insertion process, the artwork of the image to be inserted is adjusted for perspective, that is, for shape. Because the system knows the mathematical relationship between the reference points in the scene, it can automatically determine the zoom factor and the X, Y position adjustment that must be applied. Later, as the camera zooms in and out and changes its field of view as it pans, the insertable image remains at a suitable scale and in proportion to the other features in the field of view, so that it appears natural to the home viewer. The system can zoom in and out of a scene and have the insertable image appear naturally within the scene, rather than standing out, as has been the case with some prior art systems. The system can easily place an insertable image in any location.
The present invention relates to the tracking aspect of a live video insertion system. Figure 1 illustrates a block diagram of the image texture tracking system of the present invention.
TRACKING CONTROLLER The tracking controller 1 controls the operation and sequencing of four modules: the position locator 2, the transformation model calculator 3, the transformation model confirmer 4 and the block updater 5. The position locator 2 reads blocks from the block data 6 and determines their positions in the current image. The blocks comprise models and associated data. The position locator 2 stores the current position of each model, together with additional data, in the block data 6 once the model has been successfully located. The additional data include how well the current image matches the model and how the mismatch varies with position. The transformation model calculator 3 uses the data in the block data 6 to calculate a transformation model. The transformation model defines how a reference model must be changed in order to correspond satisfactorily with the current image. The reference model is a representation of the scene in a coordinate system independent of the current image coordinates. A camera model is a specific type of transformation model expressed in terms of camera parameters only, for example pan, tilt, zoom and rotation. The transformation model confirmer 4 ensures that the transformation model is a visually correct description of the current image by observing tracking errors as well as evidence of scene cuts and other digital video effects. The block updater 5 examines the blocks, purges from memory those that are no longer useful, and selects and stores new or replacement blocks. Each of the modules in Figure 1 is normally activated once per interlaced video field.
When an insertion target area is first detected, the tracking controller activates the block updater 5 to select and store blocks. The transformation model derived from the detection is used to relate the stored blocks to a reference model. These blocks are then used by the position locator 2, the transformation model calculator 3 and the transformation model confirmer 4 in subsequent fields. POSITION LOCATOR The position locator 2 determines the positions of the stored models with respect to the current image. The models typically consist of a rectangular array of pixels. Good results have been obtained with arrays from 8x8 to 16x16 pixels. Larger sizes give better results when the motion is simple, while smaller sizes give better results for complex motion.
There are two types of models: image models, which are derived from an image, and synthetic models, which are predefined and not derived from any particular image. Synthetic models are typically zero-mean models. Zero-mean models are models whose average illumination level is zero; they can be derived from image models by subtracting the average illumination level from each pixel in the model. Figure 2(a) illustrates a zero-mean model intended for vertical edge detection, and Figure 2(b) illustrates an image model showing a collection of pixel illumination levels for an 8x8 array.
The block consists of one or more models together with associated data. There are two types of blocks, image blocks that contain image models and synthetic blocks that contain synthetic models. The synthetic blocks are typically related to the lines in the reference model, while the image blocks are typically related to the points in the reference model.
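By way of illustration, such a block might be represented by the following data structure (a Python sketch; the field names, types and defaults are assumptions, since the text specifies the contents of a block but not a storage layout):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Block:
    """One stored block: a model plus associated data."""
    model: np.ndarray              # rectangular pixel array, e.g. 8x8 to 16x16
    ref_position: tuple            # (y, x) position in reference model coordinates
    is_synthetic: bool = False     # synthetic blocks hold predefined zero-mean models
    position: tuple = None         # last located (y, x) position in the current image
    mismatch: float = None         # error-surface value at the best match
    curvature: tuple = (0.0, 0.0)  # (horizontal, vertical) error-surface curvature
    weight: float = 1.0            # weight used by the transformation model calculator
    active: bool = True            # cleared when the block fails the retention criteria
```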
At a given time, the positions of all stored models with respect to the current image are determined by searching. Unless the search is exhaustive over a predefined area, the time taken for the search cannot normally be predetermined, and the tracking controller usually sets a time limit within which all searches must be completed. This time limit should allow enough time to update the blocks. Generally, time limits should be set so that the number of active blocks remaining after the block update can be adequately searched during subsequent fields. The number of blocks may vary as the video changes. Active blocks are those blocks that match the current image.
The position of each model is typically found by conducting a search over a limited region of the current image. An efficient way to search is to perform a match at integer pixel positions using preselected block matching criteria. In order to obtain an accurate estimate of image motion, it is desirable to estimate the position of the model to subpixel accuracy. Subpixel accuracy is necessary for the stability and precision of the transformation model, particularly when only a small number of models produce reliable position data. The search preferably takes place in two phases: an integer-position search followed by a fractional-pixel interpolation process.
For the integer-position search, each model is placed at various integer positions in the current image and an error surface is calculated using the selected block matching criteria. Normally, the position of the minimum of the error surface is used as the integer position. If extensive computational resources are available, the search can be exhaustive over a large area. For greater efficiency, the local motion and position of the model can be predicted using the recent history of the transformation models together with the locations of any models already found in the current image. The estimated local motion vector is used to determine the size and shape of the region over which the search is executed. Since large motion vectors are likely to be predicted less accurately than small motion vectors, the size of the search region should increase as the magnitude of the vector increases. It is important to check the zero vector, since replays or other video editing can disturb the predicted transformation model. The zero vector represents no motion, i.e., the position of the model in the current image is identical to its position in the previous image.
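A minimal sketch of such an integer-position search follows (the L1 norm and a square search region are assumed; in practice the region's size and shape would follow the predicted local motion vector, as described above):

```python
import numpy as np

def integer_search(model, image, center, radius):
    """Exhaustive integer-position search of one model over a square region
    centered on the predicted position. Returns the best (dy, dx) offset
    from center and the local error surface (L1 norm)."""
    mh, mw = model.shape
    m = model.astype(float)
    size = 2 * radius + 1
    errors = np.full((size, size), np.inf)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = center[0] + dy, center[1] + dx
            if 0 <= y <= image.shape[0] - mh and 0 <= x <= image.shape[1] - mw:
                patch = image[y:y + mh, x:x + mw].astype(float)
                errors[dy + radius, dx + radius] = np.abs(patch - m).sum()
    iy, ix = np.unravel_index(np.argmin(errors), errors.shape)
    return (iy - radius, ix - radius), errors
```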
Prediction is important because it reduces calculation time and can resolve position ambiguities. For example, if two equally good model locations are found, then the model location closest to the predicted position is more likely to be the correct one. Simple linear or second-order prediction techniques are usually sufficient. Better results are obtained by predicting the motion of the camera instead of the motion of the image, since cameras have physical inertia in panning and tilting, and zooming tends to be smooth. Note that video interlace can disturb simple field-to-field predictions, and care must be taken to allow for this, for instance by using frame-based prediction schemes or by allowing for the phase offset of the video interlace. Also note that position noise may be present due to camera vibration, and that during replays, or when the video has been edited, simple prediction models may not work. Therefore, it is important to allow for unpredictable behavior by continuously checking zero motion or by executing a broad search on a few selected blocks whose models have well-defined texture.
When tracking certain objects such as netting or fences, multiple error-surface minima may be present. Such multiplicities can be resolved by prediction, or by obtaining an initial estimate from those models that exhibit only a single minimum. One way to select among multiple minima is to use a weighting function that places less emphasis on minima farther from the predicted position, and to select the best weighted minimum.
One method of conducting the integer-position search is to exhaustively search a series of regions. The initial search region is centered on the predicted model position, and its size and shape depend on the local velocity of the pixels as estimated from the predicted camera motion. If the minimum occurs at an edge of the search region, then a second search is made in a region that encloses the first minimum. When a minimum is found within the searched region, that is, not at its boundary, the integer search ends successfully. To avoid spending too much time on blocks that are likely to be in error, it is better to end the search after two or three stages. If a minimum is not found, this information is written to the block data 6 so that the block can later be purged by the block updater 5.
Another option is to estimate the transformation model as the model positions are progressively determined. As more reliable estimates are obtained, the search regions can be reduced in size and the number of allowed stages decreased.
The typical block matching criteria include the L1 norm, the L2 norm and normalized correlation. The L1 norm is defined as D = Σ|d| and the L2 norm as D = Σd², where d is the difference in pixel values between the image and the model, and the sum is carried out over all the pixels in each model. The normalized correlation is defined as NC = ΣTI / √(ΣT² · ΣI²), where T represents the pixel values in the image model and I represents the pixel values in the current image.
In this description, block matching criteria are defined so that the best match corresponds to the lowest value of the selected criterion; if normalized correlation is used as the block matching criterion, the mismatch would be defined as 1 - NC. As the model moves over the current image, the resulting array of values calculated using the selected block matching criteria is called the error surface, and the best match occurs where the error surface has its minimum value. The models are a carefully selected subset of the total blocks available, rather than all possible positions. Careful selection of region and model is necessary because, unlike motion estimation in compression algorithms, the result of the present calculation is not a set of motion vectors for the blocks, but a single transformation model. In a "least squared error" sense, the single transformation model is the best descriptor of the motion of the ensemble of models. In addition, the models are placed at selected positions in the image instead of tiling the image, and the models are stored in memory rather than discarded after each image is processed.
In the present invention, the current position of a model is determined relative to its previous positions, whereas in motion estimation the previous position is determined relative to the current tiled position. Motion estimation in video coding is directed toward finding the best displacement match, that is, the one with the least coding error, for the current image from a previously transmitted image. In contrast, the position location of the present invention is directed toward visual correctness (the observer's perception of the image) of the motion of the image. In ambiguous cases it is not important how motion estimation resolves the ambiguity, but it is important how the position location method of the present invention resolves it.
Resolving the ambiguity can involve examining the model positions determined from other nearby blocks. Motion estimation has limited precision, frequently half a pixel, because of the computational and coding requirements associated with increased accuracy. In position location, however, there are no such limits on precision.
After the integer-position search ends successfully, the fractional part of the motion vector is estimated. There are several ways to do this.
The numerical values of the mismatch near the integer minimum give an error surface. The shape of the error surface depends on the block matching criteria, the model and the current image. A preferred method uses the L1 norm for the integer search followed by triangular interpolation for the fractional estimate. A one-dimensional triangular interpolation is illustrated in Figure 3: lines of equal but opposite slope are constructed through the integer minimum 32 and the two adjacent points 31 and 33. Parabolic interpolation fits a parabola through the same three points; it leads to simple two-dimensional interpolation methods and is suited to L2-norm block matching. Parabolic and triangular interpolation generally give different values for the position and magnitude of the minimum. Three-halves-power interpolation, which is intermediate between triangular and parabolic interpolation, can sometimes give better results. Using additional points to fit a cubic function or a spline is not useful for the block matching functions described above. Locating the minimum of an error surface requires a two-dimensional optimization method, and several such methods are available. The image can be expanded to give values at subpixel positions and the integer-position search applied at those subpixel positions; a second method uses a sequence of one-dimensional interpolations; and a third method fits a two-dimensional surface directly to the error surface.
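The two one-dimensional interpolations just described can be sketched as follows (three mismatch values at integer offsets -1, 0 and +1 are assumed, with b the integer minimum; the formulas follow from the constructions above):

```python
def triangular_interp(a, b, c):
    """Fractional offset of the minimum: two lines of equal but opposite
    slope are fitted through the integer minimum and its two neighbors."""
    denom = 2.0 * (max(a, c) - b)
    return (a - c) / denom if denom > 0 else 0.0

def parabolic_interp(a, b, c):
    """Fractional offset of the minimum of a parabola fitted through the
    same three points (suited to L2-norm block matching)."""
    denom = 2.0 * (a - 2.0 * b + c)
    return (a - c) / denom if denom != 0 else 0.0
```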
The first method is illustrated in Figures 6(a)-(c). The source pixels of Figure 6(a) are spread apart and the intermediate values filled in, for example by bilinear interpolation, as shown in Figure 6(b). If the expansion is by a factor of n, the model pixels are matched against every nth pixel of the expanded image and the positional precision is 1/n. The model shown in Figure 6(c) matches the expanded image exactly if it is placed 1/2 pixel to the right and 1/4 pixel down with reference to Figure 6(a). This method is computationally expensive, since n² matches must be made to obtain an accuracy of 1/n.
The second method is illustrated in Figure 4. It uses the error-surface values near the minimum found by the integer-position search 40. Minima are interpolated for the horizontal scan lines above 41, through 42 and below 43 the integer minimum, using a one-dimensional interpolation method. The final minimum 44 is interpolated from these three minima. Note that for a large class of mathematically well-behaved two-dimensional surfaces, this technique produces perfectly accurate interpolated positions.
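A sketch of this scan-line method follows (it builds on the triangular interpolation sketch above; the final horizontal blend across the scan-line minima is an assumed simplification):

```python
def subpixel_minimum(errors, iy, ix):
    """Two-dimensional subpixel estimate from the 3x3 neighborhood of the
    integer minimum (iy, ix): one-dimensional minima are interpolated along
    the scan lines above, through and below the integer minimum, then the
    final minimum is interpolated from those three minima."""
    xs, vs = [], []
    for dy in (-1, 0, 1):
        a, b, c = errors[iy + dy, ix - 1], errors[iy + dy, ix], errors[iy + dy, ix + 1]
        dx = triangular_interp(a, b, c)
        xs.append(ix + dx)                        # scan-line minimum position
        vs.append(b - abs(dx) * (max(a, c) - b))  # scan-line minimum value
    dy = triangular_interp(vs[0], vs[1], vs[2])   # vertical interpolation
    x = xs[1] + 0.5 * dy * (xs[2] - xs[0])        # follow the line of minima
    return iy + dy, x
```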
The curvatures of the error surface are also estimated. If the model defines horizontal and vertical edges, or a textured structure, then the horizontal and vertical curvatures should be estimated. A simple estimate is the expression (A - 2B + C), where B is the value of the error surface at the minimum and A and C are values equidistant from the minimum on either side. For parabolic interpolation, the positions of the measurements are not critical, so it can be computationally convenient to use integer-position values. For other interpolation methods, for example triangular, the position is important, and interpolated values can be used instead. If the model defines a diagonal edge, then the curvature perpendicular to the edge should be measured. The measured or estimated curvatures contribute to the block weights used by the transformation model calculator 3.
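The simple (A - 2B + C) estimate can be computed directly from the error surface (a sketch using integer-position values):

```python
def curvatures(errors, iy, ix):
    """Horizontal and vertical curvature estimates (A - 2B + C) at the
    integer minimum (iy, ix) of the error surface."""
    horiz = errors[iy, ix - 1] - 2.0 * errors[iy, ix] + errors[iy, ix + 1]
    vert = errors[iy - 1, ix] - 2.0 * errors[iy, ix] + errors[iy + 1, ix]
    return horiz, vert
```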
Block matching can be executed with any of the previously mentioned criteria, for example the L1 norm, the L2 norm or normalized correlation; of these, the L1 norm is computationally the simplest. In video coding it is normal to use only the luma component of the image for motion estimation. Although luma usually contains most of the high spatial frequency information, the chromaticity components can provide additional information that is particularly useful in resolving position ambiguities. The block mismatch functions can be adjusted to incorporate the chromaticity information: in all the matching criteria, the values are summed over the planes of the particular color space, using weights to combine the color planes. In the Y, U, V color space and other similar color spaces, the luma component usually contains most of the high spatial frequency information and is the most important component for positional matching. A third method of finding the two-dimensional position of the minimum of the error surface is to assume that the error surface has a particular shape near the minimum and to interpolate the position of the minimum using a method such as singular value decomposition, as described in "Numerical Recipes in C", 2nd Ed., W. H. Press et al., Cambridge University Press, 1992, p. 59. The shape can be a second-degree surface, an elliptical cone or another shape.
Care should be taken in applying two-dimensional minimization methods, since diagonal edges can result in positions that are very susceptible to pixel measurement noise. It is recommended that the minimum found by interpolation methods not be allowed to deviate by more than one pixel from the position indicated by the integer search.
A problem with simple interpolation methods is that they do not take into account the intrinsic asymmetry of the error surface. Appendix A-2 illustrates the source of this intrinsic asymmetry. The preferred way to achieve a more accurate estimate is to interpolate the error surface found using the original image, that is, the image from which the model was extracted, and to measure the offset; this offset can then be made part of the reference position. Another method is to measure the shape of the error surface using the original image, and then calculate the position of the minimum based on deviations from this shape instead of the shape measured in subsequent images.
In order to extend the search range without incurring a large computational penalty, a multiple-resolution search may be employed. The image is first low-pass filtered and subsampled to provide a series of lower-resolution images. Sets of blocks are stored at each resolution level. Search and block matching are executed at each resolution level, starting at the lowest. A transformation model is calculated for each resolution level and used to predict the block positions at the next higher resolution level. This process reduces the search range at each resolution level. The transformation model is refined at each resolution level, and the final transformation model is obtained at the highest resolution level. In some cases, for example rapid camera motion that blurs image detail, it may not be possible to calculate an accurate transformation model at the highest resolution level. In such cases the transformation model calculated at a lower resolution level can and should be used.
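A sketch of the pyramid construction follows (2x2 block averaging stands in for the low-pass filter, which is not specified here; the search itself then proceeds from the last, lowest-resolution level back to the first):

```python
import numpy as np

def build_pyramid(image, levels):
    """Low-pass filter and subsample to a series of lower-resolution
    images, full resolution first."""
    pyramid = [image.astype(float)]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        h, w = (prev.shape[0] // 2) * 2, (prev.shape[1] // 2) * 2
        # average 2x2 neighborhoods: combined low-pass filter and subsample
        avg = prev[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        pyramid.append(avg)
    return pyramid
```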
TRANSFORMATION MODEL CALCULATOR A reference model is a representation of a scene in a coordinate system that is independent of the current image coordinates. A transformation model defines the way in which the reference model must be changed to correspond to the current image. Simple transformation models use three parameters: zoom, horizontal shift and vertical shift. More complete transformation models use more camera parameters, including pan, tilt, rotation, zoom and focal length. Camera models account for the changes of perspective in the scene. More complex transformation models can account for additional changes such as mounting offsets, lens distortions and lighting variations. A camera model is illustrated in Figures 10(a)-(c), which show how a camera can be defined in terms of pan, tilt and rotation angles together with focal length, image size and optical axis offset.
Several techniques can be used to calculate the transformation model. The preferred technique is a weighted mean-squared-error method that seeks to minimize the mean squared error of a mismatch function. A useful addition to this technique is to vary the weights of the block positions dynamically. This reduces the effect of outliers, which otherwise degrade mean-squared-error methods. Outliers are those blocks whose positions differ significantly from the positions determined by the majority of the blocks. Dynamically varying the weights is important, since blocks may be in error because of image content rather than the random noise that is assumed by many mean-squared-error methods. The preferred method first adjusts the horizontal and vertical weights for each block according to the corresponding curvatures of the error surface. A preliminary transformation model is then calculated using the mean-squared-error reduction method. Each block is evaluated to determine how well it conforms to the preliminary transformation model, and the block weights are then modified according to the spatial error. The final model is calculated using the modified weights. This two-stage technique reduces or eliminates the effect of outliers. One way of calculating the transformation model is given in Appendix A-4.
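A sketch of this two-stage weighted fit follows (a generic weighted least-squares solve is used; the down-weighting formula is an illustrative assumption, and a single weight per block is assumed although the text allows separate horizontal and vertical weights):

```python
import numpy as np

def fit_transformation_model(X, Y, x, y, w):
    """Two-stage weighted fit of the three-parameter model (zoom z,
    horizontal shift u, vertical shift v). X, Y are reference positions,
    x, y the measured positions, w the initial block weights."""
    def solve(weights):
        n = len(X)
        A = np.zeros((2 * n, 3))
        b = np.concatenate([x, y]).astype(float)
        A[:n, 0], A[:n, 1] = X, 1.0   # x_i = z*X_i + u
        A[n:, 0], A[n:, 2] = Y, 1.0   # y_i = z*Y_i + v
        sw = np.sqrt(np.concatenate([weights, weights]))
        sol, *_ = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)
        return sol                     # (z, u, v)

    z, u, v = solve(w)                                 # preliminary model
    residual = np.hypot(x - (z * X + u), y - (z * Y + v))
    z, u, v = solve(w / (1.0 + residual ** 2))         # down-weight outliers
    return z, u, v
```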
TRANSFORMATION MODEL CONFIRMER Confirmation of the transformation model is executed by examining synthetic or predefined blocks. For example, if a soccer goal post is known to be lighter than the background, it can be detected using the models of Figure 9(a) for the vertical posts and Figure 9(b) for the horizontal crossbar, placed as shown in Figure 9(c). The block matching method for these zero-mean models would be to maximize the sum of products of the model pixels and the image pixels. The models marked with an asterisk in Figure 9(c) confirm the presence of a goal post, as opposed to some other image feature consisting of intersecting lines, by testing that the crossbar does not extend outside the vertical posts and that the vertical posts do not extend below the crossbar. If a sufficient number of blocks are active and give reliable position information, a separate transformation model can be calculated from those blocks and used to partially adjust the main transformation model. If only a few are visible, then the adjustment should be small so as not to unduly disturb the transformation model. If the predefined blocks cannot be found, or if the mismatches are of increasing amplitude, then a scene cut may have occurred, or a wipe or a fade may be in progress. A detailed evaluation of the predefined blocks, together with the recent history of the image blocks, allows a determination of the scene transition.
If mismatch occurs in almost all the models and increases progressively, then a fade is indicated. If mismatch occurs along a boundary line that divides the image, then a wipe is indicated. The problem of reliably detecting scene transitions is simplified if it is known in advance what kinds of transitions can occur. The problem of detecting an unknown transition is difficult, since such transitions can resemble the changes that take place in continuous action.
Note that scene transition information may be made available by means of a separate signal, perhaps incorporated in the vertical blanking interval or coded within the image itself. An estimate of the reliability of the transformation model can be used during this confirmation stage.
Less reliable transformation models may require more extensive confirmation. Reliability can be estimated from the sum of the weights as calculated by the transformation model calculator. This sum takes into account the number of blocks, their texture or error-surface curvature, and the mismatch from the transformation model.
Once the transformation model has been found and confirmed, the insertion can be carried out using the methods described in United States Patent 5,264,933 or the method described in copending application Serial No. 08/______, entitled "Tapestry".
BLOCK UPDATER The blocks are examined periodically to determine whether or not they should be retained. The block update is preferably executed in odd fields only, or in even fields only, in order to reduce stability problems caused by video interlace. In the preferred embodiment, updating is done on even-numbered tracking fields, counting the first tracking field as zero.
There are two stages in the block update procedure: purging old blocks and assigning new blocks.
The first stage of the block update procedure is to purge blocks that do not meet the retention criteria. In order to be retained for further use, each stored block must typically satisfy the following retention criteria:
• The stored block must be in the image safe area (for example, not in the horizontal blanking area).
• The stored block must not be in an active image region (for example, not under overlaid on-screen graphics).
• The stored block position must be consistent with the current transformation model.
• The stored block must have sufficient error-surface curvature.
There may also be additional application-specific retention criteria. For example, when tracking a turf playing field, the model may be required to overlap only the turf and not the players. A sketch of such a retention check follows.
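(In this sketch the thresholds are illustrative assumptions, the on-screen-graphics test is omitted, and the Block structure from the earlier sketch is assumed.)

```python
import numpy as np

def predicted_position(block, model):
    """Map the block's reference-model point through the transformation
    model (z, u, v): x = z*X + u, y = z*Y + v."""
    z, u, v = model
    Y, X = block.ref_position
    return z * Y + v, z * X + u

def retain(block, model, image_shape, margin=16, max_error=2.0, min_curv=4.0):
    """Retention check for one stored block against the criteria above."""
    if block.position is None:
        return False
    y, x = block.position
    h, w = image_shape
    if not (margin <= y < h - margin and margin <= x < w - margin):
        return False                          # outside the image safe area
    py, px = predicted_position(block, model)
    if np.hypot(y - py, x - px) > max_error:
        return False                          # inconsistent with the transformation model
    return min(block.curvature) >= min_curv   # sufficient error-surface curvature
```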
The second stage of the block update procedure is to assign, or extract, new blocks. Blocks are first assigned to predefined positions in the reference model, and then to random positions within a search area in the reference model as transformed to the image.
It is important always to complete the first stage of the update procedure so that invalid blocks are eliminated. The second stage can be ended when time runs out or when a sufficient number of models have been captured. This procedure dynamically adjusts the number of blocks stored for matching.
The image models are copied from the image, optionally processed and stored in a memory. The number of models extracted may depend on the available processing time.
For extraction purposes, the image blocks can be divided into two types: point blocks and area blocks. Point blocks have predefined positions in the reference image. An example of a point block could be the corner of a soccer goal post. A model is assigned to the image position closest to that calculated from the reference model using the transformation model. If the criteria for successful storage are met, its actual position in the reference model is stored. This position will deviate from the reference position by less than half an image pixel, calculated using the transformation model for the image from which the model was copied. Area blocks are assigned randomly within a search area in the reference model. If they meet the criteria for successful storage, their actual positions in the reference model are stored.
To make efficient use of the available processing resources, each extracted model must satisfy certain model capture criteria. Its position must be in a safe area; that is, each extracted model must be away from the edges of the image and, in particular, away from any obscuration or other effects due to video blanking. In addition, each extracted model must be in the search area, that is, in an area known to the controller from previous analysis. For example, the models in a stadium may be captured from the stands or the stadium structures instead of the playing field, in order to avoid spatial disturbances due to the movement of the players. Each extracted model must be predicted not to leave those areas, the prediction being based on the recent history of camera motion. In addition, each extracted model must not be in any of the exclusion areas, for example a region of the image that displays an on-screen message independent of the source video, and must be predicted to avoid those areas in the immediate future. Finally, each extracted model should not overlap any of the existing models, for efficiency, although a slight overlap can be allowed. Each model must also possess sufficient texture for the selected block matching criteria to work. The texture can be determined by one of a number of means, for example by measuring the luma variance, or by applying the model to the source image and measuring the shape of the error surface. If all these conditions are satisfied, then the image model is extracted.
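A sketch of the texture and overlap portions of these capture criteria follows (the variance threshold is an illustrative assumption; safe-area, search-area and exclusion-area tests are left to the caller):

```python
import numpy as np

def capture_model(image, y, x, size, existing_positions, min_variance=25.0):
    """Attempt to extract one square image model at (y, x). Returns the
    pixel array, or None if the capture criteria are not met."""
    patch = image[y:y + size, x:x + size]
    if patch.shape != (size, size):
        return None                      # runs off the image
    if float(np.var(patch)) < min_variance:
        return None                      # not enough texture (luma variance)
    for ey, ex in existing_positions:    # reject overlap with stored models
        if abs(ey - y) < size and abs(ex - x) < size:
            return None
    return patch.copy()
```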
In certain situations there may be additional capture restrictions on the models. These can relate to the color or texture of the image itself. For example, if a red object is to be tracked, then all the models must include some red pixels. As another example, if the grass surface of a playing field is to be tracked, then the models should exclude regions that do not contain grass colors, in order to exclude the players. Grass can be defined as a certain volume in a three-dimensional color space. An additional calculation would allow the playing field lines to be included in the models.
The models can be processed in a number of ways. They can be filtered to reduce noise and other artifacts, although this can have the undesired effect of reducing the spatial accuracy of matching. They can be compared with models previously captured in the same area and averaged to reduce noise. They can be adjusted for zoom or perspective variations based on the calculated camera motion.
Instead of copying a new model from the current image, inactive blocks can be reactivated by retrieving their models from memory and matching them against the current image. They can be matched directly, or changed in amplitude, shape or brightness, or otherwise adjusted to conform to the current transformation model and image illumination. This has the advantage of increasing long-term stability.
The shape of the model's error surface is important.
The directions and values of the maximum and minimum curvature should be determined, so that it can be established whether the model represents a vertical or horizontal edge or has a two-dimensional structure. One way to do this is to use block matching to generate an error surface for the source image; the curvature of the error surface indicates the type of image feature. Some methods of calculating the transformation model do not recognize diagonal edges, and the presence of such edges can reduce the accuracy of the model. On the other hand, such models will be purged when they give an incorrect position, and so have no long-term effect on the accuracy of the transformation model. However, for scenes where diagonal lines form an important part of the spatial location information, for example tennis courts, the diagonal lines must be recognized and used to provide position information in the perpendicular direction only.
Figure 7 illustrates the selection of blocks for following the motion of a football field. The blocks are selected so that the complete block, plus a safety region around each block, consists entirely of the playing surface, in this case grass. Grass is defined by a certain volumetric shape in three-dimensional color space. If any pixel falls outside this volume, the block is rejected.
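A sketch of this test follows (an axis-aligned box in Y, U, V color space stands in for the volumetric shape; the bounds are illustrative assumptions only):

```python
import numpy as np

def block_is_grass(block_yuv, lo=(60, 100, 100), hi=(200, 140, 120)):
    """Accept a candidate block only if every pixel of the block (and of
    its safety region, if included by the caller) lies inside a volume of
    three-dimensional color space representing grass."""
    lo, hi = np.asarray(lo), np.asarray(hi)
    inside = ((block_yuv >= lo) & (block_yuv <= hi)).all(axis=-1)
    return bool(inside.all())            # any outlying pixel rejects the block
```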
Figure 8 illustrates the tracking procedure as the camera pans and tilts. For simplicity, the illumination and zoom are assumed constant. The lower right corner of the goal post is assumed to follow the path A-B-C-D-E-F relative to the image outline. During the trajectory from A to B, tracking is executed using a mixture of predefined texture blocks and image blocks. A confirmation of the complete model, based on the size and shape of the goal posts, is possible. At point B the horizontal crossbar disappears from view. Complete model confirmation is no longer possible, although zoom and pan can still be calculated from the reference-point blocks alone. The blocks that were on the crossbar are reassigned to other parts of the image. From B to C, tracking continues using the stored blocks. At point C the right vertical post disappears from view and the blocks associated with it are reassigned. At point D the horizontal crossbar is predicted to reappear, and a search is conducted using the reference-point blocks. After the crossbar is located, any discrepancies in the model are resolved slowly, so as not to disturb the insertion location. At point E the right vertical post becomes visible again and, between E and F, full model confirmation is once more possible.
During tracking, area blocks tend to migrate to the areas of the image that most precisely define the transformation model. For example, assume that a scene cut occurs to a scene in which large foreground objects are visible. The area blocks will be randomly assigned to both the background and the foreground. Unless the foreground has the most blocks and moves with internal consistency, the transformation model will be defined by the background. As soon as part of the foreground moves relative to the background, any blocks assigned to it become incompatible with the transformation model and are purged and then randomly reassigned, eventually to the background. Block migration increases the stability of the insertion position.
Another type of migration, between block types, is illustrated by a typical clay-court tennis match. At the beginning of the match, the lines of the court are clean and well marked, and the surface of the court is uniform. The lines will be covered by synthetic blocks, and there will be few image blocks on the surface of the court. During the match, the lines typically darken, and the surface of the court becomes rougher and gains texture. The synthetic blocks are progressively purged, and image blocks are progressively added over the court surface.
Blocks can be purged while their models are stored for possible future use. For example, if the zoom has changed so that a model no longer provides a good match, a new model can be captured and the old one placed in long-term storage. At some future time the zoom may return to its previous value, in which case the old model can be retrieved and tested to see whether or not it provides a useful match.
APPENDICES Four appendices are included herewith which further describe and illustrate certain aspects of the present invention. Appendix A-1 is a comparison of selected block matching criteria applied to possible current image blocks. Appendix A-2 is a more detailed description of one-dimensional fractional estimation, illustrating the source of the error-surface asymmetry associated with simple interpolation methods. Appendix A-3 is a glossary of terms used throughout the text of this document. Finally, Appendix A-4 is a specific method of calculating the optimal three-parameter non-perspective tracking model from the measured positions of image-derived models.
APPENDIX A-1: A COMPARISON OF SOME BLOCK MATCHING CRITERIA
Consider matching the following 2x2 model block:
  1 2
  4 1
against the following image blocks:
  A = 1 3    B =  9 21    C = 30 30
      4 1        39 11        30 30
The best match is block A, which differs by only one level in one pixel. Block B has a similar shape but a much greater amplitude, and block C is uniform.
The matches are evaluated using the following criteria:
• L1 = L1 norm
• L2 = L2 norm
• BA = 2ΣIT / (ΣI² + ΣT²)
• NC = normalized correlation
• TI = ΣTI
• ZI = ΣIZ, where Z are the zero-mean pixel values of the model, as shown below:
  -1  0
   2 -1
Note that the value of the pixel in the upper right corner of the image has no effect on ZI, suggesting that this is a weak matching criterion, since multiplication by zero produces a null value at that pixel location.
The results for the different matching criteria are shown in the following table, where an asterisk (*) marks the best match.
  Criterion     A         B         C
  L1 (min)      1*        72        112
  L2 (min)      1*        1750      3142
  BA (max)      0.9796*   0.1995    0.1325
  NC (max)      0.9847    0.9991*   0.8528
  TI (max)      24        218       240*
  ZI (max)      6         58*       0
It can be seen that the first three criteria, L1, L2 and BA, work well. Normalized correlation (NC) has some potential problems, although in real images the chances of finding an image block with the same shape but a different amplitude are small. TI and ZI are not recommended, since there are many possible image blocks that give a higher score than a perfect match.
APPENDIX A-2: ONE-DIMENSIONAL FRACTIONAL ESTIMATION
Shape of the Interpolation. Consider matching the six-element model block 1 1 1 2 2 2 against an image containing a corresponding well-defined edge: ... 1 1 1 1 2 2 2 2 ... The block mismatch functions can be computed for the normalized correlation (NC), the L1 norm and the L2 norm. Both L1 and L2 have a triangular shape. The normalized correlation has an asymmetric shape that is more peaked than the triangle.
Consider matching the six-element model block 1 1 2 3 4 4 against an image containing a corresponding smooth edge: ... 1 1 1 2 3 4 4 4 ... Here L1 has a triangular shape, L2 has a shape that is close to parabolic, and the normalized correlation has an asymmetric shape that is between a triangle and a parabola.
Movement of Lines. Consider matching the four-element model block 2 2 4 4 against the image 2 2 4 4 4 2. Parabolic interpolation of the maximum using normalized correlation gives a maximum 0.194 pixels to the left of the correct center position. Parabolic interpolation of the minimum using L1 gives 0.167 pixels to the left. Triangular interpolation of the maximum using normalized correlation gives a maximum 0.280 pixels to the left of the correct center position. Triangular interpolation of the minimum using L1 gives 0.250 pixels to the left.
This shows that, in general, interpolation using normalized correlation or L1 produces only approximate positions of the best match.
Shifting the image 0.5 pixel to the right gives 2 2 2 3 4 3. Parabolic interpolation of the maximum using normalized correlation now gives a maximum 0.105 pixels to the right of the center position. Parabolic interpolation of the minimum using L1 gives 0.167 pixels to the right. Triangular interpolation of the maximum using normalized correlation gives a maximum 0.173 pixels to the right of the center position. Triangular interpolation of the minimum using L1 gives 0.25 pixels to the right. Thus parabolic interpolation gives a net right shift of 0.299 pixels using normalized correlation, and 0.333 pixels using L1, both less than the correct value of 0.5 pixel. Triangular interpolation gives a net right shift of 0.453 pixels using normalized correlation, and 0.5 pixels using L1, both much closer to the correct value.
Movement of Large Edges. Consider matching the four-element model block 1 1 3 3 against the image 1 1 1 3 3 3. Parabolic interpolation of the maximum using normalized correlation gives a maximum 0.078 pixels to the right of the correct center position.
Using L1, both parabolic and triangular interpolation give the correct position.
Shifting the image 0.5 pixel to the right gives 1 1 1 2 3 3 3.
Parabolic interpolation of the maximum using normalized correlation now gives a maximum 0.167 pixels to the right of the center position. In other words, moving the edge by 0.5 pixel shifts the interpolated position by only 0.089 pixel. Using L1, both parabolic and triangular interpolation give the correct position.
Movement of Small Edges. Consider matching the four-element model block 4 4 6 6 against the image 4 4 4 6 6 6.
The three normalized correlation values around the match are 0.9843, 1.0000 and 0.9863. Parabolic interpolation gives a maximum 0.034 pixels to the right of the correct center position. L1 gives the correct position.
Shifting the image 0.5 pixel to the right gives 4 4 4 5 6 6 6. Parabolic interpolation now gives a maximum 0.469 pixels to the right of the center position. In other words, moving the edge by 0.5 pixel moves the interpolated point by 0.435 pixel. L1 gives the correct position.
This shows that for well-defined edges L1 gives a better estimate of the fractional position than the normalized correlation.
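Combining the two sketches above reproduces the flavor of the small-edge experiment. The offsets printed depend on the correlation normalization assumed in the first sketch, so they need not match the figures quoted in the text, but the qualitative conclusion is the same.

```python
tpl = [4, 4, 6, 6]
img = [4, 4, 4, 5, 6, 6, 6]          # the small edge, shifted 0.5 pixel right
curves = mismatch_curves(tpl, img)   # from the first sketch
for name in ("L1", "1-NC"):
    e = curves[name]
    k = e.index(min(e))              # best integer position
    print(name,
          k + parabolic_offset(e[k - 1], e[k], e[k + 1]),
          k + triangular_offset(e[k - 1], e[k], e[k + 1]))
```

With L1 both interpolators land on the true edge position of 1.5, consistent with the conclusion that L1 gives the better fractional estimate.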
APPENDIX A-3: GLOSSARY OF TERMS

active block: a block that meets the specified matching criteria and lies within the current image.
affine model: a transformation model expressed by linear operations.
area block: an image block that is pre-assigned to an area in the reference model.
background: that part of a scene that remains stationary with respect to the camera mount.
block: one or more models plus associated data containing position information for the reference model and the current image.
camera model: a transformation model that is expressed by the camera parameters only, for example pan, tilt, zoom and roll.
error surface: a two-dimensional array of values indicating the mismatch between a model and a part of the current image.
foreground: that part of a scene that moves with respect to the camera mount.
image block: a block containing an image model.
image model: a model derived from an image.
image texture: a measure of pixel-to-pixel variation in lighting levels.
local motion vector: the apparent visual motion of a small piece of the image from one frame or field to the next.
minimum: the point where the error surface is at its lowest value, indicating the best match between a model and the current image.
point block: an image block that is pre-assigned to a specific point in the reference model.
reference model: a representation of the scene of interest in a coordinate system that is independent of the image coordinates.
synthetic block: a block containing a synthetic model.
synthetic model: a predefined model not derived from any particular image; since neither the average illumination level nor the magnification of an image is known in advance, synthetic models are frequently zero-mean edge models.
model: an array of pixels.
texture tracking: image tracking using models copied from the image, according to the method described herein.
transformation model: defines the way in which the reference model must be transformed in order to correspond to the current image.

APPENDIX A-4: ESTIMATION OF THE TRACKING MODEL

A method is derived for calculating the optimal three-parameter, perspective-free tracking model from image model measurements. The conditions under which the weighted mean squared error derivation can produce the "wrong" model are analyzed, and a method of calculating the model that avoids this problem is developed.
The transformation model calculation problem can be stated as follows: given a reference model containing a set of points P and a current image containing a set of matching points p, what is the best estimate of the transformation model? The standard approach is to minimize some function of the displacement errors. A convenient measure is the weighted mean squared error. The weights can be based on the displacement errors predicted from previous fields and from the other points in the live field. The weights should also incorporate some measure of the reliability or accuracy of the position measurement. For simplicity this appendix considers only displacement errors, and only a three-parameter transformation model consisting of zoom, horizontal displacement and vertical displacement.
Definition of the Model

The transformation model is defined in terms of three parameters: zoom z, horizontal displacement u and vertical displacement v. If the reference image has a set of points (X, Y), then the corresponding current image points (x, y) are given by

$$x = zX + u \qquad (1)$$
$$y = zY + v \qquad (2)$$

Inverting yields

$$X = (x - u)/z \qquad (3)$$
$$Y = (y - v)/z \qquad (4)$$

MSE Equations

The total squared error of the transformation is

$$E = \sum_i w_{xi}(x_i - zX_i - u)^2 + \sum_i w_{yi}(y_i - zY_i - v)^2$$

where $w_{xi}$ is the weight associated with the horizontal displacement of the i-th point and $w_{yi}$ is the weight associated with its vertical displacement. One reason for needing different weights is that the vertical direction in a field is affected by interlace whereas the horizontal direction is not. At the optimum:

$$\frac{\partial E}{\partial z} = -2\sum_i w_{xi}(x_i - zX_i - u)X_i - 2\sum_i w_{yi}(y_i - zY_i - v)Y_i = 0 \qquad (5)$$
$$\frac{\partial E}{\partial u} = -2\sum_i w_{xi}(x_i - zX_i - u) = 0 \qquad (6)$$
$$\frac{\partial E}{\partial v} = -2\sum_i w_{yi}(y_i - zY_i - v) = 0 \qquad (7)$$

Solving equations 5, 6 and 7 gives

$$z = \frac{\sum_i w_{xi}x_iX_i + \sum_i w_{yi}y_iY_i - \dfrac{\sum_i w_{xi}x_i\,\sum_i w_{xi}X_i}{\sum_i w_{xi}} - \dfrac{\sum_i w_{yi}y_i\,\sum_i w_{yi}Y_i}{\sum_i w_{yi}}}{\sum_i w_{xi}X_i^2 + \sum_i w_{yi}Y_i^2 - \dfrac{\left(\sum_i w_{xi}X_i\right)^2}{\sum_i w_{xi}} - \dfrac{\left(\sum_i w_{yi}Y_i\right)^2}{\sum_i w_{yi}}} \qquad (8)$$

$$u = \frac{\sum_i w_{xi}x_i - z\sum_i w_{xi}X_i}{\sum_i w_{xi}} \qquad (9)$$

$$v = \frac{\sum_i w_{yi}y_i - z\sum_i w_{yi}Y_i}{\sum_i w_{yi}} \qquad (10)$$

Equations 8, 9 and 10 allow the model to be calculated directly from the current image points.
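Equations 8 through 10, as reconstructed above, translate directly into code. The function below is a sketch with variable names chosen to mirror the appendix; it is not taken from any actual implementation.

```python
def solve_zoom_shift(X, Y, x, y, wx, wy):
    """Closed-form weighted least-squares fit of x = zX + u, y = zY + v."""
    Wx, Wy = sum(wx), sum(wy)
    Sx  = sum(w * a for w, a in zip(wx, x))
    SX  = sum(w * A for w, A in zip(wx, X))
    Sy  = sum(w * b for w, b in zip(wy, y))
    SY  = sum(w * B for w, B in zip(wy, Y))
    SxX = sum(w * a * A for w, a, A in zip(wx, x, X))
    SyY = sum(w * b * B for w, b, B in zip(wy, y, Y))
    SXX = sum(w * A * A for w, A in zip(wx, X))
    SYY = sum(w * B * B for w, B in zip(wy, Y))
    num = SxX + SyY - Sx * SX / Wx - Sy * SY / Wy   # numerator of equation (8)
    den = SXX + SYY - SX * SX / Wx - SY * SY / Wy   # denominator of equation (8)
    z = num / den
    u = (Sx - z * SX) / Wx                          # equation (9)
    v = (Sy - z * SY) / Wy                          # equation (10)
    return z, u, v

# Points generated with z = 2, u = 3, v = 1 are recovered exactly:
X, Y = [0.0, 1.0, 2.0], [0.0, 1.0, 2.0]
print(solve_zoom_shift(X, Y, [2 * a + 3 for a in X], [2 * b + 1 for b in Y],
                       [1.0] * 3, [1.0] * 3))       # -> (2.0, 3.0, 1.0)
```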
Weighting Function

The weighting function based on the displacement error should have the following characteristics: positive and negative displacements should contribute equally to the weight; for small displacements the weight should be unity; for large displacements, where the point is obviously in error, the weight should be zero; and the transition for intermediate displacements should be smooth.
Many weighting functions meet these criteria. The preferred function is

$$W = \frac{1}{1 + Gd^2} \qquad (14)$$

where G is the weighting constant and d is the distance between the predicted and measured positions.
The optimal position can be found by an iterative procedure: starting with an initial position or an initial set of weights, new positions and weights are calculated alternately. The starting conditions can be derived from previous fields. As the iterations progress, points that are outliers, and therefore have small weights, can be re-examined to determine whether a valid point near the predicted point can be found. For example, a nearby object may initially be mistaken for the desired point, but as the position estimates are refined it may become possible to detect the correct point by searching a small region around the predicted position.
The iterative procedure can converge to an optimum that depends on the starting conditions and on G. When G is sufficiently small, there is a single optimum. When G is large there are often many optima. Some are stable, that is, a small perturbation is restored by the iterative procedure, and some are unstable, that is, a small perturbation leads to a new optimum.
To avoid being trapped in a local optimum where only a small number of points have significant weights, the iterative procedure can be started with unit weights. The result can then be compared with that obtained by starting from a prediction based on previous fields. If the results agree to within the measurement error, a filtered predicted value can be used for the model. If the results differ significantly, then the result based on the live field should be used instead, since the difference may be due to an unpredictable change.
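A minimal sketch of this iterative procedure, built on the closed-form solver above and the weighting function of equation 14. The value of G, the fixed iteration count, and the use of a single Euclidean distance d for both the horizontal and vertical weights are illustrative assumptions.

```python
def track_iteratively(X, Y, x, y, G=1.0, iters=10):
    wx = [1.0] * len(x)                 # start from unit weights, as suggested
    wy = [1.0] * len(y)
    z = u = v = 0.0
    for _ in range(iters):
        z, u, v = solve_zoom_shift(X, Y, x, y, wx, wy)
        for i in range(len(x)):
            dx = x[i] - (z * X[i] + u)  # measured minus predicted position
            dy = y[i] - (z * Y[i] + v)
            w = 1.0 / (1.0 + G * (dx * dx + dy * dy))   # equation (14)
            wx[i] = wy[i] = w
    return z, u, v
```

Outlier points acquire small weights and stop influencing the fit, which is precisely why the choice of G becomes critical, as analyzed next.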
Critical Values of the Weighting Constant

The transition between one optimum and more than one optimum occurs at a critical value of G. The critical value depends on the displacement of the image points. The critical values of G will be calculated for a simple but important case. Assume that the reference image consists of a set of points along a straight line; without loss of generality the line is taken to be vertical. Assume that in the live image a fraction r of the points is displaced horizontally by a distance H, perhaps due to a nearby object. If G is small there is a single optimum, whereas if G is large there are two stable optima, one near the line and one near the object, with an additional unstable optimum between them.
Assume that the line in the reference image is at X = 0; then equation 9 simplifies to

$$x = \sum_i w_i x_i \Big/ \sum_i w_i$$

Substituting the weights of equation 14 gives

$$x = \frac{\dfrac{rH}{1 + G(H-x)^2}}{\dfrac{r}{1 + G(H-x)^2} + \dfrac{1-r}{1 + Gx^2}}$$

This equation can be rewritten as a cubic:

$$Gx^3 - (2-r)GHx^2 + \left(1 + (1-r)GH^2\right)x - rH = e \qquad (15)$$

Equation 15 has been expressed in terms of a residual error e; the optimal positions correspond to e = 0. The equation can be rewritten by introducing the dimensionless variables J and s:

$$J = GH^2 \qquad (16)$$
$$s = x/H \qquad (17)$$
$$Js^3 - (2-r)Js^2 + \left(1 + (1-r)J\right)s - r = e \qquad (18)$$

When J is zero the optimal position is s = r. This single optimum is stable, which is equivalent to saying that the unweighted optimal position of the line is the arithmetic mean of the measured live image points.
When r = 0.5, an optimum always exists at s = 0.5. When J is small this optimum is stable. When J is large this optimum is unstable and two stable optima exist at larger and smaller values of s. The critical value of J can be calculated by differentiating equation 18 with respect to s and setting the derivative equal to zero at s = 0.5. The critical value of J found by this method is 4.
For smaller values of r the critical value can be calculated as follows. Differentiating equation 18 and setting the derivative to zero gives the stationary points:

$$s = \frac{(2-r) \pm \sqrt{1 - r + r^2 - 3/J}}{3} \qquad (19)$$

This can be substituted into equation 18 with e = 0 to give an equation in J and r. For a given r this equation can be solved numerically to find the critical value of J. If r is less than 0.5, the upper stationary point must be used to determine the critical value. This method produces the following critical values:

r      J
1/2    4.0
1/3    21.5
1/4    45.7
1/6    118
1/10   358
1/20   1518

For example, suppose a line of six points has one point displaced by 10 pixels. From the table above, the critical value of J is 118, and from equation 16 the critical weighting constant is G = J/H² = 1.18. Smaller values of G give a single optimum and larger values give two stable optima.
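The tabulated critical values can be checked numerically from equation 18. The sketch below counts the real roots of the residual cubic on a grid and bisects on J for the boundary between one optimum and three; the grid size and bracketing bounds are arbitrary illustrative choices.

```python
def n_optima(J, r, steps=20000):
    """Count sign changes of e(s) from equation (18) for s in (-0.5, 1.5)."""
    e = lambda s: J * s**3 - (2 - r) * J * s**2 + (1 + (1 - r) * J) * s - r
    roots, prev = 0, e(-0.5)
    for k in range(1, steps + 1):
        cur = e(-0.5 + 2.0 * k / steps)
        if (prev < 0) != (cur < 0):
            roots += 1
        prev = cur
    return roots

def critical_J(r, lo=0.1, hi=1.0e5):
    """Bisect for the J at which a second stable optimum appears."""
    for _ in range(50):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if n_optima(mid, r) == 1 else (lo, mid)
    return lo

print(critical_J(0.5))      # expected near 4.0
print(critical_J(1.0 / 6))  # expected near 118
```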
To select a value of G for a tracking application, several approaches are possible. One of the simplest is to assume a measurement accuracy of H pixels: given two points separated by more than H, the optimum should favor one or the other. Therefore, if H were 2 pixels, G would be 1.0.
It should be understood that the foregoing description is illustrative of the present invention. Modifications may readily be devised by those of ordinary skill in the art without departing from the spirit or scope of the present invention.

Claims (22)

1. A method for tracking motion in a stream of video images, characterized in that it comprises the steps of: a) obtaining a set of image models (blocks) from a current video image that meet certain model capture criteria and storing such image models in memory; b) determining the position of each stored image model with respect to the current image; c) calculating a transformation model using the determined model positions with respect to the current image, the transformation model being used to map reference position data to current image position data; d) purging from memory image models that do not meet certain model retention criteria; and e) obtaining new image models from the current image to replace the image models that were purged.
2. The method according to claim 1, characterized in that the purging step (d) and the obtaining step (e) are executed either in odd video fields only or in even video fields only, in order to reduce stability problems caused by video interlace.
3. The method according to claim 1, characterized in that the obtaining step (e) is terminated after a pre-established time limit or after a sufficient number of image models have been obtained, whichever occurs first.
4. The method according to claim 1, further characterized in that it comprises the steps of: f) determining an error surface that indicates the mismatch between each image model and the current image in a region near the determined model position; g) evaluating the error surface to determine its minimum value in order to determine the best match between the image models and the current image; and h) using the error surface in the calculation of the transformation model.
5. The method according to claim 4, characterized in that it further comprises the step of: i) confirming the accuracy of the transformation model by comparing its corresponding results against a set of previously defined synthetic models.
6. The method according to claim 5, characterized in that determining the position of each stored image model with respect to the current image comprises the steps of: j) executing an integer position search in order to determine the minimum value of such error surface; and k) upon completion of the integer position search, executing a fractional pixel interpolation in order to estimate the fractional part of the motion of a small piece of the image from the previous image to the current image.
7. The method according to claim 6, characterized in that executing the integer position search comprises the steps of: l) placing each model at several integer positions in the image and calculating an error surface for each location using the specified block matching criteria; m) searching a series of search regions, the initial search region being centered around a predicted model position derived from an estimate of the motion of a small piece of the image from the previous image to the current image, the image model determining the size and shape of such search; n) completing the search successfully if a minimum is found within the predicted search region; and o) terminating the integer position search unsuccessfully if, after several attempts, a minimum cannot be found within the predicted search region, and storing information related to the unsuccessful search so that the block can be purged later.
8. The method according to claim 7, characterized in that the integer position search uses linear prediction techniques.
9. The method according to claim 7, characterized in that the integer position search uses second-order polynomial prediction techniques.
10. The method according to claim 7, characterized in that the error surface indicating the mismatch between the image model and the current image in a region near the determined model position is calculated according to a block matching technique in which M represents the mismatch value, N represents a normalized correlation calculation, I represents the pixel values in the current image, and T represents the pixel values in the image model.
11. The method according to claim 7, characterized in that the error surface indicating the mismatch between the image model and the current image in a region near the determined model position is calculated according to a block matching technique in which M represents the mismatch value, BA represents an error surface calculation, I represents the pixel values in the current image, and T represents the pixel values in the image model.
12. The method according to claim 7, characterized in that the error surface indicating the mismatch between each image model and the current image in a region near the determined model position is calculated according to the following block matching technique: M = 1 - L1norm = 1 - Σd, where M represents the mismatch value, L1norm represents the error surface calculation, and d represents the difference in pixel values between the image model and the current image.
13. The method according to claim 12, characterized in that the fractional pixel interpolation uses a triangular interpolation method.
14. The method according to claim 7, characterized in that the error surface indicating the mismatch between each image model and the current image in a region near the determined model position is calculated according to the following block matching technique: M = 1 - L2norm = 1 - Σd², where M represents the mismatch value, L2norm represents an error surface calculation, and d represents the difference in pixel values between the image model and the current image.
15. The method according to claim 14, characterized in that the fractional pixel interpolation uses a parabolic interpolation method.
16. The method according to claim 7, characterized in that the fractional pixel interpolation uses a three-halves power interpolation method.
17. The method according to claim 7, characterized in that evaluating the error surface to determine its minimum value in order to determine the best match between the image models and the current image comprises the steps of: p) expanding the image model to produce subpixel position values; and q) executing a further integer position search according to step (j) above at those subpixel locations.
18. The method according to claim 7, characterized in that evaluating the error surface to determine its minimum value in order to determine the best match between the image models and the current image comprises the steps of: r) obtaining error surface values near the minimum value determined by the integer position search of step (j); s) interpolating a value in the horizontal scan line just above where the original integer position search determined a minimum, the interpolation being carried out by a one-dimensional method; t) interpolating a value in the horizontal scan line where the original integer position search determined a minimum, the interpolation being carried out by a one-dimensional method; u) interpolating a value in the horizontal scan line just below where the original integer position search determined a minimum, the interpolation being carried out by a one-dimensional method; and v) interpolating the values of steps (s), (t) and (u) to determine a final minimum value for such error surface.
19. The method according to claim 7, characterized in that evaluating the error surface to determine its minimum value in order to determine the best match between the image models and the current image comprises the step of: w) interpolating the position of the minimum using a singular value decomposition method.
20. The method according to claim 7, characterized in that calculating the transformation model comprises the steps of: x) setting horizontal and vertical weights for each block depending on the curvature of the error surface; y) calculating a preliminary transformation model using a mean squared error reduction method; z) evaluating each block for spatial error to determine how well it matches the preliminary transformation model; aa) modifying the weights for each block according to the spatial error; and bb) calculating a final transformation model using the modified block weights.
21. The method according to claim 7, characterized in that the model retention criteria require that image models, in order not to be purged, must not lie in a horizontal blanking area, must not lie in an active insertion region, must agree with the current transformation model with respect to position, and must have sufficient error surface curvature.
22. The method according to claim 7, characterized in that it further comprises the steps of: cc) low-pass filtering and subsampling the image models obtained in step (a) in order to provide a series of lower resolution image models; dd) executing an integer position search on the image models at each resolution level, starting at the lowest resolution level and proceeding upward; and ee) calculating a transformation model at each resolution level in order to predict the positions of the image models at the next higher resolution level.
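Purely as an illustration of the coarse-to-fine search recited in claim 22, a one-dimensional sketch follows. The 2:1 box filter, the L1 matching criterion, the plus-or-minus-one refinement window, and all function names are assumptions made for the example, not details taken from the claims.

```python
def downsample(row):
    """Step cc): low-pass filter (2-tap box) and subsample by two."""
    return [(row[i] + row[i + 1]) / 2.0 for i in range(0, len(row) - 1, 2)]

def coarse_to_fine_search(template, image, levels=3):
    pyramid = [(template, image)]
    for _ in range(levels - 1):
        t, img = pyramid[-1]
        pyramid.append((downsample(t), downsample(img)))

    def best(t, img, lo, hi):           # integer position search, L1 criterion
        return min(range(lo, hi + 1),
                   key=lambda i: sum(abs(a - b)
                                     for a, b in zip(t, img[i:i + len(t)])))

    t, img = pyramid[-1]
    pos = best(t, img, 0, len(img) - len(t))   # step dd): full search, coarsest level
    for t, img in reversed(pyramid[:-1]):      # simplified stand-in for step ee):
        limit = len(img) - len(t)              # predict position from level below
        lo = max(0, min(2 * pos - 1, limit))
        hi = min(limit, 2 * pos + 1)
        pos = best(t, img, lo, hi)
    return pos

row = [1.0] * 12 + [5.0] * 12
print(coarse_to_fine_search([1.0] * 4 + [5.0] * 4, row))   # -> 8, the true offset
```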
MXPA/A/1999/004772A 1996-11-27 1999-05-21 Motion tracking using image-texture templates MXPA99004772A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US60/031,883 1996-11-27

Publications (1)

Publication Number Publication Date
MXPA99004772A 2000-07-01
