GB2312347A - Image transformation using a graphical entity


Info

Publication number
GB2312347A
Authority
GB
United Kingdom
Prior art keywords
locations
transformation
frame
frames
graphical entity
Prior art date
Legal status
Withdrawn
Application number
GB9607797A
Other versions
GB9607797D0 (en)
Inventor
Patrick Beauchemin
Current Assignee
Discreet Logic Inc
Original Assignee
Discreet Logic Inc
Priority date: 1996-04-15
Filing date: 1996-04-15
Publication date: 1997-10-22
Application filed by Discreet Logic Inc
Priority to GB9607797A (1996-04-15)
Publication of GB9607797D0 (1996-06-19)
Publication of GB2312347A (1997-10-22)
Status: Withdrawn

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20Finite element generation, e.g. wire-frame surface description, tesselation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/2628Alteration of picture size, shape, position or orientation, e.g. zooming, rotation, rolling, perspective, translation


Abstract

Image data represents sequentially displayable frames in the form of a moving clip. A plurality of locations (212, 213, 214, 215) are defined within a first image of the clip. A graphical entity, such as a mesh, is constructed and associated with the defined locations. These locations are automatically tracked over a plurality of frames (Fig 2). For each frame a transformation (221) is calculated which is applicable to transform all of the specified locations between adjacent frames. Additional points (301, 302, 303, 304, 305) within the mesh are also transformed in accordance with the calculated transformation.

Description

Title: PROCESSING IMAGE DATA

The present invention relates to processing image data, in which images are displayed sequentially to produce a moving clip.
INTRODUCTION

The process of combining a plurality of image clips in order to produce a new output clip is generally referred to as compositing. Compositing stations produced by the present assignee are distributed under the trade marks "FLAME" and "FLINT", which primarily allow conventional editing techniques and special effects to be introduced as post production procedures for video and cinematographic film.
Often post production techniques are used to introduce special effects in which the image data is distorted in some way so as to create the impression that the distortion has occurred with respect to images displayed by the medium. This often involves constructing a graphical entity, such as a net or mesh, which is associated with defined locations within the image. However, a problem with this approach is that a moving clip typically requires thirty frames per second to be modified; if modifications must be made manually to each individual frame, the process therefore becomes time consuming and hence expensive.
SUMMARY OF THE INVENTION

According to a first aspect of the present invention, there is provided a method of processing image data, in which images are displayed sequentially to produce a moving clip, comprising steps of defining a plurality of locations within a first image of a clip; constructing a graphical entity associated with said defined locations; tracking said locations over a plurality of frames; calculating transformation details applicable to each location to define transformations for said locations between each pair of adjacent frames; and using said calculated transformation details to transform said graphical entity.
Preferably, locations are tracked by comparing pixel difference values within a search region.
According to a second aspect of the present invention, there is provided apparatus for processing image data, including means for displaying image frames sequentially to produce a moving clip, comprising defining means for defining a plurality of locations within a first image clip; constructing means for constructing a graphical entity associated with said defined locations; tracking means for tracking said locations over a plurality of frames; and processing means arranged to calculate transformation details applicable to each of said defined locations, and to specify transformations for said locations between adjacent frames, whereafter said calculated transformation details are implemented to transform said graphical entity.
BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 shows a compositing station, including a monitor for viewing video images and an interface for identifying tracked positions within said images; Figure 2 shows images of the type displayed on the monitor shown in Figure 1; Figure 3 shows an enlarged image of the type shown in Figure 2 and a procedure for transforming vertices; and Figure 4 details a process for using the transformation identified in Figure 3.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The invention will now be described by way of example only with reference to the previously identified drawings.
A compositing station is shown in Figure 1, in which video clips are processed within a processing device 101. The processing of image clips is controlled in response to manual operation of a stylus 102 against a graphics touch tablet 103. Image clips are displayed to a video artist via a display unit 104.
Images are displayable on monitor 104 as sequential frames, to show a moving clip to the video artist. The artist is required to perform a pixel transformation on an image-by-image basis, possibly consisting of an image morph or warp etc. In order to produce this effect, it is necessary to define a graphical entity from which warping or morphing characteristics may be calculated. This graphical entity, in the example, takes the form of a mesh defined by a plurality of interconnecting vertices. Thus, pixel locations may be calculated as having distances relative to these vertices or said pixels may be identified as falling within a specific polygon defined by the mesh. A transformation, possibly in the form of a warp or a morph may be effected by firstly transforming the graphical entity. Thereafter, given the new positions of the vertices, it is possible for the polygons to be transformed on a frame-by-frame basis, thereby producing the required effect.
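The patent does not prescribe how pixel locations are expressed relative to the mesh, but one conventional choice, offered here purely as an illustrative Python sketch (the function names are hypothetical and NumPy is assumed), is to give each pixel barycentric coordinates within a triangle of the mesh, e.g. after splitting each mesh polygon into triangles; the weights are computed against the original vertices and reused against the transformed ones.

```python
import numpy as np

def barycentric(p, a, b, c):
    """Barycentric weights of point p in triangle (a, b, c)."""
    v0, v1, v2 = b - a, c - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    return 1.0 - v - w, v, w

def warp_point(p, src_tri, dst_tri):
    """Carry p from a source triangle to its transformed counterpart."""
    u, v, w = barycentric(p, *src_tri)
    return u * dst_tri[0] + v * dst_tri[1] + w * dst_tri[2]
```

Because the weights depend only on the vertex positions, moving the vertices carries every interior pixel with them consistently, which is the behaviour the mesh-based warp relies upon.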
The artist defines a plurality of locations within a first image clip thereby effectively specifying the boundaries of the graphical entity.
From these defined locations, the graphical entity is constructed whereafter, on a frame-by-frame basis, the defined locations are tracked over a plurality of frames. In this way, it is possible to see a plurality of defined locations transform on a frame-by-frame basis. For any specified number of defined locations, it is possible to define a functional transformation which, when receiving a coordinate location as its input, will produce new coordinate locations as outputs. Thus, the positions of defined locations in a first frame are compared with their locations in the subsequent frame, said subsequent locations being identified by an automated tracking operation.
By comparing the coordinate locations of associated points, that is to say, the same notional points from one frame to the subsequent frame, transformation details are calculated which are applicable to each location, so as to define transformations for said locations between adjacent frames. Having calculated this transformation, based on the points that have been tracked, the calculated transformation is then used in order to transform further details of the graphical entity.
Thus, having tracked the originally identified locations, and used this tracking procedure to calculate transformation functions between adjacent frames, it is then possible to use these transformation functions in order to transform other parts of the graphical entity which, in the example, include additional vertices forming part of the transformed mesh.
Having transformed a mesh from a first frame to a second frame, it is now possible to repeat the procedure with respect to the second frame and the third frame. Thus, again, a transformation function, in the form of a matrix, is calculated for transforming points from the second frame to the third frame. Again, this transformation matrix is invoked for transforming additional points within the graphical entity such that the mesh as a whole can be transformed from the first frame to the third frame.
The procedure is repeated for each pair of adjoining frames within the clip, until calculations have been made allowing the whole of the graphical entity to be transformed throughout the duration of the clip. It will be appreciated that this procedure may be invoked provided that it is possible to track the defined locations for the duration of the clip.
However, the procedure significantly reduces the amount of time required to create an effect derived from a graphical entity, given that, under favourable conditions, it is not necessary to define the graphical entity manually for each individual frame within the clip.
A plurality of frames taken from a moving clip is shown in Figure 2. In a first frame 201, four locations identified as 202, 203, 204, and 205 have been identified by the video artist, by manual operation of stylus 102. These locations may, for example, identify the positions of corners of a moving rectilinear object, such as the side of a vehicle. As the clip progresses, the overall shape of the tracked object will remain rectilinear but its actual orientation and specific configuration will shift.
When viewed as a polygonal surface, this shifting may be considered as a transformation, specified in terms of translations, rotations and enlargements. As such, the transformation may be defined as a transformation matrix, effectively specifying a function which will map all (x, y) coordinate locations within the first frame to corresponding (x, y) locations in the second frame.
A second frame of the clip is identified at 206. Within frame 206, location 202 has been tracked to location 207. Similarly, location 203 has been tracked to 208, with location 204 being tracked to location 209 and location 205 being tracked to location 210. On the next frame of the clip 211, location 207 has been tracked to location 212, location 208 has been tracked to location 213, location 209 has been tracked to location 214 and location 210 has been tracked to location 215. Again, these positions are considered and a further transformation matrix is calculated which, when applied as a transformation function, will map all locations 207, 208, 209 and 210 to their associated locations 212, 213, 214 and 215 respectively. Thus, it should be appreciated that the transformation calculated for transforming locations from frame 206 to frame 211 is not in any way derived from the transformation matrix arranged to translate locations within frame 201 to 206. However, given that these transformations will tend to be produced by natural movements within the image frame, the transformations calculated on a frame-by-frame basis will tend to represent some functional relationship which could, if required, be modelled by a cubic spline. In this way, if for any reason tracking locations are lost within a particular frame, it is possible to estimate the position of the locations using such a polynomial relationship.
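As a minimal sketch of the estimation just suggested, and assuming a Python environment with SciPy available (nothing in the patent mandates this), one simple realisation interpolates each tracked coordinate over time with a cubic spline fitted to the frames where tracking succeeded, then evaluates it at the frame where the location was lost:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Frames at which one location was successfully tracked, and its positions
# (illustrative values only; frame 3 is the one where tracking was lost).
frames = np.array([0, 1, 2, 4, 5])
xs = np.array([10.0, 12.1, 14.3, 18.9, 21.2])
ys = np.array([40.0, 39.2, 38.5, 36.8, 36.0])

# Fit one spline per coordinate and evaluate at the missing frame.
estimate = (CubicSpline(frames, xs)(3), CubicSpline(frames, ys)(3))
```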
Thus, the next frame in the clip is identified as 216. Location 212 is transformed to location 217, location 213 is transformed to location 218, location 214 is transformed to location 219 and location 215 is transformed to location 220. Again, a transformation matrix is calculated which maps each location 212, 213, 214, 215 on to its associated location 217, 218, 219 and 220 respectively, in the next image frame.
A matrix transformation is effected by a procedure of matrix multiplication. Thus, coordinate locations within frame 211, for example, would be represented by a matrix and the transformation to the locations within image frame 216 would be effected by multiplying this matrix by a transformation matrix.
The transformation function is illustrated at 221. Coordinate locations in the next frame, such as frame 216, are represented by coordinate locations X(n+1) and Y(n+1). These coordinate locations are calculated by performing the transformation function F, defined by a transformation matrix, upon the coordinate positions of the corresponding locations within the previous frame, identified as X(n) and Y(n). Thus, when locations X(n+1), Y(n+1) and X(n), Y(n) are known, conventional techniques allow the transformation matrix F to be calculated. Thereafter, this transformation matrix is used to transform other vertices defined as part of a graphical entity associated with the defined locations illustrated in Figure 2.
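The patent leaves the exact form of F open; its mention of translations, rotations and enlargements suggests a linear model, so the following Python sketch fits a 2D affine matrix to the tracked correspondences by least squares. The function name and the choice of an affine model are assumptions for illustration, not the patented method itself.

```python
import numpy as np

def solve_affine(src, dst):
    """Least-squares 2D affine transform F with dst approximately F(src).

    src, dst: (N, 2) arrays of tracked locations in frames n and n + 1.
    Returns a 3x3 matrix acting on homogeneous (x, y, 1) coordinates.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2] = src          # x' = a*x + b*y + c
    A[0::2, 2] = 1.0
    A[1::2, 3:5] = src          # y' = d*x + e*y + f
    A[1::2, 5] = 1.0
    b = dst.reshape(-1)         # interleaved [x0', y0', x1', y1', ...]
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    F = np.eye(3)
    F[0, :] = params[0:3]
    F[1, :] = params[3:6]
    return F
```

With four tracked locations the system is overdetermined (eight equations, six unknowns), so the least-squares fit also absorbs small tracking errors rather than propagating them exactly.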
The transformation of other vertices forming part of a graphical entity is illustrated in Figure 3. Locations 212, 213, 214 and 215 are shown in Figure 3, derived from an image frame 211. These locations have been tracked from similar locations within image frame 206 and a matrix has been defined which specifies translations of locations 207, 208, 209 and 210 to their corresponding locations 212, 213, 214 and 215. Similarly, these locations are subsequently tracked to positions 217, 218, 219 and 220 in image frame 216. Again, a transformation matrix is calculated which transforms the locations from image frame 211 to image frame 216.
After the defined locations have been tracked for the entire image clip and transformation matrices have been calculated specifying the transformation which takes place on a frame-by-frame basis, a graphical entity is calculated for the first image frame 201. The graphical entity is constructed by bisecting vectors connecting defined locations to identify new vertices. The vertices are then themselves connected to define a polygon mesh and the newly defined internal vertices may be considered as part of the graphical entity.
Thus, in this example, a rectilinear object is originally created which, in the example shown in Figure 3, represents vectors connecting location 212 to location 213, followed by connecting location 213 to location 214, followed by connecting location 214 to location 215 and finally resulting in location 215 being connected to location 212. These vectors are then themselves bisected, such that the length of vector 212-213 is halved resulting in an identification of vertex 301. A similar bisection is performed across vector 214-215, resulting in the identification of a new vertex 302. Vertex 301 is connected by a vector to vertex 302, resulting in the creation of mesh line 301-302.
A similar bisection occurs with respect to vector 213-214, resulting in the definition of vertex 303, and with respect to vector 215-212, resulting in the creation of vertex 304. Again, vertices 303 and 304 are connected by a vector 303-304, which in turn intersects vector 301-302, resulting in new vertex 305 being specified.
This bisection process is repeated for each of the bisected vectors, resulting in the definition of a total of 25 vertices defining the polygon mesh having 16 polygons therein. These polygons may then be considered as transforming on a frame-by-frame basis, which in turn allows them to be used for pixel based transformations, as is known in the art.
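Because the vectors being bisected are straight lines, two rounds of bisection place vertices at the quarter points of the quadrilateral, which coincides with bilinear interpolation between the four corners. The following Python sketch (the function name is hypothetical; NumPy is assumed) builds the 25-vertex, 16-polygon grid that way:

```python
import numpy as np

def bisection_mesh(c00, c10, c11, c01, levels=2):
    """Vertex grid from four corner locations via repeated bisection.

    Two rounds of edge bisection (levels=2) on a straight-edged quad
    give the same vertices as bilinear interpolation at the quarter
    parameters: 5 x 5 = 25 vertices bounding 16 polygons.
    """
    n = 2 ** levels + 1                      # 5 samples per side
    t = np.linspace(0.0, 1.0, n)
    u, v = np.meshgrid(t, t)
    u, v = u[..., None], v[..., None]        # broadcast against (x, y)
    return ((1 - u) * (1 - v) * c00 + u * (1 - v) * c10
            + u * v * c11 + (1 - u) * v * c01)

corners = [np.array(p, float) for p in [(0, 0), (4, 0), (4, 4), (0, 4)]]
grid = bisection_mesh(*corners)              # shape (5, 5, 2)
```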
Procedures implemented by the processing device 101 are detailed in Figure 4. At step 401 locations specified by the video artist are tracked on a frame-by-frame basis. Then, at step 402 the relative positions of the tracked locations are calculated on a frame-by-frame basis, so as to calculate a transformation matrix for each frame transition.
At step 403 the first transformation matrix calculated at step 402 is applied to a mesh generated for the first frame, resulting in the mesh being transformed to locations within the second frame. At step 404 a question is asked as to whether another frame is to be processed and, when answered in the affirmative, control is returned to step 403. Thus, at step 403 the next transformation matrix is selected, resulting in the mesh being transformed again into the next frame of the clip.
Eventually, all frames within the clip will have been processed resulting in the question asked at step 404 being answered in the negative.
At step 405 a specified effect is performed using the mesh created at step 403, for each frame, again on a frame-by-frame basis.
Thus, having performed the effect at step 405 for the first frame transition, a question is asked at step 406 as to whether another frame is present. When answered in the affirmative, control is returned to step 405, resulting in the effect being executed for the next frame transition using the mesh calculated for that particular frame. Eventually, the question asked at step 406 will be answered in the negative, thereby completing the overall effect.
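A minimal Python sketch of the Figure 4 control flow might look as follows; perform_effect is a hypothetical stand-in for the effect of step 405, tracks is assumed to hold the per-frame locations produced by the tracking of step 401, and solve_affine is the kind of matrix fit sketched earlier.

```python
import numpy as np

def apply_transform(F, vertices):
    """Apply a 3x3 homogeneous matrix to an (N, 2) array of mesh vertices."""
    h = np.hstack([vertices, np.ones((len(vertices), 1))])
    out = h @ F.T
    return out[:, :2] / out[:, 2:3]

def process_clip(frames, tracks, mesh):
    """tracks[n]: (N, 2) tracked locations for frame n (step 401)."""
    meshes = [mesh]
    for n in range(len(frames) - 1):
        F = solve_affine(tracks[n], tracks[n + 1])   # step 402
        mesh = apply_transform(F, mesh)              # step 403
        meshes.append(mesh)
    # Step 405: perform the specified effect with each frame's mesh.
    return [perform_effect(f, m) for f, m in zip(frames, meshes)]
```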
It will be appreciated that the procedure identified in Figure 4 allows special effects such as warping or morphing etc to be implemented substantially more quickly than when using conventional techniques.
The tracking procedure identified at step 401 is arranged to identify a region associated with each defined location within the image frame. The tracking box is identified and displayed to the video artist.
During the tracking procedure, values within this box are retained in memory such that, referring to the next frame, comparisons are made to determine how pixel values within a processed region have moved with respect to a search region. Thus, a region of 8 x 8 pixels may be identified in the first frame and this 8 x 8 array is compared against a 16 x 16 pixel array in the next frame. Comparisons are made so as to identify a best match from which a transformation matrix may be derived for the specified location.
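A minimal Python rendering of this block match, assuming greyscale frames held as NumPy arrays (the patent does not name a difference metric; a sum of absolute differences is used here as one plausible choice):

```python
import numpy as np

def track_block(prev, curr, x, y, block=8, search=16):
    """Best-match displacement of a block inside a larger search area.

    prev, curr: 2D pixel arrays for adjacent frames; (x, y) is the
    block's top-left corner in the previous frame. Returns the (dx, dy)
    minimising the sum of absolute pixel differences.
    """
    ref = prev[y:y + block, x:x + block].astype(np.int64)
    off = (search - block) // 2
    best, best_d = None, np.inf
    for dy in range(-off, off + 1):
        for dx in range(-off, off + 1):
            if y + dy < 0 or x + dx < 0:
                continue                     # candidate leaves the image
            cand = curr[y + dy:y + dy + block, x + dx:x + dx + block]
            if cand.shape != ref.shape:
                continue
            d = np.abs(cand.astype(np.int64) - ref).sum()
            if d < best_d:
                best, best_d = (dx, dy), d
    return best
```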

Claims (16)

1. A method of processing image data, in which images are displayed sequentially to produce a moving clip, comprising steps of defining a plurality of locations within a first image of a clip; constructing a graphical entity associated with said defined locations; tracking said locations over a plurality of frames; calculating transformation details applicable to each location to define transformations for said locations between each adjacent frame; and using said calculated transformation details to transform said graphical entity.
2. A method according to Claim 1, in which said images are derived from data representing non-compressed broadcast-quality video images.
3. A method according to Claim 1, wherein said locations are defined by manual operation of a stylus upon a touch tablet.
4. A method according to Claim 1, wherein said graphical entity is a polygonal mesh having a plurality of vertices.
5. A method according to Claim 1, wherein said locations are tracked by comparing pixel difference values within a search region.
6. A method according to Claim 1, wherein said transformation details are calculated by deriving transformation matrices.
7. A method according to Claim 6, wherein said transformation matrices are applied to coordinate locations specified within said graphical entity.
8. Apparatus for processing image data, including means for displaying image frames sequentially to produce a moving clip, comprising defining means for defining a plurality of locations within a first image clip; constructing means for constructing a graphical entity associated with said defined locations; tracking means for tracking said locations over a plurality of frames; and processing means arranged to calculate transformation details applicable to each of said defined locations, and to specify transformations for said locations between adjacent frames, whereafter said calculated transformation details are implemented to transform said graphical entity.
9. Apparatus according to Claim 8, wherein said means for displaying image frames is arranged to display non-compressed broadcast-quality frames at video rate.
10. Apparatus according to Claim 8, wherein said defining means includes a manually operable stylus and a co-operating touch tablet.
11. Apparatus according to Claim 8, wherein said constructing means is arranged to construct a polygonal mesh having a plurality of vertices.
12. Apparatus according to Claim 8, wherein said tracking means is arranged to track locations by comparing pixel difference values within a search region.
13. Apparatus according to Claim 8, wherein said processing means is arranged to calculate transformation details by calculating transformation matrices.
14. Apparatus according to Claim 13, wherein said processing means is arranged to apply said transformation matrices to coordinate locations specified within said graphical entity.
15. A method of processing image data, in which images are displayed sequentially to produce a moving clip, substantially as herein described with reference to the accompanying drawings.
16. Apparatus for processing image data, including means for displaying image frames sequentially to produce a moving clip, substantially as herein described with reference to the accompanying drawings.

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB9607797A GB2312347A (en) 1996-04-15 1996-04-15 Image transformation using a graphical entity


Publications (2)

Publication Number Publication Date
GB9607797D0 (en) 1996-06-19
GB2312347A (en) 1997-10-22

Family

ID=10792098



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4240113A (en) * 1976-10-14 1980-12-16 Micro Consultants, Limited Picture manipulation in video systems
GB2157126A (en) * 1981-04-10 1985-10-16 Ampex Controller for system for spatially transforming images
GB2236638A (en) * 1989-07-17 1991-04-10 Grass Valley Group Predicted graphic image movement path display using keyframes
WO1991015921A1 (en) * 1990-04-11 1991-10-17 Multi Media Techniques Process and device for modifying a zone of successive images
GB2262680A (en) * 1991-12-18 1993-06-23 Ampex Systems Corp Video special effects system
GB2277661A (en) * 1992-09-10 1994-11-02 Fujitsu Ltd Graphic editor and processing method
WO1995012289A1 (en) * 1993-10-28 1995-05-04 Pandora International Limited Digital video processing



Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)