OA10269A - Method and apparatus for data analysis - Google Patents

Method and apparatus for data analysis

Info

Publication number
OA10269A
Authority
OA
OAPI
Prior art keywords
signal
component
record
samples
signals
Prior art date
Application number
OA60791A
Inventor
Harald Aagaard Martens
Jan Otto Reberg
Original Assignee
Idt Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Idt Inc filed Critical Idt Inc
Publication of OA10269A publication Critical patent/OA10269A/en


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/20 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
    • H04N19/23 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding with coding of regions that are present throughout a whole video segment, e.g. sprites, background or mosaic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/186 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being a colour or a chrominance component
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/20 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)
  • Analysing Materials By The Use Of Radiation (AREA)

Abstract

A method and apparatus are disclosed for converting between samples of an input signal and an encoded signal composed of a plurality of component signals, each representing a characteristic of the input signal in a different domain. The input signal comprises data samples organized into records of multiple samples, with each sample occupying a unique position within its record. Each component signal is formed as the combination of a plurality of factors, each factor being the product of a score signal and a load signal. The score signal defines the variation of data samples from record to record, and the load signal defines the relative variation of a subgroup of samples in different positions of a record.

Description

METHOD AND APPARATUS FOR DATA ANALYSIS
FIELD OF THE INVENTION
The present invention relates generally to a method and apparatus for data analysis. More specifically, the present invention relates to a method and apparatus for analyzing data and extracting and utilizing relational structures in different domains, such as temporal, spatial, color and shape domains.
BACKGROUND OF THE INVENTION
Full motion digital image sequences in typical video applications require the processing of massive amounts of data in order to produce good quality visual images from the point of view of shape, color and motion. Data compression is often used to reduce the amount of data which must be stored and manipulated. A data compression system typically includes modelling sub-systems which are used to provide simple and efficient representations of the large amount of video data. A number of compression systems have been developed which are well suited for video image compression.
These systems can be classified into three main groups according to their operational and modelling characteristics. First, there is the causal global modelling approach. An example of this type of model is a three dimensional (3D) wire frame model, which implies spatially controlling position and intensity at a small set of more or less fixed wireframe grid points and interpolating between the grid points. In some applications, this approach is combined with 3D ray tracing of solid objects. This wire frame approach is capable of providing very efficient and compact data representation, since it involves a very deep model, i.e., a significant amount of effort must be invested up front to develop a comprehensive model. Accordingly, this model provides good visual appearance.
However, this approach suffers from several significant disadvantages. First, this causal type model requires detailed a priori (advance) modelling information on 3D characterization, surface texture, lighting characterization and motion behavior. Second, this approach has very limited empirical flexibility in generic encoders, since once the model has been defined, it is difficult to supplement and update it dynamically as new and unexpected images are encountered. Thus, this type of model has limited usefulness in situations requiring dynamic modelling of real time video sequences.

A second type of modelling system is an empirical, updatable compression system which involves very limited model development, but provides relatively inefficient compression. The MPEG 1 and MPEG 2 compatible systems represent such an approach. For example, in the MPEG standard, an image sequence is represented as a sparse set of still image frames, e.g., every tenth frame in a sequence, which are compressed/decompressed in terms of pixel blocks, such as 8 x 8 pixel blocks. The intermediate frames are reconstructed based on the closest decompressed frame, as modified by additional information indicating blockwise changes representing block movement and intensity change patterns. The still image compression/decompression is typically carried out using Discrete Cosine Transforms (DCT), but other approaches such as subband, wavelet or fractal still image coding may be used. Since this approach involves very little modelling depth, long range systematic redundancies in time and space are often ignored, so that essentially the same information is stored/transmitted over and over again.

A third type of modelling system is an empirical global modelling of image intensities based on factor analysis. This approach utilizes various techniques, such as principal component analysis, for approximating the intensities of a set of N images by weighted sums of F "factors." Each such factor has a spatial parameter for each pixel and a temporal parameter for each frame. The spatial parameters of each factor are sometimes referred to as "loadings", while the temporal parameters are referred to as "scores". One example of this type of approach is the Karhunen-Loeve expansion of an N x M matrix of image intensities (M pixels per frame, N frames) for compression and recognition of human facial images. This is discussed in detail in Kirby, M. and Sirovich, L., "Application of the Karhunen-Loeve Procedure for the Characterization of Human Faces", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 12, No. 1, pp. 103-108 (1990), and R.C. Gonzales and R.E. Woods, Digital Image Processing, Chapter 3.6 (Addison-Wesley Publ. Co., ISBN 0-201-50803-6, 1992), which are incorporated herein by reference.
In Karhunen-Loeve expansion (also referred to as eigen analysis, principal component analysis, Hotelling transform and singular value decomposition), the product of the loadings and the scores for each consecutive factor minimizes the squared difference between the original and the reconstructed image intensities. Each of the factor loadings has a value for each pixel, and may therefore be referred to as an "eigen-picture"; the corresponding factor score has a value for each frame. It should be noted that the Karhunen-Loeve system utilizes factors in only one domain, i.e., the intensity domain, as opposed to the present invention, which utilizes factors in multiple domains, such as intensity, address and probabilistic domains.
Such a compression system is very efficient in certain situations, such as when sets of pixels display interrelated intensity variations in fixed patterns from image to image. For example, if every time that pixels a, b, c become darker, pixels d, e, f become lighter, and vice versa, then all of pixels a, b, c, d, e, f can be effectively modelled by a single factor consisting of an eigen picture intensity loading having positive values for pixels a, b, c and negative values for pixels d, e, f. The group of pixels would then be modelled by a single score number for each image. Other interrelated pixel patterns would also give rise to additional factors.
This type of approach results in visually disruptive errors in the reconstructed image if too few factors are used to represent the original images. Additionally, if the image-to-image variations include large systematic spatial changes, such as moving objects, then the number of eigen pictures required for good visual representation will be correspondingly high. As a result, the compression rate deteriorates significantly. Thus, the Karhunen-Loeve systems of factor modelling of image intensities cannot provide the necessary compression required for video applications.

A fourth approach to video coding is the use of object oriented codecs. This approach focuses on identifying "natural" groups of pixels ("objects") that move and/or change intensity together in a fairly simple and easily compressible manner. More advanced versions of object oriented systems introduce a certain flexibility with respect to shape and intensity of individual objects, e.g., affine shape transformations such as translations, scaling, rotation and shearing, or one factor intensity changes. However, it should be noted that the object oriented approach typically employs only single factors.
In prior art systems, motion is typically approximated by one of two methods. The first of these methods is incremental movement compensation over a short period of time, which is essentially a difference coding according to which the difference between pixels in a frame, n, and a previous frame, n-1, is transmitted as a difference image. MPEG is one example of this type of system. This approach allows for relatively simple introduction of new features, since they are merely presented as part of the difference image. However, this approach has a significant disadvantage in that dynamic adaptation or learning is very difficult. For example, when an object is moving in an image, there is both a change in location and intensity, making it very difficult to extract any systematic data changes. As a result, even the simplest form of motion requires extensive modelling.
Another approach to incremental movement compensation is texture mapping based on a common reference frame, according to which motion is computed relative to a common reference frame and pixels are moved from the common reference frame to synthesize each new frame. This is the approach typically employed by most wire frame models. The advantage of this approach is that very efficient and compact representation is possible in some cases. However, the significant downside to this approach is that the efficiency is only maintained as long as the moving objects retain their original intensity or texture. Changes in intensity and features are not easily introduced, since existing systems incorporate only one dimensional change models, in either intensity or address.
Accordingly, it is an object of the present invention to provide a method and apparatus for data analysis which provides very efficient and compact data representation without requiring a significant amount of advance modelling information, but is still able to utilize such information if it does exist.

It is also an object of the present invention to provide a method and apparatus for data analysis having empirical flexibility and capable of dynamic updating based on short and long range systematic redundancies in various domains in the data being analyzed.

It is a further object of the present invention to provide a method and apparatus for data analysis which utilizes factor analysis in multiple domains, such as address and probabilistic domains, in addition to the intensity domain. Additionally, the factor analysis is performed for individual subgroups of data, e.g., for each separate spatial object.
An additional object of the present invention is to provide a method and apparatus for data analysis which uses multiple factors in several domains to model objects. These "soft" models (address, intensity, spectral property, transparency, texture, type and time) are combined with "hard" models in order to allow for more effective learning and modelling of systematic change patterns in input data, such as a video image. Examples of such "hard" modelling are: a) conventional affine motion modelling of moving objects w.r.t. translation, rotation, scaling and shearing (including camera panning and zooming effects), and b) multiplicative signal correction (MSC) and extensions of this, modelling of mixed multiplicative and additive intensity effects (H. Martens and T. Naes, Multivariate Calibration, pp. 345-350 (John Wiley & Sons, 1989)), which is incorporated herein by reference.

A further object of the present invention is the modelling of objects in domains other than the spatial domain, e.g., grouping of local temporal change patterns into temporal objects and grouping of spectral patterns into spectral objects. Thus, in order to avoid the undesirable oversimplification associated with physical objects or object oriented programming, the term "holon" is used instead.
Yet another object of the present invention is the use of change data in the various domains to relate each individual frame to one or more common reference frames, and not to the preceding frame of data.
SUMMARY OF THE INVENTION
The method and apparatus for data analysis of the present invention analyze data by extracting one or more systematic data structures found in the variations in the input sequence of data being analyzed. These variations are grouped and parameterized in various domains to form a reference data structure with change models in these domains. This is used in modelling of the input data being analyzed. This type of parameterization allows compression, interactivity and interpretability. Each data input is then approximated or reconstructed as a composite of one or more parameterized data structures maintained in the reference data structure. The flexibility of this approach lies in the fact that the systematic data structures and their associated change model parameters that make up the reference data structure can be modified by appropriate parameter changes in order to ensure the flexibility and scalability of each individual systematic data structure to a larger number of input data. The parameterization consists of "soft" multivariate factor modelling in various domains for various holons, which is optionally combined with "hard" causal modelling of the various domains, in addition to possible error correction residuals. A preferred embodiment of the present invention is explained with reference to the coding of image sequences such as video, in which case the most important domains are the intensity, address and probabilistic domains.
The present invention includes a method and apparatus for encoding, editing and decoding. The basic modelling or encoding method (the "IDLE" modelling method) may be combined with other known modelling methods, and several ways of using the basic modelling method may be combined and carried out on a given set of data.
The encoding portion of the present invention includes methods for balancing the parameter estimation in the various domains. Also, the modelling according to the present invention may be repeated to produce cascaded modelling and meta-modelling.
BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing brief description and further objects, features, and advantages of the present invention will be understood more completely from the following description of presently preferred embodiments with reference to the drawings, in which:
Figure 1 is a flow-chart illustrating the high level operation of the encoding and decoding process according to the present invention;
Figure 2 is a block diagram illustrating singular value decomposition of a data matrix into the product of a score matrix and a loading matrix plus a residual matrix;
Figure 3a is a pictorial representation of the data format for each individual pixel in a reference image;
Figure 3b is a pictorial representation of how a reference frame is derived;
Figures 4a-n are pictorial illustrations of modelling in the intensity (blush) domain, wherein,
Figures 4a through 4c illustrate various degrees of blushing intensity in input images;

Figures 4d through 4f illustrate the intensity change fields relative to a reference frame in the encoder;
Figures 4g and 4h illustrate a blush factor loading that summarizes the change fields of several frames in the encoder;
Figures 4i through 4k illustrate the reconstruction of the change fields in the decoder;
Figures 4l through 4n illustrate the resulting reconstruction of the actual image intensities from the change fields and reference image, in the decoder.
Figures 5a-n are a pictorial illustration of modelling in the address (smile) domain, wherein,
Figures 5a through 5c illustrate various degrees of smiling (movements or address changes);
Figures 5d through 5f illustrate the address change fields corresponding to various degrees of movements relative to the reference image;
Figure 5g shows the reference intensity image and Figure 5h illustrates a smile factor loading;
Figures 5i through 5k illustrate the reconstructed address change fields;
Figures 5l through 5n illustrate the resulting reconstructed smiled image intensities.

Figure 6 is a block diagram representation of an encoder according to the present invention;
Figure 7 is a block diagram representation of a model estimator portion of the encoder of Figure 6;
Figure 8 is a block diagram representation of a change field estimator of the model estimator of Figure 7;
Figure 9 is a pictorial representation of the use of forecasting and local change field estimates in the change field estimator of Figure 8;
Figure 9a is a step-wise illustration of the use of forecasting and local change field estimates;
Figure 9b is a summary illustration of the movements shown in Figure 9a;
Figure 10 is a detailed block diagram of portions of the change field estimator of Figure 8;
Figure 11 is a block diagram of the local change field estimator portion of the change field estimator shown in Figures 8 and 10;

Figure 12 is a block diagram of the interpreter portion of the encoder shown in Figure 7;

Figure 13 is a block diagram of the decoder, used both as part of the encoder in Figure 8 and as a stand-alone decoder.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The method and apparatus for data analysis of the present invention may be used as part of a data compression system, including encoding and decoding circuits, for compressing, editing and decompressing video image sequences by efficient modelling of data redundancies in various data domains of the video image sequences.
Self-Modelling of Redundancies in Various Domains and Sub-Operands
The system of the present invention models redundancies in the input data (or transformed input data).
These redundancies may be found in the various domains or "operands" (such as coordinate address, intensity, and probabilistic) and in various sub-properties of these domains ("sub-operands"), such as individual coordinate directions and colors. Intensity covariations over time and space between pixels and frames, and over time and space between color channels, may be modelled. Movement covariations are also modelled over time and space between pixels, and over time and space between different coordinate channels. These movement covariations typically describe the movement of an object as it moves across an image. The objects or holons need not be physical objects; rather, they represent connected structures with simplified multivariate models of systematic changes in various domains, such as spatial distortions, intensity changes, color changes, transparency changes, etc.
Other redundancies which may be modelled include probabilistic properties such as opacity, which may be modelled over time and space in the same manner as color intensities. In addition, various low-level statistical model parameters from various data domains may be modelled over time and space between pixels and between frames.
In the present invention, successive input frames are modelled as variations or deviations from a reference frame which is chosen to include a number of characteristics or factors in the various domains. For example, factors indicative of intensity changes, movements and distortions are included in the reference frame, such that input frames can be modelled as scaled combinations of the factors included in the reference frame. The terms factors and loadings will be used interchangeably to refer to the systematic data structures which are included in the reference frame.
Abstract Redundancy Modelling
The system and method of the present invention combine various model structures and estimation principles, and utilize data in several different domains, producing a model with a high level of richness capable of reconstructing several different image elements. The model may be expressed at various levels of depth.
The modelling features of the present invention are further enhanced by using externally established model parameters from previous images. This procedure utilizes pre-established spatial and/or temporal change patterns, which are adjusted to model a new scene. Further enhancement may be obtained by modelling redundancies in the model parameters themselves, i.e., by performing principal component analysis on the sets of model parameters. This is referred to as meta-modelling.
The present invention may employ internal data representations that are different from the input and/or output data format. For example, although the input/output format of video data may be RGB, a different color space may be used in the internal parameter estimation, storage, transmission or editing. Similarly, the coordinate address system may be cartesian coordinates at a certain resolution (e.g., PAL format), while the internal coordinate system may be different, e.g., NTSC format or some other regular or irregular, dense or sparse coordinate system, or vice versa.
Encoder
An encoder embodying the present invention provides models to represent systematic structures in the input data stream. The novel model parameter estimation is multivariate and allows automatic self-modelling without the need for any prior model information. However, the system can still make effective use of any previously established model information if it is available. The system also provides dynamic mechanisms for updating or eliminating model components that are found to be irrelevant or unreliable. The system is also flexible in that different level models may be used at different times. For example, at times it may be advantageous to use shallow intensity based compression, while at other times it may be desirable to use deep hard models which involve extensive prior analysis.
Additionally, the present system includes automatic initialization and dynamic modification of the compression model. In addition, the present invention may be used for any combination of compression, storage, transmission, editing, and control, such as are used in video telephone, video compression, movie editing, interactive games, and medical image databases.
In addition, the present invention can use factor modelling to simplify and enhance the model parameter estimation in the encoder, by using preliminary factor models for conveying structural information between various local parts of the input data, such as between individual frames in a video sequence. This structural information is used statistically in the parameter estimation for restricting the number of possible parameter values used to model each local part, e.g., frame. This may be used in the case of movement estimation, where the estimation of the movement field for one frame is stabilized with the help of a low-dimensional factor movement model derived from other frames in the same sequence.
An encoder according to the present invention compresses large amounts of input data, such as a video data stream, by compressing the data in separate stages according to various models. In general, video sequences or frames can be represented by the frame-to-frame or interframe variations, including the variation from a blank image to the first frame as well as subsequent interframe variations. In the present encoder, interframe variations are detected, analyzed and modelled in terms of spatial, temporal and probabilistic model parameters in order to reduce the amount of data required to represent the original frames. The obtained model parameters may then be further compressed to reduce the data stream necessary for representing the original images. This further compression may be carried out by run length coding, Huffman coding or any other statistical compression technique.

The compressed data may then be edited (e.g., as part of a user-controlled video game or movie editing system), stored (e.g., in a CD-ROM or other storage medium) or transmitted (e.g., via satellite, cable or telephone line), and then decompressed for use by a decoder.

Decoder
The present invention also provides for a decoder at a receiving or decompression location, which essentially performs the inverse function of the encoder. The decoder receives the compressed model parameters generated by the encoder and decompresses them to obtain the model parameters. The model parameters are then used to reconstruct the data stream originally input to the encoder.

Parameter Estimation in the Encoder
Extending, Widening and Deepening of a Reference Model

In the encoder of the present invention, one or more extended reference images are developed as a basis for other model parameters to represent the input data stream of image sequences or frames. Thus, all images are represented as variations or changes relative to the extended reference images. The reference images are chosen so as to be representative of a number of spatial elements found in a sequence of images. The reference image is "extended" in the sense that the size of the reference image may be extended spatially relative to an image or frame in order to accommodate and include additional elements used in modelling the image sequences. Conceptually, the reference frame in the preferred embodiment is akin to a collage or library of picture elements or components.
Thus, a long sequence of images can be represented by a simple model consisting of an extended reference image plus a few parameters for modelling systematic image changes in address, intensity, distortion, transparency or other variables. When combined with individual temporal parameters for each frame, these spatial parameters define how the reference image intensities in the decoder are to be transformed into a reconstruction of that frame's intensities. Reconstruction generally involves two stages. First, it must be determined how the reference frame intensities are to be changed spatially in terms of intensity, transparency, etc., from the reference coordinate system and representation to the output frame coordinate system and representation. Second, the reference frame intensities must be changed to the output frame intensities using image warping.
System Operation
Figure 1 is a block diagram illustration of the high level operation of the present invention, showing both the encoding and decoding operations. In the encoder, video input data 102 is first input to the system at step 104, and changes are detected and modelled at steps 106 and 108 respectively, in order to arrive at appropriate model parameters 110.
The model parameters 110 are then compressed at step 111 in order to further reduce the amount of information required to represent the original input data. This further compression takes advantage of any systematic data redundancies present in the model parameters 110. These temporal parameters also exhibit other types of redundancies. For example, the scores or scalings which are applied to the loadings or systematic data structures in the reference frame may have temporal autocorrelation, and can therefore be compressed by, for example, predictive coding along the temporal dimension. Additionally, there are correlations between scores which can be exploited by bilinear modelling, followed by independent compression and transmission of the model parameters and residuals. Likewise, other redundancies, such as between-color intercorrelations or between-parameter redundancies, may be modelled.
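As a concrete illustration of predictive coding of scores along the temporal dimension, the following is a minimal sketch, not code from the patent; the function names and score values are hypothetical, and the only assumption is that score series are temporally autocorrelated so that frame-to-frame deltas are small:

```python
import numpy as np

def delta_encode(scores):
    # Keep the first score; replace the rest by frame-to-frame deltas,
    # which stay small for temporally autocorrelated score series.
    deltas = np.empty_like(scores)
    deltas[0] = scores[0]
    deltas[1:] = np.diff(scores)
    return deltas

def delta_decode(deltas):
    # Cumulative summation exactly inverts the delta coding.
    return np.cumsum(deltas)

u = np.array([0.0, 0.95, 1.0, 1.02, 0.40, -0.50])  # hypothetical blush scores
assert np.allclose(delta_decode(delta_encode(u)), u)
```

The deltas, typically after quantization, would then feed a statistical stage such as the run length or Huffman coding mentioned above.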
These model parameters 110 are then used by a decoder according to the present invention, where the model parameters are first decompressed at step 120 and, at step 122, used to reconstruct the original input image, thereby producing the image output or video output 124.
The decompression procedure at step 120 is essentially the inverse of the process that was performed in the compression step 111. It should be noted that the encoder and decoder according to the present invention may be part of a real-time or pseudo real-time video transmission system, such as picture telephone. Alternatively, the encoder and decoder may be part of a storage type system, in which the encoder compresses video images or other data for storage, and retrieval and decompression by a decoder occur later. For example, video sequences may be stored on floppy disks, tape or another portable medium. Furthermore, the system may be used in games, interactive video and virtual reality applications, in which case the temporal scores in the decoder are modified interactively. The system may also be used for database operations, such as medical imaging, where the parameters provide both compression and effective search or research applications.
Soft Modelling by Factor Analysis of Different Domains and Sub-Operands
The present invention utilizes factor analysis, which may be determined by principal component analysis or singular value decomposition, to determine the various factors which will be included in the reference frame. A video sequence which is input to the present invention may be represented as a series of frames, each frame representing the video sequence at a specific moment in time. Each frame, in turn, is composed of a number of pixels, each pixel containing data representing the video information at a specific location in the frame.

In accordance with the present invention, input frames are decomposed into a set of scores or weightings in various domains and sub-operands which are to be applied to one or more factors contained in a reference frame. As shown in Figure 2, N input frames, each composed of M variables, e.g., pixels, may be arranged in an N by M matrix 202. In this representation, the pixels are arranged as one line for each frame, instead of the conventional two-dimensional row/column arrangement. The matrix 202 may then be decomposed or represented by temporal score factors f=1, 2, ..., F for each frame, forming an N by F matrix 204, multiplied by a spatial reference model, consisting of spatial loadings for the F factors, each with values for each of the M pixels, thus forming a loading matrix 206 of size F by M. If the number of factors F is less than the smaller of N or M, a matrix of residuals (208) may be used to summarize the unmodelled portion of the data. This is described in further detail in H. Martens and T. Naes, Multivariate Calibration, Chapter 3 (John Wiley & Sons, 1989), which is incorporated herein by reference. This type of assumption-weak self-modelling or "soft modelling" may optionally be combined with more assumption-intensive "hard modelling" in other domains, such as movements of three-dimensional solid bodies and mixed multiplicative/additive modelling of intensities by MSC modelling and extensions of this (H. Martens and T. Naes, Multivariate Calibration, pp. 345-350 (John Wiley & Sons, 1989), which is incorporated herein by reference).
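The Figure 2 decomposition can be sketched in a few lines of NumPy. This is an illustrative reading of the text, not code from the patent; the function name, factor count and toy data are hypothetical:

```python
import numpy as np

def bilinear_decompose(X, F):
    """Split an N x M matrix (one flattened frame per row) into N x F
    temporal scores (204), F x M spatial loadings (206) and an N x M
    residual matrix (208), so that X = scores @ loadings + E."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :F] * s[:F]       # temporal scores, one row per frame
    loadings = Vt[:F, :]            # spatial loadings ("eigen-pictures")
    E = X - scores @ loadings       # unmodelled residual
    return scores, loadings, E

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))         # 5 hypothetical frames of 8 pixels each
T, P, E = bilinear_decompose(X, F=2)
print(T.shape, P.shape, float(np.linalg.norm(E)))
```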
Figure 3b illustrates how several objects from different frames of a video sequence may be extracted as factors and combined to form a reference frame. As shown in Figure 3b, frame 1 includes objects 11 and 12, a taxi and a building, respectively. Frame 4 includes the building 12 only, while frame 7 includes building 12 and car 13. An analysis of these frames in accordance with the present invention results in reference frame 20, which includes objects 11, 12, and 13. It should be noted that the holons need not be solid objects such as a house or a car. Rather, the same principles may be used to spatially represent more plastic or deformable objects such as a talking head; however, change factors in other domains may be required.
Figure 3a is a pictorial representation of the data format for each individual pixel in a reference image. Coordinate systems other than conventional pixels may also be used in the model representation. These include pyramidal representations, polar coordinates or any irregular, sparse coordinate system.
As shown in Figure 3a, each pixel contains intensity information, which may be in the form of color information given in some color space, e.g., RGB; address information, which may be in the form of vertical (V), horizontal (H), and depth (Z) information; in addition to probabilistic, segment, and other information, the number of such probabilistic values being different during the encoder parameter estimation as compared with after the parameter estimation.
Each of these information components may in turn, at various stages, be composed of one or more information sub-components, which may in turn be composed of one or more further information sub-components. For example, as shown in Figure 3a, the red (R) color intensity information contains several red information components R(0), R(1), R(2), .... Similarly, R(2) contains one or more information sub-components indicating parameter value, uncertainty, and other statistical information.
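One possible in-memory layout for the per-pixel record of Figure 3a is sketched below. The patent does not prescribe a data structure, so every name and field here is a hypothetical illustration of the operand / sub-operand / sub-component hierarchy:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class SubComponent:
    value: float        # parameter value, e.g. the loading R(2) at this pixel
    uncertainty: float  # accompanying statistical information

@dataclass
class ReferencePixel:
    # One list of factor sub-components per sub-operand; index 0 holds the
    # normal image information, higher indices hold change factor loadings.
    intensity: Dict[str, List[SubComponent]]      # keys 'R', 'G', 'B'
    address: Dict[str, List[SubComponent]]        # keys 'V', 'H', 'Z'
    probabilistic: Dict[str, List[SubComponent]]  # keys 'S' (segment), 'T' (transparency)
```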
The choice of objects which are used to construct the reference image depends on the type of application. For example, in the case of off-line encoding of previously recorded video images, objects will be chosen to make the reference image as representative as possible for long sequences of frames. In contrast, for on-line or real time encoding applications, such as picture telephone or video conferencing, objects will be selected such that the reference image will closely correspond to the early images in the sequence of frames. Subsequently, this initial reference frame will be improved or modified with new objects as new frame sequences are encountered and/or obsolete ones eliminated.
General temporal information ("scores") is represented by the letter u followed by a second letter indicating the type of score, e.g., uA for address scores. Occasionally, a subscript is added to indicate a specific point in time, e.g., uA_n, to indicate frame n.
Spatial information is represented in a hierarchical format. The letter X is used to represent spatial information in general, and includes one or more of the following domains: I (intensity), A (address) and P (probabilistic properties). These domains represent data flow between operators and are thus referred to as operands. Each of these domain operands may in turn contain one or more "sub-operands." For example, intensity I may contain R, G and B sub-operands to indicate the specific color representation being used. Similarly, address A may contain V (vertical), H (horizontal) and Z (depth) sub-operands to indicate the specific coordinate system being used. Also, probabilistic properties P may include sub-operands S (segment) and T (transparency). Spatial information may be represented in different formats for different pixels. In addition, the various domains and sub-operands may be reformulated or redefined at various stages of the data input, encoding, storage, transmission, decoding and output stages.
Each spatial point or pixel may thus be represented by a number of different values from different domains and sub-operands. For each sub-operand, there may be more than one parameter or "change factor." The factors are counted up from zero, with the zeroth factor representing the normal image information (default intensity and address). Thus, within X(0), I(0) represents normal picture intensity information, A(0) represents implicit coordinate address information and P(0) represents probabilistic information such as transparency, while X(f), f>0, represents various other change model parameters or factor loadings, i.e., systematic patterns in which the pixels vary together in the different domains.
Spatial information is defined for objects according to some spatial position, which is given in upper case letters, lower case letters and subscripts. Upper case letters refer to spatial information in the reference image position; lower case letters refer to spatial information in the position of a specific image, with the specific image being indicated by a subscript. Thus, X_Ref refers to the spatial model in the reference position for a given sequence, while x_n refers to spatial data for input frame n.
Change fields, which are unparameterized difference images, are used to indicate how to change one image into another according to the various domains. Change fields are indicated using a two letter symbol, typically used in conjunction with a two letter subscript. The first letter of the two letter symbol is D or d, which indicates difference or delta, while the second letter indicates the domain or sub-operand. The subscripts are used to designate the starting and ending positions. For example, DA_Ref,m defines how to move the pixel values given in the reference position into those of reconstructed frame m, while da_m,n defines how to move pixel values from frame m to frame n.

Widening a Reference Model to Allow a Wider Range of Systematic Expression

A reference image may be "widened" to include more types of change information than those available in the individual input images. For example, the picture intensity of a color image in an RGB system is typically represented by a single R, G and B intensity value for each of the red, green and blue color components associated with each individual pixel. However, in the case of a widened reference image, there may be several systematic ways in which groups of pixels change together. These change factor loadings may be defined for individual colors or combinations of colors, and for individual holons or groups of holons.

The "widening" of the reference image for a given video sequence may also be performed for data domains other than color intensities, such as address (coordinates) and various probabilistic properties such as transparency. Widening of the reference image refers to the parameterization of the model used for a particular scene.

By combining different model parameters in different ways in a decoder, different individual manifestations of the model may be created. These output manifestations may be statistical approximations of the individual input data (individual video frames), or they may represent entirely new, synthesized outputs, such as in virtual reality applications.

The widening parameterization of the reference frame in various domains may be obtained using a combination of "soft" factor analytic modelling, traditional statistical parameters, ad hoc residual modelling and "hard" or more causally oriented modelling.

Once an extended or widened reference image model is established, it may be dynamically modified or updated to produce a "deepened" reference image model. This "deepened" reference model includes "harder" model parameters that have a high probability of representing important and relevant image information, and a low probability of representing unimportant and irrelevant change information.

The purpose of widening in the various domains is to combine, in a compact and flexible representation, change image information from various frames in a sequence. In the case of automatic encoding, this may be accomplished by combining new change information for a given frame with the change image information from previous frames in order to extract systematic and statistically stable common structures. This is preferably accomplished by analyzing the residual components of several frames and extracting model parameter loadings. The computations may be carried out directly on the residuals or on various residual cross products. Different weighting functions can be used to ensure that precise change information is given more emphasis than imprecise change information, as described in H. Martens and T. Naes, Multivariate Calibration, pp. 314-321 (John Wiley & Sons, 1989), which is incorporated herein by reference. The extraction of new bilinear factors and other parameters may be performed on different forms of the data, all providing essentially the same result. The data format may be raw image data, residual image information after removal of previously extracted model parameters, or model parameters already extracted by some other method or at a different stage in the encoding process.

Several types of modellable structures may be extracted during the widening process. One general type is based on spatio-temporal covariations, i.e., one or more informational domains vary systematically over several pixels over several frames. A typical form of covariation is multivariate linear covariance, which can be approximated by bilinear factor modelling. This type of factor extraction is applicable to each of the different domains, e.g., address, intensity and probabilistic. Nonlinear or non-metric summaries of covariations may also form the basis for the widening operations.

Bilinear factors may, for example, be extracted using singular value decomposition, which is applied to the residual components from a number of frames. Singular value decomposition maximizes the weighted sum-of-squares used for extracting factors, but does not provide any balancing or filtering of noise, or optimizing of future compression. More advanced estimation techniques, such as the non-linear iterative least squares power method (NIPALS), may be used. The NIPALS method is an open architecture allowing the use of additional criteria, as needed.

The NIPALS method is applied to a matrix of residual values E_{a-1} (the residual matrix in a system with a-1 factors) from several frames in order to extract an additional factor and thereby reduce the size of the residual matrix to E_a (the residual matrix in a system having a factors). The residual matrix E_a can in turn be used to find the (a+1)th factor, resulting in residual matrix E_{a+1}.
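A minimal sketch of one NIPALS deflation step is given below. It assumes only the standard power-method formulation of NIPALS; the function name, tolerances and toy residual matrix are hypothetical:

```python
import numpy as np

def nipals_factor(E, tol=1e-10, max_iter=500):
    """Extract one bilinear factor from residual matrix E (frames x pixels):
    returns per-frame scores t, per-pixel loadings p, and the deflated
    residual E - t p', i.e. E_a computed from E_{a-1}."""
    t = E[:, np.argmax(np.var(E, axis=0))].copy()  # start from most variable column
    for _ in range(max_iter):
        p = E.T @ t / (t @ t)                      # loadings from current scores
        p /= np.linalg.norm(p)                     # normalize the loading vector
        t_new = E @ p                              # scores from current loadings
        if np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new):
            t = t_new
            break
        t = t_new
    return t, p, E - np.outer(t, p)

rng = np.random.default_rng(1)
E0 = rng.normal(size=(6, 20))       # hypothetical residuals from 6 frames
t1, p1, E1 = nipals_factor(E0)      # factor a
t2, p2, E2 = nipals_factor(E1)      # factor a+1, from the deflated residual
```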
This type of factor analysis may be applied to the different sub-operands in the various domains, and not just to the image intensities. Typically, address information for a picture frame is given in terms of cartesian coordinates which specify horizontal and vertical addresses for each pixel location. However, in a widened reference frame, the address information may include multiple variables for each single input pixel's coordinates.

The additional change factors in a widened reference image widen the range of applicability of the resulting image model, in the sense that many additional different visual qualities or patterns may be represented by different combinations of the additional change factors or "loadings." In a preferred embodiment according to the present invention, the different loadings are combined linearly, i.e., each loading is weighted by a "score" and the weighted loadings are summed to produce an overall loading. The score values used in the weighting process may be either positive or negative and represent a scale factor applied to the loadings or change factors. This will now be illustrated for the sub-operands red intensity r_n, n=1,2,...,N and vertical address v_n, n=1,2,...,N. When modelling intensity changes, the scores may be used to "turn up" or "turn down" the intensity pattern of the loading. Similarly, when modelling address distortion (movements), the scores are used to represent how much or how little the loading is to be distorted.

Utilizing the above-mentioned widening principle for widening a reference frame, an individual input frame's redness intensity R_n, for example, may be modelled as a linear combination or summation of redness change factor loadings (note that the "hat" symbol here is used in its conventional statistical meaning of "reconstructed" or "estimated"):

    r_n_hat = R_Ref(0)*uR(0)_n + R_Ref(1)*uR(1)_n + R_Ref(2)*uR(2)_n + ...     (1)

which may also be summarized over factors f=0,1,2,... using matrix notation as:

    r_n_hat = R_Ref*UR_n

where R_Ref = {R_Ref(0), R_Ref(1), R_Ref(2), ...} represents the spatial change factor loadings for redness in the extended reference model (for this holon), and UR_n = {uR(0)_n, uR(1)_n, uR(2)_n, ...} represents the temporal redness scores which are applied to the reference model to produce an estimate of frame n's redness.

Intensity change factors of this type are herein called "blush factors" because they may be used to model how a face blushes. However, it will be appreciated that these factors may be used to model many other types of signals and phenomena, including those not associated with video.

The use of these so-called blush factors is illustrated in Figures 4a through 4n. Figures 4a, 4b and 4c show the intensity images r_n, n=1,2,3 of a red color channel for a person blushing moderately (4a), blushing intensely (4b) and blushing lightly (4c), respectively. The first frame r_1 is here defined as the reference frame. Accordingly, R(0)_Ref = r_1.

Figures 4d through 4f show the corresponding intensity change fields DR_Ref,n, n=1,2,3. In this non-moving illustration, the change field for a frame equals the difference between the frame and the reference image, or dr_n = r_n - r_Ref(0). The change field is also shown as a curve for a single line taken through the blushing cheeks of Figures 4a through 4c. As shown in Figures 4d through 4f, the lightly blushing (pale) face of Figure 4c has the lowest intensity change field values (Figure 4f); the moderately blushing face of Figure 4a has no intensity change, since it actually is the reference image (Figure 4d); while the intensely blushing face of Figure 4b has the highest intensity change field values (Figure 4e).

The statistical processing of the present invention will extract a set of generalized blush characteristics or change factor loadings, to be used in different frames to model blushing states of varying intensity. Figures 4a through 4f indicate a single blush phenomenon with respect to the reference image. The principal component analysis of the change fields DR_Ref,n, n=1,2,3 may give a good description of this using one single blush factor, whose loading R(1)_Ref is shown in Figure 4h with the respective scores (0, 1.0 and -0.5) given below. The modelling of the red intensity during decoding in this case is achieved by applying these different scores to the main blush factor loading R(1)_Ref to produce different change fields DR_Ref,n (Figures 4i through 4k) and adding those to the reference image redness (Figure 4g) to produce the reconstructed redness images (Figures 4l through 4n):

    r_n_hat = R_Ref(0) + DR_Ref,n

where the redness change field is:

    DR_Ref,n = R_Ref(1)*uR(1)_n

As indicated by the numbers below Figures 4d-f, the score value uR(1)_n in this case is 0 for the reference image (4a) itself, since here r_n_hat = R_Ref(0); is positive, e.g., 1.0, for the second frame (4b) with more intense blushing; and is negative, e.g., -0.5, for the pale face in the third frame (4c). It should be noted that the negative score for the third frame, Figure 4c, transforms the positive blush loading of Figure 4h into a negative change field DR_Ref,3 for the third image, which is paler than the reference frame.

If more than one phenomenon contributed to the redness change in the images of this sequence, then the model would require more than one change factor. For example, if the general illumination in the room varied, independently of the person blushing and paling, this situation may be modelled using a two factor solution, where the second factor involves applying a score uR(0)_n to the reference frame itself:

    r_n_hat = R_Ref(0) + DR_Ref,n

where the blush change field is:

    DR_Ref,n = R_Ref(0)*uR(0)_n + R_Ref(1)*uR(1)_n

which may be generalized for different colors and different factors as:

    DR_Ref,n = R_Ref*UR_n     (2)

Thus, Figures 4a-4n show how the effect of the blush factor loading 4h (contained in I_Ref) can be increased or decreased (appropriately scaled by scores) to produce various blush change fields such as are shown in Figures 4d through 4f. In this manner, significant amounts of intensity information may be compressed and represented by a single loading (Figure 4h) and a series of less data intensive scores.
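The decoder side of this blush model reduces to a weighted sum of loadings, as in equation (2). The sketch below is illustrative only; the one-line scan, the loading values and the scores (0, 1.0, -0.5, mirroring Figures 4a-4c) are hypothetical:

```python
import numpy as np

def reconstruct_redness(R_ref, u_n):
    # Decode frame n's redness: factor 0 is the reference image itself,
    # factor 1 is the blush loading, each weighted by its score.
    return R_ref.T @ u_n   # (pixels x factors) @ (factors,) -> (pixels,)

R_ref = np.array([
    [0.5, 0.5, 0.6, 0.5, 0.5],   # R_Ref(0): reference redness of a scan line
    [0.0, 0.2, 0.4, 0.2, 0.0],   # R_Ref(1): blush factor loading (cf. Figure 4h)
])
for uR1 in (0.0, 1.0, -0.5):     # blush scores for frames 1..3
    print(reconstruct_redness(R_ref, np.array([1.0, uR1])))
```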
Changes in transparency T and changes in probabilistic properties P may be modelled in a similar manner. In the case of probabilistic modelling, bilinear modelling is used in the preferred embodiment of the present invention. The spatial loadings P(f), f=0,1,2,... and corresponding scores uP(f)_n, f=1,2,... together constitute the probabilistic change factors.
Similar to the blush factors used to represent intensity information, address information may also be modelled by a linear combination of change factor loadings. For example, a frame's vertical address information V_n may be modelled in terms of a linear combination or summation of change factor loadings:

    DV_n = V_Ref(0)*uV(0)_n + V_Ref(1)*uV(1)_n + V_Ref(2)*uV(2)_n + ...     (3)

which may also be summarized over vertical movement factors f=0,1,2,... in matrix notation as:

    DV_n = V_Ref*UV_n

where V_Ref = {V_Ref(0), V_Ref(1), V_Ref(2), ...} is the set of vertical spatial address change factor loadings in the extended reference model (for this holon), and UV_n = {uV(0)_n, uV(1)_n, uV(2)_n, ...} represents the temporal vertical movement scores which are applied to the reference model in order to produce an estimate of frame n's vertical coordinates for the various pixels in the frame. Address change factors of this type are referred to as "smile" factors, because they may be used to model how a face smiles.
Similar to the blush factors, here the vertical address change field needed to move the contents of the reference frame to approximate an input frame is referred to as DV_Ref,n. It may be modelled as a sum of change contributions from address change factor loadings (V_Ref) scaled by appropriate scores (u_n). The address change factors are used to model motion and distortion of objects. The address change factors used to model distortion of objects are referred to as "smile factors" because they may be used to model generalized, "soft" movements, e.g., how a face smiles. However, it will be appreciated that smile factors can equally well model any signal or phenomenon, including those not associated with video, which may be modelled as a complex of samples which may be distorted while still retaining a common fundamental property.
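Decoding a smile factor differs from decoding a blush factor in that the scaled loading is an address change field that moves pixels, rather than an intensity that is added. The following one-dimensional sketch is hypothetical (a real warper would interpolate and fill holes); it shows only the two steps: build DV = loading x score, then move reference samples to their displaced positions:

```python
import numpy as np

def smile_decode_line(ref, loading, score):
    dv = loading * score                           # address change field DV_Ref,n
    out = np.zeros_like(ref)
    src = np.arange(ref.size)
    dst = np.clip(np.rint(src + dv).astype(int), 0, ref.size - 1)
    out[dst] = ref[src]                            # nearest-neighbour forward warp
    return out

line = np.array([0., 0., 1., 1., 1., 0., 0.])      # reference line through the mouth
load = np.array([0., 0., -1., 0., 1., 0., 0.])     # hypothetical smile loading (cf. Figure 5h)
print(smile_decode_line(line, load, score=1.0))    # distorted one way ("smile")
print(smile_decode_line(line, load, score=-1.0))   # distorted the opposite way ("frown")
```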
The use of smile factors in accordance with the present invention is illustrated in Figures 5a through 5n. Figures 5a through 5c show a face exhibiting varying degrees of smiling. Figure 5a shows a moderate smile; Figure 5b shows an intense smile; and Figure 5c shows a negative smile or frown. The moderately smiling face of Figure 5a may be used as part of the reference frame Figure 5g for illustration. The address change fields DV_Ref,n corresponding to vertical movements of the mouth with respect to the reference image, as shown in Figures 5a through 5c, are shown in Figures 5d through 5f. The concept of "reference position" (corresponding to the reference image Figure 5g) is here illustrated for Figures 5d, 5e and 5f, in that numerical values of each pel in an address change field DV_Ref,n are given at pixel coordinates in the reference image of Figure 5g, not at the coordinates in frames n=1,2,3 (Figures 5a through 5c). Thus, the vertical change fields (movements) necessary to transform the reference image (Figure 5g) into each of the other frames, Figures 5a through 5c, are shown as vertical arrows at three points along the mouth at the position where the mouth is found in the reference image (Figure 5g). The base of the arrows is the location of the mouth in the reference image (Figure 5g), while the tips of the arrows are located at the corresponding points on the mouth in the other frames of Figures 5a through 5c. The full change fields are also given quantitatively alongside Figures 5d through 5f as continuous curves for the single line through the mouth in the reference image (Figure 5g).

Since the first frame of Figure 5a in this illustration functions both as the reference image (Figure 5g) and as an individual frame, the vertical smile change field DV_Ref,1 for frame 1 (Figure 5d) contains all zeros. In Figure 5b, the middle of the mouth moves downward and the ends of the mouth move upward. Thus, the smile field DV_Ref,2 is negative in the middle and positive at either side of the mouth in its reference position. The frown of Figure 5c illustrates the opposite type of pattern. These change fields thus contain only one type of main movement and may thus be modelled using only one smile factor, and this may be extracted by principal component analysis of the change fields in Figures 5d through 5f. The smile factor scores uV_n are, in this illustration, zero for the reference image itself (Figure 5a), positive for frame 2 (Figure 5b) and negative for frame 3 (Figure 5c), when the common vertical smile loading is as shown in Figure 5h.

If the head shown in Figures 5a through 5c were also moving, i.e., nodding, independently of the smile action, then a more involved movement model would be needed to accurately model all the various movements. In the simplest case, one or more additional smile factors could be used to model the head movements, in much the same manner as multi-factor blush modelling. Each smile factor would then have spatial loadings, with a variety of different movements being simply modelled by various combinations of the few factor scores. Spatial rotation of image objects in two or three dimensions would require factor loadings in more coordinate dimensions, or alternatively require various coordinate dimensions to share some factor loadings. For example, if the person in Figures 5a-5n tilted their head 45 degrees sideways, the smile movements modelled in Figures 5a-5n as purely vertical movements would no longer be purely vertical. Rather, an equally strong horizontal component of movement would also be required. The varying smile of the mouth would still be a one-factor movement, but now with both a vertical and a horizontal component. Both a vertical and a horizontal loading may be used, in this case with equal scores. Alternatively, both the vertical and horizontal movement may share the same loading (Figure 5h), but again with different scores depending on the angle of the tilting head.
For better control and simpler decoding and compression, some movements may instead be modelled by a hard movement model, referred to as "nod" factors. The nod factors do not utilize explicit loadings, but rather refer to affine transformations of solid bodies, including camera zoom and movements. Smile and nod movements may then be combined in a variety of ways. In a preferred embodiment according to the present invention, a cascade of movements is created according to some connectivity criteria. For example, minor movements and movement of pliable, non-solid bodies, such as a smiling mouth, may be modelled using smile factors (soft modelling), while major movements and movement of solid bodies, such as a head, may be modelled using nod factors (hard modelling). In the case of a talking head,
the soft models are first applied to modify the initial vertical reference addresses VRef to the "smiled" coordinates in the reference position, Vn-smiled@Ref. The same procedure is carried out for the horizontal, and optionally the depth, coordinates, forming An-smiled@Ref. These smiled coordinates An-smiled@Ref are then modified by affine transformations, i.e., rotation, scaling, shearing, etc., to produce the smiled and nodded coordinate values, still given in the reference position, An-smiled,nodded@Ref. The final address change field DARef,n is then calculated as DARef,n = An-smiled,nodded@Ref - ARef.
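This cascade can be sketched in code. The following minimal illustration assumes coordinates and loadings are stored as numpy arrays and that the nod transform is given as a 3x3 homogeneous matrix; all names and array layouts are assumptions, not the format prescribed by the invention.

```python
import numpy as np

def smile_then_nod(a_ref, smile_loadings, smile_scores, affine):
    """a_ref: (N, 2) vertical/horizontal reference addresses ARef.
    smile_loadings: (F, N, 2) soft loadings; smile_scores: (F,) scores.
    affine: 3x3 homogeneous matrix for the nod transform.
    Returns the address change field DARef,n."""
    # soft step: smiled coordinates, still in the reference position
    a_smiled = a_ref + np.tensordot(smile_scores, smile_loadings, axes=1)
    # hard step: affine transform of the already smiled coordinates
    homog = np.hstack([a_smiled, np.ones((len(a_smiled), 1))])
    a_nodded = (homog @ affine.T)[:, :2]
    return a_nodded - a_ref
```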
ENCODING
Generally, the encoding process includes establishing the spatial model parameters XRef for one or more reference images or models and then estimating the temporal scores Un and residuals En for each frame. The encoding process may be fully manual, fully automatic or a mix of manual and automatic encoding. The encoding process is carried out for intensity changes, movement changes, distortions and probabilistic statistical changes.
Manual Encoding
In one embodiment according to the present invention, video sequences may be modelled manually. In the case of manual modelling, an operator controls the modelling and interprets the sequence of the input video data. Manual modelling may be performed using any of a number of available drawing tools, such as "CorelDraw" or "Aldus Photoshop", or other specialized software.
Since humans are fairly good at intuitively discriminating between smile, blush and segmenting, the encoding process becomes mainly a matter of conveying this information to a computer for subsequent use, rather than having a computerized process develop these complicated relationships.
If there are reasons for using separate models, such as if the sequence switches between different clips, the clip boundaries or cuts may be determined by inspection of the sequence. Related clips are grouped together into a scene. The different scenes can then be modelled separately.
For a given scene, if there are regions which exhibit correlated changes in position or intensity, these regions are isolated as holons by the human operator. These regions may correspond to objects in the sequence. In addition, other phenomena such as shadows or reflections may be chosen as holons. In the case of a complex object, it may be advantageous to divide the object into several holons. For instance, instead of modelling an entire walking person as one holon, it may be easier to model each portion, e.g., limb, separately. For each holon, the frame where the holon is best represented spatially is found by inspection. This is referred to as the reference frame. A good representation means that the holon is not occluded by or affected by shadows from other holons, is not significantly affected by motion blur, and is as representative for as much of the sequence as possible. If a good representation cannot be found in any specific frame in the sequence, the holon representation may be synthesized by assembling good representation portions from several different original frames, or by retouching. In this case of a synthesized holon, the reference frame is made up of only the synthesized holon. Synthesized holons are quite adequate for partially transparent holons such as shadows, where a smooth dark image is often sufficient. This chosen or synthetic holon will be included as part of the reference image. The intensity images of the holons from the respective frames are extracted and assembled into one common reference image.
Each holon must be assigned an arbitrary, but unique, holon number. A segmentation image the same size as the reference image is then formed, the segmentation image containing all the holons; however, the pixel intensity for each pixel within the holon is replaced by the specific holon number. This image is referred to as the segmentation or S field.
Holon depth information is obtained by judging occlusions, perspective or any other depth cue, in order to arrange the holons according to depth. If there are several possible choices of depth orderings, e.g., if two holons in the sequence never occlude each other and appear to have the same depth, an arbitrary order is chosen. If no single depth ordering is possible, because the order changes during the sequence, e.g., holon A occludes holon B at one time while holon B occludes holon A at another time, one of the possible depth orderings is chosen arbitrarily. This depth ordering is then converted into a depth scale in such a way that zero corresponds to something infinitely far away and full scale corresponds to essentially zero depth, i.e., nearest to the camera. Depth scale may conveniently be specified or expressed using the intensity scale available in the drawing tool, such that infinitely far away objects are assigned an intensity of zero, and very close objects are assigned full scale intensity. Based on this depth ordering, an image is then formed having the same size as the reference image; however, each pixel value has an intensity value functioning as a depth value. This image is referred to as the Z field.
Manual modelling or encoding also includes determining holon opacity information. Opacity is determined by first forming an image that has the maximum intensity value for completely opaque pixels, zeros for entirely transparent pixels, and intermediate values for the remaining pixels. Typically, most objects will have the maximum value (maximum opacity) for the interior portion and a narrow zone with intermediate values at the edges to make the object blend well with the background. On the other hand, shadows and reflections will have values at approximately half the maximum. This image which indicates opacity is referred to as the Prob field.
Holon movement information is obtained by first determining the vertical and horizontal displacement between the reference image and the reference frame for each holon. This is carried out for selected, easily recognizable pixels of the holons. These displacements are then scaled so that no movement corresponds to more than half of the maximum intensity scale of the drawing tool. Darker intensity values correspond to vertically upward or horizontally leftward movements. Similarly, lighter intensity values correspond to the opposite directions, so that maximum movements in both directions do not exceed the maximum intensity value of the drawing tool. Two new images, one for the vertical and one for the horizontal dimension, collectively form the "first smile load", which is the same size as the reference image. The scaled displacements are then placed at the corresponding addresses in the first smile load, and the displacements for the remaining pixels are formed using manual or automatic interpolation.
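One reading of the scaling convention above, zero movement at mid-scale with signed displacements packed into the remaining range, can be sketched as follows. This assumes a drawing tool with an 8-bit intensity scale; the function names and the full-scale value are illustrative only.

```python
import numpy as np

def displacements_to_load(disp, full_scale=255, max_disp=None):
    """Pack signed displacements (one dimension) into a load image.

    disp: 2-D array of signed displacements; negative values (up/left)
    map to darker intensities, positive values to lighter ones, and no
    movement maps to half of full scale."""
    if max_disp is None:
        max_disp = float(np.max(np.abs(disp))) or 1.0
    half = full_scale / 2.0
    img = half + disp * (half / max_disp)
    return np.clip(np.round(img), 0, full_scale).astype(np.uint8)

def load_to_displacements(img, full_scale=255, max_disp=1.0):
    """Inverse mapping, recovering scaled displacements from a load."""
    half = full_scale / 2.0
    return (img.astype(float) - half) * (max_disp / half)
```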
The first smile load should preferably be verified by preparing all of the above-described fields for use in the decoder, along with a table of score values (this table is referred to as the "Time Series"). Next, the scores for the first smile factor are set to 1 for all holons which form part of a test frame, which is then decoded. The resulting decoded frame should provide good reproduction of the holons in their respective reference frame (except for blush effects, which have not yet been addressed). If this is not the case, the cause of each particular error can easily be attributed to an incorrect smile score or load, which may be adjusted, and then the process repeated using the new values. This process correctly establishes how to move holons from the reference image position to the reference frame position.
Next, the movement of holons between frames must be estimated. For each holon, a frame is selected where the holon has moved in an easily detectable manner relative to the decoded approximation of the reference frame, Im, which is referred to as an intermediate frame. The same procedure for determining the first smile load is carried out, except that now movement is measured from the decoded reference frame to the selected new frame, and the resulting output is referred to as the "second smile load." These displacements are positioned in the appropriate locations in the reference image, and the remaining values obtained by interpolation. The smile scores for both the first and second smile loads for all holons are set to 1, and then the selected frame is decoded. The result should be a good reproduction of the selected frame (except for blush effects, which have not yet been addressed).
The movement for the remaining frames in the sequence is obtained by merely changing the smile scores using trial and error based on the already established smile loads. Whenever a sufficiently good reproduction of the movement cannot be found using the already established smile factors only, a new factor must be introduced according to the method outlined above. The displacement for selected features (pixels) between each decoded intermediate frame Im and the corresponding frame in the original sequence is measured and the result stored in the reference image position. The remaining pixels are obtained by interpolation, and the final result verified and any necessary correction performed.
When the above process for calculating smile factors has produced sufficiently accurate movement reproduction, blush factors may then be introduced. This may be performed automatically by working through each frame in the sequence, decoding each frame using the established smile factors, and calculating the difference between each decoded frame and the corresponding frame in the original sequence. This difference is then moved back to the reference position and stored. Singular value decomposition may then be performed for the differences represented in the reference position, in order to produce the desired blush loads and scores.
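The blush step can be pictured concretely. The following is a minimal sketch assuming the moved differences are stacked into a frames x pixels numpy matrix, with numpy's SVD standing in for the singular value decomposition named above; the function name and factor count are assumptions.

```python
import numpy as np

def extract_blush_factors(diffs_at_ref, n_factors=2):
    """Factor intensity differences already moved to the reference
    position.

    diffs_at_ref: (n_frames, n_pixels) matrix of differences.
    Returns blush loads (one spatial pattern per factor) and the
    corresponding per-frame blush scores."""
    U, s, Vt = np.linalg.svd(diffs_at_ref, full_matrices=False)
    loads = Vt[:n_factors]                       # blush loads
    scores = U[:, :n_factors] * s[:n_factors]    # blush scores per frame
    return loads, scores
```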
Addition of nod factors
Nod and smile factors may be combined in several ways, two of which will be discussed. In the first method, movement can be described as one contribution from the smile factors and one contribution from the nod factors, with the two contributions being added together. In the second method, the pixel coordinates can first be smiled and then nodded.
In the first method, i.e., additive nod and smile factors, the decoding process for one pixel in the reference image adds together the contributions from the different smile factors, and calculates the displacement due to the nod factors using the original position in the reference image. These two contributions are then added to produce the final pixel movement.
In the second method, i.e., cascaded nod and smile factors, the decoding process first adds together the contributions from the different smile factors, and then applies the nod factors to the already smiled pixel coordinates.
The first method is somewhat simpler to implement, while the second method may produce a model which corresponds more closely to the true physical interpretation of sequences where nod factors correspond to large movements of entire objects and smile factors correspond to small plastic deformations of large objects.
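The distinction between the two combination rules can be sketched for a set of pixels as follows, assuming hypothetical `smile` and `nod` displacement functions that return a displacement for each input coordinate; this is an illustration, not the decoder defined by the invention.

```python
import numpy as np

def decode_additive(a_ref, smile, nod):
    """Both contributions evaluated at the reference position, summed."""
    return a_ref + smile(a_ref) + nod(a_ref)

def decode_cascaded(a_ref, smile, nod):
    """Nod displacement evaluated at the already smiled coordinates."""
    a_smiled = a_ref + smile(a_ref)
    return a_smiled + nod(a_smiled)

# Toy usage with a small plastic bulge as "smile" and a small rotation
# as "nod"; both functions are hypothetical stand-ins.
a = np.array([[0.0, 0.0], [1.0, 2.0]])
smile = lambda p: 0.1 * p
nod = lambda p: p @ np.array([[0.0, -0.1], [0.1, 0.0]])
print(decode_additive(a, smile, nod))
print(decode_cascaded(a, smile, nod))
```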
The process of extracting smile factors can be extended to also include nod factors, which are used to represent movements of solid objects (affine transformations). Essentially, nod factors are special cases of smile factors. Specifically, each time a new smile factor has been calculated for a holon, it can be approximated by a nod factor. This approximation will be sufficiently accurate if the smile loads possess characteristics such that, for the vertical and horizontal dimensions, the movement of a pixel can be considered as a function of its vertical and horizontal position which can be fitted to a specific plane through 3-dimensional space. Nod factors essentially correspond to the movement of rigid objects. The approximation will be less accurate when the smile factors correspond instead to plastic deformations of a holon.
To establish the nod loads, the smile loads are projected onto three "nod loads" of the same size as the extended reference image. The first nod load is an image where each pixel value is set to the vertical address of that pixel. The second nod load is an image where each pixel value is set to the horizontal address of that pixel. Finally, the third nod load is an image consisting of all ones.
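The projection onto these three nod loads amounts to a least squares fit of the smile load by an affine function of pixel position. A minimal sketch with hypothetical names, assuming the smile load for one coordinate dimension is stored as a numpy array:

```python
import numpy as np

def fit_nod_load(smile_load):
    """Approximate a smile load by an affine (nod) factor.

    smile_load: (H, W) displacement load for one dimension.
    Returns the three affine coefficients (for the vertical-address,
    horizontal-address and all-ones nod loads) and the fitted field."""
    H, W = smile_load.shape
    v, h = np.mgrid[0:H, 0:W]
    basis = np.stack([v.ravel(), h.ravel(), np.ones(H * W)], axis=1)
    coef, *_ = np.linalg.lstsq(basis, smile_load.ravel(), rcond=None)
    approx = (basis @ coef).reshape(H, W)
    return coef, approx
```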
In the case of a nod factor added to a smile factor, i.e., additive nod, the above procedure for extracting new smile factors may be utilized. However, for the case of a cascaded nod factor, i.e., encoding using first a nod factor and then a smile factor, one additional step must be performed in the encoding process. Whenever a new smile load is estimated based on an intermediate frame Im which has been produced using nod factors, not only must the position in Im of the displacement be mapped back to the reference image, but the actual displacements must also be mapped back using the inverse of the nod factor. In the case of cascaded nod and smile, in the decoder, each frame is first "smiled" and then "nodded."
DEEPENING NOD
In the general case of one nod factor per holon, the nod factors transmitted to the decoder consist of one set of nod parameters for each holon for each frame. However, there may be strong correlations between the nod parameters between holons and between frames. The correlations between holons may be due to the fact that the holons represent individual parts of a larger object that moves in a fairly coordinated manner, which is, however, not sufficiently coordinated to be considered a holon itself. In addition, when the holons correspond to physical objects, there may also be correlations between frames due to physical objects exhibiting fairly linear movement. When objects move in one direction, they often continue moving at approximately the same speed in a similar direction over the course of the next few frames. Based on these observations, nod factors may be deepened.
In the case of manual encoding, the operator can usually group the holons so that there is a common relationship among the holons of each group. This grouping is referred to as a superholon and the individual holons within such a group are referred to as subholons. This type of grouping may be repeated, whereby several superholons may themselves be subholons of a higher superholon. Both subholons and superholons retain all their features as holons.
In the case of automatic encoding, similar groupings can beestablished through cluster analysis of the nod transforms.
The nod factors for the subholons of one superholon may be separated into two components, the first component used to describe movements of the superholon and the second component used to describe movement of that individual subholon relative to the superholon.
The deepening of the nod factors between frames includes determining relationships between frames for nod factors belonging to the same holon, be it a standard holon, superholon or subholon. This is accomplished by dividing the nod factors into a static part, which defines a starting position for the holon; a trajectory part, which defines a trajectory the holon may follow; and a dynamic part, which describes the location along the trajectory for a specific holon in a given frame. Both the static and trajectory parts may be defined according to the reference image or to the nod factors of superholons.
The deepened nod factors represent sets of affine transforms and may be represented as a set of matrices, see William M. Newman and Robert F. Sproull, Principles of Interactive Computer Graphics, page 57 (McGraw-Hill 1984), which is incorporated herein by reference. The static part corresponds to one fixed matrix. The trajectory and dynamic parts correspond to a parameterized matrix, the matrix being the trajectory part and the parameter being the dynamic part, see Newman & Sproull, page 59, which is incorporated herein by reference. These transforms may be concatenated together with respect to the relationships between the static, trajectory and dynamic parts. The transforms may also be concatenated together with respect to the combinations of several behaviors along a trajectory, as well as with respect to the relationships between superholons and subholons, see Newman & Sproull, page 58, which is incorporated herein by reference.
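The static/trajectory/dynamic split and the concatenation of superholon and subholon transforms can be sketched with homogeneous matrices as follows. The specific transforms chosen (a translation as the static part, a rotation as the trajectory) are arbitrary examples, not the decomposition mandated by the invention.

```python
import numpy as np

def translation(dv, dh):
    return np.array([[1, 0, dv], [0, 1, dh], [0, 0, 1]], float)

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], float)

static = translation(10, 20)        # static part: fixed starting pose

def nod_for_frame(t):
    """t is the dynamic part: the position along the trajectory."""
    trajectory = rotation(t)        # trajectory part: parameterized
    return static @ trajectory      # concatenated deepened nod

# A subholon's transform is concatenated onto its superholon's:
superholon = nod_for_frame(0.1)
subholon_local = translation(2, 3)
subholon_total = superholon @ subholon_local
```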
The above operations may be readily performed by a human operator utilizing: a method for specifying full affine transform matrices without parameters; a method for storing transform matrices with sufficient room for one parameter each specifying translation, scaling, rotation or shear; a method for specifying which transform matrices should be concatenated together in order to form new transform matrices; and a method for specifying which transform (which may be a result of concatenating several transforms) should be applied to each holon.
Automatic Encoding
In the case of automatic or semi-automatic encoding, the encoding process may be iterative, increasing the efficiency of the encoding with each iteration. An important aspect of automatic encoding is achieving the correct balance between intensity changes and address changes, because intensity changes may be modelled inefficiently as address changes and vice versa. Thus, in the modelling of the domains it is critical that the respective scores and residuals be estimated by a process which avoids inefficient modelling of intensity changes as address changes and vice versa. This is accomplished by building the sequence model in such a way that blush modelling is introduced only when necessary and by making sure that the model parameters have applicability to multiple frames. A preferred embodiment involving full sequence modelling, and an alternative embodiment involving simplified sequence modelling, will be described herein. In the present description, the individual building blocks of the encoder will first be presented at a fairly high level, and then the operation and control of these building blocks will be described in more detail.
Automatic Encoder Overview
Automatic or semiautomatic encoding according to the present invention in the case of video sequence data will be described in detail with reference to Figures 6-13. Figure 6 is a block diagram of an encoder according to the present invention. Figure 7 is a block diagram of a model estimator portion of the encoder of Figure 6. Figures 8-10 show details and principles of a preferred embodiment of the ChangeFieldEstimator part of the ModelEstimator.
Figure 11 shows details of the LocalChangeFieldEstimator part of the ChangeFieldEstimator.
Figure 12 outlines the Interpreter of the ModelEstimator.
Figure 13 outlines the separate Decoder.
High Level Encoder Operation
The input data (610), which may be stored on a digital storage medium, consists of the video sequence xseq with input images for frames n=1,2,...,nFrames. This input includes the actual intensity data iseq, with individual color channels according to a suitable format for color representation, e.g. [Rseq, Gseq, Bseq], and some suitable spatial resolution format. The input also consists of implicit or explicit 2D coordinate address or location data aseq for the different pixels or pels. Thus, the video sequence for each frame consists of in, an and pn information.
Finally, xseq may also include probabilistic qualities pseq to be used for enhancing the IDLE encoding.
These data consist of the following results of preprocessing of each frame: (a) Modelability, which is an estimate of the probability that the different parts of a frame are easily detectable in preceding or subsequent frames; (b) HeteroPel, which indicates the probability that the pels represent homogenous or heterogenous optical structures.
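The per-frame record just described can be pictured as a small data structure. The following is a minimal sketch using numpy arrays; the class and field names are hypothetical illustrations, not the format defined by the invention.

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class Frame:
    """Hypothetical per-frame record xn = [in, an, pn]."""
    intensity: np.ndarray              # in, e.g. (H, W, 3) RGB
    address: np.ndarray                # an, (H, W, 2) vertical/horizontal
    prob: Optional[np.ndarray] = None  # pn, e.g. Modelability, HeteroPel

def implicit_addresses(H, W):
    """Implicit 2-D pel addresses for an H x W frame."""
    v, h = np.mgrid[0:H, 0:W]
    return np.stack([v, h], axis=-1).astype(float)
```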
The automatic encoder according to the present invention consists of a high-level MultiPass controller 620 and a ModelEstimator 630. The MultiPass controller 620 optimizes the repeated frame-wise estimation performed for a series of frames of a given sequence. The ModelEstimator 630 optimizes the modelling of each individual video frame n.
In the preferred embodiment, a full sequence model with parameters in the different domains is gradually expanded ("extended" and "widened") and refined ("deepened" or statistically "updated") by including information from the different frames of a sequence. The full sequence model is further refined in consecutive, iterative passes through the sequence.
In contrast, in the alternative embodiment involving simplified modelling, a set of competing extra sequence models are developed in the different domains and over a number of different frames, in order to model the as yet unmodelled portion of the input frames xn. It should be noted that the modelled portion of the input frames xn has been modelled using the established sequence model XRef. Each of these competing extra models has parameters in only one single domain. The number of frames (length of a pass) used to estimate parameters in each of the domains depends on how easily the frames are modelled. At the end of the pass in each domain, the full sequence model is then "widened" or "extended" by choosing a new factor or segmentation from the competing extra domain model that has shown the best increase in modelling ability for the frames. This embodiment is described in detail in Appendix II, SIMPLIFIED ENCODER.
The ModelEstimator 630 takes as input the data for each individual frame (640), consisting of [in, an and pn] as defined above. It also takes as input a preliminary, previously estimated model XRef (650) as a stabilizing input for the sequence. As output, the ModelEstimator 630 delivers a reconstructed version of the input image xn hat (660) and a corresponding lack-of-fit residual en = xn - xn hat (665), plus an improved version of the model XRef (655).
The ModelEstimator 630 may also input/output LocalModels 670 for the data structures in the vicinity of frame n. Additionally, the ModelEstimator 630 may take as input pre-established model elements from an external ModelPrimitives data base 680, which may consist of spatial and temporal models of movement patterns, e.g. a human face or body, running water, moving leaves and branches, and simpler modelling elements such as polyhedral object models (see David W. Murray, David A. Castelow and Bernard F. Buxton, "FROM IMAGE SEQUENCES TO RECOGNIZED MOVING POLYHEDRAL OBJECTS", International Journal of Computer Vision, 3, pp. 181-208, 1989, which is incorporated herein by reference).
The ModelEstimator 630 also exchanges control information 635 and 637 from and to the MultiPass Controller 620. Details regarding the control parameters are not explicitly shown in the subsequent figures.
Model Estimator

A full implementation of the ModelEstimator 630 of Figure 6 is shown in Figure 7 for a given frame n. The ModelEstimator 630 contains a ChangeFieldEstimator 710 and an Interpreter 720. The ChangeFieldEstimator 710 takes as primary input the data for the frame, xn (corresponding to 640) (consisting of image intensity data in, address information an and probabilistic information pn). It also takes as input information from the preliminary version of the current spatial and temporal Model XRef, USeq 760 (corresponding to 650) existing at this point in time in the encoding process. The preliminary model information 760 is used to stabilize the estimation of the change field image fields in the ChangeFieldEstimator 710, the change fields being used to change the intensity and other quantities of the preliminary SequenceModel XRef, USeq (760) of the extended Reference image in order to approximate as closely as possible the input image intensities, in.
The ChangeFieldEstimator 710 also inputs various control parameters from the MultiPass Controller 620 and exchanges local control information 755 and 756 with the Interpreter 720.
As its main output, the ChangeFieldEstimator 710 yields the estimated change image fields DXRef,n (730) which are used to change the spatial and temporal parameters of the preliminary SequenceModel XRef, USeq (760) of the extended Reference image in order to approximate, as closely as possible, the input image intensities, in. It also yields preliminary model-based decoded (reconstructed) versions of the input image, xn hat (640), and the corresponding lack-of-fit residuals en (645).
The ChangeFieldEstimator 710 also yields local probabilistic quantities wn (750), which contain various warnings and guidance statistics for the subsequent Interpreter 720. Optionally, the ChangeFieldEstimator 710 inputs and updates local models 670 to further optimize and stabilize the parameter-estimation process.
The Interpreter 720 receives the estimated change image fields DXRef,n 730 as well as the preliminary forecast xn hat and residual en, plus the estimation warnings wn 750 and control parameters output from the MultiPass Controller 620. Optionally, the Interpreter 720 receives input information from the external data base of model primitives, 780. These model primitives are of several types: sets of spatial loadings or temporal score series previously estimated from other data may be included in the present IDLE model in order to improve compression or model functionality. One example of usage of spatial loading models is when already established general models for mouth movements are adapted into the modelling of a talking person's face in picture telephone encoding. Thereby a wide range of mouth movements becomes available without having to estimate and store/transmit the detailed factor loadings; only the parameters for adapting the general mouth movement loadings to the present person's face need to be estimated and stored/transmitted.
Similarly, including already established movement patterns into an IDLE model is illustrated by using pre-estimated score time series for the movement of a walking and a running person in video games applications. In this case the pre-established scores and their corresponding smile loadings must be adapted to person(s) in the present video game reference image, but the full model for walking and running people does not have to be estimated. A third example of the use of model primitives is the decomposition of the reference image into simpler, pre-defined geometrical shapes (e.g. polygons) for still image compression of the reference model XRef.
The Interpréter subsequently modifies the contents of the SequenceModel XRef 760 and outputs this as an updated sequence SequenceModel (765), together with a modified model-based decoded version of the input image, x„hat (770) and the corresponding lack-of-fit residual en (775). Upon 0 1 0269 54 convergence (determined in the MultiPass Controller 620) these outputs are used as the outputs of the entire ModelE- stimator (630) .
Change Field Estimator
Figure 8 is a block diagram representation of a ChangeFieldEstimator 710 according to a preferred embodiment of the present invention. As shown in Figure 8, an input frame xn, which has been converted into the correct format and color space used in the present encoder, is provided to the ChangeFieldEstimator 710. The SequenceModel XRef (760), in whatever form available at this stage of the model estimation, is also input to the ChangeFieldEstimator 710. The main output from the ChangeFieldEstimator 710 is the change image field DXRef,n (890) which converts the SequenceModel XRef 810 into a good estimate of the input frame xn.
The ChangeField Estimator 710 may be implemented in either of two ways. First, in the preferred embodiment, the change fields are optimized separately for each domain, and the optimal combination determined iteratively in the Interpreter 720. Alternatively, the change fields may be optimized jointly for the different domains within the ChangeField Estimator 710. This will be described in more detail below.
Additional outputs include the preliminary estimate xn hat (892), the difference between the input and preliminary estimate, en (894), together with warnings wn (896).
Forecasting position m
For both computational and statistical reasons, it is important to simplify the estimation of the change field as much as possible. In the present embodiment of the change field estimator, this is accomplished by forecasting an estimate xm which should resemble the input frame xn, and then only estimating the local changes in going from xm to xn in order to represent each input frame more accurately.
As will be described in more detail below, the ChangeFieldEstimator 710 of the present preferred embodiment initially utilizes an internal Forecaster 810 and Decoder 830 to forecast an estimate, termed xm, 835, to resemble the input frame xn. The Forecaster (810) receives as input the temporal SequenceModel (811) and outputs forecasted temporal scores um (815), which are then input to the Decoder (830). The Decoder 830 combines these scores with the spatial sequence model XRef 831, yielding the desired forecasted frame xm (835). Additional details regarding the decoder are set forth below.
Estimating local change field from m to input frame n
Next, a LocalChangeFieldEstimator (850) is employed to estimate the local change field needed to go from the forecasted frame xm to the actual input frame xn. This change is referred to as the estimated local change field dxmn (855), and contains information in several domains, mainly movement and intensity change, as will be discussed in detail below.
In order to be able to model these new changefield datatogether with corresponding changefield data obtained previ-ously for other frames, it is important to move the change- 10 field data for ail frames to a common position. In the présent embodiment, this common position is referred to asthe Reference position, or reference frame XRef. This move-ment back to the common reference position will be describedbelow. Note that capital letters will be used to designate 15 data given in this reference position of the extended reference image model, while lower-case letters will be usedfor data given in the input format of image and approxi-mations of the input image x^.
An auxiliary output from the Decoder 830 is the inverse address change field, dam,Ref 865, that allows a Mover operator 870 to move the obtained local change field information dxmn from being given in the m position back to the common Reference position. This moved version of dxmn is referred to as DXmn@Ref 875, with capital letters denoting that the information is now given in the reference position.
The LocalChangeFieldEstimator 850 may also receive the full model XRef, moved to the m position (xRef@m 836), plus correspondingly moved versions of DXRef,m 825, and the return smile field dam,Ref 865 as inputs (not shown) from the Decoder 830, for use in internal stabilization of the parameter estimation for dxmn 855.
Estimating the full change field for frame n

The next step in the encoding process is to determine the full estimated change field in going from the Reference position to the estimated position of input frame n. This is accomplished by presenting the change field DXRef,m originally used for transforming XRef to xm to Adder 880 together with the obtained DXmn@Ref, yielding the desired main output, DXRef,n.
Illustration of local change estimation
The use of the forecasted position m, which has been described above, is illustrated conceptually in Figure 9 for the case of an address change DA for a given pel in an image representing a moving object. The determination of DARef,n (as part of the change field DXRef,n) is represented as element 902 in Figure 9. The estimation of DARef,n is a four-stage process.
The first step is to determine the forecast change field that moves spatial information from the Reference position to the forecasted m position, resulting in an approximation of the input frame n. This is based on the address change field DARef,m (904) represented by the vector from point Ref to point m. This vector is determined by forecasting, and is a part of DXRef,m.
Second, the local movement field from the forecasted position m to the actual input frame n, damn (926), is determined.
Third, the estimated result damn is "moved" or translated back from the m position to the Reference position, using the inverse movement field dam,Ref (905) (i.e., the vector from the m position to the Reference position), thus yielding DAmn@Ref (936).
Finally, the two fields given with respect to the Reference position Ref, i.e., DARef,m and DAmn@Ref, are added to yield the desired DARef,n (946).
Thus, the function of the Mover 870 is to "move" the local change field damn back to the reference image model position Ref. All the elements in dxmn (dimn, damn and dpmn) are thus moved back to the Ref position. The output of Mover 870 is DXmn@Ref (875), which is the local change information in going from the forecasted frame m to the input frame n, but positioned with respect to the Reference position Ref. The change information is "moved" back to the reference position Ref in order to ensure that change information obtained from frame n about a given object is positioned together with change information obtained from other frames about the same object. By positioning all information about an object in the same pel position, it is possible to develop simple models of the systematic changes in the sequence.
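The move-and-add combination of Figure 9 can be sketched as follows, assuming dense displacement fields stored as numpy arrays and a simple integer gather used as the "move" operation; a real mover would need interpolation and occlusion handling, which are omitted here, and all names are hypothetical.

```python
import numpy as np

def move_field(field, inverse_addr):
    """Relocate per-pixel data from the m position to the Ref position.

    field: (H, W, ...) data given in the m position.
    inverse_addr: (H, W, 2) integer coordinates in m for each Ref pixel
    (the return field dam,Ref resolved to absolute addresses)."""
    v, h = inverse_addr[..., 0], inverse_addr[..., 1]
    return field[v, h]

def full_change_field(DA_ref_m, da_mn, inverse_addr):
    """Stages 3 and 4 of Figure 9 for the address domain."""
    DA_mn_at_ref = move_field(da_mn, inverse_addr)   # stage 3: Mover
    return DA_ref_m + DA_mn_at_ref                   # stage 4: Adder 880
```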
In this way, the system attempts dynamically to improve the initial estimation of input frames. In the case where the address change field DARef,m (904) is defined to be all zeros, the LocalChangeFieldEstimator 850 has to estimate the full change field DARef,n directly as damn. This may for example take place at the beginning of an encoding process, and for frames n close to the frame used for initializing the reference image model.
It should be noted that the local probabilistic change information dpmn contains extra dimensions containing statistical descriptions of the performance of the LocalChangeFieldEstimator (850). For these dimensions, the corresponding change field in DPRef,m is considered as being empty. These additional dimensions are used by the Interpreter (720) for encoding optimization. These dimensions may, for example, reflect possible folding or occlusion problems causing xm to have lost some of XRef's spatial information needed to estimate input frame xn, as well as spatial innovations in xn needed to be included into XRef at a later stage.
The LocalChangeFieldEstimator (850) also outputs an estimate of the input frame, xn hat (892), the lack-of-fit residual en (894) and certain interpretation warnings wn (896). These are also passed on to the Interpreter (720), where they are used for encoding optimization.
The input and output of Local Model information (899) for the LocalChangeFieldEstimator will be discussed in detail below.
Change Field Estimator
The Local Change Field Estimator 850 of Figure 8 is shown in more detail in Figure 10, with each domain I, A and P illustrated separately. It should be noted that each of these domains again contains subdomains (e.g. R, G, B in I; V, H, Z in A). For purposes of simplicity, these are not illustrated explicitly.
Referring now to Figure 10, which is a more detailed illustration of the main parts of the Change Field Estimator of Figure 8, the available temporal score estimates for the sequence are used in the Forecaster 1010 to yield forecasted factors or scores for frame m in the three domains: Intensity (uIm), Address (uAm) and Probabilities (uPm).
Internal decoder portion of encoder

ChangeFieldMaker

The internal decoder portion of the encoder includes ChangeField Maker 1020, Adder 1030 and Mover 1040, which operate on their associated input, output and internal data streams. In the first stage (change field maker) of the decoder portion internal to the encoder, the factors or scores are combined with the corresponding spatial factor loadings available in the (preliminary) spatial model XRef in the ChangeField Maker 1020 to produce the forecast change fields. For each domain I, A and P, and for each of their subdomains, the estimated factor scores and factor loadings are multiplied and the result accumulated, yielding the forecast change fields DIRef,m, DARef,m, DPRef,m.
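The multiply-and-accumulate of the ChangeField Maker can be sketched as follows for one domain and sub-operand; the array shapes and names are assumptions made for illustration.

```python
import numpy as np

def make_change_field(loadings, scores):
    """First decoder stage: forecast change field for one sub-operand.

    loadings: (F, H, W) spatial factor loadings from XRef.
    scores: (F,) forecasted scores for frame m.
    Returns the (H, W) forecast change field, e.g. one plane of
    DARef,m."""
    field = np.zeros(loadings.shape[1:])
    for load_f, u_f in zip(loadings, scores):
        field += u_f * load_f          # multiply and accumulate
    return field
```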
For simplicity, the additional functionality of hard modelling is not included in Figures 8 and 10 for the internal decoder portion of the encoder. This will instead be discussed below in conjunction with the separate Decoder of Figure 13, together with various other additional details, as the separate Decoder is essentially identical to the present internal decoder portion of the encoder.
Adder
In the second stage (adder) of the decoder, the change fields are added to the corresponding basic (preliminary) spatial images in Adder 1030, i.e., the extended reference image intensities IRef(0) (e.g. RGB), the (implicit) extended reference image addresses ARef(0) (e.g. VHZ) and the extended reference image probabilities PRef(0) (e.g. opacity). This results in Im@Ref, Am@Ref and Pm@Ref.
Mover
The forecast change fields are transformed in Mover 1040 in accordance with the movement field DARef,m (904 in Fig. 9), yielding the forecasted intensity image im (e.g. in RGB), forecasted address image am (e.g. VHZ) and forecasted probabilistic image pm (e.g. opacity). Together, these forecasted data portions form the forecast output xm (835 in Figure 8) from Decoder 830 of Figure 8.
Local ChangeField Estimator
The Local ChangeFieldEstimator (850) estimates how to change the forecasted xm generated in the Decoder 830, in one or more domains, primarily the intensity domain, in order to accurately approximate the input frame, xn. The resulting estimated changes are referred to as the Local Change Fields dxmn.
The sequence model loadings, moved from the reference position to the forecasted position, xRef@m 837, may be used as input for statistical model stabilization. In addition, Local Models 899 may be used to stabilize this estimation. The Local Models may be a special-case model optimized for a particular subset of frames.
Separate versus joint domains in change field estimation
In the case of joint domain estimation of the local change fields in the ChangeField Estimator 710, some m-n deviations are attributed to intensity difference dimn, while some m-n deviations are instead attributed to movements damn, and additional m-n deviations attributed to segmentation and other probabilistic differences dpmn. The ChangeField Estimator 710 then requires internal logic and iterative processing to balance the different domains so that the same m-n change is not modelled in more than one domain at the same time. Since the resulting local change field dxmn already contains the proper balance of the contributions from the different domains, this simplifies the remaining portion of the encoding process.
However, when dealing with joint local change field domains, the Local ChangeField Estimator 850 must make iterative use of various internal modelling mechanisms in order to balance the contributions from the various domains.
Since these internal mechanisms (factor-score estimation, segmentation) are already required in the Interpreter (to balance the contributions of different frames), the preferred embodiment instead employs separate modelling of the various change field domains in the Local ChangeField Estimator 850. This results in a much simpler design of the Local ChangeField Estimator 850. However, the encoding process must then iterate back and forth between the ChangeField Estimator 710 and the Interpreter 720 several times for each frame n, in order to arrive at an optimal balance between modelling in the different domains for each frame. The forecasted frame xm is thus changed after each iteration in order to better approximate xn, and the incremental changes in the different domains are accumulated by the Interpreter 720, as will be described below.
Local ChangeField Estimator using separate domain modelling
The primary purpose of the LocalChangeFieldEstimator 850, shown in detail in Figure 11, is to estimate, using the forecasted frame xm 1101 and input frame xn 1102, the local change fields dxmn 1103 used in going from the forecasted frame m to the input frame n.
The Local ChangeFieldEstimator 850 employs separate estimation of the different domains. An estimator, EstSmile 1110, estimates the local address change fields (smile fields) damn 1115, while a separate estimator, EstBlush 1120, estimates the local intensity change fields (blush fields) dimn 1125. Either of these estimators may be used to estimate the probabilistic change fields dpmn 1126. The embodiment of Figure 11 illustrates the case where the probabilistic change fields are estimated by the EstBlush estimator 1120.
In addition, both estimators 1110 and 1120 provide approximations, 1112 and 1114 respectively, of the input data, residuals and warnings. The warnings are used for those image regions that are difficult to model in the given estimator. The output streams 1112 and 1114 from the two estimators are then provided as two separate sets of output approximations xn hat, residuals en and warnings wn.
EstSmile 1110 motion estimator
The EstSmile 1110 motion estimator estimates the local address change field damn primarily by comparing the forecasted intensity im to the actual input intensity in using any of a number of different comparison bases, e.g., sum of absolute differences or weighted sum of squared differences. A variety of motion estimation techniques may be used for this purpose, such as the frequency domain techniques described in R.C. Gonzales and R.E. Woods, Digital Image Processing, pp. 465-478 (Addison-Wesley, 1992), which is incorporated herein by reference, or methods using coupled Markov random field models as described in R. Depommier and E. Dubois, MOTION ESTIMATION WITH DETECTION OF OCCLUDED AREAS, IEEE 0-7803-0532-9/92, pp. III269-III272, 1992, which is incorporated herein by reference.
The preferred embodiment according to the present invention utilizes a motion estimation technique which seeks to stabilize the statistical estimation and minimize the need for new spatial smile loadings by using model information already established. The spatial model structures, moved from the reference position to the m position, xRef@m, are one such type of model information. This type of model information also includes the moved version of the estimated weights Wgts_XRef, as will be described in greater detail below.
The probabilistic domain pRef@m includes segment information sRef@m which allows the pixels in the area of holon edges to move differently from the interior of a holon. This is important in order to obtain good motion estimation and holon separation when two holons are adjacent to each other. The EstSmile estimator 1110 itself may find new local segments which are then passed to the Interpreter 720 as part of the warnings wn or probabilistic properties dpmn. Local segments are generally sub-segments or portions of a segment that appear to move as a solid body from the forecasted frame m to frame n.
The address domain contains spatial address factor loadings a(f)Ref@m, f=0,1,2,... in each coordinate sub-operand and for each holon. The motion estimation seeks preferably to accept motion fields damn that are linear combinations of these already reliably established address factor loadings. This necessitates the use of an internal score estimator and residual change field estimator similar to those used in the Interpreter 720. Temporal smoothness of the scores of frame n vs. frames n-1, n+1, etc., may then be imposed as an additional stabilizing restriction.
The motion estimation may also include estimation of "hard" nod factors for the different segments. These segments may be the whole frame (for pan and zoom estimation), the holons defined in the forecast sm, or they may be new local segments found by the motion estimation operator itself.
The input uncertainty variances of the intensities and addresses of the various inputs, xm, xn, xRef@m, are used in such a way as to ensure that motion estimation based on uncertain data is generally overridden by motion estimation based on relatively more certain data. Likewise, motion estimates based on pixel regions in the forecasted frame xm or input frame xn previously determined to be difficult to model, as judged e.g. by pn, are generally overridden by motion estimates from regions judged to be relatively easier to model.
During the initial modelling of a sequence, when no spatial model structures have as yet been determined, and when the extracted factors are as yet highly unreliable, other stabilizing assumptions, such as spatial and temporal smoothness, are afforded greater weight.
The EstSmile 1110 estimator may perform the motion estimation in a different coordinate system than that used in the rest of the encoder, in order to facilitate the motion estimation process.
EstBlush 1120 intensity change estimator
The EstBlush estimator 1120 estimates the local incremental blush field dimn, which in its simplest version may be expressed as: dimn = in - im. It should be noted that during the iterative improvement of the estimated change field for a given frame, it is extremely important that the blush field used for reconstructing the forecasted frame xm in the Decoder 830 in a certain iteration not be based simply on dimn = in - im from the previous iteration, since this would give an artificially perfect fit between the forecasted frame m and input frame n, thus prematurely terminating the estimation process for better smile and probabilistic change fields.
The EstBlush estimator 1120 also detects local changes in the probabilistic properties, dpmn, by detecting, inter alia, new edges for the existing holons. This may be based on local application of standard segmentation techniques. Changes in transparency may also be detected, based on a local trial-and-error search for minor changes in the transparency scores or loadings available in pRef@m which improve the fit between im and in, without requiring further blush or smile changes.
Reverse Mover

The estimated local change fields (corresponding to dxmn 855 in Figure 8) are "moved" back from the forecasted position m to the reference position Ref in the Reverse Mover 1060, using the return address change field from m to Ref, dam,Ref, from the Decoder Mover 870. These outputs, DImn@Ref, DAmn@Ref and DPmn@Ref, correspond to DAmn@Ref 909 in Figure 9 and DXmn@Ref in Figure 8.
Reverse Adder

Finally, DXmn@Ref is added to the original forecasting change fields DXRef,m [DIRef,m, DARef,m and DPRef,m] in the Reverse Adder 1070, to yield the desired estimated change fields which are applied to the reference model XRef to estimate input frame n, xn. These change fields of DXRef,n are DIRef,n, DARef,n and DPRef,n.
The Local ChangeFieldEstimator 1050 also yields residuals and predictions corresponding to en (894) and xn hat (892) in the various domains, as well as various other statistical warnings wn (896) in Figure 8.
Interpreter

Interpreter Overview
The main purpose of the Interpreter 720 is to extract, from the estimated change field and other data for the individual frames, stable model parameters for an entire sequence of data or portion of a sequence. The Interpreter 720, in conjunction with the Change Field Estimator 710, is used both for preliminary internal model improvement as well as for final finishing of the model. In the case of video coding, the Interpreter 720 converts change field information into spatial, temporal, color and other model parameters in the address, intensity and probabilistic domains. The Interpreter 720 and the Change Field Estimator 710 are repeatedly accessed under the control of the MultiPass Controller 620 for each individual frame n, for each sequence of frames and for repeated passes through the sequence of frames.
For a given frame n at a given stage in the encoding process, the Interpreter 720 takes as input the estimated change fields in the various domains, DXRef,n 730 (including uncertainty estimates), as well as additional warnings wn 750 from the ChangeField Estimator 710. The Interpreter also receives preliminary coded data for individual frames, xn hat (735), and residual error en (745) from the Change Field Estimator 710. The Interpreter 720 also receives existing models {XRef, USeq} 760, and may optionally receive a data base of Model Primitives 780 for model deepening, in addition to local model information 899, Local Change Field Estimates dxmn and the input frame information xn. The Interpreter 720 also receives and returns control signals and parameters 635 and 637 from and to the MultiPass Controller, and 755 and 756 to and from the ChangeField Estimator 710.
The Interpreter 720 processes these inputs and outputs an updated version of the model {XRef, USeq} 765. The changes in this model may be spatial extensions or redefinitions of the holon structure of the reference image model(s), widened sub-operand models, or new or updated values of the factor loadings XRef and sequence scores USeq. The Interpreter 720 also outputs scores in the various domains and sub-operands un (772) for each individual frame n, as well as a reconstructed frame xn hat (770) and residuals en (775). It should be noted that all of the Interpreter outputs are expressed as both a signal value and its associated uncertainty estimate.
The internal operational blocks of the Interpreter 720 are shown in detail in Figure 12. Referring now to Figure 12, the Interpreter 720 includes a Score Estimator 1202 which estimates the scores un (1204) of factors with known loadings for each holon and each sub-operand. The Interpreter 720 also estimates the matrix of nod scores corresponding to affine transformations, including scores for moving and scaling the entire frame due to camera pan and zoom motions. These scores are provided to the Residual Change Estimator 1210 which subtracts out the effect of these known factors from the Change Field input DXRef,n, to produce the residual or unmodelled portion EXn (1212). The residuals 1212 (or the full Change Field DXRef,n, depending on the embodiment) are then used by the Spatial Model Widener 1214 in order to attempt to extract additional model parameters by analyzing these change field data obtained from several frames in the same sequence. Since all of the change fields from the different frames in the subsequence have been moved back to the reference position as described above, spatio-temporal change structures that are common to many pixels and frames may now be extracted using factor analysis of these change field data. New factors, which are considered to be reliable as judged by their ability to describe unmodelled changes found in two or more frames, are used to stabilize the change field estimation for subsequent frames. In contrast, minor change patterns which affect only a small number of pixels and frames are not used for statistical stabilization, but rather are accumulated in memory in case they represent emerging change patterns that have not yet fully emerged but will become statistically significant as more frames are brought into the modelling process.
The Spatial Model Widener 1214 also handles additional tasks such as 3D sorting/structure estimation and assessment of transparency and shadow effects. The scores 1215 are also provided to the Temporal Model Updater 1206 and Spatial Model Updater 1208, where they are used for statistical refinement, simplification and optimisation of the models.
In the Interpreter 720, the input sequence xn is also provided to the Spatial Model Extender 1216 which carries out various segmentation operations used to extract new spatial segments from each individual frame n. The Spatial Model Extender 1216 also merges and splits image segments in order to provide more efficient holon structures. The input sequence is also provided to the Model Deepener 1218 which attempts to replace model parameters in various domains by equivalent model parameters, but in more efficient domains. This may, for example, include converting "soft" modelling factors such as smile factors into "hard" nod factors, which require less explicit information.
Detailed description of Interpreter operational blocks
The Score Estimator 1202 estimates the scores of each individual frame n, un, in the various domains (operands) and sub-operands for the various holons for use with factors having known loadings in XRef. Each score contains a value and associated estimation uncertainty. Robust statistical estimation is used in order to balance the statistical noise stabilization (minimization of erroneous score estimation due to noise in the loadings or input data) versus statistical robustness (minimizing erroneous score estimation due to outlier pixels, i.e., those pixels with innovation, i.e., change patterns not yet properly described using the available spatial model). Detection of outliers is described in H. Martens and T. Naes, Multivariate Calibration, pp. 267-272 (John Wiley & Sons, 1989), which is incorporated herein by reference. Statistical stabilization to minimize noise is achieved by combining the impact of a larger number of pixels during the score estimation. Statistical stabilization to minimize the effect of outlier pixels is achieved by reducing or eliminating the impact of the outlier pixels during the score estimation. In a preferred embodiment, the robust estimation technique is an iterative reweighted least squares optimization, both for the estimation of smile, blush and probabilistic scores of "soft models" with explicit loadings as well as for the nod score matrices of the affine transformations of solid objects.
Two different approaches to score estimation may be used. The first approach is a full iterative search in the score-parameter space to optimize the approximation of the input image xn. The second approach is a simpler projection of the estimated change fields DXRef,n onto the known factor loadings (including the explicit loadings in XRef and the implicit loadings associated with nod affine transformations). In addition, combinations of both methods may be used.
In the case of the itérative search in the score-parameter space, nonlinear itérative optimisation is used to 5 find the combinations of scores u„ in the different domains(operands), sub-operands, holons and factors that resuit inoptimal decoding conversion of the model XRef into estimatex^hat. The optimisation criterion is based on the lack offit différence (x„ - s^hat) , mainly in the intensity domain. 10 A different set of one or more functions may be used inorder to optimize the fit for individual holons or otherspatial subsegments. These function(s) indicate the lack offit due to different pixels by calculating, for example,absolute or squared différences. The different pixel con- 15 tributions are first weighed and then added according to thereliability and importance of each pixel. Thus, outlierpixels are assigned a lower weighting, while pixels thatcorrespond to visually or estimationally important lack offit residuals are assigned a higher weight. 20 The search in the score-parameter space may be a full global search of ail factor scores, or may insteadutilize a spécifie search strategy. In a preferred embodi-ment, the search strategy initially utlizes score valuespredicted from previous frames and itérations. In order to 25 control the computational resources required, the optimisa-tion may be performed for individual spatial subsegments(e.g., for individual holons), at different image resolu-tions (e.g., low resolution images first) or different timeresolutions, e.g., initially less than every frame, or for différent color charnel représentations (e.g , first forluminosity, ther for other coicr channsls) . it should benotée chat more emphasis should be pla.e-âd on estimatingma~ ;r factors with reliable Ioadings, &amp;nâ less emphasis on 5 mirer factors with less reliable Ioadings. This may becontrtlled by the Score Ridge pararaetef from the MultiPassControlier which drives unreliabie scortg toward zéro.
Score estimation by projection of the estimated change field DXRef,n on known loadings in XRef does not require any image decoding of the reference model. Instead, statistical projections (multivariate regressions) of the obtained change field DXRef,n (regressands) on known loadings in XRef (regressors) are used. The regression is carried out for all factors simultaneously within each domain's sub-operand and for each holon, using least squares multiple linear regression. If the weights of the different pixels are changed, e.g., for outlier pixels, or the regressor loadings become highly non-orthogonal, then a reduced rank regression method is preferably used. Otherwise, the statistical modelling becomes highly unstable, especially for intercorrelated factors with low weighted loading contributions. In a preferred embodiment, the regression is performed using standard biased partial least squares regression (PLSR) or principal component regression (PCR), as outlined in detail in H. Martens and T. Naes, Multivariate Calibration, pp 75-155 (John Wiley & Sons, 1989), which is incorporated herein by reference.
Other robust regression techniques, such as purely non-metric regressions or conventional ridge regressions utilizing a ridge parameter (H. Martens and T. Naes, Multivariate Calibration, pp 230-232 (John Wiley & Sons, 1989), which is incorporated herein by reference), may be used. The ridge parameter serves to stabilize the score estimation of minor factors. Ridging may also be used to stabilize the latent regressor variables in the PLSR or PCR regression. Alternatively, the scores may be biased towards zero by controlling the ScoreRidge parameter from the MultiPass Controller so that only major factors are used in the initial estimation process for the Change Field stabilization. The uncertainties of the scores may be calculated using standard sensitivity analysis or linear model theory, as discussed in H. Martens and T. Naes, Multivariate Calibration, pp. 168, 206 (John Wiley & Sons, 1989), which is incorporated herein by reference.
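A hedged sketch of this projection approach with ridge stabilization follows; the function name, the per-pixel weight vector and the simple diagonal ridge term are illustrative assumptions only, not the specification's exact formulation.

```python
import numpy as np

def project_scores(loadings, dx, pixel_weights, score_ridge=0.1):
    """Project a change field dx (M,) onto known factor loadings (M, F),
    biasing the scores of minor factors toward zero via a ridge term."""
    F = loadings.shape[1]
    W = pixel_weights[:, None]
    A = loadings.T @ (W * loadings) + score_ridge * np.eye(F)
    b = loadings.T @ (pixel_weights * dx)
    return np.linalg.solve(A, b)  # scores; larger score_ridge -> stronger shrinkage
```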
Residual Change Field Estimator
The Residual Change Field Estimator 1210 determines the remaining unmodelled residual EXRef,n by removing the effects of the various scores which were estimated in the Score Estimator 1202 from the respective change fields DXRef,n for the various sub-operands and holons. In the preferred embodiment, the effects of the factors (e.g., the sum of available loadings multiplied by the appropriate scores) are simply subtracted from the change fields. For example, in the case of red intensity:

ERRef,n = DRRef,n - (R(0)Ref*uR(0)n + R(1)Ref*uR(1)n + ...)
Optionally, the model parameters used in this residual construction may be quantized in order to make sure that the effects of quantization errors are fed back to the encoder for possible subsequent correction.
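The residual construction above amounts to a single subtraction of the factor reconstruction from the change field. A minimal sketch, with names assumed purely for illustration, is:

```python
import numpy as np

def residual_change_field(dr, loadings, scores):
    """ERRef,n = DRRef,n - (R(0)Ref*uR(0)n + R(1)Ref*uR(1)n + ...),
    i.e., the change field minus the sum of loadings times scores."""
    return dr - loadings @ scores  # dr: (M,), loadings: (M, F), scores: (F,)
```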
Spatial Model Widener
The Spatial Model Widener 1214 of the Interpreter accumulates the residual change fields EXRef,n for frame n along with the unmodelled residuals from previous frames. These residual change fields represent as yet unmodelled information for each holon and each operand (domain) and sub-operand. These residuals are weighted according to their uncertainties, and statistically processed in order to extract new factors. This factor extraction may preferably be accomplished by performing NIPALS analysis on the weighted pixel-frame matrix of unmodelled residuals, as described in, e.g., H. Martens and T. Naes, Multivariate Calibration, pp 97-116 and p. 163 (John Wiley & Sons, 1989), which is incorporated herein by reference, or on their frame-by-frame cross product matrix, see H. Martens and T. Naes, Multivariate Calibration, p. 100 (John Wiley & Sons, 1989), which is incorporated herein by reference. However, this iterative NIPALS method does not necessarily have to iterate to full convergence for each factor. Alternatively, the factor extraction from the weighted pixel-frame matrix of unmodelled residuals may be attained using singular value decomposition, Karhunen-Loeve transforms, or eigen analysis using Hotelling transforms, such as are outlined in detail in, e.g., R.C. Gonzalez and R.E. Woods, Digital Image Processing, pp 148-156 (Addison-Wesley 1992), which is incorporated herein by reference, and Carlo Tomasi and Takeo Kanade, "SHAPE AND MOTION WITHOUT DEPTH", IEEE CH2934-8/90, pp. 91-95, 1990, which is incorporated herein by reference. The significant change structures in the resulting accumulated residual matrix are extracted as new factors and included as part of the model [XRef, USeq]. Change structures which involve several pixels over several frames are deemed to be significant. The Spatial Model Widener portion of the Interpreter may be used for both local models 670, as well as more complete sequence or subsequence models 650.
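By way of illustration only, the following sketch extracts one NIPALS factor from a weighted pixel-frame residual matrix; the starting vector, the convergence test and all names are assumptions of the sketch. As the text notes, the iteration need not be run to full convergence.

```python
import numpy as np

def nipals_factor(E, max_iter=100, tol=1e-8):
    """Extract one factor from a weighted pixel-frame residual matrix
    E (M pixels x N frames): a spatial loading (M,) and frame scores (N,)."""
    scores = E[np.argmax((E ** 2).sum(axis=1)), :].copy()  # start: most energetic pixel row
    for _ in range(max_iter):
        loading = E @ scores / (scores @ scores)   # project frames onto scores
        loading /= np.linalg.norm(loading)
        new_scores = E.T @ loading                 # project pixels onto loading
        if np.linalg.norm(new_scores - scores) < tol:
            scores = new_scores
            break
        scores = new_scores
    return loading, scores

# After extraction, deflate and repeat for the next factor:
# E -= np.outer(loading, scores)
```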
In the case of real time encoding, the effect of the remaining unmodelled residuals from each individual frame may be scaled down as time passes, and removed from the accumulation of unmodelled residuals if they fall below a certain level. In this way, residuals remaining for a long time and not having contributed to the formation of any new factors are essentially removed from further consideration, since statistically, there is a very low probability that they will ever contribute to a new factor. In this embodiment, the Spatial Model Widener 1214 produces individual factors that may be added to the existing model. Subsequently, this new set of factors, i.e., model, may be optimized in the Temporal Model Updater 1206 and Spatial Model Updater 1208, under the control of the MultiPass Controller.
In an alternative embodiment, the existing model is analyzed together with the change fields in order to generate a new model. This new model preferably includes factors which incorporate the additional information from the newly introduced change fields. Essentially, the entire model [XRef, USeq] is re-computed as each new frame is introduced. This is preferably accomplished using loadings XRef and scores USeq, which are scaled so that the score matrix USeq is orthonormal (see H. Martens and T. Naes, Multivariate Calibration, p. 48 (John Wiley & Sons, 1989), which is incorporated herein by reference). The different factor loading vectors in XRef then have different sums of squares reflecting their relative significance. The new loadings [XRef](new) are then generated using factor analysis, e.g., singular value decomposition (svd), of the matrix consisting of [XRef(old), DXRef,n]. This is a simplified, one-block svd-based version of the two-block PLSR-based updating method described in H. Martens and T. Naes, Multivariate Calibration, pp. 162-163 (John Wiley & Sons, 1989), which is incorporated herein by reference. New scores corresponding to the new loadings are also obtained in this process.
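A minimal sketch of such a one-block svd re-computation follows, assuming the old loadings are already scaled so that the scores are orthonormal; the function name, truncation rule and return values are illustrative assumptions only.

```python
import numpy as np

def recompute_model(X_old, dx_new, n_factors):
    """One-block svd update: factor-analyze [XRef(old), DXRef,n].
    X_old: (M, F) scaled loadings; dx_new: (M,) new change field."""
    augmented = np.hstack([X_old, dx_new[:, None]])
    U, s, Vt = np.linalg.svd(augmented, full_matrices=False)
    X_new = U[:, :n_factors] * s[:n_factors]  # loadings keep their sums of squares
    # Vt rotates the old factors (and the new change field) into the new
    # basis; new frame scores follow from it together with the old scores.
    return X_new, Vt[:n_factors]
```

Three-dimensional depth estimation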
The Spatial Model Widener 1214 may also be used to estimate the approximate three dimensional depth structure z_n of the pixels in a scene forming part of a frame sequence. This type of estimation is important for modelling of objects moving in front of each other, as well as for modelling of horizontally or vertically rotating objects. The depth information z_n may also be of intrinsic interest by itself.
Depth modelling requires the depth to be estimated, at least approximately, for the pixels involved in an occlusion. It is preferable to represent this estimated information at the involved pixel positions in the reference image model.
Depth estimation may be performed using any of a number of different methods. In a preferred embodiment, topological sorting of pixels, based on how some pixels occlude other pixels in various frames, is used. For pixels where potential occlusions are detected (as indicated in the warnings w_n from the Local ChangeField Estimator), different depth hypotheses are tried for several consecutive frames.
For each frame, the ChangeField Estimator is repeatedly operated for the different depth hypotheses, and the resulting modelling success of the input frame intensity i_n using the different hypotheses is accumulated. The depth hypothesis that results in the most consistent and accurate representation of the intensity data i_n over the frames tested is accepted and used as the depth model information. Initially, this depth information may be used to establish the basic depth Z(0)Ref for those pixels where this is required. Subsequently in the encoding process for the same sequence, the same techniques may be used to widen the depth change factor model with new factors Z(f)Ref, f=1,2,..., for those pixels that show more complex occlusion patterns owing to their depth changing from one frame to another.
In an alternative embodiment, singular value decomposition of the address change fields DARef,n may be used to establish 3D depth information, as outlined in Carlo Tomasi and Takeo Kanade, "SHAPE AND MOTION WITHOUT DEPTH", IEEE CH2934-8/90, pp. 91-95, 1990, which is incorporated herein by reference.
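By way of illustration, a factorization in the spirit of the cited Tomasi-Kanade method can be sketched as follows; the measurement-matrix layout, the rank-3 truncation and all names are assumptions of this sketch rather than the patent's own procedure.

```python
import numpy as np

def factor_shape_motion(W):
    """W: (2F, P) horizontal/vertical positions of P tracked pixels over
    F frames. A rank-3 factorization separates motion from 3D structure."""
    W = W - W.mean(axis=1, keepdims=True)     # register each row to its centroid
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    motion = U[:, :3] * np.sqrt(s[:3])        # per-frame motion components
    shape = np.sqrt(s[:3])[:, None] * Vt[:3]  # 3D structure, including depth
    return motion, shape
```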
Iterative control for frame n
A special mode of operation for the Spatial Model Widener 1214 is used during iterative optimization for each frame n. When separate (competing) estimates of the local change fields da, di and dp are used, as described above in the preferred embodiment of the Local ChangeField Estimator 850, the Spatial Model Widener 1214 must formulate a joint compromise DXRef,n(joint) to be used simultaneously for all domains. In the preferred embodiment, information from only one of the domains is accepted into the joint change field DXRef,n(joint) during each iteration.
At the beginning of the iterative estimation of each frame, smile changes are accepted as the most probable changes. However, throughout the iterative estimation, care must be taken that the accepted smile fields be sufficiently smooth and do not give erroneous occlusions in the subsequent iteration(s). In general, change field information that fits the already established factor loadings in XRef (as determined in the Score Estimator 1202) is accepted in favor of unmodelled residuals EXRef,n (as determined in the Residual ChangeField Estimator 1210), which are only accepted as change field information towards the end of the iterative process for each frame. Thus, the change fields are modified according to the particular stage of encoding and the quality of the change fields of this iteration compared to those of previous iterations. In each iteration, the resulting accepted change field information is accumulated as the joint change field DXRef,n(joint).
During each iteration, the Interpreter 720 must convey this joint change field DXRef,n(joint) back to the ChangeField Estimator 710 for further refinement in the next iteration. This is accomplished by including the joint change field DXRef,n(joint) as one extra factor in XRef (with score always equal to 1). Thus, this extra factor accumulates incremental changes to the change field for frame n from each new iteration. At the end of the iterative process, this extra factor represents the accumulated joint change field, which can then be used for score and residual estimation, widening, deepening, updating and extending, as described above.
Model Updaters
The two updating modules, the Temporal Model Updater 1206 and the Spatial Model Updater 1208, serve to optimize the temporal and spatial model with respect to various criteria, depending on the application. In the case of real-time video coding, such as in video conference applications, the Temporal Model Updater 1206 computes the eigenvalue structure of the covariance matrix between the different factors' scores within each domain, as time passes. Variation phenomena no longer active (e.g., a person who has left the video conference room) are identified as dimensions corresponding to low eigenvalues in the inter-score covariance matrices, and are thus eliminated from the score model in the Temporal Model Updater 1206. The corresponding loading dimension is eliminated from the loadings in the Spatial Model Updater 1208. The resulting eigenvalue-eigenvector structure of the inter-score covariance matrix may also be used to optimize the quantization and transmission control for the temporal parameters of the other, still active factors.
During encoding of video data (both real-time and off-line), unreliable factor dimensions are likewise eliminated as the encoding proceeds repeatedly through the sequence, by factor rotation of the loadings and scores in the two Model Updaters 1206 and 1208, based on singular value decomposition of the inter-score covariance matrix or the inter-loading covariance matrix, and eliminating dimensions corresponding to low eigenvalues.
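The following sketch shows how dimensions with low eigenvalues in the inter-score covariance matrix could be identified and removed from both the scores and the corresponding loadings; the relative threshold and all names are illustrative assumptions.

```python
import numpy as np

def prune_inactive_factors(scores, loadings, rel_tol=1e-3):
    """scores: (N frames, F factors); loadings: (M pixels, F factors).
    Rotate into the eigenvector basis of the inter-score covariance
    and drop directions whose eigenvalues are relatively small."""
    C = np.cov(scores, rowvar=False)      # F x F inter-score covariance
    evals, evecs = np.linalg.eigh(C)      # eigenvalues in ascending order
    keep = evals > rel_tol * evals.max()  # variation phenomena still active
    R = evecs[:, keep]
    return scores @ R, loadings @ R       # rotated, reduced model
```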
The eigen-analysis of the factor scores in the Temporal Model Updater 1206 and of the factor loadings in the Spatial Model Updater 1208 corresponds to a type of meta-modelling, as will be discussed in more detail below. The Spatial Model Updater 1208 may check for spatial pixel cluster patterns in the loading spaces indicating a need for changes in the holon segmentation in the Spatial Model Extender 1216.
The Model Updaters 1206 and 1208 may also perform conventional factor analysis rotation, such as varimax rotation, to obtain a "simple structure" for the factor scores (in the case of the Temporal Model Updater 1206) or loadings (in the case of the Spatial Model Updater 1208), for improved compression, editing and memory usage. Factor analytic "simple structures" can be understood by way of the following example. First, assume that two types of change patterns, e.g., blush patterns "A" (blushing cheeks) and "B" (room lighting), have been modelled using two blush factors, but the blush factors have coincidentally combined the patterns in such a way that factor 1 models "A" and "B" and factor 2 models "A" and "-B". Factor rotation to a simple structure, in this case, means computing a new set of loadings by multiplying the two loadings with a 2x2 rotation matrix g so that after the matrix multiplication, only pattern "A" is represented in one factor and only pattern "B" is represented in the other factor. Corresponding new scores are obtained by multiplying the original scores with the inverse of matrix g. Alternatively, the original scores may be used. However, the new loadings must then be multiplied by the inverse of g.
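The two-factor example above can be made concrete in a few lines; the specific pattern vectors and the choice of g are assumptions for illustration only (the column/row conventions here require the transpose of the inverse on the score side so that the reconstruction is unchanged).

```python
import numpy as np

a = np.array([1.0, 0.0, 0.0, 0.0])   # blush pattern "A" (blushing cheeks)
b = np.array([0.0, 0.0, 0.0, 1.0])   # blush pattern "B" (room lighting)
L = np.column_stack([a + b, a - b])  # factor 1 models "A"+"B", factor 2 "A"+"-B"
g = 0.5 * np.array([[1.0, 1.0],
                    [1.0, -1.0]])
L_new = L @ g                        # columns are now pure "A" and pure "B"
U = np.random.randn(5, 2)            # original frame scores
U_new = U @ np.linalg.inv(g).T       # compensate so the reconstruction is unchanged
assert np.allclose(U_new @ L_new.T, U @ L.T)
```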
Yet another function of the Temporal Model Updater 1206 is to accumulate multidimensional histograms of "co-occurrence" of various model parameters, e.g., smile and blush factors. This histogram gives an accumulated count of how often various combinations of score values of the various domains occur simultaneously. If particular patterns of co-occurrence appear, this may indicate the need for deepening the model, e.g., by converting blush factor information into smile factor information.
Spatial Model Extender
The Spatial Model Extender 1216 organizes and reorganizes data into segments or holons. In the case of video coding, the segments are primarily spatial holons, and thus, the extender is referred to as a "Spatial" Model Extender. The Spatial Model Extender 1216 receives as input a set of holons, each represented by pixel loadings XRef, sequence frame scores USeq, change fields DXRef,n, and unmodelled change field residuals EXRef,n. The Spatial Model Extender 1216 also receives as input the abnormality warnings w_n from the ChangeField Estimator 710 and the actual input frame x_n, in addition to various input control parameters. The Spatial Model Extender 1216 processes these inputs and outputs an updated set of holons, each with pixel loadings XRef, sequence frame scores USeq, unmodelled residuals EXRef,n, and various output control parameters.
The Spatial Model Extender 1216 is activated by the Multipass Controller 620 whenever the accumulated signal from the warnings w_n output from the ChangeField Estimator indicates a significant amount of unmodelled spatial information in a new frame x_n. The segmentation of as yet unmodelled regions into new holons may be performed using the estimated address change fields DARef,n, e.g., as described in John Y.A. Wang and Edward H. Adelson, "LAYERED REPRESENTATION FOR IMAGE SEQUENCE CODING", IEEE ICASSP, Vol. 5, pp. 221-224, Minneapolis, Minnesota, 1993, which is incorporated herein by reference. This is particularly important in the areas where the incoming warnings w_n indicate the need for segmentation. The pixels in such areas are given particularly high weights in the search for segments with homogeneous movement patterns.
As an alternative, or even additional, method of segmentation, the segments may be determined using various factor loading structures in XRef, such as clusters of pixels in the factor loading vector spaces (f=1,2,...) as determined using standard cluster analysis in the factor loading spaces. Clusters with simple internal structures indicate pixels that change in related ways, and are thus possible candidates for segments. In addition, those pixels that are adjacent to each other in the address space ARef(0) are identified as stronger candidates for segmentation. In this manner, new segments are formed. On the other hand, existing segments are expanded or merged if the new segments lie adjacent to the existing ones and appear to have similar temporal movement behaviours. Existing segments that show heterogeneous movements along the edges may be contracted to a smaller spatial region, and segments that show heterogeneous movements in their spatial interiors may be split into independent holons.
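A toy illustration of such cluster analysis in the factor loading space is sketched below; the use of k-means, the appended (mildly weighted) pixel positions to encourage spatial adjacency, and all names are assumptions of this sketch.

```python
import numpy as np

def segment_by_loading_clusters(loadings, positions, n_clusters=4, n_iter=20):
    """Toy k-means in factor-loading space; pixel positions are appended
    with a small weight so spatially adjacent pixels tend to cluster
    together. loadings: (M, F); positions: (M, 2)."""
    feats = np.hstack([loadings, 0.1 * positions])
    rng = np.random.default_rng(0)
    centers = feats[rng.choice(len(feats), n_clusters, replace=False)]
    labels = np.zeros(len(feats), dtype=int)
    for _ in range(n_iter):
        dists = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = feats[labels == k].mean(axis=0)
    return labels  # candidate holon label per pixel
```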
One of the probabilistic properties of XRef is used to indicate a particularly high probability of segment shape changes or extensions along existing segment edges, i.e., there is a probability that seemingly new segments are in fact just extensions of existing segments, extended at the segment edges. Similarly, this probabilistic property may be used to classify into segments those new objects appearing at the image edge. In addition, this property may also be used to introduce semi-transparency at holon edges.
The Spatial Model Extender 1216, as operated by the MultiPass Controller 620, produces temporary holons or segments which are used in the initial stabilization or tentative modelling in the encoding process; these holons may be merged or deleted during the iterative encoding process, resulting in the final holons used to model each individual sequence at the end of the encoding process. As illustrated in Figure 3, since with the introduction of new holons the Extended Reference Image becomes larger than the individual input frames, the holons must be spatially stored in the Extended Reference Image Model XRef so as not to overlap with each other. Alternatively, storage methods such as the multilayer structure described in John Y.A. Wang and Edward H. Adelson, "LAYERED REPRESENTATION FOR IMAGE SEQUENCE CODING", IEEE ICASSP, Vol. 5, pp. 221-224, Minneapolis, Minnesota, 1993, which is incorporated herein by reference, may be used.
Model Deepener
The Model Deepener 1218 of the Interpreter 720 provides various functions that improve the modelling efficiency. One of these functions is to estimate transparency change fields as a sub-operand of the probabilistic domain DPRef,n. This may be performed using the technique described in Masahiko Shizawa and Kenji Mase, "A UNIFIED COMPUTATIONAL THEORY FOR MOTION TRANSPARENCY AND MOTION BOUNDARIES BASED ON EIGENENERGY ANALYSIS", IEEE CH2983-5/91, pp. 289-295, 1991, which is incorporated herein by reference.
Further, the Model Deepener 1218 is used to convert blush factors into smile factors whenever the amount and type of blush modelling of a holon indicates that it is inefficient to use blush modelling to model movements. This may be accomplished, for example, by reconstructing (decoding) the particular holon and then analyzing (encoding) it using an increased bias towards selection of a smile factor, rather than a blush factor. Similarly, smile factors may be converted to nod factors whenever the smile factor loadings indicate holons having spatial patterns consistent with affine transformations of solid objects, i.e., translations, rotations, scaling, or shearing. This may be accomplished by determining the address change fields DARef,n for the holons and then modelling them in terms of pseudo smile loadings corresponding to the various affine transformations.
DECODER
The present invention includes a decoder that reconstructs images from the spatial model parameter loadings XRef and the temporal model parameter scores U. In applications such as video compression, storage and transmission, the primary function of the decoder is to reproduce a certain input sequence of frames [x_n, n=1,2,...] using the scores [u_n, n=1,2,...] = USeq which were estimated during the encoding of the sequence [x_n, n=1,2,...] = XSeq. In other applications such as video games and virtual reality, the scores at different points in time [u_n, n=n1,n2,...] = u may be generated in real time, for example, by a user activated joystick.
In the present description, the predicted results for each frame n are denoted as the forecasted frame m. Thus, x_m is equivalent to x_n^hat.

A preferred embodiment of the Decoder 1300 is illustrated in block diagram form in Figure 13. This Decoder 1300 is substantially equivalent to the Internal Decoder 830 of the Change Estimator 710 (Figure 8) of the Encoder. However, the Decoder 1300 of Figure 13 includes some additional functional elements. These additional elements are discussed in detail in the attached appendix, DECODER-APPENDIX.
The resulting change fields DXRef,m 1358 are then passed to the Adder 1330 where they are added to the basic reference image X(0)Ref 1360, to produce X_m@Ref 1362, i.e., the forecasted values for frame m given in the reference position. This contains the changed values which the various holons in the reference image will assume upon output in the forecasted frame; however, this information is still given in the reference position.
These changed values given in the reference position, X_m@Ref 1362, are then "moved" in the Mover 1340 from the reference position to the m position using the movement parameters provided by the address change field DARef,m 1364. In the case of an internal decoder 830 of an encoder 600, the Mover 1340 may provide the return field da_m→Ref 1366, which may be used to move values back from the m position to the reference position.
The primary output of the Mover 1340 is the forecasted result x_m, to which error corrections ex_m 1368 may optionally be added. The resulting signal may then be filtered inside the post processor 1350, for example, to enhance edge effects, in order to yield the final result x_m 1370. The Adder 1330, Mover 1340 and post processor 1350 may employ standard decoding techniques, such as are outlined in George Wolberg, Digital Image Warping, Chapter 7 (IEEE Computer Society Press 1990), which is incorporated herein by reference.
The Decoder 1300 may also include additional functionality for controlling and handling the external communication, decryption, local storage and retrieval of model parameters which are repeatedly used, for communication to the output medium (such as a computer video display terminal or TV screen), and other functions that are readily understood by those skilled in the art.
It should be noted that the Mover operators 1040 (1340) and 1010 (870) may use different methods for combining two or more pieces of information which are placed at the same coordinate position. In the preferred implementation for video encoding and decoding, different information is combined using 3D occlusion, modified according to the transparency of the various overlaid media. For other applications, such as analysis of images of two-way electrophoresis gels for protein analysis, the contributions of different holons may simply be added.
ENCODER OPERATION - MULTIPASS CONTROLLER
Encoder System Control and Operation
The operation of the encoder/decoder system described in detail above will now be explained for an off-line video encoding application. First, the simplified encoder (alternative embodiment) and the full encoder (preferred embodiment) will be compared. Then, the simplified encoder will first be described, followed by a description of the full encoder.

A video encoding system must be able to detect sequences of sufficiently related image information, in order that they be modelled by a sequence model. For each such sequence, a model must be developed in such a way as to give adequate reconstruction quality, efficient compression, and editability. This must be accomplished within the physical constraints of the encoding system and the storage/transmission and decoding systems.
To achieve compact, parsimonious modelling of a sequence, the changes in the sequence should be ascribed to appropriate domain parameters, viz., movements should mainly be modelled by smile and nod factors, intensity changes should mainly be modelled by blush factors, and transparency effects mainly modelled by probabilistic factors. Effective modelling of various change types to the proper domain parameters requires statistical stabilization of the model parameter estimation, in addition to good separation of the various model domains. This in turn requires modelling over many frames.
The two encoder embodiments differ in how they accomplish this task.
The simplified encoder employs a simple sequential control and operation mechanism that results in identification of suitable frame sequences during parameter estimation. However, it does not attempt to optimize the simultaneous statistical modelling in the various domains. The full encoder, on the other hand, requires sequence identification as part of a separate preprocessing stage. This preprocessing stage also initializes various statistical weighting functions that are updated and used throughout the encoding process to optimize the noise and error robustness of the multi-domain modelling.
The simplified encoder repeatedly searches through a video frame sequence for related unmodelled change structures which may be modelled either as a new factor in the smile domain, the blush domain, or as a new spatial image segmentation. The optimal choice from among the potential smile, blush and segmentation changes is included in the sequence model, either as a widening of the smile or blush model, or as an extension or reorganization of the holons. The search process is then repeated until adequate modelling is attained.
The full encoder, in contrast, gradually widens, extends and deepens the model for a given sequence by passing through the sequence several times, each time attempting to model each frame in the three domains in such a way as to be maximally consistent with the corresponding modelling of the other frames.
In the simplified encoder, the estimation of unmodelled change fields for each frame is relatively simple, since each domain is modelled separately. Smile change fields DARef,n, n=n1,n2,..., are extracted and modelled in one pass, which may be shorter than the entire sequence of frames, and intensity change fields DIRef,n, n=n1,n2,..., are extracted and modelled in a second pass, which may also be shorter than the entire sequence of frames. Each pass is continued until the incremental modelling information obtained is outweighed by the modelling complexity. In the full encoder, the corresponding estimation of unmodelled change fields for each frame is more complicated, since the change fields for each frame are modelled jointly and therefore must be mutually compatible. This compatibility is obtained by an iterative development of change fields in the different domains for each frame.
Simplified Encoder Systems Control and Operation

For each frame, the simplified encoder uses the Score Estimator 1202 of the Interpreter 720 to estimate factor scores for the already established factors in XRef. The model may be temporarily widened with tentatively established new factors in the domain being modelled. Subsequently, the ChangeField Estimator 710 is used to generate either an estimate of unmodelled smile change fields DARef,n or unmodelled blush change fields DIRef,n. In each case, the tentative new factors are developed in the Spatial Model Widener 1214. The Interpreter 720 also checks for possible segmentation improvements in the Spatial Model Extender 1216. The MultiPass Controller 620, in conjunction with the Spatial Model Widener 1214, widens either the blush or the smile model with a new factor, or alternatively imposes spatial extension/reorganization in the Spatial Model Extender 1216. The MultiPass Controller 620 also initiates the beginning of a new sequence model whenever the change fields exhibit dramatic change. The process may then be repeated until satisfactory modelling is obtained.
Full Encoder Systems Control and Operation
Preprocessing
The input data are first converted from the input color space, which may for example be RGB, to a different format, such as YUV, in order to ensure better separation of luminosity and chrominance. This conversion may be carried out using known, standard techniques. In order to avoid confusion between the V color component in YUV and the V (vertical) coordinate in HVZ address space, this description is given in terms of RGB color space. The intensity of each converted frame n is referred to as i_n. Also, the input spatial coordinate system may be changed at various stages of the encoding and decoding processes. In particular, the spatial resolution may during preprocessing be changed by successively reducing the input format (vertical and horizontal pels, addresses) by a factor of 2 in both horizontal and vertical direction using standard techniques. This results in a so-called "Gaussian pyramid" representation of the same input images, but at different spatial resolutions. The smaller, low-resolution images may be used for preliminary parameter estimation, and the spatial resolution increased as the model becomes increasingly reliable and stable.
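By way of illustration only, a Gaussian pyramid of this kind can be built with a small separable blur followed by subsampling by 2 in each direction; the binomial kernel and all names are assumptions of this sketch.

```python
import numpy as np

def gaussian_pyramid(frame, levels=3):
    """Successively halve resolution with a blur-and-subsample step."""
    kernel = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    pyramid = [frame.astype(float)]
    for _ in range(levels):
        img = pyramid[-1]
        # separable 1D blur along rows, then columns
        img = np.apply_along_axis(lambda r: np.convolve(r, kernel, 'same'), 1, img)
        img = np.apply_along_axis(lambda c: np.convolve(c, kernel, 'same'), 0, img)
        pyramid.append(img[::2, ::2])  # drop every other pel horizontally and vertically
    return pyramid
```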
Continuing, preliminary modelabilities of the input data are first estimated. For each of the successive spatial resolutions, the intensity data i_n for each frame are analyzed in order to assess the probabilities of whether the intensity data for the individual pixels are going to be easy to model mathematically. This analysis involves determining different probabilities which are referred to as p_n, and discussed in detail below.
The preliminary modelability includes a determination of the two-dimensional recognizability of the input data, i.e., an estimate of how "edgy" the different regions of the image are. "Edgy" regions are easier to detect and follow with respect to motion than continuous regions. Specifically, an estimate of the degree of spatially recognizable structures p(1)_n is computed such that pixels representing clear 2D spatial contours and pixels at spatial corner structures are assigned values close to 1, while pixels in continuous areas are assigned values close to zero. Other pixels are assigned intermediate values between zero and one. This may be carried out using the specific procedure set forth in Carlo Tomasi and Takeo Kanade, "SHAPE AND MOTION WITHOUT DEPTH", IEEE CH2934-8/90, pp. 91-95, 1990, which is incorporated herein by reference, or in Rolf Volden and Jens G. Balchen, "DETERMINING 3-D OBJECT COORDINATES FROM A SEQUENCE OF 2-D IMAGES", Proc. of the Eighth International Symposium on Unmanned Untethered Submersible Technology, Sept. 1993, pp. 359-369, which is incorporated herein by reference.
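One common way to obtain such a corner/contour measure is the smaller eigenvalue of the local gradient structure tensor; the sketch below (crude 3x3 smoothing, [0, 1] rescaling, all names assumed) is an illustration in that spirit, not the patent's exact procedure.

```python
import numpy as np

def recognizability(frame, eps=1e-6):
    """p(1)_n sketch: near 1 at corners/clear contours, near 0 in flat
    regions, via the smaller eigenvalue of the gradient structure tensor."""
    gy, gx = np.gradient(frame.astype(float))
    def box(a):  # crude 3x3 box smoothing with edge padding
        p = np.pad(a, 1, mode='edge')
        return sum(p[i:i + a.shape[0], j:j + a.shape[1]]
                   for i in range(3) for j in range(3)) / 9.0
    jxx, jxy, jyy = box(gx * gx), box(gx * gy), box(gy * gy)
    # smaller eigenvalue of the symmetric 2x2 tensor [[jxx, jxy], [jxy, jyy]]
    lam_min = 0.5 * (jxx + jyy - np.sqrt((jxx - jyy) ** 2 + 4.0 * jxy ** 2))
    return lam_min / (lam_min.max() + eps)  # rescale into [0, 1]
```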
Similarly, the preliminary modelability includes a determination of the one-dimensional recognizability of the input data, i.e., an indication of the intensity variations along either a horizontal or vertical line through the image. This procedure involves formulating an estimate of the degree of horizontally or vertically clear contours. Pixels which are part of clear horizontal or vertical contours (as detected from, e.g., absolute values of the spatial derivatives in horizontal and vertical directions) are assigned a value p(2)_n=1, while those which are in continuous areas are assigned a value of zero, and other pixels are assigned values in between.
The preliminary modelability also includes determining aperture problems, by estimating the probability of aperture problems for each pixel as p(3)_n. Smooth local movements, i.e., spatial structures that appear to move linearly over the course of several consecutive frames, are assigned a maximum value of 1, while pixels where no such structures are found are assigned a value of 0. Similarly, structures which appear not to move at all over the course of several consecutive frames are treated in much the same manner. Collectively, this estimate of seemingly smooth movement or non-movement is referred to as p(4)_n. This property may also be used to estimate smooth intensity changes (or non-changes) over the course of several consecutive frames.
The probability of half pixels, which may arise at boundary edges and are unreliable because they are an average of different intensity spatial areas and as such do not represent true intensities, is computed and referred to as p(5)_n.
Together, the intensity, address and probabilistic data are symbolized by x_n, and include address properties, intensity properties, and the different probabilistic properties, such as p(1)_n through p(5)_n.
The preprocessing also includes detection of sequence length and the determination of subsequence limits. This is accomplished by analyzing the change property p(4)_n and the intensities i_n over the entire sequence and performing a multivariate analysis of the low-resolution intensities in order to extract a low number of principal components. This is followed by a cluster analysis of the factor scores, in order to group highly related frames into sequences to be modelled together. If a scene is too long or too heterogeneous, then it may be temporally split into shorter subsequences for simplified analysis using local models. Later in the encoding process, such subsequence models may be merged together into a full sequence model. In the initial splitting of sequences, it is important that the subsequences overlap by a few frames in either direction.
The thermal noise level in the subsequence is estimated by accumulating the overall random noise variance associated with each of the intensity channels and storing this value as the initial uncertainty variance s2i_n along with the actual values in i_n.
The preprocessing also produces an initial reference image XRef for each subsequence. Initially, one frame nRef in each subsequence is chosen as the starting point for the reference image. This frame is chosen on the basis of principal component analysis of the low resolution intensities, followed by a search in the factor score space for the most typical frame in the subsequence. Frames within the middle portion of the subsequence are preferred over frames at the start or end of the subsequence, since middle frames have neighboring frames in both directions of the subsequence.
Initialization
Initialization includes setting the initial values of the various control parameters. First, the ScoreRidge is set to a high initial value for all domains and all sub-operands. This parameter is used in the Score Estimator 1202 to stabilize the scores of small factors. (When singular value decomposition (principal component analysis, etc.) is used for extracting the factors, the size of individual factors is defined by their associated eigenvalue size; small factors have small eigenvalues. In the more general case, small factors are here defined as factors whose scores-times-loadings product matrix has a low sum of squared pixel values. The size of a factor is determined by how many pixels are involved and how strongly they are affected by the loadings of that factor, and by how many frames are affected and how strongly they are affected by the factor scores.)
SqueezeBlush is set to a high initial value for each frame in order to make sure that the estimation of smile fields is not mistakenly thwarted by preliminary blush fields that erroneously pick up movement effects. Similarly, SqueezeSmile is set to a high initial value for each frame in order to make sure that the proper estimation of the blush fields is not adversely affected by spurious inconsistencies in the preliminary smile fields. The use of SqueezeBlush and SqueezeSmile is an iterative process designed to achieve the proper balance between smile and blush change fields that optimally model the image changes. The initialization also includes initially establishing the full reference image XRef as one single holon and assuming very smooth movement fields.

The spatial model parameters XRef and temporal model parameters USeq are estimated by iteratively performing several passes through the subsequence. For each pass, starting at the initial reference frame, the frames are searched bidirectionally through the subsequence on either side of the frame nRef until a sufficiently satisfactory model is obtained.
For each frame, the statistical weights for each pixel, for each iteration and for each frame are determined. These statistical or reliability weights are an indication of the present modelability of the pixels in a given frame. The reliability weights wgts_x_n for each pixel for frame n, for the various sub-operands, are:

a_n: wgts_a_n = function of (p_n, s2a_n, w_n)
i_n: wgts_i_n = function of (p_n, s2i_n, w_n)
The reliability weights are proportional to the probabilistic properties p_n, and inversely proportional to both the variance s2a_n and the warnings w_n. Similarly, the reliability weights wgts_XRef for each pixel in the preliminary model(s) XRef, for each sub-operand, each factor and each holon, are:

ARef: wgts_ARef: an inversely proportional function of (s2ARef) for each factor in each sub-operand.
IRef: wgts_IRef: an inversely proportional function of (s2IRef) for each factor in each sub-operand.
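Read literally, these definitions suggest weights of the following general shape; the exact functional form is not specified, so the expression below is only an assumed example.

```python
import numpy as np

def reliability_weights(p, var, warnings, eps=1e-6):
    """wgts_x_n sketch: proportional to the probabilistic property p_n,
    inversely proportional to the uncertainty variance and the warnings
    w_n (all arrays holding one value per pixel)."""
    return p / ((eps + var) * (1.0 + warnings))
```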
In general, only those factors which are found to be applicable to a sufficient number of frames are retained. Multi-frame applicability of the extracted factors is tested by cross validation or leveraged correction, as described in H. Martens and T. Naes, Multivariate Calibration, pp 237-265 (John Wiley & Sons, 1989), which is incorporated herein by reference. Specifically, in the case of multi-pass or iterative estimation, this may include preventing the contribution due to the current frame n from being artificially validated as a multi-frame factor based on its own contribution to the model during an earlier pass.
The estimation of the change field DXRef,n and its subsequent contribution to the model for each frame relative to the subsequence or full sequence model to which it belongs is an iterative process, which will now be discussed in detail. For the first few frames encountered in the first pass through the subsequence, no reliable model has as yet been developed. Thus, the estimation of the change fields for the first few frames is more difficult and uncertain than for subsequent frames. As the model develops further, it increasingly assists in the stabilization and simplification of the estimation of the change fields for later frames. Therefore, during the initial pass through the first few frames, only those image regions that have a high degree of modelability are used. In addition, with respect to movement, strong assumptions about smooth change fields are used in order to restrict the possible degrees of freedom in estimating the change fields for the first few frames. Similarly, with respect to blush factors, strong assumptions about smoothness and multi-frame applicability are imposed in order to prevent unnecessary reliance on blush factors alone. As the encoding process iterates, these assumptions and requirements are relaxed so that true minor change patterns are properly modelled by change factors.
The encoding process for a sequence according to the preferred embodiment requires that the joint change fields DXRef,n be estimated for each frame, i.e., the different domain change fields DARef,n, DIRef,n and DPRef,n may be used simultaneously to give acceptable decoded results. As explained above, this requires an iterative modification of the different domain change fields for each frame. The weights wgts_x_n and wgts_XRef, defined for address and intensity, are used for optimization of the estimation of the local change field dx. During this iterative process, the Interpreter 720 is used primarily for accumulating change field information in DXRef,n(joint), as described above. The values in the already established sequence model XRef, USeq are not modified.
In the iterative incremental estimation of the change field information DXRef,n(joint), the model estimation keeps track of the results from the individual iterations, and backtracks out of sets of iterations in which the chosen increments fail to produce satisfactory modelling stability.
Once the joint change field DXRef,n(joint) has been estimated for a given frame, this is analyzed in the Interpreter 720 in order to optimize the sequence model XRef, based on DXRef,n(joint).
Developing the sequence model
The reliability weights for frame n and for the model are updated. Subsequently, scores u_n and residuals EXRef,n are estimated, and the change field information is accumulated for the possible widening of the reference model with new valid change factors. The reference model is extended using segmentation, improvement of 3D structures is attempted, and opportunities for model deepening are checked. All of these operations will be described in detail below.
When all the frames in a subsequence have been thus analysed so that a pass is completed, the weights and probabilistic properties are further updated to enhance the estimation during the next pass, with the obtained model being optionally rotated statistically to attain a simpler factor structure. In addition, the possibility of merging a given subsequence with other subsequences is investigated, and the need for further passes is checked. If no further passes are necessary, the parameter results obtained thus far may be run through the system one final time, with the parameters being quantized.
The control and operation of the full encoding process will now be described in more detail. First, the weights are modified according to the obtained uncertainty variances of the various sub-operands in DXRef,n. Pixels with high uncertainty in a given sub-operand change field are given lower weight for the subsequent statistical operations for this sub-operand. These weights are then used to optimize the multivariate statistical processes in the Interpreter 720.
The scores for the various domains and sub-operands are estimated for the different holons in the Score Estimator 1202. Also, the associated uncertainty covariances are estimated using conventional linear least squares methodology assuming, e.g., normally distributed noise in the residuals, and providing corrections for the intercorrelations between the various factor weighted loadings. The scores with small total signal effects are biased towards zero, using the ScoreRidge parameter, for statistical stabilization.
The residual change field EXRef,n is estimated, after subtraction of the effects of the known factors, in the Residual ChangeField Estimator 1210.
Next, the widening of the existing models (XRef, USeq) for the various domains, sub-operands and holons is attempted in the Spatial Model Widener 1214. This is performed using the estimated uncertainty variances and weights as part of the input, to make sure that data elements with high certainty dominate. The uncertainty variances of the loadings are estimated using standard linear least squares methodology assuming, e.g., normally distributed noise.
As part of the Widening process, the basic 3D structure Z(0) and associated change factors Z(f), f=1,2,..., are estimated according to the available data at that stage. In particular, warnings for unmodelled pixels in w_n suggest tentative 3D modelling.
Modification of the segmentation is accomplished by checking the various domain data, in particular the "unmodellability" warnings w_n and associated data in i_n, against similar unmodelled data for adjacent frames, in order to detect the accumulated development of unmodelled related areas. The unmodelled parts of the image are analyzed in the Spatial Model Extender 1216, thereby generating new holons or modifications of existing holons in SRef. During the course of segmentation, a higher probability of segmentation changes is expected along the edges of existing holons and along the edges of x_n and XRef than elsewhere. Holons that are spatially adjacent in the reference image and temporally correlated are merged. In contrast, holons that display inconsistent spatial and temporal model structure are split.
Shadows and transparent objects are modelled as part of the Widening process. This includes estimating the basic probabilistic transparency of the holons. In a preferred embodiment for the identification of moving shadows, groups of adjacent pixels which in frame n display a systematic, low-dimensional loss of light in the color space as compared to a different frame are designated as shadow holons. The shadow holons are defined as having dark color intensity and being semi-transparent.
Areas in the reference image with no clear factor structure, i.e., many low-energy factors instead of a few high-energy factors in the A or I domains, are analyzed for spatiotemporal structures. These areas are marked for modelling with special modelling techniques, such as modelling of quasi-random systems such as running water. This part of the encoder may require some human intervention in terms of the selection of the particular special technique. The effect of such special areas is minimized in subsequent parameter estimations.
The encoding operations described may be used with more complex local change field estimates dx. In the preferred embodiment, for each pixel in each sub-operand of the forecasted frame m, only one change value (with its associated uncertainty) is estimated and output by the Local ChangeField Estimator 1050. In an alternative embodiment, there may be multiple alternative change values (each with its associated uncertainty) estimated by the Local ChangeField Estimator 1050 for each domain or sub-operand. For example, two or more alternative potentially acceptable horizontal, vertical and depth movements of groups of pixels may be presented as part of da 855 by the Local ChangeField Estimator 850. Each of these alternatives is then moved back to the reference position as part of DXRef,n 890. Subsequently, the Interpreter attempts to model the different combinations of alternatives, and chooses the one that produces the best result. A similar flexible alternative approach to local modelling is to let the Local ChangeField Estimator 850 output only one value for each pixel for each sub-operand, as in the preferred embodiment, but instead to replace the uncertainty (e.g., uncertainty variance s2dx) by local statistical covariance models that describe the most probable combination of change alternatives. These covariance models may then be accumulated and used by the Interpreter to find the most acceptable combination of model widening, extension and deepening.

II. Update models
After all the frames of the present subsequence have been analyzed during a particular pass and the system has arrived at a stable model of a sequence, the model is updated in the Temporal and Spatial Model Updaters 1206 and 1208, respectively, in the Interpreter 720, thus allowing even more compact and easily compressible/editable factor structures.

III. Merging subsequences
In the Multipass Controller 620, an attempt is made to merge the present subsequence with another subsequence, according to meta-modelling, or the technique given in appendix MERGE_SUBSEQUENCES. This converts the local subsequence models into a model which is representative for more frames of the sequence than the individual sub-sequences.

IV. Convergence control
At the end of each pass, the Multipass Controller 620 checks for convergence. If convergence has not been reached, more passes are required. Accordingly, the MultiPass Controller 620 modifies the control parameters and initiates the next pass. The MultiPass Controller also keeps track of the nature and consequences of the various model developments in the various passes, and may back-track if certain model development choices appear to provide unsatisfactory results.

V. Final model optimization
Depending on the particular application, quantization errors due to parameter compression are introduced into the estimation of model parameters. The modelling of the sequence is again repeated once more in order to allow subsequent parameters the opportunity to correct for the quantization errors introduced by prior parameters. Finally, the parameters in XRef and USeq and the error correction residuals EXRef are compressed and ready for storage and/or transmission to be used by a decoder.
The internal model data may be stored using more precision than the input data. For example, in video coding, by modelling accumulated information from several input frames of related, but moving, objects, the final internal model XRef may have higher spatial resolution than the individual input frames. On the other hand, the internal model may be stored using completely different resolution than the input or output data, e.g., as a compact subset of irregularly spaced key picture elements chosen by the Model Deepener from among the full set of available pixels, so that good output image quality may be obtained by interpolating between the pixels in the Mover portion of the Decoder. The present invention may also output decoded results in a different representation than that of the input. For example, using interpolation and extrapolation of the temporal and spatial parameters, along with a change of the color space, the system may convert between NTSC and PAL video formats.
The IDLE modelling of the present invention may be used to sort the order of input or output data elements. This type of sorting may be applied so that the rows of individual input or output frames are changed relative to their common order, as part of a video encryption scheme.
Deleterious effects due to missing or particularly noisy data elements in the input data may be handled by the present system, since the modelling contribution of each individual input data element may be weighted relative to that of the other data elements, with the individual weights being estimated by the encoder system itself.
The preferred embodiment of the present invention uses various two-way bi-linear factor models, each consisting of a sum (hence the term "linear") of factor contributions, each factor being defined as the product of two types of parameters, a score and a loading (hence the term "bi-linear"). These parameters describe, e.g., temporal and spatial change information, respectively. This type of modelling may be generalized or extended. One such generalization is the use of higher-way models, such as a tri-linear model where each factor contribution is the product of three types of parameters, instead of just two. Alternatively, each of the bi-linear factors may be further modelled by its own bi-linear model.
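In matrix terms, such a two-way bi-linear model reconstructs data as a sum of score-times-loading products; a minimal sketch with illustrative names is:

```python
import numpy as np

def bilinear_reconstruct(scores, loadings):
    """X_hat[n, m] = sum_f scores[n, f] * loadings[m, f]: a sum ("linear")
    of factors, each the product of a score and a loading ("bi-linear")."""
    return scores @ loadings.T  # (N frames, F) @ (F, M pixels) -> (N, M)
```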
META MODELLING
Single-sequence meta-modelling

The IDLE model parameters obtained according to the system and method described above already have redundancies within the individual sub-operands removed. However, the model parameters may still have remaining redundancies across domains and sub-operands. For instance, the spatial pattern of how an object changes color intensity may resemble the spatial pattern of how that object also moves. Thus, there is spatial correlation between some color and movement loadings in XRef. Similarly, the temporal patterns of how one object changes color over time may resemble how that object or some other object moves over time. In this latter case, there is temporal correlation between some color and movement scores in USeq. Meta-modelling is essentially the same as IDLE modelling, except that the input is the set of model parameters rather than a set of input frames.

Spatial meta-modelling
Spatial meta-modelling is essentially the same as IDLE modelling; however, the inputs to the model are now the individual loadings determined as part of a first IDLE model. For each holon of the initial model XRef, we may collect all the factor loadings of all colors, e.g., in the case of RGB representations: red loadings R(f)Ref, f=0,1,2,..., green loadings G(f)Ref, f=0,1,2,..., and blue loadings B(f)Ref, f=0,1,2,..., totalling F factors, into an equivalent single meta-sequence consisting of F intensity "frames," each frame being an intensity loading having the same size as the holon in the extended reference frame. When each of the loadings is strung out as a line, as in the Spatial Widener in the Interpreter, the color intensity loadings form an FxM matrix, with a total of F intensity loadings each having M pixels. A singular value decomposition (svd) of this matrix generates meta-factors with meta-loadings for each of the M pixels and meta-scores for each of the F original factors. The svd yields a perfect reconstruction of the original loadings if the number of meta-factors equals the smaller of M or F. However, if there are significant inter-color spatial correlations in the original loadings, these will be accumulated in the meta-factors, resulting in fewer than the smaller of M or F factors necessary for proper reconstruction. The meta-scores indicate how the F original color factor loadings are related to each other, and the meta-loadings indicate how these interrelations are spatially distributed over the M pixels.
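A minimal sketch of this FxM svd follows; the function name, truncation parameter and return values are assumptions for illustration.

```python
import numpy as np

def spatial_meta_model(loading_matrix, n_meta):
    """loading_matrix: (F, M), the F strung-out intensity loadings of a
    holon. Returns meta-scores (F, n_meta) relating the original factors
    to each other, and meta-loadings (n_meta, M) over the M pixels."""
    U, s, Vt = np.linalg.svd(loading_matrix, full_matrices=False)
    meta_scores = U[:, :n_meta] * s[:n_meta]
    meta_loadings = Vt[:n_meta]
    return meta_scores, meta_loadings  # meta_scores @ meta_loadings approximates the input
```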
Similarly, if there are spatial intercorrelations between how one holon moves in the three coordinate directions, spatial meta-modelling of the smile loadings in the horizontal, vertical and depth directions will reveal these intercorrelations. Likewise, if there are spatial intercorrelations between how one holon changes with respect to two or more probabilistic properties, these probabilistic redundancies can be consolidated using spatial meta-modelling of the loadings of the various probabilistic properties.

Finally, the spatial meta-modelling may instead be performed on the color intensity, movement and probabilistic change loadings simultaneously, for each holon or for groups of holons. Again, the spatial meta-loadings represent the spatial correlation redundancies within the original IDLE model, and the spatial meta-scores quantify how the original IDLE factor loadings are related to each other with respect to spatial correlation. As in standard principal component analysis, if the original input loading matrix is standardized, the distribution of eigenvalues from the svd indicates the degree of intercorrelation found; see H. Martens and T. Naes, Multivariate Calibration, Chapter 3 (John Wiley & Sons, 1989), which is incorporated herein by reference.

Such direct svd on spatial loadings may be considered the equivalent of spatial blush modelling at the meta level. Similarly, the spatial meta-modelling using only meta-blush factors may be extended to full IDLE modelling, with meta-reference, meta-blush, meta-smile and meta-probabilistic models. One of the original loadings may be used as a meta-reference. The spatial meta-smile factors then define how regions in the different original loadings need to be moved in order to optimize their spatial redundancy. The meta-holons need not be the same as the original holons. Spatial meta-holons may be defined as either portions of the original holons or groups of the original holons, having regions with similar systematic spatial inter-loading correlation patterns. Other probabilistic spatial meta-sub-operands, such as spatial meta-transparency, allow blending of the different spatial meta-holons.
Temporal meta-modelling
Temporal meta-modelling is essentially the same as IDLE modelling; however, the input to the model is now the scores determined as part of a first IDLE model. In much the same manner as the meta-modelling of the original spatial change factor loadings in XRef, an IDLE meta-modelling may be applied to the sequence scores in USeq. The temporal meta-analysis may be performed on some or all of the sub-operand factors for some or all of the holons over some or all of the sequence frames.
The temporal meta-factor loadings thus indicate how the different frames n=1,2,...,N in the original video sequence relate to each other, and the temporal meta-factor scores f=1,2,...,F (for whichever suboperands and holons are being meta-analyzed together) indicate how the scores of the different factors in the original IDLE model relate to each other. Simple SVD on the NxF matrix of scores then models whatever temporal redundancies existed between the factors of the original IDLE model.
Such simple SVD of the factor scores corresponds to temporal meta-blush modelling. Full temporal IDLE meta-modelling allows a reference which is a function of time, rather than a function of space as is the case with standard IDLE modelling. In this situation, meta-holons represent event(s) or action(s) over time, meta-smile factors represent a time shift of the event(s) or action(s), and meta-blush factors represent the extent of the event(s) or action(s). The meta-reference may be chosen to be one of the original factor score series through the video sequence.
The temporal meta-smile factors can therefore be used to model systematic, yet complicated, temporal deviations away from the meta-reference pattern for the other change patterns represented by the original IDLE model. For instance, if the movements of one object (e.g., a trailing car) in the original sequence followed in time the movements and color changes of another object (e.g., brake lights of a lead car), but exhibited varying, systematic delays (e.g., due to varying acceleration patterns), this would give rise to temporal meta-smile factors. The loadings of the temporal meta-smile factors indicate how the different frames in the original input sequence relate to each other, and the temporal meta-smile scores indicate how the different factors in the original IDLE model relate to each other.
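The delay in this example can be made concrete with a small sketch. The following Python/NumPy fragment is a hedged stand-in, not the specification's estimation procedure: it uses simple cross-correlation to expose a fixed lag between the score series of two original factors, which is the kind of temporal relationship a temporal meta-smile factor would absorb.

    import numpy as np

    # Hypothetical score series over N frames: the trailing car's factor
    # repeats the lead car's factor five frames later, plus noise.
    n, true_lag = 200, 5
    lead = np.sin(np.linspace(0, 8 * np.pi, n))
    trail = np.roll(lead, true_lag) + 0.05 * np.random.randn(n)

    # Cross-correlate the centered series and pick the best-aligning lag.
    xcorr = np.correlate(trail - trail.mean(), lead - lead.mean(), mode='full')
    lag = int(np.argmax(xcorr)) - (n - 1)
    print("estimated delay:", lag, "frames")    # approximately 5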
The temporal meta-holons generally correspond to discrete temporal events that are best modelled separately from each other. Meta-transparency factors may then be used to smoothly combine different temporal holons. The model parameters of the meta-modelling processes described above may in turn themselves be meta-modelled.
When meta-modelling is used in the Encoder ("meta-encoding"), the Decoder System may have corresponding inverse meta-modelling ("meta-decoding").
Multi-sequence meta-modelling
The single-sequence meta-modelling described above may be further applied to multi-sequence meta-modelling. One primary application of multi-sequence meta-modelling is video coding, where it is used to relate IDLE models from different, but possibly related, video sequences. One way to merge two or more related IDLE models is to meta-model their loadings or scores directly as described above. Such direct meta-modelling of spatial structures is useful if the extended reference images are the same or very similar. However, the direct spatial meta-modelling is difficult to accomplish if the sequences have differently sized extended reference images. Furthermore, although physically achievable, the result is rather meaningless if the extended reference image sizes are the same, but the holons are different.
The direct temporal meta-modelling is also useful if the sequences are of the same length and reflect related events, such as the leading/trailing car example discussed above. Meta-modelling is difficult to perform if the sequences cannot be separated into sub-sequences of the same length, and becomes rather meaningless if the sequences do not reflect related events.
Indirect multi-sequence meta-modelling

Indirect multi-sequence meta-modelling is the use of two or more stages of meta-modelling: one stage for making two or more model parameter sets compatible, and a second stage of meta-modelling of the resulting compatible sets. Indirect multi-sequence meta-modelling is more flexible than the direct meta-modelling described above, in that it allows a single model to model a larger class of phenomena.
In the preliminary phase of spatial meta-modelling, the extended reference images and the associated factor loadings of one or more sub-sequences are used to establish a new extended reference image, e.g., by simple IDLE modelling. An alternative method of linking together two spatial sub-sequence models in order to form a new extended reference image is described in further detail in the Appendix MERGE_SUBSEQUENCES. This latter approach is applicable if the sub-sequences overlap each other by at least one frame.
Preliminary temporal meta-modelling achieves temporal compatibility of one or more temporal reference sub-sequences, and associated factor scores, with the temporal reference sub-sequence of another sub-sequence. This may be accomplished using a simple IDLE model to model the temporal domain.
Once compatibility has been achieved in the spatial and/or temporal domains, the different sub-sequence models may then be jointly meta-modelled as if they belonged to a single sub-sequence.
Combining of models using meta-modelling
The scores and loadings from one model may be combined with the loadings and scores from other models. Alternatively, the scores or loadings of one model may be replaced with other scores or loadings from an alternate source, e.g., a real-time joystick input, and be combined using meta-modelling. Lip synchronization between sound and image data in video dubbing is one example of combining models using meta-modelling. Specifically, smile scores may be estimated from an already established IDLE image mouth movement model. These scores may then be matched to a corresponding time series representing the sounds produced by the talking mouth. Lip synch may then be accomplished using meta-modelling of the image scores from the already established model and the sound time series loadings to provide optimal covariation of the image data with the sound time series.
Another application of combining models using meta-modelling of IDLE parameters is the modelling of covariations between the IDLE parameters of an already established model and external data. For example, if IDLE modelling has been used to model a large set of related medical images in a database, the IDLE scores for selected images may be related to the specific medication and medical history for each of the subjects of the corresponding images. One method for performing this covariation analysis is Partial Least Squares Regression 2 ("PLS2"), as described in H. Martens and T. Naes, Multivariate Calibration, pp. 146-163 (John Wiley & Sons, 1989), which is incorporated herein by reference.
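A hedged sketch of such a covariation analysis follows, using the PLS implementation from scikit-learn as a stand-in for the PLS2 algorithm of the cited reference; the arrays are hypothetical placeholders, not data from the specification.

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    # Hypothetical data: IDLE scores for 100 images (8 factors each) and
    # two external variables per subject (e.g., dose, months of history).
    idle_scores = np.random.rand(100, 8)
    external = np.random.rand(100, 2)

    # PLS2 relates the two multivariate blocks through a few latent variables.
    pls = PLSRegression(n_components=3)
    pls.fit(idle_scores, external)
    predicted = pls.predict(idle_scores)    # external data predicted from scores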
Joint vs. separate movement modelling for the different image input channels
The typical input for a color video sequence has six input quantities: 3 implicit position dimensions (vertical, horizontal and depth) and 3 explicit intensities (e.g., R, G, B). In the preferred embodiment of the basic IDLE System, it is assumed that the three intensity channels represent input from the same camera and hence information relating to the same objects. Thus, the same segmentation and movements (S and opacity, smile and nod) are assumed for all three color or intensity channels. The color channels are only separated in the blush modelling. Further model redundancy is then eliminated by joint multivariate modelling of the various loadings as described above.
Alternatively, the basic IDLE System may be modified to have stronger connectivity between input quantities, i.e., to model blush information in the different color channels simultaneously, by requiring each blush factor to have one common score for each frame, but different loadings for each color channel. This gives preference to intensity changes with the same temporal dynamics in all color channels for a holon or a group of holons, and could for instance be used in order to stabilize the estimation of the factors, as well as for editing and compression.
Instead, the basic IDLE System may be modified to have weaker connectivity between input quantities, where movement is modeled more or less independently for each color channel separately. This could be computationally advantageous and could give more flexibility in cases where the different channels in fact represent different spatial information.
One example of independent movement modelling is the case of multi-sensor geographical input images from a set of surveillance satellites equipped with different sensors. Based on one or more repeated recordings of the same geographical area taken at different times from different positions, and possibly exhibiting different optical aberrations, different times of recording and different resolutions, the IDLE System could be used for effective normalisation, compression and interpretation of the somewhat incongruent input images. The different sensor channels may exhibit quite different sensitivities to different spatial structures and phenomena. For example, radar and magnetometric imaging sensors may be sensitive to land and ocean surface height changes, whereas some photon-based imaging sensors, e.g., UV, visible and infrared cameras, may have varying sensitivities to various long-term climatic changes and vegetation changes, as well as short-term weather conditions. In this situation, the IDLE System may require separate movement and blush modelling for the independently observed channels.
Another example of this type of System is input data obtained from several medical imaging devices (MRI, PET, CT) repeatedly scanning a given subject over a period of time in order to monitor cancer growth, blood vessel changes or other time-varying phenomena. Since each device requires separate measurements, the subject will be positioned slightly differently for each different device and for each scan over the course of the repeated measurements. The movement of biological tissue typically does not follow affine transformations. Thus, IDLE smile factors may be a more flexible, yet sufficiently restrictive, way of representing body movements and allow the required normalization. Each imaging device could then have its own subset of smile factors from its extended reference position to the results for each individual set of scans from the various imaging devices. With the resulting normalization, blush factors and local smile factors that give early warning of slowly developing tissue changes may be detected. This is particularly effective if the extended reference position is normalized, e.g., by meta-modelling, for the different imaging devices for maximum spatial congruence. In this way, the joint signal from all the channels of the different imaging devices may be used to stabilize the modelling against measurement noise, e.g., by requiring that the blush factor scores for all channels be identical and that only the loadings be different.

Generalisations from analysis of two-dimensional inputs (images)
The IDLE modelling System described above may be used for input records of a different format than conventional two-dimensional video images. For instance, it may be used for one-dimensional data, such as a time series of lines from a line camera, or as individual columns in a still image.
The IDLE system may in the latter case be used as part of a still image compression System. In this type of application, the input information to the still image encoder is lines or columns of pels instead of two-dimensional frame data. Each input record may represent a vertical column in the two-dimensional image. Thus, the still image IDLE loading parameters are column-shaped instead of two-dimensional images. The time dimension of the video sequence (frames n=1,2,...) is replaced in this case by the horizontal pel index (column number) in the image.
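A minimal sketch of this record layout, assuming NumPy and a hypothetical grayscale image, is shown below; each column becomes one record and a plain SVD stands in for the bilinear factor estimation.

    import numpy as np

    # Hypothetical grayscale still image: rows x columns.
    image = np.random.rand(288, 352)

    # Each vertical column is one input record; the column number plays the
    # role that the frame number n plays for a video sequence.
    records = image.T                        # (n_records, samples_per_record)

    # A bilinear model of the records: one score vector per column and
    # F column-shaped loadings, the still-image analogue of IDLE factors.
    F = 4
    U, s, Vt = np.linalg.svd(records - records.mean(axis=0), full_matrices=False)
    scores, loadings = U[:, :F] * s[:F], Vt[:F]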
Simultaneous modeling for different input dimensions
If the input to the still-image IDLE codec is an RGB still image, then the three color channels (or a transform of them like YUV) may be coded separately or jointly, as discussed above for the video IDLE codec. Likewise, if the input to the still-image IDLE codec is a set of spatial parameters of the extended image model from a video IDLE codec, the different input dimensions (blush factors, smile factors, probabilistic factors) may be coded separately or jointly.
The present invention, which has been described above in the context of a video compression application, may be applied to any of a number of information processing and/or acquisition applications. For example, in the case of the processing of image sequences or video sequences for modelling or editing a video sequence (a set of related images) in black/white or color, the modelling is carried out with respect to IDLE parameters in such a way as to optimize the editing usefulness of the model parameters. The model parameters are possibly in turn related to established parameter sets, and other known editing model elements are forced into the model. Groups of parameters are related to each other in hierarchical fashion. The sequence is edited by changing temporal and/or spatial parameters. Sets of related video sequences are modelled jointly by multi-sequence meta-modelling, i.e., each related sequence is mapped onto a 'Reference sequence' by a special IDLE meta-model.
The present invention may also be applied to compression for storage or transmission. In this application, a video sequence is modelled by IDLE encoding, and the resulting model parameters are compressed. Different compression and representation strategies may be used depending on the bandwidth and storage capacity of the decoding System. Temporal sorting of the change factors, and pyramidal representation and transmission of the spatial parameters, may be used to increase the System's robustness in the face of transmission bandwidth limitations.
Similarly, the present invention may be applied to the colorization of black/white movies. In this case, the black/white movie sequences are modelled by IDLE encoding. The spatial holons in IRef are colored manually or automatically, and these colors are automatically distributed throughout the sequence. Sets of related sequences may be identified for consistent coloring.
In addition, the present invention may be used in simulators, virtual reality, games and other related applications. The relevant image sequences are recorded and compressed. When decoding, a few chosen scores may be controlled by the user, instead of using the recorded scores. Similarly, other scores may be varied according to the user-controlled scores. For example, in the case of a traffic simulator: record sequences of the interior of a car and of the road and the terrain; identify those scores, probably nod scores, that correspond directly to how the car moves; determine those scores that change indirectly based on those nod scores, such as smile/blush factors for illumination, shadows, perspective etc.; and set up a mathematical model that defines how the car reacts to certain movements of the control inputs, such as the steering wheel, accelerator pedal, brake pedal etc. The user can then sit in a simulated car interior, with a display in front and perhaps also on the sides. The simulated controllers are then connected to the "direct" factors, which in turn may be used to control the "indirect" factors. The resulting images will give a very naturalistic effect.
The present invention also has application in real-time Systems such as video telephone, television, and HDTV. Extreme compression ratios for very long sequences may be attained, although there may be bursts of spatial information at the onset of new sequences. This application also includes real-time encoding & decoding. Depending on the computational power available, different degrees of IDLE algorithm complexity may be implemented. For instance, information in the spatial domain may be represented by a standard Gaussian Pyramid (ref.), with the IDLE encoder algorithm operating on variable image size depending on the particular application's capacity and needs. The encoder Interpreter parts for widening, extending and deepening do not have to be fully real-time for each frame. The complexity of the scenes and size of image then defines the compression ratios and coding qualities which may be attained.
The present invention may also be used in remote camera surveillance. By employing a remote real-time encoder at the source of the image information, both interpretation and transmission of the camera data is simplified. The general blush factors model normal systematic variations such as various normal illumination changes, while general smile factors and nod factors correct for normal movements (e.g., moving branches of a tree). The automatic outlier detection and spatial model extender detect systematic redundancies in the unmodelled residuals and generate new holons, which in turn may be interpreted by searching in a database of objects before automatic error warnings are issued. Each object in the database may have its own smile, blush and probability factor loadings and/or movement model. The compressed parameters may be stored or transmitted over narrow bandwidth Systems, e.g., twisted-pair copper telephone wire transmission of TV camera output from security cameras in banks etc., or over extremely narrow bandwidth Systems, such as are found in deep water or outer space transmission.
Images from technical cameras, i.e., images not intended for direct human visualization, may also be modeled/compressed using the IDLE technique. The more 'color' channels, the more effective the meta-modelling compression of the spatial IDLE models. Examples of this application include multi-wavelength channel camera Systems used to monitor biological processes in the Near Infrared (NIR) or Ultra-Violet/Visible wavelength ranges (e.g., for recording fluorescence).
The IDLE System may also be used in conjunction with multichannel satellites and/or aerial photography. Repeated imaging of the same geographical area under different circumstances and at different times may be modelled by IDLE encoding. Such parameterization allows effective compression for storage and transmission. It also provides effective interpretation tools indicating the systematic intensity variations and movements, and how they change over time. If the same geographical area is imaged from slightly different positions or under different measuring conditions, then an extra IDLE preprocessing model may be used for improved alignment, allowing the geographical area to differ quite significantly (e.g., more or less daylight) and yet allowing accurate identification.
The IDLE approach of the present invention may also be utilized in cross-domain coordination or lip synch applications for movie production and in sound dubbing. For multivariate calibration, the temporal parameter scores from an IDLE video model of the mouth region of talking persons are related to the temporal parameters for a speech sound model (e.g., a subband or a CELP codec, or an IDLE sound codec), e.g., by PLS regression. This regression modelling may be based on data from a set of movie sequences of people speaking with various known image/sound synchronizations, thus modelling the local lip synch delay for optimizing the lip-sound synchronization. For each new sequence with lip synch problems, the same image and sound model score parameters are estimated. Once estimated, this local lip synch delay is corrected or compensated for by modifying the temporal IDLE parameters and/or sound parameters.
The IDLE principle may also be applied to database compression and/or searching. There are many databases in which the records are related to each other, but these relationships are somewhat complicated and difficult to express by conventional modelling. Examples of this type of application include police photographs of human faces ("mugshots"), various medical images, e.g., MRI body scans, photographs of biological specimens, photographs of cars etc. In such cases, the content of the database can be analyzed and stored utilizing IDLE model parameters. The IDLE representation of related, but complicated, information in a database offers several advantages, viz., high compression, improved searchability and improved flexibility with respect to the representation of the individual records in the database. The compression which may be achieved depends on how many records can be modelled and how simple the IDLE model which is used is, i.e., on the size and complexity of the database content.
The improved searchability (and interpretability) stems from the fact that the database search in the case of IDLE representation may be performed using the low-dimensional set of parameters corresponding to factor scores (e.g., a low number of nod, smile and blush scores), as opposed to the large amount of original input data (e.g., 200,000 pixels per image). Compression techniques using fractals or DCT do not yield similar searchable parameters. The few IDLE score variables may in turn be related statistically to external variables in the database, providing the capability to search for larger, general patterns, e.g., in the case of medical images and medical treatments. The improved flexibility due to the representation of the records in the database stems from the fact that the bilinear IDLE factors allow whatever flexibility is desired. Equipping the holon models with a few smile and blush factors allows systematic unknown variations to be quantified during the pattern recognition without statistical overparameterization.
The use of IDLE modelling in database representation may be applied to a variety of record types in databases, such as image databases containing human faces, e.g., medical, criminal or real estate promotional material; or technical drawings. In these situations, the IDLE modelling may allow multiple use of each holon in each drawing; the holons could in this special case be geometrical primitives. Additional applications include sound (music, voice), events (spatiotemporal patterns) and situations (e.g., weather situations which combine various meteorological data for various weather structures or geographical locations, for a certain time-span).
The IDLE principle may also be used for improved pattern recognition. In matching unknown records against various known patterns, added flexibility is obtained when the known patterns also include a few smile and blush factor loadings whose scores are estimated during the matching process. In searching an input image for the presence of a given pattern, added flexibility is obtained by allowing the holons to include a few smile and blush loadings, whose scores are estimated during the searching process. This type of pattern recognition approach may be applied to speech recognition.
The IDLE principle may also be applied to medical and industrial imaging devices, such as ultrasound, MRI, CT etc., in order to provide noise filtering, automatic warnings, and improved interpretation. In medical ultrasound imaging, noise is a major problem. The noise is so strong that filtering on individual frames to reduce this noise will often also destroy important parts of the wanted signal. Much of the noise is random and additive with an expectation of zero, and if many samples could be collected from the same part of the same object, then the noise could be reduced by averaging samples.
It is often impossible to keep the measured object or subject steady, and the observed movement can seem to be quite complex. However, the observed movement is due to a limited number of causes, and so the displacements will need relatively few IDLE smile and nod factors. In the reference position, noise can be averaged away. The smile and blush factors can also be useful for interpreting such sequences. Finally, ultrasound sequences represent such large amounts of raw data that they are difficult to store. Most often only one or a few still images are stored. The compression aspect of the present invention is therefore highly applicable.
The IDLE principle of the present invention may also be used for credit card and other image database compression applications. For example, in the case of compression, whenever there are sets of images with similar features, this set of images could be regarded as a sequence and compressed with the IDLE technique. This is readily applicable to databases of facial images. If all the loads are known at both the encoder and the decoder side, this means that only the scores need to be stored for each individual. These scores would then be able to fit into the storage capacity of the magnetic stripe on a credit card, and so could form the basis for an authentication System.
Other applications of the IDLE principle include still image compression; radar (noise filtering, pattern recognition, and error warnings); automatic dynamic visual art (in an art gallery or for advertisement, two or more computers with, e.g., flat color LCD screens where the output from IDLE models is shown; the score parameters of the IDLE model on one computer are functions of the screen output of the other IDLE models, plus other sensors in a self-organizing System); consumer products or advertisement (one computer with, e.g., a color flat LCD screen displays output from an IDLE model whose scores and loadings are affected by a combination of random number generators and viewer behavior); and disjoint sensing & meta-observation (when a moving scene has been characterized by different imaging sensors at sufficiently different times such that the images cannot be simply superimposed, IDLE modelling may be used to normalize the moving scene for simpler superimposition).
The IDLE system may also be used for data storage device normalization (magnetic, optical). Specifically, if the physical positioning or field intensity of the writing process varies, or the reading process or the medium itself is varying and difficult to model and correct for by conventional modelling, IDLE modelling using nod, smile and/or blush factors may correct for systematic, but unknown, variations. This may be particularly critical for controlling multilayer read/write processes. In such an application, the already written layers may serve as input data for the stabilizing latent smile and blush factors.
The IDLE principle of the present invention also has numerous sound applications. For example, sound, such as music, voice or electromechanical vibrations, may be modelled and compressed utilizing parameterization by fixed translation/nod, systematic shift/smile, intensity/blush and overlap/opacity in the various domains (e.g., time, frequency). A holon in sound may be a connected sound pattern in the time and/or frequency domains. Additional sound applications include sound modification/editing, and industrial process monitoring in automotive, ship and aircraft settings. Also, searching may be carried out in sound databases (similar to searching in image or video databases discussed above). It is thus possible to combine IDLE modelling in different domains, such as sound modelling in both the time and the frequency domains.
The IDLE principle may also be used in weather forecasting; in machinery (robot quality control monitoring, using a camera as a totally independent sensor and allowing the IDLE System to learn its normal motions and warn of wear & tear and abnormal behavior); and in robot modelling which combines classical robot connectivity "hard" nod trees with IDLE smile models for "softly" defined movements, and uses such "soft" and "hard" robot modelling in conjunction with blush factors to model human body motion.
The IDLE principle of the present invention may also be used for forensic research in the areas of fingerprints, voice prints, and mug shot images.
While the invention has been particularly shown and described with reference to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.
DECODER-APPENDIX

1. Overview
2. Frame Reconstruction
2.1 Intuitive explanation
2.2 INRec Formula
2.3 Holonwise loading-score matrix multiplication
2.4 Smile
2.5 Nod
2.6 Move
2.7 Ad hoc residuals
3. References

1. Overview
In order to increase readability, colloquial abbreviations are used in this description instead of the indexed and subscripted symbolism used elsewhere in the application.
The decoder performs the following steps for each frame n:
Receives updates of the segmentation S field part of domain PRef: S.

Receives updates of the scores ("Sco") for the blush intensity changes ("Blu"), BluSco; the vertical and horizontal address smile changes ("Smi"), SmiSco; the 3D depth changes (Z), ZSco; and the probabilistic changes ("Prob"), ProbSco, for Un for each holon.

Receives updates of the Blush, Smile, Prob and Z loadings for XRef (abbreviated "Loads" or "Lod"): BluLod, SmiLod, ProbLod, ZLod.

Receives updates of the affine transformation ("Nod") matrices, NodMat, containing the nod scores.

Receives optional error residuals ("Res") en = (BluRes, SmiRes, ZRes, ProbRes).
Reconstructs the intensity of the present frame (herein termed IN) based on the S field, scores, loads and Nod matrices, to produce a reconstructed In-hat result ("INRec").

2. Frame Reconstruction

2.1 Intuitive explanation
Blush the image by changing the pixel intensities of the pixels at the various color channels in the reference image according to the blush factors.

Smile the image by changing the address values of the pixels in the reference image according to the smile factors (including the Z factors).

Change the probabilistic properties of the image by changing the probabilistic suboperands, like transparencies, in the reference image according to the prob factors.

Nod the smiled coordinates by changing the smiled addresses of the pixels according to the nod matrices.

Move the pixels from the blushed reference image into the finished image so that each pixel ends up at its smiled and nodded coordinates, so that "holes" in the image are filled with interpolated values, so that the pixel with the highest Z value "wins" in the cases where several pixels end up at the same coordinates, and so that pixels are partly transparent if they have a Prob value lower than 1.

Optional: Add residual corrections to the reconstructed intensities.

Optional: Post-process the resulting output image to provide smooth blending of holons, especially along edges formed during the Mover operator due to movements. In the preferred embodiment, this is accomplished by blurring along all segment edges in the moved images.

2.2 INRec Formula
The formula for computing INRec is as follows:

    INRec = Move(IRef + BluSco*BluLod, S, ...
                 Nod([V H] + SmiSco*SmiLod, Z + ZSco*ZLod, NodMat, S), ...
                 ProbSco*ProbLod)

2.3 Holonwise loading-score matrix multiplication
In an expression such as "BluSco*BluLod", the multiplication does not imply traditional matrix multiplication, but rather a variation referred to as holonwise loading-score matrix multiplication. That is, each holon has its own score, and for each pixel, the S field must be analyzed in order to determine which holon that pixel belongs to, and this holon number must be used to select the correct score from BluSco.
To compute BluSco*BluLod:

For each Pixel:
    Sum = 0
    For each Factor:
        Sum = Sum + BluSco[S[Pixel], Factor] * BluLod[Factor, Pixel]
    Result[Pixel] = Sum
The same applies to SmiSco*SmiLod, ZSco*ZLod and ProbSco*ProbLod.
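A direct transcription of this holonwise multiplication into Python/NumPy might look as follows; this is a sketch, and the pseudocode above remains the authoritative form.

    import numpy as np

    def holonwise_multiply(sco, lod, s_field):
        # sco:     (n_holons, n_factors) scores, one row per holon
        # lod:     (n_factors, n_pixels) loadings
        # s_field: (n_pixels,) holon number of each pixel
        # Each pixel uses the scores of the holon it belongs to, per the
        # S field, summing score*loading over all factors.
        return np.einsum('pf,fp->p', sco[s_field], lod)

    # e.g., the blush term of the INRec formula:
    # blush = holonwise_multiply(BluSco, BluLod, S)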
2.4 Smile

Smiling pixels means displacing the reference position coordinates according to an address change field. The address change field may have values in each coordinate dimension, such as the vertical, horizontal and depth dimensions (V, H, Z), and may be defined for one or more holons. Each address change field may be generated as the sum of contributions of smile factors, and each change factor contribution may be the product of temporal scores and spatial loadings.
In order to displace information of pixels away from the reference position, the amount of motion for each of these pixels in the reference position (the address change field DARef,n) may be computed first; the actual moving operation then takes place later in the Mover stage of the decoder.
For each pixel with coordinates V, H, Z in the reference position, its new address after it has been moved is computed by:

    VSmi = V + SmiScoV*SmiLodV
    HSmi = H + SmiScoH*SmiLodH
    ZSmi = Z + SmiScoZ*SmiLodZ

In these three expressions, V and H are the coordinates of each pixel in the reference position, while Z is the value of the Z field for that pixel. The multiplication is holonwise loading-score matrix multiplication, as defined in the previous paragraph.
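Reusing the holonwise_multiply sketch from the previous section, the smiled addresses could be computed as follows; V, H and Z denote the reference coordinate grids flattened to pixel vectors, and all names are illustrative.

    # Smiled addresses for every reference pixel, per the three expressions above.
    VSmi = V + holonwise_multiply(SmiScoV, SmiLodV, S)
    HSmi = H + holonwise_multiply(SmiScoH, SmiLodH, S)
    ZSmi = Z + holonwise_multiply(SmiScoZ, SmiLodZ, S)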
2.5 Nod

The function of the Nod is to modify the value of the coordinates of each pixel, which may be conceptualized as a vector having homogeneous coordinates:

    ASmi = [VSmi HSmi ZSmi 1]
The nodded coordinates, ANod, are then given by:

    | VNod  |   | T11 T12 T13 0 |   | VSmi |
    | HNod  | = | T21 T22 T23 0 | * | HSmi |
    | ZNod  |   | T31 T32 T33 0 |   | ZSmi |
    | Dummy |   | T41 T42 T43 1 |   |  1   |

which may be equivalently expressed as:

    ANod = NodMat * ASmi
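A small Python sketch of this step, assuming the matrix layout shown above and that one holon's 4x4 NodMat has already been selected via the S field:

    import numpy as np

    def nod(nod_mat, v_smi, h_smi, z_smi):
        # Stack the smiled coordinates as homogeneous vectors and apply
        # ANod = NodMat * ASmi; the Dummy row is discarded afterwards.
        a_smi = np.stack([v_smi, h_smi, z_smi, np.ones_like(v_smi)])
        a_nod = nod_mat @ a_smi
        return a_nod[0], a_nod[1], a_nod[2]   # VNod, HNod, ZNod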
2.6 Move

Move the pixels into the finished image so that each pixel ends up at its smiled and nodded coordinates, in such a way that "holes" in the image are filled with interpolated values, that the pixel with the highest Z value "wins" in the cases where several pixels end up at the same coordinates, and that pixels are partly transparent if they have a Prob value lower than 1.
If the loadings X(f)Ref, f=1,2,... are also moved together with the level 0 image, X(0)Ref, the same interpolation and Z-buffering strategies are used for f=1,2,... as for f=0 above. A description of methods of moving and interpolating pixels may be found in, e.g., George Wolberg, Digital Image Warping, Chapter 7 (IEEE Computer Society Press 1990), which is incorporated herein by reference. A description of Z-buffering may be found in, e.g., William M. Newman and Robert F. Sproull, Principles of Interactive Computer Graphics, Chapter 24 (McGraw-Hill 1984), which is incorporated herein by reference. A description of how to combine partly transparent pixels may be found in, e.g., John Y.A. Wang and Edward H. Adelson, "Layered Representation for Image Sequence Coding", IEEE ICASSP, Vol. 5, pp. 221-224, Minneapolis, Minnesota, 1993, which is incorporated herein by reference.
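The Mover's core behaviour can be sketched as follows in Python; this is a minimal forward mapping with Z-buffering only, and the hole filling and transparency handling cited above are omitted for brevity.

    import numpy as np

    def move(i_ref_blushed, v_nod, h_nod, z_nod, height, width):
        out = np.zeros((height, width))
        zbuf = np.full((height, width), -np.inf)
        # Splat every reference pixel to its nodded position; where several
        # pixels collide, the one with the highest Z value "wins".
        for v, h, z, val in zip(v_nod.ravel(), h_nod.ravel(),
                                z_nod.ravel(), i_ref_blushed.ravel()):
            vi, hi = int(round(v)), int(round(h))
            if 0 <= vi < height and 0 <= hi < width and z > zbuf[vi, hi]:
                zbuf[vi, hi] = z
                out[vi, hi] = val
        return out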
Appendix MERGE_SUBSEQUENCES
Check if the present subsequence model can be merged with other subsequence models:

A. Call the present reference model 'position I', and another reference model 'position II'. Move the spatial model parameters of the extended reference image for the present subsequence, X(I), to the position of the extended reference image for another subsequence, X(II), using a frame n which has been modelled by both of the subsequences:

1. Since:

    In Model I:  inhat(I)  = Move(DA(I,n) of I(I) + DI(I,n))
    In Model II: inhat(II) = Move(DA(II,n) of I(II) + DI(II,n))

   and this generalizes from inhat to all domains in xnhat:

    In Model I:  xnhat(I)  = Move(DA(I,n) of X(I) + DX(I,n))
    In Model II: xnhat(II) = Move(DA(II,n) of X(II) + DX(II,n))

2. We can move the estimate for frame n back to the two respective reference positions:

    In Model I:  xnhat(I)@I   = Move(DA(I,n)^-1 of xn)
    In Model II: xnhat(II)@II = Move(DA(II,n)^-1 of xn)

3. If the two models mainly contain smile, as opposed to blush, modelling, we may now move model I to frame n's estimated position, using model I, and then move model I into model II's position using the reverse of model II:

    X(I)@II = Move(DA(II,n)^-1 of (Move(DA(I,n) of (X(I) + DX(I,n)))))

4. The obtained model I loadings given in model II's position, X(I)@II, may now be compared to and merged into X(II) (with local smile and blush estimation and model extension, plus detection of parts in X(I) lost in X(I)@II). This yields a new and enlarged model X(II) that summarizes both models I and II.

5. The new and enlarged model X(II) may now similarly be merged with another model III with which it has another overlapping frame, etc. Subsequences are merged together as long as this does not involve unacceptable degradation in compression and/or reproduction quality.
APPENDIX IV

ENCODER
Purpose:
Show one way of implementing a simplified IDLE encoder.
Contents:

1 EncSeq
2 ExpressSubSeqWithModels
3 ExpressWithModels
4 ExtractSmiFactSubSeq
5 ExtractBluFactSubSeq
6 SegSubSeq
7 AllocateHolon
8 MoveBack
9 AnalyseMove
10 Other required methods
10.1 Move
10.2 EstMov
10.3 Smi2Nod
10.4 UpdateModel
10.5 Transmit
Appendix
Notation

1 EncSeq
Input:

Seq: Sequence of frames; one per row
ErrTol: Error tolerance
Output:
SmiLod: Smile loads
SmiSco: Smile scores
BluLod: Blush loads
BluSco: Blush scores
Informal description:

Work forward through the sequence. Whenever frames cannot be reconstructed with an error less than the tolerance using known smile and blush factors, introduce a new factor. Do this by first trying to introduce a smile factor and then trying to introduce a blush factor. Choose the factor that improved the reconstruction the most.
During this process, different parts of the image may be found to move independently of or occluding each other. Each time this is detected, detect which parts of the image move coherently, isolate the smallest, and define this as one or more new holons; make new room by increasing the size of the image, place the new holons there, and let a smile factor compensate for this repositioning.
Whenever new information is revealed (that is, parts of the image cannot be moved back to reference position with any fidelity using the existing nod or smile factors), find which holons are nearby and try to model the new information under the assumption that it is an extension to each of these holons. If a good modelling behaviour can be found, extend the holon, else create a new holon.
Take into account how much memory the decoder has left:
If it has much free memory, prefer factors that span many frames and so are believed to be more "correct" (even though they alone may describe each individual frame with less fidelity) by relaxing the test error tolerance TestErrTol.
If it has little free memory, it is important that the required fidelity be reached with the few remaining factors, so the test error tolerance TestErrTol must be tightened.
Method:

IRef = First image in the sequence Seq
Set SmiLod and BluLod to empty
While NextFraNo <= length(Seq):
    [SmiSco, BluSco, FailFraNo] = ...
        ExpressSubSeqWithModels(Seq, NextFraNo, ErrTol, IRef, SmiLod, BluLod, SmiSco, BluSco)
    If FailFraNo <= length(Seq):
        Try different ways of updating the model:

        If the decoder has much memory left (based on Transmit history):
            Set TestErrTol to a large value
        else if the decoder has little memory left:
            Set TestErrTol to a value close to ErrTol

        FromFraNo = FailFraNo
        [NewSmiLod, nSmiFra, TotSmiErr] = ExtractSmiFactSubSeq(Seq, FromFraNo,
            TestErrTol, SmiLod, BluLod, SmiSco, BluSco)
        [NewBluLod, nBluFra, TotBluErr] = ExtractBluFactSubSeq(Seq, FromFraNo,
            TestErrTol, SmiLod, BluLod, SmiSco, BluSco)
        [NewS, nSegFra, TotSegErr] = SegSubSeq(Seq, FromFraNo, S, TestErrTol)

        Based on nSmiFra, nBluFra and nSegFra, and TotSmiErr, TotBluErr and TotSegErr:
            Either select one of Smile or Blush to be included in the model,
            or change the segmentation

        If Smile is selected:
            Transmit(SmiLod)
            Update smile factors:
            [SmiLod, SmiSco] = UpdateModel(SmiLod, SmiSco, NewSmiLod)
        else if Blush is selected:
            Transmit(BluLod)
            Update blush factors:
            [BluLod, BluSco] = UpdateModel(BluLod, BluSco, NewBluLod)
        else if Segment is selected:
            Transmit(NewS)
            S = NewS

End of method EncSeq

2 ExpressSubSeqWithModels
Purpose:
Express a sequence with existing models consisting of loads in the smile and blush domains, as far as the error tolerance will allow.

[SmiSco, BluSco, FailFraNo] = ...
    ExpressSubSeqWithModels(Seq, NextFraNo, ErrTol, IRef, SmiLod, BluLod, SmiSco, BluSco)

Input:

Seq: The sequence to be expressed
NextFraNo: Starting point of the subsequence within Seq
ErrTol: Error tolerance; the ending criterion for the subsequence
IRef: Reference image
SmiLod, BluLod: Smile and blush loads
SmiSco, BluSco: Already known smile and blush scores
Output:
SmiSco: Smile scores
BluSco: Blush scores
FailFraNo: Number of the frame where the modelling failed due to ErrTol
Method:

Set current frame number N to NextFraNo
Repeat
    IN = Seq[N]
    Try to model IN using the known factors:
    [INRec, SmiSco[N], BluSco[N]] = ...
        ExpressWithModels(IN, S, SmiLod, BluLod)
    Increase the frame number N
until Error(INRec, IN) > ErrTol or IN was the last frame in Seq

FailFraNo = N

End of method ExpressSubSeqWithModels

3 ExpressWithModels
Purpose:
Express a frame with the known models, i.e., calculate the scores for the existing loads that give the best fit between IN and a reconstruction.

[INRec, SmiSco, BluSco] = ExpressWithModels(IN, IRef, SmiLod, BluLod, S, SmiSco, BluSco)

Input:

IN: One particular frame
IRef: Reference image
SmiLod: Known smile loads
BluLod: Known blush loads
S: S field

Optional input:

SmiSco, BluSco: Initial estimates for the smile and blush scores

Output:

INRec: Reconstructed image
SmiSco, BluSco: Improved estimates for the smile and blush scores

Informal description:
Find an optimal set of scores by trial and error, i.e., by a search method like Simplex (for a description, see Chapter 10.4, William H. Press, et al., "Downhill Simplex Method in Multidimensions" in "Numerical Recipes" (Cambridge University Press), which is incorporated herein by reference).

Select new smile scores as variations of the previously best known smile scores; estimate blush scores by moving the difference between the decoded and the wanted image back to reference position and then projecting it onto the existing blush loads.

Judge how well each new image approximates the wanted image, and use this as a guideline for how to select new variations of the smile scores.
Method:

For each holon:
    Repeat
        For a small number of variants:
            Change the smile scores slightly
            Decode an image using the new smile scores and the old blush scores
            Move the difference between the decoded and the wanted image back to reference position
            Project the difference onto the blush loads, producing new BluSco
            Decode an image using the new SmiSco and BluSco
        Select the best variant (i.e., keep the scores that gave the best reconstruction)
    until the reconstructed image is good enough or the reconstruction is not improving

End of ExpressWithModels method
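As an illustration of this trial-and-error search, here is a hedged Python sketch; a simple coordinate search stands in for the simplex method, and decode is an assumed callable mapping smile scores to a reconstructed image, with the blush projection folded into it for brevity.

    import numpy as np

    def express_with_models(target, decode, smi_sco, n_iters=20, step=0.1):
        best_err = np.sum((decode(smi_sco) - target) ** 2)
        for _ in range(n_iters):
            improved = False
            for i in range(len(smi_sco)):
                for delta in (-step, step):
                    trial = smi_sco.copy()
                    trial[i] += delta          # change the smile scores slightly
                    err = np.sum((decode(trial) - target) ** 2)
                    if err < best_err:         # keep the best variant
                        smi_sco, best_err, improved = trial, err, True
            if not improved:                   # reconstruction is not improving
                break
        return smi_sco, best_err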
4 ExtractSmiFactSubSeq

Purpose:
Extract one smile factor from a subsequence.

[NewSmiLod, nSmiFra, TotSmiErr] = ExtractSmiFactSubSeq(Seq, FromFraNo, ErrTol, IRef, SmiLod, BluLod, SmiSco, BluSco)

Input:

Seq: The sequence
FromFraNo: Number of the first frame in the subsequence. This is the same as NextFraNo in EncSeq
ErrTol: Error tolerance
SmiLod, BluLod: Known smile and blush loads
SmiSco, BluSco: Scores to be updated

Output:

NewSmiLod: One new smile load
nSmiFra: Number of frames used for estimating the factor
TotSmiErr: Total remaining error after smiling

Informal description:
For each frame, as long as smile seems reasonable:
Reconstruct the wanted frame IN as well as possible
using only the known loads; call this IM
Find how IM should be smiled in order to look like IN
Map this smile back to reference position
UpdateModel
Return the first factor of the final model
Method:
TestFraNo = FromFraNo
TotErrSmi = 0
Set SmiTestLod to empty
Repeat
    IN = Seq[TestFraNo]

    Establish an image IM that reconstructs IN as well as possible based on the reference image and known smile and blush factors, and as a side effect also compute the return field from M to Reference position:
    [IM, SmiSco[TestFraNo], BluSco[TestFraNo]] = ...
        ExpressWithModels(IN, IRef, SmiLod, BluLod, SmiScoInit, BluScoInit)
    SmiRefToM = SmiSco[M] * SmiLod
    IM = Move(IRef + BluSco[M]*BluLod, SmiSco[M]*SmiLod)

    Find how IM should be made to look like IN when only smiling is allowed, and at the same time calculate the confidence of this smile field:
    [SmiMToN, SmiConfMToN] = EstMov(IM, IN, TestSmiLod)

    Move the smile and its certainty back to reference position:
    SmiMToNAtRef = MoveBack(SmiMToN, SmiRefToM)
    SmiConfMToNAtRef = MoveBack(SmiConfMToN, SmiRefToM)

    Calculate the error when only smiling is used:
    ErrSmi = IN - Move(IRefBlushed, SmiRefToM + SmiMToNAtRef)

    [SmiTestLod, SmiTestSco] = ...
        UpdateModel(SmiTestLod, SmiTestSco, ErrSmi)
    TotErrSmi = TotErrSmi + ErrSmi
    TotSmiConfMToNAtRef = TotSmiConfMToNAtRef + SmiConfMToNAtRef

    TestFraNo = TestFraNo + 1
until
    The energy is too much spread among the factors in SmiTestLod, or
    ErrSmi is large

The last frame should not be included in the summary, so:
    Undo the effect of the last UpdateModel
    Undo the effect of the last error summation:
    TotErrSmi = TotErrSmi - ErrSmi
    TotSmiConfMToNAtRef = TotSmiConfMToNAtRef - SmiConfMToNAtRef

NewSmiLod = SmiTestLod[1]
nSmiFra = TestFraNo - FromFraNo

End of ExtractSmiFactSubSeq method

5 ExtractBluFactSubSeq
Purpose:
Extract one blush factor from a subsequence.

[NewBluLod, nBluFra, TotBluErr] = ExtractBluFactSubSeq(Seq, NextFraNo, ErrTol, IRef, SmiLod, BluLod, SmiSco, BluSco)

Input:

Seq: The sequence
NextFraNo: Number of the next frame, i.e., the start of the subsequence
ErrTol: Error tolerance, which may define the end of the subsequence
IRef: Reference image
SmiLod: Known smile loads
BluLod: Known blush loads
SmiSco: Smile scores
BluSco: Blush scores

Output:

NewBluLod: New blush load
nBluFra: Number of frames for which this blush is defined
TotBluErr: Total remaining error after blushing

Method:
TotErrBlu = 0
TestFraNo = NextFraNo
Set BluTestLod to empty
Repeat
    If scores for IM are not available from ExtractSmiFactSubSeq:
        Establish an image IM that reconstructs IN as well as possible based on the reference image and known smile and blush factors, and as a side effect also compute the return field from M to Reference position:
        [IM, SmiSco[TestFraNo], BluSco[TestFraNo]] = ...
            ExpressWithModels(IN, IRef, SmiLod, BluLod, SmiScoInit, BluScoInit)
        SmiRefToM = SmiSco[M] * SmiLod

    Try to make IM look like IN by blushing:
    BluMToN = IN - IM

    Move this blush back to reference position:
    BluMToNAtRef = MoveBack(BluMToN, SmiRefToM)

    Calculate the error when only blushing is used:
    ErrBlu = IN - Move(IRefBlushed + BluMToNAtRef, SmiRefToM)

    [BluTestLod, BluTestSco] = ...
        UpdateModel(BluTestLod, BluTestSco, ErrBlu)
    TotErrBlu = TotErrBlu + ErrBlu

    TestFraNo = TestFraNo + 1
until
    The energy is too much spread out among the factors in BluTestLod, or
    Sum(ErrBlu) is large

The last frame should not be included in the summary, so:
    Undo the effect of the last UpdateModel
    Undo the effect of the last error summation:
    TotErrBlu = TotErrBlu - ErrBlu

NewBluLod = BluTestLod[1]

End of ExtractBluFactSubSeq method

6 SegSubSeq
Purpose:
Propose a new segmentation of the holons, and report how much this improves the modelling.

[S, TotSegErr, nSegFra] = SegSubSeq(Seq, FromFraNo, SmiLod, SmiSco, S)

Input:

Smi: Smile field
FromFraNo: Number of the first frame in the subsequence
SmiLod: Smile loads
SmiSco: Smile scores
S: Previous S field

Output:

S: New, updated S field
TotSegErr: Total error associated with segmenting
nSegFra: Number of frames used for estimating the segmentation

Informal description:

Use various heuristic techniques to improve how the reference image is split into separate holons. Check how easy it is to extract either new smile or new blush factors under the assumption of this new split. Report back the best result.
Method:
Repeat
    TestFraNo = FromFraNo
    Repeat
        IN = Seq[TestFraNo]
        Smi = SmiSco[TestFraNo] * SmiLod

        Split one holon into two if necessary:
        For each holon in S:
            Compute a nod matrix from Smi for that holon
            If the sum of errors between nod matrices and pels is large:
                Split each holon along the principal component of the errors

        Join two holons into one if necessary:
        For each holon in S:
            If the nod matrix is very similar to the nod matrix of another holon:
                Join the two holons

        Let edge pels with bad fit change holon:
        INRec = Move(IRef + BluSco*BluLod, SmiSco*SmiLod)
        For each pel, at position v,h, in INRec that is on the edge of a holon:
            If the pel fits better on the neighbouring holon, let the pel belong to the neighbouring holon

        Pick up pels that don't belong to any holon:
        VisInFromAtTo = AnalyseMove(Smi)
        Make a new holon out of pels where VisInFromAtTo[pel] < Threshold

        TestFraNo = TestFraNo + 1
    until SmiSco[TestFraNo] is no longer available from earlier runs of ExtractSmiFactSubSeq
until convergence

[NewSmiLod, nSmiFra, TotSmiErr] = ExtractSmiFactSubSeq(Seq, FromFraNo, TestErrTol, SmiLod, BluLod, SmiSco, BluSco)
[NewBluLod, nBluFra, TotBluErr] = ExtractBluFactSubSeq(Seq, FromFraNo, TestErrTol, SmiLod, BluLod, SmiSco, BluSco)

If Smile was "better" than Blush:
    TotSegErr = TotSmiErr
    nSegFra = nSmiFra
else
    TotSegErr = TotBluErr
    nSegFra = nBluFra

End of SegSubSeq method

7 AllocateHolon
Purpose:
SegSubSeq will need to change the spatial definition of holons. Here is one example of an operation that is needed, namely the one to allocate a new holon in the Reference image.

[S, SmiLod, BluLod, SmiSco, BluSco] = AllocateHolon(S, SNewHolon, Smi, SmiLod, BluLod, SmiSco, BluSco)

Input:

S: Old S field, before updating
SNewHolon: S field for one or more new holons

Output:

S: New, updated S field

Method:

For each new holon in S:
    Find enough free space in S, if necessary increasing the size of S
    Find a free holon number, put this into each new pel position in S
    Put the pels of SNewHolon into the new space
    Give the new holon a new smile factor capable of moving the holon from the new reference position back to its last position
    Reformat the score tables accordingly

8 MoveBack
Purpose:
Move the contents of an image back, e.g., from N to M position or from M to Ref position. This is almost an inverse of Move.

IBack = MoveBack(IOut, SmiBack, SBack)

Input:

IOut: Input image, in Moved Out position, e.g., IM
SmiBack: Smile field, in Back position, e.g., Ref
SBack: S field, in Back position

Output:

IBack: Image moved back, e.g., to reference position
Method:
For each pel at position v,h in SBack:
    Interpolate, using two-way linear interpolation, IBack[v,h] from the four pels in IOut that surround the sub-pixel position (v+SmiV[v,h], h+SmiH[v,h])
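A minimal Python rendering of MoveBack, following the method above; the S field is omitted for brevity, and smi_v and smi_h are the per-pel displacement fields.

    import numpy as np

    def move_back(i_out, smi_v, smi_h):
        height, width = smi_v.shape
        i_back = np.zeros((height, width))
        for v in range(height):
            for h in range(width):
                # Sub-pixel source position, clamped to stay inside i_out.
                sv = min(max(v + smi_v[v, h], 0.0), height - 1.001)
                sh = min(max(h + smi_h[v, h], 0.0), width - 1.001)
                v0, h0 = int(sv), int(sh)
                dv, dh = sv - v0, sh - h0
                # Two-way (bilinear) interpolation from the four surrounding pels.
                i_back[v, h] = ((1 - dv) * (1 - dh) * i_out[v0, h0]
                                + (1 - dv) * dh * i_out[v0, h0 + 1]
                                + dv * (1 - dh) * i_out[v0 + 1, h0]
                                + dv * dh * i_out[v0 + 1, h0 + 1])
        return i_back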
9 AnalyseMove

Purpose:

Determine features of a smile field:
For each pel in a From image: will it be visible in the To image?
For each pel in a To image: was it visible in the From image?

[VisInToAtFrom, VisInFromAtTo] = AnalyseMove(SmiFrom, SFrom)

Input:

SmiFrom: Smile field, in From position, to be analyzed
SFrom: S field, in From position

Output:

VisInToAtFrom: Visibility in the To image at From position:
    For each pel in a From image:
        1 if the corresponding pel in the To image is visible
        0 otherwise

VisInFromAtTo: Visibility in the From image at To position:
    For each pel in a To image:
        1 if the corresponding pel in the From image is visible
        0 otherwise
Method:

Generate VisInFromAtTo:
    Initialize VisInFromAtTo to all zeros
    For each pel, at position v,h, in SmiFrom:
        VisInFromAtTo[int(v+SmiV[v,h]), int(h+SmiH[v,h])] = 1
    For each pel, at position v,h, in VisInFromAtTo:
        Replace VisInFromAtTo[v,h] with the majority value of itself and its four neighbours

Generate VisInToAtFrom:
    [Dummy2, SmiRet] = Move(Dummy1, Smi)
    Initialize VisInToAtFrom to all zeros
    For each pel, at position v,h, in SmiRet:
        VisInToAtFrom[int(v+SmiRetV[v,h]), int(h+SmiRetH[v,h])] = 1
    For each pel, at position v,h, in VisInToAtFrom:
        Replace VisInToAtFrom with the majority value of itself and its four neighbours
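The first pass of this visibility analysis can be sketched in Python as follows; the majority filtering and the reverse VisInToAtFrom pass are omitted for brevity, and the names follow the pseudocode above.

    import numpy as np

    def analyse_move(smi_v, smi_h):
        height, width = smi_v.shape
        vis_in_from_at_to = np.zeros((height, width), dtype=int)
        # Mark every To-image position that receives at least one From pel.
        for v in range(height):
            for h in range(width):
                vt, ht = int(v + smi_v[v, h]), int(h + smi_h[v, h])
                if 0 <= vt < height and 0 <= ht < width:
                    vis_in_from_at_to[vt, ht] = 1
        return vis_in_from_at_to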
10 Other required methods

10.1 Move

Purpose:

Move the contents of an image according to a Smile field.
[IMoved, Ret] = Move(IFrom, Smi, S), as described in ...

10.2 EstMov
Purpose:
Estimate the movement (i.e., the Smile field) from one frame to another, together with the certainty of the estimate.

[Smi, SmiConf] = EstMov(IFrom, ITo)
Input:

IFrom: From-image
ITo: To-image
Output:
Smi: Smile field
SmiConf: Smile confidence: how sure can we be of Smi?
Method:

E.g., any of the methods described in "Optic Flow Computation: A Unified Perspective", Ajit Singh, IEEE Computer Society Press 1991, ISBN 0-8186-2602, which uses the term "optical flow field" much like a Smile field is used in this context.

10.3 Smi2Nod
Purpose:

Compute Nod matrices from Smile fields.
NodMat = Smi2Nod(Smi, S), as described in ...

10.4 UpdateModel

[NewLod, NewSco] = UpdateModel(OldLod, OldSco, NewData), as described in ...

10.5 Transmit
Purpose:

Make the computed data available for the decoder so it can decode the sequence.
Transmit(Data)

Method:
If Data is a spatial load:
    Compress Data using conventional still image compression techniques
else if Data is an update of an S field:
    Compress Data using run-length encoding
else if Data represents scores:
    Compress Data using time series compression techniques

Send Data to the receiver via whatever communication medium has been selected.
Appendix
Notation

= (Equals sign):
The expression to the right of the sign is evaluated, and the result is assigned to the variable or structure indicated by the identifier to the left of the sign.

If the expression to the right results in several output values, a corresponding list of identifiers is given inside square brackets on the left side of the sign.

() (Parentheses):
One use of square brackets is defined in theparagraph about the Equals sign.
Another use is to indicate indexing: When a paof square brackets appear after an identifier, 5 this means that the identifier refers to an array or matrix of values, and the expressioninside the square brackets selects one of thos- values. îo Naming
Mnemonic names are used: "Smi " is used instead of "DA" for Smile "Blu" is used instead of "DI" for Blush "Lod" dénotés loads "Sco" is used instead of "U" for scores
Pre- and postfixes are used instead of subscripts, andbold characters are not used, e.g. 20 "SmiMToN" is used instead of DA^.

Claims (63)

  1. u 1 Ο 2 6 9 174 We Claim:
    1. A method for converting samples of an input signal to an encoded signal composed of a plurality of compo-nent signais each representing a characteristic of the input signal in a different domain, said input signal being comprisec 5 of data samples organized into records of multiple samples, with each sample occupying a unique position within its record, characterized in that each component signal is formed as the combination of a plurality of factors, each factor being the product of a score signal and a load signal, the spore signal 10 defining the variation of data samples from record to record and the load signal defining the relative variation of a subgroup of samples in different positions of a record.
  2. 2. The method in accordance with claim 1 wherein a set of reference component signal values is provided which represents a reference pattern of samples and in each record the input signal is représentez bv a plurality of component 175 10 0 1ι; 2 6 9 change signal values for each record, each component change signal value being equal to the différence between reference pattern of samples and the record.
  3. 3 . The method of claim 2 wherein each record has the same number of samples arranged in a multi-dimensional array, a first of said component signais representing the magnitude of samples and a second of said component signais representing the position of a sample in the array. i
  4. 4. The methc.-d of claim 3 wherein a component change signal may .resuit in several pixels of the reference image being mapped to a common pixel of one of the frames, the intensity of the commor. pixel being equal to a weighted sum of the intensifies of the several pixels. 15 υ 1 u 2 6 S 176
  5. 5 . The method of claim 1 wherein at least one of e set of load signais and a set of score signais is selected for each component signal so as to be statistically représentative of variations in the corresponding characteristic among ail records.
  6. 6. The method of claim 3 wherein the number of factors and the précision of factors are selected so that the storage space required therefor will not exceed a predefined amount.
6. The method of claim 3 wherein the number of factors and the precision of factors are selected so that the storage space required therefor will not exceed a predefined amount.
8. The method of claim 7 wherein the number of factors and the precision of factors is selected to achieve error signals which remain below a predefined threshold value.
9. The method of claim 8 wherein the number of factors and the precision of factors are selected so that the storage space required therefor will not exceed a predefined amount.
10. The method of claim 1 further comprising providing a plurality of error signals each corresponding to one of the component signals, each error signal providing correction to the extent that the corresponding component signal does not represent the corresponding characteristic of the input signal within a predefined range.
11. The method in accordance with claim 10 wherein a set of reference component signal values is provided which represents a reference pattern of samples and in each record the input signal is represented by a plurality of component change signal values for each record, each component change signal value being equal to the difference between the reference pattern of samples and the record.
12. The method of claim 1 wherein each record has the same number of samples arranged in a multi-dimensional array, a first of said component signals representing the magnitude of samples and a second of said component signals representing the position of a sample in the array.
13. The method of claim 12 wherein a component change signal may result in several pixels of the reference image being mapped to a common pixel of one of the frames, the intensity of the common pixel being equal to the sum of the intensities of the several pixels.
14. The method of claim 12 wherein the input signal is a conventional video signal, each sample is a pixel of a video image, each record is a frame of video, said first component signal represents pixel intensity and said second component signal represents the location of a pixel in a frame.
15. The method of claim 14 further comprising providing a plurality of error signals each corresponding to one of the component signals, each error signal providing correction to the extent that the corresponding component signal does not represent the corresponding characteristic of the input signal within a predefined range.
16. The method in accordance with claim 1 wherein a set of reference component signal values is provided which represents a reference pattern of samples and in each record the input signal is represented by a plurality of component change signal values for each record, each component change signal value being equal to the difference between the reference pattern of samples and the record.
17. The method of claim 16 wherein a component change signal may result in several pixels of the reference image being mapped to a common pixel of one of the frames, the intensity of the common pixel being equal to a weighted sum of the intensities of the several pixels.
18. The method of claim 16 wherein a component change signal may result in several pixels of the reference image being mapped to a common pixel of one of the frames, the intensity of the common pixel being equal to the difference between a constant and the sum of the intensities of the several pixels.
19. The method of claim 16 wherein a component change signal may result in several pixels of the reference image being mapped to a common pixel of one of the frames, said method further comprising defining a depth for each of the several pixels, the intensity of the common pixel being made equal to the intensity of the pixel among the several pixels which has the least depth.
20. The method of claim 19 wherein the depth of pixels is defined as a separate domain represented by a third component signal.
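Claims 19-20 amount to a z-buffer style rule: of the several reference pixels mapped to a common frame pixel, the one with the least depth supplies the intensity, with depth itself carried as a third component signal. A minimal sketch under that reading (names and data layout assumed):

```python
import numpy as np

def resolve_by_depth(frame_shape, targets, intensities, depths):
    frame = np.zeros(frame_shape)
    zbuf = np.full(frame_shape, np.inf)    # +inf means nothing drawn yet
    for (row, col), inten, z in zip(targets, intensities, depths):
        if z < zbuf[row, col]:             # pixel with least depth wins
            zbuf[row, col] = z
            frame[row, col] = inten
    return frame
```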
21. The method of claim 16 wherein the reference image is provided with a collection of holons, the collection of holons containing every different holon appearing among all the frames of the input signal.
22. The method of claim 21 wherein the location of a pixel within the reference image is represented in a first system of coordinates and the location of a pixel within at least one of the holons is represented in a different system of coordinates.
23. The method of claim 21 wherein the location of a pixel within different holons is represented in a different system of coordinates.
24. The method of claim 21 wherein the holons include a set of pixels exhibiting coordinated behavior in at least one domain, and at least one of a load signal and score signal of at least one component signal operates only on said set of pixels.
25. A method for producing a set of load and score signals for use in the method of claim 2 comprising the steps of:
a. determining the plurality of component change signal values as the difference between each record and the reference pattern of samples;
b. performing principal component analysis on the plurality of component change signal values to extract a plurality of loads;
c. projecting the plurality of component change signal values on the plurality of loads to produce a set of score values which are applied to the plurality of loads to produce an approximated record;
d. determining the difference between each approximated record and each record;
e. repeating steps c and d until the difference between each approximated record and each record is less than a predetermined value.
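A compact sketch of claim 25's loop, under stated assumptions: a plain SVD stands in for whichever principal component analysis routine the patent contemplates, the stopping rule uses a maximum absolute residual, and all names are illustrative. X holds one record per row; ref is the reference pattern of samples.

```python
import numpy as np

def encode(X, ref, tol=1e-3, max_factors=10):
    D = X - ref                                       # step a: change values
    _, _, Vt = np.linalg.svd(D, full_matrices=False)  # step b: PCA loads
    for f in range(1, max_factors + 1):
        loads = Vt[:f]                      # keep the first f loads
        scores = D @ loads.T                # step c: project to get scores
        approx = scores @ loads             # ... and approximate the records
        if np.abs(D - approx).max() < tol:  # steps d-e: stop when close
            break
    return scores, loads
```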
26. A method for producing a set of load and score signals for use in the method of claim 25, wherein the principal component analysis is a weighted principal component analysis.
27. A method for producing a set of load and score signals for use in the method of claim 16, comprising the further step of extending the set of reference component signals to include additional component signals.
28. A method for decoding an encoded signal composed of a plurality of component signals in different domains to an input signal comprised of data samples organized into records of multiple samples, with each sample occupying a unique position within its record, said encoded signal represented as a combination of a plurality of factors, each factor being the product of a score signal and a load signal, the score signal defining the variation of data samples from record to record and the load signal defining the relative variation of a subgroup of samples in different positions of a record, said method utilizing a reference pattern of samples, comprising the steps of:
a. multiplying each load signal by its associated score signal to produce each factor;
b. combining the factors produced in step a;
c. modifying the set of reference component signal values according to the combined factors produced in step b to produce the records of a reproduced input signal.
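Decoding per claim 28 can be sketched step for step (names and NumPy layout are assumptions; for an intensity component, "modifying" the reference pattern reduces to addition here, whereas a position component would instead displace reference pixels):

```python
import numpy as np

def decode(scores, loads, ref):
    # scores: (records, factors); loads: (factors, positions); ref: (positions,)
    factors = scores[:, :, None] * loads[None, :, :]  # step a: score x load
    combined = factors.sum(axis=1)                    # step b: combine factors
    return ref + combined                             # step c: modify reference
```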
29. A method for decoding an encoded signal as in claim 28 wherein at least one of the load signals and score signals is provided on a storage medium.
30. A method for decoding an encoded signal as in claim 28, wherein the reference component signal values are provided on the storage medium.
31. A method for decoding an encoded signal as in claim 28 wherein the method comprises the further step of receiving at least one of the load signals and score signals from a remote location over a communications medium.
32. The method of claim 31 wherein the reference component signal values are also received over the communications medium.
33. A method for editing an encoded signal composed of a plurality of component signals in different domains to an input signal comprised of data samples organized into records of multiple samples, with each sample occupying a unique position within its record, said encoded signal represented as a combination of a plurality of factors, each factor being the product of a score signal and a load signal, the score signal defining the variation of data samples from record to record and the load signal defining the relative variation of a subgroup of samples in different positions of a record, said method utilizing a reference pattern of samples, comprising the steps of:
a. modifying at least one score signal to achieve desired editing;
b. multiplying each load signal by its associated modified score signal to produce each factor;
c. combining the factors produced in step b;
d. modifying the set of reference component signal values according to the combined factors produced in step c to produce the records of a reproduced input signal.
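Claim 33's editing thus happens entirely in score space: change a score signal, then decode as before. A self-contained sketch with random filler data (the gain of 2.0 and the factor index are illustrative assumptions):

```python
import numpy as np

scores = np.random.randn(10, 3)   # (records, factors)
loads = np.random.randn(3, 64)    # (factors, sample positions)
ref = np.zeros(64)                # reference pattern of samples

scores[:, 0] *= 2.0               # step a: amplify factor 0 in every record
# steps b-d: form the factors, combine them, apply to the reference
edited = ref + (scores[:, :, None] * loads[None, :, :]).sum(axis=1)
```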
34. An apparatus for converting samples of an input signal to an encoded signal composed of a plurality of component signals each representing a characteristic of the input signal in a different domain, said input signal being comprised of data samples organized into records of multiple samples, with each sample occupying a unique position within its record, comprising means for encoding each record as a combination, for each component signal, of a plurality of factors, each factor being the product of a score signal and a load signal, the score signal defining the variation of data samples from record to record and the load signal defining the relative variation of a subgroup of samples in different positions of a record.
35. The apparatus in accordance with claim 34 further comprising means for generating a set of reference component signal values which represents a reference pattern of samples, and means for producing for each record a plurality of component change signal values representing the input signal, each component change signal value being equal to the difference between the reference pattern of samples and the record.
36. The apparatus of claim 35 wherein each record has the same number of samples arranged in a multi-dimensional array, a first of said component signals representing the magnitude of samples and a second of said component signals representing the position of a sample in the array.
37. The apparatus of claim 36 wherein a component change signal may result in several pixels of the reference image being mapped to a common pixel of one of the frames, further comprising means for causing the intensity of the common pixel to be equal to a weighted sum of the intensities of the several pixels.
38. The apparatus of claim 36 further comprising means for providing a plurality of error signals each corresponding to one of the component signals, each error signal providing correction to the extent that the corresponding component signal does not represent the corresponding characteristic of the input signal within a predefined range.
39. The apparatus of claim 34 further comprising means for providing a plurality of error signals each corresponding to one of the component signals, each error signal providing correction to the extent that the corresponding component signal does not represent the corresponding characteristic of the input signal within a predefined range.
40. The apparatus in accordance with claim 34 further comprising means for generating a set of reference component signal values which represents a reference pattern of samples, and means for producing for each record a plurality of component change signal values representing the input signal, each component change signal value being equal to the difference between the reference pattern of samples and the record.
41. The apparatus of claim 34 wherein each record has the same number of samples arranged in a multi-dimensional array, said means for encoding causing a first of said component signals to represent the magnitude of samples and a second of said component signals to represent the position of a sample in the array.
42. The apparatus of claim 41 wherein the input signal is a conventional video signal, each sample is a pixel of a video image, each record is a frame of video, said first component signal represents pixel intensity and said second component signal represents the location of a pixel in a frame.
43. The apparatus in accordance with claim 42 further comprising means for generating a set of reference component signal values which represents a reference pattern of samples, and means for producing for each record a plurality of component change signal values representing the input signal, each component change signal value being equal to the difference between the reference pattern of samples and the record.
44. The apparatus of claim 43 wherein a component change signal may result in several pixels of the reference image being mapped to a common pixel of one of the frames, the intensity of the common pixel being equal to a weighted sum of the intensities of the several pixels.
45. The apparatus of claim 43 wherein a component change signal may result in several pixels of the reference image being mapped to a common pixel of one of the frames, further comprising means for controlling the intensity of the common pixel to equal the difference between a constant and the sum of the intensities of the several pixels.
46. The apparatus of claim 43 wherein a component change signal may result in several pixels of the reference image being mapped to a common pixel of one of the frames, further comprising means for defining a depth for each of the several pixels, and means for controlling the intensity of the common pixel to be equal to the intensity of the pixel among the several pixels which has the least depth.
47. The apparatus of claim 43 wherein the reference image includes a collection of holons, the collection of holons containing every different holon appearing among all the frames of the input signal.
48. The apparatus of claim 47 wherein the holons include a set of pixels exhibiting coordinated behavior in at least one domain, said means for encoding producing at least one of a load signal and score signal of at least one component signal which operates only on said set of pixels.
49. An apparatus for decoding an encoded signal composed of a plurality of component signals in different domains to an input signal comprised of data samples organized into records of multiple samples, with each sample occupying a unique position within its record, said encoded signal represented as a combination of a plurality of factors, each factor being the product of a score signal and a load signal, the score signal defining the variation of data samples from record to record and the load signal defining the relative variation of a subgroup of samples in different positions of a record, said apparatus utilizing a reference pattern of samples, comprising:
a. means for multiplying each load signal by its associated score signal to produce each factor;
b. means for combining the factors produced in step a;
c. means for modifying the set of reference component signal values according to the combined factors produced in step b to produce the records of a reproduced input signal.
50. An apparatus as in claim 49 further comprising a storage medium containing at least one of the load signals and score signals.
51. An apparatus as in claim 49, wherein the storage medium also contains the reference component signal values.
52. An apparatus as in claim 49 further comprising means for receiving at least one of the load signals and score signals from a remote location over a communications medium.
53. The apparatus of claim 52 wherein the reference component signal values are also received over the communications medium.
54. An apparatus for editing an encoded signal composed of a plurality of component signals in different domains to an input signal comprised of data samples organized into records of multiple samples, with each sample occupying a unique position within its record, said encoded signal represented as a combination of a plurality of factors, each factor being the product of a score signal and a load signal, the score signal defining the variation of data samples from record to record and the load signal defining the relative variation of a subgroup of samples in different positions of a record, said apparatus utilizing a reference pattern of samples, comprising:
a. means for modifying at least one score signal to achieve desired editing;
b. means for multiplying each load signal by its associated modified score signal to produce each factor;
c. means for combining the factors produced in step b;
d. means for modifying the set of reference component signal values according to the combined factors produced in step c to produce the records of a reproduced input signal.
55. A system comprising a reading apparatus and a data carrier containing data and adapted to be decoded according to the method of any one of claims 28-32.
56. A system comprising a recording apparatus and a data carrier containing an encoded signal produced by the method of any one of claims 1-28.
57. A system comprising a reading apparatus and a data carrier comprising data and adapted to be decoded by the apparatus of any one of claims 49-53.
58. A system comprising a recording apparatus and a data carrier containing an encoded signal produced by the apparatus of any one of claims 34-48.

59. A system comprising a recording apparatus, a data carrier and a reading apparatus, wherein the data carrier contains an encoded signal produced according to the method of any one of claims 1-28 and adapted to be decoded by the method of any one of claims 28-32.
60. A system comprising a recording apparatus, a data carrier and a reading apparatus, wherein the data carrier contains an encoded signal produced by the apparatus of any one of claims 34-48 and adapted to be read by the apparatus of any one of claims 49-53.
61. A data carrier containing data recorded thereon and adapted to be decoded by the method of any one of claims 28-32.
62. A data carrier containing an encoded signal produced by the method of any one of claims 1-28.
63. An apparatus producing a transmitted signal containing an encoded signal produced by the method of any one of claims 1-28.
64. The encoded signal produced by the method of any one of claims 1-28 provided on one of a storage medium and a transmission medium.
OA60791A 1993-09-08 1996-03-07 Method and apparatus for data analysis OA10269A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
NO933205A NO933205D0 (en) 1993-09-08 1993-09-08 Data representation system

Publications (1)

Publication Number Publication Date
OA10269A true OA10269A (en) 1997-10-07

Family

ID=19896406

Family Applications (1)

Application Number Title Priority Date Filing Date
OA60791A OA10269A (en) 1993-09-08 1996-03-07 Method and apparatus for data analysis

Country Status (10)

Country Link
EP (1) EP0748562A4 (en)
JP (1) JPH09502586A (en)
CN (1) CN1130969A (en)
AP (1) AP504A (en)
AU (1) AU693117B2 (en)
CA (1) CA2171293A1 (en)
NO (1) NO933205D0 (en)
OA (1) OA10269A (en)
WO (1) WO1995008240A2 (en)
ZA (1) ZA946904B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
NO942080D0 (en) * 1994-06-03 1994-06-03 Int Digital Tech Inc Picture Codes
EP0815536A1 (en) * 1995-03-22 1998-01-07 IDT INTERNATIONAL DIGITAL TECHNOLOGIES DEUTSCHLAND GmbH Method and apparatus for coordination of motion determination over multiple frames
WO1999007157A1 (en) * 1997-07-28 1999-02-11 Idt International Digital Technologies Deutschland Gmbh Method and apparatus for compressing video sequences
JP4224748B2 (en) * 1999-09-13 2009-02-18 ソニー株式会社 Image encoding apparatus, image encoding method, image decoding apparatus, image decoding method, recording medium, and image processing apparatus
US8600132B2 (en) * 2011-05-03 2013-12-03 General Electric Company Method and apparatus for motion correcting medical images
CN102360214B (en) * 2011-09-02 2013-03-06 哈尔滨工程大学 Naval vessel path planning method based on firefly algorithm
CN104794358A (en) * 2015-04-30 2015-07-22 无锡悟莘科技有限公司 Parameter estimation and fitting method for collecting supporting point frequency in vibrating wire mode
US11609353B2 (en) * 2017-09-26 2023-03-21 Schlumberger Technology Corporation Apparatus and methods for improved subsurface data processing systems
CN109064445B (en) * 2018-06-28 2022-01-04 中国农业科学院特产研究所 Animal quantity statistical method and system and storage medium
US20220237532A1 (en) * 2019-06-29 2022-07-28 Sameer Phadke System and Method for Modelling and Monitoring Processes in Organizations Using Digital Twins
CN111913866A (en) * 2020-08-19 2020-11-10 上海繁易信息科技股份有限公司 Method for monitoring equipment model data abnormity in real time and electronic equipment
CN112906650B (en) * 2021-03-24 2023-08-15 百度在线网络技术(北京)有限公司 Intelligent processing method, device, equipment and storage medium for teaching video
US11887222B2 (en) 2021-11-12 2024-01-30 Rockwell Collins, Inc. Conversion of filled areas to run length encoded vectors
US11915389B2 (en) 2021-11-12 2024-02-27 Rockwell Collins, Inc. System and method for recreating image with repeating patterns of graphical image file to reduce storage space
US12002369B2 (en) 2021-11-12 2024-06-04 Rockwell Collins, Inc. Graphical user interface (GUI) for selection and display of enroute charts in an avionics chart display system
US11954770B2 (en) 2021-11-12 2024-04-09 Rockwell Collins, Inc. System and method for recreating graphical image using character recognition to reduce storage space
US11842429B2 (en) 2021-11-12 2023-12-12 Rockwell Collins, Inc. System and method for machine code subroutine creation and execution with indeterminate addresses

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4394774A (en) * 1978-12-15 1983-07-19 Compression Labs, Inc. Digital video compression system and methods utilizing scene adaptive coding with rate buffer feedback
US4717956A (en) * 1985-08-20 1988-01-05 North Carolina State University Image-sequence compression using a motion-compensation technique
US4786967A (en) * 1986-08-20 1988-11-22 Smith Engineering Interactive video apparatus with audio and video branching
US5136659A (en) * 1987-06-30 1992-08-04 Kokusai Denshin Denwa Kabushiki Kaisha Intelligent coding system for picture signal
US5150432A (en) * 1990-03-26 1992-09-22 Kabushiki Kaisha Toshiba Apparatus for encoding/decoding video signals to improve quality of a specific region
EP0449478A3 (en) * 1990-03-29 1992-11-25 Microtime Inc. 3d video special effects system
JP3040466B2 (en) * 1990-07-17 2000-05-15 ブリテイッシュ・テレコミュニケーションズ・パブリック・リミテッド・カンパニー Image processing method
DE69222102T2 (en) * 1991-08-02 1998-03-26 Grass Valley Group Operator interface for video editing system for the display and interactive control of video material
US5392072A (en) * 1992-10-23 1995-02-21 International Business Machines Inc. Hybrid video compression system and method capable of software-only decompression in selected multimedia systems

Also Published As

Publication number Publication date
CA2171293A1 (en) 1995-03-23
EP0748562A1 (en) 1996-12-18
JPH09502586A (en) 1997-03-11
AU7871794A (en) 1995-04-03
WO1995008240A2 (en) 1995-03-23
AU693117B2 (en) 1998-06-25
WO1995008240A3 (en) 1995-05-11
ZA946904B (en) 1995-05-11
CN1130969A (en) 1996-09-11
AP504A (en) 1996-07-01
AP9400673A0 (en) 1994-10-31
EP0748562A4 (en) 1998-10-21
NO933205D0 (en) 1993-09-08

Similar Documents

Publication Publication Date Title
US5983251A (en) Method and apparatus for data analysis
OA10269A (en) Method and apparatus for data analysis
EP1016286B1 (en) Method for generating sprites for object-based coding systems using masks and rounding average
CA2432741C (en) Transformation block optimization
US6075875A (en) Segmentation of image features using hierarchical analysis of multi-valued image data and weighted averaging of segmentation results
USRE37668E1 (en) Image encoding/decoding device
EP0888592B1 (en) Sprite coding and decoding
US5692063A (en) Method and system for unrestricted motion estimation for video
US5854856A (en) Content based video compression system
CA2205177C (en) Mosaic based image processing system and method for processing images
US5790269A (en) Method and apparatus for compressing and decompressing a video image
WO1995006297A1 (en) Example-based image analysis and synthesis using pixelwise correspondence
US5485212A (en) Software video compression for teleconferencing
US6757441B1 (en) Image data encoding/decoding method and apparatus
Francois et al. Coding algorithm with region-based motion compensation
Agrawala et al. Model-based motion estimation for synthetic animations
JPH08149461A (en) Moving image processor and method therefor
JPH08149458A (en) Moving image processor
JPH08161505A (en) Dynamic image processor, dynamic image encoding device, and dynamic image decoding device
Karczewicz et al. Robust B-spline image modeling with application to image processing
Indra Very low bit rate video coding using adaptive nonuniform sampling and matching pursuit
JPH07107486A (en) Method for detecting hierarchical moving vector
JPH08153210A (en) Encoding device and decoding device for moving image
Komatsu et al. Global motion segmentation representation for advanced digital moving image processing
Chen et al. Method and apparatus for processing both still and moving visual pattern images