METHOD AND APPARATUS FOR VIDEO STREAM ANALYSIS
BACKGROUND OF THE INVENTION
FIELD OF THE INVENTION
The present invention relates to video processing in general, and to a method and system for the determination of areas of high and low significance within frames in a video stream in particular. Further, the present invention relates to a method and apparatus for inserting external content and visual effects in the low significance areas of frames in the video stream.
DISCUSSION OF THE RELATED ART
The present invention describes a method and system for determining areas of low and high significance within frames in a video stream. Identification of objects and separation of areas within a video frame is considered a very complicated and resource-demanding computational task. Numerous applications become possible in respect of video streams once areas of low and high significance in the video frames are identified. Such applications can include content insertion, visual effect placement, dynamic interaction with specific objects in the frames of the video stream, and the like. In the field of image and video processing there are several known techniques for segmentation of images and recognition of patterns and objects. A basic approach to such analysis is to identify an object within an image by identifying a pattern that would indicate the presence of the object within the image. Digital images and video frames are comprised of a large number of pixels, each represented by its position and intensity. Analysis methods that find correlations and relationships between groups of pixels within an image (with relation to their position and intensity) can provide a means to determine the composition of the image with respect to objects displayed in it. There are numerous methods
for image segmentation, pattern recognition and computer vision. Each group of methods stems from a common concept, either mathematical or in terms of the processing algorithm. None of the known analysis method groups is viable for the solution of the high-low significance problem as we approach it. Image segmentation methods generally employ statistical analysis (employing statistical distribution functions), frequency domain analysis or edge extraction systems that mostly use derivative functions. These methods are basically two-dimensional and do not make use of the multi-frame data in video sequences. Known pattern recognition methods are mostly based on a filter-based analysis of the image (i.e. Gaussian or Laplacian filtering) and matching processes that equate two image segments or templates to each other and minimize the error value. Computer vision methods try to imitate the process of human visual processing on a computer. One system in the art proposes to "teach" a computing device how certain objects may look and the manner of "appearance" of such objects within the image frame. Such a system is described in U.S. patent 5,546,476, which discloses a method by which the identity of an object is determined by obtaining an image of the objective shape, obtaining from that image an optimum objective shape representative of the object, and comparing the optimum objective shape with a known shape model of the object. The shape model is represented as a plurality of nodal points interconnected by line segments. The optimum objective shape is selected from a number of objective shapes each comprised of a row of dots having the same number of nodal points as the shape model.
Each objective shape is compared with the shape model on the basis of (i) the proximity of the dots of the objective shape to the line segments of the shape model, (ii) the length of the dot segments with respect to the lengths of respective line segments, and (iii) the inclinations of the dot segments with respect to the inclinations of the line segments. The objective shape having the closest similarity to the shape model is selected as the optimum objective shape to be compared with the shape model for identifying
the object. This system tests the tolerance of the examined image against known or predetermined shapes and attempts to match objects within an image to the "known" images.
Known methods have proved inadequate when attempting to separate images into areas of high and low significance. Such methods do not provide accurate results in the separation process and require substantial computing time and resources. In addition, systems such as those described above focus on the single image as a data source. Such systems do not take into account the object's movement as a result of a video stream input. In respect of video streams having a plurality of images the methods of the prior art are not practical for on-line recognition of objects because a large number of images must be processed in a very short period of time. Moreover, an analysis based on an image by image basis fails to take into account the stream of images and the changes occurring therewithin. Therefore, previous solutions have been inadequate for identifying areas of low and high significance within frames in a video stream.
SUMMARY OF THE PRESENT INVENTION
It is an object of the present invention to provide a novel method and apparatus for determination of the areas of low and high significance within frames in a video stream. One aspect of the present invention regards a method for determining areas of significance within a plurality of images whereby the method is executed on a computer system, the method comprising the steps of receiving a video stream having at least two images represented by pixels associated with the images on the time axis vector; analyzing the images within the video stream to determine the areas of significance within the images; and applying a threshold value to the at least one analysis frame, thus determining the areas of significance within the images.
Another aspect of the present invention regards a method for analyzing the images within the video stream to determine the areas of significance within the images, the method further comprising creating a pixel vector array corresponding to the images represented by the pixels associated with the images on the time axis vector, the array storing intensity values for pixels within the images; saving the pixel vector array; applying at least one function to determine the pattern of changes within the pixel vector array, generating at least one analysis frame; determining the threshold value of the at least one analysis frame; and inserting content in the areas of low significance within the images, or inserting functional display effects in the areas of low significance within the images, or interacting with areas of significance within the images. A further aspect of the present invention regards the receiving of compensation data associated with the image-taking device; calculating the motion of the image-taking device; and adjusting the data associated with the images to correspond to the motion of the image-taking device.
A second aspect of the present invention regards a method for determining areas of significance within the plurality of images, the method being stored on a computer readable medium, the medium being executed on a computer system, the method comprising the steps of receiving images on the time axis vector; creating a vector pixel array for each pixel location corresponding with the intensity value of each pixel within an image on the time axis; receiving a plurality of intensity values corresponding to pixels on the time axis; applying a function or functions to the plurality of the intensity values; creating an analysis frame corresponding to each function's results; normalizing each analysis frame; determining the weight for each of the analysis frames; combining the analysis frames; determining a threshold value; adjusting the threshold value to result in a threshold percentile corresponding to the percentage of pixels within the analysis frame that have the first predetermined value; and applying the threshold value to the analysis frame. Another aspect of the present invention regards the method for applying a function or functions to the plurality of the intensity values, the method comprising the steps of applying a first function for identifying the behavior of each pixel in the plurality of images; applying a second function for identifying the behavior of each pixel in the plurality of images; applying a third function for identifying the behavior of each pixel in the plurality of images; and applying a fourth function for identifying the behavior of each pixel in the plurality of images. The first function calculates the standard deviation of the pixel vector array. The second function calculates the kurtosis variation of the pixel vector array. The third function calculates the entropy variation of the pixel vector array. The fourth function approximates the pixel vector array with a like parabola substitute function.
The fourth function is performed twice to avoid the loss of fast moving objects. The functions applied can be based on the standard deviation function, the Kurtosis function, the Least Squares method, and the Maximum Entropy Method.
Yet another aspect of the present invention regards the method for normalizing each analysis frame, the method comprising the adjustment of the function results representing a set of measurements according to a transformation function, thus making the results comparable with specific points of reference. The user can predetermine the weight for each of the analysis frames. The determining of a threshold value can comprise calculating the threshold value on the basis of the image intensity values. The determining of a threshold value can alternatively comprise the steps of receiving the base value for the threshold; examining each pixel within the analysis frame to determine if the pixel is above the threshold value; setting each pixel whose value exceeds the base value for the threshold to a first predetermined value; setting each pixel whose value does not exceed the base value for the threshold to a second predetermined value; applying a filter for cleaning isolated pixels having the first predetermined value; and determining the percentile threshold corresponding to the percentage of pixels within the analysis frame having the first predetermined value. A user of the system can predetermine the base value for the threshold. The first predetermined value can be one and the second predetermined value can be zero.
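The normalization and weighted combination described above can be sketched as follows. This is a minimal illustration, assuming a min-max transformation as the normalization step; the description above does not fix a specific transformation function, and the frame values and weights shown are illustrative only.

```python
# Sketch: normalize several analysis frames to a common scale, then
# combine them per pixel according to user-chosen weights.

def normalize(frame):
    """Min-max transformation (one possible normalization) to the range 0..1."""
    values = [v for row in frame for v in row]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero on flat frames
    return [[(v - lo) / span for v in row] for row in frame]

def combine(frames, weights):
    """Weighted per-pixel sum of equally sized analysis frames."""
    height, width = len(frames[0]), len(frames[0][0])
    out = [[0.0] * width for _ in range(height)]
    for frame, w in zip(frames, weights):
        for y in range(height):
            for x in range(width):
                out[y][x] += w * frame[y][x]
    return out

# Illustrative analysis frames produced by two different functions.
a = normalize([[0, 50], [100, 25]])    # becomes [[0.0, 0.5], [1.0, 0.25]]
b = normalize([[-5, 5], [0, 5]])       # becomes [[0.0, 1.0], [0.5, 1.0]]
combined = combine([a, b], [0.7, 0.3])
```

Because each frame is first brought to a common scale, the weights alone control how strongly each analysis function influences the combined frame.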
Yet another aspect of the present invention regards the method for applying a filter, the method comprising the steps of examining each pixel within the analysis frame for the number of pixels having the first predetermined value in the pixel's immediate vicinity; setting the pixel to the first predetermined value if the number of pixels having the first predetermined value in the pixel's immediate vicinity is greater than two; and setting the pixel to the second predetermined value if the number of pixels having the first predetermined value in the pixel's immediate vicinity is smaller than two. Applying the threshold value can comprise the steps of setting each pixel whose value exceeds the threshold value to the first predetermined value; and setting each pixel whose value does not exceed the threshold value to the second predetermined value.
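The neighbour-counting filter above can be sketched as follows. This is an assumed reading: the "immediate vicinity" is taken here as the 8 surrounding pixels, the first predetermined value as 1 and the second as 0, and since the description leaves the count == 2 case unspecified, this sketch keeps such pixels unchanged.

```python
# Sketch: clean isolated pixels and reinforce pixels inside larger
# objects, per the neighbour-count rule described above.

def neighbourhood_filter(frame):
    height, width = len(frame), len(frame[0])
    out = [row[:] for row in frame]  # count neighbours in the original frame
    for y in range(height):
        for x in range(width):
            count = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dx == dy == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width and frame[ny][nx] == 1:
                        count += 1
            if count > 2:
                out[y][x] = 1  # pixel sits inside a larger object
            elif count < 2:
                out[y][x] = 0  # isolated pixel, treated as noise
            # count == 2: unspecified in the text; left unchanged here
    return out

frame = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
]
filtered = neighbourhood_filter(frame)
```

Note that the filter reads neighbour counts from the original frame while writing into a copy, so earlier decisions in the scan do not influence later ones.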
A third aspect of the present invention regards a video analysis apparatus for determining areas of significance within the plurality of images, the apparatus being stored on a computer readable medium and executed by a computer system, the apparatus comprising a video input handler for receiving a plurality of images and creating pixel vector arrays; a pixel vector analysis module for applying functions to the pixel vector arrays; a thresholding function module for determining the threshold values for creating analysis frames and applying threshold values to the analysis frames; and a filtering component for filtering the pixels within the analysis frames. The apparatus can further comprise a frame handler. The frame handler can comprise a content insertion module and an effect insertion module for inserting content and effects in the areas of low significance within the frames in the video stream. The content insertion module inserts rich media in the areas of low significance in the frames of a video stream. The content can be advertisement, special effects, information, and the like.
A fourth aspect of the present invention regards a method wherein the step of applying comprises the steps of applying a first function for identifying the behavior of each pixel vector in the plurality of images; applying a second function for identifying the behavior of each pixel vector in the plurality of images; applying a third function for identifying the behavior of each pixel vector in the plurality of images; and applying a fourth function for identifying the behavior of each pixel vector in the plurality of images. The first function can calculate a value based on the Standard Deviation of the pixel vector array. The second function can calculate a value based on the Kurtosis variation of the pixel vector array. The third function can calculate a value based on the Entropy variation of the pixel vector array. The fourth function can approximate the pixel vector array with a like parabola substitute function based on the Least Squares method. The fourth function can be performed twice to avoid the loss of fast moving objects.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will be understood and appreciated more fully from the following detailed description taken in conjunction with the drawings in which:
Fig. 1 is a flowchart of the method for determining the areas of low and high significance within frames in a video stream, in accordance with a preferred embodiment of the present invention;
Fig. 2 is a schematic illustration of a video stream pixel vector used in the determination of areas of low and high significance within frames in a video stream, in accordance with the preferred embodiment of the present invention;
Fig. 3 is a model graph showing the intensity levels of one pixel on a time frame axis, in accordance with the preferred embodiment of the present invention;
Fig. 4A is a model array showing the intensity level values of three pixels represented on the time frame axis, in accordance with a preferred embodiment of the present invention;
Fig. 4B is a graph showing the values of the array of Fig. 4A on a time frame axis, in accordance with a preferred embodiment of the present invention; Fig. 4C is a three dimensional representation of the three pixels of
Fig. 4A within a frame cube on a time frame axis, in accordance with a preferred embodiment of the present invention;
Fig. 5 is a flowchart showing the pixel vector analysis of Fig. 1, in accordance with the preferred embodiment of the present invention;
Fig. 6A is a flowchart showing the thresholding of Fig. 1, in accordance with the preferred embodiment of the present invention;
Fig. 6B is a flowchart showing the thresholding identification routine of Fig. 6A, in accordance with the preferred embodiment of the present invention;
Fig. 7 is a pictorial illustration of the operation of the morphological filter of Fig. 3, operative in accordance with the preferred embodiment of the present invention;
Fig. 8 is a schematic illustration of the system operative in accordance with the preferred embodiment of the present invention;
Fig. 9 is a schematic illustration of the operation of a video analysis and content insertion embodiment, in accordance with the preferred embodiment of the present invention;
Fig. 10 is a pictorial illustration combined with a flowchart showing the resulting video frame images at each stage of the video analysis and content insertion, in accordance with the preferred embodiment of the present invention; Fig. 11 is a pictorial illustration of the resulting video frame images after the analysis functions and thresholding and filtering stages, in accordance with the preferred embodiment of the present invention;
Fig. 12 is a pictorial illustration of the content placement, in accordance with the preferred embodiment of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
The present invention overcomes the disadvantages of the prior art by providing a novel method and apparatus for determining areas of low and high significance within a plurality of images comprising a video stream. Such operation serves in one preferred embodiment to separate the background from the foreground objects within the image. Various applications, such as content insertion, visual effect insertion, interactive service objects inserted into the areas of low significance, and the like, can be used once the areas of low and high significance are established in respect of the images comprising a video stream. Video devices such as video cameras and the like obtain such images.
Each image comprises a plurality of pixels. For example, in television screens employing the PAL system there are usually 293,750 pixels divided in 625 rows and 470 columns. Each pixel is identified by its location in the image (X,Y position) and its intensity value (I). Thus a pixel is traditionally described as Pixel(X, Y, I). The intensity value represents the intensity of the light emitted by a display device such as a monitor when the pixel is shown. The combination of pixels' intensity in an image creates the picture seen by the observer.
Images taken by a video-capturing device are processed to determine, via analysis based on time axis vectors, the areas of low and high significance within the images. Low and high significance in accordance with the present invention is determined through the identification of the behavior of pixels on a time vector within multiple frames of the video stream. Detected areas identified as high significance are marked as foreground, while areas identified as low significance are marked as background. Areas of high significance can include active elements such as a person moving about, and areas of low significance can include passive elements such as a wall, furniture, a fan or other inanimate objects and the like. The method and apparatus of the present invention are used in conjunction with a computing device such as a personal computer (PC) to analyze the areas within an image and to assess the relative significance of such
areas. Identification of areas and objects within frames in a video stream can be extremely beneficial. Such identification allows, for example, the insertion of content such that the content will not impinge on the significant areas of the images constituting the video stream shown. The preferred computing device in accordance with the present invention is a personal computer having therewithin a central processing unit (CPU), a memory device, a video input/output device (such as an I/O video card by Optibase Inc. of San Jose, California, Dazzle Inc. of Fremont, California or Pinnacle Systems of Mountain View, California) and an operating system, such as Windows developed by Microsoft Corp. of Seattle, Washington or Unix developed by Sun Inc. of Palo Alto, California. Other computing devices having a CPU or a number of CPUs and one or more memory devices can also be used in conjunction with the present invention. It should be understood that the preferred computing device is not intended to illustrate one specific computer but rather serves as an example. The elements of the computing device are not limited to what is shown and may be varied in ways known in the art. I/O video devices to be used can include one or more channels of information through which video stream input may be processed. For the purpose of simplicity and to facilitate a better understanding of the present invention the process employed with one channel is described here below. Such description is not intended to limit the scope of the present invention.
Reference is now made to Fig. 1 showing a flowchart of the method for determining the areas of low and high significance of an image within a video stream. A video stream that includes a plurality of images along a time axis is received in step 4. The video stream input can be in any analog or digital format.
If the video stream input is analog then the video stream is converted from analog to digital by means of a common analog to digital converter. The video stream input comprises a plurality of images, each image comprising pixels formed in rows and columns. Color images contain multiple pixel values for
color representation. In accordance with the preferred embodiment of the present invention, a predetermined block of images within the video stream is processed and analyzed. The block of images is defined as a cube of images or simply as a cube. In the preferred method all the images within a cube are processed. Optionally, a number of selected images within the cube are processed. Such partial processing can increase the speed of analysis. Preferably about twenty to thirty images are processed at a time (as a single cube). Twenty or thirty images normally account for about one second of video stream time. A cube has the capacity to hold any number of images. The images within the cube are processed along the time / frame vector. Thus, for each pixel location within the time / frame vector a pixel vector array is created. The pixel vector array is an array storing, for each pixel within an image, the intensity value on a time axis. Such array can be represented as A(t), where A stands for the array and t stands for the time segment (or time index) of each image or frame number. Intensity values (also representing intensity levels) within each pixel vector array vary according to the intensity of each pixel. The array is further described in detail in association with Figs. 3, 4A, 4B and 4C. If the image contains color data, such data is transformed into the associated intensity (gray scale) value. Such value is stored in the pixel vector array for each pixel. The pixel vector array is saved onto a memory device for analysis as will be described below. Optionally, in step 6 data is received to compensate for the motion of the image-taking device. For example, if a handheld video camera device is used to acquire the images received, it is likely that such device was not stable during the time the images were captured. Receiving information about the movement of the picture-taking device can compensate for the movement of such a device during the image capture process.
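The construction of pixel vector arrays from a cube can be sketched as follows. This is a minimal illustration under an assumed data layout (each grayscale frame as a list of rows of intensity values); the frame contents are illustrative only.

```python
# Sketch: for each pixel location (x, y), collect that pixel's intensity
# in every frame of the cube, producing the pixel vector array A(t).

def build_pixel_vectors(cube):
    """Return a dict mapping (x, y) -> intensities along the time axis."""
    height = len(cube[0])
    width = len(cube[0][0])
    vectors = {}
    for y in range(height):
        for x in range(width):
            vectors[(x, y)] = [frame[y][x] for frame in cube]
    return vectors

# Example: a cube of three 2x2 grayscale frames.
cube = [
    [[10, 20], [30, 40]],
    [[12, 20], [31, 90]],
    [[11, 20], [29, 10]],
]
vectors = build_pixel_vectors(cube)
print(vectors[(1, 1)])  # intensity history of pixel (1, 1): [40, 90, 10]
```

In this example pixel (0, 1) of each frame barely changes (a background-like vector), while pixel (1, 1) swings widely between frames, the kind of behavior the analysis functions below are designed to detect.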
In addition, in the motion compensation process step small camera motions that are the result of the camera operator's aiming process are negated.
Small movements on the horizontal and vertical axes can interfere with the analysis process and are therefore discarded. Such small motion information is
referred to as motion estimation information as it provides information about the movement of the image-capturing device. The motion estimation information will also be saved onto the memory device for further analysis. In step 8 the pixel vector analysis is performed. The pixel vector analysis will be further described in detail below in conjunction with Fig. 5. In general, in the pixel vector analysis step at least one (and preferably between three and ten) function or functions are used to determine the pattern of changes within the pixel vector arrays. Each function generates one result for each pixel vector array, generating at least one and up to about ten analysis frames. In the preferred embodiment of the present invention and in the example shown, four different functions are used to determine the significance of each pixel within the image cube. If one function is used only one analysis frame is created. An analysis frame that includes the results for each pixel in the image is obtained from function(A(t), A(t+1), ..., A(t+n)). It will be appreciated that other similar functions may be employed to obtain like results and that more than ten functions may be used. Thresholding and filter analysis is performed next in step 12. The step of thresholding and filter analysis is further described in association with Figs. 6A, 6B and 7. In general, in this step the threshold value of intensity of each pixel within an analysis frame is determined. First the results received are normalized. Then, the threshold for the analysis frames is determined. The determination of the threshold value is further detailed below as it may take more than one form. In the preferred embodiment of the present invention two methods are used separately to find the threshold value. Next, if there is more than one analysis frame, such frames are combined by a process of addition in accordance with the respective weight of each analysis frame.
Furthermore, a morphological filter is applied to the analysis frames to further filter "non-objects" within the image examined and to better define the boundaries of the significant areas within the image. The filter operation is designed to eliminate small ("unimportant") objects such as noise, quality spots, highly transient objects and the like. The filter operation also enhances the larger
objects identified, thus enhancing the high significance areas of the image. Once the filter has been applied to the analysis frame the threshold value is determined and the threshold process is performed. The threshold process divides the pixels in the analysis frame into either a pixel having a value of 0 or a pixel having a value of 1. Those pixels having a value of more than the threshold are set to the value of 1 (depicted in black) and those having a lower value than the threshold are set to 0 (depicted in white). Once the threshold value is set and the analysis frame is transformed, the areas having a value of 1 are considered high significance areas. Other areas are considered of low significance. There are a large number of applications that can make use of the method and apparatus disclosed herein. For the purpose of example three applications are enumerated in association with steps 12, 14 and 16 of Fig. 1. Optionally, in step 12 diverse content may be inserted in the areas of low significance. By insertion of the content in the areas of low significance, areas of high significance are not blocked from sight. For example, during a sports event broadcast shown on television, the movement of the players on the field is likely to create the result that the bottom part of the image is of high significance. Thus, advertisement, commercials and the like can be inserted in the upper area of the image. Similarly and optionally, in step 14 various functional display effects or programs such as e-mail services, chat areas, television set controls and like other interactive services can be inserted in the areas of low significance. Moreover, another optional use of the method and apparatus of the present invention can be the interaction with an input device that interacts with the areas of high significance. A game can be played by the user on the areas of low significance while still viewing those areas of high significance.
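The threshold process described above can be sketched as follows, with 1 marking high significance and 0 marking low significance; the frame values and threshold shown are illustrative only.

```python
# Sketch: binarize an analysis frame against a threshold value, and
# report the resulting threshold percentile (share of pixels marked 1).

def apply_threshold(analysis_frame, threshold):
    """Pixels above the threshold become 1 (high significance), else 0."""
    return [[1 if v > threshold else 0 for v in row] for row in analysis_frame]

def threshold_percentile(binary_frame):
    """Percentage of pixels within the frame marked as high significance."""
    values = [v for row in binary_frame for v in row]
    return 100.0 * sum(values) / len(values)

frame = [[0.1, 0.9], [0.6, 0.2]]
mask = apply_threshold(frame, 0.5)   # [[0, 1], [1, 0]]
print(threshold_percentile(mask))    # 50.0
```

The percentile figure is what the threshold adjustment described earlier works against: raising the threshold shrinks the share of pixels marked as high significance, lowering it grows that share.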
Similarly, the game itself may interact with the areas of high significance. For example, such games can include the well-known "breakout" game in which the ball interacts with the area of high significance, and the like. Persons skilled in the art will appreciate that many other uses and applications can be applied to the present method and apparatus, all of which are contemplated by the present invention.
The method of determining the areas of low and high significance within the image will now be described in greater detail. Turning now to Fig. 2 there is shown a schematic illustration of a video stream pixel vector selection operation. In order to determine the areas of low and high significance within the image the pattern of each vector of pixels is examined. In the present description the terms image and frame are used interchangeably; both refer to a two-dimensional array of pixels representing a recorded representation of a visual object. It is noted that color frames contain additional color values. The significance value of each pixel along the vector is shown in Fig 2. Images 40, 42, 44, 46, 48 are received along time axis 70. Preferably, each consecutive image received is processed. For each pixel location vector an array is created indicating the values of that pixel in each of the different frames on the time / frame axis. In the example shown frames 42, 44, 46, 48 are examined. Values of pixel 80 (shown as values 52, 54, 56, 58 in each of frames 42, 44, 46, 48) are collected along the image coordinates X,Y of pixel 80. Each value 52, 54, 56, 58 represents the pixel 80 gray scale intensity (converted into gray scale in case of a color image) associated with the specific frame 42, 44, 46, 48 representing the time frame axis. Thus, the value 52 of pixel 80 can be 1 when examined in association with frame 42 and 10 when examined in association with frame 46 (shown as value 56) at the same coordinates of pixel
80. The values collected in points 52, 54, 56, 58 represent the pixel vector array.
The pixel vector array can also be represented as a graph as seen in Fig. 3. The graph that emerges from the different values indicates the type of activity that took place in the portion of the video stream images examined (42,
44, 46, 48 and the like). The activity is indicated on the intensity axis 92 as it changes along the time frame axis 90. In the present example of Fig. 3, a vector of 15 pixel values is examined. The 15 images together are defined as a cube. In a possible 15-frame-per-second video stream the cube will represent about a one-second video stream. Accordingly, vector 91 holds information on the activity that took place within the last second of the video stream images for pixel 80. Two peaks 94, 96 are shown in the graph. Each represents heightened levels of intensity value for pixel 80. The values for each pixel are stored in a pixel vector array representing the various values of pixel intensity along a time axis.
In Fig. 4A an array 100 shows the pixel intensity level values represented on the time frame axis 104 for pixels 102, 102', 102". Unlike in the previous example, values of three pixel vector arrays 102, 102', 102" are shown. A pixel vector array is created for each pixel vector in the cube. Each is represented by an array. A standard digital image contains a large number of pixels for each frame, thus requiring a several megabits per second (Mbps) throughput rate for standard video rates of 15 to 30 frames a second. As can be seen from Fig. 4B the values of the arrays 102, 102', 102" of Fig. 4A can be represented by a graph on a time frame axis showing the three pixel vector arrays' values. A three dimensional representation of the three pixel vector arrays 102, 102', 102" of Fig. 4A is shown in Fig. 4C. The cube 102 represents the 15 frames stacked together. Rectangles 102, 102', 102" represent the three pixel vector arrays on the time / frame axis. Bit value 105 represents one pixel's information along the time / frame axis, having a single intensity value for pixel 102 at a particular point in time. To determine areas of low and high significance at least one (and preferably more) statistical functions are applied to each pixel vector array. Review of pixel intensity along the time axis will show that pixels of less significance are likely to show a pattern of moderate or no change, or of continuous or repetitive changes. A pattern of moderate or no change detected in a pixel vector array indicates that the pixel examined is part of a fixed or moderately repetitious part of the image (such as background or an inanimate object) and is thus afforded a low significance. A pattern of continuous or
substantial repetitive changes is likely to indicate that the pixel examined is not "organic" but rather mechanical in nature (such as a fan). Patterns that do not meet the above behavior are considered of high significance. The pattern of behavior is determined by the use of functions applied to the pixel vector arrays. Thus the specific function would apply to values 52, 54, 56, 58 of Fig. 2 in the following manner: func(value_1, value_2, ..., value_n, value_n+1).
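The arrangement described above — a per-pixel intensity history taken along the time axis of a stacked frame cube — can be sketched as follows. This is a minimal illustration only, not the patented implementation; the function name and the use of NumPy are assumptions.

```python
import numpy as np

def build_pixel_vectors(frames):
    """Stack N grayscale frames (each H x W) into a cube and expose,
    for every pixel (x, y), its intensity history along the time axis."""
    cube = np.stack(frames, axis=0)      # shape (N, H, W)
    # Moving the time axis last yields one length-N vector per pixel.
    return np.moveaxis(cube, 0, -1)      # shape (H, W, N)

# 15 frames of a 4x4 image, i.e. about one second at 15 fps.
frames = [np.full((4, 4), k, dtype=np.uint8) for k in range(15)]
vectors = build_pixel_vectors(frames)
print(vectors.shape)      # (4, 4, 15)
print(vectors[0, 0])      # intensity history of pixel (0, 0)
```

Each `vectors[x, y]` is then a pixel vector array to which the statistical functions below can be applied.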
Reference is now made to Fig. 5 showing a flowchart of the pixel vector analysis of Fig. 1. The pixel vector analysis employs at least one function in determining the significance of each pixel vector array (such as 102') in the processed cube. In the preferred embodiment four functions: (1) a like standard deviation 110, (2) a like kurtosis 112, (3) a like maximum entropy 114, and (4) a like parabola substitute 116 are used to determine the behavior of pixel 80 of Fig. 2. A person skilled in the art will appreciate that several other functions having a like result can be used to determine the pattern of behavior of each pixel vector array. The preferred functions are described next. In the following functions N represents the length of the input pixel vector array examined. The history vector V_xy of the pixel vector array contains the sequential intensity values of pixel coordinates (x, y) in the cube. Function 110 can be a function similar to the standard deviation function, wherein the standard deviation of N is

Std = sqrt((1/N) * Σ_{k=1..N} (V_xy(k) - Mean)²)

The deviation of N is calculated from the statistical pattern of the values of each pixel in the N images. The results received by the like standard deviation function are in the region of about 0 - 100. A larger deviation value indicates that there is significant change in the area of the image where pixel 80 is located. Function 112 can be a function similar to the kurtosis function, whereby the kurtosis variation of N is obtained by

Krt = (1/N) * Σ_{k=1..N} ((V_xy(k) - Mean) / Std)⁴ - 3

The results received by the like kurtosis function are in the region of about -5 to +5. Function 114 can be similar to an entropy function whereby the entropy variation of N is obtained by
n(k) = floor((V_xy(k) - min V_xy) / δ)

where ε = 0.001, δ = K_bin · σ is the bin size, K_bin = 0.5 is the bin coefficient and σ is the Std of the representing image. Each calculated n is inserted in an initially zeroed array A in the following form: A_n = A_n + 1. The resulting entropy value of the whole vector is calculated in the form

E = - Σ_n (A_n + 1) log(A_n + 1)
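The three like-functions described so far can be sketched in a few lines. This is a hedged illustration assuming NumPy; the function names are my own, and folding ε into the bin size (so that a static vector still has a positive bin width) is an assumption.

```python
import math
import numpy as np

def std_significance(v):
    """Like standard deviation (function 110): deviation of the history vector."""
    return float(np.std(v))

def kurtosis_significance(v):
    """Like kurtosis (function 112): fourth standardized moment minus 3."""
    v = np.asarray(v, dtype=float)
    sigma = v.std()
    if sigma == 0:
        return -3.0                       # perfectly static pixel
    return float(np.mean(((v - v.mean()) / sigma) ** 4) - 3.0)

def entropy_significance(v, k_bin=0.5, eps=1e-3):
    """Like entropy (function 114): bin the history with bin size
    delta = k_bin * sigma (eps keeps delta positive - an assumption),
    then E = -sum((A_n + 1) * log(A_n + 1)) over the bin counts A_n."""
    v = np.asarray(v, dtype=float)
    delta = k_bin * v.std() + eps
    counts = np.bincount(np.floor((v - v.min()) / delta).astype(int))
    return -sum((a + 1) * math.log(a + 1) for a in counts)

static = [100] * 15                       # background-like history
spiky = [100] * 14 + [255]                # single spike in the history
print(std_significance(static))           # 0.0 -> low significance
print(kurtosis_significance(spiky) > 0)   # True: heavy-tailed history
print(entropy_significance(spiky) < 0)    # True: entropy results are negative
```

Note how the static vector scores low on every measure, while the spiked vector scores as heavy-tailed, consistent with the significance behavior the text describes.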
The results received from the like entropy function are negative. Another function 116 can be a function similar to a parabola substitute (PS) function, whereby the vector V_xy is approximated with a parabola by the least-squares method.
y_k = a + b·k + c·k²

The coefficients are obtained by Cramer's rule, a = Δ_a / Δ, b = Δ_b / Δ, c = Δ_c / Δ, with the determinants formed from the sums

S_y = Σ_{k=1..N} v_k,  S_yx = Σ_{k=1..N} v_k·k,  S_yxx = Σ_{k=1..N} v_k·k²,

S_x = N(N - 1) / 2,  S_xx = N(N - 1)(2N - 1) / 6.

The PS parameter is calculated as the standard deviation of the result of the parabola in the following form:

PS = sqrt((1/N) Σ_{k=1..N} (v_k - (a + b·k + c·k²))²)
Optionally, in order to avoid losing fast moving objects the PS function is performed twice: once for history length N and a second time for K_history·N, where the input parameter K_history < 1. The result value is determined as the maximal of both PS functions performed. The result value is in the region of about 0 - 100.
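The parabola substitute, including the optional double pass with K_history, might be sketched as follows. This is an illustration, not the patented code: `np.polyfit` stands in for the Cramer's-rule solution above, and the minimum window of three points is my own guard.

```python
import numpy as np

def parabola_substitute(v):
    """Fit v_k ~ a + b*k + c*k^2 in the least-squares sense and return
    the standard deviation of the residuals (the PS parameter)."""
    v = np.asarray(v, dtype=float)
    k = np.arange(len(v))
    coeffs = np.polyfit(k, v, 2)             # least-squares parabola
    residuals = v - np.polyval(coeffs, k)
    return float(np.std(residuals))

def ps_significance(v, k_history=0.5):
    """Run PS on the full history and on the most recent k_history
    fraction of it, keeping the larger value so fast movers are kept."""
    n = len(v)
    short = v[-max(3, int(k_history * n)):]  # >= 3 points for a parabola
    return max(parabola_substitute(v), parabola_substitute(short))

smooth = [k * k for k in range(15)]          # perfectly parabolic history
print(round(parabola_substitute(smooth), 6))  # 0.0 -> low significance
```

A history that is itself parabolic (smooth motion or a gradual intensity ramp) yields a PS value near zero, while noisy or erratic histories leave large residuals.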
Each function provides one result representing a significance value for the N pixels examined in each pixel vector. For each function an analysis frame is created holding the results of the function for all pixel vectors in one image cube; the result of the analysis of one cube by one function is one analysis frame. In the present example, four analysis frames are received, one from each of the function 110, 112, 114, 116 operations. Such frames basically provide four illustrations of the cube images put together as affected by the functions. The results are recorded in the memory device as I_std, I_Krt, I_ent and I_PS, where I denotes an analysis frame. Thus, the entropy analysis frame (I_ent) received will include values resulting from applying function 114 to the various pixel values (such as 52, 54, 56, 58 of Fig. 2). The statistical functions measure the distribution and the relation to a Gaussian normal curve and thus evaluate the behavior of the particular vector over time. Entropy is a direct assessment of the grouping factor and order of the pixel values in the time domain. The parabola substitution is a means to correlate a polynomial descriptor to the data for better evaluation of the overall behavioral activity without the interference of noise and rogue pixel values.
Once the various functions have been applied and the analysis frames created, the frames are normalized in step 120. Normalization is defined as the adjustment of a series or a vector of values, typically representing a set of measurements, according to some transformation function in order to make the results comparable with some specific points of reference. Each analysis frame (I_std, I_Krt, I_ent, I_PS and I_n) is normalized separately. The following operations are used to normalize each pixel in the analysis frame. The analysis frame is represented as input Q, each pixel is represented as q_ij, where i and j represent the coordinates of the relative pixel. The result of the normalization is represented as Q~:

Q~ = (Q - M_Q) / σ_Q

where σ_Q is the two-dimensional Std of the image Q and M_Q is the two-dimensional mean of the image Q. The inversed normalized image Q~_inv is determined as the normalized image of an inversed image Q_inv, whose elements are calculated as follows:

q_inv_ij = 1 / q_ij
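Step 120 can be sketched as follows. This is a minimal NumPy illustration; the small constant added in the inverse image to avoid division by zero is an assumption on my part.

```python
import numpy as np

def normalize_frame(q):
    """Normalize an analysis frame Q by its two-dimensional mean and
    standard deviation: Q~ = (Q - M_Q) / sigma_Q."""
    q = np.asarray(q, dtype=float)
    return (q - q.mean()) / q.std()

def inverse_normalized_frame(q, eps=1e-6):
    """Normalized image of the inversed image whose elements are 1/q_ij;
    eps guards against zero-valued pixels (an assumption)."""
    q = np.asarray(q, dtype=float)
    return normalize_frame(1.0 / (q + eps))

q = np.array([[1.0, 2.0], [3.0, 4.0]])
q_norm = normalize_frame(q)
print(abs(q_norm.mean()) < 1e-9, abs(q_norm.std() - 1.0) < 1e-9)  # True True
```

After this step every analysis frame has zero mean and unit deviation, which is what makes the weighted combination below meaningful.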
In the present example, each of the four analysis frames I_std, I_Krt, I_ent, I_PS is normalized. It will be evident that any other analysis frames require normalization as well. It will also be readily appreciated that like methods for normalizing the analysis frames can be employed to achieve a like result. In step 124 weight determination of each image analysis frame (I_std, I_Krt, I_ent, I_PS and I_n) is made. The weight of each image analysis frame is predetermined by the user. In the preferred embodiment all analysis functions are of equal weight: the weight for each function is 25% assuming there are 4 functions, or 20% if there are 5 functions, resulting in an equal distribution of weight irrespective of the function used. Other weight distributions can be employed, including a non-equal
distribution. The weight parameters are inserted by defining K_std, K_Krt, K_Ent, K_PS, K_n, wherein the combined function weight is 1:

K_std + K_Krt + K_Ent + K_PS + K_n = 1

Next the analysis frames are combined in step 126. The combined image I_w is obtained by a simple addition function relative to the weight distribution:

I_w = K_std·I_std + K_Krt·I_Krt + K_Ent·I_ent + K_PS·I_PS + K_n·I_n

The image I_w will next be used for the purpose of applying the threshold value. While the threshold value can be calculated and applied to one analysis frame such as I_w, other analysis frames can be used to further tune the thresholding process. In the present example the PS analysis frame (I_PS) is provided with a different threshold and later combined with the I_w analysis frame. The function of the thresholding step 8 of Fig. 1 is to apply one threshold value to the values of analysis frame I_w. The value determined as the threshold determines whether each pixel in the analysis frame going through a threshold is defined as significant or not. In essence, the analysis frame I_w is flattened to create only two values in the image - significant (1) or not significant (0).
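The weighting and addition of steps 124-126 amount to a weighted sum of the normalized analysis frames. A sketch under the equal-weight assumption of the example (names are illustrative):

```python
import numpy as np

def combine_frames(frames, weights):
    """I_w as the weighted sum of the analysis frames; the weights
    are required to total 1, as in the text."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[name] * frame for name, frame in frames.items())

# Four equal-weight frames (25% each), as in the preferred embodiment.
weights = {"std": 0.25, "krt": 0.25, "ent": 0.25, "ps": 0.25}
frames = {name: np.full((2, 2), k) for k, name in enumerate(weights)}
i_w = combine_frames(frames, weights)
print(i_w)    # every pixel is 0.25 * (0 + 1 + 2 + 3) = 1.5
```

Unequal weights simply shift how much each function's view of the cube contributes to I_w.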
In the present example, the threshold value for I_w is obtained through the use of the kurtosis analysis frame I_Krt. The kurtosis analysis frame and function is the preferred frame and function for obtaining the threshold value because of its unique mathematical properties. Step 118 represents the steps in Fig. 6, which can take place at any time after the kurtosis and the PS functions have been applied to the pixel vector arrays and the analysis frames created. Thus, the determination of the base threshold values, as will be described next, can take place at any time after the relative functions used for such determination have been applied to the pixel vector arrays. It will be appreciated by those skilled in the art that other analysis frames and like functions can be used for determining the base threshold values.
Fig. 6A shows a flowchart block diagram showing the thresholding of Fig. 1 and further explains the method for accomplishing the thresholding operation of step 118. While only one method for obtaining a threshold may be used, the present example shows two methods for obtaining a threshold value to be applied to an analysis frame. In accordance with the first method for finding a threshold, in step 130 the base predetermined value for the analysis is received. In accordance with the present example, in step 130 the kurtosis function analysis frame result (I_Krt) is received. The base threshold is set to about -0.9.
The base threshold T is saved to the memory device. Next in step 131 each pixel within the I_Krt analysis frame is examined. If the examined pixel exceeds the threshold the pixel is defined as 1 (indicating high significance), otherwise it is set to 0 (indicating low significance). The result is a flattened I_Krt image analysis frame having values of either 1 or 0. Next in step 132 the transformed I_Krt image analysis frame is passed through a morphological filter that is similar to a common "close" operation. The morphological filter is further described in association with Fig. 7. The flattened I_Krt image analysis frame is filtered in order to clean out "noise". Noise is generated as a result of camera (or like devices) errors and transmission problems. Pixels that show spiked behavior due to systematic sampling error or a transmission interference pattern will register a significance value and pass through the threshold filter. However, such pixels will be isolated and are not part of an image feature or object. A morphological filter is then used to 'clean out' single detached seemingly significant pixels and at the same time enhance the appearance of well grouped pixels. Each pixel 150 within the flattened I_Krt analysis frame is checked for the number d of positive neighbor pixels (having a value of 1), including itself. The examined pixel is set to the value of 1 if d>2, or optionally if d is larger than a different pre-selected filter parameter. Otherwise the value of the examined pixel is set to 0. The result is a filtered image analysis frame 132. For example, pixel 152 has only one positive pixel in its immediate surrounding, including itself. It is therefore set to 0 (pixel 156). In contrast pixel 156 has three positive pixels in its immediate surrounding (including itself). It is therefore set to 1 (pixel 158). In step 134 the number of positive pixels is obtained from the I_Krt analysis frame. The filtered image analysis frame is checked to determine what percentage of pixels in the analysis frame is positive. In step 136 the percentage value is stored as the base threshold value under the condition that an analysis frame going through thresholding keeps the same percentage of positive pixels. At this stage the number (percentage) of significant pixels from the kurtosis thresholding and filter operation is stored. Other thresholding operations are performed next using the pixel count of the kurtosis analysis. For example, if the kurtosis analysis found 150 significant pixels (which are 1.9% of the total pixels in the image) then all other analysis frames will be put under a threshold that produces the same amount of significant pixels, resulting in a kurtosis percentile threshold - only a certain top percent of the data gets through. Next in step 137 the threshold for other analysis frames such as I_std, I_ent, I_n is determined based on the kurtosis percentile threshold. In step 138 it is determined if an additional thresholding will take place in accordance with the operator's predetermined instructions. Reference is made to Fig. 6B showing the steps taken to determine the threshold value for other analysis frames based on the base threshold obtained in step 136 of Fig. 6A. In the current preferred embodiment the base threshold is deduced from the kurtosis analysis. A person skilled in the art will readily appreciate that other functions can be substituted. In step 139 the number of positive pixels determined in step 134 of Fig. 6A is retrieved from the memory device. The count of positive pixels is preferably received from the threshold base frame. Next, in step 140 the analysis frame for thresholding is retrieved from the memory device. In step 141 the base threshold T is applied to the retrieved
analysis frame in a similar manner, as is described in step 131 of Fig. 6A. Then, the number of positive pixels having a value of 1 is determined through a simple count operation (step 142). At step 143 it is determined if the number of positive pixels is equal to the number of positive pixels obtained in step 134 of Fig. 6A. If so, then the base threshold value T is correct and in step 144 the threshold operation is set as final. Otherwise the base threshold value T is increased and steps 141, 142, 143 are repeated. Finally, in step 145 the morphological filter described in detail in conjunction with step 132 of Fig. 6A and with Fig. 7 is applied to the analysis frame. At the end of this process the predetermined analysis frames have gone through the thresholding step. As noted previously, an additional threshold may optionally be used. For example, in the present embodiment the PS analysis frame is processed differently. Fig. 6A serves as a similar reference for the steps taken to obtain the PS threshold; however, different calculations are used. In step 130 the PS analysis frame threshold base is obtained through the following operations. The threshold value is calculated according to the image's intensity:
[Equation images imgf000026_0001 and imgf000026_0002 of the original filing: formulas selecting the PS base threshold T from the predetermined input parameters according to whether the image intensity level Y_3% exceeds the parameter Y_T.]
where Y_3% is an intensity threshold which is exceeded by about 3% of the pixels of the representing image, and Y_T and T_P represent predetermined input parameters. Next in step 131 the threshold T is applied by applying the threshold operation to the PS analysis frame in the manner described above in conjunction with the kurtosis base threshold application (step 131), resulting in analysis frame I_PS. Next in step 132 the morphological filter is applied as described above.
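The neighbor-count rule of the morphological filter (steps 132 and 145 above) can be sketched as follows. This is a slow but explicit illustration; the function name and the zero-padding at the frame border are my own assumptions.

```python
import numpy as np

def close_like_filter(binary, d_min=2):
    """Count the positive pixels d in each pixel's 3x3 neighborhood
    (the pixel itself included); the pixel becomes 1 if d > d_min,
    else 0. Isolated pixels are cleared, grouped ones reinforced."""
    h, w = binary.shape
    padded = np.pad(binary, 1)           # zero border (an assumption)
    out = np.zeros_like(binary)
    for i in range(h):
        for j in range(w):
            d = padded[i:i + 3, j:j + 3].sum()
            out[i, j] = 1 if d > d_min else 0
    return out

noisy = np.zeros((6, 8), dtype=int)
noisy[0, 7] = 1                          # isolated pixel: d = 1, cleared
noisy[3:5, 3:5] = 1                      # 2x2 group: d = 4 each, kept
filtered = close_like_filter(noisy)
print(filtered.sum())                    # 4: only the grouped pixels survive
```

The effect matches the text: spiked, detached pixels drop out while well-grouped pixels pass through.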
Steps 134, 136, 137, 138 are not performed, as a final filtered analysis frame I_PS has now been obtained. In the present example we now have in the memory storage device two analysis frames: analysis frame I_PS and analysis frame I_w. Next the two analysis frames are combined by simple addition. It will be clear to the person skilled in the art that such addition will not be required if only one threshold value is determined and applied, as is clearly one of the embodiments of the present invention.
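The loop of steps 141-143 above — raising the base threshold until the positive-pixel count matches the count stored from the kurtosis frame — can be sketched as follows. The step size and the "at most the target" stopping rule are my own simplifications for continuous-valued frames, where an exact count match may not exist.

```python
import numpy as np

def match_threshold(frame, target_count, t_base=0.0, step=0.01):
    """Increase the threshold from the base value until no more than
    target_count pixels remain positive (steps 141-143, simplified)."""
    t = t_base
    while (frame > t).sum() > target_count:
        t += step
    return t

rng = np.random.default_rng(0)
frame = rng.random((10, 10))            # a stand-in analysis frame
t = match_threshold(frame, target_count=10)
print((frame > t).sum() <= 10)          # True: the kurtosis count is kept
```

This is the "kurtosis percentile threshold" in practice: every frame is cut so that only the same top fraction of its data gets through.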
Reference is now made to Fig. 8, which shows a schematic illustration of the operation of the video stream analysis and content handler apparatus 160. Apparatus 160 is preferably a computer program written in a computer programming language such as C++, linked, compiled and loaded into the memory device of a computer having at least one CPU and a video I/O device having at least one channel. It will be appreciated by those skilled in the art that any computer language that allows low level system access (i.e. device control and optimal processor operations) can be employed and that dedicated electronic devices that perform the same task are contemplated herein. While apparatus 160 is loaded into the memory device it is processed and executed by the CPU to achieve the functionality and perform the method of the present invention as described above. Apparatus 160 comprises two main modules: the video analysis module 162 and optionally the frame handler 180. Video analysis module 162 comprises video input handler 164, pixel vector analysis functions 166, thresholding function module 168 and a filtering component 170. Video input handler 164 receives video input and creates the pixel vector arrays. Pixel vector analysis functions module 166 applies the statistical functions described in association with Fig. 5 and creates analysis frames. Thresholding function module 168 calculates the threshold values and creates the "flattened" analysis frame while applying the threshold values. Filter functions module 170 filters each pixel in the analysis frames through the
morphological filter described in association with Figs. 6A, 6B and 7. Frame handler 180 is an optional component. Frame handler 180 may comprise any application to handle the video output received. Frame handler 180 can comprise a content insertion and effect insertion module 190. The content insertion and effect insertion module 190 effectively enables the user to insert content or effects in the areas of low significance within frames in the video stream. The insertion of the video content or effects is performed online, as seen from the example application shown in Figs. 9, 10, 11 and 12.
Numerous applications can make use of the method described above. Disclosed below is but one example of a preferred application of the method and apparatus described above. Reference is now made to Fig. 9, showing another preferred embodiment of the present invention and one possible application of the method shown. The application shown is content insertion. Content insertion concerns the combining of rich media such as images, animation, video and the like into the area of low significance within the video frame. Such content can be advertisement, information requested by users, messages, other programs, and any other input which may be combined with the video stream and processed in accordance with the method of the present invention. It will be apparent to the person skilled in the art that effect insertion and interactive handling of areas of low and high significance can be readily used in like manner. In step 200 a video stream comprising a plurality of images is received by video input handler 164 of Fig. 8. In step 202 external content such as advertisement, special effects or any other content is received by the content placement / treatment component 190 of Fig. 8. Next, in step 204 the process of determining the areas of low and high significance takes place by the pixel vector analysis functions 166, thresholding function module 168 and filtering component 170 of Fig. 8. The inserted content is placed in an area of low significance by the content placement component 190. Finally, in step 206
an integrated video stream is produced as output to be delivered to a display device such as a television set, a monitor and the like.
Turning now to Fig. 10, there is shown a pictorial illustration combined with a flowchart showing the resulting video frame images at each stage of the video analysis and content insertion. In step 220 the raw video image is received. For example, the raw video image 230 can include a close-up of a person standing in a room. In step 222 the areas of low and high significance are determined. The image 232 shows the different areas. Area 240 is determined to be of high significance, while area 242 is determined to be of low significance. In step 224 an image placement area allocation is obtained. The area 244 in image 234 is located through a simple placement of the required area on the image 234. To compensate for the motion of objects in the video frame, the area allocated for the inserted image is repositioned in the frame as the video stream progresses. If, in a video frame in the example shown, the area for the inserted image 244 falls on an area of high significance, the area 244 is repositioned until it is situated within the area of low significance (possibly the lower left area of the frame 242') and the area of high significance 240' is not disturbed. In step 226 the image 246 is placed into the area allocated in step 224 in image 236. In step 228 an integrated image 238 containing the content 248 is delivered as output to a user's display. It will be appreciated by those skilled in the art that the content insertion may be replaced by like operations.
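A placement search of the kind described in step 224 — sliding the required content rectangle until it lies wholly within the low-significance area — might look like the following sketch. The function name and the first-fit scan order are assumptions.

```python
import numpy as np

def find_low_significance_slot(mask, h, w):
    """Scan the flattened significance mask (1 = high significance)
    for the first h x w window free of significant pixels; return its
    top-left corner, or None if the frame offers no such slot."""
    rows, cols = mask.shape
    for i in range(rows - h + 1):
        for j in range(cols - w + 1):
            if mask[i:i + h, j:j + w].sum() == 0:
                return (i, j)
    return None

mask = np.zeros((6, 8), dtype=int)
mask[0:4, :] = 1                       # high significance fills the top
print(find_low_significance_slot(mask, 2, 3))   # (4, 0): a lower-left slot
```

Re-running the search on each frame gives the repositioning behavior described above: when the slot collides with the high-significance area, the scan simply lands on a different low-significance region.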
Making reference to Fig. 11 there is shown a pictorial illustration of the resulting video analysis frames after the analysis functions and thresholding and filtering stages. In step 250 the analysis functions are applied to the pixel vector arrays. Frame analysis 254 shows a graphical illustration of a sample image that represents the various values of the frame analysis after one function was applied to the pixel vector arrays of the image. Bar 252 shows the various result values received. In step 256 the thresholding and filtering step takes place.
A sample image 262 that represents the "flattening" of the image 254 as a result
of step 256 is shown. The area of low significance 260 is shown in white, while the area of high significance 258 is shown in black. Reference is now made to Fig. 12 showing a pictorial illustration of the content placement. Image 270 is the final analysis frame received showing the areas of low and high significance. Image overlay 90 is provided to placement function 49, which identifies the area of low significance and inserts content 285. Content 285 is reserved a margin 287 and the image as a whole is also provided with a margin 286.
It will be appreciated by persons skilled in the art that the present invention is not limited to what has been particularly shown and described herein above. Rather the scope of the present invention is defined only by the claims which follow.