MXPA99008509A - Real-time method of digitally altering a video data stream to remove portions of the original image and substitute elements to create a new image - Google Patents
Real-time method of digitally altering a video data stream to remove portions of the original image and substitute elements to create a new image
Info
- Publication number
- MXPA99008509A MXPA/A/1999/008509A MX9908509A
- Authority
- MX
- Mexico
- Prior art keywords
- original image
- image
- video
- video data
- data
- Prior art date
Abstract
A method that allows the real-time replacement of the designated background portion (54) of an incoming video signal with an alternate background (72). The method utilizes the actual background image for reference (48) as the basis for determining the background and foreground elements within the image with the end result being comparable to traditional blue-screen processes, such as chroma-key and ultimatte technology, but requiring only a personal computer, video camera and the software. In this case, however, the reference background image can be any reasonably static scene with a sufficient and stable light source captured by the camera. The video data stream is modified in real-time by comparisons against the reference background image and is then passed onto its original destination. Multiple signal-noise processing algorithms are applied in real-time against the signal to achieve a visually acceptable matte.
Description
REAL-TIME METHOD OF DIGITALLY ALTERING A VIDEO DATA STREAM TO REMOVE
PORTIONS OF THE ORIGINAL IMAGE AND SUBSTITUTE
ELEMENTS TO CREATE A NEW IMAGE
Background of the Invention
Field of the Invention
The present invention relates in general to the processing of digital images, and in particular to a system and method for altering a stream of video data in real time, to remove portions of the original image and substitute elements in order to create a new image, without using traditional blue screen techniques.
Description of the Related Art
In the motion picture and video industries, two or more images are often combined into a single scene. For example, an image of a meteorologist can be combined with another image of a weather map to show the meteorologist standing in front of the map. This technique of combining images is achieved primarily through the use of a "blue screen" process, where an image is photographed against a solid blue background, and the second image is substituted in place of the blue background. This substitution can be done electronically or through optical photographic techniques.
Various improvements have been made to the "blue screen" process of the prior art. U.S. Patent No. 4,968,132 discloses a traveling matte process to create male or female mattes that can be altered or corrected frame by frame in a computer, and that can be used to create special effects in conventional cinematography and in video recordings without the need for a blue screen background. In addition, in U.S. Patent No. 4,800,432, a video difference key generator has a stored reference video image. An input video image is compared to the reference video image by means of an absolute difference circuit, which subtracts corresponding pixels of the two video images, the smaller from the larger, to produce a difference video image. The difference video image can be filtered, and then input to a transfer function circuit to produce an output that can be used as a key signal for the compositing of video images. There would be a significant advantage to these background replacement methodologies if the laborious and time-consuming functions could be performed in real time, if the analysis of the video frames could provide greater understanding of the composition of the images within each video frame, and if the use of blue screen techniques could be avoided entirely.
SUMMARY OF THE INVENTION
The present invention simplifies the process of removing the background scene from a video image and replacing it with an alternative background. A simple personal computer can be used instead of the complex computer systems of the prior art. In the present invention, a series of video frames (or a single frame) produced by a stationary video camera is captured. These images, and their slight variations from frame to frame, such as lighting, color, shadow, subtle movements, and the normal variations produced by the video camera itself, are passed from the camera to the computer, where they are analyzed by software to produce a mathematical description of the video, in real time, as it is captured by the computer. The mathematical description of the captured video is then used to analyze new video segments of the same scene, in order to determine whether changes have taken place in the scene. The elements of the new video that fall within a previously established tolerance relative to the original mathematical description are defined as background. The elements of the new video that fall outside the previously established tolerance are defined as foreground, and can be isolated from the background. This understanding and isolation of the different foreground and background video elements allow modifications to be made to the new video stream. For example, the background can be changed to an entirely new image, while the foreground remains unchanged. The background image can be a moving video, a bitmap, or an animation, as desired. Therefore, the functionality of traditional blue screen processes is achieved without using a blue screen. Accordingly, it is an object of the present invention to remove the background image from a live video scene, in real time, through the use of a software-only programming mechanism, employing a mathematical description of the elements of the live video scene rather than the traditional blue screen processes, and to replace the background image, in real time, with an alternative background image, while retaining the original foreground elements.
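To make the tolerance test described above concrete, the following is a minimal sketch, under stated assumptions, of how each pixel of a new frame could be classified as background or foreground against a stored reference frame. The frame representation, the function name `classify_frame`, and the `tolerance` parameter are illustrative assumptions, not part of the disclosed method.

```python
# Minimal sketch: per-pixel tolerance test against a reference frame.
# Frames are lists of rows of (r, g, b) tuples; names and tolerance are illustrative.

def classify_frame(reference, frame, tolerance=30):
    """Return a mask: True where the pixel is foreground, False where it is background."""
    mask = []
    for ref_row, row in zip(reference, frame):
        mask_row = []
        for (r0, g0, b0), (r, g, b) in zip(ref_row, row):
            # A pixel within the tolerance of the reference view is background.
            diff = abs(r - r0) + abs(g - g0) + abs(b - b0)
            mask_row.append(diff > tolerance)
        mask.append(mask_row)
    return mask
```

In practice the tolerance would not be a single fixed constant; it would be adjusted, manually or automatically, to absorb camera noise and lighting drift, as the description below explains.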
Therefore, the present invention relates to a method for digitally altering, in real time, a live video scene, with a computer system having a memory, a visual display, and a stationary video camera connected to the computer, in such a way that the video signals from the camera pass to the computer, where an altered video scene is formed. This is done by first digitally capturing, and then mathematically describing, one or several frames of the live video scene, hereinafter referred to as the "reference view", in a first data structure in the memory. Next, each subsequent frame of the live video scene is digitized and captured by the computer, with each new frame mathematically described by the software and stored in a second data structure in memory. In real time, these first and second data structures are compared using multiple signal-to-noise algorithms, well known to those skilled in the art, and the background image from the reference view is mathematically removed from the freshly captured frame. Additionally, at this stage, since a mathematical description of a different background image is available in a third data structure in the memory, such as recorded video, a bitmap, or animation, it can be substituted into the second data structure in place of the background image removed from the reference view, thus creating a new mathematical description of the digitized frame. Finally, the mathematical description of the frame is converted back to a video signal, and displayed on the visual display, or transferred to any suitable destination, such as a video conference participant or a capture file. Therefore, the output of this process gives the appearance that any foreground elements in the original video scene (the reference view) are superimposed on a different background, creating a new image.
BRIEF DESCRIPTION OF THE DRAWINGS
These and other features of the present invention will be more fully disclosed when taken in conjunction with the following detailed description of the preferred embodiments, wherein like numerals represent like elements, and wherein: Figures 1(A)-(D) are representations of the visual display screen when it shows the reference view, the black (empty) screen that appears when the reference view has been mathematically removed, the black screen with the detected foreground elements, and a replacement view that replaces the reference view, with the foreground elements forming a new image. Figure 2 is a schematic representation of the manner in which the screens of Figure 1 are obtained. Figure 3 is a representation of the mathematical Boolean exclusive OR (XOR) operation that matches duplicate bit values between two matching sets of Boolean data, and which represents a reference view stored in a standard digital data format, the reference view then being digitally compared with the data representing the live video scene, to leave only the data that is not present in both scenes. Figure 4 is a flow diagram illustrating the novel steps of the present invention.
Detailed Description of the Invention
Figures 1(A)-(D) represent the basic steps in the novel process of the present invention. In Figure 1(A), a reference view is captured and displayed on the visual display of the computer. A reference view is defined as a mathematical description of a finite sequence of digitized video frames, which is stored in a data structure at a location in the memory of the computer.
It is used as the representative video scene for the separation of foreground and background elements from the subsequently captured digital video frames. A reference view may be comprised of one or many digitized video frames, depending on the selected algorithm. The reference view is then mathematically removed from each subsequently captured frame of the same scene, by making comparisons against the reference view. Adjustments are made to the parameters of the algorithm (either manually or automatically by the software) until the visual display screen is completely empty (the color black was selected in this case, but it could easily have been white or any other color), meaning a complete real-time removal of the reference view from the digitized video stream, as illustrated in Figure 1(B). The visual display screen shown in Figure 1(C) demonstrates the effect achieved when a person moves into the scene, as captured by the video camera. Because the person was not a part of the reference view, the person is considered by the software process to be a foreground element, and appears superimposed in front of the black background. The entire scene, with the exception of the person, is mathematically removed from the digitized video stream in real time by the software. Figure 1(D) demonstrates the ability to replace the removed reference view with an alternative view. A computer graphic, animation, or video can be used to digitally replace the reference view, giving the appearance that the person is in front of the alternative view. Accordingly, the results shown in Figure 1(D) demonstrate the manner in which the foreground and background elements have been digitally reconfigured, resulting in a new image. Figure 2 illustrates the apparatus of the present invention for creating the new image. The reference view 10 includes a desk 12 and a chair 14, defined as the visual image captured by the camera 16. The reference view must be free of unnecessary movement, and must be lit by stable and strong global illumination for the best effect. The camera 16 must be mounted in a stable position, and connected to a personal computer 18. The personal computer 18 will include the appropriate software and video hardware required by the camera once it is installed and operating. Video software may include operating systems, video drivers, compressors, decompressors, and software applications, such as video conferencing or video editing software. The personal computer 18 digitizes the captured reference view, stores it in a first location of its memory schematically represented by unit 19, and displays it. The personal computer 18 contains a software system running in its random access memory (also schematically represented by unit 19). The software system captures the video signal from the camera 16, frame by frame, as indicated as part of the reference view. The following frames are captured as indicated by reference numeral 20. To achieve superior results, the scene being viewed by camera 16 must be reasonably free of movement and well lit. The reference phase 20 of the software constructs a set of data structures, well understood by those skilled in the art, containing values that represent the scene, the dynamics of the illumination, and the variations in pixel values caused by the camera's receivers. The reference phase can be adjusted to allow an optimal capture of the scene.
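As a rough illustration of the kind of per-pixel statistics such reference data structures might hold, the sketch below accumulates a mean and variance for each pixel over several reference frames. The structure and the name `build_reference_model` are assumptions for illustration only, not the patent's actual data structures.

```python
# Minimal sketch: build per-pixel statistics over N reference frames.
# The (mean, variance) per channel structure is an illustrative assumption,
# not the patent's actual data structures.

def build_reference_model(frames):
    """frames: list of frames, each a list of rows of (r, g, b) tuples.
    Returns per-pixel (means, variances) over the reference frames."""
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    model = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            samples = [frame[y][x] for frame in frames]
            means = tuple(sum(c[i] for c in samples) / n for i in range(3))
            variances = tuple(
                sum((c[i] - means[i]) ** 2 for c in samples) / n for i in range(3)
            )
            # The variance captures camera noise and lighting flicker at this pixel,
            # and can later widen the matching tolerance automatically.
            model[y][x] = (means, variances)
    return model
```

A per-pixel variance of this kind is one plausible way the matching tolerance could be widened automatically where the camera is noisiest.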
The reference scene 10 is displayed on the computer monitor 22, to allow the operator to make easy adjustments. The software also allows automatic adjustment. Phase 24 represents the removal phase of the software system, which mathematically removes the reference view from the captured video signal, thus leaving an empty view 27 (represented here by the color black) on the monitor 22 of the computer. The removal phase requires multiple-pass processing with signal-to-noise algorithms (well known in the art) against the data representing each captured frame of the video scene, in order to create a visually acceptable empty view 27. "Visually acceptable" is defined as a single solid-color video image without ripples (black was selected here). The replacement phase 28 of the software allows the replacement, in real time, of an alternative background image on the resulting video signal. The replacement scene 32 is also stored in another location of the computer memory, also schematically represented by the unit 19, and may be a moving video, a bitmap, or an animation. Any type of visual image or set of images can be placed over the empty view. The replacement image is mapped pixel by pixel onto the empty pixels that remain when the reference view is removed. Because the pixels are mapped from a coordinate system identical to that of the reference view, the replacement image is displayed as expected.
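The pixel-by-pixel mapping just described can be pictured with the following sketch, which copies replacement pixels only into positions the removal phase marked as empty; the function name, the mask representation, and the frame format are illustrative assumptions rather than the disclosed implementation.

```python
# Minimal sketch: map a replacement image into the positions left empty
# by background removal. 'mask' is the foreground mask from the removal
# phase; names and the frame format are illustrative assumptions.

def composite(frame, replacement, mask):
    """Return a new frame: foreground pixels kept, background pixels replaced."""
    out = []
    for y, row in enumerate(frame):
        out_row = []
        for x, pixel in enumerate(row):
            # Because frame and replacement share the same coordinate system,
            # the same (x, y) indexes both images.
            out_row.append(pixel if mask[y][x] else replacement[y][x])
        out.append(out_row)
    return out
```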
Then the new replacement scene 32 is displayed on the computer monitor 22, as shown. Now the operator 34 enters the view of the camera, adding a new element to the captured video scene 33. The video scene is captured by the same camera 16. The input video signal, representing the video scene 33, is stored in another location of the computer memory in the unit 19, and is displayed on the monitor 22 of the computer 18. This signal passes to the matting phase, as illustrated by numeral 40, and is processed in such a way that only the mathematical difference, within adjustable tolerances, between the live scene 33 and the original reference view 10 (in this case, the person 34) is displayed over the replacement view 32, transforming it into the new image. The new image displayed on the visual display monitor 22 includes the alternative scene 32 and the added person 34. The best results are achieved if the operator is not wearing colors that correspond directly to the colors directly behind the operator in the reference view. Such a correspondence is similar to the blue screen processes, and can cause a bleed effect. However, unlike the blue screen processes, certain parameters within the software (because it has an understanding of the visual elements inside the scene) can account for a percentage of the bleed effect, and remove it. As previously noted, the alternative scenes that can replace the reference view are easily swapped in and out of the video stream. The process scales very well up to full color video, although there is a correspondingly larger demand on the central processing unit of the PC, due to the higher color data requirements. A moving video background can be substituted into the live video screen, giving the appearance of an office boardroom. Figure 3 illustrates a simplistic Boolean exclusive OR (XOR) configuration, known to those skilled in the art, that matches the duplicate bit values between two matching sets of binary data, and that is used conceptually in the present process to obtain an empty view. Consider the eight bits in row A as the stored reference view, and the eight bits in row B as a frame subsequently captured from the input video. If the eight bits in row B were identical to the eight bits in row A, and were subjected to an exclusive OR operation, the output would be all zeros, and an empty frame would be generated. Since row B illustrates the eight bits of the video scene 33 as shown in Figure 2, and is displayed on the visual display of computer 22, by performing an exclusive OR operation between rows A and B, row C is obtained. Note that data exists, and is displayed, only where there has been a change in the video scene compared to the reference view. Therefore, the only information that is displayed is the change in the data represented by bits 3, 5, and 8. Therefore, in summary, the reference view is captured in row A and stored in a standard digital data format. Then an exclusive OR operation is applied to the data representing a live video scene of the same view, shown in row B. The common data present in both scenes can then be subtracted from each frame of the live video. In a perfect world, this would be the only operation necessary to achieve a perfect matte effect.
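The conceptual exclusive OR of Figure 3 can be reproduced in a few lines. The bit patterns below are hypothetical stand-ins chosen so that, as in the figure, only bits 3, 5, and 8 differ; they are not the actual values of rows A and B.

```python
# Minimal sketch of the conceptual XOR of Figure 3: identical bits cancel to 0,
# leaving 1s only where the live frame differs from the reference view.
# The bit patterns are hypothetical stand-ins, not the actual figure values.

row_a = 0b10110010  # stored reference view (hypothetical)
row_b = 0b10011011  # live frame differing at bit positions 3, 5, and 8 (hypothetical)

row_c = row_a ^ row_b  # non-zero bits mark the positions that changed
print(f"{row_c:08b}")  # prints 00101001 -> only the changed positions survive
```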
However, in reality, a whole series of advanced signal-to-noise algorithms must be applied against the data multiple times to achieve this separation of the foreground and background elements, due to variations in illumination and in the shadows over time, to the subtle movements in the reference view, and to the effects of digital quantization by the CCD video camera on the pixels between successive frames. The iterative use of this series of algorithms is well known to those skilled in the art. Figure 4 illustrates the novel process of the present invention. Accordingly, the process begins at step 46. In step 48, data representing live video are routed from an input device, such as a video camera, to a computing device, such as a PC. Inside the PC, they are converted to a digital representation of the analog data (if this has not already been done by the camera), and moved to a representative data structure in memory storage. This is the reference view that is captured. In decision step 50, it is determined whether the appropriate scene has been captured. If not, the process is routed along 52 back to step 48, until the appropriate scene is captured. This is determined visually by the software operator. When the appropriate scene has been captured, in step 54, using the data from the reference view initially stored in step 48, a series of algorithms is applied against the digitized frames captured from the live video feed. The algorithms try to match the chrominance and luminance values for the pixels at the corresponding positions within each frame with the corresponding chrominance and luminance pixel values initially stored as the reference view in step 48. When a match is determined, that pixel position within the frame is marked as empty. Due to variations in lighting, shadows, movement, and the quantization effect when analog data are converted to digital data, the pixels corresponding to the same view within the following frames may vary in their values. Therefore, data structures are created that represent all this information, and they are maintained by the software in the memory. Subsequently, well-known sophisticated signal processing algorithms, or "filters", are applied against the input video data, to precisely identify pixels as matches between the frames, and thus mark them as empty. In this way, the reference view captured in step 48 is removed from the captured video scene. This process is repeated for each frame captured from the input video data. In decision step 56, it is determined whether the reference view is completely removed from the captured video. The degree of background removal can be adjusted manually or automatically by the software, to remove the maximum amount of the reference view from the live video feed. If the scene is not completely removed, the process returns along 58 back to step 54. If the reference view is sufficiently removed, as determined by the software or by the operator, the process moves along 60 to step 64, where the scene is varied, as by a user entering the view of the video camera. Normally in this step, a person (referred to in the production industry as the "talent") enters the scene that is being captured by the video camera. Because the talent's pixel data are not a part of the reference view, the software recognizes that the pixel values of the talent do not match the original pixel values, and considers them foreground elements.
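The patent does not name the specific filters applied in these passes; as one hedged illustration of the kind of multi-pass cleanup involved, a simple 3x3 majority filter over the foreground/background mask suppresses isolated pixels that flicker in and out because of camera noise.

```python
# Illustrative sketch only: the patent leaves the exact filters unspecified.
# A 3x3 majority filter over the foreground mask is one simple way to suppress
# isolated pixels that flicker between frames because of camera noise.

def majority_filter(mask):
    """mask[y][x] is True for foreground. Returns a smoothed copy of the mask."""
    height, width = len(mask), len(mask[0])
    cleaned = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            votes = 0
            count = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        votes += mask[ny][nx]
                        count += 1
            # Keep a pixel as foreground only if most of its neighborhood agrees.
            cleaned[y][x] = votes * 2 > count
    return cleaned
```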
The talent's pixels are not removed, and subsequently appear within an otherwise empty video frame, as shown in Figure 1(C). In step 66, additional signal-to-noise processing algorithms can be applied against the captured video feed, to improve the image of the talent or "hero" in the empty frame, as shown in Figure 1(C). These filters can be very sophisticated. For example, if the talent's tie matches the background color in the reference view shown in Figure 1(A), a "bleed" is observed. However, with a proper filtering algorithm applied, the software can make sophisticated guesses to exclude the tie from being marked as empty (even though it matches the pixel data of the reference view directly behind the tie), based on the fact that it is surrounded by foreground elements. This is an important feature that traditional broadcast technology, such as chroma-key and Ultimatte, cannot achieve. The implementation of these filters can be done manually or automatically by the software. In step 68, if the talent image is acceptable, the process moves to step 72. If not, the process returns along 70 back to step 66, where the filters continue to be adjusted until the talent is properly displayed at step 68. When the talent is properly displayed at step 68, the process moves to step 72, where an alternative background can now be used to replace the empty portions of the video scene. This new image can be any graphic image that can be digitally represented inside a computer system, and will create the illusion that the talent is now in front of the new background. The replacement of the new image is achieved by replacing the corresponding empty pixels with the corresponding pixels of the new image on a frame-by-frame basis. Using this technique, if the talent moves, the talent will appear to move in front of the new background. The background can be a previously recorded video that can be manipulated frame by frame. This gives the effect of the talent being in front of a moving background. If the session is terminated at step 74, the process stops at step 78. If the session is not completed, the process moves along 76 back to step 72. Consequently, a novel system has been disclosed which allows the real-time replacement of the designated background portion of an incoming video signal with an alternative background. The system uses the real background image of the reference view as the basis for creating a new video image, the final result being comparable with traditional blue screen processes, such as chroma-key and Ultimatte technology, but requiring only a personal computer, a video camera, and software. However, in this case, the background image can be any reasonably static scene, with a sufficient and stable light source, captured by the video camera. The video stream is modified in real time, and then passed on to its original destination. The corresponding structures, materials, acts, and equivalents of all means-plus-function elements or steps in the following claims are intended to include any structure, material, or act for performing the function in combination with the other claimed elements, as specifically claimed.
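As a hedged sketch of the kind of inference described for step 66 (not the patent's actual filter), background-marked regions that are completely enclosed by foreground can be reclassified as foreground, which is what rescues the tie in the example above. The function name and mask representation are assumptions for illustration.

```python
# Illustrative sketch only (not the patent's actual filter): reclassify
# background-marked regions that are fully enclosed by foreground.
# A flood fill from the frame border finds "true" background; any background
# pixel unreachable from the border (e.g. the tie) is surrounded by foreground
# and is promoted to foreground.

from collections import deque

def fill_enclosed_background(mask):
    """mask[y][x] is True for foreground. Returns a mask with enclosed holes filled."""
    height, width = len(mask), len(mask[0])
    reachable = [[False] * width for _ in range(height)]
    queue = deque((y, x) for y in range(height) for x in range(width)
                  if not mask[y][x] and (y in (0, height - 1) or x in (0, width - 1)))
    for y, x in queue:
        reachable[y][x] = True
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < height and 0 <= nx < width \
                    and not mask[ny][nx] and not reachable[ny][nx]:
                reachable[ny][nx] = True
                queue.append((ny, nx))
    # Foreground stays foreground; unreachable background becomes foreground.
    return [[mask[y][x] or not reachable[y][x] for x in range(width)]
            for y in range(height)]
```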
Claims (11)
- CLAIMS
1. A method for digitally altering, in real time, a stream of video data representing an original image, using a computer system having a memory and a visual display, to remove portions of the original video image and substitute new elements to create a new video image, comprising the steps of: storing at least one frame of data representing said original image in a first incoming video data stream in a data structure at a first location in said memory in a computer; displaying said stored original image on said visual display; capturing video in real time from a second incoming video data stream representing a subsequently modified original image, and storing data representing said modified original image in a data structure at a second location in said memory in a computer; comparing the video data in the second video data stream representing said original image with modifications against said stored original image video data, to differentiate background and foreground elements; removing the background elements common to said stored original image and the subsequently modified original image, leaving only the foreground elements of said subsequently modified original image; replacing said background elements of said subsequently modified original image with alternate background replacement elements; and displaying only the foreground elements of said subsequently modified original image over said alternate background replacement elements to form said new image.
2. The method of claim 1, further comprising the steps of: using a stationary video camera to obtain said original image in the form of said first incoming video data and said subsequently modified original image in the form of second incoming video data; coupling said first and second incoming video data from the camera to said computer; and digitizing said first incoming video data of said original image and said second incoming video data of said second data stream in real time, for storage at respective first and second locations in said memory.
3. The method of claim 2, further comprising the steps of: making said original image reasonably free of movement; and illuminating said original image sufficiently to allow the detection and separation of background and foreground elements when the second digitized incoming real-time video data stream is compared with said stored, digitized original image.
4. The method of claim 3, further comprising the steps of: creating said first and second video signals with pixel receivers in said video camera, each pixel receiver generating an output signal containing values representing the stored original image carried by the first incoming video data stream and the stored, real-time incoming original image carried by the second video data stream; and creating a first set of data structures in said memory for storing values of said pixel receivers representing variations in the original image, variations in the illumination dynamics of the original image, and variations in the values of the pixel signals caused by the pixel receivers of the camera that generate the first incoming video data stream representing said original image.
5. The method of claim 4, wherein the step of comparing the video data in said second real-time video data stream representing said modified original image with the stored original image data further comprises the steps of: creating a second set of data structures in said memory for storing data representing variations in the second real-time original image, variations in the dynamics of the illumination, and variations in the values of the pixel signals caused by the pixel receivers of the camera for the second real-time original image; and comparing the pixel values of said data stored in said second data structure for said second real-time original image with the data values of corresponding pixels stored in said first data structure, to determine foreground and background elements in said second real-time original image.
6. The method of claim 5, further comprising the steps of: comparing the data in said second real-time video data stream representing said modified original image with the stored original image data; and generating signals representing only said foreground elements to be represented over said alternate background replacement elements.
7. The method of claim 6, further including the step of adding a person to said second real-time video data stream to replace at least a portion of said original image.
8. The method of claim 6, wherein the step of forming said alternate replacement image further comprises the step of providing one of a moving video, a bitmap, an animation, or any image capable of being represented in a digital format, as the alternate replacement image.
9. A computer-assisted system for digitally editing, in real time, a stream of video data representing an original image, to identify and separate portions of the original image into foreground and background elements and to replace the background elements of the original image with substitute elements to create a new image, the system comprising: a visual display for displaying said original image contained in said video data stream; a camera having a device associated therewith for digitizing the video data representing said original image captured by said camera; a computer coupled to said visual display and said digitizing device; a first memory in the computer for storing the digitized original image from said video data stream; said camera and said digitizing device capturing and digitizing a second real-time video data stream containing data representing the original image; a second memory in the computer for storing the second digitized real-time original image from said video data stream; a third memory in the computer for storing program instructions for comparing the second digitized real-time original image with the stored original image, to detect and separate foreground and background elements; an alternate replacement background image stored in a fourth memory available to said stored program instructions, to be displayed on said visual display; said second real-time original image being modified by its background elements being replaced with the alternate replacement background image; said camera and said associated device capturing and digitizing said modified original image having the replacement background image; and said program instructions comparing said modified original image with said stored original image to obtain only the foreground elements and to cause said foreground elements to be represented over the alternate replacement image to form the
new image.
10. A computer memory product containing a program for causing the real-time digital alteration of a video data stream, from pixel receivers in a video camera, representing an original image, removing portions of the original image and substituting elements to create a new image, the program comprising the steps of: causing the video data in the first video data stream representing the original image to be stored in a first memory location of the computer; causing a comparison of a real-time video data stream representing the original image having substitute portions added thereto with the original image, to obtain only the substitute portions; and controlling the computer, in response to program instructions stored in the computer's memory, to cause a visual representation of only the substitute portions of the original image over an alternate replacement image to form a new image.
11. The program of claim 10, further comprising the steps of: accessing a first set of data structures in the first memory location in the computer that stores digitized video signals representing the original image carried by the first video data stream, including variations in the dynamics of the illumination of the original image, movement, and variations in the values of the pixel signals caused by the pixel receivers of the video camera; accessing a second set of data structures in the computer's memory that stores data representing variations in the original image with substitute portions therein in a second real-time video data stream, variations in the dynamics of the illumination of the original image, movement, and variations in the pixel signal values for the second real-time video data stream; and controlling the computer to cause a comparison of the stored data from the second real-time video stream, including the substitute portions, with the stored original image data to determine foreground and background elements.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US08819921 | 1997-03-18 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
MXPA99008509A (en) | 2000-08-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CA2284032C (en) | Real-time method of digitally altering a video data stream to remove portions of the original image and substitute elements to create a new image | |
US6909806B2 (en) | Image background replacement method | |
US7340094B2 (en) | Image segmentation by means of temporal parallax difference induction | |
GB2254518A (en) | Digital signal processing systems employing icon displays | |
US7268834B2 (en) | Method and apparatus for combining video signals to one comprehensive video signal | |
JP2020022088A (en) | Imaging apparatus, control method, recording medium, and information processing apparatus | |
US5940140A (en) | Backing luminance non-uniformity compensation in real-time compositing systems | |
JPS63182988A (en) | Method and apparatus for intensifying video picture | |
EP0862334A2 (en) | A signal processing system | |
DE102007041719B4 (en) | Method for creating augmented reality in a room | |
Su et al. | Detection of blue screen based on edge features | |
MXPA99008509A (en) | Real-time method of digitally altering a video data stream to remove portions of the original image and substitute elements to create a new image | |
JPH0771939A (en) | Visual inspection device | |
JP2688811B2 (en) | How to color black and white images | |
KR100537028B1 (en) | Realtime Object Extraction System and Method | |
US20080063275A1 (en) | Image segmentation by means of temporal parallax difference induction | |
Moore et al. | 10 Video Production Switchers and Special Effects Systems | |
JP2000076467A (en) | Picture evaluating method | |
JP2004220194A (en) | Image processor, image processing method and program | |
JPH05303637A (en) | Picture processor | |
JPH0335368A (en) | Image monitor device | |
KR20060026982A (en) | The method of the brightness control of move image | |
JP2007189561A (en) | Signal processing circuit for camera, imaging apparatus, and image transmission system |