MXPA96004084A - A system for implanting an image into a video stream - Google Patents

A system for implanting an image into a video stream

Info

Publication number
MXPA96004084A
MXPA96004084A, MXPA/A/1996/004084A, MX9604084A
Authority
MX
Mexico
Prior art keywords
image
model
frame
video
mask
Prior art date
Application number
MXPA/A/1996/004084A
Other languages
Spanish (es)
Other versions
MX9604084A (en)
Inventor
Kreitman Haim
Barel Dan
Amir Yoel
Tirosh Ehud
Original Assignee
Scitex Corporation Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from IL10895794A external-priority patent/IL108957A/en
Application filed by Scitex Corporation Ltd filed Critical Scitex Corporation Ltd
Publication of MX9604084A publication Critical patent/MX9604084A/en
Publication of MXPA96004084A publication Critical patent/MXPA96004084A/en

Abstract

A system and method which mixes images, such as an advertisement, with a video stream of action occurring within a relatively unchanging space, such as a playing field, is disclosed. The system utilizes a model of the background space to change the video stream so as to include the image at some location within the background space. It includes a video frame grabber (10) and an image implantation system (14). The frame grabber (10) grabs a single frame of the video signal at a time. The image implantation system (14) typically implants the image into the grabbed frame onto a predefined portion of a preselected one of the surfaces of the background space if that portion is shown in the frame.

Description

A SYSTEM FOR IMPLANTING AN IMAGE INTO A VIDEO STREAM

FIELD OF THE INVENTION

The present invention relates generally to the combination of a prepared image with a video signal.

BACKGROUND OF THE INVENTION

Sports arenas typically include a playing area where the game occurs, a seating area where the spectators sit and a wall of some kind separating the two areas. Typically, the wall is at least partially covered with advertisements from the companies sponsoring the game. When the game is filmed, the advertisements on the walls are filmed as part of the sports arena. The advertisements cannot be presented to the public in detail unless the television cameras film them directly.

Systems that combine predefined advertisements with surfaces in a video of a sports arena are known. One system has an operator who defines a target surface in the arena. The system then locks onto the target surface and combines a predetermined advertisement with the portion of the video stream corresponding to that surface. When the camera stops looking toward the surface, the system loses the target surface and the operator has to indicate again which surface is to be used. The system described above operates in real time. Other systems are known that carry out essentially the same operation, but not in real time.

Other systems for combining data with a video sequence are known. These include the insertion of an image between video scenes, the superposition of image data at one place in the television frame (such as the logos of the television station) and even the electronic insertion of image data as a "replacement" for a specific target billboard. The latter is carried out using techniques such as color manipulation.

U.S. Patent 5,264,933 discloses an apparatus and method for altering video images in order to allow the addition of advertising images as part of the originally displayed image. The operator selects where the advertising image is to be implanted in the captured image. The system of U.S. Patent 5,264,933 can also implant images, in selected main broadcast areas, in response to audio signals, such as typical expressions of commentators.

PCT application PCT/FR91/00296 describes a procedure and device for modifying a zone in successive images. The images show a non-deformable target zone which has registration marks near it. The system searches for the registration marks and uses them to determine the location of the zone. A previously prepared image can then be superimposed on the zone. The registration marks are any easily identifiable marks (such as crosses or other "letters") in or near the target zone. The PCT/FR91/00296 system produces the captured image at many resolutions and uses the various resolutions in its identification process.

SUMMARY OF THE PRESENT INVENTION

It is an object of the present invention to provide a system and method that mixes images, such as advertisements, with a video stream of action occurring within a relatively unchanging space. Such a space can be a field or a court, a stage or a game room, and the location is typically selected before the action (for example, the game or the show). The images are "implanted" onto a selected surface of the background space, where the term "implanted" herein means that the images are mixed into the part of the video stream that shows the selected surface. Specifically, the present invention uses a priori information regarding the background space to change the video stream so as to include the image at the same place within the background space.
The system and method operate regardless of the perspective in which the background space is presented in the video stream.

According to a preferred embodiment of the present invention, the system preferably includes a video frame grabber and an image implantation system. The frame grabber grabs only one frame of the video signal at a time. Typically, the image implantation system implants the advertising image into the frame onto a predefined portion of a preselected one of the surfaces of the background space if that portion is shown in the frame. To determine the location of the portion to receive the implantation, the image implantation system includes a unit for receiving a) a flat model of the fixed surfaces of the background space and b) an image mask indicating the portion of the flat model onto which the image is to be mixed. Through the model, the image implantation system identifies if and where the portion is shown in the frame.

In addition, according to a preferred embodiment of the present invention, the system also includes a design workstation on which the image and an image mask indicating the preselected surface can be designed.

Furthermore, the identification preferably involves a) reviewing the frame and extracting from it the features of the fixed surfaces and b) determining a perspective transformation between the model and the extracted features. The review and extraction includes the creation of a background mask and a foreground mask. The background mask indicates the locations of the features of interest and of the background elements in the frame and is used to extract the desired features. The foreground mask is formed of the foreground elements of the frame, which should remain unchanged.

Additionally, in accordance with a preferred embodiment of the present invention, the implantation includes the steps of a) transforming the image, the image mask and, optionally, a blend mask, with the perspective transformation, and b) mixing the transformed image, the image mask and the optional blend mask with the frame and the foreground mask. The foreground mask, as mentioned hereinabove, indicates the locations of the foreground data that are not to be covered by the transformed image.

In addition, the system preferably includes a look-up table for converting the multiplicity of colors in the frame to one of: the colors of the features of interest, the colors of the background elements and a color indicating the foreground elements. The look-up table is preferably created by having a user indicate the relevant colors. If the relevant colors no longer indicate the features of interest and the background elements (typically due to changes in illumination), the user can indicate new colors that indicate the desired elements, and the look-up table is then corrected.

In addition, according to a preferred embodiment of the present invention, the look-up table is used to create the background and foreground masks of the frame, indicating the locations of the features of interest, of the background elements and of the foreground elements in the frame.

According to an exemplary embodiment of the present invention, the features are lines. In one embodiment, they are extracted with a Hough transform. In another embodiment, they are extracted by determining the angles of the line segments. A pixel of interest is selected and a vicinity around it is opened. The vicinity is subdivided and the sector that has the most activity is selected.
The selected sector is then extended and subdivided again, and the process is repeated as necessary.

Furthermore, according to a preferred embodiment of the present invention, the system projects the extracted features onto an asymptotic function to determine which features are perspective versions of parallel lines.

In addition, according to the exemplary embodiment of the present invention, the background space is a sports arena that has lines marked on it. The system has a model of the sports arena and, preferably, has a list of the rectangles in the model and the locations of their corner points. Preferably, the system carries out the following operations: a) selects two vertical lines and two horizontal lines from the extracted features and determines their intersection points; b) generates a transformation matrix from the corner points of each rectangle of the model to the intersection points of the features; c) transforms the model with each transformation matrix; d) using the background elements of the background mask, compares each transformed model with the frame; and e) selects the transformation matrix that best matches the features of the frame.

In addition, according to the exemplary embodiment of the present invention, camera parameters can be used to reduce the number of lines in the frame necessary to identify the playing field. For this embodiment the following operations occur: receive or extract the coordinates of a set of cameras; represent a current transformation matrix as a product of translation, tilt, roll and zoom matrices and then determine the values of tilt, roll and zoom; identify the camera that has the calculated values of tilt, roll and zoom and store the information; and repeat the receiving, representing and identifying steps whenever there is a new cut in the video. Any frame in the video stream can then be treated either as similar to the previous frame or as part of a new cut taken by an identified camera.
BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be understood and appreciated more fully from the following detailed description taken in conjunction with the drawings, in which:

Figure 1 is a block diagram illustration of a system for implanting images into a video stream, constructed and operative in accordance with a preferred embodiment of the present invention;

Figure 2 is a schematic illustration of a tennis match used as an example to explain the operation of the system of Figure 1;

Figure 3 is an illustration of a model of a tennis court, useful in understanding the operation of the system of Figure 1;

Figure 4A is an illustration of an image to be implanted;

Figure 4B is an illustration of an image region mask for the image of Figure 4A and the model of Figure 3;

Figure 4C is an illustration of a blend mask for the image of Figure 4A and the model of Figure 3;

Figure 5 is a block diagram illustration of the elements of an image implantation unit forming part of the system of Figure 1;

Figure 6 is an illustration of an exemplary video frame into which the image of Figure 4A is to be implanted;

Figure 7 is an illustration of a background mask generated from the video frame of Figure 6;

Figure 8 is a block diagram illustration of the operations of a feature identification unit forming part of the image implantation unit of Figure 5;

Figure 9A is a flow chart illustration of a feature extraction method;

Figure 9B is an illustration of a portion of the background mask, useful in understanding the method of Figure 9A;

Figure 9C is an illustration of a subsector histogram of the background mask of Figure 9B, useful in understanding the method of Figure 9A;

Figure 10 is a block diagram illustration of the operations of a perspective identification unit forming part of the image implantation unit of Figure 5;

Figure 11A is an illustration of the intersection points of the features extracted from Figure 7;

Figure 11B is an illustration of parallel lines in perspective that meet at different points due to calculation inaccuracies;

Figures 12A and 12B are illustrations of gnomonic projections, useful in understanding the operations of the perspective identification unit of Figure 10;

Figure 12C is a graphical illustration of an exemplary function useful for the gnomonic projections of Figures 12A and 12B;

Figure 13 is a detailed block diagram illustration of the operations illustrated in Figure 10;

Figures 14A and 14B are illustrations useful in understanding the operations of Figure 13;

Figure 15 is an illustration of the use of transformation matrices;

Figure 16 is an illustration of the process of matching quadrilaterals to the geometric model, useful in understanding the operations of Figure 13;

Figure 17 is a block diagram illustration of the operations of the transformer and mixer units of the image implantation unit of Figure 5;

Figure 18 is a block diagram illustration of a correction method for updating a look-up table used in the image implantation unit of Figure 5;

Figure 19 is a schematic illustration of the camera parameters;

Figure 20 is a flow chart illustration of the determination of the transformation matrix when the camera parameters of Figure 19 are known or calculable;

Figure 21 is an illustration of a table useful in the process shown in Figure 20; and

Figure 22 is a flow chart illustration of an operating method when the camera parameters are known or calculable.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Reference is now made to Figure 1, which illustrates a system for mixing images, such as advertisements, with a video stream of action occurring within a relatively unchanging background space. The images are implanted onto a selected surface of the background space. The system will be described in the context of a video of a tennis match, illustrated in Figure 2, to which reference is also made. It will be understood that the present invention is operative in all situations in which the surfaces on which the action occurs are known a priori and are identifiable.

Typically, the system of the present invention comprises a video frame grabber 10 for converting an input video sequence (such as of a tennis match) into video frames, a design workstation 12 for designing images (such as advertisements) to be implanted onto a selected surface (such as the tennis court) seen within the video frame, an image implantation system 14 for combining the video frame with the designed image, a control computer system 16 for controlling the operation of and providing operator input to the image implantation system 14, and a transmission monitor 18. Typically, the control computer system 16 comprises a central processing unit (CPU) 20, a keyboard 22, a mouse 24, a disk 26, a removable media drive such as a floppy disk drive 27 and a monitor 28. The monitor 28 is typically driven by a graphics adapter which is part of the CPU 20. Typically, the design workstation 12 also includes a removable media drive such as a floppy disk drive 27.
Typically, the control computer system 16 and the image implantation system 14 communicate via a system bus 29. Typically, the design workstation 12 and the control computer system 16 communicate through removable media. The video sequence can be received from any source, such as a video recorder or a remote transmission station via satellite, microwave or any other type of video communication. If the sequence is provided from a satellite, the system has no control over the speed of the video. Accordingly, the image implantation system 14 must carry out its operations within the video rate of the satellite video stream, typically 30 msec between frames. If the sequence comes from a video recorder, the system can control the video speed and operate at any desired rate.

The video sequence is originally produced at the match site. As can be seen in Figure 2, for tennis matches there are typically two television cameras 30 viewing the action on the tennis court 32. The locations of the television cameras 30 are typically fixed. The court 32 is divided into two halves by a net 34. Each half has a plurality of areas 36, typically painted a first shade of green, divided by a plurality of lines 38, typically painted white. The area 40 outside the court is typically painted a second shade of green.

In reality, the lines 38 are parallel and perpendicular lines. Since the cameras 30 view the action at an angle, rather than from above, the images of the action they receive are seen in perspective. Thus, in the video output of the cameras 30, the parallel lines 38 appear to converge at infinity. The perspective angle of the video output changes as the angles of the cameras 30 and the amount of zoom change.

The present invention will implant an image 42, such as the word "IMAGE", at a desired location on a selected background surface, for all perspective angles and amounts of zoom. For tennis courts, the possible locations are any rectangle within a half of the tennis court 32 defined by four lines 38. As shown in Figure 2, the image 42 will not interfere with the action of the players 44; it will appear as if the image 42 were painted on the surface of the court.

Since the shape of the court 32 and the locations of the lines 38 within the court 32 do not, in fact, change, if the image implantation system has a model of the playing space, including the location at which the image is to be implanted, and can identify at least the viewing angle and the amount of zoom, it can combine the image into the video sequence so that it appears as if the image were implanted at the desired location. To do this, the image implantation system also needs to know the colors of the court as seen by the cameras. These colors may change as the lighting changes (daylight or artificial light).

Reference is now made in addition to Figure 3, which illustrates a geometric model 50 of the tennis court, and to Figures 4A, 4B and 4C, which illustrate the data prepared by an implantation designer. The implantation designer works on the design workstation 12, such as the BLAZE workstation manufactured by Scitex Corporation Ltd. of Herzlia, Israel, and typically has the geometric model 50 of the tennis court 32, typically as a view from above. The model 50 is typically a scaled version of the court 32, indicating the elements thereof that must be identified by the implantation system 14, such as the lines 38. Other playing fields may include circles or other well-defined curves.
Other identifiable elements include the intersections 54 of the lines 38. The implantation designer designs the image 42 (illustrated in Figure 4A) to be implanted and determines where to place it in the model 50. Figure 3 shows a number of possible locations 52. The designer then prepares an image location mask 56 (Figure 4B) to identify where the image 42 is to be placed within the model 50. The mask 56 is illuminated at the place where the image 42 is placed on the model 50 and is dark everywhere else.

Since the image 42 can be of bright colors, it may be desired to implant not the image itself but a softened version thereof, so as not to significantly disturb the action on the court 32. Accordingly, the implantation designer may optionally prepare a blend mask 58 (Figure 4C) indicating how the image 42 is to be mixed with the color of the court 32 at the implantation location indicated by the location mask 56. The blend mask 58 can be any suitable mask, as is known in the art. In Figure 4C, the mask 58 is shown having four areas 59, each indicating the inclusion of a different amount of court color, where the outside area 59 typically incorporates much more court color than the inside areas.

Reference is now made again to Figures 1 and 2. The implantation data, formed of the geometric model 50, the image 42, the image location mask 56 and the optional blend mask 58, are typically prepared before the relevant tennis match and provided to the implantation system 14, typically through removable media, for implantation into the input video sequence when the match occurs.

Most video sequences of live televised matches begin with an operational initialization sequence to allow the operators of the local station to synchronize their systems to the input sequence. Typically, this is also true for video tape data. In the present invention, the initialization video data is grabbed by the frame grabber 10 and is first provided to the control computer system 16. A station operator selects a frame that has a clear view of the playing field and uses it to provide calibration information, as described hereinbelow. The calibration information is used by the image implantation system 14 to identify the court 32 and its features (such as the lines 38).

In the embodiment described hereinbelow, the calibration information includes the colors of the features of interest in the background, such as the field lines, the playing field (court 32) and the ground outside the playing field (the outer area 40 of the court). The remaining colors that can be received are defined as foreground colors. Other playing fields may require fewer or more features to define them and, thus, fewer or more colors.

The station operator, using the mouse 24 and keyboard 22, interactively defines the calibration colors. This can be achieved in several ways, one of which will be described herein. A four-color overlay is superimposed over the frame displayed concurrently on the control monitor 28. Initially, the four-color overlay is comprised of a single color, a transparent color. In this way, the current frame is initially visible. The operator indicates the pixels that describe one of the three features: the lines 38, the internal playing field 36 and the external playing field 40.
When the operator selects a pixel, the overlay pixels corresponding to the pixels in the current frame that have the selected color are colored in a single changed color, thereby covering their corresponding pixels of the current frame. The selected color is stored. The process is repeated for the three areas. All unselected colors are assigned to a fourth changed color. If the operator approves the resulting four-color overlay, a look-up table (LUT) is produced between the selected colors of the current frame and the changed colors. If desired, the control computer system 16 can store the pixels that the operator selected for later use in a LUT correction cycle, described hereinbelow with reference to Figure 18. The control computer system 16 provides the calibration data, consisting of the LUT and the pixels used to produce the LUT, to the image implantation system 14. The system 14 uses the calibration data described above to identify the desired features in each frame of the input video signal.

Reference is now made to Figure 5, which illustrates the general elements of the image implantation system 14. Reference is also made to Figures 6 and 7, which are useful in understanding the operation of the system 14. Typically, the system 14 comprises a feature identification unit 60 (Figure 5) for identifying which features of the court 32 are present in each input video frame, and a perspective identification unit 62 for identifying the viewing angle and zoom of an active camera 30 and for determining an appropriate perspective transformation between the model 50 and the input video frame. The system 14 also comprises a transformer 64 for transforming the implantation data from the model plane into the viewing plane of the frame and a mixer 66 for mixing the perspective implantation data with the current video frame, thereby implanting the image 42 onto the court 32.

As described in more detail hereinbelow, the feature identification unit 60 uses the LUT to create a background mask of the frame which indicates which parts of the frame contain possible background features of interest and which parts are foreground and therefore should not be changed in subsequent operations. Figures 6 and 7, respectively, provide an exemplary input frame 68 and its corresponding background mask 70. The input frame 68 of Figure 6 has two players 44 on the court 32. The background mask 70 of Figure 7 shows the areas of the four colors. The areas marked 1-4 are the areas of the line color, the court's inner color, the court's outer color and the remaining colors, respectively. It is noted that the areas of the players 44 are marked with the foreground color 4 and cover other important areas, such as those of the white lines 1.

From the background mask 70, the unit 60 (Figure 5) extracts the features of the playing field. For tennis courts, the features of interest are the lines 38. The perspective identification unit 62 compares the extracted features with those of the model 50 and produces a transformation matrix from them. Using the transformation matrix, the transformer 64 converts the image implantation data (i.e. the image 42 to be implanted, the image location mask 56 and the blend mask 58) to the perspective of the input video frame. Finally, using the transformed image location mask 56 and the background mask 70, the mixer 66 implants the perspective view of the image 42 into the desired background portions of the input video frame.
In this way, if the players walk on the part of the court 32 where the image 42 is implanted, they will appear to walk "over" the implanted image. If desired, the transformed blend mask 58 can be used to mix the image 42 with the colors of the field onto which the image 42 is implanted.

Reference is now made to Figure 8, which details the operations of the feature identification unit 60. In step 72, the unit 60 uses the LUT to convert the input video frame from a multi-color frame to the four-color frame called the background mask 70. Specifically, for the tennis court 32, the LUT provides a first value for pixels that have colors of the lines 38, a second value for pixels that have colors of the inner court 36, a third value for pixels that have colors of the outer court 40 and a fourth value (indicating foreground pixels) for the remaining pixels. This is shown in Figure 7. The LUT can be implemented in any suitable one of the many methods known in the art.
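By way of illustration only, the following Python sketch shows one way such a LUT could be realized, assuming 8-bit RGB frames and a coarse quantization of 32 levels per channel to keep the table small; the class codes and helper names are illustrative and not part of the patented system.

    import numpy as np

    # Illustrative four-class LUT: colors the operator tagged at calibration
    # map to a class code; every untagged color is treated as foreground.
    LINE, INNER_COURT, OUTER_COURT, FOREGROUND = 1, 2, 3, 4

    def build_lut(tagged_pixels):
        """tagged_pixels: iterable of ((r, g, b), class_code) chosen by the operator."""
        lut = np.full((32, 32, 32), FOREGROUND, dtype=np.uint8)
        for (r, g, b), code in tagged_pixels:
            lut[r // 8, g // 8, b // 8] = code
        return lut

    def background_mask(frame, lut):
        """frame: HxWx3 uint8 array -> HxW array of the four class codes."""
        q = frame // 8
        return lut[q[..., 0], q[..., 1], q[..., 2]]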
The background mask 70 not only defines which pixels belong to the background of interest, it also includes within it the features of interest, such as the lines 38. Thus, in step 74, the feature identification unit 60 processes the background mask 70 to extract the features of interest. Typically, though not necessarily, the LUT is designed to provide the features with a single color value. For the example of a tennis match, this stage involves reviewing those pixels of the background mask 70 having the first value and extracting the straight segments therefrom. For example, step 74 can be implemented with a Hough transform operation on the background mask 70. Hough transforms are described on pages 121-126 of the book Digital Picture Processing, Second Edition, Vol. 2, by Azriel Rosenfeld and Avinash C. Kak, Academic Press, 1982, which book is incorporated herein by reference. The result is a series of line parameters, each describing a straight segment in the background mask 70. The line parameters for each segment include the coefficients of the line equation that describes it as well as a weight value that indicates the number of pixels included within the segment.

An alternative extraction method is illustrated in Figures 9A, 9B and 9C, to which brief reference is now made. As shown generally in Figure 9A, the method starts at a first pixel 69 (Figure 9B) of the background mask 70 having the color of interest (in this example, white) and looks in its vicinity 75 to determine where more white pixels (marked by shading) exist. To do this, it divides the vicinity 75 into subsectors 71-74 of a predetermined size and produces a histogram of the distribution of white pixels in each subsector. Figure 9C illustrates the histogram for the sectors 71-74 of Figure 9B. The subsector with a strong maximum (subsector 73) is selected as the next sector for the search. In the next step, a new vicinity 78 is defined which consists of the selected subsector 73 and an extension thereof. The entire vicinity 78 is twice the length of the vicinity 75. This new vicinity 78 is subdivided into four subsectors 76 and the process is repeated. This process continues until one of the following criteria is met: 1. the subsector is narrow enough to be defined as a straight line; 2. no strong maximum is obtained in the histogram. If condition 1 is obtained, the coefficients of the straight line are stored and the pixels that form the straight line are then "colored" to have the "remaining color" and are thus removed from the search.
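Returning to the Hough option for step 74, the sketch below accumulates votes for (rho, theta) line parameters over the line-colored pixels of the background mask; the bin sizes, the peak threshold and the absence of peak clustering are simplifications assumed for illustration, not the patent's actual implementation.

    import numpy as np

    # Rough Hough-transform sketch: each returned triple is (rho, theta, weight),
    # where the vote count plays the role of the weight Wi mentioned in the text.
    def hough_lines(mask, line_code=1, n_theta=180, peak_threshold=100):
        ys, xs = np.nonzero(mask == line_code)
        thetas = np.deg2rad(np.arange(n_theta))
        diag = int(np.ceil(np.hypot(*mask.shape)))
        acc = np.zeros((2 * diag, n_theta), dtype=np.int32)
        cos_t, sin_t = np.cos(thetas), np.sin(thetas)
        for x, y in zip(xs, ys):
            rhos = np.round(x * cos_t + y * sin_t).astype(int) + diag
            acc[rhos, np.arange(n_theta)] += 1
        lines = []
        for rho_idx, theta_idx in zip(*np.nonzero(acc >= peak_threshold)):
            lines.append((rho_idx - diag, thetas[theta_idx],
                          int(acc[rho_idx, theta_idx])))
        return lines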
The feature extraction process produces a series of possible features that includes the true features as well as spurious lines. Reference is now made to Figure 10, which generally illustrates the operations of the perspective identification unit 62 of Figure 5. Reference is also made to Figures 11A and 11B, which are useful in understanding the operation of the unit 62 in general, to Figure 13, which details the operations of the unit 62 for the example of the tennis court 32, and to Figures 12A, 12B, 12C, 14A and 14B, which are useful in understanding the operations detailed in Figure 13. Using a priori information, the unit 62, in step 80, processes the series of possible features and determines those that are most likely to be features of interest. In step 82, the unit 62 selects a minimal set of the resulting true features and attempts to match them to the features of the model 50. The process is repeated as often as necessary until a match is found. In step 84, the matched features are used to generate a transformation matrix M that transforms the model to the features in the input video frame.
In the example of the tennis court 32, step 80 uses the fact that the lines 38 of the model 50 are parallel in two directions (vertical and horizontal) and that in perspective views (such as in the input video frame) lines that are parallel in reality meet at a finite point. This is illustrated in Figure 11A, in which all the extracted line segments, represented by solid lines, are extended by dotted lines. The perspective lines that correspond to actually parallel lines (for example the pseudo-parallel lines 90) intersect at a point 91 far from the outer edges 92 of the frame. All other intersections, labeled 94, occur within the edges 92 or near them. However, as illustrated in Figure 11B, due to digitization errors, the extensions of three pseudo-parallel lines might not meet at a single point. In fact, they might meet at three widely separated points 96.

Applicants have realized that, since parallel lines in perspective meet at infinity, projecting the extracted lines onto an asymptotic function will cause their intersection points to appear close together. Accordingly, in accordance with a preferred embodiment of the present invention, the extracted line segments are projected onto a two-dimensional asymptotic function. One such projection is known as a "gnomonic projection" and is described on pages 258, 259 and 275 of the book Robot Vision by Berthold Klaus Paul Horn, The MIT Press, Cambridge, Massachusetts, 1986, which pages are incorporated herein by reference.

Figures 12A and 12B illustrate examples of gnomonic projections. In the gnomonic projection, a point 100 in an XY plane 102 is projected onto a point 100' on a hemisphere 104. A line 106 in the XY plane is projected onto a great arc 106' of the hemisphere 104 (i.e. part of a great circle of a sphere). The origin is represented by the south pole 109 and infinity is represented by the equator 108. In this way, any group 110 (Figure 12B) of points near the equator 108 represents the intersection of pseudo-parallel lines and thus, the lines having points that pass through a group 110 are parallel lines. Figure 12B illustrates a plurality of great arcs, labeled 120a-120f, corresponding to some arbitrary extracted line segments (not shown). The three arcs 120a-120c have intersection points 122 that form a group 110a near the equator 108.

In step 130 (Figure 13), the gnomonic projection is used to produce a series of great arcs from the series of straight line segments produced by the feature extraction (step 74, Figure 8). In step 132, the area around the equator 108 is searched to find all the intersection points 122. A value Vk is given to each intersection point. The value Vk is a function of the weights Wi of the intersecting line segments and the Z coordinate of the intersection point 122. An example of such a function Vk is given in equation 1:

    Vk = Wline1 * Wline2 * f(Zintersection point)    (1)

where f(Zintersection point) is any function having a curve similar to curve 134 of Figure 12C, in which most points receive a low value and only those points that approach the equator 108 (Z = 1) receive values close to 1. For example, f(Zintersection point) could simply be Z.
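As an illustration of equation (1), the sketch below intersects every pair of extracted lines in homogeneous coordinates and scores each intersection by how close it lies to the equator of the projection. Normalizing the coordinates by the frame diagonal is an assumption made so that "near the equator" means "far outside the frame", and f(Z) is taken to be Z itself, the simplest choice mentioned above; the lines are the (rho, theta, weight) triples of the earlier sketch.

    import numpy as np

    # Illustrative scoring of intersection points per equation (1).
    def intersection_scores(lines, scale):
        """lines: (rho, theta, weight) triples; scale: e.g. the frame diagonal."""
        scores = []
        for i in range(len(lines)):
            for j in range(i + 1, len(lines)):
                rho1, th1, w1 = lines[i]
                rho2, th2, w2 = lines[j]
                l1 = np.array([np.cos(th1), np.sin(th1), -rho1 / scale])
                l2 = np.array([np.cos(th2), np.sin(th2), -rho2 / scale])
                p = np.cross(l1, l2)            # homogeneous intersection point
                n = np.linalg.norm(p)
                if n == 0.0:                    # coincident lines: no point
                    continue
                z = np.hypot(p[0], p[1]) / n    # approaches 1.0 at infinity
                scores.append((i, j, w1 * w2 * z))
        return scores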
In step 136, a small vicinity around each intersection point 122 is searched for other intersection points. If any are found, the current intersection point and those found are stored as a group 110 (Figure 12B). A group 110 is also defined as one whose value of f(Zintersection point) is above a predetermined threshold. In this way, a group 110 may include only one intersection point. In Figure 12B there are three groups 110a-110c, one of which, group 110c, includes only one intersection point 122. Once all the points have been searched, the location of each group 110 is determined by finding the "center of gravity" of the points in the group. The weight of the group 110 is the sum of the values Vk of the points in the group.

In step 138, the two groups with the highest weight are selected. For the example of Figure 12B, groups 110a and 110b are selected. In step 140, it is assumed that one group represents the "vertical" lines and the other represents the "horizontal" lines. Also in step 140, the straight segments corresponding to the lines of the two selected groups are marked "vertical" or "horizontal", respectively. In step 142, the "vertical" and "horizontal" lines are reviewed and the two heaviest vertical lines and the two heaviest horizontal lines are selected, where "heaviest" is determined by the Wi values. The selected lines, labeled 146, are shown in Figure 14A for the lines of Figure 11A. In step 144, the intersection points, labeled A, B, C and D, of the four selected lines are determined and stored. As shown in Figure 14A, the selected lines can intersect outside the frame.

Steps 130-144 are the operations necessary to identify the true features in the video frame (step 80 of Figure 10). The output of step 144 is the set of features that are to be matched to the model. The remaining steps match the features to the model and determine the transformation (steps 82 and 84 of Figure 10) as an integrated set of operations.

A standard tennis court has five vertical lines and four horizontal lines. Since it is not possible to differentiate between the two halves of the court, only three horizontal lines are important. The number of different quadrilaterals that can be formed from a selection of two horizontal lines out of three (three possible combinations) and two vertical lines out of five (ten possible combinations) is thirty. The thirty quadrilaterals can be found in four different orientations, for a total of 120 rectangles. In step 150, one of the 120 rectangles of the geometric model 50 is selected by selecting its four corners, labeled A', B', C' and D' (Figure 14B).
As can be seen, this is not the correct match. In step 152, the matrix M is determined which transforms the four points A', B', C' and D' of the model (Figure 14B) to the four points A, B, C, D of the video frame (Figure 14A). The matrix M can be represented as a composition of simpler transformations, as explained with reference to Figure 15.

Figure 15 shows three quadrilaterals 180, 182 and 184. Quadrilateral 180 is the model quadrilateral ABCD shown in an XY plane, quadrilateral 182 is a unit square having corners (0,1), (1,1), (0,0) and (1,0) in an ST plane, and quadrilateral 184 is the perspective quadrilateral in a UV plane. The transformation M from the model quadrilateral 180 to the perspective quadrilateral 184 can be represented as the composition of two transformations: a translation-and-scaling matrix T from the quadrilateral 180 to the unit square 182 and a perspective matrix P from the unit square 182 to the quadrilateral 184.

The matrix T, in homogeneous coordinates, has the form:

        | Sx  0   0 |
    T = | 0   Sy  0 |                                            (2)
        | Tx  Ty  1 |

where Sx and Sy are the scaling factors in the X and Y directions, respectively, and Tx and Ty are the translation factors in X and Y. Sx, Sy, Tx and Ty are determined by the equation:

    (x, y, 1) * T = (s, t, 1)                                    (3)

for the four coordinates (x, y, 1) of the quadrilateral 180 and the four coordinates (s, t, 1) of the unit square 182.

The matrix P, in homogeneous coordinates, has the form:

        | a11  a12  a13 |
    P = | a21  a22  a23 |                                        (4)
        | a31  a32  a33 |

The elements of the matrix P are determined by solving the following equation:

    (s, t, 1) * P = (u, v, w)                                    (5)

where (u, v, w) represents the four known coordinates of the points A, B, C and D of the quadrilateral 184, as shown in Figure 15, and w is always normalized. Assuming that a33 = 1, P can be calculated as follows.

From (s, t, 1) = (0, 0, 1), we determine that:

    a31 = U00
    a32 = V00                                                    (6)

From (s, t, 1) = (1, 0, 1), we determine that:

    a11 + a31 = U10 (a13 + 1)  =>  a11 = U10 (a13 + 1) - U00
    a12 + a32 = V10 (a13 + 1)  =>  a12 = V10 (a13 + 1) - V00     (7)

From (s, t, 1) = (0, 1, 1), we determine that:

    a21 + a31 = U01 (a23 + 1)  =>  a21 = U01 (a23 + 1) - U00
    a22 + a32 = V01 (a23 + 1)  =>  a22 = V01 (a23 + 1) - V00     (8)

From (s, t, 1) = (1, 1, 1), we determine that:

    a11 + a21 + a31 = U11 (a13 + a23 + 1)
    a12 + a22 + a32 = V11 (a13 + a23 + 1)                        (9)

From equations 7-9, two equations in the two unknowns a13 and a23 are produced, as follows:

    a13 (U10 - U11) + a23 (U01 - U11) = U11 + U00 - U10 - U01
    a13 (V10 - V11) + a23 (V01 - V11) = V11 + V00 - V10 - V01    (10)

Once a13 and a23 are determined, the remaining elements can be determined from equations 7 and 8. The transformation matrix M, or mapping, is the matrix product of the matrices T and P, as follows:

    M = T * P                                                    (11)
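The closed-form solution of equations (2)-(11) is small enough to transcribe directly. The sketch below assumes the row-vector convention (x, y, 1) * M used above, with the four corners of each quadrilateral supplied in the order corresponding to the unit-square corners (0,0), (1,0), (0,1) and (1,1); the function names are illustrative.

    import numpy as np

    def translation_scaling(model_quad):
        """Matrix T of equation (2): maps the axis-aligned model rectangle onto the unit square."""
        (x0, y0), (x1, _), (_, y2), _ = model_quad
        sx, sy = 1.0 / (x1 - x0), 1.0 / (y2 - y0)
        return np.array([[sx, 0.0, 0.0],
                         [0.0, sy, 0.0],
                         [-x0 * sx, -y0 * sy, 1.0]])

    def unit_square_to_quad(frame_quad):
        """Perspective matrix P of equations (4)-(10)."""
        (u00, v00), (u10, v10), (u01, v01), (u11, v11) = frame_quad
        A = np.array([[u10 - u11, u01 - u11],
                      [v10 - v11, v01 - v11]])
        b = np.array([u11 + u00 - u10 - u01,
                      v11 + v00 - v10 - v01])
        a13, a23 = np.linalg.solve(A, b)                          # equation (10)
        a31, a32 = u00, v00                                       # equation (6)
        a11, a12 = u10 * (a13 + 1) - u00, v10 * (a13 + 1) - v00   # equation (7)
        a21, a22 = u01 * (a23 + 1) - u00, v01 * (a23 + 1) - v00   # equation (8)
        return np.array([[a11, a12, a13],
                         [a21, a22, a23],
                         [a31, a32, 1.0]])

    def model_to_frame(model_quad, frame_quad):
        return translation_scaling(model_quad) @ unit_square_to_quad(frame_quad)  # M = T * P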
In step 154, the lines 38 of the model 50 are mapped into the video frame using the transformation matrix M. The result is a distorted frame 156 (Figure 16) that has 1s wherever there are mapped pixels of the model and 0s everywhere else. As can be seen, the points A', B', C' and D' coincide with the points A, B, C and D, respectively. However, the rest of the geometric model 50 does not.

In step 158, the distorted frame 156 is XORed with the background mask 70 (Figure 7). The XOR step produces a 0 in two cases: a) the pixels of the distorted frame 156 have a value of 1 and the pixels of the video frame have the field line color; and b) the pixels of the distorted frame 156 have a value of 0 and the pixels of the video frame have a color different from that of the line. The remaining situations receive values of 1. In steps 160 and 161, the number of pixels having values of 1 is counted and the resulting value is associated with the transformation matrix M. After all the matrices M have been determined, in step 162 the matrix having the least weight is selected. Since there is a possibility that no match can be made (e.g. the video is showing a commercial, the television cameras 30 are viewing the audience, etc.), in step 164, the weight of the selected matrix is checked against a threshold. If it is above that value, a null transformation matrix is provided. Otherwise, the selected matrix is defined as the transformation matrix M. Null transformation matrices are also provided when the test conditions of any of the previous steps fail.
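A compact sketch of the scoring of steps 154-161 follows, assuming the model's lines are supplied as an N x 2 array of sample points in model coordinates and that the background mask uses the class codes of the earlier sketch; the sampling density and the bounds handling are illustrative choices, not the patent's.

    import numpy as np

    # Score one candidate matrix M: map the model's line samples into the frame
    # and count disagreements with the line color of the background mask (XOR).
    # The candidate with the smallest count is the best match.
    def match_weight(M, model_line_points, background_mask, line_code=1):
        h, w = background_mask.shape
        pts = np.hstack([model_line_points, np.ones((len(model_line_points), 1))])
        uvw = pts @ M                                   # row-vector convention
        uv = np.round(uvw[:, :2] / uvw[:, 2:3]).astype(int)   # assumes w != 0
        distorted = np.zeros((h, w), dtype=bool)
        inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
        distorted[uv[inside, 1], uv[inside, 0]] = True
        return int(np.count_nonzero(distorted ^ (background_mask == line_code)))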
Reference is now made to Figure 17, which illustrates the operations of the transformer 64 and the mixer 66 of Figure 5. The transformer 64 uses the transformation matrix M to distort each of the image 42, the image region mask 56 and the blend mask 58 to the plane of the video frame (step 170). The distorted image region mask is also ANDed with the background mask 70, producing an authorization mask therefrom. The authorization mask indicates those pixels of the video frame that are both background pixels and within the image region. On these pixels the image will be implanted. The mixer 66 combines the distorted image with the video frame according to the blend and authorization masks. The formula that is implemented for each pixel (x, y) is typically:

    output(x, y) = β(x, y) * image(x, y) + (1 - β(x, y)) * video(x, y)    (12)

    β(x, y) = α(x, y) * P(x, y)                                           (13)

where output(x, y) is the pixel value of the output frame, image(x, y) and video(x, y) are the values in the implanted image 42 and the video frame, respectively, α(x, y) is the value in the blend mask 58 and P(x, y) is the value in the authorization mask.
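Equations (12) and (13) amount to a per-pixel alpha blend gated by the authorization mask; the minimal sketch below assumes all inputs have already been warped into the frame plane and converted to floating point.

    import numpy as np

    def mix(video, image, blend_mask, authorization_mask):
        """video, image: HxWx3 floats; blend_mask (alpha), authorization_mask (P): HxW floats."""
        beta = (blend_mask * authorization_mask)[..., None]      # equation (13)
        return beta * image + (1.0 - beta) * video               # equation (12)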
The description so far assumes that the LUT that produced the background mask 70 remains correct during the entire match. If the lighting changes (which typically occurs in outdoor matches), the colors in the video sequence may change and, as a result, the background mask 70 will no longer correctly indicate the background elements. Accordingly, a correction procedure can be carried out periodically. The correction procedure is detailed in Figure 18, to which reference is now made.

It will be recalled that, in the calibration process, the test sites, which indicate the features of interest in the background (such as the field lines and the inner and outer courts), were selected by the operator. The locations of the sites were recorded, together with their color values. Once the matrix M for the calibration video frame is determined, the locations of the test sites are converted from the video frame plane into the geometric model plane (i.e. by using the inverse of the matrix M). After a certain time, when recalibration is desired, the test sites are converted into the plane of the current video frame. The distorted test sites within the current video frame are selected and their surroundings are sampled. The color characteristics of each vicinity are calculated (using histograms, for example) and the result is compared with the characteristics of the stored site. If there is any significant change in the colors, the LUT is corrected and the relevant sites are converted into the geometric model plane and stored.
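By way of example only, the following sketch captures the drift-detection idea of Figure 18: sample a small neighborhood around each transformed test site and compare its mean color with the stored reference. The 7x7 window, the mean-color statistic and the fixed threshold are assumptions made for illustration rather than the patent's actual criteria, which, as noted above, may use histograms.

    import numpy as np

    def check_test_sites(frame, sites, reference_colors, threshold=20.0):
        """frame: HxWx3 array; sites: list of (x, y); reference_colors: list of RGB triples."""
        updated = list(reference_colors)
        for i, ((x, y), ref) in enumerate(zip(sites, reference_colors)):
            patch = frame[max(0, y - 3):y + 4, max(0, x - 3):x + 4].reshape(-1, 3)
            mean = patch.mean(axis=0)
            if np.linalg.norm(mean - np.asarray(ref, float)) > threshold:
                updated[i] = tuple(mean)        # lighting changed: record the new color
        return updated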
It will be appreciated that the present invention encompasses the process described hereinabove for tennis matches as well as for other situations in which the background information is fixed and known. The process described hereinabove can be improved in several ways, such as by shortening the search through knowledge of the camera parameters, as described hereinbelow. When information is provided on each camera's position, rotation angles and amount of zoom (either externally or as determined by the system), the operations described hereinabove can be shortened, since the number of degrees of freedom of the perspective matrix P is reduced. Specifically, the perspective matrix P includes within it the information regarding the position, rotation angles and zoom of the camera used. This information can be extracted, and the perspective matrix P (or, similarly, the transformation matrix M) can be redefined as a function of each of the camera parameters.

Figure 19 illustrates a camera and its parameters. Its location is denoted by the vector 171 which has coordinates (x, y, z) from the origin O of the coordinate system X, Y, Z 172. The camera rotates, tilts and rolls about the camera-based axes U, V and W, as indicated by arrows 173, 174 and 175, respectively. In addition, the camera lens can zoom along the V axis, as indicated by the arrow 176. Assuming that the camera does not rotate and that the aspect ratio of the camera (the ratio between the width and the height of a pixel in the image produced by the camera) defines square pixels, the perspective matrix P can be expressed as a function of the location (x, y, z) of the camera and of its tilt, roll and zoom. It is assumed that the camera does not change its position from frame to frame but only changes its tilt and roll angles or its zoom.

Figures 20, 21 and 22 illustrate the method of determining and then using the camera parameters. In Figure 20, when a new cut is identified in the video stream, the entire perspective identification process (step 180), as shown in Figure 10, is carried out on the first frame of the new cut. Step 180 produces the elements a(i, j) of the perspective matrix P. The process then continues in two directions: a) the transformation matrix T is determined, starting from step 154 of Figure 13; and b) the camera coordinates (x, y, z) are extracted (step 184) from the matrix P, as taught in section 3.4 of the book Three-Dimensional Computer Vision: A Geometric Viewpoint, by Olivier Faugeras, MIT Press, 1993.
The book is incorporated herein by reference. Once the camera coordinates (x, y, z) have been extracted, two checks are carried out (steps 186 and 188), as follows:

Condition 186: the camera does not rotate in the direction of arrow 174; such rotation occurs when the corresponding element of the matrix is not equal to zero.

Condition 188: the aspect ratio (AR) of the camera defines square pixels (i.e. AR = 1).

If either condition is false, the rest of the shortened process is aborted. If both conditions are fulfilled, then, as taught in the book Three-Dimensional Computer Vision: A Geometric Viewpoint, the matrix P can be re-represented (step 190) as the product of the following matrices: a) zoom (f): the projection matrix onto the focal plane of the camera; b) translation: the translation matrix from the origin of the coordinate system to the calculated position of the camera, (x, y, z); c) tilt (α): the matrix of the rotation about the U axis through the angle α; and d) roll (γ): the matrix of the rotation about the W axis through the angle γ.

With the values of zoom, tilt, roll and translation, the first camera is completely calibrated (step 192) and its parameters are inserted into a table 194 of the identified cameras (shown in Figure 21). Other cameras will be identified and registered in table 194 as described hereinbelow. The shortened calculation process, described with respect to Figure 22, is then carried out on all frames. A frame is examined (step 196) to determine its similarity to the previous frames, using α, γ and f. The similarity is measured through a matching coefficient (i.e. the percentage of pixels of interest in the frame successfully mapped to the model using the calculated matrix). If good similarity is obtained, the calculated matrix can be used for the insertion process (described with respect to Figure 17).

If the matching coefficient is small, it is possible that this frame was filmed by another camera in table 194. To find the other camera, the current frame must be reviewed and a line must be identified in it. In addition, a point on the identified line, such as a point of intersection with another line, must also be identified (step 198). Normally, the identified line is the "strongest" line. In step 200, a matching value is determined for each camera listed in table 194, as follows: the identified line and point are associated with a line and a point in the geometric model, and a perspective matrix P is determined for this association, which transforms the line and point of the model to the identified line and point. Since each perspective matrix P is a function of the coordinates (x, y, z) of the current camera (which are known) and of the tilt α, the roll γ and the zoom f (which are unknown), the resulting perspective matrix P can be determined through the values of the tilt, roll and zoom, which can be calculated by assuming that the identified line and point are properly matched with the line and point of the model. As in the method of Figure 10, the transformation matrix M is determined from the perspective matrix P and the geometric model is transformed, through the matrix M, into the plane of the image in the frame. The lines of the model are compared with the lines in the image and a matching value is produced.
The process of associating a line and a point of the model with the identified line and point, producing a perspective matrix P from the known camera coordinates and the association of lines and points, and determining a matching value as a result, is repeated for each combination of line and point in the geometric model. If the matching values are considerably less than 1, indicating that the agreement was very poor, the matching process with the identified line and point, described hereinabove, is repeated for another camera whose coordinates (x, y, z) are known. The highest matching coefficient calculated for each camera is inserted into a column, labeled 202, of the table 194 (Figure 21). In step 204, the camera with the highest value of the coefficient 202 is selected and, if the coefficient is greater than a predetermined threshold, its perspective matrix P is used for the image insertion process of Figure 17. If the highest coefficient in column 202 has a value below the threshold, no known camera was used to film the current frame. The process of Figure 10 should then be carried out, followed by the camera identification process of Figure 20.

It will be appreciated by persons skilled in the art that the present invention is not limited to what has been particularly shown and described hereinabove. Rather, the scope of the present invention is defined by the following claims.

NOVELTY OF THE INVENTION

Having described the present invention, the following is considered as a novelty and, therefore, the content of the following claims is claimed as property:

1. A system for mixing an image with a video stream of action occurring within a background space, the space having fixed surfaces and being scanned by at least one camera in an undetermined manner, the system comprising: a video frame grabber for capturing a single video frame of a video signal at a time; and an image implantation system comprising: means for receiving a) a flat model of said fixed surfaces and b) an image mask indicating a preselected portion of said flat model onto which said image is to be mixed; identification means for using said model in order to determine if and where said preselected portion is shown in said frame; and implantation means using the output of said identification means to implant said image into said frame on said preselected portion of said fixed surfaces, if said preselected portion is shown in said frame.

2. A system according to claim 1, characterized in that it also comprises a design workstation for previously designing said image and said image mask, said design workstation occasionally communicating with said receiving means.

3. A system according to claim 1, characterized in that said identification means comprise: means for reviewing said individual frame and for extracting features of said fixed surfaces therefrom; and means for determining a perspective transformation between said model and said extracted features.
4. A system according to claim 3, characterized in that said review and extraction means comprise means for creating a background mask and a foreground mask and extraction means for using said background mask to extract desired features, wherein said background mask indicates the locations of features of interest, of background elements and of foreground elements, wherein said foreground mask is formed of the foreground data portions of said background mask, and wherein said extraction means uses the features-of-interest portion of said background mask to extract said desired features.

5. A system according to claim 4, characterized in that said creation means comprises a look-up table for converting the multiplicity of colors in said frame to one of: the colors of the features of interest, the colors of the background elements and a color indicating the foreground elements.

6. A system according to claim 5, characterized in that said creation means comprises means for producing said look-up table from colors indicated by a user and correction means for correcting said look-up table when said colors no longer indicate the features of interest and the background elements.

7. A system according to claim 4, characterized in that said implantation means comprises: means for transforming said image and said image mask with said perspective transformation; and means for mixing said transformed image and said transformed image mask both with said individual frame and with said foreground mask indicating the locations of the foreground data that will not be covered by said transformed image.

8. A system according to claim 4, characterized in that said implantation means comprises: means for transforming said image, said image mask and said blend mask with said perspective transformation; and means for mixing said transformed image, said transformed image mask and said transformed blend mask both with said individual frame and with said foreground mask indicating the locations of the foreground data that will not be covered by said transformed image.

9. A system according to claim 4, characterized in that said features are lines.

10. A system according to claim 9, characterized in that said review and extraction means comprise a Hough transform.

11. A system according to claim 9, characterized in that said review and extraction means comprise means for determining the angles of segments of said lines by investigating the vicinity of pixels while continually narrowing said vicinity.

12. A system according to claim 9, characterized in that said determining means comprises means for projecting said extracted features onto an asymptotic function to determine which of said features are perspective versions of parallel lines.
13. A system according to claim 9, characterized in that said background space is a sports arena and wherein said determining means also comprises: a list of rectangles of said model and the locations of their corner points; means for selecting two vertical lines and two horizontal lines from said extracted features and for determining their intersection points; means for generating transformation matrices from the corner points of each rectangle of said model to said feature intersection points; means for transforming said model with each transformation matrix; means, using said background elements of said background mask, for comparing each transformed model with said individual frame; and means for selecting the transformation matrix that best matches the features of the frame.

14. A system according to claim 13, characterized in that said determining means additionally comprises: means for receiving or extracting the coordinates of a set of cameras; means for representing a current transformation matrix as a product of translation, tilt, roll and zoom matrices and for then determining the values of tilt, roll and zoom; means for identifying the camera that has the calculated values of tilt, roll and zoom and for storing the information; and means for repeating the receiving, representing and identifying steps whenever there is a new cut in the video.
15. A method for mixing an image with a video stream of action occurring within a background space, the space having fixed surfaces and being scanned by at least one camera in an undetermined manner, the method comprising the steps of: capturing a single frame of a video signal at a time; receiving a) a flat model of said fixed surfaces and b) an image mask indicating a preselected portion of said flat model onto which said image is to be mixed; using said model to determine whether and where said preselected portion is shown in said frame; and, with the output of said using step, implanting said image in said frame onto said preselected portion of said fixed surfaces if said preselected portion is shown in said frame.

16. A method according to claim 15, characterized in that it further comprises the steps of designing said image and said image mask in advance and of occasionally communicating with said receiving means.

17. A method according to claim 15, characterized in that said using step comprises the steps of: reviewing said single frame and extracting features of said fixed surfaces therefrom; and determining a perspective transformation between said model and said extracted features.

18. A method according to claim 17, characterized in that said review and extraction step comprises the step of creating a background mask and a foreground mask, wherein said background mask indicates the locations of the features of interest, of the background elements and of the foreground elements, and wherein said foreground mask is formed from the foreground-data portions of said background mask, and the step of using the features-of-interest portion of said background mask to extract said desired features.

19. A method according to claim 18, characterized in that said creating step includes the step of converting, through a lookup table, the multiplicity of colors in said frame to one of: the colors of the features of interest, the colors of the background elements and a color indicating the foreground elements.

20. A method according to claim 19, characterized in that said creating step comprises the step of producing said lookup table from colors indicated by a user and the step of correcting said lookup table when said colors no longer indicate the features of interest and the background elements.

21. A method according to claim 18, characterized in that said using step comprises the steps of: transforming said image and said image mask with said perspective transformation; and mixing said transformed image and said transformed image mask with both said single frame and said foreground mask, which indicates the locations of the foreground data that are not to be covered by said transformed image.

22. A method according to claim 18, characterized in that said using step comprises the steps of: transforming said image, said image mask and said mixing mask with said perspective transformation; and mixing said transformed image, said transformed image mask and said transformed mixing mask with both said single frame and said foreground mask, which indicates the locations of the foreground data that are not to be covered by said transformed image.

23. A method according to claim 18, characterized in that said features are lines.

24. A method according to claim 23, characterized in that said review and extraction step comprises the step of carrying out a Hough transform.
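The implanting steps of claims 21 and 22 reduce, in outline, to warping the image and its mask with the recovered perspective transformation and then blending only where the foreground mask permits. The sketch below assumes a 3x3 homography `H`, an 8-bit single-channel `image_mask`, and a foreground mask whose non-zero pixels mark occluding objects; these conventions and the simple binary blend are assumptions of the example, and the separate mixing mask of claim 22 is omitted.

```python
import numpy as np
import cv2

def implant(frame, image, image_mask, H, foreground_mask):
    """Blend `image` into `frame` through `image_mask`, warped by homography H."""
    h, w = frame.shape[:2]
    warped_image = cv2.warpPerspective(image, H, (w, h))
    warped_mask = cv2.warpPerspective(image_mask, H, (w, h))
    # copy advertisement pixels only where the mask is set and no foreground
    # object (player, ball, referee) occupies the pixel
    blend = (warped_mask > 0) & (foreground_mask == 0)
    out = frame.copy()
    out[blend] = warped_image[blend]
    return out
```

With a soft (non-binary) mixing mask, the last copy would become a weighted average of the frame and the warped image rather than a straight replacement.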
25. A method according to claim 23, characterized in that said review and extraction step comprises the step of determining the angles of segments of said lines by investigating the neighborhood of pixels and continually narrowing said neighborhood.

26. A method according to claim 23, characterized in that said determining step comprises the step of projecting said extracted features onto an asymptotic function to determine which of said features are perspective versions of parallel lines.

27. A method according to claim 23, characterized in that said background space is a sports arena and wherein said determining step further comprises the steps of: providing a list of the rectangles of said model and of the locations of their corner points; selecting two vertical lines and two horizontal lines from said extracted features and determining their points of intersection; generating transformation matrices from the corner points of each rectangle of said model to said feature intersection points; transforming said model with each transformation matrix; using said background elements of said background mask, comparing each transformed model with said single frame; and selecting the transformation matrix which best matches the features of the frame.

28. A method according to claim 27, characterized in that said determining step further comprises the steps of: receiving or extracting the coordinates of a set of cameras; representing a current transformation matrix as a product of location, tilt, pan and zoom matrices and then determining the values of tilt, pan and zoom; identifying the camera having the calculated values of tilt, pan and zoom and storing that information; and repeating the receiving, representing and identifying steps whenever there is a new cut in the video.
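Claim 26 speaks of projecting the extracted features onto an asymptotic function to find perspective versions of parallel lines. As a loose stand-in for that test, the sketch below groups lines by vanishing-point consistency: perspective images of parallel model lines meet at approximately one point, so pairwise intersections that cluster together identify such a group. The clustering rule and threshold are assumptions, and this is a substitute illustration rather than the claimed projection.

```python
import numpy as np
from itertools import combinations

def vanishing_groups(lines, distance_threshold=50.0):
    """Group homogeneous lines (a, b, c) whose pairwise crossings nearly coincide."""
    crossings = {}
    for (i, l1), (j, l2) in combinations(enumerate(lines), 2):
        p = np.cross(l1, l2)
        if abs(p[2]) < 1e-9:                 # parallel in the image plane
            continue
        crossings[(i, j)] = p[:2] / p[2]
    groups = []
    for (i, j), point in crossings.items():
        for g in groups:
            if np.linalg.norm(point - g["point"]) < distance_threshold:
                g["members"].update((i, j))
                break
        else:
            groups.append({"point": point, "members": {i, j}})
    # report only groups containing at least three lines
    return [sorted(g["members"]) for g in groups if len(g["members"]) >= 3]
```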

Claims (1)

  1. AMENDED CLAIMS [received by the International Bureau on August 24, 1995 (08/24/95); original claims 1-28 replaced by new claims 29-36 (2 pages)]

29. A method for implanting an image, one frame at a time, into a selected one of a plurality of video frames representing a stream of action occurring within a background space, the space having fixed flat surfaces and being scanned by at least one video camera, the method comprising the steps of: generating a model, independent of said plurality of video frames, of a selected one of said fixed surfaces, said model comprising a representation of geometric configurations which characterize said surface; and using said model to implant said image in said frames, said using step comprising the step of distorting said model in perspective.

30. A method according to claim 29, characterized in that said geometric configurations comprise at least one of the group consisting of lines and arcs.

31. A method according to claim 30, characterized in that it further comprises the step of providing an indication of the planar relationship between individual ones of said lines and arcs.

32. A method according to claim 29, characterized in that said representation is a planar vector representation.

33. An apparatus for implanting an image, one frame at a time, into a selected one of a plurality of video frames representing a stream of action occurring within a background space, the space having fixed flat surfaces and being scanned by at least one video camera, the apparatus comprising: means for generating a model, independent of said plurality of video frames, of a selected one of said fixed surfaces, said model comprising a representation of geometric configurations which characterize said surface; and means for using said model to implant said image in said frames, said using means comprising means for distorting said model in perspective.

34. An apparatus according to claim 33, characterized in that said geometric configurations comprise at least one of the group consisting of lines and arcs.

35. An apparatus according to claim 34, characterized in that it further comprises means for providing an indication of the planar relationship between individual ones of said lines and arcs.

36. An apparatus according to claim 33, characterized in that said representation is a planar vector representation.
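The amended claims shift the emphasis to a frame-independent planar vector model of the selected surface, made of lines and arcs, which is then distorted in perspective. A minimal sketch of such a representation appears below; the dataclass layout, the arc sampling step and the use of a 3x3 homography `H` are assumptions chosen only to illustrate claims 29 through 36.

```python
from dataclasses import dataclass
import numpy as np
import cv2

@dataclass
class PlanarModel:
    segments: list   # each: ((x0, y0), (x1, y1)) in planar field units
    arcs: list       # each: (cx, cy, radius, start_deg, end_deg)

def sample_model(model, arc_step_deg=5.0):
    """Turn the vector model into planar sample points (claims 29 and 32)."""
    pts = [p for seg in model.segments for p in seg]
    for cx, cy, r, a0, a1 in model.arcs:
        angles = np.deg2rad(np.arange(a0, a1 + arc_step_deg, arc_step_deg))
        pts.extend(zip(cx + r * np.cos(angles), cy + r * np.sin(angles)))
    return np.float32(pts).reshape(-1, 1, 2)

def distort_in_perspective(model, H):
    """Project the sampled model into frame coordinates (claims 29 and 33)."""
    return cv2.perspectiveTransform(sample_model(model), H).reshape(-1, 2)
```

A tennis-court model, for example, would contain only segments, while a soccer-pitch model would add the centre circle and penalty arcs; the planar relationship of claim 31 is captured by expressing all of them in one field coordinate system.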
MXPA/A/1996/004084A 1994-03-14 1996-09-13 A system for implanting an image into a video stream MXPA96004084A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
IL108957 1994-03-14
IL10895794A IL108957A (en) 1994-03-14 1994-03-14 System for implanting an image into a video stream
IL108,957 1994-03-14
PCT/US1995/002424 WO1995025399A1 (en) 1994-03-14 1995-02-27 A system for implanting an image into a video stream

Publications (2)

Publication Number Publication Date
MX9604084A MX9604084A (en) 1997-12-31
MXPA96004084A true MXPA96004084A (en) 1998-09-18

Family

ID=

Similar Documents

Publication Publication Date Title
US5491517A (en) System for implanting an image into a video stream
US6864886B1 (en) Enhancing video using a virtual surface
EP0796541B1 (en) System and method of real time insertions into video using adaptive occlusion with a synthetic reference image
EP0595808B1 (en) Television displays having selected inserted indicia
US7928976B2 (en) Telestrator system
AU611466B2 (en) System and method for color image enhancement
CN108141547B (en) Digitally superimposing an image with another image
CN111371966B (en) Method, device and storage medium for synthesizing foreground character shadow in virtual studio
KR20030002919A (en) realtime image implanting system for a live broadcast
CA2231849A1 (en) Method and apparatus for implanting images into a video sequence
MXPA96004084A (en) A system for implanting an image into a video stream
Owen et al. Augmented imagery for digital video applications
Tan Virtual imaging in sports broadcasting: an overview
KR20050008246A (en) An apparatus and method for inserting graphic images using camera motion parameters in sports video
CA1324321C (en) System and method for color image enhancement
CN111986133A (en) Virtual advertisement implanting method applied to bullet time
KR20050008247A (en) An Apparatus and Method for mixing 3D graphic images in sports video
MXPA97010194A (en) System and method of real time inserts in video signals using adjustable occlusions with a sintet reference image