WO2020115520A1

WO2020115520A1 - Ball tracking in sport events

Info

Publication number: WO2020115520A1
Application number: PCT/IB2018/059549
Authority: WO
Inventors: Evgeny TSIZIN-GOLDMAN; Evgeni KHAZANOV; Israel OR; Chen SHACHAR
Original assignee: Playsight Interactive Ltd.
Priority date: 2018-12-02
Filing date: 2018-12-02
Publication date: 2020-06-11
Also published as: WO2020115565A1

Abstract

A method of ball tracking in a sport event, the method comprising computer-executed steps of receiving a video sequence capturing movement of the ball during the sport event in a series of video frames, calculating a plurality of difference-frames, each difference-frame being calculated over a respective group of at least two of the video frames of the received video sequence, and combining at least two of the calculated difference-frames, to form a composite frame representing a trajectory taken by the ball in the movement as a series of images of the ball as captured in the received video sequence.

Description

BALL TRACKING IN SPORT EVENTS

FIELD AND BACKGROUND OF THE INVENTION

The present invention relates to video image processing during a sport event such as a tennis match or a training session, a soccer game or a training session, a football match, etc., and more particularly, but not exclusively to a system and method of ball tracking in such a sport event.

Such video image processing with which a ball is tracked during a sport event is often used for following certain events that occur during the sport event, which events need to be detected and classified. Many of the events are detected based on the ball’s motion and position. The events detected during a sport event may include, for example, a ball’s hitting the ground - i.e. an“In” or an“Out” event (say in tennis), a ball’s entering a basket (in basketball) or a soccer gate, a ball’s passing from one player to another, etc.

A video of a sport event may be divided into a quasi-static background (such as the court lines) and a dynamic foreground (usually, the moving players and ball).

Naturally, it is the foreground which bears information that reveals the dynamics of the sport event (say game) and therefore, the video image processing may focus on that background.

However, even when limiting most of the video processing to that foreground, ball detection and classification methods needed for tracking the ball’s movement, so as to identify or classify such events, may prove to be a non-trivial.

For example, when captured in video, many objects other than a ball may nevertheless resemble a ball. Due to the finite exposure time of camera shutter mechanisms, a ball (as captured in video) may appear somewhat elongated, and the ball’s image may be distorted while propagating through the camera optics to the camera sensors. While captured from great distance, an image of a ball as captured in a sequence of video images may appear as a small aggregate of pixels which hardly resemble a ball.

For these and other reasons, especially when resources (such as processing power, processing time, data storage, etc.) are limited, many objects other than the ball (say a player’s head or foot, a symbol printed on a player’s shirt, etc.) as captured in video during a sport event, may be mistakenly identified as the ball. Recently, Deep Learning methods such as neural networks with several neuronal layers have also been employed to solve image processing problems of the sort discussed hereinabove. Tunable parameters of such neuronal networks are usually based on the learning of large databases of usually, labeled of objects of interest.

The most computationally intensive part of such deep learning algorithms is usually carried out before the event, in an offline stage that is often performed by GPUs (Graphical Processing Units), with an objective of tuning the parameters in such a way that a system used to process video images captured during the sport event itself (i.e. on-line) can identify the ball in different realistic circumstances.

However, the online stage of applying a neural network created during the offline stage too may be computationally intensive, as it may have to be performed in real time (say 25-50 times a second). Thus, this stage too may substantially add to the resources needed for carrying out such image processing processes based on deep learning.

SUMMARY OF THE INVENTION

According to one aspect of the present invention, there is provided a method of ball tracking in a sport event, the method comprising computer-executed steps of: receiving a video sequence capturing movement of the ball during the sport event in a series of video frames, calculating a plurality of difference-frames, each difference- frame being calculated over a respective group of at least two of the video frames of the received video sequence, and combining at least two of the calculated difference- frames, to form a composite frame representing a trajectory taken by the ball in the movement as a series of images of the ball as captured in the received video sequence.

According to a second aspect of the present invention, there is provided an apparatus for ball tracking in a sport event, the apparatus comprising: a computer processor, a video receiver implemented on the computer processor, configured to receive a video sequence capturing movement of the ball during the sport event in a series of video frames, a difference-frame calculator, in communication with the video receiver, configured to calculate a plurality of difference-frames, each difference-frame being calculated over a respective group of at least two of the video frames of the received video sequence, and a composite frame former, in communication with the difference-frame calculator, configured to combine at least two of the calculated difference-frames, to form a composite frame representing a trajectory taken by the ball in the movement as a series of images of the ball as captured in the received video sequence.

According to a third aspect of the present invention, there is provided a non- transitory computer readable medium storing computer executable instructions for performing steps of ball tracking in a sport event, the steps comprising: receiving a video sequence capturing movement of the ball during the sport event in a series of video frames, calculating a plurality of difference-frames, each difference-frame being calculated over a respective group of at least two of the video frames of the received video sequence, and combining at least two of the calculated difference- frames, to form a composite frame representing a trajectory taken by the ball in the movement as a series of images of the ball as captured in the received video sequence.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The materials, methods, and examples provided herein are illustrative only and not intended to be limiting.

Implementation of the method and system of the present invention involves performing or completing certain selected tasks or steps manually, automatically, or a combination thereof. Moreover, according to actual instrumentation and equipment of preferred embodiments of the method and system of the present invention, several selected steps could be implemented by hardware or by software on any operating system of any firmware or a combination thereof. For example, as hardware, selected steps of the invention could be implemented as a chip or a circuit. As software, selected steps of the invention could be implemented as a plurality of software instructions being executed by a computer using any suitable operating system. In any case, selected steps of the method and system of the invention could be described as being performed by a data processor, such as a computing platform for executing a plurality of instructions.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is herein described, by way of example only, with reference to the accompanying drawings.

With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of the preferred embodiments of the present invention only, and are presented in order to provide what is believed to be the most useful and readily understood description of the principles and conceptual aspects of the invention. The description taken with the drawings making apparent to those skilled in the art how the several forms of the invention may be embodied in practice.

In the drawings:

Fig. 1 is a simplified block diagram schematically illustrating an exemplary apparatus for ball tracking in a sport event, according to an exemplary embodiment of the present invention.

Fig. 2 is a simplified flowchart schematically illustrating an exemplary method of ball tracking in a sport event, according to an exemplary embodiment of the present invention.

Fig. 3-11 are simplified block diagrams schematically illustrating an exemplary implementation scenario of ball tracking in a sport event, according to an exemplary embodiment of the present invention.

Fig. 12 is a simplified block diagram schematically illustrating a non- transitory computer readable medium storing computer executable instructions for performing steps of ball tracking in a sport event, according to an exemplary embodiment of the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present embodiments comprise an apparatus, method, and computer readable medium for ball tracking in a sport event.

With exemplary embodiments of the present invention, a video sequence captured by a camera during a sport event, is used for tracking a trajectory taken by a ball during the sport event. The sport event may include, but is not limited to a sport event such as a tennis match or a tennis training session, a soccer game or a soccer training session, a football match or a football training session, etc.

Based on that tracking of the ball’s trajectory, which tracking is carried out using the video sequence captured by the camera, events that occur during the sport events (say an“Out” or an“In” event in tennis, a handball event in soccer, etc.) may be identified, visualized to a referee, etc., as described in further detail hereinbelow.

Thus, accordingly to some exemplary embodiments of the present invention, during a sport event that takes place in a constrained environment such as a tennis court or a football field, there is received a video sequence that captures movement of a ball during the sport event in a series of video frames.

In one example, the video sequence is captured using a video camera installed in the constrained environment and is streamed live from the video camera to a system that implements one or more of the methods of the present invention, say to an apparatus implemented on a remote computer in communication with the video camera, as described in further detail hereinbelow.

Then, based on received video sequence, there is calculated a plurality of difference-frames. Each one of the difference-frames is calculated over a respective group of two or more of the video frames of the received video sequence.

The difference-frame may be calculated, for example, by subtracting between pixel values of the two or more video frames that make up that group of video frames, by calculating a high order difference over the video frames of the group (say a difference between differences), by applying a predefined formula on pixel values of the video frames of the group, etc., as described in further detail hereinbelow.

Thus, each one of the difference-frames is a video frame that represents a difference among the two or more video frames of the respective group. The difference-frame is accordingly likely to include an image of one or more moving objects, as captured in different positions, in the video frames that make up the received video sequence.

The difference -frame is thus meant to include two or more images of ball, such that each image of the ball as appearing in the difference-frame, has a different position within the frame and represents a different position of the moving ball, but omits at least some of the video sequence’s background elements (say court lines or fences), or a part thereof.

Then, at least two of the calculated difference-frames are combined so as to form a composite frame that represents a trajectory taken by the ball in the movement as a series of images of the ball as captured in the received video sequence, as described in further detail hereinbelow. As a result, the composite frame represents positions of the ball as represented in two or more difference-frames, as described in further detail hereinbelow. Each image of the ball that appears in the composite frame has a different position within the composite frame and represents a different position of the moving ball, as described in further detail hereinbelow.

Optionally, the composite frame is further added one or more background elements omitted from the difference-frames and/or from the composite frame in one or more of the above mentioned steps of calculating the difference-frames and combing the difference-frames, say by multiplying pixels of one of the received frames by a factor (say by 0.1) and adding the thus multiplied frame to the composite frame, as described in further detail hereinbelow.

Thus, essentially, the composite frame resultant upon the steps described hereinabove, is a single frame that represents the trajectory of the ball as a series of images of the ball as captured in the received video sequence, as if the composite frame is created by overlaying a number of the difference-frames, each frame capturing the ball in a different position within the frame.

Optionally, several such composite frames are formed and combined, say in order to create a sequence of video such as a video clip, to illustrate the built-up of the trajectory taken by the ball during the ball’s movement as captured in the video sequence received from the camera, to emphasize certain moments during the ball’s movement, etc.

Optionally, the composite frame is presented to a user such as a referee, for assisting the user in determining occurrence of a predefined event during the sport event. For example, the composite frame may be presented to the user on a screen of a tablet computer or smart phone, for assisting the user (say referee) in determining occurrence of a predefined event (say an“In” or“Out” event) during the sport event, as described in further detail hereinbelow.

Indeed, many referees may find such a composite frame that reveals the ball’s trajectory using realistic images of the ball as captured in video sequence, to be more convincing than an artificially created image that represents the ball’s trajectory calculated using other means of tracking a ball or in a different form.

Optionally, the composite frame is additionally or alternatively, used for automatically determining the occurrence of the predefined event during the sport event, using the composite frame, as described in further detail hereinbelow. The resources needed to generate the composite frame with the exemplary embodiments presented hereinbelow, may be possible with much less computing resources (say computer memory, processing power, etc.), when compared with current Deep Learning methods or other resource-expensive methods.

For example, many methods in current use have relied on re source -expensive object recognition techniques, employed for recognizing a ball which as described hereinabove, may prove to be non-trivial and challenging.

By contrast, with at least some of the exemplary embodiments presented herein, the ball’s trajectory is extracted based on differences calculated among video frames without employing any object recognition technique (say a neuronal network based one) to recognize the ball.

With some of the embodiments, the ball’s trajectory and events based on analysis of the ball’s trajectory, may be identified directly using neuronal networks or other techniques, as described in further detail hereinbelow.

Indeed, with exemplary embodiments of the present invention, at least initially, the ball itself need not be recognized for extracting the ball’s trajectory or for revealing the ball’s trajectory to a referee or other user.

Potentially, the generation of the composite frame may require much less computing resources than multiple-layered neuronal networks, three dimensional (3D) modeling and tracking carried out using multiple cameras and computationally heavy 3D calculations, etc.

Optionally, the composite frame is rather used together with one of the currently known resource-expensive ball tracking methods, say for optimizing the use of the computationally heavy or otherwise resource-expensive method, by limiting the use of the re source -expensive method to moments of interest.

For example, the composite frames may be used to detect an occurrence of an event of interest (say an“In or Out” event) during the sport event, and only upon that detection, computationally heavier calculations of three dimensional modeling and tracking, may be employed, so as to more accurately determine if the event is an“In” event or rather an“Out” event, as described in further detail hereinbelow.

The principles and operation of a method, apparatus and computer readable memory according to the present invention may be better understood with reference to the drawings and accompanying description. Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not limited in its application to the details of construction and the arrangement of the components set forth in the following description or illustrated in the drawings.

The invention is capable of other embodiments or of being practiced or carried out in various ways. Also, it is to be understood that the phraseology and terminology employed herein is for the purpose of description and should not be regarded as limiting.

Reference is now made to Fig. 1, which is a simplified block diagram schematically illustrating an exemplary apparatus for ball tracking in a sport event, according to an exemplary embodiment of the present invention.

An apparatus 1000 for ball tracking in a sport event, according to an exemplary embodiment of the present invention, includes a computer. The computer may be a single computer or a rather, a group of two or more computers in communication over a local area network, over a wide area network (say the internet), over another computer network, etc., or any combination thereof.

The apparatus 1000 communicates with one or more cameras, say with a video camera in communication with the computer of the apparatus 1000, over the internet, over an intranet network, over a local area network, over another network, or any combination thereof, for receiving a video sequence made of video frames captured during a sport event, as described in further detail hereinbelow.

The apparatus 1000 further includes additional parts 110-130 and optionally, other parts too, as described in further detail hereinbelow.

Each one of the parts 110-130 may be implemented as software, as hardware, or as a combination of hardware and software, on the computer, on a device in communication with the computer, etc., as described in further detail hereinbelow.

Thus, one or more of the additional parts 110-130 may be implemented as software - say by programming one or more of the computer’s processor(s) to execute steps of the method described in further detail hereinbelow. Alternatively or additionally, one or more of the additional parts 110-130 may be implemented as hardware - say as one or more electric circuits, or rather as a combination of hardware and software. The apparatus 1000 includes a video sequence receiver 110, say a one implemented on the computer, as software, as hardware (say as an electric circuit implemented on a video card, a communication card, etc., or any combination thereof), etc., as described in further detail hereinabove.

During a sport event that takes place in a constrained environment such as a tennis court or a football field, the video sequence receiver 110 receives a video sequence that captures movement of a ball during the sport event in a series of video frames, as described in further detail hereinbelow.

In one example, the video sequence is captured using a video camera installed in the constrained environment and is streamed live from the video camera to the video sequence receiver 110, as known in the art.

In a second example, the video sequence is captured using a still camera installed in the constrained environment, and is rather made of a series of still frames captured by the still camera during the sport event, which still frames are received in an order that reflects the time in which each still frame is captured.

The apparatus 1000 further includes a difference-frame calculator 120, in communication with the video sequence receiver 110.

The difference -frame calculator 120 calculates a plurality of difference- frames, based on the video sequence received by the video sequence receiver 110, as described in further detail hereinbelow.

The difference-frame calculator 120 calculates each one of the difference- frames over a respective group of two or more of the video frames of the received video sequence, as described in further detail hereinbelow.

The difference-frame calculator 120 may calculate the difference-frame, say by subtracting between pixel values of the two or more video frames of the respective group of video frames, by calculating a high order difference over the group’s video frames (say a difference between differences), by applying a predefined formula on pixel values of the group’s video frames, etc., as described in further detail hereinbelow.

Optionally, the difference-frame calculator 120 further changes a resolution of at least one of the video frames of the group, as described in further detail hereinbelow. Optionally, the difference-frame calculator 120 limits at least a part of the calculating of the difference-frames to a region of interest (ROI) in the video frames of the group (say to pixels within a certain region surrounding a tennis court’s borderline or a soccer gate, as captured in each one of at least some of the frames).

Thus, each one of the difference-frames is a video frame that represents a difference among the respective group’s two or more video frames, and is accordingly likely to include an image of one or more moving objects (particularly, the ball) as captured in different positions, in the video frames that make up the video sequence.

The difference-frame is thus meant to include two or more images of ball (one image per each video frame used for calculating the difference-frame, if the ball appears in that video frame of the sequence), as described in further detail hereinbelow.

Each image of the ball that appears in the difference -frame calculated by the difference-frame calculator 120, has a different position within the calculated difference-frame, and represents a different position of the moving ball.

However, the difference-frame usually omits at least some of the video sequence’s background elements (say court lines, fences, soccer gates, or other elements that do not change or move between the frames of the video sequence received by the difference-frame calculator 120), or a part thereof.

Optionally, for calculating each one of the difference-frames, the difference- frame calculator 120 selects the video frames for the respective group of frames that the difference -frame is to be calculated over, according to a predefined criterion, say according to a time-dependent criterion. The criterion may be defined in advance, say by a programmer or operator of apparatus 1000, as described in further detail hereinbelow.

Thus, in one example, based on a time -dependent criterion, each specific one of the groups includes the most recently received frame of the video sequence (i.e. the last frame received when the specific difference-frame’s calculation over the group’s frames starts), and the video sequence’s frame received two seconds before that last frame.

In a second example, that is based on a different criterion, the received video sequence or a selected part thereof (say the last ten frames of the received video sequence), is stored in a buffer implemented on the computer’s memory, as described in further detail hereinbelow.

In the second example, each specific one of the groups that the difference- frames are calculated over, includes one of the frames that is chosen as a central, reference frame and all frames within a certain distance in the received video sequence, say a distance of two frames from that reference frame (thus the group of the example includes five frames).

Optionally, in the second example, the difference -frame calculator 120 calculates the difference-frames, by deriving a high order difference over the specific group’s video frames, as described in further detail hereinbelow.

In the second example, first, the difference-frame calculator 120 subtracts between each pixel value of the reference frame and corresponding pixel values of each respective one of the frames within the distance (i.e. between values of pixels that have a same position in the frame), to yield a respective difference for each pixel position.

Since the group of the example includes five frames, the difference-frame calculator 120 calculates four differences. A frame’s pixel can bear a positive value only, and therefore, the difference-frame calculator 120 uses the four differences in their absolute value.

Then, the difference-frame calculator 120 calculates a value for each pixel of the difference-frame.

In one version of the second example, the difference-frame calculator 120 calculates the value by averaging over the four differences calculated for the pixel’s position.

In another version of the second example, the difference-frame calculator 120 calculates the value by subtracting between secondary differences, each of which secondary differences in turn, is calculated by the difference-frame calculator 120, by subtracting between a respective, difference pair of the four differences.

The secondary differences values reflect the intensity of changes that occur between the video frames. Thus, potentially, when using such secondary differences, the ball’s acceleration when moving may be emphasized, say by differences in brightness of the ball’s image as appearing in the composite frame, as described in further detail hereinbelow. The apparatus 1000 further includes a composite frame former 130, in communication with the difference-frame calculator 120.

The composite frame former 130 combines two or more of the calculated difference-frames, as described in further detail hereinbelow.

The composite frame former 130 combines the two or more calculated difference-frames, so as to form a composite frame that represents a trajectory taken by the ball in the movement as a series of images of the ball as captured in the received video sequence, as described in further detail hereinbelow.

Each image of the ball as appearing in the formed composite frame has a different position within the composite frame and represents a different position of the moving ball, as described in further detail hereinbelow.

Optionally, the composite frame former 130 further adds at least one of the video frames of the received video sequence to the composite frame, as described in further detail hereinbelow.

As a result, the composite frame former 130 adds one or more background elements omitted from the difference-frames and/or from the composite frame by the difference-frame calculator 120 or by the composite frame former 130 itself (say when combining the difference-frames), back to the composite frame.

Optionally, the composite frame former 130 further multiplies at least some of the values of the pixels of the at least one of the video frames of the received video sequence by a predefined (say by a programmer of apparatus 1000) factor (say by 0.2), and adds the video frame thus multiplied, to the composite frame, for adding at least some of the omitted background elements.

Thus, essentially, the composite frame formed by the composite frame former 130, is a single frame that represents the ball’s trajectory as a series of images of the ball as captured in the video sequence received by the video sequence receiver 110.

The composite frame may thus show an image of the ball in each respective one of a plurality of different positions of the ball along the ball’s trajectory, as described in further detail hereinbelow.

Optionally, the composite frame former 130 further forms a plurality of such composite frames and combines the composite frames, to form a sequence of video, say a video clip, to be used for illustrating the built-up of the trajectory taken by the ball during the ball’s movement as captured in the received video sequence, for emphasizing certain moments during the ball’s movement, etc.

The composite frame former 130 forms each composite frame of the plurality of composite frames by combining a respective group of at least two of the difference- frames calculated by the difference-frame calculator 120.

The composite frame thus formed by the composite frame former 130 represents a trajectory taken by the ball in the movement as a series of images of the ball as captured in the video sequence received by the video sequence receiver 110, as described in further detail hereinbelow.

Optionally, the composite frame former 130 selects the difference-frames used to form the composite frame according to a predefined criterion, say according to a time -dependent criterion. The criterion may be predefined, say in advance of video sequence receiving, for example, by a programmer or operator of apparatus 1000.

Thus, in one example, according to a first exemplary time -dependent criterion, each group of difference-frames that the composite frame former 130 uses for forming a respective composite frame, includes the most recently calculated one of the difference-frames (i.e. the last difference-frame calculated when the specific composite frame’s formation by the composite frame former 130 starts).

The time -dependent criterion of the example further dictates that the group of difference-frames further include the difference-frames calculated one second before, two seconds before, three seconds before, four second before, and five second before that most recently calculated one of the difference-frames.

In a second example too, based on another exemplary time -dependent criterion, each group of difference -frames that the composite frame former 130 uses for forming a respective composite frame, includes the most recently calculated one of the difference-frames (i.e. the last difference-frame calculated when the specific composite frame’s formation by the composite frame former 130 starts).

However, according to the second example’s time-dependent criterion, the group further includes the seven difference-frames calculated immediately before that last difference-frame.

Optionally, when combining the group’s difference-frames to form the compound frame, the composite frame former 130 gives a different weight to different ones of the difference-frames that the composite frame former 130 uses for forming the composite frame.

For example, the composite frame former 130 may apply a different weight factor to each one of the difference-frames that are subjected to the combining.

In one example, for forming the composite frame, the composite frame former 130 gives each difference-frame a different weight, by multiplying each one of at least some of the difference-frame’s pixel values by a weight factor that differs among the difference-frames. The more recent is the difference-frame of the example (and thus, the more recent are the frames that the difference-frame is calculated over), the higher is the weight factor that the difference-frame’s pixel values are multiplied by.

In the example, the difference-frame calculator 120 calculates each one of the difference-frames over a group that includes the most recently received two frames of the video sequence, such that a series of difference-frames is calculated over the received frames of the video sequence, in a sliding window fashion.

More specifically, in the example, a video sequence made of ten video frames is received by the video sequence receiver 110 in real-time, or in near real-time.

During the receiving of the frames of the video sequence of the example by the video data receiver 110, the difference-frame calculator 120 calculates one difference-frame over the most recently received (10^th) frame and the frame received immediately before that frame (i.e. the 9^th frame). However, the difference-frame calculator 120 calculates a second difference -frame a bit earlier, just before that first difference-frame’s calculation, over the 9^th frame and the one received immediately before that 9^th frame (i.e. the 8^th frame). Earlier yet, in the specific example, one difference-frame is calculated by the difference-frame calculator 120 over the 8^th frame and the one received immediately before that 8^th frame (i.e. the 7^th frame), and so forth, thus generating nine difference-frames.

Further in the example, in a sliding window fashion too, the composite frame former 130 combines each group made of the seven most recently calculated ones of the difference-frames, to form a composite frame.

Optionally, as a part of the combining, the composite frame former 130 multiplies pixel values of each difference-frame being combined, by a factor that changes (say exponentially) among the difference-frames, such that the earlier is the difference-frame’s calculation, the smaller is the factor that the difference -frame’s pixel values are multiplied by.

In the example, as a part of that combining, the values of the pixels of the most recently (i.e. 7^th) calculated one of the seven difference-frames are multiplied by 0.3, the values of the pixels of the difference-frame calculated immediately before that one (i.e. the 6^th) are multiplied by 0.2, and the values of the pixels of the remaining five difference-frames (5^th to 1^st), calculated even earlier, are multiplied by 0.1.

Then, the composite frame former 130 combines the difference-frames of the group, to form the composite frame, by adding their multiplied pixel values of a same position, and sets the resultant composite frame’s pixel value for that position to be the sum of the added multiplied pixel values.

As a result, the trajectory of the ball is still presented in the composite frame as a series of images of the ball as captured in the frames of the video sequence. However, due to the different weight factors given to the difference-frames based on the time of their calculation, which time depends on the time of receipt of the most recent one of the frames that the respective difference-frame is calculated over, the trajectory is represented in the composite frame with a“fading out” effect.

Thus, in the example, when the composite frame formed from the difference- frames based on those weight factors, is presented to a user on a screen of a computer or smart phone, the more recent is the position of the ball in the ball’s trajectory, the brighter is the image of the ball representing that position to the user (say referee), as described in further detail hereinbelow.

Optionally, the apparatus 1000 further includes a composite frame presenter (not shown), in communication with the composite frame former 130.

The composite frame presenter presents the composite frame to a user (say a referee), for assisting the user in determining occurrence of a predefined event during the sport event.

Optionally, the composite frame presenter presents the composite frame to the user on a screen of a tablet computer or smart phone, for assisting the user (say referee) in determining the occurrence of the predefined event (say an“In” or“Out” event) during the sport event, as described in further detail hereinbelow.

Indeed, many referees may find such a composite frame that reveals the ball’s trajectory using images of the ball as captured in video sequence, to be more convincing than an artificially created image that represents a ball’s trajectory calculated using other means of tracking a ball.

Optionally, the apparatus 1000 further includes an event determiner (not shown), in communication with the composite frame former 130.

The event determiner determines occurrence of a predefined event automatically, during the sport event, say using the ball’s trajectory revealed by the composite frame, for automatically determining on which side of the court the ball lands when bouncing, as described in further detail hereinbelow.

In a first example, in one of the images of the ball shown in the composite frame, the ball is caught in the very moment of landing, thus making the automatic determining of an“In” or“Out” event a potentially straightforward task.

However, in a second example, for determining if the event occurs, the event determiner further needs to interpolate between two images of the ball as captured in the composite frame, say using one or more rules that may be predefined, say by a programmer or operator of apparatus 1000, as described in further detail hereinbelow.

In the second example, the composite images shows a few images of the ball, and thereby reveals the trajectory taken by the ball, but does not show an image of the ball when actually touching the ground.

Specifically, in the second example, for determining whether the occurrence is of an“Out” event or rather of an“In” event, the event determiner interpolates between two images of the ball, to determine whether the ball lands out of the court, as described in further detail hereinbelow.

Potentially, the resources needed to generate the composite frame with the exemplary embodiments presented hereinbelow, may be possible with much less computing resources (say computer memory, processing power, etc.), when compared with current Deep Learning methods or other resource-expensive methods.

For example, possibly, the generation of the composite frame may require much less computing resources than multiple-layered neuronal networks, three dimensional (3D) modeling and tracking carried out using multiple cameras and computationally heavy 3D calculations, etc.

Optionally, the event’s determiner uses the composite frame together with one of the known in the art resource-expensive ball tracking methods, say for optimizing the use of the computationally heavy or otherwise re source -expensive method, by limiting the use of the resource-expensive method to moments of interest.

For example, the event determiner may use the composite frame to detect an occurrence of an event of interest (say an“In or Our” event) during the sport event, and only upon that detection, use computationally heavier calculations of 3D modeling and tracking, so as to more accurately determine if the event is an“In” event or an“Out” event, as described in further detail hereinbelow.

Reference is now made to Fig. 2, which is a simplified flowchart schematically illustrating an exemplary method of ball tracking in a sport event, according to an exemplary embodiment of the present invention.

A first exemplary method of ball tracking in a sport event, according to an exemplary embodiment of the present invention, may be executed by a computer. The computer may include a single computer, a group of computers in communication over a network, one or more electric circuits, or any combination thereof.

Optionally, for carrying out the first exemplary method, the computer communicates with one or more cameras, say with a video camera, over the internet, over an intranet network, over a local area network, over another network, or any combination thereof, for receiving 210 a video sequence made of video frames captured during a sport event, as described in further detail hereinabove.

Thus, accordingly an exemplary embodiment of the present invention, during a sport event that takes place in a constrained environment such as a tennis court or a football field, there is received 210 a video sequence that captures movement of a ball during the sport event in a series of video frames, say by the video sequence receiver 110 of apparatus 1000, as described in further detail hereinabove.

In one example, the video sequence is captured using a video camera installed in the constrained environment and is streamed live, say over the internet, from the video camera to a computer of apparatus 1000 (i.e. to a remote computer), as described in further detail hereinabove.

In a second example, the video sequence is captured using a still camera installed in the constrained environment, and is rather made of a series of still frames captured by the still camera during the sport event, which still frames are ordered according to the order in which the still frames are captured. Based on the received 210 video sequence, there are calculated 220 a plurality of difference-frames, say by the difference -frame calculator 120 of apparatus 1000 as described in further detail hereinabove. Each one of the difference-frames is calculated 220 over a respective group of two or more of the video frames of the received 210 video sequence .

In one example, each one of the difference-frames is a grayscale frame calculated 220 over two or more grayscale frames of the received 210 video sequence or over two or more grayscale frames that are the result of a step of converting the received 210 video frames into grayscale format, using known in the art techniques.

A processing of grayscale frames rather than of color frames may potentially improve performance and prevent possible color distortion in the composite frames formed 230 using the calculated 220 difference-frames.

Alternatively, the received 210 video frames are color frames, and the difference-frame is calculated 220 over two or more of the frames, i.e. on the color frames.

The difference-frame may be calculated 220, for example, by subtracting between pixel values of the two or more video frames of the respective group of video frames, i.e. between values of pixels that have a same position in the respective video frame, as described in further detail hereinbelow.

The difference-frame may also be calculated 220, by calculating a high order difference over the group’s frames (say a difference between differences, or a temporal numerical derivative scheme of higher order), by applying a predefined formula on pixel values of the group’s frames, etc., as described in further detail hereinbelow.

Optionally, the calculating 220 further includes changing a resolution of at least one of the video frames of the group, as described in further detail hereinbelow.

For example, one of more of the received 210 video sequence’s frame (or a part of the frame) may be decimated - to lower the frame’s resolution, over-sampled and interpolated - to increase the frame’s resolution, etc., as known in the art.

Optionally, at least a part of the calculating 220 is limited to a region of interest (ROI) in the video frames of the group (say to pixels within a certain region surrounding a tennis court’s borderline or a soccer gate, as captured in each one of at least some of the frames), as described in further detail hereinbelow. The ROI may cover any number of the video frame’s pixels - one pixel, twelve pixels, all pixels but certain pixels, all pixels of the frame, etc.

The ROI may actually include two or more ROIs, i.e. be made of two or more separate ROIs selected in the video sequence’s frames according to a criterion such as, for example, proximity to a soccer gate (say the two regions surrounding a socccer field’s two gates, respectively), basket, borderline on different sides of the court, etc.

Thus, each one of the calculated 220 difference-frames is a video frame that represents a difference among the respective group’s two or more video frames. Accordingly, each one of the calculated 220 difference-frames is likely to include two or more images of a moving object (particularly, the ball) as captured in the video frames that make up the received 210 video sequence.

The difference-frame is thus meant to include two or more images of ball (one image per each video frame used for calculating 220 the difference-frame, if the ball appears in that video frame of the sequence), as described in further detail hereinbelow.

Each image of the ball that appears in the calculated 220 difference-frame, has a different position within the calculated 220 frame, and represents a different position of the moving ball. However, the difference-frame usually omits at least some of the video sequence’s background elements (say court lines, fences, soccer gates, or other elements that do not change or move between the frames of the received 210 video sequence), or a part thereof.

Optionally, for calculating 220 each one of the difference-frames, the video frames are selected for the respective group of frames that the difference -frame is to be calculated 220 over, according to a predefined criterion, say according to a predefined time -dependent criterion, as described in further detail hereinabove.

Thus, in one example, based on a time -dependent criterion, each specific one of the groups includes the most recently received 210 frame of the video sequence (i.e. the last frame received 210 when the specific difference-frame’s calculation 220 over frames of that group starts), and the video sequence’s frame received 210 two seconds before that most recently received 210 frame.

In a second example, that is based on a different criterion, the received 210 video sequence or a selected part thereof (say the last ten frames of the received 210 video sequence), is stored in a buffer implemented on a memory of a computer (say a one that is a part of apparatus 1000, as described in further detail hereinabove).

In the second example, each specific one of the groups that the difference- frames are calculated 220 over, includes one of the frames that is chosen as a central reference frame and all frames within a distance of two frames from that central reference frame, in the received 210 sequence (thus making the group a group of five frames).

Optionally, in the second example, the difference-frames are calculated 220 by deriving a high order difference over the specific group’s video frames, as described in further detail hereinabove.

For example, the difference-frame may be calculated 220 by subtracting between each pixel value of the reference frame and values of pixels of each respective one of the frames within the distance, to yield a respective difference for each pixel position (i.e. four differences per each pixel position). Then, a value for each pixel of the difference-frame is calculated 220 by averaging over the four differences that pertain to that pixel’s position, or using another calculation made using the differences, as described in further detail hereinabove.

Further in the method, at least two of the calculated 220 difference-frames are combined 230, say by the composite frame former 130 of apparatus 1000, as described in further detail hereinabove. The two or more calculated 220 difference- frames are combined 230 so as to form a composite frame that represents a trajectory taken by the ball in the movement as a series of images of the ball as captured in the received 210 video sequence, as described in further detail hereinbelow.

In one example, for combining 230 the difference-frames, the value of each pixel of the composite frame is set to be the sum of the combined 230 difference- frames’ pixel values for the corresponding position within the frame. Thus, for example, the composite frame’s pixel positioned second from the left in the third row is set to have a value that is the sum of the values of the difference-frames’ pixels, each one of which pixels is positioned second from the left in the third row.

In a second example, for combining 230 the difference-frames, the value of each pixel of the composite frame is set to be the sum of the squares (or other powers - whether complex or a real, as known in the art) of the combined 230 difference- frames’ pixel values for the corresponding position within the frame. In a third example, for combining 230 the difference-frames, the value of each pixel of the composite frame is set using a polynomial or other formula applied on values of the combined 230 difference-frames’ pixels of a corresponding position within the frame.

Optionally, the formula includes one or more quasi-static coefficients selected so as to change the timescale of the ball’s trajectory, as known in the art.

Optionally, the formula of the third example is predefined by a programmer or operator of apparatus 1000, as described in further detail hereinabove.

Each image of the ball as appearing in the formed 230 composite frame has a different position within the composite frame and represents a different position of the moving ball, as described in further detail hereinbelow.

Optionally, the combining 230 of the difference-frames to form the composite frame, further includes adding at least one of the video frames of the received 210 video sequence to the composite frame, say by the composite frame former 130, as described in further detail hereinabove.

As a result, there is added one or more background elements omitted from the difference-frames and/or from the composite frame in one of the above mentioned steps of calculating 220 and combing 230, to the composite frame, as described in further detail hereinbelow.

Optionally, the combining 230 of the difference-frames to form the composite frame, further includes multiplying at least some of the values of the pixels of the at least one of the video frames of the received 210 video sequence by a predefined (say by a programmer of apparatus 1000) factor. The video frame thus multiplied, is then added to the composite frame, for adding at least some of the omitted background elements.

Thus, essentially, the composite frame formed 230 in that combining 230, is a single frame that represents the ball’s trajectory as a series of images of the ball as captured in the received 210 video sequence.

Thus, optionally, the manner in which the composite frame is created 230 may resemble an overlaying of all or some of the calculated 220 difference-frames, each of which difference -frames captures the ball in a different position within the difference- frame, to form 230 a single layer that shows an image of the ball in each respective one of the different positions, as described in further detail hereinbelow. Optionally, a plurality of such composite frames is formed 230 and combined, to form a sequence of video, say a video clip, to illustrate the built-up of the trajectory taken by the ball during the ball’s movement as captured in the received 210 video sequence, to emphasize certain moments during the ball’s movement, etc.

Each composite frame of the plurality is formed 230 by combining 230 a respective group of at least two of the calculated 220 difference-frames, and represents a trajectory taken by the ball in the movement as a series of images of the ball as captured in the received 210 video sequence.

Optionally, the difference-frames used to form 230 the composite frame are selected according to a predefined criterion, say according to a time-dependent criterion. The criterion may be predefined, for example, by a programmer or operator of apparatus 1000, as described in further detail hereinabove.

Thus, in one example, according to a first exemplary time -dependent criterion, each group of difference-frames that is used for forming 230 a respective composite frame, includes the most recently calculated 220 one of the difference -frames (i.e. the last difference -frame calculated 220 when the specific composite frame’s formation 230 starts). The time-dependent criterion of the example further dictates that the group further include the difference-frames calculated 220 one second before, two seconds before, three seconds before, four second before, and five second before that most recently calculated 220 one of the difference -frames.

In a second example, based on another exemplary time -dependent criterion, each group of calculated 220 difference-frames used for forming 230 a respective composite frame includes the most recently calculated 220 one of the difference- frames (i.e. the last difference-frame calculated 220 when the specific composite frame’s generation 230 starts). According to the second example’s time-dependent criterion, the group further includes the seven difference -frames calculated 220 immediately before that last difference-frame.

Optionally, when combining 230 the group’s difference-frames to form the compound frame, there is given different weight to different ones of the difference- frames that are used for forming 230 the composite frame.

For example, the step of combining 230 may include applying a different weight factor to each one of the difference-frames that are subjected to the combining 230, as described in further detail hereinabove. In the example, for forming 230 the composite frame, each difference -frame is given a different weight by multiplying each one of at least some of the difference - frame’s pixel values by a weight factor that differs among the difference-frames. The more recent is the difference-frame of the example (and thus, the most recent are the frames that the difference -frame is calculated 220 over), the higher is the weight factor that the difference-frame’s pixel values are multiplied by.

In the example, each difference-frames is calculated 220 over a group that includes the most recently received 210 two frames of the video, such that a series of difference-frames is calculated 220 over the received 210 frames of the video sequence, in a sliding window fashion. Whenever a new video frame of the video sequence is received 210, the sliding video is updated in a first-in first-out fashion.

More specifically, in the example, a video sequence made of ten video frames is received 210 in real time or in near real time.

While receiving 210 the frames of the video sequence of the example, one difference-frame is calculated 220 over the most recently received 210 (10^th) frame and the frame received 210 immediately before that frame (i.e. the 9^th frame). A second difference-frame is calculated 220 earlier, just before that first difference- frame’s calculation 220, over the 9^th frame and the one received 210 immediately before that 9^th frame (i.e. the 8^th frame). Earlier in the specific example, one difference-frame is calculated 220 over the 8^th frame and the one received 210 immediately before that 8^th frame (i.e. the 7^th frame), and so forth, thus generating nine difference-frames.

Further in the example, in a sliding window fashion too, each group made of the seven most recently calculated 220 ones of the difference-frames is combined 230, to form 230 a composite frame.

Optionally, as a part of the combining 230, values of pixels of each difference- frame being combined 230, are multiplied by a factor that changes (say exponentially) among the difference-frames, such that the earlier is the difference-frame’s calculation 220, the smaller is the factor that the difference-frame’s pixel values are multiplied by.

In the example, as a part of that combining 230, the values of the pixels of the most recently (i.e. 7^th) calculated 220 one of the seven difference-frames are multiplied by 0.3 and the values of the pixels of the difference-frame calculated 220 immediately before that one (i.e. the 6^th) are multiplied by 0.2. Further in the example, the values of the pixels of the remaining five difference-frames (5^th to 1^st), calculated 220 even earlier, are multiplied by 0.1. The difference-frames of the group are then combined 230, to form 230 the composite frame, by adding their multiplied pixel values of a same position, to set 230 the composite frame’s pixel value for that position.

As a result, the trajectory of the ball is still presented in the composite frame as a series of images of the ball as captured in the frames of the video sequence. However, due to the different weight factors given to the difference-frames based on the time of their calculation 220, which time depends on the time of receipt 210 of the most recent one of the frames that the respective difference -frame is calculated 220 over, the trajectory is represented in the composite frame with a“fading out” effect.

Thus, in the example, when the composite frame formed 230 from the difference-frames based on those weight factors, is presented to a user on a screen of a computer or smart phone, the more recent is the position of the ball in the ball’s trajectory, the brighter is the image of the ball representing that position to the user (say referee), as described in further detail hereinbelow.

Optionally, the criterion for selecting the calculated 220 difference-frames for the group of difference-frames that are combined 230 to form 230 the composite frame is dynamic, say adaptive, as described in further detail hereinbelow.

Thus, in one example, initially, the criterion dictates that the composite frame is to be formed 230 by combining difference-frames calculated 220 once in every two frames of the received 210 video sequence. However, later on, the criterion is updated (say by an operator of apparatus 1000, or rather automatically - say randomly), so as to dictate that the composite frame is to be formed 230 by combining difference- frames calculated 220 once in every three frames of the received 210 video sequence.

In a second example, initially, the criterion dictates that the composite frame is to be formed 230 by combining difference-frames calculated 220 once in every two seconds. However, later on, the criterion that in the second example, is adaptive, is automatically updated due to changing lighting conditions, so as to dictate that the composite frame is to be formed 230 by combining difference-frames calculated 220 once in every three seconds. Optionally, in the exemplary method, there are formed 230 two or more composite frames simultaneously, and there may be used a different criterion for selecting the group of difference-frames to be combined 230 to form each respective one of the composite frames.

Thus, in one example, a first composite frame is formed 230 by combining difference-frames, each of which difference-frames is calculated 220 over different two of the received 210 video sequence’s frames. Simultaneously to the forming of the first composite sample, a second composite frame is formed 230 by combining difference-frames, each of which difference-frames is calculated 220 over different three of the received 210 video sequence’s frames. Optionally, as a result, the ball’s trajectory may be represented in two or more composite frames, such that each one of the composite frames represents the ball’s trajectory with a different time scale.

In a second example, the simultaneously formed 230 composite frames may additionally or alternatively, differ in the region of interest (ROI). In one case, the first composite frame is formed 230 using difference-frames calculated 220 by subtracting pixel values within a first ROI - say around a first soccer gate, while a second composite frame is formed 230 using difference-frames calculated 220 by subtracting pixel values within a second ROI - say around a second soccer gate.

The ROI may cover any number of the difference frame’s pixels - one pixel, twelve pixels, all pixels but certain pixels, etc. The ROI may actually include two or more ROIs, i.e. be made of two or more separate ROIs selected in the difference- frame according to a criterion such as, for example, a criterion pertaining to a specific part of the court, etc.

In a third example, the calculated 220 difference-frames that are combined 230 to form 230 the composite frames do not differ in their ROI.

However, a first composite frame is formed 230 based on a calculation run on a first ROI within the difference-frames combined 230 to form the composite frame, whereas a second composite frame is formed 230 based on a calculation run on a second ROI within the difference-frames combined 230 to form the composite frame.

In a forth example, the composite frame is formed 230 by combining one of the calculated 220 difference frames and using that frame as a central difference- frame (say a one selected by an operator of apparatus 1000), and an equal number (say two) of difference-frames calculated 220 before and after that difference-frame. Optionally, the method further includes a step of presenting the composite frame to a user (say a referee), for assisting the user in determining occurrence of a predefined event during the sport event, say by the composite frame presenter of apparatus 1000, as described in further detail hereinabove.

Optionally, the composite frame is presented to the user on a screen of a tablet computer or smart phone, for assisting the user (say referee) in determining occurrence of a predefined event (say an“In” or“Out” event) during the sport event, as described in further detail hereinabove.

Indeed, many referees may find such a composite frame that reveals the ball’s trajectory using images of the ball as captured in video sequence, to be more convincing than an artificially created image that represents the ball’s trajectory that is calculated using other means of tracking a ball.

Optionally, the method further includes a step of automatically determining occurrence of a predefined event during the sport event, say using the ball’s trajectory revealed by the composite frame, for automatically determining on which side of the court the ball lands when bouncing.

In one example, in one of the images of the ball shown in the composite frame, the ball is caught in the very moment of landing, thus making the automatic determining of an“In” or“Out” event a potentially straightforward task.

However, in other cases, for determining if the event occurs, the method may need to interpolate between two images of the ball as captured in the composite frame, based on one or more predefined rules, as described in further detail hereinabove.

Thus, in one example, the composite image shows a few images of the ball, and thereby reveals the trajectory taken by the ball, but does not show an image of the ball when actually touching the ground. In the example, an occurrence of an“Out” or “In” event may still be determined automatically, say by interpolating between two images of the ball, to determine whether the ball lands out of the court, as described in further detail hereinbelow.

Using the method steps described hereinabove, trajectories of one or more other moving objects such as a player, a tennis racket, etc., may also represented in the composite frame, say as multiple images of the racket as captured in the video sequence when moving to hit the ball. As a result, for example, the trajectory of the ball may be used together with the trajectory of tennis racket to determine occurrence of a tennis“Topspin” event.

Optionally, the method further employs other techniques, to determine an occurrence of the predefined event, say gesture recognition techniques applied on player’s gestures and actions (say a gatekeeper’s jumping or a player’s raising her thumb) as captured in the received 210 video sequence, etc., as known in the art.

Optionally, the composite frame is rather used together with one of the resource-expensive ball tracking methods, say for optimizing the use of the computationally heavy or otherwise re source -expensive method, by limiting the use of the resource-expensive method to moments of interest.

In one example, the composite frames may be used to detect an occurrence of an event of interest (say an“In or Our” event) during the sport event, and only upon that detection, computationally heavier calculations of three dimensional modeling and tracking, may be employed, so as to more accurately determine if the event is an “In” or“Out”, as described in further detail hereinbelow.

In a second example, at least a part of the computationally heavy calculations that are typical to deep learning methods, are run on the composite frames rather than on the frames of the originally received 210 video sequence.

Since each one of the composite frames may be formed 230 from multiple difference-frames, and each difference-frame may be calculated 220 over multiple frames of the originally received 210 video sequence, the composite frame represents events captured in several video frames in a single frame.

As a result, when compared with the received 210 video frames, the composite frames are much fewer in number, and the computationally heavy calculations’ parts that are run on the composite frames need to process much fewer frames, thus saving computer resources in their offline stage, online stage, or both stages

Further, a neuronal network run on the composite frames, may be able to identify ball trajectories using the composite frames, and automatically identify a predefined event based on the identified trajectories, as apposed to neuronal networks that if run on the video sequence’s frames as captured, would be employed merely for identifying the ball itself in the frames (leaving trajectory identifying to computationally heavy subsequent steps).

Optionally, the composite frames may also be used for downsampling the video frames used for 3D analysis, for example, by processing only frame regions of interest (ROI) that correspond to an area within a predefined (say by a programmer of apparatus 1000) distance from the ball’s trajectory or specific position.

As a result, for example, a 3D tracking of the ball’s movements during an online stage of such 3D analysis, carried out in real time (or near real time) as the ball moves during the sport event, may prove much more efficient in as far as the use of computing resources is concerned.

Optionally, one or more of the steps 210-230 of the exemplary method may further employ one or more other image processing methods, as known in the art. The other image processing methods may include, but are not limited to, for example:

- Transformation using synthetic radial distortion, affine or projective transformation, general transformation methods, etc., as known in the art. For example, such transformation may be employed to change the timescale of the ball’s trajectory to a one of slower movement (i.e. a slowdown effect), etc., as known in the art.

- Color Fitting, say for improving contrast.

- Visual Stabilization of video frames.

- Contrast Stretching.

- De -blurring (say using a point spread function, as known in the art).

- Other filters, as known in the art.

Optionally, the composite frame is formed 230 from difference-frames calculated 220 during long periods of the received 210 video sequence and even over all difference-frames calculated 220 over the received 210 video sequence. In one example, a weight factor of a changing value is used for each difference-frame combined 230 to form the composite frame, say a weight factor decaying exponentially or rather non-exponentially. In the example, the earlier is the calculation 220 of the difference-frame, the smaller is the weight factor that the difference-frame’s pixel values are multiplied by during combining 230, as described in further detail hereinabove.

In a second example, the composite frame is formed 230 based on all difference-frame calculated 220 before the composite frame is formed 230, and whenever a new difference-frame is calculated 220 (say when a new video frame is received 210 and used for that calculation 220), the composite frame is updated 230.

In the second example, for forming 230 the updated 230 composite frame, each pixel value of the thus newly formed 230 composite frame is set to a value that is the sum of the product of the previous value of that pixel and a first factor and the product of the new difference -frame’s pixel value of a same position within the frame and a second factor. Optionally, the first factor, the second factor, or both factors, are dynamically updated, say by recursively lowering the first factor whenever a new updated 230 composite frame is formed 230, so as to give a gradually decreasing weight to older difference -frames.

Optionally, one or more of the factors and coefficients mentioned hereinabove may be dynamically updated, whether randomly or deterministically, adaptively (say according to a time point of the video sequence, or according to a changing quality of the received 210 frames) or not, etc., as described in further detail hereinabove.

Reference is now made to Fig. 3-11, which are simplified block diagrams schematically illustrating an exemplary implementation scenario of ball tracking in a sport event, according to an exemplary embodiment of the present invention.

In one exemplary scenario, during a sport event (say tennis match), on a partly cloudy day, a video camera installed on a tennis court captures a ball 301 hit by a racket (not shown), as the ball approaches an area surrounding the court’s borderline 302.

The video camera captures a video sequence that includes the video frames 3003-3006 illustrated in Fig. 3-6. The frames 3003-3006 capture the moving ball 301, the court borderline 302, and other objects, say a cloud 303 and a sun 304. In each one of the frames 3003-3006, the ball 301 is captured in a different position. In the first exemplary scenario, during the receiving of the video sequence, say by the video sequence receiver 110 of apparatus 1000, there is calculated a difference- frame 3007-3008 over each respective pair of frames of the video sequence received by the video sequence receiver 110.

In this first exemplary scenario, the difference-frames 3007-3008 are calculated over the first and second frames 3003-3004, and third and fourth frames 3005-3006, respectively.

However, in other exemplary implementation scenarios of exemplary embodiments of the present invention, the video frames for calculating each respective difference-frame over, may be selected differently, as described in further detail hereinabove.

For example, in a second exemplary implementation scenario that implements a sliding window, first in - first out approach, a first difference-frame is calculated over the first and second frames, a second difference -frame is calculated over the second and third frames, etc., as described in further detail hereinabove.

In the first exemplary implementation scenario, when in receipt of the second video frame 3004, there is calculated a first difference-frame 3007, as illustrated in Fig. 7.

In the first scenario, the first difference-frame 3007 is calculated between the first two frames 3003-3004 of the video sequence, say by subtracting between pixel values of a same position within the frames 3003, 3004, as described in further detail hereinabove. Thus, for example, the difference-frame’s 3007 second row, first column pixel is set with a value that is the absolute value of the result of subtracting the value of second frame’s 3004 pixel positioned in the frame’s 3004 second row, first pixel from the value of first frame’s 3003 pixel of the frame’s 3003 second row, first pixel.

As illustrated in Fig. 7, the first difference-frame 3007 shows the moving ball 301 in two positions, but omits object like the borderline 302, sun 304 and cloud 303 that do not move between the frames 3003-3004 that the first difference-frame 3007 is calculated over.

In the first scenario, when in receipt of the fourth video frame 3006, there is similarly calculated a second difference-frame 3008, as illustrated in Fig. 8.

As illustrated in Fig. 8, the second difference-frame 3008 shows the moving ball 301 in two newer positions, but omits the borderline 302, sun 304 and cloud 303 that do not move between the frames 3005-3006 that the second difference-frame 3008 is calculated over, say by subtracting between the frames’ 3005-3006 pixel values, as described in further detail hereinabove.

Then, the two difference-frames 3007-3008 are combined, say by the composite frame former 130 of apparatus 1000, to form a first composite frame 3009, as illustrated in Fig. 9.

The first composite frame 3009 represents the trajectory taken by the ball 301 during the ball’s movement during the sport event, as a series of images of the ball 301, as captured in the frames 3003-3006 of the received video sequence.

The composite frame former 130 further adds one of the frames of the received video - say a one that captures the borderline 302 even before the ball is hit, to form a final composite frame 3010 that shows the borderline 302 too. The final composite frame 3010 may thus be used to determine an occurrence of an event such as a“In” or“Out” event, as illustrated in Fig. 10.

In the example of the instant scenario, the final composite frame 3010 clearly shows the ball’s 301 landing within the borders of court, i.e. to the right of the borderline 302, thus allowing a user (say referee) or an automatic system (say the event determiner), to quite confidently determine that the event is not an“Out” event.

However, in another example, a final composite frame 3011 alone, though showing that the ball bounces, does not show the ball’s 301 landing itself, and therefore, does not allow a user (say referee) or an automatic system (say the event determiner of apparatus 1000), to determine that the event is not an“Out” event, as illustrated in Fig. 11.

However, the ball’s 301 trajectory may be interpolated between images of the ball 301 as presented on the composite image 3011, to determined if the ball 301 lands within the borders of court, i.e. to the right of the borderline 302, as described in further detail hereinabove.

Thus, in one example, according to a rule predefined by a programmer or operator of apparatus 1000, an assumption of linearity near the ball’s bouncing position (say by assuming that the ball’s 301 velocity does not change the ball’s direction significantly during a short time period of say the 0.02 second between frames of a 50 fps video, as known in the art) is employed. In the example, the landing position of the ball is determined simply by intersecting between two lines, as illustrated in Fig. 11. One line is drawn by connecting the images of the ball 301 that are to the right of the borderline 302, whereas the second line is drawn by connecting the images of the ball 301 that are to the left of the borderline 302.

In the example illustrated using Fig. 11, based on that interpolated landing position, the user (say referee) or automatic system (say the event determiner of apparatus 1000), determines that the event is not an“Out” event.

Reference is now made to Fig. 12, which is a simplified block diagram schematically illustrating a non-transitory computer readable medium storing computer executable instructions for performing steps of ball tracking in a sport event, according to an exemplary embodiment of the present invention.

According to an exemplary embodiment of the present invention, there is provided a non-transitory computer readable medium 12000, such as a CD-ROM, a USB-Memory, a Portable Hard Disk, etc.

The computer readable medium 12000 stores computer executable instructions, for performing steps of ball tracking in a sport event.

The computer executable instructions may be executed by a computer. The computer may include a single computer, a group of computers in communication over a network, one or more electric circuits, or any combination thereof, as described in further detail hereinabove.

Optionally, for executing the instructions, the computer communicates with one or more cameras, say with a video camera, over the internet, over an intranet network, over a local area network, over another network, or any combination thereof, as described in further detail hereinabove.

Thus, the computer executable instructions include a step in which, during a sport event that takes place in a constrained environment such as a tennis court or a football field, there is received 1210 a video sequence that captures movement of a ball during the sport event in a series of video frames, as described in further detail hereinabove.

In one example, the video sequence is captured using a video camera installed in the constrained environment and is streamed live from the video camera to computer on which the step of receiving 1210 the video sequence is executed, as described in further detail hereinabove.

In a second example, the video sequence is captured using a still camera installed in the constrained environment, and is rather made of a series of still frames captured by the still camera during the sport event, which still frames are received 1210 in the order in which the still frames are captured.

The computer executable instructions further include a step in which, based on received 1210 video sequence, there are calculated 1220 a plurality of difference- frames. Each one of the difference-frames is calculated 1220 over a respective group of two or more of the video frames of the received 1210 video sequence, as described in further detail hereinabove.

The difference-frame may be calculated 1220, for example, by subtracting between pixel values of the two or more video frames of the respective group of video frames, by calculating a high order difference over the group’s video frames (say a difference between differences), by applying a predefined formula on pixel values of the group’s video frames, etc., as described in further detail hereinabove.

Optionally, the calculating 1220 further includes changing a resolution of at least one of the video frames of the group, as described in further detail hereinabove.

Optionally, at least a part of the calculating 1220 is limited to a region of interest (ROI) in the video frames of the group (say to pixels within a certain region surrounding a tennis court’s borderline or a soccer gate, as captured in each one of at least some of the frames), as described in further detail hereinabove.

Thus, each one of the calculated 1220 difference-frames is a video frame that represents a difference among the respective group’s two or more video frames, and is accordingly likely to include an image of one or more moving objects (particularly, the ball) as captured in different positions, in the video frames that make up the received 1210 video sequence.

The difference-frame is thus meant to include two or more images of ball (one image per each video frame used for calculating 1220 the difference-frame, if the ball appears in that video frame of the video sequence), as described in further detail hereinabove.

Each image of the ball that appears in the calculated 1220 difference -frame, has a different position within the calculated 1220 frame, and represents a different position of the moving ball. However, the difference-frame usually omits at least some of the video sequence’s background elements (say court lines, fences, soccer gates, or other elements that do not change or move between the frames of the received 1210 video sequence), or a part thereof.

Optionally, the executable instructions for calculating 1220 each one of the difference-frames, include selecting the video frames for the respective group of frames that the difference-frame is to be calculated 1220 over, according to a predefined (say time -dependent) criterion, as described in further detail hereinabove.

Thus, in one example, based on a time -dependent criterion, each specific one of the groups includes the most recently received 1210 frame of the video sequence (i.e. the last frame received 1210 when the specific difference-frame’s calculation 1220 over frames of that group starts), and the video sequence’s frame received 1210 two seconds earlier.

In a second example, that is based on a different criterion, the received 1210 video sequence or a selected part thereof (say the last ten frames of the received 1210 video sequence), is stored in a buffer implemented on a computer memory, as described in further detail hereinabove.

In the second example, each specific one of the groups that the difference- frames are calculated 1220 over, includes one of the frames that is chosen as central reference frame and all frames within a distance of two frames from that central reference frame (thus making the group a group of five frames).

Optionally, in the second example, the difference-frames are calculated 1220 by deriving a high order difference over the specific group’s video frames, as described in further detail hereinabove.

In one example, the computer executable instructions calculate 1220 the difference-frame by subtracting between values of pixels of the reference frame and values of pixels of each respective one of the frames within the distance, to yield a respective difference for each pixel position (i.e., to yield four differences).

Optionally, each one of the four differences is used in its absolute value since pixels may bear a positive value only. Then, the instructions calculate 1220 a value for each pixel of the difference-frame, by averaging over the four differences or using another calculation made using the differences, as described in further detail hereinabove. The computer executable instructions further include a step of combining 1230 at least two of the calculated 1220 difference-frames, as described in further detail hereinabove. In the step 1230, the two or more calculated 1220 difference-frames are combined 1230, to form a composite frame that represents a trajectory taken by the ball in the movement as a series of images of the ball as captured in the received 1210 video sequence, as described in further detail hereinabove.

Each image of the ball as appearing in the composite frame formed in the step of combining 1230, has a different position within the composite frame and represents a different position of the moving ball, as described in further detail hereinabove.

Optionally, the step of combining 1230 the difference-frames to form the composite frame further includes adding at least one of the video frames of the received 1210 video sequence to the composite frame, as described in further detail hereinabove.

As a result, there is added one or more background elements omitted from the difference-frames and/or from the composite frame in one of the above mentioned steps of calculating 1220 and combing 1230, to the composite frame, as described in further detail hereinabove.

Optionally, the step of combining 1230 of the difference-frames to form the composite frame further includes multiplying at least some of the values of the pixels of the at least one of the video frames of the received 1210 video sequence by a predefined factor. The step 1230 further includes adding the video frame thus multiplied, to the composite frame. As a result, at least some of the omitted background elements may be added to the composite frame.

Thus, essentially, the composite frame formed in the step of combining 1230, is a single frame that represents the ball’s trajectory as a series of images of the ball as captured in the received 1210 video sequence.

Thus, optionally, the manner in which the composite frame is created 1230 may resemble an overlaying of all or some of the calculated 1220 difference-frames. Each one of the calculated 1220 difference-frames captures the ball in a different position within the difference -frame, and the calculated 1220 difference-frames are thus combined 1230 to form a single layer that shows an image of the ball in each respective one of the different positions, as described in further detail hereinabove. Optionally, the step of combining 1230 the difference-frames to form the composite frame further includes forming a plurality of such composite frames and combining the composite frames of the plurality, to form a sequence of video, say a video clip. The video sequence formed through that combining of the plurality of composite frames, may serve to illustrate the built-up of the trajectory taken by the ball during the ball’s movement as captured in the received 1210 video sequence, to emphasize certain moments during the ball’s movement, etc.

Each composite frame of the plurality is formed 1230 by combining 1130 a respective group of at least two of the calculated 1220 difference-frames, and represents a trajectory taken by the ball in the movement as a series of images of the ball as captured in the received 1210 video sequence, as described in further detail hereinabove.

Optionally, further in the step of combining 1230, the difference-frames used to form 1230 the composite frame are selected according to a predefined criterion, say according to a time-dependent criterion. The criterion may be predefined, say by a programmer or operator of apparatus 1000, as described in further detail hereinabove.

Thus, in one example, according to a first exemplary time-dependent criterion, each group of difference-frames that is used for forming 1230 a respective composite frame, includes the most recently calculated 1220 one of the difference-frames (i.e. the last difference-frame calculated 1220 when the specific composite frame’s generation 1230 starts). The time -dependent criterion of the example further dictates that the group further include the difference-frames calculated 1220 one second before, two seconds before, three seconds before, four second before, and five second before that most recently calculated 1220 one of the difference-frames.

In a second example, based on another exemplary time -dependent criterion, each group of difference-frames used for forming 1230 a respective composite frame, includes the most recently calculated 1220 one of the difference-frames (i.e. the last difference-frame calculated 1220 when the specific composite frame’s formation 1230 starts). According to the second example’s time -dependent criterion, the group further includes the seven difference-frames calculated 1220 immediately before that last difference-frame.

Optionally, in the step of combining 1230 the group’s difference-frames to form the compound frame, there is given different weight to different ones of the difference-frames that are used for forming 1230 the composite frame, as described in further detail hereinabove.

For example, the step of combining 1230 may include applying a different weight factor to each one of the difference-frames that are subjected to the combining 1230, as described in further detail hereinabove.

In one example, earlier calculated 1220 ones of the difference-frames are given less weight, as described in further detail hereinabove. As a result, the trajectory of the ball is still represented in the composite frame as a series of images of the ball as captured in the frames of the received 1210 video sequence. However, due to the different weight factors given to the difference-frames, the trajectory is represented in the composite frame with a “fading out” effect, as described in further detail hereinabove.

Thus, in the example, when the composite frame formed 1210 from the difference-frames based on those weight factors, is presented to a user on a screen of a computer or smart phone, the more recent is the position of the ball in the ball’s trajectory, the brighter is the image of the ball representing that position to the user (say referee), as described in further detail hereinabove.

Optionally, the computer executable steps further include a step of presenting the composite frame to a user (say a referee), say on a computer screen, for assisting the user in determining occurrence of a predefined event (say an“In” or“Out” event) during the sport event, as described in further detail hereinabove.

Optionally, the computer executable steps further include a step of automatically determining occurrence of a predefined event during the sport event, say using the ball’s trajectory revealed by the composite frame, for automatically determining on which side of the court the ball lands when bouncing.

Optionally, the step of determining the occurrence of the event, further includes interpolating between two images of the ball as captured in the composite frame, based on one or more predefined rules, as described in further detail hereinabove.

Optionally, in the step of automatically determining occurrence of the predefined event, the composite frame is used together with one of the resource- expensive ball tracking methods described in further detail hereinabove, so as to limit the use of the re source -expensive method to moments of interest. For example, the composite frames may be used to detect an occurrence of an event of interest (say an“In or Our” event) during the sport event, and only upon that detection, computationally heavier calculations of three dimensional modeling and tracking, may be employed, so as to more accurately determine if the event is an“In” or“Out” event, as described in further detail hereinabove.

It is expected that during the life of this patent many relevant devices and systems will be developed and the scope of the terms herein, particularly of the terms “Computer”,“Camera”,“Video”,“Image”,“Frame”,“CD-ROM”, a“USB-Memory”, a“Hard Disk Drive (HDD)”, and“Solid State Drive (SSD)”, is intended to include all such new technologies a priori.

It is also noted that although specific embodiments described hereinabove relate to a ball moving during a sport event, it is evident that many alternatives, modifications and variations to that specific moving ball embodiments will be apparent to those skilled in the art. Specifically, embodiments in which the methods of the presented embodiments are rather applied to an object that is different from a ball, such as a frisbee, a discus, a tennis racket, or any other object in move during a sport event, are also included herein.

It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination.

Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims. All publications, patents and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention.

Claims

WHAT IS CLAIMED IS:

1. A method of ball tracking in a sport event, the method comprising computer-executed steps of:

receiving a video sequence capturing movement of the ball during the sport event in a series of video frames;

calculating a plurality of difference-frames, each difference-frame being calculated over a respective group of at least two of the video frames of the received video sequence; and

combining at least two of the calculated difference-frames, to form a composite frame representing a trajectory taken by the ball in the movement as a series of images of the ball as captured in the received video sequence.

2. The method of claim 1, wherein said combining further comprises adding at least one of the video frames of the received video sequence to the composite frame.

3. The method of claim 1, wherein said combining further comprises multiplying at least some of the pixels of one of the video frames of the received video sequence by a predefined factor and adding the multiplied video frame to the composite frame.

4. The method of claim 1, further comprising forming a plurality of composite frames, each composite frame of the plurality being formed by combining a respective group of at least two of the calculated difference-frames and representing a trajectory taken by the ball in the movement as a series of images of the ball as captured in the received video sequence.

5. The method of claim 1, further comprising presenting the composite frame to a user, for assisting the user in determining occurrence of a predefined event during the sport event.

6. The method of claim 1, further comprising automatically determining occurrence of a predefined event during the sport event, using the composite frame.

7. The method of claim 1, further comprising applying a different weight factor to each one of the difference-frames in said combining.

8. The method of claim 1, wherein said calculating is carried out by subtracting between pixel values of the video frames of the group.

9. The method of claim 1, wherein said calculating is carried out by calculating a high order difference over pixel values of the video frames of the group.

10. The method of claim 1, wherein said calculating is carried out by applying a predefined formula on pixel values of the video frames of the group.

11. The method of claim 1, wherein said calculating further comprises changing a resolution of at least one of the video frames of the group.

12. The method of claim 1, wherein said calculating further comprises limiting at least a part of said calculating to a region of interest in the video frames of the group.

13. The method of claim 1, wherein said combining further comprises limiting at least a part of said combining to a region of interest in at least two of the difference- frames.

14. The method of claim 1, further comprising selecting the video frames for each respective one of said groups according to a predefined criterion.

15. The method of claim 1, further comprising selecting the video frames for each respective one of said groups according to a dynamic criterion.

16. The method of claim 1, further comprising selecting the video frames for each respective one of said groups according to a time -dependent criterion.

17. The method of claim 1, further comprising selecting the difference-frames for said combining according to a predefined criterion.

18. The method of claim 1, further comprising selecting the difference-frames for said combining according to a dynamic criterion.

19. The method of claim 1, further comprising selecting the difference-frames for said combining according to a time-dependent criterion.

20. An apparatus for ball tracking in a sport event, the apparatus comprising: a computer processor;

a video receiver implemented on said computer processor, configured to receive a video sequence capturing movement of the ball during the sport event in a series of video frames;

a difference-frame calculator, in communication with said video receiver, configured to calculate a plurality of difference-frames, each difference-frame being calculated over a respective group of at least two of the video frames of the received video sequence; and

a composite frame former, in communication with said difference-frame calculator, configured to combine at least two of the calculated difference-frames, to form a composite frame representing a trajectory taken by the ball in the movement as a series of images of the ball as captured in the received video sequence.

21. A non-transitory computer readable medium storing computer executable instructions for performing steps of ball tracking in a sport event, the steps comprising:

calculating a plurality of difference-frames, each difference-frame being calculated over a respective group of at least two of the video frames of the received video sequence; and combining at least two of the calculated difference-frames, to form a composite frame representing a trajectory taken by the ball in the movement as a series of images of the ball as captured in the received video sequence.