GB2565999A

GB2565999A - System for producing video recordings

Info

Publication number: GB2565999A
Application number: GB1707789.2A
Authority: GB
Inventors: Antony Clark Roger
Original assignee: Individual
Current assignee: Individual
Priority date: 2017-05-15
Filing date: 2017-05-15
Publication date: 2019-03-06
Also published as: GB201707789D0

Abstract

A method of automatically assembling video highlights using fixed cameras is disclosed, where the cameras each view a fixed zone of interest and monitor the zone of interest for indicators of potentially interesting events. The timestamps of potentially interesting events are identified and used to assemble the highlights. The video highlights are of sporting events such as football or basketball. The zone of interest may be a goal or a basket. The indicators may be movement of players, trajectory of a ball, the relative sizes of objects, the presence of a person or a ball, speed of movement of objects, audio or a combination of these. Zones of interest may be determined by a person, computer image analysis, or by a supervised machine learning module. The supervised machine learning module may identify the potentially interesting events, where the assembled highlights are assigned user feedback scores, the results being returned to the supervised machine learning module. The score may be determined by the number of comments on social media. The advantage of the invention is the automation of the production of video highlights making it suitable for amateur sporting events.

Description

SYSTEM FOR PRODUCING VIDEO RECORDINGS

The present invention relates to the production of video recordings, particularly video recordings of parts of a sports match.

BACKGROUND TO THE INVENTION

Many sports matches are video recorded. For professional sports events of wide public interest, this is done commercially for the purposes of broadcasting the recording, often alongside a commentary. It is well known also to produce recordings not of the whole match, but of just selected highlights. In this way, viewers who do not want to spend the time watching the whole match may appreciate the most important and interesting parts of it in a short period of time.

Highlights are usually produced by a manual editorial process, where a human operator selects the most interesting parts of the recording.

It is becoming increasingly common to video record amateur sports matches as well. Where a commercial recording / broadcast is not viable due to minority interest in the event, it is nevertheless increasingly affordable to set up good quality recording equipment to capture an amateur game. The footage may be viewed and shared primarily among participants and friends.

It is likewise desirable to produce selected highlights of such recordings. This is especially the case since the recording is likely to lack commentary, and also the overall quality may be of a lower standard compared with a commercial recording of a professional game. Watching a lengthy recording of an entire game is therefore unlikely to be of interest to many people. However, the time required to manually edit the footage to produce highlights means that these highlights are not produced very often.

It is an object of the invention to automate the production of video highlights from a video recording.

SUMMARY OF THE INVENTION

According to the present invention, there is provided a method of producing video highlights of a sports game, the method comprising:

providing a camera in a fixed position relative to the play area of the sports game, and obtaining a video stream from the camera;

identifying at least one fixed area of interest within the video stream;

monitoring one or more indicators within the zone(s) of interest;

identifying timestamps of potentially interesting events based on the indicator(s); and

assembling video highlights based on the identified timestamps.

The method of the invention can use cheap hardware of the kind which is already used to produce amateur video footage of amateur sports games. All that is required is a camera and some form of mount for holding the camera in a fixed position relative to the play area, and a general purpose computer which can identify the potentially interesting events according to the method.

At least one camera in a fixed position is required in order to monitor for the indicators within a zone of interest. In some embodiments, multiple cameras may be provided. In particular, multiple cameras may be used to cover multiple zones. As an example, in a game of football at least two cameras will probably be required to cover both goals. Also, further cameras which are not in a fixed position may be provided. For example, a camera may be carried around the play area by a person who attempts to follow interesting parts of the game. When the highlights are assembled, they may include video from multiple cameras, for example selected according to the amount of movement in the video stream, or simply presented as multiple streams in a “split screen” format. However, only cameras in a fixed position, where a zone of interest is identified, are used to identify the timestamps of interesting events and determine what parts of the game (in time) will be assembled into highlights.

In some embodiments, multiple cameras may be used and the video streams stitched together to provide a wide field of view. In some cases, a 360 degree view may be possible.

Typically, the fixed area(s) of interest will be, for example, a goal in football, a basket in basketball, etc. The fixed areas are selected so that, where certain identifiable indicators are detected within the area, there is a good chance that this will indicate an interesting part of the sports game (e.g. a goal or basket has been scored, or there was a near miss).

The monitored indicators may include, for example, the presence of a ball within the zone of interest, the presence of a person within the zone of interest, movement or change in the zone of interest, speed of movement, and combinations of those things, for example, presence of a ball which is moving at above a certain speed.

Identification of the zone(s) of interest may be set up manually. For example, a human operator after fixing the camera in position may, via an interface, identify a zone on a video screen which corresponds to the outline of a goal on a football pitch. Alternatively, computer image analysis including for example feature extraction and/or object recognition may be used to attempt to identify zone(s) of interest. In this case, very limited or no human input may be required for system set-up. The system may be tailored for a particular game, e.g. football, and may be provided with image analysis software which is trained to identify, e.g. the outline of a goal.

Likewise, image analysis software may be set up to monitor indicators and identify potentially interesting events according to static algorithms. Various edge detection, object recognition and object tracking techniques are known in the art. In some embodiments, face recognition could also be used. A simple example would be to identify a potentially interesting event whenever a ball is in a zone of interest. The ball could be detected by, for example, shape or colour, and known computer vision techniques could be used to do this fairly reliably, especially if a certain ball were to be specified for use with the system.

Other example indicators include movement of players within the zone(s) of interest, the trajectory of specific artefacts (e.g. a ball) through a zone of interest, and the relative size of specific artefacts (which could indicate distance in a direction towards / away from the camera). Of course, different zones of interest could monitor different indicators in different combinations with different weightings. For example, a zone of interest encompassing just a goal area could monitor for a “ball present” indicator at all times. Another zone of interest, perhaps encompassing the entire field of view of the camera, might monitor for a ball present but only above a certain relative size, indicating that the ball may be close to the goal area, but not present within it. Another example is that a relevant indicator in one zone may be any player present, but in another zone a relevant indicator might be three or more players present, with at least one of them moving, for example.
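By way of illustration only, per-zone indicators of the kind described above might be sketched as follows. The rectangular `Zone` representation, the `Detection` record and the function names are assumptions introduced for this sketch; the object detections themselves are assumed to come from some separate computer vision step that is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """Axis-aligned rectangle in pixel coordinates (hypothetical representation)."""
    x0: int
    y0: int
    x1: int
    y1: int

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


@dataclass(frozen=True)
class Detection:
    """One detected object in a frame (hypothetical record)."""
    kind: str        # "ball" or "player"
    x: float         # centre, in pixels
    y: float
    radius: float    # apparent size; larger suggests closer to the camera
    moving: bool = False


def ball_present(zone, detections, min_radius=0.0):
    """True if a ball of at least min_radius apparent size lies in the zone."""
    return any(
        d.kind == "ball" and d.radius >= min_radius and zone.contains(d.x, d.y)
        for d in detections
    )


def players_present(zone, detections, min_count=1, min_moving=0):
    """True if enough players, and enough moving players, lie in the zone."""
    players = [d for d in detections
               if d.kind == "player" and zone.contains(d.x, d.y)]
    moving = sum(1 for p in players if p.moving)
    return len(players) >= min_count and moving >= min_moving
```

A goal zone could then call `ball_present(goal_zone, detections)` with no size limit, while a zone covering the whole frame could pass a `min_radius` so that only a ball appearing large, and hence probably near the goal, counts.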

In more complex embodiments, machine learning techniques may be used to identify zones of interest and/or determine indicators to monitor and/or identify potentially interesting events according to monitored indicators. Supervised learning techniques may be used, where assembled highlights are assessed for level of interest in order to train the classifier and hopefully improve its ability to identify interesting highlights. Preferably, the assembled video highlights may be made available to users, and a platform may be provided where users can comment on and/or share the highlights. The level of engagement with each highlight, in terms of the number of comments / shares, could be used to determine the level of interest and provide feedback to train a learning classifier. Over time, this should ensure that most highlights produced by the system are of genuine interest, and that spurious indicators (e.g. caused by the weather, animals or other factors unrelated to the sport) are ignored.
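As a minimal sketch of how such engagement feedback might be turned into training signals, the example below scores each highlight from its comment and share counts, converts the score to a binary label, and applies one perceptron-style update. The share weighting, the label threshold and the choice of a perceptron are all assumptions made for illustration; the description does not specify a particular learning algorithm.

```python
def engagement_score(comments, shares, share_weight=2.0):
    """Combine comment and share counts into one score (weighting is an assumption)."""
    return comments + share_weight * shares


def label_highlights(engagement, threshold):
    """Map clip id -> (comments, shares) to clip id -> 1 (interesting) or 0."""
    return {
        clip_id: 1 if engagement_score(c, s) >= threshold else 0
        for clip_id, (c, s) in engagement.items()
    }


def perceptron_update(weights, bias, features, label, lr=0.1):
    """One supervised update: nudge the weights towards the user-derived label."""
    activation = sum(w * f for w, f in zip(weights, features)) + bias
    predicted = 1 if activation > 0 else 0
    error = label - predicted
    new_weights = [w + lr * error * f for w, f in zip(weights, features)]
    return new_weights, bias + lr * error
```

Here `features` would be the indicator values recorded when the highlight was generated, so that indicators which repeatedly lead to ignored highlights are gradually weighted down.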

To assemble the highlights, in a simple embodiment a fixed period of video is extracted from one or more of the cameras, starting at a fixed period before the event of interest was identified and ending at a fixed period after the event of interest was identified. In the case that further events of interest are identified within this time frame, the extracted video clip would be extended accordingly. Alternatively, more sophisticated techniques could be used to choose when to start and end the video clip, perhaps based on monitoring the same indicators but in different combinations or with different thresholds. Again, in some embodiments users may be able to feed back, for example by manually cutting or extending the highlights. Such manual intervention could be fed back to a learning system, to improve the assessment of the start and end of highlights assembled by the system in the future.
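The simple embodiment described above, with fixed periods before and after each event and extension when a further event falls within the current clip, might be sketched as follows. The function name and the default periods of 10 and 5 seconds are assumptions chosen for illustration.

```python
def assemble_clips(event_times, pre=10.0, post=5.0, duration=None):
    """Turn event timestamps (seconds) into a list of (start, end) clip boundaries.

    A clip starts `pre` seconds before an event and ends `post` seconds after it.
    If a later event falls inside the current clip, the clip is extended so that
    it ends `post` seconds after that later event.
    """
    clips = []
    for t in sorted(event_times):
        start = max(0.0, float(t) - pre)
        end = float(t) + post
        if duration is not None:
            end = min(float(duration), end)   # do not run past the end of the video
        if clips and t <= clips[-1][1]:
            # Further event within the current clip: extend rather than start anew.
            clips[-1] = (clips[-1][0], max(clips[-1][1], end))
        else:
            clips.append((start, end))
    return clips
```

For example, events at 30, 33 and 100 seconds yield two clips: one from 20 to 38 seconds (extended by the second event) and one from 90 to 105 seconds.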

In some embodiments, an audio stream is provided from a microphone, which may be mounted to the camera or provided separately. Information from the audio stream may be used as an additional factor when identifying potentially interesting events. For example, the sound of a crowd cheering may be a very strong indicator of an interesting event. In some embodiments this kind of sound being detected above a certain threshold may be enough on its own to identify a potentially interesting event. However, this and other indicators derived from an audio stream may be fed into an algorithm, possibly a learning classifier, with different weights chosen to come up with an overall reliable assessment of when an event is interesting.
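A very simple audio indicator of this kind might be sketched as a loudness threshold on fixed-length windows of the audio stream. The use of root-mean-square (RMS) level as the loudness measure, the window length and the threshold value are assumptions for illustration; RMS level alone does not distinguish cheering from other loud sounds.

```python
import math


def loud_windows(samples, sample_rate, window_s=1.0, threshold=0.5):
    """Return start times (seconds) of windows whose RMS level meets the threshold.

    `samples` is a sequence of amplitudes, assumed normalised to [-1.0, 1.0].
    """
    window = max(1, int(sample_rate * window_s))
    times = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        if rms >= threshold:
            times.append(start / sample_rate)
    return times
```

The returned times could either be treated as potentially interesting events in their own right, or be combined with the video-derived indicators as one further weighted input.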

DESCRIPTION OF THE DRAWINGS

For a better understanding of the invention, and to show more clearly how it may be carried into effect, preferred embodiments will now be described with reference to the accompanying drawings in which:

Figure 1 shows a camera in a fixed position relative to a football pitch, for use in the method of the invention;

Figure 2 is a close-up view of the camera and a mounting;

Figure 3 is a flow chart illustrating an embodiment of the method of the invention;

Figure 4 is an example frame from a video stream from a fixed camera used in an embodiment of the invention in a game of football; and

Figure 5 is an example frame from a video stream from a fixed camera used in an embodiment of the invention in a game of basketball.

DESCRIPTION OF PREFERRED EMBODIMENTS

Referring firstly to Figure 1, a fixed camera 10 on a mount 12 is shown mounted behind a football goal. The entire area of the goal, including the frame 14 and the net 16, is in the field of view of the camera 10. The field of view also includes an area around and in front of the goal. An example frame from a similarly-mounted camera is illustrated in Figure 4.

Figure 2 shows the mount 12 in more detail. The mount is a very simple board, which holds the camera 10 in a fixed position. Guards 18 are provided at either side of the camera, to prevent the camera from being damaged or knocked out of position if struck by a football.

As illustrated in Figure 2, the mount 12 can be simply mounted to a fence or other structure behind the goal. The mounting can be temporary or permanent depending on the circumstances.

Referring now to Figure 3, an embodiment of the method will be described in more detail. Video input(s) / stream(s) are obtained 200. In the embodiment of Figures 1 and 4, there are two cameras 10, one behind each goal of a football pitch. There are therefore two video inputs / streams. As part of a set-up process, zone(s) of interest are set up within each video stream 202. In Figure 4, the zone of interest is shown in outline 203. It is a rectangular area in the field of view which encompasses both the frame 14 and the net 16 of the goal.

The position of a football is continually tracked within the field of view at step 204. Known computer vision techniques may be used to track the position of the ball. In particular, detecting a known colour and/or shape of the ball may provide reliable enough tracking. Computer vision techniques such as colour thresholding and Hough circles may be used to track the ball. In some embodiments, knowledge of previous ball positions may be used to filter out anomalies, so that the ball is truly “tracked” as opposed to independently identified in different frames. The position of the ball is tracked because one of the indicators being monitored in this embodiment is whether or not the ball is within the zone of interest 203. Another indicator being monitored is the number of moving objects within the zone of interest 203, at step 206. In particular, any change or variation in the number of moving objects is monitored, i.e. the first time derivative of the number of moving objects. Movement between frames may be quantified using computer vision techniques such as background subtraction. The number of distinct moving objects may be counted by conducting a morphological erosion of a binarized representation of moving parts of the image, and then counting discrete moving parts.
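The counting of moving objects by erosion may be illustrated with the following sketch, written in plain Python on small binary grids rather than with an image processing library. The 3x3 structuring element, the use of 4-connectivity when counting parts, and the function names are assumptions; the binarized motion mask is taken as given (for example, the output of a background subtraction step that is not shown).

```python
def erode(mask):
    """3x3 morphological erosion of a binary grid (list of rows of 0/1).

    A pixel survives only if it and all eight neighbours are set; pixels on
    the border of the grid are always cleared.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if all(mask[y + dy][x + dx]
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1)):
                out[y][x] = 1
    return out


def count_blobs(mask):
    """Count discrete (4-connected) regions of set pixels using flood fill."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                count += 1
                seen[y][x] = True
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx),
                                   (cy, cx + 1), (cy, cx - 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count


def count_changes(counts):
    """Frame-to-frame difference of the moving-object count (discrete derivative)."""
    return [b - a for a, b in zip(counts, counts[1:])]
```

Erosion removes thin connections between moving regions, so two objects joined by a narrow bridge of noisy pixels are counted as two parts rather than one.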

In this embodiment, a potentially interesting event is identified at a particular timestamp if, at that timestamp, the ball is in the zone of interest and there is a high level of change in the number of moving objects in the zone of interest. This corresponds to a ball being present in or near the goal, and potentially a large movement in the goal net. The conditions for a potentially interesting event are illustrated by the four cases 208, 210, 212, 214. When both of these conditions are met an event of interest is recorded 216. The method then loops back to continue analysing the rest of the video stream until there is no more video input to analyse. The recorded timestamps of potentially interesting events are used to assemble video highlights.

Note that no event is recorded if a ball is detected but there is a normal or low variation in the number of moving objects. This scenario could be caused by a stationary unused ball near the goal. Likewise, a high variation in movement but no ball does not produce a potentially interesting event, as the movement could be caused by the weather or other uninteresting non-game-related factors.
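The rule of this embodiment, in which an event is recorded only when the ball is in the zone of interest and the change in the number of moving objects is high, might be sketched as follows. The per-frame tuple layout and the numerical threshold for a “high” change are assumptions made for illustration.

```python
def is_interesting(ball_in_zone, count_change, change_threshold=2):
    """Both conditions must hold: ball present AND a high change in moving objects."""
    return ball_in_zone and abs(count_change) >= change_threshold


def event_timestamps(frames, change_threshold=2):
    """Scan (timestamp, ball_in_zone, moving_object_count) tuples for events."""
    events = []
    previous_count = None
    for timestamp, ball_in_zone, count in frames:
        change = 0 if previous_count is None else count - previous_count
        if is_interesting(ball_in_zone, change, change_threshold):
            events.append(timestamp)
        previous_count = count
    return events
```

The four combinations of the two conditions give four outcomes, of which only one (ball present, high change) records an event; a stationary ball or wind-blown movement alone produces nothing.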

Figure 5 shows the field of view of an example fixed camera in an alternative embodiment which is set up for basketball. In this example, a primary zone of interest 301 is defined tightly around the circular basket. A secondary zone of interest 302 is defined as a wider rectangular area around the basket. In this case the monitored indicators might for example include:

• presence of the ball in the primary zone of interest 301;

• presence of the ball in the secondary zone of interest 302;

• presence of a player’s face in the secondary zone of interest 302.

An example rule for identifying a potentially interesting event might be:

• always identify a potentially interesting event when the ball is in the primary zone of interest 301;

• identify a potentially interesting event when the ball is in the secondary zone of interest 302 and a player’s face is in the secondary zone of interest 302;

• otherwise, do not identify a potentially interesting event.
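The example rule above can be expressed directly as a short function. The function and parameter names are illustrative only; the three boolean inputs are assumed to be supplied by separate ball and face detection steps.

```python
def basketball_event(ball_in_primary, ball_in_secondary, face_in_secondary):
    """Apply the example rule for zones 301 (primary) and 302 (secondary)."""
    if ball_in_primary:
        # Ball in the primary zone always identifies an event.
        return True
    if ball_in_secondary and face_in_secondary:
        # Ball and a player's face together in the secondary zone.
        return True
    return False
```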

Of course, more complex embodiments may apply different weights to different indicators, quantify the confidence level of a particular indicator (e.g. the ball is in the primary zone with 60% confidence), and combine the weighted indicators through complex classifier networks to assess whether a threshold of interest is met. The properties of the classifier network may be updated according to feedback to improve the detection of potentially interesting events in the future.
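As a much simplified stand-in for such a classifier network, the sketch below combines indicator confidence levels through a plain weighted sum and compares the result with a threshold of interest. The indicator names, weights and threshold are hypothetical values chosen for illustration, and a single weighted sum is an assumption that is considerably simpler than the networks contemplated above.

```python
def interest_score(indicators, weights):
    """Weighted sum of indicator confidences.

    `indicators` maps an indicator name to a confidence in [0, 1];
    `weights` maps the same names to their weights (missing names weigh 0).
    """
    return sum(weights.get(name, 0.0) * confidence
               for name, confidence in indicators.items())


def meets_threshold(indicators, weights, threshold):
    """True if the combined score reaches the threshold of interest."""
    return interest_score(indicators, weights) >= threshold


# Hypothetical example: ball in the primary zone with 60% confidence.
example_indicators = {"ball_primary": 0.6, "face_secondary": 0.9}
example_weights = {"ball_primary": 1.0, "face_secondary": 0.5}
```

Updating the classifier according to feedback would, in this simplified form, amount to adjusting the entries of `example_weights` and the threshold.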

The method of the invention allows for automatic production of video highlights, using a limited amount of straightforward and cheap hardware. This allows highlights of amateur sports games to be produced with very little human effort. By sharing and commenting on highlights through, for example, social media platforms, data may be gathered as to the quality of the automatically generated highlights. This data may be fed back into a learning classifier to improve the quality of highlights generated in the future.

It will be appreciated that the particular embodiments described are examples only, and the different features described may be combined in various ways. The invention is described in the claims.

Claims

1. A method of producing video highlights of a sports game, the method comprising:

providing a camera in a fixed position relative to the play area of the sports game, and obtaining a video stream from the camera;

identifying at least one fixed area of interest within the video stream;

monitoring one or more indicators within the zone(s) of interest;

identifying timestamps of potentially interesting events based on the indicator(s); and

assembling video highlights based on the identified timestamps.

2. A method as claimed in claim 1, in which multiple cameras are provided.

3. A method as claimed in claim 2, in which at least one of the multiple cameras is not provided in a fixed position.

4. A method as claimed in claim 2 or claim 3, in which multiple fixed areas of interest are monitored within video streams obtained from multiple cameras.

5. A method as claimed in any of the preceding claims, in which the zone(s) of interest are identified manually via an interface.

6. A method as claimed in any of claims 1 to 4, in which the zone(s) of interest are identified at least in part by computer image analysis.

7. A method as claimed in any of the preceding claims, in which a supervised machine learning module is used to identify zone(s) of interest and/or to determine indicators to monitor and/or to identify potentially interesting events according to monitored indicators.

8. A method as claimed in claim 7, in which assembled video highlights are assigned scores according to user feedback, the scores being fed back to the supervised machine learning module.

9. A method as claimed in claim 8, in which the score of assembled video highlights is determined at least in part by the number of comments and/or shares on a communication platform.

10. A method as claimed in any of the preceding claims, in which an audio stream is obtained from a microphone, and information from the audio stream is used as an additional factor when identifying potentially interesting events.