WO2013026991A1 - Improvements in automatic video production - Google Patents

Improvements in automatic video production Download PDF

Info

Publication number
WO2013026991A1
Authority
WO
WIPO (PCT)
Prior art keywords
subject
play
video recording
triangulation
editing system
Prior art date
Application number
PCT/GB2011/001264
Other languages
French (fr)
Inventor
David John THOMAS
Original Assignee
Thomas David John
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Thomas David John filed Critical Thomas David John
Priority to PCT/GB2011/001264 priority Critical patent/WO2013026991A1/en
Publication of WO2013026991A1 publication Critical patent/WO2013026991A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/0007 Image acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30244 Camera pose


Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

A method for automatically producing a video of a subject (30) based on images from at least two mobile video recording devices (20a, 20b) whereby triangulation is used to determine the relative positions of the devices with respect to the subject or scene (30). An automatic editing system may carry out the triangulation or reverse triangulation process and may store boundaries for the devices; data (200) from the devices (20a, 20b) may be used in the triangulation process and may include one or more of video footage; GPS; location; compass bearing; inclination; and zoom settings. The maximum extent of the field of movement (10) of the subjects or objects being filmed may be determined, and the system may determine which camera (20a, 20b) represents the best or closest or most representative view of the current play based on the location of the subject or object (30) in the field of play (10).

Description

IMPROVEMENTS IN AUTOMATIC VIDEO PRODUCTION
Field of the Invention
This invention relates to the automatic production of a video from multiple sources.
Background of the Invention
Many events, such as sporting events, take place without any official video recording being made. In such circumstances, the alternatives to a video recording are a radio broadcast recorded during the event or a concise written record of what transpired produced by sports writers. For people who are unable to be present at the event in person, these alternative records are not particularly satisfying.
Summary of the Invention
According to one aspect of the present invention, there is provided a method for automatically producing a video of a subject or scene based on images from at least two mobile video recording devices whereby triangulation is used to determine the relative positions of the plurality of mobile video recording devices with respect to the subject or scene.
The method provides for the use of at least two, and preferably more, mobile recording devices to record a subject or scene, such as a sports playing field (e.g. football, soccer, athletics or rugby) or a sports course (e.g. golf or rowing), where the relative locations of the mobile recording devices with respect to the subject or scene and to each other are initially unknown and are subsequently determined from the field of view of each device as it tracks the subject/scene. This information then provides a set of boundaries for each device with respect to the subject/scene and the other devices, enabling an automated editing system to determine, using triangulation, which of the mobile recording devices has the best image of the subject/scene at a given time. This then enables a complete recording of the subject/scene in which the best available image is provided automatically.
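The patent does not spell out the triangulation computation, but the geometry is standard. Below is a minimal sketch in Python, assuming flat-ground 2-D coordinates in metres and compass bearings measured clockwise from north; all function and variable names are illustrative rather than taken from the patent.

    import math

    def bearing_to_vector(bearing_deg):
        """Convert a compass bearing (degrees clockwise from north) into a
        unit (east, north) direction vector."""
        rad = math.radians(bearing_deg)
        return (math.sin(rad), math.cos(rad))

    def triangulate(p1, b1, p2, b2):
        """Intersect the sight lines of two cameras.

        p1, p2 -- (east, north) camera positions in metres
        b1, b2 -- compass bearings from each camera to the subject
        Returns the (east, north) subject position, or None when the sight
        lines are (nearly) parallel and no unique intersection exists.
        """
        d1, d2 = bearing_to_vector(b1), bearing_to_vector(b2)
        # Solve p1 + t*d1 = p2 + s*d2 for t via the 2-D cross product.
        denom = d1[0] * d2[1] - d1[1] * d2[0]
        if abs(denom) < 1e-9:
            return None
        dx, dy = p2[0] - p1[0], p2[1] - p1[1]
        t = (dx * d2[1] - dy * d2[0]) / denom
        return (p1[0] + t * d1[0], p1[1] + t * d1[1])

For example, cameras at (0, 0) and (50, 0) reporting bearings of 45° and 315° place the subject at (25, 25). The same line-intersection routine, run backwards from known subject positions, underlies the "reverse triangulation" described later.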
According to another aspect of the present invention, there is provided a system for automatically producing a video of a subject or scene based on images from at least two mobile video recording devices whereby a means is provided to perform triangulation to determine the relative positions of the plurality of mobile video recording devices with respect to the subject or scene.
An automatic editing system may be provided and the automatic editing system carries out the triangulation process.
In one non-limiting example, the data from the at least two mobile video recording devices is used in the triangulation process.
The data may include one or more of video footage; GPS; location; compass bearing; inclination; and zoom settings.
The automated editing system may store the boundaries of each mobile video recording device. In other words, over a period of time the automated editing system may determine the maximum extent of the field of movement of the subjects or objects being filmed.
In one non-limiting example, over a period of time the automated editing system determines the maximum extent of the field of movement of the subjects or objects being filmed. In one embodiment the boundaries of each mobile video recording device are related to the maximum extent of the field of movement of the subjects or objects being filmed, i.e. a playing field with fixed boundaries, or a course whose boundaries may be fixed (e.g. a golf course) but which has more than one field of play, i.e. each hole is considered as a separate field of play. This allows the automated editing system to switch between mobile video recording devices as the area or region of interest changes over time, e.g. as a ball moves from one end of a pitch to the other.
In one non-limiting example, the method uses triangulation to determine the position of mobile video recording devices whose positions are unknown, for example cameras such as the mobile phones of spectators at a football match. The location/behaviour of the subject or scene and the location of each unknown camera relative to the field of play of the subject are determined from this information. The auto editing system then puts together the best, closest or most representative images according to the locations of the unknown cameras relative to the fields of play.
Reverse triangulation may be used to determine the position of each mobile video recording device relative to the field of movement of the subjects or objects being filmed.
In one non-limiting example, the automated editing system determines which camera represents the best or closest or most representative view of the current play based on the location of the subject or object in the field of play. If boundaries for each device have been stored, this information can be used to make this decision. Preferably, the automated editing system determines that the video output from this camera is selected to be displayed or recorded as the output choice for a given moment.
The automated editing system may determine each selected camera at each given instant over the complete range of footage for the entirety of the period of play.
The automated editing system may be able to automatically edit footage for an entire period of play without operator control or intervention.
In one non-limiting example, the automated editing system monitors the location of the subject or object during play and determines whether to maintain or change the selected camera footage as the best, closest or most representative view of the current subject or object being filmed.
The automated editing system may additionally compile an audio commentary taken from a library of commentaries and using the known outcome of an event during a period of play.
According to a second aspect, the invention provides a method of compiling an audio commentary for video footage comprising:
accessing a library of commentaries; and
using the known outcome of an event during a period of play, compiling the audio commentary.
This method of making the commentary is particularly useful when multiple sources of footage might be used, some of which may not have audio recording; and in cases where they do have audio recording, the actual audio recorded may be inappropriate to use.
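The patent leaves the library lookup unspecified. As a sketch under stated assumptions, the library below maps (event type, known outcome) pairs to pre-recorded lines; all keys and phrases are invented for illustration.

    import random

    # Hypothetical commentary library keyed by (event type, outcome). In the
    # scheme described above, each entry would be a recorded phrase whose
    # emotive pitch and intensity already match the known outcome.
    COMMENTARY_LIBRARY = {
        ("shot", "goal"): ["What a strike! It's in the back of the net!"],
        ("shot", "miss"): ["He pulls the trigger... just wide of the post."],
        ("kickoff", None): ["And we're under way here this afternoon."],
    }

    def compile_commentary(events):
        """Assemble a timed commentary track after the event, when every
        outcome is already known.

        events -- iterable of (timestamp_s, event_type, outcome) tuples
        Returns a list of (timestamp_s, line) cues to mix over the video.
        """
        cues = []
        for ts, etype, outcome in events:
            lines = COMMENTARY_LIBRARY.get((etype, outcome))
            if lines:
                cues.append((ts, random.choice(lines)))
        return cues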
According to yet another aspect, there is provided a computer-usable medium for automatically producing a video of a subject or scene based on images from at least two mobile video recording devices, the computer-usable medium embodying computer program code, the computer program code comprising computer executable instructions configured to perform triangulation to determine the relative positions of the plurality of mobile video recording devices with respect to the subject or scene.
The invention will now be described, by way of example only, with reference to the accompanying drawings, in which:-
Brief Description of the Drawings
Figure 1 is a perspective view of the invention at time A;
Figure 2 is a perspective view of the invention at time B;
Figure 3 is a perspective view of the invention showing the edge of playing area;
Figure 4 is a perspective view of the invention showing a second edge of playing area;
Figure 5 is a perspective view of the invention showing a third edge of playing area;
Figure 6 is a perspective view of the invention showing a fourth edge of playing area;
Figure 7 is a perspective view showing the position of a camera relative to subjects at different locations using reverse triangulation;
Figure 8 shows a perspective view of an automated decision choosing the best camera view for a subject;
Figure 9 is a flow diagram detailing steps used in the invention;
Figure 10 is a flow chart according to the invention;
Figure 11 is a schematic diagram of a data processing system in which the present invention may be embodied;
Figure 12 is a schematic diagram of a software system for carrying out the present invention; and
Figure 13 is a schematic diagram of a network of data processing systems in which aspects of the present invention may be implemented.
Detailed Description of the Illustrated Embodiment
The invention provides for the collation of video footage from at least two sources having unknown position/location with respect to a region or area of interest, for example a sports playing field, along with data on GPS, compass bearing, inclination and zoom settings. This all provides data that an automated editing decision making system 250 analyses to provide position information for the devices acquiring footage relative to a field of play of a game or sport, e.g. a sports playing field (Figures 1 & 2).
Referring to Figures 1 and 2, a field of play 10 marks the boundary of the area of interest 12 and ideally there will be sufficient unknown mobile recording devices covering enough of the area of interest 12 to provide coverage of the event. There are two mobile video recording devices (cameras) 20a, 20b which are positioned on one side 14 of the boundary 10 and spaced apart. The specific area or region of interest 30 moves over time and is shown in Figure 1 at time stamp A and in Figure 2 at time stamp B. Lines 22a and 22b show the lines of sight of cameras 20a and 20b respectively at each time stamp. The system analyses information about one or more of GPS, compass bearing, inclination and zoom settings over a period of time in which the subject achieves the maximum extent of their field of play, i.e. the range or viewing field of the source in question (Figures 3, 4, 5 & 6).
Thus, as the region of interest 30 moves around the field of play 10, information or data is collected about the footage from each source 20a, 20b. In some circumstances, for example where the area of interest 30 is remote from the sources 20a, 20b as shown in Figures 3 and 5, both sources 20a, 20b will give a reasonable representation of what is happening. However, if the area of interest is at a boundary 14 close to the sources 20a, 20b but at the distal end of that boundary, the footage taken by a remote source (source 20b in Figure 4 and source 20a in Figure 6) will not have a clear view, and this may be judged by the automated editing device to be outside of that source's field of view.
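One simple way to realise the "maximum extent" observation of Figures 3 to 6 is to accumulate the triangulated subject positions over a period of play and take their bounding box as the field of play. A sketch under that assumption:

    def field_extent(subject_positions):
        """Bounding box of the field of play, estimated from triangulated
        subject positions gathered over a period in which the subject visits
        the extremes of the playing area (Figures 3 to 6).

        subject_positions -- iterable of (east, north) tuples in metres
        Returns ((min_east, min_north), (max_east, max_north)).
        """
        xs = [p[0] for p in subject_positions]
        ys = [p[1] for p in subject_positions]
        return (min(xs), min(ys)), (max(xs), max(ys))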
The system then takes a number of different instances where the subject has moved position and uses triangulation to determine the position of each source 20a, 20b or camera relative to the defined field of play 10 (Figure 7).
So, at the start of play the automated editing device 250 receives data from each source 20a, 20b and over time determines or forms a picture of the boundaries of each device 20a, 20b as the specific area of interest moves over time. For each camera 20a, 20b (20b shown), triangulation is used to determine the position of the camera 20b relative to the specific area of interest or subjects 130, 130' at two different locations at different times.
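Geometrically, this reverse step can reuse the triangulate() helper sketched earlier: the camera lies where the sight lines, traced backwards from two known subject positions, cross. The 180° reversal of each bearing is the only new ingredient; as before, the names are illustrative.

    def locate_camera(s1, b1, s2, b2):
        """Reverse triangulation (Figure 7): given two known subject
        positions s1 and s2, and the camera's compass bearing to the subject
        at each of those instants (b1 and b2), the camera sits at the
        intersection of the two reversed sight lines."""
        return triangulate(s1, (b1 + 180.0) % 360.0,
                           s2, (b2 + 180.0) % 360.0)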
The automated editing decision making system 250 determines which camera provides the best representation of the play at that moment or time frame. The automated editing decision making system 250 switches the output to that of the particular video source at that particular moment, i.e. the camera that provides the best representation. The automated editing decision making system maintains that particular choice as the output until such time as the position of the subject or object being filmed changes to another position. The automated editing decision making system then determines whether to maintain the same camera as the best, closest or most representative source, or to switch to an alternative source, based on the above process.
The system preferably also analyses the data set to provide time-based information for the relative position of something of interest, e.g. a ball in the field of play. The system then makes a decision as to which source or camera provides the closest and best representative view of the play, based on the location of the subject and the location of the source or camera relative to the field of play and the subject of interest (Figure 8). Thus, the information can be compared to the boundary 10 to provide zones in which each camera has the best image, and the automated editing device 250 can determine which of the cameras provides the best view and choose the data or picture of that camera.
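A minimal selection rule consistent with this description is nearest-camera-wins, with a small hysteresis margin so the output does not flicker between two almost equidistant sources. The margin value is an assumed tuning parameter, not something the patent specifies.

    import math

    def select_camera(subject_pos, camera_positions, current=None, margin=2.0):
        """Pick the camera judged to have the best view of the subject;
        here "best" is simply "nearest".

        subject_pos      -- (east, north) position of the subject
        camera_positions -- {camera_id: (east, north)}, e.g. from reverse
                            triangulation
        current          -- currently selected camera id, if any
        margin           -- metres by which a rival must win before we cut away
        """
        def dist(cam_id):
            cx, cy = camera_positions[cam_id]
            return math.hypot(subject_pos[0] - cx, subject_pos[1] - cy)

        best = min(camera_positions, key=dist)
        # Maintain the current choice until another camera is clearly better,
        # mirroring the hold-then-switch behaviour described above.
        if current in camera_positions and dist(current) <= dist(best) + margin:
            return current
        return best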
It will be understood that the invention may be employed as an apparatus and/or a system.
The automated editing device 250 is any data processing system suitably configured to enable implementation of the processes and apparatus of the embodiments.
A general description of suitable computer environments in which embodiments of the present invention may be implemented now follows. Although not required, the auto video editing device will be described in the general context of computer-executable instructions, such as program modules, being executed on a single computer. One skilled in the art will appreciate that the auto video editing devices and methods may be practiced with other computer systems including multi-processor systems, microprocessor-based systems, programmable consumer electronics, networked PCs, minicomputers, mainframe computers, handheld devices and the like.
By way of example, a computer based data processing system 1100, in which the automated editing device 250 is implemented according to one embodiment, is illustrated in FIG. 11. Data processing system 1100 has a processor (central processing unit (CPU)) 1101. Operably coupled to processor 1101, via one or more data buses, are a Random Access Memory (RAM) 1102 and a storage unit 1103. Input units 1104, 1105 are configured to input data into processor 1101 and output units 1106 are configured to output the processed data. Inputs can be entered from a keyboard, pointing device, USB stick, appropriate data connection or other suitable input. In the example of FIG. 11, the input units are a pointing device 1104, such as a mouse or touch screen pointer, and a text input device 1105, such as a keyboard or touch screen keys. Input can also be downloaded or fed from one or more networks via a network interface 1107. For example, inputs can be downloaded from the internet via a communication device.
Processor 1101 is configured to perform calculations, make decisions and control units of the data processing system. The input units and network interface accept data and instructions and input this information in a useable form to the data processing system for processing. RAM 1102 and storage unit 1103 store data and instructions input to the data processing system for and during processing by the processor, and for future use. RAM 1102 is utilized for, but not limited to, holding a program being executed by the processor, and related data. Storage unit 1103 is utilized for, but not limited to, archiving programs, documents, databases, data results etc. Non-limiting examples of storage devices are hard disks, USB sticks, DVDs, CDs etc.
The output units output the results of the data processed by the processing system. Typically, the output unit is a display or monitor 1106 for visually displaying the output data. Other possible types of output unit are, for example, a USB stick or output cable. Output data is also uploadable to a network via the network interface 1107 and a communication device.
It can be appreciated that the data-processing apparatus 1100 is not limited to the specific system of FIG. 11 and may in some embodiments be a mobile computing device such as a smartphone, laptop computer, iPhone or tablet device. In other embodiments, data-processing apparatus 1100 may function as a desktop computer, server, and the like, depending upon design considerations.
Referring now to FIG. 12, there is illustrated a computer software system 1200 for controlling data processing system 1100 of FIG. 11 to perform auto video editing operations. The software system is, for example, stored in RAM 1102 and storage unit 1103 of FIG. 11. An operating system 1201 is configured to control operation of components of the data processing system. One or more application software program modules are available for execution by the data processing system 1100. "Module" as defined herein refers to a simple application or to groupings of routines, programs, objects, components and/or data structures for performing one or more particular functions. Modules may be composed of an interface part and routines accessible by other modules.
In software system 1200 of FIG. 12, an auto video editing software application 1202 includes instructions for performing the operations described herein in relation to processes of the embodiments. Software 1202 may include one or more modules. In particular, software 1202 has a data collector module 1203 for collecting data from the mobile devices, a position determinator 1204 for determining the position of each mobile device with respect to the field of play from the collected data, a video editor 1205 for determining and selecting the mobile device giving the best or most desired view and automatically generating video from the best or most desired views, and an audio commentator 1206 for generating and adding commentary to the generated video.
Interface 1203 is for receiving and inputting user instructions and data into the data processing system. Interface 1203 may be a graphical user interface formed, for example, from the text input device, pointing input device and display of the system of FIG. 11. Alternatively or additionally, interface 1203 may be network interface 1107. The operating system module and/or automated video editing modules control the data processing system to act upon inputs from the interface(s). Operating system 1201 may in one embodiment be a Mac operating system. It can be understood that other types of operating system can be adopted, such as Microsoft Windows, Linux, Android, iOS or another operating system. It will be appreciated that once the data processing system has been pre-configured for auto video editing from particular mobile cameras, the auto video editing software application can run by itself, without further user interface inputs, to automatically edit video from the mobile cameras.
In one example, the present invention is embodied in a network of data processing systems. By way of example, FIG. 13 illustrates such a network. Network data processing system 1300 has a plurality of the mobile devices 20a, 20b for capturing video that are operably connectable via one or more networks 1301 to one or more servers 1302 and one or more clients 1303. Data processing system 1100 of FIG. 11 is implemented as client 1303 or as server 1302 depending on the particular application. Network(s) 1301 are in this example a telecommunication network and an internet network for connecting the mobile devices via the telecommunication network to one or more server(s) 1302 and for connecting the one or more servers 1302 to the client(s) 1303. In other examples, the network(s) can be intranet networks or a combination of both internet and intranet networks, with or without a telecommunication network. A number of different types of networks can be utilized, such as, for example, a local area network (LAN), a wide area network (WAN) or a virtual private network (VPN). In one example, the data processing system 1100 implemented in a client or server can receive data from the mobile devices over a Wi-Fi link, either directly or via a network, without reliance on a cellular telecommunication connection.
Network data processing system 1300 may include additional servers, clients and other systems and devices not shown in FIG. 13. The computation described herein may be executed on one or a plurality of servers and the information communicated over network(s) 1301 to client(s) 1303 or other devices. Network data processing system 1300 may also include storage or databases for storing data, such as the video images or mobile device data and/or audio commentary library data, for use by the auto video editing software running on the client or server.
Figure 9 is a flow diagram showing steps used in the invention. Firstly, data is collected from multiple independent sources 200. This data includes video footage, GPS data, compass bearing data, inclination data and zoom settings from each source. The data 200 from each source is analysed 210 to provide position information for each source, and this analysed data 210 is used to determine the extent of the field of play for each source 220. Data from a source is triangulated 230 to determine the position of the source with respect to the field of play of the event. Based on the location of a subject of the event 240, a decision is made by the automatic editing system 250 on which source provides the best view of play.
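Putting the pieces together, the Figure 9 flow can be sketched end to end with the triangulate() and select_camera() helpers above. The input format is an assumption, and a real system would combine all available sight lines rather than just the first two.

    def auto_edit(samples):
        """Sketch of the Figure 9 pipeline: at each instant, locate the
        subject from two sight lines, then decide which source to cut to.

        samples -- list of (timestamp_s, {camera_id: (cam_pos, bearing_deg)})
        Returns an edit decision list of (timestamp_s, camera_id) cut points.
        """
        edl, current = [], None
        for ts, views in samples:
            if len(views) < 2:
                continue  # need at least two sight lines to triangulate
            (_, (p1, b1)), (_, (p2, b2)) = list(views.items())[:2]
            subject = triangulate(p1, b1, p2, b2)
            if subject is None:
                continue  # parallel sight lines; keep the previous choice
            positions = {cid: pos for cid, (pos, _) in views.items()}
            chosen = select_camera(subject, positions, current)
            if chosen != current:
                edl.append((ts, chosen))  # record a cut to the new camera
                current = chosen
        return edl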
Therefore, one aspect of the invention resides in the ability to use mobile cameras at unknown positions to track the behaviour of the subject or scene and to use this information to put together the video without having to "edit" the video in the conventional sense. Auto commentary can also be added. This results in a complete, automatically edited audio-video recording of the subject or scene that does not require conventional post editing, so that local football matches etc. can still be covered despite the absence of conventional TV and video broadcasters. An additional aspect of the invention is to provide an audio commentary to the video footage whereby modulation of the emotive pitch and intensity of the commentary is preferably taken from a library based commentary derived from the known outcome of any particular action or play in an event, with the purpose of communicating the assumed tension that a commentator would naturally impart from being present at the event. Thus the commentary sounds human rather than an automated, robotic-sounding synthesised voice.
Alternatively, commentary can be made from what is recorded by each device and/or an independent commentator who gives an audio story whilst watching the video footage.
Figure 10 shows a flow chart according to the invention where the automated editing system 350 receives data 360, 362 from different sources and determines which data set gives the best view 370, i.e. which camera represents the best or closest or most representative view of the current play based on the location of the subject or object in the field of play. If boundaries 380 for each device have been stored, this information can be used to make this decision. The automated editing system 350 determines that the video output from this camera is selected 390 to be displayed or recorded as the output choice for a given moment.
The automated editing system 350 determines 400 each selected camera at each given instant over the complete range of footage for the entirety of the period of play by constantly checking the data received about where play is occurring.
In one embodiment, the automated editing system 350 is able to automatically edit footage for an entire period of play without operator control or intervention. The automated editing system 350 achieves this by monitoring the location of the subject or object 410 during play and determining whether to maintain or change the selected camera footage 370 as the best, closest or most representative view of the current subject or object being filmed.
The source can be any of a number of possible devices, including but not limited to a mobile phone, which can both take a video recording and substantially simultaneously transmit the data, along with positional information, to an analyser.
The automated editing system may be part of a video sharing web stream associated with a suitable communications system, such as but not limited to the internet or a wireless communications network.
It is to be appreciated that these Figures are for illustration purposes only and other configurations are possible.
The invention has been described by way of several embodiments, with modifications and alternatives, but having read and understood this description further embodiments and modifications will be apparent to those skilled in the art. All such embodiments and modifications are intended to fall within the scope of the present invention as defined in the accompanying claims.

Claims

Claims
1. A method for automatically producing a video of a subject or scene (30) based on images from at least two mobile video recording devices (20a, 20b) whereby triangulation is used to determine the relative positions of the plurality of mobile video recording devices with respect to the subject or scene (30).
2. A method according to claim 1 wherein, an automatic editing system is provided and the automatic editing system (250) carries out the triangulation process.
3. A method according to claim 1 or claim 2 wherein, data (200) from the at least two mobile video recording devices (20a, 20b) is used in the triangulation process.
4. A method according to claim 3 wherein, the data (200) includes one or more of video footage; GPS; location; compass bearing; inclination; and zoom settings.
5. A method according to any of claims 2 to 4 wherein, the automatic editing system (250) stores boundaries for each of the at least two mobile video recording devices (20a,20b).
6. A method according to any of claims 2 to 5 wherein, over a period of time the automated editing system (250) determines the maximum extent of the field of movement (10) of the subjects or objects being filmed.
7. A method according to any preceding claim wherein, reverse triangulation is used to determine the position of each mobile video recording device (20a,20b) relative to the field of movement (10) of the subjects or objects being filmed.
8. A method according to any of claims 2 to 7 wherein, the automated editing system (250) determines which camera (20a,20b) represents the best or closest or most representative view of the current play based on the location of the subject or object (30) in the field of play (10).
9. A method according to claim 8 wherein, the automated editing system (250) determines that the video output from this camera (20a,20b) is selected to be displayed or recorded as the output choice for a given moment.
10. A method according to claim 8 or claim 9 wherein, the automated editing system (250) determines each selected camera (20a, 20b) at each given instant over the complete range of footage for the entirety of the period of play.
11. A method according to any of claims 2 to 10 wherein, the automated editing system (250) is able to automatically edit footage for an entire period of play without operator control or intervention.
12. A method according to claim 8 or claim 9 wherein, the automated editing system (250) monitors the location of the subject or object (30) during play and determines whether to maintain or change the selected camera (20a,20b) footage as the best, closest or most representative view of the current subject or object being filmed.
13. A method according to any preceding claim wherein, the automated editing system additionally compiles an audio commentary taken from a library of commentaries and using the known outcome of an event during a period of play.
14. A method of compiling an audio commentary for video footage comprising the steps of:
accessing a library of commentaries; and
using the known outcome of an event during a period of play, compiling the audio commentary.
15. A system for automatically producing a video of a subject or scene based on images from at least two mobile video recording devices whereby a means is provided to perform triangulation to determine the relative positions of the plurality of mobile video recording devices with respect to the subject or scene.
16. The system according to claim 15, wherein said means comprises an automatic editing device (250).
17. The system according to claim 16, further comprising data (200) from the at least two mobile video recording devices (20a, 20b); and wherein said automated editing device is configured to use said data in the triangulation process.
18. The system according to claim 17 wherein said automated editing device (250) is configured to perform reverse triangulation to determine the position of each mobile video recording device (20a, 20b) relative to the field of movement (10) of the subjects or objects being filmed.
19. The system according to claim 17 wherein said automated editing system (250) is configured to determine which camera (20a, 20b) represents the best or closest or most representative view of the current play based on the location of the subject or object (30) in the field of play (10).
20. A computer-usable medium for automatically producing a video of a subject or scene based on images from at least two mobile video recording devices, said computer-usable medium embodying computer program code, said computer program code comprising computer executable instructions configured to perform triangulation to determine the relative positions of the plurality of mobile video recording devices with respect to the subject/scene.
PCT/GB2011/001264 2011-08-23 2011-08-23 Improvements in automatic video production WO2013026991A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/GB2011/001264 WO2013026991A1 (en) 2011-08-23 2011-08-23 Improvements in automatic video production

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/GB2011/001264 WO2013026991A1 (en) 2011-08-23 2011-08-23 Improvements in automatic video production

Publications (1)

Publication Number Publication Date
WO2013026991A1 (en)

Family

ID=44583182

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2011/001264 WO2013026991A1 (en) 2011-08-23 2011-08-23 Improvements in automatic video production

Country Status (1)

Country Link
WO (1) WO2013026991A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
NL2012399A (en) * 2014-03-11 2015-11-19 De Vroome Poort B V Autonomous camera system for capturing sporting events.
WO2018004354A1 (en) * 2016-07-01 2018-01-04 Teameye As Camera system for filming sports venues

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1480450A2 (en) * 2003-05-20 2004-11-24 British Broadcasting Corporation Automated video production
US20090063419A1 (en) * 2007-08-31 2009-03-05 Jukka Kalevi Nurminen Discovering peer-to-peer content using metadata streams
US20090148124A1 (en) * 2007-09-28 2009-06-11 Yahoo!, Inc. Distributed Automatic Recording of Live Event
US20100014750A1 (en) * 2008-07-18 2010-01-21 Fuji Xerox Co., Ltd. Position measuring system, position measuring method and computer readable medium
US20100026809A1 (en) * 2008-07-29 2010-02-04 Gerald Curry Camera-based tracking and position determination for sporting events

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1480450A2 (en) * 2003-05-20 2004-11-24 British Broadcasting Corporation Automated video production
US20090063419A1 (en) * 2007-08-31 2009-03-05 Jukka Kalevi Nurminen Discovering peer-to-peer content using metadata streams
US20090148124A1 (en) * 2007-09-28 2009-06-11 Yahoo!, Inc. Distributed Automatic Recording of Live Event
US20100014750A1 (en) * 2008-07-18 2010-01-21 Fuji Xerox Co., Ltd. Position measuring system, position measuring method and computer readable medium
US20100026809A1 (en) * 2008-07-29 2010-02-04 Gerald Curry Camera-based tracking and position determination for sporting events

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
NL2012399A (en) * 2014-03-11 2015-11-19 De Vroome Poort B V Autonomous camera system for capturing sporting events.
EP2966851A1 (en) * 2014-03-11 2016-01-13 De Vroome Poort B.V. Autonomous camera system for capturing sporting events
WO2018004354A1 (en) * 2016-07-01 2018-01-04 Teameye As Camera system for filming sports venues

Similar Documents

Publication Publication Date Title
US12135867B2 (en) Methods and systems for presenting direction-specific media assets
US8588824B2 (en) Transferring media context information based on proximity to a mobile device
CN106257930B (en) Generate the dynamic time version of content
CN102547479B (en) The generation of media metadata and supply
US9578365B2 (en) High quality video sharing systems
AU2019216671A1 (en) Method and apparatus for playing video content from any location and any time
US10212325B2 (en) Systems and methods to control camera operations
WO2018102243A1 (en) Live video recording, streaming, viewing, and storing mobile application, and systems and methods of use thereof
CN112822563A (en) Method, device, electronic equipment and computer readable medium for generating video
US11048748B2 (en) Search media content based upon tempo
CN104335594A (en) Automatic digital curation and tagging of action videos
WO2014001607A1 (en) Video remixing system
US11277668B2 (en) Methods, systems, and media for providing media guidance
CN107172502B (en) Virtual reality video playing control method and device
US10924803B2 (en) Identifying viewing characteristics of an audience of a content channel
JP2019033430A (en) Movie reproduction apparatus, control method thereof, and program
CN111800668A (en) Bullet screen processing method, device, equipment and storage medium
CN113315980A (en) Intelligent live broadcast method and live broadcast Internet of things system
US20120099842A1 (en) Editing apparatus, editing method, program, and recording media
WO2013026991A1 (en) Improvements in automatic video production
KR101958936B1 (en) Method and system for constructing content of interest in a tv channel
US10137371B2 (en) Method of recording and replaying game video by using object state recording method
Fujisawa et al. Automatic content curation system for multiple live sport video streams
CN112969028A (en) Intelligent live broadcast method and live broadcast Internet of things system
KR102372181B1 (en) Display device and method for control thereof

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 11752321

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 11752321

Country of ref document: EP

Kind code of ref document: A1