WO2017079735A1 - Method and device for capturing synchronized video and sound across multiple mobile devices - Google Patents

Method and device for capturing synchronized video and sound across multiple mobile devices

Info

Publication number
WO2017079735A1
WO2017079735A1 (application PCT/US2016/060803, US2016060803W)
Authority
WO
WIPO (PCT)
Prior art keywords
video
mobile device
time
mobile
mobile devices
Prior art date
Application number
PCT/US2016/060803
Other languages
French (fr)
Inventor
Joel Boyce THACKER
Wayne Anthony KILLIUS
Original Assignee
Video Pipe Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Video Pipe Inc. filed Critical Video Pipe Inc.
Publication of WO2017079735A1 publication Critical patent/WO2017079735A1/en


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25: Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/258: Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
    • H04N21/25866: Management of end-user data
    • H04N21/25891: Management of end-user data being end-user preferences
    • H04N21/27: Server based end-user applications
    • H04N21/274: Storing end-user multimedia data in response to end-user request, e.g. network recorder
    • H04N21/2743: Video hosting of uploaded data from client
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41: Structure of client; Structure of client peripherals
    • H04N21/414: Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance
    • H04N21/41407: Specialised client platforms embedded in a portable device, e.g. video client on a mobile phone, PDA, laptop
    • H04N21/422: Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/4223: Cameras
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302: Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4305: Synchronising client clock from received content stream, e.g. locking decoder clock with encoder clock, extraction of the PCR packets
    • H04N21/80: Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81: Monomedia components thereof
    • H04N21/8146: Monomedia components thereof involving graphical data, e.g. 3D object, 2D graphics
    • H04N21/8153: Monomedia components thereof involving graphical data comprising still images, e.g. texture, background image


Abstract

A method for generating a time synchronized video and sound recording independently on a plurality of mobile devices via a mobile device application, the device comprising a video capturing unit, a sound capturing unit, a receiver, a timing engine, a SMPTE generator, at least one sensor to detect the position and/or orientation of the mobile device, and a communication unit. The timing engine further comprises a reference clock source retrieval and a time estimation algorithm utilizing one or more time reference sources, enabling multiple mobile devices with the mobile device application capturing a live event to synchronize their reference clocks, upload the video to a centralized location, and share the captured video with the others via a web portal.

Description

METHOD AND DEVICE FOR CAPTURING SYNCHRONIZED VIDEO AND SOUND
ACROSS MULTIPLE MOBILE DEVICES
FIELD OF THE INVENTION
[0001] The present invention is generally directed toward a method and device for synchronizing recorded video and sound. More particularly, the present invention relates to a method and device for generating a time synchronized video and sound recording in a mobile device.
BACKGROUND OF THE INVENTION
[0002] Systems that acquire and analyze image and sound from multiple devices are well known in the art. Large scale production companies have the specific equipment necessary to capture, synchronize and produce a single product from numerous sources. For example, multiple cameras can be used to capture an event from different angles. However, this equipment is impractical and cost prohibitive for use by the general public. More specifically, this technology is not generally available to the average consumer, as it is very expensive, large, and heavy. Often, viewers of a live event would like to see a particular aspect from another angle, but they are limited to replays recorded by the production companies, or occasionally recordings that are uploaded to streaming sites.
[0003] Improvements to the scale of equipment needed are also known in the art. However, for these mobile methods, the cameras are synchronized in place and time via a telecommunications network. Due to calibration between cameras, each camera is affected by every other camera in a multi camera session. Further, some of the methods require a master camera to synchronize other cameras or mobile devices. While the prior art has provided for the synchronization of time references so that the inputs and outputs of data capturing devices can be coordinated and controlled in a deterministic manner, these prior art systems do not allow for the use of distributed time stamp technology and independent cameras which start at different times in different locations.
SUMMARY OF THE INVENTION
[0004] The presently disclosed invention is a method and device for independently capturing and synchronizing video and sound on a plurality of mobile devices via a mobile device application, comprising a video capturing unit, a sound capturing unit, a receiver, a timing engine, at least one sensor to detect the position and/or orientation of the mobile device, and a communication unit. The invention enables multiple mobile devices capturing a live event to independently synchronize their reference clocks, upload the video to a centralized location and share the captured video with the other mobile devices via a web portal, alleviating the need for multiple mobile devices present in one location, a master camera, and/or a shared network.
BRIEF DESCRIPTION OF THE DRAWINGS
[0005] Further advantages of the invention will become apparent by reference to the detailed description of preferred embodiments when considered in conjunction with the drawings:
[0006] FIG. 1 depicts a flow chart of the application of the method.
[0007] FIG. 2 depicts a flow chart of the application of the method.
DETAILED DESCRIPTION
[0008] The following detailed description is presented to enable any person skilled in the art to make and use the invention. For purposes of explanation, specific details are set forth to provide a thorough understanding of the present invention. However, it will be apparent to one skilled in the art that these specific details are not required to practice the invention. Descriptions of specific applications are provided only as representative examples. Various modifications to the preferred embodiments will be readily apparent to one skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the scope of the invention. The present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest possible scope consistent with the principles and features disclosed herein.
[0009] In a mobile experience and content-driven society, a need exists for a system which allows individuals present at an event to independently capture live content, obtain the best version of that content, and collaborate with other observers. Video capture and synchronization methods are presently reliant on other mobile devices either due to their location and/or start time, and some even require a master camera. Moreover, the majority of these systems require that the multiple cameras utilize a shared network, such as a cellular network. These requirements are impractical and render the attempted methods unusable. For instance, at an event, each observer in the audience has a unique and limited video and sound experience, aspects of which complement other observers' experience. Therefore, a method and device is needed that does not require other mobile devices to be present in the same place and recording at the same time to create frame accurate multi camera recording and generate multi view video recordings of the same event from different angles and locations.
[0010] The presently disclosed invention is a method and device for independently capturing and synchronizing video and sound on a plurality of mobile devices via a mobile device application, comprising a video capturing unit, a sound capturing unit, a receiver, a timing engine, a SMPTE generator, at least one sensor to detect the position and/or orientation of the mobile device, and a communication unit. The timing engine further comprises a reference clock source retrieval and a time estimation algorithm utilizing one or more time reference sources. As shown in FIG. 1, the method enables multiple mobile devices with the mobile device application capturing a live event to synchronize their internal reference clock, upload the video to a centralized location, and share the captured video with the others via a web portal.
[0011] The mobile device application runs on a user's mobile device and captures video and associated metadata, including the time the video began, the orientation of the mobile device camera, the location of the user, etc., at full frame rate. After recording, the metadata is sent to the server, which contains metadata from potentially thousands of users at the same event recording video at the same time. The user can then apply the first filter to obtain videos with the best focus, angles, locations, etc. and view a thumbnail of these sources. The sources are provided in the best quality possible based on a computation of the quality available given the bandwidth of the network and other user preferences. This determination is the second filter of the method. Based on the resulting thumbnail, the user can then choose and edit the best video for his or her specifications.
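By way of illustration only, the following is a minimal sketch of how the first filter described above could operate over per-recording metadata held on the server. The record fields (start_time, focus_score, location, orientation_deg) and the thresholds are assumptions made for this example, not details taken from the disclosure.

    # Hypothetical illustration of the "first filter": select candidate sources
    # from per-recording metadata stored on the server. All field names are assumed.
    from dataclasses import dataclass

    @dataclass
    class SourceMetadata:
        device_id: str
        start_time: float        # seconds since epoch when capture began
        focus_score: float       # 0.0 (blurry) to 1.0 (sharp); assumed quality metric
        location: tuple          # (latitude, longitude) of the user
        orientation_deg: float   # compass heading of the camera

    def first_filter(sources, point_of_interest, max_distance_deg=0.01, min_focus=0.6):
        """Keep sources recorded near the point of interest with acceptable focus,
        best-focused first."""
        candidates = [
            s for s in sources
            if abs(s.location[0] - point_of_interest[0]) < max_distance_deg
            and abs(s.location[1] - point_of_interest[1]) < max_distance_deg
            and s.focus_score >= min_focus
        ]
        return sorted(candidates, key=lambda s: s.focus_score, reverse=True)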
[0012] This mobile method can also be utilized by large scale producers. For instance, a NASCAR producer can choose the best video and audio of a crash that occurred at a turn of a race by accessing the server containing views from each user of the mobile device application at the specific turn of the race that he or she is interested in. If the quality of the audio of the recordings nearest to the crash is poor, the producer can then pick audio from a different location of the track at the exact same time to create an improved audio and video clip for viewers of the event.
[0013] As shown in FIG. 2, in the preferred embodiment of the method, multiple users with the mobile device application can independently capture video and sound of a single event at full frame rate via a mobile device.
[0014] Since there is no separate processor used for synchronizing video, the 1 GHz clock crystal contained in the mobile device serves as the master bus clock and is also used as a master reference clock for generating SMPTE. While capturing the video and audio, the video is synchronized at the lens opening and frame accurate SMPTE time code is printed on the video at capture. Audio is also time coded when captured at the A to D converter from either the microphone input or the line input.
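As a minimal sketch of the frame-accurate time coding described above, the following assumes the reference clock has already been calibrated by the timing engine and derives an hh:mm:ss:ff SMPTE-style time code for the moment a frame is captured; the function name and the 30 frames-per-second figure (borrowed from the 30 Hz example later in the disclosure) are illustrative.

    import time

    FRAME_RATE = 30  # frames per second; assumed, matching the 30 Hz example below

    def smpte_timecode(reference_time_s, frame_rate=FRAME_RATE):
        """Convert a calibrated reference-clock time (seconds) into an
        hh:mm:ss:ff time code string for the current time of day."""
        seconds_of_day = reference_time_s % 86400
        whole = int(seconds_of_day)
        frames = int((seconds_of_day - whole) * frame_rate)
        hh, rem = divmod(whole, 3600)
        mm, ss = divmod(rem, 60)
        return f"{hh:02d}:{mm:02d}:{ss:02d}:{frames:02d}"

    # Example: stamp the current (assumed already calibrated) time onto a frame.
    print(smpte_timecode(time.time()))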
[0015] The mobile device can be a mobile phone, pad, or computer capable of performing the method.
[0016] The SMPTE generator stripes video for use in synchronization and is a separate entity, independent of the timing engine and the algorithm engine discussed below. While the mobile device is in the powered-on state, the SMPTE is continuously being generated and calibrated to reflect the time of day as accurately as possible. Output of the SMPTE generator goes directly to the analog to digital converter at the lens opening, where the CMOS sensor collects the image and translates it into a digital stream. This allows extremely accurate timing information to be printed to the video at the proper place in the hardware so that latency does not occur. The SMPTE time code is then stored in the standard user atom reserved for timing and location metadata. The CODEC stores the data during the compression stage of processing the video. When the CODEC decompresses the video, the SMPTE and metadata are unaffected by the CODEC, making it frame accurate.
[0017] While SMPTE is the preferred time encoding mechanism to stripe the video captured, all other mechanisms capable of printing accurate time on the video at capture are contemplated by this method.
[0018] Unlike the prior art, this method creates its own parameters and timing information in each mobile device independently. Each camera is completely autonomous and self-synchronizes using a timing engine, comprising a reference clock receiver and a time estimation algorithm utilizing one or more time reference sources. This timing engine receives and synchronizes several related and unrelated time reference sources, and a reference clock is generated from these time reference sources using the proprietary algorithm. The proprietary algorithm makes decisions based on the validity of the time source, and either considers or discards the timing information based on other time reference sources available at the time, which are compared to any timing information received from any server available. Using a single source, such as Network Time Protocol (NTP), for time stamping may not be sufficient in some cases without additional time sources, such as GPS/P data, Unix timestamps, and cell packet conversion, to choose the best time reference and recalibrate the SMPTE time code that is continuously being generated in the process cache.
[0019] The time reference sources include, without limitation, NTP, cellular network sources, GPS precision data, and other available protocols. The triangulation retrieval pings multiple locations to verify the location; a second ping then receives the time stamp and generates a single time stamp from multiple towers. The GPS/P data uses the current location and future locations to calculate accurate timing. When the GPS satellite reaches a fixed point, the reference clock is re-calibrated and compared with the preceding reference clocks.
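The disclosure describes the combining algorithm as proprietary; the following is only a hedged sketch of the general idea of fusing several time reference sources: discard any source whose offset disagrees strongly with the consensus of the others, and average what remains. The validity threshold and the equal weighting are assumptions.

    from statistics import median

    def estimate_clock_offset(source_offsets, max_deviation_s=0.05):
        """source_offsets maps a source name ('ntp', 'gps', 'cell', ...) to the
        offset in seconds between that source's time and the local clock.
        Returns the correction applied to the internal reference clock, or None
        if no source survives the validity check."""
        if not source_offsets:
            return None
        consensus = median(source_offsets.values())
        valid = {name: off for name, off in source_offsets.items()
                 if abs(off - consensus) <= max_deviation_s}
        if not valid:
            return None
        return sum(valid.values()) / len(valid)

    # Example: GPS and NTP agree; a stale cellular reading is discarded.
    print(estimate_clock_offset({"ntp": 0.012, "gps": 0.010, "cell": 0.400}))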
[0020] The timing algorithm improves the internal device timing engine to within a tolerance such that the videos across devices appear to be synchronized to the average viewer (usually +/- 2 video frames at a 30 Hz frame rate), and reduces battery usage by reducing the number of times the reference clock is re-calibrated. By looking at the calibration history, and using the internal crystal clock as a reference, the one gigahertz master clock is divided and stepped down to 1/30th of a second, which is then compared to the current reference clock's most recent calibration.
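A small sketch of the re-calibration check this paragraph implies, under stated assumptions: the 1 GHz master clock count is stepped down to 1/30-second frame ticks and compared against the elapsed time according to the most recent calibration, and the clock is re-calibrated only when the disagreement exceeds the roughly two-frame viewer tolerance, which in turn reduces battery usage.

    MASTER_CLOCK_HZ = 1_000_000_000                   # 1 GHz master bus clock
    FRAME_RATE = 30                                   # 30 Hz frame rate
    TICKS_PER_FRAME = MASTER_CLOCK_HZ // FRAME_RATE   # clock counts per 1/30 s
    TOLERANCE_FRAMES = 2                              # +/- 2 frames looks synchronized

    def needs_recalibration(master_ticks_since_calibration, reference_elapsed_s):
        """Compare elapsed frames measured by the crystal against elapsed frames
        according to the reference clock's last calibration; re-calibrate only
        when the disagreement exceeds the tolerance."""
        frames_by_crystal = master_ticks_since_calibration / TICKS_PER_FRAME
        frames_by_reference = reference_elapsed_s * FRAME_RATE
        return abs(frames_by_crystal - frames_by_reference) > TOLERANCE_FRAMES

    # Example: about 10 s elapsed, crystal drift of roughly one frame, so no re-calibration.
    print(needs_recalibration(10_033_000_000, 10.0))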
[0021] Software optimizes the algorithm by accounting for previous optimization, current optimization, and, most importantly, other users' optimization. When the mobile devices are being used at the same locations, the most recent and best algorithm communicates with other devices in the same location, and after comparison, the best algorithm upgrades the surrounding mobile devices either via Wi-Fi or cellular network. An accurate time clock is then imprinted on each frame of video at the lens opening. Due to this independence, each camera remains unaffected by other cameras in a multi camera session.
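One possible reading of the peer-upgrade step, sketched with assumed fields: each device advertises a figure of merit for its current calibration (here an estimated uncertainty, which is an assumption of this example), and a device adopts a nearby peer's calibration when it is measurably better. The Wi-Fi or cellular transport is abstracted away.

    from dataclasses import dataclass

    @dataclass
    class Calibration:
        offset_s: float          # correction applied to the local reference clock
        uncertainty_s: float     # assumed figure of merit: lower is better

    def adopt_better_calibration(local, peers):
        """Return the best calibration among the local one and those advertised
        by nearby devices at the same location."""
        best = local
        for peer in peers:
            if peer.uncertainty_s < best.uncertainty_s:
                best = peer
        return best

    mine = Calibration(offset_s=0.020, uncertainty_s=0.010)
    nearby = [Calibration(0.012, 0.003), Calibration(0.030, 0.015)]
    print(adopt_better_calibration(mine, nearby))   # adopts the 3 ms-uncertainty peer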
[0022] In addition to the time coded video being captured, at least one sensor in the mobile device detects the position and/or orientation of the mobile device, which is tagged as metadata.
[0023] Potential embodiments of the metadata capture include, without limitation, the time the video capture began, the location of the user at every representative point in the video, compass readings, accelerometers, and other reference information necessary for editing and identification.
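A sketch of what such a metadata record might look like, using assumed field names; the actual metadata layout is not specified in the disclosure.

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class MotionSample:
        timecode: str                          # SMPTE time code of the sample
        location: Tuple[float, float]          # (latitude, longitude) of the user
        compass_deg: float                     # compass heading of the camera
        accel_g: Tuple[float, float, float]    # accelerometer reading

    @dataclass
    class CaptureMetadata:
        device_id: str
        capture_start_timecode: str            # time the video capture began
        samples: List[MotionSample] = field(default_factory=list)

    meta = CaptureMetadata("device-42", "19:30:12:07")
    meta.samples.append(
        MotionSample("19:30:12:07", (35.20, -80.43), 184.0, (0.0, 0.0, 1.0)))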
[0024] After capture, the frame accurate SMPTE time coded video and additional metadata are uploaded to an online content server by breaking up the video and uploading it by frames. The server then assembles the video using the frames. The uploading of frames can occur all at once or a few frames at a time. Since the frames are accurate when uploaded to the server, no additional synchronization is necessary, other than minor timing adjustments made on the server itself.
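A minimal sketch of the server-side assembly step, assuming each uploaded frame carries its SMPTE time code: because the fixed-width time codes sort correctly as strings, frames that arrive all at once or a few at a time, in any order, can simply be put back in capture order.

    def assemble_video(uploaded_frames):
        """uploaded_frames: iterable of (smpte_timecode, frame_bytes) pairs that
        may arrive in any order. Sorting by time code restores capture order,
        so no further synchronization is needed on the server."""
        ordered = sorted(uploaded_frames, key=lambda item: item[0])
        return [frame for _, frame in ordered]

    frames = [("19:30:12:08", b"\x02"), ("19:30:12:07", b"\x01"), ("19:30:12:09", b"\x03")]
    print(assemble_video(frames))   # frames back in capture order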
[0025] While stored on the server, timing and color correction of the content occurs, after which the videos are accessible by users. The method receives timing information and makes decisions to create the most accurate time code internally. No external synchronization device is used. Timing is corrected, if necessary, on the server.
[0026] The method also contemplates 3D capabilities of synchronized footage using commercially available software.
[0027] Once on the server, the first level of selection is performed by the user. A second proprietary algorithm filters the data based on what the user is seeking so that a user can obtain the potential good sources. The mobile device application software manages bandwidth by compressing potential good sources and sending them in a thumbnail format for view by the user. For example, if thousands of users are at the same event recording video at the same time, the application does not transmit all of the captured data to the server. Instead, the captured low resolution video thumbnails, including metadata, are sent to the server and stored for each interested user.
[0028] The quality of service is then determined based on the bandwidth of the network, the frames the user is interested in, and the minimum quality of video the user is willing to accept in terms of frame rate and size. Latency is minimized and the user is able to control the selection and quality. For instance, if a user wants to compress all of the potential good sources in 10 seconds, the user will be provided with 200 x 100 quality images at a .25 frame rate in 10 seconds.
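A hedged sketch of the quality-of-service computation described above: given the network bandwidth, a time budget, and the number and length of the candidate sources, pick the largest thumbnail resolution and frame rate whose total transfer fits the budget, never dropping below the 200 x 100 / 0.25 frame-per-second floor quoted in the paragraph. The candidate quality tiers and the bytes-per-pixel cost are assumptions.

    def thumbnail_plan(bandwidth_bps, time_budget_s, n_sources, clip_length_s,
                       min_size=(200, 100), min_fps=0.25, bytes_per_pixel=0.1):
        """Pick the largest thumbnail resolution and frame rate whose total
        transfer fits the time budget; bytes_per_pixel is an assumed
        compressed-image cost."""
        budget_bytes = bandwidth_bps / 8 * time_budget_s
        tiers = [((800, 400), 2.0), ((400, 200), 1.0), (min_size, min_fps)]
        for (w, h), fps in tiers:
            frames = fps * clip_length_s * n_sources
            if frames * w * h * bytes_per_pixel <= budget_bytes:
                return (w, h), fps
        return min_size, min_fps   # fall back to the minimum acceptable quality

    # Example: a 5 Mbit/s link, 10 s budget, 50 sources, 30 s clips.
    print(thumbnail_plan(5_000_000, 10, 50, 30))   # ((200, 100), 0.25)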
[0029] Compression methods include, but are not limited to, spatial subsampling/minimization, temporal subsampling, and image quality reduction.
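For illustration, the three reductions listed above could be applied to a clip as follows; NumPy is used for the frame arrays, and the subsampling factors and quantization levels are arbitrary example values.

    import numpy as np

    def spatial_subsample(frame, factor=2):
        """Keep every factor-th pixel in both dimensions (spatial subsampling)."""
        return frame[::factor, ::factor]

    def temporal_subsample(frames, keep_every=4):
        """Keep every keep_every-th frame (temporal subsampling)."""
        return frames[::keep_every]

    def reduce_quality(frame, levels=32):
        """Coarsely quantize pixel values (image quality reduction)."""
        step = 256 // levels
        return (frame // step) * step

    clip = [np.random.randint(0, 256, (1080, 1920), dtype=np.uint8) for _ in range(8)]
    preview = [reduce_quality(spatial_subsample(f)) for f in temporal_subsample(clip)]
    print(len(preview), preview[0].shape)   # 2 frames at 540 x 960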
[0030] Determining the quality of service in this way is the most efficient use of the available bandwidth: the determination is based on what has been achieved over the past few frames, constantly updating the parameters and providing the user with the smallest frames he or she can work with.
[0031] After the quality of service is determined based on the various parameters, the best video or videos that fit the user's specifications are downloaded for the user to view. Further, once a video is selected, the application utilizes commercially available software which is integrated seamlessly with the mobile device application for editing the video. Users can download or upload a full bandwidth version of the video once the editing has been finished.
[0032] All video built on a mobile device is a low resolution representation of what is actually being built on the server. The method does not provide a multi view of a scene to individual users. Rather, users log into a community editor located on a social network (Facebook, Google+, etc.) and view their footage, along with other users' footage, so that it can be edited together. Users can communicate via social media and text messaging during the editing process to allow for seamless communication with other users.
[0033] The terms "comprising," "including," and "having," as used in the claims and specification herein, shall be considered as indicating an open group that may include other elements not specified. The terms "a," "an," and the singular forms of words shall be taken to include the plural form of the same words, such that the terms mean that one or more of something is provided. The term "one" or "single" may be used to indicate that one and only one of something is intended. Similarly, other specific integer values, such as "two," may be used when a specific number of things is intended. The terms "preferably," "preferred," "prefer," "optionally," "may," and similar terms are used to indicate that an item, condition or step being referred to is an optional (not required) feature of the invention.
[0034] The invention has been described with reference to various specific and preferred embodiments and techniques. However, it should be understood that many variations and modifications may be made while remaining within the spirit and scope of the invention. It will be apparent to one of ordinary skill in the art that methods, devices, device elements, materials, procedures and techniques other than those specifically described herein can be applied to the practice of the invention as broadly disclosed herein without resort to undue experimentation. All art-known functional equivalents of methods, devices, device elements, materials, procedures and techniques described herein are intended to be encompassed by this invention. Whenever a range is disclosed, all subranges and individual values are intended to be encompassed. This invention is not to be limited by the embodiments disclosed, including any shown in the drawings or exemplified in the specification, which are given by way of example and not of limitation.
[0035] While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this disclosure, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as disclosed herein. Accordingly, the scope of the invention should be limited only by the attached claims.
[0036] All references throughout this application, for example patent documents including issued or granted patents or equivalents, patent application publications, and nonpatent literature documents or other source material, are hereby incorporated by reference herein in their entireties, as though individually incorporated by reference, to the extent each reference is at least partially not inconsistent with the disclosure in the present application (for example, a reference that is partially inconsistent is incorporated by reference except for the partially inconsistent portion of the reference).

Claims

CLAIMS: We Claim:
1. A method for generating time synchronized video and audio recording across multiple mobile devices comprising the steps of:
a. capturing audio and video of a live event via multiple mobile devices capable of recording audio and video;
b. synchronizing each mobile device reference clock independently;
c. uploading the recorded videos to a server;
d. filtering the recorded videos based on user preferences via an algorithm;
e. viewing recorded videos in a compressed format;
f. selecting a full bandwidth video from the recorded videos in compressed format; and
g. sharing the selected full bandwidth video with other mobile devices via a web portal.
2. The mobile device of claim 1 wherein said mobile device is a smart mobile phone.
3. The mobile device of claim 1 wherein said mobile device is a smart mobile pad.
4. The mobile device of claim 1 wherein said mobile device is a proprietary device.
5. A device for generating a time synchronized video and sound recording across multiple mobile devices comprising:
a. a video capturing unit;
b. a sound capturing unit;
c. a receiver;
d. a timing engine for receiving and synchronizing several related and unrelated time reference sources;
e. a SMPTE generator for printing accurate timing information on the recorded video at the proper place in the mobile device hardware;
f. at least one sensor for detecting the position and orientation information of the mobile device tagged as metadata; and
g. a communication unit.
6. The mobile device of claim 5 wherein said mobile device is a smart mobile phone.
7. The mobile device of claim 5 wherein said mobile device is a smart mobile pad.
8. The mobile device of claim 5 wherein said mobile device is a proprietary device.
9. The timing engine of claim 5 further comprising a reference clock source retriever and a time estimation algorithm utilizing one or more time reference sources.
10. The timing engine of claim 9 wherein said reference source is Network Time Protocol.
11. The timing engine of claim 9 wherein said reference source is cellular.
12. The timing engine of claim 9 wherein said reference source is GPS.
13. A method for managing video bandwidth comprising:
a. capturing metadata associated with a video;
b. storing the metadata on a server;
c. determining the preferences of a user;
d. determining the minimum quality of the video necessary in terms of frame rate and size;
e. compressing the sources meeting user preferences and quality requirements; and
f. sending the compressed sources in a thumbnail format for view by the user.
PCT/US2016/060803 2015-11-05 2016-11-07 Method and device for capturing synchronized video and sound across multiple mobile devices WO2017079735A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201562251382P 2015-11-05 2015-11-05
US62/251,382 2015-11-05

Publications (1)

Publication Number Publication Date
WO2017079735A1 (en) 2017-05-11

Family

ID=58662459

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2016/060803 WO2017079735A1 (en) 2015-11-05 2016-11-07 Method and device for capturing synchronized video and sound across multiple mobile devices

Country Status (1)

Country Link
WO (1) WO2017079735A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110290287A (en) * 2019-06-27 2019-09-27 上海玄彩美科网络科技有限公司 Multi-cam frame synchornization method
WO2021019342A1 (en) * 2019-07-30 2021-02-04 International Business Machines Corporation Synchronized sound generation from videos
US11924397B2 (en) 2020-07-23 2024-03-05 Samsung Electronics Co., Ltd. Generation and distribution of immersive media content from streams captured via distributed mobile devices
WO2024064171A1 (en) * 2022-09-22 2024-03-28 Apple Inc. Collaborative video recording

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030204630A1 (en) * 2002-04-29 2003-10-30 The Boeing Company Bandwidth-efficient and secure method to combine multiple live events to multiple exhibitors
US20100161779A1 (en) * 2008-12-24 2010-06-24 Verizon Services Organization Inc System and method for providing quality-referenced multimedia
US20140289629A1 (en) * 2011-09-23 2014-09-25 Klip, Inc. Rapid preview of remote video content
US20150098690A1 (en) * 2013-10-09 2015-04-09 Mindset Systems Method of and System for Automatic Compilation of Crowdsourced Digital Media Productions

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110290287A (en) * 2019-06-27 2019-09-27 上海玄彩美科网络科技有限公司 Multi-cam frame synchornization method
CN110290287B (en) * 2019-06-27 2022-04-12 上海玄彩美科网络科技有限公司 Multi-camera frame synchronization method
WO2021019342A1 (en) * 2019-07-30 2021-02-04 International Business Machines Corporation Synchronized sound generation from videos
US11276419B2 (en) 2019-07-30 2022-03-15 International Business Machines Corporation Synchronized sound generation from videos
GB2600600A (en) * 2019-07-30 2022-05-04 Ibm Synchronized sound generation from videos
GB2600600B (en) * 2019-07-30 2022-10-26 Ibm Synchronized sound generation from videos
JP7475423B2 (en) 2019-07-30 2024-04-26 インターナショナル・ビジネス・マシーンズ・コーポレーション Synchronized speech generation from video
US11924397B2 (en) 2020-07-23 2024-03-05 Samsung Electronics Co., Ltd. Generation and distribution of immersive media content from streams captured via distributed mobile devices
WO2024064171A1 (en) * 2022-09-22 2024-03-28 Apple Inc. Collaborative video recording

Similar Documents

Publication Publication Date Title
US10123070B2 (en) Method and system for central utilization of remotely generated large media data streams despite network bandwidth limitations
US11729465B2 (en) System and method providing object-oriented zoom in multimedia messaging
CA2656826C (en) Embedded appliance for multimedia capture
US20160337718A1 (en) Automated video production from a plurality of electronic devices
US9118951B2 (en) Time-synchronizing a parallel feed of secondary content with primary media content
US20150222815A1 (en) Aligning videos representing different viewpoints
EP2816564B1 (en) Method and apparatus for smart video rendering
WO2017079735A1 (en) Method and device for capturing synchronized video and sound across multiple mobile devices
US20230145948A1 (en) System and method for interleaved media communication and conversion
KR101821145B1 (en) Video live streaming system
US11924397B2 (en) Generation and distribution of immersive media content from streams captured via distributed mobile devices
GB2496414A (en) Prioritising audio and/or video content for transmission over an IP network
US10262693B2 (en) Direct media feed enhanced recordings
KR20180032917A (en) Client device and local clock skew compensation method thereof
US8824854B2 (en) Method and arrangement for transferring multimedia data
US11076197B1 (en) Synchronization of multiple video-on-demand streams and methods of broadcasting and displaying multiple concurrent live streams
KR102320670B1 (en) Data synchronization system and method
KR102367165B1 (en) The syncronizing method for the filming time and the apparatus
CN105992065B (en) Video on demand social interaction method and system
JP2006157492A (en) Multi stream multiplexing recording device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16863159

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16863159

Country of ref document: EP

Kind code of ref document: A1