WO2022074359A1 - Processing audio or video data

Processing audio or video data

Info

Publication number
WO2022074359A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
video
video file
data
tag
Application number
PCT/GB2021/052497
Other languages
French (fr)
Inventor
Alexander Macdonald
Terje Aasen
Original Assignee
Innovative Video Solutions Llc
Wilson, Timothy
Application filed by Innovative Video Solutions Llc and Wilson, Timothy
Publication of WO2022074359A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • H04N21/8549 Creating video summaries, e.g. movie trailer
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031 Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/102 Programmed access in sequence to addressed parts of tracks of operating record carriers
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/34 Indicating arrangements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234345 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements, the reformatting operation being performed only on part of the stream, e.g. a region of the image or a time segment
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312 Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440245 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display, the reformatting operation being performed only on part of the stream, e.g. a region of the image or a time segment
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/858 Linking data to content, e.g. by linking an URL to a video object, by creating a hotspot

Definitions

  • Figure 1 is a schematic drawing of a device embodying the invention being used to record a sporting event
  • Figure 2 is a flow chart of steps carried out when recording an event on the device
  • Figure 3 is a flow chart of steps carried out when compacting a video file for sharing after recording the event
  • Figure 4 is a screenshot from the device of an interface for recording live video
  • Figure 5 is a screenshot from the device of an interface for a tag library
  • Figure 6 is a screenshot from the device of an interface for creating a new tag
  • Figure 7 is a screenshot from the device of an interface for reviewing details of a tag
  • Figure 8 is a screenshot from the device of an interface for reviewing a video
  • Figure 9 is a screenshot from the device of an interface for sharing a tagged segment of a video
  • Figure 10 is a screenshot from the device of an interface for compacting a video for sharing.
  • Figure 1 shows an electronic device 1 such as a smartphone, networked camera or tablet computer. It has a camera module 2, including a lens 2a, microphone 2b and associated electronics, for capturing video and audio footage of a scene 3 such as a sporting event. It also has a radio modem 4 for sending video and audio data to a remote device 5 over a radio access network 7 and the Internet 6.
  • the remote device 5 could be a client device, such as another smartphone, or it could be a server associated with a cloud video storage and/or sharing service.
  • the scene 3 could include a sporting event, a family gathering, a lecture, or any other type of happening.
  • the device 1 comprises a touchscreen display 8 for displaying content to a user and for receiving touch inputs from the user. It is controlled by a processor 9 (e.g. a system-on-chip device).
  • the processor 9 executes software instructions stored in flash memory 10 or in RAM 11, which may include a conventional operating system (e.g. Android™, iOS™, Linux™ or Windows™).
  • the device 1 is powered by a battery 12. It may include other conventional components which are not shown here for simplicity.
  • Figure 2 outlines the key steps in a process of recording and tagging live video (including audio). The steps are performed by a user interacting with software executing on the processor 9 through a touch-based graphical user interface (GUI) on the display 8.
  • the user presses a button to initiate 20 the recording of live video (including a synchronized soundtrack) by the camera 2.
  • the processor stores the video frames and audio data as a video file in the RAM 11 or flash memory 10; the file may be written to directly while recording, or after temporary buffering of some or all of the video or audio data in the RAM 11.
  • the user can touch 21 a "highlight” button on the screen 8 at key moments to tag corresponding segments of the video.
  • the screen 8 may show one or multiple different “highlight” buttons while the video is being recorded, for creating tags of different types.
  • the different tags have categories or labels that are relevant to the type of activity that is being recorded — e.g. "laughter”, “hug” and “wave to camera” during a social occasion, or “goal”, “penalty kick", “foul” during a football match.
  • the software application stores 22 associated highlight data (embodying tag data, as disclosed herein) in a file associated with the video; the file may be written to immediately, or only once the recording has ended.
  • the file may be stored in the RAM 11 or flash 10.
  • the user presses 23 a button to end the video recording.
  • tags (“highlights”) may also be added to a pre-recorded video file, which may include an audio track, rather than only during a live recording.
  • software on the device 1 may allow tags to be added to captured screen footage of a computer video game, which could be after the event (similarly to tagging other video files), or potentially during game play.
  • the tagging functionality could be integrated into the computer game, or could be provided by a separate application running alongside the computer game on the device 1.
  • the camera module 2 may allow visual and/or audible user reactions to be captured and stored simultaneously with the game footage.
  • the software may create a general tag (e.g. having a default generic tag category, such as “event”) in response to a user-tap in a wider area of the screen 8 — e.g. anywhere on a video display area.
  • the highlight data may be stored in the video file — e.g. as meta-data in a common video container file format — or it may be stored in one or more separate files. In some embodiments, it is stored in a JSON file that is saved on the device 1 and associated with the corresponding video file (e.g. being stored in a common directory folder or having a common file-name portion or identifier).
  • each tag object in the highlight data may encode: the tag category or type (e.g. “goal”); the position of the tag (e.g. as a time offset into the video in milliseconds, or a video frame number); the tag duration (e.g. in milliseconds or as a number of frames); and a unique tag identifier (ID).
  • the position data stored in a tag object may identify the beginning of the tagged segment (i.e. at the start of the back-trace period). In this case, the object may just store the total duration of the tagged segment. In other embodiments, the position data may identify the point within the video at which the user pressed the “highlight” button. In this case, the tag object will separately store the back-trace duration value and the forward duration value.
  • Each successive press of a “highlight” button causes a new tag object to be written to the JSON file associated with the video.
  • the video file is not itself altered by the tagging process.
  • Tags may be added similarly to live or pre-recorded audio recordings.
  • when the recording ends, the video file (or files, if the video is sufficiently long that it must be split across multiple files) and the associated JSON file are closed. If the JSON file is initially created in RAM 11, the app may copy the file to flash 10 at this point for longer-term, non-volatile storage, alongside the video file.
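  • purely by way of illustration, and not as part of the disclosed embodiments, the writing of such a tag object might be sketched as follows (the file-naming convention, field names and values are hypothetical):

    # Illustrative sketch only: append one tag object to a JSON "highlight" file
    # stored alongside the video.
    import json
    import os
    import uuid

    def add_tag(video_path, category, position_ms, back_trace_ms, forward_ms):
        tag_path = os.path.splitext(video_path)[0] + ".json"   # e.g. match.mp4 -> match.json
        tags = []
        if os.path.exists(tag_path):
            with open(tag_path) as f:
                tags = json.load(f)                            # existing tag objects, if any
        tags.append({
            "id": str(uuid.uuid4()),         # unique tag identifier
            "category": category,            # e.g. "penalty kick"
            "position_ms": position_ms,      # offset of the button press into the video
            "back_trace_ms": back_trace_ms,  # segment starts this long before the press
            "forward_ms": forward_ms,        # segment ends this long after the press
        })
        with open(tag_path, "w") as f:
            json.dump(tags, f, indent=2)

    # Example: tag a "goal" 128.5 s into the recording.
    # add_tag("match.mp4", "goal", position_ms=128500, back_trace_ms=8000, forward_ms=12000)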
  • Figure 3 outlines the key steps in a process of compacting a tagged video file (that is, a video file having associated highlight data). This process may be carried out when the user wishes to reclaim memory on the device 1 for other purposes, without losing the key moments from the video, or when the user wishes to produce a shorter, more entertaining "highlights" video to show to a friend on the display 8 of the device 1, or to send more quickly or cheaply over a wireless connection to a remote device 5.
  • the user browses the memory 10 of the device 1 to locate the source video file and opens the video file within the app.
  • the video file could be a file that has been recorded using the camera module 2, or it could be a file that has been received by the device 1 from another source — e.g. a movie downloaded using the radio modem 4, or an animated cartoon created by a user on the device 1 using another app.
  • the user clicks a button to initiate a compacting 30 of the video file.
  • the user may select to retain all of the tagged segments, or only a subset of the tagged segments — e.g. just one tagged segment, or all segments of a particular tag type.
  • the app parses the associated JSON file to identify all the relevant tag objects. For each tag object (i.e. for each highlight), it identifies the corresponding time interval within the video file and copies 32 the video data for that segment to the new file. This may involve a direct copy of the binary data, or some processing of the video data may be performed — e.g. to compress the data more highly, or to re-encode the video and/or audio using a different codec, or to add watermarking.
  • the tagged segments are concatenated within the new video file.
  • if two tagged segments overlap, the overlapping video frames may, in some embodiments, be duplicated, or they may be included only once in the output file.
  • the app may also create a new JSON file, associated with the new video file, containing copies of all the relevant tag objects from the source JSON file.
  • the user may then choose to share the new video file by pressing a button to send or upload the video to the remote party 5 over the Internet 6. If so, the app may invoke a communication function through the operating system or other software app (e.g. an email or social media app) or software library to cause the radio modem 4 to transmit data encoding the new video file over the radio access network 7.
  • the device 1 may have a wired connection to the Internet 6 and may use this instead.
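  • the compaction outlined above might, purely as an illustrative sketch, be implemented with an external tool such as ffmpeg; the helper below assumes the hypothetical JSON fields from the earlier sketch and is not a description of the actual app:

    # Illustrative sketch only: extract each tagged segment and concatenate the
    # segments into a new, smaller file, assuming the ffmpeg command-line tool is available.
    import json
    import os
    import subprocess
    import tempfile

    def compact(video_path, tag_path, output_path):
        with open(tag_path) as f:
            tags = sorted(json.load(f), key=lambda t: t["position_ms"])  # chronological order

        work_dir = tempfile.mkdtemp()
        clip_paths = []
        for i, tag in enumerate(tags):
            start_ms = max(0, tag["position_ms"] - tag["back_trace_ms"])
            end_ms = tag["position_ms"] + tag["forward_ms"]
            clip = os.path.join(work_dir, f"clip_{i}.mp4")
            # Copy one tagged segment out of the source file without re-encoding.
            subprocess.run(["ffmpeg", "-y", "-ss", str(start_ms / 1000.0),
                            "-t", str((end_ms - start_ms) / 1000.0),
                            "-i", video_path, "-c", "copy", clip], check=True)
            clip_paths.append(clip)

        # Concatenate the extracted clips into the new video file.
        list_path = os.path.join(work_dir, "clips.txt")
        with open(list_path, "w") as f:
            f.writelines(f"file '{clip}'\n" for clip in clip_paths)
        subprocess.run(["ffmpeg", "-y", "-f", "concat", "-safe", "0",
                        "-i", list_path, "-c", "copy", output_path], check=True)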
  • Figures 4 to 10 are screenshots from an exemplary video-tagging app implemented on a smartphone similar to the device 1 of Figure 1.
  • Figure 4 shows the interface 40 for recording live video. This may be shown fullscreen on the touchscreen display 8.
  • a start/stop recording button 41 can be pressed to start and stop the recording.
  • the user is presented with a set of three touchscreen “highlight buttons” (i.e. tag buttons), overlaid on the video feed, which are labelled as "injury time” 42, "bicycle kick” 43 and “penalty kick” 44.
  • the user may have had the option to select the type of event as being "football” from a list of possible event types; this selection would then determine the categories of the “highlight” buttons that are presented. However, other embodiments do not have this option.
  • at any time during the recording, the user can simply touch an appropriate “highlight” button (e.g. the “penalty kick” button 44) to tag the corresponding segment of the video.
  • each time a “highlight” button is pressed, a corresponding tag object is added to the JSON file associated with the video, as explained with reference to Figure 2.
  • a small window 45 in the bottom left corner of the interface 40 contains a count of the number of tagged segments within the video and a thumbnail from the most recent tagged segment.
  • Figure 5 shows the tag library interface 50, which contains a button 53 for adding a new “highlight” type to the library.
  • Figure 6 shows the interface 60 for adding a new tag (“highlight”) type, including a touch keyboard 61.
  • An editable text field 62 allows the user to give the tag a textual category.
  • a first editable number field 63 allows the user to define the back-trace duration, while a second editable number field 64 allows the user to define the forward duration.
  • Figure 8 shows an interface 80 for reviewing a tagged video. It includes a video preview window 81 showing a current video frame from a selected tagged (highlighted) segment. This is overlaid with viewing buttons including a play/pause button 82.
  • a horizontal time bar 83 is displayed under the video preview window 81, with a cursor 83b for rapidly moving the video frame to a desired point within the video.
  • the cursor 83b is synchronised with the content in the preview window 81.
  • the time bar 83 spans the duration of the video.
  • the bar 83 includes markers 83c positioned at the locations of tagged segments; more specifically, each marker 83c is positioned at the time offset into the video at which the “highlight” button was pressed during the tagging process.
  • Beneath the video preview window 81 is a set of thumbnail frames 84a - 84e, 85, each extracted from the video at an offset corresponding to a respective tag (“highlight”) position.
  • the app extracts these based on tag location and duration data it reads from the tag objects in the JSON file associated with the video.
  • six highlights are shown.
  • a filter bar 89 allows the user to show "all highlights” or only a selected type of highlight (i.e. tag) — in this case "pass", "dribbling” or "cross”.
  • Each thumbnail frame is accompanied by the tag category and the time stamp of the beginning of the segment (i.e. the time offset at which the “highlight” button was pressed, minus the back-trace period for the tag).
  • the icons 87, 88 on the vertical timeline 86 are highlighted in turn, using a contrasting colour, to indicate the corresponding video progress status at any time.
  • the circular icon 88 corresponding to the selected tagged segment is enlarged relative to the other icons 87a - 87e and shows a pie chart.
  • a shaded sector of the icon 88 has a variable sector angle representing the position, within the selected tagged segment, of the current frame displayed in the preview window 81, ranging from 0 degrees at the start of the tagged segment, past 180 degrees at the midpoint, to 360 degrees at the end of the tagged segment.
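  • the sector angle described above is a simple proportion of the elapsed part of the tagged segment; the following helper is an illustrative sketch only:

    def sector_angle_degrees(current_s, segment_start_s, segment_duration_s):
        """Shaded-sector angle: 0 at the start of the tagged segment, 360 at its end."""
        fraction = (current_s - segment_start_s) / segment_duration_s
        return 360.0 * min(max(fraction, 0.0), 1.0)   # clamp to the 0-360 degree range

    # Example: 15 s into a 20 s segment -> 270 degrees.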
  • to select a particular tagged segment (e.g. for sharing, as shown in Figure 9), the user long-presses or double-taps on the corresponding thumbnail 84a - 84e, 85.
  • a “share” button 93, if pressed, causes the app to perform the “extract” actions and then also to open an interface with options for sharing the new video file, e.g. by sending a link to the new video file to an email app or social-networking app also installed on the device 1.
  • the interface may be a native data-sharing interface provided by the operating system.
  • Figure 10 shows an interface 100 that allows the user to extract all the tagged segments in a video (i.e. to compact the video file), rather than just a single tagged segment, and to share the original video file.
  • the display can be scrolled vertically through a list of tagged video files, with two different videos from the list being displayed at any one time, one in an upper pane 101 and a second in a lower pane 102.
  • Each video is represented by a thumbnail frame (e.g. the first frame of the video) and a data panel showing the total number of tagged segments, the duration of the whole video and the date the video was recorded.
  • a set of touch icons is overlaid on each video pane: a “plus” icon 104, a “share” icon 105 and a “delete” icon 106.
  • pressing the “plus” icon 104 performs a compaction operation on the video by concatenating all of the tagged segments of the video (including associated audio portions) into a new video file, as described above with reference to Figure 3.
  • an option may be provided to allow the user to select only a subset of the tagged segments for the concatenation.
  • an interface may be shown that allows the user to reorder the tagged segments to be different from a default chronological order — e.g. to drag and drop segments into a different sequence.
  • the "share” icon 105 opens a sharing interface for passing the video file to another app for sending over the Internet 6, or, in some embodiments, for causing the tagging app itself to send the new file over the Internet 6.
  • the “delete” icon 106 removes the whole video file from the flash memory 10. This “delete” icon 106 may optionally be used, after performing an extraction or compaction operation on a source video file, to remove the larger, original source video file in order to free up space in the memory 10 of the device 1.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

A method of processing audio or video data comprises processing tag data that identifies a tagged segment within a source audio or video file, and that associates the tagged segment with a respective tag category, to identify audio or video data within the source audio or video file that corresponds to the tagged segment; and creating (31, 32), in response to a user interaction (30) with a touchscreen display (8), a new audio or video file, smaller than the source audio or video file, comprising data equal to or derived from the identified audio or video data.

Description

Processing Audio or Video Data
BACKGROUND OF THE INVENTION
This invention relates to methods and apparatus for processing audio or video data.
Audio or video data is increasingly being recorded on mobile devices, such as smartphones. Such devices are convenient and portable, so are readily to hand for recording events such as sporting matches, birthday celebrations, music concerts, weddings, etc. They typically have network connections, such as WiFi™ or cellular radio connections, which allow recorded data to be shared with remote parties such as friends and families. This may happen privately, e.g. by email or a messaging service such as WhatsApp™, or publicly, e.g. using a broadcast medium such as YouTube™ or Twitter™.
However, recording audio or video can result in large data files, especially when performed over an extended period, such as recording all of a 90-minute football match. Audio or video files can rapidly fill up the limited storage capacity of a mobile device. They can also be slow to send over bandwidth-limited radio data links.
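By way of a purely illustrative calculation (the bitrate is an assumed, typical figure rather than a value taken from this disclosure): at a recording bitrate of around 8 Mbit/s, a 90-minute recording occupies roughly 8 Mbit/s × 5400 s ≈ 43 Gbit, i.e. about 5.4 GB, whereas ten 30-second highlight clips at the same bitrate amount to only around 300 MB.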
The present invention seeks to provide a new approach to processing audio or video data that may mitigate some of these problems.
SUMMARY OF THE INVENTION
From a first aspect, the invention provides a method of processing audio or video data, comprising: processing tag data that identifies a tagged segment within a source audio or video file, and that associates the tagged segment with a respective tag category, to identify audio or video data within the source audio or video file that corresponds to the tagged segment; and creating, in response to a user interaction with a touchscreen display, a new audio or video file, smaller than the source audio or video file, comprising data equal to or derived from the identified audio or video data. From a second aspect, the invention provides a system for processing audio or video data, comprising: a processor; memory; and a touchscreen display, wherein the memory stores software comprising instructions which, when executed by the processor, cause the processor to: process tag data that identifies a tagged segment within a source audio or video file, and that associates the tagged segment with a respective tag category, to identify audio or video data within the source audio or video file that corresponds to the tagged segment; detect a user interaction with the touchscreen display; and in response to the detected user interaction, create a new audio or video file, smaller than the source audio or video file, comprising data equal to or derived from the identified audio or video data.
The source audio or video file and/or the tag data may also be stored in the memory of the system. The new audio or video file may be created in the memory of the system.
From a third aspect, the invention provides computer software for processing audio or video data, comprising instructions which, when executed by a processor, cause the processor to: process tag data that identifies a tagged segment within a source audio or video file, and that associates the tagged segment with a respective tag category, to identify audio or video data within the source audio or video file that corresponds to the tagged segment; detect a user interaction with a touchscreen display; and in response to the detected user interaction, create a new audio or video file, smaller than the source audio or video file, comprising data equal to or derived from the identified audio or video data.
This aspect extends to a non-transitory computer-readable medium, such as a hard drive or non-volatile memory chip, having such computer software stored thereon as one or more computer programs. Thus it will be seen that, in accordance with the invention, tag data is used to create a smaller file, containing data representative of a tagged segment from the original audio or video data, through a user interaction with a touchscreen display. This results in a smaller file which can be transmitted more quickly over a data link than the larger original file (e.g. over a wireless or wired network connection). Significantly, it achieves this not by arbitrarily discarding data from all parts of the original file, but in a way that retains content from the tagged segment. This enables a user to preserve parts of the source audio or video that are of particular interest, such as the moment a goal is scored in a football match. The smaller file may thus not only be more compact for storage and for sharing, but may also be more useful by enabling the efficient viewing of "highlights" from the source file. The use of a touchscreen makes it straightforward and convenient for a user to compact the data, especially when, as in preferred embodiments, the invention is implemented on a portable device such as a smartphone.
The tagged (or “highlighted”) segment may correspond to a continuous time interval in the source file — e.g. comprising a series of successive audio samples or video frames. The tag (or “highlight”) data may identify one or more tagged segments. In some embodiments, the tag data may identify a plurality of tagged segments, which may correspond to a respective plurality of time intervals within the source file. The tagged segments (i.e. sections) may be spaced apart from each other in time, although this is not essential (i.e. at least two of the tagged segments may be contiguous or overlapping). The tag data may associate each tagged segment with a respective tag category. In some instances, all of the one or more segments identified by the tag data may be associated with a common tag category, while in other instances two or more of the tagged segments identified by the tag data may be associated with different respective tag categories.
The new audio or video file may be such that the only audio or video data in the new audio or video file is the data equal to or derived from the identified audio or video data that corresponds to a single tagged segment. It may have a playback duration equal to the duration of the tagged segment. In this way, a single tagged segment may be extracted as a new file. However, the software may cause the processor to identify, for each of a set of two or more tagged segments, respective audio or video data within the source audio or video file that corresponds to each respective tagged segment. The set of tagged segments may be all of the tagged segments identified in the tag data, or may be a subset of the tagged segments identified in the tag data — e.g. being all the tagged segments that are associated with a particular tag category. The new audio or video file may comprise data equal to or derived from all of the identified audio or video data. In some embodiments, the only audio or video data in the new audio or video file is the data equal to or derived from all of the identified audio or video data. The new file may comprise audio or video that is a concatenation of the two or more tagged segments. In this way, the source file may be compacted as a new file that retains the tagged segments.
In some embodiments, the source audio or video file may be deleted after the new audio or video file is created. This may free up space in the memory — e.g. by increasing the amount of unallocated data storage.
The software may be executed on a system which may comprise or be an electronic device. It may be a portable or battery-powered device such as a smartphone, tablet computer or digital camera. Such devices typically have limited memory capacity, especially when, as in some embodiments, the memory is solid-state memory such as RAM or flash. This makes these methods particularly well suited for such devices, as they may be used to release memory by creating a compacted version of the source file, which may then be deleted.
The system may comprise a communications interface. It may comprise a radio modem, such as a cellular network modem or a wireless local area network (WLAN) modem. The software may comprise instructions for causing the new audio or video file to be transmitted over a network connection, e.g. by radio. It may comprise instructions for sending the new file to a remote recipient, such as a server or a client device — e.g. over the Internet. The smaller size of the new file may advantageously make such transmission quicker compared with sending the original source file.
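Purely as an illustrative sketch, and not as a description of the disclosed embodiments, transmission of the new file to a remote server might be implemented as a simple HTTP upload; the endpoint URL and form-field name below are hypothetical placeholders:

    # Illustrative sketch only: upload the new, compacted file to a hypothetical
    # HTTPS endpoint using the third-party "requests" library.
    import requests

    def upload_highlights(path, url="https://example.com/api/upload"):
        with open(path, "rb") as f:
            response = requests.post(url, files={"video": f}, timeout=60)  # multipart upload
        response.raise_for_status()  # raise an error if the server rejected the upload

    # upload_highlights("highlights.mp4")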
The system may comprise a video camera, which may include a microphone. The software may comprise instructions for creating the source audio or video file using signals from the video camera and/or from a microphone — e.g. by recording a live sporting event or a vlog (video log). This can allow for particularly convenient capture and processing of audio or video content on a single device, such as a cell phone.
However, in some embodiments the source audio or video file may be received by the system, e.g. over a network connection. Such remotely-generated content could be a professional movie, or amateur footage recorded on a different device, or any other media content. Thus the software may comprise instructions for receiving the source audio or video file over a communications interface, or for creating the source audio or video file from data received over a communications interface.
The software may comprise instructions for creating the tag data. The processor may provide a user interface — e.g. through the touchscreen display of the system — by which a user can create tag data identifying a tagged segment in the source audio or video file (i.e. can "tag" or “highlight” a portion of the file). The user may tag a segment of an audio or video file while the audio or video file is being created — e.g. in real time while recording on a camera or microphone. This can save time for the user by allowing live tagging, rather than having to tag the file at a later time. However, in some embodiments the user may tag a segment of a pre-recorded audio or video file. The software may support both modes of tagging (i.e. live and post-production). The software may comprise instructions for outputting audio or video, e.g. through a loudspeaker and/or display screen of the system. Audio or video of the source file may be output as a user tags one or more segments of the audio or video file, which may happen as the file is being written to the memory. It may also occur after such tagging is completed, e.g. for review purposes. Audio or video from the new audio or video file may also be output by the system, e.g. to enjoy watching the tagged highlights.
In some embodiments, a user may cause tag data to be created that identifies a tagged segment within a source audio or video file by performing a transient tagging action such as interacting with a graphical user interface (GUI) element, e.g. on the touchscreen display. The software may comprise instructions for detecting such a tagging action. A user interface may provide inputs for receiving different types of tagging action corresponding to different tag types — e.g. having a set of touchscreen buttons labelled with different tag categories. A tagging action may be detected while the source audio or video file is being created (e.g. during a live recording) and/or while the source audio or video file is being output (e.g. while it is being rendered on the touchscreen display).
A time at which the tagging action is detected may determine a position of a tagged segment within the source audio or video file. A tagged segment may start at a position corresponding to the time of the tagging action. The time may be determined as an absolute time (e.g. in Universal Time) or as a relative time, such as a time offset from a start of the source audio or video file. However, in some embodiments, at least one tagged segment may be positioned to start before a position corresponding to the time of the tagging action by a back-trace period. The back-trace period may comprise at least one audio sample or video frame but may, for example, have a duration of a few seconds, so that the tagged segment can include the build-up to a notable incident and/or to allow for a delay between the user observing an incident, live or during playback of a pre-recorded video, and pressing the tag button. The back-trace period may be specific to a tag or to a type of tag (e.g. to all tags of a particular category). The tagged segment may end a predetermined forward duration after the position corresponding to the time of the tagging action. The durations may be specified in units of time, such as seconds or milliseconds, or as a number of samples or frames.
In some embodiments, the software may create the tag data partly in dependence on tag-type data. The tag-type data may define one or more common parameters for a particular tag type — e.g. for all tagged segments associated with a common tag category. The common parameters may comprise any one or more of: a common duration for each tagged segment; a common back-trace duration for each tagged segment; and a common forward duration for each tagged segment. The tag-type data may be statically coded within the software, or may be user-configurable. The software may comprise instructions for providing a user interface to create new tag types and/or edit existing tag types, for example to alter the back-trace duration for the tag category "penalty kick".
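By way of a non-limiting sketch only, tag-type data of this kind could be represented as a mapping from tag category to back-trace and forward durations, from which the boundaries of a tagged segment are derived; the categories and duration values below are arbitrary examples, not values taken from this disclosure:

    # Illustrative sketch only: per-category tag-type data and the derivation of a
    # tagged segment's boundaries from the time of the tagging action.
    TAG_TYPES = {
        "goal":         {"back_trace_s": 8.0, "forward_s": 12.0},
        "penalty kick": {"back_trace_s": 5.0, "forward_s": 15.0},
    }

    def segment_bounds(tap_offset_s, category, video_duration_s):
        """Return (start, end) of the tagged segment as offsets into the source file."""
        params = TAG_TYPES[category]
        start = max(0.0, tap_offset_s - params["back_trace_s"])          # clamp at file start
        end = min(video_duration_s, tap_offset_s + params["forward_s"])  # clamp at file end
        return start, end

    # Example: a "goal" tag pressed 63.2 s into a 5400 s recording -> (55.2, 75.2).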
Each tag category may be selected from a set of tag categories. The set of tag categories may contain one, two or more categories. The tag categories may be predefined (e.g. by a developer of the software), or they may be user-defined, or they may include both predefined and user-defined categories. The tag categories may be associated with a particular event, such as a football match — e.g. being "goal", "penalty kick" and "foul". However, at its most general, the tag category could merely be a tag label (e.g. a text string) that might have no particular meaning or significance, or a meaning or significance known only to one user or to a particular group of individuals.
The tag data may be stored within the source audio or video file or it may be stored separately. It may be stored in a tag-data file which may be in a standardised data-exchange format, such as JavaScript Object Notation (JSON) or Extensible Markup Language (XML). The tag data may comprise a set of one or more tag objects (or tag items), each tag object encoding a tagged segment and one or more tag categories associated with the tagged segment. The tag data may store the tag category as text data.
The tagged segment, or each tagged segment, may be associated with i) a respective time offset into the source audio or video file, and ii) a respective duration. The duration may be predetermined (e.g. fixed by a software developer) or it may be configurable by a user of the system. The duration may be configurable for each tagged segment individually, or may be configurable in common for all tagged segments associated with a particular category, or for all tagged segments collectively. If the duration is constant, the tag data need not necessarily encode a duration value at all, or it may encode a duration value once for all of the tagged segments, or for all the tagged segments associated with a particular tag category.
In some embodiments, the tag data encodes a time offset (e.g. a frame number within the audio or video file) and a duration value independently for each tagged segment — e.g. for each tag object. The software may nevertheless require each tagged segment of a particular tag type (i.e. associated with a common tag category) to have a common duration. However, by encoding the duration separately in each tag object, subsequent parsing of the tag data may be simplified.
The touchscreen display may be a capacitive or resistive touchscreen. It may be any size, but may, in some embodiments, cover more than half or more than 90% of a face of a device, such as a smartphone. In some embodiments, the creating of the new audio or video file may be performed in response to a single user action, such as a single interaction with (e.g. press of) a GUI element, such as a button or icon, on the touchscreen display. The GUI element may be displayed in response to a user selecting the source audio or video file, e.g. from a list of one or more files.
Compacting may thus be performed in response to a single touch after the relevant file has been selected or identified by the user. This can enable very efficient and intuitive user interaction, by enabling the "one touch" creation of the smaller file, after tagging has been performed. It will be appreciated that such an approach can be much easier to use than traditional video-editing software that requires complex manipulations of sliders, menus, keyboards, dialog boxes, etc. in order to extract one segment of a video file from a larger source file. It is also well-suited to a touchscreen device such as a smartphone that typically has space only for relatively few, simple GUI elements.
In some embodiments, the software provides a first interface for extracting a single tagged segment from the source file — which may be the only tagged segment, or which may be a tagged segment selected from a plurality of tagged segments. It may provide a second interface for creating the new file from a plurality of tagged segments, e.g. from all the tagged segments, or from a selected subset of the tagged segments. Each interface may provide a respective input for initiating the creating of the new file in response to a user action. The processing of the tag data to identify the audio or video data may also be performed in response to the user action, or it may be performed ahead of time.
An interface may be provided by which a user can control the order in which tagged segments are concatenated within the new file. There may be a default order, such as a chronological order (i.e. based on the start time of each segment). The user may be able to select a different ordering (such as ordering primarily by tag category and secondarily by start time), or may be able to drag and drop user-interface elements representing the segments (e.g. a representative image frame for each segment) into a desired order.
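As a brief illustration (re-using the exemplary tag-object fields assumed above), the default chronological ordering and an alternative category-then-time ordering might be produced as follows:

```python
def order_segments(tags, by_category=False):
    """Return tag objects in concatenation order.

    Default: chronological, by segment start time.
    Alternative: primarily by tag category, secondarily by start time.
    """
    if by_category:
        return sorted(tags, key=lambda t: (t["category"], t["offset_ms"]))
    return sorted(tags, key=lambda t: t["offset_ms"])
```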
The audio or video data may be audio data only, or it may be video data only (i.e. without audio), or it may be video and audio data. The data may be a recording of one or more physical objects or real-world events, such as a sporting match, a celebration, a music concert, a book reading, or it may be at least partly computer-generated, such as a video capture of a computer game. The new audio or video file may comprise audio or video data copied from the source file. However, in some embodiments, the data in the new file may be derived from the identified audio or video data in the source file, e.g. after recoding or compression. Preferably, though, data derived from the identified audio or video data, and included in the new file, has the same playback duration as it did in the source file.
The software may create a new tag-data file for the new audio or video file. The new tag-data file may comprise a copy of the tag object, from the source tag-data file, associated with each tagged segment that is included in the new file.
The software may be an application (app) for a mobile device, such as an Android™ or iOS™ app. It may be distributed through an on-line app store. However, some of the operations described herein may be carried out by an operating system or other application, and thus the software may, in some embodiments, comprise an operating system and/or a plurality of applications.
The processor may be a single processor core, or it may comprise a plurality of cores or distinct processors and/or one or more DSPs. It may be part or all of a microcontroller or system-on-chip (SoC).
The memory may comprise volatile memory (e.g. RAM) and/or non-volatile memory (e.g. flash or EEPROM). The memory may be wholly or partly built into an electronic device, such as a smartphone (e.g. soldered to a motherboard) and/or may be wholly or partly removable from an electronic device (e.g. comprising a memory card such as a Secure Digital (SD) card). The software instructions may be stored in non-volatile and/or volatile memory. The software may store transient data in volatile memory.
Features of any aspect or embodiment described herein may, wherever appropriate, be applied to any other aspect or embodiment described herein. Where reference is made to different embodiments or sets of embodiments, it should be understood that these are not necessarily distinct but may overlap.
BRIEF DESCRIPTION OF THE DRAWINGS Certain preferred embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
Figure 1 is a schematic drawing of a device embodying the invention being used to record a sporting event;
Figure 2 is a flow chart of steps carried out when recording an event on the device;
Figure 3 is a flow chart of steps carried out when compacting a video file for sharing after recording the event;
Figure 4 is a screenshot from the device of an interface for recording live video;
Figure 5 is a screenshot from the device of an interface for a tag library;
Figure 6 is a screenshot from the device of an interface for creating a new tag;
Figure 7 is a screenshot from the device of an interface for reviewing details of a tag;
Figure 8 is a screenshot from the device of an interface for reviewing a video;
Figure 9 is a screenshot from the device of an interface for sharing a tagged segment of a video;
Figure 10 is a screenshot from the device of an interface for compacting a video for sharing.
DETAILED DESCRIPTION
Figure 1 shows an electronic device 1 such as a smartphone, networked camera or tablet computer. It has a camera module 2, including a lens 2a, microphone 2b and associated electronics, for capturing video and audio footage of a scene 3 such as a sporting event. It also has a radio modem 4 for sending video and audio data to a remote device 5 over the Internet 6. The remote device 5 could be a client device, such as another smartphone, or it could be a server associated with a cloud video storage and/or sharing service.
The scene 3 could include a sporting event, a family gathering, a lecture, or any other type of happening.
The radio modem 4 may connect with the Internet 6 through a radio access network 7 which may comprise a residential or corporate wireless local area network (WLAN), or a cellular telecommunications network, or any other suitable network. The radio modem 4 may be a WiFi™, Bluetooth™, 3G, 4G or 5G cellular modem, or may use any other appropriate radio protocol. In some embodiments, the device 1 may have a wired network interface (not shown), such as an Ethernet or USB-C socket, which can provide an alternative connection to the Internet 6.
The device 1 comprises a touchscreen display 8 for displaying content to a user and for receiving touch inputs from the user. It is powered by a processor 9 (e.g. a system-on-chip device). The processor 9 executes software instructions stored in flash memory 10 or in RAM 11, which may include a conventional operating system (e.g. Android™, iOS™, Linux™ or Windows™). The device 1 is powered by a battery 12. It may include other conventional components which are not shown here for simplicity.
Figure 2 outlines the key steps in a process of recording and tagging live video (including audio). The steps are performed by a user interacting with software executing on the processor 9 through a touch-based graphical user interface (GUI) on the display 8.
First, the user presses a button to initiate 20 the recording of live video (including a synchronized soundtrack) by the camera 2. The processor stores the video frames and audio data as a video file in the RAM 11 or flash memory 10; the file may be written to directly while recording, or after temporary buffering of some or all of the video or audio data in the RAM 11.
During the recording, the user can touch 21 a "highlight" button on the screen 8 at key moments to tag corresponding segments of the video. The screen 8 may show one or multiple different “highlight” buttons while the video is being recorded, for creating tags of different types. The different tags have categories or labels that are relevant to the type of activity that is being recorded — e.g. "laughter", "hug" and "wave to camera" during a social occasion, or "goal", "penalty kick", "foul" during a football match.
After the user touches a tag button, the software application (app) stores 22 associated highlight data (embodying tag data, as disclosed herein) in a file associated with the video; the file may be written to immediately, or only once the recording has ended. The file may be stored in the RAM 11 or flash 10. When the user has captured sufficient footage, the user presses 23 a button to end the video recording. A similar process may be followed for adding tags (highlights) to a pre-recorded video file (which may include an audio track), but instead of pointing the device 1 at the scene and pressing tag buttons in real time, the user initiates playback of a video and presses a tag button during the playback. The app again stores corresponding highlight data in a file associated with the video.
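Purely as an illustrative sketch, and using exemplary names that are not part of the disclosure, the handling of a tag-button press during live recording might resemble the following, with the highlight data written out once the recording ends:

```python
import json
import time
import uuid

class HighlightRecorder:
    """Illustrative only: accumulates tag objects while a recording is in progress."""

    def __init__(self, tag_file_path):
        self.tag_file_path = tag_file_path
        self.recording_started = None
        self.tags = []

    def start_recording(self):
        # Reference point for computing time offsets into the video.
        self.recording_started = time.monotonic()

    def on_highlight_pressed(self, category, back_trace_ms, forward_ms):
        # Offset of the button press relative to the start of the recording.
        press_offset_ms = int((time.monotonic() - self.recording_started) * 1000)
        self.tags.append({
            "id": str(uuid.uuid4()),
            "category": category,
            "press_offset_ms": press_offset_ms,
            "back_trace_ms": back_trace_ms,
            "forward_ms": forward_ms,
        })

    def stop_recording(self):
        # Write the accumulated highlight data once the recording has ended;
        # alternatively the file could be written to immediately after each press.
        with open(self.tag_file_path, "w", encoding="utf-8") as f:
            json.dump({"tags": self.tags}, f, indent=2)
```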
In some embodiments, software on the device 1 may allow tags to be added to captured screen footage of a computer video game, which could be after the event (similarly to tagging other video files), or potentially during game play. The tagging functionality could be integrated into the computer game, or could be provided by a separate application running alongside the computer game on the device 1. The camera module 2 may allow visual and/or audible user reactions to be captured and stored simultaneously with the game footage.
In some embodiments, there may be no explicit tag (or “highlight”) button displayed on the screen 8 but instead the software may create a general tag (e.g. having a default generic tag category, such as “event”) in response to a user-tap in a wider area of the screen 8 — e.g. anywhere on a video display area.
The highlight data may be stored in the video file — e.g. as meta-data in a common video container file format — or it may be stored in one or more separate files. In some embodiments, it is stored in a JSON file that is saved on the device 1 and associated with the corresponding video file (e.g. being stored in a common directory folder or having a common file-name portion or identifier).
When a tagged segment is generated, data about the tag is written as a corresponding tag object (also referred to herein as a highlight object) in the JSON file. The data encodes the tag category or type (e.g. "goal"), the position of the tag (e.g. as a time offset into the video in milliseconds, or a video frame number), the tag duration, and a unique tag identifier (ID).
Each tag type has an associated forward duration and an associated back-trace duration. The back-trace duration defines a time period before the video frame that was showing when the user pressed the “highlight” button, while the forward duration defines a time period after this video frame. For example, a "goal" tag may have a back-trace of 5 seconds and a forward duration of 3 seconds. The tagged segment thus spans an interval of 8 seconds, with the goal occurring at some point in the first 5 seconds. This allows the tagged segment to include the build-up to the event that prompted the user to press the “highlight” button, as well as an allowance for the reaction time of the user between seeing the event and pressing the “highlight” button.
In some embodiments, the position data stored in a tag object may identify the beginning of the tagged segment (i.e. at the start of the back-trace period). In this case, the object may just store the total duration of the tagged segment. In other embodiments, the position data may identify the point within the video at which the user pressed the “highlight” button. In this case, the tag object will separately store the back-trace duration value and the forward duration value.
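The following sketch, using the same exemplary field names as above, shows how the first of these encodings (segment start plus total duration) might be derived from the button-press offset and the back-trace and forward durations; the clamping at zero is an assumption made here for presses that occur very early in a recording:

```python
def segment_bounds(press_offset_ms, back_trace_ms, forward_ms):
    """Compute the start offset and total duration of a tagged segment.

    The segment starts back_trace_ms before the frame shown when the
    button was pressed and ends forward_ms after it, clamped so that it
    never starts before the beginning of the file.
    """
    start_ms = max(0, press_offset_ms - back_trace_ms)
    duration_ms = (press_offset_ms - start_ms) + forward_ms
    return start_ms, duration_ms

# Example from the text: a "goal" tag with a 5-second back-trace and 3-second
# forward duration, pressed 512.3 s into the video, spans an 8-second interval.
start, duration = segment_bounds(512_300, 5_000, 3_000)  # -> (507300, 8000)
```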
Each successive press of a “highlight” button causes a new tag object to be written to the JSON file associated with the video. In such embodiments, the video file is not itself altered by the tagging process.
Tags may be added similarly to live or pre-recorded audio recordings.
Once live recording and tagging have finished, the video file (or files, if the video is sufficiently long that it must be split across multiple files) and the associated JSON file are closed. If the JSON file is initially created in RAM 11, the app may copy the file to flash 10 at this point for longer-term, non-volatile storage, alongside the video file.
Figure 3 outlines the key steps in a process of compacting a tagged video file (that is, a video file having associated highlight data). This process may be carried out when the user wishes to reclaim memory on the device 1 for other purposes, without losing the key moments from the video, or when the user wishes to produce a shorter, more entertaining "highlights" video to show to a friend on the display 8 of the device 1 , or to send more quickly or cheaply over a wireless connection to a remote device 5.
First, the user browses the memory 10 of the device 1 to locate the source video file and opens the video file within the app. The video file could be a file that has been recorded using the camera module 2, or it could be a file that has been received by the device 1 from another source — e.g. a movie downloaded using the radio modem 4, or an animated cartoon created by a user on the device 1 using another app. The user then clicks a button to initiate a compacting 30 of the video file. The user may select to retain all of the tagged segments, or only a subset of the tagged segments — e.g. just one tagged segment, or all segments of a particular tag type.
This causes the app to create 31 a new video file container in the flash memory 10, for receiving selected video frames extracted from the source video file. The app then parses the associated JSON file to identify all the relevant tag objects. For each tag object (i.e. for each highlight), it identifies the corresponding time interval within the video file and copies 32 the video data for that segment to the new file. This may involve a direct copy of the binary data, or some processing of the video data may be performed — e.g. to compress the data more highly, or to re-encode the video and/or audio using a different codec, or to add watermarking. The tagged segments are concatenated within the new video file. If the time intervals associated with two different tagged segments overlap in time, the overlapping video frames may, in some embodiments, be duplicated, or they may be included only once in the output file. The app may also create a new JSON file, associated with the new video file, containing copies of all the relevant tag objects from the source JSON file.
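One possible way to realise this copy-and-concatenate step, shown here only as an illustrative sketch, is to drive the ffmpeg command-line tool with stream copying. A production app would more likely use platform media APIs, and with stream copying the cut points may snap to the nearest keyframes. The tag-object fields assume the start-plus-duration encoding described earlier, and all file names are hypothetical.

```python
import json
import subprocess
import tempfile
from pathlib import Path

def compact(source_video, tag_file, output_video):
    """Extract every tagged segment from source_video and concatenate them."""
    tags = json.loads(Path(tag_file).read_text(encoding="utf-8"))["tags"]
    tags.sort(key=lambda t: t["offset_ms"])          # default chronological order

    with tempfile.TemporaryDirectory() as tmp:
        segment_paths = []
        for i, tag in enumerate(tags):
            seg = Path(tmp) / f"segment_{i:03d}.mp4"
            # Stream-copy the segment (no re-encoding), keeping its original duration.
            subprocess.run([
                "ffmpeg", "-y",
                "-ss", str(tag["offset_ms"] / 1000),
                "-t", str(tag["duration_ms"] / 1000),
                "-i", str(source_video),
                "-c", "copy",
                str(seg),
            ], check=True)
            segment_paths.append(seg)

        # Concatenate the extracted segments into the new, smaller file.
        list_file = Path(tmp) / "segments.txt"
        list_file.write_text("".join(f"file '{p}'\n" for p in segment_paths))
        subprocess.run([
            "ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", str(list_file), "-c", "copy", str(output_video),
        ], check=True)
```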
The user may then choose to share the new video file by pressing a button to send or upload the video to the remote party 5 over the Internet 6. If so, the app may invoke a communication function through the operating system or other software app (e.g. an email or social media app) or software library to cause the radio modem 4 to transmit data encoding the new video file over the radio access network 7. In other embodiments, the device 1 may have a wired connection to the Internet 6 and may use this instead.
Figures 4 to 10 are screenshots from an exemplary video-tagging app implemented on a smartphone similar to the device 1 of Figure 1.
Figure 4 shows the interface 40 for recording live video. This may be shown fullscreen on the touchscreen display 8. In this example, a football match is being recorded. A start/stop recording button 41 can be pressed to start and stop the recording. The user is presented with a set of three touchscreen “highlight buttons” (i.e. tag buttons), overlaid on the video feed, which are labelled as "injury time" 42, "bicycle kick" 43 and "penalty kick" 44. In some embodiments, before entering this screen, the user may have had the option to select the type of event as being "football" from a list of possible event types; this selection would then determine the categories of the “highlight” buttons that are presented. However, other embodiments do not have this option. At any time during the recording (e.g. when a penalty kick is being taken), the user can simply touch an appropriate “highlight” button (e.g. the "penalty kick" button 44) to tag the corresponding segment of the video. When a “highlight” button is pressed, a corresponding tag object is added to the JSON file associated with the video as explained with reference to Figure 2. A small window 45 in the bottom left corner of the interface 40 contains a count of the number of tagged segments within the video and a thumbnail from the most recent tagged segment. Once the user stops the recording, by pressing the start/stop recording button 41 , the video file is finalised and stored in the flash memory 10 along with the associated JSON file. The user may be given the option to provide a file name for the video file and/or for individual tagged segments, e.g. through one or more text input boxes.
The app may provide a very similar interface for adding tags to pre-recorded media such as a downloaded movie, except that the start/stop recording button 41 is not present, and a set of playback controls may be provided instead (e.g. "play", "pause", "fast-forward", "rewind").
A list of common event types and corresponding tags may be hard-coded within the app. However, the app also allows a user to create new tags. Figures 5 to 7 show how this is done.
Figure 5 shows a tag (“highlight”) library interface 50. In this example, two “highlight” types are displayed, intended for tagging videos of tennis matches. These may be all the tag types that are currently configured in the app, or these may only be a subset of the tag types, e.g. being just those contained within a folder called "tennis". The first “highlight” type 51 is labelled (categorised) "ace serve" and is configured to provide a 3-second back-trace period and a 5-second forward period (i.e. a total tag interval of 8 seconds); and the second “highlight” type 52 is labelled "line call" and is configured to provide a 5-second back-trace period and a 10-second forward period (i.e. a total tag interval of 15 seconds). The tag library interface 50 contains a button 53 for adding a new “highlight” type to the library.
Figure 6 shows the interface 60 for adding a new tag (“highlight”) type, including a touch keyboard 61. An editable text field 62 allows the user to give the tag a textual category. A first editable number field 63 allows the user to define the back-trace duration, while a second editable number field 64 allows the user to define the forward duration.
Figure 7 shows the interface 70 once data entry for the highlight type is completed, showing the category 71, back-trace duration 72 and forward duration 73. A save button 74 allows the user to add the highlight type to the library. The same interface 70 may be used to edit existing highlight types, to change one or more of their data fields.
Figure 8 shows an interface 80 for reviewing a tagged video. It includes a video preview window 81 showing a current video frame from a selected tagged (highlighted) segment. This is overlaid with viewing buttons including a play/pause button 82. A horizontal time bar 83 is displayed under the video preview window 81, with a cursor 83b for rapidly moving the video frame to a desired point within the video. The cursor 83b is synchronised with the content in the preview window 81. The time bar 83 spans the duration of the video. The bar 83 includes markers 83c positioned at the locations of tagged segments; more specifically, each marker 83c is positioned at the time offset into the video at which the “highlight” button was pressed during the tagging process.
Beneath the video preview window 81 is a set of thumbnail frames 84a - 84e, 85, each extracted from the video at an offset corresponding to a respective tag (“highlight”) position. The app extracts these based on tag location and duration data it reads from the tag objects in the JSON file associated with the video. In this example, six highlights are shown. A filter bar 89 allows the user to show "all highlights" or only a selected type of highlight (i.e. tag) — in this case "pass", "dribbling" or "cross". Each thumbnail frame is accompanied by the tag category and the time stamp of the beginning of the segment (i.e. the time offset at which the “highlight” button was pressed, minus the back-trace period for the tag). (Note that the identical time stamp values in Figure 8 are dummy values, and are not the true values that would be displayed in a production device.) A vertical time line 86 next to the thumbnails 84a - 84e, 85 visually indicates, by circular icons 87a - 87e, 88 at respective positions on the line 86, the offset of each tagged segment within the whole video. (Note that the positions on the line 86 in Figure 8 are dummy positions, and are not the actual positions that would be displayed in a production device, which would correspond to the markers 83c on the horizontal time bar 83, less the back-trace offset.)
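A thumbnail frame of this kind could, for example, be pulled from the source video at the start offset of each tagged segment; the following ffmpeg-based sketch is illustrative only and is not how the app is required to generate thumbnails:

```python
import subprocess

def extract_thumbnail(source_video, start_ms, thumb_path):
    """Grab one frame at the start of a tagged segment for use as a thumbnail."""
    subprocess.run([
        "ffmpeg", "-y",
        "-ss", str(start_ms / 1000),   # seek to the segment start
        "-i", str(source_video),
        "-frames:v", "1",              # a single video frame
        "-q:v", "2",                   # reasonably high JPEG quality
        str(thumb_path),
    ], check=True)
```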
As video playback progresses, the icons 87, 88 on the vertical timeline 86 are highlighted in turn, using a contrasting colour, to indicate the corresponding video progress status at any time.
The user can click on one of the thumbnail frames 84a - 84e, 85 to select a particular tagged segment 85. When a tagged thumbnail is tapped by the user, the content of the preview window 81 jumps to the start time of the tagged section, which will be earlier than the corresponding marker 83c on the horizontal timeline 83 by the back-trace length. In this example a tagged segment 85 on the top right has been selected. The horizontal time bar 83, the vertical timeline 86, and the content of the preview window 81 are all kept synchronised with each other by the app.
Additionally, the circular icon 88 corresponding to the selected tagged segment is enlarged relative to the other icons 87a - 87e and shows a pie chart. A shaded sector of the icon 88 has a variable sector angle representing the position, within the selected tagged segment, of the current frame displayed in the preview window 81, ranging from 0 degrees at the start of the tagged segment, past 180 degrees at the midpoint, to 360 degrees at the end of the tagged segment.
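The sector angle is simply a linear mapping of the current playback position within the selected segment onto the range 0 to 360 degrees, for example:

```python
def progress_angle(current_ms, segment_start_ms, segment_duration_ms):
    """Sector angle (degrees) of the progress pie for the selected segment."""
    elapsed = min(max(current_ms - segment_start_ms, 0), segment_duration_ms)
    return 360.0 * elapsed / segment_duration_ms
```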
If the user wishes to remove or extract a tagged segment, or to share it over the Internet 6, the user will long-press or double-tap on the corresponding thumbnail 84a - 84e, 85.
Figure 9 shows the interface 90 that pops up in response to long-pressing a thumbnail frame 84d. Three buttons appear. A "remove highlight" button 91, if pressed, causes the app to delete the tagged segment by removing the tag object having the relevant tag ID from the JSON file. An "extract" button 92, if pressed, causes the app to create a new video file containing just the video and audio from the selected tagged segment, following the process described above with reference to Figure 3. The original video file and JSON file are retained, but may be deleted in a separate operation, described below. The new video file appears as a new video in the video library. The app may provide an interface that allows a user to give the new video file (and other tagged video files) a desired file name.
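Deleting a highlight in this way need not touch the video data at all; a minimal sketch, assuming the exemplary JSON structure used above, is:

```python
import json
from pathlib import Path

def remove_highlight(tag_file, tag_id):
    """Delete the tag object with the given ID; the video data itself is untouched."""
    path = Path(tag_file)
    data = json.loads(path.read_text(encoding="utf-8"))
    data["tags"] = [t for t in data["tags"] if t["id"] != tag_id]
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
```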
A "share" button 93, if pressed, causes the app to perform the "extract" actions and then also to open an interface with options for sharing the new video file, e.g. by sending a link to the new video file to an email app or social-networking app also installed on the device 1. The interface may be a native data-sharing interface provided by the operating system.
Figure 10 shows an interface 100 that allows the user to extract all the tagged segments in a video (i.e. to compact the video file), rather than just a single tagged segment, and to share the original video file.
The display can be scrolled vertically through a list of tagged video files, with two different videos from the list being displayed at any one time, one in an upper pane 101 and a second in a lower pane 102. Each video is represented by a thumbnail frame (e.g. the first frame of the video) and a data panel showing the total number of tagged segments, the duration of the whole video and the date the video was recorded.
If the user touches one of the video panes (in this example, the upper pane 101), a set of overlaid touch icons appears: a "plus" icon 104, a "share" icon 105 and a "delete" icon 106. The "plus" icon 104 performs a compaction operation on the video by concatenating all of the tagged segments of the video (including associated audio portions) into a new video file, as described above with reference to Figure 3.
In some embodiments, an option may be provided to allow the user to select only a subset of the tagged segments for the concatenation. In some embodiments, an interface may be shown that allows the user to reorder the tagged segments to be different from a default chronological order — e.g. to drag and drop segments into a different sequence. The "share" icon 105 opens a sharing interface for passing the video file to another app for sending over the Internet 6, or, in some embodiments, for causing the tagging app itself to send the new file over the Internet 6. The "delete" icon 106 removes the whole video file from the flash memory 10. This "delete" icon 106 may optionally be used, after performing an extraction or compaction operation on a source video file, to remove the larger, original source video file in order to free up space in the memory 10 of the device 1.
If a video is compacted using the "plus" icon 104, a new compacted video file will subsequently appear as an additional video file in the list shown in the interface 100. If a user wishes to share the compacted video file, he can select it from the list then touch the "share" icon 105.
It will be appreciated by those skilled in the art that the invention has been illustrated by describing one or more specific embodiments thereof, but is not limited to these embodiments; many variations and modifications are possible, within the scope of the accompanying claims.

Claims

1. A method of processing audio or video data, comprising: processing tag data that identifies a tagged segment within a source audio or video file, and that associates the tagged segment with a respective tag category, to identify audio or video data within the source audio or video file that corresponds to the tagged segment; and creating, in response to a user interaction with a touchscreen display, a new audio or video file, smaller than the source audio or video file, comprising data equal to or derived from the identified audio or video data.
2. The method of claim 1 , wherein the only audio or video data in the new audio or video file is the data equal to or derived from the identified audio or video data that corresponds to the tagged segment.
3. The method of claim 1 or 2, wherein the tag data identifies a plurality of tagged segments within the source audio or video file, and wherein the method comprises: processing the tag data to identify, for each of a set of two or more tagged segments of the plurality of tagged segments, respective audio or video data within the source audio or video file that corresponds to each respective tagged segment; and creating the new audio or video file, smaller than the source audio or video file, comprising data equal to or derived from all of the identified audio or video data.
4. The method of claim 3, wherein the only audio or video data in the new audio or video file is the data equal to or derived from all of the identified audio or video data.
5. The method of any preceding claim, further comprising: creating the source audio or video file using signals from a video camera; and creating the tag data while creating the source audio or video file.
6. The method of any preceding claim, wherein the new audio or video file is created in response to a single user interaction with a graphical user interface element on the touchscreen display.
7. Computer software for processing audio or video data, comprising instructions which, when executed by a processor, cause the processor to: process tag data that identifies a tagged segment within a source audio or video file, and that associates the tagged segment with a respective tag category, to identify audio or video data within the source audio or video file that corresponds to the tagged segment; detect a user interaction with a touchscreen display; and in response to the detected user interaction, create a new audio or video file, smaller than the source audio or video file, comprising data equal to or derived from the identified audio or video data.
8. The computer software of claim 7, wherein the instructions cause the processor to create the new audio or video file such that the only audio or video data in the new audio or video file is the data equal to or derived from the identified audio or video data that corresponds to the tagged segment.
9. The computer software of claim 7 or 8, wherein the tag data identifies a plurality of tagged segments within the source audio or video file, and wherein the instructions cause the processor to: process the tag data to identify, for each of a set of two or more tagged segments of the plurality of tagged segments, respective audio or video data within the source audio or video file that corresponds to each respective tagged segment; and create the new audio or video file, smaller than the source audio or video file, comprising data equal to or derived from all of the identified audio or video data.
10. The computer software of claim 9, wherein the instructions cause the processor to create the new audio or video file such that the only audio or video data in the new audio or video file is the data equal to or derived from all of the identified audio or video data.
11. The computer software of any of claims 7 to 10, comprising instructions for providing a user interface for a user to create tag data identifying a tagged segment in the source audio or video file.
12. The computer software of claim 11, comprising instructions for creating the tag data in response to the user pressing a graphical user interface element on a touchscreen display.
13. The computer software of claim 11 or 12, wherein the user interface provides a plurality of inputs for receiving a plurality of different types of user input corresponding to a plurality of different tag categories.
14. The computer software of any of claims 11 to 13, comprising instructions for determining a position of a tagged segment within the source audio or video file in dependence on a time at which the software detects the user pressing the graphical user interface element, wherein the position is such that the tagged segment starts before a position within the source audio or video file corresponding to said time by a back-trace period of at least one audio sample or video frame.
15. The computer software of any of claims 7 to 14, wherein the tag data is stored in a tag-data file comprising a set of one or more tag objects, each tag object encoding a tagged segment and one or more tag categories associated with the tagged segment.
16. The computer software of any of claims 7 to 15, comprising instructions for creating the new audio or video file in response to a single user interaction with a graphical user interface element on the touchscreen display.
17. The computer software of any of claims 7 to 16, comprising instructions for providing a first interface for creating the new audio or video file to contain a single tagged segment from the source audio or video file, and for providing a second interface for creating the new audio or video file from a plurality of tagged segments from the source audio or video file.
18. The computer software of any of claims 7 to 17, for execution on a device comprising the processor and a video camera, wherein the software comprises instructions for creating the source audio or video file using signals from the video camera.
19. The computer software of any of claims 7 to 18, for execution on a device comprising the processor and a radio modem, wherein the software comprises instructions for causing the system to use the radio modem to send the new audio or video file to a remote recipient.
20. A system for processing audio or video data, comprising: a processor; memory; and a touchscreen display, wherein the memory stores software comprising instructions which, when executed by the processor, cause the processor to: process tag data that identifies a tagged segment within a source audio or video file, and that associates the tagged segment with a respective tag category, to identify audio or video data within the source audio or video file that corresponds to the tagged segment; detect a user interaction with the touchscreen display; and in response to the detected user interaction, create a new audio or video file, smaller than the source audio or video file, comprising data equal to or derived from the identified audio or video data.
21. The system of claim 20, wherein the source audio or video file and the tag data are stored in the memory, and wherein the software comprises instructions for causing the processor to create the new audio or video file in the memory.
22. The system of claim 20 or 21 , wherein the system comprises a radio modem, and wherein the software comprises instructions for causing the system to use the radio modem to send the new audio or video file to a remote recipient.
23. The system of any of claims 20 to 22, wherein the system comprises a video camera and the software comprises instructions for creating the source audio or video file using signals from the video camera.
24. The system of any of claims 20 to 23, wherein the system is a portable battery- powered device.
25. The system of claim 24, wherein the device is a smartphone, tablet computer or digital camera.