WO2022231626A1 - Video and audio action metadata - Google Patents

Video and audio action metadata

Info

Publication number
WO2022231626A1
WO2022231626A1 PCT/US2021/030300
Authority
WO
WIPO (PCT)
Prior art keywords
action
metadata
video
stream
requirement
Prior art date
Application number
PCT/US2021/030300
Other languages
French (fr)
Inventor
Rafael Dal ZOTTO
Original Assignee
Hewlett-Packard Development Company, L.P.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett-Packard Development Company, L.P. filed Critical Hewlett-Packard Development Company, L.P.
Priority to PCT/US2021/030300 priority Critical patent/WO2022231626A1/en
Publication of WO2022231626A1 publication Critical patent/WO2022231626A1/en

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client 
    • H04N21/65Transmission of management data between client and server
    • H04N21/658Transmission by the client directed to the server
    • H04N21/6587Control parameters, e.g. trick play commands, viewpoint selection
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/434Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/462Content or additional data management, e.g. creating a master electronic program guide from data received from the Internet and a Head-end, controlling the complexity of a video stream by scaling the resolution or bit-rate based on the client capabilities
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/85406Content authoring involving a specific file format, e.g. MP4 format

Definitions

  • the instructions 106 may cause the processor 102 to play back the video stream and the audio stream 114.
  • the traditional play back of the video stream and audio stream is controlled by the system. Once the play back begins, the user may still interact with the video and audio streams in the traditional ways (e.g. pause, rewind, etc.).
  • FIG. 2A is an illustration 200A of digital video file with action metadata, according to an example.
  • a multimedia recording 202 may include one or more video streams and one or more audio streams.
  • Metadata within the multimedia recording 202 may include actions 204, 206.
  • the actions 204, 206 align with given points within the multimedia recording 202.
  • the actions 204, 206 may take place during the duration 208 of the multimedia recording 202.
  • FIG. 2B is a block diagram of a software stack 200B for digital video file with action metadata, according to an example.
  • In a traditional digital video file 222, there may exist a video 210 element and an audio 212 element
  • the video 210 and audio 212 elements may be separately encoded and synchronized with metadata.
  • the video and audio may be multiplexed into a single object with implicit synchronization due to the structure of the single object.
  • the software stack 200B may be augmented to support metadata action in digital video files by incorporating the actions 214.
  • mouse 218 movements may be replicated
  • keyboard 220 keystrokes may be replicated
  • applications 216 may be launched, replicating the view from the video 210.
  • Additional interactions supportable by the actions 214 may include other input methods, such as touchscreen input and game controller input.
  • FIG. 2C is a block diagram of a system 200C for streaming action metadata according to an example.
  • a video streaming service, such as YouTube®, may be augmented to support action metadata.
  • a video streaming service 224 may transmit the video 222 in a traditional video and audio stream.
  • a separate metadata service 234 may host the action metadata separately from the video streaming service 224.
  • the metadata 226 may be correlated to a video identifier associated with the video 222 on the video streaming service 224. The video identifier allows the metadata service 234 to retrieve the correct metadata 226 for the video 222 on the video streaming service 224.
  • a web browser 228 renders the video 222 of the video streaming service.
  • the web browser 228 may also be a standalone application that incorporates a traditional playback functionality with the plugin 230 and OS component 232 integrated in one.
  • a plugin 230 to the web browser 228 establishes a communication channel with the metadata service 234. When a user accesses the video streaming service 224 via the browser 228, the plugin 230 extracts the video identifier and sends the video identifier to the metadata service 234 to retrieve the appropriate metadata 226.
  • An operating system (OS) component 232 communicates with the plugin 230 to interface with subsystems that are limited in the web browser 228 environment. Examples include using the OS component to launch applications, replicate mouse movements, and recreate keystroke combinations.
  • FIG. 3 illustrates a method 300 for executing action metadata according to an example.
  • the method 300 of FIG. 3 corresponds to some of the components illustrated in FIG. 2C.
  • the components may be referenced in the description of the method 300.
  • the processor described in reference to method 300 may include the processor 102 introduced in reference to FIG. 1.
  • a processor 102 may receive a video stream and audio stream from a video streaming service 224.
  • the browser 228 may interpret the video and audio streams from the video streaming service 224 in a similar manner as to viewing without action metadata.
  • the processor 102 may receive a metadata 226 stream from a metadata server, wherein the metadata corresponds to the video and audio streams.
  • the metadata 226 stream may be recorded during the recording of the audio and video streams.
  • the video streaming service 224 may not support action metadata
  • the metadata 226 stream may be uploaded to the metadata service 234 and indexed based on an identifier within the video streaming service 224.
  • the processor 102 may process the metadata stream by correlating an action with a timestamp within the video and audio stream.
  • the plugin 230 may be executed on the processor to align the metadata 226 stream with the timestamps associated within the video 222 from the video streaming service 224.
  • the processor 102 may execute the action in an operating system specific execution environment.
  • the plugin 230 executed by the processor, may message the OS component 232 operating in an OS specific execution environment to execute the action.
  • the OS specific execution environment may be synonymous with the OS component 232 of FIG. 2C.
  • the operating system specific execution environment may include a native executable for a specific operating system configured to receive the action from the web browser plugin.
  • FIG. 4 is a computing device for supporting instructions for video and audio action, according to an example.
  • the computing device 400 depicts a processor 102 and a storage medium 404 and, as an example of the computing device 400 performing its operations, the storage medium 404 may include instructions 406-418 that are executable by the processor 102.
  • the processor 102 may be synonymous with the processor 102 referenced in FIG. 1. Additionally, the processor 102 may include but is not limited to central processing units (CPUs).
  • the storage medium 404 can be said to store program instructions that, when executed by processor 102, implement the components of the computing device 400.
  • the executable program instructions stored in the storage medium 404 include, as an example: instructions to open a digital video file comprising a video stream, an audio stream, and metadata 406; instructions to extract an action, a requirement, and a validation from the metadata, wherein the action comprises a timestamp and the requirement comprises an executable file 408; instructions to align the action with the video stream and the audio stream based on the timestamp 410; instructions to verify the executable file of the requirement is executable 412; instructions to play back the video stream and the audio stream 414; instructions to execute the executable file at the timestamp of the action in reference to the playback 416; and instructions to verify the executable executed 418.
  • Storage medium 404 represents generally any number of memory components capable of storing instructions that can be executed by processor 102.
  • Storage medium 404 is non-transitory in the sense that it does not encompass a transitory signal but instead is made up of at least one memory component configured to store the relevant instructions.
  • the storage medium 404 may be a non-transitory computer-readable storage medium.
  • Storage medium 404 may be implemented in a single device or distributed across devices.
  • processor 102 represents any number of processors capable of executing instructions stored by storage medium 404.
  • Processor 102 may be integrated in a single device or distributed across devices.
  • storage medium 404 may be fully or partially integrated in the same device as processor 102, or it may be separate but accessible to that computing device 400 and the processor 102.
  • the program instructions 406-418 may be part of an installation package that, when installed, can be executed by processor 102 to implement the components of the computing device 400.
  • storage medium 404 may be a portable medium such as a CD, DVD, or flash drive, or a memory maintained by a server from which the installation package can be downloaded and installed.
  • the program instructions may be part of an application or applications already installed.
  • storage medium 404 can include integrated memory such as a hard drive, solid state drive, or the like.
  • the examples described may include various components and features. It is also appreciated that numerous specific details are set forth to provide a thorough understanding of the examples. However, it is appreciated that the examples may be practiced without limitation to these specific details. In other instances, well-known methods and structures may not be described in detail to avoid unnecessarily obscuring the description of the examples. Also, the examples may be used in combination with each other.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

In an example implementation according to aspects of the present disclosure, a system, method, and storage medium are provided. The system decodes a digital video file comprising a video stream, an audio stream, and metadata. The system extracts an action from the metadata, wherein the action comprises a marker. The system aligns the action with the video stream and the audio stream based on the marker. The system plays back the video stream and the audio stream and executes the action based on the surpassing of the marker.

Description

VIDEO AND AUDIO ACTION METADATA
BACKGROUND
[0001] Digital video files contain video streams and audio streams. In some instances, digital video files may be experienced locally on a computing device or streamed from a network.
BRIEF DESCRIPTION OF THE DRAWINGS
[0002] FIG. 1 is a block diagram of a system for supporting video and audio action metadata, according to an example;
[0003] FIG. 2A is an illustration of digital video file with action metadata, according to an example;
[0004] FIG. 2B is a block diagram of a software stack for digital video file with action metadata, according to an example;
[0005] FIG. 2C is a block diagram of a system for streaming action metadata according to an example;
[0006] FIG. 3 illustrates a method for executing action metadata according to an example; and
[0007] FIG. 4 is a computing device for supporting instructions for video and audio action, according to an example.
DETAILED DESCRIPTION
[0008] Digital video files have become ubiquitous in the modern computing environment. In many cases, viewing digital video files is the preferred method of accessing information. Digital video files give a user a local way to view (and hear) content on a device that has little to no connectivity. The internet supports many digital video streaming websites and applications. However, in many cases, the digital video files accessible on computing devices today lack any interactivity to engage or assist the viewer. Digital video files remain passive knowledge transfer mechanisms.
[0009] In instances where a digital video file is attempting to teach the user a skill or technique, the user must engage in a “Watch, Pause, Resume” cycle. Disclosed herein is a system to reduce or eliminate the need to execute the “Watch, Pause, Resume” cycle.
[0010] In one example, the system includes a memory containing instructions to cause a processor to decode a digital video file. The processor may extract an action from metadata in the digital video file. The processor may align the action with the video and audio stream. The processor may play back the video stream and audio stream. Responsive to surpassing a marker, the action is executed.
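The decode, extract, align, play back, and execute flow above can be sketched in a few lines. This is an illustrative sketch only; the `Action` type, its field names, and the `tick` function are assumptions for exposition, not structures from the patent.

```python
from dataclasses import dataclass

@dataclass
class Action:
    """One interactive action extracted from the metadata (illustrative)."""
    marker: float   # timestamp in seconds; the action fires once surpassed
    command: str    # what to execute when the marker is surpassed
    fired: bool = False

def tick(actions, playback_position):
    """Execute every not-yet-fired action whose marker playback has surpassed."""
    executed = []
    for action in sorted(actions, key=lambda a: a.marker):
        if not action.fired and playback_position >= action.marker:
            action.fired = True
            executed.append(action.command)
    return executed

# Two actions aligned with the streams; playback has reached 80 seconds,
# so only the 75-second action fires.
actions = [Action(marker=75.0, command="launch Notepad.exe"),
           Action(marker=120.0, command="press ctrl+s")]
print(tick(actions, 80.0))  # → ['launch Notepad.exe']
```

Calling `tick` from the playback loop once per frame (or per timer interval) gives the "responsive to surpassing a marker" behavior: each action fires exactly once, when the playback position first passes its marker.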
[0011] An action may be an interactive metadata element recorded during the recording of the video and/or audio streams for a digital video file. Additionally, actions may be added after the recording to better enhance the interactive experience. An action may take the place of the “Watch, Pause, Resume” cycle of video learning or knowledge transfer. An example of a JavaScript Object Notation (JSON) action follows in Example 1:
[Example 1: JSON action metadata listing, rendered as images (imgf000003_0001, imgf000004_0001) in the original publication]
Example 1
Example 1 illustrates an example of an action in JSON. Actions may be defined in metadata in a format that is decodable by an application. JSON was chosen for its simplicity and human-readable form. In this example, there are three portions to the action: a requirement, settings, and actions.
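The Example 1 listing itself appears only as an image in the publication. Based on the surrounding description (a requirement, settings, and actions, including a "Notepad.exe" launch with a parameter list and a key combination press), its shape might look roughly like the following; every field name here is an assumption, not the patent's actual JSON:

```python
import json

# Hypothetical reconstruction of the three-portion action metadata; the
# patent's real field names are not visible in the extracted text.
example_1 = json.loads("""
{
  "requirement": { "applications": ["Notepad.exe"], "os": "Windows" },
  "settings": { "resolution": [1920, 1080] },
  "actions": [
    { "timestamp": "00:01:15", "type": "launch",
      "application": "Notepad.exe", "parameters": ["notes.txt"] },
    { "timestamp": "00:02:00", "type": "keypress", "keys": ["ctrl", "s"] }
  ]
}
""")

print(sorted(example_1))  # → ['actions', 'requirement', 'settings']
```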
[0012] The requirement is a minimum common denominator requirement for effectuating an action. The specification of a requirement mandates that certain criteria must be fulfilled, or the action may not succeed. A setting corresponds to a metric originating from when the video was recorded. By recording the setting, any conversion (e.g. mapping of mouse movements) may be accomplished by interpolating the recording settings to the play back settings. In an example where the video stream recording took place at a higher resolution than the video play back, the setting field allows the lower resolution of the user’s play back display to be compensated for. An “action” in the JSON of Example 1 corresponds to the action to be accomplished at the timestamp. Example 1 presents the loading of “Notepad.exe” with a parameter list. Another action in Example 1 presents a key combination press.
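The interpolation between recording settings and play back settings described in paragraph [0012] amounts to scaling coordinates between the two resolutions. A minimal sketch, assuming the setting field stores the recording resolution as a width/height pair (an illustrative format):

```python
def map_mouse(point, recorded_resolution, playback_resolution):
    """Scale a recorded mouse coordinate to the playback display by
    interpolating between the recording and playback resolutions."""
    rx, ry = recorded_resolution
    px, py = playback_resolution
    x, y = point
    return (round(x * px / rx), round(y * py / ry))

# A click recorded at (960, 540) on a 1920x1080 capture lands at the
# center of a lower-resolution 1280x720 playback display.
print(map_mouse((960, 540), (1920, 1080), (1280, 720)))  # → (640, 360)
```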
[0013] FIG. 1 is a block diagram of a system for supporting video and audio action metadata, according to an example. The processor 102 of the device 100 may be implemented as dedicated hardware circuitry or a virtualized logical processor. The dedicated hardware circuitry may be implemented as a central processing unit (CPU). A dedicated hardware CPU may be implemented as a single- to many-core general purpose processor. A dedicated hardware CPU may also be implemented as a multi-chip solution, where more than one CPU is linked through a bus and processing tasks are scheduled across the CPUs.
[0014] A virtualized logical processor may be implemented across a distributed computing environment. A virtualized logical processor may not have a dedicated piece of hardware supporting it. Instead, the virtualized logical processor may have a pool of resources supporting the task for which it was provisioned. In this implementation, the virtualized logical processor may be executed on hardware circuitry; however, the hardware circuitry is not dedicated. The hardware circuitry may be in a shared environment where utilization is time sliced. In some implementations, the virtualized logical processor includes a software layer between any executing application and the hardware circuitry that handles the abstraction and also monitors and saves the application state. Virtual machines (VMs) may be implementations of virtualized logical processors.
[0015] A memory 104 may be implemented in the device 100. The memory 104 may be dedicated hardware circuitry to host instructions for the processor 102 to execute. In another implementation, the memory 104 may be virtualized logical memory. Analogous to the processor 102, dedicated hardware circuitry may be implemented with dynamic random-access memory (DRAM) or other hardware implementations for storing processor instructions. Additionally, the virtualized logical memory may be implemented in a software abstraction which allows the instructions 106 to be executed on a virtualized logical processor, independent of any dedicated hardware implementation.
[0016] The device 100 may also include instructions 106. The instructions 106 may be implemented in a platform specific language that the processor 102 may decode and execute. The instructions 106 may be stored in the memory 104 during execution.
[0017] The instructions 106 may cause the processor 102 to decode a digital video file comprising a video stream, an audio stream, and metadata 108. The digital video file may be a containerized object. The containerized object may include a flag indicative of action metadata encoding. Within the digital video file may be three distinct substantive parts: one or more video streams (e.g. different camera angles), one or more audio streams (e.g. different languages), and metadata. The metadata may include information regarding the encoding of the video streams and audio streams so that a playback application can properly index into the digital video file, synchronize any desired video and audio streams, and play back the synchronized streams.
[0018] The instructions 106 may cause the processor 102 to extract an action from the metadata, wherein the action comprises a marker 110. The metadata may include an action. An action may add an interactive component to the digital video file. Multiple actions may be included in the metadata, where each action may correspond to a particular video stream, audio stream, or combination. For example, one action may only be applicable to video stream A and audio stream B, where the action may not be effectuated unless play back comprises only video stream A and audio stream B. Actions may include a marker. The marker may be a timestamp relative to the duration of the video and audio streams. The marker may be the point within the play back at which the action is effectuated. The action may also include interactions recorded into the metadata that mimic a behavior recorded in the video and audio stream.
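The stream-combination filtering in paragraph [0018] can be sketched as a simple predicate over the active streams. The dictionary keys (`video`, `audio`) are illustrative assumptions; an absent key is treated here as "applies to any stream":

```python
def applicable_actions(actions, active_video, active_audio):
    """Keep only actions whose stream constraints match the active
    playback combination (e.g. video stream A plus audio stream B)."""
    return [a for a in actions
            if a.get("video", active_video) == active_video
            and a.get("audio", active_audio) == active_audio]

actions = [
    {"id": 1, "video": "A", "audio": "B"},
    {"id": 2, "video": "A", "audio": "C"},
    {"id": 3},  # no constraint: applies to any combination
]
print([a["id"] for a in applicable_actions(actions, "A", "B")])  # → [1, 3]
```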
[0019] The instructions 106 may cause the processor 102 to align the action with the video stream and the audio stream based on the marker 112. The marker indicates where the action should take place relative to the video and audio streams. The action may be interpreted and executed at the point in the video and audio streams indicated by the marker. The marker may be a timestamp or a frame number. For example, a marker may correspond to a minute and fifteen seconds from the beginning of a video stream.
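For instance, a timestamp marker can be converted to a frame number for alignment against the decoded streams. The sketch below assumes an "HH:MM:SS" marker string and a fixed frame rate, neither of which is mandated by the disclosure:

```python
def timestamp_to_seconds(marker: str) -> int:
    """Convert an "HH:MM:SS" marker into seconds from stream start."""
    hours, minutes, seconds = (int(part) for part in marker.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def marker_to_frame(marker: str, fps: int = 30) -> int:
    """Map a timestamp marker onto a frame number at a given frame rate."""
    return timestamp_to_seconds(marker) * fps

# A marker at one minute and fifteen seconds, as in the example above.
print(timestamp_to_seconds("00:01:15"))  # 75
print(marker_to_frame("00:01:15"))       # 2250 at 30 frames per second
```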
[0020] In some implementations, the instructions 106 may cause the processor 102 to extract a requirement from the metadata, wherein the requirement corresponds to the action. As illustrated in Example 1, a requirement may be a validation step for a playback application to verify that the system on which playback may occur has the necessary tools and applications to complete the actions within the metadata. Similar to the action, the instructions 106 may cause the processor 102 to align the requirement prior to the marker of the action. This alignment may protect the action from undefined behavior. Prior to playback, the requirement may be evaluated. In some implementations, more than one requirement, and possibly all requirements, may be evaluated. Requirement execution may validate operating system version levels, installed applications, and hardware functionality. Upon determining that a requirement is not met, a prompt may be issued to a user to remediate the issue related to the requirement. A popup dialog notification indicating the requirement at issue and how to resolve it is an example of such a prompt.
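A minimal sketch of requirement evaluation prior to playback might look as follows. The requirement entries and their `type` field are hypothetical; a real playback application would define its own format:

```python
import platform
import shutil

# Hypothetical requirements list; the disclosure leaves the format open.
REQUIREMENTS = [
    {"type": "application", "name": "python3"},
    {"type": "os", "allowed": ["Linux", "Windows", "Darwin"]},
]

def evaluate_requirement(requirement) -> bool:
    """Return True when the host satisfies a single requirement entry."""
    if requirement["type"] == "application":
        # Installed-application check: is the named tool on the PATH?
        return shutil.which(requirement["name"]) is not None
    if requirement["type"] == "os":
        # Operating system check against an allowed list.
        return platform.system() in requirement["allowed"]
    return False

def check_before_playback(requirements):
    """Evaluate all requirements; return the ones needing remediation."""
    unmet = [r for r in requirements if not evaluate_requirement(r)]
    for requirement in unmet:
        # A real player might raise a popup dialog here instead of printing.
        print(f"Requirement not met: {requirement}")
    return unmet
```

Evaluating every requirement up front, before the streams begin, corresponds to the alignment described above: the check always completes before any action's marker is reached.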
[0021] Additionally, a validation may be included in the metadata. A validation, similar to the requirement metadata, supports the completion of actions. A validation may be extracted from the metadata and corresponds to an action. Unlike the requirement, the validation takes place after the action is executed. Responsive to the execution of the action, an evaluation of the validation may take place. The validation may check a system state determinative of the success of the action upon its execution. If the evaluation of the validation fails, indicating that the executed action failed, the playback of the video and audio streams may be paused. Additionally, a prompt may be presented to the user explaining the action and validation states. A validation may include checking an exit code of an application executed as an action.
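The exit-code validation mentioned above could be sketched as follows, with a trivial command standing in for a real action. The player-state dictionary is a placeholder for whatever pause mechanism an actual playback application provides:

```python
import subprocess
import sys

def execute_action_with_validation(command) -> bool:
    """Run an action's command, then validate its exit code."""
    completed = subprocess.run(command)
    # Validation: a nonzero exit code indicates the action failed.
    return completed.returncode == 0

def on_action(command, player):
    """Execute an action; on failed validation, pause and prompt."""
    if not execute_action_with_validation(command):
        player["paused"] = True
        print("Action failed validation; playback paused.")

player_state = {"paused": False}
# A trivially successful command, used as a stand-in for a real action.
on_action([sys.executable, "-c", "pass"], player_state)
print(player_state["paused"])  # False: the action validated successfully
```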
[0022] The instructions 106 may cause the processor 102 to play back the video stream and the audio stream 114. The traditional playback of the video stream and audio stream is controlled by the system. Once the playback begins, the user may still interact with the video and audio streams in the traditional ways (e.g. pause, rewind, etc.).
[0023] The instructions 106 may cause the processor 102 to execute the action based on the surpassing of the marker 116. Upon reaching the timestamp or the frame number corresponding to the marker, the action executes. What specifically happens when the action executes is defined in the corresponding action in the metadata.

[0024] FIG. 2A is an illustration 200A of a digital video file with action metadata, according to an example. In the illustration 200A, a multimedia recording 202 may include one or more video streams and one or more audio streams. Metadata within the multimedia recording 202 may include actions 204, 206. The actions 204, 206 align with given points within the multimedia recording 202. The actions 204, 206 may take place during the duration 208 of the multimedia recording 202.
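Executing an action when its marker is surpassed, as described above, can be sketched as a playback loop keyed by frame number. The loop below is a simplification under stated assumptions (one action per frame, actions represented as opaque values):

```python
def run_playback(duration_frames, actions_by_frame, advance_frame):
    """Drive playback frame by frame, executing actions as markers are passed."""
    executed = []
    for frame in range(duration_frames):
        advance_frame(frame)  # render this frame of the video/audio streams
        if frame in actions_by_frame:
            executed.append(actions_by_frame[frame])  # marker surpassed: execute
    return executed

# Actions keyed by the frame numbers derived from their markers (toy values).
actions_by_frame = {2: "launch calculator", 4: "replicate mouse movement"}
frames_rendered = []
executed = run_playback(6, actions_by_frame, frames_rendered.append)
print(executed)  # actions fire in marker order as playback passes each frame
```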
[0025] FIG. 2B is a block diagram of a software stack 200B for a digital video file with action metadata, according to an example. Within a traditional digital video file 222 there may exist a video 210 element and an audio 212 element. The video 210 and audio 212 elements may be separately encoded and synchronized with metadata. In another implementation (not shown), the video and audio may be multiplexed into a single object with implicit synchronization due to the structure of the single object. The software stack 200B may be augmented to support metadata actions in digital video files by incorporating the actions 214. Within the actions 214 block, mouse 218 movements may be replicated, keyboard 220 keystrokes may be replicated, and applications 216 may be launched, replicating the view from the video 210. Additional interactions supportable by the actions 214 may include other input methods, including touchscreen input and game controller input.
[0026] FIG. 2C is a block diagram of a system 200C for streaming action metadata, according to an example. In this example, a video streaming service, such as YouTube®, may be augmented to support action metadata.
[0027] In this system 200C, a video streaming service 224 may transmit the video 222 in a traditional video and audio stream. A separate metadata service 234 may host the action metadata separately from the video streaming service 224. Within the metadata service 234, the metadata 226 may be correlated to a video identifier associated with the video 222 on the video streaming service 224. The video identifier allows the metadata service 234 to retrieve the correct metadata 226 for the video 222 on the video streaming service 224.
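The correlation between a video identifier and its action metadata might be sketched as a simple lookup, with a dictionary standing in for the metadata service 234. The identifiers and payload shown are illustrative only:

```python
# A mock metadata service mapping video identifiers to action metadata.
# Identifiers and payloads are illustrative stand-ins for the service 234.
METADATA_SERVICE = {
    "video-123": [{"marker": "00:01:15", "type": "launch_application"}],
}

def fetch_metadata(video_identifier):
    """Retrieve the action metadata correlated with a streaming-service video."""
    return METADATA_SERVICE.get(video_identifier, [])

print(fetch_metadata("video-123"))  # the correlated action metadata
print(fetch_metadata("video-999"))  # []: no action metadata for this video
```

Hosting the metadata separately, keyed only by the streaming service's own video identifier, is what allows an unmodified streaming service to be augmented with actions.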
[0028] A web browser 228 renders the video 222 of the video streaming service. In another implementation, the web browser 228 may be a standalone application that incorporates traditional playback functionality with the plugin 230 and the OS component 232 integrated in one. A plugin 230 to the web browser 228 establishes a communication channel with the metadata service 234. When a user accesses the video streaming service 224 via the browser 228, the plugin 230 extracts the video identifier and sends it to the metadata service 234 to retrieve the appropriate metadata 226.
[0029] An operating system (OS) component 232 communicates with the plugin 230 to interface with subsystems that are limited in the web browser 228 environment. Examples include using the OS component to launch applications, replicate mouse movements, and recreate keystroke combinations.
[0030] FIG. 3 illustrates a method 300 for executing action metadata according to an example. The method 300 of FIG. 3 corresponds to some of the components illustrated in FIG. 2C. The components may be referenced in the description of the method 300. The processor described in reference to method 300 may include the processor 102 introduced in reference to FIG. 1.
[0031] At block 302, a processor 102 may receive a video stream and audio stream from a video streaming service 224. The browser 228 may interpret the video and audio streams from the video streaming service 224 in a similar manner as to viewing without action metadata.
[0032] At block 304, the processor 102 may receive a metadata 226 stream from a metadata server, wherein the metadata corresponds to the video and audio streams. The metadata 226 stream may be recorded during the recording of the audio and video streams. As the video streaming service 224 may not support action metadata, the metadata 226 stream may be uploaded to the metadata service 234 and indexed based on an identifier within the video streaming service 224.
[0033] At block 306, the processor 102 may process the metadata stream by correlating an action with a timestamp within the video and audio stream. The plugin 230 may be executed on the processor to align the metadata 226 stream with the timestamps associated within the video 222 from the video streaming service 224.
[0034] At block 308, the processor 102 may execute the action in an operating system specific execution environment. The plugin 230, executed by the processor, may message the OS component 232 operating in an OS specific execution environment to execute the action. The OS specific execution environment may be synonymous with the OS component 232 of FIG. 2C. The operating system specific execution environment may include a native executable for a specific operating system configured to receive the action from the web browser plugin.
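The message passing between the plugin 230 and the OS component 232 might be sketched as JSON serialized across a channel, in the spirit of browser native-messaging hosts. The message fields and handler table below are assumptions for illustration:

```python
import json

def plugin_message(action) -> str:
    """Plugin side: serialize an action into a message for the OS component."""
    return json.dumps({"command": action["type"], "argument": action.get("target")})

def os_component_dispatch(raw_message, handlers):
    """OS side: decode the message and run the matching native handler."""
    message = json.loads(raw_message)
    return handlers[message["command"]](message["argument"])

launched = []
# The handler table stands in for the native executable's capabilities
# (launching applications, replicating mouse movements, etc.).
handlers = {"launch_application": lambda target: (launched.append(target), target)[1]}
result = os_component_dispatch(
    plugin_message({"type": "launch_application", "target": "calculator"}),
    handlers,
)
print(result)    # calculator
print(launched)  # ['calculator']
```

Keeping the executable native to the operating system, while the plugin remains inside the browser sandbox, is what grants the actions access to subsystems the browser environment restricts.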
[0035] FIG. 4 is a computing device for supporting instructions for video and audio action, according to an example. The computing device 400 depicts a processor 102 and a storage medium 404 and, as an example of the computing device 400 performing its operations, the storage medium 404 may include instructions 406-418 that are executable by the processor 102. The processor 102 may be synonymous with the processor 102 referenced in FIG. 1. Additionally, the processor 102 may include, but is not limited to, a central processing unit (CPU). The storage medium 404 can be said to store program instructions that, when executed by processor 102, implement the components of the computing device 400.
[0036] The executable program instructions stored in the storage medium 404 include, as an example: instructions to open a digital video file comprising a video stream, an audio stream, and metadata 406; instructions to extract an action, a requirement, and a validation from the metadata, wherein the action comprises a timestamp and the requirement comprises an executable file 408; instructions to align the action with the video stream and the audio stream based on the timestamp 410; instructions to verify the executable file of the requirement is executable 412; instructions to play back the video stream and the audio stream 414; instructions to execute the executable file at the timestamp of the action in reference to the playback 416; and instructions to verify the executable executed 418.
[0037] Storage medium 404 represents generally any number of memory components capable of storing instructions that can be executed by processor 102. Storage medium 404 is non-transitory in the sense that it does not encompass a transitory signal but instead is made up of at least one memory component configured to store the relevant instructions. As a result, the storage medium 404 may be a non-transitory computer-readable storage medium. Storage medium 404 may be implemented in a single device or distributed across devices. Likewise, processor 102 represents any number of processors capable of executing instructions stored by storage medium 404. Processor 102 may be integrated in a single device or distributed across devices. Further, storage medium 404 may be fully or partially integrated in the same device as processor 102, or it may be separate but accessible to that computing device 400 and the processor 102.
[0038] In one example, the program instructions 406-418 may be part of an installation package that, when installed, can be executed by processor 102 to implement the components of the computing device 400. In this case, storage medium 404 may be a portable medium such as a CD, DVD, or flash drive, or a memory maintained by a server from which the installation package can be downloaded and installed. In another example, the program instructions may be part of an application or applications already installed. Here, storage medium 404 can include integrated memory such as a hard drive, solid state drive, or the like.
[0039] It is appreciated that examples described may include various components and features. It is also appreciated that numerous specific details are set forth to provide a thorough understanding of the examples. However, it is appreciated that the examples may be practiced without limitation to these specific details. In other instances, well-known methods and structures may not be described in detail to avoid unnecessarily obscuring the description of the examples. Also, the examples may be used in combination with each other.
[0040] Reference in the specification to "an example" or similar language means that a particular feature, structure, or characteristic described in connection with the example is included in at least one example, but not necessarily in other examples. The various instances of the phrase "in one example" or similar phrases in various places in the specification are not necessarily all referring to the same example.
[0041] It is appreciated that the previous description of the disclosed examples is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these examples will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other examples without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the examples shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

WHAT IS CLAIMED IS:
1. A system comprising: a processor; and a memory communicatively coupled to the processor and storing machine readable instructions that when executed cause the processor to: decode a digital video file comprising a video stream, an audio stream, and metadata; extract an action from the metadata wherein the action comprises a marker; align the action with the video stream and the audio stream based on the marker; play back the video stream and the audio stream; and execute the action based on the surpassing of the marker.
2. The system of claim 1, the instructions further comprising: extract a requirement from the metadata wherein the requirement corresponds to the action; align the requirement prior to the marker of the action; evaluate, prior to playback, the requirement; and prompt, responsive to the evaluation, a user to remediate an issue related to the requirement.
3. The system of claim 2, the instructions further comprising: extract a validation from the metadata wherein the validation corresponds to the action; evaluate, responsive to the execution of the action, the validation; and pause, responsive to the evaluation, the playback of the video stream and audio stream.
4. The system of claim 3 wherein the metadata is encapsulated separately from the video and audio stream in a container object.
5. The system of claim 4 wherein the action corresponds to an executable application executing on a host device.
6. A method comprising: receiving a video stream and audio stream from a video streaming service; receiving a metadata stream from a metadata server, wherein the metadata corresponds to the video and audio streams; processing the metadata stream by correlating an action with a timestamp within the video and audio stream; and executing the action in an operating system specific execution environment.
7. The method of claim 6 wherein the video stream and audio stream are received in a web browser.
8. The method of claim 7 wherein the metadata stream is received by a web browser extension.
9. The method of claim 8 wherein the operating system specific execution environment comprises a native executable configured to receive the action from the web browser plugin.
10. The method of claim 9 wherein the native executable executes at a permission level corresponding to the action.
11. A non-transitory computer readable medium comprising machine readable instructions that when executed cause a processor to: open a digital video file comprising a video stream, an audio stream, and metadata; extract an action, a requirement, and a validation from the metadata wherein the action comprises a timestamp and the requirement comprises an executable file; align the action with the video stream and the audio stream based on the timestamp; verify the executable file of the requirement is executable; play back the video stream and the audio stream; execute the executable file at the timestamp of the action in reference to the playback; and verify the executable executed.
12. The medium of claim 11, wherein the verifying further comprises validating an exit code of the executable.
13. The medium of claim 11 wherein the metadata comprises a JavaScript Object Notation (JSON) object describing the action, the requirement, and the validation.
14. The medium of claim 13 wherein the metadata is encapsulated separately from the video and audio stream in a container object.
15. The medium of claim 14 wherein the container object comprises a flag indicative of the metadata encoding.

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070174774A1 (en) * 2005-04-20 2007-07-26 Videoegg, Inc. Browser editing with timeline representations
US9426543B1 (en) * 2015-12-18 2016-08-23 Vuclip (Singapore) Pte. Ltd. Server-based video stitching
US20160381111A1 (en) * 2015-06-23 2016-12-29 Facebook, Inc. Streaming media presentation system
US20180014077A1 (en) * 2016-07-05 2018-01-11 Pluto Inc. Methods and systems for generating and providing program guides and content
