WO2023081213A1 - Video Processing Systems and Methods - Google Patents

Video Processing Systems and Methods

Info

Publication number
WO2023081213A1
Authority
WO
WIPO (PCT)
Prior art keywords
video data
compressed video
nft
metadata associated
memory
Prior art date
2021-11-02
Application number
PCT/US2022/048704
Other languages
English (en)
Inventor
Forrest BRIGGS
Original Assignee
Lifecast Incorporated
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-11-02
Filing date
2022-11-02
Publication date
2023-05-11
Priority claimed from US17/867,036, published as US20230018560A1
Priority claimed from US17/961,051, published as US20230103814A1
Application filed by Lifecast Incorporated
Publication of WO2023081213A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/04815Interaction with a metaphor-based environment or interaction object displayed as three-dimensional, e.g. changing the user viewpoint with respect to the environment or object
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/64Three-dimensional objects
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/254Management at additional data server, e.g. shopping server, rights management server
    • H04N21/2541Rights Management
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/27Server based end-user applications
    • H04N21/274Storing end-user multimedia data in response to end-user request, e.g. network recorder
    • H04N21/2743Video hosting of uploaded data from client
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42202Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] environmental sensors, e.g. for detecting temperature, luminosity, pressure, earthquakes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/4223Cameras
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/816Monomedia components thereof involving special video data, e.g 3D video
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/835Generation of protective data, e.g. certificates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/10Recognition assisted with metadata

Definitions

  • the present disclosure relates to systems and methods that perform video processing for various types of systems.
  • a non-fungible token (NFT) is a type of smart contract, typically stored and executed on a blockchain, that represents one or more copies of a unique digital object. NFTs may be “minted” (e.g., created), bought, sold, and traded. The protocol of a specific blockchain maintains a record of ownership that is resilient to double-spend attacks and censorship. Many cryptographic digital assets are fungible (i.e., 1 ETH or 1 BTC is interchangeable with another), whereas NFTs are not fungible (i.e., each NFT represents a unique thing that is not equivalent to any other thing). NFTs are commonly used to represent images, photos, and videos.
  • NFTs may also represent programs that can be executed in a web browser to produce dynamic or interactive content. This type of NFT may be programmed in Javascript or a compatible language that runs in a web browser. In some cases, an NFT may represent a bundle of data and files.
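  • For example (a common marketplace convention offered here for illustration, not something this document mandates), the token metadata of such an NFT can point at both a static preview image and an interactive HTML page that a browser executes:

```js
// Hypothetical ERC-721 token metadata. The animation_url field is a common
// marketplace convention for interactive content; every value below is an
// invented placeholder, not data from this patent.
const tokenMetadata = {
  name: "Example interactive NFT",
  description: "A browser-executable program bundled as an NFT.",
  image: "ipfs://CID/thumbnail.png",      // static preview (placeholder CID)
  animation_url: "ipfs://CID/index.html"  // interactive page run in the browser
};
```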
  • FIG. 1 is a block diagram illustrating an environment within which an example embodiment may be implemented.
  • FIG. 2 is a block diagram illustrating an embodiment of a computing system.
  • FIG. 3 is a flow diagram illustrating an embodiment of a non-fungible token (NFT).
  • FIG. 4 is a flow diagram illustrating an embodiment of a process for capturing photo data, video data, and other sensor data, then generating an NFT that includes the photo data, video data, other sensor data, and a video player application.
  • FIG. 5 illustrates an embodiment of a process for generating an NFT based on received compressed video data, metadata, and an application program.
  • FIG. 6 illustrates an embodiment of a marketplace or collection of content represented by thumbnail images that link to specific items of content.
  • FIG. 7 illustrates an embodiment of an avatar in the metaverse that interacts with a picture frame that is part of a shared virtual world.
  • FIG. 8 illustrates an example block diagram of a computing device.
  • the systems and methods described herein perform various image capture, image processing, image rendering, and related activities.
  • the described systems and methods are associated with the fields of virtual reality, mixed reality, augmented reality, video, volumetric video, 6 DOF (degrees of freedom) video, the metaverse, cryptography, and non-fungible tokens (NFTs).
  • the described systems and methods include a recording device, such as a wearable recording device, that captures images and other data.
  • Rendering software processes the images and other data to produce a 3D video from, for example, the recording device wearer’s point of view (POV).
  • Player software may decode the 3D video so that it can be displayed on a 2D screen or head-mounted display, thereby producing the experience for the viewer of re-living the recorded experience.
  • the 3D video and/or player software may be packaged as an NFT.
  • Implementations of the systems, devices, and methods disclosed herein may comprise or utilize a special purpose or general-purpose computer including computer hardware, such as, for example, one or more processors and system memory, as discussed herein. Implementations within the scope of the present disclosure may also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer system. Computer-readable media that store computer-executable instructions are computer storage media (devices). Computer-readable media that carry computer-executable instructions are transmission media. Thus, by way of example, and not limitation, implementations of the disclosure can comprise at least two distinctly different kinds of computer-readable media: computer storage media (devices) and transmission media.
  • Computer storage media includes RAM, ROM, EEPROM, CD-ROM, solid state drives (“SSDs”) (e.g., based on RAM), Flash memory, phase-change memory (“PCM”), other types of memory, other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer.
  • An implementation of the devices, systems, and methods disclosed herein may communicate over a computer network.
  • a “network” is defined as one or more data links that enable the transport of electronic data between computer systems and/or modules and/or other electronic devices.
  • Transmission media can include a network and/or data links, which can be used to carry desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. Combinations of the above should also be included within the scope of computer-readable media.
  • Computer-executable instructions comprise, for example, instructions and data which, when executed at a processor, cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions.
  • the computer executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code.
  • the disclosure may be practiced in network computing environments with many types of computer system configurations, including personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, tablets, pagers, routers, switches, various storage devices, and the like.
  • the disclosure may also be practiced in distributed system environments where local and remote computer systems, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks.
  • program modules may be located in both local and remote memory storage devices.
  • a sensor may include computer code configured to be executed in one or more processors, and may include hardware logic/electrical circuitry controlled by the computer code.
  • These example devices are provided herein for purposes of illustration, and are not intended to be limiting. Embodiments of the present disclosure may be implemented in further types of devices, as would be known to persons skilled in the relevant art(s).
  • At least some embodiments of the disclosure are directed to computer program products comprising such logic (e.g., in the form of software) stored on any computer useable medium.
  • Such software when executed in one or more data processing devices, causes a device to operate as described herein.
  • a user may wear a head-mounted display, which shows images that create the effect of 3D for the user, and (ideally), a sense of immersion and presence in the virtual scene.
  • Virtual reality is related to augmented reality and mixed reality, where the user sees a combination of real and virtual worlds.
  • the metaverse is a loosely defined concept related to virtual reality, which includes a shared virtual space where users can interact with virtual objects using metaphors inspired by physical reality. Some experiences in the metaverse include interacting with photos or videos, or visiting places or moments from the past, present, or future. Ownership of virtual objects in the metaverse can be represented by NFTs.
  • 3D photos and videos designed for VR create a greater sense of immersion and presence for the viewer.
  • some photos and videos for VR are stereoscopic, which produces an approximation of 3D perception for the user.
  • 3D VR video and photos may provide 6 degrees of freedom (6DOF) to translate and rotate the view, which enables a more immersive experience by reacting correctly to motion of the viewer’s head-mounted display.
  • 6DOF VR video may involve storing a video plus an associated depth map.
  • wearable cameras can capture photos and video from a user's point of view (POV).
  • POV video is an established genre of video. POV video can be watched in VR to create an immersive experience similar to re-living the memory of the recorded moment, although special care must be taken to avoid causing motion sickness to the viewer.
  • FIG. 1 is a block diagram illustrating an environment 100 within which an example embodiment may be implemented.
  • a recording device 102 is coupled to communicate with a server 104 and a computing system 108 via a data communication network 106.
  • Recording device 102 may record audio data, photo data, video data, and other sensor data.
  • recording device 102 may be a wearable device that is worn by a user, such as a pair of glasses, a pin, or a helmet.
  • the recording device may have additional functionality, such as virtual reality or augmented reality glasses.
  • recording device 102 includes one or more cameras that record video and, in some situations, additional sensors that include inertial measurement units (IMUs), microphones, or lidar.
  • recording device 102 may have onboard storage to save the captured data.
  • recording device 102 may have internet connectivity and may stream the recorded data to a remote server live.
  • the recording device may further have a physical or virtual user interface that enables the user to control the recording.
  • Server 104 performs various operations, such as executing rendering software 112, which generates 3D photo/video data 114, and the like.
  • Server 104 may access a database 116 to store and retrieve various types of data, such as 3D photo/video data 114.
  • 3D photo/video data 114 may be compressed and/or encoded.
  • rendering software 112 processes the data collected by recording device 102 to produce a 3D photo/video file and optional metadata.
  • Rendering software 112 may use images and/or IMU measurements to estimate the motion of recording device 102 with respect to an external coordinate frame (e.g., visual-inertial odometry).
  • Rendering software 112 may use any suitable method in the field of computer vision to produce a 3D representation of the scene, which could be a photo or a video, in which case the representation may vary as a function of time.
  • images may be processed to produce a 3D scene representation by a stereo disparity algorithm, multi-view stereo algorithm, photogrammetry, neural scene representation, and the like.
  • rendering software 112 encodes the (time-varying) 3D scene representation in a compressed format, which is suitable for efficient transmission over the internet and efficient decoding by player software, discussed herein.
  • rendering software 112 is optimized for data collected from a moving recording device (as would be the case if the device is worn), such that its output makes some modifications to minimize vestibulo-ocular conflict.
  • horizon stabilization is one type of modification that fits this description, although others exist.
  • 3D photo/video data 114 consists of an image in a standard format (e.g., jpeg or png), a video in a standard format (e.g., mp4 or mov), a triangle mesh, the parameters of a neural network or differentiable computation graph, and/or additional metadata such as text, JSON, protobuf, or arbitrary binary data.
  • the image or video is partitioned into regions with different purposes. For example, one region may store color images, while another region stores alpha, depth or inverse depth information (potentially for multiple layers of a scene).
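  • As a concrete illustration (a hypothetical layout with invented field names, not a format the patent prescribes), such a partitioning could be described by sidecar metadata like the following:

```js
// Hypothetical description of how one encoded frame is partitioned into
// purpose-specific regions. All names and dimensions are invented for
// illustration; the patent does not fix a particular layout.
const frameLayout = {
  video: "scene.mp4",
  regions: [
    { purpose: "color",        rect: { x: 0, y: 0,    w: 4096, h: 2048 } },
    { purpose: "inverseDepth", rect: { x: 0, y: 2048, w: 4096, h: 2048 } },
    // Additional layers of the scene could carry their own alpha/depth.
    { purpose: "alpha", layer: 1, rect: { x: 0, y: 4096, w: 4096, h: 1024 } }
  ]
};
```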
  • Computing system 108 performs various operations related to processing 3D photo/video data 114, generating NFTs, managing display devices, and the like as discussed herein.
  • a display device 110 is coupled to computing system 108 for presenting video data to a user.
  • Display device 110 may include a computer screen, VR headset, AR headset, and the like.
  • Another display device 118 may also receive video data from computing system 108 via data communication network 106. Although two display devices 110, 118 are shown in FIG. 1, particular embodiments may include any number of display devices that receive video data from one or more computing systems 108.
  • data communication network 106 includes any type of network topology using any communication protocol. Additionally, data communication network 106 may include a combination of two or more communication networks. In some implementations, data communication network 106 includes a cellular communication network, the Internet, a local area network, a wide area network, or any other communication network. In environment 100, data communication network 106 allows communication between server 104, computing system 108, and any number of other systems, devices, and the like.
  • FIG. 1 is given by way of example only. Other embodiments may include fewer or additional components without departing from the scope of the disclosure. Additionally, illustrated components may be combined or included within other components without limitation.
  • FIG. 2 is a block diagram illustrating an embodiment of computing system 108.
  • computing system 108 may include a communication manager 202, a CPU 204, a CPU memory 206, a GPU 208, an NFT generator 210, one or more NFTs 212, an NFT manager 214, video player software 216, and a display device manager 218.
  • Communication manager 202 allows computing system 108 to communicate with other systems, such as recording device 102 and server 104 shown in FIG. 1, and the like.
  • CPU 204 and GPU 208 can execute various instructions to perform the functionality provided by computing system 108, as discussed herein.
  • CPU memory 206 may store these instructions as well as other data used by CPU 204, GPU 208, and other modules and components contained in computing system 108.
  • GPU 208 processes 3D photo/video data 114 received from server 104. This processing may include decompressing or decoding 3D photo/video data 114.
  • computing system 108 may include an NFT generator 210 that generates various NFTs 212 that include different information.
  • NFT generator 210 may generate NFTs 212 that include 3D photo/video data 114, a video player application, and metadata.
  • a particular NFT 212 may represent a memory that includes, for example, various images and sounds associated with a first-person perspective or point of view.
  • a particular memory may have been captured by a specific person, at a particular time, and in a particular environment.
  • a memory allows any user to re-live the memory by experiencing the same sounds and images as the person who created the memory.
  • a memory may be captured (e.g., created) by a celebrity, sports star, or anyone else who wants to create a memory.
  • that memory may be included in an NFT and sold or otherwise transferred to one or more other people who can re-live that memory themselves from the perspective of the original person who captured the memory.
  • computing system 108 may include an NFT manager 214 that manages various NFTs 212 and may handle the distribution of NFTs 212 to other computing systems, other users, NFT marketplaces, and the like.
  • Video player software 216 can play or render various types of video data, such as 3D photo/video data 114. The played or rendered video data generated by video player software 216 may be communicated to a display device (e.g., display device 110 or 118), other computing systems, and the like.
  • a display device manager 218 manages any number of display devices and the display of information, such as video information, on one or more display devices. Additionally, display device manager 218 may coordinate the communication of video information to particular display devices.
  • computing system 108 decodes 3D photo/video data 114 using GPU 208 or similar graphics acceleration hardware, which is typically programmed using a graphics API (e.g., OpenGL).
  • the video display software executes code that produces a display image on a display device, such as a 2D screen or head-mounted display.
  • the display software precomputes some generic geometry, then decodes the 3D photo/video data into a texture which is accessible to vertex and fragment shaders.
  • the vertex shaders deform the precomputed geometry by sampling depth information from the texture.
  • the fragment shaders compute a color for each pixel on the display device by sampling color information from the texture.
  • the display device is a head-mounted display which tracks its motion in 6 DOF, and makes its current pose available to the vertex and fragment shaders via a uniform variable.
  • the vertex shader typically transforms the geometry using this pose so that the rendered scene responds to the user’s head movement.
  • frame-specific metadata is synchronized with a video, and copied into the vertex and fragment shader uniforms, then used as part of rendering.
  • frame-specific metadata might contain a rotation or 4x4 transform to be applied to the geometry in each frame, which undoes the motion of the recording device relative to the viewer in VR. This technique is important for minimizing vestibulo-ocular conflict when the recording device is wearable and/or captures video from the wearer’s point of view.
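  • A minimal sketch of this shader flow appears below (the uniform names, the color-over-depth texture layout, and the three.js-style built-ins position, uv, projectionMatrix, and modelViewMatrix are all assumptions for illustration, not the patent's prescribed implementation):

```js
// Hedged sketch of the vertex/fragment pair described above. The vertex
// shader reads inverse depth from the bottom half of the video texture,
// pushes each vertex of a precomputed unit sphere out to that depth, and
// applies the per-frame stabilization transform from the metadata.
const vertexShader = /* glsl */ `
  uniform sampler2D uVideo;     // decoded frame: color top half, inverse depth bottom half
  uniform mat4 uFrameTransform; // per-frame 4x4 from metadata; undoes recorder motion
  varying vec2 vUv;

  void main() {
    vUv = uv;
    float invDepth = texture2D(uVideo, vec2(uv.x, 0.5 * uv.y)).r;
    float depth = 1.0 / max(invDepth, 1e-4);       // guard against divide-by-zero
    vec3 displaced = normalize(position) * depth;  // deform the generic geometry
    // modelViewMatrix already reflects the head-mounted display's tracked pose.
    gl_Position = projectionMatrix * modelViewMatrix * uFrameTransform * vec4(displaced, 1.0);
  }
`;

const fragmentShader = /* glsl */ `
  precision highp float;
  uniform sampler2D uVideo;
  varying vec2 vUv;

  void main() {
    // Color lives in the top half of the partitioned frame.
    gl_FragColor = vec4(texture2D(uVideo, vec2(vUv.x, 0.5 + 0.5 * vUv.y)).rgb, 1.0);
  }
`;
```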
  • the display software is written in Javascript or any other language which can be executed in a web browser.
  • the display software may be one or more files of Javascript code which are executed by a browser as part of displaying a web page.
  • the in-browser display software renders using graphics acceleration hardware and a graphics API such as OpenGL.
  • the in-browser display software loads or streams the 3D photo/video data, and possibly additional metadata.
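  • Below is a minimal in-browser player sketch; three.js is an assumed tooling choice (the patent only calls for a browser-executable player over a graphics API such as OpenGL/WebGL), the asset names are placeholders, and vertexShader/fragmentShader are the sketches above:

```js
import * as THREE from "three";

// Hedged sketch of an in-browser player: stream the 3D photo/video as an
// ordinary <video> element, expose it to the shaders as a texture, and
// render a depth-deformed sphere around the viewer.
const renderer = new THREE.WebGLRenderer({ antialias: true });
renderer.setSize(window.innerWidth, window.innerHeight);
document.body.appendChild(renderer.domElement);

const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(
  70, window.innerWidth / window.innerHeight, 0.1, 100);

const video = document.createElement("video");
video.src = "scene.mp4";          // placeholder asset bundled with the NFT
video.crossOrigin = "anonymous";
video.loop = true;
video.muted = true;               // allows autoplay in most browsers
video.play();

const material = new THREE.ShaderMaterial({
  uniforms: {
    uVideo: { value: new THREE.VideoTexture(video) },
    uFrameTransform: { value: new THREE.Matrix4() }, // updated per frame from metadata
  },
  vertexShader,
  fragmentShader,
  side: THREE.BackSide,           // the viewer sits inside the sphere
});

// Generic precomputed geometry that the vertex shader deforms by depth.
scene.add(new THREE.Mesh(new THREE.SphereGeometry(1, 256, 128), material));

renderer.setAnimationLoop(() => renderer.render(scene, camera));
```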
  • FIG. 3 is a flow diagram illustrating an embodiment of a non-fungible token (NFT) 300.
  • NFT 300 may be used in a virtual reality environment, an augmented reality environment, a mixed reality environment, a metaverse environment, and the like.
  • NFT 300 may represent a region of space-time in an environment such as a metaverse.
  • NFT 300 may be available for purchase in an NFT marketplace or other marketplace.
  • NFT 300 may be sold as an object in the metaverse or other environment.
  • NFT 300 may appear as part of a collection in the metaverse.
  • a particular NFT 300 may include 3D photo/video data 302 (e.g., 3D photo/video data 114 generated by rendering software 112), a video player application 304, and metadata 306.
  • NFT 300 contains 3D photo/video data 302 and the necessary video player application 304 to play the 3D photo/video data. Therefore, NFT 300 does not need an external video player application (e.g., external to the NFT) to play 3D photo/video data 302 contained in NFT 300.
  • playback of NFT 300 is initiated based on another object, a link, a trigger within an environment, and the like.
  • NFT 300 may include an index file (e.g., index.html file) that may identify the items in NFT 300.
  • Metadata 306 may be associated with 3D photo/video data 302 and can include a rotation matrix, a 4x4 transformation matrix, or other parameterization of a pose used to transform the 3D geometry of a scene to counteract camera motion and minimize vestibulo-ocular conflict. Metadata 306 may also include parameters of a neural network or differentiable computation graph that is responsible for at least partially rendering views of a 3D scene.
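  • Serialized, such metadata might look like the following (all field names are invented for illustration; text, JSON, protobuf, or arbitrary binary data are all possible encodings per the description above):

```js
// Hypothetical frame-specific metadata: one row-major 4x4 transform per
// frame, consumed by the player (e.g., as the uFrameTransform uniform in
// the shader sketch above) to counteract recorded camera motion.
const frameMetadata = {
  frames: [
    { index: 0, timestampMs: 0,
      // Identity: no stabilization needed for this frame.
      transform: [1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  0, 0, 0, 1] },
    { index: 1, timestampMs: 33,
      // Small roll correction (~2 degrees) undoing the recorder's motion.
      transform: [0.9994, -0.0349, 0, 0,  0.0349, 0.9994, 0, 0,
                  0, 0, 1, 0,  0, 0, 0, 1] },
  ],
};
```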
  • the NFT (e.g., NFT 300) is a token on the Ethereum blockchain (e.g., ERC-721) or any other blockchain suitable for constructing NFTs (e.g., Tezos, Solana, Polkadot, and the like).
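  • For instance (a sketch using ethers.js, an assumed tooling choice the document does not prescribe), the standard ERC-721 tokenURI view is how a player or marketplace would locate a token's bundled content:

```js
import { ethers } from "ethers";

// Hedged sketch: query where an ERC-721 token's content lives. The RPC URL,
// contract address, and token id below are placeholders.
const provider = new ethers.JsonRpcProvider("https://rpc.example");
const abi = ["function tokenURI(uint256 tokenId) view returns (string)"];
const nft = new ethers.Contract(
  "0x0000000000000000000000000000000000000000", // placeholder address
  abi,
  provider
);

const uri = await nft.tokenURI(1n); // e.g. an ipfs:// URI for the bundle
console.log(uri);
```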
  • the NFT can bundle a set of files and/or directories as associated data (for example, the NFT platform Hic et Nunc supports creating NFTs with this structure). This may include one or more .html files which define a webpage or iframe content and/or embedded Javascript, one or more separate Javascript files, the 3D photo/video data, and optional additional metadata.
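  • A sketch of such a bundle's entry page follows (the file names and the startPlayer helper are hypothetical; the point is that data and player ship together, so the browser needs nothing external):

```html
<!-- Hypothetical index.html for an NFT bundle. player.js is the bundled
     video player application; scene.mp4 and metadata.json are the bundled
     3D video data and frame-specific metadata. All names are invented. -->
<!DOCTYPE html>
<html>
  <head><meta charset="utf-8"><title>3D Memory</title></head>
  <body>
    <script type="module">
      import { startPlayer } from "./player.js"; // hypothetical entry point
      startPlayer({ video: "scene.mp4", metadata: "metadata.json" });
    </script>
  </body>
</html>
```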
  • the NFT includes sufficient data as described herein to provide a complete system for replay of the 3D photo/video data on a 2D screen or head-mounted display, including both the data and a copy of the player software.
  • Typical NFTs represent only an image or video, whereas the systems and methods described herein may also include the player software as part of the NFT.
  • the player software is not part of the NFT (only the 3D photo/video data is part of the NFT), while in other embodiments the player software is part of the NFT.
  • the ultimate capability afforded by the described systems and methods is the ability to capture a 3D photo/video of an experience or memory from the point of view of the user of a wearable recording device.
  • the systems and methods then create the feeling of re-living the recorded experience when watched in VR and package all of the data and software necessary for replay in an NFT.
  • the described systems and methods may include an NFT that represents a memory or experience as well as the necessary data and software to enable a person to replay the memory or experience in VR without requiring any outside software or applications (e.g., video players).
  • NFT 300 represents an NFT for 3D VR video or immersive volumetric video.
  • immersive volumetric video may include an NFT that includes a 3D video representation and player for 3D or VR environments.
  • FIG. 4 is a flow diagram illustrating an embodiment of a process 400 for capturing photo data, video data, and other sensor data, then generating an NFT that includes the photo data, video data, other sensor data, and a video player application.
  • a recording device captures 402 raw photo data, video data, and other sensor data.
  • the sensor data may include data from cameras, IMUs, microphones, or lidar.
  • Process 400 continues as the recording device sends 404 the captured photo data, video data, and other sensor data to a server or computing system.
  • Rendering software in a server then generates 406 3D photo/video data based on the photo data, video data, and other sensor data.
  • the generated 3D photo/video data is communicated 408 to a computing system or other device for processing.
  • the computing system receives the 3D photo/video data and generates 410 an NFT that includes the 3D photo/video data, video player software, and other data associated with the NFT.
  • the other data associated with the NFT includes various metadata discussed herein.
  • the computing system may store 412 the NFT, communicate the NFT to another computing system, or render the 3D photo/video data using the video player software within the NFT on a display device coupled to the computing system or otherwise accessible to the computing system.
  • FIG. 5 illustrates an embodiment of a process 500 for generating an NFT based on received compressed video data, metadata, and an application program.
  • process 500 receives 502 compressed video data from a recording device.
  • Process 500 then receives 504, from the recording device, metadata associated with the compressed video data.
  • the metadata includes frame-specific metadata associated with frames in the compressed video data.
  • Process 500 continues by receiving 506 an application program configured to generate a real-time interactive experience for a user based on the compressed video data and the metadata associated with the compressed video data.
  • the process generates 508 an NFT that includes the compressed video data, the metadata associated with the compressed video data, and the application program.
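  • A compact sketch of steps 502–508 appears below (function and field names are invented; how the resulting bundle is pinned to storage and minted is platform-specific and out of scope here):

```js
// Hedged sketch of process 500: collect the received pieces into the
// content package the generated NFT will reference.
function buildNftBundle({ compressedVideo, frameMetadata, playerApp }) {
  return {
    files: {
      "scene.mp4": compressedVideo,                   // step 502: compressed video data
      "metadata.json": JSON.stringify(frameMetadata), // step 504: frame-specific metadata
      "player.js": playerApp,                         // step 506: application program
    },
    entry: "index.html",                              // step 508: the minted token points here
  };
}

// Example usage with placeholder inputs:
const bundle = buildNftBundle({
  compressedVideo: new Uint8Array(),                  // would be real encoded bytes
  frameMetadata: { frames: [] },
  playerApp: "export function startPlayer() {}",
});
```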
  • FIG. 6 illustrates an embodiment of a marketplace 600 or collection of content represented by thumbnail images 602 that link to specific items of content.
  • the specific items of content in marketplace 600 may include NFTs for 3D POV videos and other types of content.
  • the thumbnail images 602 may change how they are rendered to show motion parallax based on the depth information in the NFT. For example, an NFT’s data and software may cause a preview of the photo or video to be rendered in marketplace 600.
  • the motion parallax may cause each thumbnail to look like a small window into a 3D world behind the thumbnail.
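  • One way such a preview could be driven (an implementation assumption, not the patent's method) is to map the pointer position over a thumbnail to a small camera offset, so the depth-displaced geometry exhibits parallax:

```js
// Hedged sketch: nudge the player's camera as the pointer moves over the
// thumbnail, producing the "small window into a 3D world" effect. renderer
// and camera are the objects from the player sketch above; the 0.05 offset
// magnitude is arbitrary.
const thumb = renderer.domElement;
thumb.addEventListener("pointermove", (e) => {
  const rect = thumb.getBoundingClientRect();
  const nx = (e.clientX - rect.left) / rect.width - 0.5; // -0.5 .. 0.5
  const ny = (e.clientY - rect.top) / rect.height - 0.5;
  camera.position.set(nx * 0.05, -ny * 0.05, 0);         // small translation
  camera.lookAt(0, 0, -1);                               // keep the scene centered
});
```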
  • FIG. 7 illustrates an embodiment of an environment 700 that includes an avatar 704 in the metaverse that interacts with a picture frame 702 that is part of a shared virtual world.
  • a user is controlling avatar 704 within environment 700.
  • the user owns an NFT that can be used in environment 700.
  • when avatar 704 interacts with picture frame 702, avatar 704 is transported into the recorded experience or memory represented by the NFT.
  • environment 700 might initially contain any type of 3D content related to (or unrelated to) the NFT.
  • the original scene is replaced with the content of the NFT, either by fading away or through another visually appealing transition.
  • the NFT does not replace the entire environment 700. Instead, the NFT replaces a portion of environment 700.
  • FIG. 8 illustrates an example block diagram of a computing device 800 suitable for implementing the systems and methods described herein.
  • a cluster of computing devices interconnected by a network may be used to implement any one or more components of the systems discussed herein.
  • Computing device 800 may be used to perform various procedures, such as those discussed herein.
  • Computing device 800 can function as a server, a client, or any other computing entity.
  • Computing device 800 can perform various functions as discussed herein, and can execute one or more application programs, such as the application programs described herein.
  • Computing device 800 can be any of a wide variety of computing devices, such as a desktop computer, a notebook computer, a server computer, a handheld computer, tablet computer and the like.
  • Computing device 800 includes one or more processor(s) 802, one or more memory device(s) 804, one or more interface(s) 806, one or more mass storage device(s) 808, one or more Input/Output (I/O) device(s) 810, and a display device 830 all of which are coupled to a bus 812.
  • Processor(s) 802 include one or more processors or controllers that execute instructions stored in memory device(s) 804 and/or mass storage device(s) 808.
  • Processor(s) 802 may also include various types of computer-readable media, such as cache memory.
  • Memory device(s) 804 include various computer-readable media, such as volatile memory (e.g., random access memory (RAM) 814) and/or nonvolatile memory (e.g., read-only memory (ROM) 816). Memory device(s) 804 may also include rewritable ROM, such as Flash memory.
  • Mass storage device(s) 808 include various computer-readable media, such as magnetic tapes, magnetic disks, optical disks, solid-state memory (e.g., Flash memory), and so forth. As shown in FIG. 8, a particular mass storage device is a hard disk drive 824. Various drives may also be included in mass storage device(s) 808 to enable reading from and/or writing to the various computer-readable media. Mass storage device(s) 808 include removable media 826 and/or non-removable media.
  • I/O device(s) 810 include various devices that allow data and/or other information to be input to or retrieved from computing device 800.
  • Example I/O device(s) 810 include cursor control devices, keyboards, keypads, microphones, monitors or other display devices, speakers, printers, network interface cards, modems, lenses, CCDs or other image capture devices, and the like.
  • Display device 830 includes any type of device capable of displaying information to one or more users of computing device 800. Examples of display device 830 include a monitor, display terminal, video projection device, and the like.
  • Interface(s) 806 include various interfaces that allow computing device 800 to interact with other systems, devices, or computing environments.
  • Example interface(s) 806 include any number of different network interfaces 820, such as interfaces to local area networks (LANs), wide area networks (WANs), wireless networks, and the Internet.
  • Other interface(s) include user interface 818 and peripheral device interface 822.
  • the interface(s) 806 may also include one or more user interface elements 818.
  • the interface(s) 806 may also include one or more peripheral interfaces such as interfaces for printers, pointing devices (mice, track pad, etc.), keyboards, and the like.
  • Bus 812 allows processor(s) 802, memory device(s) 804, interface(s) 806, mass storage device(s) 808, and I/O device(s) 810 to communicate with one another, as well as other devices or components coupled to bus 812.
  • Bus 812 represents one or more of several types of bus structures, such as a system bus, PCI bus, IEEE 1394 bus, USB bus, and so forth.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Ecology (AREA)
  • Human Computer Interaction (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • General Business, Economics & Management (AREA)
  • Emergency Management (AREA)
  • Environmental & Geological Engineering (AREA)
  • Environmental Sciences (AREA)
  • Remote Sensing (AREA)
  • Development Economics (AREA)
  • Databases & Information Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Example video processing systems and methods are described. In one embodiment, compressed video data is received from a recording device. Additionally, metadata associated with the compressed video data is received, such that the metadata includes frame-specific metadata associated with frames in the compressed video data. Further, an application program is received that is configured to generate a real-time interactive experience for a user based on the compressed video data and the metadata associated with the compressed video data. A non-fungible token (NFT) is generated that includes the compressed video data, the metadata associated with the compressed video data, and the application program.
PCT/US2022/048704 2021-11-02 2022-11-02 Video processing systems and methods WO2023081213A1 (fr)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US202163274831P 2021-11-02 2021-11-02
US63/274,831 2021-11-02
US17/867,036 2022-07-18
US17/867,036 US20230018560A1 (en) 2021-07-19 2022-07-18 Virtual Reality Systems and Methods
US17/961,051 2022-10-06
US17/961,051 US20230103814A1 (en) 2021-10-06 2022-10-06 Image Processing Systems and Methods

Publications (1)

Publication Number Publication Date
WO2023081213A1 (fr) 2023-05-11

Family

ID=86241869

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2022/048704 WO2023081213A1 (fr) 2021-11-02 2022-11-02 Video processing systems and methods

Country Status (1)

Country Link
WO (1) WO2023081213A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160366330A1 (en) * 2015-06-11 2016-12-15 Martin Paul Boliek Apparatus for processing captured video data based on capture device orientation
US20210279695A1 (en) * 2019-04-08 2021-09-09 Transmira, Inc. Systems and methods for item acquisition by selection of a virtual object placed in a digital environment
US20210283496A1 (en) * 2016-05-26 2021-09-16 Electronic Scripting Products, Inc. Realistic Virtual/Augmented/Mixed Reality Viewing and Interactions
US20210295606A1 (en) * 2020-03-18 2021-09-23 Adobe Inc. Reconstructing three-dimensional scenes in a target coordinate system from multiple views


Similar Documents

Publication Publication Date Title
GB2553892B (en) 2D video with option for projected viewing in modeled 3D space
KR102332950B1 (ko) Spherical video editing
Gauglitz et al. World-stabilized annotations and virtual scene navigation for remote collaboration
US10055888B2 (en) Producing and consuming metadata within multi-dimensional data
CN113661471B (zh) Hybrid rendering
EP1037167B1 (fr) Système et procédé de production et de reproduction de vidéos à trois dimensions
US10699471B2 (en) Methods and systems for rendering frames based on a virtual entity description frame of a virtual scene
US20130321396A1 (en) Multi-input free viewpoint video processing pipeline
WO2017062865A1 (fr) Systems, methods and software programs for 360° video distribution platforms
JP6581742B1 (ja) VR live broadcast distribution system, distribution server, distribution server control method, distribution server program, and data structure of VR raw photograph data
JP7200935B2 (ja) Image processing device and method, file generation device and method, and program
JP2020520190A (ja) Video playback and provision of data associated with a virtual scene
CN112868224A (zh) Techniques for capturing and editing dynamic depth images
KR102651534B1 (ko) Extended reality recorder
US9497487B1 (en) Techniques for video data encoding
US20230047123A1 (en) Video Processing Systems and Methods
WO2023081213A1 (fr) Video processing systems and methods
US20180160133A1 (en) Realtime recording of gestures and/or voice to modify animations
Du Fusing multimedia data into dynamic virtual environments
CN114341944A (zh) Computer-generated reality recorder
KR101773929B1 (ko) Wide-viewing-angle image processing system, method for transmitting and reproducing wide-viewing-angle video, and computer program therefor
US20240185535A1 (en) Method for collaboratively creating augmented reality story and system therefor
US20240169595A1 (en) Method for analyzing user input regarding 3d object, device, and non-transitory computer-readable recording medium
KR102685040B1 (ko) Apparatus and method for producing video based on recorded user movement
WO2016009695A1 (fr) Information processing device, information processing method, written-work providing system, and computer program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22890722

Country of ref document: EP

Kind code of ref document: A1