CN110351492B - Video data processing method, device and medium - Google Patents


Info

Publication number
CN110351492B
CN110351492B · CN201810303003.0A
Authority
CN
China
Prior art keywords
spherical
video
box
omnidirectional video
scaling
Prior art date
Legal status
Active
Application number
CN201810303003.0A
Other languages
Chinese (zh)
Other versions
CN110351492A (en)
Inventor
黄成�
王磊
陈颖川
李秋婷
Current Assignee
ZTE Corp
Original Assignee
ZTE Corp
Priority date
Filing date
Publication date
Application filed by ZTE Corp
Priority to CN201810303003.0A
Priority to PCT/CN2019/078913
Publication of CN110351492A
Application granted
Publication of CN110351492B
Status: Active

Classifications

    • H04N5/2624 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects; for obtaining an image which is composed of whole input images, e.g. splitscreen
    • H04L9/40 Network security protocols
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/2628 Alteration of picture size, shape, position or orientation, e.g. zooming, rotation, rolling, perspective, translation

Abstract

A video data processing method, apparatus and system are disclosed. The video data processing method comprises the following steps: identifying a projected omnidirectional video box based on a scheme type parameter in the restricted scheme information box; determining a decoded frame of the video data as a projected omnidirectional video image according to the projected omnidirectional video box; the projected omnidirectional video box has a spherical zoom syntax element that indicates a spherical zoom region and/or spherical zoom parameters of the projected omnidirectional video image. This technical scheme ensures the quality of the zoomed video for the user's region of interest and the consistency of video zoom operations within the spherical region, thereby improving the user's viewing experience.

Description

Video data processing method, device and medium
Technical Field
The present invention relates to the field of video processing and transmission technologies, and in particular, to a method, an apparatus, and a system for processing video data.
Background
VR (Virtual Reality) refers to the simulation of a three-dimensional virtual world by computer technology, so that a user experiences a highly realistic virtual space environment through sight, hearing, touch, taste, etc. With the rapid development of VR technology, VR videos (also called panoramic videos) are increasingly widely used in video-on-demand and live streaming, bringing users an immersive video service experience.
However, because a panoramic video provides an omnidirectional viewing experience in the horizontal direction (e.g., a 360-degree range) and the vertical direction (e.g., a 180-degree range) simultaneously, at the same video bitrate its video quality or resolution per unit area (pixels per degree) is much lower than that of a conventional single-view video. The overall bitrate of a panoramic video is limited by current network transmission bandwidth; as a result, the image quality of the user's region of interest in a panoramic video is generally not high, which greatly degrades the user's service experience.
In view of the above problems in the related art, no effective solution has been found at present.
Disclosure of Invention
In order to solve the above technical problem, embodiments of the present invention provide a method, an apparatus, and a system for processing video data, so as to ensure the quality of a zoomed video for a region of interest of a user and achieve the consistency of video zooming operations in a spherical region, thereby improving the viewing experience of the user.
According to a first aspect of the present application, an embodiment of the present invention provides a video data processing method, including:
identifying a projected omnidirectional video box based on a scheme type parameter in the restricted scheme information box;
determining a decoding frame of the video data as a projected omnidirectional video image according to the projected omnidirectional video box; the projected omnidirectional video box has a spherical zoom syntax element that indicates a spherical zoom region and/or spherical zoom parameters of the projected omnidirectional video image.
According to a second aspect of the present application, an embodiment of the present invention provides a video data processing method, including:
identifying a scaled omnidirectional video box based on a scheme type parameter in the restricted scheme information box;
and determining the decoding frame of the video data as a scaled omnidirectional video image according to the scaled omnidirectional video box.
According to a third aspect of the present application, an embodiment of the present invention provides a video data processing method, including:
identifying a projected omnidirectional video box and a scaled omnidirectional video box based on a scheme type parameter in the restricted scheme information box;
determining a decoding frame of the video data as a projected omnidirectional video image according to the projected omnidirectional video box; the scaled omnidirectional video box has a spherical scaling syntax element that indicates a spherical scaling region and/or spherical scaling parameters of the projected omnidirectional video image.
According to a fourth aspect of the present application, an embodiment of the present invention provides a video data processing method, including:
identifying the video data file as a spherical scaling timing metadata track based on the sample entry type;
wherein the spherical scaling timing metadata track has a spherical scaling syntax element indicating a spherical scaling region and/or spherical scaling parameters of the referenced omnidirectional video.
An embodiment of the present invention provides a video data processing apparatus, including:
a first processing module for identifying a projected omnidirectional video box based on a scheme type parameter in the restricted scheme information box;
the second processing module is used for determining a decoding frame of the video data as a projected omnidirectional video image according to the projected omnidirectional video box; the projected omnidirectional video box has a spherical zoom syntax element that indicates a spherical zoom region and/or spherical zoom parameters of the projected omnidirectional video image.
An embodiment of the present invention provides a video data processing apparatus, including:
a first processing module for identifying a scaled omnidirectional video box based on a scheme type parameter in a restricted scheme information box;
a second processing module, configured to determine, according to the scaled omnidirectional video box, that a decoded frame of the video data is a scaled omnidirectional video image.
An embodiment of the present invention provides a video data processing apparatus, including:
a first processing module for identifying a projected omnidirectional video box and a scaled omnidirectional video box based on a scheme type parameter in the restricted scheme information box;
the second processing module is used for determining a decoding frame of the video data as a projected omnidirectional video image according to the projected omnidirectional video box; the scaled omnidirectional video box has a spherical scaling syntax element that indicates a spherical scaling region and/or spherical scaling parameters of the projected omnidirectional video image.
An embodiment of the present invention provides a video data processing apparatus, including:
a first processing module for determining a sample entry type;
a second processing module for identifying the video data file as a spherical zoom timing metadata track based on the sample entry type; wherein the spherical scaling timing metadata track has a spherical scaling syntax element indicating a spherical scaling region and/or spherical scaling parameters of the referenced omnidirectional video.
An embodiment of the present invention provides a video data processing apparatus, including:
a memory, a processor and a video data processing program stored on the memory and executable on the processor, the video data processing program, when executed by the processor, implementing the steps of the video data processing method according to the first aspect of the present application.
An embodiment of the present invention provides a video data processing apparatus, including:
a memory, a processor and a video data processing program stored on the memory and executable on the processor, the video data processing program, when executed by the processor, implementing the steps of the video data processing method according to the second aspect of the present application.
An embodiment of the present invention provides a video data processing apparatus, including:
a memory, a processor and a video data processing program stored on the memory and executable on the processor, the video data processing program, when executed by the processor, implementing the steps of the video data processing method according to the third aspect of the present application.
An embodiment of the present invention provides a video data processing apparatus, including:
a memory, a processor and a video data processing program stored in the memory and executable on the processor, wherein the video data processing program when executed by the processor implements the steps of the video data processing method according to the fourth aspect of the present application.
An embodiment of the present invention provides a computer-readable storage medium, on which a video data processing program is stored; when the video data processing program is executed by a processor, the steps of the video data processing method according to the first aspect of the present application are implemented.
An embodiment of the present invention provides a computer-readable storage medium, on which a video data processing program is stored; when the video data processing program is executed by a processor, the steps of the video data processing method according to the second aspect of the present application are implemented.
An embodiment of the present invention provides a computer-readable storage medium, on which a video data processing program is stored; when the video data processing program is executed by a processor, the steps of the video data processing method according to the third aspect of the present application are implemented.
An embodiment of the present invention provides a computer-readable storage medium, on which a video data processing program is stored; when the video data processing program is executed by a processor, the steps of the video data processing method according to the fourth aspect of the present application are implemented.
The technical solutions of the embodiments of the present invention provide spherical zoom information for the omnidirectional video track, indicating the spherical zoom region of a zoomed view relative to the complete spherical view at any time point of the omnidirectional video, as well as the spherical zoom parameters adopted by different versions of zoomed views. During playback of the omnidirectional video, the quality of the zoomed video for the user's region of interest is guaranteed and the continuity of zoom operations on spherical-region video is achieved, thereby improving the user's viewing experience.
Drawings
Fig. 1 is a flowchart of a video data processing method according to embodiment 1 of the present invention;
fig. 2 is a flowchart of a video data processing method according to embodiment 2 of the present invention;
fig. 3 is a flowchart of a video data processing method according to embodiment 3 of the present invention;
fig. 4 is a flowchart of a video data processing method according to embodiment 4 of the present invention;
fig. 5 is a block diagram of a video data processing apparatus according to embodiment 5 of the present invention;
fig. 6 is a block diagram of a video data processing apparatus according to embodiment 6 of the present invention;
fig. 7 is a block diagram of a video data processing apparatus according to embodiment 7 of the present invention;
fig. 8 is a block diagram of a video data processing apparatus according to embodiment 8 of the present invention;
FIG. 9 is a block diagram of a video data processing system of example 1 of the present invention;
FIG. 10 is a schematic diagram of the internal interaction of a video data processing system in example 1 of the present invention;
fig. 11 is a schematic diagram of a track reference box for a scaled omnidirectional video track in example 3 of the present invention;
fig. 12 is a diagram of a track group box for a scaled omnidirectional video in example 3 of the present invention;
fig. 13 is a first schematic diagram of a spherical zoom timing metadata track referencing an omnidirectional video track in example 5 of the present invention;
fig. 14 is a second schematic diagram of a spherical zoom timing metadata track referencing an omnidirectional video track in example 5 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict.
In general, embodiments of the invention provide techniques that may be used for video data processing. In some embodiments, omnidirectional video data is stored in a file based on the ISO (International Organization for Standardization) base media file format. The ISO base media file format structures, such as the restricted scheme information box, the track reference box, and the track group box, can operate with reference to the MPEG-4 Part 12 ISO Base Media File Format established by the ISO/IEC JTC1/SC29/WG11 Moving Picture Experts Group (MPEG). The projection and encapsulation steps of the omnidirectional video and their basic format may operate with reference to MPEG-I Part 2 OMAF (Omnidirectional Media Format), also established by the ISO/IEC JTC1/SC29/WG11 Moving Picture Experts Group (MPEG).
All data in the ISO base media file format is contained in boxes; that is, an ISO base media format file, as represented by an MP4 file, consists of several boxes, each having a type and a length, and each of which can be regarded as a data object. A box that contains other boxes is referred to as a container box. An MP4 file first has exactly one box of type "ftyp", which acts as a flag of the file format and contains some information about the file. It then has exactly one box of type "moov" (Movie Box), a container box whose sub-boxes contain the metadata information of the media. The media data of the MP4 file is contained in boxes of type "mdat" (Media Data Box); there may be several such boxes, or none (when the media data all reside in other files), and the structure of the media data is described by the metadata.
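The box layout described above can be illustrated with a short parser sketch. This is a minimal illustration and not part of the patent: it reads only the 4-byte size and 4-byte type of each top-level box (handling the 64-bit "largesize" variant, but not size == 0, which means "box extends to end of file"):

```python
import io
import struct

def parse_boxes(stream):
    """Return a list of (box_type, payload) for each top-level ISO BMFF box."""
    boxes = []
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break
        size, box_type = struct.unpack(">I4s", header)
        header_len = 8
        if size == 1:  # 64-bit largesize follows the type field
            size = struct.unpack(">Q", stream.read(8))[0]
            header_len = 16
        payload = stream.read(size - header_len)
        boxes.append((box_type.decode("ascii"), payload))
    return boxes

# Build a tiny two-box file in memory: an 'ftyp' box (brand 'isom',
# minor version 0) followed by an empty 'mdat' box.
ftyp = struct.pack(">I4s", 16, b"ftyp") + b"isom" + struct.pack(">I", 0)
mdat = struct.pack(">I4s", 8, b"mdat")
parsed = parse_boxes(io.BytesIO(ftyp + mdat))
print([t for t, _ in parsed])  # ['ftyp', 'mdat']
```

A real reader would recurse into container boxes such as "moov" with the same loop applied to the payload bytes.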
Example 1
As shown in fig. 1, an embodiment of the present invention provides a video data processing method, including:
step S110, identifying a projected omnidirectional video box based on the scheme type parameter in the restricted scheme information box;
step S120, determining the decoding frame of the video data as a projected omnidirectional video image according to the projected omnidirectional video box; the projected omnidirectional video box has a spherical zoom syntax element that indicates a spherical zoom region and/or spherical zoom parameters of the projected omnidirectional video image.
In one embodiment, the identifying a projected omnidirectional video box based on the solution type parameter in the restricted solution information box includes:
if the scheme type parameter in the restricted scheme information box takes the value of the first scheme type, this indicates that the video data uses a projected omnidirectional video scheme.
The first scheme type may take the value 'podv' (projected omnidirectional video); it may also be another similar "four-character code" value.
In one embodiment, the projected omnidirectional video box has spherical zoom syntax elements, including:
the overlay information box of the projected omnidirectional video box contains a scaling format box having the spherical scaling syntax element.
In one embodiment, the projected omnidirectional video box has spherical zoom syntax elements, including:
the projected omnidirectional video box comprises a spherical area scaling box having the spherical scaling syntax element.
In one embodiment, the spherical zoom region of the projected omnidirectional video image includes one or more of the following information:
a center point of the spherical zoom region;
the azimuth angle range and the pitch angle range of the spherical scaling region;
wherein the central point of the spherical zoom region comprises at least one of: azimuth angle of the center point, pitch angle of the center point, and tilt angle of the center point.
In one embodiment, the spherical scaling parameters of the projected omnidirectional video image include one or more of the following information:
the scaling ratio of the spherical region zoomed video;
the scaling algorithm type of the spherical region zoomed video;
the boundary symbolization type of the spherical region zoomed video;
the textual description of the spherical region zoomed video;
the type of the spherical zoom region.
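As a concrete illustration of the structures enumerated above, the sketch below models the spherical zoom region and zoom parameters as plain data records and shows the scheme-type check of step S110. Only the four-character code 'podv' comes from the text; the class names, field names, and field types are illustrative assumptions, not the patent's normative syntax:

```python
from dataclasses import dataclass

SCHEME_PODV = "podv"  # first scheme type named in the text

@dataclass
class SphereZoomRegion:
    centre_azimuth: float    # azimuth of the region centre point, degrees
    centre_elevation: float  # pitch (elevation) of the centre point, degrees
    centre_tilt: float       # tilt of the centre point, degrees
    azimuth_range: float     # azimuth extent of the region, degrees
    elevation_range: float   # pitch extent of the region, degrees

@dataclass
class SphereZoomParams:
    zoom_ratio: float        # scaling ratio of the spherical region zoomed video
    algorithm_type: int      # scaling algorithm type
    boundary_type: int       # boundary symbolization type
    description: str         # textual description of the zoomed video
    region_type: int         # type of the spherical zoom region

def uses_projected_scheme(scheme_type: str) -> bool:
    """Step S110: a 'podv' scheme type identifies a projected omnidirectional video box."""
    return scheme_type == SCHEME_PODV

region = SphereZoomRegion(45.0, 10.0, 0.0, 60.0, 30.0)
params = SphereZoomParams(2.0, 0, 0, "zoomed view", 0)
print(uses_projected_scheme("podv"), region.azimuth_range)  # True 60.0
```

Either the region, the parameters, or both may be signalled, matching the "and/or" in step S120.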
Example 2
As shown in fig. 2, an embodiment of the present invention provides a video data processing method, including:
step S210, identifying a scaled omnidirectional video box based on the scheme type parameter in the restricted scheme information box;
step S220, determining the decoded frame of the video data as a scaled omnidirectional video image according to the scaled omnidirectional video box.
In one embodiment, the identifying a scaled omnidirectional video box based on the scheme type parameter in the restricted scheme information box includes:
if the scheme type parameter in the restricted scheme information box takes the value of the second scheme type, this indicates that the video data uses a scaled omnidirectional video scheme.
The second scheme type may take the value 'zodv' (zoomed omnidirectional video); it may also be another similar "four-character code" value.
In one embodiment, a plan information box of the restricted plan information boxes includes a scaled omnidirectional video box indicating a format of the scaled omnidirectional video image.
In one embodiment, the scaled omnidirectional video box has a spherical scaling syntax element indicating a spherical scaling region and/or spherical scaling parameters of the scaled omnidirectional video image.
In one embodiment, the spherical zoom region of the zoomed omnidirectional video image includes one or more of the following information:
a center point of the spherical zoom region;
the azimuth angle range and the pitch angle range of the spherical scaling region;
wherein the central point of the spherical zoom region comprises at least one of: azimuth angle of the center point, pitch angle of the center point, and tilt angle of the center point.
In one embodiment, the spherical scaling parameter of the scaled omnidirectional video image comprises one or more of the following information:
the scaling ratio of the spherical region zoomed video;
the scaling algorithm type of the spherical region zoomed video;
the boundary symbolization type of the spherical region zoomed video;
the textual description of the spherical region zoomed video;
the type of the spherical zoom region.
In one embodiment, the scaled omnidirectional video box indicates that the video data is a scaled omnidirectional video track;
wherein the scaled omnidirectional video track contains a track reference box, a track identification parameter in the track reference box referencing a track identifier of the projected omnidirectional video track.
In one embodiment, if the reference type parameter in the track reference box takes the value of the first reference type, this indicates that the scaled omnidirectional video track contains auxiliary zoom video information for the referenced projected omnidirectional video track.
Wherein the first reference type may take the value 'vzom'; the first reference type value can also be a "four character code" value in other similar forms.
In one embodiment, the scaled omnidirectional video box indicates that the video data is a scaled omnidirectional video track;
the scaled omnidirectional video track includes a track group type box;
if the track group type parameter of the track group type box takes on the value of the first track group type, indicating that the scaled omnidirectional video track belongs to a scaled omnidirectional video group.
Wherein, the first track group type parameter may take the value 'zoom'; the first track group type value can also be a similar four-character code value in other forms.
In one embodiment, the scaled omnidirectional video group includes a scaled omnidirectional video track and a projected omnidirectional video track corresponding to the same content source.
In one embodiment, the scaled omnidirectional video box indicates that the video data is a scaled omnidirectional video track;
the scaled omnidirectional video tracks contain track selection boxes with a list of attributes that describe or distinguish the different scaled omnidirectional video tracks.
In one embodiment, the list of attributes includes at least one of the following attributes:
the spherical region covered by the content of the video track;
the scaling ratio of the spherical region in the video track;
the scaling algorithm type of the spherical region in the video track;
the boundary symbolization type of the spherical region in the video track;
the zoom region type of the spherical region in the video track.
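The track-level linking rules of the embodiments above can be sketched as a small model. Only the four-character codes 'vzom' (first reference type) and 'zoom' (first track group type) come from the text; the simplified track records and helper functions are illustrative assumptions:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TrackReference:
    reference_type: str  # e.g. 'vzom'
    track_ids: list      # track identifiers of the referenced tracks

@dataclass
class Track:
    track_id: int
    track_group_type: Optional[str] = None  # e.g. 'zoom'
    track_group_id: Optional[int] = None
    references: list = field(default_factory=list)

def referenced_projected_tracks(zoomed: Track) -> list:
    """Resolve the projected omnidirectional tracks a scaled track references via 'vzom'."""
    ids = []
    for ref in zoomed.references:
        if ref.reference_type == "vzom":
            ids.extend(ref.track_ids)
    return ids

def same_zoom_group(a: Track, b: Track) -> bool:
    """Tracks with track group type 'zoom' and equal group id share one content source."""
    return (a.track_group_type == b.track_group_type == "zoom"
            and a.track_group_id is not None
            and a.track_group_id == b.track_group_id)

projected = Track(track_id=1, track_group_type="zoom", track_group_id=100)
zoomed = Track(track_id=2, track_group_type="zoom", track_group_id=100,
               references=[TrackReference("vzom", [1])])
print(referenced_projected_tracks(zoomed))  # [1]
print(same_zoom_group(projected, zoomed))   # True
```

This mirrors the two linking mechanisms described above: a directed reference from the scaled track to the projected track, and a symmetric grouping of tracks derived from the same content source.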
Example 3
As shown in fig. 3, an embodiment of the present invention provides a video data processing method, including:
step S310, identifying a projected omnidirectional video box and a scaled omnidirectional video box based on the scheme type parameter in the restricted scheme information box;
step S320, determining the decoded frame of the video data as a projected omnidirectional video image according to the projected omnidirectional video box; the scaled omnidirectional video box has a spherical scaling syntax element that indicates a spherical scaling region and/or spherical scaling parameters of the projected omnidirectional video image.
In one embodiment, if the scaled omnidirectional video box is not present in the restricted scheme information box, this indicates that no spherical scaled video is present in the projected omnidirectional video image.
In one embodiment, the spherical zoom region of the projected omnidirectional video image includes one or more of the following information:
a center point of the spherical zoom region;
the azimuth angle range and the pitch angle range of the spherical scaling region;
wherein the central point of the spherical zoom region comprises at least one of: azimuth angle of the center point, pitch angle of the center point, and tilt angle of the center point.
In one embodiment, the spherical scaling parameter of the projected omnidirectional video image comprises one or more of the following information:
the scaling ratio of the spherical region zoomed video;
the scaling algorithm type of the spherical region zoomed video;
the boundary symbolization type of the spherical region zoomed video;
the textual description of the spherical region zoomed video;
the type of the spherical zoom region.
Example 4
As shown in fig. 4, an embodiment of the present invention provides a video data processing method, including:
step S410, determining a sample entry type;
step S420, identifying the video data file as a spherical zooming timing metadata track based on the sample entry type; wherein the spherical scaling timing metadata track has a spherical scaling syntax element indicating a spherical scaling region and/or spherical scaling parameters of the referenced omnidirectional video.
In one embodiment, the spherical scaled timed metadata track references one or more omnidirectional video tracks by referencing a track reference box of a second reference type.
Wherein, the second reference type can take the value 'cdsc'; the second reference type value can also be other character strings or values in other forms.
In one embodiment, the spherical scaling timed metadata track references, via a track reference box of the second reference type, the track group identifier (track_group_id) of a track group whose track group type (track_group_type) is the first track group type.
Wherein, the first track group type parameter may take the value 'zoom'; the first track group type value can also be a similar four-character code value in other forms.
In one embodiment, the track sample entry type used by the spherical scaling timed metadata track is a first sample entry type.
The first sample entry type may take the value 'spzm'; it may also be another character string or a value in another form.
In one embodiment, a spherical scale information box is included in a sample entry of the spherical scale timing metadata track, the spherical scale information box having the spherical scale syntax element.
In one embodiment, each sample of the spherical scale timing metadata track has a spherical scale syntax element that indicates a spherical scale region of the referenced omnidirectional video.
In one embodiment, the spherical zoom region of the omnidirectional video includes one or more of the following information:
a center point of the spherical zoom region;
the azimuth angle range and the pitch angle range of the spherical scaling region;
wherein the central point of the spherical zoom region comprises at least one of: azimuth angle of the center point, pitch angle of the center point, and tilt angle of the center point.
In one embodiment, the spherical scaling parameter of the omnidirectional video comprises one or more of the following information:
the scaling ratio of the spherical region zoomed video;
the scaling algorithm type of the spherical region zoomed video;
the boundary symbolization type of the spherical region zoomed video;
the textual description of the spherical region zoomed video;
the type of the spherical zoom region.
In one embodiment, the spherical zoom timed metadata track indicates the spherical zoom region and/or spherical zoom parameters of the referenced omnidirectional video track based on a director's cut or on statistical measurements.
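The timed-metadata mechanism of this example can be sketched as a per-sample lookup: a track whose sample entry type is 'spzm' carries one spherical zoom region per sample, applied to the omnidirectional track it describes. Only the codes 'spzm' and 'cdsc' come from the text; the record layout and the lookup helper are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class ZoomSample:
    time: float              # presentation time of the metadata sample, seconds
    centre_azimuth: float    # spherical zoom region centre, degrees
    centre_elevation: float
    azimuth_range: float
    elevation_range: float

@dataclass
class ZoomMetadataTrack:
    sample_entry_type: str   # 'spzm' identifies a spherical zoom timed metadata track
    reference_type: str      # 'cdsc' links it to the described omnidirectional track(s)
    samples: list            # ZoomSample records in increasing time order

def region_at(track: ZoomMetadataTrack, t: float):
    """Return the zoom region in effect at time t (latest sample with time <= t)."""
    if track.sample_entry_type != "spzm":
        raise ValueError("not a spherical zoom timed metadata track")
    current = None
    for s in track.samples:
        if s.time <= t:
            current = s
    return current

track = ZoomMetadataTrack("spzm", "cdsc", [
    ZoomSample(0.0, 0.0, 0.0, 90.0, 60.0),
    ZoomSample(2.0, 30.0, 5.0, 60.0, 40.0),
])
print(region_at(track, 2.5).centre_azimuth)  # 30.0
```

A director's-cut style track would author these samples editorially, while a statistics-driven track would derive them from viewing measurements; the playback-side lookup is the same either way.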
Example 5
As shown in fig. 5, an embodiment of the present invention provides a video data processing apparatus, including:
a first processing module 501, configured to identify a projected omnidirectional video box based on a scheme type parameter in the restricted scheme information box;
a second processing module 502, configured to determine, according to the projected omnidirectional video box, that a decoded frame of the video data is a projected omnidirectional video image; the projected omnidirectional video box has a spherical zoom syntax element that indicates a spherical zoom region and/or spherical zoom parameters of the projected omnidirectional video image.
In one embodiment, the first processing module identifying a projected omnidirectional video box based on a scheme type parameter in a restricted scheme information box includes:
if the scheme type parameter value in the restricted scheme information box is a first scheme type, indicating that the video data uses a projected omnidirectional video scheme.
The first scheme type may take the value 'podv' (projected omnidirectional video); the first scheme type may also be a "four-character code" value in another similar form.
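Scheme identification of this kind reduces to comparing four-character codes stored as 32-bit integers. A minimal sketch of such a comparison, where `fourcc` and `uses_projected_scheme` are hypothetical helper names, not functions defined by the patent:

```python
def fourcc(code: str) -> int:
    """Pack a four-character code into its 32-bit big-endian integer form."""
    assert len(code) == 4
    return int.from_bytes(code.encode('ascii'), 'big')

PODV = fourcc('podv')  # first scheme type: projected omnidirectional video
ZODV = fourcc('zodv')  # second scheme type: zoomed omnidirectional video

def uses_projected_scheme(scheme_type: int) -> bool:
    """True when the scheme_type parameter selects the projected scheme."""
    return scheme_type == PODV
```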
In one embodiment, the projected omnidirectional video box has spherical zoom syntax elements, including:
the overlay information box of the projected omnidirectional video box contains a scaling format box having the spherical scaling syntax element.
In one embodiment, the projected omnidirectional video box has spherical zoom syntax elements, including:
the projected omnidirectional video box comprises a spherical area scaling box having the spherical scaling syntax element.
In one embodiment, the spherical zoom region of the projected omnidirectional video image includes one or more of the following information:
a center point of the spherical zoom region;
the azimuth angle range and the pitch angle range of the spherical scaling region;
wherein the central point of the spherical zoom region comprises at least one of: azimuth angle of the center point, pitch angle of the center point, and tilt angle of the center point.
In one embodiment, the spherical scaling parameters of the projected omnidirectional video image include one or more of the following information:
the scaling ratio of the spherical region zoom video;
the scaling algorithm type of the spherical region zoom video;
the boundary symbolization type of the spherical region zoom video;
the textual description of the spherical region zoom video;
the type of the spherical zoom region.
Example 6
As shown in fig. 6, an embodiment of the present invention provides a video data processing apparatus, including:
a first processing module 601 for identifying a scaled omnidirectional video box based on a scheme type parameter in a restricted scheme information box;
a second processing module 602, configured to determine, according to the scaled omnidirectional video box, that a decoded frame of the video data is a scaled omnidirectional video image.
In one embodiment, the first processing module identifying a scaled omnidirectional video box based on a scheme type parameter in a restricted scheme information box includes:
if the scheme type parameter in the restricted scheme information box takes the value of a second scheme type, indicating that the video data uses a scaled omnidirectional video scheme.
The second scheme type may take the value 'zodv' (zoomed omnidirectional video); the second scheme type may also be a "four-character code" value in another similar form.
In one embodiment, a scheme information box of the restricted scheme information box includes a scaled omnidirectional video box indicating a format of the scaled omnidirectional video image.
In one embodiment, the scaled omnidirectional video box has a spherical scaling syntax element indicating a spherical scaling region and/or spherical scaling parameters of the scaled omnidirectional video image.
In one embodiment, the spherical zoom region of the zoomed omnidirectional video image includes one or more of the following information:
a center point of the spherical zoom region;
the azimuth angle range and the pitch angle range of the spherical scaling region;
wherein the central point of the spherical zoom region comprises at least one of: azimuth angle of the center point, pitch angle of the center point, and tilt angle of the center point.
In one embodiment, the spherical scaling parameter of the scaled omnidirectional video image comprises one or more of the following information:
the scaling ratio of the spherical region zoom video;
the scaling algorithm type of the spherical region zoom video;
the boundary symbolization type of the spherical region zoom video;
the textual description of the spherical region zoom video;
the type of the spherical zoom region.
In one embodiment, the scaled omnidirectional video box indicates that the video data is a scaled omnidirectional video track;
wherein the scaled omnidirectional video track contains a track reference box, a track identification parameter in the track reference box referencing a track identifier of the projected omnidirectional video track.
In one embodiment, if the reference type parameter in the track reference box takes the value of a first reference type, it indicates that the scaled omnidirectional video track contains auxiliary zoom video information for the referenced projected omnidirectional video track.
Wherein the first reference type may take the value 'vzom'; the first reference type value can also be a "four character code" value in other similar forms.
In one embodiment, the scaled omnidirectional video box indicates that the video data is a scaled omnidirectional video track;
the scaled omnidirectional video track includes a track group type box;
if the track group type parameter of the track group type box takes on the value of the first track group type, indicating that the scaled omnidirectional video track belongs to a scaled omnidirectional video group.
Wherein, the first track group type parameter may take the value 'zoom'; the first track group type value can also be a similar four-character code value in other forms.
In one embodiment, the scaled omnidirectional video group includes a scaled omnidirectional video track and a projected omnidirectional video track corresponding to the same content source.
In one embodiment, the scaled omnidirectional video box indicates that the video data is a scaled omnidirectional video track;
the scaled omnidirectional video tracks contain track selection boxes with a list of attributes that describe or distinguish the different scaled omnidirectional video tracks.
In one embodiment, the list of attributes includes at least one of the following attributes:
one or more spherical regions covered by the content in the video track;
the scaling ratio of the spherical region in the video track;
the scaling algorithm type of the spherical region in the video track;
the boundary symbolization type of the spherical region in the video track;
the zoom region type of the spherical region in the video track.
Example 7
As shown in fig. 7, an embodiment of the present invention provides a video data processing apparatus, including:
a first processing module 701 for identifying a projected omnidirectional video box and a scaled omnidirectional video box based on a scheme type parameter in the restricted scheme information box;
a second processing module 702, configured to determine, according to the projected omnidirectional video box, that a decoded frame of the video data is a projected omnidirectional video image; the scaled omnidirectional video box has a spherical scaling syntax element that indicates a spherical scaling region and/or spherical scaling parameters of the projected omnidirectional video image.
In one embodiment, if the scaled omnidirectional video box is not present in the restricted profile information box, it is indicated that spherical scaled video is not present in the projected omnidirectional video image.
In one embodiment, the spherical zoom region of the projected omnidirectional video image includes one or more of the following information:
a center point of the spherical zoom region;
the azimuth angle range and the pitch angle range of the spherical scaling region;
wherein the central point of the spherical zoom region comprises at least one of: azimuth angle of the center point, pitch angle of the center point, and tilt angle of the center point.
In one embodiment, the spherical scaling parameter of the projected omnidirectional video image comprises one or more of the following information:
the scaling ratio of the spherical region zoom video;
the scaling algorithm type of the spherical region zoom video;
the boundary symbolization type of the spherical region zoom video;
the textual description of the spherical region zoom video;
the type of the spherical zoom region.
Example 8
As shown in fig. 8, an embodiment of the present invention provides a video data processing apparatus, including:
a first processing module 801 for determining a sample entry type;
a second processing module 802 for identifying the video data file as a spherical zoom timing metadata track based on the sample entry type; wherein the spherical scaling timing metadata track has a spherical scaling syntax element indicating a spherical scaling region and/or spherical scaling parameters of the referenced omnidirectional video.
In one embodiment, the spherical zoom timed metadata track references one or more omnidirectional video tracks through a track reference box of a second reference type.
Wherein, the second reference type can take the value 'cdsc'; the second reference type value can also be other character strings or values in other forms.
In one embodiment, the spherical zoom timed metadata track references the track group identifier (track_group_id) of a track group whose track group type (track_group_type) is the first track group type, through a track reference box of the second reference type.
Wherein, the first track group type parameter may take the value 'zoom'; the first track group type value can also be a similar four-character code value in other forms.
In one embodiment, the track sample entry type used by the spherical zoom timed metadata track is a first sample entry type.
The first sample entry type may take the value 'spzm'; the first sample entry type value can also be another character string or a value in another form.
In one embodiment, a spherical scale information box is included in a sample entry of the spherical scale timing metadata track, the spherical scale information box having the spherical scale syntax element.
In one embodiment, each sample of the spherical scale timing metadata track has a spherical scale syntax element that indicates a spherical scale region of the referenced omnidirectional video.
In one embodiment, the spherical zoom region of the omnidirectional video includes one or more of the following information:
a center point of the spherical zoom region;
the azimuth angle range and the pitch angle range of the spherical scaling region;
wherein the central point of the spherical zoom region comprises at least one of: azimuth angle of the center point, pitch angle of the center point, and tilt angle of the center point.
In one embodiment, the spherical scaling parameter of the omnidirectional video comprises one or more of the following information:
the scaling ratio of the spherical region zoom video;
the scaling algorithm type of the spherical region zoom video;
the boundary symbolization type of the spherical region zoom video;
the textual description of the spherical region zoom video;
the type of the spherical zoom region.
In one embodiment, the spherical zoom timed metadata track indicates a spherical zoom region and/or spherical zoom parameters of the referenced omnidirectional video track based on a director's cut or on statistical measurements.
Example 9
An embodiment of the present invention provides a video data processing apparatus, including:
a memory, a processor and a video data processing program stored in the memory and executable on the processor, wherein the video data processing program, when executed by the processor, implements the steps of the video data processing method described in embodiment 1 above.
Example 10
An embodiment of the present invention provides a video data processing apparatus, including:
a memory, a processor and a video data processing program stored in the memory and executable on the processor, wherein the video data processing program, when executed by the processor, implements the steps of the video data processing method described in embodiment 2 above.
Example 11
An embodiment of the present invention provides a video data processing apparatus, including:
a memory, a processor and a video data processing program stored in the memory and executable on the processor, wherein the video data processing program, when executed by the processor, implements the steps of the video data processing method described in embodiment 3 above.
Example 12
An embodiment of the present invention provides a video data processing apparatus, including:
a memory, a processor and a video data processing program stored in the memory and executable on the processor, wherein the video data processing program, when executed by the processor, implements the steps of the video data processing method described in embodiment 4 above.
Example 13
An embodiment of the present invention provides a computer-readable storage medium on which a video data processing program is stored; when the video data processing program is executed by a processor, the steps of the video data processing method described in embodiment 1 above are implemented.
Example 14
An embodiment of the present invention provides a computer-readable storage medium on which a video data processing program is stored; when the video data processing program is executed by a processor, the steps of the video data processing method described in embodiment 2 above are implemented.
Example 15
An embodiment of the present invention provides a computer-readable storage medium on which a video data processing program is stored; when the video data processing program is executed by a processor, the steps of the video data processing method described in embodiment 3 above are implemented.
Example 16
An embodiment of the present invention provides a computer-readable storage medium on which a video data processing program is stored; when the video data processing program is executed by a processor, the steps of the video data processing method described in embodiment 4 above are implemented.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, and functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and includes any information delivery media as known to those skilled in the art.
The video processing scheme of the present application is illustrated below by some examples.
Example 1
As shown in fig. 9, the present example provides a schematic structural composition diagram of a video data processing system, including: a video data processing server (10) and a video data processing terminal (20). The video data processing server is responsible for capture, encoding and compression, segment encapsulation, storage, and transmission control of audio and video sources, and can comprise a content acquisition module (101), an encoding module (102), an encapsulation module (103), and a storage and transmission module (104).
In the schematic diagram of the video data processing system shown in fig. 10, the content obtaining module, the encoding module, and the encapsulating module in fig. 9 are responsible for content production, and the storage and transmission module in fig. 9 is implemented on a server.
The content acquisition module records the audio-visual scene of the real physical world using a group of cameras, or a camera device with multiple cameras and sensors, together with audio sensors.
Video images shot by different cameras at the same time are spliced into an omnidirectional video and projected on a unit sphere. In this process, according to director art processing or user viewing statistics, a spherical zoom operation needs to be performed on a specific area of the omnidirectional video, and spherical zoom metadata is generated, which at least includes: sphere zoom area, sphere zoom parameter.
The position of the spherical zoom region is represented by the center point of the spherical zoom region and the range of the spherical zoom region. The center point of the spherical zoom region is expressed by the azimuth angle, pitch angle, and tilt angle of the rotation that moves the origin of the unit-sphere coordinate axes to the center point of the spherical zoom region. The spherical zoom region range refers to the azimuth angle range and pitch angle range passing through the center point of the spherical region.
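As a non-normative illustration of the center-point geometry just described, the following sketch maps an azimuth/pitch pair to a point on the unit sphere. The axis convention (X toward the viewer's front, Y to the left, Z up, as commonly used for omnidirectional video) is an assumption, and `sphere_point` is a hypothetical helper:

```python
import math

def sphere_point(azimuth_deg: float, elevation_deg: float):
    """Return the unit-sphere (x, y, z) coordinates of the point reached by
    rotating from the origin direction by the given azimuth and pitch angles."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az),   # forward component
            math.cos(el) * math.sin(az),   # leftward component
            math.sin(el))                  # upward component
```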
The spherical scaling parameters include at least: the scaling ratio of the spherical region zoom video, the scaling algorithm type, the boundary symbolization type, the textual description of the zoom video, and the like.
The encoding module encodes and compresses the digital video and audio signals output by the content acquisition module, usually generating audio and video elementary streams at multiple bit rates to cover different network bandwidth requirements.
The encapsulation module packages the original audio and video elementary streams into multiple media segment files of fixed duration and provides index information for the media segment files. Examples of such index information include: the Media Presentation Description (MPD) in Dynamic Adaptive Streaming over HTTP (DASH), or the media description file (M3U8) in HTTP Live Streaming (HLS), a streaming media transport protocol based on the HyperText Transfer Protocol (HTTP).
In addition, the encapsulation module is also responsible for adding the omnidirectional video spherical zoom metadata generated by the content acquisition module to one or more media files, and the method includes the following steps: different versions of an omnidirectional video track, or a timed metadata track.
In addition, the encapsulation module is also responsible for adding the omnidirectional video spherical zoom metadata generated by the content acquisition module to one or more index information, such as: a media presentation description file.
The storage and transmission module stores the media segment files output by the encapsulation module together with their index information. The storage and transmission module may be any suitable type of network server, such as a central node or edge node server of a Content Delivery Network (CDN), a proxy server, a Web server, or a combination thereof.
The video data processing terminal is responsible for supporting access, decoding, caching, and playback of media resources such as omnidirectional video.
The video data processing terminal comprises a streaming media client, such as a DASH client or an HLS client, which parses the index information of the media segment files and acquires the corresponding media segment file, such as an omnidirectional video file, according to changes in the user's viewing angle and posture. Wherein:
the streaming media client controls the rendering of the spherical region zoom video in the omnidirectional video by extracting the spherical zoom metadata (comprising at least a spherical zoom region and spherical zoom parameters) from the omnidirectional video track file or the timed metadata track file.
The streaming media client also requests access to the omnidirectional video file with the corresponding spherical zoom region and spherical zoom parameters according to a user's spherical zoom operation instruction.
As shown in fig. 10, the video player of the video data processing terminal, such as a virtual reality head-mounted display device (HMD), can track changes in the user's viewing angle and posture and magnify the image on its microdisplay through a visual optical system located in front of the user's eyes, providing an immersive VR video display effect.
Example 2
The present example provides a video data processing method that indicates a spherical zoom region and/or spherical zoom parameters of a projected omnidirectional video with a projected omnidirectional video box.
The present example provides a video data processing method, and the flow may include the steps of:
step S202, identifying a projected omnidirectional video box based on the scheme type parameters in the limited scheme information box;
step S204, determining the decoding frame of the video data as a projected omnidirectional video image according to the projected omnidirectional video box; the projected omnidirectional video box has a spherical zoom syntax element that indicates a spherical zoom region and/or spherical zoom parameters of the projected omnidirectional video image.
The type of omnidirectional video scheme for projection is described in step S202, and the omnidirectional video scheme for projection is described below with reference to an alternative embodiment.
For the restricted video sample entry type 'resv', the projected omnidirectional video scheme is used to indicate that the decoded image is a packed image containing monocular or binocular stereoscopic content. If scheme_type within the scheme type box in the restricted scheme information box is equal to 'podv' (projected omnidirectional video), it indicates that the projected omnidirectional video scheme is used.
The format of the projected omnidirectional video image is represented using a projected omnidirectional video box (ProjectedOmniVideoBox) contained in a scheme information box (SchemeInformationBox). When the scheme type is 'podv', there is one and only one ProjectedOmniVideoBox in the SchemeInformationBox.
In step S204, it is stated that the projected omnidirectional video box has a spherical scaling syntax element therein, and the spherical scaling syntax element indicates a spherical scaling region and/or a spherical scaling parameter of the projected omnidirectional video image.
The spherical zoom region of the projected omnidirectional video image comprises one or more of the following information:
a center point of the spherical zoom region;
the azimuth angle range and the pitch angle range of the spherical scaling region;
wherein the central point of the spherical zoom region comprises at least one of: azimuth angle of the center point, pitch angle of the center point, and tilt angle of the center point.
The spherical scaling parameters of the projected omnidirectional video image comprise one or more of the following information:
the scaling ratio of the spherical region zoom video;
the scaling algorithm type of the spherical region zoom video;
the boundary symbolization type of the spherical region zoom video;
the textual description of the spherical region zoom video;
the type of the spherical zoom region.
The spherical scaling syntax element that the projected omnidirectional video box has is described below in conjunction with an alternative embodiment.
The first optional implementation mode:
The spherical zoom of the projected omnidirectional video image is represented by a spherical region zooming box (SphereRegionZoomingBox) contained in the projected omnidirectional video box (ProjectedOmniVideoBox), which indicates information such as the spherical zoom region and spherical zoom parameters of the projected omnidirectional video image.
SphereRegionZoomingBox (spherical region zooming box)
Box Type:'srwz'
Container:ProjectedOmniVideoBox
Mandatory:No
Quantity:Zero or one
Syntax
(The SphereRegionZoomingBox syntax appears only as an image in the original publication and is not reproduced here.)
Semantics
zoom_shape_type is used to specify the shape of the spherical region representing the content coverage. zoom_shape_type equal to 0 indicates that the spherical region is specified by four great circles. zoom_shape_type equal to 1 indicates that the spherical region is specified by two azimuth circles and two elevation circles.
num_regions is used to specify the number of spherical regions.
A view_idc_presence_flag value of 0 indicates that view_idc[i] is not present; a value of 1 indicates that view_idc[i] is present and that the zoom video of the spherical region is associated with a particular view (left, right, or both).
A default_view_idc value of 0 indicates that the zoom video of the spherical region is monoscopic. A value of 1 indicates that the zoom video of the spherical region is on the left view. A value of 2 indicates that it is on the right view. A value of 3 indicates that it contains both a left view and a right view.
A view_idc[i] value of 1 indicates that the zoom video of the i-th spherical region is on the left view. A value of 2 indicates that it is on the right view. A value of 3 indicates that it contains both a left view and a right view. The value 0 is reserved.
zoom_ratio is used to represent the scaling ratio of the spherical region zoom video.
zoom_algorithm_type is used to identify the scaling algorithm of the spherical region zoom video.
zoom_symbolization_type is used to represent the boundary symbolization type of the spherical region zoom video.
zoom_area_type is used to indicate the omnidirectional video spherical zoom region type.
zoom_description is a null-terminated UTF-8 string that provides a textual description of the zoom video.
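Since the box's syntax is given only as an image in the original publication, the exact field layout is not recoverable here. As a non-normative sketch, the reader below assumes the per-region zoom parameters are laid out as four one-byte fields followed by a null-terminated UTF-8 description, matching the order of the semantics above; `read_zoom_params` and the layout itself are assumptions:

```python
import io
import struct

def read_zoom_params(buf: io.BytesIO) -> dict:
    """Read the assumed zoom-parameter layout: four unsigned 8-bit fields
    followed by a null-terminated UTF-8 description string."""
    zoom_ratio, algo, symbolization, area = struct.unpack('4B', buf.read(4))
    desc = bytearray()
    while (b := buf.read(1)) not in (b'', b'\x00'):  # stop at NUL or EOF
        desc += b
    return {
        'zoom_ratio': zoom_ratio,
        'zoom_algorithm_type': algo,
        'zoom_symbolization_type': symbolization,
        'zoom_area_type': area,
        'zoom_description': desc.decode('utf-8'),
    }
```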
Optional embodiment two:
The spherical zoom of the projected omnidirectional video image is represented by a zoom format box (ZoomFormatBox) contained in the coverage information box (CoverageInformationBox) within the projected omnidirectional video box (ProjectedOmniVideoBox), which indicates information such as the spherical zoom region and spherical zoom parameters of the projected omnidirectional video image. In this case, the entire spherical area covered by the projected omnidirectional video is the zoom video.
CoverageInformationBox (coverage information box)
Box Type:'covi'
Container:ProjectedOmniVideoBox
Mandatory:No
Quantity:Zero or one
(The ZoomFormatBox syntax appears only as an image in the original publication and is not reproduced here.)
Semantics
zoom_ratio is used to represent the scaling ratio of the spherical region zoom video.
zoom_algorithm_type is used to identify the scaling algorithm of the spherical region zoom video.
zoom_symbolization_type is used to represent the boundary symbolization type of the spherical region zoom video.
zoom_area_type is used to indicate the omnidirectional video spherical zoom region type.
zoom_description is a null-terminated UTF-8 string that provides a textual description of the zoom video.
Example 3
The present example provides a video data processing method that indicates a spherical zoom region and/or spherical zoom parameters of a zoomed omnidirectional video with a zoomed omnidirectional video box.
The present example provides a video data processing method, comprising:
step S302, identifying a scaled omnidirectional video box based on the scheme type parameter in the limited scheme information box;
step S304, determining the decoded frame of the video data as a scaled omnidirectional video image according to the scaled omnidirectional video box.
The scaled omnidirectional video scheme type is described in step S302, and the scaled omnidirectional video scheme type is described below with reference to an alternative embodiment.
For the restricted video sample entry type 'resv', the scaled omnidirectional video scheme is used to indicate that the decoded image is a zoomed image containing monocular or binocular stereoscopic content. If scheme_type within the scheme type box (SchemeTypeBox) in the restricted scheme information box (RestrictedSchemeInfoBox) is equal to 'zodv' (zoomed omnidirectional video), it indicates that the scaled omnidirectional video scheme is used.
The format of the scaled omnidirectional video image is represented using a scaled omnidirectional video box (ZoomedOmniVideoBox) contained in a scheme information box (SchemeInformationBox). When the scheme type is 'zodv', there is one and only one ZoomedOmniVideoBox in the SchemeInformationBox.
The 'zodv' scheme type is defined as an open scheme type for scaled omnidirectional video.
When the ZoomedOmniVideoBox appears in the SchemeInformationBox, the projected omnidirectional video box (ProjectedOmniVideoBox) must appear in the same SchemeInformationBox. The zoomed video format of the projected omnidirectional video is indicated by the ZoomedOmniVideoBox contained in the SchemeInformationBox.
When the ZoomedOmniVideoBox appears in the SchemeInformationBox, a stereoscopic video box (StereoVideoBox) may appear in the same SchemeInformationBox.
Wherein the scaled omnidirectional video box has a spherical scaling syntax element indicating a spherical scaling parameter of the scaled omnidirectional video image.
The spherical zoom region of the zoomed omnidirectional video image comprises one or more of the following information:
a center point of the spherical zoom region;
the azimuth angle range and the pitch angle range of the spherical scaling region;
wherein the central point of the spherical zoom region comprises at least one of: azimuth angle of the center point, pitch angle of the center point, and tilt angle of the center point.
The spherical scaling parameters of the scaled omnidirectional video image comprise one or more of the following information:
the scaling ratio of the spherical region zoom video;
the scaling algorithm type of the spherical region zoom video;
the boundary symbolization type of the spherical region zoom video;
the textual description of the spherical region zoom video;
the type of the spherical zoom region.
A scaled omnidirectional video box is described below in conjunction with an alternative embodiment.
ZoomedOmniVideoBox (zoomed omnidirectional video box)
Box Type:'zodv'
Container:SchemeInformationBox
Mandatory:Yes(when the SchemeType is'zodv')
Quantity:One
The ZoomedOmniVideoBox (scaled omnidirectional video box) is used to indicate that a decoded frame is a zoomed omnidirectional video image containing monocular or binocular stereoscopic content. When SchemeType is equal to 'zodv', the scaled omnidirectional video box ZoomedOmniVideoBox shall be used.
Syntax
aligned(8) class ZoomedOmniVideoBox extends FullBox('zodv', 0, 0)
{
    unsigned int(8) zoom_ratio;
    unsigned int(8) zoom_algorithm_type;
    unsigned int(8) zoom_symbolization_type;
    unsigned int(8) zoom_area_type;
    string zoom_description;
}
Semantics
zoom_ratio represents the scaling ratio of the spherical area scaled video.
zoom_algorithm_type identifies the scaling algorithm of the spherical area scaled video.
zoom_symbolization_type represents the boundary symbolization type of the spherical area scaled video.
zoom_area_type indicates the omnidirectional video spherical zoom area type.
zoom_description is a null-terminated UTF-8 string that provides a textual description of the scaled video.
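The fields above can be serialized following the standard ISOBMFF FullBox framing (32-bit size, 4-byte type, 8-bit version, 24-bit flags). The following Python sketch is a non-normative illustration; the helper names and the sample field values are hypothetical:

```python
import struct

def full_box(box_type: bytes, version: int, flags: int, payload: bytes) -> bytes:
    # ISOBMFF FullBox framing: 32-bit size (header included), 4-byte type,
    # 1-byte version, 24-bit flags, then the box payload
    size = 4 + 4 + 1 + 3 + len(payload)
    return struct.pack(">I4sB", size, box_type, version) + flags.to_bytes(3, "big") + payload

def zoomed_omni_video_box(zoom_ratio: int, zoom_algorithm_type: int,
                          zoom_symbolization_type: int, zoom_area_type: int,
                          zoom_description: str) -> bytes:
    # Field order and widths follow the ZoomedOmniVideoBox syntax above
    payload = struct.pack(">BBBB", zoom_ratio, zoom_algorithm_type,
                          zoom_symbolization_type, zoom_area_type)
    payload += zoom_description.encode("utf-8") + b"\x00"  # null-terminated UTF-8 string
    return full_box(b"zodv", 0, 0, payload)

box = zoomed_omni_video_box(2, 1, 0, 1, "2x zoom of front region")
```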
In step S304, the scaled omnidirectional video box indicates the scaled omnidirectional video image, i.e., that the video data is a scaled omnidirectional video track; the scaled omnidirectional video track is described below with reference to an optional implementation.
In a file containing video samples, if there is a limited scheme information box (contained in a Track Box within a Movie Box) and the scheme type (scheme_type) parameter within the scheme type box of the limited scheme information box is equal to 'zodv', it indicates that the video file uses the scaled omnidirectional video scheme and contains scaled omnidirectional video tracks. A Track Reference Box may be included in the scaled omnidirectional video track, and is described below in connection with an optional implementation.
TrackReferenceBox (track reference box)
Box Type: 'tref'
Container: TrackBox
Mandatory: No
Quantity: Zero or one
Syntax
(Syntax shown as an image in the original; not reproduced.)
Semantics
track_IDs provides an array of integer track identifiers for the referenced tracks; the values in the array must not be repeated.
reference_type is set to one of the following values:
'vzom': the track contains auxiliary zoom video information for the referenced track.
Fig. 11 is a schematic diagram of a track reference box of a scaled omnidirectional video track according to an embodiment of the invention.
As shown in fig. 11, a Track Reference Box is included in the scaled omnidirectional video track, and the track identification parameter (track_IDs[]) in the track reference box provides the identifier of the referenced video track, i.e., the track identifier of the projected omnidirectional video. The reference type parameter (reference_type) in the track reference box takes the value 'vzom', indicating that this video track contains auxiliary zoom video information for the video track referenced by track_IDs[].
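The track reference of fig. 11 can be illustrated with a small serialization sketch. It assumes the standard ISOBMFF layout of a TrackReferenceBox ('tref') wrapping a TrackReferenceTypeBox whose type is the reference type and whose payload is an array of 32-bit track identifiers; the helper names and the track ID value are hypothetical:

```python
import struct

def plain_box(fourcc: bytes, payload: bytes) -> bytes:
    # Plain ISOBMFF Box: 32-bit size (header included) + 4-byte type
    return struct.pack(">I4s", 8 + len(payload), fourcc) + payload

def track_reference_box(reference_type: bytes, track_ids) -> bytes:
    # 'tref' containing one TrackReferenceTypeBox; its type is the reference type
    # and its payload is the track_IDs[] array of 32-bit identifiers
    ids = b"".join(struct.pack(">I", tid) for tid in track_ids)
    return plain_box(b"tref", plain_box(reference_type, ids))

# Scaled omnidirectional video track referencing projected track 1 via 'vzom'
tref = track_reference_box(b"vzom", [1])
```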
The scaled omnidirectional video track may include a Track Group Box (TrackGroupBox), which is described below in connection with an optional implementation.
A track group type box (TrackGroupTypeBox) with track_group_type equal to 'zoom' indicates that the track is either a scaled omnidirectional video track or a projected omnidirectional video track.
ZoomedOmniVideoGroupBox (scaled omnidirectional video group box)
Box Type: 'zoom'
Container: TrackBox
Mandatory: No
Quantity: Zero or one
Video tracks having the same track_group_id value in a scaled omnidirectional video group box (ZoomedOmniVideoGroupBox) constitute a pair of scaled omnidirectional video and projected omnidirectional video tracks corresponding to the same content source.
Syntax
(Syntax shown as an image in the original; not reproduced.)
Semantics
zoom_flag equal to 0 indicates projected omnidirectional video, and zoom_flag equal to 1 indicates scaled omnidirectional video. If the track_group_id values of two tracks are the same, their zoom_flag values shall differ.
track_group_type represents the group type and shall be set to one of the following values:
'zoom': the track belongs to the scaled omnidirectional video group box.
Fig. 12 is a diagram of a track group box for a scaled omnidirectional video according to an embodiment of the invention.
As shown in fig. 12, a track group type box with the track group type (track_group_type) parameter equal to 'zoom', i.e., a scaled omnidirectional video group box, is included in the scaled omnidirectional video track. Video tracks having the same track group identification (track_group_id) value in the scaled omnidirectional video group box constitute a pair of scaled omnidirectional video and projected omnidirectional video tracks corresponding to the same content source.
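The grouping rule of fig. 12 can be sketched as follows, using the zoom_flag semantics given above (0 indicates projected, 1 indicates scaled omnidirectional video). The in-memory track records here are hypothetical stand-ins for parsed track boxes:

```python
def pair_tracks(tracks):
    """Group tracks by track_group_id and pair the projected (zoom_flag = 0)
    track with the scaled (zoom_flag = 1) track in each group."""
    groups = {}
    for t in tracks:
        groups.setdefault(t["track_group_id"], []).append(t)
    pairs = {}
    for gid, members in groups.items():
        projected = [t for t in members if t["zoom_flag"] == 0]
        scaled = [t for t in members if t["zoom_flag"] == 1]
        if projected and scaled:
            pairs[gid] = (projected[0], scaled[0])
    return pairs

tracks = [
    {"track_id": 1, "track_group_id": 100, "zoom_flag": 0},  # projected
    {"track_id": 2, "track_group_id": 100, "zoom_flag": 1},  # scaled
]
pairs = pair_tracks(tracks)
```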
The track_group_type in the track group type boxes (TrackGroupTypeBox) of multiple differently scaled omnidirectional video tracks corresponding to the same content source is set to 'alte' with the same track_group_id field value, indicating that they belong to the same track group. Alternatively,
the Track Header Boxes (TrackHeaderBox) of multiple differently scaled omnidirectional video tracks corresponding to the same content source have the same alternate_group field value, indicating that they belong to the same track group.
Only one scaled omnidirectional video in a group of tracks can be transmitted or played at any one time.
A Track Selection Box (Track Selection Box) may be included in the scaled omnidirectional video Track, and is described below in connection with an alternative implementation.
The attribute list provided by the attribute_list[] parameter in the Track Selection Box is used to describe or distinguish different scaled omnidirectional video tracks in a track group.
TrackSelectionBox (track selection box)
Box Type: 'tsel'
Container: UserDataBox of the corresponding TrackBox
Mandatory: No
Quantity: Zero or one
Syntax
(Syntax shown as an image in the original; not reproduced.)
Semantics
attribute_list is a list of attributes used to describe or distinguish different video tracks from the same track group.
As shown in table 1, the following attributes are used to describe the track.
Table 1 (shown as an image in the original; contents not reproduced)
As shown in table 2, the following attributes are used to distinguish the tracks.
Table 2 (shown as an image in the original; contents not reproduced)
Example 4
The present example provides a video data processing method that indicates a spherical zoom region and/or spherical zoom parameters of a projected omnidirectional video using a projected omnidirectional video box and a zoomed omnidirectional video box.
The present example provides a video data processing method, comprising:
step S402, identifying a projected omnidirectional video box and a zoomed omnidirectional video box based on the scheme type parameters in the limited scheme information box;
step S404, determining the decoding frame of the video data as a projected omnidirectional video image according to the projected omnidirectional video box; the scaled omnidirectional video box has a spherical scaling syntax element that indicates a spherical scaling region and/or spherical scaling parameters of the projected omnidirectional video image.
In step S402, the projected omnidirectional video scheme type and the scaled omnidirectional video scheme type are described with reference to an optional implementation. For the restricted video sample entry type 'resv', the projected omnidirectional video scheme is used to indicate that the decoded image is an encapsulated image containing monocular or binocular stereoscopic content. If scheme_type within the scheme type box in the limited scheme information box is equal to 'podv' (projected omnidirectional video), it indicates that the projected omnidirectional video scheme is used.
The format of the projected omnidirectional video image is represented using a projected omnidirectional video box (ProjectedOmniVideoBox) contained in a scheme information box (SchemeInformationBox). When the scheme type is 'podv', there is one and only one ProjectedOmniVideoBox in the SchemeInformationBox. For the restricted video sample entry type 'resv', the scaled omnidirectional video scheme is used to indicate that the decoded image is a scaled image containing monocular or binocular stereoscopic content. If scheme_type within the scheme type box (SchemeTypeBox) in the restricted scheme information box (RestrictedSchemeInfoBox) is equal to 'zodv' (zoomed omnidirectional video), it indicates that the scaled omnidirectional video scheme is used.
The format of the scaled omnidirectional video image is represented using a scaled omnidirectional video box (ZoomedOmniVideoBox) contained in a scheme information box (SchemeInformationBox). When the scheme type is 'zodv', there is one and only one ZoomedOmniVideoBox in the SchemeInformationBox.
The 'zodv' scheme type is defined as an open scheme type of scaled omnidirectional video.
When the ProjectedOmniVideoBox appears in the SchemeInformationBox, a scaled omnidirectional video box (ZoomedOmniVideoBox) may appear in the same SchemeInformationBox, indicating the scaled video format of the projected omnidirectional video image in a particular spherical area. If the ZoomedOmniVideoBox does not exist, it indicates that no scaled video exists for the projected omnidirectional video.
In step S404, it is stated that the scaled omnidirectional video box has a spherical scaling syntax element indicating a spherical scaling region and/or a spherical scaling parameter of the projected omnidirectional video image, which is described below with reference to an optional implementation.
The spherical zoom region of the projected omnidirectional video image comprises one or more of the following information:
a center point of the spherical zoom region;
the azimuth angle range and the pitch angle range of the spherical scaling region;
wherein the central point of the spherical zoom region comprises at least one of: azimuth angle of the center point, pitch angle of the center point, and tilt angle of the center point.
The spherical scaling parameter of the projected omnidirectional video image comprises one or more of the following information:
the scaling ratio of the spherical area scaled video;
the scaling algorithm type of the spherical area scaled video;
the boundary symbolization type of the spherical area scaled video;
a textual description of the spherical area scaled video;
the type of the spherical zoom region.
Scaled omnidirectional video boxes are described below in connection with alternative implementations.
ZoomedOmniVideoBox (scaled omnidirectional video box)
Box Type: 'zodv'
Container: SchemeInformationBox
Mandatory: Yes (when the SchemeType is 'zodv')
Quantity: One
The ZoomedOmniVideoBox (scaled omnidirectional video box) is used to indicate that the decoded frame contains scaled omnidirectional video images. When SchemeType is equal to 'zodv', the ZoomedOmniVideoBox shall be used.
Syntax
(Syntax shown as an image in the original; not reproduced.)
Semantics
zoom_shape_type specifies the shape of the spherical area representing the content coverage. zoom_shape_type equal to 0 indicates that the spherical area is specified by four great circles. zoom_shape_type equal to 1 indicates that the spherical area is specified by two azimuth circles and two elevation circles.
num_regions specifies the number of spherical areas.
view_idc_presence_flag equal to 0 indicates that view_idc[i] is not present; a value of 1 indicates that view_idc[i] is present and indicates whether the scaled video of the spherical area is associated with a particular view (left, right, or both).
default_view_idc equal to 0 indicates that the scaled video of the spherical area is monoscopic. A value of 1 indicates that the scaled video of the spherical area is on the left view. A value of 2 indicates that it is on the right view. A value of 3 indicates that it contains both a left view and a right view.
view_idc[i] equal to 1 indicates that the scaled video of the i-th spherical area is on the left view. A value of 2 indicates that it is on the right view. A value of 3 indicates that it contains both a left view and a right view. The value 0 is reserved.
zoom_ratio indicates the scaling ratio of the omnidirectional video spherical area scaled video.
zoom_algorithm_type indicates the scaling algorithm type of the omnidirectional video spherical area scaled video.
zoom_symbolization_type indicates the boundary symbolization type of the omnidirectional video spherical area scaled video.
zoom_area_type indicates the omnidirectional video spherical zoom area type.
zoom_description is a null-terminated UTF-8 string providing a textual description of the scaled omnidirectional video.
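The view_idc / default_view_idc coding described above maps values to view associations as in the following non-normative helper (value meanings are taken from the semantics just given; the function name and string labels are hypothetical):

```python
def view_association(view_idc: int) -> str:
    # Per the semantics above: 0 = monoscopic (meaningful for default_view_idc;
    # reserved for view_idc[i]), 1 = left view, 2 = right view, 3 = both views
    mapping = {0: "mono", 1: "left", 2: "right", 3: "both"}
    if view_idc not in mapping:
        raise ValueError("invalid view_idc value")
    return mapping[view_idc]
```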
Example 5
A timed metadata track is a mechanism in the ISO base media file format (ISOBMFF) for establishing timed metadata associated with particular samples. Timed metadata is loosely coupled to the media data and is usually "descriptive".
The present example provides a video data processing method that uses spherical zoom timing metadata to indicate spherical zoom regions and/or spherical zoom parameters of a referenced omnidirectional video.
The present example provides a video data processing method, comprising:
step S502, determining a sample entry type;
step S504, identifying a spherical scaling timing metadata track in the video data based on a sample entry type; the spherical scaling timing metadata track has a spherical scaling syntax element that indicates a spherical scaling region and/or spherical scaling parameters of the referenced omnidirectional video.
The spherical zoom timing metadata track is described below with reference to an optional implementation.
This example illustrates a general timing metadata track syntax for omnidirectional video spherical scaling. The purpose of the spherical scaling timing metadata track is indicated by a track sample entry type (sample entry), each sample indicating a spherical scaling region and/or spherical scaling parameters of an omnidirectional video.
The spherical zoom timing metadata track indicates the spherical zoom region and/or spherical zoom parameters with which omnidirectional video zooming should be performed when the user does not control, or has released control of, omnidirectional video zooming.
The spherical zoom timing metadata track may indicate a spherical zoom area and/or spherical zoom parameters of the omnidirectional video track based on a director's cut or based on a statistical measure.
The spherical scaling information box contained in the sample entry of the spherical scaling timing metadata track has a spherical scaling syntax element indicating the spherical scaling parameters of the omnidirectional video referenced by the spherical scaling timing metadata track.
The spherical zoom region of the zoomed omnidirectional video image comprises one or more of the following information:
a center point of the spherical zoom region;
the azimuth angle range and the pitch angle range of the spherical scaling region;
wherein the central point of the spherical zoom region comprises at least one of: azimuth angle of the center point, pitch angle of the center point, and tilt angle of the center point.
The spherical scaling parameters of the omnidirectional video comprise one or more of the following information:
the scaling ratio of the spherical area scaled video;
the scaling algorithm type of the spherical area scaled video;
the boundary symbolization type of the spherical area scaled video;
a textual description of the spherical area scaled video;
the type of the spherical zoom region.
The sphere scaling timing metadata track sample entry and sphere scaling information box are described below in connection with alternative implementations.
A spherical scaling timing metadata track (timed metadata for spherical scaling) shall use the track sample entry type 'spzm', which is defined as follows:
Syntax:
(Syntax shown as an image in the original; not reproduced.)
Semantics:
zoom_ratio indicates the scaling ratio of the omnidirectional video spherical area scaled video.
zoom_algorithm_type indicates the scaling algorithm type of the omnidirectional video spherical area scaled video.
zoom_symbolization_type indicates the boundary symbolization type of the omnidirectional video spherical area scaled video.
zoom_area_type indicates the omnidirectional video spherical zoom area type, as shown in table 3 below:
Table 3 (shown as an image in the original; contents not reproduced)
zoom_description is a null-terminated UTF-8 string providing a textual description of the spherical area scaled video of the omnidirectional video.
The spherical scaling timing metadata track sample entry type inherits from the sphere region sample entry (SphereRegionSampleEntry) type. The shape type (shape_type) parameter in the contained sphere region configuration box (SphereRegionConfigBox) shall be equal to 0.
Wherein each sample in the spherical scaling timing metadata track has a spherical scaling syntax element indicating a spherical scaling region of the omnidirectional video referenced by the spherical scaling timing metadata track.
The spherical zoom region of the omnidirectional video includes one or more of the following information:
a center point of the spherical zoom region;
the azimuth angle range and the pitch angle range of the spherical scaling region;
wherein the central point of the spherical zoom region comprises at least one of: azimuth angle of the center point, pitch angle of the center point, and tilt angle of the center point.
The spherical scaling timing metadata track sample format is described below in connection with an alternative implementation.
The spherical scaling timing metadata track sample format shall inherit the syntax of the sphere region sample (SphereRegionSample). If the static azimuth range parameter (static_azimuth_range) and static elevation range parameter (static_elevation_range), or the azimuth range parameter (azimuth_range) and elevation range parameter (elevation_range), are present, they indicate the azimuth range and elevation range of the omnidirectional video spherical zoom region, respectively. The center azimuth parameter (center_azimuth) and center elevation parameter (center_elevation) represent the center point of the omnidirectional video spherical zoom region relative to the global coordinate axes, and the center tilt parameter (center_tilt) indicates the tilt angle of the omnidirectional video spherical zoom region.
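A non-normative sketch of decoding one such sample follows. It assumes the OMAF convention that sphere region angles are carried as 32-bit values in units of 2^-16 degrees and that the azimuth and elevation ranges are present in the sample (i.e., not static); the field order and function name are illustrative:

```python
import struct

def parse_sphere_region_sample(data: bytes) -> dict:
    # center_azimuth, center_elevation, center_tilt: signed 32-bit;
    # azimuth_range, elevation_range: unsigned 32-bit; all in 2^-16 degrees
    ca, ce, ct, ar, er = struct.unpack(">iiiII", data[:20])
    to_deg = lambda v: v / 65536.0
    return {
        "center_azimuth": to_deg(ca),
        "center_elevation": to_deg(ce),
        "center_tilt": to_deg(ct),
        "azimuth_range": to_deg(ar),
        "elevation_range": to_deg(er),
    }

# Hypothetical sample: center (45, 10, 0) degrees, 90 x 60 degree extent
sample = struct.pack(">iiiII", 45 << 16, 10 << 16, 0, 90 << 16, 60 << 16)
region = parse_sphere_region_sample(sample)
```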
In step S504, it is stated that the spherical scaling timing metadata track (including the sample entry and each sample) has a spherical scaling syntax element indicating the spherical scaling region and/or spherical scaling parameters of the referenced omnidirectional video. The referencing of an omnidirectional video track by a spherical zoom timing metadata track is described below in connection with an optional implementation.
The spherical scaling timing metadata track references one or more omnidirectional video tracks through a Track Reference Box whose reference type (reference_type) is 'cdsc'.
The spherical scaling timing metadata track references the track group identification (track_group_id) of a track group whose track group type (track_group_type) is equal to 'zoom' through a Track Reference Box whose reference type (reference_type) is 'cdsc'.
Wherein the spherical scaling timing metadata track describes the spherical scaling region and/or spherical scaling parameters of the track in the track group whose zoom_flag is equal to 1 (i.e., the scaled omnidirectional video track); alternatively,
the spherical scaling timing metadata track describes the spherical scaling region of the track in the track group whose zoom_flag is equal to 0 (i.e., the projected omnidirectional video track) and the spherical scaling parameters of the track whose zoom_flag is equal to 1 (i.e., the scaled omnidirectional video track).
Fig. 13 is a first schematic diagram of a spherical zoom timing metadata track referencing an omnidirectional video track according to an embodiment of the present invention.
As shown in fig. 13, a Track Reference Box is included in the spherical scaling timing metadata video track whose sample entry type is equal to 'spzm', and the track identification parameter (track_IDs[]) in the track reference box provides the track identifier of the referenced omnidirectional video, which in this embodiment is the identifier of a scaled omnidirectional video track. The reference type (reference_type) parameter in the track reference box takes the value 'cdsc', indicating that the timing metadata track contains content description information for the video track referenced by track_IDs[]. In this embodiment, the spherical zoom timing metadata video track references omnidirectional video tracks located in the same file for zooming, providing spherical zoom regions and/or spherical zoom parameters.
Fig. 14 is a second schematic diagram of a spherical zoom timing metadata track referencing an omnidirectional video track according to an embodiment of the present invention.
Similarly, a spherical scaling timing metadata video track with a sample entry type equal to 'spzm' references tracks of omnidirectional video. As shown in fig. 14, in this embodiment the spherical zoom timing metadata video track references omnidirectional video tracks located in different files for zooming, providing spherical zoom regions and/or spherical zoom parameters.
It should be noted that the present invention can be embodied in other specific forms, and various changes and modifications can be made by those skilled in the art without departing from the spirit and scope of the invention.

Claims (41)

1. A video data processing method, comprising:
identifying a projected omnidirectional video box based on a scheme type parameter in the limited scheme information box; wherein the projected omnidirectional video box is used for representing the format of the projected omnidirectional video image;
determining a decoding frame of the video data as a projected omnidirectional video image according to the projected omnidirectional video box; the projected omnidirectional video box has a spherical zoom syntax element that indicates a spherical zoom region and/or spherical zoom parameters to perform a spherical zoom operation for a particular region on the projected omnidirectional video image.
2. The method of claim 1, wherein identifying the projected omnidirectional video box based on the scheme type parameter in the restricted scheme information box comprises:
if the value of the scheme type parameter in the limited scheme information box is a first scheme type, it indicates that the video data uses a projected omnidirectional video scheme.
3. The method of claim 1, wherein the projected omnidirectional video box has spherical scaling syntax elements, comprising:
the overlay information box of the projected omnidirectional video box contains a scaling format box having the spherical scaling syntax element.
4. The method of claim 1, wherein the projected omnidirectional video box has spherical scaling syntax elements, comprising:
the projected omnidirectional video box comprises a spherical area scaling box having the spherical scaling syntax element.
5. The method of any one of claims 1-4, wherein the spherical zoom region of the projected omnidirectional video image comprises information of one or more of:
a center point of the spherical zoom region;
the azimuth angle range and the pitch angle range of the spherical scaling region;
wherein the central point of the spherical zoom region comprises at least one of: azimuth angle of the center point, pitch angle of the center point, and tilt angle of the center point.
6. The method of any one of claims 1-4, wherein the spherical scaling parameters of the projected omnidirectional video image include one or more of the following information:
the scaling ratio of the spherical area scaled video;
the scaling algorithm type of the spherical area scaled video;
the boundary symbolization type of the spherical area scaled video;
a textual description of the spherical area scaled video;
the type of the spherical zoom region.
7. A video data processing method, comprising:
identifying a scaled omnidirectional video box based on a scheme type parameter in the restricted scheme information box; wherein a scheme information box of the restricted scheme information boxes includes a scaled omnidirectional video box indicating a format of the scaled omnidirectional video image;
and determining a decoding frame of the video data as a scaled omnidirectional video image according to the scaled omnidirectional video box.
8. The method of claim 7, wherein the identifying a scaled omnidirectional video box based on a scheme type parameter in a restricted scheme information box comprises:
if the scheme type parameter in the limited scheme information box takes the value of a second scheme type, it indicates that the video data uses a scaled omnidirectional video scheme.
9. The method of claim 7, wherein:
the scaled omnidirectional video box has a spherical scaling syntax element that indicates a spherical scaling region and/or spherical scaling parameters to perform a spherical scaling operation for a particular region on the scaled omnidirectional video image.
10. The method of claim 9, wherein the spherical zoom region of the zoomed omnidirectional video image comprises one or more of the following information:
a center point of the spherical zoom region;
the azimuth angle range and the pitch angle range of the spherical scaling region;
wherein the central point of the spherical zoom region comprises at least one of: azimuth angle of the center point, pitch angle of the center point, and tilt angle of the center point.
11. The method of claim 9, wherein the spherical scaling parameters of the scaled omnidirectional video image comprise one or more of the following information:
the scaling ratio of the spherical area scaled video;
the scaling algorithm type of the spherical area scaled video;
the boundary symbolization type of the spherical area scaled video;
a textual description of the spherical area scaled video;
the type of the spherical zoom region.
12. The method of claim 7, wherein:
the scaled omnidirectional video box indicates that the video data is a scaled omnidirectional video track;
wherein the scaled omnidirectional video track comprises a track reference box, and a track identification parameter in the track reference box references a track identifier of a projected omnidirectional video track that belongs to the same restricted scheme information box as the scaled omnidirectional video track.
13. The method of claim 12, wherein:
if the reference type parameter in the track reference box is a first reference type, it indicates that the scaled omnidirectional video track contains auxiliary scaled video information of the referenced projected omnidirectional video track.
14. The method of claim 7, wherein:
the scaled omnidirectional video box indicates that the video data is a scaled omnidirectional video track;
the scaled omnidirectional video track includes a track group type box;
if the track group type parameter of the track group type box takes on a first track group type, indicating that the zoomed omnidirectional video track belongs to a zoomed omnidirectional video group; the zoomed omnidirectional video group comprises zoomed omnidirectional video tracks corresponding to the same content source and projected omnidirectional video tracks.
15. The method of claim 7, wherein:
the scaled omnidirectional video box indicates that the video data is a scaled omnidirectional video track;
the scaled omnidirectional video tracks contain track selection boxes with a list of attributes that describe or distinguish the different scaled omnidirectional video tracks.
16. The method of claim 15, wherein:
the list of attributes includes at least one of the following attributes:
one or more content coverage spherical areas in the video track;
the scaling ratio of a spherical area in the video track;
the scaling algorithm type of a spherical area in the video track;
the boundary symbolization type of a spherical area in the video track;
the zoom area type of a spherical area in the video track.
17. A video data processing method, comprising:
identifying a projected omnidirectional video box and a scaled omnidirectional video box based on a scheme type parameter in the limited scheme information box; wherein the projected omnidirectional video box is used for representing the format of the projected omnidirectional video image;
determining a decoding frame of the video data as a projected omnidirectional video image according to the projected omnidirectional video box; the scaled omnidirectional video box has a spherical scaling syntax element that indicates a spherical scaling region and/or spherical scaling parameters to perform a spherical scaling operation for a particular region on the projected omnidirectional video image; wherein the scaled omnidirectional video box indicates a scaled video format of the projected omnidirectional video image at a particular spherical area.
18. The method of claim 17, wherein:
if the scaled omnidirectional video box does not exist in the restricted scheme information box, indicating that a spherical scaled video does not exist in the projected omnidirectional video image.
19. The method of claim 17, wherein the spherical zoom region of the projected omnidirectional video image comprises one or more of the following information:
a center point of the spherical zoom region;
the azimuth angle range and the pitch angle range of the spherical scaling region;
wherein the central point of the spherical zoom region comprises at least one of: azimuth angle of the center point, pitch angle of the center point, and tilt angle of the center point.
20. The method of claim 17, wherein the spherical scaling parameters of the projected omnidirectional video image comprise one or more of the following information:
the scaling ratio of the spherical area scaled video;
the scaling algorithm type of the spherical area scaled video;
the boundary symbolization type of the spherical area scaled video;
a textual description of the spherical area scaled video;
the type of the spherical zoom region.
21. A video data processing method, comprising:
identifying the video data file as a spherical scaling timing metadata track based on the sample entry type;
wherein the spherical scaling timing metadata track has a spherical scaling syntax element indicating a spherical scaling region and/or spherical scaling parameters for a specific region on the referenced omnidirectional video to perform a spherical scaling operation.
22. The method of claim 21, wherein:
the spherical scaled timed metadata track references one or more omnidirectional video tracks by referencing a track reference box of a second reference type.
23. The method of claim 21, wherein:
the spherical scaling timed metadata track references, via a track reference box whose reference type is the second reference type, the track group identifier of a track group whose track group type is the first track group type.
24. The method of claim 21, wherein:
the spherical scaling timed metadata track uses a sample entry type that is the first sample entry type.
25. The method of claim 21 or 24, wherein:
a spherical scaling information box carrying the spherical scaling syntax elements is included in the sample entry of the spherical scaling timed metadata track.
26. The method of claim 21 or 24, wherein:
each sample of the spherical scaling timed metadata track has a spherical scaling syntax element that indicates a spherical scaling region of the referenced omnidirectional video.
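Claims 24 through 26 place the spherical scaling syntax elements both in the sample entry of the timed metadata track and in each of its samples. The sketch below shows how one such sample might be serialized — the 'spzm' sample entry code and the flat 20-byte layout are invented for illustration; the claims only speak of "a first sample entry type":

```python
import struct

# Hypothetical sample entry fourcc for the spherical scaling timed metadata
# track; the claims only call it "a first sample entry type".
ZOOM_SAMPLE_ENTRY_TYPE = "spzm"

# One sample = the zoom region of the referenced omnidirectional video at
# that instant: three signed centre angles plus two unsigned angle ranges.
_SAMPLE_FMT = ">3i2I"  # big-endian, 20 bytes

def pack_zoom_sample(centre_azimuth: int, centre_pitch: int, centre_tilt: int,
                     azimuth_range: int, pitch_range: int) -> bytes:
    """Serialize the per-sample spherical zoom region (claim 26)."""
    return struct.pack(_SAMPLE_FMT, centre_azimuth, centre_pitch,
                       centre_tilt, azimuth_range, pitch_range)

def unpack_zoom_sample(sample: bytes) -> tuple:
    """Recover the region fields from a serialized sample."""
    return struct.unpack(_SAMPLE_FMT, sample)
```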
27. The method of claim 21, wherein the spherical zoom region of the omnidirectional video comprises one or more of the following information:
a center point of the spherical zoom region;
the azimuth angle range and the pitch angle range of the spherical zoom region;
wherein the central point of the spherical zoom region comprises at least one of: azimuth angle of the center point, pitch angle of the center point, and tilt angle of the center point.
28. The method of claim 21, wherein the spherical scaling parameters of the omnidirectional video comprise one or more of the following information:
a scaling ratio of the spherical area scaled video;
the type of the scaling algorithm of the spherical area scaled video;
the boundary identification type of the spherical area scaled video;
a textual description of the spherical area scaled video;
the type of the spherical zoom region.
29. The method of claim 21, wherein:
the spherical scaling timed metadata track indicates a spherical zoom region and/or spherical zoom parameters of the referenced omnidirectional video track based on a director's cut or on statistical measurements.
30. A video data processing apparatus comprising:
a first processing module for identifying a projected omnidirectional video box based on a scheme type parameter in the restricted scheme information box; wherein the projected omnidirectional video box is used for representing the format of the projected omnidirectional video image;
a second processing module for determining a decoded frame of the video data as a projected omnidirectional video image according to the projected omnidirectional video box; the projected omnidirectional video box has a spherical zoom syntax element that indicates a spherical zoom region and/or spherical zoom parameters for performing a spherical zoom operation on a particular region of the projected omnidirectional video image.
31. A video data processing apparatus comprising:
a first processing module for identifying a scaled omnidirectional video box based on a scheme type parameter in a restricted scheme information box; wherein a scheme information box within the restricted scheme information box includes the scaled omnidirectional video box, which indicates the format of the scaled omnidirectional video image;
a second processing module for determining a decoded frame of the video data as a scaled omnidirectional video image according to the scaled omnidirectional video box.
32. A video data processing apparatus comprising:
a first processing module for identifying a projected omnidirectional video box and a scaled omnidirectional video box based on a scheme type parameter in the restricted scheme information box; wherein the projected omnidirectional video box is used for representing the format of the projected omnidirectional video image;
a second processing module for determining a decoded frame of the video data as a projected omnidirectional video image according to the projected omnidirectional video box; the scaled omnidirectional video box has a spherical scaling syntax element that indicates a spherical scaling region and/or spherical scaling parameters for performing a spherical scaling operation on a particular region of the projected omnidirectional video image; wherein the scaled omnidirectional video box indicates the format of the scaled video of the projected omnidirectional video image in a particular spherical area.
33. A video data processing apparatus comprising:
a first processing module for determining a sample entry type;
a second processing module for identifying the video data file as a spherical scaling timed metadata track according to the sample entry type; wherein the spherical scaling timed metadata track has a spherical scaling syntax element that indicates a spherical scaling region and/or spherical scaling parameters for performing a spherical scaling operation on a specific region of the referenced omnidirectional video.
34. A video data processing apparatus comprising:
a memory, a processor, and a video data processing program stored on the memory and executable on the processor, wherein the video data processing program, when executed by the processor, implements the steps of the video data processing method of any of claims 1-6.
35. A video data processing apparatus comprising:
a memory, a processor, and a video data processing program stored on the memory and executable on the processor, wherein the video data processing program, when executed by the processor, implements the steps of the video data processing method of any of claims 7-16.
36. A video data processing apparatus comprising:
a memory, a processor, and a video data processing program stored on the memory and executable on the processor, wherein the video data processing program, when executed by the processor, implements the steps of the video data processing method of any of claims 17-20.
37. A video data processing apparatus comprising:
a memory, a processor, and a video data processing program stored on the memory and executable on the processor, wherein the video data processing program, when executed by the processor, implements the steps of the video data processing method of any of claims 21-29.
38. A computer-readable storage medium having stored thereon a video data processing program which, when executed by a processing module, implements the steps of the video data processing method of any of claims 1-6 above.
39. A computer-readable storage medium having stored thereon a video data processing program which, when executed by a processing module, implements the steps of the video data processing method of any of claims 7-16 above.
40. A computer-readable storage medium having stored thereon a video data processing program which, when executed by a processing module, implements the steps of the video data processing method of any of claims 17-20 above.
41. A computer-readable storage medium having stored thereon a video data processing program which, when executed by a processing module, implements the steps of the video data processing method of any of claims 21-29 above.
CN201810303003.0A 2018-04-06 2018-04-06 Video data processing method, device and medium Active CN110351492B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810303003.0A CN110351492B (en) 2018-04-06 2018-04-06 Video data processing method, device and medium
PCT/CN2019/078913 WO2019192321A1 (en) 2018-04-06 2019-03-20 Video data processing method, device, and system

Publications (2)

Publication Number Publication Date
CN110351492A (en) 2019-10-18
CN110351492B (en) 2021-11-19

Family

ID=68099824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810303003.0A Active CN110351492B (en) 2018-04-06 2018-04-06 Video data processing method, device and medium

Country Status (2)

Country Link
CN (1) CN110351492B (en)
WO (1) WO2019192321A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112511866B (en) * 2019-12-03 2024-02-23 中兴通讯股份有限公司 Media resource playing method, device, equipment and storage medium
CN116347183A (en) * 2020-06-04 2023-06-27 腾讯科技(深圳)有限公司 Data processing method and related device for immersion media
CN113099245B (en) * 2021-03-04 2023-07-25 广州方硅信息技术有限公司 Panoramic video live broadcast method, system and computer readable storage medium

Citations (12)

Publication number Priority date Publication date Assignee Title
CN101228477A (en) * 2005-07-28 2008-07-23 微软公司 Real-time preview for panoramic images
CN101877140A (en) * 2009-12-18 2010-11-03 北京邮电大学 Panorama-based panoramic virtual tour method
CN103379341A (en) * 2012-04-16 2013-10-30 索尼公司 Extension of HEVC NAL unit syntax structure
CN105408916A (en) * 2013-07-26 2016-03-16 华为技术有限公司 Method and system for spatial adaptation in adaptive streaming
CN105916022A (en) * 2015-12-28 2016-08-31 乐视致新电子科技(天津)有限公司 Video image processing method and apparatus based on virtual reality technology
CN106233745A (en) * 2013-07-29 2016-12-14 皇家Kpn公司 Tile video flowing is provided to client
WO2017029402A1 (en) * 2015-08-20 2017-02-23 Koninklijke Kpn N.V. Forming a tiled video on the basis of media streams
WO2017060423A1 (en) * 2015-10-08 2017-04-13 Koninklijke Kpn N.V. Enhancing a region of interest in video frames of a video stream
CN107087212A (en) * 2017-05-09 2017-08-22 杭州码全信息科技有限公司 The interactive panoramic video transcoding and player method and system encoded based on spatial scalable
CN107211159A (en) * 2015-02-11 2017-09-26 高通股份有限公司 Sample packet is sent with signal in file format
CN107787585A (en) * 2015-06-17 2018-03-09 韩国电子通信研究院 For handling the MMT devices and MMT methods of stereo video data
CN107852532A (en) * 2015-06-03 2018-03-27 诺基亚技术有限公司 Method for video encoding, device and computer program

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
US9262419B2 (en) * 2013-04-05 2016-02-16 Microsoft Technology Licensing, Llc Syntax-aware manipulation of media files in a container format
US10389999B2 (en) * 2016-02-17 2019-08-20 Qualcomm Incorporated Storage of virtual reality video in media files
US10979691B2 (en) * 2016-05-20 2021-04-13 Qualcomm Incorporated Circular fisheye video in virtual reality
US10699389B2 (en) * 2016-05-24 2020-06-30 Qualcomm Incorporated Fisheye rendering with lens distortion correction for 360-degree video
US10587934B2 (en) * 2016-05-24 2020-03-10 Qualcomm Incorporated Virtual reality video signaling in dynamic adaptive streaming over HTTP
CN109155861B (en) * 2016-05-24 2021-05-25 诺基亚技术有限公司 Method and apparatus for encoding media content and computer-readable storage medium
US11172005B2 (en) * 2016-09-09 2021-11-09 Nokia Technologies Oy Method and apparatus for controlled observation point and orientation selection audiovisual content

Also Published As

Publication number Publication date
WO2019192321A1 (en) 2019-10-10
CN110351492A (en) 2019-10-18

Similar Documents

Publication Publication Date Title
US11109013B2 (en) Method of transmitting 360-degree video, method of receiving 360-degree video, device for transmitting 360-degree video, and device for receiving 360-degree video
CN109691094B (en) Method for transmitting omnidirectional video, method for receiving omnidirectional video, apparatus for transmitting omnidirectional video, and apparatus for receiving omnidirectional video
CN110876051B (en) Video data processing method, video data transmission method, video data processing system, video data transmission device and video data transmission device
CN109074678B (en) Information processing method and device
CN108965929B (en) Video information presentation method, video information presentation client and video information presentation device
US20200389640A1 (en) Method and device for transmitting 360-degree video by using metadata related to hotspot and roi
US11539983B2 (en) Virtual reality video transmission method, client device and server
KR20190008901A (en) Method, device, and computer program product for improving streaming of virtual reality media content
CN112534825B (en) Packaging method, method of generating image, computing device, and readable storage medium
US10965928B2 (en) Method for 360 video processing based on multiple viewpoints and apparatus therefor
CN109218755B (en) Media data processing method and device
CN110351492B (en) Video data processing method, device and medium
WO2019139099A1 (en) Transmission device, transmission method, reception device and reception method
KR20200020913A (en) Method and apparatus for processing media information
US11677978B2 (en) Omnidirectional video processing method and device, related apparatuses and storage medium
CN115002470A (en) Media data processing method, device, equipment and readable storage medium
CN108271068B (en) Video data processing method and device based on streaming media technology
US20230360678A1 (en) Data processing method and storage medium
CN108271084B (en) Information processing method and device
WO2023194648A1 (en) A method, an apparatus and a computer program product for media streaming of immersive media
CN116848840A (en) Multi-view video streaming

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant