WO2018219202A1 - 视频图像的呈现、封装方法和视频图像的呈现、封装装置 (Presentation and packaging method for video images, and presentation and packaging apparatus for video images) - Google Patents

Presentation and packaging method for video images, and presentation and packaging apparatus for video images (视频图像的呈现、封装方法和视频图像的呈现、封装装置)

Info

Publication number
WO2018219202A1
Authority
WO
WIPO (PCT)
Prior art keywords
video image
image
video
information
spherical
Application number
PCT/CN2018/088197
Other languages
English (en)
French (fr)
Inventor
邸佩云
谢清鹏
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2018219202A1
Priority to US16/689,517 (published as US20200092531A1)


Classifications

    • H04N21/816: Monomedia components thereof involving special video data, e.g. 3D video
    • H04N21/2343: Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N13/161: Encoding, multiplexing or demultiplexing different image signal components
    • H04N13/178: Metadata, e.g. disparity information
    • H04N21/234: Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2662: Controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402: Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/845: Structuring of content, e.g. decomposing content into time segments
    • H04N21/85406: Content authoring involving a specific file format, e.g. MP4 format
    (All classifications fall under H04N: Pictorial communication, e.g. television.)

Definitions

  • The present application relates to the field of video image processing, and more particularly to video image presentation and packaging methods and to video image presentation and packaging apparatuses.
  • The rise of Virtual Reality (VR) has brought new visual experiences to people, and it has also brought new technical challenges.
  • A VR video image is typically divided into a plurality of independent video images, and each video image is then encoded to obtain code streams of the different video images. Since different video images may contain different image information, how to present a video image is a problem that needs to be solved.
  • The present application provides a video image presentation method, a packaging method, and video image presentation and packaging devices, so as to improve display effects.
  • A first aspect provides a method for presenting a video image, the method comprising: acquiring a code stream of a first video image; parsing the code stream to determine the first video image and first information of the first video image, the first information being used to indicate whether the first video image is presented as a continuous area; and presenting the first video image according to the first information.
  • The first video image may be part of the original complete video image, or the first video image may be a sub-video image obtained by dividing the original complete video image; the sub-video image may also be directly referred to as a sub-image.
  • When a video image is a continuous area in the finally displayed image, the video image can be presented directly. When the video image is not a continuous area in the finally displayed image, the video image may be spliced with other video images before being displayed.
  • Optionally, presenting the first video image according to the first information comprises: presenting the first video image when the first information indicates that the first video image is presented as a continuous area.
  • When the first video image is presented as a continuous region, the first video image is ultimately mapped onto the spherical surface as continuous image content. Therefore, when it is determined that the first video image can be presented as a continuous area, the first video image is presented directly, continuous image content can be displayed, and the display effect is better.
  • Optionally, the method further includes: when the first information indicates that the first video image is not presented as a continuous area, splicing the first video image and a second video image according to their positional relationship at the time of presentation. If a first video image that cannot be presented as a continuous area is mapped directly to a spherical display, discontinuous image content may appear on the spherical surface. Therefore, the second video image, whose content is adjacent to that of the first video image, and the first video image need to be spliced according to the positional relationship at the time of presentation and then displayed, so as to ensure that continuous images are displayed and the display effect is improved.
  • A second aspect provides a method for presenting a video image, the method comprising: acquiring a code stream of a first video image; parsing the code stream to determine the first video image and second information of the first video image, the second information being used to indicate an image type of the first video image, where the image type of the first video image includes a spherical image, a two-dimensional planar image that has not undergone first-operation processing, and a two-dimensional planar image that has undergone first-operation processing, and the first operation is at least one of segmentation, sampling, flipping, rotation, mirroring, and splicing; and presenting the first video image according to the second information.
  • The first video image may be part of the original complete video image, or the first video image may be a sub-video image obtained by dividing the original complete video image; the sub-video image may also be directly referred to as a sub-image.
  • In this way, both the video image and the image type of the video image can be obtained from the code stream of a single video image, and subsequent operations can be initialized according to the image type of the video image, thereby reducing the delay when the video image is presented and improving display efficiency. Specifically, the image type of the video image can be acquired while a certain video image is being parsed, the operations to be performed on the video image can be determined earlier according to its image type, and those operations can then be initialized without first having to parse the code stream of the full video, as in the prior art, before the operations can be started.
  • Optionally, presenting the first video image according to the second information includes: when the second information indicates that the first video image is a spherical image, presenting the first video image in a spherical display manner.
  • Optionally, presenting the first video image according to the second information includes: when the second information indicates that the first video image is the two-dimensional planar image that has not undergone first-operation processing, mapping the first video image to a spherical image, and presenting the spherical image in a spherical display manner.
  • When the first video image is a first-type two-dimensional planar image, it needs to be mapped to a spherical image before being displayed on the spherical surface; otherwise, if the image type of the first video image is not known and the first video image is presented directly, a display error may occur. Therefore, the image type of the first video image can be determined from the second information, so that the first video image is displayed correctly.
  • Optionally, presenting the first video image according to the second information includes: when the second information indicates that the first video image is the two-dimensional planar image that has undergone first-operation processing, performing a second operation on the first video image to obtain a second-operation-processed first video image, where the second operation is an inverse operation of the first operation; mapping the second-operation-processed first video image to a spherical image; and presenting the spherical image in a spherical display manner.
  • In this case, the first video image is first subjected to the second operation, the second-operation-processed first video image is then mapped to a spherical image, and the spherical image is then displayed; otherwise, a display error may also occur if the first video image is directly mapped to a spherical image and the spherical image is presented in a spherical display manner. Therefore, the image type of the first video image can be determined from the second information, so that the first video image is displayed correctly.
  • A third aspect provides a method for encapsulating a video image, the method comprising: determining first information of a first video image, where the first information is used to indicate whether the first video image is a continuous area in the image to be encoded corresponding to the first video image; encoding the first video image and the first information to obtain a code stream of the first video image; and encapsulating the code stream to obtain an image track of the first video image.
  • The first video image may be part of the original complete video image, or the first video image may be a sub-video image obtained by dividing the original complete video image; the sub-video image may also be directly referred to as a sub-image.
  • When the video image is a continuous area in the finally displayed image, the video image can be presented directly; when the video image is not a continuous area in the finally displayed image, the video image can be spliced with other video images and then displayed.
  • A fourth aspect provides a method for encapsulating a video image, the method comprising: determining second information of a first video image, where the second information is used to indicate an image type of the first video image, the image type includes a spherical image, a two-dimensional planar image that has not undergone first-operation processing, and a two-dimensional planar image that has undergone first-operation processing, and the first operation is at least one of segmentation, sampling, flipping, rotation, mirroring, and splicing; encoding the first video image and the second information to obtain a code stream of the first video image; and encapsulating the code stream to obtain an image track of the first video image.
  • The first video image may be part of the original complete video image, or the first video image may be a sub-video image obtained by dividing the original complete video image; the sub-video image may also be directly referred to as a sub-image.
  • In this way, when the image is presented, both the video image and the image type of the video image can be obtained from the code stream of a single video image, so that subsequent operations can be initialized in advance according to the image type of the video image, which can reduce the delay of presenting the video image and improve display efficiency.
  • Specifically, the video rendering device can acquire the image type of the video image while parsing a certain video image, determine according to the image type which operations are to be performed on the video image, and initialize those operations first, without having to parse the code stream of the full video, as in the prior art, before the operations can be started. This can reduce the delay of presenting the video image and improve display efficiency.
  • A fifth aspect provides a video image presentation apparatus, comprising modules for performing the method of the first aspect or any of its implementations.
  • A sixth aspect provides a video image presentation apparatus, comprising modules for performing the method of the second aspect or any of its implementations.
  • A seventh aspect provides a video image encapsulation apparatus, comprising modules for performing the method of the third aspect or any of its implementations.
  • An eighth aspect provides a video image encapsulation apparatus, comprising modules for performing the method of the fourth aspect or any of its implementations.
  • A ninth aspect provides a video image presentation apparatus, the apparatus comprising a storage medium and a central processing unit, wherein the storage medium stores a computer-executable program, and the central processing unit is connected to the storage medium and executes the computer-executable program to implement the method of the first aspect or any of its implementations.
  • A tenth aspect provides a video image presentation apparatus, the apparatus comprising a storage medium and a central processing unit, wherein the storage medium stores a computer-executable program, and the central processing unit is connected to the storage medium and executes the computer-executable program to implement the method of the second aspect or any of its implementations.
  • An eleventh aspect provides a video image encapsulation apparatus, the apparatus comprising a storage medium and a central processing unit, wherein the storage medium stores a computer-executable program, and the central processing unit is connected to the storage medium and executes the computer-executable program to implement the method of the third aspect or any of its implementations.
  • A twelfth aspect provides a video image encapsulation apparatus, the apparatus comprising a storage medium and a central processing unit, wherein the storage medium stores a computer-executable program, and the central processing unit is connected to the storage medium and executes the computer-executable program to implement the method of the fourth aspect or any of its implementations.
  • Optionally, the storage medium may be a nonvolatile storage medium.
  • A thirteenth aspect provides a computer-readable medium storing program code for execution by a device, the program code comprising instructions for performing the method of the first aspect or any of its implementations.
  • A fourteenth aspect provides a computer-readable medium storing program code for execution by a device, the program code comprising instructions for performing the method of the second aspect or any of its implementations.
  • A fifteenth aspect provides a computer-readable medium storing program code for execution by a device, the program code comprising instructions for performing the method of the third aspect or any of its implementations.
  • A sixteenth aspect provides a computer-readable medium storing program code for execution by a device, the program code comprising instructions for performing the method of the fourth aspect or any of its implementations.
  • FIG. 1 is a schematic flowchart of a method for presenting a video image according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a spherical image and a two-dimensional planar image.
  • FIG. 3 is a schematic diagram of a video image.
  • FIG. 4 is a schematic diagram of the position of a video image in a two-dimensional planar image.
  • FIG. 5 is a schematic diagram of the position of a video image on a spherical surface.
  • FIG. 6 is a schematic flowchart of a method for presenting a video image according to an embodiment of the present application.
  • FIG. 7 is a schematic flowchart of a method for encapsulating a video image according to an embodiment of the present application.
  • FIG. 8 is a schematic flowchart of a method for encapsulating a video image according to an embodiment of the present application.
  • FIG. 9 is a schematic flow chart of generating a code stream of a sub-image.
  • FIG. 10 is a schematic flow chart of analyzing a code stream of a sub-image.
  • FIG. 11 is a schematic block diagram of a video image presentation apparatus according to an embodiment of the present application.
  • FIG. 12 is a schematic block diagram of a video image presentation apparatus according to an embodiment of the present application.
  • FIG. 13 is a schematic block diagram of a video image encapsulation apparatus according to an embodiment of the present application.
  • FIG. 14 is a schematic block diagram of a video image encapsulation apparatus according to an embodiment of the present application.
  • FIG. 15 is a schematic block diagram of a codec apparatus according to an embodiment of the present application.
  • FIG. 16 is a schematic diagram of a codec apparatus according to an embodiment of the present application.
  • FIG. 17 is a schematic block diagram of a video codec system according to an embodiment of the present application.
  • FIG. 1 is a schematic flowchart of a method for presenting a video image according to an embodiment of the present application.
  • The method 100 of FIG. 1 includes the steps described below.
  • The first video image may be part of an original complete video image (which may also be referred to as an original video image or an original image), or the first video image may be a sub-video image obtained after the complete video image is divided; the sub-video image may also be directly referred to as a sub-image.
  • In the latter case, the first video image is obtained by dividing the original video image, that is, the first video image is a sub-image of the original image.
  • The original image may be a spherical image as shown in FIG. 2, which may be an image having a 360-degree angle of view.
  • The original image may also be a first-type two-dimensional planar image shown in FIG. 2. The first-type two-dimensional planar image is obtained by mapping a spherical image to a plane; it may be a longitude-latitude image, or it may be a planar image obtained by mapping the spherical image to a hexahedron and then unfolding the six faces of the hexahedron.
  • The original image may also be a second-type two-dimensional image shown in FIG. 2. The second-type two-dimensional image is obtained by performing certain operations (for example, segmentation, sampling, flipping, or rotation) on the first-type two-dimensional image.
  • When the original image is a second-type two-dimensional image, the original image may be divided into nine sub-images (four dashed lines divide the two-dimensional planar image into nine regions, with one sub-image per region), yielding sub-images A, B, C, D, E, F, G, H, and I.
  • The first video image may be any one of the nine sub-images. Dividing the original image into a plurality of sub-images makes the video image convenient to encode.
  • The first information may be used to indicate whether the first video image is presented as a continuous area.
  • The code stream of the first video image may be a code stream generated by the encoding end when encoding the first video image; by parsing the code stream, not only the first video image but also the first information of the first video image may be acquired.
  • For example, when the first video image is sub-image A, the first information of sub-image A specifically indicates that sub-image A is presented as a continuous area, because sub-image A contains an image of a continuous region in the middle portion of the first-type two-dimensional image. In this case, the first information of the first video image indicates that the first video image can be presented as a continuous region.
  • When the first video image is sub-image G, the first information of the sub-image specifically indicates that the sub-image is not a continuous region in the finally displayed image, because sub-image G includes two partial regions of the first-type two-dimensional image, one from the middle portion and one from the top portion, and the images of these two partial regions are not adjacent; therefore, sub-image G forms a discontinuous area in the finally displayed image. In this case, the first information of the sub-image indicates that the sub-image is not a continuous region of the finally displayed image.
  • The method 100 may be performed by a video rendering device, which may be a decoding-end device, a decoder, or a device having a decoding function.
  • When a video image is a continuous area in the finally displayed image, the video image can be presented directly; when the video image is not a continuous area in the finally displayed image, the video image may be spliced with other video images before being displayed.
  • When the first video image can be presented as a continuous area, the image content of the first video image can be displayed directly, which ensures that the displayed image content is continuous and that a certain display effect is achieved.
  • Specifically, when the first video image can be presented as a continuous area, the first video image is finally mapped onto the spherical surface to display continuous image content. When the first video image is not presented as a continuous area, if the first video image is still directly mapped to the spherical display, discontinuous image content may be displayed on the sphere (for example, two completely unrelated pieces of image content may be displayed), which may affect the visual experience.
  • In that case, the second video image may be acquired from the code stream of the second video image, where the second video image is a video image at least part of whose image content is adjacent to the first video image at the time of presentation; the first video image and the second video image are then spliced according to the positional relationship at the time of presentation and presented.
  • The positional relationship of the first video image and the second video image at the time of presentation may be obtained directly by parsing the code stream of the entire video, or may be determined separately from position information acquired from the code streams of the first video image and the second video image.
  • In other words, the second video image, whose content is adjacent to that of the first video image, and the first video image need to be spliced according to the positional relationship at the time of presentation and then displayed, so as to ensure that continuous images are displayed and the display effect is improved.
  • For example, the positions of sub-image G on the first-type two-dimensional planar image and on the spherical image are shown in FIG. 4 and FIG. 5, respectively.
  • The positions of sub-image G on the first-type two-dimensional planar image in FIG. 4 are the left side of the top portion and the lower-left corner of the middle portion (the shaded areas in FIG. 4).
  • The position of sub-image G on the spherical image in FIG. 5 corresponds to the shaded areas marked 1 and 2. It can be seen that sub-image G occupies two discontinuous regions on both the first-type two-dimensional planar image and the spherical image. Therefore, if the sub-image were presented directly, two discontinuous pieces of image content would be displayed, and the display effect would be poor.
  • The first information may be implemented in various manners. For example, the first information may be described by new syntax extended in the Track Group Type Box of the first video image; specifically, syntax in the SubPicture Composition Box may be used to describe the first information.
  • The value of content_continuity may be used to indicate whether the first video image is presented as a continuous area, the field being carried in syntax declared as SubPictureCompositionBox extends TrackGroupTypeBox('spco'). One value of content_continuity indicates that the first video image is presented as a continuous area, and another value indicates that the first video image is not presented as a continuous area; content_continuity may also take other values to respectively indicate whether the first video image is presented as a continuous region.
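  • As an illustrative sketch only (the field name and its role are given above, but the field width and the exact value assignment shown here are assumptions), the extended syntax may take a form such as:

    aligned(8) class SubPictureCompositionBox extends TrackGroupTypeBox('spco') {
        // first information: whether this sub-image is presented as one continuous area
        unsigned int(8) content_continuity;  // e.g., 0: presented as a continuous area;
                                             //       1: not presented as a continuous area
    }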
  • FIG. 6 is a schematic flowchart of a method for presenting a video image according to an embodiment of the present application.
  • The method 600 of FIG. 6 includes the steps described below.
  • The first video image may be part of an original complete video image (which may also be referred to as an original video image or an original image), or the first video image may be a sub-video image obtained after the complete video image is divided; the sub-video image may also be directly referred to as a sub-image.
  • When the first video image is a sub-image obtained by dividing the original video image, the original video image may be a spherical image, a first-type two-dimensional planar image, or a second-type two-dimensional planar image as shown in FIG. 2.
  • The spherical image may be an image having a 360-degree angle of view. The first-type two-dimensional planar image may be a planar image obtained by mapping the spherical image to a plane; it may be a longitude-latitude image, or it may be a planar image obtained by mapping the spherical image to a hexahedron and then unfolding the six faces of the hexahedron.
  • The second-type two-dimensional image may be a planar image obtained after performing certain operations (for example, segmentation, sampling, flipping, rotation, mirroring, or splicing) on the first-type two-dimensional image.
  • In FIG. 2, the top and bottom regions of the first-type two-dimensional image are compressed and spliced together and arranged below the middle region to obtain the second-type two-dimensional planar image.
  • When the original video image of the first video image is a second-type two-dimensional image, the original video image may be divided into nine sub-images, and the first video image may be any one of the nine sub-images.
  • Parsing the code stream determines the first video image and second information of the first video image, where the second information is used to indicate an image type of the first video image, and the image type of the first video image includes a spherical image, a two-dimensional planar image that has not been processed by the first operation, and a two-dimensional planar image that has been processed by the first operation.
  • The first operation may be at least one of segmentation, sampling, flipping, rotation, mirroring, and splicing.
  • The image type of the first video image is the same as the image type of the original video image of the first video image. For example, if the original video image is a first-type two-dimensional planar image, the image type of the first video image obtained by dividing the original video image is also the first-type two-dimensional planar image.
  • The above two-dimensional planar image that has not been processed by the first operation may be the first-type two-dimensional planar image in FIG. 2; this two-dimensional planar image is obtained by directly mapping the spherical image to a plane, and the first operation is not performed after the mapping.
  • The two-dimensional planar image processed by the first operation may be the second-type two-dimensional planar image in FIG. 2; this planar image is obtained by first mapping the spherical image directly to a plane to obtain the first-type two-dimensional planar image, and then performing segmentation, sampling, flipping, splicing, or similar operations on the first-type two-dimensional planar image.
  • The first operation may be referred to as packing, and the second operation (its inverse) may be referred to as reverse packing.
  • In this way, both the video image and its image type can be obtained from the code stream of a single video image, and subsequent operations can be initialized according to the image type, thereby reducing the delay when the video image is presented and improving display efficiency. The image type of the video image can be acquired while a certain video image is being parsed, the operations to be performed on the video image can be determined earlier according to the image type, and those operations can then be initialized without first having to parse the code stream of the full video, as in the prior art, before the operations can be started.
  • The method 600 may be performed by a video rendering device, which may be a decoding-end device, a decoder, or a device having a decoding function.
  • Depending on the image type of the first video image, the process of presenting the first video image also differs, and it specifically includes the following three cases:
  • Case 1: the first video image is a spherical image.
  • When the first video image is a spherical image, the first video image may be presented directly on the spherical surface, that is, in a spherical display manner. Specifically, when the first video image is a spherical image, the first video image is a part (or all) of the original video image (the original video image is also a spherical image), and the first video image may be displayed directly by presenting it, based on its position information on the spherical surface, at the corresponding position on the spherical surface.
  • Case 2: the first video image is a two-dimensional planar image that has not been processed by the first operation.
  • In this case, the first video image may be a first-type two-dimensional planar image as shown in FIG. 2. The first video image is first mapped to a spherical image, and the spherical image is then presented in a spherical display manner.
  • When the first video image is a first-type two-dimensional planar image, it needs to be mapped to a spherical image before being displayed on the spherical surface; otherwise, if the image type of the first video image is not known and the first video image is presented directly, a display error may occur. Therefore, the image type of the first video image can be determined from the second information, so that the first video image is displayed correctly.
  • Case 3: the first video image is a two-dimensional planar image processed by the first operation.
  • In this case, the first video image may be a second-type two-dimensional planar image as shown in FIG. 2. If the first video image is to be presented, the first video image is subjected to second-operation processing to obtain a second-operation-processed first video image, where the second operation is an inverse operation (also called a reverse operation) of the first operation; next, the second-operation-processed first video image is mapped to the spherical surface to obtain a spherical image, and the spherical image is then presented in a spherical display manner.
  • That is, the first video image is first subjected to the second operation, the second-operation-processed first video image is then mapped to a spherical image, and the spherical image is displayed; otherwise, a display error may also occur if the first video image is directly mapped to a spherical image and the spherical image is presented in a spherical display manner. Therefore, the image type of the first video image can be determined from the second information, so that the first video image is displayed correctly.
  • For example, if the first operation is flipping, the second operation is also flipping, and the second operation can restore the video image to its state before the first operation; that is, the second operation is the restoring operation of the first operation, and through the second operation the image processed by the first operation can be restored to its state before the first-operation processing.
  • For the second-type two-dimensional planar image in FIG. 2, the images of the top region and the bottom region may be enlarged, and the enlarged images of the top region and the bottom region may then be respectively moved back to the top and bottom of the intermediate region, so that the first-type two-dimensional planar image shown in FIG. 2 is finally obtained.
  • The foregoing second information may also be implemented in various manners. For example, the second information may be described by new syntax extended in the Track Group Type Box of the first video image; specifically, syntax in the SubPicture Composition Box may be used to describe the second information.
  • The value of fullpictureType may be used to indicate the image type of the first video image, the field being carried in syntax declared as SubPictureCompositionBox extends TrackGroupTypeBox('spco'). One value indicates that the first video image is a spherical image; another value indicates that the first video image is a two-dimensional planar image that has not undergone a first operation; and a third value indicates that the first video image is a two-dimensional planar image that has undergone a first operation. In other words, fullpictureType uses different values to represent the image type of the first video image, and fullpictureType may also take other values to represent the image type of the first video image.
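  • As an illustrative sketch only (the three-way semantics of fullpictureType are given above; the field width and the specific value assignment below are assumptions), the syntax may take a form such as:

    aligned(8) class SubPictureCompositionBox extends TrackGroupTypeBox('spco') {
        // second information: image type of the video image carried in this track
        unsigned int(8) fullpictureType;  // e.g., 0: spherical image;
                                          //       1: 2D planar image, first operation not applied;
                                          //       2: 2D planar image, first operation applied
    }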
  • Optionally, the second information may further include two pieces of sub-information, first sub-information and second sub-information, where the first sub-information is used to indicate whether the first video image is a spherical image or a two-dimensional planar image, and, when the first sub-information indicates that the video image is a two-dimensional planar image, the second sub-information indicates whether the video image has undergone the first operation.
  • Specifically, when the first video image is a spherical image, the second information includes only the first sub-information; when the first video image is a two-dimensional planar image, the second information includes both pieces of sub-information, where the first sub-information indicates that the first video image is a two-dimensional planar image and the second sub-information indicates whether the first video image has undergone the first operation.
  • For the first sub-information, the value of fullpictureType may likewise be used to indicate the image type of the first video image, again in syntax declared as SubPictureCompositionBox extends TrackGroupTypeBox('spco'). One value indicates that the first video image is a spherical image, and another value indicates that the first video image is a two-dimensional planar image. Here too, fullpictureType uses different values to represent the image type of the first video image, and fullpictureType may also take other values to represent the image type of the first video image.
  • The second sub-information in the second information may also be represented by a statement similar to that of the first sub-information. Specifically, the value of packing may be used to indicate the image type of the first video image in the same SubPictureCompositionBox extends TrackGroupTypeBox('spco') syntax: packing uses different values to represent the image type of the first video image (that is, whether the video image has passed through the first operation), and packing may in fact also take other values to represent the image type of the first video image.
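  • As an illustrative sketch only (the text names the two fields and states that the second sub-information is present only for two-dimensional planar images; the field widths, the values, and the conditional layout below are assumptions), the two pieces of sub-information may be carried as:

    aligned(8) class SubPictureCompositionBox extends TrackGroupTypeBox('spco') {
        unsigned int(8) fullpictureType;   // first sub-information, e.g., 0: spherical; 1: 2D planar
        if (fullpictureType == 1) {
            unsigned int(8) packing;       // second sub-information, e.g., 0: first operation
        }                                  //       not applied; 1: first operation applied
    }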
  • Optionally, the method 100 or the method 600 further includes: determining third information of the first video image according to the code stream of the first video image, where the third information is used to indicate whether the first video image is a full image, and presenting the first video image according to the third information.
  • The full image here may be the complete image to be displayed, and the first video image may be either all of the complete image to be displayed or only a part of it.
  • When the third information indicates that the first video image is the full image, the decoding end or the device that presents the video can determine, after parsing the third information, that the first video image contains the entire image rather than a partial image, so the image content at any position in the entire image can be rendered without resorting to other video images. When the third information indicates that the first video image is not the full image, the decoding end or the device that presents the video needs, after parsing the third information, to parse the position information and the resolution information of the first video image to determine the position of the first video image in the entire image, and then present the first video image.
  • The value of fullpicture may be used to indicate whether the first video image is a full image, again in syntax declared as SubPictureCompositionBox extends TrackGroupTypeBox('spco'). One value indicates that the first video image is the full image, and another value indicates that the first video image is a partial image of the full image. The fullpicture field thus uses different values to indicate whether the first video image is a full image, and fullpicture may also take other values to indicate whether the first video image is a full image.
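  • As an illustrative sketch only (the field width and the value assignment below are assumptions), the third information may be carried as:

    aligned(8) class SubPictureCompositionBox extends TrackGroupTypeBox('spco') {
        // third information: whether this track carries the full image or only part of it
        unsigned int(8) fullpicture;  // e.g., 1: full image; 0: partial image of the full image
    }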
  • At least one of the first information, the second information, and the third information may be obtained by parsing the code stream of the first video sub-image, and the first video image may be presented on the basis of one or more of these three types of information when the first video image is parsed.
  • Any scheme of presenting the first video image according to one or more of the first information, the second information, and the third information falls within the protection scope of the present application.
  • For example, presenting the first video image according to the first information and the second information comprises: determining, according to the first information, whether the first video image is presented as a continuous region; determining the image type of the first video image according to the second information; and presenting the image content of the first video image according to whether the first video image is presented as one continuous region and according to the image type of the first video image.
  • Specifically, when the first video image is presented as a continuous area and the first video image is a spherical image, the first video image is (directly) presented in a spherical display manner.
  • When the first video image is presented as a continuous area and the first video image is a two-dimensional planar image that has not undergone first-operation processing, the first video image is mapped to a spherical image, and the spherical image is presented in a spherical display manner.
  • When the first video image is presented as a continuous area and the first video image is a two-dimensional planar image that has undergone first-operation processing, a second operation is performed on the first video image to obtain a second-operation-processed first video image, where the second operation is the inverse or reverse operation of the first operation; the second-operation-processed first video image is mapped to a spherical image; and the spherical image is presented in a spherical display manner.
  • When the first video image is presented, the location information of the first video image in the entire video image may also be used.
  • The location information of the first video image can be implemented in various manners. For example, it can be described by new syntax extended in the Track Group Type Box of the first video image; specifically, syntax in the SubPicture Composition Box can be used to describe the location information of the first video image.
  • The syntax for describing the location information of the first video image is declared as SubPictureCompositionBox extends TrackGroupTypeBox('spco'), with the following field semantics:
  • track_x represents the horizontal position of the upper-left corner of the first video image in the entire video image (also referred to as the original video image); it takes a natural number in the range [0, composition_width - 1];
  • track_y represents the vertical position of the upper-left corner of the first video image in the entire video image; it takes a natural number in the range [0, composition_height - 1];
  • track_width indicates the width of the first video image, an integer in the range [1, composition_width - track_x];
  • track_height indicates the height of the first video image, an integer in the range [1, composition_height - track_y];
  • composition_width represents the width of the entire video image; and
  • composition_height represents the height of the entire video image.
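  • A sketch of how these position fields may be laid out (the field names, semantics, and value ranges are as given above; the 16-bit field widths are an assumption made for illustration):

    aligned(8) class SubPictureCompositionBox extends TrackGroupTypeBox('spco') {
        unsigned int(16) track_x;             // horizontal position of the sub-image's upper-left corner
        unsigned int(16) track_y;             // vertical position of the sub-image's upper-left corner
        unsigned int(16) track_width;         // width of the sub-image
        unsigned int(16) track_height;        // height of the sub-image
        unsigned int(16) composition_width;   // width of the entire (composed) video image
        unsigned int(16) composition_height;  // height of the entire (composed) video image
    }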
  • The method for presenting a video image according to the embodiments of the present application has been described in detail above with reference to FIG. 1 to FIG. 6; the method for encapsulating a video image according to the embodiments of the present application is described below with reference to FIG. 7 and FIG. 8.
  • The video image encapsulation methods shown in FIG. 7 and FIG. 8 correspond to the foregoing method 100 and method 600, respectively.
  • Repeated descriptions are appropriately omitted below.
  • FIG. 7 is a schematic flowchart of a method for encapsulating a video image according to an embodiment of the present application.
  • The method 700 of FIG. 7 includes the steps described below.
  • The first video image may be part of the original complete video image, or the first video image may be a sub-video image obtained by dividing the original complete video image; the sub-video image may also be directly referred to as a sub-image.
  • When the video image is a continuous area in the finally displayed image, the video image can be presented directly; when the video image is not a continuous area in the finally displayed image, the video image can be spliced with other video images and then displayed.
  • FIG. 8 is a schematic flowchart of a method for encapsulating a video image according to an embodiment of the present application.
  • The method 800 of FIG. 8 includes the steps described below.
  • Second information of a first video image is determined, where the second information is used to indicate an image type of the first video image; the image type includes a spherical image, a two-dimensional planar image that has not been processed by the first operation, and a two-dimensional planar image that is processed by the first operation, and the first operation is at least one of segmentation, sampling, flipping, rotation, mirroring, and splicing.
  • The first video image may be part of the original complete video image, or the first video image may be a sub-video image obtained by dividing the original complete video image; the sub-video image may also be directly referred to as a sub-image.
  • In this way, when the image is presented, both the video image and the image type of the video image can be obtained from the code stream of a single video image, so that subsequent operations can be initialized in advance according to the image type of the video image, which can reduce the delay of presenting the video image and improve display efficiency.
  • Specifically, the video rendering device can acquire the image type of the video image while parsing a certain video image, determine according to the image type which operations are to be performed on the video image, and initialize those operations first, without having to parse the code stream of the full video, as in the prior art, before the operations can be started. This can reduce the delay of presenting the video image and improve display efficiency.
  • FIG. 9 is a schematic flow chart of generating a code stream of a sub-image.
  • As shown in FIG. 9, the sub-image dividing module divides the input entire image into a plurality of sub-images, determines metadata for each sub-image, and then outputs the sub-images.
  • The encoder encodes the input sub-images to generate a video bare code stream.
  • The code stream encapsulation module encapsulates the input video bare code stream and the metadata into the sub-image code stream.
  • The video bare code stream data is a code stream conforming to the ITU-T H.264 or ITU-T H.265 specifications.
  • The metadata of a sub-image may include the first information, the second information, and the third information described above.
  • At least one item of the metadata may be obtained from the sub-image dividing module or from the preset conditions used for the division.
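  • Putting the foregoing metadata together, a single box carrying the first, second, and third information alongside the position information might look as follows (combining the fields into one box, the field widths, and the value assignments are all assumptions made for illustration; the patent only states that the metadata may include these items):

    aligned(8) class SubPictureCompositionBox extends TrackGroupTypeBox('spco') {
        unsigned int(16) track_x;             // position of the sub-image in the entire image
        unsigned int(16) track_y;
        unsigned int(16) track_width;         // size of the sub-image
        unsigned int(16) track_height;
        unsigned int(16) composition_width;   // size of the entire image
        unsigned int(16) composition_height;
        unsigned int(8)  content_continuity;  // first information: continuous area or not
        unsigned int(8)  fullpictureType;     // second information: spherical / 2D / 2D packed
        unsigned int(8)  fullpicture;         // third information: full image or sub-image
    }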
  • FIG. 10 is a schematic flow chart of analyzing a code stream of a sub-image.
  • As shown in FIG. 10, the code stream decapsulation module obtains the code stream data of the sub-image and parses out the metadata of the video and the video bare code stream data.
  • The image information of the sub-image can be obtained from the metadata of the video, and the sub-image is then presented according to the image information of the sub-image and the sub-image parsed from the video bare code stream data of the sub-image.
  • The video image presentation methods and encapsulation methods of the embodiments of the present application have been described above with reference to FIG. 1 to FIG. 10.
  • The video image presentation devices and encapsulation devices of the embodiments of the present application are described below with reference to FIG. 11 to FIG. 14. It should be understood that the presentation devices in FIG. 11 to FIG. 14 can implement the video image presentation methods in FIG. 1 to FIG. 10, and the encapsulation devices can implement the video image encapsulation methods in FIG. 1 to FIG. 10; repeated descriptions are appropriately omitted.
  • FIG. 11 is a schematic block diagram of a video image presentation apparatus according to an embodiment of the present application.
  • The device 1100 includes:
  • an obtaining module 1110, configured to acquire a code stream of a first video image;
  • a parsing module 1120, configured to parse the code stream and determine the first video image and first information of the first video image, where the first information is used to indicate whether the first video image is presented as a continuous area; and
  • a presentation module 1130, configured to present the first video image according to the first information.
  • Optionally, the presentation module 1130 is specifically configured to: when the first information indicates that the first video image is presented as one continuous area, present the first video image.
  • Optionally, the presentation module 1130 is specifically configured to: when the first information indicates that the first video image is not presented as a continuous area, splice the first video image and a second video image according to the positional relationship at the time of presentation.
  • FIG. 12 is a schematic block diagram of a video image presentation apparatus according to an embodiment of the present application.
  • The device 1200 includes:
  • an obtaining module 1210, configured to acquire a code stream of a first video image;
  • a parsing module 1220, configured to parse the code stream and determine the first video image and second information of the first video image, where the second information is used to indicate an image type of the first video image, the image type includes a spherical image, a two-dimensional planar image that has not undergone first-operation processing, and a two-dimensional planar image that has undergone first-operation processing, and the first operation is at least one of segmentation, sampling, flipping, rotation, mirroring, and splicing; and
  • a presentation module 1230, configured to present the first video image according to the second information.
  • Optionally, the presentation module 1230 is specifically configured to: when the second information indicates that the first video image is a spherical image, present the first video image in a spherical display manner.
  • Optionally, the presentation module 1230 is specifically configured to: when the second information indicates that the first video image is the two-dimensional planar image that has not undergone first-operation processing, map the first video image to a spherical image and present the spherical image in a spherical display manner.
  • Optionally, the presentation module 1230 is specifically configured to: when the second information indicates that the first video image is the two-dimensional planar image that has undergone first-operation processing, perform a second operation on the first video image to obtain a second-operation-processed first video image, where the second operation is an inverse operation of the first operation; map the second-operation-processed first video image to a spherical image; and present the spherical image in a spherical display manner.
  • FIG. 13 is a schematic block diagram of a video image encapsulation apparatus according to an embodiment of the present application.
  • The device 1300 includes:
  • a determining module 1310, configured to determine first information of a first video image, where the first information is used to indicate whether the first video image is a continuous region in the image to be encoded corresponding to the first video image;
  • an encoding module 1320, configured to encode the first video image and the first information to obtain a code stream of the first video image; and
  • an encapsulating module 1330, configured to encapsulate the code stream to obtain an image track of the first video image.
  • FIG. 14 is a schematic block diagram of a video image encapsulation apparatus according to an embodiment of the present application.
  • The device 1400 includes:
  • a determining module 1410, configured to determine second information of a first video image, where the second information is used to indicate an image type of the first video image, the image type includes a spherical image, a two-dimensional planar image that has not been processed by the first operation, and a two-dimensional planar image that has been processed by the first operation, and the first operation is at least one of segmentation, sampling, flipping, rotation, mirroring, and splicing;
  • an encoding module 1420, configured to encode the first video image and the second information to obtain a code stream of the first video image; and
  • an encapsulating module 1430, configured to encapsulate the code stream to obtain an image track of the first video image.
  • the presentation method and the encapsulation method of the video image in the present application may be performed by a codec device or by a system composed of codec devices.
  • the presentation apparatus and the encapsulation apparatus of the video image above may also specifically be a codec device or a codec system.
  • a codec device and a codec system composed of codec devices will be described in detail below with reference to FIGS. 15 to 17. It should be understood that the codec device and the codec system in FIGS. 15 to 17 are capable of performing the above-described video image presentation method and video image encapsulation method.
  • FIGS. 15 and 16 illustrate a codec device 50 of an embodiment of the present application, which may be a mobile terminal or user equipment of a wireless communication system. It should be understood that embodiments of the present application can be implemented in any electronic device or apparatus that may require encoding and/or decoding of video images.
  • the codec device 50 may include a housing 30 for incorporating and protecting the device, a display 32 (which may specifically be a liquid crystal display), and a keypad 34.
  • Codec device 50 may include a microphone 36 or any suitable audio input, which may be a digital or analog signal input.
  • the codec device 50 may also include an audio output device, which in the embodiment of the present application may be any of the following: an earphone 38, a speaker, or an analog audio or digital audio output connection.
  • Codec device 50 may also include a battery 40; in other embodiments of the present application, the device may be powered by any suitable mobile energy device, such as a solar cell, a fuel cell, or a clockwork generator.
  • the device may also include an infrared port 42 for short-range line of sight communication with other devices.
  • codec device 50 may also include any suitable short range communication solution, such as a Bluetooth wireless connection or a USB/FireWire wired connection.
  • Codec device 50 may include a controller 56 or processor for controlling codec device 50.
  • the controller 56 can be coupled to a memory 58, which in the embodiments of the present application can store image data and audio data, and/or can also store instructions for execution on the controller 56.
  • the controller 56 can also be coupled to a codec 54 suitable for implementing encoding and decoding of audio and/or video data, or for assisting the encoding and decoding performed by the controller 56.
  • the codec device 50 may also include a card reader 48 and a smart card 46, for example a Universal Integrated Circuit Card (UICC) and a UICC reader, for providing user information and for providing authentication information for authenticating and authorizing the user on a network.
  • the codec device 50 may also include a radio interface circuit 52 coupled to the controller and adapted to generate, for example, a wireless communication signal for communicating with a cellular communication network, a wireless communication system, or a wireless local area network.
  • the codec device 50 may also include an antenna 44 coupled to the radio interface circuit 52 for transmitting radio frequency signals generated at the radio interface circuit 52 to other device(s) and for receiving radio frequency signals from other device(s).
  • in some embodiments of the present application, codec device 50 includes a camera capable of recording or detecting individual frames, and the codec 54 or the controller receives and processes these frames. In some embodiments of the present application, codec device 50 may receive the to-be-processed video image data from another device prior to transmission and/or storage. In some embodiments of the present application, codec device 50 may receive images for encoding/decoding over a wireless or wired connection.
  • FIG. 17 is a schematic block diagram of a video codec system 10 according to an embodiment of the present application.
  • the video codec system 10 includes a source device 12 and a destination device 14.
  • Source device 12 produces encoded video data.
  • source device 12 may be referred to as a video encoding apparatus or a video encoding device.
  • Destination device 14 can decode the encoded video data produced by source device 12.
  • destination device 14 may be referred to as a video decoding apparatus or a video decoding device.
  • Source device 12 and destination device 14 may be examples of video codec apparatuses or video codec devices.
  • Source device 12 and destination device 14 may include desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, handsets such as smartphones, televisions, cameras, display devices, digital media players, video game consoles, in-vehicle computers, or other similar devices.
  • Destination device 14 may receive encoded video data from source device 12 via channel 16. Channel 16 may include one or more media and/or devices capable of moving the encoded video data from source device 12 to destination device 14.
  • channel 16 may include one or more communication media that enable source device 12 to transmit encoded video data directly to destination device 14 in real time.
  • source device 12 may modulate the encoded video data in accordance with a communication standard (e.g., a wireless communication protocol) and may transmit the modulated video data to destination device 14.
  • the one or more communication media may include wireless and/or wired communication media, such as a radio frequency (RF) spectrum or one or more physical transmission lines.
  • the one or more communication media may form part of a packet-based network (e.g., a local area network, a wide area network, or a global network such as the Internet).
  • the one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from source device 12 to destination device 14.
  • channel 16 can include a storage medium that stores encoded video data generated by source device 12.
  • destination device 14 can access the storage medium via disk access or card access.
  • the storage medium may include a variety of locally accessible data storage media, such as Blu-ray discs, DVDs, CD-ROMs, flash memory, or other suitable digital storage media for storing encoded video data.
  • channel 16 can include a file server or another intermediate storage device that stores encoded video data generated by source device 12.
  • destination device 14 may access the encoded video data stored at a file server or other intermediate storage device via streaming or download.
  • the file server may be of a server type capable of storing encoded video data and transmitting the encoded video data to the destination device 14.
  • the file server can include a web server (e.g., for a website), a file transfer protocol (FTP) server, a network attached storage (NAS) device, and a local disk drive.
  • Destination device 14 can access the encoded video data via a standard data connection (e.g., an Internet connection).
  • Instance types of data connections include wireless channels (e.g., a Wi-Fi connection), wired connections (e.g., DSL or a cable modem), or combinations of both that are suitable for accessing encoded video data stored on a file server.
  • the transmission of the encoded video data from the file server may be streaming, download, or a combination of both.
  • the codec method of the present application is not limited to a wireless application scenario.
  • the codec method may be applied to video coding and decoding supporting a variety of multimedia applications, such as over-the-air television broadcasting, cable television transmission, satellite television transmission, streaming video transmission (e.g., via the Internet), encoding of video data stored on a data storage medium, decoding of video data stored on a data storage medium, or other applications.
  • video codec system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
  • source device 12 includes video source 18, video encoder 20, and output interface 22.
  • output interface 22 can include a modulator/demodulator (modem) and/or a transmitter.
  • Video source 18 may include a video capture device (e.g., a video camera), a video archive containing previously captured video data, a video input interface to receive video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of these video data sources.
  • Video encoder 20 may encode video data from video source 18.
  • source device 12 transmits the encoded video data directly to destination device 14 via output interface 22.
  • the encoded video data may also be stored on a storage medium or file server for later access by the destination device 14 for decoding and/or playback.
  • destination device 14 includes an input interface 28, a video decoder 30, and a display device 32.
  • input interface 28 includes a receiver and/or a modem.
  • Input interface 28 can receive the encoded video data via channel 16.
  • Display device 32 may be integral with destination device 14 or may be external to destination device 14. In general, display device 32 displays the decoded video data.
  • Display device 32 may include a variety of display devices such as liquid crystal displays (LCDs), plasma displays, organic light emitting diode (OLED) displays, or other types of display devices.
  • Video encoder 20 and video decoder 30 may operate in accordance with a video compression standard (e.g., the High Efficiency Video Coding (HEVC) H.265 standard) and may conform to the HEVC Test Model (HM).
  • the textual description of the H.265 standard, ITU-T H.265 (V3) (04/2015), was published on April 29, 2015 and is available for download from http://handle.itu.int/11.1002/1000/12455; the entire contents of that document are incorporated herein by reference.
  • the disclosed systems, devices, and methods may be implemented in other manners.
  • the device embodiments described above are merely illustrative. For example, the division into units is only a logical function division; in actual implementation there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
  • the units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • if the functions are implemented in the form of software functional units and sold or used as standalone products, they may be stored in a computer-readable storage medium.
  • the technical solutions of the present application essentially, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the various embodiments of the present application.
  • the foregoing storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Library & Information Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Databases & Information Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present application provides video image presentation and encapsulation methods and video image presentation and encapsulation apparatuses. The video image presentation method includes: obtaining a code stream of a first video image; parsing the code stream to determine the first video image and first information of the first video image, where the first information is used to indicate whether the first video image is presented as one continuous region; and presenting the first video image according to the first video image and the first information. The present application enables the first video image to be presented better according to the first information of the first video image.

Description

Video image presentation and encapsulation methods and video image presentation and encapsulation apparatuses
This application claims priority to Chinese Patent Application No. 201710387835.0, filed with the Chinese Patent Office on May 27, 2017 and entitled "Video image presentation and encapsulation method and video image presentation and encapsulation apparatus", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of video image processing, and more specifically, to video image presentation and encapsulation methods and video image presentation and encapsulation apparatuses.
Background
The rise of virtual reality (VR) has brought people a new visual experience and has also brought new technical challenges. When a VR video image is encoded, the VR video image is usually divided into multiple independent video images, and each video image is then encoded to obtain code streams of the different video images. Because different video images may contain different image information, how to present a video image is a problem that needs to be resolved.
Summary
The present application provides video image presentation and encapsulation methods and video image presentation and encapsulation apparatuses, to improve the display effect.
According to a first aspect, a video image presentation method is provided, including: obtaining a code stream of a first video image; parsing the code stream to determine the first video image and first information of the first video image, where the first information is used to indicate whether the first video image is presented as one continuous region; and presenting the first video image according to the first information.
It should be understood that the first video image may be a part of an original complete video image, or the first video image may be a sub-video image obtained by dividing the original complete video image; such a sub-video image may also simply be called a sub-image.
Because whether a video image forms one continuous region in the finally displayed image is taken into account when the video image is presented, the video image can be presented better, thereby improving the display effect.
Specifically, when a video image is one continuous region in the finally displayed image, the video image can be presented directly; when the video image is not one continuous region in the finally displayed image, the video image can be stitched with other video images before being displayed.
With reference to the first aspect, in some implementations of the first aspect, the presenting the first video image according to the first information includes: presenting the first video image when the first information indicates that the first video image is presented as one continuous region.
It should be understood that when the first video image is presented as one continuous region, continuous image content is displayed when the first video image is finally mapped onto the sphere.
When it is determined that the first video image can be presented as one continuous region, presenting the first video image displays continuous image content and yields a good display effect.
With reference to the first aspect, in some implementations of the first aspect, at least a part of the first video image and a second video image are adjacent when presented, and the presenting the sub-image according to the first information includes: when the first information indicates that the first video image is not presented as one continuous region, stitching the first video image and the second video image according to their positional relationship at presentation time and then presenting the result.
It should be understood that when the first video image cannot be presented as one continuous region, mapping the first video image directly onto the sphere for display may result in discontinuous image content being displayed on the sphere.
When the first video image cannot be presented as one continuous region, the second video image whose content adjoins the first video image needs to be stitched with the first video image according to their positional relationship at presentation time before display, to ensure that a continuous image is displayed and to improve the display effect.
According to a second aspect, a video image presentation method is provided, including: obtaining a code stream of a first video image; parsing the code stream to determine the first video image and second information of the first video image, where the second information is used to indicate an image type of the first video image, and the image type of the first video image includes a spherical image, a two-dimensional planar image that has not undergone first-operation processing, and a two-dimensional planar image that has undergone first-operation processing, where the first operation is at least one of segmentation, sampling, flipping, rotation, mirroring, and splicing; and presenting the first video image according to the second information.
It should be understood that the first video image may be a part of an original complete video image, or the first video image may be a sub-video image obtained by dividing the original complete video image; such a sub-video image may also simply be called a sub-image.
When an image is presented, a video image and its image type can be obtained from the code stream of that video image, so that subsequent operations can be initialized in advance according to the image type of the video image; this reduces the latency of presenting the video image and improves display efficiency.
Specifically, the image type of a video image can be obtained while the video image is being parsed, so which operations need to be performed on the video image later can be determined earlier from its image type, and those operations can then be initialized first, instead of having to wait until the code stream of the entire video has been parsed before starting them, as in the prior art. This reduces the latency of presenting the video image and improves display efficiency.
With reference to the second aspect, in some implementations of the second aspect, the presenting the first video image according to the second information includes: when the second information indicates that the first video image is a spherical image, presenting the first video image in a spherical display manner.
With reference to the second aspect, in some implementations of the second aspect, the presenting the first video image according to the second information includes: when the second information indicates that the first video image is the two-dimensional planar image that has not undergone first-operation processing, mapping the first video image to a spherical image; and presenting the spherical image in a spherical display manner.
When the first video image is a first-type two-dimensional planar image, the first video image needs to be mapped to a spherical image before it can be displayed on the sphere; otherwise, if the image type of the first video image is unknown and the first video image is presented directly, a display error may occur. Therefore, the image type of the first video image can be determined from the second information, so that the first video image is displayed correctly.
With reference to the second aspect, in some implementations of the second aspect, the presenting the first video image according to the second information includes: when the second information indicates that the first video image is the two-dimensional planar image that has undergone first-operation processing, performing a second operation on the first video image to obtain a first video image after second-operation processing, where the second operation is an inverse operation of the first operation; mapping the first video image after second-operation processing to a spherical image; and presenting the spherical image in a spherical display manner.
When the first video image is a second-type two-dimensional planar image, the second operation needs to be performed on the first video image first, the first video image after the second operation then needs to be mapped to a spherical image, and only then can the image be displayed on the sphere; otherwise, directly mapping the first video image to a spherical image and presenting that spherical image in a spherical display manner would also cause a display error. Therefore, the image type of the first video image can be determined from the second information, so that the first video image is displayed correctly.
According to a third aspect, a video image encapsulation method is provided, including: determining first information of the first video image, where the first information is used to indicate whether the first video image is one continuous region in a to-be-encoded image corresponding to the first video image; encoding the first video image and the first information to obtain a code stream of the first video image; and encapsulating the code stream to obtain an image track of the first video image.
It should be understood that the first video image may be a part of an original complete video image, or the first video image may be a sub-video image obtained by dividing the original complete video image; such a sub-video image may also simply be called a sub-image.
By also encoding, into the code stream of a video image, information indicating whether the video image is one continuous region in the to-be-encoded image, whether the video image is one continuous region in the to-be-displayed image can be taken into account when the video image is presented, so that the video image can be presented better and the display effect is improved.
For example, when the video image is one continuous region in the finally displayed image, the video image can be presented directly; when the video image is not one continuous region in the finally displayed image, the video image can be stitched with other video images before being displayed.
According to a fourth aspect, a video image encapsulation method is provided, including: determining second information of a first video image, where the second information is used to indicate an image type of the first video image, and the image type includes a spherical image, a two-dimensional planar image that has not undergone first-operation processing, and a two-dimensional planar image that has undergone first-operation processing, where the first operation is at least one of segmentation, sampling, flipping, rotation, mirroring, and splicing; encoding the first video image and the second information to obtain a code stream of the first video image; and encapsulating the code stream to obtain an image track of the first video image.
It should be understood that the first video image may be a part of an original complete video image, or the first video image may be a sub-video image obtained by dividing the original complete video image; such a sub-video image may also simply be called a sub-image.
By also encoding the image type information of a video image into the code stream of that video image, the video image and its image type can be obtained from the code stream when the image is presented, so that subsequent operations can be initialized in advance according to the image type; this reduces the latency of presenting the video image and improves display efficiency.
Specifically, a video presentation device can obtain the image type of a video image while parsing the video image, so it can determine earlier, from the image type, which operations need to be performed on the video image later, and can initialize those operations first, instead of having to wait until the code stream of the entire video has been parsed before starting them, as in the prior art. This reduces the latency of presenting the video image and improves display efficiency.
According to a fifth aspect, a video image presentation apparatus is provided, where the apparatus includes modules configured to perform the method in the first aspect or any implementation thereof.
According to a sixth aspect, a video image presentation apparatus is provided, where the apparatus includes modules configured to perform the method in the second aspect or any implementation thereof.
According to a seventh aspect, a video image encapsulation apparatus is provided, where the apparatus includes modules configured to perform the method in the third aspect or any implementation thereof.
According to an eighth aspect, a video image encapsulation apparatus is provided, where the apparatus includes modules configured to perform the method in the fourth aspect or any implementation thereof.
According to a ninth aspect, a video image presentation apparatus is provided, including a storage medium and a central processing unit, where the storage medium stores a computer-executable program, and the central processing unit is connected to the storage medium and executes the computer-executable program to implement the method in the first aspect or any implementation thereof.
According to a tenth aspect, a video image presentation apparatus is provided, including a storage medium and a central processing unit, where the storage medium stores a computer-executable program, and the central processing unit is connected to the storage medium and executes the computer-executable program to implement the method in the second aspect or any implementation thereof.
According to an eleventh aspect, a video image encapsulation apparatus is provided, including a storage medium and a central processing unit, where the storage medium stores a computer-executable program, and the central processing unit is connected to the storage medium and executes the computer-executable program to implement the method in the third aspect or any implementation thereof.
According to a twelfth aspect, a video image encapsulation apparatus is provided, including a storage medium and a central processing unit, where the storage medium stores a computer-executable program, and the central processing unit is connected to the storage medium and executes the computer-executable program to implement the method in the fourth aspect or any implementation thereof.
It should be understood that in the ninth to twelfth aspects, the storage medium may be a non-volatile storage medium.
According to a thirteenth aspect, a computer-readable medium is provided, where the computer-readable medium stores program code to be executed by a device, and the program code includes instructions for performing the method in the first aspect or any implementation thereof.
According to a fourteenth aspect, a computer-readable medium is provided, where the computer-readable medium stores program code to be executed by a device, and the program code includes instructions for performing the method in the second aspect or any implementation thereof.
According to a fifteenth aspect, a computer-readable medium is provided, where the computer-readable medium stores program code to be executed by a device, and the program code includes instructions for performing the method in the third aspect or any implementation thereof.
According to a sixteenth aspect, a computer-readable medium is provided, where the computer-readable medium stores program code to be executed by a device, and the program code includes instructions for performing the method in the fourth aspect or any implementation thereof.
It should be understood that the technical solutions provided in the fifth to sixteenth aspects are consistent, in their technical means, with the technical solutions provided in the first to fourth aspects respectively; their beneficial effects are similar and are not described again.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a video image presentation method according to an embodiment of the present application.
FIG. 2 is a schematic diagram of a spherical image and two-dimensional planar images.
FIG. 3 is a schematic diagram of a video image.
FIG. 4 is a schematic diagram of the position of a video image in a two-dimensional planar image.
FIG. 5 is a schematic diagram of the position of a video image on a sphere.
FIG. 6 is a schematic flowchart of a video image presentation method according to an embodiment of the present application.
FIG. 7 is a schematic flowchart of a video image encapsulation method according to an embodiment of the present application.
FIG. 8 is a schematic flowchart of a video image encapsulation method according to an embodiment of the present application.
FIG. 9 is a schematic flowchart of generating a code stream of a sub-image.
FIG. 10 is a schematic flowchart of parsing a code stream of a sub-image.
FIG. 11 is a schematic block diagram of a video image presentation apparatus according to an embodiment of the present application.
FIG. 12 is a schematic block diagram of a video image presentation apparatus according to an embodiment of the present application.
FIG. 13 is a schematic block diagram of a video image encapsulation apparatus according to an embodiment of the present application.
FIG. 14 is a schematic block diagram of a video image encapsulation apparatus according to an embodiment of the present application.
FIG. 15 is a schematic block diagram of a codec device according to an embodiment of the present application.
FIG. 16 is a schematic diagram of a codec device according to an embodiment of the present application.
FIG. 17 is a schematic block diagram of a video codec system according to an embodiment of the present application.
Description of Embodiments
The technical solutions in the present application are described below with reference to the accompanying drawings.
FIG. 1 is a schematic flowchart of a video image presentation method according to an embodiment of the present application. The method 100 in FIG. 1 includes the following steps.
110. Obtain a code stream of a first video image.
The first video image may be a part of an original complete video image (the complete video image may also be called an original video image, an original image, or a source image), or the first video image may be a sub-video image obtained by dividing the original complete video image; such a sub-video image may also simply be called a sub-image.
Assume that the first video image is obtained by dividing an original video image, so that the first video image is a sub-image of the original image. The original image may be the spherical image shown in FIG. 2; the spherical image may be an image with a 360-degree field of view. The original image may also be the first-type two-dimensional planar image shown in FIG. 2, which is obtained by mapping the spherical image onto a plane; the first-type two-dimensional planar image may be a longitude-latitude map, or the planar image obtained by mapping the spherical image onto a hexahedron and then unfolding the six faces of the hexahedron. In addition, the original image may also be the second-type two-dimensional image shown in FIG. 2, which is the planar image obtained by performing certain operations (for example, segmentation, sampling, flipping, rotation, mirroring, and splicing) on the first-type two-dimensional image. In FIG. 2, the second-type two-dimensional planar image is obtained by compressing the top region and the bottom region of the first-type two-dimensional image, splicing them together, and arranging them below the middle region.
For example, as shown in FIG. 3, when the original image is a second-type two-dimensional image, the original image may be divided into nine sub-images (four dashed lines divide the two-dimensional planar image into nine regions, each region corresponding to one sub-image), yielding sub-images A, B, C, D, E, F, G, H, and I. The first video image may be any one of the nine sub-images.
Dividing the original image into multiple sub-images facilitates encoding of the video image.
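By way of illustration only, the division step can be sketched in Python as follows; this sketch is not part of the application, and the 3x3 grid shape and the nested-list frame representation are assumptions made purely for the example:

# Minimal sketch: divide a 2D frame into a 3x3 grid of sub-images A..I.
# The grid shape and the nested-list frame representation are assumptions.
def split_into_subimages(frame, rows=3, cols=3):
    """Return a dict mapping labels 'A'..'I' to (x, y, sub-image) tuples."""
    height, width = len(frame), len(frame[0])
    sub_h, sub_w = height // rows, width // cols
    labels = iter("ABCDEFGHI")  # assumes rows * cols == 9, as in FIG. 3
    subimages = {}
    for r in range(rows):
        for c in range(cols):
            x, y = c * sub_w, r * sub_h
            sub = [row[x:x + sub_w] for row in frame[y:y + sub_h]]
            subimages[next(labels)] = (x, y, sub)
    return subimages

if __name__ == "__main__":
    frame = [[(r, c) for c in range(12)] for r in range(9)]  # toy 12x9 frame
    subs = split_into_subimages(frame)
    print(sorted(subs))                 # ['A', 'B', ..., 'I']
    print(subs["A"][0], subs["A"][1])   # origin of sub-image A: 0 0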
120. Parse the code stream to determine the first video image and first information of the first video image.
The first information may be used to indicate whether the first video image is presented as one continuous region.
The code stream of the first video image may be a code stream generated by the encoder side when encoding the first video image; by parsing this code stream, not only the first video image but also the first information of the first video image can be obtained.
When the first video image is sub-image A in FIG. 3, the first information of sub-image A specifically indicates that sub-image A is presented as one continuous region, because sub-image A contains the image of one continuous area within the middle region of the first-type two-dimensional image. Similarly, when the first video image is any of sub-images B to F in FIG. 3, the first information of the first video image also indicates that the first video image can be presented as one continuous region.
When the first video image is sub-image G in FIG. 3, the first information of this sub-image specifically indicates that the sub-image is not one continuous region in the finally displayed image, because sub-image G contains images of two areas of the first-type two-dimensional image (part of the middle region and part of the top region), and the images of these two areas are not adjacent. Therefore, when the sub-image is sub-image G, it is a discontinuous region in the finally displayed image. Likewise, when the sub-image is sub-image H or sub-image I in FIG. 3, the first information of the sub-image also indicates that the sub-image is not one continuous region in the finally displayed image.
130. Present the first video image according to the first information.
It should be understood that the method 100 may be performed by a video presentation device, which may also be a decoder-side device, a decoder, or a device with a decoding function.
In the present application, whether a video image is one continuous region in the finally displayed image is taken into account when the video image is presented, so the video image can be presented better and the display effect is improved.
Specifically, when a video image is one continuous region in the finally displayed image, the video image can be presented directly; when the video image is not one continuous region in the finally displayed image, the video image can be stitched with other video images before being displayed.
Presenting the first video image according to the first information may specifically involve the following two cases.
Case 1: the first video image can be presented as one continuous region.
In this case, because the first video image will form one continuous region once presented, the image content of the first video image can be displayed directly.
When it is determined that the first video image can be presented as one continuous region, presenting the image content of the first video image ensures that the displayed image content is continuous, which guarantees a certain display effect.
Specifically, when the first video image can be presented as one continuous region, continuous image content is displayed when the first video image is finally mapped onto the sphere; when the first video image is not presented as one continuous region, directly mapping the first video image onto the sphere for display may result in discontinuous image content being displayed on the sphere, which degrades the visual experience.
Case 2: the first video image is not presented as one continuous region.
In this case, if the first video image is presented directly onto the sphere, discontinuous image content will appear on the sphere (for example, two completely unrelated pieces of image content may be displayed).
Therefore, in this case, a second video image may be obtained from the code stream of the second video image, where the second video image is a video image that adjoins at least a part of (the image content of) the first video image at presentation time; the first video image and the second video image are then stitched according to their positional relationship at presentation time before being presented.
It should be understood that the positional relationship between the first video image and the second video image at presentation time may be obtained directly by parsing the code stream of the entire video, or may be determined from the position information of the first video image and the second video image obtained from their respective code streams.
When the first video image cannot be presented as one continuous region, the second video image whose content adjoins the first video image needs to be stitched with the first video image according to their positional relationship at presentation time before display, to ensure that a continuous image is displayed and to improve the display effect.
For example, when the first video image is sub-image G in FIG. 3, the positions of sub-image G in the first-type two-dimensional planar image and on the spherical image are shown in FIG. 4 and FIG. 5, respectively. Specifically, sub-image G is located at the left part of the top region and at the lower-left corner of the middle region of the first-type two-dimensional planar image in FIG. 4 (the shaded areas in FIG. 4). On the spherical image in FIG. 5, sub-image G is located at the shaded areas labeled 1 and 2. It can be seen that sub-image G forms two discontinuous regions both in the first-type two-dimensional planar image and on the spherical image; therefore, if the sub-image were presented directly, two discontinuous pieces of image content would be displayed, yielding a poor display effect.
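As an illustrative sketch only (the canvas size and the (x, y) placement tuples below are assumptions for the example, not part of the application), stitching a discontinuous sub-image with an adjacent sub-image at their rendering positions before display can be written as:

# Minimal sketch: when the first information indicates that a sub-image is
# not presented as one continuous region, composite it with an adjacent
# sub-image at their presentation positions before display.
def composite(regions, canvas_w, canvas_h, fill=0):
    """regions: list of (x, y, 2D pixel block) placed on a shared canvas."""
    canvas = [[fill] * canvas_w for _ in range(canvas_h)]
    for x, y, block in regions:
        for dy, row in enumerate(block):
            canvas[y + dy][x:x + len(row)] = row
    return canvas

# Toy data: one area of sub-image G plus an adjoining sub-image, stitched
# according to an assumed positional relationship at presentation time.
g_part = [[7, 7], [7, 7]]
neighbor = [[8, 8], [8, 8]]
stitched = composite([(0, 0, g_part), (2, 0, neighbor)], canvas_w=4, canvas_h=2)
print(stitched)  # [[7, 7, 8, 8], [7, 7, 8, 8]]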
The first information may be implemented in many ways. For example, the first information may be described in new syntax extended in the Track Group Type Box of the first video image; specifically, syntax in the SubPictureCompositionBox may be used to describe the first information.
Specifically, for the first information, the value of content_continuity may be used to indicate whether the first video image is presented as one continuous region. The syntax is as follows:
aligned(8) class SubPictureCompositionBox extends TrackGroupTypeBox('spco')
{
    ...
    unsigned int(8) content_continuity;
    ...
}
When content_continuity = 0, the first video image is presented as one continuous region;
when content_continuity = 1, the first video image is not presented as one continuous region.
It should be understood that the above is only one specific case in which different values of content_continuity indicate whether the first video image is presented as one continuous region; in fact, other values of content_continuity may also be used to indicate whether the first video image is presented as one continuous region.
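Purely as an illustration (the field offset within the box payload is an assumption here, since the box carries other fields elided as "..." in the syntax above, and a real parser must follow the complete box layout), reading content_continuity could look like:

import struct

# Minimal sketch: read the 8-bit content_continuity field from the payload
# of the extended SubPictureCompositionBox ('spco').
def parse_content_continuity(payload: bytes, offset: int = 0) -> bool:
    (value,) = struct.unpack_from(">B", payload, offset)
    # 0: presented as one continuous region; 1: not one continuous region.
    return value == 0

print(parse_content_continuity(bytes([0])))  # True: present directly
print(parse_content_continuity(bytes([1])))  # False: stitch before presenting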
FIG. 6 is a schematic flowchart of a video image presentation method according to an embodiment of the present application. The method 600 in FIG. 6 includes the following steps.
610. Obtain a code stream of a first video image.
The first video image may be a part of an original complete video image (the complete video image may also be called an original video image, an original image, or a source image), or the first video image may be a sub-video image obtained by dividing the original complete video image; such a sub-video image may also simply be called a sub-image.
When the first video image is a sub-image obtained by dividing an original video image, the original video image may be the spherical image, the first-type two-dimensional planar image, or the second-type two-dimensional planar image shown in FIG. 2. The spherical image may be an image with a 360-degree field of view; the first-type two-dimensional planar image may be the planar image obtained by mapping the spherical image onto a plane, and may be either a longitude-latitude map or the planar image obtained by mapping the spherical image onto a hexahedron and then unfolding its six faces. The second-type two-dimensional image may be the planar image obtained by performing certain operations (for example, segmentation, sampling, flipping, rotation, mirroring, and splicing) on the first-type two-dimensional image; specifically, in FIG. 2, the second-type two-dimensional planar image is obtained by compressing the top and bottom regions of the first-type two-dimensional image, splicing them together, and arranging them below the middle region.
For example, as shown in FIG. 3, when the original video image of the first video image is a second-type two-dimensional image, the original video image may be divided into nine sub-images, and the first video image may be any one of the nine sub-images.
620. Parse the code stream to determine the first video image and second information of the first video image, where the second information is used to indicate an image type of the first video image, and the image type of the first video image includes a spherical image, a two-dimensional planar image that has not undergone first-operation processing, and a two-dimensional planar image that has undergone first-operation processing.
The first operation may be at least one of segmentation, sampling, flipping, rotation, mirroring, and splicing.
It should be understood that the image type of the first video image is the same as the image type of its original video image. For example, if the original video image is a first-type two-dimensional planar image, the image type of the first video image obtained by dividing the original video image is also a first-type two-dimensional planar image.
The two-dimensional planar image that has not undergone first-operation processing may be the first-type two-dimensional planar image in FIG. 2; this type of two-dimensional planar image is obtained by directly mapping the spherical image onto a plane, and no first operation is performed after the mapping.
The two-dimensional planar image that has undergone first-operation processing may be the second-type two-dimensional planar image in FIG. 2; this type of two-dimensional planar image is obtained by directly mapping the spherical image onto a plane to obtain a first-type two-dimensional planar image, and then performing operations such as segmentation, sampling, flipping, and splicing on the first-type two-dimensional planar image.
In addition, the first operation may be referred to as packing, and the second operation as inverse packing.
630. Present the first video image according to the second information.
When an image is presented, a video image and its image type can be obtained from the code stream of that video image, so that subsequent operations can be initialized in advance according to the image type of the video image; this reduces the latency of presenting the video image and improves display efficiency.
Specifically, the image type of a video image can be obtained while the video image is being parsed, so which operations need to be performed on the video image later can be determined earlier from its image type, and those operations can then be initialized first, instead of having to wait until the code stream of the entire video has been parsed before starting them, as in the prior art. This reduces the latency of presenting the video image and improves display efficiency.
It should be understood that the method 600 may be performed by a video presentation device, which may also be a decoder-side device, a decoder, or a device with a decoding function.
When the first video image belongs to different image types, the process of presenting the first video image differs accordingly; specifically, the following three cases are possible.
(1) The first video image is a spherical image.
When the first video image is a spherical image, the first video image can be presented directly on the sphere for display; that is, the first video image can be presented in a spherical display manner. Specifically, when the first video image is a spherical image, the first video image is a part (or all) of the original video image (which is also a spherical image); in this case, the first video image is simply rendered, according to its position information on the sphere, at the corresponding position on the sphere and displayed directly.
(2) The first video image is a two-dimensional planar image that has not undergone first-operation processing.
In this case, the first video image may be the first-type two-dimensional planar image shown in FIG. 2. When the image is presented, the first video image is first mapped to a spherical image, and the spherical image is then presented in a spherical display manner.
When the first video image is a first-type two-dimensional planar image, the first video image needs to be mapped to a spherical image before it can be displayed on the sphere; otherwise, if the image type of the first video image is unknown and the first video image is presented directly, a display error may occur. Therefore, the image type of the first video image can be determined from the second information, so that the first video image is displayed correctly.
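As an illustrative sketch only, assuming the first-type two-dimensional planar image is a longitude-latitude map (one of the two layouts mentioned above; the hexahedron layout would need a different mapping, and this sketch is not part of the application), a pixel can be mapped back onto the unit sphere as follows:

import math

# Minimal sketch: map a pixel of a longitude-latitude (equirectangular)
# planar image back onto the unit sphere.
def erp_pixel_to_sphere(u, v, width, height):
    # Pixel centre -> longitude in [-pi, pi), latitude in (-pi/2, pi/2).
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi
    # Corresponding point on the unit sphere.
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

print(erp_pixel_to_sphere(1920, 960, 3840, 1920))  # approximately (1, 0, 0)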
(3) The first video image is a two-dimensional planar image that has undergone first-operation processing.
In this case, the first video image may be the second-type two-dimensional planar image shown in FIG. 2. To present this first video image, second-operation processing needs to be performed on the first video image first to obtain a first video image after second-operation processing, where the second operation is the inverse operation (also called the reverse operation) of the first operation; the first video image after second-operation processing is then mapped onto the sphere to obtain a spherical image, and the spherical image is presented in a spherical display manner.
When the first video image is a second-type two-dimensional planar image, the second operation needs to be performed on the first video image first, the first video image after the second operation then needs to be mapped to a spherical image, and only then can the image be displayed on the sphere; otherwise, directly mapping the first video image to a spherical image and presenting that spherical image in a spherical display manner would also cause a display error. Therefore, the image type of the first video image can be determined from the second information, so that the first video image is displayed correctly.
It should be understood that when the first operation is flipping, the second operation is also flipping; the second operation restores the video image to its state before the first operation. In other words, the second operation is the restoring operation of the first operation: through the second operation, an image processed by the first operation can be restored to its state before first-operation processing.
When the first video image is the second-type two-dimensional planar image shown in FIG. 2, the images of the top region and the bottom region can be enlarged, and the enlarged top-region and bottom-region images moved above and below the middle region respectively, finally yielding the first-type two-dimensional planar image shown in FIG. 2.
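For illustration only, a second operation matching the packing described above can be sketched as follows; the 2x downsampling factor and the strip layout (downsampled top and bottom regions spliced side by side below the middle region) are assumptions for the example and not part of the application:

# Minimal sketch of a second operation (inverse packing): the packed frame's
# bottom strip holds the horizontally downsampled top and bottom regions side
# by side; the inverse upsamples them 2x and restores them above and below
# the middle region.
def upsample_2x_horizontal(block):
    return [[px for px in row for _ in (0, 1)] for row in block]

def inverse_packing(packed, strip_h):
    middle = [row[:] for row in packed[:-strip_h]]
    strip = packed[-strip_h:]
    half = len(packed[0]) // 2
    top = upsample_2x_horizontal([row[:half] for row in strip])
    bottom = upsample_2x_horizontal([row[half:] for row in strip])
    return top + middle + bottom

packed = [[1, 1, 1, 1], [2, 2, 3, 3]]      # toy frame: middle row + packed strip
print(inverse_packing(packed, strip_h=1))  # [[2,2,2,2],[1,1,1,1],[3,3,3,3]]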
The second information may be implemented in many ways. For example, the second information may be described in new syntax extended in the Track Group Type Box of the first video image; specifically, syntax in the SubPictureCompositionBox may be used to describe the second information.
Specifically, for the second information, the value of fullpictureType may be used to indicate the image type of the video image. The syntax is as follows:
aligned(8) class SubPictureCompositionBox extends TrackGroupTypeBox('spco')
{
    ...
    unsigned int(8) fullpictureType;
    ...
}
When fullpictureType = 0, the first video image is a spherical image;
when fullpictureType = 1, the first video image is a two-dimensional planar image that has not undergone the first operation;
when fullpictureType = 2, the first video image is a two-dimensional planar image that has undergone the first operation.
It should be understood that the above is only one specific case in which different values of fullpictureType indicate the image type of the first video image; in fact, other values of fullpictureType may also be used to indicate the image type of the first video image.
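As an illustrative sketch (the three helper functions are placeholders assumed for the example, not interfaces defined by this application), dispatching the presentation path on the parsed fullpictureType value could look like:

# Minimal sketch: select the presentation path from fullpictureType,
# mirroring the three cases above. The helpers are placeholder assumptions.
def render_on_sphere(img):
    print("presenting on sphere:", img)

def map_to_sphere(img):            # 2D plane -> spherical image (placeholder)
    return ("sphere", img)

def inverse_first_operation(img):  # the second operation (placeholder)
    return ("restored", img)

def present(first_video_image, fullpicture_type):
    if fullpicture_type == 0:      # spherical image
        render_on_sphere(first_video_image)
    elif fullpicture_type == 1:    # 2D planar image, first operation not applied
        render_on_sphere(map_to_sphere(first_video_image))
    elif fullpicture_type == 2:    # 2D planar image after the first operation
        render_on_sphere(map_to_sphere(inverse_first_operation(first_video_image)))
    else:
        raise ValueError("unknown fullpictureType: %d" % fullpicture_type)

present("frame", 2)  # presenting on sphere: ('sphere', ('restored', 'frame'))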
Optionally, the second information may also include two pieces of sub-information: first sub-information and second sub-information. The first sub-information indicates whether the first video image is a spherical image or a two-dimensional planar image; when the first sub-information indicates that the video image is a two-dimensional planar image, the second sub-information indicates whether the video image has undergone the first operation.
That is, when the first video image is a spherical image, the second information includes only the first sub-information; when the first video image is a two-dimensional planar image, the second information includes the second sub-information in addition to the first sub-information, where the first sub-information indicates that the first video image is a two-dimensional planar image, and the second sub-information indicates whether the first video image has undergone the first operation.
For the first sub-information in the second information, the value of fullpictureType may also be used to indicate the image type of the first video image. The syntax is as follows:
aligned(8) class SubPictureCompositionBox extends TrackGroupTypeBox('spco')
{
    ...
    unsigned int(8) fullpictureType;
    ...
}
When fullpictureType = 0, the first video image is a spherical image;
when fullpictureType = 1, the first video image is a two-dimensional planar image.
It should be understood that the above is only one specific case in which different values of fullpictureType indicate the image type of the first video image; in fact, other values of fullpictureType may also be used to indicate the image type of the first video image.
The second sub-information in the second information may be expressed with a statement similar to that of the first sub-information. Specifically, the value of packing may be used to indicate the image type of the first video image. The syntax is as follows:
aligned(8) class SubPictureCompositionBox extends TrackGroupTypeBox('spco')
{
    ...
    unsigned int(8) packing;
    ...
}
When packing = 0, the first video image has not undergone the first operation;
when packing = 1, the first video image has undergone the first operation.
It should be understood that the above is only one specific case in which different values of packing indicate the image type of the first video image (whether the video image has undergone the first operation); in fact, other values of packing may also be used to indicate the image type of the first video image.
Optionally, the method 100 or the method 600 further includes: determining third information of the first video image from the code stream of the first video image, where the third information is used to indicate whether the first video image is a full-picture image; and presenting the first video image according to the third information.
It should be understood that the full-picture image here may be the complete to-be-displayed image, and the first video image may be either the whole of the complete to-be-displayed image or only a part of it.
Specifically, when the third information indicates that the first video image is a full-picture image, after parsing the third information, the decoder side or the video presentation apparatus can determine that the first video image contains the entire image rather than a partial image, so the image content at any position in the entire image can be presented without relying on other video images. When the third information indicates that the first video image is a partial image of the entire image, after parsing the third information, the decoder side or the video presentation apparatus also needs to parse the position information and resolution information of the first video image to determine the position of the first video image in the entire image, and then present the first video image.
For the third information, the value of fullpicture may be used to indicate whether the first video image is a full-picture image. The syntax is as follows:
aligned(8) class SubPictureCompositionBox extends TrackGroupTypeBox('spco')
{
    ...
    unsigned int(8) fullpicture;
    ...
}
When fullpicture = 0, the first video image is a full-picture image;
when fullpicture = 1, the first video image is a partial image of the full-picture image.
It should be understood that the above is only one specific case in which different values of fullpicture indicate whether the first video image is a full-picture image; in fact, other values of fullpicture may also be used to indicate whether the first video image is a full-picture image.
It should be understood that in the present application, at least one of the first information, the second information, and the third information may be obtained by parsing the code stream of the first video image, and the first video image may be presented according to one or more of these three pieces of information.
Therefore, solutions that present the first video image according to one or more of the first information, the second information, and the third information all fall within the protection scope of the present application.
Optionally, presenting the first video image according to the first information and the second information includes: determining, according to the first information, whether the first video image is presented as one continuous region; determining the image type of the first video image according to the second information; and presenting the image content of the first video image according to whether the first video image is presented as one continuous region and the image type of the first video image. An illustrative sketch combining the two pieces of information follows the three embodiments below.
Optionally, in an embodiment, presenting the image content of the first video image according to whether the first video image is presented as one continuous region and the image type of the first video image includes: when the first video image is presented as one continuous region and the first video image is a spherical image, presenting the first video image (directly) in a spherical display manner.
Optionally, in an embodiment, presenting the image content of the first video image according to whether the first video image is presented as one continuous region and the image type of the first video image includes: when the first video image is presented as one continuous region and the first video image is a two-dimensional planar image that has not undergone first-operation processing, mapping the first video image to a spherical image, and presenting the spherical image in a spherical display manner.
Optionally, in an embodiment, presenting the image content of the first video image according to whether the first video image is presented as one continuous region and the image type of the first video image includes: when the first video image is presented as one continuous region and the first video image is a two-dimensional planar image that has undergone first-operation processing, performing a second operation on the first video image to obtain a first video image after second-operation processing, where the second operation is the inverse or reverse operation of the first operation; mapping the first video image after second-operation processing to a spherical image; and presenting the spherical image in a spherical display manner.
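Again purely as a sketch (stitch_with_neighbor and the present dispatch below are placeholder assumptions in the spirit of the examples above, not part of the application), combining the first information with the second information could look like:

# Minimal sketch: stitch first when the sub-image is not one continuous
# region, then choose the display path according to the image type.
def stitch_with_neighbor(img, neighbor):   # placeholder stitching step
    return ("stitched", img, neighbor)

def present(img, fullpicture_type):        # placeholder type dispatch
    print("presenting", img, "as type", fullpicture_type)

def present_with_both_infos(img, content_continuity, fullpicture_type, neighbor):
    if content_continuity == 1:            # not presented as one continuous region
        img = stitch_with_neighbor(img, neighbor)
    present(img, fullpicture_type)

present_with_both_infos("sub-image G", 1, 2, "adjacent sub-image")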
It should be understood that in the embodiments of the present application, the position information of the first video image in the entire video image may also be needed when the image content of the first video image is presented.
The position information of the first video image may be implemented in many ways. For example, the position information of the first video image may be described in new syntax extended in the Track Group Type Box of the first video image; specifically, syntax in the SubPictureCompositionBox may be used to describe the position information of the first video image.
The syntax describing the position information of the first video image is as follows:
aligned(8) class SubPictureCompositionBox extends TrackGroupTypeBox('spco')
{
    ...
    unsigned int(16) track_x;
    unsigned int(16) track_y;
    unsigned int(16) track_width;
    unsigned int(16) track_height;
    unsigned int(16) composition_width;
    unsigned int(16) composition_height;
    ...
}
Here, track_x indicates the horizontal position of the top-left corner of the first video image in the entire video image (also called the original video image); its value is a natural number in the range [0, composition_width - 1];
track_y indicates the vertical position of the top-left corner of the first video image in the entire video image; its value is a natural number in the range [0, composition_height - 1];
track_width indicates the width of the first video image; its value is an integer in the range [1, composition_width - track_x];
track_height indicates the height of the first video image; its value is an integer in the range [1, composition_height - track_y];
composition_width indicates the width of the entire composed video image;
composition_height indicates the height of the entire composed video image.
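As an illustration only (reading composition_width and composition_height as the dimensions of the entire composed picture, consistent with the value ranges above; this sketch is not part of the application), validating the position fields and computing the placement rectangle of the first video image could be sketched as:

# Minimal sketch: check the position fields against their stated ranges and
# return the sub-image's placement rectangle in the composed picture.
def placement(track_x, track_y, track_width, track_height,
              composition_width, composition_height):
    assert 0 <= track_x <= composition_width - 1
    assert 0 <= track_y <= composition_height - 1
    assert 1 <= track_width <= composition_width - track_x
    assert 1 <= track_height <= composition_height - track_y
    return (track_x, track_y, track_x + track_width, track_y + track_height)

print(placement(1280, 0, 640, 480, 3840, 1920))  # (1280, 0, 1920, 480)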
The video image presentation methods of the embodiments of the present application have been described in detail above with reference to FIG. 1 to FIG. 6. The video image encapsulation methods of the embodiments of the present application are described below, from the perspective of video image encapsulation, with reference to FIG. 7 and FIG. 8. It should be understood that the encapsulation methods shown in FIG. 7 and FIG. 8 correspond to the method 100 and the method 600 above, respectively. For brevity, repeated descriptions are appropriately omitted below.
FIG. 7 is a schematic flowchart of a video image encapsulation method according to an embodiment of the present application. The method 700 in FIG. 7 includes the following steps.
710. Determine first information of a first video image, where the first information is used to indicate whether the first video image is one continuous region in a to-be-encoded image corresponding to the first video image.
720. Encode the first video image and the first information to obtain a code stream of the first video image.
730. Encapsulate the code stream to obtain an image track of the first video image.
The first video image may be a part of an original complete video image, or the first video image may be a sub-video image obtained by dividing the original complete video image; such a sub-video image may also simply be called a sub-image.
By also encoding, into the code stream of a video image, information indicating whether the video image is one continuous region in the to-be-encoded image, whether the video image is one continuous region in the to-be-displayed image can be taken into account when the video image is presented, so that the video image can be presented better and the display effect is improved.
For example, when the video image is one continuous region in the finally displayed image, the video image can be presented directly; when the video image is not one continuous region in the finally displayed image, the video image can be stitched with other video images before being displayed.
FIG. 8 is a schematic flowchart of a video image encapsulation method according to an embodiment of the present application. The method 800 in FIG. 8 includes the following steps.
810. Determine second information of a first video image, where the second information is used to indicate an image type of the first video image, and the image type includes a spherical image, a two-dimensional planar image that has not undergone first-operation processing, and a two-dimensional planar image that has undergone first-operation processing, where the first operation is at least one of segmentation, sampling, flipping, rotation, mirroring, and splicing.
820. Encode the first video image and the second information to obtain a code stream of the first video image.
830. Encapsulate the code stream to obtain an image track of the first video image.
The first video image may be a part of an original complete video image, or the first video image may be a sub-video image obtained by dividing the original complete video image; such a sub-video image may also simply be called a sub-image.
By also encoding the image type information of a video image into the code stream of that video image, the video image and its image type can be obtained from the code stream when the image is presented, so that subsequent operations can be initialized in advance according to the image type; this reduces the latency of presenting the video image and improves display efficiency.
Specifically, a video presentation device can obtain the image type of a video image while parsing the video image, so it can determine earlier, from the image type, which operations need to be performed on the video image later, and can initialize those operations first, instead of having to wait until the code stream of the entire video has been parsed before starting them, as in the prior art. This reduces the latency of presenting the video image and improves display efficiency.
For a better understanding of the video image presentation and encapsulation methods of the embodiments of the present application, the following briefly describes, with reference to FIG. 9 and FIG. 10, the processes of generating and parsing the code stream of a sub-image (equivalent to the first video image above) during video image processing.
FIG. 9 is a schematic flowchart of generating a code stream of a sub-image. In FIG. 9, a sub-image division module divides the entire input image into multiple sub-images, determines metadata of each sub-image, and then outputs the sub-images; an encoder encodes each input sub-image to produce a raw video bitstream; and a bitstream encapsulation module encapsulates the input raw video bitstream and the metadata into the sub-image code stream.
The raw video bitstream data is a bitstream conforming to the ITU-T H.264 or ITU-T H.265 specification; the metadata of a sub-image may contain at least one of the first information, the second information, and the third information above, and the metadata may be obtained either from the sub-image division module or from preset division conditions.
FIG. 10 is a schematic flowchart of parsing a code stream of a sub-image. In FIG. 10, a bitstream decapsulation module obtains the code stream data of a sub-image and parses it to obtain the video metadata and the raw video bitstream data. The image information of the sub-image can then be obtained from the video metadata, and the sub-image is presented according to the image information of the sub-image and the sub-image parsed from the raw video bitstream data.
The video image presentation and encapsulation methods of the embodiments of the present application have been described above with reference to FIG. 1 to FIG. 10. The video image presentation and encapsulation apparatuses of the embodiments of the present application are described below with reference to FIG. 11 to FIG. 14. It should be understood that the presentation apparatuses in FIG. 11 to FIG. 14 can implement the video image decoding methods in FIG. 1 to FIG. 10, and the encapsulation apparatuses can implement the video image encoding methods in FIG. 1 to FIG. 10. For brevity, repeated descriptions are appropriately omitted below.
FIG. 11 is a schematic block diagram of a video image presentation apparatus according to an embodiment of the present application. The apparatus 1100 includes:
an obtaining module 1110 configured to obtain a code stream of a first video image;
a parsing module 1120 configured to parse the code stream to determine the first video image and first information of the first video image, where the first information is used to indicate whether the first video image is presented as one continuous region; and
a presentation module 1130 configured to present the first video image according to the first information.
Optionally, in an embodiment, the presentation module 1130 is specifically configured to present the first video image when the first information indicates that the first video image is presented as one continuous region.
Optionally, in an embodiment, at least a part of the first video image and a second video image are adjacent when presented, and the presentation module 1130 is specifically configured to: when the first information indicates that the first video image is not presented as one continuous region, stitch the first video image and the second video image according to their positional relationship at presentation time and then present the result.
FIG. 12 is a schematic block diagram of a video image presentation apparatus according to an embodiment of the present application. The apparatus 1200 includes:
an obtaining module 1210 configured to obtain a code stream of a first video image;
a parsing module 1220 configured to parse the code stream to determine the first video image and second information of the first video image, where the second information is used to indicate an image type of the first video image, the image type including a spherical image, a two-dimensional planar image that has not undergone first-operation processing, and a two-dimensional planar image that has undergone first-operation processing, where the first operation is at least one of segmentation, sampling, flipping, rotation, mirroring, and splicing; and
a presentation module 1230 configured to present the first video image according to the second information.
Optionally, in an embodiment, the presentation module 1230 is specifically configured to: when the second information indicates that the first video image is a spherical image, present the first video image in a spherical display manner.
Optionally, in an embodiment, the presentation module 1230 is specifically configured to: when the second information indicates that the first video image is the two-dimensional planar image that has not undergone first-operation processing, map the sub-image to a spherical image, and present the spherical image in a spherical display manner.
Optionally, in an embodiment, the presentation module 1230 is specifically configured to: when the second information indicates that the first video image is the two-dimensional planar image that has undergone first-operation processing, perform a second operation on the first video image to obtain a first video image after second-operation processing, where the second operation is an inverse operation of the first operation; map the first video image after second-operation processing to a spherical image; and present the spherical image in a spherical display manner.
FIG. 13 is a schematic block diagram of a video image encapsulation apparatus according to an embodiment of the present application. The apparatus 1300 includes:
a determining module 1310 configured to determine first information of the first video image, where the first information is used to indicate whether the first video image is one continuous region in a to-be-encoded image corresponding to the first video image;
an encoding module 1320 configured to encode the first video image and the first information to obtain a code stream of the first video image; and
an encapsulating module 1330 configured to encapsulate the code stream to obtain an image track of the first video image.
FIG. 14 is a schematic block diagram of a video image encapsulation apparatus according to an embodiment of the present application. The apparatus 1400 includes:
a determining module 1410 configured to determine second information of a first video image, where the second information is used to indicate an image type of the first video image, the image type including a spherical image, a two-dimensional planar image that has not undergone first-operation processing, and a two-dimensional planar image that has undergone first-operation processing, where the first operation is at least one of segmentation, sampling, flipping, rotation, mirroring, and splicing;
an encoding module 1420 configured to encode the first video image and the second information to obtain a code stream of the first video image; and
an encapsulating module 1430 configured to encapsulate the code stream to obtain an image track of the first video image.
It should be understood that the video image presentation and encapsulation methods in the present application may be performed by a codec device or by a system composed of codec devices; in addition, the video image presentation and encapsulation apparatuses above may also specifically be a codec device or a codec system.
A codec device and a codec system composed of codec devices are described in detail below with reference to FIG. 15 to FIG. 17. It should be understood that the codec device and the codec system in FIG. 15 to FIG. 17 can perform the video image presentation method and the video image encapsulation method described above.
FIG. 15 and FIG. 16 show a codec device 50 according to an embodiment of the present application; the codec device 50 may be a mobile terminal or user equipment of a wireless communication system. It should be understood that the embodiments of the present application may be implemented in any electronic device or apparatus that may need to encode and/or decode video images.
The codec device 50 may include a housing 30 for incorporating and protecting the device, a display 32 (which may specifically be a liquid crystal display), and a keypad 34. The codec device 50 may include a microphone 36 or any suitable audio input, which may accept a digital or analog signal input. The codec device 50 may further include an audio output device, which in the embodiments of the present application may be any one of the following: an earphone 38, a speaker, or an analog audio or digital audio output connection. The codec device 50 may also include a battery 40; in other embodiments of the present application, the device may be powered by any suitable mobile energy device, such as a solar cell, a fuel cell, or a clockwork generator. The device may further include an infrared port 42 for short-range line-of-sight communication with other devices. In other embodiments, the codec device 50 may further include any suitable short-range communication solution, such as a Bluetooth wireless connection or a USB/FireWire wired connection.
The codec device 50 may include a controller 56 or processor for controlling the codec device 50. The controller 56 may be connected to a memory 58, which in the embodiments of the present application may store image data and audio data, and/or may also store instructions to be executed on the controller 56. The controller 56 may also be connected to a codec 54 suitable for implementing encoding and decoding of audio and/or video data, or for assisting the encoding and decoding performed by the controller 56.
The codec device 50 may further include a card reader 48 and a smart card 46, for example a Universal Integrated Circuit Card (UICC) and a UICC reader, for providing user information and for providing authentication information for authenticating and authorizing the user on a network.
The codec device 50 may further include a radio interface circuit 52 connected to the controller and suitable for generating wireless communication signals, for example for communicating with a cellular communication network, a wireless communication system, or a wireless local area network. The codec device 50 may further include an antenna 44 connected to the radio interface circuit 52 for transmitting radio frequency signals generated by the radio interface circuit 52 to other device(s) and for receiving radio frequency signals from other device(s).
In some embodiments of the present application, the codec device 50 includes a camera capable of recording or detecting individual frames, and the codec 54 or the controller receives and processes these frames. In some embodiments of the present application, the codec device 50 may receive the to-be-processed video image data from another device before transmission and/or storage. In some embodiments of the present application, the codec device 50 may receive images for encoding/decoding over a wireless or wired connection.
FIG. 17 is a schematic block diagram of a video codec system 10 according to an embodiment of the present application. As shown in FIG. 17, the video codec system 10 includes a source device 12 and a destination device 14. The source device 12 produces encoded video data; therefore, the source device 12 may be referred to as a video encoding apparatus or a video encoding device. The destination device 14 can decode the encoded video data produced by the source device 12; therefore, the destination device 14 may be referred to as a video decoding apparatus or a video decoding device. The source device 12 and the destination device 14 may be examples of video codec apparatuses or video codec devices. The source device 12 and the destination device 14 may include desktop computers, mobile computing devices, notebook (for example, laptop) computers, tablet computers, set-top boxes, handsets such as smartphones, televisions, cameras, display devices, digital media players, video game consoles, in-vehicle computers, or other similar devices.
The destination device 14 may receive the encoded video data from the source device 12 via a channel 16. The channel 16 may include one or more media and/or devices capable of moving the encoded video data from the source device 12 to the destination device 14. In one instance, the channel 16 may include one or more communication media that enable the source device 12 to transmit the encoded video data directly to the destination device 14 in real time. In this instance, the source device 12 may modulate the encoded video data according to a communication standard (for example, a wireless communication protocol) and may transmit the modulated video data to the destination device 14. The one or more communication media may include wireless and/or wired communication media, for example a radio frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form part of a packet-based network (for example, a local area network, a wide area network, or a global network such as the Internet). The one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from the source device 12 to the destination device 14.
In another instance, the channel 16 may include a storage medium storing the encoded video data produced by the source device 12. In this instance, the destination device 14 may access the storage medium via disk access or card access. The storage medium may include a variety of locally accessible data storage media, such as Blu-ray discs, DVDs, CD-ROMs, flash memory, or other suitable digital storage media for storing encoded video data.
In another instance, the channel 16 may include a file server or another intermediate storage device that stores the encoded video data produced by the source device 12. In this instance, the destination device 14 may access, via streaming or download, the encoded video data stored at the file server or other intermediate storage device. The file server may be of a server type capable of storing the encoded video data and transmitting the encoded video data to the destination device 14. For example, the file server may include a web server (for example, for a website), a File Transfer Protocol (FTP) server, a network-attached storage (NAS) device, and a local disk drive.
The destination device 14 may access the encoded video data via a standard data connection (for example, an Internet connection). Instance types of data connections include wireless channels (for example, Wi-Fi connections), wired connections (for example, DSL or a cable modem), or combinations of both that are suitable for accessing encoded video data stored on a file server. The transmission of the encoded video data from the file server may be streaming transmission, download transmission, or a combination of both.
The codec methods of the present application are not limited to wireless application scenarios. For example, the codec methods may be applied to video coding and decoding supporting a variety of multimedia applications, such as over-the-air television broadcasting, cable television transmission, satellite television transmission, streaming video transmission (for example, via the Internet), encoding of video data stored on a data storage medium, decoding of video data stored on a data storage medium, or other applications. In some instances, the video codec system 10 may be configured to support one-way or two-way video transmission, to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
In the instance of FIG. 17, the source device 12 includes a video source 18, a video encoder 20, and an output interface 22. In some instances, the output interface 22 may include a modulator/demodulator (modem) and/or a transmitter. The video source 18 may include a video capture device (for example, a video camera), a video archive containing previously captured video data, a video input interface for receiving video data from a video content provider, and/or a computer graphics system for producing video data, or a combination of the above video data sources.
The video encoder 20 may encode video data from the video source 18. In some instances, the source device 12 transmits the encoded video data directly to the destination device 14 via the output interface 22. The encoded video data may also be stored on a storage medium or file server for later access by the destination device 14 for decoding and/or playback.
In the instance of FIG. 17, the destination device 14 includes an input interface 28, a video decoder 30, and a display device 32. In some instances, the input interface 28 includes a receiver and/or a modem. The input interface 28 may receive the encoded video data via the channel 16. The display device 32 may be integrated with the destination device 14 or may be external to the destination device 14. In general, the display device 32 displays the decoded video data. The display device 32 may include a variety of display devices, for example a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or another type of display device.
The video encoder 20 and the video decoder 30 may operate according to a video compression standard (for example, the High Efficiency Video Coding H.265 standard) and may conform to the HEVC Test Model (HM). The textual description of the H.265 standard, ITU-T H.265 (V3) (04/2015), was published on April 29, 2015 and can be downloaded from http://handle.itu.int/11.1002/1000/12455; the entire contents of that document are incorporated herein by reference.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the particular application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not described again here.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. For example, the division into units is merely a logical function division; in actual implementation there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the present application essentially, or the part contributing to the prior art, or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The foregoing storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (18)

  • 1. A video image presentation method, comprising:
    obtaining a code stream of a first video image;
    parsing the code stream to determine the first video image and first information of the first video image, wherein the first information is used to indicate whether the first video image is presented as one continuous region; and
    presenting the first video image according to the first information.
  • 2. The method according to claim 1, wherein the presenting the first video image according to the first information comprises:
    presenting the first video image when the first information indicates that the first video image is presented as one continuous region.
  • 3. The method according to claim 1 or 2, wherein at least a part of the first video image and a second video image are adjacent when presented, and the presenting the sub-image according to the first information comprises:
    when the first information indicates that the first video image is not presented as one continuous region, stitching the first video image and the second video image according to their positional relationship at presentation time and then presenting the result.
  • 4. A video image presentation method, comprising:
    obtaining a code stream of a first video image;
    parsing the code stream to determine the first video image and second information of the first video image, wherein the second information is used to indicate an image type of the first video image, and the image type of the first video image comprises a spherical image, a two-dimensional planar image that has not undergone first-operation processing, and a two-dimensional planar image that has undergone first-operation processing, wherein the first operation is at least one of segmentation, sampling, flipping, rotation, mirroring, and splicing; and
    presenting the first video image according to the second information.
  • 5. The method according to claim 4, wherein the presenting the first video image according to the second information comprises:
    when the second information indicates that the first video image is a spherical image, presenting the first video image in a spherical display manner.
  • 6. The method according to claim 4, wherein the presenting the first video image according to the second information comprises:
    when the second information indicates that the first video image is the two-dimensional planar image that has not undergone first-operation processing, mapping the first video image to a spherical image; and
    presenting the spherical image in a spherical display manner.
  • 7. The method according to claim 4, wherein the presenting the first video image according to the second information comprises:
    when the second information indicates that the first video image is the two-dimensional planar image that has undergone first-operation processing, performing a second operation on the first video image to obtain a first video image after second-operation processing, wherein the second operation is an inverse operation of the first operation;
    mapping the first video image after second-operation processing to a spherical image; and
    presenting the spherical image in a spherical display manner.
  • 8. A video image encapsulation method, comprising:
    determining first information of the first video image, wherein the first information is used to indicate whether the first video image is one continuous region in a to-be-encoded image corresponding to the first video image;
    encoding the first video image and the first information to obtain a code stream of the first video image; and
    encapsulating the code stream to obtain an image track of the first video image.
  • 9. A video image encapsulation method, comprising:
    determining second information of a first video image, wherein the second information is used to indicate an image type of the first video image, and the image type comprises a spherical image, a two-dimensional planar image that has not undergone first-operation processing, and a two-dimensional planar image that has undergone first-operation processing, wherein the first operation is at least one of segmentation, sampling, flipping, rotation, mirroring, and splicing;
    encoding the first video image and the second information to obtain a code stream of the first video image; and
    encapsulating the code stream to obtain an image track of the first video image.
  • 10. A video image presentation apparatus, comprising:
    an obtaining module configured to obtain a code stream of a first video image;
    a parsing module configured to parse the code stream to determine the first video image and first information of the first video image, wherein the first information is used to indicate whether the first video image is presented as one continuous region; and
    a presentation module configured to present the first video image according to the first information.
  • 11. The apparatus according to claim 10, wherein the presentation module is specifically configured to:
    present the first video image when the first information indicates that the first video image is presented as one continuous region.
  • 12. The apparatus according to claim 10 or 11, wherein at least a part of the first video image and a second video image are adjacent when presented, and the presentation module is specifically configured to:
    when the first information indicates that the first video image is not presented as one continuous region, stitch the first video image and the second video image according to their positional relationship at presentation time and then present the result.
  • 13. A video image presentation apparatus, comprising:
    an obtaining module configured to obtain a code stream of a first video image;
    a parsing module configured to parse the code stream to determine the first video image and second information of the first video image, wherein the second information is used to indicate an image type of the first video image, and the image type comprises a spherical image, a two-dimensional planar image that has not undergone first-operation processing, and a two-dimensional planar image that has undergone first-operation processing, wherein the first operation is at least one of segmentation, sampling, flipping, rotation, mirroring, and splicing; and
    a presentation module configured to present the first video image according to the second information.
  • 14. The apparatus according to claim 13, wherein the presentation module is specifically configured to:
    when the second information indicates that the first video image is a spherical image, present the first video image in a spherical display manner.
  • 15. The apparatus according to claim 13, wherein the presentation module is specifically configured to:
    when the second information indicates that the first video image is the two-dimensional planar image that has not undergone first-operation processing, map the sub-image to a spherical image; and
    present the spherical image in a spherical display manner.
  • 16. The apparatus according to claim 13, wherein the presentation module is specifically configured to:
    when the second information indicates that the first video image is the two-dimensional planar image that has undergone first-operation processing, perform a second operation on the first video image to obtain a first video image after second-operation processing, wherein the second operation is an inverse operation of the first operation;
    map the first video image after second-operation processing to a spherical image; and
    present the spherical image in a spherical display manner.
  • 17. A video image encapsulation apparatus, comprising:
    a determining module configured to determine first information of the first video image, wherein the first information is used to indicate whether the first video image is one continuous region in a to-be-encoded image corresponding to the first video image;
    an encoding module configured to encode the first video image and the first information to obtain a code stream of the first video image; and
    an encapsulating module configured to encapsulate the code stream to obtain an image track of the first video image.
  • 18. A video image encapsulation apparatus, comprising:
    a determining module configured to determine second information of a first video image, wherein the second information is used to indicate an image type of the first video image, and the image type comprises a spherical image, a two-dimensional planar image that has not undergone first-operation processing, and a two-dimensional planar image that has undergone first-operation processing, wherein the first operation is at least one of segmentation, sampling, flipping, rotation, mirroring, and splicing;
    an encoding module configured to encode the first video image and the second information to obtain a code stream of the first video image; and
    an encapsulating module configured to encapsulate the code stream to obtain an image track of the first video image.
PCT/CN2018/088197 2017-05-27 2018-05-24 Video image presentation and encapsulation method and video image presentation and encapsulation apparatus WO2018219202A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/689,517 US20200092531A1 (en) 2017-05-27 2019-11-20 Video image presentation and encapsulation method and video image presentation and encapsulation apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710387835.0A CN108965917B (zh) 2017-05-27 Video image presentation and encapsulation method and video image presentation and encapsulation apparatus
CN201710387835.0 2017-05-27

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/689,517 Continuation US20200092531A1 (en) 2017-05-27 2019-11-20 Video image presentation and encapsulation method and video image presentation and encapsulation apparatus

Publications (1)

Publication Number Publication Date
WO2018219202A1 true WO2018219202A1 (zh) 2018-12-06

Family

ID=64455214

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/088197 WO2018219202A1 (zh) 2017-05-27 2018-05-24 视频图像的呈现、封装方法和视频图像的呈现、封装装置

Country Status (3)

Country Link
US (1) US20200092531A1 (zh)
CN (1) CN108965917B (zh)
WO (1) WO2018219202A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102598082B1 (ko) * 2016-10-28 2023-11-03 삼성전자주식회사 Image display apparatus, mobile apparatus, and operation method thereof
KR102278848B1 (ko) * 2018-07-31 2021-07-19 엘지전자 주식회사 Multi-viewpoint-based 360-degree video processing method and apparatus therefor
CN113489791B (zh) * 2021-07-07 2024-05-14 佳都科技集团股份有限公司 Image uploading method, image processing method, and related apparatus

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101247513A (zh) * 2007-12-25 2008-08-20 谢维信 Method for generating 360° seamless panoramic video images in real time with a single camera
US20130162641A1 (en) * 2010-09-14 2013-06-27 Thomson Licensing Method of presenting three-dimensional content with disparity adjustments
CN104735464A (zh) * 2015-03-31 2015-06-24 华为技术有限公司 Panoramic video interactive transmission method, server, and client
CN105791882A (zh) * 2016-03-22 2016-07-20 腾讯科技(深圳)有限公司 Video encoding method and apparatus
CN105869113A (zh) * 2016-03-25 2016-08-17 华为技术有限公司 Panoramic image generation method and apparatus

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101661265B (zh) * 2009-09-29 2011-01-05 哈尔滨师范大学 Multi-channel holographic recording method for stereoscopic display of digital information
US9407904B2 (en) * 2013-05-01 2016-08-02 Legend3D, Inc. Method for creating 3D virtual reality from 2D images
US10204658B2 (en) * 2014-07-14 2019-02-12 Sony Interactive Entertainment Inc. System and method for use in playing back panorama video content
US9858706B2 (en) * 2015-09-22 2018-01-02 Facebook, Inc. Systems and methods for content streaming
CN106341673A (zh) * 2016-08-15 2017-01-18 李文松 Storage method for a new type of 2D/3D panoramic VR video
CN106358033B (zh) * 2016-08-25 2018-06-19 北京字节跳动科技有限公司 Panoramic video key frame encoding method and apparatus
CN106162207B (zh) * 2016-08-25 2019-02-12 北京字节跳动科技有限公司 Panoramic video parallel encoding method and apparatus


Also Published As

Publication number Publication date
CN108965917A (zh) 2018-12-07
US20200092531A1 (en) 2020-03-19
CN108965917B (zh) 2021-07-20

Similar Documents

Publication Publication Date Title
US11758187B2 (en) Methods, devices and stream for encoding and decoding volumetric video
US11025955B2 (en) Methods, devices and stream for encoding and decoding volumetric video
US11095920B2 (en) Method and apparatus for encoding a point cloud representing three-dimensional objects
CN107454468B (zh) Method, apparatus, and stream for formatting an immersive video
EP3557864A1 (en) Method and apparatus for image display using privacy masking
CN110419224B (zh) Method for consuming video content, electronic device, and server
US20200228777A1 (en) Methods, devices and stream for encoding and decoding three degrees of freedom and volumetric compatible video stream
CN112425177B (zh) Method and apparatus for volumetric video transmission
EP4287637A1 (en) Information processing method and apparatus
US20210219000A1 (en) Point Cloud Encoding Method, Point Cloud Decoding Method, Encoder, and Decoder
US11375235B2 (en) Method and apparatus for encoding and decoding three-dimensional scenes in and from a data stream
US10735826B2 (en) Free dimension format and codec
WO2018219202A1 (zh) Video image presentation and encapsulation method and video image presentation and encapsulation apparatus
WO2018141116A1 (zh) Encoding and decoding method and apparatus
JP2019514313A (ja) Method, apparatus, and stream for formatting immersive video for legacy and immersive rendering devices
US11798195B2 (en) Method and apparatus for encoding and decoding three-dimensional scenes in and from a data stream
WO2020149990A1 (en) Methods and apparatus for multi-encoder processing of high resolution content
CN114945946A (zh) Volumetric video with auxiliary patches
CN112771878B (zh) Method for processing media data, client, and server
US20220138990A1 (en) Methods and devices for encoding and decoding three degrees of freedom and volumetric compatible video stream
WO2020015517A1 (en) Point cloud encoding method, point cloud decoding method, encoder and decoder
RU2809180C2 Method and apparatus for encoding and decoding depth
CN116684629A Video encoding and decoding method and apparatus, electronic device, and medium
CN114760525A Video generation and playback method, apparatus, device, and medium
CN114080799A Processing volumetric data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18810212

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18810212

Country of ref document: EP

Kind code of ref document: A1