CN111935557A - Video processing method, device and system
- Publication number
- CN111935557A, CN201910415208.2A, CN201910415208A
- Authority
- CN
- China
- Prior art keywords
- image
- sub
- code stream
- encoded data
- terminal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000003672 processing method Methods 0.000 title abstract description 13
- 238000000034 method Methods 0.000 claims abstract description 82
- 230000000007 visual effect Effects 0.000 claims abstract description 23
- 238000012545 processing Methods 0.000 claims description 43
- 238000004590 computer program Methods 0.000 claims description 16
- 230000008569 process Effects 0.000 abstract description 17
- 230000005540 biological transmission Effects 0.000 description 16
- 230000006870 function Effects 0.000 description 14
- 238000004891 communication Methods 0.000 description 13
- 238000010586 diagram Methods 0.000 description 12
- 230000004927 fusion Effects 0.000 description 11
- 230000015654 memory Effects 0.000 description 10
- 230000004044 response Effects 0.000 description 5
- 238000013461 design Methods 0.000 description 4
- 230000003068 static effect Effects 0.000 description 4
- 238000005516 engineering process Methods 0.000 description 3
- 238000012546 transfer Methods 0.000 description 3
- 230000003044 adaptive effect Effects 0.000 description 2
- 238000003491 array Methods 0.000 description 2
- 239000000835 fiber Substances 0.000 description 2
- 230000003993 interaction Effects 0.000 description 2
- 230000003287 optical effect Effects 0.000 description 2
- 239000000872 buffer Substances 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 230000003247 decreasing effect Effects 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 238000004806 packaging method and process Methods 0.000 description 1
- 230000011218 segmentation Effects 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
- 230000002123 temporal effect Effects 0.000 description 1
Images
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
- H04N21/8456—Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/21—Server components or server architectures
- H04N21/218—Source of audio or video content, e.g. local disk arrays
- H04N21/2187—Live feed
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/266—Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
- H04N21/2662—Controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44016—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/472—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/472—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
- H04N21/47202—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for requesting content on demand, e.g. video on demand
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/60—Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
- H04N21/65—Transmission of management data between client and server
- H04N21/658—Transmission by the client directed to the server
- H04N21/6587—Control parameters, e.g. trick play commands, viewpoint selection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Human Computer Interaction (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Embodiments of the present application provide a video processing method, apparatus, and system. The method includes: decoding encoded data of a first sub-image covered by a first view angle to obtain the first sub-image; when the first view angle is switched to a second view angle, acquiring encoded data of a second sub-image covered by the second view angle and encoded data of a first reference image of the second sub-image, where the encoded data of the first reference image is located in a code stream independent of the code stream in which the encoded data of the second sub-image is located; decoding the encoded data of the first reference image to obtain the first reference image; and decoding the encoded data of the second sub-image with reference to the first reference image to obtain the second sub-image. In this way, the terminal switches view angles seamlessly, the display always stays on the high-quality layer to which the second sub-image belongs, and the user experience is effectively improved.
Description
Technical Field
The embodiment of the application relates to the field of video processing, in particular to a video processing method, device and system.
Background
At present, when a terminal plays a video and a view switch occurs, the switching moment may not coincide with a random access point of a slice (Tile) in the new view. The terminal then cannot decode the new Tile and can only temporarily play a low-quality data layer or show a black screen, which results in a poor user experience.
Disclosure of Invention
The present application provides a video processing method, a video processing apparatus, and a video processing system, which can, to a certain extent, avoid the poor user experience caused by falling back to a low-quality data layer during view switching.
To this end, the technical solutions are as follows:
In a first aspect, an embodiment of the present application provides a video processing method, where the method includes: the terminal (or the client, or the decoding end) decodes the encoded data of a first sub-image covered by a first view angle to obtain the first sub-image. Then, when the terminal switches from the first view angle to a second view angle, it acquires the encoded data of a second sub-image covered by the second view angle and the encoded data of a first reference image of the second sub-image, where the encoded data of the first reference image is located in a code stream independent of the code stream in which the encoded data of the second sub-image is located. The terminal then decodes the encoded data of the first reference image to obtain the first reference image, and may decode the encoded data of the second sub-image with reference to the first reference image, thereby obtaining the second sub-image.
In this way, the server prepares two independent code streams in advance: a code stream containing the encoded data of the first reference image, and a code stream containing the encoded data of the second sub-image (and/or the first sub-image). After the view angle is switched, the terminal can therefore decode the second sub-image covered by the new view angle based on the first reference image. The terminal switches view angles seamlessly, its displayed picture always stays on the high-quality layer to which the second sub-image belongs, no low-quality layer or black screen appears within the view angle coverage, and the user experience is effectively improved.
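The terminal-side flow of the first aspect can be summarized with the following minimal sketch. It is a self-contained illustration of the order of operations only; the data structures and the decode helper are hypothetical and do not represent a real decoder API.

```python
# Minimal, self-contained sketch of the terminal-side flow in the first aspect.
# All names are hypothetical; they only model the order of operations.

from dataclasses import dataclass
from typing import Optional

@dataclass
class EncodedData:
    stream_id: str   # which code stream the data belongs to
    content: str     # stands in for the compressed bits

@dataclass
class Picture:
    content: str

def decode(data: EncodedData, reference: Optional[Picture] = None) -> Picture:
    # A real decoder would use `reference` for inter prediction; here we only
    # record whether decoding was intra (no reference) or used a reference.
    if reference is None:
        return Picture(content=f"intra({data.content})")
    return Picture(content=f"inter({data.content}|ref={reference.content})")

# Step 1: decode the first sub-image covered by the first view angle.
first_sub = decode(EncodedData("tile_m_segment", "first_sub_image"))

# Step 2: on switching to the second view angle, obtain the encoded second
# sub-image and the encoded first reference image from two independent streams.
second_sub_data = EncodedData("tile_n_segment", "second_sub_image")
first_ref_data = EncodedData("tile_n_reference", "first_reference_image")

# Step 3: decode the first reference image (independently decodable).
first_ref = decode(first_ref_data)

# Step 4: decode the second sub-image with the first reference image as its
# reference, so the terminal need not wait for the next random access point.
second_sub = decode(second_sub_data, reference=first_ref)
print(first_sub.content, second_sub.content)
```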
In a possible implementation manner, the code stream of the encoded data of the second sub-image is a Segment (Segment) code stream of the encoded data of the second sub-image.
Specifically, in this application, data is transmitted between the server and the terminal in the form of code streams. Optionally, a code stream may be a segment code stream; that is, the server sends at least one encoded and encapsulated segment code stream to the terminal, and the terminal decodes and displays it.
In a possible implementation manner, the second sub-image is an image other than the first image in the decoding order in the image sequence described by the code stream in which the encoded data of the second sub-image is located, or the second sub-image is an image other than the first image and the second image in the decoding order in the image sequence described by the code stream in which the encoded data of the second sub-image is located.
In this way, when the switching time is the playing time corresponding to the second image, in decoding order, of the image sequence described by the code stream in which the encoded data of the second sub-image is located, the first reference image is consistent with the reference image on which the second sub-image depends (i.e., the image frame at the head of the segment code stream). Optionally, when the switching time corresponds to an image other than the first image and the second image, for example the playing time corresponding to the third image or the fourth image, the first reference image may be the same as the reference image on which the third image or the fourth image depends.
In a possible implementation manner, the image content of the first reference image is the same as the image content of the second reference image of the second sub-image, and the second reference image is an image in an image sequence described by a code stream in which the encoded data of the second sub-image is located.
In this way, the server encodes the first reference image and the second sub-image based on the same image content. Optionally, the coding rates used for the two may be the same or different.
In a possible implementation manner, the second reference picture is a picture other than the first picture in decoding order in a picture sequence described by a code stream in which the encoded data of the second sub-picture is located.
In this way, when the view angle is switched, the code stream acquired by the terminal is switched from the code stream in which the encoded data of the first sub-image is located (for example, its segment code stream) to the code stream in which the encoded data of the second sub-image is located (for example, its segment code stream). The image content of the first reference image is consistent with that of the reference image on which the second sub-image depends (namely, the second reference image), where the second reference image may be an image other than the first image, in decoding order, of the image sequence corresponding to the segment code stream in which the encoded data of the second sub-image is located.
In one possible implementation, the second reference picture is a previous picture of the second sub-picture in the sequence of pictures.
In this way it is achieved that the image content of the first reference image may coincide with the image content of a previous image of the second sub-image in the image sequence, i.e. the second reference image.
In a possible implementation manner, the code stream in which the encoded data of the second sub-image is located is a code stream obtained by performing inter-frame prediction on all image frames described in the code stream.
In this way, the prediction mode of all image frames in the code stream in which the encoded data of the second sub-image is located may be inter-frame prediction. For example, all image frames in that segment code stream may be P frames.
In a possible implementation manner, the code stream in which the encoded data of the first reference image is located is a code stream obtained by performing intra-frame prediction on all image blocks of all image frames described in the code stream, and the code stream in which the encoded data of the first reference image is located is independent of the code stream in which the encoded data of the second sub-image is located.
By the method, the prediction mode of all image frames in the code stream where the coded data of the first reference image are located can be intra-frame prediction. For example: all image frames in each segment code stream are I frames.
In one possible implementation, the first reference picture is a clean random access (CRA) picture.
By the method, the prediction mode of all image frames in the code stream where the coded data of the first reference image is located can be intra-frame prediction, and all image frames can be CRA type I frames.
In one possible implementation, the second sub-image is not covered by the first view angle.
In this way, the second sub-image acquired by the terminal may not be covered by the first view angle, that is, the second sub-image is displayed only in the second view angle. In other words, the method and the apparatus can be applied to the decoding of a new sub-image (a sub-image that is covered only by the view angle after switching and not by the view angle before switching).
In one possible implementation, the second view covers an area of the first sub-image, and the method further includes: the terminal decodes the sub-image of the next frame in the code stream of the coded data of the first sub-image, so as to obtain the sub-image of the next frame; the terminal splices the next frame image with the first reference image so as to obtain a spliced image; and playing the spliced images.
In this way, the first reference image is displayed within the coverage of the second view angle after being decoded. That is, in this application, if the switching time is the playing time corresponding to the previous image in the image sequence in which the second sub-image is located, the image content of the first reference image acquired by the terminal is consistent with that of this previous image. After the terminal decodes the first reference image, it decodes the second sub-image based on the first reference image, and the first reference image, the second sub-image, and the subsequent sub-images of the code stream in which the second sub-image is located are displayed within the coverage of the second view angle.
In one possible implementation, the second view covers an area of the first sub-image, and the method further includes: decoding a next frame sub-image in a code stream where the coded data of the first sub-image are located, thereby obtaining a next frame sub-image; splicing the next frame image with the second sub-image to obtain a spliced image; and playing the spliced images.
In this way, the first reference image is decoded but not displayed within the coverage of the second view angle. That is, in this application, if the switching time is the playing time corresponding to the second sub-image, the image content of the first reference image acquired by the terminal is consistent with that of the reference image on which the second sub-image depends (for example, the second reference image in this application). After the terminal decodes the first reference image, it decodes the second sub-image based on the first reference image, and the second sub-image and the subsequent sub-images of the code stream in which it is located are displayed within the coverage of the second view angle.
In a possible implementation manner, the playing time corresponding to the first sub-image is t1 and the playing time corresponding to the second sub-image is t2, where t1 and t2 are adjacent playing times; or the first sub-image and the second sub-image differ by N frames in the playing order, where N is 1, 2, 3, or 4.
In this way, when the first reference image is not displayed (the non-display case is as described above and is not repeated here), the switching may be such that, while playing the currently displayed first sub-image, the terminal starts switching at time t1 (note that the code stream in which the first sub-image is located is different from the code stream in which the second sub-image is located; in other words, when the view angle is switched, the segment code stream displayed by the terminal is switched from one segment code stream to another), and the playing time of the second sub-image displayed in the switched view angle is t2, where t1 and t2 are adjacent playing times. Alternatively, the first sub-image and the second sub-image may differ by N frames in the playing order, where N is 1, 2, 3, or 4.
In a possible implementation manner, the playing time corresponding to the first sub-image is t1 and the playing time corresponding to the first reference image is t2, where t1 and t2 are adjacent playing times; or the first sub-image and the first reference image differ by N frames in the playing order, where N is 1, 2, 3, or 4.
In this way, when the first reference image is displayed (the display case is as described above and is not repeated here), the switching may be such that, while playing the currently displayed first sub-image, the terminal starts switching at time t1 (note that the code stream in which the first sub-image is located is different from the code stream in which the second sub-image is located; in other words, when the view angle is switched, the segment code stream displayed by the terminal is switched from one segment code stream to another), and the playing time of the first reference image displayed in the switched view angle is t2, where t1 and t2 are adjacent playing times. Alternatively, the first sub-image and the first reference image may differ by N frames in the playing order, where N is 1, 2, 3, or 4.
In a possible implementation manner, the second view angle covers the area of the first sub-image and the first view angle also covers the area of a third sub-image. Decoding the encoded data of the first sub-image covered by the first view angle to obtain the first sub-image includes: merging a first code stream and a third code stream to obtain a first merged code stream, where the first code stream is the code stream in which the encoded data of the first sub-image is located and the third code stream is the code stream in which the encoded data of the third sub-image is located; and decoding the first merged code stream to obtain an image including the first sub-image and the third sub-image. Decoding the encoded data of the second sub-image with reference to the first reference image to obtain the second sub-image includes: merging the first code stream and a second code stream to obtain a second merged code stream, where the second code stream is the code stream in which the encoded data of the second sub-image is located, and the position of the sub-image described by the first code stream in the image described by the second merged code stream is consistent with its position in the image described by the first merged code stream; and decoding the second merged code stream according to the first reference image to obtain an image including the second sub-image and the sub-image described by the first code stream.
In this way, after the terminal's view angle is switched, consider the decoding of the first sub-image, which lies within the coverage of both view angles, and the second sub-image, which is covered only by the second view angle. The terminal has already cached the reference images of the images displayed up to the current playing time of the code stream in which the first sub-image is located (i.e., the first code stream). Because the position of the sub-image described by the first code stream in the image described by the second merged code stream is kept the same as its position in the image described by the first merged code stream, the problem that this code stream (for example, the Segment of an overlapped Tile in the embodiments of the present application) fails to decode is avoided.
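The position constraint in this implementation can be illustrated with a small sketch. The slot-based layout and helper below are hypothetical and only show the rule that an overlapped tile keeps the slot it occupied in the previous merged code stream, so its inter-coded frames keep their cached reference pictures; this is not the actual fusion algorithm of the embodiments.

```python
# Sketch of the tile-position constraint when merging code streams.
# The tile-slot layout and helper names are hypothetical.

def merge_streams(slots: dict, new_tiles: dict) -> dict:
    """Assign each tile's code stream to a slot of the merged picture.

    `slots` maps slot index -> tile id from the previous merged stream;
    `new_tiles` maps tile id -> code stream to include in the new merged stream.
    Tiles that already had a slot keep it, so their inter-coded frames keep
    pointing at the right cached reference pictures; only new tiles take free slots.
    """
    merged = {}
    free = [i for i in sorted(slots) if slots[i] not in new_tiles]
    for tile_id, stream in new_tiles.items():
        kept = [i for i, t in slots.items() if t == tile_id]
        slot = kept[0] if kept else free.pop(0)
        merged[slot] = (tile_id, stream)
    return merged

# First merged stream: first code stream (tile A) + third code stream (tile C).
first_merged = merge_streams({0: "A", 1: "C"}, {"A": "stream_1", "C": "stream_3"})
# Second merged stream: first code stream (tile A) + second code stream (tile B).
# Tile A stays in slot 0, exactly where it was in the first merged stream.
second_merged = merge_streams({i: t for i, (t, _) in first_merged.items()},
                              {"A": "stream_1", "B": "stream_2"})
print(first_merged, second_merged)
```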
In a second aspect, an embodiment of the present application provides a video processing method, where the method includes: the method comprises the steps that a server (or an encoding end) sends encoded data of a first sub-image covered by a first view angle of a terminal to the terminal; and when the first view angle of the terminal is switched to the second view angle, the server sends the encoded data of the second sub-image covered by the second view angle and the encoded data of the first reference image of the second sub-image to the terminal, wherein the encoded data of the first reference image is independent of the code stream of the encoded data of the second sub-image.
In a possible implementation manner, the code stream of the encoded data of the second sub-image is the segment code stream of the encoded data of the second sub-image.
In a possible implementation manner, the second sub-image is an image other than the first image in the decoding order in the image sequence described by the code stream in which the encoded data of the second sub-image is located, or the second sub-image is an image other than the first image and the second image in the decoding order in the image sequence described by the code stream in which the encoded data of the second sub-image is located.
In a possible implementation manner, the image content of the first reference image is the same as the image content of the second reference image of the second sub-image, and the second reference image is an image in an image sequence described by a code stream in which the encoded data of the second sub-image is located.
In a possible implementation manner, the second reference picture is a picture other than the first picture in decoding order in a picture sequence described by a code stream in which the encoded data of the second sub-picture is located.
In one possible implementation, the second reference picture is a previous picture of the second sub-picture in the sequence of pictures.
In a possible implementation manner, the code stream in which the encoded data of the second sub-image is located is a code stream obtained by performing inter-frame prediction on all image frames described in the code stream.
In a possible implementation manner, the code stream in which the encoded data of the first reference image is located is a code stream obtained by performing intra-frame prediction on all image blocks of all image frames described in the code stream, and the code stream in which the encoded data of the first reference image is located is independent of the code stream in which the encoded data of the second sub-image is located.
In one possible implementation, the first reference picture is a clean random access (CRA) picture.
In one possible implementation, the second sub-image is not covered by the first view angle.
In a third aspect, an embodiment of the present application provides a video processing apparatus, which may include: the decoding module can be used for decoding the coded data of the first sub-image covered by the first view angle so as to obtain the first sub-image; the acquisition module can be used for acquiring the encoded data of a second sub-image covered by a second visual angle and the encoded data of a first reference image of the second sub-image when the first visual angle is switched to the second visual angle, wherein the encoded data of the first reference image is independent of the code stream of the encoded data of the second sub-image; and the decoding module may be further configured to decode the encoded data of the first reference image to obtain a first reference image; the decoding module may be further configured to decode the encoded data of the second sub-image from the first reference image, thereby obtaining the second sub-image.
In a possible implementation manner, the code stream of the encoded data of the second sub-image is a Segment (Segment) code stream of the encoded data of the second sub-image.
In a possible implementation manner, the second sub-image is an image other than the first image in the decoding order in the image sequence described by the code stream in which the encoded data of the second sub-image is located, or the second sub-image is an image other than the first image and the second image in the decoding order in the image sequence described by the code stream in which the encoded data of the second sub-image is located.
In a possible implementation manner, the image content of the first reference image is the same as the image content of the second reference image of the second sub-image, and the second reference image is an image in an image sequence described by a code stream in which the encoded data of the second sub-image is located.
In a possible implementation manner, the second reference picture is a picture other than the first picture in decoding order in a picture sequence described by a code stream in which the encoded data of the second sub-picture is located.
In one possible implementation, the second reference picture is a previous picture of the second sub-picture in the sequence of pictures.
In a possible implementation manner, the code stream in which the encoded data of the second sub-image is located is a code stream obtained by performing inter-frame prediction on all image frames described in the code stream.
In a possible implementation manner, the code stream in which the encoded data of the first reference image is located is a code stream obtained by performing intra-frame prediction on all image blocks of all image frames described in the code stream, and the code stream in which the encoded data of the first reference image is located is independent of the code stream in which the encoded data of the second sub-image is located.
In one possible implementation, the first reference picture is a clean random access (CRA) picture.
In one possible implementation, the second sub-image is not covered by the first view angle.
In a possible implementation manner, the second view angle covers the area of the first sub-image and the first view angle also covers the area of a third sub-image. Accordingly, the decoding module may be further configured to: merge a first code stream and a third code stream to obtain a first merged code stream, where the first code stream is the code stream in which the encoded data of the first sub-image is located and the third code stream is the code stream in which the encoded data of the third sub-image is located; and decode the first merged code stream to obtain an image including the first sub-image and the third sub-image. The decoding module may be further configured to: merge the first code stream and a second code stream to obtain a second merged code stream, where the second code stream is the code stream in which the encoded data of the second sub-image is located, and the position of the sub-image described by the first code stream in the image described by the second merged code stream is consistent with its position in the image described by the first merged code stream; and decode the second merged code stream according to the first reference image to obtain an image including the second sub-image and the sub-image described by the first code stream.
In a fourth aspect, an embodiment of the present application provides a video processing apparatus, including: a sending module, configured to send, to the terminal, encoded data of a first sub-image covered by a first view of the terminal; and when the first view angle of the terminal is switched to the second view angle, the coded data of the second sub-image covered by the second view angle and the coded data of the first reference image of the second sub-image are sent to the terminal, wherein the coded data of the first reference image is independent of the code stream of the coded data of the second sub-image.
In a possible implementation manner, the code stream of the encoded data of the second sub-image is the segment code stream of the encoded data of the second sub-image.
In a possible implementation manner, the second sub-image is an image other than the first image in the decoding order in the image sequence described by the code stream in which the encoded data of the second sub-image is located, or the second sub-image is an image other than the first image and the second image in the decoding order in the image sequence described by the code stream in which the encoded data of the second sub-image is located.
In a possible implementation manner, the image content of the first reference image is the same as the image content of the second reference image of the second sub-image, and the second reference image is an image in an image sequence described by a code stream in which the encoded data of the second sub-image is located.
In a possible implementation manner, the second reference picture is a picture other than the first picture in decoding order in a picture sequence described by a code stream in which the encoded data of the second sub-picture is located.
In one possible implementation, the second reference picture is a previous picture of the second sub-picture in the sequence of pictures.
In a possible implementation manner, the code stream in which the encoded data of the second sub-image is located is a code stream obtained by performing inter-frame prediction on all image frames described in the code stream.
In a possible implementation manner, the code stream in which the encoded data of the first reference image is located is a code stream obtained by performing intra-frame prediction on all image blocks of all image frames described in the code stream, and the code stream in which the encoded data of the first reference image is located is independent of the code stream in which the encoded data of the second sub-image is located.
In one possible implementation, the first reference picture is a clean random access (CRA) picture.
In one possible implementation, the second sub-image is not covered by the first view angle.
In a fifth aspect, an embodiment of the present application provides a video processing system, which includes a server and a terminal, where the server sends, to the terminal, encoded data of a first sub-image covered by a first view angle of the terminal; the terminal receives the coded data of the first sub-image and decodes the coded data of the first sub-image to obtain the first sub-image; when the first view angle of the terminal is switched to the second view angle, the server side sends the encoded data of the second sub-image covered by the second view angle and the encoded data of the first reference image of the second sub-image to the terminal, wherein the encoded data of the first reference image is independent of the code stream of the encoded data of the second sub-image; the terminal receives the coded data of the second sub-image and the coded data of the first reference image, and decodes the coded data of the first reference image to obtain a first reference image; and the terminal decodes the encoded data of the second sub-image according to the first reference image, thereby obtaining the second sub-image.
In a sixth aspect, embodiments of the present application provide a computer-readable medium for storing a computer program comprising instructions for performing the method of the first aspect or any possible implementation manner of the first aspect.
In a seventh aspect, the present application provides a computer-readable medium for storing a computer program including instructions for executing the second aspect or the method in any possible implementation manner of the second aspect.
In an eighth aspect, the present application provides a computer program including instructions for executing the method of the first aspect or any possible implementation manner of the first aspect.
In a ninth aspect, the present application provides a computer program including instructions for executing the method of the second aspect or any possible implementation manner of the second aspect.
In a tenth aspect, an embodiment of the present application provides a chip, which includes a processing circuit and a transceiver pin. Wherein the transceiver pin and the processing circuit are in communication with each other via an internal connection path, and the processor is configured to perform the method of the first aspect or any one of the possible implementations of the first aspect to control the receiving pin to receive signals and to control the sending pin to send signals.
In an eleventh aspect, embodiments of the present application provide a chip, where the chip includes a processing circuit and a transceiver pin. Wherein the transceiver pin and the processing circuit are in communication with each other via an internal connection path, and the processor is configured to perform the method of the second aspect or any possible implementation manner of the second aspect to control the receiving pin to receive signals and to control the sending pin to send signals.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the description of the embodiments of the present application will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 is a schematic diagram of a transmission system for a video processing method, shown schematically;
fig. 2 is a schematic diagram of an exemplary transmission flow;
FIG. 3 is a schematic flow diagram of an exemplary video processing method;
fig. 4 is a schematic structural diagram of a video processing system according to an embodiment of the present application;
fig. 5 is a schematic flowchart of a video processing method according to an embodiment of the present application;
fig. 6 is a schematic diagram of a handover process provided in an embodiment of the present application;
FIG. 7 is a diagram illustrating a manner in which a reference frame provided by an embodiment of the present application functions;
fig. 8 is a schematic flowchart of a decoding method provided in an embodiment of the present application;
fig. 9 is a schematic flowchart of a video processing method according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone.
The terms "first" and "second," and the like, in the description and in the claims of the embodiments of the present application are used for distinguishing between different objects and not for describing a particular order of the objects. For example, the first target object and the second target object, etc. are specific sequences for distinguishing different target objects, rather than describing target objects.
In the embodiments of the present application, words such as "exemplary" or "for example" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present concepts related in a concrete fashion.
In the description of the embodiments of the present application, the meaning of "a plurality" means two or more unless otherwise specified. For example, a plurality of processing units refers to two or more processing units; the plurality of systems refers to two or more systems.
For a better understanding of the present application, the following is a brief introduction to the background that may be involved in the present application:
With the rise of Virtual Reality (VR) applications, panoramic video services are gradually going online on various platforms and have become an important component of network video traffic. Panoramic video services are characterized by ultra-high resolution and ultra-high bit rate. To reduce bandwidth consumption as much as possible while ensuring user experience quality, panoramic video on-demand services currently generally use a Tile-based Dynamic Adaptive Streaming over HTTP (DASH) transmission system, built on the HyperText Transfer Protocol (HTTP), to transmit the panoramic video. A typical transmission system is shown in figure 1:
in fig. 1, a server side may be used to provide a bitstream generated based on a panoramic video, where the bitstream includes, but is not limited to: the enhancement layer may alternatively be referred to as a high quality layer or high quality codestream and the base layer may alternatively be referred to as a low quality layer or low quality codestream. The low quality layer is used for coding the panoramic video at a lower code rate or a lower resolution, the high quality layer is used for coding the panoramic video at a higher code rate or a higher resolution, and the high quality layer comprises a plurality of tiles which are divided by the server according to different areas of the panoramic video content. Subsequently, the server may transmit the codestream (including the low-quality layer and the high-quality layer) to the terminal through technologies such as DASH or HTTP Live Streaming (HTTP Live Streaming, HLS).
Still referring to fig. 1, after acquiring the low-quality layer and the high-quality layer, the terminal fuses the multiple tiles in the high-quality layer and decodes the fused tiles, and it also decodes the low-quality layer. The terminal then displays the high-quality layer and the low-quality layer, as shown in fig. 1: the low-quality layer is a data layer covering the entire panoramic video, and the high-quality layer may be the portion of the video data at the terminal's view angle (including the part within the view angle and a specified range beyond it). In this way, a clearer, higher-quality video picture can be presented within the user's view angle. It should be noted that, in the prior art, the terminal continuously requests the low-quality layer during use, to ensure that, when the terminal's view angle rotates, no missing picture or black area occurs due to unsuccessful decoding of the data frames corresponding to the high-quality layer's video data.
In the prior art, as described above, the server divides the panoramic video into a plurality of tiles. The server further divides each Tile into a plurality of segments (Segments) in time, where the start position of each Segment is a Random Access Point (RAP), for example an I frame. The terminal only needs to request the high-quality tiles at the user's view angle (including the part within the view angle and a specified range beyond it) and the low-quality layer, which reduces bandwidth demand and consumption while ensuring a high-quality picture within the user's view angle.
However, in the prior art, the basic transmission unit of a Tile is the Segment, and Segments have different durations under different application requirements, such as 1 second or 3 seconds. The start position of each Segment is usually encoded as an I frame, which is the random access point for Segment switching; that is, switching to a Tile can only be performed at a RAP (because the frames within each Segment need one or more previous frames as reference frames for decoding, and the terminal needs the I frame at each Segment's random access point as the reference for continuously decoding the following frames). This is shown in fig. 2 (where Seg in fig. 2 is an abbreviation for Segment). When the user's view angle changes and a new Tile appears in the view (i.e., Tile m is a Tile in the old view and Tile n is a Tile appearing in the new view), the terminal can decode Tile n only when the next random access point arrives (i.e., Seg t+1 shown in fig. 2), and only then display the new Tile (i.e., Tile n) at high quality. That is, as shown in fig. 2, when the view angle is switched within Seg t, the user can only access the new Tile at high quality from Seg t+1 onward, and during this period the user sees a low-quality data layer, i.e., a low-quality video picture.
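For intuition, the switching delay of this conventional scheme can be estimated with a small calculation. The segment duration and switch time below are example values and assume one random access point per Segment.

```python
# Rough illustration of the switching delay in the conventional scheme:
# with one random access point per Segment, a view switch inside Seg t can
# only pick up the new Tile at the start of Seg t+1. Numbers are examples.

def wait_until_next_rap(switch_time_s: float, segment_duration_s: float) -> float:
    """Time from the view switch until the next random access point."""
    into_segment = switch_time_s % segment_duration_s
    return 0.0 if into_segment == 0 else segment_duration_s - into_segment

# e.g. 1-second Segments and a switch 0.1 s after a Segment starts:
print(wait_until_next_rap(3.1, 1.0))  # ~0.9 s of low-quality video before Tile n appears
```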
Therefore, in the prior art, the terminal can only switch to the content of the high-quality layer at the time of the arrival of the random access point, and even if the terminal performs view switching at the time point corresponding to the first P frame after the I frame, the terminal needs to wait for a duration close to the duration corresponding to the Segment to switch to the high-quality content, which seriously affects the user experience.
To solve the above problem, the prior art proposes a feasible implementation. Specifically, on the server side each Tile is segmented repeatedly, shifting the segmentation frame by frame according to the Group of Pictures (GOP) size, and the shifted segmentations are encoded, as shown in fig. 3. In this way, when the user's view angle changes, the terminal can immediately request, at the switching moment, the segment whose next frame is an I frame and decode it, so the switching wait can be reduced by shortening the segment length, achieving near-real-time switching of high-quality tiles.
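The per-frame-shifted segmentation of this prior-art scheme can be sketched as follows. The GOP size and frame counts are example values; the point is only that one backup segmentation is stored per frame offset within a GOP, which is the overhead criticised below.

```python
# Sketch of the prior-art per-frame-shifted segmentation: for each Tile, one
# backup segmentation is generated per frame offset within a GOP, so that for
# any switching moment some backup has an I frame at the next frame.

def shifted_segment_starts(num_frames: int, gop_size: int) -> list:
    """Start frames of the segments in each of the gop_size shifted backups."""
    return [list(range(offset, num_frames, gop_size)) for offset in range(gop_size)]

backups = shifted_segment_starts(num_frames=64, gop_size=8)
print(len(backups))    # 8 backup segmentations of the same Tile must be stored
print(backups[3][:4])  # the backup shifted by 3 frames: segments start at 3, 11, 19, 27
```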
However, this prior-art scheme adds multiple backup Segments, i.e., it generates a large amount of backup data on the server side, which increases the transcoding and storage overhead of the server, reduces the server's video coding efficiency, and increases the amount of data transmitted over the network. Moreover, after the view angle is switched, the server sends entirely new data, and when the bandwidth is not ideal, the video picture displayed on the terminal still cannot meet expectations.
Furthermore, in the decoding stage of the prior art, after the terminal has received all the tiles, the downloaded tiles must first be spliced and fused at the terminal into a single stream to be decoded, so that one decoder can decode all tiles in real time. However, the user's view switching usually occurs at some position within a segment; at that moment, the frame of an original Tile (i.e., a Tile present in both the old and new views, also called an overlapped Tile) that participates in the fusion is a P frame, while the first frame of a new Tile that participates in the fusion is an I frame. Fusing the tiles according to their positions in the new view then causes the P frame of the existing Tile to lose its reference frame because its position has changed, so it cannot be decoded.
To address the problems in the prior art, the present application provides a video transmission method aimed at realizing real-time switching of high-quality video data.
Before describing the technical solution of the embodiments of the present application, the video transmission system according to the embodiments of the present application is described with reference to the drawings. Referring to fig. 4, a schematic diagram of a video transmission system according to an embodiment of the present application is provided. The video transmission system includes a server 100 and a terminal 200. In a specific implementation of the embodiments of the present application, the terminal 200 may be a computer, a smartphone, a VR device, or the like. It should be noted that, in practical applications, the number of servers and terminals may each be one or more; the numbers shown in fig. 4 are only illustrative examples and do not limit this application.
In conjunction with the above schematic application scenario shown in fig. 4, a specific embodiment of the present application is described below:
scene one
Referring to fig. 4, as shown in fig. 5, a schematic flow chart of a video processing method in the embodiment of the present application is shown, where in fig. 5:
step 101, the terminal acquires the encoded data of the sub-image covered by the first view angle from the server side.
Specifically, in the embodiment of the present application, the terminal sends a request message generated based on a user instruction to the server. The request message may be used to request the server for encoded data corresponding to a video picture (or referred to as a video image, or image content, etc.) of the video data to be displayed by the terminal.
Optionally, in this application, the server encodes and encapsulates the image content into code streams with different encoding rates, including but not limited to: high quality codestreams (or referred to as high quality layers) and low quality codestreams (or referred to as low quality layers). For the high-quality layer, the server divides the image content into a plurality of sub-images in the spatial dimension, and optionally, in the application, the sub-images may be tiles. And the server encodes the image content corresponding to each Tile to obtain the encoded data corresponding to each Tile. Accordingly, in the present application, the terminal may request, from the server, encoded data of at least one Tile covered by the current view (e.g., the first view in the embodiment of the present application) of the terminal. Note that, in the present application, the Tile covered by the viewing angle includes tiles entirely within the viewing angle range or tiles partially within the viewing angle range.
As described above, the basic transmission unit of a Tile in the high-quality layer (specifically, of the Tile's encoded data) is the Segment; that is, the basic transmission unit when the terminal interacts with the server is the Segment. It can also be understood that the code streams referred to in this application (including code streams generated by the server, code streams transmitted from the server to the terminal, and code streams received by the terminal) are Segment code streams. For example, the terminal can send identification information of the required Segment in the required Tile to the server, to request the encoded data of each image in the target Segment code stream of the corresponding Tile. The identification information may be a frame number or a timestamp.
Optionally, the identification information may also be the storage location of the Segment, that is, the storage location can be used to uniquely identify the Segment. For example, the requested information may be the absolute path of the code-stream segment on the server, such as http://192.168.0.2/dash/Tile0/4.m4s (where 4.m4s represents a segment).
Alternatively, the identification information may also be the start position and size of the Segment, i.e., the start position and size can be used to uniquely identify the Segment. For example: the requested segment starts at byte 10000 of the file 1.mp4 and is 2000 bytes in size.
Optionally, in this application, after the terminal requests the first Segment from the server, the terminal may continue to send a request to the server to request encoded data of other segments in Tile n.
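As a non-normative illustration of the identification modes described above, the sketch below shows how a terminal might build such Segment requests in Python. The server root, URL layout, query parameter, and helper names are assumptions made purely for illustration and are not defined by this application.

```python
import requests  # third-party HTTP client

SERVER = "http://192.168.0.2/dash"  # assumed server root, reusing the example address above

def request_by_path(tile_id: int, segment_index: int) -> bytes:
    """Identify the Segment by its absolute path on the server."""
    url = f"{SERVER}/Tile{tile_id}/{segment_index}.m4s"
    return requests.get(url).content

def request_by_byte_range(file_url: str, start: int, size: int) -> bytes:
    """Identify the Segment by its start position and size inside a file (e.g. 1.mp4)."""
    headers = {"Range": f"bytes={start}-{start + size - 1}"}
    return requests.get(file_url, headers=headers).content

def request_by_frame_number(tile_id: int, frame_number: int) -> bytes:
    """Identify the target Segment by a frame number (or timestamp) within the Tile."""
    return requests.get(f"{SERVER}/Tile{tile_id}/segment",
                        params={"frame": frame_number}).content

# Example: request the segment that starts at byte 10000 of 1.mp4 and is 2000 bytes long.
data = request_by_byte_range(f"{SERVER}/1.mp4", 10000, 2000)
```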
It should be noted that the video data (which may also be referred to as video pictures or video images) required by the terminal may be video data pre-downloaded in an on-demand scene, or may also be video data currently displayed by the terminal (including within and outside of a viewing angle) in a live scene.
For example: in an on-demand scene, the terminal may request video data from the server, and the requested video data may be the predicted video content to be displayed by the terminal; that is, a preloading function is implemented, and the decoding efficiency of the terminal is improved through pre-downloading. Accordingly, in an on-demand scenario, the server prepares in advance the high-quality code stream (also referred to as the high-quality layer) and the low-quality code stream (also referred to as the low-quality layer) corresponding to the video data; optionally, in the on-demand scenario, the terminal may request from the server the high-quality layer within the predicted view range (including the portions within the view and within a specified range outside the view), and the low-quality layer corresponding to the predicted video data to be displayed by the terminal (as described above, the low-quality layer is the encoded data corresponding to the panoramic video).
In a live scene, the terminal requests video data from the server in real time, and the requested video data is the video data (or video picture) currently to be displayed by the terminal. Correspondingly, in a live scene, the server encodes the corresponding video data based on the video data requested by the terminal and then sends the encoded video data to the terminal. Optionally, in a live scene, the terminal may request from the server the high-quality layer within the view range (including the portions within the view and within a specified range outside the view) and the low-quality layer corresponding to the video data to be displayed by the terminal (as described above, the low-quality layer is the encoded data corresponding to the panoramic video).
Optionally, in this embodiment of the application, the terminal determines the video data to request from the server (e.g., the encoded data corresponding to the Tiles covered by the first view) by detecting the size of the view. For example: the view may be circular or rectangular in shape and may vary in size, so views of different sizes require different high-quality layers; a large view may require 10 Tiles of the high-quality layer, while a relatively small view may require only 6 Tiles.
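For instance, assuming the panorama is split into a regular grid of rectangular Tiles, the set of Tiles fully or partially covered by a rectangular view could be determined as in the following sketch; the grid layout, dimensions, and helper name are illustrative assumptions only.

```python
from typing import List, Tuple

def tiles_covered_by_view(view: Tuple[int, int, int, int],
                          tile_w: int, tile_h: int,
                          cols: int, rows: int) -> List[int]:
    """Return indices of all Tiles fully or partially inside a rectangular view.

    `view` is (x, y, width, height) in panorama pixel coordinates.
    """
    x, y, w, h = view
    first_col = max(0, x // tile_w)
    last_col = min(cols - 1, (x + w - 1) // tile_w)
    first_row = max(0, y // tile_h)
    last_row = min(rows - 1, (y + h - 1) // tile_h)
    return [r * cols + c
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]

# A larger view covers more Tiles of the high-quality layer than a smaller one.
print(len(tiles_covered_by_view((0, 0, 2560, 1280), 640, 640, 8, 4)))  # 8 Tiles
print(len(tiles_covered_by_view((0, 0, 1920, 640), 640, 640, 8, 4)))   # 3 Tiles
```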
Next, in the present application, the server receives a request transmitted from the terminal, and transmits Tile encoded data necessary for the terminal to the terminal in response to the request.
As described above, in the present application, the server encodes and encapsulates the image content into code streams with different encoding rates, and the high-quality code stream generated by the server is as follows: the panoramic video is divided into a plurality of rectangular Tiles in the spatial dimension, and each Tile is divided into a plurality of segments in the time dimension, where the duration of each segment may be 1 second (the segments may be divided according to actual requirements). The spatial size of each Tile may be the same or different, and the duration of each segment in a Tile may also be the same or different. Each segment of each Tile is encoded, and in the encoded data of each Tile obtained after encoding, the prediction mode of the first image of each segment (it should be noted that the first image may also be the frame located at the head of each segment) may be intra-frame prediction. For example: the first frame of each segment may be an I frame, and the other frames may be P frames (i.e., frames decoded using inter-frame prediction). Optionally, in this application, the server may not refer to temporal motion vectors when encoding the P frames in a segment; that is, in the high-quality code stream, except for the frame located at the head of each segment (e.g., the I frame), each subsequent frame refers only to frames that precede it within the same segment.
Optionally, in the application, if the application scenario is an on-demand scenario, the server encodes the Tile in advance to obtain encoded data of the Tile, and encapsulates the encoded data into a code stream (or a data stream). In the on-demand scene, after receiving a request message sent by a terminal, a server determines the coded data corresponding to the Tile required by the terminal and sends the coded data of the Tile to the terminal.
Optionally, in the application, if the application scene is a live scene, the server determines a Tile required by the terminal based on a request sent by the terminal, encodes the Tile required by the terminal to obtain corresponding encoded data, and sends the encoded data of the Tile to the terminal.
For the encoding, transmission and decoding method of the low quality layer, reference may be made to the prior art embodiments, and details are not repeated in this application.
Step 102: the terminal decodes the acquired encoded data of the sub-image.
Specifically, in the present application, the terminal decodes the received high-quality code stream (i.e., Tile encoded data) and the low-quality code stream respectively (the decoding and displaying process of the low-quality layer may refer to the prior art embodiment, and is not described herein again), and displays the video picture formed by the base layer (i.e., the low-quality layer) and the high-quality layer after decoding. Specifically, after receiving encoded data of one or more tiles covered by a current view (i.e., a first view in this embodiment), the terminal decodes and fuses the encoded data, so as to obtain tiles (or what may be referred to as a Tile-described image sequence).
Step 103: when the first view angle is switched to the second view angle, the terminal acquires the encoded data of the sub-image covered by the second view angle and the encoded data of the corresponding reference image from the server side.
Specifically, in the present application, after the view of the terminal is switched, the video data (i.e., the Tiles) required by the terminal is updated, and the terminal sends a request message to the server to request the encoded data of the Tiles covered by the new view. For example: at time t1, the first view displayed by the terminal is switched to the second view, and the terminal then requests from the server the encoded data of the Tiles covered by the second view, where the playing time of the sub-image covered by the second view (referred to as the second sub-image in this embodiment of the present application) is time t2, and t1 and t2 are adjacent playing times.
Optionally, in this application, tiles covered by the second view angle (i.e. the switched new view angle) (it should be noted that tiles covered by the new view angle include tiles entirely within the new view angle range or tiles partially within the new view angle range) can be divided into two types: one is a new Tile (i.e., not present in the old view, only within the coverage of the new view), and the other is an overlapping Tile (hereinafter simply referred to as an overlapping Tile, i.e., a Tile that is included in the coverage of the old view and the coverage of the new view).
Specifically, in the present application, after the view angle is switched, the terminal stops transmitting the request message corresponding to the old Tile (excluding the overlap Tile) in the old view angle to the server, but the terminal still transmits the request message corresponding to the overlap Tile to the server and locally caches the overlap Tile. And the terminal also sends a request message carrying the identification information of the new Tile to the server. Optionally, the terminal may further send a request message carrying identification information of a reference frame corresponding to the new Tile to the server. That is, in the present application, after the view angle is switched, the terminal acquires only the encoded data of the new Tile and the encoded data of the overlapped Tile instead of the encoded data of the old Tile.
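The split of the Tiles covered by the new view into overlapping Tiles and new Tiles, and the set of old Tiles whose requests are stopped, can be expressed as simple set operations, as in the illustrative sketch below (the tile names are placeholders).

```python
def classify_tiles(old_view_tiles: set, new_view_tiles: set):
    """Split the Tiles covered by the new view into overlapping and new Tiles."""
    overlapping = new_view_tiles & old_view_tiles   # keep requesting and caching these
    new = new_view_tiles - old_view_tiles           # request these plus their reference images
    stopped = old_view_tiles - new_view_tiles       # stop requesting these
    return overlapping, new, stopped

# Example matching fig. 6: view 1 covers {Tile1, Tile2}, view 2 covers {Tile2, Tile3}.
overlapping, new, stopped = classify_tiles({"Tile1", "Tile2"}, {"Tile2", "Tile3"})
# overlapping == {"Tile2"}, new == {"Tile3"}, stopped == {"Tile1"}
```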
For example: as shown in fig. 6, the high-quality layer in view 1 includes Tile1 and Tile2, and before the view is switched, the terminal requests the server for the Tile1 encoded data and the Tile2 encoded data. At the view switching moment, the view displayed by the terminal is switched to view 2, and the high-quality layer in view 2 includes Tile2 and Tile3. Tile2 is the overlapping Tile described above. The terminal stops requesting Tile1 (i.e., stops requesting the Tile1 encoded data) and stops the decoding operation for the Tile1 encoded data. Meanwhile, the terminal continues to request the encoded data of Tile2 and buffers the previously requested encoded data. The terminal also acquires Tile3 from the server side (after the switch, the terminal acquires, starting from the time point t2 that follows the switching time t1, the P frames and the subsequent Segments, i.e., the image corresponding to time t2 and the images after it, as indicated by the arrows in fig. 6), together with the encoded data of the reference image corresponding to the encoded data of Tile3 (the manner of generating and transmitting the encoded data of the reference image will be described in detail in the following embodiments).
It should also be noted that, in the present application, optionally, the image in the new Tile corresponding to time t2 (i.e., the image at the head of the encoded data of the new Tile in the new view) may be an image other than the first image of the GOP to which the image sequence of the Tile belongs; or it may be understood that the image in the new Tile corresponding to time t2 (e.g., the second sub-image in the embodiment of the present application) may be an image other than the first image, in decoding order, of the image sequence described by the code stream (or, it may be understood, the Segment) in which it is located. Optionally, in the present application, the image in the new Tile corresponding to time t2 may also be an image other than the first image and the second image, in decoding order, of the image sequence described by the code stream in which it is located. For example: the image corresponding to time t2 may be the image corresponding to a frame other than the first frame of the Segment to which it belongs. Alternatively, the image in the new Tile corresponding to time t2 may be an image other than the first image and the second image of the GOP of the image sequence in which the Tile is located; that is, referring to fig. 6, the starting frame played by the new Tile3 in the new view (the solid black frame) may be any frame other than the original first frame and second frame of the Segment. For example: Segment1 of Tile3 describes an image sequence including the images corresponding to the encoded data frame 1, data frame 2, data frame 3, and data frame 4. At the switching time, the starting frame played in the view may be data frame 2, data frame 3, or data frame 4, which is not limited in this application.
Specifically, in the present application, the server receives a request from the terminal, and transmits encoded data of a Tile and encoded data of a reference image to the terminal.
Optionally, in this application, the terminal may not sense whether the viewing angle is switched, that is, the request message sent by the terminal to the server may carry the identification information of the desired Tile. After the server receives the request message, the server determines whether the terminal currently has view switching based on the identification information carried in the request message, and if so, the server sends a Tile covered by the new view to the terminal (including the new Tile and the overlapped Tile, and the definitions of the new Tile and the overlapped Tile are as described above, which is not described herein again).
Optionally, in the application, the terminal may detect whether the viewing angle is switched, and if the viewing angle is switched, the request message sent by the terminal to the server may carry the identification information of the Tile covered by the new viewing angle and the identification information of the reference image corresponding to part or all of the tiles covered by the new viewing angle. Alternatively, in the present application, the reference image may be referred to as a reference frame. Optionally, in this application, the identification information of the reference frame (i.e., the reference image) may be an identification manner such as a frame number or a timestamp of the reference frame.
The manner in which the reference frame is generated is explained in detail below. Specifically, in the present application, based on the panoramic video, the server side may generate, in addition to the high-quality code stream and the low-quality code stream described above, a code stream in which each image (or each frame) can be independently decoded (hereinafter referred to as the reference code stream). Independent decoding means that all image blocks in a frame are decoded by intra-frame prediction; that is, the reference code stream is a code stream obtained by performing intra-frame prediction on all image blocks of all image frames described in the reference code stream, and in the present application the reference code stream is independent of the high-quality code stream. Optionally, in this application, the reference code stream may be a RARF stream. Optionally, in this application, each frame in the reference code stream may be a clean random access (CRA) type I frame. Optionally, in this application, each frame in the reference code stream may also be a P frame in which all blocks are I blocks (that is, a P frame containing only I blocks is still a frame decoded by intra-frame prediction, and the corresponding frame type may be referred to as a P slice).
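The following Python sketch illustrates one way such a pair of code streams could be produced for a single Tile: the high-quality code stream has one intra-coded frame at the Segment head followed by inter-coded P frames, while the reference code stream is intra-only so that any of its frames can be decoded independently. The encoder helpers are deliberately stubbed placeholders (a real implementation would call an actual video encoder); nothing here is an API defined by this application.

```python
def encode_intra_frame(frame):
    # Placeholder: a real implementation would run an intra-only (all-I-block) encode of `frame`.
    return ("intra", frame)

def encode_inter_frame(frame, ref):
    # Placeholder: a real implementation would inter-predict `frame` from the reference picture `ref`.
    return ("inter", frame, ref)

def build_streams_for_tile(segment_frames):
    """Build, for one Segment of one Tile, the high-quality code stream and the
    independently decodable reference (RARF-like) code stream."""
    high_quality, reference = [], []
    for i, frame in enumerate(segment_frames):
        if i == 0:
            high_quality.append(encode_intra_frame(frame))  # I frame at the Segment head
        else:
            high_quality.append(encode_inter_frame(frame, segment_frames[i - 1]))  # P frame
        # Every frame of the reference stream uses intra prediction for all blocks,
        # so it can be decoded without any other frame of any stream.
        reference.append(encode_intra_frame(frame))
    return high_quality, reference

hq, rarf = build_streams_for_tile(["f0", "f1", "f2", "f3"])
```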
Optionally, in the present application, the image content corresponding to the reference code stream is partially or completely the same as the image content corresponding to the Tile (i.e., the Tile that can be decoded based on a reference frame of the reference code stream, for example, the second sub-image in this embodiment). For example: referring to fig. 7, which shows a code stream serving as reference frames for the Tiles of the high-quality layer (hereinafter abbreviated as the RARF stream), Tile n in view 2 is the new Tile after view switching, and the terminal performs view switching at the time corresponding to the P2 frame (i.e., the time t1 mentioned above). In the prior art, the reference frame of the P3 frame of Tile n (i.e., the image corresponding to time t2 described above) is the P2 frame; however, since the terminal switches at the time of the P2 frame, the terminal does not have the P2 frame (because the terminal acquires from the server only the images corresponding to time t2 and later). Therefore, in this application, the frame in the RARF stream (i.e., the RARF2 frame shown in the figure) can be used as the reference frame of the P3 frame, so that the terminal can decode the P3 frame based on the RARF frame, and decoding of the subsequent P frames in the Segment can then continue in turn. That is, the server may transmit one or more RARF frames to the terminal, so that the terminal decodes a plurality of frames in the Segment using the RARF frame as the reference frame of the Segment. It should be noted that, in the present application, the image content of the reference frame in the RARF stream (e.g., the RARF2 frame) is the same as the image content of the reference frame (e.g., the P2 frame) on which decoding of the frame corresponding to time t2 in the new Tile depends. It should also be noted that, as mentioned above, the image at time t2 may be any image other than the first image, or other than the first and second images, of the GOP (or of the first frame of the Segment); therefore, in this application, optionally, the image content of the RARF reference frame may be identical to the image content of the first frame (where the decoding type of the first frame may be intra-frame or inter-frame decoding), while the decoding type of the RARF reference frame itself is intra-frame decoding, in which case the RARF reference frame may be used as the reference frame for decoding the second frame. Alternatively, the image content of the RARF reference frame may be identical to the image content of a frame whose decoding type is inter-frame decoding, such as the second frame or the third frame, as described in the above example.
Alternatively, in the present application, as described above, the image content of the RARF reference frame may be the same as the image content of the reference frame on which decoding of the frame at time t2 (i.e., the second sub-image in the embodiments of the present application) depends. Alternatively, in the present application, the reference frame on which decoding of the second sub-image depends may be the image corresponding to time t1, i.e., the frame before the switching time; that is, the image content of the RARF frame may be the same as the image content of the frame in the new Tile corresponding to the time before the switching time. For example: still referring to fig. 7, the view is switched at the time corresponding to the P2 frame of Tile n, so the image content corresponding to the RARF frame may be the same as the image content corresponding to the P2 frame, and the P3 frame can then be decoded based on the RARF frame. It should be noted that, in this embodiment, what the second view of the terminal plays at time t2 is the image described by the P3 frame. Optionally, the image content corresponding to the RARF frame may also be the same as the image content of the n-th frame before the switching time. For example: in fig. 7, the image content corresponding to the RARF frame may also be the same as the image content of the P1 frame (the frame before the P2 frame).
Optionally, in this application, what the second view of the terminal plays at time t2 may also be the image described by the RARF frame. For example: still taking fig. 7 as an example, the view is switched at the time corresponding to the P2 frame of Tile n; if the image content corresponding to the RARF frame is the same as the image content corresponding to the P3 frame, then after the terminal decodes the RARF frame, what is played at time t2 is the image described by the RARF frame (or the P3 frame), and the P4 frame may then continue to be decoded using the RARF frame as its reference frame.
In this application, as for the time at which the reference code stream is generated, optionally, the server may generate three code streams in advance: the low-quality code stream, the high-quality code stream, and the RARF stream corresponding to the high-quality code stream. In this embodiment, the server may send the corresponding low-quality code stream, the Tiles in the high-quality code stream, and the corresponding reference frames to the terminal based on the received request message. For example: in a live scene, each time the server receives a request message from the terminal, the server generates, based on the request message, the low-quality code stream, the high-quality code stream, and the RARF stream corresponding to the required video data, and sends the low-quality code stream and the high-quality code stream to the terminal. After the view of the terminal is switched, the image corresponding to the playing time t2 of the switched view is the image corresponding to the P3 frame in the image sequence of the Segment. The terminal sends to the server a request message carrying the frame number of the P3 frame of Tile n. After receiving the request message, the server determines that the view of the terminal has been switched; the server then generates the Tile based on the request message (the generated Tile starts from the P3 frame, see fig. 7) and generates the RARF frame for the P3 frame, so that the encoded data of the P3 frame can be decoded with reference to the encoded data of the RARF frame. Subsequently, the server sends Segment1 of Tile n, which starts from the P3 frame, and the corresponding RARF frame to the terminal. Alternatively, the terminal may also request from the server the RARF frame whose image content is the same as that of the P3 frame, together with the P4 frame and the P frames following it; based on the request message, the server generates the encoded data of the RARF frame, the encoded data of the P4 frame, and so on, encapsulates them into a Segment, and sends the Segment to the terminal.
Optionally, in the on-demand scenario, the server may generate a high-quality code stream, a low-quality code stream, and a RARF stream based on the video data, and store the high-quality code stream, the low-quality code stream, and the RARF stream locally in the server. Subsequently, the server may send the corresponding low-quality code stream and Tile in the high-quality code stream to the terminal based on the received request message of the terminal. Similarly, if the server detects that the terminal has a view switching, the server may send Tile in the high-quality code stream corresponding to the new view and the corresponding RARF frame to the terminal.
Optionally, in this application, to reduce the storage overhead of the server, the server may also generate the corresponding RARF frame only after detecting that the view of the terminal has been switched or after receiving a request message carrying the RARF stream. For example: before the view is switched, in a live scene, the server generates the encoded data of the corresponding high-quality Tile based on the request message of the terminal and sends it to the terminal. After the view of the terminal is switched, the terminal sends to the server a request message carrying the frame number of the P3 frame of the Tile. After receiving the request message, the server determines that the view of the terminal has been switched; the server then generates the Tile based on the request message (the generated Tile starts from the P3 frame, see fig. 7) and generates the RARF frame for the P3 frame, so that the P3 frame can be decoded with reference to the RARF frame. Subsequently, the server sends Segment1 of Tile n, which starts from the P3 frame, and the corresponding RARF frame to the terminal.
Optionally, in the present application, as mentioned above, the terminal requests the encoded data of the new Tile and continues to request the encoded data of the old Tile that appears in the new view (i.e., the above-mentioned overlapping Tile). Therefore, in addition to sending the encoded data of the new Tile corresponding to the new view to the terminal, the server also continues to send the encoded data of the overlapping Tile to the terminal. For example: at 1 minute 30 seconds, the terminal sends to the server a request carrying the frame number of the I0 frame of Tile m, that is, the terminal requests from the server the encoded data of the first Segment of Tile m (assuming each Segment is 3 seconds long). Then, at 1 minute 32 seconds (i.e., time t1), the view of the terminal is switched, and the terminal requests from the server the P15 frame of Tile n (i.e., the frame played at the time corresponding to t2, belonging to the first Segment of Tile n) and the RARF frame corresponding to the P15 frame. After receiving the request, the server generates the Tile and the corresponding RARF frame (note that, in this embodiment, the generated RARF frame is the reference frame corresponding to the P15 frame, i.e., the client can decode the P15 frame based on one RARF frame), and sends to the terminal the Tile starting from the P15 frame together with the RARF frame corresponding to the P15 frame. Meanwhile, the server continues to send the remaining frames of the first Segment of Tile m to the terminal; at 1 minute 33 seconds, the terminal continues to request the encoded data of the second Segment of Tile m from the server, and the server responds to the request of the terminal and sends the encoded data of the other Segments of Tile m to the terminal.
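Purely as an illustration of the timing in this example, the helper below maps the time elapsed since the start of a Segment to the index of the first frame to request after a view switch; the 7.5 frames-per-second figure is an assumption chosen only so that the result matches the P15 frame mentioned above.

```python
def frame_to_request(time_since_segment_start_s: float, fps: float) -> int:
    """0-based index of the first frame to request in the new Tile after the switch."""
    return int(time_since_segment_start_s * fps)

# With a 3-second Segment starting at 1 minute 30 seconds and a switch at 1 minute 32 seconds,
# the terminal is 2 seconds into the Segment; at an assumed 7.5 fps this gives frame index 15,
# i.e. the P15 frame of the example above.
print(frame_to_request(2.0, 7.5))  # -> 15
```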
Optionally, in this application, the server returns a response message to the terminal, where the response message carries, in addition to the identification information of the Tiles, actual position information indicating the position of each Tile in the image data or the video image.
In step 104, the terminal decodes the sub-picture in the second view based on the encoded data of the reference picture.
Specifically, in the embodiment of the present application, the terminal acquires the encoded data of the Tiles in the high-quality code stream sent by the server (specifically, the encoded data of each image frame in the Segment of each Tile), the encoded data of the reference image (for example, the encoded data of the RARF reference frame), and the position information of each Tile (the position information acquired by the terminal from the server side is the actual position information of the Tile in the picture, i.e., the position information sent by the server as described above). Then, the terminal may decode the encoded data of the RARF reference frame to obtain the RARF reference frame. The terminal may decode the new Tile in the new view based on the decoded RARF reference frame, and the terminal may also continue to decode the overlapping Tile based on the received reference frames (I frames and/or P frames).
Specifically, in the present application, as shown in fig. 8, the Tiles in view 1 (the old view) include Tile1, Tile2, Tile3, Tile4, Tile8, Tile9, Tile10, and Tile11, and the Tiles in view 2 (the new view) include Tile4, Tile5, Tile6, Tile7, Tile11, Tile12, Tile13, and Tile14. Tile4 and Tile11 are the overlapping Tiles. It should be noted that the number of Tiles shown in fig. 8 is merely illustrative; for example, the number of overlapping Tiles may be one or more (e.g., 5), and the specific number depends on the size of the view, the rotation angle, the rotation speed, and the like.
Referring to fig. 8, the process of decoding the code stream (including the encoded data of the new Tile and the overlapped Tile) by the terminal is as follows:
1) The terminal fuses the encoded data of the new Tile and the encoded data of the overlapping Tile.
Specifically, in the present application, as shown in fig. 8, when Tile4 and Tile11 of the old view are fused, their positions in the Tile splicing (Tile Merge, or Tile rewriting) module (which may also be referred to as a bitstream splicing module or Tile splicing module) are as shown in fig. 8, i.e., Tile4 and Tile11 are located at the rightmost side of the Tile Merge. It should be noted that, in the prior art, a Tile to be fused is placed in the Tile Merge according to the position of the Tile in the video data corresponding to the new view; Tile4 and Tile11 would then be located at the leftmost side of the Tile Merge corresponding to the new view. However, the reference frames on which the encoded data of Tile4 and of Tile11 depend during decoding are still located at the rightmost side of the Tile Merge module, so in the prior art Tile4 and Tile11 cannot be decoded because their reference frames cannot be found.
With continued reference to fig. 8, optionally, in the embodiment of the present application, Tile4 and Tile11 can still be placed at the rightmost side of the Tile Merge module, i.e., remain in the original position. Then, the encoded data of Tile4 and the encoded data of Tile11 can be decoded based on the received reference frame (I frame or P frame).
Optionally, in this application, the terminal places the new Tiles (including Tile5, Tile6, Tile7, Tile12, Tile13, Tile14) at empty positions in the Tile Merge other than those of the overlapping Tiles. The placement may be as shown in fig. 8; it should be noted that the placement of the new Tiles in fig. 8 is only an illustrative example, and a new Tile may be placed at any empty position other than those of the overlapping Tiles. Subsequently, the terminal fuses the plurality of Tiles in the Tile Merge; for the specific fusion manner, reference may be made to the technical solutions in the prior art embodiments, which are not described in detail herein.
Optionally, in the present application, the terminal stores the actual position information and the fusion position information of the Tiles in a local cache. Optionally, the fusion position of a Tile in the Tile Merge (i.e., the placement position of the Tile in the Tile Merge) and the actual position of the Tile in the video image (i.e., the position information sent by the server) may be represented as arrays. For example: taking the fusion positions of the encoded data of the Tiles in the Tile Merge shown in fig. 8 as an example, two 8-element arrays are set to record, row by row, the position information of each Tile in the Tile Merge, where array 1 represents the real position of the Tile in the picture, and array 2 represents the fusion position information of the Tile (a sketch of this bookkeeping is given after step 3) below). Specifically, in view 1 the information in both arrays is [1, 2, 3, 4, 8, 9, 10, 11]. When the view is switched to view 2, the information in array 1 is updated to [4, 5, 6, 7, 11, 12, 13, 14], and the information in array 2 is [5, 6, 7, 4, 12, 13, 14, 11]. Optionally, the arrays may also be maintained by the server; that is, the server generates the corresponding arrays in real time according to the fusion positions and the actual positions of the Tiles in the terminal, and sends the arrays to the client when the client switches the view.
2) The fused Tile encoded data is decoded.
Specifically, the terminal decodes the fused Tile encoded data through a single decoder to obtain the Tile. For details of the decoding, reference may be made to the technical solutions in the prior art embodiments, and details of the present application are not described herein.
3) The Tiles are projected according to the recorded position information (including the actual position information and the fusion position information).
Specifically, in the present application, the terminal may project each Tile based on the fusion position information and the actual position information to obtain the video image to be displayed. For example: the array 2 stored in the terminal for view 2 is [5, 6, 7, 4, 12, 13, 14, 11]; based on array 2 and with reference to array 1, [4, 5, 6, 7, 11, 12, 13, 14], during projection Tile4 and Tile11 are placed at the positions indicated by array 1, and the terminal then displays the projected Tiles on its screen.
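The bookkeeping of steps 1) and 3), i.e., recording the actual positions (array 1) and the fusion positions (array 2) of the Tiles and projecting the decoded Tiles back to their actual positions, might be sketched as follows. This is only an illustration of the array-based representation described above, not the normative behaviour of the Tile Merge module.

```python
# View 2 of fig. 8 (both arrays list Tile numbers in slot order):
array_1 = [4, 5, 6, 7, 11, 12, 13, 14]   # actual position: Tile at each on-screen slot
array_2 = [5, 6, 7, 4, 12, 13, 14, 11]   # fusion position: Tile at each slot of the fused picture

def project(merged_tile_pixels: list) -> list:
    """Rearrange the decoded Tiles from their fusion slots to their on-screen slots."""
    fusion_slot_of = {tile: slot for slot, tile in enumerate(array_2)}
    return [merged_tile_pixels[fusion_slot_of[tile]] for tile in array_1]

# merged_tile_pixels[i] stands for the image patch decoded at fusion slot i of the merged picture.
merged_tile_pixels = [f"pixels of Tile{t}" for t in array_2]
screen = project(merged_tile_pixels)
# screen[0] is "pixels of Tile4": the overlapping Tile decoded at its old (rightmost)
# fusion slot is placed back at its real position in view 2.
```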
To sum up, according to the technical solution in the embodiment of the application, a reference code stream in which every frame is an intra-decoded frame is provided as the source of reference frames for the high-quality code stream, so that after the terminal switches its view, the new Tiles in the view can be decoded based on those reference frames; fast switching of code streams can thus be achieved without backing up a large amount of data, which significantly reduces the storage overhead of the server. In addition, during decoding, the overlapping Tiles can continue to be decoded at their original positions, which avoids repeated downloading of code streams the terminal has already downloaded, further reducing bandwidth overhead and improving resource utilization.
Optionally, in this application, after the view is switched, the terminal may request from the server the new Tile, the RARF frame corresponding to the new Tile (i.e., the RARF frame corresponding to the frame at the head of the new Tile), and the overlapping Tile together with the RARF frame corresponding to the overlapping Tile (i.e., the RARF frame corresponding to the frame at the head of the overlapping Tile). Alternatively, the server may send the new Tile, the RARF frame corresponding to the new Tile, the overlapping Tile, and the RARF frame corresponding to the overlapping Tile to the terminal after detecting the view switching. In this embodiment, the positions of the overlapping Tile and the new Tile in the Tile Merge may be arbitrary; during decoding, the terminal may decode the frame at the head of the overlapping Tile based on the RARF frame corresponding to the overlapping Tile, and decode the new Tile based on the RARF frame corresponding to the new Tile. The Tiles are then recombined based on the position information to obtain the source video image.
Scene two
Referring to fig. 4, as shown in fig. 9, a schematic flow chart of a video processing method in the embodiment of the present application is shown, where in fig. 9:
Step 201: the terminal obtains the encoded data of the sub-image covered by the first view angle and the encoded data of the corresponding reference image from the server side.
Specifically, in the embodiment of the present application, the terminal sends a request message generated based on a user instruction to the server. The request message may be used to request the server for the low quality layers displayed by the terminal and the high quality layers within the viewing angle range of the terminal.
Alternatively, in the present application, the server may encode and encapsulate the image into a high-quality bitstream, a low-quality bitstream, and a RARF stream. Optionally, in this application, the high-quality code stream generated by the server is as follows: the panoramic video is divided into a plurality of rectangular Tiles in the spatial dimension, and each Tile is divided into a plurality of segments in the time dimension, where the duration of each segment may be 1 second (the segments may be divided according to actual requirements). The spatial size of each Tile may be the same or different, and the duration of each segment in a Tile may also be the same or different. The prediction mode of the image frames in each Segment of each Tile is inter-frame prediction; that is, the Segment code stream is obtained by performing inter-frame prediction on all the image frames described by the Segment. For example: all image frames in the Segment may be P frames.
Optionally, in this application, the server encodes, frame by frame, the video picture corresponding to each Tile of the high-quality code stream into an RARF code stream, and the first RARF frame of each segment serves as the reference frame of the corresponding segment of the high-quality code stream. The code stream corresponding to the video data that the server sends to the terminal includes: the low-quality code stream, the high-quality code stream, and the first RARF frame of each segment in the RARF code stream.
Optionally, in the on-demand scene, the server encodes the video data in advance to generate code streams with different qualities, where the code streams include, but are not limited to: high quality codestream, low quality codestream, and RARF stream. The server receives a request message sent by the terminal, determines video data required by the terminal (the required data can be data to be displayed currently and/or predicted video data), and sends a low-quality code stream corresponding to the video data, a Tile in a high-quality code stream corresponding to the video data, and a first RARF frame of each segment in the RARF stream corresponding to the high-quality code stream to the terminal. Wherein, Tile in the high-quality code stream sent is determined based on identification information carried in the request message sent by the terminal.
Optionally, in a live broadcast scene, after receiving a request message from a terminal, a server determines video data required by the terminal, encodes the video data required by the terminal, and obtains a low-quality code stream, a high-quality code stream, and an RARF stream after encoding. Then, the server sends the low-quality code stream, the Tile in the high-quality code stream corresponding to the video data and the first RARF frame of each segment in the RARF stream corresponding to the high-quality code stream to the terminal. Wherein, Tile in the high-quality code stream sent is determined based on identification information carried in the request message sent by the terminal.
Optionally, in the process of generating the RARF frame, the server may generate only the RARF frame used as a reference frame of the high-quality code stream, that is, only the first RARF frame of each segment, so as to reduce overhead.
For other details, reference may be made to step 101, which is not described herein.
In step 202, the terminal decodes the acquired encoded data of the sub-image based on the encoded data of the reference image.
Specifically, the terminal decodes the low-quality code stream and the high-quality code stream respectively to obtain a base layer (or may be referred to as a low-quality layer) corresponding to the low-quality code stream and a high-quality layer corresponding to the high-quality code stream.
In this application, the terminal receives the Tiles corresponding to a plurality of high-quality code streams, splices and fuses these Tiles, and decodes the fused Tiles with a single decoder. The terminal may decode the low-quality code stream based on the I frames of the low-quality code stream, and decodes the high-quality code stream based on the received RARF frames. For example: the terminal receives the encoded data of the first RARF frame (RARF1 frame for short) of the first Segment of the RARF n stream, where the RARF n stream has the same image content as Tile n, that is, RARF n is generated by encoding Tile n frame by frame; the terminal then uses the RARF1 frame as the reference frame of the first Segment of Tile n to decode the first image in the image sequence of that Segment. For the specific fusion and decoding process, reference may be made to the technical solutions in the prior art embodiments, which are not described in detail herein.
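The following sketch illustrates, under stated assumptions, the reference handling described above for scene two: the first RARF frame of a Segment is decoded independently and serves as the reference for the first P frame of the corresponding Segment of Tile n, after which each P frame references the previously decoded picture. decode_intra and decode_inter are stubbed placeholders standing in for a real decoder, not APIs defined by this application.

```python
def decode_intra(encoded_frame):
    # Placeholder: a real implementation would intra-decode the independently decodable RARF frame.
    return ("picture", encoded_frame)

def decode_inter(encoded_frame, reference_picture):
    # Placeholder: a real implementation would inter-decode a P frame against `reference_picture`.
    return ("picture", encoded_frame, reference_picture)

def decode_segment_with_rarf(rarf_first_frame, segment_p_frames):
    """Decode one all-P-frame Segment of Tile n, using the first RARF frame of the
    corresponding RARF stream as the reference for the first image of the Segment."""
    reference = decode_intra(rarf_first_frame)       # e.g. the RARF1 frame of RARF n
    pictures = []
    for encoded in segment_p_frames:
        picture = decode_inter(encoded, reference)   # the first P frame references the RARF frame
        pictures.append(picture)
        reference = picture                          # later P frames reference the previous picture
    return pictures

decoded = decode_segment_with_rarf("RARF1", ["P1", "P2", "P3"])
```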
Step 203, when the first view angle is switched to the second view angle, the terminal acquires the encoded data of the sub-image covered by the second view angle and the encoded data of the corresponding reference image from the server side.
Specifically, in the embodiment of the present application, after the view angle of the terminal is switched, the video data required by the terminal needs to be updated, and the terminal sends a request message to the server to request the video data on the low-quality layer and the new view angle (including the portions within the specified range inside the new view angle and outside the new view angle).
For details, refer to step 103, which is not described herein.
In step 204, the terminal decodes the sub-picture in the second view based on the encoded data of the reference picture.
Specifically, in the present application, the server transmits to the terminal the low-quality code stream covered by the new view, the high-quality code stream, and the encoded data of the RARF frame (in this embodiment, the RARF frame refers to the RARF frame serving as the reference frame for the head frame of the Tiles in the new view, where the head frame is the frame, at the view switching time, of the corresponding Segment of the Tiles in the new view).
Optionally, in this application, the server continues to send Segment in the overlapped Tile to the terminal according to a request of the terminal, and sends, based on the request of the terminal, the encoded data of the new Tile and the encoded data of the RARF frame corresponding to the new Tile to the terminal (it should be noted that the number of the sent RARF frames may be one or more, where, if one RARF frame is sent, the RARF frame is a reference frame for decoding the data frame located at the head in the Tile as described above).
Optionally, in this application, the server returns a response message to the terminal, where the response message carries, in addition to the identification information of the tiles, actual location information used for indicating each Tile in the video image. The actual position information may be represented by an array, and the specific representation manner will be exemplified in the following embodiments.
Then, in the present application, the terminal acquires Tile encoded data, a corresponding reference frame, and location information of each Tile in the high-quality code stream sent by the server. Then, the terminal may decode encoded data of an image frame positioned at the head in the Segment of the new Tile in the new view based on the RARF frame. And the terminal can also decode the image frame at the head in the overlapped Tile and continue to decode other frames based on the received encoded data of the RARF frame.
For details, reference may be made to step 104, which is not described herein.
In summary, according to the technical scheme in the embodiment of the present application, since the high-quality code stream does not have an I frame, the storage overhead of the server is further reduced.
Optionally, in this application, after the view is switched, the terminal may request from the server the new Tile, the RARF frame corresponding to the new Tile, and the overlapping Tile together with the RARF frame corresponding to the overlapping Tile. Alternatively, the server may send the new Tile, the RARF frame corresponding to the new Tile, the overlapping Tile, and the RARF frame corresponding to the overlapping Tile to the terminal after detecting the view switching. In this embodiment, the positions of the overlapping Tile and the new Tile in the Tile Merge may be arbitrary; during decoding, the terminal may decode the overlapping Tile based on the RARF frame corresponding to the overlapping Tile, and decode the new Tile based on the RARF frame corresponding to the new Tile. Each Tile is then mapped based on the position information to obtain the video image to be displayed.
Optionally, and applicable to the embodiments of both scenario one and scenario two, in this application the high-quality code stream may further include a plurality of code streams with different quality levels (i.e., different coding rates). For example: the high-quality code stream may include three high-quality code streams of quality level A, quality level B, and quality level C, where quality level A is higher than quality level B, which in turn is higher than quality level C.
Optionally, in this application, the server may determine, based on external conditions, the quality level of the Tiles sent to the terminal. The external conditions include, but are not limited to: the processing capability of the terminal, the size of the transmission bandwidth between the server and the terminal, and the like. For example: if the current load of the terminal is high or the bandwidth is small, the server may transmit to the terminal the encoded data of the Tiles of the code stream with quality level C; conversely, if the external conditions are good, the server may transmit to the terminal the encoded data of the Tiles of the code stream with quality level A. Typically, the server may transmit one or more of the code streams of quality level A, quality level B, and quality level C to the terminal. For example: taking a video image containing a face image as an example, the server may send to the terminal a code stream of quality level A generated from the face image, and send code streams of quality level B and/or C generated from the other images within the view range, so that the terminal can display the face image more clearly.
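As a non-normative illustration of how a server might choose among quality levels A, B and C from the external conditions just mentioned, consider the sketch below; the numeric thresholds and the idea of a normalized terminal load are assumptions for illustration only.

```python
def select_quality_level(bandwidth_mbps: float, terminal_load: float) -> str:
    """Pick quality level A (highest), B or C from the external conditions."""
    if bandwidth_mbps >= 20 and terminal_load < 0.5:
        return "A"   # ample bandwidth and a lightly loaded terminal
    if bandwidth_mbps >= 8 and terminal_load < 0.8:
        return "B"
    return "C"       # small bandwidth or a heavily loaded terminal

def select_per_tile(face_tiles: set, view_tiles: set, other_level: str = "B") -> dict:
    """Region-of-interest variant: Tiles containing the face get level A, the rest a lower level."""
    return {tile: ("A" if tile in face_tiles else other_level) for tile in view_tiles}

levels = select_per_tile({"Tile5"}, {"Tile4", "Tile5", "Tile6"})
# -> {"Tile4": "B", "Tile5": "A", "Tile6": "B"}
```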
The above-mentioned scheme provided by the embodiment of the present application is introduced mainly from the perspective of interaction between network elements. It is understood that the video processing apparatus (including the terminal and the server) includes a hardware structure and/or a software module for performing the respective functions in order to realize the above functions. Those of skill in the art will readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is performed as hardware or computer software drives hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiment of the present application, the video processing apparatus may be divided into functional modules according to the method example, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. It should be noted that, in the embodiment of the present application, the division of the module is schematic, and is only one logic function division, and there may be another division manner in actual implementation.
In the case of dividing each functional module by corresponding functions, fig. 10 shows a schematic diagram of a possible structure of the video processing apparatus 300 (e.g., the terminal 200) according to the foregoing embodiment, and as shown in fig. 10, the video processing apparatus may include: a decoding module 301 and an obtaining module 302. The decoding module 301 can be used to "decode the encoded data of the first sub-image covered by the first view, so as to obtain the first sub-image". For example: the module can support the video processing apparatus 300 to perform the steps 102 and 104 in the above method embodiments. The obtaining module 302 may be used for the step of obtaining encoded data of a second sub-picture covered by a second view when the first view is switched to the second view. For example: this module may support the video processing apparatus 300 to perform step 103 in the above method embodiment. And, the decoding module 301 can be further configured to "decode the encoded data of the first reference image to obtain a first reference image" and "decode the encoded data of the second sub-image according to the first reference image to obtain the second sub-image". For example: this module may support the video processing apparatus 300 to perform step 104 in the above method embodiment.
In this application, fig. 11 shows a schematic diagram of a possible structure of the video processing apparatus 400 (e.g., the server 100) involved in the above embodiments, and as shown in fig. 11, the video processing apparatus 400 may include: a sending module 401. The sending module 401 may be configured to send "to a terminal, encoded data of a first sub-image covered by a first view of the terminal" and send "to the terminal, when the first view of the terminal is switched to a second view, encoded data of a second sub-image covered by the second view".
An apparatus provided by an embodiment of the present application is described below. As shown in fig. 12:
the apparatus comprises a processing module 501 and a communication module 502. Optionally, the apparatus further comprises a storage module 503. The processing module 501, the communication module 502 and the storage module 503 are connected by a communication bus.
The communication module 502 may be a device with transceiving function for communicating with other network devices or a communication network.
The storage module 503 may include one or more memories, which may be devices in one or more devices or circuits for storing programs or data.
The memory module 503 may be independent and connected to the processing module 501 through a communication bus. The memory module may also be integrated with the processing module 501.
The apparatus 500 may be used in a network device, circuit, hardware component, or chip.
The apparatus 500 may be a terminal in the embodiment of the present application, such as the terminal 200. Optionally, the communication module 502 of the apparatus 500 may include an antenna and a transceiver of the terminal.
The apparatus 500 may be a chip in a terminal in an embodiment of the present application. The communication module 502 may be an input or output interface, a pin or circuit, or the like. Alternatively, the storage module may store computer-executable instructions of the terminal-side method, so that the processing module 501 executes the terminal-side method in the above-described embodiments. The storage module 503 may be a register, a cache, or a RAM, etc., and the storage module 503 may be integrated with the processing module 501; the memory module 503 may be a ROM or other type of static storage device that may store static information and instructions, and the memory module 503 may be separate from the processing module 501.
When the apparatus 500 is a terminal or a chip in a terminal in the embodiment of the present application, the apparatus 500 may implement the method performed by the terminal in the embodiment described above.
The apparatus 500 may be a server in the embodiment of the present application, such as the server 100.
The apparatus 500 may be a chip in a server in the embodiment of the present application. The communication module 502 may be an input or output interface, a pin or circuit, or the like. Alternatively, the storage module may store computer-executable instructions of the server-side method, so that the processing module 501 executes the server-side method in the above embodiments. The storage module 503 may be a register, a cache, or a RAM, etc., and the storage module 503 may be integrated with the processing module 501; the memory module 503 may be a ROM or other type of static storage device that may store static information and instructions, and the memory module 503 may be separate from the processing module 501.
When the apparatus 500 is a server or a chip in a server in the embodiment of the present application, the method performed by the server in the embodiment described above may be implemented.
The embodiment of the application also provides a computer readable storage medium. The methods described in the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media may include computer storage media and communication media, and may include any medium that can communicate a computer program from one place to another. A storage media may be any available media that can be accessed by a computer.
As an alternative design, a computer-readable medium may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The embodiment of the application also provides a computer program product. The methods described in the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. If implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions described in the above method embodiments are generated in whole or in part when the above computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a computer network, a network appliance, a user device, or other programmable apparatus.
The steps of a method or algorithm described in connection with the disclosure of the embodiments of the application may be embodied in hardware or in software instructions executed by a processor. The software instructions may be composed of corresponding software modules that may be stored in Random Access Memory (RAM), flash memory, Read Only Memory (ROM), Erasable Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), registers, a hard disk, a removable disk, a compact disc Read Only Memory (CD-ROM), or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. Additionally, the ASIC may reside in a network device. Of course, the processor and the storage medium may reside as discrete components in a network device.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in the embodiments of the present application may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (28)
1. A method of video processing, the method comprising:
decoding encoded data of a first sub-image covered by a first view to obtain the first sub-image;
when the first view angle is switched to a second view angle, acquiring encoded data of a second sub-image covered by the second view angle and encoded data of a first reference image of the second sub-image, wherein the encoded data of the first reference image is independent of a code stream where the encoded data of the second sub-image is located;
decoding the encoded data of the first reference image to obtain a first reference image;
and decoding the encoded data of the second sub-image according to the first reference image so as to obtain the second sub-image.
2. The method according to claim 1, wherein the code stream in which the encoded data of the second sub-image is located is a Segment code stream of the encoded data of the second sub-image.
3. The method according to claim 1 or 2, wherein the second sub-image is an image other than the first image in decoding order in an image sequence described by a code stream in which the encoded data of the second sub-image is located, or the second sub-image is an image other than the first image and the second image in decoding order in an image sequence described by a code stream in which the encoded data of the second sub-image is located.
4. The method according to any one of claims 1 to 3, wherein the image content of the first reference image is the same as the image content of a second reference image of the second sub-image, the second reference image being an image in an image sequence described by a code stream in which the encoded data of the second sub-image is located.
5. The method according to claim 4, wherein the second reference picture is a picture other than the first picture in decoding order in a picture sequence described by a code stream in which the encoded data of the second sub-picture is located.
6. The method of claim 5, wherein the second reference picture is a previous picture in the sequence of pictures of the second sub-picture.
7. The method according to any one of claims 1 to 6, wherein the code stream in which the encoded data of the second sub-image is located is a code stream obtained by performing inter-frame prediction on all image frames described in the code stream.
8. The method according to any one of claims 1 to 7, wherein the code stream in which the encoded data of the first reference image is located is a code stream obtained by performing intra-frame prediction on all image blocks of all image frames described in the code stream, and the code stream in which the encoded data of the first reference image is located is independent of the code stream in which the encoded data of the second sub-image is located.
9. The method of claim 8, wherein the first reference picture is a clean random access (CRA) picture.
10. The method of any one of claims 1 to 9, wherein at least a portion of the second sub-image is not covered by the first view angle.
11. The method according to any one of claims 1 to 10, wherein the second view angle also covers the first sub-image, the first view angle further covers a third sub-image, and the decoding the encoded data of the first sub-image covered by the first view angle to obtain the first sub-image comprises:
merging a first code stream and a third code stream to obtain a first merged code stream, wherein the first code stream is the code stream in which the encoded data of the first sub-image is located, and the third code stream is the code stream in which the encoded data of the third sub-image is located;
decoding the first merged code stream to obtain an image comprising the first sub-image and the third sub-image;
and the decoding the encoded data of the second sub-image according to the first reference image, thereby obtaining the second sub-image comprises:
merging the first code stream and a second code stream to obtain a second merged code stream, wherein the second code stream is the code stream in which the encoded data of the second sub-image is located, and the position, in the image described by the first merged code stream, of the sub-image described by the first code stream is consistent with the position, in the image described by the second merged code stream, of the sub-image described by the first code stream;
and decoding the second merged code stream according to the first reference image, thereby obtaining an image comprising the second sub-image and the sub-image described by the first code stream.
12. A method of video processing, the method comprising:
sending encoded data of a first sub-image covered by a first view angle of a terminal to the terminal;
and when the first view angle of the terminal is switched to a second view angle, sending encoded data of a second sub-image covered by the second view angle and encoded data of a first reference image of the second sub-image to the terminal, wherein the encoded data of the first reference image is independent of the code stream in which the encoded data of the second sub-image is located.
13. The method according to claim 12, wherein the code stream of the encoded data of the second sub-image is a segment code stream of the encoded data of the second sub-image.
14. The method according to claim 12 or 13, wherein the second sub-image is an image other than the first image in decoding order in an image sequence described by a code stream in which the encoded data of the second sub-image is located, or the second sub-image is an image other than the first image and the second image in decoding order in an image sequence described by a code stream in which the encoded data of the second sub-image is located.
15. The method according to any one of claims 12 to 14, wherein the image content of the first reference image is the same as the image content of a second reference image of the second sub-image, the second reference image being an image in an image sequence described by a code stream in which the encoded data of the second sub-image is located.
16. The method according to claim 15, wherein the second reference picture is a picture other than the first picture in decoding order in a picture sequence described by a code stream in which the encoded data of the second sub-picture is located.
17. The method of claim 16, wherein the second reference picture is a previous picture in the sequence of pictures of the second sub-picture.
18. The method according to any one of claims 12 to 17, wherein the code stream in which the encoded data of the second sub-image is located is a code stream obtained by performing inter-frame prediction on all the image frames described in the code stream.
19. The method according to any one of claims 12 to 18, wherein the code stream in which the encoded data of the first reference image is located is a code stream obtained by performing intra-frame prediction on all image blocks of all image frames described in the code stream, and the code stream in which the encoded data of the first reference image is located is independent of the code stream in which the encoded data of the second sub-image is located.
20. The method of claim 19, wherein the first reference picture is a clean random access (CRA) picture.
21. The method of any of claims 12 to 20, wherein the second sub-image is not covered by the first view angle.
22. A video processing apparatus, characterized in that the apparatus comprises:
a decoding module, configured to decode encoded data of a first sub-image covered by a first view angle to obtain the first sub-image;
an acquisition module, configured to acquire encoded data of a second sub-image covered by a second view angle and encoded data of a first reference image of the second sub-image when the first view angle is switched to the second view angle, wherein the encoded data of the first reference image is independent of the code stream in which the encoded data of the second sub-image is located;
the decoding module is further configured to decode the encoded data of the first reference image to obtain the first reference image; and
the decoding module is further configured to decode the encoded data of the second sub-image according to the first reference image, thereby obtaining the second sub-image.
23. A video processing apparatus, characterized in that the apparatus comprises:
the terminal comprises a sending module, a receiving module and a processing module, wherein the sending module is used for sending the coded data of a first sub-image covered by a first visual angle of the terminal to the terminal;
the sending module is further configured to send, to the terminal, encoded data of a second sub-image covered by a second view and encoded data of a first reference image of the second sub-image when the first view of the terminal is switched to the second view, where the encoded data of the first reference image is independent of a code stream where the encoded data of the second sub-image is located.
24. A video processing system, characterized by comprising a server side and a terminal, wherein:
the server side sends encoded data of a first sub-image covered by a first view angle of the terminal to the terminal;
the terminal receives the encoded data of the first sub-image, and decodes the encoded data of the first sub-image to obtain the first sub-image;
when the first view angle of the terminal is switched to a second view angle, the server side sends encoded data of a second sub-image covered by the second view angle and encoded data of a first reference image of the second sub-image to the terminal, wherein the encoded data of the first reference image is independent of the code stream in which the encoded data of the second sub-image is located;
the terminal receives the encoded data of the second sub-image and the encoded data of the first reference image, and decodes the encoded data of the first reference image to obtain the first reference image;
and the terminal decodes the encoded data of the second sub-image according to the first reference image, thereby obtaining the second sub-image.
25. A computer readable storage medium having stored thereon a computer program comprising at least one code section executable by a computer for controlling the computer to perform the method of any one of claims 1 to 11.
26. A computer readable storage medium having stored thereon a computer program comprising at least one code section executable by a computer for controlling the computer to perform the method of any one of claims 12 to 21.
27. A computer program which, when executed by a computer, causes the computer to perform the method of any one of claims 1 to 11.
28. A computer program which, when executed by a computer, causes the computer to perform the method of any one of claims 12 to 21.
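Editor's note: for orientation, a minimal decoder-side sketch of the view-switch flow recited in claims 1 to 9 follows. The `SubImageDecoder` class and the `fetch_segment` / `fetch_reference` callables are assumptions introduced only for illustration; the claims do not prescribe any particular decoder interface or API.

```python
# Minimal sketch of the decoder-side view switch in claims 1 to 9.
# All names here are illustrative assumptions, not part of the patent.

class SubImageDecoder:
    """Toy stand-in for a block-based decoder of one sub-image (tile) stream."""

    def __init__(self):
        self.reference_picture = None  # last reconstructed picture

    def decode_intra(self, encoded_data):
        # An all-intra (e.g. clean-random-access) picture needs no prior
        # reference, so decoding can start mid-sequence from it.
        picture = {"payload": encoded_data, "predicted_from": None}
        self.reference_picture = picture
        return picture

    def decode_inter(self, encoded_data):
        # Inter-coded pictures are predicted from the stored reference.
        picture = {"payload": encoded_data, "predicted_from": self.reference_picture}
        self.reference_picture = picture
        return picture


def switch_view(fetch_segment, fetch_reference, new_view):
    """Decode the sub-image covered by `new_view` after a view switch.

    `fetch_segment(view)` returns the inter-coded segment of the sub-image the
    view covers; `fetch_reference(view)` returns the separately coded reference
    picture, carried in a code stream independent of that segment.
    """
    decoder = SubImageDecoder()

    segment_data = fetch_segment(new_view)      # encoded data of the second sub-image
    reference_data = fetch_reference(new_view)  # encoded data of its first reference image

    decoder.decode_intra(reference_data)        # decode the out-of-band reference first
    return decoder.decode_inter(segment_data)   # then the inter-coded sub-image against it


picture = switch_view(lambda v: f"segment({v})", lambda v: f"reference({v})", "view_B")
```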
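Claims 4 to 9 (and 15 to 20) imply a preparation step on the encoding side: besides the inter-coded segment stream of each sub-image, an independent all-intra stream is produced whose pictures have the same image content as pictures of the segment stream, so any of them can stand in for the reconstructed reference that a terminal joining mid-segment is missing. The rough sketch below illustrates that pairing; `encode_inter` and `encode_intra` are hypothetical stand-ins for a real encoder and are not described in the patent.

```python
# Rough encoder-side sketch implied by claims 4-9 / 15-20: for each sub-image
# picture, keep (a) its inter-coded form inside the segment stream and (b) an
# independently decodable intra copy of the same content in a separate stream.

def encode_inter(picture, reference):
    return {"mode": "inter", "content": picture, "ref": reference}

def encode_intra(picture):
    return {"mode": "intra", "content": picture}  # decodable on its own (CRA-like)

def prepare_sub_image_streams(pictures, initial_reference):
    segment_stream, reference_stream = [], []
    previous = initial_reference
    for picture in pictures:
        # Segment stream: every picture is inter-predicted (cf. claims 7 and 18).
        segment_stream.append(encode_inter(picture, previous))
        # Independent reference stream: an all-intra copy with the same image
        # content (cf. claims 4 and 8), decodable without earlier segment pictures.
        reference_stream.append(encode_intra(picture))
        previous = picture
    return segment_stream, reference_stream

segments, references = prepare_sub_image_streams(
    ["pic1", "pic2", "pic3"], initial_reference="last_pic_of_previous_segment")
```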
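Claim 11 requires the sub-image carried by the first code stream to occupy the same position in both merged code streams, so that its already-decoded pictures remain usable as references after the merge. The toy sketch below shows only that position bookkeeping under an assumed two-slot layout; the grid coordinates, stream names and `merge_streams` helper are hypothetical.

```python
# Toy sketch of the merge constraint in claim 11: the sub-image from the first
# code stream keeps the same position in the first and second merged streams.

def merge_streams(kept_stream, new_stream, kept_position, new_position):
    """Return a merged-stream description placing each sub-stream at a fixed slot."""
    if kept_position == new_position:
        raise ValueError("sub-images must occupy distinct positions in the merged image")
    return {kept_position: kept_stream, new_position: new_stream}

# Before the switch: first (kept) + third sub-image -> first merged code stream.
first_merged = merge_streams("stream_1", "stream_3",
                             kept_position=(0, 0), new_position=(0, 1))

# After the switch: first (kept) + second sub-image -> second merged code stream.
# The kept sub-image stays at (0, 0), matching its position in first_merged,
# so its decoded pictures remain valid as references.
second_merged = merge_streams("stream_1", "stream_2",
                              kept_position=(0, 0), new_position=(0, 1))

assert first_merged[(0, 0)] == second_merged[(0, 0)] == "stream_1"
```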
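On the sending side (claims 12 to 21 and the system of claim 24), the server only needs to pair the new sub-image's segment with its separately stored reference-image data when the terminal's view switches. The sketch below uses made-up message fields, storage dictionaries and a terminal stub; the claims do not define a transport format.

```python
# Simplified sketch of the server-side behaviour in claims 12-21 and the
# server/terminal exchange of claim 24. All names are illustrative assumptions.

SEGMENTS = {"view_A": b"inter-coded segment A", "view_B": b"inter-coded segment B"}
REFERENCES = {"view_A": b"intra reference A", "view_B": b"intra reference B"}

class TerminalStub:
    def send(self, message):
        print(message["type"], message["view"])

def serve(terminal, current_view, new_view=None):
    # Steady state: send the segment of the sub-image covered by the current view.
    terminal.send({"type": "sub_image", "view": current_view,
                   "data": SEGMENTS[current_view]})

    # On a view switch, additionally send the independent reference-image data,
    # so the terminal can decode the new inter-coded segment immediately instead
    # of waiting for the next intra picture in that segment.
    if new_view is not None and new_view != current_view:
        terminal.send({"type": "reference_image", "view": new_view,
                       "data": REFERENCES[new_view]})
        terminal.send({"type": "sub_image", "view": new_view,
                       "data": SEGMENTS[new_view]})

serve(TerminalStub(), current_view="view_A", new_view="view_B")
```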
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910415208.2A CN111935557B (en) | 2019-05-13 | 2019-05-13 | Video processing method, device and system |
PCT/CN2020/085263 WO2020228482A1 (en) | 2019-05-13 | 2020-04-17 | Video processing method, apparatus and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910415208.2A CN111935557B (en) | 2019-05-13 | 2019-05-13 | Video processing method, device and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111935557A true CN111935557A (en) | 2020-11-13 |
CN111935557B CN111935557B (en) | 2022-06-28 |
Family
ID=73282865
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910415208.2A Active CN111935557B (en) | 2019-05-13 | 2019-05-13 | Video processing method, device and system |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111935557B (en) |
WO (1) | WO2020228482A1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114598853A (en) * | 2020-11-20 | 2022-06-07 | 中国移动通信有限公司研究院 | Video data processing method and device and network side equipment |
CN113660529A (en) * | 2021-07-19 | 2021-11-16 | 镕铭微电子(济南)有限公司 | Video splicing, encoding and decoding method and device based on Tile encoding |
CN114268835B (en) * | 2021-11-23 | 2022-11-01 | 北京航空航天大学 | VR panoramic video space-time slicing method with low transmission flow |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102055967B (en) * | 2009-10-28 | 2012-07-04 | 中国移动通信集团公司 | Visual angle switching and encoding method and device of multi-viewpoint video |
JP6352248B2 (en) * | 2013-04-04 | 2018-07-04 | シャープ株式会社 | Image decoding apparatus and image encoding apparatus |
EP3485646B1 (en) * | 2016-07-15 | 2022-09-07 | Koninklijke KPN N.V. | Streaming virtual reality video |
CN108616758B (en) * | 2016-12-15 | 2023-09-29 | 北京三星通信技术研究有限公司 | Multi-view video encoding and decoding methods, encoder and decoder |
CN108810636B (en) * | 2017-04-28 | 2020-04-14 | 华为技术有限公司 | Video playing method, virtual reality equipment, server, system and storage medium |
- 2019-05-13 CN CN201910415208.2A patent/CN111935557B/en active Active
- 2020-04-17 WO PCT/CN2020/085263 patent/WO2020228482A1/en active Application Filing
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150346812A1 (en) * | 2014-05-29 | 2015-12-03 | Nextvr Inc. | Methods and apparatus for receiving content and/or playing back content |
CN107439010A (en) * | 2015-05-27 | 2017-12-05 | 谷歌公司 | The spherical video of streaming |
CN109698949A (en) * | 2017-10-20 | 2019-04-30 | 腾讯科技(深圳)有限公司 | Method for processing video frequency, device and system based on virtual reality scenario |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112770051A (en) * | 2021-01-04 | 2021-05-07 | 聚好看科技股份有限公司 | Display method and display device based on field angle |
CN112770051B (en) * | 2021-01-04 | 2022-01-14 | 聚好看科技股份有限公司 | Display method and display device based on field angle |
CN114189696A (en) * | 2021-11-24 | 2022-03-15 | 阿里巴巴(中国)有限公司 | Video playing method and device |
CN114189696B (en) * | 2021-11-24 | 2024-03-08 | 阿里巴巴(中国)有限公司 | Video playing method and device |
CN115174942A (en) * | 2022-07-08 | 2022-10-11 | 叠境数字科技(上海)有限公司 | Free visual angle switching method and interactive free visual angle playing system |
CN115834966A (en) * | 2022-11-07 | 2023-03-21 | 抖音视界有限公司 | Video playing method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN111935557B (en) | 2022-06-28 |
WO2020228482A1 (en) | 2020-11-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111935557B (en) | Video processing method, device and system | |
CN111052754B (en) | Streaming frames of spatial elements to a client device | |
TWI571113B (en) | Random access in a video bitstream | |
KR102218385B1 (en) | Codec techniques for fast switching | |
US10999583B2 (en) | Scalability of multi-directional video streaming | |
CN110784740A (en) | Video processing method, device, server and readable storage medium | |
CN105791882A (en) | Video coding method and device | |
US20190313144A1 (en) | Synchronizing processing between streams | |
WO2017197828A1 (en) | Video encoding and decoding method and device | |
CN110351606B (en) | Media information processing method, related device and computer storage medium | |
CN110582012B (en) | Video switching method, video processing device and storage medium | |
KR20170030521A (en) | Dependent random access point pictures | |
CN112602329B (en) | Block scrambling for 360 degree video decoding | |
JP7553679B2 (en) | Encoder and method for encoding tile-based immersive video - Patents.com | |
CN114189696A (en) | Video playing method and device | |
WO2022031633A1 (en) | Supporting view direction based random access of bitstream | |
CN115988171A (en) | Video conference system and immersive layout method and device thereof | |
CN113508601B (en) | Client and method for managing streaming sessions of multimedia content at a client | |
CN114339265A (en) | Video processing, control, playing method, device and medium | |
CN114513658B (en) | Video loading method, device, equipment and medium | |
Fautier | State-of-the-art virtual reality streaming: Solutions for reducing bandwidth and improving video quality | |
CN117319592B (en) | Cloud desktop camera redirection method, system and medium | |
CN117579843B (en) | Video coding processing method and electronic equipment | |
CN115834899A (en) | View-angle-based VR video coding method and device, storage medium and electronic equipment | |
CN117440174A (en) | Video processing method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |