CN114598853A - Video data processing method and device and network side equipment - Google Patents

Info

Publication number
CN114598853A
Authority
CN
China
Prior art keywords
resolution
video stream
low
panoramic video
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011312797.0A
Other languages
Chinese (zh)
Inventor
尹瑜坚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Communications Ltd Research Institute
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Communications Ltd Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Communications Ltd Research Institute
Priority to CN202011312797.0A
Publication of CN114598853A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/194 Transmission of image signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/10 Architectures or entities
    • H04L65/1016 IP multimedia subsystem [IMS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/1066 Session management
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/80 Responding to QoS
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H04N13/111 Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation
    • H04N13/117 Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation the virtual viewpoint locations being selected by the viewers or determined by viewer tracking
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H04N13/156 Mixing image signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H04N13/161 Encoding, multiplexing or demultiplexing different image signal components
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/141 Systems for two-way working between two video terminals, e.g. videophone

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention provides a video data processing method, a device, and network side equipment. The method comprises: decoding and copying a low-resolution video stream to obtain a first path of low-resolution video stream and a second path of low-resolution video stream; performing resolution enhancement processing on the second path of low-resolution video stream to obtain a high-resolution video stream; performing tile division on the high-resolution video stream to obtain a plurality of high-resolution video tiles; and taking the first path of low-resolution video stream as the background layer of a panoramic video and the high-resolution video tiles as its enhancement layer, and splicing them to generate a panoramic video stream. In the embodiments of the invention, the panoramic video acquisition terminal uploads a low-resolution video stream to the network side equipment, which applies video resolution enhancement in combination with tile-division transmission. A higher-resolution panoramic video is thereby obtained while both the uplink bandwidth and the downlink bandwidth are used reasonably.

Description

Video data processing method and device and network side equipment
Technical Field
The present invention relates to the field of communications technologies, and in particular to a video data processing method, a video data processing device, and a network side device.
Background
Panoramic video, an emerging video service, consists of 360-degree omnidirectional pictures; a user can choose any viewing angle to watch the content of interest and obtain an immersive experience. Panoramic video has broad application prospects in scenarios such as scenic spots, street-level city guides, panoramic displays of enterprises, virtual tours of museums and exhibitions, and on-site recording of major events, artistic performances, and sports events. In real-time video services, panoramic live broadcast and panoramic video calls have also become research hotspots.
However, at ordinary resolutions a 360-degree video cannot match the sharpness of conventional video and readily produces the screen-door effect, which seriously degrades the user experience. Panoramic video therefore typically has ultra-high resolution (4K or above) and a far larger data volume than conventional video. Given the limits on end-to-end hardware, network bandwidth, computing resources, and other factors, how to transmit such large volumes of panoramic video data in a constrained network environment has become a key problem.
The end-to-end network transmission of panoramic video can be divided into acquisition, editing, storage, transmission, and playback. Because 360-degree panoramic video has ultra-high resolution, its large data volume usually places high demands on network bandwidth during transmission. The prior art uses adaptive transmission based on field-of-view (FOV) feedback from the terminal, in particular tile-based transmission. As shown in fig. 1, an ultra-high-definition panoramic video (for example, 8K) is captured and uploaded, then decoded and mapped on a cloud server into two paths: the first path is the original ultra-high-definition video stream, which is divided into tiles and encoded as the enhancement layer; the second path is downsampled to a lower resolution (for example, 2K) and encoded as the background layer. The two panoramic video streams are downloaded adaptively according to the FOV feedback of the viewing terminal, which improves the use of video distribution bandwidth during panoramic video transmission. A problem remains, however: the tile-based scheme improves the use of downlink bandwidth and saves distribution bandwidth, but it does little for services that place high demands on uplink bandwidth, such as panoramic live broadcast and panoramic video calls. In other words, the existing scheme saves only downlink bandwidth and does not relieve uplink congestion.
Disclosure of Invention
An object of the embodiments of the present invention is to provide a method and an apparatus for processing video data, and a network side device, so as to solve a problem that an existing transmission scheme for panoramic video cannot improve uplink bandwidth congestion.
In order to solve the foregoing problems, an embodiment of the present invention provides a method for processing video data, which is executed by a network device, and includes:
receiving at least one low resolution video stream;
decoding and copying the low-resolution video stream to obtain a first path of low-resolution video stream and a second path of low-resolution video stream;
performing resolution enhancement processing on the second path of low-resolution video stream to obtain a high-resolution video stream;
performing tile division processing on the high-resolution video stream to obtain a plurality of high-resolution video tiles;
taking the first path of low-resolution video stream as a background layer of the panoramic video, taking the high-resolution video tile as an enhancement layer of the panoramic video, and splicing to generate the panoramic video stream;
and sending the panoramic video stream to a panoramic video playing terminal.
Wherein the method further comprises:
determining an FOV area according to FOV information fed back by the panoramic video playing terminal;
the generating of the panoramic video stream by splicing the first path of low-resolution video stream as a background layer of the panoramic video and the high-resolution video tile as an enhancement layer of the panoramic video includes:
and taking the first path of low-resolution video stream as a background layer of the panoramic video, taking the high-resolution video tile in the FOV area as an enhancement layer of the panoramic video, and splicing to generate the FOV-based panoramic video stream.
Wherein performing resolution enhancement processing on the second path of low-resolution video stream to obtain the high-resolution video stream includes:
obtaining a target resolution corresponding to an enhancement layer of a panoramic video stream;
and performing resolution enhancement processing on the second path of low-resolution video stream to obtain a high-resolution video stream with the resolution being the target resolution.
Wherein taking the first path of low-resolution video stream as a background layer of the panoramic video includes:
and applying equirectangular projection (ERP) and encoding to the first path of low-resolution video stream to form the background layer of the panoramic video.
Wherein taking the high-resolution video tile in the FOV area as an enhancement layer of the panoramic video includes:
and applying ERP and encoding to the high-resolution video tile in the FOV area to form the enhancement layer of the panoramic video.
Wherein, in a case where the network side device is an IP multimedia subsystem (IMS) network side device, the panoramic video playing terminal is the calling party of a panoramic video call, and the method includes:
receiving a panoramic video request sent by the calling party, and forwarding the panoramic video request to the called party of the panoramic video call;
the receiving at least one low-resolution video stream includes:
and receiving at least one low-resolution video stream sent by the called party in a case where the called party supports the panoramic video function.
An embodiment of the present invention further provides a device for processing video data, which is applied to a network device, and includes:
the first receiving module is used for receiving at least one path of low-resolution video stream;
the first processing module is used for decoding and copying the low-resolution video stream to obtain a first path of low-resolution video stream and a second path of low-resolution video stream;
the second processing module is used for performing resolution enhancement processing on the second path of low-resolution video stream to obtain a high-resolution video stream;
the third processing module is used for performing tile division processing on the high-resolution video stream to obtain a plurality of high-resolution video tiles;
a generating module, configured to splice together the first low-resolution video stream as a background layer of the panoramic video and the high-resolution video tile as an enhancement layer of the panoramic video to generate a panoramic video stream;
and the sending module is used for sending the panoramic video stream to a panoramic video playing terminal.
An embodiment of the present invention further provides a network side device, including a processor and a transceiver, where the transceiver receives and transmits data under the control of the processor, and the processor is configured to perform the following operations:
receiving at least one path of low-resolution video stream;
decoding and copying the low-resolution video stream to obtain a first path of low-resolution video stream and a second path of low-resolution video stream;
performing resolution enhancement processing on the second path of low-resolution video stream to obtain a high-resolution video stream;
performing tile division processing on the high-resolution video stream to obtain a plurality of high-resolution video tiles;
taking the first path of low-resolution video stream as a background layer of the panoramic video, taking the high-resolution video tile as an enhancement layer of the panoramic video, and splicing to generate the panoramic video stream;
and sending the panoramic video stream to a panoramic video playing terminal.
The embodiment of the present invention further provides a network side device, which includes a memory, a processor, and a program stored in the memory and capable of running on the processor, and when the processor executes the program, the processor implements the video data processing method described above.
Embodiments of the present invention also provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps in the processing method of video data as described above.
The technical scheme of the invention at least has the following beneficial effects:
In the video data processing method, device, and network side equipment of the embodiments of the invention, the panoramic video acquisition terminal uploads a low-resolution video stream to the network side equipment, which applies video resolution enhancement in combination with tile-division transmission to obtain a higher-resolution panoramic video. This saves both the uplink and the downlink network bandwidth and ensures that both are used reasonably.
Drawings
Fig. 1 is a schematic diagram of a tile transmission scheme for panoramic video in the prior art;
Fig. 2 is a flowchart of the steps of a video data processing method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the tile transmission scheme for panoramic video in the video data processing method according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a network side device according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of example one provided by an embodiment of the present invention;
Fig. 6 is a flowchart of the steps of example one provided by an embodiment of the present invention;
Fig. 7 is a schematic diagram of example two provided by an embodiment of the present invention;
Fig. 8 is a flowchart of the steps of example two provided by an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of a video data processing apparatus according to an embodiment of the present invention;
Fig. 10 is a second schematic structural diagram of a network side device according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
As shown in fig. 2 and fig. 3, an embodiment of the present invention provides a method for processing video data, which is executed by a network device, and includes:
Step 201: receiving at least one low-resolution video stream, where the at least one low-resolution video stream is captured and reported by a panoramic video acquisition terminal.
The low-resolution video stream may be a single low-resolution panoramic video stream or multiple low-resolution 2D video streams; this is not limited herein.
Step 202: decoding and copying the low-resolution video stream to obtain a first path of low-resolution video stream and a second path of low-resolution video stream;
Step 203: performing resolution enhancement processing on the second path of low-resolution video stream to obtain a high-resolution video stream;
Step 204: performing tile division processing on the high-resolution video stream to obtain a plurality of high-resolution video tiles;
Step 205: taking the first path of low-resolution video stream as a background layer of the panoramic video, taking the high-resolution video tiles as an enhancement layer of the panoramic video, and splicing them to generate a panoramic video stream;
Step 206: sending the panoramic video stream to a panoramic video playing terminal.
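The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: frames are plain nested lists, `upscale_nearest` is a nearest-neighbour stand-in for the resolution enhancement of step 203 (the patent contemplates learned super-resolution models), and the names `process_frame`, `split_into_tiles`, and the 2 x 4 tile grid are hypothetical.

```python
from copy import deepcopy


def upscale_nearest(frame, factor):
    """Stand-in for step 203: nearest-neighbour upscaling by an integer factor."""
    out = []
    for row in frame:
        wide = [px for px in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def split_into_tiles(frame, rows, cols):
    """Step 204: cut a frame into a rows x cols grid of equally sized tiles."""
    tile_h = len(frame) // rows
    tile_w = len(frame[0]) // cols
    tiles = {}
    for r in range(rows):
        for c in range(cols):
            tiles[(r, c)] = [
                line[c * tile_w:(c + 1) * tile_w]
                for line in frame[r * tile_h:(r + 1) * tile_h]
            ]
    return tiles


def process_frame(decoded_frame, factor=4, rows=2, cols=4):
    """Steps 202 to 205 for one decoded frame (encoding and sending omitted)."""
    first_path = deepcopy(decoded_frame)    # step 202: copy for the background layer
    second_path = deepcopy(decoded_frame)   # step 202: copy for the enhancement layer
    high_res = upscale_nearest(second_path, factor)   # step 203
    tiles = split_into_tiles(high_res, rows, cols)    # step 204
    # Step 205: both layers are kept together as one panoramic frame description.
    return {"background": first_path, "enhancement_tiles": tiles}


# A tiny 8 x 4 test frame whose pixel value encodes its position.
frame = [[x + 10 * y for x in range(8)] for y in range(4)]
result = process_frame(frame)
```

Here the background layer keeps the uploaded resolution, while the enhancement layer is a 32 x 16 frame cut into eight 8 x 8 tiles.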
It should be noted that the network side device may be a cloud server on the general Internet or a cloud server at an IMS network center. In other words, the video data processing method can be used both for panoramic live broadcast over the general Internet and for panoramic video calls over an IMS network; in each case a cloud server must be deployed at the respective network center to provide panoramic video super-resolution enhancement and tile division.
On the one hand, to relieve uplink bandwidth congestion in panoramic live broadcast, the embodiment of the invention lowers the capture resolution required of the panoramic video acquisition terminal (which may be an ordinary smart terminal or a professional panoramic capture device), so that the terminal uploads a lower-resolution video stream. The network side then performs resolution enhancement and distributes the panoramic video stream using the tile-division transmission scheme, forming end-to-end panoramic video transmission.
On the other hand, where the uplink bandwidth of a panoramic video call cannot satisfy ultra-high-resolution data transmission, the embodiment of the invention performs super-resolution enhancement, on a cloud server at the network center, on the low-resolution video stream sent by one party to the call, and sends the result to the other party through the tile transmission scheme. This preserves the quality and experience of the video call while using the uplink and downlink bandwidth of the call network reasonably.
As an alternative embodiment, the method further comprises:
determining an FOV area according to FOV information fed back by the panoramic video playing terminal;
accordingly, step 205 includes:
and taking the first path of low-resolution video stream as a background layer of the panoramic video, taking the high-resolution video tile in the FOV area as an enhancement layer of the panoramic video, and splicing to generate the FOV-based panoramic video stream.
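Selecting the tiles that fall in the FOV area can be illustrated as follows. The source does not specify the format of the FOV information, so this sketch assumes it arrives as a viewing direction (yaw, pitch) plus horizontal and vertical opening angles, and approximates the FOV as a rectangle in yaw/pitch on the ERP frame. The function name and conventions are hypothetical.

```python
import math


def tiles_in_fov(yaw_deg, pitch_deg, hfov_deg, vfov_deg, rows, cols):
    """Return sorted (row, col) indices of ERP tiles that the FOV touches.

    Assumed conventions: yaw in [-180, 180) with yaw = -180 at the left edge
    of the ERP frame; pitch in [-90, 90] with pitch = +90 at the top edge.
    A tile whose edge merely coincides with the FOV border is included, so
    the selection errs on the side of sending one tile too many.
    """
    tile_w = 360.0 / cols   # tile width in degrees of yaw
    tile_h = 180.0 / rows   # tile height in degrees of pitch

    # Vertical extent, clamped at the poles.
    top = min(90.0, pitch_deg + vfov_deg / 2)
    bottom = max(-90.0, pitch_deg - vfov_deg / 2)
    row_first = int((90.0 - top) // tile_h)
    row_last = min(rows - 1, int((90.0 - bottom) // tile_h))

    # Horizontal extent, wrapping around the +/-180 degree seam.
    if hfov_deg >= 360.0:
        col_set = set(range(cols))
    else:
        x_left = yaw_deg - hfov_deg / 2 + 180.0
        x_right = yaw_deg + hfov_deg / 2 + 180.0
        col_first = math.floor(x_left / tile_w)
        col_last = math.floor(x_right / tile_w)
        col_set = {c % cols for c in range(col_first, col_last + 1)}

    return sorted((r, c) for r in range(row_first, row_last + 1) for c in col_set)


# Looking straight ahead on a 4 x 8 grid selects the four central tiles.
print(tiles_in_fov(0, 0, 80, 60, rows=4, cols=8))
# Looking near the seam selects columns on both sides of it.
print(tiles_in_fov(170, 0, 80, 60, rows=4, cols=8))
```

Only the tiles returned here would be taken from the enhancement layer; everything outside them is covered by the low-resolution background layer.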
The embodiment of the invention deploys panoramic video super-resolution enhancement at the network center and combines it with tile transmission to improve uplink bandwidth utilization during tile-based panoramic video transmission.
Wherein performing resolution enhancement processing on the second path of low-resolution video stream to obtain the high-resolution video stream includes:
obtaining a target resolution corresponding to an enhancement layer of a panoramic video stream;
and performing resolution enhancement processing on the second path of low-resolution video stream to obtain a high-resolution video stream with the resolution being the target resolution.
For example, the network side device determines the target resolution of the panoramic video super-resolution enhancement from the resolution of the enhancement layer in the tile transmission scheme, such as 8K.
For another example, a multi-frame video super-resolution model is constructed, for example by combining a generative adversarial network (GAN) and a deeply-recursive convolutional network (DRCN); the nonlinear mapping between low-resolution and high-resolution video frames is learned with a deep learning method that combines external and internal samples, and the target resolution is further determined from it.
It should be noted that, if the second path of low-resolution video stream consists of multiple low-resolution 2D video streams, super-resolution enhancement may be performed on GPUs (graphics processing units) in parallel.
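As a small worked example of deriving the enhancement factor from the target resolution, the sketch below assumes that "2K" and "8K" denote 2:1 ERP frames of 1920 x 960 and 7680 x 3840 pixels. These dimensions are illustrative assumptions, not values given in the source, and `upscale_factor` is a hypothetical helper.

```python
def upscale_factor(source, target):
    """Return the integer per-axis scale factor from source to target (w, h)."""
    src_w, src_h = source
    dst_w, dst_h = target
    if dst_w % src_w or dst_h % src_h:
        raise ValueError("target is not an integer multiple of the source")
    fx, fy = dst_w // src_w, dst_h // src_h
    if fx != fy:
        raise ValueError("aspect ratio changes between source and target")
    return fx


LOW_RES = (1920, 960)    # assumed "2K" ERP frame uploaded by the terminal
TARGET = (7680, 3840)    # assumed "8K" ERP frame of the enhancement layer

factor = upscale_factor(LOW_RES, TARGET)
print(factor)  # 4
```

Under these assumptions the super-resolution model must enlarge each axis by a factor of 4.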
Wherein taking the first path of low-resolution video stream as a background layer of the panoramic video includes:
and applying equirectangular projection (ERP) and encoding to the first path of low-resolution video stream to form the background layer of the panoramic video. That is, the first path of low-resolution video stream is directly stitched into a panoramic video in ERP format and encoded with H.265 as the background layer.
Wherein taking the high-resolution video tile in the FOV area as an enhancement layer of the panoramic video includes:
and applying ERP and encoding to the high-resolution video tile in the FOV area to form the enhancement layer of the panoramic video.
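For readers unfamiliar with ERP, the mapping from a viewing direction to a pixel position is linear in both angles. The sketch below uses one common convention (yaw = -180 degrees at the left edge, pitch = +90 degrees at the top edge); the convention, the function name, and the frame size are assumptions made for illustration.

```python
def erp_pixel(yaw_deg, pitch_deg, width, height):
    """Map a direction (yaw, pitch) in degrees to ERP pixel coordinates (u, v).

    u grows to the right with yaw, v grows downward as pitch decreases.
    """
    u = (yaw_deg / 360.0 + 0.5) * width
    v = (0.5 - pitch_deg / 180.0) * height
    return u, v


# The forward direction lands in the centre of an assumed 7680 x 3840 frame.
print(erp_pixel(0, 0, 7680, 3840))      # (3840.0, 1920.0)
# 90 degrees to the right and 45 degrees down.
print(erp_pixel(90, -45, 7680, 3840))   # (5760.0, 2880.0)
```

Because the mapping is linear, a regular tile grid on the ERP frame corresponds to equal steps in yaw and pitch, which is what makes FOV-based tile selection straightforward.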
As shown in fig. 2, one or more low-resolution video streams (for example, 2K resolution) encoded and uploaded by the acquisition end are obtained; H.264/H.265 encoding may be used. The decoded video stream in YUV format is copied into two paths. In the first path, the low-resolution video stream is directly stitched into a panoramic video in ERP format and encoded with H.265 as the background layer. In the second path, super-resolution enhancement is applied to the low-resolution video stream, and the result is stitched into an 8K high-resolution ERP panoramic video that serves as the enhancement layer for tile transmission; this high-resolution ERP panoramic video is then divided into tiles and encoded with H.265. According to the FOV information fed back by the panoramic video playing terminal, the first path, as the background stream, is combined with the high-resolution tiles of the second path that lie in the FOV area; the result is rendered on a GPU to generate the FOV-based panoramic video, which is then distributed over the network.
It should be noted that the network side device provided by the embodiment of the present invention is deployed on a cloud server of the general Internet and/or of an IMS network center, and implements resolution enhancement and tile division of panoramic video. As shown in fig. 4, the network side device includes a communication unit, a codec conversion unit, a processing unit, and a dividing unit. The communication unit acquires and distributes video streams and supports network transport protocols such as RTP (Real-time Transport Protocol), DASH (Dynamic Adaptive Streaming over HTTP), and HLS (HTTP Live Streaming). The codec conversion unit decodes the uploaded video streams and encodes the FOV-adaptive panoramic video stream, and needs to support H.264/H.265. The processing unit contains the video super-resolution enhancement model and performs panoramic video stitching and ERP projection. The dividing unit defines the tile transmission parameters, such as the number of tiles, performs the grid division, and can cooperate with the processing unit.
As an optional embodiment, in a case that the network side device is an IP multimedia subsystem network side device, the panoramic video playing terminal is a calling party of a panoramic video call, and the method includes:
receiving a panoramic video request sent by the calling party, and forwarding the panoramic video request to a called party of the panoramic video call;
the receiving at least one low resolution video stream includes:
and receiving at least one path of low-resolution video sent by the called party under the condition that the called party supports the panoramic video function.
Further, the method further comprises:
receiving a 2D video stream sent by the calling party;
sending the 2D video stream to the called party.
It should be noted that, during the panoramic video call, the calling party requests the panoramic video of the called party, and normally, the called party can only play the 2D video of the calling party.
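The resulting media arrangement can be summarised in a short sketch. The class and field names below are hypothetical labels, not signalling defined by the source, and the source does not say what happens when the called party lacks the panoramic video function, so that branch only records that the panoramic path is not set up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallPlan:
    panoramic_enabled: bool
    caller_receives: Optional[str]   # what the calling party plays
    called_receives: Optional[str]   # what the called party plays


def plan_call(called_supports_panoramic: bool) -> CallPlan:
    """Decide the media flows once the called party has answered the request."""
    if called_supports_panoramic:
        # The called party uploads low-resolution video; the network side
        # enhances it and sends an FOV-based panoramic stream to the caller.
        # The caller's ordinary 2D video is relayed to the called party.
        return CallPlan(True, "fov_panoramic_stream", "2d_video")
    # Fallback behaviour is not described in the source; left unspecified here.
    return CallPlan(False, None, None)


print(plan_call(True))
```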
In order to more clearly describe the video data processing method provided by the embodiment of the present invention, the following description is made with reference to two examples.
Example one: a processing method for panoramic live broadcast in a general Internet environment, as shown in fig. 5 and fig. 6:
Step 1: a 360-degree panoramic camera or multiple 2D cameras capture images of the live broadcast site; encoding may use an encoder such as H.264/H.265.
Step 2: the single panoramic video stream or the multiple 2D video streams are uploaded, via WiFi, 5G, a dedicated line, or the like, to the network location of the cloud server (the Internet device).
Step 3: the low-resolution single panoramic video stream or multiple 2D video streams are processed in the Internet device. The target resolution fed back by the dividing unit is obtained, and a video super-resolution method based on generative adversarial learning, such as NVIDIA's PG-GAN, is used to obtain a high-resolution video. The original low-resolution video and the high-resolution video are stitched to form the background layer and the enhancement layer of the panoramic video, respectively. This step may be implemented on general-purpose GPU equipment.
Step 4: tile division is performed on the high-resolution panoramic video. According to the FOV parameters fed back by the panoramic video playing terminal, the original low-resolution panoramic video is used as the background stream and combined with the high-resolution tiles in the FOV area; the result is rendered on a GPU to generate the FOV-based panoramic video, which is distributed over the network through the communication unit.
Step 5: the tile-based enhanced FOV panoramic video stream is transmitted, via WiFi, 5G, a dedicated line, or the like, to the panoramic video playing terminal for playback; the encoding used is H.265.
In this example, the panoramic video acquisition terminal captures, encodes, compresses, and uploads one or more lower-resolution video streams. A panoramic video resolution enhancement module deployed at the network center produces a high-resolution panoramic video, forms a panoramic video stream according to the FOV parameters fed back by the playing terminal, and distributes it using tile-based network transmission. The example draws on the network's own media control functions to ensure coordination and packet-loss resilience, and distributes the stream to the playing terminal for viewing. In panoramic video transmission scenarios over the general Internet, such as panoramic live broadcast, the example can effectively relieve the congestion that the upload network may encounter in the transmission link.
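The saving can be made concrete with a rough pixel-count comparison. The numbers below are illustrative assumptions (2K taken as 1920 x 960, 8K as 7680 x 3840, a 4 x 8 tile grid, four tiles inside the FOV), and raw pixel counts are only a proxy for bitrate, which also depends on the encoder and the content.

```python
def pixel_count(resolution):
    """Number of pixels in a (width, height) frame."""
    width, height = resolution
    return width * height


def downlink_pixels(background, enhancement, tiles_total, fov_tiles):
    """Pixels per frame sent downlink: full background plus the FOV tiles only."""
    return pixel_count(background) + pixel_count(enhancement) * fov_tiles // tiles_total


LOW_RES = (1920, 960)     # assumed uploaded / background resolution
HIGH_RES = (7680, 3840)   # assumed enhancement-layer resolution

# Uplink: uploading 2K instead of 8K.
uplink_ratio = pixel_count(HIGH_RES) // pixel_count(LOW_RES)

# Downlink: background plus 4 of 32 enhancement tiles, relative to a full 8K frame.
sent = downlink_pixels(LOW_RES, HIGH_RES, tiles_total=32, fov_tiles=4)
downlink_fraction = sent / pixel_count(HIGH_RES)

print(uplink_ratio)       # 16
print(sent)               # 5529600
print(downlink_fraction)  # 0.1875
```

Under these assumptions the terminal uploads one sixteenth of the pixels of an 8K capture, and the playing terminal receives under a fifth of the pixels of a full 8K frame.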
Example two: a data processing method for a panoramic video call in an IMS network environment, as shown in fig. 7 and fig. 8:
Step 6: Two parties, A and B, are in a call, where B is the calling party and A is the called party. B initiates a panoramic video call request to A (an SDP offer) that requests the panoramic video media function and the viewport processing function. The IMS network cloud server forwards the SDP offer to the called user A to request the panoramic video function. The called user A generates a response (an SDP answer) indicating whether the panoramic video function is supported, and the cloud server responds to user B, confirming the panoramic video media function and accepting the viewport processing function.
Step 7: A captures low-resolution panoramic images with a 360-degree panoramic camera, and B captures 2D images with a conventional camera; after media negotiation, each is encoded with an encoder such as H.264 or H.265. Under IMS bearer and session control, A's panoramic video stream and B's 2D video stream are uploaded as RTP media streams to the network location of the IMS center's cloud server.
Step 8: In the network-side device shown in fig. 4, the low-resolution panoramic video stream sent by A is decoded, and the target resolution of the tile transmission enhancement layer is negotiated with the division unit. The high-resolution video stream and the original low-resolution video stream are spliced to form the enhancement layer and the background layer of the panoramic video, respectively. This step may be implemented on general-purpose GPU equipment. At this point, the data streams of B and A must remain clock-synchronized.
Step 9: The resulting high-resolution video and the original low-resolution panoramic video sent by A are processed: the high-resolution video is divided into tiles, the original low-resolution panoramic video serves as the background stream, and, according to the FOV parameters fed back by B, the high-resolution video in the FOV area is combined with it and rendered on a GPU to generate the FOV-based panoramic video, which is distributed through the communication unit. User B can feed back FOV information to the cloud server through RTCP, and the cloud server transmits the high-resolution FOV video as an RTP media stream. User B receives the tile-based enhanced FOV panoramic video stream and passes it to a playing terminal, which decodes it with H.265 for playback; user A receives user B's 2D video and decodes it with H.264/H.265 for playback.
Step 10: When either A or B leaves the call, the call ends.
In this example, operators mainly follow the traffic model of data services when deploying radio resources in practice; that is, uplink traffic is far smaller than downlink traffic. In other words, uplink and downlink resources are asymmetric. In the audio-visual communication model, however, the bit rate and resolution negotiated through SDP offer/answer are symmetric. Panoramic video transmission under such an asymmetric traffic model is prone to network resource shortages, especially in the uplink, where a video call may be degraded or may even fall back to an audio call. The embodiment of the invention therefore additionally deploys panoramic video resolution enhancement capability at the service capability layer of the IMS network center, reducing the demand on uplink network resources. The panoramic video acquisition party can send a lower-resolution media stream, and after super-resolution processing at the IMS network center the stream can be transmitted onward at a higher resolution, safeguarding the call experience of the called party of the panoramic video.
In this example, during media negotiation, user B sends the SDP offer to request the panoramic video media function and the viewport processing function, and the IMS network cloud server forwards the SDP offer to the called user A to request the panoramic video function or the 2D video function. In the SDP answer, the called user A indicates to the cloud server whether it supports the panoramic video function, and the cloud server responds to the remote user B, confirming the panoramic video media function and accepting the viewport processing function. The calling user B can feed back FOV information to the cloud server through RTCP; the cloud server processes the FOV panoramic video stream and transmits it as an RTP media stream, so that user B receives the high-resolution FOV video.
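The negotiation described above can be modelled, purely as a sketch, by treating the offer and answer as sets of requested and supported capabilities. The capability names `"panoramic_video"` and `"viewport"`, the `Offer` and `Answer` classes, and both functions are hypothetical and stand in for the actual SDP attributes, which the embodiment does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    caller: str
    requested: frozenset  # capabilities the caller asks for


@dataclass(frozen=True)
class Answer:
    callee: str
    supported: frozenset  # capabilities the callee accepts


def forward_offer(offer):
    """Cloud server forwards only the panoramic video request to the callee."""
    return Offer(offer.caller, offer.requested & {"panoramic_video"})


def confirm_to_caller(offer, answer):
    """Build the server's confirmation to the caller.

    Panoramic video is confirmed only if the callee supports it; viewport
    processing is accepted by the server itself, since FOV handling happens
    on the network side.
    """
    confirmed = set(offer.requested & answer.supported)
    if "panoramic_video" in confirmed and "viewport" in offer.requested:
        confirmed.add("viewport")
    return frozenset(confirmed)
```

In this model, an offer from B requesting both capabilities, answered by an A that supports panoramic video, yields a confirmation containing both; if A does not support panoramic video, nothing is confirmed and the call would proceed without it.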
In summary, the embodiment of the invention implements super-resolution enhancement at the network center within end-to-end tile-based network transmission of 360-degree panoramic video. This lowers the requirements that current panoramic video places on acquisition equipment, saves uplink bandwidth in the end-to-end transmission network, improves panoramic video upload efficiency, and safeguards panoramic video quality and viewing experience. Taking the uploaded 2K lower-resolution panoramic video mentioned in the embodiment as an example: a target resolution of 8K is negotiated with the tile transmission and distribution scheme and super-resolution processing is performed, which preserves the viewing experience while reducing the uploaded panoramic video bit rate from 120 Mbps (8K) to 4 Mbps (2K). The requirement on network uplink bandwidth falls accordingly; combined with the tile transmission and distribution scheme, 60% to 80% of network downlink bandwidth is saved, significantly improving bandwidth utilization in both the uplink and downlink of end-to-end panoramic video transmission.
Compared with the prior art, the technical scheme is implemented in the form of system units, achieves bidirectional negotiation between video super-resolution processing and tile-division transmission, optimizes the tile transmission scheme, and improves the flexibility and generality of the functions.
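The bandwidth figures quoted above can be checked with a short calculation. The uplink rates (120 Mbps for 8K, 4 Mbps for 2K) and the 60% to 80% downlink saving come from the text; the assumption that an untiled 8K downlink would also run at 120 Mbps is made here only for illustration, and the function names are hypothetical.

```python
def reduction_percent(before_mbps, after_mbps):
    """Percentage by which the bit rate falls when going from before to after."""
    return 100.0 * (before_mbps - after_mbps) / before_mbps


def downlink_range(full_mbps, low_saving=0.60, high_saving=0.80):
    """Downlink bit-rate range (min, max) left after the quoted tile savings.

    full_mbps is the assumed bit rate of sending the whole 8K panorama.
    """
    return full_mbps * (1.0 - high_saving), full_mbps * (1.0 - low_saving)


uplink_saving = reduction_percent(120.0, 4.0)   # roughly 96.7 percent
downlink_min, downlink_max = downlink_range(120.0)  # roughly 24 to 48 Mbps
```

Under these assumptions the uplink requirement falls by about 96.7%, and the tile-based downlink would need roughly 24 to 48 Mbps instead of 120 Mbps.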
As shown in fig. 9, an embodiment of the present invention further provides a video data processing apparatus, applied to a network-side device, including:
a first receiving module 901, configured to receive at least one low-resolution video stream;
a first processing module 902, configured to decode and copy the low-resolution video stream to obtain a first low-resolution video stream and a second low-resolution video stream;
a second processing module 903, configured to perform resolution enhancement processing on the second low-resolution video stream to obtain a high-resolution video stream;
a third processing module 904, configured to perform tile division processing on the high-resolution video stream to obtain a plurality of high-resolution video tiles;
a generating module 905, configured to use the first low-resolution video stream as a background layer of a panoramic video and the high-resolution video tiles as an enhancement layer of the panoramic video, and to splice them to generate a panoramic video stream;
a sending module 906, configured to send the panoramic video stream to a panoramic video playing terminal.
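The data flow through modules 902 to 905 can be sketched as below. This is a minimal illustration under stated assumptions, not the embodiment's implementation: a frame is modelled as a list of pixel rows, decoding is replaced by a plain copy, and nearest-neighbour upscaling stands in for the resolution enhancement (super-resolution) processing. All function names are hypothetical.

```python
import copy


def decode_and_copy(frame):
    """Stand-in for module 902: return two independent copies of the frame."""
    return copy.deepcopy(frame), copy.deepcopy(frame)


def enhance(frame, scale):
    """Stand-in for module 903: nearest-neighbour upscaling, NOT real
    super-resolution; it only reproduces the change in resolution."""
    return [[px for px in row for _ in range(scale)]
            for row in frame for _ in range(scale)]


def split_tiles(frame, rows, cols):
    """Module 904: divide a frame into rows x cols tiles keyed by (row, col)."""
    tile_h, tile_w = len(frame) // rows, len(frame[0]) // cols
    return {
        (r, c): [line[c * tile_w:(c + 1) * tile_w]
                 for line in frame[r * tile_h:(r + 1) * tile_h]]
        for r in range(rows) for c in range(cols)
    }


def compose(background, tiles):
    """Module 905: pair the background layer with the enhancement-layer tiles."""
    return {"background": background, "enhancement": tiles}


def process(frame, scale=2, rows=2, cols=2):
    first, second = decode_and_copy(frame)
    high = enhance(second, scale)
    return compose(first, split_tiles(high, rows, cols))
```

Copying the decoded stream before enhancement keeps the original low-resolution frame untouched for use as the background layer, which is why the first processing module produces two streams rather than one.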
As an alternative embodiment, the apparatus further comprises:
an area determining module, configured to determine an FOV area according to the FOV information fed back by the panoramic video playing terminal;
the generating module comprises:
a generating submodule, configured to use the first low-resolution video stream as the background layer of the panoramic video and the high-resolution video tiles in the FOV area as the enhancement layer of the panoramic video, and to splice them to generate the FOV-based panoramic video stream.
As an alternative embodiment, the second processing module comprises:
a first unit, configured to obtain a target resolution corresponding to the enhancement layer of the panoramic video stream;
a second unit, configured to perform resolution enhancement processing on the second low-resolution video stream to obtain a high-resolution video stream whose resolution is the target resolution.
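As a small sketch of how a negotiated target resolution might be turned into processing parameters, the helpers below derive an integer upscaling factor and a tile size. The pixel dimensions used in the usage note (1920x960 for 2K and 7680x3840 for 8K, both 2:1 ERP frames) and the function names are assumptions; the embodiment only refers to 2K and 8K.

```python
def scale_factor(source, target):
    """Return the integer upscaling factor from source to target (width, height)."""
    src_w, src_h = source
    dst_w, dst_h = target
    if dst_w % src_w or dst_h % src_h or dst_w // src_w != dst_h // src_h:
        raise ValueError("target must be a uniform integer multiple of source")
    return dst_w // src_w


def tile_size(target, rows, cols):
    """Return (tile_width, tile_height) if the grid divides the target evenly."""
    dst_w, dst_h = target
    if dst_w % cols or dst_h % rows:
        raise ValueError("tile grid must divide the target resolution evenly")
    return dst_w // cols, dst_h // rows
```

With the assumed dimensions, `scale_factor((1920, 960), (7680, 3840))` gives 4, and a grid of 4 rows by 8 columns gives tiles of 960 by 960 pixels.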
As an alternative embodiment, the third processing module comprises:
a third unit, configured to perform equirectangular projection (ERP) and encoding on the first low-resolution video stream to form the background layer of the panoramic video.
As an alternative embodiment, the generating submodule includes:
a fourth unit, configured to perform ERP and encoding on the high-resolution video tiles in the FOV area to form the enhancement layer of the panoramic video.
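For reference, the equirectangular projection (ERP) used for both layers maps a viewing direction linearly to pixel coordinates. The sketch below uses one common convention, assumed here and not stated in the embodiment: yaw in [-π, π] with 0 at the horizontal centre of the frame, and pitch in [-π/2, π/2] with positive values toward the top.

```python
import math


def sphere_to_erp(yaw, pitch, width, height):
    """Map a direction (radians) to ERP pixel coordinates (u, v)."""
    u = (yaw / (2.0 * math.pi) + 0.5) * width
    v = (0.5 - pitch / math.pi) * height
    return u, v


def erp_to_sphere(u, v, width, height):
    """Inverse mapping: ERP pixel coordinates back to (yaw, pitch) in radians."""
    yaw = (u / width - 0.5) * 2.0 * math.pi
    pitch = (0.5 - v / height) * math.pi
    return yaw, pitch
```

Because the mapping is linear in both angles, a rectangular tile in the ERP frame corresponds to a fixed range of yaw and pitch, which is what allows tiles to be matched against an FOV area.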
As an optional embodiment, where the network-side device is an IP multimedia subsystem (IMS) network-side device and the panoramic video playing terminal is the calling party of a panoramic video call, the apparatus comprises:
a request receiving module, configured to receive a panoramic video request sent by the calling party and forward the panoramic video request to the called party of the panoramic video call;
the first receiving module comprises:
a first receiving submodule, configured to receive at least one low-resolution video stream sent by the called party when the called party supports the panoramic video function.
The embodiment of the invention implements super-resolution enhancement at the network center within end-to-end tile-based network transmission of 360-degree panoramic video. This lowers the requirements that current panoramic video places on acquisition equipment, saves uplink bandwidth in the end-to-end transmission network, improves panoramic video upload efficiency, and safeguards panoramic video quality and viewing experience. It also achieves bidirectional negotiation between video super-resolution processing and tile-division transmission, optimizes the tile transmission scheme, and improves the flexibility and generality of the functions.
It should be noted that the video data processing apparatus provided in the embodiment of the present invention is an apparatus capable of executing the above video data processing method, and all embodiments of the above video data processing method are applicable to the apparatus and can achieve the same or similar beneficial effects.
As shown in fig. 10, an embodiment of the present invention further provides a network-side device, including a processor 100 and a transceiver 110, where the transceiver 110 receives and transmits data under the control of the processor 100, and the processor 100 is configured to perform the following operations:
receiving at least one low-resolution video stream;
decoding and copying the low-resolution video stream to obtain a first low-resolution video stream and a second low-resolution video stream;
performing resolution enhancement processing on the second low-resolution video stream to obtain a high-resolution video stream;
performing tile division processing on the high-resolution video stream to obtain a plurality of high-resolution video tiles;
using the first low-resolution video stream as a background layer of the panoramic video and the high-resolution video tiles as an enhancement layer of the panoramic video, and splicing them to generate the panoramic video stream;
sending the panoramic video stream to a panoramic video playing terminal.
As an alternative embodiment, the processor is further configured to perform the following operations:
determining an FOV area according to the FOV information fed back by the panoramic video playing terminal;
using the first low-resolution video stream as the background layer of the panoramic video and the high-resolution video tiles in the FOV area as the enhancement layer of the panoramic video, and splicing them to generate the FOV-based panoramic video stream.
As an alternative embodiment, the processor is further configured to perform the following operations:
obtaining a target resolution corresponding to the enhancement layer of the panoramic video stream;
performing resolution enhancement processing on the second low-resolution video stream to obtain a high-resolution video stream whose resolution is the target resolution.
As an alternative embodiment, the processor is further configured to perform the following operations:
performing equirectangular projection (ERP) and encoding on the first low-resolution video stream to form the background layer of the panoramic video.
As an alternative embodiment, the processor is further configured to perform the following operations:
performing ERP and encoding on the high-resolution video tiles in the FOV area to form the enhancement layer of the panoramic video.
As an optional embodiment, where the network-side device is an IP multimedia subsystem (IMS) network-side device and the panoramic video playing terminal is the calling party of a panoramic video call, the processor is further configured to perform the following operations:
receiving a panoramic video request sent by the calling party, and forwarding the panoramic video request to the called party of the panoramic video call;
receiving at least one low-resolution video stream sent by the called party when the called party supports the panoramic video function.
The embodiment of the invention implements super-resolution enhancement at the network center within end-to-end tile-based network transmission of 360-degree panoramic video. This lowers the requirements that current panoramic video places on acquisition equipment, saves uplink bandwidth in the end-to-end transmission network, improves panoramic video upload efficiency, and safeguards panoramic video quality and viewing experience. It also achieves bidirectional negotiation between video super-resolution processing and tile-division transmission, optimizes the tile transmission scheme, and improves the flexibility and generality of the functions.
It should be noted that, the network side device provided in the embodiments of the present invention is a network side device capable of executing the video data processing method, and all embodiments of the video data processing method are applicable to the network side device, and can achieve the same or similar beneficial effects.
An embodiment of the present invention further provides a network-side device, which includes a memory, a processor, and a computer program that is stored in the memory and can be run on the processor, where the processor implements each process in the above-described embodiment of the video data processing method when executing the program, and can achieve the same technical effect, and details are not repeated here to avoid repetition.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements each process in the above-described video data processing method embodiment, and can achieve the same technical effect, and details are not repeated here to avoid repetition. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block or blocks.
These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. A video data processing method, executed by a network-side device, characterized by comprising:
receiving at least one low-resolution video stream;
decoding and copying the low-resolution video stream to obtain a first low-resolution video stream and a second low-resolution video stream;
performing resolution enhancement processing on the second low-resolution video stream to obtain a high-resolution video stream;
performing tile division processing on the high-resolution video stream to obtain a plurality of high-resolution video tiles;
using the first low-resolution video stream as a background layer of the panoramic video and the high-resolution video tiles as an enhancement layer of the panoramic video, and splicing them to generate the panoramic video stream;
sending the panoramic video stream to a panoramic video playing terminal.
2. The method of claim 1, further comprising:
determining an FOV area according to FOV information fed back by the panoramic video playing terminal;
wherein using the first low-resolution video stream as a background layer of the panoramic video and the high-resolution video tiles as an enhancement layer of the panoramic video, and splicing them to generate the panoramic video stream, comprises:
using the first low-resolution video stream as the background layer of the panoramic video and the high-resolution video tiles in the FOV area as the enhancement layer of the panoramic video, and splicing them to generate the FOV-based panoramic video stream.
3. The method of claim 1, wherein performing resolution enhancement processing on the second low-resolution video stream to obtain a high-resolution video stream comprises:
obtaining a target resolution corresponding to the enhancement layer of the panoramic video stream;
performing resolution enhancement processing on the second low-resolution video stream to obtain a high-resolution video stream whose resolution is the target resolution.
4. The method of claim 1, wherein using the first low-resolution video stream as a background layer of the panoramic video comprises:
performing equirectangular projection (ERP) and encoding on the first low-resolution video stream to form the background layer of the panoramic video.
5. The method of claim 2, wherein using the high-resolution video tiles in the FOV area as the enhancement layer of the panoramic video comprises:
performing ERP and encoding on the high-resolution video tiles in the FOV area to form the enhancement layer of the panoramic video.
6. The method of claim 1, wherein, where the network-side device is an IP multimedia subsystem (IMS) network-side device and the panoramic video playing terminal is the calling party of a panoramic video call, the method comprises:
receiving a panoramic video request sent by the calling party, and forwarding the panoramic video request to the called party of the panoramic video call;
wherein receiving at least one low-resolution video stream comprises:
receiving at least one low-resolution video stream sent by the called party when the called party supports the panoramic video function.
7. A video data processing apparatus applied to a network-side device, characterized by comprising:
a first receiving module, configured to receive at least one low-resolution video stream;
a first processing module, configured to decode and copy the low-resolution video stream to obtain a first low-resolution video stream and a second low-resolution video stream;
a second processing module, configured to perform resolution enhancement processing on the second low-resolution video stream to obtain a high-resolution video stream;
a third processing module, configured to perform tile division processing on the high-resolution video stream to obtain a plurality of high-resolution video tiles;
a generating module, configured to use the first low-resolution video stream as a background layer of the panoramic video and the high-resolution video tiles as an enhancement layer of the panoramic video, and to splice them to generate the panoramic video stream;
a sending module, configured to send the panoramic video stream to a panoramic video playing terminal.
8. A network-side device comprising a processor and a transceiver, the transceiver receiving and transmitting data under the control of the processor, wherein the processor is configured to perform the following operations:
receiving at least one low-resolution video stream;
decoding and copying the low-resolution video stream to obtain a first low-resolution video stream and a second low-resolution video stream;
performing resolution enhancement processing on the second low-resolution video stream to obtain a high-resolution video stream;
performing tile division processing on the high-resolution video stream to obtain a plurality of high-resolution video tiles;
using the first low-resolution video stream as a background layer of the panoramic video and the high-resolution video tiles as an enhancement layer of the panoramic video, and splicing them to generate the panoramic video stream;
sending the panoramic video stream to a panoramic video playing terminal.
9. A network side device comprises a memory, a processor and a program which is stored on the memory and can run on the processor; characterized in that the processor implements the method for processing video data according to any one of claims 1 to 7 when executing the program.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of a method for processing video data according to any one of claims 1 to 7.
CN202011312797.0A 2020-11-20 2020-11-20 Video data processing method and device and network side equipment Pending CN114598853A (en)


Publications (1)

Publication Number Publication Date
CN114598853A true CN114598853A (en) 2022-06-07

Family

ID=81802343

Country Status (1)

Country Link
CN (1) CN114598853A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107087212A (en) * 2017-05-09 2017-08-22 杭州码全信息科技有限公司 The interactive panoramic video transcoding and player method and system encoded based on spatial scalable
WO2018171487A1 (en) * 2017-03-23 2018-09-27 华为技术有限公司 Panoramic video playback method and client terminal
CN108965847A (en) * 2017-05-27 2018-12-07 华为技术有限公司 A kind of processing method and processing device of panoramic video data
WO2020228482A1 (en) * 2019-05-13 2020-11-19 华为技术有限公司 Video processing method, apparatus and system

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115208862A (en) * 2022-07-12 2022-10-18 厦门立林科技有限公司 Cloud talkback video transmission control method and cloud talkback video device
CN116634194A (en) * 2023-05-10 2023-08-22 北京国际云转播科技有限公司 Video live broadcast method, video live broadcast device, storage medium and electronic equipment
CN116634194B (en) * 2023-05-10 2024-05-24 北京国际云转播科技有限公司 Video live broadcast method, video live broadcast device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination