CN109451293B - Self-adaptive stereoscopic video transmission system and method

Info

Publication number
CN109451293B
CN109451293B CN201810903751.2A CN201810903751A CN109451293B CN 109451293 B CN109451293 B CN 109451293B CN 201810903751 A CN201810903751 A CN 201810903751A CN 109451293 B CN109451293 B CN 109451293B
Authority
CN
China
Prior art keywords
video
user
base station
server
adaptive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810903751.2A
Other languages
Chinese (zh)
Other versions
CN109451293A (en)
Inventor
刘奕彤
田旺
杨鸿文
吴建伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201810903751.2A
Publication of CN109451293A
Application granted
Publication of CN109451293B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/114 Adapting the group of pictures [GOP] structure, e.g. number of B-frames between two anchor frames
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/2343 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234309 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4 or from Quicktime to Realvideo
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456 Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Abstract

The invention provides an adaptive stereoscopic video transmission method that makes use of the computing power opened up by the base station. It describes a transmission scheme for stereoscopic video from two angles: video coding and decoding, and streaming media transmission. The streaming media server encodes, slices, and deploys the original video source and supports multiple stereoscopic video coding schemes. The user requests a video slice stream with specified requirements according to device capability and changing network conditions. The base station aggregates and analyzes the service requests of its access users, requests the existing stereoscopic video slices from the streaming media server, and then generates and pushes the video slices that match each user's specified requirements. The computing power of the base station is used to convert stereoscopic video slices into a specific video slice stream compatible with the user equipment and network capability, and the code rate of that stream can be adjusted dynamically as the network changes, which realizes adaptive streaming. The invention reduces the storage and traffic pressure on the server, lowers the computing requirements on the user terminal, offers good compatibility, and enables code-rate-adaptive transmission of multiple kinds of stereoscopic video.

Description

Self-adaptive stereoscopic video transmission system and method
1. Field of application
The present invention relates to the problem of adaptive streaming of stereoscopic video.
2. Background of the invention
With the rapid development and convergence of computer vision, computer graphics, and video processing technologies, traditional flat (two-dimensional) video services can no longer meet people's needs, and stereoscopic three-dimensional (3D) video and multi-view video (MVV), which can provide a stereoscopic experience, are attracting increasing attention.
To avoid confusion among terms: multi-view video here refers to a set of video signals obtained by synchronously capturing the same scene with multiple cameras located at different viewpoints, where each viewpoint is captured as a binocular video; 3D video refers to the binocular video of a single viewpoint, i.e., a set of two video channels corresponding to the left and right eyes. Stereoscopic video is the general term covering both 3D video and multi-view video; combined with a suitable display technology, it presents a stereoscopic effect. A streaming scheme for stereoscopic video must consider not only the design of the transmission protocol and scheme but also the compression and storage format and the stereoscopic display mode of the video source. The scheme is described here from both the video coding and the streaming perspectives.
The compression and storage format of a video source depends mainly on the video codec technology and its corresponding storage format. With reference to the mainstream codec standard H.264/AVC and the next-generation standard H.265/HEVC, together with the stereoscopic video extensions of the two standards (H.264/MVC, MV-HEVC, and 3D-HEVC), stereoscopic video is mainly stored in the following formats:
a) Traditional binocular video consists of two planar video channels; each channel is encoded independently and corresponds to the left or the right eye.
b) Traditional video + depth map consists of one traditional planar video and a depth map. The depth map is generated by, for example, a camera that records depth information, and new coding techniques are introduced to improve compression efficiency.
c) Multi-view video consists of multiple binocular videos, each corresponding to one viewpoint, so it contains multiple traditional planar videos and depth maps. Multi-view video coding introduces techniques such as disparity prediction and improves compression efficiency by exploiting inter-view redundancy.
Both binocular video and traditional video + depth map can represent the three-dimensional information of a scene. The former is a direct representation in which the two video channels correspond directly to the left and right eyes; the latter is an indirect representation that must be converted to form a binocular video. Multi-view video contains multiple binocular videos, so it can support multi-view stereoscopic display, binocular stereoscopic display, and traditional two-dimensional display, at the cost of a larger data volume.
Stereoscopic display simulates retinal imaging according to the mechanism of human vision: the left and right images of the same scene are projected onto the retinas of the left and right eyes, which produces the perception of depth. Suitable means must therefore ensure that each eye sees only its corresponding view, without overlap. Stereoscopic display technologies are mainly divided into glasses-based displays (such as polarized-glasses displays) and naked-eye displays (such as parallax-barrier displays).
From the transmission point of view, the data volume of stereoscopic video far exceeds that of traditional planar video: the transmitted content changes from a single monocular video to one or more binocular videos, so the demand for bandwidth is very high. In the traditional binocular video scheme the data volume is twice that of traditional planar video, and the traditional video + depth scheme places very high demands on terminal computing capability. In the scheme designed here, video transcoding is completed at the base station, so the content transmitted to the user is still traditional binocular video, with a data volume still twice that of traditional planar video. Multi-view video has an even larger data volume, which far exceeds the carrying capacity of existing networks.
3. Summary and features of the invention
The present invention relates to the problem of adaptive streaming of stereoscopic video.
The invention involves a streaming media server, a base station, and a user (client). The streaming media server provides the stereoscopic video source, which may be traditional binocular video, traditional video + depth map, or multi-view video. The base station is a mobile communication base station with a certain computing capability and is the last hop of the network that the mobile user accesses. The user accesses the network through the base station, requests stereoscopic video from the streaming media server, and obtains the video resource by streaming.
The method comprises the following specific steps:
a) Video source coding. The video source server compresses and encodes the original stereoscopic video. The encoding must use a closed, fixed-length GOP (Group of Pictures) structure so that each GOP can be decoded independently. For traditional binocular video, two video streams are generated, one for the left and one for the right viewpoint; for traditional video + depth map, a traditional video stream and a depth map data stream are generated, with each depth map frame corresponding to a video frame; for multi-view video with N views, 2N video streams are generated.
b) Video source slicing. The video source server slices each video stream into segments of fixed duration t, so that every slice contains the same number of GOPs. During this processing it generates a custom video information description file in a data format agreed with the user client. The file describes the sliced video: file name, total playing time, video compression format (HEVC, AVC, VP9, VP8, etc.), audio compression format (AAC, MP3, etc.), packaging format (MPEG-2, MPEG-4, etc.), resolution, frame rate, slice duration, slice serial number, URI, and encoding mode (traditional binocular video, traditional video + depth map, multi-view video, etc.). In addition, depending on the encoding mode, traditional binocular video must include the channel serial number (corresponding to the left or right eye); traditional video + depth map must include a description of the depth map (such as picture compression format and the corresponding GOP and video frame numbers); and multi-view video must include the view number and channel number.
c) Requesting the video information description file. The user requests the video information description file from the streaming media server through the base station to learn which video resources can be requested. At the same time, the base station also obtains the information about the videos deployed on the streaming media server.
d) Requesting video slice files. The user issues requests according to device capability, changes in network conditions, changes of viewpoint, and so on. Each request states the requirements for the video slices the user can receive and must specify the video and audio compression formats, packaging format, resolution, frame rate, slice duration, serial number, code rate, and so on. For stereoscopic video of any encoding mode, the user requests two video streams and then decodes, renders, and plays them.
e) User group request analysis and video transcoding. The base station analyzes and classifies the requests of its access users; users requesting the same video source form a user group. The base station requests video slices from the streaming media server and then transcodes and pushes them according to the user requests. For traditional binocular video, it transcodes according to user requirements, generating and pushing slices that meet the user-specified requirements (video and audio compression formats, packaging format, resolution, slice duration, serial number, and code rate). For traditional video + depth map, it must perform 2D-to-3D conversion: a traditional binocular video slice is rendered from the 2D planar video and its depth map, and then the video slice required by the user is generated and pushed. For multi-view video, it must analyze the view information requested by the user group, request one or more groups of video slices of the specific views from the streaming media server, and then complete the 2D-to-3D conversion and rate conversion before pushing the video slices specified by the users.
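Steps a) and b) determine how many streams and slices the server has to deploy. The following Python sketch works this out under stated assumptions: the `stream_count`, `slice_plan`, and `describe` helpers, the `SliceEntry` record, and the URI layout are hypothetical names introduced for illustration and are not defined by the patent, which leaves the description file format to the agreement between server and client.

```python
from dataclasses import dataclass


def stream_count(mode: str, views: int = 1) -> int:
    """Number of streams produced by step a) for one video source."""
    if mode == "binocular":
        return 2              # left-eye and right-eye video streams
    if mode == "video+depth":
        return 2              # one planar video stream plus one depth-map stream
    if mode == "multi-view":
        return 2 * views      # N views give 2N streams
    raise ValueError(f"unknown encoding mode: {mode}")


def slice_plan(total_s: int, slice_s: int, gop_frames: int, fps: int) -> tuple[int, int]:
    """Return (number of slices, GOPs per slice) for fixed-duration slicing."""
    frames_per_slice = slice_s * fps
    if frames_per_slice % gop_frames != 0:
        # Every slice must hold a whole number of closed GOPs.
        raise ValueError("slice duration is not a whole number of GOPs")
    slices = -(-total_s // slice_s)   # ceiling division
    return slices, frames_per_slice // gop_frames


@dataclass(frozen=True)
class SliceEntry:
    """One row of a (hypothetical) description file."""
    view: int
    channel: int
    seq: int
    duration_s: int
    uri: str


def describe(name: str, views: int, channels: int, total_s: int, slice_s: int) -> list[SliceEntry]:
    """List every slice of every channel of every view, numbered from 1."""
    slices = -(-total_s // slice_s)
    return [
        # The URI pattern below is an assumption made for this sketch.
        SliceEntry(v, c, s, slice_s, f"{name}/view{v:02d}/ch{c}/{s:04d}.mp4")
        for v in range(1, views + 1)
        for c in range(1, channels + 1)
        for s in range(1, slices + 1)
    ]
```

With the values of the embodiment described later in this document (800 s of video, 2 s slices, 30-frame GOPs at 60 fps), `slice_plan(800, 2, 30, 60)` gives 400 slices of 4 GOPs each.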
Stereoscopic video sources are diverse, but the transmission scheme designed in this patent only applies simple slicing to the original video, so complexity and storage pressure are low. User equipment also varies widely; the scheme unifies the format of the video slices received by users as traditional binocular video slices, which can be decoded and rendered directly with a traditional video decoder. Most stereoscopic video (traditional video + depth map, multi-view video) places high demands on terminal computing capability; moving transcoding to the base station side lowers the requirements on user equipment and preserves good compatibility. In a multi-user scenario, the content watched by user groups in the same cell is highly similar; aggregating and analyzing the demands of a user group reduces the traffic pressure on the streaming media server and the amount of computation on the base station side. In a single-user scenario, the wireless resources allocated to each user change constantly; in this scheme the user can adaptively adjust the code rate of the requested video slices according to terminal capability, network conditions, and so on, which makes the fullest use of network resources and ensures smooth playback.
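The patent does not prescribe how the client chooses the code rate it requests. As one possible policy, the Python sketch below smooths recent throughput measurements with an exponentially weighted moving average, applies a safety margin, and caps the result at what the device can handle. The function name, the smoothing factor `alpha`, and the `margin` are all assumptions made for illustration.

```python
def next_code_rate_mbps(samples, device_max_mbps, alpha=0.5, margin=0.9):
    """Pick the code rate to request for the next slice.

    samples: recent throughput measurements in Mbps, oldest first.
    device_max_mbps: highest code rate the terminal can decode and play.
    alpha: weight of the newest sample in the moving average (assumed value).
    margin: fraction of the estimate actually requested (assumed value).
    """
    if not samples:
        raise ValueError("at least one throughput sample is required")
    estimate = samples[0]
    for sample in samples[1:]:
        # Exponentially weighted moving average: newer samples count more.
        estimate = alpha * sample + (1 - alpha) * estimate
    # Never request more than the device can handle.
    return min(device_max_mbps, margin * estimate)
```

Because the base station transcodes each slice to the requested code rate, the client in this sketch can ask for any rate instead of choosing from a fixed set of pre-encoded versions.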
Drawings
(1) FIG. 1 is a schematic view of the present invention.
(2) FIG. 2 is a schematic flow chart of the method of the present invention.
(3) FIG. 3 is a schematic diagram of an embodiment.
4. Examples of specific embodiments
To further illustrate the method of practicing the present invention, an exemplary embodiment is given below. This example is merely representative of the principles of the present invention and does not represent any limitation of the present invention.
Suppose an HTTP streaming server is to deploy a multi-view video "Sport" with 6 views and a duration of 800 s in YUV420 format. Four users, A, B, C, and D, access the network through the base station. Users A and B can play video at a maximum resolution of 3840x2160, and users C and D at a maximum resolution of 7680x4320; all four users support a maximum frame rate of 60 fps and all audio and video compression and packaging formats. The average available bandwidths of users A, B, C, and D are 15 Mbps, 20 Mbps, 30 Mbps, and 40 Mbps, respectively. According to the invention, the specific steps are as follows:
a) Encoding and slicing of the original video. After encoding, the video compression format is HEVC, the audio compression format is AAC, the packaging format is MPEG-4, the resolution is 7680x4320, the frame rate is 60 fps, the GOP length is 30 frames, the first frame of each GOP is an IDR frame, and the compressed video code rate is 50 Mbps. The original video contains 6 viewpoints, so 6 groups of video files are obtained after encoding and compression, with viewpoint serial numbers 01-06. After slicing with a slice duration of 2 s, each of the 6 groups contains slices with serial numbers 1-400. This description information is stored in the description file "MVV _ sport.
b) Suppose that users A, B, C, and D all request the video through the same base station. Each user sends an HTTP GET request to obtain the video information description file "MVV _ Sport.
c) Users A, B, C, and D send HTTP POST requests to the HTTP streaming media server for the slices with serial numbers 1-400. User A requires a resolution of 3840x2160, a code rate of 15 Mbps, and viewpoint serial number 01; user B requires a resolution of 3840x2160, a code rate of 20 Mbps, and viewpoint serial number 02; user C requires a resolution of 7680x4320, a code rate of 30 Mbps, and viewpoint serial number 02; user D requires a resolution of 7680x4320, a code rate of 40 Mbps, and viewpoint serial number 02. The users' other requirements for the video slices are identical and are not listed here. This information is written into the body of each POST request.
d) After receiving the HTTP POST requests, the base station parses and aggregates them. It learns that users A, B, C, and D are all requesting the same video source and that the required video slices come from the slice groups with viewpoint serial numbers 01 and 02. It then requests the video slices with serial numbers 1-400 of viewpoints 01 and 02 from the HTTP streaming server in sequence.
e) The base station transcodes the video according to the video requirements in the body of each HTTP POST request. The slice with viewpoint serial number 01 comprises a traditional video slice and a depth map slice, which are converted into a binocular video slice by 2D-to-3D conversion. Through downsampling and code rate control, a slice with a resolution of 3840x2160 and a code rate of 15 Mbps is generated for user A, and a slice with a resolution of 3840x2160 and a code rate of 20 Mbps is generated for user B. The slices with viewpoint serial number 02 are then processed in the same way. The transcoded slices are pushed to the corresponding users.
f) Steps c), d), and e) are repeated until all slices with serial numbers 1-400 have been pushed to the users or a user cancels the service. During this process a user's available bandwidth may change at any time, so the user's HTTP POST requests may carry a new code rate requirement; the base station must generate video slices that meet the requirement of each request, which realizes adaptive-rate transmission.
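The aggregation in step d) can be sketched in Python using the four requests of this embodiment. The `Request` record and the `aggregate`, `fetch_plan`, and `transcode_jobs` helpers are hypothetical names chosen for illustration; only the request values come from the example above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    user: str
    source: str
    view: int
    width: int
    height: int
    rate_mbps: int


# The four requests of step c) in the embodiment.
REQUESTS = [
    Request("A", "Sport", 1, 3840, 2160, 15),
    Request("B", "Sport", 2, 3840, 2160, 20),
    Request("C", "Sport", 2, 7680, 4320, 30),
    Request("D", "Sport", 2, 7680, 4320, 40),
]


def aggregate(requests):
    """Group requests by video source, then by requested viewpoint."""
    groups = defaultdict(lambda: defaultdict(list))
    for r in requests:
        groups[r.source][r.view].append(r)
    return groups


def fetch_plan(requests):
    """Viewpoints the base station must fetch from the server, per source."""
    return {src: sorted(views) for src, views in aggregate(requests).items()}


def transcode_jobs(requests):
    """One job per distinct output; users with identical needs share a job."""
    jobs = defaultdict(list)
    for r in requests:
        jobs[(r.source, r.view, r.width, r.height, r.rate_mbps)].append(r.user)
    return dict(jobs)
```

For the four requests, `fetch_plan(REQUESTS)` yields `{"Sport": [1, 2]}`: the base station fetches two viewpoint groups per slice number instead of issuing one upstream fetch per user, while `transcode_jobs(REQUESTS)` still contains four entries because each user asked for a different output.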

Claims (4)

1. A self-adaptive stereoscopic video transmission method, which is compatible with existing streaming media servers and terminal video decoders, involves the opened-up computing capability of the base station in the stereoscopic video transmission service, reduces the storage and traffic pressure on the server, reduces the requirement on the computing capability of the user terminal, and realizes adaptive streaming of multiple kinds of stereoscopic video, the method being characterized by the following steps:
a) before a service is initiated, the server and the user agree on a media information description format, and the written description file is obtained by both the user and the base station before the service is initiated; the base station aggregates, analyzes, and forwards user requests;
b) the encoding performed when the server deploys a video source constrains only the GOP structure and leaves other encoding parameters unaffected; the slicing operation is simple and does not affect the encoded video content; and the server deploys a single copy of the original video without redundancy;
c) the video slices received by the client are slices of traditional video, are compatible with existing video decoders, and require no computation beyond decoding;
d) the base station participates in the service and plays a key role: it classifies and merges user requests, which reduces the number of video slice stream fetches from the server and the traffic pressure on the server; it converts stereoscopic video slices into traditional binocular video slices, which guarantees compatibility with user equipment; and it generates video slices that meet each user's requirements, which realizes adaptive transmission.
2. The adaptive stereoscopic video transmission method of claim 1, wherein the streaming media server is any server capable of providing streaming media services, with supported transmission protocols including HTTP and RTP/RTSP; and the base station is any radio station that has computing capability and provides a mobile communication access function.
3. The adaptive stereoscopic video transmission method according to claim 1, wherein the stereoscopic video includes a binocular stereoscopic video, a monocular/binocular stereoscopic video with a depth map, a multi-view video with/without a depth map.
4. The adaptive stereoscopic video transmission method of claim 1, wherein the adaptation comprises adaptive adjustment of video compression format, audio compression format, and encapsulation format, and dynamic adaptive change of resolution, frame rate, and code rate.
CN201810903751.2A 2018-08-09 2018-08-09 Self-adaptive stereoscopic video transmission system and method Active CN109451293B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810903751.2A CN109451293B (en) 2018-08-09 2018-08-09 Self-adaptive stereoscopic video transmission system and method

Publications (2)

Publication Number Publication Date
CN109451293A CN109451293A (en) 2019-03-08
CN109451293B CN109451293B (en) 2021-11-26

Family

ID=65530120

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810903751.2A Active CN109451293B (en) 2018-08-09 2018-08-09 Self-adaptive stereoscopic video transmission system and method

Country Status (1)

Country Link
CN (1) CN109451293B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110062130B (en) * 2019-03-14 2021-06-08 叠境数字科技(上海)有限公司 Gigabit pixel video rendering method and device based on preprocessed file structure
CN110446051A (en) * 2019-08-30 2019-11-12 郑州航空工业管理学院 Three-dimensional video-frequency code stream Adaptable System and method based on 3D-HEVC
CN113115077B (en) * 2021-03-12 2022-04-26 上海交通大学 Code rate self-adaptive transmission method and system for static point cloud server

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101022540A (en) * 2006-02-13 2007-08-22 中兴通讯股份有限公司 Video monitoring system and method under server/customer end constitution
CN104540043A (en) * 2014-12-24 2015-04-22 北京邮电大学 Video transmitting method for wireless network and base station
CN105872587A (en) * 2015-11-25 2016-08-17 乐视云计算有限公司 Video request processing method and device
CN107333153A (en) * 2016-04-28 2017-11-07 华为技术有限公司 A kind of video transmission method, base station and system
CN107734382A (en) * 2017-08-28 2018-02-23 北京邮电大学 Video transmission method and device under a kind of low-and high-frequency Collaborative environment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102343331B1 (en) * 2015-07-07 2021-12-24 삼성전자주식회사 Method and apparatus for providing video service in communication system
CN105245798B (en) * 2015-09-21 2018-02-23 西安文理学院 The CCD video compress measurement imaging system and control method perceived based on splits' positions

Also Published As

Publication number Publication date
CN109451293A (en) 2019-03-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant