CN114513702A - Web-based block panoramic video processing method, system and storage medium - Google Patents

Web-based block panoramic video processing method, system and storage medium

Info

Publication number
CN114513702A
CN114513702A
Authority
CN
China
Prior art keywords
video
panoramic
coordinate
sphere
mouse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210169847.7A
Other languages
Chinese (zh)
Other versions
CN114513702B (en)
Inventor
张海涛
李加畅
曾泷
马华东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202210169847.7A
Publication of CN114513702A
Application granted
Publication of CN114513702B
Legal status: Active
Anticipated expiration

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47217End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23424Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention provides a Web-based block panoramic video processing method, system and storage medium. The method comprises the following steps: obtaining an original first panoramic video and generating from it a plurality of second panoramic videos with different resolutions; blocking, slicing and packaging each second panoramic video and generating an MPD index file containing the video slice information; taking a first coordinate on the spherical surface of a first sphere as the position coordinate of a virtual image acquisition equipment aimed at the center of the first sphere, taking the intersection point of the extension of the line connecting the first coordinate and the sphere center with the spherical surface of a second sphere as the user's view-center coordinate, and determining the first video blocks located within the user's field of view; creating players equal in number to the video blocks, downloading and decoding the MPD index files corresponding to the first video blocks, and synchronizing the playing progress of the first video blocks; and stitching the plurality of first video blocks together and rendering the stitched panoramic video.

Description

Web-based block panoramic video processing method, system and storage medium
Technical Field
The invention relates to the technical field of panoramic video, and in particular to a Web-based block panoramic video processing method, system and storage medium.
Background
Panoramic video, also called full-view video or 3D live-action video, is a typical form of virtual reality (VR) technology and belongs to the weakly interactive category of VR, meaning that user interaction cannot change the state of the virtual entities. A panoramic video is shot with a panoramic camera that captures the surrounding scene in all directions over 360 degrees, recording the picture at every viewing angle; through a dedicated panorama player, the user can freely switch the viewing angle and obtain an immersive viewing experience. Compared with traditional video, panoramic video does not confine the user to the content of a fixed field of view: the user can freely switch perspectives to select a picture of interest, which yields better interactivity and a stronger sense of presence, and makes panoramic video more attractive to consumers. A panoramic video is shot with a ring camera array or a panoramic camera and produced through dedicated processing such as panoramic stitching, projection and encoding; as shown in fig. 1, the core technologies mainly involve panoramic video stitching, panoramic video projection, and panoramic video playback and control.
Most panoramic video players currently found on the network let users freely switch their viewing field by requesting the full-view picture. Because a panoramic video records a 360-degree field of view, its content is in fact a three-dimensional spherical surface; however, the currently popular video coding techniques are all designed for two-dimensional pictures and do not support encoding three-dimensional stereoscopic pictures directly. Panoramic projection must therefore be applied to project the three-dimensional picture onto a two-dimensional plane so that existing two-dimensional video coding can be used, which in turn allows distribution over the existing video transmission network. At the player end, the panoramic player always requests the panoramic video containing the full field of view, decodes it, and back-projects the two-dimensional picture onto the three-dimensional sphere: a three-dimensional sphere model is built with the three-dimensional drawing capability provided by the Web platform, the panoramic image is applied to the sphere model as a texture, and the view picture is finally rendered on the user's screen through WebGL or similar means. The user controls the switching of the current view position through an interactive device such as a mouse or keyboard.
Because the field of view the user actually sees occupies only a small part of the whole panoramic picture, a large share of the transmission, decoding and related operations on the video data is wasted, and the data utilization rate is low. In addition, because panoramic video covers a 360-degree field of view, the projected and encoded panoramic video generally has a huge resolution and bit rate, which places high demands on network transmission bandwidth, terminal processor performance and the like; consequently, the view picture quality cannot be improved simply by continually raising the bit rate and resolution of the whole video.
In the prior art, some panoramic video players are based on non-uniform coding, which reduces the data volume of the whole planar panoramic media by introducing a virtual viewing angle and performing view-dependent non-uniform mapping. Given a preset virtual viewing angle, the spherical panoramic content is sampled densely around the preset viewing-angle area, and the sampling density gradually decreases in areas far from it, producing a view-dependent non-uniform pixel distribution in the planar panoramic video. For transmission and playback of non-uniformly coded panoramic video, the server holds multiple panoramic video versions with different virtual viewing angles, all derived from the same panoramic video. Based on the user's current view position, the panoramic player computes the virtual viewing angle that best matches it and requests the corresponding version from the server. This scheme can guarantee the quality of the key area while reducing the overall resolution and data volume of the panoramic video.
However, this approach requires the server to store multiple video versions with different virtual viewing angles, consuming a huge amount of storage. Moreover, because the virtual viewing angles are finite and discrete while the user's viewing angle is continuous, when the actual viewing angle lies between two virtual viewing angles the player will repeatedly switch between the two versions, degrading the viewing experience. In addition, today's mainstream panoramic cameras, panoramic videos and panoramic viewing devices still mainly use equirectangular (longitude-latitude) panoramic mapping, so adopting a non-uniform coding scheme raises compatibility problems and requires the related devices to provide extensions supporting the non-uniform mapping.
Beyond the above, the prior art also includes non-Web players that use block panoramic video to realize transmission based on the user's view position: the original panoramic video is divided into a plurality of blocks and transmitted block by block, and only the blocks required by the user's current field of view are sent; these blocks ultimately compose the user's viewing field. A player based on block panoramic video can effectively improve the utilization rate of panoramic video data. Compared with other panoramic video players, however, such a player faces greater challenges in block video synchronization, block stitching, block rendering and the like; it must rely on the video processing and playback interfaces supported by the terminal platform, with the playback and rendering programs developed in languages such as C and C++. The result is suitable only as a native desktop application and cannot be ported across different terminal devices and operating systems; moreover, a native desktop player requires the user to download and install it, which hinders rapid distribution and use.
Therefore, how to provide a panoramic video processing method and system that effectively improves the utilization rate of panoramic video transmission data, reduces the overall data volume and transmission bandwidth requirement of panoramic video transmission, saves bandwidth cost, improves view picture quality, and can be conveniently implemented on the Web platform is a technical problem urgently awaiting a solution.
Disclosure of Invention
In view of the above, the present invention provides a Web-based block panoramic video processing method, system and storage medium, so as to solve one or more problems in the prior art.
According to one aspect of the invention, a Web-based block panoramic video processing method is disclosed, comprising the following steps:
the method comprises the steps of obtaining an original first panoramic video, generating a plurality of second panoramic videos with different resolutions from the first panoramic video, carrying out video blocking, slicing and packaging on each second panoramic video, and generating an MPD index file containing video slice information;
taking a first coordinate on the spherical surface of a first sphere as the position coordinate of a virtual image acquisition equipment aimed at the center of the first sphere; taking the intersection point of the extension of the line connecting the first coordinate and the sphere center with the spherical surface of a second sphere as the user's view-center coordinate; acquiring the center position coordinate of each video block; and determining a plurality of first video blocks located within the user's field of view based on the distance between the center position coordinate of each video block and the view-center coordinate, wherein the second sphere is concentric with the first sphere and the radius of the second sphere is larger than that of the first sphere;
creating players equal in number to the video blocks, downloading and decoding, through the players corresponding to the first video blocks, the MPD index file corresponding to each first video block, and synchronizing the playing progress of the plurality of first video blocks;
and stitching the plurality of first video blocks together, and rendering the stitched panoramic video.
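The two-sphere construction above can be sketched in a few lines of JavaScript (a minimal illustration, not the patent's implementation; the function and field names are assumptions). The camera sits at a first coordinate P on the inner sphere and looks at the common center, so the view center is simply the point opposite P, rescaled to the outer sphere's radius:

```javascript
// View-center computation: the ray from camera position P through the
// shared center O exits the outer (second) sphere at the point opposite P.
// `camera` is {x, y, z} on the inner sphere; r2 is the outer radius (r2 > r1).
function viewCenter(camera, r2) {
  const len = Math.hypot(camera.x, camera.y, camera.z); // = inner radius r1
  return {
    x: (-camera.x / len) * r2,
    y: (-camera.y / len) * r2,
    z: (-camera.z / len) * r2,
  };
}
```

For a camera at (r1, 0, 0) the view center lands at (-r2, 0, 0), directly across the sphere, matching the geometric description above.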
In some embodiments of the invention, the method further comprises:
acquiring an input operation of a mouse and changing the position coordinate of the virtual image acquisition equipment based on the mouse input; the input operation of the mouse is a mouse-down event, a mouse-move event or a mouse-up event.
In some embodiments of the present invention, acquiring an input operation of a mouse, and changing position coordinates of the virtual image capturing device based on the input of the mouse includes:
acquiring the current position coordinate and the initial position coordinate of the mouse, and calculating the offset of the mouse based on the current position coordinate and the initial position coordinate of the mouse;
calculating a new position coordinate of the virtual image acquisition equipment based on the offset, the position coordinate of the virtual image acquisition equipment corresponding to the initial position coordinate of the mouse and a preset view switching sensitivity value;
and taking the new position coordinate of the virtual image acquisition equipment obtained by calculation as the current position coordinate of the virtual image acquisition equipment.
In some embodiments of the present invention, calculating a new position coordinate of the virtual image capturing device based on the offset, a position coordinate of the virtual image capturing device corresponding to the initial position coordinate of the mouse, and a preset view switching sensitivity value includes:
converting the position coordinate of the virtual image acquisition equipment corresponding to the initial position coordinate of the mouse into a first longitude and latitude coordinate;
calculating a product of the offset and a preset view switching sensitivity value, and taking the sum of the product and the first longitude and latitude coordinate as a second longitude and latitude coordinate of the virtual image acquisition equipment;
and converting the second longitude and latitude coordinates into spherical coordinates to be used as new position coordinates of the virtual image acquisition equipment.
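The drag-to-rotate update described in these steps — the mouse offset times a sensitivity value added to the longitude/latitude of the camera position recorded at mouse-down, then converted back to spherical coordinates — can be sketched as follows (an illustration with assumed names and conventions; the latitude clamp is an assumption to keep the camera off the poles' singularity):

```javascript
// startPos: camera position {x, y, z} at mouse-down; dx, dy: mouse offset in
// pixels; sensitivity: preset view-switching sensitivity; radius: sphere radius.
function dragCamera(startPos, dx, dy, sensitivity, radius) {
  // spherical -> first longitude/latitude coordinate (radians)
  const lon0 = Math.atan2(startPos.z, startPos.x);
  const lat0 = Math.asin(startPos.y / radius);
  // offset * sensitivity added to the first longitude/latitude coordinate
  const lon = lon0 + dx * sensitivity;
  const lat = Math.max(-Math.PI / 2, Math.min(Math.PI / 2, lat0 + dy * sensitivity));
  // second longitude/latitude coordinate -> spherical (new camera position)
  return {
    x: radius * Math.cos(lat) * Math.cos(lon),
    y: radius * Math.sin(lat),
    z: radius * Math.cos(lat) * Math.sin(lon),
  };
}
```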
In some embodiments of the present invention, determining a plurality of first video blocks located within the user's field of view based on the distance between the center position coordinate of each video block and the view-center coordinate comprises:
determining the specific position of the view center on the second sphere;
when the view center is located on the equator of the second sphere, calculating the distance between the center position coordinate of each video block and the view-center coordinate using the Euclidean distance formula;
when the view center is located at the two poles of the second sphere, calculating the distance between the center position coordinate of each video block and the view-center coordinate using the Manhattan distance formula;
and judging whether the calculated distance between the center position coordinate of a video block and the view-center coordinate is smaller than a distance threshold; when it is, taking that video block as a first video block located within the user's field of view.
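A minimal sketch of the block-selection rule above (hypothetical names; the distance threshold is supplied by the caller). Per the method, the plain Euclidean distance is used when the view center lies on the equator, and the Manhattan distance when it lies at the poles:

```javascript
// tileCenters: array of block-center coordinates {x, y, z};
// viewCenter: the user's view-center coordinate; threshold: distance threshold;
// atPole: whether the view center lies at the poles of the second sphere.
function selectTiles(tileCenters, viewCenter, threshold, atPole) {
  return tileCenters.filter(c => {
    const d = atPole
      ? Math.abs(c.x - viewCenter.x) + Math.abs(c.y - viewCenter.y) +
        Math.abs(c.z - viewCenter.z)                       // Manhattan distance
      : Math.hypot(c.x - viewCenter.x, c.y - viewCenter.y,
                   c.z - viewCenter.z);                    // Euclidean distance
    return d < threshold;                                  // below threshold -> in view
  });
}
```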
In some embodiments of the present invention, creating players equal in number to the video blocks and downloading and decoding, through the players corresponding to the first video blocks, the MPD index file corresponding to each first video block comprises:
creating a tag container on the Web panoramic player interface and establishing a tag matrix based on the number of the video blocks, with each player serving as a tag object;
creating a media source extension object, and binding the created media source extension object with each player object;
and downloading and decoding the MPD index file corresponding to each first video block based on the media source extension object.
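The Media Source Extensions binding itself (new MediaSource(), URL.createObjectURL(), addSourceBuffer()) only runs in a browser, but the bookkeeping that pairs each video block with its player object and MPD index file is plain logic. The sketch below uses a hypothetical tile-naming scheme for the MPD URLs and leaves the MediaSource slot to be filled in by browser code:

```javascript
// Build one player descriptor per grid cell of a cols x rows block layout.
// mpdBase and the tile_<row>_<col>.mpd naming are assumptions for illustration.
function buildPlayerMatrix(cols, rows, mpdBase) {
  const players = [];
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      players.push({
        tileId: r * cols + c,                 // block number
        col: c,
        row: r,
        mpdUrl: `${mpdBase}/tile_${r}_${c}.mpd`, // hypothetical MPD index URL
        mediaSource: null,                    // bound to a MediaSource in the browser
      });
    }
  }
  return players;
}
```

For the 4 x 3 grid used in the embodiment, this yields 12 descriptors, one per video block.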
In some embodiments of the present invention, synchronizing the playing progress of the plurality of first video blocks comprises:
selecting one of the plurality of players as a reference player, acquiring the playing progress of the video block being played by the reference player, broadcasting the reference player's current playing progress to the other players, and having the other players update the playing progress of their corresponding video blocks based on the received current playing progress of the reference player.
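The broadcast mechanism above can be sketched as a single resynchronization pass (an illustration, not the patent's code; the 0.1 s drift tolerance is an assumption). In a real player each entry would be an HTML video element, where assigning currentTime triggers a seek:

```javascript
// players: array of objects exposing currentTime (in a browser, <video> elements).
// refIndex: index of the reference player whose progress is broadcast.
function syncToReference(players, refIndex, tolerance = 0.1) {
  const t = players[refIndex].currentTime;
  for (let i = 0; i < players.length; i++) {
    if (i !== refIndex && Math.abs(players[i].currentTime - t) > tolerance) {
      players[i].currentTime = t; // on a real video element this seeks
    }
  }
  return t;
}
```

Only players whose drift exceeds the tolerance are reseeked, avoiding needless seek stalls on players that are already in step.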
In some embodiments of the present invention, stitching the plurality of first video blocks together and rendering the stitched panoramic video comprises:
creating a canvas on the Web platform and drawing the image corresponding to each first video block onto the canvas;
and creating a sphere model in the panoramic scene and mapping the panoramic image on the canvas onto the spherical surface of the sphere model.
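The stitching step draws each block's current frame into its cell of one shared canvas, e.g. with ctx.drawImage(videoElement, dx, dy, dw, dh), and the canvas then serves as the sphere texture. The destination rectangle for each block in a cols x rows grid is simple arithmetic (a sketch with assumed names):

```javascript
// Destination rectangle on the stitching canvas for block `tileId`,
// with blocks numbered row-major across a cols x rows grid.
function tileRect(tileId, cols, rows, canvasW, canvasH) {
  const w = canvasW / cols;
  const h = canvasH / rows;
  return {
    dx: (tileId % cols) * w,            // column offset
    dy: Math.floor(tileId / cols) * h,  // row offset
    dw: w,
    dh: h,
  };
}
```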
According to another aspect of the present invention, a Web-based block panoramic video processing system is also disclosed. The system comprises a processor and a memory, the memory storing computer instructions and the processor being configured to execute the computer instructions stored in the memory; when the computer instructions are executed by the processor, the system implements the steps of the method according to any of the above embodiments.
According to yet another aspect of the present invention, a computer-readable storage medium is also disclosed, on which a computer program is stored; when executed by a processor, the program implements the steps of the method according to any of the above embodiments.
The Web-based block panoramic video processing method of the invention, on the one hand, effectively improves the utilization rate of panoramic video transmission data, reduces the overall data volume of panoramic video transmission, lowers the transmission bandwidth requirement, and saves bandwidth cost. On the other hand, it provides a higher-quality panoramic view under the same transmission bandwidth, alleviating to some extent the problems of low view quality and difficult high-definition panoramic video transmission in panoramic video applications. Beyond this, the method also resolves the difficulty that existing block panoramic video player schemes are unsuitable for, and hard to implement on, the Web platform: it realizes the panoramic video player as a Web application, which has clear advantages over desktop applications in application distribution, multi-terminal platform adaptation, and installation-free startup, and is conducive to the vigorous development of block panoramic video technology and the wide application of related software.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
It will be appreciated by those skilled in the art that the objects and advantages that can be achieved with the present invention are not limited to what has been particularly described hereinabove, and that the above and other objects that can be achieved with the present invention will be more clearly understood from the following detailed description.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. For purposes of illustrating and describing some portions of the present invention, corresponding parts of the drawings may be exaggerated, i.e., may be larger, relative to other components in an exemplary apparatus actually manufactured according to the present invention. In the drawings:
fig. 1 is a flowchart of a panoramic video playing process in the prior art.
Fig. 2 is a flowchart illustrating a method for processing a Web-based tiled panoramic video according to an embodiment of the present invention.
Fig. 3 is a schematic flowchart of processing a panoramic video into a blocked video resource supporting a Web platform according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating a relationship between virtual camera coordinates and a center of a field of view according to an embodiment of the present invention.
Fig. 5 is a schematic flowchart of a process of downloading, decoding and controlling a multi-partition panoramic video of a Web platform according to an embodiment of the present invention.
Fig. 6 is a schematic flowchart of a synchronization mechanism for playing progress of multiple video partitions on a Web platform according to an embodiment of the present invention.
Fig. 7 is a flowchart illustrating a process of splicing and rendering a video block on a Web platform according to an embodiment of the present invention.
Fig. 8 is a schematic diagram of the implementation principle of panoramic playback control.
Fig. 9 is a schematic flowchart of the implementation process of panoramic playback control.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
It should be noted that, in order to avoid obscuring the present invention with unnecessary details, only the structures and/or processing steps closely related to the scheme according to the present invention are shown in the drawings, and other details not closely related to the present invention are omitted.
It should be emphasized that the terms "comprises", "comprising", "includes" and/or "having", when used herein, specify the presence of stated features, elements, steps or components, but do not preclude the presence or addition of one or more other features, elements, steps or components.
The panoramic playback process involves decoding the panoramic video and, by applying a corresponding projection technique, converting the two-dimensional video data into three-dimensional space for rendering and display. At present, panoramic video mainly adopts the ERP (equirectangular) panoramic projection mode, and the corresponding panoramic playback control principle is shown in fig. 8. In ERP projection, the spherical picture is unfolded uniformly by longitude and latitude into a planar picture so as to facilitate encoding, storage and transmission of the panoramic video. During panoramic playback, the inverse operation of ERP projection is needed, that is, the rectangular planar video is projected back onto the sphere according to equally divided longitudes and latitudes. This process is usually implemented with OpenGL ES, which supports multiple platforms such as iOS, Android and the Web. Referring to fig. 9, video frame pictures are read from the video stream through the API provided by the host platform, and a sphere environment is built with OpenGL ES. A three-dimensional vertex array of the sphere is then generated, the video data is mapped onto the three-dimensional vertices as texture data, and the texture map is rendered onto the sphere, realizing the mapping from the rectangular picture to the three-dimensional sphere. During panoramic playback, the sphere center serves as the observation point, and different positions on the sphere are observed along different viewing angles; viewing-angle switching can be controlled by sliding the screen or using device sensor data. Common panoramic VR (virtual reality) playback control devices currently include Gear VR, Google Cardboard, Oculus Rift, HTC Vive and the like. Common panoramic video development tool libraries include MD360Player, fact 360, and the like.
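The inverse-ERP mapping described above assigns every sphere direction a texture coordinate on the equirectangular frame; this is what the generated vertex/UV arrays encode. A minimal sketch of the per-vertex mapping (longitude in [-180, 180], latitude in [-90, 90], u and v normalized to [0, 1]; the orientation conventions are assumptions):

```javascript
// Equirectangular (ERP) texture coordinate for a sphere direction given in
// degrees of longitude and latitude; v = 0 is the top (north pole) of the frame.
function erpUV(lonDeg, latDeg) {
  return {
    u: (lonDeg + 180) / 360, // longitude spans the full frame width
    v: (90 - latDeg) / 180,  // latitude spans the frame height
  };
}
```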
Because panoramic video has a 360-degree field of view while the user only sees the picture inside the view area during actual viewing, the pictures in the other areas go unused, lowering the utilization rate of the transmitted panoramic video data. Transmitting only the relevant video blocks based on the user's current view therefore becomes an effective way to improve transmission efficiency. The core principle is to divide the original panoramic video along a grid into smaller video blocks, encode and store the blocks independently, and number each block. The player requests the required blocks according to the position of the viewer's current field of view, so only specific blocks need to be transmitted, and these are used to compose the user's field of view. This involves video processing techniques such as blocking, encoding and streaming of panoramic video, for which many software libraries and toolkits are available.
FFmpeg is a collection of libraries and tools for processing multimedia content such as audio, video, subtitles and related metadata. The libavcodec library provides numerous video and audio encoders; the libavformat library performs streaming, packaging and similar operations on audio and video; and the libavfilter library provides many audio/video filters, supporting operations such as splitting, scaling and image transformation. These core audio/video processing libraries are integrated into the command-line tool ffmpeg, with which audio and video resources can be processed directly through command-line instructions.
Video transmission technology aims to deliver video files over a network. In the past, online video was mainly transmitted via RTMP (Real-Time Messaging Protocol), a Flash-based real-time streaming protocol, and some online video platforms that transmit video with RTMP still exist today. However, the Flash plug-in is no longer maintained, fewer and fewer devices support the protocol each year, and Flash-based video is no longer suitable for delivering video to users. The RTMP protocol is gradually being replaced by transmission protocols such as HLS (HTTP Live Streaming) and DASH (Dynamic Adaptive Streaming over HTTP). These deliver content via standard HTTP Web servers, which means no special infrastructure is required: any standard Web server or CDN will work. They operate by splitting the video into shorter segments, and they support adaptive bit rates, meaning the client device and server can dynamically detect the user's network bandwidth and adjust the video quality accordingly. In addition, they support video codecs such as H.264, HEVC/H.265 and VP9.
DASH-SRD (Spatial Relationship Description) is a method proposed by the MPEG organization on the basis of DASH for streaming a spatial portion of a video to the terminal display device; it can naturally be combined with the adaptive-bit-rate delivery that DASH supports. SRD extends the Media Presentation Description (MPD) of DASH by describing the spatial relationships between related pieces of video content. This enables the viewing terminal to select and request only those portions of the video that directly improve the user's quality of experience (QoE). DASH-SRD provides effective support for transmitting scalable wide-view video and likewise for transmitting blocks of panoramic video.
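For illustration, an MPD fragment using the SRD scheme might look as follows (a hypothetical example, not taken from the patent: the SupplementalProperty value lists source_id, x, y, w, h and the total grid width W and height H, here marking the block at grid cell (1, 0) of a 4 x 3 layout; the Representation attributes are assumptions):

```xml
<AdaptationSet mimeType="video/mp4">
  <!-- urn:mpeg:dash:srd:2014 value = "source_id,x,y,w,h,W,H" -->
  <SupplementalProperty schemeIdUri="urn:mpeg:dash:srd:2014"
                        value="0,1,0,1,1,4,3"/>
  <Representation id="tile_1_0" codecs="avc1.64001f"
                  width="480" height="360" bandwidth="1000000"/>
</AdaptationSet>
```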
Based on the above, the Web-based blocked panoramic video processing method specifically comprises: representing the panoramic video field of view and selecting blocks on the Web platform; decoding and synchronizing multiple panoramic video blocks on the Web platform; and splicing the panoramic video blocks and rendering textures on the Web platform.
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. In the drawings, the same reference numerals denote the same or similar components, or the same or similar steps.
Fig. 2 is a flowchart illustrating a method for processing a Web-based tiled panoramic video according to an embodiment of the present invention, and as shown in fig. 2, the method at least includes steps S10 to S40.
Step S10: the method comprises the steps of obtaining an original first panoramic video, generating a plurality of second panoramic videos with different resolutions from the first panoramic video, carrying out video blocking, slicing and packaging on each second panoramic video, and generating an MPD index file containing video slice information.
In this step, the first panoramic video may be a video captured by an image capture device. To allow the panoramic player to request panoramic video from the server in units of blocks, the original first panoramic video must be processed into blocked panoramic media resources supported by the Web platform before the panoramic video resources are released. Given the limitations of Web platform network bandwidth and processing performance, second panoramic videos in quality versions of different resolutions are generated first.
Referring to fig. 3, the scale filter of the ffmpeg toolkit can be used to generate from the original first panoramic video multiple (for example, 6) quality versions whose resolutions step down from the original resolution at a fixed ratio; the command is ffmpeg -i input.mp4 -vf scale=<width>:<height> output.mp4, where the parameters width and height are the width and height of the resolution of the processed second panoramic video, respectively. Video coding uses the H.264 format. The processed second panoramic video is then cut in a grid pattern; for cutting, the crop filter provided by the ffmpeg toolkit can be used, and the video block at the required position is cut out of the second panoramic video with the command ffmpeg -i input.mp4 -vf crop=<width>:<height>:<x>:<y> output.mp4, where the parameters width and height are the width and height of the output video block, and x and y are the horizontal and vertical coordinates of the block's reference point in the original video. In this example, the second panoramic video is divided into 12 video blocks in a 4 × 3 grid, so that an excessive number of blocks does not overburden parallel decoding, synchronization, and other tasks on the Web terminal. To play the blocked panoramic video at adaptive bitrates on the Web platform, the video blocks further need to be sliced and packaged. Specifically, the MPEG-DASH protocol can be used to transmit the panoramic video blocks between the resource server and the Web browser. The MP4Box open-source software library is used to generate the final DASH (Dynamic Adaptive Streaming over HTTP) resources; the processing command is MP4Box -dash 5000 -frag 5000 -rap -frag-rap -profile dashavc264:onDemand ver1.mp4 ver2.mp4 ver3.mp4 audio.m4a -out video.mpd, which packages multiple quality versions of a video block into DASH resources and generates the corresponding .mpd file. Finally, the generated DASH resources are deployed to a Web server, and the Web browser requests the required panoramic video blocks by downloading and parsing the .mpd file.
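The preprocessing pipeline described above (scale into quality versions, then crop each version into a 4 × 3 grid of blocks) can be sketched as a small script that only assembles the command strings; the file names, the number of quality versions, and the equal-step resolution ladder are illustrative assumptions, not values fixed by the method.

```javascript
// Build the ffmpeg command strings for the preprocessing pipeline:
// several scaled quality versions, each cut into a rows x cols grid of blocks.
function buildPreprocessCommands(srcWidth, srcHeight, versions, cols, rows) {
  const cmds = [];
  for (let v = 1; v <= versions; v++) {
    // Equal-step resolution ladder; version `versions` keeps the original size.
    const w = Math.round((srcWidth * v) / versions);
    const h = Math.round((srcHeight * v) / versions);
    cmds.push(`ffmpeg -i input.mp4 -vf scale=${w}:${h} ver${v}.mp4`);
    const tw = Math.floor(w / cols), th = Math.floor(h / rows);
    for (let r = 0; r < rows; r++)
      for (let c = 0; c < cols; c++)
        cmds.push(`ffmpeg -i ver${v}.mp4 -vf crop=${tw}:${th}:${c * tw}:${r * th} ver${v}_block${r}_${c}.mp4`);
  }
  return cmds;
}

const cmds = buildPreprocessCommands(3840, 1920, 2, 4, 3);
console.log(cmds[0]); // ffmpeg -i input.mp4 -vf scale=1920:960 ver1.mp4
```

The generated commands would then be run against the actual source file, followed by the MP4Box packaging step described in the text.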
Step S20: taking a first coordinate on a spherical surface of a first sphere as a position coordinate of virtual image acquisition equipment, enabling the virtual image acquisition equipment to shoot towards the center of the first sphere, taking an intersection point of an extension line of a connecting line of the first coordinate and the center of the sphere and a spherical surface of a second sphere as a visual field center coordinate of a user, acquiring a center position coordinate of each video block, and determining a plurality of first video blocks located in the visual field of the user based on a distance between the center position coordinate of each video block and the visual field center coordinate; the second sphere is concentric with the first sphere, and the radius of the second sphere is larger than that of the first sphere.
In this step, the first sphere is used to form a spherical orbit of the virtual image capture device, and the second sphere is used as a sphere model for rendering the panoramic picture. And the virtual image capturing device may specifically be a virtual camera.
Since the blocking panoramic player needs to selectively transmit the video blocks required by the current view according to the current view position of the user (i.e. the video blocks located in the current view of the user), it is necessary to acquire information related to the view position of the user on the Web platform and determine a blocked video list which needs to be transmitted finally based on the information. In order to facilitate the presentation of the viewing field position of the viewer, as shown in fig. 4, the virtual camera for cropping the viewing field range in the Web3D frame is disposed on a spherical orbit of a first sphere with a radius of 1, and the shooting direction of the virtual camera is directed toward the center of sphere. Meanwhile, the sphere model of the second sphere used for rendering the panorama picture and the spherical surface of the virtual camera orbit have the same spherical center, but the radius of the second sphere is much larger than that of the first sphere.
Further, the position coordinate of the virtual camera is represented by its spherical coordinate (θ, φ), so the first coordinate is (θ, φ); using θ and φ as variables makes them convenient to read and modify. Since the shooting direction of the virtual camera faces the sphere center, the panoramic sphere position actually shot (the view center) lies on the extension of the line from the virtual camera position through the sphere center, and the spherical coordinate (θ', φ') of the view center can be calculated as the antipodal point:

θ' = π − θ, φ' = (φ + π) mod 2π.
The user's view center coordinate is used to select which video blocks lie within the field of view; however, the position information of the video blocks is two-dimensional while the view center coordinate is three-dimensional, so the three-dimensional view center coordinate needs to be converted into a two-dimensional representation. Specifically,

u' = W · φ' / (2π), v' = H · θ' / π

converts the three-dimensional coordinate into the two-dimensional coordinate, and the converted two-dimensional view center is expressed as (u', v'), where W and H are the width and height, respectively, of the rectangle in the ERP (equirectangular projection).
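The two coordinate steps above, finding the antipodal view center from the camera position and projecting it onto the ERP rectangle, can be sketched as follows. The equirectangular mapping and the antipodal relation are reconstructions of formulas that appear only as images in the published document, so treat them as an assumption rather than the patented formulas verbatim.

```javascript
// A camera at spherical (theta, phi) shoots through the sphere centre, so the
// view centre is the antipodal point (PI - theta, phi + PI).
function viewCenter(theta, phi) {
  return { thetaP: Math.PI - theta, phiP: (phi + Math.PI) % (2 * Math.PI) };
}

// Map the view-centre spherical coordinate (thetaP in [0, PI], phiP in
// [0, 2*PI)) onto the W x H ERP rectangle: u' tracks azimuth, v' polar angle.
function viewCenterToErp(thetaP, phiP, W, H) {
  return { u: (W * phiP) / (2 * Math.PI), v: (H * thetaP) / Math.PI };
}
```

For example, a camera on the equator at φ = 0 yields a view center that projects to the middle of the ERP rectangle.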
In an embodiment, determining a plurality of first video blocks located in the user's field of view based on the distance between the center position coordinate of each video block and the view center coordinate specifically includes: judging the specific position of the view center on the second sphere; when the view center is located in the non-polar (equatorial) region of the second sphere, calculating the distance between the center position coordinate of the video block and the view center coordinate using the Euclidean distance formula; when the view center is located near the two poles of the second sphere, calculating that distance using the Manhattan distance formula; and judging whether the calculated distance is smaller than a distance threshold, and when it is, taking the corresponding video block as a first video block located in the user's field of view.
In the above embodiment, whether a video block is within the user's field of view is determined from the distance between the block's center position coordinate and the view center coordinate. The distance is calculated as:

d_i = sqrt((ω_i − u')² + (ψ_i − v')²)  (Euclidean distance, view center in a non-polar region)
d_i = |ω_i − u'| + |ψ_i − v'|  (Manhattan distance, view center near a pole)

where (ω_i, ψ_i) is the center position coordinate of the i-th video block, (u', v') is the view center position coordinate, R is the radius of the circular field of view, and H is the height of the rectangle in the ERP (equirectangular projection). According to this formula, when the user's field of view lies in a non-polar region of the sphere, the view region is treated as a circular region of radius R, the Euclidean distance is used, and whether a video block needs to be requested is decided by whether the block center lies within this circular view region. When the field of view lies near the two poles of the sphere, the data of the polar regions contains a large amount of redundancy due to the characteristics of ERP projection, and the view in the polar regions of the second sphere must be covered by more blocks. To accommodate this, the FOV (field of view) in the polar regions is treated as a scalable rectangular range with an initial length of 2R and an initial width of R, which is enlarged as the distance between the view center and the pole decreases, until it equals the length of the ERP rectangle; here R is the radius of the circular field of view.
After the distance between each video block and the view center is calculated, a distance threshold is further set to limit the range of the user's field of view: when the distance between a block center and the view center is less than or equal to the distance threshold, the block is judged to be within the field of view and is transmitted; otherwise the video block is not transmitted. Generally, the larger the distance threshold, the greater the number of video blocks that fall within the user's field of view.
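The threshold test can be sketched as a pure selection function. This simplified version applies a single threshold R in both branches and omits the scalable polar rectangle described above; the block list and coordinates are illustrative.

```javascript
// Return the blocks whose centre lies within distance R of the view centre
// (u, v): Euclidean distance in the non-polar band, Manhattan near a pole.
function blocksInView(blocks, u, v, R, H) {
  const nearPole = v < R || v > H - R; // view centre close to top/bottom of ERP
  return blocks.filter(({ x, y }) => {
    const dx = Math.abs(x - u), dy = Math.abs(y - v);
    const d = nearPole ? dx + dy : Math.hypot(dx, dy);
    return d <= R;
  });
}

const blocks = [
  { id: 0, x: 3, y: 54 }, // Euclidean distance 5 from (0, 50): inside
  { id: 1, x: 6, y: 50 }, // distance 6: outside
];
console.log(blocksInView(blocks, 0, 50, 5, 100).map(b => b.id)); // [ 0 ]
```

Near a pole the Manhattan metric is stricter than the Euclidean one for the same R, which compensates somewhat for the ERP stretching there.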
Step S30: and creating players with the same number as the video blocks, downloading and decoding MPD index files corresponding to the first video blocks based on a plurality of players corresponding to the first video blocks, and synchronizing the playing progress of the first video blocks.
Compared with non-blocked video playing, the blocked panoramic video player needs to simultaneously process downloading, caching, decoding and playing of multiple paths of video streams, and besides, the playing progress of the blocks needs to be synchronized so as to avoid obvious block boundaries caused by inconsistent progress in a visual field. In the Web platform, since only the JavaScript language is supported, the existing video processing software library cannot be directly used to perform processing such as decoding, splicing, and synchronizing on multiple channels of video, which is one of the main problems of the Web platform in realizing transmission of the blocked panoramic video. In this step, the downloading, buffering, and decoding of the multi-way chunked panoramic video stream may be implemented based on the < video > tag in the HTML5 standard, which is widely supported by the Web platform.
For example, creating players equal in number to the video blocks, and downloading and decoding the MPD index file corresponding to each first video block via the players corresponding to the first video blocks, includes: creating a tag container on the Web panoramic player interface and building a tag matrix based on the number of video blocks, with each player serving as a tag object; creating a media source extension object and binding it to each player object; and downloading and decoding the MPD index file corresponding to each first video block through the media source extension object. Synchronizing the playing progress of the plurality of first video blocks specifically comprises: selecting one of the players as a reference player, obtaining the playing progress of the video block played by the reference player, broadcasting the reference player's current playing progress to the other players, and having the other players update the playing progress of their corresponding video blocks based on the received progress.
Specifically, as shown in fig. 5, a hidden < div > tag container is first created on the panoramic player interface, and a < video > tag matrix whose size matches the number of video blocks is created inside the < div > container through the document.createElement('video') interface; that is, 12 players are ultimately created for downloading, decoding, and playing the video blocks. The objects corresponding to these players form a videoObj array, from which the current downloading, buffering, and playing status of all video blocks can be obtained, and a playback listening event is registered on each player object to detect whether the video has been decoded and can be rendered and played. To further control the downloading, caching, and playback quality of the video blocks, an MSE object is created and bound to each player object, where MSE (Media Source Extensions) is a more capable Web API introduced by the W3C specification; through the interfaces provided by the MSE object, the video object can be controlled directly, enabling operations such as cache-size control, bitrate control, and playback-progress control. This is implemented on top of the open-source library dash.js, which creates the MSE object and binds it to the player object through its initialize(videoObj) interface. Settings such as cache size, cache policy, and video quality version are specified through the updateSettings interface, and these MSE objects form the array MSEObjs. Finally, based on the list of first video blocks within the user's field of view determined in step S20, the player objects are manipulated through the corresponding MSE objects in the MSEObjs array to download and decode the selected first video blocks.
After downloading and decoding of the plurality of first video blocks is in place, the playing progress of the blocks is further synchronized, mainly by synchronizing the duration attribute of each player object. As shown in fig. 6, one player is first selected as the reference player; it continuously downloads and decodes its video block data regardless of whether that block appears in the field of view, so as to maintain the current reference playing progress. A synchronization object obtains the latest playing progress from the reference player and registers listeners for the ondurationchange event of the other players; a player's callback fires a progress-change notification to the synchronization object whenever its playing progress, playback state, and the like change. On receiving such a callback, the synchronization object broadcasts the reference player's current playing progress to all other players, which update their actual duration parameters as needed. During panoramic video playback, the players corresponding to the video blocks continually generate ondurationchange events, actively driving the other players to fetch and synchronize the current playing progress, thereby synchronizing playback across the multiple block videos on the Web platform.
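The reference-player synchronization scheme can be sketched with plain objects standing in for the < video > elements; the drift tolerance and field names are illustrative assumptions, not part of the original description.

```javascript
// Broadcast the reference player's progress; followers whose progress has
// drifted beyond `tolerance` seconds snap back to the reference time.
function syncToReference(reference, followers, tolerance = 0.1) {
  for (const p of followers) {
    if (Math.abs(p.currentTime - reference.currentTime) > tolerance) {
      p.currentTime = reference.currentTime; // jump to the reference progress
    }
  }
}

const ref = { currentTime: 12.0 };
const players = [{ currentTime: 12.02 }, { currentTime: 11.3 }];
syncToReference(ref, players);
console.log(players.map(p => p.currentTime)); // [ 12.02, 12 ]
```

In the real player this function would run inside the ondurationchange callbacks, with the reference player never being reassigned mid-playback.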
Step S40: and splicing and stitching the plurality of first video blocks, and performing video rendering on the spliced and stitched panoramic video.
After the decoding and synchronization of the blocked panoramic video are completed in step S30, the video pictures are not rendered by each player individually; instead, the pictures of the block videos are first spliced into a complete panoramic image, which is finally rendered on the screen. Because the Web platform cannot directly splice and stitch multiple block videos, the present invention applies the Canvas capability of the Web platform. Specifically, splicing and stitching the plurality of first video blocks and rendering the spliced panoramic video includes: creating a canvas on the Web platform and drawing the image corresponding to each first video block onto the canvas; and creating a sphere model in the panoramic scene and mapping the panoramic image on the canvas onto the surface of the sphere model.
In terms of Web platform video playing technology, from the early to the late 2000s, video playback on the Web depended mainly on the Flash plug-in. To fill the gap it left, video and audio playback was introduced in a new version of the HTML standard, and the < video > tag was finally added in HTML5. It links a video resource directly into the current page and provides APIs to control playing, pausing, playback speed, and so on. However, the API capabilities originally provided by HTML5 are limited, so the W3C specification introduced the more capable Web API MSE (Media Source Extensions); through the interfaces it provides, the video object can be controlled directly, enabling more complex control functions such as cache control, bitrate control, and playback-progress control.
The Web image rendering aspect mainly involves the Canvas- and WebGL-related technologies and software libraries. Canvas is part of HTML5; it provides a canvas carrier and allows users to dynamically draw 2D or 3D graphics using the JavaScript language. WebGL is a Web version of OpenGL, a 3D engine that helps users perform 3D operations in a Web browser. In short, Canvas provides image-drawing capability for the Web, while WebGL provides three-dimensional graphics capability. They are used in 2D and 3D animations and games, and can likewise be used for rendering panoramic video.
In one embodiment, as shown in fig. 7, a canvas is first created through the document.createElement('canvas') function and its height and width are specified. The brush-controlling context object is obtained through the canvas object's getContext('2d') interface, and the context object's drawImage(video, x, y, width, height) method is then called to draw the image in the video player directly onto the 2D canvas, where x and y are the position coordinates of the image's upper-left corner on the canvas and width and height are the image's width and height on the canvas. Finally, after all blocks have been drawn, a complete rectangular image is obtained, in which the parts drawn by blocks contain pictures and the parts not drawn remain black.
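The drawImage placement of each block follows directly from its grid position; a sketch of the layout computation (the 4 × 3 grid matches the example above, while the canvas dimensions are an illustrative assumption):

```javascript
// Compute where block (row, col) of a rows x cols grid lands on a
// canvasWidth x canvasHeight canvas, i.e. the (x, y, width, height)
// arguments passed to context.drawImage(video, x, y, width, height).
function blockLayout(row, col, rows, cols, canvasWidth, canvasHeight) {
  const width = canvasWidth / cols, height = canvasHeight / rows;
  return { x: col * width, y: row * height, width, height };
}

console.log(blockLayout(2, 3, 3, 4, 4096, 1536));
// { x: 3072, y: 1024, width: 1024, height: 512 }
```

In the browser, this result would be used per block inside the render loop, so blocks missing from the field of view simply leave their canvas region black.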
In order to render the panoramic image, a sphere model first needs to be drawn in the panoramic scene; on this basis, the panoramic image drawn on the Canvas is used directly as a texture material and mapped onto the sphere, thereby rendering the panoramic video.
In another embodiment of the present invention, a method for processing a Web-based tiled panoramic video further includes: acquiring input operation of a mouse, and changing the position coordinate of the virtual image acquisition equipment based on the input of the mouse; the input operation of the mouse is a mouse pressing event, a mouse moving event or a mouse releasing event.
Specifically, acquiring an input operation of a mouse, and changing a position coordinate of the virtual image capturing device based on the input of the mouse includes: acquiring the current position coordinate and the initial position coordinate of the mouse, and calculating the offset of the mouse based on the current position coordinate and the initial position coordinate of the mouse; calculating a new position coordinate of the virtual image acquisition equipment based on the offset, the position coordinate of the virtual image acquisition equipment corresponding to the initial position coordinate of the mouse and a preset view switching sensitivity value; and taking the new position coordinate of the virtual image acquisition equipment obtained by calculation as the current position coordinate of the virtual image acquisition equipment.
Calculating the new position coordinate of the virtual image acquisition device based on the offset, the position coordinate of the virtual image acquisition device corresponding to the initial position coordinate of the mouse, and a preset view switching sensitivity value includes: converting the position coordinate of the virtual image acquisition device corresponding to the initial position coordinate of the mouse into a first longitude-latitude coordinate; calculating the product of the offset and the preset view switching sensitivity value, and taking the sum of this product and the first longitude-latitude coordinate as the second longitude-latitude coordinate of the virtual image acquisition device; and converting the second longitude-latitude coordinate into a spherical coordinate to serve as the new position coordinate of the virtual image acquisition device.
This embodiment realizes switching and interaction of the panoramic video view on the Web platform. That is, the APIs provided by the Web platform are used to listen for the user's mouse and keyboard input, and the coordinate position of the virtual camera used to crop the rendered content in the panoramic scene is changed based on that input. Taking mouse input as an example, listeners for the three events mousedown, mousemove, and mouseup are added through the Web platform's addEventListener interface; the screen coordinate (x, y) of the mouse when an event fires can be obtained from the event's parameters, and the process of controlling the virtual camera coordinate from mouse events is as follows:
(1) When a mousedown event occurs, it indicates that the viewer may next press and slide the mouse to switch the field of view; the current mouse coordinate is recorded as the initial mouse coordinate (x_start, y_start), and the interactive control mode is entered, in which any sliding of the mouse affects the view position.
(2) When a mousemove event occurs, the latest mouse position (x_curr, y_curr) is obtained; this latest position can also be regarded as the current mouse coordinate, so the offset from the point where the mouse was pressed to the current position can be calculated as (x_curr - x_start, y_curr - y_start).
For convenience of description, the offsets of the mouse in the x and y directions are denoted Δx and Δy, respectively. Since the position of the virtual camera corresponding to the initial mouse coordinate is the spherical coordinate (θ, φ), the mouse offset cannot be superimposed on the spherical coordinate directly. The spherical coordinate is therefore converted to an approximate first longitude-latitude coordinate (lat, lon), and the offset, multiplied by the view-switching sensitivity dpi, is then superimposed on it. The conversion from the spherical coordinate to the first longitude-latitude coordinate is

lat = π/2 − θ, lon = φ,
And the calculation formula of the second longitude and latitude coordinate (lat ', lon') corresponding to the new position coordinate of the virtual camera is as follows:
Figure BDA0003517174800000142
Finally, the newly calculated second longitude-latitude coordinate is converted back into a spherical coordinate and used as the new position coordinate of the virtual camera, thereby changing the virtual camera's coordinate position.
(3) When a mouseup event occurs, it indicates that the user is no longer switching the panoramic view with the mouse; the mouse interaction control mode is exited, and subsequent mouse movement no longer affects the panoramic view.
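The drag-to-pan update in steps (1)-(3) can be sketched as a pure function. The longitude-latitude conversion is a reconstruction of a formula published only as an image, the latitude clamp is an added safeguard, and dpi is the sensitivity constant from the text; all three should be read as assumptions.

```javascript
// Given the camera's spherical coordinate (theta, phi), the mouse offset
// (dx, dy) in pixels, and a sensitivity dpi, return the new spherical
// coordinate: convert to (lat, lon), add the scaled offset, convert back.
function panCamera(theta, phi, dx, dy, dpi) {
  const lat = Math.PI / 2 - theta; // approximate latitude
  const lon = phi;                 // approximate longitude
  // Clamp latitude so the camera cannot flip over a pole.
  const lat2 = Math.max(-Math.PI / 2, Math.min(Math.PI / 2, lat + dpi * dy));
  const lon2 = lon + dpi * dx;
  return { theta: Math.PI / 2 - lat2, phi: lon2 };
}
```

A horizontal drag leaves θ unchanged and shifts φ by dpi · Δx, matching the superposition formula above.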
The above embodiments show that the Web-based blocked panoramic video processing method solves the following problems: how to process a panoramic video into blocked, streamable media resources suitable for transmission, decoding, and playback on the Web platform; how to abstract and represent the user's panoramic field of view with the interfaces provided by the Web platform, so that the video blocks to be transmitted can be selected according to the viewer's current view; how to decode the panoramic video blocks and synchronize their playing progress on the Web platform; how to splice the pictures of the panoramic video blocks into a complete panoramic picture on the Web platform and render the panoramic view on the screen; and how to switch the panoramic view by reading the user's mouse and keyboard operations on the Web platform.
Correspondingly, the present invention also provides a Web-based tiled panoramic video processing system, which includes a processor and a memory, wherein the memory stores computer instructions, and the processor is configured to execute the computer instructions stored in the memory, and when the computer instructions are executed by the processor, the system implements the steps of the method according to any of the above embodiments.
In addition, the invention also discloses a computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the method according to any of the above embodiments.
According to the method and the system for processing the partitioned panoramic video based on the Web, disclosed by the invention, on one hand, the utilization rate of panoramic video transmission data can be effectively improved, the whole data volume of panoramic video transmission is reduced, the transmission bandwidth requirement is reduced, and the bandwidth cost is saved. On the other hand, a higher-quality panoramic view can be provided under the same transmission bandwidth, and the problems of low view quality, difficulty in transmission of high-definition panoramic videos and the like in panoramic video application are solved to a certain extent. In addition to the above, the method for processing the partitioned panoramic video based on the Web also solves the problems that the existing partitioned panoramic video player scheme is not suitable for a Web platform and is difficult to implement, realizes the application of the panoramic video player in a Web application mode, has greater advantages in the aspects of application distribution, multi-terminal platform adaptation, installation-free starting and the like compared with desktop application, and is beneficial to the vigorous development of the partitioned panoramic video technology and the wide application of related software.
Those of ordinary skill in the art will appreciate that the various illustrative components, systems, and methods described in connection with the embodiments disclosed herein may be implemented as hardware, software, or combinations of both. Whether this is done in hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments mentioned in this patent describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
Features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments and/or in combination with or instead of the features of the other embodiments in the present invention.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A Web-based tiled panoramic video processing method, the method comprising:
obtaining an original first panoramic video, generating from the first panoramic video a plurality of second panoramic videos with different resolutions, tiling, slicing, and packaging each second panoramic video, and generating an MPD index file containing video slice information;
taking a first coordinate on the spherical surface of a first sphere as the position coordinate of a virtual image acquisition device, pointing the virtual image acquisition device toward the center of the first sphere, taking the intersection of the extension of the line connecting the first coordinate and the sphere center with the spherical surface of a second sphere as the view-center coordinate of a user, acquiring the center position coordinate of each video tile, and determining a plurality of first video tiles located within the user's field of view based on the distance between the center position coordinate of each video tile and the view-center coordinate; wherein the second sphere is concentric with the first sphere and has a larger radius than the first sphere;
creating players equal in number to the video tiles, downloading and decoding the MPD index file corresponding to each first video tile via the plurality of players corresponding to the plurality of first video tiles, and synchronizing the playing progress of the plurality of first video tiles;
and stitching the plurality of first video tiles together, and rendering the stitched panoramic video.
2. The Web-based tiled panoramic video processing method according to claim 1, further comprising:
acquiring an input operation of a mouse, and changing the position coordinate of the virtual image acquisition device based on the mouse input; wherein the input operation of the mouse is a mouse-down event, a mouse-move event, or a mouse-up event.
3. The Web-based tiled panoramic video processing method according to claim 2, wherein acquiring an input operation of a mouse and changing the position coordinate of the virtual image acquisition device based on the mouse input comprises:
acquiring the current position coordinate and the initial position coordinate of the mouse, and calculating the offset of the mouse based on the current position coordinate and the initial position coordinate;
calculating a new position coordinate of the virtual image acquisition device based on the offset, the position coordinate of the virtual image acquisition device corresponding to the initial position coordinate of the mouse, and a preset view-switching sensitivity value;
and taking the calculated new position coordinate as the current position coordinate of the virtual image acquisition device.
4. The Web-based tiled panoramic video processing method according to claim 3, wherein calculating a new position coordinate of the virtual image acquisition device based on the offset, the position coordinate of the virtual image acquisition device corresponding to the initial position coordinate of the mouse, and a preset view-switching sensitivity value comprises:
converting the position coordinate of the virtual image acquisition device corresponding to the initial position coordinate of the mouse into a first longitude-latitude coordinate;
calculating the product of the offset and the preset view-switching sensitivity value, and taking the sum of the product and the first longitude-latitude coordinate as a second longitude-latitude coordinate of the virtual image acquisition device;
and converting the second longitude-latitude coordinate into a spherical coordinate, which serves as the new position coordinate of the virtual image acquisition device.
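As an illustrative, non-limiting sketch of the computation in claims 3 and 4: the camera position is converted to longitude/latitude, the mouse offset scaled by the sensitivity value is added, and the result is converted back to a spherical (Cartesian) coordinate. All names and constants here (`SENSITIVITY`, `RADIUS`, the latitude clamp) are assumptions for the example, not values taken from the patent:

```javascript
const SENSITIVITY = 0.25; // assumed: degrees of rotation per pixel of mouse travel
const RADIUS = 100;       // assumed: radius of the first sphere (camera distance from center)

// Convert a camera position on the first sphere to [longitude, latitude] in degrees.
function toLonLat([x, y, z]) {
  const lat = Math.asin(y / RADIUS) * 180 / Math.PI;
  const lon = Math.atan2(x, z) * 180 / Math.PI;
  return [lon, lat];
}

// Convert [longitude, latitude] in degrees back to a Cartesian point on the sphere.
function toCartesian([lon, lat]) {
  const phi = lat * Math.PI / 180, theta = lon * Math.PI / 180;
  return [
    RADIUS * Math.cos(phi) * Math.sin(theta),
    RADIUS * Math.sin(phi),
    RADIUS * Math.cos(phi) * Math.cos(theta),
  ];
}

// Claims 3-4: offset = current mouse position - initial mouse position;
// new lon/lat = lon/lat at drag start + offset * sensitivity.
function moveCamera(startPos, mouseStart, mouseNow) {
  const dx = mouseNow[0] - mouseStart[0];
  const dy = mouseNow[1] - mouseStart[1];
  const [lon, lat] = toLonLat(startPos);
  const newLat = Math.max(-89, Math.min(89, lat + dy * SENSITIVITY)); // keep off the poles
  return toCartesian([lon + dx * SENSITIVITY, newLat]);
}
```

Working from the drag-start position rather than accumulating per-event deltas avoids drift when mouse-move events arrive out of order.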
5. The Web-based tiled panoramic video processing method according to claim 1, wherein determining a plurality of first video tiles located within the user's field of view based on the distance between the center position coordinate of each video tile and the view-center coordinate comprises:
determining where the view center lies on the second sphere;
when the view center lies on the equator of the second sphere, calculating the distance between the center position coordinate of each video tile and the view-center coordinate with the Euclidean distance formula;
when the view center lies at a pole of the second sphere, calculating the distance between the center position coordinate of each video tile and the view-center coordinate with the Manhattan distance formula;
and judging whether the calculated distance is smaller than a distance threshold; when it is, taking the corresponding video tile as a first video tile located within the user's field of view.
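A minimal sketch of the visibility test in claim 5, under the assumption that tile centers and the view center are expressed as [longitude, latitude] pairs and that "near a pole" is approximated by a latitude cutoff; the threshold, the `POLE_LAT` cutoff, and the planar treatment of coordinates are all illustrative choices, not taken from the patent:

```javascript
// Euclidean distance between two [lon, lat] points (used near the equator).
function euclidean([x1, y1], [x2, y2]) {
  return Math.hypot(x1 - x2, y1 - y2);
}

// Manhattan distance between two [lon, lat] points (used near the poles).
function manhattan([x1, y1], [x2, y2]) {
  return Math.abs(x1 - x2) + Math.abs(y1 - y2);
}

// Return the indices of tiles whose center is within `threshold` of the view center.
function visibleTiles(centres, viewCentre, threshold, POLE_LAT = 60) {
  const nearPole = Math.abs(viewCentre[1]) >= POLE_LAT; // assumed pole criterion
  const dist = nearPole ? manhattan : euclidean;
  return centres
    .map((c, i) => [i, dist(c, viewCentre)])
    .filter(([, d]) => d < threshold)
    .map(([i]) => i);
}
```

This sketch ignores longitude wrap-around at ±180°, which a production implementation would have to handle.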
6. The Web-based tiled panoramic video processing method according to claim 1, wherein creating players equal in number to the video tiles, and downloading and decoding the MPD index file corresponding to each first video tile via the plurality of players corresponding to the plurality of first video tiles comprises:
creating a tag container on the Web panoramic player interface, and establishing a tag matrix based on the number of video tiles, wherein each player serves as a tag object;
creating media source extension objects, and binding a created media source extension object to each player object;
and downloading and decoding the MPD index file corresponding to each first video tile based on the media source extension objects.
7. The Web-based tiled panoramic video processing method according to claim 1, wherein synchronizing the playing progress of the plurality of first video tiles comprises:
selecting one of the plurality of players as a reference player, acquiring the playing progress of the video tile played by the reference player, broadcasting the current playing progress of the reference player to the other players, and having the other players update the playing progress of their corresponding video tiles based on the received current playing progress of the reference player.
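Claim 7's broadcast scheme can be sketched as follows; `Player` is a plain stand-in for an HTML5 `<video>` element, and the drift tolerance is an assumed value rather than one specified by the patent:

```javascript
const DRIFT_TOLERANCE = 0.1; // seconds; assumed resync threshold

// Stand-in for a per-tile <video> element: only the property we need.
class Player {
  constructor() { this.currentTime = 0; }
}

// Broadcast the reference player's progress; any other player that has
// drifted beyond the tolerance is seeked back to the reference time.
function syncToReference(players, referenceIndex = 0) {
  const ref = players[referenceIndex];
  for (const p of players) {
    if (p === ref) continue;
    if (Math.abs(p.currentTime - ref.currentTime) > DRIFT_TOLERANCE) {
      p.currentTime = ref.currentTime; // seek the lagging or leading tile
    }
  }
}
```

Leaving players inside the tolerance untouched matters in practice: seeking a `<video>` element forces a decoder flush, so resyncing every tile on every tick would cause visible stutter.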
8. The Web-based tiled panoramic video processing method according to claim 1, wherein stitching the plurality of first video tiles together and rendering the stitched panoramic video comprises:
creating a canvas on the Web platform, and drawing the images corresponding to the first video tiles onto the canvas;
and creating a sphere model in the panoramic scene, and mapping the panoramic image on the canvas onto the spherical surface of the sphere model.
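The placement arithmetic behind claim 8's stitching step can be sketched as a pure function: each tile index maps to one cell of a shared canvas, which is then used as the sphere-model texture. The grid dimensions and canvas size below are illustrative assumptions:

```javascript
// Compute the destination rectangle of tile `index` in a cols x rows grid
// laid out row-major over a canvas of canvasW x canvasH pixels.
function tileRect(index, cols, rows, canvasW, canvasH) {
  const w = canvasW / cols, h = canvasH / rows;
  const col = index % cols, row = Math.floor(index / cols);
  return { x: col * w, y: row * h, w, h };
}

// In the browser, each decoded tile video would then be drawn with
//   const r = tileRect(i, cols, rows, canvas.width, canvas.height);
//   ctx.drawImage(videoEl, r.x, r.y, r.w, r.h);
// before handing the canvas to the renderer as the sphere texture.
```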
9. A Web-based tiled panoramic video processing system comprising a processor and a memory, characterized in that the memory stores computer instructions, the processor is configured to execute the computer instructions stored in the memory, and the computer instructions, when executed by the processor, implement the steps of the method according to any one of claims 1 to 8.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
CN202210169847.7A 2022-02-23 2022-02-23 Web-based block panoramic video processing method, system and storage medium Active CN114513702B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210169847.7A CN114513702B (en) 2022-02-23 2022-02-23 Web-based block panoramic video processing method, system and storage medium


Publications (2)

Publication Number Publication Date
CN114513702A true CN114513702A (en) 2022-05-17
CN114513702B CN114513702B (en) 2023-03-24

Family

ID=81554089

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210169847.7A Active CN114513702B (en) 2022-02-23 2022-02-23 Web-based block panoramic video processing method, system and storage medium

Country Status (1)

Country Link
CN (1) CN114513702B (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101852980A (en) * 2010-06-09 2010-10-06 长春理工大学 Method for interactively playing panoramic video stream on CAVE projection system
CN106023070A (en) * 2016-06-14 2016-10-12 北京岚锋创视网络科技有限公司 Real-time panoramic splicing method and device
US20180033176A1 (en) * 2016-07-28 2018-02-01 Cyberlink Corp. Systems and methods for rendering effects in 360 video
CN107945231A (en) * 2017-11-21 2018-04-20 江西服装学院 A kind of 3 D video playback method and device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MEI Yuanqiao, et al.: "A Tile-Based Panoramic Video Coding Method for Multi-Projection Display", Computer Applications and Software *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115103023A (en) * 2022-06-14 2022-09-23 北京字节跳动网络技术有限公司 Video caching method, device, equipment and storage medium
CN115103023B (en) * 2022-06-14 2024-04-05 北京字节跳动网络技术有限公司 Video caching method, device, equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant