WO2023071469A1 - Video processing method, electronic device and storage medium - Google Patents

Video processing method, electronic device and storage medium

Info

Publication number
WO2023071469A1
WO2023071469A1 (PCT/CN2022/114283)
Authority
WO
WIPO (PCT)
Prior art keywords
video
sub
processing method
saliency
server
Prior art date
Application number
PCT/CN2022/114283
Other languages
French (fr)
Chinese (zh)
Inventor
许静
孟宇
王剑楠
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2023071469A1


Classifications

    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION (under H: ELECTRICITY; H04: ELECTRIC COMMUNICATION TECHNIQUE)
    • H04N19/102: adaptive coding of digital video signals, characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/167: adaptive coding characterised by the position within a video image, e.g. region of interest [ROI]
    • H04N19/176: adaptive coding in which the coding unit is an image region, the region being a block, e.g. a macroblock
    • H04N19/597: predictive coding specially adapted for multi-view video sequence encoding
    • H04N21/218: selective content distribution; source of audio or video content, e.g. local disk arrays
    • H04N21/2343: processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/258: client or end-user data management, e.g. managing client capabilities, user preferences or demographics

Definitions

  • The present application relates to, but is not limited to, the field of visual technology, and in particular to a video processing method, an electronic device, and a storage medium.
  • VR: virtual reality
  • Virtual reality technology has developed to the point of being widely known. As the main carrier of VR-related resources and the main content consumed by VR users, 360-degree panoramic video based on VR is being accepted and consumed by more and more users.
  • 360-degree panoramic video is usually shot by multiple lenses simultaneously; the captured content is then corrected for distortion and stitched to form complete panoramic content.
  • As the resolution of VR content grows higher and higher, the increasing amount of data also increases the computation required for encoding and decoding and the transmission bandwidth occupied, which greatly increases the consumption of computing and storage resources and affects the viewing quality of the video.
  • Embodiments of the present application provide a video processing method, an electronic device, and a storage medium.
  • An embodiment of the present application provides a video processing method applied to a server.
  • The video processing method includes: acquiring the original video of a film source; performing a saliency calculation on the original video to obtain saliency distribution information; dividing the original video into blocks to obtain multiple sub-videos; and encoding and compressing the sub-videos according to the saliency distribution information.
  • An embodiment of the present application provides a video processing method applied to a playback terminal. The method includes: sending a playback request for a film source to a server so that the server determines the film source; receiving the encoded and compressed sub-videos corresponding to the film source from the server, where the sub-videos are obtained by the server performing a saliency calculation on the original video of the film source to obtain the saliency distribution information of the original video, dividing the original video into blocks to obtain multiple sub-videos, and encoding and compressing the sub-videos according to the saliency distribution information; and decoding the encoded and compressed sub-videos to obtain the playback video.
  • An embodiment of the present application provides an electronic device, which includes a memory and a processor. The memory stores a computer program, and when the processor executes the computer program, the video processing method of any embodiment of the first aspect of the present application is implemented.
  • An embodiment of the present application provides a computer-readable storage medium that stores a program; when the program is executed by a processor, the video processing method of any embodiment of the first aspect of the present application is implemented.
  • FIG. 1 is a schematic flowchart of a video processing method provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a video processing method provided by another embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a video processing method provided in another embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a video processing method provided in another embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a video processing method provided by another embodiment of the present application.
  • FIG. 6 is a schematic flowchart of a video processing method provided by another embodiment of the present application.
  • FIG. 7 is a schematic flowchart of a video processing method provided by another embodiment of the present application.
  • FIG. 8 is a schematic flowchart of a video processing method provided by another embodiment of the present application.
  • FIG. 9 is a schematic diagram of an electronic device provided by an embodiment of the present application.
  • Orientation descriptions such as up, down, front, back, left, and right indicate orientations or positional relationships based on those shown in the drawings. They are used only to facilitate and simplify the description of the present application, and do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation; they should therefore not be construed as limiting the embodiments of the present application.
  • "Multiple" means more than two; "greater than", "less than", "exceeding", etc. are understood as excluding the stated number, while "above", "below", "within", etc. are understood as including it. Descriptions such as "first" and "second" serve only to distinguish technical features and cannot be understood as indicating or implying relative importance, the number of the indicated technical features, or their sequence.
  • the embodiment of the present application provides a video processing method, electronic equipment and storage medium.
  • the video processing method can be applied in servers and playback terminals.
  • Based on the predicted saliency distribution information, the divided sub-videos are encoded and compressed differently, so that the size of each encoded and compressed sub-video corresponds to the characteristics of user behavior; this achieves low-bandwidth video transmission while reducing computing and storage resource consumption.
  • the embodiment of the present application provides a video processing method applied to a server.
  • the video processing method in the embodiment of the present application includes but not limited to step S110, step S120, step S130 and step S140.
  • Step S110 acquiring the original video of the film source.
  • the server first obtains the original video of the film source.
  • the film source can be an on-demand film source or a live film source for playback by the playback terminal.
  • The film source can be obtained in any form; the encoding/compression format, file format, and encapsulation format of the injected film source are not limited. The original video is the video source that has not yet been processed by the video processing method of this application.
  • The server in this embodiment can be a video server, and the original video can be a panoramic video or another type of video. When the original video is a panoramic video, the video processing method of this embodiment can be applied to the field of VR technology, and the processed video can be played by a VR playback terminal.
  • The video processed by the video processing method can also be played by playback terminals such as mobile phones, tablet computers, and video players.
  • the original video may be referred to as a panoramic video or a 360-degree panoramic video.
  • Step S120 performing saliency calculation on the original video to obtain saliency distribution information on the original video.
  • The server uses a saliency prediction algorithm to perform a saliency calculation on the 360-degree panoramic video, obtaining a two-dimensional saliency distribution matrix whose dimensions correspond to the length and width (or resolution) of the panoramic video.
  • The length, width, or resolution of the obtained matrix can be the same as that of the original video. The matrix is used as the saliency distribution information of the original video and marks the saliency value of each pixel of the panoramic video.
  • The main purpose of the saliency calculation is to obtain a prediction of user behavior.
  • The saliency prediction algorithm can obtain the saliency value of each pixel in the panoramic video; this embodiment does not limit the specific algorithm. Different saliency prediction algorithms give results of different prediction accuracy, and a matching saliency prediction algorithm can be selected and applied in the video processing method.
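As an illustration of how such a per-pixel saliency matrix might be computed, the sketch below uses the spectral-residual method; the patent does not fix a particular saliency prediction algorithm, so both the choice of method and the normalisation are assumptions made here for the example.

```python
import numpy as np

def saliency_map(frame: np.ndarray) -> np.ndarray:
    """Per-pixel saliency via the spectral-residual approach.

    `frame` is a 2-D grayscale array; the result has the same shape as
    the input, matching the "two-dimensional saliency distribution
    matrix" whose dimensions equal those of the video frame.
    """
    spectrum = np.fft.fft2(frame.astype(np.float64))
    log_amp = np.log1p(np.abs(spectrum))      # log-amplitude spectrum
    phase = np.angle(spectrum)
    # Local average of the log-amplitude (3x3 box filter via shifts).
    avg = sum(np.roll(np.roll(log_amp, dy, 0), dx, 1)
              for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
    residual = log_amp - avg                  # the "spectral residual"
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return sal / sal.max()                    # normalise to [0, 1]
```

The map can then be refreshed per key frame and fed to the block-weighting step.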
  • 360-degree panoramic video also introduces some new features compared with traditional flat video.
  • Traditional flat video is usually shot with a single lens.
  • 360-degree panoramic video, by contrast, is usually shot by multiple lenses simultaneously; the content is then distortion-corrected and stitched to form complete 360-degree panoramic content. In the subsequent transmission and storage process, the panoramic content is mapped onto a plane in a non-uniform manner for compression, encoding, and transmission.
  • The head-movement data of different users watching the same 360-degree panoramic content shows a high degree of consistency: the areas different users tend to watch, and how long they watch them, are similar. Furthermore, in an immersive viewing environment, after quickly observing the entire scene, users tend to fixate on certain areas. Given these characteristics of user behavior, a saliency calculation on the complete 360-degree panoramic video can produce a two-dimensional saliency distribution matrix corresponding to them.
  • Saliency, also known as visual saliency, is an important visual feature of a video or image, reflecting the degree of importance the human eye attaches to certain areas of the image.
  • For a given video, the user is only interested in some areas, i.e., the areas the user tends to watch. These areas of interest represent the user's query intention, while the other, uninteresting areas are irrelevant to it; the areas characterized by saliency are the areas of the video that most arouse the user's interest and best express the video content.
  • Step S130 divide the original video into blocks to obtain multiple sub-videos.
  • The server divides the panoramic video into blocks (tiles) to obtain a sub-video for each block; the blocks, and therefore the sub-videos, can be of the same or different sizes.
  • The server can use slightly larger blocks in the directions the user's viewing angle tends toward in the panoramic video and smaller blocks elsewhere; this can be set according to actual needs to improve the encoding and decoding effect and the playback quality of the block sub-videos. The number of blocks can likewise be set according to actual needs.
  • For example, the server divides a panoramic video with a resolution of 3840x1920 into a 4x3 grid of blocks along its length and width, obtaining 12 sub-videos of 960x640 each. The server can adjust the number of blocks according to the bandwidth requirement: when the bandwidth achieved with the 12 sub-videos of this example meets the design requirement, no other block count is needed. The server can set the number of blocks according to actual needs or by an artificial intelligence algorithm.
  • The more blocks there are, the more sub-videos are obtained and the more pronounced the bandwidth reduction, but the required processing resources also increase.
  • The server can therefore set the number of blocks to satisfy both the bandwidth and the processing requirements; this application does not specifically limit it.
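The 4x3 split of a 3840x1920 frame described above can be sketched as follows; this operates on a single frame for clarity, whereas a real encoder would tile whole video streams.

```python
import numpy as np

def split_into_tiles(frame: np.ndarray, cols: int = 4, rows: int = 3):
    """Split a frame into a grid of equally sized tiles.

    With the 3840x1920 example from the description and a 4x3 grid,
    each of the 12 tiles is 960x640 pixels.
    """
    h, w = frame.shape[:2]
    th, tw = h // rows, w // cols        # tile height and width
    return [frame[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
            for r in range(rows) for c in range(cols)]
```

Unequal tile sizes (larger tiles toward the preferred viewing direction) would replace the uniform grid arithmetic with per-tile boundaries.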
  • A viewport-dependent (window-dependent) transmission mechanism is a necessary and feasible panoramic-video transmission scheme.
  • Under such a mechanism, the content the user is highly concerned about, i.e., the content of the window currently being watched, is transmitted in high quality, while the transmission quality of areas that will not be seen for the moment can be appropriately reduced, or those areas not transmitted at all.
  • This embodiment adopts a Tiled Streaming strategy: the uncompressed original high-resolution video is divided, frame by frame in the pixel domain, into several lower-resolution spatial block videos, and the sub-video of each block is encoded into independently decodable media content. Such a scheme usually requires encoding each block at different qualities.
  • In one embodiment, the playback terminal downloads high-quality block content from the server for the area within the user's viewing angle, and low-quality block content for the areas outside it.
  • A block in this embodiment can be understood as a Tile in High Efficiency Video Coding (HEVC): an image is divided horizontally and vertically into several rectangular areas, and these rectangular areas are called blocks. The blocks can be those defined by MPEG HEVC encoding, which is not specifically limited in this application.
  • HEVC: High Efficiency Video Coding
  • Step S140: encoding and compressing the sub-videos according to the saliency distribution information.
  • The server encodes and compresses the sub-videos according to the saliency distribution information; it can compress and encode all of the obtained sub-videos, or only the sub-videos of some areas.
  • The main purpose of this embodiment is to obtain a prediction of user behavior through the saliency calculation. Based on the predicted saliency distribution information, the divided sub-videos are encoded and compressed differently, so that the size of each encoded sub-video corresponds to the characteristics of user behavior; low-bandwidth video transmission is thus achieved while reducing computing and storage resource consumption.
  • After encoding and compression, the sub-videos of the areas the user tends to watch have higher video quality, while the sub-videos at other positions have lower quality. Since the areas the user tends to watch occupy a small proportion of the panoramic video, the number of high-quality sub-videos is smaller than the number of low-quality ones. Compared with the original video, the compressed video obtained by the video processing method of this embodiment can therefore be transmitted at low bandwidth while reducing computing and storage resource consumption.
  • step S120 may also include but not limited to step S210 and step S220.
  • Step S210 acquiring the first frame or key frame of the original video.
  • Step S220 performing saliency calculation on the original video according to the first frame or key frame, to obtain saliency distribution information on the first frame or key frame.
  • The saliency distribution information obtained by the server is calculated from the first frame or a key frame of the original video; the server obtains that frame according to preset requirements.
  • The key frame can be the most salient image frame of the original video, selected by the user or developer according to actual needs.
  • The server obtains an image of the original video at that frame and performs the saliency calculation on it; the calculation result is then applied to the other image frames of the panoramic video.
  • The saliency distribution information obtained from the first frame or key frame can represent the predicted user behavior for the whole original video, and is applied to the image frames of the entire video for encoding and compression.
  • The server can refresh the saliency distribution information periodically or aperiodically according to actual needs, reacquiring a key frame to calculate new saliency distribution information; this application does not specifically limit this.
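A minimal sketch of the key-frame reuse described above: the saliency map is computed only at key frames and reused for the frames in between. The fixed refresh period is an assumption for the example; the description allows both periodic and aperiodic refreshing.

```python
def saliency_per_frame(frames, compute_saliency, refresh_every=250):
    """Recompute saliency only at key frames (here: every
    `refresh_every`-th frame, an assumed period) and reuse the result
    for the following frames, as in steps S210/S220."""
    current, out = None, []
    for i, frame in enumerate(frames):
        if i % refresh_every == 0:
            current = compute_saliency(frame)  # e.g. the map from step S120
        out.append(current)
    return out
```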
  • step S140 may also include but not limited to step S310 to step S330.
  • step S310 the saliency weight value corresponding to each sub-video is obtained according to the saliency distribution information.
  • step S320 the encoding parameters of each sub-video are obtained according to the saliency weight value.
  • Step S330 encoding and compressing the corresponding sub-video according to the encoding parameters.
  • The server calculates the saliency weight of the sub-video of each block: according to the saliency distribution information it obtains the saliency value of each sub-video and the total saliency value of the whole panoramic video, and the saliency weight value of each sub-video is the ratio of that sub-video's saliency value to the total.
  • The server then assigns different encoding parameters according to each sub-video's saliency weight value, so video files of different qualities are obtained. After the video processing method of this embodiment, the resulting compressed video, compared with the original video, achieves low-bandwidth transmission while reducing computing and storage resource consumption.
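The weight computation in steps S310/S320 follows directly from the description: each tile's weight is its saliency mass divided by the total saliency of the panoramic frame.

```python
import numpy as np

def tile_saliency_weights(sal_map: np.ndarray, cols: int = 4, rows: int = 3):
    """Weight of each tile = (sum of saliency values inside the tile) /
    (total saliency of the whole map); the weights sum to 1."""
    h, w = sal_map.shape
    th, tw = h // rows, w // cols
    total = sal_map.sum()
    return [sal_map[r * th:(r + 1) * th, c * tw:(c + 1) * tw].sum() / total
            for r in range(rows) for c in range(cols)]
```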
  • step S320 may also include but not limited to steps S410 to S440.
  • Step S410 acquiring the pre-allocated total quality parameter corresponding to the original video.
  • Step S420 acquiring a preset quality parameter corresponding to the sub-video.
  • Step S430: obtaining the pre-allocated quality parameter of the sub-video according to the pre-allocated total quality parameter and the saliency weight value.
  • Step S440: determining the encoding parameters of the sub-video according to the pre-allocated quality parameter and the preset quality parameter.
  • The server sets different encoding parameters for the sub-videos according to their different quality parameters. The server obtains the pre-allocated total quality parameter corresponding to the panoramic video.
  • The total quality parameter to be set is smaller than the original quality parameter of the original video; the preset quality parameter corresponding to each sub-video is also obtained.
  • The preset quality parameter can be the encoder's preset minimum or maximum value for the sub-video. The server then distributes the pre-allocated total quality parameter to each sub-video according to its saliency weight value to obtain each sub-video's pre-allocated quality parameter, and determines the sub-video's encoding parameters from the relationship between the pre-allocated quality parameter and the preset quality parameter.
  • The quality parameter in this embodiment may be the video bit rate, or another video coding parameter that affects image quality, such as the quantization parameter (QP), key frame interval (GOP), resolution, or frame rate.
  • QP: quantization parameter
  • GOP: key frame interval (group of pictures)
  • The server compares the pre-allocated quality parameter with the preset quality parameter and chooses the appropriate one as the encoding parameter.
  • Depending on the quality parameter used, the encoding parameters obtained by the server differ.
  • For example, when the quality parameter is the bit rate, the preset quality parameter is a preset bit rate, which is related to the encoding device for the sub-video and has a certain limit. When that limit is reached, the server can no longer assign a higher quality parameter to the sub-video, so it uses the preset quality parameter as the encoding parameter to encode and compress the sub-video.
  • As another example, when the quality parameter is the resolution, the preset quality parameter is a preset resolution related to the size of the block's sub-video; the server can then use the pre-allocated quality parameter as the encoding parameter to encode and compress the sub-video.
  • In general, the server can use either the pre-allocated quality parameter or the preset quality parameter as the encoding parameter for encoding and compressing the sub-video; the above are only examples and are not intended to limit the embodiments of the application.
  • step S320 may also include but not limited to step S510 and step S520.
  • Step S510: obtaining a difference according to the pre-allocated quality parameter and the preset quality parameter.
  • Step S520: obtaining an updated pre-allocated total quality parameter according to the difference and the pre-allocated total quality parameter.
  • After obtaining the pre-allocated quality parameter of each sub-video, the server computes the difference between the pre-allocated quality parameter and the preset quality parameter, and uses this difference together with the original pre-allocated total quality parameter to calculate an updated pre-allocated total quality parameter.
  • The pre-allocated total quality parameter can be a preset parameter whose size corresponds to the bandwidth; the updated total obtained through the difference correction can then be smaller, realizing low-bandwidth video transmission.
  • The updated pre-allocated total quality parameter can also increase, but for the overall panoramic video the proportion of sub-videos whose pre-allocated quality parameter exceeds the preset quality parameter is not high, so on the whole the updated total is reduced compared with the original pre-allocated total quality parameter.
  • In some cases the updated pre-allocated total quality parameter may equal the original one, but it will still be smaller than the original quality parameter of the original video; this embodiment does not specifically limit this.
  • In one embodiment, a rate selection algorithm is used to determine the specific encoding bit rate of each sub-video. The steps of the rate selection algorithm are as follows:
  • 1) R_i = R_total × w_i, where i denotes the i-th sub-video and w_i is the saliency weight value corresponding to the i-th sub-video.
  • 2) If R_i is below the minimum bit rate R_low corresponding to the sub-video, R_i is set to R_low and the difference is recorded.
  • 3) If R_i is above the maximum bit rate R_high corresponding to the sub-video, R_i is set to R_high and the difference is recorded.
  • In some embodiments, the differences from steps 2) and 3) may not be used to update the pre-allocated total quality parameter.
  • For example, when the preset quality parameter is smaller than the pre-allocated quality parameter, the server can use the preset quality parameter as the encoding parameter of the corresponding sub-video; for the remaining sub-videos, the values already allocated are subtracted from the pre-allocated total quality parameter, and the remaining total is then distributed according to the saliency weight values of the remaining sub-videos to obtain their pre-allocated quality parameters and hence their encoding parameters. Likewise, when the preset quality parameter is greater than the pre-allocated quality parameter, the server can use the pre-allocated quality parameter as the encoding parameter of the corresponding sub-video and update the pre-allocated total quality parameter accordingly.
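Putting the rate-selection steps together: R_i = R_total * w_i, clamped to the per-tile limits R_low and R_high, with the clamping differences fed back into the pre-allocated total (steps S510/S520). Treating the feedback as a simple subtraction is an assumption made here; the description leaves the exact update rule open.

```python
def select_tile_bitrates(r_total, weights, r_low, r_high):
    """R_i = R_total * w_i, clamped to [R_low, R_high]; the accumulated
    clamping difference is used to refresh the pre-allocated total."""
    rates, saved = [], 0.0
    for w in weights:
        ideal = r_total * w
        rate = min(max(ideal, r_low), r_high)
        saved += ideal - rate        # >0 when capped down, <0 when raised
        rates.append(rate)
    updated_total = r_total - saved  # smaller total when tiles were capped
    return rates, updated_total
```

With a 1000 kbps total, weights (0.5, 0.3, 0.2), and a 400 kbps per-tile cap, the most salient tile is capped and the 100 kbps saved lowers the refreshed total.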
  • the video processing method in the embodiment of the present application may further include but not limited to step S610 and step S620.
  • Step S610: acquiring the video encapsulation protocol.
  • Step S620: performing streaming-media transmission encapsulation on the encoded and compressed sub-videos according to the encapsulation protocol.
  • the server needs to perform streaming media transmission encapsulation on the encoded video.
  • the server can obtain the video encapsulation protocol according to the video playback device and perform streaming media transmission encapsulation on the encoded and compressed sub-videos according to the different encapsulation protocols; for example, if an encapsulation protocol requires periodic encapsulation, the server completes the encapsulation of the encoded and compressed sub-videos once within each period.
  • the encapsulation protocols include but are not limited to HLS, DASH, MSS, etc.; the embodiments of this application place no specific limitation on them.
  • the server generates a playback index file for the sub-videos according to the encoded and compressed sub-videos, adds block information fields of the sub-videos to the playback index file, and marks the block number and position information corresponding to each video file, for identification by the playback terminal.
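A server-side index of this kind can be sketched as below. The JSON layout and field names (`file`, `block`, `row`, `col`) are invented for illustration; a real deployment would emit an MPD or M3U8, and the row-major block-id assignment is an assumption.

```python
import json

def build_play_index(tile_files, cols):
    # tile_files: ordered list of encoded sub-video file names; block ids
    # are assigned row-major, and (row, col) records each block's position
    # so the player can map files back onto the frame grid.
    blocks = []
    for block_id, name in enumerate(tile_files):
        blocks.append({
            "file": name,
            "block": block_id,
            "row": block_id // cols,
            "col": block_id % cols,
        })
    return json.dumps({"blocks": blocks})

index = build_play_index([f"tile_{i}.mp4" for i in range(12)], cols=4)
```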
  • the embodiment of the present application also provides a video processing method, which is applied to the playback terminal.
  • the video processing method in the embodiment of the present application includes, but is not limited to, step S710, step S720 and step S730.
  • Step S710: send a play request for the film source to the server, so that the server determines the film source according to the play request.
  • Step S720: receive the encoded and compressed sub-videos corresponding to the film source sent by the server, where the encoded and compressed sub-videos are obtained by the server performing saliency calculation on the original video of the film source to obtain the saliency distribution information on the original video, dividing the original video into blocks to obtain multiple sub-videos, and then encoding and compressing the sub-videos according to the saliency distribution information.
  • Step S730: decode the encoded and compressed sub-videos to obtain the playback video.
  • the video processing method mentioned in this embodiment is applied on the playback terminal, and the encoded and compressed sub-videos in this embodiment are obtained by the video processing method executed by the server in the above embodiment, so the details are not repeated here. The playback terminal sends a play request for the film source to the server; after the server determines the corresponding film source according to the play request, the playback terminal receives the encoded and compressed sub-videos corresponding to the film source sent by the server, and decodes the encoded and compressed sub-videos to obtain the playback video.
  • the following takes the case where the original video is a 360-degree panoramic video as an example.
  • the playback terminal in this embodiment is a VR playback terminal, but this does not constitute a limitation on the embodiments of the application.
  • the following embodiments may refer to the original video as a panoramic video or a 360-degree panoramic video.
  • the main purpose of the saliency calculation is to obtain a prediction of user behavior; based on the saliency distribution information obtained from different predictions, the different sub-videos after division are encoded and compressed so that the size of each encoded and compressed sub-video corresponds to the characteristics of user behavior. The encoded and compressed sub-videos obtained by the playback terminal thus have different qualities according to the characteristics of the user's viewing position, realizing low-bandwidth video transmission and reducing the consumption of computing and storage resources.
  • the play request includes the area selected by the user within the viewing angle of the playback terminal, and the above step S720 may further include, but is not limited to, step S810 and step S820.
  • Step S810: according to the region corresponding to the selected area, determine the sub-videos corresponding to that region of the original video blocks.
  • Step S820: receive the encoded and compressed sub-videos corresponding to the selected area sent by the server.
  • the playback terminal can download all the sub-videos of the panoramic video from the server, or it can, according to the area selected by the user within the viewing angle of the playback terminal, determine and obtain only the sub-videos corresponding to that area of the original video blocks.
  • the selected area can be the area selected by the user, or the area where the user's perspective is located.
  • downloading only the correspondingly encoded and compressed sub-videos realizes low-bandwidth video transmission and reduces the consumption of computing and storage resources.
  • when the playback device is not a VR playback device but another type of playback device such as a mobile phone, the user can select a viewing area on the playback device, and this area serves as the selected area in the above embodiment, so that the playback device can obtain and decode the sub-videos in this area according to the user's selection, realizing low-bandwidth video transmission and reducing the consumption of computing and storage resources.
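Determining which blocks fall in the selected area amounts to a rectangle-overlap test against the tile grid. A minimal sketch under stated assumptions: an even cols×rows grid, viewport given in frame pixels, and no horizontal wrap-around at the 360-degree seam (a real panoramic player would have to handle the seam).

```python
def tiles_in_view(view, frame_w, frame_h, cols, rows):
    """Return block ids (row-major) of grid tiles overlapping a viewport.

    view is (x, y, w, h) in frame pixels; wrap-around is ignored for brevity.
    """
    vx, vy, vw, vh = view
    tw, th = frame_w // cols, frame_h // rows
    hit = []
    for r in range(rows):
        for c in range(cols):
            x0, y0 = c * tw, r * th
            # standard axis-aligned rectangle intersection test
            if x0 < vx + vw and x0 + tw > vx and y0 < vy + vh and y0 + th > vy:
                hit.append(r * cols + c)
    return hit
```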
  • Embodiment 1: this embodiment of the present application provides a panoramic video file processing method applied to the on-demand service; the specific steps are as follows:
  • the complete panoramic video film source for on-demand is injected into the video server; the encoding/compression format and file encapsulation format of the injected film source are not limited.
  • the video encoding format can be AVC/H.264, HEVC/H.265, etc.
  • the file encapsulation format can be MP4, MPEG-TS, etc.
  • 1.2) Use the saliency prediction algorithm to calculate the saliency of the 360-degree panoramic video, and obtain a two-dimensional matrix of saliency distribution with the same length and width (resolution) as the panoramic video.
  • taking a 4K panoramic video with a resolution of 3840x1920 as an illustration, the saliency prediction result is a two-dimensional matrix of size 3840x1920, and the two-dimensional matrix marks the saliency value of each pixel.
  • the main purpose of the saliency calculation here is to obtain a prediction of user behavior, and there is no restriction on the specific algorithm; different saliency prediction algorithms give different prediction accuracies, which may affect the final optimization results.
  • step 1.2), step 1.3) and step 1.4) can be performed for the entire video, or only for the first frame or some key frames, with the calculation results then applied to the other image frames in the video.
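Before bit rates can be assigned, the per-pixel saliency matrix from step 1.2) has to be reduced to one weight per block. The patent does not fix a reduction method, so the sketch below makes two assumptions: an even cols×rows grid, and a sum-then-normalize reduction over each block's pixels.

```python
def tile_weights(saliency, cols, rows):
    # saliency: H x W list of per-pixel saliency values (the 2D matrix
    # produced by the prediction step); returns one normalized weight
    # per block, row-major.
    h, w = len(saliency), len(saliency[0])
    th, tw = h // rows, w // cols
    sums = []
    for r in range(rows):
        for c in range(cols):
            sums.append(sum(saliency[y][x]
                            for y in range(r * th, (r + 1) * th)
                            for x in range(c * tw, (c + 1) * tw)))
    total = sum(sums)
    # fall back to uniform weights if the map is all zeros
    return [s / total for s in sums] if total else [1.0 / len(sums)] * len(sums)
```

For the 3840x1920 example this would be called with `cols=4, rows=3` (or whatever grid is chosen), and the resulting weights feed directly into the bit rate selection step.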
  • the specific value of the encoding parameter is determined.
  • the encoding parameter here can be the video bit rate, or other video encoding parameters that affect the image quality, such as quantization parameters, key frame intervals, resolution, frame rate, etc.
  • a bit rate selection algorithm is used to determine the specific encoding bit rate of each block.
  • the steps of the bit rate selection algorithm are as follows:
  • the minimum code rate corresponding to the sub-video is R low
  • the order of step 1.5.2) and step 1.5.3) can be changed.
  • in step 1.5), decode the original panoramic video, and then perform HEVC tile encoding based on MCTS (Motion-Constrained Tile Sets).
  • the compression bit rate is constrained to complete the encoding operation; in this example, 12 compressed videos that can be decoded completely independently are obtained through encoding.
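The patent states only that 12 independently decodable streams are produced, not the grid layout; assuming a 4×3 grid over the 3840x1920 frame (an assumption, not stated in the source) yields 12 tiles of 960x640 each. A small sketch of the tile geometry handed to the encoder:

```python
def tile_grid(frame_w, frame_h, cols, rows):
    # rectangles (x, y, w, h) for an even cols x rows tiling, row-major
    tw, th = frame_w // cols, frame_h // rows
    return [(c * tw, r * th, tw, th) for r in range(rows) for c in range(cols)]

tiles = tile_grid(3840, 1920, cols=4, rows=3)   # 12 tiles of 960x640
```

Each rectangle would then be encoded as one motion-constrained tile at the bit rate selected for it.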
  • Encapsulation protocols include but are not limited to HLS, DASH, and MSS, etc., and the block information field is added to the playback index file to mark the block number and location information corresponding to each video file for playback terminal identification.
  • Embodiment 2: this embodiment of the present application provides an interaction process between a VR playback terminal and a video server in a panoramic video on-demand service; the specific steps are as follows:
  • the VR playback terminal initiates a request to the video server for the playback index file index.mpd; index.mpd describes the name, storage path, corresponding block number, position, and other information of each video file.
  • the video server returns the playback index file index.mpd.
  • the VR playback terminal parses index.mpd, obtains video file information, and initiates a video file download request to the video server. It can download video files corresponding to all segments, or only download files corresponding to segments within the viewing angle range.
  • the video server sends the video file to the VR playback terminal.
  • the VR player terminal reorganizes, decodes and plays the video file.
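The client side of this interaction can be sketched as follows. The XML element and attribute names below are illustrative stand-ins, not the actual DASH MPD schema, and the index content is a toy inline string rather than a real server response.

```python
import xml.etree.ElementTree as ET

# Toy stand-in for index.mpd; a real MPD uses AdaptationSet/Representation
# elements rather than these invented names.
INDEX = """<index>
  <video file="tile_0.mp4" block="0" row="0" col="0"/>
  <video file="tile_1.mp4" block="1" row="0" col="1"/>
</index>"""

def parse_index(text):
    # extract per-file block number and position so the player can decide
    # which files cover the current viewing angle
    root = ET.fromstring(text)
    return [{"file": v.get("file"), "block": int(v.get("block")),
             "row": int(v.get("row")), "col": int(v.get("col"))}
            for v in root.findall("video")]

entries = parse_index(INDEX)
```

With the entries in hand, the player either downloads every file or filters them against the viewport-tile ids before issuing download requests.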
  • Embodiment 3: this embodiment of the present application provides a panoramic video file processing method applied to the live broadcast service; the specific steps are as follows:
  • the video server pulls or receives a 360-degree panoramic live stream.
  • the coding format of the live stream and the packaging protocol of the streaming transmission are not limited.
  • the video coding protocol can be AVC/H.264, HEVC/H.265, etc.
  • the encapsulation protocol of the streaming transmission can be RTSP, RTMP, HLS, etc.
  • the saliency prediction result is a two-dimensional matrix with a size of 3840x1920, and the two-dimensional matrix marks the saliency value of each pixel.
  • the above steps 3.2), 3.3) and 3.4) can be calculated for a certain key frame or some key frames of the live stream, and the calculation results are also applied to other image frames in the live stream.
  • the specific value of the encoding parameter is determined.
  • the encoding parameter here can be the video bit rate, or other video encoding parameters that affect the image quality, such as quantization parameters, key frame intervals, resolution, frame rate, etc.
  • a bit rate selection algorithm is used to determine the specific encoding bit rate of each block.
  • the steps of the bit rate selection algorithm are as follows:
  • the minimum code rate corresponding to the sub-video is R low
  • the order of step 3.5.2) and step 3.5.3) can be changed.
  • the panoramic video live stream in the period is decoded, and then HEVC tile encoding based on MCTS is performed.
  • the compression bit rate is constrained to complete the encoding operation; in this example, 12 compressed videos that can be decoded completely independently are obtained through encoding.
  • the encapsulation protocols include but are not limited to HLS, DASH, MSS, etc.; the block information field is added to the playback index file to mark the block number and position information corresponding to each video file, for identification by the playback terminal.
  • Embodiment 4: this embodiment of the present application provides an interaction process between a VR playback terminal and a video server in a panoramic video live broadcast service; the specific steps are as follows:
  • the VR playback terminal periodically initiates a request to the video server to play the index file index.mpd.
  • index.mpd describes the name, storage path, corresponding block number, location and other information of the video files in the last few cycles.
  • the video server returns the latest playback index file index.mpd.
  • the VR playback terminal parses index.mpd, obtains video file information, and initiates a video file download request to the video server. It can download video files corresponding to all segments, or only download files corresponding to segments within the viewing angle range.
  • the video server sends the video file to the VR playback terminal.
  • the VR player terminal reorganizes, decodes and plays the video file.
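Because the live index is refreshed periodically and describes only the last few periods, the player needs to fetch just the files it has not already downloaded. A minimal sketch; the function name and list-based bookkeeping are assumptions for illustration.

```python
def segments_to_fetch(index_files, already_have):
    # index_files: file names listed in the latest index.mpd (covering the
    # last few periods); return only those not yet downloaded, preserving
    # index order so playback stays sequential.
    have = set(already_have)
    return [f for f in index_files if f not in have]
```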
  • the server in the above embodiments has video encoding and decoding and storage capabilities, and the playback terminal has panoramic video decoding and playback capabilities.
  • FIG. 9 shows an electronic device 100 provided by an embodiment of the present application.
  • the electronic device 100 includes: a processor 101 , a memory 102 , and a computer program stored on the memory 102 and operable on the processor 101 , and the computer program is used to execute the above video processing method when running.
  • the processor 101 and the memory 102 may be connected through a bus or in other ways.
  • the memory 102, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs and non-transitory computer-executable programs, such as those implementing the video processing method described in the embodiments of the present application.
  • the processor 101 implements the above video processing method by running the non-transitory software programs and instructions stored in the memory 102 .
  • the memory 102 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created when executing the aforementioned video processing method.
  • the memory 102 may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one storage device, a flash memory device or other non-transitory solid-state storage device.
  • the memory 102 may optionally include memory remotely located relative to the processor 101, and such remote memory may be connected to the electronic device 100 through a network; examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the non-transitory software programs and instructions required to implement the above video processing method are stored in the memory 102; when they are executed by the processor 101, the above video processing method is performed, for example, the method steps in FIG. 1.
  • the embodiment of the present application also provides a computer-readable storage medium storing computer-executable instructions, and the computer-executable instructions are used to execute the above-mentioned video processing method.
  • the computer-readable storage medium stores computer-executable instructions, and the computer-executable instructions are executed by one or more control processors, for example, performing method steps S110 to S140 in FIG. 1, method steps S210 to S220 in FIG. 2, method steps S310 to S330 in FIG. 3, method steps S410 to S440 in FIG. 4, method steps S510 to S520 in FIG. 5, method steps S610 to S620 in FIG. 6, method steps S710 to S730 in FIG. 7, and method steps S810 to S820 in FIG. 8.
  • the video processing method in the embodiment of the present application can be applied to a server or a playback terminal.
  • the server divides the original video into blocks to obtain multiple sub-videos, and then encodes and compresses the sub-videos according to the saliency distribution information to obtain the encoded and compressed sub-videos; after the playback terminal sends a play request to the server, it receives the encoded and compressed sub-videos corresponding to the film source sent by the server, and the playback terminal decodes them to obtain the playback video.
  • the different sub-videos obtained after division are encoded and compressed so that the size of each encoded and compressed sub-video corresponds to the characteristics of user behavior; low-bandwidth video transmission can thus be realized while the consumption of computing and storage resources is reduced.
  • the device embodiments described above are only illustrative, and the units described as separate components may or may not be physically separated, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Abstract

The present application discloses a video processing method, an electronic device and a storage medium. The video processing method used in a server comprises: acquiring an original video of a film source (S110); calculating the saliency of the original video to obtain information about saliency distribution on the original video (S120); dividing the original video into blocks to obtain multiple sub-videos (S130); and encoding and compressing the sub-videos according to the saliency distribution information (S140).

Description

Video processing method, electronic device and storage medium
Cross-Reference to Related Applications
This application is based on, and claims priority to, Chinese patent application No. 202111239873.4 filed on October 25, 2021, the entire content of which is hereby incorporated into this application by reference.
Technical Field
The present application relates to, but is not limited to, the field of visual technology, and in particular to a video processing method, an electronic device, and a storage medium.
Background
Virtual reality (VR) technology is now widely known, and as the main carrier of virtual-reality-related resources and the main content consumed by virtual reality users, VR-based 360-degree panoramic video is being accepted and consumed by more and more users. Whereas traditional flat video is usually shot with a single lens, 360-degree panoramic video is usually shot with multiple lenses simultaneously, after which the content is distortion-corrected and stitched into complete panoramic content. In the related art, as VR resolution grows higher and higher, the increase in data volume correspondingly increases the amount of computation in processing flows such as encoding and decoding, so the transmission bandwidth occupied becomes higher and higher, greatly increasing the consumption of computing and storage resources and affecting the viewing quality of the video.
Summary
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of the claims.
Embodiments of the present application provide a video processing method, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present application provides a video processing method applied to a server. The video processing method includes: acquiring an original video of a film source; performing saliency calculation on the original video to obtain saliency distribution information on the original video; dividing the original video into blocks to obtain multiple sub-videos; and encoding and compressing the sub-videos according to the saliency distribution information.
In a second aspect, an embodiment of the present application provides a video processing method applied to a playback terminal. The video processing method includes: sending a play request for a film source to a server, so that the server determines the film source according to the play request; receiving the encoded and compressed sub-videos corresponding to the film source sent by the server, where the encoded and compressed sub-videos are obtained by the server performing saliency calculation on the original video of the film source to obtain saliency distribution information on the original video, dividing the original video into blocks to obtain multiple sub-videos, and then encoding and compressing the sub-videos according to the saliency distribution information; and decoding the encoded and compressed sub-videos to obtain a playback video.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor, when executing the computer program, implements the video processing method described in any one of the embodiments of the first aspect or the second aspect of the present application.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where the storage medium stores a program, and the program is executed by a processor to implement the video processing method described in any one of the embodiments of the first aspect or the second aspect of the present application.
Additional features and advantages of the application will be set forth in the description that follows, and in part will be apparent from the description, or may be learned by practice of the application. The objectives and other advantages of the application can be realized and attained by the structures particularly pointed out in the description, the claims, and the accompanying drawings.
Description of the Drawings
The accompanying drawings are provided for a further understanding of the technical solution of the present application and constitute a part of the specification. Together with the embodiments of the present application, they serve to explain the technical solution of the present application and do not constitute a limitation on it.
FIG. 1 is a schematic flowchart of a video processing method provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of a video processing method provided by another embodiment of the present application;
FIG. 3 is a schematic flowchart of a video processing method provided by another embodiment of the present application;
FIG. 4 is a schematic flowchart of a video processing method provided by another embodiment of the present application;
FIG. 5 is a schematic flowchart of a video processing method provided by another embodiment of the present application;
FIG. 6 is a schematic flowchart of a video processing method provided by another embodiment of the present application;
FIG. 7 is a schematic flowchart of a video processing method provided by another embodiment of the present application;
FIG. 8 is a schematic flowchart of a video processing method provided by another embodiment of the present application;
FIG. 9 is a schematic diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
In the description of the present application, it should be understood that orientation descriptions such as up, down, front, back, left, and right indicate orientations or positional relationships based on those shown in the drawings; they are only for convenience and simplification of description, and do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation, and therefore should not be construed as limiting the embodiments of the present application.
It should be understood that, in the description of the embodiments of the present application, "multiple" (or "a plurality of") means two or more; "greater than", "less than", "exceeding", etc. are understood as excluding the number itself, while "above", "below", "within", etc. are understood as including the number itself. Terms such as "first" and "second" are only used to distinguish technical features and cannot be understood as indicating or implying relative importance, the number of the indicated technical features, or the order of the indicated technical features.
In the description of the embodiments of the present application, unless otherwise expressly defined, terms such as "set", "install", and "connect" should be understood in a broad sense, and those skilled in the art can reasonably determine the specific meaning of these terms in the embodiments of the present application in combination with the specific content of the technical solution.
With the maturing of the related upstream and downstream industry chains and the development of software and hardware technologies in key links, immersive virtual reality technology is becoming familiar to more and more people. As the main carrier of virtual-reality-related resources and the main content consumed by virtual reality users, 360-degree panoramic video starts from 4K VR, with 8K VR as the future development trend, possibly advancing to 12K VR, 24K VR, and beyond. 8K VR occupies more than a hundred megabits of bandwidth, and the increase in data volume correspondingly increases the amount of computation in processing flows such as encoding and decoding. Therefore, how to guarantee that users watch high-quality VR content while reducing bandwidth has become an urgent problem for the further deployment of VR applications.
On this basis, embodiments of the present application provide a video processing method, an electronic device, and a storage medium. The video processing method can be applied in a server and a playback terminal. The main purpose of the saliency calculation is to obtain a prediction of user behavior; based on the saliency distribution information obtained from different predictions, the different sub-videos after division are encoded and compressed so that the size of each encoded and compressed sub-video corresponds to the characteristics of user behavior, thereby realizing low-bandwidth video transmission while reducing the consumption of computing and storage resources.
A detailed description is given below.
An embodiment of the present application provides a video processing method applied to a server. Referring to FIG. 1, the video processing method in this embodiment includes, but is not limited to, step S110, step S120, step S130, and step S140.
Step S110: acquire the original video of the film source.
In one embodiment, the server first acquires the original video of the film source. The film source can be an on-demand film source or a live film source for playback by the playback terminal, and can be obtained in any form; the encoding/compression format and file encapsulation format of the injected film source are not limited. The original video is the video source before being processed by the video processing method of this application. The server in this embodiment can be a video server, and the original video can be a panoramic video or another type of video. When the original video is a panoramic video, the video processing method of this embodiment can be applied in the field of VR technology, and the processed video can be played by a VR playback terminal; when the original video is another type of video, the processed video can be played by playback terminals such as mobile phones, tablet computers, and video players. The embodiments of this application take the case where the original video is a 360-degree panoramic video as an example, but this does not constitute a limitation; the following embodiments may refer to the original video as a panoramic video or a 360-degree panoramic video.
Step S120: perform saliency calculation on the original video to obtain saliency distribution information on the original video.
In one embodiment, the server uses a saliency prediction algorithm to calculate the saliency of the 360-degree panoramic video and obtains a two-dimensional saliency distribution matrix corresponding to the length and width (resolution) of the panoramic video. It should be noted that the length, width, or resolution of the obtained saliency distribution matrix can be the same as that of the original video; the matrix serves as the saliency distribution information on the original video and marks the saliency value of each pixel of the panoramic video. The main purpose of the saliency calculation here is to obtain a prediction of user behavior; through the saliency prediction algorithm of this embodiment, the saliency value of every pixel in the panoramic video can be obtained. The embodiments of this application place no specific restriction on the algorithm; it can be understood that different saliency prediction algorithms give different prediction accuracies, and a matching saliency prediction algorithm can be selected and applied in the video processing method.
As a new kind of media content, 360-degree panoramic video introduces some new characteristics compared with traditional flat video. On the capture side, traditional flat video is usually shot with a single lens, whereas 360-degree panoramic video is usually shot with multiple lenses simultaneously; the captured content is then distortion-corrected and stitched into complete 360-degree panoramic content. In subsequent transmission and storage, the 360-degree panoramic content is mapped onto a plane in a non-uniform manner for compression coding and transmission. Compared with traditional flat video, these processing steps introduce additional quality loss into 360-degree panoramic video. When the video content is played on a playback terminal, the user typically wears a head-mounted display. Unlike traditional flat video, which presents the complete content directly at the center of the user's field of view, a user watching a 360-degree panoramic video can only view local content one field of view at a time, and can autonomously select the viewing region by actions such as turning the head. The immersive viewing mode isolates external visual interference, while the high degree of freedom and local visibility mean that the user's perceived quality is influenced mainly by local content.
It should be noted that, according to analyses of immersive viewing behavior, the head-movement data of different users watching the same 360-degree panoramic video content show a high degree of consistency: the regions different users tend to watch and the time they dwell on them are similar. Furthermore, in an immersive viewing environment, after quickly scanning the whole scene, users tend to fixate on certain regions. Based on this consistency, combined with the aforementioned dominant influence of local content on perceived quality, the embodiments of the present application compute saliency over the complete 360-degree panoramic video according to the characteristics of user behavior, obtaining a two-dimensional saliency distribution matrix that corresponds to those characteristics. Saliency is an important visual feature of a video or image, reflecting the degree of attention the human eye pays to certain regions. For a given video, the user is interested only in some regions, i.e., the regions the user tends to watch; these regions of interest represent the user's intent, while the other regions are irrelevant to it. The regions characterized by high saliency are those that most attract the user's interest and best represent the video content.
Step S130, dividing the original video into tiles to obtain multiple sub-videos.
In one embodiment, the server divides the panoramic video into tiles (TILE) to obtain sub-videos, one per tile. The server may divide the panoramic video into tiles of equal or unequal size, obtaining multiple sub-videos of equal or unequal size. For example, the server may assign slightly larger tiles along the directions of the panoramic video toward which users' viewing angles tend, and slightly smaller tiles to the other regions, set according to actual needs, so as to improve the coding and decoding effect and the playback quality of the corresponding sub-videos. It should be noted that the number of tiles may be set according to actual needs. In one embodiment, the server divides a panoramic video with a resolution of 3840x1920 into a 4x3 grid of tiles, obtaining 12 sub-videos in total, each with a resolution of 960x640. The server may adjust the number of sub-videos obtained by tiling according to bandwidth requirements: if the bandwidth of the 12 sub-videos in the above embodiment meets the design requirement, no further re-tiling is needed, and the server may set the number of tiles according to actual needs or by an artificial intelligence algorithm. In another embodiment, the more tiles there are, the more sub-videos are obtained and the more pronounced the achievable bandwidth reduction, but the required processing resources increase; the server may set the number of tiles according to bandwidth requirements so as to satisfy both bandwidth and processing requirements, and the present application places no specific restriction on this.
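The 4x3 tiling described above can be sketched as follows; this is a minimal sketch using only the frame size and grid from the embodiment, and the helper and field names are hypothetical:

```python
# Split a panoramic frame of size width x height into a cols x rows grid
# of equal tiles, recording each tile's number and pixel rectangle.

def split_into_tiles(width, height, cols, rows):
    tile_w, tile_h = width // cols, height // rows
    tiles = []
    for r in range(rows):
        for c in range(cols):
            tiles.append({
                "tile": r * cols + c,              # tile number
                "x": c * tile_w, "y": r * tile_h,  # top-left corner in pixels
                "w": tile_w, "h": tile_h,
            })
    return tiles

tiles = split_into_tiles(3840, 1920, cols=4, rows=3)
# 12 sub-videos of 960x640 each, matching the embodiment above
```

Unequal tile sizes, as also contemplated above, would replace the uniform division with per-region rectangles while keeping the same tile records.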
It should be noted that, in order to reduce the overall transmission bandwidth, a viewport-dependent transmission mechanism is a necessary and feasible panoramic video transmission scheme: the content the user pays close attention to, or the content in the viewport the user is currently watching, is transmitted in high quality, while the quality of content in other regions that will not be seen for the time being may be appropriately reduced, or that content may not be transmitted at all. To realize panoramic video transmission that depends on the user's viewport, the embodiments of the present application adopt a tiled streaming strategy, which in the pixel domain splits the uncompressed original high-resolution video, frame by frame, into several low-resolution spatial tile videos, and encodes the sub-video of each tile into independently decodable media content. Such a scheme usually requires each tile to be encoded at different quality levels. In one embodiment, for the region within the user's viewing angle, the playback terminal downloads high-quality tile content of the corresponding region from the server side, and for regions outside the viewing angle, it downloads low-quality tile content of the corresponding regions.
It should be noted that a tile in the embodiments of the present application can be understood as follows: in High Efficiency Video Coding (HEVC), a picture can be divided into several tiles, i.e., the picture is split horizontally and vertically into several rectangular regions, and these rectangular regions are called tiles. Tiles may be as defined by MPEG HEVC coding, and the present application places no specific restriction on this.
The above steps S120 and S130 may be performed in either order; the video may also be tiled first and the saliency calculation performed afterwards.
Step S140, encoding and compressing the sub-videos according to the saliency distribution information.
In one embodiment, the server encodes and compresses the sub-videos according to the saliency distribution information; it may encode all of the obtained sub-videos, or only the sub-videos within some regions. It should be noted that the optimization algorithms in coding standards represented by HEVC are designed for traditional video and do not take into account some of the new characteristics of panoramic video; how to optimize the encoder for panoramic video so as to make full use of service resources has therefore become another problem in urgent need of a solution. In the embodiments of the present application, the sub-videos are encoded non-uniformly on the basis of the tiles, yielding sub-videos of different qualities, where the video quality obtained after encoding and compressing a sub-video of high saliency is higher than that of a sub-video of low saliency. The main purpose of the saliency calculation in the embodiments is to obtain a prediction of user behavior; based on the saliency distribution information obtained from the prediction, the different tiled sub-videos are encoded and compressed so that the sizes of the encoded sub-videos correspond to the characteristics of user behavior, thereby achieving low-bandwidth video transmission while reducing the consumption of computing and storage resources.
It can be understood that, in the panoramic video of one embodiment, the sub-videos in the regions users tend to fixate on have higher quality after encoding and compression, while the sub-videos at other positions have lower quality. The regions users tend to fixate on occupy a small proportion of the panoramic video, i.e., the number of high-quality sub-videos is smaller than the number of low-quality sub-videos. Therefore, compared with the original video, the compressed video obtained by the video processing method of the embodiments enables low-bandwidth video transmission while reducing the consumption of computing and storage resources.
Referring to FIG. 2, in an embodiment, the above step S120 may further include, but is not limited to, step S210 and step S220.
Step S210, acquiring the first frame or a key frame of the original video.
Step S220, performing saliency calculation on the original video according to the first frame or key frame, to obtain saliency distribution information of the first frame or key frame.
In one embodiment, the saliency distribution information obtained by the server is computed from the first frame or a key frame of the original video. The server acquires the first frame or a key frame of the original video according to preset requirements; the key frame may be the image frame of the original video that best characterizes saliency, selected by the user or developer according to actual needs. From the first frame or key frame, a screenshot of the original video at that frame is obtained; the server performs the saliency calculation on this image, and the result is applied to the other image frames of the panoramic video. It can be understood that, in the video processing method of the embodiments, the saliency distribution information obtained at the first frame or key frame can represent the predicted user behavior over the original video, and is applied to the image frames of the entire video for encoding and compression. The server may refresh the saliency distribution information periodically or aperiodically according to actual needs, re-acquiring a key frame to compute new saliency distribution information; the present application places no specific restriction on this.
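The embodiments deliberately leave the choice of saliency prediction algorithm open. Purely as an illustrative stand-in, and not the algorithm of this application, the sketch below scores each pixel of a grayscale key frame by its contrast against the frame's mean brightness and normalizes the result into a distribution; a real deployment would substitute a learned saliency model here:

```python
# Illustrative stand-in for the unspecified saliency prediction algorithm:
# score each pixel by its absolute contrast against the frame's mean
# brightness, then normalize so the whole map sums to 1.

def saliency_map(frame):
    """frame: 2D list of grayscale values; returns a 2D map summing to 1."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    raw = [[abs(p - mean) for p in row] for row in frame]
    total = sum(sum(row) for row in raw) or 1.0  # avoid division by zero
    return [[v / total for v in row] for row in raw]

frame = [[10, 10, 200],
         [10, 10, 200],
         [10, 10, 200]]
sal = saliency_map(frame)
# the high-contrast right column receives the highest saliency values
```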
Referring to FIG. 3, in an embodiment, the above step S140 may further include, but is not limited to, steps S310 to S330.
Step S310, obtaining the saliency weight value corresponding to each sub-video according to the saliency distribution information.
Step S320, obtaining the encoding parameters of each sub-video according to the saliency weight values.
Step S330, encoding and compressing the corresponding sub-video according to the encoding parameters.
In one embodiment, the server computes the corresponding saliency weight of the sub-video on each tile: it obtains the saliency value of each sub-video from the saliency distribution information, obtains the total saliency of the whole panoramic video from the same information, and takes the ratio of each sub-video's saliency value to the total as that sub-video's saliency weight value. The server assigns different encoding parameters according to the different saliency weight values of the sub-videos, so that each sub-video is encoded with its corresponding parameters. Encoding and compressing the sub-videos with different encoding parameters yields video files of different qualities; therefore, compared with the original video, the compressed video obtained by the video processing method of the embodiments enables low-bandwidth video transmission while reducing the consumption of computing and storage resources.
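The weight computation of step S310 can be sketched as follows, assuming a per-pixel saliency map and rectangular tiles; the function and field names are hypothetical:

```python
# For each tile, sum the per-pixel saliency inside its rectangle and
# divide by the frame's total saliency, giving weights that sum to 1.

def tile_weights(saliency, tiles):
    """saliency: 2D list [y][x] of per-pixel saliency values.
    tiles: list of dicts with pixel rectangles {"x", "y", "w", "h"}."""
    total = sum(sum(row) for row in saliency)
    weights = []
    for t in tiles:
        s = sum(saliency[y][x]
                for y in range(t["y"], t["y"] + t["h"])
                for x in range(t["x"], t["x"] + t["w"]))
        weights.append(s / total)
    return weights

# Toy 4x2 frame tiled into two 2x2 tiles; the right tile is more salient
saliency = [[1, 1, 3, 3],
            [1, 1, 3, 3]]
tiles = [{"x": 0, "y": 0, "w": 2, "h": 2},
         {"x": 2, "y": 0, "w": 2, "h": 2}]
w = tile_weights(saliency, tiles)
# w == [0.25, 0.75]
```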
Referring to FIG. 4, in an embodiment, the above step S320 may further include, but is not limited to, steps S410 to S440.
Step S410, acquiring the pre-allocated total quality parameter corresponding to the original video.
Step S420, acquiring the preset quality parameter corresponding to a sub-video.
Step S430, obtaining the pre-allocated quality parameter of the sub-video according to the pre-allocated total quality parameter and the saliency weight value.
Step S440, determining the encoding parameters of the sub-video according to the pre-allocated quality parameter and the preset quality parameter.
In one embodiment, the server sets different encoding parameters for the sub-videos according to different quality parameters. The server acquires the pre-allocated total quality parameter corresponding to the panoramic video; the pre-allocated total quality parameter is set in advance according to actual needs and is smaller than the original quality parameter of the original video. The server also acquires the preset quality parameter corresponding to each sub-video; the preset quality parameter may be the encoder's preset minimum or maximum value for the sub-video. The server then allocates to each sub-video its pre-allocated quality parameter according to the pre-allocated total quality parameter and the saliency weight values, and determines the encoding parameters of the sub-video according to the relationship between the pre-allocated quality parameter and the preset quality parameter. It can be understood that the quality parameter in the embodiments of the present application may be the video bitrate, or another video encoding parameter that affects image quality, such as the quantization parameter (QP), key frame interval (GOP), resolution, or frame rate. The server compares the magnitudes of the pre-allocated quality parameter and the preset quality parameter and takes the appropriate one as the encoding parameter.
It should be noted that the encoding parameters obtained by the server differ according to the quality parameter used. For example, when the quality parameter is the bitrate, the preset quality parameter is a preset bitrate; the preset bitrate is related to the encoder of the sub-video and has certain limits. When the preset quality parameter is greater than the pre-allocated quality parameter, the server may take either the pre-allocated quality parameter or the preset quality parameter as the encoding parameter for encoding and compressing the sub-video; when the preset quality parameter is smaller than the pre-allocated quality parameter, the server can no longer allocate a higher quality parameter to the sub-video, so it takes the preset quality parameter as the encoding parameter. As another example, when the quality parameter is the resolution, the preset quality parameter is a preset resolution, which is related to the size of the tiled sub-video. When the preset quality parameter is greater than the pre-allocated quality parameter, the server may take the pre-allocated quality parameter as the encoding parameter for encoding and compressing the sub-video; when the preset quality parameter is smaller than the pre-allocated quality parameter, the server may take either the pre-allocated quality parameter or the preset quality parameter as the encoding parameter. The above examples are merely illustrative and do not constitute a limitation on the embodiments of the present application.
Referring to FIG. 5, in an embodiment, the above step S320 may further include, but is not limited to, step S510 and step S520.
Step S510, obtaining a difference value from the pre-allocated quality parameter and the preset quality parameter.
Step S520, obtaining an updated pre-allocated total quality parameter from the difference value and the pre-allocated total quality parameter.
In one embodiment, after obtaining the pre-allocated quality parameter of each sub-video, the server computes the difference between the pre-allocated quality parameter and the preset quality parameter, and uses this difference together with the original pre-allocated total quality parameter to compute an updated pre-allocated total quality parameter. It should be noted that the pre-allocated total quality parameter may be a parameter set in advance, with a magnitude corresponding to the bandwidth; correcting it with the difference yields an updated pre-allocated total quality parameter that can be smaller, realizing low-bandwidth video transmission. It can be understood that, in one embodiment, if during the update the pre-allocated quality parameter of a certain sub-video is higher than its preset quality parameter, the subsequently updated pre-allocated total quality parameter may increase; however, for the panoramic video as a whole, the proportion of sub-videos whose pre-allocated quality parameter exceeds the preset quality parameter is not high, so overall the updated pre-allocated total quality parameter is lower than the original one. In another embodiment, the updated pre-allocated total quality parameter may be the same as the original pre-allocated total quality parameter, but it will be smaller than the original quality parameter of the original video; the embodiments of the present application place no specific restriction on this.
In one embodiment, taking the bitrate as the quality parameter, a bitrate selection algorithm is used to determine the specific encoding bitrate of each sub-video. The steps of the bitrate selection algorithm are as follows:
1) With the total bitrate limited to R_total, the bitrate that should be allocated to the i-th sub-video is R_i = R_total × w_i, where i denotes the i-th sub-video and w_i is the saliency weight value corresponding to the i-th sub-video.
2) Let the minimum bitrate corresponding to a sub-video be R_low; the minimum bitrate may be the above preset quality parameter. If the bitrate computed in 1) is smaller than the minimum bitrate, i.e., R_i < R_low, the bitrate value is updated to the minimum bitrate, i.e., R_i = R_low. R_total is then updated: with D_i being the difference between R_i and R_low, subtracting D_i from R_total gives the latest R_total.
3) Let the maximum bitrate corresponding to a sub-video be R_high; the maximum bitrate may be the above preset quality parameter. If the bitrate computed in 1) is greater than the maximum bitrate, i.e., R_i > R_high, the bitrate value is updated to the maximum bitrate, i.e., R_i = R_high. R_total is then updated: with D_i being the difference between R_i and R_high, adding D_i to R_total gives the latest R_total.
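Under one reading of these steps, in which each tile's proportional share is computed against the original limit and the clamping differences adjust a running budget, the algorithm can be sketched as follows; this is a sketch under that assumption, not a definitive implementation:

```python
# Allocate a total bitrate budget across tiles in proportion to their
# saliency weights, clamping each tile to [r_low, r_high] and folding the
# per-tile difference D_i back into the running total, as in steps 1)-3).

def select_bitrates(r_total, weights, r_low, r_high):
    rates = []
    budget = r_total
    for w in weights:
        r = r_total * w              # step 1): proportional share of the limit
        if r < r_low:                # step 2): raise to the minimum bitrate
            budget -= r_low - r      # extra spend shrinks the remaining budget
            r = r_low
        elif r > r_high:             # step 3): cap at the maximum bitrate
            budget += r - r_high     # freed amount enlarges the remaining budget
            r = r_high
        rates.append(r)
    return rates, budget

# 12 tiles with one dominant region: the dominant tile is capped at r_high
# and the remaining tiles are floored at r_low.
weights = [0.67] + [0.03] * 11
rates, budget = select_bitrates(12_000, weights, r_low=500, r_high=4_000)
```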
On the premise of satisfying the requirements of the embodiments of the present application, steps 2) and 3) above may also update the pre-allocated total quality parameter without using the difference. For example, when the preset quality parameter is smaller or greater than the pre-allocated quality parameter, the server may take the preset quality parameter as the encoding parameter of the corresponding sub-video; for the encoding of the remaining sub-videos, the pre-allocated total quality parameter minus the values of the sub-videos whose encoding parameters have already been allocated, together with the saliency weight values of the remaining sub-videos, yields the pre-allocated quality parameters of the remaining sub-videos, from which their encoding parameters can be obtained. It can also be understood that, for example, when the preset quality parameter is greater than the pre-allocated quality parameter, the server may take the pre-allocated quality parameter as the encoding parameter of the corresponding sub-video and update the pre-allocated total quality parameter with the difference computed above; for the encoding of the remaining sub-videos, the updated pre-allocated total quality parameter minus the values of the sub-videos whose encoding parameters have already been allocated, together with the saliency weight values of the remaining sub-videos, yields the pre-allocated quality parameters of the remaining sub-videos, from which their encoding parameters can be obtained. The above examples are merely illustrative and do not constitute a limitation on the embodiments of the present application.
Referring to FIG. 6, the video processing method in the embodiments of the present application may further include, but is not limited to, step S610 and step S620.
Step S610, acquiring the video encapsulation protocol.
Step S620, performing streaming media transmission encapsulation on the encoded and compressed sub-videos according to the encapsulation protocol.
In one embodiment, the server needs to perform streaming media transmission encapsulation on the encoded video. The server may obtain the video encapsulation protocol from the device that will play the video, and encapsulate the encoded and compressed sub-videos for streaming transmission according to the respective encapsulation protocol. For example, some encapsulation protocols require periodic encapsulation, in which case the server completes one encapsulation of the encoded and compressed sub-videos within each period. Encapsulation protocols include, but are not limited to, HLS, DASH, and MSS; the embodiments of the present application place no specific restriction on this.
In one embodiment, the server generates a playback index file for the sub-videos according to the encoded and compressed sub-videos, adds information fields for the tiled sub-videos to the playback index file, and marks the tile number and position information corresponding to each video file for identification by the playback terminal.
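One possible shape for such a playback index is a small JSON manifest; this is an illustrative format only (a real deployment would extend an HLS or DASH manifest), and the field names and URL template are hypothetical:

```python
import json

# Build a playback index that records, for each encoded tile file, its
# tile number and pixel position, so the playback terminal can map
# downloaded files back onto the panorama.

def build_index(tiles, url_template="tile_{tile}.m4s"):
    entries = [{
        "tile": t["tile"],                             # tile number
        "position": [t["x"], t["y"], t["w"], t["h"]],  # pixel rectangle
        "url": url_template.format(tile=t["tile"]),
    } for t in tiles]
    return json.dumps({"tiles": entries}, indent=2)

tiles = [{"tile": 0, "x": 0, "y": 0, "w": 960, "h": 640},
         {"tile": 1, "x": 960, "y": 0, "w": 960, "h": 640}]
index = build_index(tiles)
```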
The embodiments of the present application further provide a video processing method applied to a playback terminal. Referring to FIG. 7, the video processing method in the embodiments includes, but is not limited to, step S710, step S720, and step S730.
Step S710, sending a play request for a film source to the server, so that the server determines the film source according to the play request.
Step S720, receiving the encoded and compressed sub-videos corresponding to the film source sent by the server, where the encoded and compressed sub-videos are obtained by the server performing saliency calculation on the original video of the film source to obtain saliency distribution information of the original video, dividing the original video into tiles to obtain multiple sub-videos, and then encoding and compressing the sub-videos according to the saliency distribution information.
Step S730, decoding the encoded and compressed sub-videos to obtain the playback video.
In one embodiment, the video processing method mentioned here is applied on a playback terminal, and the encoded and compressed sub-videos are obtained by the video processing method executed by the server in the above embodiments, which will not be repeated here. The playback terminal sends a play request for a film source to the server; after the server determines the corresponding film source according to the play request, the playback terminal can receive the encoded and compressed sub-videos corresponding to the film source sent by the server and decode them to obtain the playback video. The embodiments of the present application take a 360-degree panoramic video as an example of the original video, so the playback terminal in the embodiments is a VR playback terminal, but this does not constitute a limitation on the embodiments; in the following embodiments the original video may be referred to as a panoramic video or a 360-degree panoramic video. The main purpose of the saliency calculation in the embodiments is to obtain a prediction of user behavior; based on the saliency distribution information obtained from the prediction, the different tiled sub-videos are encoded and compressed so that the sizes of the encoded sub-videos correspond to the characteristics of user behavior. The encoded and compressed sub-videos obtained by the playback terminal thus have different qualities according to the characteristics of user behavior, thereby achieving low-bandwidth video transmission while reducing the consumption of computing and storage resources.
参照图8所示,在一实施例中,播放请求包括用户在播放终端视角上的选定区域内,上述步骤S720中还可以包括但不限于步骤S810和步骤S820。Referring to FIG. 8 , in one embodiment, the playback request includes the user's selected area on the viewing angle of the playback terminal, and the above step S720 may also include but not limited to step S810 and step S820.
Step S810: determine, according to the region corresponding to the selected region, the sub-video corresponding to that region of the tiled original video.
Step S820: receive, from the server, the encoded and compressed sub-video corresponding to the selected region.
In one embodiment, the playback terminal may download all sub-videos of the panoramic video from the server, or it may obtain only the sub-videos for the region selected by the user within the viewing angle of the playback terminal. The playback terminal determines, according to the region corresponding to the selected region, the sub-video corresponding to that region of the tiled original video. The selected region may be a region chosen by the user or the region covered by the user's current viewing angle. After the playback terminal determines the position of the region, it receives from the server the encoded and compressed sub-video corresponding to the selected region, which enables low-bandwidth video transmission while reducing the consumption of computing and storage resources.
It can be understood that when the playback device is not a VR playback device but another type of playback device such as a mobile phone, the user can select a viewing region on the playback device and use that region as the selected region described in the above embodiments, so that the playback device can obtain and decode the sub-videos of that region according to the user's selection, which enables low-bandwidth video transmission while reducing the consumption of computing and storage resources.
下面,通过具体的实施例进行说明。In the following, description will be made through specific examples.
实施例一,本申请实施例提供了一种应用于点播业务的全景视频文件处理方法,具体步骤如下:Embodiment 1, the embodiment of the present application provides a panoramic video file processing method applied to the on-demand service, and the specific steps are as follows:
1.1) The complete panoramic video source for on-demand playback is injected into the video server. The encoding format and file container format of the injected source are not restricted: the video encoding format may be AVC/H.264, HEVC/H.265, etc., and the file container format may be MP4, MPEG-TS, etc.
1.2) A saliency prediction algorithm is used to compute the saliency of the 360-degree panoramic video, producing a two-dimensional saliency distribution matrix with the same width and height (resolution) as the panoramic video. A 4K panoramic video with a resolution of 3840x1920 is used here as an illustration: the saliency prediction result is a 3840x1920 two-dimensional matrix that records the saliency value of every pixel. The main purpose of the saliency calculation is to obtain a prediction of user behavior; no particular algorithm is required, although different saliency prediction algorithms give different prediction accuracies and may therefore affect the final optimization result.
1.3) The 3840x1920 panoramic video is divided into a 4x3 grid of tiles, 12 tiles in total, each with a resolution of 960x640. In practice, the tile sizes may be the same or different.
以上步骤1.2)和步骤1.3)顺序不分先后,也可以先分块再做显著度计算。The above steps 1.2) and 1.3) are in no particular order, and the saliency calculation can also be done in blocks first.
1.4) The weight of each tile is computed as the sum of the saliency values covered by the corresponding 960x640 rectangle divided by the sum of all saliency values over the full 3840x1920 frame.
Steps 1.2), 1.3) and 1.4) above may be computed over the entire video, or only over the first frame or some key frames, with the results applied to the remaining image frames of the video.
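Steps 1.2) to 1.4) above can be sketched as follows. This is an illustrative sketch only: the function name `tile_weights` and the nested-list representation of the saliency matrix are assumptions, and the specification places no restriction on the saliency prediction algorithm that produces the matrix.

```python
def tile_weights(saliency, tiles_x=4, tiles_y=3):
    """Split a saliency matrix into a tiles_y x tiles_x grid and return, for
    each tile (row-major order), the sum of the saliency values it covers
    divided by the sum of all saliency values, as described in step 1.4)."""
    h, w = len(saliency), len(saliency[0])      # e.g. 1920 x 3840 for 4K
    th, tw = h // tiles_y, w // tiles_x         # e.g. 640 x 960 per tile
    total = sum(sum(row) for row in saliency)
    weights = []
    for r in range(tiles_y):
        for c in range(tiles_x):
            s = sum(sum(saliency[y][c * tw:(c + 1) * tw])
                    for y in range(r * th, (r + 1) * th))
            weights.append(s / total)
    return weights
```

For a uniform saliency map the 12 weights are all equal to 1/12; a map concentrated in one region shifts weight, and hence bit rate, toward the tiles covering that region.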
1.5) The specific values of the encoding parameters are determined. The encoding parameter here may be the video bit rate, or any other video encoding parameter that affects image quality, such as the quantization parameter, key frame interval, resolution, or frame rate.
以码率为例,采用码率选择算法确定每个分块的具体编码码率,码率选择算法步骤如下:Taking the bit rate as an example, the bit rate selection algorithm is used to determine the specific encoding bit rate of each block. The steps of the bit rate selection algorithm are as follows:
1.5.1) Let the total bit rate budget be R_total; the bit rate allocated to the i-th sub-video is then R_i = R_total × w_i, where i indexes the sub-videos and w_i is the saliency weight of the i-th sub-video.
1.5.2) Let R_low be the minimum bit rate for a sub-video; the minimum bit rate may be the preset quality parameter described above. If the bit rate computed in 1.5.1) is below the minimum, i.e. R_i < R_low, the bit rate is raised to the minimum, i.e. R_i = R_low. R_total is then updated: with D_i the difference between R_i and R_low, subtracting D_i from R_total yields the updated R_total.
1.5.3) Let R_high be the maximum bit rate for a sub-video; the maximum bit rate may be the preset quality parameter described above. If the bit rate computed in 1.5.1) exceeds the maximum, i.e. R_i > R_high, the bit rate is capped at the maximum, i.e. R_i = R_high. R_total is then updated: with D_i the difference between R_i and R_high, adding D_i to R_total yields the updated R_total.
步骤1.5.2)和步骤1.5.3)顺序可调换。The order of step 1.5.2) and step 1.5.3) can be changed.
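The bit rate selection of steps 1.5.1) to 1.5.3) can be sketched as follows, under one reading of the algorithm in which the budget R_total is updated as each tile is clamped. The function name and the sequential per-tile update order are assumptions; the specification itself allows steps 1.5.2) and 1.5.3) to be swapped.

```python
def allocate_bitrates(weights, r_total, r_low, r_high):
    """Distribute the total bit rate budget across tiles in proportion to
    their saliency weights, clamping each tile's rate to [r_low, r_high]
    and returning the clamped difference D_i to (or taking it from) the
    remaining budget, per steps 1.5.1)-1.5.3)."""
    rates = []
    for w in weights:
        r = r_total * w                 # step 1.5.1): proportional share
        if r < r_low:                   # step 1.5.2): raise to the floor
            r_total -= r_low - r        # budget shrinks by D_i
            r = r_low
        elif r > r_high:                # step 1.5.3): cap at the ceiling
            r_total += r - r_high       # budget grows by D_i
            r = r_high
        rates.append(r)
    return rates
```

With 12 equal weights and a 12000 kbps budget, every tile receives 1000 kbps and no clamping occurs; with a strongly skewed weight, the dominant tile is capped at r_high and the freed bit rate returns to the budget for the remaining tiles.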
1.6) The original panoramic video is decoded and then re-encoded as HEVC tiles based on MCTS (motion-constrained tile sets). During encoding, the compression bit rate of each tile is constrained by the bit rate obtained in step 1.5). In this example, encoding yields 12 compressed videos that can each be decoded completely independently.
1.7) The encoded videos are packaged for streaming and a playback index file is generated. The packaging protocol includes, but is not limited to, HLS, DASH, MSS, etc. A tile-information field is added to the playback index file to record the tile number and position corresponding to each video file, for identification by the playback terminal.
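The tile-information fields of step 1.7) could look like the following sketch. The JSON layout and field names (`tile_id`, `x`, `y`, etc.) are hypothetical — the specification only requires that the index record each file's tile number and position — and a real deployment would carry this information inside an HLS/DASH/MSS manifest rather than a standalone JSON document.

```python
import json

def build_tile_index(tile_files, tiles_x=4, tile_w=960, tile_h=640):
    """Attach a tile number and pixel position in the panorama to each
    encoded tile file, row-major from the top-left tile."""
    entries = []
    for i, path in enumerate(tile_files):
        row, col = divmod(i, tiles_x)
        entries.append({
            "file": path,
            "tile_id": i,               # tile number for terminal lookup
            "x": col * tile_w,          # left edge within the 3840x1920 frame
            "y": row * tile_h,          # top edge
            "width": tile_w,
            "height": tile_h,
        })
    return json.dumps({"tiles": entries}, indent=2)
```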
实施例二,本申请实施例提供了一种全景视频点播业务中VR播放终端和视频服务器的交互流程,具体步骤如下:Embodiment 2, the embodiment of the present application provides an interaction process between a VR player terminal and a video server in a panoramic video on demand service, and the specific steps are as follows:
2.1)VR播放终端向视频服务器发起播放索引文件index.mpd请求,index.mpd描述了每个视频文件的名称、存放路径和对应的分块编号、位置等信息。2.1) The VR playback terminal initiates a request to the video server to play the index file index.mpd, and index.mpd describes the name, storage path, corresponding block number, location and other information of each video file.
2.2)视频服务器返回播放索引文件index.mpd。2.2) The video server returns the playback index file index.mpd.
2.3)VR播放终端解析index.mpd,获取视频文件信息,向视频服务器发起视频文件下载请求,可以下载所有分块对应的视频文件,也可以只下载视角范围内的分块对应的文件。2.3) The VR playback terminal parses index.mpd, obtains video file information, and initiates a video file download request to the video server. It can download video files corresponding to all segments, or only download files corresponding to segments within the viewing angle range.
2.4)视频服务器发送视频文件给VR播放终端。2.4) The video server sends the video file to the VR playback terminal.
2.5)VR播放终端对视频文件进行空间重组、解码并播放。2.5) The VR player terminal reorganizes, decodes and plays the video file.
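The choice in step 2.3) between downloading all tiles and only those within the viewing angle can be sketched as follows, in longitude only. The function name and the assumption of equal-width tile columns are illustrative; a real terminal would intersect the full viewport with the tile positions parsed from index.mpd, including the vertical extent.

```python
def tile_columns_in_view(yaw_deg, fov_deg, tiles_x=4, pano_deg=360.0):
    """Return the column indices of tiles overlapping a horizontal viewport
    centered at yaw_deg and fov_deg wide, handling wrap-around at 360 deg."""
    tile_deg = pano_deg / tiles_x
    left = (yaw_deg - fov_deg / 2) % pano_deg
    right = (yaw_deg + fov_deg / 2) % pano_deg
    cols = set()
    for c in range(tiles_x):
        t0, t1 = c * tile_deg, (c + 1) * tile_deg
        if left <= right:
            overlaps = t1 > left and t0 < right
        else:                            # viewport crosses the 0/360 seam
            overlaps = t1 > left or t0 < right
        if overlaps:
            cols.add(c)
    return sorted(cols)
```

A 90-degree viewport centered on the seam (yaw 0) touches the first and last tile columns, so only those tile files need to be requested.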
实施例三,本申请实施例提供了一种应用于直播业务的全景视频文件处理方法,具体步骤如下:Embodiment 3, the embodiment of the present application provides a panoramic video file processing method applied to the live broadcast service, and the specific steps are as follows:
3.1) The video server pulls or receives a 360-degree panoramic live stream. The encoding format of the live stream and the stream transport/packaging protocol are not restricted: the video encoding protocol may be AVC/H.264, HEVC/H.265, etc., and the stream packaging protocol may be RTSP, RTMP, HLS, etc.
3.2) A saliency prediction algorithm is used to compute the saliency of the panoramic live stream, producing a two-dimensional saliency distribution matrix with the same width and height (resolution) as the panoramic video. A 4K panoramic video with a resolution of 3840x1920 is used here as an illustration: the saliency prediction result is a 3840x1920 two-dimensional matrix that records the saliency value of every pixel.
3.3) The 3840x1920 panoramic video is divided into a 4x3 grid of tiles, 12 tiles in total, each with a resolution of 960x640. In practice, the tile sizes may be the same or different.
以上步骤3.2)和步骤3.3)顺序不分先后,也可以先分块再做显著度计算。The above steps 3.2) and 3.3) are in no particular order, and the saliency calculation can also be done in blocks first.
3.4) The weight of each tile is computed as the sum of the saliency values covered by the corresponding 960x640 rectangle divided by the sum of all saliency values over the full 3840x1920 frame.
Steps 3.2), 3.3) and 3.4) above may be computed over one key frame or some key frames of the live stream, with the results applied to the remaining image frames of the stream.
3.5) The specific values of the encoding parameters are determined. The encoding parameter here may be the video bit rate, or any other video encoding parameter that affects image quality, such as the quantization parameter, key frame interval, resolution, or frame rate.
以码率为例,采用码率选择算法确定每个分块的具体编码码率,码率选择算法步骤如下:Taking the bit rate as an example, the bit rate selection algorithm is used to determine the specific encoding bit rate of each block. The steps of the bit rate selection algorithm are as follows:
3.5.1) Let the total bit rate budget be R_total; the bit rate allocated to the i-th sub-video is then R_i = R_total × w_i, where i indexes the sub-videos and w_i is the saliency weight of the i-th sub-video.
3.5.2) Let R_low be the minimum bit rate for a sub-video; the minimum bit rate may be the preset quality parameter described above. If the bit rate computed in 3.5.1) is below the minimum, i.e. R_i < R_low, the bit rate is raised to the minimum, i.e. R_i = R_low. R_total is then updated: with D_i the difference between R_i and R_low, subtracting D_i from R_total yields the updated R_total.
3.5.3) Let R_high be the maximum bit rate for a sub-video; the maximum bit rate may be the preset quality parameter described above. If the bit rate computed in 3.5.1) exceeds the maximum, i.e. R_i > R_high, the bit rate is capped at the maximum, i.e. R_i = R_high. R_total is then updated: with D_i the difference between R_i and R_high, adding D_i to R_total yields the updated R_total.
步骤3.5.2)和步骤3.5.3)顺序可调换。The order of step 3.5.2) and step 3.5.3) can be changed.
3.6) At each period, the panoramic live stream within that period is decoded and then re-encoded as HEVC tiles based on MCTS. During encoding, the compression bit rate of each tile is constrained by the bit rate obtained in step 3.5). In this example, encoding yields 12 compressed videos that can each be decoded completely independently.
3.7) The videos encoded within the above period are sliced and packaged for streaming, and a playback index file is generated. The packaging protocol includes, but is not limited to, HLS, DASH, MSS, etc. A tile-information field is added to the playback index file to record the tile number and position corresponding to each video file, for identification by the playback terminal.
实施例四,本申请实施例提供了一种全景视频直播业务中VR播放终端和视频服务器的交互流程,具体步骤如下:Embodiment 4. The embodiment of the present application provides an interaction process between a VR playback terminal and a video server in a panoramic video live broadcast service. The specific steps are as follows:
4.1)VR播放终端周期性地向视频服务器发起播放索引文件index.mpd请求,index.mpd描述了最近几个周期的视频文件的名称、存放路径和对应的分块编号、位置等信息。4.1) The VR playback terminal periodically initiates a request to the video server to play the index file index.mpd. index.mpd describes the name, storage path, corresponding block number, location and other information of the video files in the last few cycles.
4.2)视频服务器返回最新的播放索引文件index.mpd。4.2) The video server returns the latest playback index file index.mpd.
4.3)VR播放终端解析index.mpd,获取视频文件信息,向视频服务器发起视频文件下载请求,可以下载所有分块对应的视频文件,也可以只下载视角范围内的分块对应的文件。4.3) The VR playback terminal parses index.mpd, obtains video file information, and initiates a video file download request to the video server. It can download video files corresponding to all segments, or only download files corresponding to segments within the viewing angle range.
4.4)视频服务器发送视频文件给VR播放终端。4.4) The video server sends the video file to the VR playback terminal.
4.5)VR播放终端对视频文件进行空间重组、解码并播放。4.5) The VR player terminal reorganizes, decodes and plays the video file.
上述实施例中的服务器具备视频编解码和存储能力,播放终端具备全景视频解码和播放能力。The server in the above embodiments has video encoding and decoding and storage capabilities, and the playback terminal has panoramic video decoding and playback capabilities.
图9示出了本申请实施例提供的电子设备100。电子设备100包括:处理器101、存储器102及存储在存储器102上并可在处理器101上运行的计算机程序,计算机程序运行时用于执行上述的视频处理方法。FIG. 9 shows an electronic device 100 provided by an embodiment of the present application. The electronic device 100 includes: a processor 101 , a memory 102 , and a computer program stored on the memory 102 and operable on the processor 101 , and the computer program is used to execute the above video processing method when running.
处理器101和存储器102可以通过总线或者其他方式连接。The processor 101 and the memory 102 may be connected through a bus or in other ways.
存储器102作为一种非暂态计算机可读存储介质,可用于存储非暂态软件程序以及非暂态性计算机可执行程序,如本申请实施例描述的视频处理方法。处理器101通过运行存储在存储器102中的非暂态软件程序以及指令,从而实现上述的视频处理方法。The memory 102, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs and non-transitory computer executable programs, such as the video processing method described in the embodiment of the present application. The processor 101 implements the above video processing method by running the non-transitory software programs and instructions stored in the memory 102 .
The memory 102 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data required for executing the above video processing method. In addition, the memory 102 may include a high-speed random access memory and may also include a non-transitory memory, such as at least one storage device, flash memory device, or other non-transitory solid-state storage device. In some implementations, the memory 102 may optionally include memories disposed remotely from the processor 101, and these remote memories may be connected to the electronic device 100 through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The non-transitory software programs and instructions required to implement the above video processing method are stored in the memory 102 and, when executed by one or more processors 101, perform the above video processing method, for example, method steps S110 to S140 in FIG. 1, method steps S210 to S220 in FIG. 2, method steps S310 to S330 in FIG. 3, method steps S410 to S440 in FIG. 4, method steps S510 to S520 in FIG. 5, method steps S610 to S620 in FIG. 6, method steps S710 to S730 in FIG. 7, and method steps S810 to S820 in FIG. 8.
本申请实施例还提供了计算机可读存储介质,存储有计算机可执行指令,计算机可执行指令用于执行上述的视频处理方法。The embodiment of the present application also provides a computer-readable storage medium storing computer-executable instructions, and the computer-executable instructions are used to execute the above-mentioned video processing method.
In one embodiment, the computer-readable storage medium stores computer-executable instructions which are executed by one or more control processors, for example, to perform method steps S110 to S140 in FIG. 1, method steps S210 to S220 in FIG. 2, method steps S310 to S330 in FIG. 3, method steps S410 to S440 in FIG. 4, method steps S510 to S520 in FIG. 5, method steps S610 to S620 in FIG. 6, method steps S710 to S730 in FIG. 7, and method steps S810 to S820 in FIG. 8.
The embodiments of the present application provide at least the following beneficial effects. The video processing method in the embodiments of the present application can be applied on a server or a playback terminal. The server first obtains the original video of a film source and performs a saliency calculation on it to obtain saliency distribution information over the original video. The server then tiles the original video into multiple sub-videos and encodes and compresses each sub-video according to the saliency distribution information. After the playback terminal sends the server a playback request for the film source, it receives the encoded and compressed sub-videos corresponding to that source and decodes them to obtain the playback video. The main purpose of the saliency calculation is to obtain a prediction of user behavior: based on the saliency distribution information obtained from these predictions, the tiled sub-videos are encoded and compressed so that the size of each encoded sub-video matches the characteristics of user behavior, which enables low-bandwidth video transmission while reducing the consumption of computing and storage resources.
以上所描述的装置实施例仅仅是示意性的,其中作为分离部件说明的单元可以是或者也可以不是物理上分开的,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。The device embodiments described above are only illustrative, and the units described as separate components may or may not be physically separated, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Those of ordinary skill in the art will understand that all or some of the steps and systems in the methods disclosed above may be implemented as software, firmware, hardware, and appropriate combinations thereof. Some or all of the physical components may be implemented as software executed by a processor such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile discs (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
还应了解,本申请实施例提供的各种实施方式可以任意进行组合,以实现不同的技术效果。It should also be understood that the various implementation manners provided in the embodiments of the present application may be combined arbitrarily to achieve different technical effects.
The above is a specific description of some implementations of the present application, but the present application is not limited to the above embodiments. Those skilled in the art may make various equivalent variations or replacements without departing from the spirit of the present application, and such equivalent variations or replacements fall within the scope defined by the claims of the present application.

Claims (10)

  1. 一种视频处理方法,应用于服务器,所述视频处理方法包括:A video processing method applied to a server, the video processing method comprising:
    获取片源的原始视频;Get the original video of the film source;
    对所述原始视频进行显著度计算得到所述原始视频上的显著度分布信息;performing saliency calculation on the original video to obtain saliency distribution information on the original video;
    对所述原始视频进行分块得到多个子视频;Blocking the original video to obtain a plurality of sub-videos;
    根据所述显著度分布信息对所述子视频进行编码压缩。Encoding and compressing the sub-video is performed according to the saliency distribution information.
  2. 根据权利要求1所述的视频处理方法,其中,所述对所述原始视频进行显著度计算得到所述原始视频上的显著度分布信息,包括:The video processing method according to claim 1, wherein said performing saliency calculation on said original video to obtain saliency distribution information on said original video comprises:
    获取所述原始视频的首帧或关键帧;Obtain the first frame or key frame of the original video;
    根据所述首帧或所述关键帧对所述原始视频进行显著度计算,得到所述首帧或所述关键帧上的所述显著度分布信息。Performing saliency calculation on the original video according to the first frame or the key frame to obtain the saliency distribution information on the first frame or the key frame.
  3. 根据权利要求1或2所述的视频处理方法,其中,所述根据所述显著度分布信息对所述子视频进行编码压缩,包括:The video processing method according to claim 1 or 2, wherein said encoding and compressing said sub-video according to said saliency distribution information comprises:
    根据所述显著度分布信息得到每个所述子视频所对应的显著度权重值;obtaining a saliency weight value corresponding to each of the sub-videos according to the saliency distribution information;
    根据所述显著度权重值得到每个所述子视频的编码参数;Obtain encoding parameters of each sub-video according to the saliency weight value;
    根据所述编码参数对相应的所述子视频进行编码压缩。Encoding and compressing the corresponding sub-videos according to the encoding parameters.
  4. 根据权利要求3所述的视频处理方法,其中,所述根据所述显著度权重值得到所述子视频的编码参数,包括:The video processing method according to claim 3, wherein said obtaining the coding parameters of said sub-video according to said saliency weight value comprises:
    获取所述原始视频对应的预分配总质量参数;Obtain the pre-allocated total quality parameter corresponding to the original video;
    获取所述子视频对应的预设质量参数;Acquiring preset quality parameters corresponding to the sub-video;
    根据所述预分配总质量参数和所述显著度权重值得到所述子视频的预分配质量参数;Obtaining a pre-allocated quality parameter of the sub-video according to the pre-allocated total quality parameter and the saliency weight value;
    根据所述预分配质量参数和所述预设质量参数确定得到所述子视频的所述编码参数。The encoding parameter of the sub-video is obtained by determining according to the pre-allocated quality parameter and the preset quality parameter.
  5. 根据权利要求4所述的视频处理方法,其中,所述根据所述显著度权重值得到所述子视频的编码参数,还包括:The video processing method according to claim 4, wherein said obtaining the encoding parameters of said sub-video according to said saliency weight value further comprises:
    根据所述预分配质量参数和所述预设质量参数得到差值;obtaining a difference according to the pre-allocated quality parameter and the preset quality parameter;
    根据所述差值和所述预分配总质量参数得到更新后的所述预分配总质量参数。The updated pre-allocated total quality parameter is obtained according to the difference value and the pre-allocated total quality parameter.
  6. 根据权利要求1所述的视频处理方法,其中,所述视频处理方法还包括:The video processing method according to claim 1, wherein the video processing method further comprises:
    获取视频封装协议;Obtain the video encapsulation protocol;
    根据所述封装协议对编码压缩后的所述子视频进行流媒体传输封装。Perform streaming media transmission encapsulation on the encoded and compressed sub-video according to the encapsulation protocol.
  7. 一种视频处理方法,应用于播放终端,所述视频处理方法包括:A video processing method applied to a playback terminal, the video processing method comprising:
    向服务器发送片源的播放请求,以使所述服务器根据所述播放请求确定所述片源;Sending a play request of the film source to the server, so that the server determines the film source according to the play request;
    receiving the encoded and compressed sub-videos corresponding to the film source sent by the server, wherein the encoded and compressed sub-videos are obtained by the server performing a saliency calculation on the original video of the film source to obtain saliency distribution information over the original video, dividing the original video into blocks to obtain a plurality of sub-videos, and encoding and compressing the sub-videos according to the saliency distribution information;
    decoding the encoded and compressed sub-videos to obtain a playback video.
  8. The video processing method according to claim 7, wherein the playback request includes a region selected by the user within the viewing angle of the playback terminal, and the receiving the encoded and compressed sub-videos corresponding to the film source sent by the server comprises:
    根据所述选定区域所对应的区域,确定所述原始视频分块的区域对应的所述子视频;determining the sub-video corresponding to the area of the original video block according to the area corresponding to the selected area;
    接收所述服务器发送的与所述选定区域对应的所述编码压缩后的子视频。receiving the coded and compressed sub-video corresponding to the selected area sent by the server.
  9. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor, when executing the computer program, implements the video processing method according to any one of claims 1 to 6 or the video processing method according to any one of claims 7 to 8.
  10. A computer-readable storage medium, wherein the storage medium stores a program which, when executed by a processor, implements the video processing method according to any one of claims 1 to 6 or the video processing method according to any one of claims 7 to 8.
PCT/CN2022/114283 2021-10-25 2022-08-23 Video processing method, electronic device and storage medium WO2023071469A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111239873.4A CN116033180A (en) 2021-10-25 2021-10-25 Video processing method, electronic device and storage medium
CN202111239873.4 2021-10-25

Publications (1)

Publication Number Publication Date
WO2023071469A1 true WO2023071469A1 (en) 2023-05-04

Family

ID=86072842

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/114283 WO2023071469A1 (en) 2021-10-25 2022-08-23 Video processing method, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN116033180A (en)
WO (1) WO2023071469A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117579843B (en) * 2024-01-17 2024-04-02 淘宝(中国)软件有限公司 Video coding processing method and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140153651A1 (en) * 2011-07-19 2014-06-05 Thomson Licensing Method and apparatus for reframing and encoding a video signal
US20140269901A1 (en) * 2013-03-13 2014-09-18 Magnum Semiconductor, Inc. Method and apparatus for perceptual macroblock quantization parameter decision to improve subjective visual quality of a video signal
WO2020091872A1 (en) * 2018-10-29 2020-05-07 University Of Washington Saliency-based video compression systems and methods
CN112055263A (en) * 2020-09-08 2020-12-08 Xi'an Jiaotong University 360-degree video streaming transmission system based on saliency detection
CN112637596A (en) * 2020-12-21 2021-04-09 National Space Science Center, Chinese Academy of Sciences Bit rate control system
CN113411582A (en) * 2021-05-10 2021-09-17 South China University of Technology Video coding method, system, device and medium based on active contour

Also Published As

Publication number Publication date
CN116033180A (en) 2023-04-28

Similar Documents

Publication Publication Date Title
CN110036641B (en) Method, device and computer readable storage medium for processing video data
EP3510744B1 (en) Methods and apparatus to reduce latency for 360-degree viewport adaptive streaming
TWI712313B (en) Systems and methods of signaling of regions of interest
TWI712309B (en) Enhanced signaling of regions of interest in container files and video bitstreams
US11438600B2 (en) Immersive media metrics for virtual reality content with multiple viewpoints
CN107634930B (en) Method and device for acquiring media data
CN110035331B (en) Media information processing method and device
CN113557741B (en) Method and apparatus for adaptive streaming of point clouds
CN109963176B (en) Video code stream processing method and device, network equipment and readable storage medium
US20200228837A1 (en) Media information processing method and apparatus
CN103607667A (en) A slicing method for SVC video files in a P2P streaming media system
US9258622B2 (en) Method of accessing a spatio-temporal part of a video sequence of images
CN115398481A (en) Apparatus and method for performing artificial intelligence encoding and artificial intelligence decoding on image
WO2023071469A1 (en) Video processing method, electronic device and storage medium
US20110228166A1 (en) method and device for determining the value of a delay to be applied between sending a first dataset and sending a second dataset
US20240080487A1 (en) Method, apparatus for processing media data, computer device and storage medium
US20140321556A1 (en) Reducing amount of data in video encoding
US20230038928A1 (en) Picture partitioning-based coding method and device
US20230300346A1 (en) Supporting view direction based random access of bitsteam
CN112470481A (en) Encoder and method for encoding tile-based immersive video
CN114424552A (en) Low-delay source-channel joint coding method and related equipment
US11736730B2 (en) Systems, methods, and apparatuses for video processing
US20230360277A1 (en) Data processing method and apparatus for immersive media, device and storage medium
RU2795052C2 (en) Methods and device for adaptive point cloud streaming
US20240040169A1 (en) Media file processing method and device therefor

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22885368

Country of ref document: EP

Kind code of ref document: A1