CN117676110A - Video processing method, device and server - Google Patents

Video processing method, device and server

Info

Publication number
CN117676110A
Authority
CN
China
Prior art keywords
image, video, view, full, view angle
Prior art date
Legal status
Pending
Application number
CN202211009109.2A
Other languages
Chinese (zh)
Inventor
张立欧
Current Assignee
Douyin Vision Co Ltd
Original Assignee
Douyin Vision Co Ltd
Priority date
Filing date
Publication date
Application filed by Douyin Vision Co Ltd filed Critical Douyin Vision Co Ltd
Priority to CN202211009109.2A
Publication of CN117676110A
Legal status: Pending

Landscapes

  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The disclosure provides a video processing method, a video processing apparatus, and a server. The method includes: acquiring a first full-view video, where the first full-view video is associated with video streams of multiple code rates, and an image frame in the video streams includes multiple image slices; acquiring a first view angle range and a first view angle center while a user watches the first full-view video; determining a target video stream among the video streams of the multiple code rates, and determining a first image slice among the multiple image slices of an image frame of the target video stream based on the first view angle range and the first view angle center; and sending the first image slice to a terminal device. This improves the utilization of transmission resources.

Description

Video processing method, device and server
Technical Field
The embodiments of the present disclosure relate to the technical field of video processing, and in particular to a video processing method, a video processing apparatus, and a server.
Background
The server may send a full-view video (e.g., a Virtual Reality (VR) video) to the terminal device so that the terminal device plays the VR video.
Currently, the server may send high-definition full-frame images with a 3D effect to the terminal device (each image frame in the VR video is a 3D-effect image). For example, a video frame in the VR video may be an image obtained by cube mapping: the server generates a high-definition full-frame image with a 3D effect and sends it to the terminal device. However, the user's viewing angle range is small and the user cannot see every region of the full-frame image, so much of the transmitted pixel code rate is wasted and the utilization of transmission resources is low.
Disclosure of Invention
The disclosure provides a video processing method, a video processing apparatus, and a server, which solve the technical problem of low utilization of transmission resources in the prior art.
In a first aspect, the present disclosure provides a video processing method, the method comprising:
acquiring a first full-view video, wherein the first full-view video is associated with video streams of a plurality of code rates, and an image frame in the video streams comprises a plurality of image slices;
acquiring a first view angle range and a first view angle center when a user watches the first full-view video;
determining a target video stream among the video streams of the plurality of code rates, and determining a first image slice among a plurality of image slices of an image frame of the target video stream based on the first view angle range and the first view angle center;
and sending the first image slice to a terminal device.
In a second aspect, the present disclosure provides a video processing apparatus, including a first acquisition module, a second acquisition module, a determination module, and a transmission module, wherein:
the first acquisition module is used for acquiring a first full-view video, wherein the first full-view video is associated with video streams of a plurality of code rates, and image frames in the video streams comprise a plurality of image slices;
the second acquisition module is used for acquiring a first view angle range and a first view angle center when a user watches the first full-view video;
the determining module is used for determining a target video stream among the video streams of the plurality of code rates, and determining a first image slice among a plurality of image slices of an image frame of the target video stream based on the first view angle range and the first view angle center;
the sending module is used for sending the first image slice to a terminal device.
In a third aspect, embodiments of the present disclosure provide a server, comprising: a processor and a memory;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored by the memory to cause the at least one processor to perform the video processing method as described above in the first aspect and the various possible aspects of the first aspect.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, implement the video processing method as described in the first aspect and the various possible aspects of the first aspect above.
In a fifth aspect, embodiments of the present disclosure provide a computer program product comprising a computer program which, when executed by a processor, implements the video processing method as described above in the first aspect and the various possible aspects of the first aspect.
The disclosure provides a video processing method, an apparatus, and a server. The server acquires a first full-view video, where the first full-view video is associated with video streams of a plurality of code rates and an image frame in the video streams comprises a plurality of image slices; acquires a first view angle range and a first view angle center when a user watches the first full-view video; determines a target video stream among the video streams of the plurality of code rates; determines a first image slice among the plurality of image slices of an image frame of the target video stream based on the first view angle range and the first view angle center; and sends the first image slice to a terminal device. In this method, the server sends the terminal device the video of the region the user is attending to, at a higher code rate. This improves the user experience while saving pixel transmission code rate and improving the utilization of transmission resources.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present disclosure, and a person of ordinary skill in the art may obtain other drawings from them without inventive effort.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present disclosure;
fig. 2 is a schematic flow chart of a video processing method according to an embodiment of the disclosure;
FIG. 3 is a schematic diagram of a cube map provided by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a process for determining a first image slice according to an embodiment of the disclosure;
fig. 5 is a schematic diagram of a process of sending image slices to a terminal device according to an embodiment of the disclosure;
fig. 6 is a flowchart of a method for acquiring a first full view video according to an embodiment of the present disclosure;
fig. 7 is a schematic diagram of a process for obtaining view coverage according to an embodiment of the present disclosure;
fig. 8 is a schematic diagram of determining a target image slice division policy according to an embodiment of the disclosure;
FIG. 9A is a schematic diagram of a process for encoding an image frame according to an embodiment of the present disclosure;
FIG. 9B is a schematic diagram of another encoding process for image frames according to an embodiment of the present disclosure;
fig. 10 is a process schematic diagram of a video processing method according to an embodiment of the disclosure;
fig. 11 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present disclosure;
fig. 12 is a schematic structural diagram of another video processing apparatus according to an embodiment of the present disclosure; and
fig. 13 is a schematic structural diagram of a server according to an embodiment of the disclosure.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
For ease of understanding, concepts related to the embodiments of the present disclosure will be first described.
Terminal equipment: a device with wireless transceiving capability. The terminal device may be deployed on land, including indoors or outdoors, hand-held, wearable, or vehicle-mounted; it may also be deployed on the water surface (e.g., on a ship). The terminal device may be a mobile phone, a tablet computer (Pad), a computer with wireless transceiving capability, a Virtual Reality (VR) terminal device, an Augmented Reality (AR) terminal device, a wireless terminal in industrial control, a vehicle-mounted terminal device, a wireless terminal in self-driving, a wireless terminal in remote medical care, a wireless terminal in a smart grid, a wireless terminal in transportation safety, a wireless terminal in a smart city, a wireless terminal in a smart home, a wearable terminal device, or the like. The terminal device according to the embodiments of the present disclosure may also be referred to as a terminal, user equipment (UE), an access terminal device, a vehicle terminal, an industrial control terminal, a UE unit, a UE station, a mobile station, a remote terminal device, a mobile device, a UE terminal device, a wireless communication device, a UE proxy, or a UE apparatus. The terminal device may be fixed or mobile.
In the related art, the server may send the VR video to the terminal device so that the terminal device plays the VR video. At present, the server sends high-definition full-frame images with a 3D effect to the terminal device to improve the user's VR viewing experience. However, the user attends to only a small part of the high-definition full frame (for example, when looking at the picture in front, the user does not attend to the pictures in other directions). Converting every 3D-effect image in the VR video into a high-definition full-frame image therefore wastes a large amount of pixel transmission code rate (the regions the user does not attend to are also transmitted in high definition), so the utilization of transmission resources is low.
In order to solve the technical problems in the related art, an embodiment of the present disclosure provides a video processing method. The server obtains a plurality of image slice division policies, each of which divides an image frame into a plurality of image slices. It obtains a preset view angle range and a second full-view video, where the second full-view video is the first full-view video before image slice division, and determines a target image slice division policy based on the preset view angle range and the plurality of image slice division policies. It then obtains the first full-view video based on the target image slice division policy and the second full-view video. Next, the server obtains the first view angle range and the first view angle center when a user watches the first full-view video, determines a target video stream among the video streams of the plurality of code rates, determines a first image slice among the plurality of image slices of an image frame of the target video stream based on the first view angle range and the first view angle center, and sends the first image slice to the terminal device. In this way, the server can accurately divide the image frames of the second full-view video into image slices that match the view angle range, and can determine in real time the first image slice the user is attending to from the first view angle range and the first view angle center. The terminal device thus plays the region the user is attending to, which improves the viewing experience, saves pixel transmission code rate, and improves the utilization of transmission resources.
Next, an application scenario of the embodiment of the present disclosure will be described with reference to fig. 1.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present disclosure. Referring to fig. 1, the scenario includes a server and a virtual reality device. The virtual reality device sends the server a request for an ultra-HD VR video. The server determines a plurality of image slices in the VR video and, according to the user's view angle range and view angle center, determines the first image slices to send at the ultra-HD code rate. The server then sends the VR video to the terminal device: the region of the VR video the user is attending to consists of first image slices at the ultra-HD code rate, while the other image slices (regions the user is not attending to) may be at a high-definition code rate. When the virtual reality device receives the video frames of the VR video, it displays them.
Referring to fig. 1, the region the user is watching (the first image slices) lies within the dotted line of the displayed picture, and the region the user is not watching (the other image slices) lies outside the dotted line. The image inside the dotted line is ultra-HD, while the image outside it is high definition. In this way, the server ensures that the region the user is watching is an ultra-HD image, so the VR viewing experience is unaffected, while pixel transmission code rate is saved and the utilization of transmission resources is improved.
The following describes the technical solutions of the present disclosure and how the technical solutions of the present disclosure solve the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present disclosure will be described below with reference to the accompanying drawings.
Fig. 2 is a flowchart of a video processing method according to an embodiment of the disclosure. Referring to fig. 2, the method may include:
s201, acquiring a first full view video.
The execution subject of the embodiments of the present disclosure may be a server, or a video processing apparatus provided in the server. The video processing apparatus may be implemented in software, or in a combination of software and hardware.
Optionally, the first full-view video is associated with video streams of multiple code rates. Optionally, the first full-view video may be a video to be played. For example, the first full-view video may include VR video, 180-degree-view video, spherical video, and the like. For example, the first full-view video may be associated with a standard-definition video stream, a high-definition video stream, an ultra-HD video stream, and so on.
It should be noted that the server may transcode the video streams of the multiple code rates associated with the first full-view video in advance, or may transcode them when the terminal device requests the first full-view video; the embodiments of the present disclosure do not limit this. When the server receives a request from the terminal device to obtain the first full-view video, the server may determine, from the identifier in the request, which first full-view video the terminal device is requesting to play.
Optionally, the image frames in the video stream comprise a plurality of image slices. For example, each image frame in the high-definition video stream associated with the first full-view video may include 100 image slices, which together compose the frame.
Optionally, the number of image slices per image frame is the same across the video streams of the multiple code rates associated with the first full-view video. For example, if an image frame in the ultra-HD video stream associated with the first full-view video includes 100 image slices, then an image frame in the high-definition video stream also includes 100 image slices, and each image slice is the same size.
Optionally, the image frames in the video streams of the multiple code rates associated with the first full-view video may be mapped onto a 3D model. For example, each image frame may be mapped onto a cube model or a sphere model, so that the image frames produce a 3D display effect.
Next, a cube map will be described with reference to fig. 3.
Fig. 3 is a schematic diagram of a cube map provided in an embodiment of the present disclosure. Referring to fig. 3, it shows a cube model and an image frame in a first full-view video. The cube model is the playback model associated with the VR video: six images are displayed on the six faces of the cube model, producing the VR video effect. Combining the images on the six faces of the cube model into one image yields an image frame of the first full-view video. The image frame comprises 2 rows and 3 columns of images: the 1st image of row 1 is the left face of the cube model, the 2nd image of row 1 is the front face, the 3rd image of row 1 is the right face, the 1st image of row 2 is the bottom face, the 2nd image of row 2 is the back face, and the 3rd image of row 2 is the top face.
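The 2 x 3 face layout of fig. 3 can be written down directly. Below is a minimal sketch, assuming each face arrives as an equally sized numpy array; the function name and array shapes are illustrative, not part of the disclosure.

```python
import numpy as np

# Hypothetical sketch: pack the six cube-map faces into one 2x3 image frame,
# following the layout described for fig. 3 (row 1: left, front, right;
# row 2: bottom, back, top). Each face is an HxWx3 array of equal size.
def pack_cube_faces(faces: dict) -> np.ndarray:
    row1 = np.hstack([faces["left"], faces["front"], faces["right"]])
    row2 = np.hstack([faces["bottom"], faces["back"], faces["top"]])
    return np.vstack([row1, row2])

# Usage: six dummy 512x512 faces produce one 1024x1536 full-view frame.
faces = {name: np.zeros((512, 512, 3), dtype=np.uint8)
         for name in ("left", "front", "right", "bottom", "back", "top")}
frame = pack_cube_faces(faces)
assert frame.shape == (1024, 1536, 3)
```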
Alternatively, the server may obtain the first full view video in response to the first operation. Optionally, the first operation may be an operation that the user requests to play the first full view video in the terminal device. For example, when a user uses the VR device, if the user clicks a play control of the first full-view video a, the VR device sends an acquisition request of the first full-view video a to the server, and the server may send an image frame of the first full-view video a to the VR device; if the user clicks the play control of the first full-view video B, the VR device sends an acquisition request of the first full-view video B to the server, and the server may send an image frame of the first full-view video B to the VR device. It should be noted that, the VR video may be a video synthesized by a plurality of image frames, so when the VR device sends a request for acquiring the first full-view video to the server, the server may send a plurality of image frames corresponding to the first full-view video to the VR device.
S202, acquiring a first view angle range and a first view angle center when a user watches a first full view angle video.
Optionally, the first viewing angle range may be the range that the user's view can cover. For example, each user's viewing angle range is different; when the user uses the terminal device, the terminal device can detect the user's eyes to obtain the user's first viewing angle range. Alternatively, the first viewing angle range may be a fixed viewing angle range, which is not limited by the embodiments of the present disclosure.
The first view center may be a focus center of a user when viewing the first full view video. For example, if the user is focusing on the upper left corner while watching the first full view video, the first view center is the upper left corner of the first full view video. It should be noted that, the terminal device may obtain the first viewing angle center of the user in real time through a preset viewing angle detection algorithm.
S203, determining a target video stream in the video streams with the multiple code rates.
Optionally, the server may determine the target video stream among the video streams of the plurality of code rates. For example, when the terminal device requests the first full-view video, it may send the server a video acquisition request containing the identifier of the first full-view video and the code rate to be played; the server may then determine the target video stream, among the video streams of the multiple code rates associated with the first full-view video, according to the requested code rate.
For example, if the video acquisition request sent by the terminal device includes the high-definition code rate and first full-view video A, the server determines the high-definition video stream associated with first full-view video A as the target video stream; if the request includes the ultra-HD code rate and first full-view video B, the server determines the ultra-HD video stream associated with first full-view video B as the target video stream.
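As a rough illustration of this lookup, the sketch below maps a (video identifier, requested code rate) pair to a stream; the registry layout and all names are assumptions made for illustration, not the disclosure's API.

```python
# Hypothetical stream registry: (video id, code rate) -> stream locator.
STREAMS = {
    ("video_a", "hd"):  "video_a_hd.m3u8",
    ("video_a", "uhd"): "video_a_uhd.m3u8",
    ("video_b", "uhd"): "video_b_uhd.m3u8",
}

def select_target_stream(video_id: str, requested_rate: str) -> str:
    # Pick the target video stream named in the video acquisition request.
    try:
        return STREAMS[(video_id, requested_rate)]
    except KeyError:
        raise ValueError(f"no {requested_rate} stream for {video_id}")

print(select_target_stream("video_b", "uhd"))  # -> video_b_uhd.m3u8
```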
S204, determining a first image slice among a plurality of image slices of an image frame of the target video stream based on the first view angle range and the first view angle center.
Optionally, the server may determine the first image slice among the plurality of image slices of the image frame of the target video stream based on the first view angle range and the first view angle center. For example, if the image frame of the target video stream includes image slice A, image slice B, image slice C, and image slice D, and it is determined from the first view angle range and the first view angle center that the user's view angle covers image slice C and part of image slice D, the server determines image slice C and image slice D as the first image slices.
Next, a process of determining the first image patch by the server will be described with reference to fig. 4.
Fig. 4 is a schematic diagram of a process for determining a first image slice according to an embodiment of the disclosure. Referring to fig. 4, it shows an image frame of the target video stream. The image frame comprises image slice 1, image slice 2, image slice 3, image slice 4, image slice 5, and image slice 6. If the user's view angle covers image slice 2, image slice 3, image slice 5, and image slice 6, the server determines image slice 2, image slice 3, image slice 5, and image slice 6 as the first image slices.
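The selection in fig. 4 amounts to a rectangle-grid intersection test. The sketch below is one way to write it, under the simplifying assumption that the viewport is an axis-aligned rectangle built from the first view angle center and range (a real VR viewport is a region on a sphere); all names are illustrative.

```python
# Hypothetical sketch of S204: on a rows x cols slice grid over the frame,
# return the indices of the image slices intersected by the user's viewport.
def tiles_in_viewport(frame_w, frame_h, rows, cols,
                      center_x, center_y, view_w, view_h):
    tile_w, tile_h = frame_w / cols, frame_h / rows
    left, right = center_x - view_w / 2, center_x + view_w / 2
    top, bottom = center_y - view_h / 2, center_y + view_h / 2
    covered = []
    for r in range(rows):
        for c in range(cols):
            # Slice (r, c) spans [c*tile_w, (c+1)*tile_w) x [r*tile_h, (r+1)*tile_h).
            if (c + 1) * tile_w > left and c * tile_w < right \
               and (r + 1) * tile_h > top and r * tile_h < bottom:
                covered.append((r, c))
    return covered

# Usage, mirroring fig. 4: a 2x3 grid where the viewport covers slices 2, 3, 5, 6.
print(tiles_in_viewport(300, 200, 2, 3, 200, 100, 180, 180))
# -> [(0, 1), (0, 2), (1, 1), (1, 2)]
```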
S205, sending the first image slice to the terminal device.
Optionally, the server may send the first image slice to the terminal device. For example, if the server determines that the user's view angle covers image slice A and image slice B, and the terminal device requested the ultra-HD code rate, the server may send ultra-HD image slice A and ultra-HD image slice B to the terminal device.
Optionally, the server may also send the first image slice to the terminal device in the following possible implementation: determine a first video stream among the video streams of the plurality of code rates. Optionally, the code rate of the first video stream is lower than the code rate of the target video stream. For example, if the code rate of the target video stream is the ultra-HD code rate, the code rate of the first video stream may be standard definition or high definition.
The server then acquires a plurality of third image slices associated with the image frames of the first video stream, and sends the first image slices and the plurality of third image slices to the terminal device. For example, if an image frame of the first video stream includes image slice A, image slice B, image slice C, and image slice D, the server determines that the third image slices include image slice A, image slice B, image slice C, and image slice D. For example, if the code rate of the target video stream is the ultra-HD code rate and the code rate of the first video stream is the high-definition code rate, the server sends the first image slices at the ultra-HD code rate together with all the image slices of the first video stream to the terminal device. The code rate required to send the plurality of third image slices plus the first image slices is still lower than the code rate required to send all the image slices of the target video stream, so the transmission code rate is reduced and the utilization of transmission resources is improved.
Optionally, if the playback position of the terminal device is ahead of the first image slice the server has generated, the server may first send the third image slice for that position to the terminal device, and send the first image slice once the timestamp of the first image slice generated by the server exceeds the timestamp of the position the user is watching. For example, in VR live streaming, if the timestamp played by the terminal device is greater than the timestamp of the first image slice generated by the server (i.e., the live delay seen by the user is low and the server has not yet generated the first image slice for that moment), the server may first send the third image slice corresponding to that timestamp, and send the first image slice once the timestamp of the first image slice loaded by the server exceeds the timestamp of the user's viewing position. This avoids stuttering during VR live streaming and improves the playback effect.
Next, a procedure of transmitting image slices to the terminal device by the server during VR live broadcast will be described with reference to fig. 5.
Fig. 5 is a schematic diagram of a process of sending image slices to a terminal device according to an embodiment of the disclosure. Referring to fig. 5, it shows a coordinate system whose horizontal axis is time and whose vertical axis is space. Before the user switches the view angle, the server can generate first image slice A and send it to the VR device. After the user switches the view angle, the server generates first image slices more slowly than the user consumes the VR live stream, so the server sends third image slice B and third image slice C to the terminal device. Once the timestamp of the first image slice generated by the server exceeds the timestamp of the position the user is watching in the VR live stream, the server sends first image slice D to the VR device. In this way, stuttering of the VR live stream is avoided and the playback effect of the VR live stream is improved.
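The timestamp comparison of fig. 5 can be sketched as follows; the dictionary-of-timestamps representation and the function name are assumptions made for illustration.

```python
# Hypothetical sketch of the live fallback: if playback has run ahead of the
# newest first (high-rate) image slice the server has generated, serve the
# third (low-rate) slice instead, so VR live playback does not stall.
def pick_slice(play_ts, hi_slices, lo_slices):
    """hi_slices / lo_slices map timestamps to slice payloads (lo assumed non-empty)."""
    if hi_slices and max(hi_slices) >= play_ts:
        return hi_slices[max(hi_slices)]   # the high-rate slice is ready
    return lo_slices[max(lo_slices)]       # fall back to the low-rate slice

# Usage: playback at t=11.5 is ahead of the newest high-rate slice (t=10).
hi = {10.0: b"hi@10"}
lo = {10.0: b"lo@10", 12.0: b"lo@12"}
print(pick_slice(11.5, hi, lo))  # -> b'lo@12'
```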
The embodiment of the disclosure provides a video processing method. The server acquires a first full-view video, where the first full-view video is associated with video streams of a plurality of code rates and an image frame in the video stream comprises a plurality of image slices. It acquires the first view angle range and first view angle center when a user watches the first full-view video, determines a target video stream among the video streams of the plurality of code rates, and determines the first image slices among the image slices of an image frame of the target video stream based on the first view angle range and the first view angle center. It also determines a first video stream, whose code rate is lower than that of the target video stream, acquires a plurality of third image slices of the first video stream, and sends the first image slices and the plurality of third image slices to the terminal device. In this method, the server sends the terminal device the region of the target video stream the user is attending to, together with the reduced-code-rate first video stream, so that when the user suddenly moves the view angle the low-code-rate third image slices are still visible. This improves the user experience, saves pixel transmission code rate, and improves the utilization of transmission resources.
On the basis of any one of the above embodiments, the method for acquiring the first full-view video in the above video processing method is further described below with reference to fig. 6.
Fig. 6 is a flowchart of a method for acquiring a first full view video according to an embodiment of the present disclosure. Referring to fig. 6, the method includes:
s601, acquiring a plurality of image segmentation strategies.
Optionally, an image slice division policy is used to divide the image frames in a video into a plurality of image slices. For example, an image slice division policy may be: divide each image frame in the video evenly into 100 image slices.
Optionally, any two image slice division policies divide an image frame into different numbers of image slices. Optionally, the image slice division policies may be preset. For example, image slice division policy 1 may divide an image frame into 50 image slices on average, policy 2 into 100 image slices, policy 3 into 150 image slices, and so on. It should be noted that, in the embodiments of the present disclosure, an image slice division policy may divide an image frame evenly into any number of image slices, but no two policies divide an image frame into the same number of image slices.
S602, acquiring a preset view angle range and a second full-view video.
Optionally, the preset viewing angle range may be any set viewing angle area. For example, the preset viewing angle range may be the viewing angle area of a user using the terminal device. In practical applications, most users' viewing angle range when using a VR device is approximately 25% of the full frame, so the viewing angle range can be treated as a fixed size for most users. Optionally, the server may also obtain the preset viewing angle range from viewing angle ranges preset in the server by different users. For example, if user A's viewing angle range is 20% of the full frame and user B's is 30%, the server may set the preset viewing angle range to 20% of the full frame when user A uses the VR device, and to 30% when user B uses it.
Optionally, the second full-view video may be the first full-view video before image slice division. The image frames in the video streams of the multiple code rates associated with the first full-view video have all been divided into image slices; the second full-view video has the same video content as the first full-view video, but the image frames of its video streams have not been divided into image slices.
S603, determining a target image slice division policy based on the preset view angle range and the plurality of image slice division policies.
Optionally, the target image slice division policy may be determined as follows: based on the preset view angle range, determine the view coverage associated with each image slice division policy, obtaining a plurality of view coverages; then determine the target image slice division policy among the plurality of image slice division policies based on the plurality of view coverages.
Optionally, the view coverage is the ratio of the preset view angle range to the total area of the image slices it covers. For example, if the viewing angle range is 1 square meter and covers 4 image slices of 0.5 square meter each, the view coverage is 50%.
Optionally, the view coverage is inversely proportional to the number of redundant pixels transmitted. In practice, the user does not attend to the content of image slices that are only partially covered at the edge of the view angle, yet the image definition of those regions must still be raised during transcoding. Therefore, the higher the view coverage, the smaller that redundant area and the fewer redundant pixels are transmitted; the lower the view coverage, the larger the redundant area and the more redundant pixels are transmitted.
Optionally, for any one image slice division policy, determining the view coverage associated with the policy based on the preset view angle range specifically includes: dividing the image frames in the second full-view video into a plurality of second image slices based on the image slice division policy. For example, if the policy divides an image frame into 100 image slices, the server divides each image frame in the second full-view video evenly into 100 image slices.
Then, the target image slices covered by the preset view angle range are determined among the plurality of second image slices. Since the size of the user's view angle range is fixed, the target image slices can be determined by overlaying the view angle range on the plurality of second image slices. For example, if an image frame in the second full-view video is divided evenly into 100 image slices and the preset view angle range can cover 10 of them, those 10 image slices are determined as the target image slices.
Finally, the view coverage associated with the image slice division policy is obtained from the preset view angle range and the target image slices. For example, the server may acquire a first area (of the target image slices) and a second area (of the preset view angle range) and determine the view coverage from them; e.g., the server may determine the ratio of the preset view angle range to the sum of the areas of the target image slices as the view coverage associated with the policy. For example, if the image slice division policy divides an image frame evenly into 100 image slices, each image slice covers 2 square meters, and the user's preset view angle range covers 1 square meter, then the preset view angle range can be covered by 1 image slice, and the view coverage corresponding to this policy is 50% (the ratio of the view angle range to the area of that 1 image slice).
Next, a procedure for acquiring the view coverage corresponding to the image slice division policy will be described with reference to fig. 7.
Fig. 7 is a schematic diagram of a process for obtaining view coverage according to an embodiment of the disclosure. Referring to fig. 7, it shows an image frame in a second full-view video. The image slice division policy divides the image frame evenly into a plurality of image slices (only 6 are shown in fig. 7), each with an area of 1 square meter. The figure also shows a viewing angle range covering an area of 3 square meters.
Referring to fig. 7, the viewing angle range covers at most 6 image slices of the image frame. Since each image slice has an area of 1 square meter and the viewing angle range covers 3 square meters, the view coverage corresponding to this image slice division policy is 50%.
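The coverage definition above reduces to a one-line ratio. The sketch below checks it against both worked examples (1 square meter over four 0.5-square-meter slices, and fig. 7's 3 square meters over six 1-square-meter slices); the function signature is illustrative.

```python
# Hypothetical sketch: view coverage = viewport area / total area of the
# image slices the viewport touches.
def view_coverage(viewport_area: float, covered_tile_areas: list) -> float:
    return viewport_area / sum(covered_tile_areas)

assert view_coverage(3.0, [1.0] * 6) == 0.5   # fig. 7 example: 3 / 6 = 50%
assert view_coverage(1.0, [0.5] * 4) == 0.5   # earlier example: 1 / 2 = 50%
```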
Optionally, determining the target image slice division policy among the plurality of image slice division policies based on the plurality of view coverages specifically includes: determining a first curve based on the plurality of view coverages and the image slice division policy associated with each view coverage. Optionally, the ordinate of the first curve is the view coverage, and the abscissa is the number of image slices into which the corresponding policy divides an image frame.
Then, the inflection point of the first curve is obtained, and the target image slice division policy is determined from the inflection point. For example, the image slice division policy corresponding to the slice count at the abscissa of the inflection point may be determined as the target policy: if the abscissa of the inflection point is 100, the target policy divides each image frame of the second full-view video evenly into 100 image slices; if it is 200, the target policy divides each frame evenly into 200 image slices.
Next, a process of determining the target image slice division policy will be described with reference to fig. 8.
Fig. 8 is a schematic diagram of determining a target image slice division policy according to an embodiment of the disclosure. Referring to fig. 8, it shows a coordinate system containing the first curve; the horizontal axis is the number of image slices and the vertical axis is the view coverage. Point A is the inflection point of the first curve, and the number of image slices corresponding to point A on the horizontal axis is 72, so the target image slice division policy is determined to divide each video frame evenly into 72 image slices.
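The disclosure only says that the inflection point of the first curve is used, without naming an estimator. The sketch below approximates the knee as the point where the marginal coverage gain flattens; the threshold and the sample points are assumptions made for illustration.

```python
# Hypothetical sketch: pick the slice count at the "knee" of the first curve
# (slice count on the x-axis, view coverage on the y-axis).
def knee_tile_count(tile_counts, coverages, min_gain=0.005):
    for i in range(1, len(tile_counts)):
        gain = coverages[i] - coverages[i - 1]
        if gain < min_gain:          # coverage curve has flattened out
            return tile_counts[i - 1]
    return tile_counts[-1]

# Usage on made-up sample points: coverage saturates around 72 slices (fig. 8).
counts    = [18, 32, 50, 72, 98, 128]
coverages = [0.55, 0.68, 0.78, 0.83, 0.834, 0.836]
print(knee_tile_count(counts, coverages))  # -> 72
```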
S604, obtaining the first full-view video based on the target image slice division policy and the second full-view video.
Optionally, the server may process each image frame in the second full-view video based on the target image slice division policy to obtain the first full-view video. For example, if the target policy divides image frames evenly into 50 image slices, the server divides each image frame in the second full-view video evenly into 50 image slices to obtain the first full-view video.
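A minimal sketch of this even division follows, assuming the slice count is expressed as a rows x cols grid (e.g., 50 slices as 5 x 10); the grid shape is an assumption, since the disclosure only states that frames are divided evenly.

```python
import numpy as np

# Hypothetical sketch of S604: divide each image frame of the second
# full-view video evenly into rows x cols image slices.
def split_into_slices(frame: np.ndarray, rows: int, cols: int) -> list:
    h, w = frame.shape[:2]
    tile_h, tile_w = h // rows, w // cols
    return [frame[r * tile_h:(r + 1) * tile_h, c * tile_w:(c + 1) * tile_w]
            for r in range(rows) for c in range(cols)]

frame = np.zeros((1000, 2000, 3), dtype=np.uint8)
slices = split_into_slices(frame, 5, 10)   # 50 equal 200x200 slices
assert len(slices) == 50
```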
Optionally, after the server divides the image frames in the second full-view video into image slices to obtain the first full-view video, the server still needs to transcode the first full-view video. Since the view comprises a left-eye view and a right-eye view, each image frame of the first full-view video is associated with a left-eye view image and a right-eye view image, and the server may encode the first full-view video as follows: stitch the left-eye view image and right-eye view image associated with each image frame to obtain a plurality of stitched images, and then encode the stitched images. This improves coding efficiency while keeping the total number of views unchanged. For example, the server may stitch the left-eye view image and the right-eye view image directly, or it may flip one of the view images by 180 degrees and stitch the flipped image with the other (e.g., flip the left-eye view image by 180 degrees and then stitch it with the right-eye view image). After flipping, the image similarity across the stitched seam is high, which improves the coding efficiency of the first full-view video.
Next, the process in which the server encodes an image frame after it has been divided into a plurality of image slices is described in detail with reference to figs. 9A and 9B.
Fig. 9A is a schematic diagram of a process for encoding an image frame according to an embodiment of the disclosure. Referring to fig. 9A, it shows the left-eye view image and right-eye view image associated with an image frame; each contains 4 image slices. When encoding the image frame, the server can flip the right-eye view image by 180 degrees and stitch, side by side, the 1st image slice of the left-eye view image with the 1st image slice of the right-eye view image, the 2nd with the 2nd, the 3rd with the 3rd, and the 4th with the 4th. Because the image similarity between same-numbered image slices is high, the coding efficiency of the image frame is improved.
Fig. 9B is a schematic diagram of another process for encoding an image frame according to an embodiment of the present disclosure. Referring to fig. 9B, it shows the left-eye view image and right-eye view image associated with an image frame; each contains 4 image slices. When encoding the image frame, the server may stitch the left-eye view image and the right-eye view image top and bottom: the 1st image slice of the left-eye view image is stitched above the 1st image slice of the right-eye view image, the 2nd above the 2nd, the 3rd above the 3rd, and the 4th above the 4th.
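Both stitching variants of figs. 9A and 9B fit in one helper. The sketch below rotates the right-eye slice 180 degrees (as in fig. 9A) and stitches same-numbered slice pairs side by side or top-bottom; treating the flip and orientation as parameters is an illustrative choice, not the disclosure's interface.

```python
import numpy as np

# Hypothetical sketch of the stitching in figs. 9A/9B: pair each left-eye
# slice with the same-numbered right-eye slice before encoding. Rotating one
# view 180 degrees is said to raise similarity at the seam and hence coding
# efficiency.
def stitch_stereo_slices(left_slices, right_slices, side_by_side=True,
                         flip_right=True):
    stitched = []
    for l, r in zip(left_slices, right_slices):
        if flip_right:
            r = np.rot90(r, 2)   # 180-degree rotation of the right-eye slice
        stitched.append(np.hstack([l, r]) if side_by_side else np.vstack([l, r]))
    return stitched

# Usage: four 64x64 slice pairs give four 64x128 stitched images (fig. 9A style).
left  = [np.zeros((64, 64, 3), dtype=np.uint8) for _ in range(4)]
right = [np.zeros((64, 64, 3), dtype=np.uint8) for _ in range(4)]
assert stitch_stereo_slices(left, right)[0].shape == (64, 128, 3)
```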
The embodiment of the disclosure provides a method for dividing an image frame into a plurality of image slices: obtain a plurality of image slice division policies, obtain a preset view angle range and a second full-view video, determine a target image slice division policy based on the preset view angle range and the plurality of policies, and obtain the first full-view video based on the target policy and the second full-view video. In this way, the image frames of the second full-view video are divided into image slices with little pixel redundancy (not the least possible, because the division is chosen at the inflection point of the first curve, which balances the server's processing efficiency against the number of redundant pixels). This reduces pixel redundancy during transmission and improves the utilization of transmission resources.
On the basis of any one of the above embodiments, a procedure of the above video processing method will be described below with reference to fig. 10.
Fig. 10 is a process schematic diagram of a video processing method according to an embodiment of the disclosure. Referring to fig. 10, it shows a server, a virtual reality device, and a video frame in a second full-view video. After obtaining the video frame, the server divides it into 100 image slices according to the user's view angle range: each image slice has an area of 1 square meter, and the 5-square-meter view angle range covers 6 of them, so the view coverage for the 100-slice division is 5/6. The target image slice division policy is therefore determined to be dividing the video frame into 100 image slices, yielding the first full-view video, in which each image frame is divided into 100 image slices. (For ease of illustration, the view coverages of the other image slice division policies are not shown in fig. 10; for the division process, refer to the embodiments shown in figs. 7 and 8.)
Referring to fig. 10, the virtual reality device sends the server a request for the first full-view video at the ultra-HD code rate. The server acquires the ultra-HD video stream and the high-definition video stream associated with the first full-view video, determines the target image slices in the ultra-HD video stream based on the user's first view angle range and first view angle center, and sends the ultra-HD target image slices together with the first video frame of the high-definition video stream to the terminal device.
Referring to fig. 10, after receiving the target image slices and the first video frame, the virtual reality device determines the area (area 1) corresponding to the target image slices in the first video frame, covers area 1 with the target image slices to obtain the target image, and displays it. In the target image, area 1 is at the ultra-HD code rate while the area outside it is at the high-definition code rate. The server thus ensures that the region the user is watching is an ultra-HD image without affecting the VR viewing experience; moreover, only part of the image frame is ultra-HD while the rest is high definition, which saves pixel transmission code rate and improves the utilization of transmission resources.
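The client-side compositing in fig. 10 can be sketched as a simple paste of the ultra-HD slices over the high-definition base frame; the (top, left) offset convention is an assumption made for illustration.

```python
import numpy as np

# Hypothetical sketch: overlay ultra-HD target slices onto their region
# (area 1) of the high-definition first video frame before display.
def composite(base_frame: np.ndarray, uhd_slices: dict) -> np.ndarray:
    """uhd_slices maps (top, left) pixel offsets to ultra-HD slice arrays."""
    out = base_frame.copy()
    for (top, left), tile in uhd_slices.items():
        h, w = tile.shape[:2]
        out[top:top + h, left:left + w] = tile
    return out
```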
Fig. 11 is a schematic structural diagram of a video processing apparatus according to an embodiment of the disclosure. Referring to fig. 11, the video processing apparatus 110 includes a first acquisition module 111, a second acquisition module 112, a determination module 113, and a transmission module 114, wherein:
the first obtaining module 111 is configured to obtain a first full-view video, where the first full-view video is associated with a video stream with multiple code rates, and an image frame in the video stream includes multiple image slices;
the second obtaining module 112 is configured to obtain a first view angle range and a first view angle center when the user views the first full view video;
the determining module 113 is configured to determine a target video stream from among the video streams with the multiple code rates, and determine a first image slice from among multiple image slices of an image frame of the target video stream based on the first view angle range and the first view angle center;
the sending module 114 is configured to send the first image slice to a terminal device.
In one possible implementation manner, the first obtaining module 111 is specifically configured to:
acquiring a plurality of image slice division policies, wherein an image slice division policy is used to divide an image frame in a video into a plurality of image slices;
acquiring a preset view angle range and a second full-view video, wherein the second full-view video is the first full-view video before image slice division;
determining a target image slice division policy based on the preset view angle range and the plurality of image slice division policies;
and obtaining the first full-view video based on the target image slice division policy and the second full-view video.
In one possible implementation manner, the first obtaining module 111 is specifically configured to:
determining, based on the preset view angle range, the view coverage associated with each image slice division policy to obtain a plurality of view coverages;
and determining the target image slice division policy among the plurality of image slice division policies based on the plurality of view coverages.
In one possible implementation manner, the first obtaining module 111 is specifically configured to:
dividing the image frames in the second full-view video into a plurality of second image slices based on the image slice division policy;
determining the target image slices covered by the preset view angle range among the plurality of second image slices;
and obtaining the view coverage associated with the image slice division policy based on the preset view angle range and the target image slices.
In one possible implementation manner, the first obtaining module 111 is specifically configured to:
acquiring a first area of the target image slice and a second area of the preset visual angle range;
the view coverage is determined based on the first area and the second area.
In one possible implementation manner, the first obtaining module 111 is specifically configured to:
determining a first curve based on the plurality of view coverages and the image slice division policy associated with each view coverage;
and acquiring an inflection point of the first curve, and determining the target image slice division policy based on the inflection point.
In one possible implementation, the sending module 114 is specifically configured to:
determining a first video stream among the video streams of the plurality of code rates, wherein the code rate of the first video stream is lower than the code rate of the target video stream;
and acquiring a plurality of third image slices associated with the image frames of the first video stream, and sending the first image slices and the plurality of third image slices to the terminal equipment.
The video processing device provided in the embodiments of the present disclosure may be used to execute the technical solutions of the embodiments of the methods, and the implementation principle and the technical effects are similar, and are not repeated here.
Fig. 12 is a schematic structural diagram of another video processing apparatus according to an embodiment of the disclosure. On the basis of the illustration in fig. 11, referring to fig. 12, the video processing apparatus 110 further includes a splicing module 115, where the splicing module 115 is configured to:
performing stitching processing on the left eye view angle image and the right eye view angle image associated with each image frame to obtain a plurality of stitched images;
and carrying out coding processing on the spliced images.
The video processing device provided in the embodiments of the present disclosure may be used to execute the technical solutions of the embodiments of the methods, and the implementation principle and the technical effects are similar, and are not repeated here.
Fig. 13 is a schematic structural diagram of a server according to an embodiment of the disclosure. Referring to fig. 13, a schematic diagram of a server 1300 suitable for implementing embodiments of the present disclosure is shown, where the server 1300 may be a terminal device or a server. The terminal device may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a personal digital assistant (Personal Digital Assistant, PDA for short), a tablet (Portable Android Device, PAD for short), a portable multimedia player (Portable Media Player, PMP for short), an in-vehicle terminal (e.g., an in-vehicle navigation terminal), and the like, and a fixed terminal such as a digital TV, a desktop computer, and the like. The server illustrated in fig. 13 is merely an example, and should not be construed as limiting the functionality and scope of use of the disclosed embodiments.
As shown in fig. 13, the server 1300 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 1301, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1302 or a program loaded from a storage device 1308 into a random access Memory (Random Access Memory, RAM) 1303. In the RAM 1303, various programs and data necessary for the operation of the server 1300 are also stored. The processing device 1301, the ROM 1302, and the RAM 1303 are connected to each other through a bus 1304. An input/output (I/O) interface 1305 is also connected to bus 1304.
In general, the following devices may be connected to the I/O interface 1305: input devices 1306 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 1307 including, for example, a liquid crystal display (Liquid Crystal Display, abbreviated as LCD), a speaker, a vibrator, or the like; storage 1308 including, for example, magnetic tape, hard disk, etc.; and communication means 1309. The communication means 1309 may allow the server 1300 to communicate with other devices wirelessly or by wire to exchange data. While fig. 13 shows a server 1300 having various devices, it is to be understood that not all illustrated devices are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communications device 1309, or installed from the storage device 1308, or installed from the ROM 1302. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing device 1301.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
The computer readable medium may be contained in the server; or may exist alone without being assembled into the server.
The computer-readable medium carries one or more programs which, when executed by the server, cause the server to perform the methods shown in the above embodiments.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (Local Area Network, LAN for short) or a wide area network (Wide Area Network, WAN for short), or it may be connected to an external computer (e.g., connected via the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. The name of the unit does not in any way constitute a limitation of the unit itself, for example the first acquisition unit may also be described as "unit acquiring at least two internet protocol addresses".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
It should be noted that references to "a" and "a plurality" in this disclosure are intended to be illustrative rather than limiting, and those of ordinary skill in the art will appreciate that, unless the context clearly indicates otherwise, these terms should be understood as "one or more".
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
It will be appreciated that, prior to using the technical solutions disclosed in the embodiments of the present disclosure, the user should be informed, in an appropriate manner in accordance with relevant laws and regulations, of the type, scope of use, and usage scenarios of the personal information involved in the present disclosure, and the user's authorization should be obtained.
For example, in response to receiving an active request from a user, prompt information is sent to the user to explicitly remind the user that the operation the user is requesting to perform will require the acquisition and use of the user's personal information. The user can thus autonomously choose, according to the prompt information, whether to provide personal information to the software or hardware, such as an electronic device, an application program, a server, or a storage medium, that executes the operations of the technical solution of the present disclosure.
As an optional but non-limiting implementation, in response to receiving an active request from the user, the prompt information may be sent to the user by way of, for example, a pop-up window, in which the prompt information may be presented as text. In addition, the pop-up window may carry a selection control allowing the user to choose "agree" or "disagree" to provide personal information to the electronic device.
It will be appreciated that the above-described notification and user authorization process is merely illustrative and not limiting of the implementations of the present disclosure, and that other ways of satisfying relevant legal regulations may be applied to the implementations of the present disclosure.
It will be appreciated that the data involved in the present technical solution (including but not limited to the data itself and the acquisition or use of the data) should comply with the requirements of the corresponding laws, regulations, and relevant provisions. The data may include information, parameters, messages, etc., such as stream switching indication information.
The foregoing description is only of the preferred embodiments of the present disclosure and an explanation of the technical principles employed. It will be appreciated by persons skilled in the art that the scope of the disclosure referred to in this disclosure is not limited to the specific combinations of the features described above, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in the present disclosure (but not limited thereto).
Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.

Claims (12)

1. A video processing method, comprising:
acquiring a first full-view video, wherein the first full-view video is associated with video streams of a plurality of code rates, and an image frame in the video streams comprises a plurality of image slices;
acquiring a first view angle range and a first view angle center when a user watches the first full-view video;
determining a target video stream among the video streams of the plurality of code rates, and determining a first image slice among the plurality of image slices of an image frame of the target video stream based on the first view angle range and the first view angle center;
and sending the first image slice to a terminal device.
2. The method of claim 1, wherein the acquiring the first full-view video comprises:
acquiring a plurality of image slicing strategies, wherein an image slicing strategy is used for dividing an image frame in a video into a plurality of image slices;
acquiring a preset view angle range and a second full-view video, wherein the second full-view video is the first full-view video before image slice division is performed;
determining a target image slicing strategy based on the preset view angle range and the plurality of image slicing strategies;
and obtaining the first full-view video based on the target image slicing strategy and the second full-view video.
3. The method of claim 2, wherein the determining a target image slicing strategy based on the preset view angle range and the plurality of image slicing strategies comprises:
based on the preset view angle range, determining a view angle coverage rate associated with each image slicing strategy, to obtain a plurality of view angle coverage rates;
and determining the target image slicing strategy among the plurality of image slicing strategies based on the plurality of view angle coverage rates.
4. The method according to claim 3, wherein, for any one image slicing strategy, the determining, based on the preset view angle range, the view angle coverage rate associated with each image slicing strategy comprises:
dividing an image frame in the second full-view video into a plurality of second image slices based on the image slicing strategy;
determining, from the plurality of second image slices, target image slices covered by the preset view angle range;
and obtaining the view angle coverage rate associated with the image slicing strategy based on the preset view angle range and the target image slices.
5. The method of claim 4, wherein the obtaining the view angle coverage rate associated with the image slicing strategy based on the preset view angle range and the target image slices comprises:
acquiring a first area of the target image slices and a second area of the preset view angle range;
and determining the view angle coverage rate based on the first area and the second area.
6. The method of any of claims 3-5, wherein the determining the target image slicing strategy among the plurality of image slicing strategies based on the plurality of view angle coverage rates comprises:
determining a first curve based on the plurality of view angle coverage rates and the image slicing strategy associated with each view angle coverage rate;
and acquiring an inflection point of the first curve, and determining the target image slicing strategy based on the inflection point.
7. The method according to any of claims 1-6, wherein said sending the first image slice to a terminal device comprises:
determining a first video stream from the video streams of the plurality of code rates, wherein the code rate of the first video stream is lower than that of the target video stream;
and acquiring a plurality of third image slices associated with the image frames of the first video stream, and sending the first image slice and the plurality of third image slices to the terminal device.
8. The method of any of claims 2-7, wherein each image frame of the first full-view video is associated with one left eye view angle image and one right eye view angle image; after the obtaining the first full-view video based on the target image slicing strategy and the second full-view video, the method further comprises:
performing stitching processing on the left eye view angle image and the right eye view angle image associated with each image frame to obtain a plurality of stitched images;
and carrying out coding processing on the plurality of stitched images.
9. A video processing device, characterized by comprising a first acquisition module, a second acquisition module, a determination module and a sending module, wherein:
the first acquisition module is used for acquiring a first full-view video, wherein the first full-view video is associated with video streams of a plurality of code rates, and image frames in the video streams comprise a plurality of image slices;
the second acquisition module is used for acquiring a first view angle range and a first view angle center when a user watches the first full-view video;
the determination module is used for determining a target video stream among the video streams of the plurality of code rates, and determining a first image slice among the plurality of image slices of an image frame of the target video stream based on the first view angle range and the first view angle center;
the sending module is used for sending the first image slice to a terminal device.
10. A server, comprising: a processor and a memory;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory, causing the processor to perform the video processing method of any one of claims 1 to 8.
11. A computer readable storage medium having stored therein computer executable instructions which, when executed by a processor, implement the video processing method of any of claims 1 to 8.
12. A computer program product comprising a computer program which, when executed by a processor, implements the video processing method according to any one of claims 1 to 8.
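To illustrate claims 3 to 5 above, the following is a rough, non-authoritative sketch of one way a view angle coverage rate could be computed for a candidate image slicing strategy. The uniform grid, the rectangular projection of the preset view angle range onto the frame, and the ratio of the second area to the first area are all assumptions; the claims only require a coverage value derived from the two areas:

```python
# Hypothetical sketch of the coverage computation in claims 4 and 5.
# view_rect is assumed to lie within the frame, so at least one slice is covered.
def view_coverage(frame_w, frame_h, cols, rows, view_rect):
    """Coverage rate of a cols x rows slicing strategy for a rectangular
    view_rect = (x0, y0, x1, y1) projected onto the frame."""
    tile_w, tile_h = frame_w / cols, frame_h / rows
    x0, y0, x1, y1 = view_rect
    covered = 0
    for c in range(cols):
        for r in range(rows):
            tx0, ty0 = c * tile_w, r * tile_h
            # a second image slice is a "target" slice if the view overlaps it
            if tx0 < x1 and tx0 + tile_w > x0 and ty0 < y1 and ty0 + tile_h > y0:
                covered += 1
    first_area = covered * tile_w * tile_h   # first area: the target image slices
    second_area = (x1 - x0) * (y1 - y0)      # second area: the view angle range
    return second_area / first_area          # closer to 1 = fewer wasted pixels
```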
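Claim 6 then locates an inflection point of the first curve (coverage rate versus slicing strategy). The discrete second difference below is only one assumed way to find such a knee; the claim does not prescribe the criterion:

```python
# Hypothetical knee detection for claim 6. strategies[i] and coverages[i]
# describe the i-th candidate slicing strategy, ordered by slice count.
def pick_strategy(strategies, coverages):
    if len(coverages) < 3:
        # too few points for a curvature estimate; fall back to best coverage
        return strategies[coverages.index(max(coverages))]
    # the second difference approximates the curvature of the first curve
    second_diff = [coverages[i - 1] - 2 * coverages[i] + coverages[i + 1]
                   for i in range(1, len(coverages) - 1)]
    knee = 1 + max(range(len(second_diff)), key=lambda i: abs(second_diff[i]))
    return strategies[knee]   # finer slicing beyond this point gains little
```

One reading of such a criterion is that coverage improves quickly as slices shrink and then flattens, so the knee marks where additional slices stop paying for their overhead.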
CN202211009109.2A 2022-08-22 2022-08-22 Video processing method, device and server Pending CN117676110A (en)

Priority Applications (1)

Application Number: CN202211009109.2A
Priority Date: 2022-08-22
Filing Date: 2022-08-22
Title: Video processing method, device and server

Publications (1)

Publication Number: CN117676110A
Publication Date: 2024-03-08

Family

ID=90066670

Family Applications (1)

Application Number: CN202211009109.2A
Publication: CN117676110A (en)
Title: Video processing method, device and server
Status: Pending

Country Status (1)

Country: CN
Link: CN117676110A (en)


Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination